For companies creating artificial intelligence, data collection is proving to be among the most ethically fraught aspects of the entire process. Almost weekly, there’s news of another company getting it wrong when it comes to the ethics of the data it’s using to train A.I. systems. Ironically, this is true even when companies are seeking to gather new datasets specifically to correct the inherent biases of the old ones.
Google provides the latest case in point. It came under fire after The New York Daily News reported last week that a contractor working for the tech giant had offered black people $5 gift cards in exchange for completing a demographic survey and agreeing to play “a selfie game” that had them performing tasks, such as following a dot on the screen of a mobile phone that the contract workers had brought along. What the contractor was not clear about was that while people were playing, the phone was capturing their images, which would later be used to train a facial recognition algorithm.
Workers for the contractor, the recruitment firm Randstad, wound up approaching homeless people, students, and people attending the BET Awards in Atlanta, many of whom later said they were unaware that buried in legal disclaimers they signed was the right to use their faces in this way. Some Randstad workers told The Daily News that Google managers instructed them to target homeless people specifically because they’d be unlikely to understand what they were agreeing to and were “the least likely to say anything to the media.”
The premise of the project was, at least in Google’s telling, actually pretty noble: past facial recognition systems have been found to perform much less accurately on darker-toned faces, partly because black faces were underrepresented in the large datasets used to train the systems. Google said it wanted to build a better dataset so that the facial recognition—which it says will power the unlock feature on its new Pixel 4 phone—will be as fair as possible. But the lack of transparency in collecting the new dataset is appalling. (Google said it had temporarily suspended the project and was investigating Randstad for violating its policies on transparency and consent for research.)
Google is not the only company to stumble in this way. Earlier this year, IBM sought to compile a more diverse dataset of one million faces and then make it freely available to academic researchers. But the company created the database by scraping images from the photo-sharing site Flickr and other public Internet sites, without seeking consent from any of those pictured.
These companies should know better. The fact that they keep getting it wrong makes me wonder if they actually want to get it right. Which is not to say this is an easy problem. Today’s A.I. systems require huge datasets to work well, and obtaining enough data—especially personal or biometric information—with proper consent is a challenge. Ultimately, synthetic data (artificial data created by researchers to mimic the characteristics of real-world data) or simulations may offer a solution. But, in the meantime, businesses need to try harder.
Written by: Jeremy Kahn
First published 08.10.19: https://fortune.com/2019/10/08/why-did-google-offer-black-people-5-to-harvest-their-faces-eye-on-a-i/