The internet is filled with trolls spewing hate speech, but machine learning algorithms can’t help us clean up the mess.
A paper by computer scientists at the University of Washington, Carnegie Mellon University, and the Allen Institute for Artificial Intelligence found that machines were more likely to flag tweets from black people than from white people as offensive. It boils down to subtle differences in language. African-American English (AAE), often spoken in urban communities, is peppered with slang and profanities.
But even if they contain what appear to be offensive words, the message itself often isn’t abusive. For example, the tweet “I saw him yesterday” is scored as 6 per cent toxic, but it suddenly skyrockets to 95 per cent for the comment “I saw his ass yesterday”. The word ass may be crude, but when used in that context it’s not aggressive at all.
An example of how African-American English (AAE) is mistakenly classified as offensive compared to standard American English. Image credit: Sap et al.
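To see where scores like these come from, here is a minimal sketch of querying Perspective API's toxicity endpoint from Python. The request and response shapes follow Google's public Perspective API documentation, but the API key is a placeholder you must supply yourself, and the exact scores returned will vary as the model is updated; treat this as an illustration, not the researchers' own code.

```python
import json
from urllib import request

# Perspective API endpoint; "YOUR_API_KEY" is a placeholder, not a real key.
API_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=YOUR_API_KEY")

def build_request(text: str) -> dict:
    """Build the JSON body Perspective API expects for a toxicity query."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }

def toxicity_score(text: str) -> float:
    """POST the comment and return the summary toxicity score (0.0 to 1.0).

    Requires a valid API key and network access.
    """
    body = json.dumps(build_request(text)).encode("utf-8")
    req = request.Request(API_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        result = json.load(resp)
    return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

With a valid key, calling `toxicity_score("I saw him yesterday")` versus `toxicity_score("I saw his ass yesterday")` reproduces the kind of gap the researchers describe: the classifier scores surface profanity, not intent.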
“I wasn’t aware of the exact level of bias in Perspective API, the tool used to detect online hate speech, when searching for toxic language, but I expected to see some level of bias from previous work that examined how easily algorithms like AI chatbots learn negative cultural stereotypes and associations,”
said Saadia Gabriel, co-author of the paper and a PhD student at the University of Washington.
“Still, it’s always surprising and a little alarming to see how well these algorithms pick up on toxic patterns pertaining to race and gender when presented with large corpora of unfiltered data from the web.”
The researchers fed a total of 124,779 tweets, collected from two datasets, into Perspective API, which classified them for toxicity. Originally developed by Google and Jigsaw, an incubator company operating under Alphabet, the machine-learning software is used by Twitter to flag abusive comments.
The tool mistakenly classified 46 per cent of non-offensive tweets crafted in the style of African American English (AAE) as inflammatory, compared to just nine per cent of tweets written in standard American English.
Humans employed via the Amazon Mechanical Turk service were then asked to look at 1,351 tweets from the same dataset and judge whether each comment was offensive to them personally, or could be seen as offensive to anyone.
Just over half (about 55 per cent) were classified as “could be offensive to anyone”. That figure dropped to 44 per cent, however, when the workers were asked to consider the user’s race and their use of AAE.
“Our work serves as a reminder that hate speech and toxic language is highly subjective and contextual,” said Maarten Sap, first author of the paper and a PhD student at the University of Washington.
“We have to think about dialect, slang and in-group versus out-group, and we have to consider that slurs spoken by the out-group might actually be reclaimed language when spoken by the in-group.”
The study provides yet another reminder that AI models don’t understand the world well enough to have common sense. Tools like Perspective API often fail when faced with subtle nuances in human language, or even with incorrect spellings.
Similar models employed by other social media platforms like Facebook to detect things like violence or pornography often break down for the same reason. That’s why these companies can’t rely on machines alone, and have to hire teams of human contractors to moderate questionable content.
Written by: Katyanna Quach
First published 11.10.19: https://www.theregister.co.uk/2019/10/11/ai_black_people/