A few weeks ago, when I was at the digital rights conference RightsCon in Taiwan, I watched in real time as civil society organizations from around the world, including the US, grappled with the loss of one of the biggest funders of global digital rights work: the United States government.
As I wrote in my dispatch, the Trump administration’s shocking, rapid gutting of the US government (and its push into what some prominent political scientists call “competitive authoritarianism”) also affects the operations and policies of American tech companies—many of which, of course, have users far beyond US borders. People at RightsCon said they were already seeing changes in these companies’ willingness to engage with and invest in communities that have smaller user bases—especially non-English-speaking ones.
As a result, some policymakers and business leaders—in Europe, in particular—are reconsidering their reliance on US-based tech and asking whether they can quickly spin up better, homegrown alternatives. This is particularly true for AI.
One of the clearest examples of this is in social media. Yasmin Curzi, a Brazilian law professor who researches domestic tech policy, put it to me this way: “Since Trump’s second administration, we cannot count on [American social media platforms] to do even the bare minimum anymore.”
Social media content moderation systems—which already use automation and are also experimenting with deploying large language models to flag problematic posts—are failing to detect gender-based violence in places as varied as India, South Africa, and Brazil. If platforms begin to rely even more on LLMs for content moderation, this problem will likely get worse, says Marlena Wisniak, a human rights lawyer who focuses on AI governance at the European Center for Not-for-Profit Law. “The LLMs are moderated poorly, and the poorly moderated LLMs are then also used to moderate other content,” she tells me. “It’s so circular, and the errors just keep repeating and amplifying.”
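To make that flagging step concrete, here is a minimal sketch of the kind of LLM-based moderation pass described above, written against the OpenAI Python client. It is an illustration only, not any platform's actual pipeline; the model name, prompt wording, and FLAG/ALLOW convention are assumptions for the example.

```python
# Minimal sketch of an LLM-based content-flagging step (illustrative only).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODERATION_PROMPT = (
    "You are a content moderation assistant. Reply with exactly one word: "
    "FLAG if the post contains harassment, threats, or gender-based violence, "
    "otherwise ALLOW.\n\nPost: {post}"
)

def flag_post(post: str) -> bool:
    """Ask the model whether a post should be routed to human review."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name, not a platform's choice
        messages=[{"role": "user", "content": MODERATION_PROMPT.format(post=post)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper() == "FLAG"

# The failure mode Wisniak describes lives inside this call: a classifier like
# this inherits its training data's blind spots, so abuse written in
# code-switched Hindi-English or Brazilian Portuguese slang can come back ALLOW.
```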
Part of the problem is that the systems are trained primarily on data from the English-speaking world (and American English at that), and as a result, they perform less well with local languages and context.
Even multilingual language models, which are meant to process multiple languages at once, still perform poorly with non-Western languages. For instance, one evaluation of ChatGPT's responses to health-care queries found that results were far worse in Chinese and Hindi, which are less well represented in North American data sets, than in English and Spanish.
For many at RightsCon, this validates their calls for more community-driven approaches to AI—both in and out of the social media context. These could include small language models, chatbots, and data sets designed for particular uses and specific to particular languages and cultural contexts. These systems could be trained to recognize slang usages and slurs, interpret words or phrases written in a mix of languages and even alphabets, and identify “reclaimed language” (onetime slurs that the targeted group has decided to embrace). All of these tend to be missed or miscategorized by language models and automated systems trained primarily on Anglo-American English. The founder of the startup Shhor AI, for example, hosted a panel at RightsCon and talked about its new content moderation API focused on Indian vernacular languages.
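As a rough illustration of the cases such community-built systems target (and emphatically not Shhor AI's actual API), the sketch below probes an off-the-shelf multilingual toxicity classifier with a code-mixed Hinglish insult and a reclaimed-language post. The Hugging Face model ID is only an example of a publicly available classifier, and the sample posts are invented.

```python
# Illustrative only: probing a general-purpose multilingual toxicity classifier
# with the kinds of posts that Anglo-centric systems tend to miss or mislabel.
# The model ID is an example, not a recommendation; a community-trained model
# for the relevant language and context would replace it in practice.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="unitary/multilingual-toxic-xlm-roberta",
)

sample_posts = [
    # Code-mixed Hindi-English (Hinglish) harassment, written in Latin script.
    "Tum jaise logon ko is platform pe rehne ka haq nahi hai, just leave",
    # Reclaimed language: an identity term used positively by the community itself.
    "We're here, we're queer, and this space belongs to us too",
]

for post in sample_posts:
    result = classifier(post)[0]
    print(f"{result['label']:>10} ({result['score']:.2f})  {post}")
```

Running a probe like this tends to surface exactly the two errors the paragraph describes: code-mixed abuse slipping through unflagged, and reclaimed or in-group language being wrongly marked as toxic.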