{"id":106017,"date":"2024-10-21T00:01:00","date_gmt":"2024-10-21T04:01:00","guid":{"rendered":"https:\/\/cdt.org\/?post_type=insight&#038;p=106017"},"modified":"2025-05-07T13:02:31","modified_gmt":"2025-05-07T17:02:31","slug":"beyond-english-centric-ai-lessons-on-community-participation-from-non-english-nlp-groups","status":"publish","type":"insight","link":"https:\/\/cdt.org\/insights\/beyond-english-centric-ai-lessons-on-community-participation-from-non-english-nlp-groups\/","title":{"rendered":"Beyond English-Centric AI: Lessons on Community Participation from Non-English NLP Groups"},"content":{"rendered":"\n<p><strong><em>This report brief was authored by Evani Radiya-Dixit, CDT Summer Fellow for the CDT AI Governance Lab<\/em><\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/2025-05-06-AI-Gov-Lab-Beyond-English-Centric-AI-brief.pdf\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/Beyond-English-Centric-AI-Lessons-on-Community-Participation-from-Non-English-NLP-Groups-1024x536.png\" alt=\"CDT brief, entitled &quot;Beyond English-Centric AI: Lessons on Community Participation from Non-English NLP Groups.&quot; Black and white document on a grey background.\" class=\"wp-image-106019\" srcset=\"https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/Beyond-English-Centric-AI-Lessons-on-Community-Participation-from-Non-English-NLP-Groups-1024x536.png 1024w, https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/Beyond-English-Centric-AI-Lessons-on-Community-Participation-from-Non-English-NLP-Groups-640x335.png 640w, https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/Beyond-English-Centric-AI-Lessons-on-Community-Participation-from-Non-English-NLP-Groups-768x402.png 768w, 
https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/Beyond-English-Centric-AI-Lessons-on-Community-Participation-from-Non-English-NLP-Groups-1536x804.png 1536w, https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/Beyond-English-Centric-AI-Lessons-on-Community-Participation-from-Non-English-NLP-Groups-2048x1073.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption class=\"wp-element-caption\"><em>CDT brief, entitled &#8220;Beyond English-Centric AI: Lessons on Community Participation from Non-English NLP Groups.&#8221; Black and white document on a grey background.<\/em><\/figcaption><\/figure>\n\n\n\n<p>Many leading language models are trained on nearly a thousand times more English text than text in other languages. These disparities have real-world impacts, especially for racialized and marginalized communities. For example, models and tools shaped by these gaps have produced <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3589334.3645643\" target=\"_blank\" rel=\"noreferrer noopener\">inaccurate medical advice<\/a> in Hindi, led to a wrongful arrest because of <a href=\"https:\/\/www.theguardian.com\/technology\/2017\/oct\/24\/facebook-palestine-israel-translates-good-morning-attack-them-arrest\" target=\"_blank\" rel=\"noreferrer noopener\">a mistranslation from Arabic<\/a>, and been accused of fueling ethnic cleansing in Ethiopia due to <a href=\"https:\/\/restofworld.org\/2023\/ai-content-moderation-hate-speech\/\" target=\"_blank\" rel=\"noreferrer noopener\">poor moderation of speech that incites violence<\/a>.<\/p>\n\n\n\n<p>These harms reflect the English-centric nature of natural language processing (NLP) tools, which prominent tech companies often develop without centering or even involving non-English-speaking communities. 
In response, region- and language-specific research groups, such as <a href=\"https:\/\/www.masakhane.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Masakhane<\/a> and <a href=\"https:\/\/turing.iimas.unam.mx\/americasnlp\/\" target=\"_blank\" rel=\"noreferrer noopener\">AmericasNLP<\/a>, have emerged to counter English-centric NLP by empowering their communities to both contribute to and benefit from NLP tools developed in their languages. Based on our research and conversations with these collectives, we outline promising practices that companies and research groups can adopt to broaden community participation in multilingual AI development.<\/p>\n\n\n\n<p><em><strong><a href=\"https:\/\/cdt.org\/wp-content\/uploads\/2024\/10\/2025-05-06-AI-Gov-Lab-Beyond-English-Centric-AI-brief.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Read the full brief.<\/a><\/strong><\/em><\/p>\n","protected":false},"featured_media":106019,"template":"","content_type":[521],"area-of-focus":[834,10216,7253,77,7252],"class_list":["post-106017","insight","type-insight","status-publish","has-post-thumbnail","hentry","content_type-report","area-of-focus-ai-policy-governance","area-of-focus-cdt-ai-governance-lab","area-of-focus-content-moderation","area-of-focus-free-expression","area-of-focus-transparency-accountability"],"acf":[],"_links":{"self":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight\/106017","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight"}],"about":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/types\/insight"}],"version-history":[{"count":7,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight\/106017\/revisions"}],"predecessor-version":[{"id":108745,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/insight\/106017\/revisions\/108745"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/media\/106019"}],"wp:attachment":[{"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/media?parent=106017"}]
,"wp:term":[{"taxonomy":"content_type","embeddable":true,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/content_type?post=106017"},{"taxonomy":"area-of-focus","embeddable":true,"href":"https:\/\/cdt.org\/wp-json\/wp\/v2\/area-of-focus?post=106017"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}