{"id":106954,"date":"2025-01-16T00:01:00","date_gmt":"2025-01-16T05:01:00","guid":{"rendered":"https:\/\/cdt.org\/?post_type=insight&#038;p=106954"},"modified":"2025-01-15T15:17:02","modified_gmt":"2025-01-15T20:17:02","slug":"adopting-more-holistic-approaches-to-assess-the-impacts-of-ai-systems","status":"publish","type":"insight","link":"https:\/\/cdt.org\/insights\/adopting-more-holistic-approaches-to-assess-the-impacts-of-ai-systems\/","title":{"rendered":"Adopting More Holistic Approaches to Assess the Impacts of AI Systems"},"content":{"rendered":"\n<p><strong><em>by Evani Radiya-Dixit, CDT Summer Fellow<\/em><\/strong><\/p>\n\n\n\n<p>As artificial intelligence (AI) continues to advance and gain widespread adoption, the topic of how to hold developers and deployers accountable for the AI systems they implement remains pivotal. Assessments of the risks and impacts of AI systems tend to evaluate a system\u2019s outcomes or performance through methods like auditing, red-teaming, benchmarking evaluations, and impact assessments. CDT\u2019s new paper published today, \u201c<a href=\"http:\/\/cdt.org\/insights\/assessing-ai-surveying-the-spectrum-of-approaches-to-understanding-and-auditing-ai-systems\/\" target=\"_blank\" rel=\"noreferrer noopener\">Assessing AI: Surveying the Spectrum of Approaches to Understanding and Auditing AI Systems<\/a>,\u201d provides a framework for understanding this wide range of assessment methods; this explainer on more holistic, \u201csociotechnical\u201d approaches to AI impact assessment is intended as a supplement to that broader paper.<\/p>\n\n\n\n<p>While some have focused primarily on narrow, technical tests to assess AI systems, <a href=\"https:\/\/arxiv.org\/abs\/2306.05949\" target=\"_blank\" rel=\"noreferrer noopener\">academic researchers<\/a>, <a href=\"https:\/\/www.amnesty.org\/en\/latest\/campaigns\/2024\/01\/the-urgent-but-difficult-task-of-regulating-artificial-intelligence\/\" target=\"_blank\" rel=\"noreferrer noopener\">civil society organizations<\/a>, and <a href=\"https:\/\/fedscoop.com\/senate-legislation-to-establish-third-party-ai-audit-guidelines-is-now-bipartisan\/\" target=\"_blank\" rel=\"noreferrer noopener\">government bodies<\/a> have emphasized the need to consider broader social impacts in these assessments. <a href=\"https:\/\/cdt.org\/insights\/applying-sociotechnical-approaches-to-ai-governance-in-practice\/\" target=\"_blank\" rel=\"noreferrer noopener\">As CDT has written about before<\/a>, AI systems are not just technical tools\u2013\u2013they are embedded in society through human relationships and social institutions. 
The <a href=\"https:\/\/www.whitehouse.gov\/wp-content\/uploads\/2024\/03\/M-24-10-Advancing-Governance-Innovation-and-Risk-Management-for-Agency-Use-of-Artificial-Intelligence.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">OMB guidance on agency use of AI<\/a> and the <a href=\"https:\/\/nvlpubs.nist.gov\/nistpubs\/ai\/NIST.AI.600-1.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">NIST AI Risk Management Framework<\/a> seem to recognize the importance of social context, including policy mandates and recommendations for evaluating the impact of AI-powered products and services on safety and rights.<\/p>\n\n\n\n<p>Many practitioners use the term <a href=\"https:\/\/cdt.org\/insights\/applying-sociotechnical-approaches-to-ai-governance-in-practice\/\" target=\"_blank\" rel=\"noreferrer noopener\">\u201csociotechnical\u201d<\/a> to refer to these human and institutional dimensions that shape the use and impact of AI. Researchers at <a href=\"https:\/\/arxiv.org\/pdf\/2310.11986\" target=\"_blank\" rel=\"noreferrer noopener\">DeepMind<\/a> and <a href=\"https:\/\/arxiv.org\/pdf\/2306.05949\" target=\"_blank\" rel=\"noreferrer noopener\">elsewhere <\/a>have recommended frameworks that help envision what this more holistic approach to AI assessment can look like. These frameworks consider a few layers: First, assessments at the <strong>technical system layer<\/strong> focus on the technical components of an AI system, including the training data, model inputs, and model outputs. Some technical assessments can be conducted when the application or deployment context is not yet determined, such as with general-purpose systems like foundation models. But since the impact of an AI system can depend on factors like the context in which it is used and who uses it, evaluations focused on the <strong>human interaction layer<\/strong> consider the interplay between people and an AI system, such as <a href=\"https:\/\/essay.utwente.nl\/80003\/1\/JorisDijkkamp_MA_BMS.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">how AI hiring tools transform the role of recruiters<\/a>. And beyond this, an AI system can impact broader social systems like labor markets on a larger scale, requiring attention to the <strong>systemic impact layer<\/strong>. Assessments of the human interaction and systemic impact layers, in particular, require understanding the context in which an AI system is or will be deployed, and are critical for assessing systems built or customized for specific purposes. Importantly, <a href=\"https:\/\/deepmind.google\/discover\/blog\/evaluating-social-and-ethical-risks-from-generative-ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">these three layers are not neatly divided<\/a>, and <a href=\"https:\/\/arxiv.org\/pdf\/2306.05949\" target=\"_blank\" rel=\"noreferrer noopener\">social impacts often intersect multiple layers<\/a>.<\/p>\n\n\n\n<p>To illustrate how these three layers can be applied in a tangible context, we consider the example of facial recognition. Clearly a rights-impacting form of AI, this example usefully demonstrates how social context can be incorporated in technical assessments, while also highlighting the limitations of technical assessments in addressing broader societal impacts.<\/p>\n\n\n\n<p><strong>The Need for More Holistic Approaches<\/strong><\/p>\n\n\n\n<p>Current approaches for assessing the impacts of AI systems often focus on their technical components and rely on quantitative methods. 
For example, <a href=\"https:\/\/arxiv.org\/pdf\/2401.14462\" target=\"_blank\" rel=\"noreferrer noopener\">audits that evaluate the characteristics of datasets<\/a> tend to use methods like measurement of incorrect data and ablation studies, which involve altering aspects of a dataset and measuring the results. Initial industry efforts towards more holistic approaches to assess AI\u2019s impacts have often involved soliciting and crowdsourcing public input. For example, OpenAI initiated a <a href=\"https:\/\/openai.com\/index\/bug-bounty-program\/\" target=\"_blank\" rel=\"noreferrer noopener\">bug bounty program<\/a> and a <a href=\"https:\/\/cdn.openai.com\/chatgpt\/chatgpt-feedback-contest.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">feedback contest<\/a> to better understand the risks and harms of ChatGPT. While these efforts help prevent technical assessments from being overly driven by internal considerations, they still raise questions about <em>who<\/em> is included, whether participants are meaningfully involved in decision-making processes, and whether broader harms like surveillance, censorship, and discrimination are being considered in the public feedback process.<\/p>\n\n\n\n<p>Given the limits of narrow evaluation and feedback methods, we emphasize the role of mixed methods\u2013\u2013incorporating both qualitative and quantitative approaches\u2013\u2013across different layers of assessment. While quantitative metrics can be useful for evaluating AI systems at scale, they risk oversimplifying and missing nuanced notions of harms. In contrast, qualitative assessments can be more holistic, although they may require more resources.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"673\" src=\"https:\/\/cdt.org\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-13-at-42806\u202fPM-1024x673.png\" alt=\"Graphic of a table, showing examples of a Quantitative Assessment vs. Qualitative Assessment.\" class=\"wp-image-106966\" srcset=\"https:\/\/cdt.org\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-13-at-42806\u202fPM-1024x673.png 1024w, https:\/\/cdt.org\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-13-at-42806\u202fPM-640x421.png 640w, https:\/\/cdt.org\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-13-at-42806\u202fPM-768x505.png 768w, https:\/\/cdt.org\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-13-at-42806\u202fPM-1536x1010.png 1536w, https:\/\/cdt.org\/wp-content\/uploads\/2025\/01\/Screenshot-2025-01-13-at-42806\u202fPM.png 1764w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>Graphic of a table, showing examples of a Quantitative Assessment vs. Qualitative Assessment.<\/em><\/figcaption><\/figure>\n\n\n\n<p>As indicated in the table above, practitioners should actively consider social context across each layer and center marginalized communities most impacted by AI systems to ensure that assessments address the systemic inequities these communities face. 
These considerations can be strengthened through participatory methods that [involve users and impacted communities in decision-making processes](https://firstmonday.org/ojs/index.php/fm/article/view/13642) over how AI systems are evaluated.

To make these approaches actionable for practitioners, we outline below an array of methods to better assess and address the impacts of AI systems, along with examples of assessments that use these methods.

**1. Incorporate social context and community input into evaluations of AI's technical components**

Evaluating an AI system requires not only analyzing its technical components but also examining its impact on people and broader social structures. Traditional assessments often narrowly evaluate impacts at the technical system layer, like accuracy or algorithmic bias, relying on quantitative metrics pre-determined by researchers and practitioners. However, even when conducting a technical assessment, there are opportunities to consider the social dimensions of the technical components and decisions that shape the AI system.

By integrating **context about social and historical structures of harm**, researchers and practitioners can better identify *which impacts* to evaluate –– such as a more nuanced notion of bias –– and determine the appropriate quantitative or qualitative methods for assessing those impacts. In the case of facial recognition tools, understanding how [social structures often privilege cisgender men](https://www.yorku.ca/edu/unleading/systems-of-oppression/cis-heteropatriarchy/) can inform an analysis of how these tools [operationalize gender in a cis-centric way](https://dl.acm.org/doi/pdf/10.1145/3274357), treating it as binary and tied to physical traits. While many quantitative analyses of facial recognition technology focus narrowly on comparing performance between cis men and cis women to assess gender bias, [one study](https://docs.wixstatic.com/ugd/eb2cd9_963fbde2284f4a72b33ea2ad295fa6d3.pdf) conducted a mixed methods assessment of how this technology performed on transgender individuals and of their experiences with the technology. This example shows that more holistic perspectives can be integrated even in technical assessments.
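As a small illustration of the quantitative half of such a mixed methods assessment, the sketch below disaggregates a face-verification metric across gender identity groups rather than collapsing gender into a cis binary; the trial records are invented for illustration, not data from the cited study.

```python
# Sketch of a disaggregated evaluation: report the metric for every
# gender identity group present, not just a cis-male/cis-female binary.
from collections import defaultdict

trials = [  # one invented record per face-verification attempt
    {"group": "cis man", "correct": True},
    {"group": "cis woman", "correct": True},
    {"group": "trans man", "correct": True},
    {"group": "trans woman", "correct": False},
    {"group": "nonbinary", "correct": False},
    # ...a real assessment would collect many trials per group
]

totals = defaultdict(int)
correct = defaultdict(int)
for t in trials:
    totals[t["group"]] += 1
    correct[t["group"]] += t["correct"]

for group, n in totals.items():
    print(f"{group}: accuracy {correct[group] / n:.2f} (n={n})")
```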
**Input from affected communities** can also be incorporated to identify which aspects of an AI system are most relevant to consider in a technical evaluation. For example, through a participatory workshop, [one study](https://dl.acm.org/doi/abs/10.1145/3600211.3604682) identified harms that AI systems pose to queer people, such as data misrepresentation and exclusionary data collection, which can inform technical assessments that delve deeper into these harms and consider the lived experiences of queer people. Organizations advocating on behalf of communities –– such as [Queer in AI](https://www.queerinai.com/) and the National Association for the Advancement of Colored People (NAACP) –– can offer valuable input on which impacts to evaluate, without overburdening individual community members. (At the same time, neither organizations nor individual members fully represent an entire community, and affected communities should not be treated as monoliths. It is also critical to remember that affected communities include not only those impacted by AI's outputs, but also those involved in its inputs and model development, such as the [data workers](https://ghostwork.info/) who produce and label data.)

In the case of facial recognition, traditional assessments use metrics like false positive rates to measure the technology's performance. However, civil rights organizations such as Big Brother Watch offer community input that [these metrics can be misleading](https://bigbrotherwatch.org.uk/blog/understanding-live-facial-recognition-statistics/) and suggest practitioners look to more nuanced metrics like [precision rates](https://www.mctd.ac.uk/wp-content/uploads/2022/10/MCTD-FacialRecognition-Report-WEB-1.pdf) across demographic groups to better understand how the technology impacts different communities. (False positive rates measure the number of errors relative to the total number of people scanned, which can result in a misleadingly low error rate when facial recognition is used to scan large crowds. In contrast, precision rates assess errors against the number of facial recognition matches, providing a clearer picture of the technology's accuracy.)
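A small worked example shows why the two metrics diverge; the numbers here are purely illustrative, not drawn from any deployment.

```python
# Why a false positive *rate* can look reassuring while precision does not.
people_scanned = 10_000   # faces scanned at a crowded event (illustrative)
alerts = 12               # matches the system flagged
true_matches = 4          # alerts that identified the right person
false_alerts = alerts - true_matches

# Errors relative to everyone scanned: tiny whenever the crowd is large.
false_positive_rate = false_alerts / people_scanned

# Errors relative to the alerts themselves: most alerts were wrong.
precision = true_matches / alerts

print(f"false positive rate: {false_positive_rate:.2%}")  # 0.08%
print(f"precision: {precision:.2%}")                      # 33.33%
```

The same hypothetical deployment can be described as having a 0.08% error rate or as being wrong in two out of every three alerts, which is exactly the gap the community input above highlights.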
For example, <a href=\"https:\/\/facctconference.org\/static\/papers24\/facct24-83.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">this quantitative evaluation of racial classification in multimodal models<\/a> was grounded in a qualitative and historical analysis of Black studies and critical data studies literature on the dehumanization and criminalization of Black bodies. Consistent with this literature, the evaluation found that larger models increasingly predicted Black and Latino men as criminals as the pre-training datasets grew in size. Another example is <a href=\"https:\/\/arxiv.org\/pdf\/2006.16923\" target=\"_blank\" rel=\"noreferrer noopener\">this evaluation of the ImageNet dataset<\/a>, informed by a literature review of the critiques of the dataset creation process. The evaluation examined issues of privacy, consent, and harmful stereotypes and uncovered the inclusion of pornographic and non-consensual images in ImageNet. (Literature reviews can also be helpful when <a href=\"https:\/\/arxiv.org\/pdf\/2306.05949\" target=\"_blank\" rel=\"noreferrer noopener\">evaluating a technical system with respect to large-scale societal impacts<\/a>. For example, to evaluate the environmental costs of AI systems, <a href=\"https:\/\/iopscience.iop.org\/article\/10.1088\/2515-7620\/acf81b\" target=\"_blank\" rel=\"noreferrer noopener\">this article<\/a> reviews existing tools for measuring the carbon footprint when training deep learning models.)<\/p>\n\n\n\n<p><strong>Technical assessments can be co-designed with impacted and marginalized communities <\/strong>using processes like <a href=\"https:\/\/www.routledge.com\/Routledge-International-Handbook-of-Participatory-Design\/Simonsen-Robertson\/p\/book\/9780415720212\" target=\"_blank\" rel=\"noreferrer noopener\">Participatory Design<\/a>, <a href=\"https:\/\/www.belfercenter.org\/publication\/design-margins\" target=\"_blank\" rel=\"noreferrer noopener\">Design from the Margin<\/a>, and <a href=\"https:\/\/mitpress.mit.edu\/9780262039536\/\" target=\"_blank\" rel=\"noreferrer noopener\">Value Sensitive Design<\/a>. For example, <a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3593013.3594005\" target=\"_blank\" rel=\"noreferrer noopener\">one study<\/a> conducted community-based design workshops with older Black Americans to explore how they conceptualize fairness and equity in voice technologies. Participants identified cultural representation \u2013\u2013 such as the technology having knowledge about Juneteenth or Black haircare \u2013\u2013 as a core component of fairness, while also expressing concerns about disclosing identity for representation. This work could inform a co-designed assessment of how voice technologies represent the diversity of Black culture and how much they learn about users\u2019 identities. 
<a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3173574.3174230\" target=\"_blank\" rel=\"noreferrer noopener\">Another study<\/a> used participatory design workshops to broadly examine the perceptions of algorithmic fairness among traditionally marginalized communities in the United States, which could serve as a foundation for co-designing evaluation metrics.<\/p>\n\n\n\n<p><strong>Social science research methods like surveys, interviews, ethnography, <\/strong><a href=\"https:\/\/pressbooks.pub\/scientificinquiryinsocialwork\/chapter\/13-4-focus-groups\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>focus groups<\/strong><\/a><strong>, and <\/strong><a href=\"https:\/\/methods.sagepub.com\/foundations\/storytelling-as-qualitative-research\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>storytelling<\/strong><\/a> can be used to center the lived experiences of impacted communities when evaluating technical components like model inputs and outputs. <a href=\"https:\/\/www.arxiv.org\/abs\/2408.01458\" target=\"_blank\" rel=\"noreferrer noopener\">Research has shown<\/a> that surveys on AI topics often decontextualize participant responses, exclude or misrepresent marginalized perspectives, and perpetuate power imbalances between researchers and participants. To move towards <a href=\"https:\/\/www.media.mit.edu\/publications\/decolonial-pathways-our-manifesto-for-a-decolonizing-agenda-in-hci-research-and-design\/\" target=\"_blank\" rel=\"noreferrer noopener\">more just research practices<\/a>, surveys should be co-created with impacted communities, and qualitative methods with carefully chosen groups of participants should be adopted. For example, <a href=\"https:\/\/arxiv.org\/pdf\/2305.11844\" target=\"_blank\" rel=\"noreferrer noopener\">one study<\/a> used focus groups with participants from three South Asian countries to co-design culturally-specific text prompts for text-to-image models and understand their experiences with the generated outputs. The study found that these models often reproduced a problematic outsider\u2019s gaze of South Asia as exotic, impoverished, and homogeneous. <a href=\"https:\/\/facctconference.org\/static\/papers24\/facct24-108.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Another study<\/a> involved professional comedians in focus groups to evaluate the outputs of language models for comedy writing, focusing on issues of bias, stereotypes, and cultural appropriation. Additionally, at a <a href=\"https:\/\/facctconference.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">FAccT conference<\/a> <a href=\"https:\/\/youtu.be\/UySPgihj70E\" target=\"_blank\" rel=\"noreferrer noopener\">tutorial<\/a>, <a href=\"https:\/\/washingtonmonthly.com\/2017\/06\/11\/code-of-silence\/\" target=\"_blank\" rel=\"noreferrer noopener\">Glenn Rodriguez<\/a>, who was formerly incarcerated, used storytelling to illuminate how an input question to the COMPAS recidivism tool \u2013\u2013 which asks an evaluator if the person appears to have &#8220;notable disciplinary issues\u201d \u2013\u2013 could result in the difference between parole release and parole denied.<\/p>\n\n\n\n<p>When gathering community input through the co-design and social science methods discussed above, it is important to conduct a literature review beforehand to understand the histories and structures of harm experienced by affected communities. This desk research helps reduce misunderstandings and enables informed community engagement.<\/p>\n\n\n\n<p><strong>2. 
**2. Engage with users, impacted communities, and entities with power to evaluate human-AI interactions**

To evaluate the interactions between people and an AI system, it is important to engage with the users of the system, the communities affected by it, and the entities that hold significant influence over its design and deployment.

First, researchers and practitioners can examine how **users** interact with the AI system in practice and how the system shapes their behavior or decisions. In the case of police use of facial recognition technology, a qualitative assessment could investigate whether and [how officers modify the images submitted to the technology](https://www.flawedfacedata.com/). A quantitative assessment, in contrast, might measure the accuracy of officer verifications of the technology's output when officers serve as the "human in the loop," given the risk that they may incorrectly view the technology as objective and [defer to its decisions](https://academic.oup.com/bjc/article/61/2/325/5921789).

However, it is important to recognize that the users of an AI system are not always the **communities impacted** by the system. For instance, police use of facial recognition in the U.S. often disproportionately harms Black communities, [who have been historically oversurveilled](https://www.dukeupress.edu/dark-matters). To understand this broader impact of the technology on people, [one study](https://facctconference.org/static/papers24/facct24-137.pdf) used a mixed methods approach to examine how impacted communities in Detroit perceived police surveillance technologies. Another assessment might examine the technology's impact on encounters that Black activists have with police, as seen in the case of [Derrick Ingram](https://banthescan.amnesty.org/nyc/index.html#stories), who was harassed by officers after being targeted with the technology at a Black Lives Matter protest.

Moreover, just as researchers and practitioners can uncover how communities are impacted by an AI system, they can also "reverse the gaze" by examining the **entities that hold power** over the system. In the case of facial recognition, one might examine where police deploy the technology and the decision-making processes that shape deployments. For instance, Amnesty International's [Decode Surveillance initiative](https://www.amnesty.org/en/latest/news/2021/06/scale-new-york-police-facial-recognition-revealed/) mapped the locations of CCTV cameras across New York City that can be used by the police. [Their quantitative and qualitative analysis](https://www.amnesty.org/en/documents/amr51/5205/2022/en/) revealed that areas with higher proportions of non-white residents had a higher concentration of cameras compatible with facial recognition technology.
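The quantitative core of such an analysis can be sketched simply: pair each area's demographic share with its camera density and measure the association. The figures below are hypothetical stand-ins for the kind of data the initiative crowdsourced.

```python
# Sketch of relating camera density to neighborhood demographics.
# The (non-white share, cameras per sq km) pairs are hypothetical.
import statistics

areas = [(0.20, 18), (0.35, 25), (0.50, 31), (0.65, 44), (0.80, 52)]

nonwhite_share = [share for share, _ in areas]
camera_density = [cams for _, cams in areas]

# Pearson correlation; statistics.correlation requires Python 3.10+.
r = statistics.correlation(nonwhite_share, camera_density)
print(f"non-white share vs. camera density: r = {r:.2f}")
```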
<a href=\"https:\/\/www.amnesty.org\/en\/documents\/amr51\/5205\/2022\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">Their quantitative and qualitative analysis<\/a> revealed that areas with higher proportions of non-white residents had a higher concentration of cameras compatible with facial recognition technology.<\/p>\n\n\n\n<p><strong><em>Methods for holistic assessments of human-AI interactions<\/em><\/strong><\/p>\n\n\n\n<p><strong>Human-computer interaction (HCI) methods like surveys, workshops, interviews, ethnography, focus groups, <\/strong><a href=\"https:\/\/www.userinterviews.com\/ux-research-field-guide-chapter\/diary-studies\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>diary studies<\/strong><\/a><strong>, user research, usability testing, <\/strong><a href=\"https:\/\/www.routledge.com\/Routledge-International-Handbook-of-Participatory-Design\/Simonsen-Robertson\/p\/book\/9780415720212\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>participatory design<\/strong><\/a><strong>, and behavioral experiments<\/strong> can be used to engage with users of an AI system, impacted communities, and the entities shaping the system. For example, <a href=\"https:\/\/academic.oup.com\/socpro\/article-abstract\/68\/3\/608\/5782114\" target=\"_blank\" rel=\"noreferrer noopener\">one study<\/a> conducted an ethnography to examine how users \u2013\u2013 specifically, judges, prosecutors, and pretrial and probation officers \u2013\u2013&nbsp;employed risk scores from predictive algorithms to make decisions. <a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3290605.3300271\" target=\"_blank\" rel=\"noreferrer noopener\">Another study<\/a> assessed child welfare service algorithms using interactive workshops with front-line service providers and families affected by these algorithms. <a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3531146.3533237\" target=\"_blank\" rel=\"noreferrer noopener\">Still another study<\/a> conducted interviews with financially-stressed users of instant loan platforms in India to investigate power dynamics between users and platforms, possibly influencing <a href=\"https:\/\/techcrunch.com\/2023\/04\/05\/google-personal-loan-apps-update\/\" target=\"_blank\" rel=\"noreferrer noopener\">Google to improve data privacy <\/a>for personal loan apps on its Play Store.<\/p>\n\n\n\n<p><strong>Investigative journalism methods<\/strong> like interviews, content and document analysis, and behind-the-scenes conversations with powerful stakeholders are valuable for examining how entities influence or deploy an AI system. When an AI system operates as an <a href=\"https:\/\/interaktiv.br.de\/paper\/AI-Automation-Lab_Blackbox-Reporting_EN.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">opaque black box<\/a>, legal channels like personal data requests under the California Consumer Privacy Act or public records requests under the Freedom of Information Act can enable access to relevant information about how powerful stakeholders shape and use the system. 
For example, a public-service radio organization in Germany [analyzed whether a food delivery company improperly monitored its riders](https://web.archive.org/web/20210604075802/https://www.br.de/nachrichten/deutschland-welt/lieferando-neue-belege-fuer-fahrer-ueberwachung,SXxaLu1) by creating an opportunity for riders to request the data the company tracks about them under the European General Data Protection Regulation and then share it with the organization for analysis. Researchers at the Minderoo Centre for Technology and Democracy used freedom of information requests and document analysis to [examine how UK police design and deploy facial recognition technology](https://dl.acm.org/doi/pdf/10.1145/3593013.3594084). While most commonly used by third-party researchers, these methods are not limited to external actors; internal practitioners working on AI ethics and governance can use similar methods to assess how research and product teams design AI systems before they launch.

**3. Evaluate AI's impact on social systems and people's rights with specific objectives to enable accountability**

Assessing the impact of an AI system requires considering not only how different groups of people interact with it but also its role within broader social and legal contexts. Important values such as privacy and equity are embedded in legal systems, and evaluating a technology's impact on **people's rights** can support advocacy and policy efforts. In the case of facial recognition, one might qualitatively examine the technology's impact on the rights to free expression, data protection, and non-discrimination, such as the protections codified in the First Amendment, the California Consumer Privacy Act, and the Civil Rights Act in the U.S.

It is also important to consider the impact of AI on **social systems** –– mass media, the environment, labor markets, political parties, educational institutions, and the criminal legal system –– as well as its effects on social dynamics like public trust, cultural norms, and human creativity. In the context of facial recognition, for example, a broader assessment might examine how [community safety](https://www.odbproject.org/wp-content/uploads/2019/03/ODB_DDP_HighRes_Single.pdf) and [public trust in institutions](https://academic.oup.com/bjc/article-abstract/60/6/1502/5843315) are affected when the technology is adopted more widely, not only by the criminal legal system but also by schools, airports, and businesses.

For such assessments of broader impacts to be effective and to support holding AI actors accountable, they should be **designed with specific objectives and outcomes** in mind. For example, an assessment of facial recognition might focus on its impact on the right to free expression and target entities that shape governance around the technology, such as the [U.S. Government Accountability Office](https://www.gao.gov/products/gao-21-518).
Moreover, the assessment should aim for a concrete outcome, like determining whether the use of facial recognition meets specific legal standards, rather than producing a broad, open-ended list of legal concerns. Although specificity is often associated with technical evaluations, [research has found](https://arxiv.org/pdf/2401.14462) that when evaluations of broader impacts are made specific, they can prompt stakeholders to take action, help advocates cite concrete evidence, and enable more precise and actionable policy demands.

When an assessment is made specific, it is important to prioritize the most relevant systemic impacts. For example, one might focus on facial recognition's effect on free expression, since surveillance can significantly inhibit political dissent, which is vital for social justice movements. To operationalize this investigation, one could evaluate how the presence of the technology at protests affects activists' participation, or how the application of the technology online affects [the use of social media for activism](https://docs.rwu.edu/cgi/viewcontent.cgi?article=1790&context=rwu_LR).

***Methods for considering broader societal impacts in assessments***

**Social science research methods like surveys, forecasts, interviews, experiments, and simulations** can be used to evaluate the impact of AI on social systems and dynamics. For example, [one study](https://www.sciencedirect.com/science/article/pii/S0747563216301601?casa_token=8_8zQ6iF5qEAAAAA:phhY5dYLY_nuwDT9B5kol2up-i8zXb3n8ETPrnU48kvgMnIA_-ENyCqHmVRdBabaxd8FU83yJA) analyzed the chilling effect of peer-to-peer surveillance on Facebook through an experiment and interviews. [Another assessment](https://facctconference.org/static/papers24/facct24-135.pdf) used simulation to examine how predictive algorithms used in the distribution of social goods affect long-term unemployment in Switzerland. To understand the [environmental impact of AI systems](https://www.schroders.com/en-gb/uk/intermediary/insights/ai-revolution-what-s-the-environmental-impact-/), one study [estimated the carbon footprint of BLOOM](https://dl.acm.org/doi/10.5555/3648699.3648952), a 176-billion-parameter language model, across its lifecycle, while another argued that [assessments should focus on a specific physical geography](https://arxiv.org/abs/2305.05733) to highlight impacts on local communities and shape local actions that can advance global sustainability and environmental justice.

**Legal analysis** is a useful method for assessing the legal compliance of an AI system's design and usage. This method involves examining how the AI system may infringe upon rights by reviewing relevant case law, legislation, and regulations.
For example, <a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3366423.3380109\" target=\"_blank\" rel=\"noreferrer noopener\">one audit<\/a> evaluated Facebook\u2019s ad delivery algorithm for compliance with Brazilian election laws around political advertising. <a href=\"https:\/\/repository.essex.ac.uk\/24946\/1\/London-Met-Police-Trial-of-Facial-Recognition-Tech-Report-2.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Another study<\/a> examined the London Metropolitan Police Service\u2019s use of facial recognition with respect to the Human Rights Act 1998, finding that the usage would likely be deemed unlawful if challenged in court.<\/p>\n\n\n\n<p><strong>Power mapping<\/strong> can be used <a href=\"https:\/\/littlesis.org\/toolkit\" target=\"_blank\" rel=\"noreferrer noopener\">to identify target entities<\/a> and design assessments that foster accountability. This method can help identify what will motivate influential individuals and institutions to take action. For example, the <a href=\"https:\/\/stoplapdspying.medium.com\/the-algorithmic-ecology-an-abolitionist-tool-for-organizing-against-algorithms-14fcbd0e64d0\" target=\"_blank\" rel=\"noreferrer noopener\">Algorithmic Ecology tool<\/a> mapped the ecosystem surrounding the predictive policing technology Predpol, outlining PredPol\u2019s impact on communities and identifying key actors across sectors who have shaped the technology. The Algorithmic Ecology tool has been crucial for understanding the extent of PredPol\u2019s harms, challenging its use, and offering a framework that can be applied to other technologies.<\/p>\n\n\n\n<p><strong>Not All Assessments Are Created Equal<\/strong><\/p>\n\n\n\n<p>We discuss a range of approaches to assess the impacts of a given AI system \u2013\u2013 at the technical system layer, the human interaction layer, and the systemic impact layer. However, efforts across these layers may not necessarily carry equal weight in every context, and researchers and practitioners should prioritize certain layers based on the specific AI system being assessed. The greater the system\u2019s potential to affect people\u2019s rights, the more critical it is to consider its impact on users, communities, and society at large.&nbsp;<\/p>\n\n\n\n<p>For example, an assessment of police use of facial recognition should center its significant impact of oversurveilling and overpolicing communities of color, rather than focusing narrowly on its performance on communities of color, which can result in technical improvements that <a href=\"https:\/\/www.amnesty.org\/en\/documents\/doc10\/4254\/2021\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">perfect it as a tool of surveillance<\/a>. In contrast, an assessment of a voice assistant like Siri, which may pose a lower immediate risk, could initially focus on the technical system. Yet, the social dimensions are still crucial to consider at this layer. 
For instance, informed by an understanding of the dominance and enforcement of standardized American English, practitioners might explore [how a voice assistant performs on African American Vernacular English](https://www.pnas.org/doi/full/10.1073/pnas.1915768117) and whether it [excludes or misunderstands Black American speakers](https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2021.725911/full).
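One quantitative slice of such an exploration might compare word error rates (WER) across speaker groups, as the cited research did at scale. The sketch below assumes the open-source `jiwer` package for computing WER; the transcripts are invented examples, not data from that study.

```python
# Sketch of a disaggregated speech-to-text check: compare word error
# rate (WER) across speaker groups. Transcripts are invented examples.
import jiwer

samples = {
    "AAVE speakers": {
        "reference": ["he been working on that all week"],
        "hypothesis": ["he bin walking on that old week"],
    },
    "SAE speakers": {
        "reference": ["he has been working on that all week"],
        "hypothesis": ["he has been working on that all week"],
    },
}

for group, s in samples.items():
    error_rate = jiwer.wer(s["reference"], s["hypothesis"])
    print(f"{group}: WER = {error_rate:.2f}")
```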
By [prioritizing certain kinds of assessments](https://arxiv.org/pdf/2206.04737), we can not only gain a deeper understanding of the impacts of AI technology, but also shape decisions around its design and deployment, and identify red lines where we may not want the technology to be developed or deployed in the first place. Additionally, by [assessing AI systems that have real-world influence](https://ieeexplore.ieee.org/document/9627858), we can draw attention to their actual, everyday impacts rather than hypothetical concerns.

Our recommendations treat AI technology not merely as a technical tool, but as a system that both shapes and is shaped by people and social structures. Understanding these broader impacts requires a diverse set of methods appropriate to the specific AI system being assessed. Thus, we encourage researchers and practitioners to adopt more holistic methods, and we urge policymakers to support and incentivize these approaches in AI governance. Moreover, we hope this work fosters the development of assessments that [scrutinize systems of power and ultimately uplift the communities most impacted by AI](https://www.nature.com/articles/d41586-020-02003-2).

**Acknowledgements**

Thank you to Miranda Bogen and Ozioma Collins Oguine for valuable feedback on this blog post. We also acknowledge the Partnership on AI's [Global Task Force for Inclusive AI](https://partnershiponai.org/global-task-force-for-inclusive-ai/) Guidelines for insights on participatory approaches to understanding the impacts of AI systems.