
PERSPECTIVE - HYPOTHESIS

Implications of predicting race variables from medical images

AI-predicted race variables pose risks and opportunities for studying health disparities

By James Zou1, Judy Wawira Gichoya2, Daniel E. Ho3, Ziad Obermeyer4

1Department of Biomedical Data Science, Stanford University, Stanford, CA, USA. 2Department of Radiology, Emory University, Atlanta, GA, USA. 3Stanford Law School, Stanford University, Stanford, CA, USA. 4School of Public Health, University of California, Berkeley, Berkeley, CA, USA. Email: jamesz@stanford.edu

There are now more than 500 US Food and Drug Administration (FDA)–approved medical artificial intelligence (AI) devices, and AI is used in diverse medical tasks such as assessing the risk of heart failure and diagnosing cancer from images (1). Beyond predicting standard diagnoses, AI models can infer a substantial number of patient features from medical images in ways that humans cannot. For example, several studies have demonstrated that some AI models can infer race variables (crude, simplistic categories) directly from medical images such as chest x-rays and cardiac ultrasounds, even though there are no known human-readable race correlates in these images (2–4). This has raised concerns about the possibility that AI systems could discriminate. At the same time, AI predictions of demographic attributes, including race variables, also have the potential to help assess and monitor health care disparities and generate new insights into risk factors (5).

Race is a complex and problematic social construct that does not have a biological basis (6). The concept of race varies across time and geography and according to social factors. Demographic-prediction AI models can impute (or infer) race variables that are captured in electronic health records. These are simplistic and dated reporting categories, such as white, Black, and Asian. Despite such limitations, measures of health care equity and algorithmic fairness often depend on stratifying outcomes by race variables.
How AI predicts race variables is puzzling. It is likely that other demographic attributes, comorbidities, or imaging artifacts are at play. For example, AI can predict patients' age and sex from cardiac ultrasound videos because heartbeat patterns vary with age and sex (3). Therefore, if patients of one race skew older or younger in the training dataset, then the model can use this correlation to predict race. Measured confounders (variables that may cause spurious association) that could lead to such correlations can be adjusted for directly. But in a study of chest x-rays, measured potential confounders were balanced across the dataset, and a deep learning model was still able to predict race variables from the images, achieving an area under the receiver operating characteristic curve (AUROC) above 0.9 (AUROC is a common metric of prediction accuracy, where 1 is perfect) (2). This suggests that unmeasured confounders also play a role. Moreover, illnesses can be unequally distributed across racial groups; if those illnesses have imaging manifestations, algorithms could also use them to predict race.

Could the ability of AI to predict race variables from medical images, whatever the mechanism, exacerbate health care disparities? Although diagnostic AI is not trained to directly read out race variables, it is possible that race or its correlates could be a shortcut used by AI algorithms to make disease diagnoses. There are settings where this is possible and potentially harmful. For example, suppose that patients of one race are more likely to have their images taken by type A machines and, because of differences in where the machines are deployed, most of the positive cases in the training data are collected from type B machines. The AI can learn to use machine type to predict race variables and disease status (it is less likely to predict disease for images taken from a type A machine). In deployment, the AI would then underdiagnose patients of a particular race, leading to care disparity. This raises serious questions of algorithmic discrimination, especially if the AI uses the race proxy in its assessment and clinicians are not aware of this.

Therefore, it is critical to evaluate the reliability of AI in truly independent and representative datasets in which race variables are available—for example, from patient self-reporting or linked administrative data. Ideally, this dataset would come from multiple settings or health systems to avoid overfitting. Then, equity can be measured by disaggregating the model's performance by race variables to ensure that it does not have excess errors or differential performance between groups.
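To make the disaggregation concrete, here is a minimal sketch of such an equity check, assuming scikit-learn and a held-out evaluation table with hypothetical columns y_true (labels), y_score (model outputs), and race (from self-reporting or linked administrative data). It is an illustration under those assumptions, not a prescribed protocol.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def disaggregated_auroc(df: pd.DataFrame) -> pd.Series:
    """AUROC computed separately for each race group in the evaluation set.

    Groups whose labels are all one class would raise an error here; a
    production audit would need to handle such small groups explicitly.
    """
    return df.groupby("race").apply(
        lambda g: roc_auc_score(g["y_true"], g["y_score"])
    )

# Hypothetical usage on an independent, representative evaluation set:
# df = pd.read_csv("held_out_eval.csv")
# overall = roc_auc_score(df["y_true"], df["y_score"])
# per_group = disaggregated_auroc(df)
# print(per_group[overall - per_group > 0.05])  # 0.05 is an arbitrary threshold
```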
The ability of AI to pick up hidden phenotypes, such as race variables, from medical data could also be a resource for equity. AI can generate valuable information for auditing datasets to detect biases (7). For example, if an AI model can accurately predict attributes that may not be medically relevant (such as race, or the type of machine that took the image), this suggests possible confounders in the training dataset. Maybe there are spurious correlations involving race variables in the data, or parts of the hospital have different imaging setups. In such cases, these confounders should be explicitly accounted for in subsequent analyses. Unexpected predictions by AI force these data warts into the open and could improve transparency.
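One way to run this kind of audit is to fit a simple probe that tries to predict a nominally irrelevant attribute from the model's features. The sketch below assumes scikit-learn, a binary attribute such as scanner type, and hypothetical variable names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def attribute_probe_auc(X: np.ndarray, attr: np.ndarray) -> float:
    """Cross-validated AUROC of a linear probe predicting a binary attribute.

    X: image-derived features (e.g., the diagnostic model's embeddings).
    attr: a nominally irrelevant 0/1 label such as scanner type.
    """
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, X, attr, cv=5, scoring="roc_auc").mean()

# Hypothetical usage:
# auc = attribute_probe_auc(embeddings, scanner_type)
# An AUROC far above 0.5 means the features encode the attribute, which could
# therefore act as a diagnostic shortcut and should be adjusted for.
```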
In many settings, race variables are not available, but laws in numerous countries require assessment of disparities. How can it be ensured that a model, treatment, or policy benefits a diverse population if the relevant demographic attributes are missing? Predicted race variables from an AI algorithm could be a useful, albeit imprecise, proxy. For example, if researchers develop a method to diagnose Long Covid from chest x-ray images, but they do not have race metadata for many images in their dataset, a demographic-prediction AI model could impute race variables from their images. The model's performance when stratified by race variables can then be assessed to ensure that it works well across diverse populations. There is precedent for this imputation approach: The US Centers for Medicare and Medicaid Services use race inferred from surname and location to monitor health care disparity (8), and the Consumer Financial Protection Bureau uses a similar approach to monitor equity in lending (9). The Internal Revenue Service is statutorily prohibited from using Census demographic data to assess disparities in tax administration, and the US Treasury Department has in turn relied on imputed race variables to uncover important disparities in tax expenditures and audits (10).
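As a rough illustration of how imputed race variables might feed a disparity assessment, the sketch below weights each patient's outcome by the imputed probability of group membership rather than hard-assigning categories. The column names and the demographic-prediction model are hypothetical, and the approach is only in the spirit of the probabilistic proxies cited above (8, 9), not a reimplementation of them.

```python
import pandas as pd

def proxy_group_accuracy(df: pd.DataFrame, prob_cols: list[str]) -> pd.Series:
    """Probability-weighted accuracy per imputed group.

    Weighting each patient's outcome ("correct" is 1 if the diagnosis was
    right) by the imputed probability of group membership propagates the
    proxy's uncertainty into the estimate instead of discarding it.
    """
    return pd.Series({
        col: (df["correct"] * df[col]).sum() / df[col].sum()
        for col in prob_cols
    })

# Hypothetical usage, with one predicted-probability column per category:
# acc = proxy_group_accuracy(results, ["p_black", "p_white", "p_asian"])
# Large gaps between groups warrant investigation; the imputation itself
# should first be validated against self-reported labels where available.
```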




The usage of imputed race variables in health care should focus on disparity assessment and algorithmic auditing instead of medical decision-making. The accuracy of imputations should be carefully validated in each new application. When the imputed race variables are broad categories, downstream analysis may miss disparities within subgroups of the population. Furthermore, in applications involving interpretations of DNA sequencing, other population labels such as genetic ancestry are more meaningful categories for assessment than race variables (11).

AI could help to provide more nuanced representations of race than the standard reporting categories, which are often very coarse. AI produces continuous scores or probabilities, not categories. This mirrors work in human genetics showing that it is more accurate to model populations as continuous spectrums of variation in ancestry rather than according to a few discrete race labels (12). AI models, such as neural networks, also learn to represent individuals as points in a continuous space (often called an embedding) (13). Instead of treating, for example, "Asian" as a single discrete category, an AI model that assesses chest x-rays maps images from Asian patients to a continuous geometric space before making its diagnosis prediction. Variation in this space could reflect ancestry subgroups within Asians as well as differences in health care access, comorbidities, and other unmeasured factors. More research is needed to characterize the implications for health care of differences between patient embeddings.

Because AI represents patients on a continuous spectrum rather than as discrete racial categories, it is easier to evaluate and ensure that the model makes accurate predictions across fine-grained subpopulations. Instead of measuring how well the AI model performs on Asians as one group, which can mask limitations on less populous subgroups, the model's performance can be assessed on different clusters or spectra of patients in the embedding space. Suppose the model makes more mistakes than expected in a particular data cluster; further investigation of what subpopulation is represented in this error cluster can be undertaken, and the model could be retrained to improve on this subgroup. In response to advocacy by civil rights groups, the White House recently proposed updating the coarse 1997 race reporting standard to foster data disaggregation and reporting of outcomes by subgroups (for example, Vietnamese American versus Asian American). It may take years for that process to materialize in health data, and AI imputations can potentially help to uncover health care disparities between granular subgroups in the meantime (14).
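A minimal sketch of such an embedding-space audit, assuming scikit-learn: cluster patients by embedding, compare each cluster's error rate with the overall rate, and flag outliers for inspection. The array names and the flagging threshold are illustrative choices, not part of the cited work.

```python
import numpy as np
from sklearn.cluster import KMeans

def error_by_cluster(emb: np.ndarray, errors: np.ndarray, k: int = 20):
    """Cluster patients in embedding space and compute per-cluster error rates.

    emb: (n_patients, d) array of model embeddings.
    errors: 0/1 indicator of misclassified cases on held-out data.
    """
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(emb)
    overall = errors.mean()
    rates = {c: errors[labels == c].mean() for c in range(k)}
    # Clusters with error well above the overall rate mark subpopulations that
    # deserve inspection and, possibly, targeted additions to the training set.
    flagged = {c: r for c, r in rates.items() if r > 2 * overall}
    return labels, rates, flagged
```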
Although race variables are not a generally meaningful category in medicine, the ability of AI to predict race variables from medical images could be useful for monitoring health care disparity and ensuring that algorithms work well across diverse populations. Where feasible, researchers can try to predict race and other demographic variables from their data as a mechanism to thoroughly audit their dataset and model. If race variables and other attributes can be accurately predicted, this could reveal potential data or model biases that should be addressed. Understanding what features AI mechanistically uses to predict race variables is important, and AI interpretation methods such as concept activation, saliency maps, and counterfactual explanations could help. Overall, being transparent and deliberate about race variable predictions should reduce the risk of algorithmic discrimination and could generate valuable information about the dataset, the model, and health care disparities.
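As one concrete example from these interpretation-method families, the sketch below implements simple occlusion sensitivity, a basic form of saliency mapping: mask each image patch and record how much the prediction drops. The PyTorch model and image are placeholders, and this is only one of several methods mentioned above, not the one used in the cited studies.

```python
import torch

def occlusion_saliency(model, image: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Saliency as the prediction drop when each image patch is zeroed out.

    model: any callable mapping a (1, C, H, W) batch to a scalar probability.
    image: a (C, H, W) tensor. Both are placeholders for illustration.
    """
    model.eval()
    with torch.no_grad():
        base = model(image.unsqueeze(0)).item()
        _, h, w = image.shape
        sal = torch.zeros(-(-h // patch), -(-w // patch))  # ceil division
        for i in range(0, h, patch):
            for j in range(0, w, patch):
                masked = image.clone()
                masked[:, i:i + patch, j:j + patch] = 0.0
                sal[i // patch, j // patch] = base - model(masked.unsqueeze(0)).item()
    return sal  # High values mark regions the prediction depends on.
```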
REFERENCES AND NOTES
1. E. Wu et al., Nat. Med. 27, 582 (2021).
2. J. W. Gichoya et al., Lancet Digit. Health 4, e406 (2022).
3. G. Duffy et al., NPJ Digit. Med. 5, 188 (2022).
4. J. Adleberg et al., J. Am. Coll. Radiol. 19, 1151 (2022).
5. R. Poplin et al., Nat. Biomed. Eng. 2, 158 (2018).
6. A. Deyrup, J. L. Graves Jr., N. Engl. J. Med. 386, 501 (2022).
7. W. Liang et al., Nat. Mach. Intell. 4, 669 (2022).
8. A. Fremont, J. S. Weissman, E. Hoch, M. N. Elliott, Rand Health Q. 6, 16 (2016).
9. Consumer Financial Protection Bureau, Using publicly available information to proxy for unidentified race and ethnicity (2014); https://www.consumerfinance.gov/data-research/research-reports/using-publicly-available-information-to-proxy-for-unidentified-race-and-ethnicity.
10. J.-A. Cronin, P. DeFilippes, R. Fisher, Tax Expenditures by Race and Hispanic Ethnicity: An Application of the U.S. Treasury Department's Race and Hispanic Ethnicity Imputation, Department of the Treasury Office of Tax Analysis working paper (2023).
11. Committee on the Use of Race, Ethnicity, and Ancestry as Population Descriptors in Genomics Research; Board on Health Sciences Policy; Committee on Population; Health and Medicine Division; Division of Behavioral and Social Sciences and Education; National Academies of Sciences, Engineering, and Medicine, Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field (National Academies, 2023).
12. M. Yudell, D. Roberts, R. DeSalle, S. Tishkoff, Science 351, 564 (2016).
13. Y. LeCun, Y. Bengio, G. Hinton, Nature 521, 436 (2015).
14. L. Cheng, I. Gallegos, D. Ouyang, J. Goldin, D. E. Ho, in Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (2023); 10.1145/3593013.3594034.

10.1126/science.adh4260

PERSPECTIVE

Mitigating bias in AI at the point of care

Promoting equity in AI in health care requires addressing biases at clinical implementation

By Matthew DeCamp1 and Charlotta Lindvall2,3,4

1Center for Bioethics and Humanities, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA. 2Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, MA, USA. 3Department of Informatics and Analytics, Dana-Farber Cancer Institute, Boston, MA, USA. 4Department of Medicine, Harvard Medical School, Boston, MA, USA. Email: matthew.decamp@cuanschutz.edu

Artificial intelligence (AI) shows promise for improving basic and translational science, medicine, and public health, but its success is not guaranteed. Numerous examples have arisen of racial, ethnic, gender, disability, and other biases in AI applications to health care. In ethics, bias generally refers to any systematic, unfair favoring of people in terms of how they are treated or the outcomes they experience. Consensus has emerged among scientists, ethicists, and policy-makers that minimizing bias is a shared responsibility among all involved in AI development. For example, ensuring equity by eliminating bias in AI is a core principle of the World Health Organization for governing AI (1). But ensuring equity will require more than unbiased data and algorithms. It will also require reducing biases in how clinicians and patients use AI-based algorithms—a potentially more challenging task than reducing biases in algorithms themselves.

Examination of bias in AI has tended toward removing bias from datasets, analyses, or AI development teams. For example, because of unequal recruitment and enrollment, oncology datasets demonstrate racial, ethnic, and global geographic biases (2). In another example, developers assumed that health care costs were a proxy for health care needs, but then learned that Black Americans often receive less medical care even when they have greater


