Bias In, Bias Out


How artificial intelligence may unintentionally amplify health disparities


The development of machine learning and artificial intelligence (AI) algorithms has allowed computers to process large, complex datasets and make predictions based on existing patterns. These technological advances could automate the diagnosis of skin cancer through photos, recommend medication based on a patient's full medical history and current symptoms, or even categorize mental disorders based on complex patterns of neural activity rather than a list of psychological symptoms.

Unfortunately, because predictions made by AI systems are based on what humans have decided about similar data in the past, AI can amplify errors and bias embedded in the data that feeds into these algorithms.

Inequality, bias, and discrimination can all be codified and amplified by AI if we aren't careful. Due to systemic oppression, healthcare data can be rife with problems even when scientists are careful to avoid discrimination and bias.


When doing an experiment to test whether a drug works, everything except the drug should be kept as similar as possible, including the research subjects.

In laboratories that work with mice, the gold standard is animals inbred for 10+ generations to create the most homogeneous population possible.


Laboratories at universities studying the brain or body often recruit college students, which results in a nicely homogeneous population.

Researchers in hospitals might limit the age range of study participants, and participants usually need to be able to commit to regular visits with their own transportation, taking days off work if necessary. Despite no ill intentions, the participants in a study are likely to have more resources than the people who can't participate, and may not be very diverse.


Because science has historically been conducted by Western men, there has been a pattern of assuming that men represent the norm. This phenomenon is called "male default" and has led to problems for women and children. Many models calculating the danger of radiation exposure are based on "Reference Man", a 1974 assumption that the standard person is 25 to 30 years old, weighs 154 pounds, is 5 feet 6 inches tall, and is "Caucasian with a Western European or North American" lifestyle. This has resulted in miscalculations for people who deviate from that default assumption. Women exposed to radiation in childhood are 10 times more likely to develop cancer than predictions based on "Reference Man" suggest. Women are 17% more likely than men to be killed in a car crash and 73% more likely to be injured in a frontal crash, likely because automobile safety standards are based almost entirely on crash test dummies designed to represent men. To this day, there are no crash test dummies that accurately represent the anatomy of a female body. The average height of a woman or child falls outside the norm that a crash test dummy represents, resulting in increased danger that could be prevented.


The history of medical malpractice against marginalized people, particularly Black and Indigenous patients, has also led to immense mistrust of medical research. Even today, the inclusion of diverse study participants is often an afterthought. Electroencephalography (EEG), one of the most common techniques for measuring brain activity, was developed nearly 100 years ago, yet it has long been regarded as incompatible with thick, curly hair.

Only in recent years have advances been made - largely by Black students motivated to bridge this egregious gap.



When looking at the results of an experiment, researchers expect to see a "normal" distribution with most people responding near the average.

Data at the tail ends of the distribution are deemed "deviant".

If the data is collected from a group with limited diversity, the range considered within a standard amount of deviation may be very narrow. Data from groups that are not well-represented in clinical studies may look "abnormal" simply because they were not included when "normal" was defined.
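This effect can be sketched with a toy example (all values hypothetical): a reference range built from a narrow, homogeneous sample flags values from outside that sample as outliers, even if those values are perfectly healthy in the broader population.

```python
import statistics

# Hypothetical lab measurements from a narrow, homogeneous study sample.
study_sample = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0, 4.9, 5.1]
mean = statistics.mean(study_sample)   # 5.0
std = statistics.stdev(study_sample)   # ~0.13

def is_flagged_abnormal(value, mean, std, z_cutoff=2.0):
    """Flag values more than z_cutoff standard deviations from the sample mean."""
    return abs(value - mean) / std > z_cutoff

# A value common in an underrepresented group, but absent from the sample,
# lands far outside the narrow reference range and gets labeled "deviant".
print(is_flagged_abnormal(5.8, mean, std))  # True
# A value near the sample's own average is considered "normal".
print(is_flagged_abnormal(5.0, mean, std))  # False
```

Had the sample included people like the 5.8 patient, the standard deviation would widen and that value would no longer look abnormal: "normal" depends entirely on who was sampled.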


As we enter an era of personalized medicine driven by an abundance of knowledge and data, artificial intelligence algorithms are being trained to predict the best treatments for individual patients.

But what about the people who have traits that are outside what researchers have deemed the norm? Whose data will be considered deviant due to lack of representation?


On top of problems with lack of representation in clinical studies, AI trained on observations of healthcare data can amplify health disparities too. One recent example was the use of an AI to calculate a "risk score" that would allow patients with the highest scores to access services like automatic prescription refills and home nurse visits. This algorithm relied on the amount of money that patients spent on their healthcare to calculate the score.

Assumption: Higher health costs = higher health needs

However, people who make less money are likely to spend less on healthcare. Predictions made by AI systems that miss this context will assume poor people are healthier than they really are. This is exactly what happened: poor Black patients were assigned lower risk scores than they should have received, causing them to miss out on the additional resources.

The assumption that spending was an appropriate measure for health status ended up biasing the algorithm against poor and Black patients, even when there was no intention to cause harm.
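A minimal sketch of that flawed proxy, with entirely hypothetical numbers, scoring rule, and threshold: two patients have identical health needs, but the spending-based score only qualifies the one who could afford more care.

```python
# Two hypothetical patients with identical health needs but different incomes.
patients = [
    {"name": "A", "true_health_need": 8, "annual_spending": 9000},
    {"name": "B", "true_health_need": 8, "annual_spending": 3000},
]

def risk_score_from_spending(spending, max_spending=10_000):
    # The flawed assumption: higher health costs = higher health needs.
    return 10 * spending / max_spending

for p in patients:
    p["risk_score"] = risk_score_from_spending(p["annual_spending"])

# Suppose only scores of 7+ unlock extra services like home nurse visits.
eligible = [p["name"] for p in patients if p["risk_score"] >= 7]
print(eligible)  # ['A'] (B has the same needs but is screened out)
```

Income never appears in the code, yet it drives the outcome, which is how a "race-blind" and "income-blind" algorithm can still discriminate through its proxy variable.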


Artificial intelligence algorithms rely on pre-existing data to make associations and predictions. If the bulk of existing data comes from people in Western, Educated, Industrialized, Rich, and Democratic countries, what we have is WEIRD data.

Predictions from models trained on WEIRD data are unlikely to generalize to all of humankind.


A major concern with relying on AI-generated predictions is the potential for reinforcing cycles of oppression. For example, we could train an algorithm to predict the likelihood that a child has disease "X" before referring them to a costly diagnostic process. This could prevent healthy children from undergoing unnecessary diagnostic tests and flag children who are at high risk to participate in testing. Sounds awesome, right?? But suppose disease "X" primarily affects people with ancestry in a specific region of Europe, and 95% of diagnosed patients self-identify as white.

The AI categorizes most patients who self-identify as non-white as low-risk and recommends against further testing. This may cause patients of color to be underdiagnosed, leading to further underrepresentation in the clinical data, which reinforces the cycle of exclusion.

This example, while fictional, mirrors the racial disparities in Cystic Fibrosis, where some patients were assumed not to have the disease solely based on the color of their skin. However, having dark skin does not necessarily rule out European ancestry. This time around, instead of racism from doctors, it's biased or incomplete data that can cause the disparities.
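The reinforcing cycle can be simulated with a toy feedback loop (all numbers hypothetical): each year the same number of true cases occurs in each group, but confirmed diagnoses scale with each group's share of past cases in the training data, so the underrepresented group keeps being missed.

```python
# Hypothetical starting counts of confirmed diagnoses in the training data.
training_data = {"white": 95, "non_white": 5}

for year in range(1, 4):
    total = sum(training_data.values())
    for group in training_data:
        share = training_data[group] / total
        # 10 true cases per group each year, but confirmed diagnoses
        # scale with the group's share of past cases in the data.
        confirmed = round(10 * share)
        training_data[group] += confirmed
    print(year, training_data)

# The non-white group's true cases are never confirmed, so its share of
# the training data shrinks every year and the bias compounds.
```

After three simulated years the white count grows while the non-white count stays frozen, so the model's next round of training is even more skewed than the data it started from.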


Some ways to prevent AI from reinforcing healthcare disparities:

- Mandate and fund clinical research on diverse study populations.
- Regain trust from disenfranchised groups to encourage participation in clinical studies.
- Hire people from marginalized groups to design and develop AI tools.
- Protect whistleblowers who expose inequitable practices.
- Increase access to high-quality healthcare for all people.
- Facilitate the sharing of data collected by private companies with researchers in a safe and ethical manner.
- Perform regular checks for bias and discrimination in diagnoses and predictions made by computers and humans.

What else can you think of?


Artificial intelligence opens up a world of possibilities, but if we don't acknowledge the bias that may be baked into clinical research and medical data, we may amplify the disparities that are already plaguing healthcare. There is an urgent need for intervention: for clinicians and researchers to imagine a world where all patients are treated equitably and to build models with this vision in mind. As they say: garbage in, garbage out. And if we don't clean up the garbage, we'll simply make more.

[Illustration: Training Data -> AI Output]


Sources:

- You Look Like a Thing and I Love You: How Artificial Intelligence Works and Why It's Making the World a Weirder Place by Janelle Shane
- https://www.passblue.com/2017/07/05/females-exposed-to-nuclear-radiation-are-far-likelier-than-males-to-suffer-harm/
- https://www.wired.com/story/how-algorithm-favored-whites-over-blacks-health-care/
- https://www.scientificamerican.com/article/health-care-ai-systems-are-biased/
- https://massivesci.com/articles/racial-bias-eeg-electrodes-research/
- https://massivesci.com/articles/ai-medicine-racial-bias-covid19/
- https://thehill.com/blogs/pundits-blog/healthcare/347780-black-americans-dont-have-trust-in-our-healthcare-system
- https://www.sciencemag.org/news/2019/07/western-mind-too-weird-study
- https://www.consumerreports.org/car-safety/crash-test-bias-how-male-focused-testing-puts-female-drivers-at-risk/


Created by Christine Liu for the Resistance AI workshop at NeurIPS 2020. Special thanks to reviewers for important feedback.