Supposedly ‘fair’ algorithms can perpetuate discrimination


Minnesota Historical Society/Corbis/Getty Images

By Joi Ito

During the Long Hot Summer of 1967, race riots erupted across the United States. The 159 riots (or rebellions, depending on which side you took) were mostly clashes between the police and African Americans living in poor urban neighborhoods. The disrepair of these neighborhoods before the riots began and the difficulty of repairing them afterward were attributed to something called redlining, an insurance-company term for drawing a red line on a map around parts of a city deemed too risky to insure.

In an attempt to improve recovery from the riots and to address the role redlining may have played in them, President Lyndon Johnson created the President’s National Advisory Panel on Insurance in Riot-Affected Areas in 1968. The panel’s report showed that once a minority community had been redlined, the red line established a feedback cycle that continued to drive inequity and deprive poor neighborhoods of financing and insurance coverage; redlining had compounded the poor economic conditions that already affected these areas. There was a great deal of evidence at the time that insurance companies were engaging in overtly discriminatory practices, including redlining, when selling insurance to racial minorities. Would-be home and business owners were unable to get loans because financial institutions require insurance when making loans. Even before the riots, people there couldn’t buy or build or improve or repair because they couldn’t get financing.

Because of the panel’s report, laws were enacted outlawing redlining and creating incentives for insurance companies to invest in developing inner-city neighborhoods. But redlining continued. To justify their discriminatory pricing or their refusal to sell insurance in urban centers, insurance companies developed sophisticated arguments about the statistical risks that certain neighborhoods presented.

The argument insurers used back then, that their job was purely technical and that it didn’t involve moral judgments, is very reminiscent of the arguments made by some social network platforms today: that they are technical platforms running algorithms and should not be, and are not, involved in judging the content. Insurers argued that their job was to adhere to technical, mathematical, and market-based notions of fairness and accuracy and provide what was viewed, and is still viewed, as one of the most essential financial components of society. They argued that they were just doing their jobs. Second-order effects on society were really not their problem or their business.

Thus began the contentious career of the notion of “actuarial fairness,” an idea that would spread in time far beyond the insurance industry into policing and paroling, education, and eventually AI, igniting fierce debates along the way over the push by our increasingly market-oriented society to define fairness in statistical and individualistic terms rather than relying on the morals and community standards used historically.

Risk spreading has been a central tenet of insurance for centuries. Risk classification has a shorter history. The notion of risk spreading is the idea that a community such as a church or village could pool its resources to help individuals when something unfortunate happened, spreading risk across the group—the principle of solidarity. Modern insurance began to assign a level of risk to an individual so that others in the pool with her had roughly the same level of risk—an individualistic approach. This approach protected individuals from carrying the expense of someone with a more risk-prone and costly profile. This individualistic approach became more prevalent after World War II, when the war on communism made anything that sounded too socialist unpopular. It also helped insurance companies compete in the market. By refining their risk classifications, companies could attract what they called “good risks.” This saved them money on claims and forced competitors to take on more expensive-to-insure “bad risks.”
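The difference between the two approaches can be made concrete with a little arithmetic. What follows is a minimal sketch, not a description of how any real insurer prices policies: the six pool members, their class labels "A" and "B", and their expected annual losses are all invented for illustration, and the function names are hypothetical.

```python
# Hypothetical pool: (risk class, expected annual loss). All numbers are invented.
members = [("A", 90), ("A", 110), ("A", 100), ("A", 100), ("B", 450), ("B", 550)]

def pooled_premium(pool):
    """Risk spreading: every member pays an equal share of the pool's expected loss."""
    return sum(loss for _, loss in pool) / len(pool)

def classified_premiums(pool):
    """Risk classification: each member pays the average expected loss of their class."""
    by_class = {}
    for risk_class, loss in pool:
        by_class.setdefault(risk_class, []).append(loss)
    return {c: sum(losses) / len(losses) for c, losses in by_class.items()}

print(f"solidarity premium for everyone: {pooled_premium(members):.2f}")
for risk_class, premium in sorted(classified_premiums(members).items()):
    print(f"class {risk_class} premium: {premium:.2f}")
```

Under these assumed numbers the pool collects the same total either way. Classification only changes who pays it: class A pays far less than the shared premium and class B pays far more, which is why an insurer that classifies more finely can lure the "good risks" away from one that does not.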

(A research colleague of mine, Rodrigo Ochigame, who focuses on algorithmic fairness and actuarial politics, directed me to historian Caley Horan, who is working on an upcoming book titled Insurance Era: The Privatization of Security and Governance in the Postwar United States that will elaborate on many of the ideas in this article, which is based on her research.)

The original idea of risk spreading and the principle of solidarity was based on the notion that sharing risk bound people together, encouraging a spirit of mutual aid and interdependence. By the final decades of the 20th century, however, this vision had given way to the so-called actuarial fairness promoted by insurance companies to justify discrimination.

While discrimination was initially based on outright racist ideas and unfair stereotypes, insurance companies evolved and developed sophisticated-seeming calculations to show that their discrimination was “fair.” By this logic, women should pay more for annuities because statistically they lived longer, and blacks should pay more for damage insurance when they lived in communities where crime and riots were likely to occur. While overt racism and bigotry still exist across American society, in insurance they have been absorbed into mathematics and statistics, hidden from the public behind calculations so difficult for nonexperts to understand that fighting back becomes nearly impossible.

By the late 1970s, women’s activists had joined civil rights groups in challenging insurance redlining and risk-rating practices. These new insurance critics argued that the use of gender in insurance risk classification was a form of sex discrimination. Once again, insurers responded to these charges with statistics and mathematical models. Using gender to determine risk classification, they claimed, was fair; the statistics they used showed a strong correlation between gender and the outcomes they insured against.

And many critics of insurance inadvertently bought into the actuarial fairness argument. Civil rights and feminist activists in the late 20th century lost their battles with the insurance industry because they insisted on arguing about the accuracy of certain statistics or the validity of certain classifications rather than questioning whether actuarial fairness—an individualistic notion of market-driven pricing fairness—was a valid way of structuring a crucial and fundamental social institution like insurance in the first place.

But fairness and accuracy are not necessarily the same thing. For example, when Julia Angwin pointed out in her ProPublica report that risk scores used by the criminal justice system were biased against people of color, the company that sold the algorithmic risk score system argued that its scores were fair because they were accurate. The scores accurately predicted that people of color were more likely to reoffend. This likelihood, called the recidivism rate, is the chance that someone commits another crime after being released, and it is calculated primarily from arrest data. But this correlation contributes to discrimination, because using arrests as a proxy for recommitting a crime means the algorithm is codifying biases in arrests, such as police officers’ bias toward arresting more people of color or patrolling more heavily in poor neighborhoods. This risk of recidivism is used to set bail and determine sentencing and parole, and it informs predictive policing systems that direct police to neighborhoods likely to have more crime.

There are two obvious problems with this. First, if you believe the risk scores are accurate in predicting the future outcomes of a certain group of people, then it is “fair” that a person is more likely to spend more time in jail simply because they are black. This is actuarially “fair” but clearly not “fair” from a social, moral, or anti-discrimination perspective.

The other problem is that there are fewer arrests in rich neighborhoods, not because people there aren’t smoking as much pot as in poor neighborhoods but because there is less policing. Obviously, one is more likely to be rearrested if one lives in an overpoliced neighborhood, and that creates a feedback loop—more arrests mean higher recidivism rates. In very much the same way that redlining in minority neighborhoods created a self-fulfilling prophecy of uninsurable communities, overpolicing and predictive policing may be “fair” and “accurate” in the short term, but the long-term effects on communities have been shown to be negative, creating self-fulfilling prophecies of poor, crime-ridden neighborhoods.
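The feedback loop described above can be sketched as a toy simulation. This is a deliberately simplified illustration under loudly stated assumptions, not a model of any real predictive policing system: two hypothetical neighborhoods, A and B, have exactly the same number of true offenses, arrests are assumed to scale with patrol presence, and each round's patrols are allocated in proportion to the previous round's arrests. The function name and every parameter are invented.

```python
def simulate_patrol_feedback(initial_share_a=0.7, rounds=5,
                             true_offenses=100, max_detection=0.5):
    """Two hypothetical neighborhoods, A and B, with the SAME true offense count.

    Assumption: arrests scale with patrol share, and the next round's patrols
    are allocated in proportion to the previous round's recorded arrests.
    """
    share_a = initial_share_a
    history = []
    for _ in range(rounds):
        arrests_a = true_offenses * max_detection * share_a
        arrests_b = true_offenses * max_detection * (1 - share_a)
        history.append((share_a, arrests_a, arrests_b))
        # "Data-driven" reallocation: send patrols where arrests were recorded.
        share_a = arrests_a / (arrests_a + arrests_b)
    return history

for share_a, arrests_a, arrests_b in simulate_patrol_feedback():
    print(f"patrol share A={share_a:.2f}  arrests A={arrests_a:.1f}  arrests B={arrests_b:.1f}")
```

In this sketch the arrest record never corrects the initial imbalance. Neighborhood A keeps producing more arrests only because it started with more patrols, so the data keeps appearing to justify the allocation that generated it, even though the underlying offense counts are identical by construction.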

Angwin also showed in a recent ProPublica report that, despite regulations, insurance companies charge minority communities higher premiums than white communities, even when the risks are the same. The Spotlight team at The Boston Globe reported that the household median net worth in the Boston area was $247,500 for whites and $8 for nonimmigrant blacks—the result of redlining and unfair access to housing and financial services. So while redlining for insurance is not legal, when Amazon decides to provide Amazon Prime free same-day shipping to its “best” customers, it’s effectively redlining—reinforcing the unfairness of the past in new and increasingly algorithmic ways.

Like the insurers, large tech firms and the computer science community also tend to frame “fairness” in a depoliticized, highly technical way involving only mathematics and code, which reinforces a circular logic. AI is trained to use the outcomes of discriminatory practices, like recidivism rates, to justify continuing practices such as incarceration or overpolicing that may contribute to the underlying causes of crime, such as poverty, difficulty getting jobs, or lack of education. We must create a system that requires long-term public accountability and understandability of the effects on society of policies developed using machines. The system should help us understand, rather than obscure, the impact of algorithms on society. We must provide a mechanism for civil society to be informed and engaged in the way in which algorithms are used, optimizations set, and data collected and interpreted.

The computer scientists of today are more sophisticated in many ways than the actuaries of yore, and they often are sincerely trying to build algorithms that are fair. The new literature on algorithmic fairness usually doesn’t simply equate fairness with accuracy, but instead defines various trade-offs between fairness and accuracy. The problem is that fairness cannot be reduced to a simple self-contained mathematical definition: fairness is dynamic and social, not a statistical issue. It can never be fully achieved and must be constantly audited, adapted, and debated in a democracy. By merely relying on historical data and current definitions of fairness, we will lock in the accumulated unfairnesses of the past, and our algorithms and the products they support will always trail the norms, reflecting past norms rather than future ideals and slowing social progress rather than supporting it.
