Gender Shades: leading tech companies' commercial AI systems significantly mis-gender women and darker-skinned individuals. Researcher Joy Buolamwini initiated a systematic investigation after testing her TED speaker photo on facial analysis technology from leading companies. Some systems did not detect her face; others labeled it male. After analyzing results on 1,270 unique faces, the Gender Shades authors uncovered severe gender and skin-type bias in gender classification.
In the worst case, the failure rate on darker female faces is over one in three, for a binary task where random guessing would be correct 50 percent of the time. In the best case, one classifier achieves flawless performance on lighter males: a 0 percent error rate.
Pale Male Data: existing measures of success in AI don't reflect the global majority; we are fooling ourselves. Existing benchmark datasets overrepresent lighter-skinned men in particular and lighter-skinned individuals in general. These gender and skin-type skews led to the creation of a new Pilot Parliaments Benchmark, composed of parliamentarians from the top three African and top three European countries as ranked by gender parity in their parliaments as of May 2017.
Deploying AI in Ignorance: There is a need for inclusive AI testing and subgroup (demographic, appearance, etc.) accuracy reports. Companies do not disclose how well their AI systems perform on different subgroups, and some admit to not checking at all. Evaluation needs to be intersectional: instead of examining only male vs. female or lighter vs. darker, we also need to look at the intersections, i.e. darker females, darker males, lighter females, and lighter males. Phenotypic accuracy reporting (performance across different skin types) should also be provided where appropriate, such as in computer vision applications.
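As a rough illustration of what an intersectional accuracy report involves, here is a minimal sketch in Python. The data, column names (`gender`, `skin_type`, `predicted`), and the lighter/darker split are hypothetical, not the Gender Shades data or methodology; the point is simply that aggregate accuracy can look fine while a subgroup-level breakdown exposes where a classifier fails.

```python
import pandas as pd

# Hypothetical per-image results: ground-truth gender, a binary skin-type
# group (lighter/darker), and the classifier's predicted gender.
# Values are illustrative only.
results = pd.DataFrame({
    "gender":    ["female", "female", "male",  "male", "female", "male"],
    "skin_type": ["darker", "lighter", "darker", "lighter", "darker", "darker"],
    "predicted": ["male",   "female",  "male",  "male",   "female", "male"],
})

# Aggregate accuracy alone can hide subgroup failures.
print("Overall accuracy:", (results["predicted"] == results["gender"]).mean())

# Intersectional report: error rate and sample size for each
# (skin_type, gender) subgroup.
results["error"] = results["predicted"] != results["gender"]
report = (
    results.groupby(["skin_type", "gender"])["error"]
    .agg(error_rate="mean", n="size")
)
print(report)
```

Reporting the sample size alongside each subgroup's error rate matters as well, since a benchmark that contains very few darker female faces cannot meaningfully certify performance on that group.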