Algorithmic auditing has emerged as a key strategy to expose systematic biases embedded in software platforms, yet scholarship on the impact of algorithmic audits on increasing algorithmic fairness and transparency in commercial systems is nascent. To analyze the impact of publicly naming and disclosing performance results of biased AI systems, we investigate the commercial impact of Gender Shades, the first algorithmic audit of gender and skin type performance disparities in commercial facial analysis models. This paper 1) outlines the audit design and structured disclosure procedure used in the Gender Shades study; 2) presents new performance metrics from targeted companies IBM, Microsoft, and Megvii(Face++) on the Pilot Parliaments Benchmark (PPB) as of August 2018; 3) provides performance results on PPB by non-target companies Amazon and Kairos; and 4) explores differences in company responses as shared through corporate communications that contextualize differences in performance on PPB. Within seven months of the original audit, we find that all three targets released new API versions.
All targets reduced accuracy disparities between males and females and darker and lighter-skinned subgroups, with the most significant update occurring for the darker-skinned female subgroup that underwent a 17.7% - 30.4% reduction in error between audit periods. Minimizing these disparities led to a 5.72% to 8.3% reduction in overall error on the Pilot Parliaments Benchmark (PPB) for target corporation APIs. The overall performance of non-targets Amazon and Kairos lags significantly behind that of the targets, with error rates of 8.66% and 6.60% overall, and error rates of 31.37% and 22.50% for the darker female subgroup, respectively.
While algorithmic fairness may be approximated through reductions in subgroup error rates or other performance metrics, algorithmic justice necessitates a transformation in the development, deployment, oversight, and regulation of facial analysis technology. Consequently, the potential for weaponization and abuse of facial analysis technologies cannot be ignored, nor the threats to privacy or breaches of civil liberties diminished even as accuracy disparities decrease. More extensive explorations of policy, corporate practice, and ethical guidelines are thus needed to ensure vulnerable and marginalized populations are protected and not harmed as this technology evolves.