Total: 1
Machine learning algorithms are increasingly involved in sensitive decision-making processes with adversarial implications on individuals. This paper presents a new tool, mdfa that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier widely used in the United States and demonstrate that for individuals with little criminal history, identified African-Americans are three-times more likely to be considered at high risk of violent recidivism than similar non-African-Americans.