IJCAI.2020 - AI Ethics

| Total: 5

#1 WEFE: The Word Embeddings Fairness Evaluation Framework

Authors: Pablo Badilla ; Felipe Bravo-Marquez ; Jorge Pérez

Word embeddings are known to exhibit stereotypical biases towards gender, race, and religion, among other criteria. Several fairness metrics have been proposed to automatically quantify these biases. Although all metrics have a similar objective, the relationship between them is by no means clear. Two issues that prevent a clean comparison are that they operate with different inputs and that their outputs are incompatible with each other. In this paper we propose WEFE, the Word Embeddings Fairness Evaluation framework, to encapsulate, evaluate, and compare fairness metrics. Our framework takes a list of pre-trained embeddings and a set of fairness criteria, and it is based on checking the correlations between the fairness rankings induced by these criteria. We conduct a case study showing that rankings produced by existing fairness methods tend to correlate when measuring gender bias; this correlation is considerably weaker for other biases such as race or religion. We also compare the fairness rankings with an embedding benchmark, showing that there is no clear correlation between fairness and good performance in downstream tasks.
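
As an illustration of the ranking-correlation idea, the sketch below scores a handful of pre-trained embeddings under two fairness metrics and compares the induced rankings with a Spearman correlation. The embedding names and scores are hypothetical and this is not the WEFE API; it only mirrors the kind of comparison the framework performs.

```python
import numpy as np
from scipy.stats import spearmanr

embeddings = ["glove", "word2vec", "fasttext", "conceptnet"]
# Hypothetical gender-bias scores under two different fairness metrics (lower = less bias).
scores_metric_a = np.array([0.42, 0.31, 0.55, 0.12])
scores_metric_b = np.array([0.38, 0.29, 0.61, 0.10])

# Each metric induces a ranking of the embeddings (best first).
for name, scores in [("metric A", scores_metric_a), ("metric B", scores_metric_b)]:
    ranking = [embeddings[i] for i in np.argsort(scores)]
    print(f"{name} ranking: {ranking}")

# WEFE-style comparison: how strongly do the two fairness rankings correlate?
rho, pvalue = spearmanr(scores_metric_a, scores_metric_b)
print(f"Spearman correlation between rankings: {rho:.2f} (p = {pvalue:.3f})")
```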

#2 Individual Fairness Revisited: Transferring Techniques from Adversarial Robustness

Authors: Samuel Yeom ; Matt Fredrikson

We turn the definition of individual fairness on its head: rather than ascertaining the fairness of a model given a predetermined metric, we find a metric for a given model that satisfies individual fairness. This can facilitate the discussion of the fairness of a model, addressing the issue that it may be difficult to specify a suitable metric a priori. Our contributions are twofold: First, we introduce the definition of a minimal metric and characterize the behavior of models in terms of minimal metrics. Second, for more complicated models, we apply the mechanism of randomized smoothing from adversarial robustness to make them individually fair under a given weighted Lp metric. Our experiments show that adapting the minimal metrics of linear models to more complicated neural networks can lead to meaningful and interpretable fairness guarantees at little cost to utility.
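
The following sketch shows prediction smoothing via input noise in the spirit of randomized smoothing from adversarial robustness, assuming a scikit-learn-style classifier exposing `predict_proba`. The per-feature weights (standing in for a weighted Lp fairness metric) and the noise scale are illustrative; this is not the authors' exact construction.

```python
import numpy as np

def smoothed_predict(model, x, feature_weights, sigma=0.5, n_samples=1000, seed=0):
    """Average the model's class probabilities over Gaussian input noise.

    Features that weigh heavily in the fairness metric receive less noise,
    since individuals differing there are already considered dissimilar.
    `model` is assumed to expose a scikit-learn-style predict_proba.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma / feature_weights, size=(n_samples, x.shape[0]))
    avg_probs = model.predict_proba(x + noise).mean(axis=0)
    return int(avg_probs.argmax())
```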

#3 Achieving Outcome Fairness in Machine Learning Models for Social Decision Problems

Authors: Boli Fang ; Miao Jiang ; Pei-yi Cheng ; Jerry Shen ; Yi Fang

As effective complements to human judgment, artificial intelligence techniques have started to aid human decisions in complicated social decision problems across the world. Automated machine learning/deep learning (ML/DL) classification models, through quantitative modeling, have the potential to improve upon human decisions in a wide range of decision problems on social resource allocation, such as Medicaid and the Supplemental Nutrition Assistance Program (SNAP, commonly referred to as Food Stamps). However, given the limitations in ML/DL model design, these algorithms may fail to leverage various factors for decision making, resulting in improper decisions that allocate resources to individuals who may not be in the most need of such resources. In view of this issue, we propose in this paper the strategy of fairgroups, based on the legal doctrine of disparate impact, to improve fairness in prediction outcomes. Experiments on various datasets demonstrate that our fairgroup construction method effectively boosts fairness in automated decision making while maintaining high prediction accuracy.
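
The sketch below computes the disparate-impact ratio (the "four-fifths rule") that underlies the legal doctrine referenced by the fairgroup strategy; the actual fairgroup construction in the paper is more involved, and the decisions and group labels here are toy data.

```python
import numpy as np

def disparate_impact_ratio(y_pred, group):
    """Ratio of positive-decision rates: protected group vs. the rest."""
    y_pred, group = np.asarray(y_pred), np.asarray(group, dtype=bool)
    rate_protected = y_pred[group].mean()
    rate_other = y_pred[~group].mean()
    return rate_protected / rate_other

# Toy example: decisions (1 = resource granted) and protected-group membership.
# A ratio below 0.8 is commonly read as evidence of disparate impact.
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
group  = [1, 1, 1, 1, 0, 0, 0, 0]
print(f"Disparate impact ratio: {disparate_impact_ratio(y_pred, group):.2f}")
```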

#4 Relation-Based Counterfactual Explanations for Bayesian Network Classifiers

Authors: Emanuele Albini ; Antonio Rago ; Pietro Baroni ; Francesca Toni

We propose a general method for generating counterfactual explanations (CFXs) for a range of Bayesian Network Classifiers (BCs), e.g. single- or multi-label, binary or multidimensional. We focus on explanations built from relations of (critical and potential) influence between variables, indicating the reasons for classifications, rather than any probabilistic information. We show by means of a theoretical analysis of CFXs’ properties that they serve the purpose of indicating (potentially) pivotal factors in the classification process, whose absence would give rise to different classifications. We then prove empirically for various BCs that CFXs provide useful information in real world settings, e.g. when race plays a part in parole violation prediction, and show that they have inherent advantages over existing explanation methods in the literature.
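
As a rough illustration of what a pivotal factor is, the sketch below flips one binary variable at a time and reports the flips that change the predicted class. It only stands in for the role CFXs play; it is not the authors' relation-based construction over Bayesian network classifiers, and the toy classifier and variable names are hypothetical.

```python
def pivotal_factors(classify, instance):
    """Return the variables whose single flip changes the predicted class.

    `classify` maps a dict of binary variable assignments to a class label;
    `instance` is such a dict. Both are assumptions for illustration.
    """
    original = classify(instance)
    pivotal = []
    for var, value in instance.items():
        flipped = dict(instance, **{var: 1 - value})
        if classify(flipped) != original:
            pivotal.append(var)
    return pivotal

# Toy rule-based "classifier": predict violation iff prior_offenses and not employed.
toy = lambda a: int(a["prior_offenses"] and not a["employed"])
print(pivotal_factors(toy, {"prior_offenses": 1, "employed": 0, "age_over_30": 1}))
```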

#5 Metamorphic Testing and Certified Mitigation of Fairness Violations in NLP Models

Authors: Pingchuan Ma ; Shuai Wang ; Jin Liu

Natural language processing (NLP) models have been increasingly used in sensitive application domains including credit scoring, insurance, and loan assessment. Hence, it is critical to know that the decisions made by NLP models are free of unfair bias toward certain subpopulation groups. In this paper, we propose a novel framework employing metamorphic testing, a well-established software testing scheme, to test NLP models and find discriminatory inputs that provoke fairness violations. Furthermore, inspired by recent breakthroughs in the certified robustness of machine learning, we formulate NLP model fairness in a practical setting as (ε, k)-fairness and accordingly smooth the model predictions to mitigate fairness violations. We demonstrate our technique using popular (commercial) NLP models, and successfully flag thousands of discriminatory inputs that can cause fairness violations. We further enhance the evaluated models by adding a certified fairness guarantee at a modest cost.
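
The sketch below illustrates one metamorphic relation for fairness testing: swapping gendered terms in the input should leave the decision unchanged, and inputs where it does not are flagged as potential violations. The `predict` function, swap list, and toy model are assumptions for illustration; the paper's (ε, k)-fairness smoothing step is not shown.

```python
# One metamorphic relation: swap gendered terms and check the decision is stable.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his", "mr.": "ms.", "ms.": "mr."}

def mutate(text):
    """Apply the protected-attribute swap word by word."""
    return " ".join(SWAPS.get(tok.lower(), tok) for tok in text.split())

def find_fairness_violations(predict, inputs):
    """Return inputs whose prediction changes under the metamorphic mutation."""
    return [x for x in inputs if predict(x) != predict(mutate(x))]

# Usage with a toy biased "model" standing in for a real black-box NLP classifier.
toy_predict = lambda text: "approve" if "he" in text.lower().split() else "review"
print(find_fairness_violations(toy_predict, ["he repaid his loan", "applicant is reliable"]))
```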