Optimising Fairness Through Parametrised Data Sampling
2021
González Zelaya, Carlos Vladimiro; Salas, Julián; Prangle, Dennis; Missier, Paolo
Improving machine learning models’ fairness is an active research topic, with most approaches focusing on specific definitions of fairness. In contrast, we propose ParDS, a parametrised data sampling method with which we can optimise the fairness ratios observed on a test set, in a way that is agnostic to both the specific fairness definition and the chosen classification model. Given a training set with one binary protected attribute and a binary label, our approach corrects the positive rate for both the favoured and unfavoured groups by resampling the training set. We present experimental evidence showing that the amount of resampling can be optimised to achieve target fairness ratios for a specific training set and fairness definition, while preserving most of the model’s accuracy. We discuss the conditions under which the method is viable, and then extend it to handle multiple protected attributes. In our experiments we use three different sampling strategies, and we report results for three commonly used definitions of fairness and three public benchmark datasets: Adult Income, COMPAS and German Credit.
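The central mechanism described in the abstract can be made concrete with a short sketch: resample each protected group so that its positive rate moves toward a target, retrain the classifier, and measure a fairness ratio such as demographic parity (the positive prediction rate of the unfavoured group divided by that of the favoured group) on the test set. The Python below is a minimal, hypothetical illustration of this rate-correcting resampling idea, not the authors' ParDS implementation; the synthetic data, the chosen target rates, and the resample_positive_rate helper are all assumptions introduced purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): one binary protected attribute A,
# one feature X correlated with A, and a binary label Y.
n = 5000
A = rng.integers(0, 2, n)                       # 1 = favoured group, 0 = unfavoured
X = rng.normal(loc=A * 0.8, size=n)
Y = (X + rng.normal(scale=1.0, size=n) > 0.5).astype(int)
df = pd.DataFrame({"A": A, "X": X, "Y": Y})

train, test = train_test_split(df, test_size=0.3, random_state=0)

def resample_positive_rate(group: pd.DataFrame, target_rate: float) -> pd.DataFrame:
    """Resample positives (with replacement) within one protected group so that
    its empirical positive rate is approximately target_rate."""
    pos, neg = group[group.Y == 1], group[group.Y == 0]
    # Solve n_pos / (n_pos + |neg|) = target_rate for the resampled positive count.
    n_pos = int(round(target_rate * len(neg) / (1 - target_rate)))
    return pd.concat([pos.sample(n=n_pos, replace=True, random_state=0), neg])

# The resampling parameters: target positive rates per group. These values are
# ad hoc here; a parametrised search would tune them to hit a fairness target.
targets = {0: 0.55, 1: 0.55}
balanced = pd.concat(
    resample_positive_rate(g, targets[a]) for a, g in train.groupby("A")
)

clf = LogisticRegression().fit(balanced[["X"]], balanced["Y"])
pred = clf.predict(test[["X"]])

# Demographic parity ratio: positive prediction rate of the unfavoured group
# divided by that of the favoured group (1.0 = parity).
def positive_rate(a: int) -> float:
    return pred[test["A"].values == a].mean()

print(f"DP ratio: {positive_rate(0) / positive_rate(1):.3f}")
```

In the paper's setting, the per-group target rates are the tunable parameter: the amount of resampling is adjusted until the fairness ratio measured on held-out data reaches the desired value, for whichever fairness definition and classifier are in use, at the cost of a small amount of accuracy.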