Optimising Fairness Through Parametrised Data Sampling

González Zelaya, Carlos Vladimiro; Salas, Julián; Prangle, Dennis; Missier, Paolo

doi:10.5441/002/edbt.2021.49

Journal

Advances in Database Technology - EDBT

Date Issued

2021

Author(s)

González Zelaya, Carlos Vladimiro

Facultad de Ciencias Económicas y Empresariales - CampCM

Salas, Julián

Prangle, Dennis

Missier, Paolo

Type

Conference paper (conference proceedings)

DOI

10.5441/002/edbt.2021.49

URL

https://scripta.up.edu.mx/handle/20.500.12552/1798

Abstract

Improving the fairness of machine learning models is an active research topic, with most approaches focusing on specific definitions of fairness. In contrast, we propose ParDS, a parametrised data sampling method by which we can optimise the fairness ratios observed on a test set in a way that is agnostic to both the specific fairness definition and the chosen classification model. Given a training set with one binary protected attribute and a binary label, our approach corrects the positive rate for both the favoured and unfavoured groups by resampling the training set. We present experimental evidence showing that the amount of resampling can be optimised to achieve target fairness ratios for a specific training set and fairness definition while preserving most of the model’s accuracy. We discuss the conditions under which the method is viable and then extend it to multiple protected attributes. In our experiments we use three different sampling strategies and report results for three commonly used definitions of fairness on three public benchmark datasets: Adult Income, COMPAS and German Credit.
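
The resampling idea in the abstract can be pictured with a minimal Python sketch. This is not the authors' ParDS implementation. The function names, the parameter `d` (a hypothetical knob in [0, 1] that moves each group's positive rate a fraction of the way towards the overall positive rate), and the use of plain random oversampling are all assumptions made for illustration. Rows are assumed to be `(group, label)` pairs with a binary protected attribute and a binary label.

```python
import math
import random


def positive_rate(rows, group):
    """Share of positive labels among the rows that belong to `group`."""
    labels = [label for g, label in rows if g == group]
    return sum(labels) / len(labels)


def oversample_to_rate(rows, group, target, rng):
    """Return extra rows of `group` that push its positive rate to about `target`.

    Assumption: oversampling only. Positives are duplicated to raise the
    rate, negatives are duplicated to lower it.
    """
    pos = [r for r in rows if r[0] == group and r[1] == 1]
    neg = [r for r in rows if r[0] == group and r[1] == 0]
    if not pos or not neg or not 0 < target < 1:
        raise ValueError("group needs both labels and a target strictly in (0, 1)")
    current = len(pos) / (len(pos) + len(neg))
    eps = 1e-9  # guards math.ceil against floating-point noise
    if target > current:
        # Smallest positive count p with p / (p + len(neg)) >= target.
        wanted = math.ceil(target * len(neg) / (1 - target) - eps)
        return [rng.choice(pos) for _ in range(wanted - len(pos))]
    if target < current:
        # Smallest negative count n with len(pos) / (len(pos) + n) <= target.
        wanted = math.ceil((1 - target) * len(pos) / target - eps)
        return [rng.choice(neg) for _ in range(wanted - len(neg))]
    return []


def resample(rows, d, seed=0):
    """Hypothetical parametrised resampling of a training set.

    d = 0 leaves the data unchanged; d = 1 moves both groups' positive
    rates to (approximately) the overall positive rate.
    """
    rng = random.Random(seed)
    overall = sum(label for _, label in rows) / len(rows)
    out = list(rows)
    for group in (0, 1):
        current = positive_rate(rows, group)
        target = current + d * (overall - current)
        out.extend(oversample_to_rate(rows, group, target, rng))
    return out
```

In a setting like the one the abstract describes, a value such as `d` would be tuned by training a classifier on `resample(train_rows, d)` for several candidate values and keeping the one whose test-set fairness ratio is closest to the target. That search loop is omitted here because it depends on the chosen model and fairness definition.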