On Hedden's proof that machine learning fairness metrics are flawed

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

On Hedden's proof that machine learning fairness metrics are flawed. / Søgaard, Anders; Kappel, Klemens; Grünbaum, Thor.

In: Inquiry, 2024.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Søgaard, A, Kappel, K & Grünbaum, T 2024, 'On Hedden's proof that machine learning fairness metrics are flawed', Inquiry. https://doi.org/10.1080/0020174X.2024.2315169

APA

Søgaard, A., Kappel, K., & Grünbaum, T. (2024). On Hedden's proof that machine learning fairness metrics are flawed. Inquiry. https://doi.org/10.1080/0020174X.2024.2315169

Vancouver

Søgaard A, Kappel K, Grünbaum T. On Hedden's proof that machine learning fairness metrics are flawed. Inquiry. 2024. https://doi.org/10.1080/0020174X.2024.2315169

Author

Søgaard, Anders ; Kappel, Klemens ; Grünbaum, Thor. / On Hedden's proof that machine learning fairness metrics are flawed. In: Inquiry. 2024.

Bibtex

@article{90ce9cd7336b434282cab307f584eed5,
title = "On Hedden's proof that machine learning fairness metrics are flawed",
abstract = "Brian Hedden, in a recent article in Philosophy and Public Affairs [Hedden 2021. “On Statistical Criteria of Algorithmic Fairness.” Philosophy and Public Affairs 49 (2): 209–231. https://doi.org/10.1111/papa.v49.2.], presented a thought experiment designed to probe the validity of the fairness metrics used in machine learning (ML). The thought experiment has caused a great stir, including within machine learning [Vigan{\'o} et al. “People are Not Coins: Morally Distinct Types of Predictions Necessitate Different Fairness Constraints.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, 2293–2301, New York, NY: Association for Computing Machinery.]. Hedden describes a particular prediction problem p – involving 40 people divided into two rooms flipping biased coins – and a binary classification model m for predicting the outcome of these 40 coin flips. Hedden argues that in the thought experiment, m is {\textquoteleft}perfectly fair{\textquoteright}, but at the same time, he shows that almost all existing fairness metrics would score m as unfair. He concludes that almost all existing fairness metrics are flawed. If he is right, this seriously undermines most recent work on fair ML. We present three counter-arguments to Hedden's thought experiment, of which the first is the most important: (a) the prediction problem p is irrelevant for ML because p is not (evaluated as) a learning problem, (b) the model m is not actually fair, and (c) the prediction problem p is irrelevant for fairness metrics, because group assignment in p is random.",
author = "Anders S{\o}gaard and Klemens Kappel and Thor Gr{\"u}nbaum",
year = "2024",
doi = "10.1080/0020174X.2024.2315169",
language = "English",
journal = "Inquiry (United Kingdom)",
issn = "0020-174X",
publisher = "Routledge",

}

RIS

TY - JOUR

T1 - On Hedden's proof that machine learning fairness metrics are flawed

AU - Søgaard, Anders

AU - Kappel, Klemens

AU - Grünbaum, Thor

PY - 2024

Y1 - 2024

N2 - Brian Hedden, in a recent article in Philosophy and Public Affairs [Hedden 2021. “On Statistical Criteria of Algorithmic Fairness.” Philosophy and Public Affairs 49 (2): 209–231. https://doi.org/10.1111/papa.v49.2.], presented a thought experiment designed to probe the validity of the fairness metrics used in machine learning (ML). The thought experiment has caused a great stir, including within machine learning [Viganó et al. “People are Not Coins: Morally Distinct Types of Predictions Necessitate Different Fairness Constraints.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, 2293–2301, New York, NY: Association for Computing Machinery.]. Hedden describes a particular prediction problem p – involving 40 people divided into two rooms flipping biased coins – and a binary classification model m for predicting the outcome of these 40 coin flips. Hedden argues that in the thought experiment, m is ‘perfectly fair’, but at the same time, he shows that almost all existing fairness metrics would score m as unfair. He concludes that almost all existing fairness metrics are flawed. If he is right, this seriously undermines most recent work on fair ML. We present three counter-arguments to Hedden's thought experiment, of which the first is the most important: (a) the prediction problem p is irrelevant for ML because p is not (evaluated as) a learning problem, (b) the model m is not actually fair, and (c) the prediction problem p is irrelevant for fairness metrics, because group assignment in p is random.

AB - Brian Hedden, in a recent article in Philosophy and Public Affairs [Hedden 2021. “On Statistical Criteria of Algorithmic Fairness.” Philosophy and Public Affairs 49 (2): 209–231. https://doi.org/10.1111/papa.v49.2.], presented a thought experiment designed to probe the validity of the fairness metrics used in machine learning (ML). The thought experiment has caused a great stir, including within machine learning [Viganó et al. “People are Not Coins: Morally Distinct Types of Predictions Necessitate Different Fairness Constraints.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, 2293–2301, New York, NY: Association for Computing Machinery.]. Hedden describes a particular prediction problem p – involving 40 people divided into two rooms flipping biased coins – and a binary classification model m for predicting the outcome of these 40 coin flips. Hedden argues that in the thought experiment, m is ‘perfectly fair’, but at the same time, he shows that almost all existing fairness metrics would score m as unfair. He concludes that almost all existing fairness metrics are flawed. If he is right, this seriously undermines most recent work on fair ML. We present three counter-arguments to Hedden's thought experiment, of which the first is the most important: (a) the prediction problem p is irrelevant for ML because p is not (evaluated as) a learning problem, (b) the model m is not actually fair, and (c) the prediction problem p is irrelevant for fairness metrics, because group assignment in p is random.

U2 - 10.1080/0020174X.2024.2315169

DO - 10.1080/0020174X.2024.2315169

M3 - Journal article

JO - Inquiry (United Kingdom)

JF - Inquiry (United Kingdom)

SN - 0020-174X

ER -
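
As a companion to the abstract above, the following is a minimal Python sketch of the kind of calculation at stake. The room sizes and coin biases are illustrative assumptions, not Hedden's exact numbers; the point is only that a predictor that follows each person's true chance of heads can still show unequal error rates across the two rooms, which is why error-rate-based fairness metrics flag such a model as unfair.

# Minimal sketch, assuming illustrative room sizes and coin biases
# (not Hedden's exact numbers). The predictor says heads exactly when
# a person's true chance of heads exceeds 0.5.

def expected_rates(biases):
    """Expected false positive/negative rates for the threshold predictor."""
    fp = sum(1 - b for b in biases if b > 0.5)   # predicted heads, lands tails
    tn = sum(1 - b for b in biases if b <= 0.5)  # predicted tails, lands tails
    fn = sum(b for b in biases if b <= 0.5)      # predicted tails, lands heads
    tp = sum(b for b in biases if b > 0.5)       # predicted heads, lands heads
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    fnr = fn / (fn + tp) if fn + tp else float("nan")
    return fpr, fnr

# Hypothetical rooms: room 1 holds 20 people with 0.8-biased coins;
# room 2 holds 10 people with 0.8-biased and 10 with 0.2-biased coins.
rooms = {"room_1": [0.8] * 20, "room_2": [0.8] * 10 + [0.2] * 10}
for name, biases in rooms.items():
    fpr, fnr = expected_rates(biases)
    print(f"{name}: expected FPR={fpr:.2f}, expected FNR={fnr:.2f}")

# Output: room_1 has FPR=1.00, FNR=0.00; room_2 has FPR=0.20, FNR=0.20.
# Error-rate parity across rooms (equalized odds) fails even though every
# prediction tracks each person's true coin bias - the tension the paper debates.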
