Humans sometimes turn to machine-learning models to help them make high-stakes decisions. For example, a model could predict which law school applicants are most likely to pass the bar exam, helping admissions officers decide which students to accept.
Such a model may weigh millions of parameters when making a prediction. If researchers themselves find it nearly impossible to fully understand the basis for these predictions, how could an admissions officer with no machine-learning experience?
Researchers therefore sometimes use explanation methods that mimic the larger model by producing simple approximations of its predictions. These approximations are far easier to understand, helping users decide whether to trust the model's predictions.
However, the fairness of these explanation methods deserves scrutiny. If an explanation method is biased, for instance if men receive better approximations than women, or white people better approximations than Black people, users may be led to trust the model's predictions for some people but not for others.
MIT researchers examined the fairness of some commonly used explanation methods. They discovered that the approximation quality of these explanations varies dramatically between subgroups, with minoritized subgroups often having significantly lower quality.
In practice, if the approximation quality is lower for female applicants, there is a mismatch between the explanations and the model’s predictions, which may lead the admissions officer to reject more women than men.
Once they recognized how prevalent these fairness gaps are, the MIT researchers tried several methods to level them. They managed to narrow some gaps, but could not eliminate them.
This implies that in the real world, people may wrongly place more trust in predictions for some groups than for others. It is therefore important both to improve explanation models and to communicate their details clearly to end users.
“These gaps exist, so users should adjust their expectations of what they will get when using these explanations,” says lead author Aparna Balagopalan, a graduate student in the Healthy ML group of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL).
The paper was co-authored by Balagopalan, CSAIL graduate students Haoran Zhang and Kimia Hamidieh, CSAIL postdoc Thomas Hartvigsen, Frank Rudzicz, associate professor of computer science at the University of Toronto, and senior author Marzyeh Ghassemi, assistant professor and head of the Healthy ML Group. The findings of the study will be presented at the ACM Conference on Fairness, Accountability, and Transparency.
Explanation models produce simplified, human-readable approximations of a more complex machine-learning model's predictions. An effective explanation model maximizes fidelity, a measure of how well it matches the larger model's predictions.
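Fidelity can be thought of as the fraction of instances on which the simple explanation model agrees with the black-box model. A minimal sketch, with hypothetical prediction arrays standing in for real model outputs:

```python
import numpy as np

def fidelity(blackbox_preds, explanation_preds):
    """Fraction of instances where the explanation model's
    prediction matches the black-box model's prediction."""
    blackbox_preds = np.asarray(blackbox_preds)
    explanation_preds = np.asarray(explanation_preds)
    return float(np.mean(blackbox_preds == explanation_preds))

# Hypothetical binary predictions for six applicants
blackbox = [1, 0, 1, 1, 0, 1]
explanation = [1, 0, 0, 1, 0, 1]
print(fidelity(blackbox, explanation))  # 5 of the 6 predictions agree
```

A fidelity of 1.0 would mean the explanation model perfectly reproduces the larger model's behavior on that data.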
Rather than focusing on the overall fidelity of the explanation model, the MIT researchers investigated fidelity for subgroups of people in the model's dataset. In a dataset with men and women, the fidelity should be similar for each, with both groups having fidelity close to that of the overall explanation model.
If one looks only at the average fidelity across all instances, one may miss artifacts in the explanation model, Balagopalan says.
The researchers created two metrics to measure fidelity gaps, or differences in fidelity between subgroups. The first is the difference between the overall average fidelity and the fidelity of the worst-performing subgroup. The second is the average of the absolute differences in fidelity between all possible pairs of subgroups.
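The two metrics above can be sketched directly; the subgroup names and fidelity values here are illustrative, not from the study:

```python
from itertools import combinations

def worst_group_gap(overall_fidelity, group_fidelities):
    """Metric 1: overall average fidelity minus the fidelity
    of the worst-performing subgroup."""
    return overall_fidelity - min(group_fidelities.values())

def mean_pairwise_gap(group_fidelities):
    """Metric 2: mean absolute fidelity difference over all
    possible pairs of subgroups."""
    diffs = [abs(a - b) for a, b in combinations(group_fidelities.values(), 2)]
    return sum(diffs) / len(diffs)

# Hypothetical fidelities: 0.90 overall, three subgroups
groups = {"group_a": 0.95, "group_b": 0.88, "group_c": 0.81}
print(worst_group_gap(0.90, groups))  # 0.90 minus the worst group's 0.81
print(mean_pairwise_gap(groups))
```

Both metrics are zero when every subgroup has identical fidelity, and grow as the explanation model serves some subgroups better than others.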
They used these metrics to search for fidelity gaps in two types of explanation models trained on four real-world datasets for high-stakes situations, such as predicting whether a patient will die in the ICU, whether a defendant will re-offend, or whether a law school applicant will pass the bar exam.
Individuals' sex and race were protected attributes in each dataset. Protected attributes are characteristics that cannot be used to make decisions, usually due to laws or organizational policies; which attributes count as protected can vary with the decision setting.
The researchers discovered significant fidelity gaps in all datasets and explanation models. For disadvantaged groups, fidelity was frequently much lower, reaching up to 21% in some cases.
The fidelity gap between race subgroups in the law school dataset was 7%, which means that approximations for some subgroups were incorrect 7 percent more often on average. If there are 10,000 applicants from these subgroups in the dataset, for instance, a significant portion may be incorrectly rejected, according to Balagopalan.
“I was surprised by how common these fidelity gaps are in all of the datasets we looked at. It is difficult to overstate how frequently explanations are used as a ‘fix’ for black-box machine-learning models. We show in this paper that the explanation methods themselves are flawed approximations that may be bad for some subgroups,” Ghassemi says.
Reduction of gaps
After identifying the fidelity gaps, the researchers tried several machine-learning approaches to fix them. They trained the explanation models to identify regions of a dataset prone to low fidelity and gave more focus to those samples. They also experimented with balanced datasets containing an equal number of samples from each subgroup.
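The balanced-dataset strategy can be sketched as simple subgroup resampling before training; the field name and records below are hypothetical, and the study's actual pipeline may differ:

```python
import random

def balance_by_group(samples, group_key):
    """Downsample each subgroup to the size of the smallest one,
    so every subgroup is equally represented during training."""
    groups = {}
    for sample in samples:
        groups.setdefault(sample[group_key], []).append(sample)
    smallest = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(random.sample(members, smallest))
    return balanced

# Hypothetical dataset skewed 3:1 between two subgroups
data = [{"sex": "male", "x": i} for i in range(6)] + \
       [{"sex": "female", "x": i} for i in range(2)]
balanced = balance_by_group(data, "sex")
# Each subgroup now contributes the same number of samples
```

Downsampling trades data volume for balance; reweighting or oversampling the smaller subgroup are common alternatives with a similar goal.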
These rigorous training strategies did narrow some fidelity gaps, but they did not eradicate them.
The researchers then modified the explanation models to investigate why the fidelity gaps occur in the first place. Their analysis revealed that an explanation model may indirectly use protected group information, such as sex or race, which it can learn from the dataset even when group labels are hidden.
They intend to investigate this conundrum further in future work. They also intend to conduct additional research on the significance of fidelity gaps in the context of real-world decision-making.
Balagopalan is encouraged to see that parallel work on explanation fairness from an independent lab has reached similar conclusions, emphasizing the importance of thoroughly understanding this problem.
She has some words of caution for machine-learning users as she looks forward to the next phase of this research.
“Select the explanation model with care. But, more importantly, consider the goals of using an explanation model and who it will eventually affect,” she advises.
“I believe this paper is a very important addition to the discourse regarding fairness in ML,” says Krzysztof Gajos, Gordon McKay Professor of Computer Science at Harvard's John A. Paulson School of Engineering and Applied Sciences, who was not involved with the research.
“The preliminary evidence that discrepancies in explanation fidelity can have measurable effects on the quality of decisions made by people aided by machine-learning models was particularly interesting and impactful to me. While the estimated difference in decision quality may appear small (around 1 percent), we know that the cumulative effects of such seemingly minor differences can be life-changing.”