The overwhelming amount of data in almost every aspect of people’s lives means that many decision-making processes rely on algorithmic systems to sort through it. Ranking, one common form of output of these algorithmic systems, is used in education, employment, financial services, health and personal well-being, and many other domains.

Incorporating ethics and legal compliance into algorithmic systems has attracted significant research interest in data management, machine learning, information retrieval, and recommender systems in the past few years. Several normative dimensions of algorithmic systems, including fairness, diversity, interpretability, transparency, and accountability, have been recognized as important and have been incorporated into classification tasks. Incorporating these same dimensions into algorithmic ranking approaches, however, has received far less research attention than classification.

During my PhD studies, I focused on formalizing fairness for ranking tasks in different contexts and proposed approaches to incorporate fairness into the ranked outcome. I believe that enforcing fairness in ranking is essential, especially when the input data of the ranker reveals biases and intersectional discrimination.

Fairness in Ranked Outcomes
Rankings are ubiquitous in many contexts, such as recommendations, college admissions, and employment. They introduce competition for the top positions, so candidates in a ranking cannot be considered independently of each other. In college admissions, for example, a review committee may want to understand the representation of groups, defined by demographic, behavioral, or other characteristics, among previously admitted students, and then decide whether to develop policies that increase the representation of particular groups in the future. How can the representation of groups in a ranking be quantified?

Our work in Measuring Fairness in Ranked Outputs, the first to formalize fairness in ranking, achieves proportional representation across groups in the ranked outcome and is based on the belief that comparisons among groups are unfair to disadvantaged subjects when the input data reveals biases. We model these unfair group comparisons as deviations from an impartial stochastic ranking process and, following this model, propose a ranking generation procedure that implements various levels of unfair comparison. We also propose an optimization framework that produces fair rankings. We find that our framework has the potential to improve fairness in ranked outcomes while maintaining utility. We focus specifically on a single form of discrimination, in which comparisons are made between disjoint groups that are often defined by a single demographic attribute, or by several attributes such as gender, race, and disability status considered in combination. Code for this project can be found at GitHub.
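To make the representation question concrete, below is a minimal Python sketch of an rND-style measure: at every cut point (here, every 10 positions) it compares the protected group's share in the top-i prefix with its share overall, applies a logarithmic discount, and normalizes by the most skewed ordering. The cut-point step and the normalization here are illustrative assumptions and may not match the paper's exact definitions.

```python
import math
import random


def discounted_difference(ranking, protected, cutoff_step=10):
    """Sum over prefix cut points of |protected share in top-i minus
    overall protected share|, discounted by log2(i)."""
    n = len(ranking)
    overall = sum(1 for c in ranking if c in protected) / n
    total = 0.0
    for i in range(cutoff_step, n + 1, cutoff_step):
        share = sum(1 for c in ranking[:i] if c in protected) / i
        total += abs(share - overall) / math.log2(i)
    return total


def rnd(ranking, protected, cutoff_step=10):
    """rND-style measure: 0 means proportional representation at every
    cut point, 1 means maximally skewed (illustrative normalization)."""
    # Normalizer: the most skewed ordering puts all protected candidates last.
    worst = sorted(ranking, key=lambda c: c in protected)
    z = discounted_difference(worst, protected, cutoff_step)
    return 0.0 if z == 0 else discounted_difference(ranking, protected, cutoff_step) / z


# Example: 100 candidates, protected group is 30% of the pool.
pool = list(range(100))
protected = set(range(70, 100))                      # ids 70..99

skewed = sorted(pool, key=lambda c: c in protected)  # protected group ranked last
random.seed(0)
shuffled = random.sample(pool, k=len(pool))

print(round(rnd(skewed, protected), 3))    # close to 1: protected group at the bottom
print(round(rnd(shuffled, protected), 3))  # much smaller: roughly proportional prefixes
```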
Balanced Ranking with In-group Fairness and Diversity
We extend our focus to intersectional discrimination, where several forms of distinct treatment may occur, intersect with each other, and produce new forms of disadvantage. First, we study unfair comparisons among individuals within groups, a new form of disadvantage induced by intersectional discrimination, in rankings constrained by diversity requirements on the input. Our work in Balanced Ranking with Diversity Constraints formalizes in-group fairness to quantify such within-group comparisons in these ranking processes and integrates our formalization as fairness constraints into an Integer Linear Program that maximizes utility subject to the input constraints. We find that our approach reduces these unfair in-group comparisons, especially within the disadvantaged groups that may suffer from intersectional discrimination more than others, at a small cost in utility.
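As a rough, hypothetical illustration of this kind of optimization (not the exact program from the paper), the sketch below uses the PuLP library to select a top-k set that maximizes total score subject to per-group floor and ceiling constraints, plus a simplified stand-in for in-group fairness: within each group, a candidate can be selected only if every higher-scoring member of that group is selected as well. The candidate pool, bounds, and constraint forms are made up for illustration.

```python
# pip install pulp
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

# Hypothetical pool: (id, group, score)
candidates = [
    ("a1", "A", 9.0), ("a2", "A", 8.5), ("a3", "A", 8.4),
    ("b1", "B", 8.2), ("b2", "B", 7.9), ("b3", "B", 6.0),
    ("c1", "C", 7.5), ("c2", "C", 7.4), ("c3", "C", 5.5),
]
k = 5
floor_ = {"A": 1, "B": 1, "C": 1}   # minimum selected per group
ceil_ = {"A": 3, "B": 3, "C": 2}    # maximum selected per group

prob = LpProblem("balanced_ranking", LpMaximize)
x = {cid: LpVariable(f"x_{cid}", cat=LpBinary) for cid, _, _ in candidates}

# Objective: total utility of the selected set.
prob += lpSum(score * x[cid] for cid, _, score in candidates)

# Exactly k candidates are selected.
prob += lpSum(x.values()) == k

# Diversity constraints: per-group floors and ceilings.
groups = {g for _, g, _ in candidates}
for g in groups:
    members = [cid for cid, gg, _ in candidates if gg == g]
    prob += lpSum(x[cid] for cid in members) >= floor_[g]
    prob += lpSum(x[cid] for cid in members) <= ceil_[g]

# Simplified stand-in for in-group fairness: within a group, a candidate
# may be selected only if every higher-scoring member of that group is too.
for g in groups:
    ranked = sorted((c for c in candidates if c[1] == g), key=lambda c: -c[2])
    for better, worse in zip(ranked, ranked[1:]):
        prob += x[worse[0]] <= x[better[0]]

prob.solve()
selected = [cid for cid, _, _ in candidates if x[cid].value() == 1]
print(sorted(selected))
```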
Causal Intersectionality and Fair Ranking
Next, we formalize intersectional fairness in rankings using a causal approach. Our work in Causal Intersectionality and Fair Ranking proposes CIF-Rank, a framework that models intersectional discrimination as the effects of demographic attributes on other variables in the input, and mitigates the unfair effects of these attributes on ranked outcomes by intervening on the score distribution in the input. We find that CIF-Rank produces intersectionally fair rankings in different scenarios. We expect our framework to be useful for mitigating the negative impact of ranking tasks on people due to attributes that are outside their control, and for supporting decision makers in flexibly expressing their requirements for fair rankings in different contexts. Code available at GitHub.
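CIF-Rank itself relies on a causal model of the data; the following is only a highly simplified, hypothetical illustration of the general idea of intervening on the score distribution, not the CIF-Rank method. It fits a linear model of the score on a demographic attribute and a mediator, then re-ranks candidates by scores from which the estimated group effects (direct and through the mediator) have been removed. The data and the adjustment are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a binary demographic attribute g shifts both a
# mediator m (e.g., a test score) and the final ranking score y.
n = 200
g = rng.integers(0, 2, size=n)             # 0 = group X, 1 = group Y
m = 50 + 5 * g + rng.normal(0, 3, n)       # mediator affected by g
y = 0.8 * m + 2 * g + rng.normal(0, 1, n)  # score affected by g directly and via m

# Estimate the effect of g on the mediator (m ~ g) and on the score (y ~ g + m).
gamma, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), g]), m, rcond=None)
X = np.column_stack([np.ones(n), g, m])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Intervene": remove the estimated direct effect of g and the part of the
# mediator explained by g, approximating scores in a world where g has no effect.
m_adj = m - gamma[1] * g
residual = y - X @ beta
y_counterfactual = beta[0] + beta[2] * m_adj + residual

print("top 10 (original scores):      ", np.argsort(-y)[:10])
print("top 10 (counterfactual scores):", np.argsort(-y_counterfactual)[:10])
```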
A Nutritional Label for Score-based Rankings
We develop the Ranking Facts tool, which provides explanations of score-based rankings. Ranking Facts generates a “nutritional label” that explains the ranking process and the ranked outcome to a user with appropriately summarized information. The label is made up of a collection of visual widgets, each showing an aspect of the ranked outcome, including fairness, diversity, stability, and transparency. The fairness widget integrates the fairness measure rND proposed in Measuring Fairness in Ranked Outputs and the FA*IR measure proposed by Zehlike et al. A demo paper was published at SIGMOD 2018. Ranking Facts can be accessed at dataresponsibly.github.io/tools/ and the source code is available at GitHub.
FairDAGs
fairDAGs is a web-based tool that extracts a directed acyclic graph (DAG) representation of data science pipelines. It tracks changes in the distributions of targets and groups (defined by user-specified characteristics) caused by each operation in the pipeline. Our work in Fairness-Aware Instrumentation of Preprocessing Pipelines for Machine Learning provides an intuitive way to interpret fairness in data science pipelines. Code available at GitHub.
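To give a sense of the per-operation tracking that fairDAGs automates, here is a small, hand-written pandas example that records the group distribution before and after two common preprocessing steps. fairDAGs extracts this kind of instrumentation automatically from the pipeline's DAG; the column names and operations below are made up for illustration.

```python
import pandas as pd

# Hypothetical applicant data; 'gender' is the user-specified group attribute.
df = pd.DataFrame({
    "gender":    ["F", "F", "F", "M", "M", "F", "M", "M"],
    "years_exp": [2, None, None, 5, 3, 7, 4, 6],
    "hired":     [0, 1, 0, 1, 0, 1, 1, 0],
})


def group_share(frame, col="gender"):
    """Relative frequency of each group in the current frame."""
    return frame[col].value_counts(normalize=True).round(2).to_dict()


trace = [("raw input", group_share(df))]

# Operation 1: drop rows with missing experience (a common cleaning step).
df = df.dropna(subset=["years_exp"])
trace.append(("dropna(years_exp)", group_share(df)))

# Operation 2: keep only applicants with at least 3 years of experience.
df = df[df["years_exp"] >= 3]
trace.append(("filter years_exp >= 3", group_share(df)))

# Print how the group distribution drifts across pipeline operations.
for step, dist in trace:
    print(f"{step:25s} {dist}")
```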