IIIT-Delhi Institutional Repository

Fairness in machine learning


dc.contributor.author Mittal, Vani
dc.contributor.author Shah, Rajiv Ratn (Advisor)
dc.date.accessioned 2026-04-15T07:42:32Z
dc.date.available 2026-04-15T07:42:32Z
dc.date.issued 2025-08
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1883
dc.description.abstract Ensuring fairness in machine learning systems has become increasingly crucial as these models are deployed in socially consequential domains such as lending, hiring, and criminal justice. A significant challenge in fairness evaluation arises when sensitive attributes, such as race or gender, are unavailable due to privacy constraints or demographic scarcity. Traditional approaches rely on off-the-shelf proxy models to infer missing sensitive attributes; however, such methods can misrepresent true fairness, leading to potentially biased decisions. This thesis investigates the theoretical and practical framework proposed by Zhu et al. (2023) for fairness evaluation using weak proxies. We systematically implement a controlled pipeline to generate synthetic datasets via a Gaussian Copula, train multiple weak and independent proxy classifiers, and aggregate their predictions using ensemble techniques. Our experimental setup evaluates the impact of proxy quality, ensemble size, and noise on fairness metrics, particularly focusing on Equalized Odds, across three datasets: Adult, COMPAS, and synthetic Gaussian data. The results demonstrate that naive use of proxy-sensitive attributes can underestimate true disparities, while ensembles of weak proxies, when appropriately calibrated, provide accurate and robust fairness estimates. Furthermore, introducing differential privacy via controlled noise allows us to study the trade-off between privacy and fairness, showing that even noisy proxies can yield reliable estimates when combined with generative modeling and majority voting. This work validates the theoretical claims of weak proxy sufficiency, highlights the critical conditions required for reliable fairness measurement, and provides practical guidelines for deploying privacy-preserving fairness audits in scenarios where sensitive information is partially or entirely inaccessible. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Fairness in Machine Learning en_US
dc.subject Weak Proxies en_US
dc.subject Sensitive Attributes en_US
dc.subject Privacy-Preserving Fairness en_US
dc.subject Equalized Odds en_US
dc.subject Fairness Evaluation en_US
dc.subject Proxy-Based Auditing en_US
dc.title Fairness in machine learning en_US
dc.type Thesis en_US
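The abstract describes auditing fairness through an ensemble of weak, independent proxy classifiers for the missing sensitive attribute, aggregated by majority vote and evaluated with Equalized Odds. A minimal sketch of that idea follows, assuming binary groups and labels; all function names, parameters, and data here are illustrative placeholders, not taken from the thesis itself:

```python
# Illustrative sketch (not the thesis implementation): several weak proxies
# vote on the sensitive attribute, and Equalized Odds gaps are computed
# against the inferred groups.
import numpy as np

def majority_vote(proxy_preds):
    """Aggregate binary proxy predictions (n_proxies, n_samples) by majority."""
    return (np.mean(proxy_preds, axis=0) >= 0.5).astype(int)

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap in TPR or FPR between two inferred groups (0 = perfect parity)."""
    gaps = []
    for label in (1, 0):  # label 1 conditions give TPR, label 0 give FPR
        rates = []
        for g in (0, 1):
            mask = (group == g) & (y_true == label)
            rates.append(y_pred[mask].mean() if mask.any() else 0.0)
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

rng = np.random.default_rng(0)
n = 1000
true_group = rng.integers(0, 2, n)
y_true = rng.integers(0, 2, n)
# Model under audit: mildly biased toward positive predictions for group 1
y_pred = (rng.random(n) < 0.4 + 0.2 * true_group).astype(int)
# Five weak proxies, each only ~70% accurate at recovering the group
proxies = np.stack([
    np.where(rng.random(n) < 0.7, true_group, 1 - true_group)
    for _ in range(5)
])
inferred = majority_vote(proxies)
print(round(equalized_odds_gap(y_true, y_pred, inferred), 3))
```

With five independent 70%-accurate proxies, the majority vote recovers the group noticeably better than any single proxy, which is the intuition behind the weak-proxy sufficiency claim the abstract refers to; noisy group labels tend to attenuate the measured disparity toward zero, which is one reason naive single-proxy audits can underestimate true gaps.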

