Abstract:
The primary aim of any survey is to facilitate analysis of sensitive data in order to extract useful information without jeopardizing the privacy of the participants. Privacy models such as k-anonymity do not guarantee privacy against attackers with background knowledge. "Differential privacy" is a model for data release that formalizes data privacy and makes no assumptions about the attackers' background knowledge. It builds on the idea that adding carefully computed noise to data can make its release safer from a privacy perspective while retaining the utility of the data; the canonical instance of this idea, the Laplace mechanism, is recalled after the contributions below. In this thesis we address two data-release problems and propose algorithms that negotiate the trade-off between utility and privacy. The major contributions of this thesis are the following:
1. We study the functional mechanism for achieving differential privacy in regression analysis. We extend an existing algorithm to achieve differential privacy for a more general form of linear regression, prove that the extended mechanism preserves differential privacy, and show that it provides better utility than the direct perturbation technique (a sketch of the baseline mechanism follows the contributions).
2. We analyze two strategies for achieving differential privacy when publishing summary statistics: the compose-then-perturb approach and the perturb-then-compose approach (both contrasted in the sketch below). We prove that the perturb-then-compose approach indeed satisfies differential privacy, and we then investigate which of the two provides the better balance between utility and privacy for the summary statistics one wishes to publish for a given dataset.
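For reference, the privacy guarantee both contributions target is the standard one: a randomized mechanism $\mathcal{M}$ satisfies $\epsilon$-differential privacy if, for every pair of datasets $D, D'$ differing in a single record and every set $S$ of outputs,
\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S].
\]
The Laplace mechanism is the canonical way of adding carefully computed noise: for a numeric query $f$ with global sensitivity $\Delta f = \max_{D,D'} \lVert f(D) - f(D') \rVert_1$, releasing $f(D) + \mathrm{Lap}(\Delta f/\epsilon)$ satisfies $\epsilon$-differential privacy.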
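For contribution 1, the following is a minimal sketch of the baseline functional mechanism (Zhang et al.) for ordinary least-squares regression, not the thesis's extended algorithm. It assumes features and labels are pre-scaled into $[-1, 1]$, under which a sensitivity bound of the form $2(d+1)^2$ applies to the objective's polynomial coefficients; the function name and the exact constant are illustrative assumptions.

```python
import numpy as np

def fm_linear_regression(X, y, epsilon):
    """Sketch of the functional mechanism for least-squares regression.

    Assumes each entry of X and y is scaled into [-1, 1]; under that
    normalization, 2 * (d + 1)**2 is taken here as a bound on the L1
    sensitivity of the per-record polynomial coefficients.
    """
    d = X.shape[1]
    # The objective sum_i (y_i - x_i @ w)^2 is a quadratic polynomial
    # in w: w^T (X^T X) w + (-2 X^T y)^T w + const. The constant term
    # does not affect the minimizer, so only the other coefficients
    # are perturbed.
    quad = X.T @ X
    lin = -2.0 * (X.T @ y)

    scale = 2.0 * (d + 1) ** 2 / epsilon
    quad_noisy = quad + np.random.laplace(0.0, scale, size=quad.shape)
    lin_noisy = lin + np.random.laplace(0.0, scale, size=lin.shape)
    # Symmetrize so the perturbed quadratic form is well defined; in
    # practice it may also need regularization to stay positive definite.
    quad_noisy = (quad_noisy + quad_noisy.T) / 2.0

    # Minimize the perturbed objective: gradient 2 Q w + b = 0.
    return np.linalg.solve(2.0 * quad_noisy, -lin_noisy)
```

Because the noise is injected into the objective's coefficients rather than the fitted coefficients themselves, any minimizer of the perturbed objective can be released without further privacy cost.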
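For contribution 2, the two strategies can be contrasted on a simple composed statistic. The abstract does not fix the statistics, neighboring convention, or budget split, so the sketch below uses stand-in assumptions: the variance of values clipped to $[0,1]$, a public record count $n$, and an even split of $\epsilon$ across the base aggregates.

```python
import numpy as np

def compose_then_perturb_var(x, epsilon):
    # Compute the variance first, then perturb the final answer.
    # For values in [0, 1] and replacement neighbors, 3/n is a crude
    # sensitivity bound (1/n from the mean of squares, 2/n from the
    # squared mean).
    n = len(x)
    x = np.clip(x, 0.0, 1.0)
    var = np.mean(x ** 2) - np.mean(x) ** 2
    return var + np.random.laplace(0.0, (3.0 / n) / epsilon)

def perturb_then_compose_var(x, epsilon):
    # Perturb the base aggregates sum(x) and sum(x^2) first (each has
    # sensitivity 1 for values in [0, 1] under replacement neighbors),
    # splitting the budget evenly, then assemble the variance.
    n = len(x)
    x = np.clip(x, 0.0, 1.0)
    s1 = np.sum(x) + np.random.laplace(0.0, 1.0 / (epsilon / 2))
    s2 = np.sum(x ** 2) + np.random.laplace(0.0, 1.0 / (epsilon / 2))
    return s2 / n - (s1 / n) ** 2
```

In the second function, sequential composition makes the two noisy aggregates jointly $\epsilon$-differentially private, after which the variance is assembled as post-processing with no additional privacy cost.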