Computational Privacy: Towards Privacy-Conscientious Uses of Metadata

de Montjoye Y.-A. "Computational Privacy: Towards Privacy-Conscientious Uses of Metadata"


The breadcrumbs left behind by our technologies have the power to fundamentally transform the health and development of societies. Metadata about our whereabouts, social lives, preferences, and finances can be used for good but can also be abused. In this thesis, I show that the richness of today's datasets has rendered traditional data protection strategies outdated, requiring us to deeply rethink our approach.

First, I show that the concept of anonymization, central to legal and technical data protection frameworks, does not scale. I introduce the concept of unicity to study the risks of re-identification of large-scale metadata datasets given p points. I then use unicity to show that four spatio-temporal points are enough to uniquely identify 95% of people in a mobile phone dataset and 90% of people in a credit card dataset. In both cases, I also show that traditional de-identification strategies such as data generalization are not sufficient to approach anonymity in modern high-dimensional datasets.
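To make the idea of unicity concrete, here is a minimal sketch of how it could be estimated. It is an illustration under stated assumptions, not the implementation used in the thesis: the function name `estimate_unicity`, the data layout (a dict mapping each user to a set of `(location, hour)` points), and the toy data are all hypothetical. For each user, the sketch draws p points at random from their trace and checks whether any other trace in the dataset also contains all of those points.

```python
import random


def estimate_unicity(traces, p, seed=0):
    """Estimate unicity: the fraction of users whose trace is the only one
    in the dataset containing p points drawn at random from it.

    traces: dict mapping a user id to a set of (location, hour) tuples.
            This layout is an assumption made for illustration.
    """
    rng = random.Random(seed)
    eligible = 0  # users with at least p points
    unique = 0    # users pinned down by their p sampled points
    for points in traces.values():
        if len(points) < p:
            continue
        eligible += 1
        # Sort first so sampling is reproducible and works on a sequence.
        sample = set(rng.sample(sorted(points), p))
        # Count every trace (including the user's own) that contains the sample.
        matches = sum(1 for other in traces.values() if sample <= other)
        if matches == 1:
            unique += 1
    return unique / eligible if eligible else 0.0


# Hypothetical toy dataset: three users, three spatio-temporal points each.
toy_traces = {
    "user_a": {("antenna_1", 8), ("antenna_2", 9), ("antenna_3", 18)},
    "user_b": {("antenna_1", 8), ("antenna_2", 9), ("antenna_4", 18)},
    "user_c": {("antenna_5", 8), ("antenna_6", 9), ("antenna_7", 18)},
}

for p in (1, 2, 3):
    print(p, estimate_unicity(toy_traces, p))
```

In this toy data, user_a and user_b share two of their three points, so a small p may fail to separate them, while all three points always do. Coarsening locations or time bins (data generalization) would correspond to mapping each point to a broader cell before calling the function.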

Second, I argue that the second pillar of data protection, risk assessment, is similarly crumbling as data gets richer. I show, for instance, how standard mobile phone data (information on how and when somebody calls or texts) can be used to predict personality traits up to 1.7 times better than random. The risk of inference in big data will render comprehensive risk assessments increasingly difficult and, moving forward, potentially irrelevant, as they will require evaluating what can be inferred from rich data both now and in the future.

However, this data has great potential for good, especially in developing countries. While it is highly unlikely that we will ever find a magic bullet or even a one-size-fits-all approach to data protection, there are ways to use metadata in privacy-conscientious ways. I finish this thesis by discussing technical solutions (including privacy-through-security ones) which, when combined with legal and regulatory frameworks, provide a reasonable balance between the imperative of using this data and the legitimate concerns of the individual and society.
