De-anonymizing D4D Datasets

Abstract

Recent research on de-anonymizing datasets of anonymized personal records has not deterred organizations from releasing personal data, often with ingenuous attempts at defeating de-anonymization. Studying such anonymization techniques provides scientific evidence as to why anonymizing high-dimensional databases is hard, and sheds light on which kinds of techniques to avoid. We study how to de-anonymize the datasets released as part of the Data for Development (D4D) challenge. We show that the anonymization strategy used is weak and allows an attacker to re-identify and link records efficiently. We also suggest some measures to make such attacks harder.
