Do you ever wonder how companies keep your private data safe when they use it for marketing purposes? Have you ever thought about what would happen if that data became public? Now might be a good time to think about it.
In 2006, the popular movie rental company Netflix opened a contest to improve its movie recommendation tool. The plan was to release sets of “anonymized” user data to researchers who signed up, with a cash prize for whoever made the biggest improvement. The trouble is, the data didn’t stay anonymous.
University of Texas researchers soon showed how easily Netflix users could be reidentified using a formula that matched the data to a more public movie-rating site, the Internet Movie Database. While people may have known they were sharing information on their public IMDB profile, they believed their Netflix ratings were theirs and theirs alone – a simple tool to discover new movies. In some cases, the anonymized user data could be assigned a name, history, and even photo.
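To make the matching idea concrete, here is a minimal sketch of a linkage attack on made-up data. It is not the researchers’ actual formula, which the article does not describe; it simply counts identical movie ratings between an “anonymized” record and each public profile. The function name `best_match`, the `min_overlap` threshold, and all names and ratings are hypothetical.

```python
def best_match(anon_ratings, public_profiles, min_overlap=3):
    """Return the public profile name whose ratings agree most with
    anon_ratings, or None if the best agreement is below min_overlap."""
    best_name, best_score = None, 0
    for name, ratings in public_profiles.items():
        # Count movies that both records rated with the same number of stars.
        score = sum(
            1 for movie, stars in anon_ratings.items()
            if ratings.get(movie) == stars
        )
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_overlap else None


# Toy data, invented for illustration only.
anon_record = {"Movie A": 5, "Movie B": 2, "Movie C": 4, "Movie D": 1}
public_profiles = {
    "alice_public": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "bob_public": {"Movie A": 5, "Movie D": 3},
}

print(best_match(anon_record, public_profiles))
```

Even this crude overlap count singles out one profile here. A real attack would likely weigh rare titles and rating dates more heavily, but the principle is the same: a handful of matching ratings can be enough to tie a record to a name.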
But the not-so-anonymous users are now fighting back. According to Wired, a lesbian mother – whose sexual preferences are not known to some friends and family – has recently filed a suit against the online rental giant, claiming an invasion of her privacy. The class action case, Doe v. Netflix, was filed last Thursday in a San Jose court and seeks over $5 million in damages for the breach.
Unfortunately, Netflix is not the first company to show the limits of anonymous information. Also in 2006, AOL released three months of search data to the public, replacing personal usernames with a randomized ID number. However, it was not difficult for the New York Times to reidentify and track down a user based solely on the content of their searches.
And in the ’90s, the Massachusetts Group Insurance Commission released similarly scrubbed data on the state’s employees. But it didn’t take graduate student Latanya Sweeney long to expose the error: she sent a copy of the Governor’s hospital records and prescriptions to his office.
Sweeney, who has since received her PhD and researches at Carnegie Mellon University, has published numerous papers on the dangers of making data partially – but not totally – anonymous. Sweeney has found that “87% of the US population can be uniquely identified by gender, ZIP code and full date of birth.”
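A rough sketch of what “uniquely identified” means in practice: count how many records are the only ones with their particular combination of gender, ZIP code, and date of birth. The helper `unique_fraction`, the field names, and the four records below are invented for illustration and have nothing to do with Sweeney’s actual data or method.

```python
from collections import Counter


def unique_fraction(records, keys=("gender", "zip", "dob")):
    """Return the share of records whose combination of the given
    fields appears exactly once in the data set."""
    combos = [tuple(r[k] for k in keys) for r in records]
    if not combos:
        return 0.0
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(combos)


# Toy records, invented for illustration only.
people = [
    {"gender": "F", "zip": "02138", "dob": "1945-07-31"},
    {"gender": "M", "zip": "02138", "dob": "1945-07-31"},
    {"gender": "F", "zip": "94301", "dob": "1980-01-15"},
    {"gender": "F", "zip": "94301", "dob": "1980-01-15"},
]

print(unique_fraction(people))                 # all three fields
print(unique_fraction(people, keys=("zip",)))  # ZIP code alone
```

In this toy set, ZIP code alone singles out no one, while the three fields together single out half the records. Adding fields can only make a record’s combination rarer, which is why a few innocuous-looking fields can identify so many people.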
Ironically, Netflix is proposing a second contest where they will purportedly release ZIP code, gender and age data along with the randomized ID numbers. This could spell disaster both for Netflix and their users because the vast majority of users could be identified without the aid of an external site like IMDB.
What is really at stake here is not simply finding the skeletons in your friends’ film reels. The Netflix recommendation engine has become less of an internal service to its customers and more of an advertising technique to manipulate customers’ preferences via behavioral targeting. By releasing data without the proper privacy protections, Netflix is violating the trust of millions of its users.
When it comes down to it, your data is your own. No matter how companies “anonymize” or scrub your information, with the right data points and a little bit of research it can still be traced back to you.
If you’re worried that you might fall into the 87% of Americans who can be identified using ZIP code, gender, and date of birth (or if you’re just disturbed that the world may soon know just how much you loved Titanic), check out Privacy PRO from ReputationDefender. With MyPrivacy you can begin removing your name, address, phone number, and age from people-search databases across the web today.