(Press-News.org) CAMBRIDGE, MA -- The proliferation of sensor-studded cellphones could lead to a wealth of data with socially useful applications — in urban planning, epidemiology, operations research and emergency preparedness, among other things. Of course, before being released to researchers, the data would have to be stripped of identifying information. But how hard could it be to protect the identity of one unnamed cellphone user in a data set of hundreds of thousands or even millions?
According to a paper appearing this week in Scientific Reports, harder than you might think. Researchers at MIT and the Université Catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, were enough to uniquely identify 95 percent of them.
In other words, to extract the complete location information for a single person from an "anonymized" data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, sometime over the course of an hour, four times in one year. A few Twitter posts that mentioned the person's whereabouts would probably provide all the information you needed.
The first author on the paper is Yves-Alexandre de Montjoye, a graduate student in the research group of Toshiba Professor of Media Arts and Science Sandy Pentland. He's joined by César Hidalgo, an assistant professor of media arts and science; Vincent Blondel, a visiting professor at MIT and a professor of applied mathematics at Université Catholique; and Michel Verleysen, a professor of electrical engineering at Université Catholique.
Focusing the debate
Hidalgo's group specializes in applying the tools of statistical physics to a wide range of subjects, from communications networks to genetics to economics. In this case, he and de Montjoye were able to use those tools to uncover a simple mathematical relationship between the resolution of spatiotemporal data and the likelihood of identifying a member of a data set.
According to their formula, the probability of identifying someone goes down as the resolution of the measurements decreases, but more slowly than you might think. Reporting the time of each measurement only to within a 15-hour span, or the location only to within a cluster of 15 adjacent cell towers, would still enable the unique identification of half the people in the sample data set.
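To make "lowering the resolution" concrete, here is a minimal Python sketch of how location and time can be coarsened, under loudly stated assumptions. It is not the researchers' formula or code. It assumes each trace is a set of (tower_id, hour) pairs and, hypothetically, that consecutive tower IDs are geographically adjacent, so grouping towers is a simple integer division; real data would need tower coordinates. The helper names coarsen and fraction_distinct are illustrative, and fraction_distinct uses a simpler criterion than the four-point test: a trace counts as distinct only if no other trace looks exactly the same after coarsening.

```python
from collections import Counter

# Hypothetical assumption: tower IDs are numbered so that runs of consecutive
# IDs are geographically adjacent. Real data would need tower coordinates.


def coarsen(trace, towers_per_cell, hours_per_bin):
    """Map each (tower_id, hour) point to a coarser (tower_group, time_bin)."""
    return frozenset(
        (tower // towers_per_cell, hour // hours_per_bin) for tower, hour in trace
    )


def fraction_distinct(traces, towers_per_cell, hours_per_bin):
    """Share of traces whose coarsened version matches no other trace exactly."""
    coarse = [coarsen(t, towers_per_cell, hours_per_bin) for t in traces]
    counts = Counter(coarse)
    return sum(counts[c] == 1 for c in coarse) / len(coarse)


# Three hypothetical toy traces.
traces = [{(0, 8), (3, 17)}, {(1, 9), (2, 16)}, {(14, 8), (30, 20)}]
print(fraction_distinct(traces, 1, 1))    # original one-tower, one-hour resolution
print(fraction_distinct(traces, 15, 15))  # groups of 15 towers, 15-hour bins
```

In this toy data all three traces are distinct at full resolution, but the first two collapse into the same coarse trace once towers are grouped in 15s and hours in 15-hour bins. The article's point is that, in real data, such collisions happen far less often than intuition suggests.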
But while its initial application may be discouraging, de Montjoye and Hidalgo hope that their formula will provide a way for researchers and policy analysts to reason more rigorously about the privacy safeguards that need to be put in place when they're working with aggregated location data.
"Both César and I deeply believe that we all have a lot to gain from this data being used," de Montjoye says. "This formula is something that could be useful to help the debate and decide, OK, how do we balance things out, and how do we make it a fair deal for everyone to use this data?"
Everybody's different
In the data set that the researchers analyzed, the location of a cellphone was inferred solely from that of the cell tower it was connected to, and the time of the connection was given as falling within a one-hour interval. Each cellphone had a unique, randomly generated identifying number, so that its movement could be traced over time. But there was no information connecting that number to the phone's owner.
The researchers randomly selected a representative sample from the set of 1.5 million cellphone traces and, for each trace, began choosing points at random. For 95 percent of the traces, just four randomly selected points were enough to distinguish them from all other traces in the database. In the worst (or, from another perspective, best) case, 11 measurements were necessary.
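The test described above can be sketched in a few lines of Python. This is a minimal illustration of the procedure as the article describes it, not the researchers' implementation. It assumes each trace is stored as a set of (tower_id, hour) pairs, matching the one-tower, one-hour resolution of the data set; the function names is_unique and estimate_unicity are hypothetical.

```python
import random

# Assumption for illustration: each trace is a set of (tower_id, hour) pairs,
# where hour indexes the one-hour interval in which the connection occurred.


def is_unique(traces, target, p, rng):
    """Draw p random points from traces[target]; return True if no other
    trace contains all of them."""
    points = set(rng.sample(sorted(traces[target]), p))
    return not any(
        points <= other  # subset test: the other trace contains every point
        for i, other in enumerate(traces)
        if i != target
    )


def estimate_unicity(traces, p, sample_size, seed=0):
    """Fraction of randomly sampled traces singled out by p random points.
    Traces with fewer than p points are skipped."""
    rng = random.Random(seed)
    eligible = [i for i, trace in enumerate(traces) if len(trace) >= p]
    if not eligible:
        raise ValueError("no trace has at least p points")
    sample = rng.sample(eligible, min(sample_size, len(eligible)))
    return sum(is_unique(traces, i, p, rng) for i in sample) / len(sample)


# Toy data: three hypothetical traces.
traces = [
    {(1, 8), (2, 9), (3, 18), (1, 20)},
    {(1, 8), (2, 9), (4, 18), (1, 21)},
    {(5, 7), (2, 9), (3, 18), (6, 22)},
]
print(estimate_unicity(traces, p=2, sample_size=3))
```

Comparing against every other trace is linear in the size of the data set, which is fine for a toy example; for 1.5 million traces, an index from each point to the traces containing it would be the natural optimization.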
The researchers suspect that similar relationships might hold for other types of data. "I would not be surprised if a similar result — maybe requiring more points — would, for example, extend to web browsing," Hidalgo says. "The space of potential combinations is really large. When a person is, in some sense, being expressed in a space in which the total number of combinations is huge, the probability that two people would have the same exact trajectory — whether it's walking or browsing — is almost nil."
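Hidalgo's point about the size of the space can be illustrated with simple arithmetic. The figures below are hypothetical: the article does not say how many towers the network had, so 1,000 towers and 30-day months are assumptions chosen only to show the order of magnitude. The sketch counts how many distinct four-point combinations of (tower, hour) cells exist over about 15 months.

```python
from math import comb

# Hypothetical figures for illustration; the article gives no tower count.
towers = 1_000            # assumed number of cell towers
hours = 15 * 30 * 24      # about 15 months of one-hour intervals (30-day months)
cells = towers * hours    # distinct (tower, hour) points a trace could contain

four_point_sets = comb(cells, 4)  # unordered choices of 4 distinct points
print(f"{cells:,} possible points")
print(f"{four_point_sets:.2e} possible four-point combinations")
print(f"vs. {1_500_000:,} users in the data set")
```

Real movement is far from uniform, since people return to the same few places, so this count illustrates scale rather than giving the probability that two traces coincide.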
###
Written by Larry Hardesty, MIT News Office
How hard is it to 'de-anonymize' cellphone data?
2013-03-27