
Identifying people in anonymized datasets through AI

How you interact with a crowd can help you stand out from it, at least to an AI.

When fed information about a target's mobile phone interactions, as well as the interactions of the target's contacts, AI can correctly pick the target out of more than 40,000 anonymous mobile phone subscribers more than half the time, researchers report January 25 in Nature Communications. The results suggest that people socialize in ways that can be used to pick them out of datasets that are supposedly anonymized.

Jaideep Srivastava, a computer scientist at the University of Minnesota in Minneapolis who was not involved in the study, says people tend to stay within established social circles and that these regular interactions form stable patterns over time. What is surprising, he says, is that such a pattern can be used to identify an individual.

Under the EU's General Data Protection Regulation and the California Consumer Privacy Act, companies that collect information about people’s day-to-day interactions may share or sell that data without users’ consent. The catch is that the data must be anonymized. Yves-Alexandre de Montjoye, a privacy researcher at Imperial College London, says some organizations may assume that they can meet this standard by giving users pseudonyms. “Our results are showing that this is not true.”

De Montjoye and his colleagues hypothesized that people’s social behavior could be used to pick them out of datasets containing information about anonymous users’ interactions. To test the hypothesis, the researchers trained an artificial neural network (an AI loosely modeled on the neural circuitry of biological brains) to recognize patterns in users’ weekly social interactions.

In one test, the researchers trained the neural network with data from an unidentified mobile phone service that detailed the interactions of 43,606 subscribers over a 14-week period. The data included the date, time, duration and type (call or SMS) of each interaction, the pseudonyms of the parties involved, and which party initiated the communication.

Each user’s interaction data were organized into a web-shaped data structure made up of nodes representing the user and their contacts. Strings threaded with interaction data connected the nodes. After being shown the interaction web of a known person, the AI searched the anonymized data for the web that looked most similar.
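To make the web idea concrete, here is a minimal sketch of how interaction records could be turned into per-user webs and matched. It is not the researchers' method: they trained a neural network and also used the contacts' own interactions, whereas this sketch swaps in a plain cosine-similarity comparison over each user's direct contacts only. The names `Interaction`, `build_webs`, `fingerprint` and `most_similar`, the field layout, and the assumption that durations are in seconds are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Interaction:
    caller: str    # pseudonym of the party who initiated the interaction
    callee: str    # pseudonym of the other party
    kind: str      # "call" or "sms"
    duration: int  # assumed to be seconds; 0 for an SMS


def build_webs(records):
    """Map each user to {contact: [n_calls, n_sms, total_call_seconds]}."""
    webs = defaultdict(lambda: defaultdict(lambda: [0, 0, 0]))
    for r in records:
        # Store the edge on both endpoints so every user gets a full web.
        for user, contact in ((r.caller, r.callee), (r.callee, r.caller)):
            edge = webs[user][contact]
            if r.kind == "call":
                edge[0] += 1
                edge[2] += r.duration
            else:
                edge[1] += 1
    return webs


def fingerprint(web):
    """Flatten a web into a vector that ignores the contacts' pseudonyms."""
    # Order edges from most to least active so two webs can be compared
    # position by position even though their pseudonyms differ.
    edges = sorted(web.values(), key=lambda e: (e[0] + e[1], e[2]), reverse=True)
    return [value for edge in edges for value in edge]


def cosine(a, b):
    """Cosine similarity of two vectors, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(known_web, anonymized_webs):
    """Return the pseudonym whose web looks most like the known person's."""
    target = fingerprint(known_web)
    return max(
        anonymized_webs,
        key=lambda user: cosine(target, fingerprint(anonymized_webs[user])),
    )


# Toy data: an anonymized dataset and a later, named observation of one user.
anonymized = build_webs([
    Interaction("p1", "p2", "call", 120),
    Interaction("p1", "p2", "sms", 0),
    Interaction("p1", "p3", "call", 30),
    Interaction("p4", "p5", "sms", 0),
    Interaction("p4", "p5", "sms", 0),
])
known = build_webs([
    Interaction("alice", "bob", "call", 110),
    Interaction("alice", "bob", "sms", 0),
    Interaction("alice", "carol", "call", 40),
])
print(most_similar(known["alice"], anonymized))
```

Sorting edges by activity is one simple way to compare webs without knowing which pseudonym corresponds to which real contact. A learned model such as the one described in the study can pick up far subtler regularities than this hand-built fingerprint.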

When shown interaction webs containing only a target’s own phone interactions from one week after the latest records in the anonymous dataset, the neural network linked just 14.7 percent of individuals to their anonymized selves. But when given information about the interactions of the target’s contacts as well as those of the target, it identified 52.4 percent of people. Even when the researchers gave the AI target and contact interaction data collected 20 weeks after the anonymous dataset, it still correctly identified users 24.3 percent of the time, suggesting that social behavior stays identifiable over long periods.
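For a sense of scale, the reported rates can be compared with blind guessing. The short sketch below assumes that every target is searched for among all 43,606 subscribers and that a random guess picks each subscriber with equal probability. Both are simplifying assumptions rather than details from the study, and `lift_over_chance` is a hypothetical helper name.

```python
POOL_SIZE = 43_606  # subscribers in the anonymized dataset (from the article)


def lift_over_chance(rate, pool_size=POOL_SIZE):
    """How many times better than a uniform random guess a hit rate is."""
    # A uniform random guess succeeds with probability 1 / pool_size,
    # so the ratio rate / (1 / pool_size) simplifies to rate * pool_size.
    return rate * pool_size


# Identification rates reported in the article.
reported = {
    "target only, 1 week later": 0.147,
    "target and contacts, 1 week later": 0.524,
    "target and contacts, 20 weeks later": 0.243,
}

for label, rate in reported.items():
    print(f"{label}: {rate:.1%}, about {lift_over_chance(rate):,.0f} times chance")
```

Under these assumptions, even the weakest setting is thousands of times better than guessing, which is why a pseudonym alone offers little protection.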

To see whether the AI could profile social behavior elsewhere, the researchers tested it on a dataset consisting of four weeks of close-proximity data from the mobile phones of 587 anonymous university students, collected by researchers in Copenhagen. The interaction data included the students’ pseudonyms, encounter times, and the strength of the received signal, which indicates proximity to other students. These metrics are often collected by COVID-19 contact tracing applications. Given the interaction data of a target and the target’s contacts, the AI correctly identified students in the dataset 26.4 percent of the time.

The researchers note that the findings do not apply to the contact tracing protocol of Google and Apple’s Exposure Notification system, which protects user privacy by encrypting all Bluetooth metadata and preventing the collection of location data.

De Montjoye hopes that the study will help policymakers improve strategies for protecting users’ identities. He says that data protection laws allow anonymized data to be shared in order to support useful research. “But for this to work, it’s important to make sure that anonymization actually protects people’s privacy.”
