Researchers at ETH Zurich and Anthropic have demonstrated a significant advancement in the ability of large language models (LLMs) to deanonymize individuals online, matching pseudonymous accounts to real-world identities with increasing accuracy and at a surprisingly low cost. The findings, published in a recent paper, reveal that LLMs can now identify users across platforms like Hacker News, Reddit, LinkedIn, and even anonymized interview transcripts.
The research team built a four-stage pipeline to achieve these results: extracting identity signals from text, searching via embeddings, reasoning over potential candidates, and calibrating confidence scores. Testing the system on Hacker News and LinkedIn yielded a 45.1% recall rate at 99% precision, a stark contrast to the 0.1% recall achieved by classical deanonymization methods. The study likewise showed success in matching users across different Reddit communities, with a 2.8% recall rate at 99% precision, and in linking a user’s past and future activity on Reddit, achieving 38.4% recall at 99% precision.
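The paper does not publish the pipeline's code, but the four stages described above can be sketched in miniature. The sketch below is an illustrative toy, not the authors' implementation: it stands in for the LLM signal extractor with rare-word collection, for the embedding search with bag-of-words cosine similarity, and for confidence calibration with a fixed score threshold that trades recall for precision.

```python
from collections import Counter
import math

def extract_signals(posts):
    # Stage 1 (toy): collect identity-bearing tokens from a user's posts.
    # The real pipeline uses an LLM to extract signals such as location,
    # profession, and writing style.
    tokens = []
    for post in posts:
        tokens.extend(w.lower().strip(".,!?") for w in post.split())
    return Counter(tokens)

def cosine(a, b):
    # Stage 2 (toy): similarity between bag-of-words "embeddings".
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(query_posts, candidate_profiles):
    # Stage 3 (toy): score every candidate profile against the query
    # user and rank them; the real system reasons over candidates
    # with an LLM rather than sorting raw similarities.
    q = extract_signals(query_posts)
    scored = [(name, cosine(q, extract_signals(posts)))
              for name, posts in candidate_profiles.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

def match(query_posts, candidate_profiles, threshold=0.5):
    # Stage 4 (toy): only emit a match when the confidence score clears
    # a calibrated threshold -- this is how a system sacrifices recall
    # (e.g. 45.1%) to hold precision at 99%.
    ranked = rank_candidates(query_posts, candidate_profiles)
    best, score = ranked[0]
    return best if score >= threshold else None

candidates = {
    "alice": ["I love rust compilers and espresso"],
    "bob": ["fantasy football all day"],
}
print(match(["espresso and rust compilers again"], candidates))  # alice
```

Raising `threshold` is the knob behind the precision/recall trade-off quoted throughout the paper: fewer users are matched, but the matches that do clear the bar are far more likely to be correct.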
“We show that LLM agents can figure out who you are from your anonymous online posts,” explained Simon Lermen, an AI engineer at MATS Research and a corresponding author of the paper, in an online post. “Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to tens of thousands of candidates.”
The cost of deanonymization using this method is remarkably low, ranging from $1 to $4 per profile. This represents a substantial shift from previous deanonymization efforts, which, as noted in a 2008 study, required structured data, specialized algorithms, and manual verification. The new research suggests that simply providing GPT-5.2 with a user’s Reddit comments can be sufficient to locate their LinkedIn profile.
The implications of this research are significant for online privacy. Researchers warn that the assumption of practical obscurity – the idea that anonymity can be maintained through low-profile activity – is no longer valid. The study highlights that persistent usernames are increasingly vulnerable to being linked to real-world identities, and the more a user posts online, the easier they become to identify. Users sharing 10 or more movies, for example, were matched with 48% recall.
The findings build upon earlier work concerning the risks of data aggregation and identification, dating back to Latanya Sweeney’s 2002 research on k-anonymity, which demonstrated that a large share of the US population could be identified from just a few data points, such as ZIP code, gender, and date of birth. Yet the automation and scalability offered by LLMs represent a new level of threat to online anonymity.
As of March 3, 2026, neither LinkedIn nor Reddit has issued a public statement regarding the implications of this research. Further investigation into potential mitigation strategies and policy changes is scheduled for a workshop hosted by MATS Research next month.