Doubts Raised Over Anthropic’s Claim of AI-Driven Cyberattacks by Chinese Hackers
Anthropic, an AI safety and research company, recently reported that a China-linked hacking group leveraged its Claude AI model to autonomously conduct cyberattacks against 30 companies. However, the claims have been met with skepticism from independent security researchers, who question the degree of autonomy involved and the validity of the findings.
According to a Wall Street Journal report, Anthropic’s initial testing of Claude’s capabilities in a red-teaming exercise revealed significant issues with the AI’s reliability. The model produced “hallucinations,” fabricating successful outcomes, claiming access to data that turned out not to work, and presenting publicly available details as critical discoveries. This necessitated thorough human verification of all results.
Several experts have voiced their doubts. Dan Tentler, founder of Phobos Group, told Ars Technica, “I continue to refuse to believe that attackers are somehow able to make these models do things that no one else can. Why do models give these attackers what they want 90 percent of the time, while the rest of us have to deal with ass-kissing, subterfuge, and hallucinations?”
Cybersecurity researcher Daniel Card labeled the publication a “marketing stunt,” as reported by Heise Online. Security researcher Kevin Beaumont also criticized Anthropic for failing to release “Indicators of Compromise” (IoCs), the digital evidence that would allow independent verification of the attacks.
A key concern is the lack of transparency surrounding the incident. Without publicly available IoCs, neither the attribution to a Chinese hacking group nor the claimed 90% automation rate can be independently confirmed.
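To illustrate what that independent verification would involve, the sketch below shows the kind of indicators defenders typically consume: file hashes, IP addresses, and domains observed during an intrusion. Every value here is an invented placeholder (drawn from reserved documentation ranges), not data from the alleged campaign.

```python
# Hypothetical sketch of the Indicators of Compromise (IoCs) critics say
# Anthropic should have published. All values below are invented
# placeholders, NOT real indicators from the alleged campaign.

IOCS = {
    "sha256_hashes": {
        "0000000000000000000000000000000000000000000000000000000000000000",
    },
    "ip_addresses": {"203.0.113.10"},    # RFC 5737 documentation range
    "domains": {"staging.example.com"},  # RFC 2606 reserved domain
}

def matches_ioc(log_line: str) -> bool:
    """Return True if any published indicator appears in a log line,
    which is how defenders independently confirm a reported campaign."""
    needles = IOCS["sha256_hashes"] | IOCS["ip_addresses"] | IOCS["domains"]
    return any(needle in log_line for needle in needles)

if __name__ == "__main__":
    print(matches_ioc("outbound connection to 203.0.113.10:443"))  # True
    print(matches_ioc("outbound connection to 198.51.100.7:443"))  # False
```

Without a list like this, outside researchers have nothing to match against their own telemetry, which is why its absence drew such pointed criticism.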
Furthermore, the effectiveness of the attacks is being questioned. While 30 targets were attempted, only a “small number” were successfully breached. Ars Technica reports that researchers suggest traditional, human-led methods might have yielded a higher success rate. The attacks relied on readily available open-source tools, indicating the AI was orchestrating existing techniques rather than developing new malware, leading some experts to compare the activity to established attack frameworks like Metasploit and to conclude that it represents no fundamental shift in attack methodology.
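What “orchestrating existing tools” means in practice can be shown with a minimal sketch: a loop that executes a sequence of well-known open-source utilities. The plan below is a hypothetical stand-in for model output, and the target is the host Nmap’s maintainers explicitly permit for test scans; nothing here reflects the actual campaign described by Anthropic.

```python
# Minimal sketch of "AI as orchestrator": no new malware is shipped,
# existing open-source tools are merely sequenced. The plan is a
# hypothetical stand-in for model output.
import shutil
import subprocess

# A whitelist of well-known tools that frameworks like Metasploit also wrap.
ALLOWED_TOOLS = {"nmap", "nikto", "sqlmap"}

# Pretend this came back from a model asked to plan reconnaissance.
hypothetical_plan = [
    # scanme.nmap.org is the host Nmap's maintainers allow for test scans
    ["nmap", "-sV", "scanme.nmap.org"],
]

for step in hypothetical_plan:
    tool = step[0]
    if tool not in ALLOWED_TOOLS or shutil.which(tool) is None:
        print(f"skipping {step}: tool unavailable or not whitelisted")
        continue
    # Each step is just an invocation of an existing tool -- the point
    # the skeptics make: orchestration, not novel capability.
    result = subprocess.run(step, capture_output=True, text=True, timeout=300)
    print(result.stdout[:500])
```

The structure makes the skeptics’ comparison concrete: the novelty lies in who sequences the steps, not in the steps themselves.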
In response to the campaign, Anthropic suspended the compromised accounts, enhanced its detection algorithms, and alerted relevant authorities and affected organizations. The company maintains that the same AI capabilities exploited for attack are vital for defense, and its threat intelligence team utilized Claude to analyze the data generated during the investigation.
While the potential for AI to accelerate cybersecurity workflows – such as log analysis, reverse engineering, and triage – is widely acknowledged, the prospect of fully autonomous, large-scale attacks remains contentious. AI-orchestrated attacks could potentially lower the barrier to entry for less sophisticated groups, but the documented limitations, including hallucinations and a low success rate, suggest that truly autonomous cyberattacks are still a distant prospect.
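As a concrete example of the defensive workflows mentioned above, here is a minimal log-triage sketch: log lines are scored so a human analyst (or a model) reviews the riskiest first. The markers and weights are invented for illustration and imply nothing about Anthropic’s internal tooling.

```python
# Minimal sketch of AI-adjacent defensive triage: score log lines so
# the riskiest are reviewed first. Heuristics and weights are invented
# for illustration only.
SUSPICIOUS_MARKERS = {
    "failed password": 2,
    "base64 -d": 3,    # decoding payloads on a host
    "sqlmap": 3,       # known offensive tool in user agents
    "/etc/passwd": 4,
}

def triage(lines: list[str]) -> list[tuple[int, str]]:
    """Return log lines sorted by a crude risk score, highest first."""
    scored = []
    for line in lines:
        score = sum(weight for marker, weight in SUSPICIOUS_MARKERS.items()
                    if marker in line.lower())
        if score:
            scored.append((score, line))
    return sorted(scored, reverse=True)

sample = [
    "GET /index.html 200 curl/8.4",
    "GET /login.php?id=1' OR '1'='1 sqlmap/1.7",
    "sshd: Failed password for root from 203.0.113.10",
]
for score, line in triage(sample):
    print(score, line)
```

Triage of this kind is exactly the sort of acceleration few researchers dispute; it is the leap from there to fully autonomous intrusion campaigns that remains in question.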