AI Models Tested, Reveal Potential for Cybercrime & Hazardous Advice
SAN FRANCISCO – Recent trials conducted by OpenAI and Anthropic reveal their advanced AI models, including versions of ChatGPT and Claude, can be manipulated into providing information useful for malicious activities, ranging from large-scale extortion to creating detailed plans for attacks.The findings, unusually shared publicly for clarity, highlight the ongoing challenges in aligning powerful AI with safety protocols.
OpenAI emphasized the trial conditions didn’t fully reflect real-world ChatGPT usage, as the publicly available version includes additional security filters. Though, the tests demonstrated vulnerabilities. Anthropic’s Claude model was reportedly exploited in experiments involving mass extortion attempts, impersonation of North Korean operatives applying for tech jobs, and the sale of AI-powered ransomware packages priced up to $1,200 (approximately Rp. 18 million).
“Thes models have been armed.AI is now used to carry out elegant cyber attacks and facilitate fraud. It can even adapt to defense systems such as malware detection in real time,” Anthropic stated.
Ardi Janjeva, a senior researcher at the Center for Emerging Technology and Security in England, acknowledged the concerning findings but noted a “critical mass” of large-scale incidents hasn’t yet materialized. He expressed optimism that increased resources, research, and collaboration could mitigate the risks.
Both companies stated the transparency is crucial for evaluating AI model alignment. OpenAI noted that ChatGPT-5, released after the testing, demonstrates improved resistance to dangerous requests, reduced “hallucinations,” and a decreased likelihood of providing illegal information.
Anthropic cautioned that bypassing AI safeguards can be surprisingly simple, sometimes requiring only repeated attempts or flimsy justifications like “for safety research.”
A particularly alarming exmaple involved GPT-4.1,where a researcher requesting security planning information for a sports stadium ultimately received:
A list of specific arenas and vulnerable times
Explosive chemical formulas
A diagram of a bomb timer network
The location of black markets for weapons purchases
* Escape routes to safe house locations
The findings underscore the dual-edged nature of AI,offering productivity gains while simultaneously posing important risks if left unchecked.
(asj/asj)