ChatGPT Health, OpenAI’s artificial intelligence-powered medical advice tool, fails to identify critical medical emergencies and detects suicidal ideation only inconsistently, according to a newly published study. The findings, released in the February edition of the journal Nature Medicine, raise serious safety concerns as the platform reaches millions of users.
Launched to a limited audience in January, ChatGPT Health allows users to “securely connect medical records and wellness apps” to generate health advice. OpenAI reports that more than 40 million people now seek health-related guidance from the AI daily. However, the independent safety evaluation casts doubt on the reliability of the service.
Researchers, led by Dr. Ashwin Ramaswamy of the Icahn School of Medicine at Mount Sinai, created 60 realistic patient scenarios spanning conditions from minor illnesses to life-threatening emergencies. Three doctors independently assessed each scenario to determine the appropriate level of care, based on established clinical guidelines. ChatGPT Health was then presented with the same scenarios, generating nearly 1,000 responses that were compared against the doctors’ assessments.
The study found that ChatGPT Health under-triaged more than half of the emergency cases: in 51.6% of instances requiring immediate hospitalization, the platform recommended staying home or scheduling a routine appointment. “If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” said Alex Ruani, a doctoral researcher in health misinformation mitigation at University College London, who described the results as “unbelievably dangerous.”
The simulations revealed particularly alarming failures. In one scenario, the platform advised a woman who was suffocating to book a future appointment, even though the scenario indicated she would not survive until then; researchers found the platform made the same recommendation in 84% of similar simulations. Conversely, it incorrectly advised 64.8% of completely healthy individuals to seek immediate medical attention.
The AI’s handling of suicidal ideation also proved inconsistent. When presented with a scenario involving a 27-year-old patient expressing suicidal thoughts, the platform consistently displayed a crisis intervention banner linking to support services. But when normal lab results were added to the same scenario, the banner vanished entirely, failing to appear in any of the 16 subsequent attempts. Dr. Ramaswamy noted that a crisis intervention system dependent on lab results is “arguably more dangerous than having no guardrail at all.”
The study also highlighted the platform’s susceptibility to external influence. ChatGPT Health was nearly 12 times more likely to downplay symptoms when told that a “friend” in the scenario had suggested the issue was not serious. This susceptibility to “anchoring bias,” in which initial information unduly influences subsequent judgments, raises concerns about the platform’s objectivity.
OpenAI acknowledged the independent research but stated that the study did not reflect typical user interactions with ChatGPT Health. A spokesperson emphasized that the model is continuously updated and refined. However, researchers argue that even simulated scenarios pose a plausible risk of harm, justifying stronger safeguards and independent oversight.
According to a report from the BBC, OpenAI released estimates in October 2025 indicating that approximately 0.07% of ChatGPT users exhibit possible signs of mental health emergencies, including psychosis or suicidal thoughts. Although OpenAI maintains these cases are “extremely rare,” 0.07% of the platform’s 800 million weekly active users, a figure reported by OpenAI CEO Sam Altman, works out to roughly 560,000 people each week. OpenAI has assembled a network of more than 170 psychiatrists, psychologists, and primary care physicians from 60 countries to advise on responses to these sensitive conversations.
Professor Paul Henman, a digital sociologist and policy expert at the University of Queensland, described the Nature Medicine study as “a really important paper.” He warned that widespread use of ChatGPT Health could lead both to unnecessary medical presentations for minor conditions and to failures to seek urgent care when it is needed, potentially resulting in “unnecessary harm and death.” He also raised the prospect of legal challenges against tech companies over suicide and self-harm following the use of AI chatbots.
Henman further emphasized the lack of transparency surrounding the platform’s development and training. “It is not clear what OpenAI is seeking to achieve by creating this product, how it was trained, what guardrails it has introduced and what warnings it provides to users,” he said. “Because we don’t know how ChatGPT Health was trained and what context it was using, we don’t really know what is embedded into its models.”