Bethany Crystal of New York credits ChatGPT with identifying a potentially life-threatening condition after she noticed unexplained red spots on her legs. When she described the symptoms to the chatbot, it urged her to seek immediate medical evaluation, which ultimately led to a diagnosis of immune thrombocytopenic purpura, a rare autoimmune disorder that can cause dangerous bleeding.
Crystal’s experience is increasingly common. OpenAI, the creator of ChatGPT, reports that hundreds of millions of people now consult the chatbot each week for wellness advice. In January 2026, the company launched “ChatGPT Health,” a platform designed to make sharing medical records and data more secure, signaling a formal move into the healthcare space.
The growing reliance on AI for medical guidance is prompting both excitement and caution within the medical community. Doctors acknowledge the potential benefits of AI tools, including improved diagnostic accuracy and more efficient patient communication. However, recent research casts doubt on the reliability of these tools when interacting with real patients.
A study conducted by researchers at the University of Oxford, published in the journal Nature Medicine, found that while large language models like GPT-4o, Llama 3, and Command R+ excel at medical knowledge tests, even passing the US medical licensing exam, their performance falters when presented with real-world patient scenarios. The researchers gave 1,298 participants ten common medical scenarios and asked them to assess the severity of each situation and recommend appropriate action. Participants who used the AI chatbots assessed the scenarios no more accurately than a control group did.
The Oxford study highlights a critical disconnect between the theoretical knowledge of AI models and their ability to apply that knowledge in practical, patient-facing situations. Researchers found that the clinical knowledge of the models did not translate effectively to interactions with real people, raising concerns about their suitability as a primary source of medical advice for the general public.
Despite these concerns, some doctors are already integrating ChatGPT and similar AI tools into their clinical practice. According to Mobius MD, physicians are leveraging these tools to improve efficiency and reduce documentation time. The chatbot’s ability to summarize patient records and classify symptoms is seen as a potential time-saver for busy healthcare professionals.
The UK’s National Health Service (NHS) has also been exploring the potential of AI chatbots as a “new gateway to the healthcare system,” according to a recent strategy paper. However, the Oxford study’s findings suggest that widespread implementation of AI-powered triage systems may require further scrutiny and refinement.
OpenAI’s introduction of ChatGPT Health aims to address some of the privacy and security concerns associated with sharing sensitive medical information with an AI chatbot. The platform offers enhanced security features and a design informed by medical professionals. However, the fundamental question of whether AI can reliably provide accurate and safe medical advice remains unanswered.