Dr. Google had its issues. Can ChatGPT Health do better?

Some doctors see LLMs as a boon for medical literacy. The average patient might struggle to navigate the vast landscape of online medical information—and, in particular, to distinguish high-quality sources from polished but factually dubious websites—but LLMs can do that job for them, at least in theory. Treating patients who had searched for their symptoms on Google required “a lot of attacking patient anxiety [and] reducing misinformation,” says Marc Succi, an associate professor at Harvard Medical School and a practicing radiologist. But now, he says, “you see patients with a college education, a high school education, asking questions at the level of something an early med student might ask.”

The release of ChatGPT Health, and Anthropic’s subsequent announcement of new health integrations for Claude, indicate that the AI giants are increasingly willing to acknowledge and encourage health-related uses of their models. Such uses certainly come with risks, given LLMs’ well-documented tendencies to agree with users and to make up information rather than admit ignorance.

But those risks also have to be weighed against potential benefits. There’s an analogy here to autonomous vehicles: When policymakers consider whether to allow Waymo in their city, the key metric is not whether its cars are ever involved in accidents but whether they cause less harm than the status quo of relying on human drivers. If Dr. ChatGPT is an improvement over Dr. Google—and early evidence suggests it might be—it could lessen the enormous burden of medical misinformation and unneeded health anxiety that the internet has created.

Pinning down the effectiveness of a chatbot such as ChatGPT or Claude for consumer health, though, is tricky. “It’s exceedingly difficult to evaluate an open-ended chatbot,” says Danielle Bitterman, the clinical lead for data science and AI at the Mass General Brigham health-care system. Large language models score well on medical licensing examinations, but those exams use multiple-choice questions that don’t reflect how people use chatbots to look up medical information.

Sirisha Rambhatla, an assistant professor of management science and engineering at the University of Waterloo, attempted to close that gap by evaluating how GPT-4o responded to licensing exam questions when it did not have access to a list of possible answers. Medical experts who evaluated the responses scored only about half of them as entirely correct. But multiple-choice exam questions are designed to be tricky enough that the answer options don’t give them entir
