With long waiting lists and rising costs straining healthcare systems, many people are turning to AI-powered chatbots like ChatGPT for medical self-diagnosis. According to one recent survey, roughly one in six American adults already consults chatbots for health advice at least once a month.
Relying too heavily on the chatbots’ answers is risky, however, in part because people tend to struggle to give a chatbot the information it needs to produce the best possible health recommendations, according to a recent Oxford-led study.
Adam Mahdi, director of graduate studies at the Oxford Internet Institute and a co-author of the study, explained to TechCrunch, “The study revealed a two-way communication breakdown. Users of [chatbots] did not make better decisions compared to those who utilized traditional methods such as online searches or their own judgment.”
In the study, around 1,300 people in the U.K. were given medical scenarios written by a group of doctors and asked to identify potential health conditions in each scenario and, using chatbots as well as their own methods, figure out possible courses of action, such as seeing a doctor or going to the hospital.
Participants used several AI models, including GPT-4o, the default model underpinning ChatGPT, as well as Cohere’s Command R+ and Meta’s Llama 3, which once powered Meta’s AI assistant. Using the chatbots, the researchers found, made participants less likely to identify relevant health conditions, and more likely to underestimate the severity of the conditions they did identify.
Mahdi noted that participants often omitted key details when querying the chatbots, or received answers that were difficult to interpret. “The answers they received [from the chatbots] often mixed good and poor recommendations,” he said.
These revelations come as technology firms increasingly pitch AI as a way to improve health outcomes. Apple, for instance, is reportedly developing an AI tool that can dispense advice on exercise, diet, and sleep; Amazon is exploring AI-based ways to analyze medical databases for “social determinants of health”; and Microsoft is helping build AI to triage messages that patients send to care providers.
Nevertheless, as previously reported by TechCrunch, both healthcare professionals and patients are divided on whether AI is suitable for high-risk health applications. The American Medical Association advises against using chatbots like ChatGPT for clinical decision support, and leading AI companies, including OpenAI, caution against making diagnoses based solely on chatbot outputs.
“We recommend relying on credible sources for healthcare decisions,” Mahdi advised. “Current evaluation methods for [chatbots] do not address the complexities of human interaction. Similar to how new medications undergo clinical trials, [chatbot] systems should be tested in real-world scenarios before implementation.”