By Philip R. Fischer, MD, DTM&H
Professor of Pediatrics, Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, MN; Department of Pediatrics, Sheikh Shakhbout Medical City, Abu Dhabi, United Arab Emirates
SYNOPSIS: Chatbots, such as ChatGPT, may eventually be able to handle some straightforward infectious disease consultations. However, current technology does not yet instill confidence that chatbots will be adequately accurate or safe for complex patient care. In addition, philosophical and ethical considerations raise doubt about the value of replacing specialist physician consultations.
SOURCES: Maillard A, Micheli G, Lefevre L, et al. Can chatbot artificial intelligence replace infectious disease physicians in the management of bloodstream infections? A prospective cohort study. Clin Infect Dis 2023; Oct 12:ciad632. doi: 10.1093/cid/ciad632. [Online ahead of print].
Sarink MJ, Bakker IL, Anas AA, Yusuf E. A study on the performance of ChatGPT in infectious diseases clinical consultation. Clin Microbiol Infect 2023;29:1088-1089.
As artificial/augmented intelligence techniques, such as chatbots (machine-based simulated conversation responding to human questions), become available, there is some enthusiasm for using machine-learning technology to reduce physician involvement in subspecialty consultations. Two recent papers address the question of whether chatbot-based artificial intelligence could replace human infectious disease consultations.
Maillard and colleagues in France noted that, during the initial year of chatbot availability, the technology proved able to pass third-year medical student exams and to provide satisfactory responses to clinical infectious disease scenarios and actual cases. This investigative team then assessed the safety and quality of management advice from the chatbot Chat Generative Pre-trained Transformer 4 (ChatGPT-4) for real patients with positive blood cultures. Recommendations from specialist physicians and the chatbot were compared for 44 consecutive patients with positive blood cultures in a tertiary care hospital. For each patient, the chatbot’s recommendations were considered adequately detailed and well-written. For 59% of patients, the chatbot and the physician gave identical diagnoses. Chatbot-recommended diagnostic evaluations were considered satisfactory in 80% of cases. Chatbot-suggested initial antimicrobial therapy was considered satisfactory in 64% of patients and harmful in 2%; definitive treatment was subsequently considered appropriate in 36% of patients and harmful in 5%. Source control plans suggested by the chatbot were considered inadequate in 9% of patients. The overall multifaceted plan proposed by the chatbot was considered optimal in 2% of patients, satisfactory in 39%, and harmful in 26%. Despite some valid information being provided by the chatbot, the study team concluded that the chatbot consults were “hazardous.”
In the Netherlands, Sarink and colleagues compared 40 consultations by infectious disease specialists with chatbot (ChatGPT-3.5) consultations involving the same patients in early 2023. Patients were sometimes referred for consultation by other physicians and sometimes referred automatically because of positive blood cultures; chronic patients (defined as those requiring more than three clinical consultations) were excluded from the study. Chatbot consults were rated numerically from 1 (incorrect advice) to 5 (in full agreement with specialist advice). The mean chatbot consultation rating was 2.8 (3.3 for consultations prompted by a positive blood culture, 1.3 for osteomyelitis or a prosthetic joint infection). There were occasional internal inconsistencies in the chatbot recommendations, and the chatbot sometimes suggested testing that already had been completed. The authors concluded that then-current chatbots could provide diagnostic and therapeutic recommendations of “moderate” quality but that expert clinicians still were needed.
COMMENTARY
Chatbots use large language models, machine-learning algorithms trained on text from the internet, social media, books, and articles, to generate responses to user questions.1 The algorithms predict which words are most likely to follow other words, a process somewhat distinct from the actual meaning of the resulting sentences.1 Sometimes, human ranking of the resulting output is used to help the machine “learn” to provide better responses.1 There is some optimism that these sorts of artificial intelligence could mitigate the severe shortage of infectious disease clinicians, with 80% of counties in the United States lacking even a single infectious disease specialist.1 However, the current rate of inaccuracy of chatbot recommendations suggests that filling the gap in specialty care with machine-based learning systems could end up harming patients.1
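To make the “next most likely word” idea concrete, the brief sketch below builds a table of which words follow which other words in a few toy sentences and then samples likely continuations. It is purely illustrative and not drawn from the studies discussed here: real large language models use neural networks trained on vastly larger corpora rather than simple word counts, and the corpus and function names are invented for the example.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for the vast text a real large language model is trained on.
corpus = ("the blood culture grew staph aureus . "
          "the blood culture was negative . "
          "the patient received antibiotics .").split()

# Count how often each word follows each other word (a simple bigram table).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def generate(start, length=6):
    """Extend a prompt by repeatedly sampling a statistically likely next word."""
    words = [start]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:  # no observed continuation; stop
            break
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g., "the blood culture grew staph aureus"
```

The sketch illustrates why fluent-sounding output can arise without any model of meaning: the program tracks only which words tend to follow which, which is exactly the sense in which chatbot sentences are generated somewhat distinct from their actual meaning.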
A group of 33 clinicians generated 284 medical questions and evaluated the accuracy of chatbot responses.2 On a Likert scale of 1 to 6, where 6 represented completely accurate responses, the median accuracy rating for ChatGPT-3.5 was 5.5; the corresponding median completeness rating, however, was only 3 of 6.2 A subset of the questions was later submitted to ChatGPT-4, and the median accuracy rating improved to 6, suggesting that the machines can “learn” over time in ways that improve the quality of the responses.2 While a chatbot is not yet “safe” for clinical consultation, according to some experts, infectious disease clinicians should become informed about these improving technologies, as chatbots might become more accurate and useful in the future.1 A microbiologist reviewing applications of chatbots to clinical microbiology suggests that further evaluation and appropriate regulation will be required but that chatbots are likely to “have a substantial impact on medicine.”3
Of course, clinicians provide much more than cognitive knowledge for patients. There have been efforts to train chatbots to express empathy, with some reported success.1 Concerns about the ethics of incorporating chatbots into clinical care have generated significant thought and commentary.4
What about using a chatbot to help write a scientific paper? Human authors still should be fully responsible for the accuracy of any submitted content (and not defer to a chatbot as a co-author), and reference citations reported by chatbots should routinely be confirmed, since they are not always accurate.5
Underlying the discussion of whether machines will replace specialist clinician consultants is the question of what a doctor is and why we practice medicine. Increasingly, clinicians are seen as “providers” who supply information and prescriptions for treatments that are legal, feasible, and in line with patient desires.6 This can be a challenge for infectious disease clinicians, who often seek to decide which treatment option, among many, is “best” for a patient and, at the same time, “best” for the health of the community.
Shared decision-making and meeting the desires (rather than the actual needs) of patients can be difficult when patients insist on a specific treatment. Such a treatment may indeed be legal (according to governmental laws, though perhaps not according to institutional regulations and professional guidelines), feasible (based on possible payment and insurance coverage, though perhaps not on appropriate use of overall resources), and desired (based on what the patient has learned from advertisements, the internet, and friends, though perhaps not on what should be desired in terms of the best individual and public health outcomes). An alternative view is that, rather than being a “provider,” the physician is a “professional” with an educated opinion or belief to profess about what is proper and best for the true health of each patient, often with a view to what is also in line with public health or a greater good, rather than merely satisfying the wishes of an individual patient.6
It is difficult to predict what the new year and upcoming years will bring. Perhaps chatbots will succeed in meeting the unfulfilled dream of electronic medical records to make our work more efficient for professionals and more helpful to patients. Or maybe not. Along the way, we will continue to strive to be professionals who practice and teach medicine in ways that use our knowledge, skills, experiences, and beliefs to best serve the needs of patients and communities.
REFERENCES
1. Schwartz IS, Link KE, Daneshjou R, Cortés-Penfield N. Black box warning: Large language models and the future of infectious diseases consultation. Clin Infect Dis 2023; Nov 16:ciad633. doi: 10.1093/cid/ciad633. [Online ahead of print].
2. Goodman RS, Patrinely JR, Stone CA Jr, et al. Accuracy and reliability of chatbot responses to physician questions. JAMA Netw Open 2023;6:e2336483.
3. Egli A. ChatGPT, GPT-4, and other large language models: The next revolution for clinical microbiology? Clin Infect Dis 2023;77:1322-1328.
4. Parviainen J, Rantala J. Chatbot breakthrough in the 2020s? An ethical reflection on the trend of automated consultations in health care. Med Health Care Philos 2022;25:61-71.
5. Guleria A, Krishan K, Sharma V, Kanchan T. ChatGPT: Ethical concerns and challenges in academics and research. J Infect Dev Ctries 2023;17:1292-1299.
6. Curlin F, Tollefsen C. The Way of Medicine: Ethics and the Healing Profession. University of Notre Dame Press; 2021.