By Stacey Kusterbeck
Artificial intelligence (AI) tools are used for many healthcare-related applications, but what about helping clinicians with ethically complex medical decisions? A group of researchers conducted a study to evaluate ChatGPT’s “moral competence.”1 The authors concluded that ChatGPT demonstrates “medium” moral competence, with the newer version of ChatGPT showing an improved ability to evaluate moral arguments. Future versions of the tool could potentially assist physicians in ethically complex decision-making, according to the researchers. The authors suggest another possible application for AI tools: medical students could use them to generate case studies presenting challenging ethical scenarios on topics such as informed consent or allocation of resources.
Bioethicists are understandably wary of clinicians turning to AI tools for ethics guidance. “While AIs are interesting tools, they are not faultless. Like the healthcare provider who says, ‘Your Dr. Google is not the same as my MD experience,’ your AI ethicist is not the same as my decades of clinical ethics work,” says Craig M. Klugman, PhD, a professor of bioethics and health humanities at DePaul University and ethics committee member at Northwestern Memorial Hospital.
Rather than requesting a human ethics consult, clinicians might be tempted to bypass the entire process by asking AI. Also, healthcare professionals often do not like the answer that a clinical ethicist offers. “It is probably easier to ignore an algorithm than a human being who has entered a note in the medical chart. AI is less likely to offer a challenge,” says Klugman.
Therefore, ethicists have good reason to worry about AI becoming a workaround to ethics consultations. “In ethics, being proactive is always better than being reactive. At any hospital, it would be prudent for an ethicist to educate staff about the limits of AI for ethics questions and to propose policies that AI is not a substitute for a human ethics consultant,” advises Klugman.
One clinical ethics program created an algorithm to solve ethical dilemmas.2 The program was limited: it relied solely on Beauchamp and Childress’s Four Principles approach and could solve only fairly simple cases. “While the algorithm could not explain its reasoning behind its answers, the system would let you know which principles were in play. Most of the time, the human ethicists and the AI agreed. But what about the 25% of the time when it did not?” asks Klugman.
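The internals of the published system are not detailed here, but as a rough, purely hypothetical sketch of what flagging “which principles are in play” might look like, the toy Python example below matches keywords in a case description to the four principles. The keyword lists and sample case are invented for illustration and are not the authors’ method.

```python
# Purely hypothetical sketch: a toy "principle flagger" in the spirit of a
# four-principles approach (autonomy, beneficence, non-maleficence, justice).
# This is NOT the published clinical algorithm; the keyword lists and the
# sample case below are invented for illustration only.

PRINCIPLE_KEYWORDS = {
    "autonomy": ["consent", "refuses", "wishes", "surrogate", "capacity"],
    "beneficence": ["benefit", "recovery", "hospice", "comfort"],
    "non-maleficence": ["harm", "risk", "burden", "aggressive care"],
    "justice": ["allocation", "scarce", "fairness", "resources"],
}

def flag_principles(case_text: str) -> list[str]:
    """Return the principles whose keywords appear in the case description."""
    text = case_text.lower()
    return [
        principle
        for principle, keywords in PRINCIPLE_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]

if __name__ == "__main__":
    case = (
        "An unconscious patient has no surrogate decision-maker. The team is "
        "unsure whether to recommend hospice or continue aggressive care."
    )
    print(flag_principles(case))
    # Prints ['autonomy', 'beneficence', 'non-maleficence'] for this sample case;
    # a real consult would still require human judgment to weigh the principles.
```

A keyword matcher like this can surface which principles a case touches, but, as Klugman notes of the actual system, it cannot explain the reasoning behind an answer.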
Klugman recently conducted his own informal experiment by asking Google Gemini if it could conduct an ethics consult. The tool responded, “As an AI language model, I cannot provide an ethics consult in a hospital.” Klugman then constructed some simple, hypothetical cases, such as “A patient is unconscious without a surrogate decision-maker. No family or friends can be found. Without an intervention, the patient will die,” and asked, “Who should make medical decisions?” Next, Klugman tried more complex cases on the tool, roughly based on actual cases that had occurred years ago. One involved a family that was uncertain what to do and a healthcare team that was unsure whether to recommend that the patient enter hospice or undergo aggressive care. Klugman found that the more detailed the case, the less certain the AI became. For all the cases, both simple and complex, the AI recommended seeking an ethics consult.
The AI also generated these two statements: “I can provide information and guidance on medical ethics,” and “As an AI language model, I cannot provide medical advice.” “The system could not explain the difference between the two kinds of advice,” says Klugman. “While Gemini did OK with easy cases, it did not provide an answer on complex ones.”
In the end, the tool was only able to offer general steps to resolve cases, such as arranging a family meeting and weighing the patient’s best interest. “What AI can’t do today it may be able to do a month from now. The advancement is that rapid,” says Klugman. However, the tools will always be subject to whatever biases exist in the datasets on which they are built, and they cannot explain the reasoning behind their advice. “When AI does not know an answer, it tends to make it up. For example, some AIs have given medical advice that was not only wrong, but would have injured or killed the patient,” says Klugman. Algorithms have been known to “hallucinate,” making up responses that are factually incorrect or nonsensical. For example, when Klugman asked an AI for a list of top journals in bioethics, it named a few real ones but also many nonexistent ones. “Thus, anything they tell us should be independently verified,” concludes Klugman. “Even if one uses AI for general advice, it is always best to check with a human clinical ethicist. And only look to the AI if your hospital policy allows it. If there is no policy, request that your hospital create one.”
For general ethics questions that are not about a specific patient situation, language models can be helpful, according to Benjamin Krohmal, JD, HEC-C, director of the John J. Lynch, MD, Center for Ethics at MedStar Washington Hospital Center. “AI will likely become increasingly good at collecting and organizing information, with the caveat that information should be checked and verified because sometimes models ‘hallucinate,’” says Krohmal.
AI tools can help clinicians identify ethical issues and can suggest questions that might help resolve ethical uncertainty. “That said, ethical questions about specific patient situations should generally prompt a call to an ethicist or ethics committee,” says Krohmal. Resolving ethically complex questions often requires knowledge of very specific contextual information, which, for privacy reasons, should not be shared with an AI tool. “Since user prompts are saved by OpenAI, entering protected health information in a prompt amounts to sharing it with a third party,” Krohmal explains. Entering data about a patient, even if clinicians leave out identifying information, could violate their ethical obligations to protect patient privacy.
Answering patient-specific ethics questions also requires skilled communication with multiple stakeholders. “It is very often the case that speaking to patients, their representatives, and multidisciplinary members of the clinical team resolves seemingly entrenched disagreements — or identifies new dispositive facts that result in a much different resolution of the initial ethics question. That’s not something a language model can do at this point,” says Krohmal.
REFERENCES
1. Rashid AA, Skelly RA, Valdes CA, et al. Evaluating ChatGPT’s moral competence in health care-related ethical problems. JAMIA Open. 2024;7(3):ooae065.
2. Meier LJ, Hein A, Diepold K, Buyx A. Algorithms for ethical decision-making in the clinic: A proof of concept. Am J Bioeth. 2022;22(7):4-20.