Context is king:
Why most of clinical medicine will be immune to ChatGPT’s charms

ChatGPT is an impressive technological achievement that will certainly create numerous new product opportunities within healthcare. Providers are just in the early stages of “taking it for a test drive,” and I hate to dampen the enthusiasm, but we’re seeing a disconnect between the trial scenarios being run and how it will need to be used in practice. The number one rule for creating effective ChatGPT prompts is to provide clear and precise context. That works well for medical exams, where the test question itself provides the context, but in real life, across broad swaths of medicine, remembering and communicating context IS the issue. This raises the question of whether ChatGPT (and its brethren, large language models, or LLMs) will be used broadly to reduce clinical productivity waste and improve quality, or whether they’ll be used to just “plug some holes.”

To illustrate, we’ll use a representative example of where clinical productivity and quality problems occur, then look at how solutions based upon ChatGPT might apply. We’ll also go one step deeper and look at the existing evidence-based best practices to highlight additional challenges LLMs will face.

For our example we’ll use a nurse who’s caring for a patient and following an evidence-based clinical pathway – a scenario that’s repeated millions of times each year. More specifically, it’s a nurse on a medical ward caring for a patient who’s just had their catheter removed.

Nurses care for multiple patients, and keeping track of each patient’s status as they switch among them is a challenge – particularly since there are a variety of information sources that need to be synthesized (e.g., report sheet, voiding diary, EHR notes, etc.). If the nurse knows the clinical pathway thoroughly and can keep track of each patient’s details accurately, there may not be any problems. However, that’s not always the case, given staffing and training challenges, not to mention the sheer complexity and the cognitive overload that occurs. If the nurse is uncertain about the pathway and proceeds anyway, the result may be unproductive or inappropriate care (Table 1). Given those risks, the nurse will typically call a physician and present the case to get clarity about next steps, quite often interrupting the physician.

Because the nurse is uncertain, they’re often unable to present a clear and succinct summary – they’re unsure what information to present and where the patient is within the pathway. Throughout the day this wastes a significant amount of time, referred to as “collaboration overhead and interruption” (Table 1). In many cases the physician has to “step in” and perform a task that should be performed by a nurse, referred to as “poor skills-task alignment” (Table 1).

How big a problem is this? From a productivity standpoint, these types of problems – poor skills-task alignment, unproductive and inappropriate care, and collaboration overhead and interruption – waste an average of 34% of physicians’ total time, representing over 50% of all clinical productivity waste.


Table 1. Primary drivers of Clinical Productivity Waste (CPW) and the approximate % of time wasted

1. Physician time wasted on EHR: ~18-38%
2. Poor skills-task alignment: ~10-20%
3. Unproductive and inappropriate care: ~7-11%
4. Collaboration overhead and interruption: ~8-12%


From a quality standpoint, an inability to remember or present all the relevant information so team members share the same understanding of the case is one of the most common sources of errors. According to the Agency for Healthcare Research and Quality (AHRQ), the top three causes of medical errors are communication problems, inadequate information flow, and the human problems of not following pathways and procedures.

The cost of quality issues is more difficult to quantify. However, if we look at our example of indwelling catheters, in the US there’s a 12.9% chance of a catheter-associated urinary tract infection (CAUTI), which increases the hospital stay by 2-4 days. The attributable costs of a CAUTI are: $876 in additional inpatient costs to the hospital for diagnostic tests and medications; $1,764 for non-ICU patients; $8,398 for hospitalized pediatric patients; and $10,197 for ICU patients.
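As a rough back-of-envelope illustration – using only the figures cited above and making no claims beyond them – the expected attributable CAUTI cost per catheterized patient is simply the 12.9% rate multiplied by the per-setting cost. A minimal sketch in Python:

```python
# Back-of-envelope estimate of the expected attributable CAUTI cost per
# catheterized patient, using only the figures cited in the text above.
# (The $876 diagnostics/medications figure is not folded in here.)
CAUTI_RATE = 0.129  # 12.9% chance of a CAUTI

# Attributable cost per CAUTI event, by setting (from the text above)
ATTRIBUTABLE_COST = {
    "non-ICU adult": 1_764,
    "hospitalized pediatric": 8_398,
    "ICU": 10_197,
}

for setting, cost in ATTRIBUTABLE_COST.items():
    expected = CAUTI_RATE * cost
    print(f"{setting}: ~${expected:,.0f} expected attributable cost per catheterized patient")
```

For non-ICU adults alone, that works out to roughly $230 of expected cost for every catheterized patient – before counting the extra 2-4 days of stay.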

How might ChatGPT help to address these problems? The first hurdle is that ChatGPT needs to be provided with clear and precise context to answer accurately. Basically, ChatGPT needs a case presentation – not a full one, but one that’s concise and specific to the question at hand: where exactly the patient is in the pathway and the rationale for how they arrived there, the relevant details for any decisions already synthesized (LLMs are confused by irrelevant context), any specific issues or concerns, and so on. Let’s leave aside the fact that someone may need to type in (or transcribe) all those details. If the nurse has all that information, organized and accurate, the nurse probably doesn’t need ChatGPT. Ambiguity and errors in the case presentation ARE the root of the problem.
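To make that concrete, here’s a rough sketch of what “providing the context” actually looks like in code, using the OpenAI Python client. Every patient detail, the pathway step, and the model name below are hypothetical placeholders – the point isn’t the API call, it’s that someone (or some upstream system) still has to assemble this structured, accurate case summary before the model can help.

```python
# Minimal sketch: an LLM can only reason about the catheter-removal pathway if
# the case context is assembled and passed in explicitly. All patient details
# below are hypothetical placeholders; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# This structured case summary is exactly the hard part in practice: someone
# must already know where the patient is in the pathway and which details matter.
case_context = {
    "pathway": "indwelling urinary catheter removal",
    "pathway_step": "6 hours post-removal, patient has not yet voided",
    "relevant_details": [
        "catheter removed at 08:00",
        "oral intake ~600 mL since removal",
        "no suprapubic discomfort reported",
        "bladder scan not yet performed",
    ],
    "concern": "When is a bladder scan or re-catheterization indicated?",
}

prompt = (
    "You are assisting a ward nurse following an evidence-based "
    "post-catheter-removal clinical pathway.\n"
    f"Pathway: {case_context['pathway']}\n"
    f"Current step: {case_context['pathway_step']}\n"
    f"Relevant details: {'; '.join(case_context['relevant_details'])}\n"
    f"Question: {case_context['concern']}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Notice that if the nurse could already populate case_context accurately and completely, much of the original problem would be solved – the code merely relocates the hard part.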

Besides context, what other challenges will ChatGPT and its brethren face? What would a chat have to “look like” to follow evidence-based best practices for digital aids that solve the above types of problems?


Table 2. Evidence-based best practices for digital clinical aids – what the research says to make the aid – and the challenge each poses for ChatGPT

Readily available – Make it “poster on the wall” easy to find; clinicians won’t dig to find it. ChatGPT challenge: ChatGPT could be readily available if integrated into an EHR, with a separate chat for each patient (see the data-structure sketch after this table).

Graphical and appropriate for their skill level – 65% of people are visual learners, and a picture is worth a thousand words; there’s a reason we use flowcharts and graphics. If instructions take too long to read, clinicians won’t read them. ChatGPT challenge: ChatGPT is a text-based interface. We moved from text-based interfaces to GUIs about 50 years ago, and that transition took over a decade; hopefully it won’t take that long in the case of LLMs.

Familiar – Match exactly what clinicians were trained on; clinicians don’t have the time to re-learn or “cognitively translate” from what they learned to a different model. ChatGPT challenge: ChatGPT is designed not to replicate its training content – at best it can approximate. LLMs like ChatGPT are designed to be plausible, not precise.

Concise and precisely tailored – Make it highly specific to the patient and the task at hand, with the patient’s information tied in and extraneous information removed; context-specific aids substantially outperform general ones. ChatGPT challenge: ChatGPT can’t gather precisely relevant data in a trustworthy fashion.

Collaborative – Enable the whole clinical team to work in a coordinated manner; medicine is a team sport. ChatGPT challenge: ChatGPT could be collaborative, potentially with chats shared among users, each user with a different role.

Consolidated and actionable – Unify the task interaction so clinicians don’t need to switch among tools (retaining the EHR as the “system of record”), and reduce the cognitive load as much as possible, or clinicians won’t use it. ChatGPT challenge: ChatGPT is a separate tool, which would compound cognitive-load issues.

Usability tested and refined – Spend the time to ensure that the aid is effective for the specific users and the task at hand, and to avoid possible unintended harm or improper use. ChatGPT challenge: ChatGPT has no mechanism for usability testing and refinement; it’s a critical weakness of LLMs.
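Two of the rows above – “readily available” (a separate chat for each patient, inside the EHR) and “collaborative” (chats shared across the team, each user with a role) – imply a particular structure for how those chats would be kept. Below is a minimal sketch of per-patient, role-tagged chat sessions; every class, field, and identifier name is a hypothetical assumption, and a real implementation would live inside the EHR vendor’s framework rather than in a standalone script.

```python
# Minimal sketch of per-patient, role-aware chat sessions, as implied by the
# "readily available" and "collaborative" rows above. All names are
# hypothetical; a real version would be embedded in the EHR, not standalone.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChatMessage:
    author: str      # e.g., "RN Smith", "Dr. Jones", or "assistant"
    role: str        # clinical role: "nurse", "physician", "assistant"
    content: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class PatientChat:
    """One shared chat per patient, visible to the whole care team."""
    patient_id: str  # keyed to the EHR's patient identifier
    pathway: str     # e.g., "post-catheter-removal"
    messages: list[ChatMessage] = field(default_factory=list)

    def post(self, author: str, role: str, content: str) -> None:
        self.messages.append(ChatMessage(author, role, content))

    def transcript(self) -> str:
        """Render the shared history that would be passed to the LLM as context."""
        return "\n".join(f"[{m.role}] {m.author}: {m.content}" for m in self.messages)


# Usage: every team member reads and writes the same per-patient thread,
# so the team (and the LLM) always works from one consolidated context.
chat = PatientChat(patient_id="MRN-0000000", pathway="post-catheter-removal")
chat.post("RN Smith", "nurse", "Catheter removed 08:00; no void by 14:00.")
chat.post("Dr. Jones", "physician", "Please perform a bladder scan and post the result here.")
print(chat.transcript())
```

The design point is simple: the chat is keyed to the patient rather than to the clinician, so the whole team – and any LLM sitting behind it – always sees one consolidated, role-attributed history.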


Yes, ChatGPT is an impressive technological achievement. But ChatGPT and LLMs, on their own, have substantial inherent limitations that hinder their ability to improve productivity and quality across any broad spectrum of clinical use cases. Their ability to handle natural language interaction is unparalleled. If they can be combined with other tools that are graphical, familiar, precise, and trustworthy, together they’ll have a huge positive impact on medicine.