Case presentation as a core quality improvement strategy

Urethral catheter management provides a good example of a case presentation problem that not only touches on all aspects of Clinical Productivity Waste (CPW), but also substantially impacts care quality. To give you an idea of the problem’s scale, according to the Centers for Disease Control and Prevention (CDC), between 15% and 25% of all hospitalized patients end up with an indwelling urethral catheter. From a quality perspective, the related inappropriate care is responsible for up to 380,000 preventable infections, 9,000 preventable deaths, and more than $5B in preventable costs to hospitals. Unfortunately, these urethral catheter management issues are largely due to ineffective case presentation.

To illustrate, let’s pretend that there are three clipboards that will hang sequentially on the bed of each catheterized patient. The clipboards correspond to three time periods: the time prior to catheterization, the time while the catheter is in place, and the time following removal. Let’s also assume they each have the applicable case presentation template printed on them. And finally, let’s assume there’s a Physician Assistant (PA) whose only job is to keep them in sync with the EHR.

There are five persistent barriers to appropriate catheter management:

Lack of agreement on and awareness of standard protocol: Clinicians may have been trained on various catheter management protocols, which exacerbates the issue, but most frequently there’s simply a lack of awareness of what the relevant protocol is. Having the case presentation template readily available “on the bedside clipboard” serves as a cognitive aid. For example, if a nurse and physician are discussing the case for removing a catheter, having the clipboard containing the protocol, with the relevant “check boxes checked,” makes the presentation and discussion swift, and makes the protocol easy to follow accurately.

Catheter data is hard to enter and find: The data is difficult to find in the EHR, and is often temporarily recorded elsewhere (paper, Excel, online form, etc.). It is neither consolidated nor readily available when needed. Having all the essential catheter protocol information recorded on the clipboard at the bedside, ready for actionable presentation, addresses the issue.

Communication challenges: Communication among clinicians is problematic. Messages and calls to physicians interrupt them and put the onus on physicians to gather the data and figure out the status. Although rounds might be an opportunity to present the catheter management status, it’s often difficult to sync the nurses’ workflow with the physicians’. Having the case presentation on an easily visible clipboard, so any clinician can see it when they’re at the bedside, would enable the requisite asynchronous communication (an electronic version would also be required for remote use).

Confusion about authority: Empowering nursing staff to remove the catheter has proven challenging. Physicians are unsure how well nurses understand the protocol for the situation, so they simply add a “do not remove” order and ask the nurses to “call urology.” A case presentation clipboard with a clear protocol, specific authorities, and the protocol’s patient data documented to assure adherence (enabling physicians to delegate with confidence) would go a long way toward alleviating this issue.

Catheter removal is not a priority: Clinicians don’t think about a catheter unless an issue or complication arises, and physicians are commonly unaware that their patients have one. Moreover, it’s easier for overburdened clinicians simply to leave catheters in. The case presentation clipboard might partially mitigate the issue by being highly visible and by reducing documentation burden substantially.

Urethral catheter management quality issues are largely due to ineffective case presentation. The example highlights the need for case presentations that:

  • are readily available and make the protocol / guidelines clear;
  • consolidate essential information for the purpose;
  • may be viewed asynchronously to avoid interruptions;
  • incorporate clear lines of authority; and
  • reduce documentation burden to simplify adherence.

Urethral catheter management also reflects a substantial CPW problem. It includes wasted EHR time; time lost to poor skills-task alignment; all the time wasted dealing with the consequences of inappropriate care; and the time wasted from interruptions.

Context is king:
Why most of clinical medicine will be immune to ChatGPT’s charms

ChatGPT is an impressive technological achievement that will certainly create numerous new product opportunities within healthcare. Providers are just in the early stages of “taking it for a test drive,” and I hate to dampen the enthusiasm, but we’re seeing a disconnect between the trial scenarios being run and how it will need to be used in practice. The number one rule for creating effective ChatGPT prompts is to provide clear and precise context. This may work well for medical exams, where the test question itself provides the context, but in real life, across broad swaths of medicine, remembering and communicating context IS the issue. This raises the question of whether ChatGPT (and its brethren, Large Language Models or LLMs) will be used broadly to reduce clinical productivity waste and improve quality, or whether they’ll be used to just “plug some holes.”

To illustrate, we’ll use a representative example of where clinical productivity and quality problems occur, then look at how solutions based upon ChatGPT might apply. We’ll also go one step deeper and look at the existing evidence-based best practices to highlight additional challenges LLMs will face.

For our example we’ll use a nurse who’s caring for a patient and following an evidence-based clinical pathway – a scenario that’s repeated millions of times each year. More specifically, it’s a nurse on a medical ward caring for a patient who’s just had their catheter removed.

Nurses care for multiple patients. Keeping track of each patient’s status as they switch among them is a challenge – particularly since there are a variety of information sources that need to be synthesized (e.g., report sheet, voiding diary, EHR notes, etc.). If the nurse knows the clinical pathway thoroughly and can keep track of each patient’s details accurately, there may not be any problems. However, that’s not always the case, given staffing and training challenges, not to mention the sheer complexity and the cognitive overload that occurs. If the nurse has some uncertainty about the pathway and proceeds anyway, there may be unproductive or inappropriate care (Table 1). Given those risks, the nurse typically will call a physician and present the case to get clarity about next steps, quite often interrupting the physician.

Because the nurse is uncertain, often they’re unable to present a clear and succinct summary – they’re uncertain about what information to present and where the patient is within the pathway. Throughout the day, this wastes a significant amount of time – it’s referred to as “collaboration overhead and interruption” (Table 1). In many cases the physician will have to “step in,” performing a task that should be performed by a nurse – this is referred to as “poor skills-task alignment” (Table 1).

How big a problem is this? From a productivity standpoint, these types of problems – problems comprising poor skills-task alignment, unproductive and inappropriate care, or collaboration overhead and interruption – waste an average of 34% of physicians’ total time, representing over 50% of all clinical productivity waste.


Table 1
Primary drivers of Clinical Productivity Waste (CPW)        % of time wasted
  1. Physician time wasted on EHR                           ~18-38%
  2. Poor skills-task alignment                             ~10-20%
  3. Unproductive and inappropriate care                    ~7-11%
  4. Collaboration overhead and interruption                ~8-12%
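For readers who want to check the arithmetic, here’s a minimal sketch in Python, assuming (my assumption, since the derivation isn’t shown) that the 34% and “over 50% of CPW” figures come from the midpoints of the ranges in Table 1:

```python
# Rough back-of-the-envelope check on the figures cited above, using the
# midpoints of the Table 1 ranges (an assumption about how they were derived).

cpw_drivers = {
    "Physician time wasted on EHR": (18, 38),
    "Poor skills-task alignment": (10, 20),
    "Unproductive and inappropriate care": (7, 11),
    "Collaboration overhead and interruption": (8, 12),
}

def midpoint(rng):
    lo, hi = rng
    return (lo + hi) / 2

# The three non-EHR drivers discussed in the text.
non_ehr = [d for d in cpw_drivers if d != "Physician time wasted on EHR"]

non_ehr_share = sum(midpoint(cpw_drivers[d]) for d in non_ehr)  # 15 + 9 + 10 = 34
total_cpw = sum(midpoint(r) for r in cpw_drivers.values())      # 28 + 15 + 9 + 10 = 62

print(f"Non-EHR drivers: ~{non_ehr_share:.0f}% of physician time")  # ~34%
print(f"Share of total CPW: ~{non_ehr_share / total_cpw:.0%}")      # ~55%, i.e., over 50%
```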


From a quality standpoint, an inability to remember or present all the relevant information so team members share the same understanding of the case is one of the most common sources of errors. According to the Agency for Healthcare Research and Quality (AHRQ), the top three causes of medical errors are communication problems, inadequate information flow, and the human problems of not following pathways and procedures.

The cost of quality issues is more difficult to quantify. However, if we look at our example of indwelling catheters, in the US there’s a 12.9% chance of a catheter-associated urinary tract infection (CAUTI), which increases the hospital stay by 2-4 days. The attributable costs of a CAUTI are: $876 in inpatient costs to the hospital for additional diagnostic tests and medications; $1,764 for non-ICU patients; $8,398 for hospitalized pediatric patients; and $10,197 for ICU patients.

How might ChatGPT help to address these problems? The first hurdle is that ChatGPT needs to be provided with clear and precise context to answer accurately. Basically, ChatGPT needs a case presentation – not a full one, but rather one that’s concise and specific to the question at hand: where exactly the patient is in the pathway and the rationale for how they arrived there, the relevant details needed to make any decisions already synthesized (LLMs are confused by irrelevant context), any specific issues or concerns, etc. Let’s leave aside that someone may need to type in (or transcribe) all the details. If the nurse has all that information, organized and accurate, the nurse probably doesn’t need ChatGPT. Ambiguity and errors in the case presentation ARE the root of the problem.

Besides context, what other challenges will ChatGPT and its brethren face? What would a chat have to “look like” to follow evidence-based best practices for digital aids that solve the above types of problems?


Table 2
Evidence-based best practices for digital aids (what the research says to make the aid), and the corresponding ChatGPT challenge:

  • Readily available – Make it “poster on the wall” easy to find. Clinicians won’t dig to find it.
    ChatGPT challenge: ChatGPT could be readily available, if integrated into an EHR, with a separate chat for each patient.
  • Graphical and appropriate for their skill level – 65% of people are visual learners, and a picture is worth a thousand words; there’s a reason we use flowcharts and graphics. If instructions take too long to read, clinicians won’t read them.
    ChatGPT challenge: ChatGPT is a text-based interface. We moved from text-based interfaces to GUIs about 50 years ago, and it took over a decade. Hopefully it won’t take that long in the case of LLMs.
  • Familiar – Match exactly what clinicians were trained on. Clinicians don’t have the time to re-learn or “cognitively translate” from what they learned to a different model.
    ChatGPT challenge: ChatGPT is designed not to replicate its training content; at best it can approximate. LLMs like ChatGPT are designed to be plausible, not precise.
  • Concise and precisely tailored – Make it highly specific to the patient and the task at hand, with the patient’s information tied in and extraneous information removed. Context-specific aids substantially outperform general ones.
    ChatGPT challenge: ChatGPT can’t gather precisely relevant data in a trustworthy fashion.
  • Collaborative – Enable the whole clinical team to work in a coordinated manner. Medicine is a team sport.
    ChatGPT challenge: ChatGPT could be collaborative, potentially with chats shared among users, each user with a different role.
  • Consolidated and actionable – Unify the task interaction so clinicians don’t need to switch among tools (retain the EHR as the “system of record”). Reduce the cognitive load as much as possible, or clinicians won’t use it.
    ChatGPT challenge: ChatGPT is a separate tool, which would compound cognitive load issues.
  • Usability tested and refined – Spend the time to ensure that the aid is effective for the specific users and the task at hand, and to avoid possible unintended harm or improper use.
    ChatGPT challenge: ChatGPT has no mechanism for usability testing and refinement. It’s a critical weakness of LLMs.


Yes, ChatGPT is an impressive technological achievement. But ChatGPT and LLMs, on their own, have substantial inherent limitations that hinder their ability to improve productivity and quality across any broad spectrum of clinical use cases. Their ability to handle natural language interaction is unparalleled. If they can be combined with other tools that are graphical, familiar, precise, and trustworthy, together they’ll have a huge positive impact on medicine.

Evaluating risk when using AI in healthcare – the TRUST framework

ChatGPT has created a surge in interest in the medical community – certainly if you measure by the number of publications about it. There are many potential use cases for these impressive types of Large Language Model (LLM) AI — drafting messages to patients, giving general advice, providing medical education, and more. But for every success like passing the medical licensing exam or providing better responses to patient messages than physicians do, there are failures, like flunking the gastroenterology exam and completely making up medical citations to justify its answers.

So, when will AI like ChatGPT be safe to use in medicine? A more precise way to ask that question is: when will the risks of using a particular AI model in a particular scenario be acceptable to your particular organization, given the potential benefits? Here we talk about the risk side of that equation – what’s a framework for evaluating the risks to see if they’re acceptable? We call it the TRUST framework: Transparent, Reviewable, Understandable, Secure, Testable. It applies broadly to using AI in medicine, not just to GPTs.

Let’s start with four certainties:

  • AI is most accurate when it has lots of examples. It can be downright inaccurate if it has few (class imbalance / long tail data distribution).
  • If the instances being asked about don’t fit well within the AI’s training, the results can go wrong (domain shift).
  • Real people are often curating inputs or results. Their biases get incorporated into the model (cognitive bias).
  • Bad training data means bad results — AI is no exception to the garbage-in-garbage-out rule.

AI use in medicine must be weighed against these certainties. For example, there are far too many examples of AI bias to choose from, concerning race, gender, age, and socioeconomic status, among other factors. A model for detecting skin cancer was thought to be highly accurate, but was later found to be less than half as accurate for people of color because it was trained on datasets of predominantly fair-skinned patients. In cardiology, coronary heart disease (CHD) is overwhelmingly misdiagnosed in women, yet prediction models are trained on predominantly male datasets. A sleep scoring model seemed to work well, but failed miserably at detecting sleep disorders in older patients because there weren’t enough of them in the training dataset. A model for asthma management in children was found to be far less accurate for those of lower socioeconomic status, largely due to incomplete EHR source data.

Another key risk area for AI in medicine is context. Consider the challenges of using AI trained on a specific large corpus of medical knowledge: clinical guidelines. The question for a healthcare provider is whether the model matches your specific context:

  • Is the AI model trained on the same guidelines you use: from your country, from the appropriate specialty society or source, targeted at your mix of skills and equipment, incorporating your most recent advances, and for your patient population? If not, the model may not be the right one.
  • How recent are the guidelines, and have they changed since the original model training? If they’re out of date, the model will be out of date as well.
  • Was the model trained using EHR data as a proxy for the guidelines? EHRs are notoriously full of errors, and only ~30-50% of physicians use the most recent clinical guidelines. Thus, the model may contain errors.

In each of these cases, the model may not prioritize the right information, display the right data, or suggest the right course of action for your context — all leading to avoidable errors. It’s emblematic of the broader risks created by AI.

Now let’s look at how the TRUST framework can help to understand AI risks.

Transparent

Transparency is about seeing inside the black box of AI. Visibility into the training data, methods, and curators lets you gauge potential bias, patient population mismatch, likelihood of misdiagnosis, reliance on outdated information — the list goes on (and on). Without transparency, AI risk is higher — and certainly more difficult to assess.

From a transparency perspective, insight into the exact composition of the training corpus is essential to evaluate risk. Yet LLMs trained on large bodies of publications are particularly inscrutable in this regard. Even if you’re provided full training data “transparency” by an AI vendor, it may not be practical to dig into the training datasets themselves. It may be more practical to require transparency into the characteristics of the training data (e.g., specific population), the curators (e.g., demographics), and the methodologies (if any) used to assess and mitigate bias and other errors. There are tools designed specifically to address transparent reporting (like TRIPOD), to assess the risk of bias (like PROBAST), and others to mitigate ML bias.

Recommendation: Require transparency into training data, methods, and curators. Assess if they match your full context and assume higher risk if they don’t. Require vendors to follow established methodologies to mitigate errors and bias.

Reviewable

While transparency helps to mitigate individual model risk, reviewability helps to mitigate more systemic risk. It’s most applicable to composite AI, which uses more than one model in sequence or collectively to reach a result. It’s also applicable to composite AI’s simpler brethren — AI that gathers data from an EHR — because both share the same underlying problem: can you review interim results to ensure that errors are intercepted and corrected so they don’t propagate and cause larger problems? In the complex world of healthcare, “No AI is an island.” The level of reviewability substantially affects the risk of using AI.

Here’s a personal anecdote to illustrate. My EHR documentation says I have heart tumors. Yes, heart tumors. Fortunately, I don’t. After a routine CT, an ML model mis-transcribed “unremarkable heart chambers” into “unremarkable heart tumors.” As I learned the hard way, the word “unremarkable” really doesn’t fit with “heart tumors,” so a subsequent AI model is quite likely to ignore the qualifier. Nor is my problem (records error, not heart tumors) uncommon. Up to half of health records may contain an error, 16% of which may be serious. EHRs are full of errors, omissions, and conflicting data, among other problems. Any AI used to help determine my medications, treatments, risk evaluation, insurance, and so on would all be detrimentally impacted without the ability to intercept the error in-line and correct it for downstream use.

Recommendation: Ensure there’s reviewability at each step in the data pipeline. Ensure that there’s a mechanism for updates, corrections, and consults to supplant erroneous data or results for “downstream” use.

Understandable

Using an AI model entails a lot more risk when you’re unable to understand how it reached a result. Understandability in AI is generally drawn from interpretability and explainability. Interpretability refers to models that are transparent in terms of how outcomes are generated — their “internal chain of thought” or reasoning, if you will. By contrast, explainability refers to creating a second model to explain the initial ML-based system results because the core ML-based system isn’t necessarily transparent (say, because it has millions of parameters). Having an interpretable model means that clinicians can understand and review how an outcome was generated. Consequently, interpretable models are better able to engender trust and less prone to error propagation. By contrast, explainable ML is often unreliable, can be misleading, and may fail to deliver clarity about function and objectives. Think of it as being rewarded for persuasiveness, not accuracy — obviously problematic in most of healthcare.

Recommendation: Focus on using inherently interpretable AI models. Ensure that in a complex reasoning chain, all the interim results are visible, that the rationale for individual steps can be traced back to reputable sources, and that the full context is evident.

Secure

Healthcare has unique security and privacy concerns that impact AI risk. While HIPAA and traditional cybersecurity measures represent ground floor elements in secure AI, medical AI risk also derives from the complex interaction between training data and the training algorithm. If vulnerabilities are found, you can’t simply “patch and continue.” The model itself may need retraining.

AI extends privacy and security threat vectors into new realms. In some AI, even if the original data is deleted, model inversion attacks can reconstruct the original training data. ChatGPT is known to memorize training data that should be protected. If a healthcare AI model trained on patient information is “out in the wild,” private patient data can be exposed with the correct attack — a gross violation of privacy. Patient context provided to an LLM in an ongoing conversation is also an issue: all the requisite patient information consumed to answer questions is transferred and stored for at least the duration of that conversation. Where and how it’s stored is an obvious attack vector.

Recommendation: Ensure that patient data and context, as well as the AI model itself, remain “inside your four walls.” Ensure that patient data is neither used in training, nor in model improvement, unless appropriately licensed and secured.

Testable

How can you verify trustworthiness and assess risk except through actual testing? The narrower the use case, the more readily testable the model is — and the more likely it is the developer will provide testing. Of course, given the problems of domain shift, any AI model needs to be tested in situ before being put into practice, which impacts ROI. Methodologies for validating models and quality criteria in AI are coming, but it’s unclear how (if ever) they’ll apply to LLMs. LLM plug-ins may be able to address the issue in specific problem segments, but only if there’s a way to validate that they’re being called appropriately.

Recommendation: Adopt a framework for testing and validation. Prioritize understandable, reviewable AI, where individual steps can be separately and collectively verified via testing.

Conclusion

Popular AI technologies like ChatGPT are impressive, but they are also risky for clinical use because they’re not trustworthy (and it’s possible they may never be). That said, they offer the immediate promise of providing substantial productivity improvements in situations where the stakes are lower — for example, generating draft communications for patients like email messages or discharge notes, answering general medical questions quickly, and the like.

Risk must be weighed against reward when using AI in healthcare settings. The narrower the AI’s training, use case, and context, the more straightforward it is to assess and mitigate risk. Conversely, the broader the training and the more systemic the AI’s use, the more challenging risk assessment becomes — and the more methodical mitigation must be.

Well-founded fears of poorly understood “black box” conclusions or AI-triggered errors harming patient health rank among the current barriers to AI adoption in healthcare.

For medical use, AI that you can TRUST — that is fully transparent, reviewable, understandable, secure, and testable — has a much lower risk profile, better addresses those barriers, and will be more easily adopted within health organizations.

Clinical Productivity Waste (CPW):
The underlying cause of physician burnout costs providers $372B/yr.

US physicians are overloaded and burned out. They now work upwards of 11 hours per day on average. Over 60% exhibit burnout symptoms, over 20% want to leave the profession, and one in ten has contemplated suicide.†

A recent survey of over 20,000 physicians identified pervasive burnout drivers and aggravating factors: crushing workload, time pressures and related stress; electronic health record (EHR) software use; inefficient teamwork; chaotic work environment; poor work control; and not feeling valued.

These are all symptoms of a much deeper problem: clinical productivity waste.

Clinical productivity waste (CPW) is the squandering of physician time and talent. It is what drives physician burnout — and it now consumes most of their workday. On average, ~62% of physicians’ time is wasted.

The cost of CPW to healthcare providers is substantial. The often-cited figure for physician burnout cost is $4.6B annually. However, its primary root cause — CPW — actually costs the healthcare industry $372B/yr.

The math is straightforward: the total direct expense of physicians in the healthcare industry is just over $600B/yr, and the mean percentage of physician time lost to CPW is 62%; 62% of $600B is $372B.
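The same arithmetic as a minimal sketch, using the round figures cited above:

```python
# Minimal sketch of the CPW cost arithmetic described above (round figures as cited).
total_physician_expense = 600e9   # just over $600B/yr in direct physician expense
mean_cpw_fraction = 0.62          # ~62% of physician time wasted, on average

annual_cpw_cost = total_physician_expense * mean_cpw_fraction
print(f"Annual cost of CPW: ~${annual_cpw_cost / 1e9:.0f}B")   # ~$372B
```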

Breaking Down CPW

There are four primary ways in which physician time is wasted, contributing to burnout and accruing as CPW:

Primary drivers of CPW                                             % of time wasted
  1. Physician time wasted on EHR                                  ~18-38%
  2. Poor skills-task alignment (inefficient teamwork)             ~10-20%
  3. Unproductive and inappropriate care (chaotic)                 ~7-11%
  4. Collaboration overhead and interruption (poor work control)   ~8-12%


Here are some examples of each:

  • Wasted EHR time. Electronic Health Record (EHR) software is designed primarily for administrative use, as opposed to clinical utility. But it has become the de facto core of healthcare IT. Result: for every hour spent seeing patients, physicians spend around two hours of EHR time, much of it wasted. Consider performing a simple chart review, which accounts for 33% of the ~18-38% of physician time wasted in the EHR: going into the patient’s records, digging up those relevant to your specialty or immediate needs, then synthesizing what’s going on in the case. For example, in a cancer case, the physician might want pertinent information such as the location and grade of the tumor, the risk stratification, the results from the most recent CT, where the patient is in the treatment plan, and so on. But this information is spread over numerous records in the EHR that not only must be searched for and found in the system, but also read and parsed, then extracted and synthesized for the physician to fully understand the case.
  • Poor skills-task alignment. Given current staffing and training issues in healthcare, physicians are often performing tasks that other, lesser-skilled staff could or should be doing on their own. For example, when a nurse in cardiology is caring for a patient with suspected heart failure, there’s a protocol to follow. But with a shortage of cardiology NPs, the nurse on duty may need to ask the physician “what’s next” at each step. Before making any decisions, the physician has to understand where the nurse is in the protocol (and whether the protocol has been followed), as well as what the status and updated situation is – and that’s time wasted relative to physicians working “at the top of their license.”
  • Unproductive or inappropriate care. There are numerous examples where physician time is spent needlessly. For example, a referral of a patient to a specialist often comes with a long fax of records on the case. The specialist wades through the fax to make sense of the case only to find that the referral is premature or even that they’re not the right specialist (say it’s a spine surgeon and the patient hasn’t yet undergone the prerequisite physical therapy).
  • Collaboration overhead and interruption. Collaboration is essential in healthcare, but interruptions and task-switching waste substantial amounts of time and incur increased cognitive burden. In the cardiology example above, each time the nurse contacts the physician about what to do next, the physician needs to spend time refreshing their memory on the case, answering the question, and then refocusing on what they were originally doing. That may only take 2-3 minutes per interruption, but the wasted time adds up quickly.

The amounts of time wasted in each of these examples will vary by physician and situation. One physician may always be required to wrestle information from the EHR, while another can “staff it out” for planned appointments. Another will waste more time dealing with unproductive diagnostics, and another covering gaps in support staff skills or training. Importantly, CPW derives from all these drivers. EHR inefficiency, while it’s the most widely discussed physician productivity drain, still accounts for less than half of total CPW.

The Financial Case for CPW Cost Recovery

US physician burnout constitutes a healthcare crisis, and the AAMC now projects that the current shortage of ~40,000 physicians may grow to a shortage of as many as 110,000 physicians by 2030. Unsurprisingly, according to a survey of healthcare CEOs, staffing was their number one challenge in 2022, followed by financial challenges and safety/quality concerns — all impacted by CPW.

Quite apart from the urgent need to improve physician professional satisfaction, CPW is the most egregious deficiency in healthcare function and represents a staggering opportunity for cost recovery. In a mid-sized hospital with 575 acute care beds and 1,100 physicians on staff, the annual cost savings of eliminating 15% of CPW would be $55.4M.

Saving even a quarter of physicians’ wasted productivity (and 15% of their total time) would cut industry costs by over $90B. Said another way, if 15% of wasted productivity were eliminated and all the savings allocated to cost, it would improve a provider’s operating income by an absolute 2% (a -0.5% operating income becomes a +1.5% operating income). That’s substantial when you consider that average hospital operating margins fell from -0.7% in December 2022 to -1% in January 2023, with over 600 rural hospitals in danger of closure.
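Here’s a minimal sketch of the industry-level arithmetic, using the $372B CPW estimate from above (the per-hospital $55.4M and the 2-point operating-income figures also depend on hospital cost and revenue assumptions not shown here):

```python
# Sketch of the industry-level savings arithmetic, using the $372B CPW estimate above.
annual_cpw_cost = 372e9          # total annual CPW cost across the industry
mean_cpw_fraction = 0.62         # share of physician time wasted, on average

recovered_share_of_cpw = 0.25    # recover a quarter of wasted productivity
savings = annual_cpw_cost * recovered_share_of_cpw
share_of_total_time = recovered_share_of_cpw * mean_cpw_fraction

print(f"Industry cost reduction: ~${savings / 1e9:.0f}B")       # ~$93B, i.e., over $90B
print(f"Physician time recovered: ~{share_of_total_time:.1%}")  # ~15.5%, roughly the 15% cited
```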

Cutting CPW is a “force multiplier” – it combats physician burnout, which in turn improves patient safety and care quality, while substantially improving financial results.

For restoring the health of the US healthcare industry, that’s a trifecta.

† For the full list of references underlying the figures in this post, please contact the author.