Anyone outside the medical profession who has wandered into an emergency room may be baffled by the hours of waiting and the mysterious process by which nurses and doctors move patients through the stages of the ER.
Researchers at Yale School of Medicine and Johns Hopkins University wrote recently that an artificial intelligence program they’ve created can improve the emergency room process by making the task of triage more efficient and accurate. Triage is the step at which nurses assess the severity of each patient’s condition at intake.
“Triage is a critical first step in emergency care with profound implications for resource allocation and, ultimately, patient outcomes, including morbidity and mortality,” the scholars wrote in a study published in The New England Journal of Medicine.
Using AI in triage
It is the first study of its kind to show real effects of using AI in triage, the authors assert.
Lead author R. Andrew Taylor and colleagues describe a three-year experiment, spanning 2020 through 2023, in which emergency room nurses at three ERs in the northeastern US used the AI program on 176,648 patients to help rank the severity of cases at intake.
The authors found that nurses using the tool were able to move patients through the emergency room process more rapidly, from how long it took to provide initial care, to how long it took to assign a bed, to how long it took to discharge patients, all of which reduced the total time patients spent in the ER.
The “AI-informed triage” program, a “clinical decision support tool” (CDS), resulted in “improved triage performance and ED [emergency department] patient flow,” they wrote, so that “AI could lead to decreased wait times and ED length of stay.”
They also found that nurses using the tool were more attentive to when patients needed critical interventions, such as hospitalization, surgery, or admission to the intensive care unit.
A ‘tree’ of possible decisions
In the study, Impact of Artificial Intelligence–Based Triage Decision Support on Emergency Department Care, Taylor and his team describe a computer UI that displays the recommendation of the CDS to the nurse.
The AI program is not a large language model like OpenAI’s GPT, and it does not generate text. It is a much older, more traditional machine-learning technique known as a “random forest,” which, unlike GPT, does not rely on neural networks at all. Instead, it builds many “trees” of possible decisions about a case and combines their votes to settle on the most likely answer.
The CDS takes as input each patient’s age, sex, arrival mode, vital signs, “chief complaint,” comorbidities (a history of medical conditions, such as high blood pressure, that might indicate areas of risk), and “active medical problems” at intake. (Interestingly, across all cases, the three most common chief complaints were abdominal pain, chest pain, and shortness of breath.)
Once the data was entered, the user interface showed the nurse the severity rating generated by the CDS on a standard scale called the ESI, or Emergency Severity Index, which rates patients from 1 to 5 by the seriousness, or “acuity,” of their condition, with 1 being the most serious. A natural-language summary of the justification for the machine’s score was also displayed.
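To make that pipeline concrete, here is a minimal sketch, in Python with scikit-learn, of how a random-forest classifier could map intake features like those named above to an ESI level. It is not the authors’ model; the feature names, training records, and labels are all invented for illustration.

```python
# Hypothetical sketch of a random-forest triage model (not the study's code).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Toy intake records mirroring the inputs named in the study:
# age, sex, arrival mode, vital signs, chief complaint, comorbidities.
train = pd.DataFrame({
    "age": [72, 25, 58, 34],
    "sex": ["F", "M", "M", "F"],
    "arrival_mode": ["ambulance", "walk-in", "walk-in", "ambulance"],
    "heart_rate": [118, 80, 95, 102],
    "systolic_bp": [88, 122, 150, 110],
    "chief_complaint": ["shortness of breath", "abdominal pain",
                        "chest pain", "abdominal pain"],
    "has_hypertension": [1, 0, 1, 0],
})
# ESI labels (1 = most acute, 5 = least acute); these values are made up.
esi = [1, 4, 2, 3]

categorical = ["sex", "arrival_mode", "chief_complaint"]
numeric = ["age", "heart_rate", "systolic_bp", "has_hypertension"]

model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")),
    # A random forest is an ensemble of decision trees whose votes are
    # combined; no neural network is involved.
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(train[categorical + numeric], esi)

# Score a new arrival; the output is a suggested ESI level the nurse
# can accept or override.
new_patient = train.iloc[[0]][categorical + numeric]
print("Suggested ESI:", model.predict(new_patient)[0])
```

In the study, of course, the suggestion was only advisory; the nurse saw it alongside a plain-language justification and still assigned a score of their own.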
Nurses were asked whether they agreed or disagreed with the computer’s ESI score and then assigned their own score, as they normally do in the ER. Their agreement or disagreement with the computer was an important variable in the experiment because the study measured what happened when nurses did or did not follow the AI’s recommendation.
Patient flow results
Patient flow was compared for the six months before the CDS was implemented and the six months after.
The principal result is that the distribution of patients across acuity levels changed, and so did the profile of who was ranked high or low. The number of people put in the “low” acuity group (ESI 4 or 5) rose by nearly 50%, while the total in the “high” category declined by almost 9%, and the total in the middle, level 3, dropped by almost 20%. More people were bumped down to lower risk with the CDS, in other words.
Also, more older patients were moved into the high-acuity group, while more young people were moved into the low-acuity group. There were also changes in how vitals, complaints, and comorbidities showed up, with, for example, chest pain becoming more prevalent in those assigned low-acuity and shortness of breath showing up more among those assigned high-acuity.
In other words, the AI led to complaints being used differently to “stratify” patients.
The immediate payoff, wrote Taylor and his team, was that patients “flowed” through the ER faster. “There was an observed decrease in time from arrival to the initial care area,” they wrote. Discharge times also improved, by as much as 82 minutes on average.
The biggest change was that those in the high-acuity category spent less time waiting before being sent to critical care, a reduction of more than two hours. “The most notable changes were experienced by those critically ill or those meeting critical care or emergency surgery outcome criteria,” they wrote.
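For a rough sense of how a before-and-after flow comparison like this is computed, here is a hedged sketch using invented visit records; the column names and numbers are hypothetical and are not the study’s data.

```python
# Hypothetical sketch of a pre/post patient-flow comparison (not the study's analysis).
import pandas as pd

# Invented visit records; each row is one ER visit.
visits = pd.DataFrame({
    "period": ["pre", "pre", "post", "post"],
    "arrival_to_care_min": [45, 60, 30, 35],
    "length_of_stay_min": [300, 420, 250, 340],
})

# Median flow metrics for the period before and after the CDS went live.
summary = visits.groupby("period")[["arrival_to_care_min", "length_of_stay_min"]].median()
print(summary)

# How much each metric improved, in minutes, from "pre" to "post".
reduction = summary.loc["pre"] - summary.loc["post"]
print("Reduction (minutes):")
print(reduction)
```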
Efficiency isn’t the only outcome
It wasn’t just efficiency, however. The number of patients properly assigned to “critical care” rose with the CDS, meaning patients who eventually died in the hospital or were admitted to the intensive care unit were more accurately identified beforehand, during triage. With the AI, nurses became more “sensitive” to the cases that required critical care, as Taylor and his team put it.
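“Sensitivity” here is the standard recall-style measure: of the patients who ultimately needed critical care, what fraction were flagged as high acuity at triage. A minimal sketch with made-up labels:

```python
# Hypothetical sketch of the sensitivity ("recall") measure for critical-care cases.
# needed_critical_care: 1 if the patient later met a critical-care outcome.
# triaged_high_acuity: 1 if the nurse assigned a high-acuity ESI at intake.
needed_critical_care = [1, 1, 1, 0, 0, 1, 0]
triaged_high_acuity  = [1, 1, 0, 0, 1, 1, 0]

true_positives = sum(1 for need, flag in zip(needed_critical_care, triaged_high_acuity)
                     if need == 1 and flag == 1)
sensitivity = true_positives / sum(needed_critical_care)
print(f"Triage sensitivity for critical-care cases: {sensitivity:.2f}")  # 0.75
```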
“The results demonstrate a marked change in the triage process,” wrote Taylor and his team, “with improved distributional alignment, heightened precision in identifying high- and low-risk patients by AI-assisted nurses, and enhanced patient flow.”
They added, “AI triage CDS was associated with improved performance of triage nurses in the early identification of patients at risk for critical illness; this is an important primary objective of ED triage.”
And the nurses who agreed more often with the CDS showed even better sensitivity in spotting patients who would go on to need critical care, surgery, or ICU admission.
Here, Taylor and his team can’t be sure it was the machine that guided the nurses to better decisions; it might just have been better nurses. As they wrote:
The nurse subgroup with high agreement rates generally outperformed the AI alone; conversely, the nurse subgroup with low agreement rates universally performed worse than the AI alone. While our findings suggest that higher agreement may be linked to better triage performance, it is possible that the high-agreement group of nurses possessed greater clinical acumen independent of the CDS, enabling them to better discern when to align with AI-based recommendations.
Their conclusion is that “the retention of human decision-making is critical and is aligned with prior studies that highlight a synergistic potential for integrating AI with human judgment.”
Limitations
The uncertainty about the role of nurses’ individual acumen is not the only limitation of the study. Different ERs can also have seasonal trends that act as “confounders,” factors that could distort the study’s findings.
Another limitation is that the CDS drew upon electronic health records, which have their own shortcomings, such as a lack of specificity about patients.
The most profound limitation is that the study did not follow what happened to patients after the ER. Did better triage lead to better patient outcomes? It’s not clear, wrote Taylor and his team.
“Future research should consider these longer-term factors to fully understand the implications of AI support in clinical decision-making within emergency settings,” they wrote.
One very intriguing conclusion — and it’s probably relevant for all AI implementations — is that AI needs to be tuned to the particular setting. The experiment was done across three ERs in a particular region of the US, and that clearly plays a role in the outcomes.
As Taylor and his team wrote:
Our data suggest that AI tools in health care may reach their fullest potential through site-specific deployment strategies. This approach marks a departure from the prevailing emphasis on broad generalizability and signals a shift toward a more nuanced, context-sensitive application of AI in health care.