
AI Clinical Summaries Show Strong Performance in Emergency Medicine: A Closer Look at the Data

Dec 15, 202

This article is based on: Developing and Evaluating Large Language Model–Generated Emergency Medicine Handoff Notes

In emergency departments, patient care transitions are among the most error-prone moments in the clinical workflow. With clinicians managing multiple patients, rapid decision-making, and constant interruptions, clinical summaries can vary notably in clarity and completeness.

Our study published in JAMA Network Open explored whether large language models (LLMs) can help standardize this critical aspect of documentation. The results suggest that AI clinical summaries have real potential to enhance the consistency and quality of emergency care.

Below is a deeper, data-grounded look at what our researchers actually found and what it means.

Study Overview: How the Researchers Tested AI in Real Emergencies

We evaluated 1,600 emergency department encounters leading to hospital admission in 2023 at NewYork-Presbyterian/Weill Cornell Medical Center. Our pipeline generated an AI clinical summary for each case using RoBERTa and Llama-2 7B models.
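For illustration only, a two-stage extract-then-abstract pipeline of this general shape could be sketched as follows; the model checkpoints, prompt, and selection heuristic here are assumptions for the example, not the study's actual implementation:

```python
# Hypothetical extract-then-abstract sketch (illustrative; not the study's code).
from transformers import pipeline

# Stage 1: a RoBERTa-style encoder scores note segments for handoff relevance.
# A zero-shot classifier stands in for a purpose-built, fine-tuned selector.
selector = pipeline("zero-shot-classification", model="roberta-large-mnli")

def select_relevant(segments, k=5):
    """Keep the k segments judged most relevant to an ED handoff."""
    scored = []
    for seg in segments:
        result = selector(seg, candidate_labels=["relevant to handoff", "not relevant"])
        relevance = result["scores"][result["labels"].index("relevant to handoff")]
        scored.append((relevance, seg))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in scored[:k]]

# Stage 2: a generative model (Llama-2 7B here) drafts the handoff note.
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

def draft_handoff(segments):
    """Produce a first-pass handoff draft from the selected segments."""
    prompt = (
        "Summarize the following emergency department documentation as a "
        "concise handoff note for the admitting team:\n\n"
        + "\n".join(select_relevant(segments))
        + "\n\nHandoff note:"
    )
    out = generator(prompt, max_new_tokens=300, do_sample=False)
    return out[0]["generated_text"]
```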

The evaluation compared the AI clinical summaries with physician-written summaries in two parts:

  1. Automated evaluations with ROUGE, BERTScore, and SCALE
  2. A manual clinical evaluation of a 50-encounter subsample, in which 3 board-certified emergency physicians reviewed and scored:
  • Completeness
  • Curation
  • Readability
  • Correctness
  • Usefulness
  • Patient safety risk

Each domain was rated on a 1-5 Likert scale (higher = better).
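To make the scoring concrete, here is a minimal sketch of how per-domain means could be computed from the reviewers' 1-5 ratings; the numbers below are invented placeholders, not study data:

```python
# Toy aggregation of 1-5 Likert ratings from three reviewers (invented values).
import pandas as pd

ratings = pd.DataFrame([
    {"encounter": 1, "reviewer": "A", "domain": "Completeness", "score": 4},
    {"encounter": 1, "reviewer": "B", "domain": "Completeness", "score": 5},
    {"encounter": 1, "reviewer": "C", "domain": "Completeness", "score": 4},
    {"encounter": 1, "reviewer": "A", "domain": "Correctness", "score": 5},
    {"encounter": 1, "reviewer": "B", "domain": "Correctness", "score": 4},
    {"encounter": 1, "reviewer": "C", "domain": "Correctness", "score": 5},
])

# Mean score per domain, averaged over reviewers and encounters (higher = better).
domain_means = ratings.groupby("domain")["score"].mean().round(2)
print(domain_means)
```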

Key Findings: Where AI Performed Strongly

1. Automated metrics favored the AI clinical summaries (more overlap, more detail, higher faithfulness scores)

Across automated benchmarks, the AI clinical summaries scored higher than physician-written summaries:

  • ROUGE-2: 0.322 (LLM) vs 0.088 (physician-written)
  • BERTScore (precision): 0.859 (LLM) vs 0.796 (physician-written)
  • SCALE: 0.691 (LLM) vs 0.456 (physician-written)

These results suggest that the AI clinical summaries were more similar to the reference handoff labels and captured more detail from the source documentation.
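For readers who want to reproduce the flavor of these comparisons, a minimal sketch using the open-source rouge-score and bert-score packages might look like the following; the example texts are invented, and SCALE (the faithfulness metric) is omitted here:

```python
# Minimal ROUGE-2 / BERTScore comparison between a candidate summary and a
# reference handoff note (invented toy strings; packages: rouge-score, bert-score).
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "72M with chest pain and elevated troponin, started on heparin, admit to cardiology."
candidate = "72-year-old man with chest pain; troponin elevated; heparin begun; admitted to cardiology."

# ROUGE-2: bigram overlap between candidate and reference.
scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)
rouge2_f1 = scorer.score(reference, candidate)["rouge2"].fmeasure

# BERTScore: embedding-based similarity (the study reports precision).
precision, recall, f1 = bert_score([candidate], [reference], lang="en")

print(f"ROUGE-2 F1: {rouge2_f1:.3f}")
print(f"BERTScore precision: {precision.mean().item():.3f}")
```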

2. In clinician review, AI clinical summaries scored well

In the manual evaluation (50 encounters), AI clinical summaries were generally rated around 4 out of 5, close to physician-written levels:

  • Usefulness: 4.04
  • Completeness: 4.00
  • Curation: 4.24
  • Readability: 4.00
  • Correctness: 4.52
  • Overall patient safety risk score: 4.06

3. No AI clinical summaries were classified as a critical patient safety risk

None of the AI clinical summaries were classified as a critical patient safety risk in the manual framework.

4. A physician-in-the-loop model is ideal

We found that AI clinical summaries are best suited to a workflow where clinicians review and edit them before they are finalized.

Where AI Still Needs Human Oversight

Despite its strong performance, the study also reinforces an important point about how to deploy AI clinical summaries safely and effectively: they work best as high-quality drafts that clinicians quickly review and finalize.

Occasional inclusion of details that clinicians might streamline

Because the AI clinical summaries are built to pull from the full ED encounter record, they may occasionally include extra details that a clinician would choose to condense. In practice, this is usually a feature rather than a flaw.

Clinical judgment still matters for emphasis

ED-to-inpatient handoffs often hinge on nuance: what matters most right now, what can wait, and what should be watched closely. AI clinical summaries do a strong job creating a structured baseline, but clinicians still provide the final layer of judgment for the receiving team.

A draft-first workflow is the best fit

A physician-in-the-loop model preserves clinical ownership while letting the AI handle the heavy lift of first-pass drafting. AI clinical summaries are strong enough to be edited into the final handoff efficiently.
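As a loose illustration of what a draft-first, physician-in-the-loop workflow might look like in software (the states and field names below are assumptions for the example, not a description of any particular product), the key property is that an AI draft can never reach the chart without explicit clinician review:

```python
# Toy physician-in-the-loop workflow: an AI draft must be reviewed before finalizing.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    AI_DRAFT = "ai_draft"          # generated, not yet seen by a clinician
    UNDER_REVIEW = "under_review"  # clinician is editing the draft
    FINALIZED = "finalized"        # signed off for the receiving team


@dataclass
class HandoffNote:
    encounter_id: str
    text: str
    status: Status = Status.AI_DRAFT
    reviewed_by: Optional[str] = None

    def edit(self, clinician: str, revised_text: str) -> None:
        """Clinician revises the AI draft; the note moves to under review."""
        self.text = revised_text
        self.reviewed_by = clinician
        self.status = Status.UNDER_REVIEW

    def finalize(self) -> None:
        """Only a clinician-reviewed note can be finalized."""
        if self.reviewed_by is None:
            raise ValueError("An AI draft must be reviewed by a clinician first.")
        self.status = Status.FINALIZED
```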

This reinforces the central message: AI can draft a strong handoff, and the clinician serves as the final editor.

Why These Findings Matter for Emergency Medicine

The implications are substantial:

1. Strong baseline quality at scale

In clinician review of the random 50-encounter subsample, AI clinical summaries scored around 4 out of 5 across key domains. That’s a meaningful signal that AI summaries can be useful across varied ED admissions.

2. Reduced variation in handoff structure and content

Even among skilled clinicians, handoff quality varies based on time pressure, interruptions, and patient complexity. AI clinical summaries help establish a reliable baseline format and content coverage that can reduce that variation across providers and shifts.

3. A safety-forward approach to AI summarization

In the clinician-reviewed safety framework, none of the AI clinical summaries were classified as a critical patient safety risk. That matters, because the real question in ED workflows isn’t “does it sound good,” it’s “can we trust it as a starting point?”

4. Better coverage of the encounter record

AI clinical summaries were more detailed and more grounded in the source documentation, which may help reduce missed items and support clearer next steps during an ED shift handoff.

5. Scalable, system-level design for the entire ED

Our AI pipeline generated consistent AI clinical summaries across all encounters, giving practices and hospitals a practical path to implementing AI for better handoff notes.

The Bottom Line: AI as a Great Partner, Not a Replacement

The study provides strong evidence that AI clinical summaries can meaningfully support clinical communication during emergency department handoffs. In both automated benchmarking and physician review, the AI clinical summaries demonstrated strong performance across key dimensions, including completeness, correctness, and overall usefulness as a draft.

AI clinical summaries are most effective when clinicians work alongside them, reviewing and editing the draft before finalizing it. That approach preserves clinical judgment and accountability, while giving ED teams a faster, more consistent starting point for high-quality handoffs.

Rather than replacing clinical expertise, AI clinical summaries are emerging as a supportive collaborator—one that can help clinicians move faster, reduce variability, and maintain high standards essential to safe, effective emergency care.
