Why is RAG hard on medical records compared to other domains?

by Vince Hartman

Nov 12, 2025

With Retrieval Augmented Generation (RAG), you can take a generally trained model - such as GPT-5o or Sonnet 4.5 - and connect it to external content from a specific domain so that the large language model (LLM) performs better within that knowledge base. You can design RAG to pull in content such as finance articles, Wikipedia, webpages, or healthcare journal articles to create purpose-built knowledge bases, thereby improving accuracy in those domains.

You start by taking your knowledge base and breaking the content into smaller pieces, or “chunks”. Say you have a 10,000-word article from an academic journal about plate tectonics: that article may become 10-20 individual chunks, depending on the word-count size you set for each one. Chunking logic is a core part of RAG and greatly affects how it performs.
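As a rough illustration of word-count chunking, here is a minimal sketch. The function name chunk_by_words and its parameters are made up for this example, and real pipelines often split on sentences or tokens rather than raw words.

```python
def chunk_by_words(text: str, chunk_size: int = 500, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    `overlap` repeats that many words from the end of one chunk at the
    start of the next, a common way to soften chunk boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# A stand-in for a 10,000-word article (placeholder text, not real content).
article = " ".join(["word"] * 10_000)
print(len(chunk_by_words(article, chunk_size=500)))    # 20 chunks
print(len(chunk_by_words(article, chunk_size=1_000)))  # 10 chunks
```

With chunk sizes between 500 and 1,000 words, the 10,000-word article lands in the 10-20 chunk range described above.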

The chunks are then converted into vectors (numerical representations of the text). RAG then retrieves the most relevant content based on the mathematical similarity (commonly cosine similarity) between the chunk vectors and the vector of the user’s question. So, when the user searches for content about earthquakes, RAG will pull in chunks whose vectors rank high in similarity - which could correspond to sentences containing “seismic waves” or “fault lines”, since those terms commonly appear alongside earthquakes.
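To make the similarity step concrete, here is a small sketch of cosine-similarity ranking. The three-dimensional vectors are hand-picked toy values, not the output of a real embedding model, and top_k is a hypothetical helper name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2):
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Toy vectors: assumed values chosen only to illustrate the ranking.
chunk_vectors = {
    "seismic waves": [0.9, 0.1, 0.0],
    "fault lines": [0.8, 0.3, 0.1],
    "coral reefs": [0.0, 0.2, 0.9],
}
earthquake_query = [1.0, 0.2, 0.0]

for chunk_id, score in top_k(earthquake_query, chunk_vectors):
    print(f"{chunk_id}: {score:.3f}")
```

In production the vectors would come from an embedding model and the search would run inside a vector database, but the ranking idea is the same.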

Designing a clinical RAG system that excels at medical question answering is more challenging. It’s not just about vector similarity and high-level keywords. Medical records are unusual in that they are temporal, highly duplicative, full of acronyms, and lacking in topical cohesiveness. In contrast, research articles, like our plate tectonics example, contain clear keywords and content with an overarching thesis. That makes it much easier to achieve high levels of accuracy without many algorithmic changes to a general-purpose RAG framework. Out-of-the-box RAG is designed to find the most important concepts in documents, and such predictable structure makes this easy.

Conversely, medical record retrieval isn’t about finding “the most important” concept; it’s about finding the content that is relevant and important for a given time range. A doctor might ask:

Has the patient been hospitalized in the last 12 months?
How many radiology consultations have there been in the last few months?
Can you return a list of the patient's most recent X-rays?

First, when using a vector database to retrieve content for temporal questions, you need to architect your orchestration so that it (1) records when the content occurred, (2) summarizes chunks with thematic concepts, and (3) provides prior information in the more recent chunk, so that the search works against a chronological index. Because of the complexities that come with medical data, there are specialized systems such as CLI-RAG (https://arxiv.org/abs/2507.06715) that are architecturally designed to create semantically better chunks for retrieval.
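One way to picture these three ingredients is a chunk record that carries a date, a thematic summary, and a pointer back to what came before. The sketch below is an assumption-laden illustration: ClinicalChunk, link_chronologically, and the sample notes are all invented for this example, and this is not how CLI-RAG or any particular system implements it.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class ClinicalChunk:
    occurred_on: date        # (1) when the content occurred
    theme: str               # (2) short thematic summary of the chunk
    text: str                # the raw note text
    prior_context: str = ""  # (3) what happened just before this chunk

    def text_to_embed(self) -> str:
        """Text sent to the embedding model: metadata first, then the note."""
        parts = [f"Date: {self.occurred_on.isoformat()}", f"Theme: {self.theme}"]
        if self.prior_context:
            parts.append(f"Previously: {self.prior_context}")
        parts.append(self.text)
        return "\n".join(parts)


def link_chronologically(chunks: list[ClinicalChunk]) -> list[ClinicalChunk]:
    """Sort chunks by date and give each one a summary of its predecessor."""
    linked: list[ClinicalChunk] = []
    previous = None
    for chunk in sorted(chunks, key=lambda c: c.occurred_on):
        if previous is not None:
            context = f"{previous.occurred_on.isoformat()}: {previous.theme}"
            chunk = replace(chunk, prior_context=context)
        linked.append(chunk)
        previous = chunk
    return linked


# Made-up notes for illustration only.
notes = [
    ClinicalChunk(date(2025, 6, 3), "Discharge after pneumonia admission",
                  "Discharged home on oral antibiotics."),
    ClinicalChunk(date(2025, 5, 28), "Admission for pneumonia",
                  "Admitted with fever and productive cough."),
]
timeline = link_chronologically(notes)
print(timeline[1].text_to_embed())
```

Embedding the date and the prior context together with the note text is just one design option; the same fields could instead be stored as metadata and used for filtering at query time.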

Second, you need the overall record to be embedded alongside a comprehensive AI summary to assist with retrieval. Without a summary, your RAG system operates without a longitudinal story of the patient. By weaving in a summary as a first pass in the orchestration layer of content retrieval, you can align the RAG system to prioritize the information that matters most according to the high-level summarization.
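A very simple way to weave the summary in is to place it ahead of the retrieved excerpts when assembling the prompt. The sketch below shows only that one interpretation; assemble_prompt and the section labels are hypothetical, and the summary itself is assumed to have been generated earlier.

```python
def assemble_prompt(patient_summary: str, retrieved_chunks: list[str], question: str) -> str:
    """Build an LLM prompt with the longitudinal summary ahead of the excerpts."""
    sections = [
        "Patient summary (longitudinal overview):",
        patient_summary.strip(),
        "",
        "Retrieved excerpts:",
    ]
    sections += [f"- {chunk.strip()}" for chunk in retrieved_chunks]
    sections += ["", f"Question: {question.strip()}"]
    return "\n".join(sections)


# Made-up content for illustration only.
prompt = assemble_prompt(
    patient_summary="Adult patient with one pneumonia admission in 2025.",
    retrieved_chunks=["Admitted with fever and productive cough.",
                      "Discharged home on oral antibiotics."],
    question="Has the patient been hospitalized in the last 12 months?",
)
print(prompt)
```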

Third, you need to optimize your search logic for temporal inclusion. When RAG pulls content from the chunks you defined earlier, you want to add checks that weight chunks higher when they fall within the date range the user requested. While this matters in general-purpose RAG systems, it is EXTREMELY important for medical RAG systems. Almost all questions that clinicians ask have a temporal component. When asking a question about a medical chart, doctors want to know the story leading up to the patient's medical problems - and that story is time based.
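The date-range weighting could look something like the following sketch. The function rerank_by_date_range, the boost factor of 1.5, and the sample hits are all assumptions chosen for illustration; a real system would tune the weighting or apply a hard date filter instead.

```python
from datetime import date, timedelta

# Each hit is (chunk_id, similarity_score, date_the_content_occurred).
Hit = tuple[str, float, date]


def rerank_by_date_range(hits: list[Hit], start: date, end: date,
                         boost: float = 1.5) -> list[Hit]:
    """Multiply the score of in-range chunks by `boost`, then sort best first."""
    rescored = []
    for chunk_id, similarity, occurred_on in hits:
        weight = boost if start <= occurred_on <= end else 1.0
        rescored.append((chunk_id, similarity * weight, occurred_on))
    return sorted(rescored, key=lambda hit: hit[1], reverse=True)


# "Has the patient been hospitalized in the last 12 months?"
today = date(2025, 11, 12)
window_start = today - timedelta(days=365)

# Made-up similarity scores: the old admission matches best on text alone.
hits = [
    ("admission-2019", 0.82, date(2019, 3, 4)),
    ("admission-2025", 0.70, date(2025, 6, 1)),
    ("clinic-visit-2025", 0.40, date(2025, 9, 15)),
]
for chunk_id, score, occurred_on in rerank_by_date_range(hits, window_start, today):
    print(chunk_id, round(score, 2), occurred_on)
```

On text similarity alone the 2019 admission would rank first; with the boost, the 2025 admission inside the requested window moves to the top.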

Over the last month of my work at Abstractive Health building a RAG model on medical records, I’ve also discovered that the uptake of such a system depends on the user having access to an initial high-level AI summary of the patient. Expecting a clinician to drop in a medical chart and just start asking questions about the record is unrealistic - they need a quick cognitive map of the patient (a longitudinal story) before a medical RAG system has real value. Adding to that value, when content is retrieved via RAG, provenance is essential: each answer should link back to the original source information (the chunks that you fed into your LLM). Uptake of RAG is built on (1) a cognitive map of the patient, and (2) source mapping to the original content for the answers you provide.
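As a hedged sketch of source mapping, one common pattern is to label each chunk in the prompt with a tag such as [S1] and then resolve the tags the model cites back to the original documents. Everything here (SourceChunk, the tag format, the file names, and the hand-written answer standing in for an LLM response) is assumed for illustration and is not a description of Abstractive Health's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceChunk:
    document: str  # where the chunk came from (hypothetical file name)
    page: int
    text: str


def build_context(chunks: list[SourceChunk]) -> str:
    """Label each chunk [S1], [S2], ... so the LLM can cite it by tag."""
    return "\n".join(f"[S{i}] {chunk.text}" for i, chunk in enumerate(chunks, start=1))


def resolve_citations(answer: str, chunks: list[SourceChunk]) -> list[SourceChunk]:
    """Map [S<n>] tags found in the answer back to their source chunks."""
    cited: list[SourceChunk] = []
    for match in re.finditer(r"\[S(\d+)\]", answer):
        index = int(match.group(1)) - 1
        if 0 <= index < len(chunks) and chunks[index] not in cited:
            cited.append(chunks[index])  # ignore unknown tags and duplicates
    return cited


# Made-up chunks and a hand-written answer standing in for an LLM response.
chunks = [
    SourceChunk("clinic_note_2025-09-15.pdf", 1, "Routine follow-up, no complaints."),
    SourceChunk("discharge_summary_2025-06-03.pdf", 2, "Admitted for pneumonia on 2025-05-28."),
]
print(build_context(chunks))
answer = "Yes, the patient was admitted for pneumonia in May 2025 [S2]."
for source in resolve_citations(answer, chunks):
    print(f"{source.document}, page {source.page}")
```

The UI can then render each cited document and page as a link next to the answer, so the clinician can verify it against the original record.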

In the end, medical RAG is an incredibly promising breakthrough for healthcare, but it’s not a system that can be spun up as quickly as other out-of-the-box RAG implementations. It’s hard, and it requires purposeful design where the UI/UX is as critical as the answers provided.

At Abstractive Health, our mission is simple: empower clinicians with complete, accessible, and actionable patient data when they need it most. By delivering higher retrieval rates, requiring minimal inputs, providing faster access, and leading the field in clinically validated summarization, we’re helping clinicians make faster, safer, and more informed decisions for their patients.

Clinicians can sign up here for free and see our technology in action for the patients they care for.