Hollings researchers show how natural language processing can aid doctors

July 31, 2025
Jihad Obeid, M.D., worked on a natural language processing program to unearth critical details in the medical record. Photo by Clif Rhodes

What if a computer could read a patient’s medical notes and help doctors determine important information for their treatments?

At MUSC Hollings Cancer Center, researchers led by Jihad Obeid, M.D., and Mario Fugal, Ph.D., are using an advanced form of artificial intelligence (AI) to do just that – unlocking critical details buried in medical records to tailor treatments for cancers that affect the brain. Their high-accuracy model could transform how doctors classify and treat metastatic tumors, offering faster answers and more personalized care without adding to doctors’ workloads.

In the new study, published in JCO Clinical Cancer Informatics, the researchers used a form of AI called natural language processing (NLP) to solve a frustrating problem: how to efficiently communicate the specifics of a patient’s diagnosis among doctors from different specialties when the patient is scheduled for radiation to treat brain tumors.

Tracing cancer’s origins

Most cancers found in the brain did not start there. These cancers, known as brain metastases, started elsewhere in the body, such as the lung, breast, skin, kidney or digestive tract, and then traveled to the brain.

Knowing where brain metastases originated is a critical piece in the clinical puzzle. This is especially true for patients receiving a specialized treatment called stereotactic radiosurgery (SRS), which delivers a high dose of targeted radiation in a single session. While highly effective, SRS does come with risks. These include side effects from too much radiation and potential damage to healthy, noncancerous tissue nearby.

Mario Fugal, Ph.D.

But these risks can be reduced or even avoided if patients’ treatments are tailored to their original cancer types because different cancers respond differently to radiation. For example, lung cancers are very sensitive to radiation and can be treated with lower doses. In contrast, kidney cancer tends to resist radiation and requires more prolonged treatment.

“The brain is such a sensitive organ that we want to be as precise as possible with the radiation dose,” Fugal said. “But first, we need to know what exactly we’re treating and then develop a specific treatment plan based on that information.”

Clinical notes contain a wealth of information about a patient’s diagnosis and treatment. But diving into individual patient notes to pull out the relevant details is a labor-intensive and time-consuming process.

“Medical records were never designed for research. They are often messy and imperfect,” Obeid explained. “But if we can make sense of them, we can turn them into something that helps doctors and patients by improving research efforts and enabling more precise care.”

That is where NLP – a branch of AI that trains computers to understand human language – can help. NLP essentially allows computers to make sense of what we write or say, bridging the gap between human communication and computer data.

Finding a common language

Some people may be surprised by the need for a better way to identify cancer diagnoses. Medical professionals already have a common diagnostic language to record and track diseases, known as the International Classification of Diseases (ICD) codes.

Unfortunately, for complex cases like brain metastases, those codes often miss the mark. That is because ICD codes may not address the underlying source of the tumor, especially for patients with more than one type of cancer or when the cancer spreads early. The codes also lack the specificity to break out cancer subtypes.

“The clinical note is the closest to the truth of what's going on as you can get,” Fugal said, “because it has nuance that ICD codes lack. A code will just say ‘lung cancer.’ It won’t go into whether it’s the left versus right lung, the upper versus lower part of the lung or small cell versus non-small cell. But the notes have those specifics.”

Reading between the lines

In this study, the researchers developed an NLP model that could “read” doctors’ notes and identify key words and phrases indicating the primary cancer type – for instance, words like “ductal” for breast cancer and “melanoma” for skin cancer. By developing an NLP model that automatically extracted that data from clinical notes, the researchers hoped to make it easier to group patients for treatment and research.
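
To make the idea of keyword-based reading concrete, here is a minimal Python sketch. It is not the study's model, whose internals this article does not describe. The function name `classify_note` and the `KEYWORDS` table are hypothetical, and only the two example terms, "ductal" and "melanoma," come from the text above.

```python
import re
from collections import Counter

# Hypothetical keyword table. Only "ductal" and "melanoma" come from the
# article; a real system would need a far richer, clinically reviewed list.
KEYWORDS = {
    "breast": ["ductal"],
    "skin": ["melanoma"],
}


def classify_note(text):
    """Return the cancer type whose keywords occur most often, or None."""
    lowered = text.lower()
    counts = Counter()
    for cancer_type, words in KEYWORDS.items():
        for word in words:
            # \b restricts matches to whole words.
            pattern = r"\b" + re.escape(word) + r"\b"
            counts[cancer_type] += len(re.findall(pattern, lowered))
    # No keyword found anywhere in the note: make no guess.
    if not counts or max(counts.values()) == 0:
        return None
    return counts.most_common(1)[0][0]


print(classify_note("Invasive ductal carcinoma, status post lumpectomy."))
print(classify_note("Patient reports mild headache."))
```

A note with no matching terms yields `None` rather than a guess, which keeps uncertain cases visible for human review.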

“With better data, we can design better studies, make faster discoveries and tailor treatments more precisely to each patient,” Fugal emphasized.

The researchers tested the NLP model on doctors' notes found in electronic health records, specifically, radiation oncology notes with detailed descriptions of the cancer types and histories. They wanted to see if the model could identify the original cancer diagnoses more accurately than standard medical codes.

The evaluation drew on 82,000 clinical notes from the medical records of more than 1,400 patients treated with SRS for brain metastases. The model was designed to read the notes, look for patterns in the text and use that data to determine the primary cancer type for each patient. Expert reviewers manually examined the notes for confirmation.
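
Because each patient has many notes, per-note findings must be combined into one answer per patient and then checked against expert review. The Python sketch below shows one simple way to do that. The majority-vote rule, the function names and the sample data are all assumptions made for illustration; the article does not say how the study combined notes.

```python
from collections import Counter, defaultdict


def patient_level_labels(note_labels):
    """Combine per-note labels into one label per patient by majority vote.

    note_labels: iterable of (patient_id, label) pairs, one per note.
    A label of None means the note gave no usable signal and is skipped.
    """
    votes = defaultdict(Counter)
    for patient_id, label in note_labels:
        if label is not None:
            votes[patient_id][label] += 1
    return {pid: c.most_common(1)[0][0] for pid, c in votes.items()}


def accuracy(predicted, reference):
    """Fraction of reference patients whose predicted label matches."""
    if not reference:
        return 0.0
    correct = sum(
        1 for pid, truth in reference.items() if predicted.get(pid) == truth
    )
    return correct / len(reference)


# Made-up example data, not drawn from the study.
notes = [
    ("p1", "lung"), ("p1", "lung"), ("p1", "breast"),
    ("p2", "skin"), ("p2", None),
    ("p3", "kidney"),
]
expert_review = {"p1": "lung", "p2": "skin", "p3": "breast"}

predicted = patient_level_labels(notes)
print(predicted)
print(accuracy(predicted, expert_review))
```

Patients whose notes produce no label at all are counted as misses in `accuracy`, so the score cannot be inflated by skipping hard cases.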

Better data, better care

The NLP model was strikingly accurate. While ICD codes were often wrong or unclear, the model correctly identified the primary cancer in more than 90% of cases. For common cancers like lung, breast and skin cancer, classification was nearly perfect at 97%. The program could even identify lung cancer subtypes, which ICD codes were unable to do. These results validate NLP as a powerful tool for clinical data extraction, capable of outperforming traditional medical codes in determining a patient’s original cancer diagnosis.

“This approach fills a crucial gap,” Fugal said. “Our AI tool pulled the diagnoses from doctors’ notes quickly, accurately and without extra work for care teams.”

Importantly, the model was designed to be simple and efficient. It does not require large datasets, robust training examples or intensive computing power, and it avoids many of the ethical concerns associated with larger, generative AI models.

“The real power here is that this approach is lightweight and scalable,” Obeid said. “Other hospitals could easily use this tool, even with limited resources.”

 

The research team describes this work as a major step toward data-driven, personalized care for patients with cancer. Increased efficiency and accuracy in cancer classification could ultimately mean faster research, better treatment and less guesswork for doctors.

The team is now working on a study using a similar NLP approach to identify patients at risk for radiation necrosis – the death of brain tissue, often accompanied by swelling, that is a rare but serious side effect of too much radiation. That effort could help catch complications earlier or avoid them altogether. Future researchers could also use the NLP model with other health systems and other cancer types or add in other health data, such as imaging scans or lab tests.

For Obeid, this work reflects a larger trend in health care: using electronic health records not just for documentation but as a rich source of data that can improve care in real time.

“Automating data extraction from unstructured notes that are already in health records builds accurate, up-to-date datasets,” he said. “This approach saves time and opens the door to more meaningful research on outcomes after radiosurgery and other treatments.”

As cancer treatment becomes more complex, data-driven tools like this are gaining importance. By teaching computers to read medical notes that doctors write, the researchers are helping to bridge the gap between raw data and real understanding.


Mario Fugal, David Marshall, Alexander V. Alekseyenko, Xia Jing, Graham Warren and Jihad Obeid. Classifying Stereotactic Radiosurgery Patients by Primary Diagnosis Using Natural Language Processing. JCO Clinical Cancer Informatics. [13 June 2025]. doi: 10.1200/CCI-24-00268.
This work was supported in part by MUSC Hollings Cancer Center.