Using AI to distinguish cancer types

Dr. Kun-Hsing Yu meets with members of the Yu Lab in front of two screens, which show image samples.

It took a series of accidents, “almost 30,” for Dr. Kun-Hsing Yu to end up leading a Harvard Medical School (HMS) lab that uses artificial intelligence to analyze pathology images quantitatively and systematically.

As a student, he was very interested in pathology, the domain of medicine focused on disease causes, effects, and diagnosis. But there was one area where he struggled: 

“On a few occasions, I could not reliably distinguish between the features of disease A described in textbooks and those of disease B,” Dr. Yu said. 

After graduating and beginning medical research, he realized others shared this problem. “It’s an issue across all of medical practice: sometimes there’s simply too much uncertainty,” he said. Dr. Yu began exploring ways to minimize this uncertainty using machine learning and informatics-based approaches, and he now leverages the capabilities of AI in a high-performance computing environment.

Led by Dr. Yu, an Associate Professor in the Department of Biomedical Informatics at HMS, the Yu Lab works to establish robust, generalizable, and fair AI methods to help clinicians analyze cancer pathology samples. Traditionally, that analysis involves processing tumor tissue specimens, staining them with appropriate chemicals, and visually examining their structures. This is a labor-intensive process, Dr. Yu said, and one that is prone to inter-rater variability. Using AI, his lab hopes to make diagnosis less laborious and more consistent.

Dr. Kun-Hsing Yu

To train the AI model, researchers used more than 60,000 image samples from partner hospitals. These are very high-resolution images, magnified more than 40 times, resulting in more than one billion pixels per sample. 

With multiple terabytes of raw data, the Yu Lab needs a high-performance computing environment, which it found in HMS's Longwood cluster at the Massachusetts Green High Performance Computing Center (MGHPCC). Dr. Yu gained early access to this cluster in mid-2024, after receiving the Dean’s Innovation Award for a project titled “Generative artificial intelligence for explainable colorectal pathology evaluation.” His was one of the first labs to join the space, which was set up by HMS IT’s Research Computing and Infrastructure teams.

“With the advent of the Longwood cluster, we now have access to multiple H100 GPUs, which further facilitated and expedited our process of training, validating, and deploying AI models,” Dr. Yu said. Accessing Longwood helps the Yu Lab in two ways: first, building large, general-purpose foundation models and, second, working on massive images with external partners.

It also helps the Yu Lab tackle a thorny AI issue: potential bias in training datasets.

Despite efforts to collect diverse samples, the Yu Lab received more training data from majority populations, which created a gap in the AI model’s diagnostic performance between majority and minority populations.

“Then our approach is to use generative AI methods to create synthetic images from those populations,” Dr. Yu said. “We can then train a model using both real and synthetic data, and further validate the trained model in both majority and minority populations.”
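
The validation step Dr. Yu describes, checking a trained model separately in majority and minority populations, can be illustrated with a minimal Python sketch. The function names, group labels, and toy data below are hypothetical and do not come from the Yu Lab, and plain accuracy stands in for whatever metric the lab actually uses.

```python
from collections import defaultdict


def accuracy_by_group(labels, predictions, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, and group tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def performance_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Toy, made-up data for illustration only (not real pathology results).
labels      = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 1, 0, 0, 0, 1]
groups      = ["majority"] * 5 + ["minority"] * 3

per_group = accuracy_by_group(labels, predictions, groups)
gap = performance_gap(per_group)
print(per_group, gap)
```

In a workflow like the one described, a gap of this kind would be measured before and after adding synthetic images to the training set, to see whether it narrows.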

The AI methods developed by the Yu Lab have been independently validated by over 80 research laboratories worldwide. The team is now exploring ways to implement their AI model in clinical settings to assist clinicians in analyzing cancer pathology samples in real time. 

For more about their research, visit the Yu Lab’s website.