by Shreya Johri

Machine learning (ML) influences nearly every aspect of our lives. Whether it’s receiving personalized recommendations on your favorite shopping site, interacting with ChatGPT, or navigating your city with real-time traffic updates, the ML models behind these experiences adapt and learn from vast amounts of data. In fact, ML models are even starting to be adopted in complex settings such as healthcare. But how do these models manage to learn such a diverse range of tasks on such a grand scale?

Training ML models through examples

A decade ago, ML models learned mainly by observation. Scientists presented these models with numerous examples of the specific tasks they wanted them to learn. For example, to teach a model to recognize dogs, scientists would show it thousands of dog photos so it could learn to identify dogs in new images, much as children learn to distinguish animals by looking at picture books or observing nature. The more examples the model was exposed to, the better it became at its task. This approach is called supervised learning, because we supervise the model by providing labeled examples. The model’s accuracy is scored by comparing the ML-predicted label to the actual label of the image. This scored comparison, termed the “loss,” is then fed back into the model to teach it how to improve (Figure 1). 

Figure 1. Supervised learning, or learning through examples: The model produces a prediction based on the input image. The prediction is compared to the correct label for the image, and a “loss” is calculated. The “loss” is then fed back into the model so that it can learn from its mistakes. This process is repeated for many images so that the model learns to recognize different objects from various angles. 
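For readers who want to see what this cycle looks like in practice, below is a minimal, hypothetical sketch in Python using the PyTorch library. The random “images,” made-up labels, and tiny linear model are placeholders for illustration only; they are not the data or models used in any study mentioned here. Each pass through the loop mirrors Figure 1: predict, compare to the label, compute the loss, and feed it back.

```python
import torch
import torch.nn as nn

# Toy stand-ins for a labeled dataset: 100 random "images" (flattened to vectors)
# and a made-up label for each, drawn from 10 possible classes.
images = torch.randn(100, 3 * 64 * 64)
labels = torch.randint(0, 10, (100,))

model = nn.Linear(3 * 64 * 64, 10)       # a deliberately tiny model, for illustration only
loss_fn = nn.CrossEntropyLoss()          # compares the model's prediction to the correct label
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(5):
    predictions = model(images)          # the model predicts a label for each image
    loss = loss_fn(predictions, labels)  # the "loss": how far the predictions are from the truth
    optimizer.zero_grad()
    loss.backward()                      # feed the loss back through the model...
    optimizer.step()                     # ...and nudge its parameters toward better answers
```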

To train accurate models using supervised learning, it is necessary to curate well-labeled datasets containing a large number of examples. One of the most significant early efforts was the ImageNet project, which involved thousands of crowdsourced workers who meticulously labeled more than 14 million images across a variety of everyday objects, helping to train models to recognize and categorize them accurately.

Although supervised learning may seem like an ideal way to train ML models, there are significant challenges in applying it to fields like healthcare, where consistently labeled data is limited or costly to obtain. Take the example of chest X-rays used in diagnosing diseases of the lungs. Ideally, to train an ML model that can identify diseases from these X-rays, we would need numerous examples of X-rays paired with precise diagnoses, such as “healthy,” “pneumonia,” “tuberculosis,” or other conditions. In practice, though, each X-ray is typically accompanied by a detailed radiological report that describes the findings, rather than a simple label (Figure 2). Obtaining concise labels would therefore require extensive time and effort from skilled radiologists to review and label each image, a process that is both time-consuming and expensive. Unlike in the case of ImageNet, crowdsourced workers typically don’t have the expertise required to label medical images, and radiologists are already burdened with a high patient load. As a result, relying solely on supervised learning, where models learn from labeled examples, might not always be feasible. 

Figure 2. Example of a chest X-ray report.

Can ML models learn on their own?

But what if we could design a model that teaches itself to identify diseases in chest X-rays using the full radiological reports? That is, the model would learn how to identify diseases without concise labels. Imagine a model is shown two chest X-rays and their corresponding reports. The model’s task is to match each X-ray with its correct report. Mathematically, the model does this by learning to store a matching image and report close to each other, and an unmatched image and report far apart, in the model’s learning space, referred to as its representation space (Figure 3). Simply put, a model’s learning space is the model’s way of organizing and understanding information: a big, invisible map where similar things are placed close together and different things far apart. In other words, the model learns to associate each image with its corresponding report and not with an unrelated report. This helps the model recognize patterns and make decisions based on how things are grouped on this map. 

Figure 3. Self-supervised learning: The ML model learns how to store paired chest X-ray images and reports closer to each other in the model’s learning space, while unpaired images and reports are stored far away from each other.

Through this method, the model learns to discern subtle details in the images that are indicative of specific diseases mentioned in the reports, without ever being explicitly told what to look for in the X-rays. In this way, the model effectively teaches itself to diagnose diseases from X-rays by interpreting the text in the reports, developing a nuanced understanding that would otherwise require large, consistently labeled datasets. This approach is called self-supervised learning, because the model supervises itself: it identifies the disease name from the radiology report and then uses it to learn associations between the disease name and the image.
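For curious readers, the sketch below (again in Python with PyTorch, and again with made-up numbers) illustrates the matching idea behind Figure 3, in the spirit of contrastive image-text methods. The random vectors stand in for the outputs of real image and text encoders, and the exact loss functions used in the studies discussed below may differ.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: representations of 4 chest X-rays and their 4 paired reports.
# In a real system these would be produced by an image encoder and a text encoder.
image_vectors = torch.randn(4, 128, requires_grad=True)
report_vectors = torch.randn(4, 128, requires_grad=True)

# Place every image and report on the "map" (normalize so only direction matters).
images = F.normalize(image_vectors, dim=-1)
reports = F.normalize(report_vectors, dim=-1)

# Similarity score between every image and every report: a 4 x 4 grid.
similarity = images @ reports.T

# The correct match for image i is report i, so the targets are simply 0, 1, 2, 3.
targets = torch.arange(4)

# This loss is small when each image sits closest to its own report and far from
# the other three: matched pairs are pulled together, unmatched pairs pushed apart.
loss = (F.cross_entropy(similarity, targets) +
        F.cross_entropy(similarity.T, targets)) / 2
loss.backward()   # in training, the encoders would now be updated to reduce this loss
```

In a real system, this loss would be computed over large batches of image-report pairs and used to update the image and text encoders themselves, gradually organizing the representation space so that each X-ray ends up next to its own report.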

In a landmark study in 2022, scientists showed the promise of using self-supervised learning with medical datasets. They used a large collection of paired chest X-ray images and reports to train their ML model, achieving an average diagnosis accuracy of 88.9%, comparable to that of professional radiologists for many of the diseases. Following this, another significant study in 2024 explored the potential of self-supervised learning in cancer research. The researchers trained ML models to identify different types and subtypes of cancer from histopathology images, images of biopsied tissue viewed under a microscope to check for abnormal, cancerous cells. Although these complex datasets traditionally require extensive expert input to label, the models identified the subtypes of lung and kidney cancer with accuracies of 90.7% and 90.2%, respectively. These studies, among many others, have enabled the training of medical ML models with limited consistently labeled data. 

Clinical Use of ML Models

Does this mean that these self-supervised ML models can soon replace doctors? Absolutely not! Despite their impressive performance, these ML models are not foolproof and still have limitations. For example, while they can identify patterns and diagnose diseases from X-rays and tissue images, their assessments may lack the nuanced understanding that experienced human doctors bring, such as pinpointing the exact location of an abnormality in the image. The performance of self-supervised ML models also often falls short on rare diseases that a seasoned radiologist would recognize, because these models need a substantial number of image and report examples to identify a disease, and such examples are often scarce or unavailable for rare conditions. Furthermore, these ML models cannot currently take proper account of a patient’s overall medical history or other external factors that doctors routinely factor into their diagnoses. 

Improving the abilities of these self-supervised models and making them reliable is still an active area of research. As of June 2024, the US Food and Drug Administration (FDA) has approved 882 artificial intelligence and ML models for use in healthcare; however, most of these approved models were trained using supervised learning, and therefore required tremendous investment in collections of consistently labeled data, spearheaded mostly by large companies. Some of these FDA-approved models have also found their way into products such as fitness watches, for example to detect irregular heart rhythms. As the field of self-supervised learning advances, there is hope for more healthcare-related applications of these ML models. They have the potential to assist doctors in diagnostics and enhance patient care, ultimately improving patient outcomes.


Shreya Johri is a PhD candidate in the Biological and Biomedical Sciences PhD program at Harvard University. You can find her at @sjohri20 on Twitter.

Cover image by AcatXIo from Pixabay.

For more information:

  • Watch this video to learn more about self-supervised learning. 
  • Find the full list of FDA-approved AI/ML models here.
