Elad Walach

Unlocking Healthcare Potential: Precision AI's Edge in Clinical Adoption

The landscape of AI technology is evolving at an unprecedented pace, and the future remains largely unpredictable. Despite being an early adopter and frequent user of the OpenAI sandbox (an early precursor to ChatGPT), I confess that I didn't foresee the explosive growth of AI's capabilities.

However, it’s vital to strike a note of caution. While the advancements of generative AI (GenAI) platforms like ChatGPT are astonishing and surpass previous expectations for AI, we must stay grounded when predicting its near-term implications. We must also differentiate between types of AI when they are applied in clinical settings. 

Working with ChatGPT and similar foundation models could create the false expectation that, very soon, any medical condition could be diagnosed from an image or medical record by a highly generalizable foundation model, without the additional work of tuning it to a specific task. While these models can prove invaluable for tasks like reducing administrative burdens, the fact is that GenAI models do not currently offer the level of diagnostic accuracy needed in high-stakes clinical settings.

Given the challenges of adapting foundation models such as ChatGPT to clinical tasks, the dominant form of AI in clinical practice will remain what we call "Precision AI": models trained to solve specific tasks and, above all, to achieve the diagnostic accuracy that makes them valuable in clinical practice.

The Cost of an Error

First, it's essential to highlight a fundamental question that, while intuitively understood, warrants explicit mention: what is the cost of an AI error? The risk profile of drafting a consumer email, for instance, is vastly different from that of making a medical decision.

A recent Johns Hopkins study estimated that every year in the US, 795,000 patients die or suffer permanent disability due to diagnostic errors. Clearly, when it comes to creating AI for clinical use, the stakes are remarkably high. If AI is to act as an aid to physicians, and medical decisions in clinical environments can have such a drastic impact on patient outcomes, then clinical AI must prove its accuracy.

The Complexity of Healthcare Data and Applications

Let’s consider the complexity of healthcare data. It is:

  • Inherently multi-modal: It demands the integration of a diverse range of information types, including imaging data, textual information, genomic profiles, lab results, and even time-series data like vitals. This multitude of data types makes building coherent and comprehensive healthcare AI models far more challenging.
  • High-dimensional: The sophisticated data mentioned above necessitates an extensive context size. Current foundation models like ChatGPT typically handle contexts on the order of 100,000 tokens, at "non-diagnostic" accuracy. A typical CT image could easily contain millions of tokens and requires "diagnostic" accuracy.
  • Highly domain-specific: Many real-world problems become easier to solve as foundation models evolve, because different domains resemble one another. An autonomous vehicle's camera, for example, is still a digital camera with many similarities to your smartphone's camera. Medical data, in contrast, is inherently different from everyday data (an x-ray of your hand looks nothing like any photo your smartphone produces), so the medical domain requires a fully dedicated model, and its development cannot be accelerated by relying on previous models.
  • Scarce in expert labels: Vast amounts of annotated data are used to train or validate today's "general domain" foundation models. GenAI models for image segmentation, for instance, are often built on annotations of millions of images from non-experts, and even models trained on unannotated data are validated against vast amounts of non-expert annotations. The more general-purpose a model becomes, the more use cases need to be validated, and this matters even more in the clinical domain.
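The context-size point above can be made concrete with back-of-envelope arithmetic. This is only a sketch: the slice dimensions, slice count, and 16x16 patch size are illustrative assumptions (a common vision-transformer patching choice), not properties of any specific model.

```python
# Back-of-envelope: the "context" a single CT study implies vs. an LLM window.
# Assumptions (illustrative): 512x512 slices, ~400 slices per study,
# one token per 16x16 pixel patch.
slice_h = slice_w = 512
num_slices = 400
patch = 16

voxels = slice_h * slice_w * num_slices                         # raw pixel count
tokens = (slice_h // patch) * (slice_w // patch) * num_slices   # patch tokens

llm_context = 100_000  # order of magnitude cited above for current models

print(f"voxels: {voxels:,}")        # 104,857,600
print(f"patch tokens: {tokens:,}")  # 409,600
print(f"vs. 100k-token context: {tokens / llm_context:.1f}x")   # 4.1x
```

Even with aggressive patching, one study exceeds a 100,000-token window several times over; at pixel granularity, it is on the order of 10^8 elements, consistent with the "millions of tokens" figure above.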

Furthermore, there is complexity in the tasks you need AI to perform, which fall into two broad categories: detection and extraction. Current AI systems, including ChatGPT, are used mainly for extracting insights from the text or corpus they were trained on. Detection, however, particularly of subtle anomalies, is far more challenging than extraction.

Consider a radiologist reading a CT scan and detecting a subtle brain aneurysm. This requires “detection” at “diagnostic” accuracy. Once the radiologist writes this finding into the report, anyone reading the report only needs “extractive” accuracy to understand the patient has a brain aneurysm. This is a key differentiator that necessitates “Precision AI” to achieve clinical relevance rather than the extractive accuracy you find in foundation models like ChatGPT.

GenAI Accuracy: A Work in Progress

Achieving accuracy in AI, particularly in healthcare applications, is a more intricate challenge than it might initially seem. Despite constant advancement, we may still be some distance from the level of accuracy necessary for effective clinical use of GenAI models. Most GenAI models, like ChatGPT, have been trained and validated to solve problems significantly different from diagnostic-level detection. Consider, for example, the difference in complexity between answering a question about a text and detecting a subtle brain hemorrhage in a CT scan. The latter is a task of immense precision and subtlety, which might require detecting a subtle change in a 15-pixel needle in a 100-million-pixel haystack. It is a vastly different problem, and its dimensionality is immense. Recent research tried using ChatGPT for detection in long text, an easier variant of the needle-in-a-haystack problem. It found that as the input grew (the number of words ChatGPT was given to search), the model became less capable of answering questions about that input, dropping below 60% accuracy.
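The scale gap described above can be quantified as a signal fraction. The 15-pixel and 100-million-pixel figures come from the text; the question-answering numbers (a ~20-word answer in a 10,000-word document) are illustrative assumptions for comparison.

```python
# Signal-to-volume ratio for the detection problem described above:
# a ~15-pixel finding inside a ~100-million-pixel CT study.
needle_pixels = 15
haystack_pixels = 100_000_000
detect_fraction = needle_pixels / haystack_pixels
print(f"detection signal fraction: {detect_fraction:.2e}")  # 1.50e-07

# For comparison, a QA-style extraction task: a ~20-word answer
# in a 10,000-word document (illustrative figures).
extract_fraction = 20 / 10_000
print(f"extraction signal fraction: {extract_fraction:.2e}")  # 2.00e-03

# The detection signal is ~four orders of magnitude sparser.
print(f"sparsity gap: {extract_fraction / detect_fraction:,.0f}x")  # 13,333x
```

Under these assumptions, the diagnostic signal is roughly ten thousand times sparser than the extractive one, which is one way to see why extractive benchmarks say little about detection performance.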

In short, ChatGPT is not great at finding a needle in a haystack, which is exactly what clinical AI needs.

