clinical study

Using a Large Language Model for Postdeployment Monitoring of FDA-Approved Artificial Intelligence: Pulmonary Embolism Detection Use Case

Materials & Methods
This retrospective study evaluated a post-deployment monitoring (PDM) framework that integrates a large language model (LLM) with human oversight to track the real-world performance of a commercially deployed pulmonary embolism (PE) detection AI (CVPED). A total of 11,999 CT pulmonary angiography (CTPA) studies performed between April 2023 and June 2024 were analyzed. LLM-based classification of free-text radiology reports was compared with CVPED outputs, and discrepancies were reviewed by radiologists. Drift was defined as a sustained discrepancy rate exceeding a 95% confidence interval for seven consecutive days.
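The drift criterion above can be sketched in code. This is a minimal illustration, not the study's implementation: it assumes daily discrepancy counts and study volumes are available, and uses a normal-approximation upper bound for the 95% confidence interval around a baseline discrepancy rate (the paper's exact interval construction may differ).

```python
import math

def drift_detected(daily_discrepancies, daily_volumes,
                   baseline_rate, days_required=7):
    """Flag drift when the daily discrepancy rate stays above the
    upper bound of a 95% CI around `baseline_rate` for
    `days_required` consecutive days (illustrative sketch)."""
    streak = 0
    for x, n in zip(daily_discrepancies, daily_volumes):
        # Upper 95% bound for the baseline proportion at daily volume n
        # (normal approximation; z = 1.96)
        upper = baseline_rate + 1.96 * math.sqrt(
            baseline_rate * (1 - baseline_rate) / n)
        streak = streak + 1 if (x / n) > upper else 0
        if streak >= days_required:
            return True
    return False
```

For example, with a baseline discrepancy rate of 3.1% and roughly 100 CTPAs per day, seven consecutive days at a 10% discrepancy rate would trigger the flag, while an isolated spike would reset the streak.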

Results
Among 11,999 CTPAs, 1,285 (10.7%) had PE and 373 (3.1%) showed discrepancies between the LLM and CVPED. Of 111 CVPED-positive/LLM-negative cases, 29 triggered alerts due to lack of radiologist engagement, identifying four true incremental PEs. A 2–3% decline in model specificity produced a two- to threefold rise in discrepancies, detectable within approximately three weeks.

Conclusions
An integrated LLM- and human-in-the-loop PDM framework enabled continuous AI performance tracking, early drift detection, and identification of incremental clinical value. This scalable monitoring system supports the ongoing safety and reliability of FDA-cleared AI tools in clinical practice.
