Unveiling large multimodal models in pulmonary CT: A comparative assessment of generative AI performance in lung cancer diagnostics
2025-07-30
(Press-News.org)
Generative AI (Gen-AI) is increasingly recognized for its potential in healthcare, particularly in complex radiological interpretation. However, its clinical utility requires thorough validation with real-world data.
Among 184 confirmed malignant lung tumor cases, diagnostic accuracy varied significantly across the three models. Gemini achieved the highest accuracy, followed by Claude-3-opus, both exceeding 90%, while GPT scored lowest at 65.22%. Statistical analysis confirmed that Gemini's diagnostic accuracy in single-image tasks significantly exceeded that of Claude and GPT. However, Gemini's accuracy plummeted to 58.51% with continuous slices, likely due to difficulties interpreting lesion continuity and spatial relationships. Adding clinical history raised it only slightly (68.30%), and Gemini still showed the most significant performance decline, suggesting that it relies too heavily on text input while neglecting imaging features. GPT also performed poorly with continuous CT slices and with clinical history, averaging 48.91% and 63.95% accuracy, respectively. Claude-3-opus and GPT were more stable across different image inputs, with Claude-3-opus showing a significant accuracy advantage on continuous slices.
Taking a result that was identical in at least two attempts as the final diagnosis, we compared model accuracy under different inputs. Claude outperformed Gemini, which in turn outperformed GPT.
After non-malignant nodules (n=66), inflammatory lesions (n=100), and normal lungs (n=54) were added to increase sample diversity, Claude and Gemini (both AUC = 0.61) performed best with single CT images. However, as input complexity increased, both models' AUC decreased significantly. GPT showed a slight AUC improvement with increased input complexity but remained in the 50-60% range, suggesting near-random performance.
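The consensus rule described above (a diagnosis counts only when at least two attempts return the same result) can be illustrated with a minimal Python sketch. The release does not state how many attempts were made per case or how cases without agreement were scored, so the three-attempt examples, the function names, the label strings, and the choice to count "no consensus" as incorrect are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_diagnosis(attempts: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the label given in at least `min_agree` attempts, else None.

    Assumption: with three attempts per case, at most one label can
    reach two votes, so the most common label is the only candidate.
    """
    if not attempts:
        return None
    label, count = Counter(attempts).most_common(1)[0]
    return label if count >= min_agree else None


def consensus_accuracy(cases: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Fraction of cases whose consensus label matches the ground truth.

    Assumption: a case with no consensus is counted as incorrect.
    """
    if len(cases) != len(truth):
        raise ValueError("cases and truth must have the same length")
    if not cases:
        raise ValueError("at least one case is required")
    correct = sum(
        1 for attempts, label in zip(cases, truth)
        if consensus_diagnosis(attempts) == label
    )
    return correct / len(cases)


# Hypothetical example: three attempts for each of three malignant cases.
example_cases = [
    ["malignant", "malignant", "benign"],   # consensus: malignant
    ["benign", "malignant", "benign"],      # consensus: benign (wrong)
    ["malignant", "benign", "uncertain"],   # no consensus
]
example_truth = ["malignant", "malignant", "malignant"]
example_accuracy = consensus_accuracy(example_cases, example_truth)
```

Requiring agreement across repeated attempts filters out one-off answers, which is one way to make accuracy figures less sensitive to run-to-run variation in model output.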
Simplified prompts significantly improved diagnostic performance: Claude (AUC = 0.69), Gemini (AUC = 0.76), and GPT (AUC = 0.73) all showed increased AUC values. Accuracy, sensitivity, specificity, and F1 scores also improved, indicating more balanced performance. However, this improvement was not consistent for Gemini and GPT when normal images were used as controls. ROC curves for different control groups further demonstrated Claude's significant diagnostic improvement, while Gemini and GPT struggled to recognize normal images. A comparison of pathological subtypes showed similar diagnostic sensitivity across all prompt environments, but overall performance was most balanced with simplified prompts.
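For readers less familiar with the metrics named above, the following Python sketch shows how accuracy, sensitivity, specificity, and F1 are conventionally derived from binary predictions. It uses the standard textbook definitions; the function name and the toy labels (1 = malignant, 0 = non-malignant) are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    # Confusion-matrix counts (1 is the positive, i.e. malignant, class).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),    # true positive rate
        "specificity": ratio(tn, tn + fp),    # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical toy data: three malignant cases and two controls.
toy_metrics = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

A model that labels nearly everything malignant scores high sensitivity but low specificity, which is why the release treats improvement across all four measures as a sign of more balanced performance.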
Evaluation of the lesion features identified by Gen-AI showed that Claude and GPT demonstrated greater diversity and accuracy than Gemini in locating and describing lesions. Likert self-assessment indicated that all models relied heavily on morphological and margin features for malignancy diagnosis, with "spiculated" and "irregular" as key differentiators. Lesion density and tumor size also played important roles. Because missing data could not be traced or supplemented during sequential queries, we further analyzed feature recognition and response rates.
Results showed that Morphology/Margins features had the highest response rates, with the "spiculated" and "lobulated" features especially prominent. Likert scale results indicated that the models weighted Morphology/Margins features most heavily in malignant tumor diagnosis. In non-malignant lesions, false positives displayed feature patterns similar to those of malignant cases but with reduced diversity. Coefficient of variation analysis showed that Claude had the lowest overall variation in the malignant lesion group. Claude and Gemini demonstrated good feature scoring consistency for both malignant and non-malignant lesions, while GPT showed greater fluctuations in malignant lesions.
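The coefficient of variation mentioned above is the standard deviation of a set of scores divided by their mean, so lower values indicate more consistent scoring. The sketch below uses the sample standard deviation, which is an assumption, since the release does not say which variant was used. The Likert scores and names are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """Return stdev / mean for a set of scores.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(scores) < 2:
        raise ValueError("at least two scores are required")
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(scores) / mean


# Hypothetical Likert scores for one feature across repeated queries.
steady_scores = [4, 4, 4]     # identical scores: no variation
variable_scores = [2, 4, 6]   # same mean as above, wider spread

steady_cv = coefficient_of_variation(steady_scores)
variable_cv = coefficient_of_variation(variable_scores)
```

Because it is normalized by the mean, the coefficient of variation allows consistency to be compared between features or models whose average scores differ.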
In misdiagnosed cases, the Gen-AI models showed significant deviations across multiple dimensions, in some cases completely opposite, indicating a potential risk of feature fabrication and calling into question the maturity of image feature learning during model training. For performance optimization, Lasso regression achieved AUCs of 0.896 and 0.884 before and after cross-validation, respectively, showing good stability. Stepwise regression achieved comparable AUC values (0.898 and 0.883) but with higher variability. The TCGA-LUAD, TCGA-LUSC, and MIDRC-RICORD-1A datasets were used for external validation. Consistent with earlier findings, Claude showed better overall performance with simplified prompts. After feature dimensionality reduction, Lasso's performance indicators became more balanced, as further confirmed by ROC curve analysis.
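The AUC values quoted throughout summarize a ROC curve in a single number: the probability that a randomly chosen malignant case receives a higher score than a randomly chosen non-malignant one, with ties counted as half. An AUC of 0.5 therefore corresponds to chance. The Python sketch below computes AUC directly from that pairwise definition. It is meant only to make the metric concrete and is not the authors' analysis pipeline. The function name and scores are hypothetical.

```python
from typing import Sequence


def auc_pairwise(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical malignancy scores: 1 = malignant, 0 = non-malignant.
toy_auc = auc_pairwise([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# A model that gives every case the same score cannot rank them at all.
chance_auc = auc_pairwise([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few hundred scans. Larger datasets are usually handled with a rank-based or library implementation.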