Despite AI advancements, human oversight remains essential

Study reveals its limitations in medical coding

2024-04-22

(Press-News.org) New York, NY [April 22, 2024]—State-of-the-art artificial intelligence systems known as large language models (LLMs) are poor medical coders, according to researchers at the Icahn School of Medicine at Mount Sinai. Their study, published in the April 19 online issue of NEJM AI [DOI: 10.1056/AIdbp2300040], emphasizes the necessity for refinement and validation of these technologies before considering clinical implementation.

The study extracted a list of more than 27,000 unique diagnosis and procedure codes from 12 months of routine care in the Mount Sinai Health System, while excluding identifiable patient data. Using the description for each code, the researchers prompted models from OpenAI, Google, and Meta to output the most accurate medical codes. The generated codes were compared with the original codes and errors were analyzed for any patterns.

The investigators reported that all of the studied large language models, including GPT-4, GPT-3.5, Gemini-pro, and Llama-2-70b, showed limited accuracy (below 50 percent) in reproducing the original medical codes, highlighting a significant gap in their usefulness for medical coding. GPT-4 demonstrated the best performance, with the highest exact match rates for ICD-9-CM (45.9 percent), ICD-10-CM (33.9 percent), and CPT codes (49.8 percent).
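As a rough illustration of the exact match metric reported above, the sketch below counts how often a generated code equals the original code after trivial normalization. It is not the study's actual pipeline: the function names, the normalization step, and the code strings are all hypothetical placeholders, not study data.

```python
def normalize(code: str) -> str:
    """Trim whitespace and upper-case a code (an assumed normalization step)."""
    return code.strip().upper()


def exact_match_rate(reference: list[str], generated: list[str]) -> float:
    """Return the fraction of generated codes that exactly match the reference codes."""
    if len(reference) != len(generated):
        raise ValueError("reference and generated must have the same length")
    if not reference:
        return 0.0
    matches = sum(normalize(r) == normalize(g) for r, g in zip(reference, generated))
    return matches / len(reference)


# Placeholder strings for illustration only; these are not codes from the study.
reference_codes = ["X00.10", "X11.22", "Y45.909", "Z9913"]
generated_codes = ["X00.1", "X11.29", "y45.909 ", "Z9913"]

rate = exact_match_rate(reference_codes, generated_codes)
print(f"Exact match rate: {rate:.1%}")
```

Under this strict definition, a generated code that is close in meaning but differs in even one character (such as a truncated or more general code) counts as an error.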

GPT-4 also produced the highest proportion of incorrectly generated codes that still conveyed the correct meaning. For example, when given the ICD-9-CM description "nodular prostate without urinary obstruction," GPT-4 generated a code for "nodular prostate," showcasing its comparatively nuanced understanding of medical terminology. However, even considering these technically correct codes, an unacceptably large number of errors remained.

The next best-performing model, GPT-3.5, had the greatest tendency toward being vague. It had the highest proportion of incorrectly generated codes that were accurate but more general than the precise codes. For example, when provided with the ICD-9-CM description "unspecified adverse effect of anesthesia," GPT-3.5 generated a code for "other specified adverse effects, not elsewhere classified."

"Our findings underscore the critical need for rigorous evaluation and refinement before deploying AI technologies in sensitive operational areas like medical coding," says study corresponding author Ali Soroush, MD, MS, Assistant Professor of Data-Driven and Digital Medicine (D3M), and Medicine (Gastroenterology), at Icahn Mount Sinai. "While AI holds great potential, it must be approached with caution and ongoing development to ensure its reliability and efficacy in health care."

One potential application for these models in the health care industry, say the investigators, is automating the assignment of medical codes for reimbursement and research purposes based on clinical text.

"Previous studies indicate that newer large language models struggle with numerical tasks. However, the extent of their accuracy in assigning medical codes from clinical text had not been thoroughly investigated across different models," says co-senior author Eyal Klang, MD, Director of the D3M’s Generative AI Research Program. "Therefore, our aim was to assess whether these models could effectively perform the fundamental task of matching a medical code to its corresponding official text description."

The study authors proposed that integrating LLMs with expert knowledge could automate medical code extraction, potentially enhancing billing accuracy and reducing administrative costs in health care.

"This study sheds light on the current capabilities and challenges of AI in health care, emphasizing the need for careful consideration and additional refinement prior to widespread adoption,” says co-senior author Girish Nadkarni, MD, MPH, Irene and Dr. Arthur M. Fishberg Professor of Medicine at Icahn Mount Sinai, Director of The Charles Bronfman Institute of Personalized Medicine, and System Chief of D3M.

The researchers caution that the study's artificial task may not fully represent real-world scenarios, where LLM performance could be worse.

Next, the research team plans to develop tailored LLM tools for accurate medical data extraction and billing code assignment, aiming to improve quality and efficiency in health care operations.

The study is titled “Generative Large Language Models are Poor Medical Coders: A Benchmarking Analysis of Medical Code Querying.”

The remaining authors on the paper, all with Icahn Mount Sinai except where indicated, are: Benjamin S. Glicksberg, PhD; Eyal Zimlichman, MD (Sheba Medical Center and Tel Aviv University, Israel); Yiftach Barash (Tel Aviv University and Sheba Medical Center, Israel); Robert Freeman, RN, MSN, NE-BC; and Alexander W. Charney, MD, PhD.

This research was supported by the AGA Research Foundation’s 2023 AGA-Amgen Fellowship-to-Faculty Transition Award AGA2023-32-06 and an NIH UL1TR004419 award.

The researchers affirm that the study was conducted without the use of any Protected Health Information (“PHI”).

Please see the study to view more details, including on competing interests.

-####-

About the Icahn School of Medicine at Mount Sinai

The Icahn School of Medicine at Mount Sinai is internationally renowned for its outstanding research, educational, and clinical care programs. It is the sole academic partner for the eight member hospitals* of the Mount Sinai Health System, one of the largest academic health systems in the United States, providing care to a large and diverse patient population.

Ranked 13th nationwide in National Institutes of Health (NIH) funding and among the 99th percentile in research dollars per investigator according to the Association of American Medical Colleges, Icahn Mount Sinai has a talented, productive, and successful faculty. More than 3,000 full-time scientists, educators, and clinicians work within and across 44 academic departments and 36 multidisciplinary institutes, a structure that facilitates tremendous collaboration and synergy. Our emphasis on translational research and therapeutics is evident in such diverse areas as genomics/big data, virology, neuroscience, cardiology, geriatrics, as well as gastrointestinal and liver diseases.

Icahn Mount Sinai offers highly competitive MD, PhD, and Master’s degree programs, with current enrollment of approximately 1,300 students. It has the largest graduate medical education program in the country, with more than 2,000 clinical residents and fellows training throughout the Health System. In addition, more than 550 postdoctoral research fellows are in training within the Health System.

A culture of innovation and discovery permeates every Icahn Mount Sinai program. Mount Sinai’s technology transfer office, one of the largest in the country, partners with faculty and trainees to pursue optimal commercialization of intellectual property to ensure that Mount Sinai discoveries and innovations translate into healthcare products and services that benefit the public.

Icahn Mount Sinai’s commitment to breakthrough science and clinical care is enhanced by academic affiliations that supplement and complement the School’s programs.

Through the Mount Sinai Innovation Partners (MSIP), the Health System facilitates the real-world application and commercialization of medical breakthroughs made at Mount Sinai. Additionally, MSIP develops research partnerships with industry leaders such as Merck & Co., AstraZeneca, Novo Nordisk, and others.

The Icahn School of Medicine at Mount Sinai is located in New York City on the border between the Upper East Side and East Harlem, and classroom teaching takes place on a campus facing Central Park. Icahn Mount Sinai’s location offers many opportunities to interact with and care for diverse communities. Learning extends well beyond the borders of our physical campus, to the eight hospitals of the Mount Sinai Health System, our academic affiliates, and globally.

-------------------------------------------------------

* Mount Sinai Health System member hospitals: The Mount Sinai Hospital; Mount Sinai Beth Israel; Mount Sinai Brooklyn; Mount Sinai Morningside; Mount Sinai Queens; Mount Sinai South Nassau; Mount Sinai West; and New York Eye and Ear Infirmary of Mount Sinai.
