A new method to steer AI output uncovers vulnerabilities and potential improvements

The work could lead to more reliable, more efficient, and less computationally expensive training of large language models

2026-02-19
(Press-News.org) A team of researchers has found a way to steer the output of large language models by manipulating specific concepts inside these models. The new method could lead to more reliable, more efficient, and less computationally expensive training of LLMs. But it also exposes potential vulnerabilities. 

The researchers, led by Mikhail Belkin at the University of California San Diego and Adit Radhakrishnan at the Massachusetts Institute of Technology, present their findings in the Feb. 19, 2026, issue of the journal Science. 

In the study, researchers went under the hood of several LLMs to locate specific concepts. They then mathematically increased or decreased the importance of these concepts in the LLM’s output. The work builds on a 2024 Science paper led by Belkin and Radhakrishnan, in which they described predictive algorithms known as Recursive Feature Machines. These machines identify patterns within a series of mathematical operations inside LLMs that encode specific concepts. 

“We found that we could mathematically modify these patterns with math that is surprisingly simple,” said Mikhail Belkin, a professor in the Halıcıoğlu Data Science Institute, which is part of the School of Computing, Information and Data Sciences at UC San Diego. 
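As a rough illustration of what increasing or decreasing a concept can mean mathematically, the sketch below adds a scaled concept vector to a model's hidden activations. It is not the researchers' code: the release says they locate concepts with Recursive Feature Machines, whereas this toy swaps in a simple difference of mean activations, and the synthetic data, dimensions, and the names `concept_direction`, `steer` and `alpha` are all assumptions made for illustration.

```python
import numpy as np

def concept_direction(pos_acts, neg_acts):
    """Unit vector pointing from 'concept absent' toward 'concept present'.

    Assumption: a difference of mean activations stands in for the
    concept-finding step; the study itself uses Recursive Feature Machines.
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(hidden, direction, alpha):
    """Shift every hidden-state row by alpha * direction.

    alpha > 0 amplifies the concept; alpha < 0 suppresses it.
    """
    return hidden + alpha * direction

# Synthetic stand-in for activations collected from one layer of a model.
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[3] = 1.0                                       # hypothetical concept axis
neg = rng.normal(size=(200, dim))                       # prompts without the concept
pos = rng.normal(size=(200, dim)) + 2.0 * true_dir      # prompts with the concept

v = concept_direction(pos, neg)
h = rng.normal(size=(5, dim))                           # activations to be steered
boosted = steer(h, v, 4.0)
suppressed = steer(h, v, -4.0)
```

In a real model, the shift would be applied to a layer's activations during generation, and the size of `alpha` would need tuning so the output changes without becoming incoherent.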

Using this steering approach, the research team conducted experiments on some of the largest open-source LLMs in use today, such as Llama and DeepSeek, identifying and influencing 512 concepts within five classes, ranging from fears to moods to locations. The method worked not only in English, but also in languages such as Chinese and Hindi. 

Both studies matter because, until recently, the processes inside LLMs have been essentially locked inside a black box. That makes it hard to understand how the models arrive at the answers they give users, which vary in accuracy. 

Improving performance and uncovering vulnerabilities

The researchers found that steering can be used to improve LLM output. For example, steering improved LLM performance on narrow, precise tasks, such as translating Python code into C++. The researchers also used the method to identify hallucinations. 
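The same kind of internal pattern can, in principle, be read rather than modified, which is one way to picture using it to flag suspect outputs. The sketch below is a toy under stated assumptions, not the study's detector: it scores synthetic activation vectors by projecting them onto a difference-of-means direction and thresholds at the midpoint. The data and the names `fit_monitor` and `flag` are hypothetical.

```python
import numpy as np

def fit_monitor(flagged, clean):
    """Return (direction, threshold) separating two sets of activations.

    Assumption: a difference-of-means direction with a midpoint threshold,
    used here only as a simple stand-in for a learned concept detector.
    """
    mu_f, mu_c = flagged.mean(axis=0), clean.mean(axis=0)
    direction = (mu_f - mu_c) / np.linalg.norm(mu_f - mu_c)
    threshold = 0.5 * (mu_f @ direction + mu_c @ direction)
    return direction, threshold

def flag(acts, direction, threshold):
    """True for rows whose projection onto the direction exceeds the threshold."""
    return acts @ direction > threshold

# Synthetic activations: the "flagged" class is shifted along four coordinates.
rng = np.random.default_rng(1)
dim = 32
shift = np.zeros(dim)
shift[:4] = 1.5
clean_train = rng.normal(size=(200, dim))
flagged_train = rng.normal(size=(200, dim)) + shift

direction, threshold = fit_monitor(flagged_train, clean_train)

# Held-out synthetic examples to check that the monitor generalizes.
clean_test = rng.normal(size=(100, dim))
flagged_test = rng.normal(size=(100, dim)) + shift
accuracy = 0.5 * (
    flag(flagged_test, direction, threshold).mean()
    + (~flag(clean_test, direction, threshold)).mean()
)
```

Here `accuracy` is the balanced accuracy on the synthetic held-out set. With real model activations, the labels would have to come from examples already known to be hallucinated or correct.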

But the method can also be used as an attack against LLMs. By decreasing the importance of the concept of refusal, the researchers found that their method could get an LLM to operate outside of its guardrails, a practice known as jailbreaking. An LLM gave instructions about how to use cocaine. It also provided Social Security numbers, although it’s unclear whether they were real or fabricated.

The method can also be used to boost political bias and a conspiracy theory mindset inside an LLM. In one instance, an LLM claimed that a satellite image of the Earth was the result of a NASA conspiracy to cover up that the Earth is flat. An LLM also claimed that the COVID vaccine was poisonous. 

Computational savings and next steps

The approach is more computationally efficient than existing methods. Using a single NVIDIA Ampere series (A100) graphics processing unit (GPU), it took less than one minute and fewer than 500 training samples to identify the patterns and steer them toward a concept of interest. This shows that the method could be easily integrated into standard LLM training methods. 

The researchers were not able to test their approach on commercial, closed LLMs, such as Claude. But they believe this type of steering would work with any open-source model. “We observed that newer and larger LLMs were more steerable,” they write. The method might also work on smaller, open-source models that can run on a laptop. 

Next steps include improving the steering method to adapt to specific inputs and specific applications. 

“These results suggest that the models know more than they express in responses and that understanding internal representations could lead to fundamental performance and safety improvements,” the research team writes.

This work was supported in part by the National Science Foundation, the Simons Foundation, the UC San Diego-led TILOS institute and the U.S. Office of Naval Research. 

Toward universal steering and monitoring of AI models

Daniel Beaglehole and Mikhail Belkin, University of California San Diego, Department of Computer Science and Engineering, Jacobs School of Engineering and Halıcıoğlu Data Science Institute 

Adityanarayanan Radhakrishnan, Massachusetts Institute of Technology, Broad Institute, MIT and Harvard

Enric Boix-Adserà, Wharton School, University of Pennsylvania

Beaglehole and Radhakrishnan contributed to the work equally 

 

 




 
