PRESS-NEWS.org - Press Release Distribution

A new way to let AI chatbots converse all day without crashing

Researchers developed a simple yet effective solution for a puzzling problem that can worsen the performance of large language models such as ChatGPT

2024-02-13
(Press-News.org)

When a human-AI conversation involves many rounds of continuous dialogue, the large language models that drive chatbots like ChatGPT sometimes start to collapse, causing the bots’ performance to rapidly deteriorate.

A team of researchers from MIT and elsewhere has pinpointed a surprising cause of this problem and developed a simple solution that enables a chatbot to maintain a nonstop conversation without crashing or slowing down.

Their method involves a tweak to the key-value cache (a kind of conversation memory) at the core of many large language models. In some approaches, when this cache needs to hold more information than it has capacity for, the first pieces of data are bumped out, which can cause the model to fail.

By ensuring that these first few data points remain in memory, the researchers’ method allows a chatbot to keep chatting no matter how long the conversation goes.

The method, called StreamingLLM, enables a model to remain efficient even when a conversation stretches on for more than 4 million tokens (the units of text a model processes). When compared to another method that avoids crashing by constantly recomputing part of the past conversations, StreamingLLM performed more than 22 times faster.

This could allow a chatbot to conduct long conversations throughout the workday without needing to be continually rebooted, enabling efficient AI assistants for tasks like copywriting, editing, or generating code.

“Now, with this method, we can persistently deploy these large language models. By making a chatbot that we can always chat with, and that can always respond to us based on our recent conversations, we could use these chatbots in some new applications,” says Guangxuan Xiao, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on StreamingLLM.

Xiao’s co-authors include his advisor, Song Han, an associate professor in EECS, a member of the MIT-IBM Watson AI Lab, and a distinguished scientist at NVIDIA; as well as Yuandong Tian, a research scientist at Meta AI; Beidi Chen, an assistant professor at Carnegie Mellon University; and senior author Mike Lewis, a research scientist at Meta AI. The work will be presented at the International Conference on Learning Representations.

A puzzling phenomenon

Large language models encode data, like words in a user query, into representations called tokens. Many models employ what is known as an attention mechanism that uses these tokens to generate new text.

Typically, an AI chatbot writes new text based on text it has just seen, so it stores recent tokens in memory, called a KV cache, to use later. The attention mechanism builds a grid over all tokens in the cache, called an “attention map,” that maps out how strongly each token, or word, relates to every other token.
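To make the “attention map” concrete, here is a minimal sketch of a single grid using scaled dot-product attention, the standard Transformer formulation. It assumes three cached tokens represented by hypothetical two-number vectors; real models derive queries and keys through learned projections, use many attention heads and layers, and mask out future tokens, all of which this toy version leaves out.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(queries, keys):
    """Row i of the result says how strongly token i attends to each cached token."""
    d = len(keys[0])
    grid = []
    for q in queries:
        # scaled dot-product score between this query and every cached key
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        grid.append(softmax(scores))
    return grid

# Hypothetical 2-dimensional vectors standing in for three cached tokens.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
grid = attention_map(vectors, vectors)
for row in grid:
    print([round(w, 2) for w in row])
```

Each printed row is one token’s view of the whole cache, so a cache of n tokens produces an n-by-n grid.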

Understanding these relationships is one feature that enables large language models to generate human-like text.

But when the cache gets very large, the attention map, which scores every pair of tokens, grows even faster, which slows down computation.
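A quick count shows why the grid outpaces the cache. This sketch assumes one full attention map for a single attention head, with one entry per pair of tokens; the cache sizes 256 and 4,096 are just illustrative values.

```python
def attention_map_entries(cache_size):
    """A full attention map scores every token against every token: n * n entries."""
    return cache_size * cache_size

for n in (256, 4096):
    print(n, attention_map_entries(n))
# 256 65536
# 4096 16777216
```

Growing the cache 16-fold (from 256 to 4,096 tokens) makes the full grid 256 times larger.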

Also, if the content to be encoded requires more tokens than the cache can hold, the model’s performance drops. For instance, one popular model can store 4,096 tokens, yet there are about 10,000 tokens in an academic paper.

To get around these problems, researchers employ a “sliding cache” that bumps out the oldest tokens to add new tokens. However, the model’s performance often plummets as soon as that first token is evicted, rapidly reducing the quality of newly generated words.
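The sliding cache can be sketched with Python’s bounded `collections.deque`, which discards its oldest item whenever a new one is appended to a full deque. This is a simplified illustration, assuming each token is represented only by its position number rather than the key and value data a real model stores; it reuses the 4,096-token capacity and the roughly 10,000-token paper from the paragraph above.

```python
from collections import deque

CACHE_SIZE = 4096    # capacity of the popular model mentioned above
DOC_TOKENS = 10_000  # rough length of an academic paper, in tokens

# A bounded deque drops its oldest item on each append once it is full,
# which is the "sliding cache" eviction policy.
cache = deque(maxlen=CACHE_SIZE)
for position in range(DOC_TOKENS):
    cache.append(position)  # each token is represented here by its position number

print(len(cache))   # 4096
print(cache[0])     # 5904: the oldest token still held
print(0 in cache)   # False: the very first token has been evicted
```

By the end of the paper, 5,904 tokens have been pushed out, including the first one, which is exactly the token whose loss the researchers linked to the performance collapse.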

In this new paper, researchers realized that if they keep the first token in the sliding cache, the model will maintain its performance even when the cache size is exceeded. 

But this didn’t make any sense. The first word in a novel likely has nothing to do with the last word, so why would the first word be so important for the model to generate the newest word? 

The researchers also uncovered the cause of this phenomenon.

Attention sinks

Some models use a Softmax operation in their attention mechanism, which assigns each token a score representing how much it relates to other tokens. The Softmax operation requires these attention scores to sum to 1. Since most tokens aren’t strongly related, their attention scores are very low, so the model dumps the remaining attention score into the first token.
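The sum-to-1 constraint can be seen in a few lines. The raw scores below are hypothetical, chosen so that token 0 plays the role of the sink while the other tokens are only weakly related; they are not measurements from the paper, and the sketch covers a single Softmax rather than a whole model.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw attention scores: token 0 acts as the sink,
# the remaining tokens are only weakly related to the current one.
scores = [4.0, 0.5, 0.2, 0.1, 0.3]

with_sink = softmax(scores)
without_sink = softmax(scores[1:])  # the same scores after the first token is evicted

print(round(sum(with_sink), 6))              # weights always sum to 1
print(round(with_sink[0], 2))                # the sink soaks up most of the weight
print([round(w, 2) for w in with_sink[1:]])  # other tokens get very little
print([round(w, 2) for w in without_sink])   # the leftover weight is forced onto them
```

Once the sink is gone, the weight it was absorbing has to be spread over tokens that are barely related, which sharply changes what the attention mechanism passes on.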

The researchers call this first token an “attention sink.”

“We need an attention sink, and the model decides to use the first token as the attention sink because it is globally visible — every other token can see it. We found that we must always keep the attention sink in the cache to maintain the model dynamics,” Han says.  
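Why is the first token “globally visible”? Chatbot-style models typically use a causal mask, meaning each token may attend only to itself and earlier tokens. The press release does not spell out the masking, so this sketch assumes that standard left-to-right setup and counts how many positions can see each token.

```python
def causal_visibility(n):
    """For each key position j, count the query positions i that may attend to it (i >= j)."""
    return [sum(1 for i in range(n) if i >= j) for j in range(n)]

print(causal_visibility(6))  # [6, 5, 4, 3, 2, 1]
```

Only position 0 is visible to every token in the sequence, which makes it a natural place for the model to park unneeded attention.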

In building StreamingLLM, the researchers discovered that having four attention sink tokens at the beginning of the sliding cache leads to optimal performance. 
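The eviction policy described here can be sketched as a cache that pins the first few tokens and slides a fixed-size window over everything after them. `AttentionSinkCache` is a hypothetical helper, not the researchers’ code or any library’s API; it tracks token positions only, uses an illustrative window of 8, and does not model positional encoding or the per-layer key and value data a real cache holds.

```python
from collections import deque

class AttentionSinkCache:
    """Hypothetical illustration: keep the first `n_sinks` tokens forever
    plus a sliding window of the most recent tokens."""

    def __init__(self, n_sinks=4, window=8):
        self.n_sinks = n_sinks
        self.sinks = []
        self.recent = deque(maxlen=window)

    def append(self, token):
        if len(self.sinks) < self.n_sinks:
            self.sinks.append(token)   # the first few tokens become permanent attention sinks
        else:
            self.recent.append(token)  # later tokens slide through a fixed-size window

    def contents(self):
        return self.sinks + list(self.recent)

cache = AttentionSinkCache(n_sinks=4, window=8)
for position in range(20):
    cache.append(position)
print(cache.contents())  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

The cache never grows past `n_sinks + window` entries, however long the conversation runs, yet the four sink tokens are never evicted.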

They also found that each token’s positional encoding must be based on its position within the cache rather than its position in the original text. If token 5 is bumped out, token 6 is encoded according to its new place as the fifth token in the cache, not its original position.

By combining these two ideas, they enabled StreamingLLM to maintain a continuous conversation while outperforming a popular method that uses recomputation.

For instance, when the cache has 256 tokens, the recomputation method takes 63 milliseconds to decode a new token, while StreamingLLM takes 31 milliseconds. However, if the cache size grows to 4,096 tokens, recomputation requires 1,411 milliseconds for a new token, while StreamingLLM needs just 65 milliseconds.
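The speedups implied by these latencies can be checked directly. The figures below are the per-token decoding times quoted in the paragraph above; the only added step is dividing one by the other.

```python
# Per-token decoding latency in milliseconds, as quoted in the paragraph above.
latency_ms = {
    256: {"recomputation": 63, "streamingllm": 31},
    4096: {"recomputation": 1411, "streamingllm": 65},
}

speedups = {
    size: t["recomputation"] / t["streamingllm"] for size, t in latency_ms.items()
}
for size, speedup in speedups.items():
    print(f"{size} tokens: {speedup:.1f}x faster")
# 256 tokens: 2.0x faster
# 4096 tokens: 21.7x faster
```

The gap widens with cache size: going from 256 to 4,096 tokens multiplies the recomputation latency by more than 20, while StreamingLLM’s latency roughly doubles.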

The researchers also explored the use of attention sinks during model training by prepending several placeholder tokens to all training samples.

They found that training with attention sinks allowed a model to maintain performance with only one attention sink in its cache, rather than the four that are usually required to stabilize a pretrained model’s performance.  
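Prepending placeholder tokens is a simple data-preparation step. In this sketch, `prepend_sink` and the placeholder id `0` are hypothetical; a real setup would reserve a dedicated vocabulary entry for the placeholder. The `n_sinks` parameter defaults to one, matching the single attention sink described above.

```python
SINK_TOKEN_ID = 0  # hypothetical id reserved for the placeholder (sink) token

def prepend_sink(samples, sink_id=SINK_TOKEN_ID, n_sinks=1):
    """Put `n_sinks` placeholder tokens at the start of every training sample."""
    return [[sink_id] * n_sinks + sample for sample in samples]

batch = [[17, 42, 7], [5, 9]]  # hypothetical token ids
print(prepend_sink(batch))     # [[0, 17, 42, 7], [0, 5, 9]]
```

Because every sample then begins with the same placeholder, the model can learn to use it as its attention sink instead of whatever real token happens to come first.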

But while StreamingLLM enables a model to conduct a continuous conversation, the model cannot remember words that aren’t stored in the cache. In the future, the researchers plan to target this limitation by investigating methods to retrieve tokens that have been evicted or enable the model to memorize previous conversations.

StreamingLLM has been incorporated into NVIDIA's large language model optimization library, TensorRT-LLM.

This work is funded, in part, by the MIT-IBM Watson AI Lab, the MIT Science Hub, and the U.S. National Science Foundation.
 

###

Written by Adam Zewe, MIT News

Paper: “Efficient streaming language models with attention sinks”

https://arxiv.org/pdf/2309.17453.pdf

END


