Technique improves the reasoning capabilities of large language models

Combining natural language and programming, the method enables LLMs to solve numerical, analytical, and language-based tasks transparently

2024-06-14

(Press-News.org) CAMBRIDGE, MA - Large language models like those that power ChatGPT have shown impressive performance on tasks like drafting legal briefs, analyzing the sentiment of customer reviews, or translating documents into different languages.

These machine-learning models typically use only natural language to process information and answer queries, which can make it difficult for them to perform tasks that require numerical or symbolic reasoning.

For instance, a large language model might be able to memorize and recite a list of recent U.S. presidents and their birthdays, but that same model could fail if asked the question “Which U.S. presidents elected after 1950 were born on a Wednesday?” (The answer is Jimmy Carter.)

Researchers from MIT and elsewhere have proposed a new technique that enables large language models to solve natural language, math and data analysis, and symbolic reasoning tasks by generating programs.

Their approach, called natural language embedded programs (NLEPs), involves prompting a language model to create and execute a Python program to solve a user’s query, and then output the solution as natural language.
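The loop of generating a program, running it, and reporting the printed answer can be pictured in a few lines of Python. This is a minimal sketch under stated assumptions, not the researchers' code. `generate_program` is a hypothetical placeholder for a language-model call; it returns a fixed program string so the example runs offline. `run_nlep` is a hypothetical helper that executes the program and captures the sentence it prints.

```python
import contextlib
import io


def generate_program(query: str) -> str:
    """Hypothetical stand-in for an LLM call that returns NLEP source code.

    A real system would send `query` to a model. Here a fixed program is
    returned so the sketch runs without network access.
    """
    return (
        "numbers = [3, 5, 7]\n"
        "total = sum(numbers)\n"
        "print(f'The sum of {numbers} is {total}.')\n"
    )


def run_nlep(program: str) -> str:
    """Execute generated Python source and return everything it printed."""
    buffer = io.StringIO()
    # Use a fresh namespace so the generated code cannot overwrite our globals.
    namespace: dict = {}
    with contextlib.redirect_stdout(buffer):
        exec(program, namespace)
    return buffer.getvalue().strip()


if __name__ == "__main__":
    query = "What is the sum of 3, 5 and 7?"
    print(run_nlep(generate_program(query)))
```

Because `exec` runs whatever code it is given, a real deployment would need to execute model-written programs in a sandbox rather than in the host process.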

They found that NLEPs enabled large language models to achieve higher accuracy on a wide range of reasoning tasks. The approach is also generalizable, which means one NLEP prompt can be reused for multiple tasks.

NLEPs also improve transparency, since a user could check the program to see exactly how the model reasoned about the query and fix the program if the model gave a wrong answer.

“We want AI to perform complex reasoning in a way that is transparent and trustworthy. There is still a long way to go, but we have shown that combining the capabilities of programming and natural language in large language models is a very good potential first step toward a future where people can fully understand and trust what is going on inside their AI model,” says Hongyin Luo PhD ’22, an MIT postdoc and co-lead author of a paper on NLEPs.

Luo is joined on the paper by co-lead authors Tianhua Zhang, a graduate student at the Chinese University of Hong Kong, and Jiaxin Ge, an undergraduate at Peking University; Yoon Kim, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author James Glass, senior research scientist and head of the Spoken Language Systems Group in CSAIL; and others. The research will be presented at the Annual Conference of the North American Chapter of the Association for Computational Linguistics.

Problem-solving with programs

Many popular large language models work by predicting the next word, or token, given some natural language input. While models like GPT-4 can be used to write programs, they embed those programs within natural language, which can lead to errors in the program reasoning or results.

With NLEPs, the MIT researchers took the opposite approach. They prompt the model to generate a step-by-step program entirely in Python code, and then embed the necessary natural language inside the program.

An NLEP is a problem-solving template with four steps. First, the model imports the packages, or functions, it will need to solve the task. Step two involves importing natural language representations of the knowledge the task requires (like a list of U.S. presidents’ birthdays). For step three, the model implements a function that calculates the answer. And for the final step, the model outputs the result as a line of natural language with an automatic data visualization, if needed.
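Here is a sketch of what an NLEP for the presidents question from the introduction might look like, following the four steps. It is hand-written for illustration and is not output from the paper. The data covers only presidents first elected between 1952 and 2020. Gerald Ford, who was never elected president, is excluded. The names `PRESIDENTS` and `presidents_born_on` are illustrative.

```python
# Step 1: import the packages the task needs.
from datetime import date

# Step 2: state the knowledge the task requires as structured data.
# Each entry: (name, year first elected president, date of birth).
PRESIDENTS = [
    ("Dwight D. Eisenhower", 1952, date(1890, 10, 14)),
    ("John F. Kennedy", 1960, date(1917, 5, 29)),
    ("Lyndon B. Johnson", 1964, date(1908, 8, 27)),
    ("Richard Nixon", 1968, date(1913, 1, 9)),
    ("Jimmy Carter", 1976, date(1924, 10, 1)),
    ("Ronald Reagan", 1980, date(1911, 2, 6)),
    ("George H. W. Bush", 1988, date(1924, 6, 12)),
    ("Bill Clinton", 1992, date(1946, 8, 19)),
    ("George W. Bush", 2000, date(1946, 7, 6)),
    ("Barack Obama", 2008, date(1961, 8, 4)),
    ("Donald Trump", 2016, date(1946, 6, 14)),
    ("Joe Biden", 2020, date(1942, 11, 20)),
]

# date.weekday() numbers Monday as 0 through Sunday as 6.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


# Step 3: implement a function that computes the answer.
def presidents_born_on(weekday: str, elected_after: int) -> list:
    """Names of presidents first elected after `elected_after` born on `weekday`."""
    target = WEEKDAYS.index(weekday)
    return [
        name
        for name, elected, born in PRESIDENTS
        if elected > elected_after and born.weekday() == target
    ]


# Step 4: report the result as a line of natural language.
answer = presidents_born_on("Wednesday", 1950)
print(f"U.S. presidents elected after 1950 who were born on a Wednesday: {', '.join(answer)}.")
```

Running it names Jimmy Carter, matching the answer given earlier. Every intermediate fact, including each birth date and each weekday check, is visible in the code for a user to audit.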

“It is like a digital calculator that always gives you the correct computation result as long as the program is correct,” Luo says.

The user can easily investigate the program and fix any errors in the code directly rather than needing to rerun the entire model to troubleshoot.

The approach also offers greater efficiency than some other methods. If a user has many similar questions, they can generate one core program and then replace certain variables without needing to run the model repeatedly.
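This kind of reuse can be pictured as follows. Once a model has written a core program, the values that differ between similar questions can be lifted into parameters and swapped in directly. The sketch assumes a hypothetical family of questions of the form "How many days are there between date A and date B?" The function name `days_between_answer` is illustrative, not from the paper.

```python
from datetime import date


def days_between_answer(start: date, end: date) -> str:
    """Core program written once; only the two dates change per question."""
    delta = (end - start).days
    return f"There are {delta} days between {start.isoformat()} and {end.isoformat()}."


# Several similar questions answered by substituting new variable values,
# with no further calls to the language model.
questions = [
    (date(2024, 1, 1), date(2024, 6, 14)),
    (date(2023, 1, 1), date(2024, 1, 1)),
]
for start, end in questions:
    print(days_between_answer(start, end))
```

A parameterized function is one simple way to realize this reuse. The article does not specify how the researchers substitute variables in practice.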

To prompt the model to generate an NLEP, the researchers give it an overall instruction to write a Python program, provide two NLEP examples (one with math and one with natural language), and one test question.
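The prompt structure can be sketched as plain string assembly. The instruction wording and the two demonstration programs below are hypothetical placeholders, not the prompts used in the study. Only the layout follows the description above: one instruction, one math example, one language example, and one test question.

```python
# Hypothetical instruction text; the study's actual wording is not reproduced here.
INSTRUCTION = (
    "Write a Python program that answers the question below. "
    "Import the packages you need, define the required knowledge as data, "
    "implement a function that computes the answer, and print the answer "
    "as a natural language sentence."
)

# Two placeholder demonstrations: one math-oriented, one language-oriented.
EXAMPLES = [
    {
        "question": "What is 17 multiplied by 23?",
        "program": "print(f'17 multiplied by 23 is {17 * 23}.')",
    },
    {
        "question": "Is the review 'The battery died in an hour' positive or negative?",
        "program": (
            "review = 'The battery died in an hour'\n"
            "negative_cues = ['died', 'broke', 'refund']\n"
            "label = 'negative' if any(c in review for c in negative_cues) else 'positive'\n"
            "print(f'The review is {label}.')"
        ),
    },
]


def build_prompt(test_question: str) -> str:
    """Assemble the instruction, the two demonstrations, and one test question."""
    parts = [INSTRUCTION, ""]
    for example in EXAMPLES:
        parts.append(f"Question: {example['question']}")
        parts.append(f"Program:\n{example['program']}")
        parts.append("")
    # The model is expected to continue after the final "Program:" line.
    parts.append(f"Question: {test_question}")
    parts.append("Program:")
    return "\n".join(parts)


print(build_prompt("Which U.S. presidents elected after 1950 were born on a Wednesday?"))
```

Only `test_question` changes between tasks. This mirrors the claim that a single NLEP prompt can be reused across many problems.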

“Usually, when people do this kind of few-shot prompting, they still have to design prompts for every task. We found that we can have one prompt for many tasks because it is not a prompt that teaches LLMs to solve one problem, but a prompt that teaches LLMs to solve many problems by writing a program,” says Luo.

“Having language models reason with code unlocks many opportunities for tool use, output validation, more structured understanding into model's capabilities and way of thinking, and more,” says Leonid Karlinsky, principal scientist at the MIT-IBM Watson AI Lab.

“No magic here”

NLEPs achieved greater than 90 percent accuracy when prompting GPT-4 to solve a range of symbolic reasoning tasks, like tracking shuffled objects or playing a game of 24, as well as instruction-following and text classification tasks. The researchers found that NLEPs even exhibited 30 percent greater accuracy than task-specific prompting methods. The method also improved the performance of open-source LLMs.
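The game of 24 asks for a way to combine four numbers with addition, subtraction, multiplication and division to reach 24. The sketch below shows the kind of exhaustive search a generated program could run. It was written for illustration and is not a program produced in the study. The function name `solve_24` is hypothetical. `Fraction` keeps division exact, so solutions such as 8 / (3 - 8 / 3) are not lost to floating-point rounding.

```python
from fractions import Fraction
from itertools import permutations
from typing import Optional


def solve_24(numbers: list, target: int = 24) -> Optional[str]:
    """Return an arithmetic expression over `numbers` equal to `target`, or None."""

    def search(items: list) -> Optional[str]:
        # items holds (exact value, expression string) pairs.
        if len(items) == 1:
            value, expr = items[0]
            return expr if value == target else None
        # Pick an ordered pair, combine it, and recurse on the shorter list.
        for i, j in permutations(range(len(items)), 2):
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            candidates = [
                (a + b, f"({ea} + {eb})"),
                (a - b, f"({ea} - {eb})"),
                (a * b, f"({ea} * {eb})"),
            ]
            if b != 0:
                candidates.append((a / b, f"({ea} / {eb})"))
            for value, expr in candidates:
                found = search(rest + [(value, expr)])
                if found is not None:
                    return found
        return None

    return search([(Fraction(n), str(n)) for n in numbers])


expression = solve_24([4, 7, 8, 8])
print(f"One way to make 24: {expression}" if expression else "No solution exists.")
```

Searching every ordered pair at each step covers both a - b and b - a, and both a / b and b / a. As a result, no combination of the four numbers is skipped.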

Along with boosting the accuracy of large language models, NLEPs could also improve data privacy. Since NLEP programs are run locally, sensitive user data do not need to be sent to a company like OpenAI or Google to be processed by a model.

In addition, NLEPs can enable small language models to perform better without the need to retrain a model for a certain task, which can be a costly process.

“There is no magic here. We do not have a more expensive or fancy language model. All we do is use program generation instead of natural language generation, and we can make it perform significantly better,” Luo says.

However, an NLEP relies on the program generation capability of the model, so the technique does not work as well for smaller models which have been trained on limited datasets. In the future, the researchers plan to study methods that could make smaller language models generate more effective NLEPs. In addition, they want to investigate the impact of prompt variations on NLEPs to enhance the robustness of the model’s reasoning processes.

###

This research was supported, in part, by the Center for Perceptual and Interactive Intelligence of Hong Kong.
