
ETRI develops an automated benchmark for language-based task planners

Reduces cost and time and enables objective assessment. Releases evaluation results for 33 LLMs, along with strategies for improving task planning performance

2024-04-26
(Press-News.org) If instructed to “Place a cooled apple into the microwave,” how would a robot respond?

First, the robot would need to locate an apple, pick it up, find the refrigerator, open its door, and place the apple inside. It would then close the refrigerator door, reopen it to retrieve the cooled apple, pick up the apple again, and close the door. Finally, the robot would need to locate the microwave, open its door, place the apple inside, and close the microwave door. Evaluating how well these steps are planned and executed is the essence of benchmarking task planning AI: it measures how effectively a robot can respond to a command and follow the required procedure.
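The sequence described above can be written down as an ordered list of primitive steps. The sketch below is purely illustrative: the Step type and the action names (find, pick_up, open, put_in, close) are assumptions made for this example, not the action vocabulary of any particular benchmark or simulator.

```python
from typing import NamedTuple


class Step(NamedTuple):
    action: str  # hypothetical action name, chosen for illustration
    target: str  # object or receptacle the action applies to


# The thirteen steps from the paragraph above, in order.
PLAN = [
    Step("find", "apple"),
    Step("pick_up", "apple"),
    Step("find", "refrigerator"),
    Step("open", "refrigerator"),
    Step("put_in", "refrigerator"),
    Step("close", "refrigerator"),
    Step("open", "refrigerator"),
    Step("pick_up", "apple"),
    Step("close", "refrigerator"),
    Step("find", "microwave"),
    Step("open", "microwave"),
    Step("put_in", "microwave"),
    Step("close", "microwave"),
]


def describe(plan):
    """Return the plan as numbered, human-readable lines."""
    return [f"{i}. {step.action} {step.target}" for i, step in enumerate(plan, start=1)]


if __name__ == "__main__":
    print("\n".join(describe(PLAN)))
```

Writing the plan as data rather than free text is what makes it possible to execute and score it automatically.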

 

An ETRI research team has developed a technology that automatically evaluates the performance of task plans generated by Large Language Models (LLMs)1), paving the way for fast and objective assessment of task planning AIs.
1) Language models are constructed from artificial neural networks that contain a vast number of parameters.

The Electronics and Telecommunications Research Institute (ETRI) has announced the development of LoTa-Benchmark (LoTa-Bench)2), which enables the automatic evaluation of language-based task planners. A language-based task planner understands a verbal instruction from a human user, plans a sequence of operations, and autonomously executes those operations to fulfill the goal of the instruction.
2) LoTa-Bench: A procedural generation artificial intelligence benchmark technology developed by ETRI, abbreviated from Language-oriented Task Planning.

The research team published a paper at one of the leading international AI conferences, the International Conference on Learning Representations (ICLR)3), and shared the evaluation results for a total of 33 large language models through GitHub.
3) ICLR (International Conference on Learning Representations): https://openreview.net/forum?id=ADSxCpCu9s

Recently, large language models have demonstrated remarkable performance not only in language processing, conversation, mathematical problem solving, and logical proofs but also in understanding human commands, autonomously selecting sub-tasks, and executing them in sequence to achieve goals. Consequently, there has been a widespread effort to apply large language models to robotics and service applications.

Previously, no benchmark4) technology could automatically evaluate task planning performance, so assessments had to be done manually, which was labor-intensive. For instance, in existing research, including Google’s SayCan5), multiple people directly observed the results of executed tasks and then voted on their success or failure. This approach not only required significant time and effort but also allowed subjective judgment to influence the results.
4) Benchmark: A system that uses programs to compare and evaluate the performance of computer components, among other things, assigning a score based on their efficiency.
5) https://say-can.github.io/

The LoTa-Bench technology developed by ETRI automates the evaluation process: it actually executes the task plans that large language models generate from user commands, then automatically compares the outcomes with the intended results of the commands to determine whether the plans succeeded. This approach significantly reduces evaluation time and cost and ensures that the evaluation results are objective.
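The execute-then-compare idea can be illustrated with a toy example. The sketch below is not LoTa-Bench code and does not use a real simulator: the world state, the action names, and the rule that an apple becomes cooled once it is shut inside the refrigerator are all simplifying assumptions made for illustration.

```python
def execute(plan):
    """Run a plan in a toy world; return the final state, or None if a step is invalid."""
    state = {"holding": None, "apple_in": None, "apple_cooled": False, "open": set()}
    for action, target in plan:
        if action == "open":
            state["open"].add(target)
        elif action == "close":
            state["open"].discard(target)
            # Toy rule (assumption): shutting the apple in the refrigerator cools it.
            if target == "refrigerator" and state["apple_in"] == "refrigerator":
                state["apple_cooled"] = True
        elif action == "pick_up":
            container = state["apple_in"]
            if container is not None and container not in state["open"]:
                return None  # cannot reach into a closed container
            state["holding"], state["apple_in"] = target, None
        elif action == "put_in":
            if state["holding"] != "apple" or target not in state["open"]:
                return None  # nothing to place, or the container is closed
            state["holding"], state["apple_in"] = None, target
        else:
            return None  # unknown action
    return state


def is_success(plan, goal):
    """A plan succeeds if it runs to completion and the final state meets every goal condition."""
    final = execute(plan)
    return final is not None and all(final[key] == value for key, value in goal.items())


# Goal conditions for "Place a cooled apple into the microwave".
GOAL = {"apple_in": "microwave", "apple_cooled": True}

GOOD_PLAN = [
    ("pick_up", "apple"), ("open", "refrigerator"), ("put_in", "refrigerator"),
    ("close", "refrigerator"), ("open", "refrigerator"), ("pick_up", "apple"),
    ("close", "refrigerator"), ("open", "microwave"), ("put_in", "microwave"),
    ("close", "microwave"),
]
# Skips the cooling step, so the apple reaches the microwave uncooled.
BAD_PLAN = [("pick_up", "apple"), ("open", "microwave"), ("put_in", "microwave")]

if __name__ == "__main__":
    print(is_success(GOOD_PLAN, GOAL), is_success(BAD_PLAN, GOAL))
```

Because success is decided by checking the final state against explicit goal conditions, no human observer or vote is involved in the judgment.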

ETRI released benchmark results for a range of large language models: OpenAI’s GPT-3 achieved a success rate of 21.36%, GPT-4 achieved 40.38%, Meta’s LLaMA 2-70B achieved 18.27%, and MosaicML’s MPT-30B achieved 18.75%. The team noted that larger models tend to have superior task planning capabilities. A success rate of 20% means that out of 100 instructions, 20 plans succeeded in fulfilling the goal of the instruction.
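The success rate is simple arithmetic over per-instruction outcomes. A minimal sketch, assuming each outcome is recorded as a boolean (the function name and the sample data are illustrative, not taken from the released results):

```python
def success_rate(outcomes):
    """Percentage of plans that fulfilled the goal of their instruction."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


# 100 instructions, of which 20 plans succeeded.
outcomes = [True] * 20 + [False] * 80

if __name__ == "__main__":
    print(f"{success_rate(outcomes):.2f}%")
```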

In LoTa-Bench, performance evaluation is conducted in virtual simulation environments built for research and development in robotics and embodied agent intelligence: AI2-THOR6), developed by the Allen Institute for AI, and VirtualHome7), developed at the Massachusetts Institute of Technology (MIT). The evaluation used the ALFRED dataset8), which includes everyday household task instructions such as “Place a cooled apple in the microwave.”
6) AI2-THOR: A robotic home service simulator.
7) VirtualHome: A simulation of household activities through programming.
8) ALFRED: A benchmark for testing and evaluating the performance of everyday household task execution / Watch-and-Help: A benchmark for testing and evaluating the performance of artificial intelligence in recognizing human task intentions and collaborating accordingly.

Leveraging the benefits of the LoTa-Bench technology for easy and rapid verification of new task planning methods, the research team discovered two strategies for improving task planning performance through data-driven training: In-Context Example Selection and Feedback-Based Replanning. They also confirmed that fine-tuning effectively enhances the performance of language-based task planning.
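The release does not describe how in-context examples are selected. One simple possibility, assumed here only for illustration, is to rank a pool of solved examples by word overlap with the new instruction and place the closest ones in the prompt. The example pool, the plan strings, and the prompt layout below are all hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two instructions (0.0 to 1.0)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_examples(query, pool, k=2):
    """Pick the k pool examples whose instructions are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex["instruction"]), reverse=True)
    return ranked[:k]


def build_prompt(query, pool, k=2):
    """Assemble a few-shot prompt: selected examples first, then the new instruction."""
    parts = [
        f"Instruction: {ex['instruction']}\nPlan: {', '.join(ex['plan'])}"
        for ex in select_examples(query, pool, k)
    ]
    parts.append(f"Instruction: {query}\nPlan:")
    return "\n\n".join(parts)


# Hypothetical example pool; the plans are placeholders, not benchmark data.
POOL = [
    {"instruction": "Put a chilled tomato in the microwave",
     "plan": ["find tomato", "pick up tomato", "cool tomato", "put tomato in microwave"]},
    {"instruction": "Turn on the floor lamp",
     "plan": ["find floor lamp", "turn on floor lamp"]},
    {"instruction": "Place a cooled potato on the table",
     "plan": ["find potato", "pick up potato", "cool potato", "put potato on table"]},
]

QUERY = "Place a cooled apple into the microwave"

if __name__ == "__main__":
    print(build_prompt(QUERY, POOL))
```

Feedback-based replanning, the second strategy, is not shown; in general terms it would feed execution failures back to the planner so that it can revise the remaining steps.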

Minsu Jang, a principal researcher at ETRI’s Social Robotics Lab, stated, “LoTa-Bench marks the first step in the development of task planning AI. We plan to research and develop technologies that can predict task failures in uncertain situations or improve task generation intelligence by asking for and receiving help from humans. This technology is essential for realizing the era of one robot per household.”

Jaehong Kim, the director of ETRI’s Social Robotics Research Section, announced, “ETRI is dedicated to advancing robotic intelligence using foundation models to realize robots capable of generating and executing various mission plans in the real world.”

By releasing the software9) as open source, the ETRI researchers anticipate that companies and educational institutions will be able to freely utilize this technology, thereby accelerating the advancement of related technologies.
9) https://github.com/lbaa2022/LLMTaskPlanning

 

###

This technology was developed as part of the R&D project titled “Development of Uncertainty-Aware Agents Learning by Asking Questions,” sponsored by the Ministry of Science and ICT and the Institute for Information & communications Technology Planning & Evaluation (IITP).

 

About Electronics and Telecommunications Research Institute (ETRI)

ETRI is a non-profit government-funded research institute. Since its foundation in 1976, ETRI, a global ICT research institute, has worked to drive Korea’s remarkable growth in the ICT industry. By continually developing world-first and world-best technologies, ETRI has helped make Korea one of the world’s top ICT nations.
