(Press-News.org) If instructed to “Place a cooled apple into the microwave,” how would a robot respond?
First, the robot would need to locate an apple, pick it up, find the refrigerator, open its door, and place the apple inside. It would then close the refrigerator door, reopen it once the apple has cooled, pick up the apple again, and close the door. After that, the robot would need to locate the microwave, open its door, place the apple inside, and close the microwave door. Evaluating how well these steps are executed is the essence of benchmarking task planning AI: it measures how effectively a robot can respond to commands and follow the required procedure.
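To make the walkthrough above concrete, here is a minimal sketch of that plan written as data. The action names (`find`, `pick_up`, `open`, `put`, `close`) and the helper `containers_left_open` are hypothetical and chosen only for illustration; they are not the action vocabulary of LoTa-Bench or of any simulator.

```python
# Hypothetical action vocabulary; the press release does not define one.
# Each step is a tuple: (action, object) or (action, object, container).
PLAN = [
    ("find", "apple"),
    ("pick_up", "apple"),
    ("find", "fridge"),
    ("open", "fridge"),
    ("put", "apple", "fridge"),
    ("close", "fridge"),
    ("open", "fridge"),          # reopen once the apple has cooled
    ("pick_up", "apple"),
    ("close", "fridge"),
    ("find", "microwave"),
    ("open", "microwave"),
    ("put", "apple", "microwave"),
    ("close", "microwave"),
]


def containers_left_open(plan):
    """Return the set of containers that the plan opens but never closes."""
    open_now = set()
    for action, *args in plan:
        if action == "open":
            open_now.add(args[0])
        elif action == "close":
            open_now.discard(args[0])
    return open_now


if __name__ == "__main__":
    print(len(PLAN), "steps; left open:", containers_left_open(PLAN))
```

Even a simple instruction expands into more than a dozen ordered steps, which is why checking such plans by hand is slow.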
An ETRI research team has developed a technology that automatically evaluates the performance of task plans generated by large language models (LLMs)1), paving the way for fast and objective assessment of task planning AI.
1) Large language models (LLMs): Language models built from artificial neural networks that contain a vast number of parameters.
The Electronics and Telecommunications Research Institute (ETRI) has announced the development of LoTa-Benchmark (LoTa-Bench)2), which enables the automatic evaluation of language-based task planners. A language-based task planner interprets a verbal instruction from a human user, plans a sequence of operations, and autonomously executes those operations to fulfill the goal of the instruction.
2) LoTa-Bench: A benchmark technology for task planning AI developed by ETRI; the name is abbreviated from Language-oriented Task Planning.
The research team presented a paper at the International Conference on Learning Representations (ICLR)3), one of the leading international AI conferences, and shared the evaluation results for a total of 33 large language models on GitHub.
3) ICLR (International Conference on Learning Representations): https://openreview.net/forum?id=ADSxCpCu9s
Recently, large language models have demonstrated remarkable performance not only in language processing, conversation, mathematical problem solving, and logical proofs but also in understanding human commands, autonomously selecting sub-tasks, and executing them in sequence to achieve goals. Consequently, there has been a widespread effort to apply large language models to robotics applications and services.
Previously, the absence of benchmark4) technology capable of automatically evaluating task planning performance meant that assessments had to be done manually, which was labor-intensive. For instance, in existing research, including Google’s SayCan5), multiple people directly observed the results of executed tasks and then voted on whether they had succeeded or failed. This approach was cumbersome, requiring a significant amount of time and effort, and it also allowed subjective judgment to influence the results.
4) Benchmark: A system that uses programs to compare and evaluate the performance of computer components, among other things, assigning a score based on their efficiency.
5) https://say-can.github.io/
The LoTa-Bench technology developed by ETRI automates the evaluation process: it actually executes the task plans that large language models generate from user commands, then automatically compares the outcomes with the intended results of the commands to determine whether the plans succeeded. This approach significantly reduces evaluation time and cost and ensures that the evaluation results are objective.
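The idea of executing a plan and comparing the outcome with the intended result can be sketched in a toy symbolic world. Everything below is an assumption made for illustration: the state layout, the action names, the cooling rule, and the functions `run_plan`, `goal_reached`, and `evaluate` are hypothetical and do not reflect the actual LoTa-Bench code or any simulator API.

```python
# Toy symbolic world. All names here are hypothetical, not LoTa-Bench's API.
CONTAINERS = {"fridge", "microwave"}


def run_plan(plan):
    """Execute a plan step by step; return the final state, or None if a step fails."""
    state = {"holding": None, "location": {"apple": "counter"},
             "cooled": set(), "open": set()}
    for action, *args in plan:
        if action == "open":
            state["open"].add(args[0])
        elif action == "close":
            state["open"].discard(args[0])
            if args[0] == "fridge":
                # Toy rule: whatever is shut inside the fridge becomes cooled.
                state["cooled"] |= {o for o, loc in state["location"].items()
                                    if loc == "fridge"}
        elif action == "pick_up":
            obj = args[0]
            loc = state["location"].get(obj)
            blocked = loc in CONTAINERS and loc not in state["open"]
            if loc is None or state["holding"] is not None or blocked:
                return None
            state["holding"] = obj
            state["location"][obj] = "hand"
        elif action == "put":
            obj, container = args
            if state["holding"] != obj or container not in state["open"]:
                return None
            state["holding"] = None
            state["location"][obj] = container
        else:
            return None  # unknown action
    return state


def goal_reached(state):
    """Goal for 'Place a cooled apple into the microwave'."""
    return state["location"]["apple"] == "microwave" and "apple" in state["cooled"]


def evaluate(plan):
    """A plan succeeds only if every step executes and the final state meets the goal."""
    state = run_plan(plan)
    return state is not None and goal_reached(state)


GOOD_PLAN = [
    ("pick_up", "apple"), ("open", "fridge"), ("put", "apple", "fridge"),
    ("close", "fridge"), ("open", "fridge"), ("pick_up", "apple"),
    ("close", "fridge"), ("open", "microwave"), ("put", "apple", "microwave"),
    ("close", "microwave"),
]
SKIPS_FRIDGE = [
    ("pick_up", "apple"), ("open", "microwave"),
    ("put", "apple", "microwave"), ("close", "microwave"),
]

if __name__ == "__main__":
    print(evaluate(GOOD_PLAN), evaluate(SKIPS_FRIDGE))
```

Because success is decided by comparing the final state with a goal condition rather than by human observers, the same plan always receives the same verdict.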
ETRI released benchmark results for different large language models: OpenAI’s GPT-3 achieved a success rate of 21.36%, GPT-4 achieved 40.38%, Meta’s LLaMA 2-70B model achieved 18.27%, and MosaicML’s MPT-30B model achieved 18.75%. The team noted that larger models tend to have superior task planning capabilities. A success rate of 20% means that out of 100 instructions, 20 plans succeeded in fulfilling the goal of the instruction.
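The success-rate arithmetic described above is simple enough to state as code. This is a minimal sketch; the function name `success_rate` and the list-of-booleans input format are assumptions for illustration, not part of the released software.

```python
def success_rate(outcomes):
    """Percentage of instructions whose plan fulfilled the goal.

    `outcomes` is a list of booleans, one per instruction (hypothetical format).
    """
    if not outcomes:
        raise ValueError("need at least one evaluated instruction")
    return 100.0 * sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # 20 successful plans out of 100 instructions
    outcomes = [True] * 20 + [False] * 80
    print(f"{success_rate(outcomes):.2f}%")
```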
In LoTa-Bench, performance evaluation is conducted in virtual simulation environments built for research and development in robotics and embodied agent intelligence: AI2-THOR6), developed by the Allen Institute for AI, and VirtualHome7), developed at the Massachusetts Institute of Technology (MIT). The evaluation used the ALFRED dataset8), which includes everyday household task instructions such as “Place a cooled apple in the microwave.”
6) AI2-THOR: A robotic home service simulator.
7) VirtualHome: A simulation of household activities through programming.
8) ALFRED: A benchmark for testing and evaluating the performance of everyday household task execution / Watch-and-Help: A benchmark for testing and evaluating the performance of artificial intelligence in recognizing human task intentions and collaborating accordingly.
Because LoTa-Bench makes it easy and fast to verify new task planning methods, the research team was able to identify two strategies for improving task planning performance through data-driven training: In-Context Example Selection and Feedback-Based Replanning. They also confirmed that fine-tuning effectively enhances the performance of language-based task planning.
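The release names these two strategies without describing how they work, so the following is only one plausible reading, sketched under stated assumptions. Here, in-context example selection is taken to mean picking the few-shot examples most similar to the new instruction (using word overlap as a stand-in similarity measure), and feedback-based replanning is taken to mean telling the planner whether each step succeeded so it can adjust. All function names, the similarity measure, and the stub planner are hypothetical.

```python
def token_overlap(a, b):
    """Jaccard similarity over lowercase word sets (an assumed, simple measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_examples(instruction, pool, k=2):
    """Pick the k pool examples whose instruction is most similar to the query."""
    ranked = sorted(pool, key=lambda ex: token_overlap(instruction, ex["instruction"]),
                    reverse=True)
    return ranked[:k]


def plan_with_replanning(planner, try_step, instruction, max_steps=10):
    """Ask the planner for one step at a time, feeding back success or failure."""
    history = []  # (step, feedback) pairs the planner sees on its next call
    for _ in range(max_steps):
        step = planner(instruction, history)
        if step == "done":
            break
        history.append((step, "ok" if try_step(step) else "failed"))
    return history


# --- Stubs standing in for an LLM planner and a simulator (hypothetical) ---
world = {"microwave_open": False}


def try_step(step):
    if step == "open microwave":
        world["microwave_open"] = True
        return True
    if step == "put apple in microwave":
        return world["microwave_open"]  # fails while the door is closed
    return False


def stub_planner(instruction, history):
    if history and history[-1] == ("put apple in microwave", "failed"):
        return "open microwave"  # react to the failure feedback
    if ("put apple in microwave", "ok") in history:
        return "done"
    return "put apple in microwave"


POOL = [
    {"instruction": "Put a chilled tomato in the microwave"},
    {"instruction": "Turn on the lamp"},
    {"instruction": "Place a cooled apple on the table"},
]

if __name__ == "__main__":
    query = "Place a cooled apple in the microwave"
    print([ex["instruction"] for ex in select_examples(query, POOL)])
    print(plan_with_replanning(stub_planner, try_step, query))
```

In a real system the stub planner would be an LLM call and `try_step` would be a simulator action; both are replaced by stand-ins here so the sketch runs on its own.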
Minsu Jang, a principal researcher at ETRI’s Social Robotics Lab, stated, “LoTa-Bench marks the first step in the development of task planning AI. We plan to research and develop technologies that can predict task failures in uncertain situations or improve task generation intelligence by asking for and receiving help from humans. This technology is essential for realizing the era of one robot per household.”
Jaehong Kim, the director of ETRI’s Social Robotics Research Section, announced, “ETRI is dedicated to advancing robotic intelligence using foundation models to realize robots capable of generating and executing various mission plans in the real world.”
By releasing the software9) as open source, the ETRI researchers anticipate that companies and educational institutions will be able to freely utilize this technology, thereby accelerating the advancement of related technologies.
9) https://github.com/lbaa2022/LLMTaskPlanning
###
This technology was developed as part of the R&D project titled “Development of Uncertainty-Aware Agents Learning by Asking Questions,” sponsored by the Ministry of Science and ICT and the Institute for Information & communications Technology Planning & Evaluation (IITP).
About Electronics and Telecommunications Research Institute (ETRI)
ETRI is a non-profit, government-funded research institute. Since its foundation in 1976, ETRI, a global ICT research institute, has made immense efforts to drive Korea’s remarkable growth in the ICT industry. By continually developing world-first and world-best technologies, ETRI has helped establish Korea as one of the top ICT nations in the world.
END
ETRI develops an automated benchmark for language-based task planners
Reduces costs and time, enables objective assessment. Reveals evaluation results for 33 LLM models with strategies for achieving improved task planning performance
2024-04-26