(Press-News.org) If instructed to “Place a cooled apple into the microwave,” how would a robot respond?
Initially, the robot would need to locate an apple, pick it up, find the refrigerator, open its door, and place the apple inside. Subsequently, it would close the refrigerator door, reopen it to retrieve the cooled apple, pick up the apple again, and close the door. Following this, the robot would need to locate the microwave, open its door, place the apple inside, and then close the microwave door. Evaluating how well these steps are executed exemplifies the essence of benchmarking task planning AI technologies. It measures how effectively a robot can respond to commands and adhere to the specified procedures.
An ETRI research team has developed a technology that automatically evaluates the performance of task plans generated by Large Language Models (LLMs)1), paving the way for fast and objective assessment of task planning AIs.
1) Large Language Models (LLMs): Language models built from artificial neural networks that contain a vast number of parameters.
The Electronics and Telecommunications Research Institute (ETRI) has announced the development of LoTa-Benchmark (LoTa-Bench)2), which enables the automatic evaluation of language-based task planners. A language-based task planner interprets a verbal instruction from a human user, plans a sequence of operations, and autonomously executes those operations to fulfill the goal of the instruction.
2) LoTa-Bench: A benchmark technology for procedure-generating (task planning) AI developed by ETRI; the name is abbreviated from Language-oriented Task Planning.
The research team published a paper at one of the leading international AI conferences, the International Conference on Learning Representations (ICLR)3), and shared the evaluation results for a total of 33 large language models through GitHub.
3) ICLR (International Conference on Learning Representations): https://openreview.net/forum?id=ADSxCpCu9s
Recently, large language models have demonstrated remarkable performance not only in language processing, conversation, mathematical problem solving, and logical proof, but also in understanding human commands, autonomously selecting sub-tasks, and executing them in sequence to achieve goals. Consequently, there has been a widespread effort to apply large language models to robotics applications and service implementation.
Previously, no benchmark4) technology could automatically evaluate task planning performance, so assessments had to be done manually, which was labor-intensive. For instance, in existing research, including Google’s SayCan5), multiple people directly observed the results of executed tasks and then voted on whether each had succeeded or failed. This approach not only demanded considerable time and effort but also allowed subjective judgment to influence the results.
4) Benchmark: A system that uses programs to compare and evaluate the performance of computer components, among other things, assigning a score based on their efficiency.
5) https://say-can.github.io/
The LoTa-Bench technology developed by ETRI automates the evaluation process: it actually executes the task plans that large language models generate from user commands, then automatically compares the outcomes with the intended results of the commands to determine whether the plans succeeded. This approach significantly reduces evaluation time and costs and ensures that the evaluation results are objective.
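The execute-then-compare idea can be illustrated with a minimal Python sketch. It is not LoTa-Bench code and does not use the AI2-THOR or VirtualHome APIs; the `ToyHome` class, its action names, and the goal function are hypothetical stand-ins chosen to mirror the "cooled apple" example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyHome:
    """Hypothetical toy simulator (an assumption, not a real simulator API)."""
    holding: Optional[str] = None
    location: dict = field(default_factory=lambda: {"apple": "counter"})
    cooled: set = field(default_factory=set)
    open_doors: set = field(default_factory=set)

    def step(self, action: str, target: str) -> bool:
        """Apply one action; return False if it cannot be executed."""
        if action == "open":
            self.open_doors.add(target)
            return True
        if action == "close":
            if target not in self.open_doors:
                return False
            self.open_doors.discard(target)
            if target == "fridge":
                # Anything shut inside the fridge counts as cooled.
                self.cooled.update(
                    obj for obj, place in self.location.items() if place == "fridge"
                )
            return True
        if action == "pick_up":
            if self.holding is not None or target not in self.location:
                return False
            place = self.location[target]
            # Objects inside a closed appliance cannot be reached.
            if place in ("fridge", "microwave") and place not in self.open_doors:
                return False
            del self.location[target]
            self.holding = target
            return True
        if action == "put_in":
            if self.holding is None or target not in self.open_doors:
                return False
            self.location[self.holding] = target
            self.holding = None
            return True
        return False  # unknown action


def goal_cooled_apple_in_microwave(sim: ToyHome) -> bool:
    """Goal condition checked against the final simulator state."""
    return sim.location.get("apple") == "microwave" and "apple" in sim.cooled


def evaluate_plan(plan, goal) -> bool:
    """Execute every step, then compare the final state with the goal."""
    sim = ToyHome()
    for action, target in plan:
        if not sim.step(action, target):
            return False  # a step that cannot be executed fails the plan
    return goal(sim)


good_plan = [
    ("pick_up", "apple"), ("open", "fridge"), ("put_in", "fridge"),
    ("close", "fridge"), ("open", "fridge"), ("pick_up", "apple"),
    ("close", "fridge"), ("open", "microwave"), ("put_in", "microwave"),
    ("close", "microwave"),
]
# Skips the cooling steps, so the goal condition is not met.
bad_plan = [
    ("pick_up", "apple"), ("open", "microwave"),
    ("put_in", "microwave"), ("close", "microwave"),
]

print(evaluate_plan(good_plan, goal_cooled_apple_in_microwave))
print(evaluate_plan(bad_plan, goal_cooled_apple_in_microwave))
```

Because the verdict comes from the final simulator state rather than from human observers, the same plan always receives the same result.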
ETRI revealed benchmark results for different large language models, indicating that OpenAI’s GPT-3 achieved a success rate of 21.36%, GPT-4 exhibited 40.38%, Meta’s LLaMA 2-70B model showed 18.27%, and MosaicML’s MPT-30B model recorded 18.75%. It was noted that larger models tend to have superior task planning capabilities. A success rate of 20% implies that out of 100 instructions, 20 plans were successful in fulfilling the goal of the instructions.
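As a small illustration of how such a figure is computed, the following Python sketch derives a success rate from a list of per-instruction outcomes. The function name and the sample data are assumptions made for this example; they are not taken from the LoTa-Bench results.

```python
def success_rate(outcomes):
    """Return the percentage of instructions whose plan reached its goal."""
    if not outcomes:
        raise ValueError("at least one evaluated instruction is required")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical run: 100 instructions, 20 of which produced a successful plan.
outcomes = [True] * 20 + [False] * 80
print(f"{success_rate(outcomes):.2f}%")  # prints "20.00%"
```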
In LoTa-Bench, performance evaluation is conducted in virtual simulation environments built for research and development in robotics and embodied agent intelligence: AI2-THOR6), developed by the Allen Institute for AI, and VirtualHome7), developed by the Massachusetts Institute of Technology (MIT). The evaluation used the ALFRED dataset8), which includes everyday household task instructions such as “Place a cooled apple in the microwave.”
6) AI2-THOR: A robotic home service simulator.
7) VirtualHome: A simulation of household activities through programming.
8) ALFRED: A benchmark for testing and evaluating the performance of everyday household task execution / Watch-and-Help: A benchmark for testing and evaluating the performance of artificial intelligence in recognizing human task intentions and collaborating accordingly.
Taking advantage of how easily and quickly LoTa-Bench can verify new task planning methods, the research team identified two strategies for improving task planning performance through data-driven training: In-Context Example Selection and Feedback-Based Replanning. They also confirmed that fine-tuning effectively enhances the performance of language-based task planning.
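The release does not describe how in-context examples are selected. As one possible reading, the Python sketch below picks the stored examples whose instructions are most similar to the new instruction and places them in the prompt. The word-overlap (Jaccard) similarity, the example pool, and all function names are assumptions for illustration, not the research team's method.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two instructions (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(instruction, pool, k=2):
    """Return the k pool entries most similar to the instruction."""
    ranked = sorted(
        pool,
        key=lambda ex: jaccard(instruction, ex["instruction"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(instruction, examples):
    """Assemble a few-shot prompt: selected examples first, query last."""
    parts = []
    for ex in examples:
        parts.append(f"Instruction: {ex['instruction']}\nPlan: {ex['plan']}")
    parts.append(f"Instruction: {instruction}\nPlan:")
    return "\n\n".join(parts)


# Hypothetical example pool; the plans are abbreviated placeholders.
pool = [
    {"instruction": "Put a chilled tomato in the microwave",
     "plan": "find tomato, cool it in the fridge, put it in the microwave"},
    {"instruction": "Turn on the lamp while holding a book",
     "plan": "find book, pick up book, find lamp, turn on lamp"},
    {"instruction": "Place a washed mug in the coffee machine",
     "plan": "find mug, wash it in the sink, put it in the coffee machine"},
]

query = "Place a cooled apple into the microwave"
selected = select_examples(query, pool, k=2)
print(build_prompt(query, selected))
```

A real system would more likely use embedding-based similarity; word overlap is used here only to keep the sketch free of external dependencies.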
Minsu Jang, a principal researcher at ETRI’s Social Robotics Lab, stated, “LoTa-Bench marks the first step in the development of task planning AI. We plan to research and develop technologies that can predict task failures in uncertain situations or improve task generation intelligence by asking for and receiving help from humans. This technology is essential for realizing the era of one robot per household.”
Jaehong Kim, the director of ETRI’s Social Robotics Research Section, announced, “ETRI is dedicated to advancing robotic intelligence using foundation models to realize robots capable of generating and executing various mission plans in the real world.”
By releasing the software9) as open source, the ETRI researchers anticipate that companies and educational institutions will be able to freely utilize this technology, thereby accelerating the advancement of related technologies.
9) https://github.com/lbaa2022/LLMTaskPlanning
###
This technology was developed as part of the R&D project titled “Development of Uncertainty-Aware Agents Learning by Asking Questions,” sponsored by the Ministry of Science and ICT and the Institute for Information & communications Technology Planning & Evaluation (IITP).
About Electronics and Telecommunications Research Institute (ETRI)
ETRI is a non-profit government-funded research institute. Since its foundation in 1976, ETRI, a global ICT research institute, has made immense efforts to drive Korea’s remarkable growth in the ICT industry. ETRI helps establish Korea as one of the top ICT nations in the world by continually developing world-first and world-best technologies.
ETRI develops an automated benchmark for language-based task planners
Reduces costs and time, enables objective assessment. Reveals evaluation results for 33 LLMs, along with strategies for improving task planning performance
2024-04-26