New method uses crowdsourced feedback to help train robots

Human Guided Exploration (HuGE) enables AI agents to learn quickly with some help from humans, even if the humans make mistakes.

2023-11-27
(Press-News.org)

To teach an AI agent a new task, like how to open a kitchen cabinet, researchers often use reinforcement learning — a trial-and-error process where the agent is rewarded for taking actions that get it closer to the goal.

In many instances, a human expert must carefully design a reward function, which is an incentive mechanism that gives the agent motivation to explore. The human expert must iteratively update that reward function as the agent explores and tries different actions. This can be time-consuming, inefficient, and difficult to scale up, especially when the task is complex and involves many steps.

Researchers from MIT, Harvard University, and the University of Washington have developed a new reinforcement learning approach that doesn’t rely on an expertly designed reward function. Instead, it leverages crowdsourced feedback, gathered from many nonexpert users, to guide the agent as it learns to reach its goal. 

While some other methods also attempt to utilize nonexpert feedback, this new approach enables the AI agent to learn more quickly, despite the fact that data crowdsourced from users are often full of errors. These noisy data might cause other methods to fail. 

In addition, this new approach allows feedback to be gathered asynchronously, so nonexpert users around the world can contribute to teaching the agent.

“One of the most time-consuming and challenging parts in designing a robotic agent today is engineering the reward function. Today reward functions are designed by expert researchers — a paradigm that is not scalable if we want to teach our robots many different tasks. Our work proposes a way to scale robot learning by crowdsourcing the design of reward function and by making it possible for nonexperts to provide useful feedback,” says Pulkit Agrawal, an assistant professor in the MIT Department of Electrical Engineering and Computer Science (EECS) who leads the Improbable AI Lab in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

In the future, this method could help a robot learn to perform specific tasks in a user’s home quickly, without the owner needing to show the robot physical examples of each task. The robot could explore on its own, with crowdsourced nonexpert feedback guiding its exploration.

“In our method, the reward function guides the agent to what it should explore, instead of telling it exactly what it should do to complete the task. So, even if the human supervision is somewhat inaccurate and noisy, the agent is still able to explore, which helps it learn much better,” explains lead author Marcel Torne ’23, a research assistant in the Improbable AI Lab.

Torne is joined on the paper by his MIT advisor, Agrawal; senior author Abhishek Gupta, assistant professor at the University of Washington; as well as others at the University of Washington and MIT. The research will be presented at the Conference on Neural Information Processing Systems next month.

Noisy feedback

One way to gather user feedback for reinforcement learning is to show a user two photos of states achieved by the agent, and then ask that user which state is closer to a goal. For instance, perhaps a robot’s goal is to open a kitchen cabinet. One image might show that the robot opened the cabinet, while the second might show that it opened the microwave. A user would pick the photo of the “better” state.
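The comparison query described above can be sketched in a few lines of Python. This is an illustrative toy, not the researchers' code: states are plain numbers, "closer to the goal" is absolute distance, and the function names, the `error_rate` parameter, and the simulated annotator are all assumptions made for the example.

```python
import random


def noisy_preference(state_a, state_b, goal, error_rate, rng):
    """Simulated nonexpert answer to one comparison query.

    Returns 0 if state_a is judged closer to the goal and 1 if state_b is.
    With probability error_rate the annotator gives the wrong answer.
    """
    correct = 0 if abs(state_a - goal) <= abs(state_b - goal) else 1
    if rng.random() < error_rate:
        return 1 - correct  # the annotator made a mistake
    return correct


def collect_comparisons(states, goal, n_queries, error_rate, seed=0):
    """Show random pairs of reached states to the annotator and record the answers."""
    rng = random.Random(seed)
    comparisons = []
    for _ in range(n_queries):
        a, b = rng.sample(states, 2)  # two distinct states per query
        label = noisy_preference(a, b, goal, error_rate, rng)
        comparisons.append((a, b, label))
    return comparisons
```

Each record is a binary label for a pair of states, which is all a nonexpert needs to provide. Raising `error_rate` gives a rough stand-in for the mistakes that crowdsourced labels tend to contain.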

Some previous approaches try to use this crowdsourced, binary feedback to optimize a reward function that the agent would use to learn the task. However, because nonexperts are likely to make mistakes, the reward function can become very noisy, so the agent might get stuck and never reach its goal.

“Basically, the agent would take the reward function too seriously. It would try to match the reward function perfectly. So, instead of directly optimizing over the reward function, we just use it to tell the robot which areas it should be exploring,” Torne says.

He and his collaborators decoupled the process into two separate parts, each directed by its own algorithm. They call their new reinforcement learning method HuGE (Human Guided Exploration). 

On one side, a goal selector algorithm is continuously updated with crowdsourced human feedback. The feedback is not used as a reward function, but rather to guide the agent’s exploration. In a sense, the nonexpert users drop breadcrumbs that incrementally lead the agent toward its goal.

On the other side, the agent explores on its own, in a self-supervised manner guided by the goal selector. It collects images or videos of actions that it tries, which are then sent to humans and used to update the goal selector. 

This narrows down the area for the agent to explore, leading it to more promising areas that are closer to its goal. But if there is no feedback, or if feedback takes a while to arrive, the agent will keep learning on its own, albeit more slowly. This enables feedback to be gathered infrequently and asynchronously.
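The two decoupled parts can be illustrated with a heavily simplified sketch. It is not the HuGE implementation. The world is a one-dimensional chain of positions, the goal selector is a table of wins minus losses per visited state, the agent is assumed to be able to return to any state it has already visited, and every name and parameter below is hypothetical.

```python
import random


def human_label(a, b, goal, error_rate, rng):
    """Simulated nonexpert: return the state judged closer to the goal,
    answering incorrectly with probability error_rate."""
    closer, farther = (a, b) if abs(goal - a) <= abs(goal - b) else (b, a)
    return farther if rng.random() < error_rate else closer


def run_loop(goal=15, iterations=300, feedback_every=5, queries_per_batch=10,
             error_rate=0.2, explore_steps=3, seed=0):
    rng = random.Random(seed)
    visited = [0]   # states reached so far; the agent starts at position 0
    score = {0: 0}  # goal selector: wins minus losses for each visited state
    for step in range(iterations):
        # Goal selector: choose the best-scoring visited state as the next
        # breadcrumb. Before any feedback arrives all scores are tied, so the
        # choice is uniform and the agent simply explores on its own.
        top = max(score.values())
        state = rng.choice([s for s in visited if score[s] == top])
        # Exploration: a short random walk starting from the breadcrumb.
        for _ in range(explore_steps):
            state = min(goal, max(0, state + rng.choice((-1, 1))))
            if state not in score:
                score[state] = 0
                visited.append(state)
        if goal in score:
            return step + 1, visited
        # Asynchronous feedback: labels arrive only every few iterations
        # and only update the goal selector, never the agent directly.
        if step % feedback_every == 0 and len(visited) >= 2:
            for _ in range(queries_per_batch):
                a, b = rng.sample(visited, 2)
                winner = human_label(a, b, goal, error_rate, rng)
                loser = b if winner == a else a
                score[winner] += 1
                score[loser] -= 1
    return None, visited  # goal not reached within the iteration budget
```

Calling `run_loop()` returns the number of iterations needed to reach the goal (or `None`) together with the list of visited states. The point of the design is that human labels only influence which state the agent starts exploring from, so a wrong label can misdirect exploration for a while but is never optimized as a reward.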

“The exploration loop can keep going autonomously, because it is just going to explore and learn new things. And then when you get some better signal, it is going to explore in more concrete ways. You can just keep them turning at their own pace,” adds Torne.

And because the feedback is just gently guiding the agent’s behavior, it will eventually learn to complete the task even if users provide incorrect answers. 

Faster learning

The researchers tested this method on a number of simulated and real-world tasks. In simulation, they used HuGE to effectively learn tasks with long sequences of actions, such as stacking blocks in a particular order or navigating a large maze. 

In real-world tests, they utilized HuGE to train robotic arms to draw the letter “U” and pick and place objects. For these tests, they crowdsourced data from 109 nonexpert users in 13 different countries spanning three continents. 

In real-world and simulated experiments, HuGE helped agents learn to achieve the goal faster than other methods. 

The researchers also found that data crowdsourced from nonexperts yielded better performance than synthetic data, which were produced and labeled by the researchers. For nonexpert users, labeling 30 images or videos took less than two minutes.

“This makes it very promising in terms of being able to scale up this method,” Torne adds.

In a related paper, which the researchers presented at the recent Conference on Robot Learning, they enhanced HuGE so an AI agent can learn to perform the task, and then autonomously reset the environment to continue learning. For instance, if the agent learns to open a cabinet, the method also guides the agent to close the cabinet.

“Now we can have it learn completely autonomously without needing human resets,” he says.

The researchers also emphasize that, in this and other learning approaches, it is critical to ensure that AI agents are aligned with human values.

In the future, they want to continue refining HuGE so the agent can learn from other forms of communication, such as natural language and physical interactions with the robot. They are also interested in applying this method to teach multiple agents at once.

This research is funded, in part, by the MIT-IBM Watson AI Lab.

###

Written by Adam Zewe, MIT News

Paper: "Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback"

https://arxiv.org/pdf/2307.11049.pdf
