PRESS-NEWS.org - Press Release Distribution

New method uses crowdsourced feedback to help train robots

Human Guided Exploration (HuGE) enables AI agents to learn quickly with some help from humans, even if the humans make mistakes.

2023-11-27
(Press-News.org)

To teach an AI agent a new task, like how to open a kitchen cabinet, researchers often use reinforcement learning — a trial-and-error process where the agent is rewarded for taking actions that get it closer to the goal.

In many instances, a human expert must carefully design a reward function, which is an incentive mechanism that gives the agent motivation to explore. The human expert must iteratively update that reward function as the agent explores and tries different actions. This can be time-consuming, inefficient, and difficult to scale up, especially when the task is complex and involves many steps.

Researchers from MIT, Harvard University, and the University of Washington have developed a new reinforcement learning approach that doesn’t rely on an expertly designed reward function. Instead, it leverages crowdsourced feedback, gathered from many nonexpert users, to guide the agent as it learns to reach its goal. 

While some other methods also attempt to utilize nonexpert feedback, this new approach enables the AI agent to learn more quickly, despite the fact that data crowdsourced from users are often full of errors. These noisy data might cause other methods to fail. 

In addition, this new approach allows feedback to be gathered asynchronously, so nonexpert users around the world can contribute to teaching the agent.

“One of the most time-consuming and challenging parts in designing a robotic agent today is engineering the reward function. Today reward functions are designed by expert researchers — a paradigm that is not scalable if we want to teach our robots many different tasks. Our work proposes a way to scale robot learning by crowdsourcing the design of reward function and by making it possible for nonexperts to provide useful feedback,” says Pulkit Agrawal, an assistant professor in the MIT Department of Electrical Engineering and Computer Science (EECS) who leads the Improbable AI Lab in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

In the future, this method could help a robot learn to perform specific tasks in a user’s home quickly, without the owner needing to show the robot physical examples of each task. The robot could explore on its own, with crowdsourced nonexpert feedback guiding its exploration.

“In our method, the reward function guides the agent to what it should explore, instead of telling it exactly what it should do to complete the task. So, even if the human supervision is somewhat inaccurate and noisy, the agent is still able to explore, which helps it learn much better,” explains lead author Marcel Torne ’23, a research assistant in the Improbable AI Lab.

Torne is joined on the paper by his MIT advisor, Agrawal; senior author Abhishek Gupta, assistant professor at the University of Washington; as well as others at the University of Washington and MIT. The research will be presented at the Conference on Neural Information Processing Systems next month.

Noisy feedback

One way to gather user feedback for reinforcement learning is to show a user two photos of states achieved by the agent, and then ask that user which state is closer to a goal. For instance, perhaps a robot’s goal is to open a kitchen cabinet. One image might show that the robot opened the cabinet, while the second might show that it opened the microwave. A user would pick the photo of the “better” state.

Some previous approaches try to use this crowdsourced, binary feedback to optimize a reward function that the agent would use to learn the task. However, because nonexperts are likely to make mistakes, the reward function can become very noisy, so the agent might get stuck and never reach its goal.
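The comparison feedback and annotator mistakes described above can be pictured with a small simulation. The sketch below is illustrative only and is not taken from the paper: the 10-state toy world, the 20 percent error rate, the helper names `noisy_label` and `fit_scores`, and the Bradley-Terry style scoring model are all assumptions chosen to keep the example short.

```python
import math
import random

def noisy_label(dist_a, dist_b, error_rate, rng):
    """Return 0 if state A is judged closer to the goal, else 1.

    The simulated annotator is right except with probability error_rate,
    in which case the answer is flipped (an assumed noise model).
    """
    correct = 0 if dist_a < dist_b else 1
    return 1 - correct if rng.random() < error_rate else correct

def fit_scores(num_states, comparisons, lr=0.1, epochs=200):
    """Fit one score per state with a Bradley-Terry style logistic model."""
    scores = [0.0] * num_states
    for _ in range(epochs):
        for a, b, label in comparisons:
            # Probability the current scores assign to "A is closer".
            p_a = 1.0 / (1.0 + math.exp(scores[b] - scores[a]))
            target = 1.0 if label == 0 else 0.0
            grad = target - p_a
            scores[a] += lr * grad
            scores[b] -= lr * grad
    return scores

rng = random.Random(0)
# Hypothetical toy world: state i is |i - 9| steps away from the goal (state 9).
distances = [abs(i - 9) for i in range(10)]
comparisons = []
for _ in range(500):
    a, b = rng.sample(range(10), 2)
    comparisons.append((a, b, noisy_label(distances[a], distances[b], 0.2, rng)))

scores = fit_scores(10, comparisons)
for state, score in enumerate(scores):
    print(state, round(score, 2))
```

With enough comparisons the fitted scores tend to rise toward the goal state, but individual scores still carry the annotators' mistakes, which is the kind of noise the researchers say can mislead an agent that optimizes such a signal directly.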

“Basically, the agent would take the reward function too seriously. It would try to match the reward function perfectly. So, instead of directly optimizing over the reward function, we just use it to tell the robot which areas it should be exploring,” Torne says.

He and his collaborators decoupled the process into two separate parts, each directed by its own algorithm. They call their new reinforcement learning method HuGE (Human Guided Exploration). 

On one side, a goal selector algorithm is continuously updated with crowdsourced human feedback. The feedback is not used as a reward function, but rather to guide the agent’s exploration. In a sense, the nonexpert users drop breadcrumbs that incrementally lead the agent toward its goal.

On the other side, the agent explores on its own, in a self-supervised manner guided by the goal selector. It collects images or videos of actions that it tries, which are then sent to humans and used to update the goal selector. 

This narrows down the area for the agent to explore, leading it to more promising areas that are closer to its goal. But if there is no feedback, or if feedback takes a while to arrive, the agent will keep learning on its own, albeit in a slower manner. This enables feedback to be gathered infrequently and asynchronously.
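The two decoupled loops can be sketched on a toy problem. The code below is a rough illustration under strong assumptions, not the authors' implementation: the one-dimensional world, the single-weight goal selector, the learning rate, and the function name `run_toy_exploration` are all hypothetical. Feedback arrives only every few iterations, and the agent keeps exploring in between.

```python
import math
import random

def run_toy_exploration(goal=30, error_rate=0.2, feedback_every=5,
                        max_iters=2000, seed=0):
    """Return the iteration at which the goal is reached, or None."""
    rng = random.Random(seed)
    visited = {0}   # states the agent has reached so far
    w = 0.0         # goal-selector weight; score(state) = w * state

    for iteration in range(1, max_iters + 1):
        # Feedback loop (occasional): one noisy comparison updates the selector.
        if iteration % feedback_every == 0 and len(visited) > 1:
            a, b = rng.sample(sorted(visited), 2)
            a_is_closer = abs(goal - a) < abs(goal - b)
            if rng.random() < error_rate:   # simulated annotator mistake
                a_is_closer = not a_is_closer
            x = (a - b) / goal              # scaled feature difference
            p = 1.0 / (1.0 + math.exp(-w * x))
            w += 0.5 * ((1.0 if a_is_closer else 0.0) - p) * x

        # Goal selector: pick the visited state with the highest score.
        # Before any feedback arrives all scores tie, so the pick is random.
        ordered = sorted(visited)
        best = max(w * s for s in ordered)
        start = rng.choice([s for s in ordered if w * s == best])

        # Exploration loop: a few random steps from the selected state.
        position = start
        for _ in range(5):
            position += rng.choice((-1, 1))
            visited.add(position)
            if position == goal:
                return iteration
    return None

steps_needed = run_toy_exploration()
print("goal reached at iteration:", steps_needed)
```

Setting `feedback_every` larger than `max_iters` turns the feedback loop off entirely. The agent then starts each rollout from a randomly chosen visited state, which mirrors the slower, unguided exploration described above.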

“The exploration loop can keep going autonomously, because it is just going to explore and learn new things. And then when you get some better signal, it is going to explore in more concrete ways. You can just keep them turning at their own pace,” adds Torne.

And because the feedback is just gently guiding the agent’s behavior, it will eventually learn to complete the task even if users provide incorrect answers. 

Faster learning

The researchers tested this method on a number of simulated and real-world tasks. In simulation, they used HuGE to effectively learn tasks with long sequences of actions, such as stacking blocks in a particular order or navigating a large maze. 

In real-world tests, they utilized HuGE to train robotic arms to draw the letter “U” and pick and place objects. For these tests, they crowdsourced data from 109 nonexpert users in 13 different countries spanning three continents. 

In real-world and simulated experiments, HuGE helped agents learn to achieve the goal faster than other methods. 

The researchers also found that data crowdsourced from nonexperts yielded better performance than synthetic data, which were produced and labeled by the researchers. For nonexpert users, labeling 30 images or videos took less than two minutes.

“This makes it very promising in terms of being able to scale up this method,” Torne adds.

In a related paper, which the researchers presented at the recent Conference on Robot Learning, they enhanced HuGE so an AI agent can learn to perform the task, and then autonomously reset the environment to continue learning. For instance, if the agent learns to open a cabinet, the method also guides the agent to close the cabinet.

“Now we can have it learn completely autonomously without needing human resets,” he says.

The researchers also emphasize that, in this and other learning approaches, it is critical to ensure that AI agents are aligned with human values.

In the future, they want to continue refining HuGE so the agent can learn from other forms of communication, such as natural language and physical interactions with the robot. They are also interested in applying this method to teach multiple agents at once.

This research is funded, in part, by the MIT-IBM Watson AI Lab.

###

Written by Adam Zewe, MIT News

Paper: "Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback"

https://arxiv.org/pdf/2307.11049.pdf
