Q&A: New AI training method lets systems better adjust to users’ values

2024-12-18

(Press-News.org) Ask most major artificial intelligence chatbots, such as OpenAI’s ChatGPT, to say something cruel or inappropriate and the system will say it wants to keep things “respectful.” These systems, trained on the content of a profusely disrespectful internet, learned what constitutes respect through human training. The standard method, called reinforcement learning from human feedback, or RLHF, has people compare two outputs from the systems and select whichever is better. It’s used to improve the quality of responses — including putting up some guardrails around inappropriate outputs.

But it also means that these systems inherit value systems from the people training them. These values may not be shared by users. University of Washington researchers created a method for training AI systems — both for large language models like ChatGPT and for robots — that can better reflect users’ diverse values. Called “variational preference learning,” or VPL, the method predicts users’ preferences as they interact with it, then tailors its outputs accordingly.

The team presented its research Dec. 12 at the Conference on Neural Information Processing Systems in Vancouver, British Columbia.

UW News spoke with co-senior author Natasha Jaques, an assistant professor in the Paul G. Allen School of Computer Science & Engineering, about the new method and the trouble with AI systems’ values.

What is the problem with AI having fixed values?

NJ: Traditionally, a small set of raters — the people reviewing the outputs — are trained to answer in a way similar to the researchers at OpenAI, for instance. So it's essentially the researchers at OpenAI deciding what is and isn't appropriate to say for the model, which then gets deployed to 100 million monthly users. But we think this is insufficient, because people have very different preferences. What's appropriate and inappropriate varies a lot based on culture and norms and individuals, and it's actually a deeper problem than that.

A recent paper showed that if a majority group has only a weak preference for a certain outcome and a minority group has a strong preference for a different outcome, the minority group will just be outvoted and the majority group will win. So a great example the authors use is a college admission system. An applicant could chat with the LLM about information they need when applying to the college. Let's say the college mostly serves people of high socioeconomic status, so most students don't care about seeing information about financial aid, but a minority of students really need that information. If that chatbot is trained on human feedback, it might then learn to never give information about financial aid, which would severely disadvantage that minority, even though the majority don't really care if they see it. They just have a slight preference not to.
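The outvoting effect can be illustrated with a few lines of arithmetic. The sketch below is an editorial illustration, not code or data from the cited paper: the group shares and preference strengths are made-up numbers, and it assumes each user always picks the option they prefer, so a pairwise comparison records only which option won and not how strongly the user felt.

```python
# Toy illustration with made-up numbers (not from the cited paper).
# utility_show < 0: the group mildly prefers not to see financial-aid info.
# utility_show > 0: the group wants to see it.
groups = {
    "majority": {"share": 0.8, "utility_show": -0.1},
    "minority": {"share": 0.2, "utility_show": 1.0},
}


def pairwise_vote_share(groups):
    """Fraction of comparisons in which 'show' is chosen over 'hide'.

    Assumes every user deterministically picks the option they prefer.
    """
    return sum(g["share"] for g in groups.values() if g["utility_show"] > 0)


def average_utility(groups):
    """Population-weighted utility of showing the information."""
    return sum(g["share"] * g["utility_show"] for g in groups.values())


vote = pairwise_vote_share(groups)
welfare = average_utility(groups)
print(f"'show' wins {vote:.0%} of comparisons")
print(f"average utility of 'show': {welfare:+.2f}")
```

Under these assumed numbers, "show" loses most comparisons even though its population-weighted utility is positive, which is the sense in which the minority's strong preference gets outvoted.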

Even if someone didn't care about the multicultural aspects of this and just wanted the best model performance, it's still a problem, because with RLHF, the model can basically try to average all the preferences together, and this can make it incorrect for all users. This is important in chatbots, but the problem is super clear in household robotics, where a robot is putting away your dishes, for instance. It’s pretty clear that each person needs the robot to put their dishes away in a different configuration. We show an example of this with a robot navigating a maze: If some users want the robot to go to the top right and some want it to go to the bottom right and you just train on their preferences, the robot learns to average their preferences and go to the middle. That’s just wrong for everybody.
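The maze example can be sketched numerically. This is an editorial illustration rather than the experiment from the paper: the goal coordinates are hypothetical, and "one policy for everyone" is modeled here as picking the single target point that minimizes mean squared distance to all users' goals.

```python
# Toy maze: goal coordinates are hypothetical, not taken from the paper.
top_right = (1.0, 1.0)
bottom_right = (1.0, -1.0)

# Half of the users prefer each goal.
user_goals = [top_right] * 50 + [bottom_right] * 50


def best_single_target(goals):
    """Return the mean of the goals.

    The mean is the single point that minimizes mean squared distance
    to all of the users' goals.
    """
    n = len(goals)
    return (sum(x for x, _ in goals) / n, sum(y for _, y in goals) / n)


def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


target = best_single_target(user_goals)
print(target)  # (1.0, 0.0)
print(distance(target, top_right), distance(target, bottom_right))
```

The single best target lands halfway between the two goals, so it is the same distance away from what every user actually wanted.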

Can you explain how your system is different?

NJ: In the RLHF model, the system learns to predict which of two things the human will prefer and output those, so it ends up adhering to a single set of values. What we do is tell our model to infer something about the user's hidden preferences. Given a few answers from the human about what things they like better, it learns a mapping of who this user is. It learns what’s called an “embedding vector” of this person’s unique preferences, and that enables it to make these personalized predictions about each person's preferences and adhere to those.
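A minimal sketch of the general idea follows, as an editorial illustration only. It does not reproduce VPL or its training procedure: the learned mapping is replaced here by a simple average of feature differences, and the two-dimensional answer features, the function names, and the Bradley-Terry-style scoring are all assumptions made for the example.

```python
import math


def infer_user_embedding(comparisons):
    """Average the (chosen - rejected) feature differences for one user.

    A crude stand-in for a learned encoder; the real method learns this
    mapping from data rather than averaging.
    """
    dim = len(comparisons[0][0])
    z = [0.0] * dim
    for chosen, rejected in comparisons:
        for i in range(dim):
            z[i] += (chosen[i] - rejected[i]) / len(comparisons)
    return z


def prob_prefers(z, a, b):
    """Bradley-Terry-style probability that a user with embedding z prefers a over b."""
    score = sum(zi * (ai - bi) for zi, ai, bi in zip(z, a, b))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical features for an answer: (level of detail, formality).
detailed, brief = (1.0, 0.0), (-1.0, 0.0)

# Two users give two answers each about which output they like better.
user_a = infer_user_embedding([(detailed, brief), (detailed, brief)])
user_b = infer_user_embedding([(brief, detailed), (brief, detailed)])

print(prob_prefers(user_a, detailed, brief))  # above 0.5
print(prob_prefers(user_b, detailed, brief))  # below 0.5
```

The same pair of outputs gets opposite predictions for the two users, because the prediction is conditioned on an embedding inferred from each user's own answers rather than on one shared set of preferences.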

Can you explain what values mean in this context? Do they encompass political values? Or preferences for long, detailed responses or brief overviews?

NJ: It can be broad because people give feedback by just looking at two different outputs from the model and saying which one they like better. It could be that one output says something biased or inappropriate and the other doesn’t. Or it could just be that a person prefers the way one output sounds, like maybe it better adheres to their writing style.

In the robotics setting, imagine you're trying to train a household robot to help you clean up your house or unload your dishwasher. Everyone has a different way they've organized their kitchen. So the system needs to be able to learn each person's unique preferences.

What did you find with this new approach? How does it perform differently than the old one?

NJ: We created some datasets, both in language and in simulated robotics tasks, where people had divergent preferences. And what we show is that the existing RLHF technique that's used to train things like ChatGPT just can't fit those datasets at all. It’s getting about 50% accuracy in predicting people's binary preferences, but when we introduce our model, the accuracy goes up 10% to 25%.
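To see why roughly 50% is the floor for a binary comparison task, consider a toy case (an editorial illustration, not the datasets or results from the paper). It assumes two equally sized, hypothetical user groups with exactly opposite preferences on the same pair of outputs.

```python
# Hypothetical dataset: each record is (user_group, prefers_first_option).
data = [("A", True)] * 500 + [("B", False)] * 500


def accuracy(predict, records):
    """Share of records where the predictor matches the user's actual choice."""
    return sum(predict(group) == label for group, label in records) / len(records)


def always_first(group):
    """One answer for everyone: predict the first option."""
    return True


def always_second(group):
    """One answer for everyone: predict the second option."""
    return False


def per_group(group):
    """Condition the prediction on who the user is."""
    return group == "A"


print(accuracy(always_first, data))   # 0.5
print(accuracy(always_second, data))  # 0.5
print(accuracy(per_group, data))      # 1.0
```

In this idealized setup, any predictor that ignores who the user is can do no better than chance, while one that conditions on the user can fit the data. Real preference data is noisier, so the gains reported in the interview are smaller than in this toy case.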

One of the big complaints a lot of people have about AI models is that they average things into mediocrity. They can write a novel, but it’s generic. Is this method a way to potentially move beyond that?

NJ: We haven't tested on this kind of scale, but our approach in theory would be capable of saying, like, “I've seen a bunch of preference data from you. I learned a unique embedding vector that describes what your preferences are, and I can better cater to your style.” Beyond what is biased or not, it’s guessing what you like better.

Are there potential drawbacks to having this more intuitive system of values? Could it just start reproducing people’s biases as it learns their preferences, and then direct them away from facts?

NJ: Yeah, I think you might not want to personalize every type of information. There's a nice paper published by UW researchers on this problem called “A Roadmap to Pluralistic Alignment,” which spells out different ways to align to the values of more than one set of people. Catering to the individual is one way you could handle it, which may not be the best way. The authors offer another, which would be just saying all possible answers and letting the user decide which they like better. They also talk about this idea of “distributional pluralistic alignment,” which means learning how to model the underlying distribution of people's preferences. So you can think of our work as a technical approach for achieving the distributional part. We wanted to see if, technically, we can find a method that's capable of learning those preferences.

What should the public know about this research and about AI value systems more broadly?

NJ: I think a really important misconception that some people have is that AI systems won't inherit human biases because they’re on computers. But actually, AI models tend to be more biased than people because they're training on all of this historical data. They're training on all the data on the internet since its inception. They tend to exhibit value systems that predate where we are in the modern era. Maybe that’s racism or sexism. I have work showing they have more conservative political values according to a moral foundation survey. The only technique we really have to address biases is RLHF.

I think it's a little scary that we have researchers at a handful of corporations, who aren't trained in policy or sociology, deciding what is appropriate and what is not for the models to say, and we have so many people using these systems and trying to find out the truth from them. This is one of the more pressing problems in AI, so we need better techniques to address it.

Where do you want to take this research going forward?

NJ: A limitation of the current work is there aren't that many publicly available datasets where people have genuinely different preferences, so we kind of had to synthesize the different preference data that we used in this paper. But there have recently been efforts to collect multicultural preference data. There's this PRISM dataset, which collects preference ratings on contentious topics from people from over 200 different countries. We'd like to actually try fitting our model to this real-world multicultural preference data to see how it's able to model these different preferences.

Additional coauthors include Sriyash Poddar, Yanming Wan, Hamish Ivison — all doctoral students in the Allen School — and Abhishek Gupta, an assistant professor in the Allen School.

For more information, contact Jaques at nj@cs.washington.edu.
