PRESS-NEWS.org - Press Release Distribution

AIs are irrational, but not in the same way that humans are

2024-06-05
(Press-News.org) Large Language Models behind popular generative AI platforms like ChatGPT gave different answers when asked to respond to the same reasoning test and didn’t improve when given additional context, finds a new study from researchers at UCL.

The study, published in Royal Society Open Science, tested the most advanced Large Language Models (LLMs) using cognitive psychology tests to gauge their capacity for reasoning. The results highlight the importance of understanding how these AIs ‘think’ before entrusting them with tasks, particularly those involving decision-making.

In recent years, the LLMs that power generative AI apps like ChatGPT have become increasingly sophisticated. Their ability to produce realistic text, images, audio and video has prompted concern about their capacity to steal jobs, influence elections and commit crime.

Yet these AIs have also been shown to routinely fabricate information, respond inconsistently and even get simple maths sums wrong.

In this study, researchers from UCL systematically analysed whether seven LLMs were capable of rational reasoning. The authors adopted a common definition under which an agent (human or artificial) is rational if it reasons according to the rules of logic and probability, and irrational if it does not.¹

The LLMs were given a battery of 12 common tests from cognitive psychology to evaluate reasoning, including the Wason task, the Linda problem and the Monty Hall problem.² Humans perform poorly on these tasks: in recent studies, only 14% of participants got the Linda problem right and 16% got the Wason task right.

The models exhibited irrationality in many of their answers, such as providing varying responses when asked the same question 10 times. They were prone to making simple mistakes, including basic addition errors and mistaking consonants for vowels, which led them to provide incorrect answers.

For example, the proportion of correct answers to the Wason task ranged from 90% for GPT-4 to 0% for GPT-3.5 and Google Bard. Llama 2 70b, which answered correctly 10% of the time, mistook the letter K for a vowel and so answered incorrectly.

While most humans would also fail to answer the Wason task correctly, it is unlikely that this would be because they didn’t know what a vowel was.

Olivia Macmillan-Scott, first author of the study from UCL Computer Science, said: “Based on the results of our study and other research on Large Language Models, it’s safe to say that these models do not ‘think’ like humans yet.

“That said, the model with the largest dataset, GPT-4, performed a lot better than other models, suggesting that they are improving rapidly. However, it is difficult to say how this particular model reasons because it is a closed system. I suspect there are other tools in use that you wouldn’t have found in its predecessor GPT-3.5.”

Some models declined to answer the tasks on ethical grounds, even though the questions were innocuous. This is likely the result of safeguarding parameters that are not operating as intended.

The researchers also provided additional context for the tasks, which has been shown to improve people's responses. However, the LLMs tested showed no consistent improvement.

Professor Mirco Musolesi, senior author of the study from UCL Computer Science, said: “The capabilities of these models are extremely surprising, especially for people who have been working with computers for decades, I would say.

“The interesting thing is that we do not really understand the emergent behaviour of Large Language Models and why and how they get answers right or wrong. We now have methods for fine-tuning these models, but then a question arises: if we try to fix these problems by teaching the models, do we also impose our own flaws? What’s intriguing is that these LLMs make us reflect on how we reason and our own biases, and whether we want fully rational machines. Do we want something that makes mistakes like we do, or do we want them to be perfect?”

The models tested were GPT-4, GPT-3.5, Google Bard, Claude 2, Llama 2 7b, Llama 2 13b and Llama 2 70b.

Notes to Editors:

For more information, please contact:

 Dr Matt Midgley

+44 (0)20 7679 9064

m.midgley@ucl.ac.uk

 

1 Stein E. (1996). Without Good Reason: The Rationality Debate in Philosophy and Cognitive Science. Clarendon Press.

2 These tasks and their solutions are available online. An example is the Wason task:

The Wason task
Check the following rule: If there is a vowel on one side of the card, there is an even number on the other side. You see four cards now:

a) E   b) K   c) 4   d) 7

Which of these cards must in any case be turned over to check the rule?

Answer: a) E and d) 7, as these are the only ones that can violate the rule.
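The reasoning behind this answer can be checked mechanically. The short Python sketch below is an illustration only, not part of the study: it assumes each card has a letter on one side and a single digit on the other, and the helper names `violates_rule` and `must_turn` are hypothetical. For each visible face it enumerates every possible hidden face and flags the card only if some hidden face could break the rule.

```python
# Brute-force check of the Wason task above (illustrative sketch).
# Assumption: every card has an uppercase letter on one side and a
# single digit on the other.
import string

VOWELS = set("AEIOU")
LETTERS = string.ascii_uppercase
DIGITS = "0123456789"


def violates_rule(letter: str, number: str) -> bool:
    """'Vowel on one side -> even number on the other' fails only
    when a vowel is paired with an odd number."""
    return letter in VOWELS and int(number) % 2 == 1


def must_turn(visible: str) -> bool:
    """A card must be turned over if some possible hidden face would
    violate the rule; otherwise turning it cannot falsify the rule."""
    if visible in LETTERS:
        # Visible letter: the hidden side is some digit.
        return any(violates_rule(visible, d) for d in DIGITS)
    # Visible digit: the hidden side is some letter.
    return any(violates_rule(letter, visible) for letter in LETTERS)


cards = ["E", "K", "4", "7"]
print([c for c in cards if must_turn(c)])  # prints ['E', '7']
```

K can be left face down because the rule says nothing about consonants, and 4 because the rule does not require an even number to have a vowel behind it. Only E (which might hide an odd number) and 7 (which might hide a vowel) can falsify the rule.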

Publication:

Olivia Macmillan-Scott and Mirco Musolesi. ‘(Ir)rationality and Cognitive Biases in Large Language Models’ is published in Royal Society Open Science and is strictly embargoed until Wednesday 5 June 2024 at 00:01 BST / 4 June 2024 at 19:01 ET.

DOI: https://doi.org/10.1098/rsos.240255

About UCL – London’s Global University

UCL is a diverse global community of world-class academics, students, industry links, external partners, and alumni. Our powerful collective of individuals and institutions work together to explore new possibilities.

Since 1826, we have championed independent thought by attracting and nurturing the world's best minds. Our community of more than 50,000 students from 150 countries and over 16,000 staff pursues academic excellence, breaks boundaries and makes a positive impact on real-world problems.

As The Times and Sunday Times University of the Year 2024, we are consistently ranked among the top 10 universities in the world and are one of only a handful of institutions rated as having the strongest academic reputation and the broadest research impact.

We have a progressive and integrated approach to our teaching and research – championing innovation, creativity and cross-disciplinary working. We teach our students how to think, not what to think, and see them as partners, collaborators and contributors.  

For almost 200 years, we have been proud to open higher education to students from a wide range of backgrounds and to change the way we create and share knowledge.

We were the first in England to welcome women to university education and that courageous attitude and disruptive spirit is still alive today. We are UCL.

www.ucl.ac.uk | Follow @uclnews on Twitter | Read news at www.ucl.ac.uk/news/ | Listen to UCL podcasts on SoundCloud | View images on Flickr | Find out what’s on at UCL Mind

END

