Lay intuition as effective at jailbreaking AI chatbots as technical methods

2025-11-04

(Press-News.org) UNIVERSITY PARK, Pa. — It doesn’t take technical expertise to work around the built-in guardrails of artificial intelligence (AI) chatbots like ChatGPT and Gemini, which are intended to ensure that the chatbots operate within a set of legal and ethical boundaries and do not discriminate against people of a certain age, race or gender. A single, intuitive question can trigger the same biased response from an AI model as advanced technical inquiries, according to a team led by researchers at Penn State.

“A lot of research on AI bias has relied on sophisticated ‘jailbreak’ techniques,” said Amulya Yadav, associate professor at Penn State’s College of Information Sciences and Technology. “These methods often involve generating strings of random characters computed by algorithms to trick models into revealing discriminatory responses. While such techniques prove these biases exist theoretically, they don’t reflect how real people use AI. The average user isn’t reverse-engineering token probabilities or pasting cryptic character sequences into ChatGPT — they type plain, intuitive prompts. And that lived reality is what this approach captures.”

Prior work probing AI bias (skewed or discriminatory outputs from AI systems caused by human influences in the training data, such as language or cultural bias) has been done by experts using technical knowledge to engineer large language model (LLM) responses. To see how average internet users encounter biases in AI-powered chatbots, the researchers studied the entries submitted to a competition called “Bias-a-Thon.” Organized by Penn State’s Center for Socially Responsible AI (CSRAI), the competition challenged contestants to come up with prompts that would lead generative AI systems to respond with biased answers.

They found that the intuitive strategies employed by everyday users were just as effective at inducing biased responses as expert technical strategies. The researchers presented their findings at the 8th AAAI/ACM Conference on AI, Ethics, and Society.

Fifty-two individuals participated in the Bias-a-Thon, submitting screenshots of 75 prompts and AI responses from eight generative AI models. They also provided an explanation of the bias or stereotype that they identified in the response, such as age-related or historical bias.

The researchers conducted Zoom interviews with a subset of the participants to better understand their prompting strategies and their conceptions of ideas like fairness, representation and stereotyping when interacting with generative AI tools. Once they arrived at a participant-informed working definition of “bias” — which included a lack of representation, stereotypes and prejudice, and unjustified preferences toward groups — the researchers tested the contest prompts in several LLMs to see if they would elicit similar responses.

“Large language models are inherently random,” said lead author Hangzhi Guo, a doctoral candidate in information sciences and technology at Penn State. “If you ask the same question to these models two times, they might return different answers. We wanted to use only the prompts that were reproducible, meaning that they yielded similar responses across LLMs.”

The researchers found that 53 of the prompts generated reproducible results. Biases fell into eight categories: gender bias; race, ethnic and religious bias; age bias; disability bias; language bias; historical bias favoring Western nations; cultural bias; and political bias. The researchers also found that participants used seven strategies to elicit these biases: role playing, or asking the LLM to assume a persona; hypothetical scenarios; using human knowledge to ask about niche topics, where it’s easier to identify biased responses; using leading questions on controversial topics; probing biases in under-represented groups; feeding the LLM false information; and framing the task as having a research purpose.

“The competition revealed a completely fresh set of biases,” said Yadav, organizer of the Bias-a-Thon. “For example, the winning entry uncovered an uncanny preference for conventional beauty standards. The LLMs consistently deemed a person with a clear face to be more trustworthy than a person with facial acne, or a person with high cheekbones more employable than a person with low cheekbones. This illustrates how average users can help us uncover blind spots in our understanding of where LLMs are biased. There may be many more examples such as these which have been overlooked by the jailbreaking literature on LLM bias.”

The researchers described mitigating biases in LLMs as a cat-and-mouse game, meaning that developers are constantly addressing issues as they arise. They suggested strategies that developers can use to mitigate these issues now, including implementing a robust classification filter to screen outputs before they go to users, conducting extensive testing, educating users and providing specific references or citations so users can verify information.

“By shining a light on inherent and reproducible biases that laypersons can identify, the Bias-a-Thon serves an AI literacy function,” said co-author S. Shyam Sundar, Evan Pugh University Professor at Penn State and director of the Penn State Center for Socially Responsible Artificial Intelligence, which has since organized other AI competitions such as Fake-a-thon, Diagnose-a-thon and Cheat-a-thon. “The whole goal of these efforts is to increase awareness of systematic problems with AI, to promote the informed use of AI among laypersons and to stimulate more socially responsible ways of developing these tools.”

Other Penn State contributors to this research include doctoral candidates Eunchae Jang, Wenbo Zhang, Bonam Mingole and Vipul Gupta. Pranav Narayanan Venkit, research scientist at Salesforce AI Research; Mukund Srinath, machine learning scientist at Expedia; and Kush R. Varshney from IBM Research also participated in the work.

END
