(Press-News.org) In a peer-reviewed opinion paper publishing July 10 in the journal Patterns, researchers show that computer programs commonly used to determine whether a text was written by artificial intelligence tend to falsely label articles written by non-native English speakers as AI-generated. The researchers caution against the use of such AI text detectors because of their unreliability, which could have negative impacts on individuals including students and those applying for jobs.
“Our current recommendation is that we should be extremely careful about and maybe try to avoid using these detectors as much as possible,” says senior author James Zou (@james_y_zou), of Stanford University. “It can have significant consequences if these detectors are used to review things like job applications, college entrance essays or high school assignments.”
AI tools like OpenAI’s ChatGPT chatbot can compose essays, solve science and math problems, and produce computer code. Educators across the U.S. are increasingly concerned about the use of AI in students’ work, and many of them have started using GPT detectors to screen students’ assignments. These detectors are platforms that claim to be able to identify whether a text was generated by AI, but their reliability and effectiveness remain untested.
Zou and his team put seven popular GPT detectors to the test. They ran 91 English essays written by non-native English speakers for a widely recognized English proficiency test, called Test of English as a Foreign Language, or TOEFL, through the detectors. These platforms incorrectly labeled more than half of the essays as AI-generated, with one detector flagging nearly 98% of these essays as written by AI. In comparison, the detectors were able to correctly classify more than 90% of essays written by eighth-grade students from the U.S. as human-generated.
Zou explains that the algorithms of these detectors work by evaluating text perplexity, which is how surprising the word choice is in an essay. “If you use common English words, the detectors will give a low perplexity score, meaning my essay is likely to be flagged as AI-generated. If you use complex and fancier words, then it's more likely to be classified as human written by the algorithms,” he says. This is because large language models like ChatGPT are trained to generate text with low perplexity to better simulate how an average human talks, Zou adds.
As a result, simpler word choices adopted by non-native English writers would make them more vulnerable to being tagged as using AI.
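The perplexity idea described above can be illustrated with a minimal sketch. It is not the method used by any of the detectors in the study: commercial detectors score text with large language models, whereas this toy example assumes a simple word-frequency (unigram) model with add-one smoothing, and the function name, reference corpus, and example sentences are all hypothetical.

```python
import math
from collections import Counter


def unigram_perplexity(text, reference_counts, vocab_size):
    """Perplexity of `text` under an add-one-smoothed unigram model.

    Lower values mean the word choices are less surprising to the model.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    total = sum(reference_counts.values())
    log_prob = 0.0
    for tok in tokens:
        # Add-one smoothing: unseen words still get a small probability.
        p = (reference_counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / len(tokens))


# Hypothetical toy reference corpus standing in for a language model's training data.
reference = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(reference)
vocab_size = len(counts) + 1  # one extra slot shared by unseen words

common = unigram_perplexity("the cat sat on the mat", counts, vocab_size)
rare = unigram_perplexity("the feline reclined upon the tapestry", counts, vocab_size)

print(f"common wording: {common:.1f}")
print(f"rarer wording:  {rare:.1f}")
```

In this sketch the sentence built from common words scores a lower perplexity than the one built from rarer words, which is the pattern Zou describes: a detector that treats low perplexity as a sign of AI authorship would be more likely to flag the plainer sentence.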
The team then put the human-written TOEFL essays into ChatGPT and prompted it to edit the text using more sophisticated language, including substituting simple words with complex vocabulary. The GPT detectors tagged these AI-edited essays as human-written.
“We should be very cautious about using any of these detectors in classroom settings, because there's still a lot of biases, and they're easy to fool with just the minimum amount of prompt design,” Zou says. Using GPT detectors could also have implications beyond the education sector. For example, search engines like Google devalue AI-generated content, which may inadvertently silence non-native English writers.
While AI tools can have positive impacts on student learning, GPT detectors should be further enhanced and evaluated before being put into use. Zou says that training these algorithms with more diverse types of writing could be one way to improve these detectors.
###
This work is supported by the National Science Foundation, the US National Institutes of Health, the Silicon Valley Foundation, and the Chan-Zuckerberg Initiative.
Patterns, Liang et al.: “GPT detectors are biased against non-native English writers.” https://www.cell.com/patterns/fulltext/S2666-3899(23)00130-7
Patterns (@Patterns_CP), published by Cell Press, is a data science journal publishing original research focusing on solutions to the cross-disciplinary problems that all researchers face when dealing with data, as well as articles about datasets, software code, algorithms, infrastructures, etc., with permanent links to these research outputs. Visit https://www.cell.com/patterns. To receive Cell Press media alerts, please contact press@cell.com.