(Press-News.org) Automatic speech recognition (ASR) has made incredible advances in the past few years, especially for widely spoken languages such as English. Prior to 2020, it was typically assumed that human abilities for speech recognition far exceeded automatic systems, yet some current systems have started to match human performance. The goal in developing ASR systems has always been to lower the error rate, regardless of how people perform in the same environment. After all, not even people will recognize speech with 100% accuracy in a noisy environment.
In a new study, UZH computational linguistics specialist Eleanor Chodroff and a fellow researcher from Cambridge University, Chloe Patman, compared two popular ASR systems – Meta’s wav2vec 2.0 and OpenAI’s Whisper – against native British English listeners. They tested how well the systems recognized speech that was masked by either speech-shaped noise (a static noise) or pub noise, and that was produced with or without a cotton face mask.
Latest OpenAI system better – with one exception
The researchers found that humans still maintained the edge against wav2vec 2.0 and the default Whisper system. However, OpenAI’s most recent large ASR system, Whisper large-v3, significantly outperformed human listeners in all tested conditions except naturalistic pub noise, where it was merely on par with humans. Whisper large-v3 has thus demonstrated its ability to process the acoustic properties of speech and successfully map them to the intended message (i.e., the sentence). “This was impressive as the tested sentences were presented out of context, and it was difficult to predict any one word from the preceding words,” Eleanor Chodroff says.
Vast training data
A closer look at the ASR systems and how they’ve been trained shows that humans are nevertheless doing something remarkable. Both tested systems involve deep learning, but the most competitive system, Whisper, requires an incredible amount of training data. Meta’s wav2vec 2.0 was trained on 960 hours (or 40 days) of English audio data, while the default Whisper system was trained on over 75 years of speech data. The system that actually outperformed human ability was trained on over 500 years of nonstop speech. “Humans are capable of matching this performance in just a handful of years,” says Chodroff. “Considerable challenges also remain for automatic speech recognition in almost all other languages.”
Different types of errors
The paper also reveals that humans and ASR systems make different types of errors. English listeners almost always produced grammatical sentences, but were more likely to write sentence fragments, as opposed to trying to provide a written word for each part of the spoken sentence. In contrast, wav2vec 2.0 frequently produced gibberish in the most difficult conditions. Whisper also tended to produce full grammatical sentences, but was more likely to “fill in the gaps” with completely wrong information.
References
Chloe Patman, Eleanor Chodroff. Speech recognition in adverse conditions by humans and machines. JASA Express Lett. 4, 115204 (2024). DOI: https://doi.org/10.1121/10.0032473
Automatic speech recognition on par with humans in noisy conditions
2025-01-14