
Deep learning networks prefer the human voice -- just like us

Columbia engineers demonstrate that AI systems might reach higher performance if programmed with sound files of human language rather than with binary data labels

New York, NY--April 6, 2021--The digital revolution is built on a foundation of invisible 1s and 0s called bits. As decades pass, and more and more of the world's information and knowledge morph into streams of 1s and 0s, the notion that computers prefer to "speak" in binary numbers is rarely questioned. According to new research from Columbia Engineering, this could be about to change.

A new study from Mechanical Engineering Professor Hod Lipson and his PhD student Boyuan Chen demonstrates that artificial intelligence systems might actually reach higher levels of performance if they are programmed with sound files of human language rather than with numerical data labels. The researchers discovered that in a side-by-side comparison, a neural network whose "training labels" consisted of sound files reached higher levels of performance in identifying objects in images, compared to another network that had been programmed in a more traditional manner, using simple binary inputs.

"To understand why this finding is significant," said Lipson, James and Sally Scapa Professor of Innovation and a member of Columbia's Data Science Institute, "it's useful to understand how neural networks are usually programmed, and why using the sound of the human voice is a radical experiment."

When used to convey information, the language of binary numbers is compact and precise. In contrast, spoken human language is more tonal and analog, and, when captured in a digital file, non-binary. Because numbers are such an efficient way to digitize data, programmers rarely deviate from a numbers-driven process when they develop a neural network.

Lipson, a highly regarded roboticist, and Chen, a former concert pianist, had a hunch that neural networks might not be reaching their full potential. They speculated that neural networks might learn faster and better if the systems were "trained" to recognize animals, for instance, by using the power of one of the world's most highly evolved sounds--the human voice uttering specific words.

One of the more common exercises AI researchers use to test out the merits of a new machine learning technique is to train a neural network to recognize specific objects and animals in a collection of different photographs. To check their hypothesis, Chen, Lipson and two students, Yu Li and Sunand Raghupathi, set up a controlled experiment. They created two new neural networks with the goal of training both of them to recognize 10 different types of objects in a collection of 50,000 photographs known as "training images."

One AI system was trained the traditional way, by uploading a giant data table containing thousands of rows, each row corresponding to a single training photo. The first column was an image file containing a photo of a particular object or animal; the next 10 columns corresponded to 10 possible object types: cats, dogs, airplanes, etc. A "1" in one column indicated the correct answer, and 0s in the other nine columns indicated the incorrect answers.
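
The label table described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the three-item class list, the file names, and the `one_hot` helper are all assumptions made for the example (the experiment itself used 10 object types).

```python
# Hypothetical class list; the article names only cats, dogs, and airplanes.
CLASSES = ["airplane", "cat", "dog"]


def one_hot(label, classes=CLASSES):
    """Return a list with a 1 in the column for `label` and 0s elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1 if name == label else 0 for name in classes]


# Each row pairs an image file with its label columns (file names are made up).
rows = [
    ("photo_0001.png", one_hot("cat")),
    ("photo_0002.png", one_hot("airplane")),
]
```

Every row carries exactly one 1, so the label columns say which type is correct and nothing more.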

The team set up the experimental neural network in a radically novel way. They fed it a data table in which the first column of each row contained a photograph of an animal or object, and the second column contained an audio file of a recorded human voice saying the word for the depicted animal or object out loud. There were no 1s and 0s.
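
For contrast, a row of the audio-labeled table might look like the sketch below. Everything here is an assumption made for illustration: the sample rate, the file names, and above all the `placeholder_waveform` helper, which generates a plain sine tone as a stand-in for a real recording of a spoken word.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed value, not from the article


def placeholder_waveform(freq_hz, seconds=0.5, rate=SAMPLE_RATE):
    """Stand-in for a recorded spoken word: a plain sine tone."""
    n_samples = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n_samples)]


# Each row pairs an image file with an audio label (file names are made up).
audio_rows = [
    ("photo_0001.png", placeholder_waveform(220.0)),  # would be a voice saying "cat"
    ("photo_0002.png", placeholder_waveform(330.0)),  # would be a voice saying "airplane"
]

# Number of real-valued samples the network is asked to reproduce per image.
target_length = len(audio_rows[0][1])
```

The point of the sketch is the shape of the target: thousands of real-valued samples per image rather than a handful of 1s and 0s.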

Once both neural networks were ready, Chen, Li, and Raghupathi trained both AI systems for a total of 15 hours and then compared their respective performance. When presented with an image, the original network spat out the answer as a series of ten 1s and 0s--just as it was trained to do. The experimental neural network, however, produced a clearly discernible voice trying to "say" what the object in the image was. Initially the sound was just a garble. Sometimes it was a confusion of multiple categories, like "cog" for cat and dog. Eventually, the voice was mostly correct, albeit with an eerie alien tone (see example on website).

At first, the researchers were somewhat surprised to discover that their hunch had been correct--there was no apparent advantage to 1s and 0s. Both the control neural network and the experimental one performed equally well, correctly identifying the animal or object depicted in a photograph about 92% of the time. To double-check their results, the researchers ran the experiment again and got the same outcome.

What they discovered next, however, was even more surprising. To further explore the limits of using sound as a training tool, the researchers set up another side-by-side comparison, this time using far fewer photographs during the training process. While the first round of training involved feeding both neural networks data tables containing 50,000 training images, both systems in the second experiment were fed far fewer training photographs, just 2,500 apiece.

It is well known in AI research that most neural networks perform poorly when training data is sparse, and in this experiment, the traditional, numerically trained network was no exception. Its ability to identify individual animals that appeared in the photographs plummeted to about 35% accuracy. In contrast, although the experimental neural network was trained with the same number of photographs, it performed twice as well, dropping only to 70% accuracy.

Intrigued, Lipson and his students decided to test their voice-driven training method on another classic AI image recognition challenge, that of image ambiguity. This time they set up yet another side-by-side comparison but raised the game a notch by using more difficult photographs that were harder for an AI system to "understand." For example, one training photo depicted a slightly corrupted image of a dog, or a cat with odd colors. When they compared results, even with more challenging photographs, the voice-trained neural network was still correct about 50% of the time, outperforming the numerically trained network, which floundered, achieving only 20% accuracy.

Ironically, the fact that their results ran directly against the status quo became a challenge when the researchers first tried to share their findings with their colleagues in computer science. "Our findings run directly counter to how many experts have been trained to think about computers and numbers; it's a common assumption that binary inputs are a more efficient way to convey information to a machine than audio streams of similar information 'richness,'" explained Boyuan Chen, the lead researcher on the study. "In fact, when we submitted this research to a big AI conference, one anonymous reviewer rejected our paper simply because they felt our results were just 'too surprising and un-intuitive.'"

When considered in the broader context of information theory, however, Lipson and Chen's hypothesis actually supports a much older, landmark hypothesis first proposed by the legendary Claude Shannon, the father of information theory. According to Shannon's theory, the most effective communication "signals" are characterized by an optimal number of bits, paired with an optimal amount of useful information, or "surprise."
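
For reference, the "surprise" in Shannon's theory is usually quantified as the entropy of a source. The expression below is the standard textbook definition, not a formula taken from the study; the symbols X, x, and p(x) are introduced here only for illustration.

```latex
H(X) = -\sum_{x} p(x) \log_2 p(x)
```

Under this definition, the identity of a label drawn uniformly from 10 object types carries log2(10), or about 3.32 bits.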

"If you think about the fact that human language has been going through an optimization process for tens of thousands of years, then it makes perfect sense that our spoken words have found a good balance between noise and signal," Lipson observed. "Therefore, when viewed through the lens of Shannon Entropy, it makes sense that a neural network trained with human language would outperform a neural network trained by simple 1s and 0s."

The study, to be presented at the International Conference on Learning Representations on May 3, 2021, is part of a broader effort at Lipson's Columbia Creative Machines Lab to create robots that can understand the world around them by interacting with other machines and humans, rather than by being programmed directly with carefully preprocessed data.

"We should think about using novel and better ways to train AI systems instead of collecting larger datasets," said Chen. "If we rethink how we present training data to the machine, we could do a better job as teachers."

One of the more refreshing results of computer science research on artificial intelligence has been an unexpected side effect: by probing how machines learn, sometimes researchers stumble upon fresh insight into the grand challenges of other, well-established fields.

"One of the biggest mysteries of human evolution is how our ancestors acquired language, and how children learn to speak so effortlessly," Lipson said. "If human toddlers learn best with repetitive spoken instruction, then perhaps AI systems can, too."


About the Study


Authors are: Boyuan Chen, Yu Li, Sunand Raghupathi, Hod Lipson, Mechanical Engineering and Computer Science, Columbia Engineering.

The study was supported by NSF NRI 1925157 and DARPA MTO grant L2M Program HR0011-18-2-0020.

The authors declare no financial or other conflicts of interest.


Columbia Engineering

Columbia Engineering, based in New York City, is one of the top engineering schools in the U.S. and one of the oldest in the nation. Also known as The Fu Foundation School of Engineering and Applied Science, the School expands knowledge and advances technology through the pioneering research of its more than 220 faculty, while educating undergraduate and graduate students in a collaborative environment to become leaders informed by a firm foundation in engineering. The School's faculty are at the center of the University's cross-disciplinary research, contributing to the Data Science Institute, Earth Institute, Zuckerman Mind Brain Behavior Institute, Precision Medicine Initiative, and the Columbia Nano Initiative. Guided by its strategic vision, "Columbia Engineering for Humanity," the School aims to translate ideas into innovations that foster a sustainable, healthy, secure, connected, and creative humanity.
