Widespread machine learning methods behind ‘link prediction’ are performing very poorly

New research indicates that methods used to test the accuracy of link prediction are flawed, and that link prediction does not work as well as common benchmarking tests currently indicate

2024-02-12

(Press-News.org) As you scroll through any social media feed, you are likely to be prompted to follow or friend another person, expanding your personal network and contributing to the growth of the app itself. The person suggested to you is a result of link prediction: a widespread machine learning (ML) task that evaluates the links in a network — your friends and everyone else’s — and tries to predict what the next links will be.

Beyond being the engine that drives social media expansion, link prediction is also used in a wide range of scientific research, such as predicting the interaction between genes and proteins, and is used by researchers as a benchmark for testing the performance of new ML algorithms.

New research from UC Santa Cruz Professor of Computer Science and Engineering C. “Sesh” Seshadhri, published in the journal Proceedings of the National Academy of Sciences, establishes that the metric used to measure link prediction performance is missing crucial information, and that link prediction tasks are performing significantly worse than popular literature indicates.

Seshadhri and his coauthor Nicolas Menand, a former UCSC undergraduate and master’s student and a current Ph.D. candidate at the University of Pennsylvania, recommend that ML researchers stop using the standard practice metric for measuring link prediction, known as AUC, and introduce a new, more comprehensive metric for this problem. The research has implications for the trustworthiness of decision-making in ML.

AUC’s ineffectiveness

Seshadhri, who works in the fields of theoretical computer science and data mining and is currently an Amazon scholar, has done previous research on ML algorithms for networks. In that work he found certain mathematical limitations that were negatively impacting algorithm performance. To better understand those limitations in context, he dove deeper into link prediction because of its importance as a testbed problem for ML algorithms.

“The reason why we got interested is because link prediction is one of these really important scientific tasks which is used to benchmark a lot of machine learning algorithms,” Seshadhri said. “What we were seeing was that the performance seemed to be really good… but we had an inkling that there seemed to be something off with this measurement. It feels like if you measured things in a different way, maybe you wouldn’t see such great results.”

Link prediction is based on the ML algorithm’s ability to carry out low dimensional vector embeddings, the process by which the algorithm represents each person within a network as a mathematical vector in space. All of the machine learning occurs as mathematical manipulations of those vectors.
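As a rough illustration of what an embedding-based predictor does, the Python sketch below scores a possible link as the dot product of two people's vectors and ranks candidates by that score. The names, the two-dimensional vectors, and the choice of dot product as the scoring rule are assumptions made for this example; the release does not describe how any particular algorithm scores links.

```python
# Hypothetical 2-dimensional embeddings for four people; all values are made up.
embeddings = {
    "ana": (0.9, 0.1),
    "ben": (0.8, 0.3),
    "cho": (0.1, 0.9),
    "dev": (0.2, 0.8),
}


def score(u, v):
    """Dot product of two embedding vectors; a higher value means the
    sketch treats a link between u and v as more likely."""
    return sum(a * b for a, b in zip(embeddings[u], embeddings[v]))


def rank_candidates(u):
    """Rank every other person by predicted link strength to u, best first."""
    others = [v for v in embeddings if v != u]
    return sorted(others, key=lambda v: score(u, v), reverse=True)


if __name__ == "__main__":
    print(rank_candidates("ana"))
```

In this toy data, "ana" and "ben" point in nearly the same direction, so "ben" is ranked first as a suggested link for "ana".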

AUC, which stands for “area under curve” and is the most common metric for measuring link prediction, gives ML algorithms a score from zero to one based on the algorithm's performance.
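To make the zero-to-one score concrete, here is a minimal Python sketch. It assumes the curve in question is the ROC curve, which is the usual meaning of AUC in ML. Under that assumption, AUC equals the probability that a randomly chosen true link is scored higher than a randomly chosen non-link, with ties counted as one half. The function name and the example scores are hypothetical.

```python
def auc(link_scores, non_link_scores):
    """ROC AUC computed by pairwise comparison.

    link_scores: predicted scores for pairs that really are linked.
    non_link_scores: predicted scores for pairs that are not linked.
    Returns the fraction of (link, non-link) pairs ranked correctly,
    counting ties as half a correct ranking.
    """
    wins = 0.0
    for p in link_scores:
        for n in non_link_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(link_scores) * len(non_link_scores))


if __name__ == "__main__":
    # Three of the four (link, non-link) pairs are ordered correctly.
    print(auc([0.9, 0.8], [0.1, 0.85]))
```

Note that this is a single number aggregated over all pairs in the whole network. It does not say how well the predictions work for any individual person.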

In their research, the authors discovered that there are fundamental mathematical limitations to using low dimensional embeddings for link predictions, and that AUC cannot measure these limitations. This led the authors to conclude that AUC does not accurately measure link prediction performance.

Seshadhri said these results call into question the widespread use of low dimensional vector embeddings in the ML field, considering the mathematical limitations that his research has surfaced on their performance.

Leading methods fall short

The discovery of AUC’s shortcomings led the researchers to create a new metric to better capture the limitations, which they call VCMPR. They used VCMPR to measure 12 ML algorithms chosen to be representative of the field, including algorithms such as DeepWalk, Node2vec, NetMF, GraphSage, and graph benchmark leader HOP-Rec, and found that the link prediction performance was worse using VCMPR as the metric rather than AUC.
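The release does not spell out how VCMPR is computed. As an illustration only, the Python sketch below assumes a per-person ("vertex-centric") score: take the top k predicted links for one person, compute precision and recall against that person's true held-out links, and keep the larger of the two. The function name, the parameter k, and the example data are hypothetical, and the definition in the paper may differ in its details.

```python
def max_precision_recall_at_k(ranked, true_neighbors, k):
    """Per-person score under the assumption stated above.

    ranked: candidate neighbors for one person, best prediction first.
    true_neighbors: set of that person's actual held-out links.
    k: how many top predictions to check.
    """
    top_k = ranked[:k]
    hits = sum(1 for v in top_k if v in true_neighbors)
    precision = hits / k
    recall = hits / len(true_neighbors) if true_neighbors else 0.0
    return max(precision, recall)


if __name__ == "__main__":
    ranked = ["ben", "cho", "dev", "eli"]
    # One of the top two predictions is correct, out of two true links.
    print(max_precision_recall_at_k(ranked, {"ben", "eli"}, k=2))
```

A score of this kind is computed separately for every person, so a method can do well on an aggregate measure and still score poorly for most individuals.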

“When we look at the VCMPR scores, we see that the scores of most of the leading methods out there are really poor,” Seshadhri said. “It looks like they're actually not doing a good job when you measure things a different way.”

The results also showed more than lower performance across the board. Some algorithms that scored worse than their peers under AUC scored better than those peers under VCMPR, and vice versa.

Trustworthiness in machine learning

Seshadhri suggests that ML researchers use VCMPR to benchmark the link prediction performance of their algorithms, or at the very least stop using AUC as their measure. As metrics are so tightly connected to decision making in ML, using a flawed system to measure performance could lead to flawed decision making about which algorithms to employ in real world ML applications.

“Metrics are so closely tied to what we decide to deploy in the real world; people need to have some trust in that. If you have the wrong way of measuring, how can you trust the results?” Seshadhri said. “This paper is in some sense cautionary: we have to be more careful about how we do our machine learning experiments, and we need to come up with a richer set of measures.”

In academia, using an accurate metric is crucial to creating progress in the ML field.

“This is in some sense a bit of a conundrum for scientific progress. A new result has to supposedly be better than everything previously, otherwise it's not doing anything new — but that all depends on how you measure it.”

Beyond machine learning, there are researchers across a wide range of fields who use link prediction and ML to conduct their research, often with profound potential impact. For example, some biologists use link prediction to determine which proteins are likely to interact as a part of drug discovery. These biologists and other researchers outside of ML depend on the ML experts to create trustworthy tools, as they often cannot become ML experts themselves.

While Seshadhri thinks these results may not be a huge surprise to those deeply involved in the field, he hopes that the larger community of ML researchers, and particularly graduate and Ph.D. students who use the current literature to learn best practices and common wisdom about the field, will take note of these results and take caution in their work. He sees this research, which presents a skeptical view, as somewhat in contrast to a dominant philosophy in ML, which tends to accept a set of metrics and focus on “pushing the bar” when it comes to progress in the field.

“It's important that we have the skeptical view, are trying to understand deeper, and are constantly asking ourselves ‘Are we measuring things correctly?’”

This research was funded by the National Science Foundation and the Army Research Office.


