PRESS-NEWS.org - Press Release Distribution

Next top model: Competition-based AI study aims to lower data center costs

Digital data center monitor uses machine learning to diagnose and cure cluster computer defects, reducing downtime and optimizing limited resources

2025-02-28
(Press-News.org) NEWPORT NEWS, VA – Who, or rather what, will be the next top model? 

Data scientists and developers at the U.S. Department of Energy’s Thomas Jefferson National Accelerator Facility are trying to find out, exploring some of the latest artificial intelligence (AI) techniques to help make high-performance computers more reliable and less costly to run.

The models in this case are artificial neural networks trained to monitor and predict the behavior of a scientific computing cluster, where torrents of numbers are constantly crunched. The goal is to help system administrators quickly identify and respond to troublesome computing jobs, reducing downtime for scientists processing data from their experiments. 

In almost fashion-show style, these machine learning (ML) models are judged to see which is best suited for the ever-changing dataset demands of experimental programs. But unlike the hit reality TV series “America’s Next Top Model” and its international spinoffs, it doesn’t take an entire season to pick a winner. In this contest, a new “champion model” is crowned every 24 hours based on its ability to learn from fresh data. 

“We’re trying to understand characteristics of our computing clusters that we haven’t seen before,” said Bryan Hess, Jefferson Lab’s scientific computing operations manager and a lead investigator – or judge, so to speak – in the study. “It’s looking at the data center in a more holistic way, and going forward, that's going to be some kind of AI or ML model.”

While these models don’t win any glitzy photoshoots, the project recently took the spotlight in the peer-reviewed scientific magazine IEEE Software as part of a special edition dedicated to machine learning in data center operations (MLOps). 

The results of the study could have big implications for Big Science.

The Need

Large-scale scientific instruments, such as particle accelerators, light sources and radio telescopes, are critical DOE facilities that enable scientific discovery. At Jefferson Lab, it’s the Continuous Electron Beam Accelerator Facility (CEBAF), a DOE Office of Science User Facility relied on by a global community of more than 1,650 nuclear physicists.

Experimental detectors at Jefferson Lab collect faint signatures of tiny particles originating from the CEBAF electron beams. Because CEBAF produces beam 24/7, those signals translate into mountains of data. The information collected is on the order of tens of petabytes per year. That’s enough to fill an average laptop’s hard drive about once a minute. 

Particle interactions are processed and analyzed in Jefferson Lab’s data center using high-throughput computing clusters with software tailored to each experiment.

Among the blinking lights and bundled cables, complex jobs requiring several processors (cores) are the norm. The fluid nature of these workloads means many moving parts – and more things that could go wrong.

Certain compute jobs or hardware problems can result in unexpected cluster behavior, referred to as “anomalies.” They can include memory fragmenting or input/output overcommitments, resulting in delays for scientists.

“When compute clusters get bigger, it becomes tough for system administrators to keep track of all the components that might go bad,” said Ahmed Hossam Mohammed, a postdoctoral researcher at Jefferson Lab and an investigator on the study. “We wanted to automate this process with a model that flashes a red light whenever something weird happens.

“That way, system administrators can take action before conditions deteriorate even further.”

A DIDACT-ic Approach

To address these challenges, the group developed an ML-based management system called DIDACT (Digital Data Center Twin). The acronym is a play on the word “didactic,” which describes something that’s designed to teach. In this case, it’s teaching artificial neural networks.

DIDACT is a project funded by Jefferson Lab’s Laboratory Directed Research & Development (LDRD) program. The program provides the resources for laboratory staff to pursue projects that could make rapid and significant contributions to critical national science and technology problems of mission relevance and/or advance the laboratory’s core scientific and technical capabilities.

The DIDACT system is designed to detect anomalies and diagnose their source using an AI approach called continual learning. 

In continual learning, ML models are trained on data that arrive incrementally, similar to the lifelong learning experienced by people and animals. The DIDACT team trains multiple models in this fashion, with each representing the system dynamics of active computing jobs, then selects the top performer based on that day’s data.

The models are variations of unsupervised neural networks called autoencoders. One is equipped with a graph neural network (GNN), which looks at relationships between components. 
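As a rough illustration of why an autoencoder can serve as an anomaly detector: such a network is trained to reconstruct typical inputs, so a sample it reconstructs poorly is a candidate anomaly. The sketch below assumes the reconstructions have already been produced by some model. The function names, the mean-plus-k-standard-deviations threshold and the sample numbers are illustrative assumptions, not details of DIDACT.

```python
from statistics import mean, stdev


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    return mean((o - r) ** 2 for o, r in zip(original, reconstructed))


def fit_threshold(baseline_errors, k=3.0):
    """Hypothetical rule: flag errors more than k standard deviations
    above the mean error seen on normal (baseline) data."""
    return mean(baseline_errors) + k * stdev(baseline_errors)


def is_anomaly(error, threshold):
    """True when a sample's reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up errors observed while the cluster behaved normally.
baseline = [0.010, 0.012, 0.009, 0.011, 0.013]
threshold = fit_threshold(baseline)

# A well-reconstructed sample and a poorly reconstructed one (toy values).
normal_sample = reconstruction_error([0.5, 0.2, 0.9], [0.52, 0.18, 0.88])
odd_sample = reconstruction_error([0.5, 0.2, 0.9], [0.1, 0.7, 0.3])

print(is_anomaly(normal_sample, threshold), is_anomaly(odd_sample, threshold))
```

In this toy setup only the poorly reconstructed sample crosses the threshold, which is the "red light" behavior described earlier.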

“They compete using known data to determine which had lower error,” said Diana McSpadden, a Jefferson Lab data scientist and lead on the MLOps study. “Whichever won that day would be the ‘daily champion.’ ”
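The daily contest McSpadden describes can be sketched in a few lines. This is a minimal illustration under stated assumptions: the candidate models are stand-in functions, the names `model_a` and `model_b` are hypothetical, and mean absolute error is used only as an example metric, since the release does not specify which error measure DIDACT uses.

```python
def mean_abs_error(model, samples):
    """Average absolute gap between each known sample and the model's output."""
    errors = [abs(x - model(x)) for x in samples]
    return sum(errors) / len(errors)


def pick_daily_champion(candidates, samples):
    """Score every candidate on the same known data; lowest error wins."""
    scores = {name: mean_abs_error(model, samples)
              for name, model in candidates.items()}
    champion = min(scores, key=scores.get)
    return champion, scores


# Toy stand-ins for trained models: each "reconstructs" its input imperfectly.
candidates = {
    "model_a": lambda x: 0.90 * x,
    "model_b": lambda x: 0.98 * x,
}
known_data = [1.0, 2.0, 3.0]

champion, scores = pick_daily_champion(candidates, known_data)
print(champion, scores)
```

Scoring every candidate on the same known data keeps the comparison fair, whatever the models' internal architecture.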

The method could one day help reduce downtime in data centers and optimize critical resources – meaning lower costs and improved science.

Here’s how it works.

The Next Top Model

To train the models without affecting day-to-day compute needs, the DIDACT team developed a testbed cluster called the “sandbox.” Think of the sandbox as a runway where the models are scored, in this case based on their ability to train.

The DIDACT software is an ensemble of open-source and custom-built code used to develop and manage ML models, monitor the sandbox cluster, and write out the data. All those numbers are visualized on a graphical dashboard.

The system includes three pipelines for the ML “talent.” One is for offline development, like a dress rehearsal. Another is for continual learning – where the live competition takes place. The third is the real-time pipeline: each time a new top model emerges, it becomes the primary monitor of cluster behavior there – until it’s unseated by the next day’s winner.
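The hand-off from the competition to real-time monitoring might look something like the sketch below. The `RealTimeMonitor` class, its fields and the daily scores are all hypothetical, introduced only to show a champion holding its seat until a later day's winner replaces it. They do not describe DIDACT's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RealTimeMonitor:
    """Tracks which model currently watches the cluster (hypothetical)."""
    champion: Optional[str] = None
    history: List[Tuple[str, str]] = field(default_factory=list)

    def promote(self, day, daily_scores):
        """Install the day's lowest-error model as the active monitor."""
        self.champion = min(daily_scores, key=daily_scores.get)
        self.history.append((day, self.champion))
        return self.champion


monitor = RealTimeMonitor()
# Made-up error scores from two consecutive daily competitions.
monitor.promote("day 1", {"model_a": 0.20, "model_b": 0.04})
monitor.promote("day 2", {"model_a": 0.03, "model_b": 0.05})
print(monitor.champion, monitor.history)
```

Keeping a history of past champions is a design choice made here for illustration. It would let an operator see how often the title changes hands.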

“DIDACT represents a creative stitching together of hardware and open-source software,” said Hess, who is also the infrastructure architect for the High Performance Data Facility Hub being built at Jefferson Lab in partnership with DOE’s Lawrence Berkeley National Laboratory. “It’s a combination of things that you normally wouldn’t put together, and we’ve shown that it can work. It really draws on the strength of Jefferson Lab’s data science and computing operations expertise.”

In future studies, the DIDACT team would like to explore an ML framework that optimizes a data center’s energy usage, whether by reducing the water flow used in cooling or by throttling down cores based on data-processing demands.

“The goal is always to provide more bang for the buck,” Hess said, “more science for the dollar.”

Further Reading
Establishing Machine Learning Operations for Continual Learning in Computing Clusters
Jefferson Lab Devotes $3 Million to Testing New Ideas
High Performance Data Facility Hub
24s: A Businesslike Name for a ‘High-Performing Machine’
Rolling in the Deep: Norfolk Street Flooding Predicted in Seconds With Machine Learning Models
Steering Electrons Out of the Drift with Deep Learning
Unlocking Hidden Potential Through Artificial Intelligence

###

Jefferson Science Associates, LLC, manages and operates the Thomas Jefferson National Accelerator Facility, or Jefferson Lab, for the U.S. Department of Energy's Office of Science. JSA is a wholly owned subsidiary of the Southeastern Universities Research Association, Inc. (SURA).

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science



