Next top model: Competition-based AI study aims to lower data center costs

Digital data center monitor uses machine learning to diagnose and cure cluster computer defects, reducing downtime and optimizing limited resources

2025-02-28
(Press-News.org) NEWPORT NEWS, VA – Who, or rather what, will be the next top model? 

Data scientists and developers at the U.S. Department of Energy’s Thomas Jefferson National Accelerator Facility are trying to find out, exploring some of the latest artificial intelligence (AI) techniques to help make high-performance computers more reliable and less costly to run.

The models in this case are artificial neural networks trained to monitor and predict the behavior of a scientific computing cluster, where torrents of numbers are constantly crunched. The goal is to help system administrators quickly identify and respond to troublesome computing jobs, reducing downtime for scientists processing data from their experiments. 

In almost fashion-show style, these machine learning (ML) models are judged to see which is best suited for the ever-changing dataset demands of experimental programs. But unlike the hit reality TV series “America’s Next Top Model” and its international spinoffs, it doesn’t take an entire season to pick a winner. In this contest, a new “champion model” is crowned every 24 hours based on its ability to learn from fresh data. 

“We’re trying to understand characteristics of our computing clusters that we haven’t seen before,” said Bryan Hess, Jefferson Lab’s scientific computing operations manager and a lead investigator – or judge, so to speak – in the study. “It’s looking at the data center in a more holistic way, and going forward, that's going to be some kind of AI or ML model.”

While these models don’t win any glitzy photoshoots, the project recently took the spotlight in the peer-reviewed scientific magazine IEEE Software as part of a special edition dedicated to machine learning in data center operations (MLOps). 

The results of the study could have big implications for Big Science.

The Need

Large-scale scientific instruments, such as particle accelerators, light sources and radio telescopes, are critical DOE facilities that enable scientific discovery. At Jefferson Lab, it’s the Continuous Electron Beam Accelerator Facility (CEBAF), a DOE Office of Science User Facility relied on by a global community of more than 1,650 nuclear physicists.

Experimental detectors at Jefferson Lab collect faint signatures of tiny particles originating from the CEBAF electron beams. Because CEBAF produces beam 24/7, those signals translate into mountains of data. The information collected is on the order of tens of petabytes per year. That’s enough to fill an average laptop’s hard drive about once a minute. 

Particle interactions are processed and analyzed in Jefferson Lab’s data center using high-throughput computing clusters with software tailored to each experiment.

Among the blinking lights and bundled cables, complex jobs requiring several processors (cores) are the norm. The fluid nature of these workloads means many moving parts – and more things that could go wrong.

Certain compute jobs or hardware problems can result in unexpected cluster behavior, referred to as “anomalies.” They can include memory fragmenting or input/output overcommitments, resulting in delays for scientists.

“When compute clusters get bigger, it becomes tough for system administrators to keep track of all the components that might go bad,” said Ahmed Hossam Mohammed, a postdoctoral researcher at Jefferson Lab and an investigator on the study. “We wanted to automate this process with a model that flashes a red light whenever something weird happens.

“That way, system administrators can take action before conditions deteriorate even further.”

A DIDACT-ic Approach

To address these challenges, the group developed an ML-based management system called DIDACT (Digital Data Center Twin). The acronym is a play on the word “didactic,” which describes something that’s designed to teach. In this case, it’s teaching artificial neural networks.

DIDACT is a project funded by Jefferson Lab’s Laboratory Directed Research & Development (LDRD) program. The program provides the resources for laboratory staff to pursue projects that could make rapid and significant contributions to critical national science and technology problems of mission relevance and/or advance the laboratory’s core scientific and technical capabilities.

The DIDACT system is designed to detect anomalies and diagnose their source using an AI approach called continual learning. 

In continual learning, ML models are trained on data that arrive incrementally, similar to the lifelong learning experienced by people and animals. The DIDACT team trains multiple models in this fashion, with each representing the system dynamics of active computing jobs, then selects the top performer based on that day’s data.

The models are variations of unsupervised neural networks called autoencoders. One is equipped with a graph neural network (GNN), which looks at relationships between components. 
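One common way to use an autoencoder for anomaly detection is to train it on normal behavior and then treat a large reconstruction error as a warning sign. The release does not spell out DIDACT's scoring rule, so the following is only a minimal sketch under stated assumptions: a tiny linear autoencoder with tied weights and a one-unit bottleneck, written in plain Python, and two hypothetical standardized cluster metrics where the second normally tracks the first. The function names and data are illustrative, not taken from the DIDACT code.

```python
import random

def train_linear_autoencoder(rows, epochs=200, lr=0.01, seed=0):
    """Fit a tied-weight linear autoencoder with a one-unit bottleneck.

    Encode: z = w . x        Decode: x_hat = z * w
    Trained by stochastic gradient descent on squared reconstruction error.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(rows[0]))]
    for _ in range(epochs):
        for x in rows:
            z = sum(wi * xi for wi, xi in zip(w, x))
            err = [z * wi - xi for wi, xi in zip(w, x)]  # x_hat - x
            err_dot_w = sum(ei * wi for ei, wi in zip(err, w))
            # Gradient of 0.5 * ||z * w - x||^2 with respect to w_j:
            #   err_j * z + (err . w) * x_j
            w = [wi - lr * (ei * z + err_dot_w * xi)
                 for wi, ei, xi in zip(w, err, x)]
    return w

def reconstruction_error(w, x):
    """Mean squared difference between a sample and its reconstruction."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sum((z * wi - xi) ** 2 for wi, xi in zip(w, x)) / len(x)

# Hypothetical "healthy" samples: the second metric is twice the first.
healthy = [(t / 10, 2 * t / 10) for t in range(-10, 11)]
# Hypothetical anomaly: the usual relationship between the metrics breaks.
anomaly = (1.0, -2.0)

weights = train_linear_autoencoder(healthy)
worst_healthy = max(reconstruction_error(weights, x) for x in healthy)
anomaly_score = reconstruction_error(weights, anomaly)
print(f"worst healthy error: {worst_healthy:.6f}")
print(f"anomaly error:       {anomaly_score:.6f}")
```

The model learns the relationship that normal samples obey, so a sample that violates it reconstructs poorly and stands out. A production system would use deeper networks and many more metrics, but the scoring idea is the same.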

“They compete using known data to determine which had lower error,” said Diana McSpadden, a Jefferson Lab data scientist and lead on the MLOps study. “Whichever won that day would be the ‘daily champion.’ ”
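The daily contest McSpadden describes can be sketched as a comparison of errors on a shared set of known data. This is an illustration only: the two stand-in "models" below are simple functions rather than trained networks, and the names `mean_squared_error`, `pick_daily_champion`, `scaled_model` and `mean_model` are hypothetical, as is the choice of mean squared error as the metric.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], List[float]]  # maps a sample to its reconstruction

def mean_squared_error(model: Model, rows: Sequence[Vector]) -> float:
    """Average squared reconstruction error over every value in every row."""
    total = 0.0
    count = 0
    for x in rows:
        x_hat = model(x)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
        count += len(x)
    return total / count

def pick_daily_champion(
    models: Dict[str, Model], rows: Sequence[Vector]
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the same data; the lowest error wins."""
    scores = {name: mean_squared_error(m, rows) for name, m in models.items()}
    champion = min(scores, key=scores.get)
    return champion, scores

# Stand-in candidates (assumptions for illustration, not real autoencoders).
def scaled_model(x: Vector) -> List[float]:
    return [0.9 * v for v in x]

def mean_model(x: Vector) -> List[float]:
    avg = sum(x) / len(x)
    return [avg for _ in x]

known_data = [(1.0, 1.0), (2.0, 2.2)]  # made-up validation samples
champion, scores = pick_daily_champion(
    {"scaled_model": scaled_model, "mean_model": mean_model}, known_data
)
print(champion, scores)
```

Scoring every candidate on identical data keeps the comparison fair, and rerunning the function on each day's fresh data lets the title change hands.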

The method could one day help reduce downtime in data centers and optimize critical resources – meaning lower costs and improved science.

Here’s how it works.

The Next Top Model

To train the models without affecting day-to-day compute needs, the DIDACT team developed a testbed cluster called the “sandbox.” Think of the sandbox as a runway where the models are scored, in this case based on their ability to train.

The DIDACT software is an ensemble of open-source and custom-built code used to develop and manage ML models, monitor the sandbox cluster, and write out the data. All those numbers are visualized on a graphical dashboard.

The system includes three pipelines for the ML “talent.” One is for offline development, like a dress rehearsal. Another is for continual learning – where the live competition takes place. Each time a new top model emerges, it becomes the primary monitor of cluster behavior in the real-time pipeline – until it's unseated by the next day’s winner.
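The hand-off between the continual learning pipeline and the real-time pipeline can be pictured as a small piece of state that changes only when a challenger wins. The sketch below is an assumption about how such bookkeeping might look, not DIDACT's actual code: the `RealTimeMonitor` class, the model names and the daily error values are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class RealTimeMonitor:
    """Tracks which model currently watches the cluster."""
    champion: Optional[str] = None
    history: List[Tuple[str, str]] = field(default_factory=list)

    def crown(self, day: str, daily_errors: Dict[str, float]) -> str:
        """Promote the day's lowest-error model if it differs from the incumbent."""
        winner = min(daily_errors, key=daily_errors.get)
        if winner != self.champion:
            self.champion = winner
            self.history.append((day, winner))  # record each change of reign
        return self.champion

# Made-up daily errors for two hypothetical candidates.
daily_results = [
    ("day-1", {"autoencoder": 0.020, "graph_autoencoder": 0.031}),
    ("day-2", {"autoencoder": 0.018, "graph_autoencoder": 0.025}),
    ("day-3", {"autoencoder": 0.027, "graph_autoencoder": 0.015}),
]

monitor = RealTimeMonitor()
for day, errors in daily_results:
    monitor.crown(day, errors)

print(monitor.champion)
print(monitor.history)
```

In this toy run the first model holds the title for two days and is unseated on the third, which mirrors the "until it's unseated by the next day's winner" behavior described above.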

“DIDACT represents a creative stitching together of hardware and open-source software,” said Hess, who is also the infrastructure architect for the High Performance Data Facility Hub being built at Jefferson Lab in partnership with DOE’s Lawrence Berkeley National Laboratory. “It’s a combination of things that you normally wouldn’t put together, and we’ve shown that it can work. It really draws on the strength of Jefferson Lab’s data science and computing operations expertise.”

In future studies, the DIDACT team would like to explore an ML framework that optimizes a data center’s energy usage, whether by reducing the water flow used in cooling or by throttling down cores based on data-processing demands.

“The goal is always to provide more bang for the buck,” Hess said, “more science for the dollar.”

Further Reading
Establishing Machine Learning Operations for Continual Learning in Computing Clusters
Jefferson Lab Devotes $3 Million to Testing New Ideas
High Performance Data Facility Hub
24s: A Businesslike Name for a ‘High-Performing Machine’
Rolling in the Deep: Norfolk Street Flooding Predicted in Seconds With Machine Learning Models
Steering Electrons Out of the Drift with Deep Learning
Unlocking Hidden Potential Through Artificial Intelligence

###

Jefferson Science Associates, LLC, manages and operates the Thomas Jefferson National Accelerator Facility, or Jefferson Lab, for the U.S. Department of Energy's Office of Science. JSA is a wholly owned subsidiary of the Southeastern Universities Research Association, Inc. (SURA).

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science
