Researchers from MIT and NVIDIA have developed two techniques that accelerate the processing of sparse tensors, a type of data structure that’s used for high-performance computing tasks. The complementary techniques could result in significant improvements to the performance and energy-efficiency of systems like the massive machine-learning models that drive generative artificial intelligence.
Tensors are data structures used by machine-learning models. Both of the new methods seek to efficiently exploit what’s known as sparsity (zero values) in the tensors. When processing these tensors, the hardware can skip over the zeros and save on both computation and memory. For instance, anything multiplied by zero is zero, so the hardware can skip that operation. And the tensor can be compressed (zeros don’t need to be stored) so a larger portion can be stored in on-chip memory.
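To make the two savings concrete, here is a minimal software sketch of skipping zeros and compressing a tensor. It is only an illustration: the function names `compress` and `sparse_dot` and the sample values are assumptions made for this example, and the accelerators described here do this in hardware rather than in Python.

```python
def compress(dense):
    """Store only the nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def sparse_dot(compressed, other):
    """Dot product that only multiplies the stored nonzero entries."""
    return sum(v * other[i] for i, v in compressed)


# Hypothetical data: 8 weights, only 3 of them nonzero.
weights = [0, 3, 0, 0, 5, 0, 0, 2]
activations = [4, 1, 7, 9, 2, 8, 6, 3]

packed = compress(weights)
result = sparse_dot(packed, activations)

print(packed)  # [(1, 3), (4, 5), (7, 2)]
print(result)  # 3*1 + 5*2 + 2*3 = 19
```

In this example, three multiplications replace eight, and three stored pairs replace eight stored values. The result matches the dense dot product.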
However, there are several challenges to exploiting sparsity. Finding the nonzero values in a large tensor is no easy task. Existing approaches often limit the locations of nonzero values by enforcing a sparsity pattern to simplify the search, but this limits the variety of sparse tensors that can be processed efficiently.
Another challenge is that the number of nonzero values can vary in different regions of the tensor. This makes it difficult to determine how much space is required to store different regions in memory. To make sure the region fits, more space is often allocated than is needed, causing the storage buffer to be underutilized. This increases off-chip memory traffic, which requires extra computation.
The MIT and NVIDIA researchers crafted two solutions to address these problems. For one, they developed a technique that allows the hardware to efficiently find the nonzero values for a wider variety of sparsity patterns.
For the other solution, they created a method that can handle the case where the data do not fit in memory, which increases the utilization of the storage buffer and reduces off-chip memory traffic.
Both methods boost the performance and reduce the energy demands of hardware accelerators specifically designed to speed up the processing of sparse tensors.
“Typically, when you use more specialized or domain-specific hardware accelerators, you lose the flexibility that you would get from a more general-purpose processor, like a CPU. What stands out with these two works is that we show that you can still maintain flexibility and adaptability while being specialized and efficient,” says Vivienne Sze, associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the Research Laboratory of Electronics (RLE), and co-senior author of papers on both advances.
Her co-authors include lead authors Yannan Nellie Wu PhD ’23 and Zi Yu Xue, an electrical engineering and computer science graduate student; and co-senior author Joel Emer, an MIT professor of the practice in computer science and electrical engineering and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), as well as others at NVIDIA. Both papers will be presented at the IEEE/ACM International Symposium on Microarchitecture.
HighLight: Efficiently finding zero values
Sparsity can arise in the tensor for a variety of reasons. For example, researchers sometimes “prune” unnecessary pieces of the machine-learning models by replacing some values in the tensor with zeros, creating sparsity. The degree of sparsity (percentage of zeros) and the locations of the zeros can vary for different models.
To make it easier to find the remaining nonzero values in a model with billions of individual values, researchers often restrict the location of the nonzero values so they fall into a certain pattern. However, each hardware accelerator is typically designed to support one specific sparsity pattern, limiting its flexibility.
By contrast, the hardware accelerator the MIT researchers designed, called HighLight, can handle a wide variety of sparsity patterns and still perform well when running models that don’t have any zero values.
They use a technique they call “hierarchical structured sparsity” to efficiently represent a wide variety of sparsity patterns that are composed of several simple sparsity patterns. This approach divides the values in a tensor into smaller blocks, where each block has its own simple sparsity pattern (perhaps two zeros and two nonzeros in a block with four values).
Then, they combine the blocks into a hierarchy, where each collection of blocks also has its own simple sparsity pattern (perhaps one zero block and three nonzero blocks in a level with four blocks). They continue combining blocks into larger levels, but the patterns remain simple at each step.
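The two-level example above can be sketched as a simple conformance check. This is an illustrative sketch, not HighLight's actual representation: the function names, the "at most" reading of each pattern, and the default parameters (two nonzeros per four-value block, three nonzero blocks per four-block group) are assumptions taken from the example in the text.

```python
def chunks(seq, size):
    """Split a sequence into consecutive pieces of the given size."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def conforms(values, block_size=4, block_nnz=2, group_size=4, group_nnz_blocks=3):
    """Check a two-level pattern: each block holds at most `block_nnz`
    nonzeros, and each group of blocks holds at most `group_nnz_blocks`
    blocks that contain any nonzero."""
    if len(values) % (block_size * group_size) != 0:
        raise ValueError("length must be a multiple of block_size * group_size")
    blocks = chunks(values, block_size)
    # Level 1: the simple pattern inside every block.
    if any(sum(1 for v in b if v != 0) > block_nnz for b in blocks):
        return False
    # Level 2: the simple pattern over the blocks in every group.
    for group in chunks(blocks, group_size):
        nonzero_blocks = sum(1 for b in group if any(v != 0 for v in b))
        if nonzero_blocks > group_nnz_blocks:
            return False
    return True


good = [1, 0, 2, 0,  0, 0, 0, 0,  0, 3, 0, 4,  5, 0, 0, 6]
bad = [1, 2, 3, 0,  0, 0, 0, 0,  0, 3, 0, 4,  5, 0, 0, 6]

print(conforms(good))  # True
print(conforms(bad))   # False: the first block has three nonzeros
```

Under these assumed parameters, the densest tensor the pattern allows keeps 2/4 × 3/4 = 3/8 of its values, which is how stacking simple patterns produces an overall sparsity degree that neither level has on its own.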
This simplicity enables HighLight to more efficiently find and skip zeros, so it can take full advantage of the opportunity to cut excess computation. On average, their accelerator design was about six times more energy-efficient than other approaches.
“In the end, the HighLight accelerator is able to efficiently accelerate dense models because it does not introduce a lot of overhead, and at the same time it is able to exploit workloads with different amounts of zero values based on hierarchical structured sparsity,” Wu explains.
In the future, she and her collaborators want to apply hierarchical structured sparsity to more types of machine-learning models and different types of tensors in the models.
Tailors and Swiftiles: Effectively “overbooking” to accelerate workloads
Researchers can also leverage sparsity to more efficiently move and process data on a computer chip.
Since the tensors are often larger than what can be stored in the memory buffer on chip, the chip only grabs and processes a chunk of the tensor at a time. The chunks are called tiles.
To maximize the utilization of that buffer and limit the number of times the chip must access off-chip memory, which often dominates energy consumption and limits processing speed, researchers seek to use the largest tile that will fit into the buffer.
But in a sparse tensor, many of the data values are zero, so an even larger tile can fit into the buffer than one might expect based on its capacity. Zero values don’t need to be stored.
However, the number of zero values can vary across different regions of the tensor, so it can also vary for each tile. This makes it difficult to determine a tile size that will fit in the buffer. As a result, existing approaches often conservatively assume there are no zeros and end up selecting a smaller tile, which results in wasted blank spaces in the buffer.
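A small sketch shows why the conservative choice wastes buffer space. The tensor values, the buffer capacity of four nonzeros, and the helper name `tile_occupancies` are all hypothetical and chosen only for illustration.

```python
def tile_occupancies(tensor, tile_size):
    """Count the nonzeros (the values that must be stored) in each tile."""
    return [
        sum(1 for v in tensor[i:i + tile_size] if v != 0)
        for i in range(0, len(tensor), tile_size)
    ]


# Hypothetical 24-value tensor and a buffer that holds 4 nonzero values.
tensor = [0, 0, 5, 0,  1, 2, 0, 3,  0, 0, 0, 0,
          7, 0, 0, 0,  0, 4, 0, 0,  6, 0, 8, 0]
capacity = 4

conservative = tile_occupancies(tensor, 4)  # assumes no zeros at all
larger = tile_occupancies(tensor, 8)        # bets on sparsity

print(conservative)  # [1, 3, 0, 1, 1, 2]
print(larger)        # [4, 1, 3]
```

In this example, the conservative tile size never fills the buffer and needs six fetches, while the larger tile size needs only three and every tile still fits. With a different distribution of nonzeros, one of the larger tiles could exceed the capacity, which is the uncertainty the text describes.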
To address this uncertainty, the researchers propose the use of “overbooking” to allow them to increase the tile size, as well as a way to tolerate it if the tile doesn’t fit the buffer.
This works the same way an airline overbooks tickets for a flight: if all the passengers show up, the airline must compensate the ones who are bumped from the plane. But usually not all the passengers show up.
In a sparse tensor, a tile size can be chosen such that usually the tiles will have enough zeros that most still fit into the buffer. But occasionally, a tile will have more nonzero values than will fit. In this case, those data are bumped out of the buffer.
The researchers enable the hardware to only re-fetch the bumped data without grabbing and processing the entire tile again. They modify the “tail end” of the buffer to handle this, hence the name of this technique, Tailors.
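The benefit of re-fetching only the bumped data can be sketched with a toy cost model. This is not the Tailors hardware mechanism. It assumes that the first `capacity` nonzeros of a tile stay resident, that bumped values are fetched again on every reuse, and that a baseline without overbooking support fetches the whole oversized tile again each time. The function names and numbers are hypothetical.

```python
def fetches_with_overbooking(tile_nonzeros, capacity, reuses):
    """Off-chip fetches when only the bumped values are re-fetched.
    Assumption: values that fit stay in the buffer across all reuses."""
    resident = min(tile_nonzeros, capacity)
    bumped = tile_nonzeros - resident
    return resident + bumped * reuses


def fetches_without_overbooking(tile_nonzeros, capacity, reuses):
    """Baseline assumption: an oversized tile is fetched in full per reuse."""
    if tile_nonzeros <= capacity:
        return tile_nonzeros
    return tile_nonzeros * reuses


# Hypothetical tile with 10 nonzeros, buffer for 8, data reused 5 times.
print(fetches_with_overbooking(10, 8, 5))     # 8 + 2*5 = 18
print(fetches_without_overbooking(10, 8, 5))  # 10*5 = 50

# A tile that fits costs the same either way.
print(fetches_with_overbooking(6, 8, 5))      # 6
```

Under these assumptions, the cost of an overbooked tile grows with the number of bumped values rather than with the size of the whole tile.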
Then they also created an approach for finding a tile size that takes advantage of overbooking. This method, called Swiftiles, swiftly estimates the ideal tile size so that a specific percentage of tiles, set by the user, are overbooked. (The names “Tailors” and “Swiftiles” pay homage to Taylor Swift, whose recent Eras tour was fraught with overbooked presale codes for tickets.)
Swiftiles reduces the number of times the hardware needs to check the tensor to identify an ideal tile size, saving on computation. The combination of Tailors and Swiftiles more than doubles the speed while requiring only half the energy of existing hardware accelerators that cannot handle overbooking.
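One way to picture a single-pass estimate that targets a chosen overbooking rate is the sketch below. It is a hypothetical estimator written for illustration, not the Swiftiles algorithm from the paper: it samples fixed-size probe windows, takes the occupancy that all but the target fraction of probes stay under, and scales the probe size so that occupancy matches the buffer capacity. The function name and every parameter are assumptions.

```python
import random


def estimate_tile_size(tensor, capacity, overbook_fraction,
                       probe_size=32, samples=32, seed=0):
    """One-shot tile size estimate from sampled probe windows.
    Roughly `overbook_fraction` of the probes hold more nonzeros than
    the occupancy used for scaling, so similar tiles would be overbooked."""
    probe_size = min(probe_size, len(tensor))
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(tensor) - probe_size + 1)
              for _ in range(samples)]
    occupancies = sorted(
        sum(1 for v in tensor[s:s + probe_size] if v != 0) for s in starts
    )
    # Occupancy at the (1 - overbook_fraction) quantile of the sample.
    k = min(len(occupancies) - 1, int((1 - overbook_fraction) * len(occupancies)))
    quantile = max(1, occupancies[k])
    # Scale the probe so a tile at that quantile just fills the buffer.
    return max(1, probe_size * capacity // quantile)


dense = [1] * 1000
half_dense = [1, 0] * 500

print(estimate_tile_size(dense, capacity=16, overbook_fraction=0.1))       # 16
print(estimate_tile_size(half_dense, capacity=16, overbook_fraction=0.1))  # 32
```

A fully dense tensor gets a tile exactly as large as the buffer, while a tensor that is half zeros gets a tile twice that size. Because the estimate comes from one round of sampling, it can be off for unevenly distributed nonzeros, which is tolerable only when overbooked tiles are handled gracefully.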
“Swiftiles allows us to estimate how large these tiles need to be without requiring multiple iterations to refine the estimate. This only works because overbooking is supported. Even if you are off by a decent amount, you can still extract a fair bit of speedup because of the way the non-zeros are distributed,” Xue says.
In the future, the researchers want to apply the idea of overbooking to other aspects of computer architecture and also work to improve the process for estimating the optimal level of overbooking.
This research is funded, in part, by the MIT AI Hardware Program.
###
Written by Adam Zewe, MIT News
Paper: “HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity”
https://arxiv.org/pdf/2305.12718.pdf
Paper: “Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Occupancy”
https://arxiv.org/pdf/2310.00192.pdf
New techniques efficiently accelerate sparse tensors for massive AI models
Complementary approaches, “HighLight” and “Tailors and Swiftiles,” could boost the performance of demanding machine-learning tasks
2023-10-31