Contact information: Abby Abazorius
abbya@mit.edu
617-253-2709
Massachusetts Institute of Technology
Storage system for 'big data' dramatically speeds access to information
Using multiple nodes allows the same bandwidth and performance from a storage network as far more expensive machines
As computers enter ever more areas of our daily lives, the amount of data they produce has grown enormously.
But for this "big data" to be useful it must first be analyzed, meaning it needs to be stored in such a way that it can be accessed quickly when required.
Previously, any data that needed to be accessed in a hurry would be stored in a computer's main memory, or dynamic random access memory (DRAM) — but the size of the datasets now being produced makes this impossible.
So instead, information tends to be stored on multiple hard disks on a number of machines across an Ethernet network. However, this storage architecture considerably increases the time it takes to access the information, according to Sang-Woo Jun, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.
"Storing data over a network is slow because there is a significant additional time delay in managing data access across multiple machines in both software and hardware," Jun says. "And if the data does not fit in DRAM, you have to go to secondary storage — hard disks, possibly connected over a network — which is very slow indeed."
Now Jun, fellow CSAIL graduate student Ming Liu, and Arvind, the Charles W. and Jennifer C. Johnson Professor of Electrical Engineering and Computer Science, have developed a storage system for big-data analytics that can dramatically speed up the time it takes to access information.
The system, which will be presented in February at the International Symposium on Field-Programmable Gate Arrays in Monterey, Calif., is based on a network of flash storage devices.
Flash storage systems outperform other technologies at tasks that involve finding random pieces of information within a large dataset. Flash devices can typically be randomly accessed in microseconds, compared with the data "seek time" of hard disks, which is typically four to 12 milliseconds when accessing data from unpredictable locations on demand.
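To put the gap between milliseconds and microseconds in perspective, the short calculation below estimates how long one million back-to-back random accesses would take on each medium. It is a rough sketch, not a measurement: the hard-disk range comes from the figures above, while the 100-microsecond flash latency is an assumed value chosen only for illustration.

```python
# Illustrative arithmetic only; not a benchmark of any real device.
HDD_SEEK_US = (4_000, 12_000)  # four to 12 milliseconds, as stated in the article
FLASH_ACCESS_US = 100          # ASSUMPTION: the article says only "microseconds"


def serial_access_seconds(n_accesses, latency_us):
    """Total time, in seconds, for n random accesses done one after another."""
    return n_accesses * latency_us / 1_000_000


n = 1_000_000
hdd_best = serial_access_seconds(n, HDD_SEEK_US[0])
hdd_worst = serial_access_seconds(n, HDD_SEEK_US[1])
flash = serial_access_seconds(n, FLASH_ACCESS_US)

print(f"hard disk: {hdd_best:.0f} to {hdd_worst:.0f} seconds")
print(f"flash (assumed latency): {flash:.0f} seconds")
```

Under that assumption, the same workload drops from more than an hour on a hard disk to under two minutes on flash. A different flash latency would change the ratio but not the order-of-magnitude gap.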
Flash systems also are nonvolatile, meaning they do not lose any of the information they hold if the computer is switched off.
In the storage system, known as BlueDBM (or Blue Database Machine), each flash device is connected to a field-programmable gate array (FPGA) chip to create an individual node. The FPGAs not only control the flash device but can also perform processing operations on the data itself, Jun says.
"This means we can do some processing close to where the data is [being stored], so we don't always have to move all of the data to the machine to work on it," he says.
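The idea of processing data where it is stored can be illustrated with a toy model. The sketch below is not BlueDBM code: `StorageNode`, `host_side_query`, and `near_data_query` are hypothetical names, and ordinary Python lists stand in for flash devices and their FPGAs. It only counts how many records leave each node when a filter runs on the host versus on the node.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StorageNode:
    """Hypothetical stand-in for one flash device plus its FPGA."""
    records: List[int]
    records_sent: int = 0  # running count of records that left this node

    def read_all(self) -> List[int]:
        # Ship every record to the host, which filters them itself.
        self.records_sent += len(self.records)
        return list(self.records)

    def filter_local(self, predicate: Callable[[int], bool]) -> List[int]:
        # Filter next to the data and ship only the matching records.
        matches = [r for r in self.records if predicate(r)]
        self.records_sent += len(matches)
        return matches


def host_side_query(nodes, predicate):
    return [r for node in nodes for r in node.read_all() if predicate(r)]


def near_data_query(nodes, predicate):
    return [r for node in nodes for r in node.filter_local(predicate)]


def make_nodes():
    # Four nodes holding 250 records each (made-up data).
    return [StorageNode(list(range(i * 250, (i + 1) * 250))) for i in range(4)]


def is_match(r):
    return r % 100 == 0


host_nodes, near_nodes = make_nodes(), make_nodes()
host_result = host_side_query(host_nodes, is_match)
near_result = near_data_query(near_nodes, is_match)

host_moved = sum(n.records_sent for n in host_nodes)
near_moved = sum(n.records_sent for n in near_nodes)
print(f"host-side filter moved {host_moved} records; near-data filter moved {near_moved}")
```

Both queries return the same answer, but in this toy setup the near-data version moves only the matching records across the network. The real saving depends on how selective a query is.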
What's more, FPGA chips can be linked together using a high-performance serial network, which has a very low latency, or time delay, meaning information from any of the nodes can be accessed within a few nanoseconds. "So if we connect all of our machines using this network, it means any node can access data from any other node with very little performance degradation, [and] it will feel as if the remote data were sitting here locally," Jun says.
Using multiple nodes allows the team to get the same bandwidth and performance from their storage network as far more expensive machines, he adds.
The team has already built a four-node prototype network. However, this was built using 5-year-old parts, and as a result is quite slow.
So they are now building a much faster 16-node prototype network, in which each node will operate at 3 gigabytes per second. The network will have a capacity of 16 to 32 terabytes.
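A quick calculation shows what those figures imply for the network as a whole. The article gives only per-node bandwidth and total capacity, so the aggregate numbers below rest on two assumptions: that bandwidth adds up linearly across nodes (an upper bound) and that capacity is split evenly between them.

```python
# Back-of-the-envelope figures for the planned 16-node prototype.
NODES = 16                     # from the article
PER_NODE_GB_PER_S = 3          # from the article
TOTAL_CAPACITY_TB = (16, 32)   # from the article

# ASSUMPTION: bandwidth scales linearly with node count (best case).
aggregate_gb_per_s = NODES * PER_NODE_GB_PER_S

# ASSUMPTION: total capacity is divided evenly among the nodes.
per_node_tb = tuple(total / NODES for total in TOTAL_CAPACITY_TB)

print(f"aggregate bandwidth: up to {aggregate_gb_per_s} GB/s")
print(f"capacity per node: {per_node_tb[0]:g} to {per_node_tb[1]:g} TB")
```

On those assumptions, the prototype would offer up to 48 gigabytes per second in total, with 1 to 2 terabytes of flash per node.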
Using the new hardware, Liu is also building a database system designed for use in big-data analytics. The system will use the FPGA chips to perform computation on the data as it is accessed by the host computer, to speed up the process of analyzing the information, Liu says.
"If we're fast enough, if we add the right number of nodes to give us enough bandwidth, we can analyze high-volume scientific data at around 30 frames per second, allowing us to answer user queries at very low latencies, making the system seem real-time," he says. "That would give us an interactive database."
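The link between node count and frame rate can be sketched with simple arithmetic. The example below assumes a query is limited purely by storage bandwidth and that bandwidth scales linearly with the number of nodes; neither assumption comes from the article. The 3-gigabyte-per-second figure is the per-node rate planned for the 16-node prototype, and `gb_per_frame` and `nodes_needed` are hypothetical helper names.

```python
import math

FRAMES_PER_SECOND = 30   # target rate quoted in the article
PER_NODE_GB_PER_S = 3    # per-node rate of the planned prototype


def gb_per_frame(n_nodes, fps=FRAMES_PER_SECOND):
    """Data that n nodes could stream in one frame (ASSUMES linear scaling)."""
    return n_nodes * PER_NODE_GB_PER_S / fps


def nodes_needed(gb_scanned_per_frame, fps=FRAMES_PER_SECOND):
    """Nodes required to scan the given amount of data in every frame."""
    return math.ceil(gb_scanned_per_frame * fps / PER_NODE_GB_PER_S)


print(f"16 nodes: about {gb_per_frame(16):.1f} GB per frame")
print(f"scanning 2 GB per frame needs {nodes_needed(2)} nodes")
```

Under these assumptions, 16 nodes could stream roughly 1.6 gigabytes in each one-thirtieth of a second, and a query that must touch 2 gigabytes per frame would need about 20 nodes.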
As an example of the type of information the system could be used on, the team has been working with data from a simulation of the universe generated by researchers at the University of Washington. The simulation contains data on all the particles in the universe, across different points in time.
"Scientists need to query this rather enormous dataset to track which particles are interacting with which other particles, but running those kind of queries is time-consuming," Jun says. "We hope to provide a real-time interface that scientists can use to look at the information more easily."
###
Written by Helen Knight, MIT News correspondent
2014-01-30