Contact Information:

Media Contact

Andrew Carleen

Twitter: MIT


Searching big data faster

Theoretical analysis could expand applications of accelerated searching in biology, other fields

CAMBRIDGE, Mass. -- For more than a decade, gene sequencers have been improving more rapidly than the computers required to make sense of their outputs. Searching for DNA sequences in existing genomic databases can already take hours, and the problem is likely to get worse.

Recently, Bonnie Berger's group at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) has been investigating techniques to make biological and chemical data easier to analyze by, in some sense, compressing it.

In the latest issue of the journal Cell Systems, Berger and colleagues present a theoretical analysis that demonstrates why their previous compression schemes have been so successful. They identify properties of data sets that make them amenable to compression and present an algorithm for determining whether a given data set has those properties. They also show that several existing databases of chemical compounds and biological molecules do indeed exhibit them.

Given measurements for those properties, the researchers can also calculate the improvements in search efficiency that their compression techniques afford. For the data sets they analyze, those efficiencies scale sublinearly, meaning that the larger the data set, the more efficient the search should be.

"This paper provides a framework for how we can apply compressive algorithms to large-scale biological data," says Berger, a professor of applied mathematics at MIT. "We also have proofs for how much efficiency we can get."

The key to the researchers' compression scheme is that evolution is stingy with good designs. There tends to be a lot of redundancy in the genomes of closely related -- or even distantly related -- organisms.

That means that of all the possible sequences of the four DNA letters -- A, T, C, and G -- only a very small subset is represented by the genomes of real organisms. Moreover, within the space of possible genomes, those of real organisms are not distributed randomly. Instead, they trace out continuous patterns, which represent the relatively slow rate at which species diverge.

Birds of a feather

To make searching more efficient, the Berger group's compression algorithms cluster together similar genomic sequences -- those that diverge by only a few DNA letters -- then choose one sequence as representative of the cluster. A search can concentrate only on the likeliest clusters; most of the data never has to be examined.

If genomic data is envisioned as tracing a continuous path through a much larger space of possibilities, then the clusters can be envisioned as spheres superimposed on the data. Data points that fall within a single sphere are closely related.
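The clustering idea can be illustrated with a minimal Python sketch. It is not the Berger group's implementation: it assumes equal-length sequences, uses Hamming distance as a stand-in similarity measure, and builds clusters greedily. The names `hamming`, `build_clusters`, and `search` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(sequences, radius):
    """Greedily group sequences into spheres of the given radius.

    The first sequence placed in a sphere becomes its representative.
    Returns a list of (representative, members) pairs.
    """
    clusters = []
    for seq in sequences:
        for rep, members in clusters:
            if hamming(rep, seq) <= radius:
                members.append(seq)
                break
        else:
            # No existing sphere contains this sequence: start a new one.
            clusters.append((seq, [seq]))
    return clusters


def search(clusters, query, tolerance, radius):
    """Return every stored sequence within `tolerance` letters of `query`.

    By the triangle inequality, a sphere whose representative is farther
    than tolerance + radius from the query cannot contain a match, so its
    members are never examined.
    """
    hits = []
    for rep, members in clusters:
        if hamming(rep, query) <= tolerance + radius:
            hits.extend(m for m in members if hamming(m, query) <= tolerance)
    return hits
```

For example, clustering `["AAAA", "AAAT", "GGGG", "GGGC", "AATT"]` with radius 1 and then searching for `"AAAA"` with tolerance 1 skips the `GGGG` sphere after a single comparison with its representative.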

Berger and her colleagues -- first authors Noah Daniels, a postdoc in her group, and William Yu, a graduate student in applied mathematics, along with David Danko, an undergraduate major in computational biology -- show that data sets are amenable to their compressive search techniques if they meet two criteria. The first is low metric entropy, meaning that the data inhabits only a small part of the larger space of possibilities.
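Loosely, metric entropy can be read as the number of fixed-radius spheres needed to cover the data. The Python sketch below counts the spheres a simple greedy pass produces, which is only an upper bound on the true minimum. Points as numeric tuples, Euclidean distance, and the function name `greedy_cover_size` are all assumptions made for illustration.

```python
import math


def greedy_cover_size(points, radius):
    """Count the spheres a greedy pass needs to cover all points.

    A point becomes a new sphere center only if it lies outside every
    sphere chosen so far. The result is an upper bound on the minimum
    number of covering spheres, not the minimum itself.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(p)
    return len(centers)
```

If the data traces a narrow path, adding points along that same path leaves the count unchanged, because the new points fall inside spheres that already exist.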

The second is low fractal dimension. That means that the density of the data points doesn't vary greatly as you move through the data. If your search requires you to explore three spheres rather than one, it takes only three times as long -- not 10 times, or 100 times.
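One common way to put a number on this is to compare how many data points fall inside two concentric spheres and take the ratio of logarithms, log(n2/n1) / log(r2/r1). The Python sketch below does that. It assumes points are numeric tuples compared by Euclidean distance, and the function name `local_dimension` is hypothetical.

```python
import math


def local_dimension(points, center, r1, r2):
    """Estimate the dimension of the data around `center` between two scales.

    n1 and n2 are the numbers of points within radius r1 and r2 of the
    center. Requires r2 > r1 > 0 and at least one point within r1.
    """
    n1 = sum(math.dist(p, center) <= r1 for p in points)
    n2 = sum(math.dist(p, center) <= r2 for p in points)
    return math.log(n2 / n1) / math.log(r2 / r1)
```

Points spread along a line give a value near 1, and points filling a plane give a value near 2. The lower the value, the more slowly the number of points to examine grows as the search radius widens.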

In their paper, the MIT researchers analyze three data sets. Two describe proteins -- one according to their sequences of amino acids, the other according to their shape -- and the third describes organic molecules. In a separate paper, now under submission, the researchers apply the same types of analysis to DNA segments between 32 and 63 letters in length.

Time's arrow

The efficiency of their search algorithm scales sublinearly, not with the number of data points, but with the metric entropy of the data set, which is a formal measure of the continuity of the data and their sparseness, relative to the space of possibilities. Because evolution is conservative, the metric entropy of genomic data should increase as new genomes are sequenced. That is, the addition of new genomes will not, in all likelihood, add new branches to the pattern traced out in the space of possibilities; rather, it will fill in gaps in the existing pattern, increasing the metric entropy.

Many other large data sets, however, could prove to be conservative in the same way. The range of behaviors exhibited by Web users, for instance, may, relative to the entire space of possibilities, be constrained by biology, by cultural history, or both. The MIT researchers' compression techniques could thus be applicable to a wide range of data outside biology.



