(Press-News.org) Comparing the genomes of different species -- or different members of the same species -- is the basis of a great deal of modern biology. DNA sequences that are conserved across species are likely to be functionally important, while variations between members of the same species can indicate different susceptibilities to disease.
The basic algorithm for determining how much two sequences of symbols have in common -- the "edit distance" between them -- is now more than 40 years old. And for more than 40 years, computer science researchers have been trying to improve upon it, without much success.
At the ACM Symposium on Theory of Computing (STOC) next week, MIT researchers will report that, in all likelihood, that's because the algorithm is as good as it gets. If a widely held assumption about computational complexity is correct, then the problem of measuring the difference between two genomes -- or texts, or speech samples, or anything else that can be represented as a string of symbols -- can't be solved more efficiently.
In a sense, that's disappointing, since a computer running the existing algorithm would take 1,000 years to exhaustively compare two human genomes. But it also means that computer scientists can stop agonizing about whether they can do better.
"This edit distance is something that I've been trying to get better algorithms for since I was a graduate student, in the mid-'90s," says Piotr Indyk, a professor of computer science and engineering at MIT and a co-author of the STOC paper. "I certainly spent lots of late nights on that -- without any progress whatsoever. So at least now there's a feeling of closure. The problem can be put to sleep."
Moreover, Indyk says, even though the paper hasn't officially been presented yet, it's already spawned two follow-up papers, which apply its approach to related problems. "There is a technical aspect of this paper, a certain gadget construction, that turns out to be very useful for other purposes as well," Indyk says.
Squaring off
Edit distance is the minimum number of edits -- deletions, insertions, and substitutions -- required to turn one string into another. The standard algorithm for determining edit distance, known as the Wagner-Fischer algorithm, assigns each symbol of one string to a column in a giant grid and each symbol of the other string to a row. Then, starting in the upper left-hand corner and flooding diagonally across the grid, it fills in each square with the number of edits required to turn the string ending with the corresponding column into the string ending with the corresponding row.
Computer scientists measure algorithmic efficiency as computation time relative to the number of elements the algorithm manipulates. Since the Wagner-Fischer algorithm has to fill in every square of its grid, its running time is proportional to the product of the lengths of the two strings it's considering. Double the lengths of the strings, and the running time quadruples. In computer parlance, the algorithm runs in quadratic time.
That may not sound terribly efficient, but quadratic time is much better than exponential time, which means that running time is proportional to 2N, where N is the number of elements the algorithm manipulates. If on some machine a quadratic-time algorithm took, say, a hundredth of a second to process 100 elements, an exponential-time algorithm would take about 100 quintillion years.
Theoretical computer science is particularly concerned with a class of problems known as NP-complete. Most researchers believe that NP-complete problems take exponential time to solve, but no one's been able to prove it. In their STOC paper, Indyk and his student Art?rs Bačkurs demonstrate that if it's possible to solve the edit-distance problem in less-than-quadratic time, then it's possible to solve an NP-complete problem in less-than-exponential time. Most researchers in the computational-complexity community will take that as strong evidence that no subquadratic solution to the edit-distance problem exists.
Can't get no satisfaction
The core NP-complete problem is known as the "satisfiability problem": Given a host of logical constraints, is it possible to satisfy them all? For instance, say you're throwing a dinner party, and you're trying to decide whom to invite. You may face a number of constraints: Either Alice or Bob will have to stay home with the kids, so they can't both come; if you invite Cindy and Dave, you'll have to invite the rest of the book club, or they'll know they were excluded; Ellen will bring either her husband, Fred, or her lover, George, but not both; and so on. Is there an invitation list that meets all those constraints?
In Indyk and Bačkurs' proof, they propose that, faced with a satisfiability problem, you split the variables into two groups of roughly equivalent size: Alice, Bob, and Cindy go into one, but Walt, Yvonne, and Zack go into the other. Then, for each group, you solve for all the pertinent constraints. This could be a massively complex calculation, but not nearly as complex as solving for the group as a whole. If, for instance, Alice has a restraining order out on Zack, it doesn't matter, because they fall in separate subgroups: It's a constraint that doesn't have to be met.
At this point, the problem of reconciling the solutions for the two subgroups -- factoring in constraints like Alice's restraining order -- becomes a version of the edit-distance problem. And if it were possible to solve the edit-distance problem in subquadratic time, it would be possible to solve the satisfiability problem in subexponential time.
INFORMATION:
PHILADELPHIA -- More than one-third of counties in the Unites States are located more than 50 miles from the nearest gynecologic oncologist, making access to specialty care for ovarian and other gynecologic cancers difficult for nearly 15 million women. While most of these "low access" counties are located in the Mountain-West and Midwest regions, the findings of a recent study from researchers at the Perelman School of Medicine at the University of Pennsylvania also reveal that 47 states have at least one county located more than 50 miles from the nearest gynecologic oncologist. ...
PHILADELPHIA - According to a new study, women experiencing difficulty with time management, attention, organization, memory, and problem solving - often referred to as executive functions - related to menopause may find improvement with a drug already being used to treat attention deficit hyperactivity disorder (ADHD). The study led by researchers at the Perelman School of Medicine at the University of Pennsylvania is the first to show that lisdexamfetamine (LDX) improved subjective and objective measures of cognitive decline commonly experienced in menopausal women. Results ...
Researchers from Boston Children's Hospital and Merck have built the beginnings of "digital phenotype" of insomnia and other sleep disorders based on data from Twitter. This study, published today in the Journal of Medical Internet Research, is one of the first to look at relationships between social media use and sleep issues, and--based on assessments the sentiments expressed in users' tweets--gives preliminary hints that patients with sleep disorders may be a greater risk of psychosocial issues.
The study--led by Jared Hawkins, PhD; David McIver, PhD; and John Brownstein, ...
Philadelphia, PA - "The impact of human mobility on disease dynamics has been the focus of mathematical epidemiology for many years, especially since the 2002-03 SARS outbreak, which showed that an infectious agent can spread across the globe very rapidly via transportation networks," says mathematician Gergely Röst. Röst is co-author of a paper to be published this week in the SIAM Journal on Applied Dynamical Systems that presents a mathematical model to study the effects of individual movement on infectious disease spread.
"More recently, the risk of polio ...
Washington D.C., June 11, 2015 - A study published in the June 2015 issue of the Journal of the American Academy of Child and Adolescent Psychiatry demonstrates that the atypical trajectory of cortical/brain development in autism spectrum disorder (ASD) extends well beyond young childhood and into late adolescence and young adulthood.
A considerable amount of work has focused on early structural brain development in ASD utilizing magnetic resonance imaging (MRI). This body of work has revealed evidence for brain overgrowth during the early postnatal years that appears ...
CAMBRIDGE, Mass--An MIT team has developed a way of making soft materials, using a 3-D printer, with surface textures that can then be modified at will to be perfectly smooth, or ridged or bumpy, or even to have complex patterns that could be used to guide fluids.
The process, developed using detailed computer simulations, involves a material that is composed of two different polymers with different degrees of stiffness: More rigid particles are embedded within a matrix of a more flexible polymer. When squeezed, the material's surface changes from smooth to a pattern ...
June 11, 2015 - The well-known "fight or flight" response is part of the inborn series of defense/fear responses activated in reaction to threats. Understanding the steps of the defense cascade can help in forming effective treatments for patients dealing with persistent aftereffects of trauma, according to a review in the Harvard Review of Psychiatry. The journal is published by Wolters Kluwer.
Child and adolescent psychiatrist Kasia Kozlowska of The Children's Hospital at Westmead, Australia, and colleagues explain the five steps of the defense cascade, in a framework ...
In a study that seems to defy conventional dietary wisdom, University of Iowa scientists have found that adding high salt to a high-fat diet actually prevents weight gain in mice.
As exciting as this may sound to fast food lovers, the researchers caution that very high levels of dietary salt are associated with increased risk for cardiovascular disease in humans. Rather than suggest that a high salt diet is suddenly a good thing, the researchers say these findings really point to the profound effect non-caloric dietary nutrients can have on energy balance and weight gain.
"People ...
This news release is available in German.
Small heat shock proteins ensure that other proteins do not clot, allowing the cell to survive stress. Defects in these "small helpers" are associated with medical conditions like cataracts and cancer. Now, scientists at the Technische Universität München (TUM) have characterized a small heat shock protein responsible for embryonic development in the Caenorhabditis elegans nematode. Presumably, a similar protein exists also in humans.
Like humans, cells often face catastrophic situations. Even though cells are ...
PETALUMA, California--A recent study published in the journal Conservation Biology makes a strong case for a new approach to conservation planning that uses much more robust data sets in order to better protect birds, plants, and animals.
The concept is fairly simple, but won't work unless scientists can agree to share data across studies.
"Right now, we primarily only use presence and absence data for species when conservation planning for large landscapes. Much of this is due to the cost and time of collecting more comprehensive data," said the study's lead author, ...