Speeding up sequence alignment across the tree of life
A sequence search engine for a new era of conservation genomics
2021-04-12
(Press-News.org) A team of researchers from the Max Planck Institute for Developmental Biology in Tübingen and the Max Planck Computing and Data Facility in Garching has developed new search capabilities that will allow researchers to compare the biochemical makeup of different species from across the tree of life. The combination of accuracy and speed that these capabilities offer is hitherto unrivalled.
Humans share many sequences of nucleotides that make up our genes with other species - with pigs in particular, but also with mice and even bananas. Accordingly, some proteins in our bodies - strings of amino acids assembled according to the blueprint of the genes - can also be the same as (or similar to) some proteins in other species. These similarities might sometimes indicate that two species have a common ancestry, or they may simply come about if the evolutionary need for a certain feature or molecular function happens to arise in the two species.
Beating the gold standard of comparative genomics research
But of course, finding out what you share with a pig or a banana can be a monumental task; the search of a database with all the information about you, the pig, and the banana is computationally quite involved. Researchers expect that the genomes of more than 1.5 million eukaryotic species - a group that includes all animals, plants, and mushrooms - will be sequenced within the next decade. "Even now, with only hundreds of thousands of genomes available (mostly representing small genomes of bacteria and viruses), we are already looking at databases with up to 370 million sequences. Most current search tools would simply be impracticable and take too long to analyze data of the magnitude that we are expecting in the near future," explains Hajk-Georg Drost, Computational Biology group leader in the Department of Molecular Biology of the Max Planck Institute for Developmental Biology in Tübingen.
"For a long time, the gold standard for this kind of analysis used to be a tool called BLAST," recalls Drost. "If you tried to trace how a protein was maintained by natural selection or how it developed in different phylogenetic lineages, BLAST gave you the best matches at this scale. But it is foreseeable that at some point the databases will grow too large for comprehensive BLAST searches."
Finding the needle in the haystack - but quickly!
At the core of the problem is a tradeoff between speed and sensitivity: just as you will miss some small or well-hidden Easter eggs if you scan a room only briefly, speeding up the search for similarities of protein sequences in a database typically comes with the downside of missing some of the less obvious matches.
"This is why some time ago, we started to devise the DIAMOND algorithm, in the hope that it would allow us to deal with large datasets in a reasonable amount of time," remembers Benjamin Buchfink, collaborator and PhD student in Drost's research group who has been developing DIAMOND since 2013. "It did, but it also came with a downside: it couldn't pick up some of the more distant evolutionary relationships." That means that while the original DIAMOND may have been sensitive enough to detect a given human amino acid sequence in a chimpanzee, it may have been blind to the occurrence of a similar sequence in an evolutionarily more remote species.
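The tradeoff can be made concrete with a toy sketch. It assumes a simple seed-matching scheme in which two sequences are only compared in detail if they share at least one exact substring of length k; this is an illustration only, not a description of how DIAMOND or BLAST work internally, and the sequences and function names are invented. Longer seeds leave fewer candidates to examine, but they can overlook a relative whose differences are spread evenly along the sequence.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (seeds) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seed_hits(query, target, k):
    """Count positions in target whose k-mer also occurs somewhere in query."""
    query_seeds = kmers(query, k)
    return sum(target[i:i + k] in query_seeds
               for i in range(len(target) - k + 1))


# Made-up amino acid sequences, for illustration only.
QUERY = "MKTAYIAKQRQISFVKSHFSRQ"
CLOSE = "MKTAYIAKQREISFVKSHFSRQ"    # a single substitution
DISTANT = "MKTGYIARQRQLSFVRSHFTRQ"  # a substitution at every fourth position

for k in (3, 5):
    print(f"k={k}: close relative hits={seed_hits(QUERY, CLOSE, k)}, "
          f"distant relative hits={seed_hits(QUERY, DISTANT, k)}")
```

With the longer seed the distant relative produces no hits at all and would be skipped, while the shorter seed still finds it at the cost of checking more candidates.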
A powerful tool for future research
While the original DIAMOND search algorithm is useful for studying material extracted directly from environmental samples, other research goals require more sensitive tools. The team of researchers from Tübingen and Garching has now been able to modify and extend DIAMOND to make it as sensitive as BLAST while maintaining its superior speed: with the improved DIAMOND, researchers will be able to do comparative genomics research with the accuracy of BLAST at an 80- to 360-fold computational speedup. "In addition, DIAMOND enables researchers to perform alignments with BLAST-like sensitivity on a supercomputer, a high-performance computing cluster, or the cloud in a truly massively parallel fashion, making extremely large-scale sequence alignments possible in tractable time," adds Klaus Reuter, collaborator from the Max Planck Computing and Data Facility.
Some queries that would have taken other tools two months on a supercomputer can be accomplished in several hours with the new DIAMOND infrastructure. "Considering the exponential growth of the number of available genomes, the speed and accuracy of DIAMOND are exactly what modern genomics will need to learn from the entire collection of all genomes rather than having to focus only on a smaller number of particular species due to a lack of sensitive search capacity," Drost predicts. The team is thus convinced that the full advantages of DIAMOND will become apparent in the years to come.
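As a rough plausibility check, the figures quoted above can be combined in a few lines. This is only a back-of-the-envelope sketch: it assumes that "two months" means about 60 days and that the 80- to 360-fold range would apply to such a query, neither of which the release states, and the function name is made up for illustration.

```python
HOURS_PER_DAY = 24


def runtime_after_speedup(baseline_hours, speedup):
    """Return the runtime in hours after dividing the baseline by a speedup factor."""
    return baseline_hours / speedup


# Assumption: "two months" is taken as 60 days of compute time.
baseline_hours = 60 * HOURS_PER_DAY

for speedup in (80, 360):
    hours = runtime_after_speedup(baseline_hours, speedup)
    print(f"{speedup}-fold speedup: about {hours:.0f} hours")
```

Under these assumptions the two-month query shrinks to somewhere between 4 and 18 hours, which is in line with the "several hours" quoted.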
INFORMATION:
Original publication
Benjamin Buchfink, Klaus Reuter, Hajk-Georg Drost
Sensitive tree-of-life scale protein alignments using DIAMOND
Nature Methods, 2021