Doubling down on known protein families

Using a novel computational approach, researchers confirm microbial diversity is wilder than ever.

2023-10-11
(Press-News.org) Imagine researchers exploring a dark room with a flashlight, only able to clearly identify what falls within that single beam. When it comes to microbial communities, scientists have historically been unable to see beyond the beam. Worse, they didn’t even know how big the room was.

A new study published online October 11, 2023, in Nature highlights the vast functional diversity of microbes, using a novel approach that examines protein function to better understand microbial communities. The work was led by a team of scientists at the U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a DOE Office of Science User Facility located at Lawrence Berkeley National Laboratory (Berkeley Lab), and collaborators across multiple other research centers around the world.

"We've more than doubled the number of protein families known up until now, and identified many novel structure predictions," said lead author on the paper Georgios Pavlopoulos, now a research director at the Biomedical Sciences Research Center Alexander Fleming. "This was a massive analysis of 1.3 billion proteins with massively parallel computations."

Guided by JGI scientists, the team embarked on a mission to unveil the mysteries concealed within the “dark” functional realm. Their focus sharpened on deciphering the intricate world of protein functional diversity: the novel protein families and novel functions in as-yet unveiled microbes. Harnessing the collective power of more than 26,000 microbiome datasets, all accessible through the publicly available Integrated Microbial Genomes & Microbiomes (IMG/M) database, they successfully crafted the Novel Metagenome Protein Families (NMPF) Catalog.

“We can now analyze new datasets by comparing against these protein families, or further analyze the protein families in order to predict new functions,” said Nikos Kyrpides, senior author of the study and head of the JGI’s Microbiome Data Science group.

Shining a Light on Functional “Dark Matter”

Microbial communities living everywhere from soils and stomachs to the deep sea are capable of doing a lot of unique things when it comes to energy cycles — turning biomass into things like ethanol or hydrogen, or solar energy into hydrogen.

Microbial communities are also incredibly difficult to study. Many of the microbes within them cannot be cultivated in lab settings. Since each microbial community has its own unique makeup of microbe players and the functions they perform, artificially replicating a whole community is impossible.

Metagenomic sequencing allows researchers to study the entire genetic makeup of these communities by sequencing all the genomes in a sample at once, although it cannot by itself distinguish which gene belongs to which microbial species within the community. Interpreting the results therefore hinges on comparison against existing reference genome sequences.

Some of the genes found this way are what the scientists call “known knowns”: they are similar to genes with known function. Others are called “known unknowns”: they are similar to previously known genes from isolate organisms, but their function is still uncertain.

However, if a gene in the community doesn’t match any of the previously known genes from isolates, there isn’t much scientists can tell about its function or its origin. As a result, these genes were typically discarded from any analysis as useless information. These represent the “unknown unknowns” because they aren’t similar to anything we’ve already defined.

“A huge percentage — around 30–50% of the protein families that we knew so far — still does not have any known function, but we knew the families,” Kyrpides said. Yet, “almost 20 years of metagenomic data and metagenomic analysis, and still there has been no real analysis of protein families from metagenomes per se.”

Recently, other research teams have leveraged the power of artificial intelligence to decode the language of protein sequences and obtain hints of their possible functions. Yet these efforts were limited to the realm of already-known protein sequences.

“In this endeavor, we have not only ventured into the uncharted territory of understanding the vast landscape of functional diversity, but we have also pushed the boundaries by applying AI methodologies to unravel their roles,” Pavlopoulos said. “Consequently, we have amassed an extensive repository of groundbreaking insights, significantly expanding the horizons of potential functions across various categories of proteins, including those with pivotal applications in biotechnology, such as DNA editing enzymes.”

Leveraging Protein Families in a New Way

The discovery of new protein families had started to plateau in recent years, perhaps suggesting that scientists had “captured” much of the diversity out there, even if they hadn’t yet defined exactly what it did. But what kind of diversity might those “unknown unknowns” hold?

The team started with 8 billion metagenome genes from IMG (the study also references data from the JGI’s Genomes from Earth’s Microbiome, or GEM catalog). Then they removed any genes with even a remote similarity to previously known genes, leaving them with around 1.2 billion novel genes.

They took what they were left with and clustered them into families. From there they focused on families with at least 100 members.

“If you have 100 sequences, the quality of the cluster is significantly higher because it is very hard to have 100 sequences from different locations or habitats that align very well, randomly,” Kyrpides explained. “Replicating that 100 times would have been almost impossible.”
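
The filtering and clustering steps described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the study's pipeline: the function name, the gene identifiers, the precomputed list of similar pairs, and the use of simple connected components as the clustering rule are all hypothetical stand-ins chosen for illustration. Only the overall shape (remove genes resembling known ones, group the rest into families, keep families above a size threshold) comes from the description.

```python
from collections import defaultdict


def cluster_novel_genes(gene_ids, known_like, similar_pairs, min_members=100):
    """Return families (sets of gene ids) with at least min_members members.

    known_like: ids with any similarity to previously known genes (assumed given).
    similar_pairs: (id, id) pairs judged similar by some comparison (assumed given).
    """
    # Step 1: drop every gene that resembles a previously known gene.
    novel = [g for g in gene_ids if g not in known_like]
    parent = {g: g for g in novel}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Step 2: merge novel genes that are linked by a similarity pair.
    for a, b in similar_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    families = defaultdict(set)
    for g in novel:
        families[find(g)].add(g)

    # Step 3: keep only families that reach the size threshold.
    return [m for m in families.values() if len(m) >= min_members]


# Toy data: ten genes, two of which resemble known genes.
genes = [f"g{i}" for i in range(10)]
known = {"g0", "g1"}
pairs = [("g2", "g3"), ("g3", "g4"), ("g5", "g6"), ("g0", "g2")]

# A threshold of 3 stands in for the 100-member cutoff at toy scale.
families = cluster_novel_genes(genes, known, pairs, min_members=3)
print(families)
```

In the toy run only the family containing g2, g3, and g4 survives, because the pair involving g0 is ignored once g0 has been filtered out, and the two-member family of g5 and g6 falls below the threshold.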

When the team was finished with this phase, they found that the protein family diversity within this metagenomic space (the “unknown unknowns”) was vastly greater than that of the reference genomes — by at least double.

“As we keep on adding more samples, we're getting more protein families,” Kyrpides said. “In a few years, as we keep on sequencing more metagenomes, some of the clusters that have currently 50 members or more will grow to 100 members or more as well. So, we're saying diversity has doubled, but in reality it could be three or four or five or tenfold more out there.”

Digging Further into an Array of Diversity

While the team didn’t drill down into function, they were able to further characterize these families. They divided the protein families up by environment and found only 7% of protein families were shared across all eight environmental categories. Instead, families preferred a specific environment, whether soil, animal hosts, marine ecosystems, or another habitat.

“So, they must be doing something interesting or important for that habitat,” Pavlopoulos explained. “That is definitely material that the scientific community now can use further. Let’s say somebody is working on soil environments or the human body — they may take some of those families and try to functionally characterize them because they are very specific to that habitat.”
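
As a rough illustration of the environment comparison described above, the sketch below computes the fraction of families seen in every environmental category and lists families found in only one. The function name, the family labels, and the three example categories are hypothetical; the release mentions eight categories but does not list them all, and the toy numbers here are not data from the study.

```python
def environment_sharing(family_envs, all_envs):
    """Return (fraction of families present in every environment,
    mapping of single-environment families to their environment)."""
    total = len(family_envs)
    if total == 0:
        return 0.0, {}
    required = set(all_envs)
    # Families observed in every category.
    shared = [f for f, envs in family_envs.items() if required <= set(envs)]
    # Families observed in exactly one category.
    specific = {
        f: next(iter(envs))
        for f, envs in family_envs.items()
        if len(set(envs)) == 1
    }
    return len(shared) / total, specific


# Toy input: which environments each family was observed in.
categories = ["soil", "marine", "host"]
observed = {
    "F1": {"soil"},
    "F2": {"soil", "marine", "host"},
    "F3": {"marine"},
    "F4": {"soil", "host"},
}

fraction, specific = environment_sharing(observed, categories)
print(fraction, specific)
```

With this toy input one family in four appears everywhere, and F1 and F3 are each tied to a single habitat, the kind of family a soil or marine researcher might pick for functional follow-up.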

Taxonomic analysis found that the majority of these protein families belonged to bacteria and viruses, though 6 million of the sequences evaded classification. Researchers also tried to home in on the function of the genes via 3D modeling, comparing structures of the unknown to those of the known, since a similar structure suggests a high likelihood of similar function. The team also identified protein families with completely novel structures.

The computational power to perform this level of analysis hinged on access to the National Energy Research Scientific Computing Center, another user facility at Berkeley Lab.

"It's also a credit to Aydin Buluç's team with Berkeley Lab's Applied Mathematics and Computational Research Division," Pavlopoulos said. "They developed parallel algorithms to perform 'all-vs-all' comparisons and graph clustering able to run in such highly parallel infrastructures."

This is the first time protein structures have been used to help characterize the vast array of microbial dark matter. The study took roughly two years to complete, with only about 20,000 metagenomes sequenced at the time. Now, that number is closer to 60,000.

“There is still 70–80% of known microbial diversity out there that is not yet captured genomically,” Kyrpides said. “So, that diversity is definitely holding a lot of new secrets in terms of functional diversity as well.”

Researchers from Harvard University, Indiana University, University of Crete (Greece), Georgia Institute of Technology, Michigan State University, Lawrence Livermore National Laboratory, University of Washington, Centre for Research & Technology Hellas (Greece), Aristotle University of Thessaloniki (Greece), and the University of California, Berkeley were also involved in the work. Other authors on the paper are Fotis Baltoumas, Sirui Liu, Oguz Selvitopi, Antonio Camargo, Stephen Nayfach, Ariful Azad, Simon Roux, Lee Call, Natalia N. Ivanova, I Min Chen, David Paez-Espino, Evangelos Karatzas, Novel Metagenome Protein Families Consortium, Ioannis Iliopoulos, Konstantinos Konstantinidis, James M. Tiedje, Jennifer Pett-Ridge, David Baker, Axel Visel, Christos A. Ouzounis, and Sergey Ovchinnikov.

 

Publication: Pavlopoulos G et al. Unraveling the functional dark matter through global metagenomics. Nature. 2023 October 11. doi: 10.1038/s41586-023-06583-7

 

***

The U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory, is committed to advancing genomics in support of DOE missions related to clean energy generation and environmental characterization and cleanup. The JGI provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges. Follow @jgi on Twitter.

###

Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 16 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy's Office of Science.

DOE's Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.

END

