(Press-News.org) Engineers at the University of California have developed a new data structure and compression technique that enables the field of pangenomics to handle unprecedented scales of genetic information. The team, led by UC San Diego electrical and computer engineering professor Yatish Turakhia, described their compressive pangenomics approach in Nature Genetics on Jan. 12, 2026.
Pangenomics, a subfield of bioinformatics, is the study of many genomes from a single species. This provides a more complete picture of the natural variation and mutations within a species than a single reference genome can. It has many practical applications, such as studying how genomic mutations lead to increased transmissibility or drug resistance in pathogens.
Although advances in genome sequencing technologies have reduced the cost and increased the speed of sequencing, the data structures and analysis tools needed to study and graphically represent the relationships between millions of sequenced genomes remain a challenge. While graph-based data formats for pangenomes have become popular and widely adopted, they only represent the genetic variation in a collection of genomes, not their shared evolutionary and mutational histories. They also have large storage requirements that do not scale well.
“The data structures used for pangenomics research are critical because they determine not only how efficiently genetic data is represented, but also what the data can represent,” said Sumit Walia, an electrical engineering PhD candidate at the Jacobs School of Engineering and co-first author of the study.
The research team, which includes engineers from the Genomics Institute at UC Santa Cruz, pioneered a new data structure and file format, called Pangenome Mutation-Annotated Network (PanMAN). PanMAN not only provides unmatched compression for pangenomes but also significantly advances the representative power by encoding additional biologically relevant information, including phylogenies, mutations, and whole-genome alignments. Their compressive pangenomics approach can perform analysis on compressed pangenomic data, allowing researchers to handle vastly larger scales of genetic data than currently possible.
“Our compressive technique with PanMANs allows doing more with less, greatly improving the scale and scope of current pangenomic analysis,” said Turakhia, the study’s corresponding author.
PanMANs are composed of mutation-annotated trees, called PanMATs, which store a single ancestral genome sequence at the root and annotate mutations, such as substitutions, insertions, and deletions, on the branches. Multiple PanMATs are connected by edges into a network to form a PanMAN. These edges store complex mutations, such as recombination and horizontal gene transfer, which produce sequences with multiple parent sequences and so violate the vertical inheritance assumption of a single tree. This representation is compact because it exploits the shared ancestry among genomes, representing each mutation only once, on the branch where it arose, instead of duplicating it across individual sequences.
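The idea of a mutation-annotated tree can be illustrated with a small Python sketch. It is not the PanMAN file format or the team's software: the `Node` class, the tuple encoding of mutations, and the `reconstruct` helper are hypothetical names chosen for illustration, and the sketch covers only a single tree, not the network edges that connect PanMATs. It stores one root sequence, annotates each branch with its mutations, and rebuilds any genome by replaying the mutations along the path from the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical mutation encoding (an assumption for this sketch):
#   ("S", pos, base)   substitute the base at pos
#   ("I", pos, bases)  insert bases before pos
#   ("D", pos, length) delete length bases starting at pos
# Positions are 0-based and refer to the parent's sequence.
Mutation = Tuple[str, int, object]


@dataclass
class Node:
    name: str
    mutations: List[Mutation] = field(default_factory=list)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str, mutations: List[Mutation]) -> "Node":
        child = Node(name, mutations, parent=self)
        self.children.append(child)
        return child


def apply_mutations(seq: str, mutations: List[Mutation]) -> str:
    """Apply one branch's mutations to the parent's sequence."""
    out = list(seq)
    # Work from the highest position down so earlier coordinates stay valid.
    for kind, pos, value in sorted(mutations, key=lambda m: m[1], reverse=True):
        if kind == "S":
            out[pos] = value
        elif kind == "I":
            out[pos:pos] = list(value)
        elif kind == "D":
            del out[pos:pos + value]
        else:
            raise ValueError(f"unknown mutation kind: {kind}")
    return "".join(out)


def reconstruct(node: Node, root_sequence: str) -> str:
    """Rebuild a node's genome by replaying mutations from the root down."""
    path = []
    while node.parent is not None:
        path.append(node)
        node = node.parent
    seq = root_sequence
    for branch in reversed(path):
        seq = apply_mutations(seq, branch.mutations)
    return seq


# Toy example: only the root sequence is stored in full.
root_seq = "ACGTACGT"
root = Node("root")
a = root.add_child("A", [("S", 2, "T")])   # substitution stored once, on A's branch
b = a.add_child("B", [("D", 4, 2)])        # B inherits A's substitution
c = a.add_child("C", [("I", 0, "GG")])     # so does C

print(reconstruct(a, root_seq))  # ACTTACGT
print(reconstruct(b, root_seq))  # ACTTGT
print(reconstruct(c, root_seq))  # GGACTTACGT
```

In this toy tree the substitution on branch A is recorded once, yet it appears in the reconstructed sequences of both descendants B and C, which is the source of the compactness described above.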
In addition, PanMAN was crafted to represent a rich set of biologically meaningful information that current pangenome formats lack. Some information in PanMAN is explicitly stored, such as mutations, phylogeny, annotations, and root sequence, whereas other information can be derived, such as ancestral sequences, multiple whole-genome alignment, and genetic variation.
So far, the researchers have used PanMAN to study microbial genomes. They found that PanMAN is the most compact of the variation-preserving pangenomic formats, providing up to hundreds or even thousands of times more compression. For example, the team built the largest pangenome for SARS-CoV-2, using more than 8 million separate genomes of the virus. Stored as a PanMAN, this vast amount of genetic data required only 366 MB of file storage space, roughly 3,000 times less than the corresponding whole-genome alignment that the PanMAN encodes. Constructing an alignment for SARS-CoV-2 genomes at this scale was itself a formidable challenge, which was addressed by another computational tool developed at Turakhia’s lab, called TWILIGHT.
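As a rough back-of-the-envelope check on the figures above, the following Python lines estimate the size of the corresponding whole-genome alignment. The result is an approximation only: it assumes decimal units (1 TB = 1,000,000 MB) and takes the "roughly 3,000 times" ratio at face value. The release does not report the alignment size directly.

```python
# Figures quoted in the release
panman_mb = 366          # PanMAN file size in MB
ratio = 3000             # "roughly 3,000 times" less storage (approximate)

# Implied alignment size, assuming decimal units (an assumption of this sketch)
alignment_mb = panman_mb * ratio
alignment_tb = alignment_mb / 1_000_000

print(f"Implied alignment size: about {alignment_tb:.1f} TB")
```

Under these assumptions the alignment would occupy on the order of one terabyte.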
Now, the researchers are expanding their use of TWILIGHT and PanMANs from microbes to human genomes. Turakhia and Melissa Gymrek, a professor of computer science and engineering at UC San Diego, received a Jacobs School Early Career Faculty Development Award to advance this effort.
“Extending compressive pangenomics to human genomes can fundamentally transform how we store, analyze, and share large-scale human genetic data,” said Turakhia. “Besides enabling studies of human genetic diversity, disease, and evolution at unprecedented scale and speed, it can depict detailed evolutionary and mutational histories which shape diverse human populations, something that current representations do not capture.”
Full study: Compressive pangenomics using mutation-annotated networks
Compressed data technique enables pangenomics at scale
2026-01-12