Medicine, 2026-03-09

Long-read genome sequencing finds autism-linked variants that short reads missed

Analyzing 267 genomes from families with autism, UC San Diego researchers show that reading longer stretches of DNA increases discovery of gene-disrupting structural variants by 33% and tandem repeats by 38%

Cell Genomics, March 2026

A substantial portion of autism's genetic basis remains unexplained. Researchers call it the "missing heritability" problem: twin studies and family analyses indicate that autism spectrum disorder (ASD) is highly heritable, but the genetic variants identified so far account for only a fraction of that heritability. The gap has persisted for years, and one reason may be that the standard tools for reading genomes have been missing entire categories of variation.

A new study from the University of California San Diego, published in Cell Genomics, suggests that a different sequencing approach can find what traditional methods overlook. Using long-read whole genome sequencing (LR-WGS) on 267 genomes from families with autism, the researchers discovered genetic variants that had been invisible to conventional short-read technology.

Why read length matters

Standard genome sequencing works by breaking DNA into short fragments, reading each fragment, and computationally assembling the results. This approach, called short-read sequencing, reads pieces roughly 150 to 300 base pairs long. It works well for identifying single-letter changes in the genetic code (point mutations) but struggles with larger structural changes: deletions, duplications, inversions, and complex rearrangements that span thousands of base pairs.

Long-read sequencing reads much larger stretches of DNA at once, sometimes tens of thousands of base pairs in a single read. This makes it far easier to detect structural variants and tandem repeats (sections of DNA where a short sequence is repeated multiple times in a row), because a single read can cover the whole altered region along with the unchanged sequence on either side. Both types of variation can disrupt gene function, and both are poorly captured by short reads.

The UC San Diego team found that LR-WGS increased the discovery of gene-disrupting structural variants by 33% and tandem repeats by 38% compared with short-read sequencing of the same genomes. Some of the newly discovered mutations were complex rearrangements that affected multiple genes simultaneously, a type of variation that short reads often cannot resolve at all.
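One way to read those percentages is to turn a relative gain into the share of the final set that short reads missed. The sketch below is a back-of-the-envelope calculation, not a result from the paper: it assumes that every variant found by short reads was also found by long reads, and the function name is made up for this example.

```python
def share_missed_by_short_reads(relative_gain: float) -> float:
    """Fraction of the long-read variant set that short reads did not find.

    Assumes long reads recover everything short reads found, plus
    `relative_gain` times as many additional variants (0.33 means +33%).
    """
    if relative_gain < 0:
        raise ValueError("relative_gain must be non-negative")
    return relative_gain / (1.0 + relative_gain)


# Gains reported in the article: 33% (structural variants), 38% (tandem repeats).
sv_share = share_missed_by_short_reads(0.33)
tr_share = share_missed_by_short_reads(0.38)
print(f"structural variants missed by short reads: {sv_share:.1%}")
print(f"tandem repeats missed by short reads: {tr_share:.1%}")
```

Under that assumption, roughly a quarter of the variants in each category would have been absent from a short-read-only analysis.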

Methylation data as a bonus

One advantage of long-read technology that the study exploited is that it detects DNA methylation, the small chemical modifications that regulate gene activity, at the same time as it reads the sequence. This dual readout allowed the researchers to determine not just which genes were mutated but how those mutations affected gene function.

In some cases, a structural variant might disrupt a gene's coding sequence. In others, it might alter regulatory regions that control when and where the gene is expressed. By pairing sequence data with methylation data, the team could distinguish between these scenarios, providing a more complete picture of how each variant might contribute to autism.

Scale and its limitations

At 267 genomes, this is the largest study of its kind to date for autism. But senior author Jonathan Sebat, professor of psychiatry and cellular and molecular medicine at UC San Diego, acknowledged that even larger studies will be needed to estimate precisely how much of the missing heritability can be explained by variants that long reads detect.

Sebat hypothesizes that LR-WGS could double the amount of heritability explained by certain types of variants, such as tandem repeats and structural variants. But that estimate remains speculative until it is replicated in substantially larger cohorts. The cost of long-read sequencing, while declining, remains higher than that of short-read approaches, which limits the pace at which these larger studies can be conducted.

The study also focuses on rare variants, mutations found in individual families rather than shared widely across the population. Rare variants have been a productive area of autism genetics, but they explain only a portion of overall heritability. Common variants, best studied through genome-wide association studies, represent another major piece of the puzzle that long-read sequencing does not directly address.

Clinical potential, with patience

The practical implications point in two directions. For diagnostics, long-read sequencing could identify genetic causes of autism in individuals whose standard genetic testing came back negative. For therapeutics, understanding the specific genetic mechanisms disrupted in each patient could eventually enable targeted treatments.

Both applications remain future possibilities rather than current realities. Translating genetic discoveries into diagnostic tools requires validation across diverse populations, and targeted therapies for specific genetic subtypes of autism are still in early development. But the study establishes that a meaningful amount of genetic information relevant to autism has been hiding in plain sight, missed not because it was not there but because the tools were not designed to see it.

Source: Sebat J et al. Cell Genomics, March 2026. Institution: University of California San Diego School of Medicine. Funded by NIMH (MH113715, MH133899), NIDA (U01DA051234), NHGRI (1R01HG010149).