Medicine 2026-03-18

Ten million patient records, three cities, one question: what actually precedes Alzheimer's?

The M3AD platform links electronic health records from New York, Chicago, and Miami to track how chronic diseases interact over decades to shape dementia risk

What if Alzheimer's disease is not a single disease but the downstream result of dozens of conditions, behaviors, and circumstances interacting over decades?

That question drives a new large-scale data initiative described in a study published in Alzheimer's & Dementia by researchers at Columbia University Mailman School of Public Health and collaborators at the University of Miami and the University of Chicago. The project, called the M3AD Study and Real-World Data Metaplatform, links electronic health records from nearly 10 million patients across three major U.S. health systems to study dementia not as an isolated brain disease but as the product of a complex web of interacting health trajectories.

Why single-disease research misses the picture

Traditional dementia research has tended to study risk factors one at a time. Does diabetes increase Alzheimer's risk? Does hypertension? Does depression? Each question gets its own study, its own cohort, its own analysis. The answers come back as isolated risk ratios: diabetes increases risk by this much, hypertension by that much.

But that is not how aging works. Nearly 90% of adults over age 60 have multimorbidity - two or more chronic diseases occurring simultaneously. A 70-year-old with diabetes is also likely to have hypertension, obesity, depression, sleep disorders, or cardiovascular disease. These conditions do not exist in parallel silos. They interact. Diabetes worsens vascular health. Vascular disease changes brain perfusion. Depression alters behavior and medication adherence. Sleep disruption accelerates neurodegeneration.

Studying each risk factor in isolation misses these interactions. And it may be precisely the interactions - the specific sequences and combinations of conditions that accumulate over a lifetime - that determine who develops dementia and who does not.

Moise Desvarieux, M.D., Ph.D., associate professor of Epidemiology at Columbia and corresponding author of the study, frames the challenge directly. As people live longer, chronic diseases increasingly occur together, creating complex health trajectories that traditional disease-by-disease research cannot easily capture.

Nearly 10 million records across three cities

The M3AD platform draws data from three large academic health systems spanning different regions, populations, and care settings:

  • NewYork-Presbyterian Hospital Clinical Data Warehouse: 32 years of records from approximately 6 million patients, including 33,000 with Alzheimer's disease or related dementias (AD/ADRD)
  • University of Chicago Clinical Research Data Warehouse: records from more than 2 million patients, including 11,000 with AD/ADRD
  • University of Miami Health System: approximately 1.4 million patients, including 13,000 with AD/ADRD

Together, the three datasets encompass roughly 60,000 individuals with Alzheimer's disease or related dementias, drawn from a multiethnic population spanning White, Black, Hispanic, and Asian communities. This diversity matters. Alzheimer's risk, presentation, and progression vary across racial and ethnic groups, and research conducted in predominantly White populations may not generalize to the broader population.

The platform harmonizes data that was originally collected in different formats, using different coding systems, at different institutions. This is not a trivial technical achievement. Electronic health records are notoriously messy - diagnoses may be coded differently across systems, lab values may use different units, and clinical notes may describe the same condition in different terms. The M3AD infrastructure standardizes these records so they can be analyzed together while preserving patient privacy through a federated model that allows each institution to retain its data locally.
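To make the harmonization problem concrete, here is a minimal Python sketch of the two kinds of mismatch described above: the same lab value recorded in different units, and the same diagnosis recorded under different codes. It is an illustration only, not the M3AD pipeline. The function names, the choice of glucose as the lab value, the two ICD codes, and the shared label "diabetes" are all assumptions made for this example; the study does not say which coding systems or target vocabulary the platform uses.

```python
# Illustrative only: not the M3AD implementation.
# Conversion factor follows from glucose's molar mass (180.16 g/mol).
GLUCOSE_MG_DL_PER_MMOL_L = 18.016

# Hypothetical mapping from (coding system, code) to one shared concept label.
CODE_TO_CONCEPT = {
    ("ICD-9-CM", "250.00"): "diabetes",
    ("ICD-10-CM", "E11.9"): "diabetes",
}


def harmonize_glucose(value, unit):
    """Return a glucose reading in mmol/L, whichever unit the site recorded."""
    if unit == "mmol/L":
        return value
    if unit == "mg/dL":
        return value / GLUCOSE_MG_DL_PER_MMOL_L
    raise ValueError(f"unsupported glucose unit: {unit}")


def harmonize_diagnosis(system, code):
    """Map a site-specific diagnosis code to the shared label, or None if unmapped."""
    return CODE_TO_CONCEPT.get((system, code))


# Two sites, two conventions, one harmonized view.
site_a = harmonize_diagnosis("ICD-9-CM", "250.00")
site_b = harmonize_diagnosis("ICD-10-CM", "E11.9")
reading = harmonize_glucose(126.0, "mg/dL")
```

In practice a mapping table like this would cover thousands of codes and would need clinical review, since codes from different systems rarely line up one to one.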

Tracking interacting conditions over decades

The platform's analytical approach is fundamentally different from traditional epidemiological studies. Rather than testing whether individual risk factors correlate with dementia, it examines how multiple conditions interact and evolve over time within individual patients. This longitudinal, multi-disease perspective allows researchers to identify trajectories of multimorbidity - specific patterns of disease accumulation and interaction that precede Alzheimer's diagnosis.

For example, the platform could reveal whether a person who develops diabetes at 50, followed by depression at 55, followed by sleep apnea at 60, faces a different dementia risk profile than someone who acquires the same three conditions in a different order or at different ages. These trajectory-level insights are invisible to studies that examine each condition separately.
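The idea of treating the order of diagnoses as the unit of analysis can be sketched in a few lines of Python. The records and outcomes below are invented for illustration and are not M3AD data, and the function names are hypothetical. The sketch only groups patients by the sequence in which their conditions appeared and reports the share with a dementia outcome in each group; a real analysis would also need to handle timing, censoring, and confounding.

```python
from collections import defaultdict

# Invented toy records: (patient_id, condition, age_at_diagnosis).
EVENTS = [
    ("p1", "diabetes", 50), ("p1", "depression", 55), ("p1", "sleep_apnea", 60),
    ("p2", "sleep_apnea", 48), ("p2", "diabetes", 57), ("p2", "depression", 63),
    ("p3", "diabetes", 52), ("p3", "depression", 58), ("p3", "sleep_apnea", 61),
]
# Invented outcomes: whether each patient was later diagnosed with dementia.
DEMENTIA = {"p1": True, "p2": False, "p3": False}


def ordered_trajectory(events, patient_id):
    """Return one patient's conditions as a tuple sorted by age at diagnosis."""
    own = [(age, cond) for pid, cond, age in events if pid == patient_id]
    return tuple(cond for _, cond in sorted(own))


def outcome_share_by_trajectory(events, outcomes):
    """Group patients by ordered trajectory and return the outcome share per group."""
    groups = defaultdict(list)
    for pid, has_outcome in outcomes.items():
        groups[ordered_trajectory(events, pid)].append(has_outcome)
    return {traj: sum(flags) / len(flags) for traj, flags in groups.items()}


shares = outcome_share_by_trajectory(EVENTS, DEMENTIA)
for trajectory, share in shares.items():
    print(" -> ".join(trajectory), share)
```

Patients p1 and p3 share a trajectory even though their ages differ, while p2 has the same three conditions in a different order and lands in a separate group. A study that only counted conditions would treat all three patients as identical.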

The initiative also integrates neighborhood-level contextual data by linking clinical records with census-tract information. This allows researchers to examine how social and environmental conditions - income levels, air quality, food access, neighborhood safety - interact with medical conditions to shape Alzheimer's risk over time. Co-author Allison Aiello, professor of Epidemiology at Columbia, emphasized that embedding clinical data within neighborhood context allows the team to examine how social and environmental conditions shape disease risk and progression.

Machine learning and the eRADAR algorithm

The platform incorporates advanced analytical tools, including machine-learning models designed to identify patterns too complex for traditional statistical methods. One integrated tool is eRADAR (Electronic Health Record Risk of Alzheimer's and Dementia Assessment Rule), an algorithm that uses routine EHR data - diagnoses, medications, lab results, vital signs - to flag individuals who may have undiagnosed dementia and should receive further clinical evaluation.

Beyond prediction, the researchers envision using the platform to test prevention hypotheses in real-world populations. If smoking cessation at age 50 is associated with lower dementia risk at age 80, can that signal be detected in the EHR data? If aggressive blood-pressure control during middle age changes cognitive trajectories two decades later, will the records show it? The scale and time depth of the dataset make these questions, at least in principle, answerable.

Co-author Habibul Ahsan of the University of Chicago noted that the models can be expanded to integrate additional data sources, including imaging, genetic information, and novel biomarkers as they become available.

What this platform cannot do

The M3AD platform is observational. It can identify associations and generate hypotheses, but it cannot prove causation. Finding that a specific multimorbidity trajectory is associated with higher dementia risk does not prove that those conditions caused the dementia - they may share underlying causes, or the association may reflect confounding factors not captured in the data.

Electronic health records, for all their scale, have well-documented limitations. They capture only what is recorded during clinical encounters. Patients who rarely see doctors are underrepresented. Conditions that are never formally diagnosed do not appear in the data. Social determinants of health are captured only partially and indirectly. The records reflect care-seeking behavior as much as actual health status.

The federated privacy model, while essential for protecting patient data, introduces its own constraints. Researchers cannot pool raw data across institutions, which limits certain types of analysis and adds computational complexity.
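A rough Python sketch shows what that constraint looks like in the simplest case. Each institution reduces its own row-level records to a pair of counts, and only those counts leave the site. Everything here is an assumption made for illustration: the class and function names are hypothetical, the records are invented, and the study does not describe its federated protocol at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts that a site is willing to share; no row-level data."""
    site: str
    n_patients: int
    n_dementia: int


def local_summary(site, records):
    """Runs inside one institution: reduce patient rows to two counts."""
    cases = sum(1 for record in records if record["dementia"])
    return SiteSummary(site, len(records), cases)


def pooled_prevalence(summaries):
    """Runs at the coordinator: combine the counts, never the records."""
    total = sum(s.n_patients for s in summaries)
    cases = sum(s.n_dementia for s in summaries)
    return cases / total


# Invented toy records that stay local to each hypothetical site.
site_a_records = [{"dementia": True}, {"dementia": False},
                  {"dementia": False}, {"dementia": False}]
site_b_records = [{"dementia": True}, {"dementia": True}, {"dementia": False},
                  {"dementia": False}, {"dementia": False}, {"dementia": False}]

summaries = [
    local_summary("site_a", site_a_records),
    local_summary("site_b", site_b_records),
]
prevalence = pooled_prevalence(summaries)
```

Simple counts combine cleanly this way. Analyses that need to compare individual patients across sites, or to fit a single model on all rows at once, do not reduce to a sum of local summaries, which is where the added complexity comes from.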

More than 7.2 million older Americans currently live with Alzheimer's, including roughly 35% of those aged 85 and older. The disease's economic burden exceeds $300 billion annually, and current treatments can only modestly slow progression in some patients. A platform that helps identify which combinations of modifiable risk factors most strongly predict dementia - and in which populations - could inform prevention strategies targeted at the decades before symptoms appear.

Whether that potential translates into practical clinical tools depends on validation, replication, and the willingness of health systems to integrate algorithmic risk assessment into routine care. Those are significant hurdles. But assembling the data infrastructure is the necessary first step, and with nearly 10 million longitudinal patient records now linked and harmonized, that step has been taken.

Source: Desvarieux et al., Columbia University Mailman School of Public Health, with the University of Miami and the University of Chicago. Published in Alzheimer's & Dementia. Funded by the National Institute on Aging (R56AG082167) and Mailman Centennial Grand Challenges in Public Health.