How Data Cures Rare Disease

The mechanism by which longitudinal patient-controlled data accelerates rare disease cures, written so a parent reading at 2 a.m. can follow it.

Most rare diseases are caused by a single error in a single gene. We can read those errors. We are learning to fix them. Gene therapies and a CRISPR gene-editing therapy are approved today, individualized antisense drugs have been made for single patients, and more of each are in clinical trials. The technology to cure rare disease exists.

The missing piece is data.

Every Rare Disease Is a Typo

Your body runs on instructions written in DNA. Those instructions tell your cells how to build proteins, the molecular machines that digest food, build bones, send nerve signals, and fight infections.

A rare disease happens when one of those instructions contains an error. One letter wrong. One sentence missing. One paragraph scrambled. The result: a protein is built wrong, or is missing entirely, and the body loses a function it needs.

Phenylketonuria (PKU): the error means the body cannot break down phenylalanine, an amino acid. It accumulates and damages the brain.

Sickle cell disease: the error reshapes red blood cells so they clog blood vessels and cause pain crises.

Ehlers-Danlos syndrome: the error (in most subtypes) produces defective collagen, the structural protein that holds the body together. Joints dislocate. Skin tears. Organs shift.

There are more than 7,000 known rare diseases. Most trace to a single genetic error. Sequencing the first human genome took 13 years and roughly $3 billion, finishing in 2003. Whole genome sequencing now costs roughly $500 and takes two days.

We can find the errors. We are building the tools to fix them.

The Tools That Fix Genetic Errors

Gene therapy delivers a working copy of the broken gene into the person's cells. The cells produce the correct protein, and the disease can slow or stop progressing. Zolgensma does this for spinal muscular atrophy: a single intravenous infusion delivers a working copy of the SMN1 gene.

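Gene editing (CRISPR) changes the person's own DNA. Casgevy, the first approved CRISPR therapy, does this for sickle cell disease. It edits a person's blood stem cells so they produce fetal hemoglobin again, which compensates for the faulty adult hemoglobin. Other editing approaches now in development aim to correct the error itself at its source.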

Antisense oligonucleotides (ASOs) are short synthetic strands of genetic material that attach to the RNA copy of a broken gene and change how it is read. They can make the cell skip over an error, correct faulty splicing, or adjust protein production. ASOs can be designed for a single person's specific mutation. Milasen, developed at Boston Children's Hospital for Mila Makovec, was designed, manufactured, and given to a single person with a single mutation in a single gene.

mRNA therapies deliver a temporary working copy of the genetic instruction directly to the cell. The cell reads it and builds the correct protein for a limited time. The same platform powered the COVID-19 vaccines. It is now being redirected toward rare metabolic diseases.

These tools work. Several are approved. Others are in trials for dozens of conditions. The technology exists.

The barrier to curing more rare diseases is data.

Data Is the Barrier

Developing a therapy for any disease requires three things: a map of how the disease progresses without treatment (natural history), evidence that the therapy works better than doing nothing (efficacy), and evidence that it is safe over time (safety).

For common diseases, this data exists in abundance. Millions of people, thousands of studies, decades of records.

For rare diseases, the data is sparse or absent. A disease that affects 100 people worldwide has no large natural history study. Those 100 people are scattered across dozens of countries, seeing different doctors, using incompatible medical record systems. Their clinical data is locked in individual hospital files. If a pharmaceutical company that was collecting data exits the market, the data may disappear entirely. No one has systematically asked these people what they have tried, what helped, and what did not.

The tools to fix the genetic errors exist. The data to design, test, and prove the fixes does not.

What Becomes Possible When the Data Exists

Mapping disease for the first time

When 100 people scattered across the world each contribute structured data about how their disease progresses, the natural history becomes visible. At what age do symptoms appear? How fast do they worsen? Which organ systems are affected, and in what order? This map is the foundation for every therapy. You cannot design a treatment without knowing what you are treating.
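As a rough illustration of how scattered records become a map, the sketch below summarizes a few contributed records into a median age of onset and a median yearly change in a function score. The record layout, the field names, and every number are invented for this example. A real natural history study would use validated outcome measures and far more careful statistics.

```python
from statistics import median

# Hypothetical contributed records, one per person: the age at which symptoms
# first appeared, plus a function score measured at two ages (age, score).
# All values are invented for illustration.
records = [
    {"onset_age": 2.0, "visits": [(3.0, 90.0), (7.0, 70.0)]},
    {"onset_age": 3.5, "visits": [(4.0, 88.0), (8.0, 72.0)]},
    {"onset_age": 1.5, "visits": [(2.0, 95.0), (6.0, 71.0)]},
]


def yearly_change(visits):
    """Change in score per year between the first and last visit."""
    (age_first, score_first), (age_last, score_last) = visits[0], visits[-1]
    return (score_last - score_first) / (age_last - age_first)


def natural_history_summary(records):
    """Summarize onset age and rate of change across all contributors."""
    return {
        "people": len(records),
        "median_onset_age": median(r["onset_age"] for r in records),
        "median_yearly_change": median(yearly_change(r["visits"]) for r in records),
    }


summary = natural_history_summary(records)
print(summary)
```

With only three people the summary means little. The point is that the same few lines work unchanged when the list holds 100 records from 30 countries, as long as everyone contributes in the same structure.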

Running trials without placebo groups

The largest ethical problem in rare disease trials: if 30 people have a fatal disease and a promising treatment exists, asking 15 of them to take a sugar pill is unconscionable. The alternative is to use natural history data as the comparison group. If untreated individuals decline at a known rate, and treated individuals decline more slowly or stabilize, the data itself is the control arm. Everyone gets treated. The data does the comparing. FDA guidance on externally controlled trials allows for this approach when the natural history is well documented and the two groups are comparable.
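The comparison described above can be sketched in a few lines. This is a simplified illustration, not a trial analysis: the numbers are invented, the function name is hypothetical, and a real externally controlled trial would also have to match the two groups on age, severity, and other factors before comparing them.

```python
from statistics import mean

# Invented numbers for illustration only.
# Yearly change in a function score for untreated people (natural history data)...
untreated_yearly_change = [-5.0, -4.0, -6.0, -5.5, -4.5]
# ...and for people who received a hypothetical therapy.
treated_yearly_change = [-1.0, 0.0, -2.0, -0.5]


def compare_to_natural_history(treated, untreated):
    """Mean yearly change in treated people minus mean yearly change in untreated people.

    A positive result means the treated group declined more slowly.
    """
    return mean(treated) - mean(untreated)


difference = compare_to_natural_history(treated_yearly_change, untreated_yearly_change)
print(f"Treated group declined {difference:.3f} points per year more slowly")
```

No one in this sketch received a placebo. The untreated numbers come from records contributed before any therapy existed.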

Making each cure accelerate the next

When an ASO is developed for one person's specific mutation in Gene X, the development team learns how to design the molecule, manufacture it, test it for safety, and deliver it. When a second person has a different mutation in the same gene, the team starts from everything the first treatment taught them. The first person's outcome data becomes the safety and efficacy baseline for the second. Each person's data reduces the development burden for the next.

The FDA's Plausible Mechanism Framework formalizes this. Master protocols allow multiple individualized therapies targeting different mutations in the same gene to be evaluated through a single regulatory application. The first person treated in a gene family carries the full regulatory burden. The fifth carries a fraction of it.

Finding diseases hiding inside wrong diagnoses

Many people with rare diseases carry incorrect diagnoses. They have been told they have fibromyalgia, chronic fatigue syndrome, anxiety, or irritable bowel syndrome. Their symptoms are real. The explanation is wrong.

When enough people contribute detailed symptom data in structured formats, computational analysis can identify subgroups whose patterns match a connective tissue disorder, a metabolic condition, or something entirely new. The data finds the people the clinical system missed.
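One very simple form of that analysis is to score how closely each person's reported symptoms overlap with a known pattern and flag the close matches for clinical review. The sketch below uses Jaccard similarity for the scoring. The symptom profile, the threshold, and the contributor data are all made up for illustration, and nothing here is a diagnostic criterion.

```python
# Hypothetical symptom profile for illustration; not a diagnostic criterion.
REFERENCE_PROFILE = {
    "joint hypermobility",
    "frequent dislocations",
    "fragile skin",
    "chronic pain",
}

# Invented contributors and their self-reported symptoms.
contributors = {
    "person_a": {"chronic pain", "fatigue", "joint hypermobility", "frequent dislocations"},
    "person_b": {"fatigue", "anxiety"},
    "person_c": {"fragile skin", "joint hypermobility", "frequent dislocations", "chronic pain"},
}


def overlap(symptoms, profile):
    """Jaccard similarity: shared symptoms divided by all symptoms in either set."""
    return len(symptoms & profile) / len(symptoms | profile)


def flag_for_review(contributors, profile, threshold=0.5):
    """Return the names of contributors whose overlap meets the threshold."""
    return sorted(
        name for name, symptoms in contributors.items()
        if overlap(symptoms, profile) >= threshold
    )


flagged = flag_for_review(contributors, REFERENCE_PROFILE)
print(flagged)
```

A flag is only a prompt for a clinician to take a second look. Real analyses would use many more signals than a symptom list, but they depend on the same thing: symptoms recorded in a consistent, structured way.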

Turning product experience into clinical evidence

When someone with a rare disease tries a new supplement, switches medical food brands, starts a physical therapy program, or changes medications, the outcome is recorded nowhere useful. It might appear in a Facebook group post, help a few people, and disappear.

If that same information is contributed in a structured format with clinical context (diagnosis, subtype, severity, concurrent treatments, outcome measures), it accumulates. When 300 people with EDS have tried a specific compression garment and reported their experience with standardized outcome measures, that is comparative effectiveness data. When 500 people with PKU have compared two formulas and reported their blood phenylalanine levels on each, that is a pragmatic trial. The community's collective structured experience becomes the evidence base that does not exist today.
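To make "structured format" concrete, here is one possible shape for a single report, followed by a comparison of two formulas among people who tried both. The field names, the product labels "A" and "B", and all values are assumptions made for this sketch. A real data trust would define its own schema and collect the clinical context described above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FormulaReport:
    """One contributor's report. Field names are illustrative assumptions."""
    contributor_id: str
    diagnosis: str
    formula: str              # "A" or "B", hypothetical product labels
    blood_phe_umol_l: float   # blood phenylalanine, micromoles per litre


# Invented values for illustration only.
reports = [
    FormulaReport("p1", "PKU", "A", 400.0),
    FormulaReport("p1", "PKU", "B", 340.0),
    FormulaReport("p2", "PKU", "A", 520.0),
    FormulaReport("p2", "PKU", "B", 500.0),
    FormulaReport("p3", "PKU", "A", 300.0),  # only tried A, so not paired
]


def mean_paired_difference(reports):
    """Average of (level on B minus level on A) among people who reported both."""
    by_person = {}
    for r in reports:
        by_person.setdefault(r.contributor_id, {})[r.formula] = r.blood_phe_umol_l
    diffs = [
        levels["B"] - levels["A"]
        for levels in by_person.values()
        if "A" in levels and "B" in levels
    ]
    return mean(diffs), len(diffs)


difference, pairs = mean_paired_difference(reports)
print(f"{pairs} people tried both; mean difference (B minus A): {difference} umol/L")
```

Comparing each person with themselves is a deliberate choice here. It avoids mixing up the effect of the formula with differences between people. A Facebook post cannot be analyzed this way. A structured report can.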

Why Families Have to Build This

Government programs get defunded. Advisory committees get dissolved. The Advisory Committee on Heritable Disorders in Newborns and Children (ACHDNC), the federal committee that evaluated conditions for the newborn screening panel, was terminated in 2025. Hospital research programs depend on individual investigators. When a principal investigator retires or loses a grant, data collection stops. Pharmaceutical companies enter and exit rare disease markets based on profitability. When they exit, they take the data with them, or it disappears.

The one group that never leaves the rare disease space is the people who live with it and their families. The parent managing a child's PKU diet every day for 18 years does not lose interest. The adult with EDS who spent a decade pursuing a diagnosis does not move on. The family of a child with an ultra-rare condition does not get acquired by a larger company.

Families are the only permanent participants in rare disease. The data infrastructure has to be built by the people it serves, controlled by them, and governed in their interests. Institutions are temporary. Rare disease is permanent.

What Cure Looks Like When Data Exists

A child is born. Newborn screening detects a metabolic condition in the first days of life. Genome sequencing identifies the specific genetic error. The family contributes structured health data to a contributor-controlled data trust. That data, combined with data from other families worldwide, builds the natural history picture. A therapy is designed for the child's specific error, using tools and knowledge accelerated by every person who came before. The therapy is tested using accumulated data as the evidence base. The therapy is approved under frameworks designed for small populations and well-characterized mechanisms. The child is treated. The error is corrected. The disease stops. The child's outcome data feeds back into the system, making the next therapy faster and safer.

Every one of those steps exists today in some form. Gene therapy is approved. CRISPR editing is approved. ASOs have been designed and made for single individuals. The FDA has published frameworks for this pathway. Newborn screening detects dozens of conditions at birth.

The missing layer is the data infrastructure that connects these steps into a system: the data trust, the longitudinal records, the structured contributions from the people living with these conditions, the persistence that outlasts any single sponsor or institution.

More than 7,000 rare diseases affect about 400 million people worldwide, roughly 1 in 20. The system treats each condition as if it were alone. Data infrastructure connects them. The safety data from one gene therapy informs the next. The natural history of one metabolic disorder illuminates another. The same privacy framework protects them all.

The tools exist. The data is the bottleneck. The infrastructure to remove it is the project.