Accelerating innovation

Control data for the next cure

External control arms instead of placebo. Why every rare disease trial pays for natural history twice, and what changes when the data persists across sponsor transitions.

A randomized controlled trial in a rare disease faces an arithmetic problem that the rest of clinical research does not. With a hundred eligible patients in the world, randomizing half to placebo means asking fifty desperately ill people to forgo a drug whose molecular mechanism predicts they would benefit. With twenty eligible patients, the same calculation strains beyond what families and ethicists can accept. With a single eligible patient, randomization is meaningless.

The alternative is the external control arm. Instead of randomizing within the trial, the trial uses outcome data from comparable patients managed under standard of care to estimate how the treated patients would have fared without treatment. The comparison is between observed outcomes in the trial and predicted outcomes from the historical cohort. The external control arm supports weaker inference than randomization, because the treated and historical groups can differ in ways the analysis cannot fully adjust for, but it supports stronger inference than no comparator at all, and it does not require any patient to be assigned to placebo.
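The comparison described above can be sketched in a few lines. This is a deliberately naive illustration, not a recommended analysis: the function name and the outcome values are hypothetical, and a real external control analysis would also adjust for baseline differences between the two groups (for example by matching or weighting), which this sketch omits.

```python
from statistics import mean


def external_control_effect(treated_changes, control_changes):
    """Naive effect estimate: mean observed change in the treated arm
    minus mean change in the historical (external control) cohort.

    Both inputs are per-patient changes in the same outcome measure over
    the same follow-up window. No adjustment for confounding is made.
    """
    if not treated_changes or not control_changes:
        raise ValueError("both arms need at least one patient")
    observed = mean(treated_changes)    # what happened on treatment
    predicted = mean(control_changes)   # what standard of care predicts
    return observed - predicted


# Hypothetical values: change in a functional score over 12 months.
treated = [-1.0, 0.5, 0.0, -0.5]
historical = [-3.0, -2.5, -4.0, -2.5]

effect = external_control_effect(treated, historical)
print(f"estimated effect vs. external control: {effect:+.2f}")
```

The estimate is only as credible as the comparability of the two cohorts, which is why the structure and measurement consistency of the historical data matter so much.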

The external control arm requires natural history data. The natural history data has to be longitudinal, structured, comparable in measurement, and large enough to support meaningful comparison. The natural history data, for most rare diseases, does not exist in that form.

What natural history data costs to build inside a trial

A pharmaceutical sponsor developing a drug for a rare disease typically funds a natural history study in parallel with the early development program. The natural history study runs for two to five years before the pivotal trial begins, enrolls fifty to two hundred patients depending on the disease prevalence, and collects standardized clinical assessments at quarterly or semi-annual visits. The cost of a natural history study runs from several million to several tens of millions of dollars depending on duration, sample size, and assessment burden.

When the trial reads out, the natural history study has produced a dataset that is used to interpret the trial result. After the trial reads out, the natural history dataset is closed. The sponsor moves to the next development priority. The dataset, which represents the largest investment in characterizing the disease that the field has ever seen, is rarely made available for reuse beyond the original trial.

The next sponsor with a candidate for the same disease has to build a new natural history study. The data accumulated in the previous study, which would be the most valuable input the new sponsor could have, is not accessible. The cost of natural history data collection is paid twice, three times, four times, by successive sponsors who could each have started from where the previous one left off.
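The repeated-payment arithmetic can be made concrete with a small sketch. All figures are hypothetical: the study cost is chosen from within the range quoted earlier, and the incremental cost of extending an existing dataset is purely an assumption for illustration.

```python
def repeated_cost(n_sponsors, study_cost):
    """Total spend when every sponsor builds its own natural history study."""
    if n_sponsors < 1:
        raise ValueError("need at least one sponsor")
    return n_sponsors * study_cost


def shared_cost(n_sponsors, study_cost, incremental_cost):
    """Total spend when the first study is reused and each later sponsor
    pays only an incremental cost to extend it (an assumed figure)."""
    if n_sponsors < 1:
        raise ValueError("need at least one sponsor")
    return study_cost + (n_sponsors - 1) * incremental_cost


# Hypothetical inputs, in US dollars.
STUDY_COST = 10_000_000        # one full natural history study
INCREMENTAL_COST = 2_000_000   # assumed cost to extend an existing dataset
SPONSORS = 4

repeated = repeated_cost(SPONSORS, STUDY_COST)
shared = shared_cost(SPONSORS, STUDY_COST, INCREMENTAL_COST)
print(f"rebuilt each time: ${repeated:,}")
print(f"built once, extended: ${shared:,}")
```

The gap between the two totals widens with every additional sponsor, under whatever incremental cost is assumed, as long as extending a dataset is cheaper than rebuilding it.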

What persistent natural history data does

The data trust model proposes that natural history data, collected with patient consent under fiduciary governance, persists across sponsor transitions. The first sponsor's natural history study contributes data to the trust. The trust holds the data on behalf of the contributing patients. The next sponsor pays for access to the data under terms that the patient community establishes.

The mechanism shifts the natural history dataset from a sponsor-controlled asset that depreciates after the trial to a community-controlled asset that appreciates over time. Each subsequent trial adds outcome data that the trust holds. Each subsequent natural history study contributes additional patients. Each new trial that uses the dataset as an external control generates results that further characterize the disease.

The compounding step is across conditions. If two organic acidemias share a common pathway and have similar outcome measures, the natural history data from one informs the trial design for the other. The cardiomyopathy progression in propionic acidemia informs the cardiomyopathy progression assumptions in methylmalonic acidemia (MMA). The renal trajectory in MMA-mut informs the renal trajectory assumptions in adjacent conditions. The compounding is not automatic; it requires that the data be structured comparably and that the sharing infrastructure exist. Both are engineering problems, not scientific problems.
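What "structured comparably" might mean in practice can be sketched as a shared record shape plus a check for overlapping measures. The field names, measure codes, and values below are hypothetical, and a real trust would more likely adopt an established clinical data standard than a hand-rolled schema like this one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assessment:
    """One longitudinal measurement, in a shape shared across conditions."""
    patient_id: str
    condition: str
    measure: str      # e.g. "lvef" for left ventricular ejection fraction
    unit: str
    visit_date: date
    value: float


def comparable_measures(cohort_a, cohort_b):
    """Return the (measure, unit) pairs recorded in both cohorts.

    Only these pairs can be compared directly across the two conditions.
    """
    keys_a = {(r.measure, r.unit) for r in cohort_a}
    keys_b = {(r.measure, r.unit) for r in cohort_b}
    return keys_a & keys_b


# Hypothetical records for two organic acidemia cohorts.
pa_cohort = [
    Assessment("pa-001", "propionic acidemia", "lvef", "%", date(2020, 1, 15), 55.0),
]
mma_cohort = [
    Assessment("mma-001", "methylmalonic acidemia", "lvef", "%", date(2020, 2, 3), 60.0),
    Assessment("mma-001", "methylmalonic acidemia", "egfr", "mL/min/1.73m2", date(2020, 2, 3), 72.0),
]

shared = comparable_measures(pa_cohort, mma_cohort)
print(shared)
```

Keying on both the measure and its unit is a design choice: two cohorts that record the same measure in different units are not yet comparable without a conversion step.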

The Nurses' Health Study analogue

The Nurses' Health Study did not generate findings about one disease. It generated findings about cardiovascular disease, breast cancer, colorectal cancer, depression, diabetes, dementia, and a long list of other conditions, because the cohort was characterized broadly and followed long enough that multiple research questions could be answered from the same dataset. The marginal cost of asking the next question, once the cohort and the data infrastructure existed, was small. The cost of building a new cohort for each question would have been prohibitive.

The rare disease equivalent is a multi-condition longitudinal cohort that holds natural history data for dozens of rare diseases simultaneously. The infrastructure cost is paid once. The questions that the dataset can answer accumulate over time. The data that begins as a natural history dataset for condition A becomes external control data for condition A's first trial, then for condition B's trial when condition B's pathway overlaps, then for cross-condition signal detection that no single-disease cohort could support.

The trial sponsor that funds natural history collection in this model is not building a depreciating asset. The sponsor is contributing to an infrastructure that improves the next trial, including the next trial the same sponsor runs. The economics shift in the same direction as the patient interest: toward persistent, well-governed, comparable data that no one party owns and that all parties can use under negotiated terms.