How OXFORDIA works
The problem
Rare disease research is constrained by a structural problem: no single institution has enough patients to power a statistically meaningful study. Nemaline myopathy, the condition that motivated this work, has an incidence of roughly 1 in 50,000. Fourteen universities across multiple jurisdictions hold trial data on this disease. Independently, none has enough records to meet regulatory evidentiary thresholds. Combined, they do.
But combining the data has historically been impossible. The institutions operate under different legal regimes (HIPAA, GDPR, local research ethics frameworks), use different schemas and vocabularies, and reasonably refuse to surrender raw patient records to a central authority. The conventional answer (bilateral data-sharing agreements, secure file transfers, and bespoke ETL) does not scale beyond two or three sites and routinely takes years to negotiate per study.
The solution: federated computation
Each participating institution hosts its own OXFORDIA Node and keeps its data on-premises. A researcher declares a list of partner institutions to query and runs a statistical analysis from R. Raw patient records never leave the institution that owns them. What flows back to the researcher are aggregate results, combined client-side into a single answer.
```r
library(oxfordia)
library(solidauthr)

# Authenticate against the researcher's institutional identity provider.
auth <- solid_login(idp = "https://oxfordia.med.ox.ac.uk")

# Declare the partner nodes to query.
targets <- oxfordia_targets(
  oxford    = "https://oxfordia.med.ox.ac.uk/cohort/nemaline",
  hacettepe = "https://oxfordia.hacettepe.edu.tr/cohort/nemaline",
  partner3  = "https://oxfordia.partner3.edu/cohort/nemaline"
)

# Run a federated mean; only per-site aggregates come back.
result <- oxfordia_mean(
  targets    = targets,
  auth       = auth,
  graph_path = "BaselineAge"
)

result$value     # 21.6309888889
result$n         # 180
result$per_site  # tibble: site, mean, count
```
The researcher's experience is unremarkable — it looks like normal R. The federation, the authentication, the per-site access checks, and the SPARQL queries are all invisible.
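The client-side combination step can be illustrated for the mean. The following is a minimal sketch, not the package's actual implementation: `combine_means()` is a hypothetical helper, the per-site figures are invented, and the column names `site`, `mean`, and `count` are assumed from the `result$per_site` comment in the example above. A pooled mean is the count-weighted average of the site means, so each node only has to return two numbers.

```r
# Hypothetical per-site aggregates, shaped like result$per_site (values invented).
per_site <- data.frame(
  site  = c("oxford", "hacettepe", "partner3"),
  mean  = c(20, 25, 30),
  count = c(50, 30, 20)
)

# Pool site means into a global mean, weighting each site by its record count.
combine_means <- function(per_site) {
  n_total <- sum(per_site$count)
  list(
    value = sum(per_site$mean * per_site$count) / n_total,
    n     = n_total
  )
}

pooled <- combine_means(per_site)
pooled$value  # 23.5
pooled$n      # 100
```

Weighting by count matters: an unweighted average of the three site means would give 25, which overstates the contribution of the smallest site.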
The novel contribution: Statistic Access Rules
The core innovation is the Statistic Access Rule (SAR): a permission model that lets an institution say "yes, you may compute the mean of this column" while still saying "no, you may not see any individual value."
Conventional access control can only permit or deny access to a file as a whole. A site administrator using standard tools faces a binary choice per researcher per dataset:
- Grant read access — the researcher can pull every record and compute anything they like, including things the site would prefer they did not.
- Deny read access — the researcher gets nothing, including the aggregates the site would have been willing to share.
There is no middle position. SAR closes that gap. An institution can publish a dataset and, in the same breath, declare that named external collaborators may compute means and Kaplan–Meier curves against specific fields with a minimum cohort size of 10, while no one — including those collaborators — may pull a single row of data.
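To make the idea concrete, a rule of this kind can be pictured as a small record plus a check. The sketch below is illustrative only: the field names (`agents`, `statistics`, `fields`, `min_count`), the example WebID URLs, and the `sar_permits()` helper are all assumptions made for this example, not OXFORDIA's actual rule format.

```r
# Hypothetical in-memory form of one Statistic Access Rule.
rule <- list(
  agents     = c("https://example.org/profile/alice#me"),  # assumed WebID
  statistics = c("mean", "kaplan_meier"),
  fields     = c("BaselineAge"),
  min_count  = 10
)

# TRUE only if this agent may compute this statistic on this field.
# In this sketch there is no "read" statistic, so row-level access
# cannot be expressed at all.
sar_permits <- function(rule, agent, statistic, field) {
  agent %in% rule$agents &&
    statistic %in% rule$statistics &&
    field %in% rule$fields
}

alice <- "https://example.org/profile/alice#me"
bob   <- "https://example.org/profile/bob#me"

sar_permits(rule, alice, "mean", "BaselineAge")    # TRUE
sar_permits(rule, alice, "median", "BaselineAge")  # FALSE
sar_permits(rule, bob, "mean", "BaselineAge")      # FALSE
```

The `min_count` field is not consulted here because it can only be evaluated after the query has run and the number of matching records is known.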
Architecture
OXFORDIA is a network of peer nodes. There is no central server, no central data store, and no central authority.
| Actor | Role |
|---|---|
| Researcher | Runs queries from R using their institutional identity |
| Sysadmin | Deploys and operates an OXFORDIA Node |
| Data administrator | Loads datasets and authors Statistic Access Rules |

Query flow
When a researcher runs a query:
- The R client obtains an identity token from the researcher's institutional identity provider.
- The token is presented to each target node.
- Each node independently verifies the identity and checks the query against its local Statistic Access Rules.
- If authorized, the node executes the query against its local triplestore.
- Post-query constraints are checked (e.g., `minCount`: if fewer records matched than the threshold, the result is withheld).
- Each site returns its local aggregate result and a count.
- The R client combines per-site results into a global statistic.
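
The node-side half of this flow (authorize, execute, apply post-query constraints, return) might look roughly like the sketch below. It is a simplification under stated assumptions: `node_answer()` is a hypothetical function, the authorization decision is passed in as a plain logical, a numeric vector stands in for the triplestore query result, and the `status` values are invented for illustration.

```r
# Hypothetical node-side handler for a mean query.
# authorized: outcome of the identity and rule check (TRUE/FALSE).
# values:     records matched locally (stand-in for a SPARQL result).
# min_count:  the rule's minimum cohort size.
node_answer <- function(authorized, values, min_count) {
  if (!authorized) {
    return(list(status = "denied", mean = NA_real_, count = NA_integer_))
  }
  n <- length(values)
  if (n < min_count) {
    # Post-query constraint failed: withhold the result so that a
    # small cohort is not exposed through its aggregate.
    return(list(status = "withheld", mean = NA_real_, count = NA_integer_))
  }
  list(status = "ok", mean = mean(values), count = n)
}

big   <- node_answer(TRUE, c(18, 22, 25, 19, 30, 21, 24, 27, 20, 24), min_count = 10)
small <- node_answer(TRUE, c(18, 22, 25), min_count = 10)

big$status    # "ok"
big$mean      # 23
big$count     # 10
small$status  # "withheld"
```

Whether a withheld response should also hide the count is a policy choice; this sketch hides it, on the assumption that a count below the threshold is itself sensitive.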

Open standards
OXFORDIA is built on Solid, RDF, and SPARQL — all open W3C-aligned standards. The full implementation is published under the MIT License at github.com/OXFORDIA-project/OXFORDIA-node.