Data Plugins

A Data Plugin describes a domain-specific shape of data. It tells OXFORDIA what the data means, how it is structured, and how researchers should refer to it in queries.

Each Data Plugin ships with four components.

Components

Data Schema

A formal contract that data must satisfy to count as valid input to the plugin's tooling. Schemas are written in ShEx (Shape Expressions). Incoming data is validated against the schema automatically at ingest, so downstream tools can rely on its structure being exactly what they expect.

Graph Shortcuts

Real research questions rarely line up directly with the underlying RDF graph structure. "Baseline age" is not a single value — it is a path through related concepts: a Person has an Age Aspect, which has a Magnitude, which has a numeric value and a unit of measure.

Spelling that out on every query is tedious and error-prone. The graph-shortcut mechanism gives the whole path a stable, human-meaningful name — BaselineAge — that researchers and administrators refer to without needing to write out the full path.
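
To illustrate the idea, here is a minimal sketch of how a shortcut name could be expanded into a SPARQL property path. The `ex:` prefix, the property names, and the helpers `expand_shortcut` and `build_query` are all hypothetical; they are not taken from OXFORDIA or from any plugin.

```python
# Hypothetical shortcut map: shortcut name -> SPARQL 1.1 property path.
# The ex: prefix and every property name below are illustrative assumptions.
SHORTCUTS = {
    "BaselineAge": "ex:hasAgeAspect/ex:hasMagnitude/ex:numericValue",
}

PREFIXES = "PREFIX ex: <http://example.org/ns#>"


def expand_shortcut(name: str, subject: str = "?person", target: str = "?value") -> str:
    """Return the triple pattern that a shortcut name stands for."""
    try:
        path = SHORTCUTS[name]
    except KeyError:
        raise KeyError(f"unknown graph shortcut: {name}") from None
    return f"{subject} {path} {target} ."


def build_query(name: str) -> str:
    """Build a SELECT query that fetches one shortcut value per subject."""
    pattern = expand_shortcut(name)
    return f"{PREFIXES}\nSELECT ?person ?value WHERE {{\n  {pattern}\n}}"


print(build_query("BaselineAge"))
```

Under these assumptions, the researcher only ever writes `BaselineAge`; the three-step path lives in one place and can change without touching any query that uses the name.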

UI Component

The plugin contributes its own resource view and any required custom importers (a CSV upload form, for example) to the OXFORDIA admin dashboard. When a dataset of this plugin's type is loaded, the dashboard uses the plugin's UI component to display and manage it.

R Client Library

The plugin exposes a typed R interface so that researchers can refer to plugin-provided graph shortcuts (e.g., BaselineAge) by name in their queries, without needing to know the underlying RDF structure.


Reference implementation: Nemaline Data Plugin

The reference Data Plugin models nemaline myopathy clinical trial records.

Package: oxfordia-plugin-data-nemaline

Graph shortcuts

| Shortcut | Description |
|---|---|
| BaselineAge | Age of participant at study enrollment |
| EventTime | Time from baseline to the event or censoring |
| EventOccurred | Whether the event occurred (boolean) |
| GeneticGroup | Genetic subgroup classification |
| Ambulation | Ambulatory status |
| TotalMFM | Total Motor Function Measure score |

CSV import format

The Nemaline plugin accepts a standardized CSV. Required columns:

| Column | Type | Description |
|---|---|---|
| ID | integer | Participant identifier |
| CLUSTER | string | Cluster assignment |
| GENETIC_GROUP | string | Genetic variant group |
| BASELINE_AGE | float | Age at enrollment (years) |
| AMBULATION | string | Ambulatory / Non-ambulatory |
| TOTAL_MFM | integer | Total MFM score |
| KM_EVENT | 0 or 1 | Event indicator |
| KM_TIME_YR | float | Time to event or censoring (years) |
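
As a rough sketch, a file in this format could be checked before upload with a few lines of Python. This is not the plugin's importer: the function name `parse_nemaline_csv` is hypothetical, the sample rows are invented, and the sketch assumes a comma-delimited file with a header row.

```python
import csv
import io


def _event_flag(text: str) -> int:
    """Parse the KM_EVENT column, which must be exactly 0 or 1."""
    if text not in ("0", "1"):
        raise ValueError(f"expected 0 or 1, got {text!r}")
    return int(text)


# Column name -> parser, following the required-columns table.
REQUIRED_COLUMNS = {
    "ID": int,
    "CLUSTER": str,
    "GENETIC_GROUP": str,
    "BASELINE_AGE": float,
    "AMBULATION": str,
    "TOTAL_MFM": int,
    "KM_EVENT": _event_flag,
    "KM_TIME_YR": float,
}


def parse_nemaline_csv(text: str) -> list[dict]:
    """Check the header and convert every row to typed values."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    rows = []
    for raw in reader:
        row = {}
        for column, parse in REQUIRED_COLUMNS.items():
            try:
                # A short row yields None for absent cells; treat it as empty.
                row[column] = parse((raw[column] or "").strip())
            except ValueError as exc:
                raise ValueError(
                    f"line {reader.line_num}, column {column}: {exc}"
                ) from exc
        rows.append(row)
    return rows


# Invented sample data, for illustration only.
SAMPLE = """ID,CLUSTER,GENETIC_GROUP,BASELINE_AGE,AMBULATION,TOTAL_MFM,KM_EVENT,KM_TIME_YR
1,A,G1,7.5,Ambulatory,62,0,3.2
2,B,G2,12.0,Non-ambulatory,41,1,1.4
"""

rows = parse_nemaline_csv(SAMPLE)
print(len(rows), "rows parsed")
```

The sketch checks only column presence and basic types. It does not restrict the string columns to a fixed vocabulary, and it does not replace the ShEx validation that runs at ingest.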

Writing a new Data Plugin

A Data Plugin is a package that exports:

  1. A ShEx schema for validation
  2. A graph shortcut map (shortcut name → SPARQL property path)
  3. A UI component (React) for the admin dashboard
  4. An R package exposing typed query helpers
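
The four exports listed above can be pictured as a single manifest. The sketch below is purely illustrative: the `DataPluginManifest` class, its field names, and every path and package name in the example are hypothetical and do not describe OXFORDIA's actual packaging format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPluginManifest:
    """Hypothetical summary of what a Data Plugin package exports."""

    name: str
    shex_schema_path: str      # 1. ShEx schema used for validation
    shortcuts: dict[str, str]  # 2. shortcut name -> SPARQL property path
    ui_component: str          # 3. React component for the admin dashboard
    r_package: str             # 4. R package with typed query helpers

    def missing_components(self) -> list[str]:
        """List the components that were left empty."""
        checks = {
            "ShEx schema": self.shex_schema_path,
            "graph shortcut map": self.shortcuts,
            "UI component": self.ui_component,
            "R package": self.r_package,
        }
        return [label for label, value in checks.items() if not value]


# All values below are placeholders, not real file or package names.
example = DataPluginManifest(
    name="example-data-plugin",
    shex_schema_path="schema/example.shex",
    shortcuts={"BaselineAge": "ex:hasAgeAspect/ex:hasMagnitude/ex:numericValue"},
    ui_component="ui/ExampleResourceView.tsx",
    r_package="",  # deliberately left empty to show the check
)

print(example.missing_components())
```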

Refer to the Nemaline plugin source as the reference implementation.