STAC Extension Schemas
fAIr defines three custom STAC extensions that layer fAIr-specific metadata on top of upstream extensions (MLM, Label, Version, Classification).
Each schema follows the STAC Extension pattern: a JSON Schema with a $id matching its published URL, a stac_extensions check, required properties, and field definitions.
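As a rough illustration of that pattern, the sketch below writes a schema skeleton as a Python dict and checks an item against it by hand. The `example.com` host is a placeholder (only the relative schema path is given here), `check_item` is a hypothetical helper, and the two required fields are borrowed from the base model schema. A real setup would more likely hand the schema to a JSON Schema validator.

```python
# Placeholder host: only the relative path of the schema is documented here.
SCHEMA_URL = "https://example.com/v1.0.0/base-model/schema.json"

# Skeleton of a STAC extension schema, written as a Python dict.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": SCHEMA_URL,  # matches the published URL
    "type": "object",
    "required": ["stac_extensions", "properties"],
    "properties": {
        # The item must declare the extension by listing the schema URL.
        "stac_extensions": {"type": "array", "contains": {"const": SCHEMA_URL}},
        "properties": {
            "type": "object",
            "required": ["fair:metrics_spec", "fair:split_spec"],
        },
    },
}


def check_item(item: dict, schema: dict = SCHEMA) -> list:
    """Return a list of problems; an empty list means the item passes."""
    problems = []
    if schema["$id"] not in item.get("stac_extensions", []):
        problems.append("extension not declared in stac_extensions")
    required = schema["properties"]["properties"]["required"]
    for name in required:
        if name not in item.get("properties", {}):
            problems.append(f"missing property: {name}")
    return problems
```

The hand-written check covers only the `stac_extensions` declaration and the required properties; field definitions (types, enums, nested objects) are left out to keep the sketch short.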
Base Model
Extends the MLM Extension with training pipeline metadata: metrics spec, split spec, hyperparameter bounds, and runtime container references.
Schema URL: v1.0.0/base-model/schema.json
Required properties: title, description, mlm:name, mlm:architecture, mlm:tasks, mlm:framework, mlm:framework_version, mlm:pretrained, mlm:input, mlm:output, mlm:hyperparameters, keywords, version, license, fair:metrics_spec, fair:split_spec
Required assets: model, source-code, mlm:training, mlm:inference
Dataset
Extends the Label Extension with fAIr training data metadata: user attribution, chip counts, and download archives.
Schema URL: v1.0.0/dataset/schema.json
Required properties: title, description, label:type, label:tasks, label:classes, keywords, fair:user_id, version, deprecated
Required assets: chips, labels
Local Model (Finetuned)
Extends the base model schema with training provenance: links to the base model and dataset, evaluation metrics, training duration, and ZenML artifact references.
Schema URL: v1.0.0/local-model/schema.json
Required properties: title, description, mlm:name, mlm:architecture, mlm:tasks, mlm:framework, mlm:framework_version, mlm:pretrained, mlm:pretrained_source, mlm:input, mlm:output, mlm:hyperparameters, keywords, version, deprecated, fair:user_id
Required assets: model, source-code
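The required properties and assets listed for the three item types can be collected into one lookup table for a quick pre-flight check before full schema validation. This is a minimal sketch: the `REQUIRED` table and `missing_fields` helper are hypothetical names, the keys reuse the path segments of the schema URLs, and the item is assumed to be a plain dict with `properties` and `assets` mappings.

```python
# Required fields per item type, copied from the lists above.
REQUIRED = {
    "base-model": {
        "properties": [
            "title", "description", "mlm:name", "mlm:architecture", "mlm:tasks",
            "mlm:framework", "mlm:framework_version", "mlm:pretrained",
            "mlm:input", "mlm:output", "mlm:hyperparameters", "keywords",
            "version", "license", "fair:metrics_spec", "fair:split_spec",
        ],
        "assets": ["model", "source-code", "mlm:training", "mlm:inference"],
    },
    "dataset": {
        "properties": [
            "title", "description", "label:type", "label:tasks", "label:classes",
            "keywords", "fair:user_id", "version", "deprecated",
        ],
        "assets": ["chips", "labels"],
    },
    "local-model": {
        "properties": [
            "title", "description", "mlm:name", "mlm:architecture", "mlm:tasks",
            "mlm:framework", "mlm:framework_version", "mlm:pretrained",
            "mlm:pretrained_source", "mlm:input", "mlm:output",
            "mlm:hyperparameters", "keywords", "version", "deprecated",
            "fair:user_id",
        ],
        "assets": ["model", "source-code"],
    },
}


def missing_fields(item: dict, item_type: str) -> dict:
    """Report which required properties and assets an item lacks."""
    spec = REQUIRED[item_type]
    props = item.get("properties", {})
    assets = item.get("assets", {})
    return {
        "properties": [p for p in spec["properties"] if p not in props],
        "assets": [a for a in spec["assets"] if a not in assets],
    }
```

The check only tests for presence; value types and formats remain the job of the JSON Schemas themselves.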
Versioning and IDs
Each item type uses the STAC Version Extension. Base models and datasets additionally share a consistent archiving pattern; local models are never archived.
ID Strategy
| Type | ID | How determined |
|---|---|---|
| Base model | Human-readable slug | item_id from the STAC JSON file (e.g. yolo11n-detection) |
| Dataset | Human-readable slug | _slugify(title) or item_id from the STAC JSON file |
| Local model | UUID | ZenML model version ID (unique per user/training run) |
Local models use UUIDs because the same base model + dataset pair can produce different finetuned models across users and training runs.
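The ID rules in the table can be sketched as follows. Everything here is illustrative: `slugify` is a hypothetical stand-in for fAIr's `_slugify` (whose exact rules are not documented here), the choice to let an explicit `item_id` win over the slugified title is an assumption, and `uuid.uuid4()` merely stands in for the ZenML model version ID.

```python
import re
import uuid
from typing import Optional


def slugify(title: str) -> str:
    """Hypothetical stand-in for _slugify; the real rules may differ."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def dataset_id(title: str, item_id: Optional[str] = None) -> str:
    """Use item_id when given, else a slug of the title (assumed precedence)."""
    return item_id or slugify(title)


def local_model_id() -> str:
    # Stand-in only: in fAIr this value is the ZenML model version ID.
    return str(uuid.uuid4())
```

Calling `local_model_id()` twice yields two different IDs, which mirrors why the same base model and dataset pair can back several distinct local models.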
Version Lifecycle
All three types use the version property (string, starting at "1").
Base models and datasets follow the same pattern:
- On first register: version: "1", item ID is the slug (e.g. yolo11n-detection)
- On re-register: previous item is archived as {slug}-v{N} with deprecated: true, new item keeps the original slug with version: "{N+1}"
- Archived items link to their successor via successor-version; the active item links back via predecessor-version
To find the previous active version, base models are matched on mlm:name and datasets on title.
Local models get their version from ZenML's model version number. No archiving is performed since each promotion creates a new item with a unique UUID.
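The archive-on-re-register pattern for base models and datasets might look like the sketch below. It is a simplification under stated assumptions: the catalog is an in-memory dict keyed by item ID, `register` is a hypothetical function, the previous active item is found by slug rather than by mlm:name or title, and version links are omitted.

```python
def register(catalog: dict, slug: str, properties: dict) -> dict:
    """Register an item under `slug`, archiving any active item first."""
    previous = catalog.get(slug)
    version = 1
    if previous is not None:
        n = int(previous["properties"]["version"])
        archived_id = f"{slug}-v{n}"
        # Archive the previous item under {slug}-v{N} and mark it deprecated.
        previous["id"] = archived_id
        previous["properties"]["deprecated"] = True
        catalog[archived_id] = previous
        version = n + 1
    # The new item keeps the original slug and gets the next version string.
    item = {
        "id": slug,
        "properties": {**properties, "version": str(version), "deprecated": False},
    }
    catalog[slug] = item
    return item
```

Registering the same slug three times leaves one active item at version "3" plus two archived items with the `-v1` and `-v2` suffixes.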
Version Links
All items use links from the Version Extension:
| Link relation | Direction | Present on |
|---|---|---|
| latest-version | self-referencing | Active items only |
| predecessor-version | current -> previous | Items with version > 1 |
| successor-version | old -> new | Archived (deprecated) items |
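Read literally, the table maps an item's version and deprecated flag to the link relations it should carry. The hypothetical helper below encodes that reading. One assumption to flag: it keeps predecessor-version on archived items with version > 1, as the "Present on" column suggests; whether archived items actually retain that link is not confirmed here.

```python
def link_relations(version: int, deprecated: bool) -> list:
    """Return the Version Extension link relations expected on an item."""
    rels = []
    if not deprecated:
        rels.append("latest-version")       # active items only, points at self
    if version > 1:
        rels.append("predecessor-version")  # current -> previous
    if deprecated:
        rels.append("successor-version")    # old -> new
    return rels
```

A helper like this could serve as a consistency check after each publish, comparing the relations found on an item with the ones the table predicts.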
Example
After registering a dataset three times:
| Item ID | Version | Deprecated | Links |
|---|---|---|---|
| buildings-banepa-segmentation-v1 | 1 | true | successor-version -> v2 |
| buildings-banepa-segmentation-v2 | 2 | true | successor-version -> v3 |
| buildings-banepa-segmentation | 3 | false | latest-version -> self, predecessor-version -> v2 |
Timestamps
Temporal tracking uses:
| Property | Set by | Purpose |
|---|---|---|
| created | Builder (on first creation) | When the STAC item was first published |
| updated | Backend (on every publish) | When the item was last modified |
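The difference between the two properties comes down to "set once" versus "set every time". The sketch below shows that rule in a single hypothetical `stamp` function; in fAIr the two writes are split between the builder and the backend, and the exact timestamp format used there is an assumption (UTC, RFC 3339 with a `Z` suffix).

```python
from datetime import datetime, timezone
from typing import Optional


def stamp(properties: dict, now: Optional[datetime] = None) -> dict:
    """Return a copy of `properties` with created/updated timestamps applied."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    ts = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    stamped = dict(properties)
    stamped.setdefault("created", ts)  # written only if not already present
    stamped["updated"] = ts            # overwritten on every publish
    return stamped
```

Passing an explicit, timezone-aware `now` keeps the function deterministic, which makes the rule easy to test.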
Validation
Schemas are registered into PySTAC's JsonSchemaSTACValidator.schema_cache at runtime so item.validate() resolves them without network access: