Architecture
STAC Catalog Structure
The catalog has three collections. Model items use the MLM Extension and the Version Extension; dataset items use the Label and File extensions.
Catalog: fair-models
|
+-- Collection: base-models
|   Model blueprints contributed via PR.
|   Each item = complete model card (weights, code, Docker, MLM spec).
|   Versioned by contributors, registered via CLI utility.
|   |
|   +-- Item: unet-segmentation (v1)         category: semantic-segmentation
|   +-- Item: resnet18-classification (v1)   category: classification
|   +-- Item: yolo11n-detection (v1)         category: object-detection
|
+-- Collection: local-models
|   Finetuned models produced by ZenML pipelines.
|   Only promoted (production) versions appear here.
|   |
|   +-- Item: unet-segmentation-finetuned-banepa-v2  (production, latest-version)
|   +-- Item: unet-segmentation-finetuned-banepa-v1  (deprecated: true)
|   +-- Item: yolo11n-detection-finetuned-banepa-v1  (production)
|
+-- Collection: datasets
    Training data registered via fAIr UI/backend.
    |
    +-- Item: buildings-banepa-segmentation  category: semantic-segmentation
    +-- Item: buildings-banepa-detection     category: object-detection
What STAC Items Contain
All fields are from existing STAC/MLM standards. Custom fair:* fields are
avoided wherever a standard exists.
Base model item
See models/unet_segmentation/stac-item.json for a complete example.
All three base models (unet_segmentation, resnet18_classification, yolo11n_detection) follow this structure.
Key properties: mlm:name, mlm:architecture, mlm:tasks, mlm:framework,
mlm:input (with pre_processing_function), mlm:output (with post_processing_function
and classification:classes), mlm:hyperparameters, keywords.
Key assets: model (weights), source-code (with mlm:entrypoint),
training-runtime / inference-runtime (Docker image or "local").
The mlm:entrypoint tells the backend which Python function to call.
pre_processing_function / post_processing_function are standard MLM
Processing Expression fields.
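As a rough illustration of how a backend might turn such a reference into a callable, the sketch below assumes the entrypoint is a string of the form `package.module:function`. The helper name `resolve_entrypoint` is hypothetical and not part of fAIr, and a standard-library function stands in for a real model entrypoint.

```python
import importlib
from typing import Any, Callable


def resolve_entrypoint(expression: str) -> Callable[..., Any]:
    """Resolve a 'package.module:function' string to a callable.

    Hypothetical helper; the 'module:function' format is an assumption.
    """
    module_path, sep, attr = expression.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module:function', got {expression!r}")
    module = importlib.import_module(module_path)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{expression!r} does not point to a callable")
    return func


# Usage: a standard-library function stands in for a model's entrypoint.
join = resolve_entrypoint("os.path:join")
print(join("models", "unet_segmentation"))
```

Keeping the reference as a string in the STAC item means the catalog never has to import model code; only the runtime that executes the model resolves it.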
Local model item
Same MLM fields as base model, plus:
- derived_from link pointing to the base model item
- derived_from link pointing to the dataset item used for training
- mlm:model asset pointing to S3 finetuned weights
- Runtime assets reference the same Docker image as parent base model
- Version Extension: version, deprecated, predecessor-version / successor-version / latest-version links
- mlm:hyperparameters reflects the actual training params used
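The additions listed above can be sketched as a plain dictionary fragment. This is a simplified illustration, not the project's actual item builder: the function name, the asset key `model`, the reading of "mlm:model asset" as an asset with the `mlm:model` role, and every href are assumptions.

```python
from typing import Any, Dict, Optional


def local_model_additions(
    base_item_href: str,
    dataset_item_href: str,
    weights_href: str,
    version: str,
    hyperparameters: Dict[str, Any],
    predecessor_href: Optional[str] = None,
) -> Dict[str, Any]:
    """Return the fields a local model item adds on top of the base model's MLM fields."""
    links = [
        {"rel": "derived_from", "href": base_item_href},     # parent base model
        {"rel": "derived_from", "href": dataset_item_href},  # training dataset
    ]
    if predecessor_href is not None:
        links.append({"rel": "predecessor-version", "href": predecessor_href})
    return {
        "properties": {
            "version": version,
            "deprecated": False,
            "mlm:hyperparameters": hyperparameters,  # params actually used in training
        },
        "links": links,
        "assets": {
            # Assumed layout: finetuned weights carry the mlm:model role.
            "model": {"href": weights_href, "roles": ["mlm:model"]},
        },
    }


# Usage with placeholder hrefs (all hypothetical).
fragment = local_model_additions(
    base_item_href="../base-models/unet-segmentation.json",
    dataset_item_href="../datasets/buildings-banepa-segmentation.json",
    weights_href="s3://example-bucket/unet-segmentation-finetuned-banepa/v2/weights.pt",
    version="2",
    hyperparameters={"epochs": 20},
    predecessor_href="./unet-segmentation-finetuned-banepa-v1.json",
)
print([link["rel"] for link in fragment["links"]])
```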
Dataset item
Dataset items use the Label and File extensions. Properties: label:type, label:tasks, label:classes, keywords.
Assets: chips (image directory), labels (GeoJSON).
Tagging and Classification
| Concept | Standard field | Example values |
|---|---|---|
| ML task | mlm:tasks | semantic-segmentation, object-detection |
| Feature type tags | keywords (STAC core) | building, road, tree |
| Output geometry | keywords (STAC core) | polygon, line, point |
| Output classes | classification:classes | {name: "building", value: 1} |
| Dataset label type | label:type (Label ext) | vector, raster |
| Dataset label task | label:tasks (Label ext) | segmentation, detection |
| Pre/post processing | pre_processing_function / post_processing_function (MLM) | Python entrypoint |
Compatibility Validation
Warning
The backend validates that a base model and dataset are compatible before
triggering finetuning. Validation is based on matching keywords and
mlm:tasks / label:tasks between the model and dataset STAC items.
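A minimal sketch of such a check follows, working on simplified item dictionaries. The exact matching rule is not specified here, so this version assumes "compatible" means at least one shared task and at least one shared keyword. Because the example values differ between the two fields (`semantic-segmentation` versus `segmentation`), it also assumes a hypothetical task-name mapping.

```python
from typing import Any, Dict

# Hypothetical mapping from mlm:tasks values to label:tasks values.
MLM_TO_LABEL_TASK = {
    "semantic-segmentation": "segmentation",
    "object-detection": "detection",
}


def is_compatible(model_item: Dict[str, Any], dataset_item: Dict[str, Any]) -> bool:
    """Return True if the model and dataset share a task and a keyword (assumed rule)."""
    model_props = model_item["properties"]
    data_props = dataset_item["properties"]
    # Translate model tasks into the Label extension's vocabulary before comparing.
    model_tasks = {MLM_TO_LABEL_TASK.get(t, t) for t in model_props.get("mlm:tasks", [])}
    data_tasks = set(data_props.get("label:tasks", []))
    shared_keywords = set(model_props.get("keywords", [])) & set(data_props.get("keywords", []))
    return bool(model_tasks & data_tasks) and bool(shared_keywords)


# Usage with trimmed-down items.
unet = {"properties": {"mlm:tasks": ["semantic-segmentation"], "keywords": ["building", "polygon"]}}
seg_data = {"properties": {"label:tasks": ["segmentation"], "keywords": ["building"]}}
det_data = {"properties": {"label:tasks": ["detection"], "keywords": ["building"]}}

print(is_compatible(unet, seg_data))  # True
print(is_compatible(unet, det_data))  # False
```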
Flows

1. Base Model Registration (PR workflow)
flowchart TD
A[Model Developer] -->|Prepares PR| B[fAIr-Models GitHub]
B -->|CI: build, validate, test| C{Review}
C -->|Merge| D[Post-merge CLI / CI]
D --> E[Build + push Docker image]
D --> F[Upload weights to S3]
D --> G[Register STAC item in base-models]
G --> H[STAC: base-models/model-name v1]
2. Finetuning (ZenML pipeline)
flowchart TD
A[User picks base model + dataset] --> B[fAIr Backend]
B -->|Read STAC items| C[Validate compatibility]
C --> D[Generate ZenML YAML config]
D --> E[ZenML Pipeline in model Docker]
E --> F[load_data]
E --> G[preprocess_data]
E --> H[train_model]
E --> I[evaluate_model]
H --> J[ZenML Model Control Plane]
I --> J
3. Promotion to STAC
flowchart TD
A[User picks best version] --> B[fAIr Backend]
B --> C[ZenML: set stage = production]
B --> D[StacCatalogManager]
D --> E[Build STAC MLM item]
D --> F[Deprecate previous version]
D --> G[Add Version Extension links]
E --> H[STAC: local-models/model-v3 production]
| ZenML action | STAC effect |
|---|---|
| Promote to production | Create item, deprecate previous |
| Archive version | Set deprecated: true on item |
| Delete version | Remove item from collection |
| Delete model | Remove all items + clean up |
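The mapping in the table can be sketched against an in-memory stand-in for the local-models collection. This is an illustration under assumptions, not the actual StacCatalogManager: the function names are hypothetical, items are reduced to an id plus Version Extension properties, and both the Version Extension links and any external clean-up (such as stored weights) are left out.

```python
from typing import Any, Dict

# Simplified collection: item id -> item dict.
Items = Dict[str, Dict[str, Any]]


def promote(items: Items, model_name: str, version: int) -> str:
    """Create the production item and deprecate earlier items of the same model."""
    prefix = f"{model_name}-v"
    for item_id, item in items.items():
        if item_id.startswith(prefix):
            item["properties"]["deprecated"] = True
    new_id = f"{prefix}{version}"
    items[new_id] = {"id": new_id, "properties": {"version": str(version), "deprecated": False}}
    return new_id


def archive_version(items: Items, item_id: str) -> None:
    """Archive: keep the item but mark it deprecated."""
    items[item_id]["properties"]["deprecated"] = True


def delete_version(items: Items, item_id: str) -> None:
    """Delete a single version: remove its item."""
    del items[item_id]


def delete_model(items: Items, model_name: str) -> None:
    """Delete a model: remove every item that belongs to it."""
    prefix = f"{model_name}-v"
    for item_id in [i for i in items if i.startswith(prefix)]:
        del items[item_id]


# Usage: promote two versions in a row, then inspect the deprecation flags.
local_models: Items = {}
promote(local_models, "unet-segmentation-finetuned-banepa", 1)
promote(local_models, "unet-segmentation-finetuned-banepa", 2)
for item_id, item in local_models.items():
    print(item_id, item["properties"]["deprecated"])
```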
4. Inference
Works for both base models and local models. The STAC item always has enough information to run inference: model weights, inference runtime, input/output spec.
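One way to read that guarantee is as a lookup that either returns everything an inference run needs or fails loudly. The sketch below assumes the asset keys `model` and `inference-runtime` and the properties `mlm:input` and `mlm:output` named earlier in this document; the function name and the example hrefs are hypothetical.

```python
from typing import Any, Dict


def inference_requirements(item: Dict[str, Any]) -> Dict[str, Any]:
    """Collect weights, runtime, and input/output spec from a simplified STAC item dict."""
    assets = item.get("assets", {})
    props = item.get("properties", {})
    required = {
        "weights": assets.get("model", {}).get("href"),
        "runtime": assets.get("inference-runtime", {}).get("href"),
        "input": props.get("mlm:input"),
        "output": props.get("mlm:output"),
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"item {item.get('id')!r} is missing: {', '.join(missing)}")
    return required


# Usage with a trimmed-down item (hrefs and specs are placeholders).
item = {
    "id": "unet-segmentation",
    "properties": {
        "mlm:input": [{"name": "image"}],
        "mlm:output": [{"name": "mask"}],
    },
    "assets": {
        "model": {"href": "s3://example-bucket/unet-segmentation/weights.pt"},
        "inference-runtime": {"href": "local"},
    },
}
print(inference_requirements(item)["runtime"])
```

The same function applies to base and local model items, since both carry the same MLM fields.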
Identity Model
| Concept | Example | ZenML | STAC |
|---|---|---|---|
| Base model | unet-segmentation | Not in ZenML Model Control Plane | Item in base-models |
| Finetuned model | unet-segmentation-finetuned-banepa | ZenML Model (many versions) | Item(s) in local-models |
| Specific version | unet-segmentation-finetuned-banepa v2 | ZenML Model Version 2 | Item unet-segmentation-finetuned-banepa-v2 |
| Dataset | buildings-banepa-segmentation | Not in ZenML Model Control Plane | Item in datasets |
Infrastructure
| Component | Local | Production |
|---|---|---|
| STAC Catalog | pystac JSON catalog | stac-fastapi + pgstac |
| ZenML | SQLite | ZenML Server (PostgreSQL) |
| Orchestrator | local | Kubernetes |
| Artifact Store | local filesystem | S3 |
| Experiment Tracker | MLflow | MLflow |
| Container Registry | local Docker | ghcr.io |
Architecture Decisions
- STAC replaces ZenML Model Registry: STAC is a downstream publish target via StacCatalogManager, not a ZenML stack component.
- STAC item = self-sufficient source of truth: contains everything needed to run training or inference.
- Finetuned models share parent pipeline code: only weights differ between base and local models.
- Standards over custom fields: mlm:tasks, keywords, classification:classes instead of custom fair:* fields.
- YAML-based training & inference: every run is driven by a generated config logged as a ZenML artifact.
- MLM Processing Expression for dispatch: pre_processing_function / post_processing_function use Python entrypoints.
- Pipeline contract: every model must export training_pipeline and inference_pipeline as @pipeline-decorated functions.
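The pipeline contract lends itself to a simple pre-flight check. The sketch below is an assumption about how that could look rather than the project's actual validation: it only verifies that both names exist and are callable, not that they were created with ZenML's `@pipeline` decorator, and `missing_pipelines` is a hypothetical helper name.

```python
from types import SimpleNamespace
from typing import Any, List

REQUIRED_PIPELINES = ("training_pipeline", "inference_pipeline")


def missing_pipelines(module: Any) -> List[str]:
    """Return the required pipeline names that the module does not export as callables."""
    return [
        name
        for name in REQUIRED_PIPELINES
        if not callable(getattr(module, name, None))
    ]


# Usage: SimpleNamespace objects stand in for imported model modules.
complete = SimpleNamespace(
    training_pipeline=lambda: None,
    inference_pipeline=lambda: None,
)
incomplete = SimpleNamespace(training_pipeline=lambda: None)

print(missing_pipelines(complete))    # []
print(missing_pipelines(incomplete))  # ['inference_pipeline']
```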
References
STAC Extensions
- STAC MLM Extension v1.5.1
- MLM Best Practices
- STAC Version Extension v1.2.0
- STAC Classification Extension
- STAC Label Extension