Architecture¶

STAC Catalog Structure¶

The catalog has three collections. All items use the MLM Extension and the Version Extension.

Catalog hierarchy

Catalog: fair-models
|
+-- Collection: base-models
|     Model blueprints contributed via PR.
|     Each item = complete model card (weights, code, Docker, MLM spec).
|     Versioned by contributors, registered via CLI utility.
|     |
|     +-- Item: unet-segmentation (v1)           category: semantic-segmentation
|     +-- Item: resnet18-classification (v1)      category: classification
|     +-- Item: yolo11n-detection (v1)            category: object-detection
|
+-- Collection: local-models
|     Finetuned models produced by ZenML pipelines.
|     Only promoted (production) versions appear here.
|     |
|     +-- Item: unet-segmentation-finetuned-banepa-v2   (production, latest-version)
|     +-- Item: unet-segmentation-finetuned-banepa-v1   (deprecated: true)
|     +-- Item: yolo11n-detection-finetuned-banepa-v1   (production)
|
+-- Collection: datasets
      Training data registered via fAIr UI/backend.
      |
      +-- Item: buildings-banepa-segmentation    category: semantic-segmentation
      +-- Item: buildings-banepa-detection       category: object-detection
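
The hierarchy above can be sketched as plain STAC JSON built with the standard library. This is only an illustration: the descriptions, the `license` placeholder, the global extent, and the relative `href` layout are assumptions, not the project's actual catalog files.

```python
import json

STAC_VERSION = "1.0.0"

# Collection ids come from the hierarchy above; descriptions are paraphrased.
COLLECTIONS = {
    "base-models": "Model blueprints contributed via PR.",
    "local-models": "Finetuned models promoted to production.",
    "datasets": "Training data registered via the fAIr UI/backend.",
}


def build_collection(collection_id: str, description: str) -> dict:
    """Return a minimal STAC Collection with placeholder license and extent."""
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "description": description,
        "license": "proprietary",  # assumption: placeholder value
        "extent": {
            "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
            "temporal": {"interval": [[None, None]]},
        },
        # assumption: collections live one directory below the root catalog
        "links": [{"rel": "parent", "href": "../catalog.json"}],
    }


def build_catalog() -> dict:
    """Return the root catalog with one child link per collection."""
    return {
        "type": "Catalog",
        "stac_version": STAC_VERSION,
        "id": "fair-models",
        "description": "fAIr model and dataset catalog.",
        "links": [
            {"rel": "child", "href": f"./{cid}/collection.json"}
            for cid in COLLECTIONS
        ],
    }


catalog = build_catalog()
collections = [build_collection(cid, desc) for cid, desc in COLLECTIONS.items()]
print(json.dumps(catalog, indent=2))
```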

What STAC Items Contain

All fields are from existing STAC/MLM standards. Custom fair:* fields are avoided wherever a standard exists.

Base model item¶

See models/unet_segmentation/stac-item.json for a complete example. All three base models (unet_segmentation, resnet18_classification, yolo11n_detection) follow this structure.

Key properties: mlm:name, mlm:architecture, mlm:tasks, mlm:framework, mlm:input (with pre_processing_function), mlm:output (with post_processing_function and classification:classes), mlm:hyperparameters, keywords.

Key assets: model (weights), source-code (with mlm:entrypoint), training-runtime / inference-runtime (Docker image or "local").
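
As a rough illustration of the shape described above, the sketch below writes a trimmed base model item as a Python dict and checks it for the listed properties and assets. Only the field and asset names come from this page. Every value (hrefs, entrypoint strings, hyperparameters) is a made-up placeholder, the item omits required STAC fields such as geometry and datetime, and treating all listed keys as mandatory is an assumption.

```python
# Illustrative only: placeholder values, not the contents of stac-item.json.
base_item = {
    "type": "Feature",
    "id": "unet-segmentation",
    "collection": "base-models",
    "properties": {
        "mlm:name": "unet-segmentation",
        "mlm:architecture": "U-Net",
        "mlm:tasks": ["semantic-segmentation"],
        "mlm:framework": "pytorch",
        "mlm:input": [
            {
                "name": "rgb-chips",
                "pre_processing_function": {
                    "format": "python",
                    "expression": "example_model.pipeline:preprocess",
                },
            }
        ],
        "mlm:output": [
            {
                "name": "building-mask",
                "classification:classes": [{"name": "building", "value": 1}],
                "post_processing_function": {
                    "format": "python",
                    "expression": "example_model.pipeline:postprocess",
                },
            }
        ],
        "mlm:hyperparameters": {"epochs": 10, "learning_rate": 0.001},
        "keywords": ["building", "polygon"],
    },
    "assets": {
        "model": {"href": "s3://example-bucket/unet/weights.pt"},
        "source-code": {
            "href": "https://example.org/example_model.zip",
            "mlm:entrypoint": "example_model.pipeline:training_pipeline",
        },
        "training-runtime": {"href": "local"},
        "inference-runtime": {"href": "local"},
    },
}

# Names taken from the "Key properties" and "Key assets" lists above.
KEY_PROPERTIES = (
    "mlm:name", "mlm:architecture", "mlm:tasks", "mlm:framework",
    "mlm:input", "mlm:output", "mlm:hyperparameters", "keywords",
)
KEY_ASSETS = ("model", "source-code", "training-runtime", "inference-runtime")


def missing_fields(item: dict) -> list[str]:
    """List the key properties and assets that the item does not define."""
    props = item.get("properties", {})
    assets = item.get("assets", {})
    missing = [f"properties.{name}" for name in KEY_PROPERTIES if name not in props]
    missing += [f"assets.{name}" for name in KEY_ASSETS if name not in assets]
    return missing


print(missing_fields(base_item))
```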

The mlm:entrypoint tells the backend which Python function to call. pre_processing_function / post_processing_function are standard MLM Processing Expression fields.
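
One way a backend could turn such an entrypoint string into a callable is sketched below. It assumes the string uses the `package.module:function` form; this page does not specify the format, and `resolve_entrypoint` is a hypothetical helper, not part of the fAIr backend.

```python
import importlib
from typing import Any, Callable


def resolve_entrypoint(expression: str) -> Callable[..., Any]:
    """Resolve a 'package.module:function' string to the callable it names."""
    module_name, sep, attr_path = expression.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {expression!r}")
    target: Any = importlib.import_module(module_name)
    # Walk dotted attribute paths such as 'Class.method'.
    for part in attr_path.split("."):
        target = getattr(target, part)
    if not callable(target):
        raise TypeError(f"{expression!r} does not name a callable")
    return target


# Usage with a standard-library function standing in for a model entrypoint.
dumps = resolve_entrypoint("json:dumps")
print(dumps({"ok": True}))
```

The same helper would serve both `mlm:entrypoint` and the `expression` of a Python processing function, since both name a callable.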

Local model item¶

Same MLM fields as base model, plus:

derived_from link pointing to the base model item
derived_from link pointing to the dataset item used for training
mlm:model asset pointing to the finetuned weights on S3
Runtime assets that reference the same Docker image as the parent base model
Version Extension: version, deprecated, predecessor-version / successor-version / latest-version links
mlm:hyperparameters reflects the actual training params used
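
The list above can be read as a transformation of the base model item. The following is a minimal sketch of that transformation under stated assumptions: `build_local_item` and its parameters are hypothetical, the id pattern is inferred from the example ids in the catalog hierarchy, and all hrefs are placeholders. Version links between items are left out here because they depend on sibling versions.

```python
import copy


def build_local_item(
    base_item: dict,
    *,
    base_href: str,
    dataset_href: str,
    weights_href: str,
    region: str,
    version: int,
    hyperparameters: dict,
) -> dict:
    """Derive a local model item from a base model item without mutating it."""
    item = copy.deepcopy(base_item)  # runtime assets are inherited unchanged
    item["id"] = f"{base_item['id']}-finetuned-{region}-v{version}"
    item["collection"] = "local-models"

    props = item.setdefault("properties", {})
    props["version"] = str(version)
    props["deprecated"] = False
    props["mlm:hyperparameters"] = dict(hyperparameters)  # actual training params

    # Two derived_from links: the parent base model and the training dataset.
    item["links"] = [
        {"rel": "derived_from", "href": base_href},
        {"rel": "derived_from", "href": dataset_href},
    ]
    # Swap the weights asset for the finetuned weights.
    item.setdefault("assets", {})["model"] = {
        "href": weights_href,
        "roles": ["mlm:model"],
    }
    return item


base_item = {
    "id": "unet-segmentation",
    "collection": "base-models",
    "properties": {"mlm:hyperparameters": {"epochs": 10}},
    "assets": {
        "model": {"href": "s3://example-bucket/base/unet.pt", "roles": ["mlm:model"]},
        "inference-runtime": {"href": "ghcr.io/example/unet:1"},
    },
}

local_item = build_local_item(
    base_item,
    base_href="../base-models/unet-segmentation.json",
    dataset_href="../datasets/buildings-banepa-segmentation.json",
    weights_href="s3://example-bucket/local/unet-banepa-v2.pt",
    region="banepa",
    version=2,
    hyperparameters={"epochs": 25},
)
print(local_item["id"])
```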

Dataset item¶

Uses the Label and File extensions. Properties: label:type, label:tasks, label:classes, keywords. Assets: chips (image directory), labels (GeoJSON).

Tagging and Classification¶

| Concept | Standard field | Example values |
| --- | --- | --- |
| ML task | `mlm:tasks` | `semantic-segmentation`, `object-detection` |
| Feature type tags | `keywords` (STAC core) | `building`, `road`, `tree` |
| Output geometry | `keywords` (STAC core) | `polygon`, `line`, `point` |
| Output classes | `classification:classes` | `{name: "building", value: 1}` |
| Dataset label type | `label:type` (Label ext) | `vector`, `raster` |
| Dataset label task | `label:tasks` (Label ext) | `segmentation`, `detection` |
| Pre/post processing | `pre_processing_function` / `post_processing_function` (MLM) | Python entrypoint |

Compatibility Validation¶

Warning

The backend validates that a base model and dataset are compatible before triggering finetuning. Validation is based on matching keywords and mlm:tasks / label:tasks between the model and dataset STAC items.
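
A minimal sketch of such a check follows. The exact matching rules are not spelled out here, so two assumptions are made and should be read as such: MLM task names are mapped to Label Extension task names through a hypothetical lookup table (the two vocabularies differ, e.g. `semantic-segmentation` vs `segmentation`), and keywords are considered matching when the model and dataset share at least one.

```python
# Assumed mapping from mlm:tasks values to label:tasks values.
MLM_TO_LABEL_TASK = {
    "semantic-segmentation": "segmentation",
    "object-detection": "detection",
    "classification": "classification",
}


def compatibility_errors(model_item: dict, dataset_item: dict) -> list[str]:
    """Return human-readable reasons why the pair is incompatible (empty if OK)."""
    model_props = model_item.get("properties", {})
    data_props = dataset_item.get("properties", {})
    errors = []

    # Translate model tasks into the Label Extension vocabulary before comparing.
    model_tasks = {MLM_TO_LABEL_TASK.get(t, t) for t in model_props.get("mlm:tasks", [])}
    label_tasks = set(data_props.get("label:tasks", []))
    if not model_tasks & label_tasks:
        errors.append(
            f"no shared task: model {sorted(model_tasks)} vs dataset {sorted(label_tasks)}"
        )

    shared_keywords = set(model_props.get("keywords", [])) & set(
        data_props.get("keywords", [])
    )
    if not shared_keywords:
        errors.append("no shared keywords between model and dataset")
    return errors


model = {
    "properties": {
        "mlm:tasks": ["semantic-segmentation"],
        "keywords": ["building", "polygon"],
    }
}
segmentation_data = {
    "properties": {"label:tasks": ["segmentation"], "keywords": ["building"]}
}
detection_data = {
    "properties": {"label:tasks": ["detection"], "keywords": ["building"]}
}

print(compatibility_errors(model, segmentation_data))
print(compatibility_errors(model, detection_data))
```

Returning a list of reasons rather than a boolean lets the backend show the user why a model and dataset cannot be paired.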

Flows¶

fAIr-models workflow

1. Base Model Registration (PR workflow)¶

flowchart TD
    A[Model Developer] -->|Prepares PR| B[fAIr-Models GitHub]
    B -->|CI: build, validate, test| C{Review}
    C -->|Merge| D[Post-merge CLI / CI]
    D --> E[Build + push Docker image]
    D --> F[Upload weights to S3]
    D --> G[Register STAC item in base-models]
    G --> H[STAC: base-models/model-name v1]

2. Finetuning (ZenML pipeline)¶

flowchart TD
    A[User picks base model + dataset] --> B[fAIr Backend]
    B -->|Read STAC items| C[Validate compatibility]
    C --> D[Generate ZenML YAML config]
    D --> E[ZenML Pipeline in model Docker]
    E --> F[load_data]
    E --> G[preprocess_data]
    E --> H[train_model]
    E --> I[evaluate_model]
    H --> J[ZenML Model Control Plane]
    I --> J
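
The "Generate ZenML YAML config" step could look roughly like the sketch below, which derives a run configuration dict from the two STAC items. The top-level `parameters` and `settings.docker.parent_image` keys follow ZenML's run configuration layout, but the parameter names, the use of the `training-runtime` href as the parent image, and the `build_run_config` helper itself are assumptions for illustration.

```python
import json


def build_run_config(model_item: dict, dataset_item: dict, overrides=None) -> dict:
    """Build a run configuration dict from a base model item and a dataset item."""
    # Start from the item's default hyperparameters, then apply user overrides.
    hyperparameters = {
        **model_item["properties"].get("mlm:hyperparameters", {}),
        **(overrides or {}),
    }
    return {
        "parameters": {
            "base_model_id": model_item["id"],
            "dataset_id": dataset_item["id"],
            "chips_href": dataset_item["assets"]["chips"]["href"],
            "labels_href": dataset_item["assets"]["labels"]["href"],
            "hyperparameters": hyperparameters,
        },
        "settings": {
            "docker": {
                "parent_image": model_item["assets"]["training-runtime"]["href"]
            }
        },
    }


model_item = {
    "id": "unet-segmentation",
    "properties": {"mlm:hyperparameters": {"epochs": 10, "learning_rate": 0.001}},
    "assets": {"training-runtime": {"href": "ghcr.io/example/unet:1"}},
}
dataset_item = {
    "id": "buildings-banepa-segmentation",
    "assets": {
        "chips": {"href": "s3://example-bucket/banepa/chips/"},
        "labels": {"href": "s3://example-bucket/banepa/labels.geojson"},
    },
}

config = build_run_config(model_item, dataset_item, overrides={"epochs": 25})
# Printed as JSON to stay within the standard library; a YAML library such as
# PyYAML could write the same dict to a .yaml file.
print(json.dumps(config, indent=2))
```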

3. Promotion to STAC¶

flowchart TD
    A[User picks best version] --> B[fAIr Backend]
    B --> C[ZenML: set stage = production]
    B --> D[StacCatalogManager]
    D --> E[Build STAC MLM item]
    D --> F[Deprecate previous version]
    D --> G[Add Version Extension links]
    E --> H[STAC: local-models/model-v3 production]

| ZenML action | STAC effect |
| --- | --- |
| Promote to production | Create item, deprecate previous |
| Archive version | Set `deprecated: true` on item |
| Delete version | Remove item from collection |
| Delete model | Remove all items + clean up |
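
The mapping above can be sketched on an in-memory list holding the STAC items of one finetuned model. This is not the `StacCatalogManager` implementation, which is not shown on this page; the function names and the `./<id>.json` href layout are hypothetical.

```python
def _href(item: dict) -> str:
    """Assumed relative href layout for items within the collection."""
    return f"./{item['id']}.json"


def promote(versions: list, new_item: dict) -> None:
    """Add new_item as the production version and deprecate the previous one."""
    new_item.setdefault("properties", {})["deprecated"] = False
    new_links = new_item.setdefault("links", [])
    for old in versions:
        # Every older item points at the newest version exactly once.
        links = [l for l in old.get("links", []) if l["rel"] != "latest-version"]
        links.append({"rel": "latest-version", "href": _href(new_item)})
        if not old["properties"].get("deprecated", False):
            # The item that was in production becomes the direct predecessor.
            old["properties"]["deprecated"] = True
            links.append({"rel": "successor-version", "href": _href(new_item)})
            new_links.append({"rel": "predecessor-version", "href": _href(old)})
        old["links"] = links
    versions.append(new_item)


def archive(versions: list, item_id: str) -> None:
    """Archive version: mark the item as deprecated but keep it."""
    for item in versions:
        if item["id"] == item_id:
            item["properties"]["deprecated"] = True


def delete_version(versions: list, item_id: str) -> None:
    """Delete version: remove the item from the collection."""
    versions[:] = [item for item in versions if item["id"] != item_id]


versions = [{"id": "model-v1", "properties": {"deprecated": False}}]
promote(versions, {"id": "model-v2", "properties": {}})
promote(versions, {"id": "model-v3", "properties": {}})
for item in versions:
    print(item["id"], item["properties"]["deprecated"])
```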

4. Inference¶

Works for both base models and local models. The STAC item always has enough information to run inference: model weights, inference runtime, input/output spec.
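
To make "enough information" concrete, the sketch below pulls the pieces a runner would need out of an item. The asset keys follow the base model item description; storing `"local"` in the runtime asset's `href`, the `inference_plan` helper, and all sample values are assumptions.

```python
def inference_plan(item: dict) -> dict:
    """Collect weights, runtime, entrypoint and I/O spec from a model item."""
    try:
        assets = item["assets"]
        props = item["properties"]
        runtime = assets["inference-runtime"]["href"]
        return {
            "weights": assets["model"]["href"],
            "runtime": runtime,
            "use_docker": runtime != "local",  # "local" means no container
            "entrypoint": assets["source-code"]["mlm:entrypoint"],
            "inputs": props["mlm:input"],
            "outputs": props["mlm:output"],
        }
    except KeyError as exc:
        raise ValueError(f"item {item.get('id')!r} is missing {exc}") from exc


item = {
    "id": "unet-segmentation-finetuned-banepa-v2",
    "properties": {
        "mlm:input": [{"name": "rgb-chips"}],
        "mlm:output": [{"name": "building-mask"}],
    },
    "assets": {
        "model": {"href": "s3://example-bucket/local/unet-banepa-v2.pt"},
        "source-code": {
            "href": "https://example.org/example_model.zip",
            "mlm:entrypoint": "example_model.pipeline:inference_pipeline",
        },
        "inference-runtime": {"href": "ghcr.io/example/unet:1"},
    },
}

plan = inference_plan(item)
print(plan["weights"], plan["runtime"], plan["entrypoint"])
```

Because base and local model items share the same fields, the same function applies to either collection.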

Identity Model¶

| Concept | Example | ZenML | STAC |
| --- | --- | --- | --- |
| Base model | `unet-segmentation` | Not in ZenML MCP | Item in `base-models` |
| Finetuned model | `unet-segmentation-finetuned-banepa` | ZenML Model (many versions) | Item(s) in `local-models` |
| Specific version | `unet-segmentation-finetuned-banepa` v2 | ZenML Model Version 2 | Item `unet-segmentation-finetuned-banepa-v2` |
| Dataset | `buildings-banepa-segmentation` | Not in ZenML MCP | Item in `datasets` |
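
The mapping between a ZenML model version and its STAC item id can be sketched as a pair of helpers. The `-v<number>` suffix is inferred from the example ids in the table; the function names are hypothetical.

```python
import re


def stac_item_id(model_name: str, version: int) -> str:
    """Build the STAC item id for a given ZenML model name and version number."""
    return f"{model_name}-v{version}"


# The greedy name group leaves only the final '-v<digits>' for the version.
_ITEM_ID = re.compile(r"^(?P<name>.+)-v(?P<version>\d+)$")


def parse_stac_item_id(item_id: str) -> tuple[str, int]:
    """Split a local model item id back into (ZenML model name, version)."""
    match = _ITEM_ID.match(item_id)
    if match is None:
        raise ValueError(f"{item_id!r} has no '-v<number>' suffix")
    return match["name"], int(match["version"])


item_id = stac_item_id("unet-segmentation-finetuned-banepa", 2)
print(item_id)
print(parse_stac_item_id(item_id))
```

Base model and dataset ids such as `unet-segmentation` carry no version suffix, so `parse_stac_item_id` rejects them with a `ValueError`.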

Infrastructure¶

| Component | Local | Production |
| --- | --- | --- |
| STAC Catalog | pystac JSON catalog | stac-fastapi + pgstac |
| ZenML | SQLite | ZenML Server (PostgreSQL) |
| Orchestrator | `local` | Kubernetes |
| Artifact Store | local filesystem | S3 |
| Experiment Tracker | MLflow | MLflow |
| Container Registry | local Docker | ghcr.io |

Architecture Decisions

STAC replaces ZenML Model Registry: STAC is a downstream publish target via StacCatalogManager, not a ZenML stack component.
STAC item = self-sufficient source of truth: contains everything needed to run training or inference.
Finetuned models share parent pipeline code: only weights differ between base and local models.
Standards over custom fields: mlm:tasks, keywords, classification:classes instead of custom fair:* fields.
YAML-based training & inference: every run is driven by a generated config logged as a ZenML artifact.
MLM Processing Expression for dispatch: pre_processing_function / post_processing_function use Python entrypoints.
Pipeline contract: every model must export training_pipeline and inference_pipeline as @pipeline-decorated functions.
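
The pipeline contract lends itself to a simple check at registration time. The sketch below only verifies that a module exports both names as callables; confirming that they are really `@pipeline`-decorated would require ZenML and is out of scope here. `missing_pipeline_exports` and the stand-in module are hypothetical.

```python
import types

REQUIRED_EXPORTS = ("training_pipeline", "inference_pipeline")


def missing_pipeline_exports(module: types.ModuleType) -> list[str]:
    """Return the required pipeline names that the module does not export."""
    return [
        name
        for name in REQUIRED_EXPORTS
        if not callable(getattr(module, name, None))
    ]


# Stand-in for a model package; a real one would define ZenML pipelines.
fake_model = types.ModuleType("fake_model")
fake_model.training_pipeline = lambda: "trained"

print(missing_pipeline_exports(fake_model))
```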
