Utilities
Data helpers and model validation utilities.
Data Helpers
fair.utils.data
S3 data helpers for model pipelines.
Uses universal-pathlib (UPath) on top of fsspec for unified local/S3 file access. fsspec/s3fs reads the AWS_ENDPOINT_URL environment variable natively, which provides MinIO compatibility.
Caching: fsspec supports URL chaining (simplecache::s3://, filecache::s3://, blockcache::s3://). Caching is opt-in: model developers add a cache prefix to the href as needed.
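As a minimal sketch of what opting in looks like, the helper below prefixes a remote href with one of the three caching protocols named above. The function name `with_cache` is hypothetical and not part of fair.utils.data; it only builds the chained URL string, which could then be handed to any fsspec-based opener.

```python
def with_cache(href: str, cache: str = "simplecache") -> str:
    """Prefix a remote URL with an fsspec caching protocol (hypothetical helper)."""
    allowed = {"simplecache", "filecache", "blockcache"}
    if cache not in allowed:
        raise ValueError(f"unsupported cache protocol: {cache}")
    # Local paths have no scheme and need no cache layer.
    if "://" not in href:
        return href
    # Already chained URLs (containing "::") are returned unchanged.
    if "::" in href:
        return href
    return f"{cache}::{href}"


print(with_cache("s3://bucket/chips/OAM-1.tif"))
print(with_cache("s3://bucket/chips/OAM-1.tif", cache="filecache"))
print(with_cache("/data/chips/OAM-1.tif"))
```

Keeping the prefix in the href itself means the choice of cache stays with the caller rather than being baked into the helpers.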
list_files(href, pattern='*')
List files under href matching glob pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `href` | `str` | Local path or s3://bucket/prefix. | required |
| `pattern` | `str` | Glob pattern (e.g. "OAM-*.tif"). | `'*'` |
Source code in fair/utils/data.py
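To illustrate the glob filtering for the local case only, here is a stand-in written with the standard library. It is not the fair.utils.data implementation: the name `list_local_files`, the sorted list-of-strings return value, and the non-recursive matching are all assumptions, since the return type and recursion behaviour are not documented here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_local_files(href: str, pattern: str = "*") -> list[str]:
    """Return sorted paths of files directly under href whose names match pattern."""
    root = Path(href)
    # Path.glob applies the pattern relative to root; directories are skipped.
    return sorted(str(p) for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    for name in ("OAM-1.tif", "OAM-2.tif", "notes.txt"):
        (Path(tmp) / name).touch()
    names = [Path(p).name for p in list_local_files(tmp, "OAM-*.tif")]

print(names)
```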
count_chips(chips_href)
Count image files in a chips directory (local or S3).
Counts files matching common raster extensions. Useful for setting fair:chip_count on STAC dataset items.
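The counting rule can be sketched for a local directory as follows. The extension set is an assumption, because the description only says "common raster extensions"; the name `count_local_chips` is likewise hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed extension set; the documented function may use a different one.
RASTER_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def count_local_chips(chips_dir: str) -> int:
    """Count files in chips_dir whose suffix is a known raster extension."""
    return sum(
        1
        for path in Path(chips_dir).iterdir()
        if path.is_file() and path.suffix.lower() in RASTER_EXTENSIONS
    )


with tempfile.TemporaryDirectory() as tmp:
    for name in ("OAM-1.tif", "OAM-2.TIF", "preview.png", "metadata.json"):
        (Path(tmp) / name).touch()
    chip_count = count_local_chips(tmp)

print(chip_count)
```

The resulting integer is the kind of value that would be stored under fair:chip_count on a STAC dataset item.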
resolve_path(href, local_dir=None)
Download a single remote file to local cache. Local paths pass through.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `href` | `str` | Local path or s3://bucket/key URI. | required |
| `local_dir` | `Path \| None` | Download target directory. Defaults to /tmp/fair-data. | `None` |
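The pass-through rule and the default cache directory can be sketched without touching the network. The function below only computes where a file would land and performs no download. The `bucket/key` layout under the cache root is an assumption, as is the name `planned_local_path`.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse

DEFAULT_CACHE = Path("/tmp/fair-data")  # default stated in the parameter table


def planned_local_path(href: str, local_dir: Path | None = None) -> Path:
    """Return the local path for href: unchanged if local, under the cache if remote."""
    parsed = urlparse(href)
    if parsed.scheme == "":
        # Local paths pass through untouched.
        return Path(href)
    root = local_dir if local_dir is not None else DEFAULT_CACHE
    # Assumed layout: <root>/<bucket>/<key>
    return root / parsed.netloc / parsed.path.lstrip("/")


print(planned_local_path("/data/chips/OAM-1.tif"))
print(planned_local_path("s3://bucket/chips/OAM-1.tif"))
print(planned_local_path("s3://bucket/chips/OAM-1.tif", Path("/cache")))
```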
resolve_directory(href, pattern='*', local_dir=None)
Download all files under a remote prefix to local cache. Local paths pass through.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `href` | `str` | Local path or s3://bucket/prefix. | required |
| `pattern` | `str` | Glob pattern to filter files (e.g. "OAM-*.tif"). | `'*'` |
| `local_dir` | `Path \| None` | Download target root. Defaults to /tmp/fair-data. | `None` |
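For a remote prefix, the pattern acts as a filter over the object keys found under that prefix. A rough sketch of that selection step, assuming the pattern is matched against each key's file name only (the documentation does not say whether it applies to the full key), with a hypothetical helper name:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_keys(keys: list, prefix: str, pattern: str = "*") -> list:
    """Keep keys under prefix whose file name matches the glob pattern."""
    normalized = prefix.rstrip("/") + "/"
    return [
        key
        for key in keys
        # fnmatchcase keeps matching case-sensitive on every platform.
        if key.startswith(normalized) and fnmatchcase(PurePosixPath(key).name, pattern)
    ]


keys = [
    "chips/OAM-1.tif",
    "chips/OAM-2.tif",
    "chips/readme.md",
    "labels/OAM-1.geojson",
]
selected = select_keys(keys, "chips", "OAM-*.tif")
print(selected)
```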
create_dataset_archive(chips_dir, labels_dir, output_path)
Zip chips and labels directories into a single archive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chips_dir` | `str` | Path (local or s3://) to the chips directory. | required |
| `labels_dir` | `str` | Path (local or s3://) to the labels directory. | required |
| `output_path` | `str` | Local path for the output .zip file. | required |
Returns:
| Type | Description |
|---|---|
| `str` | The output_path after the archive is written. |
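A local-only sketch of the archiving step, using the standard zipfile module, might look like the following. The `chips/` and `labels/` folder names inside the archive are an assumption about the layout, and `zip_chips_and_labels` is a hypothetical stand-in that does not handle s3:// inputs.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_chips_and_labels(chips_dir: str, labels_dir: str, output_path: str) -> str:
    """Write both directories into one zip and return output_path."""
    sources = (("chips", Path(chips_dir)), ("labels", Path(labels_dir)))
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for top_level, folder in sources:
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    # Assumed layout: files keep their relative path under chips/ or labels/.
                    arcname = Path(top_level, path.relative_to(folder)).as_posix()
                    archive.write(path, arcname)
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    chips = Path(tmp, "chips_in")
    labels = Path(tmp, "labels_in")
    chips.mkdir()
    labels.mkdir()
    (chips / "OAM-1.tif").write_bytes(b"chip")
    (labels / "OAM-1.geojson").write_text("{}")
    out = zip_chips_and_labels(str(chips), str(labels), str(Path(tmp, "dataset.zip")))
    with zipfile.ZipFile(out) as archive:
        members = sorted(archive.namelist())

print(members)
```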
upload_item_assets(item, data_prefix, collection_id)
Upload local asset files to S3 and rewrite hrefs in-place.
Assets are uploaded to a deterministic path: {data_prefix}/{collection_id}/{item.id}/{asset_key}/... Single files are uploaded directly; directories are uploaded recursively. Remote hrefs are left untouched.
Returns the item with rewritten hrefs.
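The href rewriting can be sketched on a plain dictionary without performing any upload. Several things here are assumptions: the item is modelled as a dict rather than a STAC item object, the trailing `...` of the path template is taken to be the file name, and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def planned_asset_href(data_prefix, collection_id, item_id, asset_key, local_href):
    """Build the destination href for one local asset."""
    # Assumption: the "..." in the documented template is the asset's file name.
    name = PurePosixPath(local_href).name
    return f"{data_prefix.rstrip('/')}/{collection_id}/{item_id}/{asset_key}/{name}"


def rewrite_hrefs(item: dict, data_prefix: str, collection_id: str) -> dict:
    """Rewrite local asset hrefs in place; the upload itself is omitted in this sketch."""
    for asset_key, asset in item["assets"].items():
        if "://" in asset["href"]:
            continue  # remote hrefs are left untouched
        asset["href"] = planned_asset_href(
            data_prefix, collection_id, item["id"], asset_key, asset["href"]
        )
    return item


item = {
    "id": "item-1",
    "assets": {
        "chips": {"href": "/data/chips.zip"},
        "labels": {"href": "s3://other-bucket/labels.geojson"},
    },
}
result = rewrite_hrefs(item, "s3://bucket/data", "my-collection")
print(result["assets"]["chips"]["href"])
print(result["assets"]["labels"]["href"])
```

Because the destination depends only on the prefix, collection, item id, and asset key, re-running the step for the same item targets the same location.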
Model Validator
fair.utils.model_validator
Validate that model contributions have the required pipeline entrypoints.
Uses AST parsing (no imports, no runtime dependencies) to check that every model's pipeline.py defines the required @pipeline- and @step-decorated functions.
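The AST approach can be sketched as below: the source is parsed but never executed, so the decorators' framework does not need to be installed. This is an illustration, not the fair.utils.model_validator code. The helper name, the sample pipeline.py contents, and the `some_framework` import are hypothetical, and the sketch only checks that each decorator appears at least once.

```python
from __future__ import annotations

import ast

REQUIRED_DECORATORS = {"pipeline", "step"}


def decorated_functions(source: str) -> dict[str, set[str]]:
    """Map each decorator name to the functions that carry it, without importing."""
    found: dict[str, set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for decorator in node.decorator_list:
            # Unwrap calls such as @pipeline(name="...") to reach the decorator itself.
            target = decorator.func if isinstance(decorator, ast.Call) else decorator
            if isinstance(target, ast.Attribute):
                name = target.attr  # e.g. @framework.step
            else:
                name = getattr(target, "id", None)  # e.g. @step
            if name:
                found.setdefault(name, set()).add(node.name)
    return found


SAMPLE_PIPELINE = '''
from some_framework import pipeline, step  # hypothetical import, never executed

@step
def train_model():
    ...

@pipeline(name="training")
def training_pipeline():
    train_model()
'''

found = decorated_functions(SAMPLE_PIPELINE)
missing = REQUIRED_DECORATORS - found.keys()
print(found)
print(missing)
```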