OME-IRIS#
OME-IRIS is an open bioimage dataset catalog for benchmarking image input/output (IO), transformations, metadata management, and bioimage-linked workflows.
We also provide a small Python package by the same name (ome_iris) to help fetch and validate the datasets in the catalog.
Inspired by both the classic iris.csv dataset and the iris of the eye that brings images into focus, OME-IRIS aims to provide a collection of reference datasets for evaluating interoperable bioimage data formats, tools, and workflows.
What this is#
A lightweight manifest catalog for small benchmark datasets
A fetch + verify workflow with a single CLI
LinkML-based schema definitions for dataset manifests
What this is not#
Not a data portal
Not DVC-based
Not a large-file git storage approach
Not a full ontology or end-to-end benchmark system yet
Quick start#
uv run ome-iris fetch --tier small
uv run ome-iris verify
uv run ome-iris export-rocrate --dataset nf1-cellpainting-shrunken
Download a reproducible subset for local development or benchmarking:
uv run ome-iris download nf1 \
  --output .benchmark-data/ome-iris/nf1 \
  --preset tiny \
  --channel DAPI
Python API:
from ome_iris import datasets

datasets.download(
    "nf1",
    output_dir=".benchmark-data/ome-iris/nf1",
    subset={"images": 20, "channels": ["DAPI"]},
)
Fetch output modes:
uv run ome-iris fetch --tier small --verbose # show per-file labels + downloader progress
uv run ome-iris fetch --tier small --silent # suppress downloader progress output
What fetch does#
High-level flow when you run ome-iris fetch:
Loads dataset manifests from --manifests-dir.
Applies optional filters (--dataset, --tier).
Creates local dataset roots under --data-dir/<source_identifier>/.
Writes ro-crate-metadata.json into each dataset root.
Iterates over each files entry:
for kind: file: downloads the file URL (or skips if already present)
for kind: directory: traverses/downloads directory contents (or extracts archive sources)
Reports a summary:
downloaded count + item list
skipped count + item list
missing URLs
failed downloads
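The download-or-skip step and the summary categories above can be sketched as follows. This is a minimal illustration and not the package's implementation: FetchSummary, fetch_entries, and the injected download callable are hypothetical names, entries are assumed to be plain dicts with a path and an optional url, and the order of the checks (existing file first, then missing URL) is an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class FetchSummary:
    """Hypothetical container mirroring the four summary categories."""

    downloaded: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    missing_urls: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def fetch_entries(
    entries: list[dict],
    dataset_root: Path,
    download: Callable[[str, Path], None],
) -> FetchSummary:
    """Download each file entry unless it is already present locally."""
    summary = FetchSummary()
    for entry in entries:
        rel_path = entry["path"]
        target = dataset_root / rel_path
        url = entry.get("url")
        if target.exists():
            # Already on disk: reuse it instead of downloading again.
            summary.skipped.append(rel_path)
        elif not url:
            # Nothing to download from; report it rather than fail.
            summary.missing_urls.append(rel_path)
        else:
            try:
                target.parent.mkdir(parents=True, exist_ok=True)
                download(url, target)
                summary.downloaded.append(rel_path)
            except OSError:
                # Network and filesystem errors surface as OSError subclasses.
                summary.failed.append(rel_path)
    return summary
```

Passing the downloader in as a callable keeps the sketch independent of any particular HTTP library and makes the skip logic easy to exercise with a stub.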
Output layout example:
data/
  NF1_cellpainting_data_shrunken/
    ro-crate-metadata.json
    profiles.parquet
    images/
    masks/
Local files are stored under ./data/ by default.
Each dataset directory also gets ro-crate-metadata.json with source/provenance metadata from the manifest.
To use another data directory:
uv run ome-iris fetch --data-dir /tmp/ome-iris-data
uv run ome-iris verify --data-dir /tmp/ome-iris-data
What download does#
ome-iris download creates a small, reproducible subset under the exact --output
directory. It supports named dataset aliases such as nf1, preset sizes
(tiny, small, benchmark), image limits, channel filters, plate/well/site
filters, and Z/T/C ranges where filenames expose those values.
Downloaded subsets include manifest.json with the source dataset, selected
subset options, downloaded file paths, source URLs, SHA-256 checksums, file
sizes, image shapes, dtypes, and file metadata. Existing files are reused and
included in the manifest. Use --validate-only to verify an existing subset
cache against its manifest without downloading data:
uv run ome-iris download nf1 \
  --output .benchmark-data/ome-iris/nf1 \
  --validate-only
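To illustrate what checking a subset cache against its manifest involves, here is a minimal sketch using only the standard library. It is not the tool's own code: validate_subset and sha256_of are hypothetical helpers, and the manifest layout (a top-level files list whose records carry path and sha256) is an assumption, since the real manifest.json may be structured differently.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large images are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_subset(output_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the cache matches.

    Assumes manifest.json holds {"files": [{"path": ..., "sha256": ...}]}.
    """
    manifest = json.loads((output_dir / "manifest.json").read_text())
    problems = []
    for record in manifest.get("files", []):
        target = output_dir / record["path"]
        if not target.is_file():
            problems.append(f"missing: {record['path']}")
        elif sha256_of(target) != record["sha256"]:
            problems.append(f"checksum mismatch: {record['path']}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every stale or missing file in a single pass.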
Add a dataset#
Add or update a dataset manifest and catalog metadata.
Include source, formats, and file-level metadata.
Run:
uv run ome-iris verify
Starter scaffolding commands:
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --append-csv
uv run ome-iris scaffold --source-path /path/to/JUMP_plate_BR00117006 --include-directory-entry --directory-path images --archive-format zip
The command guesses a dataset id/name/formats, writes a starter YAML manifest, and prints a suggested datasets.csv row.
File entry patterns#
source_identifier is required at the top level of each manifest.
All files[].path values are relative to data/<source_identifier>/.
sha256 is optional for file entries.
Use kind: directory to fetch everything under a directory source.
For GitHub tree URLs (https://github.com/<owner>/<repo>/tree/<ref>/<path>), OME-IRIS traverses files under that subtree.
For local directory paths, OME-IRIS recursively copies files.
For archive URLs, set archive_format (zip or tar) to extract an archive into the destination directory.
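As an illustration of the GitHub tree URL shape described above, the sketch below splits such a URL into its owner, repo, ref, and path components. GitHubTree and parse_github_tree_url are hypothetical names, not part of ome_iris, and the sketch assumes the ref is a single path segment (branch names containing slashes would need a lookup against the repository to resolve).

```python
from typing import NamedTuple
from urllib.parse import urlparse


class GitHubTree(NamedTuple):
    owner: str
    repo: str
    ref: str
    path: str


def parse_github_tree_url(url: str) -> GitHubTree:
    """Split https://github.com/<owner>/<repo>/tree/<ref>/<path> into parts."""
    parsed = urlparse(url)
    parts = [segment for segment in parsed.path.split("/") if segment]
    if parsed.netloc != "github.com" or len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a GitHub tree URL: {url}")
    # Assumption: the ref is exactly one segment; the rest is the subtree path.
    owner, repo, _, ref, *rest = parts
    return GitHubTree(owner, repo, ref, "/".join(rest))
```

A URL that points at the root of a ref (no trailing path) yields an empty path string.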
Relationships#
Use an optional top-level relationships list to describe links between dataset components.
from: source file path (must match a files[].path)
to: target file path (must match a files[].path)
type: relationship label (for example links_to_images_by, links_to_masks_by, references_metadata)
rocrate_predicate: explicit RO-Crate/JSON-LD predicate URI for export (required)
via_columns (optional): explicit table columns used for linking
filename_patterns (optional): standardized filename templates used by the relationship
derived_from_columns (optional): columns used when deriving one component from another (for example images -> masks)
Example:
files:
  - path: profiles.parquet
  - path: images
    kind: directory
relationships:
  - from: profiles.parquet
    to: images
    type: links_to_images_by
    rocrate_predicate: http://schema.org/associatedMedia
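The constraints on relationships (from and to must match a files[].path, and rocrate_predicate is required) could be checked along these lines. This is a sketch under assumptions: check_relationships is a hypothetical helper, and the manifest is taken to be already parsed into a plain dict mirroring the example above.

```python
def check_relationships(manifest: dict) -> list[str]:
    """Return human-readable errors for relationship entries that break the rules."""
    known_paths = {entry["path"] for entry in manifest.get("files", [])}
    errors = []
    for index, rel in enumerate(manifest.get("relationships", [])):
        for key in ("from", "to"):
            if rel.get(key) not in known_paths:
                errors.append(
                    f"relationships[{index}].{key} does not match any "
                    f"files[].path: {rel.get(key)!r}"
                )
        if not rel.get("rocrate_predicate"):
            errors.append(f"relationships[{index}] is missing rocrate_predicate")
    return errors


# Parsed form of the example manifest fragment.
manifest = {
    "files": [
        {"path": "profiles.parquet"},
        {"path": "images", "kind": "directory"},
    ],
    "relationships": [
        {
            "from": "profiles.parquet",
            "to": "images",
            "type": "links_to_images_by",
            "rocrate_predicate": "http://schema.org/associatedMedia",
        }
    ],
}
errors = check_relationships(manifest)
```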
Example directory entry:
files:
  - path: jump-plate/images
    kind: directory
    archive_format: zip
    url: https://example.org/jump-plate-images.zip
    sha256: "" # optional
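For a directory entry backed by an archive, extraction into the destination directory might look like the sketch below, which relies on the standard library's shutil.unpack_archive. extract_archive is a hypothetical helper, not the package's API, and treating archive_format: tar as an uncompressed tar is an assumption (compressed tarballs use other shutil format names such as gztar).

```python
import shutil
from pathlib import Path

# The two archive_format values named in the manifest rules.
SUPPORTED_FORMATS = {"zip", "tar"}


def extract_archive(archive: Path, destination: Path, archive_format: str) -> None:
    """Unpack an already-downloaded archive into the entry's destination directory."""
    if archive_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported archive_format: {archive_format!r}")
    destination.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(destination), format=archive_format)
```

Archives from untrusted sources deserve extra care, since member paths can point outside the destination directory.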
Custom metadata (first-class)#
OME-IRIS supports custom metadata as a first-class field via custom_metadata objects at manifest, source, and file levels.
Rules:
custom_metadata must be an object/map.
Keys must be strings.
Values may be strings, numbers, booleans, null, lists, or nested objects.
Example:
id: jump-plate
source_identifier: JUMP_plate_BR00117006
name: JUMP plate BR00117006 (JUMP_plate_BR00117006) example
description: Plate-level cell painting benchmark subset.
tier: small
license: CC-BY-4.0
custom_metadata:
  study: jump-cp
  species: human
source:
  repository: https://example.org/repo
  path: datasets/JUMP_plate_BR00117006
  url: https://example.org/repo/tree/main/datasets/JUMP_plate_BR00117006
formats: [csv, tiff]
files:
  - path: profiles.csv
    url: https://example.org/files/profiles.csv
    sha256: "..."
    custom_metadata:
      role: profile_table
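The custom_metadata rules can be expressed as a small recursive check. The sketch below is illustrative only: validate_custom_metadata and _check_value are hypothetical names, and applying the string-key rule to nested objects as well as the top level is an assumption.

```python
from typing import Any


def _check_value(value: Any, where: str) -> None:
    """Accept strings, numbers, booleans, null, lists, and nested objects."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return
    if isinstance(value, list):
        for index, item in enumerate(value):
            _check_value(item, f"{where}[{index}]")
        return
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"{where}: key {key!r} is not a string")
            _check_value(item, f"{where}.{key}")
        return
    raise TypeError(f"{where}: unsupported type {type(value).__name__}")


def validate_custom_metadata(metadata: Any) -> None:
    """Raise TypeError if metadata breaks any of the custom_metadata rules."""
    if not isinstance(metadata, dict):
        raise TypeError("custom_metadata must be an object/map")
    _check_value(metadata, "custom_metadata")
```

The where argument threads a dotted location through the recursion so an error message points at the offending key.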
Why large files are not committed#
Large image/profile files make repositories slow and fragile for contributors and CI. OME-IRIS tracks metadata and download locations, while actual data is fetched locally when needed.
Documentation#
Build docs locally:
uv sync --group docs
uv run --frozen sphinx-build docs/src docs/build