Cytomining Workflow
iceberg-bioimage treats Cytomining interoperability as a primary workflow.
The package supports two common paths:
Create a Cytomining-compatible Parquet warehouse root directly from image stores such as Zarr or OME-TIFF.
Materialize a Cytomining-compatible Parquet warehouse root from existing Iceberg metadata tables.
These exports are designed to be useful to tools like pycytominer while
keeping image scanning, metadata canonicalization, and namespace handling in
this repository.
Each warehouse root also carries a warehouse_manifest.json file so appended
tables are described by role, join keys, provenance, and columns rather than
only by directory name.
Warehouse Layout
The Parquet warehouse root can contain:
image_assets/
chunk_index/
joined_profiles/
image_assets is the base metadata table.
chunk_index is optional and only contains rows for chunked assets.
joined_profiles is optional and is written when a profile table is provided.
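To check which of these tables a given warehouse root actually contains, you can list its subdirectories and read warehouse_manifest.json. The helper below is a hypothetical sketch, not part of iceberg-bioimage. It assumes each table is an immediate subdirectory of the root, and it returns the manifest as plain parsed JSON without assuming anything about its schema.

```python
import json
from pathlib import Path


def describe_warehouse(root):
    """Summarize a warehouse root: its table directories plus the raw manifest.

    Hypothetical helper. Assumes tables are immediate subdirectories of the
    root and that the manifest, if present, is <root>/warehouse_manifest.json.
    """
    root = Path(root)
    tables = sorted(p.name for p in root.iterdir() if p.is_dir())
    manifest_path = root / "warehouse_manifest.json"
    # Parse the manifest as generic JSON; its internal layout is not assumed.
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else None
    return {
        "tables": tables,
        "has_chunk_index": "chunk_index" in tables,
        "has_joined_profiles": "joined_profiles" in tables,
        "manifest": manifest,
    }
```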
Export From Image Stores
Use this when a Cytomining project starts from raw image data and wants a Parquet warehouse root immediately:
from iceberg_bioimage import export_store_to_cytomining_warehouse
result = export_store_to_cytomining_warehouse(
"data/experiment.zarr",
"warehouse-root",
profiles="data/profiles.parquet",
profile_dataset_id="experiment",
)
print(result.to_dict())
CLI:
iceberg-bioimage export-cytomining \
--warehouse-root warehouse-root \
--profiles data/profiles.parquet \
--profile-dataset-id experiment \
data/experiment.zarr
Export From Existing Iceberg Metadata
Use this when a project already has image_assets and chunk_index in an
Iceberg catalog and wants a Cytomining warehouse root for downstream tools:
from iceberg_bioimage import export_catalog_to_cytomining_warehouse
result = export_catalog_to_cytomining_warehouse(
"default",
"bioimage.cytotable",
"warehouse-root",
profiles="data/profiles.parquet",
profile_dataset_id="experiment",
)
print(result.to_dict())
CLI:
iceberg-bioimage export-cytomining-catalog \
--catalog default \
--namespace bioimage.cytotable \
--warehouse-root warehouse-root \
--profiles data/profiles.parquet \
--profile-dataset-id experiment
Existing Warehouse Roots
Both export helpers support:
mode="overwrite" for replacing target tables
mode="append" for adding additional Parquet parts to an existing warehouse
mode="overwrite" is table-scoped and does not remove unrelated table
directories in the same warehouse root.
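To make the difference between the two modes concrete, here is a toy model that uses plain files instead of Parquet. It is a sketch of the documented behavior under stated assumptions, not the library's implementation: each table is assumed to be a directory of part files, "overwrite" clears only the target table's directory, and "append" adds another part beside the existing ones. The helper name write_table_part and the part-file naming are hypothetical.

```python
import shutil
import uuid
from pathlib import Path


def write_table_part(root, table_name, payload, mode="append"):
    """Toy model of table-scoped overwrite/append (hypothetical helper).

    Writes placeholder bytes instead of Parquet. "overwrite" removes only
    <root>/<table_name>; sibling table directories are never touched.
    """
    if mode not in {"overwrite", "append"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    table_dir = Path(root) / table_name
    if mode == "overwrite" and table_dir.exists():
        # Table-scoped: only this table's parts are discarded.
        shutil.rmtree(table_dir)
    table_dir.mkdir(parents=True, exist_ok=True)
    # A unique name lets appended parts coexist with earlier ones.
    part = table_dir / f"part-{uuid.uuid4().hex}.bin"
    part.write_bytes(payload)
    return part
```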
This makes it possible to incrementally add datasets from multiple assays or plates into the same Cytomining-oriented warehouse root.
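For a folder that holds one image store per plate, a small discovery step can drive that incremental loop. The helper below is hypothetical, and its suffix list (.zarr, .ome.tif, .ome.tiff) is an assumption about how stores are named on disk.

```python
from pathlib import Path

# Assumed naming conventions for image stores; adjust for your project.
STORE_SUFFIXES = (".zarr", ".ome.tif", ".ome.tiff")


def find_image_stores(data_dir):
    """Return Zarr directories and OME-TIFF files directly under data_dir, sorted by name."""
    stores = []
    for path in sorted(Path(data_dir).iterdir()):
        # Zarr stores are directories and OME-TIFFs are files; match on name only.
        if path.name.lower().endswith(STORE_SUFFIXES):
            stores.append(path)
    return stores
```

Each returned path could then be passed to export_store_to_cytomining_warehouse, using mode="overwrite" for the first store and mode="append" for the rest, so every plate lands in the same warehouse root.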
ExampleHuman To Cytomining Workflow
One useful pattern for Cytomining projects is:
Use CytoTable to convert ExampleHuman-style measurement outputs into an Iceberg-backed warehouse.
Use iceberg-bioimage to materialize that metadata into a Cytomining Parquet warehouse root.
Append downstream pycytominer outputs as named warehouse tables.
Append downstream coSMicQC outputs as named warehouse tables.
That looks like:
# 1. External step: build or update the CytoTable/Iceberg warehouse
# from ExampleHuman measurement outputs.
# 2. Export the Iceberg-backed metadata into a Cytomining warehouse root.
iceberg-bioimage export-cytomining-catalog \
--catalog default \
--namespace bioimage.cytotable \
--warehouse-root warehouse-root \
--profiles data/examplehuman_profiles.parquet \
--profile-dataset-id ExampleHuman
# 3. Append a pycytominer output table.
iceberg-bioimage export-cytomining-profiles \
--warehouse-root warehouse-root \
--table-name pycytominer_profiles \
--profile-dataset-id ExampleHuman \
data/pycytominer_output.parquet
# 4. Append a coSMicQC output table.
iceberg-bioimage export-cytomining-profiles \
--warehouse-root warehouse-root \
--table-name cosmicqc_profiles \
--profile-dataset-id ExampleHuman \
data/cosmicqc_output.parquet
After those steps, the same warehouse root can contain:
image_assets/
chunk_index/
joined_profiles/
pycytominer_profiles/
cosmicqc_profiles/
This keeps the image metadata and downstream Cytomining analysis outputs in one portable Parquet layout.
Generic Table Export
For Cytomining projects with outputs that do not fit one narrow static convention, use the generic table export API and record the table role in the manifest:
import pyarrow as pa
from iceberg_bioimage import export_table_to_cytomining_warehouse
result = export_table_to_cytomining_warehouse(
pa.table(
{
"dataset_id": ["ExampleHuman"],
"image_id": ["ExampleHuman:0"],
"embedding_0": [0.1],
"embedding_1": [0.2],
}
),
"warehouse-root",
table_name="embeddings",
role="embeddings",
join_keys=["dataset_id", "image_id"],
source_type="custom",
source_ref="my-embedding-pipeline",
)
print(result.to_dict())
This is the intended scaling path for additional Cytomining outputs such as:
embeddings
QC summaries
annotations
segmentation metrics
experiment-level reports
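Recording join keys for each table is what lets a downstream tool line up a table such as embeddings with image_assets. As an illustration only, the pure-Python left join below works on lists of row dicts; in practice a dataframe or Arrow join would do the same job. The function name left_join is hypothetical.

```python
def left_join(left_rows, right_rows, join_keys):
    """Left-join two lists of row dicts on the given join keys (illustrative sketch).

    Right rows must be unique per key. Left rows without a match are kept
    with only their own columns.
    """
    index = {}
    for row in right_rows:
        key = tuple(row[k] for k in join_keys)
        if key in index:
            raise ValueError(f"duplicate join key in right table: {key}")
        index[key] = row
    joined = []
    for row in left_rows:
        merged = dict(row)
        match = index.get(tuple(row[k] for k in join_keys))
        if match is not None:
            # Join-key values are equal on both sides; copy only the other columns.
            merged.update({c: v for c, v in match.items() if c not in join_keys})
        joined.append(merged)
    return joined
```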
Cytomining Metadata Compatibility
Profile-table compatibility is designed for common Cytomining conventions.
The join and export paths recognize aliases such as:
Metadata_dataset_id
Metadata_ImageID
Metadata_Plate
Metadata_Well
Metadata_Site
If a profile table does not include dataset_id but all rows belong to a
single dataset, pass profile_dataset_id.
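The alias handling and the profile_dataset_id fallback can be pictured with the sketch below. It is not iceberg-bioimage's code: the alias table maps canonical names to the Metadata_* columns listed above, and the canonical names plate_id and site_id, like both helper functions, are assumptions for illustration.

```python
# Hypothetical alias map: canonical name -> accepted Cytomining column names.
DEFAULT_ALIASES = {
    "dataset_id": ["Metadata_dataset_id"],
    "image_id": ["Metadata_ImageID"],
    "plate_id": ["Metadata_Plate"],
    "well_id": ["Metadata_Well"],
    "site_id": ["Metadata_Site"],
}


def canonicalize_columns(columns, aliases=DEFAULT_ALIASES):
    """Rename columns that match a known alias; leave all others unchanged."""
    lookup = {alias: canonical for canonical, names in aliases.items() for alias in names}
    return [lookup.get(name, name) for name in columns]


def fill_dataset_id(rows, profile_dataset_id=None):
    """Add dataset_id to rows that lack it, using a single dataset-wide value."""
    if all("dataset_id" in row for row in rows):
        return rows
    if profile_dataset_id is None:
        raise ValueError("profile table has no dataset_id; pass profile_dataset_id")
    # Existing dataset_id values win; missing ones get the dataset-wide value.
    return [{"dataset_id": profile_dataset_id, **row} for row in rows]
```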
If your project uses custom column names, load aliases from TOML and pass them into the profile export path:
[microscopy.aliases]
dataset_id = ["ProjectID"]
image_id = ["ImageKey"]
well_id = ["WellName"]
from iceberg_bioimage import (
export_profiles_to_cytomining_warehouse,
load_profile_column_aliases,
)
aliases = load_profile_column_aliases("aliases.toml")
export_profiles_to_cytomining_warehouse(
"data/custom_profiles.parquet",
"warehouse-root",
table_name="custom_profiles",
alias_map=aliases,
)
OME-Arrow and Other Columns
This repository does not try to reinterpret arbitrary non-tabular payload columns. If a profile table or joined output includes OME-Arrow payload-related columns, they are preserved in the Parquet export as long as the join keys and the analyzable feature columns remain valid.
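One way to sanity-check a table before export is to confirm that the join keys are present and non-null, and then separate numeric feature columns from everything else, which would simply be carried through. The sketch below is a hypothetical pre-flight check, not this repository's logic. Treating "numeric" as "analyzable feature" is an assumption, and the table is modeled as a plain dict of column name to values.

```python
import numbers


def classify_columns(columns, join_keys):
    """Split columns into numeric features and pass-through columns (hypothetical check).

    `columns` maps column name -> list of values. Raises if any join key is
    missing or contains nulls.
    """
    for key in join_keys:
        if key not in columns or any(v is None for v in columns[key]):
            raise ValueError(f"join key {key!r} is missing or has null values")
    features, passthrough = [], []
    for name, values in columns.items():
        if name in join_keys:
            continue
        present = [v for v in values if v is not None]
        # bool is a subclass of int, so exclude it from numeric features.
        is_numeric = bool(present) and all(
            isinstance(v, numbers.Real) and not isinstance(v, bool) for v in present
        )
        (features if is_numeric else passthrough).append(name)
    return features, passthrough
```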