Metadata Workflow Example#
This notebook demonstrates the stable, user-facing API without requiring a live catalog. It covers three common tasks:
build a canonical
ScanResultsummarize it into a user-facing
DatasetSummaryjoin it to profile data in Arrow
import pyarrow as pa
from iceberg_bioimage import (
ImageAsset,
ScanResult,
join_profiles_with_scan_result,
summarize_scan_result,
)
from iceberg_bioimage.publishing.chunk_index import scan_result_to_chunk_rows
from iceberg_bioimage.publishing.image_assets import scan_result_to_rows
scan_result = ScanResult(
source_uri="data/example.ome.zarr",
format_family="zarr",
image_assets=[
ImageAsset(
uri="data/example.ome.zarr",
array_path="0",
shape=[1, 1, 256, 256],
dtype="uint16",
chunk_shape=[1, 1, 128, 128],
metadata={
"axes": "czyx",
"channel_count": 1,
"storage_variant": "zarr-v2",
},
image_id="example:0",
)
],
)
scan_result.to_dict()
summary = summarize_scan_result(scan_result)
summary.to_dict()
profiles = pa.table(
{
"dataset_id": ["example"],
"image_id": ["example:0"],
"cell_count": [42],
}
)
joined = join_profiles_with_scan_result(scan_result, profiles, include_chunks=True)
joined.to_pydict()
{
"image_assets": scan_result_to_rows(scan_result),
"chunk_index_count": len(scan_result_to_chunk_rows(scan_result)),
}
Optional OME-Arrow path#
If you install the optional ome-arrow extra, the package also exposes a
small bridge for Arrow-native image payload workflows:
from iceberg_bioimage import create_ome_arrow, scan_ome_arrow
oa = create_ome_arrow("image.ome.tiff")
lazy_oa = scan_ome_arrow("image.ome.parquet")
That keeps metadata registration and Arrow-native image handling adjacent, without pulling OME-Arrow into the required dependency set.