DuckDB Integration
DuckDB is the first supported query integration for iceberg-bioimage, but it
is intentionally optional. The core package focuses on scanning,
canonicalization, Cytomining warehouse export, validation, and publishing.
Install
uv sync --group duckdb
For install alternatives and first-run workflow selection, see Getting Started.
Supported helper functions
- create_duckdb_connection
- query_metadata_table
- join_image_assets_with_profiles
- load_catalog_table
- catalog_table_to_arrow
- join_catalog_image_assets_with_profiles
These helpers operate on canonical metadata in Parquet, Arrow, or row-list form. Catalog-backed helpers use PyIceberg to materialize canonical metadata tables into Arrow before querying. None of these helpers replace catalog management, storage access, or image IO.
For Cytomining workflows, a common pattern is:
- export image_assets, optional chunk_index, and optional joined_profiles into a Parquet warehouse root
- use pycytominer to load those Parquet datasets directly
- use DuckDB helpers here when you want lightweight SQL over the same metadata
Example
import pyarrow as pa

from iceberg_bioimage import join_image_assets_with_profiles

image_assets = pa.table(
    {
        "dataset_id": ["ds-1"],
        "image_id": ["img-1"],
        "array_path": ["0"],
        "uri": ["data/example.zarr"],
    }
)
profiles = pa.table(
    {
        "dataset_id": ["ds-1"],
        "image_id": ["img-1"],
        "cell_count": [42],
    }
)

joined = join_image_assets_with_profiles(image_assets, profiles)
print(joined.to_pydict())
Catalog-backed example
import pyarrow as pa

from iceberg_bioimage import join_catalog_image_assets_with_profiles

profiles = pa.table(
    {
        "dataset_id": ["ds-1"],
        "image_id": ["img-1"],
        "cell_count": [42],
    }
)

joined = join_catalog_image_assets_with_profiles(
    "default",
    "bioimage.cytotable",
    profiles,
    chunk_index_table="chunk_index",
)
print(joined.to_pydict())