Python API#

ome_arrow#

Init file for ome_arrow package.

ome_arrow.core#

Core classes of the ome_arrow package.

class ome_arrow.core.OMEArrow(data: str | dict | pa.StructScalar | 'np.ndarray', tcz: Tuple[int, int, int] = (0, 0, 0), *, dim_order: str | None = None, column_name: str = 'ome_arrow', row_index: int = 0, image_type: str | None = None, lazy: bool = False)[source]#

Bases: object

Small convenience toolkit for working with OME-Arrow data.

If input is a TIFF path, this loads it via tiff_to_ome_arrow. If input is a dict, it will be converted using to_struct_scalar. If input is already a pa.StructScalar, it is used as-is.

In Jupyter, evaluating the instance will render the first plane using matplotlib (via _repr_html_). Call view_matplotlib() to select a specific (z, t, c) plane.

Parameters:
  • data – TIFF path, nested dict, pa.StructScalar, or np.ndarray.

  • tcz – Default (t, c, z) indices used for view helpers.

  • dim_order – Axis labels for array input; inferred from rank when None.

  • column_name – OME-Arrow column name for tabular sources.

  • row_index – Row index for tabular sources.

  • image_type – Optional image type override.

  • lazy – If True, defer loading until first use.

collect() OMEArrow[source]#

Materialize deferred source data and return self.

Returns:

The same instance after materialization.

Return type:

OMEArrow

property data: StructScalar#

Return the materialized OME-Arrow StructScalar.

Returns:

Materialized OME-Arrow record.

Return type:

pa.StructScalar

Raises:

RuntimeError – If the record could not be initialized.

export(how: str = 'numpy', dtype: np.dtype = np.uint16, strict: bool = True, clamp: bool = False, *, out: str | None = None, dim_order: str = 'TCZYX', compression: str | None = 'zlib', compression_level: int = 6, tile: tuple[int, int] | None = None, chunks: tuple[int, int, int, int, int] | None = None, zarr_compressor: str | None = 'zstd', zarr_level: int = 7, use_channel_colors: bool = False, parquet_column_name: str = 'ome_arrow', parquet_compression: str | None = 'zstd', parquet_metadata: dict[str, str] | None = None, vortex_column_name: str = 'ome_arrow', vortex_metadata: dict[str, str] | None = None) np.array | dict | pa.StructScalar | str[source]#

Export the OME-Arrow content in a chosen representation.

Parameters:
  • how – Output representation: “numpy” → TCZYX np.ndarray; “dict” → plain Python dict; “scalar” → pa.StructScalar (as-is); “ome-tiff” → write OME-TIFF via BioIO; “ome-zarr” → write OME-Zarr (OME-NGFF) via BioIO; “parquet” → write a single-row Parquet with one struct column; “vortex” → write a single-row Vortex file with one struct column.

  • dtype – Target dtype for “numpy”/writers (default: np.uint16).

  • strict – For “numpy”: raise if a plane has wrong pixel length.

  • clamp – For “numpy”/writers: clamp values into dtype range before cast.

  Keyword-only (writer-specific) parameters:

  • out – Output path (required for ‘ome-tiff’, ‘ome-zarr’, and ‘parquet’).

  • dim_order – Axes string for BioIO writers; default “TCZYX”.

  • compression / compression_level / tile – OME-TIFF options (passed through to tifffile via BioIO).

  • chunks / zarr_compressor / zarr_level – OME-Zarr options (chunk shape, compressor hint, level). If chunks is None, a TCZYX default is chosen (1, 1, <=4, <=512, <=512).

  • use_channel_colors – Try to embed per-channel display colors when safe; otherwise omitted.

  • parquet_* – Options for Parquet export (column name, compression, file metadata).

  • vortex_* – Options for Vortex export (column name, file metadata).

Returns:

  • “numpy”: np.ndarray (T, C, Z, Y, X)

  • “dict”: dict

  • “scalar”: pa.StructScalar

  • “ome-tiff”: output path (str)

  • “ome-zarr”: output path (str)

  • “parquet”: output path (str)

  • “vortex”: output path (str)

Return type:

Any

Raises:

ValueError – Unknown ‘how’ or missing required parameters.
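The TCZYX chunk default mentioned for “ome-zarr” export can be sketched as follows. This is a minimal illustration of the documented rule only, assuming each capped axis is simply clamped at its documented maximum; `default_zarr_chunks` is a hypothetical helper name, not part of the ome_arrow API.

```python
# Hypothetical sketch of the documented default TCZYX chunk rule for
# "ome-zarr" export: when `chunks` is None, a (1, 1, <=4, <=512, <=512)
# chunk shape is derived from the image dimensions.
def default_zarr_chunks(size_t, size_c, size_z, size_y, size_x):
    # T and C chunks are always 1; Z is capped at 4, Y/X at 512.
    return (1, 1, min(size_z, 4), min(size_y, 512), min(size_x, 512))
```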

info() Dict[str, Any][source]#

Describe the OME-Arrow data structure.

Returns:

  • shape: (T, C, Z, Y, X)

  • type: classification string

  • summary: human-readable text

Return type:

dict with keys

property is_lazy: bool#

Return whether this instance still has deferred work.

classmethod scan(data: str, *, tcz: Tuple[int, int, int] = (0, 0, 0), column_name: str = 'ome_arrow', row_index: int = 0, image_type: str | None = None) OMEArrow[source]#

Create a lazily-loaded OMEArrow, similar to Polars scan semantics.

Parameters:
  • data – Input source path/URL.

  • tcz – Default (t, c, z) indices used for view helpers.

  • column_name – OME-Arrow column name for tabular sources.

  • row_index – Row index for tabular sources.

  • image_type – Optional image type override.

Returns:

Lazily planned OMEArrow instance.

Return type:

OMEArrow

slice(x_min: int, x_max: int, y_min: int, y_max: int, t_indices: Iterable[int] | None = None, c_indices: Iterable[int] | None = None, z_indices: Iterable[int] | None = None, fill_missing: bool = True) OMEArrow[source]#

Create a cropped copy of an OME-Arrow record.

Crops spatially to [y_min:y_max, x_min:x_max] (half-open) and, if provided, filters/reindexes T/C/Z to the given index sets.

Parameters:
  • x_min (int) – Inclusive minimum X bound of the crop in pixels (0-based).

  • x_max (int) – Exclusive maximum X bound of the crop in pixels (0-based).

  • y_min (int) – Inclusive minimum Y bound of the crop in pixels (0-based).

  • y_max (int) – Exclusive maximum Y bound of the crop in pixels (0-based).

  • t_indices (Iterable[int] | None) – Optional explicit T indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • c_indices (Iterable[int] | None) – Optional explicit C indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • z_indices (Iterable[int] | None) – Optional explicit Z indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • fill_missing (bool) – If True, any missing (t,c,z) planes in the selection are zero-filled.

Returns:

New OME-Arrow record with updated sizes and planes.

Return type:

OMEArrow object
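The half-open crop and the index reindexing described above can be mimicked on a plain TCZYX array. This NumPy-only sketch illustrates the documented semantics; it does not call ome_arrow itself.

```python
import numpy as np

# Illustration of slice() semantics on a plain TCZYX array: the spatial
# crop is half-open [y_min:y_max, x_min:x_max], and kept T/C/Z indices
# are reindexed to 0..len-1 in the result.
arr = np.arange(2 * 3 * 4 * 8 * 8).reshape(2, 3, 4, 8, 8)  # (T, C, Z, Y, X)

y_min, y_max, x_min, x_max = 1, 5, 2, 6
c_keep = [0, 2]  # channels to retain; reindexed to 0..1 in the output

cropped = arr[:, c_keep, :, y_min:y_max, x_min:x_max]
```

Channel 2 of the input becomes channel 1 of the output, matching the reindex-to-0..len-1 behavior.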

slice_lazy(x_min: int, x_max: int, y_min: int, y_max: int, t_indices: Iterable[int] | None = None, c_indices: Iterable[int] | None = None, z_indices: Iterable[int] | None = None, fill_missing: bool = True) OMEArrow[source]#

Return a lazily planned slice, collected on first execution.

For lazy sources created with OMEArrow.scan(...), this queues a deferred slice operation and returns a new lazy OMEArrow plan produced from OMEArrow.scan(...). For already materialized sources, this falls back to eager slice(). This method does not mutate self.

Notes

slice_lazy always returns a new plan object. Internally, the returned plan gets a fresh _lazy_slices list ([*self._lazy_slices, new_slice]), so chained plans do not share mutable slice state with the original OMEArrow. A common footgun is: oa.slice_lazy(...).collect() followed by oa.tensor_view(...). Those calls can load/materialize the same source twice because oa remains the original plan. For a single-load workflow, keep working from the value returned by slice_lazy / collect.

Parameters:
  • x_min – Inclusive minimum X index for the crop.

  • x_max – Exclusive maximum X index for the crop.

  • y_min – Inclusive minimum Y index for the crop.

  • y_max – Exclusive maximum Y index for the crop.

  • t_indices – Optional time indices to retain.

  • c_indices – Optional channel indices to retain.

  • z_indices – Optional depth indices to retain.

  • fill_missing – Whether to zero-fill missing (t, c, z) planes.

Returns:

Lazy plan when source is lazy; eager slice result otherwise.

Return type:

OMEArrow
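The “fresh plan” behavior called out in the Notes can be sketched with a toy class. `Plan` is illustrative only, not the real OMEArrow implementation; the point is that each call copies the pending-operation list instead of mutating it in place.

```python
# Toy sketch of slice_lazy's plan semantics: each call returns a NEW plan
# whose pending-slice list is a copy ([*old, new]), so chained plans never
# share mutable state with the original.
class Plan:
    def __init__(self, ops=()):
        self._lazy_slices = list(ops)

    def slice_lazy(self, bounds):
        # Return a new plan; never mutate self._lazy_slices in place.
        return Plan([*self._lazy_slices, bounds])

base = Plan()
plan_a = base.slice_lazy((0, 4, 0, 4))
plan_b = plan_a.slice_lazy((1, 3, 1, 3))
```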

tensor_view(*, scene: int | None = None, t: int | slice | Sequence[int] | None = None, z: int | slice | Sequence[int] | None = None, c: int | slice | Sequence[int] | None = None, roi: tuple[int, int, int, int] | None = None, roi3d: tuple[int, int, int, int, int, int] | None = None, roi_nd: tuple[int, ...] | None = None, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None = None, tile: tuple[int, int] | None = None, layout: str | None = None, dtype: dtype | None = None, chunk_policy: Literal['auto', 'combine', 'keep'] = 'auto', channel_policy: Literal['error', 'first'] = 'error') TensorView | LazyTensorView[source]#

Create a TensorView of the pixel data.

Parameters:
  • scene – Scene index (only 0 is supported for single-image records).

  • t – Time index selection (int, slice, or sequence). Default: all.

  • z – Z index selection (int, slice, or sequence). Default: all.

  • c – Channel index selection (int, slice, or sequence). Default: all.

  • roi – Spatial crop (x, y, w, h) in pixels.

  • roi3d – Spatial + depth crop (x, y, z, w, h, d) in pixels/planes. This is a convenience alias for roi=(x, y, w, h) and z=slice(z, z + d).

  • roi_nd – General ROI tuple with min/max bounds.

  • roi_type – ROI interpretation mode for roi_nd. Supported values: "2d", "2d_timelapse", "3d", and "4d".

  • tile – Tile index (tile_y, tile_x) based on chunk grid.

  • layout – Desired layout string using TZCYX letters where T=time, Z=depth, C=channel, Y=row axis, X=column axis. TZCHW aliases are also accepted for compatibility.

  • dtype – Output dtype override.

  • chunk_policy – Handling for pyarrow.ChunkedArray inputs.

  • channel_policy – Behavior when dropping C from layout while multiple channels are selected. “error” raises (default). “first” keeps the first channel.

Returns:

Tensor view over selected pixels. In lazy mode, this returns a deferred LazyTensorView that resolves on first execution call (for example to_numpy()) without forcing self to materialize unless deferred slice_lazy operations are queued.

Return type:

TensorView | LazyTensorView

Raises:

ValueError – If an unsupported scene is requested.
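The roi3d alias noted above, roi3d=(x, y, z, w, h, d) being equivalent to roi=(x, y, w, h) with z=slice(z, z + d), can be shown on a plain TCZYX array. This NumPy-only sketch illustrates the equivalence without calling tensor_view.

```python
import numpy as np

# roi3d=(x, y, z, w, h, d) is documented as a convenience alias for
# roi=(x, y, w, h) combined with z=slice(z, z + d); both select the same
# region of a TCZYX array.
arr = np.zeros((1, 2, 6, 32, 32))  # (T, C, Z, Y, X)

x, y, z, w, h, d = 4, 8, 1, 10, 12, 3
via_roi3d = arr[:, :, z:z + d, y:y + h, x:x + w]
via_roi_and_z = arr[:, :, slice(z, z + d), y:y + h, x:x + w]
```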

view(how: str = 'matplotlib', tcz: tuple[int, int, int] = (0, 0, 0), autoscale: bool = True, vmin: int | None = None, vmax: int | None = None, cmap: str = 'gray', show: bool = True, c: int | None = None, downsample: int = 1, opacity: str | float = 'sigmoid', clim: tuple[float, float] | None = None, show_axes: bool = True, scaling_values: tuple[float, float, float] | None = None) tuple[matplotlib.figure.Figure, Any, Any] | 'pyvista.Plotter'[source]#

Render an OME-Arrow record using Matplotlib or PyVista.

This convenience method supports two rendering backends:

  • how="matplotlib" renders a single (t, c, z) plane as a 2D image.

  • how="pyvista" creates an interactive 3D PyVista visualization.

Parameters:
  • how – Rendering backend. One of "matplotlib" or "pyvista".

  • tcz – (t, c, z) indices used for plane display.

  • autoscale – Infer Matplotlib display limits from image range when vmin/vmax are not provided.

  • vmin – Lower display limit for Matplotlib intensity scaling.

  • vmax – Upper display limit for Matplotlib intensity scaling.

  • cmap – Matplotlib colormap name for single-channel display.

  • show – Whether to display the plot immediately.

  • c – Channel index override for PyVista. If None, uses tcz[1].

  • downsample – Integer downsampling factor for PyVista views. Higher values render faster for large volumes but reduce spatial resolution.

  • opacity – Opacity for PyVista. Either a float in [0, 1] or "sigmoid".

  • clim – Contrast limits (low, high) for PyVista rendering.

  • show_axes – Whether to display axes in the PyVista scene.

  • scaling_values – Physical scale multipliers (x, y, z) used by PyVista. If None, uses OME metadata-derived scaling.

Returns:

tuple[matplotlib.figure.Figure, matplotlib.axes.Axes, matplotlib.image.AxesImage] | pyvista.Plotter: For how="matplotlib", returns the tuple emitted by ome_arrow.view.view_matplotlib() as (figure, axes, image). For how="pyvista", returns a pyvista.Plotter.

Raises:
  • ValueError – If a requested plane is not found or the render mode is unsupported.

  • TypeError – If parameter types are invalid.

Notes

  • The how="pyvista" mode normally outputs an interactive visualization, but attempts to embed a static PNG snapshot for non-interactive renderers (for example, static docs builds, nbconvert HTML/PDF exports, rendered/read-only notebook views such as GitHub notebook previews, and CI log viewers).

  • When show=False and how="pyvista", the returned pyvista.Plotter can be shown later.

ome_arrow.ingest#

Converting to and from OME-Arrow formats.

ome_arrow.ingest.from_jax_array(arr: Any, *, dim_order: str | None = None, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True, chunk_shape: Tuple[int, int, int] | None=(1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", dtype_meta: str | None = None) StructScalar[source]#

Build an OME-Arrow StructScalar from a JAX array.

This is useful when your pipeline already works with jax.Array objects and you want a direct path into the canonical OME-Arrow struct without manual conversion boilerplate in user code.

Parameters:
  • arr – jax.Array image data.

  • dim_order – Axis labels for arr. If None, infer from rank: 2D->”YX”, 3D->”ZYX”, 4D->”TCYX”, 5D->”TCZYX”.

  • image_id – Optional stable image identifier.

  • name – Optional human label.

  • image_type – Open-ended image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names. Defaults to None. When None (or length does not match channel count), names are auto-generated as C0..C{n-1} (for example, 3 channels become C0, C1, C2).

  • acquisition_datetime – Defaults to now (UTC) if None.

  • clamp_to_uint16 – If True, clamp/cast planes to uint16 before serialization.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes.

  • physical_size_x – Spatial pixel size (µm) for X.

  • physical_size_y – Spatial pixel size (µm) for Y.

  • physical_size_z – Spatial pixel size (µm) for Z when present.

  • physical_size_unit – Unit string for spatial axes (default “µm”).

  • dtype_meta – Pixel dtype string to place in metadata.

Returns:

Typed OME-Arrow record.

Return type:

pa.StructScalar

ome_arrow.ingest.from_numpy(arr: ndarray, *, dim_order: str = "TCZYX", image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True, chunk_shape: Tuple[int, int, int] | None=(1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", dtype_meta: str | None = None) StructScalar[source]#

Build an OME-Arrow StructScalar from a NumPy array.

Parameters:
  • arr – Image data with axes described by dim_order.

  • dim_order – Axis labels for arr. Must include “Y” and “X”. Supported examples: “YX”, “ZYX”, “CYX”, “CZYX”, “TYX”, “TCYX”, “TCZYX”.

  • image_id – Optional stable image identifier.

  • name – Optional human label.

  • image_type – Open-ended image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names. Defaults to None. When None (or length does not match channel count), names are auto-generated as C0..C{n-1} (for example, 3 channels become C0, C1, C2).

  • acquisition_datetime – Defaults to now (UTC) if None.

  • clamp_to_uint16 – If True, clamp/cast planes to uint16 before serialization.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes.

  • physical_size_x – Spatial pixel size (µm) for X.

  • physical_size_y – Spatial pixel size (µm) for Y.

  • physical_size_z – Spatial pixel size (µm) for Z when present.

  • physical_size_unit – Unit string for spatial axes (default “µm”).

  • dtype_meta – Pixel dtype string to place in metadata; if None, inferred from the (possibly cast) array’s dtype.

Returns:

Typed OME-Arrow record (schema = OME_ARROW_STRUCT).

Return type:

pa.StructScalar

Raises:
  • TypeError – If arr is not a NumPy ndarray.

  • ValueError – If dim_order is invalid or dimensions are non-positive.

Notes

  • If Z is not in dim_order, size_z will be 1 and the meta dimension_order becomes “XYCT”; otherwise “XYZCT”.

  • If T/C are absent in dim_order, they default to size 1.
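The rank-based dim_order inference used by from_jax_array / from_torch_array, and the C0..C{n-1} channel-name fallback shared with from_numpy, can be sketched as below. The helper names are illustrative, not the package’s internal functions.

```python
# Sketch of two documented conventions: rank-based dim_order inference
# (2D->"YX", 3D->"ZYX", 4D->"TCYX", 5D->"TCZYX") and auto-generated
# channel names C0..C{n-1} when none (or a wrong number) are supplied.
def infer_dim_order(ndim):
    return {2: "YX", 3: "ZYX", 4: "TCYX", 5: "TCZYX"}[ndim]

def autogen_channel_names(n):
    return [f"C{i}" for i in range(n)]
```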

ome_arrow.ingest.from_ome_parquet(parquet_path: str | Path, *, column_name: str | None = 'ome_arrow', row_index: int = 0, strict_schema: bool = False, return_array: bool = False) StructScalar | tuple[StructScalar, StructArray][source]#

Read an OME-Arrow record from a Parquet file.

Parameters:
  • parquet_path – Path to the Parquet file.

  • column_name – Column to read; auto-detected when None or invalid.

  • row_index – Row index to extract.

  • strict_schema – Require the exact OME-Arrow schema if True.

  • return_array – When True, also return a 1-row StructArray.

Returns:

A typed OME-Arrow StructScalar, or (StructScalar, StructArray) when return_array=True.

Raises:
  • FileNotFoundError – If the Parquet path does not exist.

  • ValueError – If the row index is out of range or no suitable column exists.

Notes

This reader targets the row group containing row_index and requests only column_name when provided, avoiding eager full-table reads.

ome_arrow.ingest.from_ome_vortex(vortex_path: str | Path, *, column_name: str | None = 'ome_arrow', row_index: int = 0, strict_schema: bool = False, return_array: bool = False) StructScalar | tuple[StructScalar, StructArray][source]#

Read an OME-Arrow record from a Vortex file.

Parameters:
  • vortex_path – Path to the Vortex file.

  • column_name – Column to read; auto-detected when None or invalid.

  • row_index – Row index to extract.

  • strict_schema – Require the exact OME-Arrow schema if True.

  • return_array – When True, also return a 1-row StructArray.

Returns:

A typed OME-Arrow StructScalar, or (StructScalar, StructArray) when return_array=True.

Raises:
  • FileNotFoundError – If the Vortex path does not exist.

  • ImportError – If the optional vortex-data dependency is missing.

  • ValueError – If the row index is out of range or no suitable column exists.

ome_arrow.ingest.from_ome_zarr(zarr_path: str | Path, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True) StructScalar[source]#

Read an OME-Zarr directory and return a typed OME-Arrow StructScalar.

Uses BioIO with the OMEZarrReader backend to read TCZYX (or XY) data, flattens each YX plane into OME-Arrow planes, and builds a validated StructScalar via to_ome_arrow.

Parameters:
  • zarr_path – Path to the OME-Zarr directory (e.g., “image.ome.zarr”).

  • image_id – Optional stable image identifier (defaults to directory stem).

  • name – Optional display name (defaults to directory name).

  • image_type – Optional image kind (e.g., “image”, “label”).

  • channel_names – Optional list of channel names. Defaults to C0, C1, …

  • acquisition_datetime – Optional datetime (defaults to UTC now).

  • clamp_to_uint16 – If True, cast pixels to uint16.

Returns:

Validated OME-Arrow struct for this image.

Return type:

pa.StructScalar

ome_arrow.ingest.from_stack_pattern_path(pattern_path: str | Path, default_dim_for_unspecified: str = 'C', map_series_to: str | None = 'T', clamp_to_uint16: bool = True, channel_names: List[str] | None = None, image_id: str | None = None, name: str | None = None, image_type: str | None = None) StructScalar[source]#

Build an OME-Arrow record from a filename pattern describing a stack.

Parameters:
  • pattern_path – Path or pattern string describing the stack layout.

  • default_dim_for_unspecified – Dimension to use when tokens lack a dim.

  • map_series_to – Dimension to map series tokens to (e.g., “T”), or None.

  • clamp_to_uint16 – Whether to clamp pixel values to uint16.

  • channel_names – Optional list of channel names to apply.

  • image_id – Optional image identifier override.

  • name – Optional display name override.

  • image_type – Optional image kind (e.g., “image”, “label”).

Returns:

A validated OME-Arrow StructScalar describing the stack.

ome_arrow.ingest.from_tiff(tiff_path: str | Path, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True) StructScalar[source]#

Read a TIFF and return a typed OME-Arrow StructScalar.

Uses bioio to read TCZYX (or XY) data, flattens each YX plane, and delegates struct creation to to_struct_scalar.

Parameters:
  • tiff_path – Path to a TIFF readable by bioio.

  • image_id – Optional stable image identifier (defaults to stem).

  • name – Optional human label (defaults to file name).

  • image_type – Optional image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names; defaults to C0..C{n-1}.

  • acquisition_datetime – Optional acquisition time (UTC now if None).

  • clamp_to_uint16 – If True, clamp/cast planes to uint16.

Returns:

pa.StructScalar validated against the OME-Arrow schema.

ome_arrow.ingest.from_torch_array(arr: Any, *, dim_order: str | None = None, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True, chunk_shape: Tuple[int, int, int] | None=(1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", dtype_meta: str | None = None) StructScalar[source]#

Build an OME-Arrow StructScalar from a torch tensor.

This is useful when your pipeline already works with torch.Tensor objects (for example model inputs/outputs) and you want a direct path into the canonical OME-Arrow struct without manually converting and reshaping in user code.

Parameters:
  • arr – torch.Tensor image data.

  • dim_order – Axis labels for arr. If None, infer from rank: 2D->”YX”, 3D->”ZYX”, 4D->”TCYX”, 5D->”TCZYX”.

  • image_id – Optional stable image identifier.

  • name – Optional human label.

  • image_type – Open-ended image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names. Defaults to None. When None (or length does not match channel count), names are auto-generated as C0..C{n-1} (for example, 3 channels become C0, C1, C2).

  • acquisition_datetime – Defaults to now (UTC) if None.

  • clamp_to_uint16 – If True, clamp/cast planes to uint16 before serialization.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes.

  • physical_size_x – Spatial pixel size (µm) for X.

  • physical_size_y – Spatial pixel size (µm) for Y.

  • physical_size_z – Spatial pixel size (µm) for Z when present.

  • physical_size_unit – Unit string for spatial axes (default “µm”).

  • dtype_meta – Pixel dtype string to place in metadata.

Returns:

Typed OME-Arrow record.

Return type:

pa.StructScalar

ome_arrow.ingest.open_lazy_plane_source(source: str) tuple[dict[str, Any], Callable[[int, int, int], ndarray]] | None[source]#

Open a source-backed per-plane loader for lazy tensor execution.

Parameters:

source – Input path/URL string for TIFF or OME-Zarr sources.

Returns:

A tuple of (pixels_meta, plane_loader) when source-backed lazy plane loading is supported for source; otherwise None.

ome_arrow.ingest.to_ome_arrow(type_: str = OME_ARROW_TAG_TYPE, version: str = OME_ARROW_TAG_VERSION, image_id: str = "unnamed", name: str = "unknown", image_type: str | None = "image", acquisition_datetime: datetime | None = None, dimension_order: str = "XYZCT", dtype: str = "uint16", size_x: int = 1, size_y: int = 1, size_z: int = 1, size_c: int = 1, size_t: int = 1, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", channels: List[Dict[str, Any]] | None = None, planes: List[Dict[str, Any]] | None = None, chunks: List[Dict[str, Any]] | None = None, chunk_shape: Tuple[int, int, int] | None = (1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, masks: Any = None) StructScalar[source]#

Create a typed OME-Arrow StructScalar with sensible defaults.

This builds and validates a nested dict that conforms to the given StructType (e.g., OME_ARROW_STRUCT). You can override any field explicitly; others use safe defaults.

Parameters:
  • type_ – Top-level type string (“ome.arrow” by default).

  • version – Specification version string.

  • image_id – Unique image identifier.

  • name – Human-friendly name.

  • image_type – Open-ended image kind (e.g., “image”, “label”). Note that from_* helpers pass image_type=None by default to preserve “unspecified” vs explicitly set (“image”).

  • acquisition_datetime – Datetime of acquisition (defaults to now).

  • dimension_order – Dimension order (“XYZCT” or “XYCT”).

  • dtype – Pixel data type string (e.g., “uint16”).

  • size_x – Size of the X axis.

  • size_y – Size of the Y axis.

  • size_z – Size of the Z axis.

  • size_c – Size of the C (channel) axis.

  • size_t – Size of the T (time) axis.

  • physical_size_x/y/z – Physical scaling in µm.

  • physical_size_unit – Unit string, default “µm”.

  • channels – List of channel dicts. Autogenerates one if None.

  • planes – List of plane dicts. Empty if None.

  • chunks – Optional list of chunk dicts. If None and build_chunks is True, chunks are derived from planes using chunk_shape.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes when chunks is None.

  • masks – Optional placeholder for future annotations.

Returns:

A validated StructScalar for the schema.

Return type:

pa.StructScalar

Example

>>> s = to_ome_arrow(image_id="img001")
>>> s.type == OME_ARROW_STRUCT
True

ome_arrow.export#

Module for exporting OME-Arrow data to other formats.

ome_arrow.export.plane_from_chunks(data: Dict[str, Any] | StructScalar, *, t: int, c: int, z: int, dtype: dtype = np.uint16, strict: bool = True, clamp: bool = False) ndarray[source]#

Extract a single (t, c, z) plane using chunked pixels when available.

Parameters:
  • data – OME-Arrow data as a Python dict or a pa.StructScalar.

  • t – Time index for the plane.

  • c – Channel index for the plane.

  • z – Z index for the plane.

  • dtype – Output dtype (default: np.uint16).

  • strict – When True, raise if chunk pixels are malformed.

  • clamp – If True, clamp values to the valid range of the target dtype.

Returns:

2D array with shape (Y, X).

Return type:

np.ndarray

Raises:
  • KeyError – If required OME-Arrow fields are missing.

  • ValueError – If indices are out of range or pixels are malformed.

ome_arrow.export.to_numpy(data: Dict[str, Any] | StructScalar, dtype: dtype = np.uint16, strict: bool = True, clamp: bool = False) ndarray[source]#

Convert an OME-Arrow record into a NumPy array shaped (T,C,Z,Y,X).

The OME-Arrow “planes” are flattened YX slices indexed by (z, t, c). When chunks are present, this function reconstitutes the dense TCZYX array from chunked pixels instead of planes.

Parameters:
  • data – OME-Arrow data as a Python dict or a pa.StructScalar.

  • dtype – Output dtype (default: np.uint16). If different from plane values, a cast (and optional clamp) is applied.

  • strict – When True, raise if a plane has wrong pixel length. When False, truncate/pad that plane to the expected length.

  • clamp – If True, clamp values to the valid range of the target dtype before casting.

Returns:

Dense array with shape (T, C, Z, Y, X).

Return type:

np.ndarray

Raises:
  • KeyError – If required OME-Arrow fields are missing.

  • ValueError – If dimensions are invalid or planes are malformed.

Examples

>>> arr = to_numpy(my_row)  # (T, C, Z, Y, X)
>>> arr.shape
(1, 2, 1, 512, 512)
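Conceptually, the plane-based reconstruction described above can be mimicked with NumPy alone. The dict layout below is a simplified stand-in for the real OME-Arrow record, used only to show how flattened YX planes indexed by (z, t, c) fill a dense (T, C, Z, Y, X) array.

```python
import numpy as np

# NumPy-only illustration of what to_numpy does conceptually: each OME-Arrow
# "plane" is a flattened YX slice tagged with its (t, c, z) indices; the
# dense output is shaped (T, C, Z, Y, X).
T, C, Z, Y, X = 1, 2, 1, 4, 4
planes = [
    {"t": t, "c": c, "z": z, "pixels": np.full(Y * X, t * 100 + c * 10 + z)}
    for t in range(T) for c in range(C) for z in range(Z)
]

out = np.zeros((T, C, Z, Y, X), dtype=np.uint16)
for p in planes:
    # Reshape the flat pixel buffer back into a 2D (Y, X) plane.
    out[p["t"], p["c"], p["z"]] = np.asarray(p["pixels"]).reshape(Y, X)
```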

ome_arrow.export.to_ome_parquet(data: Dict[str, Any] | StructScalar, out_path: str, column_name: str = 'image', file_metadata: Dict[str, str] | None = None, compression: str | None = 'zstd', row_group_size: int | None = None) None[source]#

Export an OME-Arrow record to a Parquet file as a single-row, single-column table. The single column holds a struct with the OME-Arrow schema.

ome_arrow.export.to_ome_tiff(data: Dict[str, Any] | StructScalar, out_path: str, *, dtype: dtype = np.uint16, clamp: bool = False, dim_order: str = 'TCZYX', compression: str | None = 'zlib', compression_level: int = 6, tile: Tuple[int, int] | None = None, use_channel_colors: bool = False) None[source]#

Export an OME-Arrow record to OME-TIFF using BioIO’s OmeTiffWriter.

Notes

  • No ‘bigtiff’ kwarg is passed (invalid for tifffile.TiffWriter.write()). BigTIFF selection is automatic based on file size.

ome_arrow.export.to_ome_vortex(data: Dict[str, Any] | StructScalar, out_path: str, column_name: str = 'image', file_metadata: Dict[str, str] | None = None) None[source]#

Export an OME-Arrow record to a Vortex file.

The file is written as a single-row, single-column Arrow table where the column holds a struct with the OME-Arrow schema.

Parameters:
  • data – OME-Arrow dict or StructScalar.

  • out_path – Output path for the Vortex file.

  • column_name – Column name to store the struct.

  • file_metadata – Optional file-level metadata to attach.

Raises:

ImportError – If the optional vortex-data dependency is missing.

ome_arrow.export.to_ome_zarr(data: Dict[str, Any] | StructScalar, out_path: str, *, dtype: dtype = np.uint16, clamp: bool = False, dim_order: str = 'TCZYX', multiscale_levels: int = 1, downscale_spatial_by: int = 2, zarr_format: int = 3, chunks: Tuple[int, int, int, int, int] | None = None, shards: Tuple[int, int, int, int, int] | None = None, compressor: str | None = 'zstd', compressor_level: int = 3, image_name: str | None = None) None[source]#

Write OME-Zarr via the OMEZarrWriter instance API.

  • Builds arr as (T, C, Z, Y, X) using to_numpy.

  • Creates level shapes for a multiscale pyramid (if multiscale_levels>1).

  • Chooses Blosc codec compatible with zarr_format (v2 vs v3).

  • Populates axes names/types/units and physical pixel sizes from pixels_meta.

  • Uses default TCZYX chunks if none are provided.
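The multiscale level shapes mentioned above can be sketched as follows, assuming only the spatial axes (Y, X) are downscaled by downscale_spatial_by at each level while T, C, and Z are kept; `level_shapes` is a hypothetical helper mirroring the documented options, not part of the API.

```python
# Sketch of multiscale pyramid level shapes for to_ome_zarr: level k keeps
# (T, C, Z) and divides (Y, X) by downscale_spatial_by ** k, flooring at 1.
def level_shapes(shape_tczyx, multiscale_levels=1, downscale_spatial_by=2):
    t, c, z, y, x = shape_tczyx
    shapes = []
    for lvl in range(multiscale_levels):
        f = downscale_spatial_by ** lvl
        shapes.append((t, c, z, max(y // f, 1), max(x // f, 1)))
    return shapes
```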

ome_arrow.meta#

Meta-definition for OME-Arrow format.

ome_arrow.tensor#

Tensor view utilities for OME-Arrow pixel data.

class ome_arrow.tensor.LazyTensorView(*, loader: Callable[[], dict[str, Any] | StructScalar | StructArray | ChunkedArray], resolver: Callable[[dict[str, Any]], TensorView] | None = None, t: int | slice | Sequence[int] | None = None, z: int | slice | Sequence[int] | None = None, c: int | slice | Sequence[int] | None = None, roi: tuple[int, int, int, int] | None = None, roi3d: tuple[int, int, int, int, int, int] | None = None, roi_nd: tuple[int, ...] | None = None, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None = None, tile: tuple[int, int] | None = None, layout: str | None = None, dtype: dtype | None = None, chunk_policy: Literal['auto', 'combine', 'keep'] = 'auto', channel_policy: Literal['error', 'first'] = 'error')[source]#

Bases: object

Deferred TensorView plan with Polars-style collect semantics.

collect() TensorView[source]#

Materialize this lazy plan into a concrete TensorView.

property device: str#

Return the tensor storage device.

Note

For unresolved lazy plans, this returns "cpu" without calling collect().

property dtype: dtype#

Return the tensor dtype.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

iter_dlpack(*, batch_size: int | None = None, tile_size: tuple[int, int] | None = None, tiles: tuple[int, int] | None = None, shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Iterator[Any][source]#

Iterate DLPack outputs in batches or 2D tiles.

Parameters:
  • batch_size – Number of time indices per batch.

  • tile_size – Optional tile size as (tile_h, tile_w).

  • tiles – Deprecated alias for tile_size.

  • shuffle – Whether to shuffle iteration order.

  • seed – Optional random seed for deterministic shuffling.

  • prefetch – Placeholder prefetch count.

  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

Iterator of DLPack-compatible objects.

Return type:

Iterator[Any]

iter_tiles_3d(*, tile_size: tuple[int, int, int], shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'numpy') Iterator[Any][source]#

Iterate DLPack outputs in 3D tiles.

Parameters:
  • tile_size – Tile shape as (tile_z, tile_h, tile_w).

  • shuffle – Whether to shuffle iteration order.

  • seed – Optional random seed for deterministic shuffling.

  • prefetch – Placeholder prefetch count.

  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode (currently "numpy" only).

Returns:

Iterator of DLPack-compatible objects.

Return type:

Iterator[Any]

property layout: str#

Return the effective tensor layout.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

select(*, t: int | slice | Sequence[int] | None | _Unset = _UNSET, z: int | slice | Sequence[int] | None | _Unset = _UNSET, c: int | slice | Sequence[int] | None | _Unset = _UNSET, roi: tuple[int, int, int, int] | None | _Unset = _UNSET, roi3d: tuple[int, int, int, int, int, int] | None | _Unset = _UNSET, roi_nd: tuple[int, ...] | None | _Unset = _UNSET, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None | _Unset = _UNSET, tile: tuple[int, int] | None | _Unset = _UNSET) LazyTensorView[source]#

Return a new lazy plan with updated index/ROI selections.
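The Polars-style semantics can be illustrated with a minimal stand-in: select() returns a fresh plan rather than mutating the receiver. `Plan` below is a hypothetical toy, not the library class:

```python
import dataclasses

# Toy stand-in for a deferred plan: select() returns a new frozen plan and
# leaves the original untouched, mirroring LazyTensorView.select().
@dataclasses.dataclass(frozen=True)
class Plan:
    t: object = None
    z: object = None
    c: object = None

    def select(self, **updates):
        return dataclasses.replace(self, **updates)

base = Plan()
narrowed = base.select(t=0, c=slice(0, 2))
assert base.t is None                    # original plan is unchanged
assert narrowed.t == 0 and narrowed.c == slice(0, 2)
```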

property shape: tuple[int, ...]#

Return the tensor shape.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

property strides: tuple[int, ...]#

Return tensor strides in bytes.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

to_dlpack(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Export the planned view as a DLPack object.

Parameters:
  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

DLPack-compatible object.

Return type:

Any

to_jax(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the planned view to a JAX array.

Parameters:
  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

JAX array when JAX is installed.

Return type:

Any

to_numpy(*, contiguous: bool = False) ndarray[source]#

Materialize as a NumPy array.

Parameters:

contiguous – When True, return a contiguous array copy.

Returns:

Materialized array.

Return type:

np.ndarray

to_torch(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the planned view to a torch tensor.

Parameters:
  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

torch.Tensor when torch is installed.

Return type:

Any

with_layout(layout: str) LazyTensorView[source]#

Return a new lazy view with an updated layout.

class ome_arrow.tensor.TensorView(data: dict[str, Any] | StructScalar | StructArray | ChunkedArray, *, plane_loader: Callable[[int, int, int], ndarray] | None = None, t: int | slice | Sequence[int] | None = None, z: int | slice | Sequence[int] | None = None, c: int | slice | Sequence[int] | None = None, roi: tuple[int, int, int, int] | None = None, roi3d: tuple[int, int, int, int, int, int] | None = None, roi_nd: tuple[int, ...] | None = None, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None = None, tile: tuple[int, int] | None = None, layout: str | None = None, dtype: dtype | None = None, chunk_policy: Literal['auto', 'combine', 'keep'] = 'auto', channel_policy: Literal['error', 'first'] = 'error')[source]#

Bases: object

View OME-Arrow pixel data as a tensor-like object.

Parameters:
  • data – OME-Arrow dict, StructScalar, or 1-row StructArray/ChunkedArray.

  • t – Time index selection (int, slice, or sequence). Default: all.

  • z – Z index selection (int, slice, or sequence). Default: all.

  • c – Channel index selection (int, slice, or sequence). Default: all.

  • roi – Spatial crop (x, y, w, h) in pixels. Default: full frame.

  • roi3d – Spatial + depth crop (x, y, z, w, h, d). This is a convenience alias for roi=(x, y, w, h) and z=slice(z, z + d).

  • roi_nd – General ROI tuple with min/max bounds, interpreted by roi_type.

  • roi_type – ROI interpretation mode for roi_nd. Supported values: "2d" = (ymin, xmin, ymax, xmax); "2d_timelapse" = (tmin, ymin, xmin, tmax, ymax, xmax); "3d" = (zmin, ymin, xmin, zmax, ymax, xmax); "4d" = (tmin, zmin, ymin, xmin, tmax, zmax, ymax, xmax).

  • tile – Tile index (tile_y, tile_x) based on chunk grid.

  • layout – Desired layout string using TZCYX letters where T=time, Z=depth, C=channel, Y=row axis, X=column axis. TZCHW aliases are also accepted for compatibility.

  • dtype – Output dtype override. Defaults to pixels_meta.type when valid.

  • chunk_policy – Handling for pyarrow.ChunkedArray inputs. “auto” keeps multi-chunk arrays and unwraps single-chunk arrays. “combine” always combines multi-chunk arrays eagerly. “keep” always keeps chunked storage.

  • channel_policy – Behavior when dropping C from layout while multiple channels are selected. “error” raises (default). “first” keeps the first channel.
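The roi_nd / roi_type encodings above can be sketched as plain slice conversions; `roi_nd_to_slices` is a hypothetical helper that mirrors the documented min-then-max bound ordering with half-open crops:

```python
# Hypothetical helper mirroring the documented roi_nd orderings: all min
# coordinates come first, then all max coordinates, and crops are half-open.
def roi_nd_to_slices(roi_nd, roi_type):
    if roi_type == "2d":  # (ymin, xmin, ymax, xmax)
        ymin, xmin, ymax, xmax = roi_nd
        return (slice(ymin, ymax), slice(xmin, xmax))
    if roi_type == "3d":  # (zmin, ymin, xmin, zmax, ymax, xmax)
        zmin, ymin, xmin, zmax, ymax, xmax = roi_nd
        return (slice(zmin, zmax), slice(ymin, ymax), slice(xmin, xmax))
    if roi_type == "4d":  # (tmin, zmin, ymin, xmin, tmax, zmax, ymax, xmax)
        tmin, zmin, ymin, xmin, tmax, zmax, ymax, xmax = roi_nd
        return (slice(tmin, tmax), slice(zmin, zmax),
                slice(ymin, ymax), slice(xmin, xmax))
    raise ValueError(f"unsupported roi_type: {roi_type!r}")

assert roi_nd_to_slices((1, 2, 5, 9), "2d") == (slice(1, 5), slice(2, 9))
```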

property device: str#

Return the storage device for the view (currently always “cpu”).

property dtype: dtype#

Return the tensor dtype.

iter_dlpack(*, batch_size: int | None = None, tile_size: tuple[int, int] | None = None, tiles: tuple[int, int] | None = None, shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Iterator[Any][source]#

Iterate over DLPack capsules in batches or tiles.

Parameters:
  • batch_size – Number of T indices per batch. Defaults to full range.

  • tile_size – Tile size (tile_h, tile_w) in pixels for spatial tiling.

  • tiles – Deprecated alias for tile_size.

  • shuffle – Whether to shuffle the iteration order.

  • seed – Seed for deterministic shuffling.

  • prefetch – Placeholder for future asynchronous prefetch support. Currently validated but does not change synchronous iteration.

  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize contiguous buffers if needed.

  • mode – Export mode. “arrow” returns 1D values buffers.

Yields:

DLPack object per batch or tile.
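Spatial tiling can be pictured with a small NumPy sketch (`iter_tiles` is a hypothetical illustration, not the library's iterator): tiles walk the plane in row-major order, and edge tiles may be smaller than (tile_h, tile_w):

```python
import numpy as np

# Hypothetical illustration of (tile_h, tile_w) tiling over a single plane.
# Edge tiles are clipped to the plane bounds rather than padded.
def iter_tiles(plane, tile_h, tile_w):
    h, w = plane.shape
    for y0 in range(0, h, tile_h):
        for x0 in range(0, w, tile_w):
            yield plane[y0:y0 + tile_h, x0:x0 + tile_w]

tiles = list(iter_tiles(np.zeros((5, 7), dtype=np.uint16), 2, 3))
assert len(tiles) == 9             # 3 tile rows x 3 tile columns
assert tiles[-1].shape == (1, 1)   # bottom-right edge tile is clipped
```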

iter_tiles_3d(*, tile_size: tuple[int, int, int], shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'numpy') Iterator[Any][source]#

Iterate over 3D tiles (z, y, x) as DLPack capsules.

Parameters:
  • tile_size – Tile size as (tile_z, tile_h, tile_w).

  • shuffle – Whether to shuffle the tile order.

  • seed – Seed for deterministic shuffling.

  • prefetch – Placeholder for future asynchronous prefetch support.

  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize contiguous buffers if needed.

  • mode – Export mode. Must be "numpy" for tiled 3D iteration.

Yields:

DLPack object per 3D tile.

property layout: str#

Return the effective layout for this view.

property shape: tuple[int, ...]#

Return the tensor shape for the current layout.

property strides: tuple[int, ...]#

Return the tensor strides in bytes for the current layout.
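Strides follow the usual C-order convention, as NumPy's own strides illustrate:

```python
import numpy as np

# For a C-contiguous (T, Y, X) array of uint16 (itemsize 2), the stride of an
# axis is the byte distance between consecutive elements along that axis.
a = np.zeros((2, 3, 4), dtype=np.uint16)
assert a.itemsize == 2
assert a.strides == (24, 8, 2)  # X: 2 bytes, Y: 4*2 bytes, T: 3*4*2 bytes
```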

to_dlpack(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Export the view as a DLPack capsule.

Parameters:
  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize a contiguous buffer if needed.

  • mode – Export mode. “arrow” returns a capsule for the Arrow values buffer (1D). “numpy” materializes a tensor-shaped NumPy view. Zero-copy Arrow mode requires Arrow-backed inputs (typically Parquet/Vortex ingestion with canonical schema); StructScalar and dict inputs are normalized through Python objects.

Returns:

DLPack object compatible with torch/jax import utilities. The returned object is single-use per DLPack ownership semantics: after a consumer imports it, the capsule must not be reused.

Raises:
  • ValueError – If an unsupported device is requested.

  • RuntimeError – If required optional dependencies are missing.
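The single-use ownership rule applies to any DLPack producer; a NumPy round-trip sketches the consumer side (assuming NumPy >= 1.23 for np.from_dlpack):

```python
import numpy as np

# Any object implementing __dlpack__ can be imported zero-copy by a consumer.
# NumPy plays both roles here; torch.from_dlpack and JAX behave the same way.
src = np.arange(12, dtype=np.uint16).reshape(3, 4)
imported = np.from_dlpack(src)        # zero-copy view over the same buffer
assert np.shares_memory(src, imported)
assert imported.shape == (3, 4)
```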

to_jax(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the view into a JAX array using DLPack.

Parameters:
  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize a contiguous buffer if needed.

  • mode – Export mode. “arrow” returns a 1D values buffer.

Returns:

Array backed by the DLPack capsule.

Return type:

jax.Array

to_numpy(*, contiguous: bool = False) ndarray[source]#

Materialize the view as a NumPy array.

Parameters:

contiguous – When True, return a contiguous array copy.

Returns:

Array in the requested layout.

Return type:

np.ndarray

to_torch(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the view into a torch.Tensor using DLPack.

Parameters:
  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize a contiguous buffer if needed.

  • mode – Export mode. “arrow” returns a 1D values buffer.

Returns:

Tensor backed by the DLPack capsule.

Return type:

torch.Tensor

with_layout(layout: str) TensorView[source]#

Return a new TensorView with a layout override.

Parameters:

layout – Desired layout string using TZCYX letters where T=time, Z=depth, C=channel, Y=row axis, X=column axis. TZCHW aliases are also accepted for compatibility.

Returns:

New view with the requested layout.

Return type:

TensorView
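A layout override amounts to an axis permutation; a hedged NumPy sketch (`relayout` is hypothetical) shows how a TCZYX array maps to another letter order, including the H/W aliases for Y/X mentioned above:

```python
import numpy as np

# Hypothetical sketch of a layout override as a transpose; H/W are treated
# as aliases for Y/X, as the documentation describes.
def relayout(arr, src="TCZYX", dst="TZCYX"):
    dst = dst.replace("H", "Y").replace("W", "X")
    return arr.transpose([src.index(axis) for axis in dst])

a = np.zeros((2, 3, 4, 5, 6))                       # (T, C, Z, Y, X)
assert relayout(a, dst="TZCYX").shape == (2, 4, 3, 5, 6)
assert relayout(a, dst="TCZHW").shape == a.shape    # H/W alias round-trip
```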

ome_arrow.transform#

Module for transforming OME-Arrow data (e.g., slices, projections, or other changes).

ome_arrow.transform.slice_ome_arrow(data: Dict[str, Any] | StructScalar, x_min: int, x_max: int, y_min: int, y_max: int, t_indices: Iterable[int] | None = None, c_indices: Iterable[int] | None = None, z_indices: Iterable[int] | None = None, fill_missing: bool = True) StructScalar[source]#

Create a cropped copy of an OME-Arrow record.

Crops spatially to [y_min:y_max, x_min:x_max] (half-open) and, if provided, filters/reindexes T/C/Z to the given index sets.

Parameters:
  • data (dict | pa.StructScalar) – OME-Arrow record.

  • x_min (int) – Inclusive lower X bound of the crop, in pixels (0-based).

  • x_max (int) – Exclusive upper X bound of the crop, in pixels.

  • y_min (int) – Inclusive lower Y bound of the crop, in pixels (0-based).

  • y_max (int) – Exclusive upper Y bound of the crop, in pixels.

  • t_indices (Iterable[int] | None) – Optional explicit T indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • c_indices (Iterable[int] | None) – Optional explicit C indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • z_indices (Iterable[int] | None) – Optional explicit Z indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • fill_missing (bool) – If True, any missing (t,c,z) planes in the selection are zero-filled.

Returns:

New OME-Arrow record with updated sizes and planes.

Return type:

pa.StructScalar
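The half-open crop semantics match NumPy slicing: the x_max/y_max bounds are excluded, so the output plane has shape (y_max - y_min, x_max - x_min):

```python
import numpy as np

# Half-open crop: rows y_min..y_max-1 and columns x_min..x_max-1 are kept.
plane = np.arange(100, dtype=np.uint16).reshape(10, 10)
y_min, y_max, x_min, x_max = 2, 5, 3, 7
crop = plane[y_min:y_max, x_min:x_max]
assert crop.shape == (y_max - y_min, x_max - x_min)   # (3, 4)
assert crop[0, 0] == plane[y_min, x_min]
```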

ome_arrow.utils#

Utility functions for ome-arrow.

ome_arrow.utils.describe_ome_arrow(data: StructScalar | dict) Dict[str, Any][source]#

Describe the structure of an OME-Arrow image record.

Reads pixels_meta from the OME-Arrow struct to report TCZYX dimensions and classify whether it’s a 2D image, 3D z-stack, movie/timelapse, or 4D timelapse-volume. Also flags whether it is multi-channel (C > 1) or single-channel.

Parameters:

data – OME-Arrow row as a pa.StructScalar or plain dict.

Returns:

A dict with the keys:

  • shape: (T, C, Z, Y, X)

  • type: classification string

  • summary: human-readable text

Return type:

dict
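The classification can be sketched as a rule over the T and Z sizes; this decision table is an assumption inferred from the summary above, not the library's exact code:

```python
# Assumed classification rule (inferred, hypothetical): T > 1 adds a time
# axis, Z > 1 adds depth, and both together give a 4D timelapse-volume.
def classify(t, z):
    if t > 1 and z > 1:
        return "4D timelapse-volume"
    if t > 1:
        return "movie/timelapse"
    if z > 1:
        return "3D z-stack"
    return "2D image"

assert classify(t=1, z=1) == "2D image"
assert classify(t=10, z=8) == "4D timelapse-volume"
```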

ome_arrow.utils.verify_ome_arrow(data: Any, struct: StructType) bool[source]#

Return True if data conforms to the given Arrow StructType.

This tries to convert data into a pyarrow scalar using struct as the declared type. If conversion fails, the data does not match.

Parameters:
  • data – A nested Python dict/list structure to test.

  • struct – The expected pyarrow.StructType schema.

Returns:

True if conversion succeeds, False otherwise.

Return type:

bool

ome_arrow.view#

Viewing utilities for OME-Arrow data.

ome_arrow.view.view_matplotlib(data: dict[str, object] | StructScalar, tcz: tuple[int, int, int] = (0, 0, 0), autoscale: bool = True, vmin: int | None = None, vmax: int | None = None, cmap: str = 'gray', show: bool = True) tuple[Figure, Axes, AxesImage][source]#

Render a single (t, c, z) plane with Matplotlib.

Parameters:
  • data – OME-Arrow row or dict containing pixels_meta and planes.

  • tcz – (t, c, z) indices of the plane to render.

  • autoscale – If True, infer vmin/vmax from the image data.

  • vmin – Explicit lower display limit for intensity scaling.

  • vmax – Explicit upper display limit for intensity scaling.

  • cmap – Matplotlib colormap name.

  • show – Whether to display the plot immediately.

Returns:

A tuple of (figure, axes, image) from Matplotlib.

Raises:

ValueError – If the requested plane is missing or pixel sizes mismatch.

ome_arrow.view.view_pyvista(data: dict | pa.StructScalar, c: int = 0, downsample: int = 1, scaling_values: tuple[float, float, float] | None = None, opacity: str | float = 'sigmoid', clim: tuple[float, float] | None = None, show_axes: bool = True, backend: str = 'auto', interpolation: str = 'nearest', background: str = 'black', percentile_clim: tuple[float, float] = (1.0, 99.9), sampling_scale: float = 0.5, show: bool = True) pyvista.Plotter[source]#

Render an interactive volume view inline in Jupyter using PyVista backends. When backend='auto', tries 'trame', then 'html', then 'static'.

sampling_scale controls the ray-casting step size via the volume mapper after add_volume.