Python API#

ome_arrow#

Init file for ome_arrow package.

ome_arrow.core#

Core classes of the ome_arrow package.

class ome_arrow.core.OMEArrow(data: str | dict | pa.StructScalar | 'np.ndarray', tcz: Tuple[int, int, int] = (0, 0, 0), *, dim_order: str | None = None, column_name: str = 'ome_arrow', row_index: int = 0, image_type: str | None = None, lazy: bool = False)[source]#

Bases: object

Small convenience toolkit for working with OME-Arrow data.

If input is a TIFF path, this loads it via tiff_to_ome_arrow. If input is a dict, it will be converted using to_struct_scalar. If input is already a pa.StructScalar, it is used as-is.

In Jupyter, evaluating the instance will render the first plane using matplotlib (via _repr_html_). Call view_matplotlib() to select a specific (z, t, c) plane.

Parameters:
  • data – TIFF path, nested dict, pa.StructScalar, or np.ndarray.

  • tcz – Default (t, c, z) indices used for view helpers.

  • dim_order – Axis labels for array input; inferred from rank when None.

  • column_name – OME-Arrow column name for tabular sources.

  • row_index – Row index for tabular sources.

  • image_type – Optional image type override.

  • lazy – If True, defer loading until first use.

collect() OMEArrow[source]#

Materialize deferred source data and return self.

Returns:

The same instance after materialization.

Return type:

OMEArrow

property data: StructScalar#

Return the materialized OME-Arrow StructScalar.

Returns:

Materialized OME-Arrow record.

Return type:

pa.StructScalar

Raises:

RuntimeError – If the record could not be initialized.

export(how: str = 'numpy', dtype: np.dtype = np.uint16, strict: bool = True, clamp: bool = False, *, out: str | None = None, dim_order: str = 'TCZYX', compression: str | None = 'zlib', compression_level: int = 6, tile: tuple[int, int] | None = None, chunks: tuple[int, int, int, int, int] | None = None, zarr_compressor: str | None = 'zstd', zarr_level: int = 7, use_channel_colors: bool = False, parquet_column_name: str = 'ome_arrow', parquet_compression: str | None = 'zstd', parquet_metadata: dict[str, str] | None = None, vortex_column_name: str = 'ome_arrow', vortex_metadata: dict[str, str] | None = None) np.array | dict | pa.StructScalar | str[source]#

Export the OME-Arrow content in a chosen representation.

Parameters:
  • how – Output representation: “numpy” → TCZYX np.ndarray; “dict” → plain Python dict; “scalar” → pa.StructScalar (as-is); “ome-tiff” → write OME-TIFF via BioIO; “ome-zarr” → write OME-Zarr (OME-NGFF) via BioIO; “parquet” → write a single-row Parquet with one struct column; “vortex” → write a single-row Vortex file with one struct column.

  • dtype – Target dtype for “numpy”/writers (default: np.uint16).

  • strict – For “numpy”: raise if a plane has wrong pixel length.

  • clamp – For “numpy”/writers: clamp values into dtype range before cast.

  Keyword-only (writer-specific) parameters:

  • out – Output path (required for ‘ome-tiff’, ‘ome-zarr’, and ‘parquet’).

  • dim_order – Axes string for BioIO writers; default “TCZYX”.

  • compression / compression_level / tile – OME-TIFF options (passed through to tifffile via BioIO).

  • chunks / zarr_compressor / zarr_level – OME-Zarr options (chunk shape, compressor hint, level). If chunks is None, a TCZYX default is chosen (1, 1, <=4, <=512, <=512).

  • use_channel_colors – Try to embed per-channel display colors when safe; otherwise omitted.

  • parquet_* – Options for Parquet export (column name, compression, file metadata).

  • vortex_* – Options for Vortex export (column name, file metadata).

Returns:

  • “numpy”: np.ndarray (T, C, Z, Y, X)

  • “dict”: dict

  • “scalar”: pa.StructScalar

  • “ome-tiff”: output path (str)

  • “ome-zarr”: output path (str)

  • “parquet”: output path (str)

  • “vortex”: output path (str)

Return type:

Any

Raises:

ValueError – Unknown ‘how’ or missing required parameters.
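The TCZYX chunk default mentioned for “ome-zarr” export can be sketched as follows. This is a minimal illustration of the documented rule only, assuming each capped axis is simply clamped at its documented maximum; `default_zarr_chunks` is a hypothetical helper name, not part of the ome_arrow API.

```python
# Hypothetical sketch of the documented default TCZYX chunk rule for
# "ome-zarr" export: when `chunks` is None, a (1, 1, <=4, <=512, <=512)
# chunk shape is derived from the image dimensions.
def default_zarr_chunks(size_t, size_c, size_z, size_y, size_x):
    # T and C chunks are always 1; Z is capped at 4, Y/X at 512.
    return (1, 1, min(size_z, 4), min(size_y, 512), min(size_x, 512))
```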

info() Dict[str, Any][source]#

Describe the OME-Arrow data structure.

Returns:

  • shape: (T, C, Z, Y, X)

  • type: classification string

  • summary: human-readable text

Return type:

dict with keys

property is_lazy: bool#

Return whether this instance still has deferred work.

classmethod scan(data: str, *, tcz: Tuple[int, int, int] = (0, 0, 0), column_name: str = 'ome_arrow', row_index: int = 0, image_type: str | None = None) OMEArrow[source]#

Create a lazily-loaded OMEArrow, similar to Polars scan semantics.

Parameters:
  • data – Input source path/URL.

  • tcz – Default (t, c, z) indices used for view helpers.

  • column_name – OME-Arrow column name for tabular sources.

  • row_index – Row index for tabular sources.

  • image_type – Optional image type override.

Returns:

Lazily planned OMEArrow instance.

Return type:

OMEArrow

slice(x_min: int, x_max: int, y_min: int, y_max: int, t_indices: Iterable[int] | None = None, c_indices: Iterable[int] | None = None, z_indices: Iterable[int] | None = None, fill_missing: bool = True) OMEArrow[source]#

Create a cropped copy of an OME-Arrow record.

Crops spatially to [y_min:y_max, x_min:x_max] (half-open) and, if provided, filters/reindexes T/C/Z to the given index sets.

Parameters:
  • x_min (int) – Inclusive minimum X bound of the crop in pixels (0-based).

  • x_max (int) – Exclusive maximum X bound of the crop in pixels (0-based).

  • y_min (int) – Inclusive minimum Y bound of the crop in pixels (0-based).

  • y_max (int) – Exclusive maximum Y bound of the crop in pixels (0-based).

  • t_indices (Iterable[int] | None) – Optional explicit T indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • c_indices (Iterable[int] | None) – Optional explicit C indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • z_indices (Iterable[int] | None) – Optional explicit Z indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • fill_missing (bool) – If True, any missing (t,c,z) planes in the selection are zero-filled.

Returns:

New OME-Arrow record with updated sizes and planes.

Return type:

OMEArrow object
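The half-open crop and the index reindexing described above can be mimicked on a plain TCZYX array. This NumPy-only sketch illustrates the documented semantics; it does not call ome_arrow itself.

```python
import numpy as np

# Illustration of slice() semantics on a plain TCZYX array: the spatial
# crop is half-open [y_min:y_max, x_min:x_max], and kept T/C/Z indices
# are reindexed to 0..len-1 in the result.
arr = np.arange(2 * 3 * 4 * 8 * 8).reshape(2, 3, 4, 8, 8)  # (T, C, Z, Y, X)

y_min, y_max, x_min, x_max = 1, 5, 2, 6
c_keep = [0, 2]  # channels to retain; reindexed to 0..1 in the output

cropped = arr[:, c_keep, :, y_min:y_max, x_min:x_max]
```

Channel 2 of the input becomes channel 1 of the output, matching the reindex-to-0..len-1 behavior.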

slice_lazy(x_min: int, x_max: int, y_min: int, y_max: int, t_indices: Iterable[int] | None = None, c_indices: Iterable[int] | None = None, z_indices: Iterable[int] | None = None, fill_missing: bool = True) OMEArrow[source]#

Return a lazily planned slice, collected on first execution.

For lazy sources created with OMEArrow.scan(...), this queues a deferred slice operation and returns a new lazy OMEArrow plan produced from OMEArrow.scan(...). For already materialized sources, this falls back to eager slice(). This method does not mutate self.

Notes

slice_lazy always returns a new plan object. Internally, the returned plan gets a fresh _lazy_slices list ([*self._lazy_slices, new_slice]), so chained plans do not share mutable slice state with the original OMEArrow. A common footgun is: oa.slice_lazy(...).collect() followed by oa.tensor_view(...). Those calls can load/materialize the same source twice because oa remains the original plan. For a single-load workflow, keep working from the value returned by slice_lazy / collect.

Parameters:
  • x_min – Inclusive minimum X index for the crop.

  • x_max – Exclusive maximum X index for the crop.

  • y_min – Inclusive minimum Y index for the crop.

  • y_max – Exclusive maximum Y index for the crop.

  • t_indices – Optional time indices to retain.

  • c_indices – Optional channel indices to retain.

  • z_indices – Optional depth indices to retain.

  • fill_missing – Whether to zero-fill missing (t, c, z) planes.

Returns:

Lazy plan when source is lazy; eager slice result otherwise.

Return type:

OMEArrow
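The “fresh plan” behavior called out in the Notes can be sketched with a toy class. `Plan` is illustrative only, not the real OMEArrow implementation; the point is that each call copies the pending-operation list instead of mutating it in place.

```python
# Toy sketch of slice_lazy's plan semantics: each call returns a NEW plan
# whose pending-slice list is a copy ([*old, new]), so chained plans never
# share mutable state with the original.
class Plan:
    def __init__(self, ops=()):
        self._lazy_slices = list(ops)

    def slice_lazy(self, bounds):
        # Return a new plan; never mutate self._lazy_slices in place.
        return Plan([*self._lazy_slices, bounds])

base = Plan()
plan_a = base.slice_lazy((0, 4, 0, 4))
plan_b = plan_a.slice_lazy((1, 3, 1, 3))
```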

tensor_view(*, scene: int | None = None, t: int | slice | Sequence[int] | None = None, z: int | slice | Sequence[int] | None = None, c: int | slice | Sequence[int] | None = None, roi: tuple[int, int, int, int] | None = None, roi3d: tuple[int, int, int, int, int, int] | None = None, roi_nd: tuple[int, ...] | None = None, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None = None, tile: tuple[int, int] | None = None, layout: str | None = None, dtype: dtype | None = None, chunk_policy: Literal['auto', 'combine', 'keep'] = 'auto', channel_policy: Literal['error', 'first'] = 'error') TensorView | LazyTensorView[source]#

Create a TensorView of the pixel data.

Parameters:
  • scene – Scene index (only 0 is supported for single-image records).

  • t – Time index selection (int, slice, or sequence). Default: all.

  • z – Z index selection (int, slice, or sequence). Default: all.

  • c – Channel index selection (int, slice, or sequence). Default: all.

  • roi – Spatial crop (x, y, w, h) in pixels.

  • roi3d – Spatial + depth crop (x, y, z, w, h, d) in pixels/planes. This is a convenience alias for roi=(x, y, w, h) and z=slice(z, z + d).

  • roi_nd – General ROI tuple with min/max bounds.

  • roi_type – ROI interpretation mode for roi_nd. Supported values: "2d", "2d_timelapse", "3d", and "4d".

  • tile – Tile index (tile_y, tile_x) based on chunk grid.

  • layout – Desired layout string using TZCYX letters where T=time, Z=depth, C=channel, Y=row axis, X=column axis. TZCHW aliases are also accepted for compatibility.

  • dtype – Output dtype override.

  • chunk_policy – Handling for pyarrow.ChunkedArray inputs.

  • channel_policy – Behavior when dropping C from layout while multiple channels are selected. “error” raises (default). “first” keeps the first channel.

Returns:

Tensor view over selected pixels. In lazy mode, this returns a deferred LazyTensorView that resolves on first execution call (for example to_numpy()) without forcing self to materialize unless deferred slice_lazy operations are queued.

Return type:

TensorView | LazyTensorView

Raises:

ValueError – If an unsupported scene is requested.
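The roi3d alias noted above, roi3d=(x, y, z, w, h, d) being equivalent to roi=(x, y, w, h) with z=slice(z, z + d), can be shown on a plain TCZYX array. This NumPy-only sketch illustrates the equivalence without calling tensor_view.

```python
import numpy as np

# roi3d=(x, y, z, w, h, d) is documented as a convenience alias for
# roi=(x, y, w, h) combined with z=slice(z, z + d); both select the same
# region of a TCZYX array.
arr = np.zeros((1, 2, 6, 32, 32))  # (T, C, Z, Y, X)

x, y, z, w, h, d = 4, 8, 1, 10, 12, 3
via_roi3d = arr[:, :, z:z + d, y:y + h, x:x + w]
via_roi_and_z = arr[:, :, slice(z, z + d), y:y + h, x:x + w]
```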

view(how: str = 'matplotlib', tcz: tuple[int, int, int] = (0, 0, 0), autoscale: bool = True, vmin: int | None = None, vmax: int | None = None, cmap: str = 'gray', show: bool = True, c: int | None = None, downsample: int = 1, opacity: str | float = 'sigmoid', clim: tuple[float, float] | None = None, show_axes: bool = True, scaling_values: tuple[float, float, float] | None = None) tuple[matplotlib.figure.Figure, Any, Any] | 'pyvista.Plotter'[source]#

Render an OME-Arrow record using Matplotlib or PyVista.

This convenience method supports two rendering backends:

  • how="matplotlib" renders a single (t, c, z) plane as a 2D image.

  • how="pyvista" creates an interactive 3D PyVista visualization.

Parameters:
  • how – Rendering backend. One of "matplotlib" or "pyvista".

  • tcz – (t, c, z) indices used for plane display.

  • autoscale – Infer Matplotlib display limits from image range when vmin/vmax are not provided.

  • vmin – Lower display limit for Matplotlib intensity scaling.

  • vmax – Upper display limit for Matplotlib intensity scaling.

  • cmap – Matplotlib colormap name for single-channel display.

  • show – Whether to display the plot immediately.

  • c – Channel index override for PyVista. If None, uses tcz[1].

  • downsample – Integer downsampling factor for PyVista views. Higher values render faster for large volumes but reduce spatial resolution.

  • opacity – Opacity for PyVista. Either a float in [0, 1] or "sigmoid".

  • clim – Contrast limits (low, high) for PyVista rendering.

  • show_axes – Whether to display axes in the PyVista scene.

  • scaling_values – Physical scale multipliers (x, y, z) used by PyVista. If None, uses OME metadata-derived scaling.

Returns:

tuple[matplotlib.figure.Figure, matplotlib.axes.Axes, matplotlib.image.AxesImage] | pyvista.Plotter: For how="matplotlib", returns the tuple emitted by ome_arrow.view.view_matplotlib() as (figure, axes, image). For how="pyvista", returns a pyvista.Plotter.

Raises:
  • ValueError – If a requested plane is not found or the render mode is unsupported.

  • TypeError – If parameter types are invalid.

Notes

  • The how="pyvista" mode normally outputs an interactive visualization, but attempts to embed a static PNG snapshot for non-interactive renderers (for example, static docs builds, nbconvert HTML/PDF exports, rendered/read-only notebook views such as GitHub notebook previews, and CI log viewers).

  • When show=False and how="pyvista", the returned pyvista.Plotter can be shown later.

ome_arrow.ingest#

Converting to and from OME-Arrow formats.

ome_arrow.ingest.from_jax_array(arr: Any, *, dim_order: str | None = None, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True, chunk_shape: Tuple[int, int, int] | None=(1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", dtype_meta: str | None = None) StructScalar[source]#

Build an OME-Arrow StructScalar from a JAX array.

This is useful when your pipeline already works with jax.Array objects and you want a direct path into the canonical OME-Arrow struct without manual conversion boilerplate in user code.

Parameters:
  • arr – jax.Array image data.

  • dim_order – Axis labels for arr. If None, infer from rank: 2D->”YX”, 3D->”ZYX”, 4D->”TCYX”, 5D->”TCZYX”.

  • image_id – Optional stable image identifier.

  • name – Optional human label.

  • image_type – Open-ended image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names. Defaults to None. When None (or length does not match channel count), names are auto-generated as C0..C{n-1} (for example, 3 channels become C0, C1, C2).

  • acquisition_datetime – Defaults to now (UTC) if None.

  • clamp_to_uint16 – If True, clamp/cast planes to uint16 before serialization.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes.

  • physical_size_x – Spatial pixel size (µm) for X.

  • physical_size_y – Spatial pixel size (µm) for Y.

  • physical_size_z – Spatial pixel size (µm) for Z when present.

  • physical_size_unit – Unit string for spatial axes (default “µm”).

  • dtype_meta – Pixel dtype string to place in metadata.

Returns:

Typed OME-Arrow record.

Return type:

pa.StructScalar

ome_arrow.ingest.from_numpy(arr: ndarray, *, dim_order: str = "TCZYX", image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True, chunk_shape: Tuple[int, int, int] | None=(1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", dtype_meta: str | None = None) StructScalar[source]#

Build an OME-Arrow StructScalar from a NumPy array.

Parameters:
  • arr – Image data with axes described by dim_order.

  • dim_order – Axis labels for arr. Must include “Y” and “X”. Supported examples: “YX”, “ZYX”, “CYX”, “CZYX”, “TYX”, “TCYX”, “TCZYX”.

  • image_id – Optional stable image identifier.

  • name – Optional human label.

  • image_type – Open-ended image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names. Defaults to None. When None (or length does not match channel count), names are auto-generated as C0..C{n-1} (for example, 3 channels become C0, C1, C2).

  • acquisition_datetime – Defaults to now (UTC) if None.

  • clamp_to_uint16 – If True, clamp/cast planes to uint16 before serialization.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes.

  • physical_size_x – Spatial pixel size (µm) for X.

  • physical_size_y – Spatial pixel size (µm) for Y.

  • physical_size_z – Spatial pixel size (µm) for Z when present.

  • physical_size_unit – Unit string for spatial axes (default “µm”).

  • dtype_meta – Pixel dtype string to place in metadata; if None, inferred from the (possibly cast) array’s dtype.

Returns:

Typed OME-Arrow record (schema = OME_ARROW_STRUCT).

Return type:

pa.StructScalar

Raises:
  • TypeError – If arr is not a NumPy ndarray.

  • ValueError – If dim_order is invalid or dimensions are non-positive.

Notes

  • If Z is not in dim_order, size_z will be 1 and the meta dimension_order becomes “XYCT”; otherwise “XYZCT”.

  • If T/C are absent in dim_order, they default to size 1.
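The rank-based dim_order inference used by from_jax_array / from_torch_array, and the C0..C{n-1} channel-name fallback shared with from_numpy, can be sketched as below. The helper names are illustrative, not the package’s internal functions.

```python
# Sketch of two documented conventions: rank-based dim_order inference
# (2D->"YX", 3D->"ZYX", 4D->"TCYX", 5D->"TCZYX") and auto-generated
# channel names C0..C{n-1} when none (or a wrong number) are supplied.
def infer_dim_order(ndim):
    return {2: "YX", 3: "ZYX", 4: "TCYX", 5: "TCZYX"}[ndim]

def autogen_channel_names(n):
    return [f"C{i}" for i in range(n)]
```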

ome_arrow.ingest.from_ome_parquet(parquet_path: str | Path, *, column_name: str | None = 'ome_arrow', row_index: int = 0, strict_schema: bool = False, return_array: bool = False) StructScalar | tuple[StructScalar, StructArray][source]#

Read an OME-Arrow record from a Parquet file.

Parameters:
  • parquet_path – Path to the Parquet file.

  • column_name – Column to read; auto-detected when None or invalid.

  • row_index – Row index to extract.

  • strict_schema – Require the exact OME-Arrow schema if True.

  • return_array – When True, also return a 1-row StructArray.

Returns:

A typed OME-Arrow StructScalar, or (StructScalar, StructArray) when return_array=True.

Raises:
  • FileNotFoundError – If the Parquet path does not exist.

  • ValueError – If the row index is out of range or no suitable column exists.

Notes

This reader targets the row group containing row_index and requests only column_name when provided, avoiding eager full-table reads.

ome_arrow.ingest.from_ome_vortex(vortex_path: str | Path, *, column_name: str | None = 'ome_arrow', row_index: int = 0, strict_schema: bool = False, return_array: bool = False) StructScalar | tuple[StructScalar, StructArray][source]#

Read an OME-Arrow record from a Vortex file.

Parameters:
  • vortex_path – Path to the Vortex file.

  • column_name – Column to read; auto-detected when None or invalid.

  • row_index – Row index to extract.

  • strict_schema – Require the exact OME-Arrow schema if True.

  • return_array – When True, also return a 1-row StructArray.

Returns:

A typed OME-Arrow StructScalar, or (StructScalar, StructArray) when return_array=True.

Raises:
  • FileNotFoundError – If the Vortex path does not exist.

  • ImportError – If the optional vortex-data dependency is missing.

  • ValueError – If the row index is out of range or no suitable column exists.

ome_arrow.ingest.from_ome_zarr(zarr_path: str | Path, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True) StructScalar[source]#

Read an OME-Zarr directory and return a typed OME-Arrow StructScalar.

Uses BioIO with the OMEZarrReader backend to read TCZYX (or XY) data, flattens each YX plane into OME-Arrow planes, and builds a validated StructScalar via to_ome_arrow.

Parameters:
  • zarr_path – Path to the OME-Zarr directory (e.g., “image.ome.zarr”).

  • image_id – Optional stable image identifier (defaults to directory stem).

  • name – Optional display name (defaults to directory name).

  • image_type – Optional image kind (e.g., “image”, “label”).

  • channel_names – Optional list of channel names. Defaults to C0, C1, …

  • acquisition_datetime – Optional datetime (defaults to UTC now).

  • clamp_to_uint16 – If True, cast pixels to uint16.

Returns:

Validated OME-Arrow struct for this image.

Return type:

pa.StructScalar

ome_arrow.ingest.from_stack_pattern_path(pattern_path: str | Path, default_dim_for_unspecified: str = 'C', map_series_to: str | None = 'T', clamp_to_uint16: bool = True, channel_names: List[str] | None = None, image_id: str | None = None, name: str | None = None, image_type: str | None = None) StructScalar[source]#

Build an OME-Arrow record from a filename pattern describing a stack.

Parameters:
  • pattern_path – Path or pattern string describing the stack layout.

  • default_dim_for_unspecified – Dimension to use when tokens lack a dim.

  • map_series_to – Dimension to map series tokens to (e.g., “T”), or None.

  • clamp_to_uint16 – Whether to clamp pixel values to uint16.

  • channel_names – Optional list of channel names to apply.

  • image_id – Optional image identifier override.

  • name – Optional display name override.

  • image_type – Optional image kind (e.g., “image”, “label”).

Returns:

A validated OME-Arrow StructScalar describing the stack.

ome_arrow.ingest.from_tiff(tiff_path: str | Path, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True) StructScalar[source]#

Read a TIFF and return a typed OME-Arrow StructScalar.

Uses bioio to read TCZYX (or XY) data, flattens each YX plane, and delegates struct creation to to_struct_scalar.

Parameters:
  • tiff_path – Path to a TIFF readable by bioio.

  • image_id – Optional stable image identifier (defaults to stem).

  • name – Optional human label (defaults to file name).

  • image_type – Optional image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names; defaults to C0..C{n-1}.

  • acquisition_datetime – Optional acquisition time (UTC now if None).

  • clamp_to_uint16 – If True, clamp/cast planes to uint16.

Returns:

pa.StructScalar validated against the OME-Arrow schema.

ome_arrow.ingest.from_torch_array(arr: Any, *, dim_order: str | None = None, image_id: str | None = None, name: str | None = None, image_type: str | None = None, channel_names: Sequence[str] | None = None, acquisition_datetime: datetime | None = None, clamp_to_uint16: bool = True, chunk_shape: Tuple[int, int, int] | None=(1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", dtype_meta: str | None = None) StructScalar[source]#

Build an OME-Arrow StructScalar from a torch tensor.

This is useful when your pipeline already works with torch.Tensor objects (for example model inputs/outputs) and you want a direct path into the canonical OME-Arrow struct without manually converting and reshaping in user code.

Parameters:
  • arr – torch.Tensor image data.

  • dim_order – Axis labels for arr. If None, infer from rank: 2D->”YX”, 3D->”ZYX”, 4D->”TCYX”, 5D->”TCZYX”.

  • image_id – Optional stable image identifier.

  • name – Optional human label.

  • image_type – Open-ended image kind (e.g., “image”, “label”).

  • channel_names – Optional channel names. Defaults to None. When None (or length does not match channel count), names are auto-generated as C0..C{n-1} (for example, 3 channels become C0, C1, C2).

  • acquisition_datetime – Defaults to now (UTC) if None.

  • clamp_to_uint16 – If True, clamp/cast planes to uint16 before serialization.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes.

  • physical_size_x – Spatial pixel size (µm) for X.

  • physical_size_y – Spatial pixel size (µm) for Y.

  • physical_size_z – Spatial pixel size (µm) for Z when present.

  • physical_size_unit – Unit string for spatial axes (default “µm”).

  • dtype_meta – Pixel dtype string to place in metadata.

Returns:

Typed OME-Arrow record.

Return type:

pa.StructScalar

ome_arrow.ingest.open_lazy_plane_source(source: str) tuple[dict[str, Any], Callable[[int, int, int], ndarray]] | None[source]#

Open a source-backed per-plane loader for lazy tensor execution.

Parameters:

source – Input path/URL string for TIFF or OME-Zarr sources.

Returns:

A tuple of (pixels_meta, plane_loader) when source-backed lazy plane loading is supported for source; otherwise None.

ome_arrow.ingest.to_ome_arrow(type_: str = OME_ARROW_TAG_TYPE, version: str = OME_ARROW_TAG_VERSION, image_id: str = "unnamed", name: str = "unknown", image_type: str | None = "image", acquisition_datetime: datetime | None = None, dimension_order: str = "XYZCT", dtype: str = "uint16", size_x: int = 1, size_y: int = 1, size_z: int = 1, size_c: int = 1, size_t: int = 1, physical_size_x: float = 1.0, physical_size_y: float = 1.0, physical_size_z: float = 1.0, physical_size_unit: str = "µm", channels: List[Dict[str, Any]] | None = None, planes: List[Dict[str, Any]] | None = None, chunks: List[Dict[str, Any]] | None = None, chunk_shape: Tuple[int, int, int] | None = (1, 512, 512), chunk_order: str = "ZYX", build_chunks: bool = True, masks: Any = None) StructScalar[source]#

Create a typed OME-Arrow StructScalar with sensible defaults.

This builds and validates a nested dict that conforms to the given StructType (e.g., OME_ARROW_STRUCT). You can override any field explicitly; others use safe defaults.

Parameters:
  • type_ – Top-level type string (“ome.arrow” by default).

  • version – Specification version string.

  • image_id – Unique image identifier.

  • name – Human-friendly name.

  • image_type – Open-ended image kind (e.g., “image”, “label”). Note that from_* helpers pass image_type=None by default to preserve “unspecified” vs explicitly set (“image”).

  • acquisition_datetime – Datetime of acquisition (defaults to now).

  • dimension_order – Dimension order (“XYZCT” or “XYCT”).

  • dtype – Pixel data type string (e.g., “uint16”).

  • size_x – Size of the X axis.

  • size_y – Size of the Y axis.

  • size_z – Size of the Z axis.

  • size_c – Size of the C (channel) axis.

  • size_t – Size of the T (time) axis.

  • physical_size_x/y/z – Physical scaling in µm.

  • physical_size_unit – Unit string, default “µm”.

  • channels – List of channel dicts. Autogenerates one if None.

  • planes – List of plane dicts. Empty if None.

  • chunks – Optional list of chunk dicts. If None and build_chunks is True, chunks are derived from planes using chunk_shape.

  • chunk_shape – Chunk shape as (Z, Y, X). Defaults to (1, 512, 512).

  • chunk_order – Flattening order for chunk pixels (default “ZYX”).

  • build_chunks – If True, build chunked pixels from planes when chunks is None.

  • masks – Optional placeholder for future annotations.

Returns:

A validated StructScalar for the schema.

Return type:

pa.StructScalar

Example

>>> s = to_ome_arrow(image_id="img001")
>>> s.type == OME_ARROW_STRUCT
True

ome_arrow.export#

Module for exporting OME-Arrow data to other formats.

ome_arrow.export.plane_from_chunks(data: Dict[str, Any] | StructScalar, *, t: int, c: int, z: int, dtype: dtype = np.uint16, strict: bool = True, clamp: bool = False) ndarray[source]#

Extract a single (t, c, z) plane using chunked pixels when available.

Parameters:
  • data – OME-Arrow data as a Python dict or a pa.StructScalar.

  • t – Time index for the plane.

  • c – Channel index for the plane.

  • z – Z index for the plane.

  • dtype – Output dtype (default: np.uint16).

  • strict – When True, raise if chunk pixels are malformed.

  • clamp – If True, clamp values to the valid range of the target dtype.

Returns:

2D array with shape (Y, X).

Return type:

np.ndarray

Raises:
  • KeyError – If required OME-Arrow fields are missing.

  • ValueError – If indices are out of range or pixels are malformed.

ome_arrow.export.to_numpy(data: Dict[str, Any] | StructScalar, dtype: dtype = np.uint16, strict: bool = True, clamp: bool = False) ndarray[source]#

Convert an OME-Arrow record into a NumPy array shaped (T,C,Z,Y,X).

The OME-Arrow “planes” are flattened YX slices indexed by (z, t, c). When chunks are present, this function reconstitutes the dense TCZYX array from chunked pixels instead of planes.

Parameters:
  • data – OME-Arrow data as a Python dict or a pa.StructScalar.

  • dtype – Output dtype (default: np.uint16). If different from plane values, a cast (and optional clamp) is applied.

  • strict – When True, raise if a plane has wrong pixel length. When False, truncate/pad that plane to the expected length.

  • clamp – If True, clamp values to the valid range of the target dtype before casting.

Returns:

Dense array with shape (T, C, Z, Y, X).

Return type:

np.ndarray

Raises:
  • KeyError – If required OME-Arrow fields are missing.

  • ValueError – If dimensions are invalid or planes are malformed.

Examples

>>> arr = to_numpy(my_row)  # (T, C, Z, Y, X)
>>> arr.shape
(1, 2, 1, 512, 512)
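Conceptually, the plane-based reconstruction described above can be mimicked with NumPy alone. The dict layout below is a simplified stand-in for the real OME-Arrow record, used only to show how flattened YX planes indexed by (z, t, c) fill a dense (T, C, Z, Y, X) array.

```python
import numpy as np

# NumPy-only illustration of what to_numpy does conceptually: each OME-Arrow
# "plane" is a flattened YX slice tagged with its (t, c, z) indices; the
# dense output is shaped (T, C, Z, Y, X).
T, C, Z, Y, X = 1, 2, 1, 4, 4
planes = [
    {"t": t, "c": c, "z": z, "pixels": np.full(Y * X, t * 100 + c * 10 + z)}
    for t in range(T) for c in range(C) for z in range(Z)
]

out = np.zeros((T, C, Z, Y, X), dtype=np.uint16)
for p in planes:
    # Reshape the flat pixel buffer back into a 2D (Y, X) plane.
    out[p["t"], p["c"], p["z"]] = np.asarray(p["pixels"]).reshape(Y, X)
```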

ome_arrow.export.to_ome_parquet(data: Dict[str, Any] | StructScalar, out_path: str, column_name: str = 'image', file_metadata: Dict[str, str] | None = None, compression: str | None = 'zstd', row_group_size: int | None = None) None[source]#

Export an OME-Arrow record to a Parquet file as a single-row, single-column table. The single column holds a struct with the OME-Arrow schema.

ome_arrow.export.to_ome_tiff(data: Dict[str, Any] | StructScalar, out_path: str, *, dtype: dtype = np.uint16, clamp: bool = False, dim_order: str = 'TCZYX', compression: str | None = 'zlib', compression_level: int = 6, tile: Tuple[int, int] | None = None, use_channel_colors: bool = False) None[source]#

Export an OME-Arrow record to OME-TIFF using BioIO’s OmeTiffWriter.

Notes

  • No ‘bigtiff’ kwarg is passed (invalid for tifffile.TiffWriter.write()). BigTIFF selection is automatic based on file size.

ome_arrow.export.to_ome_vortex(data: Dict[str, Any] | StructScalar, out_path: str, column_name: str = 'image', file_metadata: Dict[str, str] | None = None) None[source]#

Export an OME-Arrow record to a Vortex file.

The file is written as a single-row, single-column Arrow table where the column holds a struct with the OME-Arrow schema.

Parameters:
  • data – OME-Arrow dict or StructScalar.

  • out_path – Output path for the Vortex file.

  • column_name – Column name to store the struct.

  • file_metadata – Optional file-level metadata to attach.

Raises:

ImportError – If the optional vortex-data dependency is missing.

ome_arrow.export.to_ome_zarr(data: Dict[str, Any] | StructScalar, out_path: str, *, dtype: dtype = np.uint16, clamp: bool = False, dim_order: str = 'TCZYX', multiscale_levels: int = 1, downscale_spatial_by: int = 2, zarr_format: int = 3, chunks: Tuple[int, int, int, int, int] | None = None, shards: Tuple[int, int, int, int, int] | None = None, compressor: str | None = 'zstd', compressor_level: int = 3, image_name: str | None = None) None[source]#

Write OME-Zarr via the OMEZarrWriter instance API.

  • Builds arr as (T, C, Z, Y, X) using to_numpy.

  • Creates level shapes for a multiscale pyramid (if multiscale_levels>1).

  • Chooses Blosc codec compatible with zarr_format (v2 vs v3).

  • Populates axes names/types/units and physical pixel sizes from pixels_meta.

  • Uses default TCZYX chunks if none are provided.
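The multiscale level shapes mentioned above can be sketched as follows, assuming only the spatial axes (Y, X) are downscaled by downscale_spatial_by at each level while T, C, and Z are kept; `level_shapes` is a hypothetical helper mirroring the documented options, not part of the API.

```python
# Sketch of multiscale pyramid level shapes for to_ome_zarr: level k keeps
# (T, C, Z) and divides (Y, X) by downscale_spatial_by ** k, flooring at 1.
def level_shapes(shape_tczyx, multiscale_levels=1, downscale_spatial_by=2):
    t, c, z, y, x = shape_tczyx
    shapes = []
    for lvl in range(multiscale_levels):
        f = downscale_spatial_by ** lvl
        shapes.append((t, c, z, max(y // f, 1), max(x // f, 1)))
    return shapes
```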

ome_arrow.meta#

Meta-definition for OME-Arrow format.

ome_arrow.tensor#

Tensor view utilities for OME-Arrow pixel data.

class ome_arrow.tensor.LazyTensorView(*, loader: Callable[[], dict[str, Any] | StructScalar | StructArray | ChunkedArray], resolver: Callable[[dict[str, Any]], TensorView] | None = None, t: int | slice | Sequence[int] | None = None, z: int | slice | Sequence[int] | None = None, c: int | slice | Sequence[int] | None = None, roi: tuple[int, int, int, int] | None = None, roi3d: tuple[int, int, int, int, int, int] | None = None, roi_nd: tuple[int, ...] | None = None, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None = None, tile: tuple[int, int] | None = None, layout: str | None = None, dtype: dtype | None = None, chunk_policy: Literal['auto', 'combine', 'keep'] = 'auto', channel_policy: Literal['error', 'first'] = 'error')[source]#

Bases: object

Deferred TensorView plan with Polars-style collect semantics.

collect() TensorView[source]#

Materialize this lazy plan into a concrete TensorView.

property device: str#

Return the tensor storage device.

Note

For unresolved lazy plans, this returns "cpu" without calling collect().

property dtype: dtype#

Return the tensor dtype.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

iter_dlpack(*, batch_size: int | None = None, tile_size: tuple[int, int] | None = None, tiles: tuple[int, int] | None = None, shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Iterator[Any][source]#

Iterate DLPack outputs in batches or 2D tiles.

Parameters:
  • batch_size – Number of time indices per batch.

  • tile_size – Optional tile size as (tile_h, tile_w).

  • tiles – Deprecated alias for tile_size.

  • shuffle – Whether to shuffle iteration order.

  • seed – Optional random seed for deterministic shuffling.

  • prefetch – Placeholder prefetch count.

  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

Iterator of DLPack-compatible objects.

Return type:

Iterator[Any]

iter_tiles_3d(*, tile_size: tuple[int, int, int], shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'numpy') Iterator[Any][source]#

Iterate DLPack outputs in 3D tiles.

Parameters:
  • tile_size – Tile shape as (tile_z, tile_h, tile_w).

  • shuffle – Whether to shuffle iteration order.

  • seed – Optional random seed for deterministic shuffling.

  • prefetch – Placeholder prefetch count.

  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode (currently "numpy" only).

Returns:

Iterator of DLPack-compatible objects.

Return type:

Iterator[Any]

property layout: str#

Return the effective tensor layout.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

select(*, t: int | slice | Sequence[int] | None | _Unset = _UNSET, z: int | slice | Sequence[int] | None | _Unset = _UNSET, c: int | slice | Sequence[int] | None | _Unset = _UNSET, roi: tuple[int, int, int, int] | None | _Unset = _UNSET, roi3d: tuple[int, int, int, int, int, int] | None | _Unset = _UNSET, roi_nd: tuple[int, ...] | None | _Unset = _UNSET, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None | _Unset = _UNSET, tile: tuple[int, int] | None | _Unset = _UNSET) LazyTensorView[source]#

Return a new lazy plan with updated index/ROI selections.
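The Polars-style semantics can be illustrated with a minimal stand-in: select() returns a fresh plan rather than mutating the receiver. `Plan` below is a hypothetical toy, not the library class:

```python
import dataclasses

# Toy stand-in for a deferred plan: select() returns a new frozen plan and
# leaves the original untouched, mirroring LazyTensorView.select().
@dataclasses.dataclass(frozen=True)
class Plan:
    t: object = None
    z: object = None
    c: object = None

    def select(self, **updates):
        return dataclasses.replace(self, **updates)

base = Plan()
narrowed = base.select(t=0, c=slice(0, 2))
assert base.t is None                    # original plan is unchanged
assert narrowed.t == 0 and narrowed.c == slice(0, 2)
```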

property shape: tuple[int, ...]#

Return the tensor shape.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

property strides: tuple[int, ...]#

Return tensor strides in bytes.

Note

Accessing this property calls collect() and may materialize data from source files (for example Parquet/TIFF), which can be expensive.

to_dlpack(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Export the planned view as a DLPack object.

Parameters:
  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

DLPack-compatible object.

Return type:

Any

to_jax(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the planned view to a JAX array.

Parameters:
  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

JAX array when JAX is installed.

Return type:

Any

to_numpy(*, contiguous: bool = False) ndarray[source]#

Materialize as a NumPy array.

Parameters:

contiguous – When True, return a contiguous array copy.

Returns:

Materialized array.

Return type:

np.ndarray

to_torch(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the planned view to a torch tensor.

Parameters:
  • device – Target device ("cpu" or "cuda").

  • contiguous – When True, materialize contiguous data when needed.

  • mode – Export mode ("arrow" or "numpy").

Returns:

torch.Tensor when torch is installed.

Return type:

Any

with_layout(layout: str) LazyTensorView[source]#

Return a new lazy view with an updated layout.

class ome_arrow.tensor.TensorView(data: dict[str, Any] | StructScalar | StructArray | ChunkedArray, *, plane_loader: Callable[[int, int, int], ndarray] | None = None, t: int | slice | Sequence[int] | None = None, z: int | slice | Sequence[int] | None = None, c: int | slice | Sequence[int] | None = None, roi: tuple[int, int, int, int] | None = None, roi3d: tuple[int, int, int, int, int, int] | None = None, roi_nd: tuple[int, ...] | None = None, roi_type: Literal['2d', '2d_timelapse', '3d', '4d'] | None = None, tile: tuple[int, int] | None = None, layout: str | None = None, dtype: dtype | None = None, chunk_policy: Literal['auto', 'combine', 'keep'] = 'auto', channel_policy: Literal['error', 'first'] = 'error')[source]#

Bases: object

View OME-Arrow pixel data as a tensor-like object.

Parameters:
  • data – OME-Arrow dict, StructScalar, or 1-row StructArray/ChunkedArray.

  • t – Time index selection (int, slice, or sequence). Default: all.

  • z – Z index selection (int, slice, or sequence). Default: all.

  • c – Channel index selection (int, slice, or sequence). Default: all.

  • roi – Spatial crop (x, y, w, h) in pixels. Default: full frame.

  • roi3d – Spatial + depth crop (x, y, z, w, h, d). This is a convenience alias for roi=(x, y, w, h) and z=slice(z, z + d).

  • roi_nd – General ROI tuple with min/max bounds, interpreted by roi_type.

  • roi_type – ROI interpretation mode for roi_nd. Supported values: "2d" = (ymin, xmin, ymax, xmax); "2d_timelapse" = (tmin, ymin, xmin, tmax, ymax, xmax); "3d" = (zmin, ymin, xmin, zmax, ymax, xmax); "4d" = (tmin, zmin, ymin, xmin, tmax, zmax, ymax, xmax).

  • tile – Tile index (tile_y, tile_x) based on chunk grid.

  • layout – Desired layout string using TZCYX letters where T=time, Z=depth, C=channel, Y=row axis, X=column axis. TZCHW aliases are also accepted for compatibility.

  • dtype – Output dtype override. Defaults to pixels_meta.type when valid.

  • chunk_policy – Handling for pyarrow.ChunkedArray inputs. “auto” keeps multi-chunk arrays and unwraps single-chunk arrays. “combine” always combines multi-chunk arrays eagerly. “keep” always keeps chunked storage.

  • channel_policy – Behavior when dropping C from layout while multiple channels are selected. “error” raises (default). “first” keeps the first channel.
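The roi_nd / roi_type encodings above can be sketched as plain slice conversions; `roi_nd_to_slices` is a hypothetical helper that mirrors the documented min-then-max bound ordering with half-open crops:

```python
# Hypothetical helper mirroring the documented roi_nd orderings: all min
# coordinates come first, then all max coordinates, and crops are half-open.
def roi_nd_to_slices(roi_nd, roi_type):
    if roi_type == "2d":  # (ymin, xmin, ymax, xmax)
        ymin, xmin, ymax, xmax = roi_nd
        return (slice(ymin, ymax), slice(xmin, xmax))
    if roi_type == "3d":  # (zmin, ymin, xmin, zmax, ymax, xmax)
        zmin, ymin, xmin, zmax, ymax, xmax = roi_nd
        return (slice(zmin, zmax), slice(ymin, ymax), slice(xmin, xmax))
    if roi_type == "4d":  # (tmin, zmin, ymin, xmin, tmax, zmax, ymax, xmax)
        tmin, zmin, ymin, xmin, tmax, zmax, ymax, xmax = roi_nd
        return (slice(tmin, tmax), slice(zmin, zmax),
                slice(ymin, ymax), slice(xmin, xmax))
    raise ValueError(f"unsupported roi_type: {roi_type!r}")

assert roi_nd_to_slices((1, 2, 5, 9), "2d") == (slice(1, 5), slice(2, 9))
```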

property device: str#

Return the storage device for the view (currently always “cpu”).

property dtype: dtype#

Return the tensor dtype.

iter_dlpack(*, batch_size: int | None = None, tile_size: tuple[int, int] | None = None, tiles: tuple[int, int] | None = None, shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Iterator[Any][source]#

Iterate over DLPack capsules in batches or tiles.

Parameters:
  • batch_size – Number of T indices per batch. Defaults to full range.

  • tile_size – Tile size (tile_h, tile_w) in pixels for spatial tiling.

  • tiles – Deprecated alias for tile_size.

  • shuffle – Whether to shuffle the iteration order.

  • seed – Seed for deterministic shuffling.

  • prefetch – Placeholder for future asynchronous prefetch support. Currently validated but does not change synchronous iteration.

  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize contiguous buffers if needed.

  • mode – Export mode. “arrow” returns 1D values buffers.

Yields:

DLPack object per batch or tile.
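Spatial tiling can be pictured with a small NumPy sketch (`iter_tiles` is a hypothetical illustration, not the library's iterator): tiles walk the plane in row-major order, and edge tiles may be smaller than (tile_h, tile_w):

```python
import numpy as np

# Hypothetical illustration of (tile_h, tile_w) tiling over a single plane.
# Edge tiles are clipped to the plane bounds rather than padded.
def iter_tiles(plane, tile_h, tile_w):
    h, w = plane.shape
    for y0 in range(0, h, tile_h):
        for x0 in range(0, w, tile_w):
            yield plane[y0:y0 + tile_h, x0:x0 + tile_w]

tiles = list(iter_tiles(np.zeros((5, 7), dtype=np.uint16), 2, 3))
assert len(tiles) == 9             # 3 tile rows x 3 tile columns
assert tiles[-1].shape == (1, 1)   # bottom-right edge tile is clipped
```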

iter_tiles_3d(*, tile_size: tuple[int, int, int], shuffle: bool = False, seed: int | None = None, prefetch: int = 0, device: str = 'cpu', contiguous: bool = True, mode: str = 'numpy') Iterator[Any][source]#

Iterate over 3D tiles (z, y, x) as DLPack capsules.

Parameters:
  • tile_size – Tile size as (tile_z, tile_h, tile_w).

  • shuffle – Whether to shuffle the tile order.

  • seed – Seed for deterministic shuffling.

  • prefetch – Placeholder for future asynchronous prefetch support.

  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize contiguous buffers if needed.

  • mode – Export mode. Must be "numpy" for tiled 3D iteration.

Yields:

DLPack object per 3D tile.

property layout: str#

Return the effective layout for this view.

property shape: tuple[int, ...]#

Return the tensor shape for the current layout.

property strides: tuple[int, ...]#

Return the tensor strides in bytes for the current layout.
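Strides follow the usual C-order convention, as NumPy's own strides illustrate:

```python
import numpy as np

# For a C-contiguous (T, Y, X) array of uint16 (itemsize 2), the stride of an
# axis is the byte distance between consecutive elements along that axis.
a = np.zeros((2, 3, 4), dtype=np.uint16)
assert a.itemsize == 2
assert a.strides == (24, 8, 2)  # X: 2 bytes, Y: 4*2 bytes, T: 3*4*2 bytes
```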

to_dlpack(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Export the view as a DLPack capsule.

Parameters:
  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize a contiguous buffer if needed.

  • mode – Export mode. “arrow” returns a capsule for the Arrow values buffer (1D). “numpy” materializes a tensor-shaped NumPy view. Zero-copy Arrow mode requires Arrow-backed inputs (typically Parquet/Vortex ingestion with canonical schema); StructScalar and dict inputs are normalized through Python objects.

Returns:

DLPack object compatible with torch/jax import utilities. The returned object is single-use per DLPack ownership semantics: after a consumer imports it, the capsule must not be reused.

Raises:
  • ValueError – If an unsupported device is requested.

  • RuntimeError – If required optional dependencies are missing.
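The single-use ownership rule applies to any DLPack producer; a NumPy round-trip sketches the consumer side (assuming NumPy >= 1.23 for np.from_dlpack):

```python
import numpy as np

# Any object implementing __dlpack__ can be imported zero-copy by a consumer.
# NumPy plays both roles here; torch.from_dlpack and JAX behave the same way.
src = np.arange(12, dtype=np.uint16).reshape(3, 4)
imported = np.from_dlpack(src)        # zero-copy view over the same buffer
assert np.shares_memory(src, imported)
assert imported.shape == (3, 4)
```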

to_jax(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the view into a JAX array using DLPack.

Parameters:
  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize a contiguous buffer if needed.

  • mode – Export mode. “arrow” returns a 1D values buffer.

Returns:

Array backed by the DLPack capsule.

Return type:

jax.Array

to_numpy(*, contiguous: bool = False) ndarray[source]#

Materialize the view as a NumPy array.

Parameters:

contiguous – When True, return a contiguous array copy.

Returns:

Array in the requested layout.

Return type:

np.ndarray

to_torch(*, device: str = 'cpu', contiguous: bool = True, mode: str = 'arrow') Any[source]#

Convert the view into a torch.Tensor using DLPack.

Parameters:
  • device – Target device (“cpu” or “cuda”).

  • contiguous – When True, materialize a contiguous buffer if needed.

  • mode – Export mode. “arrow” returns a 1D values buffer.

Returns:

Tensor backed by the DLPack capsule.

Return type:

torch.Tensor

with_layout(layout: str) TensorView[source]#

Return a new TensorView with a layout override.

Parameters:

layout – Desired layout string using TZCYX letters where T=time, Z=depth, C=channel, Y=row axis, X=column axis. TZCHW aliases are also accepted for compatibility.

Returns:

New view with the requested layout.

Return type:

TensorView
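A layout override amounts to an axis permutation; a hedged NumPy sketch (`relayout` is hypothetical) shows how a TCZYX array maps to another letter order, including the H/W aliases for Y/X mentioned above:

```python
import numpy as np

# Hypothetical sketch of a layout override as a transpose; H/W are treated
# as aliases for Y/X, as the documentation describes.
def relayout(arr, src="TCZYX", dst="TZCYX"):
    dst = dst.replace("H", "Y").replace("W", "X")
    return arr.transpose([src.index(axis) for axis in dst])

a = np.zeros((2, 3, 4, 5, 6))                       # (T, C, Z, Y, X)
assert relayout(a, dst="TZCYX").shape == (2, 4, 3, 5, 6)
assert relayout(a, dst="TCZHW").shape == a.shape    # H/W alias round-trip
```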

ome_arrow.transform#

Module for transforming OME-Arrow data (e.g., slices, projections, or other changes).

ome_arrow.transform.slice_ome_arrow(data: Dict[str, Any] | StructScalar, x_min: int, x_max: int, y_min: int, y_max: int, t_indices: Iterable[int] | None = None, c_indices: Iterable[int] | None = None, z_indices: Iterable[int] | None = None, fill_missing: bool = True) StructScalar[source]#

Create a cropped copy of an OME-Arrow record.

Crops spatially to [y_min:y_max, x_min:x_max] (half-open) and, if provided, filters/reindexes T/C/Z to the given index sets.

Parameters:
  • data (dict | pa.StructScalar) – OME-Arrow record.

  • x_min (int) – Inclusive lower X bound of the crop, in pixels (0-based).

  • x_max (int) – Exclusive upper X bound of the crop, in pixels.

  • y_min (int) – Inclusive lower Y bound of the crop, in pixels (0-based).

  • y_max (int) – Exclusive upper Y bound of the crop, in pixels.

  • t_indices (Iterable[int] | None) – Optional explicit T indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • c_indices (Iterable[int] | None) – Optional explicit C indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • z_indices (Iterable[int] | None) – Optional explicit Z indices to keep. If None, keep all. Selected indices are reindexed to 0..len-1 in the output.

  • fill_missing (bool) – If True, any missing (t,c,z) planes in the selection are zero-filled.

Returns:

New OME-Arrow record with updated sizes and planes.

Return type:

pa.StructScalar
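The half-open crop semantics match NumPy slicing: the x_max/y_max bounds are excluded, so the output plane has shape (y_max - y_min, x_max - x_min):

```python
import numpy as np

# Half-open crop: rows y_min..y_max-1 and columns x_min..x_max-1 are kept.
plane = np.arange(100, dtype=np.uint16).reshape(10, 10)
y_min, y_max, x_min, x_max = 2, 5, 3, 7
crop = plane[y_min:y_max, x_min:x_max]
assert crop.shape == (y_max - y_min, x_max - x_min)   # (3, 4)
assert crop[0, 0] == plane[y_min, x_min]
```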

ome_arrow.utils#

Utility functions for ome-arrow.

ome_arrow.utils.describe_ome_arrow(data: StructScalar | dict) Dict[str, Any][source]#

Describe the structure of an OME-Arrow image record.

Reads pixels_meta from the OME-Arrow struct to report TCZYX dimensions and classify whether it’s a 2D image, 3D z-stack, movie/timelapse, or 4D timelapse-volume. Also flags whether it is multi-channel (C > 1) or single-channel.

Parameters:

data – OME-Arrow row as a pa.StructScalar or plain dict.

Returns:

A dict with the keys:

  • shape: (T, C, Z, Y, X)

  • type: classification string

  • summary: human-readable text

Return type:

dict
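The classification can be sketched as a rule over the T and Z sizes; this decision table is an assumption inferred from the summary above, not the library's exact code:

```python
# Assumed classification rule (inferred, hypothetical): T > 1 adds a time
# axis, Z > 1 adds depth, and both together give a 4D timelapse-volume.
def classify(t, z):
    if t > 1 and z > 1:
        return "4D timelapse-volume"
    if t > 1:
        return "movie/timelapse"
    if z > 1:
        return "3D z-stack"
    return "2D image"

assert classify(t=1, z=1) == "2D image"
assert classify(t=10, z=8) == "4D timelapse-volume"
```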

ome_arrow.utils.verify_ome_arrow(data: Any, struct: StructType) bool[source]#

Return True if data conforms to the given Arrow StructType.

This tries to convert data into a pyarrow scalar using struct as the declared type. If conversion fails, the data does not match.

Parameters:
  • data – A nested Python dict/list structure to test.

  • struct – The expected pyarrow.StructType schema.

Returns:

True if conversion succeeds, False otherwise.

Return type:

bool

ome_arrow.view#

Viewing utilities for OME-Arrow data.

ome_arrow.view.view_matplotlib(data: dict[str, object] | StructScalar, tcz: tuple[int, int, int] = (0, 0, 0), autoscale: bool = True, vmin: int | None = None, vmax: int | None = None, cmap: str = 'gray', show: bool = True) tuple[Figure, Axes, AxesImage][source]#

Render a single (t, c, z) plane with Matplotlib.

Parameters:
  • data – OME-Arrow row or dict containing pixels_meta and planes.

  • tcz – (t, c, z) indices of the plane to render.

  • autoscale – If True, infer vmin/vmax from the image data.

  • vmin – Explicit lower display limit for intensity scaling.

  • vmax – Explicit upper display limit for intensity scaling.

  • cmap – Matplotlib colormap name.

  • show – Whether to display the plot immediately.

Returns:

A tuple of (figure, axes, image) from Matplotlib.

Raises:

ValueError – If the requested plane is missing or pixel sizes mismatch.

ome_arrow.view.view_pyvista(data: dict | pa.StructScalar, c: int = 0, downsample: int = 1, scaling_values: tuple[float, float, float] | None = None, opacity: str | float = 'sigmoid', clim: tuple[float, float] | None = None, show_axes: bool = True, backend: str = 'auto', interpolation: str = 'nearest', background: str = 'black', percentile_clim: tuple[float, float] = (1.0, 99.9), sampling_scale: float = 0.5, show: bool = True) pyvista.Plotter[source]#

Render an interactive volume view inline in Jupyter using PyVista backends. When backend='auto', tries 'trame', then 'html', then 'static'.

sampling_scale controls the ray-casting step size via the volume mapper after add_volume.