Exporting OME-Arrow pixel data via DLPack#

OME-Arrow exposes a small tensor view API for pixel data. The returned TensorView can export DLPack capsules for zero-copy interoperability on CPU and (optionally) GPU.

Key defaults:

  • OME-Arrow tensor layouts always include channels (C) as a tensor axis.

  • Default layout is CHW (equivalent to CYX) when both T and Z are singleton in the source.

  • Otherwise, default layout is TZCHW (equivalent to TZCYX, with singleton T/Z retained unless you override layout).

  • You can override with any valid TZCHW/TZCYX permutation/subset, for example YXC, ZCYX, or CYX.

Layout nomenclature:

  • T: time index

  • Z: z/depth index

  • C: channel index

  • Y: image row axis (height)

  • X: image column axis (width). H and W are accepted as aliases for Y and X for compatibility.

Practical mapping:

  • 2D image content (YX) is typically exposed as CYX.

  • 3D z-stack content (ZYX) is typically exposed as ZCYX or TZCYX (with T=1).

  • Time-lapse and volumetric content use TZCYX/TZCHW by default.
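The layout names above are plain axis permutations. As a minimal NumPy sketch (independent of ome-arrow) of how a full TZCYX array maps to the CYX default and to an override such as YXC:

```python
import numpy as np

# Hypothetical pixel data in the full TZCYX layout: T=1, Z=1, C=3, Y=4, X=5.
tzcyx = np.zeros((1, 1, 3, 4, 5), dtype=np.uint16)

# When T and Z are singleton, dropping them yields the CYX default layout.
cyx = tzcyx[0, 0]                    # shape (3, 4, 5) -> (C, Y, X)

# A layout override such as YXC is a transpose of the remaining axes.
yxc = np.transpose(cyx, (1, 2, 0))   # shape (4, 5, 3) -> (Y, X, C)

print(cyx.shape, yxc.shape)
```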

PyTorch#

from ome_arrow import OMEArrow

obj = OMEArrow("example.ome.parquet")
view = obj.tensor_view(t=0, z=0, c=0)

# DLPack capsule -> torch.Tensor
import torch

capsule = view.to_dlpack(mode="arrow", device="cpu")
flat = torch.utils.dlpack.from_dlpack(capsule)
tensor = flat.reshape(view.shape)

Lazy scan-style slicing#

from ome_arrow import OMEArrow

obj = OMEArrow.scan("example.ome.parquet")
# Plan lazy slices first; nothing is read until collect().
lazy_crop = obj.slice_lazy(0, 512, 0, 512).slice_lazy(64, 256, 64, 256)
cropped = lazy_crop.collect()

# Then execute tensor selections on the sliced result.
tensor_view = cropped.tensor_view(t=0, z=slice(0, 8), roi=(64, 64, 128, 128))
arr = tensor_view.to_numpy()

# Note: executing a LazyTensorView from OMEArrow.scan(...) does not
# materialize the original OMEArrow object itself.
# Call obj.collect() explicitly if you need to materialize `obj`.
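The two chained slice_lazy calls compose. Assuming the second slice is interpreted relative to the first (as with chained NumPy slicing; the exact semantics are not restated here), the collected crop corresponds to nested indexing:

```python
import numpy as np

img = np.arange(1024 * 1024).reshape(1024, 1024)

# First crop, then a second crop relative to the first --
# mirroring slice_lazy(0, 512, 0, 512).slice_lazy(64, 256, 64, 256).
composed = img[0:512, 0:512][64:256, 64:256]

# Equivalent single crop expressed in the original coordinates.
direct = img[64:256, 64:256]

print(composed.shape)   # (192, 192)
assert np.array_equal(composed, direct)
```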

JAX#

from ome_arrow import OMEArrow

obj = OMEArrow("example.ome.parquet")
view = obj.tensor_view(t=0, z=0, c=0, layout="CYX")

import jax.numpy as jnp

capsule = view.to_dlpack(mode="arrow", device="cpu")
flat = jnp.from_dlpack(capsule)
arr = flat.reshape(view.shape)

Iteration examples#

from ome_arrow import OMEArrow
import numpy as np

obj = OMEArrow("example.ome.parquet")
view = obj.tensor_view()

# Batch over time (T) dimension.
for cap in view.iter_dlpack(batch_size=2, shuffle=False, mode="numpy"):
    batch = np.from_dlpack(cap)
    # batch shape: (batch_size, Z, C, Y, X) in TZCYX layout

from ome_arrow import OMEArrow
import numpy as np

obj = OMEArrow("example.ome.parquet")
view = obj.tensor_view(t=0, z=0)

# Tile over spatial region.
for cap in view.iter_dlpack(
    tile_size=(256, 256), shuffle=True, seed=123, mode="numpy"
):
    tile = np.from_dlpack(cap)
    # tile shape: (C, Y, X) in CYX layout
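The tile iteration above can be emulated on a plain array. A sketch of the tiling and seeded-shuffle logic (iter_dlpack's behavior at image borders is not specified here, so the sketch uses dimensions that divide evenly into tiles):

```python
import random
import numpy as np

cyx = np.zeros((3, 1024, 1024), dtype=np.uint8)
tile_h, tile_w = 256, 256

# Enumerate tile origins over the spatial (Y, X) axes.
origins = [(y, x)
           for y in range(0, cyx.shape[1], tile_h)
           for x in range(0, cyx.shape[2], tile_w)]
random.Random(123).shuffle(origins)   # shuffle=True, seed=123 analogue

for y, x in origins:
    tile = cyx[:, y:y + tile_h, x:x + tile_w]
    # tile shape: (C, 256, 256) in CYX layout

print(len(origins))   # 16 tiles for a 1024x1024 image
```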

Ownership and lifetime#

TensorView.to_dlpack() returns a DLPack-capable object (with __dlpack__) that references the underlying Arrow values buffer in mode="arrow", or a NumPy buffer in mode="numpy". Keep the TensorView (or any NumPy array returned by to_numpy) alive until the consumer finishes using the DLPack object.

mode="arrow" currently requires a single (t, z, c) selection and a full-frame ROI. Use mode="numpy" for batches, crops, or layout reshaping beyond a simple reshape.

Zero-copy guarantees depend on the source: Arrow-backed inputs preserve buffers, while records built from Python lists or NumPy arrays will materialize once into Arrow buffers. The same applies to StructScalar inputs, which are normalized through Python objects before Arrow-mode export. For Parquet/Vortex sources, zero-copy also requires the on-disk struct schema to match OME_ARROW_STRUCT; non-strict schema normalization materializes via Python objects.

Optional dependencies#

CPU DLPack export uses Arrow buffers by default. For framework helpers and GPU paths, install only what you need:

pip install "ome-arrow[dlpack-torch]"  # torch only
pip install "ome-arrow[dlpack-jax]"    # jax only
pip install "ome-arrow[dlpack]"        # both

Benchmarking lazy reads#

To quickly compare lazy tensor read paths (TIFF source-backed execution, Parquet planes, Parquet chunks), run:

uv run python benchmarks/benchmark_lazy_tensor.py --repeats 5 --warmup 1

This is a lightweight local benchmark intended for directional performance checks during development.

In CI, the tests workflow runs a benchmark_canary job that executes the same script and uploads a JSON report artifact.

Recalibrating ci-baseline.json#

When performance changes are intentional (or runner behavior shifts), update benchmarks/ci-baseline.json as follows:

  1. Check out the latest main.

  2. Run the benchmark multiple times: uv run python benchmarks/benchmark_lazy_tensor.py --repeats 7 --warmup 2 --json-out benchmark-results.json

  3. Record median_ms per case across runs.

  4. Set each baseline value to a stable, slightly conservative median.

  5. Open a PR that updates baseline values only, with benchmark evidence.
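Step 3 can be scripted. A minimal sketch, assuming each --json-out report maps case names to objects with a median_ms field (the report schema and case names below are hypothetical, not the actual benchmark output):

```python
import statistics

# Hypothetical per-run reports, each mapping case name -> {"median_ms": ...}.
runs = [
    {"parquet_planes": {"median_ms": 11.8}, "parquet_chunks": {"median_ms": 9.1}},
    {"parquet_planes": {"median_ms": 12.4}, "parquet_chunks": {"median_ms": 8.7}},
    {"parquet_planes": {"median_ms": 12.1}, "parquet_chunks": {"median_ms": 9.4}},
]

# Median of per-run medians per case, as a starting point for baseline values.
baseline = {
    case: statistics.median(run[case]["median_ms"] for run in runs)
    for case in runs[0]
}
print(baseline)   # {'parquet_planes': 12.1, 'parquet_chunks': 9.1}
```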

Expected variability:

  • Small fluctuations are normal on GitHub-hosted runners.

  • Relative ordering of cases is usually stable.

  • Typical drift should be modest, but occasional jumps can happen due to runner image or dependency changes.