Python API
cytodataframe.frame
Defines a CytoDataFrame class.
- class src.cytodataframe.frame.CytoDataFrame(data: CytoDataFrame_type | DataFrame | str | Path, data_context_dir: str | None = None, data_image_paths: DataFrame | None = None, data_bounding_box: DataFrame | None = None, data_mask_context_dir: str | None = None, data_outline_context_dir: str | None = None, segmentation_file_regex: Dict[str, str] | None = None, image_adjustment: Callable | None = None, **kwargs: Dict[str, Any])[source]#
Bases:
DataFrame
A class designed to enhance single-cell data handling by wrapping pandas DataFrame capabilities, providing advanced methods for quality control, comprehensive analysis, and image-based data processing.
This class can initialize with either a pandas DataFrame or a file path (CSV, TSV, TXT, or Parquet). When initialized with a file path, it reads the data into a pandas DataFrame. It also includes capabilities to export data.
- _metadata#
A class-level attribute that includes custom attributes.
- Type:
ClassVar[list[str]]
- _custom_attrs#
A dictionary to store custom attributes, such as data source, context directory, and bounding box information.
- Type:
dict
- _metadata: ClassVar = ['_custom_attrs']#
- _repr_html_(key: int | str | None = None) str [source]#
Returns an HTML representation of the underlying pandas DataFrame for use within Jupyter notebook environments and similar.
Referenced with modifications from: pandas-dev/pandas
Modifications added to help achieve image-based output for single-cell data within the context of CytoDataFrame and coSMicQC.
Mainly for Jupyter notebooks.
- Returns:
The HTML representation of the data in the underlying pandas DataFrame.
- Return type:
str
- _wrap_method(method: Callable, *args: List[Any], **kwargs: Dict[str, Any]) Any [source]#
Wraps a given method to ensure that the returned result is a CytoDataFrame if applicable.
- Parameters:
method (Callable) – The method to be called and wrapped.
*args (List[Any]) – Positional arguments to be passed to the method.
**kwargs (Dict[str, Any]) – Keyword arguments to be passed to the method.
- Returns:
The result of the method call. If the result is a pandas DataFrame, it is wrapped in a CytoDataFrame instance with additional context information (data context directory and data bounding box).
- Return type:
Any
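The re-wrapping idea described for _wrap_method can be illustrated without pandas. The sketch below is not the library's implementation: ContextList, context_dir, and sorted_copy are hypothetical stand-ins, with a list subclass playing the role of the DataFrame subclass.

```python
from typing import Any, Callable, Optional


class ContextList(list):
    """A list subclass carrying extra context (hypothetical stand-in for a DataFrame subclass)."""

    def __init__(self, items=(), context_dir: Optional[str] = None):
        super().__init__(items)
        self.context_dir = context_dir

    def _wrap_method(self, method: Callable, *args: Any, **kwargs: Any) -> Any:
        result = method(*args, **kwargs)
        # Re-wrap plain list results so the context survives the call.
        if isinstance(result, list) and not isinstance(result, ContextList):
            return ContextList(result, context_dir=self.context_dir)
        # Anything else (numbers, strings, already-wrapped values) passes through.
        return result

    def sorted_copy(self) -> "ContextList":
        # sorted() returns a plain list, which _wrap_method converts back.
        return self._wrap_method(sorted, self)


values = ContextList([3, 1, 2], context_dir="images/")
ordered = values.sorted_copy()
print(type(ordered).__name__, list(ordered), ordered.context_dir)
```

Results that are not of the base type pass through untouched, which matches the "if applicable" wording above.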
- export(file_path: str, **kwargs: Dict[str, Any]) None [source]#
Exports the underlying pandas DataFrame to a file.
- Parameters:
file_path (str) – The path where the DataFrame should be saved.
**kwargs – Additional keyword arguments to pass to the pandas to_* methods.
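As a rough illustration of forwarding keyword arguments to a pandas to_* writer, the sketch below picks the writer from the file extension. The suffix mapping, the tab default for .tsv, and the helper name export_frame are assumptions made for this example; the formats the real method supports may differ.

```python
from pathlib import Path
from typing import Any

# Assumed suffix-to-writer mapping, for illustration only.
WRITERS = {".csv": "to_csv", ".tsv": "to_csv", ".parquet": "to_parquet"}


def export_frame(frame: Any, file_path: str, **kwargs: Any) -> None:
    """Write `frame` using the to_* method that matches the file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in WRITERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    if suffix == ".tsv":
        # Tab-separated output unless the caller chose a separator.
        kwargs.setdefault("sep", "\t")
    # Look up the writer by name and pass the remaining kwargs through.
    getattr(frame, WRITERS[suffix])(file_path, **kwargs)
```

With a pandas DataFrame named df, export_frame(df, "cells.csv", index=False) would call df.to_csv("cells.csv", index=False).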
- find_image_columns() List[str] [source]#
Find columns containing image file names.
This method searches for columns in the DataFrame that contain image file names with extensions .tif or .tiff (case insensitive).
- Returns:
A list of column names that contain image file names.
- Return type:
List[str]
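A minimal sketch of the case-insensitive .tif/.tiff check described above, using a plain dictionary of column values instead of a DataFrame. The helper name and the rule that a single matching value qualifies a column are assumptions, not the library's documented behavior.

```python
import re
from typing import Any, Dict, List

# Matches values ending in .tif or .tiff, ignoring case.
TIFF_PATTERN = re.compile(r".*\.tiff?$", re.IGNORECASE)


def find_tiff_columns(table: Dict[str, List[Any]]) -> List[str]:
    """Return the names of columns holding at least one TIFF file name."""
    return [
        name
        for name, values in table.items()
        if any(isinstance(v, str) and TIFF_PATTERN.match(v) for v in values)
    ]


table = {
    "Image_FileName_DNA": ["plate1_a01.tif", "plate1_a02.TIFF"],
    "Image_PathName_DNA": ["/data/images", "/data/images"],
    "Cells_AreaShape_Area": [120, 98],
}
print(find_tiff_columns(table))
```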
- find_image_path_columns(image_cols: List[str], all_cols: List[str]) Dict[str, str] [source]#
Find columns containing image path names (the directory storing the images but not the file names). We do this by seeking the pattern: Image_FileName_X –> Image_PathName_X.
- Parameters:
image_cols (List[str]) – A list of column names that contain image file names.
all_cols (List[str]) – A list of all column names.
- Returns:
A dictionary mapping each image file name column to its corresponding image path name column.
- Return type:
Dict[str, str]
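The Image_FileName_X to Image_PathName_X pairing can be sketched as a simple string substitution. This is an illustration under the assumption that only the FileName token changes between the two column names; map_file_to_path_columns is a hypothetical helper.

```python
from typing import Dict, List


def map_file_to_path_columns(image_cols: List[str], all_cols: List[str]) -> Dict[str, str]:
    """Pair each Image_FileName_X column with Image_PathName_X when present."""
    available = set(all_cols)
    pairs: Dict[str, str] = {}
    for col in image_cols:
        candidate = col.replace("FileName", "PathName")
        # Skip columns lacking the FileName token or a matching path column.
        if candidate != col and candidate in available:
            pairs[col] = candidate
    return pairs


all_cols = ["Image_FileName_DNA", "Image_PathName_DNA", "Image_FileName_RNA"]
print(map_file_to_path_columns(["Image_FileName_DNA", "Image_FileName_RNA"], all_cols))
```

Here Image_FileName_RNA is left out of the result because no Image_PathName_RNA column exists.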
- get_bounding_box_from_data() CytoDataFrame_type | None [source]#
Retrieves bounding box data from the DataFrame based on predefined column groups.
This method identifies specific groups of columns representing bounding box coordinates for different cellular components (cytoplasm, nuclei, cells) and checks for their presence in the DataFrame. If all required columns are present, it filters and returns a new CytoDataFrame instance containing only these columns.
- Returns:
A new instance of CytoDataFrame containing the bounding box columns if they exist in the DataFrame. Returns None if the required columns are not found.
- Return type:
Optional[CytoDataFrame_type]
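The presence check for bounding box column groups might look roughly like the following. The column names follow a CellProfiler-style convention and are assumptions, as is the choice to return the first complete group; neither is taken from the library source.

```python
from typing import Dict, List, Optional

# Hypothetical column groups, one per compartment named in the description.
BOUNDING_BOX_GROUPS: Dict[str, List[str]] = {
    compartment: [
        f"{compartment}_AreaShape_BoundingBoxMinimum_X",
        f"{compartment}_AreaShape_BoundingBoxMinimum_Y",
        f"{compartment}_AreaShape_BoundingBoxMaximum_X",
        f"{compartment}_AreaShape_BoundingBoxMaximum_Y",
    ]
    for compartment in ("Cytoplasm", "Nuclei", "Cells")
}


def select_bounding_box_columns(columns: List[str]) -> Optional[List[str]]:
    """Return the first group whose columns are all present, or None."""
    present = set(columns)
    for group in BOUNDING_BOX_GROUPS.values():
        if all(col in present for col in group):
            return group
    return None
```

With a pandas DataFrame df, the selected names could then be used as df[selected] to keep only the bounding box columns.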
- get_image_paths_from_data(image_cols: List[str]) Dict[str, str] [source]#
Gather data containing image path names (the directory storing the images but not the file names). We do this by seeking the pattern: Image_FileName_X –> Image_PathName_X.
- Parameters:
image_cols (List[str]) – A list of column names that contain image file names.
- Returns:
A dictionary associating each image file name column with its corresponding image path name column (Image_FileName_X to Image_PathName_X).
- Return type:
Dict[str, str]
- static is_notebook_or_lab() bool [source]#
Determines whether the code is being executed in a Jupyter notebook (.ipynb).
This method attempts to detect the interactive shell environment using IPython’s get_ipython function. It checks the class name of the current IPython shell to distinguish between different execution environments.
- Returns:
True if the code is being executed in a Jupyter notebook (.ipynb); False otherwise (e.g., standard Python shell, terminal IPython shell, or scripts).
- Return type:
bool
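The shell class name check described above can be sketched as follows. ZMQInteractiveShell is the class IPython uses for kernel-backed sessions such as Jupyter notebooks, while terminal IPython uses TerminalInteractiveShell. The function names here are hypothetical, and the library's own check may differ in detail.

```python
from typing import Optional


def shell_name_is_notebook(shell_name: Optional[str]) -> bool:
    """Classify an IPython shell class name (None means no active shell)."""
    return shell_name == "ZMQInteractiveShell"


def running_in_notebook() -> bool:
    """Best-effort detection of a notebook kernel."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a notebook.
        return False
    shell = get_ipython()
    # get_ipython() returns None outside of an IPython session.
    name = type(shell).__name__ if shell is not None else None
    return shell_name_is_notebook(name)
```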
- process_image_data_as_html_display(data_value: Any, bounding_box: Tuple[int, int, int, int], image_path: str | None = None) str [source]#
Process the image data based on the provided data value and bounding box, applying masks or outlines where applicable, and return an HTML representation of the cropped image for display.
- Parameters:
data_value (Any) – The value to search for in the file system or as the image data.
bounding_box (Tuple[int, int, int, int]) – The bounding box to crop the image.
- Returns:
The HTML image display string, or the unmodified data value if the image cannot be processed.
- Return type:
str
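Two of the steps named above, cropping to a bounding box and producing an HTML image string, can be sketched with the standard library alone. The (x_min, y_min, x_max, y_max) ordering, the PNG data URI, and both helper names are assumptions for illustration rather than the method's actual behavior.

```python
import base64
from typing import List, Tuple


def crop_to_bounding_box(
    image: List[List[int]], bounding_box: Tuple[int, int, int, int]
) -> List[List[int]]:
    """Crop a row-major image; the box is assumed to be (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bounding_box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def image_bytes_to_html(png_bytes: bytes) -> str:
    """Embed already-encoded PNG bytes in an HTML img tag via a data URI."""
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return f'<img src="data:image/png;base64,{encoded}"/>'
```

Embedding the image as a data URI keeps the HTML self-contained, which suits notebook output that has no separate file server.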
- search_for_mask_or_outline(data_value: str, pattern_map: dict, file_dir: str, candidate_path: Path, orig_image: ndarray, mask: bool = True) ndarray [source]#
Search for a mask or outline image file based on the provided patterns and apply it to the target image.
- Parameters:
data_value (str) – The value used to match patterns for locating mask or outline files.
pattern_map (dict) – A dictionary of file patterns and their corresponding original patterns for matching.
file_dir (str) – The directory where image files are stored.
candidate_path (pathlib.Path) – The path to the candidate image file to apply the mask or outline to.
orig_image (np.ndarray) – The image which will have a mask or outline applied.
mask (bool, optional) – Whether to search for a mask (True) or an outline (False). Default is True.
- Returns:
The target image with the applied mask or outline, or None if no relevant file is found.
- Return type:
Optional[np.ndarray]
- sort_values(*args: List[Any], **kwargs: Dict[str, Any]) CytoDataFrame_type [source]#
Sorts the DataFrame by the specified column(s) and returns a new CytoDataFrame instance.
Note: this method is wrapped within CytoDataFrame to help ensure that a CytoDataFrame is returned consistently. pd.Series objects are treated separately and have specialized processing within sort_values.
- Parameters:
*args (List[Any]) – Positional arguments to be passed to the pandas DataFrame’s sort_values method.
**kwargs (Dict[str, Any]) – Keyword arguments to be passed to the pandas DataFrame’s sort_values method.
- Returns:
A new instance of CytoDataFrame sorted by the specified column(s).
- Return type:
CytoDataFrame_type
cytodataframe.image
Helper functions for working with images in the context of CytoDataFrames.
- src.cytodataframe.image.adjust_image_brightness(image: Image) Image [source]#
Adjust the brightness of an image using histogram equalization.
- Parameters:
image (Image) – The input PIL Image.
- Returns:
The brightness-adjusted PIL Image.
- Return type:
Image
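For readers unfamiliar with histogram equalization, the sketch below applies the classic cumulative-distribution remapping to a flat list of 8-bit intensities. It is a pure-Python illustration of the general technique, not the function's PIL-based implementation, and equalize_histogram is a hypothetical name.

```python
from typing import List


def equalize_histogram(pixels: List[int], levels: int = 256) -> List[int]:
    """Equalize a flat list of integer intensities in [0, levels - 1]."""
    if not pixels:
        return []

    # Count how often each intensity occurs.
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Build the cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in histogram:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # A constant image has nothing to spread out.
        return list(pixels)

    # Stretch the cumulative distribution across the full intensity range.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lookup[value] for value in pixels]
```

A dark image whose values cluster near zero is spread across the whole range, so the darkest occupied level maps to 0 and the brightest to 255.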
- src.cytodataframe.image.adjust_with_adaptive_histogram_equalization(image: ndarray) ndarray [source]#
Adaptive histogram equalization with additional smoothing to reduce graininess.
- Parameters:
image (np.ndarray) – The input image to be processed.
- Returns:
The processed image with enhanced contrast.
- Return type:
np.ndarray
- src.cytodataframe.image.draw_outline_on_image_from_mask(orig_image: ndarray, mask_image_path: str) ndarray [source]#
Draws green outlines on an image based on a binary mask and returns the combined result.
Please note: masks are inherently challenging to use when working with multi-compartment datasets and may result in outlines that do not pertain to the precise compartment. For example, if an object mask overlaps with one or many other object masks the outlines may not differentiate between objects.
- Parameters:
orig_image (np.ndarray) – The image to which the mask will be applied. Must be a NumPy array.
mask_image_path (str) – Path to the binary mask image file.
- Returns:
The resulting image with the green outline applied.
- Return type:
np.ndarray
- src.cytodataframe.image.draw_outline_on_image_from_outline(orig_image: ndarray, outline_image_path: str) ndarray [source]#
Draws green outlines on an image based on a provided outline image and returns the combined result.
- Parameters:
orig_image (np.ndarray) – The original image on which the outlines will be drawn. It must be a grayscale or RGB image with shape (H, W) for grayscale or (H, W, 3) for RGB.
outline_image_path (str) – The file path to the outline image. This image will be used to determine the areas where the outlines will be drawn. It can be grayscale or RGB.
- Returns:
The original image with green outlines drawn on the non-black areas from the outline image. The result is returned as an RGB image with shape (H, W, 3).
- Return type:
np.ndarray
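The overlay rule described above (green wherever the outline image is non-black) can be sketched on nested lists. This is an illustration only: the actual function operates on NumPy arrays and loads the outline from a file, and overlay_outline is a hypothetical helper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
GREEN: Pixel = (0, 255, 0)


def overlay_outline(image: List[List[Pixel]], outline: List[List[int]]) -> List[List[Pixel]]:
    """Paint green wherever the grayscale outline image is non-black."""
    return [
        [
            GREEN if outline_value > 0 else pixel
            for pixel, outline_value in zip(image_row, outline_row)
        ]
        for image_row, outline_row in zip(image, outline)
    ]


image = [[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]]
outline = [[0, 255], [0, 0]]
print(overlay_outline(image, outline))
```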
- src.cytodataframe.image.is_image_too_dark(image: Image, pixel_brightness_threshold: float = 10.0) bool [source]#
Check if the image is too dark based on the mean brightness. By “too dark” we mean not as visible to the human eye.
- Parameters:
image (Image) – The input PIL Image.
pixel_brightness_threshold (float) – The brightness threshold below which the image is considered too dark.
- Returns:
True if the image is too dark, False otherwise.
- Return type:
bool
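The mean-brightness test can be sketched on a grid of grayscale values. The sketch assumes the comparison is a strict "less than" against the threshold and uses nested lists in place of a PIL Image; is_too_dark is a hypothetical name.

```python
from statistics import mean
from typing import List


def is_too_dark(pixels: List[List[float]], pixel_brightness_threshold: float = 10.0) -> bool:
    """Return True when the mean grayscale intensity falls below the threshold."""
    flat = [value for row in pixels for value in row]
    return mean(flat) < pixel_brightness_threshold


dim = [[2, 4], [6, 8]]          # mean brightness 5
bright = [[100, 120], [90, 110]]  # mean brightness 105
print(is_too_dark(dim), is_too_dark(bright))
```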