Python API

cosmicqc.analyze

Module for detecting various quality control aspects from source data.

src.cosmicqc.analyze.find_outliers(df: CytoDataFrame | DataFrame | str, metadata_columns: List[str], feature_thresholds: Dict[str, float] | str, feature_thresholds_file: str | None = DEFAULT_QC_THRESHOLD_FILE, export_path: str | None = None) → DataFrame

Uses identify_outliers to return a DataFrame containing only the outlier rows, along with the provided metadata columns.

Parameters:
  • df – Union[CytoDataFrame, pd.DataFrame, str] A DataFrame, or a string filepath to a Parquet, CSV, or TSV file, containing CytoTable output or similar data.

  • metadata_columns – List[str] List of metadata columns to include in the output alongside the outlier data.

  • feature_thresholds – Union[Dict[str, float], str] One of two options: (1) a dictionary with feature names as keys and, as values, the thresholds used to identify outliers, where a positive threshold detects outliers above the mean and a negative threshold detects outliers below the mean; or (2) a string naming a key found within the feature_thresholds_file YAML file.

  • feature_thresholds_file – Optional[str] = DEFAULT_QC_THRESHOLD_FILE An optional file in which feature thresholds are defined.

  • export_path – Optional[str] = None An optional path to export the data using CytoDataFrame export capabilities. If None, no export is performed. Note: compatible export formats are CSV, TSV, and Parquet.

Returns:

Outlier data frame for the given conditions.

Return type:

pd.DataFrame
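The filtering step described for find_outliers can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the helper select_outlier_rows, the column names, and the values are all hypothetical, and keeping the thresholded feature columns next to the metadata columns is an assumption about what "the outlier data" contains.

```python
from typing import Any, Dict, List


def select_outlier_rows(
    rows: List[Dict[str, Any]],
    is_outlier: List[bool],
    metadata_columns: List[str],
    feature_columns: List[str],
) -> List[Dict[str, Any]]:
    """Keep only flagged rows, restricted to metadata and feature columns."""
    keep = [*metadata_columns, *feature_columns]
    return [
        {column: row[column] for column in keep}
        for row, flagged in zip(rows, is_outlier)
        if flagged
    ]


# Hypothetical column names and values, for illustration only.
rows = [
    {"Metadata_Well": "A01", "Nuclei_AreaShape_Area": 100.0, "Other": 1},
    {"Metadata_Well": "A02", "Nuclei_AreaShape_Area": 400.0, "Other": 2},
]
is_outlier = [False, True]  # e.g. the boolean result of an outlier check
outliers = select_outlier_rows(
    rows, is_outlier, ["Metadata_Well"], ["Nuclei_AreaShape_Area"]
)
```

Here only the second row is kept, and the unrelated "Other" column is dropped from the result.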

src.cosmicqc.analyze.identify_outliers(df: CytoDataFrame | DataFrame | str, feature_thresholds: Dict[str, float] | str, feature_thresholds_file: str | None = DEFAULT_QC_THRESHOLD_FILE, include_threshold_scores: bool = False, export_path: str | None = None) → Series | CytoDataFrame

This function uses z-scoring to detect outlier nuclei or cells based on specific CellProfiler features. Thresholds are expressed as the number of standard deviations from the mean, either above (positive) or below (negative). Avoid a threshold of 0, as that would represent the whole dataset.

Parameters:
  • df – Union[CytoDataFrame, pd.DataFrame, str] A DataFrame, or a string filepath to a Parquet, CSV, or TSV file, containing CytoTable output or similar data.

  • feature_thresholds – Union[Dict[str, float], str] One of two options: (1) a dictionary with feature names as keys and, as values, the thresholds used to identify outliers, where a positive threshold detects outliers above the mean and a negative threshold detects outliers below the mean; or (2) a string naming a key found within the feature_thresholds_file YAML file.

  • feature_thresholds_file – Optional[str] = DEFAULT_QC_THRESHOLD_FILE An optional file in which feature thresholds are defined.

  • include_threshold_scores – bool Whether to include the threshold scores in addition to the per-row result of whether the threshold set passes.

  • export_path – Optional[str] = None An optional path to export the data using CytoDataFrame export capabilities. If None, no export is performed. Note: compatible export formats are CSV, TSV, and Parquet.

Returns:

Boolean series indicating, for each row, whether an outlier was detected, for use within other functions.

Return type:

Union[pd.Series, CytoDataFrame]
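The z-score thresholding described for identify_outliers can be illustrated with the standard library alone. This is a hedged sketch, not the library's code: the helpers zscores and flag_outliers, the feature name, and the values are hypothetical. It also assumes the population standard deviation, and assumes that a row is flagged only when every thresholded feature is beyond its threshold.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscores(values: List[float]) -> List[float]:
    """Return z-scores using the population standard deviation (an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(value - mu) / sigma for value in values]


def flag_outliers(
    columns: Dict[str, List[float]], feature_thresholds: Dict[str, float]
) -> List[bool]:
    """Flag rows where every thresholded feature is beyond its threshold."""
    n_rows = len(next(iter(columns.values())))
    flags = [True] * n_rows
    for feature, threshold in feature_thresholds.items():
        for i, z in enumerate(zscores(columns[feature])):
            # Positive threshold: outliers lie above it; negative: below it.
            beyond = z > threshold if threshold > 0 else z < threshold
            flags[i] = flags[i] and beyond
    return flags


# Hypothetical feature name and values, for illustration only.
data = {"Nuclei_AreaShape_Area": [100.0, 102.0, 98.0, 101.0, 99.0, 400.0]}
outlier_flags = flag_outliers(data, {"Nuclei_AreaShape_Area": 2.0})
```

With a threshold of 2.0, only the last value (400.0) lies more than two standard deviations above the mean, so it is the only row flagged.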

src.cosmicqc.analyze.label_outliers(df: CytoDataFrame | DataFrame | str, feature_thresholds: Dict[str, float] | str | None = None, feature_thresholds_file: str | None = DEFAULT_QC_THRESHOLD_FILE, include_threshold_scores: bool = False, export_path: str | None = None, report_path: str | None = None, **kwargs: Dict[str, Any]) → CytoDataFrame

Uses identify_outliers to label each cell in the original dataset as having passed or failed the quality control condition(s).

Parameters:
  • df – Union[CytoDataFrame, pd.DataFrame, str] A DataFrame, or a string filepath to a Parquet, CSV, or TSV file, containing CytoTable output or similar data.

  • feature_thresholds – Optional[Union[Dict[str, float], str]] = None One of two options: (1) a dictionary with feature names as keys and, as values, the thresholds used to identify outliers, where a positive threshold detects outliers above the mean and a negative threshold detects outliers below the mean; or (2) a string naming a key found within the feature_thresholds_file YAML file.

  • feature_thresholds_file – Optional[str] = DEFAULT_QC_THRESHOLD_FILE An optional file in which feature thresholds are defined.

  • include_threshold_scores – bool = False Whether to include the scores in addition to whether an outlier was detected.

  • export_path – Optional[str] = None An optional path to export the data using CytoDataFrame export capabilities. If None, no export is performed. Note: compatible export formats are CSV, TSV, and Parquet.

Returns:

Full dataframe with optional scores and outlier boolean column.

Return type:

CytoDataFrame
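The labeling behavior of label_outliers, which keeps every row and adds a boolean outlier column, can be sketched as follows. This is an illustration only: the helper label_rows, the column name is_outlier, and the data are hypothetical and are not taken from the library.

```python
from typing import Any, Dict, List


def label_rows(
    rows: List[Dict[str, Any]],
    is_outlier: List[bool],
    label_column: str = "is_outlier",  # hypothetical column name
) -> List[Dict[str, Any]]:
    """Return every row unchanged, plus a boolean outlier label."""
    return [
        {**row, label_column: flagged}
        for row, flagged in zip(rows, is_outlier)
    ]


# Hypothetical column names and values, for illustration only.
rows = [
    {"Metadata_Well": "A01", "Nuclei_AreaShape_Area": 100.0},
    {"Metadata_Well": "A02", "Nuclei_AreaShape_Area": 400.0},
]
labeled = label_rows(rows, [False, True])
```

Unlike the filtering approach, no rows are removed here, so the labeled data can still be inspected or filtered later.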

src.cosmicqc.analyze.read_thresholds_set_from_file(feature_thresholds_file: str, feature_thresholds: str | None = None) → Dict[str, int] | Dict[str, Dict[str, int]]

Reads a set of feature thresholds from a specified file.

This function takes the path to a feature thresholds file and an optional string naming a thresholds set, reads the file, and returns the matching thresholds set from the file.

Parameters:
  • feature_thresholds_file (str) – The path to the file containing feature thresholds.

  • feature_thresholds (Optional[str], default None) – A string naming the thresholds set to read. If None, all thresholds are returned.

Returns:

A dictionary containing the processed feature thresholds.

Return type:

dict

Raises:

LookupError – If the file does not contain the specified feature_thresholds key.
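The lookup behavior described above (return one named set, return everything when no name is given, raise LookupError for a missing key) can be sketched on an already-parsed mapping. This is a minimal sketch under stated assumptions: the helper select_thresholds_set, the flat layout of the mapping, and the set and feature names are hypothetical, and file parsing is left out.

```python
from typing import Dict, Optional, Union

ThresholdsSet = Dict[str, int]


def select_thresholds_set(
    all_sets: Dict[str, ThresholdsSet], name: Optional[str] = None
) -> Union[ThresholdsSet, Dict[str, ThresholdsSet]]:
    """Return one named thresholds set, or every set when name is None."""
    if name is None:
        return all_sets
    if name not in all_sets:
        raise LookupError(f"No thresholds set named {name!r} was found.")
    return all_sets[name]


# Hypothetical set and feature names, standing in for parsed file contents.
parsed = {
    "small_nuclei": {"Nuclei_AreaShape_Area": -2},
    "large_nuclei": {"Nuclei_AreaShape_Area": 2},
}
one_set = select_thresholds_set(parsed, "large_nuclei")
every_set = select_thresholds_set(parsed)
```

Requesting a name that is not in the mapping raises LookupError instead of returning an empty result, which makes a mistyped set name fail early.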

cosmicqc.cli

Set up the coSMicQC CLI through python-fire.

src.cosmicqc.cli.HasCustomRepr(component: object) → bool

Mirrors python-fire's HasCustomStr function to determine whether component has a custom __repr__ method.

Parameters:

component – The object to check for a custom __repr__ method.

Returns:

Whether component has a custom __repr__ method.
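One way to perform such a check is to compare the class's __repr__ against the default inherited from object. The sketch below illustrates the idea and is not necessarily how HasCustomRepr is implemented. The helper has_custom_repr and the two example classes are hypothetical.

```python
def has_custom_repr(component: object) -> bool:
    """Return True if the component's class overrides object.__repr__."""
    return type(component).__repr__ is not object.__repr__


class Plain:
    """Hypothetical class that inherits the default __repr__."""


class Described:
    """Hypothetical class that defines its own __repr__."""

    def __repr__(self) -> str:
        return "Described()"


plain_result = has_custom_repr(Plain())
described_result = has_custom_repr(Described())
```

A check like this lets a CLI decide whether printing an object's repr is likely to be meaningful to the user.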

src.cosmicqc.cli._PrintResult(component_trace: FireTrace, verbose: bool = False, serialize: bool | None = None) → None

Prints the result of the Fire call to stdout in a human-readable way.

src.cosmicqc.cli.cli_analyze() → None

Runs the analyze module functions through the python-fire CLI.

This function serves as the CLI entry point for functions within the analyze module.

cosmicqc.frame

cosmicqc.image