sopa.annotation

`sopa.annotation.tangram_annotate(sdata, adata_sc, cell_type_key, reference_preprocessing=None, bag_size=10000, max_obs_reference=10000, **kwargs)`

Multi-level cell-type annotation with Tangram. To reduce RAM usage, Tangram is run on several bags (subsets) of cells rather than on all cells at once.
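The bagging idea can be illustrated with a minimal sketch. The helper `split_into_bags` below is hypothetical (it is not part of sopa), and the exact splitting strategy used internally by `MultiLevelAnnotation` is not shown on this page; the sketch only illustrates why a smaller `bag_size` leads to more, smaller Tangram runs.

```python
def split_into_bags(n_obs: int, bag_size: int) -> list[range]:
    """Hypothetical helper: split cell indices 0..n_obs-1 into contiguous
    bags containing at most `bag_size` cells each."""
    if bag_size <= 0:
        raise ValueError("bag_size must be a positive integer")
    return [
        range(start, min(start + bag_size, n_obs))
        for start in range(0, n_obs, bag_size)
    ]


# Illustration only: 25,000 spatial cells with the default bag size.
bags = split_into_bags(n_obs=25_000, bag_size=10_000)
bag_sizes = [len(bag) for bag in bags]
```

In this sketch, 25,000 cells with `bag_size=10_000` give three bags of 10,000, 10,000 and 5,000 cells, so no single run has to hold more than 10,000 cells in memory.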

Parameters:

Name	Type	Description	Default
`sdata`	`SpatialData`	A `SpatialData` object	required
`adata_sc`	`AnnData`	A scRNAseq annotated reference	required
`cell_type_key`	`str`	Key of `adata_sc.obs` containing the cell types. For multi-level annotation, name the additional levels by suffixing this key: if `cell_type_key = "ct"`, then `"ct_level1"` and `"ct_level2"` are the next two levels	required
`reference_preprocessing`	`str`	Preprocessing method already applied to the reference. Can be `"log1p"` (`normalize_total` + `log1p`) or `"normalized"` (`normalize_total` only). By default, the reference is assumed to be unprocessed (raw counts)	`None`
`bag_size`	`int`	Size of each bag on which tangram will be run. Use smaller bags to lower the RAM usage	`10000`
`max_obs_reference`	`int`	Maximum number of cells used in `adata_sc` at each level. Decrease it to lower the RAM usage.	`10000`
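The naming convention for multi-level keys can be sketched on a toy stand-in for `adata_sc.obs`. The labels, the `batch` column, and the `level_keys` helper are all hypothetical and only illustrate the `"ct"`, `"ct_level1"`, `"ct_level2"` pattern described above; the sketch also assumes that each level refines the previous one, which this page does not state explicitly.

```python
import pandas as pd

# Toy stand-in for `adata_sc.obs`; all labels are made up for illustration.
obs = pd.DataFrame(
    {
        "ct": ["Lymphoid", "Lymphoid", "Myeloid"],
        "ct_level1": ["T cell", "B cell", "Macrophage"],
        "ct_level2": ["CD8 T cell", "Naive B cell", "M2 macrophage"],
        "batch": ["a", "a", "b"],
    }
)


def level_keys(cell_type_key: str, columns) -> list[str]:
    """Hypothetical helper: collect `key`, `key_level1`, `key_level2`, ...
    for as long as the next suffixed column exists."""
    keys = [cell_type_key]
    level = 1
    while f"{cell_type_key}_level{level}" in columns:
        keys.append(f"{cell_type_key}_level{level}")
        level += 1
    return keys


keys = level_keys("ct", obs.columns)
```

With this layout, passing `cell_type_key="ct"` would correspond to three annotation levels, while a key without suffixed columns (such as `"batch"` here) corresponds to a single level.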

Source code in sopa/annotation/tangram/run.py

def tangram_annotate(
    sdata: SpatialData,
    adata_sc: AnnData,
    cell_type_key: str,
    reference_preprocessing: str = None,
    bag_size: int = 10_000,
    max_obs_reference: int = 10_000,
    **kwargs,
):
    """Tangram multi-level annotation. Tangram is run on multiple bags of cells to decrease the RAM usage.

    Args:
        sdata: A `SpatialData` object
        adata_sc: A scRNAseq annotated reference
        cell_type_key: Key of `adata_sc.obs` containing the cell types. For multi-level annotation, provide other levels like such: if `cell_type_key = "ct"`, then `"ct_level1"` and `"ct_level2"` are the two next levels
        reference_preprocessing: Preprocessing method used on the reference. Can be `"log1p"` (normalize_total + log1p) or `"normalized"` (just normalize_total). By default, consider that no processing was applied (raw counts)
        bag_size: Size of each bag on which tangram will be run. Use smaller bags to lower the RAM usage
        max_obs_reference: Maximum number of cells used in `adata_sc` at each level. Decrease it to lower the RAM usage.
    """
    assert SopaKeys.TABLE in sdata.tables, f"No '{SopaKeys.TABLE}' found in sdata.tables"

    ad_sp = sdata.tables[SopaKeys.TABLE]

    MultiLevelAnnotation(
        ad_sp,
        adata_sc,
        cell_type_key,
        reference_preprocessing,
        bag_size,
        max_obs_reference,
        **kwargs,
    ).run()

`sopa.annotation.higher_z_score(adata, marker_cell_dict, cell_type_key='cell_type')`

Simple channel-based annotation using a marker-to-population dictionary: each cell is assigned the population whose marker channel has the highest z-score.

Parameters:

Name	Type	Description	Default
`adata`	`AnnData`	An `AnnData` object	required
`marker_cell_dict`	`dict`	Dictionary whose keys are channels, and values are the corresponding populations.	required
`cell_type_key`	`str`	Key of `adata.obs` where annotations will be stored	`'cell_type'`
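The assignment rule can be reproduced on a toy table of z-scores. The channel names, populations, and values below are made up for illustration; the sketch mirrors only the arg-max step and does not call sopa.

```python
import numpy as np
import pandas as pd

# Toy z-scores: rows are cells, columns are channels (values are made up).
z_scores = pd.DataFrame(
    {
        "CD3": [2.1, -0.3, 0.2],
        "CD20": [-0.5, 1.8, 0.1],
        "CK": [0.0, -1.0, 1.5],
    },
    index=["cell_0", "cell_1", "cell_2"],
)

# Keys are channels, values are the corresponding populations.
marker_cell_dict = {"CD3": "T cells", "CD20": "B cells", "CK": "Tumor cells"}

markers = list(marker_cell_dict.keys())
cell_types = np.array(list(marker_cell_dict.values()))

# For each cell, pick the index of the marker with the highest z-score,
# then map that index to its population name.
ct_indices = z_scores[markers].values.argmax(axis=1)
annotations = pd.Series(cell_types[ct_indices], index=z_scores.index, name="cell_type")
```

Because the rule is a plain arg-max, every cell receives a label, even when all of its z-scores are low; channels that are absent from `marker_cell_dict` are ignored.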

Source code in sopa/annotation/fluorescence.py

def higher_z_score(adata: AnnData, marker_cell_dict: dict, cell_type_key: str = "cell_type"):
    """Simple channel-based annotation using a marker-to-population dictionary

    Args:
        adata: An `AnnData` object
        marker_cell_dict: Dictionary whose keys are channels, and values are the corresponding populations.
        cell_type_key: Key of `adata.obs` where annotations will be stored
    """
    adata.obsm[SopaKeys.Z_SCORES] = preprocess_fluo(adata)

    markers, cell_types = list(marker_cell_dict.keys()), np.array(list(marker_cell_dict.values()))
    ct_indices = adata.obsm[SopaKeys.Z_SCORES][markers].values.argmax(1)

    adata.obs[cell_type_key] = cell_types[ct_indices]
    adata.uns[SopaKeys.UNS_KEY][SopaKeys.UNS_CELL_TYPES] = [cell_type_key]

    log.info(f"Annotation counts: {adata.obs[cell_type_key].value_counts()}")

`sopa.annotation.preprocess_fluo(adata)`

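Preprocess fluorescence data. For each column \(X\), we compute \(\operatorname{asinh}\left(\frac{X}{5\,Q(0.2, X)}\right)\), where \(Q(0.2, X)\) is the 0.2 quantile of \(X\), and then standardize the result (zero mean and unit standard deviation per column). If \(Q(0.2, X) = 0\), the column maximum is used as the divider instead.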

Parameters:

Name	Type	Description	Default
`adata`	`AnnData`	An `AnnData` object	required

Returns:

Type	Description
`DataFrame`	A dataframe of preprocessed channel intensities
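The transformation can be traced on a small made-up intensity table. This is a sketch of the formula above using only NumPy and pandas, not a call into sopa; the channel names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy intensities: rows are cells, columns are channels (values are made up).
df = pd.DataFrame(
    {
        "DAPI": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
        "CD3": [0.0, 0.0, 0.0, 5.0, 10.0, 50.0],
    }
)

# Per-channel divider: five times the 0.2 quantile of the column.
divider = 5 * np.quantile(df.to_numpy(), 0.2, axis=0)

# Channels whose 0.2 quantile is zero fall back to the column maximum,
# which avoids a division by zero.
zero_mask = divider == 0
divider[zero_mask] = df.max(axis=0).to_numpy()[zero_mask]

# asinh scaling followed by per-channel standardization.
scaled = np.arcsinh(df / divider)
standardized = (scaled - scaled.mean(0)) / scaled.std(0)
```

Here the `CD3` column has a 0.2 quantile of zero, so its divider falls back to the column maximum. Note that pandas computes `std` with `ddof=1` by default, so the standardization uses the sample standard deviation.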

Source code in sopa/annotation/fluorescence.py

def preprocess_fluo(adata: AnnData) -> pd.DataFrame:
    """Preprocess fluorescence data. For each column $X$, we compute $asinh(\\frac{X}{5Q(0.2, X)})$ and apply standardization

    Args:
        adata: An `AnnData` object

    Returns:
        A dataframe of preprocessed channels intensities
    """
    if SopaKeys.INTENSITIES_OBSM in adata.obsm:
        df = adata.obsm[SopaKeys.INTENSITIES_OBSM]
    else:
        df = adata.to_df()

    divider = 5 * np.quantile(df, 0.2, axis=0)
    divider[divider == 0] = df.max(axis=0)[divider == 0]

    scaled = np.arcsinh(df / divider)
    return (scaled - scaled.mean(0)) / scaled.std(0)