
sopa.annotation

sopa.annotation.tangram_annotate(sdata, adata_sc, cell_type_key, reference_preprocessing=None, bag_size=10000, max_obs_reference=10000, **kwargs)

Multi-level cell-type annotation with Tangram. To reduce RAM usage, Tangram is run separately on multiple bags (subsets) of cells.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sdata | SpatialData | A SpatialData object. It must contain the table SopaKeys.TABLE in sdata.tables | required |
| adata_sc | AnnData | An annotated scRNA-seq reference | required |
| cell_type_key | str | Key of adata_sc.obs containing the cell types. For multi-level annotation, provide the other levels as follows: if cell_type_key = "ct", then "ct_level1" and "ct_level2" are the next two levels | required |
| reference_preprocessing | str | Preprocessing method applied to the reference: "log1p" (normalize_total + log1p) or "normalized" (normalize_total only). If None, the reference is assumed to contain raw counts | None |
| bag_size | int | Number of cells in each bag on which Tangram is run. Use smaller bags to lower RAM usage | 10000 |
| max_obs_reference | int | Maximum number of adata_sc cells used at each annotation level. Decrease it to lower RAM usage | 10000 |
| **kwargs | | Additional keyword arguments forwarded to MultiLevelAnnotation | |
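The multi-level naming convention for cell_type_key can be illustrated with a small helper that lists the annotation keys in order. This is a minimal sketch, assuming levels are numbered consecutively from 1 and that discovery stops at the first missing level; `annotation_levels` is a hypothetical name, not part of sopa, and sopa's own level detection may differ.

```python
def annotation_levels(obs_columns, cell_type_key):
    """Return the annotation keys present in obs_columns, from the first level to the last.

    Hypothetical helper (not part of sopa). Assumes the convention
    cell_type_key, f"{cell_type_key}_level1", f"{cell_type_key}_level2", ...
    and stops at the first missing level.
    """
    columns = set(obs_columns)
    if cell_type_key not in columns:
        raise KeyError(f"'{cell_type_key}' not found in adata_sc.obs")

    levels = [cell_type_key]
    i = 1
    # Collect consecutive "_levelN" keys until one is missing
    while f"{cell_type_key}_level{i}" in columns:
        levels.append(f"{cell_type_key}_level{i}")
        i += 1
    return levels


# Example: "batch" is not an annotation level and is ignored
print(annotation_levels(["ct", "ct_level1", "ct_level2", "batch"], "ct"))
# ['ct', 'ct_level1', 'ct_level2']
```

Under this assumption, a gap in the numbering (e.g. "ct_level2" without "ct_level1") means only "ct" is used.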
Source code in sopa/annotation/tangram/run.py
from __future__ import annotations

from anndata import AnnData
from spatialdata import SpatialData

# SopaKeys and MultiLevelAnnotation are defined elsewhere in the sopa package


def tangram_annotate(
    sdata: SpatialData,
    adata_sc: AnnData,
    cell_type_key: str,
    reference_preprocessing: str | None = None,
    bag_size: int = 10_000,
    max_obs_reference: int = 10_000,
    **kwargs,
):
    """Tangram multi-level annotation. Tangram is run on multiple bags of cells to decrease the RAM usage.

    Args:
        sdata: A `SpatialData` object
        adata_sc: An annotated scRNA-seq reference
        cell_type_key: Key of `adata_sc.obs` containing the cell types. For multi-level annotation, provide the other levels as follows: if `cell_type_key = "ct"`, then `"ct_level1"` and `"ct_level2"` are the next two levels
        reference_preprocessing: Preprocessing method used on the reference. Can be `"log1p"` (normalize_total + log1p) or `"normalized"` (normalize_total only). If `None` (default), the reference is assumed to contain raw counts
        bag_size: Size of each bag on which Tangram will be run. Use smaller bags to lower RAM usage
        max_obs_reference: Maximum number of cells used in `adata_sc` at each level. Decrease it to lower RAM usage.
        **kwargs: Additional keyword arguments forwarded to `MultiLevelAnnotation`
    """
    # The spatial cell table must already be present in sdata.tables
    assert SopaKeys.TABLE in sdata.tables, f"No '{SopaKeys.TABLE}' found in sdata.tables"

    ad_sp = sdata.tables[SopaKeys.TABLE]

    # Run the bagged, multi-level Tangram annotation on the spatial table
    MultiLevelAnnotation(
        ad_sp,
        adata_sc,
        cell_type_key,
        reference_preprocessing,
        bag_size,
        max_obs_reference,
        **kwargs,
    ).run()
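The bagging that keeps RAM usage down can be pictured with two small helpers. This is a sketch under stated assumptions: the spatial cells are shuffled and cut into roughly equal bags of at most bag_size cells, and the reference is subsampled uniformly at random to max_obs_reference cells. `split_into_bags` and `subsample_reference` are hypothetical names; MultiLevelAnnotation may split or subsample differently.

```python
import numpy as np


def split_into_bags(n_obs: int, bag_size: int, seed: int = 0) -> list[np.ndarray]:
    """Randomly partition cell indices 0..n_obs-1 into bags of at most bag_size cells.

    Hypothetical helper: sopa's internal bagging may differ.
    """
    rng = np.random.default_rng(seed)
    # Smallest number of bags such that no bag exceeds bag_size
    n_bags = max(1, int(np.ceil(n_obs / bag_size)))
    # array_split produces near-equal chunks (sizes differ by at most 1)
    return np.array_split(rng.permutation(n_obs), n_bags)


def subsample_reference(n_ref: int, max_obs: int, seed: int = 0) -> np.ndarray:
    """Pick at most max_obs reference cell indices, uniformly without replacement (hypothetical)."""
    if n_ref <= max_obs:
        return np.arange(n_ref)
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_ref, size=max_obs, replace=False))


bags = split_into_bags(25_000, bag_size=10_000)
print([len(b) for b in bags])  # [8334, 8333, 8333]
print(len(subsample_reference(50_000, max_obs=10_000)))  # 10000
```

Under these assumptions, each Tangram mapping matrix (reference cells × spatial cells) is bounded by max_obs_reference × bag_size entries, which is why lowering either parameter reduces memory.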

sopa.annotation.higher_z_score(adata, marker_cell_dict, cell_type_key='cell_type')

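Simple channel-based annotation using a marker-to-population dictionary. Each cell is assigned the population of the marker channel with the highest z-score (computed with preprocess_fluo); the z-scores are also stored in adata.obsm[SopaKeys.Z_SCORES].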

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| adata | AnnData | An AnnData object containing the channel intensities | required |
| marker_cell_dict | dict | Dictionary whose keys are channel names and whose values are the corresponding populations | required |
| cell_type_key | str | Key of adata.obs where the annotations will be stored | 'cell_type' |
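The assignment rule amounts to an argmax over the marker columns of the z-score table. The sketch below reproduces that step on a plain pandas DataFrame of precomputed z-scores; the channel names, populations, and values are made up for illustration, and `assign_by_max_z` is a hypothetical helper, not part of sopa.

```python
import numpy as np
import pandas as pd


def assign_by_max_z(z_scores: pd.DataFrame, marker_cell_dict: dict) -> pd.Series:
    """Label each cell with the population of its highest-z-score marker channel.

    Hypothetical helper mirroring the argmax step of higher_z_score,
    applied to a cells x channels DataFrame.
    """
    markers = list(marker_cell_dict.keys())
    cell_types = np.array(list(marker_cell_dict.values()))
    # Only the marker columns are compared; other channels are ignored
    ct_indices = z_scores[markers].values.argmax(1)
    return pd.Series(cell_types[ct_indices], index=z_scores.index)


z = pd.DataFrame(
    {
        "CD3": [2.1, -0.3, 0.1],
        "CD20": [0.2, 1.8, -0.5],
        "PanCK": [-1.0, 0.4, 1.2],
        "DAPI": [5.0, 5.0, 5.0],  # not a key of the dictionary, so never chosen
    },
    index=["cell_0", "cell_1", "cell_2"],
)
markers = {"CD3": "T cell", "CD20": "B cell", "PanCK": "Tumor"}
print(assign_by_max_z(z, markers).tolist())  # ['T cell', 'B cell', 'Tumor']
```

Because the rule has no threshold, every cell receives a label, even when all of its marker z-scores are low.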
Source code in sopa/annotation/fluorescence.py
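import logging

import numpy as np
from anndata import AnnData

# SopaKeys is defined elsewhere in the sopa package; preprocess_fluo is defined in the same module
log = logging.getLogger(__name__)


def higher_z_score(adata: AnnData, marker_cell_dict: dict, cell_type_key: str = "cell_type"):
    """Simple channel-based annotation using a marker-to-population dictionary

    Args:
        adata: An `AnnData` object
        marker_cell_dict: Dictionary whose keys are channels, and values are the corresponding populations.
        cell_type_key: Key of `adata.obs` where annotations will be stored
    """
    # Per-channel z-scores of the asinh-transformed intensities
    adata.obsm[SopaKeys.Z_SCORES] = preprocess_fluo(adata)

    # For each cell, pick the marker channel with the highest z-score
    markers, cell_types = list(marker_cell_dict.keys()), np.array(list(marker_cell_dict.values()))
    ct_indices = adata.obsm[SopaKeys.Z_SCORES][markers].values.argmax(1)

    adata.obs[cell_type_key] = cell_types[ct_indices]
    # Register the annotation key, creating the sopa entry of adata.uns if it is missing
    adata.uns.setdefault(SopaKeys.UNS_KEY, {})[SopaKeys.UNS_CELL_TYPES] = [cell_type_key]

    log.info(f"Annotation counts: {adata.obs[cell_type_key].value_counts()}")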

sopa.annotation.preprocess_fluo(adata)

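Preprocess fluorescence data. For each column \(X\), we compute \(\operatorname{asinh}\left(\frac{X}{5\,Q(0.2, X)}\right)\), where \(Q(0.2, X)\) is the 0.2-quantile of \(X\) (when this quantile is 0, the maximum of \(X\) is used as the denominator instead), and then standardize each column.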
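Written out step by step, with the zero-quantile fallback applied in the source code and pandas' default sample standard deviation (ddof=1), the transform of a channel \(X\) is:

\[
d_X = \begin{cases} 5\,Q(0.2, X) & \text{if } Q(0.2, X) \neq 0 \\ \max(X) & \text{otherwise} \end{cases}
\qquad
Y = \operatorname{asinh}\left(\frac{X}{d_X}\right)
\qquad
Z = \frac{Y - \operatorname{mean}(Y)}{\operatorname{std}(Y)}
\]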

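Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| adata | AnnData | An AnnData object. Intensities are read from adata.obsm[SopaKeys.INTENSITIES_OBSM] if present, otherwise from adata.X | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A dataframe of preprocessed channel intensities (cells × channels) |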

Source code in sopa/annotation/fluorescence.py
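import numpy as np
import pandas as pd
from anndata import AnnData

# SopaKeys is defined elsewhere in the sopa package


def preprocess_fluo(adata: AnnData) -> pd.DataFrame:
    """Preprocess fluorescence data. For each column $X$, we compute $asinh(\\frac{X}{5Q(0.2, X)})$ and apply standardization

    Args:
        adata: An `AnnData` object

    Returns:
        A dataframe of preprocessed channel intensities
    """
    # Use the aggregated channel intensities if present, else the main matrix
    if SopaKeys.INTENSITIES_OBSM in adata.obsm:
        df = adata.obsm[SopaKeys.INTENSITIES_OBSM]
    else:
        df = adata.to_df()

    # Per-channel divider: 5 times the 20% quantile
    divider = 5 * np.quantile(df, 0.2, axis=0)
    # Where that quantile is 0, fall back to the channel maximum
    divider[divider == 0] = df.max(axis=0)[divider == 0]

    scaled = np.arcsinh(df / divider)
    # Standardize each channel: subtract its mean, divide by its standard deviation
    return (scaled - scaled.mean(0)) / scaled.std(0)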
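To see the zero-quantile fallback at work, the same transform can be applied to a toy cells × channels DataFrame. This sketch copies the logic above into two hypothetical helpers, `fluo_divider` and `preprocess_fluo_df`, that take a DataFrame instead of an AnnData object; the channel values are invented. CD3 is zero in most cells, so its 20% quantile is 0 and its divider falls back to its maximum (50), while DAPI gets 5 × 98 = 490.

```python
import numpy as np
import pandas as pd


def fluo_divider(df: pd.DataFrame) -> np.ndarray:
    """Per-channel divider: 5 * 20% quantile, or the channel max when that is 0 (hypothetical helper)."""
    divider = 5 * np.quantile(df.to_numpy(dtype=float), 0.2, axis=0)
    zero = divider == 0
    divider[zero] = df.max(axis=0).to_numpy(dtype=float)[zero]
    return divider


def preprocess_fluo_df(df: pd.DataFrame) -> pd.DataFrame:
    """asinh-transform each channel by its divider, then standardize each channel (hypothetical helper)."""
    # Dividing by a 1-D array of length n_channels scales each column by its own divider
    scaled = np.arcsinh(df / fluo_divider(df))
    return (scaled - scaled.mean(0)) / scaled.std(0)


df = pd.DataFrame(
    {"CD3": [0, 0, 0, 10, 50], "DAPI": [100, 120, 110, 130, 90]},
    index=[f"cell_{i}" for i in range(5)],
)
print(fluo_divider(df))
print(preprocess_fluo_df(df).round(2))
```

Without the fallback, CD3 would be divided by 0 and its values would become infinite or NaN before standardization. A channel that is entirely zero would still yield NaN, since its maximum is also 0.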