Patches

`sopa.make_image_patches(sdata, patch_width=2000, patch_overlap=50, image_key=None, roi_key=SopaKeys.ROI, key_added=None)`

Create overlapping patches on an image. This can be used for image-based segmentation methods such as Cellpose, which will run on each patch.

Parameters:

Name	Type	Description	Default
`sdata`	`SpatialData`	A `SpatialData` object.	required
`patch_width`	`int \| None`	Width of the patches, in pixels. If `None`, creates only one patch.	`2000`
`patch_overlap`	`int`	Number of pixels of overlap between patches.	`50`
`image_key`	`str \| None`	Optional key of the image on which the patches will be made. If not provided, it is found automatically.	`None`
`roi_key`	`str \| None`	Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if existing. If `None`, all patches will be used.	`ROI`
`key_added`	`str \| None`	Optional name of the patches to be saved. By default, uses `"image_patches"`.	`None`
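
To make the roles of `patch_width` and `patch_overlap` concrete, here is a minimal sketch of how a grid of overlapping patches can be laid over an image. It is not Sopa's implementation: `tile_bboxes` is a hypothetical helper, and clipping the last patch to the image border is an assumption made for illustration.

```python
def tile_bboxes(image_width, image_height, patch_width=2000, patch_overlap=50):
    """Return (xmin, ymin, xmax, ymax) boxes covering the image (hypothetical helper)."""
    if patch_width is None:
        # Mirrors the documented behaviour: a single patch covering everything.
        return [(0, 0, image_width, image_height)]
    if patch_overlap >= patch_width:
        raise ValueError("patch_overlap must be smaller than patch_width")

    # Consecutive patches start `step` pixels apart, so they share `patch_overlap` pixels.
    step = patch_width - patch_overlap

    def starts(size):
        if size <= patch_width:
            return [0]
        # Ceil division: number of extra steps needed to reach the end of the axis.
        n = -(-(size - patch_width) // step) + 1
        return [i * step for i in range(n)]

    return [
        (x, y, min(x + patch_width, image_width), min(y + patch_width, image_height))
        for y in starts(image_height)
        for x in starts(image_width)
    ]


bboxes = tile_bboxes(5000, 3000)
print(len(bboxes))  # number of patches for a 5000 x 3000 image
print(bboxes[0], bboxes[1])  # the first two patches share a 50-pixel-wide band
```

A larger overlap reduces the risk of cutting cells at patch borders, at the cost of more redundant computation per patch.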

Source code in sopa/patches/_factory.py

def make_image_patches(
    sdata: SpatialData,
    patch_width: int | None = 2000,
    patch_overlap: int = 50,
    image_key: str | None = None,
    roi_key: str | None = SopaKeys.ROI,
    key_added: str | None = None,
):
    """Create overlapping patches on an image. This can be used for image-based segmentation methods such as Cellpose, which will run on each patch.

    Args:
        sdata: A `SpatialData` object.
        patch_width: Width of the patches, in pixels. If `None`, creates only one patch.
        patch_overlap: Number of pixels of overlap between patches.
        image_key: Optional key of the image on which the patches will be made. If not provided, it is found automatically.
        roi_key: Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if existing. If `None`, all patches will be used.
        key_added: Optional name of the patches to be saved. By default, uses `"image_patches"`.
    """
    image_key, _ = get_spatial_image(sdata, key=image_key, return_key=True)

    patches = Patches2D(
        sdata,
        image_key,
        patch_width=patch_width,
        patch_overlap=patch_overlap,
        roi_key=roi_key,
    )

    patches.add_shapes(key_added=key_added)

`sopa.make_transcript_patches(sdata, patch_width=2000, patch_overlap=50, points_key=None, prior_shapes_key=None, unassigned_value=None, min_points_per_patch=4000, write_cells_centroids=False, roi_key=SopaKeys.ROI, key_added=None, **kwargs)`

Create overlapping patches on a transcripts dataframe and save them in a cache. This can be used for transcript-based segmentation methods such as Baysor or Proseg.

Prior segmentation usage

To assign a prior segmentation to each transcript, use the `prior_shapes_key` argument:

- If a segmentation has already been performed (for example, an existing 10X Genomics segmentation), use `prior_shapes_key="auto"` to use it (or manually provide the column name and the `unassigned_value` argument).
- If you have already run segmentation with Sopa, set `prior_shapes_key` to the name of the shapes (GeoDataFrame) containing the boundaries, e.g. `prior_shapes_key="cellpose_boundaries"`.
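
The `"auto"` case can be sketched as a simple lookup, following the logic visible in the source code of this function. In this sketch a plain dictionary stands in for `sdata.attrs`, and the key string `"prior_tuple_key"` as well as the values `"cell_id"` and `"UNASSIGNED"` are hypothetical placeholders, not values taken from Sopa.

```python
# Hypothetical stand-in for SopaAttrs.PRIOR_TUPLE_KEY (the real string may differ).
PRIOR_TUPLE_KEY = "prior_tuple_key"


def resolve_prior(attrs, prior_shapes_key=None, unassigned_value=None):
    """Return the (prior_shapes_key, unassigned_value) pair that would be used."""
    if prior_shapes_key == "auto":
        if PRIOR_TUPLE_KEY not in attrs:
            raise ValueError("prior_shapes_key='auto' requires a stored prior segmentation")
        # The stored tuple provides both the column name and the unassigned value.
        prior_shapes_key, unassigned_value = attrs[PRIOR_TUPLE_KEY]
    return prior_shapes_key, unassigned_value


# Illustrative attributes, as a technology-specific reader might store them.
attrs = {PRIOR_TUPLE_KEY: ("cell_id", "UNASSIGNED")}

print(resolve_prior(attrs, "auto"))
print(resolve_prior(attrs, "cellpose_boundaries"))
```

With `"auto"`, both values come from the stored tuple; with an explicit key, the arguments are passed through unchanged.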

Parameters:

Name	Type	Description	Default
`sdata`	`SpatialData`	A `SpatialData` object.	required
`patch_width`	`float \| int \| None`	Width of the patches, in microns. If `None`, creates only one patch.	`2000`
`patch_overlap`	`int`	Number of microns of overlap between patches.	`50`
`points_key`	`str \| None`	Optional key of the points on which the patches will be made. If not provided, it is found automatically.	`None`
`prior_shapes_key`	`Literal['auto'] \| str \| None`	Optional key of `sdata` containing the shapes with the prior segmentation, or column of the points dataframe. If `"auto"`, use the prior column from the technology.	`None`
`unassigned_value`	`int \| str \| None`	If `prior_shapes_key` has been provided and corresponds to a points column: this argument is the value given to the transcripts that are not inside any cell.	`None`
`min_points_per_patch`	`int`	Minimum number of points/transcripts for a patch to be considered for segmentation.	`4000`
`write_cells_centroids`	`bool`	If `True`, the centroids of the prior cells will be saved. This is useful for some segmentation tools such as ComSeg.	`False`
`roi_key`	`str \| None`	Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if existing. If `None`, all patches will be used.	`ROI`
`key_added`	`str \| None`	Optional name of the patches to be saved. By default, uses `"transcripts_patches"`.	`None`
`**kwargs`	`int`	Additional arguments passed to the `OnDiskTranscriptPatches` class.	`{}`
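
The effect of `min_points_per_patch` can be illustrated with a minimal sketch that counts the transcripts falling in each patch and keeps only the patches that reach the threshold. The helpers `count_points` and `keep_patches` are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption, not something stated by the documentation above.

```python
def count_points(points, bbox):
    """Count the (x, y) points that fall inside a (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = bbox
    return sum(xmin <= x < xmax and ymin <= y < ymax for x, y in points)


def keep_patches(points, bboxes, min_points_per_patch=4000):
    """Keep only the patches with enough points (inclusive threshold, by assumption)."""
    return [b for b in bboxes if count_points(points, b) >= min_points_per_patch]


# Toy data: three transcripts in the first patch, one in the second.
points = [(10, 10), (20, 20), (30, 30), (150, 50)]
bboxes = [(0, 0, 100, 100), (100, 0, 200, 100)]

kept = keep_patches(points, bboxes, min_points_per_patch=3)
print(kept)  # only the first patch has at least 3 points
```

Skipping sparse patches avoids running a segmentation tool on regions that contain too few transcripts to produce meaningful cells.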

Source code in sopa/patches/_factory.py

def make_transcript_patches(
    sdata: SpatialData,
    patch_width: float | int | None = 2000,
    patch_overlap: int = 50,
    points_key: str | None = None,
    prior_shapes_key: Literal["auto"] | str | None = None,
    unassigned_value: int | str | None = None,
    min_points_per_patch: int = 4000,
    write_cells_centroids: bool = False,
    roi_key: str | None = SopaKeys.ROI,
    key_added: str | None = None,
    **kwargs: int,
):
    """Create overlapping patches on a transcripts dataframe and save them in a cache. This can be used for transcript-based segmentation methods such as Baysor or Proseg.

    !!! info "Prior segmentation usage"
        To assign a prior segmentation to each transcript, use the `prior_shapes_key` argument:

        - If a segmentation has already been performed (for example an existing 10X-Genomics segmentation), use `prior_shapes_key="auto"` to use it (or, provide manually the column name and the `unassigned_value` argument).
        - If you have already run segmentation with Sopa, use `prior_shapes_key` to denote the name of the shapes (GeoDataFrame) containing the boundaries, e.g. `prior_shapes_key="cellpose_boundaries"`.

    Args:
        sdata: A `SpatialData` object.
        patch_width: Width of the patches, in microns. If `None`, creates only one patch.
        patch_overlap: Number of microns of overlap between patches.
        points_key: Optional key of the points on which the patches will be made. If not provided, it is found automatically.
        prior_shapes_key: Optional key of `sdata` containing the shapes with the prior segmentation, or column of the points dataframe. If `"auto"`, use the prior column from the technology.
        unassigned_value: If `prior_shapes_key` has been provided and corresponds to a points column: this argument is the value given to the transcripts that are not inside any cell.
        min_points_per_patch: Minimum number of points/transcripts for a patch to be considered for segmentation.
        write_cells_centroids: If `True`, the centroids of the prior cells will be saved. This is useful for some segmentation tools such as ComSeg.
        roi_key: Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if existing. If `None`, all patches will be used.
        key_added: Optional name of the patches to be saved. By default, uses `"transcripts_patches"`.
        **kwargs: Additional arguments passed to the `OnDiskTranscriptPatches` class.
    """
    assert not write_cells_centroids or prior_shapes_key, "write_cells_centroids argument requires prior_shapes_key"

    points_key, _ = get_spatial_element(
        sdata.points,
        key=points_key or sdata.attrs.get(SopaAttrs.TRANSCRIPTS),
        return_key=True,
    )

    if prior_shapes_key == "auto":
        assert SopaAttrs.PRIOR_TUPLE_KEY in sdata.attrs, (
            f"prior_shapes_key='auto' requires a prior segmentation to be present in the points dataframe ('{SopaAttrs.PRIOR_TUPLE_KEY}' must be in `sdata.attrs`)."
        )
        prior_shapes_key, unassigned_value = sdata.attrs[SopaAttrs.PRIOR_TUPLE_KEY]

    patches = OnDiskTranscriptPatches(
        sdata,
        points_key,
        patch_width=patch_width,
        patch_overlap=patch_overlap,
        prior_shapes_key=prior_shapes_key,
        unassigned_value=unassigned_value,
        min_points_per_patch=min_points_per_patch,
        write_cells_centroids=write_cells_centroids,
        roi_key=roi_key,
        **kwargs,
    )

    patches.write()
    patches.add_shapes(key_added=key_added)

`sopa.patches.compute_embeddings(sdata, model, patch_width, patch_overlap=0, level=0, magnification=None, image_key=None, batch_size=32, device=None, roi_key=SopaKeys.ROI, key_added=None, **kwargs)`

Creates patches, runs a computer vision model on each patch, and stores the embeddings of all patches in the `SpatialData` object. This is mostly useful for WSI images.

Info

The embeddings will be saved into the SpatialData object with the key "{model_name}_embeddings" (where model_name is the name of the model, see the model argument below), unless key_added is provided. The shapes of the patches will be saved with the key "embeddings_patches".

Warning

In addition to the WSI extra (pip install 'sopa[wsi]'), you might need to install additional dependencies depending on the model used. Also, CONCH requires being logged in to Hugging Face and having accepted its license.

Parameters:

Name	Type	Description	Default
`sdata`	`SpatialData`	A `SpatialData` object	required
`model`	`Callable \| str`	Callable that takes as an input a tensor of size `(batch_size, channels, x, y)` and returns a vector for each tile `(batch_size, emb_dim)`, or a string with the name of one of the available models (`resnet50`, `histo_ssl`, `dinov2`, `hoptimus0`, `conch`).	required
`patch_width`	`int`	Width (pixels) of the patches.	required
`patch_overlap`	`int`	Width (pixels) of the overlap between the patches.	`0`
`level`	`int \| None`	Image level on which the processing is performed. Either `level` or `magnification` should be provided.	`0`
`magnification`	`int \| None`	The target magnification on which the processing is performed. If `magnification` is provided, the `level` argument will be automatically computed.	`None`
`image_key`	`str \| None`	Optional key of the image; unnecessary if there is only one image.	`None`
`batch_size`	`int`	Mini-batch size used during inference.	`32`
`device`	`str \| None`	Device used for the computer vision model.	`None`
`roi_key`	`str \| None`	Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored. If `None`, all patches will be used.	`ROI`
`key_added`	`str \| None`	Optional name of the spatial element that will be added (storing the embeddings).	`None`
`**kwargs`	`int`	Additional keyword arguments passed to the `Patches2D` constructor.	`{}`
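
The `roi_key` behaviour (patches that do not touch any shape are ignored) can be sketched with axis-aligned boxes. This is a simplification for illustration only: real ROI shapes are generally polygons, and `boxes_touch` and `filter_by_roi` are hypothetical helpers rather than Sopa functions.

```python
def boxes_touch(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes intersect or share a border."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def filter_by_roi(patches, roi_boxes=None):
    """Keep the patches touching at least one ROI box; keep all if no ROI is given."""
    if roi_boxes is None:
        return list(patches)
    return [p for p in patches if any(boxes_touch(p, r) for r in roi_boxes)]


patches = [(0, 0, 100, 100), (100, 0, 200, 100), (200, 0, 300, 100)]
roi = [(120, 20, 180, 80)]  # a region strictly inside the middle patch

kept = filter_by_roi(patches, roi)
print(kept)  # only the middle patch touches the region
```

Passing `None` keeps every patch, which matches the documented meaning of `roi_key=None`.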

Returns:

Type	Description
`str`	The key of the spatial element that was added to the `SpatialData` object.
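
The contract of a custom `model` callable, and the way patches are processed in mini-batches of `batch_size`, can be sketched without any deep learning dependency. This is a simplified illustration under stated assumptions: plain Python lists replace tensors, and `toy_model`, `embed_in_batches`, and `resolve_key` are hypothetical names that only mimic the batching loop and the default key naming described above.

```python
batch_sizes = []  # records how many patches each model call received


def toy_model(batch):
    """Toy stand-in for a vision model: one [mean, max] vector per patch."""
    batch_sizes.append(len(batch))
    return [[sum(p) / len(p), max(p)] for p in batch]


def embed_in_batches(patches, model, batch_size=32):
    """Run the model on consecutive slices of at most `batch_size` patches."""
    embeddings = []
    for i in range(0, len(patches), batch_size):
        embeddings.extend(model(patches[i : i + batch_size]))
    return embeddings


def resolve_key(model_name, key_added=None):
    """Default key naming: "{model_name}_embeddings" unless key_added is given."""
    return key_added or f"{model_name}_embeddings"


# 70 toy "patches", each reduced to three pixel values for illustration.
patches = [[i, i + 1, i + 2] for i in range(70)]
embeddings = embed_in_batches(patches, toy_model, batch_size=32)

print(len(embeddings), batch_sizes)
print(resolve_key("resnet50"))
```

The last batch is simply smaller than `batch_size` when the number of patches is not a multiple of it.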

Source code in sopa/patches/infer.py

def compute_embeddings(
    sdata: SpatialData,
    model: Callable | str,
    patch_width: int,
    patch_overlap: int = 0,
    level: int | None = 0,
    magnification: int | None = None,
    image_key: str | None = None,
    batch_size: int = 32,
    device: str | None = None,
    roi_key: str | None = SopaKeys.ROI,
    key_added: str | None = None,
    **kwargs: int,
) -> str:
    """Creates patches, runs a computer vision model on each patch, and stores the embeddings of all patches in the `SpatialData` object. This is mostly useful for WSI images.

    !!! info
        The embeddings will be saved into the `SpatialData` object with the key `"{model_name}_embeddings"` (where `model_name` is the name of the model, see the `model` argument below), unless `key_added` is provided.
        The shapes of the patches will be saved with the key `"embeddings_patches"`.

    !!! warning
        In addition to the WSI extra (`pip install 'sopa[wsi]'`), you might need to install additional dependencies depending on the model used. Also, CONCH requires being logged in to Hugging Face and having accepted its license.

    Args:
        sdata: A `SpatialData` object
        model: Callable that takes as an input a tensor of size `(batch_size, channels, x, y)` and returns a vector for each tile `(batch_size, emb_dim)`, or a string with the name of one of the available models (`resnet50`, `histo_ssl`, `dinov2`, `hoptimus0`, `conch`).
        patch_width: Width (pixels) of the patches.
        patch_overlap: Width (pixels) of the overlap between the patches.
        level: Image level on which the processing is performed. Either `level` or `magnification` should be provided.
        magnification: The target magnification on which the processing is performed. If `magnification` is provided, the `level` argument will be automatically computed.
        image_key: Optional key of the image; unnecessary if there is only one image.
        batch_size: Mini-batch size used during inference.
        device: Device used for the computer vision model.
        roi_key: Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored. If `None`, all patches will be used.
        key_added: Optional name of the spatial element that will be added (storing the embeddings).
        **kwargs: Additional keyword arguments passed to the `Patches2D` constructor.

    Returns:
        The key of the spatial element that was added to the `SpatialData` object.
    """
    try:
        import torch
    except ImportError:
        raise ImportError(
            "For patch embedding, you need `torch` (and perhaps `torchvision`). Consider installing the sopa WSI extra: `pip install 'sopa[wsi]'`."
        )

    from ._inference import Inference

    image = _get_image_for_inference(sdata, image_key)

    infer = Inference(image, model, patch_width, level, magnification, device)
    patches = Patches2D(sdata, infer.image, infer.patch_width, patch_overlap, roi_key=roi_key, **kwargs)

    log.info(f"Processing {len(patches)} patches extracted from level {infer.level}")

    predictions = []
    for i in tqdm.tqdm(range(0, len(patches), batch_size)):
        prediction = infer.infer_bboxes(patches.bboxes[i : i + batch_size])
        predictions.extend(prediction)
    predictions = torch.stack(predictions)

    if len(predictions.shape) == 1:
        predictions = torch.unsqueeze(predictions, 1)

    patches.add_shapes(key_added=SopaKeys.EMBEDDINGS_PATCHES)

    gdf = sdata[SopaKeys.EMBEDDINGS_PATCHES]

    adata = AnnData(predictions.numpy())
    adata.obs["region"] = SopaKeys.EMBEDDINGS_PATCHES
    adata.obs["instance"] = gdf.index.values
    adata = TableModel.parse(
        adata,
        region=SopaKeys.EMBEDDINGS_PATCHES,
        region_key="region",
        instance_key="instance",
    )
    adata.obsm["spatial"] = patches.centroids()
    adata.uns["embedding_config"] = {
        "patch_width": patch_width,
        "patch_overlap": patch_overlap,
        "magnification": magnification,
        "level": infer.level,
        "resize_factor": infer.resize_factor,
        "model_str": infer.model_str,
    }

    key_added = key_added or f"{infer.model_str}_embeddings"
    add_spatial_element(sdata, key_added, adata)

    return key_added

`sopa.patches.cluster_embeddings(sdata, element, method='leiden', key_added='cluster', **method_kwargs)`

Cluster the patch embeddings (obtained from `sopa.patches.compute_embeddings`).

Info

The clusters are added to the key_added column of element.obs, i.e. the table holding the patch embeddings (key_added='cluster' by default).

Parameters:

Name	Type	Description	Default
`sdata`	`SpatialData \| None`	A `SpatialData` object. Can be `None` if element is an `AnnData` object.	required
`element`	`AnnData \| str`	The `AnnData` containing the embeddings, or the name of the element	required
`method`	`Callable \| str`	Callable that takes an `AnnData` object as input and returns an array of clusters of size `n_obs`, or an available method name (`leiden` or `kmeans`).	`'leiden'`
`key_added`	`str`	The key containing the clusters to be added to the `element.obs`	`'cluster'`
`method_kwargs`	`str`	kwargs provided to the method callable	`{}`
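
The contract of a custom `method` callable (it receives the element and returns one cluster label per observation) can be sketched as follows. To stay dependency-free, a `SimpleNamespace` with `X` and `obs` attributes stands in for an `AnnData` object; `threshold_clusters`, `METHODS`, and `cluster` are hypothetical names that only mirror the dispatch logic visible in the source code of this function.

```python
from types import SimpleNamespace


def threshold_clusters(element, threshold=0.0):
    """Toy clustering: label each observation by its first embedding dimension."""
    return ["high" if row[0] > threshold else "low" for row in element.X]


# Hypothetical registry of named methods, looked up when `method` is a string.
METHODS = {"threshold": threshold_clusters}


def cluster(element, method="threshold", key_added="cluster", **method_kwargs):
    """Resolve the method, run it, and store one label per observation."""
    if isinstance(method, str):
        if method not in METHODS:
            raise ValueError(f"Method {method} is not available. Use one of: {', '.join(METHODS)}")
        method = METHODS[method]
    labels = method(element, **method_kwargs)
    if len(labels) != len(element.X):
        raise ValueError("The method must return one label per observation")
    element.obs[key_added] = labels


# Stand-in for an AnnData object holding three 2-dimensional embeddings.
element = SimpleNamespace(X=[[0.5, 1.0], [-0.2, 0.3], [1.5, -1.0]], obs={})

cluster(element, threshold=1.0)  # extra keyword arguments reach the method
print(element.obs["cluster"])
```

Any callable following this contract can be passed instead of a method name, and extra keyword arguments are forwarded to it.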

Source code in sopa/patches/cluster.py

def cluster_embeddings(
    sdata: SpatialData | None,
    element: AnnData | str,
    method: Callable | str = "leiden",
    key_added: str = "cluster",
    **method_kwargs: str,
) -> None:
    """Cluster the patch embeddings (obtained from [sopa.patches.compute_embeddings][]).

    Info:
        The clusters are added to the `key_added` column of `element.obs`, i.e. the table holding the patch embeddings (`key_added='cluster'` by default).

    Args:
        sdata: A `SpatialData` object. Can be `None` if element is an `AnnData` object.
        element: The `AnnData` containing the embeddings, or the name of the element
        method: Callable that takes an `AnnData` object as input and returns an array of clusters of size `n_obs`, or an available method name (`leiden` or `kmeans`).
        key_added: The key containing the clusters to be added to the `element.obs`
        method_kwargs: kwargs provided to the method callable
    """
    if isinstance(element, str):
        element: AnnData = sdata.tables[element]

    if isinstance(method, str):
        assert method in METHODS_DICT, f"Method {method} is not available. Use one of: {', '.join(METHODS_DICT.keys())}"
        method = METHODS_DICT[method]

    element.obs[key_added] = method(element, **method_kwargs)
    element.obs[key_added] = element.obs[key_added].astype("category")