Patches

sopa.make_image_patches(sdata, patch_width=2000, patch_overlap=50, image_key=None, roi_key=SopaKeys.ROI, key_added=None)

Create overlapping patches on an image. This can be used for image-based segmentation methods such as Cellpose, which will run on each patch.

Parameters:

- `sdata` (`SpatialData`, required): A `SpatialData` object.
- `patch_width` (`int | None`, default `2000`): Width of the patches, in pixels. If `None`, creates only one patch.
- `patch_overlap` (`int`, default `50`): Number of pixels of overlap between patches.
- `image_key` (`str | None`, default `None`): Optional key of the image on which the patches will be made. If not provided, it is found automatically.
- `roi_key` (`str | None`, default `SopaKeys.ROI`): Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if it exists. If `None`, all patches will be used.
- `key_added` (`str | None`, default `None`): Optional name of the patches to be saved. By default, uses `"image_patches"`.
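
Below is a minimal usage sketch. The `sopa.io.toy_dataset()` call is only an illustration of how to obtain a `SpatialData` object; replace it with the reader matching your data.

```python
import sopa

# A SpatialData object is assumed to be available; sopa.io.toy_dataset() is
# used here only as an illustration (replace it with your own reader).
sdata = sopa.io.toy_dataset()

# Create 1200 x 1200 pixel patches with a 50-pixel overlap on the main image.
sopa.make_image_patches(sdata, patch_width=1200, patch_overlap=50)

# The patch boundaries are stored as shapes, by default under "image_patches".
print(sdata["image_patches"])
```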
Source code in sopa/patches/_factory.py
def make_image_patches(
    sdata: SpatialData,
    patch_width: int | None = 2000,
    patch_overlap: int = 50,
    image_key: str | None = None,
    roi_key: str | None = SopaKeys.ROI,
    key_added: str | None = None,
):
    """Create overlapping patches on an image. This can be used for image-based segmentation methods such as Cellpose, which will run on each patch.

    Args:
        sdata: A `SpatialData` object.
        patch_width: Width of the patches, in pixels. If `None`, creates only one patch.
        patch_overlap: Number of pixels of overlap between patches.
        image_key: Optional key of the image on which the patches will be made. If not provided, it is found automatically.
        roi_key: Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if it exists. If `None`, all patches will be used.
        key_added: Optional name of the patches to be saved. By default, uses `"image_patches"`.
    """
    image_key, _ = get_spatial_image(sdata, key=image_key, return_key=True)

    patches = Patches2D(
        sdata,
        image_key,
        patch_width=patch_width,
        patch_overlap=patch_overlap,
        roi_key=roi_key,
    )

    patches.add_shapes(key_added=key_added)

sopa.make_transcript_patches(sdata, patch_width=2000, patch_overlap=50, points_key=None, prior_shapes_key=None, unassigned_value=None, min_points_per_patch=4000, write_cells_centroids=False, roi_key=SopaKeys.ROI, key_added=None, **kwargs)

Create overlapping patches on a transcripts dataframe, and save them in a cache. This can be used for transcript-based segmentation methods such as Baysor or Proseg.

Prior segmentation usage

To assign a prior segmentation to each transcript, you can use the prior_shapes_key argument:

  • If a segmentation has already been performed (for example an existing 10X-Genomics segmentation), use prior_shapes_key="auto" to use it (or provide the column name manually, together with the unassigned_value argument).
  • If you have already run segmentation with Sopa, use prior_shapes_key to denote the name of the shapes (GeoDataFrame) containing the boundaries, e.g. prior_shapes_key="cellpose_boundaries".

Parameters:

- `sdata` (`SpatialData`, required): A `SpatialData` object.
- `patch_width` (`float | int | None`, default `2000`): Width of the patches, in microns. If `None`, creates only one patch.
- `patch_overlap` (`int`, default `50`): Number of microns of overlap between patches.
- `points_key` (`str | None`, default `None`): Optional key of the points on which the patches will be made. If not provided, it is found automatically.
- `prior_shapes_key` (`Literal['auto'] | str | None`, default `None`): Optional key of `sdata` containing the shapes with the prior segmentation, or column of the points dataframe. If `"auto"`, uses the prior column from the technology.
- `unassigned_value` (`int | str | None`, default `None`): If `prior_shapes_key` has been provided and corresponds to a points column, this is the value given to the transcripts that are not inside any cell.
- `min_points_per_patch` (`int`, default `4000`): Minimum number of points/transcripts for a patch to be considered for segmentation.
- `write_cells_centroids` (`bool`, default `False`): If `True`, the centroids of the prior cells will be saved. This is useful for some segmentation tools such as ComSeg.
- `roi_key` (`str | None`, default `SopaKeys.ROI`): Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if it exists. If `None`, all patches will be used.
- `key_added` (`str | None`, default `None`): Optional name of the patches to be saved. By default, uses `"transcripts_patches"`.
- `**kwargs` (`int`, default `{}`): Additional arguments passed to the `OnDiskTranscriptPatches` class.
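
A hedged usage sketch is shown below. The `sopa.io.toy_dataset()` call and the `"cellpose_boundaries"` key are illustrative assumptions; the latter only exists if Cellpose segmentation was previously run with Sopa.

```python
import sopa

# Assumed: a SpatialData object containing a transcripts points element
# (here obtained from the toy dataset as an illustration).
sdata = sopa.io.toy_dataset()

# 1000-micron patches with a 20-micron overlap, keeping only patches that
# contain at least 4000 transcripts.
sopa.make_transcript_patches(sdata, patch_width=1000, patch_overlap=20)

# Optionally, provide a prior segmentation. For instance, if Cellpose was
# already run with Sopa, its boundaries can be used as a prior:
sopa.make_transcript_patches(sdata, prior_shapes_key="cellpose_boundaries")

# The patch boundaries are saved as shapes, by default under "transcripts_patches".
print(sdata["transcripts_patches"])
```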
Source code in sopa/patches/_factory.py
def make_transcript_patches(
    sdata: SpatialData,
    patch_width: float | int | None = 2000,
    patch_overlap: int = 50,
    points_key: str | None = None,
    prior_shapes_key: Literal["auto"] | str | None = None,
    unassigned_value: int | str | None = None,
    min_points_per_patch: int = 4000,
    write_cells_centroids: bool = False,
    roi_key: str | None = SopaKeys.ROI,
    key_added: str | None = None,
    **kwargs: int,
):
    """Create overlapping patches on a transcripts dataframe, and save it in a cache. This can be used for trancript-based segmentation methods such as Baysor or Proseg.

    !!! info "Prior segmentation usage"
        To assign a prior segmentation to each transcript, you can use the `prior_shapes_key` argument:

        - If a segmentation has already been performed (for example an existing 10X-Genomics segmentation), use `prior_shapes_key="auto"` to use it (or provide the column name manually, together with the `unassigned_value` argument).
        - If you have already run segmentation with Sopa, use `prior_shapes_key` to denote the name of the shapes (GeoDataFrame) containing the boundaries, e.g. `prior_shapes_key="cellpose_boundaries"`.

    Args:
        sdata: A `SpatialData` object.
        patch_width: Width of the patches, in microns. If `None`, creates only one patch.
        patch_overlap: Number of microns of overlap between patches.
        points_key: Optional key of the points on which the patches will be made. If not provided, it is found automatically.
        prior_shapes_key: Optional key of `sdata` containing the shapes with the prior segmentation, or column of the points dataframe. If `"auto"`, use the prior column from the technology.
        unassigned_value: If `prior_shapes_key` has been provided and corresponds to a points column, this argument is the value given to the transcripts that are not inside any cell.
        min_points_per_patch: Minimum number of points/transcripts for a patch to be considered for segmentation.
        write_cells_centroids: If `True`, the centroids of the prior cells will be saved. This is useful for some segmentation tools such as ComSeg.
        roi_key: Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored during segmentation. By default, uses `"region_of_interest"` if it exists. If `None`, all patches will be used.
        key_added: Optional name of the patches to be saved. By default, uses `"transcripts_patches"`.
        **kwargs: Additional arguments passed to the `OnDiskTranscriptPatches` class.
    """
    assert not write_cells_centroids or prior_shapes_key, "write_cells_centroids argument requires prior_shapes_key"

    points_key, _ = get_spatial_element(
        sdata.points,
        key=points_key or sdata.attrs.get(SopaAttrs.TRANSCRIPTS),
        return_key=True,
    )

    if prior_shapes_key == "auto":
        assert SopaAttrs.PRIOR_TUPLE_KEY in sdata.attrs, (
            f"prior_shapes_key='auto' requires a prior segmentation to be present in the points dataframe ('{SopaAttrs.PRIOR_TUPLE_KEY}' must be in `sdata.attrs`)."
        )
        prior_shapes_key, unassigned_value = sdata.attrs[SopaAttrs.PRIOR_TUPLE_KEY]

    patches = OnDiskTranscriptPatches(
        sdata,
        points_key,
        patch_width=patch_width,
        patch_overlap=patch_overlap,
        prior_shapes_key=prior_shapes_key,
        unassigned_value=unassigned_value,
        min_points_per_patch=min_points_per_patch,
        write_cells_centroids=write_cells_centroids,
        roi_key=roi_key,
        **kwargs,
    )

    patches.write()
    patches.add_shapes(key_added=key_added)

sopa.patches.compute_embeddings(sdata, model, patch_width, patch_overlap=0, level=0, magnification=None, image_key=None, batch_size=32, device=None, data_parallel=False, roi_key=SopaKeys.ROI, key_added=None, **kwargs)

Creates patches, runs a computer vision model on each patch, and stores the embeddings of all patches as an AnnData object. This is mostly useful for WSI images.

Info

The AnnData object will be saved into the SpatialData object with the key "{model_name}_embeddings" (see the model_name argument below), except if key_added is provided. The shapes of the patches will be saved with the key "embeddings_patches".

Warning

In addition to the WSI extra (pip install 'sopa[wsi]') and depending on the model used, you might need to install additional dependencies. Also, CONCH requires being logged in to Hugging Face and having approved its license.

Parameters:

- `sdata` (`SpatialData`, required): A `SpatialData` object.
- `model` (`Callable | str`, required): A supported model name (`resnet50`, `histo_ssl`, `dinov2`, `hoptimus0`, or `conch`), or a callable that takes as input a tensor of size `(batch_size, channels, x, y)` and returns a vector for each tile `(batch_size, emb_dim)`.
- `patch_width` (`int`, required): Width of the patches in pixels.
- `patch_overlap` (`int`, default `0`): Width of the overlap between the patches in pixels.
- `level` (`int | None`, default `0`): Image level on which the processing is performed. Either `level` or `magnification` should be provided.
- `magnification` (`int | None`, default `None`): The target magnification on which the processing is performed. If `magnification` is provided, the `level` argument will be computed automatically.
- `image_key` (`str | None`, default `None`): Optional key of the image. By default, uses the only image (if there is only one) or the image used for cell or tissue segmentation.
- `batch_size` (`int`, default `32`): Mini-batch size used during inference.
- `device` (`str | None`, default `None`): Device used for the computer vision model.
- `data_parallel` (`bool | list[int]`, default `False`): If `True`, the model will be run in data parallel mode. If a list of GPUs is provided, the model will be run in data parallel mode on the specified GPUs.
- `roi_key` (`str | None`, default `SopaKeys.ROI`): Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored. If `None`, all patches will be used. By default, uses the tissue segmentation if available.
- `key_added` (`str | None`, default `None`): Optional name of the spatial element that will be added (storing the embeddings).
- `**kwargs` (`int`, default `{}`): Additional keyword arguments passed to the `Patches2D` constructor.

Returns:

- `str`: The name of the AnnData table that was added to the SpatialData object.
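
A hedged sketch of a typical call is given below. The `sopa.io.wsi` reader and the file path are illustrative assumptions; any `SpatialData` object with a suitable image can be used.

```python
import sopa

# Assumed: a whole-slide image read into a SpatialData object
# (the reader and path below are illustrative).
sdata = sopa.io.wsi("slide.svs")

# Run a pretrained ResNet50 on 224 x 224 tiles extracted at 10x magnification.
table_key = sopa.patches.compute_embeddings(
    sdata,
    model="resnet50",
    patch_width=224,
    magnification=10,
    batch_size=64,
    device="cuda",
)

# The embeddings are stored as an AnnData table of shape (n_patches, emb_dim).
print(sdata.tables[table_key])
```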

Source code in sopa/patches/infer.py
def compute_embeddings(
    sdata: SpatialData,
    model: Callable | str,
    patch_width: int,
    patch_overlap: int = 0,
    level: int | None = 0,
    magnification: int | None = None,
    image_key: str | None = None,
    batch_size: int = 32,
    device: str | None = None,
    data_parallel: bool | list[int] = False,
    roi_key: str | None = SopaKeys.ROI,
    key_added: str | None = None,
    **kwargs: int,
) -> str:
    """It creates patches, runs a computer vision model on each patch, and store the embeddings of each all patches as an [`AnnData` object](https://anndata.readthedocs.io/en/stable/). This is mostly useful for WSI images.

    !!! info
        The `AnnData` object will be saved into the `SpatialData` object with the key `"{model_name}_embeddings"` (see the `model_name` argument below), except if `key_added` is provided.
        The shapes of the patches will be saved with the key `"embeddings_patches"`.

    !!! warning
        In addition to the WSI extra (`pip install 'sopa[wsi]'`) and depending on the model used, you might need to install additional dependencies. Also, CONCH requires being logged in to Hugging Face and having approved its license.

    Args:
        sdata: A `SpatialData` object
        model: A supported model name (`resnet50`, `histo_ssl`, `dinov2`, `hoptimus0`, or `conch`), or a callable that takes as an input a tensor of size `(batch_size, channels, x, y)` and returns a vector for each tile `(batch_size, emb_dim)`.
        patch_width: Width of the patches in pixels.
        patch_overlap: Width of the overlap between the patches in pixels.
        level: Image level on which the processing is performed. Either `level` or `magnification` should be provided.
        magnification: The target magnification on which the processing is performed. If `magnification` is provided, the `level` argument will be automatically computed.
        image_key: Optional image key of the image. By default, uses the only image (if only one) or the image used for cell or tissue segmentation.
        batch_size: Mini-batch size used during inference.
        device: Device used for the computer vision model.
        data_parallel: If `True`, the model will be run in data parallel mode. If a list of GPUs is provided, the model will be run in data parallel mode on the specified GPUs.
        roi_key: Optional name of the shapes that need to touch the patches. Patches that do not touch any shape will be ignored. If `None`, all patches will be used. By default, uses the tissue segmentation if available.
        key_added: Optional name of the spatial element that will be added (storing the embeddings).
        **kwargs: Additional keyword arguments passed to the `Patches2D` constructor.

    Returns:
        The name of the `AnnData` table that was added to the `SpatialData` object.
    """
    try:
        import torch
    except ImportError:
        raise ImportError(
            "For patch embedding, you need `torch` (and perhaps `torchvision`). Consider installing the sopa WSI extra: `pip install 'sopa[wsi]'`."
        )
    from . import models

    if isinstance(model, str):
        assert model in models.available_models, (
            f"'{model}' is not a valid model name. Valid names are: {', '.join(list(models.available_models.keys()))}"
        )
        model_name, model = model, models.available_models[model]()
    else:
        model_name = model.__class__.__name__

    if device is not None:
        model.to(device)

    if data_parallel:
        ids = data_parallel if isinstance(data_parallel, list) else list(range(torch.cuda.device_count()))
        model = torch.nn.DataParallel(model, device_ids=ids)

    tile_loader = TileLoader(sdata, patch_width, image_key, level, magnification, patch_overlap, roi_key)

    log.info(f"Processing {len(tile_loader)} patches extracted from level {tile_loader.level}")

    predictions = []
    with torch.no_grad():
        for i in tqdm.tqdm(range(0, len(tile_loader), batch_size)):
            batch = tile_loader[i : i + batch_size]
            embedding: torch.Tensor = model(batch.to(device))
            assert len(embedding.shape) == 2, "The model must have the signature (B, C, Y, X) -> (B, C)"

            predictions.append(embedding.cpu())

    predictions = torch.cat(predictions)
    if len(predictions.shape) == 1:
        predictions = torch.unsqueeze(predictions, 1)

    patches = tile_loader.patches
    patches.add_shapes(key_added=SopaKeys.EMBEDDINGS_PATCHES)

    gdf = sdata[SopaKeys.EMBEDDINGS_PATCHES]

    adata = AnnData(predictions.numpy())
    adata.obs["region"] = SopaKeys.EMBEDDINGS_PATCHES
    adata.obs["instance"] = gdf.index.values
    adata = TableModel.parse(
        adata,
        region=SopaKeys.EMBEDDINGS_PATCHES,
        region_key="region",
        instance_key="instance",
    )
    adata.obsm["spatial"] = patches.centroids()
    adata.uns["embedding_config"] = {
        "patch_width": patch_width,
        "patch_overlap": patch_overlap,
        "magnification": magnification,
        "level": tile_loader.level,
        "level_downsample": tile_loader.level_downsample,
        "tile_resize_factor": tile_loader.tile_resize_factor,
        "model_name": model_name,
    }

    key_added = key_added or f"{model_name}_embeddings"
    add_spatial_element(sdata, key_added, adata)

    return key_added
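
As the assertion in the loop above indicates, a custom `model` callable must map a `(B, C, Y, X)` tensor to a `(B, emb_dim)` tensor. Here is a purely illustrative sketch of such a callable (not part of Sopa):

```python
import torch

class MeanChannelEmbedder(torch.nn.Module):
    """Toy model: embeds each tile by its per-channel mean intensity."""

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        # (B, C, Y, X) -> (B, C): one value per channel, i.e. emb_dim == C
        return batch.mean(dim=(2, 3))

# Passed instead of a model name; model_name then defaults to the class name,
# so the table would be saved as "MeanChannelEmbedder_embeddings".
# sopa.patches.compute_embeddings(sdata, MeanChannelEmbedder(), patch_width=224)
```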

sopa.patches.cluster_embeddings(sdata, element, method='leiden', key_added='cluster', **method_kwargs)

Create clusters of the patch embeddings (obtained from sopa.patches.compute_embeddings).

Info

The clusters are added to the key_added column of the "inference_patches" shapes (key_added='cluster' by default).

Parameters:

- `sdata` (`SpatialData | None`, required): A `SpatialData` object. Can be `None` if `element` is an `AnnData` object.
- `element` (`AnnData | str`, required): The `AnnData` object containing the embeddings, or the name of the element.
- `method` (`Callable | str`, default `'leiden'`): Callable that takes an `AnnData` object and returns an array of clusters of size `n_obs`, or an available method name (`leiden` or `kmeans`).
- `key_added` (`str`, default `'cluster'`): The key under which the clusters are added to `element.obs`.
- `**method_kwargs` (`str`, default `{}`): kwargs provided to the method callable.
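
A short sketch following on from sopa.patches.compute_embeddings is shown below; the `"resnet50_embeddings"` table name assumes the default `key_added` of the previous step with `model="resnet50"`.

```python
import sopa

# Assumed: sdata is the SpatialData object on which compute_embeddings was run
# with model="resnet50", so that the "resnet50_embeddings" table exists.
sopa.patches.cluster_embeddings(sdata, "resnet50_embeddings", method="leiden")

# The cluster labels are stored as a categorical column of the table's .obs.
adata = sdata.tables["resnet50_embeddings"]
print(adata.obs["cluster"].value_counts())
```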
Source code in sopa/patches/cluster.py
def cluster_embeddings(
    sdata: SpatialData | None,
    element: AnnData | str,
    method: Callable | str = "leiden",
    key_added: str = "cluster",
    **method_kwargs: str,
) -> None:
    """Create clusters of the patches embeddings (obtained from [sopa.patches.compute_embeddings][]).

    Info:
        The clusters are added to the `key_added` column of the "inference_patches" shapes (`key_added='cluster'` by default).

    Args:
        sdata: A `SpatialData` object. Can be `None` if element is an `AnnData` object.
        element: The `AnnData` containing the embeddings, or the name of the element
        method: Callable that takes an AnnData object and returns an array of clusters of size `n_obs`, or an available method name (`leiden` or `kmeans`)
        key_added: The key containing the clusters to be added to the `element.obs`
        method_kwargs: kwargs provided to the method callable
    """
    if isinstance(element, str):
        element: AnnData = sdata.tables[element]

    if isinstance(method, str):
        assert method in METHODS_DICT, f"Method {method} is not available. Use one of: {', '.join(METHODS_DICT.keys())}"
        method = METHODS_DICT[method]

    element.obs[key_added] = method(element, **method_kwargs)
    element.obs[key_added] = element.obs[key_added].astype("category")