Patches

sopa.make_image_patches(sdata, patch_width=2000, patch_overlap=50, image_key=None, key_added=None)

Create overlapping patches on an image. This can be used for image-based segmentation methods such as Cellpose, which will run on each patch.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `sdata` | `SpatialData` | A `SpatialData` object. | required |
| `patch_width` | `int` | Width of the patches, in pixels. | `2000` |
| `patch_overlap` | `int` | Number of pixels of overlap between patches. | `50` |
| `image_key` | `str \| None` | Optional key of the image on which the patches will be made. If not provided, it is found automatically. | `None` |
| `key_added` | `str \| None` | Optional name of the patches to be saved. By default, uses `"image_patches"`. | `None` |
Source code in sopa/patches/_factory.py
def make_image_patches(
    sdata: SpatialData,
    patch_width: int = 2000,
    patch_overlap: int = 50,
    image_key: str | None = None,
    key_added: str | None = None,
):
    """Create overlapping patches on an image. This can be used for image-based segmentation methods such as Cellpose, which will run on each patch.

    Args:
        sdata: A `SpatialData` object.
        patch_width: Width of the patches, in pixels.
        patch_overlap: Number of pixels of overlap between patches.
        image_key: Optional key of the image on which the patches will be made. If not provided, it is found automatically.
        key_added: Optional name of the patches to be saved. By default, uses `"image_patches"`.
    """
    image_key, _ = get_spatial_image(sdata, key=image_key, return_key=True)

    patches = Patches2D(sdata, image_key, patch_width=patch_width, patch_overlap=patch_overlap)

    patches.add_shapes(key_added=key_added)
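
A minimal usage sketch, assuming a SpatialData object loaded with spatialdata.read_zarr (the path is illustrative); "image_patches" is the documented default for key_added:

import sopa
import spatialdata

sdata = spatialdata.read_zarr("my_sample.zarr")  # hypothetical path

# create 2000 x 2000 pixel patches with a 50-pixel overlap (the defaults)
sopa.make_image_patches(sdata, patch_width=2000, patch_overlap=50)

# the patch boundaries are stored as shapes under the default key "image_patches"
print(sdata["image_patches"])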

sopa.make_transcript_patches(sdata, patch_width=2000, patch_overlap=50, points_key=None, prior_shapes_key=None, unassigned_value=None, min_points_per_patch=4000, write_cells_centroids=False, key_added=None, **kwargs)

Create overlapping patches on a transcripts dataframe and save them in a cache. This can be used for transcript-based segmentation methods such as Baysor.

Prior segmentation usage

To assign a prior segmentation to each transcript, you can use the prior_shapes_key argument (see the sketch after this list):

  • If a segmentation has already been performed (for example an existing 10X-Genomics segmentation), use prior_shapes_key to denote the column of the transcript dataframe containing the cell IDs (you can also optionally use the unassigned_value argument).
  • If you have already run segmentation with Sopa, use prior_shapes_key to denote the name of the shapes (GeoDataFrame) containing the boundaries.
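
A hedged sketch of both options. The column name "cell_id", the unassigned value 0, and the shapes name "cellpose_boundaries" are assumptions for illustration only:

import sopa

# option 1: the prior segmentation is a column of the transcripts dataframe,
# with unassigned transcripts marked by a sentinel value
sopa.make_transcript_patches(sdata, prior_shapes_key="cell_id", unassigned_value=0)

# option 2: boundaries were already computed with Sopa and stored as a shapes element
sopa.make_transcript_patches(sdata, prior_shapes_key="cellpose_boundaries")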

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `sdata` | `SpatialData` | A `SpatialData` object. | required |
| `patch_width` | `int` | Width of the patches, in microns. | `2000` |
| `patch_overlap` | `int` | Number of microns of overlap between patches. | `50` |
| `points_key` | `str \| None` | Optional key of the points on which the patches will be made. If not provided, it is found automatically. | `None` |
| `prior_shapes_key` | `str \| None` | Optional key of `sdata` containing the shapes with the prior segmentation, or column of the points dataframe. | `None` |
| `unassigned_value` | `int \| str \| None` | If `prior_shapes_key` has been provided and corresponds to a points column: this argument is the value given to the transcripts that are not inside any cell. | `None` |
| `min_points_per_patch` | `int` | Minimum number of points/transcripts for a patch to be considered for segmentation. | `4000` |
| `write_cells_centroids` | `bool` | If `True`, the centroids of the prior cells will be saved. This is useful for some segmentation tools such as ComSeg. | `False` |
| `key_added` | `str \| None` | Optional name of the patches to be saved. By default, uses `"transcripts_patches"`. | `None` |
| `**kwargs` | `int` | Additional arguments passed to the `OnDiskTranscriptPatches` class. | `{}` |
Source code in sopa/patches/_factory.py
def make_transcript_patches(
    sdata: SpatialData,
    patch_width: int = 2000,
    patch_overlap: int = 50,
    points_key: str | None = None,
    prior_shapes_key: str | None = None,
    unassigned_value: int | str | None = None,
    min_points_per_patch: int = 4000,
    write_cells_centroids: bool = False,
    key_added: str | None = None,
    **kwargs: int,
):
    """Create overlapping patches on a transcripts dataframe, and save it in a cache. This can be used for trancript-based segmentation methods such as Baysor.

    !!! info "Prior segmentation usage"
        To assign a prior segmentation to each transcript, you can use the `prior_shapes_key` argument:

        - If a segmentation has already been performed (for example an existing 10X-Genomics segmentation), use `prior_shapes_key` to denote the column of the transcript dataframe containing the cell IDs (you can also optionally use the `unassigned_value` argument).
        - If you have already run segmentation with Sopa, use `prior_shapes_key` to denote the name of the shapes (GeoDataFrame) containing the boundaries.

    Args:
        sdata: A `SpatialData` object.
        patch_width: Width of the patches, in microns.
        patch_overlap: Number of microns of overlap between patches.
        points_key: Optional key of the points on which the patches will be made. If not provided, it is found automatically.
        prior_shapes_key: Optional key of `sdata` containing the shapes with the prior segmentation, or column of the points dataframe.
        unassigned_value: If `prior_shapes_key` has been provided and corresponds to a points column: this argument is the value given to the transcripts that are not inside any cell.
        min_points_per_patch: Minimum number of points/transcripts for a patch to be considered for segmentation.
        write_cells_centroids: If `True`, the centroids of the prior cells will be saved. This is useful for some segmentation tools such as ComSeg.
        key_added: Optional name of the patches to be saved. By default, uses `"transcripts_patches"`.
        **kwargs: Additional arguments passed to the `OnDiskTranscriptPatches` class.
    """
    assert not write_cells_centroids or prior_shapes_key, "write_cells_centroids argument requires prior_shapes_key"

    points_key, _ = get_spatial_element(
        sdata.points, key=points_key or sdata.attrs.get(SopaAttrs.TRANSCRIPTS), return_key=True
    )

    patches = OnDiskTranscriptPatches(
        sdata,
        points_key,
        patch_width=patch_width,
        patch_overlap=patch_overlap,
        prior_shapes_key=prior_shapes_key,
        unassigned_value=unassigned_value,
        min_points_per_patch=min_points_per_patch,
        write_cells_centroids=write_cells_centroids,
        **kwargs,
    )

    patches.write()
    patches.add_shapes(key_added=key_added)
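
A complementary sketch of the ComSeg-oriented options: write_cells_centroids requires prior_shapes_key (see the assertion above), and min_points_per_patch filters out nearly empty patches. The shapes name "cellpose_boundaries" is again an assumption:

import sopa

sopa.make_transcript_patches(
    sdata,
    patch_width=1000,  # microns
    prior_shapes_key="cellpose_boundaries",
    write_cells_centroids=True,  # useful for ComSeg; requires a prior
    min_points_per_patch=4000,  # skip patches with too few transcripts
)

# the patch boundaries are saved under the default key "transcripts_patches"
print(sdata["transcripts_patches"])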

sopa.patches.compute_embeddings(sdata, model, patch_width, patch_overlap=0, level=0, magnification=None, image_key=None, batch_size=32, device=None, key_added=None)

Creates patches, runs a computer vision model on each patch, and stores the embeddings of all patches as an image. This is mostly useful for WSI images.

Info

The image will be saved into the SpatialData object with the key "{model_name}_embeddings" (the model name is derived from the model argument), unless key_added is provided. The shapes of the patches will be saved with the key "embeddings_patches".

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `sdata` | `SpatialData` | A `SpatialData` object. | required |
| `model` | `Callable \| str` | Callable that takes as an input a tensor of size `(batch_size, channels, x, y)` and returns a vector for each tile `(batch_size, emb_dim)`, or a string with the name of one of the available models (`resnet50`, `histo_ssl`, or `dinov2`). | required |
| `patch_width` | `int` | Width (pixels) of the patches. | required |
| `patch_overlap` | `int` | Width (pixels) of the overlap between the patches. | `0` |
| `level` | `int \| None` | Image level on which the processing is performed. Either `level` or `magnification` should be provided. | `0` |
| `magnification` | `int \| None` | The target magnification on which the processing is performed. If `magnification` is provided, the `level` argument will be automatically computed. | `None` |
| `image_key` | `str \| None` | Optional image key of the image, unnecessary if there is only one image. | `None` |
| `batch_size` | `int` | Mini-batch size used during inference. | `32` |
| `device` | `str \| None` | Device used for the computer vision model. | `None` |
| `key_added` | `str \| None` | Optional name of the spatial element that will be added (storing the embeddings). | `None` |
Source code in sopa/patches/infer.py
def compute_embeddings(
    sdata: SpatialData,
    model: Callable | str,
    patch_width: int,
    patch_overlap: int = 0,
    level: int | None = 0,
    magnification: int | None = None,
    image_key: str | None = None,
    batch_size: int = 32,
    device: str | None = None,
    key_added: str | None = None,
) -> None:
    """It creates patches, runs a computer vision model on each patch, and store the embeddings of each all patches as an image. This is mostly useful for WSI images.

    !!! info
        The image will be saved into the `SpatialData` object with the key `"{model_name}_embeddings"` (the model name is derived from the `model` argument), unless `key_added` is provided.
        The shapes of the patches will be saved with the key `"embeddings_patches"`.

    Args:
        sdata: A `SpatialData` object
        model: Callable that takes as an input a tensor of size `(batch_size, channels, x, y)` and returns a vector for each tile `(batch_size, emb_dim)`, or a string with the name of one of the available models (`resnet50`, `histo_ssl`, or `dinov2`).
        patch_width: Width (pixels) of the patches.
        patch_overlap: Width (pixels) of the overlap between the patches.
        level: Image level on which the processing is performed. Either `level` or `magnification` should be provided.
        magnification: The target magnification on which the processing is performed. If `magnification` is provided, the `level` argument will be automatically computed.
        image_key: Optional image key of the image, unnecessary if there is only one image.
        batch_size: Mini-batch size used during inference.
        device: Device used for the computer vision model.
        key_added: Optional name of the spatial element that will be added (storing the embeddings).
    """
    try:
        import torch
    except ImportError:
        raise ImportError(
            "For patch embedding, you need `torch` (and perhaps `torchvision`). Consider installing the sopa WSI extra: `pip install 'sopa[wsi]'`."
        )

    from ._inference import Inference

    image = _get_image_for_inference(sdata, image_key)

    infer = Inference(image, model, patch_width, level, magnification, device)
    patches = Patches2D(sdata, infer.image, infer.patch_width, patch_overlap)

    log.info(f"Processing {len(patches)} patches extracted from level {infer.level}")

    predictions = []
    for i in tqdm.tqdm(range(0, len(patches), batch_size)):
        prediction = infer.infer_bboxes(patches.bboxes[i : i + batch_size])
        predictions.extend(prediction)
    predictions = torch.stack(predictions)

    if len(predictions.shape) == 1:
        predictions = torch.unsqueeze(predictions, 1)

    output_image = np.zeros((predictions.shape[1], *patches.shape), dtype=np.float32)
    for (loc_x, loc_y), pred in zip(patches.ilocs, predictions):
        output_image[:, loc_y, loc_x] = pred

    output_image = DataArray(output_image, dims=("c", "y", "x"))
    output_image = Image2DModel.parse(output_image, transformations=infer.get_patches_transformations(patch_overlap))

    key_added = key_added or f"{infer.model_str}_embeddings"
    add_spatial_element(sdata, key_added, output_image)

    patches.add_shapes(key_added=SopaKeys.EMBEDDINGS_PATCHES)
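
A minimal sketch, assuming the WSI extra is installed (pip install 'sopa[wsi]') and that sdata contains a single H&E image; "resnet50" is one of the documented model names, and the element name "resnet50_embeddings" assumes the "{model_name}_embeddings" convention described above:

import sopa

sopa.patches.compute_embeddings(
    sdata,
    model="resnet50",
    patch_width=256,
    magnification=10,  # the matching image level is computed automatically
    batch_size=32,
    device="cuda",  # or None / "cpu"
)

print(sdata["resnet50_embeddings"])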

sopa.patches.cluster_embeddings(sdata, element, method='leiden', key_added='cluster', **method_kwargs)

Create clusters of the patches embeddings (obtained from sopa.patches.compute_embeddings).

Info

The clusters are added to the key_added column of the "embeddings_patches" shapes (key_added='cluster' by default).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `sdata` | `SpatialData` | A `SpatialData` object. | required |
| `element` | `DataArray \| str` | The `DataArray` containing the embeddings, or the name of the element. | required |
| `method` | `Callable \| str` | Callable that takes as an input an array of size `(n_patches x embedding_size)` and returns an array of clusters of size `n_patches`, or an available method name (`leiden`). | `'leiden'` |
| `key_added` | `str` | The key containing the clusters to be added to the patches `GeoDataFrame`. | `'cluster'` |
| `**method_kwargs` | `str` | kwargs provided to the method callable. | `{}` |
Source code in sopa/patches/cluster.py
def cluster_embeddings(
    sdata: SpatialData,
    element: DataArray | str,
    method: Callable | str = "leiden",
    key_added: str = "cluster",
    **method_kwargs: str,
) -> None:
    """Create clusters of the patches embeddings (obtained from [sopa.patches.compute_embeddings][]).

    Info:
        The clusters are added to the `key_added` column of the "embeddings_patches" shapes (`key_added='cluster'` by default).

    Args:
        sdata: A `SpatialData` object
        element: The `DataArray` containing the embeddings, or the name of the element
        method: Callable that takes as an input an array of size `(n_patches x embedding_size)` and returns an array of clusters of size `n_patches`, or an available method name (`leiden`)
        key_added: The key containing the clusters to be added to the patches `GeoDataFrame`
        method_kwargs: kwargs provided to the method callable
    """
    if isinstance(element, str):
        element: DataArray = sdata.images[element]

    if isinstance(method, str):
        assert method in METHODS_DICT, f"Method {method} is not available. Use one of: {', '.join(METHODS_DICT.keys())}"
        method = METHODS_DICT[method]

    gdf_patches = sdata[SopaKeys.EMBEDDINGS_PATCHES]

    ilocs = np.array(list(gdf_patches.ilocs))
    embeddings = element.compute().data[:, ilocs[:, 1], ilocs[:, 0]].T

    gdf_patches[key_added] = method(embeddings, **method_kwargs)
    gdf_patches[key_added] = gdf_patches[key_added].astype("category")
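
A short usage sketch following on from the compute_embeddings example above. The first call uses the built-in leiden method; the second passes a custom callable (here scikit-learn's KMeans, used purely to illustrate the expected (n_patches, embedding_size) -> n_patches contract), with n_clusters forwarded through **method_kwargs:

import sopa
from sklearn.cluster import KMeans

# default: Leiden clustering on the embeddings image
sopa.patches.cluster_embeddings(sdata, "resnet50_embeddings", method="leiden")

# alternative: any callable mapping an (n_patches, embedding_size) array
# to an array of n_patches cluster labels can be passed as `method`
def kmeans_clusters(embeddings, n_clusters=8):
    return KMeans(n_clusters=n_clusters).fit_predict(embeddings)

sopa.patches.cluster_embeddings(sdata, "resnet50_embeddings", method=kmeans_clusters, n_clusters=10)

# the labels are added to the "cluster" column of the "embeddings_patches" shapes
print(sdata["embeddings_patches"]["cluster"].value_counts())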