API usage¶
import spatialdata
import sopa
Context¶
Sopa is built on top of the spatialdata
library, the core scverse data structure, which can be seen as an extension of AnnData
for spatial omics.
It means that Sopa will manipulate SpatialData
objects, that we usually denote by sdata
.
Create a SpatialData object¶
The first step is to create a SpatialData
object from your raw data. To do so, you need to use the function from sopa.io
that is specific to your technology. If you don't know which raw files you need, refer to this FAQ section.
For instance, for MERSCOPE data:
sdata = sopa.io.merscope("/path/to/region_0")
.
For the sake of this tutorial, we use a synthetic dataset:
sdata = sopa.io.toy_dataset() # you can use `sopa.io.merscope` / `sopa.io.xenium` / ...
sdata
[INFO] (sopa.utils.data) Image of size ((4, 2048, 2048)) with 400 cells and 100 transcripts per cell
SpatialData object ├── Images │ ├── 'he_image': DataTree[cyx] (3, 1024, 1024), (3, 512, 512), (3, 256, 256) │ └── 'image': DataArray[cyx] (4, 2048, 2048) ├── Points │ ├── 'misc': DataFrame with shape: (<Delayed>, 2) (2D points) │ └── 'transcripts': DataFrame with shape: (<Delayed>, 5) (2D points) └── Shapes └── 'cells': GeoDataFrame shape: (400, 1) (2D shapes) with coordinate systems: ▸ 'global', with elements: he_image (Images), image (Images), misc (Points), transcripts (Points), cells (Shapes) ▸ 'microns', with elements: transcripts (Points)
sdata.write("tuto.zarr")
sdata = spatialdata.read_zarr("tuto.zarr") # we can read the data back
INFO The Zarr backing store has been changed from None the new file path: tuto.zarr
/Users/quentinblampey/Library/Caches/pypoetry/virtualenvs/sopa-hDHgkEug-py3.12/lib/python3.12/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only' compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
Note that, when your data is saved on-disk, Sopa will automatically save on-disk any new spatial element such as the cells boundaries. To disable this behavior, you can use sopa.settings.auto_save_on_disk = False
.
Tissue segmentation (optional)¶
If desired, the tissue can be segmented (i.e., defining the contour of the tissue). This can be useful mostly for sparse tissues, as we will not run segmentation on patches that are outside of the tissue.
This tissue segmentation is either using an H&E image, or a staining image (e.g., with DAPI). For more details, refer to the documentation of sopa.segmentation.tissue
.
Note that, here, we have two images, but Sopa decides on which image the tissue segmentation is run. See the FAQ to understand how Sopa handles this.
sopa.segmentation.tissue(sdata)
[INFO] (sopa.segmentation._tissue) Using image_key='he_image' and mode='saturation' as default
Now, we have a "region_of_interest"
GeoDataFrame (see below) which contains the contour of the tissue.
sdata
SpatialData object, with associated Zarr store: /Users/quentinblampey/dev/sopa/docs/tutorials/tuto.zarr ├── Images │ ├── 'he_image': DataTree[cyx] (3, 1024, 1024), (3, 512, 512), (3, 256, 256) │ └── 'image': DataArray[cyx] (4, 2048, 2048) ├── Points │ ├── 'misc': DataFrame with shape: (<Delayed>, 2) (2D points) │ └── 'transcripts': DataFrame with shape: (<Delayed>, 5) (2D points) └── Shapes ├── 'cells': GeoDataFrame shape: (400, 1) (2D shapes) └── 'region_of_interest': GeoDataFrame shape: (2, 1) (2D shapes) with coordinate systems: ▸ 'global', with elements: he_image (Images), image (Images), misc (Points), transcripts (Points), cells (Shapes), region_of_interest (Shapes) ▸ 'microns', with elements: transcripts (Points)
On a real tissue, the 'region_of_interest'
could look like this black line:
Alternative using napari¶
Instead of segmenting the tissue, it's possible to select interactively a region of interest (ROI) via napari-spatialdata
, as shown in this tutorial.
Important: provide the name region_of_interest
to your selection, and save it inside your SpatialData object with Shift + E
.
Later on, the segmentation will only be run inside the region of interest, and not outside.
Cell segmentation¶
For cell segmentation, multiple choices are available: cellpose
, baysor
, comseg
, or even a combination of them (e.g., baysor
with cellpose
as a prior).
Cellpose¶
First, we generate the bounding boxes of the patches on which Cellpose will be run. Here, the patches have a width and height of 1500 pixels and an overlap of 50 pixels. We advise bigger sizes for real datasets (see our default parameters in one of our config files). On the toy dataset, this will generate 4 patches.
sopa.make_image_patches(sdata, patch_width=1200, patch_overlap=50)
[INFO] (sopa.patches._patches) 4 patches were added to sdata['image_patches']
The following channels are available for segmentation. Choose one or two channels used by Cellpose.
sopa.utils.get_channel_names(sdata)
array(['DAPI', 'CK', 'CD3', 'CD20'], dtype='<U4')
Then, we run Cellpose via the sopa.segmentation.cellpose
function.
Here, we run segmentation using "DAPI"
only, and we set the cell diameter to be about 35
pixels. Note that the diameter parameter is crucial: you can refer to this guide to get default parameters according to your technology.
NB: to make is faster, we can set a parallelization backend as below. See here for more parallelization details.
# optional: set the parallelization backend to use dask
sopa.settings.parallelization_backend = "dask"
sopa.segmentation.cellpose(sdata, channels=["DAPI"], diameter=35)
Now, we see that we have the "cellpose_boundaries"
inside the sdata
object:
sdata
SpatialData object, with associated Zarr store: /Users/quentinblampey/dev/sopa/docs/tutorials/tuto.zarr ├── Images │ ├── 'he_image': DataTree[cyx] (3, 1024, 1024), (3, 512, 512), (3, 256, 256) │ └── 'image': DataArray[cyx] (4, 2048, 2048) ├── Points │ ├── 'misc': DataFrame with shape: (<Delayed>, 2) (2D points) │ └── 'transcripts': DataFrame with shape: (<Delayed>, 5) (2D points) └── Shapes ├── 'cellpose_boundaries': GeoDataFrame shape: (372, 1) (2D shapes) ├── 'cells': GeoDataFrame shape: (400, 1) (2D shapes) └── 'image_patches': GeoDataFrame shape: (4, 3) (2D shapes) with coordinate systems: ▸ 'global', with elements: he_image (Images), image (Images), misc (Points), transcripts (Points), cellpose_boundaries (Shapes), cells (Shapes), image_patches (Shapes) ▸ 'microns', with elements: transcripts (Points)
Baysor¶
To use Baysor, make sure you have correctly installed it, as in our getting started instructions.
Baysor needs a config to be executed, but, by default Sopa can create one for you (if you use cellpose as a prior, see below).
You can find official config examples here. You can also reuse the Baysor parameter we have defined for each machine, as in our Snakemake config files.
Before actually running Baysor, we generate the bounding boxes of the patches on which Baysor will be run. Here, the patches have a width and height of 500 microns and an overlap of 50 microns (default value). We advise bigger sizes for real datasets (see our default parameters in one of our config files).
In this example, we pass prior_shapes_key="cellpose_boundaries"
, which means that Baysor will use the cellpose segmentation as a prior. You can also remove the argument if you don't want it.
sopa.make_transcript_patches(
sdata,
patch_width=500,
prior_shapes_key="cellpose_boundaries",
)
[########################################] | 100% Completed | 519.54 ms
[INFO] (sopa.patches._transcripts) 1 patche(s) were added to sdata['transcripts_patches']
Then, we run Baysor via the sopa.segmentation.baysor
function.
NB: to make is faster, we can set a parallelization backend as below. See here for more parallelization details.
# optional: set the parallelization backend to use dask
sopa.settings.parallelization_backend = "dask"
sopa.segmentation.baysor(sdata, min_area=10)
[INFO] (sopa.segmentation.methods._baysor) No config provided, inferring a default Baysor config. 100%|██████████| 1/1 [00:46<00:00, 46.64s/it] [INFO] (sopa.segmentation._transcripts) Cells whose area is less than 10 microns^2 will be removed Reading transcript-segmentation outputs: 0%| | 0/1 [00:00<?, ?it/s]OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead. Reading transcript-segmentation outputs: 100%|██████████| 1/1 [00:00<00:00, 1.60it/s] Resolving conflicts: 0it [00:00, ?it/s] [INFO] (sopa.segmentation._transcripts) Added sdata.tables['table'], and 267 cell boundaries to sdata['baysor_boundaries']
sopa.make_transcript_patches(
sdata,
patch_width=500,
prior_shapes_key="cellpose_boundaries",
write_cells_centroids=True,
)
[########################################] | 100% Completed | 458.95 ms
[INFO] (sopa.patches._transcripts) 1 patche(s) were added to sdata['transcripts_patches']
Then, we run Comseg via the sopa.segmentation.comseg
function.
sopa.segmentation.comseg(sdata)
[INFO] (sopa.segmentation.methods._comseg) The Comseg config was not provided, using the following by default:
{'dict_scale': {'x': 1, 'y': 1, 'z': 1}, 'mean_cell_diameter': 7.873894882387665, 'max_cell_radius': 13.779316044178413, 'norm_vector': False, 'alpha': 0.5, 'allow_disconnected_polygon': False, 'min_rna_per_cell': 20, 'gene_column': 'genes'}
0%| | 0/1 [00:00<?, ?it/s]OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
/Users/quentinblampey/Library/Caches/pypoetry/virtualenvs/sopa-hDHgkEug-py3.12/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
/Users/quentinblampey/Library/Caches/pypoetry/virtualenvs/sopa-hDHgkEug-py3.12/lib/python3.12/site-packages/comseg/clustering.py:154: FutureWarning: In the future, the default backend for leiden will be igraph instead of leidenalg.
To achieve the future defaults please pass: flavor="igraph" and n_iterations=2. directed must also be False to work with igraph's implementation.
sc.tl.leiden(adata, resolution=resolution)
100%|██████████| 1/1 [00:32<00:00, 32.94s/it]
Reading transcript-segmentation outputs: 100%|██████████| 1/1 [00:00<00:00, 16.44it/s]
Resolving conflicts: 0it [00:00, ?it/s]
[INFO] (sopa.segmentation._transcripts) Added sdata.tables['table'], and 372 cell boundaries to sdata['comseg_boundaries']
Aggregation¶
The purpose of sopa.aggregate
is to create an AnnData
object of features per cell. Depending on your technology, it will count the transcript/bins inside each cell, and/or average each channel intensity inside each cell boundary.
More specifically:
- For bins technologies like Visium HD data: for each cell, it will sum the transcript counts of all bins that are touching this cell.
- For transcript-based technologies like MERSCOPE/Xenium, it will count the transcripts inside each cells, and optionally average the channel intensities
- For multiplex imaging data, it will average the intensity of each channel inside each cell
You don't need to specify any argument, whatever the technology you use. See here to understand how Sopa knows which elements to use for aggregation.
sopa.aggregate(sdata)
Now, in the SpatialData
object, there is a "table"
element (which is an AnnData
object, see below).
adata = sdata["table"]
adata
AnnData object with n_obs × n_vars = 372 × 5 obs: 'region', 'slide', 'cell_id', 'area' uns: 'sopa_attrs', 'spatialdata_attrs' obsm: 'intensities', 'spatial'
What's inside adata depends on your technology:
- If you have transcripts/bins, then
adata.X
are the raw counts - Else,
adata.X
are the channels intensities - If you both count the transcript/bins and average the intensities, then
adata.X
are the raw counts, andadata.obsm["intensities"]
are the channels intensities (as above)
Annotation (optional)¶
Annotation within Sopa is totally optionnal. These annotations tools were optimized for annotation of large data, but feel free to perform your own.
Transcript-based (Tangram)¶
Tangram is a transcript-based annotation that uses an annotated single-cell reference. Let's suppose your reference AnnData
object is stored in a file called adata_reference.h5ad
(preferably, keep raw counts), and the cell type is in adata.obs["cell_type"]
. Then, you can annotate your spatial data as follows:
import anndata
adata_reference = anndata.read_h5ad("adata_reference.h5ad")
sopa.utils.tangram_annotate(sdata, adata_reference, "cell_type")
Staining-based¶
For now, our fluorescence-based annotation is very simple. We provide a dictionary where a channel is associated with a population. Then, each cell is associated with the cell type whose corresponding channel is the brightest (according to a certain Z-score). In this tutorial example, we can annotate Tumoral cells, T cells, and B cells:
marker_cell_dict = {"CK": "Tumoral cell", "CD20": "B cell", "CD3": "T cell"}
sopa.utils.higher_z_score(sdata.tables["table"], marker_cell_dict)
[INFO] (sopa.utils.annotation) Annotation counts: cell_type
Tumoral cell 129
T cell 123
B cell 120
Name: count, dtype: int64
Visualization¶
Many visualization tools can be used. Here, we show three of them, with different capabilities.
With the Xenium Explorer¶
The Xenium Explorer is a software developed by 10X Genomics for visualizing spatial data, and it can be downloaded freely here. This is the most user-friendly visualizer of the three, but it may not show all the spatial element from your SpatialData
object (it only shows images, transcripts, and cell boundaries).
Sopa allows the conversion to the Xenium Explorer, whatever the type of spatial data you worked on.
The command below creates some files under a new tuto.explorer
directory:
sopa.io.explorer.write("tuto.explorer", sdata)
[INFO] (sopa.io.explorer.table) Writing table with 5 columns [INFO] (sopa.io.explorer.table) Writing 4 cell categories: image_name, region, slide, cell_type [INFO] (sopa.io.explorer.shapes) Writing 372 cell polygons [INFO] (sopa.io.explorer.points) Writing 40000 transcripts [INFO] (sopa.io.explorer.points) > Level 0: 40000 transcripts [INFO] (sopa.io.explorer.points) > Level 1: 10000 transcripts [INFO] (sopa.io.explorer.images) Writing multiscale image with procedure=semi-lazy (load in memory when possible) [INFO] (sopa.io.explorer.images) (Loading image of shape (4, 2048, 2048)) in memory [INFO] (sopa.io.explorer.images) > Image of shape (4, 2048, 2048) [INFO] (sopa.io.explorer.images) > Image of shape (4, 1024, 1024) [INFO] (sopa.io.explorer.images) > Image of shape (4, 512, 512) [INFO] (sopa.io.explorer.images) > Image of shape (4, 256, 256) [INFO] (sopa.io.explorer.images) > Image of shape (4, 128, 128) [INFO] (sopa.io.explorer.images) > Image of shape (4, 64, 64) [INFO] (sopa.io.explorer.converter) Saved files in the following directory: tuto.explorer [INFO] (sopa.io.explorer.converter) You can open the experiment with 'open tuto.explorer/experiment.xenium'
If you have downloaded the Xenium Explorer, you can now open the results in the explorer: simply double-click on the tuto.explorer/experiment.xenium
file. For more advanced usage of the Xenium Explorer, refer to this tutorial.
With spatialdata-plot¶
spatialdata-plot
library is a static plotting library for SpatialData
objects. This solution is very convenient when working in Jupyter Notebooks.
You can install it via pip install spatialdata-plot
import spatialdata_plot
sdata.pl.render_images("image").pl.render_shapes(
"cellpose_boundaries", outline_alpha=1, fill_alpha=0, outline_color="#fff"
).pl.show("global")
INFO Rasterizing image for faster rendering.
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [0.0..1.9530949634755863].
With napari-spatialdata¶
napari-spatialdata
is a plugin for Napari developed to visualize your SpatialData objects. This visualizer is very flexible, as you can display any spatial element, and you can directly save annotations from Napari in your SpatialData object.
You can install it via pip install 'napari-spatialdata[all]'
from napari_spatialdata import Interactive
Interactive(sdata)
Pipeline report¶
You can optionally create an HTML report of the pipeline run (in the example below, we save it under report.html
). It contains some quality controls for your data.
sopa.io.write_report("report.html", sdata)
[INFO] (sopa.io.report.generate) Writing general_section [INFO] (sopa.io.report.generate) Writing cell_section [INFO] (sopa.io.report.generate) Writing channel_section [INFO] (sopa.io.report.generate) Writing transcripts_section [INFO] (sopa.io.report.generate) Writing representation_section [INFO] (sopa.io.report.generate) Computing UMAP on 268 cells [INFO] (sopa.io.report.generate) Writing report to report.html
Further analyses¶
If not done yet, you can save your SpatialData
object, as below. It will create a .zarr
directory.
If you already saved your SpatialData
object, you don't need to save it again, because all elements are automatically saved by default (see here to change this setting).
sdata.write("tuto.zarr")
This way, you'll be able to open it later for further analysis, using spatialdata.read_zarr("tuto.zarr")
.
Now, here is a list of ressources you may consider to go further:
- This tutorial on spatial statistic and geometric analysis.
- Use Squidpy which operates on both the
SpatialData
object or theAnnData
object, or use other tools of thescverse
ecosystem such asScanpy
. - You can also try the CLI or the Snakemake pipeline of Sopa.