Visium HD tutorial¶
We can run Sopa on Visium HD, as the 2 micron bins are subcellular. It uses similar functions as in the "normal" API tutorial.
For this tutorial, we use the mouse small intestine public dataset from 10X Genomics.
import spatialdata
import sopa
Reading the data¶
You can read your Visium HD data with sopa.io.visium_hd.
Important notes:
- This API tutorial assumes you already ran Space Ranger. If you didn't, you can consider running the full sopa pipeline for Visium HD via Nextflow (see
nf-core/sopa) - If Space Ranger produced files with no prefix (for instance,
feature_slice.h5), then passdataset_id=""to the reader below. - If your full resolution microscopy image is not stored in the
microscope_image, you need to providefullres_image_fileas below. Else, it will be found automatically. This is not necessary if you have prior 10X Genomics segmentation (in which case you can diectly use proseg).
sdata = sopa.io.visium_hd(
"data/visium_hd/Visium_HD_Mouse_Small_Intestine", # Space Ranger output directory
fullres_image_file="/path/to/microscopy/image.tiff", # not needed for this tuto
dataset_id="", # only if Space Ranger produced files with no prefix
)
sdata
Then, we save it on-disk:
sdata.write("mouse_small_intestine.zarr") # save it
sdata = spatialdata.read_zarr("mouse_small_intestine.zarr") # open-it back
Option 1: running StarDist¶
The first option is to run Stardist on the H&E image. Note that, to do that, you'll need the full-resolution image. Else, if you have a prior 10X segmentation, you can directly go to the Proseg section.
First, we create the patches for the cell segmentation.
sopa.make_image_patches(sdata)
[INFO] (sopa.patches._patches) Added 156 patche(s) to sdata['image_patches']
Now we can run stardist on the H&E image.
sopa.segmentation.stardist(sdata, min_area=30)
[WARNING] (sopa._settings) Running without parallelization backend can be slow. Consider using a backend, e.g. via `sopa.settings.parallelization_backend = 'dask'`, or `export SOPA_PARALLELIZATION_BACKEND=dask`.
3%|▎ | 4/156 [00:34<22:03, 8.71s/it]
WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x5638ebb50> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
3%|▎ | 5/156 [00:36<16:07, 6.40s/it]
WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x563ab4550> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
100%|██████████| 156/156 [16:27<00:00, 6.33s/it] [INFO] (sopa.segmentation._stainings) Found 430536 total cells Resolving conflicts: 100%|██████████| 63782/63782 [00:23<00:00, 2714.30it/s] [INFO] (sopa.segmentation._stainings) Added 410196 cell boundaries in sdata['stardist_boundaries']
Option 2: running Proseg¶
You can use proseg to use the bins information to improve the segmentation and aggregation. To do so, proseg needs a prior segmentation: you can either use stardist (see how to run it above), or directly use the 10X Genomics default segmentation available on the recent Visium HD versions.
Using prior_shapes_key="auto" will automatically detect which segmentation to use. If you ran stardist, it will use it as a prior, else it will fallback to the default 10X Genomics segmentation.
NB: this feature is still experimental, and was added in sopa==2.1.10. Ensure you read your data with sopa.io.visium_hd using sopa>=2.1.10.
sopa.segmentation.proseg(sdata, prior_shapes_key="auto")
Since proseg already aggregate the bins inside the cells, the next section (i.e., aggregation) is not mandatory.
Aggregation¶
Now, we can run sopa.aggregate to aggregate the bins into the cells. This is mandatory if you used stardist only, but optional if you ran proseg. Even if you used proseg, it can still be useful it if you want some additional aggregation or filtering.
For each cell, we expand their radius, and then Sopa will sum the transcript counts of all 2-microns-bins touching or included within the cell.
Here, expand_radius_ratio = 1, which means that the cells will be expanded a value of 1 * mean_radius before aggregating the means. You can choose any positive float value. If you ran proseg, it will only be used for channel aggregation, since the transcripts are already aggregated.
Note: There is an argument
bins_key, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels.
Note 2: You can also provide
no_overlap=Trueto force each bin being mapped to only one cell.
sopa.aggregate(sdata, aggregate_channels=False, expand_radius_ratio=1)
Single-cell table¶
Now, we have an AnnData object with the gene expression per cell.
adata = sdata["table"]
adata
AnnData object with n_obs × n_vars = 408458 × 19059
obs: 'region', 'slide', 'cell_id', 'area'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'sopa_attrs', 'spatialdata_attrs'
obsm: 'spatial'
For instance, we can now use Scanpy to plot gene expression.
import scanpy as sc
# basic preprocessing
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.filter_cells(adata, min_counts=20)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
We can then use sc.pl.spatial to show the gene expression per cells. Note that, here, we show cells, not bins.
sc.pl.spatial(adata, color="Tpm2", spot_size=80, vmax="p98")
Bins visualization¶
The 2-micron bins are arranged in a grid, so they can be visualized as an image of G channels, where G is the number of genes.
Creating the image would be massive, so we need to create it lazily. This can be done with spatialdata.rasterize_bins.
sdata["square_002um"].X = sdata["square_002um"].X.tocsc() # optimisation with the csc format
lazy_bins_image = spatialdata.rasterize_bins(
sdata,
bins="Visium_HD_Mouse_Small_Intestine_square_002um", # key of the bins shapes
table_name="square_002um", # key of the table with the bins gene expression
row_key="array_row",
col_key="array_col",
)
Note that lazy_bins_image is an image of size (19059, 690, 690), that is G=19059 genes, and 690x690 bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy.
lazy_bins_image
We can save this image in the sdata object.
sdata["gene_expression_2_um"] = lazy_bins_image
Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous.
from napari_spatialdata import Interactive
Interactive(sdata)
You'll be able to display cells and the bins expression. It should look like that:
Xenium Explorer¶
Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use Napari as detailed above.
sopa.io.explorer.write("mouse_small_intestine.explorer", sdata)
[INFO] (sopa.io.explorer.table) Writing table with 18166 columns [INFO] (sopa.io.explorer.table) Writing 2 cell categories: region, slide [INFO] (sopa.io.explorer.shapes) Writing 332556 cell polygons [INFO] (sopa.io.explorer.images) Writing multiscale image with procedure=semi-lazy (load in memory when possible) [INFO] (sopa.io.explorer.images) (Loading image of shape (3, 21943, 23618)) in memory [INFO] (sopa.io.explorer.images) > Image of shape (3, 21943, 23618) [INFO] (sopa.io.explorer.images) > Image of shape (3, 10971, 11809) [INFO] (sopa.io.explorer.images) > Image of shape (3, 5485, 5904) [INFO] (sopa.io.explorer.images) > Image of shape (3, 2742, 2952) [INFO] (sopa.io.explorer.images) > Image of shape (3, 1371, 1476) [INFO] (sopa.io.explorer.images) > Image of shape (3, 685, 738) [INFO] (sopa.io.explorer.converter) Saved files in the following directory: mouse_small_intestine.explorer [INFO] (sopa.io.explorer.converter) You can open the experiment with 'open mouse_small_intestine.explorer/experiment.xenium'