Visium HD tutorial¶
We can run Sopa on Visium HD, as the 2 micron bins are subcellular. You can follow the "normal" API tutorial, or continue below to get exemples more specific to Visium HD data.
For this tutorial, we use the mouse small intestine public dataset from 10X Genomics.
import sopa
import spatialdata
Reading the data¶
sdata = sopa.io.visium_hd("data/visium_hd/Visium_HD_Mouse_Small_Intestine")
sdata
Then, we save it on-disk:
sdata.write("mouse_small_intestine.zarr") # save it
sdata = spatialdata.read_zarr("mouse_small_intestine.zarr") # open-it back
Usual pipeline¶
Now, we run Sopa as usual. Although, since we don't have transcripts, we can't run Baysor. Therefore, we will run Stardist on the H&E image.
First, we create the patches for the cell segmentation.
sopa.make_image_patches(sdata)
[INFO] (sopa.patches._patches) 156 patches were added to sdata['image_patches']
Now we can run stardist on the H&E image. Here, we decrease prob_thresh
and increase nms_thresh
to get more cells.
sopa.segmentation.stardist(sdata, prob_thresh=0.2, nms_thresh=0.6, min_area=30)
[WARNING] (sopa._settings) Running without parallelization backend can be slow. Consider using a backend, e.g. via `sopa.settings.parallelization_backend = 'dask'`, or `export SOPA_PARALLELIZATION_BACKEND=dask`.
3%|▎ | 4/156 [00:34<22:12, 8.77s/it]
WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x3a211ee60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
3%|▎ | 5/156 [00:36<16:10, 6.42s/it]
WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x4e57d3880> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
100%|██████████| 156/156 [19:26<00:00, 7.48s/it] [INFO] (sopa.segmentation._stainings) Found 428836 total cells Resolving conflicts: 100%|██████████| 64106/64106 [00:22<00:00, 2885.00it/s] [INFO] (sopa.segmentation._stainings) Added 408458 cell boundaries in sdata['stardist_boundaries']
Aggregation¶
Now, we need to run sopa.aggregate
to aggregate the bins into the cells. For each cell, we expand their radius, and then Sopa will sum the transcript counts of all 2-microns-bins touching or included within the cell.
Here, expand_radius_ratio = 1
, which means that the cells will be expanded a value of 1 * mean_radius
before aggregating the means. You can choose any positive float value.
There is an argument
bins_key
, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels.
sopa.aggregate(sdata, aggregate_channels=False, expand_radius_ratio=1)
aligned_df.py (68): Transforming to str index. _elements.py (104): Key `stardist_boundaries` already exists. Overwriting it in-memory.
Single-cell table¶
Now, we have an AnnData object with the gene expression per cell.
adata = sdata["table"]
adata
AnnData object with n_obs × n_vars = 408458 × 19059 obs: 'region', 'slide', 'cell_id', 'area' var: 'gene_ids', 'feature_types', 'genome' uns: 'sopa_attrs', 'spatialdata_attrs' obsm: 'spatial'
For instance, we can now use Scanpy to plot gene expression.
import scanpy as sc
# basic preprocessing
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.filter_cells(adata, min_counts=20)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
We can then use sc.pl.spatial
to show the gene expression per cells. Note that, here, we show cells, not bins.
sc.pl.spatial(adata, color="Tpm2", spot_size=80, vmax="p98")
Bins visualization¶
The 2-micron bins are arranged in a grid, so they can be visualized as an image of G
channels, where G
is the number of genes.
Creating the image would be massive, so we need to create it lazily. This can be done with spatialdata.rasterize_bins
.
sdata["square_002um"].X = sdata["square_002um"].X.tocsc() # optimisation with the csc format
lazy_bins_image = spatialdata.rasterize_bins(
sdata,
bins="Visium_HD_Mouse_Small_Intestine_square_002um", # key of the bins shapes
table_name="square_002um", # key of the table with the bins gene expression
row_key="array_row",
col_key="array_col",
)
Note that lazy_bins_image
is an image of size (19059, 690, 690)
, that is G=19059
genes, and 690x690
bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy.
lazy_bins_image
We can save this image in the sdata
object.
sdata["gene_expression_2_um"] = lazy_bins_image
Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous.
from napari_spatialdata import Interactive
Interactive(sdata)
You'll be able to display cells and the bins expression. It should look like that:
Xenium Explorer¶
Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use Napari
as detailed above.
sopa.io.explorer.write("mouse_small_intestine.explorer", sdata)
[INFO] (sopa.io.explorer.table) Writing table with 18166 columns [INFO] (sopa.io.explorer.table) Writing 2 cell categories: region, slide [INFO] (sopa.io.explorer.shapes) Writing 332556 cell polygons [INFO] (sopa.io.explorer.images) Writing multiscale image with procedure=semi-lazy (load in memory when possible) [INFO] (sopa.io.explorer.images) (Loading image of shape (3, 21943, 23618)) in memory [INFO] (sopa.io.explorer.images) > Image of shape (3, 21943, 23618) [INFO] (sopa.io.explorer.images) > Image of shape (3, 10971, 11809) [INFO] (sopa.io.explorer.images) > Image of shape (3, 5485, 5904) [INFO] (sopa.io.explorer.images) > Image of shape (3, 2742, 2952) [INFO] (sopa.io.explorer.images) > Image of shape (3, 1371, 1476) [INFO] (sopa.io.explorer.images) > Image of shape (3, 685, 738) [INFO] (sopa.io.explorer.converter) Saved files in the following directory: mouse_small_intestine.explorer [INFO] (sopa.io.explorer.converter) You can open the experiment with 'open mouse_small_intestine.explorer/experiment.xenium'