Snakemake pipeline
Sopa comes with an existing Snakemake workflow to get started quickly. This will not involve any coding but requires some setup specific to snakemake.
Info
If you're more familiar with Nextflow, you can try nf-core/sopa instead.
Setup
Installation
Follow our installation instructions until the end of the "Snakemake setup" section.
At the end, you should have one sopa environment, one one environment with snakemake>=8.0.0 (it can be the same environment, if desired), and you should also have cloned the sopa repository.
Choose a config file
Our pipeline config is a YAML file that describes all the steps desired for the pipeline. It is flexible; for instance, if you remove the baysor section from the config, then it will not run baysor.
You can choose a config among the existing ones here or create your own.
Keep in mind the path of your config (relative to the workflow directory) because you'll need it later. For instance, config/merscope/base.yaml is a valid relative path. You can also use an absolute path if you prefer.
Run the pipeline
Locate your raw data
First, you need to locate the path to one sample's raw experiment file(s). This is usually a directory containing one or many image(s) and, eventually, a transcript file. If you don't know what data you need, see our FAQ.
Again, remind this path, as we will use it later.
Activate snakemake
Then, activate an environment that has the snakemake command:
And move in the workflow directory of the sopa repository:
Run Snakemake
You can either execute the pipeline locally or on a high-performance-cluster (choose the right option below).
You can execute the pipeline locally as below (in this example, we use only one core). Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--workflow-profile profile/local \
--cores 1
Faster pipeline
Even though Sopa can be run locally, we recommend to use it on high-performance-clusters to benefit from all the pipeline capabilities (see the second tab just above).
To fully benefit from Slurm, you'll need a Snakemake cluster profile. Sopa offers a default Slurm profile for you. Make sure you have snakemake>=8.0.0, and also install the Slurm plugin with pip install snakemake-executor-plugin-slurm.
Then, you can use the Slurm profile as shown below. Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--workflow-profile profile/slurm # or any profile you want
Specify the slurm partition names
You may need to update the slurm_partition parameters inside the workflow/profile/slurm/config.yaml file according to the partition names of your cluster (else, it will always use the same partition). You can also change mem_mb, depending on the RAM capabilities of your cluster.
Warning
The LSF profile is experimental. Don't hesitate to open an issue or a PR.
To fully benefit from LSF, you'll need a Snakemake cluster profile. Sopa offers a default LSF profile for you, but it is still experimental. Make sure you have snakemake>=8.0.0, and also install the LSF plugin with pip install snakemake-executor-plugin-lsf.
Then, you can use the LSF profile as shown below. Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
If you have high-performance-cluster that is not a Slurm/LSF HPC, then we recommend reading more about the Snakemake profiles, and, especially, the different executor plugins.
Once you installed an executor plugin, you can use it with the command below. Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml
--executor my_executor # your new executor
Or you can also create a new profile for your HPC: for instance, you can create workflow/profile/my_profile/config.yaml, which you can use it witht the following command:
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml
--workflow-profile profile/my_profile # your new profile
RAM per rule
Some Snakemake rules may use more RAM, for instance explorer, or to_spatialdata. Consider adjusting the RAM and the walltime depending on the Snakemake rules.
For more customization, see the snakemake CLI documentation.
Toy example
In the example below, we run the pipeline on a generated toy dataset. Running it locally can help test a new pipeline or config.
Make sure you have installed everything as detailed in this tutorial, and then run the following command lines:
Make sure you have installed sopa with the cellpose extra (for instance, this can be done via the following command: pip install 'sopa[cellpose]').
conda activate snakemake # or any environment that has `snakemake`
cd workflow # move to the workflow directory inside the sopa repository
# you can replace tuto.zarr by another path where the data will be saved
snakemake \
--config sdata_path=tuto.zarr \
--configfile=config/toy/cellpose.yaml \
--workflow-profile profile/local \
--cores 1
Make sure you have installed the proseg command (refer to our getting started).
conda activate snakemake # or any environment that has `snakemake`
cd workflow # move to the workflow directory inside the sopa repository
# you can replace tuto.zarr by another path where the data will be saved
snakemake \
--config sdata_path=tuto.zarr \
--configfile=config/toy/proseg.yaml \
--workflow-profile profile/local \
--cores 1
Make sure you have installed sopa with the baysor extra, and that you have installed the baysor command (refer to our getting started).
conda activate snakemake # or any environment that has `snakemake`
cd workflow # move to the workflow directory inside the sopa repository
# you can replace tuto.zarr by another path where the data will be saved
snakemake \
--config sdata_path=tuto.zarr \
--configfile=config/toy/baysor.yaml \
--workflow-profile profile/local \
--cores 1
Notes
On the above example, it executes snakemake sequentially (one core), which is enough for debugging purposes. You can remove the argument, or set a higher number of cores.
You can then check tuto.explorer for output files. Notably, if you have installed the Xenium Explorer, double-click on experiment.xenium to visualize the results.
Pipeline outputs
The pipeline outputs consists in two directories located next to your raw data directory. They have the same name as your raw directory, but with extension .zarr and .explorer respectively (see below for more details).
Info
You can also change the path to the .zarr output, by providing sdata_path=/your/path.zarr just after --config on the snakemake execution line. This will also move the .explorer directory, that will be saved at /your/path.explorer
SpatialData directory
If you are familiar with the spatialdata library, you can use the .zarr directory, corresponding to a SpatialData object:
Explorer directory
The .explorer directory contains the following files:
-
report.htmla short quality control of you data, as an HTML report -
adata.h5adthe AnnData object with spatial locations of the cells (seeadata.obsm['spatial']), and also cell-by-gene table and/or the cell-by-channel table. -
experiment.xeniumthe Xenium Explorer file: double-click on it to open it on the Xenium Explorer (download the software here) -
The other files are data files related and required by the Xenium Explorer
Advanced usage
Create your own config
If the existing config files are not suited for your project, you can update an existing one or create a whole new one. For this, use this commented config to understand the purpose of each argument. Note that some sections are optional: in this case, remove the section or the argument, and Sopa will not run it.
When running snakemake, you will then need to provide the relative or absolute path to your .yaml config, for instance --configfile=/path/to/your/config.yaml.
Passing kwargs to the config
Internally, the Snakemake pipeline is calling Sopa's CLI. Not all argument are used in the default config files. You can pass additionnal kwargs to the config so that they are given to the CLI.
For instance, if you want to pass kwargs to the MERSCOPE reader, you can update the config as below:
Create your own profile
As mentioned above, you can use a Snakemake profile to execute the pipeline. In Snakemake>=8.0.0, there are multiple executor plugins that can help you.
Save your new profile under workflow/profile/my_profile/config.yaml.
Then, to use the new profile, pass --workflow-profile profile/my_profile to your snakemake command.