Snakemake pipeline
Sopa comes with an existing Snakemake workflow to get started quickly. This will not involve any coding, but it requires some setup specific to Snakemake.
Setup
Installation
Follow our installation instructions until the end of the "Snakemake setup" section.
At the end, you should have one sopa environment and one environment with snakemake>=8.0.0 (they can be the same environment, if desired), and you should also have cloned the sopa repository.
Choose a config file
Our pipeline config is a YAML file that describes all the steps desired for the pipeline. It is flexible; for instance, if you remove the baysor section from the config, then it will not run Baysor.
You can choose a config among the existing ones here or create your own.
Keep in mind the path of your config (relative to the workflow directory), because you'll need it later. For instance, config/merscope/base.yaml is a valid relative path. You can also use an absolute path if you prefer.
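To give an idea of the overall shape, here is a rough, illustrative sketch of a config; the section names and values below are only indicative, so refer to the existing config files for the exact schema:

# illustrative sketch only -- see the existing config files for the exact schema
read:
  technology: merscope        # which reader to use for your raw data
segmentation:
  cellpose: {}                # keep a section to run the corresponding step
  baysor: {}                  # remove this section and Baysor will not be run
aggregate: {}
explorer: {}                  # produces the Xenium Explorer outputs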
Run the pipeline
Locate your raw data
First, you need to locate the path to one sample's raw experiment file(s). This is usually a directory containing one or more images and, possibly, a transcript file. If you don't know what data you need, see our FAQ.
Again, keep this path in mind, as we will use it later.
Activate snakemake
Then, activate an environment that has the snakemake command:
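conda activate snakemake   # or any environment that has `snakemake`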
Next, move into the workflow directory of the sopa repository:
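cd workflow   # move to the workflow directory inside the sopa repository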
Run Snakemake
You can either execute the pipeline locally or on a high-performance-cluster (choose the right option below).
You can execute the pipeline locally as shown below (in this example, we use only one core). Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--workflow-profile profile/local \
--cores 1
Faster pipeline
Even though Sopa can be run locally, we recommend running it on a high-performance cluster (HPC) to benefit from all the pipeline's capabilities (see the second tab just above).
To fully benefit from Slurm, you'll need a Snakemake cluster profile. Sopa offers a default Slurm profile for you. Make sure you have snakemake>=8.0.0, and also install the Slurm plugin with pip install snakemake-executor-plugin-slurm.
Then, you can use the Slurm profile as shown below. Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--workflow-profile profile/slurm # or any profile you want
Specify the slurm partition names
You may need to update the slurm_partition parameters inside the workflow/profile/slurm/config.yaml file according to the partition names of your cluster (otherwise, it will always use the same partition). You can also change mem_mb, depending on the RAM available on your cluster.
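For reference, the relevant entries look roughly like the sketch below; the exact layout of the bundled profile may differ, and the partition name is a placeholder:

# workflow/profile/slurm/config.yaml (sketch; partition name is a placeholder)
default-resources:
  slurm_partition: "shortq"   # replace with a partition name from your cluster
  mem_mb: 16000               # adjust to the RAM available on your nodes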
Warning
The LSF profile is experimental. Don't hesitate to open an issue or a PR.
To fully benefit from LSF, you'll need a Snakemake cluster profile. Sopa offers a default LSF profile for you, but it is still experimental. Make sure you have snakemake>=8.0.0, and also install the LSF plugin with pip install snakemake-executor-plugin-lsf.
Then, you can use the LSF profile as shown below. Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
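Assuming the LSF profile lives under workflow/profile/lsf (check the actual directory name in the repository), the command looks like this:

snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--workflow-profile profile/lsf # assuming this is the LSF profile directory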
If your high-performance cluster is not a Slurm/LSF HPC, then we recommend reading more about Snakemake profiles and, especially, the different executor plugins.
Once you have installed an executor plugin, you can use it with the command below. Make sure to replace data_path with the path to your raw data directory, and configfile with the relative path to your config (as detailed above).
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--executor my_executor # your new executor
Alternatively, you can create a new profile for your HPC: for instance, you can create workflow/profile/my_profile/config.yaml, which you can then use with the following command:
snakemake \
--config data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--workflow-profile profile/my_profile # your new profile
RAM per rule
Some Snakemake rules may use more RAM than others, for instance explorer or to_spatialdata. Consider adjusting the RAM and the walltime for each Snakemake rule accordingly.
For more customization, see the snakemake CLI documentation.
Toy example
In the example below, we run the pipeline on a generated toy dataset. Running it locally can help test a new pipeline or config.
Make sure you have installed everything as detailed in this tutorial, and then run the following command lines:
Make sure you have installed sopa with the cellpose extra (for instance, this can be done via pip install 'sopa[cellpose]').
conda activate snakemake # or any environment that has `snakemake`
cd workflow # move to the workflow directory inside the sopa repository
# you can replace tuto.zarr by another path where the data will be saved
snakemake \
--config sdata_path=tuto.zarr \
--configfile=config/toy/cellpose.yaml \
--workflow-profile profile/local \
--cores 1
Make sure you have installed sopa with the baysor extra, and that you have installed the baysor command (refer to our getting started).
conda activate snakemake # or any environment that has `snakemake`
cd workflow # move to the workflow directory inside the sopa repository
# you can replace tuto.zarr by another path where the data will be saved
snakemake \
--config sdata_path=tuto.zarr \
--configfile=config/toy/baysor.yaml \
--workflow-profile profile/local \
--cores 1
Notes
In the example above, Snakemake is executed sequentially (one core), which is enough for debugging purposes. You can remove the argument or set a higher number of cores.
You can then check tuto.explorer for the output files. Notably, if you have installed the Xenium Explorer, double-click on experiment.xenium to visualize the results.
Pipeline outputs
The pipeline outputs consist of two directories located next to your raw data directory. They have the same name as your raw directory, but with the extensions .zarr and .explorer, respectively (see below for more details).
Info
You can also change the path of the .zarr output by providing sdata_path=/your/path.zarr just after --config on the snakemake execution line. This will also move the .explorer directory, which will be saved at /your/path.explorer.
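For example, reusing the local command from above:

snakemake \
--config sdata_path=/your/path.zarr data_path=/path/to/directory \
--configfile=config/merscope/base.yaml \
--workflow-profile profile/local \
--cores 1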
SpatialData directory
If you are familiar with the spatialdata library, you can use the .zarr directory, corresponding to a SpatialData object:
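For instance, assuming the output directory is /path/to/sample.zarr (the actual name depends on your raw data directory), it can be loaded with the spatialdata library:

import spatialdata

# read the SpatialData object written by the pipeline
sdata = spatialdata.read_zarr("/path/to/sample.zarr")
print(sdata)  # lists the images, shapes, points, and tables it contains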
Explorer directory
The .explorer directory contains the following files:
- report.html: a short quality control of your data, as an HTML report
- adata.h5ad: the AnnData object with the spatial locations of the cells (see adata.obsm['spatial']), as well as the cell-by-gene table and/or the cell-by-channel table
- experiment.xenium: the Xenium Explorer file; double-click on it to open it in the Xenium Explorer (download the software here)
- The other files are data files related to and required by the Xenium Explorer
Advanced usage
Create your own config
If the existing config files are not suited for your project, you can update an existing one or create a whole new one. For this, use this commented config to understand the purpose of each argument. Note that some sections are optional: in this case, remove the section or the argument, and Sopa will not run it.
When running snakemake, you will then need to provide the relative or absolute path to your .yaml config, for instance --configfile=/path/to/your/config.yaml.
Passing kwargs to the config
Internally, the Snakemake pipeline calls Sopa's CLI. Not all arguments are used in the default config files. You can pass additional kwargs to the config so that they are given to the CLI.
For instance, if you want to pass kwargs to the MERSCOPE reader, you can update the config as below:
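(The layout below is only a sketch; check the MERSCOPE reader's signature for the arguments it actually accepts.)

read:
  technology: merscope
  kwargs:                 # extra arguments forwarded to the reader
    z_layers: 3           # hypothetical example argument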
Create your own profile
As mentioned above, you can use a Snakemake profile to execute the pipeline. In Snakemake>=8.0.0, there are multiple executor plugins that can help you.
Save your new profile under workflow/profile/my_profile/config.yaml.
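A minimal profile could look like the sketch below; the executor name and resource values are placeholders, so use the executor plugin you installed and values suited to your cluster:

# workflow/profile/my_profile/config.yaml (illustrative)
executor: slurm          # or any executor plugin you installed
jobs: 20                 # maximum number of jobs submitted at once
default-resources:
  mem_mb: 16000
  runtime: 120           # walltime, in minutes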
Then, to use the new profile, pass --workflow-profile profile/my_profile to your snakemake command.