Frequently asked questions
What kind of inputs do I need to run Sopa?
You need the raw inputs of your machine, that is:
- One or multiple image(s), usually corresponding to one or multiple .tiff file(s)
- Optionally, a file of transcript locations, usually a .csv or .parquet file
In this documentation, data_path denotes the path to your raw data. The right path depends on your technology:
For Xenium data, data_path is the path to the directory containing the following files: morphology.ome.tif, experiment.xenium, and transcripts.parquet.
For MERSCOPE data, data_path is the path to the "region" directory containing a detected_transcripts.csv file and an images directory. For instance, the directory may be called region_0.
For CosMX data, data_path is the path to the directory containing:
- a transcript file *_tx_file (with columns target, x_global_px, y_global_px)
- a FOV locations file *_fov_positions_file (with columns FOV, X_mm, Y_mm)
- a Morphology_ChannelID_Dictionary.txt file containing channel names
- a Morphology2D directory containing the images, whose names end in _F*.TIF
These files must be exported as flat files in AtomX. That is: within a study, click on "Export" and then select files from the "Flat CSV Files" section (transcripts flat and FOV position flat).
For MACSima data, data_path is the path to the directory containing multiple .ome.tif files (one file per channel).
For PhenoCycler data, data_path is the path to one .qptiff file, or one .tif file (if exported from QuPath).
For Hyperion data, data_path is the path to the directory containing multiple .ome.tiff files (one file per channel).
Other file formats (ND2, CZI, LIF, or DV) are supported via the aicsimageio reader. In that case, you'll need to add new dependencies: pip install aicsimageio (and, for CZI data, also pip install aicspylibczi).
This reader is called aicsimageio, i.e. you can use it via sopa.io.aicsimageio(data_path), where data_path is the path to the data file containing your image(s). For the Snakemake pipeline, provide aicsimageio as the technology in the config file.
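Before launching a run, it can help to confirm that data_path contains the files described above. The following is a minimal sketch using only the standard library. The helper name missing_entries and the XENIUM_FILES tuple are hypothetical and not part of Sopa; the file names are the ones listed above for a directory containing morphology.ome.tif, experiment.xenium, and transcripts.parquet.

```python
from pathlib import Path

# File names taken from the list above; adapt them to your technology.
XENIUM_FILES = ("morphology.ome.tif", "experiment.xenium", "transcripts.parquet")


def missing_entries(data_path, expected=XENIUM_FILES):
    """Return the expected file or directory names that are absent from data_path."""
    root = Path(data_path)
    return [name for name in expected if not (root / name).exists()]
```

Calling missing_entries("/path/to/data_path") returns an empty list when every expected entry is present. Passing a different tuple as expected (for instance ("detected_transcripts.csv", "images")) adapts the check to another layout.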
I have small artifact cells; how do I remove them?
You may have small cells that were segmented but that should be removed. For that, Sopa offers three filtering approaches: using their area, their transcript count, or their fluorescence intensity. Refer to the following config parameters from the example config: min_area, min_transcripts, and min_intensity_ratio.
If using the CLI, --min-area can be provided to sopa segmentation cellpose or sopa resolve baysor, and --min-transcripts/--min-intensity-ratio can be provided to sopa aggregate.
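To make the three criteria concrete, here is a toy sketch on plain Python dictionaries. It is not Sopa's implementation: the keep_cell helper and the record fields are hypothetical, and treating the intensity criterion as a simple lower bound on a precomputed ratio is an assumption made for illustration only.

```python
def keep_cell(cell, min_area=0.0, min_transcripts=0, min_intensity_ratio=0.0):
    """Return True if a cell passes all three (illustrative) thresholds."""
    return (
        cell["area"] >= min_area
        and cell["transcript_count"] >= min_transcripts
        and cell["intensity_ratio"] >= min_intensity_ratio
    )


# Hypothetical records: one plausible cell and three artifacts
cells = [
    {"id": "a", "area": 2500.0, "transcript_count": 40, "intensity_ratio": 0.8},
    {"id": "b", "area": 300.0, "transcript_count": 35, "intensity_ratio": 0.9},  # too small
    {"id": "c", "area": 2600.0, "transcript_count": 2, "intensity_ratio": 0.7},  # too few transcripts
    {"id": "d", "area": 2400.0, "transcript_count": 50, "intensity_ratio": 0.05},  # too dim
]

kept = [
    cell["id"]
    for cell in cells
    if keep_cell(cell, min_area=2000, min_transcripts=10, min_intensity_ratio=0.1)
]
print(kept)
```

Each threshold removes a different kind of artifact, so they can be combined or used separately depending on which signal is reliable for your data.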
Cellpose is not segmenting enough cells; what should I do?
- The main Cellpose parameter to check is diameter, i.e. a typical cell diameter in pixels. Note that this is highly specific to the technology you're using, since the micron-to-pixel ratio can differ. We advise you to start with the default parameter for your technology of interest (see the diameter parameter in our config files).
- Maybe min_area is too high, and all the cells are filtered out because they are smaller than this area. Remember that, when using Cellpose, the areas are expressed in pixels^2.
- This can be due to low image quality. If the image is too pixelated, consider increasing gaussian_sigma (e.g., 2) under the cellpose parameters of our config. If the image has low contrast, consider increasing clip_limit (e.g., 0.3). These parameters are detailed in the example config.
- Consider updating the official Cellpose parameters. In particular, try cellprob_threshold=-6 and flow_threshold=2.
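Because diameter is expressed in pixels and min_area in pixels^2 when using Cellpose, it can help to convert from microns explicitly. The sketch below only does that arithmetic. The function names are hypothetical, and the pixel size of 0.2 microns per pixel is an illustrative value, not a figure for any particular machine.

```python
import math


def diameter_in_pixels(diameter_um, pixel_size_um):
    """Convert a cell diameter from microns to pixels."""
    return diameter_um / pixel_size_um


def area_in_pixels2(area_um2, pixel_size_um):
    """Convert an area from microns^2 to pixels^2."""
    return area_um2 / pixel_size_um**2


pixel_size_um = 0.2  # illustrative value: microns per pixel
diameter_px = diameter_in_pixels(12.0, pixel_size_um)

# Area of a disk with that diameter, in pixels^2, for comparison with min_area
disk_area_px2 = math.pi * (diameter_px / 2) ** 2

print(round(diameter_px), round(disk_area_px2))
```

If min_area is close to or above the disk area computed for a typical cell, most cells will be filtered out, which is one way to spot a threshold that was set in the wrong unit.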
How to use a custom Cellpose model?
You can use any existing Cellpose model with the model_type argument (via the API, CLI, or Snakemake pipeline). For the Snakemake pipeline, see here how to set this argument.
If you have a custom pretrained model, use the pretrained_model argument instead of model_type, and give the path to your Cellpose model.
How to provide other arguments to Cellpose?
When using the Snakemake pipeline, you can use method_kwargs to provide extra arguments to Cellpose. For instance, we use resample=False in the example below, which may significantly speed up the segmentation without significantly decreasing its quality:
segmentation:
  cellpose:
    diameter: 60
    channels: ["DAPI"]
    flow_threshold: 2
    cellprob_threshold: -6
    min_area: 2000
    method_kwargs:
      resample: False
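Conceptually, the entries under method_kwargs are extra keyword arguments passed along to Cellpose. The sketch below illustrates that idea with a stand-in function. Both fake_cellpose_eval and run_segmentation are hypothetical names, and this is not a description of how Sopa is implemented internally.

```python
def fake_cellpose_eval(image, diameter, flow_threshold, cellprob_threshold, **extra):
    """Stand-in for a Cellpose call: report which arguments it received."""
    return {
        "diameter": diameter,
        "flow_threshold": flow_threshold,
        "cellprob_threshold": cellprob_threshold,
        **extra,
    }


def run_segmentation(image, config):
    """Separate the named options from method_kwargs and forward both."""
    options = dict(config)  # shallow copy, so the caller's config is left untouched
    extra = options.pop("method_kwargs", {})
    options.pop("channels", None)  # not used by the stand-in
    options.pop("min_area", None)  # a filtering option, not forwarded in this sketch
    return fake_cellpose_eval(image, **options, **extra)


# Python equivalent of the cellpose section of the config above
cellpose_config = {
    "diameter": 60,
    "channels": ["DAPI"],
    "flow_threshold": 2,
    "cellprob_threshold": -6,
    "min_area": 2000,
    "method_kwargs": {"resample": False},
}

received = run_segmentation(image=None, config=cellpose_config)
print(received)
```

The stand-in receives resample=False alongside the named options, which is the behavior the config above relies on.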
How to use a prior cell segmentation?
If you have MERSCOPE or Xenium data, you probably already have a cell segmentation. This can be used as a prior for Baysor, instead of running Cellpose with Sopa. For that, an existing config file for the Snakemake pipeline is available for both MERSCOPE and Xenium data. If using the API/CLI, consider using the cell_key and unassigned_value arguments when creating the patches for the transcripts. For MERSCOPE data, use cell_key="cell_id" and unassigned_value=-1. For Xenium data, use cell_key="cell_id" and unassigned_value="UNASSIGNED".
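To illustrate what cell_key and unassigned_value refer to, the toy sketch below groups transcripts by their prior cell assignment and sets aside those carrying the unassigned value. The group_by_prior helper and the sample records are hypothetical; this is only an illustration of the two arguments, not Sopa's code.

```python
from collections import defaultdict


def group_by_prior(transcripts, cell_key, unassigned_value):
    """Group transcripts by prior cell ID; return (groups, unassigned transcripts)."""
    groups = defaultdict(list)
    unassigned = []
    for transcript in transcripts:
        if transcript[cell_key] == unassigned_value:
            unassigned.append(transcript)
        else:
            groups[transcript[cell_key]].append(transcript)
    return dict(groups), unassigned


# Xenium-style toy records, using the values quoted above
transcripts = [
    {"gene": "EPCAM", "cell_id": "cell_1"},
    {"gene": "CD3E", "cell_id": "UNASSIGNED"},
    {"gene": "KRT8", "cell_id": "cell_1"},
    {"gene": "PTPRC", "cell_id": "cell_2"},
]

groups, unassigned = group_by_prior(
    transcripts, cell_key="cell_id", unassigned_value="UNASSIGNED"
)
print(sorted(groups), len(unassigned))
```

With the MERSCOPE values quoted above, the same call would use unassigned_value=-1 and integer cell IDs.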
How to provide dictionaries to CLI arguments?
Some CLI arguments are optional dictionaries. For instance, sopa read has a --kwargs option. In that case, a dictionary can be provided as an inline string, for instance:
--kwargs "{'backend': 'rioxarray'}"
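The inline string uses Python dictionary syntax. As a sketch of how such a string can be turned into a dictionary, the example below uses ast.literal_eval from the standard library. Whether Sopa parses the option exactly this way is an assumption, and parse_dict_option is a hypothetical helper.

```python
import ast


def parse_dict_option(text):
    """Parse an inline string such as "{'backend': 'rioxarray'}" into a dict."""
    value = ast.literal_eval(text)  # evaluates Python literals only, never arbitrary code
    if not isinstance(value, dict):
        raise ValueError(f"expected a dictionary, got {type(value).__name__}")
    return value


kwargs = parse_dict_option("{'backend': 'rioxarray'}")
print(kwargs["backend"])
```

On the command line, the outer double quotes keep the shell from splitting the string, while the inner single quotes delimit the keys and string values.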
How to fix an "out-of-memory" issue on MERSCOPE data?
If using MERSCOPE data, images can be huge. To improve RAM efficiency, you can install rioxarray (pip install rioxarray). Then, rioxarray will be used by default by the reader (no change needed, it will be detected automatically).
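Since the reader picks up rioxarray automatically once it is installed, a quick way to confirm that the package is visible in the current environment is sketched below. This is a generic standard-library check, not a Sopa function; is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None


print("rioxarray available:", is_installed("rioxarray"))
```

If this prints False, make sure pip install rioxarray was run in the same environment as the one used to run Sopa.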
Can I use Nextflow instead of Snakemake?
Nextflow is not supported yet, but we are working on it. You can also help re-write our Snakemake pipeline for Nextflow (see issue #7).
I have another issue; how do I fix it?
Don't hesitate to open an issue on Sopa's GitHub repository, and describe your issue as precisely as possible so that the maintainers can reproduce it.