RNA-DGE-SALMON-DESEQ2 (⤫ LEGACY)
Version: 11-09-2020
Tags: RNA-Seq / Differential Gene Expression / DGE / DESeq2 / Salmon
This pipeline assumes that reads were quantified against a transcriptome with Salmon. See genomic_expression/rna-count-salmon for more information.
You may find more information about Salmon at:
Pipeline dependencies
This pipeline requires the following packages. Any additional requirements are installed dynamically.
Conda:
conda-forge::python=3.8.5
conda-forge::pytest=5.4.3
conda-forge::datrie=0.8.2
conda-forge::git=2.27.0
conda-forge::jinja2=2.11.2
conda-forge::pygraphviz=1.5
conda-forge::flask=1.1.2
conda-forge::pandas=1.0.5
conda-forge::zlib=1.2.11
conda-forge::openssl=1.1.1g
conda-forge::networkx=2.4
bioconda::snakemake=5.20.1
conda-forge::ipython=7.16.1
conda-forge::bashlex=0.15
conda-forge::black=19.10b0
Additionally, the following prerequisites are mandatory:
Conda
Genome annotation
Input files
The following input files are required:
Salmon-formatted transcriptome abundance estimation
GTF-formatted genome annotation
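Salmon writes its abundance estimates to a file named `quant.sf` in each output directory. Before preparing a run, you may want to check which quantification files are visible. The sketch below is only an illustration: the helper name `list_quant_files`, the `SALMON_DIR` variable and the one-directory-per-sample layout are assumptions, not part of the pipeline.

```shell
# Hypothetical helper: list every Salmon quantification file below a directory.
list_quant_files() {
    # $1: directory to search (assumed layout: one sub-directory per sample)
    find "$1" -type f -name 'quant.sf' | sort
}

# SALMON_DIR is an illustrative variable; the current directory is used if unset.
list_quant_files "${SALMON_DIR:-.}"
```

Each line printed corresponds to one quantified sample under that assumed layout.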
Output files
The following output files are expected:
DESeq2 result archive
Clustered heatmaps (PNG)
Shiny GSEAapp tables (TSV)
Expression distribution (PNG)
MA-plots (PNG)
PCA plots, with their correlations and scree plots (PNG + TSV)
rlog/vst normalizations (TSV)
Volcano plots (PNG)
MultiQC report
Complete report embedding material and methods
Notes
This pipeline takes cold storage into account: there is no need to copy your data in advance.
To install the pipeline, use "git" to clone the repository and "conda" to create the required environment.
Installation
To install the workflow, run the following commands in order:
Case |
Command line |
---|---|
git |
# This command clones the GitHub repository
if [ ! -d "${RNA_DGE_SALMON_DESEQ2:?}" ]; then git clone https://github.com/tdayris-perso/rna-dge-salmon-deseq2.git "${RNA_DGE_SALMON_DESEQ2:?}"; fi
|
conda |
# This command creates the conda virtual environment. It requires
# access to the git repository (see above).
conda env create --force --file "${STRONGR_DIR:?}/workflows/differential_expression/rna-dge-salmon-deseq2/environment.yaml"
|
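The commands above rely on the `${VAR:?}` expansion, which makes the shell stop with an error when the variable is unset or empty. This means `RNA_DGE_SALMON_DESEQ2` and `STRONGR_DIR` must be defined beforehand. A minimal sketch follows; the path is a placeholder and `require_var` is a hypothetical helper written for this example.

```shell
# Placeholder location, chosen for illustration only.
export RNA_DGE_SALMON_DESEQ2="${PWD}/rna-dge-salmon-deseq2"

# Hypothetical helper: report a missing variable without aborting the shell.
require_var() {
    # $1: variable name, $2: its current value (possibly empty)
    if [ -z "$2" ]; then
        echo "Please define $1 before running the installation commands" >&2
        return 1
    fi
}

require_var RNA_DGE_SALMON_DESEQ2 "${RNA_DGE_SALMON_DESEQ2:-}"
require_var STRONGR_DIR "${STRONGR_DIR:-}" || echo "STRONGR_DIR is still missing"
```

Checking the variables this way gives a readable message up front, while `${VAR:?}` in the installation commands remains the final safeguard.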
Testing
To test the pipeline, you may run the following commands:
Case |
Command line |
---|---|
quick-test |
cd "${RNA_DGE_SALMON_DESEQ2:?}"
make all-unit-tests
make test-conda-report.html
make clean
|
Preparation
To prepare a run, you may use the following commands:
Case |
Command line |
---|---|
activate |
# This command activates the conda environment available after the
# installation process.
conda activate rna-dge-salmon-deseq2 || source activate rna-dge-salmon-deseq2
|
gustaveroussy-references-hg38 |
# These variables point to the HG38 references on Gustave Roussy's flamingo
GTF="/mnt/beegfs/database/bioinfo/Index_DB/GTF/Gencode/GRCH38/release_34/gencode.v34.annotation.gtf"
FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/RNA/gencode.v34.transcripts.fa"
COLD_STORAGE=(/mnt/isilon /mnt/archivage)
|
config |
# This command builds the configuration file
python3.8 "${RNA_DGE_SALMON_DESEQ2:?}/scripts/prepare_config.py" "${FASTA:?}" "${GTF:?}" --cold_storage "${COLD_STORAGE:?}" --workdir "${RNA_DGE_SALMON_DESEQ2_PREPARE_DIR:?}"
|
paired-end-design |
# This command builds the design file for paired-end data
python3.8 "${RNA_DGE_SALMON_DESEQ2:?}/scripts/prepare_design.py" "${RNA_DGE_SALMON_DESEQ2_PREPARE_DIR:?}"
|
single-end-design |
# This command builds the design file for single-end data
python3.8 "${RNA_DGE_SALMON_DESEQ2:?}/scripts/prepare_design.py" "${RNA_DGE_SALMON_DESEQ2_PREPARE_DIR:?}" --single
|
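`COLD_STORAGE` is declared above as a bash array, and in bash `"${COLD_STORAGE:?}"` expands to its first element only. The sketch below illustrates the difference with the `[@]` form; `count_args` is a hypothetical helper used only to count the words produced. Whether `prepare_config.py` accepts several paths after `--cold_storage` is not stated here, so check its help message before changing the command.

```shell
#!/usr/bin/env bash
COLD_STORAGE=(/mnt/isilon /mnt/archivage)

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Without a subscript, only the first element (/mnt/isilon) is passed on.
first_only=$(count_args "${COLD_STORAGE:?}")

# With [@], every element becomes its own argument.
all_paths=$(count_args "${COLD_STORAGE[@]:?}")

echo "without [@]: ${first_only} argument(s), with [@]: ${all_paths} argument(s)"
```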
Execution
To execute the pipeline, you may run the following commands:
Case |
Command line(s) |
---|---|
local |
source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2
snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" -r -p --configfile config.yaml -j 4 --use-conda
|
torque |
# This reserves the threads and memory that each rule requests,
# but the queue chosen by the scheduler might not be optimal.
# See the profile case below.
source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2
snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "qsub -V -d ${RNA_DGE_SALMON_DESEQ2_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --use-conda
|
slurm |
# This reserves the threads and memory that each rule requests,
# but the queue chosen by the scheduler might not be optimal.
# See the profile case below.
source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2
snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads}" --use-conda
|
profile |
# Requires a Snakemake profile named "slurm" to be installed
source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2
snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" --configfile config.yaml --profile slurm
|
report |
snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" --configfile config.yaml --report
|
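In the torque case above, `walltime={resources.time_min}:00` appends a seconds field to a number of minutes. If your scheduler expects a full HH:MM:SS value instead (an assumption, to be checked against your site's documentation), a conversion could be sketched as follows. The helper name `minutes_to_walltime` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: convert a duration in whole minutes to HH:MM:SS.
minutes_to_walltime() {
    # $1: duration in whole minutes
    printf '%02d:%02d:00\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

minutes_to_walltime 90   # prints 01:30:00
minutes_to_walltime 45   # prints 00:45:00
```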