RNA-DGE-SALMON-DESEQ2 (⤫ LEGACY)

Version: 11-09-2020

Tags: RNA-Seq / Differential Gene Expression / DGE / DESeq2 / Salmon

This pipeline assumes that reads were already quantified against the transcriptome with Salmon. See genomic_expression/rna-count-salmon for more information.

You may find more information about Salmon at:

Pipeline dependencies

This pipeline requires the following packages to run. Any additional requirements are installed dynamically.

Conda:

  • conda-forge::python=3.8.5

  • conda-forge::pytest=5.4.3

  • conda-forge::datrie=0.8.2

  • conda-forge::git=2.27.0

  • conda-forge::jinja2=2.11.2

  • conda-forge::pygraphviz=1.5

  • conda-forge::flask=1.1.2

  • conda-forge::pandas=1.0.5

  • conda-forge::zlib=1.2.11

  • conda-forge::openssl=1.1.1g

  • conda-forge::networkx=2.4

  • bioconda::snakemake=5.20.1

  • conda-forge::ipython=7.16.1

  • conda-forge::bashlex=0.15

  • conda-forge::black=19.10b0

Additionally, the following prerequisites are required:

  • Conda

  • Genome annotation

Input files

Please find below the list of required input files:

  • Salmon-formatted transcriptome abundance estimation

  • GTF-formatted genome annotation

Output files

Please find below the list of expected output files:

  • DESeq2 result archive

  • Clustered heatmaps (PNG)

  • Shiny GSEAapp tables (TSV)

  • Expression distribution (PNG)

  • MA-plots (PNG)

  • PCA plots with their correlations and scree plots (PNG + TSV)

  • rlog/vst normalizations (TSV)

  • Volcano plots (PNG)

  • MultiQC report

  • Complete report embedding material and methods

Notes

This pipeline handles cold storage itself: there is no need to copy your data in advance.

To install, use “git” to clone the repository, then “conda” to create the required environment.

Installation

To install the workflow, run the following commands in order:

Case

Command line

git

# This command clones the GitHub repository, unless it is already present

if [ ! -d "${RNA_DGE_SALMON_DESEQ2:?}" ]; then git clone https://github.com/tdayris-perso/rna-dge-salmon-deseq2.git "${RNA_DGE_SALMON_DESEQ2:?}"; fi

conda

# This command creates the conda virtual environment. It requires

# access to the cloned git repository (see above).

conda env create --force --file "${STRONGR_DIR:?}/workflows/differential_expression/rna-dge-salmon-deseq2/environment.yaml"
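The commands above rely on the `${VAR:?}` form of parameter expansion, which makes the shell abort with an error when the variable is unset or empty, so the corresponding paths have to be exported first. Below is a minimal sketch, assuming hypothetical install locations under `${HOME}/pipelines`; neither these paths nor the `require_set` helper come from the pipeline itself.

```shell
# Hypothetical install locations: adjust them to your own file system.
export RNA_DGE_SALMON_DESEQ2="${HOME}/pipelines/rna-dge-salmon-deseq2"
export STRONGR_DIR="${HOME}/pipelines/strongr"

# require_set NAME: return non-zero when the variable NAME is unset or empty.
# It performs the same check as "${NAME:?}" without aborting the shell.
require_set() {
  # Indirect lookup through eval keeps the helper POSIX-compatible.
  eval "_value=\${$1:-}"
  if [ -z "${_value}" ]; then
    echo "Variable $1 is unset or empty" >&2
    return 1
  fi
}

# Check both variables before running the git and conda commands.
require_set RNA_DGE_SALMON_DESEQ2 && require_set STRONGR_DIR \
  && echo "Installation variables are set"
```

Checking the variables up front gives a readable message for each missing one, instead of a failure in the middle of the installation.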

Testing

In order to test the pipeline, you may try the following commands:

Case

Command line

quick-test

cd "${RNA_DGE_SALMON_DESEQ2:?}"

make all-unit-tests

make test-conda-report.html

make clean

Preparation

In order to prepare a run, you may try the following commands:

Case

Command line

activate

# This command activates the conda environment available after the

# installation process.

conda activate rna-dge-salmon-deseq2 || source activate rna-dge-salmon-deseq2

gustaveroussy-references-hg38

# These variables point to the hg38 references on Gustave Roussy's Flamingo cluster

GTF="/mnt/beegfs/database/bioinfo/Index_DB/GTF/Gencode/GRCH38/release_34/gencode.v34.annotation.gtf"

FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/RNA/gencode.v34.transcripts.fa"

COLD_STORAGE=(/mnt/isilon /mnt/archivage)

config

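# This command builds the configuration file. "${COLD_STORAGE[@]:?}" passes every

# cold storage mount point; "${COLD_STORAGE:?}" would expand to the first one only.

python3.8 "${RNA_DGE_SALMON_DESEQ2:?}/scripts/prepare_config.py" "${FASTA:?}" "${GTF:?}" --cold_storage "${COLD_STORAGE[@]:?}" --workdir "${RNA_DGE_SALMON_DESEQ2_PREPARE_DIR:?}"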

paired-end-design

# This command builds the design file for paired-end data

python3.8 "${RNA_DGE_SALMON_DESEQ2:?}/scripts/prepare_design.py" "${RNA_DGE_SALMON_DESEQ2_PREPARE_DIR:?}"

single-end-design

# This command builds the design file for single-end data

python3.8 "${RNA_DGE_SALMON_DESEQ2:?}/scripts/prepare_design.py" "${RNA_DGE_SALMON_DESEQ2_PREPARE_DIR:?}" --single
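Before building the configuration, it can help to confirm that the reference files named by `GTF` and `FASTA` are readable from the current machine. The following is a minimal sketch; the `check_readable` helper is hypothetical and not part of the pipeline scripts.

```shell
# check_readable FILE...: report every path that is missing or unreadable.
# Returns 0 only when all of the given files can be read.
check_readable() {
  rc=0
  for file in "$@"; do
    if [ ! -r "${file}" ]; then
      echo "Cannot read: ${file}" >&2
      rc=1
    fi
  done
  return "${rc}"
}

# Usage with the variables defined in the preparation step (left commented
# out so the sketch can be pasted on a machine without the references):
# check_readable "${GTF:?}" "${FASTA:?}" || echo "Fix the paths before running prepare_config.py"
```

The helper reports all unreadable paths in one pass rather than stopping at the first one, which is convenient when several reference paths need adjusting.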

Execution

In order to execute the pipeline, you may run the following commands:

Case

Command line(s)

local

source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2

snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" -r -p --configfile config.yaml -j 4 --use-conda

torque

# This command reserves the threads and memory each job requires,

# but the queue it selects might not be optimal.

# See the profile case below.

source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2

snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "qsub -V -d ${RNA_DGE_SALMON_DESEQ2_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --use-conda

slurm

# This command reserves the threads and memory each job requires,

# but the queue it selects might not be optimal.

# See the profile case below.

source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2

snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads}" --use-conda

profile

# Requires the Snakemake slurm profile to be installed

source activate rna-dge-salmon-deseq2 || conda activate rna-dge-salmon-deseq2

snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" --configfile config.yaml --profile slurm

report

snakemake -s "${RNA_DGE_SALMON_DESEQ2:?}/Snakefile" --configfile config.yaml --report
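In the slurm case above, Snakemake fills the `{threads}`, `{resources.mem_mb}` and `{resources.time_min}` placeholders of the `--cluster` string for each job before submitting it. The sketch below only imitates that substitution, to show the resulting command; `render_sbatch` is a hypothetical helper and the resource values are made up.

```shell
# render_sbatch THREADS MEM_MB TIME_MIN: print the submission command that the
# slurm "--cluster" template would produce for one job with these resources.
render_sbatch() {
  printf 'sbatch --mem=%s --time=%s --cpus-per-task=%s\n' "$2" "$3" "$1"
}

# Hypothetical job: 4 threads, 8192 MB of memory, 60 minutes.
render_sbatch 4 8192 60
# prints: sbatch --mem=8192 --time=60 --cpus-per-task=4
```

The values can be passed without unit suffixes because sbatch reads a bare `--mem` number as megabytes and a bare `--time` number as minutes.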