INFER-SEQUENCING-LIBRARY (✓ PRODUCTION)

Version: 11-09-2020

Tags: Quality / Control / QC / inference / library

This pipeline uses BWA, Picard, RSeQC and Samtools to infer the properties of the sequencing libraries used in a provided paired-end sequencing dataset.

https://github.com/tdayris-perso/infer-sequencing-library

Pipeline dependencies

This pipeline requires the following packages to run. Any additional requirements are installed dynamically.

Conda:

conda-forge::python=3.8.5
conda-forge::pytest=6.0.1
conda-forge::datrie=0.8.2
conda-forge::git=2.28.0
conda-forge::jinja2=2.11.2
conda-forge::pygraphviz=1.5
conda-forge::flask=1.1.2
conda-forge::pandas=1.1.0
conda-forge::zlib=1.2.11
conda-forge::openssl=1.1.1g
conda-forge::networkx=2.4
bioconda::snakemake=5.22.1
conda-forge::black=19.10b0
conda-forge::ipython=7.17.0
conda-forge::bashlex=0.15

The following prerequisites are also required:

Conda
Genome sequence and annotation

Input files

Please find below the list of required input files:

FASTA-formatted genome sequence
GTF-formatted genome annotation
BED12-formatted genome annotation
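
Before preparing a run, it can help to confirm that the reference files exist and that the BED file really has 12 columns. The sketch below is not part of the pipeline: `check_readable` and `check_bed12` are hypothetical helper names, and the check assumes a tab-separated BED file in which `#`, `track` and `browser` lines are headers.

```shell
# Hypothetical helpers, not shipped with the pipeline.

# Succeeds when the given path is a readable, non-empty file.
check_readable() {
    [ -r "$1" ] && [ -s "$1" ]
}

# Succeeds when every data line has exactly 12 tab-separated fields.
# Header lines (#, track, browser) and empty lines are skipped.
check_bed12() {
    awk -F '\t' '
        /^#/ || /^track/ || /^browser/ || /^$/ { next }
        NF != 12 { bad = 1; exit }
        END { exit (bad + 0) }
    ' "$1"
}
```

Once the FASTA, GTF and BED variables from the Preparation section are defined, a call such as `check_readable "${FASTA:?}" && check_readable "${GTF:?}" && check_bed12 "${BED:?}"` returns a non-zero status as soon as one check fails.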

Output files

Please find below the list of expected output files:

TSV-formatted library information

Notes

This pipeline takes cold storage into account: there is no need to copy your data in advance.
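
To illustrate what handling cold storage can involve, the sketch below tests whether a file path lies under one of the cold-storage mount points. It shows the general idea only and is not the pipeline's actual implementation: `is_on_cold_storage` is a hypothetical name, the sample path is invented, and the mount points are the ones listed in the Preparation section.

```shell
# Hypothetical helper: returns 0 when PATH lies under any listed prefix.
# Usage: is_on_cold_storage PATH PREFIX [PREFIX...]
is_on_cold_storage() {
    local path="$1" prefix
    shift
    for prefix in "$@"; do
        case "$path" in
            # Match the mount point itself or anything below it,
            # but not a sibling such as /mnt/isilon2.
            "$prefix"|"$prefix"/*) return 0 ;;
        esac
    done
    return 1
}

COLD_STORAGE=(/mnt/isilon /mnt/archivage)

# Invented example path, used only for illustration.
if is_on_cold_storage "/mnt/isilon/project/sample_R1.fastq.gz" "${COLD_STORAGE[@]}"; then
    echo "on cold storage"
fi
```

Matching on `"$prefix"/*` rather than on a plain string prefix avoids false positives for directories whose names merely start with the mount point name.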

To install, use "git" to clone the repository, then "conda" to create the required environment.

Installation

To install the workflow, run the following commands in order:

Case	Command line
git	# This command clones the GitHub repository.
	if [ ! -d "${INFER_SEQUENCING_LIBRARY_DIR:?}" ]; then git clone https://github.com/tdayris-perso/infer-sequencing-library.git "${INFER_SEQUENCING_LIBRARY_DIR:?}"; fi
conda	# This command creates the conda virtual environment. It requires
	# access to the git repository (see above).
	conda env create --force --file "${STRONGR_DIR:?}/workflows/quality/infer-sequencing-library/environment.yaml"
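
The commands in this documentation write their variables as `${NAME:?}`, so variables such as INFER_SEQUENCING_LIBRARY_DIR must be defined beforehand. In POSIX shells this expansion aborts the command with an error when the variable is unset or empty, which prevents, for example, cloning into an empty path. A minimal demonstration follows, using a hypothetical variable `DEMO_DIR` and a subshell so that the failure does not end the current session:

```shell
# DEMO_DIR is a hypothetical variable used only for this demonstration.
unset DEMO_DIR

# Unset: the expansion fails, the subshell exits non-zero and echo never runs.
if ! ( echo "${DEMO_DIR:?}" ) 2>/dev/null; then
    demo_unset="rejected"
fi

# Set: the expansion yields the value as usual.
DEMO_DIR="/tmp/infer-sequencing-library"
demo_set="$( echo "${DEMO_DIR:?}" )"

echo "unset: ${demo_unset}; set: ${demo_set}"
```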

Testing

In order to test the pipeline, you may try the following commands:

Case	Command line
quick-test	cd "${INFER_SEQUENCING_LIBRARY_DIR:?}"
	make conda tests
	make all-unit-tests
	make test-conda-report.html
	make clean

Preparation

In order to prepare a run, you may try the following commands:

Case	Command line
activate	# This command activates the conda environment available after the
	# installation process.
	conda activate infer-sequencing-library || source activate infer-sequencing-library
gustaveroussy-references-hg38	# These variables point to the hg38 references for Gustave Roussy's flamingo
	FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/DNA/GRCh38.p13.genome.fa"
	GTF="/mnt/beegfs/database/bioinfo/Index_DB/GTF/Gencode/GRCH38/release_34/gencode.v34.annotation.gtf"
	BED="/mnt/beegfs/database/bioinfo/Index_DB/rseqc/hg38_Gencode_V28.bed"
	COLD_STORAGE=(/mnt/isilon /mnt/archivage)

prepare-pipeline	# This command builds the configuration file. COLD_STORAGE is a Bash
	# array, so it is expanded with [@] to pass every path.
	python3.8 "${INFER_SEQUENCING_LIBRARY_DIR:?}/scripts/prepare_pipeline.py" "${FASTA:?}" "${GTF:?}" "${BED:?}" --cold_storage "${COLD_STORAGE[@]:?}" --workdir "${INFER_SEQUENCING_LIBRARY_PREPARE_DIR:?}"
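
COLD_STORAGE is declared above as a Bash array, and the way it is expanded decides how many paths reach the script. In Bash, `"${COLD_STORAGE}"` refers to the first element only, while `"${COLD_STORAGE[@]}"` yields one word per element. The sketch below shows the difference with a hypothetical helper, `count_args`. Whether `--cold_storage` accepts several values is an assumption to check against the script's own help output.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

COLD_STORAGE=(/mnt/isilon /mnt/archivage)

# Scalar-style expansion: only the first element is passed.
first_only="$(count_args "${COLD_STORAGE:?}")"

# Array expansion: one argument per element.
all_paths="$(count_args "${COLD_STORAGE[@]:?}")"

echo "scalar expansion: ${first_only} argument(s); array expansion: ${all_paths} argument(s)"
```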

Execution

In order to execute the pipeline, you may run the following commands:

Case	Command line(s)
local	source activate infer-sequencing-library || conda activate infer-sequencing-library
	snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" -r -p --configfile config.yaml -j 4 --use-conda
torque	# While reserving optimal threads and memory requirements,
	# the choice of the queue might not be optimal.
	# See profiles below.
	source activate infer-sequencing-library || conda activate infer-sequencing-library
	snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "qsub -V -d ${INFER_SEQUENCING_LIBRARY_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --use-conda
slurm	# While reserving optimal threads and memory requirements,
	# the choice of the queue might not be optimal.
	# See profiles below.
	source activate infer-sequencing-library || conda activate infer-sequencing-library
	snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads}" --use-conda
profile	# Requires slurm profile installation
	source activate infer-sequencing-library || conda activate infer-sequencing-library
	snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" --configfile config.yaml --profile slurm
report	snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" --configfile config.yaml --report
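
In the torque and slurm commands above, Snakemake fills `{threads}`, `{resources.mem_mb}` and `{resources.time_min}` from each rule before submitting the job. The sketch below only imitates that substitution for one hypothetical job (4 threads, 8192 MB, 45 minutes) so the resulting submission strings can be inspected. `render_torque` and `render_slurm` are illustrative names, not Snakemake functions, and the Torque `-d` working-directory option is left out for brevity.

```shell
# Illustrative helpers that mimic Snakemake's placeholder substitution.
# Arguments: THREADS MEM_MB TIME_MIN

render_torque() {
    local threads="$1" mem_mb="$2" time_min="$3"
    # walltime is written as minutes followed by ":00" seconds.
    echo "qsub -V -j oe -l nodes=1:ppn=${threads},mem=${mem_mb}mb,walltime=${time_min}:00"
}

render_slurm() {
    local threads="$1" mem_mb="$2" time_min="$3"
    # Values are passed without units, exactly as in the template.
    echo "sbatch --mem=${mem_mb} --time=${time_min} --cpus-per-task=${threads}"
}

# Hypothetical job: 4 threads, 8192 MB of memory, 45 minutes.
render_torque 4 8192 45
render_slurm 4 8192 45
```

Printing the rendered strings this way is a cheap check that the units line up with what the scheduler expects before any job is submitted.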