WES-MAPPING-BWA-GATK (⤫ LEGACY)¶

Version: 03-07-2019
Tags: WES / BWA / GATK / Mapping / Picard

This pipeline takes your FASTQ-formatted reads and returns mapped reads. The mapped reads can optionally be corrected with GATK BaseRecalibrator and Picard MarkDuplicates.

You may find more information about:

GATK: https://software.broadinstitute.org/gatk/
Picard: https://github.com/broadinstitute/picard
BWA: https://github.com/lh3/bwa

Citations:

BWA: Li, Heng. “Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.” arXiv preprint arXiv:1303.3997 (2013).
GATK: McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. “The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.” Genome Research 20:1297-303 (2010).

Pipeline dependencies¶

This pipeline requires the following packages to run. Any additional requirements are installed dynamically.

Conda:

conda-forge::python=3.8.5

conda-forge::pytest=5.4.3

conda-forge::datrie=0.8.2

conda-forge::git=2.27.0

conda-forge::jinja2=2.11.2

conda-forge::pygraphviz=1.5

conda-forge::flask=1.1.2

conda-forge::pandas=1.0.5

conda-forge::zlib=1.2.11

conda-forge::openssl=1.1.1g

conda-forge::networkx=2.4

bioconda::snakemake=5.20.1

conda-forge::ipython=7.16.1

conda-forge::bashlex=0.15

conda-forge::black=19.10b0

conda-forge::patsy=0.5.1

The following prerequisites are also required:

Conda
Genome sequence
Known variant sites

Input files¶

Please find below the list of required input files:

A FASTA-formatted genome sequence with its corresponding dictionary and index
FASTQ-formatted WES reads (one or more files)
Known variant sites in VCF format, each with its corresponding index (one or more files)
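
The requirements above can be checked before preparing a run. The following is only a minimal sketch: the function name check_reference_inputs is hypothetical, and the file-naming conventions it tests (genome.fa.fai for the index, genome.dict for the dictionary, .tbi or .idx next to each VCF) are the common samtools/GATK conventions, assumed here rather than confirmed by this pipeline.

```shell
#!/usr/bin/env bash

# Hypothetical helper: check_reference_inputs FASTA [VCF ...]
# Returns 0 when every expected companion file exists and is non-empty.
check_reference_inputs() {
    local fasta="$1"
    shift
    local status=0

    # The genome sequence itself
    [ -s "${fasta}" ] || { echo "missing FASTA: ${fasta}" >&2; status=1; }

    # Assumed convention: the index sits next to the FASTA as <fasta>.fai
    [ -s "${fasta}.fai" ] || { echo "missing index: ${fasta}.fai" >&2; status=1; }

    # Assumed convention: the dictionary replaces the last extension with .dict
    local dict="${fasta%.*}.dict"
    [ -s "${dict}" ] || { echo "missing dictionary: ${dict}" >&2; status=1; }

    # Each known-sites VCF needs an index (.tbi if bgzipped, .idx otherwise)
    local vcf
    for vcf in "$@"; do
        [ -s "${vcf}" ] || { echo "missing VCF: ${vcf}" >&2; status=1; }
        if [ ! -s "${vcf}.tbi" ] && [ ! -s "${vcf}.idx" ]; then
            echo "missing VCF index for: ${vcf}" >&2
            status=1
        fi
    done

    return "${status}"
}
```

A call such as check_reference_inputs "${FASTA:?}" "${KNOWN_VCF[@]}" reports every missing file on standard error and returns a non-zero status if any is absent.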

Output files¶

Please find below the list of expected output files:

BAM formatted mapped reads (corrected or not, according to the parameters)
Multiple txt files containing quality metrics
HTML files containing quality metrics

Notes¶

This pipeline takes cold storage into account: there is no need to copy your data in advance.
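
How the pipeline handles cold storage internally is not described here. One plausible reading is that it recognises inputs located under the cold-storage mount points (the Preparation section lists /mnt/isilon and /mnt/archivage in COLD_STORAGE) and stages them itself. The sketch below only illustrates such a prefix check; the function name is_on_cold_storage and the example FASTQ path are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: is_on_cold_storage PATH MOUNT [MOUNT ...]
# Returns 0 when PATH is one of the mount points or lies below one.
is_on_cold_storage() {
    local path="$1" mount
    shift
    for mount in "$@"; do
        case "${path}" in
            # Exact match, or anything below the mount point.
            # "/mnt/isilon2/x" does not match "/mnt/isilon".
            "${mount}"|"${mount}"/*) return 0 ;;
        esac
    done
    return 1
}

# Example with a made-up FASTQ path and the two mount points from Preparation
if is_on_cold_storage "/mnt/isilon/project/sample_R1.fastq.gz" /mnt/isilon /mnt/archivage; then
    echo "cold storage"
else
    echo "regular storage"
fi
```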

Installation¶

To install the workflow, run the following commands (order matters):

Case

Command line

git

# This command clones the git repository

if [ ! -d "${WES_MAPPING_BWA_GATK_DIR:?}" ]; then git clone https://github.com/tdayris/wes-mapping-bwa-gatk.git "${WES_MAPPING_BWA_GATK_DIR:?}"; fi

conda

# This command requires the git repository

# and creates a conda virtual environment

conda env create --force --file "${STRONGR_DIR:?}/workflows/mapping/wes-mapping-bwa-gatk/environment.yaml"
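
The commands above reference two variables, WES_MAPPING_BWA_GATK_DIR and STRONGR_DIR, that are not set by anything shown on this page. The "${VAR:?}" form makes the shell abort with an error when the variable is unset or empty, instead of silently expanding to an empty path. A minimal sketch of setting and checking them first, assuming purely hypothetical locations under your home directory:

```shell
#!/usr/bin/env bash

# Hypothetical locations: adjust both paths to your own setup.
export WES_MAPPING_BWA_GATK_DIR="${HOME}/wes-mapping-bwa-gatk"
export STRONGR_DIR="${HOME}/strongr"

# Hypothetical helper: fails if either variable is unset or empty.
check_install_vars() {
    # ':' is a no-op; "${VAR:?message}" makes the subshell exit with an
    # error when VAR is unset or empty, which we turn into a return code.
    ( : "${WES_MAPPING_BWA_GATK_DIR:?must be set before cloning}" ) || return 1
    ( : "${STRONGR_DIR:?must be set before creating the environment}" ) || return 1
}

check_install_vars && echo "installation variables are set"
```

Running the check in a subshell keeps a missing variable from terminating your current session.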

Testing¶

In order to test the pipeline, you may try the following commands:

Case

Command line

quick-test

cd "${WES_MAPPING_BWA_GATK_DIR:?}/"

make all-unit-tests

make test-conda-report.html

make clean

Preparation¶

In order to prepare a run, you may try the following commands:

Case

Command line

gustaveroussy-references-hg38

# These commands point to available datasets for HG38 mapping on Flamingo

FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/DNA/gencodeV27_dna.fa"

KNOWN_VCF=""

COLD_STORAGE=(/mnt/isilon /mnt/archivage)

single-end

# These commands help you build single-end configuration files

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}" --single

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"

paired-end

# These commands help you build paired-end configuration files

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"
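
The prepare_config.py calls expand "${KNOWN_VCF[@]}", which is Bash array syntax, while the gustaveroussy-references-hg38 case leaves KNOWN_VCF as an empty string. The sketch below shows how the two differ once expanded. The VCF paths are placeholders and count_args is a hypothetical helper used only to make the argument count visible.

```shell
#!/usr/bin/env bash

# Placeholder paths: replace with your own known-sites files.
KNOWN_VCF=(
    "/path/to/known_sites_1.vcf.gz"
    "/path/to/known_sites_2.vcf.gz"
)

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# An array expands to one argument per file.
count_args "${KNOWN_VCF[@]}"    # prints 2

# A scalar set to "" still expands to one (empty) argument.
EMPTY_SCALAR=""
count_args "${EMPTY_SCALAR[@]}" # prints 1
```

Declaring KNOWN_VCF as an array therefore passes each VCF to the script as a separate argument, whereas the empty string passes a single empty one.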

Execution¶

In order to execute the pipeline, you may run the following commands:

Case

Command line(s)

local

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report

dry-run

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -prn

torque

# These commands help you run this pipeline on clusters. However,

# queues may not be chosen wisely: see profiles.

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "qsub -V -d ${CEL_CNV_EACON_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --restart-times 3

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report

slurm

# These commands help you run this pipeline on clusters. However,

# queues may not be chosen wisely: see profiles.

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads} --partition=mediumq" --restart-times 3

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report

profile

# These commands help you run this pipeline on clusters. However,

# they require the profile to be installed. Queues, threads, memory

# and restart counts are then chosen automatically.

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --profile slurm

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
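
The dry-run and local cases can be chained so that the real run only starts if the dry run succeeds, and the report is only built after a successful run. This is a minimal sketch and not part of the pipeline: the function name run_pipeline and the SNAKEMAKE_BIN override are hypothetical, and the Snakemake flags are the ones already used above.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: run_pipeline SNAKEFILE
# SNAKEMAKE_BIN may be set to use another executable (defaults to snakemake).
run_pipeline() {
    local snakefile="$1"
    local snakemake_bin="${SNAKEMAKE_BIN:-snakemake}"

    # 1. Dry run: -n only prints the planned jobs, nothing is executed.
    "${snakemake_bin}" -s "${snakefile}" --use-conda -prn || return 1

    # 2. Actual run, printing shell commands (-p) and reasons (-r).
    "${snakemake_bin}" -s "${snakefile}" --use-conda -pr || return 2

    # 3. Build the report once the run has completed.
    "${snakemake_bin}" -s "${snakefile}" --use-conda -pr --report || return 3
}

# Usage, after activating the conda environment:
#   run_pipeline "${STRONGR_DIR:?}/Snakefile"
```

The distinct return codes (1, 2, 3) indicate which of the three steps failed.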