WES-MAPPING-BWA-GATK (⤫ LEGACY)

Version: 03-07-2019 Tags: WES / BWA / GATK / Mapping / Picard

This pipeline takes your FASTQ-formatted reads and returns mapped reads. Optionally, base quality scores are recalibrated with GATK BaseRecalibrator and duplicate reads are marked with Picard MarkDuplicates.

Citations:
  • BWA: Li, Heng. “Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.” arXiv preprint arXiv:1303.3997 (2013).

  • GATK: McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. “The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.” Genome Research 20 (2010): 1297-1303.

Pipeline dependencies

This pipeline requires the following packages. Any additional requirements are installed dynamically.

Conda:

  • conda-forge::python=3.8.5

  • conda-forge::pytest=5.4.3

  • conda-forge::datrie=0.8.2

  • conda-forge::git=2.27.0

  • conda-forge::jinja2=2.11.2

  • conda-forge::pygraphviz=1.5

  • conda-forge::flask=1.1.2

  • conda-forge::pandas=1.0.5

  • conda-forge::zlib=1.2.11

  • conda-forge::openssl=1.1.1g

  • conda-forge::networkx=2.4

  • bioconda::snakemake=5.20.1

  • conda-forge::ipython=7.16.1

  • conda-forge::bashlex=0.15

  • conda-forge::black=19.10b0

  • conda-forge::patsy=0.5.1

Additionally, the following prerequisites are mandatory:

  • Conda

  • Genome sequence

  • Known variant sites
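Before going further, you may want to check that the command-line prerequisites can be found. The sketch below is a hypothetical helper (`missing_tools` is not shipped with the pipeline); it only looks the commands up on `PATH` and does not check their versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command that cannot be found on PATH.
# Returns 0 when every command is available, 1 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "${tool}" > /dev/null 2>&1; then
      echo "${tool}"
      status=1
    fi
  done
  return "${status}"
}

# Conda is mandatory; git is used to clone the repository during installation.
missing_tools conda git || echo "Please install the commands listed above first." >&2
```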

Input files

Please find below the list of required input files:

  • A FASTA-formatted genome sequence, with its sequence dictionary (.dict) and index (.fai).

  • FASTQ-formatted WES reads (one or more files)

  • Known variant sites in VCF format, each with its corresponding index (one or more files)
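GATK expects the reference FASTA to come with a `.fai` index and a `.dict` sequence dictionary, and each known-sites VCF to come with an index (`.tbi` for a bgzipped `.vcf.gz`, `.idx` for a plain `.vcf`). A minimal sketch of checking the inputs before a run, assuming these usual file-name conventions; `check_reference` and `check_vcf` are hypothetical helpers, and tabix can also produce `.csi` indexes, which this sketch does not look for.

```shell
#!/usr/bin/env bash
# Hypothetical checks, assuming the usual GATK file-name conventions:
#   genome.fa    -> genome.fa.fai (e.g. samtools faidx) and genome.dict (e.g. gatk CreateSequenceDictionary)
#   sites.vcf.gz -> sites.vcf.gz.tbi (e.g. tabix)
#   sites.vcf    -> sites.vcf.idx
check_reference() {
  local fasta="$1" file status=0
  local dict="${fasta%.*}.dict"  # genome.fa -> genome.dict
  for file in "${fasta}" "${fasta}.fai" "${dict}"; do
    if [ ! -s "${file}" ]; then
      echo "Missing or empty: ${file}" >&2
      status=1
    fi
  done
  return "${status}"
}

check_vcf() {
  local vcf="$1" index
  case "${vcf}" in
    *.vcf.gz) index="${vcf}.tbi" ;;
    *.vcf) index="${vcf}.idx" ;;
    *) echo "Not a VCF file name: ${vcf}" >&2; return 1 ;;
  esac
  if [ ! -s "${vcf}" ] || [ ! -s "${index}" ]; then
    echo "Missing VCF or index: ${vcf}" >&2
    return 1
  fi
}
```

For instance, `check_reference "${FASTA:?}"` can be run once the FASTA variable of the Preparation section is set.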

Output files

Please find below the list of expected output files:

  • BAM-formatted mapped reads (corrected or not, depending on the parameters)

  • Several text files containing quality metrics

  • HTML files containing quality metrics

Notes

This pipeline takes cold storage into account: there is no need to copy your data in advance.
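How the pipeline detects cold storage is not described here. As a hypothetical illustration only, the sketch below checks whether a path lies under one of the mount points stored in the `COLD_STORAGE` array (same variable and values as in the Preparation section); `is_cold_storage` and the example path are assumptions, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Mount points copied from the Preparation section; adjust them to your cluster.
COLD_STORAGE=(/mnt/isilon /mnt/archivage)

# Hypothetical helper: succeed when the given path is a cold-storage mount point
# or lies below one (a plain prefix such as /mnt/isilon2 does not match).
is_cold_storage() {
  local path="$1" mount
  for mount in "${COLD_STORAGE[@]}"; do
    case "${path}" in
      "${mount}" | "${mount}"/*) return 0 ;;
    esac
  done
  return 1
}

# Hypothetical path, for illustration only.
is_cold_storage /mnt/isilon/project/sample_R1.fastq.gz && echo "on cold storage"
```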

Installation

To install the workflow, run the following commands (order matters):

Case

Command line

git

# This command clones the git repository

if [ ! -d "${WES_MAPPING_BWA_GATK_DIR:?}" ]; then git clone https://github.com/tdayris/wes-mapping-bwa-gatk.git "${WES_MAPPING_BWA_GATK_DIR:?}"; fi

conda

# This command requires the git repository

# and creates a conda virtual environment

conda env create --force --file "${STRONGR_DIR:?}/workflows/mapping/wes-mapping-bwa-gatk/environment.yaml"
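Every command in this document guards its variables with `${VAR:?}`: when `VAR` is unset or empty, the shell prints an error and aborts the command (a non-interactive script exits), instead of, say, cloning into an empty path. A minimal sketch of setting the variables up front; the default paths are hypothetical placeholders, and `require_vars` is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical locations: replace them with your own.
export STRONGR_DIR="${STRONGR_DIR:-${HOME}/strongr}"
export WES_MAPPING_BWA_GATK_DIR="${WES_MAPPING_BWA_GATK_DIR:-${HOME}/wes-mapping-bwa-gatk}"

# Hypothetical helper: report every variable name that is unset or empty,
# i.e. every variable that "${VAR:?}" would complain about.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then  # bash indirect expansion
      echo "${name} is unset or empty" >&2
      missing=1
    fi
  done
  return "${missing}"
}

require_vars STRONGR_DIR WES_MAPPING_BWA_GATK_DIR || echo "Set the variables above first." >&2
```

The same check can be extended to `WES_MAPPING_BWA_GATK_PREPARE_DIR` before running the Preparation commands.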

Testing

In order to test the pipeline, you may try the following commands:

Case

Command line

quick-test

cd "${WES_MAPPING_BWA_GATK_DIR:?}/"

make all-unit-tests

make test-conda-report.html

make clean
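If you script the quick test, running it in a subshell keeps the `cd` from changing your current directory, and chaining the targets with `&&` stops at the first failing one. A sketch under those assumptions; `quick_test` is a hypothetical wrapper, and it still runs `make clean` whether or not the tests passed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the quick-test commands above.
# The subshell keeps "cd" from changing the caller's working directory.
quick_test() {
  (
    cd "${WES_MAPPING_BWA_GATK_DIR:?}/" || exit 1
    if make all-unit-tests && make test-conda-report.html; then
      status=0
    else
      status=1
    fi
    make clean  # runs whether or not the tests passed
    exit "${status}"
  )
}
```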

Preparation

In order to prepare a run, you may try the following commands:

Case

Command line

gustaveroussy-references-hg38

# These commands point to available datasets for HG38 mapping on Flamingo

FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/DNA/gencodeV27_dna.fa"

KNOWN_VCF=()

COLD_STORAGE=(/mnt/isilon /mnt/archivage)

single-end

# These commands help you build single-end configuration files

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" --single "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"

paired-end

# These commands help you build paired-end configuration files

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"

python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"
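`"${KNOWN_VCF[@]}"` expands to one argument per array element. Declaring `KNOWN_VCF` as a bash array therefore lets you pass several known-sites files at once, and an empty array passes nothing at all, whereas `KNOWN_VCF=""` passes a single empty argument. A minimal sketch, assuming your bgzipped VCFs sit in one directory; the directory and the `count_args` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

KNOWN_VCF=""                  # a plain string...
count_args "${KNOWN_VCF[@]}"  # ...still expands to one empty argument: prints 1

KNOWN_VCF=()                  # an empty array...
count_args "${KNOWN_VCF[@]}"  # ...expands to no argument at all: prints 0

# Collect every bgzipped VCF of a hypothetical directory; with nullglob,
# an unmatched pattern yields an empty array instead of the literal pattern.
shopt -s nullglob
KNOWN_VCF=(/path/to/known_sites/*.vcf.gz)
shopt -u nullglob
count_args "${KNOWN_VCF[@]}"
```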

Execution

In order to execute the pipeline, you may run the following commands:

Case

Command line(s)

local

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report

dry-run

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -prn

torque

# These commands help you to run this pipeline on clusters. However

# queues may not be chosen wisely: see profiles.

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "qsub -V -d ${PWD} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --restart-times 3

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report

slurm

# These commands help you to run this pipeline on clusters. However

# queues may not be chosen wisely: see profiles.

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads} --partition=mediumq" --restart-times 3

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report

profile

# These commands help you run this pipeline on clusters. However,

# they require a Snakemake profile to be installed. Queues, threads, memory

# and restart times will then be chosen appropriately.

conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

snakemake -s "${STRONGR_DIR:?}/Snakefile" --profile slurm

snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
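A common habit is to run the dry-run first and only launch the real run, then the report, when it succeeds. A minimal sketch of that sequence for the local case; `run_local` is a hypothetical wrapper, not part of the pipeline, and it assumes the `wes-mapping-bwa-gatk` environment is already active. The cluster variants follow the same pattern with their own options.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: dry-run, then run, then build the report.
run_local() {
  local snakefile="${STRONGR_DIR:?}/Snakefile"
  # -n (dry-run) only lists the planned jobs; stop here if it fails
  snakemake -s "${snakefile}" --use-conda -prn || return 1
  snakemake -s "${snakefile}" --use-conda -pr || return 1
  snakemake -s "${snakefile}" --use-conda -pr --report
}
```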