WES-MAPPING-BWA-GATK (⤫ LEGACY)¶
Version: 03-07-2019
Tags: WES / BWA / GATK / Mapping / Picard
This pipeline takes your FASTQ-formatted reads and returns mapped reads in BAM format. The mapped reads are optionally corrected with GATK base quality score recalibration and Picard duplicate marking.
- You may find more information about:
- Citations:
BWA: Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013).
GATK: McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20:1297-1303 (2010).
Pipeline dependencies¶
This pipeline requires the following packages in order to run. Any additional requirements are installed dynamically at run time.
Conda:
conda-forge::python=3.8.5
conda-forge::pytest=5.4.3
conda-forge::datrie=0.8.2
conda-forge::git=2.27.0
conda-forge::jinja2=2.11.2
conda-forge::pygraphviz=1.5
conda-forge::flask=1.1.2
conda-forge::pandas=1.0.5
conda-forge::zlib=1.2.11
conda-forge::openssl=1.1.1g
conda-forge::networkx=2.4
bioconda::snakemake=5.20.1
conda-forge::ipython=7.16.1
conda-forge::bashlex=0.15
conda-forge::black=19.10b0
conda-forge::patsy=0.5.1
Additionally, the following prerequisites are non-optional:
Conda
Genome sequence
Known variant sites
Input files¶
Please find below the list of required input files:
A FASTA-formatted genome sequence with its corresponding dictionary and index.
FASTQ-formatted WES reads (one or more files)
Known variant sites in VCF format, each with its corresponding index (one or more files)
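Before preparing a run, it can help to confirm that each input has its companion files. The sketch below is not part of the pipeline: `check_reference_inputs` is a hypothetical helper, and it assumes the usual naming conventions (`genome.fa.fai` for the FASTA index, `genome.dict` for the sequence dictionary, and a `.tbi` or `.idx` file next to each VCF). Adjust the suffixes if your files are named differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report missing companion files for a FASTA and its VCFs.
# Usage: check_reference_inputs genome.fa [known_sites.vcf.gz ...]
# Returns 0 when everything is found, 1 otherwise.
check_reference_inputs() {
    local fasta="$1"; shift
    local status=0 vcf

    [ -f "${fasta}" ] || { echo "Missing FASTA: ${fasta}" >&2; status=1; }
    # FASTA index (assumed name: <fasta>.fai)
    [ -f "${fasta}.fai" ] || { echo "Missing index: ${fasta}.fai" >&2; status=1; }
    # Sequence dictionary (assumed name: FASTA path with its extension replaced by .dict)
    [ -f "${fasta%.*}.dict" ] || { echo "Missing dictionary: ${fasta%.*}.dict" >&2; status=1; }

    # Every remaining argument is treated as a known-sites VCF
    for vcf in "$@"; do
        [ -f "${vcf}" ] || { echo "Missing VCF: ${vcf}" >&2; status=1; }
        if [ ! -f "${vcf}.tbi" ] && [ ! -f "${vcf}.idx" ]; then
            echo "Missing index for: ${vcf}" >&2
            status=1
        fi
    done
    return "${status}"
}
```

A call such as `check_reference_inputs "${FASTA:?}" "${KNOWN_VCF[@]}"` prints one message per missing file and returns a non-zero status if anything is absent.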
Output files¶
Please find below the list of expected output files:
BAM formatted mapped reads (corrected or not, according to the parameters)
Multiple text files containing quality metrics
HTML files containing quality metrics
Notes¶
This pipeline takes cold storage into account: there is no need to copy your data out of it in advance.
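How the pipeline handles cold storage internally is not described on this page. The sketch below only illustrates the underlying idea of testing whether a file lives under a cold-storage mount point, using the mount points listed in the Preparation section. `is_on_cold_storage` is a hypothetical helper, not a function provided by the pipeline.

```shell
#!/usr/bin/env bash
# Mount points taken from the Preparation section of this page.
COLD_STORAGE=(/mnt/isilon /mnt/archivage)

# Hypothetical helper: succeed when the given path is located under
# one of the cold-storage mount points, fail otherwise.
is_on_cold_storage() {
    local path="$1" mount
    for mount in "${COLD_STORAGE[@]}"; do
        case "${path}" in
            # The trailing slash avoids matching e.g. /mnt/isilon2
            "${mount}"/*) return 0 ;;
        esac
    done
    return 1
}
```

For instance, `is_on_cold_storage /mnt/isilon/project/sample.fastq.gz` succeeds, while a path under `/mnt/beegfs` does not.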
Installation¶
To install the workflow, run the following commands (order matters):
Case |
Command line |
---|---|
git |
# This command clones the git repository
if [ ! -d "${WES_MAPPING_BWA_GATK_DIR:?}" ]; then git clone https://github.com/tdayris/wes-mapping-bwa-gatk.git "${WES_MAPPING_BWA_GATK_DIR:?}"; fi
|
conda |
# This command requires the git repository
# and creates a conda virtual environment
conda env create --force --file "${STRONGR_DIR:?}/workflows/mapping/wes-mapping-bwa-gatk/environment.yaml"
|
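The commands above rely on `${VAR:?}` expansions, which make the shell stop with an error when a variable is unset or empty instead of running `git` or `conda` with an empty path. A minimal sketch of defining and checking these variables beforehand follows. The two paths are placeholders chosen for illustration, and `require_vars` is a hypothetical helper, not something shipped with the workflow.

```shell
#!/usr/bin/env bash
# Placeholder locations (assumptions): adapt them to your own file system.
export WES_MAPPING_BWA_GATK_DIR="${HOME}/wes-mapping-bwa-gatk"
export STRONGR_DIR="${HOME}/strongr"

# Hypothetical helper: check that every named variable is set and non-empty.
# It reports all missing names at once, then returns 1 if any was missing.
require_vars() {
    local name status=0
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable called $name
        if [ -z "${!name:-}" ]; then
            echo "Please define ${name} before continuing" >&2
            status=1
        fi
    done
    return "${status}"
}

require_vars WES_MAPPING_BWA_GATK_DIR STRONGR_DIR && echo "Environment looks ready"
```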
Testing¶
In order to test the pipeline, you may try the following commands:
Case |
Command line |
---|---|
quick-test |
cd "${WES_MAPPING_BWA_GATK_DIR:?}/"
make all-unit-tests
make test-conda-report.html
make clean
|
Preparation¶
In order to prepare a run, you may try the following commands:
Case |
Command line |
---|---|
gustaveroussy-references-hg38 |
# These commands point to available datasets for HG38 mapping on Flamingo
FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/DNA/gencodeV27_dna.fa"
KNOWN_VCF=""
COLD_STORAGE=(/mnt/isilon /mnt/archivage)
|
single-end |
# These commands help you build single-end configuration files
conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}" --single
python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"
|
paired-end |
# These commands help you build paired-end configuration files
conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"
python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"
|
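`prepare_config.py` receives the known variant sites through `"${KNOWN_VCF[@]}"`, which expands a bash array into one argument per file. Since `KNOWN_VCF` is left empty in the reference block above, the sketch below uses purely hypothetical file names to show how several VCF files could be declared and how the quoted expansion behaves. `show_args` is an illustrative helper only.

```shell
#!/usr/bin/env bash
# Hypothetical file names, for illustration only.
# The second one contains spaces to show that quoting keeps it in one piece.
KNOWN_VCF=(
    "/path/to/known_sites_1.vcf.gz"
    "/path/to/known sites 2.vcf.gz"
)

# Print each received argument on its own line, the way a script would see them.
show_args() { printf 'arg: %s\n' "$@"; }

# Quoted [@] expansion: exactly one argument per array element.
show_args "${KNOWN_VCF[@]}"
echo "${#KNOWN_VCF[@]} known-sites file(s) declared"
```

Declaring the variable as an array rather than a plain string keeps each path intact even when it contains spaces.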
Execution¶
In order to execute the pipeline, you may run the following commands:
Case |
Command line(s) |
---|---|
local |
conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
|
dry-run |
conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -prn
|
torque |
# These commands help you run this pipeline on clusters. However,
# queues may not be chosen wisely: see profiles.
conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "qsub -V -d ${CEL_CNV_EACON_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --restart-times 3
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
|
slurm |
# These commands help you run this pipeline on clusters. However,
# queues may not be chosen wisely: see profiles.
conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads} --partition=mediumq" --restart-times 3
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
|
profile |
# These commands help you run this pipeline on clusters. They require
# the profile to be installed. Queues, threads, memory and restart
# counts are then taken from the profile.
conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
snakemake -s "${STRONGR_DIR:?}/Snakefile" --profile slurm
snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
|
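The dry-run and local cases above can be chained so that the real run only starts when the dry run succeeds. The following is a minimal sketch under that assumption: `run_pipeline` is a hypothetical wrapper, not part of the workflow, and it reuses only the `snakemake` options already shown on this page. Extra arguments (for example `-j 4`) are forwarded to the execution step.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: dry run, then execution, then report.
# Stops at the first failing step and returns a non-zero status.
run_pipeline() {
    local snakefile="${STRONGR_DIR:?}/Snakefile"

    # Dry run: -n lists the jobs that would be executed without running them
    snakemake -s "${snakefile}" --use-conda -prn || return 1

    # Actual execution; any extra arguments are passed through unchanged
    snakemake -s "${snakefile}" --use-conda -pr "$@" || return 1

    # Build the report once the run has finished
    snakemake -s "${snakefile}" --use-conda -pr --report
}
```

After activating the conda environment, `run_pipeline -j 4` would perform the three steps in sequence.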