.. _`wes-mapping-bwa-gatk (⤫ legacy)`:

WES-MAPPING-BWA-GATK (⤫ LEGACY)
===============================

Version: 03-07-2019

Tags: WES / BWA / GATK / Mapping / Picard

This pipeline takes your FASTQ-formatted reads and returns mapped reads. These reads are optionally corrected with the GATK base recalibrator and Picard MarkDuplicates.

You may find more information about:

* GATK: https://software.broadinstitute.org/gatk/
* Picard: https://github.com/broadinstitute/picard
* BWA: https://github.com/lh3/bwa

Citations:

* BWA: Li, Heng. "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM."
  arXiv preprint arXiv:1303.3997 (2013).
* GATK: McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K,
  Altshuler D, Gabriel S, Daly M, DePristo MA. "The Genome Analysis Toolkit: a MapReduce
  framework for analyzing next-generation DNA sequencing data." Genome Research 20:1297-303 (2010).

.. role:: bash(code)
   :language: bash

Pipeline dependencies
---------------------

This pipeline requires the following packages. Any additional requirements are installed dynamically.

Conda:

* conda-forge::python=3.8.5
* conda-forge::pytest=5.4.3
* conda-forge::datrie=0.8.2
* conda-forge::git=2.27.0
* conda-forge::jinja2=2.11.2
* conda-forge::pygraphviz=1.5
* conda-forge::flask=1.1.2
* conda-forge::pandas=1.0.5
* conda-forge::zlib=1.2.11
* conda-forge::openssl=1.1.1g
* conda-forge::networkx=2.4
* bioconda::snakemake=5.20.1
* conda-forge::ipython=7.16.1
* conda-forge::bashlex=0.15
* conda-forge::black=19.10b0
* conda-forge::patsy=0.5.1

Additionally, the following prerequisites are non-optional:

* Conda
* Genome sequence
* Known variant sites

Input files
-----------

Please find below the list of required input files:

* A FASTA-formatted genome sequence with its corresponding dictionary and index.
* FASTQ-formatted WES reads (one or more files)
* Known variant sites in VCF format, with their corresponding index (one or more files)

Output files
------------

Please find below the list of expected output files:

* BAM-formatted mapped reads (corrected or not, according to the parameters)
* Multiple text files containing quality metrics
* HTML files containing quality metrics

Notes
-----

This pipeline takes cold storage into account. There is no need to copy your data in advance.

Installation
------------

To install the workflow, run the following commands (order matters):

.. list-table::
   :widths: 10 80
   :header-rows: 1
   :align: left

   * - Case
     - Command line
   * - git
     - .. code-block:: bash

          # This command clones the git repository
          if [ ! -d "${WES_MAPPING_BWA_GATK_DIR:?}" ]; then
            git clone https://github.com/tdayris/wes-mapping-bwa-gatk.git "${WES_MAPPING_BWA_GATK_DIR:?}";
          fi
   * - conda
     - .. code-block:: bash

          # This command requires the git repository
          # and creates a conda virtual environment
          conda env create --force --file "${STRONGR_DIR:?}/workflows/mapping/wes-mapping-bwa-gatk/environment.yaml"

Testing
-------

To test the pipeline, you may try the following commands:

.. list-table::
   :widths: 10 80
   :header-rows: 1
   :align: left

   * - Case
     - Command line
   * - quick-test
     - .. code-block:: bash

          cd "${WES_MAPPING_BWA_GATK_DIR:?}/"
          make all-unit-tests
          make test-conda-report.html
          make clean

Preparation
-----------

To prepare a run, you may try the following commands:

.. list-table::
   :widths: 10 80
   :header-rows: 1
   :align: left

   * - Case
     - Command line
   * - gustaveroussy-references-hg38
     - .. code-block:: bash

          # These commands point to available datasets for HG38 mapping on Flamingo
          FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/DNA/gencodeV27_dna.fa"
          KNOWN_VCF=""
          COLD_STORAGE=(/mnt/isilon /mnt/archivage)
   * - single-end
     - .. code-block:: bash

          # These commands help you to build single-end configuration files
          conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
          python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" --single "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"
          python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"
   * - paired-end
     - .. code-block:: bash

          # These commands help you to build paired-end configuration files
          conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
          python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"
          python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"

Execution
---------

To execute the pipeline, you may run the following commands:

.. list-table::
   :widths: 10 80
   :header-rows: 1
   :align: left

   * - Case
     - Command line(s)
   * - local
     - .. code-block:: bash

          conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
   * - dry-run
     - .. code-block:: bash

          conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -prn
   * - torque
     - .. code-block:: bash

          # These commands help you to run this pipeline on clusters. However
          # queues may not be chosen wisely: see profiles.
          conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "qsub -V -d ${CEL_CNV_EACON_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --restart-times 3
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
   * - slurm
     - .. code-block:: bash

          # These commands help you to run this pipeline on clusters. However
          # queues may not be chosen wisely: see profiles.
          conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads} --partition=mediumq" --restart-times 3
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
   * - profile
     - .. code-block:: bash

          # These commands help you to run this pipeline on clusters. However
          # they require the profile installation. Queues, threads, memory
          # and restart times are then selected by the profile.
          conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --profile slurm
          snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
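
The reference requirement listed under "Input files" (a FASTA file with its index and dictionary) can be checked before launching any of the commands above. The following is a minimal sketch, not part of the pipeline itself. The function name ``check_reference`` is hypothetical, and the file naming is an assumption based on common samtools and Picard conventions (``genome.fa.fai`` next to ``genome.fa``, and ``genome.dict`` replacing the FASTA extension); the pipeline may look for these files elsewhere.

.. code-block:: bash

   # Sketch only: check_reference is a hypothetical helper, and the
   # .fai / .dict naming below is an assumed convention, not something
   # taken from this pipeline's scripts.

   # Returns 0 when the FASTA file, its index and its dictionary all exist,
   # and 1 otherwise. Missing paths are reported on stderr.
   check_reference() {
     local fasta="$1"
     local missing=0
     local candidate
     for candidate in "${fasta}" "${fasta}.fai" "${fasta%.*}.dict"; do
       if [ ! -e "${candidate}" ]; then
         echo "Missing: ${candidate}" >&2
         missing=1
       fi
     done
     return "${missing}"
   }

With the variables from the Preparation section, it could be called as ``check_reference "${FASTA:?}"`` before running ``prepare_config.py``. It assumes an uncompressed FASTA file and only tests for existence, not for content.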