.. _`wes-mapping-bwa-gatk (⤫ legacy)`:

WES-MAPPING-BWA-GATK (⤫ LEGACY)
===============================

Version: 03-07-2019

Tags: WES / BWA / GATK / Mapping / Picard

This pipeline takes FASTQ-formatted reads and returns mapped reads. The mapped reads are optionally corrected with the GATK base recalibrator and Picard MarkDuplicates.

You may find more information about:
  * GATK: https://software.broadinstitute.org/gatk/
  * Picard: https://github.com/broadinstitute/picard
  * BWA: https://github.com/lh3/bwa

Citations:
  * BWA: Li, Heng. "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM." arXiv preprint arXiv:1303.3997 (2013).
  * GATK: McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. "The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data." Genome Research 20:1297-303 (2010).


.. role:: bash(code)
  :language: bash

Pipeline dependencies
---------------------

This pipeline requires the following packages. Any additional requirements are installed dynamically.

Conda:
  
  * conda-forge::python=3.8.5
  
  * conda-forge::pytest=5.4.3
  
  * conda-forge::datrie=0.8.2
  
  * conda-forge::git=2.27.0
  
  * conda-forge::jinja2=2.11.2
  
  * conda-forge::pygraphviz=1.5
  
  * conda-forge::flask=1.1.2
  
  * conda-forge::pandas=1.0.5
  
  * conda-forge::zlib=1.2.11
  
  * conda-forge::openssl=1.1.1g
  
  * conda-forge::networkx=2.4
  
  * bioconda::snakemake=5.20.1
  
  * conda-forge::ipython=7.16.1
  
  * conda-forge::bashlex=0.15
  
  * conda-forge::black=19.10b0
  
  * conda-forge::patsy=0.5.1
  

Additionally, the following prerequisites are non-optional:


* Conda

* Genome sequence

* Known variant sites


Input files
-----------

Please find below the list of required input files:


* A FASTA-formatted genome sequence with its corresponding dictionary and index.

* FASTQ-formatted WES reads (one or more files)

* Known variant sites in VCF format, with their corresponding indexes (one or more files)
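
Missing companion files are easy to overlook, so a check can be run before preparing a run. The sketch below is not part of the pipeline. It assumes the usual naming conventions, ``<fasta>.fai`` for the index and ``<fasta without extension>.dict`` for the dictionary, and the helper name ``missing_fasta_companions`` is hypothetical.

.. code-block:: bash

    # Hypothetical helper: print every expected file that does not exist.
    # Assumed conventions: "<fasta>.fai" (index) and
    # "<fasta without its last extension>.dict" (sequence dictionary).
    missing_fasta_companions() {
        local fasta="$1"
        local index="${fasta}.fai"
        local dict="${fasta%.*}.dict"
        local path
        for path in "${fasta}" "${index}" "${dict}"; do
            [ -e "${path}" ] || printf '%s\n' "${path}"
        done
    }

Calling ``missing_fasta_companions "${FASTA:?}"`` prints nothing when all three files are present. If your reference uses a different naming scheme, adapt the two derived paths.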


Output files
------------

Please find below the list of expected output files:


* BAM formatted mapped reads (corrected or not, according to the parameters)

* Multiple txt files containing quality metrics

* HTML files containing quality metrics


Notes
-----

This pipeline takes cold storage into account: there is no need to copy your data in advance.
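
To see what "being on cold storage" could mean in practice, here is a minimal sketch that tests a path against the cold-storage prefixes listed in the Preparation section. It only illustrates the idea and is not necessarily how the pipeline detects cold storage. The helper name ``is_on_cold_storage`` is hypothetical.

.. code-block:: bash

    # Prefixes copied from the Preparation section of this page.
    COLD_STORAGE=(/mnt/isilon /mnt/archivage)

    # Hypothetical helper: succeed when the path lies under one of the prefixes.
    is_on_cold_storage() {
        local path="$1"
        local prefix
        for prefix in "${COLD_STORAGE[@]}"; do
            case "${path}" in
                "${prefix}"/*) return 0 ;;
            esac
        done
        return 1
    }

The trailing ``/*`` in the pattern prevents a directory such as ``/mnt/isilon_other`` from matching the ``/mnt/isilon`` prefix.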


Installation
------------

To install the workflow, run the following commands in order:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line
  * - git
    - .. code-block:: bash 

        # This command clones the git repository

        if [ ! -d "${WES_MAPPING_BWA_GATK_DIR:?}" ]; then git clone https://github.com/tdayris/wes-mapping-bwa-gatk.git "${WES_MAPPING_BWA_GATK_DIR:?}"; fi
  * - conda
    - .. code-block:: bash 

        # This command requires the git repository

        # and creates a conda virtual environment

        conda env create --force --file "${STRONGR_DIR:?}/workflows/mapping/wes-mapping-bwa-gatk/environment.yaml"
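
The commands on this page use the ``${VAR:?}`` expansion, which makes the shell stop with an error when the variable is unset or empty. A quick way to find out which variables still need to be defined is sketched below. The helper name ``unset_variables`` is hypothetical, and the sketch relies on bash indirect expansion, so it assumes bash rather than a plain POSIX shell.

.. code-block:: bash

    # Hypothetical helper: print the names of variables that are unset or empty.
    # "${!name:-}" is bash indirect expansion: it reads the variable whose
    # name is stored in "name", falling back to an empty string.
    unset_variables() {
        local name
        for name in "$@"; do
            [ -n "${!name:-}" ] || printf '%s\n' "${name}"
        done
    }

For example, ``unset_variables WES_MAPPING_BWA_GATK_DIR STRONGR_DIR`` lists whichever of the two variables has not been exported yet, and prints nothing when both are set.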


Testing
-------

In order to test the pipeline, you may try the following commands:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line
  * - quick-test
    - .. code-block:: bash 

        cd "${WES_MAPPING_BWA_GATK_DIR:?}/"

        make all-unit-tests

        make test-conda-report.html

        make clean


Preparation
-----------

In order to prepare a run, you may try the following commands:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line
  * - gustaveroussy-references-hg38
    - .. code-block:: bash 

        # These commands point to available datasets for HG38 mapping on Flamingo

        FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/DNA/gencodeV27_dna.fa"

        KNOWN_VCF=""

        COLD_STORAGE=(/mnt/isilon /mnt/archivage)
  * - single-end
    - .. code-block:: bash 

        # These commands help you to build single-end configuration files

        conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

        python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" --single "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"

        python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"
  * - paired-end
    - .. code-block:: bash 

        # These commands help you to build paired-end configuration files

        conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

        python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_design.py" "${WES_MAPPING_BWA_GATK_PREPARE_DIR:?}"

        python3 "${WES_MAPPING_BWA_GATK_DIR:?}/scripts/prepare_config.py" "${FASTA:?}" "${KNOWN_VCF[@]}"
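
The ``"${KNOWN_VCF[@]}"`` expansion used above passes one argument per array element, which is what allows several known-sites files to be given at once. The sketch below shows the difference between the quoted and unquoted forms. The file names are hypothetical and ``count_args`` is a helper written only for this illustration.

.. code-block:: bash

    # Hypothetical file names, for illustration only.
    KNOWN_VCF=("dbsnp.vcf.gz" "mills indels.vcf.gz")

    # Helper that only reports how many arguments it received.
    count_args() { echo "$#"; }

    count_args "${KNOWN_VCF[@]}"   # quoted: one argument per element
    count_args ${KNOWN_VCF[@]}     # unquoted: elements are split on whitespace

The quoted form reports 2 arguments, while the unquoted form reports 3 because the second file name contains a space. Declaring ``KNOWN_VCF`` as an array with parentheses, rather than as a plain string, keeps each path intact.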


Execution
---------

In order to execute the pipeline, you may run the following commands:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line(s)
  * - local
    - .. code-block:: bash 

        conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
  * - dry-run
    - .. code-block:: bash 

        conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -prn
  * - torque
    - .. code-block:: bash 

        # These commands help you to run this pipeline on clusters. However,

        # queues may not be chosen wisely: see profiles.

        conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "qsub -V -d ${CEL_CNV_EACON_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --restart-times 3

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
  * - slurm
    - .. code-block:: bash 

        # These commands help you to run this pipeline on clusters. However,

        # queues may not be chosen wisely: see profiles.

        conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads} --partition=mediumq" --restart-times 3

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
  * - profile
    - .. code-block:: bash 

        # These commands help you to run this pipeline on clusters. However,

        # they require the profile installation. Queues, threads, memory

        # and restart counts are then chosen automatically.

        conda activate wes-mapping-bwa-gatk || source activate wes-mapping-bwa-gatk

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --profile slurm

        snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr --report
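
The dry-run and real-run cases above can be chained so that the real run only starts when the dry run succeeds. The wrapper below is a minimal sketch and not part of the pipeline. The function name ``run_after_dry_run`` is hypothetical, and it assumes the wrapped command accepts ``-n`` as its dry-run flag, as ``snakemake`` does.

.. code-block:: bash

    # Hypothetical wrapper: run the given command once with "-n" appended
    # (dry run), then run it for real only if the dry run succeeded.
    run_after_dry_run() {
        "$@" -n && "$@"
    }

A possible call is ``run_after_dry_run snakemake -s "${STRONGR_DIR:?}/Snakefile" --use-conda -pr``. If the dry run reports an error, for example a missing input file, the wrapper returns a non-zero status and nothing is executed.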