.. _`infer-sequencing-library (✓ production)`:

INFER-SEQUENCING-LIBRARY (✓ PRODUCTION)
=======================================

Version: 11-09-2020
Tags: Quality / Control / QC / inference / library

This pipeline uses BWA, Picard, RSeQC and Samtools to infer characteristics of
the sequencing library used in a provided paired-end sequencing run.

   * https://github.com/tdayris-perso/infer-sequencing-library


.. role:: bash(code)
  :language: bash

Pipeline dependencies
---------------------

This pipeline requires the following packages. Any additional requirements are installed dynamically.

Conda:
  
  * conda-forge::python=3.8.5
  
  * conda-forge::pytest=6.0.1
  
  * conda-forge::datrie=0.8.2
  
  * conda-forge::git=2.28.0
  
  * conda-forge::jinja2=2.11.2
  
  * conda-forge::pygraphviz=1.5
  
  * conda-forge::flask=1.1.2
  
  * conda-forge::pandas=1.1.0
  
  * conda-forge::zlib=1.2.11
  
  * conda-forge::openssl=1.1.1g
  
  * conda-forge::networkx=2.4
  
  * bioconda::snakemake=5.22.1
  
  * conda-forge::black=19.10b0
  
  * conda-forge::ipython=7.17.0
  
  * conda-forge::bashlex=0.15
  

In addition, the following prerequisites are mandatory:


* Conda

* Genome sequence and annotation


Input files
-----------

Please find below the list of required input files:


* FASTA-formatted genome sequence

* GTF-formatted genome annotation

* BED12-formatted genome annotation

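A quick sanity check of these three files before a run can save a failed job. The sketch below is not part of the pipeline, and the helper name ``check_inputs`` is hypothetical. It relies only on general format properties: a FASTA file starts with a ``>`` header, a GTF record has nine tab-separated fields, and a BED12 record has twelve. It assumes tab-separated files and inspects only the first relevant line of each.

.. code-block:: bash

  # Hypothetical helper, not shipped with the pipeline.
  # Usage: check_inputs genome.fa annotation.gtf annotation.bed
  check_inputs() {
      fasta="$1"; gtf="$2"; bed="$3"

      # All three files must exist and be non-empty.
      for f in "$fasta" "$gtf" "$bed"; do
          [ -s "$f" ] || { echo "Missing or empty file: $f" >&2; return 1; }
      done

      # A FASTA file starts with a '>' header line.
      head -n 1 "$fasta" | grep -q '^>' \
          || { echo "Not FASTA: $fasta" >&2; return 1; }

      # The first non-comment GTF line must have 9 tab-separated fields.
      awk -F '\t' '!/^#/ { exit (NF == 9 ? 0 : 1) }' "$gtf" \
          || { echo "Not GTF: $gtf" >&2; return 1; }

      # The first BED line must have 12 tab-separated fields (BED12).
      awk -F '\t' 'NR == 1 { exit (NF == 12 ? 0 : 1) }' "$bed" \
          || { echo "Not BED12: $bed" >&2; return 1; }
  }

The function returns a non-zero status on the first failed check, so it can be chained with ``&&`` in front of any later command.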

Output files
------------

Please find below the list of expected output files:


* TSV-formatted library information


Notes
-----

This pipeline takes cold storage into account, so there is no need to copy your data in advance.

To install, use ``conda`` to create the required environment and ``git`` to clone the repository.
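
Since both tools are needed, it can help to confirm that they are reachable before starting. The helper below is a minimal sketch, not part of the pipeline; the name ``require_tools`` is hypothetical.

.. code-block:: bash

  # Hypothetical helper: report every required tool missing from the PATH.
  require_tools() {
      missing=0
      for tool in "$@"; do
          if ! command -v "$tool" >/dev/null 2>&1; then
              echo "Required tool not found: $tool" >&2
              missing=1
          fi
      done
      return "$missing"
  }

  # Example call (commented out so that defining the helper has no side effect):
  # require_tools conda git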


Installation
------------

To install the workflow, run the following commands in order:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line
  * - git
    - .. code-block:: bash 

        # This command clones the GitHub repository, unless the target
        # directory already exists.
        if [ ! -d "${INFER_SEQUENCING_LIBRARY_DIR:?}" ]; then git clone https://github.com/tdayris-perso/infer-sequencing-library.git "${INFER_SEQUENCING_LIBRARY_DIR:?}"; fi
  * - conda
    - .. code-block:: bash 

        # This command creates the conda virtual environment. It requires
        # access to the git repository (see above).
        conda env create --force --file "${STRONGR_DIR:?}/workflows/quality/infer-sequencing-library/environment.yaml"

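The commands in this document use the ``${VARIABLE:?}`` form, which makes the shell stop with an error when the variable is unset or empty instead of running with an empty path. The sketch below shows one way such variables could be defined beforehand. All paths are placeholders rather than values prescribed by the pipeline, and making the working directory equal to the preparation directory is an assumption.

.. code-block:: bash

  # Placeholder locations (assumptions): adapt them to your own system.
  export INFER_SEQUENCING_LIBRARY_DIR="${HOME}/infer-sequencing-library"
  export INFER_SEQUENCING_LIBRARY_PREPARE_DIR="${PWD}/infer_library_run"
  # Assumption: the run happens in the directory that was prepared.
  export INFER_SEQUENCING_LIBRARY_WORKDIR="${INFER_SEQUENCING_LIBRARY_PREPARE_DIR}"

  # Demonstration of the ":?" guard inside a subshell: the expansion fails
  # because DEMO_UNSET has no value, so the subshell ends with an error.
  if ( unset DEMO_UNSET; : "${DEMO_UNSET:?}" ) 2>/dev/null; then
      echo "guard did not trigger"
  else
      echo "guard triggered: variable is unset"
  fi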

Testing
-------

In order to test the pipeline, you may try the following commands:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line
  * - quick-test
    - .. code-block:: bash 

        cd "${INFER_SEQUENCING_LIBRARY_DIR:?}"
        make conda tests
        make all-unit-tests
        make test-conda-report.html
        make clean


Preparation
-----------

In order to prepare a run, you may try the following commands:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line
  * - activate
    - .. code-block:: bash 

        # This command activates the conda environment available after the
        # installation process.
        conda activate infer-sequencing-library || source activate infer-sequencing-library
  * - gustaveroussy-references-hg38
    - .. code-block:: bash 

        # These variables point to the HG38 references on Gustave Roussy's flamingo
        FASTA="/mnt/beegfs/database/bioinfo/Index_DB/Fasta/Gencode/GRCH38/DNA/GRCh38.p13.genome.fa"
        GTF="/mnt/beegfs/database/bioinfo/Index_DB/GTF/Gencode/GRCH38/release_34/gencode.v34.annotation.gtf"
        BED="/mnt/beegfs/database/bioinfo/Index_DB/rseqc/hg38_Gencode_V28.bed"
        # Bash array listing the cold storage mount points
        COLD_STORAGE=(/mnt/isilon /mnt/archivage)
  * - prepare-pipeline
    - .. code-block:: bash 

        # This command builds the configuration file. COLD_STORAGE is a bash
        # array, so it is expanded with [@] to pass every mount point.
        python3.8 "${INFER_SEQUENCING_LIBRARY_DIR:?}/scripts/prepare_pipeline.py" "${FASTA:?}" "${GTF:?}" "${BED:?}" --cold_storage "${COLD_STORAGE[@]}" --workdir "${INFER_SEQUENCING_LIBRARY_PREPARE_DIR:?}"

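``COLD_STORAGE`` is declared above as a bash array holding two mount points. In bash, expanding an array with ``[@]`` inside double quotes yields one argument per element, whereas expanding the bare name yields only the first element. The sketch below illustrates the difference with a hypothetical ``count_args`` helper; it requires bash and does not call the pipeline.

.. code-block:: bash

  # Hypothetical helper: print how many arguments it received.
  count_args() { echo "$#"; }

  COLD_STORAGE=(/mnt/isilon /mnt/archivage)

  count_args "${COLD_STORAGE}"      # prints 1: only /mnt/isilon is passed
  count_args "${COLD_STORAGE[@]}"   # prints 2: one argument per mount point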

Execution
---------

In order to execute the pipeline, you may run the following commands:

.. list-table::
  :widths: 10 80
  :header-rows: 1
  :align: left

  * - Case
    - Command line(s)
  * - local
    - .. code-block:: bash 

        source activate infer-sequencing-library || conda activate infer-sequencing-library
        snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" -r -p --configfile config.yaml -j 4 --use-conda
  * - torque
    - .. code-block:: bash 

        # This reserves the threads and memory that each job requires, but
        # the choice of queue might not be optimal. See the profile case below.
        source activate infer-sequencing-library || conda activate infer-sequencing-library
        snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "qsub -V -d ${INFER_SEQUENCING_LIBRARY_WORKDIR:?} -j oe -l nodes=1:ppn={threads},mem={resources.mem_mb}mb,walltime={resources.time_min}:00" --use-conda
  * - slurm
    - .. code-block:: bash 

        # This reserves the threads and memory that each job requires, but
        # the choice of queue might not be optimal. See the profile case below.
        source activate infer-sequencing-library || conda activate infer-sequencing-library
        snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" -r -p --configfile config.yaml -j 100 --cluster "sbatch --mem={resources.mem_mb} --time={resources.time_min} --cpus-per-task={threads}" --use-conda
  * - profile
    - .. code-block:: bash 

        # Requires a Snakemake slurm profile to be installed
        source activate infer-sequencing-library || conda activate infer-sequencing-library
        snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" --configfile config.yaml --profile slurm
  * - report
    - .. code-block:: bash 

        snakemake -s "${INFER_SEQUENCING_LIBRARY_DIR:?}/Snakefile" --configfile config.yaml --report