How to add your own pipeline

STRonGR welcomes all kinds of pipelines: it does not control them, it does not test them, and it provides no warranty on any pipeline.

In order to add your workflow to STRonGR, you will need to:

  1. Select your category (or create one)

  2. Build two yaml files

  3. Test documentation compilation

The third step will not work unless the previous ones are done correctly.

We assume that, if you are here, you have installed STRonGR successfully.

Create a git branch for your work

# Look at the current branch and state:
git status

# Create a new branch:
git checkout -b my_branch

# Check that you're in the expected branch:
git status

Select or build a pipeline category

Each pipeline belongs to a dedicated section. This makes it easier for future users to find pipelines. Please have a look at the categories in the left navigation bar: they mirror the STRonGR/workflows repository!

By adding new directories, you will create new sections within the documentation! This means that, if you cannot find a suitable (sub-)section for your pipeline because you are working on the Brandnew-Seq technology, you can still document it here.

If you’re looking for a set of *-Seq methods, then look at this catalog.

If your section already exists, then you’re good!

Example: I want to add a pipeline that performs RNA-Seq quantification with Kallisto. I see there is an RNA directory, which contains a Count section. Well, I’m good.

Example 2: I want to add a pipeline that performs single-cell RNA-Seq mapping, but I don’t see any Single Cell section. I can create it with:

# Create a new directory
mkdir --verbose --parents SingleCell/rna/mapping

If you are unsure whether to put your pipeline in one section or another, read the meta.yaml present in each subdirectory: it contains the description of each node of the documentation.

If you create a new section, feel free to create the associated meta.yaml to describe it to future users! Thanks in advance!

Finally, create a directory with the name of your pipeline. Please, no spaces in pipeline names. Use CamelCase or snake_case.
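For instance, creating a directory for the single-cell mapping pipeline from Example 2 could look like this (the path and pipeline name below are only illustrations, not an existing STRonGR pipeline):

```shell
# Create the pipeline's own directory inside its category.
# Path and name are placeholders; adapt them to your pipeline.
mkdir --verbose --parents SingleCell/rna/mapping/scrna_mapping_star
```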

Build your yaml files

Now that your pipeline has a dedicated directory, you need a pair of yaml files. Why two? Because one contains the documentation, and the other contains the conda environment!

Let’s build them.

environment.yaml

This is a classic Conda yaml file; see the conda documentation to build it.
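A minimal environment.yaml might look like the sketch below. The pipeline name, channels, and package versions are only placeholders; pick whatever your pipeline actually needs.

```yaml
# Illustrative environment file; adjust name, channels and dependencies.
name: rna-count-kallisto
channels:
  - conda-forge
  - bioconda
dependencies:
  - python>=3.8
  - kallisto
```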

Remember, STRonGR does not validate any environment.yaml! If it does not work, please contact the pipeline’s author or the person who added it to STRonGR.

meta.yaml

This yaml file is dedicated to STRonGR.

It contains the following keys:


name

Your pipeline name (this time, it may contain spaces and any other special characters!)

tags

A list of tags used by the quicksearch to find your pipeline.

short_description

A short description, usually in one line. It will be displayed in command line parsing and in the documentation.

long_description

A long description. Please add as much information as one may need to understand and use your pipeline. If it exists, provide a link to the pipeline’s own documentation.

authors

The people who developed/documented this pipeline within STRonGR. This is a list with one or multiple values.

input

A list of required input files. This list must be non-empty.

output

A list of remarkable output files/directories.

prerequisites

A list of prerequisites that are not covered by the conda environment.yaml. These can be software, annotations, …

note

Additional notes about the pipeline. These can be remarks about your experience with this pipeline.

install

These are installation directives. Please add sub-categories like conda/source/… to let users choose their installation method. You are free to name the sub-sections the way you want to.

prepare

Many pipelines provide a set of scripts like a config.sh. Please give some examples for local or cluster runs. You are free to name the sub-sections the way you want to.

execute

Command lines related to a pipeline’s execution. Just like install and prepare, this also accepts sub-categories, which you are free to name.

test

Command lines related to a pipeline’s testing (installation, unit testing, integration, …). Just like install, execute and prepare, this also accepts sub-categories, which you are free to name.
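Putting the keys above together, a meta.yaml could look like the following sketch. Every value is a hypothetical placeholder (the pipeline, scripts, and commands are invented for illustration); only the key names come from the list above.

```yaml
# Illustrative meta.yaml; all values are placeholders.
name: "RNA Count Kallisto"
tags:
  - rna
  - quantification
  - kallisto
short_description: "Quantify RNA-Seq reads with Kallisto"
long_description: >
  Pseudo-align RNA-Seq reads and estimate transcript abundances
  with Kallisto. See the pipeline's own documentation for details.
authors:
  - Jane Doe
input:
  - "Fastq files (single or paired-end)"
output:
  - "abundance.tsv"
prerequisites:
  - "A Kallisto index built from a transcriptome fasta"
note: "Tested on human and mouse data only."
install:
  conda: "conda env create -f environment.yaml"
prepare:
  local: "bash config.sh"
execute:
  local: "bash run.sh"
test:
  installation: "kallisto version"
```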

When one uses STRonGR, one gets a bash file to execute. If one used the argument --lucky, then STRonGR runs the script right away!

However, in the help pages, you might have seen environment variables like RNA_COUNT_KALLISTO_DIR. Do not worry: this is one of the environment variables set by STRonGR within the produced bash script.

The environment variables are built as follows: prefix + “_” + suffix. They are always both UPPER CASE and snake_case. No exception.

There are two possible prefixes:

  • STRONGR, which is used to point to STRonGR itself

  • PIPELINE_NAME, which is used to refer to the pipeline you use. Example: rna-count-kallisto has the following prefix: RNA_COUNT_KALLISTO. Everything is upper case, and dashes have been turned into underscores.

These are the possible suffixes:

  • DIR, which refers to the directory that contains the given prefix. Example: STRONGR_DIR is the installation path of STRonGR. RNA_COUNT_KALLISTO_DIR is the installation path of rna-count-kallisto.

  • PREPARE_DIR, which refers to the directory containing raw data. Example: RNA_COUNT_KALLISTO_PREPARE_DIR contains the raw data (fastq reads) for the pipeline rna-count-kallisto.
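The naming rule above can be sketched as a tiny Python helper (to_prefix is a hypothetical name used for illustration, not a function provided by STRonGR):

```python
def to_prefix(pipeline_name: str) -> str:
    """Turn a pipeline name into its environment-variable prefix:
    everything upper case, dashes replaced by underscores."""
    return pipeline_name.upper().replace("-", "_")

# The DIR and PREPARE_DIR suffixes are then appended with "_":
print(to_prefix("rna-count-kallisto") + "_DIR")          # RNA_COUNT_KALLISTO_DIR
print(to_prefix("rna-count-kallisto") + "_PREPARE_DIR")  # RNA_COUNT_KALLISTO_PREPARE_DIR
```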

Test your additions

STRonGR comes with a complete testing suite. Remember that we do not intend to test any pipeline’s functionality, only the pair of yaml files that were added.

To perform this validation, please add tests in the test_workflows.py file. This file is meant to be executed by pytest; if you’re not familiar with it, please check these Basic patterns and examples.

The script test_workflows.py provides a couple of functions to help you design your tests without any effort. There are other testing scripts; you should not have to modify them.

For a pipeline addition:

Use the function pipeline_checking to test it. This function takes only one argument: the relative path to the pipeline.

def test_my_pipeline() -> None:
    """
    Some optional information about my test
    """
    pipeline_checking("relative/path/to/pipeline")

For intermediary directories:

Use the function discipline_checking to test it: one function for each directory/subdirectory you have created. It takes only one argument: the relative path to the directory.

def test_my_intermediary_repository() -> None:
    """
    Some optional information about my test
    """
    discipline_checking("relative/path/to/directory")

Remember to keep the very last test (doc generation) as the last test in the file.

Finally, in order to run the tests, use:

conda activate STRonGR      # To add Pytest and requirements in PATH
cd /path/to/STRonGR/test/   # Go to the testing directory
make workflow-tests         # Run tests on your own additions (yaml files)

# Alternatively, you may run complete tests
# on either core of STRonGR, yaml files, documentation generation,
# and environment building.
make all-tests

Everything should be green [OK] before you commit and merge.

Finally

You can commit, push and share your work with the team using classic git commands.

Alternatively, you can preview your additions by re-building the documentation, as explained earlier in the tutorial:

conda activate STRonGR     # To add Sphinx and requirements in PATH
cd /path/to/STRonGR/docs   # Go to documentation dir
# Optional
make clean                 # Clean previous pages
make html                  # Build new documentation

A new release version will be deployed on demand.