CLI Reference: Preprocessing

The PrismToolBox CLI provides preprocessing capabilities for whole slide images through the ptb preprocessing command.

Overview

The preprocessing module includes two main commands:

  1. contouring: Extract tissue contours from whole slide images
  2. patching: Extract patches from slides using tissue contours

Installation

Make sure you have PrismToolBox installed:

# Basic installation
pip install prismtoolbox

Global Options

All preprocessing commands support these global options:

  • --verbose, -v: Increase verbosity (can be used multiple times: -v, -vv)
  • --help: Show help message

Commands

ptb preprocessing contouring

Extract tissue contours from whole slide images.

Usage

ptb preprocessing contouring [OPTIONS] SLIDE_DIRECTORY RESULTS_DIRECTORY

Arguments

  • SLIDE_DIRECTORY: Path to the directory containing the slide files
  • RESULTS_DIRECTORY: Path to the directory where the results will be saved

Options

Option                   Type        Description                                                Default
--engine                 str         Engine for reading slides (openslide, tiffslide)           openslide
--annotations-directory  str | None  Path to annotations directory                              None
--contours-exts          list[str]   File extensions for contour annotations (geojson, pickle)  [pickle]
--config-file            str | None  Path to configuration file                                 None
--visualize              bool        Visualize the extracted contours                           False

Configuration File

You can use a YAML configuration file to specify tissue extraction and visualization parameters:

# Default configuration for PrismToolBox contouring

# Tissue contour extraction parameters
contouring:
  seg_level: 2 # (int) Segmentation level for the tissue contour extraction.
  window_avg: 30 # (int) Size of the window average for tissue extraction.
  window_eng: 3 # (int) Size of the window to use for computing energy for tissue extraction.
  thresh: 120 # (int) Threshold for the tissue extraction algorithm.
  area_min: 6000 # (int) Minimum area for the tissue contour.

# Tissue visualization parameters
visualizing:
  vis_level: 2 # (int) Visualization level for the tissue contour extraction.
  number_contours: false # (bool) Plot the id number for each contour.
  line_thickness: 50 # (int) Line thickness for the contour visualization.
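
A complete file to pass to --config-file can also be generated from the shell. The following is a minimal sketch, not part of PrismToolBox: the helper name write_contouring_config is hypothetical, every value other than thresh is copied from the defaults above, and the override value 100 in the usage note is an arbitrary illustration rather than a recommended setting.

```shell
# Hypothetical helper: write a complete contouring config to the path in $1,
# using the threshold in $2 (falls back to the documented default of 120).
write_contouring_config() (
  out="$1"
  thresh="${2:-120}"
  cat > "$out" <<EOF
contouring:
  seg_level: 2
  window_avg: 30
  window_eng: 3
  thresh: ${thresh}
  area_min: 6000

visualizing:
  vis_level: 2
  number_contours: false
  line_thickness: 50
EOF
)
```

For instance, write_contouring_config custom_config.yaml 100 writes a file with every key present and only thresh changed.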

Examples

# Basic contour extraction
ptb preprocessing contouring slides/ results/

# With visualization
ptb preprocessing contouring slides/ results/ --visualize

# Using custom configuration
ptb preprocessing contouring slides/ results/ --config-file custom_config.yaml

# With annotations and multiple output formats
ptb preprocessing contouring slides/ results/ --annotations-directory annotations/ --contours-exts pickle geojson --visualize

ptb preprocessing patching

Extract patches from slides using tissue contours.

Usage

ptb preprocessing patching [OPTIONS] SLIDE_DIRECTORY RESULTS_DIRECTORY

Arguments

  • SLIDE_DIRECTORY: Path to the directory containing the slide files
  • RESULTS_DIRECTORY: Path to the directory where the results will be saved

Options

Option                Type        Description                                       Default
--contours-directory  str | None  Path to directory containing contour annotations  None
--engine              str         Engine for reading slides                         openslide
--mode                str         Extraction mode (contours, roi, all)              contours
--patch-exts          list[str]   File extensions for patches (h5, geojson)         [h5]
--config-file         str | None  Path to configuration file                        None

Configuration File

Example configuration for patch extraction:

# Default configuration for PrismToolBox patching

# Patch extraction parameters
patching:
  patch_level: 0 # (int) Level of the slide to extract patches from.
  patch_size: 256 # (float) Size of the patches to extract.
  overlap: 0 # (float) Overlap between the patches.
  units: ["px", "px"] # (str, str) Units for the patch size and overlap. Options are 'px' for pixels or 'micro' for micrometers.
  contours_mode: "four_pt" # (str) The mode to use for the contour checking. Possible values are center, four_pt, and four_pt_hard.
  rgb_threshs: [2, 240] # (int, int) The thresholds for the RGB channels (black threshold, white threshold).
  percentages: [0.6, 0.9] # (float, float) The percentages of pixels below/above the thresholds to consider the patch as black/white.

# Patch stitching parameters
stitching:
  vis_level: 2 # (int) Level of the slide to stitch the patches at.
  draw_grid: false # (bool) Whether to draw a grid on the stitched image.

Examples

# Basic patch extraction
ptb preprocessing patching slides/ results/ --contours-directory results/contours/

# With custom configuration
ptb preprocessing patching slides/ results/ --contours-directory results/contours/ --config-file patch_config.yaml

# Extract patches in multiple formats
ptb preprocessing patching slides/ results/ --contours-directory results/contours/ --patch-exts h5 geojson

Complete Workflow Example

Here's a complete example of processing a dataset:

# Step 1: Extract tissue contours with visualization
ptb preprocessing contouring slides/ results/ --visualize --config-file tissue_config.yaml

# Step 2: Extract patches from the contours
ptb preprocessing patching slides/ results/ --contours-directory results/contours/ --config-file patch_config.yaml --patch-exts geojson

# Results will be saved in:
# - results/contours/        (tissue contours)
# - results/contoured_images/ (visualizations)
# - results/patches_256_ovelap_0/       (extracted patches)
# - results/stitched_images_256_ovelap_0/ (patch visualizations)
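
The two steps can be wrapped in a small shell function that stops as soon as one of them fails. This is a sketch, not part of PrismToolBox: run_preprocessing and the PTB override variable are hypothetical names, and the contours location (RESULTS_DIRECTORY/contours/) is taken from the results listing above.

```shell
# Hypothetical wrapper around the two documented commands.
# $1 = slide directory, $2 = results directory.
run_preprocessing() (
  slides="$1"
  results="${2%/}"        # drop a trailing slash so paths join cleanly
  ptb_cmd="${PTB:-ptb}"   # hypothetical override, e.g. PTB=echo for a dry run

  # Step 1: extract tissue contours; abort if the command fails.
  $ptb_cmd preprocessing contouring "$slides" "$results/" --visualize || exit 1

  # Step 2: extract patches from the contours written by step 1.
  $ptb_cmd preprocessing patching "$slides" "$results/" \
    --contours-directory "$results/contours/" || exit 1
)
```

Running PTB=echo run_preprocessing slides/ results/ prints the two commands instead of executing them, which is a cheap way to check the arguments before a long run.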

Error Handling

Common issues and solutions:

Missing Dependencies

Error: Segmentation features require additional dependencies.
Please install with: pip install prismtoolbox[seg]

Solution: Install the required dependencies:

pip install "prismtoolbox[seg,emb]"

Configuration File Issues

Warning: Incomplete tissue extraction parameters in config file

Solution: Ensure your configuration file contains all required parameters for each section.
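
One way to catch an incomplete file before running the command is to check that each expected key is present. The sketch below is a text-based heuristic using grep, not a YAML parser, and check_contouring_keys is a hypothetical helper; the key names are those of the contouring section shown in this reference.

```shell
# Hypothetical helper: report contouring keys that do not appear in the file $1.
# Exits with status 1 if any key is missing, 0 otherwise.
check_contouring_keys() (
  cfg="$1"
  status=0
  for key in seg_level window_avg window_eng thresh area_min; do
    # Look for an indented "key:" line anywhere in the file.
    if ! grep -Eq "^[[:space:]]+${key}:" "$cfg"; then
      echo "missing contouring parameter: ${key}" >&2
      status=1
    fi
  done
  exit "$status"
)
```

Because it only matches text, the check does not verify that a key sits under the right section or has a valid value.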

File Path Issues

Error: No valid config file found. Using default parameters.

Solution: Check that the path passed to --config-file is correct and that the file exists; otherwise the command runs with default parameters.
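
A typo in the path is easy to miss when the run still completes. A minimal guard, assuming a POSIX shell (require_config is a hypothetical helper name, not a PrismToolBox feature):

```shell
# Hypothetical helper: fail early if the config file $1 does not exist.
require_config() {
  if [ ! -f "$1" ]; then
    echo "config file not found: $1" >&2
    return 1
  fi
}
```

Chaining it with && keeps the command from starting when the file is missing, for example: require_config custom_config.yaml && ptb preprocessing contouring slides/ results/ --config-file custom_config.yaml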

Tips and Best Practices

  1. Start with visualization: Use the --visualize flag to check that tissue detection works correctly
  2. Test with small datasets: Process a few slides first to validate your parameters
  3. Use configuration files: Store your parameters in YAML files for reproducibility
  4. Monitor output: Use verbose mode (-v or -vv) to see detailed processing information
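
For tip 2, a small test set can be built without copying the slides. The sketch below symlinks the first N files of a slide directory into a new directory. make_subset is a hypothetical helper, and it assumes the slide reader follows symbolic links (replace ln -sf with cp if it does not).

```shell
# Hypothetical helper: link the first $3 regular files of $1 into $2.
make_subset() (
  src=$(cd "$1" && pwd) || exit 1   # absolute path, so the links stay valid
  dst="$2"
  n="$3"
  mkdir -p "$dst"
  count=0
  for slide in "$src"/*; do
    [ -f "$slide" ] || continue       # skip subdirectories and empty globs
    [ "$count" -lt "$n" ] || break    # stop once N slides are linked
    ln -sf "$slide" "$dst/$(basename "$slide")"
    count=$((count + 1))
  done
)
```

For example, make_subset slides/ slides_subset/ 3 followed by ptb preprocessing contouring slides_subset/ results_subset/ --visualize checks the parameters on three slides before the full dataset is processed.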