CLI Reference: Preprocessing
The PrismToolBox CLI provides preprocessing capabilities for whole slide images through the ptb preprocessing command.
Overview
The preprocessing module includes two main commands:
- contour: Extract tissue contours from whole slide images
- patchify: Extract patches from slides using tissue contours
Installation
Make sure you have PrismToolBox installed:
Global Options
All preprocessing commands support these global options:
- --verbose, -v: Increase verbosity (can be used multiple times: -v, -vv)
- --help: Show help message
Commands
ptb preprocessing contour
Extract tissue contours from whole slide images.
Usage
Arguments
- SLIDE_DIRECTORY: Path to the directory containing the slide files
- RESULTS_DIRECTORY: Path to the directory where the results will be saved
Options
| Option | Type | Description | Default |
|---|---|---|---|
| --engine | str | Engine for reading slides (openslide, tiffslide) | openslide |
| --annotations-directory | str \| None | Path to annotations directory | None |
| --contours-exts | list[str] | File extensions for contour annotations (geojson, pickle) | [pickle] |
| --config-file | str | Path to configuration file | None |
| --visualize | bool | Visualize the extracted contours | False |
Configuration File
You can use a YAML configuration file to specify tissue extraction and visualization parameters:
# Default configuration for PrismToolBox contouring
# Tissue contour extraction parameters
contour_settings:
  seg_level: 4 # (int) Segmentation level for the tissue contour extraction.
  window_avg: 30 # (int) Size of the window average for tissue extraction.
  window_eng: 5 # (int) Size of the window to use for computing energy for tissue extraction.
  thresh: 190 # (int) Threshold for the tissue extraction algorithm.
  area_min: 50000 # (int) Minimum area for the tissue contour.
# Tissue visualization parameters
visualization_settings:
  vis_level: 4 # (int) Visualization level for the tissue contour extraction.
  number_contours: false # (bool) Plot the id number for each contour.
  line_thickness: 50 # (int) Line thickness for the contour visualization.
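When several parameter variants need to be compared, it can help to generate the configuration file from a script rather than editing it by hand. The following is a minimal sketch, not part of PrismToolBox: the helper names render_yaml and write_config are hypothetical, and the tiny YAML renderer only handles the two-level layout of scalars shown above. The default values are copied from that configuration.

```python
from pathlib import Path

# Values copied from the default contouring configuration shown above.
CONTOUR_DEFAULTS = {
    "contour_settings": {
        "seg_level": 4,
        "window_avg": 30,
        "window_eng": 5,
        "thresh": 190,
        "area_min": 50000,
    },
    "visualization_settings": {
        "vis_level": 4,
        "number_contours": False,
        "line_thickness": 50,
    },
}


def render_yaml(config):
    """Render a two-level dict of scalars as YAML text (hypothetical helper)."""
    lines = []
    for section, settings in config.items():
        lines.append(f"{section}:")
        for key, value in settings.items():
            # YAML spells booleans in lowercase.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            lines.append(f"  {key}: {text}")
    return "\n".join(lines) + "\n"


def write_config(path, overrides=None):
    """Write the defaults, with optional per-section overrides, to `path`."""
    config = {name: dict(values) for name, values in CONTOUR_DEFAULTS.items()}
    for section, values in (overrides or {}).items():
        config.setdefault(section, {}).update(values)
    text = render_yaml(config)
    Path(path).write_text(text)
    return text
```

For example, write_config("custom_config.yaml", {"contour_settings": {"thresh": 200}}) produces a file that can be passed to --config-file.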
Examples
# Basic contour extraction
ptb preprocessing contour slides/ results/
# With visualization
ptb preprocessing contour slides/ results/ --visualize
# Using custom configuration
ptb preprocessing contour slides/ results/ --config-file custom_config.yaml
# With annotations and multiple output formats
ptb preprocessing contour slides/ results/ --annotations-directory annotations/ --contours-exts pickle geojson --visualize
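For scripted runs, the same invocations can be assembled in Python. This is a sketch under the assumption that the command is launched from a script; build_contour_command is a hypothetical helper, and it only uses the options listed in the table above.

```python
import shlex


def build_contour_command(slide_dir, results_dir, *, annotations_dir=None,
                          contours_exts=None, config_file=None, visualize=False):
    """Assemble the argument list for `ptb preprocessing contour` (hypothetical helper)."""
    cmd = ["ptb", "preprocessing", "contour", slide_dir, results_dir]
    if annotations_dir is not None:
        cmd += ["--annotations-directory", annotations_dir]
    if contours_exts:
        # The option accepts several extensions, as in the last example above.
        cmd += ["--contours-exts", *contours_exts]
    if config_file is not None:
        cmd += ["--config-file", config_file]
    if visualize:
        cmd.append("--visualize")
    return cmd


cmd = build_contour_command(
    "slides/", "results/",
    annotations_dir="annotations/",
    contours_exts=["pickle", "geojson"],
    visualize=True,
)
# Print the command for inspection before running it.
print(shlex.join(cmd))
```

Once the printed command looks right, the list can be handed to subprocess.run(cmd, check=True) from the standard library.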
ptb preprocessing patchify
Extract patches from slides using tissue contours.
Usage
Arguments
- SLIDE_DIRECTORY: Path to the directory containing the slide files
- RESULTS_DIRECTORY: Path to the directory where the results will be saved
Options
| Option | Type | Description | Default |
|---|---|---|---|
| --roi-csv | str \| None | Path to the csv file containing the ROIs | None |
| --contours-directory | str \| None | Path to directory containing contour annotations | None |
| --engine | str | Engine for reading slides | openslide |
| --mode | str | Extraction mode (contours, roi, all) | contours |
| --patch-exts | list[str] | File extensions for patches (h5, geojson) | [h5] |
| --config-file | str \| None | Path to configuration file | None |
Configuration File
Example configuration for patch extraction:
# Default configuration for PrismToolBox patching
# Patch extraction parameters
patch_settings:
  patch_level: 0 # (int) Level of the slide to extract patches from.
  patch_size: 256 # (float) Size of the patches to extract.
  overlap: 0 # (float) Overlap between the patches.
  units: ["px", "px"] # (str, str) Units for the patch size and overlap. Options are 'px' for pixels or 'micro' for micrometers.
  contours_mode: "four_pt" # (str) The mode to use for the contour checking. Possible values are center, four_pt, and four_pt_hard.
  rgb_threshs: [2, 240] # (int, int) The thresholds for the RGB channels (black threshold, white threshold).
  percentages: [0.6, 0.9] # (float, float) The percentages of pixels below/above the thresholds to consider the patch as black/white.
# Patch stitching parameters
stitching_settings:
  vis_level: 4 # (int) Level of the slide to stitch the patches at.
  draw_grid: false # (bool) Whether to draw a grid on the stitched image.
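To build intuition for patch_size, overlap, rgb_threshs and percentages, the sketch below shows one plausible reading of these settings. It is not PrismToolBox's implementation. It assumes that the stride between patches is patch_size minus overlap (both in pixels), that a pixel counts as black or white only when all three channels are below or above the corresponding threshold, and that a patch is discarded once the fraction of such pixels reaches the configured percentage. The function names are hypothetical.

```python
def grid_origins(width, height, patch_size=256, overlap=0):
    """Top-left corners of the patches that fit entirely in a width x height region.

    Assumes integer pixel units and a stride of patch_size - overlap.
    """
    stride = patch_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    xs = range(0, width - patch_size + 1, stride)
    ys = range(0, height - patch_size + 1, stride)
    return [(x, y) for y in ys for x in xs]


def classify_patch(pixels, rgb_threshs=(2, 240), percentages=(0.6, 0.9)):
    """Label a patch 'black', 'white' or 'keep'.

    `pixels` is a non-empty list of (r, g, b) tuples with values in 0-255.
    """
    black_thresh, white_thresh = rgb_threshs
    black_pct, white_pct = percentages
    total = len(pixels)
    # Assumption: a pixel is black/white only if every channel passes the threshold.
    n_black = sum(1 for p in pixels if all(c < black_thresh for c in p))
    n_white = sum(1 for p in pixels if all(c > white_thresh for c in p))
    if n_black / total >= black_pct:
        return "black"
    if n_white / total >= white_pct:
        return "white"
    return "keep"
```

Under these assumptions, increasing overlap shrinks the stride and so increases the number of patches, while raising either entry of percentages makes the filter more permissive, so fewer patches are discarded as black or white.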
Examples
# Basic patch extraction
ptb preprocessing patchify slides/ results/
# Patch extraction within a ROI
ptb preprocessing patchify slides/ results/ --mode roi --roi-csv results/rois.csv
# Within previously extracted tissue contours and custom configuration
ptb preprocessing patchify slides/ results/ --mode contours --contours-directory results/contours/ --config-file patch_config.yaml
# Extract patches in multiple formats
ptb preprocessing patchify slides/ results/ --contours-directory results/contours/ --patch-exts h5 geojson
Attention: The roi mode requires a CSV table of ROIs in which each row corresponds to a slide and contains the slide ID and the coordinates of the ROI.
# Extract patches from a specific ROI
ptb preprocessing patchify slides/ results/ --mode roi --roi-csv results/rois.csv
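A ROI table of this kind can be written with Python's csv module. The column names below are hypothetical: the description above only states that each row holds a slide ID and the ROI coordinates, so check the header that PrismToolBox expects before using the file.

```python
import csv

# Hypothetical column names; the expected header is not specified above.
FIELDNAMES = ["slide_id", "x_min", "y_min", "x_max", "y_max"]


def write_roi_csv(path, rois):
    """Write one ROI row per slide.

    `rois` maps a slide ID to an (x_min, y_min, x_max, y_max) tuple.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDNAMES)
        for slide_id, (x_min, y_min, x_max, y_max) in sorted(rois.items()):
            # Reject boxes with no area before they reach the CLI.
            if x_min >= x_max or y_min >= y_max:
                raise ValueError(f"empty ROI for slide {slide_id}")
            writer.writerow([slide_id, x_min, y_min, x_max, y_max])
```

A call such as write_roi_csv("results/rois.csv", {"slide_001": (0, 0, 20000, 15000)}) writes one header row and one row per slide; the slide ID here is a placeholder.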
Complete Workflow Example
Here's a complete example of processing a dataset:
# Step 1: Extract tissue contours with visualization
ptb preprocessing contour slides/ results/ --visualize --config-file tissue_config.yaml
# Step 2: Extract patches from the contours
ptb preprocessing patchify slides/ results/ --contours-directory results/contours/ --config-file patch_config.yaml --patch-exts geojson
Results will be saved in:
- results/contours/ (tissue contours as pickle files)
- results/contoured_images/ (visualizations)
- results/patches_256_ovelap_0/ (extracted patches as geojson coordinates)
- results/stitched_images_256_ovelap_0/ (patch visualizations)
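After a run, a quick check that the expected directories exist can catch a failed step early. The sketch below is an illustration with hypothetical helper names. It assumes the numbers in the directory names follow the patch_size and overlap settings, and it keeps the directory spelling exactly as listed above.

```python
from pathlib import Path


def expected_outputs(results_dir, patch_size=256, overlap=0):
    """Directories the workflow above is expected to create (hypothetical helper)."""
    root = Path(results_dir)
    # Spelling kept exactly as it appears in the results listing.
    suffix = f"{patch_size}_ovelap_{overlap}"
    return [
        root / "contours",
        root / "contoured_images",
        root / f"patches_{suffix}",
        root / f"stitched_images_{suffix}",
    ]


def missing_outputs(results_dir, patch_size=256, overlap=0):
    """Return the expected directories that do not exist yet."""
    return [p for p in expected_outputs(results_dir, patch_size, overlap)
            if not p.is_dir()]
```

An empty list from missing_outputs("results/") suggests both steps wrote their outputs; any remaining entries point to the step to re-run.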
Tips and Best Practices
- Start with small datasets: Process a few slides first to validate your parameters
- Use visualizations: Use the --visualize flag to check that tissue detection works correctly, and --stitch to visualize the selected patches.
- Monitor output: Use verbose mode (-v or -vv) to see detailed processing information
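One way to follow the first tip is to link a small random sample of slides into a separate directory and run the commands on that directory first. This is a minimal sketch: sample_slides is a hypothetical helper, the list of slide extensions is an assumption to adjust to your data, and symbolic links are used to avoid copying large files (they may require extra privileges on some platforms).

```python
import random
from pathlib import Path

# Assumed slide extensions; adjust to the formats in your dataset.
SLIDE_EXTENSIONS = (".svs", ".tif", ".tiff", ".ndpi")


def sample_slides(slide_dir, sample_dir, n=5, extensions=SLIDE_EXTENSIONS, seed=0):
    """Symlink up to `n` randomly chosen slides from slide_dir into sample_dir."""
    slides = sorted(p for p in Path(slide_dir).iterdir()
                    if p.suffix.lower() in extensions)
    # A fixed seed keeps the sample reproducible between runs.
    chosen = random.Random(seed).sample(slides, min(n, len(slides)))
    target = Path(sample_dir)
    target.mkdir(parents=True, exist_ok=True)
    for slide in chosen:
        link = target / slide.name
        if not link.exists():
            link.symlink_to(slide.resolve())
    return sorted(slide.name for slide in chosen)
```

The sample directory can then stand in for SLIDE_DIRECTORY, for instance ptb preprocessing contour sample_slides/ results/ --visualize.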