Unified Processing Pipeline

IGN LiDAR HD v2.0+ introduces a unified processing pipeline that combines RAW LAZ preprocessing and patch extraction into a single, optimized workflow.

Overview

Old Pipeline (v1.x)

RAW LAZ → Enriched LAZ → Patches
  ↓          ↓            ↓
 Disk      Disk         Disk

Issues:

Multiple disk I/O operations
Intermediate files require storage
Slower overall processing
Manual two-step workflow

New Pipeline (v2.0+)

RAW LAZ → [In-Memory Processing] → Patches
  ↓                                  ↓
 Disk                              Disk

Benefits:

Single-step workflow
In-memory processing
35-50% space savings (no intermediate files)
2-3x faster processing
Automatic optimization

Key Features

1. Single-Step Processing

Process RAW LAZ files directly to patches in one command:

# Using Hydra CLI
python -m ign_lidar.cli.hydra_app \
  input_dir=/path/to/raw_laz \
  output_dir=/path/to/patches

# Using legacy CLI
ign-lidar-hd enrich /path/to/raw_laz /path/to/patches

The pipeline automatically:

Downloads missing files (if auto-download enabled)
Preprocesses RAW LAZ (enrichment)
Extracts patches
Computes features
Saves final output
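
The stages above can be pictured as plain functions that hand data to one another in memory. The following is a minimal sketch under stated assumptions, not the library's implementation: `enrich`, `extract_patches`, `compute_features`, and `run_pipeline` are hypothetical names, points are modelled as simple `(x, y, z)` tuples, and the download step is left out.

```python
# Hypothetical sketch of single-step, in-memory processing.
# None of these functions are part of the ign_lidar API.

def enrich(points):
    """Stand-in for preprocessing: wrap each point with an empty attribute dict."""
    return [{"xyz": p, "attrs": {}} for p in points]

def extract_patches(enriched, patch_size):
    """Group points into square patches on a regular XY grid."""
    patches = {}
    for pt in enriched:
        x, y, _ = pt["xyz"]
        key = (int(x // patch_size), int(y // patch_size))
        patches.setdefault(key, []).append(pt)
    return patches

def compute_features(patch):
    """Stand-in feature computation: mean height and point count of a patch."""
    heights = [pt["xyz"][2] for pt in patch]
    return {"mean_z": sum(heights) / len(heights), "n_points": len(patch)}

def run_pipeline(points, patch_size=50.0):
    """Chain the stages without writing anything to disk in between."""
    enriched = enrich(points)
    patches = extract_patches(enriched, patch_size)
    return {key: compute_features(patch) for key, patch in patches.items()}

raw_points = [(10.0, 10.0, 1.0), (20.0, 30.0, 3.0), (60.0, 10.0, 5.0)]
result = run_pipeline(raw_points)
print(result)
```

Only the final `result` would need to be saved; the enriched points exist solely as an in-memory intermediate.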

2. In-Memory Processing

No intermediate files are written to disk unless explicitly configured:

# config.yaml
output:
  save_enriched_laz: false # Skip intermediate LAZ (default)
  only_enriched_laz: false # Set to true to save only enriched LAZ and skip patches (optional)

Memory Management:

Processes one tile at a time
Configurable batch sizes for patches
Automatic cleanup
GPU memory pooling (if GPU enabled)

3. Space Savings

Before (v1.x):

RAW LAZ:      1.0 GB
Enriched LAZ: 1.5 GB  ← Intermediate file
Patches:      0.5 GB
─────────────────────
Total Disk:   3.0 GB

After (v2.0+):

RAW LAZ:      1.0 GB
Patches:      0.5 GB
─────────────────────
Total Disk:   1.5 GB (50% savings!)
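
The arithmetic behind these totals can be checked with a few lines of Python. The sizes are the illustrative figures from the two listings above, not measurements, and `total_disk_gb` is a helper name introduced here for the example.

```python
def total_disk_gb(sizes):
    """Sum the on-disk size of every artifact, in GB."""
    return sum(sizes.values())

# Illustrative sizes taken from the listings above.
v1 = {"raw_laz": 1.0, "enriched_laz": 1.5, "patches": 0.5}
v2 = {"raw_laz": 1.0, "patches": 0.5}

# Fraction of disk space saved by not writing the intermediate file.
saved_fraction = 1 - total_disk_gb(v2) / total_disk_gb(v1)
print(f"v1.x: {total_disk_gb(v1)} GB, v2.0+: {total_disk_gb(v2)} GB, saved: {saved_fraction:.0%}")
```

The saving depends on how large the enriched file is relative to the raw input and the patches, so a smaller intermediate file yields a smaller percentage.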

4. Speed Improvements

Processing Time Comparison:

Dataset Size   v1.x Pipeline   v2.0+ Unified   Speedup
1 tile         45 s            20 s            2.25x
10 tiles       8.5 min         3.5 min         2.4x
100 tiles      92 min          38 min          2.4x

Performance note: the speedup comes from:

Eliminating disk I/O for intermediate files
In-memory data passing
Optimized memory layout
GPU acceleration (if available)
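
The speedup column in the comparison table is simply the v1.x time divided by the v2.0+ time. A short check, using only the timings quoted in the table (converted to seconds):

```python
# (v1.x seconds, v2.0+ seconds) per dataset size, from the comparison table.
timings = {
    "1 tile": (45, 20),
    "10 tiles": (8.5 * 60, 3.5 * 60),
    "100 tiles": (92 * 60, 38 * 60),
}

# Speedup = old time / new time.
speedups = {name: old / new for name, (old, new) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```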

Configuration

Basic Configuration

# Unified pipeline (default)
output:
  save_enriched_laz: false # Don't save intermediate LAZ

Save Intermediate LAZ (Optional)

If you need enriched LAZ files:

output:
  save_enriched_laz: true # Save both enriched LAZ and patches
  enriched_laz_dir: "/path/to/enriched"

Enriched LAZ Only Mode

When you only need enriched LAZ, skip patch extraction for 3-5x faster processing:

output:
  only_enriched_laz: true # Skip patch extraction
  save_enriched_laz: true

See Enriched LAZ Only Mode for details.
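
The three configurations above can be summarized as a small decision function. This is a sketch of how the two flags combine as described in this section, not the library's own logic; `planned_outputs` is a hypothetical helper, and rejecting `only_enriched_laz` without `save_enriched_laz` is an assumption made here because the documentation always shows the two set together.

```python
def planned_outputs(save_enriched_laz=False, only_enriched_laz=False):
    """Return the set of artifacts a run would write for the given flags."""
    if only_enriched_laz:
        if not save_enriched_laz:
            # Assumption: this section only shows the two flags enabled together.
            raise ValueError("only_enriched_laz=true expects save_enriched_laz=true")
        return {"enriched_laz"}
    if save_enriched_laz:
        return {"enriched_laz", "patches"}
    return {"patches"}

print(planned_outputs())                        # default unified pipeline
print(planned_outputs(save_enriched_laz=True))  # keep the intermediate LAZ too
print(planned_outputs(save_enriched_laz=True, only_enriched_laz=True))
```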

Usage Examples

Example 1: Default Unified Pipeline

Process RAW LAZ to patches directly:

python -m ign_lidar.cli.hydra_app \
  input_dir=/data/raw_laz \
  output_dir=/data/output \
  preprocessing.num_neighbors=50

Output:

/data/output/
├── tile_001.laz          # Processed patches
├── tile_002.laz
└── tile_003.laz

Example 2: With Intermediate LAZ

Save both enriched LAZ and patches:

python -m ign_lidar.cli.hydra_app \
  input_dir=/data/raw_laz \
  output_dir=/data/output \
  output.save_enriched_laz=true \
  output.enriched_laz_dir=/data/enriched

Output:

/data/enriched/
├── tile_001.laz          # Enriched LAZ files
├── tile_002.laz
└── tile_003.laz

/data/output/
├── tile_001.laz          # Processed patches
├── tile_002.laz
└── tile_003.laz

Example 3: With Tile Stitching

Combine multiple tiles in-memory before patch extraction:

python -m ign_lidar.cli.hydra_app \
  input_dir=/data/raw_laz \
  output_dir=/data/output \
  stitching.enabled=true \
  stitching.pattern=3x3

Workflow:

RAW Tiles → [Stitch In-Memory] → [Extract Patches] → Output
tile_001.laz ─┐
tile_002.laz ─┼→ Combined Point Cloud → Patches → stitched.laz
tile_003.laz ─┘
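
To make the workflow above concrete, here is a minimal pure-Python sketch of stitching followed by patch extraction. It is an illustration under assumptions, not the library's algorithm: `stitch` and `extract_patches` are hypothetical names, points are `(x, y, z)` tuples, and patches are cut on a simple regular grid without overlap.

```python
def stitch(tiles):
    """Concatenate the points of all tiles, remembering each point's source tile."""
    combined = []
    for name, points in tiles.items():
        combined.extend((x, y, z, name) for x, y, z in points)
    return combined

def extract_patches(points, patch_size):
    """Assign every point of the combined cloud to a square grid cell."""
    patches = {}
    for x, y, z, name in points:
        key = (int(x // patch_size), int(y // patch_size))
        patches.setdefault(key, []).append((x, y, z, name))
    return patches

# Two neighbouring tiles with a handful of hypothetical points.
tiles = {
    "tile_001": [(10.0, 10.0, 2.0), (90.0, 10.0, 1.0)],
    "tile_002": [(110.0, 10.0, 3.0)],
}

patches = extract_patches(stitch(tiles), patch_size=80.0)

# Which tiles contributed points to each patch?
sources = {key: sorted({p[3] for p in pts}) for key, pts in patches.items()}
print(sources)
```

In this sketch the patch at grid cell `(1, 0)` receives points from both tiles, which would not be possible if each tile were processed in isolation.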

Example 4: GPU Accelerated

Enable GPU for faster feature computation:

python -m ign_lidar.cli.hydra_app \
  input_dir=/data/raw_laz \
  output_dir=/data/output \
  gpu.enabled=true \
  gpu.use_cuml=true

Performance Optimization

1. Memory Management

Control memory usage:

preprocessing:
  batch_size: 50000 # Points per batch
  num_workers: 4 # CPU threads

gpu:
  chunk_size: 100000 # GPU chunk size

Guidelines:

Larger batches = faster, more memory
Smaller batches = slower, less memory
Monitor with: nvidia-smi (GPU) or htop (CPU)
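
The trade-off in these guidelines can be seen by counting how many batches a tile splits into. The sketch below is illustrative only: `iter_batches` is a hypothetical helper, and the tile size of 1,200,000 points is an assumed figure chosen for the example.

```python
def iter_batches(n_points, batch_size):
    """Yield (start, stop) index ranges that cover n_points in batch_size chunks."""
    for start in range(0, n_points, batch_size):
        yield start, min(start + batch_size, n_points)

n_points = 1_200_000  # assumed number of points in one tile

# Fewer, larger batches mean fewer iterations but more points held in memory at once.
batch_counts = {
    batch_size: len(list(iter_batches(n_points, batch_size)))
    for batch_size in (50_000, 200_000)
}

for batch_size, count in batch_counts.items():
    print(f"batch_size={batch_size}: {count} batches")
```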

2. Disk I/O Optimization

Use fast storage:

SSD preferred over HDD
Local storage faster than network
Temporary files on fast partition

Example:

output:
  temp_dir: "/tmp" # Fast local storage

3. Parallel Processing

Process multiple tiles concurrently:

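from concurrent.futures import ProcessPoolExecutor

from ign_lidar.core.processor import LiDARProcessor

# `config` is your loaded pipeline configuration (not shown here).
processor = LiDARProcessor(config)

def process_tile(tile_path):
    # Runs in a worker process; one call handles one tile.
    return processor.process_tile(tile_path)

if __name__ == "__main__":
    # The guard keeps worker processes from re-launching the pool on import.
    # `tile_paths` is your list of RAW LAZ file paths.
    with ProcessPoolExecutor(max_workers=4) as executor:
        # list() waits for every tile and re-raises any worker exception here.
        results = list(executor.map(process_tile, tile_paths))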

Migration from v1.x

Old Two-Step Workflow

# v1.x - Step 1: Enrich
ign-lidar-hd enrich /data/raw /data/enriched

# v1.x - Step 2: Extract patches
ign-lidar-hd extract /data/enriched /data/patches

New Unified Workflow

# v2.0+ - Single step
python -m ign_lidar.cli.hydra_app \
  input_dir=/data/raw \
  output_dir=/data/patches

Configuration Migration

Old (v1.x):

enrichment:
  num_neighbors: 50
  features: ["planarity", "linearity"]

extraction:
  patch_size: 50.0
  overlap: 10.0

New (v2.0+):

preprocessing:
  num_neighbors: 50

features:
  enabled_features: ["planarity", "linearity"]

processing:
  patch_size: 50.0
  patch_overlap: 10.0
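
The key renames between the two layouts can be applied mechanically. The sketch below works on plain dictionaries and uses only the keys shown in the old and new examples above; `KEY_MAP` and `migrate_config` are hypothetical names, and any v1.x keys not listed here would need to be handled separately.

```python
# (old section, old key) -> (new section, new key), taken from the examples above.
KEY_MAP = {
    ("enrichment", "num_neighbors"): ("preprocessing", "num_neighbors"),
    ("enrichment", "features"): ("features", "enabled_features"),
    ("extraction", "patch_size"): ("processing", "patch_size"),
    ("extraction", "overlap"): ("processing", "patch_overlap"),
}

def migrate_config(old):
    """Copy known v1.x settings into the v2.0+ section layout."""
    new = {}
    for (old_section, old_key), (new_section, new_key) in KEY_MAP.items():
        if old_key in old.get(old_section, {}):
            new.setdefault(new_section, {})[new_key] = old[old_section][old_key]
    return new

old_config = {
    "enrichment": {"num_neighbors": 50, "features": ["planarity", "linearity"]},
    "extraction": {"patch_size": 50.0, "overlap": 10.0},
}
new_config = migrate_config(old_config)
print(new_config)
```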

API Usage