Unified Processing Pipeline
IGN LiDAR HD v2.0+ introduces a unified processing pipeline that combines RAW LAZ preprocessing and patch extraction into a single, optimized workflow.
Overview
Old Pipeline (v1.x)
RAW LAZ → Enriched LAZ → Patches
   ↓            ↓           ↓
  Disk         Disk        Disk
Issues:
- Multiple disk I/O operations
- Intermediate files require storage
- Slower overall processing
- Manual two-step workflow
New Pipeline (v2.0+)
RAW LAZ → [In-Memory Processing] → Patches
   ↓                                  ↓
  Disk                               Disk
Benefits:
- Single-step workflow
- In-memory processing
- 35-50% space savings (no intermediate files)
- 2-3x faster processing
- Automatic optimization
Key Features
1. Single-Step Processing
Process RAW LAZ files directly to patches in one command:
# Using Hydra CLI
python -m ign_lidar.cli.hydra_app \
input_dir=/path/to/raw_laz \
output_dir=/path/to/patches
# Using legacy CLI
ign-lidar-hd enrich /path/to/raw_laz /path/to/patches
The pipeline automatically:
- Downloads missing files (if auto-download enabled)
- Preprocesses RAW LAZ (enrichment)
- Extracts patches
- Computes features
- Saves final output
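The same single-step flow is available from Python. A minimal sketch using the LiDARProcessor API documented under API Usage below (paths are placeholders):
from omegaconf import OmegaConf
from ign_lidar.core.processor import LiDARProcessor
# One call takes a RAW tile all the way to patches, in memory
config = OmegaConf.load("config.yaml")
processor = LiDARProcessor(config)
result = processor.process_tile(
    input_path="raw_tile.laz",
    output_path="patches.laz"
)
print(f"Extracted {result.num_patches} patches")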
2. In-Memory Processing
No intermediate files written to disk unless explicitly configured:
# config.yaml
output:
  save_enriched_laz: false # Skip intermediate LAZ (default)
  only_enriched_laz: false # Only save enriched LAZ (optional)
Memory Management:
- Processes one tile at a time
- Configurable batch sizes for patches
- Automatic cleanup
- GPU memory pooling (if GPU enabled)
3. Space Savings
Before (v1.x):
RAW LAZ:       1.0 GB
Enriched LAZ:  1.5 GB ← Intermediate file
Patches:       0.5 GB
─────────────────────
Total Disk:    3.0 GB
After (v2.0+):
RAW LAZ:       1.0 GB
Patches:       0.5 GB
─────────────────────
Total Disk:    1.5 GB (50% savings!)
4. Speed Improvements
Processing Time Comparison:
| Dataset Size | v1.x Pipeline | v2.0+ Unified | Speedup |
|--------------|---------------|---------------|---------|
| 1 tile       | 45 s          | 20 s          | 2.25x   |
| 10 tiles     | 8.5 min       | 3.5 min       | 2.4x    |
| 100 tiles    | 92 min        | 38 min        | 2.4x    |
The speedup comes from:
- Eliminating disk I/O for intermediate files
- In-memory data passing
- Optimized memory layout
- GPU acceleration (if available)
Configuration
Basic Configuration
# Unified pipeline (default)
output:
  save_enriched_laz: false # Don't save intermediate LAZ
Save Intermediate LAZ (Optional)
If you need enriched LAZ files:
output:
  save_enriched_laz: true # Save both enriched LAZ and patches
  enriched_laz_dir: "/path/to/enriched"
Enriched LAZ Only Mode
For 3-5x faster processing when you only need enriched LAZ:
output:
  only_enriched_laz: true # Skip patch extraction
  save_enriched_laz: true
See Enriched LAZ Only Mode for details.
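The same switch can also be flipped programmatically before constructing the processor. A minimal sketch, assuming only the output.* keys shown above:
from omegaconf import OmegaConf
config = OmegaConf.load("config.yaml")
# Equivalent to passing output.only_enriched_laz=true on the command line
OmegaConf.update(config, "output.only_enriched_laz", True)
OmegaConf.update(config, "output.save_enriched_laz", True)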
Usage Examples
Example 1: Default Unified Pipeline
Process RAW LAZ to patches directly:
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw_laz \
output_dir=/data/output \
preprocessing.num_neighbors=50
Output:
/data/output/
├── tile_001.laz # Processed patches
├── tile_002.laz
└── tile_003.laz
Example 2: With Intermediate LAZ
Save both enriched LAZ and patches:
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw_laz \
output_dir=/data/output \
output.save_enriched_laz=true \
output.enriched_laz_dir=/data/enriched
Output:
/data/enriched/
├── tile_001.laz # Enriched LAZ files
├── tile_002.laz
└── tile_003.laz
/data/output/
├── tile_001.laz # Processed patches
├── tile_002.laz
└── tile_003.laz
Example 3: With Tile Stitching
Combine multiple tiles in-memory before patch extraction:
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw_laz \
output_dir=/data/output \
stitching.enabled=true \
stitching.pattern=3x3
Workflow:
RAW Tiles → [Stitch In-Memory] → [Extract Patches] → Output
tile_001.laz ─┐
tile_002.laz ─┼─→ Combined Point Cloud → Patches → stitched.laz
tile_003.laz ─┘
Example 4: GPU Accelerated
Enable GPU for faster feature computation:
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw_laz \
output_dir=/data/output \
gpu.enabled=true \
gpu.use_cuml=true
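If you are unsure whether a CUDA device is actually visible, a small pre-flight check avoids a failure mid-run. A sketch that assumes CuPy is installed alongside the GPU stack; it only touches the gpu.enabled key shown above:
from omegaconf import OmegaConf
config = OmegaConf.load("config.yaml")
try:
    import cupy
    # Enable the GPU path only if the CUDA runtime reports a device
    config.gpu.enabled = cupy.cuda.runtime.getDeviceCount() > 0
except Exception:
    config.gpu.enabled = False  # no CuPy or no usable CUDA runtime
print(f"GPU enabled: {config.gpu.enabled}")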
Performance Optimization
1. Memory Management
Control memory usage:
preprocessing:
  batch_size: 50000 # Points per batch
  num_workers: 4 # CPU threads
gpu:
  chunk_size: 100000 # GPU chunk size
Guidelines:
- Larger batches are faster but use more memory
- Smaller batches are slower but use less memory
- Monitor usage with nvidia-smi (GPU) or htop (CPU)
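As a rough sizing aid, you can estimate the working set per batch before picking a batch_size. The bytes-per-point figure below is an illustrative assumption (XYZ as float64 plus nine float32 features); adjust it to your actual feature set:
# Rough per-batch memory estimate; 60 bytes/point is an assumption
BYTES_PER_POINT = 3 * 8 + 9 * 4  # XYZ float64 + 9 float32 features
for batch_size in (25_000, 50_000, 100_000):
    mb = batch_size * BYTES_PER_POINT / 1024**2
    print(f"{batch_size:>7} points ≈ {mb:.1f} MB per batch")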
2. Disk I/O Optimization
Use fast storage:
- Prefer SSD over HDD
- Prefer local storage over network storage
- Put temporary files on a fast partition
Example:
output:
  temp_dir: "/tmp" # Fast local storage
3. Parallel Processing
Process multiple tiles concurrently:
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from omegaconf import OmegaConf
from ign_lidar.core.processor import LiDARProcessor
config = OmegaConf.load("config.yaml")
tile_paths = sorted(Path("/data/raw").glob("*.laz"))
def process_tile(tile_path):
    # Build the processor inside the worker so its state is never pickled
    processor = LiDARProcessor(config)
    return processor.process_tile(input_path=str(tile_path),
                                  output_path=f"/data/output/{tile_path.name}")
with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_tile, tile_paths))
Migration from v1.x
Old Two-Step Workflow
# v1.x - Step 1: Enrich
ign-lidar-hd enrich /data/raw /data/enriched
# v1.x - Step 2: Extract patches
ign-lidar-hd extract /data/enriched /data/patches
New Unified Workflow
# v2.0+ - Single step
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw \
output_dir=/data/patches
Configuration Migration
Old (v1.x):
enrichment:
  num_neighbors: 50
  features: ["planarity", "linearity"]
extraction:
  patch_size: 50.0
  overlap: 10.0
New (v2.0+):
preprocessing:
  num_neighbors: 50
features:
  enabled_features: ["planarity", "linearity"]
processing:
  patch_size: 50.0
  patch_overlap: 10.0
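If you have several v1.x YAML files to migrate, the renames above are mechanical enough to script. A hypothetical helper (the mapping mirrors the example above; extend it for any other keys you use):
from omegaconf import OmegaConf
def migrate_v1_config(old_path, new_path):
    old = OmegaConf.load(old_path)
    new = OmegaConf.create({
        "preprocessing": {"num_neighbors": old.enrichment.num_neighbors},
        "features": {"enabled_features": old.enrichment.features},
        "processing": {
            "patch_size": old.extraction.patch_size,
            "patch_overlap": old.extraction.overlap,
        },
    })
    OmegaConf.save(new, new_path)
migrate_v1_config("v1_config.yaml", "config.yaml")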
API Usage
Unified Processing
from ign_lidar.core.processor import LiDARProcessor
from omegaconf import OmegaConf
# Load configuration
config = OmegaConf.load("config.yaml")
# Create processor
processor = LiDARProcessor(config)
# Process single tile (RAW → Patches)
result = processor.process_tile(
    input_path="raw_tile.laz",
    output_path="output_patches.laz"
)
print(f"Processed {result.num_points} points")
print(f"Extracted {result.num_patches} patches")
With Intermediate Output
# Save enriched LAZ during processing
processor = LiDARProcessor(config)
result = processor.process_tile(
    input_path="raw_tile.laz",
    output_path="output_patches.laz",
    save_enriched=True,
    enriched_path="enriched_tile.laz"
)
Batch Processing
from pathlib import Path
input_dir = Path("/data/raw")
output_dir = Path("/data/output")
# Process all LAZ files
for laz_file in input_dir.glob("*.laz"):
    output_file = output_dir / laz_file.name
    processor.process_tile(
        input_path=str(laz_file),
        output_path=str(output_file)
    )
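For long batches it is worth catching per-tile failures so one bad file does not abort the whole run (see also Incomplete Output under Troubleshooting). A sketch on top of the same loop, with `processor` created as above:
import logging
from pathlib import Path
logging.basicConfig(level=logging.INFO)
failed = []
for laz_file in Path("/data/raw").glob("*.laz"):
    try:
        processor.process_tile(
            input_path=str(laz_file),
            output_path=str(Path("/data/output") / laz_file.name)
        )
    except Exception:
        logging.exception("Failed on %s", laz_file)
        failed.append(laz_file)  # inspect or retry these afterwards
print(f"{len(failed)} tiles failed")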
Troubleshooting
Out of Memory
Issue: Process crashes with memory errors
Solutions:
- Reduce batch size:
  preprocessing:
    batch_size: 25000 # Smaller batches
- Process one tile at a time:
  for tile in tiles:
      processor.process_tile(tile)  # Memory released after each tile
- Enable enriched LAZ saving (trades speed for memory):
  output:
    save_enriched_laz: true
Slow Processing
Issue: Processing slower than expected
Solutions:
- Enable GPU acceleration:
  gpu:
    enabled: true
- Increase parallel workers:
  preprocessing:
    num_workers: 8 # Match CPU cores
- Check for a disk I/O bottleneck:
  iotop # Monitor disk usage
Incomplete Output
Issue: Some patches missing
Solutions:
- Check logs for errors:
  logging:
    level: DEBUG
- Verify input files are valid:
  import laspy
  las = laspy.read("input.laz")
  print(f"Points: {len(las.points)}")
Best Practices
1. Start with Default Settings
# Use defaults for first run
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw \
output_dir=/data/output
2. Monitor Resource Usage
# Terminal 1: Run processing
python -m ign_lidar.cli.hydra_app ...
# Terminal 2: Monitor resources
watch -n 1 nvidia-smi # GPU
htop # CPU/Memory
iotop # Disk
3. Test on Small Dataset
# Test on 1-2 tiles first
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw \
output_dir=/data/test \
input.file_pattern="*_001.laz"
4. Use Configuration Files
# Save working configuration
python -m ign_lidar.cli.hydra_app \
input_dir=/data/raw \
output_dir=/data/output \
--cfg job > working_config.yaml
# Reuse configuration
python -m ign_lidar.cli.hydra_app \
--config-name=working_config
Related Documentation
- Hydra CLI Guide - CLI usage
- Configuration System - Configuration options
- Enriched LAZ Only Mode - Faster preprocessing
- Tile Stitching - Multi-tile processing
- Performance Guide - Optimization tips
Summary
The unified processing pipeline in v2.0+ offers:
✅ Simplicity - Single command from RAW to patches
✅ Speed - 2-3x faster than v1.x
✅ Efficiency - 35-50% disk space savings
✅ Flexibility - Multiple output modes
✅ Compatibility - Works with existing workflows
Recommended for all new projects!