Complete Workflow Guide
This guide walks you through the entire process of transforming raw IGN LiDAR HD data into machine learning-ready datasets.
Overview
The complete workflow consists of three main stages:
- Download - Acquire LiDAR tiles from IGN servers
- Enrich - Add geometric features and optional RGB data
- Patch - Create training-ready patches for ML models
Prerequisites
Required
- Python 3.8 or higher
- ign-lidar-hd package installed
- Internet connection (for downloading tiles)
- ~10 GB free disk space per 10 tiles
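The disk-space requirement can be checked before starting a download. The following is a minimal sketch using only the standard library; the helper name `enough_disk_space` and the 1 GB-per-tile default are assumptions derived from the "~10 GB per 10 tiles" estimate above, not part of the ign-lidar-hd package.

```python
import shutil

GB = 1024 ** 3  # bytes per gibibyte


def enough_disk_space(path, n_tiles, gb_per_tile=1.0):
    """Return True if the filesystem holding `path` has room for `n_tiles`.

    `gb_per_tile` is a rough estimate (about 10 GB per 10 tiles), so treat
    the result as a sanity check rather than a guarantee.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= n_tiles * gb_per_tile * GB


if __name__ == "__main__":
    print("Room for 10 tiles:", enough_disk_space(".", n_tiles=10))
```

Enriched tiles and patches add to the raw download size, so it may be worth raising `gb_per_tile` when the full pipeline writes to the same disk.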
Optional
- NVIDIA GPU with CUDA support (for 5-10x speedup)
- IGN BD ORTHO® orthophotos (for RGB augmentation)
Method 1: Pipeline Configuration (Recommended)
The easiest way to run a complete workflow is using YAML configuration files.
Step 1: Create Configuration File
ign-lidar-hd pipeline config.yaml --create-example full
This creates a config.yaml file with all options:
# config.yaml - Complete Pipeline Configuration
global:
  num_workers: 4 # Parallel processing threads
  verbose: true # Detailed logging
download:
  # Bounding box: longitude_min, latitude_min, longitude_max, latitude_max
  bbox: "2.3, 48.8, 2.4, 48.9" # Paris area
  output: "data/raw"
  max_tiles: 10
  tile_selection_strategy: "urban" # or "building_rich", "random"
enrich:
  input_dir: "data/raw"
  output: "data/enriched"
  mode: "full" # Focus on building features
  # RGB Augmentation (optional)
  add_rgb: true
  rgb_source: "ign_orthophoto"
  rgb_cache_dir: "cache/orthophotos"
  # GPU Acceleration (optional)
  use_gpu: true # Auto-fallback to CPU if GPU unavailable
  # Feature Extraction
  compute_normals: true
  compute_curvature: true
  neighborhood_size: 20
patch:
  input_dir: "data/enriched"
  output: "data/patches"
  lod_level: "LOD2" # or "LOD3"
  num_points: 16384
  patch_size: 150 # meters
  overlap: 0.1 # 10% overlap
  # Data Augmentation
  augment: true
  augmentation_factor: 3 # Generate 3 augmented versions per patch
  # Quality Control
  min_building_points: 1000
  filter_empty_patches: true
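To get a feel for how `patch_size` and `overlap` interact, the sketch below computes patch start offsets along one axis of a tile. It assumes a sliding-window layout with stride `patch_size * (1 - overlap)` and a 1 km tile edge (IGN LiDAR HD tiles are 1 km x 1 km); the function `patch_origins` is hypothetical and the library's actual tiling, especially its edge handling, may differ.

```python
import math


def patch_origins(tile_size, patch_size, overlap):
    """Start offsets (meters) of sliding-window patches along one tile axis.

    Assumes stride = patch_size * (1 - overlap) and that only patches
    fitting entirely inside the tile are kept.
    """
    stride = patch_size * (1.0 - overlap)
    if stride <= 0:
        raise ValueError("overlap must be smaller than 1")
    if patch_size >= tile_size:
        return [0.0]
    n = math.floor((tile_size - patch_size) / stride) + 1
    return [i * stride for i in range(n)]


# Values from the example configuration: 150 m patches, 10% overlap.
origins = patch_origins(tile_size=1000.0, patch_size=150.0, overlap=0.1)
print(f"{len(origins)} patches per axis, {len(origins) ** 2} per tile")
```

Under these assumptions the stride is 135 m, which gives 7 window positions per axis and up to 49 candidate patches per tile before augmentation and quality filtering.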
Step 2: Run Complete Pipeline
ign-lidar-hd pipeline config.yaml
The pipeline will:
- Download tiles from IGN
- Enrich with features and optional RGB
- Create training patches
- Save metadata and statistics
Output Structure:
project/
├── config.yaml
├── data/
│   ├── raw/        # Downloaded tiles
│   ├── enriched/   # Feature-enriched tiles
│   └── patches/    # ML-ready patches
│       ├── LOD2/
│       │   ├── train/
│       │   ├── val/
│       │   └── test/
│       └── metadata.json
└── cache/
    └── orthophotos/  # Cached RGB data
Step 3: Verify Results
from pathlib import Path
import json
# Load metadata
metadata = json.loads(Path("data/patches/metadata.json").read_text())
print(f"Total patches: {metadata['total_patches']}")
print(f"Classes: {metadata['classes']}")
print(f"Features: {metadata['features']}")
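Beyond the metadata file, it can help to confirm that each split directory actually contains files. The sketch below counts files under `train/`, `val/` and `test/` following the output structure shown earlier; the helper `count_files_per_split` is illustrative and makes no assumption about the patch file extension.

```python
from pathlib import Path


def count_files_per_split(patches_root, lod="LOD2", splits=("train", "val", "test")):
    """Count regular files under <patches_root>/<lod>/<split> for each split.

    Missing split directories are reported as 0 instead of raising.
    """
    counts = {}
    for split in splits:
        split_dir = Path(patches_root) / lod / split
        if split_dir.is_dir():
            counts[split] = sum(1 for p in split_dir.rglob("*") if p.is_file())
        else:
            counts[split] = 0
    return counts


if __name__ == "__main__":
    for split, n in count_files_per_split("data/patches").items():
        print(f"{split}: {n} files")
```

A split that reports 0 files usually means the patch stage did not run or wrote to a different output directory.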
Method 2: Command-Line Step by Step
For more control, run each stage separately.
Stage 1: Download Tiles
# Download by bounding box (Paris area)
ign-lidar-hd download \
  --bbox 2.3,48.8,2.4,48.9 \
  --output data/raw \
  --max-tiles 10 \
  --strategy urban
# Or download specific tiles
ign-lidar-hd download \
  --tiles 0750_6620 0750_6621 0750_6622 \
  --output data/raw
Options:
- --bbox: Geographic bounding box (lon_min, lat_min, lon_max, lat_max)
- --max-tiles: Limit number of tiles to download
- --strategy: Tile selection strategy (urban/building_rich/random)
- --tiles: Specific tile IDs to download
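A swapped or malformed bounding box is an easy mistake to make, so it can be worth validating the string before passing it to `--bbox`. The sketch below parses the `lon_min,lat_min,lon_max,lat_max` order described above; `parse_bbox` is a hypothetical helper, not a function of the ign-lidar-hd package.

```python
def parse_bbox(text):
    """Parse 'lon_min,lat_min,lon_max,lat_max' into a tuple of floats.

    Raises ValueError when the string does not contain four numbers or
    when the minimum/maximum values are out of order or out of range.
    """
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 values, got {len(parts)}")
    lon_min, lat_min, lon_max, lat_max = parts
    if not (-180.0 <= lon_min < lon_max <= 180.0):
        raise ValueError("longitudes must satisfy -180 <= lon_min < lon_max <= 180")
    if not (-90.0 <= lat_min < lat_max <= 90.0):
        raise ValueError("latitudes must satisfy -90 <= lat_min < lat_max <= 90")
    return lon_min, lat_min, lon_max, lat_max


if __name__ == "__main__":
    print(parse_bbox("2.3,48.8,2.4,48.9"))  # Paris area from the example
```

The same check accepts the spaced form used in config.yaml, since `float()` ignores surrounding whitespace.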
Stage 2: Enrich with Features
# Basic enrichment (CPU only)
ign-lidar-hd enrich \
  --input-dir data/raw \
  --output data/enriched \
  --num-workers 4
# With GPU acceleration
ign-lidar-hd enrich \
  --input-dir data/raw \
  --output data/enriched \
  --use-gpu \
  --num-workers 2
# With RGB augmentation
ign-lidar-hd enrich \
  --input-dir data/raw \
  --output data/enriched \
  --add-rgb \
  --rgb-cache-dir cache/orthophotos \
  --num-workers 4
Options:
- --use-gpu: Enable GPU acceleration (requires CUDA)
- --add-rgb: Add RGB colors from IGN orthophotos
- --rgb-cache-dir: Cache directory for orthophoto tiles
- --num-workers: Number of parallel workers
Stage 3: Create Patches
# Create LOD2 patches (15 classes)
ign-lidar-hd patch \
  --input-dir data/enriched \
  --output data/patches \
  --lod-level LOD2 \
  --num-points 16384
# Create LOD3 patches (30+ classes)
ign-lidar-hd patch \
  --input-dir data/enriched \
  --output data/patches \
  --lod-level LOD3 \
  --num-points 32768
Options:
- --lod-level: LOD2 (15 classes) or LOD3 (30+ classes)
- --num-points: Points per patch (typically 8192-32768)
Method 3: Python API
For maximum flexibility, use the Python API directly.
Complete Workflow Script
from ign_lidar import LiDARProcessor, TileDownloader, PatchGenerator
from pathlib import Path
# Configuration
bbox = (2.3, 48.8, 2.4, 48.9) # Paris area
raw_dir = Path("data/raw")
enriched_dir = Path("data/enriched")
patches_dir = Path("data/patches")
# Stage 1: Download Tiles
print("Downloading tiles...")
downloader = TileDownloader(output_dir=raw_dir)
tiles = downloader.download_bbox(
    bbox=bbox,
    max_tiles=10,
    strategy="urban"
)
print(f"Downloaded {len(tiles)} tiles")
# Stage 2: Enrich with Features
print("Enriching with features...")
processor = LiDARProcessor(
    use_gpu=True, # Enable GPU if available
    include_rgb=True, # Add RGB colors
    rgb_cache_dir=Path("cache/orthophotos"),
    num_workers=4
)
# Make sure the output directory exists before writing enriched tiles
enriched_dir.mkdir(parents=True, exist_ok=True)
enriched_files = []
for tile_path in raw_dir.glob("*.laz"):
    output_path = enriched_dir / tile_path.name
    processor.enrich(tile_path, output_path)
    enriched_files.append(output_path)
    print(f"  {tile_path.name}")
print(f"Enriched {len(enriched_files)} files")
# Stage 3: Create Patches
print("Creating patches...")
generator = PatchGenerator(
    lod_level="LOD2",
    num_points=16384,
    augment=True,
    augmentation_factor=3
)
patches = generator.generate_from_directory(
    enriched_dir,
    patches_dir
)
print(f"Generated {len(patches)} patches")
# Summary
print("\nSummary:")
print(f"  Raw tiles: {len(tiles)}")
print(f"  Enriched files: {len(enriched_files)}")
print(f"  Training patches: {len(patches)}")
Advanced: Custom Feature Extraction
from ign_lidar import LiDARProcessor
from scipy.spatial import cKDTree
import numpy as np
# Custom processor with specific features
processor = LiDARProcessor(
    lod_level="LOD2",
    use_gpu=True,
    features={
        "normals": True,
        "curvature": True,
        "planarity": True,
        "verticality": True,
        "density": True,
        "architectural_style": True
    },
    neighborhood_size=20, # k-nearest neighbors
    min_building_height=3.0 # meters
)
# Process with custom filtering
def custom_filter(points):
    """Keep only high-quality points"""
    # Remove isolated points
    tree = cKDTree(points[:, :3])
    # Query 11 neighbors: the nearest one is the point itself (distance 0)
    distances, _ = tree.query(points[:, :3], k=11)
    # Mean distance to the 10 true neighbors, skipping the self-match
    mask = distances[:, 1:].mean(axis=1) < 2.0 # 2m threshold
    return points[mask]
# Apply processing
enriched = processor.enrich(
    input_path="data/raw/tile.laz",
    output_path="data/enriched/tile.laz",
    preprocess_fn=custom_filter
)
Monitoring Progress
Real-time Monitoring
from pathlib import Path
from ign_lidar import LiDARProcessor
from tqdm import tqdm
processor = LiDARProcessor()
# Progress bar for batch processing
files = list(Path("data/raw").glob("*.laz"))
for file_path in tqdm(files, desc="Processing tiles"):
    processor.enrich(file_path, Path("data/enriched") / file_path.name)
Resource Monitoring
import threading
import time
import psutil
def monitor_resources():
    """Monitor CPU and memory usage"""
    process = psutil.Process()
    while True:
        # cpu_percent(interval=1) blocks for 1 s while it samples
        cpu_percent = process.cpu_percent(interval=1)
        memory_mb = process.memory_info().rss / 1024 / 1024
        print(f"CPU: {cpu_percent:.1f}% | Memory: {memory_mb:.0f} MB")
        time.sleep(5)
# Run in a separate daemon thread so it exits with the main program
monitor_thread = threading.Thread(target=monitor_resources, daemon=True)
monitor_thread.start()
# Your processing code here (assumes an existing LiDARProcessor instance named `processor`)
processor.process_directory("data/raw", "data/enriched")
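The monitor above runs until the program exits. If the monitor should stop as soon as processing finishes, one option is a `threading.Event` that doubles as the sleep timer. The sketch below uses only the standard library and a placeholder sampling function; `run_periodically` is a hypothetical helper, and the psutil-based print from the example above could be passed in as `sample`.

```python
import threading
import time


def run_periodically(sample, interval, stop_event):
    """Call `sample()` every `interval` seconds until `stop_event` is set.

    Event.wait() returns False on timeout and True once the event is set,
    so the loop ends promptly instead of finishing a full sleep.
    """
    while not stop_event.wait(interval):
        sample()


samples = []
stop = threading.Event()
monitor = threading.Thread(
    target=run_periodically,
    args=(lambda: samples.append(time.monotonic()), 0.05, stop),
    daemon=True,
)
monitor.start()

time.sleep(0.3)  # stand-in for the real processing work

stop.set()      # ask the monitor to stop
monitor.join()  # wait for it to finish cleanly
print(f"Collected {len(samples)} samples")
```

Joining the thread after `stop.set()` guarantees that no sample is printed after the final summary of a run.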
Troubleshooting
Common Issues
1. Out of Memory
Solution: Use chunked processing or reduce batch size:
processor = LiDARProcessor(
    chunk_size=1_000_000, # Process 1M points at a time
    num_workers=2 # Reduce parallel workers
)
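To illustrate what chunked processing means in general, the sketch below splits a point count into index ranges of at most `chunk_size` points, so only one slice needs to be held in memory at a time. This shows the idea only; `iter_chunks` is a hypothetical helper and not how the library necessarily implements `chunk_size`.

```python
def iter_chunks(n_points, chunk_size):
    """Yield (start, stop) index pairs that cover range(n_points) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_points, chunk_size):
        yield start, min(start + chunk_size, n_points)


# A 2.5M-point tile processed 1M points at a time needs three passes.
chunks = list(iter_chunks(2_500_000, 1_000_000))
for start, stop in chunks:
    print(f"points[{start}:{stop}]")
```

Neighborhood-based features such as normals need points near chunk borders to see their neighbors, so a real implementation would typically add some overlap or a shared spatial index between chunks.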
2. GPU Not Detected
Solution: Verify CUDA installation:
# Check CUDA version
nvidia-smi
# Test CuPy
python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"
If GPU is not available, the library automatically falls back to CPU processing.
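The same check can be made from Python before deciding whether to request GPU processing. The sketch below wraps the CuPy call shown above in a guard so that a missing CuPy install or a missing driver reports zero devices instead of raising; `cuda_device_count` is an illustrative helper, not part of the library.

```python
def cuda_device_count():
    """Return the number of CUDA devices visible to CuPy, or 0 on any failure.

    Covers both a missing CuPy installation (ImportError) and a CUDA
    runtime error such as an absent driver.
    """
    try:
        import cupy
        return int(cupy.cuda.runtime.getDeviceCount())
    except Exception:
        return 0


use_gpu = cuda_device_count() > 0
print("GPU available" if use_gpu else "No GPU detected, using CPU")
```

The resulting `use_gpu` flag can then be passed wherever the examples in this guide hard-code `use_gpu=True`.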
3. RGB Augmentation Fails
Solution: Ensure orthophotos are accessible:
from pathlib import Path
from ign_lidar.rgb_augmentation import verify_rgb_source
# Test RGB source
result = verify_rgb_source(
    test_tile="0750_6620",
    cache_dir=Path("cache/orthophotos")
)
print(f"RGB source valid: {result}")
4. Slow Processing
Solution: Enable optimizations:
processor = LiDARProcessor(
    use_gpu=True, # Enable GPU
    num_workers=8, # Max parallel workers
    cache_features=True, # Cache intermediate results
    skip_existing=True # Skip already processed files
)
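When looping over tiles manually, the effect of skipping processed files can be approximated by filtering the input list first. This is a minimal sketch that assumes enriched files keep the name of their input tile, as in the workflow script earlier in this guide; `pending_tiles` is a hypothetical helper and not the library's `skip_existing` implementation.

```python
from pathlib import Path


def pending_tiles(input_dir, output_dir, pattern="*.laz"):
    """Return input tiles that have no same-named file in `output_dir` yet."""
    output_dir = Path(output_dir)
    return sorted(
        tile
        for tile in Path(input_dir).glob(pattern)
        if not (output_dir / tile.name).exists()
    )


if __name__ == "__main__":
    for tile in pending_tiles("data/raw", "data/enriched"):
        print("still to process:", tile.name)
```

A file that exists but was only partially written after an interrupted run would also be skipped, so deleting incomplete outputs before resuming is advisable.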
Performance Tips
1. Optimal Worker Count
import os
# Use 75% of CPU cores for I/O-bound tasks (os.cpu_count() can return None)
optimal_workers = max(1, int((os.cpu_count() or 1) * 0.75))
processor = LiDARProcessor(num_workers=optimal_workers)
2. GPU Batch Processing
# Process multiple tiles on GPU for better utilization
processor = LiDARProcessor(
    use_gpu=True,
    gpu_batch_size=4 # Process 4 tiles simultaneously
)
3. Disk I/O Optimization
# Use SSD for intermediate storage
export TMPDIR=/mnt/ssd/tmp
# Or in Python
import tempfile
tempfile.tempdir = "/mnt/ssd/tmp"
Next Steps
- Analyze generated patches
- Train ML models
- Visualize results
- GPU optimization guide