
Complete Workflow Guide

This guide walks you through the entire process of transforming raw IGN LiDAR HD data into machine learning-ready datasets.

📋 Overview

The complete workflow consists of three main stages:

  1. Download - Acquire LiDAR tiles from IGN servers
  2. Enrich - Add geometric features and optional RGB data
  3. Patch - Create training-ready patches for ML models

🎯 Prerequisites

Required

  • Python 3.8 or higher
  • ign-lidar-hd package installed
  • Internet connection (for downloading tiles)
  • ~10GB free disk space per 10 tiles

Optional

  • NVIDIA GPU with CUDA support (for 5-10x speedup)
  • IGN BD ORTHO® orthophotos (for RGB augmentation)
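
A quick way to sanity-check these prerequisites before starting (a minimal sketch using only the standard library; the CuPy check is only relevant if you plan to enable GPU acceleration):

import importlib.util
import shutil
import sys

# Python version and package availability
print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
      f"({'OK' if sys.version_info >= (3, 8) else 'upgrade needed'})")
print("ign_lidar installed:", importlib.util.find_spec("ign_lidar") is not None)

# Free disk space (~10 GB recommended per 10 tiles)
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk space: {free_gb:.1f} GB")

# Optional: GPU support via CuPy
print("CuPy (GPU) available:", importlib.util.find_spec("cupy") is not None)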

🚀 Method 1: Pipeline Configuration (Recommended)

The easiest way to run the complete workflow is with a YAML configuration file.

Step 1: Create Configuration File

ign-lidar-hd pipeline config.yaml --create-example full

This creates a config.yaml file with all options:

# config.yaml - Complete Pipeline Configuration
global:
  num_workers: 4 # Parallel processing threads
  verbose: true # Detailed logging

download:
  # Bounding box: longitude_min, latitude_min, longitude_max, latitude_max
  bbox: "2.3, 48.8, 2.4, 48.9" # Paris area
  output: "data/raw"
  max_tiles: 10
  tile_selection_strategy: "urban" # or "building_rich", "random"

enrich:
  input_dir: "data/raw"
  output: "data/enriched"
  mode: "full" # Focus on building features

  # RGB Augmentation (optional)
  add_rgb: true
  rgb_source: "ign_orthophoto"
  rgb_cache_dir: "cache/orthophotos"

  # GPU Acceleration (optional)
  use_gpu: true # Auto-fallback to CPU if GPU unavailable

  # Feature Extraction
  compute_normals: true
  compute_curvature: true
  neighborhood_size: 20

patch:
  input_dir: "data/enriched"
  output: "data/patches"
  lod_level: "LOD2" # or "LOD3"
  num_points: 16384
  patch_size: 150 # meters
  overlap: 0.1 # 10% overlap

  # Data Augmentation
  augment: true
  augmentation_factor: 3 # Generate 3 augmented versions per patch

  # Quality Control
  min_building_points: 1000
  filter_empty_patches: true
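
Before launching the pipeline, you can confirm the file parses as valid YAML (a minimal sketch using PyYAML; it checks syntax only, not the option values):

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print("Sections:", list(config.keys()))  # expect: global, download, enrich, patch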

Step 2: Run Complete Pipeline

ign-lidar-hd pipeline config.yaml

The pipeline will:

  1. ✅ Download tiles from IGN
  2. ✅ Enrich with features and optional RGB
  3. ✅ Create training patches
  4. ✅ Save metadata and statistics

Output Structure:

project/
├── config.yaml
├── data/
│   ├── raw/            # Downloaded tiles
│   ├── enriched/       # Feature-enriched tiles
│   └── patches/        # ML-ready patches
│       ├── LOD2/
│       │   ├── train/
│       │   ├── val/
│       │   └── test/
│       └── metadata.json
└── cache/
    └── orthophotos/    # Cached RGB data

Step 3: Verify Results

from pathlib import Path
import json

# Load metadata
metadata = json.loads(Path("data/patches/metadata.json").read_text())

print(f"Total patches: {metadata['total_patches']}")
print(f"Classes: {metadata['classes']}")
print(f"Features: {metadata['features']}")

🛠️ Method 2: Command-Line Step by Step

For more control, run each stage separately.

Stage 1: Download Tiles

# Download by bounding box (Paris area)
ign-lidar-hd download \
  --bbox 2.3,48.8,2.4,48.9 \
  --output data/raw \
  --max-tiles 10 \
  --strategy urban

# Or download specific tiles
ign-lidar-hd download \
  --tiles 0750_6620 0750_6621 0750_6622 \
  --output data/raw

Options:

  • --bbox: Geographic bounding box (lon_min, lat_min, lon_max, lat_max)
  • --max-tiles: Limit number of tiles to download
  • --strategy: Tile selection strategy (urban/building_rich/random)
  • --tiles: Specific tile IDs to download
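
Once the download finishes, a quick listing confirms what landed in data/raw (a small sketch using only the standard library):

from pathlib import Path

tiles = sorted(Path("data/raw").glob("*.laz"))
total_gb = sum(t.stat().st_size for t in tiles) / 1e9
print(f"{len(tiles)} tiles downloaded ({total_gb:.1f} GB)")
for t in tiles[:5]:
    print(" ", t.name)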

Stage 2: Enrich with Features

# Basic enrichment (CPU only)
ign-lidar-hd enrich \
  --input-dir data/raw \
  --output data/enriched \
  --num-workers 4

# With GPU acceleration
ign-lidar-hd enrich \
  --input-dir data/raw \
  --output data/enriched \
  --use-gpu \
  --num-workers 2

# With RGB augmentation
ign-lidar-hd enrich \
  --input-dir data/raw \
  --output data/enriched \
  --add-rgb \
  --rgb-cache-dir cache/orthophotos \
  --num-workers 4

Options:

  • --use-gpu: Enable GPU acceleration (requires CUDA)
  • --add-rgb: Add RGB colors from IGN orthophotos
  • --rgb-cache-dir: Cache directory for orthophoto tiles
  • --num-workers: Number of parallel workers
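
To confirm the enrichment worked, you can open one output file and inspect its point dimensions (a sketch assuming laspy is installed; the exact names of the extra feature dimensions depend on which options you enabled):

import laspy

# Replace with one of your enriched tiles
las = laspy.read("data/enriched/tile.laz")
print(f"Points: {len(las.points):,}")
print("Extra (feature) dimensions:", list(las.point_format.extra_dimension_names))
print("Has RGB:", "red" in list(las.point_format.dimension_names))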

Stage 3: Create Patches

# Create LOD2 patches (15 classes)
ign-lidar-hd patch \
  --input-dir data/enriched \
  --output data/patches \
  --lod-level LOD2 \
  --num-points 16384

# Create LOD3 patches (30+ classes)
ign-lidar-hd patch \
  --input-dir data/enriched \
  --output data/patches \
  --lod-level LOD3 \
  --num-points 32768

Options:

  • --lod-level: LOD2 (15 classes) or LOD3 (30+ classes)
  • --num-points: Points per patch (typically 8192-32768)
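
After patch generation, you can peek inside a single patch to see which arrays it contains (a sketch assuming patches are stored as NumPy .npz archives; adjust the path and extension to match your output):

from pathlib import Path
import numpy as np

patch_path = next(Path("data/patches").rglob("*.npz"))  # first patch found
data = np.load(patch_path)
print("Arrays in", patch_path.name)
for name in data.files:
    print(f"  {name}: shape={data[name].shape}, dtype={data[name].dtype}")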

🐍 Method 3: Python API

For maximum flexibility, use the Python API directly.

Complete Workflow Script

from ign_lidar import LiDARProcessor, TileDownloader, PatchGenerator
from pathlib import Path

# Configuration
bbox = (2.3, 48.8, 2.4, 48.9)  # Paris area
raw_dir = Path("data/raw")
enriched_dir = Path("data/enriched")
patches_dir = Path("data/patches")

# Stage 1: Download Tiles
print("📥 Downloading tiles...")
downloader = TileDownloader(output_dir=raw_dir)
tiles = downloader.download_bbox(
    bbox=bbox,
    max_tiles=10,
    strategy="urban"
)
print(f"✅ Downloaded {len(tiles)} tiles")

# Stage 2: Enrich with Features
print("⚡ Enriching with features...")
processor = LiDARProcessor(
    use_gpu=True,       # Enable GPU if available
    include_rgb=True,   # Add RGB colors
    rgb_cache_dir=Path("cache/orthophotos"),
    num_workers=4
)

enriched_files = []
for tile_path in raw_dir.glob("*.laz"):
    output_path = enriched_dir / tile_path.name
    processor.enrich(tile_path, output_path)
    enriched_files.append(output_path)
    print(f"  ✓ {tile_path.name}")

print(f"✅ Enriched {len(enriched_files)} files")

# Stage 3: Create Patches
print("📦 Creating patches...")
generator = PatchGenerator(
    lod_level="LOD2",
    num_points=16384,
    augment=True,
    augmentation_factor=3
)

patches = generator.generate_from_directory(
    enriched_dir,
    patches_dir
)
print(f"✅ Generated {len(patches)} patches")

# Summary
print("\n📊 Summary:")
print(f"  Raw tiles: {len(tiles)}")
print(f"  Enriched files: {len(enriched_files)}")
print(f"  Training patches: {len(patches)}")

Advanced: Custom Feature Extraction

from ign_lidar import LiDARProcessor
from scipy.spatial import cKDTree
import numpy as np

# Custom processor with specific features
processor = LiDARProcessor(
    lod_level="LOD2",
    use_gpu=True,
    features={
        "normals": True,
        "curvature": True,
        "planarity": True,
        "verticality": True,
        "density": True,
        "architectural_style": True
    },
    neighborhood_size=20,     # k-nearest neighbors
    min_building_height=3.0   # meters
)

# Process with custom filtering
def custom_filter(points):
    """Keep only high-quality points."""
    # Remove isolated points using mean distance to the 10 nearest neighbors
    tree = cKDTree(points[:, :3])
    distances, _ = tree.query(points[:, :3], k=10)
    mask = distances.mean(axis=1) < 2.0  # 2 m threshold
    return points[mask]

# Apply processing
enriched = processor.enrich(
    input_path="data/raw/tile.laz",
    output_path="data/enriched/tile.laz",
    preprocess_fn=custom_filter
)

📊 Monitoring Progress

Real-time Monitoring

from pathlib import Path

from ign_lidar import LiDARProcessor
from tqdm import tqdm

processor = LiDARProcessor()

# Progress bar for batch processing
files = list(Path("data/raw").glob("*.laz"))
for file_path in tqdm(files, desc="Processing tiles"):
    processor.enrich(file_path, Path("data/enriched") / file_path.name)

Resource Monitoring

import psutil
import threading
import time

def monitor_resources():
    """Monitor CPU and memory usage of the current process."""
    process = psutil.Process()

    while True:
        cpu_percent = process.cpu_percent(interval=1)
        memory_mb = process.memory_info().rss / 1024 / 1024
        print(f"CPU: {cpu_percent:.1f}% | Memory: {memory_mb:.0f} MB")
        time.sleep(5)

# Run in a separate daemon thread
monitor_thread = threading.Thread(target=monitor_resources, daemon=True)
monitor_thread.start()

# Your processing code here
processor.process_directory("data/raw", "data/enriched")

🔧 Troubleshooting

Common Issues

1. Out of Memory

Solution: Use chunked processing or reduce batch size:

processor = LiDARProcessor(
    chunk_size=1_000_000,  # Process 1M points at a time
    num_workers=2          # Reduce parallel workers
)

2. GPU Not Detected

Solution: Verify CUDA installation:

# Check CUDA version
nvidia-smi

# Test CuPy
python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"

If GPU is not available, the library automatically falls back to CPU processing.
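
To check from Python whether the GPU path will actually be used, a minimal probe (assuming the GPU backend relies on CuPy, as suggested by the CLI check above):

try:
    import cupy
    print("CUDA devices:", cupy.cuda.runtime.getDeviceCount())
except Exception as exc:  # ImportError or CUDA runtime errors
    print(f"GPU unavailable, CPU fallback will be used: {exc}")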

3. RGB Augmentation Fails

Solution: Ensure orthophotos are accessible:

from pathlib import Path

from ign_lidar.rgb_augmentation import verify_rgb_source

# Test RGB source
result = verify_rgb_source(
    test_tile="0750_6620",
    cache_dir=Path("cache/orthophotos")
)
print(f"RGB source valid: {result}")

4. Slow Processing

Solution: Enable optimizations:

processor = LiDARProcessor(
    use_gpu=True,          # Enable GPU
    num_workers=8,         # Max parallel workers
    cache_features=True,   # Cache intermediate results
    skip_existing=True     # Skip already processed files
)

📈 Performance Tips

1. Optimal Worker Count

import os

# Use ~75% of available cores, leaving headroom for I/O and the OS
optimal_workers = max(1, int((os.cpu_count() or 1) * 0.75))

processor = LiDARProcessor(num_workers=optimal_workers)

2. GPU Batch Processing

# Process multiple tiles on GPU for better utilization
processor = LiDARProcessor(
    use_gpu=True,
    gpu_batch_size=4  # Process 4 tiles simultaneously
)

3. Disk I/O Optimization

# Use SSD for intermediate storage
export TMPDIR=/mnt/ssd/tmp

# Or in Python
import tempfile
tempfile.tempdir = "/mnt/ssd/tmp"

🎓 Next Steps

📚 Further Reading