GPU Acceleration

GPU acceleration significantly speeds up LiDAR processing workflows, providing 6-20x speedup for large-scale datasets and complex feature extraction tasks.

Overview​

The IGN LiDAR HD processor supports GPU acceleration with three performance modes:

  1. CPU-Only: Standard processing (no GPU required)
  2. Hybrid Mode (CuPy): GPU arrays + CPU algorithms (6-8x speedup)
  3. Full GPU Mode (RAPIDS cuML): Complete GPU pipeline (12-20x speedup)

The hybrid mode uses an intelligent per-chunk KDTree strategy that avoids global tree construction bottlenecks, delivering excellent performance even without RAPIDS cuML.

Supported Operations​

  • Geometric Feature Extraction: Surface normals, curvature, planarity, verticality
  • KNN Search: GPU-accelerated k-nearest neighbors (with RAPIDS cuML)
  • PCA Computation: GPU-based principal component analysis (with RAPIDS cuML)
  • Point Cloud Filtering: Parallel preprocessing and noise reduction
  • RGB/NIR Augmentation: GPU-optimized orthophoto integration
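
These geometric features all reduce to an eigen-decomposition of each point's local neighborhood covariance. The snippet below is a minimal CPU-only sketch of that computation, for illustration only (it is not the library's implementation):

# Minimal sketch: normals, planarity and verticality from local PCA (illustrative only)
import numpy as np
from sklearn.neighbors import NearestNeighbors

def eigen_features(points: np.ndarray, k: int = 20):
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)                       # (N, k) neighbor indices
    neigh = points[idx]                                  # (N, k, 3) neighborhoods
    centered = neigh - neigh.mean(axis=1, keepdims=True)
    cov = np.einsum("nki,nkj->nij", centered, centered) / k   # (N, 3, 3) covariances
    eigvals, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    normals = eigvecs[:, :, 0]                           # eigenvector of the smallest eigenvalue
    l1, l2, l3 = eigvals[:, 2], eigvals[:, 1], eigvals[:, 0]
    planarity = (l2 - l3) / np.maximum(l1, 1e-12)
    verticality = 1.0 - np.abs(normals[:, 2])            # 0 = horizontal surface, 1 = vertical
    return normals, planarity, verticality

points = np.random.rand(10_000, 3).astype(np.float32)
normals, planarity, verticality = eigen_features(points)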

🚀 Performance Benchmarks

Real-World Results (17M points, NVIDIA RTX 4080 16GB)​

Current Performance (Optimized):

| Mode | Processing Time | Speedup | Requirements |
|------|-----------------|---------|--------------|
| CPU-Only | 60 min → 12 min | 5x | None (optimized!) |
| Hybrid (CuPy + sklearn) | 7-10 min → 2 min | 25-30x | CuPy + CUDA 12.0+ |
| Full GPU (RAPIDS cuML) | 3-5 min → 1-2 min | 30-60x | RAPIDS cuML + CUDA 12.0+ |

Speedups in this table are measured against the unoptimized 60 min CPU baseline.

Automatic Performance Optimizations

IGN LiDAR HD includes major performance optimizations that benefit all modes (CPU, Hybrid, Full GPU). Per-chunk KDTree strategy and smaller chunk sizes provide 5-10x speedup automatically! These optimizations have been available since v1.7.5 and continue in v2.0+.

Operation Breakdown​

| Operation | CPU Time | Hybrid GPU | Full GPU | Best Speedup |
|-----------|----------|------------|----------|--------------|
| Feature Extraction | 45 min | 8 min | 3 min | 15x |
| KNN Search | 30 min | 15 min | 2 min | 15x |
| PCA Computation | 10 min | 8 min | 1 min | 10x |
| Batch Processing | 120 min | 20 min | 8 min | 15x |

🔧 Setup Requirements

Hardware Requirements​

  • GPU: NVIDIA GPU with CUDA Compute Capability 6.0+ (Pascal or newer)
  • Memory: Minimum 4GB VRAM (8GB+ recommended, 16GB for large tiles)
  • Driver: CUDA 12.0+ compatible NVIDIA driver
  • System: 32GB+ RAM recommended for processing large tiles

Recommended GPUs:

  • Budget: NVIDIA RTX 3060 12GB
  • Optimal: NVIDIA RTX 4070/4080 16GB
  • Professional: NVIDIA A6000 48GB

📦 Installation Options

Option 1: Hybrid Mode (CuPy Only) - Quick Start​

Best for: Quick setup, testing, or when RAPIDS cuML isn't available

# Install CuPy for your CUDA version
pip install cupy-cuda12x # For CUDA 12.x
# OR
pip install cupy-cuda11x # For CUDA 11.x

# Verify GPU availability
python -c "import cupy as cp; print(cp.cuda.runtime.getDeviceCount(), 'GPU(s) found')"

Performance: 6-8x speedup (uses GPU arrays with CPU sklearn algorithms via per-chunk optimization)
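
Conceptually, hybrid mode keeps the neighbor search on the CPU and offloads the bulk array math to the GPU. A minimal sketch of that split, assuming CuPy and scikit-learn are installed (illustrative, not the library's actual code path):

# Hybrid pattern: sklearn (CPU) for the KDTree, CuPy (GPU) for the array math
import cupy as cp
import numpy as np
from sklearn.neighbors import KDTree

points = np.random.rand(200_000, 3).astype(np.float32)

tree = KDTree(points)                           # CPU algorithm: neighbor search with sklearn
_, idx = tree.query(points, k=20)               # (N, 20) neighbor indices

neigh = cp.asarray(points)[cp.asarray(idx)]     # GPU arrays: gather neighborhoods on the device
centered = neigh - neigh.mean(axis=1, keepdims=True)
cov = cp.einsum("nki,nkj->nij", centered, centered) / 20   # batched covariances on the GPU

eigvals = np.linalg.eigvalsh(cp.asnumpy(cov))   # copy back to the host for the eigen-solve
print(eigvals.shape)                            # (200000, 3)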

Option 2: Full GPU Mode (RAPIDS cuML) - Maximum Performance​

Best for: Production workloads, large-scale processing, maximum speed

# Quick install (recommended - uses provided script)
./install_cuml.sh

# Or manual installation:
# Create conda environment (required for RAPIDS)
conda create -n ign_gpu python=3.12 -y
conda activate ign_gpu

# Install RAPIDS cuML (includes CuPy)
conda install -c rapidsai -c conda-forge -c nvidia \
cuml=24.10 cupy cuda-version=12.5 -y

# Install IGN LiDAR HD
pip install ign-lidar-hd

# Verify installation
python scripts/verify_gpu_setup.py

Performance: 15-20x speedup (complete GPU pipeline)

Option 3: Automated Installation Script

For WSL2/Linux systems, use our automated installation script:
# Download and run the installation script
wget https://raw.githubusercontent.com/sducournau/IGN_LIDAR_HD_DATASET/main/install_cuml.sh
chmod +x install_cuml.sh
./install_cuml.sh

The script will:

  • Install Miniconda (if needed)
  • Create ign_gpu conda environment
  • Install RAPIDS cuML + all dependencies
  • Configure CUDA paths

Verifying Installation​

# Check GPU detection
ign-lidar-hd --version

# Test GPU processing (use a small tile)
ign-lidar-hd enrich --input test.laz --output test_enriched.laz --use-gpu

📖 Usage Guide

Command Line Interface​

The easiest way to use GPU acceleration is via the CLI:

# Basic GPU processing
ign-lidar-hd enrich --input-dir data/ --output enriched/ --use-gpu

# Full-featured GPU processing with all options
ign-lidar-hd enrich \
--input-dir data/ \
--output enriched/ \
--use-gpu \
--auto-params \
--preprocess \
--add-rgb \
--add-infrared \
--rgb-cache-dir cache/rgb \
--infrared-cache-dir cache/infrared

# Process specific tiles
ign-lidar-hd enrich \
--input tile1.laz tile2.laz \
--output enriched/ \
--use-gpu \
--force # Reprocess even if outputs exist

Python API​

from ign_lidar import LiDARProcessor

# Initialize with GPU support
processor = LiDARProcessor(
    lod_level="LOD2",
    use_gpu=True,
    num_workers=4
)

# Process a single tile
patches = processor.process_tile(
    "data/tile.laz",
    "output/",
    enable_rgb=True
)

# Process directory with GPU
patches = processor.process_directory(
    "data/",
    "output/",
    num_workers=4
)

Pipeline Configuration (YAML)​

global:
  num_workers: 4

enrich:
  input_dir: "data/raw"
  output: "data/enriched"
  use_gpu: true
  auto_params: true
  preprocess: true
  add_rgb: true
  add_infrared: true
  rgb_cache_dir: "cache/rgb"
  infrared_cache_dir: "cache/infrared"

patch:
  input_dir: "data/enriched"
  output: "data/patches"
  lod_level: "LOD2"

Then run: ign-lidar-hd pipeline config.yaml

πŸ› Troubleshooting​

Common Issues​

GPU Not Detected​

Symptoms: Message "GPU not available, falling back to CPU"

Solutions:

# 1. Check if GPU is visible
nvidia-smi

# 2. Verify CUDA installation
python -c "import cupy as cp; print(cp.cuda.runtime.getDeviceCount())"

# 3. Check CUDA version compatibility
python -c "import cupy; print('CuPy CUDA version:', cupy.cuda.runtime.runtimeGetVersion())"

# 4. Verify LD_LIBRARY_PATH (Linux/WSL2)
echo $LD_LIBRARY_PATH # Should include /usr/local/cuda-XX.X/lib64

CuPy Installation Issues​

Problem: CuPy not finding CUDA libraries

WSL2 Solution:

# Install CUDA Toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-13-0

# Add to ~/.zshrc or ~/.bashrc
export PATH=/usr/local/cuda-13.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH

# Reload and test
source ~/.zshrc
python -c "import cupy; print('CuPy working!')"

RAPIDS cuML Installation Issues​

Problem: Conda TOS errors during installation

Solution:

# Accept conda Terms of Service
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Then retry installation
conda install -c rapidsai -c conda-forge -c nvidia cuml=24.10 -y

CUDA Out of Memory​

Symptoms: RuntimeError: CUDA out of memory

Solutions:

  1. Process smaller tiles: Split large files into smaller chunks
  2. Reduce chunk size: The processor automatically chunks large point clouds (a manual retry pattern is sketched below)
  3. Close other GPU applications: Free up VRAM
  4. Use a GPU with more memory: 16GB+ recommended for large tiles

# Monitor GPU memory usage
watch -n 1 nvidia-smi
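
If you drive GPU code from Python yourself, the same remedy can be automated: catch CuPy's out-of-memory error, release the memory pool, and retry with smaller chunks. A hedged sketch (the chunk size and processing function are placeholders, not part of the library's API):

# Retry-with-smaller-chunks pattern for GPU out-of-memory errors (illustrative only)
import cupy as cp
import numpy as np

def process_chunk_gpu(chunk: np.ndarray) -> np.ndarray:
    gpu_chunk = cp.asarray(chunk)        # placeholder for any GPU computation on a chunk
    return cp.asnumpy(gpu_chunk * 2.0)

def process_with_fallback(points: np.ndarray, chunk_size: int = 5_000_000) -> np.ndarray:
    while True:
        try:
            parts = [process_chunk_gpu(points[i:i + chunk_size])
                     for i in range(0, len(points), chunk_size)]
            return np.concatenate(parts)
        except cp.cuda.memory.OutOfMemoryError:
            cp.get_default_memory_pool().free_all_blocks()   # release cached GPU memory
            chunk_size //= 2                                  # retry with smaller chunks
            if chunk_size == 0:
                raise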

Slow Performance Despite GPU​

Possible causes:

  1. Using Hybrid Mode instead of Full GPU: Install RAPIDS cuML for maximum speed
  2. Thermal throttling: Check GPU temperature with nvidia-smi
  3. PCIe bandwidth: Ensure GPU is in x16 slot
  4. CPU bottleneck: Use --num-workers to parallelize I/O

Check GPU utilization:

# Monitor GPU usage during processing
nvidia-smi dmon -s u

Per-Chunk vs Global KDTree​

The system automatically selects the best strategy:

  • With RAPIDS cuML: Uses global KDTree on GPU (fastest, 15-20x speedup)
  • Without cuML: Uses per-chunk KDTree with CPU sklearn (still fast, 5-10x speedup)

You'll see different log messages:

# With cuML (fastest)
✓ RAPIDS cuML available - GPU algorithms enabled
Computing normals with GPU-accelerated KDTree (global)

# Without cuML (still fast)
⚠ RAPIDS cuML not available - using per-chunk CPU KDTree
Computing normals with per-chunk KDTree (5% overlap)

Automatic CPU Fallback​

The system automatically falls back to CPU processing if GPU is unavailable:

  • CuPy import fails β†’ CPU mode
  • CUDA runtime error β†’ CPU mode
  • Insufficient GPU memory β†’ CPU mode (with warning)
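
Conceptually, the fallback is just a guarded import plus a runtime check; a minimal sketch of the pattern (not the library's exact code):

# Conceptual GPU detection with CPU fallback (not the library's exact code)
import numpy as np

try:
    import cupy as cp
    GPU_AVAILABLE = cp.cuda.runtime.getDeviceCount() > 0
except Exception:                      # CuPy missing or CUDA runtime error
    cp = None
    GPU_AVAILABLE = False

xp = cp if GPU_AVAILABLE else np       # same array API on either backend
data = xp.ones((1000, 3))
print("Running on GPU" if GPU_AVAILABLE else "Falling back to CPU")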

Disabling GPU (force CPU):

ign-lidar-hd enrich --input-dir data/ --output enriched/  # No --use-gpu flag

📋 Detailed Benchmarks

Test Environment​

  • GPU: NVIDIA RTX 4080 (16GB VRAM)
  • CPU: AMD Ryzen 9 / Intel i7 equivalent
  • System: WSL2 Ubuntu 24.04, 32GB RAM
  • CUDA: 13.0
  • Test Tile: 17M points (typical IGN LiDAR HD tile)

Processing Time Comparison​

| Configuration | Processing Time | Speedup | Notes |
|---------------|-----------------|---------|-------|
| CPU-Only (sklearn) | 60 min | 1x | Baseline |
| Hybrid (CuPy + sklearn) | 7-10 min | 6-8x | Per-chunk KDTree optimization |
| Full GPU (RAPIDS cuML) | 3-5 min | 12-20x | Global GPU KDTree |

Feature Extraction Breakdown​

| Operation | CPU | Hybrid GPU | Full GPU | Best Speedup |
|-----------|-----|------------|----------|--------------|
| Normal Computation | 25 min | 4 min | 1.5 min | 16x |
| KNN Search | 20 min | 12 min | 1 min | 20x |
| PCA (eigenvalues) | 8 min | 6 min | 0.5 min | 16x |
| Curvature Calculation | 5 min | 2 min | 0.5 min | 10x |
| Other Features | 2 min | 1 min | 0.5 min | 4x |

Memory Usage​

| Mode | GPU Memory | System RAM | Total |
|------|------------|------------|-------|
| CPU-Only | 0 GB | 24 GB | 24 GB |
| Hybrid (CuPy + sklearn) | 6 GB | 16 GB | 22 GB |
| Full GPU (RAPIDS cuML) | 8 GB | 12 GB | 20 GB |

Batch Processing (100 tiles)​

  • CPU-Only: ~100 hours
  • Hybrid Mode: ~14 hours (7x speedup)
  • Full GPU Mode: ~6 hours (16x speedup)

Accuracy Validation​

All three modes produce identical results (verified with feature correlation > 0.9999).
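
You can reproduce a similar check on your own tiles by correlating a feature column exported from a CPU run and a GPU run (the .npy file names below are hypothetical):

# Compare the same feature computed in two modes (file names are hypothetical)
import numpy as np

planarity_cpu = np.load("tile_cpu_planarity.npy")
planarity_gpu = np.load("tile_gpu_planarity.npy")

corr = np.corrcoef(planarity_cpu, planarity_gpu)[0, 1]
print(f"Feature correlation: {corr:.6f}")   # expected > 0.9999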

💡 Best Practices

1. Choose the Right Mode​

  • Development/Testing: Hybrid mode (easy setup, good performance)
  • Production: Full GPU mode with RAPIDS cuML (maximum performance)
  • No GPU: CPU mode works fine for small batches

2. Optimize Your Workflow​

# Recommended pipeline configuration for GPU
global:
  num_workers: 4 # Parallelize I/O while GPU processes

enrich:
  use_gpu: true
  auto_params: true # Let the system optimize parameters
  preprocess: true # Clean data before feature extraction

3. Monitor Resources​

# Watch GPU usage in real-time
watch -n 1 nvidia-smi

# Monitor with detailed metrics
nvidia-smi dmon -s pucvmet -d 1

4. Batch Processing Tips​

  • Use --force cautiously: Only reprocess when needed
  • Enable smart caching: Use --rgb-cache-dir and --infrared-cache-dir
  • Parallelize I/O: Use --num-workers for concurrent file operations
  • Process strategically: Start with urban tiles (higher point density) to test settings

5. Hardware Recommendations​

| Use Case | Minimum GPU | Recommended GPU | Optimal GPU |
|----------|-------------|-----------------|-------------|
| Learning/Small datasets | GTX 1660 6GB | RTX 3060 12GB | RTX 4060 Ti 16GB |
| Production/Medium batches | RTX 3060 12GB | RTX 4070 12GB | RTX 4080 16GB |
| Large-scale processing | RTX 3080 10GB | RTX 4080 16GB | A6000 48GB |

🎓 Advanced Topics

Per-Chunk Optimization Strategy​

When RAPIDS cuML is not available, the system uses an intelligent per-chunk strategy:

  1. Splits point cloud into ~5M point chunks
  2. Builds local KDTree per chunk (fast with sklearn)
  3. Uses 5% overlap between chunks to handle edge cases
  4. Merges results seamlessly

This provides 80-90% of GPU performance without requiring RAPIDS cuML installation.
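
A schematic version of the strategy is sketched below. It assumes the cloud is split along a spatial ordering and simplifies the merge step, so treat it as an illustration rather than the library's implementation:

# Simplified per-chunk KDTree with overlapping borders (illustrative only)
import numpy as np
from sklearn.neighbors import KDTree

def per_chunk_knn(points: np.ndarray, k: int = 20,
                  chunk_size: int = 5_000_000, overlap: float = 0.05):
    # Assumes point order roughly follows spatial position; the real chunking is spatial.
    n = len(points)
    pad = int(chunk_size * overlap)               # 5% overlap to cover chunk borders
    all_idx = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        lo, hi = max(0, start - pad), min(n, end + pad)
        tree = KDTree(points[lo:hi])              # local tree over the padded chunk
        _, idx = tree.query(points[start:end], k=k)
        all_idx[start:end] = idx + lo             # map local indices back to global ones
    return all_idx

points = np.random.rand(1_000_000, 3).astype(np.float32)
neighbors = per_chunk_knn(points, k=20, chunk_size=200_000)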

GPU Memory Management​

The system automatically manages GPU memory:

  • Automatic chunking: Large point clouds split into GPU-sized chunks
  • Memory pooling: CuPy reuses allocated memory
  • Garbage collection: Frees memory between tiles
  • Fallback handling: Gracefully handles OOM errors
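
If you embed the processor in a long-running Python session, CuPy's standard memory-pool API lets you inspect and release pooled GPU memory between tiles yourself:

# Inspect and release CuPy's pooled GPU memory between tiles
import cupy as cp

pool = cp.get_default_memory_pool()
pinned = cp.get_default_pinned_memory_pool()

print(f"Used: {pool.used_bytes() / 1e9:.2f} GB")
print(f"Held: {pool.total_bytes() / 1e9:.2f} GB")   # includes cached, reusable blocks

pool.free_all_blocks()     # return cached device memory to the driver
pinned.free_all_blocks()   # return cached pinned host memory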

Multi-GPU Support​

Currently, the library uses a single GPU (device 0). For multi-GPU processing:

# Process different directories on different GPUs
CUDA_VISIBLE_DEVICES=0 ign-lidar-hd enrich --input dir1/ --output out1/ --use-gpu &
CUDA_VISIBLE_DEVICES=1 ign-lidar-hd enrich --input dir2/ --output out2/ --use-gpu &
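
The same pattern can be scripted from Python by launching one worker process per GPU with a different CUDA_VISIBLE_DEVICES value (the directories below are placeholders):

# Launch one enrich job per GPU by overriding CUDA_VISIBLE_DEVICES (paths are placeholders)
import os
import subprocess

jobs = [("0", "dir1/", "out1/"), ("1", "dir2/", "out2/")]
procs = []
for gpu_id, input_dir, output_dir in jobs:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)   # pin this job to one GPU
    procs.append(subprocess.Popen(
        ["ign-lidar-hd", "enrich", "--input", input_dir,
         "--output", output_dir, "--use-gpu"],
        env=env,
    ))

for p in procs:
    p.wait()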

For more advanced GPU optimization techniques, see the Performance Guide.