GPU Acceleration Overview
Available in: v1.3.0+
Performance Boost: 5-10x faster than CPU
Requirements: NVIDIA GPU with CUDA 11.0+
🚧 Major GPU Enhancement in Progress - We're implementing comprehensive GPU acceleration across the entire pipeline. See our detailed roadmap in the "Future Development" section below for upcoming features.
Overview
GPU acceleration can provide 4-10x speedup for feature computation compared to CPU processing, making it essential for large-scale LiDAR datasets and production pipelines.
Benefits
- ⚡ 4-10x faster feature computation
- 🔄 Automatic CPU fallback when GPU unavailable
- 📦 No code changes required - just add a flag
- 🎯 Production-ready with comprehensive error handling
- 💾 Memory efficient with smart batching
GPU acceleration is most beneficial for point clouds with >100K points. For smaller datasets, CPU processing may be faster due to GPU initialization overhead.
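As a rough illustration of that threshold, the sketch below reads the point count from a LAZ header and enables the GPU only for larger clouds. It assumes `laspy` is available for reading headers; the 100K cutoff mirrors the guidance above, and `choose_processor` is a hypothetical helper, not part of the library.

```python
from pathlib import Path

import laspy  # assumption: laspy is installed for reading LAZ headers

from ign_lidar.processor import LiDARProcessor

GPU_POINT_THRESHOLD = 100_000  # below this, CPU is usually faster


def choose_processor(laz_file: Path) -> LiDARProcessor:
    """Hypothetical helper: enable the GPU only for large point clouds."""
    with laspy.open(laz_file) as reader:
        num_points = reader.header.point_count
    return LiDARProcessor(use_gpu=num_points >= GPU_POINT_THRESHOLD)


processor = choose_processor(Path("data/tiles/tile.laz"))
```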
Requirements
Hardware Requirements
- GPU: NVIDIA GPU with CUDA support
- Memory: 4GB+ GPU RAM recommended (8GB+ for large tiles)
- Compute Capability: 3.5 or higher
Software Requirements
- CUDA Toolkit: 11.0 or higher (11.8 or 12.x recommended)
- Python: 3.8 or higher
- Python packages: CuPy (required), RAPIDS cuML (optional, better performance)
Tested GPU Models
| GPU Model | Memory | Performance | Notes |
|---|---|---|---|
| RTX 4090 | 24 GB | Excellent | Best performance |
| RTX 3080 | 10 GB | Very Good | Good price/performance |
| RTX 3060 | 12 GB | Good | Budget-friendly |
| Tesla V100 | 16 GB | Very Good | Server/cloud |
| GTX 1080 Ti | 11 GB | Moderate | Older generation |
Installation
Step 1: Check CUDA Availability
First, verify you have an NVIDIA GPU and CUDA installed:
```bash
# Check if you have an NVIDIA GPU
nvidia-smi
# Should show your GPU info and CUDA version
```

If `nvidia-smi` is not found, you need to install the NVIDIA drivers and CUDA Toolkit first.
Step 2: Install CUDA Toolkit
Visit the NVIDIA CUDA Downloads page and follow the instructions for your OS.
Recommended versions:
- CUDA 11.8 (most compatible)
- CUDA 12.x (latest features)
GPU acceleration works on WSL2! Requirements:
- Windows 11 or Windows 10 21H2+
- NVIDIA drivers installed on Windows
- CUDA toolkit installed in WSL2
See NVIDIA WSL guide for details.
Step 3: Install Python GPU Dependencies
CuPy must be installed separately, as it requires a specific version matching your CUDA Toolkit. Installing via `pip install ign-lidar-hd[gpu]` will not work, as it would attempt to build CuPy from source.
```bash
# Option 1: Basic GPU support with CuPy (recommended for most users)
pip install ign-lidar-hd
pip install cupy-cuda11x  # For CUDA 11.x
# OR
pip install cupy-cuda12x  # For CUDA 12.x

# Option 2: Advanced GPU with RAPIDS cuML (best performance)
pip install ign-lidar-hd
pip install cupy-cuda12x  # Choose based on your CUDA version
conda install -c rapidsai -c conda-forge -c nvidia cuml

# Option 3: RAPIDS via pip (may require more configuration)
pip install ign-lidar-hd
pip install cupy-cuda11x  # For CUDA 11.x
pip install cuml-cu11     # For CUDA 11.x
# OR
pip install cupy-cuda12x  # For CUDA 12.x
pip install cuml-cu12     # For CUDA 12.x
```
Installation Recommendations:
- Install CuPy separately: Always choose `cupy-cuda11x` or `cupy-cuda12x` based on your CUDA version
- CuPy only: Simplest installation, 5-6x speedup
- CuPy + RAPIDS: Best performance, up to 10x speedup
- Conda for RAPIDS: More reliable for RAPIDS cuML dependencies
Step 4: Verify Installation
```python
from ign_lidar.features_gpu import GPU_AVAILABLE, CUML_AVAILABLE

print(f"GPU (CuPy) available: {GPU_AVAILABLE}")
print(f"RAPIDS cuML available: {CUML_AVAILABLE}")
```

Expected output:

```
GPU (CuPy) available: True
RAPIDS cuML available: True
```
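Beyond the import flags, you can confirm the GPU is actually usable by running a tiny computation through CuPy directly. This is a minimal sketch using standard CuPy calls, independent of ign-lidar-hd:

```python
import cupy as cp

# Basic device information
print(f"CUDA devices visible: {cp.cuda.runtime.getDeviceCount()}")
print(f"Compute capability: {cp.cuda.Device(0).compute_capability}")

# Tiny round-trip computation: host -> GPU -> host
x = cp.arange(1_000_000)
print(f"Sum computed on GPU: {int(x.sum())}")
```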
Quick Start
Command Line Interface
Simply add the `--use-gpu` flag to any `enrich` command:
```bash
# Basic usage
ign-lidar-hd enrich \
  --input tiles/ \
  --output enriched/ \
  --use-gpu

# With additional options
ign-lidar-hd enrich \
  --input tiles/ \
  --output enriched/ \
  --use-gpu \
  --mode full \
  --num-workers 4
```
The `--use-gpu` flag automatically falls back to CPU if no GPU is available, and processing continues without errors.
Python API
Using LiDARProcessor
```python
from pathlib import Path

from ign_lidar.processor import LiDARProcessor

# Create processor with GPU acceleration
processor = LiDARProcessor(
    lod_level='LOD2',
    patch_size=150.0,
    num_points=16384,
    use_gpu=True  # ⚡ Enable GPU
)

# Process tiles - automatic GPU acceleration
num_patches = processor.process_tile(
    laz_file=Path("data/tiles/tile.laz"),
    output_dir=Path("data/patches")
)

print(f"Created {num_patches} patches using GPU")
```
Direct Feature Computation
```python
import numpy as np

from ign_lidar.features import compute_all_features_with_gpu

# Load your point cloud
points = np.random.rand(1000000, 3).astype(np.float32)
classification = np.random.randint(0, 10, 1000000).astype(np.uint8)

# Compute features with GPU
normals, curvature, height, geo_features = compute_all_features_with_gpu(
    points=points,
    classification=classification,
    k=10,
    auto_k=False,
    use_gpu=True  # Enables GPU
)

print(f"Computed {len(normals)} normals on GPU")
```
Configuration
Python Configuration
```python
from ign_lidar import Config

config = Config(
    use_gpu=True,
    gpu_memory_limit=0.8,  # Use 80% of GPU memory
    cuda_device=0          # Use first GPU (if multiple)
)
```
Environment Variables
```bash
# Specify CUDA device (if multiple GPUs)
export CUDA_VISIBLE_DEVICES=0

# Limit GPU memory usage
export CUPY_GPU_MEMORY_LIMIT="8GB"
```

The same settings can be applied from Python, as long as they are set before importing ign_lidar:

```python
import os

# Set before importing ign_lidar
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from ign_lidar.processor import LiDARProcessor
```
When to Use GPU
✅ Use GPU for:
- Large point clouds (>100K points)
- Batch processing of many tiles
- Production pipelines requiring speed
- Real-time or interactive applications
- Processing 10+ tiles
❌ Use CPU for:
- Small point clouds (<10K points)
- One-off processing tasks
- Systems without NVIDIA GPU
- Prototyping and debugging
- Quick tests with 1-2 tiles
Decision Tree
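The decision flow above can be written as a small helper. This is an illustrative sketch only: `should_use_gpu` is not part of the library, and the thresholds simply mirror the lists above.

```python
def should_use_gpu(num_points: int, num_tiles: int, gpu_available: bool) -> bool:
    """Illustrative sketch of the GPU/CPU decision above (not a library function)."""
    if not gpu_available:
        return False  # no NVIDIA GPU: CPU only
    if num_points < 10_000:
        return False  # GPU initialization overhead dominates small clouds
    if num_points > 100_000 or num_tiles >= 10:
        return True   # large clouds or big batches benefit most
    return False      # quick tests and prototyping: CPU is simpler


# Example: a 2M-point tile in a 50-tile batch, with a GPU present
print(should_use_gpu(num_points=2_000_000, num_tiles=50, gpu_available=True))  # True
```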
Performance Benchmarks
Expected Speedups
Based on testing with various GPUs:
| Point Count | CPU (12 cores) | GPU (RTX 3080) | Speedup |
|---|---|---|---|
| 1K points | 0.02s | 0.01s | 2x |
| 10K points | 0.15s | 0.03s | 5x |
| 100K points | 0.50s | 0.08s | 6.3x |
| 1M points | 4.5s | 0.8s | 5.6x |
| 10M points | 45s | 8s | 5.6x |
Factors affecting performance:
- GPU model and memory
- Point cloud density and distribution
- K-neighbors parameter (larger = more computation)
- CPU baseline (more cores = smaller relative speedup)
Benchmarking Your System
Use the included benchmark script to test GPU vs CPU performance:
```bash
# Quick synthetic benchmark
python scripts/benchmarks/benchmark_gpu.py --synthetic

# Benchmark with real data
python scripts/benchmarks/benchmark_gpu.py path/to/file.laz

# Comprehensive multi-size benchmark
python scripts/benchmarks/benchmark_gpu.py --multi-size
```
Best Practices
Optimizing GPU Performance
- Batch processing: Process multiple tiles in sequence to amortize GPU initialization overhead (see the sketch after this list)
- Appropriate k-neighbors: Larger k means more computation, and therefore more benefit from the GPU
- Monitor memory: Use `nvidia-smi` to check GPU memory usage
- Use workers=1 with GPU: The GPU already parallelizes internally; multiple workers may compete for GPU resources
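A minimal sketch of the batch-processing pattern, assuming default constructor arguments and placeholder paths: one processor instance is created up front so GPU initialization is paid only once, and tiles are processed sequentially because the GPU already parallelizes internally.

```python
from pathlib import Path

from ign_lidar.processor import LiDARProcessor

tiles_dir = Path("data/tiles")     # placeholder input directory
output_dir = Path("data/patches")  # placeholder output directory

# Create the processor once so GPU initialization is amortized across tiles
processor = LiDARProcessor(use_gpu=True)

total_patches = 0
for laz_file in sorted(tiles_dir.glob("*.laz")):
    # Sequential loop: no worker pool competing for the GPU
    total_patches += processor.process_tile(laz_file=laz_file, output_dir=output_dir)

print(f"Created {total_patches} patches from {tiles_dir}")
```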
Error Handling
The library handles GPU errors gracefully:
```python
from ign_lidar.processor import LiDARProcessor

# Automatic CPU fallback
processor = LiDARProcessor(use_gpu=True)

# If the GPU fails or is unavailable:
# - a warning is logged
# - processing automatically falls back to CPU
# - the run continues successfully
```
Monitoring GPU Usage
Monitor GPU utilization during processing:
```bash
# One-time check
nvidia-smi

# Continuous monitoring (updates every second)
watch -n 1 nvidia-smi

# Built-in alternative: refresh every second
nvidia-smi -l 1
```
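You can also check memory from inside Python using CuPy's runtime API and memory pool; a short sketch with standard CuPy calls:

```python
import cupy as cp

# Free/total device memory as reported by the CUDA runtime (in bytes)
free_bytes, total_bytes = cp.cuda.runtime.memGetInfo()
print(f"Device memory: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")

# Memory currently held by CuPy's pool (allocations are cached and reused)
pool = cp.get_default_memory_pool()
print(f"CuPy pool: {pool.used_bytes() / 1e9:.2f} GB used, "
      f"{pool.total_bytes() / 1e9:.2f} GB reserved")
```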
Troubleshooting
"GPU requested but CuPy not available"
Problem: CuPy is not installed or CUDA version mismatch.
Solution:
```bash
# Check CUDA version
nvidia-smi

# Install matching CuPy version
pip install cupy-cuda11x  # for CUDA 11.x
# OR
pip install cupy-cuda12x  # for CUDA 12.x
```
"Out of memory" error
Problem: GPU memory insufficient for point cloud size.
Solutions:
- Process tiles in smaller batches
- Reduce the batch size of the GPU feature computer (see below)
- Use CPU for very large tiles

```python
# Reduce batch size for large tiles
from ign_lidar.features_gpu import GPUFeatureComputer

computer = GPUFeatureComputer(use_gpu=True, batch_size=50000)
```
Slow performance on GPU
Possible causes:
- GPU not utilized: Check with `nvidia-smi`
- Small point clouds: GPU overhead dominates (use CPU for <10K points)
- Memory transfer bottleneck: Batch multiple operations together
Solutions:
```bash
# Monitor GPU usage while processing
watch -n 1 nvidia-smi

# Use GPU for large batches only
# (automatically handled by the library)
```
CuPy import warnings
Problem: Warnings about CUDA version or cuBLAS libraries.
Solution: Usually safe to ignore if operations complete successfully. To suppress:
```python
import warnings

warnings.filterwarnings('ignore', category=UserWarning, module='cupy')
```
FAQ
Q: Can I use AMD GPUs?
A: Currently only NVIDIA GPUs with CUDA are supported. AMD ROCm support may be added in future versions.
Q: Does GPU work on WSL2?
A: Yes! CUDA support in WSL2 requires:
- Windows 11 or Windows 10 21H2+
- NVIDIA drivers installed on Windows
- CUDA toolkit installed in WSL2
See the NVIDIA WSL guide for details.
Q: What about Google Colab / Kaggle?
A: Yes, works great in cloud notebooks with GPU runtime. Example:
```python
# Install in Colab (install CuPy separately, matching the runtime's CUDA version)
!pip install ign-lidar-hd
!pip install cupy-cuda12x

# Use GPU (automatically detected)
from ign_lidar.processor import LiDARProcessor

processor = LiDARProcessor(use_gpu=True)
```
Q: Does this work with TensorFlow/PyTorch?
A: Yes, CuPy and TensorFlow/PyTorch can coexist. They share GPU memory. Monitor usage to avoid OOM errors.
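If you do share one GPU between CuPy-based feature computation and a deep learning framework, one practical lever is to release CuPy's cached blocks before handing memory-heavy work to the other framework. A short sketch using standard CuPy calls:

```python
import cupy as cp

# Return cached device and pinned-host blocks to the driver so another
# framework (e.g. PyTorch) can allocate that memory afterwards.
cp.get_default_memory_pool().free_all_blocks()
cp.get_default_pinned_memory_pool().free_all_blocks()
```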
Q: Can I mix CPU and GPU processing?
A: Yes! Use `use_gpu=True` for feature computation; other operations (I/O, patch extraction) remain on CPU for optimal performance.
Version Compatibility
| ign-lidar-hd | CuPy | CUDA | Python |
|---|---|---|---|
| 1.5.0+ | 10.0+ | 11.0 - 12.x | 3.8+ |
| 1.3.0+ | 10.0+ | 11.0 - 12.x | 3.8+ |
| 1.2.1+ | 10.0+ | 11.0+ | 3.8+ |
🚀 Future Development
We're continuously expanding GPU acceleration capabilities:
Phase 3: Advanced GPU Pipeline (In Progress)
- Universal GPU Processing: Full pipeline GPU acceleration
- Multi-GPU Support: Distributed processing across multiple GPUs
- Advanced Algorithms: GPU-based spatial indexing and neighborhood search
- Memory Optimization: Advanced memory pooling and streaming
- Performance Analytics: Real-time GPU performance monitoring
Expected Timeline: Rolling releases throughout 2024-2025
Upcoming Features
- 🔄 GPU Memory Pooling: Reduce allocation overhead
- 📊 GPU Performance Dashboard: Real-time monitoring
- 🌐 Multi-GPU Processing: Parallel tile processing
- ⚡ Streaming Processing: Handle datasets larger than GPU memory
- 🎯 Auto-GPU Selection: Intelligent GPU/CPU task distribution
Follow our GitHub repository for the latest GPU acceleration developments and release announcements.
See Also
- GPU Features - Detailed feature computation and API reference
- RGB GPU Acceleration - GPU-accelerated RGB augmentation (v1.5.0+)
- Architecture - System architecture
- Workflows - GPU workflow examples