GPU Acceleration Overview
Available in: v1.3.0+
Performance Boost: 5-10x faster than CPU
Requirements: NVIDIA GPU with CUDA 11.0+
🚧 Major GPU Enhancement in Progress - We're implementing comprehensive GPU acceleration across the entire pipeline. See our detailed roadmap in the "Future Development" section below for upcoming features.
Overview
GPU acceleration can provide 4-10x speedup for feature computation compared to CPU processing, making it essential for large-scale LiDAR datasets and production pipelines.
Benefits
- ⚡ 4-10x faster feature computation
- 🔄 Automatic CPU fallback when GPU unavailable
- 📦 No code changes required - just add a flag
- 🎯 Production-ready with comprehensive error handling
- 💾 Memory efficient with smart batching
GPU acceleration is most beneficial for point clouds with >100K points. For smaller datasets, CPU processing may be faster due to GPU initialization overhead.
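As a rough illustration of that threshold, the sketch below reads the point count from a LAZ header and enables the GPU only for larger clouds. It assumes `laspy` is available for reading headers; the 100K cutoff mirrors the guidance above, and `choose_processor` is a hypothetical helper, not part of the library.

```python
from pathlib import Path

import laspy  # assumption: laspy is installed for reading LAZ headers

from ign_lidar.processor import LiDARProcessor

GPU_POINT_THRESHOLD = 100_000  # below this, CPU is usually faster


def choose_processor(laz_file: Path) -> LiDARProcessor:
    """Hypothetical helper: enable the GPU only for large point clouds."""
    with laspy.open(laz_file) as reader:
        num_points = reader.header.point_count
    return LiDARProcessor(use_gpu=num_points >= GPU_POINT_THRESHOLD)


processor = choose_processor(Path("data/tiles/tile.laz"))
```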
Requirements
Hardware Requirements
- GPU: NVIDIA GPU with CUDA support
- Memory: 4GB+ GPU RAM recommended (8GB+ for large tiles)
- Compute Capability: 3.5 or higher
Software Requirements
- CUDA Toolkit: 11.0 or higher (11.8 or 12.x recommended)
- Python: 3.8 or higher
- Python packages: CuPy (required), RAPIDS cuML (optional, better performance)
Tested GPU Models
| GPU Model | Memory | Performance | Notes |
|---|---|---|---|
| RTX 4090 | 24 GB | Excellent | Best performance |
| RTX 3080 | 10 GB | Very Good | Good price/performance |
| RTX 3060 | 12 GB | Good | Budget-friendly |
| Tesla V100 | 16 GB | Very Good | Server/cloud |
| GTX 1080 Ti | 11 GB | Moderate | Older generation |
Installation
Step 1: Check CUDA Availability
First, verify you have an NVIDIA GPU and CUDA installed:
```bash
# Check if you have an NVIDIA GPU
nvidia-smi
# Should show your GPU info and CUDA version
```

If `nvidia-smi` is not found, you need to install the NVIDIA drivers and CUDA Toolkit first.
Step 2: Install CUDA Toolkit
Visit the NVIDIA CUDA Downloads page and follow the instructions for your OS.
Recommended versions:
- CUDA 11.8 (most compatible)
- CUDA 12.x (latest features)
GPU acceleration works on WSL2! Requirements:
- Windows 11 or Windows 10 21H2+
- NVIDIA drivers installed on Windows
- CUDA toolkit installed in WSL2
See NVIDIA WSL guide for details.
Step 3: Install Python GPU Dependencies
CuPy must be installed separately, as it requires a specific version matching your CUDA Toolkit. Installing via `pip install ign-lidar-hd[gpu]` will not work, as it would attempt to build CuPy from source.
```bash
# Option 1: Basic GPU support with CuPy (recommended for most users)
pip install ign-lidar-hd
pip install cupy-cuda11x  # For CUDA 11.x
# OR
pip install cupy-cuda12x  # For CUDA 12.x

# Option 2: Advanced GPU with RAPIDS cuML (best performance)
pip install ign-lidar-hd
pip install cupy-cuda12x  # Choose based on your CUDA version
conda install -c rapidsai -c conda-forge -c nvidia cuml

# Option 3: RAPIDS via pip (may require more configuration)
pip install ign-lidar-hd
pip install cupy-cuda11x  # For CUDA 11.x
pip install cuml-cu11     # For CUDA 11.x
# OR
pip install cupy-cuda12x  # For CUDA 12.x
pip install cuml-cu12     # For CUDA 12.x
```
Installation Recommendations:
- Install CuPy separately: Always choose `cupy-cuda11x` or `cupy-cuda12x` based on your CUDA version
- CuPy only: Simplest installation, 5-6x speedup
- CuPy + RAPIDS: Best performance, up to 10x speedup
- Conda for RAPIDS: More reliable for RAPIDS cuML dependencies
Step 4: Verify Installation
```python
from ign_lidar.features_gpu import GPU_AVAILABLE, CUML_AVAILABLE

print(f"GPU (CuPy) available: {GPU_AVAILABLE}")
print(f"RAPIDS cuML available: {CUML_AVAILABLE}")
```

Expected output:

```
GPU (CuPy) available: True
RAPIDS cuML available: True
```
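Beyond the import flags, you can confirm the GPU is actually usable by running a tiny computation through CuPy directly. This is a minimal sketch using standard CuPy calls, independent of ign-lidar-hd:

```python
import cupy as cp

# Basic device information
print(f"CUDA devices visible: {cp.cuda.runtime.getDeviceCount()}")
print(f"Compute capability: {cp.cuda.Device(0).compute_capability}")

# Tiny round-trip computation: host -> GPU -> host
x = cp.arange(1_000_000)
print(f"Sum computed on GPU: {int(x.sum())}")
```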
Quick Start
Command Line Interface
Simply add the `--use-gpu` flag to any `enrich` command:
```bash
# Basic usage
ign-lidar-hd enrich \
  --input tiles/ \
  --output enriched/ \
  --use-gpu

# With additional options
ign-lidar-hd enrich \
  --input tiles/ \
  --output enriched/ \
  --use-gpu \
  --mode full \
  --num-workers 4
```
The `--use-gpu` flag automatically falls back to CPU if no GPU is available, and processing continues without errors.
Python API
Using LiDARProcessor
```python
from pathlib import Path

from ign_lidar.processor import LiDARProcessor

# Create processor with GPU acceleration
processor = LiDARProcessor(
    lod_level='LOD2',
    patch_size=150.0,
    num_points=16384,
    use_gpu=True  # ⚡ Enable GPU
)

# Process tiles - automatic GPU acceleration
num_patches = processor.process_tile(
    laz_file=Path("data/tiles/tile.laz"),
    output_dir=Path("data/patches")
)

print(f"Created {num_patches} patches using GPU")
```
Direct Feature Computation
```python
import numpy as np

from ign_lidar.features import compute_all_features_with_gpu

# Load your point cloud
points = np.random.rand(1000000, 3).astype(np.float32)
classification = np.random.randint(0, 10, 1000000).astype(np.uint8)

# Compute features with GPU
normals, curvature, height, geo_features = compute_all_features_with_gpu(
    points=points,
    classification=classification,
    k=10,
    auto_k=False,
    use_gpu=True  # Enables GPU
)

print(f"Computed {len(normals)} normals on GPU")
```
Configuration
Python Configuration
```python
from ign_lidar import Config

config = Config(
    use_gpu=True,
    gpu_memory_limit=0.8,  # Use 80% of GPU memory
    cuda_device=0          # Use first GPU (if multiple)
)
```
Environment Variables
```bash
# Specify CUDA device (if multiple GPUs)
export CUDA_VISIBLE_DEVICES=0

# Limit GPU memory usage
export CUPY_GPU_MEMORY_LIMIT="8GB"
```

The same settings can be applied from Python, as long as they are set before importing ign_lidar:

```python
import os

# Set before importing ign_lidar
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from ign_lidar.processor import LiDARProcessor
```
When to Use GPU
✅ Use GPU for:
- Large point clouds (>100K points)
- Batch processing of many tiles
- Production pipelines requiring speed
- Real-time or interactive applications
- Processing 10+ tiles
❌ Use CPU for:
- Small point clouds (<10K points)
- One-off processing tasks
- Systems without NVIDIA GPU
- Prototyping and debugging
- Quick tests with 1-2 tiles
Decision Tree
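The decision flow above can be written as a small helper. This is an illustrative sketch only: `should_use_gpu` is not part of the library, and the thresholds simply mirror the lists above.

```python
def should_use_gpu(num_points: int, num_tiles: int, gpu_available: bool) -> bool:
    """Illustrative sketch of the GPU/CPU decision above (not a library function)."""
    if not gpu_available:
        return False  # no NVIDIA GPU: CPU only
    if num_points < 10_000:
        return False  # GPU initialization overhead dominates small clouds
    if num_points > 100_000 or num_tiles >= 10:
        return True   # large clouds or big batches benefit most
    return False      # quick tests and prototyping: CPU is simpler


# Example: a 2M-point tile in a 50-tile batch, with a GPU present
print(should_use_gpu(num_points=2_000_000, num_tiles=50, gpu_available=True))  # True
```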
Performance Benchmarks
Expected Speedups
Based on testing with various GPUs:
| Point Count | CPU (12 cores) | GPU (RTX 3080) | Speedup |
|---|---|---|---|
| 1K points | 0.02s | 0.01s | 2x |
| 10K points | 0.15s | 0.03s | 5x |
| 100K points | 0.50s | 0.08s | 6.3x |
| 1M points | 4.5s | 0.8s | 5.6x |
| 10M points | 45s | 8s | 5.6x |
Factors affecting performance:
- GPU model and memory
- Point cloud density and distribution
- K-neighbors parameter (larger = more computation)
- CPU baseline (more cores = smaller relative speedup)
Benchmarking Your System
Use the included benchmark script to test GPU vs CPU performance:
```bash
# Quick synthetic benchmark
python scripts/benchmarks/benchmark_gpu.py --synthetic

# Benchmark with real data
python scripts/benchmarks/benchmark_gpu.py path/to/file.laz

# Comprehensive multi-size benchmark
python scripts/benchmarks/benchmark_gpu.py --multi-size
```
Best Practices
Optimizing GPU Performance
- Batch processing: Process multiple tiles in sequence to amortize GPU initialization overhead (see the sketch after this list)
- Appropriate k-neighbors: Larger k means more computation, and therefore more benefit from the GPU
- Monitor memory: Use `nvidia-smi` to check GPU memory usage
- Use workers=1 with GPU: The GPU already parallelizes internally; multiple workers may compete for GPU resources
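A minimal sketch of the batch-processing pattern, assuming default constructor arguments and placeholder paths: one processor instance is created up front so GPU initialization is paid only once, and tiles are processed sequentially because the GPU already parallelizes internally.

```python
from pathlib import Path

from ign_lidar.processor import LiDARProcessor

tiles_dir = Path("data/tiles")     # placeholder input directory
output_dir = Path("data/patches")  # placeholder output directory

# Create the processor once so GPU initialization is amortized across tiles
processor = LiDARProcessor(use_gpu=True)

total_patches = 0
for laz_file in sorted(tiles_dir.glob("*.laz")):
    # Sequential loop: no worker pool competing for the GPU
    total_patches += processor.process_tile(laz_file=laz_file, output_dir=output_dir)

print(f"Created {total_patches} patches from {tiles_dir}")
```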
Error Handling
The library handles GPU errors gracefully:
```python
from ign_lidar.processor import LiDARProcessor

# Automatic CPU fallback
processor = LiDARProcessor(use_gpu=True)

# If the GPU fails or is unavailable:
# - a warning is logged
# - processing automatically falls back to CPU
# - the run continues successfully
```
Monitoring GPU Usage
Monitor GPU utilization during processing:
```bash
# One-time check
nvidia-smi

# Continuous monitoring (updates every second)
watch -n 1 nvidia-smi

# Built-in alternative: refresh every second
nvidia-smi -l 1
```
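You can also check memory from inside Python using CuPy's runtime API and memory pool; a short sketch with standard CuPy calls:

```python
import cupy as cp

# Free/total device memory as reported by the CUDA runtime (in bytes)
free_bytes, total_bytes = cp.cuda.runtime.memGetInfo()
print(f"Device memory: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB")

# Memory currently held by CuPy's pool (allocations are cached and reused)
pool = cp.get_default_memory_pool()
print(f"CuPy pool: {pool.used_bytes() / 1e9:.2f} GB used, "
      f"{pool.total_bytes() / 1e9:.2f} GB reserved")
```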
Troubleshooting
"GPU requested but CuPy not available"
Problem: CuPy is not installed or CUDA version mismatch.
Solution:
```bash
# Check CUDA version
nvidia-smi

# Install matching CuPy version
pip install cupy-cuda11x  # for CUDA 11.x
# OR
pip install cupy-cuda12x  # for CUDA 12.x
```
"Out of memory" error
Problem: GPU memory insufficient for point cloud size.
Solutions:
- Process tiles in smaller batches
- Reduce the batch size of the GPU feature computer (see below)
- Use CPU for very large tiles

```python
# Reduce batch size for large tiles
from ign_lidar.features_gpu import GPUFeatureComputer

computer = GPUFeatureComputer(use_gpu=True, batch_size=50000)
```
Slow performance on GPU
Possible causes:
- GPU not utilized: Check with `nvidia-smi`
- Small point clouds: GPU overhead dominates (use CPU for <10K points)
- Memory transfer bottleneck: Batch multiple operations together
Solutions:
```bash
# Monitor GPU usage while processing
watch -n 1 nvidia-smi

# Use GPU for large batches only
# (automatically handled by the library)
```
CuPy import warnings
Problem: Warnings about CUDA version or cuBLAS libraries.
Solution: Usually safe to ignore if operations complete successfully. To suppress:
```python
import warnings

warnings.filterwarnings('ignore', category=UserWarning, module='cupy')
```
FAQ
Q: Can I use AMD GPUs?
A: Currently only NVIDIA GPUs with CUDA are supported. AMD ROCm support may be added in future versions.
Q: Does GPU work on WSL2?
A: Yes! CUDA support in WSL2 requires:
- Windows 11 or Windows 10 21H2+
- NVIDIA drivers installed on Windows
- CUDA toolkit installed in WSL2
See the NVIDIA WSL guide for details.
Q: What about Google Colab / Kaggle?
A: Yes, works great in cloud notebooks with GPU runtime. Example:
```python
# Install in Colab (install CuPy separately, matching the runtime's CUDA version)
!pip install ign-lidar-hd
!pip install cupy-cuda12x

# Use GPU (automatically detected)
from ign_lidar.processor import LiDARProcessor

processor = LiDARProcessor(use_gpu=True)
```
Q: Does this work with TensorFlow/PyTorch?
A: Yes, CuPy and TensorFlow/PyTorch can coexist. They share GPU memory. Monitor usage to avoid OOM errors.
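If you do share one GPU between CuPy-based feature computation and a deep learning framework, one practical lever is to release CuPy's cached blocks before handing memory-heavy work to the other framework. A short sketch using standard CuPy calls:

```python
import cupy as cp

# Return cached device and pinned-host blocks to the driver so another
# framework (e.g. PyTorch) can allocate that memory afterwards.
cp.get_default_memory_pool().free_all_blocks()
cp.get_default_pinned_memory_pool().free_all_blocks()
```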
Q: Can I mix CPU and GPU processing?
A: Yes! Use `use_gpu=True` for feature computation; other operations (I/O, patch extraction) remain on CPU for optimal performance.
Version Compatibility
| ign-lidar-hd | CuPy | CUDA | Python |
|---|---|---|---|
| 1.5.0+ | 10.0+ | 11.0 - 12.x | 3.8+ |
| 1.3.0+ | 10.0+ | 11.0 - 12.x | 3.8+ |
| 1.2.1+ | 10.0+ | 11.0+ | 3.8+ |
🚀 Future Development
We're continuously expanding GPU acceleration capabilities:
Phase 3: Advanced GPU Pipeline (In Progress)
- Universal GPU Processing: Full pipeline GPU acceleration
- Multi-GPU Support: Distributed processing across multiple GPUs
- Advanced Algorithms: GPU-based spatial indexing and neighborhood search
- Memory Optimization: Advanced memory pooling and streaming
- Performance Analytics: Real-time GPU performance monitoring
Expected Timeline: Rolling releases throughout 2024-2025
Upcoming Features
- 🔄 GPU Memory Pooling: Reduce allocation overhead
- 📊 GPU Performance Dashboard: Real-time monitoring
- 🌐 Multi-GPU Processing: Parallel tile processing
- ⚡ Streaming Processing: Handle datasets larger than GPU memory
- 🎯 Auto-GPU Selection: Intelligent GPU/CPU task distribution
Follow our GitHub repository for the latest GPU acceleration developments and release announcements.
See Also
- GPU Features - Detailed feature computation and API reference
- RGB GPU Acceleration - GPU-accelerated RGB augmentation (v1.5.0+)
- Architecture - System architecture
- Workflows - GPU workflow examples