GPU Acceleration Overview

Available in: v1.3.0+
Performance Boost: 5-10x faster than CPU
Requirements: NVIDIA GPU with CUDA 11.0+

GPU Development Status

🚧 Major GPU Enhancement in Progress - We're implementing comprehensive GPU acceleration across the entire pipeline. See our detailed roadmap in the "Future Development" section below for upcoming features.

Overview​

GPU acceleration can provide 4-10x speedup for feature computation compared to CPU processing, making it essential for large-scale LiDAR datasets and production pipelines.

Benefits​

  • ⚡ 4-10x faster feature computation
  • 🔄 Automatic CPU fallback when GPU unavailable
  • 📦 No code changes required - just add a flag
  • 🎯 Production-ready with comprehensive error handling
  • 💾 Memory efficient with smart batching

Performance Gains

GPU acceleration is most beneficial for point clouds with >100K points. For smaller datasets, CPU processing may be faster due to GPU initialization overhead.

Requirements​

Hardware Requirements​

  • GPU: NVIDIA GPU with CUDA support
  • Memory: 4GB+ GPU RAM recommended (8GB+ for large tiles)
  • Compute Capability: 3.5 or higher

Software Requirements​

  • CUDA Toolkit: 11.0 or higher (11.8 or 12.x recommended)
  • Python: 3.8 or higher
  • Python packages: CuPy (required), RAPIDS cuML (optional, better performance)

Tested GPU Models​

| GPU Model   | Memory | Performance | Notes                  |
|-------------|--------|-------------|------------------------|
| RTX 4090    | 24 GB  | Excellent   | Best performance       |
| RTX 3080    | 10 GB  | Very Good   | Good price/performance |
| RTX 3060    | 12 GB  | Good        | Budget-friendly        |
| Tesla V100  | 16 GB  | Very Good   | Server/cloud           |
| GTX 1080 Ti | 11 GB  | Moderate    | Older generation       |

Installation​

Step 1: Check CUDA Availability​

First, verify you have an NVIDIA GPU and CUDA installed:

# Check if you have an NVIDIA GPU
nvidia-smi

# Should show your GPU info and CUDA version

If nvidia-smi is not found, you need to install NVIDIA drivers and CUDA Toolkit first.

Step 2: Install CUDA Toolkit​

Visit NVIDIA CUDA Downloads and follow instructions for your OS.

Recommended versions:

  • CUDA 11.8 (most compatible)
  • CUDA 12.x (latest features)

WSL2 Support

GPU acceleration works on WSL2! Requirements:

  • Windows 11 or Windows 10 21H2+
  • NVIDIA drivers installed on Windows
  • CUDA toolkit installed in WSL2

See NVIDIA WSL guide for details.

Step 3: Install Python GPU Dependencies​

CuPy Installation

CuPy must be installed separately as it requires a specific version matching your CUDA Toolkit. Installing via pip install ign-lidar-hd[gpu] will not work as it would attempt to build CuPy from source.

# Option 1: Basic GPU support with CuPy (recommended for most users)
pip install ign-lidar-hd
pip install cupy-cuda11x # For CUDA 11.x
# OR
pip install cupy-cuda12x # For CUDA 12.x

# Option 2: Advanced GPU with RAPIDS cuML (best performance)
pip install ign-lidar-hd
pip install cupy-cuda12x # Choose based on your CUDA version
conda install -c rapidsai -c conda-forge -c nvidia cuml

# Option 3: RAPIDS via pip (may require more configuration)
pip install ign-lidar-hd
pip install cupy-cuda11x # For CUDA 11.x
pip install cuml-cu11 # For CUDA 11.x
# OR
pip install cupy-cuda12x # For CUDA 12.x
pip install cuml-cu12 # For CUDA 12.x

Installation Recommendations:

  • Install CuPy separately: Always choose cupy-cuda11x or cupy-cuda12x based on your CUDA version
  • CuPy only: Simplest installation, 5-6x speedup
  • CuPy + RAPIDS: Best performance, up to 10x speedup
  • Conda for RAPIDS: More reliable for RAPIDS cuML dependencies

Step 4: Verify Installation​

from ign_lidar.features_gpu import GPU_AVAILABLE, CUML_AVAILABLE

print(f"GPU (CuPy) available: {GPU_AVAILABLE}")
print(f"RAPIDS cuML available: {CUML_AVAILABLE}")

Expected output:

GPU (CuPy) available: True
RAPIDS cuML available: True
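
If cuML reports False because you only installed CuPy, that is fine for basic GPU support. To double-check that CuPy is talking to the CUDA installation reported by nvidia-smi, a short sanity check such as the following can help (plain CuPy API, independent of ign-lidar-hd):

import cupy as cp

# CUDA runtime version as an integer, e.g. 11080 for CUDA 11.8
print(cp.cuda.runtime.runtimeGetVersion())

# Name and compute capability of the first GPU (e.g. '86' for RTX 30-series)
print(cp.cuda.runtime.getDeviceProperties(0)['name'].decode())
print(cp.cuda.Device(0).compute_capability)

# Tiny computation to confirm kernels actually run on the device
print(cp.asnumpy(cp.arange(5) ** 2))  # [ 0  1  4  9 16]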

Quick Start​

Command Line Interface​

Simply add the --use-gpu flag to any enrich command:

# Basic usage
ign-lidar-hd enrich \
  --input tiles/ \
  --output enriched/ \
  --use-gpu

# With additional options
ign-lidar-hd enrich \
  --input tiles/ \
  --output enriched/ \
  --use-gpu \
  --mode full \
  --num-workers 4

Automatic Fallback

The --use-gpu flag will automatically fall back to CPU if GPU is not available. Your processing will continue without errors.

Python API​

Using LiDARProcessor​

from pathlib import Path
from ign_lidar.processor import LiDARProcessor

# Create processor with GPU acceleration
processor = LiDARProcessor(
    lod_level='LOD2',
    patch_size=150.0,
    num_points=16384,
    use_gpu=True  # ⚡ Enable GPU
)

# Process tiles - automatic GPU acceleration
num_patches = processor.process_tile(
    laz_file=Path("data/tiles/tile.laz"),
    output_dir=Path("data/patches")
)

print(f"Created {num_patches} patches using GPU")

Direct Feature Computation​

import numpy as np
from ign_lidar.features import compute_all_features_with_gpu

# Load your point cloud
points = np.random.rand(1000000, 3).astype(np.float32)
classification = np.random.randint(0, 10, 1000000).astype(np.uint8)

# Compute features with GPU
normals, curvature, height, geo_features = compute_all_features_with_gpu(
    points=points,
    classification=classification,
    k=10,
    auto_k=False,
    use_gpu=True  # Enables GPU
)

print(f"Computed {len(normals)} normals on GPU")

Configuration​

Python Configuration​

from ign_lidar import Config

config = Config(
    use_gpu=True,
    gpu_memory_limit=0.8,  # Use 80% of GPU memory
    cuda_device=0          # Use first GPU (if multiple)
)

Environment Variables​

# Specify CUDA device (if multiple GPUs)
export CUDA_VISIBLE_DEVICES=0

# Limit GPU memory usage
export CUPY_GPU_MEMORY_LIMIT="8GB"

These can also be set from Python before importing the library:

import os

# Set before importing ign_lidar
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from ign_lidar.processor import LiDARProcessor

When to Use GPU​

✅ Use GPU for:

  • Large point clouds (>100K points)
  • Batch processing of many tiles
  • Production pipelines requiring speed
  • Real-time or interactive applications
  • Processing 10+ tiles

❌ Use CPU for:​

  • Small point clouds (<10K points)
  • One-off processing tasks
  • Systems without NVIDIA GPU
  • Prototyping and debugging
  • Quick tests with 1-2 tiles
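
As a rough rule of thumb, the two lists above can be turned into a tiny helper; the 100K-point threshold mirrors the note earlier on this page, and the helper name is purely illustrative (not part of the library API):

from ign_lidar.features_gpu import GPU_AVAILABLE

def should_use_gpu(num_points: int, threshold: int = 100_000) -> bool:
    """Use the GPU only when it is available and the point cloud is
    large enough to amortize the GPU initialization overhead."""
    return GPU_AVAILABLE and num_points >= threshold

# Example: decide per tile before constructing the processor
use_gpu = should_use_gpu(num_points=2_500_000)  # True on a GPU machine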

Performance Benchmarks​

Expected Speedups​

Based on testing with various GPUs:

| Point Count | CPU (12 cores) | GPU (RTX 3080) | Speedup |
|-------------|----------------|----------------|---------|
| 1K points   | 0.02s          | 0.01s          | 2x      |
| 10K points  | 0.15s          | 0.03s          | 5x      |
| 100K points | 0.50s          | 0.08s          | 6.3x    |
| 1M points   | 4.5s           | 0.8s           | 5.6x    |
| 10M points  | 45s            | 8s             | 5.6x    |
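
To put the table in concrete terms: a batch of 100 tiles of 10M points each needs roughly 100 × 45 s ≈ 75 minutes of feature computation on the 12-core CPU, versus roughly 100 × 8 s ≈ 13 minutes on the RTX 3080, before I/O and patch extraction.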

Factors affecting performance:

  • GPU model and memory
  • Point cloud density and distribution
  • K-neighbors parameter (larger = more computation)
  • CPU baseline (more cores = smaller relative speedup)
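
For a quick, rough comparison on your own hardware (outside the benchmark script below), you can time the same call with and without the GPU. This sketch reuses the compute_all_features_with_gpu call shown earlier; the point count and k are arbitrary:

import time
import numpy as np
from ign_lidar.features import compute_all_features_with_gpu

points = np.random.rand(500_000, 3).astype(np.float32)
classification = np.random.randint(0, 10, 500_000).astype(np.uint8)

for use_gpu in (False, True):
    start = time.perf_counter()
    compute_all_features_with_gpu(
        points=points,
        classification=classification,
        k=10,
        auto_k=False,
        use_gpu=use_gpu,
    )
    print(f"use_gpu={use_gpu}: {time.perf_counter() - start:.2f}s")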

Benchmarking Your System​

Use the included benchmark script to test GPU vs CPU performance:

# Quick synthetic benchmark
python scripts/benchmarks/benchmark_gpu.py --synthetic

# Benchmark with real data
python scripts/benchmarks/benchmark_gpu.py path/to/file.laz

# Comprehensive multi-size benchmark
python scripts/benchmarks/benchmark_gpu.py --multi-size

Best Practices​

Optimizing GPU Performance​

  1. Batch processing: Process multiple tiles in sequence to amortize GPU initialization overhead (see the sketch after this list)
  2. Appropriate k-neighbors: Larger k = more computation benefit from GPU
  3. Monitor memory: Use nvidia-smi to check GPU memory usage
  4. Use workers=1 with GPU: GPU parallelizes internally, multiple workers may compete for GPU resources
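
A minimal sketch of practices 1 and 4 together: one long-lived processor handles all tiles sequentially, so the GPU is initialized once and never contended for by multiple workers (paths and parameters are placeholders):

from pathlib import Path
from ign_lidar.processor import LiDARProcessor

# One processor instance reused for every tile amortizes GPU setup
processor = LiDARProcessor(lod_level='LOD2', use_gpu=True)

for laz_file in sorted(Path("data/tiles").glob("*.laz")):
    num_patches = processor.process_tile(
        laz_file=laz_file,
        output_dir=Path("data/patches"),
    )
    print(f"{laz_file.name}: {num_patches} patches")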

Error Handling​

The library handles GPU errors gracefully:

# Automatic CPU fallback
from ign_lidar.processor import LiDARProcessor

processor = LiDARProcessor(use_gpu=True)

# If the GPU fails or is unavailable:
# - A warning is logged
# - Processing automatically falls back to CPU
# - The run continues successfully
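
If you prefer to decide explicitly rather than rely on the silent fallback (for example, to log up front which path a long job will take), you can check the availability flag yourself:

from ign_lidar.features_gpu import GPU_AVAILABLE
from ign_lidar.processor import LiDARProcessor

if not GPU_AVAILABLE:
    print("CuPy/CUDA not usable - this run will use the CPU path")

processor = LiDARProcessor(use_gpu=GPU_AVAILABLE)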

Monitoring GPU Usage​

Monitor GPU utilization during processing:

# One-time check
nvidia-smi

# Continuous monitoring (updates every second)
watch -n 1 nvidia-smi

# Built-in alternative to watch (refreshes every second)
nvidia-smi -l 1
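
From inside a Python session, CuPy's default memory pool can also report how much GPU memory is currently held (plain CuPy API, independent of ign-lidar-hd):

import cupy as cp

mempool = cp.get_default_memory_pool()
print(f"Used by CuPy pool:     {mempool.used_bytes() / 1e9:.2f} GB")
print(f"Reserved by CuPy pool: {mempool.total_bytes() / 1e9:.2f} GB")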

Troubleshooting​

"GPU requested but CuPy not available"​

Problem: CuPy is not installed or CUDA version mismatch.

Solution:

# Check CUDA version
nvidia-smi

# Install matching CuPy version
pip install cupy-cuda11x # for CUDA 11.x
pip install cupy-cuda12x # for CUDA 12.x
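
To distinguish "CuPy is missing" from "CuPy is installed but built for the wrong CUDA version", a small diagnostic like this can help (plain CuPy, nothing library-specific):

try:
    import cupy as cp
    print(f"CuPy {cp.__version__}, CUDA devices: {cp.cuda.runtime.getDeviceCount()}")
except ImportError:
    print("CuPy is not installed - install cupy-cuda11x or cupy-cuda12x")
except Exception as exc:  # e.g. CUDARuntimeError on a driver/toolkit mismatch
    print(f"CuPy is installed but CUDA failed to initialize: {exc}")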

"Out of memory" error​

Problem: GPU memory insufficient for point cloud size.

Solutions:

  1. Process tiles in smaller batches
  2. Reduce the batch size of the GPU feature computer (see below)
  3. Use CPU for very large tiles

# Reduce batch size for large tiles
from ign_lidar.features_gpu import GPUFeatureComputer

computer = GPUFeatureComputer(use_gpu=True, batch_size=50000)
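
If you work with CuPy directly, the memory pool can also be capped so that allocation failures surface early and predictably; this mirrors the CUPY_GPU_MEMORY_LIMIT variable shown in the Configuration section (the 8 GB figure is only an example):

import cupy as cp

# Cap the CuPy memory pool at 8 GB (equivalent to CUPY_GPU_MEMORY_LIMIT="8GB")
cp.get_default_memory_pool().set_limit(size=8 * 1024**3)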

Slow performance on GPU​

Possible causes:

  1. GPU not utilized: Check with nvidia-smi
  2. Small point clouds: GPU overhead dominates (use CPU for <10K points)
  3. Memory transfer bottleneck: Batch multiple operations together

Solutions:

# Monitor GPU usage while processing
watch -n 1 nvidia-smi

# Use GPU for large batches only
# (automatically handled by the library)

CuPy import warnings​

Problem: Warnings about CUDA version or cuBLAS libraries.

Solution: Usually safe to ignore if operations complete successfully. To suppress:

import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='cupy')

FAQ​

Q: Can I use AMD GPUs?​

A: Currently only NVIDIA GPUs with CUDA are supported. AMD ROCm support may be added in future versions.

Q: Does GPU work on WSL2?​

A: Yes! CUDA support in WSL2 requires:

  • Windows 11 or Windows 10 21H2+
  • NVIDIA drivers installed on Windows
  • CUDA toolkit installed in WSL2

See the NVIDIA WSL guide for details.

Q: What about Google Colab / Kaggle?​

A: Yes, works great in cloud notebooks with GPU runtime. Example:

# Install in Colab
!pip install ign-lidar-hd
!pip install cupy-cuda12x # match the CUDA version shown by !nvidia-smi

# Use GPU (automatically detected)
from ign_lidar.processor import LiDARProcessor
processor = LiDARProcessor(use_gpu=True)

Q: Does this work with TensorFlow/PyTorch?​

A: Yes, CuPy and TensorFlow/PyTorch can coexist. They share GPU memory. Monitor usage to avoid OOM errors.
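
To see how the memory is split between the two stacks from the same process, something like the following works (assumes a CUDA-enabled PyTorch build is installed; nvidia-smi shows the combined total):

import cupy as cp
import torch  # assumption: a CUDA-enabled PyTorch build is installed

print(f"CuPy pool:       {cp.get_default_memory_pool().used_bytes() / 1e9:.2f} GB")
print(f"PyTorch tensors: {torch.cuda.memory_allocated() / 1e9:.2f} GB")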

Q: Can I mix CPU and GPU processing?​

A: Yes! With use_gpu=True, feature computation runs on the GPU while other operations (I/O, patch extraction) stay on the CPU, which is the optimal split.

Version Compatibility​

| ign-lidar-hd | CuPy  | CUDA        | Python |
|--------------|-------|-------------|--------|
| 1.5.0+       | 10.0+ | 11.0 - 12.x | 3.8+   |
| 1.3.0+       | 10.0+ | 11.0 - 12.x | 3.8+   |
| 1.2.1+       | 10.0+ | 11.0+       | 3.8+   |

🚀 Future Development

We're continuously expanding GPU acceleration capabilities:

Phase 3: Advanced GPU Pipeline (In Progress)​

  • Universal GPU Processing: Full pipeline GPU acceleration
  • Multi-GPU Support: Distributed processing across multiple GPUs
  • Advanced Algorithms: GPU-based spatial indexing and neighborhood search
  • Memory Optimization: Advanced memory pooling and streaming
  • Performance Analytics: Real-time GPU performance monitoring

Expected Timeline: Rolling releases throughout 2024-2025

Upcoming Features​

  • 🔄 GPU Memory Pooling: Reduce allocation overhead
  • 📊 GPU Performance Dashboard: Real-time monitoring
  • 🌐 Multi-GPU Processing: Parallel tile processing
  • ⚡ Streaming Processing: Handle datasets larger than GPU memory
  • 🎯 Auto-GPU Selection: Intelligent GPU/CPU task distribution

Stay Updated

Follow our GitHub repository for the latest GPU acceleration developments and release announcements.
