GPU Feature Computation
Available in: v1.3.0+
Acceleration: 5-10x speedup over CPU
Fixed in v1.6.2: GPU formulas now match CPU (see breaking changes below)
GPU feature formulas were corrected to match CPU and standard literature (Weinmann et al., 2015). If you used GPU acceleration in v1.6.1 or earlier, feature values have changed. You'll need to retrain models or switch to CPU for compatibility with old models.
This guide covers the technical details of GPU-accelerated feature computation, including which features are accelerated, API reference, and advanced optimization techniques.
Features Acceleratedβ
The following features are computed on GPU when GPU acceleration is enabled:
Core Geometric Featuresβ
- β Surface normals (nx, ny, nz) - Normal vectors for each point
- β Curvature values - Surface curvature at each point
- β Height above ground - Normalized height values
Advanced Geometric Featuresβ
- β Planarity - Measure of how flat a surface is (useful for roofs, roads)
- β Linearity - Measure of linear structures (useful for edges, cables)
- β Sphericity - Measure of spherical structures (useful for vegetation)
- β Anisotropy - Directional structure measure
- β Roughness - Surface texture and irregularity
- β Local density - Point density in local neighborhood
Building-Specific Featuresβ
- β Verticality - Measure of vertical alignment (walls)
- β Horizontality - Measure of horizontal alignment (roofs, floors)
- β Wall score - Probability of being a wall element
- β Roof score - Probability of being a roof element
Performance by Feature Typeβ
Feature Type | CPU Time | GPU Time | Speedup |
---|---|---|---|
Surface Normals | 2.5s | 0.3s | 8.3x |
Curvature | 3.0s | 0.4s | 7.5x |
Height Above Ground | 1.5s | 0.2s | 7.5x |
Geometric Features | 4.0s | 0.6s | 6.7x |
Building Features | 5.0s | 0.8s | 6.3x |
Total (1M points) | 16s | 2.3s | 7x |
What Changed in v1.6.2β
Formula Correctionsβ
GPU formulas were corrected to match CPU and standard literature:
Before v1.6.2 (INCORRECT):
planarity = (Ξ»1 - Ξ»2) / Ξ»0 # Wrong normalization
linearity = (Ξ»0 - Ξ»1) / Ξ»0 # Wrong normalization
sphericity = Ξ»2 / Ξ»0 # Wrong normalization
v1.6.2+ (CORRECT - matches Weinmann et al., 2015):
sum_Ξ» = Ξ»0 + Ξ»1 + Ξ»2
planarity = (Ξ»1 - Ξ»2) / sum_Ξ» # Standard formulation
linearity = (Ξ»0 - Ξ»1) / sum_Ξ» # Standard formulation
sphericity = Ξ»2 / sum_Ξ» # Standard formulation
New Robustness Featuresβ
- Degenerate Case Filtering: Points with insufficient neighbors or near-zero eigenvalues now return 0.0 instead of NaN/Inf
- Robust Curvature: Uses Median Absolute Deviation (MAD) instead of std for outlier resistance
- Radius Search Support: Optional radius-based neighbor search (falls back to CPU)
Validationβ
GPU now produces identical results to CPU (validated: max difference < 0.0001%):
# Run validation test
python tests/test_feature_fixes.py
# Expected: βββ ALL TESTS PASSED βββ
For more details, see:
- v1.6.2 Release Notes
- Repository files:
GEOMETRIC_FEATURES_ANALYSIS.md
,IMPLEMENTATION_SUMMARY.md
API Referenceβ
GPUFeatureComputer Classβ
The main class for GPU-accelerated feature computation.
from ign_lidar.features_gpu import GPUFeatureComputer
# Initialize GPU feature computer
computer = GPUFeatureComputer(
use_gpu=True,
batch_size=100000,
memory_limit=0.8,
device_id=0
)
Constructor Parametersβ
Parameter | Type | Default | Description |
---|---|---|---|
use_gpu | bool | True | Enable GPU acceleration |
batch_size | int | 100000 | Points processed per GPU batch |
memory_limit | float | 0.8 | GPU memory usage limit (0-1) |
device_id | int | 0 | CUDA device ID (for multi-GPU) |
Main Methodsβ
compute_all_features_with_gpu()β
Compute all features for a point cloud using GPU acceleration.
from ign_lidar.features import compute_all_features_with_gpu
import numpy as np
# Your point cloud data
points = np.random.rand(1000000, 3).astype(np.float32)
classification = np.random.randint(0, 10, 1000000).astype(np.uint8)
# Compute features
normals, curvature, height, geo_features = compute_all_features_with_gpu(
points=points,
classification=classification,
k=20,
auto_k=False,
use_gpu=True,
batch_size=100000
)
Parameters:
Parameter | Type | Required | Description |
---|---|---|---|
points | np.ndarray | Yes | Point coordinates (N, 3) |
classification | np.ndarray | Yes | Point classifications (N,) |
k | int | No | Number of neighbors for features (default: 20) |
auto_k | bool | No | Automatically adjust k based on density (default: False) |
use_gpu | bool | No | Enable GPU acceleration (default: True) |
batch_size | int | No | Batch size for GPU processing (default: 100000) |
Returns:
Return Value | Type | Shape | Description |
---|---|---|---|
normals | np.ndarray | (N, 3) | Surface normal vectors |
curvature | np.ndarray | (N,) | Curvature values |
height | np.ndarray | (N,) | Height above ground |
geo_features | dict | - | Dictionary of geometric features |
compute_normals_gpu()β
Compute surface normals using GPU.
from ign_lidar.features_gpu import compute_normals_gpu
normals = compute_normals_gpu(
points=points,
k=20,
batch_size=100000
)
compute_curvature_gpu()β
Compute curvature values using GPU.
from ign_lidar.features_gpu import compute_curvature_gpu
curvature = compute_curvature_gpu(
points=points,
normals=normals,
k=20,
batch_size=100000
)
compute_geometric_features_gpu()β
Compute all geometric features using GPU.
from ign_lidar.features_gpu import compute_geometric_features_gpu
geo_features = compute_geometric_features_gpu(
points=points,
normals=normals,
k=20,
batch_size=100000
)
# Access individual features
planarity = geo_features['planarity']
linearity = geo_features['linearity']
sphericity = geo_features['sphericity']
Advanced Usageβ
Batch Processing Optimizationβ
For processing multiple tiles, reuse the GPU computer instance:
from ign_lidar.features_gpu import GPUFeatureComputer
from pathlib import Path
# Initialize once
computer = GPUFeatureComputer(use_gpu=True, batch_size=100000)
# Process multiple tiles
for tile_path in Path("tiles/").glob("*.laz"):
# Load tile
points, classification = load_tile(tile_path)
# Compute features (GPU stays initialized)
normals, curvature, height, geo_features = computer.compute_all(
points=points,
classification=classification,
k=20
)
# Save results
save_enriched_tile(tile_path, normals, curvature, height, geo_features)
Memory Managementβ
Control GPU memory usage for large tiles:
from ign_lidar.features_gpu import GPUFeatureComputer
# For very large tiles (>5M points)
computer = GPUFeatureComputer(
use_gpu=True,
batch_size=50000, # Smaller batch size
memory_limit=0.6 # Use less GPU memory
)
# For small to medium tiles (less than 1M points)
computer = GPUFeatureComputer(
use_gpu=True,
batch_size=200000, # Larger batch size
memory_limit=0.9 # Use more GPU memory
)
Multi-GPU Support (Experimental)β
Multi-GPU support is experimental in v1.5.0. Use with caution in production.
import os
# Specify GPU device
os.environ['CUDA_VISIBLE_DEVICES'] = '0' # Use first GPU
# Or for specific GPU
computer = GPUFeatureComputer(use_gpu=True, device_id=1) # Use second GPU
Custom Feature Computationβ
Implement custom GPU-accelerated features:
import cupy as cp
from ign_lidar.features_gpu import GPUFeatureComputer
class CustomGPUComputer(GPUFeatureComputer):
def compute_custom_feature(self, points_gpu):
"""Compute a custom feature on GPU"""
# Your custom GPU computation using CuPy
feature = cp.mean(points_gpu, axis=1)
return cp.asnumpy(feature)
# Use custom computer
computer = CustomGPUComputer(use_gpu=True)
Performance Optimization Tipsβ
1. Optimal Batch Sizeβ
Choose batch size based on GPU memory:
GPU Memory | Recommended Batch Size | Max Point Cloud |
---|---|---|
4 GB | 50,000 | 2M points |
8 GB | 100,000 | 5M points |
12 GB | 150,000 | 8M points |
16 GB+ | 200,000+ | 10M+ points |
2. K-Neighbors Selectionβ
Larger k values benefit more from GPU:
# Optimal for GPU (k >= 20)
features = compute_all_features_with_gpu(points, classification, k=20, use_gpu=True)
# Less optimal for GPU (k < 10)
features = compute_all_features_with_gpu(points, classification, k=5, use_gpu=True)
3. Memory Transfer Optimizationβ
Minimize CPU-GPU transfers by batching operations:
# β Bad: Multiple transfers
normals = compute_normals_gpu(points)
curvature = compute_curvature_gpu(points, normals) # Transfer normals back
geo = compute_geometric_features_gpu(points, normals) # Transfer again
# β
Good: Single batch
normals, curvature, height, geo = compute_all_features_with_gpu(points, classification)
4. Persistent GPU Memoryβ
For repeated processing, keep data on GPU:
import cupy as cp
# Transfer to GPU once
points_gpu = cp.asarray(points)
# Process multiple times without re-transfer
for k in [10, 20, 30]:
normals = compute_normals_gpu(points_gpu, k=k)
Benchmarkingβ
Running Benchmarksβ
The library includes comprehensive benchmarking tools:
# Synthetic benchmark (quick test)
python scripts/benchmarks/benchmark_gpu.py --synthetic
# Real data benchmark
python scripts/benchmarks/benchmark_gpu.py path/to/tile.laz
# Multi-size benchmark
python scripts/benchmarks/benchmark_gpu.py --multi-size
# Compare different k values
python scripts/benchmarks/benchmark_gpu.py --test-k
Interpreting Resultsβ
Example benchmark output:
GPU Benchmark Results
=====================
GPU Model: NVIDIA RTX 3080 (10GB)
CUDA Version: 11.8
CuPy Version: 11.6.0
Point Cloud: 1,000,000 points
K-neighbors: 20
Feature Computation Times:
--------------------------
Normals (CPU): 2.45s
Normals (GPU): 0.31s β 7.9x speedup
Curvature (CPU): 2.98s
Curvature (GPU): 0.42s β 7.1x speedup
Geometric (CPU): 3.87s
Geometric (GPU): 0.58s β 6.7x speedup
Total (CPU): 15.32s
Total (GPU): 2.14s β 7.2x speedup
Memory Usage:
-------------
GPU Memory Used: 1.2 GB / 10 GB (12%)
Peak Memory: 1.8 GB
CPU Memory Used: 2.4 GB
Performance Factorsβ
GPU performance depends on:
- Point cloud size: Larger = better GPU utilization
- K-neighbors value: Larger = more parallelizable work
- GPU model: Newer = faster processing
- Memory bandwidth: Higher = faster transfers
- CUDA compute capability: Higher = more features
Troubleshootingβ
Performance Issuesβ
GPU Slower Than Expectedβ
Symptoms: GPU processing not much faster than CPU
Possible Causes:
- Small point clouds (<10K points) - GPU overhead dominates
- Low k value (<10) - Not enough parallelizable work
- Memory transfer bottleneck
- GPU not fully utilized
Solutions:
# Check GPU utilization
nvidia-smi -l 1
# Should show high GPU usage during processing
# If GPU usage is low:
# Increase batch size
computer = GPUFeatureComputer(batch_size=200000)
# Increase k value
features = compute_all_features_with_gpu(points, classification, k=30)
# Use larger tiles or batch processing
Out of Memory Errorsβ
Symptoms: CUDA out of memory errors
Solutions:
# Reduce batch size
computer = GPUFeatureComputer(batch_size=50000)
# Reduce memory limit
computer = GPUFeatureComputer(memory_limit=0.6)
# Process in smaller chunks
for chunk in split_point_cloud(points, chunk_size=500000):
features = compute_all_features_with_gpu(chunk, classification)
CuPy Import Errorsβ
Symptoms: ImportError or CUDA version mismatch warnings
Solutions:
# Check CUDA version
nvidia-smi | grep "CUDA Version"
# Reinstall matching CuPy
pip uninstall cupy
pip install cupy-cuda11x # or cupy-cuda12x
Memory Leaksβ
If GPU memory keeps increasing:
# Force GPU memory cleanup
import cupy as cp
cp.get_default_memory_pool().free_all_blocks()
# Or use context manager
from ign_lidar.features_gpu import GPUMemoryManager
with GPUMemoryManager():
# GPU memory automatically freed after this block
features = compute_all_features_with_gpu(points, classification)
Limitationsβ
Current Limitationsβ
- GPU Memory: Limited by available GPU RAM
- Single GPU: Multi-GPU support is experimental
- NVIDIA Only: Requires NVIDIA GPU with CUDA
- K-NN Implementation: Uses brute-force for k < 50, KD-tree for k >= 50
Future Improvements (Roadmap)β
- π Multi-GPU support (v1.6.0) - Distribute work across multiple GPUs
- π Mixed precision (v1.6.0) - Use FP16 for faster computation
- π AMD GPU support (v2.0.0) - ROCm support for AMD GPUs
- π Chunked processing (v1.6.0) - Automatic chunking for very large tiles
- π Persistent GPU cache (v1.7.0) - Cache preprocessed data on GPU
See Alsoβ
- GPU Overview - GPU setup and installation
- RGB GPU Acceleration - GPU-accelerated RGB augmentation
- Architecture - System architecture
- Workflows - GPU workflow examples