
GPU Feature Computation

Available in: v1.3.0+
Acceleration: 5-10x speedup over CPU
Fixed in v1.6.2: GPU formulas now match CPU (see breaking changes below)

Breaking Change in v1.6.2

GPU feature formulas were corrected to match CPU and standard literature (Weinmann et al., 2015). If you used GPU acceleration in v1.6.1 or earlier, feature values have changed. You'll need to retrain models or switch to CPU for compatibility with old models.

This guide covers the technical details of GPU-accelerated feature computation, including which features are accelerated, API reference, and advanced optimization techniques.

Features Accelerated

The following features are computed on GPU when GPU acceleration is enabled:

Core Geometric Features

  • βœ… Surface normals (nx, ny, nz) - Normal vectors for each point
  • βœ… Curvature values - Surface curvature at each point
  • βœ… Height above ground - Normalized height values

Advanced Geometric Features

  • βœ… Planarity - Measure of how flat a surface is (useful for roofs, roads)
  • βœ… Linearity - Measure of linear structures (useful for edges, cables)
  • βœ… Sphericity - Measure of spherical structures (useful for vegetation)
  • βœ… Anisotropy - Directional structure measure
  • βœ… Roughness - Surface texture and irregularity
  • βœ… Local density - Point density in local neighborhood

Building-Specific Features

  • βœ… Verticality - Measure of vertical alignment (walls)
  • βœ… Horizontality - Measure of horizontal alignment (roofs, floors)
  • βœ… Wall score - Probability of being a wall element
  • βœ… Roof score - Probability of being a roof element

Performance by Feature Type

Feature Type        | CPU Time | GPU Time | Speedup
Surface Normals     | 2.5s     | 0.3s     | 8.3x
Curvature           | 3.0s     | 0.4s     | 7.5x
Height Above Ground | 1.5s     | 0.2s     | 7.5x
Geometric Features  | 4.0s     | 0.6s     | 6.7x
Building Features   | 5.0s     | 0.8s     | 6.3x
Total (1M points)   | 16s      | 2.3s     | 7x

What Changed in v1.6.2

Formula Corrections

GPU formulas were corrected to match CPU and standard literature:

Before v1.6.2 (INCORRECT):

planarity = (Ξ»1 - Ξ»2) / Ξ»0   # Wrong normalization
linearity = (Ξ»0 - Ξ»1) / Ξ»0   # Wrong normalization
sphericity = Ξ»2 / Ξ»0         # Wrong normalization

v1.6.2+ (CORRECT - matches Weinmann et al., 2015):

sum_Ξ» = Ξ»0 + Ξ»1 + Ξ»2
planarity = (Ξ»1 - Ξ»2) / sum_Ξ»   # Standard formulation
linearity = (Ξ»0 - Ξ»1) / sum_Ξ»   # Standard formulation
sphericity = Ξ»2 / sum_Ξ»         # Standard formulation
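Applied per point to the eigenvalues of the local covariance matrix, the corrected normalization can be sketched in NumPy. This is illustrative only: the library runs the same math in CuPy kernels, and `eigen_features` is a hypothetical helper, not part of the API.

```python
import numpy as np

def eigen_features(neighbors: np.ndarray) -> dict:
    """Corrected eigenvalue features (Weinmann et al., 2015 normalization).

    neighbors: (k, 3) points of a local neighborhood. Hypothetical
    NumPy sketch; the library performs the equivalent in CuPy.
    """
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    lam = np.linalg.eigvalsh(cov)[::-1]  # sorted so Ξ»0 >= Ξ»1 >= Ξ»2
    s = lam.sum()
    if s < 1e-12:  # degenerate neighborhood: return 0.0, not NaN/Inf
        return {"planarity": 0.0, "linearity": 0.0, "sphericity": 0.0}
    return {
        "planarity": (lam[1] - lam[2]) / s,
        "linearity": (lam[0] - lam[1]) / s,
        "sphericity": lam[2] / s,
    }
```

A perfectly collinear neighborhood yields linearity β‰ˆ 1, while a degenerate one (all points coincident) returns zeros instead of NaN.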

New Robustness Features

  1. Degenerate Case Filtering: Points with insufficient neighbors or near-zero eigenvalues now return 0.0 instead of NaN/Inf
  2. Robust Curvature: Uses Median Absolute Deviation (MAD) instead of std for outlier resistance
  3. Radius Search Support: Optional radius-based neighbor search (falls back to CPU)
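The MAD statistic behind item 2 can be illustrated with a small NumPy sketch (hypothetical; the actual GPU kernel may differ in detail):

```python
import numpy as np

def mad_spread(values: np.ndarray) -> float:
    """Median Absolute Deviation: a robust alternative to std.

    Illustrates the robust-curvature idea; not the library's
    actual kernel.
    """
    med = np.median(values)
    return float(np.median(np.abs(values - med)))
```

A single outlier in a neighborhood barely moves the MAD, while it can inflate the standard deviation by orders of magnitude.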

Validation

GPU results now match CPU within numerical tolerance (validated: max difference < 0.0001%):

# Run validation test
python tests/test_feature_fixes.py
# Expected: βœ“βœ“βœ“ ALL TESTS PASSED βœ“βœ“βœ“


API Reference

GPUFeatureComputer Class

The main class for GPU-accelerated feature computation.

from ign_lidar.features_gpu import GPUFeatureComputer

# Initialize GPU feature computer
computer = GPUFeatureComputer(
    use_gpu=True,
    batch_size=100000,
    memory_limit=0.8,
    device_id=0
)

Constructor Parameters

Parameter    | Type  | Default | Description
use_gpu      | bool  | True    | Enable GPU acceleration
batch_size   | int   | 100000  | Points processed per GPU batch
memory_limit | float | 0.8     | GPU memory usage limit (0-1)
device_id    | int   | 0       | CUDA device ID (for multi-GPU)

Main Methods

compute_all_features_with_gpu()

Compute all features for a point cloud using GPU acceleration.

from ign_lidar.features import compute_all_features_with_gpu
import numpy as np

# Your point cloud data
points = np.random.rand(1000000, 3).astype(np.float32)
classification = np.random.randint(0, 10, 1000000).astype(np.uint8)

# Compute features
normals, curvature, height, geo_features = compute_all_features_with_gpu(
    points=points,
    classification=classification,
    k=20,
    auto_k=False,
    use_gpu=True,
    batch_size=100000
)

Parameters:

Parameter      | Type       | Required | Description
points         | np.ndarray | Yes      | Point coordinates (N, 3)
classification | np.ndarray | Yes      | Point classifications (N,)
k              | int        | No       | Number of neighbors for features (default: 20)
auto_k         | bool       | No       | Automatically adjust k based on density (default: False)
use_gpu        | bool       | No       | Enable GPU acceleration (default: True)
batch_size     | int        | No       | Batch size for GPU processing (default: 100000)

Returns:

Return Value | Type       | Shape  | Description
normals      | np.ndarray | (N, 3) | Surface normal vectors
curvature    | np.ndarray | (N,)   | Curvature values
height       | np.ndarray | (N,)   | Height above ground
geo_features | dict       | -      | Dictionary of geometric features

compute_normals_gpu()

Compute surface normals using GPU.

from ign_lidar.features_gpu import compute_normals_gpu

normals = compute_normals_gpu(
    points=points,
    k=20,
    batch_size=100000
)

compute_curvature_gpu()

Compute curvature values using GPU.

from ign_lidar.features_gpu import compute_curvature_gpu

curvature = compute_curvature_gpu(
    points=points,
    normals=normals,
    k=20,
    batch_size=100000
)

compute_geometric_features_gpu()

Compute all geometric features using GPU.

from ign_lidar.features_gpu import compute_geometric_features_gpu

geo_features = compute_geometric_features_gpu(
    points=points,
    normals=normals,
    k=20,
    batch_size=100000
)

# Access individual features
planarity = geo_features['planarity']
linearity = geo_features['linearity']
sphericity = geo_features['sphericity']

Advanced Usage

Batch Processing Optimization

For processing multiple tiles, reuse the GPU computer instance:

from ign_lidar.features_gpu import GPUFeatureComputer
from pathlib import Path

# Initialize once
computer = GPUFeatureComputer(use_gpu=True, batch_size=100000)

# Process multiple tiles
for tile_path in Path("tiles/").glob("*.laz"):
    # Load tile
    points, classification = load_tile(tile_path)

    # Compute features (GPU stays initialized)
    normals, curvature, height, geo_features = computer.compute_all(
        points=points,
        classification=classification,
        k=20
    )

    # Save results
    save_enriched_tile(tile_path, normals, curvature, height, geo_features)

Memory Management

Control GPU memory usage for large tiles:

from ign_lidar.features_gpu import GPUFeatureComputer

# For very large tiles (>5M points)
computer = GPUFeatureComputer(
    use_gpu=True,
    batch_size=50000,   # Smaller batch size
    memory_limit=0.6    # Use less GPU memory
)

# For small to medium tiles (less than 1M points)
computer = GPUFeatureComputer(
    use_gpu=True,
    batch_size=200000,  # Larger batch size
    memory_limit=0.9    # Use more GPU memory
)

Multi-GPU Support (Experimental)

Experimental Feature

Multi-GPU support is experimental in v1.5.0. Use with caution in production.

import os

# Specify GPU device
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Use first GPU

# Or for specific GPU
computer = GPUFeatureComputer(use_gpu=True, device_id=1)  # Use second GPU

Custom Feature Computation

Implement custom GPU-accelerated features:

import cupy as cp
from ign_lidar.features_gpu import GPUFeatureComputer

class CustomGPUComputer(GPUFeatureComputer):
    def compute_custom_feature(self, points_gpu):
        """Compute a custom feature on GPU"""
        # Your custom GPU computation using CuPy
        feature = cp.mean(points_gpu, axis=1)
        return cp.asnumpy(feature)

# Use custom computer
computer = CustomGPUComputer(use_gpu=True)

Performance Optimization Tips

1. Optimal Batch Size

Choose batch size based on GPU memory:

GPU Memory | Recommended Batch Size | Max Point Cloud
4 GB       | 50,000                 | 2M points
8 GB       | 100,000                | 5M points
12 GB      | 150,000                | 8M points
16 GB+     | 200,000+               | 10M+ points
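The guideline table above can be turned into a small helper (hypothetical; `pick_batch_size` is not part of the library API):

```python
def pick_batch_size(gpu_mem_gb: float) -> int:
    """Map available GPU memory (GB) to a recommended batch size,
    following the guideline table above. Hypothetical helper."""
    if gpu_mem_gb < 8:
        return 50_000
    if gpu_mem_gb < 12:
        return 100_000
    if gpu_mem_gb < 16:
        return 150_000
    return 200_000
```

For example, `pick_batch_size(4)` returns 50,000, matching the 4 GB row.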

2. K-Neighbors Selection

Larger k values benefit more from GPU:

# Optimal for GPU (k >= 20)
features = compute_all_features_with_gpu(points, classification, k=20, use_gpu=True)

# Less optimal for GPU (k < 10)
features = compute_all_features_with_gpu(points, classification, k=5, use_gpu=True)

3. Memory Transfer Optimization

Minimize CPU-GPU transfers by batching operations:

# ❌ Bad: Multiple transfers
normals = compute_normals_gpu(points)
curvature = compute_curvature_gpu(points, normals)  # Transfer normals back
geo = compute_geometric_features_gpu(points, normals)  # Transfer again

# βœ… Good: Single batch
normals, curvature, height, geo = compute_all_features_with_gpu(points, classification)

4. Persistent GPU Memory

For repeated processing, keep data on GPU:

import cupy as cp

# Transfer to GPU once
points_gpu = cp.asarray(points)

# Process multiple times without re-transfer
for k in [10, 20, 30]:
    normals = compute_normals_gpu(points_gpu, k=k)

Benchmarking

Running Benchmarks

The library includes comprehensive benchmarking tools:

# Synthetic benchmark (quick test)
python scripts/benchmarks/benchmark_gpu.py --synthetic

# Real data benchmark
python scripts/benchmarks/benchmark_gpu.py path/to/tile.laz

# Multi-size benchmark
python scripts/benchmarks/benchmark_gpu.py --multi-size

# Compare different k values
python scripts/benchmarks/benchmark_gpu.py --test-k

Interpreting Results

Example benchmark output:

GPU Benchmark Results
=====================
GPU Model: NVIDIA RTX 3080 (10GB)
CUDA Version: 11.8
CuPy Version: 11.6.0

Point Cloud: 1,000,000 points
K-neighbors: 20

Feature Computation Times:
--------------------------
Normals (CPU): 2.45s
Normals (GPU): 0.31s β†’ 7.9x speedup

Curvature (CPU): 2.98s
Curvature (GPU): 0.42s β†’ 7.1x speedup

Geometric (CPU): 3.87s
Geometric (GPU): 0.58s β†’ 6.7x speedup

Total (CPU): 15.32s
Total (GPU): 2.14s β†’ 7.2x speedup

Memory Usage:
-------------
GPU Memory Used: 1.2 GB / 10 GB (12%)
Peak Memory: 1.8 GB
CPU Memory Used: 2.4 GB

Performance Factors

GPU performance depends on:

  1. Point cloud size: Larger = better GPU utilization
  2. K-neighbors value: Larger = more parallelizable work
  3. GPU model: Newer = faster processing
  4. Memory bandwidth: Higher = faster transfers
  5. CUDA compute capability: Higher = more features

Troubleshooting

Performance Issues

GPU Slower Than Expected

Symptoms: GPU processing not much faster than CPU

Possible Causes:

  1. Small point clouds (<10K points) - GPU overhead dominates
  2. Low k value (<10) - Not enough parallelizable work
  3. Memory transfer bottleneck
  4. GPU not fully utilized

Solutions:

# Check GPU utilization
nvidia-smi -l 1

# Should show high GPU usage during processing
# If GPU usage is low:
# Increase batch size
computer = GPUFeatureComputer(batch_size=200000)

# Increase k value
features = compute_all_features_with_gpu(points, classification, k=30)

# Use larger tiles or batch processing

Out of Memory Errors

Symptoms: CUDA out of memory errors

Solutions:

# Reduce batch size
computer = GPUFeatureComputer(batch_size=50000)

# Reduce memory limit
computer = GPUFeatureComputer(memory_limit=0.6)

# Process in smaller chunks
for chunk in split_point_cloud(points, chunk_size=500000):
    features = compute_all_features_with_gpu(chunk, classification)
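The `split_point_cloud` helper in the snippet above is not defined by the library; a minimal version might look like this (hypothetical — note that classification and any other per-point arrays must be sliced identically to stay aligned with their points):

```python
import numpy as np

def split_point_cloud(points: np.ndarray, chunk_size: int):
    """Yield consecutive row-chunks of at most chunk_size points.

    Hypothetical helper for chunked processing; per-point arrays
    such as classification need the same slicing.
    """
    for start in range(0, len(points), chunk_size):
        yield points[start:start + chunk_size]
```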

CuPy Import Errors

Symptoms: ImportError or CUDA version mismatch warnings

Solutions:

# Check CUDA version
nvidia-smi | grep "CUDA Version"

# Reinstall matching CuPy
pip uninstall cupy
pip install cupy-cuda11x # or cupy-cuda12x

Memory Leaks

If GPU memory keeps increasing:

# Force GPU memory cleanup
import cupy as cp
cp.get_default_memory_pool().free_all_blocks()

# Or use context manager
from ign_lidar.features_gpu import GPUMemoryManager

with GPUMemoryManager():
    # GPU memory automatically freed after this block
    features = compute_all_features_with_gpu(points, classification)

Limitations

Current Limitations

  1. GPU Memory: Limited by available GPU RAM
  2. Single GPU: Multi-GPU support is experimental
  3. NVIDIA Only: Requires NVIDIA GPU with CUDA
  4. K-NN Implementation: Uses brute-force for k < 50, KD-tree for k >= 50
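The small-k brute-force strategy amounts to computing all pairwise distances: O(N²) work, but trivially parallel and so a good fit for GPUs. An illustrative NumPy sketch (not the library's implementation, which performs the equivalent on device):

```python
import numpy as np

def knn_bruteforce(points: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k-NN via full pairwise squared distances.

    Illustrative sketch of the small-k strategy; returns (N, k)
    neighbor indices, excluding each point itself.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    # Column 0 of the argsort is the point itself; drop it
    return np.argsort(d2, axis=1)[:, 1:k + 1]
```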

Future Improvements (Roadmap)

  • πŸ”„ Multi-GPU support (v1.6.0) - Distribute work across multiple GPUs
  • πŸ”„ Mixed precision (v1.6.0) - Use FP16 for faster computation
  • πŸ”„ AMD GPU support (v2.0.0) - ROCm support for AMD GPUs
  • πŸ”„ Chunked processing (v1.6.0) - Automatic chunking for very large tiles
  • πŸ”„ Persistent GPU cache (v1.7.0) - Cache preprocessed data on GPU
