
Performance Guide

Optimize IGN LiDAR HD processing for maximum performance across different hardware configurations and dataset sizes.

v1.7.5 Performance Boost

NEW in v1.7.5: Automatic 5-10x speedup through optimized chunking! No configuration changes needed - your existing commands will run faster automatically.

Overview​

This guide covers performance optimization strategies for:

  • Automatic Optimizations (v1.7.5+) - Per-chunk KDTree strategy
  • Large-scale dataset processing
  • Memory-constrained environments
  • GPU acceleration
  • Multi-core processing
  • Network and I/O optimization

v1.7.5 Optimizations (Automatic)​

Per-Chunk KDTree Strategy​

The v1.7.5 release includes major performance optimizations that are always enabled:

What Changed:

  • ✅ Small KDTrees per chunk (~3-5M points each) instead of one massive global tree
  • ✅ 3x smaller chunk sizes (5M vs 15M points for 10-20M datasets)
  • ✅ 10% overlap between chunks maintains accuracy
  • ✅ Works with both CPU and GPU backends

Impact:

  • 🚀 5-10x faster normal computation (the main bottleneck)
  • ⏱️ 17M points: 2-5 minutes instead of 20+ minutes or hanging
  • 💻 CPU performance: now competitive with basic GPU setups
  • ⚡ GPU performance: even faster with cuML acceleration per chunk

Technical Details:

  • Small trees fit better in cache/VRAM
  • KDTree build complexity: 4 × O(4.5M × log(4.5M)) vs O(17M × log(17M))
  • Practical speedup can reach 10-20x thanks to cache efficiency (see the sketch below)
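
To make the strategy concrete, here is a minimal sketch of per-chunk normal estimation with overlapping KDTrees, using numpy and scipy directly. The function name, the index-based chunking, and the PCA normal estimate are illustrative assumptions, not the library's internal implementation:

import numpy as np
from scipy.spatial import cKDTree

def estimate_normals_chunked(points, chunk_size=5_000_000, overlap=0.10, k=30):
    """Illustrative per-chunk normal estimation with overlapping KDTrees.

    Assumes the (N, 3) array is stored in a roughly spatially coherent
    order, so index-based chunks approximate spatial chunks.
    """
    n = len(points)
    normals = np.empty((n, 3), dtype=np.float32)
    pad = int(chunk_size * overlap)  # 10% overlap between chunks

    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Build a small tree over the chunk plus its overlap region,
        # so border points still find their true k nearest neighbors
        lo, hi = max(0, start - pad), min(n, end + pad)
        tree = cKDTree(points[lo:hi])
        _, idx = tree.query(points[start:end], k=k)

        for i, neighbors in enumerate(idx):
            # Standard PCA normal: eigenvector of the smallest
            # eigenvalue of the neighborhood covariance matrix
            cov = np.cov(points[lo:hi][neighbors], rowvar=False)
            normals[start + i] = np.linalg.eigh(cov)[1][:, 0]

    return normals

Because each tree only covers a few million points, both build and query stay cache-friendly, which is where the practical speedup beyond the pure complexity argument comes from.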

No Configuration Needed​

# This command now runs 5-10x faster automatically!
ign-lidar-hd enrich --input-dir data/ --output output/ \
  --mode full --k-neighbors 30 --preprocess --use-gpu

All existing commands benefit from the optimization. No API changes required.

Hardware Requirements​

Minimum Requirements​

  • CPU: 4 cores, 2.5GHz
  • RAM: 8GB
  • Storage: 100GB available space
  • GPU: Optional, CUDA-compatible

Recommended Configuration

  • CPU: 8+ cores, 3.0GHz+
  • RAM: 32GB+
  • Storage: 500GB+ SSD
  • GPU: 8GB+ VRAM (RTX 3070 or better)

High-Performance Setup​

  • CPU: 16+ cores, 3.5GHz+ (Threadripper/Xeon)
  • RAM: 64GB+ DDR4-3200
  • Storage: 2TB+ NVMe SSD
  • GPU: 16GB+ VRAM (RTX 4080/A5000 or better)

CPU Optimization​

Multi-Core Processing​

from ign_lidar import Processor
import multiprocessing

# Use all available cores
num_cores = multiprocessing.cpu_count()
processor = Processor(num_workers=num_cores)

# Or cap the worker count; 8 is often a good balance before
# memory bandwidth becomes the bottleneck
processor = Processor(num_workers=8)

Batch Size Optimization​

# CPU-optimized batch sizes
cpu_config = {
    "small_dataset": 25000,   # <10M points
    "medium_dataset": 50000,  # 10-50M points
    "large_dataset": 100000   # >50M points
}

processor = Processor(batch_size=cpu_config["medium_dataset"])

Memory-Efficient Processing​

def process_large_file_cpu(file_path):
    """Memory-efficient CPU processing."""
    processor = Processor(
        batch_size=25000,
        enable_streaming=True,
        memory_limit_gb=8
    )

    return processor.process_streaming(file_path)

GPU Optimization​

GPU Configuration​

from ign_lidar import Processor

# Optimal GPU settings
gpu_processor = Processor(
    use_gpu=True,
    gpu_memory_fraction=0.8,  # Use 80% of GPU memory
    gpu_batch_size=100000,    # Larger batches for GPU
    mixed_precision=True      # Use FP16 for speed
)

GPU Memory Management​

# Dynamic memory management
def adaptive_gpu_processing(points):
    try:
        # Try a large batch first
        processor = Processor(
            use_gpu=True,
            gpu_batch_size=200000
        )
        return processor.process(points)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise  # don't swallow unrelated errors
        # Fall back to a smaller batch
        processor = Processor(
            use_gpu=True,
            gpu_batch_size=50000
        )
        return processor.process(points)

Multi-GPU Processing​

# Use multiple GPUs
def multi_gpu_processing(file_list):
    import torch

    if torch.cuda.device_count() > 1:
        processors = []
        for i in range(torch.cuda.device_count()):
            processor = Processor(
                use_gpu=True,
                gpu_device=i
            )
            processors.append(processor)

        # Distribute work across GPUs (distribute_work is your own
        # scheduling helper, e.g. round-robin over the processors)
        return distribute_work(file_list, processors)

    # Single-GPU (or CPU) fallback
    processor = Processor(use_gpu=torch.cuda.is_available())
    return [processor.process_file(f) for f in file_list]

Memory Optimization​

Streaming Processing​

# Process files larger than RAM
def stream_process_large_file(file_path, output_path):
    processor = Processor(
        enable_streaming=True,
        chunk_size=1000000,  # 1M points per chunk
        overlap_size=10000   # 10k point overlap
    )

    processor.process_stream(
        input_path=file_path,
        output_path=output_path
    )

Memory Monitoring​

import psutil
import gc

def monitor_memory_usage(processor):
    """Monitor and manage memory usage for a given processor."""
    memory_percent = psutil.virtual_memory().percent

    if memory_percent > 85:
        # Force garbage collection
        gc.collect()

        # Clear processor caches
        processor.clear_cache()

    print(f"Memory usage: {memory_percent}%")

Efficient Data Types​

# Use appropriate data types to save memory
config = {
    "coordinates": "float32",  # vs float64
    "features": "float32",     # vs float64
    "labels": "uint8",         # vs int32
    "colors": "uint8"          # vs uint16
}

processor = Processor(data_types=config)

I/O Optimization​

Fast File Formats​

# Use compressed formats
processor = Processor(
    output_format="laz",  # Compressed LAS
    compression_level=6   # Balance size/speed
)

# Or use HDF5 for analysis
processor = Processor(
    output_format="h5",
    h5_compression="gzip",
    h5_compression_level=4
)

Parallel I/O​

import concurrent.futures

def parallel_file_processing(file_list, num_threads=4):
    """Process multiple files in parallel."""

    def process_single_file(file_path):
        processor = Processor()
        return processor.process_file(file_path)

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(process_single_file, f) for f in file_list]
        results = [future.result() for future in futures]

    return results

SSD Optimization​

# Optimize for SSD storage
processor = Processor(
    temp_dir="/path/to/ssd/tmp",  # Use SSD for temp files
    enable_io_caching=True,       # Cache frequent reads
    prefetch_size=1024*1024*100   # 100MB prefetch buffer
)

Network Optimization​

Efficient Downloads​

from ign_lidar import Downloader

# Optimized download settings
downloader = Downloader(
    max_concurrent=8,      # Parallel downloads
    chunk_size=1024*1024,  # 1MB chunks
    retry_attempts=3,
    timeout=30
)

Batch Downloads​

# Download multiple tiles efficiently
tile_list = ["C_3945-6730_2022", "C_3945-6735_2022"]

downloader.batch_download(
    tile_list,
    output_dir="./tiles/",
    verify_checksums=True
)

Processing Pipeline Optimization​

Optimized Workflow​

def optimized_processing_pipeline(tile_list):
    """Optimized end-to-end processing."""

    # 1. Download with verification
    downloader = Downloader(max_concurrent=6)
    downloaded_files = downloader.batch_download(tile_list)

    # 2. Process with GPU acceleration
    processor = Processor(
        use_gpu=True,
        gpu_memory_fraction=0.8,
        enable_caching=True
    )

    # 3. Batch processing (batch_files is your own helper that
    #    yields the file list in groups of batch_size)
    results = []
    for file_batch in batch_files(downloaded_files, batch_size=4):
        batch_results = processor.process_batch(file_batch)
        results.extend(batch_results)

    return results

Memory-Aware Processing​

import psutil

def memory_aware_processing(file_list, max_memory_gb=16):
    """Adjust processing based on available memory."""

    available_memory = psutil.virtual_memory().available / (1024**3)
    available_memory = min(available_memory, max_memory_gb)  # respect the cap

    if available_memory < 8:
        # Low memory mode
        config = {
            "batch_size": 25000,
            "use_gpu": False,
            "enable_streaming": True
        }
    elif available_memory < 32:
        # Standard mode
        config = {
            "batch_size": 100000,
            "use_gpu": True,
            "gpu_memory_fraction": 0.6
        }
    else:
        # High performance mode
        config = {
            "batch_size": 200000,
            "use_gpu": True,
            "gpu_memory_fraction": 0.8
        }

    processor = Processor(**config)
    return processor.process_files(file_list)

Benchmarking and Profiling​

Performance Measurement​

import time

def benchmark_processing(file_path):
    """Benchmark processing performance."""

    start_time = time.perf_counter()

    processor = Processor(use_gpu=True)
    result = processor.process_file(file_path)

    processing_time = time.perf_counter() - start_time
    points_per_second = len(result) / processing_time

    print(f"Processing time: {processing_time:.2f}s")
    print(f"Points per second: {points_per_second:,.0f}")

    return result

Memory Profiling​

from memory_profiler import profile

@profile
def profile_memory_usage():
    """Profile memory usage during processing."""
    processor = Processor()

    # Load data (load_test_data is a placeholder for your own loader)
    points = load_test_data()

    # Process with profiling
    result = processor.process(points)

    return result

Performance Best Practices​

General Guidelines​

  1. Use GPU acceleration when available (10-15x speedup)
  2. Optimize batch sizes for your hardware (see the tuning sketch below)
  3. Enable caching for repeated operations
  4. Use appropriate data types to save memory
  5. Monitor memory usage and implement fallbacks
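
Batch size interacts with cache, RAM, and VRAM in ways that are hard to predict, so the simplest approach is to benchmark a few candidates on a representative file and keep the fastest. A minimal sketch, reusing Processor from the examples above:

import time

def tune_batch_size(file_path, candidates=(25000, 50000, 100000, 200000)):
    """Try several batch sizes and return the fastest one."""
    best_size, best_rate = None, 0.0

    for size in candidates:
        processor = Processor(batch_size=size)
        start = time.perf_counter()
        result = processor.process_file(file_path)
        rate = len(result) / (time.perf_counter() - start)

        print(f"batch_size={size}: {rate:,.0f} points/s")
        if rate > best_rate:
            best_size, best_rate = size, rate

    return best_size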

Hardware-Specific Tips​

For CPU-Only Systems​

  • Use all available cores
  • Implement streaming for large files
  • Optimize I/O with SSD storage
  • Use memory-mapped files when possible (see the reading sketch below)
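
For files larger than RAM, a chunked reader avoids materializing the whole cloud at once. The sketch below uses laspy's chunk iterator; laspy is an assumption here (a common LAS/LAZ library), not necessarily what ign-lidar-hd uses internally:

import numpy as np
import laspy

def iter_point_chunks(file_path, points_per_chunk=1_000_000):
    """Yield (N, 3) float32 coordinate arrays without loading the full file."""
    with laspy.open(file_path) as reader:
        for chunk in reader.chunk_iterator(points_per_chunk):
            # Stack into a compact float32 array to keep memory low
            yield np.column_stack((
                np.asarray(chunk.x, dtype=np.float32),
                np.asarray(chunk.y, dtype=np.float32),
                np.asarray(chunk.z, dtype=np.float32),
            ))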

For GPU Systems​

  • Use mixed precision (FP16) when possible
  • Implement dynamic batch sizing
  • Clear GPU cache between large jobs (see the sketch below)
  • Monitor GPU temperature and throttling
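
Dynamic batch sizing and cache clearing can be combined in one loop. A minimal sketch, assuming a PyTorch-backed GPU path (torch.cuda.empty_cache releases cached blocks; the Processor calls mirror the earlier examples):

import torch

def process_jobs_gpu(file_list, start_batch_size=200000):
    """Process files on GPU, halving the batch size on OOM and
    releasing cached GPU memory between jobs."""
    results = []
    for file_path in file_list:
        batch_size = start_batch_size
        while True:
            try:
                processor = Processor(use_gpu=True, gpu_batch_size=batch_size)
                results.append(processor.process_file(file_path))
                break
            except RuntimeError as e:
                if "out of memory" not in str(e) or batch_size <= 25000:
                    raise
                batch_size //= 2  # dynamic batch sizing
        # Release cached blocks so the next job starts clean
        torch.cuda.empty_cache()
    return results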

For Memory-Constrained Systems​

  • Enable streaming processing
  • Use smaller batch sizes
  • Implement aggressive garbage collection (see the sketch below)
  • Use compressed file formats
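
On tight memory budgets, collecting garbage after every file keeps the peak footprint close to a single batch. A minimal sketch combining the streaming settings shown earlier with explicit collection:

import gc

def process_low_memory(file_list):
    """Small batches, streaming, and explicit GC after each file."""
    processor = Processor(
        batch_size=25000,
        enable_streaming=True,
        use_gpu=False
    )

    for file_path in file_list:
        yield processor.process_file(file_path)
        gc.collect()  # return per-file buffers to the allocator promptly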

Performance Monitoring​

System Monitoring​

def monitor_system_performance(stop_event):
    """Monitor system performance during processing.

    stop_event is a threading.Event that the processing thread
    sets when it finishes.
    """
    import psutil
    import time

    while not stop_event.is_set():
        cpu_percent = psutil.cpu_percent(interval=1)
        memory_percent = psutil.virtual_memory().percent

        # psutil does not report GPU metrics; use pynvml or nvidia-smi
        # if you also need GPU utilization (see the check further below)
        print(f"CPU: {cpu_percent}%, RAM: {memory_percent}%")

        time.sleep(5)

Performance Logging​

import logging

# Configure performance logging
logging.basicConfig(
    filename='performance.log',
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

def log_performance_metrics(operation, duration, points_processed):
    """Log performance metrics."""
    points_per_second = points_processed / duration

    logging.info(f"Operation: {operation}")
    logging.info(f"Duration: {duration:.2f}s")
    logging.info(f"Points: {points_processed:,}")
    logging.info(f"Performance: {points_per_second:,.0f} points/second")

Troubleshooting Performance Issues​

Common Issues and Solutions​

Slow Processing​

  1. Check GPU utilization: Ensure the GPU is actually being used (see the check below)
  2. Optimize batch size: Try different values
  3. Check I/O bottlenecks: Use SSD storage
  4. Monitor memory usage: Avoid swapping
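
One way to verify the GPU is doing work is NVIDIA's NVML bindings. This sketch assumes the nvidia-ml-py package (imported as pynvml) is installed; it is not part of ign-lidar-hd:

import pynvml

def gpu_utilization(device_index=0):
    """Return current GPU utilization as a percentage (0-100)."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    finally:
        pynvml.nvmlShutdown()

If utilization stays near zero during a run with --use-gpu, processing is likely falling back to the CPU path.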

Memory Issues​

  1. Reduce batch size: Lower memory usage
  2. Enable streaming: For files larger than RAM
  3. Clear caches: Free up memory periodically
  4. Use compression: Reduce memory footprint

GPU Issues​

  1. Check CUDA version: Ensure compatibility (see the check below)
  2. Monitor GPU memory: Avoid out-of-memory errors
  3. Check thermal throttling: Ensure adequate cooling
  4. Update drivers: Use latest GPU drivers
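
A quick compatibility check, assuming the PyTorch-backed GPU path (torch.cuda.mem_get_info requires a reasonably recent PyTorch):

import torch

def check_gpu_setup():
    """Print CUDA availability, version, device name and free memory."""
    if not torch.cuda.is_available():
        print("CUDA not available - processing will run on CPU")
        return

    free, total = torch.cuda.mem_get_info()
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {free / 1e9:.1f} GB free / {total / 1e9:.1f} GB total")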