
Performance Guide

Optimize IGN LiDAR HD processing for maximum performance across different hardware configurations and dataset sizes.

v1.7.5 Performance Boost

NEW in v1.7.5: Automatic 5-10x speedup through optimized chunking! No configuration changes needed - your existing commands will run faster automatically.

Overview​

This guide covers performance optimization strategies for:

  • Automatic Optimizations (v1.7.5+) - Per-chunk KDTree strategy
  • Large-scale dataset processing
  • Memory-constrained environments
  • GPU acceleration
  • Multi-core processing
  • Network and I/O optimization

v1.7.5 Optimizations (Automatic)​

Per-Chunk KDTree Strategy​

The v1.7.5 release includes major performance optimizations that are always enabled:

What Changed:

  • ✅ Small KDTrees per chunk (~3-5M points each) instead of one massive global tree
  • ✅ 3x smaller chunk sizes (5M vs 15M points for 10-20M datasets)
  • ✅ 10% overlap between chunks maintains accuracy
  • ✅ Works with both CPU and GPU backends

Impact:

  • 🚀 5-10x faster normal computation (the main bottleneck)
  • ⏱️ 17M points: 2-5 minutes instead of 20+ minutes or hanging
  • 💻 CPU performance: now competitive with basic GPU setups
  • ⚡ GPU performance: even faster with cuML acceleration per chunk

Technical Details:

  • Small trees fit better in cache/VRAM
  • KDTree build complexity: 4 × O(4.5M × log(4.5M)) vs O(17M × log(17M))
  • Practical speedup can reach 10-20x thanks to cache efficiency (see the sketch below)
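
To make the strategy concrete, here is a minimal sketch of per-chunk normal estimation with overlapping KDTrees, using numpy and scipy directly. The function name, the index-based chunking, and the PCA normal estimate are illustrative assumptions, not the library's internal implementation:

import numpy as np
from scipy.spatial import cKDTree

def estimate_normals_chunked(points, chunk_size=5_000_000, overlap=0.10, k=30):
    """Illustrative per-chunk normal estimation with overlapping KDTrees.

    Assumes the (N, 3) array is stored in a roughly spatially coherent
    order, so index-based chunks approximate spatial chunks.
    """
    n = len(points)
    normals = np.empty((n, 3), dtype=np.float32)
    pad = int(chunk_size * overlap)  # 10% overlap between chunks

    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Build a small tree over the chunk plus its overlap region,
        # so border points still find their true k nearest neighbors
        lo, hi = max(0, start - pad), min(n, end + pad)
        tree = cKDTree(points[lo:hi])
        _, idx = tree.query(points[start:end], k=k)

        for i, neighbors in enumerate(idx):
            # Standard PCA normal: eigenvector of the smallest
            # eigenvalue of the neighborhood covariance matrix
            cov = np.cov(points[lo:hi][neighbors], rowvar=False)
            normals[start + i] = np.linalg.eigh(cov)[1][:, 0]

    return normals

Because each tree only covers a few million points, both build and query stay cache-friendly, which is where the practical speedup beyond the pure complexity argument comes from.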

No Configuration Needed​

# This command now runs 5-10x faster automatically!
ign-lidar-hd enrich --input-dir data/ --output output/ \
  --mode full --k-neighbors 30 --preprocess --use-gpu

All existing commands benefit from the optimization. No API changes required.

Hardware Requirements​

Minimum Requirements​

  • CPU: 4 cores, 2.5GHz
  • RAM: 8GB
  • Storage: 100GB available space
  • GPU: Optional, CUDA-compatible

Recommended Configuration

  • CPU: 8+ cores, 3.0GHz+
  • RAM: 32GB+
  • Storage: 500GB+ SSD
  • GPU: 8GB+ VRAM (RTX 3070 or better)

High-Performance Setup​

  • CPU: 16+ cores, 3.5GHz+ (Threadripper/Xeon)
  • RAM: 64GB+ DDR4-3200
  • Storage: 2TB+ NVMe SSD
  • GPU: 16GB+ VRAM (RTX 4080/A5000 or better)

CPU Optimization​

Multi-Core Processing​

from ign_lidar import Processor
import multiprocessing

# Use all available cores
num_cores = multiprocessing.cpu_count()
processor = Processor(num_workers=num_cores)

# Or cap the worker count; 8 is often a good balance before
# memory bandwidth becomes the bottleneck
processor = Processor(num_workers=8)

Batch Size Optimization​

# CPU-optimized batch sizes
cpu_config = {
    "small_dataset": 25000,   # <10M points
    "medium_dataset": 50000,  # 10-50M points
    "large_dataset": 100000   # >50M points
}

processor = Processor(batch_size=cpu_config["medium_dataset"])

Memory-Efficient Processing​

def process_large_file_cpu(file_path):
    """Memory-efficient CPU processing."""
    processor = Processor(
        batch_size=25000,
        enable_streaming=True,
        memory_limit_gb=8
    )

    return processor.process_streaming(file_path)

GPU Optimization​

GPU Configuration​

from ign_lidar import Processor

# Optimal GPU settings
gpu_processor = Processor(
    use_gpu=True,
    gpu_memory_fraction=0.8,  # Use 80% of GPU memory
    gpu_batch_size=100000,    # Larger batches for GPU
    mixed_precision=True      # Use FP16 for speed
)

GPU Memory Management​

# Dynamic memory management
def adaptive_gpu_processing(points):
    try:
        # Try a large batch first
        processor = Processor(
            use_gpu=True,
            gpu_batch_size=200000
        )
        return processor.process(points)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise  # don't swallow unrelated errors
        # Fall back to a smaller batch
        processor = Processor(
            use_gpu=True,
            gpu_batch_size=50000
        )
        return processor.process(points)

Multi-GPU Processing​

# Use multiple GPUs
def multi_gpu_processing(file_list):
    import torch

    if torch.cuda.device_count() > 1:
        processors = []
        for i in range(torch.cuda.device_count()):
            processor = Processor(
                use_gpu=True,
                gpu_device=i
            )
            processors.append(processor)

        # Distribute work across GPUs (distribute_work is your own
        # scheduling helper, e.g. round-robin over the processors)
        return distribute_work(file_list, processors)

    # Single-GPU (or CPU) fallback
    processor = Processor(use_gpu=torch.cuda.is_available())
    return [processor.process_file(f) for f in file_list]

Memory Optimization​

Streaming Processing​

# Process files larger than RAM
def stream_process_large_file(file_path, output_path):
    processor = Processor(
        enable_streaming=True,
        chunk_size=1000000,  # 1M points per chunk
        overlap_size=10000   # 10k point overlap
    )

    processor.process_stream(
        input_path=file_path,
        output_path=output_path
    )

Memory Monitoring​

import psutil
import gc

def monitor_memory_usage(processor):
    """Monitor and manage memory usage for a given processor."""
    memory_percent = psutil.virtual_memory().percent

    if memory_percent > 85:
        # Force garbage collection
        gc.collect()

        # Clear processor caches
        processor.clear_cache()

    print(f"Memory usage: {memory_percent}%")

Efficient Data Types​

# Use appropriate data types to save memory
config = {
    "coordinates": "float32",  # vs float64
    "features": "float32",     # vs float64
    "labels": "uint8",         # vs int32
    "colors": "uint8"          # vs uint16
}

processor = Processor(data_types=config)

I/O Optimization​

Fast File Formats​

# Use compressed formats
processor = Processor(
    output_format="laz",  # Compressed LAS
    compression_level=6   # Balance size/speed
)

# Or use HDF5 for analysis
processor = Processor(
    output_format="h5",
    h5_compression="gzip",
    h5_compression_level=4
)

Parallel I/O​

import concurrent.futures

def parallel_file_processing(file_list, num_threads=4):
    """Process multiple files in parallel."""

    def process_single_file(file_path):
        processor = Processor()
        return processor.process_file(file_path)

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(process_single_file, f) for f in file_list]
        results = [future.result() for future in futures]

    return results

SSD Optimization​

# Optimize for SSD storage
processor = Processor(
    temp_dir="/path/to/ssd/tmp",  # Use SSD for temp files
    enable_io_caching=True,       # Cache frequent reads
    prefetch_size=1024*1024*100   # 100MB prefetch buffer
)

Network Optimization​

Efficient Downloads​

from ign_lidar import Downloader

# Optimized download settings
downloader = Downloader(
    max_concurrent=8,      # Parallel downloads
    chunk_size=1024*1024,  # 1MB chunks
    retry_attempts=3,
    timeout=30
)

Batch Downloads​

# Download multiple tiles efficiently
tile_list = ["C_3945-6730_2022", "C_3945-6735_2022"]

downloader.batch_download(
    tile_list,
    output_dir="./tiles/",
    verify_checksums=True
)

Processing Pipeline Optimization​

Optimized Workflow​

def optimized_processing_pipeline(tile_list):
    """Optimized end-to-end processing."""

    # 1. Download with verification
    downloader = Downloader(max_concurrent=6)
    downloaded_files = downloader.batch_download(tile_list)

    # 2. Process with GPU acceleration
    processor = Processor(
        use_gpu=True,
        gpu_memory_fraction=0.8,
        enable_caching=True
    )

    # 3. Batch processing (batch_files is your own helper that
    #    yields the file list in groups of batch_size)
    results = []
    for file_batch in batch_files(downloaded_files, batch_size=4):
        batch_results = processor.process_batch(file_batch)
        results.extend(batch_results)

    return results

Memory-Aware Processing​

import psutil

def memory_aware_processing(file_list, max_memory_gb=16):
    """Adjust processing based on available memory."""

    available_memory = psutil.virtual_memory().available / (1024**3)
    available_memory = min(available_memory, max_memory_gb)  # respect the cap

    if available_memory < 8:
        # Low memory mode
        config = {
            "batch_size": 25000,
            "use_gpu": False,
            "enable_streaming": True
        }
    elif available_memory < 32:
        # Standard mode
        config = {
            "batch_size": 100000,
            "use_gpu": True,
            "gpu_memory_fraction": 0.6
        }
    else:
        # High performance mode
        config = {
            "batch_size": 200000,
            "use_gpu": True,
            "gpu_memory_fraction": 0.8
        }

    processor = Processor(**config)
    return processor.process_files(file_list)

Benchmarking and Profiling​

Performance Measurement​

import time

def benchmark_processing(file_path):
    """Benchmark processing performance."""

    start_time = time.perf_counter()

    processor = Processor(use_gpu=True)
    result = processor.process_file(file_path)

    processing_time = time.perf_counter() - start_time
    points_per_second = len(result) / processing_time

    print(f"Processing time: {processing_time:.2f}s")
    print(f"Points per second: {points_per_second:,.0f}")

    return result

Memory Profiling​

from memory_profiler import profile

@profile
def profile_memory_usage():
    """Profile memory usage during processing."""
    processor = Processor()

    # Load data (load_test_data is a placeholder for your own loader)
    points = load_test_data()

    # Process with profiling
    result = processor.process(points)

    return result

Performance Best Practices​

General Guidelines​

  1. Use GPU acceleration when available (10-15x speedup)
  2. Optimize batch sizes for your hardware (see the tuning sketch below)
  3. Enable caching for repeated operations
  4. Use appropriate data types to save memory
  5. Monitor memory usage and implement fallbacks
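
Batch size interacts with cache, RAM, and VRAM in ways that are hard to predict, so the simplest approach is to benchmark a few candidates on a representative file and keep the fastest. A minimal sketch, reusing Processor from the examples above:

import time

def tune_batch_size(file_path, candidates=(25000, 50000, 100000, 200000)):
    """Try several batch sizes and return the fastest one."""
    best_size, best_rate = None, 0.0

    for size in candidates:
        processor = Processor(batch_size=size)
        start = time.perf_counter()
        result = processor.process_file(file_path)
        rate = len(result) / (time.perf_counter() - start)

        print(f"batch_size={size}: {rate:,.0f} points/s")
        if rate > best_rate:
            best_size, best_rate = size, rate

    return best_size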

Hardware-Specific Tips​

For CPU-Only Systems​

  • Use all available cores
  • Implement streaming for large files
  • Optimize I/O with SSD storage
  • Use memory-mapped files when possible (see the reading sketch below)
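
For files larger than RAM, a chunked reader avoids materializing the whole cloud at once. The sketch below uses laspy's chunk iterator; laspy is an assumption here (a common LAS/LAZ library), not necessarily what ign-lidar-hd uses internally:

import numpy as np
import laspy

def iter_point_chunks(file_path, points_per_chunk=1_000_000):
    """Yield (N, 3) float32 coordinate arrays without loading the full file."""
    with laspy.open(file_path) as reader:
        for chunk in reader.chunk_iterator(points_per_chunk):
            # Stack into a compact float32 array to keep memory low
            yield np.column_stack((
                np.asarray(chunk.x, dtype=np.float32),
                np.asarray(chunk.y, dtype=np.float32),
                np.asarray(chunk.z, dtype=np.float32),
            ))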

For GPU Systems​

  • Use mixed precision (FP16) when possible
  • Implement dynamic batch sizing
  • Clear GPU cache between large jobs (see the sketch below)
  • Monitor GPU temperature and throttling
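
Dynamic batch sizing and cache clearing can be combined in one loop. A minimal sketch, assuming a PyTorch-backed GPU path (torch.cuda.empty_cache releases cached blocks; the Processor calls mirror the earlier examples):

import torch

def process_jobs_gpu(file_list, start_batch_size=200000):
    """Process files on GPU, halving the batch size on OOM and
    releasing cached GPU memory between jobs."""
    results = []
    for file_path in file_list:
        batch_size = start_batch_size
        while True:
            try:
                processor = Processor(use_gpu=True, gpu_batch_size=batch_size)
                results.append(processor.process_file(file_path))
                break
            except RuntimeError as e:
                if "out of memory" not in str(e) or batch_size <= 25000:
                    raise
                batch_size //= 2  # dynamic batch sizing
        # Release cached blocks so the next job starts clean
        torch.cuda.empty_cache()
    return results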

For Memory-Constrained Systems​

  • Enable streaming processing
  • Use smaller batch sizes
  • Implement aggressive garbage collection (see the sketch below)
  • Use compressed file formats
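
On tight memory budgets, collecting garbage after every file keeps the peak footprint close to a single batch. A minimal sketch combining the streaming settings shown earlier with explicit collection:

import gc

def process_low_memory(file_list):
    """Small batches, streaming, and explicit GC after each file."""
    processor = Processor(
        batch_size=25000,
        enable_streaming=True,
        use_gpu=False
    )

    for file_path in file_list:
        yield processor.process_file(file_path)
        gc.collect()  # return per-file buffers to the allocator promptly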

Performance Monitoring​

System Monitoring​

def monitor_system_performance(stop_event):
    """Monitor system performance during processing.

    stop_event is a threading.Event that the processing thread
    sets when it finishes.
    """
    import psutil
    import time

    while not stop_event.is_set():
        cpu_percent = psutil.cpu_percent(interval=1)
        memory_percent = psutil.virtual_memory().percent

        # psutil does not report GPU metrics; use pynvml or nvidia-smi
        # if you also need GPU utilization (see the check further below)
        print(f"CPU: {cpu_percent}%, RAM: {memory_percent}%")

        time.sleep(5)

Performance Logging​

import logging

# Configure performance logging
logging.basicConfig(
    filename='performance.log',
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

def log_performance_metrics(operation, duration, points_processed):
    """Log performance metrics."""
    points_per_second = points_processed / duration

    logging.info(f"Operation: {operation}")
    logging.info(f"Duration: {duration:.2f}s")
    logging.info(f"Points: {points_processed:,}")
    logging.info(f"Performance: {points_per_second:,.0f} points/second")

Troubleshooting Performance Issues​

Common Issues and Solutions​

Slow Processing​

  1. Check GPU utilization: Ensure the GPU is actually being used (see the check below)
  2. Optimize batch size: Try different values
  3. Check I/O bottlenecks: Use SSD storage
  4. Monitor memory usage: Avoid swapping
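
One way to verify the GPU is doing work is NVIDIA's NVML bindings. This sketch assumes the nvidia-ml-py package (imported as pynvml) is installed; it is not part of ign-lidar-hd:

import pynvml

def gpu_utilization(device_index=0):
    """Return current GPU utilization as a percentage (0-100)."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    finally:
        pynvml.nvmlShutdown()

If utilization stays near zero during a run with --use-gpu, processing is likely falling back to the CPU path.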

Memory Issues​

  1. Reduce batch size: Lower memory usage
  2. Enable streaming: For files larger than RAM
  3. Clear caches: Free up memory periodically
  4. Use compression: Reduce memory footprint

GPU Issues​

  1. Check CUDA version: Ensure compatibility (see the check below)
  2. Monitor GPU memory: Avoid out-of-memory errors
  3. Check thermal throttling: Ensure adequate cooling
  4. Update drivers: Use latest GPU drivers
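
A quick compatibility check, assuming the PyTorch-backed GPU path (torch.cuda.mem_get_info requires a reasonably recent PyTorch):

import torch

def check_gpu_setup():
    """Print CUDA availability, version, device name and free memory."""
    if not torch.cuda.is_available():
        print("CUDA not available - processing will run on CPU")
        return

    free, total = torch.cuda.mem_get_info()
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {free / 1e9:.1f} GB free / {total / 1e9:.1f} GB total")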