Performance Guide
Optimize IGN LiDAR HD processing for maximum performance across different hardware configurations and dataset sizes.
v1.7.5 Performance Boost
NEW in v1.7.5: Automatic 5-10x speedup through optimized chunking! No configuration changes needed - your existing commands will run faster automatically.
Overview
This guide covers performance optimization strategies for:
- Automatic Optimizations (v1.7.5+) - Per-chunk KDTree strategy
- Large-scale dataset processing
- Memory-constrained environments
- GPU acceleration
- Multi-core processing
- Network and I/O optimization
v1.7.5 Optimizations (Automatic)
Per-Chunk KDTree Strategy
The v1.7.5 release includes major performance optimizations that are always enabled:
What Changed:
- ✅ Small KDTrees per chunk (~3-5M points each) instead of one massive global tree
- ✅ 3x smaller chunk sizes (5M vs 15M points for 10-20M datasets)
- ✅ 10% overlap between chunks maintains accuracy
- ✅ Works with both CPU and GPU backends
Impact:
- 🚀 5-10x faster normal computation (main bottleneck)
- ⏱️ 17M points: 2-5 minutes instead of 20+ minutes or hanging
- 💻 CPU performance: now competitive with basic GPU setups
- ⚡ GPU performance: even faster with cuML acceleration per chunk
Technical Details:
- Small trees fit better in cache/VRAM
- KDTree complexity: 4 × O(4.5M × log(4.5M)) vs O(17M × log(17M))
- Practical speedup is 10-20x due to cache efficiency (see the worked numbers below)
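The arithmetic behind the last two points is worth making concrete. A back-of-the-envelope estimate (not a benchmark, and using the point counts quoted above) shows the total tree-build work barely changes, so nearly all of the measured gain comes from locality rather than asymptotics:

import math

# Rough KDTree build-cost model: n * log2(n) operations
def kdtree_cost(n):
    return n * math.log2(n)

global_cost = kdtree_cost(17_000_000)       # one 17M-point tree
chunked_cost = 4 * kdtree_cost(4_500_000)   # four ~4.5M-point trees

print(f"global tree:   {global_cost:.3g} ops")
print(f"chunked trees: {chunked_cost:.3g} ops")
print(f"ratio: {global_cost / chunked_cost:.2f}x")  # ~1.03x
# The asymptotic ratio is ~1x; the 5-10x wall-clock gain comes from
# the small trees fitting in cache/VRAM.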
No Configuration Needed
# This command now runs 5-10x faster automatically!
ign-lidar-hd enrich --input-dir data/ --output output/ \
--mode full --k-neighbors 30 --preprocess --use-gpu
All existing commands benefit from the optimization. No API changes required.
Hardware Requirements
Minimum Requirements
- CPU: 4 cores, 2.5GHz
- RAM: 8GB
- Storage: 100GB available space
- GPU: Optional, CUDA-compatible
Recommended Configuration
- CPU: 8+ cores, 3.0GHz+
- RAM: 32GB+
- Storage: 500GB+ SSD
- GPU: 8GB+ VRAM (RTX 3070 or better)
High-Performance Setup
- CPU: 16+ cores, 3.5GHz+ (Threadripper/Xeon)
- RAM: 64GB+ DDR4-3200
- Storage: 2TB+ NVMe SSD
- GPU: 16GB+ VRAM (RTX 4080/A5000 or better)
CPU Optimization
Multi-Core Processing
from ign_lidar import Processor
import multiprocessing

# Use all available cores
num_cores = multiprocessing.cpu_count()
processor = Processor(num_workers=num_cores)

# Or specify an explicit worker count
processor = Processor(num_workers=8)  # Often optimal
Batch Size Optimization
# CPU-optimized batch sizes
cpu_config = {
    "small_dataset": 25000,   # <10M points
    "medium_dataset": 50000,  # 10-50M points
    "large_dataset": 100000   # >50M points
}
processor = Processor(batch_size=cpu_config["medium_dataset"])
Memory-Efficient Processing
def process_large_file_cpu(file_path):
    """Memory-efficient CPU processing"""
    processor = Processor(
        batch_size=25000,
        enable_streaming=True,
        memory_limit_gb=8
    )
    return processor.process_streaming(file_path)
GPU Optimization
GPU Configuration
from ign_lidar import Processor

# Optimal GPU settings
gpu_processor = Processor(
    use_gpu=True,
    gpu_memory_fraction=0.8,  # Use 80% of GPU memory
    gpu_batch_size=100000,    # Larger batches for GPU
    mixed_precision=True      # Use FP16 for speed
)
GPU Memory Management
# Dynamic memory management
def adaptive_gpu_processing(points):
    try:
        # Try a large batch first
        processor = Processor(
            use_gpu=True,
            gpu_batch_size=200000
        )
        return processor.process(points)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise  # Only handle OOM errors here
        # Fall back to a smaller batch
        processor = Processor(
            use_gpu=True,
            gpu_batch_size=50000
        )
        return processor.process(points)
Multi-GPU Processing
# Use multiple GPUs
def multi_gpu_processing(file_list):
    import torch

    if torch.cuda.device_count() > 1:
        processors = []
        for i in range(torch.cuda.device_count()):
            processor = Processor(
                use_gpu=True,
                gpu_device=i
            )
            processors.append(processor)
        # Distribute work across GPUs (helper sketched below)
        return distribute_work(file_list, processors)

    # Single GPU (or CPU-only): process sequentially
    processor = Processor(use_gpu=torch.cuda.is_available())
    return [processor.process_file(f) for f in file_list]
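distribute_work is not part of the library API; a minimal sketch, assuming each Processor exposes the process_file method used elsewhere in this guide:

from concurrent.futures import ThreadPoolExecutor

def distribute_work(file_list, processors):
    """Split files across GPU-bound processors, one worker thread per GPU."""
    n = len(processors)
    slices = [file_list[i::n] for i in range(n)]  # round-robin split

    def run(processor, files):
        return [processor.process_file(f) for f in files]

    results = []
    with ThreadPoolExecutor(max_workers=n) as executor:
        futures = [executor.submit(run, p, s) for p, s in zip(processors, slices)]
        for future in futures:
            results.extend(future.result())
    return results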
Memory Optimization
Streaming Processing
# Process files larger than RAM
def stream_process_large_file(file_path, output_path):
    processor = Processor(
        enable_streaming=True,
        chunk_size=1000000,  # 1M points per chunk
        overlap_size=10000   # 10k-point overlap
    )
    processor.process_stream(
        input_path=file_path,
        output_path=output_path
    )
Memory Monitoring
import psutil
import gc

def monitor_memory_usage(processor):
    """Monitor and manage memory usage for the given processor"""
    memory_percent = psutil.virtual_memory().percent
    if memory_percent > 85:
        # Force garbage collection
        gc.collect()
        # Clear processor caches
        processor.clear_cache()
    print(f"Memory usage: {memory_percent}%")
Efficient Data Types
# Use appropriate data types to save memory
config = {
    "coordinates": "float32",  # vs float64
    "features": "float32",     # vs float64
    "labels": "uint8",         # vs int32
    "colors": "uint8"          # vs uint16
}
processor = Processor(data_types=config)
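For scale, the coordinate array of a 17-million-point tile alone drops from roughly 390 MB to 195 MB when stored as float32 (plain arithmetic, shown here with NumPy's dtype sizes):

import numpy as np

n_points = 17_000_000

def xyz_megabytes(dtype):
    # 3 coordinates per point
    return n_points * 3 * np.dtype(dtype).itemsize / 1024**2

print(f"xyz as float64: {xyz_megabytes(np.float64):.0f} MB")  # ~389 MB
print(f"xyz as float32: {xyz_megabytes(np.float32):.0f} MB")  # ~195 MB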
I/O Optimization
Fast File Formats
# Use compressed formats
processor = Processor(
    output_format="laz",  # Compressed LAS
    compression_level=6   # Balance size/speed
)

# Or use HDF5 for analysis
processor = Processor(
    output_format="h5",
    h5_compression="gzip",
    h5_compression_level=4
)
Parallel I/O
import concurrent.futures

def parallel_file_processing(file_list, num_threads=4):
    """Process multiple files in parallel"""
    def process_single_file(file_path):
        processor = Processor()
        return processor.process_file(file_path)

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(process_single_file, f) for f in file_list]
        return [future.result() for future in futures]

Threads suit I/O-bound work or backends that release the GIL; for pure-Python CPU-bound steps, concurrent.futures.ProcessPoolExecutor is the drop-in alternative.
SSD Optimization
# Optimize for SSD storage
processor = Processor(
    temp_dir="/path/to/ssd/tmp",  # Use SSD for temp files
    enable_io_caching=True,       # Cache frequent reads
    prefetch_size=1024*1024*100   # 100MB prefetch buffer
)
Network Optimization
Efficient Downloads
from ign_lidar import Downloader

# Optimized download settings
downloader = Downloader(
    max_concurrent=8,      # Parallel downloads
    chunk_size=1024*1024,  # 1MB chunks
    retry_attempts=3,
    timeout=30
)
Batch Downloads
# Download multiple tiles efficiently
tile_list = ["C_3945-6730_2022", "C_3945-6735_2022"]
downloader.batch_download(
    tile_list,
    output_dir="./tiles/",
    verify_checksums=True
)
Processing Pipeline Optimization
Optimized Workflow
def optimized_processing_pipeline(tile_list):
    """Optimized end-to-end processing"""
    # 1. Download with verification
    downloader = Downloader(max_concurrent=6)
    downloaded_files = downloader.batch_download(tile_list)

    # 2. Process with GPU acceleration
    processor = Processor(
        use_gpu=True,
        gpu_memory_fraction=0.8,
        enable_caching=True
    )

    # 3. Batch processing (batch_files helper sketched below)
    results = []
    for file_batch in batch_files(downloaded_files, batch_size=4):
        batch_results = processor.process_batch(file_batch)
        results.extend(batch_results)
    return results
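batch_files is a plain chunking helper rather than a library call; a minimal sketch:

def batch_files(files, batch_size=4):
    """Yield successive batches of at most batch_size files."""
    for i in range(0, len(files), batch_size):
        yield files[i:i + batch_size]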
Memory-Aware Processing
import psutil

def memory_aware_processing(file_list, max_memory_gb=16):
    """Adjust processing based on available memory"""
    available_memory = psutil.virtual_memory().available / (1024**3)
    available_memory = min(available_memory, max_memory_gb)  # Respect the caller's cap

    if available_memory < 8:
        # Low-memory mode
        config = {
            "batch_size": 25000,
            "use_gpu": False,
            "enable_streaming": True
        }
    elif available_memory < 32:
        # Standard mode
        config = {
            "batch_size": 100000,
            "use_gpu": True,
            "gpu_memory_fraction": 0.6
        }
    else:
        # High-performance mode
        config = {
            "batch_size": 200000,
            "use_gpu": True,
            "gpu_memory_fraction": 0.8
        }
    processor = Processor(**config)
    return processor.process_files(file_list)
Benchmarking and Profiling
Performance Measurement
import time

def benchmark_processing(file_path):
    """Benchmark processing performance"""
    processor = Processor(use_gpu=True)

    # Time only the processing call, not processor construction
    start_time = time.perf_counter()
    result = processor.process_file(file_path)
    elapsed = time.perf_counter() - start_time

    points_per_second = len(result) / elapsed
    print(f"Processing time: {elapsed:.2f}s")
    print(f"Points per second: {points_per_second:,.0f}")
    return result
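For a per-function breakdown rather than a single wall-clock number, the standard-library cProfile can wrap the same run (a sketch reusing benchmark_processing from above):

import cProfile
import pstats

def profile_processing(file_path):
    # Record a full call profile and print the 10 most expensive calls
    profiler = cProfile.Profile()
    profiler.enable()
    benchmark_processing(file_path)
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)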
Memory Profiling
from memory_profiler import profile

@profile
def profile_memory_usage():
    """Profile memory usage during processing"""
    processor = Processor()
    # Load data (load_test_data is a placeholder for your own loader)
    points = load_test_data()
    # Process with line-by-line memory profiling
    result = processor.process(points)
    return result
Performance Best Practices
General Guidelines
- Use GPU acceleration when available (10-15x speedup)
- Optimize batch sizes for your hardware
- Enable caching for repeated operations
- Use appropriate data types to save memory
- Monitor memory usage and implement fallbacks
Hardware-Specific Tips
For CPU-Only Systems
- Use all available cores
- Implement streaming for large files
- Optimize I/O with SSD storage
- Use memory-mapped files when possible (see the sketch below)
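For the memory-mapped option, numpy.memmap lets you iterate over a large coordinate array without loading it into RAM. A sketch assuming a raw float32 x/y/z dump (real LAS/LAZ tiles need a reader such as laspy first):

import numpy as np

def iter_memmapped_points(path, n_points, chunk=1_000_000):
    # Map the file once; the OS pages slices in as they are touched
    points = np.memmap(path, dtype=np.float32, mode="r", shape=(n_points, 3))
    for start in range(0, n_points, chunk):
        yield points[start:start + chunk]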
For GPU Systems
- Use mixed precision (FP16) when possible
- Implement dynamic batch sizing
- Clear GPU cache between large jobs (see the sketch below)
- Monitor GPU temperature and throttling
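For cache clearing, a common pattern when the GPU backend is PyTorch (an assumption; adapt to your backend):

import gc
import torch

def clear_gpu_cache():
    # Drop dangling Python references first, then release cached CUDA blocks
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()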
For Memory-Constrained Systems
- Enable streaming processing
- Use smaller batch sizes
- Implement aggressive garbage collection
- Use compressed file formats
Performance Monitoring
System Monitoring
import psutil
import time

def monitor_system_performance(processing_active):
    """Monitor system performance during processing"""
    # processing_active: a callable that returns False once work is done
    while processing_active():
        cpu_percent = psutil.cpu_percent(interval=1)
        memory_percent = psutil.virtual_memory().percent
        # psutil exposes no GPU counters; see the NVML sketch below
        print(f"CPU: {cpu_percent}%, RAM: {memory_percent}%")
        time.sleep(5)
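If you need GPU utilization in the loop above, NVIDIA's NVML bindings (the nvidia-ml-py package, imported as pynvml) are one option; a minimal sketch:

import pynvml

def gpu_utilization(device_index=0):
    # Query GPU load via NVML; returns utilization in percent
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    finally:
        pynvml.nvmlShutdown()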
Performance Logging
import logging

# Configure performance logging
logging.basicConfig(
    filename='performance.log',
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

def log_performance_metrics(operation, duration, points_processed):
    """Log performance metrics"""
    points_per_second = points_processed / duration
    logging.info(f"Operation: {operation}")
    logging.info(f"Duration: {duration:.2f}s")
    logging.info(f"Points: {points_processed:,}")
    logging.info(f"Performance: {points_per_second:,.0f} points/second")
Troubleshooting Performance Issues
Common Issues and Solutions
Slow Processing
- Check GPU utilization: Ensure GPU is being used
- Optimize batch size: Try different values
- Check I/O bottlenecks: Use SSD storage
- Monitor memory usage: Avoid swapping
Memory Issues
- Reduce batch size: Lower memory usage
- Enable streaming: For files larger than RAM
- Clear caches: Free up memory periodically
- Use compression: Reduce memory footprint
GPU Issues
- Check CUDA version: Ensure compatibility (see the diagnostic sketch below)
- Monitor GPU memory: Avoid out-of-memory errors
- Check thermal throttling: Ensure adequate cooling
- Update drivers: Use latest GPU drivers
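A quick diagnostic covering the first two checks, assuming a PyTorch-based GPU backend:

import torch

def gpu_diagnostics():
    # Report CUDA availability, toolkit version, and per-device VRAM
    print(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"CUDA version: {torch.version.cuda}")
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")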