Preprocessing for Artifact Mitigation
Learn how to use point cloud preprocessing to reduce LiDAR scan line artifacts and improve geometric feature quality.
Overviewβ
LiDAR point clouds often contain various artifacts that negatively impact geometric feature computation:
- Scan line patterns - "Dashed line" appearance in planarity/curvature maps
- Measurement noise - Outliers from atmospheric conditions, sensor errors
- Isolated points - Stray points from birds, insects, or reflection errors
- Density variations - Uneven point distribution affecting feature consistency
The preprocessing module addresses these issues through three complementary techniques:
- Statistical Outlier Removal (SOR) - Remove outliers based on neighbor distance statistics
- Radius Outlier Removal (ROR) - Remove isolated points without sufficient neighbors
- Voxel Downsampling - Homogenize point density (optional)
Quick Startβ
Enable with Defaultsβ
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess
This applies:
- SOR with k=12 neighbors, std_multiplier=2.0
- ROR with radius=1.0m, min_neighbors=4
- No voxel downsampling
Building Mode with Preprocessingβ
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--num-workers 4
With RGB and Preprocessingβ
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--add-rgb \
--rgb-cache-dir cache/
Preprocessing Techniquesβ
Statistical Outlier Removal (SOR)β
Removes points whose mean distance to k-nearest neighbors is outside the statistical norm.
How it works:
- For each point, compute mean distance to k nearest neighbors
- Calculate global mean and standard deviation of these distances
- Remove points exceeding:
threshold = mean + std_multiplier Γ std
Parameters:
--sor-k
- Number of neighbors (default: 12)- Higher values: more robust but slower
- Lower values: faster but may miss some outliers
--sor-std
- Standard deviation multiplier (default: 2.0)- Higher values: more lenient (keep more points)
- Lower values: stricter filtering (remove more points)
Example:
# Conservative (keep more points)
--preprocess --sor-k 15 --sor-std 3.0
# Aggressive (remove more outliers)
--preprocess --sor-k 10 --sor-std 1.5
Radius Outlier Removal (ROR)β
Removes points that don't have enough neighbors within a specified radius.
How it works:
- For each point, count neighbors within search radius
- Remove points with fewer than min_neighbors
Parameters:
--ror-radius
- Search radius in meters (default: 1.0)- Urban: 0.5-1.0m (denser point clouds)
- Rural: 1.0-2.0m (sparser point clouds)
--ror-neighbors
- Minimum required neighbors (default: 4)- Higher values: stricter isolation detection
- Lower values: more lenient
Example:
# Urban areas (dense clouds)
--preprocess --ror-radius 0.8 --ror-neighbors 5
# Rural areas (sparse clouds)
--preprocess --ror-radius 1.5 --ror-neighbors 3
Voxel Downsamplingβ
Divides space into voxels (3D grid cells) and reduces points to one representative per voxel.
How it works:
- Divide space into voxel grid of size
voxel_size
- For each voxel with points, compute centroid
- Replace all points in voxel with single centroid point
Parameters:
--voxel-size
- Voxel size in meters (optional, e.g., 0.5)- Smaller values: preserve more detail (0.2-0.5m)
- Larger values: aggressive downsampling (0.5-1.0m)
Example:
# Moderate downsampling
--preprocess --voxel-size 0.5
# Aggressive memory reduction
--preprocess --voxel-size 1.0
When to use:
- Very dense point clouds (>20M points/tile)
- Memory-constrained systems
- When processing speed is critical
- When uniform density is desired
Recommended Presetsβ
Conservative (Default)β
Preserve maximum detail while removing clear outliers:
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 12 \
--sor-std 2.0 \
--ror-radius 1.0 \
--ror-neighbors 4
Use for:
- High-quality datasets
- Detailed building extraction
- When preserving fine features
Balancedβ
Good quality improvement with reasonable processing time:
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 12 \
--sor-std 2.0 \
--ror-radius 1.0 \
--ror-neighbors 4 \
--voxel-size 0.5
Use for:
- General-purpose processing
- Large batches of tiles
- Moderate memory constraints
Aggressiveβ
Maximum artifact removal and memory reduction:
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 10 \
--sor-std 1.5 \
--ror-radius 0.8 \
--ror-neighbors 5 \
--voxel-size 0.3
Use for:
- Noisy datasets
- Memory-limited systems
- When speed > detail
- Initial exploratory analysis
Urban Scenesβ
Optimized for dense urban environments:
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 15 \
--sor-std 2.5 \
--ror-radius 0.8 \
--ror-neighbors 5
Natural Scenesβ
Optimized for vegetation and rural areas:
ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 12 \
--sor-std 3.0 \
--ror-radius 1.5 \
--ror-neighbors 3
Python APIβ
Using LiDARProcessorβ
from ign_lidar import LiDARProcessor
# Enable preprocessing with defaults
processor = LiDARProcessor(
lod_level='LOD2',
include_extra_features=True,
preprocess=True
)
# Custom preprocessing configuration
preprocess_config = {
'sor': {
'enable': True,
'k': 15,
'std_multiplier': 2.5
},
'ror': {
'enable': True,
'radius': 1.0,
'min_neighbors': 4
},
'voxel': {
'enable': False
}
}
processor = LiDARProcessor(
lod_level='LOD2',
include_extra_features=True,
preprocess=True,
preprocess_config=preprocess_config
)
# Process tiles
processor.process_tile('tile.laz', 'output/')
Direct Preprocessing Functionsβ
from ign_lidar.preprocessing import (
statistical_outlier_removal,
radius_outlier_removal,
voxel_downsample,
preprocess_point_cloud
)
import numpy as np
# Load your points (NΓ3 array)
points = np.load('points.npy')
# Apply individual filters
filtered_sor, mask_sor = statistical_outlier_removal(
points, k=12, std_multiplier=2.0
)
filtered_ror, mask_ror = radius_outlier_removal(
points, radius=1.0, min_neighbors=4
)
downsampled, voxel_indices = voxel_downsample(
points, voxel_size=0.5, method='centroid'
)
# Or use the complete pipeline
processed, stats = preprocess_point_cloud(points, config=None)
print(f"Original: {stats['original_points']:,} points")
print(f"Final: {stats['final_points']:,} points")
print(f"Reduction: {stats['reduction_ratio']:.1%}")
print(f"Time: {stats['processing_time_ms']:.0f}ms")
Performance Impactβ
Processing Timeβ
Configuration | Overhead | Typical Tile Time |
---|---|---|
No preprocessing | 0% (baseline) | 2-5 minutes |
SOR only | +5-10% | 2.2-5.5 minutes |
SOR + ROR | +10-20% | 2.5-6 minutes |
SOR + ROR + Voxel | +15-30% | 2.8-6.5 minutes |
Memory Usageβ
Configuration | Memory Impact |
---|---|
No preprocessing | Baseline |
SOR/ROR | +10% (KDTree overhead) |
With voxel (0.5m) | -30% to -50% (fewer points) |
With voxel (1.0m) | -50% to -70% (fewer points) |
Quality Improvementβ
Based on comprehensive artifact analysis:
Metric | Improvement |
---|---|
Scan line artifacts | 60-80% reduction |
Surface normal quality | 40-60% cleaner |
Edge discontinuities | 30-50% smoother |
Degenerate features | 20-40% fewer |
Best Practicesβ
When to Use Preprocessingβ
β Use preprocessing when:
- You observe scan line patterns in feature visualizations
- Point clouds have visible noise or outliers
- Building edges appear jagged or noisy
- Training ML models (cleaner features = better results)
- Processing large batches (voxel helps with memory)
β Skip preprocessing when:
- Point clouds are already clean (manual inspection)
- You need absolute maximum detail preservation
- Processing time is critical constraint
- Working with very small/sparse clouds (less than 100k points)
Parameter Tuning Tipsβ
- Start with defaults - They work well for most IGN LiDAR HD data
- Visualize before/after - Use CloudCompare or similar tools
- Check reduction ratio - Aim for less than 10% point reduction for conservative filtering
- Monitor processing time - Balance quality vs. speed for your use case
- Test on representative tiles - Don't tune on outliers
Troubleshootingβ
Problem: Too many points removed (>20% reduction)
Solution:
- Increase
--sor-std
to 3.0 or higher - Increase
--sor-k
to 15-20 - Increase
--ror-radius
to 1.5-2.0 - Decrease
--ror-neighbors
to 3
Problem: Still see scan line artifacts
Solution:
- Decrease
--sor-std
to 1.5 - Decrease
--sor-k
to 10 - Decrease
--ror-radius
to 0.8 - Increase
--ror-neighbors
to 5-6 - Add
--voxel-size 0.5
Problem: Processing too slow
Solution:
- Disable voxel downsampling
- Increase
--sor-k
to reduce KDTree rebuilds - Reduce
--num-workers
to avoid memory thrashing - Consider processing in batches
Problem: Out of memory errors
Solution:
- Add
--voxel-size 0.5
or smaller - Reduce
--num-workers
- Process tiles individually (
--input
instead of--input-dir
) - Increase system swap space
Examplesβ
Example 1: Basic Usageβ
# Process a single tile with preprocessing
ign-lidar-hd enrich \
--input raw_tiles/tile_001.laz \
--output enriched/ \
--mode full \
--preprocess
Example 2: Batch Processingβ
# Process all tiles with conservative preprocessing
ign-lidar-hd enrich \
--input-dir raw_tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--sor-k 15 \
--sor-std 3.0 \
--num-workers 4
Example 3: Memory-Constrained Systemβ
# Use voxel downsampling to reduce memory usage
ign-lidar-hd enrich \
--input-dir raw_tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--voxel-size 0.5 \
--num-workers 2
Example 4: High-Quality Processingβ
# Conservative preprocessing + RGB for best quality
ign-lidar-hd enrich \
--input-dir raw_tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--sor-k 15 \
--sor-std 2.5 \
--ror-radius 1.0 \
--ror-neighbors 4 \
--add-rgb \
--rgb-cache-dir cache/ \
--num-workers 2
Related Documentationβ
- Artifact Analysis Report - Detailed analysis of LiDAR artifacts
- Implementation Guide - Technical implementation details
- Integration Details - CLI and processor integration
- CLI Commands - Complete CLI reference
- Python API - Processor API documentation
Referencesβ
- PDAL Statistical Outlier Removal: https://pdal.io/stages/filters.outlier.html
- PDAL Radius Outlier Removal: https://pdal.io/stages/filters.radiusoutlier.html
- PCL Voxel Grid: https://pointclouds.org/documentation/classpcl_1_1_voxel_grid.html