Aller au contenu principal

Preprocessing for Artifact Mitigation

Learn how to use point cloud preprocessing to reduce LiDAR scan line artifacts and improve geometric feature quality.

Overview

LiDAR point clouds often contain various artifacts that negatively impact geometric feature computation:

  • Scan line patterns - "Dashed line" appearance in planarity/curvature maps
  • Measurement noise - Outliers from atmospheric conditions, sensor errors
  • Isolated points - Stray points from birds, insects, or reflection errors
  • Density variations - Uneven point distribution affecting feature consistency

The preprocessing module addresses these issues through three complementary techniques:

  1. Statistical Outlier Removal (SOR) - Remove outliers based on neighbor distance statistics
  2. Radius Outlier Removal (ROR) - Remove isolated points without sufficient neighbors
  3. Voxel Downsampling - Homogenize point density (optional)

Quick Start

Enable with Defaults

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess

This applies:

  • SOR with k=12 neighbors, std_multiplier=2.0
  • ROR with radius=1.0m, min_neighbors=4
  • No voxel downsampling

Building Mode with Preprocessing

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--num-workers 4

With RGB and Preprocessing

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--add-rgb \
--rgb-cache-dir cache/

Preprocessing Techniques

Statistical Outlier Removal (SOR)

Removes points whose mean distance to k-nearest neighbors is outside the statistical norm.

How it works:

  1. For each point, compute mean distance to k nearest neighbors
  2. Calculate global mean and standard deviation of these distances
  3. Remove points exceeding: threshold = mean + std_multiplier × std

Parameters:

  • --sor-k - Number of neighbors (default: 12)
    • Higher values: more robust but slower
    • Lower values: faster but may miss some outliers
  • --sor-std - Standard deviation multiplier (default: 2.0)
    • Higher values: more lenient (keep more points)
    • Lower values: stricter filtering (remove more points)

Example:

# Conservative (keep more points)
--preprocess --sor-k 15 --sor-std 3.0

# Aggressive (remove more outliers)
--preprocess --sor-k 10 --sor-std 1.5

Radius Outlier Removal (ROR)

Removes points that don't have enough neighbors within a specified radius.

How it works:

  1. For each point, count neighbors within search radius
  2. Remove points with fewer than min_neighbors

Parameters:

  • --ror-radius - Search radius in meters (default: 1.0)
    • Urban: 0.5-1.0m (denser point clouds)
    • Rural: 1.0-2.0m (sparser point clouds)
  • --ror-neighbors - Minimum required neighbors (default: 4)
    • Higher values: stricter isolation detection
    • Lower values: more lenient

Example:

# Urban areas (dense clouds)
--preprocess --ror-radius 0.8 --ror-neighbors 5

# Rural areas (sparse clouds)
--preprocess --ror-radius 1.5 --ror-neighbors 3

Voxel Downsampling

Divides space into voxels (3D grid cells) and reduces points to one representative per voxel.

How it works:

  1. Divide space into voxel grid of size voxel_size
  2. For each voxel with points, compute centroid
  3. Replace all points in voxel with single centroid point

Parameters:

  • --voxel-size - Voxel size in meters (optional, e.g., 0.5)
    • Smaller values: preserve more detail (0.2-0.5m)
    • Larger values: aggressive downsampling (0.5-1.0m)

Example:

# Moderate downsampling
--preprocess --voxel-size 0.5

# Aggressive memory reduction
--preprocess --voxel-size 1.0

When to use:

  • Very dense point clouds (>20M points/tile)
  • Memory-constrained systems
  • When processing speed is critical
  • When uniform density is desired

Conservative (Default)

Preserve maximum detail while removing clear outliers:

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 12 \
--sor-std 2.0 \
--ror-radius 1.0 \
--ror-neighbors 4

Use for:

  • High-quality datasets
  • Detailed building extraction
  • When preserving fine features

Balanced

Good quality improvement with reasonable processing time:

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 12 \
--sor-std 2.0 \
--ror-radius 1.0 \
--ror-neighbors 4 \
--voxel-size 0.5

Use for:

  • General-purpose processing
  • Large batches of tiles
  • Moderate memory constraints

Aggressive

Maximum artifact removal and memory reduction:

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 10 \
--sor-std 1.5 \
--ror-radius 0.8 \
--ror-neighbors 5 \
--voxel-size 0.3

Use for:

  • Noisy datasets
  • Memory-limited systems
  • When speed > detail
  • Initial exploratory analysis

Urban Scenes

Optimized for dense urban environments:

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 15 \
--sor-std 2.5 \
--ror-radius 0.8 \
--ror-neighbors 5

Natural Scenes

Optimized for vegetation and rural areas:

ign-lidar-hd enrich \
--input-dir tiles/ \
--output enriched/ \
--preprocess \
--sor-k 12 \
--sor-std 3.0 \
--ror-radius 1.5 \
--ror-neighbors 3

Python API

Using LiDARProcessor

from ign_lidar import LiDARProcessor

# Enable preprocessing with defaults
processor = LiDARProcessor(
lod_level='LOD2',
include_extra_features=True,
preprocess=True
)

# Custom preprocessing configuration
preprocess_config = {
'sor': {
'enable': True,
'k': 15,
'std_multiplier': 2.5
},
'ror': {
'enable': True,
'radius': 1.0,
'min_neighbors': 4
},
'voxel': {
'enable': False
}
}

processor = LiDARProcessor(
lod_level='LOD2',
include_extra_features=True,
preprocess=True,
preprocess_config=preprocess_config
)

# Process tiles
processor.process_tile('tile.laz', 'output/')

Direct Preprocessing Functions

from ign_lidar.preprocessing import (
statistical_outlier_removal,
radius_outlier_removal,
voxel_downsample,
preprocess_point_cloud
)
import numpy as np

# Load your points (N×3 array)
points = np.load('points.npy')

# Apply individual filters
filtered_sor, mask_sor = statistical_outlier_removal(
points, k=12, std_multiplier=2.0
)

filtered_ror, mask_ror = radius_outlier_removal(
points, radius=1.0, min_neighbors=4
)

downsampled, voxel_indices = voxel_downsample(
points, voxel_size=0.5, method='centroid'
)

# Or use the complete pipeline
processed, stats = preprocess_point_cloud(points, config=None)

print(f"Original: {stats['original_points']:,} points")
print(f"Final: {stats['final_points']:,} points")
print(f"Reduction: {stats['reduction_ratio']:.1%}")
print(f"Time: {stats['processing_time_ms']:.0f}ms")

Performance Impact

Processing Time

ConfigurationOverheadTypical Tile Time
No preprocessing0% (baseline)2-5 minutes
SOR only+5-10%2.2-5.5 minutes
SOR + ROR+10-20%2.5-6 minutes
SOR + ROR + Voxel+15-30%2.8-6.5 minutes

Memory Usage

ConfigurationMemory Impact
No preprocessingBaseline
SOR/ROR+10% (KDTree overhead)
With voxel (0.5m)-30% to -50% (fewer points)
With voxel (1.0m)-50% to -70% (fewer points)

Quality Improvement

Based on comprehensive artifact analysis:

MetricImprovement
Scan line artifacts60-80% reduction
Surface normal quality40-60% cleaner
Edge discontinuities30-50% smoother
Degenerate features20-40% fewer

Best Practices

When to Use Preprocessing

Use preprocessing when:

  • You observe scan line patterns in feature visualizations
  • Point clouds have visible noise or outliers
  • Building edges appear jagged or noisy
  • Training ML models (cleaner features = better results)
  • Processing large batches (voxel helps with memory)

Skip preprocessing when:

  • Point clouds are already clean (manual inspection)
  • You need absolute maximum detail preservation
  • Processing time is critical constraint
  • Working with very small/sparse clouds (less than 100k points)

Parameter Tuning Tips

  1. Start with defaults - They work well for most IGN LiDAR HD data
  2. Visualize before/after - Use CloudCompare or similar tools
  3. Check reduction ratio - Aim for less than 10% point reduction for conservative filtering
  4. Monitor processing time - Balance quality vs. speed for your use case
  5. Test on representative tiles - Don't tune on outliers

Troubleshooting

Problem: Too many points removed (>20% reduction)

Solution:

  • Increase --sor-std to 3.0 or higher
  • Increase --sor-k to 15-20
  • Increase --ror-radius to 1.5-2.0
  • Decrease --ror-neighbors to 3

Problem: Still see scan line artifacts

Solution:

  • Decrease --sor-std to 1.5
  • Decrease --sor-k to 10
  • Decrease --ror-radius to 0.8
  • Increase --ror-neighbors to 5-6
  • Add --voxel-size 0.5

Problem: Processing too slow

Solution:

  • Disable voxel downsampling
  • Increase --sor-k to reduce KDTree rebuilds
  • Reduce --num-workers to avoid memory thrashing
  • Consider processing in batches

Problem: Out of memory errors

Solution:

  • Add --voxel-size 0.5 or smaller
  • Reduce --num-workers
  • Process tiles individually (--input instead of --input-dir)
  • Increase system swap space

Examples

Example 1: Basic Usage

# Process a single tile with preprocessing
ign-lidar-hd enrich \
--input raw_tiles/tile_001.laz \
--output enriched/ \
--mode full \
--preprocess

Example 2: Batch Processing

# Process all tiles with conservative preprocessing
ign-lidar-hd enrich \
--input-dir raw_tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--sor-k 15 \
--sor-std 3.0 \
--num-workers 4

Example 3: Memory-Constrained System

# Use voxel downsampling to reduce memory usage
ign-lidar-hd enrich \
--input-dir raw_tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--voxel-size 0.5 \
--num-workers 2

Example 4: High-Quality Processing

# Conservative preprocessing + RGB for best quality
ign-lidar-hd enrich \
--input-dir raw_tiles/ \
--output enriched/ \
--mode full \
--preprocess \
--sor-k 15 \
--sor-std 2.5 \
--ror-radius 1.0 \
--ror-neighbors 4 \
--add-rgb \
--rgb-cache-dir cache/ \
--num-workers 2

References