Smart Skip Detection
Smart Skip Detection prevents redundant operations by automatically detecting and skipping existing files during downloads, enrichment, and processing workflows.
Overview
This feature adds intelligent skip detection to all workflows:
- Download Skip - Avoid re-downloading existing tiles
- Enrichment Skip - Skip files that are already enriched
- Processing Skip - Skip tiles with existing patches
Key Benefits
⚡ Time Savings
- Downloads: Skip re-downloading tiles (~60 min saved on 50 tiles)
- Processing: Skip reprocessing tiles (~90 min saved on 50 tiles)
- Total: ~150 minutes saved on a typical 50-tile workflow
💾 Resource Savings
- Bandwidth: Avoid downloading duplicate large files (12+ GB on 50 tiles)
- Disk Space: Avoid creating duplicate patches
- CPU/Memory: Avoid redundant feature computation
🔄 Workflow Improvements
- Resume Capability: Easily resume after interruptions
- Incremental Builds: Add new data to existing datasets
- Idempotent Operations: Safe to run commands multiple times
Smart Download Skip
Automatically skips existing tiles during download:
# Downloads only missing tiles
ign-lidar-hd download \
--bbox 2.0,48.8,2.5,49.0 \
--output tiles/
# Output shows what's skipped vs downloaded
⏭️ tile_001.laz already exists (245 MB), skipping
Downloading tile_002.laz...
✅ Downloaded tile_002.laz (238 MB)
📊 Download Summary:
Total tiles requested: 10
✅ Successfully downloaded: 7
⏭️ Skipped (already present): 2
❌ Failed: 1
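The summary counts above can be tallied from per-tile outcomes. The sketch below is minimal and assumes each tile yields a `(success, was_skipped)` pair, like the one `download_tile` returns in the Python API section. The `summarize` helper is hypothetical and is not part of ign-lidar-hd.

```python
from collections import Counter


def summarize(outcomes: list[tuple[bool, bool]]) -> Counter:
    """Tally (success, was_skipped) pairs into summary buckets.

    Hypothetical helper: the pair layout mirrors download_tile's documented
    return value, but this is not the library's own summary code.
    """
    counts = Counter()
    for success, was_skipped in outcomes:
        if was_skipped:
            counts["skipped"] += 1  # checked first, whatever `success` says
        elif success:
            counts["downloaded"] += 1
        else:
            counts["failed"] += 1
    return counts


# 10 requested tiles: 7 downloaded, 2 already present, 1 failure
outcomes = [(True, False)] * 7 + [(True, True)] * 2 + [(False, False)]
summary = summarize(outcomes)
print(f"Total tiles requested: {len(outcomes)}")
print(f"Successfully downloaded: {summary['downloaded']}")
print(f"Skipped (already present): {summary['skipped']}")
print(f"Failed: {summary['failed']}")
```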
Force Re-download
Use the --force flag to override skip behavior:
# Force re-download all tiles
ign-lidar-hd download \
--bbox 2.0,48.8,2.5,49.0 \
--output tiles/ \
--force
Smart Enrichment Skip
Automatically skips LAZ files that are already enriched:
# Enriches only files without building features
ign-lidar-hd enrich \
--input-dir /path/to/raw_tiles/ \
--output /path/to/enriched_tiles/ \
--mode full
# Shows progress and skip statistics
[1/20] Processing: tile_001.laz
✅ Enriched: 1.2M points in 15.3s
[2/20] ⏭️ tile_002.laz: Already enriched, skipping
[3/20] Processing: tile_003.laz
✅ Enriched: 980K points in 12.1s
Force Re-enrichment
Use the --force flag to re-enrich files:
# Force re-enrichment of all files
ign-lidar-hd enrich \
--input-dir /path/to/raw_tiles/ \
--output /path/to/enriched_tiles/ \
--mode full \
--force
Smart Processing Skip
Automatically skips tiles with existing patches:
# Processes only tiles without patches
ign-lidar-hd process \
--input enriched_tiles/ \
--output patches/ \
--lod-level LOD2
# Shows detailed skip/process statistics
[1/20] Processing: tile_001.laz
✅ Completed: 48 patches in 23.5s
[2/20] ⏭️ tile_002.laz: 52 patches exist, skipping
[3/20] Processing: tile_003.laz
✅ Completed: 45 patches in 21.2s
📊 Processing Summary:
Total tiles: 20
✅ Processed: 15
⏭️ Skipped: 5
📦 Total patches created: 712
Force Reprocessing
Use the --force flag to reprocess all tiles:
# Force reprocess all tiles
ign-lidar-hd process \
--input enriched_tiles/ \
--output patches/ \
--lod-level LOD2 \
--force
Common Use Cases
1. Resume After Interruption
# Start big job
ign-lidar-hd download --bbox ... --output tiles/ --max-tiles 100
ign-lidar-hd process --input tiles/ --output patches/
# System crashes after 45 tiles have been processed...
# Resume - automatically skips completed work
ign-lidar-hd download --bbox ... --output tiles/ --max-tiles 100 # Skips all 100 already-downloaded tiles
ign-lidar-hd process --input tiles/ --output patches/ # Skips the 45 tiles that already have patches
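Resuming works because each step checks for its own output before doing any work. The sketch below simulates this with a temporary directory. It assumes a hypothetical stand-in `process_tile` that writes one placeholder `.npz` file per tile. The first run is interrupted after 45 tiles, and the second run skips those 45. It is illustrative only and does not call ign-lidar-hd.

```python
import tempfile
from pathlib import Path


def process_tile(tile: str, out_dir: Path) -> None:
    """Hypothetical stand-in for real processing: writes one placeholder patch."""
    tile_dir = out_dir / tile
    tile_dir.mkdir(parents=True, exist_ok=True)
    (tile_dir / "patch_0000.npz").write_bytes(b"")


def run(tiles, out_dir, crash_after=None):
    """Process tiles in order, skipping any that already have patches.

    Returns (processed, skipped). If crash_after is set, raises once that
    many tiles have been processed in this run, simulating a crash.
    """
    processed = skipped = 0
    for tile in tiles:
        if any((out_dir / tile).glob("*.npz")):
            skipped += 1
            continue
        if crash_after is not None and processed == crash_after:
            raise RuntimeError(f"simulated crash before {tile}")
        process_tile(tile, out_dir)
        processed += 1
    return processed, skipped


tiles = [f"tile_{i:03d}" for i in range(1, 101)]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    try:
        run(tiles, out, crash_after=45)
    except RuntimeError as exc:
        print(f"First run interrupted: {exc}")
    processed, skipped = run(tiles, out)  # the "resume" run
    print(f"Resumed run: processed {processed}, skipped {skipped}")
```

Re-running is safe because the skip check looks only at what is already on disk. The second run does not need to know where the first one stopped.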
2. Incremental Dataset Building
# Week 1: Download Paris
ign-lidar-hd download --bbox 2.0,48.8,2.5,49.0 --output france_tiles/
ign-lidar-hd process --input france_tiles/ --output france_patches/
# Week 2: Add Lyon (some overlap)
ign-lidar-hd download --bbox 4.7,45.6,5.0,45.9 --output france_tiles/
# Skips any overlapping tiles
ign-lidar-hd process --input france_tiles/ --output france_patches/
# Skips Paris tiles, processes only Lyon tiles
3. Batch Processing with Mixed Status
# Process a directory with mixed completion status
ign-lidar-hd process --input mixed_tiles/ --output patches/
# Output shows what's done vs what needs processing
⏭️ Tiles with existing patches: 15
✅ New tiles processed: 8
❌ Failed tiles: 2
Python API
Download with Skip Control
from ign_lidar import IGNLiDARDownloader
from pathlib import Path
downloader = IGNLiDARDownloader(Path("tiles/"))
# tile_list and filename below are placeholders for the tiles you want to fetch
# Skip existing by default
results = downloader.batch_download(tile_list)
# Force re-download
results = downloader.batch_download(tile_list, skip_existing=False)
# Check individual results
success, was_skipped = downloader.download_tile(filename)
if was_skipped:
    print(f"Skipped {filename} (already exists)")
elif success:
    print(f"Downloaded {filename}")
else:
    print(f"Failed to download {filename}")
Processing with Skip Control
from ign_lidar import LiDARProcessor
from pathlib import Path
processor = LiDARProcessor(lod_level='LOD2')
# Skip existing patches by default
patches = processor.process_directory(
    Path("enriched_tiles/"),
    Path("patches/")
)
# Force reprocessing
patches = processor.process_directory(
    Path("enriched_tiles/"),
    Path("patches/"),
    skip_existing=False
)
Performance Impact
Skip Check Performance
- File existence check: ~0.001-0.01s per file
- Patch directory check: ~0.01-0.05s per tile
- Enrichment check: ~0.02-0.1s per file
Time Comparison
100 tiles, 50% already processed:
Without Skip Detection:
Download: 100 tiles × 45s = 75 min
Process: 100 tiles × 35s = 58 min
Total: 133 minutes
With Skip Detection:
Download: 50 skipped (0.5 min) + 50 new (37.5 min) = 38 min
Process: 50 skipped (0.4 min) + 50 new (29 min) = 29.4 min
Total: 67.4 minutes
Time saved: 65.6 minutes (49% reduction)
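The comparison above can be reproduced from the figures it quotes: 45 s per download, 35 s per processed tile, and the stated skip overheads of 0.5 and 0.4 minutes. The small differences from the table come from rounding. Processing 50 new tiles takes 29.17 min, which the table rounds down to 29 min.

```python
TILES = 100
DOWNLOAD_S, PROCESS_S = 45, 35                         # seconds per tile, from the table
SKIP_OVERHEAD_MIN = {"download": 0.5, "process": 0.4}  # minutes, from the table
done = TILES // 2                                      # 50% already processed
new = TILES - done

without_skip = TILES * (DOWNLOAD_S + PROCESS_S) / 60
with_skip = (
    SKIP_OVERHEAD_MIN["download"] + new * DOWNLOAD_S / 60
    + SKIP_OVERHEAD_MIN["process"] + new * PROCESS_S / 60
)
saved = without_skip - with_skip
print(f"Without skip: {without_skip:.1f} min")
print(f"With skip:    {with_skip:.1f} min")
print(f"Saved:        {saved:.1f} min ({saved / without_skip:.0%})")
```

Without rounding, this gives 133.3 min versus 67.6 min, which is a saving of about 65.8 min (49%).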
Configuration
Smart skip detection is enabled by default for all operations. You can control it via:
CLI Flags
# Default: Skip existing
ign-lidar-hd command [args]
# Force override: Process everything
ign-lidar-hd command [args] --force
Python Parameters
# Default: skip_existing=True
processor.process_tile(file, output_dir)
# Override: skip_existing=False
processor.process_tile(file, output_dir, skip_existing=False)
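Both switches appear to express the same decision: passing --force on the CLI corresponds to `skip_existing=False` in Python. The sketch below shows that mapping as a generic guard. `run_step` and `skip_existing_from_force` are hypothetical names and are not part of the ign-lidar-hd API.

```python
from typing import Callable


def run_step(
    output_exists: Callable[[], bool],
    work: Callable[[], None],
    skip_existing: bool = True,
) -> str:
    """Run `work` unless skipping is enabled and its output already exists."""
    if skip_existing and output_exists():
        return "skipped"
    work()
    return "ran"


def skip_existing_from_force(force: bool) -> bool:
    """--force turns skipping off; omitting it leaves the default (skip) on."""
    return not force


calls: list[str] = []
print(run_step(lambda: True, lambda: calls.append("work")))  # skipped
print(run_step(lambda: True, lambda: calls.append("work"),
               skip_existing=skip_existing_from_force(force=True)))  # ran
```

Keeping the existence check separate from the work callable means the same guard can front downloads, enrichment, or patch generation.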
Skip Detection Logic
Download Skip
- Checks if LAZ file exists in output directory
- Checks the file size and skips only if it exceeds 1 MB, which is taken to indicate a complete download
- Logs skip reason and file size
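The download check can be sketched as follows. This is minimal and assumes tiles are stored flat as `output_dir/<tile name>` and that "1 MB" means 1024 × 1024 bytes (the exact unit is a guess). `should_skip_download` is a hypothetical helper, not the library's own function.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)

# Assumed threshold: the rule says "> 1 MB"; binary megabytes are a guess
MIN_COMPLETE_BYTES = 1024 * 1024


def should_skip_download(output_dir: Path, filename: str) -> bool:
    """Return True if the tile already exists and looks fully downloaded."""
    path = output_dir / filename
    if not path.is_file():
        return False
    size = path.stat().st_size
    if size <= MIN_COMPLETE_BYTES:
        # Small file: most likely an interrupted download, so fetch it again
        logger.info("%s is only %d bytes, re-downloading", filename, size)
        return False
    logger.info("%s already exists (%.0f MB), skipping", filename, size / 1024**2)
    return True


# Demo: one complete-looking file, one truncated file, one missing file
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "tile_001.laz").write_bytes(b"\0" * (2 * 1024 * 1024))
    (out / "tile_002.laz").write_bytes(b"\0" * 512)
    decisions = {name: should_skip_download(out, name)
                 for name in ("tile_001.laz", "tile_002.laz", "tile_003.laz")}
print(decisions)
```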
Enrichment Skip
- Checks if output file already exists
- Validates that file contains building features
- Skips if features are already present
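The enrichment check can be sketched as follows. This is minimal and assumes the enriched LAZ stores each computed feature as an extra point dimension. The names in `REQUIRED_FEATURES` are placeholders, not the library's actual schema. Reading dimension names from a real file (for instance with a LAS reader such as laspy) is replaced by a fake reader so the sketch stays self-contained.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical feature dimension names; substitute whatever your enrichment writes
REQUIRED_FEATURES = frozenset({"planarity", "linearity", "verticality"})


def is_already_enriched(
    output_path: Path,
    read_dimensions: Callable[[Path], Iterable[str]],
    required: frozenset = REQUIRED_FEATURES,
) -> bool:
    """Skip only if the output file exists and already has every required feature."""
    if not output_path.is_file():
        return False  # nothing enriched yet
    return required <= set(read_dimensions(output_path))


# Demo with a fake reader standing in for a real LAS/LAZ header lookup
fake_dimensions = {
    "tile_001.laz": ["X", "Y", "Z", "intensity"],
    "tile_002.laz": ["X", "Y", "Z", "planarity", "linearity", "verticality"],
}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in fake_dimensions:
        (out / name).touch()
    decisions = {
        name: is_already_enriched(out / name, lambda p: fake_dimensions[p.name])
        for name in ("tile_001.laz", "tile_002.laz", "tile_003.laz")
    }
print(decisions)
```

Passing the reader in as a function keeps the skip rule testable without real point clouds.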
Processing Skip
- Checks if patch directory exists for the tile
- Counts existing .npz patches
- Skips if patches already exist (non-zero count)
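The processing check can be sketched as follows. This is minimal and assumes patches for each tile are written to a subdirectory named after the tile's stem (for example `patches/tile_001/*.npz`). That layout and both helpers are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def existing_patch_count(tile: Path, output_dir: Path) -> int:
    """Count .npz patches already written for this tile (0 if none)."""
    patch_dir = output_dir / tile.stem  # assumed layout: one directory per tile
    if not patch_dir.is_dir():
        return 0
    return sum(1 for _ in patch_dir.glob("*.npz"))


def should_skip_processing(tile: Path, output_dir: Path) -> bool:
    """Skip when the tile's patch directory holds at least one patch."""
    return existing_patch_count(tile, output_dir) > 0


with tempfile.TemporaryDirectory() as tmp:
    patches = Path(tmp)
    done = patches / "tile_002"
    done.mkdir()
    for i in range(52):
        (done / f"patch_{i:04d}.npz").touch()
    (patches / "tile_003").mkdir()  # directory exists but holds no patches
    counts = {name: existing_patch_count(Path(f"{name}.laz"), patches)
              for name in ("tile_001", "tile_002", "tile_003")}
    skips = {name: should_skip_processing(Path(f"{name}.laz"), patches)
             for name in counts}
print(counts)
print(skips)
```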
Troubleshooting
Files Not Being Skipped
Check that file paths and naming are consistent:
# Verify file naming patterns
ls -la tiles/
ls -la patches/
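To narrow down a naming mismatch, it can help to list the tiles that have no matching patch output. The sketch below is minimal and assumes patches live in one directory per tile, named after the tile's stem. That layout is not documented here. `find_unmatched` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def find_unmatched(tiles_dir: Path, patches_dir: Path) -> tuple[list[str], list[str]]:
    """Return (tiles without a patch directory, patch directories without a tile)."""
    tile_stems = {p.stem for p in tiles_dir.glob("*.laz")}
    patch_dirs = {p.name for p in patches_dir.iterdir() if p.is_dir()}
    return sorted(tile_stems - patch_dirs), sorted(patch_dirs - tile_stems)


with tempfile.TemporaryDirectory() as tmp:
    tiles, patches = Path(tmp, "tiles"), Path(tmp, "patches")
    tiles.mkdir()
    patches.mkdir()
    for name in ("tile_001.laz", "tile_002.laz"):
        (tiles / name).touch()
    (patches / "tile_001").mkdir()
    (patches / "tile_002_enriched").mkdir()  # name differs from the tile's stem
    missing, orphaned = find_unmatched(tiles, patches)
print("Tiles with no patch directory:", missing)
print("Patch directories with no tile:", orphaned)
```

A tile that appears in the first list while a similarly named directory appears in the second usually points to a naming difference rather than missing work.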
Unexpected Skips
Use verbose logging to see skip decisions:
# Enable debug logging
ign-lidar-hd process --input tiles/ --output patches/ --verbose
Force Reprocessing
When you need to reprocess everything:
# Method 1: Use --force flag
ign-lidar-hd process --input tiles/ --output patches/ --force
# Method 2: Clear output directory
rm -rf patches/*
ign-lidar-hd process --input tiles/ --output patches/