Smart Skip Detection
Smart Skip Detection prevents redundant operations by automatically detecting and skipping existing files during downloads, enrichment, and processing workflows.
Overview
This feature adds intelligent skip detection to all workflows:
- Download Skip - Avoid re-downloading existing tiles
- Enrichment Skip - Skip files that are already enriched
- Processing Skip - Skip tiles with existing patches
Key Benefits
⚡ Time Savings
- Downloads: Skip re-downloading tiles (~60 min saved on 50 tiles)
- Processing: Skip reprocessing tiles (~90 min saved on 50 tiles)
- Total: ~150 minutes saved on a typical 50-tile workflow
💾 Resource Savings
- Bandwidth: Avoid downloading duplicate large files (12+ GB on 50 tiles)
- Disk Space: Avoid creating duplicate patches
- CPU/Memory: Avoid redundant feature computation
🔄 Workflow Improvements
- Resume Capability: Easily resume after interruptions
- Incremental Builds: Add new data to existing datasets
- Idempotent Operations: Safe to run commands multiple times
Smart Download Skip
Automatically skips existing tiles during download:
# Downloads only missing tiles
ign-lidar-hd download \
--bbox 2.0,48.8,2.5,49.0 \
--output tiles/
# Output shows what's skipped vs downloaded
⏭️ tile_001.laz already exists (245 MB), skipping
Downloading tile_002.laz...
✅ Downloaded tile_002.laz (238 MB)
📊 Download Summary:
Total tiles requested: 10
✅ Successfully downloaded: 7
⏭️ Skipped (already present): 2
❌ Failed: 1
Force Re-download
Use the --force flag to override the skip behavior:
# Force re-download all tiles
ign-lidar-hd download \
--bbox 2.0,48.8,2.5,49.0 \
--output tiles/ \
--force
Smart Enrichment Skip
Automatically skips LAZ files that are already enriched:
# Enriches only files without building features
ign-lidar-hd enrich \
--input-dir /path/to/raw_tiles/ \
--output /path/to/enriched_tiles/ \
--mode full
# Shows progress and skip statistics
[1/20] Processing: tile_001.laz
✅ Enriched: 1.2M points in 15.3s
[2/20] ⏭️ tile_002.laz: Already enriched, skipping
[3/20] Processing: tile_003.laz
✅ Enriched: 980K points in 12.1s
Force Re-enrichment
Use the --force flag to re-enrich files:
# Force re-enrichment of all files
ign-lidar-hd enrich \
--input-dir /path/to/raw_tiles/ \
--output /path/to/enriched_tiles/ \
--mode full \
--force
Smart Processing Skip
Automatically skips tiles with existing patches:
# Processes only tiles without patches
ign-lidar-hd process \
--input enriched_tiles/ \
--output patches/ \
--lod-level LOD2
# Shows detailed skip/process statistics
[1/20] Processing: tile_001.laz
✅ Completed: 48 patches in 23.5s
[2/20] ⏭️ tile_002.laz: 52 patches exist, skipping
[3/20] Processing: tile_003.laz
✅ Completed: 45 patches in 21.2s
📊 Processing Summary:
Total tiles: 20
✅ Processed: 15
⏭️ Skipped: 5
📦 Total patches created: 712
Force Reprocessing
Use the --force flag to reprocess all tiles:
# Force reprocess all tiles
ign-lidar-hd process \
--input enriched_tiles/ \
--output patches/ \
--lod-level LOD2 \
--force
Common Use Cases
1. Resume After Interruption
# Start big job
ign-lidar-hd download --bbox ... --output tiles/ --max-tiles 100
ign-lidar-hd process --input tiles/ --output patches/
# System crashes at tile 45...
# Resume - automatically skips completed work
ign-lidar-hd download --bbox ... --output tiles/ --max-tiles 100 # Skips tiles already downloaded
ign-lidar-hd process --input tiles/ --output patches/ # Skips tiles that already have patches
2. Incremental Dataset Building
# Week 1: Download Paris
ign-lidar-hd download --bbox 2.0,48.8,2.5,49.0 --output france_tiles/
ign-lidar-hd process --input france_tiles/ --output france_patches/
# Week 2: Add Lyon (some overlap)
ign-lidar-hd download --bbox 4.7,45.6,5.0,45.9 --output france_tiles/
# Skips any overlapping tiles
ign-lidar-hd process --input france_tiles/ --output france_patches/
# Skips Paris tiles, processes only Lyon tiles
3. Batch Processing with Mixed Status
# Process a directory with mixed completion status
ign-lidar-hd process --input mixed_tiles/ --output patches/
# Output shows what's done vs what needs processing
⏭️ Tiles with existing patches: 15
✅ New tiles processed: 8
❌ Failed tiles: 2
Python API
Download with Skip Control
from ign_lidar import IGNLiDARDownloader
from pathlib import Path
downloader = IGNLiDARDownloader(Path("tiles/"))
# Skip existing by default
results = downloader.batch_download(tile_list)
# Force re-download
results = downloader.batch_download(tile_list, skip_existing=False)
# Check individual results
success, was_skipped = downloader.download_tile(filename)
if was_skipped:
    print(f"Skipped {filename} (already exists)")
elif success:
    print(f"Downloaded {filename}")
else:
    print(f"Failed to download {filename}")
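The (success, was_skipped) pair shown above can be tallied into the same three buckets as the CLI's Download Summary. The sketch below is only an illustration: `summarize_outcomes` is a hypothetical helper, not part of `ign_lidar`, and it assumes each outcome is exactly the two-element tuple returned by `download_tile`.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_outcomes(outcomes: Iterable[Tuple[bool, bool]]) -> Counter:
    """Tally (success, was_skipped) pairs into downloaded / skipped / failed.

    Hypothetical helper: the skip flag is checked first, mirroring the
    if/elif/else order used in the example above.
    """
    summary = Counter(downloaded=0, skipped=0, failed=0)
    for success, was_skipped in outcomes:
        if was_skipped:
            summary["skipped"] += 1
        elif success:
            summary["downloaded"] += 1
        else:
            summary["failed"] += 1
    return summary


if __name__ == "__main__":
    # Stand-in data; in practice the pairs would come from download_tile calls.
    demo = [(True, False)] * 7 + [(True, True)] * 2 + [(False, False)]
    print(dict(summarize_outcomes(demo)))
```

With a real downloader, the pairs could be collected by calling `downloader.download_tile(name)` for each filename and passing the results to the helper.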
Processing with Skip Control
from ign_lidar import LiDARProcessor
from pathlib import Path
processor = LiDARProcessor(lod_level='LOD2')
# Skip existing patches by default
patches = processor.process_directory(
    Path("enriched_tiles/"),
    Path("patches/")
)
# Force reprocessing
patches = processor.process_directory(
    Path("enriched_tiles/"),
    Path("patches/"),
    skip_existing=False
)
Performance Impact
Skip Check Performance
- File existence check: ~0.001-0.01s per file
- Patch directory check: ~0.01-0.05s per tile
- Enrichment check: ~0.02-0.1s per file
Time Comparison
100 tiles, 50% already processed:
Without Skip Detection:
Download: 100 tiles × 45s = 75 min
Process: 100 tiles × 35s = 58 min
Total: 133 minutes
With Skip Detection:
Download: 50 skipped (0.5 min) + 50 new (37.5 min) = 38 min
Process: 50 skipped (0.4 min) + 50 new (29 min) = 29.4 min
Total: 67.4 minutes
Time saved: 65.6 minutes (49% reduction)
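The arithmetic behind this comparison can be reproduced with a few lines of Python. This is a rough sketch, not a benchmark: `estimate_minutes` is a hypothetical helper, and the per-skip check costs (0.6 s and 0.48 s) are derived from the 0.5 min and 0.4 min quoted for 50 skipped tiles.

```python
def estimate_minutes(total_tiles: int, skipped_tiles: int,
                     seconds_per_tile: float, check_seconds: float) -> float:
    """Estimated wall-clock minutes for one stage (download or process)."""
    new_tiles = total_tiles - skipped_tiles
    return (new_tiles * seconds_per_tile + skipped_tiles * check_seconds) / 60.0


# Figures from the comparison above: 100 tiles, half of them already done.
without_skip = estimate_minutes(100, 0, 45, 0.0) + estimate_minutes(100, 0, 35, 0.0)
with_skip = estimate_minutes(100, 50, 45, 0.6) + estimate_minutes(100, 50, 35, 0.48)
saved = without_skip - with_skip

print(f"Without skip: {without_skip:.1f} min")
print(f"With skip:    {with_skip:.1f} min")
print(f"Saved:        {saved:.1f} min ({saved / without_skip:.0%})")
```

Because the table rounds each intermediate value, the unrounded totals printed here differ from it by a fraction of a minute; the reduction still comes out at about 49%.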
Configuration
Smart skip detection is enabled by default for all operations. You can control it via:
CLI Flags
# Default: Skip existing
ign-lidar-hd command [args]
# Force override: Process everything
ign-lidar-hd command [args] --force
Python Parameters
# Default: skip_existing=True
processor.process_tile(file, output_dir)
# Override: skip_existing=False
processor.process_tile(file, output_dir, skip_existing=False)
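The CLI flag and the Python parameter express the same switch with opposite polarity: `--force` on the command line corresponds to `skip_existing=False` in code. A wrapper script might map one to the other as sketched below. The parser here is a hypothetical stand-in built with `argparse`; it is not the actual `ign-lidar-hd` command-line implementation.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical wrapper script (not ign-lidar-hd itself)."""
    parser = argparse.ArgumentParser(prog="my-wrapper")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--force", action="store_true",
                        help="redo the work even when outputs already exist")
    return parser


def skip_existing_from_args(argv: List[str]) -> bool:
    """Translate the presence of --force into a skip_existing value."""
    args = build_parser().parse_args(argv)
    return not args.force  # --force given -> skip_existing=False


if __name__ == "__main__":
    print(skip_existing_from_args(["--input", "tiles/", "--output", "patches/"]))
    print(skip_existing_from_args(["--input", "tiles/", "--output", "patches/", "--force"]))
```

The resulting boolean could then be passed as `skip_existing=` to calls such as `process_tile` or `process_directory` shown earlier.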
Skip Detection Logic
Download Skip
- Checks if LAZ file exists in output directory
- Checks the file size and skips only if it is larger than 1 MB (taken as a sign of a complete download)
- Logs skip reason and file size
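The three checks above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the library's code: `should_skip_download` and `MIN_COMPLETE_SIZE_BYTES` are hypothetical names, and the 1 MB threshold is interpreted here as 1 MiB.

```python
from pathlib import Path
from typing import Tuple

# Assumed threshold, based on the "> 1MB" rule above (interpreted as 1 MiB).
MIN_COMPLETE_SIZE_BYTES = 1024 * 1024


def should_skip_download(output_dir: Path, filename: str,
                         min_size: int = MIN_COMPLETE_SIZE_BYTES) -> Tuple[bool, str]:
    """Return (skip, reason) for one tile file (hypothetical helper)."""
    target = output_dir / filename
    if not target.is_file():
        return False, f"{filename} not found, downloading"
    size = target.stat().st_size
    if size <= min_size:
        # A tiny file is treated as a truncated download and fetched again.
        return False, f"{filename} is only {size} bytes, re-downloading"
    size_mb = size / (1024 * 1024)
    return True, f"{filename} already exists ({size_mb:.0f} MB), skipping"


if __name__ == "__main__":
    skip, reason = should_skip_download(Path("tiles/"), "tile_001.laz")
    print(skip, reason)
```

Returning the reason alongside the decision keeps the logging step separate from the check, so the caller decides where the message goes.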
Enrichment Skip
- Checks if output file already exists
- Validates that file contains building features
- Skips if features are already present
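A sketch of this check follows. It is hedged on two points. First, the feature names in `REQUIRED_FEATURES` are placeholders, since this page does not list the dimensions written by enrichment. Second, reading the file is delegated to a caller-supplied function (`read_dimension_names`, hypothetical) so the example stays independent of any particular LAZ library.

```python
from pathlib import Path
from typing import Callable, Iterable, Set

# Placeholder feature names; the real dimension names written by the
# enrichment step are not listed in this document.
REQUIRED_FEATURES: Set[str] = {"planarity", "verticality"}


def is_already_enriched(output_file: Path,
                        read_dimension_names: Callable[[Path], Iterable[str]],
                        required: Set[str] = REQUIRED_FEATURES) -> bool:
    """Return True when the output exists and carries every required feature."""
    if not output_file.is_file():
        return False
    try:
        present = set(read_dimension_names(output_file))
    except Exception:
        # An unreadable (for example truncated) file is treated as not enriched.
        return False
    return required.issubset(present)


if __name__ == "__main__":
    def fake_reader(path: Path) -> Iterable[str]:
        # Stand-in for a real LAZ header reader.
        return ["x", "y", "z", "planarity", "verticality"]

    print(is_already_enriched(Path("enriched_tiles/tile_001.laz"), fake_reader))
```

Injecting the reader also makes the check easy to test without real point-cloud files.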
Processing Skip
- Checks if patch directory exists for the tile
- Counts existing .npz patches
- Skips if patches already exist (non-zero count)
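The patch check can be approximated with `pathlib` alone. The sketch below assumes a layout of one sub-directory per tile, named after the tile's file stem (for example `patches/tile_001/`). That layout and both function names are assumptions made for illustration; the actual on-disk structure may differ.

```python
from pathlib import Path


def count_existing_patches(output_dir: Path, tile_file: Path) -> int:
    """Count .npz patches for a tile, assuming one sub-directory per tile stem."""
    patch_dir = output_dir / tile_file.stem  # assumed layout: <output>/<tile_stem>/
    if not patch_dir.is_dir():
        return 0
    return sum(1 for _ in patch_dir.glob("*.npz"))


def should_skip_processing(output_dir: Path, tile_file: Path) -> bool:
    """Skip a tile as soon as at least one patch already exists for it."""
    return count_existing_patches(output_dir, tile_file) > 0


if __name__ == "__main__":
    tile = Path("enriched_tiles/tile_002.laz")
    n = count_existing_patches(Path("patches/"), tile)
    print(f"{tile.name}: {n} patches exist")
```

Note that a non-zero count does not prove the tile was fully processed; an interrupted run could leave a partial set of patches, in which case --force is the safer choice for that tile.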
Troubleshooting
Files Not Being Skipped
Check that file paths and naming are consistent:
# Verify file naming patterns
ls -la tiles/
ls -la patches/
Unexpected Skips
Use verbose logging to see skip decisions:
# Enable debug logging
ign-lidar-hd process --input tiles/ --output patches/ --verbose
Force Reprocessing
When you need to reprocess everything:
# Method 1: Use --force flag
ign-lidar-hd process --input tiles/ --output patches/ --force
# Method 2: Clear output directory
rm -rf patches/*
ign-lidar-hd process --input tiles/ --output patches/