Smart Skip Detection

Smart Skip Detection prevents redundant operations by automatically detecting and skipping existing files during downloads, enrichment, and processing workflows.

Overview​

This feature adds intelligent skip detection to all workflows:

  • Download Skip - Avoid re-downloading existing tiles
  • Enrichment Skip - Skip files that are already enriched
  • Processing Skip - Skip tiles with existing patches

Key Benefits​

⚡ Time Savings

  • Downloads: Skip re-downloading tiles (~60 min saved on 50 tiles)
  • Processing: Skip reprocessing tiles (~90 min saved on 50 tiles)
  • Total: ~150 minutes saved on a typical 50-tile workflow

💾 Resource Savings

  • Bandwidth: Avoid downloading duplicate large files (12+ GB on 50 tiles)
  • Disk Space: Avoid creating duplicate patches
  • CPU/Memory: Avoid redundant feature computation

🔄 Workflow Improvements

  • Resume Capability: Easily resume after interruptions
  • Incremental Builds: Add new data to existing datasets
  • Idempotent Operations: Safe to run commands multiple times

Smart Download Skip​

Automatically skips existing tiles during download:

# Downloads only missing tiles
ign-lidar-hd download \
--bbox 2.0,48.8,2.5,49.0 \
--output tiles/

# Output shows what's skipped vs downloaded
⏭️ tile_001.laz already exists (245 MB), skipping
Downloading tile_002.laz...
✅ Downloaded tile_002.laz (238 MB)

📊 Download Summary:
Total tiles requested: 10
✅ Successfully downloaded: 7
⏭️ Skipped (already present): 2
❌ Failed: 1

Force Re-download​

Use the --force flag to override the skip behavior:

# Force re-download all tiles
ign-lidar-hd download \
--bbox 2.0,48.8,2.5,49.0 \
--output tiles/ \
--force

Smart Enrichment Skip​

Automatically skips LAZ files that are already enriched:

# Enriches only files without building features
ign-lidar-hd enrich \
--input-dir /path/to/raw_tiles/ \
--output /path/to/enriched_tiles/ \
--mode full

# Shows progress and skip statistics
[1/20] Processing: tile_001.laz
✅ Enriched: 1.2M points in 15.3s
[2/20] ⏭️ tile_002.laz: Already enriched, skipping
[3/20] Processing: tile_003.laz
✅ Enriched: 980K points in 12.1s

Force Re-enrichment​

Use the --force flag to re-enrich files that were already enriched:

# Force re-enrichment of all files
ign-lidar-hd enrich \
--input-dir /path/to/raw_tiles/ \
--output /path/to/enriched_tiles/ \
--mode full \
--force

Smart Processing Skip​

Automatically skips tiles with existing patches:

# Processes only tiles without patches
ign-lidar-hd process \
--input enriched_tiles/ \
--output patches/ \
--lod-level LOD2

# Shows detailed skip/process statistics
[1/20] Processing: tile_001.laz
✅ Completed: 48 patches in 23.5s
[2/20] ⏭️ tile_002.laz: 52 patches exist, skipping
[3/20] Processing: tile_003.laz
✅ Completed: 45 patches in 21.2s

📊 Processing Summary:
Total tiles: 20
✅ Processed: 15
⏭️ Skipped: 5
📦 Total patches created: 712

Force Reprocessing​

Use the --force flag to reprocess all tiles, including those with existing patches:

# Force reprocess all tiles
ign-lidar-hd process \
--input enriched_tiles/ \
--output patches/ \
--lod-level LOD2 \
--force

Common Use Cases​

1. Resume After Interruption​

# Start big job
ign-lidar-hd download --bbox ... --output tiles/ --max-tiles 100
ign-lidar-hd process --input tiles/ --output patches/

# System crashes at tile 45...

# Resume - automatically skips completed work
ign-lidar-hd download --bbox ... --output tiles/ --max-tiles 100 # Skips tiles already downloaded
ign-lidar-hd process --input tiles/ --output patches/ # Skips tiles that already have patches

2. Incremental Dataset Building​

# Week 1: Download Paris
ign-lidar-hd download --bbox 2.0,48.8,2.5,49.0 --output france_tiles/
ign-lidar-hd process --input france_tiles/ --output france_patches/

# Week 2: Add Lyon to the same directories
ign-lidar-hd download --bbox 4.7,45.6,5.0,45.9 --output france_tiles/
# Skips any tiles already present in france_tiles/
ign-lidar-hd process --input france_tiles/ --output france_patches/
# Skips Paris tiles, processes only Lyon tiles

3. Batch Processing with Mixed Status​

# Process a directory with mixed completion status
ign-lidar-hd process --input mixed_tiles/ --output patches/

# Output shows what's done vs what needs processing
⏭️ Tiles with existing patches: 15
✅ New tiles processed: 8
❌ Failed tiles: 2

Python API​

Download with Skip Control​

from ign_lidar import IGNLiDARDownloader
from pathlib import Path

downloader = IGNLiDARDownloader(Path("tiles/"))

# Skip existing by default
results = downloader.batch_download(tile_list)

# Force re-download
results = downloader.batch_download(tile_list, skip_existing=False)

# Check individual results
success, was_skipped = downloader.download_tile(filename)
if was_skipped:
    print(f"Skipped {filename} (already exists)")
elif success:
    print(f"Downloaded {filename}")
else:
    print(f"Failed to download {filename}")
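The (success, was_skipped) pair shown above can be tallied into the same three counts that the CLI download summary reports. The following is a minimal sketch, assuming you have collected such pairs in a list yourself; `summarize_results` is a hypothetical helper written for illustration and is not part of the `ign_lidar` package.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_results(results: Iterable[Tuple[bool, bool]]) -> Counter:
    """Tally (success, was_skipped) pairs into downloaded/skipped/failed counts.

    Hypothetical helper for illustration; not part of the ign_lidar package.
    """
    counts = Counter(downloaded=0, skipped=0, failed=0)
    for success, was_skipped in results:
        # Same order of checks as the per-tile example: skip first, then success.
        if was_skipped:
            counts["skipped"] += 1
        elif success:
            counts["downloaded"] += 1
        else:
            counts["failed"] += 1
    return counts


# Example pairs shaped like the return value of download_tile.
example = [(True, False), (True, True), (False, False), (True, False)]
summary = summarize_results(example)
print(f"Downloaded: {summary['downloaded']}, "
      f"skipped: {summary['skipped']}, failed: {summary['failed']}")
```

Checking `was_skipped` before `success` matters because a skipped tile may also be reported as successful.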

Processing with Skip Control​

from ign_lidar import LiDARProcessor
from pathlib import Path

processor = LiDARProcessor(lod_level='LOD2')

# Skip existing patches by default
patches = processor.process_directory(
    Path("enriched_tiles/"),
    Path("patches/")
)

# Force reprocessing
patches = processor.process_directory(
    Path("enriched_tiles/"),
    Path("patches/"),
    skip_existing=False
)

Performance Impact​

Skip Check Performance​

  • File existence check: ~0.001-0.01s per file
  • Patch directory check: ~0.01-0.05s per tile
  • Enrichment check: ~0.02-0.1s per file

Time Comparison​

100 tiles, 50% already processed:

Without Skip Detection:
Download: 100 tiles × 45s = 75 min
Process: 100 tiles × 35s = 58 min
Total: 133 minutes

With Skip Detection:
Download: 50 skipped (0.5 min) + 50 new (37.5 min) = 38 min
Process: 50 skipped (0.4 min) + 50 new (29 min) = 29.4 min
Total: 67.4 minutes

Time saved: 65.6 minutes (49% reduction)
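The comparison above can be reproduced with a few lines of arithmetic. This sketch uses the per-tile costs quoted in the comparison (45 s per download, 35 s per processed tile); the per-tile skip-check costs are assumptions back-derived from the 0.5 min and 0.4 min totals for 50 skipped tiles, and `workflow_minutes` is a hypothetical helper.

```python
def workflow_minutes(total_tiles, skipped_tiles, seconds_per_tile, check_seconds):
    """Estimated minutes for one stage: work on new tiles plus skip checks."""
    new_tiles = total_tiles - skipped_tiles
    return (new_tiles * seconds_per_tile + skipped_tiles * check_seconds) / 60


# Skip-check costs (0.6 s and 0.48 s per tile) are assumptions derived from
# the 0.5 min and 0.4 min figures for 50 skipped tiles.
without_skip = workflow_minutes(100, 0, 45, 0) + workflow_minutes(100, 0, 35, 0)
with_skip = workflow_minutes(100, 50, 45, 0.6) + workflow_minutes(100, 50, 35, 0.48)

saved = without_skip - with_skip
print(f"Without skip: {without_skip:.1f} min")
print(f"With skip:    {with_skip:.1f} min")
print(f"Saved:        {saved:.1f} min ({saved / without_skip:.0%})")
```

The printed values differ from the figures above by a few tenths of a minute because the comparison rounds intermediate results; the reduction still comes out at roughly 49%.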

Configuration​

Smart skip detection is enabled by default for all operations. You can control it via:

CLI Flags​

# Default: Skip existing
ign-lidar-hd command [args]

# Force override: Process everything
ign-lidar-hd command [args] --force

Python Parameters​

# Default: skip_existing=True
processor.process_tile(file, output_dir)

# Override: skip_existing=False
processor.process_tile(file, output_dir, skip_existing=False)
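The CLI flag and the Python parameter express the same switch with opposite polarity: passing --force corresponds to skip_existing=False. The sketch below illustrates that relationship with a stand-alone `argparse` parser; the parser and its `example-cli` name are hypothetical and do not reproduce how `ign-lidar-hd` parses its arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing one way to expose a --force flag."""
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--force", action="store_true",
                        help="process everything, ignoring existing outputs")
    return parser


args = build_parser().parse_args(
    ["--input", "tiles/", "--output", "patches/", "--force"]
)
# --force present -> skip_existing is False; absent -> True (the default).
skip_existing = not args.force
print(f"skip_existing={skip_existing}")
```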

Skip Detection Logic​

Download Skip​

  • Checks if LAZ file exists in output directory
  • Checks the file size (skips if larger than 1 MB, which is treated as a complete download)
  • Logs skip reason and file size
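The download rule above can be sketched as a single predicate. This is a minimal illustration, not the library's code: `should_skip_download` and `MIN_COMPLETE_BYTES` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Threshold from the rule above: files larger than 1 MB count as complete.
# Interpreting 1 MB as 1024 * 1024 bytes is an assumption.
MIN_COMPLETE_BYTES = 1024 * 1024


def should_skip_download(tile_path: Path, min_bytes: int = MIN_COMPLETE_BYTES) -> bool:
    """Return True when the tile exists and is larger than the threshold.

    Hypothetical helper; the library's internal check may differ.
    """
    if not tile_path.is_file():
        return False
    size = tile_path.stat().st_size
    if size > min_bytes:
        # Log the skip reason together with the file size.
        print(f"{tile_path.name} already exists ({size / 1e6:.0f} MB), skipping")
        return True
    # A very small file is treated as an incomplete download and fetched again.
    return False
```

A size threshold is a cheap heuristic: it catches empty or truncated stubs but cannot detect a large file that is corrupted, which is one reason a --force override is useful.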

Enrichment Skip​

  • Checks if output file already exists
  • Validates that file contains building features
  • Skips if features are already present
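The enrichment rule combines an existence check with a content check. The sketch below shows that shape under stated assumptions: the feature names are placeholders (this page does not list the dimensions that `enrich` writes), `is_already_enriched` is a hypothetical helper, and the function that reads dimension names from a LAZ file is passed in rather than assumed.

```python
from pathlib import Path
from typing import Callable, Iterable

# Placeholder names: the real feature dimensions are not listed in this document.
REQUIRED_FEATURES = ("feature_a", "feature_b")


def is_already_enriched(
    output_path: Path,
    read_dimension_names: Callable[[Path], Iterable[str]],
    required: Iterable[str] = REQUIRED_FEATURES,
) -> bool:
    """Return True when the output exists and carries every required feature.

    Hypothetical helper; read_dimension_names is supplied by the caller.
    """
    if not output_path.is_file():
        return False
    present = set(read_dimension_names(output_path))
    return set(required).issubset(present)
```

Requiring every feature, rather than any one of them, means a partially written output is enriched again instead of being skipped.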

Processing Skip​

  • Checks if patch directory exists for the tile
  • Counts existing .npz patches
  • Skips if patches already exist (non-zero count)
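The processing rule amounts to counting .npz files for a tile. The following sketch assumes that patches for a tile are stored in a subdirectory named after the tile's file stem; that layout and both function names are assumptions made for illustration.

```python
from pathlib import Path


def count_existing_patches(output_dir: Path, tile_path: Path) -> int:
    """Count .npz patches already written for a tile.

    Assumes patches live in output_dir/<tile stem>/; the real layout may differ.
    """
    patch_dir = output_dir / tile_path.stem
    if not patch_dir.is_dir():
        return 0
    return sum(1 for _ in patch_dir.glob("*.npz"))


def should_skip_processing(output_dir: Path, tile_path: Path) -> bool:
    """Skip the tile when at least one patch already exists for it."""
    return count_existing_patches(output_dir, tile_path) > 0
```

Under a non-zero-count rule, a tile whose previous run stopped partway would still be skipped, so rerunning that tile with --force is the safer choice after an interruption.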

Troubleshooting​

Files Not Being Skipped​

Check that file paths and naming are consistent:

# Verify file naming patterns
ls -la tiles/
ls -la patches/

Unexpected Skips​

Use verbose logging to see skip decisions:

# Enable debug logging
ign-lidar-hd process --input tiles/ --output patches/ --verbose

Force Reprocessing​

When you need to reprocess everything:

# Method 1: Use --force flag
ign-lidar-hd process --input tiles/ --output patches/ --force

# Method 2: Clear output directory
rm -rf patches/*
ign-lidar-hd process --input tiles/ --output patches/

See Also​