Building Cluster ID and Parcel Cluster ID Features Guide
Configuration Version: 6.3.1
Date: October 25, 2025
Status: ✅ Already Enabled in asprs_complete.yaml
Overview
The Building Cluster ID and Parcel Cluster ID features assign each point an identifier based on the building or cadastral parcel it falls in. These features are essential for:
- Object-based analysis: Group points by building or parcel
- Instance segmentation: Separate individual building instances
- Property analysis: Link LiDAR data to cadastral information
- Change detection: Track changes to specific buildings/parcels over time
✅ Current Configuration Status
Your asprs_complete.yaml configuration already has these features enabled:
1. Feature Computation (Lines 106-109)
features:
  # Cluster ID features (building and parcel identification)
  compute_cluster_id: true # Spatial clustering for object identification
  compute_building_cluster_id: true # Building-specific cluster IDs from BD TOPO
  compute_parcel_cluster_id: true # Cadastral parcel cluster IDs
2. Ground Truth Assignment (Lines 296-297)
ground_truth:
  bd_topo:
    # Cluster ID assignment
    assign_building_cluster_ids: true # Assign unique ID per building polygon
    assign_parcel_cluster_ids: true # Assign unique ID per cadastral parcel
3. Output Saving (Lines 499-501)
output:
  extra_dims:
    # Cluster IDs (object identification)
    - cluster_id # General spatial cluster ID
    - building_cluster_id # Building-specific ID from BD TOPO
    - parcel_cluster_id # Cadastral parcel ID
🔍 Feature Descriptions
1. cluster_id - General Spatial Clustering
- Type: Integer (int32)
- Range: 0 to N (number of detected clusters)
- Method: DBSCAN or similar spatial clustering
- Purpose: Groups nearby points into spatial clusters regardless of class
- Use case: Detect connected objects, segment point clouds into instances
Example values:
- 0: Noise/unassigned points
- 1, 2, 3, ...: Individual cluster IDs
2. building_cluster_id - Building Instance IDs
- Type: Integer (int32)
- Range: 0 to N (number of buildings in BD TOPO)
- Method: Spatial intersection with BD TOPO building polygons
- Purpose: Assign unique ID to each building from ground truth
- Use case: Building-level analysis, facade extraction, architectural studies
Example values:
- 0: Not inside any building polygon
- 1001, 1002, 1003, ...: Unique building IDs from BD TOPO
Key features:
- Links points to official building registry (BD TOPO)
- Preserves building identity across tiles
- Enables building-level statistics and analysis
3. parcel_cluster_id - Cadastral Parcel IDs
- Type: Integer (int32)
- Range: 0 to N (number of parcels)
- Method: Spatial intersection with cadastral parcel polygons
- Source: French cadastre (PARCELLE.shp)
- Purpose: Link points to land ownership parcels
- Use case: Property analysis, urban planning, land use studies
Example values:
- 0: Not inside any cadastral parcel
- 2001, 2002, 2003, ...: Unique parcel IDs from cadastre
Key features:
- Links points to official land registry
- Enables property-level aggregation
- Useful for legal/administrative boundaries
📊 How Cluster IDs Work
Processing Pipeline
1. Load Point Cloud
↓
2. Compute Features (geometric, spectral, height)
↓
3. Load Ground Truth (BD TOPO + Cadastre)
↓
4. Spatial Clustering (compute cluster_id)
↓
5. Building Assignment (compute building_cluster_id)
├─ Check if point falls inside building polygon (with buffer)
├─ Assign building ID from BD TOPO
└─ Use 3D bounding box if extrude_3d=true
↓
6. Parcel Assignment (compute parcel_cluster_id)
├─ Check if point falls inside parcel polygon
└─ Assign parcel ID from cadastre
↓
7. Save to LAZ (cluster IDs as extra dimensions)
Assignment Logic
Building Cluster ID:
# For each point (x, y, z):
building_cluster_id = 0  # Default: not in any building
for building in bd_topo_buildings:
    if not point_inside_building_polygon(x, y, building):
        continue
    # With 3D extrusion enabled, the point must also lie within
    # this building's vertical extent (z_min to z_max)
    if extrude_3d and not (z_min <= z <= z_max):
        continue
    building_cluster_id = building.id
    break
Parcel Cluster ID:
# For each point (x, y):
for parcel in cadastre_parcels:
    if point_inside_parcel_polygon(x, y, parcel):
        parcel_cluster_id = parcel.id
        break
else:
    parcel_cluster_id = 0 # Not in any parcel
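The assignment logic above can be sketched as runnable code. This is a minimal illustration in pure Python, not the pipeline's implementation: the ray-casting test, the `assign_building_id` helper, and the dictionary layout (`ring`, `z_min`, `z_max`) are all assumptions made for the example, and buffers are ignored.

```python
from typing import Dict, List, Sequence, Tuple

Point2D = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point2D]) -> bool:
    """Ray casting: count crossings of a ray going in +x from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_building_id(point: Tuple[float, float, float],
                       buildings: List[Dict],
                       extrude_3d: bool = True) -> int:
    """Return the ID of the first building containing the point, or 0."""
    x, y, z = point
    for b in buildings:
        if not point_in_polygon(x, y, b["ring"]):
            continue
        # Hypothetical z_min/z_max fields: vertical extent of the building
        if extrude_3d and not (b["z_min"] <= z <= b["z_max"]):
            continue
        return b["id"]
    return 0  # Not in any building


# Two made-up rectangular footprints
buildings = [
    {"id": 1001, "ring": [(0, 0), (10, 0), (10, 10), (0, 10)],
     "z_min": 100.0, "z_max": 112.0},
    {"id": 1002, "ring": [(20, 0), (30, 0), (30, 10), (20, 10)],
     "z_min": 100.0, "z_max": 106.0},
]

print(assign_building_id((5.0, 5.0, 105.0), buildings))
print(assign_building_id((25.0, 5.0, 120.0), buildings))
print(assign_building_id((25.0, 5.0, 120.0), buildings, extrude_3d=False))
```

The second and third calls differ only in `extrude_3d`: a point 14 m above the second footprint's assumed roof height is rejected when the vertical check is on and accepted when it is off. Parcel assignment follows the same pattern without the vertical check.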
🎯 Use Cases & Applications
1. Building-Level Analysis
Extract all points for a specific building:
import laspy
import numpy as np
# Read LAZ file with cluster IDs
las = laspy.read("output.laz")
building_id = 1001
# Filter points by building ID
mask = las.building_cluster_id == building_id
building_points = las.xyz[mask]
building_classification = las.classification[mask]
print(f"Building {building_id}: {len(building_points)} points")
print(f"Classes: {np.unique(building_classification)}")
Calculate building statistics:
# Group by building ID
unique_buildings = np.unique(las.building_cluster_id)
unique_buildings = unique_buildings[unique_buildings > 0] # Exclude 0
for bldg_id in unique_buildings:
    mask = las.building_cluster_id == bldg_id
    points = las.xyz[mask]
    # Compute building metrics
    height = points[:, 2].max() - points[:, 2].min()
    # Rough sampled-surface estimate: each point covers ~0.01 m² at 10cm point spacing
    area_estimate = len(points) * 0.01
    print(f"Building {bldg_id}: {height:.1f}m tall, ~{area_estimate:.0f}m² sampled surface")
2. Parcel-Level Aggregation
Aggregate points by cadastral parcel:
import pandas as pd
# Group by parcel
unique_parcels = np.unique(las.parcel_cluster_id)
unique_parcels = unique_parcels[unique_parcels > 0]
parcel_stats = []
for parcel_id in unique_parcels:
    mask = las.parcel_cluster_id == parcel_id
    points = las.xyz[mask]
    classes = las.classification[mask]
    # Count class distribution per parcel (all counts are point counts)
    stats = {
        'parcel_id': parcel_id,
        'n_points': len(points),
        'n_buildings': np.sum(classes == 6), # Points in building class
        'n_vegetation': np.sum((classes >= 3) & (classes <= 5)), # Points in veg classes
        'n_ground': np.sum(classes == 2), # Points in ground class
        'mean_height': points[:, 2].mean(),
    }
    parcel_stats.append(stats)
df = pd.DataFrame(parcel_stats)
print(df.describe())
3. Instance Segmentation
Separate individual building instances:
# Get all building points
building_mask = las.classification == 6
building_points = las.xyz[building_mask]
building_ids = las.building_cluster_id[building_mask]
# Separate by building cluster ID
for bldg_id in np.unique(building_ids):
    if bldg_id == 0:
        continue
    instance_mask = building_ids == bldg_id
    instance_points = building_points[instance_mask]
    # Save individual building
    # ... process or save instance_points
4. Change Detection
Track changes to specific buildings over time:
# Load two time periods
las_t1 = laspy.read("tile_2023.laz")
las_t2 = laspy.read("tile_2025.laz")
# Compare building 1001 between time periods
bldg_id = 1001
mask_t1 = las_t1.building_cluster_id == bldg_id
mask_t2 = las_t2.building_cluster_id == bldg_id
n_points_t1 = np.sum(mask_t1)
n_points_t2 = np.sum(mask_t2)
height_t1 = las_t1.xyz[mask_t1][:, 2].max() - las_t1.xyz[mask_t1][:, 2].min()
height_t2 = las_t2.xyz[mask_t2][:, 2].max() - las_t2.xyz[mask_t2][:, 2].min()
print(f"Building {bldg_id} changes:")
print(f" Points: {n_points_t1} → {n_points_t2} ({n_points_t2-n_points_t1:+d})")
print(f" Height: {height_t1:.1f}m → {height_t2:.1f}m ({height_t2-height_t1:+.1f}m)")
🔧 Configuration Options
Advanced Tuning
If you need to adjust cluster ID computation, here are the relevant parameters:
Spatial Clustering (cluster_id)
reclassification:
  use_clustering: true # Enable spatial clustering
  spatial_cluster_eps: 0.5 # 50cm clustering radius (DBSCAN epsilon)
  min_cluster_size: 10 # Min 10 points per cluster
- spatial_cluster_eps: Distance threshold for grouping points
  - Smaller (0.2-0.4m): Tighter clusters, more instances
  - Larger (0.5-1.0m): Looser clusters, merged instances
- min_cluster_size: Minimum points to form a cluster
  - Smaller (5-10): More small objects detected
  - Larger (20-50): Only large objects, filter noise
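To see how these two parameters interact, here is a naive O(n²) DBSCAN written in pure Python. It is a sketch for intuition only, not the pipeline's clustering code. It assumes `min_cluster_size` behaves like DBSCAN's core-point threshold (the point itself counts as a neighbour), which may differ from how the pipeline applies it, and it uses the guide's convention that 0 means noise.

```python
from math import dist
from typing import List, Sequence, Tuple


def dbscan_labels(points: Sequence[Tuple[float, float]],
                  eps: float, min_pts: int) -> List[int]:
    """Naive DBSCAN: returns 0 for noise and 1..N for clusters."""
    n = len(points)
    # Neighbourhood of each point, including the point itself
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [0] * n
    cluster = 0
    for i in range(n):
        # Start a new cluster only from an unlabelled core point
        if labels[i] != 0 or len(neighbors[i]) < min_pts:
            continue
        cluster += 1
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if len(neighbors[p]) < min_pts:
                continue  # Border point: joins the cluster but does not grow it
            for q in neighbors[p]:
                if labels[q] == 0:
                    labels[q] = cluster
                    stack.append(q)
    return labels


# Two groups 0.6 m apart (0.3 m point spacing) plus one isolated point
pts = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0),
       (1.2, 0.0), (1.5, 0.0), (1.8, 0.0),
       (5.0, 0.0)]

tight = dbscan_labels(pts, eps=0.4, min_pts=2)   # gap is wider than eps
loose = dbscan_labels(pts, eps=0.7, min_pts=2)   # gap is bridged
strict = dbscan_labels(pts, eps=0.4, min_pts=4)  # no point has 4 neighbours
print(tight, loose, strict, sep="\n")
```

With the smaller radius the two groups stay separate, with the larger one they merge, and raising the minimum size turns everything into noise (label 0).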
Building Cluster ID
ground_truth:
  bd_topo:
    features:
      buildings:
        buffer_distance: 0.8 # Tolerance for building boundaries (m)
        extrude_3d: true # Use 3D bounding boxes
        adaptive_buffer_max: 6.0 # Max search distance for facades
- buffer_distance: Tolerance for point-polygon intersection
  - Accounts for building alignment errors in BD TOPO
  - Captures points slightly outside footprint
- extrude_3d: Enable 3D bounding box checking
  - Checks both horizontal (XY) and vertical (Z) containment
  - More accurate for multi-story buildings
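One way to picture the buffer tolerance: a point that lies just outside a footprint is still accepted if its distance to the nearest footprint edge is at most `buffer_distance`. The sketch below computes that distance in pure Python. The helper names are hypothetical and the pipeline may implement buffering differently (for example by buffering the polygons up front).

```python
from math import hypot
from typing import Sequence, Tuple

Point2D = Tuple[float, float]


def distance_to_segment(px: float, py: float,
                        ax: float, ay: float,
                        bx: float, by: float) -> float:
    """Shortest distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return hypot(px - ax, py - ay)  # Degenerate segment: A == B
    # Projection parameter of P onto AB, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_ring(px: float, py: float, ring: Sequence[Point2D]) -> float:
    """Shortest distance from P to the polygon boundary."""
    n = len(ring)
    return min(distance_to_segment(px, py, *ring[i], *ring[(i + 1) % n])
               for i in range(n))


def within_buffer(px: float, py: float, ring: Sequence[Point2D],
                  buffer_distance: float) -> bool:
    """True if P is no farther than buffer_distance from the boundary."""
    return distance_to_ring(px, py, ring) <= buffer_distance


footprint = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

# A facade return 0.5 m outside the east wall, and one 1.5 m outside
print(within_buffer(10.5, 5.0, footprint, buffer_distance=0.8))
print(within_buffer(11.5, 5.0, footprint, buffer_distance=0.8))
```

The boundary distance alone does not say whether a point is inside the footprint, so in practice it would be combined with a containment test.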
Parcel Cluster ID
ground_truth:
  bd_topo:
    cadastre:
      enabled: true # Enable parcel integration
      path: "./data/ground_truth/cadastre/"
      parcel_file: "PARCELLE.shp" # Shapefile with parcel polygons
- enabled: Master switch for cadastre integration
- path: Directory containing cadastral shapefiles
- parcel_file: Name of parcel polygon shapefile
📁 Required Data Files
BD TOPO (Buildings)
File: BATIMENT.shp
Location: ./data/ground_truth/BDTOPO/
Required fields:
- Geometry (Polygon)
- ID or OBJECTID (unique building identifier)
- hauteur (building height, optional)
Download:
# From IGN Géoplateforme
# https://geoservices.ign.fr/bdtopo
Cadastre (Parcels)
File: PARCELLE.shp
Location: ./data/ground_truth/cadastre/
Required fields:
- Geometry (Polygon)
- IDU or ID (unique parcel identifier)
- SECTION, NUMERO (cadastral reference)
Download:
# From data.gouv.fr or cadastre.gouv.fr
# https://cadastre.data.gouv.fr/
⚡ Performance Considerations
Memory Usage
Cluster IDs add minimal memory overhead:
- Per point: +12 bytes (3 × int32)
  - cluster_id: 4 bytes
  - building_cluster_id: 4 bytes
  - parcel_cluster_id: 4 bytes
- 18M point tile: ~216 MB additional memory
- LAZ file size: +2-5 MB (compressed)
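The in-memory figure follows directly from the dimension sizes. A quick check of the arithmetic, using decimal megabytes (the variable names are just for this example):

```python
BYTES_PER_INT32 = 4
n_cluster_dims = 3          # cluster_id, building_cluster_id, parcel_cluster_id
n_points = 18_000_000       # Tile size used in the estimate above

bytes_per_point = n_cluster_dims * BYTES_PER_INT32
total_mb = n_points * bytes_per_point / 1e6   # Decimal MB (10^6 bytes)
total_mib = n_points * bytes_per_point / 2**20  # Binary MiB, for comparison

print(f"{bytes_per_point} bytes per point")
print(f"{total_mb:.0f} MB ({total_mib:.0f} MiB) for {n_points:,} points")
```

The on-disk LAZ overhead is much smaller than this because the IDs are compressed.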
Processing Time
Cluster ID computation adds ~20-35 seconds per tile:
- Spatial clustering (cluster_id): ~5-10s (DBSCAN)
- Building assignment: ~10-15s (spatial index lookups)
- Parcel assignment: ~5-10s (spatial index lookups)
Optimization:
- Uses STRtree spatial index for fast polygon lookups
- Cached ground truth data reduces repeated loading
- Parallel processing for large datasets
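To illustrate why a spatial index speeds up polygon lookups, the sketch below uses a uniform grid of bounding boxes. This is a deliberately simpler stand-in, not an STRtree, and the `GridIndex` class is hypothetical. The idea is the same, though: each point is compared against a few nearby candidates instead of every polygon, and only those candidates would go on to an exact point-in-polygon test.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


class GridIndex:
    """Uniform-grid index over axis-aligned bounding boxes."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[Tuple[int, BBox]]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, item_id: int, bbox: BBox) -> None:
        """Register the box in every grid cell it touches."""
        minx, miny, maxx, maxy = bbox
        cx0, cy0 = self._cell(minx, miny)
        cx1, cy1 = self._cell(maxx, maxy)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.cells[(cx, cy)].append((item_id, bbox))

    def query_point(self, x: float, y: float) -> List[int]:
        """IDs whose bounding box contains (x, y); candidates only."""
        hits = []
        for item_id, (minx, miny, maxx, maxy) in self.cells.get(self._cell(x, y), []):
            if minx <= x <= maxx and miny <= y <= maxy:
                hits.append(item_id)
        return hits


index = GridIndex(cell_size=50.0)
index.insert(1001, (0.0, 0.0, 10.0, 10.0))
index.insert(1002, (20.0, 0.0, 30.0, 10.0))
index.insert(1003, (45.0, 45.0, 60.0, 60.0))  # Spans four grid cells

print(index.query_point(5.0, 5.0))
print(index.query_point(55.0, 55.0))
print(index.query_point(200.0, 200.0))
```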
🐛 Troubleshooting
Issue: All cluster IDs are 0
Symptoms:
- building_cluster_id and parcel_cluster_id are all 0
- No points assigned to buildings/parcels
Solutions:
- Check ground truth files exist:
    ls ./data/ground_truth/BDTOPO/BATIMENT.shp
    ls ./data/ground_truth/cadastre/PARCELLE.shp
- Verify coordinate systems match:
  - Point cloud: Lambert-93 (EPSG:2154)
  - BD TOPO: Lambert-93 (EPSG:2154)
  - Cadastre: Lambert-93 (EPSG:2154)
- Check bounding box overlap:
    import geopandas as gpd
    import laspy
    # Load ground truth and the tile
    buildings = gpd.read_file("./data/ground_truth/BDTOPO/BATIMENT.shp")
    las = laspy.read("output.laz")
    # Tile extent as (minx, miny, maxx, maxy), the same order as total_bounds
    xy = las.xyz[:, :2]
    tile_bbox = (*xy.min(axis=0), *xy.max(axis=0))
    # Check extent
    print(f"Buildings extent: {buildings.total_bounds}")
    print(f"Tile extent: {tile_bbox}")
- Increase buffer distance:
    buffer_distance: 1.0 # Increase from 0.8m
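Comparing the two printed extents by eye is error-prone, so the overlap check can be made explicit. A minimal sketch, assuming both extents are given as (minx, miny, maxx, maxy) like geopandas' total_bounds; the function name and the sample coordinates are made up for illustration:

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned bounding boxes intersect or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    return (a_minx <= b_maxx and b_minx <= a_maxx and
            a_miny <= b_maxy and b_miny <= a_maxy)


# Illustrative extents in Lambert-93-like metres
tile_extent = (650000.0, 6860000.0, 651000.0, 6861000.0)
buildings_extent = (640000.0, 6850000.0, 660000.0, 6870000.0)

# The same area expressed in degrees, as if the layer were left in lon/lat
buildings_extent_degrees = (2.2, 48.8, 2.5, 48.9)

print(bboxes_overlap(tile_extent, buildings_extent))
print(bboxes_overlap(tile_extent, buildings_extent_degrees))
```

No overlap at all, with one extent in the millions and the other in single or double digits, usually points to a coordinate system mismatch and not to missing data.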
Issue: Too many points assigned to buildings
Symptoms:
- Non-building points have building_cluster_id > 0
- Over-classification
Solutions:
- Enable 3D extrusion:
    extrude_3d: true # Check vertical extent
- Reduce buffer distance:
    buffer_distance: 0.5 # Reduce from 0.8m
- Use adaptive buffers:
    enable_adaptive_buffer: true
    adaptive_buffer_max: 4.0 # Reduce max buffer
Issue: Slow processing
Symptoms:
- Ground truth assignment takes >1 minute per tile
Solutions:
- Enable spatial indexing:
    use_spatial_index: true # Should be default
- Enable caching:
    cache_enabled: true
    cache_dir: "./cache/ground_truth"
- Simplify ground truth polygons:
    import geopandas as gpd
    buildings = gpd.read_file("BATIMENT.shp")
    buildings['geometry'] = buildings.geometry.simplify(0.1) # Simplify to 10cm
    buildings.to_file("BATIMENT_simplified.shp")
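Polygon simplification helps because fewer vertices mean cheaper containment tests. The geopandas `simplify` call is based on the Douglas-Peucker algorithm; the pure-Python sketch below shows the basic idea on an open polyline. It is an illustration only and omits the topology-preserving safeguards the library applies by default.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def _distance_to_line(p: Point2D, a: Point2D, b: Point2D) -> float:
    """Perpendicular distance from P to the infinite line through A and B."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = hypot(dx, dy)
    if length == 0.0:
        return hypot(px - ax, py - ay)
    # Parallelogram area divided by base length
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(line: Sequence[Point2D], tolerance: float) -> List[Point2D]:
    """Drop vertices that deviate from the simplified line by <= tolerance."""
    if len(line) < 3:
        return list(line)
    first, last = line[0], line[-1]
    # Find the interior vertex farthest from the chord first -> last
    idx, dmax = 0, 0.0
    for i in range(1, len(line) - 1):
        d = _distance_to_line(line[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep that vertex and simplify both halves recursively
    left = douglas_peucker(line[:idx + 1], tolerance)
    right = douglas_peucker(line[idx:], tolerance)
    return left[:-1] + right


# An L-shaped wall outline with two near-collinear vertices (3 cm and 2 cm off)
wall = [(0.0, 0.0), (5.0, 0.03), (10.0, 0.0), (10.0, 5.0), (10.04, 10.0)]
simplified = douglas_peucker(wall, tolerance=0.1)
print(simplified)
```

With a 10 cm tolerance the two small wiggles are removed while the real corner is kept.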
📈 Validation
Check Cluster ID Distribution
import laspy
import numpy as np
las = laspy.read("output.laz")
# Check cluster_id
print("Cluster ID Statistics:")
print(f" Min: {las.cluster_id.min()}")
print(f" Max: {las.cluster_id.max()}")
print(f" Unique clusters: {len(np.unique(las.cluster_id))}")
print(f" Unassigned (0): {np.sum(las.cluster_id == 0)}")
# Check building_cluster_id
print("\nBuilding Cluster ID Statistics:")
print(f" Min: {las.building_cluster_id.min()}")
print(f" Max: {las.building_cluster_id.max()}")
print(f" Unique buildings: {len(np.unique(las.building_cluster_id[las.building_cluster_id > 0]))}")
print(f" Assigned to buildings: {np.sum(las.building_cluster_id > 0)} ({100*np.sum(las.building_cluster_id > 0)/len(las):.1f}%)")
# Check parcel_cluster_id
print("\nParcel Cluster ID Statistics:")
print(f" Min: {las.parcel_cluster_id.min()}")
print(f" Max: {las.parcel_cluster_id.max()}")
print(f" Unique parcels: {len(np.unique(las.parcel_cluster_id[las.parcel_cluster_id > 0]))}")
print(f" Assigned to parcels: {np.sum(las.parcel_cluster_id > 0)} ({100*np.sum(las.parcel_cluster_id > 0)/len(las):.1f}%)")
Visualize Cluster IDs
import laspy
import open3d as o3d
import numpy as np
import matplotlib.pyplot as plt
# Load LAZ
las = laspy.read("output.laz")
points = las.xyz
building_ids = las.building_cluster_id
# Create color map for building IDs (points with ID 0 stay black)
unique_ids = np.unique(building_ids[building_ids > 0])
cmap = plt.get_cmap('tab20')
colors = np.zeros((len(points), 3))
for i, bid in enumerate(unique_ids):
    mask = building_ids == bid
    colors[mask] = cmap(i % 20)[:3]
# Visualize
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(colors)
o3d.visualization.draw_geometries([pcd])
🎓 Summary
Your configuration is already optimized for cluster ID features:
✅ Enabled:
- compute_cluster_id: Spatial clustering
- compute_building_cluster_id: BD TOPO buildings
- compute_parcel_cluster_id: Cadastral parcels
✅ Configured:
- Ground truth assignment from BD TOPO + Cadastre
- 3D building bounding boxes with adaptive buffers
- Output saving as extra dimensions in LAZ
✅ Optimized:
- STRtree spatial indexing for fast lookups
- Caching for repeated queries
- Adaptive buffers for better building capture
Next Steps:
- Ensure ground truth files are available:
  - ./data/ground_truth/BDTOPO/BATIMENT.shp
  - ./data/ground_truth/cadastre/PARCELLE.shp
- Run processing:
    ign-lidar-hd process \
      -c examples/production/asprs_complete.yaml \
      input_dir="/data/lidar/tiles" \
      output_dir="/data/output"
- Verify cluster IDs in output:
    import glob
    import laspy
    # laspy.read() takes a single file path, so expand the wildcard first
    for path in sorted(glob.glob("output/enriched/*.laz")):
        las = laspy.read(path)
        print(f"{path}: cluster IDs available: {list(las.point_format.extra_dimension_names)}")
Documentation: See codebase docs for implementation details