Performance Tuning
Optimize FilterMate for maximum performance on your datasets. This guide covers backend selection, query optimization, indexing strategies, and system tuning.
Quick Wins
1. Use the Right Backend
Most important decision for performance!
Action: Install psycopg2 for datasets > 50k features
pip install psycopg2-binary
2. Create Spatial Indexes
PostgreSQL:
CREATE INDEX idx_your_table_geom ON your_table USING GIST(geometry);
ANALYZE your_table;
Spatialite:
SELECT CreateSpatialIndex('your_table', 'geometry');
Gain: 10-100× faster spatial queries
3. Enable Performance Warnings
Let FilterMate alert you to optimization opportunities:
{
  "PERFORMANCE": {
    "enable_performance_warnings": true,
    "large_dataset_warning_threshold": 50000
  }
}
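How these two settings interact can be sketched with a small, hypothetical helper (`should_warn` is illustrative, not a FilterMate API):

```python
def should_warn(feature_count, config):
    """Return True when performance warnings are enabled and the layer is large."""
    perf = config.get("PERFORMANCE", {})
    if not perf.get("enable_performance_warnings", False):
        return False
    return feature_count > perf.get("large_dataset_warning_threshold", 50000)

config = {"PERFORMANCE": {"enable_performance_warnings": True,
                          "large_dataset_warning_threshold": 50000}}
print(should_warn(120_000, config))  # True - a hint to switch backends
```

A layer over the threshold triggers the warning, which usually means it is time to move to the PostgreSQL backend.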
Backend Optimization
PostgreSQL Performance
Server Configuration
Edit postgresql.conf for spatial workloads:
# Memory Settings
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
work_mem = 256MB # Per query
# Query Planning
random_page_cost = 1.1 # SSD
effective_io_concurrency = 200 # SSD
# Parallel Query
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
Restart PostgreSQL after changes:
sudo systemctl restart postgresql
Table Optimization
-- Create spatial index (if missing)
CREATE INDEX IF NOT EXISTS idx_table_geom
ON your_table USING GIST(geometry);
-- Create attribute indexes
CREATE INDEX IF NOT EXISTS idx_table_type
ON your_table(type);
-- Update statistics
ANALYZE your_table;
-- Vacuum to reclaim space
VACUUM ANALYZE your_table;
-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE tablename = 'your_table';
Connection Pooling
Use PgBouncer for multiple FilterMate instances:
# pgbouncer.ini
[databases]
your_db = host=localhost port=5432 dbname=your_db
[pgbouncer]
pool_mode = transaction
max_client_conn = 100
default_pool_size = 20
Gain: 30-50% faster on concurrent operations
Query Optimization
Check query plan:
EXPLAIN ANALYZE
SELECT * FROM your_table
WHERE ST_Intersects(
geometry,
ST_Buffer(ST_GeomFromText('POINT(0 0)', 4326), 1000)
);
Look for:
- ✅ "Index Scan" using the GiST index (good)
- ❌ "Seq Scan" (bad - missing index)
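A quick, plain-Python way to scan EXPLAIN output for sequential scans (the plan text here is an abbreviated example):

```python
def uses_seq_scan(plan_text):
    """Return True if any line of an EXPLAIN plan reports a sequential scan."""
    return any("Seq Scan" in line for line in plan_text.splitlines())

plan = """Index Scan using idx_your_table_geom on your_table
  Index Cond: (geometry && '...'::geometry)"""
print(uses_seq_scan(plan))  # False - the GiST index is being used
```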
Materialized Views
FilterMate creates materialized views automatically. Monitor them:
-- List FilterMate materialized views
SELECT matviewname,
       pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, matviewname))) AS size
FROM pg_matviews
WHERE matviewname LIKE 'filtermate_%';
-- Refresh manually if needed
REFRESH MATERIALIZED VIEW filtermate_temp_view;
-- Drop old views
DROP MATERIALIZED VIEW IF EXISTS filtermate_old_view;
Spatialite Performance
Enable R*Tree Indexes
-- Check if spatial index exists
SELECT * FROM geometry_columns
WHERE f_table_name = 'your_table';
-- Create spatial index
SELECT CreateSpatialIndex('your_table', 'geometry');
-- Verify index
SELECT * FROM sqlite_master
WHERE name LIKE 'idx_your_table_geometry%';
Gain: 15-40× faster spatial queries
Optimize Database
-- Vacuum to compact database
VACUUM;
-- Analyze for query optimizer
ANALYZE;
-- Check database size
SELECT page_count * page_size / 1024.0 / 1024.0 AS size_mb
FROM pragma_page_count(), pragma_page_size();
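The same maintenance can be scripted from Python with the standard-library sqlite3 module (the in-memory database below is only for illustration; point it at your .sqlite file in practice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use the path to your .sqlite file in practice
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10000)])
conn.commit()

conn.execute("VACUUM")   # compact the database
conn.execute("ANALYZE")  # refresh query-optimizer statistics

page_count = conn.execute("PRAGMA page_count").fetchone()[0]
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print(f"Database size: {page_count * page_size / 1024 / 1024:.2f} MB")
conn.close()
```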
Cache Settings
# In QGIS Python Console
import sqlite3
conn = sqlite3.connect('/path/to/your.db')
conn.execute('PRAGMA cache_size = 10000') # ~40MB cache
conn.execute('PRAGMA temp_store = MEMORY') # Temp tables in RAM
conn.close()
Geometry Simplification
For large, complex polygons:
-- Create simplified geometry column
ALTER TABLE your_table ADD COLUMN geom_simple GEOMETRY;
UPDATE your_table
SET geom_simple = SimplifyPreserveTopology(geometry, 0.0001);
-- Create index on simplified geometry
SELECT CreateSpatialIndex('your_table', 'geom_simple');
Gain: 2-5× faster on complex geometries
OGR Backend Performance
Spatial Indexing
The OGR backend creates spatial indexes automatically. Verify:
# In QGIS Python Console
layer = iface.activeLayer()
provider = layer.dataProvider()
# Check whether the provider supports spatial index creation
# (this reports capability, not whether an index already exists)
can_create = bool(provider.capabilities() & provider.CreateSpatialIndex)
print(f"Can create spatial index: {can_create}")
If missing, create manually:
processing.run("native:createspatialindex", {
    'INPUT': layer
})
Format Selection
Performance by format (10k features):
| Format | Read Time | Filter Time | Notes |
|---|---|---|---|
| GeoPackage | 0.2s ⚡ | 0.5s ⚡ | Best choice |
| Shapefile | 0.3s ⚡ | 0.8s | Legacy |
| GeoJSON | 0.5s | 1.5s | Large files slow |
| KML | 0.8s 🐌 | 2.0s 🐌 | Avoid for large data |
Recommendation: Convert to GeoPackage for best performance
processing.run("native:package", {
'LAYERS': [layer],
'OUTPUT': 'output.gpkg'
})
Query Optimization
Predicate Selection
Choose the right spatial predicate:
| Predicate | Speed | Use Case |
|---|---|---|
| intersects | ⚡⚡⚡ Fastest | General overlap detection |
| contains | ⚡⚡ Fast | Point-in-polygon |
| within | ⚡⚡ Fast | Reverse contains |
| touches | ⚡ Medium | Boundary analysis |
| crosses | ⚡ Medium | Line intersections |
| overlaps | 🐌 Slow | Partial overlap only |
Tip: Use intersects when possible - it's the fastest!
Predicate Ordering
FilterMate automatically orders predicates for performance (Spatialite):
# Automatic optimization - fastest predicates first
predicates = ['intersects', 'contains', 'within']
# FilterMate orders as: intersects OR contains OR within
# Not: within OR contains OR intersects (slower)
Gain: 2-3× faster on multiple predicates
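The ordering idea can be sketched as follows; the cost ranks are illustrative, not FilterMate's actual values:

```python
# Illustrative cost ranks - lower runs first (not FilterMate's actual table)
PREDICATE_COST = {
    "intersects": 0,
    "contains": 1,
    "within": 1,
    "touches": 2,
    "crosses": 2,
    "overlaps": 3,
}

def order_predicates(predicates):
    """Sort predicates so the cheapest OR branch is evaluated first."""
    return sorted(predicates, key=lambda p: PREDICATE_COST.get(p, 99))

print(order_predicates(["within", "overlaps", "intersects"]))
# ['intersects', 'within', 'overlaps']
```

Because OR clauses short-circuit, putting the cheapest predicate first lets most features match before the expensive checks ever run.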
Buffer Optimization
Fixed Buffers
# ✅ Good - Computed once
buffer_distance = 100.0
apply_filter(layer, buffer_distance=buffer_distance)
# ❌ Bad - Evaluated for every feature
buffer_expression = "CASE WHEN type='A' THEN 100 ELSE 200 END"
Projected CRS
Always use projected CRS for buffers:
# ✅ Good - Meters (EPSG:3857)
layer_3857 = processing.run("native:reprojectlayer", {
    'INPUT': layer,
    'TARGET_CRS': 'EPSG:3857',
    'OUTPUT': 'memory:'
})['OUTPUT']
# ❌ Bad - Degrees (EPSG:4326)
# A buffer of 100 degrees makes no sense!
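To see why degree-based buffers mislead, note that the ground distance covered by one degree of longitude shrinks with latitude (rough spherical approximation):

```python
import math

def meters_per_degree_lon(lat_deg):
    """Approximate ground meters per degree of longitude at a given latitude."""
    return 111_320.0 * math.cos(math.radians(lat_deg))

print(f"{meters_per_degree_lon(0):.0f} m")   # at the equator
print(f"{meters_per_degree_lon(60):.0f} m")  # at 60 degrees N - roughly half
```

A "0.001 degree" buffer therefore covers about 111 m of ground at the equator but only ~56 m at 60°N, so always buffer in a projected, meter-based CRS.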
Negative Buffers
Use carefully - expensive operation:
# Negative buffer = inward erosion
buffer_distance = -10.0 # 10m inward
# Alternative: Simplify first
simplified_layer = processing.run("native:simplifygeometries", {
    'INPUT': layer,
    'TOLERANCE': 1.0,
    'OUTPUT': 'memory:'
})['OUTPUT']
Memory Management
Large Datasets
Chunk Processing
For very large datasets (> 1M features):
# Process in chunks (QgsFeatureRequest has no offset; buffer features manually)
chunk_size = 100000
chunk = []
for feature in layer.getFeatures():
    chunk.append(feature)
    if len(chunk) >= chunk_size:
        process_chunk(chunk)  # your per-chunk processing
        chunk = []
if chunk:
    process_chunk(chunk)  # final partial chunk
Memory Layers
Avoid memory layers for large data:
# ❌ Bad - 500k features in memory
memory_layer = QgsVectorLayer("Point", "temp", "memory")
# ✅ Good - Use GeoPackage
temp_gpkg = processing.run("native:package", {
'LAYERS': [layer],
'OUTPUT': 'temp.gpkg'
})
FilterMate Cache
Configure caching in config.json:
{
  "PERFORMANCE": {
    "cache_layer_metadata": true,
    "max_cache_entries": 200,
    "cache_geometry_sources": true
  }
}
Geometry Source Cache: 5× faster when filtering multiple layers against the same source
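The effect of max_cache_entries can be illustrated with a bounded LRU cache; this is a sketch of the general technique, not FilterMate's implementation:

```python
from collections import OrderedDict

class BoundedCache:
    """Evict the least-recently-used entry once max_entries is exceeded."""
    def __init__(self, max_entries=200):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest entry

cache = BoundedCache(max_entries=2)
cache.put("layer_a", {"srid": 4326})
cache.put("layer_b", {"srid": 3857})
cache.put("layer_c", {"srid": 2154})  # evicts layer_a
print(cache.get("layer_a"))  # None
```

Raising max_cache_entries trades memory for fewer metadata lookups on repeated filters.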
Monitoring Performance
Enable Logging
import logging
logger = logging.getLogger('FilterMate')
logger.setLevel(logging.DEBUG)
# View timing information
# Check QGIS Python Console
Benchmark Your Workflow
Use FilterMate's benchmark tool:
python tests/benchmark_simple.py
Output:
Testing Spatialite optimization (10,000 features):
Without optimization: 1.38s
With optimization: 0.03s
Speed improvement: 44.6×
Testing OGR spatial index (10,000 features):
Without index: 0.80s
With index: 0.04s
Speed improvement: 19.5×
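To run similar comparisons on your own data, a minimal timing helper from the standard library is enough (substitute your filter call for the lambda):

```python
import time

def benchmark(fn, repeats=3):
    """Run fn repeatedly and return the best wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

best = benchmark(lambda: sum(range(100_000)))
print(f"Best of 3: {best:.4f}s")
```

Taking the best of several runs filters out one-off noise such as cold caches or background load.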
PostgreSQL Query Analysis
Monitor slow queries:
-- Enable query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log > 1s
SELECT pg_reload_conf();
-- View slow queries (PostgreSQL 13+; use total_time / mean_time on older versions)
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%ST_%'
ORDER BY mean_exec_time DESC
LIMIT 10;
Performance Checklist
Before Filtering
- Backend selected - Right backend for dataset size
- Spatial indexes - Created on geometry columns
- Statistics updated - ANALYZE run recently
- CRS projected - Using meters, not degrees
- Format optimized - GeoPackage preferred
- Geometry valid - No invalid geometries
- Cache enabled - Performance settings configured
During Filtering
- Progress visible - QGIS progress bar shows activity
- Memory usage - Check system monitor
- No errors - Check QGIS message bar
- Query time - Should be reasonable for dataset size
After Filtering
- Results correct - Verify feature count
- Performance acceptable - Note timing
- Cleanup done - Temp tables removed
- Cache hit - Subsequent filters faster
Troubleshooting Slow Performance
Symptom: Filtering Takes Forever
Check: Dataset size
layer.featureCount() # How many features?
Solutions:
- Install PostgreSQL if > 50k features
- Create spatial index if missing
- Simplify geometries if very complex
- Use faster predicate (intersects vs overlaps)
Symptom: QGIS Freezes
Cause: Operation in main thread
Solution: Ensure async tasks enabled:
# FilterMate uses QgsTask automatically
# If freezing, check QGIS task manager
Symptom: Out of Memory
Cause: Too many features in memory
Solutions:
- Use PostgreSQL - server-side processing
- Enable spatial index - reduces memory usage
- Chunk processing - process in batches
- Close other applications - free up RAM
Symptom: Slow on First Run, Fast After
Cause: Cache warming, index creation
Expected: First run slower as FilterMate:
- Creates spatial indexes
- Caches geometry sources
- Analyzes layer structure
Subsequent runs: 2-5× faster due to caching
Advanced Optimization
Custom PostgreSQL Functions
Create optimized functions for repeated operations:
CREATE OR REPLACE FUNCTION fast_buffer_intersects(
geom geometry,
target geometry,
buffer_dist numeric
) RETURNS boolean AS $$
BEGIN
-- Bounding box check first (fast)
IF NOT ST_Intersects(
ST_Expand(geom, buffer_dist),
ST_Envelope(target)
) THEN
RETURN FALSE;
END IF;
-- Actual intersection (if bbox test passed)
RETURN ST_Intersects(
ST_Buffer(geom, buffer_dist),
target
);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Parallel Processing
For multiple layers, use Python multiprocessing from a standalone PyQGIS script (worker processes cannot share a running QGIS GUI session):
from multiprocessing import Pool
def filter_layer(layer_name):
    layer = QgsProject.instance().mapLayersByName(layer_name)[0]
    result = apply_filter(layer)  # apply the filter
    return result
# Process layers in parallel
with Pool(4) as pool:
results = pool.map(filter_layer, layer_names)
Pre-computed Geometries
Store frequently used geometries:
-- Create lookup table
CREATE TABLE admin_boundaries (
id serial PRIMARY KEY,
name text,
geometry geometry(POLYGON, 4326),
simplified_geom geometry(POLYGON, 4326)
);
-- Pre-compute simplified versions
UPDATE admin_boundaries
SET simplified_geom = ST_Simplify(geometry, 0.001);
-- Create indexes
CREATE INDEX idx_admin_geom ON admin_boundaries USING GIST(geometry);
CREATE INDEX idx_admin_simple ON admin_boundaries USING GIST(simplified_geom);
Performance Goals
Target Response Times
| Dataset Size | Target Time | Backend |
|---|---|---|
| < 1k | < 0.5s ⚡ | Any |
| 1k - 10k | < 1s ⚡ | Any |
| 10k - 50k | < 3s ⚡ | Spatialite |
| 50k - 500k | < 10s ⚡ | PostgreSQL |
| > 500k | < 30s ⚡ | PostgreSQL |
Measuring Performance
import time
start = time.time()
# Apply filter
elapsed = time.time() - start
print(f"Filter applied in {elapsed:.2f}s")
# Compare to target
target = 3.0  # seconds
if elapsed > target:
    print(f"⚠️ Slower than target ({target}s)")
else:
    print("✅ Within target")
See Also
- Backend Selection - Choose optimal backend
- PostgreSQL Backend - PostgreSQL optimization
- Spatialite Backend - Spatialite tuning
- Performance Comparison - Benchmark data
Resources
Last updated: December 8, 2025