Quick Start: GPU Refactoring Implementation
This guide is for developers working on Phase 1 GPU refactoring.
Estimated Time: 1 week
Date: October 2025
🎯 Your Mission
Implement the GPU-Core Bridge module to eliminate code duplication while maintaining GPU performance.
Why this matters: Currently 71% of GPU feature code is duplicated. You're fixing that.
📋 Before You Start (Day 0)
1. Read These Documents (2 hours)
Required reading:
- AUDIT_SUMMARY.md (15 min) - Understand the problem
- AUDIT_VISUAL_SUMMARY.md (10 min) - See the architecture
- IMPLEMENTATION_GUIDE_GPU_BRIDGE.md (30 min) - Your implementation guide
Optional:
- AUDIT_GPU_REFACTORING_CORE_FEATURES.md - Deep technical details
2. Set Up Environment (30 min)
# Go to the repo (clone it first if needed)
cd /path/to/IGN_LIDAR_HD_DATASET
# Create feature branch
git checkout -b feature/gpu-core-bridge
# Verify GPU environment
python -c "import cupy as cp; print(f'CuPy: {cp.__version__}')"
python -c "import numpy as np; print(f'NumPy: {np.__version__}')"
# Install dependencies if needed
pip install cupy-cuda11x # or cupy-cuda12x
pip install pytest pytest-benchmark
# Verify tests work
pytest tests/ -v
3. Understand the Codebase (1 hour)
# Key files to review
cat ign_lidar/features/core/__init__.py
cat ign_lidar/features/core/eigenvalues.py
head -n 100 ign_lidar/features/features_gpu_chunked.py
📅 Week 1 Schedule
Day 1: Setup & Module Structure
Morning (3 hours):
- Create ign_lidar/features/core/gpu_bridge.py
- Add module docstring and imports
- Create GPUCoreBridge class skeleton
- Implement __init__ method
Afternoon (3 hours):
- Implement compute_eigenvalues_gpu() method
- Implement CPU fallback _compute_eigenvalues_cpu()
- Add error handling
- Test basic functionality manually
Code to write: ~150 lines
Reference: IMPLEMENTATION_GUIDE_GPU_BRIDGE.md Step 1
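As a starting point for the Day 1 tasks, here is a minimal sketch of the class skeleton with a CPU fallback. The names `GPUCoreBridge`, `compute_eigenvalues_gpu()` and `_compute_eigenvalues_cpu()` come from this guide. Everything else is an assumption made for illustration: the `_eigenvalues` helper, the population covariance (divide by k), and the rule that any GPU error triggers the fallback. Check IMPLEMENTATION_GUIDE_GPU_BRIDGE.md for the intended design.

```python
import logging

import numpy as np

try:  # CuPy is optional here so the sketch also runs on CPU-only machines
    import cupy as cp
except ImportError:
    cp = None

logger = logging.getLogger(__name__)


class GPUCoreBridge:
    """Hypothetical skeleton: eigenvalues of per-point neighborhood covariances."""

    def __init__(self, use_gpu=True):
        # Enable the GPU path only if it was requested and CuPy imported
        self.use_gpu = bool(use_gpu and cp is not None)
        if use_gpu and not self.use_gpu:
            logger.warning("CuPy not available, falling back to CPU")

    def compute_eigenvalues_gpu(self, points, neighbors):
        """Return an (N, 3) NumPy array of eigenvalues, sorted ascending."""
        if self.use_gpu:
            try:
                eig_gpu = self._eigenvalues(cp.asarray(points), cp.asarray(neighbors), cp)
                return cp.asnumpy(eig_gpu)
            except Exception as exc:  # assumption: any GPU error triggers the fallback
                logger.warning("GPU path failed (%s), falling back to CPU", exc)
        return self._compute_eigenvalues_cpu(points, neighbors)

    def _compute_eigenvalues_cpu(self, points, neighbors):
        return self._eigenvalues(np.asarray(points), np.asarray(neighbors), np)

    @staticmethod
    def _eigenvalues(points, neighbors, xp):
        # xp is either numpy or cupy; both expose the calls used below
        nbrs = points[neighbors]                             # (N, k, 3)
        centered = nbrs - nbrs.mean(axis=1, keepdims=True)   # remove each centroid
        k = neighbors.shape[1]
        # Population covariance (divide by k); an assumption of this sketch
        cov = xp.matmul(centered.transpose(0, 2, 1), centered) / k  # (N, 3, 3)
        return xp.linalg.eigvalsh(cov)
```

Sharing one array-module-agnostic helper between the GPU and CPU paths keeps the two code paths from drifting apart, which is the point of the refactoring.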
Day 2: GPU Implementation
Morning (3 hours):
- Implement _compute_eigenvalues_batched_gpu()
- Handle cuSOLVER batch size limits
- Add GPU memory management
- Add logging
Afternoon (3 hours):
- Implement compute_eigenvalue_features_gpu()
- Integrate with core module
- Add convenience function
- Test with small datasets
Code to write: ~200 lines
Test manually:
from ign_lidar.features.core.gpu_bridge import GPUCoreBridge
import numpy as np
# Small test
points = np.random.rand(1000, 3).astype(np.float32)
neighbors = np.random.randint(0, 1000, size=(1000, 20))
bridge = GPUCoreBridge(use_gpu=True)
eigenvalues = bridge.compute_eigenvalues_gpu(points, neighbors)
print(f"Shape: {eigenvalues.shape}") # Should be (1000, 3)
print(f"Sample: {eigenvalues[0]}") # Should be 3 values
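For the `compute_eigenvalue_features_gpu()` step, the following sketch shows how shape features might be derived from the eigenvalues once they are back on the CPU. The function name `eigenvalue_features`, the `eps` guard, and the choice of linearity, planarity and sphericity are assumptions based on commonly used definitions. The features the project actually needs are defined in the core module, so treat this only as an illustration of the data flow.

```python
import numpy as np


def eigenvalue_features(eigenvalues, eps=1e-8):
    """Derive shape features from (N, 3) eigenvalues sorted ascending."""
    eigenvalues = np.asarray(eigenvalues, dtype=np.float64)
    # Ascending order means column 2 holds the largest eigenvalue
    l3, l2, l1 = eigenvalues[:, 0], eigenvalues[:, 1], eigenvalues[:, 2]
    denom = np.maximum(l1, eps)  # guard against division by zero
    return {
        "linearity": (l1 - l2) / denom,
        "planarity": (l2 - l3) / denom,
        "sphericity": l3 / denom,
    }
```

With these definitions the three features sum to 1 for any neighborhood whose largest eigenvalue exceeds `eps`.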
Day 3: Testing Infrastructure
Morning (3 hours):
- Create tests/test_gpu_bridge.py
- Write test fixtures
- Implement basic unit tests
- Test CPU fallback
Afternoon (3 hours):
- Test GPU vs CPU consistency
- Test batching with large datasets
- Test error handling
- Test integration with core module
Code to write: ~400 lines
Run tests:
pytest tests/test_gpu_bridge.py -v
pytest tests/test_gpu_bridge.py::TestGPUCoreBridge -v
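For the test fixtures, random neighbor indices (as in the Day 2 manual test) only exercise array shapes. A fixture with real nearest neighbors gives more meaningful eigenvalues. The helper below is a sketch under that assumption. The name `make_point_cloud` and its parameters are hypothetical, and the brute-force distance matrix is only suitable for small test clouds.

```python
import numpy as np


def make_point_cloud(n_points=200, k=10, seed=0):
    """Return (points, neighbors) with true k-nearest-neighbor indices."""
    rng = np.random.default_rng(seed)  # fixed seed keeps tests reproducible
    points = rng.random((n_points, 3)).astype(np.float32)
    # Brute-force pairwise squared distances, O(n^2) memory
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Each point is its own nearest neighbor (distance 0), so it comes first
    neighbors = np.argsort(dist2, axis=1)[:, :k]
    return points, neighbors
```

In `tests/test_gpu_bridge.py` this could be wrapped in a pytest fixture so that every test receives the same cloud.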
Day 4: Performance & Validation
Morning (3 hours):
- Create scripts/benchmark_gpu_bridge.py
- Run benchmarks with different sizes
- Compare GPU vs CPU performance
- Verify speedup >= 8×
Afternoon (3 hours):
- Optimize if needed
- Test with real data
- Memory profiling
- Fix any issues
Run benchmarks:
python scripts/benchmark_gpu_bridge.py
python scripts/benchmark_gpu_bridge.py --sizes 10000 100000 500000
Expected results:
Dataset: 100,000 points, k=20
CPU Time: 2.5s ± 0.1s
GPU Time: 0.25s ± 0.02s
Speedup: 10.0×
✅ Performance target met (>= 8×)
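The mean ± deviation and speedup figures above can be produced with a small timing helper. This is a sketch, not the contents of `scripts/benchmark_gpu_bridge.py`. The names `time_function` and `speedup`, and the `repeats` and `warmup` parameters, are assumptions.

```python
import statistics
import time


def time_function(fn, *args, repeats=5, warmup=1):
    """Return (mean, stdev) wall-clock seconds for fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up absorbs one-time costs such as kernel compilation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples, so keep repeats >= 2
    return statistics.mean(samples), statistics.stdev(samples)


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU time to GPU time; 10.0 means the GPU is ten times faster."""
    return cpu_seconds / gpu_seconds
```

CuPy launches kernels asynchronously, so only time functions that return NumPy arrays (for example through `cp.asnumpy()`, which waits for the GPU) or synchronize explicitly before the clock stops.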
Day 5: Documentation & Review
Morning (2 hours):
- Update ign_lidar/features/core/__init__.py exports
- Write docstrings for all functions
- Add usage examples
- Update CHANGELOG
Afternoon (2 hours):
- Code self-review
- Run full test suite
- Prepare pull request
- Document any issues
Final checks:
# All tests pass
pytest tests/ -v
# Benchmarks meet target
python scripts/benchmark_gpu_bridge.py
# Code quality
# (if using black/flake8)
black ign_lidar/features/core/gpu_bridge.py
flake8 ign_lidar/features/core/gpu_bridge.py
🔧 Implementation Tips
GPU Memory Management
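import cupy as cp

# Always clean up GPU memory
def compute_something_gpu(self, data):
    data_gpu = cp.asarray(data)
    result_gpu = None  # defined up front so the finally block can always delete it
    try:
        result_gpu = process(data_gpu)  # process() stands in for your GPU routine
        return cp.asnumpy(result_gpu)
    finally:
        # Cleanup happens even if an error is raised; dropping the
        # references lets CuPy's memory pool reuse the blocks
        del data_gpu, result_gpu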
Batching Pattern
# Standard batching pattern
N = len(data)
batch_size = 500_000  # cuSOLVER limit
num_batches = (N + batch_size - 1) // batch_size  # ceiling division
for batch_idx in range(num_batches):
    start = batch_idx * batch_size
    end = min((batch_idx + 1) * batch_size, N)
    # Process batch and write it into the preallocated results array
    batch_result = process_batch(data[start:end])
    results[start:end] = batch_result
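The pattern above assumes a preallocated `results` array. A self-contained variant that allocates the output from the first batch could look like the sketch below. The name `process_in_batches` is hypothetical, and the example uses NumPy so it can be tried without a GPU.

```python
import numpy as np


def process_in_batches(data, process_batch, batch_size=500_000):
    """Apply process_batch to consecutive slices and stitch the results."""
    n = len(data)
    results = None  # stays None if data is empty
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        batch_result = np.asarray(process_batch(data[start:end]))
        if results is None:
            # Allocate the output once the per-item result shape is known
            results = np.empty((n,) + batch_result.shape[1:], dtype=batch_result.dtype)
        results[start:end] = batch_result
    return results


# Usage: eigenvalues of 10 symmetric matrices, 3 matrices per batch
rng = np.random.default_rng(0)
a = rng.random((10, 3, 3))
symmetric = a + a.transpose(0, 2, 1)
eigenvalues = process_in_batches(symmetric, np.linalg.eigvalsh, batch_size=3)
print(eigenvalues.shape)  # (10, 3)
```

Because each matrix is decomposed independently, the batched result should match a single call on the full stack.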
Testing Pattern
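# Always test GPU vs CPU consistency
import numpy as np
from ign_lidar.features.core.gpu_bridge import GPUCoreBridge

def test_consistency():
    points = np.random.rand(1000, 3).astype(np.float32)
    neighbors = np.random.randint(0, 1000, size=(1000, 20))
    bridge_gpu = GPUCoreBridge(use_gpu=True)
    bridge_cpu = GPUCoreBridge(use_gpu=False)
    result_gpu = bridge_gpu.compute_eigenvalues_gpu(points, neighbors)
    result_cpu = bridge_cpu.compute_eigenvalues_gpu(points, neighbors)
    np.testing.assert_allclose(
        result_gpu, result_cpu,
        rtol=1e-5, atol=1e-7
    )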
🐛 Common Issues & Solutions
Issue 1: CuPy Import Error
ImportError: No module named 'cupy'
Solution:
pip install cupy-cuda11x # For CUDA 11.x
# or
pip install cupy-cuda12x # For CUDA 12.x
Issue 2: cuSOLVER Batch Size Error
cupy._core.linalg.LinAlgError: cuSOLVER error
Solution: Implement batching for large datasets
# Use max batch size of 500K
if N > 500_000:
    result = self._compute_eigenvalues_batched_gpu(data, N)
Issue 3: GPU Out of Memory
cupy.cuda.memory.OutOfMemoryError
Solution: Reduce batch size or add cleanup
# Add explicit cleanup
cp.get_default_memory_pool().free_all_blocks()
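To make this cleanup hard to forget, it can be wrapped in a context manager. The sketch below is one possible shape. The name `gpu_memory_cleanup` is hypothetical, and also freeing the pinned memory pool is an optional extra step beyond the one-liner above.

```python
from contextlib import contextmanager

try:
    import cupy as cp
except ImportError:  # lets the helper be imported on CPU-only machines
    cp = None


@contextmanager
def gpu_memory_cleanup():
    """Free cached CuPy memory blocks when the with-block exits."""
    try:
        yield
    finally:
        # Runs on normal exit and when the block raises
        if cp is not None:
            cp.get_default_memory_pool().free_all_blocks()
            cp.get_default_pinned_memory_pool().free_all_blocks()
```

Usage is `with gpu_memory_cleanup(): ...` around each batch. Note that `free_all_blocks()` only releases blocks no longer referenced by any array, so live GPU arrays must be deleted first.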
Issue 4: Numerical Differences GPU vs CPU
# Small differences are expected due to floating-point
# Use appropriate tolerances
np.testing.assert_allclose(a, b, rtol=1e-5, atol=1e-7)
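It helps to know how the two tolerances combine: `assert_allclose` accepts a value when `|actual - desired| <= atol + rtol * |desired|`. The sketch below reproduces that check element-wise. The helper name `within_tolerance` is hypothetical, and it shows why small eigenvalues are governed almost entirely by `atol`.

```python
import numpy as np


def within_tolerance(actual, desired, rtol=1e-5, atol=1e-7):
    """Element-wise version of the check assert_allclose performs."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    return np.abs(actual - desired) <= atol + rtol * np.abs(desired)


# A 1e-6 error passes for values near 1.0 (allowed: about 1.01e-5)
print(within_tolerance(1.000001, 1.0))   # True
# A 5e-7 error fails for values near 1e-6 (allowed: about 1.0e-7)
print(within_tolerance(1.5e-6, 1.0e-6))  # False
```

If consistency tests fail only on near-zero eigenvalues (for example on very flat neighborhoods), consider whether `atol` is too strict for float32 before suspecting the GPU code.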
✅ Success Checklist
Code Complete
- gpu_bridge.py created (~500 lines)
- All methods implemented
- Error handling added
- Logging configured
Tests Complete
- Unit tests written (~400 lines)
- All tests passing
- GPU vs CPU consistency verified
- Edge cases covered
Performance Validated
- Benchmarks run
- Speedup >= 8× confirmed
- Memory usage acceptable
- No performance regression
Documentation Complete
- Docstrings for all functions
- Usage examples included
- Core module exports updated
- CHANGELOG updated
Ready for Review
- Code self-reviewed
- All tests passing
- No linting errors
- PR prepared
📝 Daily Progress Template
Copy this for daily updates:
## Day X Progress

**What I completed:**
-
-
-

**What I learned:**
-
-

**Issues encountered:**
-
-

**Blockers:**
-

**Tomorrow's plan:**
-
-
-

**Time spent:** X hours
🆘 Getting Help
Quick Questions
- Review IMPLEMENTATION_GUIDE_GPU_BRIDGE.md
- Check existing code in features_gpu_chunked.py
- Look at core module implementations
Technical Issues
- Review AUDIT_GPU_REFACTORING_CORE_FEATURES.md Section 2
- Check error handling patterns in existing code
- Consult GPU optimization guide
Architecture Questions
- Review AUDIT_VISUAL_SUMMARY.md
- Check data flow diagrams
- Review current vs. proposed architecture
🎓 Learning Resources
CuPy Documentation
- Official docs: https://docs.cupy.dev/
- GPU arrays: Like NumPy but on GPU
- Key functions: cp.asarray(), cp.asnumpy()
Eigenvalue Computation
- np.linalg.eigvalsh() - CPU version
- cp.linalg.eigvalsh() - GPU version
- Returns eigenvalues sorted ascending
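A quick way to see the ordering and the batched behavior is to run `eigvalsh` on matrices whose eigenvalues are known in advance. The example uses NumPy. The matrices are made up for illustration.

```python
import numpy as np

# A diagonal matrix has its diagonal entries as eigenvalues: 3, 1, 2
cov = np.diag([3.0, 1.0, 2.0])
print(np.linalg.eigvalsh(cov))  # [1. 2. 3.] (ascending, not diagonal order)

# eigvalsh also accepts a stack of symmetric matrices with shape (N, 3, 3)
stack = np.stack([cov, 2.0 * cov])
eigenvalues = np.linalg.eigvalsh(stack)
print(eigenvalues.shape)  # (2, 3)

# Reverse the last axis if downstream code expects the largest value first
descending = eigenvalues[:, ::-1]
```

If existing GPU code assumes descending order, the reversal on the last line is the kind of adapter the bridge may need. Confirm the expected order against the core module first.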
cuSOLVER Limits
- Maximum batch size: ~500K matrices
- For larger: implement batching
- Error: CUSOLVER_STATUS_INVALID_VALUE
🚀 After Phase 1
If Successful
- Request code review
- Merge to main
- Start Phase 2 (eigenvalue integration)
If Issues Found
- Document issues
- Propose solutions
- Adjust timeline if needed
Metrics to Report
- Code written: ~XXX lines
- Tests written: ~XXX tests
- Test coverage: XX%
- GPU speedup: XX×
- Time spent: XX hours
📞 Contact
Questions?
- Technical Lead: [Name]
- Code Review: [Name]
- GPU Expert: [Name]
Resources:
- Project docs: /docs/
- Implementation guide: IMPLEMENTATION_GUIDE_GPU_BRIDGE.md
- Audit: AUDIT_GPU_REFACTORING_CORE_FEATURES.md
Good luck! You're fixing 71% code duplication. This is important work! 🎉
Quick Links:
- 📖 Full Guide: IMPLEMENTATION_GUIDE_GPU_BRIDGE.md
- 📊 Overview: AUDIT_SUMMARY.md
- ✅ Checklist: AUDIT_CHECKLIST.md
- 🎨 Diagrams: AUDIT_VISUAL_SUMMARY.md