first commit

2025-07-26 07:23:50 -07:00
commit cc77989365
15 changed files with 2429 additions and 0 deletions
--- a/spec.md
+++ b/spec.md
@@ -0,0 +1,161 @@
+# VR180 Human Matting Proof of Concept - Det-SAM2 Approach
+
+## Project Overview
+
+A proof-of-concept implementation to test the feasibility of using Det-SAM2 for automated human matting on VR180 3D side-by-side equirectangular video. The system will process a 30-second test clip to evaluate quality, performance, and resource requirements on local RTX 3080 hardware, with design considerations for cloud GPU scaling.
+
+## Input Specifications
+
+- **Format**: VR180 3D side-by-side equirectangular video
+- **Resolution**: 6144x3072 (3072x3072 per eye)
+- **Test Duration**: 30 seconds
+- **Layout**: Left eye (0-3071px), Right eye (3072-6143px)
+
+## Core Functionality
+
+### Automatic Person Detection
+- **Method**: YOLOv8 integration with Det-SAM2
+- **Detection**: Automatic bounding box placement on all humans
+- **Minimal Manual Input**: Fully automated pipeline with no point selection required
+
+### Processing Strategy
+- **Primary Approach**: Process both eyes using disparity mapping optimization
+- **Fallback**: Independent processing per eye if disparity mapping proves complex
+- **Chunking**: Adaptive segmentation (full 30s clip preferred, fallback to smaller chunks if VRAM limited)
+
+### Scaling and Quality Options
+- **Resolution Scaling**: 25%, 50%, or 100% processing resolution
+- **Mask Upscaling**: AI-based upscaling to full resolution for final output
+- **Quality vs Performance**: Configurable tradeoffs for local vs cloud processing
+
+## Configuration System
+
+### YAML/TOML Configuration File
+```yaml
+input:
+  video_path: "path/to/input.mp4"
+  
+processing:
+  scale_factor: 0.5  # 0.25, 0.5, 1.0
+  chunk_size: 900    # frames, 0 for full video
+  overlap_frames: 60 # for chunked processing
+  
+detection:
+  confidence_threshold: 0.7
+  model: "yolov8n"   # yolov8n, yolov8s, yolov8m
+  
+matting:
+  use_disparity_mapping: true
+  memory_offload: true
+  fp16: true
+  
+output:
+  path: "path/to/output/"
+  format: "alpha"     # "alpha" or "greenscreen"
+  background_color: [0, 255, 0]  # for greenscreen
+  maintain_sbs: true  # keep side-by-side format
+  
+hardware:
+  device: "cuda"
+  max_vram_gb: 10     # RTX 3080 limit
+```
+
+## Technical Implementation
+
+### Memory Optimization (Det-SAM2 Enhancements)
+- **CPU Offloading**: `offload_video_to_cpu=True`
+- **FP16 Storage**: Reduce memory usage by ~50%
+- **Frame Release**: `release_old_frames()` for constant VRAM usage
+- **Adaptive Chunking**: Automatic chunk size based on available VRAM
+
+### VR180-Specific Optimizations
+- **Stereo Processing**: Leverage disparity mapping for efficiency
+- **Cross-Eye Validation**: Ensure consistency between left/right views
+- **Edge Refinement**: Multi-resolution processing for clean matting boundaries
+
+### Output Options
+- **Alpha Channel**: Transparent PNG sequence or video with alpha
+- **Green Screen**: Configurable background color for traditional keying
+- **Format Preservation**: Maintain original SBS layout or output separate eyes
+
+## Performance Targets
+
+### Local RTX 3080 (10GB VRAM)
+- **25% Scale**: ~5-8 FPS processing, ~6 minutes for 30s clip
+- **50% Scale**: ~3-5 FPS processing, ~10 minutes for 30s clip
+- **100% Scale**: Chunked processing required, ~15-20 minutes for 30s clip
+
+### Cloud GPU Scaling (Future)
+- **Design Considerations**: Docker containerization ready
+- **Provider Agnostic**: Compatible with RunPod, Vast.ai, etc.
+- **Batch Processing**: Queue-based job distribution
+- **Cost Estimation**: Target $0.10-0.50 per 30s clip processing
+
+## Quality Assessment Features
+
+### Automated Quality Metrics
+- **Edge Consistency**: Measure aliasing and stair-stepping
+- **Temporal Stability**: Frame-to-frame consistency scoring
+- **Stereo Alignment**: Left/right eye correspondence validation
+
+### Debug/Analysis Outputs
+- **Detection Visualization**: Bounding boxes overlaid on frames
+- **Confidence Maps**: Per-pixel matting confidence scores
+- **Processing Stats**: VRAM usage, FPS, chunk information
+
+## Deliverables
+
+### Phase 1: Core Implementation
+1. **Det-SAM2 Integration**: Automatic detection pipeline
+2. **VRAM Optimization**: Memory management for RTX 3080
+3. **Basic Matting**: Single-resolution processing
+4. **Configuration System**: YAML-based parameter control
+
+### Phase 2: VR180 Optimization  
+1. **Disparity Processing**: Stereo-aware matting
+2. **Multi-Resolution**: Scaling and upsampling pipeline
+3. **Quality Assessment**: Automated metrics and visualization
+4. **Edge Refinement**: Anti-aliasing and boundary smoothing
+
+### Phase 3: Production Ready
+1. **Cloud GPU Support**: Docker containerization
+2. **Batch Processing**: Multiple video queue system
+3. **Performance Profiling**: Detailed resource usage analytics
+4. **Quality Validation**: Comprehensive testing suite
+
+## Success Criteria
+
+### Technical Feasibility
+- [ ] Process 30s VR180 clip without manual intervention
+- [ ] Maintain <10GB VRAM usage on RTX 3080
+- [ ] Achieve acceptable matting quality at 50% scale
+- [ ] Complete processing in <15 minutes locally
+
+### Quality Benchmarks
+- [ ] Clean edges with minimal artifacts
+- [ ] Temporal consistency across frames
+- [ ] Stereo alignment between left/right eyes
+- [ ] Usable results for green screen compositing
+
+### Scalability Validation
+- [ ] Configuration-driven parameter control
+- [ ] Clear performance vs quality tradeoffs identified
+- [ ] Docker deployment pathway established
+- [ ] Cost/benefit analysis for cloud GPU usage
+
+## Risk Mitigation
+
+### VRAM Limitations
+- **Fallback**: Automatic chunking with overlap processing
+- **Monitoring**: Real-time VRAM usage tracking
+- **Graceful Degradation**: Quality reduction before failure
+
+### Quality Issues
+- **Validation Pipeline**: Automated quality assessment
+- **Manual Override**: Optional bounding box adjustment
+- **Fallback Methods**: Integration points for RVM if needed
+
+### Performance Bottlenecks
+- **Profiling**: Detailed timing analysis per component  
+- **Optimization**: Identify CPU vs GPU bound operations
+- **Scaling Strategy**: Clear upgrade path to cloud GPUs