# VR180 Human Matting with Det-SAM2

A proof-of-concept implementation for automated human matting on VR180 3D side-by-side equirectangular video using Det-SAM2 and YOLOv8 detection.

## Features

- **Automatic Person Detection**: Uses YOLOv8 to eliminate manual point selection
- **VRAM Optimization**: Memory management for RTX 3080 (10GB) compatibility
- **VR180-Specific Processing**: Side-by-side stereo handling with disparity mapping
- **Flexible Scaling**: 25%, 50%, or 100% processing resolution with AI upscaling
- **Multiple Output Formats**: Alpha channel or green screen background
- **Chunked Processing**: Handles long videos with memory-efficient chunking
- **Cloud GPU Ready**: Docker containerization for RunPod and Vast.ai deployment

## Installation

```bash
# Clone repository
git clone <repository-url>
cd sam2e

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

## Quick Start

1. **Generate example configuration:**

   ```bash
   vr180-matting --generate-config config.yaml
   ```

2. **Edit configuration file:**

   ```yaml
   input:
     video_path: "path/to/your/vr180_video.mp4"
   processing:
     scale_factor: 0.5  # Start with 50% for testing
   output:
     path: "output/matted_video.mp4"
     format: "alpha"  # or "greenscreen"
   ```

3. **Process video:**

   ```bash
   vr180-matting config.yaml
   ```

## Configuration

### Input Settings

- `video_path`: Path to the VR180 side-by-side video file

### Processing Settings

- `scale_factor`: Resolution scaling (0.25, 0.5, 1.0)
- `chunk_size`: Frames per chunk (0 for auto-calculation)
- `overlap_frames`: Frame overlap between chunks

### Detection Settings

- `confidence_threshold`: YOLO detection confidence (0.1-1.0)
- `model`: YOLO model size (yolov8n, yolov8s, yolov8m)

### Matting Settings

- `use_disparity_mapping`: Enable stereo optimization
- `memory_offload`: CPU offloading for VRAM management
- `fp16`: Use FP16 precision to reduce memory usage

### Output Settings

- `path`: Output file/directory path
- `format`: "alpha" for RGBA or "greenscreen" for RGB with background
- `background_color`: RGB background color for green screen mode
- `maintain_sbs`: Keep side-by-side output rather than writing separate per-eye files

### Hardware Settings

- `device`: "cuda" or "cpu"
- `max_vram_gb`: VRAM limit (e.g., 10 for RTX 3080)
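For reference, a complete configuration combining the settings above might look like the sketch below. The top-level section names `detection`, `matting`, and `hardware`, and the specific values shown, are illustrative assumptions; `vr180-matting --generate-config` produces the authoritative template.

```yaml
input:
  video_path: "path/to/your/vr180_video.mp4"

processing:
  scale_factor: 0.5        # 0.25, 0.5, or 1.0
  chunk_size: 0            # 0 = auto-calculate from available VRAM
  overlap_frames: 5        # illustrative value

detection:
  confidence_threshold: 0.5
  model: "yolov8n"

matting:
  use_disparity_mapping: true
  memory_offload: true
  fp16: true

output:
  path: "output/matted_video.mp4"
  format: "alpha"          # or "greenscreen"
  background_color: [0, 255, 0]
  maintain_sbs: true

hardware:
  device: "cuda"
  max_vram_gb: 10
```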
## Usage Examples

### Basic Processing

```bash
# Process with default settings
vr180-matting config.yaml

# Override scale factor
vr180-matting config.yaml --scale 0.25

# Use CPU processing
vr180-matting config.yaml --device cpu
```

### Output Formats

```bash
# Alpha channel output (RGBA PNG sequence)
vr180-matting config.yaml --format alpha

# Green screen output (RGB video)
vr180-matting config.yaml --format greenscreen
```

### Memory Optimization

```bash
# Smaller chunks for limited VRAM
vr180-matting config.yaml --chunk-size 300

# Validate config without processing
vr180-matting config.yaml --dry-run
```

## Performance Guidelines

### RTX 3080 (10GB VRAM)

- **25% Scale**: ~5-8 FPS, ~6 minutes for a 30-second clip
- **50% Scale**: ~3-5 FPS, ~10 minutes for a 30-second clip
- **100% Scale**: Chunked processing, ~15-20 minutes for a 30-second clip

### Cloud GPU Scaling

- **A6000 (48GB)**: $6-8 per hour of video
- **A100 (80GB)**: $8-12 per hour of video
- **H100 (80GB)**: $6-10 per hour of video

## Troubleshooting

### Common Issues

**CUDA Out of Memory:**

- Reduce `scale_factor` (try 0.25)
- Lower `chunk_size`
- Enable `memory_offload: true`
- Use `fp16: true`

**No Persons Detected:**

- Lower `confidence_threshold`
- Try a larger YOLO model (yolov8s, yolov8m)
- Check input video quality

**Poor Edge Quality:**

- Increase `scale_factor` for final processing
- Reduce compression in the output format
- Enable edge refinement post-processing

### Memory Monitoring

The tool provides detailed memory usage reports:

```
VRAM Allocated: 8.2 GB
VRAM Free: 1.8 GB
VRAM Utilization: 82%
```

## Architecture

### Processing Pipeline

1. **Video Analysis**: Load metadata and analyze the SBS layout
2. **Chunking**: Divide the video into memory-efficient chunks
3. **Detection**: YOLOv8 person detection per chunk
4. **Matting**: SAM2 mask propagation with memory optimization
5. **VR180 Processing**: Stereo-aware matting with consistency validation
6. **Output**: Combine chunks and save in the requested format

A code sketch of the detection-to-matting handoff (steps 3-4) appears in the appendix at the end of this README.

### Memory Management

- Automatic VRAM monitoring and emergency cleanup
- CPU offloading for frame storage
- FP16 precision support
- Adaptive chunk sizing based on available memory

## Development

### Project Structure

```
vr180_matting/
├── config.py            # Configuration management
├── detector.py          # YOLOv8 person detection
├── sam2_wrapper.py      # SAM2 integration
├── memory_manager.py    # VRAM optimization
├── video_processor.py   # Base video processing
├── vr180_processor.py   # VR180-specific processing
└── main.py              # CLI entry point
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes with tests
4. Submit a pull request

## License

[License information]

## Acknowledgments

- SAM2 team for the segmentation model
- Ultralytics for YOLOv8 detection
- Research referenced in `research.md`
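## Appendix: Detection-to-Matting Sketch

The following is a minimal sketch of how steps 3-4 of the processing pipeline fit together: YOLOv8 person boxes become SAM2 box prompts, and SAM2 propagates the resulting masks through a chunk. It calls the upstream `ultralytics` and `sam2` packages directly rather than this repo's `detector.py` / `sam2_wrapper.py` wrappers, and the checkpoint name, config path, and frame directory are placeholders.

```python
# Illustrative only: uses the upstream ultralytics + sam2 APIs directly,
# not this repo's detector.py / sam2_wrapper.py wrappers.
import torch
from ultralytics import YOLO
from sam2.build_sam import build_sam2_video_predictor

detector = YOLO("yolov8n.pt")  # COCO class 0 = person
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml",  # placeholder config name
    "checkpoints/sam2.1_hiera_tiny.pt",    # placeholder checkpoint path
)

# SAM2's video predictor expects a directory of extracted JPEG frames
# (here, one chunk's worth of frames for one eye, already scaled).
state = predictor.init_state(video_path="frames/chunk_000")

# Step 3: detect persons in the chunk's first frame and prompt SAM2 with boxes.
result = detector("frames/chunk_000/00000.jpg", classes=[0], conf=0.5)[0]
for obj_id, box in enumerate(result.boxes.xyxy.cpu().numpy()):
    predictor.add_new_points_or_box(
        inference_state=state, frame_idx=0, obj_id=obj_id, box=box
    )

# Step 4: propagate masks through the rest of the chunk.
masks = {}
with torch.inference_mode():
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks[frame_idx] = (mask_logits > 0.0).cpu().numpy()
```

The sketch ignores VRAM limits; in the actual pipeline, chunked processing and `memory_manager.py` handle those constraints.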