# VR180 Human Matting with Det-SAM2

Automated human matting for VR180 3D side-by-side video using SAM2 and YOLOv8. Two processing approaches are available: chunked (the original) and streaming (optimized).
## Features

- **Automatic Person Detection**: Uses YOLOv8 to eliminate manual point selection
- **Two Processing Modes**:
  - **Chunked**: The original, stable implementation with higher memory usage
  - **Streaming**: A newer implementation, 2-3x faster with constant memory usage
- **VRAM Optimization**: Memory management for consumer GPUs (10GB+)
- **VR180-Specific Processing**: Stereo consistency via master-slave eye processing (sketched below)
- **Flexible Scaling**: 25%, 50%, or 100% processing resolution
- **Multiple Output Formats**: Alpha channel or green screen background
- **Cloud GPU Ready**: Optimized for RunPod and Vast.ai deployment
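The master-slave idea can be shown in a few lines: the left (master) eye gets full inference, and the right (slave) eye reuses the resulting mask after a horizontal disparity shift. This is a minimal sketch of the concept, not the project's `stereo_manager.py`; `mat_fn` and the constant `disparity_px` are illustrative stand-ins.

```python
import numpy as np

def mat_sbs_frame(frame: np.ndarray, mat_fn, disparity_px: int = 12) -> np.ndarray:
    """Matte a side-by-side frame: full inference on the master (left) eye,
    then reuse its mask for the slave (right) eye via a disparity shift."""
    w = frame.shape[1]
    left = frame[:, : w // 2]

    left_mask = mat_fn(left)                                # master: e.g. SAM2 matting
    right_mask = np.roll(left_mask, -disparity_px, axis=1)  # slave: shifted reuse
    right_mask[:, -disparity_px:] = 0                       # clear wrapped-around columns

    return np.concatenate([left_mask, right_mask], axis=1)
```

In a real pipeline disparity varies per pixel and the slave eye is validated or refined; the constant shift here is only meant to convey the data flow between eyes.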
## Installation

```bash
# Clone the repository
git clone <repository-url>
cd sam2e

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```
## Quick Start

- Generate an example configuration:

  ```bash
  vr180-matting --generate-config config.yaml
  ```

- Edit the configuration file:

  ```yaml
  input:
    video_path: "path/to/your/vr180_video.mp4"
  processing:
    scale_factor: 0.5  # Start with 50% for testing
  output:
    path: "output/matted_video.mp4"
    format: "alpha"  # or "greenscreen"
  ```

- Process the video:

  ```bash
  # Chunked approach (original)
  vr180-matting config.yaml

  # Streaming approach (optimized, 2-3x faster)
  python -m vr180_streaming config-streaming.yaml
  ```
## Processing Approaches

### Streaming Approach (Recommended)

- **Memory**: Constant ~50GB usage
- **Speed**: 2-3x faster than chunked
- **GPU**: 70%+ utilization
- **Best for**: Long videos, limited RAM

```bash
python -m vr180_streaming --generate-config config-streaming.yaml
python -m vr180_streaming config-streaming.yaml
```

### Chunked Approach (Original)

- **Memory**: 100GB+ peak usage
- **Speed**: Slower due to chunking overhead
- **GPU**: Lower utilization (~2.5%)
- **Best for**: Maximum stability, testing

```bash
vr180-matting --generate-config config-chunked.yaml
vr180-matting config-chunked.yaml
```
See `STREAMING_VS_CHUNKED.md` for a detailed comparison.
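The memory difference comes down to how frames are held: streaming decodes, processes, and discards one frame at a time, while chunked materializes a whole chunk first. Below is a minimal sketch of the streaming side using OpenCV; the actual `frame_reader.py` may read frames differently (e.g., via an ffmpeg pipe), so treat this as an illustration of the pattern, not the project's code.

```python
import cv2

def stream_frames(video_path: str):
    """Yield BGR frames one at a time; memory stays O(1) in video length."""
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame  # caller processes the frame, then it is freed
    finally:
        cap.release()

# The chunked approach, by contrast, holds `chunk_size` frames at once:
#   frames = [frame for _, frame in zip(range(chunk_size), reader)]  # O(chunk_size) RAM
```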
## RunPod Quick Setup

For cloud GPU processing on RunPod:

```bash
# After connecting to your RunPod instance
git clone <repository-url>
cd sam2e
./runpod_setup.sh

# Then run with Python directly:
python -m vr180_streaming config-streaming-runpod.yaml  # Streaming (recommended)
python -m vr180_matting config-chunked-runpod.yaml      # Chunked (original)
```
The setup script will:
- Install all dependencies
- Download SAM2 models
- Create example configs for both approaches
## Configuration

### Input Settings

- `video_path`: Path to the VR180 side-by-side video file

### Processing Settings

- `scale_factor`: Resolution scaling (0.25, 0.5, 1.0)
- `chunk_size`: Frames per chunk (0 for auto-calculation)
- `overlap_frames`: Frame overlap between chunks

### Detection Settings

- `confidence_threshold`: YOLO detection confidence (0.1-1.0)
- `model`: YOLO model size (yolov8n, yolov8s, yolov8m)

### Matting Settings

- `use_disparity_mapping`: Enable stereo optimization
- `memory_offload`: CPU offloading for VRAM management
- `fp16`: Use FP16 precision to reduce memory usage

### Output Settings

- `path`: Output file/directory path
- `format`: `"alpha"` for RGBA or `"greenscreen"` for RGB with background
- `background_color`: RGB background color for green screen mode
- `maintain_sbs`: Keep side-by-side format vs. separate per-eye outputs

### Hardware Settings

- `device`: `"cuda"` or `"cpu"`
- `max_vram_gb`: VRAM limit (e.g., 10 for an RTX 3080)
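Taken together, a config file covers these sections. The sketch below loads and defaults such a file with PyYAML; the key names follow this README, but the default values and validation are assumptions, not the project's `config.py`.

```python
import yaml

# Assumed defaults for illustration only; the tool's real defaults may differ.
DEFAULTS = {
    "processing": {"scale_factor": 0.5, "chunk_size": 0, "overlap_frames": 0},
    "detection": {"confidence_threshold": 0.5, "model": "yolov8n"},
    "matting": {"use_disparity_mapping": True, "memory_offload": True, "fp16": True},
    "output": {"format": "alpha", "background_color": [0, 255, 0], "maintain_sbs": True},
    "hardware": {"device": "cuda", "max_vram_gb": 10},
}

def load_config(path: str) -> dict:
    """Read a YAML config and fill in missing keys from DEFAULTS."""
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    for section, defaults in DEFAULTS.items():
        cfg.setdefault(section, {})
        for key, value in defaults.items():
            cfg[section].setdefault(key, value)
    assert cfg["output"]["format"] in ("alpha", "greenscreen")
    return cfg
```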
## Usage Examples

### Basic Processing

```bash
# Process with default settings
vr180-matting config.yaml

# Override the scale factor
vr180-matting config.yaml --scale 0.25

# Use CPU processing
vr180-matting config.yaml --device cpu
```
### Output Formats

```bash
# Alpha channel output (RGBA PNG sequence)
vr180-matting config.yaml --format alpha

# Green screen output (RGB video)
vr180-matting config.yaml --format greenscreen
```
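The two formats relate in a simple way: `alpha` keeps the matte as a fourth channel, while `greenscreen` bakes it in by compositing over a solid color. An illustrative numpy sketch (not the tool's actual writer code):

```python
import numpy as np

def to_alpha(rgb: np.ndarray, matte: np.ndarray) -> np.ndarray:
    """RGBA output: a matte in [0, 1] becomes the alpha channel."""
    a = (matte * 255).astype(np.uint8)[..., None]
    return np.concatenate([rgb, a], axis=-1)

def to_greenscreen(rgb: np.ndarray, matte: np.ndarray, bg=(0, 255, 0)) -> np.ndarray:
    """RGB output: blend the subject over a solid background color."""
    m = matte[..., None]
    return (rgb * m + np.array(bg) * (1 - m)).astype(np.uint8)
```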
### Memory Optimization

```bash
# Smaller chunks for limited VRAM
vr180-matting config.yaml --chunk-size 300

# Validate the config without processing
vr180-matting config.yaml --dry-run
```
## Performance Guidelines

### RTX 3080 (10GB VRAM)

- **25% Scale**: ~5-8 FPS, ~6 minutes for a 30s clip
- **50% Scale**: ~3-5 FPS, ~10 minutes for a 30s clip
- **100% Scale**: Chunked processing, 15-20 minutes for a 30s clip
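These wall-clock figures follow from simple frame counting. For example, assuming a 60 fps source (the frame rate is an assumption, not stated above):

```python
frames = 30 * 60             # a 30s clip at 60 fps -> 1800 frames
minutes = frames / 5 / 60.0  # processed at 5 FPS
print(minutes)               # 6.0 -> matches "~6 minutes" at 25% scale
```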
### Cloud GPU Scaling

- **A6000 (48GB)**: $6-8 per hour of video
- **A100 (80GB)**: $8-12 per hour of video
- **H100 (80GB)**: $6-10 per hour of video
## Troubleshooting

### Common Issues

**CUDA Out of Memory:**
- Reduce `scale_factor` (try 0.25)
- Lower `chunk_size`
- Enable `memory_offload: true`
- Use `fp16: true`

**No Persons Detected:**
- Lower `confidence_threshold`
- Try a larger YOLO model (yolov8s, yolov8m)
- Check input video quality

**Poor Edge Quality:**
- Increase `scale_factor` for final processing
- Reduce compression in the output format
- Enable edge refinement post-processing
### Memory Monitoring

The tool provides detailed memory usage reports:

```text
VRAM Allocated: 8.2 GB
VRAM Free: 1.8 GB
VRAM Utilization: 82%
```
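A report like this can be produced with PyTorch's CUDA memory queries; the following is a minimal equivalent, not necessarily how `memory_manager.py` implements it:

```python
import torch

def vram_report() -> str:
    """Format current CUDA memory usage as a short report."""
    free, total = torch.cuda.mem_get_info()   # bytes (free, total) on current device
    allocated = torch.cuda.memory_allocated() # bytes held by PyTorch tensors
    gib = 1024 ** 3
    return (
        f"VRAM Allocated: {allocated / gib:.1f} GB\n"
        f"VRAM Free: {free / gib:.1f} GB\n"
        f"VRAM Utilization: {100 * (total - free) / total:.0f}%"
    )
```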
## Architecture

### Processing Pipeline

- **Video Analysis**: Load metadata and analyze the SBS layout
- **Chunking**: Divide the video into memory-efficient chunks
- **Detection**: YOLOv8 person detection per chunk
- **Matting**: SAM2 mask propagation with memory optimization
- **VR180 Processing**: Stereo-aware matting with consistency validation
- **Output**: Combine chunks and save in the requested format
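In outline, the chunked pipeline composes those stages roughly as follows; this is a sketch with stand-in stage functions, not the project's actual call graph:

```python
def process_video(cfg, analyze, chunk, detect, mat, process_vr180, write):
    """Compose the pipeline stages; each argument is a stage callable."""
    meta = analyze(cfg["input"]["video_path"])     # 1. video analysis
    outputs = []
    for frames in chunk(meta, cfg["processing"]):  # 2. chunking
        boxes = detect(frames)                     # 3. YOLOv8 person detection
        masks = mat(frames, boxes)                 # 4. SAM2 mask propagation
        masks = process_vr180(frames, masks)       # 5. stereo consistency
        outputs.append((frames, masks))
    return write(outputs, cfg["output"])           # 6. combine and save
```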
### Memory Management

- Automatic VRAM monitoring and emergency cleanup
- CPU offloading for frame storage
- FP16 precision support
- Adaptive chunk sizing based on available memory
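Adaptive chunk sizing can be as simple as dividing a VRAM budget by an estimated per-frame footprint. A rough sketch, where the fp16 RGB cost model and the `reserve_gb` headroom are assumptions rather than the tool's actual heuristic:

```python
def auto_chunk_size(width: int, height: int, scale: float,
                    max_vram_gb: float, reserve_gb: float = 4.0) -> int:
    """Largest chunk whose frame buffers fit in the VRAM budget."""
    # fp16 RGB at the processing resolution: 3 channels x 2 bytes per pixel
    bytes_per_frame = int(width * scale) * int(height * scale) * 3 * 2
    budget = (max_vram_gb - reserve_gb) * 1024 ** 3
    return max(1, int(budget // bytes_per_frame))
```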
## Development

### Project Structure

```text
vr180_matting/             # Chunked approach (original)
├── config.py              # Configuration management
├── detector.py            # YOLOv8 person detection
├── sam2_wrapper.py        # SAM2 integration
├── memory_manager.py      # VRAM optimization
├── video_processor.py     # Base video processing
├── vr180_processor.py     # VR180-specific processing
└── main.py                # CLI entry point

vr180_streaming/           # Streaming approach (optimized)
├── frame_reader.py        # Streaming frame reader
├── frame_writer.py        # Direct ffmpeg pipe writer
├── stereo_manager.py      # Stereo consistency management
├── sam2_streaming.py      # SAM2 streaming integration
├── detector.py            # YOLO person detection
├── streaming_processor.py # Main processor
├── config.py              # Configuration
└── main.py                # CLI entry point
```
## Contributing

- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request
## License
[License information]
## Acknowledgments

- SAM2 team for the segmentation model
- Ultralytics for YOLOv8 detection
- Research referenced in `research.md`