# VR180 Human Matting with Det-SAM2

A proof-of-concept implementation for automated human matting on VR180 3D side-by-side equirectangular video using Det-SAM2 and YOLOv8 detection.

## Features

- **Automatic Person Detection**: Uses YOLOv8 to eliminate manual point selection
- **VRAM Optimization**: Memory management for RTX 3080 (10GB) compatibility
- **VR180-Specific Processing**: Side-by-side stereo handling with disparity mapping
- **Flexible Scaling**: 25%, 50%, or 100% processing resolution with AI upscaling
- **Multiple Output Formats**: Alpha channel or green screen background
- **Chunked Processing**: Handles long videos with memory-efficient chunking
- **Cloud GPU Ready**: Docker containerization for RunPod and Vast.ai deployment

## Installation

```bash
# Clone repository
git clone <repository-url>
cd sam2e

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

## Quick Start

1. **Generate an example configuration:**

   ```bash
   vr180-matting --generate-config config.yaml
   ```

2. **Edit the configuration file:**

   ```yaml
   input:
     video_path: "path/to/your/vr180_video.mp4"

   processing:
     scale_factor: 0.5  # Start with 50% for testing

   output:
     path: "output/matted_video.mp4"
     format: "alpha"  # or "greenscreen"
   ```

3. **Process the video:**

   ```bash
   vr180-matting config.yaml
   ```

## Configuration

### Input Settings

- `video_path`: Path to the VR180 side-by-side video file

### Processing Settings

- `scale_factor`: Resolution scaling (0.25, 0.5, 1.0)
- `chunk_size`: Frames per chunk (0 for auto-calculation)
- `overlap_frames`: Frame overlap between chunks (see the sketch below)
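
To make the interaction between `chunk_size` and `overlap_frames` concrete, here is a minimal sketch of how overlapping chunk boundaries can be computed (illustrative only, not the tool's actual implementation):

```python
def chunk_ranges(total_frames: int, chunk_size: int, overlap_frames: int):
    """Yield (start, end) frame ranges; consecutive chunks share `overlap_frames` frames."""
    step = chunk_size - overlap_frames
    assert step > 0, "chunk_size must exceed overlap_frames"
    start = 0
    while start < total_frames:
        end = min(start + chunk_size, total_frames)
        yield start, end
        if end == total_frames:
            break
        start += step

# e.g. 1000 frames, 300-frame chunks, 30-frame overlap:
# (0, 300), (270, 570), (540, 840), (810, 1000)
print(list(chunk_ranges(1000, 300, 30)))
```

The overlap gives SAM2 shared frames on both sides of a chunk boundary, so masks can be blended consistently when chunks are recombined.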

### Detection Settings

- `confidence_threshold`: YOLO detection confidence (0.1-1.0)
- `model`: YOLO model size (yolov8n, yolov8s, yolov8m); see the sketch below
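
Detection is a thin layer over Ultralytics YOLOv8. A minimal standalone sketch of how `model` and `confidence_threshold` map onto that API (the project's actual integration lives in `detector.py`; the all-zeros frame is just a stand-in):

```python
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                        # corresponds to `model: yolov8n`
frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a decoded video frame

# COCO class 0 is "person"; `conf` plays the role of `confidence_threshold`.
results = model(frame, conf=0.5, classes=[0])
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"person ({float(box.conf):.2f}): ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f})")
```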

### Matting Settings

- `use_disparity_mapping`: Enable stereo optimization (see the sketch below)
- `memory_offload`: CPU offloading for VRAM management
- `fp16`: Use FP16 precision to reduce memory usage
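
`use_disparity_mapping` exploits the fact that the two eye views see the same people at a small horizontal offset, so left-eye detections can be reused to prompt the right eye instead of running detection twice. A minimal sketch of that box transfer (the offset value and function are illustrative, not the project's API):

```python
def left_box_to_right_eye(box, disparity_px: float, eye_width: int):
    """Shift a left-eye (x1, y1, x2, y2) box horizontally to seed the right eye.

    A positive `disparity_px` moves the box left; in practice the offset
    would be estimated per subject, and is hard-coded here purely for
    illustration.
    """
    x1, y1, x2, y2 = box
    return (max(0.0, x1 - disparity_px), y1,
            min(float(eye_width), x2 - disparity_px), y2)

print(left_box_to_right_eye((820.0, 400.0, 1180.0, 1450.0),
                            disparity_px=24.0, eye_width=2880))
```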

### Output Settings

- `path`: Output file/directory path
- `format`: "alpha" for RGBA or "greenscreen" for RGB with background (see the sketch below)
- `background_color`: RGB background color for green screen mode
- `maintain_sbs`: Keep side-by-side format vs. separate per-eye outputs
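
The two `format` values differ only in how the matte is combined with each frame. A minimal NumPy sketch of both modes (shapes and function are illustrative, not the project's API):

```python
import numpy as np

def composite(frame: np.ndarray, matte: np.ndarray,
              fmt: str = "alpha", background_color=(0, 255, 0)) -> np.ndarray:
    """frame: HxWx3 uint8 RGB; matte: HxW float in [0, 1] (1 = person)."""
    if fmt == "alpha":
        # "alpha": keep the colors and store the matte in a 4th channel (RGBA).
        return np.dstack([frame, (matte * 255).astype(np.uint8)])
    # "greenscreen": blend the person over a solid background color.
    bg = np.empty_like(frame)
    bg[...] = background_color
    a = matte[..., None]
    return (frame * a + bg * (1.0 - a)).astype(np.uint8)

frame = np.full((4, 4, 3), 200, dtype=np.uint8)
matte = np.zeros((4, 4)); matte[1:3, 1:3] = 1.0
print(composite(frame, matte, fmt="greenscreen")[0, 0])  # [  0 255   0]
```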

### Hardware Settings

- `device`: "cuda" or "cpu"
- `max_vram_gb`: VRAM limit (e.g., 10 for RTX 3080); see the sketch below
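
One way a `max_vram_gb` cap can be enforced in PyTorch is a per-process memory fraction. A sketch under that assumption (not necessarily how `memory_manager.py` implements it):

```python
import torch

def apply_vram_limit(max_vram_gb: float, device: int = 0) -> None:
    """Cap this process's CUDA allocations at roughly max_vram_gb."""
    if not torch.cuda.is_available():
        return  # `device: "cpu"` path; nothing to cap
    total_bytes = torch.cuda.get_device_properties(device).total_memory
    fraction = min(1.0, (max_vram_gb * 1024**3) / total_bytes)
    torch.cuda.set_per_process_memory_fraction(fraction, device)

apply_vram_limit(10.0)  # e.g. RTX 3080
```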

## Usage Examples

### Basic Processing

```bash
# Process with default settings
vr180-matting config.yaml

# Override scale factor
vr180-matting config.yaml --scale 0.25

# Use CPU processing
vr180-matting config.yaml --device cpu
```

### Output Formats

```bash
# Alpha channel output (RGBA PNG sequence)
vr180-matting config.yaml --format alpha

# Green screen output (RGB video)
vr180-matting config.yaml --format greenscreen
```

### Memory Optimization

```bash
# Smaller chunks for limited VRAM
vr180-matting config.yaml --chunk-size 300

# Validate config without processing
vr180-matting config.yaml --dry-run
```

## Performance Guidelines

### RTX 3080 (10GB VRAM)

- **25% Scale**: ~5-8 FPS, ~6 minutes for a 30-second clip
- **50% Scale**: ~3-5 FPS, ~10 minutes for a 30-second clip
- **100% Scale**: Chunked processing, ~15-20 minutes for a 30-second clip

### Cloud GPU Scaling

- **A6000 (48GB)**: $6-8 per hour of video
- **A100 (80GB)**: $8-12 per hour of video
- **H100 (80GB)**: $6-10 per hour of video

## Troubleshooting

### Common Issues

**CUDA Out of Memory:**

- Reduce `scale_factor` (try 0.25)
- Lower `chunk_size`
- Enable `memory_offload: true`
- Use `fp16: true` (see the sketch below)
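
Of these, `fp16: true` is usually the cheapest to try. Conceptually it means running inference in half precision, which roughly halves activation memory; a generic PyTorch sketch (the project may implement this differently):

```python
import torch

if torch.cuda.is_available():
    model = torch.nn.Conv2d(3, 8, kernel_size=3).cuda().eval()
    x = torch.randn(1, 3, 64, 64, device="cuda")
    # Autocast runs eligible ops in float16, halving activation memory.
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.float16):
        y = model(x)
    print(y.dtype)  # torch.float16
```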

**No Persons Detected:**

- Lower `confidence_threshold`
- Try a larger YOLO model (yolov8s, yolov8m)
- Check input video quality

**Poor Edge Quality:**

- Increase `scale_factor` for final processing
- Reduce compression in the output format
- Enable edge refinement post-processing

### Memory Monitoring

The tool provides detailed memory usage reports:

```
VRAM Allocated: 8.2 GB
VRAM Free: 1.8 GB
VRAM Utilization: 82%
```
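
Figures like these can be reproduced with standard PyTorch CUDA queries; an illustrative sketch of how such a report might be assembled (not the tool's exact code):

```python
import torch

def vram_report(device: int = 0) -> str:
    free, total = torch.cuda.mem_get_info(device)    # bytes, whole device
    allocated = torch.cuda.memory_allocated(device)  # bytes held by this process
    gb = 1024**3
    return (f"VRAM Allocated: {allocated / gb:.1f} GB\n"
            f"VRAM Free: {free / gb:.1f} GB\n"
            f"VRAM Utilization: {(total - free) / total:.0%}")

if torch.cuda.is_available():
    print(vram_report())
```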

## Architecture

### Processing Pipeline

1. **Video Analysis**: Load metadata, analyze SBS layout
2. **Chunking**: Divide video into memory-efficient chunks
3. **Detection**: YOLOv8 person detection per chunk
4. **Matting**: SAM2 mask propagation with memory optimization
5. **VR180 Processing**: Stereo-aware matting with consistency validation
6. **Output**: Combine chunks and save in requested format
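
Read together, these stages form a single per-chunk loop. The sketch below wires up that control flow with stand-in stubs; every function here is illustrative (the real versions live in the modules listed under Project Structure), and chunk overlap and stereo handling are omitted for brevity:

```python
import numpy as np

def detect_persons(frame: np.ndarray):
    return [(8, 8, 40, 56)]                 # one fake (x1, y1, x2, y2) box

def propagate_masks(frames: np.ndarray, boxes) -> np.ndarray:
    # Stand-in for SAM2: one matte per frame.
    return np.ones((len(frames), *frames.shape[1:3]), dtype=np.float32)

def run_pipeline(total_frames: int = 120, chunk_size: int = 50):
    mattes = []
    for start in range(0, total_frames, chunk_size):        # 2. chunking
        n = min(chunk_size, total_frames - start)
        frames = np.zeros((n, 64, 64, 3), dtype=np.uint8)   # stand-in for decoded frames
        boxes = detect_persons(frames[0])                   # 3. detection seeds SAM2
        mattes.append(propagate_masks(frames, boxes))       # 4-5. mask propagation
    return np.concatenate(mattes)                           # 6. combine chunks

print(run_pipeline().shape)  # (120, 64, 64)
```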

### Memory Management

- Automatic VRAM monitoring and emergency cleanup
- CPU offloading for frame storage
- FP16 precision support
- Adaptive chunk sizing based on available memory (see the sketch below)
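
As an illustration of the last point, a chunk size can be derived from currently free VRAM and an estimated per-frame memory cost. The 60 MB/frame figure below is a placeholder, not a measurement from this project:

```python
import torch

def adaptive_chunk_size(bytes_per_frame: int = 60 * 1024**2,
                        safety_margin: float = 0.8,
                        min_chunk: int = 30) -> int:
    """Pick a chunk size that fits within currently free VRAM."""
    free_bytes, _ = torch.cuda.mem_get_info()
    usable = free_bytes * safety_margin  # keep headroom for activations
    return max(min_chunk, int(usable // bytes_per_frame))

if torch.cuda.is_available():
    print(adaptive_chunk_size())
```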

## Development

### Project Structure

```
vr180_matting/
├── config.py            # Configuration management
├── detector.py          # YOLOv8 person detection
├── sam2_wrapper.py      # SAM2 integration
├── memory_manager.py    # VRAM optimization
├── video_processor.py   # Base video processing
├── vr180_processor.py   # VR180-specific processing
└── main.py              # CLI entry point
```

### Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes with tests
4. Submit a pull request

## License

[License information]

## Acknowledgments

- SAM2 team for the segmentation model
- Ultralytics for YOLOv8 detection
- Research referenced in `research.md`