VR180 Human Matting with Det-SAM2

Automated human matting for VR180 3D side-by-side video using SAM2 and YOLOv8. Now with two processing approaches: chunked (original) and streaming (optimized).

Features

  • Automatic Person Detection: Uses YOLOv8 to eliminate manual point selection
  • Two Processing Modes:
    • Chunked: Original stable implementation with higher memory usage
    • Streaming: New 2-3x faster implementation with constant memory usage
  • VRAM Optimization: Memory management for consumer GPUs (10GB+)
  • VR180-Specific Processing: Stereo consistency with master-slave eye processing (see the sketch after this list)
  • Flexible Scaling: 25%, 50%, or 100% processing resolution
  • Multiple Output Formats: Alpha channel or green screen background
  • Cloud GPU Ready: Optimized for RunPod, Vast.ai deployment
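
The master-slave idea can be pictured as follows: the left (master) eye is matted normally, and its mask seeds the right (slave) eye after a horizontal disparity shift. This is only a conceptual sketch with assumed helper names (matte_left, estimate_disparity), not the project's stereo_manager API:

import numpy as np

def estimate_disparity(left_mask: np.ndarray, right_frame: np.ndarray) -> int:
    """Hypothetical helper: estimate the horizontal offset (in pixels)
    between the two eyes. A real implementation might correlate patches;
    0 is used here as a neutral placeholder."""
    return 0

def matte_stereo_frame(left_frame, right_frame, matte_left):
    """Master-slave stereo matting: run full matting on the left eye only,
    then reuse a shifted copy of its mask for the right eye."""
    left_mask = matte_left(left_frame)                   # master eye: YOLO + SAM2 matting
    shift = estimate_disparity(left_mask, right_frame)   # stereo disparity between eyes
    right_mask = np.roll(left_mask, shift, axis=1)       # slave eye: horizontally shifted copy
    return left_mask, right_mask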

Installation

# Clone repository
git clone <repository-url>
cd sam2e

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

Quick Start

  1. Generate example configuration:
vr180-matting --generate-config config.yaml
  2. Edit configuration file:
input:
  video_path: "path/to/your/vr180_video.mp4"
  
processing:
  scale_factor: 0.5  # Start with 50% for testing
  
output:
  path: "output/matted_video.mp4"
  format: "alpha"    # or "greenscreen"
  3. Process video:
# Chunked approach (original)
vr180-matting config.yaml

# Streaming approach (optimized, 2-3x faster)
python -m vr180_streaming config-streaming.yaml

Processing Approaches

Streaming Approach (Optimized)

  • Memory: Constant ~50GB usage
  • Speed: 2-3x faster than chunked
  • GPU: 70%+ utilization
  • Best for: Long videos, limited RAM
python -m vr180_streaming --generate-config config-streaming.yaml
python -m vr180_streaming config-streaming.yaml
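
The constant-memory behaviour comes from never holding the whole video in RAM: frames are read, matted, and piped straight to an encoder. A rough sketch of that pattern, assuming raw RGB frames (the actual frame_writer.py may use different ffmpeg flags):

import subprocess

def open_ffmpeg_writer(path: str, width: int, height: int, fps: float) -> subprocess.Popen:
    """Start an ffmpeg process that encodes raw RGB frames written to stdin,
    so the output grows on disk while memory use stays flat."""
    cmd = [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",   # raw frames arrive on stdin
        "-c:v", "libx264", "-pix_fmt", "yuv420p", path,
    ]
    return subprocess.Popen(cmd, stdin=subprocess.PIPE)

# per frame: writer.stdin.write(frame.tobytes()); when done: writer.stdin.close(); writer.wait()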

Chunked Approach (Original)

  • Memory: 100GB+ peak usage
  • Speed: Slower due to chunking overhead
  • GPU: Lower utilization (~2.5%)
  • Best for: Maximum stability, testing
vr180-matting --generate-config config-chunked.yaml
vr180-matting config-chunked.yaml

See STREAMING_VS_CHUNKED.md for a detailed comparison.

RunPod Quick Setup

For cloud GPU processing on RunPod:

# After connecting to your RunPod instance
git clone <repository-url>
cd sam2e
./runpod_setup.sh

# Then run with Python directly:
python -m vr180_streaming config-streaming-runpod.yaml   # Streaming (recommended)
python -m vr180_matting config-chunked-runpod.yaml       # Chunked (original)

The setup script will:

  • Install all dependencies
  • Download SAM2 models
  • Create example configs for both approaches

Configuration

Input Settings

  • video_path: Path to VR180 side-by-side video file

Processing Settings

  • scale_factor: Resolution scaling (0.25, 0.5, 1.0)
  • chunk_size: Frames per chunk (0 for auto-calculation)
  • overlap_frames: Frame overlap between chunks
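
To make chunk_size and overlap_frames concrete, here is a small illustrative helper (not the project's code) that splits a video into overlapping chunks; it assumes chunk_size is larger than overlap_frames:

def chunk_ranges(total_frames: int, chunk_size: int, overlap_frames: int):
    """Yield (start, end) frame index ranges; consecutive chunks share
    overlap_frames frames so masks can be blended across boundaries."""
    step = chunk_size - overlap_frames
    start = 0
    while start < total_frames:
        end = min(start + chunk_size, total_frames)
        yield start, end
        if end == total_frames:
            break
        start += step

# e.g. 1000 frames, chunk_size=300, overlap_frames=30
# -> (0, 300), (270, 570), (540, 840), (810, 1000)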

Detection Settings

  • confidence_threshold: YOLO detection confidence (0.1-1.0)
  • model: YOLO model size (yolov8n, yolov8s, yolov8m)
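
Both settings map directly onto a standard Ultralytics call; a minimal sketch of person detection on a single frame (the project's detector.py may differ):

import cv2
from ultralytics import YOLO

frame = cv2.imread("example_frame.png")          # placeholder: any single video frame
model = YOLO("yolov8n.pt")                       # model: yolov8n / yolov8s / yolov8m
results = model(frame, conf=0.4, classes=[0])    # classes=[0] keeps persons only; conf = confidence_threshold
boxes = results[0].boxes.xyxy.cpu().numpy()      # person boxes, usable as SAM2 box prompts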

Matting Settings

  • use_disparity_mapping: Enable stereo optimization
  • memory_offload: CPU offloading for VRAM management
  • fp16: Use FP16 precision to reduce memory usage

Output Settings

  • path: Output file/directory path
  • format: "alpha" for RGBA or "greenscreen" for RGB with background
  • background_color: RGB background color for green screen mode
  • maintain_sbs: Keep side-by-side format vs separate eye outputs
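
The two formats differ only in how the person mask is applied to each frame; roughly (illustrative, not the tool's exact writer code):

import numpy as np

def apply_matte(frame: np.ndarray, mask: np.ndarray,
                fmt: str = "alpha", background_color=(0, 255, 0)) -> np.ndarray:
    """frame: HxWx3 uint8, mask: HxW float in [0, 1] with 1 = person."""
    if fmt == "alpha":
        alpha = (mask * 255).astype(np.uint8)
        return np.dstack([frame, alpha])              # RGBA: person opaque, rest transparent
    background = np.empty_like(frame)
    background[:] = background_color                  # solid green screen (or any RGB)
    blended = frame * mask[..., None] + background * (1 - mask[..., None])
    return blended.astype(np.uint8)                   # RGB composite over the background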

Hardware Settings

  • device: "cuda" or "cpu"
  • max_vram_gb: VRAM limit (e.g., 10 for RTX 3080)
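
Putting the sections above together, a complete config might look like the following; the key names mirror the settings listed above, but the exact nesting is an assumption here, so treat the --generate-config output as authoritative:

input:
  video_path: "path/to/vr180_video.mp4"

processing:
  scale_factor: 0.5
  chunk_size: 0              # 0 = auto-calculate from available memory
  overlap_frames: 30

detection:
  confidence_threshold: 0.4
  model: "yolov8s"

matting:
  use_disparity_mapping: true
  memory_offload: true
  fp16: true

output:
  path: "output/matted_video.mp4"
  format: "alpha"            # or "greenscreen"
  background_color: [0, 255, 0]
  maintain_sbs: true

hardware:
  device: "cuda"
  max_vram_gb: 10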

Usage Examples

Basic Processing

# Process with default settings
vr180-matting config.yaml

# Override scale factor
vr180-matting config.yaml --scale 0.25

# Use CPU processing
vr180-matting config.yaml --device cpu

Output Formats

# Alpha channel output (RGBA PNG sequence)
vr180-matting config.yaml --format alpha

# Green screen output (RGB video)
vr180-matting config.yaml --format greenscreen

Memory Optimization

# Smaller chunks for limited VRAM
vr180-matting config.yaml --chunk-size 300

# Validate config without processing
vr180-matting config.yaml --dry-run

Performance Guidelines

RTX 3080 (10GB VRAM)

  • 25% Scale: ~5-8 FPS, 6 minutes for 30s clip
  • 50% Scale: ~3-5 FPS, 10 minutes for 30s clip
  • 100% Scale: Chunked processing, 15-20 minutes for 30s clip

Cloud GPU Scaling

  • A6000 (48GB): $6-8 per hour of source video
  • A100 (80GB): $8-12 per hour of source video
  • H100 (80GB): $6-10 per hour of source video

Troubleshooting

Common Issues

CUDA Out of Memory:

  • Reduce scale_factor (try 0.25)
  • Lower chunk_size
  • Enable memory_offload: true
  • Use fp16: true

No Persons Detected:

  • Lower confidence_threshold
  • Try larger YOLO model (yolov8s, yolov8m)
  • Check input video quality

Poor Edge Quality:

  • Increase scale_factor for final processing
  • Reduce compression in output format
  • Enable edge refinement post-processing

Memory Monitoring

The tool provides detailed memory usage reports:

VRAM Allocated: 8.2 GB
VRAM Free: 1.8 GB  
VRAM Utilization: 82%
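
These figures can be reproduced with standard PyTorch queries; a minimal sketch of how such a report can be assembled (not necessarily the tool's implementation):

import torch

def vram_report() -> str:
    free, total = torch.cuda.mem_get_info()       # bytes free / total on the current device
    allocated = torch.cuda.memory_allocated()     # bytes currently held by PyTorch tensors
    gb = 1024 ** 3
    used_pct = 100 * (total - free) / total
    return (f"VRAM Allocated: {allocated / gb:.1f} GB\n"
            f"VRAM Free: {free / gb:.1f} GB\n"
            f"VRAM Utilization: {used_pct:.0f}%")

print(vram_report())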

Architecture

Processing Pipeline

  1. Video Analysis: Load metadata, analyze SBS layout
  2. Chunking: Divide video into memory-efficient chunks
  3. Detection: YOLOv8 person detection per chunk
  4. Matting: SAM2 mask propagation with memory optimization
  5. VR180 Processing: Stereo-aware matting with consistency validation
  6. Output: Combine chunks and save in requested format
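
In code terms, the chunked pipeline boils down to a loop like the one below. This is a simplified sketch with hypothetical callables (detect, propagate, write_chunk) standing in for the real modules; chunk blending and the stereo pass (step 5) are omitted for brevity:

def process_video(frames, chunk_size, overlap_frames, detect, propagate, write_chunk):
    """detect(frame) -> person boxes; propagate(chunk, boxes) -> per-frame masks;
    write_chunk(start, chunk, masks) -> blend overlaps and append to the output."""
    # chunk_ranges as in the Processing Settings sketch above
    for start, end in chunk_ranges(len(frames), chunk_size, overlap_frames):
        chunk = frames[start:end]
        boxes = detect(chunk[0])             # step 3: YOLO person detection at chunk start
        masks = propagate(chunk, boxes)      # step 4: SAM2 mask propagation through the chunk
        write_chunk(start, chunk, masks)     # step 6: combine chunks and save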

Memory Management

  • Automatic VRAM monitoring and emergency cleanup
  • CPU offloading for frame storage
  • FP16 precision support
  • Adaptive chunk sizing based on available memory
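
The adaptive chunk sizing mentioned above can be approximated with a heuristic like the following (illustrative only; bytes_per_pixel and the safety margin are assumptions, and the real memory_manager may budget differently):

import torch

def auto_chunk_size(frame_height: int, frame_width: int, scale_factor: float,
                    bytes_per_pixel: int = 12, safety_margin: float = 0.7) -> int:
    """Choose how many scaled frames fit in currently free VRAM."""
    free_bytes, _ = torch.cuda.mem_get_info()
    per_frame = frame_height * frame_width * (scale_factor ** 2) * bytes_per_pixel
    return max(1, int(free_bytes * safety_margin / per_frame))

# releasing cached allocations when usage spikes: torch.cuda.empty_cache()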

Development

Project Structure

vr180_matting/          # Chunked approach (original)
├── config.py          # Configuration management
├── detector.py        # YOLOv8 person detection
├── sam2_wrapper.py    # SAM2 integration
├── memory_manager.py  # VRAM optimization
├── video_processor.py # Base video processing
├── vr180_processor.py # VR180-specific processing
└── main.py           # CLI entry point

vr180_streaming/       # Streaming approach (optimized)
├── frame_reader.py   # Streaming frame reader
├── frame_writer.py   # Direct ffmpeg pipe writer
├── stereo_manager.py # Stereo consistency management
├── sam2_streaming.py # SAM2 streaming integration
├── detector.py       # YOLO person detection
├── streaming_processor.py # Main processor
├── config.py         # Configuration
└── main.py          # CLI entry point

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Submit a pull request

License

[License information]

Acknowledgments

  • SAM2 team for the segmentation model
  • Ultralytics for YOLOv8 detection
  • Research referenced in research.md