# VR180 Human Matting with Det-SAM2
Automated human matting for VR180 3D side-by-side video using SAM2 and YOLOv8. Now with two processing approaches: chunked (original) and streaming (optimized).
## Features
- **Automatic Person Detection**: Uses YOLOv8 to eliminate manual point selection
- **Two Processing Modes**:
  - **Chunked**: Original stable implementation with higher memory usage
  - **Streaming**: New 2-3x faster implementation with constant memory usage
- **VRAM Optimization**: Memory management for consumer GPUs (10GB+)
- **VR180-Specific Processing**: Stereo consistency with master-slave eye processing
- **Flexible Scaling**: 25%, 50%, or 100% processing resolution
- **Multiple Output Formats**: Alpha channel or green screen background
- **Cloud GPU Ready**: Optimized for deployment on RunPod and Vast.ai
## Installation
```bash
# Clone repository
git clone <repository-url>
cd sam2e
# Install dependencies
pip install -r requirements.txt
# Install in development mode
pip install -e .
```
## Quick Start
1. **Generate example configuration:**
```bash
vr180-matting --generate-config config.yaml
```
2. **Edit configuration file:**
```yaml
input:
  video_path: "path/to/your/vr180_video.mp4"
processing:
  scale_factor: 0.5  # Start with 50% for testing
output:
  path: "output/matted_video.mp4"
  format: "alpha"  # or "greenscreen"
```
3. **Process video:**
```bash
# Chunked approach (original)
vr180-matting config.yaml
# Streaming approach (optimized, 2-3x faster)
python -m vr180_streaming config-streaming.yaml
```
## Processing Approaches
### Streaming Approach (Recommended)
- **Memory**: Constant ~50GB usage
- **Speed**: 2-3x faster than chunked
- **GPU**: 70%+ utilization
- **Best for**: Long videos, limited RAM
```bash
python -m vr180_streaming --generate-config config-streaming.yaml
python -m vr180_streaming config-streaming.yaml
```
### Chunked Approach (Original)
- **Memory**: 100GB+ peak usage
- **Speed**: Slower due to chunking overhead
- **GPU**: Lower utilization (~2.5%)
- **Best for**: Maximum stability, testing
```bash
vr180-matting --generate-config config-chunked.yaml
vr180-matting config-chunked.yaml
```
See [STREAMING_VS_CHUNKED.md](STREAMING_VS_CHUNKED.md) for detailed comparison.
## RunPod Quick Setup
For cloud GPU processing on RunPod:
```bash
# After connecting to your RunPod instance
git clone <repository-url>
cd sam2e
./runpod_setup.sh
# Then use the convenience scripts:
./run_streaming.sh # For streaming approach (recommended)
./run_chunked.sh # For chunked approach
```
The setup script will:
- Install all dependencies
- Download SAM2 models
- Create example configs
- Set up convenience scripts
## Configuration
### Input Settings
- `video_path`: Path to VR180 side-by-side video file
### Processing Settings
- `scale_factor`: Resolution scaling (0.25, 0.5, 1.0)
- `chunk_size`: Frames per chunk (0 for auto-calculation)
- `overlap_frames`: Frame overlap between chunks
### Detection Settings
- `confidence_threshold`: YOLO detection confidence (0.1-1.0)
- `model`: YOLO model size (yolov8n, yolov8s, yolov8m)
### Matting Settings
- `use_disparity_mapping`: Enable stereo optimization
- `memory_offload`: CPU offloading for VRAM management
- `fp16`: Use FP16 precision to reduce memory usage
### Output Settings
- `path`: Output file/directory path
- `format`: "alpha" for RGBA or "greenscreen" for RGB with background
- `background_color`: RGB background color for green screen mode
- `maintain_sbs`: Keep side-by-side format vs separate eye outputs
### Hardware Settings
- `device`: "cuda" or "cpu"
- `max_vram_gb`: VRAM limit (e.g., 10 for RTX 3080)
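A complete configuration combining the settings above might look like the sketch below. The `input`, `processing`, and `output` section names appear in the Quick Start example; grouping the remaining keys under `detection`, `matting`, and `hardware` is an assumption based on the headings above, and the values shown are illustrative rather than defaults:
```yaml
input:
  video_path: "path/to/your/vr180_video.mp4"

processing:
  scale_factor: 0.5        # 0.25, 0.5, or 1.0
  chunk_size: 0            # 0 = auto-calculate from available memory
  overlap_frames: 30       # illustrative value

detection:                 # assumed section name
  confidence_threshold: 0.5
  model: "yolov8n"         # yolov8n, yolov8s, or yolov8m

matting:                   # assumed section name
  use_disparity_mapping: true
  memory_offload: true
  fp16: true

output:
  path: "output/matted_video.mp4"
  format: "alpha"          # or "greenscreen"
  background_color: [0, 255, 0]  # used in greenscreen mode
  maintain_sbs: true       # keep side-by-side layout

hardware:                  # assumed section name
  device: "cuda"
  max_vram_gb: 10          # e.g. RTX 3080
```
Run `vr180-matting --generate-config config.yaml` to produce the authoritative template for your installed version.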
## Usage Examples
### Basic Processing
```bash
# Process with default settings
vr180-matting config.yaml
# Override scale factor
vr180-matting config.yaml --scale 0.25
# Use CPU processing
vr180-matting config.yaml --device cpu
```
### Output Formats
```bash
# Alpha channel output (RGBA PNG sequence)
vr180-matting config.yaml --format alpha
# Green screen output (RGB video)
vr180-matting config.yaml --format greenscreen
```
### Memory Optimization
```bash
# Smaller chunks for limited VRAM
vr180-matting config.yaml --chunk-size 300
# Validate config without processing
vr180-matting config.yaml --dry-run
```
## Performance Guidelines
### RTX 3080 (10GB VRAM)
- **25% Scale**: ~5-8 FPS, 6 minutes for a 30s clip
- **50% Scale**: ~3-5 FPS, 10 minutes for a 30s clip
- **100% Scale**: Chunked processing, 15-20 minutes for a 30s clip
### Cloud GPU Scaling
- **A6000 (48GB)**: $6-8 per hour of video
- **A100 (80GB)**: $8-12 per hour of video
- **H100 (80GB)**: $6-10 per hour of video
## Troubleshooting
### Common Issues
**CUDA Out of Memory:**
- Reduce `scale_factor` (try 0.25)
- Lower `chunk_size`
- Enable `memory_offload: true`
- Use `fp16: true`
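As a starting point, a memory-conservative config might combine these options (key names come from the Configuration section; the `processing` and `matting` groupings follow the assumed layout in the example config above):
```yaml
processing:
  scale_factor: 0.25   # biggest single VRAM saving
  chunk_size: 300      # smaller chunks lower peak usage
matting:
  memory_offload: true # stage frames on the CPU
  fp16: true           # half precision reduces memory use
```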
**No Persons Detected:**
- Lower `confidence_threshold`
- Try a larger YOLO model (yolov8s or yolov8m)
- Check input video quality
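For instance, a more permissive detection block (values are illustrative, not defaults):
```yaml
detection:
  confidence_threshold: 0.2  # lower threshold keeps weaker detections
  model: "yolov8m"           # larger model, better recall
```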
**Poor Edge Quality:**
- Increase `scale_factor` for final processing
- Reduce compression in output format
- Enable edge refinement post-processing
### Memory Monitoring
The tool provides detailed memory usage reports:
```
VRAM Allocated: 8.2 GB
VRAM Free: 1.8 GB
VRAM Utilization: 82%
```
## Architecture
### Processing Pipeline
1. **Video Analysis**: Load metadata, analyze SBS layout
2. **Chunking**: Divide video into memory-efficient chunks
3. **Detection**: YOLOv8 person detection per chunk
4. **Matting**: SAM2 mask propagation with memory optimization
5. **VR180 Processing**: Stereo-aware matting with consistency validation
6. **Output**: Combine chunks and save in requested format
### Memory Management
- Automatic VRAM monitoring and emergency cleanup
- CPU offloading for frame storage
- FP16 precision support
- Adaptive chunk sizing based on available memory
## Development
### Project Structure
```
vr180_matting/ # Chunked approach (original)
├── config.py # Configuration management
├── detector.py # YOLOv8 person detection
├── sam2_wrapper.py # SAM2 integration
├── memory_manager.py # VRAM optimization
├── video_processor.py # Base video processing
├── vr180_processor.py # VR180-specific processing
└── main.py # CLI entry point
vr180_streaming/ # Streaming approach (optimized)
├── frame_reader.py # Streaming frame reader
├── frame_writer.py # Direct ffmpeg pipe writer
├── stereo_manager.py # Stereo consistency management
├── sam2_streaming.py # SAM2 streaming integration
├── detector.py # YOLO person detection
├── streaming_processor.py # Main processor
├── config.py # Configuration
└── main.py # CLI entry point
```
### Contributing
1. Fork the repository
2. Create a feature branch
3. Make changes with tests
4. Submit a pull request
## License
[License information]
## Acknowledgments
- SAM2 team for the segmentation model
- Ultralytics for YOLOv8 detection
- Research referenced in `research.md`