
# RunPod Deployment Guide

## Quick Start (Recommended)

### 1. Create RunPod Instance
- **Template**: `runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04`
- **GPU**: NVIDIA A40 (48GB VRAM)
- **Storage**: 50GB+ (for videos and models)
- **Persistent Storage**: Recommended for model caching

### 2. Connect and Setup
```bash
# After SSH/Terminal access
cd /workspace

# Clone your repository
git clone https://github.com/YOUR_USERNAME/sam2e.git
cd sam2e

# Make setup script executable and run
chmod +x runpod_setup.sh
./runpod_setup.sh

# Install SAM2 separately (not on PyPI)
pip install git+https://github.com/facebookresearch/segment-anything-2.git
```

### 3. Upload Your Video
```bash
# Option 1: wget from URL
wget -O /workspace/data/myvideo.mp4 "https://your-video-url.com/video.mp4"

# Option 2: Use RunPod's file browser
# Upload to /workspace/data/

# Option 3: rclone from cloud storage
rclone copy remote:path/to/video.mp4 /workspace/data/
```

### 4. Configure and Run
```bash
# Use RunPod-optimized config
cp config_runpod.yaml config.yaml

# Edit video path
nano config.yaml  # Update video_path

# Run processing
vr180-matting config.yaml
```

## Advanced Docker Deployment

### Option 1: Pre-built Docker Image
```bash
# Build and push to Docker Hub (do this locally first)
docker build -t yourusername/vr180-matting:latest .
docker push yourusername/vr180-matting:latest

# On RunPod, use your image as template
```

### Option 2: Build on RunPod
```bash
cd /workspace/sam2e
docker build -t vr180-matting .
docker run --gpus all -v /workspace/data:/app/data -v /workspace/output:/app/output -it vr180-matting
```

## Performance Tips for A40

### Optimal Settings
```yaml
processing:
  scale_factor: 0.75  # A40 can handle higher resolution
  chunk_size: 0      # Let it auto-calculate for 48GB

detection:
  model: "yolov8m"   # or yolov8l for better accuracy

matting:
  memory_offload: false  # Plenty of VRAM
  fp16: true            # Still use FP16 for speed
```

### Expected Performance
- **50% scale**: ~10-15 FPS processing
- **75% scale**: ~6-10 FPS processing
- **100% scale**: ~4-6 FPS processing
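
To turn these throughput figures into a rough wall-clock estimate, divide the clip's frame count by the processing rate. A minimal sketch, assuming a hypothetical helper `estimate_processing_minutes` (not part of `vr180-matting`) and a source frame rate you supply yourself; it counts frame throughput only, so model loading and encoding time are not included:

```python
def estimate_processing_minutes(duration_s: float, source_fps: float, processing_fps: float) -> float:
    """Estimate wall-clock minutes needed to process a clip.

    Hypothetical helper for planning; not part of vr180-matting.
    """
    if processing_fps <= 0:
        raise ValueError("processing_fps must be positive")
    total_frames = duration_s * source_fps  # frames in the source clip
    return total_frames / processing_fps / 60.0


if __name__ == "__main__":
    # Example: 30 s clip, assumed 30 fps source, 50%-scale range of 10-15 FPS
    slow = estimate_processing_minutes(30, 30, 10)
    fast = estimate_processing_minutes(30, 30, 15)
    print(f"Estimated processing time: {fast:.1f} to {slow:.1f} minutes")
```

Measure the real rate on a short clip first and feed that number in; actual throughput depends on resolution and scene content.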

### Memory Monitoring
```bash
# Watch GPU usage while processing
watch -n 1 nvidia-smi

# Or in another terminal
nvidia-smi dmon -s um
```
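
If you want the same numbers inside a script (for logging or alerts), `nvidia-smi` can emit machine-readable CSV. A minimal sketch, assuming `nvidia-smi` is on the `PATH`; the helper names `parse_gpu_memory` and `query_gpu_memory` are illustrative:

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text: str) -> list:
    """Parse 'used, total' MiB pairs, one GPU per line, into (used, total) tuples."""
    pairs = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> list:
    """Run nvidia-smi and return (used_mib, total_mib) for each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this on the GPU pod")
    else:
        for idx, (used, total) in enumerate(query_gpu_memory()):
            print(f"GPU {idx}: {used} / {total} MiB ({used / total:.0%})")
```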

## Batch Processing Script

Create `batch_process.py` for multiple videos:
```python
from pathlib import Path

from vr180_matting.config import VR180Config
from vr180_matting.vr180_processor import VR180Processor

# Directory setup
input_dir = Path("/workspace/data/videos")
output_dir = Path("/workspace/output")
output_dir.mkdir(parents=True, exist_ok=True)  # create the output directory if missing
base_config = "config_runpod.yaml"

for video_file in input_dir.glob("*.mp4"):
    print(f"Processing: {video_file.name}")

    # Load base config
    config = VR180Config.from_yaml(base_config)

    # Update paths
    config.input.video_path = str(video_file)
    config.output.path = str(output_dir / f"matted_{video_file.stem}.mp4")

    # Process
    processor = VR180Processor(config)
    processor.process_video()

    print(f"Completed: {video_file.name}\n")
```
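
When a batch is restarted (for example after a pod is stopped mid-run), it helps to skip videos that already have an output file. A minimal sketch, assuming the `matted_<stem>.mp4` naming used in the script above; `pending_videos` is a hypothetical helper, not part of `vr180_matting`:

```python
from pathlib import Path


def pending_videos(input_dir: Path, output_dir: Path) -> list:
    """Return input .mp4 files that have no matted_<stem>.mp4 in output_dir yet."""
    return [
        video
        for video in sorted(input_dir.glob("*.mp4"))
        if not (output_dir / f"matted_{video.stem}.mp4").exists()
    ]
```

To use it, iterate over `pending_videos(input_dir, output_dir)` instead of `input_dir.glob("*.mp4")`. Note that the check only looks for the file's existence, so a partially written output from an interrupted run would also be skipped; delete such files before restarting.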

## Cost Optimization

### RunPod Spot Instances
- A40 spot: ~$0.44/hour vs $0.79/hour on-demand
- Well suited to batch processing; spot pods can be interrupted, so write outputs to persistent storage
- Add persistence: $0.10/GB/month for model storage

### Processing Time Estimates
- 30s clip @ 50% scale: ~10 minutes = ~$0.07 (spot rate)
- 1 hour video @ 50% scale: ~12 hours = ~$5.28 (spot rate)
- 1 hour video @ 25% scale: ~6 hours = ~$2.64 (spot rate)
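
The dollar figures are simply hours multiplied by the hourly rate. A short sketch that reproduces them and shows the on-demand equivalent, using the A40 prices quoted in this section; `processing_cost` is an illustrative helper:

```python
SPOT_RATE = 0.44       # USD/hour, A40 spot price quoted in this section
ON_DEMAND_RATE = 0.79  # USD/hour, A40 on-demand price quoted in this section


def processing_cost(hours: float, rate_per_hour: float = SPOT_RATE) -> float:
    """Return the GPU cost in USD, rounded to cents."""
    return round(hours * rate_per_hour, 2)


if __name__ == "__main__":
    jobs = [
        ("30s clip @ 50% scale", 10 / 60),
        ("1 hour video @ 50% scale", 12),
        ("1 hour video @ 25% scale", 6),
    ]
    for label, hours in jobs:
        spot = processing_cost(hours)
        on_demand = processing_cost(hours, ON_DEMAND_RATE)
        print(f"{label}: spot ${spot:.2f}, on-demand ${on_demand:.2f}")
```

Storage charges are billed separately and are not included here.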

### Auto-shutdown Script
```bash
# Add to end of processing script
echo "Processing complete, shutting down in 60 seconds..."
sleep 60
runpodctl stop pod "$RUNPOD_POD_ID"  # RUNPOD_POD_ID is set inside RunPod pods
```

## Troubleshooting

### SAM2 Model Download Issues
```bash
# Manual download (URL follows the SAM2 repo's checkpoint download script)
mkdir -p /workspace/sam2e/models
cd /workspace/sam2e/models
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
```

### CUDA Version Mismatch
```bash
# Check CUDA version
nvcc --version
python -c "import torch; print(torch.version.cuda)"

# Reinstall PyTorch if needed (cu124 matches the CUDA 12.4 template from Quick Start)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
```
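
To compare the two version strings in a script rather than by eye, you can parse the `nvcc --version` output and check it against `torch.version.cuda`. A minimal sketch with hypothetical helper names; treating a shared major version as "compatible" is an assumption of this sketch, not a guarantee:

```python
import re
from typing import Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release from `nvcc --version` output, if present."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_cuda_major(toolkit_version: str, torch_cuda_version: str) -> bool:
    """Assumption: versions sharing a major number are treated as compatible."""
    return toolkit_version.split(".")[0] == torch_cuda_version.split(".")[0]


if __name__ == "__main__":
    # Sample line in the format nvcc prints; replace with real command output
    sample = "Cuda compilation tools, release 12.4, V12.4.131"
    toolkit = nvcc_release(sample)
    print(toolkit, same_cuda_major(toolkit, "12.1"))
```

On the pod, pass in the text from `nvcc --version` and the value printed by the `python -c` command above.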

### Out of Memory on A40
```yaml
# Unlikely, but if it happens:
processing:
  scale_factor: 0.25
  chunk_size: 300
matting:
  memory_offload: true
```

## Monitoring Dashboard

Create `monitor.py` for real-time stats (requires `pip install psutil gputil`):
```python
import time

import GPUtil
import psutil

psutil.cpu_percent()  # prime the counter; the first call always returns 0.0

try:
    while True:
        # GPU stats
        for gpu in GPUtil.getGPUs():
            print(f"GPU: {gpu.memoryUsed}MB / {gpu.memoryTotal}MB ({gpu.memoryUtil*100:.1f}%)")

        # CPU/RAM stats
        print(f"RAM: {psutil.virtual_memory().percent}%")
        print(f"CPU: {psutil.cpu_percent()}%")  # usage since the previous call
        print("-" * 40)
        time.sleep(2)
except KeyboardInterrupt:
    pass  # exit cleanly on Ctrl+C
```

## Quick Commands Reference

```bash
# Test on short clip
vr180-matting config.yaml --dry-run

# Process with monitoring
vr180-matting config.yaml --verbose

# Override settings
vr180-matting config.yaml --scale 0.25 --format greenscreen

# Generate config
vr180-matting --generate-config my_config.yaml
```