VR180 Human Matting with Det-SAM2

Automated human matting for VR180 3D side-by-side video using SAM2 and YOLOv8. Now with two processing approaches: chunked (original) and streaming (optimized).

Features

  • Automatic Person Detection: Uses YOLOv8 to eliminate manual point selection
  • Two Processing Modes:
    • Chunked: Original stable implementation with higher memory usage
    • Streaming: New 2-3x faster implementation with constant memory usage
  • VRAM Optimization: Memory management for consumer GPUs (10GB+)
  • VR180-Specific Processing: Stereo consistency with master-slave eye processing (see the sketch after this list)
  • Flexible Scaling: 25%, 50%, or 100% processing resolution
  • Multiple Output Formats: Alpha channel or green screen background
  • Cloud GPU Ready: Optimized for RunPod, Vast.ai deployment
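
The master-slave idea can be pictured as follows: the left (master) eye is matted normally, and its mask seeds the right (slave) eye after a horizontal disparity shift. This is only a conceptual sketch with assumed helper names (matte_left, estimate_disparity), not the project's stereo_manager API:

import numpy as np

def estimate_disparity(left_mask: np.ndarray, right_frame: np.ndarray) -> int:
    """Hypothetical helper: estimate the horizontal offset (in pixels)
    between the two eyes. A real implementation might correlate patches;
    0 is used here as a neutral placeholder."""
    return 0

def matte_stereo_frame(left_frame, right_frame, matte_left):
    """Master-slave stereo matting: run full matting on the left eye only,
    then reuse a shifted copy of its mask for the right eye."""
    left_mask = matte_left(left_frame)                   # master eye: YOLO + SAM2 matting
    shift = estimate_disparity(left_mask, right_frame)   # stereo disparity between eyes
    right_mask = np.roll(left_mask, shift, axis=1)       # slave eye: horizontally shifted copy
    return left_mask, right_mask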

Installation

# Clone repository
git clone <repository-url>
cd sam2e

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

Quick Start

  1. Generate example configuration:
vr180-matting --generate-config config.yaml
  2. Edit configuration file:
input:
  video_path: "path/to/your/vr180_video.mp4"
  
processing:
  scale_factor: 0.5  # Start with 50% for testing
  
output:
  path: "output/matted_video.mp4"
  format: "alpha"    # or "greenscreen"
  3. Process video:
# Chunked approach (original)
vr180-matting config.yaml

# Streaming approach (optimized, 2-3x faster)
python -m vr180_streaming config-streaming.yaml

Processing Approaches

Streaming Approach (Optimized)

  • Memory: Constant ~50GB usage
  • Speed: 2-3x faster than chunked
  • GPU: 70%+ utilization
  • Best for: Long videos, limited RAM
python -m vr180_streaming --generate-config config-streaming.yaml
python -m vr180_streaming config-streaming.yaml
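
The constant-memory behaviour comes from never holding the whole video in RAM: frames are read, matted, and piped straight to an encoder. A rough sketch of that pattern, assuming raw RGB frames (the actual frame_writer.py may use different ffmpeg flags):

import subprocess

def open_ffmpeg_writer(path: str, width: int, height: int, fps: float) -> subprocess.Popen:
    """Start an ffmpeg process that encodes raw RGB frames written to stdin,
    so the output grows on disk while memory use stays flat."""
    cmd = [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",   # raw frames arrive on stdin
        "-c:v", "libx264", "-pix_fmt", "yuv420p", path,
    ]
    return subprocess.Popen(cmd, stdin=subprocess.PIPE)

# per frame: writer.stdin.write(frame.tobytes()); when done: writer.stdin.close(); writer.wait()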

Chunked Approach (Original)

  • Memory: 100GB+ peak usage
  • Speed: Slower due to chunking overhead
  • GPU: Lower utilization (~2.5%)
  • Best for: Maximum stability, testing
vr180-matting --generate-config config-chunked.yaml
vr180-matting config-chunked.yaml

See STREAMING_VS_CHUNKED.md for a detailed comparison.

RunPod Quick Setup

For cloud GPU processing on RunPod:

# After connecting to your RunPod instance
git clone <repository-url>
cd sam2e
./runpod_setup.sh

# Then run with Python directly:
python -m vr180_streaming config-streaming-runpod.yaml   # Streaming (recommended)
python -m vr180_matting config-chunked-runpod.yaml       # Chunked (original)

The setup script will:

  • Install all dependencies
  • Download SAM2 models
  • Create example configs for both approaches

Configuration

Input Settings

  • video_path: Path to VR180 side-by-side video file

Processing Settings

  • scale_factor: Resolution scaling (0.25, 0.5, 1.0)
  • chunk_size: Frames per chunk (0 for auto-calculation)
  • overlap_frames: Frame overlap between chunks
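
To make chunk_size and overlap_frames concrete, here is a small illustrative helper (not the project's code) that splits a video into overlapping chunks; it assumes chunk_size is larger than overlap_frames:

def chunk_ranges(total_frames: int, chunk_size: int, overlap_frames: int):
    """Yield (start, end) frame index ranges; consecutive chunks share
    overlap_frames frames so masks can be blended across boundaries."""
    step = chunk_size - overlap_frames
    start = 0
    while start < total_frames:
        end = min(start + chunk_size, total_frames)
        yield start, end
        if end == total_frames:
            break
        start += step

# e.g. 1000 frames, chunk_size=300, overlap_frames=30
# -> (0, 300), (270, 570), (540, 840), (810, 1000)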

Detection Settings

  • confidence_threshold: YOLO detection confidence (0.1-1.0)
  • model: YOLO model size (yolov8n, yolov8s, yolov8m)
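
Both settings map directly onto a standard Ultralytics call; a minimal sketch of person detection on a single frame (the project's detector.py may differ):

import cv2
from ultralytics import YOLO

frame = cv2.imread("example_frame.png")          # placeholder: any single video frame
model = YOLO("yolov8n.pt")                       # model: yolov8n / yolov8s / yolov8m
results = model(frame, conf=0.4, classes=[0])    # classes=[0] keeps persons only; conf = confidence_threshold
boxes = results[0].boxes.xyxy.cpu().numpy()      # person boxes, usable as SAM2 box prompts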

Matting Settings

  • use_disparity_mapping: Enable stereo optimization
  • memory_offload: CPU offloading for VRAM management
  • fp16: Use FP16 precision to reduce memory usage

Output Settings

  • path: Output file/directory path
  • format: "alpha" for RGBA or "greenscreen" for RGB with background
  • background_color: RGB background color for green screen mode
  • maintain_sbs: Keep side-by-side format vs separate eye outputs
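
The two formats differ only in how the person mask is applied to each frame; roughly (illustrative, not the tool's exact writer code):

import numpy as np

def apply_matte(frame: np.ndarray, mask: np.ndarray,
                fmt: str = "alpha", background_color=(0, 255, 0)) -> np.ndarray:
    """frame: HxWx3 uint8, mask: HxW float in [0, 1] with 1 = person."""
    if fmt == "alpha":
        alpha = (mask * 255).astype(np.uint8)
        return np.dstack([frame, alpha])              # RGBA: person opaque, rest transparent
    background = np.empty_like(frame)
    background[:] = background_color                  # solid green screen (or any RGB)
    blended = frame * mask[..., None] + background * (1 - mask[..., None])
    return blended.astype(np.uint8)                   # RGB composite over the background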

Hardware Settings

  • device: "cuda" or "cpu"
  • max_vram_gb: VRAM limit (e.g., 10 for RTX 3080)
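
Putting the sections above together, a complete config might look like the following; the key names mirror the settings listed above, but the exact nesting is an assumption here, so treat the --generate-config output as authoritative:

input:
  video_path: "path/to/vr180_video.mp4"

processing:
  scale_factor: 0.5
  chunk_size: 0              # 0 = auto-calculate from available memory
  overlap_frames: 30

detection:
  confidence_threshold: 0.4
  model: "yolov8s"

matting:
  use_disparity_mapping: true
  memory_offload: true
  fp16: true

output:
  path: "output/matted_video.mp4"
  format: "alpha"            # or "greenscreen"
  background_color: [0, 255, 0]
  maintain_sbs: true

hardware:
  device: "cuda"
  max_vram_gb: 10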

Usage Examples

Basic Processing

# Process with default settings
vr180-matting config.yaml

# Override scale factor
vr180-matting config.yaml --scale 0.25

# Use CPU processing
vr180-matting config.yaml --device cpu

Output Formats

# Alpha channel output (RGBA PNG sequence)
vr180-matting config.yaml --format alpha

# Green screen output (RGB video)
vr180-matting config.yaml --format greenscreen

Memory Optimization

# Smaller chunks for limited VRAM
vr180-matting config.yaml --chunk-size 300

# Validate config without processing
vr180-matting config.yaml --dry-run

Performance Guidelines

RTX 3080 (10GB VRAM)

  • 25% Scale: ~5-8 FPS, 6 minutes for 30s clip
  • 50% Scale: ~3-5 FPS, 10 minutes for 30s clip
  • 100% Scale: Chunked processing, 15-20 minutes for 30s clip

Cloud GPU Scaling

  • A6000 (48GB): $6-8 per hour of source video
  • A100 (80GB): $8-12 per hour of source video
  • H100 (80GB): $6-10 per hour of source video

Troubleshooting

Common Issues

CUDA Out of Memory:

  • Reduce scale_factor (try 0.25)
  • Lower chunk_size
  • Enable memory_offload: true
  • Use fp16: true

No Persons Detected:

  • Lower confidence_threshold
  • Try larger YOLO model (yolov8s, yolov8m)
  • Check input video quality

Poor Edge Quality:

  • Increase scale_factor for final processing
  • Reduce compression in output format
  • Enable edge refinement post-processing

Memory Monitoring

The tool provides detailed memory usage reports:

VRAM Allocated: 8.2 GB
VRAM Free: 1.8 GB  
VRAM Utilization: 82%
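
These figures can be reproduced with standard PyTorch queries; a minimal sketch of how such a report can be assembled (not necessarily the tool's implementation):

import torch

def vram_report() -> str:
    free, total = torch.cuda.mem_get_info()       # bytes free / total on the current device
    allocated = torch.cuda.memory_allocated()     # bytes currently held by PyTorch tensors
    gb = 1024 ** 3
    used_pct = 100 * (total - free) / total
    return (f"VRAM Allocated: {allocated / gb:.1f} GB\n"
            f"VRAM Free: {free / gb:.1f} GB\n"
            f"VRAM Utilization: {used_pct:.0f}%")

print(vram_report())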

Architecture

Processing Pipeline

  1. Video Analysis: Load metadata, analyze SBS layout
  2. Chunking: Divide video into memory-efficient chunks
  3. Detection: YOLOv8 person detection per chunk
  4. Matting: SAM2 mask propagation with memory optimization
  5. VR180 Processing: Stereo-aware matting with consistency validation
  6. Output: Combine chunks and save in requested format
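
In code terms, the chunked pipeline boils down to a loop like the one below. This is a simplified sketch with hypothetical callables (detect, propagate, write_chunk) standing in for the real modules; chunk blending and the stereo pass (step 5) are omitted for brevity:

def process_video(frames, chunk_size, overlap_frames, detect, propagate, write_chunk):
    """detect(frame) -> person boxes; propagate(chunk, boxes) -> per-frame masks;
    write_chunk(start, chunk, masks) -> blend overlaps and append to the output."""
    # chunk_ranges as in the Processing Settings sketch above
    for start, end in chunk_ranges(len(frames), chunk_size, overlap_frames):
        chunk = frames[start:end]
        boxes = detect(chunk[0])             # step 3: YOLO person detection at chunk start
        masks = propagate(chunk, boxes)      # step 4: SAM2 mask propagation through the chunk
        write_chunk(start, chunk, masks)     # step 6: combine chunks and save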

Memory Management

  • Automatic VRAM monitoring and emergency cleanup
  • CPU offloading for frame storage
  • FP16 precision support
  • Adaptive chunk sizing based on available memory
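
The adaptive chunk sizing mentioned above can be approximated with a heuristic like the following (illustrative only; bytes_per_pixel and the safety margin are assumptions, and the real memory_manager may budget differently):

import torch

def auto_chunk_size(frame_height: int, frame_width: int, scale_factor: float,
                    bytes_per_pixel: int = 12, safety_margin: float = 0.7) -> int:
    """Choose how many scaled frames fit in currently free VRAM."""
    free_bytes, _ = torch.cuda.mem_get_info()
    per_frame = frame_height * frame_width * (scale_factor ** 2) * bytes_per_pixel
    return max(1, int(free_bytes * safety_margin / per_frame))

# releasing cached allocations when usage spikes: torch.cuda.empty_cache()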

Development

Project Structure

vr180_matting/          # Chunked approach (original)
├── config.py          # Configuration management
├── detector.py        # YOLOv8 person detection
├── sam2_wrapper.py    # SAM2 integration
├── memory_manager.py  # VRAM optimization
├── video_processor.py # Base video processing
├── vr180_processor.py # VR180-specific processing
└── main.py           # CLI entry point

vr180_streaming/       # Streaming approach (optimized)
├── frame_reader.py   # Streaming frame reader
├── frame_writer.py   # Direct ffmpeg pipe writer
├── stereo_manager.py # Stereo consistency management
├── sam2_streaming.py # SAM2 streaming integration
├── detector.py       # YOLO person detection
├── streaming_processor.py # Main processor
├── config.py         # Configuration
└── main.py          # CLI entry point

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Submit a pull request

License

[License information]

Acknowledgments

  • SAM2 team for the segmentation model
  • Ultralytics for YOLOv8 detection
  • Research referenced in research.md