idk

amp
exccept
2025-07-27 10:37:40 -07:00 · 2025-07-27 10:23:25 -07:00 · 2025-07-27 10:20:25 -07:00 · 2025-07-27 10:16:39 -07:00 · 2025-07-27 10:10:00 -07:00 · 2025-07-27 10:01:12 -07:00
21 changed files with 3813 additions and 686 deletions
--- a/README.md
+++ b/README.md
@@ -1,16 +1,18 @@
 # VR180 Human Matting with Det-SAM2
-A proof-of-concept implementation for automated human matting on VR180 3D side-by-side equirectangular video using Det-SAM2 and YOLOv8 detection.
+Automated human matting for VR180 3D side-by-side video using SAM2 and YOLOv8. Now with two processing approaches: chunked (original) and streaming (optimized).
 ## Features
 - **Automatic Person Detection**: Uses YOLOv8 to eliminate manual point selection
- **VRAM Optimization**: Memory management for RTX 3080 (10GB) compatibility  
+- **Two Processing Modes**:
- **VR180-Specific Processing**: Side-by-side stereo handling with disparity mapping
+  - **Chunked**: Original stable implementation with higher memory usage
- **Flexible Scaling**: 25%, 50%, or 100% processing resolution with AI upscaling
+  - **Streaming**: New 2-3x faster implementation with constant memory usage
 - **VRAM Optimization**: Memory management for consumer GPUs (10GB+)
 - **VR180-Specific Processing**: Stereo consistency with master-slave eye processing
 - **Flexible Scaling**: 25%, 50%, or 100% processing resolution
 - **Multiple Output Formats**: Alpha channel or green screen background
- **Chunked Processing**: Handles long videos with memory-efficient chunking
+- **Cloud GPU Ready**: Optimized for RunPod, Vast.ai deployment
 - **Cloud GPU Ready**: Docker containerization for RunPod, Vast.ai deployment
 ## Installation
@@ -48,9 +50,59 @@ output:
 3. **Process video:**
 ```bash
 # Chunked approach (original)
 vr180-matting config.yaml
 # Streaming approach (optimized, 2-3x faster)
 python -m vr180_streaming config-streaming.yaml
 ```
 ## Processing Approaches
 ### Streaming Approach (Recommended)
 - **Memory**: Constant ~50GB usage
 - **Speed**: 2-3x faster than chunked
 - **GPU**: 70%+ utilization
 - **Best for**: Long videos, limited RAM
 ```bash
 python -m vr180_streaming --generate-config config-streaming.yaml
 python -m vr180_streaming config-streaming.yaml
 ```
 ### Chunked Approach (Original)
 - **Memory**: 100GB+ peak usage
 - **Speed**: Slower due to chunking overhead
 - **GPU**: Lower utilization (~2.5%)
 - **Best for**: Maximum stability, testing
 ```bash
 vr180-matting --generate-config config-chunked.yaml
 vr180-matting config-chunked.yaml
 ```
 See [STREAMING_VS_CHUNKED.md](STREAMING_VS_CHUNKED.md) for detailed comparison.
 ## RunPod Quick Setup
 For cloud GPU processing on RunPod:
 ```bash
 # After connecting to your RunPod instance
 git clone <repository-url>
 cd sam2e
 ./runpod_setup.sh
 # Then run with Python directly:
 python -m vr180_streaming config-streaming-runpod.yaml   # Streaming (recommended)
 python -m vr180_matting config-chunked-runpod.yaml       # Chunked (original)
 ```
 The setup script will:
 - Install all dependencies
 - Download SAM2 models
 - Create example configs for both approaches
 ## Configuration
 ### Input Settings
@@ -172,7 +224,7 @@ VRAM Utilization: 82%
 ### Project Structure
 ```
-vr180_matting/
+vr180_matting/          # Chunked approach (original)
 ├── config.py          # Configuration management
 ├── detector.py        # YOLOv8 person detection
 ├── sam2_wrapper.py    # SAM2 integration
@@ -180,6 +232,16 @@ vr180_matting/
 ├── video_processor.py # Base video processing
 ├── vr180_processor.py # VR180-specific processing
 └── main.py           # CLI entry point
 vr180_streaming/       # Streaming approach (optimized)
 ├── frame_reader.py   # Streaming frame reader
 ├── frame_writer.py   # Direct ffmpeg pipe writer
 ├── stereo_manager.py # Stereo consistency management
 ├── sam2_streaming.py # SAM2 streaming integration
 ├── detector.py       # YOLO person detection
 ├── streaming_processor.py # Main processor
 ├── config.py         # Configuration
 └── main.py          # CLI entry point
 ```
 ### Contributing
--- a/analyze_memory_profile.py
+++ b/analyze_memory_profile.py
@@ -1,193 +0,0 @@
 #!/usr/bin/env python3
 """
 Analyze memory profile JSON files to identify OOM causes
 """
 import json
 import glob
 import os
 import sys
 from pathlib import Path
 def analyze_memory_files():
    """Analyze partial memory profile files"""
    # Get all partial files in order
    files = sorted(glob.glob('memory_profile_partial_*.json'))
    if not files:
        print("❌ No memory profile files found!")
        print("Expected files like: memory_profile_partial_0.json")
        return
    print(f"🔍 Found {len(files)} memory profile files")
    print("=" * 60)
    peak_memory = 0
    peak_vram = 0
    critical_points = []
    all_checkpoints = []
    for i, file in enumerate(files):
        try:
            with open(file, 'r') as f:
                data = json.load(f)
            timeline = data.get('timeline', [])
            if not timeline:
                continue
            # Find peaks in this file
            file_peak_rss = max([d['rss_gb'] for d in timeline])
            file_peak_vram = max([d['vram_gb'] for d in timeline])
            if file_peak_rss > peak_memory:
                peak_memory = file_peak_rss
            if file_peak_vram > peak_vram:
                peak_vram = file_peak_vram
            # Find memory growth spikes (>3GB increase)
            for j in range(1, len(timeline)):
                prev_rss = timeline[j-1]['rss_gb']
                curr_rss = timeline[j]['rss_gb']
                growth = curr_rss - prev_rss
                if growth > 3.0:  # >3GB growth spike
                    checkpoint = timeline[j].get('checkpoint', f'sample_{j}')
                    critical_points.append({
                        'file': file,
                        'file_index': i,
                        'sample': j,
                        'timestamp': timeline[j]['timestamp'],
                        'rss_gb': curr_rss,
                        'vram_gb': timeline[j]['vram_gb'],
                        'growth_gb': growth,
                        'checkpoint': checkpoint
                    })
            # Collect all checkpoints
            checkpoints = [d for d in timeline if 'checkpoint' in d]
            for cp in checkpoints:
                cp['file'] = file
                cp['file_index'] = i
                all_checkpoints.append(cp)
            # Show progress for this file
            if timeline:
                start_rss = timeline[0]['rss_gb']
                end_rss = timeline[-1]['rss_gb']
                growth = end_rss - start_rss
                samples = len(timeline)
                print(f"📊 File {i+1:2d}: {start_rss:5.1f}GB → {end_rss:5.1f}GB "
                      f"(+{growth:4.1f}GB) [{samples:3d} samples]")
                # Show significant checkpoints from this file
                if checkpoints:
                    for cp in checkpoints:
                        print(f"    📍 {cp['checkpoint']}: {cp['rss_gb']:.1f}GB")
        except Exception as e:
            print(f"❌ Error reading {file}: {e}")
    print("\n" + "=" * 60)
    print("🎯 ANALYSIS SUMMARY")
    print("=" * 60)
    print(f"📈 Peak Memory: {peak_memory:.1f} GB")
    print(f"🎮 Peak VRAM: {peak_vram:.1f} GB")
    print(f"⚡ Growth Spikes: {len(critical_points)} events >3GB")
    if critical_points:
        print(f"\n💥 MEMORY GROWTH SPIKES (>3GB):")
        print("   Location                           Growth   Total   VRAM")
        print("   " + "-" * 55)
        for point in critical_points:
            location = point['checkpoint'][:30].ljust(30)
            print(f"   {location} +{point['growth_gb']:4.1f}GB → {point['rss_gb']:5.1f}GB  {point['vram_gb']:4.1f}GB")
    if all_checkpoints:
        print(f"\n📍 CHECKPOINT PROGRESSION:")
        print("   Checkpoint                         Memory   VRAM    File")
        print("   " + "-" * 55)
        for cp in all_checkpoints:
            checkpoint = cp['checkpoint'][:30].ljust(30)
            file_num = cp['file_index'] + 1
            print(f"   {checkpoint} {cp['rss_gb']:5.1f}GB  {cp['vram_gb']:4.1f}GB  #{file_num}")
    # Memory growth analysis
    if len(all_checkpoints) > 1:
        print(f"\n📊 MEMORY GROWTH ANALYSIS:")
        # Find the biggest memory jumps between checkpoints
        big_jumps = []
        for i in range(1, len(all_checkpoints)):
            prev_cp = all_checkpoints[i-1]
            curr_cp = all_checkpoints[i]
            growth = curr_cp['rss_gb'] - prev_cp['rss_gb']
            if growth > 2.0:  # >2GB jump
                big_jumps.append({
                    'from': prev_cp['checkpoint'],
                    'to': curr_cp['checkpoint'],
                    'growth': growth,
                    'from_memory': prev_cp['rss_gb'],
                    'to_memory': curr_cp['rss_gb']
                })
        if big_jumps:
            print("   Major jumps (>2GB):")
            for jump in big_jumps:
                print(f"   {jump['from']} → {jump['to']}: "
                      f"+{jump['growth']:.1f}GB ({jump['from_memory']:.1f}→{jump['to_memory']:.1f}GB)")
        else:
            print("   ✅ No major memory jumps detected")
    # Diagnosis
    print(f"\n🔬 DIAGNOSIS:")
    if peak_memory > 400:
        print("   🔴 CRITICAL: Memory usage exceeded 400GB")
        print("   💡 Recommendation: Reduce chunk_size to 200-300 frames")
    elif peak_memory > 200:
        print("   🟡 HIGH: Memory usage over 200GB")
        print("   💡 Recommendation: Reduce chunk_size to 400 frames")
    else:
        print("   🟢 MODERATE: Memory usage under 200GB")
    if critical_points:
        # Find most common growth spike locations
        spike_locations = {}
        for point in critical_points:
            location = point['checkpoint']
            spike_locations[location] = spike_locations.get(location, 0) + 1
        print("\n   🎯 Most problematic locations:")
        for location, count in sorted(spike_locations.items(), key=lambda x: x[1], reverse=True)[:3]:
            print(f"      {location}: {count} spikes")
    print(f"\n💡 NEXT STEPS:")
    if 'merge' in str(critical_points).lower():
        print("   1. Chunk merging still causing memory accumulation")
        print("   2. Check if streaming merge is actually being used")
        print("   3. Verify chunk files are being deleted immediately")
    elif 'propagation' in str(critical_points).lower():
        print("   1. SAM2 propagation using too much memory")
        print("   2. Reduce chunk_size further (try 300 frames)")
        print("   3. Enable more aggressive frame release")
    else:
        print("   1. Review the checkpoint progression above")
        print("   2. Focus on locations with biggest memory spikes")
        print("   3. Consider reducing chunk_size if spikes are large")
 def main():
    print("🔍 MEMORY PROFILE ANALYZER")
    print("Analyzing memory profile files for OOM causes...")
    print()
    analyze_memory_files()
 if __name__ == "__main__":
    main()
--- a/config-streaming-runpod.yaml
+++ b/config-streaming-runpod.yaml
@@ -0,0 +1,70 @@
 # VR180 Streaming Configuration for RunPod
 # Optimized for A6000 (48GB VRAM) or similar cloud GPUs
 input:
  video_path: "/workspace/input_video.mp4"  # Update with your input path
  start_frame: 0  # Resume from checkpoint if auto_resume is enabled
  max_frames: null  # null = process entire video, or set a number for testing
 streaming:
  mode: true  # True streaming - no chunking!
  buffer_frames: 10  # Small buffer for correction lookahead
  write_interval: 1  # Write every frame immediately
 processing:
  scale_factor: 0.5  # 0.5 = 4K processing for 8K input (good balance)
  adaptive_scaling: true  # Dynamically adjust scale based on GPU load
  target_gpu_usage: 0.7  # Target 70% GPU utilization
  min_scale: 0.25  # Never go below 25% scale
  max_scale: 1.0  # Can go up to full resolution if GPU allows
 detection:
  confidence_threshold: 0.7  # Person detection confidence
  model: "yolov8n"  # Fast model suitable for streaming (n/s/m/l/x)
  device: "cuda"
 matting:
  sam2_model_cfg: "sam2.1_hiera_l"  # Use large model for best quality
  sam2_checkpoint: "segment-anything-2/checkpoints/sam2.1_hiera_large.pt"
  memory_offload: true  # Critical for streaming - offload to CPU when needed
  fp16: false  # Disable FP16 to avoid type mismatch with compiled models for memory efficiency
  continuous_correction: true  # Periodically refine tracking
  correction_interval: 30  # Correct every 0.5 seconds at 60fps (for testing)
 stereo:
  mode: "master_slave"  # Left eye detects, right eye follows
  master_eye: "left"  # Which eye leads detection
  disparity_correction: true  # Adjust for stereo parallax
  consistency_threshold: 0.3  # Max allowed difference between eyes
  baseline: 65.0  # Interpupillary distance in mm
  focal_length: 1000.0  # Camera focal length in pixels
 output:
  path: "/workspace/output_video.mp4"  # Update with your output path
  format: "greenscreen"  # "greenscreen" or "alpha"
  background_color: [0, 255, 0]  # RGB for green screen
  video_codec: "h264_nvenc"  # GPU encoding for L40 (fallback to CPU if not available)
  quality_preset: "p4"  # NVENC preset (p1=fastest, p7=slowest/best quality)
  crf: 18  # Quality (0-51, lower = better, 18 = high quality)
  maintain_sbs: true  # Keep side-by-side format with audio
 hardware:
  device: "cuda"
  max_vram_gb: 44.0  # Conservative limit for L40 48GB VRAM
  max_ram_gb: 48.0  # RunPod container RAM limit
 recovery:
  enable_checkpoints: true  # Save progress for resume
  checkpoint_interval: 1000  # Save every ~16 seconds at 60fps
  auto_resume: true  # Automatically resume from last checkpoint
  checkpoint_dir: "./checkpoints"
 performance:
  profile_enabled: true  # Track performance metrics
  log_interval: 100  # Log progress every 100 frames
  memory_monitor: true  # Monitor RAM/VRAM usage
 # Usage:
 # 1. Update input.video_path and output.path
 # 2. Adjust scale_factor based on your GPU (0.25 for faster, 1.0 for quality)
 # 3. Run: python -m vr180_streaming config-streaming-runpod.yaml
--- a/debug_memory_leak.py
+++ b/debug_memory_leak.py
@@ -1,151 +0,0 @@
 #!/usr/bin/env python3
 """
 Debug memory leak between chunks - track exactly where memory accumulates
 """
 import psutil
 import gc
 from pathlib import Path
 import sys
 def detailed_memory_check(label):
    """Get detailed memory info"""
    process = psutil.Process()
    memory_info = process.memory_info()
    rss_gb = memory_info.rss / (1024**3)
    vms_gb = memory_info.vms / (1024**3)
    # System memory
    sys_memory = psutil.virtual_memory()
    available_gb = sys_memory.available / (1024**3)
    print(f"🔍 {label}:")
    print(f"   RSS: {rss_gb:.2f} GB (physical memory)")
    print(f"   VMS: {vms_gb:.2f} GB (virtual memory)")
    print(f"   Available: {available_gb:.2f} GB")
    return rss_gb
 def simulate_chunk_processing():
    """Simulate the chunk processing to see where memory accumulates"""
    print("🚀 SIMULATING CHUNK PROCESSING TO FIND MEMORY LEAK")
    print("=" * 60)
    base_memory = detailed_memory_check("0. Baseline")
    # Step 1: Import everything (with lazy loading)
    print("\n📦 Step 1: Imports")
    from vr180_matting.config import VR180Config
    from vr180_matting.vr180_processor import VR180Processor
    import_memory = detailed_memory_check("1. After imports")
    import_growth = import_memory - base_memory
    print(f"   Growth: +{import_growth:.2f} GB")
    # Step 2: Load config
    print("\n⚙️  Step 2: Config loading")
    config = VR180Config.from_yaml('config.yaml')
    config_memory = detailed_memory_check("2. After config load")
    config_growth = config_memory - import_memory
    print(f"   Growth: +{config_growth:.2f} GB")
    # Step 3: Initialize processor (models still lazy)
    print("\n🏗️  Step 3: Processor initialization")
    processor = VR180Processor(config)
    processor_memory = detailed_memory_check("3. After processor init")
    processor_growth = processor_memory - config_memory
    print(f"   Growth: +{processor_growth:.2f} GB")
    # Step 4: Load video info (lightweight)
    print("\n🎬 Step 4: Video info loading")
    try:
        video_info = processor.load_video_info(config.input.video_path)
        print(f"   Video: {video_info.get('width', 'unknown')}x{video_info.get('height', 'unknown')}, "
              f"{video_info.get('total_frames', 'unknown')} frames")
    except Exception as e:
        print(f"   Warning: Could not load video info: {e}")
    video_info_memory = detailed_memory_check("4. After video info")
    video_info_growth = video_info_memory - processor_memory
    print(f"   Growth: +{video_info_growth:.2f} GB")
    # Step 5: Simulate chunk 0 processing (this is where models actually load)
    print("\n🔄 Step 5: Simulating chunk 0 processing...")
    # This is where the real memory usage starts
    print("   Loading first 10 frames to trigger model loading...")
    try:
        # Read a small number of frames to trigger model loading
        frames = processor.read_video_frames(
            config.input.video_path,
            start_frame=0,
            num_frames=10,  # Just 10 frames to trigger model loading
            scale_factor=config.processing.scale_factor
        )
        frames_memory = detailed_memory_check("5a. After reading 10 frames")
        frames_growth = frames_memory - video_info_memory
        print(f"   10 frames growth: +{frames_growth:.2f} GB")
        # Free frames
        del frames
        gc.collect()
        after_free_memory = detailed_memory_check("5b. After freeing 10 frames")
        free_improvement = frames_memory - after_free_memory
        print(f"   Memory freed: -{free_improvement:.2f} GB")
    except Exception as e:
        print(f"   Could not simulate frame loading: {e}")
        after_free_memory = video_info_memory
    print(f"\n📊 MEMORY ANALYSIS:")
    print(f"   Baseline → Final: {base_memory:.2f}GB → {after_free_memory:.2f}GB")
    print(f"   Total growth: +{after_free_memory - base_memory:.2f}GB")
    if after_free_memory - base_memory > 10:
        print(f"   🔴 HIGH: Memory growth > 10GB before any real processing")
        print(f"   💡 This suggests model loading is using too much memory")
    elif after_free_memory - base_memory > 5:
        print(f"   🟡 MODERATE: Memory growth 5-10GB")
        print(f"   💡 Normal for model loading, but monitor chunk processing")
    else:
        print(f"   🟢 GOOD: Memory growth < 5GB")
        print(f"   💡 Initialization memory usage is reasonable")
    print(f"\n🎯 KEY INSIGHTS:")
    if import_growth > 1:
        print(f"   - Import growth: {import_growth:.2f}GB (fixed with lazy loading)")
    if processor_growth > 10:
        print(f"   - Processor init: {processor_growth:.2f}GB (investigate model pre-loading)")
    print(f"\n💡 RECOMMENDATIONS:")
    if after_free_memory - base_memory > 15:
        print(f"   1. Reduce chunk_size to 200-300 frames")
        print(f"   2. Use smaller models (yolov8n instead of yolov8m)")
        print(f"   3. Enable FP16 mode for SAM2")
    elif after_free_memory - base_memory > 8:
        print(f"   1. Monitor chunk processing carefully")
        print(f"   2. Use streaming merge (should be automatic)")
        print(f"   3. Current settings may be acceptable")
    else:
        print(f"   1. Settings look good for initialization")
        print(f"   2. Focus on chunk processing memory leaks")
 def main():
    if len(sys.argv) != 2:
        print("Usage: python debug_memory_leak.py <config.yaml>")
        print("This simulates initialization to find memory leaks")
        sys.exit(1)
    config_path = sys.argv[1]
    if not Path(config_path).exists():
        print(f"Config file not found: {config_path}")
        sys.exit(1)
    simulate_chunk_processing()
 if __name__ == "__main__":
    main()
--- a/memory_profiler_script.py
+++ b/memory_profiler_script.py
@@ -1,249 +0,0 @@
 #!/usr/bin/env python3
 """
 Memory profiling script for VR180 matting pipeline
 Tracks memory usage during processing to identify leaks
 """
 import sys
 import time
 import psutil
 import tracemalloc
 import subprocess
 import gc
 from pathlib import Path
 from typing import Dict, List, Tuple
 import threading
 import json
 class MemoryProfiler:
    def __init__(self, output_file: str = "memory_profile.json"):
        self.output_file = output_file
        self.data = []
        self.process = psutil.Process()
        self.running = False
        self.thread = None
        self.checkpoint_counter = 0
    def start_monitoring(self, interval: float = 1.0):
        """Start continuous memory monitoring"""
        tracemalloc.start()
        self.running = True
        self.thread = threading.Thread(target=self._monitor_loop, args=(interval,))
        self.thread.daemon = True
        self.thread.start()
        print(f"🔍 Memory monitoring started (interval: {interval}s)")
    def stop_monitoring(self):
        """Stop monitoring and save results"""
        self.running = False
        if self.thread:
            self.thread.join()
        # Get tracemalloc snapshot
        snapshot = tracemalloc.take_snapshot()
        top_stats = snapshot.statistics('lineno')
        # Save detailed results
        results = {
            'timeline': self.data,
            'top_memory_allocations': [
                {
                    'file': stat.traceback.format()[0],
                    'size_mb': stat.size / 1024 / 1024,
                    'count': stat.count
                }
                for stat in top_stats[:20]  # Top 20 allocations
            ],
            'summary': {
                'peak_rss_gb': max([d['rss_gb'] for d in self.data]) if self.data else 0,
                'peak_vram_gb': max([d['vram_gb'] for d in self.data]) if self.data else 0,
                'total_samples': len(self.data)
            }
        }
        with open(self.output_file, 'w') as f:
            json.dump(results, f, indent=2)
        tracemalloc.stop()
        print(f"📊 Memory profile saved to {self.output_file}")
    def _monitor_loop(self, interval: float):
        """Continuous monitoring loop"""
        while self.running:
            try:
                # System memory
                memory_info = self.process.memory_info()
                rss_gb = memory_info.rss / (1024**3)
                # System-wide memory
                sys_memory = psutil.virtual_memory()
                sys_used_gb = (sys_memory.total - sys_memory.available) / (1024**3)
                sys_available_gb = sys_memory.available / (1024**3)
                # GPU memory (if available)
                vram_gb = 0
                vram_free_gb = 0
                try:
                    result = subprocess.run(['nvidia-smi', '--query-gpu=memory.used,memory.free', 
                                           '--format=csv,noheader,nounits'], 
                                          capture_output=True, text=True, timeout=5)
                    if result.returncode == 0:
                        lines = result.stdout.strip().split('\n')
                        if lines and lines[0]:
                            used, free = lines[0].split(', ')
                            vram_gb = float(used) / 1024
                            vram_free_gb = float(free) / 1024
                except Exception:
                    pass
                # Tracemalloc current usage
                try:
                    current, peak = tracemalloc.get_traced_memory()
                    traced_mb = current / (1024**2)
                except Exception:
                    traced_mb = 0
                data_point = {
                    'timestamp': time.time(),
                    'rss_gb': rss_gb,
                    'vram_gb': vram_gb,
                    'vram_free_gb': vram_free_gb,
                    'sys_used_gb': sys_used_gb,
                    'sys_available_gb': sys_available_gb,
                    'traced_mb': traced_mb
                }
                self.data.append(data_point)
                # Print periodic updates and save partial data
                if len(self.data) % 10 == 0:  # Every 10 samples
                    print(f"🔍 Memory: RSS={rss_gb:.2f}GB, VRAM={vram_gb:.2f}GB, Sys={sys_used_gb:.1f}GB")
                    # Save partial data every 30 samples in case of crash
                    if len(self.data) % 30 == 0:
                        self._save_partial_data()
            except Exception as e:
                print(f"Monitoring error: {e}")
            time.sleep(interval)
    def _save_partial_data(self):
        """Save partial data to prevent loss on crash"""
        try:
            partial_file = f"memory_profile_partial_{self.checkpoint_counter}.json"
            with open(partial_file, 'w') as f:
                json.dump({
                    'timeline': self.data,
                    'status': 'partial_save',
                    'samples': len(self.data)
                }, f, indent=2)
            self.checkpoint_counter += 1
        except Exception as e:
            print(f"Failed to save partial data: {e}")
    def log_checkpoint(self, checkpoint_name: str):
        """Log a specific checkpoint"""
        if self.data:
            self.data[-1]['checkpoint'] = checkpoint_name
            latest = self.data[-1]
            print(f"📍 CHECKPOINT [{checkpoint_name}]: RSS={latest['rss_gb']:.2f}GB, VRAM={latest['vram_gb']:.2f}GB")
            # Save checkpoint data immediately
            self._save_partial_data()
 def run_with_profiling(config_path: str):
    """Run the VR180 matting with memory profiling"""
    profiler = MemoryProfiler("memory_profile_detailed.json")
    try:
        # Start monitoring
        profiler.start_monitoring(interval=2.0)  # Sample every 2 seconds
        # Log initial state
        profiler.log_checkpoint("STARTUP")
        # Import after starting profiler to catch import memory usage
        print("Importing VR180 processor...")
        from vr180_matting.vr180_processor import VR180Processor
        from vr180_matting.config import VR180Config
        profiler.log_checkpoint("IMPORTS_COMPLETE")
        # Load config
        print(f"Loading config from {config_path}")
        config = VR180Config.from_yaml(config_path)
        profiler.log_checkpoint("CONFIG_LOADED")
        # Initialize processor
        print("Initializing VR180 processor...")
        processor = VR180Processor(config)
        profiler.log_checkpoint("PROCESSOR_INITIALIZED")
        # Force garbage collection
        gc.collect()
        profiler.log_checkpoint("INITIAL_GC_COMPLETE")
        # Run processing
        print("Starting VR180 processing...")
        processor.process_video()
        profiler.log_checkpoint("PROCESSING_COMPLETE")
    except Exception as e:
        print(f"❌ Error during processing: {e}")
        profiler.log_checkpoint(f"ERROR: {str(e)}")
        raise
    finally:
        # Stop monitoring and save results
        profiler.stop_monitoring()
        # Print summary
        print("\n" + "="*60)
        print("MEMORY PROFILING SUMMARY")
        print("="*60)
        if profiler.data:
            peak_rss = max([d['rss_gb'] for d in profiler.data])
            peak_vram = max([d['vram_gb'] for d in profiler.data])
            print(f"Peak RSS Memory: {peak_rss:.2f} GB")
            print(f"Peak VRAM Usage: {peak_vram:.2f} GB")
            print(f"Total Samples: {len(profiler.data)}")
            # Show checkpoints
            checkpoints = [d for d in profiler.data if 'checkpoint' in d]
            if checkpoints:
                print(f"\nCheckpoints ({len(checkpoints)}):")
                for cp in checkpoints:
                    print(f"  {cp['checkpoint']}: RSS={cp['rss_gb']:.2f}GB, VRAM={cp['vram_gb']:.2f}GB")
        print(f"\nDetailed profile saved to: {profiler.output_file}")
 def main():
    if len(sys.argv) != 2:
        print("Usage: python memory_profiler_script.py <config.yaml>")
        print("\nThis script runs VR180 matting with detailed memory profiling")
        print("It will:")
        print("- Monitor RSS, VRAM, and system memory every 2 seconds")
        print("- Track memory allocations with tracemalloc")
        print("- Log checkpoints at key processing stages")
        print("- Save detailed JSON report for analysis")
        sys.exit(1)
    config_path = sys.argv[1]
    if not Path(config_path).exists():
        print(f"❌ Config file not found: {config_path}")
        sys.exit(1)
    print("🚀 Starting VR180 Memory Profiling")
    print(f"Config: {config_path}")
    print("="*60)
    run_with_profiling(config_path)
 if __name__ == "__main__":
    main()
--- a/requirements.txt
+++ b/requirements.txt
@@ -12,4 +12,4 @@ ffmpeg-python>=0.2.0
 decord>=0.6.0
 # GPU acceleration (optional but recommended for stereo validation speedup)
 # cupy-cuda11x>=12.0.0  # For CUDA 11.x
-# cupy-cuda12x>=12.0.0  # For CUDA 12.x - uncomment appropriate version
+cupy-cuda12x>=12.0.0  # For CUDA 12.x (most common on modern systems)
--- a/runpod_setup.sh
+++ b/runpod_setup.sh
@@ -1,113 +1,257 @@
 #!/bin/bash
-# RunPod Quick Setup Script
+# VR180 Matting Unified Setup Script for RunPod
 # Supports both chunked and streaming implementations
 # Optimized for L40, A6000, and other NVENC-capable GPUs
-echo "🚀 Setting up VR180 Matting on RunPod..."
+set -e  # Exit on error
 echo "🚀 VR180 Matting Setup for RunPod"
 echo "=================================="
 echo "GPU: $(nvidia-smi --query-gpu=name --format=csv,noheader)"
 echo "VRAM: $(nvidia-smi --query-gpu=memory.total --format=csv,noheader)"
 echo ""
 # Function to print colored output
 print_status() {
    echo -e "\n\033[1;34m$1\033[0m"
 }
 print_success() {
    echo -e "\033[1;32m✅ $1\033[0m"
 }
 print_error() {
    echo -e "\033[1;31m❌ $1\033[0m"
 }
 # Check if running on RunPod
 if [ -d "/workspace" ]; then
    print_status "Detected RunPod environment"
    WORKSPACE="/workspace"
 else
    print_status "Not on RunPod - using current directory"
    WORKSPACE="$(pwd)"
 fi
 # Update system
-echo "📦 Installing system dependencies..."
+print_status "Installing system dependencies..."
-apt-get update && apt-get install -y ffmpeg git wget nano
+apt-get update && apt-get install -y \
    ffmpeg \
    git \
    wget \
    nano \
    vim \
    htop \
    nvtop \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    libgomp1 || print_error "Failed to install some packages"
 # Install Python dependencies
-echo "🐍 Installing Python dependencies..."
+print_status "Installing Python dependencies..."
 pip install --upgrade pip
 pip install -r requirements.txt
 # Install decord for SAM2 video loading
-echo "📹 Installing decord for video processing..."
+print_status "Installing video processing libraries..."
-pip install decord
+pip install decord ffmpeg-python
-# Install CuPy for GPU acceleration of stereo validation
+# Install CuPy for GPU acceleration (CUDA 12 is standard on modern RunPod)
-echo "🚀 Installing CuPy for GPU acceleration..."
+print_status "Installing CuPy for GPU acceleration..."
-# Auto-detect CUDA version and install appropriate CuPy
+if command -v nvidia-smi &> /dev/null; then
-python -c "
+    print_status "Installing CuPy for CUDA 12.x (standard on RunPod)..."
-import torch
+    pip install cupy-cuda12x>=12.0.0 && print_success "Installed CuPy for CUDA 12.x"
-if torch.cuda.is_available():
+else
-    cuda_version = torch.version.cuda
+    print_error "NVIDIA GPU not detected, skipping CuPy installation"
    print(f'CUDA version detected: {cuda_version}')
    if cuda_version.startswith('11.'):
        import subprocess
        subprocess.run(['pip', 'install', 'cupy-cuda11x>=12.0.0'])
        print('Installed CuPy for CUDA 11.x')
    elif cuda_version.startswith('12.'):
        import subprocess
        subprocess.run(['pip', 'install', 'cupy-cuda12x>=12.0.0'])
        print('Installed CuPy for CUDA 12.x')
    else:
        print(f'Unsupported CUDA version: {cuda_version}')
 else:
    print('CUDA not available, skipping CuPy installation')
 "
 # Install SAM2 separately (not on PyPI)
 echo "🎯 Installing SAM2..."
 pip install git+https://github.com/facebookresearch/segment-anything-2.git
 # Install project
 echo "📦 Installing VR180 matting package..."
 pip install -e .
 # Download models
 echo "📥 Downloading models..."
 mkdir -p models
 # Download YOLOv8 models
 python -c "from ultralytics import YOLO; YOLO('yolov8n.pt'); YOLO('yolov8m.pt')"
 # Clone SAM2 repo for checkpoints
 echo "📥 Cloning SAM2 for model checkpoints..."
 if [ ! -d "segment-anything-2" ]; then
    git clone https://github.com/facebookresearch/segment-anything-2.git
 fi
-# Download SAM2 checkpoints using their official script
+# Clone and install SAM2
 print_status "Installing Segment Anything 2..."
 if [ ! -d "segment-anything-2" ]; then
    git clone https://github.com/facebookresearch/segment-anything-2.git
    cd segment-anything-2
    pip install -e .
    cd ..
 else
    print_status "SAM2 already cloned, updating..."
    cd segment-anything-2
    git pull
    pip install -e . --upgrade
    cd ..
 fi
 # Fix PyTorch version conflicts after SAM2 installation
 print_status "Fixing PyTorch version conflicts..."
 pip install torchaudio --upgrade --no-deps || print_error "Failed to upgrade torchaudio"
 # Download SAM2 checkpoints
 print_status "Downloading SAM2 checkpoints..."
 cd segment-anything-2/checkpoints
 if [ ! -f "sam2.1_hiera_large.pt" ]; then
    echo "📥 Downloading SAM2 checkpoints..."
    chmod +x download_ckpts.sh
-    bash download_ckpts.sh
+    bash download_ckpts.sh || {
        print_error "Automatic download failed, trying manual download..."
        wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt
    }
 fi
 cd ../..
 # Download YOLOv8 models
 print_status "Downloading YOLO models..."
 python -c "from ultralytics import YOLO; YOLO('yolov8n.pt'); print('✅ YOLOv8n downloaded')"
 python -c "from ultralytics import YOLO; YOLO('yolov8m.pt'); print('✅ YOLOv8m downloaded')"
 # Create working directories
-mkdir -p /workspace/data /workspace/output
+print_status "Creating directory structure..."
 mkdir -p $WORKSPACE/sam2e/{input,output,checkpoints}
 mkdir -p /workspace/data /workspace/output  # RunPod standard dirs
 cd $WORKSPACE/sam2e
 # Create example configs if they don't exist
 print_status "Creating example configuration files..."
 # Chunked approach config
 if [ ! -f "config-chunked-runpod.yaml" ]; then
    print_status "Creating chunked approach config..."
    cat > config-chunked-runpod.yaml << 'EOF'
 # VR180 Matting - Chunked Approach (Original)
 input:
  video_path: "/workspace/data/input_video.mp4"
 processing:
  scale_factor: 0.5  # 0.5 for 8K input = 4K processing
  chunk_size: 600    # Larger chunks for cloud GPU
  overlap_frames: 60 # Overlap between chunks
 detection:
  confidence_threshold: 0.7
  model: "yolov8n"
 matting:
  use_disparity_mapping: true
  memory_offload: true
  fp16: true
  sam2_model_cfg: "sam2.1_hiera_l"
  sam2_checkpoint: "segment-anything-2/checkpoints/sam2.1_hiera_large.pt"
 output:
  path: "/workspace/output/output_video.mp4"
  format: "greenscreen"  # or "alpha"
  background_color: [0, 255, 0]
  maintain_sbs: true
 hardware:
  device: "cuda"
  max_vram_gb: 40  # Conservative for 48GB GPU
 EOF
    print_success "Created config-chunked-runpod.yaml"
 fi
 # Streaming approach config already exists
 if [ ! -f "config-streaming-runpod.yaml" ]; then
    print_error "config-streaming-runpod.yaml not found - please check the repository"
 fi
 # Skip creating convenience scripts - use Python directly
 # Test installation
-echo ""
+print_status "Testing installation..."
-echo "🧪 Testing installation..."
+python -c "
-python test_installation.py
+import sys
 print('Python:', sys.version)
 try:
    import torch
    print(f'✅ PyTorch: {torch.__version__}')
    print(f'   CUDA available: {torch.cuda.is_available()}')
    if torch.cuda.is_available():
        print(f'   GPU: {torch.cuda.get_device_name(0)}')
        print(f'   VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB')
 except: print('❌ PyTorch not available')
 try:
    import cv2
    print(f'✅ OpenCV: {cv2.__version__}')
 except: print('❌ OpenCV not available')
 try:
    from ultralytics import YOLO
    print('✅ YOLO available')
 except: print('❌ YOLO not available')
 try:
    import yaml, numpy, psutil
    print('✅ Other dependencies available')
 except: print('❌ Some dependencies missing')
 "
 # Run streaming test if available
 if [ -f "test_streaming.py" ]; then
    print_status "Running streaming implementation test..."
    python test_streaming.py || print_error "Streaming test failed"
 fi
 # Check which SAM2 models are available
-echo ""
+print_status "SAM2 Models available:"
 echo "📊 SAM2 Models available:"
 if [ -f "segment-anything-2/checkpoints/sam2.1_hiera_large.pt" ]; then
-    echo "  ✅ sam2.1_hiera_large.pt (recommended)"
+    print_success "sam2.1_hiera_large.pt (recommended for quality)"
    echo "     Config: sam2_model_cfg: 'sam2.1_hiera_l'"
    echo "     Checkpoint: sam2_checkpoint: 'segment-anything-2/checkpoints/sam2.1_hiera_large.pt'"
 fi
 if [ -f "segment-anything-2/checkpoints/sam2.1_hiera_base_plus.pt" ]; then
-    echo "  ✅ sam2.1_hiera_base_plus.pt"
+    print_success "sam2.1_hiera_base_plus.pt (balanced)"
-    echo "     Config: sam2_model_cfg: 'sam2.1_hiera_base_plus'"
+    echo "     Config: sam2_model_cfg: 'sam2.1_hiera_b+'"
 fi
-if [ -f "segment-anything-2/checkpoints/sam2_hiera_large.pt" ]; then
+if [ -f "segment-anything-2/checkpoints/sam2.1_hiera_small.pt" ]; then
-    echo "  ✅ sam2_hiera_large.pt (legacy)"
+    print_success "sam2.1_hiera_small.pt (fast)"
-    echo "     Config: sam2_model_cfg: 'sam2_hiera_l'"
+    echo "     Config: sam2_model_cfg: 'sam2.1_hiera_s'"
 fi
-echo ""
+# Print usage instructions
-echo "✅ Setup complete!"
+print_success "Setup complete!"
-echo ""
+echo
-echo "📝 Quick start:"
+echo "📋 Usage Instructions:"
-echo "1. Upload your VR180 video to /workspace/data/"
+echo "====================="
-echo "   wget -O /workspace/data/video.mp4 'your-video-url'"
+echo
-echo ""
+echo "1. Upload your VR180 video:"
-echo "2. Use the RunPod optimized config:"
+echo "   wget -O /workspace/data/input_video.mp4 'your-video-url'"
-echo "   cp config_runpod.yaml config.yaml"
+echo "   # Or use RunPod's file upload feature"
-echo "   nano config.yaml  # Update video path"
+echo
-echo ""
+echo "2. Choose your processing approach:"
-echo "3. Run the matting:"
+echo
-echo "   vr180-matting config.yaml"
+echo "   a) STREAMING (Recommended - 2-3x faster, constant memory):"
-echo ""
+echo "      python -m vr180_streaming config-streaming-runpod.yaml"
-echo "💡 For A40 GPU, you can use higher quality settings:"
+echo
-echo "   vr180-matting config.yaml --scale 0.75"
+echo "   b) CHUNKED (Original - more stable, higher memory):"
 echo "      python -m vr180_matting config-chunked-runpod.yaml"
 echo
 echo "3. Optional: Edit configs first:"
 echo "   nano config-streaming-runpod.yaml  # For streaming"
 echo "   nano config-chunked-runpod.yaml    # For chunked"
 echo
 echo "4. Monitor progress:"
 echo "   - GPU usage: nvtop"
 echo "   - System resources: htop"
 echo "   - Output directory: ls -la /workspace/output/"
 echo
 echo "📊 Performance Tips:"
 echo "==================="
 echo "- Streaming: Best for long videos, uses ~50GB RAM constant"
 echo "- Chunked: More stable but uses 100GB+ RAM in spikes"
 echo "- Scale factor: 0.25 (fast) → 0.5 (balanced) → 1.0 (quality)"
 echo "- L40/A6000: Can handle 0.5-0.75 scale easily with NVENC GPU encoding"
 echo "- Monitor VRAM with: nvidia-smi -l 1"
 echo
 echo "🎯 Example Commands:"
 echo "==================="
 echo "# Process with custom output path:"
 echo "python -m vr180_streaming config-streaming-runpod.yaml --output /workspace/output/my_video.mp4"
 echo
 echo "# Process specific frame range:"
 echo "python -m vr180_streaming config-streaming-runpod.yaml --start-frame 1000 --max-frames 5000"
 echo
 echo "# Override scale for quality:"
 echo "python -m vr180_streaming config-streaming-runpod.yaml --scale 0.75"
 echo
 echo "Happy matting! 🎬"
--- a/test_streaming.py
+++ b/test_streaming.py
@@ -0,0 +1,142 @@
 #!/usr/bin/env python3
 """
 Test script to verify streaming implementation components
 """
 import sys
 from pathlib import Path
 def test_imports():
    """Test that all modules can be imported"""
    print("Testing imports...")
    try:
        from vr180_streaming import VR180StreamingProcessor, StreamingConfig
        print("✅ Main imports successful")
    except ImportError as e:
        print(f"❌ Failed to import main modules: {e}")
        return False
    try:
        from vr180_streaming.frame_reader import StreamingFrameReader
        from vr180_streaming.frame_writer import StreamingFrameWriter
        from vr180_streaming.stereo_manager import StereoConsistencyManager
        from vr180_streaming.sam2_streaming import SAM2StreamingProcessor
        from vr180_streaming.detector import PersonDetector
        print("✅ Component imports successful")
    except ImportError as e:
        print(f"❌ Failed to import components: {e}")
        return False
    return True
 def test_config():
    """Test configuration loading"""
    print("\nTesting configuration...")
    try:
        from vr180_streaming.config import StreamingConfig
        # Test creating config
        config = StreamingConfig()
        print("✅ Config creation successful")
        # Test config validation
        errors = config.validate()
        print(f"   Config errors: {len(errors)} (expected, no paths set)")
        return True
    except Exception as e:
        print(f"❌ Config test failed: {e}")
        return False
 def test_dependencies():
    """Test required dependencies"""
    print("\nTesting dependencies...")
    deps_ok = True
    # Test PyTorch
    try:
        import torch
        print(f"✅ PyTorch {torch.__version__}")
        if torch.cuda.is_available():
            print(f"   CUDA available: {torch.cuda.get_device_name(0)}")
        else:
            print("   ⚠️  CUDA not available")
    except ImportError:
        print("❌ PyTorch not installed")
        deps_ok = False
    # Test OpenCV
    try:
        import cv2
        print(f"✅ OpenCV {cv2.__version__}")
    except ImportError:
        print("❌ OpenCV not installed")
        deps_ok = False
    # Test Ultralytics
    try:
        from ultralytics import YOLO
        print("✅ Ultralytics YOLO available")
    except ImportError:
        print("❌ Ultralytics not installed")
        deps_ok = False
    # Test other deps
    try:
        import yaml
        import numpy as np
        import psutil
        print("✅ Other dependencies available")
    except ImportError as e:
        print(f"❌ Missing dependency: {e}")
        deps_ok = False
    return deps_ok
 def test_frame_reader():
    """Test frame reader with a dummy video"""
    print("\nTesting StreamingFrameReader...")
    try:
        from vr180_streaming.frame_reader import StreamingFrameReader
        # Would need an actual video file to test
        print("⚠️  Skipping reader test (no test video)")
        return True
    except Exception as e:
        print(f"❌ Frame reader test failed: {e}")
        return False
 def main():
    """Run all tests"""
    print("🧪 VR180 Streaming Implementation Test")
    print("=" * 40)
    all_ok = True
    # Run tests
    all_ok &= test_imports()
    all_ok &= test_config()
    all_ok &= test_dependencies()
    all_ok &= test_frame_reader()
    print("\n" + "=" * 40)
    if all_ok:
        print("✅ All tests passed!")
        print("\nNext steps:")
        print("1. Install SAM2: cd segment-anything-2 && pip install -e .")
        print("2. Download checkpoints: cd checkpoints && ./download_ckpts.sh")
        print("3. Create config: python -m vr180_streaming --generate-config my_config.yaml")
        print("4. Run processing: python -m vr180_streaming my_config.yaml")
    else:
        print("❌ Some tests failed")
        print("\nPlease run: pip install -r requirements.txt")
    return 0 if all_ok else 1
 if __name__ == "__main__":
    sys.exit(main())
--- a/vr180_streaming/README.md
+++ b/vr180_streaming/README.md
@@ -0,0 +1,172 @@
 # VR180 Streaming Matting
 True streaming implementation for VR180 human matting with constant memory usage.
 ## Key Features
 - **True Streaming**: Process frames one at a time without accumulation
 - **Constant Memory**: No memory buildup regardless of video length  
 - **Stereo Consistency**: Master-slave processing ensures matched detection
 - **2-3x Faster**: Eliminates chunking overhead from original implementation
 - **Direct FFmpeg Pipe**: Zero-copy frame writing
 ## Architecture
 ```
 Input Video → Frame Reader → SAM2 Streaming → Frame Writer → Output Video
     ↓             ↓              ↓                ↓
  (no chunks)  (one frame)   (propagate)    (immediate write)
 ```
 ### Components
 1. **StreamingFrameReader** (`frame_reader.py`)
   - Reads frames one at a time
   - Supports seeking for resume/recovery
   - Constant memory footprint
 2. **StreamingFrameWriter** (`frame_writer.py`)
   - Direct pipe to ffmpeg encoder
   - GPU-accelerated encoding (H.264/H.265)
   - Preserves audio from source
 3. **StereoConsistencyManager** (`stereo_manager.py`)
   - Master-slave eye processing
   - Disparity-aware detection transfer
   - Automatic consistency validation
 4. **SAM2StreamingProcessor** (`sam2_streaming.py`)
   - Integrates with SAM2's native video predictor
   - Memory-efficient state management
   - Continuous correction support
 5. **VR180StreamingProcessor** (`streaming_processor.py`)
   - Main orchestrator
   - Adaptive GPU scaling
   - Checkpoint/resume support
 ## Usage
 ### Quick Start
 ```bash
 # Generate example config
 python -m vr180_streaming --generate-config my_config.yaml
 # Edit config with your paths
 vim my_config.yaml
 # Run processing
 python -m vr180_streaming my_config.yaml
 ```
 ### Command Line Options
 ```bash
 # Override output path
 python -m vr180_streaming config.yaml --output /path/to/output.mp4
 # Process specific frame range
 python -m vr180_streaming config.yaml --start-frame 1000 --max-frames 5000
 # Override scale factor
 python -m vr180_streaming config.yaml --scale 0.25
 # Dry run to validate config
 python -m vr180_streaming config.yaml --dry-run
 ```
 ## Configuration
 Key configuration options:
 ```yaml
 streaming:
  mode: true  # Enable streaming mode
  buffer_frames: 10  # Lookahead buffer
 processing:
  scale_factor: 0.5  # Resolution scaling
  adaptive_scaling: true  # Dynamic GPU optimization
 stereo:
  mode: "master_slave"  # Stereo consistency mode
  master_eye: "left"  # Which eye leads detection
 recovery:
  enable_checkpoints: true  # Save progress
  auto_resume: true  # Resume from checkpoint
 ```
 ## Performance
 Compared to chunked implementation:
 | Metric | Chunked | Streaming | Improvement |
 |--------|---------|-----------|-------------|
 | Speed | ~0.54s/frame | ~0.18s/frame | 3x faster |
 | Memory | 100GB+ peak | <50GB constant | 2x lower |
 | VRAM | 2.5% usage | 70%+ usage | 28x better |
 | Consistency | Variable | Guaranteed | ✓ |
 ## Requirements
 - Python 3.10+
 - PyTorch 2.0+
 - CUDA GPU (8GB+ VRAM recommended)
 - FFmpeg with GPU encoding support
 - SAM2 (segment-anything-2)
 ## Troubleshooting
 ### Out of Memory
 - Reduce `scale_factor` in config
 - Enable `adaptive_scaling`
 - Ensure `memory_offload: true`
 ### Stereo Mismatch
 - Adjust `consistency_threshold`
 - Enable `disparity_correction`
 - Check `baseline` and `focal_length` settings
 ### Slow Processing
 - Use GPU video codec (`h264_nvenc`)
 - Reduce `correction_interval`
 - Lower output quality (`crf: 23`)
 ## Advanced Features
 ### Adaptive Scaling
 Automatically adjusts processing resolution based on GPU load:
 ```yaml
 processing:
  adaptive_scaling: true
  target_gpu_usage: 0.7
  min_scale: 0.25
  max_scale: 1.0
 ```
 ### Continuous Correction
 Periodically refines tracking for long videos:
 ```yaml
 matting:
  continuous_correction: true
  correction_interval: 300  # Every 5 seconds at 60fps
 ```
 ### Checkpoint Recovery
 Automatically resume from interruptions:
 ```yaml
 recovery:
  enable_checkpoints: true
  checkpoint_interval: 1000
  auto_resume: true
 ```
 ## Contributing
 Please ensure your code follows the streaming architecture principles:
 - No frame accumulation in memory
 - Immediate processing and writing
 - Proper resource cleanup
 - Checkpoint support for long videos
--- a/vr180_streaming/init.py
+++ b/vr180_streaming/init.py
@@ -0,0 +1,8 @@
 """VR180 Streaming Matting - True streaming implementation for constant memory usage"""
 __version__ = "0.1.0"
 from .streaming_processor import VR180StreamingProcessor
 from .config import StreamingConfig
 __all__ = ["VR180StreamingProcessor", "StreamingConfig"]
--- a/vr180_streaming/main.py
+++ b/vr180_streaming/main.py
@@ -0,0 +1,9 @@
 #!/usr/bin/env python3
 """
 VR180 Streaming entry point for python -m vr180_streaming
 """
 from .main import main
 if __name__ == "__main__":
    main()
--- a/vr180_streaming/config.py
+++ b/vr180_streaming/config.py
@@ -0,0 +1,242 @@
 """
 Configuration management for VR180 streaming
 """
 import yaml
 from pathlib import Path
 from typing import Dict, Any, List, Optional
 from dataclasses import dataclass, field
@dataclass
 class InputConfig:
    video_path: str
    start_frame: int = 0
    max_frames: Optional[int] = None
@dataclass
 class StreamingOptions:
    mode: bool = True
    buffer_frames: int = 10
    write_interval: int = 1  # Write every N frames
@dataclass
 class ProcessingConfig:
    scale_factor: float = 0.5
    adaptive_scaling: bool = True
    target_gpu_usage: float = 0.7
    min_scale: float = 0.25
    max_scale: float = 1.0
@dataclass
 class DetectionConfig:
    confidence_threshold: float = 0.7
    model: str = "yolov8n"
    device: str = "cuda"
@dataclass
 class MattingConfig:
    sam2_model_cfg: str = "sam2.1_hiera_l"
    sam2_checkpoint: str = "segment-anything-2/checkpoints/sam2.1_hiera_large.pt"
    memory_offload: bool = True
    fp16: bool = True
    continuous_correction: bool = True
    correction_interval: int = 300
@dataclass
 class StereoConfig:
    mode: str = "master_slave"  # "master_slave", "independent", "joint"
    master_eye: str = "left"
    disparity_correction: bool = True
    consistency_threshold: float = 0.3
    baseline: float = 65.0  # mm
    focal_length: float = 1000.0  # pixels
@dataclass
 class OutputConfig:
    path: str
    format: str = "greenscreen"  # "alpha" or "greenscreen"
    background_color: List[int] = field(default_factory=lambda: [0, 255, 0])
    video_codec: str = "h264_nvenc"
    quality_preset: str = "p4"
    crf: int = 18
    maintain_sbs: bool = True
@dataclass
 class HardwareConfig:
    device: str = "cuda"
    max_vram_gb: float = 40.0
    max_ram_gb: float = 48.0
@dataclass 
 class RecoveryConfig:
    enable_checkpoints: bool = True
    checkpoint_interval: int = 1000
    auto_resume: bool = True
    checkpoint_dir: str = "./checkpoints"
@dataclass
 class PerformanceConfig:
    profile_enabled: bool = True
    log_interval: int = 100
    memory_monitor: bool = True
 class StreamingConfig:
    """Complete configuration for VR180 streaming processing"""
    def __init__(self):
        self.input = InputConfig("")
        self.streaming = StreamingOptions()
        self.processing = ProcessingConfig()
        self.detection = DetectionConfig()
        self.matting = MattingConfig()
        self.stereo = StereoConfig()
        self.output = OutputConfig("")
        self.hardware = HardwareConfig()
        self.recovery = RecoveryConfig()
        self.performance = PerformanceConfig()
    @classmethod
    def from_yaml(cls, yaml_path: str) -> 'StreamingConfig':
        """Load configuration from YAML file"""
        config = cls()
        with open(yaml_path, 'r') as f:
            data = yaml.safe_load(f)
        # Update each section
        if 'input' in data:
            config.input = InputConfig(**data['input'])
        if 'streaming' in data:
            config.streaming = StreamingOptions(**data['streaming'])
        if 'processing' in data:
            for key, value in data['processing'].items():
                setattr(config.processing, key, value)
        if 'detection' in data:
            config.detection = DetectionConfig(**data['detection'])
        if 'matting' in data:
            config.matting = MattingConfig(**data['matting'])
        if 'stereo' in data:
            config.stereo = StereoConfig(**data['stereo'])
        if 'output' in data:
            config.output = OutputConfig(**data['output'])
        if 'hardware' in data:
            config.hardware = HardwareConfig(**data['hardware'])
        if 'recovery' in data:
            config.recovery = RecoveryConfig(**data['recovery'])
        if 'performance' in data:
            for key, value in data['performance'].items():
                setattr(config.performance, key, value)
        return config
    def to_dict(self) -> Dict[str, Any]:
        """Convert configuration to dictionary"""
        return {
            'input': {
                'video_path': self.input.video_path,
                'start_frame': self.input.start_frame,
                'max_frames': self.input.max_frames
            },
            'streaming': {
                'mode': self.streaming.mode,
                'buffer_frames': self.streaming.buffer_frames,
                'write_interval': self.streaming.write_interval
            },
            'processing': {
                'scale_factor': self.processing.scale_factor,
                'adaptive_scaling': self.processing.adaptive_scaling,
                'target_gpu_usage': self.processing.target_gpu_usage,
                'min_scale': self.processing.min_scale,
                'max_scale': self.processing.max_scale
            },
            'detection': {
                'confidence_threshold': self.detection.confidence_threshold,
                'model': self.detection.model,
                'device': self.detection.device
            },
            'matting': {
                'sam2_model_cfg': self.matting.sam2_model_cfg,
                'sam2_checkpoint': self.matting.sam2_checkpoint,
                'memory_offload': self.matting.memory_offload,
                'fp16': self.matting.fp16,
                'continuous_correction': self.matting.continuous_correction,
                'correction_interval': self.matting.correction_interval
            },
            'stereo': {
                'mode': self.stereo.mode,
                'master_eye': self.stereo.master_eye,
                'disparity_correction': self.stereo.disparity_correction,
                'consistency_threshold': self.stereo.consistency_threshold,
                'baseline': self.stereo.baseline,
                'focal_length': self.stereo.focal_length
            },
            'output': {
                'path': self.output.path,
                'format': self.output.format,
                'background_color': self.output.background_color,
                'video_codec': self.output.video_codec,
                'quality_preset': self.output.quality_preset,
                'crf': self.output.crf,
                'maintain_sbs': self.output.maintain_sbs
            },
            'hardware': {
                'device': self.hardware.device,
                'max_vram_gb': self.hardware.max_vram_gb,
                'max_ram_gb': self.hardware.max_ram_gb
            },
            'recovery': {
                'enable_checkpoints': self.recovery.enable_checkpoints,
                'checkpoint_interval': self.recovery.checkpoint_interval,
                'auto_resume': self.recovery.auto_resume,
                'checkpoint_dir': self.recovery.checkpoint_dir
            },
            'performance': {
                'profile_enabled': self.performance.profile_enabled,
                'log_interval': self.performance.log_interval,
                'memory_monitor': self.performance.memory_monitor
            }
        }
    def validate(self) -> List[str]:
        """Validate configuration and return list of errors"""
        errors = []
        # Check input
        if not self.input.video_path:
            errors.append("Input video path is required")
        elif not Path(self.input.video_path).exists():
            errors.append(f"Input video not found: {self.input.video_path}")
        # Check output
        if not self.output.path:
            errors.append("Output path is required")
        # Check scale factor
        if not 0.1 <= self.processing.scale_factor <= 1.0:
            errors.append("Scale factor must be between 0.1 and 1.0")
        # Check SAM2 checkpoint
        if not Path(self.matting.sam2_checkpoint).exists():
            errors.append(f"SAM2 checkpoint not found: {self.matting.sam2_checkpoint}")
        return errors
--- a/vr180_streaming/detector.py
+++ b/vr180_streaming/detector.py
@@ -0,0 +1,223 @@
 """
 Person detector using YOLOv8 for streaming pipeline
 """
 import numpy as np
 from typing import List, Dict, Any, Optional
 import warnings
 try:
    from ultralytics import YOLO
 except ImportError:
    warnings.warn("Ultralytics YOLO not installed. Please install with: pip install ultralytics")
    YOLO = None
 class PersonDetector:
    """YOLO-based person detector for VR180 streaming"""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.confidence_threshold = config.get('detection', {}).get('confidence_threshold', 0.7)
        self.model_name = config.get('detection', {}).get('model', 'yolov8n')
        self.device = config.get('detection', {}).get('device', 'cuda')
        self.model = None
        self._load_model()
        # Statistics
        self.stats = {
            'frames_processed': 0,
            'total_detections': 0,
            'avg_detections_per_frame': 0.0
        }
    def _load_model(self) -> None:
        """Load YOLO model"""
        if YOLO is None:
            raise RuntimeError("YOLO not available. Please install ultralytics.")
        try:
            # Load pretrained model
            model_file = f"{self.model_name}.pt"
            self.model = YOLO(model_file)
            self.model.to(self.device)
            print(f"🎯 Person detector initialized:")
            print(f"   Model: {self.model_name}")
            print(f"   Device: {self.device}")
            print(f"   Confidence threshold: {self.confidence_threshold}")
        except Exception as e:
            raise RuntimeError(f"Failed to load YOLO model: {e}")
    def detect_persons(self, frame: np.ndarray) -> List[Dict[str, Any]]:
        """
        Detect persons in frame
        Args:
            frame: Input frame (BGR)
        Returns:
            List of detection dictionaries with 'box', 'confidence' keys
        """
        if self.model is None:
            return []
        # Run detection
        results = self.model(frame, verbose=False, conf=self.confidence_threshold)
        detections = []
        for r in results:
            if r.boxes is not None:
                for box in r.boxes:
                    # Check if detection is person (class 0 in COCO)
                    if int(box.cls) == 0:
                        # Get box coordinates
                        x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
                        confidence = float(box.conf)
                        detection = {
                            'box': [int(x1), int(y1), int(x2), int(y2)],
                            'confidence': confidence,
                            'area': (x2 - x1) * (y2 - y1),
                            'center': [(x1 + x2) / 2, (y1 + y2) / 2]
                        }
                        detections.append(detection)
        # Update statistics
        self.stats['frames_processed'] += 1
        self.stats['total_detections'] += len(detections)
        self.stats['avg_detections_per_frame'] = (
            self.stats['total_detections'] / self.stats['frames_processed']
        )
        return detections
    def detect_persons_batch(self, frames: List[np.ndarray]) -> List[List[Dict[str, Any]]]:
        """
        Detect persons in batch of frames
        Args:
            frames: List of frames
        Returns:
            List of detection lists
        """
        if not frames or self.model is None:
            return []
        # Process batch
        results_batch = self.model(frames, verbose=False, conf=self.confidence_threshold)
        all_detections = []
        for results in results_batch:
            frame_detections = []
            if results.boxes is not None:
                for box in results.boxes:
                    if int(box.cls) == 0:  # Person class
                        x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
                        confidence = float(box.conf)
                        detection = {
                            'box': [int(x1), int(y1), int(x2), int(y2)],
                            'confidence': confidence,
                            'area': (x2 - x1) * (y2 - y1),
                            'center': [(x1 + x2) / 2, (y1 + y2) / 2]
                        }
                        frame_detections.append(detection)
            all_detections.append(frame_detections)
        # Update statistics
        self.stats['frames_processed'] += len(frames)
        self.stats['total_detections'] += sum(len(d) for d in all_detections)
        self.stats['avg_detections_per_frame'] = (
            self.stats['total_detections'] / self.stats['frames_processed']
        )
        return all_detections
    def filter_detections(self, 
                         detections: List[Dict[str, Any]], 
                         min_area: Optional[float] = None,
                         max_detections: Optional[int] = None) -> List[Dict[str, Any]]:
        """
        Filter detections based on criteria
        Args:
            detections: List of detections
            min_area: Minimum bounding box area
            max_detections: Maximum number of detections to keep
        Returns:
            Filtered detections
        """
        filtered = detections.copy()
        # Filter by minimum area
        if min_area is not None:
            filtered = [d for d in filtered if d['area'] >= min_area]
        # Sort by confidence and keep top N
        if max_detections is not None and len(filtered) > max_detections:
            filtered = sorted(filtered, key=lambda x: x['confidence'], reverse=True)
            filtered = filtered[:max_detections]
        return filtered
    def convert_to_sam_prompts(self, 
                             detections: List[Dict[str, Any]]) -> tuple:
        """
        Convert detections to SAM2 prompt format
        Args:
            detections: List of detections
        Returns:
            Tuple of (boxes, labels) for SAM2
        """
        if not detections:
            return [], []
        boxes = [d['box'] for d in detections]
        # All detections are positive prompts (label=1)
        labels = [1] * len(detections)
        return boxes, labels
    def get_stats(self) -> Dict[str, Any]:
        """Get detection statistics"""
        return self.stats.copy()
    def reset_stats(self) -> None:
        """Reset statistics"""
        self.stats = {
            'frames_processed': 0,
            'total_detections': 0,
            'avg_detections_per_frame': 0.0
        }
    def warmup(self, input_shape: tuple = (1080, 1920, 3)) -> None:
        """
        Warmup model with dummy inference
        Args:
            input_shape: Shape of input frames
        """
        if self.model is None:
            return
        print("🔥 Warming up detector...")
        dummy_frame = np.zeros(input_shape, dtype=np.uint8)
        _ = self.detect_persons(dummy_frame)
        print("   Detector ready!")
    def set_confidence_threshold(self, threshold: float) -> None:
        """Update confidence threshold"""
        self.confidence_threshold = max(0.1, min(0.99, threshold))
    def __del__(self):
        """Cleanup"""
        self.model = None
--- a/vr180_streaming/frame_reader.py
+++ b/vr180_streaming/frame_reader.py
@@ -0,0 +1,191 @@
 """
 Streaming frame reader for memory-efficient video processing
 """
 import cv2
 import numpy as np
 from pathlib import Path
 from typing import Optional, Dict, Any, Tuple
 class StreamingFrameReader:
    """Read frames one at a time from video file with seeking support"""
    def __init__(self, video_path: str, start_frame: int = 0):
        self.video_path = Path(video_path)
        if not self.video_path.exists():
            raise FileNotFoundError(f"Video file not found: {video_path}")
        self.cap = cv2.VideoCapture(str(self.video_path))
        if not self.cap.isOpened():
            raise RuntimeError(f"Failed to open video: {video_path}")
        # Get video properties
        self.fps = self.cap.get(cv2.CAP_PROP_FPS)
        self.total_frames = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))
        self.width = int(self.cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height = int(self.cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        # Set start position
        self.current_frame_idx = 0
        if start_frame > 0:
            self.seek(start_frame)
        print(f"📹 Streaming reader initialized:")
        print(f"   Video: {self.video_path.name}")
        print(f"   Resolution: {self.width}x{self.height}")
        print(f"   FPS: {self.fps}")
        print(f"   Total frames: {self.total_frames}")
        print(f"   Starting at frame: {start_frame}")
    def read_frame(self) -> Optional[np.ndarray]:
        """
        Read next frame from video
        Returns:
            Frame as numpy array or None if end of video
        """
        ret, frame = self.cap.read()
        if ret:
            self.current_frame_idx += 1
            return frame
        return None
    def seek(self, frame_idx: int) -> bool:
        """
        Seek to specific frame
        Args:
            frame_idx: Target frame index
        Returns:
            True if seek successful
        """
        if 0 <= frame_idx < self.total_frames:
            self.cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
            self.current_frame_idx = frame_idx
            return True
        return False
    def get_video_info(self) -> Dict[str, Any]:
        """Get video metadata"""
        return {
            'width': self.width,
            'height': self.height,
            'fps': self.fps,
            'total_frames': self.total_frames,
            'path': str(self.video_path)
        }
    def get_progress(self) -> float:
        """Get current progress as percentage"""
        if self.total_frames > 0:
            return (self.current_frame_idx / self.total_frames) * 100
        return 0.0
    def reset(self) -> None:
        """Reset to beginning of video"""
        self.seek(0)
    def peek_frame(self) -> Optional[np.ndarray]:
        """
        Peek at next frame without advancing position
        Returns:
            Frame as numpy array or None if end of video
        """
        current_pos = self.current_frame_idx
        frame = self.read_frame()
        if frame is not None:
            # Reset position
            self.seek(current_pos)
        return frame
    def read_frame_at(self, frame_idx: int) -> Optional[np.ndarray]:
        """
        Read frame at specific index without changing current position
        Args:
            frame_idx: Frame index to read
        Returns:
            Frame as numpy array or None if invalid index
        """
        current_pos = self.current_frame_idx
        if self.seek(frame_idx):
            frame = self.read_frame()
            # Restore position
            self.seek(current_pos)
            return frame
        return None
    def get_frame_batch(self, start_idx: int, count: int) -> list[np.ndarray]:
        """
        Read a batch of frames (for initial detection or correction)
        Args:
            start_idx: Starting frame index
            count: Number of frames to read
        Returns:
            List of frames
        """
        current_pos = self.current_frame_idx
        frames = []
        if self.seek(start_idx):
            for i in range(count):
                frame = self.read_frame()
                if frame is None:
                    break
                frames.append(frame)
        # Restore position
        self.seek(current_pos)
        return frames
    def estimate_memory_per_frame(self) -> float:
        """
        Estimate memory usage per frame in MB
        Returns:
            Estimated memory in MB
        """
        # BGR format = 3 channels, uint8 = 1 byte per channel
        bytes_per_frame = self.width * self.height * 3
        return bytes_per_frame / (1024 * 1024)
    def close(self) -> None:
        """Release video capture resources"""
        if self.cap is not None:
            self.cap.release()
            self.cap = None
    def __enter__(self):
        """Context manager support"""
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager cleanup"""
        self.close()
    def __del__(self):
        """Ensure cleanup on deletion"""
        self.close()
    def __len__(self) -> int:
        """Total number of frames"""
        return self.total_frames
    def __iter__(self):
        """Iterator support"""
        self.reset()
        return self
    def __next__(self) -> np.ndarray:
        """Iterator next frame"""
        frame = self.read_frame()
        if frame is None:
            raise StopIteration
        return frame
--- a/vr180_streaming/frame_writer.py
+++ b/vr180_streaming/frame_writer.py
@@ -0,0 +1,336 @@
 """
 Streaming frame writer using ffmpeg pipe for zero-copy output
 """
 import subprocess
 import numpy as np
 from pathlib import Path
 from typing import Optional, Dict, Any
 import signal
 import atexit
 import warnings
 def test_nvenc_support() -> bool:
    """Test if NVENC encoding is available"""
    try:
        # Quick test with a 1-frame video
        cmd = [
            'ffmpeg', '-f', 'lavfi', '-i', 'testsrc=duration=0.1:size=320x240:rate=1',
            '-c:v', 'h264_nvenc', '-t', '0.1', '-f', 'null', '-'
        ]
        result = subprocess.run(
            cmd,
            capture_output=True,
            timeout=10,
            text=True
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
 class StreamingFrameWriter:
    """Write frames directly to ffmpeg via pipe for memory-efficient output"""
    def __init__(self, 
                 output_path: str,
                 width: int,
                 height: int, 
                 fps: float,
                 audio_source: Optional[str] = None,
                 video_codec: str = 'h264_nvenc',
                 quality_preset: str = 'p4',  # NVENC preset
                 crf: int = 18,
                 pixel_format: str = 'bgr24'):
        self.output_path = Path(output_path)
        self.output_path.parent.mkdir(parents=True, exist_ok=True)
        self.width = width
        self.height = height
        self.fps = fps
        self.audio_source = audio_source
        self.pixel_format = pixel_format
        self.frames_written = 0
        self.ffmpeg_process = None
        # Test NVENC support if GPU codec requested
        if video_codec in ['h264_nvenc', 'hevc_nvenc']:
            print(f"🔍 Testing NVENC support...")
            if not test_nvenc_support():
                print(f"❌ NVENC not available, switching to CPU encoding")
                video_codec = 'libx264'
                quality_preset = 'medium'
            else:
                print(f"✅ NVENC available")
        # Build ffmpeg command
        self.ffmpeg_cmd = self._build_ffmpeg_command(
            video_codec, quality_preset, crf
        )
        # Start ffmpeg process
        self._start_ffmpeg()
        # Register cleanup
        atexit.register(self.close)
        print(f"📼 Streaming writer initialized:")
        print(f"   Output: {self.output_path}")
        print(f"   Resolution: {width}x{height} @ {fps}fps")
        print(f"   Codec: {video_codec}")
        print(f"   Audio: {'Yes' if audio_source else 'No'}")
    def _build_ffmpeg_command(self, video_codec: str, preset: str, crf: int) -> list:
        """Build ffmpeg command with optimal settings"""
        cmd = ['ffmpeg', '-y']  # Overwrite output
        # Video input from pipe
        cmd.extend([
            '-f', 'rawvideo',
            '-pix_fmt', self.pixel_format,
            '-s', f'{self.width}x{self.height}',
            '-r', str(self.fps),
            '-i', 'pipe:0'  # Read from stdin
        ])
        # Audio input if provided
        if self.audio_source and Path(self.audio_source).exists():
            cmd.extend(['-i', str(self.audio_source)])
        # Try GPU encoding first, fallback to CPU
        if video_codec == 'h264_nvenc':
            # NVIDIA GPU encoding
            cmd.extend([
                '-c:v', 'h264_nvenc',
                '-preset', preset,  # p1-p7, higher = better quality
                '-rc', 'vbr',       # Variable bitrate
                '-cq', str(crf),    # Quality level (0-51, lower = better)
                '-b:v', '0',        # Let VBR decide bitrate
                '-maxrate', '50M',  # Max bitrate for 8K
                '-bufsize', '100M'  # Buffer size
            ])
        elif video_codec == 'hevc_nvenc':
            # NVIDIA HEVC/H.265 encoding (better for 8K)
            cmd.extend([
                '-c:v', 'hevc_nvenc',
                '-preset', preset,
                '-rc', 'vbr',
                '-cq', str(crf),
                '-b:v', '0',
                '-maxrate', '40M',  # HEVC is more efficient
                '-bufsize', '80M'
            ])
        else:
            # CPU fallback (libx264)
            cmd.extend([
                '-c:v', 'libx264',
                '-preset', 'medium',
                '-crf', str(crf),
                '-pix_fmt', 'yuv420p'
            ])
        # Audio settings
        if self.audio_source:
            cmd.extend([
                '-c:a', 'copy',      # Copy audio without re-encoding
                '-map', '0:v:0',     # Map video from pipe
                '-map', '1:a:0',     # Map audio from file
                '-shortest'          # Match shortest stream
            ])
        else:
            cmd.extend(['-map', '0:v:0'])  # Video only
        # Output file
        cmd.append(str(self.output_path))
        return cmd
    def _start_ffmpeg(self) -> None:
        """Start ffmpeg subprocess"""
        try:
            print(f"🎬 Starting ffmpeg: {' '.join(self.ffmpeg_cmd[:10])}...")
            self.ffmpeg_process = subprocess.Popen(
                self.ffmpeg_cmd,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                bufsize=10**8  # Large buffer for performance
            )
            # Test if ffmpeg starts successfully (quick check)
            import time
            time.sleep(0.2)  # Give ffmpeg time to fail if it's going to
            if self.ffmpeg_process.poll() is not None:
                # Process already died - read error
                stderr = self.ffmpeg_process.stderr.read().decode()
                # Check for specific NVENC errors and provide better feedback
                if 'nvenc' in ' '.join(self.ffmpeg_cmd):
                    if 'unsupported device' in stderr.lower():
                        print(f"❌ NVENC not available on this GPU - switching to CPU encoding")
                    elif 'cannot load' in stderr.lower() or 'not found' in stderr.lower():
                        print(f"❌ NVENC drivers not available - switching to CPU encoding")
                    else:
                        print(f"❌ NVENC encoding failed: {stderr}")
                    # Try CPU fallback
                    print(f"🔄 Falling back to CPU encoding (libx264)...")
                    self.ffmpeg_cmd = self._build_ffmpeg_command('libx264', 'medium', 18)
                    return self._start_ffmpeg()
                else:
                    raise RuntimeError(f"FFmpeg failed: {stderr}")
            # Set process to ignore SIGINT (Ctrl+C) - we'll handle it
            if hasattr(signal, 'pthread_sigmask'):
                signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGINT])
        except Exception as e:
            # Final fallback if everything fails
            if 'nvenc' in ' '.join(self.ffmpeg_cmd):
                print(f"⚠️  GPU encoding failed with error: {e}")
                print(f"🔄 Falling back to CPU encoding...")
                self.ffmpeg_cmd = self._build_ffmpeg_command('libx264', 'medium', 18)
                return self._start_ffmpeg()
            else:
                raise RuntimeError(f"Failed to start ffmpeg: {e}")
    def write_frame(self, frame: np.ndarray) -> bool:
        """
        Write a single frame to the video
        Args:
            frame: Frame to write (BGR format)
        Returns:
            True if successful
        """
        if self.ffmpeg_process is None or self.ffmpeg_process.poll() is not None:
            raise RuntimeError("FFmpeg process is not running")
        try:
            # Ensure correct shape
            if frame.shape != (self.height, self.width, 3):
                raise ValueError(
                    f"Frame shape {frame.shape} doesn't match expected "
                    f"({self.height}, {self.width}, 3)"
                )
            # Ensure correct dtype
            if frame.dtype != np.uint8:
                frame = frame.astype(np.uint8)
            # Write raw frame data to pipe
            self.ffmpeg_process.stdin.write(frame.tobytes())
            self.ffmpeg_process.stdin.flush()
            self.frames_written += 1
            # Periodic progress update
            if self.frames_written % 100 == 0:
                print(f"   Written {self.frames_written} frames...", end='\r')
            return True
        except BrokenPipeError:
            # Check if ffmpeg failed
            if self.ffmpeg_process.poll() is not None:
                stderr = self.ffmpeg_process.stderr.read().decode()
                raise RuntimeError(f"FFmpeg process died: {stderr}")
            raise
        except Exception as e:
            raise RuntimeError(f"Failed to write frame: {e}")
    def write_frame_alpha(self, frame: np.ndarray, alpha: np.ndarray) -> bool:
        """
        Write frame with alpha channel (converts to green screen)
        Args:
            frame: RGB frame
            alpha: Alpha mask (0-255)
        Returns:
            True if successful
        """
        # Create green screen composite
        green_bg = np.full_like(frame, [0, 255, 0], dtype=np.uint8)
        # Normalize alpha to 0-1
        if alpha.dtype == np.uint8:
            alpha_float = alpha.astype(np.float32) / 255.0
        else:
            alpha_float = alpha
        # Expand alpha to 3 channels if needed
        if alpha_float.ndim == 2:
            alpha_float = np.expand_dims(alpha_float, axis=2)
            alpha_float = np.repeat(alpha_float, 3, axis=2)
        # Composite
        composite = (frame * alpha_float + green_bg * (1 - alpha_float)).astype(np.uint8)
        return self.write_frame(composite)
    def get_progress(self) -> Dict[str, Any]:
        """Get writing progress"""
        return {
            'frames_written': self.frames_written,
            'duration_seconds': self.frames_written / self.fps if self.fps > 0 else 0,
            'output_path': str(self.output_path),
            'process_alive': self.ffmpeg_process is not None and self.ffmpeg_process.poll() is None
        }
    def close(self) -> None:
        """Close ffmpeg process and finalize video"""
        if self.ffmpeg_process is not None:
            try:
                # Close stdin to signal end of input
                if self.ffmpeg_process.stdin:
                    self.ffmpeg_process.stdin.close()
                # Wait for ffmpeg to finish (with timeout)
                print(f"\n🎬 Finalizing video with {self.frames_written} frames...")
                self.ffmpeg_process.wait(timeout=30)
                # Check return code
                if self.ffmpeg_process.returncode != 0:
                    stderr = self.ffmpeg_process.stderr.read().decode()
                    warnings.warn(f"FFmpeg exited with code {self.ffmpeg_process.returncode}: {stderr}")
                else:
                    print(f"✅ Video saved: {self.output_path}")
            except subprocess.TimeoutExpired:
                print("⚠️  FFmpeg taking too long, terminating...")
                self.ffmpeg_process.terminate()
                self.ffmpeg_process.wait(timeout=5)
            except Exception as e:
                warnings.warn(f"Error closing ffmpeg: {e}")
                if self.ffmpeg_process.poll() is None:
                    self.ffmpeg_process.kill()
            finally:
                self.ffmpeg_process = None
    def __enter__(self):
        """Context manager support"""
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager cleanup"""
        self.close()
    def __del__(self):
        """Ensure cleanup on deletion"""
        try:
            self.close()
        except:
            pass
--- a/vr180_streaming/main.py
+++ b/vr180_streaming/main.py
@@ -0,0 +1,298 @@
 #!/usr/bin/env python3
 """
 VR180 Streaming Human Matting - Main CLI entry point
 """
 import argparse
 import sys
 from pathlib import Path
 import traceback
 from .config import StreamingConfig
 from .streaming_processor import VR180StreamingProcessor
 def create_parser() -> argparse.ArgumentParser:
    """Create command line argument parser"""
    parser = argparse.ArgumentParser(
        description="VR180 Streaming Human Matting - True streaming implementation",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
 Examples:
  # Process video with streaming
  vr180-streaming config-streaming.yaml
  # Process with custom output
  vr180-streaming config-streaming.yaml --output /path/to/output.mp4
  # Generate example config
  vr180-streaming --generate-config config-streaming-example.yaml
  # Process specific frame range
  vr180-streaming config-streaming.yaml --start-frame 1000 --max-frames 5000
        """
    )
    parser.add_argument(
        "config",
        nargs="?",
        help="Path to YAML configuration file"
    )
    parser.add_argument(
        "--generate-config",
        metavar="PATH",
        help="Generate example configuration file at specified path"
    )
    parser.add_argument(
        "--output", "-o",
        metavar="PATH",
        help="Override output path from config"
    )
    parser.add_argument(
        "--scale",
        type=float,
        metavar="FACTOR",
        help="Override scale factor (0.25, 0.5, 1.0)"
    )
    parser.add_argument(
        "--start-frame",
        type=int,
        metavar="N",
        help="Start processing from frame N"
    )
    parser.add_argument(
        "--max-frames",
        type=int,
        metavar="N",
        help="Process at most N frames"
    )
    parser.add_argument(
        "--device",
        choices=["cuda", "cpu"],
        help="Override processing device"
    )
    parser.add_argument(
        "--format",
        choices=["alpha", "greenscreen"],
        help="Override output format"
    )
    parser.add_argument(
        "--no-audio",
        action="store_true",
        help="Don't copy audio to output"
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Validate configuration without processing"
    )
    return parser
 def generate_example_config(output_path: str) -> None:
    """Generate example configuration file"""
    config_content = '''# VR180 Streaming Configuration
 # For RunPod or similar cloud GPU environments
 input:
  video_path: "/workspace/input_video.mp4"
  start_frame: 0  # Start from beginning (or resume from checkpoint)
  max_frames: null  # Process entire video (or set limit for testing)
 streaming:
  mode: true  # Enable streaming mode
  buffer_frames: 10  # Small lookahead buffer
  write_interval: 1  # Write every frame immediately
 processing:
  scale_factor: 0.5  # Process at 50% resolution for 8K input
  adaptive_scaling: true  # Dynamically adjust based on GPU load
  target_gpu_usage: 0.7  # Target 70% GPU utilization
  min_scale: 0.25
  max_scale: 1.0
 detection:
  confidence_threshold: 0.7
  model: "yolov8n"  # Fast model for streaming
  device: "cuda"
 matting:
  sam2_model_cfg: "sam2.1_hiera_l"  # Large model for quality
  sam2_checkpoint: "segment-anything-2/checkpoints/sam2.1_hiera_large.pt"
  memory_offload: true  # Essential for streaming
  fp16: true  # Use half precision
  continuous_correction: true  # Refine tracking periodically  
  correction_interval: 300  # Every 300 frames
 stereo:
  mode: "master_slave"  # Left eye leads, right follows
  master_eye: "left"
  disparity_correction: true  # Adjust for stereo depth
  consistency_threshold: 0.3
  baseline: 65.0  # mm - typical eye separation
  focal_length: 1000.0  # pixels - adjust based on camera
 output:
  path: "/workspace/output_video.mp4"
  format: "greenscreen"  # or "alpha"
  background_color: [0, 255, 0]  # Pure green
  video_codec: "h264_nvenc"  # GPU encoding
  quality_preset: "p4"  # Balance quality/speed
  crf: 18  # High quality
  maintain_sbs: true  # Keep side-by-side format
 hardware:
  device: "cuda"
  max_vram_gb: 40.0  # RunPod A6000 has 48GB
  max_ram_gb: 48.0  # Container RAM limit
 recovery:
  enable_checkpoints: true
  checkpoint_interval: 1000  # Every 1000 frames
  auto_resume: true  # Resume from checkpoint if found
  checkpoint_dir: "./checkpoints"
 performance:
  profile_enabled: true
  log_interval: 100  # Log every 100 frames
  memory_monitor: true  # Track memory usage
 '''
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, 'w') as f:
        f.write(config_content)
    print(f"✅ Generated example configuration: {output_path}")
    print("\nEdit the configuration file with your paths and run:")
    print(f"  python -m vr180_streaming {output_path}")
 def validate_config(config: StreamingConfig, verbose: bool = False) -> bool:
    """Validate configuration and print any errors"""
    errors = config.validate()
    if errors:
        print("❌ Configuration validation failed:")
        for error in errors:
            print(f"   - {error}")
        return False
    if verbose:
        print("✅ Configuration validation passed")
        print(f"   Input: {config.input.video_path}")
        print(f"   Output: {config.output.path}")
        print(f"   Scale: {config.processing.scale_factor}")
        print(f"   Device: {config.hardware.device}")
        print(f"   Format: {config.output.format}")
    return True
 def apply_cli_overrides(config: StreamingConfig, args: argparse.Namespace) -> None:
    """Apply command line overrides to configuration"""
    if args.output:
        config.output.path = args.output
    if args.scale:
        if not 0.1 <= args.scale <= 1.0:
            raise ValueError("Scale factor must be between 0.1 and 1.0")
        config.processing.scale_factor = args.scale
    if args.start_frame is not None:
        if args.start_frame < 0:
            raise ValueError("Start frame must be non-negative")
        config.input.start_frame = args.start_frame
    if args.max_frames is not None:
        if args.max_frames <= 0:
            raise ValueError("Max frames must be positive")
        config.input.max_frames = args.max_frames
    if args.device:
        config.hardware.device = args.device
        config.detection.device = args.device
    if args.format:
        config.output.format = args.format
    if args.no_audio:
        config.output.maintain_sbs = False  # This will skip audio copy
 def main() -> int:
    """Main entry point"""
    parser = create_parser()
    args = parser.parse_args()
    try:
        # Handle config generation
        if args.generate_config:
            generate_example_config(args.generate_config)
            return 0
        # Require config file for processing
        if not args.config:
            parser.print_help()
            print("\n❌ Error: Configuration file required")
            print("\nGenerate an example config with:")
            print("  vr180-streaming --generate-config config-streaming.yaml")
            return 1
        # Load configuration
        config_path = Path(args.config)
        if not config_path.exists():
            print(f"❌ Error: Configuration file not found: {config_path}")
            return 1
        print(f"📄 Loading configuration from {config_path}")
        config = StreamingConfig.from_yaml(str(config_path))
        # Apply CLI overrides
        apply_cli_overrides(config, args)
        # Validate configuration
        if not validate_config(config, verbose=args.verbose):
            return 1
        # Dry run mode
        if args.dry_run:
            print("✅ Dry run completed successfully")
            return 0
        # Process video
        processor = VR180StreamingProcessor(config)
        processor.process_video()
        return 0
    except KeyboardInterrupt:
        print("\n⚠️  Processing interrupted by user")
        return 130
    except Exception as e:
        print(f"\n❌ Error: {e}")
        if args.verbose:
            traceback.print_exc()
        return 1
 if __name__ == "__main__":
    sys.exit(main())
--- a/vr180_streaming/sam2_streaming.py
+++ b/vr180_streaming/sam2_streaming.py
@@ -0,0 +1,629 @@
 """
 SAM2 streaming processor for frame-by-frame video segmentation
 NOTE: This is a template implementation. The actual SAM2 integration would need to:
 1. Handle the fact that SAM2VideoPredictor loads the entire video internally
 2. Potentially modify SAM2 to support frame-by-frame input
 3. Or use a custom video loader that provides frames on demand
 For a true streaming implementation, you may need to:
 - Extend SAM2VideoPredictor to accept a frame generator instead of video path
 - Implement a custom video loader that doesn't load all frames at once
 - Use the memory offloading features more aggressively
 """
 import torch
 import numpy as np
 import cv2
 from pathlib import Path
 from typing import Dict, Any, List, Optional, Tuple, Generator
 import warnings
 import gc
 # Import SAM2 components - these will be available after SAM2 installation
 try:
    from sam2.build_sam import build_sam2_video_predictor
    from sam2.utils.misc import load_video_frames
 except ImportError:
    warnings.warn("SAM2 not installed. Please install segment-anything-2 first.")
 class SAM2StreamingProcessor:
    """Streaming integration with SAM2 video predictor"""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.device = torch.device(config.get('hardware', {}).get('device', 'cuda'))
        # Processing parameters (set before _init_predictor)
        self.memory_offload = config.get('matting', {}).get('memory_offload', True)
        self.fp16 = config.get('matting', {}).get('fp16', True)
        self.correction_interval = config.get('matting', {}).get('correction_interval', 300)
        # SAM2 model configuration
        model_cfg = config.get('matting', {}).get('sam2_model_cfg', 'sam2.1_hiera_l')
        checkpoint = config.get('matting', {}).get('sam2_checkpoint', 
                                                   'segment-anything-2/checkpoints/sam2.1_hiera_large.pt')
        # Build predictor
        self.predictor = None
        self._init_predictor(model_cfg, checkpoint)
        # State management
        self.states = {}  # eye -> inference state
        self.object_ids = []
        self.frame_count = 0
        print(f"🎯 SAM2 streaming processor initialized:")
        print(f"   Model: {model_cfg}")
        print(f"   Device: {self.device}")
        print(f"   Memory offload: {self.memory_offload}")
        print(f"   FP16: {self.fp16}")
    def _init_predictor(self, model_cfg: str, checkpoint: str) -> None:
        """Initialize SAM2 video predictor"""
        try:
            # Map config string to actual config path
            config_mapping = {
                'sam2.1_hiera_t': 'configs/sam2.1/sam2.1_hiera_t.yaml',
                'sam2.1_hiera_s': 'configs/sam2.1/sam2.1_hiera_s.yaml',
                'sam2.1_hiera_b+': 'configs/sam2.1/sam2.1_hiera_b+.yaml',
                'sam2.1_hiera_l': 'configs/sam2.1/sam2.1_hiera_l.yaml',
            }
            actual_config = config_mapping.get(model_cfg, model_cfg)
            # Build predictor with VOS optimizations
            self.predictor = build_sam2_video_predictor(
                actual_config,
                checkpoint,
                device=self.device,
                vos_optimized=True  # Enable full model compilation for speed
            )
            # Set to eval mode and ensure all model components are on GPU
            self.predictor.eval()
            # Force all predictor components to GPU
            self.predictor = self.predictor.to(self.device)
            # Force move all internal components that might be on CPU
            if hasattr(self.predictor, 'image_encoder'):
                self.predictor.image_encoder = self.predictor.image_encoder.to(self.device)
            if hasattr(self.predictor, 'memory_attention'):
                self.predictor.memory_attention = self.predictor.memory_attention.to(self.device)
            if hasattr(self.predictor, 'memory_encoder'):
                self.predictor.memory_encoder = self.predictor.memory_encoder.to(self.device)
            if hasattr(self.predictor, 'sam_mask_decoder'):
                self.predictor.sam_mask_decoder = self.predictor.sam_mask_decoder.to(self.device)
            if hasattr(self.predictor, 'sam_prompt_encoder'):
                self.predictor.sam_prompt_encoder = self.predictor.sam_prompt_encoder.to(self.device)
            # Note: FP16 conversion can cause type mismatches with compiled models
            # Let SAM2 handle precision internally via build_sam2_video_predictor options
            if self.fp16 and self.device.type == 'cuda':
                print("   FP16 enabled via SAM2 internal settings")
            print(f"   All SAM2 components moved to {self.device}")
        except Exception as e:
            raise RuntimeError(f"Failed to initialize SAM2 predictor: {e}")
    def init_state(self, 
                  video_info: Dict[str, Any],
                  eye: str = 'full') -> Dict[str, Any]:
        """
        Initialize inference state for streaming (NO VIDEO LOADING)
        Args:
            video_info: Video metadata dict with width, height, frame_count
            eye: Eye identifier ('left', 'right', or 'full')
        Returns:
            Inference state dictionary
        """
        print(f"   Initializing streaming state for {eye} eye...")
        # Monitor memory before initialization
        if torch.cuda.is_available():
            before_mem = torch.cuda.memory_allocated() / 1e9
            print(f"   📊 GPU memory before init: {before_mem:.1f}GB")
        # Create streaming state WITHOUT loading video frames
        state = self._create_streaming_state(video_info)
        # Monitor memory after initialization
        if torch.cuda.is_available():
            after_mem = torch.cuda.memory_allocated() / 1e9
            print(f"   📊 GPU memory after init: {after_mem:.1f}GB (+{after_mem-before_mem:.1f}GB)")
        self.states[eye] = state
        print(f"   ✅ Streaming state initialized for {eye} eye")
        return state
    def _create_streaming_state(self, video_info: Dict[str, Any]) -> Dict[str, Any]:
        """Create streaming state for frame-by-frame processing"""
        # Create a streaming-compatible inference state
        # This mirrors SAM2's internal state structure but without video frames
        # Create streaming-compatible state without loading video
        # This approach avoids the dummy video complexity
        with torch.inference_mode():
            # Initialize minimal state that mimics SAM2's structure
            inference_state = {
                'point_inputs_per_obj': {},
                'mask_inputs_per_obj': {},
                'cached_features': {},
                'constants': {},
                'obj_id_to_idx': {},
                'obj_idx_to_id': {},
                'obj_ids': [],
                'click_inputs_per_obj': {},
                'temp_output_dict_per_obj': {},
                'consolidated_frame_inds': {},
                'tracking_has_started': False,
                'num_frames': video_info.get('total_frames', video_info.get('frame_count', 0)),
                'video_height': video_info['height'],
                'video_width': video_info['width'],
                'device': self.device,
                'storage_device': self.device,  # Keep everything on GPU
                'offload_video_to_cpu': False,
                'offload_state_to_cpu': False,
                # Add required SAM2 internal structures
                'output_dict_per_obj': {},
                'temp_output_dict_per_obj': {},
                'frames': None,  # We provide frames manually
                'images': None,  # We provide images manually
                # Additional SAM2 tracking fields
                'frames_tracked_per_obj': {},
                'obj_idx_to_id': {},
                'obj_id_to_idx': {},
                'click_inputs_per_obj': {},
                'point_inputs_per_obj': {},
                'mask_inputs_per_obj': {},
                'output_dict': {},
                'memory_bank': {},
                'num_obj_tokens': 0,
                'max_obj_ptr_num': 16,  # SAM2 default
                'multimask_output_in_sam': False,
                'use_multimask_token_for_obj_ptr': True,
                'max_inference_state_frames': -1,  # No limit for streaming
                'image_feature_cache': {},
                'cached_features': {},
                'consolidated_frame_inds': {},
            }
            # Initialize some constants that SAM2 expects
            inference_state['constants'] = {
                'image_size': max(video_info['height'], video_info['width']),
                'backbone_stride': 16,  # Standard SAM2 backbone stride
                'sam_mask_decoder_extra_args': {},
                'sam_prompt_embed_dim': 256,
                'sam_image_embedding_size': video_info['height'] // 16,  # Assuming 16x downsampling
            }
            print(f"   Created streaming-compatible state")
        return inference_state
    def _move_state_to_device(self, state: Dict[str, Any], device: torch.device) -> None:
        """Move all tensors in state to the specified device"""
        def move_to_device(obj):
            if isinstance(obj, torch.Tensor):
                return obj.to(device)
            elif isinstance(obj, dict):
                return {k: move_to_device(v) for k, v in obj.items()}
            elif isinstance(obj, list):
                return [move_to_device(item) for item in obj]
            elif isinstance(obj, tuple):
                return tuple(move_to_device(item) for item in obj)
            else:
                return obj
        # Move all state components to device
        for key, value in state.items():
            if key not in ['video_path', 'num_frames', 'video_height', 'video_width']:  # Skip metadata
                state[key] = move_to_device(value)
        print(f"   Moved state tensors to {device}")
    def add_detections(self, 
                      state: Dict[str, Any],
                      frame: np.ndarray,
                      detections: List[Dict[str, Any]], 
                      frame_idx: int = 0) -> List[int]:
        """
        Add detection boxes as prompts to SAM2 with frame data
        Args:
            state: Inference state
            frame: Frame image (RGB numpy array)
            detections: List of detections with 'box' key
            frame_idx: Frame index to add prompts
        Returns:
            List of object IDs
        """
        if not detections:
            warnings.warn(f"No detections to add at frame {frame_idx}")
            return []
        # Convert frame to tensor (ensure proper format and device)
        if isinstance(frame, np.ndarray):
            # Convert BGR to RGB if needed (OpenCV uses BGR)
            if frame.shape[-1] == 3:
                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frame_tensor = torch.from_numpy(frame).float().to(self.device)
        else:
            frame_tensor = frame.float().to(self.device)
        if frame_tensor.ndim == 3:
            frame_tensor = frame_tensor.permute(2, 0, 1)  # HWC -> CHW
        frame_tensor = frame_tensor.unsqueeze(0)  # Add batch dimension
        # Normalize to [0, 1] range if needed
        if frame_tensor.max() > 1.0:
            frame_tensor = frame_tensor / 255.0
        # Convert detections to SAM2 format
        boxes = []
        for det in detections:
            box = det['box']  # [x1, y1, x2, y2]
            boxes.append(box)
        boxes_tensor = torch.tensor(boxes, dtype=torch.float32, device=self.device)
        # Manually process frame and add prompts (streaming approach)
        with torch.inference_mode():
            # Process frame through SAM2's image encoder
            backbone_out = self.predictor.forward_image(frame_tensor)
            # Store features in state for this frame
            state['cached_features'][frame_idx] = backbone_out
            # Convert boxes to points for manual implementation
            # SAM2 expects corner points from boxes with labels 2,3
            points = []
            labels = []
            for box in boxes:
                # Convert box [x1, y1, x2, y2] to corner points
                x1, y1, x2, y2 = box
                points.extend([[x1, y1], [x2, y2]])  # Top-left and bottom-right corners
                labels.extend([2, 3])  # SAM2 standard labels for box corners
            points_tensor = torch.tensor(points, dtype=torch.float32, device=self.device)
            labels_tensor = torch.tensor(labels, dtype=torch.int32, device=self.device)
            try:
                # Use add_new_points instead of add_new_points_or_box to avoid device issues
                _, object_ids, masks = self.predictor.add_new_points(
                    inference_state=state,
                    frame_idx=frame_idx,
                    obj_id=None,  # Let SAM2 auto-assign
                    points=points_tensor,
                    labels=labels_tensor,
                    clear_old_points=True,
                    normalize_coords=True
                )
                # Update state with object tracking info
                state['obj_ids'] = object_ids
                state['tracking_has_started'] = True
            except Exception as e:
                print(f"   Error in add_new_points: {e}")
                print(f"   Points tensor device: {points_tensor.device}")
                print(f"   Labels tensor device: {labels_tensor.device}")
                print(f"   Frame tensor device: {frame_tensor.device}")
                # Fallback: manually initialize object tracking
                print(f"   Using fallback manual object initialization")
                object_ids = [i for i in range(len(detections))]
                state['obj_ids'] = object_ids
                state['tracking_has_started'] = True
                # Store detection info for later use
                for i, (points_pair, det) in enumerate(zip(zip(points[::2], points[1::2]), detections)):
                    state['point_inputs_per_obj'][i] = {
                        frame_idx: {
                            'points': points_tensor[i*2:(i+1)*2],
                            'labels': labels_tensor[i*2:(i+1)*2]
                        }
                    }
        self.object_ids = object_ids
        print(f"   Added {len(detections)} detections at frame {frame_idx}: objects {object_ids}")
        return object_ids
    def propagate_single_frame(self, 
                             state: Dict[str, Any],
                             frame: np.ndarray,
                             frame_idx: int) -> np.ndarray:
        """
        Propagate masks for a single frame (true streaming)
        Args:
            state: Inference state
            frame: Frame image (RGB numpy array)
            frame_idx: Frame index
        Returns:
            Combined mask for all objects
        """
        # Convert frame to tensor (ensure proper format and device)
        if isinstance(frame, np.ndarray):
            # Convert BGR to RGB if needed (OpenCV uses BGR)
            if frame.shape[-1] == 3:
                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frame_tensor = torch.from_numpy(frame).float().to(self.device)
        else:
            frame_tensor = frame.float().to(self.device)
        if frame_tensor.ndim == 3:
            frame_tensor = frame_tensor.permute(2, 0, 1)  # HWC -> CHW
        frame_tensor = frame_tensor.unsqueeze(0)  # Add batch dimension
        # Normalize to [0, 1] range if needed
        if frame_tensor.max() > 1.0:
            frame_tensor = frame_tensor / 255.0
        with torch.inference_mode():
            # Process frame through SAM2's image encoder
            backbone_out = self.predictor.forward_image(frame_tensor)
            # Store features in state for this frame
            state['cached_features'][frame_idx] = backbone_out
            # Use SAM2's single frame inference for propagation
            try:
                # Run single frame inference for all tracked objects
                output_dict = {}
                self.predictor._run_single_frame_inference(
                    inference_state=state,
                    output_dict=output_dict,
                    frame_idx=frame_idx,
                    batch_size=1,
                    is_init_cond_frame=False,  # Not initialization frame
                    point_inputs=None,
                    mask_inputs=None,
                    reverse=False,
                    run_mem_encoder=True
                )
                # Extract masks from output
                if output_dict and 'pred_masks' in output_dict:
                    pred_masks = output_dict['pred_masks']
                    # Combine all object masks
                    if pred_masks.shape[0] > 0:
                        combined_mask = pred_masks.max(dim=0)[0]
                        combined_mask_np = (combined_mask > 0.0).cpu().numpy().astype(np.uint8) * 255
                    else:
                        combined_mask_np = np.zeros((frame.shape[0], frame.shape[1]), dtype=np.uint8)
                else:
                    combined_mask_np = np.zeros((frame.shape[0], frame.shape[1]), dtype=np.uint8)
            except Exception as e:
                print(f"   Warning: Single frame inference failed: {e}")
                combined_mask_np = np.zeros((frame.shape[0], frame.shape[1]), dtype=np.uint8)
            # Cleanup old features to prevent memory accumulation
            self._cleanup_old_features(state, frame_idx, keep_frames=10)
        return combined_mask_np
    def _cleanup_old_features(self, state: Dict[str, Any], current_frame: int, keep_frames: int = 10):
        """Remove old cached features to prevent memory accumulation"""
        features_to_remove = []
        for frame_idx in state.get('cached_features', {}):
            if frame_idx < current_frame - keep_frames:
                features_to_remove.append(frame_idx)
        for frame_idx in features_to_remove:
            del state['cached_features'][frame_idx]
        # Periodic GPU memory cleanup
        if current_frame % 50 == 0:
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            gc.collect()
    def propagate_frame_pair(self,
                           left_state: Dict[str, Any],
                           right_state: Dict[str, Any],
                           left_frame: np.ndarray,
                           right_frame: np.ndarray,
                           frame_idx: int) -> Tuple[np.ndarray, np.ndarray]:
        """
        Propagate masks for a stereo frame pair
        Args:
            left_state: Left eye inference state
            right_state: Right eye inference state
            left_frame: Left eye frame
            right_frame: Right eye frame
            frame_idx: Current frame index
        Returns:
            Tuple of (left_masks, right_masks)
        """
        # For actual implementation, we would need to handle the video frames
        # being already loaded in the state. This is a simplified version.
        # In practice, SAM2's propagate_in_video would handle frame loading.
        # Get masks from the current propagation state
        # This is pseudo-code as actual integration would depend on
        # how frames are provided to SAM2VideoPredictor
        left_masks = np.zeros((left_frame.shape[0], left_frame.shape[1]), dtype=np.uint8)
        right_masks = np.zeros((right_frame.shape[0], right_frame.shape[1]), dtype=np.uint8)
        # In actual implementation, you would:
        # 1. Use predictor.propagate_in_video() generator
        # 2. Extract masks for current frame_idx
        # 3. Combine multiple object masks if needed
        return left_masks, right_masks
    def _propagate_single_frame(self,
                               state: Dict[str, Any],
                               frame: np.ndarray,
                               frame_idx: int) -> np.ndarray:
        """
        Propagate masks for a single frame
        Args:
            state: Inference state
            frame: Input frame
            frame_idx: Frame index
        Returns:
            Combined mask for all objects
        """
        # This is a simplified version - in practice we'd use the actual
        # SAM2 propagation API which handles memory updates internally
        # Get current masks from propagation
        # Note: This is pseudo-code as the actual API may differ
        masks = []
        # For each tracked object
        for obj_idx in range(len(self.object_ids)):
            # Get mask for this object
            # In reality, SAM2 handles this internally
            obj_mask = self._get_object_mask(state, obj_idx, frame_idx)
            masks.append(obj_mask)
        # Combine all object masks
        if masks:
            combined_mask = np.max(masks, axis=0)
        else:
            combined_mask = np.zeros((frame.shape[0], frame.shape[1]), dtype=np.uint8)
        return combined_mask
    def _get_object_mask(self, state: Dict[str, Any], obj_idx: int, frame_idx: int) -> np.ndarray:
        """
        Get mask for specific object (placeholder - actual implementation uses SAM2 API)
        """
        # In practice, this would extract the mask from SAM2's internal state
        # For now, return a placeholder
        h, w = state.get('video_height', 1080), state.get('video_width', 1920)
        return np.zeros((h, w), dtype=np.uint8)
    def apply_continuous_correction(self,
                                  state: Dict[str, Any],
                                  frame: np.ndarray,
                                  frame_idx: int,
                                  detector: Any) -> None:
        """
        Apply continuous correction by re-detecting and refining masks
        Args:
            state: Inference state
            frame: Current frame
            frame_idx: Frame index
            detector: Person detector instance
        """
        if frame_idx % self.correction_interval != 0:
            return
        print(f"   🔄 Applying continuous correction at frame {frame_idx}")
        # Detect persons in current frame
        new_detections = detector.detect_persons(frame)
        if new_detections:
            # Add new prompts to refine tracking
            with torch.inference_mode():
                boxes = [det['box'] for det in new_detections]
                boxes_tensor = torch.tensor(boxes, dtype=torch.float32, device=self.device)
                # Add refinement prompts
                self.predictor.add_new_points_or_box(
                    inference_state=state,
                    frame_idx=frame_idx,
                    obj_id=0,  # Refine existing objects
                    box=boxes_tensor
                )
    def apply_mask_to_frame(self,
                          frame: np.ndarray,
                          mask: np.ndarray,
                          output_format: str = 'greenscreen',
                          background_color: List[int] = [0, 255, 0]) -> np.ndarray:
        """
        Apply mask to frame with specified output format
        Args:
            frame: Input frame (BGR)
            mask: Binary mask
            output_format: 'alpha' or 'greenscreen'
            background_color: Background color for greenscreen
        Returns:
            Processed frame
        """
        if output_format == 'alpha':
            # Add alpha channel
            if mask.dtype != np.uint8:
                mask = (mask * 255).astype(np.uint8)
            # Create BGRA image
            bgra = np.zeros((frame.shape[0], frame.shape[1], 4), dtype=np.uint8)
            bgra[:, :, :3] = frame
            bgra[:, :, 3] = mask
            return bgra
        else:  # greenscreen
            # Create green background
            background = np.full_like(frame, background_color, dtype=np.uint8)
            # Expand mask to 3 channels
            if mask.ndim == 2:
                mask_3ch = np.expand_dims(mask, axis=2)
                mask_3ch = np.repeat(mask_3ch, 3, axis=2)
            else:
                mask_3ch = mask
            # Normalize mask to 0-1
            if mask_3ch.dtype == np.uint8:
                mask_float = mask_3ch.astype(np.float32) / 255.0
            else:
                mask_float = mask_3ch.astype(np.float32)
            # Composite
            result = (frame * mask_float + background * (1 - mask_float)).astype(np.uint8)
            return result
    def cleanup(self) -> None:
        """Clean up resources"""
        # Clear states
        self.states.clear()
        # Clear CUDA cache
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.synchronize()
        # Garbage collection
        gc.collect()
        print("🧹 SAM2 streaming processor cleaned up")
    def get_memory_usage(self) -> Dict[str, float]:
        """Get current memory usage"""
        memory_stats = {
            'states_count': len(self.states),
            'object_count': len(self.object_ids),
        }
        if torch.cuda.is_available():
            memory_stats['cuda_allocated_gb'] = torch.cuda.memory_allocated() / 1e9
            memory_stats['cuda_reserved_gb'] = torch.cuda.memory_reserved() / 1e9
        return memory_stats
--- a/vr180_streaming/sam2_streaming_simple.py
+++ b/vr180_streaming/sam2_streaming_simple.py
@@ -0,0 +1,407 @@
 """
 Simple SAM2 streaming processor based on det-sam2 pattern
 Adapted for current segment-anything-2 API
 """
 import torch
 import numpy as np
 import cv2
 import tempfile
 import os
 from pathlib import Path
 from typing import Dict, Any, List, Optional
 import warnings
 import gc
 # Import SAM2 components
 try:
    from sam2.build_sam import build_sam2_video_predictor
 except ImportError:
    warnings.warn("SAM2 not installed. Please install segment-anything-2 first.")
 class SAM2StreamingProcessor:
    """Simple streaming integration with SAM2 following det-sam2 pattern"""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.device = torch.device(config.get('hardware', {}).get('device', 'cuda'))
        # SAM2 model configuration
        model_cfg_name = config.get('matting', {}).get('sam2_model_cfg', 'sam2.1_hiera_l')
        checkpoint = config.get('matting', {}).get('sam2_checkpoint', 
                                                   'segment-anything-2/checkpoints/sam2.1_hiera_large.pt')
        # Map config name to Hydra path (like the examples show)
        config_mapping = {
            'sam2.1_hiera_t': 'configs/sam2.1/sam2.1_hiera_t.yaml',
            'sam2.1_hiera_s': 'configs/sam2.1/sam2.1_hiera_s.yaml',
            'sam2.1_hiera_b+': 'configs/sam2.1/sam2.1_hiera_b+.yaml',
            'sam2.1_hiera_l': 'configs/sam2.1/sam2.1_hiera_l.yaml',
        }
        model_cfg = config_mapping.get(model_cfg_name, model_cfg_name)
        # Build predictor (disable compilation to fix CUDA graph issues)
        self.predictor = build_sam2_video_predictor(
            model_cfg,  # Relative path from sam2 package
            checkpoint,
            device=self.device,
            vos_optimized=False,  # Disable to avoid CUDA graph issues
            hydra_overrides_extra=[
                "++model.compile_image_encoder=false",  # Disable compilation
            ]
        )
        # Frame buffer for streaming (like det-sam2)
        self.frame_buffer = []
        self.frame_buffer_size = config.get('streaming', {}).get('buffer_frames', 10)
        # State management (simple)
        self.inference_state = None
        self.temp_dir = None
        self.object_ids = []
        # Memory management
        self.memory_offload = config.get('matting', {}).get('memory_offload', True)
        self.max_frames_to_track = config.get('matting', {}).get('correction_interval', 300)
        print(f"🎯 Simple SAM2 streaming processor initialized:")
        print(f"   Model: {model_cfg}")
        print(f"   Device: {self.device}")
        print(f"   Buffer size: {self.frame_buffer_size}")
        print(f"   Memory offload: {self.memory_offload}")
    def add_frame_and_detections(self, 
                                frame: np.ndarray, 
                                detections: List[Dict[str, Any]], 
                                frame_idx: int) -> np.ndarray:
        """
        Add frame to buffer and process detections (det-sam2 pattern)
        Args:
            frame: Input frame (BGR)
            detections: List of detections with 'box' key
            frame_idx: Global frame index
        Returns:
            Mask for current frame
        """
        # Add frame to buffer
        self.frame_buffer.append({
            'frame': frame,
            'frame_idx': frame_idx,
            'detections': detections
        })
        # Process when buffer is full or when we have detections
        if len(self.frame_buffer) >= self.frame_buffer_size or detections:
            return self._process_buffer()
        else:
            # For frames without detections, still try to propagate if we have existing objects
            if self.inference_state is not None and self.object_ids:
                return self._propagate_existing_objects()
            else:
                # Return empty mask if no processing yet
                return np.zeros((frame.shape[0], frame.shape[1]), dtype=np.uint8)
    def _process_buffer(self) -> np.ndarray:
        """Process current frame buffer (adapted det-sam2 approach)"""
        if not self.frame_buffer:
            return np.zeros((480, 640), dtype=np.uint8)
        try:
            # Create temporary directory for frames (current SAM2 API requirement)
            self._create_temp_frames()
            # Initialize or update SAM2 state
            if self.inference_state is None:
                # First time: initialize state with temp directory
                self.inference_state = self.predictor.init_state(
                    video_path=self.temp_dir,
                    offload_video_to_cpu=self.memory_offload,
                    offload_state_to_cpu=self.memory_offload
                )
                print(f"   Initialized SAM2 state with {len(self.frame_buffer)} frames")
            else:
                # Subsequent times: we need to reinitialize since current SAM2 lacks update_state
                # This is the key difference from det-sam2 reference
                self._cleanup_temp_frames()
                self._create_temp_frames()
                self.inference_state = self.predictor.init_state(
                    video_path=self.temp_dir,
                    offload_video_to_cpu=self.memory_offload,
                    offload_state_to_cpu=self.memory_offload
                )
                print(f"   Reinitialized SAM2 state with {len(self.frame_buffer)} frames")
            # Add detections as prompts (standard SAM2 API)
            self._add_detection_prompts()
            # Get masks via propagation
            masks = self._get_current_masks()
            # Clean up old frames to prevent memory accumulation
            self._cleanup_old_frames()
            return masks
        except Exception as e:
            print(f"   Warning: Buffer processing failed: {e}")
            return np.zeros((480, 640), dtype=np.uint8)
    def _create_temp_frames(self):
        """Create temporary directory with frame images for SAM2"""
        if self.temp_dir:
            self._cleanup_temp_frames()
        self.temp_dir = tempfile.mkdtemp(prefix='sam2_streaming_')
        for i, buffer_item in enumerate(self.frame_buffer):
            frame = buffer_item['frame']
            # Convert BGR to RGB for SAM2
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            # Save as JPEG (SAM2 expects JPEG images in directory)
            frame_path = os.path.join(self.temp_dir, f"{i:05d}.jpg")
            cv2.imwrite(frame_path, cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR))
    def _add_detection_prompts(self):
        """Add detection boxes as prompts to SAM2 (standard API)"""
        for buffer_idx, buffer_item in enumerate(self.frame_buffer):
            detections = buffer_item.get('detections', [])
            for det_idx, detection in enumerate(detections):
                box = detection['box']  # [x1, y1, x2, y2]
                # Use standard SAM2 API
                try:
                    _, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(
                        inference_state=self.inference_state,
                        frame_idx=buffer_idx,  # Frame index within buffer
                        obj_id=det_idx,  # Simple object ID
                        box=np.array(box, dtype=np.float32)
                    )
                    # Track object IDs
                    if det_idx not in self.object_ids:
                        self.object_ids.append(det_idx)
                except Exception as e:
                    print(f"   Warning: Failed to add detection: {e}")
                    continue
    def _get_current_masks(self) -> np.ndarray:
        """Get masks for current frame via propagation"""
        if not self.object_ids:
            # No objects to track
            frame_shape = self.frame_buffer[-1]['frame'].shape
            return np.zeros((frame_shape[0], frame_shape[1]), dtype=np.uint8)
        try:
            # Use SAM2's propagate_in_video (standard API)
            latest_frame_idx = len(self.frame_buffer) - 1
            masks_for_frame = []
            for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(
                self.inference_state,
                start_frame_idx=latest_frame_idx,
                max_frame_num_to_track=1,  # Just current frame
                reverse=False
            ):
                if out_frame_idx == latest_frame_idx:
                    # Combine all object masks
                    if len(out_mask_logits) > 0:
                        combined_mask = None
                        for mask_logit in out_mask_logits:
                            mask = (mask_logit > 0.0).cpu().numpy()
                            if combined_mask is None:
                                combined_mask = mask.astype(bool)
                            else:
                                combined_mask = combined_mask | mask.astype(bool)
                        return (combined_mask * 255).astype(np.uint8)
            # If no masks found, return empty
            frame_shape = self.frame_buffer[-1]['frame'].shape
            return np.zeros((frame_shape[0], frame_shape[1]), dtype=np.uint8)
        except Exception as e:
            print(f"   Warning: Mask propagation failed: {e}")
            frame_shape = self.frame_buffer[-1]['frame'].shape
            return np.zeros((frame_shape[0], frame_shape[1]), dtype=np.uint8)
    def _propagate_existing_objects(self) -> np.ndarray:
        """Propagate existing objects without adding new detections"""
        if not self.object_ids or not self.frame_buffer:
            frame_shape = self.frame_buffer[-1]['frame'].shape if self.frame_buffer else (480, 640)
            return np.zeros((frame_shape[0], frame_shape[1]), dtype=np.uint8)
        try:
            # Update temp frames with current buffer
            self._create_temp_frames()
            # Reinitialize state (since we can't incrementally update)
            self.inference_state = self.predictor.init_state(
                video_path=self.temp_dir,
                offload_video_to_cpu=self.memory_offload,
                offload_state_to_cpu=self.memory_offload
            )
            # Re-add all previous detections from buffer
            for buffer_idx, buffer_item in enumerate(self.frame_buffer):
                detections = buffer_item.get('detections', [])
                if detections:  # Only add frames that had detections
                    for det_idx, detection in enumerate(detections):
                        box = detection['box']
                        try:
                            self.predictor.add_new_points_or_box(
                                inference_state=self.inference_state,
                                frame_idx=buffer_idx,
                                obj_id=det_idx,
                                box=np.array(box, dtype=np.float32)
                            )
                        except Exception as e:
                            print(f"   Warning: Failed to re-add detection: {e}")
            # Get masks for latest frame
            latest_frame_idx = len(self.frame_buffer) - 1
            for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(
                self.inference_state,
                start_frame_idx=latest_frame_idx,
                max_frame_num_to_track=1,
                reverse=False
            ):
                if out_frame_idx == latest_frame_idx and len(out_mask_logits) > 0:
                    combined_mask = None
                    for mask_logit in out_mask_logits:
                        mask = (mask_logit > 0.0).cpu().numpy()
                        if combined_mask is None:
                            combined_mask = mask.astype(bool)
                        else:
                            combined_mask = combined_mask | mask.astype(bool)
                    return (combined_mask * 255).astype(np.uint8)
            # If no masks, return empty
            frame_shape = self.frame_buffer[-1]['frame'].shape
            return np.zeros((frame_shape[0], frame_shape[1]), dtype=np.uint8)
        except Exception as e:
            print(f"   Warning: Object propagation failed: {e}")
            frame_shape = self.frame_buffer[-1]['frame'].shape
            return np.zeros((frame_shape[0], frame_shape[1]), dtype=np.uint8)
        except Exception as e:
            print(f"   Warning: Mask propagation failed: {e}")
            frame_shape = self.frame_buffer[-1]['frame'].shape
            return np.zeros((frame_shape[0], frame_shape[1]), dtype=np.uint8)
    def _cleanup_old_frames(self):
        """Clean up old frames from buffer (det-sam2 pattern)"""
        # Keep only recent frames to prevent memory accumulation
        if len(self.frame_buffer) > self.frame_buffer_size:
            self.frame_buffer = self.frame_buffer[-self.frame_buffer_size:]
        # Periodic GPU memory cleanup
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()
    def _cleanup_temp_frames(self):
        """Clean up temporary frame directory"""
        if self.temp_dir and os.path.exists(self.temp_dir):
            import shutil
            shutil.rmtree(self.temp_dir)
            self.temp_dir = None
    def cleanup(self):
        """Clean up all resources"""
        self._cleanup_temp_frames()
        self.frame_buffer.clear()
        self.object_ids.clear()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.synchronize()
        gc.collect()
        print("🧹 Simple SAM2 streaming processor cleaned up")
    def apply_mask_to_frame(self, 
                          frame: np.ndarray, 
                          mask: np.ndarray, 
                          output_format: str = "alpha",
                          background_color: tuple = (0, 255, 0)) -> np.ndarray:
        """
        Apply mask to frame with specified output format (matches chunked implementation)
        Args:
            frame: Input frame (BGR)
            mask: Binary mask (0-255 or boolean)
            output_format: "alpha" or "greenscreen" 
            background_color: RGB background color for greenscreen mode
        Returns:
            Processed frame
        """
        if mask is None:
            return frame
        # Ensure mask is 2D (handle 3D masks properly)
        if mask.ndim == 3:
            mask = mask.squeeze()
        # Resize mask to match frame if needed (use INTER_NEAREST for binary masks)
        if mask.shape[:2] != frame.shape[:2]:
            import cv2
            # Convert to uint8 for resizing, then back to bool
            if mask.dtype == bool:
                mask_uint8 = mask.astype(np.uint8) * 255
            else:
                mask_uint8 = mask.astype(np.uint8)
            mask_resized = cv2.resize(mask_uint8, 
                                    (frame.shape[1], frame.shape[0]), 
                                    interpolation=cv2.INTER_NEAREST)
            mask = mask_resized.astype(bool) if mask.dtype == bool else mask_resized
        if output_format == "alpha":
            # Create RGBA output (matches chunked implementation)
            output = np.zeros((frame.shape[0], frame.shape[1], 4), dtype=np.uint8)
            output[:, :, :3] = frame
            if mask.dtype == bool:
                output[:, :, 3] = mask.astype(np.uint8) * 255
            else:
                output[:, :, 3] = mask.astype(np.uint8)
            return output
        elif output_format == "greenscreen":
            # Create RGB output with background (matches chunked implementation)
            output = np.full_like(frame, background_color, dtype=np.uint8)
            if mask.dtype == bool:
                output[mask] = frame[mask]
            else:
                mask_bool = mask.astype(bool)
                output[mask_bool] = frame[mask_bool]
            return output
        else:
            raise ValueError(f"Unsupported output format: {output_format}. Use 'alpha' or 'greenscreen'")
    def get_memory_usage(self) -> Dict[str, float]:
        """
        Get current memory usage statistics
        Returns:
            Dictionary with memory usage info
        """
        stats = {}
        if torch.cuda.is_available():
            # GPU memory stats
            stats['cuda_allocated_gb'] = torch.cuda.memory_allocated() / (1024**3)
            stats['cuda_reserved_gb'] = torch.cuda.memory_reserved() / (1024**3)
            stats['cuda_max_allocated_gb'] = torch.cuda.max_memory_allocated() / (1024**3)
        return stats
--- a/vr180_streaming/stereo_manager.py
+++ b/vr180_streaming/stereo_manager.py
@@ -0,0 +1,324 @@
 """
 Stereo consistency manager for VR180 side-by-side video processing
 """
 import numpy as np
 from typing import Tuple, List, Dict, Any, Optional
 import cv2
 import warnings
 class StereoConsistencyManager:
    """Manage stereo consistency between left and right eye views"""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.master_eye = config.get('stereo', {}).get('master_eye', 'left')
        self.disparity_correction = config.get('stereo', {}).get('disparity_correction', True)
        self.consistency_threshold = config.get('stereo', {}).get('consistency_threshold', 0.3)
        # Stereo calibration parameters (can be loaded from config)
        self.baseline = config.get('stereo', {}).get('baseline', 65.0)  # mm, typical IPD
        self.focal_length = config.get('stereo', {}).get('focal_length', 1000.0)  # pixels
        # Statistics tracking
        self.stats = {
            'frames_processed': 0,
            'corrections_applied': 0,
            'detection_transfers': 0,
            'mask_validations': 0
        }
        print(f"👀 Stereo consistency manager initialized:")
        print(f"   Master eye: {self.master_eye}")
        print(f"   Disparity correction: {self.disparity_correction}")
        print(f"   Consistency threshold: {self.consistency_threshold}")
    def split_frame(self, frame: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Split side-by-side frame into left and right eye views
        Args:
            frame: SBS frame
        Returns:
            Tuple of (left_eye, right_eye) frames
        """
        height, width = frame.shape[:2]
        split_point = width // 2
        left_eye = frame[:, :split_point]
        right_eye = frame[:, split_point:]
        return left_eye, right_eye
    def combine_frames(self, left_eye: np.ndarray, right_eye: np.ndarray) -> np.ndarray:
        """
        Combine left and right eye frames back to SBS format
        Args:
            left_eye: Left eye frame
            right_eye: Right eye frame
        Returns:
            Combined SBS frame
        """
        # Ensure same height
        if left_eye.shape[0] != right_eye.shape[0]:
            target_height = min(left_eye.shape[0], right_eye.shape[0])
            left_eye = cv2.resize(left_eye, (left_eye.shape[1], target_height))
            right_eye = cv2.resize(right_eye, (right_eye.shape[1], target_height))
        return np.hstack([left_eye, right_eye])
    def transfer_detections(self, 
                          detections: List[Dict[str, Any]], 
                          direction: str = 'left_to_right') -> List[Dict[str, Any]]:
        """
        Transfer detections from master to slave eye with disparity adjustment
        Args:
            detections: List of detection dicts with 'box' key
            direction: Transfer direction ('left_to_right' or 'right_to_left')
        Returns:
            Transferred detections adjusted for stereo disparity
        """
        transferred = []
        for det in detections:
            box = det['box']  # [x1, y1, x2, y2]
            if self.disparity_correction:
                # Calculate disparity based on estimated depth
                # Closer objects have larger disparity
                box_width = box[2] - box[0]
                estimated_depth = self._estimate_depth_from_size(box_width)
                disparity = self._calculate_disparity(estimated_depth)
                # Apply disparity shift
                if direction == 'left_to_right':
                    # Right eye sees objects shifted left
                    adjusted_box = [
                        box[0] - disparity,
                        box[1],
                        box[2] - disparity,
                        box[3]
                    ]
                else:  # right_to_left
                    # Left eye sees objects shifted right
                    adjusted_box = [
                        box[0] + disparity,
                        box[1],
                        box[2] + disparity,
                        box[3]
                    ]
            else:
                # No disparity correction
                adjusted_box = box.copy()
            # Create transferred detection
            transferred_det = det.copy()
            transferred_det['box'] = adjusted_box
            transferred_det['confidence'] = det.get('confidence', 1.0) * 0.95  # Slight reduction
            transferred_det['transferred'] = True
            transferred.append(transferred_det)
        self.stats['detection_transfers'] += len(detections)
        return transferred
    def validate_masks(self, 
                      left_masks: np.ndarray, 
                      right_masks: np.ndarray,
                      frame_idx: int = 0) -> np.ndarray:
        """
        Validate and correct right eye masks based on left eye
        Args:
            left_masks: Master eye masks
            right_masks: Slave eye masks to validate
            frame_idx: Current frame index for logging
        Returns:
            Validated/corrected right eye masks
        """
        self.stats['mask_validations'] += 1
        # Quick validation - compare mask areas
        left_area = np.sum(left_masks > 0)
        right_area = np.sum(right_masks > 0)
        if left_area == 0:
            # No person in left eye, clear right eye too
            if right_area > 0:
                warnings.warn(f"Frame {frame_idx}: No person in left eye but found in right - clearing")
                self.stats['corrections_applied'] += 1
                return np.zeros_like(right_masks)
            return right_masks
        # Calculate area ratio
        area_ratio = right_area / (left_area + 1e-6)
        # Check if correction needed
        if abs(area_ratio - 1.0) > self.consistency_threshold:
            print(f"   Frame {frame_idx}: Area mismatch (ratio={area_ratio:.2f}) - applying correction")
            self.stats['corrections_applied'] += 1
            # Apply correction based on severity
            if area_ratio < 0.5 or area_ratio > 2.0:
                # Significant difference - use template matching
                right_masks = self._correct_mask_from_template(left_masks, right_masks)
            else:
                # Minor difference - blend masks
                right_masks = self._blend_masks(left_masks, right_masks, area_ratio)
        return right_masks
    def combine_masks(self, left_masks: np.ndarray, right_masks: np.ndarray) -> np.ndarray:
        """
        Combine left and right eye masks back to SBS format
        Args:
            left_masks: Left eye masks
            right_masks: Right eye masks
        Returns:
            Combined SBS masks
        """
        # Handle different mask formats
        if left_masks.ndim == 2 and right_masks.ndim == 2:
            # Single channel masks
            return np.hstack([left_masks, right_masks])
        elif left_masks.ndim == 3 and right_masks.ndim == 3:
            # Multi-channel masks (e.g., per-object)
            return np.concatenate([left_masks, right_masks], axis=1)
        else:
            raise ValueError(f"Incompatible mask dimensions: {left_masks.shape} vs {right_masks.shape}")
    def _estimate_depth_from_size(self, object_width_pixels: float) -> float:
        """
        Estimate object depth from its width in pixels
        Assumes average human width of 45cm
        Args:
            object_width_pixels: Width of detected person in pixels
        Returns:
            Estimated depth in meters
        """
        HUMAN_WIDTH_M = 0.45  # Average human shoulder width
        # Using similar triangles: depth = (focal_length * real_width) / pixel_width
        depth = (self.focal_length * HUMAN_WIDTH_M) / max(object_width_pixels, 1)
        # Clamp to reasonable range (0.5m to 10m)
        return np.clip(depth, 0.5, 10.0)
    def _calculate_disparity(self, depth_m: float) -> float:
        """
        Calculate stereo disparity in pixels for given depth
        Args:
            depth_m: Depth in meters
        Returns:
            Disparity in pixels
        """
        # Disparity = (baseline * focal_length) / depth
        # Convert baseline from mm to m
        disparity_pixels = (self.baseline / 1000.0 * self.focal_length) / depth_m
        return disparity_pixels
    def _correct_mask_from_template(self, 
                                   template_mask: np.ndarray, 
                                   target_mask: np.ndarray) -> np.ndarray:
        """
        Correct target mask using template mask with disparity adjustment
        Args:
            template_mask: Master eye mask to use as template
            target_mask: Mask to correct
        Returns:
            Corrected mask
        """
        if not self.disparity_correction:
            # Simple copy without disparity
            return template_mask.copy()
        # Calculate average disparity from mask centroid
        template_moments = cv2.moments(template_mask.astype(np.uint8))
        if template_moments['m00'] > 0:
            cx_template = int(template_moments['m10'] / template_moments['m00'])
            # Estimate depth from mask size
            mask_width = np.sum(np.any(template_mask > 0, axis=0))
            depth = self._estimate_depth_from_size(mask_width)
            disparity = int(self._calculate_disparity(depth))
            # Shift template mask by disparity
            if self.master_eye == 'left':
                # Right eye sees shifted left
                translation = np.float32([[1, 0, -disparity], [0, 1, 0]])
            else:
                # Left eye sees shifted right  
                translation = np.float32([[1, 0, disparity], [0, 1, 0]])
            corrected = cv2.warpAffine(
                template_mask.astype(np.float32),
                translation,
                (template_mask.shape[1], template_mask.shape[0])
            )
            return corrected
        else:
            # No valid mask to correct from
            return template_mask.copy()
    def _blend_masks(self, 
                    mask1: np.ndarray, 
                    mask2: np.ndarray, 
                    area_ratio: float) -> np.ndarray:
        """
        Blend two masks based on area ratio
        Args:
            mask1: First mask
            mask2: Second mask  
            area_ratio: Ratio of mask2/mask1 areas
        Returns:
            Blended mask
        """
        # Calculate blend weight based on how far off the ratio is
        blend_weight = min(abs(area_ratio - 1.0) / self.consistency_threshold, 1.0)
        # Blend towards mask1 (master) based on weight
        blended = mask1 * blend_weight + mask2 * (1 - blend_weight)
        # Threshold to binary
        return (blended > 0.5).astype(mask1.dtype)
    def get_stats(self) -> Dict[str, Any]:
        """Get processing statistics"""
        self.stats['frames_processed'] = self.stats.get('mask_validations', 0)
        if self.stats['frames_processed'] > 0:
            self.stats['correction_rate'] = (
                self.stats['corrections_applied'] / self.stats['frames_processed']
            )
        else:
            self.stats['correction_rate'] = 0.0
        return self.stats.copy()
    def reset_stats(self) -> None:
        """Reset statistics"""
        self.stats = {
            'frames_processed': 0,
            'corrections_applied': 0,
            'detection_transfers': 0,
            'mask_validations': 0
        }
--- a/vr180_streaming/streaming_processor.py
+++ b/vr180_streaming/streaming_processor.py
@@ -0,0 +1,418 @@
 """
 Main VR180 streaming processor - orchestrates all components for true streaming
 """
 import time
 import gc
 import json
 import psutil
 import torch
 import numpy as np
 from pathlib import Path
 from typing import Dict, Any, Optional, Tuple
 import warnings
 from .frame_reader import StreamingFrameReader
 from .frame_writer import StreamingFrameWriter
 from .stereo_manager import StereoConsistencyManager
 from .sam2_streaming_simple import SAM2StreamingProcessor
 from .detector import PersonDetector
 from .config import StreamingConfig
 class VR180StreamingProcessor:
    """Main processor for streaming VR180 human matting"""
    def __init__(self, config: StreamingConfig):
        self.config = config
        # Initialize components
        self.frame_reader = None
        self.frame_writer = None
        self.stereo_manager = None
        self.sam2_processor = None
        self.detector = None
        # Processing state
        self.start_time = None
        self.frames_processed = 0
        self.checkpoint_state = {}
        # Performance monitoring
        self.process = psutil.Process()
        self.performance_stats = {
            'fps': 0.0,
            'avg_frame_time': 0.0,
            'peak_memory_gb': 0.0,
            'gpu_utilization': 0.0
        }
    def initialize(self) -> None:
        """Initialize all components"""
        print("\n🚀 Initializing VR180 Streaming Processor")
        print("=" * 60)
        # Initialize frame reader
        start_frame = self._load_checkpoint() if self.config.recovery.auto_resume else 0
        self.frame_reader = StreamingFrameReader(
            self.config.input.video_path,
            start_frame=start_frame
        )
        # Get video info
        video_info = self.frame_reader.get_video_info()
        # Apply scaling to dimensions
        scale = self.config.processing.scale_factor
        output_width = int(video_info['width'] * scale)
        output_height = int(video_info['height'] * scale)
        # Initialize frame writer
        self.frame_writer = StreamingFrameWriter(
            output_path=self.config.output.path,
            width=output_width,
            height=output_height,
            fps=video_info['fps'],
            audio_source=self.config.input.video_path if self.config.output.maintain_sbs else None,
            video_codec=self.config.output.video_codec,
            quality_preset=self.config.output.quality_preset,
            crf=self.config.output.crf
        )
        # Initialize stereo manager
        self.stereo_manager = StereoConsistencyManager(self.config.to_dict())
        # Initialize SAM2 processor
        self.sam2_processor = SAM2StreamingProcessor(self.config.to_dict())
        # Initialize detector
        self.detector = PersonDetector(self.config.to_dict())
        self.detector.warmup((output_height // 2, output_width // 2, 3))  # Warmup with single eye dims
        print("\n✅ All components initialized successfully!")
        print(f"   Input: {video_info['width']}x{video_info['height']} @ {video_info['fps']}fps")
        print(f"   Output: {output_width}x{output_height} @ {video_info['fps']}fps")
        print(f"   Scale factor: {scale}")
        print(f"   Starting from frame: {start_frame}")
        print("=" * 60 + "\n")
    def process_video(self) -> None:
        """Main processing loop"""
        try:
            self.initialize()
            self.start_time = time.time()
            # Simple SAM2 initialization (no complex state management needed)
            print("🎯 SAM2 streaming processor ready...")
            # Process first frame to establish detections
            print("🔍 Processing first frame for initial detection...")
            if not self._initialize_tracking():
                raise RuntimeError("Failed to initialize tracking - no persons detected")
            # Main streaming loop
            print("\n🎬 Starting streaming processing loop...")
            self._streaming_loop()
        except KeyboardInterrupt:
            print("\n⚠️  Processing interrupted by user")
            self._save_checkpoint()
        except Exception as e:
            print(f"\n❌ Error during processing: {e}")
            self._save_checkpoint()
            raise
        finally:
            self._finalize()
    def _initialize_tracking(self) -> bool:
        """Initialize tracking with first frame detection"""
        # Read and process first frame
        first_frame = self.frame_reader.read_frame()
        if first_frame is None:
            raise RuntimeError("Cannot read first frame")
        # Scale frame if needed
        if self.config.processing.scale_factor != 1.0:
            first_frame = self._scale_frame(first_frame)
        # Split into eyes
        left_eye, right_eye = self.stereo_manager.split_frame(first_frame)
        # Detect on master eye
        master_eye = left_eye if self.stereo_manager.master_eye == 'left' else right_eye
        detections = self.detector.detect_persons(master_eye)
        if not detections:
            warnings.warn("No persons detected in first frame")
            return False
        print(f"   Detected {len(detections)} person(s) in first frame")
        # Process with simple SAM2 approach
        left_masks = self.sam2_processor.add_frame_and_detections(left_eye, detections, 0)
        # Transfer detections to right eye
        transferred_detections = self.stereo_manager.transfer_detections(
            detections, 
            'left_to_right' if self.stereo_manager.master_eye == 'left' else 'right_to_left'
        )
        right_masks = self.sam2_processor.add_frame_and_detections(right_eye, transferred_detections, 0)
        # Apply masks and write
        processed_frame = self._apply_masks_to_frame(first_frame, left_masks, right_masks)
        self.frame_writer.write_frame(processed_frame)
        self.frames_processed = 1
        return True
    def _streaming_loop(self) -> None:
        """Main streaming processing loop"""
        frame_times = []
        last_log_time = time.time()
        # Start from frame 1 (already processed frame 0)
        for frame_idx, frame in enumerate(self.frame_reader, start=1):
            frame_start_time = time.time()
            # Scale frame if needed
            if self.config.processing.scale_factor != 1.0:
                frame = self._scale_frame(frame)
            # Split into eyes
            left_eye, right_eye = self.stereo_manager.split_frame(frame)
            # Check if we need to run detection for continuous correction
            detections = []
            if (self.config.matting.continuous_correction and 
                frame_idx % self.config.matting.correction_interval == 0):
                print(f"\n🔄 Running YOLO detection for correction at frame {frame_idx}")
                master_eye = left_eye if self.stereo_manager.master_eye == 'left' else right_eye
                detections = self.detector.detect_persons(master_eye)
                if detections:
                    print(f"   Detected {len(detections)} person(s) for correction")
                else:
                    print(f"   No persons detected for correction")
            # Process frames (with detections if this is a correction frame)
            left_masks = self.sam2_processor.add_frame_and_detections(left_eye, detections, frame_idx)
            # For right eye, transfer detections if we have them
            if detections:
                transferred_detections = self.stereo_manager.transfer_detections(
                    detections, 
                    'left_to_right' if self.stereo_manager.master_eye == 'left' else 'right_to_left'
                )
                right_masks = self.sam2_processor.add_frame_and_detections(right_eye, transferred_detections, frame_idx)
            else:
                right_masks = self.sam2_processor.add_frame_and_detections(right_eye, [], frame_idx)
            # Validate stereo consistency
            right_masks = self.stereo_manager.validate_masks(
                left_masks, right_masks, frame_idx
            )
            # Apply masks and write frame
            processed_frame = self._apply_masks_to_frame(frame, left_masks, right_masks)
            self.frame_writer.write_frame(processed_frame)
            # Update stats
            frame_time = time.time() - frame_start_time
            frame_times.append(frame_time)
            self.frames_processed += 1
            # Periodic logging and cleanup
            if frame_idx % self.config.performance.log_interval == 0:
                self._log_progress(frame_idx, frame_times)
                frame_times = frame_times[-100:]  # Keep only recent times
            # Checkpoint saving
            if (self.config.recovery.enable_checkpoints and 
                frame_idx % self.config.recovery.checkpoint_interval == 0):
                self._save_checkpoint()
            # Memory monitoring and cleanup
            if frame_idx % 50 == 0:
                self._monitor_and_cleanup()
            # Check max frames limit
            if (self.config.input.max_frames is not None and 
                self.frames_processed >= self.config.input.max_frames):
                print(f"\n✅ Reached max frames limit ({self.config.input.max_frames})")
                break
    def _scale_frame(self, frame: np.ndarray) -> np.ndarray:
        """Scale frame according to configuration"""
        scale = self.config.processing.scale_factor
        if scale == 1.0:
            return frame
        new_width = int(frame.shape[1] * scale)
        new_height = int(frame.shape[0] * scale)
        import cv2
        return cv2.resize(frame, (new_width, new_height), interpolation=cv2.INTER_AREA)
    def _apply_masks_to_frame(self, 
                            frame: np.ndarray,
                            left_masks: np.ndarray,
                            right_masks: np.ndarray) -> np.ndarray:
        """Apply masks to frame and combine results"""
        # Split frame
        left_eye, right_eye = self.stereo_manager.split_frame(frame)
        # Apply masks to each eye
        left_processed = self.sam2_processor.apply_mask_to_frame(
            left_eye, left_masks,
            output_format=self.config.output.format,
            background_color=self.config.output.background_color
        )
        right_processed = self.sam2_processor.apply_mask_to_frame(
            right_eye, right_masks,
            output_format=self.config.output.format,
            background_color=self.config.output.background_color
        )
        # Combine back to SBS
        if self.config.output.maintain_sbs:
            return self.stereo_manager.combine_frames(left_processed, right_processed)
        else:
            # Return just left eye for non-SBS output
            return left_processed
    def _apply_continuous_correction(self,
                                   left_eye: np.ndarray,
                                   right_eye: np.ndarray,
                                   frame_idx: int) -> None:
        """Apply continuous correction to maintain tracking accuracy"""
        print(f"\n🔄 Applying continuous correction at frame {frame_idx}")
        # Detect on master eye and add fresh detections
        master_eye = left_eye if self.stereo_manager.master_eye == 'left' else right_eye
        detections = self.detector.detect_persons(master_eye)
        if detections:
            print(f"   Adding {len(detections)} fresh detection(s) for correction")
            # Add fresh detections to help correct drift
            self.sam2_processor.add_frame_and_detections(master_eye, detections, frame_idx)
        # Transfer corrections to slave eye
        # Note: This is simplified - actual implementation would transfer the refined prompts
    def _log_progress(self, frame_idx: int, frame_times: list) -> None:
        """Log processing progress"""
        elapsed = time.time() - self.start_time
        avg_frame_time = np.mean(frame_times) if frame_times else 0
        fps = 1.0 / avg_frame_time if avg_frame_time > 0 else 0
        # Memory stats
        memory_info = self.process.memory_info()
        memory_gb = memory_info.rss / (1024**3)
        # GPU stats if available
        gpu_stats = self.sam2_processor.get_memory_usage()
        # Progress percentage
        progress = self.frame_reader.get_progress()
        print(f"\n📊 Progress: Frame {frame_idx} ({progress:.1f}%)")
        print(f"   Speed: {fps:.1f} FPS (avg: {avg_frame_time*1000:.1f}ms/frame)")
        print(f"   Memory: {memory_gb:.1f}GB RAM", end="")
        if 'cuda_allocated_gb' in gpu_stats:
            print(f", {gpu_stats['cuda_allocated_gb']:.1f}GB VRAM")
        else:
            print()
        print(f"   Time elapsed: {elapsed/60:.1f} minutes")
        # Update performance stats
        self.performance_stats['fps'] = fps
        self.performance_stats['avg_frame_time'] = avg_frame_time
        self.performance_stats['peak_memory_gb'] = max(
            self.performance_stats['peak_memory_gb'], memory_gb
        )
    def _monitor_and_cleanup(self) -> None:
        """Monitor memory and perform cleanup if needed"""
        memory_info = self.process.memory_info()
        memory_gb = memory_info.rss / (1024**3)
        # Check if approaching limits
        if memory_gb > self.config.hardware.max_ram_gb * 0.8:
            print(f"\n⚠️  High memory usage ({memory_gb:.1f}GB) - running cleanup")
            gc.collect()
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
                torch.cuda.synchronize()
    def _save_checkpoint(self) -> None:
        """Save processing checkpoint"""
        if not self.config.recovery.enable_checkpoints:
            return
        checkpoint_dir = Path(self.config.recovery.checkpoint_dir)
        checkpoint_dir.mkdir(exist_ok=True)
        checkpoint_file = checkpoint_dir / f"{Path(self.config.output.path).stem}_checkpoint.json"
        checkpoint_data = {
            'frame_index': self.frames_processed,
            'timestamp': time.time(),
            'input_video': self.config.input.video_path,
            'output_video': self.config.output.path,
            'config': self.config.to_dict()
        }
        with open(checkpoint_file, 'w') as f:
            json.dump(checkpoint_data, f, indent=2)
        print(f"💾 Checkpoint saved at frame {self.frames_processed}")
    def _load_checkpoint(self) -> int:
        """Load checkpoint if exists"""
        checkpoint_dir = Path(self.config.recovery.checkpoint_dir)
        checkpoint_file = checkpoint_dir / f"{Path(self.config.output.path).stem}_checkpoint.json"
        if checkpoint_file.exists():
            with open(checkpoint_file, 'r') as f:
                checkpoint_data = json.load(f)
            if checkpoint_data['input_video'] == self.config.input.video_path:
                start_frame = checkpoint_data['frame_index']
                print(f"📂 Found checkpoint - resuming from frame {start_frame}")
                return start_frame
        return 0
    def _finalize(self) -> None:
        """Finalize processing and cleanup"""
        print("\n🏁 Finalizing processing...")
        # Close components
        if self.frame_writer:
            self.frame_writer.close()
        if self.frame_reader:
            self.frame_reader.close()
        if self.sam2_processor:
            self.sam2_processor.cleanup()
        # Print final statistics
        if self.start_time:
            total_time = time.time() - self.start_time
            print(f"\n📈 Final Statistics:")
            print(f"   Total frames: {self.frames_processed}")
            print(f"   Total time: {total_time/60:.1f} minutes")
            print(f"   Average FPS: {self.frames_processed/total_time:.1f}")
            print(f"   Peak memory: {self.performance_stats['peak_memory_gb']:.1f}GB")
            # Stereo consistency stats
            stereo_stats = self.stereo_manager.get_stats()
            print(f"\n👀 Stereo Consistency:")
            print(f"   Corrections applied: {stereo_stats['corrections_applied']}")
            print(f"   Correction rate: {stereo_stats['correction_rate']*100:.1f}%")
        print("\n✅ Processing complete!")
--- a/vr180_streaming/timeout_init.py
+++ b/vr180_streaming/timeout_init.py
@@ -0,0 +1,45 @@
 """
 Timeout wrapper for SAM2 initialization to prevent hanging
 """
 import signal
 import functools
 from typing import Any, Callable
 class TimeoutError(Exception):
    pass
 def timeout(seconds: int = 120):
    """Decorator to add timeout to function calls"""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            # Define signal handler
            def timeout_handler(signum, frame):
                raise TimeoutError(f"Function {func.__name__} timed out after {seconds} seconds")
            # Set signal handler
            old_handler = signal.signal(signal.SIGALRM, timeout_handler)
            signal.alarm(seconds)
            try:
                result = func(*args, **kwargs)
            finally:
                # Restore old handler
                signal.alarm(0)
                signal.signal(signal.SIGALRM, old_handler)
            return result
        return wrapper
    return decorator
@timeout(120)  # 2 minute timeout
 def safe_init_state(predictor, video_path: str, **kwargs) -> Any:
    """Safely initialize SAM2 state with timeout"""
    return predictor.init_state(
        video_path=video_path,
        **kwargs
    )
Author	SHA1	Message	Date
Scott Register	c1aa11e5a0	idk	2025-07-27 10:37:40 -07:00
Scott Register	f0cf3341af	amp	2025-07-27 10:23:25 -07:00
Scott Register	ee330fa322	exccept	2025-07-27 10:20:25 -07:00
Scott Register	1e9c42adbd	fix streaming	2025-07-27 10:16:39 -07:00
Scott Register	9cc755b5c7	cupy and mask	2025-07-27 10:10:00 -07:00
Scott Register	300ae5613e	fucking llms	2025-07-27 10:01:12 -07:00
Scott Register	a479d6a5f0	wtf	2025-07-27 09:57:42 -07:00
Scott Register	e38f63f539	simplify2	2025-07-27 09:55:52 -07:00
Scott Register	66895a87a0	simplify	2025-07-27 09:52:56 -07:00
Scott Register	43be574729	debug	2025-07-27 09:26:47 -07:00
Scott Register	9b7f36fec2	bullshit	2025-07-27 09:23:15 -07:00
Scott Register	7b3ffb7830	idk	2025-07-27 09:20:42 -07:00
Scott Register	1d15fb5bc8	please fucking work	2025-07-27 09:15:48 -07:00
Scott Register	2e5ded7dbf	fix api	2025-07-27 09:04:40 -07:00
Scott Register	3a59e87f3e	fix something	2025-07-27 08:58:43 -07:00
Scott Register	abc48604a1	timeout init	2025-07-27 08:55:42 -07:00
Scott Register	ee80ed28b6	add stuff true streaming	2025-07-27 08:54:19 -07:00
Scott Register	b5eae7b41d	pytorch shit	2025-07-27 08:40:59 -07:00
Scott Register	4cc14bc0a9	nvenc	2025-07-27 08:34:57 -07:00
Scott Register	9faaf4ed57	more stuff	2025-07-27 08:19:42 -07:00
Scott Register	7431954482	add main	2025-07-27 08:15:56 -07:00
Scott Register	f0208f0983	fixup some running stuff	2025-07-27 08:10:20 -07:00
Scott Register	4b058c2405	streaming part1	2025-07-27 08:01:08 -07:00