Plan: Separate Left/Right Eye Processing for VR180 SAM2 Pipeline
Overview
Implement a new processing mode that splits VR180 side-by-side frames into separate left and right halves, processes each eye independently through SAM2, then recombines them into the final output. This should improve tracking accuracy by removing parallax confusion between eyes.
Key Changes Required
1. Configuration Updates
File: config.yaml
- Add new configuration option:
  - `processing.separate_eye_processing: false` (default off for backward compatibility)
- Add related options (example below):
  - `processing.enable_greenscreen_fallback: true` (render full green if no humans detected)
  - `processing.eye_overlap_pixels: 0` (optional overlap for blending)
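For reference, the new block in config.yaml might look like this (key names and defaults as listed above; the `processing:` nesting is an assumption about the existing config layout):

```yaml
processing:
  separate_eye_processing: false     # off by default for backward compatibility
  enable_greenscreen_fallback: true  # render full green if no humans detected
  eye_overlap_pixels: 0              # optional overlap for mask blending
```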
2. Core SAM2 Processor Enhancements
File: core/sam2_processor.py
New Methods:
- `split_frame_into_eyes(frame) -> (left_frame, right_frame)`
- `split_video_into_eyes(video_path, left_output, right_output, scale)`
- `process_single_eye_segment(segment_info, eye_side, yolo_prompts, previous_masks, inference_scale)`
- `combine_eye_masks(left_masks, right_masks, full_frame_shape) -> combined_masks`
- `create_greenscreen_segment(segment_info, duration_seconds) -> bool`
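A minimal sketch of the two geometry helpers above, assuming the standard VR180 side-by-side layout (left eye in the left half) and, for simplicity, a single 2D mask per eye; the real methods would likely operate on per-frame mask dictionaries:

```python
import numpy as np

def split_frame_into_eyes(frame):
    """Split a side-by-side VR180 frame into (left, right) halves."""
    half = frame.shape[1] // 2
    return frame[:, :half], frame[:, half:]

def combine_eye_masks(left_masks, right_masks, full_frame_shape):
    """Place per-eye masks back into a single full-frame mask.

    A missing eye (None) stays empty, which downstream code can render
    as greenscreen for that half.
    """
    height, width = full_frame_shape[:2]
    combined = np.zeros((height, width), dtype=np.uint8)
    if left_masks is not None:
        combined[:, : width // 2] = left_masks
    if right_masks is not None:
        combined[:, width // 2 :] = right_masks
    return combined
```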
Modified Methods:
- `process_single_segment()` - Add branch for separate eye processing mode
- New processing flow:
  - Check if `separate_eye_processing` is enabled
  - If enabled: split segment video into left/right eye videos
  - Process each eye independently with SAM2
  - Combine masks back to full frame format
  - If fallback needed: create full greenscreen segment
3. YOLO Detector Enhancements
File: core/yolo_detector.py
New Methods:
- `detect_humans_in_single_eye(frame, eye_side) -> List[Dict]`
- `convert_eye_detections_to_sam2_prompts(detections, eye_side) -> List[Dict]`
- `has_any_detections(detections_list) -> bool`
Modified Methods:
- `detect_humans_in_video_first_frame()` - Add eye-specific detection support
- Object ID assignment: always use `obj_id=1` for single-eye processing, since each eye is processed independently (see the sketch below)
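An illustrative sketch of the detection-to-prompt conversion; the `'bbox'` key and the explicit `full_frame_width` parameter are assumptions of this sketch (the planned signature omits the width, presumably reading it from detector state):

```python
from typing import Dict, List

def convert_eye_detections_to_sam2_prompts(
        detections: List[Dict], eye_side: str, full_frame_width: int) -> List[Dict]:
    """Map full-frame detections into eye-local SAM2 box prompts."""
    half = full_frame_width // 2
    prompts = []
    for det in detections:
        x1, y1, x2, y2 = det['bbox']
        if eye_side == 'right':
            # Shift full-frame x coordinates into the right-eye crop.
            x1, x2 = x1 - half, x2 - half
        prompts.append({
            'obj_id': 1,  # per the plan: always obj_id=1 in single-eye mode
            'box': [x1, y1, x2, y2],
        })
    return prompts
```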
4. Mask Processor Updates
File: core/mask_processor.py
New Methods:
- `create_full_greenscreen_frame(frame_shape) -> np.ndarray` (sketched below)
- `process_greenscreen_only_segment(segment_info, frame_count) -> bool`
Modified Methods:
- `apply_green_mask()` - Handle combined eye masks properly
- Add support for the full-greenscreen fallback when no humans are detected
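A minimal sketch of the greenscreen frame helper, assuming OpenCV-style BGR channel ordering:

```python
import numpy as np

# Chroma-key green in OpenCV's BGR ordering (an assumption of this sketch).
GREEN_BGR = (0, 255, 0)

def create_full_greenscreen_frame(frame_shape):
    """Return a solid chroma-green frame matching frame_shape (h, w[, c])."""
    height, width = frame_shape[:2]
    frame = np.zeros((height, width, 3), dtype=np.uint8)
    frame[:] = GREEN_BGR
    return frame
```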
5. Main Pipeline Integration
File: main.py
Processing Flow Changes:
```python
# For each segment:
if config.get('processing.separate_eye_processing', False):
    # 1. Run YOLO on the full frame to check for ANY human presence
    full_frame_detections = detector.detect_humans_in_video_first_frame(segment_video)
    if not full_frame_detections:
        # No humans detected anywhere - create full greenscreen segment
        success = mask_processor.process_greenscreen_only_segment(
            segment_info, expected_frame_count)
        continue

    # 2. Split detections by eye and process separately
    left_detections = [d for d in full_frame_detections if is_in_left_half(d, frame_width)]
    right_detections = [d for d in full_frame_detections if is_in_right_half(d, frame_width)]

    # 3. Process left eye (if detections exist)
    left_masks = None
    if left_detections:
        left_eye_prompts = detector.convert_eye_detections_to_sam2_prompts(left_detections, 'left')
        left_masks = sam2_processor.process_single_eye_segment(
            segment_info, 'left', left_eye_prompts, previous_left_masks, inference_scale)

    # 4. Process right eye (if detections exist)
    right_masks = None
    if right_detections:
        right_eye_prompts = detector.convert_eye_detections_to_sam2_prompts(right_detections, 'right')
        right_masks = sam2_processor.process_single_eye_segment(
            segment_info, 'right', right_eye_prompts, previous_right_masks, inference_scale)

    # 5. Combine masks back to full frame format
    # (explicit None checks: numpy arrays don't support bare truthiness)
    if left_masks is not None or right_masks is not None:
        combined_masks = sam2_processor.combine_eye_masks(left_masks, right_masks, full_frame_shape)
        # Continue with normal mask processing...
    else:
        # Neither eye had trackable humans - full greenscreen fallback
        success = mask_processor.process_greenscreen_only_segment(
            segment_info, expected_frame_count)
else:
    # Original processing mode (current behavior)
    pass  # ... existing logic unchanged
```
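The `is_in_left_half` / `is_in_right_half` helpers referenced above could be as simple as a bounding-box center test (a sketch; the `'bbox'` key is an assumed detection schema, and a center test assigns a seam-straddling detection to exactly one eye):

```python
def is_in_left_half(detection, frame_width):
    """True if the detection's bbox center falls in the left eye's half."""
    x1, _, x2, _ = detection['bbox']
    return (x1 + x2) / 2 < frame_width / 2

def is_in_right_half(detection, frame_width):
    return not is_in_left_half(detection, frame_width)
```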
6. File Structure Changes
New Files:
- `core/eye_processor.py` - Dedicated class for eye-specific operations
- `utils/video_utils.py` - Video manipulation utilities (splitting, combining)
Modified Files:
- All core processing modules as detailed above
- Update logging to distinguish left/right eye processing
- Update debug frame generation for eye-specific visualization
7. Debug and Monitoring Enhancements
Debug Outputs:
- `left_eye_debug.jpg` - Left eye YOLO detections
- `right_eye_debug.jpg` - Right eye YOLO detections
- `left_eye_sam2_masks.jpg` - Left eye SAM2 results
- `right_eye_sam2_masks.jpg` - Right eye SAM2 results
- `combined_masks_debug.jpg` - Final combined result
Logging Enhancements:
- Clear distinction between left/right eye processing stages
- Per-eye performance metrics
- Fallback trigger logging when no humans detected
8. Performance Considerations
Optimizations:
- Parallel Processing: Process left and right eyes simultaneously using threading (see the sketch after this list)
- Selective Processing: Skip SAM2 for eyes with no YOLO detections
- Memory Management: Clean up intermediate eye videos promptly
- Caching: Cache split eye videos if processing multiple segments
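One possible shape for the parallel-processing optimization, using a two-worker thread pool (a sketch; on a single GPU, SAM2 inference may largely serialize, so the threads mainly overlap I/O and pre/post-processing):

```python
from concurrent.futures import ThreadPoolExecutor

def process_eyes_in_parallel(sam2_processor, segment_info, prompts_by_eye,
                             previous_masks_by_eye, inference_scale):
    """Run left/right eye SAM2 passes concurrently; skips eyes with no prompts."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            eye: pool.submit(
                sam2_processor.process_single_eye_segment,
                segment_info, eye, prompts_by_eye[eye],
                previous_masks_by_eye.get(eye), inference_scale)
            for eye in ('left', 'right') if prompts_by_eye.get(eye)
        }
        return {eye: future.result() for eye, future in futures.items()}
```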
Resource Usage:
- Memory: ~2x peak usage during eye processing (temporary)
- Storage: Temporary left/right eye videos (~1.5x original size)
- Compute: Potentially faster overall, since each eye pass processes half-width frames
9. Backward Compatibility
Default Behavior:
- `separate_eye_processing: false` by default
- Existing configurations work unchanged
- All current functionality preserved
Migration Path:
- Users can gradually test new mode on problematic segments
- Configuration flag allows easy A/B testing
- Existing debug outputs remain functional
10. Error Handling and Fallbacks
Robust Error Recovery:
- If eye splitting fails → fall back to original processing
- If single-eye SAM2 fails → use greenscreen for that eye (see the sketch after this list)
- If both eyes fail → full greenscreen segment
- Comprehensive logging of all fallback triggers
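One way to express the per-eye fallback, sketched under the assumption that `process_single_eye_segment` raises on failure:

```python
import logging

logger = logging.getLogger(__name__)

def process_eye_with_fallback(sam2_processor, segment_info, eye_side,
                              prompts, previous_masks, inference_scale):
    """Run single-eye SAM2; on failure, return None so the caller renders
    greenscreen for that eye (and, if both eyes fail, the whole segment)."""
    try:
        return sam2_processor.process_single_eye_segment(
            segment_info, eye_side, prompts, previous_masks, inference_scale)
    except Exception:
        logger.exception("SAM2 failed for %s eye; falling back to greenscreen",
                         eye_side)
        return None
```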
Quality Validation:
- Verify combined masks have plausible foreground pixel counts (see the sketch after this list)
- Check for mask alignment issues between eyes
- Validate segment completeness before marking done
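A hedged sketch of the pixel-count check; the thresholds are illustrative and would need tuning per content:

```python
import numpy as np

def masks_look_reasonable(combined_mask, min_fraction=0.0005, max_fraction=0.6):
    """Flag masks whose foreground coverage is implausibly small or large."""
    fraction = np.count_nonzero(combined_mask) / combined_mask.size
    return min_fraction <= fraction <= max_fraction
```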
Implementation Priority
Phase 1 (Core Functionality)
- Configuration schema updates
- Basic eye splitting and recombining logic
- Modified SAM2 processor with separate eye support
- Greenscreen fallback implementation
Phase 2 (Integration)
- Main pipeline integration with new processing mode
- YOLO detector eye-specific enhancements
- Mask processor updates for combined masks
- Basic error handling and fallbacks
Phase 3 (Polish)
- Performance optimizations (parallel processing)
- Enhanced debug outputs and logging
- Comprehensive testing and validation
- Documentation updates
Expected Benefits
Tracking Improvements:
- Eliminated Parallax Confusion: SAM2 processes a single viewpoint per eye
- Better Object Consistency: Single object tracking per eye view
- Improved Temporal Coherence: Less cross-eye interference
- Reduced False Positives: Eye-specific context for tracking
Operational Benefits:
- Graceful Degradation: Full greenscreen when no humans are detected
- Flexible Processing: Can enable/disable per pipeline
- Better Debug Visibility: Eye-specific debug outputs
- Performance Scalability: Smaller frames = faster processing per eye
This plan maintains full backward compatibility while adding the requested separate eye processing capability with robust fallback mechanisms.