# Plan: Separate Left/Right Eye Processing for VR180 SAM2 Pipeline

## Overview

Implement a new processing mode that splits VR180 side-by-side frames into separate left and right halves, processes each eye independently through SAM2, then recombines them into the final output. This should improve tracking accuracy by removing parallax confusion between eyes.

## Key Changes Required
### 1. Configuration Updates

**File: `config.yaml`**

- Add new configuration option: `processing.separate_eye_processing: false` (default off for backward compatibility)
- Add related options:
  - `processing.enable_greenscreen_fallback: true` (render a full greenscreen if no humans are detected)
  - `processing.eye_overlap_pixels: 0` (optional overlap for blending)
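
For orientation, a minimal sketch of how the pipeline might read these options, mirroring the `config.get(...)` calls used in the Section 5 flow below; the variable names here are illustrative only.

```python
# Illustrative only: defaults preserve current behavior when the keys are absent.
separate_eyes = config.get('processing.separate_eye_processing', False)
use_greenscreen_fallback = config.get('processing.enable_greenscreen_fallback', True)
eye_overlap_pixels = config.get('processing.eye_overlap_pixels', 0)
```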
### 2. Core SAM2 Processor Enhancements

**File: `core/sam2_processor.py`**

#### New Methods:

- `split_frame_into_eyes(frame) -> (left_frame, right_frame)`
- `split_video_into_eyes(video_path, left_output, right_output, scale)`
- `process_single_eye_segment(segment_info, eye_side, yolo_prompts, previous_masks, inference_scale)`
- `combine_eye_masks(left_masks, right_masks, full_frame_shape) -> combined_masks`
- `create_greenscreen_segment(segment_info, duration_seconds) -> bool`
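
A minimal sketch of the two geometry helpers, assuming frames are numpy arrays in H x W x C layout and per-eye masks are dicts of `obj_id` to half-width boolean arrays; the actual signatures in `core/sam2_processor.py` may differ.

```python
import numpy as np


def split_frame_into_eyes(frame):
    """Split a side-by-side VR180 frame (H x W x C) into left and right halves."""
    mid = frame.shape[1] // 2
    return frame[:, :mid], frame[:, mid:]


def combine_eye_masks(left_masks, right_masks, full_frame_shape):
    """Paste per-eye masks back into a single full-width boolean mask."""
    height, width = full_frame_shape[:2]
    mid = width // 2
    combined = np.zeros((height, width), dtype=bool)
    for masks, columns in ((left_masks, slice(0, mid)), (right_masks, slice(mid, width))):
        if masks:
            for mask in masks.values():
                combined[:, columns] |= mask.astype(bool)
    return combined
```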
#### Modified Methods:

- `process_single_segment()` - Add branch for separate eye processing mode
- New processing flow:
  1. Check if `separate_eye_processing` is enabled
  2. If enabled: split the segment video into left/right eye videos
  3. Process each eye independently with SAM2
  4. Combine masks back to full frame format
  5. If fallback needed: create a full greenscreen segment
### 3. YOLO Detector Enhancements

**File: `core/yolo_detector.py`**

#### New Methods:

- `detect_humans_in_single_eye(frame, eye_side) -> List[Dict]`
- `convert_eye_detections_to_sam2_prompts(detections, eye_side) -> List[Dict]`
- `has_any_detections(detections_list) -> bool`

#### Modified Methods:

- `detect_humans_in_video_first_frame()` - Add eye-specific detection support
- Object ID assignment: Always use `obj_id=1` for single-eye processing (since each eye is processed independently)
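
A possible shape for the prompt conversion, assuming detections carry full-frame pixel boxes under a `bbox` key and that SAM2 prompts are dicts with `obj_id` and `box` entries; a `frame_width` argument is added here for illustration and is not part of the signature listed above.

```python
def convert_eye_detections_to_sam2_prompts(detections, eye_side, frame_width):
    """Shift full-frame YOLO boxes into eye-local coordinates for SAM2 prompting."""
    offset = 0 if eye_side == 'left' else frame_width // 2
    prompts = []
    for detection in detections:
        x1, y1, x2, y2 = detection['bbox']  # assumed full-frame pixel coordinates
        prompts.append({
            'obj_id': 1,  # per the plan, single-eye processing always uses obj_id=1
            'box': [x1 - offset, y1, x2 - offset, y2],
        })
    return prompts
```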
### 4. Mask Processor Updates

**File: `core/mask_processor.py`**

#### New Methods:

- `create_full_greenscreen_frame(frame_shape) -> np.ndarray`
- `process_greenscreen_only_segment(segment_info, frame_count) -> bool`

#### Modified Methods:

- `apply_green_mask()` - Handle combined eye masks properly
- Add support for full-greenscreen fallback when no humans detected
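
A minimal sketch of the greenscreen helper; the exact green value and channel order should match whatever `apply_green_mask()` already uses.

```python
import numpy as np


def create_full_greenscreen_frame(frame_shape):
    """Return a solid chroma-green frame matching the source resolution."""
    height, width = frame_shape[:2]
    frame = np.zeros((height, width, 3), dtype=np.uint8)
    frame[:, :] = (0, 255, 0)  # assumed BGR green; align with the existing mask processor
    return frame
```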
### 5. Main Pipeline Integration

**File: `main.py`**

#### Processing Flow Changes:

```python
# For each segment:
if config.get('processing.separate_eye_processing', False):
    # 1. Run YOLO on full frame to check for ANY human presence
    full_frame_detections = detector.detect_humans_in_video_first_frame(segment_video)

    if not full_frame_detections:
        # No humans detected anywhere - create full greenscreen segment
        success = mask_processor.process_greenscreen_only_segment(segment_info, expected_frame_count)
        continue

    # 2. Split detections by eye and process separately
    left_detections = [d for d in full_frame_detections if is_in_left_half(d, frame_width)]
    right_detections = [d for d in full_frame_detections if is_in_right_half(d, frame_width)]

    # 3. Process left eye (if detections exist)
    left_masks = None
    if left_detections:
        left_eye_prompts = detector.convert_eye_detections_to_sam2_prompts(left_detections, 'left')
        left_masks = sam2_processor.process_single_eye_segment(segment_info, 'left', left_eye_prompts, previous_left_masks, inference_scale)

    # 4. Process right eye (if detections exist)
    right_masks = None
    if right_detections:
        right_eye_prompts = detector.convert_eye_detections_to_sam2_prompts(right_detections, 'right')
        right_masks = sam2_processor.process_single_eye_segment(segment_info, 'right', right_eye_prompts, previous_right_masks, inference_scale)

    # 5. Combine masks back to full frame format
    if left_masks is not None or right_masks is not None:
        combined_masks = sam2_processor.combine_eye_masks(left_masks, right_masks, full_frame_shape)
        # Continue with normal mask processing...
    else:
        # Neither eye had trackable humans - full greenscreen fallback
        success = mask_processor.process_greenscreen_only_segment(segment_info, expected_frame_count)

else:
    # Original processing mode (current behavior)
    ...  # existing logic unchanged
```
### 6. File Structure Changes

#### New Files:

- `core/eye_processor.py` - Dedicated class for eye-specific operations
- `utils/video_utils.py` - Video manipulation utilities (splitting, combining)
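
As a sketch of the splitting utility in `utils/video_utils.py`, assuming ffmpeg is available on the PATH; the `scale` argument from the planned `split_video_into_eyes()` signature and any encoder settings are omitted here.

```python
import subprocess


def split_video_into_eyes(video_path, left_output, right_output):
    """Crop a side-by-side VR180 video into separate left/right eye videos with ffmpeg."""
    crops = {
        left_output: "crop=iw/2:ih:0:0",      # left half of each frame
        right_output: "crop=iw/2:ih:iw/2:0",  # right half of each frame
    }
    for output_path, crop_filter in crops.items():
        subprocess.run(
            ["ffmpeg", "-y", "-i", video_path, "-filter:v", crop_filter, "-an", output_path],
            check=True,
        )
```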
#### Modified Files:

- All core processing modules as detailed above
- Update logging to distinguish left/right eye processing
- Update debug frame generation for eye-specific visualization
### 7. Debug and Monitoring Enhancements

#### Debug Outputs:

- `left_eye_debug.jpg` - Left eye YOLO detections
- `right_eye_debug.jpg` - Right eye YOLO detections
- `left_eye_sam2_masks.jpg` - Left eye SAM2 results
- `right_eye_sam2_masks.jpg` - Right eye SAM2 results
- `combined_masks_debug.jpg` - Final combined result
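
One way the per-eye YOLO debug frames could be written, assuming OpenCV is already a dependency and detections expose a `bbox` key; filenames follow the list above.

```python
import cv2


def save_eye_debug_frame(frame, detections, eye_side, output_dir):
    """Draw YOLO boxes on an eye frame and write e.g. left_eye_debug.jpg."""
    annotated = frame.copy()
    for detection in detections:
        x1, y1, x2, y2 = map(int, detection['bbox'])  # 'bbox' key is an assumption
        cv2.rectangle(annotated, (x1, y1), (x2, y2), (0, 0, 255), 2)
    cv2.imwrite(f"{output_dir}/{eye_side}_eye_debug.jpg", annotated)
```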
#### Logging Enhancements:

- Clear distinction between left/right eye processing stages
- Per-eye performance metrics
- Fallback trigger logging when no humans are detected
### 8. Performance Considerations

#### Optimizations:

- **Parallel Processing**: Process left and right eyes simultaneously using threading (see the sketch at the end of this section)
- **Selective Processing**: Skip SAM2 for eyes with no YOLO detections
- **Memory Management**: Clean up intermediate eye videos promptly
- **Caching**: Cache split eye videos if processing multiple segments

#### Resource Usage:

- **Memory**: ~2x peak usage during eye processing (temporary)
- **Storage**: Temporary left/right eye videos (~1.5x original size)
- **Compute**: Potentially faster overall due to the smaller per-eye frames
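
A sketch of the parallel option using a thread pool, assuming `process_single_eye_segment()` is thread-safe and GPU memory allows two concurrent inference streams; on a single GPU the two passes may still need to run sequentially. The `prompts_by_eye` and `previous_masks_by_eye` dicts are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def process_both_eyes(sam2_processor, segment_info, prompts_by_eye, previous_masks_by_eye, inference_scale):
    """Run the left and right eye SAM2 passes concurrently, skipping eyes without prompts."""
    def run(eye_side):
        prompts = prompts_by_eye.get(eye_side)
        if not prompts:
            return None  # selective processing: no YOLO detections in this eye
        return sam2_processor.process_single_eye_segment(
            segment_info, eye_side, prompts, previous_masks_by_eye.get(eye_side), inference_scale)

    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(run, 'left')
        right_future = pool.submit(run, 'right')
        return left_future.result(), right_future.result()
```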
### 9. Backward Compatibility

#### Default Behavior:

- `separate_eye_processing: false` by default
- Existing configurations work unchanged
- All current functionality preserved

#### Migration Path:

- Users can gradually test the new mode on problematic segments
- The configuration flag allows easy A/B testing
- Existing debug outputs remain functional
### 10. Error Handling and Fallbacks

#### Robust Error Recovery:

- If eye splitting fails → fall back to original processing
- If single-eye SAM2 fails → use greenscreen for that eye
- If both eyes fail → full greenscreen segment
- Comprehensive logging of all fallback triggers

#### Quality Validation:

- Verify combined masks have reasonable pixel counts
- Check for mask alignment issues between eyes
- Validate segment completeness before marking done
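
One way the pixel-count check could look; the coverage thresholds are illustrative placeholders, not tuned values from the pipeline.

```python
import numpy as np


def validate_combined_mask(mask, min_coverage=0.001, max_coverage=0.9):
    """Flag masks that are implausibly empty or that cover almost the entire frame."""
    coverage = np.count_nonzero(mask) / mask.size
    return min_coverage <= coverage <= max_coverage
```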
## Implementation Priority

### Phase 1 (Core Functionality)

1. Configuration schema updates
2. Basic eye splitting and recombining logic
3. Modified SAM2 processor with separate eye support
4. Greenscreen fallback implementation

### Phase 2 (Integration)

1. Main pipeline integration with the new processing mode
2. YOLO detector eye-specific enhancements
3. Mask processor updates for combined masks
4. Basic error handling and fallbacks

### Phase 3 (Polish)

1. Performance optimizations (parallel processing)
2. Enhanced debug outputs and logging
3. Comprehensive testing and validation
4. Documentation updates
## Expected Benefits

### Tracking Improvements:

- **Eliminated Parallax Confusion**: SAM2 processes a single viewpoint per eye
- **Better Object Consistency**: Single object tracking per eye view
- **Improved Temporal Coherence**: Less cross-eye interference
- **Reduced False Positives**: Eye-specific context for tracking

### Operational Benefits:

- **Graceful Degradation**: Full greenscreen when no humans are detected
- **Flexible Processing**: The mode can be enabled or disabled per pipeline run
- **Better Debug Visibility**: Eye-specific debug outputs
- **Performance Scalability**: Smaller frames mean faster processing per eye
This plan maintains full backward compatibility while adding the requested separate eye processing capability with robust fallback mechanisms.