Compare commits

8 commits (ed08ef2b4b...)

Commits: b97a3752a7, 0057017ac4, 70044e1b10, 6617acb1c9, 02ad4d87d2, 97f12c79a4, cd7bc54efe, 46363a8a11

README.md (58 changed lines)
@@ -32,19 +32,40 @@ git clone <repository-url>
 cd samyolo_on_segments
 
 # Install Python dependencies
-pip install -r requirements.txt
+uv venv && source .venv/bin/activate
+uv pip install -r requirements.txt
 ```
 
-### Model Dependencies
+### Download Models
 
-You'll need to download the required model checkpoints:
+Use the provided script to automatically download all required models:
 
+```bash
+# Download SAM2.1 and YOLO models
+python download_models.py
+```
+
+This script will:
+- Create a `models/` directory structure
+- Download SAM2.1 configs and checkpoints (tiny, small, base+, large)
+- Download common YOLO models (yolov8n, yolov8s, yolov8m)
+- Update `config.yaml` to use local model paths
+
+**Manual Download (Alternative):**
 1. **SAM2 Models**: Download from [Meta's SAM2 repository](https://github.com/facebookresearch/sam2)
-2. **YOLO Models**: YOLOv8 models will be downloaded automatically or you can specify a custom path
+2. **YOLO Models**: YOLOv8 models will be downloaded automatically on first use
 
 ## Quick Start
 
-### 1. Configure the Pipeline
+### 1. Download Models
+
+First, download the required SAM2.1 and YOLO models:
+
+```bash
+python download_models.py
+```
+
+### 2. Configure the Pipeline
 
 Edit `config.yaml` to specify your input video and desired settings:
 
@@ -63,18 +84,18 @@ processing:
   detect_segments: "all"
 
 models:
-  yolo_model: "yolov8n.pt"
-  sam2_checkpoint: "../checkpoints/sam2.1_hiera_large.pt"
-  sam2_config: "configs/sam2.1/sam2.1_hiera_l.yaml"
+  yolo_model: "models/yolo/yolov8n.pt"
+  sam2_checkpoint: "models/sam2/checkpoints/sam2.1_hiera_large.pt"
+  sam2_config: "models/sam2/configs/sam2.1/sam2.1_hiera_l.yaml"
 ```
 
-### 2. Run the Pipeline
+### 3. Run the Pipeline
 
 ```bash
 python main.py --config config.yaml
 ```
 
-### 3. Monitor Progress
+### 4. Monitor Progress
 
 Check processing status:
 ```bash
@@ -166,8 +187,25 @@ samyolo_on_segments/
 ├── README.md               # This documentation
 ├── config.yaml              # Default configuration
 ├── main.py                  # Main entry point
+├── download_models.py       # Model download script
 ├── requirements.txt         # Python dependencies
 ├── spec.md                  # Detailed specification
+├── models/                  # Downloaded models (created by script)
+│   ├── sam2/
+│   │   ├── configs/sam2.1/          # SAM2.1 configuration files
+│   │   │   ├── sam2.1_hiera_t.yaml
+│   │   │   ├── sam2.1_hiera_s.yaml
+│   │   │   ├── sam2.1_hiera_b+.yaml
+│   │   │   └── sam2.1_hiera_l.yaml
+│   │   └── checkpoints/             # SAM2.1 model weights
+│   │       ├── sam2.1_hiera_tiny.pt
+│   │       ├── sam2.1_hiera_small.pt
+│   │       ├── sam2.1_hiera_base_plus.pt
+│   │       └── sam2.1_hiera_large.pt
+│   └── yolo/                # YOLO model weights
+│       ├── yolov8n.pt
+│       ├── yolov8s.pt
+│       └── yolov8m.pt
 ├── core/                    # Core processing modules
 │   ├── __init__.py
 │   ├── config_loader.py     # Configuration management
claude.md (new file, 230 lines)
@@ -0,0 +1,230 @@
# YOLO + SAM2 VR180 Video Processing Pipeline - LLM Guide

## Project Overview

This repository implements an automated video processing pipeline specifically designed for **VR180 side-by-side stereo videos**. The system detects and segments humans in video content, replacing backgrounds with green screen for post-production compositing. The pipeline is optimized for long VR videos by splitting them into manageable segments, processing each segment independently, and then reassembling the final output.

## Core Purpose

The primary goal is to automatically create green screen videos from VR180 content where:
- **Left eye view** (left half of frame) contains humans as Object 1 (green masks)
- **Right eye view** (right half of frame) contains humans as Object 2 (blue masks)
- Background is replaced with pure green (RGB: 0,255,0) for chroma keying
- Original audio is preserved throughout the process
- Processing handles videos of any length through segmentation

## Architecture Overview

### Pipeline Stages

1. **Video Segmentation** (`core/video_splitter.py`)
   - Splits long videos into 5-second segments using FFmpeg
   - Creates organized directory structure: `segment_0/`, `segment_1/`, etc.
   - Preserves timestamps and forces keyframes for clean cuts

2. **Human Detection** (`core/yolo_detector.py`)
   - Uses YOLOv8 for robust human detection in VR180 format
   - Supports both detection mode (bounding boxes) and segmentation mode (direct masks)
   - Automatically assigns humans to left/right eye based on position in frame
   - Saves detection results for reuse and debugging

3. **Mask Generation** (`core/sam2_processor.py`)
   - Uses Meta's SAM2 (Segment Anything Model 2) for precise segmentation
   - Propagates masks across all frames in each segment
   - Supports mask continuity between segments using previous segment's final masks
   - Handles VR180 stereo tracking with separate object IDs for each eye

4. **Green Screen Processing** (`core/mask_processor.py`)
   - Applies generated masks to isolate humans
   - Replaces background with green screen
   - Uses GPU acceleration (CuPy) for fast processing
   - Maintains original video quality and framerate

5. **Video Assembly** (`core/video_assembler.py`)
   - Concatenates all processed segments into final video
   - Preserves original audio track from input video
   - Uses hardware encoding (NVENC) when available

### Key Components

```
samyolo_on_segments/
├── main.py                  # Entry point - orchestrates the pipeline
├── config.yaml              # Configuration file (YAML format)
├── core/                    # Core processing modules
│   ├── config_loader.py     # Configuration management
│   ├── video_splitter.py    # FFmpeg-based video segmentation
│   ├── yolo_detector.py     # YOLO human detection (detection/segmentation modes)
│   ├── sam2_processor.py    # SAM2 mask generation and propagation
│   ├── mask_processor.py    # Green screen application
│   └── video_assembler.py   # Final video concatenation
├── utils/                   # Utility functions
│   ├── file_utils.py        # File system operations
│   ├── logging_utils.py     # Logging configuration
│   └── status_utils.py      # Progress monitoring
└── models/                  # Model storage (created by download_models.py)
    ├── sam2/                # SAM2 checkpoints and configs
    └── yolo/                # YOLO model weights
```

## VR180 Specific Features

### Stereo Video Handling
- Automatically detects humans in left and right eye views
- Assigns Object ID 1 to left eye humans (green masks)
- Assigns Object ID 2 to right eye humans (blue masks)
- Maintains stereo correspondence throughout segments

### Frame Division Logic
- Frame width is divided in half to separate left/right views
- Human detection centers are used to determine eye assignment
- If only one human is detected, it may be duplicated to both eyes (configurable)

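To make the assignment rule concrete, a minimal sketch that maps a detection's bounding-box center to the SAM2 object ID used for each eye; the helper name `assign_eye_object_id` is hypothetical, not a function from this repository:

```python
def assign_eye_object_id(bbox: list, frame_width: int) -> int:
    """Map a detection to a SAM2 object ID: 1 = left eye (green), 2 = right eye (blue).

    bbox is assumed to be [x1, y1, x2, y2] in full-frame VR180 coordinates.
    """
    center_x = (bbox[0] + bbox[2]) / 2
    return 1 if center_x < frame_width / 2 else 2
```
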
## Configuration System

The pipeline is controlled via `config.yaml` with these key sections:

### Essential Settings
```yaml
input:
  video_path: "/path/to/vr180_video.mp4"

output:
  directory: "/path/to/output/"
  filename: "greenscreen_output.mp4"

processing:
  segment_duration: 5      # Seconds per segment
  inference_scale: 0.5     # Scale for faster processing
  yolo_confidence: 0.6     # Detection threshold
  detect_segments: "all"   # Which segments to process

models:
  yolo_model: "models/yolo/yolov8n.pt"
  sam2_checkpoint: "models/sam2/checkpoints/sam2.1_hiera_large.pt"
  sam2_config: "models/sam2/configs/sam2.1/sam2.1_hiera_l.yaml"
```

### Advanced Options
- **YOLO Modes**: Switch between detection (bboxes) and segmentation (direct masks)
- **Mid-segment Detection**: Re-detect humans at intervals within segments
- **Mask Quality**: Temporal smoothing, morphological operations, edge refinement
- **Debug Outputs**: Save detection visualizations and first-frame masks

## Processing Flow

### For First Segment (segment_0):
1. Load first frame at inference scale
2. Run YOLO to detect humans
3. Convert detections to SAM2 prompts (or use YOLO masks directly)
4. Initialize SAM2 with prompts/masks
5. Propagate masks through all frames
6. Apply green screen and save output
7. Save final mask for next segment

### For Subsequent Segments:
1. Check if YOLO detection is requested for this segment
2. If yes: Use YOLO detection (same as first segment)
3. If no: Load previous segment's final mask
4. Initialize SAM2 with previous masks
5. Continue propagation through segment
6. Apply green screen and save output

### Fallback Logic:
- If no previous mask exists, searches backwards through segments
- First segment always requires YOLO detection
- Missing detections can be recovered in later segments

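The fallback rule can be made concrete with a small sketch; `find_previous_mask` is a hypothetical helper, and the `segment_<i>/mask.png` layout is assumed from the segment-directory and debug-output conventions described in this guide:

```python
from pathlib import Path
from typing import Optional


def find_previous_mask(output_dir: str, segment_idx: int) -> Optional[Path]:
    """Search backwards through earlier segments for the most recent mask.png."""
    for i in range(segment_idx - 1, -1, -1):
        candidate = Path(output_dir) / f"segment_{i}" / "mask.png"
        if candidate.exists():
            return candidate
    return None  # caller falls back to running YOLO detection on this segment
```
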
## Model Support

### YOLO Models
- **Detection**: yolov8n.pt, yolov8s.pt, yolov8m.pt (bounding boxes only)
- **Segmentation**: yolov8n-seg.pt, yolov8s-seg.pt (direct mask output)

### SAM2 Models
- **Tiny**: sam2.1_hiera_tiny.pt (fastest, lowest quality)
- **Small**: sam2.1_hiera_small.pt
- **Base+**: sam2.1_hiera_base_plus.pt
- **Large**: sam2.1_hiera_large.pt (best quality, slowest)

## Key Implementation Details

### GPU Optimization
- CUDA device selection with MPS fallback
- CuPy for GPU-accelerated mask operations
- NVENC hardware encoding support
- Batch processing where possible

### Memory Management
- Segments processed sequentially to limit memory usage
- Explicit garbage collection between segments
- Low-resolution inference with high-resolution rendering
- Configurable scale factors for different stages

### Error Handling
- Graceful fallback when masks are unavailable
- Segment-level recovery (can restart individual segments)
- Comprehensive logging at all stages
- Status checking and cleanup utilities

## Debugging Features

### Status Monitoring
```bash
python main.py --config config.yaml --status
```

### Segment Cleanup
```bash
python main.py --config config.yaml --cleanup-segment 5
```

### Debug Outputs
- `yolo_debug.jpg`: Bounding box visualizations
- `first_frame_detection.jpg`: Initial mask visualization
- `mask.png`: Final segment mask for continuity
- `yolo_detections`: Saved detection coordinates

## Common Issues and Solutions

### No Right Eye Detections in VR180
- Lower `yolo_confidence` threshold (try 0.3-0.4)
- Enable debug mode to analyze detection confidence
- Check if person is actually visible in right eye view

### Mask Propagation Failures
- Ensure first segment has successful YOLO detections
- Check previous segment's mask.png exists
- Consider re-running YOLO on problem segments

### Memory Issues
- Reduce `inference_scale` (try 0.25)
- Use smaller models (tiny/small variants)
- Process fewer segments at once

## Development Notes

### Adding Features
- All core modules inherit from base classes in `core/`
- Configuration is centralized through `ConfigLoader`
- Logging uses Python's standard logging module
- File operations go through `utils/file_utils.py`

### Testing Components
- Each module can be tested independently
- Use `--status` flag to check processing state
- Debug outputs help verify each stage

### Performance Tuning
- Adjust `inference_scale` for speed vs quality
- Use `detect_segments` to process only key frames
- Enable `use_nvenc` for hardware encoding
- Consider `vos_optimized` mode for SAM2 (experimental)

## Original Monolithic Script

The project includes the original working script in `spec.md` (lines 200-811) as a reference implementation. This script works but processes videos monolithically. The current modular architecture maintains the same core logic while adding:
- Better error handling and recovery
- Configurable processing pipeline
- Debug and monitoring capabilities
- Cleaner code organization
config.yaml (118 changed lines)
@@ -1,59 +1,137 @@
 # YOLO + SAM2 Video Processing Configuration
+# This file serves as a complete reference for all available settings.
 
 input:
+  # Full path to the input video file.
   video_path: "/path/to/input/video.mp4"
 
 output:
+  # Directory where all output files and segments will be stored.
   directory: "/path/to/output/"
+  # Filename for the final assembled video.
   filename: "processed_video.mp4"
 
 processing:
-  # Duration of each video segment in seconds
+  # Duration of each video segment in seconds. Shorter segments use less memory.
   segment_duration: 5
 
-  # Scale factor for SAM2 inference (0.5 = half resolution)
+  # Scale factor for SAM2 inference (e.g., 0.5 = half resolution).
+  # Lower values are faster but may reduce mask quality.
   inference_scale: 0.5
 
-  # YOLO detection confidence threshold
+  # YOLO detection confidence threshold (0.0 to 1.0).
   yolo_confidence: 0.6
 
-  # Which segments to run YOLO detection on
-  # Options: "all", [0, 5, 10], or [] for default (all)
+  # Which segments to run YOLO detection on.
+  # Options: "all", a list of specific segment indices (e.g., [0, 10, 20]), or [] for default ("all").
   detect_segments: "all"
 
-models:
-  # YOLO model path - can be pretrained (yolov8n.pt) or custom path
-  yolo_model: "yolov8n.pt"
-
-  # SAM2 model configuration
-  sam2_checkpoint: "../checkpoints/sam2.1_hiera_large.pt"
-  sam2_config: "configs/sam2.1/sam2.1_hiera_l.yaml"
+  # --- VR180 Stereo Processing ---
+  # Enables special logic for VR180 SBS video. When false, video is treated as a single view.
+  separate_eye_processing: false
+
+  # Threshold for stereo mask agreement (Intersection over Union).
+  # A value of 0.5 means masks must overlap by 50% to be considered a pair.
+  stereo_iou_threshold: 0.5
+
+  # Factor to reduce YOLO confidence by if no stereo pairs are found on the first try (e.g., 0.8 = 20% reduction).
+  confidence_reduction_factor: 0.8
+
+  # If no humans are detected in a segment, create a full green screen video.
+  # Only used when separate_eye_processing is true.
+  enable_greenscreen_fallback: true
+
+  # Pixel overlap between left/right eyes for smoother blending at the center seam.
+  eye_overlap_pixels: 0
+
+models:
+  # YOLO mode: "detection" (for bounding boxes) or "segmentation" (for direct masks).
+  # "segmentation" is generally recommended as it provides initial masks to SAM2.
+  yolo_mode: "segmentation"
+
+  # Path to the YOLO model for "detection" mode.
+  yolo_detection_model: "models/yolo/yolo11l.pt"
+  # Path to the YOLO model for "segmentation" mode.
+  yolo_segmentation_model: "models/yolo/yolo11x-seg.pt"
+
+  # --- SAM2 Model Configuration ---
+  sam2_checkpoint: "models/sam2/checkpoints/sam2.1_hiera_small.pt"
+  sam2_config: "models/sam2/configs/sam2.1/sam2.1_hiera_s.yaml"
+  # (Experimental) Use optimized VOS predictor for a significant speedup. Requires PyTorch 2.5.1+.
+  sam2_vos_optimized: false
 
 video:
-  # Use NVIDIA hardware encoding (requires NVENC-capable GPU)
+  # Use NVIDIA's NVENC for hardware-accelerated video encoding.
   use_nvenc: true
 
-  # Output video bitrate
+  # Bitrate for the output video (e.g., "25M", "50M").
   output_bitrate: "50M"
 
-  # Preserve original audio track
+  # If true, the audio track from the input video will be copied to the final output.
   preserve_audio: true
 
-  # Force keyframes for better segment boundaries
+  # Force keyframes at the start of each segment for clean cuts. Recommended to keep true.
   force_keyframes: true
 
 advanced:
-  # Green screen color (RGB values)
+  # RGB color for the green screen background.
   green_color: [0, 255, 0]
 
-  # Blue screen color for second object (RGB values)
+  # RGB color for the second object's mask (typically the right eye in VR180).
   blue_color: [255, 0, 0]
 
-  # YOLO human class ID (0 for COCO person class)
+  # The class ID for humans in the YOLO model (COCO default is 0 for "person").
   human_class_id: 0
 
-  # GPU memory management
+  # If true, deletes intermediate files like segment videos after processing.
   cleanup_intermediate_files: true
 
-  # Logging level (DEBUG, INFO, WARNING, ERROR)
+  # Logging level: DEBUG, INFO, WARNING, ERROR.
   log_level: "INFO"
+
+  # If true, saves debug images for YOLO detections.
+  save_yolo_debug_frames: true
+
+  # --- Mid-Segment Re-detection ---
+  # Re-run YOLO at intervals within a segment to correct tracking drift.
+  enable_mid_segment_detection: false
+  redetection_interval: 30            # Frames between re-detections.
+  max_redetections_per_segment: 10
+
+  # --- Parallel Processing Optimizations ---
+  # (Experimental) Generate low-res videos for upcoming segments in the background.
+  enable_background_lowres_generation: false
+  max_concurrent_lowres: 2            # Max parallel FFmpeg processes.
+  lowres_segments_ahead: 2            # How many segments to prepare in advance.
+  use_ffmpeg_lowres: true             # Use FFmpeg (faster) instead of OpenCV for low-res creation.
+
+  # --- Mask Quality Enhancement Settings ---
+  # These settings allow fine-tuning of the final mask appearance.
+  # Enabling these may increase processing time.
+  mask_processing:
+    # Edge feathering and blurring for smoother transitions.
+    enable_edge_blur: true
+    edge_blur_radius: 3
+    edge_blur_sigma: 0.5
+
+    # Temporal smoothing to reduce mask flickering between frames.
+    enable_temporal_smoothing: false
+    temporal_blend_weight: 0.2
+    temporal_history_frames: 2
+
+    # Clean up small noise and holes in the mask.
+    # Generally not needed when using SAM2, as its masks are high quality.
+    enable_morphological_cleaning: false
+    morphology_kernel_size: 5
+    min_component_size: 500
+
+    # Method for blending the mask edge with the background.
+    # Options: "linear" (fastest), "gaussian", "sigmoid".
+    alpha_blending_mode: "linear"
+    alpha_transition_width: 1
+
+    # Advanced edge-preserving smoothing filter. Slower but can produce higher quality edges.
+    enable_bilateral_filter: false
+    bilateral_d: 9
+    bilateral_sigma_color: 75
+    bilateral_sigma_space: 75
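For reference, `stereo_iou_threshold` is an Intersection-over-Union cutoff applied between left-eye and right-eye masks. A minimal sketch of how such an IoU is computed for two binary masks, written for illustration rather than taken from the repository's implementation:

```python
import numpy as np


def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over Union of two boolean masks of identical shape."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum()) / float(union)


# With stereo_iou_threshold: 0.5, two masks would be treated as a stereo pair
# only when mask_iou(a, b) >= 0.5.
```
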
core/__init__.py
@@ -1,2 +1,4 @@
 # YOLO + SAM2 Video Processing Pipeline
 # Core modules for video processing with human detection and segmentation
+
+from .eye_processor import EyeProcessor
core/async_lowres_preprocessor.py (new file, 337 lines)
@@ -0,0 +1,337 @@
"""
Async low-resolution video preprocessor for parallel processing optimization.
Creates low-resolution videos in background while main pipeline processes other segments.
"""

import os
import asyncio
import subprocess
import logging
import threading
from pathlib import Path
from typing import List, Dict, Any, Optional
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)


class AsyncLowResPreprocessor:
    """
    Handles async pre-generation of low-resolution videos for SAM2 inference.
    Uses FFmpeg subprocesses to bypass Python GIL limitations.
    """

    def __init__(self, max_concurrent: int = 3, segments_ahead: int = 3, use_ffmpeg: bool = True):
        """
        Initialize async preprocessor.

        Args:
            max_concurrent: Maximum number of concurrent FFmpeg processes
            segments_ahead: How many segments to prepare in advance
            use_ffmpeg: Use FFmpeg instead of OpenCV for better performance
        """
        self.max_concurrent = max_concurrent
        self.segments_ahead = segments_ahead
        self.use_ffmpeg = use_ffmpeg
        self.preparation_tasks = {}      # segment_idx -> threading.Thread
        self.completed_segments = set()  # Track completed preparations
        self.active_threads = []         # Track active background threads

        logger.info(f"AsyncLowResPreprocessor initialized: max_concurrent={max_concurrent}, "
                    f"segments_ahead={segments_ahead}, use_ffmpeg={use_ffmpeg}")

    async def create_lowres_ffmpeg(self, input_path: str, output_path: str, scale: float, semaphore: asyncio.Semaphore) -> bool:
        """
        Create low-resolution video using FFmpeg (bypasses Python GIL).

        Args:
            input_path: Path to input video
            output_path: Path to output low-res video
            scale: Scale factor for resolution reduction
            semaphore: Asyncio semaphore for limiting concurrent processes

        Returns:
            True if successful
        """
        async with semaphore:  # Limit concurrent FFmpeg processes
            try:
                # Ensure output directory exists
                os.makedirs(os.path.dirname(output_path), exist_ok=True)

                # FFmpeg command for fast low-res video creation
                cmd = [
                    'ffmpeg', '-y',            # Overwrite output
                    '-i', input_path,
                    '-vf', f'scale=iw*{scale}:ih*{scale}',
                    '-c:v', 'libx264',
                    '-preset', 'ultrafast',    # Fastest encoding
                    '-crf', '28',              # Lower quality OK for inference
                    '-an',                     # No audio needed for inference
                    output_path
                ]

                logger.debug(f"Starting FFmpeg low-res creation: {os.path.basename(input_path)} -> {os.path.basename(output_path)}")

                # Run FFmpeg asynchronously
                proc = await asyncio.create_subprocess_exec(
                    *cmd,
                    stdout=asyncio.subprocess.DEVNULL,
                    stderr=asyncio.subprocess.PIPE
                )

                stdout, stderr = await proc.wait(), await proc.communicate()

                if proc.returncode != 0:
                    stderr_text = stderr[1].decode() if stderr and len(stderr) > 1 else "Unknown error"
                    logger.error(f"FFmpeg failed for {input_path}: {stderr_text}")
                    return False

                # Verify output file was created
                if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
                    logger.error(f"FFmpeg output file missing or empty: {output_path}")
                    return False

                logger.debug(f"FFmpeg low-res creation completed: {os.path.basename(output_path)}")
                return True

            except Exception as e:
                logger.error(f"Error in FFmpeg low-res creation for {input_path}: {e}")
                return False

    def create_lowres_opencv(self, input_path: str, output_path: str, scale: float) -> bool:
        """
        Fallback: Create low-resolution video using OpenCV (blocking operation).
        Used when FFmpeg is not available or fails.

        Args:
            input_path: Path to input video
            output_path: Path to output low-res video
            scale: Scale factor for resolution reduction

        Returns:
            True if successful
        """
        try:
            import cv2

            logger.debug(f"Creating low-res video with OpenCV: {os.path.basename(input_path)}")

            cap = cv2.VideoCapture(input_path)
            if not cap.isOpened():
                logger.error(f"Could not open video with OpenCV: {input_path}")
                return False

            # Get video properties
            frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) * scale)
            frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) * scale)
            fps = cap.get(cv2.CAP_PROP_FPS) or 30.0

            # Ensure output directory exists
            os.makedirs(os.path.dirname(output_path), exist_ok=True)

            # Create video writer
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))

            if not out.isOpened():
                logger.error(f"Could not create video writer for: {output_path}")
                cap.release()
                return False

            # Process frames
            frame_count = 0
            while True:
                ret, frame = cap.read()
                if not ret:
                    break

                # Resize frame
                low_res_frame = cv2.resize(frame, (frame_width, frame_height),
                                           interpolation=cv2.INTER_LINEAR)
                out.write(low_res_frame)
                frame_count += 1

            # Cleanup
            cap.release()
            out.release()

            logger.debug(f"OpenCV low-res creation completed: {frame_count} frames -> {os.path.basename(output_path)}")
            return True

        except Exception as e:
            logger.error(f"Error in OpenCV low-res creation for {input_path}: {e}")
            return False

    async def create_lowres_video_async(self, input_path: str, output_path: str, scale: float, semaphore: asyncio.Semaphore) -> bool:
        """
        Create low-resolution video using the configured method (FFmpeg or OpenCV).

        Args:
            input_path: Path to input video
            output_path: Path to output low-res video
            scale: Scale factor for resolution reduction
            semaphore: Asyncio semaphore for limiting concurrent processes

        Returns:
            True if successful
        """
        # Skip if already exists
        if os.path.exists(output_path) and os.path.getsize(output_path) > 0:
            logger.debug(f"Low-res video already exists: {os.path.basename(output_path)}")
            return True

        if self.use_ffmpeg:
            # Try FFmpeg first
            success = await self.create_lowres_ffmpeg(input_path, output_path, scale, semaphore)
            if success:
                return True

            logger.warning(f"FFmpeg failed for {input_path}, falling back to OpenCV")

        # Fallback to OpenCV (run in thread pool to avoid blocking)
        loop = asyncio.get_event_loop()
        with ThreadPoolExecutor(max_workers=1) as executor:
            success = await loop.run_in_executor(
                executor, self.create_lowres_opencv, input_path, output_path, scale
            )

        return success

    async def prepare_segment_lowres(self, segment_info: Dict[str, Any], scale: float,
                                     separate_eye_processing: bool = False, semaphore: asyncio.Semaphore = None) -> bool:
        """
        Prepare low-resolution videos for a segment (regular or eye-specific).

        Args:
            segment_info: Segment information dictionary
            scale: Scale factor for resolution reduction
            separate_eye_processing: Whether to prepare eye-specific videos
            semaphore: Asyncio semaphore for limiting concurrent processes

        Returns:
            True if all videos were prepared successfully
        """
        segment_idx = segment_info['index']
        segment_dir = segment_info['directory']

        try:
            if separate_eye_processing:
                # Prepare low-res videos for left and right eyes
                success_left = success_right = True

                left_eye_path = os.path.join(segment_dir, "left_eye.mp4")
                right_eye_path = os.path.join(segment_dir, "right_eye.mp4")

                if os.path.exists(left_eye_path):
                    lowres_left_path = os.path.join(segment_dir, "low_res_left_eye_video.mp4")
                    success_left = await self.create_lowres_video_async(left_eye_path, lowres_left_path, scale, semaphore)

                if os.path.exists(right_eye_path):
                    lowres_right_path = os.path.join(segment_dir, "low_res_right_eye_video.mp4")
                    success_right = await self.create_lowres_video_async(right_eye_path, lowres_right_path, scale, semaphore)

                success = success_left and success_right
                if success:
                    logger.info(f"Pre-generated low-res eye videos for segment {segment_idx}")
                else:
                    logger.warning(f"Failed to pre-generate some eye videos for segment {segment_idx}")
            else:
                # Prepare regular low-res video
                input_path = segment_info['video_file']
                lowres_path = os.path.join(segment_dir, "low_res_video.mp4")

                success = await self.create_lowres_video_async(input_path, lowres_path, scale, semaphore)
                if success:
                    logger.info(f"Pre-generated low-res video for segment {segment_idx}")
                else:
                    logger.warning(f"Failed to pre-generate low-res video for segment {segment_idx}")

            if success:
                self.completed_segments.add(segment_idx)

            return success

        except Exception as e:
            logger.error(f"Error preparing low-res videos for segment {segment_idx}: {e}")
            return False

    def start_background_preparation(self, segments_info: List[Dict[str, Any]], scale: float,
                                     separate_eye_processing: bool = False, current_segment: int = 0):
        """
        Start preparing upcoming segments in background using threads.

        Args:
            segments_info: List of all segment information
            scale: Scale factor for resolution reduction
            separate_eye_processing: Whether to prepare eye-specific videos
            current_segment: Index of currently processing segment
        """
        def background_worker():
            """Background thread worker that prepares upcoming segments."""
            try:
                # Prepare segments ahead of current processing
                start_idx = current_segment + 1
                end_idx = min(len(segments_info), start_idx + self.segments_ahead)

                segments_to_prepare = []
                for i in range(start_idx, end_idx):
                    if i not in self.completed_segments and i not in self.preparation_tasks:
                        segments_to_prepare.append((i, segments_info[i]))

                if segments_to_prepare:
                    logger.info(f"Starting background preparation for {len(segments_to_prepare)} segments (indices {start_idx}-{end_idx-1})")

                    # Run async work in new event loop
                    loop = asyncio.new_event_loop()
                    asyncio.set_event_loop(loop)

                    try:
                        # Create semaphore in this event loop
                        semaphore = asyncio.Semaphore(self.max_concurrent)

                        tasks = []
                        for segment_idx, segment_info in segments_to_prepare:
                            task = self.prepare_segment_lowres(segment_info, scale, separate_eye_processing, semaphore)
                            tasks.append(task)

                        # Run all preparation tasks
                        results = loop.run_until_complete(asyncio.gather(*tasks, return_exceptions=True))

                        # Mark completed segments
                        for i, (segment_idx, _) in enumerate(segments_to_prepare):
                            if i < len(results) and results[i] is True:
                                self.completed_segments.add(segment_idx)
                                logger.debug(f"Background preparation completed for segment {segment_idx}")

                    finally:
                        loop.close()
                else:
                    logger.debug(f"No segments need preparation (current: {current_segment})")

            except Exception as e:
                logger.error(f"Error in background preparation worker: {e}")

        # Start background thread
        thread = threading.Thread(target=background_worker, daemon=True)
        thread.start()
        self.active_threads.append(thread)

    def is_segment_ready(self, segment_idx: int) -> bool:
        """
        Check if low-res videos for a segment are ready.

        Args:
            segment_idx: Index of segment to check

        Returns:
            True if segment is ready
        """
        return segment_idx in self.completed_segments

    def cleanup(self):
        """Clean up any running threads."""
        # Note: daemon threads will be cleaned up automatically when main process exits
        # We just clear our tracking structures
        self.active_threads.clear()
        self.preparation_tasks.clear()

        logger.debug("AsyncLowResPreprocessor cleanup completed")
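The diff does not show how `main.py` drives this class. A hedged usage sketch, assuming segment descriptors shaped like the dictionaries `prepare_segment_lowres` reads (`index`, `directory`, `video_file`) and a hypothetical `segment.mp4` filename:

```python
from core.async_lowres_preprocessor import AsyncLowResPreprocessor

# Hypothetical segment descriptors matching the keys read by prepare_segment_lowres.
segments_info = [
    {"index": i,
     "directory": f"/output/segment_{i}",
     "video_file": f"/output/segment_{i}/segment.mp4"}  # filename is an assumption
    for i in range(10)
]

prep = AsyncLowResPreprocessor(max_concurrent=2, segments_ahead=2, use_ffmpeg=True)
for idx in range(len(segments_info)):
    # Kick off background low-res generation for the next few segments...
    prep.start_background_preparation(segments_info, scale=0.5, current_segment=idx)
    # ...then process the current segment; is_segment_ready() tells the caller
    # whether a pre-generated low_res_video.mp4 can be reused instead of rebuilt.
    if prep.is_segment_ready(idx):
        pass  # use segment_<idx>/low_res_video.mp4 directly
prep.cleanup()
```
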
core/config_loader.py
@@ -50,11 +50,31 @@ class ConfigLoader:
                 raise ValueError(f"Missing required field: output.{field}")
 
         # Validate models section
-        required_model_fields = ['yolo_model', 'sam2_checkpoint', 'sam2_config']
+        required_model_fields = ['sam2_checkpoint', 'sam2_config']
         for field in required_model_fields:
             if field not in self.config['models']:
                 raise ValueError(f"Missing required field: models.{field}")
 
+        # Validate YOLO model configuration
+        yolo_mode = self.config['models'].get('yolo_mode', 'detection')
+        if yolo_mode not in ['detection', 'segmentation']:
+            raise ValueError(f"Invalid yolo_mode: {yolo_mode}. Must be 'detection' or 'segmentation'")
+
+        # Check for legacy yolo_model field vs new structure
+        has_legacy_yolo_model = 'yolo_model' in self.config['models']
+        has_new_yolo_models = 'yolo_detection_model' in self.config['models'] or 'yolo_segmentation_model' in self.config['models']
+
+        if not has_legacy_yolo_model and not has_new_yolo_models:
+            raise ValueError("Missing YOLO model configuration. Provide either 'yolo_model' (legacy) or 'yolo_detection_model'/'yolo_segmentation_model' (new)")
+
+        # Validate that the required model for the current mode exists
+        if yolo_mode == 'detection':
+            if has_new_yolo_models and 'yolo_detection_model' not in self.config['models']:
+                raise ValueError("yolo_mode is 'detection' but yolo_detection_model not specified")
+        elif yolo_mode == 'segmentation':
+            if has_new_yolo_models and 'yolo_segmentation_model' not in self.config['models']:
+                raise ValueError("yolo_mode is 'segmentation' but yolo_segmentation_model not specified")
+
         # Validate processing.detect_segments format
         detect_segments = self.config['processing'].get('detect_segments', 'all')
         if not isinstance(detect_segments, (str, list)):
@@ -114,9 +134,18 @@ class ConfigLoader:
         return self.config['processing'].get('detect_segments', 'all')
 
     def get_yolo_model_path(self) -> str:
-        """Get YOLO model path."""
-        return self.config['models']['yolo_model']
+        """Get YOLO model path (legacy method for backward compatibility)."""
+        # Check for legacy configuration first
+        if 'yolo_model' in self.config['models']:
+            return self.config['models']['yolo_model']
+
+        # Use new configuration based on mode
+        yolo_mode = self.config['models'].get('yolo_mode', 'detection')
+        if yolo_mode == 'detection':
+            return self.config['models'].get('yolo_detection_model', 'yolov8n.pt')
+        else:  # segmentation mode
+            return self.config['models'].get('yolo_segmentation_model', 'yolov8n-seg.pt')
 
     def get_sam2_checkpoint(self) -> str:
         """Get SAM2 checkpoint path."""
         return self.config['models']['sam2_checkpoint']
@@ -156,3 +185,11 @@ class ConfigLoader:
     def should_cleanup_intermediate_files(self) -> bool:
         """Get whether to cleanup intermediate files."""
         return self.config.get('advanced', {}).get('cleanup_intermediate_files', True)
+
+    def get_stereo_iou_threshold(self) -> float:
+        """Get the IOU threshold for stereo mask agreement."""
+        return self.config['processing'].get('stereo_iou_threshold', 0.5)
+
+    def get_confidence_reduction_factor(self) -> float:
+        """Get the factor to reduce YOLO confidence by on retry."""
+        return self.config['processing'].get('confidence_reduction_factor', 0.8)
core/eye_processor.py (new file, 266 lines)
@@ -0,0 +1,266 @@
"""
Eye processor module for VR180 separate eye processing.
Handles splitting VR180 side-by-side frames into separate left/right eyes and recombining.
"""

import os
import cv2
import numpy as np
import logging
import subprocess
from typing import Dict, List, Any, Optional, Tuple

logger = logging.getLogger(__name__)


class EyeProcessor:
    """Handles VR180 eye-specific processing operations."""

    def __init__(self, eye_overlap_pixels: int = 0):
        """
        Initialize eye processor.

        Args:
            eye_overlap_pixels: Number of pixels to overlap between eyes for blending
        """
        self.eye_overlap_pixels = eye_overlap_pixels

    def split_frame_into_eyes(self, frame: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Split a VR180 side-by-side frame into separate left and right eye frames.

        Args:
            frame: Input VR180 frame (BGR format)

        Returns:
            Tuple of (left_eye_frame, right_eye_frame)
        """
        if len(frame.shape) != 3:
            raise ValueError("Frame must be a 3-channel BGR image")

        height, width, channels = frame.shape
        half_width = width // 2

        # Extract left and right eye frames
        left_eye = frame[:, :half_width + self.eye_overlap_pixels, :]
        right_eye = frame[:, half_width - self.eye_overlap_pixels:, :]

        logger.debug(f"Split frame {width}x{height} into left: {left_eye.shape} and right: {right_eye.shape}")

        return left_eye, right_eye

    def split_video_into_eyes(self, input_video_path: str, left_output_path: str,
                              right_output_path: str, scale: float = 1.0) -> bool:
        """
        Split a VR180 video into separate left and right eye videos using FFmpeg.

        Args:
            input_video_path: Path to input VR180 video
            left_output_path: Output path for left eye video
            right_output_path: Output path for right eye video
            scale: Scale factor for output videos (default: 1.0)

        Returns:
            True if successful, False otherwise
        """
        try:
            # Get video properties
            cap = cv2.VideoCapture(input_video_path)
            if not cap.isOpened():
                logger.error(f"Could not open video: {input_video_path}")
                return False

            width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            fps = cap.get(cv2.CAP_PROP_FPS)
            cap.release()

            # Calculate output dimensions
            half_width = int((width // 2) * scale)
            output_height = int(height * scale)

            # Create output directories if they don't exist
            os.makedirs(os.path.dirname(left_output_path), exist_ok=True)
            os.makedirs(os.path.dirname(right_output_path), exist_ok=True)

            # FFmpeg command for left eye (crop left half)
            left_command = [
                'ffmpeg', '-y',
                '-i', input_video_path,
                '-vf', f'crop={width//2 + self.eye_overlap_pixels}:{height}:0:0,scale={half_width}:{output_height}',
                '-c:v', 'libx264',
                '-preset', 'fast',
                '-crf', '18',
                left_output_path
            ]

            # FFmpeg command for right eye (crop right half)
            right_command = [
                'ffmpeg', '-y',
                '-i', input_video_path,
                '-vf', f'crop={width//2 + self.eye_overlap_pixels}:{height}:{width//2 - self.eye_overlap_pixels}:0,scale={half_width}:{output_height}',
                '-c:v', 'libx264',
                '-preset', 'fast',
                '-crf', '18',
                right_output_path
            ]

            logger.info(f"Splitting video into left eye: {left_output_path}")
            result_left = subprocess.run(left_command, capture_output=True, text=True)
            if result_left.returncode != 0:
                logger.error(f"FFmpeg failed for left eye: {result_left.stderr}")
                return False

            logger.info(f"Splitting video into right eye: {right_output_path}")
            result_right = subprocess.run(right_command, capture_output=True, text=True)
            if result_right.returncode != 0:
                logger.error(f"FFmpeg failed for right eye: {result_right.stderr}")
                return False

            logger.info(f"Successfully split video into separate eye videos")
            return True

        except Exception as e:
            logger.error(f"Error splitting video into eyes: {e}")
            return False

    def combine_eye_masks(self, left_masks: Optional[Dict[int, np.ndarray]],
                          right_masks: Optional[Dict[int, np.ndarray]],
                          full_frame_shape: Tuple[int, int]) -> Dict[int, np.ndarray]:
        """
        Combine left and right eye masks back into full-frame format.

        Args:
            left_masks: Dictionary of masks from left eye processing (frame_idx -> mask)
            right_masks: Dictionary of masks from right eye processing (frame_idx -> mask)
            full_frame_shape: Shape of the full VR180 frame (height, width)

        Returns:
            Dictionary of combined masks in full-frame format
        """
        combined_masks = {}
        full_height, full_width = full_frame_shape
        half_width = full_width // 2

        # Get all frame indices from both eyes
        left_frames = set(left_masks.keys()) if left_masks else set()
        right_frames = set(right_masks.keys()) if right_masks else set()
        all_frames = left_frames.union(right_frames)

        for frame_idx in all_frames:
            # Create full-frame mask
            combined_mask = np.zeros((full_height, full_width), dtype=np.uint8)

            # Add left eye mask to left half of frame
            if left_masks and frame_idx in left_masks:
                left_mask = left_masks[frame_idx]
                if len(left_mask.shape) == 3:
                    left_mask = left_mask.squeeze()

                # Resize left mask to fit left half of full frame
                left_target_width = half_width + self.eye_overlap_pixels
                if left_mask.shape != (full_height, left_target_width):
                    left_mask = cv2.resize(left_mask.astype(np.uint8),
                                           (left_target_width, full_height),
                                           interpolation=cv2.INTER_NEAREST)

                # Place in left half of combined mask
                combined_mask[:, :left_target_width] = left_mask[:, :left_target_width]

            # Add right eye mask to right half of frame
            if right_masks and frame_idx in right_masks:
                right_mask = right_masks[frame_idx]
                if len(right_mask.shape) == 3:
                    right_mask = right_mask.squeeze()

                # Resize right mask to fit right half of full frame
                right_target_width = half_width + self.eye_overlap_pixels
                right_start_x = half_width - self.eye_overlap_pixels

                if right_mask.shape != (full_height, right_target_width):
                    right_mask = cv2.resize(right_mask.astype(np.uint8),
                                            (right_target_width, full_height),
                                            interpolation=cv2.INTER_NEAREST)

                # Place in right half of combined mask
                combined_mask[:, right_start_x:] = right_mask

            # Store combined mask for this frame (using object ID 1 for simplicity)
            combined_masks[frame_idx] = {1: combined_mask}

        logger.debug(f"Combined {len(combined_masks)} frame masks from left/right eyes")
        return combined_masks

    def is_in_left_half(self, detection: Dict[str, Any], frame_width: int) -> bool:
        """
        Check if a detection is in the left half of a VR180 frame.

        Args:
            detection: YOLO detection dictionary with 'bbox' key
            frame_width: Width of the full VR180 frame

        Returns:
            True if detection center is in left half
        """
        bbox = detection['bbox']
        center_x = (bbox[0] + bbox[2]) / 2
        return center_x < (frame_width // 2)

    def is_in_right_half(self, detection: Dict[str, Any], frame_width: int) -> bool:
        """
        Check if a detection is in the right half of a VR180 frame.

        Args:
            detection: YOLO detection dictionary with 'bbox' key
            frame_width: Width of the full VR180 frame

        Returns:
            True if detection center is in right half
        """
        return not self.is_in_left_half(detection, frame_width)

    def convert_detection_to_eye_coordinates(self, detection: Dict[str, Any],
                                             eye_side: str, frame_width: int) -> Dict[str, Any]:
        """
        Convert a full-frame detection to eye-specific coordinates.

        Args:
            detection: YOLO detection dictionary with 'bbox' key
            eye_side: 'left' or 'right'
            frame_width: Width of the full VR180 frame

        Returns:
            Detection with converted coordinates for the specific eye
        """
        bbox = detection['bbox'].copy()
        half_width = frame_width // 2

        if eye_side == 'right':
            # Shift right eye coordinates to start from 0
            bbox[0] -= (half_width - self.eye_overlap_pixels)  # x1
            bbox[2] -= (half_width - self.eye_overlap_pixels)  # x2

            # Ensure coordinates are within bounds
            eye_width = half_width + self.eye_overlap_pixels
            bbox[0] = max(0, min(bbox[0], eye_width - 1))
            bbox[2] = max(0, min(bbox[2], eye_width - 1))

        converted_detection = detection.copy()
        converted_detection['bbox'] = bbox

        return converted_detection

    def create_full_greenscreen_frame(self, frame_shape: Tuple[int, int, int],
                                      green_color: List[int] = [0, 255, 0]) -> np.ndarray:
        """
        Create a full greenscreen frame for fallback when no humans are detected.

        Args:
            frame_shape: Shape of the frame (height, width, channels)
            green_color: RGB values for green screen color

        Returns:
            Full greenscreen frame
        """
        greenscreen_frame = np.full(frame_shape, green_color, dtype=np.uint8)
        logger.debug(f"Created full greenscreen frame with shape {frame_shape}")
        return greenscreen_frame
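A hedged round-trip example of these helpers on synthetic data (the frame size and the dummy masks are purely illustrative):

```python
import numpy as np
from core.eye_processor import EyeProcessor

processor = EyeProcessor(eye_overlap_pixels=0)

# Synthetic VR180 side-by-side frame (height x width x channels).
frame = np.zeros((2048, 4096, 3), dtype=np.uint8)
left_eye, right_eye = processor.split_frame_into_eyes(frame)
print(left_eye.shape, right_eye.shape)  # (2048, 2048, 3) (2048, 2048, 3)

# Per-eye masks keyed by frame index, recombined into a full-frame mask.
left_masks = {0: np.ones((2048, 2048), dtype=np.uint8)}
right_masks = {0: np.zeros((2048, 2048), dtype=np.uint8)}
combined = processor.combine_eye_masks(left_masks, right_masks, full_frame_shape=(2048, 4096))
print(combined[0][1].shape)  # (2048, 4096) full-frame mask stored under object ID 1
```
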
core/mask_processor.py (new file, 914 lines)
@@ -0,0 +1,914 @@
|
|||||||
|
"""
|
||||||
|
Mask processor module for applying green screen effects.
|
||||||
|
Handles applying masks to video frames to create green screen output.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import cv2
|
||||||
|
import numpy as np
|
||||||
|
import cupy as cp
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import logging
|
||||||
|
from typing import Dict, List, Any, Optional, Tuple
|
||||||
|
from collections import deque
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
class MaskProcessor:
|
||||||
|
"""Handles mask application and green screen processing with quality enhancements."""
|
||||||
|
|
||||||
|
    def __init__(self, green_color: List[int] = [0, 255, 0], blue_color: List[int] = [255, 0, 0],
                 mask_quality_config: Optional[Dict[str, Any]] = None,
                 output_mode: str = "green_screen"):
        """
        Initialize mask processor with quality enhancement options.

        Args:
            green_color: RGB color for green screen background
            blue_color: RGB color for second object (if needed)
            mask_quality_config: Configuration dictionary for mask quality improvements
            output_mode: Output mode - "green_screen" or "alpha_channel"
        """
        self.green_color = green_color
        self.blue_color = blue_color
        self.output_mode = output_mode
        self.use_gpu = self._check_gpu_availability()

        # Mask quality configuration with defaults
        if mask_quality_config is None:
            mask_quality_config = {}

        self.enable_edge_blur = mask_quality_config.get('enable_edge_blur', False)
        self.edge_blur_radius = mask_quality_config.get('edge_blur_radius', 3)
        self.edge_blur_sigma = mask_quality_config.get('edge_blur_sigma', 1.5)

        self.enable_temporal_smoothing = mask_quality_config.get('enable_temporal_smoothing', False)
        self.temporal_blend_weight = mask_quality_config.get('temporal_blend_weight', 0.3)
        self.temporal_history_frames = mask_quality_config.get('temporal_history_frames', 3)

        self.enable_morphological_cleaning = mask_quality_config.get('enable_morphological_cleaning', False)
        self.morphology_kernel_size = mask_quality_config.get('morphology_kernel_size', 5)
        self.min_component_size = mask_quality_config.get('min_component_size', 500)

        self.alpha_blending_mode = mask_quality_config.get('alpha_blending_mode', 'gaussian')
        self.alpha_transition_width = mask_quality_config.get('alpha_transition_width', 10)

        self.enable_bilateral_filter = mask_quality_config.get('enable_bilateral_filter', False)
        self.bilateral_d = mask_quality_config.get('bilateral_d', 9)
        self.bilateral_sigma_color = mask_quality_config.get('bilateral_sigma_color', 75)
        self.bilateral_sigma_space = mask_quality_config.get('bilateral_sigma_space', 75)

        # Temporal history buffer for mask smoothing
        self.mask_history = deque(maxlen=self.temporal_history_frames)

        # Log configuration
        if any([self.enable_edge_blur, self.enable_temporal_smoothing, self.enable_morphological_cleaning]):
            logger.info("Mask quality enhancements enabled:")
            if self.enable_edge_blur:
                logger.info(f"  Edge blur: radius={self.edge_blur_radius}, sigma={self.edge_blur_sigma}")
            if self.enable_temporal_smoothing:
                logger.info(f"  Temporal smoothing: weight={self.temporal_blend_weight}, history={self.temporal_history_frames}")
            if self.enable_morphological_cleaning:
                logger.info(f"  Morphological cleaning: kernel={self.morphology_kernel_size}, min_size={self.min_component_size}")
            logger.info(f"  Alpha blending: mode={self.alpha_blending_mode}, width={self.alpha_transition_width}")
        else:
            logger.info("Mask quality enhancements disabled - using standard binary masking")

        logger.info(f"Output mode: {self.output_mode}")

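All of the quality options above come from the plain dictionary passed as `mask_quality_config`; the keys mirror the `.get()` defaults in `__init__`. A minimal sketch of constructing the processor directly in code (the specific values are illustrative, not defaults from the project config):

```python
# Hypothetical wiring of the mask-quality options; key names mirror __init__.
mask_quality = {
    "enable_edge_blur": True,
    "edge_blur_radius": 3,
    "edge_blur_sigma": 1.5,
    "enable_temporal_smoothing": True,
    "temporal_blend_weight": 0.3,
    "temporal_history_frames": 3,
    "enable_morphological_cleaning": True,
    "morphology_kernel_size": 5,
    "min_component_size": 500,
    "alpha_blending_mode": "gaussian",
    "alpha_transition_width": 10,
}

processor = MaskProcessor(
    green_color=[0, 255, 0],
    mask_quality_config=mask_quality,
    output_mode="green_screen",
)
```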
def _check_gpu_availability(self) -> bool:
|
||||||
|
"""Check if CuPy GPU acceleration is available."""
|
||||||
|
try:
|
||||||
|
import cupy as cp
|
||||||
|
# Test GPU availability
|
||||||
|
test_array = cp.array([1, 2, 3])
|
||||||
|
_ = test_array * 2
|
||||||
|
logger.info("GPU acceleration available via CuPy")
|
||||||
|
return True
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"GPU acceleration not available, using CPU: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def enhance_mask_quality(self, mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Apply all enabled mask quality enhancements.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
mask: Input binary mask
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Enhanced mask with quality improvements applied
|
||||||
|
"""
|
||||||
|
enhanced_mask = mask.copy()
|
||||||
|
|
||||||
|
# 1. Morphological cleaning
|
||||||
|
if self.enable_morphological_cleaning:
|
||||||
|
enhanced_mask = self._clean_mask_morphologically(enhanced_mask)
|
||||||
|
|
||||||
|
# 2. Temporal smoothing
|
||||||
|
if self.enable_temporal_smoothing:
|
||||||
|
enhanced_mask = self._apply_temporal_smoothing(enhanced_mask)
|
||||||
|
|
||||||
|
# 3. Edge enhancement and blurring
|
||||||
|
if self.enable_edge_blur:
|
||||||
|
enhanced_mask = self._apply_edge_blur(enhanced_mask)
|
||||||
|
|
||||||
|
# 4. Bilateral filtering (if enabled)
|
||||||
|
if self.enable_bilateral_filter:
|
||||||
|
enhanced_mask = self._apply_bilateral_filter(enhanced_mask)
|
||||||
|
|
||||||
|
return enhanced_mask
|
||||||
|
|
||||||
|
def _clean_mask_morphologically(self, mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Clean mask using morphological operations to remove noise and small artifacts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
mask: Input binary mask
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Cleaned mask
|
||||||
|
"""
|
||||||
|
# Convert to uint8 for OpenCV operations
|
||||||
|
mask_uint8 = (mask * 255).astype(np.uint8)
|
||||||
|
|
||||||
|
# Create morphological kernel
|
||||||
|
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
|
||||||
|
(self.morphology_kernel_size, self.morphology_kernel_size))
|
||||||
|
|
||||||
|
# Opening operation (erosion followed by dilation) to remove small noise
|
||||||
|
cleaned = cv2.morphologyEx(mask_uint8, cv2.MORPH_OPEN, kernel)
|
||||||
|
|
||||||
|
# Closing operation (dilation followed by erosion) to fill small holes
|
||||||
|
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
|
||||||
|
|
||||||
|
# Remove small connected components
|
||||||
|
if self.min_component_size > 0:
|
||||||
|
cleaned = self._remove_small_components(cleaned)
|
||||||
|
|
||||||
|
return (cleaned / 255.0).astype(np.float32)
|
||||||
|
|
||||||
|
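For intuition, the open-then-close sequence used above behaves like this on a toy mask: opening erodes away isolated speckles, closing fills pinholes inside the object. A standalone sketch (values are illustrative and independent of the class):

```python
import cv2
import numpy as np

noisy = np.zeros((40, 40), dtype=np.uint8)
noisy[10:30, 10:30] = 255          # the real object
noisy[2, 2] = 255                  # an isolated speckle far from the object
noisy[20, 20] = 0                  # a one-pixel hole inside the object

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
opened = cv2.morphologyEx(noisy, cv2.MORPH_OPEN, kernel)     # speckle removed
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)  # hole filled
```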
def _remove_small_components(self, mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Remove connected components smaller than minimum size.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
mask: Input binary mask (uint8)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Mask with small components removed
|
||||||
|
"""
|
||||||
|
# Find connected components
|
||||||
|
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
|
||||||
|
|
||||||
|
# Create output mask
|
||||||
|
output_mask = np.zeros_like(mask)
|
||||||
|
|
||||||
|
# Keep components larger than minimum size (skip background label 0)
|
||||||
|
for i in range(1, num_labels):
|
||||||
|
component_size = stats[i, cv2.CC_STAT_AREA]
|
||||||
|
if component_size >= self.min_component_size:
|
||||||
|
output_mask[labels == i] = 255
|
||||||
|
|
||||||
|
return output_mask
|
||||||
|
|
||||||
|
def _apply_temporal_smoothing(self, mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Apply temporal smoothing using mask history.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
mask: Current frame mask
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Temporally smoothed mask
|
||||||
|
"""
|
||||||
|
if len(self.mask_history) == 0:
|
||||||
|
# First frame, no history to blend with
|
||||||
|
self.mask_history.append(mask.copy())
|
||||||
|
return mask
|
||||||
|
|
||||||
|
# Blend with previous frames using weighted average
|
||||||
|
smoothed_mask = mask.astype(np.float32)
|
||||||
|
total_weight = 1.0
|
||||||
|
|
||||||
|
for i, hist_mask in enumerate(reversed(self.mask_history)):
|
||||||
|
# Exponential decay: more recent frames have higher weight
|
||||||
|
frame_weight = self.temporal_blend_weight * (0.8 ** i)
|
||||||
|
smoothed_mask += hist_mask.astype(np.float32) * frame_weight
|
||||||
|
total_weight += frame_weight
|
||||||
|
|
||||||
|
# Normalize by total weight
|
||||||
|
smoothed_mask /= total_weight
|
||||||
|
|
||||||
|
# Update history
|
||||||
|
self.mask_history.append(mask.copy())
|
||||||
|
|
||||||
|
return smoothed_mask
|
||||||
|
|
||||||
|
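The blend above is a weighted average in which the current mask always carries weight 1 and each older mask contributes `temporal_blend_weight * 0.8**i` (newest history entry first). A small standalone sketch of the same arithmetic, with hypothetical values:

```python
import numpy as np

blend_weight = 0.3  # temporal_blend_weight
history = [np.full((2, 2), v, dtype=np.float32) for v in (0.9, 0.7, 0.5)]  # newest first
current = np.ones((2, 2), dtype=np.float32)

smoothed = current.copy()
total = 1.0
for i, past in enumerate(history):
    w = blend_weight * (0.8 ** i)   # 0.30, 0.24, 0.192 for i = 0, 1, 2
    smoothed += past * w
    total += w
smoothed /= total                   # weighted average, dominated by the current mask
```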
def _apply_edge_blur(self, mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Apply Gaussian blur to mask edges for smooth transitions.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
mask: Input mask
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Mask with blurred edges
|
||||||
|
"""
|
||||||
|
# Apply Gaussian blur
|
||||||
|
kernel_size = 2 * self.edge_blur_radius + 1
|
||||||
|
blurred_mask = cv2.GaussianBlur(mask.astype(np.float32),
|
||||||
|
(kernel_size, kernel_size),
|
||||||
|
self.edge_blur_sigma)
|
||||||
|
|
||||||
|
return blurred_mask
|
||||||
|
|
||||||
|
def _apply_bilateral_filter(self, mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Apply bilateral filtering for edge-preserving smoothing.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
mask: Input mask
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Filtered mask
|
||||||
|
"""
|
||||||
|
# Convert to uint8 for bilateral filter
|
||||||
|
mask_uint8 = (mask * 255).astype(np.uint8)
|
||||||
|
|
||||||
|
# Apply bilateral filter
|
||||||
|
filtered = cv2.bilateralFilter(mask_uint8, self.bilateral_d,
|
||||||
|
self.bilateral_sigma_color,
|
||||||
|
self.bilateral_sigma_space)
|
||||||
|
|
||||||
|
return (filtered / 255.0).astype(np.float32)
|
||||||
|
|
||||||
|
def _create_alpha_mask(self, mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Create alpha mask with smooth transitions based on blending mode.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
mask: Input binary/float mask
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Alpha mask with smooth transitions
|
||||||
|
"""
|
||||||
|
if self.alpha_blending_mode == "linear":
|
||||||
|
return mask
|
||||||
|
elif self.alpha_blending_mode == "gaussian":
|
||||||
|
# Use distance transform for smooth falloff
|
||||||
|
binary_mask = (mask > 0.5).astype(np.uint8)
|
||||||
|
|
||||||
|
# Distance transform from mask edges
|
||||||
|
dist_inside = cv2.distanceTransform(binary_mask, cv2.DIST_L2, 5)
|
||||||
|
dist_outside = cv2.distanceTransform(1 - binary_mask, cv2.DIST_L2, 5)
|
||||||
|
|
||||||
|
# Create smooth alpha based on distance
|
||||||
|
alpha = np.zeros_like(mask, dtype=np.float32)
|
||||||
|
transition_width = self.alpha_transition_width
|
||||||
|
|
||||||
|
# Inside mask: fade from edge
|
||||||
|
alpha[binary_mask > 0] = np.minimum(1.0, dist_inside[binary_mask > 0] / transition_width)
|
||||||
|
|
||||||
|
# Outside mask: fade to zero
|
||||||
|
alpha[binary_mask == 0] = np.maximum(0.0, 1.0 - dist_outside[binary_mask == 0] / transition_width)
|
||||||
|
|
||||||
|
return alpha
|
||||||
|
elif self.alpha_blending_mode == "sigmoid":
|
||||||
|
# Sigmoid-based smooth transition
|
||||||
|
return 1.0 / (1.0 + np.exp(-10 * (mask - 0.5)))
|
||||||
|
else:
|
||||||
|
return mask
|
||||||
|
|
||||||
|
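To see what the "gaussian" mode produces, here is a toy standalone example (independent of the class) that applies the same distance-transform falloff to a hard-edged square mask:

```python
import cv2
import numpy as np

mask = np.zeros((64, 64), dtype=np.float32)
mask[16:48, 16:48] = 1.0                       # hard-edged square
binary = (mask > 0.5).astype(np.uint8)

transition = 10.0                              # alpha_transition_width
dist_in = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
dist_out = cv2.distanceTransform(1 - binary, cv2.DIST_L2, 5)

alpha = np.zeros_like(mask)
alpha[binary > 0] = np.minimum(1.0, dist_in[binary > 0] / transition)
alpha[binary == 0] = np.maximum(0.0, 1.0 - dist_out[binary == 0] / transition)
# alpha now ramps smoothly from ~0 outside the square, through the edge, to 1.0 inside.
```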
def apply_green_mask(self, frame: np.ndarray, masks: List[np.ndarray]) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Apply green screen mask to a frame with quality enhancements.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
frame: Input video frame (BGR format)
|
||||||
|
masks: List of object masks to apply
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Frame with green screen background and enhanced mask quality
|
||||||
|
"""
|
||||||
|
# Combine all masks into a single mask
|
||||||
|
combined_mask = self._combine_masks(masks)
|
||||||
|
|
||||||
|
# Apply quality enhancements
|
||||||
|
enhanced_mask = self.enhance_mask_quality(combined_mask)
|
||||||
|
|
||||||
|
# Create alpha mask for smooth blending
|
||||||
|
alpha_mask = self._create_alpha_mask(enhanced_mask)
|
||||||
|
|
||||||
|
# Apply mask using alpha blending
|
||||||
|
if self.use_gpu:
|
||||||
|
return self._apply_green_mask_gpu_enhanced(frame, alpha_mask)
|
||||||
|
else:
|
||||||
|
return self._apply_green_mask_cpu_enhanced(frame, alpha_mask)
|
||||||
|
|
||||||
|
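A hedged single-frame usage sketch (file paths and the mask source are hypothetical; any binary (H, W) mask works):

```python
# Illustrative only: frame and mask are assumed to exist on disk.
frame = cv2.imread("frame_0001.png")            # BGR frame
person_mask = np.load("frame_0001_mask.npy")    # binary mask from SAM2, shape (H, W)

processor = MaskProcessor(mask_quality_config={"enable_edge_blur": True})
green_frame = processor.apply_green_mask(frame, [person_mask])
cv2.imwrite("frame_0001_green.png", green_frame)
```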
def apply_mask_with_alpha(self, frame: np.ndarray, masks: List[np.ndarray]) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Apply mask to create RGBA frame with alpha channel.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
frame: Input video frame (BGR format)
|
||||||
|
masks: List of object masks to apply
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
RGBA frame with alpha channel
|
||||||
|
"""
|
||||||
|
# Combine all masks into a single mask
|
||||||
|
combined_mask = self._combine_masks(masks)
|
||||||
|
|
||||||
|
# Apply quality enhancements
|
||||||
|
enhanced_mask = self.enhance_mask_quality(combined_mask)
|
||||||
|
|
||||||
|
# Create alpha mask for smooth blending
|
||||||
|
alpha_mask = self._create_alpha_mask(enhanced_mask)
|
||||||
|
|
||||||
|
# Resize alpha mask to match frame if needed
|
||||||
|
if alpha_mask.shape != frame.shape[:2]:
|
||||||
|
alpha_mask = cv2.resize(alpha_mask, (frame.shape[1], frame.shape[0]))
|
||||||
|
|
||||||
|
# Convert BGR to BGRA
|
||||||
|
bgra_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2BGRA)
|
||||||
|
|
||||||
|
# Set alpha channel
|
||||||
|
bgra_frame[:, :, 3] = (alpha_mask * 255).astype(np.uint8)
|
||||||
|
|
||||||
|
return bgra_frame
|
||||||
|
|
||||||
|
def _combine_masks(self, masks: List[np.ndarray]) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Combine multiple object masks into a single mask.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
masks: List of object masks
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Combined mask
|
||||||
|
"""
|
||||||
|
if not masks:
|
||||||
|
return np.zeros((0, 0), dtype=np.float32)
|
||||||
|
|
||||||
|
# Start with first mask
|
||||||
|
combined_mask = masks[0].squeeze().astype(np.float32)
|
||||||
|
|
||||||
|
# Combine with remaining masks using logical OR
|
||||||
|
for mask in masks[1:]:
|
||||||
|
mask_squeezed = mask.squeeze().astype(np.float32)
|
||||||
|
if mask_squeezed.shape != combined_mask.shape:
|
||||||
|
# Resize mask to match combined mask
|
||||||
|
mask_squeezed = cv2.resize(mask_squeezed,
|
||||||
|
(combined_mask.shape[1], combined_mask.shape[0]),
|
||||||
|
interpolation=cv2.INTER_NEAREST)
|
||||||
|
combined_mask = np.maximum(combined_mask, mask_squeezed)
|
||||||
|
|
||||||
|
return combined_mask
|
||||||
|
|
||||||
|
def reset_temporal_history(self):
|
||||||
|
"""Reset temporal history buffer. Call this when starting a new segment."""
|
||||||
|
self.mask_history.clear()
|
||||||
|
logger.debug("Temporal history buffer reset")
|
||||||
|
|
||||||
|
def _apply_green_mask_gpu_enhanced(self, frame: np.ndarray, alpha_mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""GPU-accelerated green mask application with alpha blending using CuPy (Phase 1 optimized)."""
|
||||||
|
try:
|
||||||
|
# Convert to CuPy arrays with optimized data transfer
|
||||||
|
frame_gpu = cp.asarray(frame, dtype=cp.uint8)
|
||||||
|
alpha_gpu = cp.asarray(alpha_mask, dtype=cp.float32)
|
||||||
|
|
||||||
|
# Resize alpha mask to match frame if needed (vectorized operation)
|
||||||
|
if alpha_gpu.shape != frame_gpu.shape[:2]:
|
||||||
|
# Resize on the CPU via OpenCV (this round-trips through host memory), then move the result back to the GPU
|
||||||
|
alpha_gpu = cp.array(cv2.resize(cp.asnumpy(alpha_gpu),
|
||||||
|
(frame_gpu.shape[1], frame_gpu.shape[0])))
|
||||||
|
|
||||||
|
# Create green background (optimized broadcasting)
|
||||||
|
green_color_gpu = cp.array(self.green_color, dtype=cp.uint8)
|
||||||
|
green_background = cp.broadcast_to(green_color_gpu, frame_gpu.shape)
|
||||||
|
|
||||||
|
# Apply vectorized alpha blending with optimized memory access
|
||||||
|
alpha_3d = cp.expand_dims(alpha_gpu, axis=2)
|
||||||
|
|
||||||
|
# Use more efficient computation with explicit typing
|
||||||
|
frame_float = frame_gpu.astype(cp.float32)
|
||||||
|
green_float = green_background.astype(cp.float32)
|
||||||
|
|
||||||
|
# Vectorized blending operation
|
||||||
|
result_frame = cp.clip(alpha_3d * frame_float + (1.0 - alpha_3d) * green_float, 0, 255)
|
||||||
|
|
||||||
|
return cp.asnumpy(result_frame.astype(cp.uint8))
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"GPU enhanced processing failed, falling back to CPU: {e}")
|
||||||
|
return self._apply_green_mask_cpu_enhanced(frame, alpha_mask)
|
||||||
|
|
||||||
|
def _apply_green_mask_cpu_enhanced(self, frame: np.ndarray, alpha_mask: np.ndarray) -> np.ndarray:
|
||||||
|
"""CPU-based green mask application with alpha blending (Phase 1 optimized)."""
|
||||||
|
# Resize alpha mask to match frame if needed
|
||||||
|
if alpha_mask.shape != frame.shape[:2]:
|
||||||
|
alpha_mask = cv2.resize(alpha_mask, (frame.shape[1], frame.shape[0]))
|
||||||
|
|
||||||
|
# Create green background with broadcasting (more efficient)
|
||||||
|
green_color = np.array(self.green_color, dtype=np.uint8)
|
||||||
|
green_background = np.broadcast_to(green_color, frame.shape)
|
||||||
|
|
||||||
|
# Apply optimized alpha blending with explicit data types
|
||||||
|
alpha_3d = np.expand_dims(alpha_mask.astype(np.float32), axis=2)
|
||||||
|
|
||||||
|
# Vectorized blending with optimized memory access
|
||||||
|
frame_float = frame.astype(np.float32)
|
||||||
|
green_float = green_background.astype(np.float32)
|
||||||
|
|
||||||
|
result_frame = np.clip(alpha_3d * frame_float + (1.0 - alpha_3d) * green_float, 0, 255)
|
||||||
|
|
||||||
|
return result_frame.astype(np.uint8)
|
||||||
|
|
||||||
|
def apply_colored_mask(self, frame: np.ndarray, masks_a: List[np.ndarray],
|
||||||
|
masks_b: List[np.ndarray]) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Apply colored masks for visualization (green and blue).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
frame: Input video frame
|
||||||
|
masks_a: Masks for object A (green)
|
||||||
|
masks_b: Masks for object B (blue)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Frame with colored masks applied
|
||||||
|
"""
|
||||||
|
colored_mask = np.zeros_like(frame)
|
||||||
|
|
||||||
|
# Apply green color to masks_a
|
||||||
|
for mask in masks_a:
|
||||||
|
mask = mask.squeeze()
|
||||||
|
if mask.shape != frame.shape[:2]:
|
||||||
|
mask = cv2.resize(mask, (frame.shape[1], frame.shape[0]),
|
||||||
|
interpolation=cv2.INTER_NEAREST)
|
||||||
|
colored_mask[mask > 0] = self.green_color
|
||||||
|
|
||||||
|
# Apply blue color to masks_b
|
||||||
|
for mask in masks_b:
|
||||||
|
mask = mask.squeeze()
|
||||||
|
if mask.shape != frame.shape[:2]:
|
||||||
|
mask = cv2.resize(mask, (frame.shape[1], frame.shape[0]),
|
||||||
|
interpolation=cv2.INTER_NEAREST)
|
||||||
|
colored_mask[mask > 0] = self.blue_color
|
||||||
|
|
||||||
|
return colored_mask
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
def process_and_save_output_video(self, video_path: str, output_video_path: str,
|
||||||
|
video_segments: Dict[int, Dict[int, np.ndarray]],
|
||||||
|
use_nvenc: bool = False, bitrate: str = "50M",
|
||||||
|
batch_size: int = 16) -> bool:
|
||||||
|
"""
|
||||||
|
Process high-resolution frames, apply upscaled masks, and save the output video.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
video_path: Path to input video
|
||||||
|
output_video_path: Path to save output video
|
||||||
|
video_segments: Dictionary of frame masks
|
||||||
|
use_nvenc: Whether to use NVIDIA hardware encoding
|
||||||
|
bitrate: Output video bitrate
|
||||||
|
batch_size: Number of frames to process in a single batch
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if successful
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
cap = cv2.VideoCapture(video_path)
|
||||||
|
if not cap.isOpened():
|
||||||
|
logger.error(f"Could not open video: {video_path}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||||
|
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||||
|
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
|
||||||
|
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||||
|
|
||||||
|
logger.info(f"Processing video: {frame_width}x{frame_height} @ {fps}fps, {total_frames} frames")
|
||||||
|
|
||||||
|
# Setup VideoWriter
|
||||||
|
out_writer = None
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
success = self._setup_alpha_encoder(output_video_path, frame_width, frame_height, fps, bitrate)
|
||||||
|
if not success:
|
||||||
|
logger.error("Failed to setup alpha channel encoder")
|
||||||
|
cap.release()
|
||||||
|
return False
|
||||||
|
use_nvenc = False
|
||||||
|
elif use_nvenc:
|
||||||
|
success = self._setup_nvenc_encoder(output_video_path, frame_width, frame_height, fps, bitrate)
|
||||||
|
if not success:
|
||||||
|
logger.warning("NVENC setup failed, falling back to OpenCV")
|
||||||
|
use_nvenc = False
|
||||||
|
|
||||||
|
if not use_nvenc and self.output_mode != "alpha_channel":
|
||||||
|
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
|
||||||
|
out_writer = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))
|
||||||
|
if not out_writer.isOpened():
|
||||||
|
logger.error("Failed to create output video writer")
|
||||||
|
cap.release()
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Process frames in batches
|
||||||
|
frame_idx = 0
|
||||||
|
processed_frames = 0
|
||||||
|
|
||||||
|
while frame_idx < total_frames:
|
||||||
|
batch_frames = []
|
||||||
|
batch_masks = []
|
||||||
|
|
||||||
|
# Read a batch of frames
|
||||||
|
for _ in range(batch_size):
|
||||||
|
ret, frame = cap.read()
|
||||||
|
if not ret:
|
||||||
|
break
|
||||||
|
batch_frames.append(frame)
|
||||||
|
|
||||||
|
if not batch_frames:
|
||||||
|
break
|
||||||
|
|
||||||
|
# Get masks for the current batch and perform just-in-time upscaling
|
||||||
|
for i in range(len(batch_frames)):
|
||||||
|
current_frame_idx = frame_idx + i
|
||||||
|
if current_frame_idx in video_segments:
|
||||||
|
frame_masks = video_segments[current_frame_idx]
|
||||||
|
upscaled_masks = []
|
||||||
|
for obj_id, mask in frame_masks.items():
|
||||||
|
mask = mask.squeeze()
|
||||||
|
if mask.shape != (frame_height, frame_width):
|
||||||
|
upscaled_mask = cv2.resize(mask.astype(np.uint8),
|
||||||
|
(frame_width, frame_height),
|
||||||
|
interpolation=cv2.INTER_NEAREST)
|
||||||
|
upscaled_masks.append(upscaled_mask)
|
||||||
|
else:
|
||||||
|
upscaled_masks.append(mask.astype(np.uint8))
|
||||||
|
batch_masks.append(upscaled_masks)
|
||||||
|
else:
|
||||||
|
batch_masks.append([]) # No masks for this frame
|
||||||
|
|
||||||
|
# Process the batch
|
||||||
|
result_batch = []
|
||||||
|
for i, frame in enumerate(batch_frames):
|
||||||
|
masks = batch_masks[i]
|
||||||
|
if masks:
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
result_frame = self.apply_mask_with_alpha(frame, masks)
|
||||||
|
else:
|
||||||
|
result_frame = self.apply_green_mask(frame, masks)
|
||||||
|
else:
|
||||||
|
# No mask for this frame
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
bgra_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2BGRA)
|
||||||
|
bgra_frame[:, :, 3] = 0
|
||||||
|
result_frame = bgra_frame
|
||||||
|
else:
|
||||||
|
result_frame = frame
|
||||||
|
result_batch.append(result_frame)
|
||||||
|
|
||||||
|
# Write the processed batch
|
||||||
|
for result_frame in result_batch:
|
||||||
|
if self.output_mode == "alpha_channel" and hasattr(self, 'alpha_process'):
|
||||||
|
self.alpha_process.stdin.write(result_frame.tobytes())
|
||||||
|
elif use_nvenc and hasattr(self, 'nvenc_process'):
|
||||||
|
self.nvenc_process.stdin.write(result_frame.tobytes())
|
||||||
|
else:
|
||||||
|
out_writer.write(result_frame)
|
||||||
|
|
||||||
|
processed_frames += len(batch_frames)
|
||||||
|
frame_idx += len(batch_frames)
|
||||||
|
|
||||||
|
if processed_frames % 100 < batch_size:
|
||||||
|
logger.info(f"Processed {processed_frames}/{total_frames} frames")
|
||||||
|
|
||||||
|
# Cleanup
|
||||||
|
cap.release()
|
||||||
|
if self.output_mode == "alpha_channel" and hasattr(self, 'alpha_process'):
|
||||||
|
self.alpha_process.stdin.close()
|
||||||
|
self.alpha_process.wait()
|
||||||
|
elif use_nvenc and hasattr(self, 'nvenc_process'):
|
||||||
|
self.nvenc_process.stdin.close()
|
||||||
|
self.nvenc_process.wait()
|
||||||
|
else:
|
||||||
|
if out_writer:
|
||||||
|
out_writer.release()
|
||||||
|
|
||||||
|
logger.info(f"Successfully processed {processed_frames} frames to {output_video_path}")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error processing video: {e}", exc_info=True)
|
||||||
|
return False
|
||||||
|
|
||||||
|
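A hedged sketch of a typical call (paths are illustrative; `video_segments` is the frame-index → {object id: mask} dictionary produced by the SAM2 propagation step):

```python
ok = processor.process_and_save_output_video(
    video_path="segments/segment_0/segment.mp4",          # illustrative path
    output_video_path="segments/segment_0/output_0.mp4",
    video_segments=video_segments,
    use_nvenc=True,
    bitrate="50M",
    batch_size=16,
)
```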
def _setup_nvenc_encoder(self, output_path: str, width: int, height: int,
|
||||||
|
fps: float, bitrate: str) -> bool:
|
||||||
|
"""Setup NVENC hardware encoder using FFmpeg."""
|
||||||
|
try:
|
||||||
|
# Determine encoder based on platform
|
||||||
|
if sys.platform == 'darwin':
|
||||||
|
encoder = 'hevc_videotoolbox'
|
||||||
|
else:
|
||||||
|
encoder = 'hevc_nvenc'
|
||||||
|
|
||||||
|
command = [
|
||||||
|
'ffmpeg',
|
||||||
|
'-y', # Overwrite output file
|
||||||
|
'-f', 'rawvideo',
|
||||||
|
'-vcodec', 'rawvideo',
|
||||||
|
'-pix_fmt', 'bgr24',
|
||||||
|
'-s', f'{width}x{height}',
|
||||||
|
'-r', str(fps),
|
||||||
|
'-i', '-', # Input from stdin
|
||||||
|
'-an', # No audio (will be added later)
|
||||||
|
'-vcodec', encoder,
|
||||||
|
'-pix_fmt', 'yuv420p', # Changed from nv12 for better compatibility
|
||||||
|
'-preset', 'slow',
|
||||||
|
'-b:v', bitrate,
|
||||||
|
output_path
|
||||||
|
]
|
||||||
|
|
||||||
|
self.nvenc_process = subprocess.Popen(command, stdin=subprocess.PIPE,
|
||||||
|
stderr=subprocess.PIPE)
|
||||||
|
logger.info(f"Initialized {encoder} hardware encoder")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to setup NVENC encoder: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _setup_alpha_encoder(self, output_path: str, width: int, height: int,
|
||||||
|
fps: float, bitrate: str) -> bool:
|
||||||
|
"""Setup encoder for alpha channel video using FFmpeg with H.264/H.265."""
|
||||||
|
try:
|
||||||
|
            # Note: standard H.264/H.265 streams do not carry an alpha channel, and the
            # yuv420p output below discards the alpha plane of the incoming BGRA frames.
            # The transparency therefore only survives if a downstream step re-derives it;
            # a true alpha-carrying output would need a format such as ProRes 4444 or
            # VP9 with an alpha-capable pixel format.
|
||||||
|
|
||||||
|
# Determine encoder based on platform
|
||||||
|
if sys.platform == 'darwin':
|
||||||
|
encoder = 'hevc_videotoolbox'
|
||||||
|
else:
|
||||||
|
encoder = 'hevc_nvenc'
|
||||||
|
|
||||||
|
command = [
|
||||||
|
'ffmpeg',
|
||||||
|
'-y', # Overwrite output file
|
||||||
|
'-f', 'rawvideo',
|
||||||
|
'-vcodec', 'rawvideo',
|
||||||
|
'-pix_fmt', 'bgra', # BGRA for alpha channel
|
||||||
|
'-s', f'{width}x{height}',
|
||||||
|
'-r', str(fps),
|
||||||
|
'-i', '-', # Input from stdin
|
||||||
|
'-an', # No audio (will be added later)
|
||||||
|
'-c:v', encoder,
|
||||||
|
'-pix_fmt', 'yuv420p', # Standard pixel format (no alpha plane)
|
||||||
|
'-preset', 'slow',
|
||||||
|
'-b:v', bitrate,
|
||||||
|
'-tag:v', 'hvc1', # Required for some players
|
||||||
|
output_path
|
||||||
|
]
|
||||||
|
|
||||||
|
self.alpha_process = subprocess.Popen(command, stdin=subprocess.PIPE,
|
||||||
|
stderr=subprocess.PIPE)
|
||||||
|
self.alpha_output_path = output_path
|
||||||
|
logger.info(f"Initialized {encoder} for alpha channel output (will be encoded as transparency in RGB)")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to setup alpha encoder: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
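If a genuinely alpha-carrying file is wanted, one alternative (a sketch of an option, not what this pipeline currently does) is ProRes 4444 via FFmpeg's `prores_ks` encoder, which keeps the alpha plane; only the `command` list inside `_setup_alpha_encoder` would change:

```python
# Sketch of an alpha-preserving alternative (assumes FFmpeg built with prores_ks;
# output files are considerably larger than HEVC).
command = [
    'ffmpeg', '-y',
    '-f', 'rawvideo', '-vcodec', 'rawvideo',
    '-pix_fmt', 'bgra',
    '-s', f'{width}x{height}',
    '-r', str(fps),
    '-i', '-',
    '-an',
    '-c:v', 'prores_ks',
    '-profile:v', '4444',            # ProRes 4444 profile carries alpha
    '-pix_fmt', 'yuva444p10le',
    output_path,
]
```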
def process_segment(self, segment_info: dict, video_segments: Dict[int, Dict[int, np.ndarray]],
|
||||||
|
use_nvenc: bool = False, bitrate: str = "50M") -> bool:
|
||||||
|
"""
|
||||||
|
Process a single segment and save the output video.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segment_info: Segment information dictionary
|
||||||
|
video_segments: Dictionary of frame masks from SAM2
|
||||||
|
use_nvenc: Whether to use hardware encoding
|
||||||
|
bitrate: Output video bitrate
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if successful
|
||||||
|
"""
|
||||||
|
input_video = segment_info['video_file']
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
output_video = os.path.join(segment_info['directory'], f"output_{segment_info['index']}.mov")
|
||||||
|
else:
|
||||||
|
output_video = os.path.join(segment_info['directory'], f"output_{segment_info['index']}.mp4")
|
||||||
|
|
||||||
|
logger.info(f"Processing segment {segment_info['index']} with {self.output_mode}")
|
||||||
|
|
||||||
|
success = self.process_and_save_output_video(
|
||||||
|
input_video,
|
||||||
|
output_video,
|
||||||
|
video_segments,
|
||||||
|
use_nvenc,
|
||||||
|
bitrate
|
||||||
|
)
|
||||||
|
|
||||||
|
if success:
|
||||||
|
logger.info(f"Successfully created {self.output_mode} video: {output_video}")
|
||||||
|
# Mark segment as completed only after video is successfully written
|
||||||
|
try:
|
||||||
|
output_done_file = os.path.join(segment_info['directory'], "output_frames_done")
|
||||||
|
with open(output_done_file, 'w') as f:
|
||||||
|
f.write(f"Segment {segment_info['index']} processed and saved successfully.")
|
||||||
|
logger.debug(f"Created completion marker for segment {segment_info['index']}")
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to create completion marker for segment {segment_info['index']}: {e}")
|
||||||
|
else:
|
||||||
|
logger.error(f"Failed to process segment {segment_info['index']}")
|
||||||
|
|
||||||
|
return success
|
||||||
|
|
||||||
|
def create_full_greenscreen_frame(self, frame_shape: Tuple[int, int, int],
|
||||||
|
green_color: Optional[List[int]] = None) -> np.ndarray:
|
||||||
|
"""
|
||||||
|
Create a full greenscreen frame for fallback when no humans are detected.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
frame_shape: Shape of the frame (height, width, channels)
|
||||||
|
green_color: RGB values for green screen color (uses default if None)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Full greenscreen frame
|
||||||
|
"""
|
||||||
|
if green_color is None:
|
||||||
|
green_color = self.green_color
|
||||||
|
|
||||||
|
greenscreen_frame = np.full(frame_shape, green_color, dtype=np.uint8)
|
||||||
|
logger.debug(f"Created full greenscreen frame with shape {frame_shape}")
|
||||||
|
return greenscreen_frame
|
||||||
|
|
||||||
|
def process_greenscreen_only_segment(self, segment_info: dict,
|
||||||
|
green_color: Optional[List[int]] = None,
|
||||||
|
use_nvenc: bool = False, bitrate: str = "50M") -> bool:
|
||||||
|
"""
|
||||||
|
Create a full greenscreen segment when no humans are detected.
|
||||||
|
Used as fallback in separate eye processing mode.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segment_info: Segment information dictionary
|
||||||
|
green_color: RGB values for green screen color (uses default if None)
|
||||||
|
use_nvenc: Whether to use hardware encoding
|
||||||
|
bitrate: Output video bitrate
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if greenscreen segment was created successfully
|
||||||
|
"""
|
||||||
|
segment_dir = segment_info['directory']
|
||||||
|
video_path = segment_info['video_file']
|
||||||
|
segment_idx = segment_info['index']
|
||||||
|
|
||||||
|
logger.info(f"Creating full greenscreen segment {segment_idx} (no humans detected)")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Get video properties
|
||||||
|
cap = cv2.VideoCapture(video_path)
|
||||||
|
if not cap.isOpened():
|
||||||
|
logger.error(f"Could not open video: {video_path}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||||
|
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||||
|
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
|
||||||
|
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
# Create output video path
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
output_video_path = os.path.join(segment_dir, f"output_{segment_idx}.mov")
|
||||||
|
else:
|
||||||
|
output_video_path = os.path.join(segment_dir, f"output_{segment_idx}.mp4")
|
||||||
|
|
||||||
|
# Create greenscreen frame
|
||||||
|
if green_color is None:
|
||||||
|
green_color = self.green_color
|
||||||
|
|
||||||
|
greenscreen_frame = self.create_full_greenscreen_frame(
|
||||||
|
(height, width, 3), green_color
|
||||||
|
)
|
||||||
|
|
||||||
|
# Setup video writer based on mode and hardware encoding preference
|
||||||
|
if use_nvenc:
|
||||||
|
success = self._write_greenscreen_with_nvenc(
|
||||||
|
output_video_path, greenscreen_frame, frame_count, fps, bitrate
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
success = self._write_greenscreen_with_opencv(
|
||||||
|
output_video_path, greenscreen_frame, frame_count, fps
|
||||||
|
)
|
||||||
|
|
||||||
|
if not success:
|
||||||
|
logger.error(f"Failed to write greenscreen video for segment {segment_idx}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Create empty mask file (black mask since no humans detected)
|
||||||
|
mask_output_path = os.path.join(segment_dir, "mask.png")
|
||||||
|
black_mask = np.zeros((height, width, 3), dtype=np.uint8)
|
||||||
|
cv2.imwrite(mask_output_path, black_mask)
|
||||||
|
|
||||||
|
# Mark segment as completed
|
||||||
|
output_done_file = os.path.join(segment_dir, "output_frames_done")
|
||||||
|
with open(output_done_file, 'w') as f:
|
||||||
|
f.write(f"Greenscreen segment {segment_idx} completed successfully\n")
|
||||||
|
|
||||||
|
logger.info(f"Successfully created greenscreen segment {segment_idx}")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error creating greenscreen segment {segment_idx}: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _write_greenscreen_with_opencv(self, output_path: str, greenscreen_frame: np.ndarray,
|
||||||
|
frame_count: int, fps: float) -> bool:
|
||||||
|
"""Write greenscreen video using OpenCV VideoWriter."""
|
||||||
|
try:
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
# For alpha channel mode, create fully transparent frames
|
||||||
|
bgra_frame = cv2.cvtColor(greenscreen_frame, cv2.COLOR_BGR2BGRA)
|
||||||
|
bgra_frame[:, :, 3] = 0 # Fully transparent
|
||||||
|
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
|
||||||
|
out = cv2.VideoWriter(output_path, fourcc, fps,
|
||||||
|
(greenscreen_frame.shape[1], greenscreen_frame.shape[0]), True)
|
||||||
|
frame_to_write = bgra_frame[:, :, :3] # OpenCV expects BGR for mp4v
|
||||||
|
else:
|
||||||
|
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
|
||||||
|
out = cv2.VideoWriter(output_path, fourcc, fps,
|
||||||
|
(greenscreen_frame.shape[1], greenscreen_frame.shape[0]))
|
||||||
|
frame_to_write = greenscreen_frame
|
||||||
|
|
||||||
|
if not out.isOpened():
|
||||||
|
logger.error(f"Failed to open video writer for {output_path}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Write identical greenscreen frames
|
||||||
|
for _ in range(frame_count):
|
||||||
|
out.write(frame_to_write)
|
||||||
|
|
||||||
|
out.release()
|
||||||
|
logger.debug(f"Wrote {frame_count} greenscreen frames using OpenCV")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error writing greenscreen with OpenCV: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _write_greenscreen_with_nvenc(self, output_path: str, greenscreen_frame: np.ndarray,
|
||||||
|
frame_count: int, fps: float, bitrate: str) -> bool:
|
||||||
|
"""Write greenscreen video using NVENC hardware encoding."""
|
||||||
|
try:
|
||||||
|
# Setup NVENC encoder
|
||||||
|
if not self._setup_nvenc_encoder(output_path,
|
||||||
|
greenscreen_frame.shape[1],
|
||||||
|
greenscreen_frame.shape[0],
|
||||||
|
fps, bitrate):
|
||||||
|
logger.warning("NVENC setup failed for greenscreen, falling back to OpenCV")
|
||||||
|
return self._write_greenscreen_with_opencv(output_path, greenscreen_frame, frame_count, fps)
|
||||||
|
|
||||||
|
# Write identical greenscreen frames
|
||||||
|
for _ in range(frame_count):
|
||||||
|
self.nvenc_process.stdin.write(greenscreen_frame.tobytes())
|
||||||
|
|
||||||
|
# Finalize encoding
|
||||||
|
self.nvenc_process.stdin.close()
|
||||||
|
self.nvenc_process.wait()
|
||||||
|
|
||||||
|
if self.nvenc_process.returncode != 0:
|
||||||
|
logger.error("NVENC encoding failed for greenscreen")
|
||||||
|
return False
|
||||||
|
|
||||||
|
logger.debug(f"Wrote {frame_count} greenscreen frames using NVENC")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error writing greenscreen with NVENC: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def has_valid_masks(self, video_segments: Optional[Dict[int, Dict[int, np.ndarray]]]) -> bool:
|
||||||
|
"""
|
||||||
|
Check if video segments contain valid masks.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
video_segments: Video segments dictionary from SAM2
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if valid masks are found
|
||||||
|
"""
|
||||||
|
if not video_segments:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check if any frame has non-empty masks
|
||||||
|
for frame_idx, frame_masks in video_segments.items():
|
||||||
|
for obj_id, mask in frame_masks.items():
|
||||||
|
if mask is not None and np.any(mask):
|
||||||
|
return True
|
||||||
|
|
||||||
|
return False
|
||||||
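Taken together, the last few helpers form the fallback path for segments where detection found nothing. A hedged sketch of how a caller might combine them (caller-side code, not part of this file; `segment_info` carries the same fields used above):

```python
# Hypothetical caller: fall back to a full-greenscreen segment when SAM2
# produced no usable masks for this segment.
if processor.has_valid_masks(video_segments):
    processor.process_segment(segment_info, video_segments, use_nvenc=True, bitrate="50M")
else:
    processor.process_greenscreen_only_segment(segment_info, use_nvenc=True, bitrate="50M")
```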
@@ -8,26 +8,44 @@ import cv2
|
|||||||
import numpy as np
|
import numpy as np
|
||||||
import torch
|
import torch
|
||||||
import logging
|
import logging
|
||||||
|
import subprocess
|
||||||
import gc
|
import gc
|
||||||
from typing import Dict, List, Any, Optional, Tuple
|
from typing import Dict, List, Any, Optional, Tuple
|
||||||
from sam2.build_sam import build_sam2_video_predictor
|
from sam2.build_sam import build_sam2_video_predictor
|
||||||
|
from .eye_processor import EyeProcessor
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
class SAM2Processor:
|
class SAM2Processor:
|
||||||
"""Handles SAM2-based video segmentation for human tracking."""
|
"""Handles SAM2-based video segmentation for human tracking."""
|
||||||
|
|
||||||
def __init__(self, checkpoint_path: str, config_path: str):
|
def __init__(self, checkpoint_path: str, config_path: str, vos_optimized: bool = False,
|
||||||
|
separate_eye_processing: bool = False, eye_overlap_pixels: int = 0,
|
||||||
|
async_preprocessor=None):
|
||||||
"""
|
"""
|
||||||
Initialize SAM2 processor.
|
Initialize SAM2 processor.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
checkpoint_path: Path to SAM2 checkpoint
|
checkpoint_path: Path to SAM2 checkpoint
|
||||||
config_path: Path to SAM2 config file
|
config_path: Path to SAM2 config file
|
||||||
|
vos_optimized: Enable VOS optimization for speedup (requires PyTorch 2.5.1+)
|
||||||
|
separate_eye_processing: Enable VR180 separate eye processing mode
|
||||||
|
eye_overlap_pixels: Pixel overlap between eyes for blending
|
||||||
|
async_preprocessor: Optional async preprocessor for background low-res video generation
|
||||||
"""
|
"""
|
||||||
self.checkpoint_path = checkpoint_path
|
self.checkpoint_path = checkpoint_path
|
||||||
self.config_path = config_path
|
self.config_path = config_path
|
||||||
|
self.vos_optimized = vos_optimized
|
||||||
|
self.separate_eye_processing = separate_eye_processing
|
||||||
|
self.async_preprocessor = async_preprocessor
|
||||||
self.predictor = None
|
self.predictor = None
|
||||||
|
|
||||||
|
# Initialize eye processor if separate eye processing is enabled
|
||||||
|
if separate_eye_processing:
|
||||||
|
self.eye_processor = EyeProcessor(eye_overlap_pixels=eye_overlap_pixels)
|
||||||
|
else:
|
||||||
|
self.eye_processor = None
|
||||||
|
|
||||||
self._initialize_predictor()
|
self._initialize_predictor()
|
||||||
|
|
||||||
def _initialize_predictor(self):
|
def _initialize_predictor(self):
|
||||||
@@ -46,12 +64,51 @@ class SAM2Processor:
|
|||||||
|
|
||||||
logger.info(f"Using device: {device}")
|
logger.info(f"Using device: {device}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Extract just the config filename for SAM2's Hydra-based loader
|
||||||
|
# SAM2 expects a config name relative to its internal config directory
|
||||||
|
config_name = os.path.basename(self.config_path)
|
||||||
|
if config_name.endswith('.yaml'):
|
||||||
|
config_name = config_name[:-5] # Remove .yaml extension
|
||||||
|
|
||||||
|
# SAM2 configs are in the format "sam2.1_hiera_X.yaml"
|
||||||
|
# and should be referenced as "configs/sam2.1/sam2.1_hiera_X"
|
||||||
|
if config_name.startswith("sam2.1_hiera"):
|
||||||
|
config_name = f"configs/sam2.1/{config_name}"
|
||||||
|
elif config_name.startswith("sam2_hiera"):
|
||||||
|
config_name = f"configs/sam2/{config_name}"
|
||||||
|
|
||||||
|
logger.info(f"Using SAM2 config: {config_name}")
|
||||||
|
|
||||||
|
# Use VOS optimization if enabled and supported
|
||||||
|
if self.vos_optimized:
|
||||||
try:
|
try:
|
||||||
self.predictor = build_sam2_video_predictor(
|
self.predictor = build_sam2_video_predictor(
|
||||||
self.config_path,
|
config_name, # Use just the config name, not full path
|
||||||
self.checkpoint_path,
|
self.checkpoint_path,
|
||||||
device=device
|
device=device,
|
||||||
|
vos_optimized=True # New optimization for major speedup
|
||||||
)
|
)
|
||||||
|
logger.info("Using optimized SAM2 VOS predictor with full model compilation")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Failed to use optimized VOS predictor: {e}")
|
||||||
|
logger.info("Falling back to standard SAM2 predictor")
|
||||||
|
# Fallback to standard predictor
|
||||||
|
self.predictor = build_sam2_video_predictor(
|
||||||
|
config_name,
|
||||||
|
self.checkpoint_path,
|
||||||
|
device=device,
|
||||||
|
overrides=dict(conf=0.95)
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# Use standard predictor
|
||||||
|
self.predictor = build_sam2_video_predictor(
|
||||||
|
config_name,
|
||||||
|
self.checkpoint_path,
|
||||||
|
device=device,
|
||||||
|
overrides=dict(conf=0.95)
|
||||||
|
)
|
||||||
|
logger.info("Using standard SAM2 predictor")
|
||||||
|
|
||||||
# Enable optimizations for CUDA
|
# Enable optimizations for CUDA
|
||||||
if device.type == "cuda":
|
if device.type == "cuda":
|
||||||
@@ -67,13 +124,64 @@ class SAM2Processor:
|
|||||||
|
|
||||||
def create_low_res_video(self, input_video_path: str, output_video_path: str, scale: float):
|
def create_low_res_video(self, input_video_path: str, output_video_path: str, scale: float):
|
||||||
"""
|
"""
|
||||||
Create a low-resolution version of the input video for inference.
|
Create a low-resolution version of the input video for inference using FFmpeg
|
||||||
|
with hardware acceleration for improved performance.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
input_video_path: Path to input video
|
input_video_path: Path to input video
|
||||||
output_video_path: Path to output low-res video
|
output_video_path: Path to output low-res video
|
||||||
scale: Scale factor for resolution reduction
|
scale: Scale factor for resolution reduction
|
||||||
"""
|
"""
|
||||||
|
try:
|
||||||
|
# Get video properties using OpenCV
|
||||||
|
cap = cv2.VideoCapture(input_video_path)
|
||||||
|
if not cap.isOpened():
|
||||||
|
raise ValueError(f"Could not open video: {input_video_path}")
|
||||||
|
|
||||||
|
original_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||||
|
original_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||||
|
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
|
||||||
|
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
target_width = int(original_width * scale)
|
||||||
|
target_height = int(original_height * scale)
|
||||||
|
|
||||||
|
# Ensure dimensions are even, as required by many codecs
|
||||||
|
target_width = target_width if target_width % 2 == 0 else target_width + 1
|
||||||
|
target_height = target_height if target_height % 2 == 0 else target_height + 1
|
||||||
|
|
||||||
|
# Construct FFmpeg command with hardware acceleration
|
||||||
|
command = [
|
||||||
|
'ffmpeg',
|
||||||
|
'-y',
|
||||||
|
'-hwaccel', 'auto', # Auto-detect hardware acceleration
|
||||||
|
'-i', input_video_path,
|
||||||
|
'-vf', f'scale={target_width}:{target_height}',
|
||||||
|
'-c:v', 'h264_nvenc', # Use NVIDIA's hardware encoder
|
||||||
|
'-preset', 'fast',
|
||||||
|
'-cq', '23',  # constant-quality target; '-crf' is a libx264/x265 option that h264_nvenc ignores
|
||||||
|
output_video_path
|
||||||
|
]
|
||||||
|
|
||||||
|
logger.info(f"Executing FFmpeg command: {' '.join(command)}")
|
||||||
|
|
||||||
|
# Execute FFmpeg command
|
||||||
|
process = subprocess.run(command, check=True, capture_output=True, text=True)
|
||||||
|
|
||||||
|
            # Note: with check=True a failing FFmpeg run raises CalledProcessError here,
            # which the fallback handler below catches, so no return-code check is needed.
|
||||||
|
|
||||||
|
logger.info(f"Created low-res video with {frame_count} frames: {output_video_path}")
|
||||||
|
|
||||||
|
except (subprocess.CalledProcessError, FileNotFoundError) as e:
|
||||||
|
logger.warning(f"Hardware-accelerated FFmpeg failed: {e}. Falling back to OpenCV.")
|
||||||
|
# Fallback to original OpenCV implementation if FFmpeg fails
|
||||||
|
self._create_low_res_video_opencv(input_video_path, output_video_path, scale)
|
||||||
|
|
||||||
|
def _create_low_res_video_opencv(self, input_video_path: str, output_video_path: str, scale: float):
|
||||||
|
"""Original OpenCV-based implementation for creating low-resolution video."""
|
||||||
cap = cv2.VideoCapture(input_video_path)
|
cap = cv2.VideoCapture(input_video_path)
|
||||||
if not cap.isOpened():
|
if not cap.isOpened():
|
||||||
raise ValueError(f"Could not open video: {input_video_path}")
|
raise ValueError(f"Could not open video: {input_video_path}")
|
||||||
@@ -98,42 +206,106 @@ class SAM2Processor:
|
|||||||
cap.release()
|
cap.release()
|
||||||
out.release()
|
out.release()
|
||||||
|
|
||||||
logger.info(f"Created low-res video with {frame_count} frames: {output_video_path}")
|
logger.info(f"Created low-res video with {frame_count} frames using OpenCV: {output_video_path}")
|
||||||
|
|
||||||
def add_yolo_prompts_to_predictor(self, inference_state, prompts: List[Dict[str, Any]]) -> bool:
|
def ensure_low_res_video(self, input_video_path: str, output_video_path: str,
|
||||||
|
scale: float, segment_idx: Optional[int] = None) -> bool:
|
||||||
|
"""
|
||||||
|
Ensure low-resolution video exists, using async preprocessor if available.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
input_video_path: Path to input video
|
||||||
|
output_video_path: Path to output low-res video
|
||||||
|
scale: Scale factor for resolution reduction
|
||||||
|
segment_idx: Optional segment index for async coordination
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if low-res video is ready
|
||||||
|
"""
|
||||||
|
# Check if already exists
|
||||||
|
if os.path.exists(output_video_path) and os.path.getsize(output_video_path) > 0:
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Use async preprocessor if available and segment index provided
|
||||||
|
if self.async_preprocessor and segment_idx is not None:
|
||||||
|
if self.async_preprocessor.is_segment_ready(segment_idx):
|
||||||
|
if os.path.exists(output_video_path) and os.path.getsize(output_video_path) > 0:
|
||||||
|
logger.debug(f"Async preprocessor provided segment {segment_idx}")
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
logger.debug(f"Async preprocessor hasn't completed segment {segment_idx} yet")
|
||||||
|
|
||||||
|
# Fallback to synchronous creation
|
||||||
|
try:
|
||||||
|
logger.info(f"Creating low-res video synchronously: {input_video_path} -> {output_video_path}")
|
||||||
|
self.create_low_res_video(input_video_path, output_video_path, scale)
|
||||||
|
|
||||||
|
if os.path.exists(output_video_path) and os.path.getsize(output_video_path) > 0:
|
||||||
|
logger.info(f"Successfully created low-res video: {output_video_path} ({os.path.getsize(output_video_path)} bytes)")
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
logger.error(f"Low-res video creation failed - file doesn't exist or is empty: {output_video_path}")
|
||||||
|
return False
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to create low-res video {output_video_path}: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def add_yolo_prompts_to_predictor(self, inference_state, prompts: List[Dict[str, Any]],
|
||||||
|
inference_scale: float = 1.0) -> bool:
|
||||||
"""
|
"""
|
||||||
Add YOLO detection prompts to SAM2 predictor.
|
Add YOLO detection prompts to SAM2 predictor.
|
||||||
|
Includes error handling matching the working spec.md implementation.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
inference_state: SAM2 inference state
|
inference_state: SAM2 inference state
|
||||||
prompts: List of prompt dictionaries with obj_id and bbox
|
prompts: List of prompt dictionaries with obj_id and bbox
|
||||||
|
inference_scale: Scale factor to apply to bounding boxes
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
True if prompts were added successfully
|
True if prompts were added successfully
|
||||||
"""
|
"""
|
||||||
if not prompts:
|
if not prompts:
|
||||||
logger.warning("No prompts provided to SAM2")
|
logger.warning("SAM2 Debug: No prompts provided to SAM2")
|
||||||
return False
|
return False
|
||||||
|
|
||||||
try:
|
logger.info(f"SAM2 Debug: Received {len(prompts)} prompts to add to predictor")
|
||||||
for prompt in prompts:
|
|
||||||
|
success_count = 0
|
||||||
|
|
||||||
|
for i, prompt in enumerate(prompts):
|
||||||
obj_id = prompt['obj_id']
|
obj_id = prompt['obj_id']
|
||||||
bbox = prompt['bbox']
|
bbox = prompt['bbox']
|
||||||
|
confidence = prompt.get('confidence', 'unknown')
|
||||||
|
|
||||||
|
# Scale bounding box for SAM2 inference resolution
|
||||||
|
scaled_bbox = bbox * inference_scale
|
||||||
|
|
||||||
|
logger.info(f"SAM2 Debug: Adding prompt {i+1}/{len(prompts)}: Object {obj_id}")
|
||||||
|
logger.info(f" Original bbox: {bbox}")
|
||||||
|
logger.info(f" Scaled bbox (scale={inference_scale}): {scaled_bbox}")
|
||||||
|
logger.info(f" Confidence: {confidence}")
|
||||||
|
|
||||||
|
try:
|
||||||
_, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(
|
_, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(
|
||||||
inference_state=inference_state,
|
inference_state=inference_state,
|
||||||
frame_idx=0,
|
frame_idx=0,
|
||||||
obj_id=obj_id,
|
obj_id=obj_id,
|
||||||
box=bbox.astype(np.float32),
|
box=scaled_bbox.astype(np.float32),
|
||||||
)
|
)
|
||||||
|
|
||||||
logger.debug(f"Added prompt for Object {obj_id}: {bbox}")
|
logger.info(f"SAM2 Debug: ✓ Successfully added Object {obj_id} - returned obj_ids: {out_obj_ids}")
|
||||||
|
success_count += 1
|
||||||
logger.info(f"Successfully added {len(prompts)} prompts to SAM2")
|
|
||||||
return True
|
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Error adding prompts to SAM2: {e}")
|
logger.error(f"SAM2 Debug: ✗ Error adding Object {obj_id}: {e}")
|
||||||
|
# Continue processing other prompts even if one fails
|
||||||
|
continue
|
||||||
|
|
||||||
|
if success_count > 0:
|
||||||
|
logger.info(f"SAM2 Debug: Final result - {success_count}/{len(prompts)} prompts successfully added")
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
logger.error("SAM2 Debug: FAILED - No prompts were successfully added to SAM2")
|
||||||
return False
|
return False
|
||||||
|
|
||||||
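Since SAM2 runs on the down-scaled video, the YOLO boxes (detected on the full-resolution frame) are multiplied by `inference_scale` before being handed to the predictor. A standalone sketch of that coordinate mapping, with illustrative numbers:

```python
import numpy as np

inference_scale = 0.5                                         # low-res video is half resolution
bbox_full_res = np.array([1920.0, 540.0, 2410.0, 1630.0])     # x1, y1, x2, y2 from YOLO
bbox_low_res = bbox_full_res * inference_scale                # [960.0, 270.0, 1205.0, 815.0]

# bbox_low_res is what add_new_points_or_box() receives; the masks SAM2 returns are
# likewise at low resolution and are upscaled again before being applied to frames.
```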
def load_previous_segment_mask(self, prev_segment_dir: str) -> Optional[Dict[int, np.ndarray]]:
|
def load_previous_segment_mask(self, prev_segment_dir: str) -> Optional[Dict[int, np.ndarray]]:
|
||||||
@@ -218,32 +390,46 @@ class SAM2Processor:
|
|||||||
Dictionary mapping frame indices to object masks
|
Dictionary mapping frame indices to object masks
|
||||||
"""
|
"""
|
||||||
video_segments = {}
|
video_segments = {}
|
||||||
|
frame_count = 0
|
||||||
|
|
||||||
try:
|
try:
|
||||||
|
logger.info("Starting SAM2 mask propagation...")
|
||||||
for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(inference_state):
|
for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(inference_state):
|
||||||
video_segments[out_frame_idx] = {
|
video_segments[out_frame_idx] = {
|
||||||
out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
|
out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
|
||||||
for i, out_obj_id in enumerate(out_obj_ids)
|
for i, out_obj_id in enumerate(out_obj_ids)
|
||||||
}
|
}
|
||||||
|
frame_count += 1
|
||||||
|
|
||||||
logger.info(f"Propagated masks across {len(video_segments)} frames with {len(out_obj_ids)} objects")
|
# Log progress every 50 frames
|
||||||
|
if frame_count % 50 == 0:
|
||||||
|
logger.info(f"SAM2 propagation progress: {frame_count} frames processed")
|
||||||
|
|
||||||
|
logger.info(f"SAM2 propagation completed: {len(video_segments)} frames with {len(out_obj_ids) if 'out_obj_ids' in locals() else 0} objects")
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Error during mask propagation: {e}")
|
logger.error(f"Error during mask propagation after {frame_count} frames: {e}")
|
||||||
|
logger.error("This may be due to VOS optimization issues or insufficient GPU memory")
|
||||||
|
if frame_count == 0:
|
||||||
|
logger.error("No frames were processed - propagation failed completely")
|
||||||
|
else:
|
||||||
|
logger.warning(f"Partial propagation completed: {frame_count} frames before failure")
|
||||||
|
|
||||||
return video_segments
|
return video_segments
|
||||||
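The returned structure maps frame index to a per-object mask dictionary; each mask typically has shape (1, H, W), hence the `.squeeze()` calls elsewhere in the pipeline. A hedged sketch of a simple consumer:

```python
# Illustrative consumer of the propagation result: video_segments[frame][obj_id] -> bool mask.
for frame_idx, frame_masks in sorted(video_segments.items()):
    for obj_id, mask in frame_masks.items():
        coverage = float(mask.mean())          # fraction of pixels assigned to this object
        print(f"frame {frame_idx}, object {obj_id}: {coverage:.1%} of pixels")
```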
|
|
||||||
def process_single_segment(self, segment_info: dict, yolo_prompts: Optional[List[Dict[str, Any]]] = None,
|
def process_single_segment(self, segment_info: dict, yolo_prompts: Optional[List[Dict[str, Any]]] = None,
|
||||||
previous_masks: Optional[Dict[int, np.ndarray]] = None,
|
previous_masks: Optional[Dict[int, np.ndarray]] = None,
|
||||||
inference_scale: float = 0.5) -> Optional[Dict[int, Dict[int, np.ndarray]]]:
|
inference_scale: float = 0.5,
|
||||||
|
multi_frame_prompts: Optional[Dict[int, List[Dict[str, Any]]]] = None) -> Optional[Dict[int, Dict[int, np.ndarray]]]:
|
||||||
"""
|
"""
|
||||||
Process a single video segment with SAM2.
|
Process a single video segment with SAM2.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
segment_info: Segment information dictionary
|
segment_info: Segment information dictionary
|
||||||
yolo_prompts: Optional YOLO detection prompts
|
yolo_prompts: Optional YOLO detection prompts for first frame
|
||||||
previous_masks: Optional masks from previous segment
|
previous_masks: Optional masks from previous segment
|
||||||
inference_scale: Scale factor for inference
|
inference_scale: Scale factor for inference
|
||||||
|
multi_frame_prompts: Optional prompts for multiple frames (mid-segment detection)
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Video segments dictionary or None if failed
|
Video segments dictionary or None if failed
|
||||||
@@ -260,13 +446,10 @@ class SAM2Processor:
|
|||||||
|
|
||||||
logger.info(f"Processing segment {segment_idx} with SAM2")
|
logger.info(f"Processing segment {segment_idx} with SAM2")
|
||||||
|
|
||||||
# Create low-resolution video for inference
|
# Create low-resolution video for inference (async-aware)
|
||||||
low_res_video_path = os.path.join(segment_dir, "low_res_video.mp4")
|
low_res_video_path = os.path.join(segment_dir, "low_res_video.mp4")
|
||||||
if not os.path.exists(low_res_video_path):
|
if not self.ensure_low_res_video(video_path, low_res_video_path, inference_scale, segment_idx):
|
||||||
try:
|
logger.error(f"Failed to create low-res video for segment {segment_idx}")
|
||||||
self.create_low_res_video(video_path, low_res_video_path, inference_scale)
|
|
||||||
except Exception as e:
|
|
||||||
logger.error(f"Failed to create low-res video for segment {segment_idx}: {e}")
|
|
||||||
return None
|
return None
|
||||||
|
|
||||||
try:
|
try:
|
||||||
@@ -275,7 +458,7 @@ class SAM2Processor:
|
|||||||
|
|
||||||
# Add prompts or previous masks
|
# Add prompts or previous masks
|
||||||
if yolo_prompts:
|
if yolo_prompts:
|
||||||
if not self.add_yolo_prompts_to_predictor(inference_state, yolo_prompts):
|
if not self.add_yolo_prompts_to_predictor(inference_state, yolo_prompts, inference_scale):
|
||||||
return None
|
return None
|
||||||
elif previous_masks:
|
elif previous_masks:
|
||||||
if not self.add_previous_masks_to_predictor(inference_state, previous_masks):
|
if not self.add_previous_masks_to_predictor(inference_state, previous_masks):
|
||||||
@@ -284,6 +467,13 @@ class SAM2Processor:
|
|||||||
logger.error(f"No prompts or previous masks available for segment {segment_idx}")
|
logger.error(f"No prompts or previous masks available for segment {segment_idx}")
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
# Add mid-segment prompts if provided
|
||||||
|
if multi_frame_prompts:
|
||||||
|
logger.info(f"Adding mid-segment prompts for segment {segment_idx}")
|
||||||
|
if not self.add_multi_frame_prompts_to_predictor(inference_state, multi_frame_prompts):
|
||||||
|
logger.warning(f"Failed to add mid-segment prompts for segment {segment_idx}")
|
||||||
|
# Don't return None here - continue with existing prompts
|
||||||
|
|
||||||
# Propagate masks
|
# Propagate masks
|
||||||
video_segments = self.propagate_masks(inference_state)
|
video_segments = self.propagate_masks(inference_state)
|
||||||
|
|
||||||
@@ -299,13 +489,7 @@ class SAM2Processor:
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.warning(f"Could not remove low-res video: {e}")
|
logger.warning(f"Could not remove low-res video: {e}")
|
||||||
|
|
||||||
# Mark segment as completed (for resume capability)
|
|
||||||
try:
|
|
||||||
with open(output_done_file, 'w') as f:
|
|
||||||
f.write(f"Segment {segment_idx} completed successfully\n")
|
|
||||||
logger.debug(f"Marked segment {segment_idx} as completed")
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Could not create completion marker: {e}")
|
|
||||||
|
|
||||||
return video_segments
|
return video_segments
|
||||||
|
|
||||||
@@ -360,3 +544,464 @@ class SAM2Processor:
|
|||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Error saving final masks: {e}")
|
logger.error(f"Error saving final masks: {e}")
|
||||||
|
|
||||||
|
def generate_first_frame_debug_masks(self, video_path: str, prompts: List[Dict[str, Any]],
|
||||||
|
output_path: str, inference_scale: float = 0.5) -> bool:
|
||||||
|
"""
|
||||||
|
Generate SAM2 masks for just the first frame and save debug visualization.
|
||||||
|
This helps debug what SAM2 is producing for each detected object.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
video_path: Path to the video file
|
||||||
|
prompts: List of SAM2 prompt dictionaries
|
||||||
|
output_path: Path to save the debug image
|
||||||
|
inference_scale: Scale factor for SAM2 inference
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if debug masks were generated successfully
|
||||||
|
"""
|
||||||
|
if not prompts:
|
||||||
|
logger.warning("No prompts provided for first frame debug")
|
||||||
|
return False
|
||||||
|
|
||||||
|
try:
|
||||||
|
logger.info(f"SAM2 Debug: Generating first frame masks for {len(prompts)} objects")
|
||||||
|
|
||||||
|
# Load the first frame
|
||||||
|
cap = cv2.VideoCapture(video_path)
|
||||||
|
ret, original_frame = cap.read()
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
if not ret:
|
||||||
|
logger.error("Could not read first frame for debug mask generation")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Scale frame for inference if needed
|
||||||
|
if inference_scale != 1.0:
|
||||||
|
inference_frame = cv2.resize(original_frame, None, fx=inference_scale, fy=inference_scale, interpolation=cv2.INTER_LINEAR)
|
||||||
|
else:
|
||||||
|
inference_frame = original_frame.copy()
|
||||||
|
|
||||||
|
# Create temporary low-res video with just first frame
|
||||||
|
import tempfile
|
||||||
|
import os
|
||||||
|
temp_dir = tempfile.mkdtemp()
|
||||||
|
temp_video_path = os.path.join(temp_dir, "first_frame.mp4")
|
||||||
|
|
||||||
|
# Write single frame to temporary video
|
||||||
|
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
|
||||||
|
out = cv2.VideoWriter(temp_video_path, fourcc, 1.0, (inference_frame.shape[1], inference_frame.shape[0]))
|
||||||
|
out.write(inference_frame)
|
||||||
|
out.release()
|
||||||
|
|
||||||
|
# Initialize SAM2 inference state with single frame
|
||||||
|
inference_state = self.predictor.init_state(video_path=temp_video_path, async_loading_frames=True)
|
||||||
|
|
||||||
|
# Add prompts
|
||||||
|
if not self.add_yolo_prompts_to_predictor(inference_state, prompts, inference_scale):
|
||||||
|
logger.error("Failed to add prompts for first frame debug")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Generate masks for first frame only
|
||||||
|
frame_masks = {}
|
||||||
|
for out_frame_idx, out_obj_ids, out_mask_logits in self.predictor.propagate_in_video(inference_state):
|
||||||
|
if out_frame_idx == 0: # Only process first frame
|
||||||
|
frame_masks = {
|
||||||
|
out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
|
||||||
|
for i, out_obj_id in enumerate(out_obj_ids)
|
||||||
|
}
|
||||||
|
break
|
||||||
|
|
||||||
|
if not frame_masks:
|
||||||
|
logger.error("No masks generated for first frame debug")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Create debug visualization
|
||||||
|
debug_frame = original_frame.copy()
|
||||||
|
|
||||||
|
# Define colors for each object
|
||||||
|
colors = {
|
||||||
|
1: (0, 255, 0), # Green for Object 1 (Left eye)
|
||||||
|
2: (255, 0, 0), # Blue for Object 2 (Right eye)
|
||||||
|
3: (0, 255, 255), # Yellow for Object 3
|
||||||
|
4: (255, 0, 255), # Magenta for Object 4
|
||||||
|
}
|
||||||
|
|
||||||
|
# Overlay masks with transparency
|
||||||
|
for obj_id, mask in frame_masks.items():
|
||||||
|
mask = mask.squeeze()
|
||||||
|
|
||||||
|
# Resize mask to match original frame if needed
|
||||||
|
if mask.shape != original_frame.shape[:2]:
|
||||||
|
mask = cv2.resize(mask.astype(np.float32), (original_frame.shape[1], original_frame.shape[0]), interpolation=cv2.INTER_NEAREST)
|
||||||
|
mask = mask > 0.5
|
||||||
|
|
||||||
|
# Apply colored overlay
|
||||||
|
color = colors.get(obj_id, (128, 128, 128))
|
||||||
|
overlay = debug_frame.copy()
|
||||||
|
overlay[mask] = color
|
||||||
|
|
||||||
|
# Blend with original (30% overlay, 70% original)
|
||||||
|
cv2.addWeighted(overlay, 0.3, debug_frame, 0.7, 0, debug_frame)
|
||||||
|
|
||||||
|
# Draw outline
|
||||||
|
contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
|
||||||
|
cv2.drawContours(debug_frame, contours, -1, color, 2)
|
||||||
|
|
||||||
|
logger.info(f"SAM2 Debug: Object {obj_id} mask - shape: {mask.shape}, pixels: {np.sum(mask)}")
|
||||||
|
|
||||||
|
# Add title
|
||||||
|
title = f"SAM2 First Frame Masks: {len(frame_masks)} objects detected"
|
||||||
|
cv2.putText(debug_frame, title, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
|
||||||
|
|
||||||
|
# Add mask source information
|
||||||
|
source_info = "Mask Source: SAM2 (from YOLO bounding boxes)"
|
||||||
|
cv2.putText(debug_frame, source_info, (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)
|
||||||
|
|
||||||
|
# Add object legend
|
||||||
|
y_offset = 90
|
||||||
|
for obj_id in sorted(frame_masks.keys()):
|
||||||
|
color = colors.get(obj_id, (128, 128, 128))
|
||||||
|
text = f"Object {obj_id}: {'Left Eye' if obj_id == 1 else 'Right Eye' if obj_id == 2 else f'Object {obj_id}'}"
|
||||||
|
cv2.putText(debug_frame, text, (10, y_offset), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
|
||||||
|
y_offset += 30
|
||||||
|
|
||||||
|
# Save debug image
|
||||||
|
success = cv2.imwrite(output_path, debug_frame)
|
||||||
|
|
||||||
|
# Cleanup
|
||||||
|
self.predictor.reset_state(inference_state)
|
||||||
|
import shutil
|
||||||
|
shutil.rmtree(temp_dir)
|
||||||
|
|
||||||
|
if success:
|
||||||
|
logger.info(f"SAM2 Debug: Saved first frame masks to {output_path}")
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
logger.error(f"Failed to save first frame masks to {output_path}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error generating first frame debug masks: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def add_multi_frame_prompts_to_predictor(self, inference_state, multi_frame_prompts: Dict[int, Any]) -> bool:
|
||||||
|
"""
|
||||||
|
Add YOLO prompts at multiple frame indices for mid-segment re-detection.
|
||||||
|
Supports both bounding box prompts (detection mode) and mask prompts (segmentation mode).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
inference_state: SAM2 inference state
|
||||||
|
multi_frame_prompts: Dictionary mapping frame_index -> prompts (list of dicts for bbox, dict with 'masks' for segmentation)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if prompts were added successfully
|
||||||
|
"""
|
||||||
|
if not multi_frame_prompts:
|
||||||
|
logger.warning("SAM2 Mid-segment: No multi-frame prompts provided")
|
||||||
|
return False
|
||||||
|
|
||||||
|
success_count = 0
|
||||||
|
total_count = 0
|
||||||
|
|
||||||
|
for frame_idx, prompts_data in multi_frame_prompts.items():
|
||||||
|
# Check if this is segmentation mode (masks) or detection mode (bbox prompts)
|
||||||
|
if isinstance(prompts_data, dict) and 'masks' in prompts_data:
|
||||||
|
# Segmentation mode: add masks directly
|
||||||
|
masks_dict = prompts_data['masks']
|
||||||
|
logger.info(f"SAM2 Mid-segment: Processing frame {frame_idx} with {len(masks_dict)} YOLO masks")
|
||||||
|
|
||||||
|
for obj_id, mask in masks_dict.items():
|
||||||
|
total_count += 1
|
||||||
|
logger.info(f"SAM2 Mid-segment: Frame {frame_idx}, adding mask for Object {obj_id}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
self.predictor.add_new_mask(inference_state, frame_idx, obj_id, mask)
|
||||||
|
logger.info(f"SAM2 Mid-segment: ✓ Frame {frame_idx}, Object {obj_id} mask added successfully")
|
||||||
|
success_count += 1
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"SAM2 Mid-segment: ✗ Frame {frame_idx}, Object {obj_id} mask failed: {e}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
else:
|
||||||
|
# Detection mode: add bounding box prompts (existing logic)
|
||||||
|
prompts = prompts_data
|
||||||
|
logger.info(f"SAM2 Mid-segment: Processing frame {frame_idx} with {len(prompts)} bbox prompts")
|
||||||
|
|
||||||
|
for i, prompt in enumerate(prompts):
|
||||||
|
obj_id = prompt['obj_id']
|
||||||
|
bbox = prompt['bbox']
|
||||||
|
confidence = prompt.get('confidence', 'unknown')
|
||||||
|
total_count += 1
|
||||||
|
|
||||||
|
logger.info(f"SAM2 Mid-segment: Frame {frame_idx}, Prompt {i+1}/{len(prompts)}: Object {obj_id}, bbox={bbox}, conf={confidence}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
_, out_obj_ids, out_mask_logits = self.predictor.add_new_points_or_box(
|
||||||
|
inference_state=inference_state,
|
||||||
|
frame_idx=frame_idx, # Key: specify the exact frame index
|
||||||
|
obj_id=obj_id,
|
||||||
|
box=bbox.astype(np.float32),
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"SAM2 Mid-segment: ✓ Frame {frame_idx}, Object {obj_id} added successfully - returned obj_ids: {out_obj_ids}")
|
||||||
|
success_count += 1
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"SAM2 Mid-segment: ✗ Frame {frame_idx}, Object {obj_id} failed: {e}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
if success_count > 0:
|
||||||
|
logger.info(f"SAM2 Mid-segment: Final result - {success_count}/{total_count} prompts successfully added across {len(multi_frame_prompts)} frames")
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
logger.error("SAM2 Mid-segment: FAILED - No prompts were successfully added")
|
||||||
|
return False
|
||||||
|
|
||||||
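The two shapes accepted by `add_multi_frame_prompts_to_predictor` are easiest to see with a concrete value. The sketch below is an editor's illustration based on the docstring and the two branches above; the frame index, boxes, confidences, and mask size are made-up placeholders, and `sam2_processor`/`inference_state` are assumed to already exist.

```python
import numpy as np

# Detection mode: frame_index -> list of bbox prompt dicts (keys read by the code above)
multi_frame_prompts_bbox = {
    150: [
        {"obj_id": 1, "bbox": np.array([410, 220, 640, 760]), "confidence": 0.91},
        {"obj_id": 2, "bbox": np.array([1350, 230, 1580, 770]), "confidence": 0.88},
    ],
}

# Segmentation mode: frame_index -> {"masks": {obj_id: boolean mask}}
multi_frame_prompts_masks = {
    150: {"masks": {1: np.zeros((540, 960), dtype=bool)}},
}

# Either dictionary would be passed as the multi_frame_prompts argument, e.g.
# sam2_processor.add_multi_frame_prompts_to_predictor(inference_state, multi_frame_prompts_bbox)
```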
|
def process_single_eye_segment(self, segment_info: dict, eye_side: str,
|
||||||
|
yolo_prompts: Optional[List[Dict[str, Any]]] = None,
|
||||||
|
previous_masks: Optional[Dict[int, np.ndarray]] = None,
|
||||||
|
inference_scale: float = 0.5) -> Optional[Dict[int, np.ndarray]]:
|
||||||
|
"""
|
||||||
|
Process a single eye of a VR180 segment with SAM2.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segment_info: Segment information dictionary
|
||||||
|
eye_side: 'left' or 'right' eye
|
||||||
|
yolo_prompts: Optional YOLO detection prompts for first frame
|
||||||
|
previous_masks: Optional masks from previous segment
|
||||||
|
inference_scale: Scale factor for inference
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary mapping frame indices to masks, or None if failed
|
||||||
|
"""
|
||||||
|
if not self.eye_processor:
|
||||||
|
logger.error("Eye processor not initialized - separate_eye_processing must be enabled")
|
||||||
|
return None
|
||||||
|
|
||||||
|
segment_dir = segment_info['directory']
|
||||||
|
video_path = segment_info['video_file']
|
||||||
|
segment_idx = segment_info['index']
|
||||||
|
|
||||||
|
logger.info(f"Processing {eye_side} eye for segment {segment_idx}")
|
||||||
|
|
||||||
|
# Use the video path directly (it should already be the eye-specific video)
|
||||||
|
eye_video_path = video_path
|
||||||
|
|
||||||
|
# Verify the eye video exists
|
||||||
|
if not os.path.exists(eye_video_path):
|
||||||
|
logger.error(f"Eye video not found: {eye_video_path}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Create low-resolution eye video for inference (async-aware)
|
||||||
|
low_res_eye_video_path = os.path.join(segment_dir, f"low_res_{eye_side}_eye_video.mp4")
|
||||||
|
if not self.ensure_low_res_video(eye_video_path, low_res_eye_video_path, inference_scale, segment_idx):
|
||||||
|
logger.error(f"Failed to create low-res {eye_side} eye video for segment {segment_idx}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Initialize inference state with eye-specific video
|
||||||
|
inference_state = self.predictor.init_state(video_path=low_res_eye_video_path, async_loading_frames=True)
|
||||||
|
|
||||||
|
# Add prompts or previous masks (always use obj_id=1 for single eye processing)
|
||||||
|
if yolo_prompts:
|
||||||
|
# Convert prompts to use obj_id=1 for single eye processing
|
||||||
|
eye_prompts = []
|
||||||
|
for prompt in yolo_prompts:
|
||||||
|
eye_prompt = prompt.copy()
|
||||||
|
eye_prompt['obj_id'] = 1 # Always use obj_id=1 for single eye
|
||||||
|
eye_prompts.append(eye_prompt)
|
||||||
|
|
||||||
|
if not self.add_yolo_prompts_to_predictor(inference_state, eye_prompts, inference_scale):
|
||||||
|
logger.error(f"Failed to add prompts for {eye_side} eye")
|
||||||
|
return None
|
||||||
|
|
||||||
|
elif previous_masks:
|
||||||
|
# Convert previous masks to use obj_id=1 for single eye processing
|
||||||
|
eye_masks = {1: list(previous_masks.values())[0]} if previous_masks else {}
|
||||||
|
if not self.add_previous_masks_to_predictor(inference_state, eye_masks):
|
||||||
|
logger.error(f"Failed to add previous masks for {eye_side} eye")
|
||||||
|
return None
|
||||||
|
else:
|
||||||
|
logger.error(f"No prompts or previous masks available for {eye_side} eye of segment {segment_idx}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Propagate masks
|
||||||
|
logger.info(f"Propagating masks for {eye_side} eye")
|
||||||
|
video_segments = self.propagate_masks(inference_state)
|
||||||
|
|
||||||
|
# Extract just the masks (remove obj_id structure since we only use obj_id=1)
|
||||||
|
eye_masks = {}
|
||||||
|
for frame_idx, frame_masks in video_segments.items():
|
||||||
|
if 1 in frame_masks: # We always use obj_id=1 for single eye processing
|
||||||
|
eye_masks[frame_idx] = frame_masks[1]
|
||||||
|
|
||||||
|
# Clean up
|
||||||
|
self.predictor.reset_state(inference_state)
|
||||||
|
del inference_state
|
||||||
|
gc.collect()
|
||||||
|
|
||||||
|
# Remove temporary low-res video
|
||||||
|
try:
|
||||||
|
os.remove(low_res_eye_video_path)
|
||||||
|
logger.debug(f"Removed low-res {eye_side} eye video: {low_res_eye_video_path}")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Could not remove low-res {eye_side} eye video: {e}")
|
||||||
|
|
||||||
|
logger.info(f"Successfully processed {eye_side} eye with {len(eye_masks)} frames")
|
||||||
|
return eye_masks
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error processing {eye_side} eye for segment {segment_idx}: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
def process_segment_with_separate_eyes(self, segment_info: dict,
|
||||||
|
left_prompts: Optional[List[Dict[str, Any]]] = None,
|
||||||
|
right_prompts: Optional[List[Dict[str, Any]]] = None,
|
||||||
|
previous_left_masks: Optional[Dict[int, np.ndarray]] = None,
|
||||||
|
previous_right_masks: Optional[Dict[int, np.ndarray]] = None,
|
||||||
|
inference_scale: float = 0.5,
|
||||||
|
full_frame_shape: Optional[Tuple[int, int]] = None) -> Optional[Dict[int, Dict[int, np.ndarray]]]:
|
||||||
|
"""
|
||||||
|
Process a VR180 segment with separate left and right eye processing.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segment_info: Segment information dictionary
|
||||||
|
left_prompts: Optional YOLO prompts for left eye
|
||||||
|
right_prompts: Optional YOLO prompts for right eye
|
||||||
|
previous_left_masks: Optional previous masks for left eye
|
||||||
|
previous_right_masks: Optional previous masks for right eye
|
||||||
|
inference_scale: Scale factor for inference
|
||||||
|
full_frame_shape: Shape of full VR180 frame (height, width)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Combined video segments dictionary or None if failed
|
||||||
|
"""
|
||||||
|
if not self.eye_processor:
|
||||||
|
logger.error("Eye processor not initialized - separate_eye_processing must be enabled")
|
||||||
|
return None
|
||||||
|
|
||||||
|
segment_idx = segment_info['index']
|
||||||
|
logger.info(f"Processing segment {segment_idx} with separate eye processing")
|
||||||
|
|
||||||
|
# Get full frame shape if not provided
|
||||||
|
if full_frame_shape is None:
|
||||||
|
try:
|
||||||
|
cap = cv2.VideoCapture(segment_info['video_file'])
|
||||||
|
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||||
|
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||||
|
cap.release()
|
||||||
|
full_frame_shape = (height, width)
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Could not determine frame shape: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Process left eye if prompts or previous masks are available
|
||||||
|
left_masks = None
|
||||||
|
if left_prompts or previous_left_masks:
|
||||||
|
logger.info(f"Processing left eye for segment {segment_idx}")
|
||||||
|
left_masks = self.process_single_eye_segment(
|
||||||
|
segment_info, 'left', left_prompts, previous_left_masks, inference_scale
|
||||||
|
)
|
||||||
|
|
||||||
|
# Process right eye if prompts or previous masks are available
|
||||||
|
right_masks = None
|
||||||
|
if right_prompts or previous_right_masks:
|
||||||
|
logger.info(f"Processing right eye for segment {segment_idx}")
|
||||||
|
right_masks = self.process_single_eye_segment(
|
||||||
|
segment_info, 'right', right_prompts, previous_right_masks, inference_scale
|
||||||
|
)
|
||||||
|
|
||||||
|
# Combine masks back to full frame format
|
||||||
|
if left_masks or right_masks:
|
||||||
|
logger.info(f"Combining eye masks for segment {segment_idx}")
|
||||||
|
combined_masks = self.eye_processor.combine_eye_masks(
|
||||||
|
left_masks, right_masks, full_frame_shape
|
||||||
|
)
|
||||||
|
|
||||||
|
# Clean up eye-specific videos to save space
|
||||||
|
try:
|
||||||
|
left_eye_path = os.path.join(segment_info['directory'], "left_eye_video.mp4")
|
||||||
|
right_eye_path = os.path.join(segment_info['directory'], "right_eye_video.mp4")
|
||||||
|
|
||||||
|
if os.path.exists(left_eye_path):
|
||||||
|
os.remove(left_eye_path)
|
||||||
|
logger.debug(f"Removed left eye video: {left_eye_path}")
|
||||||
|
|
||||||
|
if os.path.exists(right_eye_path):
|
||||||
|
os.remove(right_eye_path)
|
||||||
|
logger.debug(f"Removed right eye video: {right_eye_path}")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Could not clean up eye videos: {e}")
|
||||||
|
|
||||||
|
logger.info(f"Successfully processed segment {segment_idx} with separate eyes")
|
||||||
|
return combined_masks
|
||||||
|
else:
|
||||||
|
logger.warning(f"No masks generated for either eye in segment {segment_idx}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
def create_greenscreen_segment(self, segment_info: dict, green_color: List[int] = [0, 255, 0]) -> bool:
|
||||||
|
"""
|
||||||
|
Create a full greenscreen segment when no humans are detected.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segment_info: Segment information dictionary
|
||||||
|
green_color: RGB values for green screen color
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if greenscreen segment was created successfully
|
||||||
|
"""
|
||||||
|
segment_dir = segment_info['directory']
|
||||||
|
video_path = segment_info['video_file']
|
||||||
|
segment_idx = segment_info['index']
|
||||||
|
|
||||||
|
logger.info(f"Creating full greenscreen segment {segment_idx}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Get video properties
|
||||||
|
cap = cv2.VideoCapture(video_path)
|
||||||
|
if not cap.isOpened():
|
||||||
|
logger.error(f"Could not open video: {video_path}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||||
|
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||||
|
fps = cap.get(cv2.CAP_PROP_FPS)
|
||||||
|
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
# Create output video path
|
||||||
|
output_video_path = os.path.join(segment_dir, f"output_{segment_idx}.mp4")
|
||||||
|
|
||||||
|
# Create greenscreen frames
|
||||||
|
greenscreen_frame = self.eye_processor.create_full_greenscreen_frame(
|
||||||
|
(height, width, 3), green_color
|
||||||
|
)
|
||||||
|
|
||||||
|
# Write greenscreen video
|
||||||
|
fourcc = cv2.VideoWriter_fourcc(*'HEVC')
|
||||||
|
out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))
|
||||||
|
|
||||||
|
for _ in range(frame_count):
|
||||||
|
out.write(greenscreen_frame)
|
||||||
|
|
||||||
|
out.release()
|
||||||
|
|
||||||
|
# Create mask file (empty/black mask since no humans detected)
|
||||||
|
mask_output_path = os.path.join(segment_dir, "mask.png")
|
||||||
|
black_mask = np.zeros((height, width, 3), dtype=np.uint8)
|
||||||
|
cv2.imwrite(mask_output_path, black_mask)
|
||||||
|
|
||||||
|
# Mark segment as completed
|
||||||
|
output_done_file = os.path.join(segment_dir, "output_frames_done")
|
||||||
|
with open(output_done_file, 'w') as f:
|
||||||
|
f.write(f"Greenscreen segment {segment_idx} completed successfully\n")
|
||||||
|
|
||||||
|
logger.info(f"Successfully created greenscreen segment {segment_idx}")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error creating greenscreen segment {segment_idx}: {e}")
|
||||||
|
return False
|
||||||
|
core/video_assembler.py (new file, 306 lines)
@@ -0,0 +1,306 @@
"""
|
||||||
|
Video assembler module for concatenating processed segments.
|
||||||
|
Handles merging processed segments and adding audio from original video.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import logging
|
||||||
|
from typing import List, Optional
|
||||||
|
from utils.file_utils import get_segments_directories, file_exists
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
class VideoAssembler:
|
||||||
|
"""Handles final video assembly from processed segments."""
|
||||||
|
|
||||||
|
def __init__(self, preserve_audio: bool = True, use_nvenc: bool = False,
|
||||||
|
output_mode: str = "green_screen"):
|
||||||
|
"""
|
||||||
|
Initialize video assembler.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
preserve_audio: Whether to preserve audio from original video
|
||||||
|
use_nvenc: Whether to use hardware encoding for final output
|
||||||
|
output_mode: Output mode - "green_screen" or "alpha_channel"
|
||||||
|
"""
|
||||||
|
self.preserve_audio = preserve_audio
|
||||||
|
self.use_nvenc = use_nvenc
|
||||||
|
self.output_mode = output_mode
|
||||||
|
|
||||||
|
def create_concat_file(self, segments_dir: str, output_filename: str = "concat_list.txt") -> Optional[str]:
|
||||||
|
"""
|
||||||
|
Create a concatenation file for FFmpeg.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segments_dir: Directory containing processed segments
|
||||||
|
output_filename: Name for the concat file
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Path to concat file or None if no valid segments found
|
||||||
|
"""
|
||||||
|
concat_path = os.path.join(segments_dir, output_filename)
|
||||||
|
valid_segments = 0
|
||||||
|
|
||||||
|
try:
|
||||||
|
segments = get_segments_directories(segments_dir)
|
||||||
|
|
||||||
|
with open(concat_path, 'w') as f:
|
||||||
|
for i, segment in enumerate(segments):
|
||||||
|
segment_dir = os.path.join(segments_dir, segment)
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
output_video = os.path.join(segment_dir, f"output_{i}.mov")
|
||||||
|
else:
|
||||||
|
output_video = os.path.join(segment_dir, f"output_{i}.mp4")
|
||||||
|
|
||||||
|
if file_exists(output_video):
|
||||||
|
# Use relative path for FFmpeg
|
||||||
|
relative_path = os.path.relpath(output_video, segments_dir)
|
||||||
|
f.write(f"file '{relative_path}'\n")
|
||||||
|
valid_segments += 1
|
||||||
|
else:
|
||||||
|
logger.warning(f"Output video not found for segment {i}: {output_video}")
|
||||||
|
|
||||||
|
if valid_segments == 0:
|
||||||
|
logger.error("No valid output segments found for concatenation")
|
||||||
|
os.remove(concat_path)
|
||||||
|
return None
|
||||||
|
|
||||||
|
logger.info(f"Created concatenation file with {valid_segments} segments: {concat_path}")
|
||||||
|
return concat_path
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error creating concatenation file: {e}")
|
||||||
|
return None
|
||||||
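For reference, the concat list written above uses FFmpeg's concat demuxer syntax, one `file` entry per segment relative to the segments directory; a hypothetical example (segment directory names assumed):

```
file 'segment_0/output_0.mp4'
file 'segment_1/output_1.mp4'
file 'segment_2/output_2.mp4'
```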
|
|
||||||
|
def concatenate_segments(self, segments_dir: str, output_path: str,
|
||||||
|
bitrate: str = "50M") -> bool:
|
||||||
|
"""
|
||||||
|
Concatenate video segments using FFmpeg.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segments_dir: Directory containing processed segments
|
||||||
|
output_path: Path for final concatenated video
|
||||||
|
bitrate: Output video bitrate
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if successful
|
||||||
|
"""
|
||||||
|
# Create concatenation file
|
||||||
|
concat_file = self.create_concat_file(segments_dir)
|
||||||
|
if not concat_file:
|
||||||
|
return False
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Build FFmpeg command
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
# For alpha channel, we need to maintain the ProRes codec
|
||||||
|
cmd = [
|
||||||
|
'ffmpeg',
|
||||||
|
'-y', # Overwrite output
|
||||||
|
'-f', 'concat',
|
||||||
|
'-safe', '0',
|
||||||
|
'-i', concat_file,
|
||||||
|
'-c:v', 'copy', # Copy video codec to preserve alpha
|
||||||
|
'-an', # No audio for now
|
||||||
|
output_path
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
cmd = [
|
||||||
|
'ffmpeg',
|
||||||
|
'-y', # Overwrite output
|
||||||
|
'-f', 'concat',
|
||||||
|
'-safe', '0',
|
||||||
|
'-i', concat_file,
|
||||||
|
'-c:v', 'copy', # Copy video codec (no re-encoding)
|
||||||
|
'-an', # No audio for now
|
||||||
|
output_path
|
||||||
|
]
|
||||||
|
|
||||||
|
# Use hardware encoding if requested
|
||||||
|
if self.use_nvenc:
|
||||||
|
import sys
|
||||||
|
if sys.platform == 'darwin':
|
||||||
|
encoder = 'hevc_videotoolbox'
|
||||||
|
else:
|
||||||
|
encoder = 'hevc_nvenc'
|
||||||
|
|
||||||
|
# Re-encode with hardware acceleration
|
||||||
|
cmd = [
|
||||||
|
'ffmpeg',
|
||||||
|
'-y',
|
||||||
|
'-f', 'concat',
|
||||||
|
'-safe', '0',
|
||||||
|
'-i', concat_file,
|
||||||
|
'-c:v', encoder,
|
||||||
|
'-preset', 'slow',
|
||||||
|
'-b:v', bitrate,
|
||||||
|
'-pix_fmt', 'yuv420p',
|
||||||
|
'-an',
|
||||||
|
output_path
|
||||||
|
]
|
||||||
|
|
||||||
|
logger.info(f"Running concatenation command: {' '.join(cmd)}")
|
||||||
|
|
||||||
|
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||||
|
|
||||||
|
if result.returncode != 0:
|
||||||
|
logger.error(f"FFmpeg concatenation failed: {result.stderr}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
logger.info(f"Successfully concatenated segments to: {output_path}")
|
||||||
|
|
||||||
|
# Clean up concat file
|
||||||
|
try:
|
||||||
|
os.remove(concat_file)
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error during concatenation: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def copy_audio_from_original(self, original_video: str, processed_video: str,
|
||||||
|
final_output: str) -> bool:
|
||||||
|
"""
|
||||||
|
Copy audio track from original video to processed video.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
original_video: Path to original video with audio
|
||||||
|
processed_video: Path to processed video without audio
|
||||||
|
final_output: Path for final output with audio
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if successful
|
||||||
|
"""
|
||||||
|
if not self.preserve_audio:
|
||||||
|
logger.info("Audio preservation disabled, skipping audio copy")
|
||||||
|
return True
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Check if original video has audio
|
||||||
|
probe_cmd = [
|
||||||
|
'ffprobe',
|
||||||
|
'-v', 'error',
|
||||||
|
'-select_streams', 'a:0',
|
||||||
|
'-show_entries', 'stream=codec_type',
|
||||||
|
'-of', 'csv=p=0',
|
||||||
|
original_video
|
||||||
|
]
|
||||||
|
|
||||||
|
result = subprocess.run(probe_cmd, capture_output=True, text=True)
|
||||||
|
|
||||||
|
if result.returncode != 0 or result.stdout.strip() != 'audio':
|
||||||
|
logger.warning("Original video has no audio track")
|
||||||
|
# Just copy the processed video
|
||||||
|
import shutil
|
||||||
|
shutil.copy2(processed_video, final_output)
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Copy audio from original to processed video
|
||||||
|
cmd = [
|
||||||
|
'ffmpeg',
|
||||||
|
'-y',
|
||||||
|
'-i', processed_video, # Video input
|
||||||
|
'-i', original_video, # Audio input
|
||||||
|
'-c:v', 'copy', # Copy video stream
|
||||||
|
'-c:a', 'copy', # Copy audio stream
|
||||||
|
'-map', '0:v:0', # Map video from first input
|
||||||
|
'-map', '1:a:0', # Map audio from second input
|
||||||
|
'-shortest', # Match duration to shortest stream
|
||||||
|
final_output
|
||||||
|
]
|
||||||
|
|
||||||
|
logger.info("Copying audio from original video...")
|
||||||
|
|
||||||
|
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||||
|
|
||||||
|
if result.returncode != 0:
|
||||||
|
logger.error(f"FFmpeg audio copy failed: {result.stderr}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
logger.info(f"Successfully added audio to final video: {final_output}")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error copying audio: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def assemble_final_video(self, segments_dir: str, original_video: str,
|
||||||
|
output_path: str, bitrate: str = "50M") -> bool:
|
||||||
|
"""
|
||||||
|
Complete pipeline to assemble final video with audio.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segments_dir: Directory containing processed segments
|
||||||
|
original_video: Path to original video (for audio)
|
||||||
|
output_path: Path for final output video
|
||||||
|
bitrate: Output video bitrate
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if successful
|
||||||
|
"""
|
||||||
|
logger.info("Starting final video assembly...")
|
||||||
|
|
||||||
|
# Step 1: Concatenate segments
|
||||||
|
temp_concat_path = os.path.join(os.path.dirname(output_path), "temp_concat.mp4")
|
||||||
|
|
||||||
|
if not self.concatenate_segments(segments_dir, temp_concat_path, bitrate):
|
||||||
|
logger.error("Failed to concatenate segments")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Step 2: Add audio from original
|
||||||
|
if self.preserve_audio and file_exists(original_video):
|
||||||
|
success = self.copy_audio_from_original(original_video, temp_concat_path, output_path)
|
||||||
|
|
||||||
|
# Clean up temp file
|
||||||
|
try:
|
||||||
|
os.remove(temp_concat_path)
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return success
|
||||||
|
else:
|
||||||
|
# No audio to add, just rename temp file
|
||||||
|
import shutil
|
||||||
|
try:
|
||||||
|
shutil.move(temp_concat_path, output_path)
|
||||||
|
logger.info(f"Final video saved to: {output_path}")
|
||||||
|
return True
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error moving final video: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def verify_segment_completeness(self, segments_dir: str) -> tuple[bool, List[int]]:
|
||||||
|
"""
|
||||||
|
Verify all segments have been processed.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segments_dir: Directory containing segments
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (all_complete, missing_segments)
|
||||||
|
"""
|
||||||
|
segments = get_segments_directories(segments_dir)
|
||||||
|
missing_segments = []
|
||||||
|
|
||||||
|
for i, segment in enumerate(segments):
|
||||||
|
segment_dir = os.path.join(segments_dir, segment)
|
||||||
|
if self.output_mode == "alpha_channel":
|
||||||
|
output_video = os.path.join(segment_dir, f"output_{i}.mov")
|
||||||
|
else:
|
||||||
|
output_video = os.path.join(segment_dir, f"output_{i}.mp4")
|
||||||
|
|
||||||
|
if not file_exists(output_video):
|
||||||
|
missing_segments.append(i)
|
||||||
|
|
||||||
|
all_complete = len(missing_segments) == 0
|
||||||
|
|
||||||
|
if all_complete:
|
||||||
|
logger.info(f"All {len(segments)} segments have been processed")
|
||||||
|
else:
|
||||||
|
logger.warning(f"Missing output for segments: {missing_segments}")
|
||||||
|
|
||||||
|
return all_complete, missing_segments
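As a usage note, the `VideoAssembler` above can be driven end to end roughly like this; an editor's sketch with placeholder paths, using only the methods defined in this file.

```python
from core.video_assembler import VideoAssembler

assembler = VideoAssembler(preserve_audio=True, use_nvenc=False, output_mode="green_screen")

all_done, missing = assembler.verify_segment_completeness("output/myvideo_segments")
if all_done:
    assembler.assemble_final_video(
        segments_dir="output/myvideo_segments",    # placeholder path
        original_video="input/myvideo.mp4",        # placeholder path
        output_path="output/myvideo_greenscreen.mp4",
        bitrate="50M",
    )
else:
    print(f"Segments still missing output videos: {missing}")
```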
core/video_splitter.py
@@ -7,7 +7,7 @@ import os
 import subprocess
 import logging
 from typing import List, Tuple
-from ..utils.file_utils import ensure_directory, get_video_file_name
+from utils.file_utils import ensure_directory, get_video_file_name

 logger = logging.getLogger(__name__)

@@ -44,6 +44,14 @@ class VideoSplitter:
        segments_dir = os.path.join(output_dir, f"{video_name}_segments")
        ensure_directory(segments_dir)

+       # Check for completion marker to avoid re-splitting
+       completion_marker = os.path.join(segments_dir, ".splitting_done")
+       if os.path.exists(completion_marker):
+           logger.info(f"Video already split, skipping splitting process. Found completion marker: {completion_marker}")
+           segment_dirs = [d for d in os.listdir(segments_dir) if os.path.isdir(os.path.join(segments_dir, d)) and d.startswith("segment_")]
+           segment_dirs.sort(key=lambda x: int(x.split("_")[1]))
+           return segments_dir, segment_dirs
+
        logger.info(f"Splitting video {input_video} into {self.segment_duration}s segments")

        # Split video using ffmpeg
@@ -83,6 +91,11 @@ class VideoSplitter:
        # Create file list for later concatenation
        self._create_file_list(segments_dir, segment_dirs)

+       # Create completion marker
+       completion_marker = os.path.join(segments_dir, ".splitting_done")
+       with open(completion_marker, 'w') as f:
+           f.write("Video splitting completed successfully.")
+
        logger.info(f"Successfully split video into {len(segment_dirs)} segments")
        return segments_dir, segment_dirs

(One file's diff was suppressed because it is too large.)

download_models.py (new executable file, 317 lines)
@@ -0,0 +1,317 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Model download script for YOLO + SAM2 video processing pipeline.
|
||||||
|
Downloads SAM2.1 models and organizes them in the models directory.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import urllib.request
|
||||||
|
import urllib.error
|
||||||
|
from pathlib import Path
|
||||||
|
import sys
|
||||||
|
|
||||||
|
def create_directory_structure():
|
||||||
|
"""Create the models directory structure."""
|
||||||
|
base_dir = Path(__file__).parent
|
||||||
|
models_dir = base_dir / "models"
|
||||||
|
|
||||||
|
# Create main models directory
|
||||||
|
models_dir.mkdir(exist_ok=True)
|
||||||
|
|
||||||
|
# Create subdirectories
|
||||||
|
sam2_dir = models_dir / "sam2"
|
||||||
|
sam2_configs_dir = sam2_dir / "configs" / "sam2.1"
|
||||||
|
sam2_checkpoints_dir = sam2_dir / "checkpoints"
|
||||||
|
yolo_dir = models_dir / "yolo"
|
||||||
|
|
||||||
|
sam2_dir.mkdir(exist_ok=True)
|
||||||
|
sam2_configs_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
sam2_checkpoints_dir.mkdir(exist_ok=True)
|
||||||
|
yolo_dir.mkdir(exist_ok=True)
|
||||||
|
|
||||||
|
print(f"Created models directory structure in: {models_dir}")
|
||||||
|
return models_dir, sam2_configs_dir, sam2_checkpoints_dir, yolo_dir
|
||||||
|
|
||||||
|
def download_file(url, destination, description="file"):
|
||||||
|
"""Download a file with progress indication."""
|
||||||
|
try:
|
||||||
|
print(f"Downloading {description}...")
|
||||||
|
print(f" URL: {url}")
|
||||||
|
print(f" Destination: {destination}")
|
||||||
|
|
||||||
|
def progress_hook(block_num, block_size, total_size):
|
||||||
|
if total_size > 0:
|
||||||
|
percent = min(100, (block_num * block_size * 100) // total_size)
|
||||||
|
sys.stdout.write(f"\r Progress: {percent}%")
|
||||||
|
sys.stdout.flush()
|
||||||
|
|
||||||
|
urllib.request.urlretrieve(url, destination, progress_hook)
|
||||||
|
print(f"\n ✓ Downloaded {description}")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except urllib.error.URLError as e:
|
||||||
|
print(f"\n ✗ Failed to download {description}: {e}")
|
||||||
|
return False
|
||||||
|
except Exception as e:
|
||||||
|
print(f"\n ✗ Error downloading {description}: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def download_sam2_models():
|
||||||
|
"""Download SAM2.1 model configurations and checkpoints."""
|
||||||
|
print("Setting up SAM2.1 models...")
|
||||||
|
|
||||||
|
# Create directory structure
|
||||||
|
models_dir, configs_dir, checkpoints_dir, yolo_dir = create_directory_structure()
|
||||||
|
|
||||||
|
# SAM2.1 model definitions
|
||||||
|
sam2_models = {
|
||||||
|
"tiny": {
|
||||||
|
"config_url": "https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_t.yaml",
|
||||||
|
"checkpoint_url": "https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt",
|
||||||
|
"config_file": "sam2.1_hiera_t.yaml",
|
||||||
|
"checkpoint_file": "sam2.1_hiera_tiny.pt"
|
||||||
|
},
|
||||||
|
"small": {
|
||||||
|
"config_url": "https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_s.yaml",
|
||||||
|
"checkpoint_url": "https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.pt",
|
||||||
|
"config_file": "sam2.1_hiera_s.yaml",
|
||||||
|
"checkpoint_file": "sam2.1_hiera_small.pt"
|
||||||
|
},
|
||||||
|
"base_plus": {
|
||||||
|
"config_url": "https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_b+.yaml",
|
||||||
|
"checkpoint_url": "https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_base_plus.pt",
|
||||||
|
"config_file": "sam2.1_hiera_b+.yaml",
|
||||||
|
"checkpoint_file": "sam2.1_hiera_base_plus.pt"
|
||||||
|
},
|
||||||
|
"large": {
|
||||||
|
"config_url": "https://raw.githubusercontent.com/facebookresearch/sam2/main/sam2/configs/sam2.1/sam2.1_hiera_l.yaml",
|
||||||
|
"checkpoint_url": "https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt",
|
||||||
|
"config_file": "sam2.1_hiera_l.yaml",
|
||||||
|
"checkpoint_file": "sam2.1_hiera_large.pt"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
success_count = 0
|
||||||
|
total_downloads = len(sam2_models) * 2 # configs + checkpoints
|
||||||
|
|
||||||
|
# Download each model's config and checkpoint
|
||||||
|
for model_name, model_info in sam2_models.items():
|
||||||
|
print(f"\n--- Downloading SAM2.1 {model_name.upper()} model ---")
|
||||||
|
|
||||||
|
# Download config file
|
||||||
|
config_path = configs_dir / model_info["config_file"]
|
||||||
|
if not config_path.exists():
|
||||||
|
if download_file(
|
||||||
|
model_info["config_url"],
|
||||||
|
config_path,
|
||||||
|
f"SAM2.1 {model_name} config"
|
||||||
|
):
|
||||||
|
success_count += 1
|
||||||
|
else:
|
||||||
|
print(f" ✓ Config file already exists: {config_path}")
|
||||||
|
success_count += 1
|
||||||
|
|
||||||
|
# Download checkpoint file
|
||||||
|
checkpoint_path = checkpoints_dir / model_info["checkpoint_file"]
|
||||||
|
if not checkpoint_path.exists():
|
||||||
|
if download_file(
|
||||||
|
model_info["checkpoint_url"],
|
||||||
|
checkpoint_path,
|
||||||
|
f"SAM2.1 {model_name} checkpoint"
|
||||||
|
):
|
||||||
|
success_count += 1
|
||||||
|
else:
|
||||||
|
print(f" ✓ Checkpoint file already exists: {checkpoint_path}")
|
||||||
|
success_count += 1
|
||||||
|
|
||||||
|
print(f"\n=== Download Summary ===")
|
||||||
|
print(f"Successfully downloaded: {success_count}/{total_downloads} files")
|
||||||
|
|
||||||
|
if success_count == total_downloads:
|
||||||
|
print("✓ All SAM2.1 models downloaded successfully!")
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
print(f"⚠ Some downloads failed ({total_downloads - success_count} files)")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def download_yolo_models():
|
||||||
|
"""Download default YOLO models to models directory."""
|
||||||
|
print("\n--- Setting up YOLO models ---")
|
||||||
|
print(" Downloading both detection and segmentation models...")
|
||||||
|
|
||||||
|
try:
|
||||||
|
from ultralytics import YOLO
|
||||||
|
import torch
|
||||||
|
|
||||||
|
# Default YOLO models to download (both detection and segmentation)
|
||||||
|
yolo_models = [
|
||||||
|
"yolov8n.pt", # Detection models
|
||||||
|
"yolov8s.pt",
|
||||||
|
"yolov8m.pt",
|
||||||
|
"yolo11l.pt", # YOLOv11 detection models
|
||||||
|
"yolo11x.pt",
|
||||||
|
"yolov8n-seg.pt", # Segmentation models
|
||||||
|
"yolov8s-seg.pt",
|
||||||
|
"yolov8m-seg.pt",
|
||||||
|
"yolo11l-seg.pt", # YOLOv11 segmentation models
|
||||||
|
"yolo11x-seg.pt"
|
||||||
|
]
|
||||||
|
models_dir = Path(__file__).parent / "models" / "yolo"
|
||||||
|
|
||||||
|
for model_name in yolo_models:
|
||||||
|
model_path = models_dir / model_name
|
||||||
|
if not model_path.exists():
|
||||||
|
print(f"Downloading {model_name}...")
|
||||||
|
try:
|
||||||
|
# First try to download using the YOLO class with export
|
||||||
|
model = YOLO(model_name)
|
||||||
|
|
||||||
|
# Export/save the model to our directory
|
||||||
|
# The model.ckpt is the internal checkpoint
|
||||||
|
if hasattr(model, 'ckpt') and hasattr(model.ckpt, 'save'):
|
||||||
|
# Save the checkpoint directly
|
||||||
|
torch.save(model.ckpt, str(model_path))
|
||||||
|
print(f" ✓ Saved {model_name} to models directory")
|
||||||
|
else:
|
||||||
|
# Alternative: try to find where YOLO downloaded the model
|
||||||
|
import shutil
|
||||||
|
|
||||||
|
# Common locations where YOLO might store models
|
||||||
|
possible_paths = [
|
||||||
|
Path.home() / ".cache" / "ultralytics" / "models" / model_name,
|
||||||
|
Path.home() / ".ultralytics" / "models" / model_name,
|
||||||
|
Path.home() / "runs" / "detect" / model_name,
|
||||||
|
Path.cwd() / model_name, # Current directory
|
||||||
|
]
|
||||||
|
|
||||||
|
found = False
|
||||||
|
for possible_path in possible_paths:
|
||||||
|
if possible_path.exists():
|
||||||
|
shutil.copy2(possible_path, model_path)
|
||||||
|
print(f" ✓ Copied {model_name} from {possible_path}")
|
||||||
|
found = True
|
||||||
|
# Clean up if it was downloaded to current directory
|
||||||
|
if possible_path.parent == Path.cwd() and possible_path != model_path:
|
||||||
|
possible_path.unlink()
|
||||||
|
break
|
||||||
|
|
||||||
|
if not found:
|
||||||
|
# Last resort: use urllib to download directly
|
||||||
|
# Use different release versions for different YOLO versions
|
||||||
|
if model_name.startswith("yolov11"):
|
||||||
|
yolo_url = f"https://github.com/ultralytics/assets/releases/download/v8.3.0/{model_name}"
|
||||||
|
else:
|
||||||
|
yolo_url = f"https://github.com/ultralytics/assets/releases/download/v8.2.0/{model_name}"
|
||||||
|
print(f" Downloading directly from {yolo_url}...")
|
||||||
|
download_file(yolo_url, str(model_path), f"YOLO {model_name}")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f" ⚠ Error downloading {model_name}: {e}")
|
||||||
|
# Try direct download as fallback
|
||||||
|
try:
|
||||||
|
# Use different release versions for different YOLO versions
|
||||||
|
if model_name.startswith("yolov11"):
|
||||||
|
yolo_url = f"https://github.com/ultralytics/assets/releases/download/v8.3.0/{model_name}"
|
||||||
|
else:
|
||||||
|
yolo_url = f"https://github.com/ultralytics/assets/releases/download/v8.2.0/{model_name}"
|
||||||
|
print(f" Trying direct download from {yolo_url}...")
|
||||||
|
download_file(yolo_url, str(model_path), f"YOLO {model_name}")
|
||||||
|
except Exception as e2:
|
||||||
|
print(f" ✗ Failed to download {model_name}: {e2}")
|
||||||
|
else:
|
||||||
|
print(f" ✓ {model_name} already exists")
|
||||||
|
|
||||||
|
# Verify all models exist
|
||||||
|
success = all((models_dir / model).exists() for model in yolo_models)
|
||||||
|
if success:
|
||||||
|
print("✓ YOLO models setup complete!")
|
||||||
|
print(" Available detection models: yolov8n.pt, yolov8s.pt, yolov8m.pt, yolov11l.pt, yolov11x.pt")
|
||||||
|
print(" Available segmentation models: yolov8n-seg.pt, yolov8s-seg.pt, yolov8m-seg.pt, yolov11l-seg.pt, yolov11x-seg.pt")
|
||||||
|
else:
|
||||||
|
missing_models = [model for model in yolo_models if not (models_dir / model).exists()]
|
||||||
|
print("⚠ Some YOLO models may be missing:")
|
||||||
|
for model in missing_models:
|
||||||
|
print(f" - {model}")
|
||||||
|
return success
|
||||||
|
|
||||||
|
except ImportError:
|
||||||
|
print("⚠ ultralytics not installed. YOLO models will be downloaded on first use.")
|
||||||
|
return False
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠ Error setting up YOLO models: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def update_config_file():
|
||||||
|
"""Update config.yaml to use local model paths."""
|
||||||
|
print("\n--- Updating config.yaml ---")
|
||||||
|
|
||||||
|
config_path = Path(__file__).parent / "config.yaml"
|
||||||
|
if not config_path.exists():
|
||||||
|
print("⚠ config.yaml not found, skipping update")
|
||||||
|
return False
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Read current config
|
||||||
|
with open(config_path, 'r') as f:
|
||||||
|
content = f.read()
|
||||||
|
|
||||||
|
# Update model paths to use local models
|
||||||
|
updated_content = content.replace(
|
||||||
|
'yolo_model: "yolov8n.pt"',
|
||||||
|
'yolo_model: "models/yolo/yolov8n.pt"'
|
||||||
|
).replace(
|
||||||
|
'yolo_detection_model: "models/yolo/yolov8n.pt"',
|
||||||
|
'yolo_detection_model: "models/yolo/yolov8n.pt"'
|
||||||
|
).replace(
|
||||||
|
'yolo_segmentation_model: "models/yolo/yolov8n-seg.pt"',
|
||||||
|
'yolo_segmentation_model: "models/yolo/yolov8n-seg.pt"'
|
||||||
|
).replace(
|
||||||
|
'sam2_checkpoint: "../checkpoints/sam2.1_hiera_large.pt"',
|
||||||
|
'sam2_checkpoint: "models/sam2/checkpoints/sam2.1_hiera_large.pt"'
|
||||||
|
).replace(
|
||||||
|
'sam2_config: "configs/sam2.1/sam2.1_hiera_l.yaml"',
|
||||||
|
'sam2_config: "models/sam2/configs/sam2.1/sam2.1_hiera_l.yaml"'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Write updated config
|
||||||
|
with open(config_path, 'w') as f:
|
||||||
|
f.write(updated_content)
|
||||||
|
|
||||||
|
print("✓ Updated config.yaml to use local model paths")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠ Error updating config.yaml: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Main function to download all models."""
|
||||||
|
print("🤖 YOLO + SAM2 Model Download Script")
|
||||||
|
print("="*50)
|
||||||
|
|
||||||
|
# Download SAM2 models
|
||||||
|
sam2_success = download_sam2_models()
|
||||||
|
|
||||||
|
# Download YOLO models
|
||||||
|
yolo_success = download_yolo_models()
|
||||||
|
|
||||||
|
# Update config file
|
||||||
|
config_success = update_config_file()
|
||||||
|
|
||||||
|
print("\n" + "="*50)
|
||||||
|
print("📋 Final Summary:")
|
||||||
|
print(f" SAM2 models: {'✓' if sam2_success else '⚠'}")
|
||||||
|
print(f" YOLO models: {'✓' if yolo_success else '⚠'}")
|
||||||
|
print(f" Config update: {'✓' if config_success else '⚠'}")
|
||||||
|
|
||||||
|
if sam2_success and config_success:
|
||||||
|
print("\n🎉 Setup complete! You can now run the pipeline with:")
|
||||||
|
print(" python main.py --config config.yaml")
|
||||||
|
else:
|
||||||
|
print("\n⚠ Some steps failed. Check the output above for details.")
|
||||||
|
|
||||||
|
print("\n📁 Models are organized in:")
|
||||||
|
print(f" {Path(__file__).parent / 'models'}")
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
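After a successful run, the script above leaves a layout like the following under `models/` (reconstructed from `create_directory_structure` and the model tables; abbreviated):

```
models/
├── sam2/
│   ├── checkpoints/
│   │   ├── sam2.1_hiera_tiny.pt
│   │   ├── sam2.1_hiera_small.pt
│   │   ├── sam2.1_hiera_base_plus.pt
│   │   └── sam2.1_hiera_large.pt
│   └── configs/
│       └── sam2.1/
│           ├── sam2.1_hiera_t.yaml
│           ├── sam2.1_hiera_s.yaml
│           ├── sam2.1_hiera_b+.yaml
│           └── sam2.1_hiera_l.yaml
└── yolo/
    ├── yolov8n.pt
    ├── yolov8n-seg.pt
    └── ... (remaining detection and -seg models)
```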
783 main.py
@@ -8,6 +8,8 @@ and creating green screen masks with SAM2.
 import os
 import sys
 import argparse
+import cv2
+import numpy as np
 from typing import List

 # Add project root to path
@@ -16,6 +18,9 @@ sys.path.append(os.path.dirname(__file__))
 from core.config_loader import ConfigLoader
 from core.video_splitter import VideoSplitter
 from core.yolo_detector import YOLODetector
+from core.sam2_processor import SAM2Processor
+from core.mask_processor import MaskProcessor
+from core.video_assembler import VideoAssembler
 from utils.logging_utils import setup_logging, get_logger
 from utils.file_utils import ensure_directory
 from utils.status_utils import print_processing_status, cleanup_incomplete_segment

@@ -66,6 +71,100 @@ def validate_dependencies():
        logger.error("Please install requirements: pip install -r requirements.txt")
        return False

def create_yolo_mask_debug_frame(detections: List[dict], video_path: str, output_path: str, scale: float = 1.0) -> bool:
|
||||||
|
"""
|
||||||
|
Create debug visualization for YOLO direct masks.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
detections: List of YOLO detections with masks
|
||||||
|
video_path: Path to video file
|
||||||
|
output_path: Path to save debug image
|
||||||
|
scale: Scale factor for frame processing
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if debug frame was created successfully
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
# Load first frame
|
||||||
|
cap = cv2.VideoCapture(video_path)
|
||||||
|
ret, original_frame = cap.read()
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
if not ret:
|
||||||
|
logger.error("Could not read first frame for YOLO mask debug")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Scale frame if needed
|
||||||
|
if scale != 1.0:
|
||||||
|
original_frame = cv2.resize(original_frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
|
||||||
|
|
||||||
|
debug_frame = original_frame.copy()
|
||||||
|
|
||||||
|
# Define colors for each object
|
||||||
|
colors = {
|
||||||
|
1: (0, 255, 0), # Green for Object 1 (Left eye)
|
||||||
|
2: (255, 0, 0), # Blue for Object 2 (Right eye)
|
||||||
|
}
|
||||||
|
|
||||||
|
# Get detections with masks
|
||||||
|
detections_with_masks = [d for d in detections if d.get('has_mask', False)]
|
||||||
|
|
||||||
|
# Overlay masks with transparency
|
||||||
|
obj_id = 1
|
||||||
|
for detection in detections_with_masks[:2]: # Up to 2 objects
|
||||||
|
mask = detection['mask']
|
||||||
|
|
||||||
|
# Resize mask to match frame if needed
|
||||||
|
if mask.shape != original_frame.shape[:2]:
|
||||||
|
mask = cv2.resize(mask.astype(np.float32), (original_frame.shape[1], original_frame.shape[0]), interpolation=cv2.INTER_NEAREST)
|
||||||
|
mask = mask > 0.5
|
||||||
|
|
||||||
|
mask = mask.astype(bool)
|
||||||
|
|
||||||
|
# Apply colored overlay
|
||||||
|
color = colors.get(obj_id, (128, 128, 128))
|
||||||
|
overlay = debug_frame.copy()
|
||||||
|
overlay[mask] = color
|
||||||
|
|
||||||
|
# Blend with original (30% overlay, 70% original)
|
||||||
|
cv2.addWeighted(overlay, 0.3, debug_frame, 0.7, 0, debug_frame)
|
||||||
|
|
||||||
|
# Draw outline
|
||||||
|
contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
|
||||||
|
cv2.drawContours(debug_frame, contours, -1, color, 2)
|
||||||
|
|
||||||
|
logger.info(f"YOLO Mask Debug: Object {obj_id} mask - shape: {mask.shape}, pixels: {np.sum(mask)}")
|
||||||
|
obj_id += 1
|
||||||
|
|
||||||
|
# Add title and source info
|
||||||
|
title = f"YOLO Direct Masks: {len(detections_with_masks)} objects detected"
|
||||||
|
cv2.putText(debug_frame, title, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
|
||||||
|
|
||||||
|
source_info = "Mask Source: YOLO Segmentation (DIRECT - No SAM2)"
|
||||||
|
cv2.putText(debug_frame, source_info, (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2) # Green for YOLO
|
||||||
|
|
||||||
|
# Add object legend
|
||||||
|
y_offset = 90
|
||||||
|
for i, detection in enumerate(detections_with_masks[:2]):
|
||||||
|
obj_id = i + 1
|
||||||
|
color = colors.get(obj_id, (128, 128, 128))
|
||||||
|
text = f"Object {obj_id}: {'Left Eye' if obj_id == 1 else 'Right Eye'} (YOLO Mask)"
|
||||||
|
cv2.putText(debug_frame, text, (10, y_offset), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
|
||||||
|
y_offset += 30
|
||||||
|
|
||||||
|
# Save debug image
|
||||||
|
success = cv2.imwrite(output_path, debug_frame)
|
||||||
|
if success:
|
||||||
|
logger.info(f"YOLO Mask Debug: Saved debug frame to {output_path}")
|
||||||
|
else:
|
||||||
|
logger.error(f"Failed to save YOLO mask debug frame to {output_path}")
|
||||||
|
|
||||||
|
return success
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error creating YOLO mask debug frame: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
def resolve_detect_segments(detect_segments, total_segments: int) -> List[int]:
    """
    Resolve detect_segments configuration to list of segment indices.
@@ -89,8 +188,295 @@ def resolve_detect_segments(detect_segments, total_segments: int) -> List[int]:
        logger.warning(f"Invalid detect_segments format: {detect_segments}. Using all segments.")
    return list(range(total_segments))

-def main():
-    """Main processing pipeline."""
+def process_segment_with_separate_eyes(segment_info, detector, sam2_processor, mask_processor, config,
+                                       previous_left_masks=None, previous_right_masks=None):
|
"""
|
||||||
|
Process a single segment using separate eye processing mode.
|
||||||
|
Split video first, then run YOLO independently on each eye.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
segment_info: Segment information dictionary
|
||||||
|
detector: YOLO detector instance
|
||||||
|
sam2_processor: SAM2 processor with eye processing enabled
|
||||||
|
mask_processor: Mask processor instance
|
||||||
|
config: Configuration loader instance
|
||||||
|
previous_left_masks: Previous masks for left eye
|
||||||
|
previous_right_masks: Previous masks for right eye
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (success, left_masks, right_masks)
|
||||||
|
"""
|
||||||
|
segment_idx = segment_info['index']
|
||||||
|
logger.info(f"VR180 Separate Eyes: Processing segment {segment_idx} (video-split approach)")
|
||||||
|
|
||||||
|
# Get video properties
|
||||||
|
cap = cv2.VideoCapture(segment_info['video_file'])
|
||||||
|
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||||
|
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
full_frame_shape = (frame_height, frame_width)
|
||||||
|
|
||||||
|
# Step 1: Split the segment video into left and right eye videos
|
||||||
|
left_eye_video = os.path.join(segment_info['directory'], "left_eye.mp4")
|
||||||
|
right_eye_video = os.path.join(segment_info['directory'], "right_eye.mp4")
|
||||||
|
|
||||||
|
logger.info(f"VR180 Separate Eyes: Splitting segment video into eye videos")
|
||||||
|
success = sam2_processor.eye_processor.split_video_into_eyes(
|
||||||
|
segment_info['video_file'],
|
||||||
|
left_eye_video,
|
||||||
|
right_eye_video,
|
||||||
|
scale=config.get_inference_scale()
|
||||||
|
)
|
||||||
|
|
||||||
|
if not success:
|
||||||
|
logger.error(f"VR180 Separate Eyes: Failed to split video for segment {segment_idx}")
|
||||||
|
return False, None, None
|
||||||
|
|
||||||
|
# Check if both eye videos were created
|
||||||
|
if not os.path.exists(left_eye_video) or not os.path.exists(right_eye_video):
|
||||||
|
logger.error(f"VR180 Separate Eyes: Eye video files not created for segment {segment_idx}")
|
||||||
|
return False, None, None
|
||||||
|
|
||||||
|
logger.info(f"VR180 Separate Eyes: Created eye videos - left: {left_eye_video}, right: {right_eye_video}")
|
||||||
|
|
||||||
|
# Step 2: Run YOLO independently on each eye video
|
||||||
|
left_detections = detector.detect_humans_in_video_first_frame(
|
||||||
|
left_eye_video, scale=1.0 # Already scaled during video splitting
|
||||||
|
)
|
||||||
|
|
||||||
|
right_detections = detector.detect_humans_in_video_first_frame(
|
||||||
|
right_eye_video, scale=1.0 # Already scaled during video splitting
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"VR180 Separate Eyes: YOLO detections - left: {len(left_detections)}, right: {len(right_detections)}")
|
||||||
|
|
||||||
|
# Check if we have YOLO segmentation masks
|
||||||
|
has_yolo_masks = False
|
||||||
|
if detector.supports_segmentation:
|
||||||
|
has_yolo_masks = any(d.get('has_mask', False) for d in (left_detections + right_detections))
|
||||||
|
|
||||||
|
if has_yolo_masks:
|
||||||
|
logger.info(f"VR180 Separate Eyes: YOLO segmentation mode - using direct masks instead of bounding boxes")
|
||||||
|
|
||||||
|
# Save eye-specific debug frames if enabled
|
||||||
|
if config.get('advanced.save_yolo_debug_frames', False) and (left_detections or right_detections):
|
||||||
|
try:
|
||||||
|
# Load first frames from each eye video
|
||||||
|
left_cap = cv2.VideoCapture(left_eye_video)
|
||||||
|
ret_left, left_frame = left_cap.read()
|
||||||
|
left_cap.release()
|
||||||
|
|
||||||
|
right_cap = cv2.VideoCapture(right_eye_video)
|
||||||
|
ret_right, right_frame = right_cap.read()
|
||||||
|
right_cap.release()
|
||||||
|
|
||||||
|
if ret_left and ret_right:
|
||||||
|
# Save eye-specific debug frames
|
||||||
|
left_debug_path = os.path.join(segment_info['directory'], "left_eye_debug.jpg")
|
||||||
|
right_debug_path = os.path.join(segment_info['directory'], "right_eye_debug.jpg")
|
||||||
|
|
||||||
|
detector.save_eye_debug_frames(
|
||||||
|
left_frame, right_frame,
|
||||||
|
left_detections, right_detections,
|
||||||
|
left_debug_path, right_debug_path
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"VR180 Separate Eyes: Saved eye-specific debug frames for segment {segment_idx}")
|
||||||
|
else:
|
||||||
|
logger.warning(f"VR180 Separate Eyes: Could not load eye frames for debug visualization")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"VR180 Separate Eyes: Failed to create eye debug frames: {e}")
|
||||||
|
|
||||||
|
# Step 3: Process left eye if detections exist or we have previous masks
|
||||||
|
left_masks = None
|
||||||
|
if left_detections or previous_left_masks:
|
||||||
|
try:
|
||||||
|
left_prompts = None
|
||||||
|
left_initial_masks = None
|
||||||
|
|
||||||
|
if left_detections:
|
||||||
|
if has_yolo_masks:
|
||||||
|
# YOLO segmentation mode: convert masks to initial masks for SAM2
|
||||||
|
left_initial_masks = {}
|
||||||
|
for i, detection in enumerate(left_detections):
|
||||||
|
if detection.get('has_mask', False):
|
||||||
|
mask = detection['mask']
|
||||||
|
left_initial_masks[1] = mask.astype(bool) # Always use obj_id=1 for single eye
|
||||||
|
logger.info(f"VR180 Separate Eyes: Left eye YOLO mask - shape: {mask.shape}, pixels: {np.sum(mask)}")
|
||||||
|
break # Only take the first/best mask for single eye processing
|
||||||
|
|
||||||
|
if left_initial_masks:
|
||||||
|
logger.info(f"VR180 Separate Eyes: Left eye - using YOLO segmentation masks as initial masks")
|
||||||
|
else:
|
||||||
|
# YOLO detection mode: convert bounding boxes to prompts
|
||||||
|
left_prompts = detector.convert_detections_to_sam2_prompts(left_detections, frame_width // 2)
|
||||||
|
logger.info(f"VR180 Separate Eyes: Left eye - {len(left_prompts)} SAM2 prompts")
|
||||||
|
|
||||||
|
# Create temporary segment info for left eye processing
|
||||||
|
left_segment_info = segment_info.copy()
|
||||||
|
left_segment_info['video_file'] = left_eye_video
|
||||||
|
|
||||||
|
left_masks = sam2_processor.process_single_eye_segment(
|
||||||
|
left_segment_info, 'left', left_prompts,
|
||||||
|
left_initial_masks or previous_left_masks,
|
||||||
|
1.0 # Scale already applied during video splitting
|
||||||
|
)
|
||||||
|
|
||||||
|
if left_masks:
|
||||||
|
logger.info(f"VR180 Separate Eyes: Left eye processed - {len(left_masks)} frame masks")
|
||||||
|
else:
|
||||||
|
logger.warning(f"VR180 Separate Eyes: Left eye processing failed")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"VR180 Separate Eyes: Error processing left eye for segment {segment_idx}: {e}")
|
||||||
|
left_masks = None
|
||||||
|
|
||||||
|
# Step 4: Process right eye if detections exist or we have previous masks
|
||||||
|
right_masks = None
|
||||||
|
if right_detections or previous_right_masks:
|
||||||
|
try:
|
||||||
|
right_prompts = None
|
||||||
|
right_initial_masks = None
|
||||||
|
|
||||||
|
if right_detections:
|
||||||
|
if has_yolo_masks:
|
||||||
|
# YOLO segmentation mode: convert masks to initial masks for SAM2
|
||||||
|
right_initial_masks = {}
|
||||||
|
for i, detection in enumerate(right_detections):
|
||||||
|
if detection.get('has_mask', False):
|
||||||
|
mask = detection['mask']
|
||||||
|
right_initial_masks[1] = mask.astype(bool) # Always use obj_id=1 for single eye
|
||||||
|
logger.info(f"VR180 Separate Eyes: Right eye YOLO mask - shape: {mask.shape}, pixels: {np.sum(mask)}")
|
||||||
|
break # Only take the first/best mask for single eye processing
|
||||||
|
|
||||||
|
if right_initial_masks:
|
||||||
|
logger.info(f"VR180 Separate Eyes: Right eye - using YOLO segmentation masks as initial masks")
|
||||||
|
else:
|
||||||
|
# YOLO detection mode: convert bounding boxes to prompts
|
||||||
|
right_prompts = detector.convert_detections_to_sam2_prompts(right_detections, frame_width // 2)
|
||||||
|
logger.info(f"VR180 Separate Eyes: Right eye - {len(right_prompts)} SAM2 prompts")
|
||||||
|
|
||||||
|
# Create temporary segment info for right eye processing
|
||||||
|
right_segment_info = segment_info.copy()
|
||||||
|
right_segment_info['video_file'] = right_eye_video
|
||||||
|
|
||||||
|
right_masks = sam2_processor.process_single_eye_segment(
|
||||||
|
right_segment_info, 'right', right_prompts,
|
||||||
|
right_initial_masks or previous_right_masks,
|
||||||
|
1.0 # Scale already applied during video splitting
|
||||||
|
)
|
||||||
|
|
||||||
|
if right_masks:
|
||||||
|
logger.info(f"VR180 Separate Eyes: Right eye processed - {len(right_masks)} frame masks")
|
||||||
|
else:
|
||||||
|
logger.warning(f"VR180 Separate Eyes: Right eye processing failed")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"VR180 Separate Eyes: Error processing right eye for segment {segment_idx}: {e}")
|
||||||
|
right_masks = None
|
||||||
|
|
||||||
|
# Step 5: Check if we got any valid masks
|
||||||
|
if not left_masks and not right_masks:
|
||||||
|
logger.warning(f"VR180 Separate Eyes: Neither eye produced valid masks for segment {segment_idx}")
|
||||||
|
|
||||||
|
if config.get('processing.enable_greenscreen_fallback', True):
|
||||||
|
logger.info(f"VR180 Separate Eyes: Using greenscreen fallback for segment {segment_idx}")
|
||||||
|
success = mask_processor.process_greenscreen_only_segment(
|
||||||
|
segment_info,
|
||||||
|
green_color=config.get_green_color(),
|
||||||
|
use_nvenc=config.get_use_nvenc(),
|
||||||
|
bitrate=config.get_output_bitrate()
|
||||||
|
)
|
||||||
|
return success, None, None
|
||||||
|
else:
|
||||||
|
logger.error(f"VR180 Separate Eyes: No masks generated and greenscreen fallback disabled")
|
||||||
|
return False, None, None
|
||||||
|
|
||||||
|
# Step 6: Combine masks back to full frame format
|
||||||
|
try:
|
||||||
|
logger.info(f"VR180 Separate Eyes: Combining eye masks for segment {segment_idx}")
|
||||||
|
combined_masks = sam2_processor.eye_processor.combine_eye_masks(
|
||||||
|
left_masks, right_masks, full_frame_shape
|
||||||
|
)
|
||||||
|
|
||||||
|
if not combined_masks:
|
||||||
|
logger.error(f"VR180 Separate Eyes: Failed to combine eye masks for segment {segment_idx}")
|
||||||
|
return False, left_masks, right_masks
|
||||||
|
|
||||||
|
# Validate combined masks have reasonable content
|
||||||
|
total_mask_pixels = 0
|
||||||
|
for frame_idx, frame_masks in combined_masks.items():
|
||||||
|
for obj_id, mask in frame_masks.items():
|
||||||
|
if mask is not None:
|
||||||
|
total_mask_pixels += np.sum(mask)
|
||||||
|
|
||||||
|
if total_mask_pixels == 0:
|
||||||
|
logger.warning(f"VR180 Separate Eyes: Combined masks are empty for segment {segment_idx}")
|
||||||
|
if config.get('processing.enable_greenscreen_fallback', True):
|
||||||
|
logger.info(f"VR180 Separate Eyes: Using greenscreen fallback due to empty masks")
|
||||||
|
success = mask_processor.process_greenscreen_only_segment(
|
||||||
|
segment_info,
|
||||||
|
green_color=config.get_green_color(),
|
||||||
|
use_nvenc=config.get_use_nvenc(),
|
||||||
|
bitrate=config.get_output_bitrate()
|
||||||
|
)
|
||||||
|
return success, left_masks, right_masks
|
||||||
|
|
||||||
|
logger.info(f"VR180 Separate Eyes: Combined masks contain {total_mask_pixels} total pixels")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"VR180 Separate Eyes: Error combining eye masks for segment {segment_idx}: {e}")
|
||||||
|
# Try greenscreen fallback if mask combination fails
|
||||||
|
if config.get('processing.enable_greenscreen_fallback', True):
|
||||||
|
logger.info(f"VR180 Separate Eyes: Using greenscreen fallback due to mask combination error")
|
||||||
|
success = mask_processor.process_greenscreen_only_segment(
|
||||||
|
segment_info,
|
||||||
|
green_color=config.get_green_color(),
|
||||||
|
use_nvenc=config.get_use_nvenc(),
|
||||||
|
bitrate=config.get_output_bitrate()
|
||||||
|
)
|
||||||
|
return success, left_masks, right_masks
|
||||||
|
else:
|
||||||
|
return False, left_masks, right_masks
|
||||||
|
|
||||||
|
# Step 7: Save combined masks
|
||||||
|
mask_path = os.path.join(segment_info['directory'], "mask.png")
|
||||||
|
sam2_processor.save_final_masks(
|
||||||
|
combined_masks,
|
||||||
|
mask_path,
|
||||||
|
green_color=config.get_green_color(),
|
||||||
|
blue_color=config.get_blue_color()
|
||||||
|
)
|
||||||
|
|
||||||
|
# Step 8: Apply green screen and save output video
|
||||||
|
success = mask_processor.process_segment(
|
||||||
|
segment_info,
|
||||||
|
combined_masks,
|
||||||
|
use_nvenc=config.get_use_nvenc(),
|
||||||
|
bitrate=config.get_output_bitrate()
|
||||||
|
)
|
||||||
|
|
||||||
|
if success:
|
||||||
|
logger.info(f"VR180 Separate Eyes: Successfully processed segment {segment_idx}")
|
||||||
|
else:
|
||||||
|
logger.error(f"VR180 Separate Eyes: Failed to create output video for segment {segment_idx}")
|
||||||
|
|
||||||
|
# Clean up temporary eye video files
|
||||||
|
try:
|
||||||
|
if os.path.exists(left_eye_video):
|
||||||
|
os.remove(left_eye_video)
|
||||||
|
if os.path.exists(right_eye_video):
|
||||||
|
os.remove(right_eye_video)
|
||||||
|
logger.debug(f"VR180 Separate Eyes: Cleaned up temporary eye videos for segment {segment_idx}")
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"VR180 Separate Eyes: Failed to clean up temporary eye videos: {e}")
|
||||||
|
|
||||||
|
return success, left_masks, right_masks
|
||||||
|
|
||||||
|
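For orientation, a minimal sketch of how a caller drives this function segment by segment, carrying each eye's masks forward as the next segment's starting point. This mirrors the per-segment loop in `main_async()` below and assumes the pipeline objects (`segments_info`, `detector`, `sam2_processor`, `mask_processor`, `config`) are already initialized:

```python
previous_left_masks = None
previous_right_masks = None

for segment_info in segments_info:
    success, left_masks, right_masks = process_segment_with_separate_eyes(
        segment_info, detector, sam2_processor, mask_processor, config,
        previous_left_masks, previous_right_masks
    )
    # Carry the last masks forward so the next segment can keep tracking
    # even if YOLO finds nothing on its first frame.
    previous_left_masks = left_masks
    previous_right_masks = right_masks
```
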
async def main_async():
|
||||||
|
"""Main processing pipeline with async optimizations."""
|
||||||
args = parse_arguments()
|
args = parse_arguments()
|
||||||
|
|
||||||
try:
|
try:
|
||||||
@@ -157,31 +543,386 @@ def main():
|
|||||||
detect_segments_config = config.get_detect_segments()
|
detect_segments_config = config.get_detect_segments()
|
||||||
detect_segments = resolve_detect_segments(detect_segments_config, len(segments_info))
|
detect_segments = resolve_detect_segments(detect_segments_config, len(segments_info))
|
||||||
|
|
||||||
# Step 2: Run YOLO detection on specified segments
|
# Initialize processors once
|
||||||
logger.info("Step 2: Running YOLO human detection")
|
logger.info("Step 2: Initializing YOLO detector")
|
||||||
|
|
||||||
|
# Get YOLO mode and model paths
|
||||||
|
yolo_mode = config.get('models.yolo_mode', 'detection')
|
||||||
|
detection_model = config.get('models.yolo_detection_model', config.get_yolo_model_path())
|
||||||
|
segmentation_model = config.get('models.yolo_segmentation_model', None)
|
||||||
|
|
||||||
|
logger.info(f"YOLO Mode: {yolo_mode}")
|
||||||
|
|
||||||
detector = YOLODetector(
|
detector = YOLODetector(
|
||||||
model_path=config.get_yolo_model_path(),
|
detection_model_path=detection_model,
|
||||||
|
segmentation_model_path=segmentation_model,
|
||||||
|
mode=yolo_mode,
|
||||||
confidence_threshold=config.get_yolo_confidence(),
|
confidence_threshold=config.get_yolo_confidence(),
|
||||||
human_class_id=config.get_human_class_id()
|
human_class_id=config.get_human_class_id()
|
||||||
)
|
)
|
||||||
|
|
||||||
detection_results = detector.process_segments_batch(
|
logger.info("Step 3: Initializing SAM2 processor")
|
||||||
|
|
||||||
|
# Check if separate eye processing is enabled
|
||||||
|
separate_eye_processing = config.get('processing.separate_eye_processing', False)
|
||||||
|
eye_overlap_pixels = config.get('processing.eye_overlap_pixels', 0)
|
||||||
|
enable_greenscreen_fallback = config.get('processing.enable_greenscreen_fallback', True)
|
||||||
|
|
||||||
|
# Initialize async preprocessor if enabled
|
||||||
|
async_preprocessor = None
|
||||||
|
if config.get('advanced.enable_background_lowres_generation', False):
|
||||||
|
from core.async_lowres_preprocessor import AsyncLowResPreprocessor
|
||||||
|
|
||||||
|
max_concurrent = config.get('advanced.max_concurrent_lowres', 3)
|
||||||
|
segments_ahead = config.get('advanced.lowres_segments_ahead', 3)
|
||||||
|
use_ffmpeg = config.get('advanced.use_ffmpeg_lowres', True)
|
||||||
|
|
||||||
|
async_preprocessor = AsyncLowResPreprocessor(
|
||||||
|
max_concurrent=max_concurrent,
|
||||||
|
segments_ahead=segments_ahead,
|
||||||
|
use_ffmpeg=use_ffmpeg
|
||||||
|
)
|
||||||
|
logger.info(f"Async low-res preprocessing: ENABLED (max_concurrent={max_concurrent}, segments_ahead={segments_ahead})")
|
||||||
|
else:
|
||||||
|
logger.info("Async low-res preprocessing: DISABLED")
|
||||||
|
|
||||||
|
if separate_eye_processing:
|
||||||
|
logger.info("VR180 Separate Eye Processing: ENABLED")
|
||||||
|
logger.info(f"Eye overlap pixels: {eye_overlap_pixels}")
|
||||||
|
logger.info(f"Greenscreen fallback: {enable_greenscreen_fallback}")
|
||||||
|
|
||||||
|
sam2_processor = SAM2Processor(
|
||||||
|
checkpoint_path=config.get_sam2_checkpoint(),
|
||||||
|
config_path=config.get_sam2_config(),
|
||||||
|
vos_optimized=config.get('models.sam2_vos_optimized', False),
|
||||||
|
separate_eye_processing=separate_eye_processing,
|
||||||
|
eye_overlap_pixels=eye_overlap_pixels,
|
||||||
|
async_preprocessor=async_preprocessor
|
||||||
|
)
|
||||||
|
|
||||||
|
# Initialize mask processor with quality enhancements
|
||||||
|
mask_quality_config = config.get('mask_processing', {})
|
||||||
|
mask_processor = MaskProcessor(
|
||||||
|
green_color=config.get_green_color(),
|
||||||
|
blue_color=config.get_blue_color(),
|
||||||
|
mask_quality_config=mask_quality_config
|
||||||
|
)
|
||||||
|
|
||||||
|
# Process each segment sequentially (YOLO -> SAM2 -> Render)
|
||||||
|
logger.info("Step 4: Processing segments sequentially")
|
||||||
|
total_humans_detected = 0
|
||||||
|
|
||||||
|
# Start background low-res video preprocessing if enabled
|
||||||
|
if async_preprocessor:
|
||||||
|
logger.info("Starting background low-res video preprocessing")
|
||||||
|
async_preprocessor.start_background_preparation(
|
||||||
segments_info,
|
segments_info,
|
||||||
detect_segments,
|
config.get_inference_scale(),
|
||||||
|
separate_eye_processing,
|
||||||
|
current_segment=0
|
||||||
|
)
|
||||||
|
|
||||||
|
# Initialize previous masks for separate eye processing
|
||||||
|
previous_left_masks = None
|
||||||
|
previous_right_masks = None
|
||||||
|
|
||||||
|
for i, segment_info in enumerate(segments_info):
|
||||||
|
segment_idx = segment_info['index']
|
||||||
|
|
||||||
|
logger.info(f"Processing segment {segment_idx}/{len(segments_info)-1}")
|
||||||
|
|
||||||
|
# Start background preparation for upcoming segments
|
||||||
|
if async_preprocessor and i < len(segments_info) - 1:
|
||||||
|
async_preprocessor.start_background_preparation(
|
||||||
|
segments_info,
|
||||||
|
config.get_inference_scale(),
|
||||||
|
separate_eye_processing,
|
||||||
|
current_segment=i
|
||||||
|
)
|
||||||
|
|
||||||
|
# Reset temporal history for new segment
|
||||||
|
mask_processor.reset_temporal_history()
|
||||||
|
|
||||||
|
# Skip if segment output already exists
|
||||||
|
output_video = os.path.join(segment_info['directory'], f"output_{segment_idx}.mp4")
|
||||||
|
if os.path.exists(output_video):
|
||||||
|
logger.info(f"Segment {segment_idx} already processed, skipping")
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Branch based on processing mode
|
||||||
|
if separate_eye_processing:
|
||||||
|
# Use separate eye processing mode
|
||||||
|
success, left_masks, right_masks = process_segment_with_separate_eyes(
|
||||||
|
segment_info, detector, sam2_processor, mask_processor, config,
|
||||||
|
previous_left_masks, previous_right_masks
|
||||||
|
)
|
||||||
|
|
||||||
|
# Update previous masks for next segment
|
||||||
|
previous_left_masks = left_masks
|
||||||
|
previous_right_masks = right_masks
|
||||||
|
|
||||||
|
if success:
|
||||||
|
logger.info(f"Successfully processed segment {segment_idx} with separate eye processing")
|
||||||
|
else:
|
||||||
|
logger.error(f"Failed to process segment {segment_idx} with separate eye processing")
|
||||||
|
|
||||||
|
continue # Skip the original processing logic
|
||||||
|
|
||||||
|
# Determine if we should use YOLO detections or previous masks
|
||||||
|
use_detections = segment_idx in detect_segments
|
||||||
|
|
||||||
|
# First segment must use detections
|
||||||
|
if segment_idx == 0 and not use_detections:
|
||||||
|
logger.warning(f"First segment must use YOLO detection")
|
||||||
|
use_detections = True
|
||||||
|
|
||||||
|
# Get YOLO prompts or previous masks
|
||||||
|
yolo_prompts = None
|
||||||
|
previous_masks = None
|
||||||
|
|
||||||
|
if use_detections:
|
||||||
|
# Run YOLO stereo detection and matching on current segment
|
||||||
|
logger.info(f"Running stereo pair detection on segment {segment_idx}")
|
||||||
|
|
||||||
|
# Load the first frame for detection
|
||||||
|
cap = cv2.VideoCapture(segment_info['video_file'])
|
||||||
|
ret, frame = cap.read()
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
if not ret:
|
||||||
|
logger.error(f"Could not read first frame of segment {segment_idx}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Scale frame if needed
|
||||||
|
if config.get_inference_scale() != 1.0:
|
||||||
|
frame = cv2.resize(frame, None, fx=config.get_inference_scale(), fy=config.get_inference_scale(), interpolation=cv2.INTER_LINEAR)
|
||||||
|
|
||||||
|
yolo_prompts = detector.detect_and_match_stereo_pairs(
|
||||||
|
frame,
|
||||||
|
config.get_confidence_reduction_factor(),
|
||||||
|
config.get_stereo_iou_threshold(),
|
||||||
|
segment_info,
|
||||||
|
config.get('advanced.save_yolo_debug_frames', True)
|
||||||
|
)
|
||||||
|
|
||||||
|
if not yolo_prompts:
|
||||||
|
logger.warning(f"No valid stereo pairs found for segment {segment_idx}. Attempting to use previous segment's mask.")
|
||||||
|
if segment_idx > 0:
|
||||||
|
prev_segment_dir = segments_info[segment_idx - 1]['directory']
|
||||||
|
previous_masks = sam2_processor.load_previous_segment_mask(prev_segment_dir)
|
||||||
|
if previous_masks:
|
||||||
|
logger.info(f"Using masks from segment {segment_idx - 1} as fallback.")
|
||||||
|
else:
|
||||||
|
logger.error(f"Fallback failed: No previous mask found for segment {segment_idx}.")
|
||||||
|
else:
|
||||||
|
logger.error("Cannot use fallback for the first segment.")
|
||||||
|
elif segment_idx > 0:
|
||||||
|
# Try to load previous segment mask
|
||||||
|
for j in range(segment_idx - 1, -1, -1):
|
||||||
|
prev_segment_dir = segments_info[j]['directory']
|
||||||
|
previous_masks = sam2_processor.load_previous_segment_mask(prev_segment_dir)
|
||||||
|
if previous_masks:
|
||||||
|
logger.info(f"Using masks from segment {j} for segment {segment_idx}")
|
||||||
|
break
|
||||||
|
|
||||||
|
if not yolo_prompts and not previous_masks:
|
||||||
|
logger.error(f"No prompts or previous masks available for segment {segment_idx}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Check if we have YOLO masks from the stereo pair matching and can use them as initial masks for SAM2
|
||||||
|
if yolo_prompts and detector.supports_segmentation:
|
||||||
|
logger.info(f"Pipeline Debug: YOLO segmentation provided matched stereo masks - using as SAM2 initial masks.")
|
||||||
|
|
||||||
|
# Convert the prompts (which contain masks) into the initial_masks format for SAM2
|
||||||
|
initial_masks = {prompt['obj_id']: prompt['mask'] for prompt in yolo_prompts if 'mask' in prompt}
|
||||||
|
|
||||||
|
if initial_masks:
|
||||||
|
# We are providing initial masks, so we should not provide bbox prompts
|
||||||
|
previous_masks = initial_masks
|
||||||
|
yolo_prompts = None
|
||||||
|
logger.info(f"Pipeline Debug: Using {len(previous_masks)} YOLO masks as SAM2 initial masks.")
|
||||||
|
else:
|
||||||
|
logger.warning("YOLO segmentation mode is on, but no masks were found in the final prompts.")
|
||||||
|
|
||||||
|
# Debug what we're passing to SAM2
|
||||||
|
if yolo_prompts:
|
||||||
|
logger.info(f"Pipeline Debug: Passing {len(yolo_prompts)} YOLO prompts to SAM2 for segment {segment_idx}")
|
||||||
|
for i, prompt in enumerate(yolo_prompts):
|
||||||
|
logger.info(f"Pipeline Debug: Prompt {i+1}: Object {prompt['obj_id']}, bbox={prompt['bbox']}")
|
||||||
|
|
||||||
|
if previous_masks:
|
||||||
|
logger.info(f"Pipeline Debug: Using {len(previous_masks)} previous masks for segment {segment_idx}")
|
||||||
|
logger.info(f"Pipeline Debug: Previous mask object IDs: {list(previous_masks.keys())}")
|
||||||
|
|
||||||
|
# Handle mid-segment detection if enabled (works for both detection and segmentation modes)
|
||||||
|
multi_frame_prompts = None
|
||||||
|
if config.get('advanced.enable_mid_segment_detection', False) and (yolo_prompts or has_yolo_masks):
|
||||||
|
logger.info(f"Mid-segment Detection: Enabled for segment {segment_idx}")
|
||||||
|
|
||||||
|
# Calculate frame indices for re-detection
|
||||||
|
cap = cv2.VideoCapture(segment_info['video_file'])
|
||||||
|
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||||
|
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
redetection_interval = config.get('advanced.redetection_interval', 30)
|
||||||
|
max_redetections = config.get('advanced.max_redetections_per_segment', 10)
|
||||||
|
|
||||||
|
# Generate frame indices: [30, 60, 90, ...] (skip frame 0 since we already have first frame prompts)
|
||||||
|
frame_indices = []
|
||||||
|
frame_idx = redetection_interval
|
||||||
|
while frame_idx < total_frames and len(frame_indices) < max_redetections:
|
||||||
|
frame_indices.append(frame_idx)
|
||||||
|
frame_idx += redetection_interval
|
||||||
|
|
||||||
|
if frame_indices:
|
||||||
|
logger.info(f"Mid-segment Detection: Running YOLO on frames {frame_indices} (interval={redetection_interval})")
|
||||||
|
|
||||||
|
# Run multi-frame detection
|
||||||
|
multi_frame_detections = detector.detect_humans_multi_frame(
|
||||||
|
segment_info['video_file'],
|
||||||
|
frame_indices,
|
||||||
scale=config.get_inference_scale()
|
scale=config.get_inference_scale()
|
||||||
)
|
)
|
||||||
|
|
||||||
# Log detection summary
|
# Convert detections to SAM2 prompts (different handling for segmentation vs detection mode)
|
||||||
total_humans = sum(len(detections) for detections in detection_results.values())
|
multi_frame_prompts = {}
|
||||||
logger.info(f"Detected {total_humans} humans across {len(detection_results)} segments")
|
cap = cv2.VideoCapture(segment_info['video_file'])
|
||||||
|
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||||
|
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||||
|
cap.release()
|
||||||
|
|
||||||
# Step 3: Process segments with SAM2 (placeholder for now)
|
for frame_idx, detections in multi_frame_detections.items():
|
||||||
logger.info("Step 3: SAM2 processing and green screen generation")
|
if detections:
|
||||||
logger.info("SAM2 processing module not yet implemented - this is where segment processing would occur")
|
if has_yolo_masks:
|
||||||
|
# Segmentation mode: convert YOLO masks to SAM2 mask prompts
|
||||||
|
frame_masks = {}
|
||||||
|
for i, detection in enumerate(detections[:2]): # Up to 2 objects
|
||||||
|
if detection.get('has_mask', False):
|
||||||
|
mask = detection['mask']
|
||||||
|
# Resize mask to match inference scale
|
||||||
|
if config.get_inference_scale() != 1.0:
|
||||||
|
scale = config.get_inference_scale()
|
||||||
|
scaled_height = int(frame_height * scale)
|
||||||
|
scaled_width = int(frame_width * scale)
|
||||||
|
mask = cv2.resize(mask.astype(np.float32), (scaled_width, scaled_height), interpolation=cv2.INTER_NEAREST)
|
||||||
|
mask = mask > 0.5
|
||||||
|
|
||||||
# Step 4: Assemble final video (placeholder for now)
|
obj_id = i + 1 # Sequential object IDs
|
||||||
logger.info("Step 4: Assembling final video with audio")
|
frame_masks[obj_id] = mask.astype(bool)
|
||||||
logger.info("Video assembly module not yet implemented - this is where concatenation and audio copying would occur")
|
logger.debug(f"Mid-segment Detection: Frame {frame_idx}, Object {obj_id} mask - shape: {mask.shape}, pixels: {np.sum(mask)}")
|
||||||
|
|
||||||
|
if frame_masks:
|
||||||
|
# Store as mask prompts (different format than bbox prompts)
|
||||||
|
multi_frame_prompts[frame_idx] = {'masks': frame_masks}
|
||||||
|
logger.info(f"Mid-segment Detection: Frame {frame_idx} -> {len(frame_masks)} YOLO masks")
|
||||||
|
else:
|
||||||
|
# Detection mode: convert to bounding box prompts (existing logic)
|
||||||
|
prompts = detector.convert_detections_to_sam2_prompts(detections, frame_width)
|
||||||
|
multi_frame_prompts[frame_idx] = prompts
|
||||||
|
logger.info(f"Mid-segment Detection: Frame {frame_idx} -> {len(prompts)} SAM2 prompts")
|
||||||
|
|
||||||
|
logger.info(f"Mid-segment Detection: Generated prompts for {len(multi_frame_prompts)} frames")
|
||||||
|
else:
|
||||||
|
logger.info(f"Mid-segment Detection: No additional frames to process (segment has {total_frames} frames)")
|
||||||
|
elif config.get('advanced.enable_mid_segment_detection', False):
|
||||||
|
logger.info(f"Mid-segment Detection: Skipped for segment {segment_idx} (no initial YOLO data)")
|
||||||
|
|
||||||
|
# Process segment with SAM2
|
||||||
|
logger.info(f"Pipeline Debug: Starting SAM2 processing for segment {segment_idx}")
|
||||||
|
video_segments = sam2_processor.process_single_segment(
|
||||||
|
segment_info,
|
||||||
|
yolo_prompts=yolo_prompts,
|
||||||
|
previous_masks=previous_masks,
|
||||||
|
inference_scale=config.get_inference_scale(),
|
||||||
|
multi_frame_prompts=multi_frame_prompts
|
||||||
|
)
|
||||||
|
|
||||||
|
if video_segments is None:
|
||||||
|
logger.error(f"SAM2 processing failed for segment {segment_idx}")
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Check if SAM2 produced adequate results
|
||||||
|
if len(video_segments) == 0:
|
||||||
|
logger.error(f"SAM2 produced no frames for segment {segment_idx}")
|
||||||
|
continue
|
||||||
|
elif len(video_segments) < 10: # Expected many frames for a 5-second segment
|
||||||
|
logger.warning(f"SAM2 produced very few frames ({len(video_segments)}) for segment {segment_idx} - this may indicate propagation failure")
|
||||||
|
|
||||||
|
# Debug what SAM2 produced
|
||||||
|
logger.info(f"Pipeline Debug: SAM2 completed for segment {segment_idx}")
|
||||||
|
logger.info(f"Pipeline Debug: Generated masks for {len(video_segments)} frames")
|
||||||
|
|
||||||
|
if video_segments:
|
||||||
|
# Check first frame to see what objects were tracked
|
||||||
|
first_frame_idx = min(video_segments.keys())
|
||||||
|
first_frame_objects = video_segments[first_frame_idx]
|
||||||
|
logger.info(f"Pipeline Debug: First frame contains {len(first_frame_objects)} tracked objects")
|
||||||
|
logger.info(f"Pipeline Debug: Tracked object IDs: {list(first_frame_objects.keys())}")
|
||||||
|
|
||||||
|
for obj_id, mask in first_frame_objects.items():
|
||||||
|
mask_pixels = np.sum(mask)
|
||||||
|
logger.info(f"Pipeline Debug: Object {obj_id} mask has {mask_pixels} pixels")
|
||||||
|
|
||||||
|
# Check last frame as well
|
||||||
|
last_frame_idx = max(video_segments.keys())
|
||||||
|
last_frame_objects = video_segments[last_frame_idx]
|
||||||
|
logger.info(f"Pipeline Debug: Last frame contains {len(last_frame_objects)} tracked objects")
|
||||||
|
logger.info(f"Pipeline Debug: Final object IDs: {list(last_frame_objects.keys())}")
|
||||||
|
|
||||||
|
# Save final masks for next segment
|
||||||
|
mask_path = os.path.join(segment_info['directory'], "mask.png")
|
||||||
|
sam2_processor.save_final_masks(
|
||||||
|
video_segments,
|
||||||
|
mask_path,
|
||||||
|
green_color=config.get_green_color(),
|
||||||
|
blue_color=config.get_blue_color()
|
||||||
|
)
|
||||||
|
|
||||||
|
# Apply green screen and save output video
|
||||||
|
success = mask_processor.process_segment(
|
||||||
|
segment_info,
|
||||||
|
video_segments,
|
||||||
|
use_nvenc=config.get_use_nvenc(),
|
||||||
|
bitrate=config.get_output_bitrate()
|
||||||
|
)
|
||||||
|
|
||||||
|
if success:
|
||||||
|
logger.info(f"Successfully processed segment {segment_idx}")
|
||||||
|
else:
|
||||||
|
logger.error(f"Failed to create green screen video for segment {segment_idx}")
|
||||||
|
|
||||||
|
# Log processing summary
|
||||||
|
logger.info(f"Sequential processing complete. Total humans detected: {total_humans_detected}")
|
||||||
|
|
||||||
|
# Step 3: Assemble final video
|
||||||
|
logger.info("Step 3: Assembling final video with audio")
|
||||||
|
|
||||||
|
# Initialize video assembler
|
||||||
|
assembler = VideoAssembler(
|
||||||
|
preserve_audio=config.get_preserve_audio(),
|
||||||
|
use_nvenc=config.get_use_nvenc()
|
||||||
|
)
|
||||||
|
|
||||||
|
# Verify all segments are complete
|
||||||
|
all_complete, missing = assembler.verify_segment_completeness(segments_dir)
|
||||||
|
|
||||||
|
if not all_complete:
|
||||||
|
logger.error(f"Cannot assemble video - missing segments: {missing}")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
# Assemble final video
|
||||||
|
final_output = os.path.join(output_dir, config.get_output_filename())
|
||||||
|
|
||||||
|
success = assembler.assemble_final_video(
|
||||||
|
segments_dir,
|
||||||
|
input_video,
|
||||||
|
final_output,
|
||||||
|
bitrate=config.get_output_bitrate()
|
||||||
|
)
|
||||||
|
|
||||||
|
if success:
|
||||||
|
logger.info(f"Final video saved to: {final_output}")
|
||||||
|
|
||||||
        logger.info("Pipeline completed successfully")
        return 0

@@ -189,6 +930,16 @@ def main():

    except Exception as e:
        logger.error(f"Pipeline failed: {e}", exc_info=True)
        return 1
    finally:
        # Cleanup async preprocessor if it was used
        if async_preprocessor:
            async_preprocessor.cleanup()
            logger.debug("Async preprocessor cleanup completed")


def main():
    """Main entry point - wrapper for async main."""
    import asyncio
    return asyncio.run(main_async())


if __name__ == "__main__":
    exit_code = main()

@@ -6,6 +6,7 @@ opencv-python>=4.8.0
numpy>=1.24.0

# SAM2 - Segment Anything Model 2
# Note: Make sure to run download_models.py after installing to get model weights
git+https://github.com/facebookresearch/sam2.git

# GPU acceleration (optional but recommended)
@@ -17,6 +18,8 @@ tqdm>=4.65.0
matplotlib>=3.7.0
Pillow>=10.0.0

decord

# Optional: For advanced features
psutil>=5.9.0  # Memory monitoring
pympler>=0.9   # Memory profiling (for debugging)

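The new `decord` dependency is added unpinned. A hedged usage sketch of the kind of frame access it enables (illustrative only — the path is hypothetical, and the pipeline may still rely on `cv2.VideoCapture` in most places):

```python
from decord import VideoReader, cpu

# Hypothetical segment path; decord decodes on demand and returns RGB frames.
vr = VideoReader("segments/segment_000/segment_000.mp4", ctx=cpu(0))
print(f"{len(vr)} frames")
first_frame = vr[0].asnumpy()                 # H x W x 3, RGB, uint8
batch = vr.get_batch([0, 30, 60]).asnumpy()   # e.g. frames for mid-segment re-detection
```
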
198 sbs_spec.md (Normal file)
@@ -0,0 +1,198 @@

# Plan: Separate Left/Right Eye Processing for VR180 SAM2 Pipeline

## Overview

Implement a new processing mode that splits VR180 side-by-side frames into separate left and right halves, processes each eye independently through SAM2, then recombines them into the final output. This should improve tracking accuracy by removing parallax confusion between eyes.

## Key Changes Required

### 1. Configuration Updates

**File: `config.yaml`**

- Add new configuration option: `processing.separate_eye_processing: false` (default off for backward compatibility)
- Add related options (a sketch of how the pipeline reads these flags follows this list):
  - `processing.enable_greenscreen_fallback: true` (render full green if no humans detected)
  - `processing.eye_overlap_pixels: 0` (optional overlap for blending)

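A minimal sketch of reading these flags — the `config.get(...)` calls mirror the ones added to `main.py` in this change; only the wrapper function is new here:

```python
import logging

logger = logging.getLogger(__name__)

def read_separate_eye_settings(config):
    """Pull the new flags with backward-compatible defaults (config is the pipeline's loader)."""
    separate_eye_processing = config.get('processing.separate_eye_processing', False)
    enable_greenscreen_fallback = config.get('processing.enable_greenscreen_fallback', True)
    eye_overlap_pixels = config.get('processing.eye_overlap_pixels', 0)
    if separate_eye_processing:
        logger.info("VR180 Separate Eye Processing: ENABLED (overlap=%s, fallback=%s)",
                    eye_overlap_pixels, enable_greenscreen_fallback)
    return separate_eye_processing, enable_greenscreen_fallback, eye_overlap_pixels
```
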
### 2. Core SAM2 Processor Enhancements

**File: `core/sam2_processor.py`**

#### New Methods:
- `split_frame_into_eyes(frame) -> (left_frame, right_frame)`
- `split_video_into_eyes(video_path, left_output, right_output, scale)`
- `process_single_eye_segment(segment_info, eye_side, yolo_prompts, previous_masks, inference_scale)`
- `combine_eye_masks(left_masks, right_masks, full_frame_shape) -> combined_masks` (see the sketch after this section)
- `create_greenscreen_segment(segment_info, duration_seconds) -> bool`

#### Modified Methods:
- `process_single_segment()` - Add branch for separate eye processing mode
- New processing flow:
  1. Check if separate_eye_processing enabled
  2. If enabled: split segment video into left/right eye videos
  3. Process each eye independently with SAM2
  4. Combine masks back to full frame format
  5. If fallback needed: create full greenscreen segment

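To make the geometry concrete, a minimal sketch of `split_frame_into_eyes` and `combine_eye_masks` for plain side-by-side frames, ignoring `eye_overlap_pixels` and assuming masks arrive as `{frame_idx: {obj_id: bool array}}` at half-frame width — an illustration, not the repo's implementation:

```python
import numpy as np

def split_frame_into_eyes(frame: np.ndarray):
    """Split a side-by-side VR180 frame into (left_eye, right_eye) halves."""
    half = frame.shape[1] // 2
    return frame[:, :half], frame[:, half:]

def combine_eye_masks(left_masks, right_masks, full_frame_shape):
    """Paste per-eye boolean masks back onto the full-width canvas.

    Assumes both eyes use obj_id=1, so overlapping ids are OR-ed onto one canvas.
    """
    height, width = full_frame_shape
    half = width // 2
    combined = {}
    for frame_idx in sorted(set(left_masks or {}) | set(right_masks or {})):
        frame_out = {}
        for eye_masks, x_offset in ((left_masks, 0), (right_masks, half)):
            if not eye_masks or frame_idx not in eye_masks:
                continue
            for obj_id, mask in eye_masks[frame_idx].items():
                if mask is None:
                    continue
                canvas = frame_out.setdefault(obj_id, np.zeros((height, width), dtype=bool))
                canvas[:, x_offset:x_offset + mask.shape[1]] |= mask.astype(bool)
        combined[frame_idx] = frame_out
    return combined
```
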
### 3. YOLO Detector Enhancements

**File: `core/yolo_detector.py`**

#### New Methods:
- `detect_humans_in_single_eye(frame, eye_side) -> List[Dict]`
- `convert_eye_detections_to_sam2_prompts(detections, eye_side) -> List[Dict]` (sketched after this section)
- `has_any_detections(detections_list) -> bool`

#### Modified Methods:
- `detect_humans_in_video_first_frame()` - Add eye-specific detection support
- Object ID assignment: Always use obj_id=1 for single-eye processing (since each eye is processed independently)

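A sketch of the single-object prompt conversion under the obj_id=1 convention above; the detection keys (`bbox`, `confidence`) follow the original script reproduced in `spec.md`, and the rest is an assumption:

```python
from typing import Dict, List

def convert_eye_detections_to_sam2_prompts(detections: List[Dict], eye_side: str) -> List[Dict]:
    """Turn per-eye YOLO detections into SAM2 box prompts.

    Each eye is processed as its own video, so coordinates stay in that eye's
    frame and everything is tracked as obj_id=1; only the highest-confidence
    person is kept.
    """
    if not detections:
        return []
    best = max(detections, key=lambda d: d.get('confidence', 0.0))
    return [{
        'obj_id': 1,             # single object per eye
        'bbox': best['bbox'],    # (x1, y1, x2, y2) in eye-local coordinates
        'eye_side': eye_side,    # kept for logging/debug output
    }]
```
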
### 4. Mask Processor Updates

**File: `core/mask_processor.py`**

#### New Methods:
- `create_full_greenscreen_frame(frame_shape) -> np.ndarray` (sketched after this section)
- `process_greenscreen_only_segment(segment_info, frame_count) -> bool`

#### Modified Methods:
- `apply_green_mask()` - Handle combined eye masks properly
- Add support for full-greenscreen fallback when no humans detected

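`create_full_greenscreen_frame` amounts to a solid-colour frame; a sketch, assuming the BGR green `[0, 255, 0]` used elsewhere in the project:

```python
import numpy as np

def create_full_greenscreen_frame(frame_shape, green_color=(0, 255, 0)) -> np.ndarray:
    """Return a solid green BGR frame matching frame_shape (height, width[, channels])."""
    height, width = frame_shape[:2]
    frame = np.empty((height, width, 3), dtype=np.uint8)
    frame[:] = green_color
    return frame
```
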
### 5. Main Pipeline Integration

**File: `main.py`**

#### Processing Flow Changes:

```python
# For each segment:
if config.get('processing.separate_eye_processing', False):
    # 1. Run YOLO on full frame to check for ANY human presence
    full_frame_detections = detector.detect_humans_in_video_first_frame(segment_video)

    if not full_frame_detections:
        # No humans detected anywhere - create full greenscreen segment
        success = mask_processor.process_greenscreen_only_segment(segment_info, expected_frame_count)
        continue

    # 2. Split detections by eye and process separately
    left_detections = [d for d in full_frame_detections if is_in_left_half(d, frame_width)]
    right_detections = [d for d in full_frame_detections if is_in_right_half(d, frame_width)]

    # 3. Process left eye (if detections exist)
    left_masks = None
    if left_detections:
        left_eye_prompts = detector.convert_eye_detections_to_sam2_prompts(left_detections, 'left')
        left_masks = sam2_processor.process_single_eye_segment(segment_info, 'left', left_eye_prompts, previous_left_masks, inference_scale)

    # 4. Process right eye (if detections exist)
    right_masks = None
    if right_detections:
        right_eye_prompts = detector.convert_eye_detections_to_sam2_prompts(right_detections, 'right')
        right_masks = sam2_processor.process_single_eye_segment(segment_info, 'right', right_eye_prompts, previous_right_masks, inference_scale)

    # 5. Combine masks back to full frame format
    if left_masks or right_masks:
        combined_masks = sam2_processor.combine_eye_masks(left_masks, right_masks, full_frame_shape)
        # Continue with normal mask processing...
    else:
        # Neither eye had trackable humans - full greenscreen fallback
        success = mask_processor.process_greenscreen_only_segment(segment_info, expected_frame_count)

else:
    # Original processing mode (current behavior)
    # ... existing logic unchanged
```

### 6. File Structure Changes

#### New Files:
- `core/eye_processor.py` - Dedicated class for eye-specific operations
- `utils/video_utils.py` - Video manipulation utilities (splitting, combining)

#### Modified Files:
- All core processing modules as detailed above
- Update logging to distinguish left/right eye processing
- Update debug frame generation for eye-specific visualization

### 7. Debug and Monitoring Enhancements

#### Debug Outputs:
- `left_eye_debug.jpg` - Left eye YOLO detections
- `right_eye_debug.jpg` - Right eye YOLO detections
- `left_eye_sam2_masks.jpg` - Left eye SAM2 results
- `right_eye_sam2_masks.jpg` - Right eye SAM2 results
- `combined_masks_debug.jpg` - Final combined result

#### Logging Enhancements:
- Clear distinction between left/right eye processing stages
- Performance metrics for each eye processing
- Fallback trigger logging when no humans detected

### 8. Performance Considerations

#### Optimizations:
- **Parallel Processing**: Process left and right eyes simultaneously using threading (see the sketch after this list)
- **Selective Processing**: Skip SAM2 for eyes with no YOLO detections
- **Memory Management**: Clean up intermediate eye videos promptly
- **Caching**: Cache split eye videos if processing multiple segments

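A minimal sketch of the threading idea from the list above, assuming `process_single_eye_segment` can safely run once per eye in a worker thread (GPU contention may force sequential execution in practice):

```python
from concurrent.futures import ThreadPoolExecutor

def process_both_eyes_in_parallel(sam2_processor, left_args, right_args):
    """Run the two single-eye jobs concurrently; returns (left_masks, right_masks)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(sam2_processor.process_single_eye_segment, *left_args)
        right_future = pool.submit(sam2_processor.process_single_eye_segment, *right_args)
        return left_future.result(), right_future.result()
```
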
#### Resource Usage:
- **Memory**: ~2x peak usage during eye processing (temporary)
- **Storage**: Temporary left/right eye videos (~1.5x original size)
- **Compute**: Potentially faster overall due to smaller frame processing

### 9. Backward Compatibility

#### Default Behavior:
- `separate_eye_processing: false` by default
- Existing configurations work unchanged
- All current functionality preserved

#### Migration Path:
- Users can gradually test new mode on problematic segments
- Configuration flag allows easy A/B testing
- Existing debug outputs remain functional

### 10. Error Handling and Fallbacks

#### Robust Error Recovery:
- If eye splitting fails → fall back to original processing
- If single eye SAM2 fails → use greenscreen for that eye
- If both eyes fail → full greenscreen segment
- Comprehensive logging of all fallback triggers

#### Quality Validation:
- Verify combined masks have reasonable pixel counts
- Check for mask alignment issues between eyes
- Validate segment completeness before marking done

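Read as a decision ladder, the recovery rules above could look like this sketch, assuming the helpers named earlier in this plan exist with the listed signatures:

```python
def process_with_fallbacks(segment_info, left_masks, right_masks, mask_processor,
                           sam2_processor, full_frame_shape, frame_count):
    """Sketch of the fallback ladder: combine what succeeded, otherwise go green."""
    if not left_masks and not right_masks:
        # Both eyes failed -> render the whole segment as greenscreen.
        return mask_processor.process_greenscreen_only_segment(segment_info, frame_count)
    # One eye failing is tolerated: combine_eye_masks leaves that half empty,
    # which renders as greenscreen for that eye only.
    combined = sam2_processor.combine_eye_masks(left_masks, right_masks, full_frame_shape)
    if not combined:
        return mask_processor.process_greenscreen_only_segment(segment_info, frame_count)
    return mask_processor.process_segment(segment_info, combined)
```
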
## Implementation Priority

### Phase 1 (Core Functionality)
1. Configuration schema updates
2. Basic eye splitting and recombining logic
3. Modified SAM2 processor with separate eye support
4. Greenscreen fallback implementation

### Phase 2 (Integration)
1. Main pipeline integration with new processing mode
2. YOLO detector eye-specific enhancements
3. Mask processor updates for combined masks
4. Basic error handling and fallbacks

### Phase 3 (Polish)
1. Performance optimizations (parallel processing)
2. Enhanced debug outputs and logging
3. Comprehensive testing and validation
4. Documentation updates

## Expected Benefits

### Tracking Improvements:
- **Eliminated Parallax Confusion**: SAM2 processes a single viewpoint per eye
- **Better Object Consistency**: Single object tracking per eye view
- **Improved Temporal Coherence**: Less cross-eye interference
- **Reduced False Positives**: Eye-specific context for tracking

### Operational Benefits:
- **Graceful Degradation**: Full greenscreen when humans not detected
- **Flexible Processing**: Can enable/disable per pipeline
- **Better Debug Visibility**: Eye-specific debug outputs
- **Performance Scalability**: Smaller frames = faster processing per eye

This plan maintains full backward compatibility while adding the requested separate eye processing capability with robust fallback mechanisms.

618 spec.md
@@ -190,3 +190,621 @@ models:

- **Fine-tuned YOLO**: Domain-specific human detection models
- **SAM2 Optimization**: Custom SAM2 checkpoints for video content
- **Temporal Consistency**: Enhanced cross-segment mask propagation


Here is the original monolithic script that this repo is a refactor/modularization of. If something
doesn't work in this repo, consult the following script, because it is known to work and can
be used to solve problems:

import os
|
||||||
|
import cv2
|
||||||
|
import numpy as np
|
||||||
|
import cupy as cp
|
||||||
|
from concurrent.futures import ThreadPoolExecutor
|
||||||
|
import torch
|
||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
import gc
|
||||||
|
from sam2.build_sam import build_sam2_video_predictor
|
||||||
|
import argparse
|
||||||
|
from ultralytics import YOLO
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
logging.basicConfig(level=logging.INFO)
|
||||||
|
|
||||||
|
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
|
||||||
|
|
||||||
|
# Variables for input and output directories
|
||||||
|
SAM2_CHECKPOINT = "../checkpoints/sam2.1_hiera_large.pt"
|
||||||
|
MODEL_CFG = "configs/sam2.1/sam2.1_hiera_l.yaml"
|
||||||
|
GREEN = [0, 255, 0]
|
||||||
|
BLUE = [255, 0, 0]
|
||||||
|
|
||||||
|
INFERENCE_SCALE = 0.50
|
||||||
|
FULL_SCALE = 1.0
|
||||||
|
|
||||||
|
# YOLO model for human detection (class 0 = person)
|
||||||
|
YOLO_MODEL_PATH = "yolov8n.pt" # You can change this to a custom model
|
||||||
|
YOLO_CONFIDENCE = 0.6
|
||||||
|
HUMAN_CLASS_ID = 0 # COCO class ID for person
|
||||||
|
|
||||||
|
def open_video(video_path):
|
||||||
|
"""
|
||||||
|
Opens a video file and returns a generator that yields frames.
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- video_path: Path to the video file.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
- A generator that yields frames from the video.
|
||||||
|
"""
|
||||||
|
cap = cv2.VideoCapture(video_path)
|
||||||
|
if not cap.isOpened():
|
||||||
|
print(f"Error: Could not open video file {video_path}")
|
||||||
|
return
|
||||||
|
while True:
|
||||||
|
ret, frame = cap.read()
|
||||||
|
if not ret:
|
||||||
|
break
|
||||||
|
yield frame
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
def load_previous_segment_mask(prev_segment_dir):
|
||||||
|
mask_path = os.path.join(prev_segment_dir, "mask.png")
|
||||||
|
mask_image = cv2.imread(mask_path)
|
||||||
|
|
||||||
|
if mask_image is None:
|
||||||
|
raise FileNotFoundError(f"Mask image not found at {mask_path}")
|
||||||
|
|
||||||
|
# Ensure the mask_image has three color channels
|
||||||
|
if len(mask_image.shape) != 3 or mask_image.shape[2] != 3:
|
||||||
|
raise ValueError("Mask image does not have three color channels.")
|
||||||
|
|
||||||
|
mask_image = mask_image.astype(np.uint8)
|
||||||
|
|
||||||
|
# Extract Object A and Object B masks
|
||||||
|
mask_a = np.all(mask_image == GREEN, axis=2)
|
||||||
|
mask_b = np.all(mask_image == BLUE, axis=2)
|
||||||
|
|
||||||
|
per_obj_input_mask = {1: mask_a, 2: mask_b}
|
||||||
|
input_palette = None # No palette needed for binary mask
|
||||||
|
|
||||||
|
return per_obj_input_mask, input_palette
|
||||||
|
|
||||||
|
|
||||||
|
def apply_green_mask(frame, masks):
|
||||||
|
# Convert frame and masks to CuPy arrays
|
||||||
|
frame_gpu = cp.asarray(frame)
|
||||||
|
combined_mask = cp.zeros(frame_gpu.shape[:2], dtype=cp.bool_)
|
||||||
|
|
||||||
|
for mask in masks:
|
||||||
|
mask_gpu = cp.asarray(mask.squeeze())
|
||||||
|
if mask_gpu.shape != frame_gpu.shape[:2]:
|
||||||
|
resized_mask = cv2.resize(cp.asnumpy(mask_gpu).astype(cp.float32),
|
||||||
|
(frame_gpu.shape[1], frame_gpu.shape[0]))
|
||||||
|
mask_gpu = cp.asarray(resized_mask > 0.5) # Convert back to CuPy boolean array
|
||||||
|
else:
|
||||||
|
mask_gpu = mask_gpu.astype(cp.bool_) # Ensure boolean type
|
||||||
|
combined_mask |= mask_gpu # Perform the bitwise OR operation
|
||||||
|
|
||||||
|
green_background = cp.full(frame_gpu.shape, cp.array([0, 255, 0], dtype=cp.uint8), dtype=cp.uint8)
|
||||||
|
result_frame = cp.where(combined_mask[..., None], frame_gpu, green_background)
|
||||||
|
return cp.asnumpy(result_frame) # Convert back to NumPy
|
||||||
|
|
||||||
|
|
||||||
|
def initialize_predictor():
|
||||||
|
if torch.cuda.is_available():
|
||||||
|
device = torch.device("cuda")
|
||||||
|
elif torch.backends.mps.is_available():
|
||||||
|
device = torch.device("mps")
|
||||||
|
print(
|
||||||
|
"\nSupport for MPS devices is preliminary. SAM 2 is trained with CUDA and might "
|
||||||
|
"give numerically different outputs and sometimes degraded performance on MPS."
|
||||||
|
)
|
||||||
|
# Enable MPS fallback for operations not supported on MPS
|
||||||
|
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
|
||||||
|
else:
|
||||||
|
device = torch.device("cpu")
|
||||||
|
logger.info(f"Using device: {device}")
|
||||||
|
predictor = build_sam2_video_predictor(MODEL_CFG, SAM2_CHECKPOINT, device=device)
|
||||||
|
return predictor
|
||||||
|
|
||||||
|
|
||||||
|
def load_first_frame(video_path, scale=1.0):
|
||||||
|
"""
|
||||||
|
Opens a video file and returns the first frame, scaled as specified.
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- video_path: Path to the video file.
|
||||||
|
- scale: Scaling factor for the frame (default is 1.0 for original size).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
- first_frame: The first frame of the video, scaled accordingly.
|
||||||
|
"""
|
||||||
|
cap = cv2.VideoCapture(video_path)
|
||||||
|
if not cap.isOpened():
|
||||||
|
logger.error(f"Error: Could not open video file {video_path}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
ret, frame = cap.read()
|
||||||
|
cap.release()
|
||||||
|
|
||||||
|
if not ret:
|
||||||
|
logger.error(f"Error: Could not read frame from video file {video_path}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
if scale != 1.0:
|
||||||
|
frame = cv2.resize(
|
||||||
|
frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR
|
||||||
|
)
|
||||||
|
|
||||||
|
return frame
|
||||||
|
|
||||||
|
def detect_humans_with_yolo(frame, yolo_model, confidence_threshold=YOLO_CONFIDENCE):
|
||||||
|
"""
|
||||||
|
Detect humans in a frame using YOLO model.
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- frame: Input frame (BGR format)
|
||||||
|
- yolo_model: Loaded YOLO model
|
||||||
|
- confidence_threshold: Detection confidence threshold
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
- human_boxes: List of bounding boxes for detected humans
|
||||||
|
"""
|
||||||
|
# Run YOLO detection
|
||||||
|
results = yolo_model(frame, conf=confidence_threshold, verbose=False)
|
||||||
|
|
||||||
|
human_boxes = []
|
||||||
|
|
||||||
|
# Process results
|
||||||
|
for result in results:
|
||||||
|
boxes = result.boxes
|
||||||
|
if boxes is not None:
|
||||||
|
for box in boxes:
|
||||||
|
# Get class ID
|
||||||
|
cls = int(box.cls.cpu().numpy()[0])
|
||||||
|
|
||||||
|
# Check if it's a person (class 0 in COCO)
|
||||||
|
if cls == HUMAN_CLASS_ID:
|
||||||
|
# Get bounding box coordinates (x1, y1, x2, y2)
|
||||||
|
coords = box.xyxy[0].cpu().numpy()
|
||||||
|
conf = float(box.conf.cpu().numpy()[0])
|
||||||
|
|
||||||
|
human_boxes.append({
|
||||||
|
'bbox': coords,
|
||||||
|
'confidence': conf
|
||||||
|
})
|
||||||
|
|
||||||
|
logger.info(f"Detected human with confidence {conf:.2f} at {coords}")
|
||||||
|
|
||||||
|
return human_boxes
|
||||||
|
|
||||||
|
def add_yolo_detections_to_predictor(predictor, inference_state, human_detections, frame_width):
|
||||||
|
"""
|
||||||
|
Add YOLO human detections as bounding boxes to SAM2 predictor.
|
||||||
|
For stereo videos, creates two objects (left and right humans).
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
- predictor: SAM2 video predictor
|
||||||
|
- inference_state: SAM2 inference state
|
||||||
|
- human_detections: List of human detection results
|
||||||
|
- frame_width: Width of the frame for stereo splitting
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
- out_mask_logits: SAM2 output mask logits
|
||||||
|
"""
|
||||||
|
half_frame_width = frame_width // 2
|
||||||
|
|
||||||
|
# Sort detections by x-coordinate to get left and right humans
|
||||||
|
human_detections.sort(key=lambda x: x['bbox'][0]) # Sort by x1 coordinate
|
||||||
|
|
||||||
|
obj_id = 1
|
||||||
|
out_mask_logits = None
|
||||||
|
|
||||||
|
for i, detection in enumerate(human_detections[:2]): # Take up to 2 humans (left and right)
|
||||||
|
bbox = detection['bbox']
|
||||||
|
|
||||||
|
# For stereo videos, assign obj_id based on position
|
||||||
|
if len(human_detections) >= 2:
|
||||||
|
# If we have multiple humans, assign based on left/right position
|
||||||
|
center_x = (bbox[0] + bbox[2]) / 2
|
||||||
|
if center_x < half_frame_width:
|
||||||
|
current_obj_id = 1 # Left human
|
||||||
|
else:
|
||||||
|
current_obj_id = 2 # Right human
|
||||||
|
else:
|
||||||
|
# If only one human, duplicate for both sides (as in original stereo logic)
|
||||||
|
current_obj_id = obj_id
|
||||||
|
obj_id += 1
|
||||||
|
|
||||||
|
# Also add the mirrored version for stereo
|
||||||
|
if obj_id <= 2:
|
||||||
|
mirrored_bbox = bbox.copy()
|
||||||
|
mirrored_bbox[0] += half_frame_width # Shift x1
|
||||||
|
mirrored_bbox[2] += half_frame_width # Shift x2
|
||||||
|
|
||||||
|
# Ensure mirrored bbox is within frame bounds
|
||||||
|
mirrored_bbox[0] = max(0, min(mirrored_bbox[0], frame_width - 1))
|
||||||
|
mirrored_bbox[2] = max(0, min(mirrored_bbox[2], frame_width - 1))
|
||||||
|
|
||||||
|
try:
|
||||||
|
_, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
|
||||||
|
inference_state=inference_state,
|
||||||
|
frame_idx=0,
|
||||||
|
obj_id=obj_id,
|
||||||
|
box=mirrored_bbox.astype(np.float32),
|
||||||
|
)
|
||||||
|
logger.info(f"Added mirrored human detection for Object {obj_id}")
|
||||||
|
obj_id += 1
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error adding mirrored human detection for Object {obj_id}: {e}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
_, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
|
||||||
|
inference_state=inference_state,
|
||||||
|
frame_idx=0,
|
||||||
|
obj_id=current_obj_id,
|
||||||
|
box=bbox.astype(np.float32),
|
||||||
|
)
|
||||||
|
logger.info(f"Added human detection for Object {current_obj_id}")
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error adding human detection for Object {current_obj_id}: {e}")
|
||||||
|
|
||||||
|
return out_mask_logits
|
||||||
|
|
||||||
|
def propagate_masks(predictor, inference_state):
|
||||||
|
video_segments = {}
|
||||||
|
for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
|
||||||
|
video_segments[out_frame_idx] = {
|
||||||
|
out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
|
||||||
|
for i, out_obj_id in enumerate(out_obj_ids)
|
||||||
|
}
|
||||||
|
return video_segments
|
||||||
|
|
||||||
|
def apply_colored_mask(frame, masks_a, masks_b):
|
||||||
|
colored_mask = np.zeros_like(frame)
|
||||||
|
|
||||||
|
# Apply colors to the masks
|
||||||
|
for mask in masks_a:
|
||||||
|
mask = mask.squeeze()
|
||||||
|
if mask.shape != frame.shape[:2]:
|
||||||
|
mask = cv2.resize(mask, (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
|
||||||
|
indices = np.where(mask)
|
||||||
|
colored_mask[mask] = [0, 255, 0] # Green for Object A
|
||||||
|
|
||||||
|
for mask in masks_b:
|
||||||
|
mask = mask.squeeze()
|
||||||
|
if mask.shape != frame.shape[:2]:
|
||||||
|
mask = cv2.resize(mask, (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
|
||||||
|
indices = np.where(mask)
|
||||||
|
colored_mask[mask] = [255, 0, 0] # Blue for Object B
|
||||||
|
|
||||||
|
return colored_mask
|
||||||
|
|
||||||
|
|
||||||
|
def process_and_save_output_video(video_path, output_video_path, video_segments, use_nvenc=False):
    """
    Process high-resolution frames, apply upscaled masks, and save the output video.
    """
    cap = cv2.VideoCapture(video_path)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 59.94

    # Setup VideoWriter with desired settings
    if use_nvenc:
        # Pipe raw frames to FFmpeg for hardware H.265 encoding (NVENC, or VideoToolbox on macOS)
        import subprocess

        if sys.platform == 'darwin':
            encoder = 'hevc_videotoolbox'
        else:
            encoder = 'hevc_nvenc'

        command = [
            'ffmpeg',
            '-y',  # Overwrite output file if it exists
            '-f', 'rawvideo',
            '-vcodec', 'rawvideo',
            '-pix_fmt', 'bgr24',
            '-s', f'{frame_width}x{frame_height}',
            '-r', str(fps),
            '-i', '-',  # Input from stdin
            '-an',  # No audio
            '-vcodec', encoder,
            '-pix_fmt', 'nv12',
            '-preset', 'slow',
            '-b:v', '50M',
            output_video_path
        ]
        process = subprocess.Popen(command, stdin=subprocess.PIPE)
    else:
        # Use OpenCV VideoWriter
        fourcc = cv2.VideoWriter_fourcc(*'HEVC')  # H.265
        out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))

    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret or frame_idx >= len(video_segments):
            break

        masks = [video_segments[frame_idx][out_obj_id] for out_obj_id in video_segments[frame_idx]]
        upscaled_masks = []

        for mask in masks:
            mask = mask.squeeze()
            upscaled_mask = cv2.resize(mask.astype(np.uint8), (frame.shape[1], frame.shape[0]), interpolation=cv2.INTER_NEAREST)
            upscaled_masks.append(upscaled_mask)

        result_frame = apply_green_mask(frame, upscaled_masks)

        # Write frame to output
        if use_nvenc:
            process.stdin.write(result_frame.tobytes())
        else:
            out.write(result_frame)

        frame_idx += 1

    cap.release()
    if use_nvenc:
        process.stdin.close()
        process.wait()
    else:
        out.release()

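For reference, when `use_nvenc` is set, the `subprocess.Popen` call above is roughly equivalent to piping raw BGR frames into a standalone `ffmpeg` command like the one below; the resolution, frame rate, and output name are placeholder values, since the real ones are computed at runtime.

```bash
# Illustrative equivalent of the command list assembled above (values are placeholders).
ffmpeg -y \
  -f rawvideo -vcodec rawvideo -pix_fmt bgr24 -s 3840x2160 -r 59.94 -i - \
  -an -vcodec hevc_nvenc -pix_fmt nv12 -preset slow -b:v 50M \
  output_000.mp4
```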
def get_video_file_name(index):
    return f"segment_{str(index).zfill(3)}.mp4"


def do_yolo_detection_on_segments(base_dir, segments, detect_segments, scale=1.0, yolo_model_path=YOLO_MODEL_PATH):
    """
    Run YOLO detection on specified segments and save detection results.
    """
    logger.info("Running YOLO detection on requested segments.")

    # Load YOLO model
    yolo_model = YOLO(yolo_model_path)

    for i, segment in enumerate(segments):
        segment_index = int(segment.split("_")[1])
        segment_dir = os.path.join(base_dir, segment)
        detection_file = os.path.join(segment_dir, "yolo_detections")
        video_file = os.path.join(segment_dir, get_video_file_name(i))

        if segment_index in detect_segments and not os.path.exists(detection_file):
            first_frame = load_first_frame(video_file, scale)
            if first_frame is None:
                continue

            # Ultralytics YOLO accepts BGR frames directly, so no color conversion is needed
            human_detections = detect_humans_with_yolo(first_frame, yolo_model)

            if human_detections:
                # Save detection results
                with open(detection_file, 'w') as f:
                    f.write("# YOLO Human Detections\n")
                    for detection in human_detections:
                        bbox = detection['bbox']
                        conf = detection['confidence']
                        f.write(f"{bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]},{conf}\n")
                logger.info(f"Saved {len(human_detections)} human detections for segment {segment}")
            else:
                logger.warning(f"No humans detected in segment {segment}")
                # Create empty file to mark as processed
                with open(detection_file, 'w') as f:
                    f.write("# No humans detected\n")

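The `yolo_detections` file written above is plain text: a `#` header line followed by one `x1,y1,x2,y2,confidence` row per detection. The corresponding reader is not part of this excerpt; a minimal sketch of one, assuming the format stays exactly as written here (the helper name `read_yolo_detections` is hypothetical), might look like this:

```python
import os


def read_yolo_detections(detection_file):
    """Parse the simple comma-separated detection file written by do_yolo_detection_on_segments."""
    detections = []
    if not os.path.exists(detection_file):
        return detections
    with open(detection_file) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip header/comment lines such as "# YOLO Human Detections"
            x1, y1, x2, y2, conf = line.split(",")
            detections.append({
                "bbox": [float(x1), float(y1), float(x2), float(y2)],
                "confidence": float(conf),
            })
    return detections
```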
def save_final_masks(video_segments, mask_output_path):
    """
    Save the final masks as a colored image.
    """
    last_frame_idx = max(video_segments.keys())
    masks_dict = video_segments[last_frame_idx]
    # Assuming you have two objects with IDs 1 and 2
    mask_a = masks_dict.get(1).squeeze() if 1 in masks_dict else None
    mask_b = masks_dict.get(2).squeeze() if 2 in masks_dict else None

    if mask_a is None and mask_b is None:
        logger.error("No masks found for objects.")
        return

    # Use the first available mask to determine dimensions
    reference_mask = mask_a if mask_a is not None else mask_b
    black_frame = np.zeros((reference_mask.shape[0], reference_mask.shape[1], 3), dtype=np.uint8)

    if mask_a is not None:
        mask_a = mask_a.astype(bool)
        black_frame[mask_a] = GREEN

    if mask_b is not None:
        mask_b = mask_b.astype(bool)
        black_frame[mask_b] = BLUE

    # Save the mask image
    cv2.imwrite(mask_output_path, black_frame)
    logger.info(f"Saved final masks to {mask_output_path}")

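`save_final_masks` paints object 1 green and object 2 blue on a black canvas. The `load_previous_segment_mask` helper used later in `main()` is defined elsewhere in the project and apparently works with a palettized PNG (it also returns a palette), so the sketch below is only one possible inverse: a color-matching reader, assuming `GREEN` and `BLUE` are the BGR tuples `(0, 255, 0)` and `(255, 0, 0)`.

```python
import cv2
import numpy as np

GREEN = (0, 255, 0)  # assumed BGR value for object 1
BLUE = (255, 0, 0)   # assumed BGR value for object 2


def load_masks_from_colored_png(mask_path):
    """Recover per-object boolean masks from the colored mask.png written by save_final_masks."""
    image = cv2.imread(mask_path)  # BGR image
    if image is None:
        raise FileNotFoundError(mask_path)
    per_obj_mask = {}
    mask_a = np.all(image == np.array(GREEN, dtype=np.uint8), axis=-1)
    mask_b = np.all(image == np.array(BLUE, dtype=np.uint8), axis=-1)
    if mask_a.any():
        per_obj_mask[1] = mask_a  # object 1 was painted green
    if mask_b.any():
        per_obj_mask[2] = mask_b  # object 2 was painted blue
    return per_obj_mask
```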
def create_low_res_video(input_video_path, output_video_path, scale):
    """
    Creates a low-resolution version of the input video for inference.
    """
    cap = cv2.VideoCapture(input_video_path)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) * scale)
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) * scale)
    fps = cap.get(cv2.CAP_PROP_FPS) or 59.94

    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        low_res_frame = cv2.resize(frame, (frame_width, frame_height), interpolation=cv2.INTER_LINEAR)
        out.write(low_res_frame)

    cap.release()
    out.release()

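The configuration file later in this diff exposes a `use_ffmpeg_lowres` flag, which suggests the OpenCV loop above can be swapped for a single FFmpeg scaling pass. A hedged sketch of that variant (the exact flags the project uses are not shown in this excerpt):

```python
import subprocess


def create_low_res_video_ffmpeg(input_video_path, output_video_path, scale):
    """Illustrative FFmpeg-based downscaler; an alternative to the OpenCV read/write loop above."""
    # trunc(.../2)*2 keeps width and height even, which yuv420p output requires.
    vf = f"scale=trunc(iw*{scale}/2)*2:trunc(ih*{scale}/2)*2"
    command = [
        "ffmpeg", "-y",
        "-i", input_video_path,
        "-vf", vf,
        "-an",                 # inference does not need audio
        "-c:v", "libx264",
        "-preset", "veryfast",
        output_video_path,
    ]
    subprocess.run(command, check=True)
```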
def main():
    parser = argparse.ArgumentParser(description="Process video segments with YOLO + SAM2.")
    parser.add_argument("--base-dir", type=str, required=True, help="Base directory for video segments.")
    parser.add_argument("--segments-detect-humans", nargs='*', help="Segments for which to run YOLO human detection. Use 'all' for all segments, or list specific segment numbers (e.g., 1 5 10). Default: all segments.")
    parser.add_argument("--yolo-model", type=str, default=YOLO_MODEL_PATH, help="Path to YOLO model.")
    parser.add_argument("--yolo-confidence", type=float, default=YOLO_CONFIDENCE, help="YOLO detection confidence threshold.")
    args = parser.parse_args()

    base_dir = args.base_dir
    segments = [d for d in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, d)) and d.startswith("segment_")]
    segments.sort(key=lambda x: int(x.split("_")[1]))

    # Handle different ways to specify segments for YOLO detection
    if args.segments_detect_humans is None or len(args.segments_detect_humans) == 0:
        # Default: run YOLO on all segments
        detect_segments = [int(seg.split("_")[1]) for seg in segments]
        logger.info("No segments specified, running YOLO detection on ALL segments")
    elif len(args.segments_detect_humans) == 1 and args.segments_detect_humans[0].lower() == 'all':
        # Explicit 'all' keyword
        detect_segments = [int(seg.split("_")[1]) for seg in segments]
        logger.info("Running YOLO detection on ALL segments")
    else:
        # Specific segment numbers provided
        try:
            detect_segments = [int(x) for x in args.segments_detect_humans]
            logger.info(f"Running YOLO detection on segments: {detect_segments}")
        except ValueError:
            logger.error("Invalid segment numbers provided. Use integers or 'all'.")
            return

    # Run YOLO detection on specified segments
    do_yolo_detection_on_segments(base_dir, segments, detect_segments, scale=INFERENCE_SCALE, yolo_model_path=args.yolo_model)

    # Load YOLO model for inference
    yolo_model = YOLO(args.yolo_model)

    for i, segment in enumerate(segments):
        segment_index = int(segment.split("_")[1])
        segment_dir = os.path.join(base_dir, segment)
        video_file_name = get_video_file_name(i)
        video_path = os.path.join(segment_dir, video_file_name)
        output_done_file = os.path.join(segment_dir, "output_frames_done")

        if os.path.exists(output_done_file):
            logger.info(f"Segment {segment} already processed. Skipping.")
            continue

        logger.info(f"Processing segment {segment}")

        # Initialize predictor
        predictor = initialize_predictor()

        # Prepare low-resolution video frames for inference
        low_res_video_path = os.path.join(segment_dir, "low_res_video.mp4")
        if not os.path.exists(low_res_video_path):
            create_low_res_video(video_path, low_res_video_path, INFERENCE_SCALE)
            logger.info(f"Low-resolution video created for segment {segment}")
        else:
            logger.info(f"Low-resolution video already exists for segment {segment}, reusing it")

        # Initialize inference state with low-resolution video
        inference_state = predictor.init_state(video_path=low_res_video_path, async_loading_frames=True)

        # Load YOLO detections or previous masks
        detection_file = os.path.join(segment_dir, "yolo_detections")
        use_detections = segment_index in detect_segments

        if i == 0 and not use_detections:
            # First segment must use YOLO detection since there's no previous mask
            logger.warning(f"First segment {segment} requires YOLO detection. Running YOLO detection.")
            use_detections = True

        if i > 0 and not use_detections:
            # Try to load a previous segment mask - search backwards for the most recent successful mask
            logger.info(f"Using previous segment mask for segment {segment}")
            mask_found = False

            # Search backwards through previous segments to find a valid mask
            for j in range(i - 1, -1, -1):
                prev_segment_dir = os.path.join(base_dir, segments[j])
                prev_mask_path = os.path.join(prev_segment_dir, "mask.png")

                if os.path.exists(prev_mask_path):
                    try:
                        per_obj_input_mask, input_palette = load_previous_segment_mask(prev_segment_dir)
                        # Add previous masks to predictor
                        for obj_id, mask in per_obj_input_mask.items():
                            predictor.add_new_mask(inference_state, 0, obj_id, mask)
                        logger.info(f"Successfully loaded mask from segment {segments[j]}")
                        mask_found = True
                        break
                    except Exception as e:
                        logger.warning(f"Error loading mask from {segments[j]}: {e}")
                        continue

            if not mask_found:
                logger.error(f"No valid previous mask found for segment {segment}. Consider running YOLO detection on this segment.")
                continue
        else:
            # Load first frame for detection
            first_frame = load_first_frame(low_res_video_path, scale=1.0)
            if first_frame is None:
                logger.error(f"Could not load first frame for segment {segment}")
                continue

            # Run YOLO detection on the first frame (either from file or on-the-fly)
            if os.path.exists(detection_file):
                logger.info(f"Using existing YOLO detections for segment {segment}")
            else:
                logger.info(f"Running YOLO detection on-the-fly for segment {segment}")

            human_detections = detect_humans_with_yolo(first_frame, yolo_model, args.yolo_confidence)

            if human_detections:
                # Add YOLO detections to predictor
                frame_width = first_frame.shape[1]
                add_yolo_detections_to_predictor(predictor, inference_state, human_detections, frame_width)
            else:
                logger.warning(f"No humans detected in segment {segment}")
                continue

        # Perform inference and collect masks per frame
        video_segments = propagate_masks(predictor, inference_state)

        # Process high-resolution frames and save output video
        output_video_path = os.path.join(segment_dir, f"output_{segment_index}.mp4")
        logger.info("Processing segment complete, attempting to save full video from low-res masks")
        process_and_save_output_video(
            video_path,
            output_video_path,
            video_segments,
            use_nvenc=True  # Set to True to use NVENC offloading
        )

        # Save final masks
        mask_output_path = os.path.join(segment_dir, "mask.png")
        save_final_masks(video_segments, mask_output_path)

        # Clean up
        predictor.reset_state(inference_state)
        del inference_state
        del video_segments
        del predictor
        gc.collect()

        try:
            os.remove(low_res_video_path)
            logger.info(f"Deleted low-resolution video for segment {segment}")
        except Exception as e:
            logger.warning(f"Could not delete low-resolution video for segment {segment}: {e}")

        # Mark segment as completed
        open(output_done_file, 'a').close()

    logger.info("Processing complete.")


if __name__ == "__main__":
    main()
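Based on the argument parser in `main()`, a typical invocation of this script looks like the examples below; the script filename, segment directory, and segment numbers are placeholders.

```bash
# Run YOLO detection only on segments 1, 5 and 10, with an explicit model and threshold
python process_segments.py \
  --base-dir ./output/segments \
  --segments-detect-humans 1 5 10 \
  --yolo-model models/yolo/yolov8n.pt \
  --yolo-confidence 0.4

# Or detect on every segment (the default, equivalent to passing 'all')
python process_segments.py --base-dir ./output/segments --segments-detect-humans all
```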
122
test-separate-eyes-config.yaml
Normal file
@@ -0,0 +1,122 @@
# YOLO + SAM2 Video Processing Configuration with VR180 Separate Eye Processing

input:
  video_path: "./input/regrets_full.mp4"

output:
  directory: "./output/"
  filename: "vr180_processed_both_eyes.mp4"

processing:
  # Duration of each video segment in seconds
  segment_duration: 5

  # Scale factor for SAM2 inference (0.5 = half resolution)
  inference_scale: 0.4

  # YOLO detection confidence threshold (lowered for better VR180 detection)
  yolo_confidence: 0.4

  # Which segments to run YOLO detection on
  detect_segments: "all"

  # VR180 separate eye processing mode
  separate_eye_processing: false

  # IoU threshold for pairing left-eye and right-eye masks.
  # A value of 0.5 means masks must overlap by 50% to be considered a pair.
  stereo_iou_threshold: 0.5

  # Factor to reduce YOLO confidence by if no stereo pairs are found on the first try (e.g., 0.8 = 20% reduction).
  confidence_reduction_factor: 0.8

  # If no humans are detected in a segment, create a full green screen video.
  # Only used when separate_eye_processing is true.
  enable_greenscreen_fallback: true

  # Pixel overlap between left/right eyes for blending (0 = no overlap)
  eye_overlap_pixels: 0

models:
  # YOLO detection mode: "detection" (bounding boxes) or "segmentation" (direct masks)
  yolo_mode: "segmentation"  # Options: "detection", "segmentation"

  # YOLO model paths for different modes
  yolo_detection_model: "models/yolo/yolo11l.pt"         # Regular YOLO for detection mode
  yolo_segmentation_model: "models/yolo/yolo11x-seg.pt"  # Segmentation YOLO for segmentation mode

  # SAM2 model configuration
  sam2_checkpoint: "models/sam2/checkpoints/sam2.1_hiera_small.pt"
  sam2_config: "models/sam2/configs/sam2.1/sam2.1_hiera_s.yaml"

video:
  # Use NVIDIA hardware encoding (requires NVENC-capable GPU)
  use_nvenc: true

  # Output video bitrate
  output_bitrate: "25M"

  # Preserve original audio track
  preserve_audio: true

  # Force keyframes for better segment boundaries
  force_keyframes: true

advanced:
  # Green screen color (RGB values)
  green_color: [0, 255, 0]

  # Blue screen color for second object (RGB values)
  blue_color: [255, 0, 0]

  # YOLO human class ID (0 for COCO person class)
  human_class_id: 0

  # GPU memory management
  cleanup_intermediate_files: true

  # Logging level (DEBUG, INFO, WARNING, ERROR)
  log_level: "INFO"

  # Save debug frames with YOLO detections visualized (ENABLED FOR TESTING)
  save_yolo_debug_frames: true

  # --- Mid-Segment Re-detection ---
  # Re-run YOLO at intervals within a segment to correct tracking drift.
  enable_mid_segment_detection: false
  redetection_interval: 30          # Frames between re-detections.
  max_redetections_per_segment: 10

  # Parallel Processing Optimizations
  enable_background_lowres_generation: false  # Enable async low-res video pre-generation (temporarily disabled due to syntax fix needed)
  max_concurrent_lowres: 2                    # Max parallel FFmpeg processes for low-res creation
  lowres_segments_ahead: 2                    # How many segments to prepare in advance
  use_ffmpeg_lowres: true                     # Use FFmpeg instead of OpenCV for low-res creation

# Mask Quality Enhancement Settings - Optimized for Performance
mask_processing:
  # Edge feathering and blurring (REDUCED for performance)
  enable_edge_blur: true  # Enable Gaussian blur on mask edges for smooth transitions
  edge_blur_radius: 3     # Reduced from 10 to 3 for better performance
  edge_blur_sigma: 0.5    # Gaussian blur standard deviation

  # Temporal smoothing between frames
  enable_temporal_smoothing: false  # Enable frame-to-frame mask blending
  temporal_blend_weight: 0.2        # Weight for previous frame (0.0-1.0, higher = more smoothing)
  temporal_history_frames: 2        # Number of previous frames to consider

  # Morphological mask cleaning (DISABLED for VR180 - SAM2 masks are already high quality)
  enable_morphological_cleaning: false  # Disabled for performance - SAM2 produces clean masks
  morphology_kernel_size: 5             # Kernel size for opening/closing operations
  min_component_size: 500               # Minimum pixel area for connected components

  # Alpha blending mode (OPTIMIZED)
  alpha_blending_mode: "linear"  # Linear is fastest - keep as-is
  alpha_transition_width: 1      # Width of transition zone in pixels

  # Advanced options
  enable_bilateral_filter: false  # Edge-preserving smoothing (slower but higher quality)
  bilateral_d: 9                  # Bilateral filter diameter
  bilateral_sigma_color: 75       # Bilateral filter color sigma
  bilateral_sigma_space: 75       # Bilateral filter space sigma
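This YAML is read by the pipeline's configuration loader, which is not part of this diff; a minimal PyYAML sketch of loading it (key names taken from the file above, everything else assumed) follows.

```python
import yaml

with open("test-separate-eyes-config.yaml") as f:
    cfg = yaml.safe_load(f)

# A few of the values defined above
inference_scale = cfg["processing"]["inference_scale"]        # 0.4
separate_eyes = cfg["processing"]["separate_eye_processing"]  # False in this test config
stereo_iou = cfg["processing"]["stereo_iou_threshold"]        # 0.5
sam2_checkpoint = cfg["models"]["sam2_checkpoint"]
use_nvenc = cfg["video"]["use_nvenc"]

print(inference_scale, separate_eyes, stereo_iou, sam2_checkpoint, use_nvenc)
```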