# YOLO + SAM2 Video Processing Pipeline

An automated video processing system that combines YOLO object detection with Meta's SAM2 (Segment Anything Model 2) to produce green screen videos with precise human segmentation.

## Overview

This pipeline processes long videos by splitting them into manageable segments, detecting humans with YOLO, and generating precise masks with SAM2 for green screen background replacement. It preserves the original audio and maintains video quality throughout.

## Features

- **Automated Human Detection**: Uses YOLOv8 for robust human detection
- **Precise Segmentation**: Leverages SAM2 for accurate mask generation
- **Scalable Processing**: Handles videos of any length through segmentation
- **GPU Acceleration**: CUDA/NVENC support for faster processing
- **Audio Preservation**: Maintains the original audio track in the output
- **Stereo Video Support**: Handles VR/360 content with left/right tracking
- **Configurable Pipeline**: YAML-based configuration for easy customization

## Installation

### Prerequisites

- Python 3.8+
- NVIDIA GPU with CUDA support (recommended)
- FFmpeg installed and available in `PATH`

### Install Dependencies

```bash
# Clone the repository
git clone <repository-url>
cd samyolo_on_segments

# Install Python dependencies
pip install -r requirements.txt
```

### Model Dependencies

Download the required model checkpoints:

1. **SAM2 models**: Download from [Meta's SAM2 repository](https://github.com/facebookresearch/sam2)
2. **YOLO models**: YOLOv8 weights are downloaded automatically, or you can specify a custom path

## Quick Start

### 1. Configure the Pipeline

Edit `config.yaml` to point at your input video and desired settings:

```yaml
input:
  video_path: "/path/to/your/video.mp4"

output:
  directory: "/path/to/output/"
  filename: "processed_video.mp4"

processing:
  segment_duration: 5
  inference_scale: 0.5
  yolo_confidence: 0.6
  detect_segments: "all"

models:
  yolo_model: "yolov8n.pt"
  sam2_checkpoint: "../checkpoints/sam2.1_hiera_large.pt"
  sam2_config: "configs/sam2.1/sam2.1_hiera_l.yaml"
```

### 2. Run the Pipeline

```bash
python main.py --config config.yaml
```

### 3. Monitor Progress

Check processing status:

```bash
python main.py --config config.yaml --status
```

Clean up a specific segment for reprocessing:

```bash
python main.py --config config.yaml --cleanup-segment 5
```

## Configuration Options

### Input/Output Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `input.video_path` | Path to input video file | Required |
| `output.directory` | Output directory path | Required |
| `output.filename` | Output video filename | Required |

### Processing Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `processing.segment_duration` | Duration of each segment (seconds) | 5 |
| `processing.inference_scale` | Scale factor for SAM2 inference | 0.5 |
| `processing.yolo_confidence` | YOLO detection confidence threshold | 0.6 |
| `processing.detect_segments` | Segments to process (`"all"` or a list) | `"all"` |

### Model Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `models.yolo_model` | YOLO model path or name | `"yolov8n.pt"` |
| `models.sam2_checkpoint` | SAM2 checkpoint path | Required |
| `models.sam2_config` | SAM2 config file path | Required |

### Video Settings

| Parameter | Description | Default |
|-----------|-------------|---------|
| `video.use_nvenc` | Use NVIDIA hardware encoding | `true` |
| `video.output_bitrate` | Output video bitrate | `"50M"` |
| `video.preserve_audio` | Copy original audio track | `true` |
| `video.force_keyframes` | Force keyframes for clean cuts | `true` |

### Advanced Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `advanced.green_color` | Green screen color | `[0, 255, 0]` |
| `advanced.blue_color` | Blue screen color | `[255, 0, 0]` |
| `advanced.human_class_id` | YOLO human class ID | `0` |
| `advanced.log_level` | Logging verbosity | `"INFO"` |
| `advanced.cleanup_intermediate_files` | Clean up temp files | `true` |

Note: `[255, 0, 0]` for blue suggests these colors are specified in OpenCV's BGR channel order (green `[0, 255, 0]` reads the same either way); verify against your setup before changing them.

## Processing Pipeline

### Step 1: Video Segmentation
- Splits the input video into configurable segments (default 5 seconds)
- Creates an organized directory structure: `video_segments/segment_0/`, `segment_1/`, etc.
- Uses FFmpeg with keyframe forcing for clean cuts

### Step 2: Human Detection
- Runs YOLO detection on the specified segments
- Detects human bounding boxes with a configurable confidence threshold
- Saves detection results for reuse and debugging

### Step 3: SAM2 Segmentation (In Development)
- Uses YOLO detections as prompts for SAM2
- Generates precise masks for detected humans
- Propagates masks across all frames in each segment

### Step 4: Green Screen Processing (In Development)
- Applies the generated masks to isolate humans
- Replaces the background with green screen
- Maintains video quality and framerate

### Step 5: Video Assembly (In Development)
- Concatenates the processed segments
- Preserves the original audio track
- Outputs the final video with a green screen background

## Project Structure

```
samyolo_on_segments/
├── README.md              # This documentation
├── config.yaml            # Default configuration
├── main.py                # Main entry point
├── requirements.txt       # Python dependencies
├── spec.md                # Detailed specification
├── core/                  # Core processing modules
│   ├── __init__.py
│   ├── config_loader.py   # Configuration management
│   ├── sam2_processor.py  # SAM2 segmentation (planned)
│   ├── video_splitter.py  # Video segmentation
│   └── yolo_detector.py   # YOLO human detection
└── utils/                 # Utility modules
    ├── __init__.py
    ├── file_utils.py      # File operations
    ├── logging_utils.py   # Logging configuration
    └── status_utils.py    # Progress monitoring
```

## Usage Examples

### Basic Processing

```bash
python main.py --config config.yaml
```

### Custom Configuration

```bash
python main.py --config my_custom_config.yaml --log-file processing.log
```

### Process Specific Segments Only

```yaml
processing:
  detect_segments: [0, 5, 10, 15]  # Only process these segments
```

### High-Quality Processing

```yaml
processing:
  inference_scale: 1.0     # Full-resolution inference
video:
  output_bitrate: "100M"   # Higher bitrate
```

## Performance Considerations

### Hardware Requirements
- **GPU**: NVIDIA GPU with 8 GB+ VRAM (recommended)
- **RAM**: 16 GB+ for processing large videos
- **Storage**: SSD recommended for temporary files

### Processing Time
- Approximately 1-2x real-time on modern GPUs
- Scales with video resolution and the number of segments
- YOLO detection: ~1-2 seconds per segment
- SAM2 processing: ~10-30 seconds per segment (estimated)

### Optimization Tips
1. Use `inference_scale: 0.5` for faster processing
2. Process only key segments via the `detect_segments` list
3. Enable NVENC for hardware-accelerated encoding
4. Use SSD storage for temporary files

## Troubleshooting

### Common Issues

**ImportError: No module named 'sam2'**

```bash
pip install git+https://github.com/facebookresearch/sam2.git
```

**CUDA out of memory**
- Reduce `inference_scale` to 0.25 or 0.5
- Process fewer segments at once
- Use a smaller YOLO model (`yolov8n.pt` instead of `yolov8x.pt`)

**FFmpeg not found**

```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

**No humans detected**
- Lower the `yolo_confidence` threshold
- Check that humans are clearly visible in the video
- Verify that the input video format is supported

### Debug Mode

Enable detailed logging:

```yaml
advanced:
  log_level: "DEBUG"
```

## Current Status

**Implemented:**
- ✅ Video segmentation with FFmpeg
- ✅ YOLO human detection
- ✅ Configuration management
- ✅ Progress monitoring
- ✅ Segment cleanup utilities

**In Development:**
- 🚧 SAM2 integration and mask generation
- 🚧 Green screen processing
- 🚧 Video assembly with audio

**Planned:**
- 📋 Multi-object tracking
- 📋 Real-time processing support
- 📋 Web interface
- 📋 Cloud processing integration

## Contributing

This project is under active development. The core detection pipeline is functional; SAM2 integration and green screen processing are coming soon.

## License

[Add your license information here]

## Support

For issues and questions:
1. Check the troubleshooting section
2. Review the logs with `log_level: "DEBUG"`
3. Open an issue with your configuration and error details
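As a concrete illustration of the video segmentation step, the split-with-forced-keyframes behavior described under "Processing Pipeline" boils down to an FFmpeg invocation along these lines. This is a sketch, not the project's actual command: the function name is illustrative, though the flags themselves (`-f segment`, `-segment_time`, `-force_key_frames`) are standard FFmpeg options. Forcing keyframes requires re-encoding video, so audio alone is stream-copied here.

```python
def build_split_command(video_path, out_pattern, segment_duration=5):
    """Sketch of an FFmpeg call that splits a video into fixed-length segments.

    Forcing a keyframe at every segment boundary lets the segment muxer cut
    cleanly; the expression places one at each multiple of segment_duration.
    """
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-c:v", "libx264",          # re-encode video so keyframes can be forced
        "-c:a", "copy",             # pass audio through untouched
        "-force_key_frames", f"expr:gte(t,n_forced*{segment_duration})",
        "-f", "segment",
        "-segment_time", str(segment_duration),
        "-reset_timestamps", "1",   # each segment starts at t=0
        out_pattern,
    ]

# Run with e.g.: subprocess.run(build_split_command("input.mp4",
#                "video_segments/segment_%d.mp4"), check=True)
```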
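Conceptually, the human detection step is a filter over YOLO's raw output: keep boxes whose class matches `advanced.human_class_id` and whose confidence clears `processing.yolo_confidence`, then map coordinates back to full resolution when detection ran on frames downscaled by `inference_scale`. A minimal sketch using hypothetical plain-tuple detections (real YOLO results come back as library result objects, not tuples):

```python
def filter_human_boxes(detections, conf_threshold=0.6, human_class_id=0):
    """Keep (x1, y1, x2, y2) boxes for sufficiently confident human detections.

    `detections` is assumed to be an iterable of (class_id, confidence, box)
    tuples -- a simplified stand-in for YOLO's result objects.
    """
    return [box for cls_id, conf, box in detections
            if cls_id == human_class_id and conf >= conf_threshold]

def scale_box(box, inference_scale):
    """Map a box found on a downscaled frame back to full-resolution coords."""
    return tuple(coord / inference_scale for coord in box)
```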
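The green screen processing step is, at its core, a masked copy: wherever the SAM2 mask marks a person, keep the original pixel; everywhere else, write the configured background color. A hedged NumPy sketch (the function name is illustrative, and the channel order of `bg_color` should match whatever `advanced.green_color` uses):

```python
import numpy as np

def composite_green_screen(frame, mask, bg_color=(0, 255, 0)):
    """Replace everything outside `mask` with `bg_color`.

    frame: (H, W, 3) uint8 image
    mask:  (H, W) boolean array, True where a person was segmented
    """
    out = np.empty_like(frame)
    out[:] = np.asarray(bg_color, dtype=frame.dtype)  # fill with background
    out[mask] = frame[mask]                           # copy person pixels back
    return out
```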