This project is an exploration of the following prompt:

Prompt start:

User:
Help me come up with a Python project that can be used to do human matting on VR180 3D SBS video. You do not need to write any code; we just need to come up with some high-level plans. Please do research to find the best methods. I have explored and done some research in the past using things like SAM2, but the main challenge was that I had to split the input video into small 10-second segments to fit into video memory, and I had to manually select the object I wanted to keep and turn everything else into a green background so it could be keyed out. Some of these things were computationally expensive since I was running on a consumer CPU and an NVIDIA 3080 with limited VRAM. I also explored Robust Video Matting, again on segments since the input video is large. This mostly worked but again was a bit slow, and it seemed like the model did not do well if the people were in non-standard positions, such as facing away from the camera, or if their body was highly obscured. Please go over some approaches that could be considered. If SAM or RVM is a good approach then I would be open to doing this again, but if there are other methods that could be used I would like to hear about them too. Additionally, I think this should be packaged in such a way that it could be run remotely on rented cloud GPUs. I have not done this yet, but I think it is a common way to run compute-intensive workflows like PyTorch via Docker containers or something on rented hardware. So instead of me just running the code on a 3080 over the course of a day or two, I could just rent a cloud GPU with maybe 128 GB of VRAM and spend a few dollars to get it done faster and just download the result.

Claude:
I'll help you research the best approaches for human matting on VR180 3D SBS video! Before diving into the research, let me clarify a few things to ensure I find the most relevant solutions:

1. What's your typical video resolution and duration? (This will help determine memory requirements and whether newer methods can handle full videos without segmentation)
2. Are you looking to process multiple people in the scene, or primarily single-person matting?
3. What's your target processing time and budget for cloud GPU usage? (e.g., would you prefer a method that's 10x faster but costs $50 vs one that takes longer but costs $5?)

User:
1) The video varies, but expect 8000x4000 at 60 fps. If we could downscale, generate the matte, and then upscale again without losing much quality at the edges (minimal stair-stepping or aliasing where the rendered human meets the green background), then this could be an option. 2) Just a single person; they will appear on both sides of the frame since it's VR180 3D SBS, of course with slightly different views since two lenses were used to capture the video. 3) I would prefer not to spend lots of money per video, but something like 10-20 dollars for a 1-hour video at the resolution and frame rate described above would be reasonable. If it can be done for cheaper, great. Ideally it would work fine, or have control points to make it "scale", on high-end or even consumer GPU hardware if I do end up wanting to run it locally.

Prompt End.

The resulting research is found in research.md in this directory.