I love that this is like tape in that it's a sequential access medium. It's storing a tape-like data stream in a digital version of what used to be tape itself (VHS).
I believe YouTube supports random access; otherwise you wouldn’t be able to jump around in a video. youtube-dl can also resume a download partway through, as far as I know.
Each frame gets the same amount of the file, about a kilobyte. So each frame is basically a sector. You need to read in a few extra frames to undo the compression, but otherwise it's just like a normal filesystem. And reading in a batch of sectors at once is normal for real drives too.
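To make the sector analogy concrete, here is a rough sketch of the address arithmetic. It assumes exactly 1024 payload bytes per frame (the "about a kilobyte" above) and models "a few extra frames" as seeking back to the previous keyframe. The keyframe interval of 30 and both function names are made up for illustration, not taken from the actual tool.

```python
FRAME_PAYLOAD_BYTES = 1024  # assumed: "about a kilobyte" per frame
KEYFRAME_INTERVAL = 30      # hypothetical: one keyframe every 30 frames


def locate(byte_offset):
    """Map a file byte offset to (frame index, offset inside that frame)."""
    return divmod(byte_offset, FRAME_PAYLOAD_BYTES)


def frames_to_decode(byte_offset, length):
    """Frames that must be decoded to read `length` bytes at `byte_offset`.

    Decoding starts at the nearest keyframe at or before the first frame
    needed, which is where the few extra frames come from.
    """
    first = byte_offset // FRAME_PAYLOAD_BYTES
    last = (byte_offset + length - 1) // FRAME_PAYLOAD_BYTES
    start = first - first % KEYFRAME_INTERVAL
    return range(start, last + 1)


if __name__ == "__main__":
    print(locate(5000))
    print(list(frames_to_decode(100_000, 2048)))
```

With these numbers, a 2 KiB read at offset 100,000 needs frames 97 through 99 but decodes from frame 90, much like a drive reading a whole track to return a few sectors.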
Even if you did need the frames to be self-describing, you could just toss a counter/offset in the top left corner for less than 1% overhead.
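A quick back-of-the-envelope check of that overhead figure, again assuming 1024 payload bytes per frame. The counter widths and the function name are just illustrative choices.

```python
def counter_overhead(counter_bytes, frame_payload_bytes=1024):
    """Fraction of each frame's payload spent on a per-frame counter."""
    return counter_bytes / frame_payload_bytes


if __name__ == "__main__":
    for width in (4, 8):
        print(f"{width}-byte counter: {counter_overhead(width):.2%} overhead")
```

Under those assumptions, both a 4-byte and an 8-byte counter stay below 1%. A 4-byte counter can number 2^32 frames, which at 1 KiB each covers 4 TiB of data.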