Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify flush() behavior #220

Closed
sandersdan opened this issue May 5, 2021 · 9 comments · Fixed by #247
Closed

Clarify flush() behavior #220

sandersdan opened this issue May 5, 2021 · 9 comments · Fixed by #247
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).

Comments

@sandersdan
Copy link
Contributor

sandersdan commented May 5, 2021

The behavior of a decoder following a flush() is not currently specified. In practice many decoders cannot resume decoding mid-stream after a flush and need a keyframe, however it may be difficult to express this without delving into codec-specific behavior.

@chcunningham
Copy link
Collaborator

The current specification reflects the earlier thinking (i.e. no frame-type requirements). But, given that we now know many decoders require a keyframe post flush() (seems all but dav1d), I think the spec should just require it.

@sandersdan
Copy link
Contributor Author

sandersdan commented May 5, 2021

FYI at least the VideoToolbox decoder in Chrome can continue mid-stream after a flush without issue. I believe that the FFmpeg H.264 decoder can also continue, but may use error recovery to decode non-keyframe starts (ie. grey with morphing blocks until a keyframe).

My preference is to allow non-keyframe after flush() to result in any of: best effort decode, silent drop, decode error.

It could be that apps do not like silent drop, but it is a strategy we have implemented following a configuration change in some of our decoders to work around incorrect muxing in other <video> cases. An alternative could be a non-silent drop, something like calling the output callback without a frame.

Edit: My reasoning for the above preference is that incorrectly-marked keyframes are a common problem with long-tail content on the web, and we've seen before that MSE API users prefer that the browser works around such issues. The issue does not often come up with sources that transcode all of their content. It may be that WebCodecs is low-level enough that we can reasonably require apps to implement the workarounds instead. I've also heard that some live streaming cases do not have clear keyframe marking in their streams, and apps would prefer not to implement their own bitstream parsing for that.

@chcunningham
Copy link
Collaborator

@sandersdan and I discussed this at length today.

My preference is to allow non-keyframe after flush() to result in any of: best effort decode, silent drop, decode error.

This doesn't offer enough predictability. It will produce scenarios where a site worked great when tested in browser X, but errored in browser Y. Or even: test stream worked great in browser X, but user stream later failed in that same browser.

My reasoning for the above preference is that incorrectly-marked keyframes are a common problem with long-tail content on the web, and we've seen before that MSE API users prefer that the browser works around such issues.

I do hear this reasoning. I think the MSE situation is definitely a mess to be avoided. I also think that we should hold WC users to a higher bar (lower level API). At the same time, I don't want require that all implementations violate the layers to vet that a frame is actually key frame (very codec specific, and we really want to allow pass-through to the underlying decoders). To that end, here's a proposal that I like and @sandersdan doesn't hate:

  1. continue to require correct specification of chunk.type
  2. use that chunk.type in normative spec steps to say that a key frame must be provided after flush() (and after configure()). if a key frame is not provided -> decode error.
  3. state that implementers may additionally inspect the frame to validate it is actually a key frame
  4. offer a JS library that does inspection for folks who don't know there content well.

Chrome would be one of the implementations that does deep inspection allowed by step 3, but we avoid making every implementer do so.

@padenot
Copy link
Collaborator

padenot commented May 7, 2021

Chrome would be one of the implementations that does deep inspection allowed by step 3, but we avoid making every implementer do so.

A cursory look at the Gecko source code tells me that it also does not trust the container for (at least) some container/codecs pairs for precisely the reason outlined by @sandersdan's last paragraph.

I think I agree with the steps above, but what happens after a flush if the frame is actually not a key frame but is marked as a key frame and the browser X doesn't do a deep inspection to validate this (step 3), and it somehow recovers, but browser Y does, and throws ? We're kind of back to square one.

This doesn't offer enough predictability. It will produce scenarios where a site worked great when tested in browser X, but errored in browser Y. Or even: test stream worked great in browser X, but user stream later failed in that same browser.

There also the case where it works great on browse/version on macOS but doesn't on the same browser/version on Window, which is really confusing for developers I guess.

@chcunningham chcunningham added this to the V1 Launch Blockers milestone May 7, 2021
@chcunningham
Copy link
Collaborator

I think I agree with the steps above, but what happens after a flush if the frame is actually not a key frame but is marked as a key frame and the browser X doesn't do a deep inspection to validate this (step 3), and it somehow recovers, but browser Y does, and throws ? We're kind of back to square one.

This does allow for the different behaviors you mentioned, but it only allows for that in the case that the user is doing the wrong thing. We could formally require that all implementations violate the layers, but this seems really onerous and I think its fair for the spec to expect some amount of savvy from the user vs lots of hand holding from the UA. I think our design is actually pretty friendly compared with other APIs in this space.

@chcunningham chcunningham added the breaking Interface changes that would break current usage (producing errors or undesired behavior). label May 12, 2021
@chcunningham
Copy link
Collaborator

Triage note: marked 'breaking' as the current spec wording does not intend / hint at the requirements now being proposed.

@padenot
Copy link
Collaborator

padenot commented May 17, 2021

Besides, during the last editor call, it was mentioned that it could be a requirement for an implementation to go the extra mile (in practice not really because they already to it) and really check that a frame marked as a key frame by inspecting the byte stream, for compatibility reasons.

@chcunningham
Copy link
Collaborator

PR #247 gives implementers the option strictly without requiring. That's still my first choice.

While many implementers have such code today, they generally on have it for the handful of codecs currently supported in MSE. Also, it may only feasible to do such inspection if the content is not encrypted. WC doesn't do EME yet, but we're hearing demand for it (#41).

@photopea
Copy link

photopea commented Aug 1, 2022

Hi guys, could anyone help me with this problem? https://stackoverflow.com/questions/73184093/decode-mp4-video-with-videodecoder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants