#218: Media Capabilities

Visit on Github.

Opened Nov 14, 2017

Hello TAG!

I'm requesting a TAG review of:

Further details (optional):

You should also know that...

The API is available in Chrome behind a flag: --enable-blink-features=MediaCapabilities. Implementation bugs are tracked here.

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our Github repo for each point of feedback
  • open a single issue in our Github repo for the entire review
  • [x] leave review feedback as a comment in this issue and @-notify [mounirlamouri, chcunningham]

Discussions

Comment by @torgo Feb 2, 2018 (See Github)

Discussed at London f2f day 3

Comment by @triblondon Feb 2, 2018 (See Github)

Sangwhan to write up this review over dinnertime today.

Comment by @triblondon Feb 2, 2018 (See Github)

Issues raised in conversation:

  • Privacy: potential for fingerprinting; private mode is insufficient mitigation

Comment by @foolip Feb 27, 2018 (See Github)

FYI, there is now an Intent to Ship: Media Capabilities: decoding on blink-dev.

I note that this review was slower than usual, with 2.5 months passing before there was any activity, and sounds like there's still a write-up to come? Feedback at any stage of the lifecycle of a spec is welcome of course, but I'll suggest in the blink-dev thread to not block waiting for more feedback.

Comment by @foolip Feb 27, 2018 (See Github)

On fingerprinting: https://www.chromium.org/Home/chromium-security/security-faq#TOC-Why-isn-t-passive-browser-fingerprinting-including-passive-cookies-in-Chrome-s-threat-model- https://www.chromium.org/Home/chromium-security/client-identification-mechanisms

Comment by @cynthia Feb 27, 2018 (See Github)

@foolip apologies for the delay. We try to triage incoming reviews as soon as we see them, but there are times stuff falls through the cracks. I think it’s safe to not consider this a blocker for shipping the feature, I’ll summarize the discussion from the F2F into a write-up shortly.

Comment by @cynthia Mar 6, 2018 (See Github)

Apologies that this took so long. @chcunningham @foolip

As for the privacy issues, thanks for the links related to fingerprinting. The S&P questionnaire link in the original review request seems to be a 404, could you clarify on this? https://github.com/WICG/media-capabilities/blob/master/security-privacy-questionnaire.md

On a first-pass review, I noticed an inconsistency with an API that touches on the same domain - Web Audio. Web Audio defines channels in the form of an unsigned long (which does abstract away the presence of a low-frequency channel) and the sample rate as a float. I don't have a strong opinion on which is better, but types for parameters touching the same concepts should probably be consistent. How to deal with the presence of a low-frequency channel is an open question though - as is whether or not exposing this detail is actually useful to content authors.

The content mime type would most likely require additional parsing from each application that uses this - would it make sense to provide this in structured data to make it easier to use? It seems like most content authors would do codec checks via regex or substring matching with this approach, which isn't great. A section in the explainer (https://github.com/WICG/media-capabilities/blob/master/explainer.md#why-not-a-list-of-formats) seems to touch on this, but the intent of this review comment is different from the one addressed there.
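
To illustrate the concern, a purely hypothetical snippet of the kind of ad-hoc string parsing each application would end up doing (the helper and regex below are mine, not anything from the spec):

```typescript
// Hypothetical illustration of the concern above: with a string-based
// contentType, pulling out just the codec list requires each application
// to do its own ad-hoc parsing (regex or substring matching).
function codecsFromContentType(contentType: string): string[] {
  // e.g. 'video/webm; codecs="vp9, opus"' -> ['vp9', 'opus']
  const match = contentType.match(/codecs\s*=\s*"?([^";]+)"?/i);
  return match ? match[1].split(',').map((c) => c.trim()) : [];
}

codecsFromContentType('video/webm; codecs="vp9, opus"'); // ['vp9', 'opus']
```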

A normative definition of what defines a screen change (or a reference back to a normative definition) would be helpful.

Minor question 1: The explainer example code seems to suggest that the screen color depth is a string (the spec is missing a definition for this though) - is there any particular reason for this decision?

Minor question 2: The explainer touches on HDCP - but that isn't in the spec. Wouldn't the approach in the explainer break when a user launches the content on an HDCP-capable screen, starts playback, and then drags it to a non-HDCP-capable screen?

Since it is unclear exactly what from the spec is shipping - would you mind sharing the CL that relates to the I2S?

Comment by @mounirlamouri Mar 6, 2018 (See Github)

The S&P questionnaire link in the original review request seems to be a 404, could you clarify on this? https://github.com/WICG/media-capabilities/blob/master/security-privacy-questionnaire.md

I believe the link is working. Maybe GH had troubles when you tried?

Web Audio and channels

As mentioned in the spec, channels is still a placeholder and we do not currently use it in Chrome's implementation. I have filed https://github.com/WICG/media-capabilities/issues/73 to make it clearer.

The content mime type would most likely require additional parsing from each application that uses this - would it make sense to provide this in structured data to make it easier to use?

I'm not entirely sure what you meant by this - specifically, what do you mean by "additional parsing from each application"? I would expect web applications to copy-paste the type in their code or read it directly from a manifest of some sort.

A normative definition of what defines a screen change (or a reference back to a normative definition) would be helpful.

As mentioned below, only part 2 of the spec is something we are launching in Chrome. Part 3 is more draft-y and most of it was or will be merged into a CSS spec. This will likely be the case with the change event if it ever happens.

Minor question 1: The explainer example code seems to suggest that the screen color depth is a string (the spec is missing a definition for this though) - is there any particular reason for this decision?

The color depth changes we had were merged into the appropriate CSS spec. I believe 3.3 is a leftover from a removal. I've filed https://github.com/WICG/media-capabilities/issues/74

Minor question 2: The explainer touches on HDCP - but that isn't in the spec. Wouldn't the approach in the explainer break when a user launches the content on an HDCP-capable screen, starts playback, and then drags it to a non-HDCP-capable screen?

HDCP was split into another specification, with a brief explainer in the directory. It will be an extension of EME. Your point about EME and screen changes is correct, though I believe the CDM might deal with this. The screen change event would be another way, but the intent of that event is broader and it could be fired when the screen size has changed.

Since it is unclear exactly what from the spec is shipping - would you mind sharing the CL that relates to the I2S?

That's a very good point. Part 2 is the one that is shipping in Chrome: https://wicg.github.io/media-capabilities/#decoding-encoding-capabilities
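
For reference, a minimal sketch of the kind of query that part of the spec covers (illustrative values only; the dictionary members are those defined in the spec):

```typescript
// Minimal sketch of a part-2 decodingInfo() query (illustrative values).
const configuration: MediaDecodingConfiguration = {
  type: 'file',
  video: {
    contentType: 'video/webm; codecs="vp9"',
    width: 1920,
    height: 1080,
    bitrate: 2_500_000, // bits per second
    framerate: 30,
  },
};

navigator.mediaCapabilities.decodingInfo(configuration).then((info) => {
  // The three result fields discussed later in this thread.
  console.log(info.supported, info.smooth, info.powerEfficient);
});
```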

Comment by @cynthia Mar 6, 2018 (See Github)

I believe the link is working. Maybe GH had troubles when you tried?

It seems so - I just tried again and it works just fine.

I'm not entirely sure what you meant by this - specifically, what do you mean by "additional parsing from each application"? I would expect web applications to copy-paste the type in their code or read it directly from a manifest of some sort.

I imagined a use case where the content author wants to parse out just the codec and not the container information; with a string-based format this would require parsing the string.

As mentioned below, only part 2 of the spec is something we are launching in Chrome. Part 3 is more draft-y and most of it was or will be merged into a CSS spec. This will likely be the case with the change event if it ever happens.

It would be great if the spec could be trimmed down to only what is shipping. Stale draft material tends to confuse both implementors and content authors.

Comment by @mounirlamouri Mar 7, 2018 (See Github)

Good point for the spec state, I will add a warning on top of section 3 mentioning that it's still WIP.

Regarding the codecs string, we require the container and the codec, such as video/webm;codecs=vp8. I believe that most places in the web platform ask for formats in this form (old APIs would accept container-only).

Comment by @cynthia Oct 31, 2018 (See Github)

The framerate is the number of frames used in one second (frames per second). It is represented either as a double or as a fraction.

This is a bit strange - I'm guessing it's for some sort of legacy compatibility reason? It would be useful to know why it is like this (and whether this is the right way forward long term).
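
For illustration, here is how I read the two forms the quoted text allows - a plain double, or a fraction which (per the minutes further down in this thread) is delivered as a string. These values are arbitrary and only meant to show the inconsistency being questioned:

```typescript
// Illustration of the two framerate forms the quoted spec text allows: a
// plain double, or a fraction written as a string. The fraction-as-string
// form is the one later described as being reconsidered.
const videoAsDouble = {
  contentType: 'video/mp4; codecs="avc1.640028"',
  width: 1920,
  height: 1080,
  bitrate: 5_000_000,
  framerate: 29.97, // plain double
};

const videoAsFraction = {
  ...videoAsDouble,
  framerate: '30000/1001', // fraction expressed as a string
};
```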

Comment by @cynthia Oct 31, 2018 (See Github)

Hey all,

Thanks for filing this issue. We took it up during our Paris F2F. Apologies that it took so long.

I had a question about how this could be used to test capabilities in a multiple-media-stream context. For example, can we understand if it's possible to efficiently decode more than one video stream at once? Or get a maximum number of streams/channels/densities at which the client would hit limits? This case could come up in an RTC scenario. There may be cases where decode won't be smooth when you have two decoders, or an encoder and decoder pair, running, for example.

We also had questions about the naming and types of the returned capabilities, specifically smooth and powerEfficient. These names imply a guarantee - despite what you've already written about how they are not. Have any alternative names been considered? Curious if this can be addressed somehow? (We brainstormed "janky" or "powerInefficient" as admittedly poor choices, with the logic inverted.)

Thanks again, and looking forward to hearing back.

Comment by @chcunningham Oct 31, 2018 (See Github)

Hey Cynthia,

Re: framerate, this stems from the discussion here: https://github.com/WICG/media-capabilities/issues/12

But it's recently been reconsidered: https://github.com/WICG/media-capabilities/issues/96

Happy to have your input.

Comment by @chcunningham Oct 31, 2018 (See Github)

For example, can we understand if it's possible to efficiently decode more than one video stream at once?

It's not easy to know with the current API shape, and I haven't thought much about ways the API could be changed for this use case. If you wanted to show 2 videos of the same resolution you could approximate by doing a query that doubles the framerate. For sw codecs this is decent. For hw codecs it will depend on how the hw resources are divided (e.g. by time slice vs by playback). AFAIK platform decoder APIs don't often surface parallel session limits up front, so this would involve some guessing/learning even for implementers (not a deal breaker).
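
A sketch of that approximation (a heuristic suggestion, not spec behaviour; the helper below is illustrative):

```typescript
// Sketch of the approximation described above: to guess whether two
// same-resolution streams could be decoded smoothly at once, query a
// single stream at double the framerate. A heuristic, not spec behaviour.
async function canLikelyDecodeTwo(video: VideoConfiguration): Promise<boolean> {
  const doubled: VideoConfiguration = { ...video, framerate: video.framerate * 2 };
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: 'file',
    video: doubled,
  });
  return info.supported && info.smooth;
}
```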

These names imply a guarantee...

We could prefix the names with "ideally" or "typically"... I tend to favor the shorter names though. I expect sophisticated users of the API to understand that nothing is ever guaranteed in media ;).

Discussed Dec 1, 2018 (See Github)

Sangwhan: Travis had something on the naming - 2 properties returned - smooth and power efficient - frame rates - fractional numbers delivered as strings - they are reconsidering as 2 integers.

Alex: why not floats?

Sangwhan: they said floats don't cut it, but it's being reconsidered from strings.

Bumped to next week

Comment by @chcunningham Dec 18, 2018 (See Github)

Hi TAG,

This review is marked "pending external feedback" - I want to check to see if there's any lingering work from the original proposal?

Also, we have updated (diff, PR) the spec for decodingInfo() to allow callers to describe encrypted media. This doesn't really change the API shape, nor does it have implications for existing callers. It merely adds new optional fields to the input dictionaries and new algorithm steps to process those inputs. The motivations for this are described here in the explainer. Please let me know if mentioning those additions here is sufficient to trigger review.
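
A sketch of what such a query could look like with the new optional member (illustrative; member names follow the updated spec as described and are not exhaustive):

```typescript
// Sketch of a decodingInfo() query that also describes encrypted media via
// the new optional member added by the PR above. Values are arbitrary and
// the key-system details shown are illustrative, not exhaustive.
const encryptedQuery: MediaDecodingConfiguration = {
  type: 'media-source',
  video: {
    contentType: 'video/mp4; codecs="avc1.640028"',
    width: 1280,
    height: 720,
    bitrate: 3_000_000,
    framerate: 30,
  },
  keySystemConfiguration: {
    keySystem: 'org.w3.clearkey',
    video: { robustness: '' }, // robustness strings are key-system specific
  },
};

navigator.mediaCapabilities.decodingInfo(encryptedQuery).then((info) => {
  console.log(info.supported, info.smooth, info.powerEfficient);
});
```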

Chris

Discussed Jan 1, 2019 (See Github)

Kenneth: comment 21 days ago that they updated the PR but we haven't followed up.

Travis: I'm skeptical of how useful this is going to be in practice. A UA might be able to tell you whether it could play smoothly or be battery efficient but there are a lot of factors...

Hadley: re battery efficiency -- could be important for older devices and developing/less wealthy countries.

Alex: this is one of those trade-offs where it's highly invested eng teams vs the common case. Big media companies would need this.

Kenneth: but it is bad for the user if you're draining their battery - bad for the web. This seems like something we need to have but is it done the right way? Could be especially useful for VR media...

Dan: is this in fact useful in other domains besides high volume video? Maybe get feedback from VR people?

Alex: there might be some but [it's a different medium]... Potential benefits might be different. we could ask them if they are paying attention to power at all.

Dan: XR folks will be requesting a review soon so..

Kenneth: what about amazon / hulu / netflix? Also BBC.

Dan: we could reach out to talk to Mark... (AC rep for netflix).

Kenneth: we could get feedback from 2nd screen working group....

Alex: And Web & TV folks

Yves to work w3c contacts.

Kenneth: let's get feedback and from that feedback we figure out what we do

Discussed Jan 1, 2019 (See Github)

Dan: What shall we do with this?

Yves: There is a discussion on Second Screen, also talked to dom about getting feedback from WebRTC as well.

Dan: Maybe should bump to next week? Give Sangwhan and Travis time to look it over.

Comment by @cynthia Jan 10, 2019 (See Github)

Hi! Apologies for the delay. I have a pile of excuses, but it all boils down to - buried in mail backlog. Sorry, and thank you so much for your patience.

It's not easy to know with the current API shape, and I haven't thought much about ways the API could be changed for this use case. If you wanted to show 2 videos of the same resolution you could approximate by doing a query that doubles the framerate. For sw codecs this is decent. For hw codecs it will depend on how the hw resources are divided (e.g. by time slice vs by playback). AFAIK platform decoder APIs don't often surface parallel session limits up front, so this would involve some guessing/learning even for implementers (not a deal breaker).

This came from the possibility of an application such as this: https://viewsync.net

Are there any native APIs that surface this information and could be used on constrained devices? (e.g. mobile OSes come to mind) If so, a non-normative note recommending that implementers make use of the plumbing available there could improve the usefulness of this API.

The same limitation most likely exists in WebRTC, and if I understand the situation correctly applications rely on blunt UA sniffing to determine whether it is a desktop or a mobile/embedded environment and either just drop dead or only show a single video stream (or none), which is suboptimal. (I recall appear.in not being very nice to Safari, for example.)

We could prefix the names with "ideally" or "typically"... I tend to favor the shorter names though. I expect sophisticated users of the API to understand that nothing is ever guaranteed in media ;).

I actually do not have strong opinions on this; @travisleithead may have more to say on this.

Also, we have updated (diff, PR) the spec for decodingInfo() to allow callers to describe encrypted media. This doesn't really change the API shape, nor does it have implications for existing callers. It merely adds new optional fields to the input dictionaries and new algorithm steps to process those inputs. The motivations for this are described here in the explainer. Please let me know if mentioning those additions here is sufficient to trigger review.

This can be lumped together with the same review; I'll look at the changes and post another comment. This time not two months down the road!

An extra question that came to mind is whether capabilities for a remote playback device should be accessible in a similar manner (given that a native mechanism exists - I need to do some homework on this): https://w3c.github.io/remote-playback/

Comment by @chrisn Jan 10, 2019 (See Github)

I think there's a need to support remote playback capabilities. The Remote Playback API describes three modes of operation: media mirroring, media remoting, and media flinging.

In cases where the UA is requesting media content to pass through to a remote playback device (media mirroring and media remoting), you'd want to use the capabilities of the remote playback device to decide which media encoding to request from the server. So capability discovery is something we'd want to consider in the Open Screen Protocol design.

Discussion in the Second Screen WG so far has focused mainly on the media flinging case, where the remote playback device is typically handed the URL of the media and does its own negotiation with the server, so maybe there's less of a need to expose the remote device capabilities here? I'd need to think about this some more to be more certain, though.

cc @mfoltzgoogle

Comment by @cynthia Apr 10, 2019 (See Github)

@chcunningham did you folks have a chance to discuss this internally about the feedback above?

Comment by @chcunningham May 2, 2019 (See Github)

@cynthia apologies for the delay.

Re: native APIs for concurrent playbacks, I'll have to double check - some changes are coming in new Android, but I'm not sure if this particular query is possible. @jernoble for Safari/Mac (and thoughts on the use case).

Re: WebRTC - the API currently doesn't cover WebRTC decoding at all. It's debatable whether it should. There is some ongoing exploration of doing this for WebRTC encoding via encodingInfo()... The alternative approach is to augment WebRTC to better handle capabilities questions in its domain.

@mounirlamouri, thoughts on Remote Playback? What is capability discovery like in that API right now?

Comment by @mounirlamouri May 2, 2019 (See Github)

Regarding remoting, I think supporting something via the Open Screen Protocol and exposed through the Presentation API would be interesting, but I do not think we should extend the Remote Playback API, which is meant to be a very simple API to play things remotely.

Comment by @deanliao May 8, 2019 (See Github)

Hi TAG,

I'm implementing the MediaCapabilities encodingInfo() API, specifically the "transmission" type (Intent to Implement, design doc). I'm requesting privacy / security review as it adds fingerprinting surface, just like decodingInfo(). Beyond the above, do I need to provide any other information for launching the encodingInfo() method?
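
Purely as an illustrative sketch (extrapolated from the decodingInfo() shape earlier in this thread; the "transmission" type and the exact dictionary shape are still under discussion, and the values below are arbitrary):

```typescript
// Purely illustrative sketch of a "transmission"-type encodingInfo() query,
// extrapolated from the decodingInfo() shape above; the exact shape is
// still under discussion and the values here are arbitrary.
const encodingQuery = {
  type: 'transmission', // proposed MediaEncodingType value under review
  video: {
    contentType: 'video/webm; codecs="vp8"',
    width: 640,
    height: 480,
    bitrate: 600_000,
    framerate: 30,
  },
};

(navigator.mediaCapabilities as any)
  .encodingInfo(encodingQuery)
  .then((info: MediaCapabilitiesInfo) =>
    console.log(info.supported, info.smooth, info.powerEfficient),
  );
```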

Comment by @chcunningham May 8, 2019 (See Github)

Re encodingInfo(), I recommend holding off on TAG review of that for a bit longer. We are still discussing the API internally (particularly the WebRTC part), so let's wait for that dust to settle (we should know soon).

Comment by @markafoltz May 8, 2019 (See Github)

Re: https://github.com/w3ctag/design-reviews/issues/218#issuecomment-453275368

@chrisn Sorry I missed this earlier, it got filtered into a folder.

Capability queries for the Open Screen Protocol are being discussed in https://github.com/webscreens/openscreenprotocol/issues/123. Depending on where it lands, it should help user agents accurately determine which devices are capable of remote playback of a specific media element. But I didn't see a direct impact on the Media Capabilities API itself, since the idea is the user agent (not the Web application) figures out the best way to play the media remotely.

For the Presentation API, if the user agent rendering the presentation supports the Media Capabilities API, the presentation can use that to make choices about what media to play; but the API should work the same as on any other document.

Happy to discuss further at our upcoming F2F.

Comment by @hober May 23, 2019 (See Github)

Hi, @kenchris and I took a quick look at this during our F2F in Reykjavik. We note that the two pending bits of review that got added to this issue recently—decodingInfo() for encrypted media, and the "transmission" type of encodingInfo()—have fingerprinting implications, but the Security & Privacy Questionnaire assessment hasn't been updated in light of these new features. Could you update your security & privacy assessment accordingly?

Additionally, it's difficult for us to review the "transmission" type for encodingInfo() as your design document is not visible to non-Google employees. We'll hold off on reviewing this aspect until your internal dust settles and you make a design document public.

Comment by @mounirlamouri May 23, 2019 (See Github)

@hober and @kenchris I do not believe that we need to update the Security & Privacy Questionnaire as the potential fingerprinting concerns for encrypted media and "transmission" type are the same category as the fingerprinting concerns of the rest of the API.

@deanliao can you create a publicly visible version of your design document?

Comment by @chcunningham Nov 12, 2019 (See Github)

Hi Tag, Just doing a bit of cleanup.

Re: encodingInfo() - I've created new issues in MediaCapabilities and WebRTC to scrutinize that proposal a bit more and ponder whether WebRTC should instead have its own separate mechanism.

Does anyone have lingering issues to discuss here?

Comment by @chcunningham Nov 12, 2019 (See Github)

Re: manifests, just for clarity, these would often be the well-known MPEG-DASH manifests. Some sites roll their own formats (e.g. YouTube), but they probably prefer it that way.

Comment by @hadleybeeman Dec 5, 2019 (See Github)

We're looking at this in our TAG face-to-face in Cupertino.

Looks like we are still waiting for a design doc from @deanliao.

@chcunningham @mounirlamouri, I'm guessing you're tracking this too. Could one of the three of you help us with this?

Comment by @deanliao Dec 5, 2019 (See Github)

@hadleybeeman sorry, I transferred to another team and handed over the task to @chcunningham.

Comment by @cynthia Mar 4, 2020 (See Github)

@plinss and I revisited this in our Wellington F2F.

Apologies that we've had this open for so long. We don't think there are significant enough technical issues remaining to justify keeping this issue open, so we would like to close this. Thank you for bringing this to our attention!

If you have any significant design changes that would require a second round of review, let us know and we will re-open.