#608: Media Session: video conferencing actions

Opened Feb 12, 2021

Ya ya yawm TAG!

I'm requesting a TAG review of Media Session: video conferencing actions.

Make Media Session useful for video conferencing. This addition to the Media Session specification lets websites handle microphone, camera, and hang-up controls through new actions, and report microphone and camera state.
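
The TypeScript sketch below shows roughly how a conferencing page might wire up these actions and states. It assumes the action names "togglemicrophone", "togglecamera" and "hangup" and the setters setMicrophoneActive() / setCameraActive() from later drafts of the Media Session spec. The naming discussed in this thread (muted / turnedon) differs. ConferencingSession, CallState and wireConferencingActions are hypothetical names. In a supporting browser, navigator.mediaSession could be passed in, possibly via a cast if the installed DOM typings lack these members.

```typescript
// Hypothetical structural type: only the subset of MediaSession this sketch uses.
// Action and setter names are assumed from later Media Session spec drafts.
type ConferencingAction = "togglemicrophone" | "togglecamera" | "hangup";

interface ConferencingSession {
  setActionHandler(action: ConferencingAction, handler: (() => void) | null): void;
  setMicrophoneActive(active: boolean): void;
  setCameraActive(active: boolean): void;
}

// Hypothetical page-side state for one call.
class CallState {
  microphoneActive = true;
  cameraActive = true;
  ended = false;
}

function wireConferencingActions(session: ConferencingSession, call: CallState): void {
  // Report the initial state so the UA can render its controls correctly.
  session.setMicrophoneActive(call.microphoneActive);
  session.setCameraActive(call.cameraActive);

  session.setActionHandler("togglemicrophone", () => {
    // A real page would also flip its audio MediaStreamTrack's `enabled` flag here.
    call.microphoneActive = !call.microphoneActive;
    session.setMicrophoneActive(call.microphoneActive);
  });

  session.setActionHandler("togglecamera", () => {
    // A real page would also flip its video MediaStreamTrack's `enabled` flag here.
    call.cameraActive = !call.cameraActive;
    session.setCameraActive(call.cameraActive);
  });

  session.setActionHandler("hangup", () => {
    call.ended = true;
    // Once the call is over, unregister everything so the UA hides the controls.
    session.setActionHandler("togglemicrophone", null);
    session.setActionHandler("togglecamera", null);
    session.setActionHandler("hangup", null);
  });
}
```

The handlers only change page-side state and report it back, so the UA's controls stay in sync with what the page is actually doing.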

  • Explainer (minimally containing user needs and example code): https://github.com/w3c/mediasession/issues/264
  • Security and Privacy self-review: nothing new here; the one for the main spec applies.
  • GitHub repo (if you prefer feedback filed there): https://github.com/w3c/mediasession/
  • Primary contacts (and their relationship to the specification):
    • Tommy Steimel (@steimelchrome), Google, editor for the change
    • Mounir Lamouri (@mounirlamouri), Google, co-chair Media WG
    • Jer Noble (@jernoble), Apple, co-chair Media WG
  • Organization/project driving the design: Google
  • External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): TBD

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): Media WG
  • The group where standardization of this work is intended to be done ("unknown" if not known): Media WG
  • Existing major pieces of multi-stakeholder review or discussion of this design: N/A, but the GitHub issue is meant for that
  • Major unresolved issues with or opposition to this design: N/A
  • This work is being funded by: N/A

You should also know that...

This is a small addition to the Media Session specification, which has already launched in Chrome and (AFAIK) Firefox. Safari expressed interest and has a WIP implementation. Apple mentioned during the Media WG meeting in January that they want to add a hangup action to the Media Session specification.

We'd prefer the TAG provide feedback as:

💬 leave review feedback as a comment in this issue and @-notify @steimelchrome and @mounirlamouri

Discussions

Comment by @kenchris Feb 16, 2021

Considering that people are asking about the use cases (https://github.com/w3c/mediasession/issues/264#issuecomment-779286920), it probably makes sense to write a minimal explainer covering them.

Comment by @mounirlamouri Feb 16, 2021

> Considering that people are asking about the use-cases (w3c/mediasession#264 (comment)) it probably makes sense to write a minimal explainer with those

We are going to update the issue to make it clearer. For a change this small, keeping everything scoped to that issue is better than scattering documents around. We will also update the original explainer to account for the new actions.

However, the use cases are fairly obvious: when you are on a call, you could mute/unmute yourself, stop/start your camera, and hang up from somewhere other than the website hosting the video call. Accessing these actions from a notification, a media centre, or a Picture-in-Picture window are the usages we are considering. These are very similar to the media playback use cases: controlling your media from outside the scope of the page.

Comment by @mounirlamouri Feb 16, 2021

The issue has been updated to include the use cases.

Discussed Feb 22, 2021

Ken: toggling....

Dan: they do need to further explore the use cases in the explainer.

Sangwhan: design-wise I think it's fine.

Dan: good to see multiple implementers.

Sangwhan: writing feedback. The method for the mic is "muted" and the one for the camera is "turned on", which are opposites: one checks whether the device is off, the other whether it's on. That's weird.

Dan: considering these are actions that can activate/deactivate the camera/mic, has enough thought gone into the privacy design?

Ken: this is common in apps people are used to using.

Sangwhan: the media session thing is a mechanism for you to react to things like media keys. If you have a remote or multimedia keyboard, you press a hardware button, it's sent to the application, and the application knows what happened instead of having to listen to key events. It's something that comes from the system, an intent. In this case it's a mute button on your hardware mic. I don't have good ideas of how this could be exposed in a security and privacy manner.

Ken: the user is doing an action somewhere and expects something to happen.

Sangwhan: who knows, there might be a loophole.

Ken: you can already turn on the camera on the web. You get a notification the first time, and it has to be visible. I think this is covered by what you can already do.

Sangwhan: feels more like a convenience mechanism. If you have actual conferencing equipment, it has these buttons you can use.

Comment by @cynthia Feb 23, 2021

Design-wise, I think this looks fine. Two things are a bit confusing:

  1. muted / turnedon naming: it would be nice to have a generic name indicating whether the underlying device is enabled/active.
  2. muted is negative while turnedon is positive; shouldn't both be positive, or both negative?

Comment by @torgo Feb 23, 2021

Are there any additional privacy issues that need to be discussed in the explainer, considering that we are talking about muting / unmuting a user's microphone and activating / deactivating the user's camera? Even if not, it might be good to document why not.

Comment by @mounirlamouri Feb 25, 2021

> Are there any additional privacy issues that need to be discussed in the explainer, considering that we are talking about muting / unmuting a user's microphone and activating / deactivating the user's camera? Even if not, it might be good to document why not.

There are no privacy implications here. The Media Session API allows websites to register actions. If example.com registers a given action, the UA notifies it when that action should be triggered. The UA is only the intermediary, and no third party is involved. If you are not familiar with Media Session, I understand that this can be unclear.
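
The mediation described above can be modeled with a toy dispatcher, assuming a single page owns the session. MediaActionRouter is a hypothetical name. It is not part of any browser API or of the spec's processing model. In this model the page registers handlers, and the UA forwards a user's button press only to a handler that same page registered. No third party is involved, and an action the page never registered does nothing.

```typescript
// Hypothetical model of the UA's role as intermediary (not a real API).
type RouterActionHandler = () => void;

class MediaActionRouter {
  // Only handlers registered by the page itself are stored.
  private readonly handlers = new Map<string, RouterActionHandler>();

  // Called by the page (analogous to setActionHandler); null unregisters.
  register(action: string, handler: RouterActionHandler | null): void {
    if (handler === null) {
      this.handlers.delete(action);
    } else {
      this.handlers.set(action, handler);
    }
  }

  // Whether the UA should surface a control (e.g. a mute button) for this action.
  supports(action: string): boolean {
    return this.handlers.has(action);
  }

  // Called when the user presses a control outside the page.
  // Returns false, and does nothing, if the page never registered the action.
  dispatch(action: string): boolean {
    const handler = this.handlers.get(action);
    if (handler === undefined) {
      return false;
    }
    handler();
    return true;
  }
}
```

In this model a handler is ordinary page code. For example, turning a camera back on would still go through whatever camera access the page already holds.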

> Design-wise, I think this looks fine. Two things are a bit confusing:
>
>   1. muted / turnedon naming: it would be nice to have a generic name indicating whether the underlying device is enabled/active.
>   2. muted is negative while turnedon is positive; shouldn't both be positive, or both negative?

The naming isn't great, I agree. @kenchris suggested a name in the issue that I'm fine with. I'll let @steimelchrome make the final call, as he is the one actually driving this.
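
The polarity concern can be illustrated with a small sketch. The field names microphoneMuted and cameraTurnedOn are hypothetical stand-ins for the muted / turnedon naming under review. The uniform alternative mirrors the setMicrophoneActive() / setCameraActive() naming found in later Media Session drafts. With mixed polarity, any code asking "is this device capturing?" must remember to invert exactly one of the two flags.

```typescript
// Mixed polarity, as reviewed (hypothetical field names):
interface MixedPolarityState {
  microphoneMuted: boolean; // true means the mic is NOT capturing
  cameraTurnedOn: boolean;  // true means the camera IS capturing
}

// Uniform polarity: both flags answer "is this device active?".
interface UniformState {
  microphoneActive: boolean;
  cameraActive: boolean;
}

// Converting requires inverting only the microphone flag,
// which is exactly the asymmetry the review flagged.
function toUniform(s: MixedPolarityState): UniformState {
  return {
    microphoneActive: !s.microphoneMuted,
    cameraActive: s.cameraTurnedOn,
  };
}

// With uniform naming, "is anything capturing?" needs no special cases.
function anyDeviceActive(s: UniformState): boolean {
  return s.microphoneActive || s.cameraActive;
}
```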

Discussed Mar 8, 2021

Dan: if there are no privacy concerns, it might be good to document why not...

Ken: they're considering the naming feedback. What is the cross-browser story?

Dan: the spec should contain what he wrote about security and privacy... is that added? Yes.

Ken: will leave closing comment

Comment by @kenchris Mar 9, 2021

Thanks for taking our feedback into consideration. We are fine with this feature.

Comment by @steimelchrome Mar 9, 2021

Thanks for reviewing! For the record, we made a gist version of the explainer in response to your earlier comment.