Design Review #798

#798: Early design review: Document Picture-in-Picture

Opened Dec 9, 2022

Wotcher TAG!

I'm requesting a TAG review of Document Picture-in-Picture.

There currently exists a Web API for putting an HTMLVideoElement into a Picture-in-Picture (PiP) window (HTMLVideoElement.requestPictureInPicture()). This limits a website's ability to provide a custom picture-in-picture experience. We want to expand upon that functionality by giving websites the ability to open a picture-in-picture (i.e., always-on-top) window with a blank document that can be populated with arbitrary HTMLElements instead of only a single HTMLVideoElement.
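For illustration, here is a minimal sketch of the author-side flow described above. It assumes the shape Chrome eventually shipped: a global `documentPictureInPicture` object whose `requestWindow()` takes optional `width`/`height` and resolves to the new window. The `DocumentPipApi`, `PipWindowLike` and `popOut` names are hypothetical. The API object is passed in so the sketch runs outside a browser.

```typescript
// Hypothetical structural stand-ins for the browser objects, so the sketch
// runs outside a browser. In a page, `api` would be `documentPictureInPicture`.
interface PipRequestOptions {
  width?: number;
  height?: number;
}

interface PipWindowLike {
  document: { body: { append(...nodes: unknown[]): void } };
}

interface DocumentPipApi {
  requestWindow(options?: PipRequestOptions): Promise<PipWindowLike>;
}

// Opens an always-on-top window with a blank document and moves an arbitrary
// element (not only a <video>) into its body. The default size is arbitrary.
async function popOut(
  api: DocumentPipApi,
  element: unknown,
  options: PipRequestOptions = { width: 320, height: 240 },
): Promise<PipWindowLike> {
  const pipWindow = await api.requestWindow(options);
  pipWindow.document.body.append(element);
  return pipWindow;
}
```

In a page, this would typically be called from a click handler, e.g. `popOut(documentPictureInPicture, notesPanel)`, where `notesPanel` is a hypothetical element.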

Explainer: https://github.com/WICG/document-picture-in-picture
Security and Privacy self-review: https://github.com/WICG/document-picture-in-picture/blob/main/security-privacy-questionnaire.md
GitHub repo (if you prefer feedback filed there): https://github.com/WICG/document-picture-in-picture
Primary contacts (and their relationship to the specification):
- Tommy Steimel (steimelchrome), Google
- Frank Liberato (liberato-at-chromium), Google
Organization/project driving the design: Google Chrome
External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5755179560337408

Further details:

I have reviewed the TAG's Web Platform Design Principles
The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
The group where standardization of this work is intended to be done ("unknown" if not known): unknown
Major unresolved issues with or opposition to this design:
- See the GitHub issues list for known problems. One notable issue is that we're still working out how best to design and specify how CSS copying works for this feature.
This work is being funded by: Google
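While the CSS copying question above remains open, one author-side workaround is to copy the opener's stylesheets into the PiP document. Each sheet's rules are inlined when they are readable. When they are not, the sheet is re-linked by URL (reading `cssRules` on a cross-origin sheet throws). The sketch below only plans that copy, using hypothetical `SheetLike`/`StyleCopy` stand-ins for `CSSStyleSheet` so it can run outside a browser. It is not the mechanism the spec will define.

```typescript
// Minimal stand-ins for CSSRule / CSSStyleSheet. In browsers, reading
// `cssRules` on a cross-origin stylesheet throws a SecurityError.
interface RuleLike {
  cssText: string;
}

interface SheetLike {
  readonly href: string | null;
  readonly cssRules: ArrayLike<RuleLike>;
}

type StyleCopy =
  | { kind: "inline"; css: string } // would become a <style> element
  | { kind: "link"; href: string }; // would become a <link rel="stylesheet">

// Decides, sheet by sheet, how to reproduce the opener's styles in the PiP document.
function planStyleCopy(sheets: ArrayLike<SheetLike>): StyleCopy[] {
  const plan: StyleCopy[] = [];
  for (let i = 0; i < sheets.length; i++) {
    const sheet = sheets[i];
    try {
      const rules = sheet.cssRules;
      const texts: string[] = [];
      for (let j = 0; j < rules.length; j++) texts.push(rules[j].cssText);
      plan.push({ kind: "inline", css: texts.join("\n") });
    } catch {
      // Rules are unreadable (e.g. cross-origin): re-link the sheet by URL instead.
      if (sheet.href !== null) plan.push({ kind: "link", href: sheet.href });
    }
  }
  return plan;
}
```

In a browser, each planned entry would be materialized as an element appended to the PiP window's `document.head`.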

We'd prefer the TAG provide feedback as:

🐛 open issues in our GitHub repo for each point of feedback

Discussions

2023-03-06

Minutes

Amy: CSS question - one action is to rope in a CSS person.

Dan: Ask Lea to take a look?

Amy: the use cases are written as "the web site wants to"... no, the web site doesn't want anything...

Dan: they need to rephrase as user needs.

Max: describe the use case from the user's perspective. In the goals they only mention the web site... "always on top window" but didn't give details. The about paragraph lists many cases, but the goals section only lists one...

Yves: what is the relationship with popover? It's sharing some visual elements of a window that pops on top of something else. There are some a11y issues with popups. Does the same thing apply here?

Dan: reviewing the S&P responses.

Yves: video has subtitles... what happens in that case if you have an a11y tool that displays subtitles in a separate window?

Dan: I don't understand the spoofing section: https://wicg.github.io/document-picture-in-picture/#spoofing

Amy: you can have a little bubble in the PiP that shows the URL (the origin)...

Yves: it's always possible to fake what is not in the browser chrome. It's difficult to ensure that what is displayed is correct info.

Sangwhan: in a past review I brought up the fact that picture-in-picture was very limited. The original proposal was a media proposal, which fits into the Media Working Group. This is a general thing, so it's best to go in WebApps; the predecessor was in the Media Working Group.

Amy: this is useful context. The review request makes it all seem like very early work. I didn't know this had been worked on, on and off, for five years. That changes the perspective on multi-stakeholder interest.

Previous reviews:

Also related:

https://github.com/w3ctag/design-reviews/issues/767

Sangwhan: PiP is not a window - it's a quasi-window... You'd want content authors to be able to customize it, and I don't see that being possible.

conversation on phishing

Amy: general feature is useful... it's explicit the lack of customization is on purpose...

Sangwhan: aspect ratio... does that follow an existing pattern?

<blockquote> @steimelchrome have you updated the explainer to clarify these issues? (And by the way, thanks @slightlyoff!)

From our review in today's TAG breakout, this looks like a generally useful feature.

A few other questions:

- What is the planned route for standardization for this? Right now it just lists WICG.
- We noted that while the explainer is well written, it doesn't start with user needs as we've been encouraging. Can you add some material documenting the use cases from a user's perspective?
- Is there any relationship with Popover #743, considering these both have to do with layering of content?
- We discussed a potential issue around accessibility... for example, if there are subtitles in the PiP "window", ensuring those can be picked up by assistive technology appropriately. What other accessibility considerations have you discussed?
- We're slightly concerned with the proposed mitigation to the spoofing issue, although it's good that this consideration is called out. Can you strengthen this wording, perhaps with an example?
- We'd like to encourage you to use normative language in the security and privacy considerations sections as you develop them further.
- Has there been any feedback from other browsers? Have you opened issues in the Mozilla or WebKit standards positions repositories?
- How would this feature work with multiple screens? Would it be up to implementations to decide which screen the PiP window shows up in? It seems like it would be useful to factor in multiple screens, given that proposals like this have come forward.
- Is the aspect ratio (width/height vs height/width) following a common pattern? If so, we might want to document this as a design principle.

Also just noting: we're going to bring more CSS expertise to bear on this review so expect some further questions.

Thanks!

</blockquote>

Dan: leaves comment

2023-03-27

Minutes

Dan: will ping Chris H and put on the agenda for the 10th.

2023-04-tokyo

Minutes

Yves: How is it different from popover...

Yves: if you have an image editor and you have side toolboxes... which would be a better model? Popover or picture-in-picture?

Dan: my sense would be popover... but it kind of depends...

Yves: depends on what capabilities are for exchanging data from one window to another...

2023-06-12

Minutes

Dan: we've had external feedback, responding to my and Yves' message in April. Privacy/security concerns. They were claiming it doesn't make the situation worse, but we want to leave the web better. They replied that it makes the web better. They've modified the spec about spoofing prevention based on Yves' feedback. The Mozilla standards position is not resolved yet. I don't think it's clear how they think this can be... their point is that any UI element can be misused by advertisers.

Amy: no further comments

Yves: we discussed in Tokyo. Agreed it wasn't worse, but good to say there are issues in the spec. Looks like that is what they did.

Dan: change to spec addresses phishing topic. What about the API?

Rossen: given that we can't dictate/require UI, I agree the normative language is good. But I worry there are UAs where the UI is minimal or customisable, and it becomes a vector for exploits, especially things designed to harvest user data without the user's consent. Not that there's something different for us to do, but I'm still concerned.

Dan: what if they added normative language around those requirements for those kinds of UAs?

Rossen: e.g. a webview, or a PWA (full screen), could be abused. I'm not sure that this language is sufficient. Having said that, I don't have anything stronger to offer without getting into the territory of specifying UI expectations, which we don't do.

Peter: "if the user clicks the window's close button" but doesn't mandate that the window has a close button. Would be nice if there was a standard consistent way to dismiss something.

Lea: this is basically very similar to window.open except the window's always on top....

Peter: ...with a lot more constraints. E.g. you can't position it.

Lea: they address this question under "Detailed design discussion"... but wasn't there discussion of creating a better window.open? Couldn't this plug into that?

Lea: they've discussed many use cases - maybe it shouldn't be named this way, since it's not just about pictures... Does this namespace have anything else, or is it only this one method? Would another namespace be more suitable? If the only reason this isn't part of window.open ... I'd love this to become part of the efforts to improve window.open... That would be better; it would fix both problems in one fell swoop. I wish the use cases for integrating into window.open were explored more.

-Rossen

Lea: why don't we fix window.open so the options it supports are feature detectable?

Peter: it does strike me we have other things that are difficult to feature detect...

Peter: why is this an API and not an element? They talked about "requiring the overhead of a separate doc" - not sure that's necessarily true... wondering if that's not a better approach. Tie it into frame behaviour?

Dan: good feedback, a bit late. Multistakeholder is an issue.

+Tess

Peter: it's not too late if there is no multistakeholder support

Dan: Picture in picture has multistakeholder support as part of WebRTC

<blockquote> Hi - first of all we know we're late getting back to you on this. Thank you for bearing with us. We appreciate the addition of normative language around spoofing. We remain concerned about the lack of multi-stakeholder support - particularly the lack of support from other browsers - unlike picture in picture itself which enjoys strong support across engines. We're also concerned that this feature could be used to enable surprising and disruptive advertising experiences. We also remain concerned about the browser chrome around this picture-in-picture window. E.g. the documentation presumes there will be a close button but this is highly dependent on platform.

In terms of API design, we did see a lot of commonalities between this functionality and an "always on top" option in window.open(). Integrating it into window.open() would also fix a lot of issues around it (guaranteed prominent window chrome, guaranteed close button, existing ways to interact with said windows, naming, namespacing, etc.), and it also reduces the new API surface that authors need to learn. We do see that this was considered as an alternative, but rejected due to lack of feature detectability for window.open() options and some functional differences (never outliving the opener). There are discussions around creating a new method that fixes the various issues of window.open(); it may be a good idea to collaborate with the folks working on this effort. The functional differences between this and window.open() may be useful more broadly too; never outliving the opener certainly would be!

</blockquote>

2023-07-03

Minutes

Peter: no response to our latest comment

Tess: I'll draft a comment raising this issue re: device independence.

Tess: on iOS the picture-in-picture API has media controls and you have to feed it media content... so you wouldn't be able to interact with controls, etc...

Rossen: is the pip supposed to be interactive?

Tess: I think that's the point. One use case is that you want your own media controls that you've implemented in JS, and you think the UX of switching between your media controls and the system's controls when the video goes into PiP is jarring, so you want a document fragment for the video and controls to go into PiP. Or you want to control media loading and selection so that mid-stream you can do ad insertion. Or there's some region of the page that you want to be PiP'd that isn't video at all, like a note-taking screen, so you can have that persist on the screen and type notes into it quickly. At the end of the day, I don't see how you could do this API on iOS.

Rossen: most of the lower-level platforms support PiP in a media-streaming-only way. When I hear picture in picture, my immediate understanding is that it's a picture, not an app in an app, which is what you're describing here: where you have input, and everything else that comes along, with a different engine.

Tess: when we asked about how you'd do this on iOS, the first reply we got from one of the proposers was "I'm not sure how to address this ... weren't focussed on android/ios feasibility". Shouldn't we be designing web stuff that works on all major platforms?

Dan: considering majority of web usage happens on mobile devices, we need to focus on that

Rossen: their first goal is to allow a website to display an arbitrary HTML element that is always on top of the window. It has nothing to do with picture in picture.

Tess: either they want the fullscreen API... but they don't, because they want it to be outside of the window; or they want window.open.

Lea: I'm confused about... I know on OSX there is a flag to keep a window on top at all times; I've used this. What are the actual differences between the PiP API in OSX and having a window that is always on top, besides presentational differences? Is it media controls? Performance improvements? What would make a window that is always on top a bad candidate for PiP?

Rossen: at least on other platforms, the UX of PiP is already well established: the control of the window and its behaviour. It's always a small portion of the screen. If you expand it, it takes over everything. There's an expected windowing behaviour of the PiP experience that is very hardcoded, and I'm assuming that on most operating systems it's not something you can control at all. You can't say make it half of the screen; you always get the tiny window. Will they also want to control the size of the window and its position? Basically, everything you get with window.open. So why not window.open?

Lea: but then it would not be PIP. I thought that was the whole argument.

Tess: maybe you add an always on top flag to window.open

Lea: that's what we proposed

Tess: I think that's fine

Tess to leave comment

2023-07-mos-eisley

Minutes

Dan: I'll ping google folks about this to see if we can get movement.

2024-01-08

Minutes

Dan: heard some noise about this from Chrome on social media

Lea: I remember Tess had some comments but don't see them. Commonalities between this and window.open. Tess said the actual OS level picture in picture doesn't allow a lot of stuff from that. Should this hook into the native OS level picture in picture, or be a way to have windows that are always on top? I'm not clear what would be the difference. Would benefit from Tess's perspective.

Dan: I seem to remember that Tess had input on how this would work in WebKit - it wouldn't.

Lea: not my recollection... there are two components. Having a window always on top is a concept on OSX, as on every operating system. My understanding is that there's also a picture in picture functionality that is separate from this concept, implemented as an orthogonal feature. Not sure how it differs - is it UI, or does it have other advantages, like being optimised for showing video? The use cases here go beyond video.

Dan: specifically not video

Lea: then I'm not sure how the native PiP relates. It seems like what they really want is a window that is always on top.

Matthew: from briefly looking - an example they give is a highly custom video player, but that's not the focus. Speaking of focus... the word "focus" only comes up once in the explainer, and it's about mental focus, not browser focus. I don't see any mention at all of how they envision focus management - how do you get between the popup and where you were? I've not used this in real life, but I'd be very interested in what their expectations are around that. Moving between the window and the document is something everyone is going to need to do.

Dan: we could ask what the current state is, and see if Tess comes to plenary for further discussion

Lea: I don't see anything in the reply that answers our questions about use cases... it links to an entire thread on blink-dev. Their first case is PiP with video, which already exists. It seems that everything in there is something that is a window that can be on top, which has the benefit of ensuring a certain amount of browser UI around it, which prevents spoofing. I found the comment referred to from Dominic... they don't think our advice is good... I'm not sure what this means.

Matthew: I think what he's saying is that you're only able to open one of these popups for a given page or tab. But it's unclear.

Lea: this could also be decomposed into a window.open concept

Matthew: There is something about window.open in the explainer: https://github.com/WICG/document-picture-in-picture?tab=readme-ov-file#since-this-is-pretty-close-to-windowopen-why-not-just-add-an-alwaysontop-flag-to-windowopen which I wonder if they added since we looked

Dan: where have they mentioned spoofing? Can't find it in the explainer

Amy: they said they added normative language in the spec

Lea: [reads it] sounds just like window.open. It's generally an antipattern to introduce multiple features that do similar things in slightly different ways. None of the reasons seem to argue against extending window.open; rather, they point to even more low-level primitives that window.open needs. Seems like overfitting. Am I missing something?

Dan: what is 'enough ui'?

Matthew: I imagine there's a tacit assumption that this window will only be shown as a direct result of user action, but I can't find a mention of that in the explainer - is it too obvious, or should we get it confirmed? Also, does the window.open section answer any of our questions? I think it doesn't.

Dan: we should add that question about user action.. whether user activation is required is important
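On the user-activation question: as we understand Chrome's shipped implementation, `requestWindow()` rejects unless the page has transient user activation (for example, when it is called from a click handler). A minimal author-side guard might look like the following. It assumes the activation state comes from the User Activation API (`navigator.userActivation.isActive`). `ActivationState` and `openIfActivated` are hypothetical names.

```typescript
// Hypothetical stand-in for navigator.userActivation (User Activation API).
interface ActivationState {
  readonly isActive: boolean;
}

// Only attempts to open the PiP window when the page has transient user
// activation; otherwise resolves to null so the caller can keep an in-page UI.
async function openIfActivated<W>(
  activation: ActivationState,
  request: () => Promise<W>,
): Promise<W | null> {
  if (!activation.isActive) return null;
  return request();
}
```

In a page this might be used as `openIfActivated(navigator.userActivation, () => documentPictureInPicture.requestWindow())`.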

<blockquote> Hi @steimelchrome thank you for bearing with us. We're picking this back up today and trying to get you some useful feedback.

We're concerned about spoofing. There's some language in the spec about this now - great - but there is no discussion in the explainer about possible abuse cases and mitigations against those abuse cases - that would be very valuable. Also the language in the spec just says UAs need to provide "enough UI" which is a bit vague. Is there any non-normative language that could be added here to elaborate on the kind of UI that should be provided? Also, could you confirm that the PiP window will only be opened as a direct result of user action?

We're concerned about accessibility - specifically the explainer doesn't mention how focus management is expected to work - how will users move between the PiP window and the main document?

We are still concerned that from an author perspective, this introduces a feature that is very related to window.open() but solves subtly different problems. We saw the part in the explainer about this, but it may be useful to decompose the problem into the parts where window.open() behavior conflicts with what is desired here, and examine whether these primitives may be useful for window.open() as well. Based on the differences mentioned in the explainer, that does seem to be the case:

- a window that does not outlive its opener is definitely a useful concept for window.open(), in fact if we were to design window.open() today I'd argue it should be the default!
- Feature testing window.open() features is also a more general problem.

Decomposing this into lower-level functionality that can be integrated into window.open() would also address the spoofing concerns. An important question we need clarity on is: is there something that makes this functionality fundamentally incompatible with window.open(), or is it about managing design & implementation effort?

Can you please let us know the current status and any response to these issues?

</blockquote>
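To make the "does not outlive its opener" primitive concrete: with today's window.open(), an author has to approximate it by hand, roughly as below. `ClosableWindow` and `OpenerScopedWindows` are hypothetical. In a browser, `W` would be the `Window` returned by window.open(), which may be `null`, for example when a popup is blocked. `closeAll()` would run from the opener's `pagehide` handler.

```typescript
// Hypothetical minimal view of a Window returned by window.open().
interface ClosableWindow {
  readonly closed: boolean;
  close(): void;
}

// Tracks windows this page opened and closes them when the opener goes away,
// approximating Document PiP's "never outlives the opener" behaviour.
class OpenerScopedWindows<W extends ClosableWindow> {
  private children: W[] = [];

  // `openFn` would wrap window.open(...); it may return null (popup blocked).
  open(openFn: () => W | null): W | null {
    const child = openFn();
    if (child !== null) this.children.push(child);
    return child;
  }

  // Meant to run from the opener's "pagehide" handler. Returns how many
  // windows were still open and got closed.
  closeAll(): number {
    let closedCount = 0;
    for (const child of this.children) {
      if (!child.closed) {
        child.close();
        closedCount++;
      }
    }
    this.children = [];
    return closedCount;
  }
}
```

This only works if `pagehide` fires and every popup is opened through the helper, so it is best-effort. That is one argument for making the behaviour a platform primitive rather than leaving it to authors.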

2024-01-15

Minutes

Lea: Dominic got back to me and I asked a question...

Dan: he did clarify

Lea: yes. There could still be a way to do this as window.open... or it may still make sense to base it on that, with additional functionality... and a third design option, orthogonal to window.open, which is what they're doing... I want to make sure they've all been considered and whether they're all feasible. Do they mean the top-level window, not including any iframes, when they say "top-level traversable"? That's my interpretation.

Dan: leave it a couple more days?

Lea: should I add another comment to explain why I'm asking?

Dan: yeah

Matthew: I think it's right for us to ask about the current issue of the overall design first; we did also ask about spoofing and accessibility so we need to make sure we get answers on those eventually.

2024-02-26

Minutes

Discussed concerns about these being more generic capabilities that perhaps should be part of window.open (or a replacement API).

2024-03-18

Minutes

Matthew: They've provided us with several updates, but doesn't look like we have any pointers to options 2 and 3 from Lea's comment having been considered.

Peter: window.open() could be modified to restrict the number of PiP windows. Should we ping them to answer that question?

<blockquote> Hi @steimelchrome, thank you for your recent updates. We are still unclear as to whether options 2 and 3 from Lea's comment have been considered - could you point us to the outcome of any discussions on those?

</blockquote>

Matthew: left the comment

2024-04-22

Minutes

<blockquote> Hi @steimelchrome can you feed back on any updates to this proposal? Matthew [asked a question](https://github.com/w3ctag/design-reviews/issues/798#issuecomment-2004432739) above regarding Lea's feedback that looks like it's still pending. Thanks! </blockquote>

Dan: posts comment

2024-05-06

Minutes

Dan: 2 weeks ago there was an update.

Max: Tess also had a comment about cross-platform implementation...

Max: in 2022 some experts from Apple replied that it's not feasible to implement in iOS...

Dan: I'm suggesting that we should close this as unsatisfied... since we don't seem to be getting anywhere and they are not engaging with our feedback?

2024-05-13

Minutes

Peter: Response to Lea's feedback, though unclear as to the answers to Lea's specific questions (on why this isn't just part of window.open()). They raised the point that they want gestures to work the same way as if the PiP window was an <iframe> - though that doesn't fully explain it to me.

Lea: All of the differences that it has from window.open() would actually be useful there - like closing the windows that an opener has spawned. I'd argue for that being the default - we can't do that, but we could make it possible.

Peter: Control over how things close would be good.

Lea: There was discussion on implementability. The WebKit position indicates that PiP is video-specific on iOS. Are there use cases that need more than video? If you need feature detection, you can't currently use window.open() (though that could be added there).

Peter: Is there a use case for making window.open() async?

Lea: One reason for not using window.open() was the requirement for this to be async. There is a requestWindow() method that returns a promise. Why does that need to be async? And why can't we just improve window.open()?

Peter: Because they want to have a permission on it - that's also useful for window.open() and was my thinking too.

Matthew: +1 to improving window.open()

Lea: All of the extensions seem appropriate for window.open()

Peter: I get that window.open() can't be used for this, but a better window.open() (would need to have a new name if async) would be a good idea.

Lea: agree. There were proposals for a better window.open() that was async (so, different name) in WHATWG - I linked to them in my first comment on the issue.

<blockquote> Hi folks,

We discussed this again during a breakout today.

Overall, we see why the current window.open() doesn’t work for what this API is trying to do, however it appears that all of these differences are things that would be useful for window.open() as well:

- An async API to allow gating behind a permissions prompt
- Feature detection for individual parameters
- Allowing up to one window per top-level traversable
- Ability to create "always on top" windows
- ...

We understand that improving window.open() is a substantial undertaking, however from an architectural point of view, we cannot justify creating a parallel, more narrowly scoped API for the sole reason of avoiding that work. Instead, we encourage people to work on the existing effort to modernize window.open() and ensure it covers these use cases as well.

The video-specific use cases appear to be covered already by video.requestPictureInPicture(), so designing this as a more general API seems appropriate. It is unfortunate that not every existing platform can implement this API, but it is clear that there are use cases that go beyond video, so we think that as long as feature detection is possible and has good ergonomics, this may be worth doing.

</blockquote>
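Here is a rough sketch of the feature-detection ergonomics the comment asks for, on a platform mix like the one discussed (some platforms offer only video PiP, per the WebKit position on iOS). The sketch detects Document PiP with a property check. It falls back to video PiP when the content is a single video, and otherwise to a plain window.open() popup. `PlatformProbe` and `choosePopOutStrategy` are hypothetical. `document.pictureInPictureEnabled` is the existing flag for video PiP.

```typescript
type PopOutStrategy = "document-pip" | "video-pip" | "window-open";

interface PlatformProbe {
  // The global object to inspect, e.g. `window` in a browser.
  global: object;
  // Whether video-only PiP is available, e.g. `document.pictureInPictureEnabled`.
  videoPipEnabled: boolean;
}

// Picks the richest pop-out mechanism available. Document PiP is detected with
// a plain property check; video PiP only helps when the content is a single
// <video>; otherwise fall back to an ordinary window.open() popup.
function choosePopOutStrategy(probe: PlatformProbe, contentIsVideoOnly: boolean): PopOutStrategy {
  if ("documentPictureInPicture" in probe.global) return "document-pip";
  if (contentIsVideoOnly && probe.videoPipEnabled) return "video-pip";
  return "window-open";
}
```

Unlike the first two options, the window.open() fallback gives no always-on-top guarantee, which is the gap the thread keeps returning to.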

2024-06-17

Minutes

Proposed close

We review Lea's last comment

Dan: satisfied with concerns, or unsatisfied?

Peter: I agree with Lea's points. I'd want to have Lea's input before closing.

Lea: there's been no response. Unsatisfied

Tess: also unsatisfied. Mostly agree with Lea's comment.

Lea: we've discussed a lot. I don't think this design is architecturally sound, just addresses immediate problem rather than what is best for web platform as a whole.

Tess: writes closing comment