design-reviews#427: Modal window

Opened

Sep 27, 2019

Hello TAG!

I'm requesting a TAG review of:

Name: Modal Window
Explainer: https://github.com/adrianhopebailie/modal-window/blob/master/explainer.md
Primary contacts: Authors @adrianhopebailie & @marcoscaceres . Contributors @rsolomakhin, @danyao, Junkee Song, @samuelgoto

Further details:

I have read and filled out the Self-Review Questionnaire on Security and Privacy. The assessment is here.
I have reviewed the TAG's API Design Principles
The group where the work on this specification is: Web Payment

We'd prefer the TAG provide feedback as:

leave review feedback as a comment in this issue and @-notify @adrianhopebailie, @marcoscaceres, @rsolomakhin, @danyao, and @samuelgoto.

Discussions

Comment by @annevk Sep 30, 2019 (See Github)

The answer to

Does this specification enable new script execution/loading mechanisms?

is wrong. If you're creating a new kind of browsing context, you are in fact doing that.

Comment by @marcoscaceres Sep 30, 2019 (See Github)

Hi, just noting that I've not had a chance to review the proposal yet... still digging myself out from under a ton of TPAC email (so probably not accurate to have me as an author :)) ... just something we threw around during TPAC. Might be too early to be asking for TAG review.

Comment by @cynthia Sep 30, 2019 (See Github)

@annevk is right, it does look like the answer to that question should be fixed.

Comment by @rsolomakhin Sep 30, 2019 (See Github)

@cynthia @annevk : I've modified the answer:

Does this specification enable new script execution/loading mechanisms?

Yes, the specification enables opening a new top level context, similar to a popup.

Comment by @adrianhopebailie Sep 30, 2019 (See Github)

@marcoscaceres you did the initial hard yards to get this started so credited you as an author (I think it's mostly still your proposal plus some extra content on previous efforts and justifications).

Comment by @marcoscaceres Sep 30, 2019 (See Github)

I'm happy to help as author... but... just... need... time... best for us to discuss this somewhere else :) Let's pick this up over on @adrianhopebailie's repo. Give me a few days tho :(

Discussed Oct 16, 2019 (See Github)

Tess, Alice: We didn't get to look at this yet

Sangwhan: I actually did...

... This is another WebView API, analogous to Portals. We seem to have two competing things that are trying to show a webpage from a different origin, in a browsing context that is almost but not quite an iframe.

... Reviving one of these IE specials... use case seems to be for payments. Modal dialog which is a weird pop-up iframe, you can message back and forth, on top of your current document. Sort of not great.

... significant overlap with Portals. I don't understand why they're not being developed together. API just opens a web page in a "modal dialog".

... IE's modal dialog blocked execution of the main modal opener. This one doesn't do that but inherits the same strange design.

... Use case is valid, but ...

Hadley: Looking at the issue, it looks like they're not convinced this is mature enough for TAG review. They started brainstorming at TPAC and haven't finished.

Sangwhan: I agree that this type of capability is missing, but this doesn't seem like the right solution.

Lukasz: Do they foresee any impact on the visible user interface (i.e. they link to showModalDialog)?

Hadley: Are you suggesting we ask...?

Sangwhan: Could we suggest trying to align this effort with Portals?

Peter: Considering the feedback we gave to Kenji et al, regarding layers, it seems like there should be a common base class here if nothing else. They should definitely be talking.

Sangwhan: I'll write that down as a first response... other browser vendors aren't really fans of Portals, ...

Peter: Tess and Alice, want to do a breakout on this?

... Sangwhan, I've added you.

... bumping it two weeks.

Comment by @adrianhopebailie Oct 16, 2019 (See Github)

Proposed to WICG: https://discourse.wicg.io/t/proposal-modal-window/3982

Discussed Oct 29, 2019 (See Github)

Peter: Alice, Anything we should talk about here?

Lukasz: Security questionnaire looks very interesting. I think it's not finished yet. Look at the answer to parts 2, for example. Still .5 -- in progress. Not sure what long term implication of introducing this new context.

Alice: I see that Marcos had commented that he intended to have a look and needed a few days. Over to WICG discourse... I guess involved in that?

Alice: ? and Sangwhan also wanted to have a look... bump this for next week, and capture Lukasz's comments?

Alice: Comment if they want to finish filling out the security questionnaire?

Alice: ... I can do that if Lukasz isn't. Lukasz, did you want to comment about the security and privacy questionnaire, or shall I?

Lukasz: I can comment!!

Peter: Bump this to next week, when hopefully rest of the folks are here.

Comment by @lknik Oct 29, 2019 (See Github)

Hi,

We discussed it during telecon today. It seems based on the questionnaire that security/privacy assessment process is ongoing (thanks for marking it so evidently, by the way, this is appreciated).

We wonder if you plan on updating the document anytime soon (particularly points 2, 5)?

Comment by @rsolomakhin Oct 30, 2019 (See Github)

@marcoscaceres : You wrote "Maybe" and "Unsure" for points 2 and 5. Do you have any new thoughts on those two points?

Comment by @pes10k Nov 1, 2019 (See Github)

Some privacy concerns (re-raising from email conversations with authors):

The spec seems very susceptible to spoofing (I believe the authors were going to revise…). Being able to position the modal at the bottom of the window seems to make an "overlapping with the toolbar / URL bar" spoofing defense impossible. Even more so with the ability to go fullscreen, and with the context being able to resize the modal window.
The same issue raised against similar functionality in the Payment Handler API: The spec says that the modal window is a 1p context. This cuts against the privacy improvements being pushed by partitioning storage, Safari's ITP, Brave's storage and cookie, as it allows peer communication between 1ps, and would enable cross site tracking. The modal context should be 3p, and a nested context of the triggering page.
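The 1p versus 3p distinction in the second point can be illustrated with a toy model. This is only a rough sketch under stated assumptions: `ToyStorage`, `storageKey`, `visitProvider` and the two shop origins are hypothetical names, and real browsers key storage in more detailed ways. It shows why a modal treated as a 1p context would see one identifier across every opening site, while a modal treated as a nested 3p context would see a separate identifier per opener.

```typescript
// Toy model only: real browsers key storage in more detailed ways.
// All names here (ToyStorage, storageKey, visitProvider, the shop origins)
// are hypothetical and exist purely for illustration.

type ModalMode = "first-party" | "third-party-nested";

class ToyStorage {
  private data: { [key: string]: string } = {};
  private nextId = 1;

  // Returns the identifier stored under `key`, minting one on first use.
  getOrCreateId(key: string): string {
    if (this.data[key] === undefined) {
      this.data[key] = "id-" + this.nextId;
      this.nextId += 1;
    }
    return this.data[key];
  }
}

// Computes the storage key the provider's modal would read and write.
function storageKey(opener: string, provider: string, mode: ModalMode): string {
  if (mode === "first-party") {
    // 1p: keyed by the provider's origin alone, regardless of who opened it.
    return provider;
  }
  // 3p nested: keyed by the opening (top-level) site plus the provider origin.
  return opener + " -> " + provider;
}

// Returns the user identifier the provider sees when opened from `opener`.
function visitProvider(
  storage: ToyStorage,
  opener: string,
  provider: string,
  mode: ModalMode
): string {
  return storage.getOrCreateId(storageKey(opener, provider, mode));
}

const provider = "https://bobpay.xyz";

// Modal treated as a 1p context: both shops lead to the same stored identifier.
const firstParty = new ToyStorage();
const fpA = visitProvider(firstParty, "https://shop-a.example", provider, "first-party");
const fpB = visitProvider(firstParty, "https://shop-b.example", provider, "first-party");

// Modal treated as a nested 3p context: each shop gets its own partition.
const partitioned = new ToyStorage();
const tpA = visitProvider(partitioned, "https://shop-a.example", provider, "third-party-nested");
const tpB = visitProvider(partitioned, "https://shop-b.example", provider, "third-party-nested");

console.log(fpA === fpB); // true: the provider can link visits across shops
console.log(tpA === tpB); // false: identifiers differ per opening site
```

The same partitioning that blocks cross-site linking also hides any existing login state from the provider, which is the trade-off discussed in the rest of this thread.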

Hope this helps

Comment by @adrianhopebailie Nov 5, 2019 (See Github)

@snyderp with regard to your first point, the explainer does say:

The browsing context (not the opener) controls the dimensions.

I think we can sharpen this up to be more explicit that neither the opener nor the child can control the size and position, only the browser. Would that alleviate your concerns, at least regarding your first point?

Comment by @pes10k Nov 5, 2019 (See Github)

@adrianhopebailie re the first issue, I'm looking at item 3. Just want to make sure I'm reading the right text, since that seems to explicitly say the opposite (and not be a matter of ambiguous text). Is this the correct text to be reading?

Comment by @marcoscaceres Nov 6, 2019 (See Github)

@adrianhopebailie re the first issue, I'm looking at item 3. Just want to make sure I'm reading the right text, since that seems to explicitly say the opposite (and not be a matter of ambiguous text). Is this the correct text to be reading?

Yes, you are reading the right text. We should reword that.

About 1p vs 3p, we should probably fix that too.

Comment by @danyao Nov 7, 2019 (See Github)

About 1p vs 3p, we should probably fix that too.

@marcoscaceres By "fixing", do you mean Modal Window should be treated as a 3P context embedded from the opening page?

I don't think that makes sense because a key value of introducing something like Modal Window is to provide a 1P context that still looks somewhat in context with the original page. The challenge is to find the tricky balance between making it "in-context" enough to not incur the full user friction of a redirect or popup, but distinct from the original page enough that bad actors cannot use it to silently track users. I believe the user activation requirement and adding a URL bar (possibly read-only) to the modal window set a correct balance.

I believe Modal Window will not be very useful if it's made to behave like a 3P iframe. @asolove-stripe voiced a similar opinion in a different thread and can speak about the use cases from Stripe.

Comment by @asolove-stripe Nov 8, 2019 (See Github)

I am very much not an expert on browsers and standards, but I do have a couple of userland thoughts on why treating this as more similar to a top-level page is useful:

For payments, we want to be able to use webauthentication from parties like a bank or payment system that is not the owner of the main page, but does have the marked origin in the modal. We believe that, with the right restrictions on the secure modal, this would be a net-win for both privacy and security over current methods, that rely on tracking in iframes and problematic SMS-based authentication.
Some payment systems currently block iframing using x-frame-options and require full redirects or popups because they involve entering passwords and would therefore be phishable. Because the secure modal is clearly separate from the underlying page, and clearly marks the origin and security status of the connection, we believe these payment systems should be able to continue blocking iframing while allowing themselves to appear within this secure modal.
Having access to 1p cookies is also important for use-cases involving authentication. Many federated authentication systems today depend on iframes and invisible redirects in ways that are indistinguishable from malicious trackers. But if the user is consciously choosing to interact with them, in order to pursue a clear purpose that is also clearly tied to an origin, it would be great for secure modal to support that. This would allow browsers to hasten the death of iframe and redirect attacks without breaking things like embedded payment forms and embedded federated authentication workflows that seem to be legitimate and are willing to be subject to having a visible origin and clear user consent.

Obviously, I understand concerns about this API being used for problematic cases that trigger security and privacy concerns. But I would love to see if you all can creatively support many of the use-cases above, so that third-party systems in payments and authentication can get rid of the much-worse status quo systems and replace them with secure modal.

Comment by @alice Nov 12, 2019 (See Github)

@hober and I discussed this. My thoughts:

The explainer is pretty vague on use cases: "could be useful for other use cases (e.g. authentication, sharing, access to third-party services)".
- Authentication seems like an important use case, but "access to third-party services" seems too general to be meaningful, and we weren't sure how the sharing use case would intersect with the Web Share API.
Given the lack of a clearly defined context for this API to be used, unlike the payment request scenario, it seems like it would be difficult to define a clear user gesture which should trigger the modal window to show. How would we avoid a scenario where users are being disrupted by modal windows showing the moment they interact with the page in any meaningful way?

Comment by @marcoscaceres Nov 12, 2019 (See Github)

Authentication seems like an important use case, but "access to third-party services" seems too general to be meaningful, and we weren't sure how the sharing use case would intersect with the Web Share API.

See below... but yes, they would need to be associated with some kind of action or intent (e.g., "authenticate").

How would we avoid a scenario where users are being disrupted by modal windows showing the moment they interact with the page in any meaningful way?

It could be similar to share (where a user picks a share target), in that options could be shown before the actual modal dialog is presented.

[Image: picking share targets]

Except, in this case, it would be related to the particular action (e.g., "authenticate").

Discussed Nov 19, 2019 (See Github)

Tess: We talked about this a couple of weeks ago and wanted to do a breakout

Alice: I am sure that we actually did a breakout! It is less of a proposal for a specific API and more a feature that various APIs can use which is not spelled out by the explainer.

Tess: I will try to dig up what my comments were going to be and actually post them.

Dan: Some new comments from Marcos around user gesture.

Alice: this made me believe this is not a standalone API.

Dan: F2F

Tess: Sure

Comment by @dbaron Nov 19, 2019 (See Github)

It might also be helpful to clarify what material is intended to be implemented directly based on this work, and what is intended to be used by other specifications and implemented based on that. And for the latter material, it may also be useful to provide advice to the other specifications that would use this.

Comment by @pes10k Nov 19, 2019 (See Github)

Just as reference, privacy aspects of the near-identical functionality in the Payment Handler API are being discussed here. https://github.com/w3c/payment-handler/issues/351

Main point of discussion is 1p vs 3p storage context…

Comment by @hober Nov 19, 2019 (See Github)

I'm concerned with your (apparent, implicit) design decision to populate native UI (the modal sheet) with arbitrary web content. My concern is similar to those expressed by Marcos, Nick, and Pete. Let me try restating it.

Native UI for existing payment methods (e.g. the Apple Pay sheet in Safari) is difficult to spoof with web content, partly because the native sheet overlaps native browser UI and the web content. Adding a feature to the web which allows sites to populate something that looks like such a sheet with arbitrary web content makes it far easier for web sites to spoof such UI.

Given this concern, I don't think your answer to question 11 of the security and privacy questionnaire is sufficient:

11. Does this specification allow an origin some measure of control over a user agent’s native UI?

Yes, to some degree. It allows a website to create a new modal window context which makes the calling context inaccessible until the modal is closed. To mitigate abuse of this capability it is recommended that only a single modal can be open per parent context.

Nor is your rationale for preferring your current proposal to pop-ups:

Pop-ups are generally locked down and difficult to invoke reliably due to the measures introduced by browsers to counter their abuse[…]

Given their modal nature, we can’t yet think of a good way to abuse modal windows. The assumption being that only a single modal window will be allowed at a time.

In both of these cases, the "spam the user with modals" attack is considered, but the "spoof native UI for phishing purposes" attack is not. In particular, some of "the measures introduced by browsers to counter their abuse" are to counter the use of pop-ups for phishing. Don't you anticipate browsers having to take similar measures for modal windows? If not, why not?

Comment by @adrianhopebailie Nov 22, 2019 (See Github)

@hober IMO concerns about phishing native UI are unfounded. It seems trivial for browsers to implement modal windows in such a way that they cannot be mistaken for native UI by doing simple things like showing an address bar at the top of the window.

To use @marcoscaceres's example, a native UI would replace that bar at the top with something that can only be rendered by the platform.

What is stopping a website from using a pop-up today that renders as close as possible to an Apple Pay sheet? This is arguably even more confusing since the window.open API provides the caller even more control over the look and feel than we would want to give a modal window caller.

I'm not sure how the answer to Q11 can be changed to provide any information that isn't already there? The proposal is to give the calling site the ability to create a new context but not have any control over how it is rendered. The influence the caller has is the modality of the new context, so by calling the API they effectively make their own context non-interactive.

With PH API the calling context has indirect control over the URL that is rendered inside that context (i.e. it specifies which payment methods are supported and the user selects a payment handler to invoke that supports that payment method) but has no control over the size or position of the window or what is rendered around it to flag to the user that it's a Payment Handler context.

Don't you anticipate browsers having to take similar measures for modal windows? If not, why not?

Not at the API level, no. The way browsers render modal windows should handle this as I describe above.

I'd also note that "native UI" is rare (the Apple Pay sheet is unique to Safari) so it would be useful to enumerate what native UI you consider at risk of being spoofed.

Comment by @adrianhopebailie Nov 22, 2019 (See Github)

@dbaron my original thinking was that this would simply be a new platform API that would ultimately replace window.open. However, it seems that limiting invocation of this feature to specific contexts (payment, auth, etc.) may be a good way to alleviate some of the privacy and security concerns.

Therefore, perhaps rather than making the API available to Window it could be limited to same-origin workers only in specific contexts.

This would require that APIs used for other use-cases adopt the same pattern as the payment APIs, whereby a calling context calls a use-case-specific API (e.g. credentials API) and, if the service behind the API is from another origin (as opposed to the UA itself), the worker for that service is invoked. If the worker requires some UI it is able to invoke a modal window.

i.e. As a reminder of how PR API and PH API interact:

The calling context (merchant.example) invokes PR API and specifies the properties of the payment service it wishes to invoke (supported payment methods etc)
The UA, possibly through user interaction, possibly based on user defined defaults, determines that it must invoke a Payment Handler from bobpay.xyz
The UA invokes the worker that is installed from bobpay.xyz by emitting an event that the worker is listening for.
The worker is able to call the modal window API but only from inside the event handler
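The four steps above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the Payment Request or Payment Handler API: `ToyUserAgent`, `ToyPaymentEvent`, `InstalledHandler` and `openModalWindow` are hypothetical names, the user-selection step is reduced to "first matching handler wins", and the same-origin rule is approximated with a string prefix check.

```typescript
// Hypothetical model of the flow above. None of these names come from a real
// specification or browser API.

interface ToyPaymentEvent {
  methodName: string;
  // Assumed restriction: only callable while the event handler is running.
  openModalWindow(url: string): void;
}

interface InstalledHandler {
  origin: string; // origin the worker was installed from
  method: string; // payment method the worker supports
  onPaymentRequest: (event: ToyPaymentEvent) => void;
}

class ToyUserAgent {
  private installed: InstalledHandler[] = [];
  readonly openedModals: string[] = [];

  install(handler: InstalledHandler): void {
    this.installed.push(handler);
  }

  // Steps 1 to 3: the caller names supported methods, the UA picks a handler
  // (simplified here to the first match) and emits an event to its worker.
  requestPayment(supportedMethods: string[]): boolean {
    for (const handler of this.installed) {
      if (supportedMethods.indexOf(handler.method) === -1) continue;
      let dispatching = true;
      const event: ToyPaymentEvent = {
        methodName: handler.method,
        openModalWindow: (url: string): void => {
          // Step 4: the modal may only be requested inside the event handler.
          if (!dispatching) {
            throw new Error("modal window requested outside the event handler");
          }
          // Simplified same-origin check against the worker's origin.
          if (url.indexOf(handler.origin + "/") !== 0) {
            throw new Error("modal URL must be same-origin with the worker");
          }
          this.openedModals.push(url);
        },
      };
      try {
        handler.onPaymentRequest(event);
      } finally {
        dispatching = false;
      }
      return true;
    }
    return false;
  }
}

// Usage: merchant.example asks for a method that bobpay.xyz supports.
const ua = new ToyUserAgent();
const captured: ToyPaymentEvent[] = [];
ua.install({
  origin: "https://bobpay.xyz",
  method: "https://bobpay.xyz/pay",
  onPaymentRequest: (event) => {
    captured.push(event); // kept only to show that later calls are rejected
    event.openModalWindow("https://bobpay.xyz/checkout");
  },
});
const handled = ua.requestPayment(["https://bobpay.xyz/pay"]);
console.log(handled, ua.openedModals.length);
```

In this model the calling context never receives a handle to the modal at all; only the invoked worker can request one, and only for the duration of the event.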

@rsolomakhin @marcoscaceres are we on the same page?

Comment by @dbaron Nov 22, 2019 (See Github)

@hober IMO concerns about phishing native UI are unfounded. It seems trivial for browsers to implement modal windows in such a way that they cannot be mistaken for native UI by doing simple things like showing an address bar at the top of the window.

[...]

What is stopping a website from using a pop-up today that renders as close as possible to an Apple Pay sheet? This is arguably even more confusing since the window.open API provides the caller even more control over the look and feel than we would want to give a modal window caller.

I'm skeptical of users' ability to make the sometimes-subtle distinctions needed here as to what is "connected" to the browser UI and what is entirely within the page. (I'd be interested to see studies as to how effective it is and whether users can detect good spoofs.) Thus I'm somewhat skeptical of the idea of teaching users that things with this sort of sub-urlbar are safe... although I could also certainly imagine UIs that I would suspect are more likely to be effective at this than the screenshots I've seen (for either modal window or Apple Pay).

Given that I'm not confident of the effectiveness, I'm a little hesitant to do things that center on teaching users that these things are safe to interact with. (Then again, users seem happy to fill out things like https://twitter.com/davidbaron/status/898389439101018113 that are completely unsafe.)

Comment by @adrianhopebailie Nov 25, 2019 (See Github)

I'm skeptical of users' ability to make the sometimes-subtle distinctions needed here as to what is "connected" to the browser UI and what is entirely within the page.

I agree, but that is not a new issue, and this is an opportunity to make the differences less subtle. We already have pop-ups and they are already used for use cases like SSO. As you point out in the link, worse solutions are also already widely used for payments.

This use case (login to your bank to authorize a payment/share data) is only going to grow in prevalence. It's the de-facto flow for Open Banking-based payments which are rolling out across the EU and growing in popularity elsewhere: https://www.bis.org/bcbs/publ/d486.pdf

Also, Secure Remote Commerce (SRC), the card industry's initiative to improve online payments is going to move to a profile/wallet based model requiring users to authenticate prior to making a payment.

All of these use cases are moving from user-supplied credentials on the RP site to cross-origin systems where the user is directed to an AS origin to authenticate and authorize a payment/login/other.

E.g. User will not provide a card number and CVV to the merchant; the merchant will direct the user to their digital wallet, where they authorize the payment and are redirected back.

E.g. User will not provide a username and password to the website; the website will direct the user to their IdP, where they will authorize the sharing of their email address etc. and be redirected back.

These are now the de-facto auth flows for the web but the platform doesn't have the appropriate primitives to allow websites to do this securely for their users.

I see this as an opportunity to REPLACE what we have today (pop-ups) with something that is designed explicitly to solve for these cross-origin use cases AND in doing so consider challenges like phishing and window spam that were likely not considered appropriately when adding features like window.open.

Comment by @dbaron Dec 4, 2019 (See Github)

So I think there are two things that could make me more comfortable here (and I'm aware the second one is a substantial amount of work):

If this were a concept that specific specs (like Payment Handler) could use as a shared mechanism, but not a general purpose API that any website could use. (@alice will write more about this shortly.)
Demonstrating that it's possible to build a UI that is simultaneously (a) a polished and shippable quality UI and (b) where a reasonable portion of users are able to make the correct security distinctions based on it (compared to the best-possible fakes of it). (I think a "reasonable portion" probably at least means a similar portion to the Web's existing security indicators... although it would probably be desirable to do better! It probably also means including users using various accessibility mechanisms.)

Comment by @alice Dec 4, 2019 (See Github)

[...] a concept that specific specs (like Payment Handler) could use as a shared mechanism, but not a general purpose API that any website could use.

For example, it could be a concept like "browser tab", which refers to something created by the UA, but which cannot be created directly by the web page.

It might be worth fleshing out some pre-requisites for an API to be able to use this mechanism, e.g. that the API must be able to create a list of trusted providers.

Comment by @danyao Dec 11, 2019 (See Github)

Thanks @dbaron and @alice for the feedback!

If this were a concept that specific specs (like Payment Handler) could use as a shared mechanism, but not a general purpose API that any website could use. (@alice will write more about this shortly.)

This seems reasonable. @agektmr has made a similar suggestion elsewhere and I think it may be an effective tactic for us to validate the benefit of this proposal while buying some time to better understand the security / privacy / UX threats so they may be addressed. I think a critical next step to check whether Modal Window API has legs is to find some concrete use cases, whether standalone or as part of an API, where this proposal is clearly better than alternative solutions. It seems that a likely area may be cross-origin service coordination, but more investigation is needed to clearly articulate the user problems that are being solved.

Demonstrating that it's possible to build a UI that is simultaneously (a) a polished and shippable quality UI and (b) where a reasonable portion of users are able to make the correct security distinctions based on it (compared to the best-possible fakes of it). (I think a "reasonable portion" probably at least means a similar portion to the Web's existing security indicators... although it would probably be desirable to do better! It probably also means including users using various accessibility mechanisms.)

These are very fair questions. They are similar to what we want to answer for Payment Handler API as well. Chrome is planning to conduct a UXR in the near future to answer these questions, but in the context of the current payment handler flow. We’ll share any transferable learnings on this thread.

[...] a concept that specific specs (like Payment Handler) could use as a shared mechanism, but not a general purpose API that any website could use.

For example, it could be a concept like "browser tab", which refers to something created by the UA, but which cannot be created directly by the web page.

It might be worth fleshing out some pre-requisites for an API to be able to use this mechanism, e.g. that the API must be able to create a list of trusted providers.

These make sense. Out of curiosity, @alice, did you have any APIs in mind that could benefit from such a "browser tab" concept, if we flesh out the prerequisites?

Comment by @dlongley Dec 16, 2019 (See Github)

The Credentials Community Group is keeping an eye on this thread for potential use of a modal window API/feature for the Credential Handler API, a work item of that group.

A video of a demo of that API is here, polyfill code here, and a simple demo that includes a polyfilled UI (like the Payment Handler UI) that we would prefer to become a more phishing-safe modal window is here.

Comment by @hober Dec 16, 2019 (See Github)

@danyao wrote:

Chrome is planning to conduct a UXR in the near future to answer these questions, but in the context of the current payment handler flow. We’ll share any transferable learnings on this thread.

Looking forward to it!

Comment by @alice Dec 16, 2019 (See Github)

@danyao

I think a critical next step to check whether Modal Window API has legs is to find some concrete use cases, whether standalone or as part of an API, where this proposal is clearly better than alternative solutions.

That sounds like a very promising avenue, and would certainly help provide context for reasoning about the security, privacy, and other UX implications.

It seems that a likely area may be cross-origin service coordination, but more investigation is needed to clearly articulate the user problems that are being solved.

Once again, completely agree!

... did you have any APIs in mind that could benefit from such a "browser tab" concept, if we flesh out the prerequisites?

Not specifically, but I think your point about cross-origin service coordination is extremely salient, and it would be worth looking for APIs covering the use case examples given in the explainer - e.g. sharing and authentication.

Comment by @alice Dec 16, 2019 (See Github)

It seems like you have plenty of work to go on with, and we probably won't have a lot of feedback to offer until that's done, so we're going to close this for now.

Please comment here or file a new review linking back to this one when you're ready for another look.

Thank you for so thoughtfully considering the feedback we've had so far, and we look forward to more collaboration in future!