design-reviews#64: Cross-spec device identification & association

Opened Jul 28, 2015

We're developing several specs that interact with physical devices. Several of these specs provide IDs for their devices to let web developers associate properties with the devices over time. MediaDeviceInfo takes the extra step of providing separate IDs for the "device", meaning a particular microphone, camera, speaker, etc., and for the "group", meaning a collection of devices on the same physical hardware. But media devices aren't the only thing that can come in a group, so we should be thinking about how to let web developers associate device groups between different specifications.

A particular use case for this is associating a headset's audio channels with the headset's Bluetooth settings channel.

@jan-ivar was kind enough to write down in w3c/mediacapture-main#202 some of the properties of the existing IDs:

MediaDeviceInfo.groupId is not persisted to disk, but stable within a browser-session and not origin-unique.
MediaDeviceInfo.deviceId is origin-unique, and persistent only when permission is granted: w3c/mediacapture-main#218
MIDIPort.id is persistent but may not be origin-unique.
BluetoothDevice.id is origin-unique, not available at all until permission is granted, and persistent until permission is revoked.

Unless someone argues otherwise, I'm going to treat any lack of origin-uniqueness as a bug.

As an initial base for discussion, I'd suggest that we add a groupId property to at least BluetoothDevice, and specify that it's shared with MediaDeviceInfo.
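
As a rough illustration of what a shared group ID would enable, the sketch below matches a Bluetooth device to media devices by `groupId`. The `groupId` field on the Bluetooth side is the hypothetical addition proposed here (it does not exist on `BluetoothDevice` today), and the `MediaDeviceLike` and `BluetoothDeviceLike` types and all sample values are made up for the example.

```typescript
// Minimal structural types; only the fields this sketch needs.
interface MediaDeviceLike {
  kind: "audioinput" | "audiooutput" | "videoinput";
  deviceId: string;
  groupId: string;
}

// HYPOTHETICAL: BluetoothDevice has no groupId today; this models the proposal.
interface BluetoothDeviceLike {
  id: string;
  name?: string;
  groupId: string;
}

// Return the media devices that share a group with the given Bluetooth device.
function mediaDevicesFor(
  bt: BluetoothDeviceLike,
  media: readonly MediaDeviceLike[],
): MediaDeviceLike[] {
  return media.filter((m) => m.groupId === bt.groupId);
}

// Made-up sample data: a headset with a microphone and speaker, plus an unrelated camera.
const headset: BluetoothDeviceLike = { id: "bt-1", name: "Headset", groupId: "g-42" };
const mediaDevices: MediaDeviceLike[] = [
  { kind: "audioinput", deviceId: "mic-a", groupId: "g-42" },
  { kind: "audiooutput", deviceId: "spk-a", groupId: "g-42" },
  { kind: "videoinput", deviceId: "cam-b", groupId: "g-7" },
];
const matches = mediaDevicesFor(headset, mediaDevices);
```

The comparison itself is trivial; the open questions are how the UA decides that two devices belong to the same group and how long the shared value lives.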

I think we don't need the groupId to persist across browser sessions as long as the individual devices in a group do have persistent IDs, but it might be easier for developers if it is persistent. Do we make platform evolution harder in any way if the group ID is persistent?

Where should we put the algorithm for generating group IDs? How do we word that? I don't think we can specify how a UA figures out that two devices are in the same group: it'll depend a lot on the connection technologies.

Discussions

Comment by @jyasskin Aug 13, 2015 (See Github)

Other specs that should participate in these group IDs:

WebNFC (@zolkis @kenchris)
WebUSB (@reillyeon)

Comment by @reillyeon Aug 13, 2015 (See Github)

USB has recently introduced the concept of a "container ID" that is designed to provide a unique identifier for a physical device across the various components within it and external connections it has. For example, a USB printer that also has WiFi can be recognized as the same device no matter how it is connected.

Comment by @jan-ivar Aug 14, 2015 (See Github)

Since I'm mentioned here, I should point out that I'm not sure tying ids from different APIs together is necessarily a good idea. What I meant to illustrate with the detail I gave in the other issue was the varying requirements around ids in specific APIs, as well as the privacy sensitive nature around them with regard to unintended use or misuse, as well as to their lifetimes.

With all those needs in play, I worry that adding lateral dependencies across specs, perhaps with additional uniformity needs, may create limitations that aren't inherent in any one API alone.

Others probably have their own ways of judging how things are best organized, but I personally lean towards chasing real specific use-cases, like the headset one, and I tend to run in the opposite direction of general symmetry for its own sake, as I find there's usually a cost associated with any discipline.

Comment by @torgo Sep 15, 2015 (See Github)

@travisleithead at Boston f2f: When you ask for an id you get the same id for the same device when you are in a session. In the WebRTC case they want to be able to store a device identifier across sessions. It can be a fingerprinting issue if the same device id is leaked across different origins. Each spec has taken a slightly different approach - could there be a consistent approach taken? Can ids be recycled?

Comment by @torgo Sep 15, 2015 (See Github)

Is there already some best practice thinking that we could ... er... reference...

Comment by @travisleithead Sep 15, 2015 (See Github)

Some thoughts from the minutes:

Ask for an ID and get the same device, within the bounds of a session.
Store a device ID so that the application doesn't have to ask for a device to be selected time and time again.
If the hardware device ID is leaked across origins, that's not good.
Can IDs be recycled among different types of hardware?
What should be the available entropy in the pool?
IDs should be reset, even for the same origin, when cookies are reset.

Comment by @dbaron Jan 13, 2016 (See Github)

Discussed this a bit at the Melbourne face-to-face (afternoon of Wednesday January 13). Seems like there's a desire to have a document describing these principles, and Travis will work on drafting that.

Comment by @travisleithead Jan 13, 2016 (See Github)

(specifically adding said principles to the existing design-principles doc)

Comment by @torgo Mar 31, 2016 (See Github)

Taken up at London f2f.

Comment by @torgo May 4, 2016 (See Github)

Taken up at the TAG teleconference on 4 May. Travis to put finishing touches on a proposal and socialize it with WebUSB and WebBluetooth, as well as with @tobie, for peer review.

Comment by @travisleithead May 4, 2016 (See Github)

Updated! https://w3ctag.github.io/design-principles/

Comment by @jyasskin May 4, 2016 (See Github)

Cool, but note that https://w3ctag.github.io/design-principles/#device-apis doesn't address the main question from this issue: how should a site be able to learn that a BluetoothDevice and a MediaDeviceInfo represent the same physical device?

Comment by @tobie May 5, 2016 (See Github)

As far as the sensor work goes, I'm completely punting on exposing identifiers until I hear of potential use cases for them, or until we actually start seeing (1) multiple sensors of the same type in the same device or (2) a use of generic sensor APIs beyond same-device sensors.

Note there might be genuine use cases for exposing actual hardware identifiers in order to identify precisely the chip that's being used. This will absolutely be the case for the WebGPIO work from the Browsers and Robotics CG. But there also might be legitimate use for this with sensors derived from the Generic Sensor API. In this case, it would be useful to get more precise guidance from the TAG (or in the permission spec itself) as to how permissioning should be handled. For example, should requiring access to the sensor HW identifier require a different permission level than simply accessing sensor data (e.g. through dedicated permission descriptor types)?

Worth also noting that because operating systems already are an abstraction-level above sensors—as are sensor hubs in more high-end devices—there is no clear, one to one mapping, between a physical sensor and its software representation within the user agent. This might cause a careless implementation to unintentionally reveal HW implementation details, for example, by using the same random identifier for different sensor types provided by the same sensor hub, or tying a high-level sensor to a specific low-level sensor's ID used in its implementation.

Comment by @zolkis May 5, 2016 (See Github)

I like the approach taken in the OIC Core specification:

use an opaque device ID (there a UUID) to identify the device (meaning an OIC stack instance there, but a physical device here), and
use a device-relative path to identify classes and instances of resources (sensors, actuators).

We could play with the scope and lifetime of device IDs: per publishing, per session/origin, etc. API implementations may use the device id's but don't necessarily have to expose them; in most cases it will be enough to expose the resource path (e.g. /myapp/temperature_sensor_1x_C).
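
A minimal sketch of that split, assuming a hypothetical `DeviceResources` class (the class, its method names, and the sample UUID are illustrative and not taken from the OIC specification): the opaque device ID stays internal to the implementation, and callers only ever see device-relative resource paths.

```typescript
// Sketch only: names here are illustrative, not from the OIC Core specification.
type Reader = () => number;

class DeviceResources {
  // Opaque device id (for example a UUID); used internally, never returned to callers.
  private readonly deviceId: string;
  private readonly readers = new Map<string, Reader>();

  constructor(deviceId: string) {
    this.deviceId = deviceId;
  }

  // Internal storage key: device id plus device-relative path.
  private key(path: string): string {
    return `${this.deviceId}:${path}`;
  }

  register(path: string, read: Reader): void {
    this.readers.set(this.key(path), read);
  }

  // Only device-relative paths are exposed.
  paths(): string[] {
    const prefix = `${this.deviceId}:`;
    return Array.from(this.readers.keys()).map((k) => k.slice(prefix.length));
  }

  read(path: string): number | undefined {
    const reader = this.readers.get(this.key(path));
    return reader === undefined ? undefined : reader();
  }
}

const resources = new DeviceResources("3f2c9d0e-0000-4000-8000-000000000001");
resources.register("/myapp/temperature_sensor_1x_C", () => 21.5);
const exposedPaths = resources.paths();
const reading = resources.read("/myapp/temperature_sensor_1x_C");
```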

> Updated! https://w3ctag.github.io/design-principles/

> Cool, but note that https://w3ctag.github.io/design-principles/#device-apis doesn't address the main question from this issue: how should a site be able to learn that a BluetoothDevice and a MediaDeviceInfo represent the same physical device?

I guess it depends on the context. The questions are: what is the threat in knowing that, whether the threat is site-specific, what policy is used against it, whether user permission is enough, etc.

If these are hard to define or control, the information is best not exposed.

But if the possibility of identifying the device is there by other means, and the threat is minor, why not expose if there is a use case? BTW, what is the use case? :)

Comment by @travisleithead Aug 17, 2016 (See Github)

Need to note the latest thoughts on using an object to convert device ids for persistent use--as discussed in Stockholm.

Comment by @torgo Apr 28, 2017 (See Github)

@travisleithead to do some prep-work ahead of time for a discussion 5/23. Also related to WebUSB #108 and web bluetooth.

Comment by @travisleithead May 22, 2017 (See Github)

On availability of device/group ids:

Leave it up to the feature -- don't try to normalize, as privacy implications may be different across various types of devices (some are more sensitive to discover than others)
It would explicitly not be a goal to create a one-size-fits-all, one-stop-shop for getting all devices and their ids, etc.

On correlation:

For device/group ids that can be obtained (see previous point), correlation among them within a single web app should be the default (e.g., not require any special API to correlate). The fingerprinting aspect is already possible, and the entropy of device uniqueness and characteristics is sufficiently high that implicit correlation is already trivial, and thus attempting to obfuscate the correlation is only making the web platform harder to use for legitimate correlation scenarios.

On persistence:

As a general rule of thumb, implicit persistence of any device ids without permission seems risky. It is a convenient feature for return-visits to a particular origin because explicitly-saved device ids can be retrieved bypassing the discovery phase, but in nearly every other scenario it leaves breadcrumbs around for cross-origin correlation. Trying to keep the persistence origin-unique is one approach, but may not be suitable for all devices. My preference would be to issue general guidelines that no device/group ids are persisted, and that a new API to "persist" a device/group id is designed so that this is always a specific permission-based action that users could understand (e.g., "web site X wants to remember your YYY Bluetooth device").

On device/group hierarchy:

I don't know that I understand the use cases that drove the separation of device ids from group ids from container ids. In the container id scenario mentioned earlier, I would expect (in my correlated approach thinking) that the device id generated for the USB printer would be the same id generated by some WiFi connection API. For example, the UA would keep an internal table mapping each device to its generated id, and each time that device was requested from any API, the lookup would return that id. The table would be flushed at navigation time to avoid persistence.
It seems reasonable for devices to be grouped. These groups may need to be distinguished from one another, and hence a group id is necessary. Do these group ids need correlation (when they are groups of devices)? Maybe. Is an 'id' the right way of grouping? Maybe not.
When a single device provides many services, do these "sub-devices" get exposed as regular device ids? Do they all have the same device id in this case? Or is a new field needed to separate them? I would be personally inclined to go with a simpler approach and avoid proliferating different hierarchical id representations throughout the platform if possible. Where the ship has sailed we should try not to make it worse. E.g., if five sensor devices are exposed to the web platform, it should be OK to give them all the same device id (if they are all essentially the same sensor-fusion device), or if a photo camera device and video camera are the same device, but exposed through different interfaces--let them have the same id. There are always other ways of telling the devices apart, rather than through their id alone, and these will be contextually apparent.
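
The internal lookup table described in the first point above could look roughly like the following. This is a sketch under stated assumptions: `SessionDeviceIdTable`, `idFor`, `flush`, and the hardware key string are hypothetical names, and the id generator is injected so the example stays deterministic (a real UA would use a cryptographically strong random source).

```typescript
// Sketch only: SessionDeviceIdTable and its method names are hypothetical.
class SessionDeviceIdTable {
  private readonly ids = new Map<string, string>();
  private readonly newId: () => string;

  // newId is injected; a real UA would draw from a strong random source.
  constructor(newId: () => string) {
    this.newId = newId;
  }

  // Any API (USB, WiFi, ...) asking about the same hardware gets the same id.
  idFor(hardwareKey: string): string {
    let id = this.ids.get(hardwareKey);
    if (id === undefined) {
      id = this.newId();
      this.ids.set(hardwareKey, id);
    }
    return id;
  }

  // Called at navigation time so that ids do not persist.
  flush(): void {
    this.ids.clear();
  }
}

let tableCounter = 0;
const table = new SessionDeviceIdTable(() => `id-${++tableCounter}`);
const viaUsb = table.idFor("printer-container-1234");
const viaWifi = table.idFor("printer-container-1234");
table.flush();
const afterNavigation = table.idFor("printer-container-1234");
```

Here `viaUsb` and `viaWifi` are the same value, while `afterNavigation` differs because the table was flushed in between.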

API for device id persistence:

Just food for thought. Something relatively trivial, async-based (to allow for permission prompt), and which allows for subsequent checking on a previously-generated persistent id (since these may be cleared at the UA's discretion).

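partial interface Navigator {
   // Asks permission to persist the given (non-persistent) device id; if granted, resolves with a new persistent id.
   Promise<DOMString> getPersistentDeviceId(DOMString deviceId);
   // Resolves with true if the UA still knows this persistent id, false otherwise.
   Promise<boolean> hasPersistentDeviceId(DOMString persistentDeviceId);
};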

With the 'get' API, a non-persisted device id (all of them by default, ideally) is provided, and the UA asks for permission to persist it. If granted, a new persistent id is generated by the user agent, stored in association with the origin, and returned.

When a persisted id is passed to the 'has' method, it answers true or false depending on whether the UA still knows about it. It's async to allow the UA to implement mitigations against frequent requests, such as exponentially increasing delays before fulfilling the promise.

Presumably, the 'get' API would return the same persisted id when called with the same device id multiple times.
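
To make the intended semantics concrete, here is a small in-memory model of the two methods. It is only a sketch: the `PersistentDeviceIdModel` class, the explicit `origin` argument, the injected prompt and id generator, and the choice to return `undefined` when the user declines are all assumptions (the proposal does not say what happens on denial). The model is synchronous for brevity, whereas the proposed methods return promises.

```typescript
// Sketch only: everything except the two method names is an assumption.
type PermissionPrompt = (origin: string, deviceId: string) => boolean;

class PersistentDeviceIdModel {
  private readonly byDevice = new Map<string, string>(); // "origin|deviceId" -> persistent id
  private readonly known = new Set<string>(); // "origin|persistentId"
  private readonly prompt: PermissionPrompt;
  private readonly newId: () => string;

  constructor(prompt: PermissionPrompt, newId: () => string) {
    this.prompt = prompt;
    this.newId = newId;
  }

  // Models getPersistentDeviceId; returns undefined if the user declines (assumption).
  getPersistentDeviceId(origin: string, deviceId: string): string | undefined {
    const key = `${origin}|${deviceId}`;
    const existing = this.byDevice.get(key);
    if (existing !== undefined) return existing; // same id on repeated calls
    if (!this.prompt(origin, deviceId)) return undefined;
    const persistentId = this.newId();
    this.byDevice.set(key, persistentId);
    this.known.add(`${origin}|${persistentId}`);
    return persistentId;
  }

  // Models hasPersistentDeviceId; ids are only known to the origin that created them.
  hasPersistentDeviceId(origin: string, persistentId: string): boolean {
    return this.known.has(`${origin}|${persistentId}`);
  }

  // The UA may clear persisted ids at its discretion.
  clear(): void {
    this.byDevice.clear();
    this.known.clear();
  }
}

let nextPersistent = 0;
const model = new PersistentDeviceIdModel(() => true, () => `persistent-${++nextPersistent}`);
const first = model.getPersistentDeviceId("https://a.example", "session-id-1");
const second = model.getPersistentDeviceId("https://a.example", "session-id-1");
const knownToA = first !== undefined && model.hasPersistentDeviceId("https://a.example", first);
const knownToB = first !== undefined && model.hasPersistentDeviceId("https://b.example", first);
```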

Comment by @travisleithead May 30, 2017 (See Github)

Discussed on TAG call 2017-05-30. Thought we should flesh out the API a bit more. Also want to get some more thoughts on the issue of default persistence.

Comment by @travisleithead May 31, 2017 (See Github)

Adding some folks for their thoughts: @jyasskin, @zolkis, @reillyeon, @Jan-Ivar, @marcoscaceres, @anssiko, @mounirlamouri

In recent TAG discussion we looked at a variety of topics around device ids, but the one that gathered the most attention was the question of persistence. Our goal is to try and understand the complexities, and then try to establish cross-spec guidance around persistence of device ids (see https://github.com/w3ctag/design-reviews/issues/64#issuecomment-303243097). Two schools of thought have been suggested:

Recommend no-persistence by default; to get persistence, the platform will provide an API to convert a temporary device id to a persistent one, optionally with a user-prompt (see suggestion above).
Recommend that all device ids be created unique to an origin in a way that the id cannot be guessed by another origin, but can be re-used by the origin in subsequent sessions -- thus persistent. This streamlined approach does not need to involve the user opting-in. Make this the norm for all device ids to enable persistence by default.

In both cases, it would be desirable that clearing the browser settings/storage would clear any persistent ids/cause new unique ids to be used/break the persistence.
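
One possible way to get the properties of the second option (origin-unique, not guessable by other origins, stable across sessions, reset when storage is cleared) is to derive each id from a secret salt with a keyed hash. The sketch below uses Node's `crypto` module for this; the `OriginScopedIds` class and the hardware key strings are hypothetical, and HMAC is an assumption chosen for illustration rather than something this discussion prescribes.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// Sketch only: OriginScopedIds is hypothetical, and HMAC is one possible derivation.
class OriginScopedIds {
  // Secret per-profile salt; rotating it models "clear browser storage".
  private salt: Buffer = randomBytes(32);

  // The same origin and hardware always yield the same id while the salt is unchanged.
  idFor(origin: string, hardwareKey: string): string {
    return createHmac("sha256", this.salt)
      .update(`${origin}\n${hardwareKey}`)
      .digest("hex");
  }

  // Clearing storage replaces the salt, so every previously issued id changes.
  clearStorage(): void {
    this.salt = randomBytes(32);
  }
}

const scoped = new OriginScopedIds();
const a1 = scoped.idFor("https://a.example", "camera-0");
const a2 = scoped.idFor("https://a.example", "camera-0");
const b1 = scoped.idFor("https://b.example", "camera-0");
scoped.clearStorage();
const a3 = scoped.idFor("https://a.example", "camera-0");
```

Under these assumptions `a1` and `a2` match, `b1` differs because the origin is part of the hashed input, and `a3` differs because the salt was rotated.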

Qs:

Is a user prompt a necessary precaution? Is it overkill?
What security/privacy benefits are gained by one vs. the other?
3rd alternatives?
Any concerns with my thoughts on correlation-by-default (within an origin--see above)?
There are risks to making changes to device id on the existing web. Is it worth attempting to change device id behavior at this point?

Comment by @kenchris May 31, 2017 (See Github)

I don't think we have looked a lot at persistence yet. Selection (with filters) seems to work fine for Web Bluetooth and Web USB. Also, when you plug in a USB device, a dialog appears that takes you to the URL (Physical Web for Bluetooth), which may allow auto-connecting; that at least is the plan for Bluetooth.

In general we are trying to avoid permission dialogs. Too many may teach users to always accept, and only slightly annoying non-modal permission drop-downs (like in Chrome) make sites prompt way too early. On the other hand, no site would want to show a modal device selection dialog at the wrong time.

Comment by @kenchris May 31, 2017 (See Github)

@tobie for sensors, groups might make sense, especially if we talk about connected sensors, i.e. a laptop might have an accelerometer and a gyroscope, but when you connect a VR headset (let's say a cabled one in this case), that headset might have the same sensors and you want to make sure to use those for your use case. The same might happen if we expose sensors over Bluetooth, i.e. a specialized VR controller, like a laser gun or sword.

Also, with modular phones (like the newly announced Essential phone, to which you can attach modules which become part of the phone over a USB protocol) this might become more important.

Comment by @travisleithead Jul 26, 2017 (See Github)

Discussed again at London F2F. I'm inclined to recommend the alternative mentioned above about device ids being persistent by default. Next step is to write this guidance up in the design principles doc and seek wide feedback there.

Issue w3ctag/design-principles#69 will track putting these guidelines into design principles.

Comment by @travisleithead Sep 5, 2017 (See Github)

Taken up on 2017-09-05. There doesn't appear to be any disagreement, so I will work on getting this guideline into the design-principles doc.

Comment by @jan-ivar Sep 5, 2017 (See Github)

@travisleithead I'm sorry, I've been waiting for something more concrete in https://github.com/w3ctag/design-principles/issues/69 to review, but nothing has materialized so far.

Of the two schools of thought, the first one seems like a red herring, as having a super-cookie that works as an identifier across all domains seems like a bad idea, even behind a user-prompt.

The second school appears to tie the concept of origin-unique IDs, which are always a good idea IMHO, with the idea that these should be persisted by default.¹

That seems like a false choice to me. Instead, the options to consider here seem to be opt-in vs. opt-out vs. always persisted (I trust clearing cookies erases IDs regardless).

MediaCapture is an example of implicit opt-in in Firefox. That is, origin-unique deviceIds are only given to sites the end-user indicates an elevated level of trust with (e.g. sharing mic or camera). This critically leaves out the drive-by web.

There are normative requirements on browsers that forgo this:

"[When permission is lacking]... the user agent MAY clear this identifier once the last browsing session from this origin has been closed. If the user agent chooses not to clear the identifier in this condition, then it MUST provide for the user to visibly inspect and delete the identifier, like a cookie."

Specifically, the concern was creating a new class of persistent identifier without visibility.
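
The quoted rule can be modelled roughly as follows, assuming a UA that takes the MAY and clears identifiers for origins without permission once their last session closes. The `SessionScopedIds` class and its method names are hypothetical, and the id generator is injected only to keep the example deterministic.

```typescript
// Sketch only: SessionScopedIds and its methods are hypothetical names.
class SessionScopedIds {
  private readonly sessions = new Map<string, number>(); // origin -> open sessions
  private readonly granted = new Set<string>(); // origins with permission
  private readonly ids = new Map<string, Map<string, string>>(); // origin -> device -> id
  private readonly newId: () => string;

  constructor(newId: () => string) {
    this.newId = newId;
  }

  openSession(origin: string): void {
    this.sessions.set(origin, (this.sessions.get(origin) ?? 0) + 1);
  }

  grant(origin: string): void {
    this.granted.add(origin);
  }

  idFor(origin: string, device: string): string {
    let perOrigin = this.ids.get(origin);
    if (perOrigin === undefined) {
      perOrigin = new Map<string, string>();
      this.ids.set(origin, perOrigin);
    }
    let id = perOrigin.get(device);
    if (id === undefined) {
      id = this.newId();
      perOrigin.set(device, id);
    }
    return id;
  }

  // When the last session closes and permission is lacking, forget the origin's ids.
  closeSession(origin: string): void {
    const remaining = Math.max(0, (this.sessions.get(origin) ?? 0) - 1);
    this.sessions.set(origin, remaining);
    if (remaining === 0 && !this.granted.has(origin)) this.ids.delete(origin);
  }
}

let policyCounter = 0;
const policy = new SessionScopedIds(() => `id-${++policyCounter}`);

// Drive-by origin: no permission, so the id does not survive the session.
policy.openSession("https://driveby.example");
const driveBy1 = policy.idFor("https://driveby.example", "mic");
policy.closeSession("https://driveby.example");
policy.openSession("https://driveby.example");
const driveBy2 = policy.idFor("https://driveby.example", "mic");

// Trusted origin: permission granted, so the id is kept across sessions.
policy.openSession("https://trusted.example");
policy.grant("https://trusted.example");
const trusted1 = policy.idFor("https://trusted.example", "mic");
policy.closeSession("https://trusted.example");
policy.openSession("https://trusted.example");
const trusted2 = policy.idFor("https://trusted.example", "mic");
```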

Will the guidelines (about to be) given here, run afoul of this concern?

<sub>1. Or at least "to enable persistence by default", which is not the same thing as recommending it. Are we merely recommending origin-uniqueness here, in order to let browsers that so wish to persist by default, or are we recommending all browsers persist by default? I can't tell.</sub>