design-reviews#493: Data for measuring audio-video synchronization and end-to-end delay in realtime communications

Opened Mar 31, 2020

Hello TAG!

I'm requesting a TAG review of captureTimestamp and senderCaptureTimeOffset.

These two new data fields are added to RTCRtpContributingSource as WebRTC extensions. They are used to measure how well synchronized the audio and video tracks are, as well as the end-to-end delay, in real-time communication applications. They are particularly desired by real-time communication systems that involve an intermediate stream regenerator, such as an audio mixer, which terminates the streams originating from senders.
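As a rough illustration of how these two fields could feed such measurements, here is a minimal sketch. It works on plain data rather than a live RTCRtpReceiver, and everything in it other than the two field names under review is an assumption made for illustration: the `CaptureInfo` shape, the `receiveTimeMs` field, the separately estimated `senderToReceiverOffsetMs`, and the sign convention that adding an offset converts a time from one clock to the next. The specification should be checked for the actual definitions.

```typescript
// Hypothetical plain-data view of one synchronization source.
// Only the names captureTimestamp and senderCaptureTimeOffset come from the proposal.
interface CaptureInfo {
  receiveTimeMs: number;           // assumed: local clock time when the frame was received
  captureTimestamp: number;        // capture time, expressed in the capture system's clock (ms)
  senderCaptureTimeOffset: number; // assumed sign: capture clock + offset = sender clock (ms)
}

// Estimated end-to-end delay: receive time minus the capture time mapped
// into the receiver's clock. senderToReceiverOffsetMs is assumed to be
// estimated elsewhere (for example from round-trip-time statistics).
function endToEndDelayMs(info: CaptureInfo, senderToReceiverOffsetMs: number): number {
  const captureInReceiverClock =
    info.captureTimestamp + info.senderCaptureTimeOffset + senderToReceiverOffsetMs;
  return info.receiveTimeMs - captureInReceiverClock;
}

// Audio-video skew: positive when video arrives later than audio
// for content captured at the same instant.
function avSyncSkewMs(
  audio: CaptureInfo,
  video: CaptureInfo,
  senderToReceiverOffsetMs: number
): number {
  return (
    endToEndDelayMs(video, senderToReceiverOffsetMs) -
    endToEndDelayMs(audio, senderToReceiverOffsetMs)
  );
}

// Made-up sample values, chosen only to keep the arithmetic easy to follow.
const audio: CaptureInfo = { receiveTimeMs: 10500, captureTimestamp: 4000, senderCaptureTimeOffset: 1000 };
const video: CaptureInfo = { receiveTimeMs: 10540, captureTimestamp: 4000, senderCaptureTimeOffset: 1000 };
const senderToReceiverOffsetMs = 5300;

console.log(endToEndDelayMs(audio, senderToReceiverOffsetMs)); // 200
console.log(endToEndDelayMs(video, senderToReceiverOffsetMs)); // 240
console.log(avSyncSkewMs(audio, video, senderToReceiverOffsetMs)); // 40
```

The point of carrying both fields is that an intermediary such as a mixer regenerates the stream, so the receiver cannot otherwise relate what it plays out to the original capture instant.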

Explainer¹ (minimally containing user needs and example code): https://github.com/w3c/webrtc-extensions/blob/master/explainer.md
Specification URL: https://w3c.github.io/webrtc-extensions/#rtcrtpcontributingsource-dictionary
Tests:
- Tests for captureTimestamp: https://cs.chromium.org/chromium/src/third_party/blink/web_tests/external/wpt/webrtc-extensions/RTCRtpSynchronizationSource-captureTimestamp.html
- Tests for senderCaptureTimeOffset: will be added
Primary contacts (and their relationship to the specification):
- Minyue Li (@minyuel), Google
- Chen Xing, Google
- Henrik Boström (@henbos), Google
All are authors of the specification.
Organization(s)/project(s) driving the specification: Google Hangouts
Key pieces of existing multi-stakeholder review or discussion of this specification: The proposal has been presented to W3C WG and browser implementers. While there is no signal from other implementers to implement this yet, the proposal has been reviewed without concerns raised.

Real applications, such as Google Hangouts and Meet, have been asking for reliable audio-video synchronization and end-to-end delay measurements for several years, as shown by discussions dating back to 2017; see https://github.com/w3c/webrtc-stats/issues/158.

External status/issue trackers for this specification: https://www.chromestatus.com/feature/5728533701722112

Further details:

[] I have reviewed the TAG's API Design Principles

Relevant time constraints or deadlines: n/a
The group where the work on this specification is currently being done: WebRTC W3C WG
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): WebRTC W3C WG
Major unresolved issues with or opposition to this specification: N/A
This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option): 🐛 open issues in our GitHub repo for each point of feedback

Discussions

Comment by @minyuel Apr 1, 2020 (See Github)

Hi @cynthia,

Thanks in advance for looking into this! Feel free to reach out to me if you have any questions.

Comment by @dbaron Apr 2, 2020 (See Github)

I'd note that it seems like this is the part of WebRTC that is more on the IETF side than the W3C side, and it feels like there might be more relevant expertise for reviewing this there. Probably the first place to go would be dispatch although it might eventually fit in avtcore.

Comment by @henbos Apr 2, 2020 (See Github)

If there is interest from more experts to take a look at this I welcome it, but I don't think more technical expertise is a blocker at the moment (there are no outstanding technical issues). We would like more interest/commitment from implementers, but that is tangential to IETF.

Comment by @alice Apr 7, 2020 (See Github)

Saying "there are no outstanding technical issues" seems to indicate a misunderstanding of @dbaron's suggestion. The aim of seeking wide review is not necessarily to answer known questions, but to uncover questions or issues the original team may not have considered. Part of what the TAG review process can offer is to suggest alternate venues for wide review.

Comment by @henbos Apr 8, 2020 (See Github)

OK that makes sense, just wanted to make sure we were not trying to address one resource-related problem (implementer interest) by remedying a different resource-related problem (available expertise).

Can you help clarify?

- Is asking for wider review a requirement of TAG or a suggestion if wider review is deemed necessary? (Was it just deemed necessary?)
- What does "the first place to go would be dispatch" mean in terms of next steps? E-mail dispatch@ietf.org asking to take a look?

I want to make sure this moves forward so that we don't wait for TAG review but TAG review waits for us to contact experts and then we wait for each other.

If more in-depth reviewing is a requirement I could also try poking contacts at Firefox, Safari and Edge asking if they can give a formal review.

Comment by @dbaron Apr 9, 2020 (See Github)

So the TAG doesn't have the authority to impose requirements.

But it seems like this is in the part of WebRTC where there's more existing expertise on the IETF side than the W3C side, so it seems like it probably would be good to try to take some steps to get review from the set of experts there. Emailing dispatch seems likely to be one of the better ways to do that.

I don't think there's anything that the TAG review is waiting on, but this is also the case where the TAG probably doesn't have a lot of expertise in the space, and review from other sources is likely to be more valuable.

Comment by @henbos Apr 9, 2020 (See Github)

@alvestrand Do you have anything to add about W3C-side vs IETF-side expertise of WebRTC in relation to this TAG review?

Discussed Apr 13, 2020 (See Github)

David: Was hoping Sangwhan had some comments.

Peter: I'm a little bit concerned that we'll miss important comments because this is outside of our field of expertise.

Sangwhan: So the original question I had was: are there any security implications of having a somewhat disentangleable sum of delays which map to your hop route? I don't know. I'm sure someone with the right security experience can find a way to abuse this, but all I know is that given small enough hops and the sums of maybe 2-3 users, it seems numerically possible to unmask. Whether or not that is useful, no idea.

Reinforce the original "please discuss with IETF" after repeating the original question.

Comment by @cynthia Apr 14, 2020 (See Github)

As I understand it this exposes an integral of the intermediate latencies that it has seen on the communication path to be used as a synchronization hint. (Given the right set of samples it could maybe be used to unmask the actual intermediaries, but that's given the right samples that are short enough in terms of hops) This goes wildly outside of my field of expertise, but would this have any security/privacy implications? There isn't a S&P self assessment, so wasn't able to refer to that.

The other question was around other implementor interest; this isn't the right venue to solicit other implementor interest, so we'd like to know if there was anything that was said from other implementors. It seems there are some extensions on WebRTC that only one implementation implements, which added up could result in fragmentation in the long run.

I agree with the comments from @dbaron above on this being better suited for review from the IETF end.

Comment by @cynthia Apr 22, 2020 (See Github)

While we haven't got any answers for the questions we had above - technically we don't have strong opinions on the proposal here.

However, the things that we would like the group to look into are these:

1. Non-Chromium implementer interest, is it there, and if not, why?
2. A bit more investigation on possible privacy implications by exposing this information.

The S&P document here: https://www.w3.org/TR/security-privacy-questionnaire/ should give you a rough idea on what we expect from [2]. For the time being, we will be closing this issue - please let us know if you find any potential issues and we can re-open.

Comment by @minyuel May 28, 2020 (See Github)

Thanks for the reply! Sorry for not having checked it earlier.

> As I understand it this exposes an integral of the intermediate latencies that it has seen on the communication path to be used as a synchronization hint. (Given the right set of samples it could maybe be used to unmask the actual intermediaries, but that's given the right samples that are short enough in terms of hops) This goes wildly outside of my field of expertise, but would this have any security/privacy implications? There isn't a S&P self assessment, so wasn't able to refer to that.

As it measures latency, the only privacy implication we can think of is that one may be able to infer the location of a participant in a real-time call, though only very roughly.
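To give a feel for how rough such an inference would be, here is a small back-of-the-envelope sketch. It is not part of the proposal: the function name and constant are hypothetical, and the only fact it relies on is that light in optical fibre travels at roughly 200 km per millisecond (about two thirds of its speed in vacuum). A measured delay therefore gives only an upper bound on distance.

```typescript
// Approximate propagation speed in optical fibre (assumption: ~2/3 of c).
const FIBRE_KM_PER_MS = 200;

// Upper bound on the distance a signal could have covered in the given
// one-way delay. Real delays also include capture, encoding, queuing and
// jitter buffering, so the true distance is usually far smaller.
function maxDistanceKm(oneWayDelayMs: number): number {
  if (oneWayDelayMs < 0) {
    throw new RangeError("delay must be non-negative");
  }
  return oneWayDelayMs * FIBRE_KM_PER_MS;
}

console.log(maxDistanceKm(10));  // 2000
console.log(maxDistanceKm(100)); // 20000
```

An end-to-end delay of 100 ms thus bounds the distance at about 20,000 km, which is roughly half the Earth's circumference and so rules out almost nothing on its own.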

> The other question was around other implementor interest; this isn't the right venue to solicit other implementor interest, so we'd like to know if there was anything that was said from other implementors. It seems there are some extensions on WebRTC that only one implementation implements, which added up could result in fragmentation in the long run.

The proposal has been presented to W3C WG and browser implementers. While there is no signal from other implementers to implement this yet, the proposal has been reviewed without concerns raised.

It is true that some WebRTC extensions (including this one) are not yet standardized. Some popular applications are using them to improve their quality, and WebRTC also needs to learn from these applications to decide the next step, and normally that takes time.

> I agree with the comments from @dbaron above on this being better suited for review from the IETF end.