#366: JS Self-Profiling API

Visit on Github

Opened Apr 25, 2019

Good day, TAG!

I'm requesting a TAG review of:

Further details (optional):

You should also know that...

  • An experimental Blink/V8 implementation is in the works.
  • I'd be interested in feedback on whether we should work on optimizing a structurally compressed JS object profile format, or allow traces to be returned GZIP'd in an ArrayBuffer. The latter is much more viable for transmission over the wire, but harder to work with on the client side.
    • The mean GZIP compression ratio in our testing was 7.5.
  • Additionally, it would be great to get feedback on whether or not we should set a maximum sample count for each profiler, or maximum trace duration. The latter seems easier to reason about.
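For what it's worth, the two proposed caps are interconvertible once a sampling interval is fixed; a quick sketch (the function and parameter names here are illustrative, not from the spec):

```javascript
// A sample-count cap and a trace-duration cap are two views of the
// same limit for a fixed sampling interval. Names are illustrative.
function maxTraceDurationMs(sampleIntervalMs, maxSampleCount) {
  // The longest trace a buffer of maxSampleCount samples can hold.
  return sampleIntervalMs * maxSampleCount;
}

console.log(maxTraceDurationMs(10, 10000)); // 100000 ms, i.e. 100 s
```

The difference only shows up when the effective sampling interval varies (e.g. throttling), in which case a sample-count cap bounds memory while a duration cap bounds wall-clock coverage.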

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our GitHub repo for each point of feedback
  • open a single issue in our GitHub repo for the entire review
  • leave review feedback as a comment in this issue and @-notify @acomminos

Thanks!

Discussions

Comment by @annevk May 17, 2019 (See Github)

What's the plan for standardizing the profile format across engines?

Comment by @acomminos May 17, 2019 (See Github)

What's the plan for standardizing the profile format across engines?

For function labels, we want to leverage the engine-set "name" property of captured functions from ECMA-262 wherever possible. Function line/column/resource info will be specified normatively, which will be critical to ensure compatibility with sourcemaps.

On the topic of which functions are included: whether we should allow some inlined/optimized functions to be omitted from traces is an open question. Folks from Apple preferred this, but it may make it more difficult to aggregate traces across different browsers/JS engines.

Comment by @lknik May 22, 2019 (See Github)

That's interesting, but I'm wondering if it might result in some unexpected leaks around CPU speed, etc. I'm happy you already mention timing. Shall we look at this in more detail?

Comment by @lknik May 22, 2019 (See Github)

Another question: what's the intended use case? Do you foresee it being used on live users, as in A/B testing (without the user being aware), or in a more testing-oriented environment such as developer settings? Did you consider using permissions?

Comment by @acomminos May 28, 2019 (See Github)

That's interesting, but I'm wondering if it might result in some unexpected leaks around CPU speed, etc. I'm happy you already mention timing. Shall we look at this in more detail?

Absolutely. We don't anticipate this providing more timing information than SharedArrayBuffer/performance.now() (see slides here regarding a polyfill), but UA-specific mitigations may be necessary, as with any timing source.

Another question: what's the intended use case? Do you foresee it being used on live users, as in A/B testing (without the user being aware), or in a more testing-oriented environment such as developer settings? Did you consider using permissions?

The intended use case is for live users in the wild, to get representative traces. Permissions aren't a great fit for this use case, particularly because that biases sampling (limiting the ability to collect profiles from the first load, as well as selecting for users with a higher propensity to accept permissions).

Comment by @lknik Jun 14, 2019 (See Github)

Absolutely. We don't anticipate this providing more timing information than SharedArrayBuffer/performance.now() (see slides here regarding a polyfill), but UA-specific mitigations may be necessary, as with any timing source.

How about timing in the renderer and other areas? It's not quite the same concern as with SharedArrayBuffer / performance.now().

The intended use case is for live users in the wild, to get representative traces. Permissions aren't a great fit for this use case, particularly because that biases sampling (limiting the ability to collect profiles from the first load, as well as selecting for users with a higher propensity to accept permissions).

Comment by @dbaron Sep 12, 2019 (See Github)

@kenchris and I are looking at this in a breakout at our Tokyo face-to-face.

First thought:

I'd be interested in feedback on whether we should work on optimizing a structurally compressed JS object profile format, or allow traces to be returned GZIP'd in an ArrayBuffer. The latter is much more viable for transmission over the wire, but harder to work with on the client side.

We both think that, even if doing the compression for sending is a little difficult right now, that's a platform gap that's likely to be fixed in the relatively near future (through work in or related to the WHATWG Streams spec, like the work discussed in w3ctag/design-reviews#410), so it's probably better to do the easy-to-use thing, with the knowledge that being able to do compression in client JS should become easier pretty soon.
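To illustrate how small that platform gap is: a minimal client-side sketch using the streams-based CompressionStream API (the kind of work referenced above; now available in modern browsers and Node 18+). The trace object here is a placeholder, not the spec's actual format.

```javascript
// Sketch: gzip a JSON-serialized trace on the client before upload,
// using CompressionStream. The trace shape below is a stand-in.
async function gzipTrace(trace) {
  const json = JSON.stringify(trace);
  const compressed = new Blob([json])
    .stream()
    .pipeThrough(new CompressionStream("gzip"));
  // Collect the compressed stream into an ArrayBuffer for transport.
  return new Response(compressed).arrayBuffer();
}

// Usage: the resulting buffer can go straight into a beacon/fetch body.
gzipTrace({ samples: [], frames: [] }).then((buf) => {
  console.log(buf.byteLength > 0);
});
```

This keeps the easy-to-use JS object format in the API while still letting pages that care about wire size compress before sending.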

Additionally, it would be great to get feedback on whether or not we should set a maximum sample count for each profiler, or maximum trace duration. The latter seems easier to reason about.

I don't think we have thoughts on this -- but maybe we would if you described what the advantages and disadvantages were a little more clearly.

One additional issue @kenchris was concerned about is what the effect on the users randomly chosen for this sort of profiling would be -- it seems like it could have a serious impact on their battery life, etc. (On the flip side, many web features can use a lot of battery life.) It's something worth thinking about, though probably not a blocking issue. (Should there be any interaction with, say, battery saver mode? Though if there were, that might pose privacy concerns.)

Comment by @dbaron Sep 12, 2019 (See Github)

BTW, one concern I heard from colleagues is that they'd like to avoid content depending on what stack frames look like (which is already apparently a problem with Error.stack) since they'd like to be able to change/improve that over time. Having content depend on it means that, even though the spec says implementations can do what they want, they end up being unable to change things. It's possible this might be an argument for going the other way on compression -- although on the other hand doing un-gzipping is likely to be pretty trivial on the client soon, so it's likely not a very strong argument.
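The Error.stack precedent is easy to demonstrate: the property is unspecified, and its per-frame syntax differs across engines, so any content that parses it is coupled to one engine's format.

```javascript
// Error.stack is not standardized; its frame syntax is engine-specific.
const stack = new Error("probe").stack || "";

// V8 (and Chakra) render frames as "    at fn (url:line:col)";
// SpiderMonkey and JavaScriptCore render them as "fn@url:line:col".
// A parser written against one format silently misparses the other.
const v8Style = /\n\s*at /.test(stack);   // V8-style frames present?
const jscStyle = /^[^ ]*@/m.test(stack);  // JSC/SpiderMonkey style?
console.log({ v8Style, jscStyle });
```

Returning structured frame objects (rather than formatted strings) largely sidesteps this class of dependency.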

Discussed Oct 2, 2019 (See Github)

Kenneth: We gave a lot of comments here but haven't heard back yet.

David: My co-workers had some more feedback, should probably revise my comments.

Comment by @dbaron Oct 2, 2019 (See Github)

... though (continuing from the previous comment) this is probably actually less of an issue because the API is asynchronous, so it's less likely to result in bad dependencies.

Also, more generally, curious if @acomminos has thoughts on any of the various comments above.

Comment by @acomminos Oct 2, 2019 (See Github)

Thank you for the feedback!

We both think that, even if doing the compression for sending is a little difficult right now, that's a platform gap that's likely to be fixed in the relatively near future [...]

Yes! The work on stream-based compression will complement this excellently.

Additionally, it would be great to get feedback on whether or not we should set a maximum sample count for each profiler, or maximum trace duration. The latter seems easier to reason about.

I don't think we have thoughts on this -- but maybe we would if you described what the advantages and disadvantages were a little more clearly.

We ended up going with a maximum sample count in the latest proposal, because memory usage and ingestion pipelines are likely to be more sensitive to the number of samples than to the duration of time spent sampling (especially at a low sample rate).
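A back-of-envelope sketch of that sensitivity (the per-sample size here is an assumption for illustration, not a number from the spec or any implementation):

```javascript
// Buffer memory scales with the sample cap, independent of wall-clock
// duration -- which is why a sample-count cap is easier to bound.
function estimateBufferBytes(maxSampleCount, bytesPerSample = 16) {
  // Assumed layout: e.g. an 8-byte timestamp plus a stack-table index.
  return maxSampleCount * bytesPerSample;
}

// 10k samples cost the same whether gathered over 100 s at a 10 ms
// interval or over 1000 s at a 100 ms interval.
console.log(estimateBufferBytes(10000)); // 160000 bytes
```

A duration cap, by contrast, gives no fixed bound on sample count once the sampling interval varies.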

One additional issue @kenchris was concerned about is what the effect on the users randomly chosen for this sort of profiling would be -- it seems like it could have a serious impact on their battery life, etc. (On the flip side, many web features can use a lot of battery life.)

That would depend pretty heavily on the VM's architecture, as well as the sampling intervals the user agent chooses to support. IIRC, in V8 we're seeing roughly 4% of CPU cycles spent on sampling at a 10ms sampling interval.

One note of interest: @toddreifsteck mentioned at the TPAC Web Perf F2F that they had achieved a 1% overhead in Chakra's sampling profiler at a 1ms sampling interval.

BTW, one concern I heard from colleagues is that they'd like to avoid content depending on what stack frames look like (which is already apparently a problem with Error.stack) since they'd like to be able to change/improve that over time.

Recently the spec was updated to require that traces return the ECMA-262-defined "name" property of function objects, following implementer feedback asking for more rigorous specification. These names are fairly well specified, although they lack some detail for object members.
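Concretely, ECMA-262's name inference covers declarations and most assignments, but not every object-member pattern, which is the gap mentioned above:

```javascript
// ECMA-262 "name" inference, as traces would surface it.
function declared() {}
const assigned = () => {};        // name inferred from the binding
const obj = { method() {} };      // name inferred from the method key
obj.late = function () {};        // plain property assignment: no name

console.log(declared.name);   // "declared"
console.log(assigned.name);   // "assigned"
console.log(obj.method.name); // "method"
console.log(obj.late.name);   // "" (empty string)
```

Frames for the last case would need to fall back to line/column/resource info.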

In practice, I don't suspect the function names returned by traces will be terribly useful: more often than not, they'll be unminified via sourcemaps on the server, and used as a backup when sourcemap lookup fails.

Comment by @kenchris Dec 16, 2019 (See Github)

Thanks for the answers, this was useful. Thanks for flying TAG!