-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deploy Feb 19, 2025 #5377
Merged
Merged
Deploy Feb 19, 2025 #5377
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This just adds a new interface that we can hang functionality off of which is specific to the inverted tree. No functional changes.
This is the main new concept in this PR that allows us to make the inverted tree fast. See the comment above CallNodeInfoInverted in src/types/profile-derived.js for details. The PR is structured as follows: - Implement the suffix order in a brute force manner (this commit). - Use the suffix order to re-implement everything that was using the inverted call node table. - Once nothing is using the inverted call node table directly anymore, make it fast. We make it fast by rewriting the computation of the inverted call node table and of the suffix order so that we only materialize inverted call nodes that are displayed in the call tree, and not for every sample. And we only compute the suffix order to the level of precision needed to have correct ranges for all materialized inverted call nodes.
This function is used by getNativeSymbolsForCallNodeInverted, getStackAddressInfoForCallNodeInverted, and getStackLineInfoForCallNodeInverted. This replaces a call to getStackIndexToCallNodeIndex() with a call to getStackIndexToNonInvertedCallNodeIndex(). It also mostly removes the use of the inverted call node table for this code. (There's still a place that accesses callNodeInfo.getCallNodeTable().depth, but this will be fixed in a later commit.) We want to eliminate all callers to getStackIndexToCallNodeIndex() because we don't want to compute a mapping from non-inverted stack index to inverted call node index upfront.
… order. This replaces a call to getStackIndexToCallNodeIndex() with a call to getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to getCallNodeTable(). And it replaces a SampleIndexToCallNodeIndex mapping with a SampleIndexToNonInvertedCallNodeIndex mapping.
This replaces a call to getStackIndexToCallNodeIndex() with a call to getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to getCallNodeTable().
…ted call nodes. This function is used when hovering or clicking the activity graph. This commit replaces a SampleIndexToCallNodeIndex mapping with a SampleIndexToNonInvertedCallNodeIndex mapping.
The stack chart is always non-inverted, so this commit is functionally neutral. This lets us remove the now-unused function getSampleIndexToCallNodeIndexForFilteredThread.
This removes a few more uses of getCallNodeTable().
This replaces lots of uses of getCallNodeTable() with uses of getNonInvertedCallNodeTable(). It also replaces lots of uses of getStackIndexToCallNodeIndex() with uses of getStackIndexToNonInvertedCallNodeIndex(). We now compute the call tree timings quite differently for inverted mode compared to non-inverted mode. There's one part of the work that's shared: The getCallNodeLeafAndSummary computes the self time for each non-inverted node, and the result is used for both the inverted and the non-inverted call tree timings. The CallTreeTimings Flow type is turned into an enum, with a different type for CallTreeTimingsNonInverted and for CallTreeTimingsInverted. A new implementation for the CallTreeInternal interface is added.
All these places now deal with non-inverted call nodes, and for those, what we meant by "leaf" and by "self" was always the same thing. And I prefer the word "self" because "leaf" usually means "has no children" and that's not the case here. We still use the word "leaf" in many parts of the documentation.
Whether a function recurses (directly or indirectly) is the same in the inverted call node table and in the non-inverted call node table.
This just stops exposing it from the interface. The way we compute it will change in the next commit.
This is the main commit of this PR. Now that nothing is relying on having an inverted call node for each sample, or on having a fully-computed inverted call node table, we can make it so that we only add entries to the inverted call node table when we actually need a node, for example because it was revealed in the call tree. This makes it a lot faster to click the "Invert call stack" checkbox - before this commit, we were computing a lot of inverted call nodes that were never shown to the user. After this commit, CallNodeInfoInvertedImpl no longer inherits from CallNodeInfoImpl - it is now a fully separate implementation. This commit reduces the time spent in `getInvertedCallNodeInfo` on an example profile (https://share.firefox.dev/411Vg2T) from 11 seconds to 718 ms. Before: https://share.firefox.dev/3CTNApp After: https://share.firefox.dev/492F7wl (15x faster)
The new structure gives us a nice guarantee about roots of the inverted tree: There is an inverted root for every func, and their indexes are identical. This makes it really cheap to translate between the call node index and the func index (no conversion or lookup is necessary) and also makes it cheap to check if a node is a root. This commit also replaces a few maps and sets with typed arrays for performance. This is easier now that the root indexes are all contiguous.
This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable, because we'll now only be comparing integers. This speeds up _createInvertedRootCallNodeTable by almost 2x.
…ted. This saves a lot of work upfront that's not needed. At any given time, we just need the suffix order to be accurate enough so that the "suffix order index range" for every existing inverted call node is correct. This commit reduces the time spent in `getInvertedCallNodeInfo` + `getChildren` on an example profile (https://share.firefox.dev/411Vg2T) from 721 ms to 20 ms. Before: https://share.firefox.dev/40Wdi6S After: https://share.firefox.dev/3AZjbpg (35x faster)
Co-authored-by: Francesco Lodolo [:flod] <[email protected]> (it)
…iwan) (zh-TW) Co-authored-by: Olvcpr423 <[email protected]> (zh-CN) Co-authored-by: Pin-guang Chen <[email protected]> (zh-TW)
Co-authored-by: Michael Köhler <[email protected]> (de)
Co-authored-by: Mark Heijl <[email protected]> (nl)
Co-authored-by: Valery Ledovskoy <[email protected]> (ru)
Co-authored-by: Jim Spentzos <[email protected]> (el)
Co-authored-by: Melo46 <[email protected]> (ia)
Co-authored-by: Luna Jernberg <[email protected]> (sv-SE) Co-authored-by: Andreas Pettersson <[email protected]> (sv-SE)
… in the flame graph, it'll be non-inverted traced timing.
…er is somewhat arbitrary) as requested in julien's review
…erted call node paths.
…torNode and improve its comments.
Bumps [koa](https://github.com/koajs/koa) from 2.15.3 to 2.15.4. - [Release notes](https://github.com/koajs/koa/releases) - [Changelog](https://github.com/koajs/koa/blob/2.15.4/History.md) - [Commits](koajs/koa@2.15.3...2.15.4) --- updated-dependencies: - dependency-name: koa dependency-type: indirect ...
Fixes #5310. This matches the precision of regular JavaScript numbers. 32-bit floats don't have enough precision to exactly represent integers larger than 16,777,216. So they don't have sufficient precision for the call tree which sometimes is used to with bytes values which can range in the gigabytes.
Fixes #5310. This matches the precision of regular JavaScript numbers. 32-bit floats don't have enough precision to exactly represent integers larger than 16,777,216. So they don't have sufficient precision for the call tree which sometimes is used to with bytes values which can range in the gigabytes.
…gory columns. Fixes #5368.
…gory columns. (#5369) [Production](https://share.firefox.dev/3Ld89id) | [Deploy preview](https://deploy-preview-5369--perf-html.netlify.app/public/z0s8gcsaedxz7e0mpyjz2nd80fbczaz1z6q3syg/flame-graph/?globalTrackOrder=0&hiddenLocalTracksByPid=5262-0&thread=0&timelineType=category&v=10) Fixes #5368.
Updated locales: de, el, en-GB, es-CL, fr, fur, fy-NL, ia, it, nl, pt-BR, ru, sv-SE, tr, uk, zh-CN, zh-TW.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Updates:
[Nicolas Chevobbe] Make timeline ruler notches visible in High Contrast Mode (#5346)
[Nazım Can Altınova] Add the ability to mark marker fields as hidden (#5354)
[Maxx Crawford] Update guide-startup-shutdown.md (#5357)
[Nazım Can Altınova] Enable prettier on the docs-user markdown files (#5358)
[Paul Adenot] Allow searching by Content-Type in the network marker view (#5351)
[Florian Quèze] Hide the pid in global tracks if it is 0. (#5361)
[Markus Stange] Make inverting the call tree fast, by computing inverted call nodes lazily (#4900)
[Markus Stange] Use 64-bit floats for call tree timings. (#5371)
[Markus Stange] Extend the workaround in the v53 upgrader to generate missing subcategory columns. (#5369)
Also thanks to our localizers:
de: Michael Köhler
el: Jim Spentzos
en-GB: Paul
es-CL: ravmn
fr: Théo Chevalier
fur: Fabio Tomat
fy-NL: Fjoerfoks
ia: Melo46
it: Francesco Lodolo
nl: Mark Heijl
pt-BR: Marcelo Ghelman
ru: Valery Ledovskoy
sv-SE: Luna Jernberg, Andreas Pettersson
tr: Grk
uk: Іhor Hordiichuk
zh-CN: Olvcpr423
zh-TW: Pin-guang Chen