Add performance measurement #9722
Having a single metric that tells you whether your PR makes Theia slower or faster sounds like a very good thing. However, I have a couple of questions:
I'm wondering where you folks are coming from: is startup time a problem in your work? Is VS Code considerably faster, for example? One thing that would be really cool IMO is to have a suite of common tasks that we time for every release, which could be part of the release process.
Thanks for your feedback @tsmaeder! To answer your questions:
The LCP roughly corresponds to the point in time where the Theia loading screen disappears and the application is drawn. So it is essentially the point where the user can start doing work, minus the time needed for drawing.
Currently we start a basic IDE with an empty workspace and nothing opened. However, this can certainly be extended to cover more use cases, such as large workspaces or a large number of VS Code extensions.
No, the back-end startup time is not included. The backend is only measured indirectly, through its effect on how fast the frontend starts.
We measure startup time because it is relatively well defined and easy to measure. The measurement doesn't take a lot of time and could therefore be executed for each PR, so startup time regressions can be detected early. Of course it makes sense to also do additional measurements, which can be added later.
We can't really prevent the tests from being affected by the infrastructure. One mitigation is, for example, to run the tests many times, discard outliers, and then take the average. However, this is certainly not fool-proof.
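As a sketch of the "run many times, clean, average" idea (the two-standard-deviation cutoff and the sample values are purely illustrative, not something we have settled on):

```ts
// Discard measurements more than two standard deviations from the mean,
// then average the remaining samples.
function cleanedMean(samples: number[]): number {
    const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
    const stddev = Math.sqrt(
        samples.reduce((a, b) => a + (b - mean) ** 2, 0) / samples.length
    );
    const kept = samples.filter(s => Math.abs(s - mean) <= 2 * stddev);
    return kept.reduce((a, b) => a + b, 0) / kept.length;
}

// Example: the 2400 ms fluke is discarded, the result is ≈ 1198 ms.
console.log(cleanedMean([1200, 1180, 1210, 1195, 1205, 1190, 1185, 1215, 1200, 2400]));
```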
We would like to suggest that, at first, the performance numbers are just shown without any "call to action", and ideally collected somewhere so they can be tracked. Once there is enough confidence in their stability and the community decides that it's worth it, one can think about handing out "yellow cards". In that case we should offer an easy way to re-trigger the performance tests (so any fluke can easily be dealt with) and the log files, so the problem can be analyzed.
Startup time is a very important metric for user experience and strongly influences the user's perception of the tool's quality. It is also easy to measure and therefore a good candidate for the first of hopefully many performance tests.
Yes absolutely. In an ideal world we would have
To summarize:
Current state of this issue:
Feature Description:
We would like to contribute a mechanism to measure/monitor the system performance, more precisely the startup time of Theia, in order to avoid regressions and to serve as a benchmark for possible improvements.
We currently have a script (using puppeteer) that uses the Chrome DevTools performance tracing to measure the largest contentful paint (LCP) metric. The script is parameterized and can run the measurement multiple times if necessary. The script starts a performance trace, opens Theia and stops the trace again. This generates a profile file that contains all events captured during the recording. This file can also be imported into the Chrome DevTools to view a timeline of all events. The LCP metric is then parsed from the file and written to the console. If there is more than one run, the mean and standard deviation are calculated and logged as well.
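For illustration, here is a minimal sketch of how such a measurement could look with puppeteer. This is not the contributed script itself; the URL, the trace file path, and the event parsing (picking the last `largestContentfulPaint::Candidate` event relative to `navigationStart`, without matching frame IDs) are simplifying assumptions:

```ts
import puppeteer from 'puppeteer';
import * as fs from 'fs';

async function measureLcp(url: string, traceFile: string): Promise<number> {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Start a DevTools performance trace, load Theia, then stop the trace.
    await page.tracing.start({ path: traceFile });
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.tracing.stop();
    await browser.close();

    // The trace file contains all captured events and can also be imported
    // into the Chrome DevTools performance panel. Here we naively take the
    // last LCP candidate relative to navigation start.
    const trace = JSON.parse(fs.readFileSync(traceFile, 'utf-8'));
    const events = trace.traceEvents as Array<{ name: string; ts: number }>;
    const navStart = events.find(e => e.name === 'navigationStart');
    const lcpCandidates = events.filter(e => e.name === 'largestContentfulPaint::Candidate');
    const lcp = lcpCandidates[lcpCandidates.length - 1];
    if (!navStart || !lcp) {
        throw new Error('navigationStart or LCP candidate not found in trace');
    }
    return (lcp.ts - navStart.ts) / 1000; // trace timestamps are in microseconds
}

measureLcp('http://localhost:3000', 'trace.json')
    .then(lcpMs => console.log(`LCP: ${lcpMs.toFixed(1)} ms`));
```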
We believe this script is already useful as a stand-alone tool, as it allows measuring performance effects/improvements in a consistent way.
As useful extensions, we could also integrate this into the nightly build and into PR builds.
However, in our opinion, hardcoded limits for the measurements should be avoided, as they would lead to a lot of failed builds.
One possible way to integrate the script into the build would be to run the startup measurement multiple times during the nightly build and keep a history of the results. These numbers can then be used to compare against the results of PR builds to see whether performance is affected. For example, startup times that take 20% longer than the mean of the nightly builds could be flagged with a warning.
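As a rough sketch of that comparison (the 20% threshold, the data shape, and the function names are purely illustrative):

```ts
interface NightlyResult {
    date: string;
    lcpMs: number;
}

// Compare a PR build's startup time against the history of nightly results
// and warn if it exceeds the nightly mean by more than the given threshold.
function checkAgainstNightlies(prLcpMs: number, history: NightlyResult[], threshold = 0.2): void {
    const mean = history.reduce((sum, r) => sum + r.lcpMs, 0) / history.length;
    const overhead = prLcpMs / mean - 1;
    if (overhead > threshold) {
        console.warn(`Warning: startup took ${(overhead * 100).toFixed(0)}% longer than the nightly mean (${mean.toFixed(0)} ms)`);
    } else {
        console.log(`Startup time within ${threshold * 100}% of the nightly mean (${mean.toFixed(0)} ms)`);
    }
}
```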
Any opinions or suggestions? We would first contribute the script and then potential integrations, if desired.
@JonasHelming