
Add performance measurement #9722

Open
sgraband opened this issue Jul 12, 2021 · 3 comments
Labels: metrics (issues related to metrics and logging) · performance (issues related to performance) · proposal (feature proposals, potential future features)

Comments

@sgraband
Contributor

Feature Description:

We would like to contribute a mechanism to measure/monitor system performance, more precisely the startup time of Theia, to avoid regressions and to provide a benchmark for possible improvements.

We currently have a script (using puppeteer) that uses the Google DevTools performance tracing to measure the largest contentful paint (LCP) metric. The script is parameterized and can run the measurement multiple times if necessary. The script starts a performance trace, opens Theia, and stops the trace again. This generates a profile file that contains all events captured during the recording. The file can also be imported into the Google DevTools to see a timeline of all events. The LCP metric is then parsed from the file and written to the console. If there is more than one run, the mean and standard deviation are calculated and logged as well.
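For readers who want a rough idea of the approach, here is a minimal sketch along those lines (this is not the contributed script itself; the URL, file name, and the exact trace event names are assumptions about what a Chrome trace typically contains):

```ts
// Minimal sketch, NOT the actual contributed script.
// URL, trace file name, and event names are assumptions.
import * as puppeteer from 'puppeteer';
import * as fs from 'fs';

async function measureLcp(url: string, runs: number): Promise<number[]> {
    const results: number[] = [];
    for (let i = 0; i < runs; i++) {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        // Start a DevTools performance trace, load Theia, then stop the trace.
        await page.tracing.start({ path: 'trace.json' });
        await page.goto(url, { waitUntil: 'networkidle0' });
        await page.tracing.stop();
        await browser.close();
        // The trace file can also be imported into the Chrome DevTools
        // Performance tab. Here we only extract the last LCP candidate.
        const trace = JSON.parse(fs.readFileSync('trace.json', 'utf-8'));
        const lcpCandidates = trace.traceEvents.filter(
            (e: any) => e.name === 'largestContentfulPaint::Candidate');
        const navStart = trace.traceEvents.find(
            (e: any) => e.name === 'navigationStart');
        const last = lcpCandidates[lcpCandidates.length - 1];
        // Trace timestamps are in microseconds; report LCP in milliseconds.
        results.push((last.ts - navStart.ts) / 1000);
    }
    return results;
}

// Mean and standard deviation over multiple runs.
function stats(values: number[]): { mean: number; stdDev: number } {
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
    return { mean, stdDev: Math.sqrt(variance) };
}
```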

We believe this script is already useful as a stand-alone tool, as it allows measuring performance effects/improvements in a consistent way.

As useful extensions, we could also integrate this into the nightly build and in PR builds.
However, in our opinion, hardcoded limits for the measurements should be avoided, as they would lead to a lot of failed builds.
One possible solution to integrate the script into the build could be to run the startup measurement multiple times during the nightly build and keep a history of the results. These numbers can then be used to compare the results of PR builds to see if the performance is affected. For example, startup times that take 20% longer than the mean of the nightly builds could be flagged with a warning.
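For illustration, such a comparison step could look roughly like the following (the 20% threshold and the data shape are just the example values from above, not an agreed policy):

```ts
// Hypothetical PR-vs-nightly comparison; threshold and data shape are assumptions.
interface ComparisonResult {
    prMean: number;
    nightlyMean: number;
    regression: boolean;
}

function compareToNightly(prRuns: number[], nightlyHistory: number[], threshold = 0.2): ComparisonResult {
    const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
    const prMean = mean(prRuns);
    const nightlyMean = mean(nightlyHistory);
    // Flag a warning (not a build failure) if the PR is more than 20% slower
    // than the mean of the recorded nightly startup times.
    return { prMean, nightlyMean, regression: prMean > nightlyMean * (1 + threshold) };
}
```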

Any opinions or suggestions? We will first contribute the script and then potential integrations if wanted.

@JonasHelming

@vince-fugnitto added the metrics and proposal labels on Jul 12, 2021
@tsmaeder
Contributor

Having a single metric that tells you whether your PR makes Theia slower or faster sounds like a very good thing. However, I have a couple of questions:

  1. Is LCP a good metric? Can the user start doing work at the point of the LCP? Theia is not a traditional website and probably many things are started lazily or in the background.
  2. What state is the IDE started in? Are there editors open, etc.?
  3. Is back-end startup included in the number?
  4. While browser load time is important, why are we only measuring startup time? Is browser load time a good measure for most PRs?
  5. If running on CI, how do we ensure the results are not skewed by other tasks being run on the same shared infrastructure?
  6. Having a number is useful, but if we get a "yellow card" on our PR, what are the expectations on the developer and how does the developer find out what he needs to do? If we have bad numbers, we should at least provide the trace file.

I'm wondering where you folks are coming from: is startup time a problem in your work? Is VS Code considerably faster, for example?

One thing that would be really cool IMO is to have a suite of common tasks that we time for every release that could be part of the release process.

@sgraband
Contributor Author

Thanks for your feedback @tsmaeder! To answer your questions:

Having a single metric that tells you whether your PR makes Theia slower or faster sounds like a very good thing. However, I have a couple of questions:

  1. Is LCP a good metric? Can the user start doing work at the point of the LCP? Theia is not a traditional website and probably many things are started lazily or in the background.

The LCP roughly corresponds to the point in time at which the Theia loading screen disappears and the application is drawn. So it is basically the point where the user can start doing work, apart from the final drawing itself.

  2. What state is the IDE started in? Are there editors open, etc.?

Currently we start a basic IDE with an empty workspace and nothing opened. However, this can certainly be extended to cover more use cases, like large workspaces or a large number of VS Code extensions.

  3. Is back-end startup included in the number?

No, the back-end startup time is not included. The back-end is only measured indirectly, through the way it makes the front-end slower or faster.

  4. While browser load time is important, why are we only measuring startup time? Is browser load time a good measure for most PRs?

We measure startup time because it is relatively well defined and easy to measure. It does not take a lot of time and could therefore be executed with each PR. This way, startup time regressions can be detected early. Of course it makes sense to also do additional measurements, which can be added later.

  5. If running on CI, how do we ensure the results are not skewed by other tasks being run on the same shared infrastructure?

We can't really prevent the tests from being affected by the infrastructure. One mitigation is, for example, to run the tests many times, filter out outliers, and then take the average (a minimal sketch of such a filtering step follows below). However, this is certainly not fool-proof.
Therefore we would suggest not flagging builds as failed/unstable when the performance requirement is not met. We should rather just post the number on the PR as information, without causing failed builds. Ideally the number could also be collected by some dashboard so that it can be tracked over time.
Only after we have gained some experience with how the performance tests behave in practice, and some confidence in the numbers, would I start thinking about failing a build because of them.
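As a rough illustration of the "run many times, clean the results, take the average" mitigation, a simple trimmed mean could be used (the trimming ratio here is an arbitrary assumption, not a proposal for a specific value):

```ts
// Hypothetical outlier mitigation: drop the fastest/slowest runs, average the rest.
function trimmedMean(values: number[], trimRatio = 0.1): number {
    const sorted = [...values].sort((a, b) => a - b);
    const trim = Math.floor(sorted.length * trimRatio);
    const kept = sorted.slice(trim, sorted.length - trim);
    return kept.reduce((a, b) => a + b, 0) / kept.length;
}
```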

  6. Having a number is useful, but if we get a "yellow card" on our PR, what are the expectations on the developer and how does the developer find out what he needs to do? If we have bad numbers, we should at least provide the trace file.

We would like to suggest that, at first, the performance numbers are simply shown without any "call-to-action" and ideally collected somewhere so they can be tracked. Once there is enough confidence in their stability and the community decides that it is worth it, one can think about handing out "yellow cards". In that case we should offer an easy way to re-trigger the performance tests (so any fluke can easily be ruled out) and the log files so the problem can be analyzed.
In general we definitely want to avoid a poor signal-to-noise ratio. When every second PR is flagged without reason, the flag will just be ignored in practice.

I'm wondering where you folks are coming from: is startup time a problem in your work? Is VS Code considerably faster, for example?

Startup time is a very important metric for user experience and strongly influences the user's perception of the tool's quality. It is also easy to measure and therefore a good candidate for the first of hopefully many performance tests.

One thing that would be really cool IMO is to have a suite of common tasks that we time for every release that could be part of the release process.

Yes, absolutely. In an ideal world we would have:

  • a selection of performance tests which can be executed with each PR without increasing the build time too much, so any regression is hopefully caught early
  • a more complete collection of performance tests executed with each nightly build where build time is not that important
  • a full set of performance tests (maybe the same as nightly) which is checked at least for each release

To summarize:

  • As a first step we just want to provide the performance measurement script (only including startup time measurement) so anybody interested can run the test(s) themselves
  • In the future it definitely makes sense to also integrate them into the CI process; however, it is important to reduce the number of false positives as much as possible.
  • Additional performance tests covering much more complex scenarios can be added iteratively

@sdirix
Member

sdirix commented Sep 19, 2023

Current state of this issue:

  • A script was contributed to Theia with Add performance measurement #9777 using the LCP metric
  • Automatic measurement and logging of Theia builds is not yet integrated into the main repository
  • The e2e repository, however, captures the logs produced during the e2e tests, including the startup time logs. See here.
