Perf testing: Cover Site Editor loading time #23842
Conversation
Size Change: -1.02 MB (88%) 🏆 Total Size: 1.15 MB
Is it possible to run perf/e2e tests and watch what they're doing?
For e2e tests, yes, you just have to pass some flags. See here: https://github.com/WordPress/gutenberg/blob/master/docs/contributors/testing-overview.md#end-to-end-testing
e.g. npm run test-e2e:watch -- --puppeteer-interactive
I'm not sure if that also works with the perf test, but since it uses puppeteer, I imagine it would work if you passed that flag to the right script :)
the latter's loading time is 10x the Site Editor's
This does make sense to me, since we have significantly fewer dependencies here. We also load less stuff synchronously on load with PHP, so templates are queried from WordPress after the initial DOM loads. (I don't think that is the case with the post editor, which probably tests loading with a lot of blocks initially.)
I think it should be fine to merge this branch, as the GH Action should start to work properly after that.
I'm onboard with this so long as we verify that it starts working directly afterwards!
This PR looks pretty good to me 👍 I'll give a ✅, but I'd recommend waiting for feedback from some other folks before merging (maybe @youknowriad has some thoughts), since my knowledge of this code is limited.
// Measuring loading time
while ( i-- ) {
	await page.reload( { waitUntil: [ 'domcontentloaded', 'load' ] } );
Should we wait until the template has loaded on the page? I think this just measures document load, but there is another delay of a few seconds until the template is loaded from the API
Good point. What would be the best way to implement this? Add a waitForSelector for a thing that's only in the template? 🤔
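Something along these lines might work, extending the snippet above (a minimal sketch, assuming a selector that only appears once the template has been fetched from the API; the selector and the results handling are hypothetical placeholders):
// Sketch only: wait for an element that only appears once the template has
// been loaded, so the measured time includes template resolution.
while ( i-- ) {
	const startTime = new Date();
	await page.reload( { waitUntil: [ 'domcontentloaded', 'load' ] } );
	await page.waitForSelector( '.edit-site-visual-editor' ); // hypothetical selector
	results.load.push( new Date() - startTime );
}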
That said, I wonder about the test failing:
Seems like we should fix the CI before moving forward.
@@ -220,21 +222,32 @@ async function runPerformanceTests( branches, options ) {
	log( '>> Starting the WordPress environment' );
	await runShellScript( 'npm run wp-env start', environmentDirectory );

	/** @type {Record<string, WPFormattedPerformanceResults>} */
	const testSuites = [ 'post-editor', 'site-editor' ];
Should this be an argument instead, defaulting to post-editor? Or do you think we should be running all tests every time?
This is in the function body of runPerformanceTests, which is the top-level export of this file (invoked by bin/plugin/cli.js). If we want to make this an argument, we'd basically have to make it one that's passed by the CLI.
I'm leaning towards always running both perf suites. Their main use case is CI (the GH Action), as this is the most reliable guard against perf regressions, so it should IMO be easy to monitor that. Furthermore, the Site Editor test only covers loading time, which is currently pretty short. We can revisit if we add more coverage and testing times grow too much.
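To illustrate, the overall shape in bin/plugin/commands/performance.js ends up being roughly a loop over the suites that invokes each spec file separately (paraphrased sketch; the working directory and results handling are simplified):
// Paraphrased sketch: run each suite's spec file on its own, so that each
// suite writes its own results file (post-editor.results.json, site-editor.results.json).
const testSuites = [ 'post-editor', 'site-editor' ];

for ( const testSuite of testSuites ) {
	await runShellScript(
		`npm run test-performance -- packages/e2e-tests/specs/performance/${ testSuite }.test.js`,
		environmentDirectory
	);
}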
Can you elaborate? Do you mean that it should be failing, since the perf test is throwing an error (that's being silently ignored)? I overall agree, but then I'll also need to fix the underlying issue -- which I think is only present on this branch (but will go away once merged to master). As an alternative, maybe someone with macOS can run it locally?
Ideally yes; at least it shouldn't fail (silently or not). In other PRs, the tests seem to be passing properly. (I do see the console.log in the output)
Oh, now I understand. This is only failing because the file has been renamed in master. This makes me think the same will happen when we do …
I think the perf command should be using master for the tests and the results. At least that's how I wrote it/imagined it. That way it works regardless of the branch you're testing. The branches are only used to set up the WP+gutenberg environment.
@@ -147,13 +149,13 @@ async function getPerformanceResultsForBranch(
	const results = [];
	for ( let i = 0; i < 3; i++ ) {
		await runShellScript(
-			'npm run test-performance',
+			`npm run test-performance -- packages/e2e-tests/specs/performance/${ testSuite }.test.js`,
So this seems to be the change that's causing issues in the GitHub action. We're changing the API of npm run test-performance. Can't we just leave it as is, i.e. run all performance tests when you do npm run test-performance? This is documented already in several places, and I'm not sure we should be breaking this assumption.
Alternatively, we could have:
npm run test-performance // runs post editor tests
npm run test-performance --suite="posteditor" // runs post editor tests
npm run test-performance --suite="site-editor" // runs site editor tests
This will ensure we don't break the API.
I think that's only one side of the coin though, since master only produces a single results.json file for the post editor, whereas this branch renames that file to post-editor.results.json and writes the site editor results to site-editor.results.json.
In fact, an earlier version of this PR had npm run test-performance (with no added args), and that caused the GH action to fail when looking for post-editor.results.json, since master had only produced results.json.
I'm not sure we can retain API stability this way and still add test coverage for a different test suite. We'd be extremely limited in our options, and things might end up rather hacky (something like adding site-editor-specific fields to results.json while retaining the post-editor ones without even renaming them -- that sounds like it could become quite messy and hard to understand/maintain/extend quickly).
Overall, I'm not convinced that's the right approach; requiring this kind of API stability basically means we can never significantly change the perf tests, and that's a really tough constraint, since it's hard to anticipate whatever needs might arise in the future.
Wouldn't it make sense for a branch that changes the perf tests to use those tests both on that branch and on master?
Mmm you're right, I can see that. I guess an alternative here could be to make ./bin/plugin/cli.js take an optional "branch" where, if specified, we run the perf tests from that branch instead of "master", and we could use that argument in the GitHub action? Any thoughts on that?
Regardless, though, if we change the signature of npm run test-performance we should make sure to update the docs; having a --suite option or something like that is a bit nicer than having to pass the file.
Sorry, still a few more questions 😅
Mmm you're right, I can see that. I guess an alternative here could be to make ./bin/plugin/cli.js take an optional "branch" where, if specified, we run the perf tests from that branch instead of "master", and we could use that argument in the GitHub action? Any thoughts on that?
This is actually an interface where I am wondering if we need to add more options: Do we need the extra argument? Doesn't it make sense to always use the branch's tests? For most branches, those will be identical to master's anyway, so it doesn't really make a difference there. But if a branch does modify them, we surely wanna use them, no?
Regardless, though, if we change the signature of npm run test-performance we should make sure to update the docs; having a --suite option or something like that is a bit nicer than having to pass the file.
I don't mind adding that. Personally, I actually like the 'pass-through' approach (i.e. the npm ... -- ... syntax), mostly for its discoverability -- I didn't really look at any code but just tried it, and it did the right thing. I find that quite intuitive (and it's always the first thing I try when needing to pass a filename to an npm script). I prefer this uniform syntax over the 'nice' one, but I'm happy to go along with the 'nice' one.
Finally: To make the actual change, we'll have to insert the branch's test files into the wp-env container. As I said, this might be kinda painful (or even near-impossible currently) to test on my Linux box, where there's always some permissions issue when attempting to add or change files in those Docker images (and I haven't been able to fully track down that issue for a while -- wp-env works for most purposes for me, just not for stuff that requires write access, e.g. uploading media files). My alternative suggestion would be to merge without those changes, verify that the new perf test behavior works fine (and revert if it doesn't), and tackle that perf test file injection in a separate PR. Would you be okay with that? 😬
Do we need the extra argument?
We need to run the same tests across branches: ./bin/plugin/cli.js branchA branchB runs the master tests in the environments built using branchA and branchB and compares both.
I was suggesting something like ./bin/plugin/cli.js branchA branchB --testsBranch branchC to run the branchC tests in the environments built using branchA and branchB and compare both.
In the GitHub action, branchC will be equal to branchA, but that's fine.
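Concretely, the suggestion maps to something like the following in bin/plugin/cli.js (illustrative sketch assuming a commander-style CLI; the command and option names are placeholders, not necessarily the final ones):
// Sketch: an optional tests branch that falls back to master's test files
// when not provided.
const { program } = require( 'commander' );

program
	.command( 'perf <branches...>' )
	.description( 'Runs performance tests on two branches and compares them' )
	.option(
		'--tests-branch <branch>',
		'branch whose performance test files should be used (defaults to master)'
	)
	.action( ( branches, options ) => runPerformanceTests( branches, options ) );

program.parse( process.argv );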
--suite config
I don't feel strongly here, the important thing is that the docs stay correct whichever choice you make.
Finally: To make the actual change, we'll have to insert the branch's test files into the wp-env container.
I think it's just a matter of calling git checkout branchC at the beginning of the perf command (when setting up the perf environment directory).
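In runPerformanceTests, that checkout step could look roughly like this (sketch; options.testsBranch and performanceTestDirectory are assumed names, not necessarily the real ones):
// Sketch: use the test files from the requested branch before running the suites.
if ( options.testsBranch ) {
	log( `>> Checking out the ${ options.testsBranch } branch for the test files` );
	await runShellScript(
		`git checkout ${ options.testsBranch }`,
		performanceTestDirectory
	);
}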
The rare situations where we had to make something work differently in different branches have so far come down to selector changes for opening the inserter, and that doesn't have an impact on any metric.
The metrics are simple: type a character, click on a block, load the page... These are the things that need to be computed similarly across branches, and they don't differ depending on the branch the test is used for. So for me personally, I think the testsBranch argument is doing its work properly so far.
I will take your word for it; I don't understand what "doing its work properly" means here, other than that there was a PR where filenames changed and something crashed because of the lack of a tests branch. Thanks for helping clarify this! It's not a big problem that I'm seeing; I was mostly just trying to remove things I don't think we need.
For instance, if we change code that breaks the performance tests, we need to update the tests in the same PR so they run with the new code, but we also need to ensure the tests continue to run properly when applied to trunk (the previous code).
@youknowriad I'm coming back to this and trying to better understand what you wrote here; I'm wondering if you are saying that you suspect that test changes will work in a PR but then fail when run on trunk.
One thing I discovered while working on our test suites is that we don't run tests against the PR's branch, but the "PR's merge branch." The merge branch is a branch which tracks the PR, but every time we make a commit it creates a hypothetical merge of that branch tip into the target branch (usually trunk).
What this means is that if in a PR we make changes to a test suite, then when those changes run in CI and we're relying on $GITHUB_SHA (as we do), the tests already incorporate the issues of merging back into trunk. If the new test files would break in trunk, they would also break on the $GITHUB_SHA.
I know that at that point in that PR we're comparing two different test suites, but by providing --tests-branch we're changing the equation from "the results in the PR comparing trunk to the changes are invalid" into "the results in trunk comparing one commit to the previous commit are invalid."
Since we only report performance changes in mainline trunk, I'm still struggling to see how --tests-branch helps us achieve the goal we say it does.
cc: @ockham
But also, say we change the way we compute some metric ("typing" or something like that): if we decide to run different tests on different branches (say, each branch's own tests), the results will be completely different between the branches, so comparing them wouldn't make sense anymore.
This is another thing I think I've been uncovering as a hidden assumption, that our results are consistent within test runs. Even with running each branch 3x I'm seeing routine variation between the branches on the perf tests, frequently hitting more than 2x difference when no difference exists. I'm going to start trying to gather some statistics on the values from individual PR runs so we can discuss the facts of the measurements we're making, but I know already for a fact that the way we talk about these measurements in the PRs doesn't match the reality I've seen.
For example, I make a change to README.md, and in the performance tests (after 3x runs and taking the median) it shows that my branch loads the editor in half the time it does in trunk, and yet trunk is 2x faster when typing. The test code is the same, the code under test is the same, but our test runtime variance is so much wider than the time it takes to run the code we're testing that our results are probably mostly random variation.
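To make the 3x-runs-and-median aggregation concrete, this is roughly the computation involved (sketch; the timings below are made up):
// Sketch: median of an array of timings. With only three runs per branch,
// a single noisy run can shift the median substantially.
const median = ( numbers ) => {
	const sorted = [ ...numbers ].sort( ( a, b ) => a - b );
	const middle = Math.floor( sorted.length / 2 );
	return sorted.length % 2
		? sorted[ middle ]
		: ( sorted[ middle - 1 ] + sorted[ middle ] ) / 2;
};

// Hypothetical load times (ms) from three runs of the same branch:
console.log( median( [ 480, 950, 1010 ] ) ); // 950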
For example, I make a change to README.md and in the performance tests (after 3x runs and taking the median) it shows that my branch loads the editor in half the time it does in trunk and yet trunk is 2x faster when typing.
If this is happening for any of the metrics we track in https://codehealth.vercel.app then it's a problem, yes. But in my experience I've only seen it in the site editor metrics. These are more recent and we haven't worked on their stability yet, so I don't think we can trust them just yet; they're just a vague indication for now. But for the post editor metrics tracked on the site, I've not seen any major stability issues like that. I do see small variances (you can look at the graphs), but any important change has correlated well with an impactful commit.
Just for some context, @youknowriad, I'm trying to start gathering data on this in #45747, and in at least the first batch of 30 test runs I found that 22 of them showed results at least as different on the type event (in the post editor) as the reported change from 14.4 to 14.5 was: 10 of them were more than that many ms faster, and 12 were more than that many ms slower.
How are you measuring variance and correlation? I have to be missing something because it looks like the data is explained primarily by random variation. That is, if there is an impact, I'd like to know how you were able to pull it out of the noise caused by random variation which seems to have a bigger impact than any reported change.
In the meantime I'll spin up a PR that specifically tests that release and see if I can reproduce the measurements you found.
I think there's a mention of npm run test-performance in testing-overview.md that needs to be updated; otherwise, this is looking good.
Thanks!
👍
Should be ready for final approval 🎉
};

	// Remove results for which we don't have data (and where the statistical functions thus returned NaN or Infinity etc).
	const finiteMedians = pickBy( medians, isFinite );
Does this have an impact on the WPFormattedPerformanceResults type? Things should be made optional there, right?
I thought I was doing that here? https://github.com/WordPress/gutenberg/pull/23842/files#diff-f7011184c69b1a3bacc4458a3cc16446R51-R58 😅 🤔
missed this somehow 😬 😄
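For reference, the kind of optionality being discussed looks roughly like this (illustrative typedef; the field names are placeholders, not the PR's exact code):
/**
 * @typedef {Object} WPFormattedPerformanceResults
 *
 * Illustrative sketch: fields are optional because a given suite may not
 * produce every metric (e.g. the site editor suite only measures load time).
 *
 * @property {number} [load]  Median page load time, in milliseconds.
 * @property {number} [type]  Median time per typed character, in milliseconds.
 * @property {number} [focus] Median time to select a block, in milliseconds.
 */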
Description
Add some performance test coverage to the site editor (loading time), specifically to avoid performance regressions with regard to template resolution and wp_template auto-draft creation (see e.g. #23662). This changes some of the underlying performance test framework to allow for multiple test files.
Fixes #23793.
How has this been tested?
npm run test-performance
In order to run the newly added tests interactively, use npm run test-performance -- site-editor.test.js --puppeteer-interactive.
Types of changes
Increase performance test coverage.
Question
Am I doing this right? Comparing the results file to the Post Editor's, the latter's loading time is 10x the Site Editor's. This seems almost too good to be true (although it's true that the Site Editor probably loads fewer dependencies than the Post Editor). Is it possible to run perf/e2e tests and watch what they're doing?
Notes
Note that if you inspect this PR's 'Performance Tests' GH Action details and expand the 'Run the performance tests' section there (and scroll to the bottom), you'll find that it actually fails upon trying to run npm run test-performance -- packages/e2e-tests/specs/performance/post-editor.performance.test.js. That is a file that was previously named performance.test.js and is being renamed by this PR. It is being invoked by bin/plugin/commands/performance.js, which is also changed accordingly by this PR.
IIUC, the reason for the failure is that the GH Action is meant to compare performance for various branches, so it actually clones the master branch (which still has the old filenames) and runs the perf tests there (inside of a wp-env container that it spins up for that branch), but still uses the files from this branch to look for those test suite and results files. I don't see an easy fix, since we'd have to inject our updated files from this branch into that container. I think it should be fine to merge this branch, as the GH Action should start to work properly after that.
Checklist: