[HOLD for payment 2023-01-09] [Tracking] Add performance CI tests #11711
Comments
cc @hannojg Can you please comment on this so I can assign it to you?
Yes 👋
@AndrewGable, @hannojg Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
PR in review
PR was merged; however, we still have a lot of testing to do to confirm a few questions:
After testing via this PR:
Problem: We are trying to configure cloud machines to spin up an emulator quickly enough to run the performance tests. However:
Solution: Skip running the tests on emulators on CI machines and run them on physical devices via AWS Device Farm. I'm still validating that this idea is possible, but I've confirmed we can at least start the tests via AWS Device Farm. I think the real question will be whether they are fast enough to give reliable results. Once we prove the POC, we can hook it up to
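To make the Device Farm idea concrete, here is a minimal sketch (not the actual setup from the PR) of what kicking off such a run could look like with the AWS SDK v3 Device Farm client. All ARNs below are placeholders, and the test type/packaging is an assumption:

```ts
// Hypothetical sketch: schedule an e2e run on AWS Device Farm via the AWS SDK v3.
// Every ARN below is a placeholder; in a real setup these would come from CI secrets/outputs.
import {DeviceFarmClient, ScheduleRunCommand, GetRunCommand} from '@aws-sdk/client-device-farm';

const client = new DeviceFarmClient({region: 'us-west-2'}); // Device Farm only exists in us-west-2

async function scheduleE2ERun(): Promise<string> {
    const {run} = await client.send(
        new ScheduleRunCommand({
            projectArn: 'arn:aws:devicefarm:us-west-2:123456789012:project:EXAMPLE',
            appArn: 'arn:aws:devicefarm:us-west-2:123456789012:upload:EXAMPLE-APP', // the APK upload
            devicePoolArn: 'arn:aws:devicefarm:us-west-2:123456789012:devicepool:EXAMPLE',
            name: 'e2e-performance-tests',
            test: {
                type: 'APPIUM_NODE', // Appium tests are supported out of the box by Device Farm
                testPackageArn: 'arn:aws:devicefarm:us-west-2:123456789012:upload:EXAMPLE-TESTS',
            },
        }),
    );
    return run?.arn ?? '';
}

// Poll until the run finishes, then inspect run.result ('PASSED', 'FAILED', ...).
async function waitForRun(runArn: string) {
    for (;;) {
        const {run} = await client.send(new GetRunCommand({arn: runArn}));
        if (run?.status === 'COMPLETED') {
            return run;
        }
        await new Promise((resolve) => setTimeout(resolve, 60_000));
    }
}
```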
Discussing in Slack with @hannojg, but I was unable to get the custom Node HTTP server working on the AWS Device Farm machines. I think looking into whether Appium can support our tests might be the best next move, since Appium tests are supported out of the box by AWS Device Farm.
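For context, the "custom node http server" is presumably a small server the test runner hosts so the app on the device can report its timing results back. The actual implementation lives in the repo; this is only a minimal sketch of the idea using Node's built-in `http` module:

```ts
// Minimal sketch (not the repo's actual implementation) of a Node HTTP server that the
// app under test could POST timing measurements to while the runner aggregates them.
import * as http from 'node:http';

type Measurement = {name: string; durationMs: number};
const measurements: Measurement[] = [];

const server = http.createServer((req, res) => {
    if (req.method === 'POST' && req.url === '/results') {
        let body = '';
        req.on('data', (chunk) => (body += chunk));
        req.on('end', () => {
            measurements.push(JSON.parse(body) as Measurement);
            res.writeHead(200);
            res.end('ok');
        });
        return;
    }
    res.writeHead(404);
    res.end();
});

// The device needs network access to this host/port, which is exactly what was hard to
// get working on the Device Farm machines without a VPC/private device setup.
server.listen(8080, () => console.log('Listening for test results on :8080'));
```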
I've reached out to AWS Device Farm support via a support case, chatted with them on the phone, and provided code details. They are circling back internally, and I will let you know when I hear back from them.
My guess, after trying hard yesterday to get our simple HTTP server implementation working, is that it would work with a "private device" for which we can set up a VPC (https://docs.aws.amazon.com/devicefarm/latest/developerguide/amazon-vpc-endpoints.html). However, that setup might not be feasible, so I am currently migrating to Appium.
@hannojg and I are getting on a call with the AWS Device Farm team today
Update from our call with AWS Device Farm:
We had our first successful run via AWS device farm for Android e2e tests 🎉 Now we just need to improve and tweak things.
Here is our working list:
I've done 1 & 2 on this branch/fork: https://github.com/AndrewGable/App/pull/1. I think the last part of this will be figuring out how to configure AWS to report the results back. I believe this can be done using the "artifacts" as mentioned here: https://github.com/realm/aws-devicefarm#download-artifacts
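To sketch what "report the results back" could look like once the artifacts are downloaded: a CI step could read a results file out of the artifacts directory and turn it into a markdown summary. The directory layout, file name, and result shape below are assumptions for illustration, not what the action actually produces:

```ts
// Hypothetical sketch: parse a results file out of the downloaded Device Farm artifacts.
// 'Customer Artifacts/output.json' and the TestResult shape are assumed, not documented output.
import * as fs from 'node:fs';
import * as path from 'node:path';

type TestResult = {name: string; meanDurationMs: number; baselineMeanDurationMs: number};

function summarizeArtifacts(artifactsDir: string): string {
    const resultsPath = path.join(artifactsDir, 'Customer Artifacts', 'output.json'); // assumed name
    const results = JSON.parse(fs.readFileSync(resultsPath, 'utf8')) as TestResult[];

    const rows = results.map((r) => {
        const delta = r.meanDurationMs - r.baselineMeanDurationMs;
        const sign = delta >= 0 ? '+' : '';
        return `| ${r.name} | ${r.baselineMeanDurationMs.toFixed(1)} ms | ${r.meanDurationMs.toFixed(1)} ms | ${sign}${delta.toFixed(1)} ms |`;
    });

    return ['| Test | Baseline | Current | Delta |', '| --- | --- | --- | --- |', ...rows].join('\n');
}

console.log(summarizeArtifacts(process.argv[2] ?? './artifacts'));
```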
Here is an example of a passing test run: https://github.com/AndrewGable/App/actions/runs/3382629912/jobs/5617747052. It took 45 minutes. I think we can definitely figure out how to make this more efficient, but we can focus on that after steps 3 & 4 are done.
@rafecolton enabled the large runners again for us, so now we're seeing how fast the tests run 🚀
Large runner results:
Total time: ~30 minutes vs. ~45 minutes
Nice improvement!
I still worry this is too long to roll out in its current iteration; our current tests run in under 5 minutes. I don't think we can expect to run these tests in that time, but I think we need to get them under a certain amount of time or increase their value.
Totally. So what do you think that number is out of curiosity? 10? 15? Something else? |
If we are going to run it on every commit, I think ~10 minutes would be OK. I don't know if it's realistic that we will ever be able to speed these tests up that much, so I think we might have to do something "outside the box". One idea is that we could run these tests on merge, then report the results back to the PR; if a PR introduces a performance regression, we could revert it, or something similar. Otherwise, we could run these via a label, or only on large PRs, etc. I don't have an exact proposed solution yet, but I'm curious for thoughts from @Expensify/mobile-deployers @marcaaron?
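For the "report the results back to the PR" idea, the mechanics could be as simple as a post-merge workflow step that comments on the originating PR via the GitHub API. A rough sketch using `@actions/github`; the PR-number lookup and comment body are illustrative only:

```ts
// Illustrative sketch of commenting performance results back onto the PR that was just merged.
// Assumes this runs in a GitHub Actions workflow on push to main with GITHUB_TOKEN available.
import * as github from '@actions/github';

async function postResults(prNumber: number, summaryMarkdown: string): Promise<void> {
    const octokit = github.getOctokit(process.env.GITHUB_TOKEN as string);
    await octokit.rest.issues.createComment({
        owner: github.context.repo.owner,
        repo: github.context.repo.repo,
        issue_number: prNumber,
        body: `🏎️ Performance test results:\n\n${summaryMarkdown}`,
    });
}
```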
I would love to have the most coverage possible, so this is an ideal place to start. Given that there is usually time between merge and QA, the 10 min "delay" also feels OK. In any case, I'm not the real decision maker in this department. 😄
Maybe a more refined proposal would be:
Sounds good @AndrewGable, but can we please make sure that, unlike the post-merge test and lint jobs we already have, the long-running tests do not block the deploy until they finish? i.e. assume that PRs do not introduce a performance regression, deploy them to staging, and then, only if the tests do not pass, post a comment and mark the PR as a deploy blocker.
End of week update: We have a workable solution here, but we're still talking with AWS to juice up the Device Farm side of things.
I am attempting to split the e2e tests up from the rest of the source code to reduce the size of the ZIP we send to the Android phone via AWS. However, I have been blocked today by a broken Android build. |
I was able to squeeze some more performance improvements out of the test; total time is now ~23 minutes:
I will work on the final piece which is running this in the "Process new code merged to main" workflow. |
Chiming in late, but this seems reasonable to me. Another option could be to require the test in CI but manually trigger it with a comment like "I am ready for my performance test now". Maybe that is too manual 😂 |
Running into an issue with linting as described here: SchemaStore/schemastore#2579
Trying to finish this right now; the next steps are:
PR finally in review 😮💨
Woo! P.S. I didn't cover this in the regular open source update today, but I was going to profile it in the Margelo room tomorrow.
Nice, confirmed these are now running on merge into `main`.
These are running, there are some further separate improvements to make (e.g. adding more tests and mocking API responses), but I am going to consider this one done for now. |
The solution for this issue has been 🚀 deployed to production 🚀 in version 1.2.46-0 and is now subject to a 7-day regression period 📆.
Here is the list of pull requests that resolve this issue:
If no regressions arise, payment will be issued on 2023-01-09. 🎊
After the hold period, please check if any of the following need payment for this issue, and if so check them off after paying:
As a reminder, here are the bonuses/penalties that should be applied for any External issue:
If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!
Action Performed:
Open a pull request in NewDot.
Expected Result:
A suite of performance tests should run, so that we know, before a PR is merged, whether it creates a statistically significant performance regression.
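As a rough illustration of what "statistically significant" could mean here (this is not the project's actual implementation): compare timing samples from the baseline commit against the PR commit, and only flag a regression when the difference clears a significance test. A minimal sketch using Welch's t-statistic with an approximate critical value, assuming several timing samples per scenario:

```ts
// Hedged sketch: flag a regression only when the mean duration increased AND the
// difference is statistically significant (Welch's t-test, approximate critical value).
function mean(xs: number[]): number {
    return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function variance(xs: number[]): number {
    const m = mean(xs);
    return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

function isSignificantRegression(baselineMs: number[], candidateMs: number[], criticalT = 1.96): boolean {
    const meanDiff = mean(candidateMs) - mean(baselineMs);
    if (meanDiff <= 0) {
        return false; // the candidate is not slower, so no regression
    }

    // Welch's t-statistic for two samples with (possibly) unequal variances.
    const se = Math.sqrt(variance(baselineMs) / baselineMs.length + variance(candidateMs) / candidateMs.length);
    const t = meanDiff / se;

    // 1.96 approximates the two-sided 5% critical value for large samples; a real
    // implementation would use the t-distribution with Welch–Satterthwaite degrees of freedom.
    return t > criticalT;
}

// Example usage: with, say, 20 timing samples per commit for the same scenario,
// isSignificantRegression(baselineSamples, prSamples) === true → mark as a potential regression.
```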
Actual Result:
We don't have performance CI tests.
Workaround:
Just don't break stuff™
Platform:
n/a
View all open jobs on GitHub