Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve End-to-End Test Management and Upgrade/Downgrade Testing #16365

Closed
systay opened this issue Jul 11, 2024 · 4 comments · Fixed by #16494
Closed

Improve End-to-End Test Management and Upgrade/Downgrade Testing #16365

systay opened this issue Jul 11, 2024 · 4 comments · Fixed by #16494
Labels
Type: RFC Request For Comment

Comments

@systay
Copy link
Collaborator

systay commented Jul 11, 2024

Introduction

Currently, our process for handling bug fixes and end-to-end tests faces several challenges, particularly with version-specific test skipping and backport management. Although GitHub contains all necessary information about issues, fixes, and backports, our existing system does not efficiently leverage this data, leading to persistent test skipping issues and inadequate upgrade/downgrade testing coverage. This proposal outlines the current issues and suggests improvements to enhance our testing process.

Current System and Identified Issues

In the current system, bug fixes are created against the main branch. Each bug fix should include at least one end-to-end test that should fail without the fix and pass with it. For upgrade/downgrade testing, tests must stop running if the vtgate component in the system is older and doesn't have the fix. This is managed using the utils.SkipIfBinaryIsBelowVersion(t, 21, "vtgate") call in Go. When a bug fix PR is deemed important, a backport label is added, and the bot backports it to all relevant branches.

However, there are three main issues with this approach:

First, the backported PR still contains the SkipIfBinaryIsBelowVersion call, which now references the incorrect version, causing inaccurate skipping of tests. Second, both the main and other branches will continue to skip the test because the SkipIfBinaryIsBelowVersion remains, and no mechanism updates the branches once fixes are merged. Third, upgrade/downgrade testing only tests against the latest release of the branch rather than the current release branch, potentially leading to unexpected issues post-release. In other words - we only test against the release tags, not against the release branches.

To improve the system, consider the following preferable solutions:

End-to-end tests should track which issues they are fixing, potentially through code annotations. By leveraging this tracking information together with the existing data in GitHub about issues, fixes, and backports, we can accurately calculate which tests need to be run. This approach would eliminate the need for any branch to keep track other branches a fix has been ported to, thereby resolving the issues of incorrect test skipping, persistent skips after merges, and inadequate upgrade/downgrade testing coverage.

By implementing these changes, the issues of incorrect test skipping, persistent skips after merges, and inadequate upgrade/downgrade testing coverage can be resolved, leading to a more reliable and maintainable testing process.

@systay systay added Type: RFC Request For Comment Type: Testing and removed Type: Testing labels Jul 11, 2024
@shlomi-noach
Copy link
Contributor

This sounds good. The one thing I'd say is that we need to validate that both the linter (Extracting the annotation) as well as GitHub API - are producing the right result (or some result). So there needs to eb a test that validates that we got something.

@frouioui
Copy link
Member

I like the idea of code annotations, it can even make a test more maintainable as we will, in some ways, link it to a GH Issue/PR.

One thing to take into consideration is that the test will now need to issue, at least, a GET command to GH's API or something else, which can slow down or impact the experience for some users.

Something worth noting, in GitHub Actions, the GitHub API rate limit is set to 1000 requests per hour across all actions. It will be hard to reach that number if only a small amount of tests execute API calls, even in an hour, but something to keep in mind. Link to GH docs here.

@deepthi
Copy link
Member

deepthi commented Jul 17, 2024

There seem to be 2 separate issues

  1. We are skipping some tests in upgrade/downgrade testing depending on the version
  2. Upgrade/downgrade testing is based on the tag of older releases versus the latest commit on an older release branch.

For the first issue

  • is it mostly vtgate tests where we have the problems with skipping tests?
  • why isn't it enough to make sure to run a version of the test that matches the vtgate version? For example, if we are running a test on release-20.0 that is testing downgrading to v19, why should we run the release-20.0 version of a test, which then requires us to conditionalize new tests? why not run the release-19.0 version of the test?
  • so far this example only looks at vtgate. we probably have similar problems with other components that also need to be solved. I'm offering up this as an idea to be explored because it might enable us to remove a large number of if conditions from test code. we'll need to see how applicable it is to other components like vtctld.

@frouioui
Copy link
Member

First Issue

I think it would be fine to run a version of the test that matches the vtgate version. For instance, when downgrading to vtgate N-1 we would checkout to the SHA we are downgrading to and run the tests located on that tag. As stated in #7344 the purpose of the query-serving upgrade/downgrade tests is primarily to make sure the cluster can operate correctly as quoted below, not necessarily that every query have to be compatible between N-1 and N. Which means that by running the same test version as the vtgate version we run would still respect this criteria.

Ability for a cluster to operate correctly in the midst of an upgrade / downgrade. E.g. vtctld and vttablets are still running but vtgate is restarted with the new version.

Applying this new strategy where we checkout to the proper SHA would save us from having to use the Skip methods. And from what I am seeing, we mostly skip tests in vtgate's end-to-end tests (both queries and schema tracking). With the exception of the PRS and Backup tests where we have different assertions based on the version of vttablet and vtctl, and these different assestion can even be removed today as they are based on v16 vs v17.

I am okay with this approach.

Second Issue

For the second issue, from what I am seeing we originally always used the previous release tag, even in the very first PR that added non-automated upgrade/downgrade testing: https://github.com/vitessio/vitess/pull/7294/files#diff-48dcda0e0b100dde692989e67a1ad23e97c3da464035698f53bd60528cf1420c. Which was then carried on by #8471, followed by #9300 and then #9745. We talked about this issue in August 2022 in an offline discussion, snippet below.

- Pros of running with patch release as previous release is that this is the most common use case and we want to exercise that use case

- Cons are that if we break something on release-14.0 branch the downgrade test on main won’t catch it

- Manan is proposing that we also add an upgrade test on 13.0 which tests upgrading to 14.0. That way if we add something on 13.0 that breaks upgrade we’ll find it. Similarly for 14.0 to main

We have implemented what manan proposed, but it still does not solve the issue where we don't know if the next version on release-20.0 will be compatible with the next version on main. I agree it is debatable as we should not introduce breaking changes on release branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: RFC Request For Comment
Projects
None yet
4 participants