run regression tests in github action in kind #2997
Regression tests are those that require the env variable RUN_KUBE2E_TESTS=1. There are 2 kinds of regression tests in the test/kube2e/... folder: those that also require CLUSTER_LOCK_TESTS=1, and those that don't. Running the cluster-lock tests in a GitHub Action, including setup, takes ~22m 9s (see: https://github.com/solo-io/gloo/runs/709923886). There are 2 issues with running all regression tests in the GitHub Action:
Attempts were made to use https://github.com/nektos/act to reproduce this error locally, but we're using the setup-helm and setup-kind GitHub Actions, which are poorly handled because they need the
If the end goal here is to improve CI time, we should:
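The gating described above (RUN_KUBE2E_TESTS=1 for every regression test, plus CLUSTER_LOCK_TESTS=1 for the cluster-lock ones) can be sketched as a small Go helper. This is only an illustration: `shouldRun` and its parameters are hypothetical names, not taken from the gloo test suite, and whether the non-cluster-lock tests still run when CLUSTER_LOCK_TESTS=1 is set is an assumption here.

```go
package main

import (
	"fmt"
	"os"
)

// shouldRun reports whether a regression test should run under the given
// environment. It is a hypothetical helper, not gloo's actual gating code.
// getenv is injected so the logic can be exercised without touching the
// real process environment.
func shouldRun(requiresClusterLock bool, getenv func(string) string) bool {
	// Every regression test needs RUN_KUBE2E_TESTS=1.
	if getenv("RUN_KUBE2E_TESTS") != "1" {
		return false
	}
	// Cluster-lock tests additionally need CLUSTER_LOCK_TESTS=1.
	if requiresClusterLock {
		return getenv("CLUSTER_LOCK_TESTS") == "1"
	}
	// Assumption: tests that do not need the lock run whenever
	// regression tests are enabled.
	return true
}

func main() {
	fmt.Println("run cluster-lock tests:    ", shouldRun(true, os.Getenv))
	fmt.Println("run non-cluster-lock tests:", shouldRun(false, os.Getenv))
}
```

Running it with `RUN_KUBE2E_TESTS=1 CLUSTER_LOCK_TESTS=1 go run .` versus `RUN_KUBE2E_TESTS=1 go run .` shows which of the two groups a given CI job would pick up.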
I'll leave this issue open, as we still want to figure out how to run the non-cluster-lock regression tests in a GitHub Action. Here is another example with extra output for the regression test failure: https://github.com/solo-io/gloo/pull/3066/checks?check_run_id=711013005. The reproducible error looks like:
One possible cause is that gloo has an internal error:
@ashleywang1, does this error also occur every time when running locally in KinD, or only in GitHub Actions?
I'm also really concerned about this flake, with a GitHub Action running only the cluster-lock regression test case: it ran for 45 minutes, failed, and didn't output any logs: https://github.com/solo-io/gloo/pull/3066/checks?check_run_id=713448708
The error with the regression tests turned out to be this:
Here is where the error is coming from: https://github.com/grpc/grpc-go/blob/6b9bf4296edc5fae722a5dff887a954ffc599b12/rpc_util.go#L547. After talking with @EItanya, it seemed that we're "somehow using the wrong marshaller or the wrong version of the correct marshaller", specifically the gogo proto golang lib. This happened because we were running the following setup step:
We noticed a diff in go.mod and go.sum that included protobuf dependencies:
Fixing it to this setup step fixed the issue:
Reopening, as we are moving to kind in stages. Right now the cluster-lock tests are the only ones running in kind.
Is your feature request related to a problem? Please describe.
Allows for clean slate for each CI run
Describe alternatives you've considered
Spin up a new GKE cluster for each CI run
Additional context
Would make CI a lot more resilient