Support multi-datacenter clusters in test runner and script #1881

Merged

rzetelskik
Member

@rzetelskik rzetelskik commented Apr 4, 2024

Description of your changes: This PR adds support for multi-datacenter clusters in our test runner and scripts.

Which issue is resolved by this Pull Request:
Prerequisite for #1632.

/kind feature
/priority important-longterm
/cc

Contributor

@rzetelskik: GitHub didn't allow me to request PR reviews from the following users: rzetelskik.

Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Description of your changes: WIP

Which issue is resolved by this Pull Request:
Resolves #

/kind feature
/priority important-longterm
/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@scylla-operator-bot scylla-operator-bot bot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 4, 2024
@rzetelskik rzetelskik force-pushed the multi-region-support branch from 10d5a2a to 9ff84af Compare April 4, 2024 11:45
@rzetelskik rzetelskik changed the title from "[WIP] Support multi-dc clusters in test runner and script" to "Support multi-dc clusters in test runner and script" Apr 4, 2024
@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2024
@rzetelskik
Member Author

/cc zimnx tnozicka

@scylla-operator-bot scylla-operator-bot bot requested review from tnozicka and zimnx April 4, 2024 12:55
@rzetelskik
Member Author

rzetelskik commented Apr 11, 2024

#1525 (comment)

#1694 (comment)

I don't see how these changes could contribute to the flakiness, since they only touch machinery, so I'm not investigating further.

/test images
/retest

@rzetelskik
Member Author

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-parallel | 9ff84af | link | true | /test e2e-gke-parallel |
Full PR test history. Your PR dashboard.

Cluster provisioning failed.
/test images
/retest

@rzetelskik rzetelskik changed the title from "Support multi-dc clusters in test runner and script" to "Support multi-datacenter clusters in test runner and script" Apr 14, 2024
@rzetelskik rzetelskik force-pushed the multi-region-support branch 2 times, most recently from 325c6b6 to 61326f2 Compare April 14, 2024 10:54
@rzetelskik
Member Author

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-serial | 61326f2 | link | true | /test e2e-gke-serial |
Full PR test history. Your PR dashboard.

Cluster provisioning failed.
/retest

@scylla-operator-bot scylla-operator-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2024
@rzetelskik rzetelskik force-pushed the multi-region-support branch from 61326f2 to d3b46ef Compare April 18, 2024 08:15
@scylla-operator-bot scylla-operator-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 18, 2024
@rzetelskik
Member Author

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-parallel-clusterip | d3b46ef | link | true | /test e2e-gke-parallel-clusterip |
Full PR test history. Your PR dashboard.

Known Scylla Manager flake. Already working on it, this PR is unrelated.

/retest

@rzetelskik rzetelskik force-pushed the multi-region-support branch 2 times, most recently from 255f6d4 to 9f0d333 Compare April 29, 2024 10:46
@rzetelskik rzetelskik requested a review from zimnx April 29, 2024 10:48
@rzetelskik
Member Author

@zimnx thanks for the review, I replied to all of your comments

@rzetelskik rzetelskik force-pushed the multi-region-support branch from 9f0d333 to 66167d5 Compare April 29, 2024 11:21
@rzetelskik rzetelskik changed the title from "[WIP] Support multi-datacenter clusters in test runner and script" to "Support multi-datacenter clusters in test runner and script" May 31, 2024
@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 31, 2024
@rzetelskik rzetelskik requested a review from tnozicka May 31, 2024 13:04
Contributor

@tnozicka tnozicka left a comment

Thanks for the updates!

I think this is a good start and it bumps into multiple design issues at once. It brings a lot of value and I think we can iterate on the scripts or framework interfaces in the future, so it's not set in stone. It will be easier to address each of those nits individually, if needed.

/approve
lgtm, but you need to fix the CI failure

@rzetelskik
Member Author

lgtm, but you need to fix the CI failure

I sent a PR to fix the typo on the CI side: https://github.com/scylladb/scylla-operator-release/pull/206

@tnozicka
Contributor

tnozicka commented Jun 3, 2024

/lgtm

@scylla-operator-bot scylla-operator-bot bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2024
@rzetelskik
Member Author

rzetelskik commented Jun 3, 2024

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gke-release-script-latest | d03fb8e | link | unknown | /test e2e-gke-release-script-latest |
Full PR test history. Your PR dashboard.

Waiting for cluster to be provisioned...
Cluster provisioning failed. Exiting.
Missing kubeconfigs.
Usage: /usr/bin/bash kubeconfig [kubeconfig ...]

@tnozicka should I fix this in scylla-operator-release (pass kubeconfig to funcs) or make them discover kubeconfigs here?

@rzetelskik rzetelskik force-pushed the multi-region-support branch from d03fb8e to 2d65286 Compare June 3, 2024 08:37
@scylla-operator-bot scylla-operator-bot bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2024
@tnozicka
Contributor

tnozicka commented Jun 3, 2024

I'd use an env var for KUBECONFIGS to match how everything else handles KUBECONFIG. If KUBECONFIGS is set, it wins over KUBECONFIG; if only KUBECONFIG is set, translate it to KUBECONFIGS[0] so you can use KUBECONFIGS consistently, if needed.
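
A minimal sketch of the precedence rule described above, not the code merged in this PR. The function name `resolve_kubeconfigs` is hypothetical, and it assumes KUBECONFIGS arrives as a comma-separated string, since the thread does not say how the list would be encoded. A KUBECONFIG value is passed through whole as the single entry.

```shell
# Hypothetical helper illustrating "KUBECONFIGS wins over KUBECONFIG".
# Assumption: KUBECONFIGS is a comma-separated list of kubeconfig paths.
# Prints one resolved kubeconfig path per line.
resolve_kubeconfigs() {
  if [ -n "${KUBECONFIGS:-}" ]; then
    # KUBECONFIGS takes precedence; split it on commas.
    printf '%s\n' "${KUBECONFIGS}" | tr ',' '\n'
  elif [ -n "${KUBECONFIG:-}" ]; then
    # Fall back to KUBECONFIG as the only (first) entry.
    printf '%s\n' "${KUBECONFIG}"
  else
    echo "Missing kubeconfigs." >&2
    return 1
  fi
}
```

In bash, a caller could then collect the lines into an array with `mapfile -t kubeconfigs < <(resolve_kubeconfigs)` and use `"${kubeconfigs[@]}"` consistently from there.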

@rzetelskik rzetelskik force-pushed the multi-region-support branch 5 times, most recently from 3e5efa0 to 55cdb76 Compare June 3, 2024 11:36
@rzetelskik
Member Author

I'd use an env var for KUBECONFIGS to match how everything else handles KUBECONFIG. If KUBECONFIGS is set, it wins over KUBECONFIG; if only KUBECONFIG is set, translate it to KUBECONFIGS[0] so you can use KUBECONFIGS consistently, if needed.

You can't really pass/test arrays as env vars that way, so best I could do here is to do this for KUBECONFIG_DIR on sourcing e2e lib. Unless you know a reasonable workaround for this.
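
For illustration only, here is one way a directory-based variant could look, assuming KUBECONFIG_DIR points at a directory holding one kubeconfig file per cluster. The function name `list_kubeconfigs` and that directory layout are assumptions, not taken from the merged change. A directory path is a plain string, so it survives being exported, unlike a bash array.

```shell
# Hypothetical helper: enumerate kubeconfig files from a directory.
# Assumption: every regular file directly under the directory is a kubeconfig.
# Prints one path per line, in glob (sorted) order.
list_kubeconfigs() {
  local dir="${1:-${KUBECONFIG_DIR:-}}"
  local found=0
  local f
  if [ -z "${dir}" ] || [ ! -d "${dir}" ]; then
    echo "KUBECONFIG_DIR is unset or not a directory." >&2
    return 1
  fi
  for f in "${dir}"/*; do
    # Skip subdirectories and the literal pattern left by an empty glob.
    [ -f "${f}" ] || continue
    printf '%s\n' "${f}"
    found=1
  done
  if [ "${found}" -eq 0 ]; then
    echo "Missing kubeconfigs." >&2
    return 1
  fi
}
```

A library sourced by the e2e scripts could call this once and keep the result in a local array, so that only the directory path has to cross process boundaries.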

@rzetelskik rzetelskik requested a review from tnozicka June 3, 2024 11:36
@rzetelskik rzetelskik force-pushed the multi-region-support branch from 55cdb76 to 4d4cf5c Compare June 3, 2024 12:14
@tnozicka
Contributor

tnozicka commented Jun 3, 2024

You can't really pass/test arrays as env vars that way, so best I could do here is to do this for KUBECONFIG_DIR on sourcing e2e lib.

sounds good

@rzetelskik
Member Author

rzetelskik commented Jun 3, 2024

You can't really pass/test arrays as env vars that way, so best I could do here is to do this for KUBECONFIG_DIR on sourcing e2e lib.
sounds good

ok, done

@tnozicka I realised I haven't passed kubeconfigs to the e2e pod in this PR. Should we land this as a starting point regardless? As I'm trying to run a multi-dc e2e test I'll probably bump into some other issues, but I think the baseline for framework etc is solid.

@tnozicka
Contributor

tnozicka commented Jun 3, 2024

I am fine with followups

/approve
/lgtm

@scylla-operator-bot scylla-operator-bot bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2024
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rzetelskik, tnozicka, zimnx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rzetelskik
Member Author

/hold cancel

@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 3, 2024
@scylla-operator-bot scylla-operator-bot bot merged commit a6cb614 into scylladb:master Jun 3, 2024
12 of 13 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯