-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci:[CNI] ACN PR Pipeline speed improvements #2077
Conversation
There is a caveat to this approach in terms of usability. We no longer can simply rerun the E2E test in the pipeline(in case it fails due to transient issue, which it does a lot with goldpinger and stuff). To restart we will need to rerun the cluster step associated with that E2E test. |
.pipelines/singletenancy/cilium-overlay/cilium-overlay-e2e-step-template.yaml
Show resolved
Hide resolved
In another PR we could test using "counter" as a mechanism for people that are re-running tests often within the pipeline. Would need fine tuning, but I believe that deletion should be a priority as the speedups/optimizations are worthless if the cluster sits around for any additional time (possibly 15 days) than needed. |
I agree completely that delete is priority 1. To clarify by “pulling out delete cluster part” i meant removing from respective e2e step template and into a separate template like creat cluster. (And not move out of pipeline) |
c519c45
to
702bb07
Compare
.pipelines/pipeline.yaml
Outdated
|
||
- template: singletenancy/cilium-overlay/cilium-overlay-e2e-job-template.yaml | ||
parameters: | ||
name: "cilium_overlay_cilium_e2e" | ||
displayName: Cilium on AKS Overlay | ||
pipelineBuildImage: "$(BUILD_IMAGE)" | ||
testDropgz: "" | ||
clusterType: overlay-no-kube-proxy-up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cilium-overlay-up
.pipelines/pipeline.yaml
Outdated
|
||
- template: singletenancy/aks-swift/e2e-job-template.yaml | ||
parameters: | ||
name: "aks_swift_e2e" | ||
displayName: AKS Swift Ubuntu | ||
pipelineBuildImage: "$(BUILD_IMAGE)" | ||
testDropgz: "" | ||
clusterType: byocni-up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swift-podsubnet-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can go ahead and rename the first two to cilium-podsubnet-up
and cilium-overlay-up
. I think we could keep this one as byocni-up
, because swift-podsubnet-up
might be misleading. We have an existing target called swift-up
which brings up a swift cluster.
If you don't want to keep byocni-up
-- Maybe we could go with swift-byocni-up
, to clearly indicate that it's byocni but just need to make sure it matches in all other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swift-byocni-up
makes sense
@@ -158,6 +149,9 @@ steps: | |||
- script: | | |||
echo "validate pod IP assignment and check systemd-networkd restart" | |||
kubectl get pod -owide -A | |||
# Deleting echo-external-node deployment until cilium version matches TODO. https://github.com/cilium/cilium-cli/issues/67 is addressing the change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what this test is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tries to echo (curl in this case) to an external, non-cilium, node. The deployment is stuck in pending due to its node selector looking for "cilium.io/no-schedule": "true". We cannot skip the test at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now its fine, we should contribute to cilium add skip check to this test case @vipul-21
7140910
to
bd0e76d
Compare
8c943e7
to
52d659a
Compare
52d659a
to
e2396eb
Compare
I just discovered this change because I was not getting the clusters I was expecting out of the hack/Makefile anymore. @jpayne3506 it is incredibly confusing and misleading to have |
@rbtr we did raise the same questions during the PR review: #2077 (comment), but afaik @tamilmani1989 wanted to use the |
But again, we might not have needed separate make targets for this anyway since the difference is just the addition of the kube-proxy flag. This is the kind of modification that can be done off of an env var |
@jpayne3506 is going to update the makefile with more explicit names for each target |
thanks! |
* Move Create Cluster * Lower Windows scaleup * Delete echo-external-node for Cilium validate * Change cluster creation from job to stage * Decrease k8utils retry delay time * Adjust submodule pipeline * Remove test comments * Addressing Comments * Add delete cluster template * Single Stage Delete * ACN merge commitID fix * Addressing Comments * Refactor dualstack from #2098
* Move Create Cluster * Lower Windows scaleup * Delete echo-external-node for Cilium validate * Change cluster creation from job to stage * Decrease k8utils retry delay time * Adjust submodule pipeline * Remove test comments * Addressing Comments * Add delete cluster template * Single Stage Delete * ACN merge commitID fix * Addressing Comments * Refactor dualstack from #2098
* ci:[CNI] ACN PR Pipeline speed improvements (#2077) * Move Create Cluster * Lower Windows scaleup * Delete echo-external-node for Cilium validate * Change cluster creation from job to stage * Decrease k8utils retry delay time * Adjust submodule pipeline * Remove test comments * Addressing Comments * Add delete cluster template * Single Stage Delete * ACN merge commitID fix * Addressing Comments * Refactor dualstack from #2098 * ci: Changing Service Connection for PR pipeline (#2153) * ci: change service connection * add: change ACN PR azureSubscription * ci: Replace manual install of kubectl with ADO KubectlInstaller task (#2307) * ci: remove kubectl install * ci: add KubectlInstaller for kubectl * ci: Add k8s conformance tests to Cilium CI/CD (#2348) * ci: add k8s conformance * ci: remove kubetest2 k8se2e * chore: remove test comments * ci: add k8se2e to nightly pipeline * ci: Replace k8s conformance tests within aks-swift CI (#2590) ci: k8se2e for aks-swift * ci: v4overlay conformance test cases (#2274) v4overlay conformance test cases * fix: Match Cilium CLI to Cilium Agent (#2365) * fix: Match Cilium CLI to Cilium Agent * ci: Capture single digit versions * ci: Update makefile cluster creation (#2358) ci: Update aks cluster creation --------- Co-authored-by: Paul Yu <[email protected]>
Reason for Change:
Improves the time it takes to run the ACN PR Pipeline by roughly 30 minutes by:
Issue Fixed:
Requirements:
Notes:
To re-run a failed E2E stage, start from the appropriate cluster creation stage.