-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NE-1802: Bump Golang, k8s.io, OpenShift API, and controller-runtime #225
Conversation
@alebedev87: This pull request references NE-1802 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
db88266
to
3f92d3b
Compare
|
/retest |
3f92d3b
to
e9c00b9
Compare
@alebedev87: This pull request references NE-1802 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Commits reorganized. |
/retest Azure installation failed. |
/assign @gcs278 |
@alebedev87: This pull request references NE-1802 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@@ -1,4 +1,4 @@ | |||
build_root_image: | |||
name: release | |||
namespace: openshift | |||
tag: rhel-8-release-golang-1.19-openshift-4.12 | |||
tag: rhel-8-release-golang-1.22-openshift-4.17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you be using rhel-9 in the name instead of rhel-8 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of postponing the migration to RHEL9 to not put all the eggs into one basket (rebase from upstream is huge plus this bump turned into a big one). Let's see if we'll have enough time to do RHEL9 migration too. But anyway it will be in a dedicated PR.
/retest |
@alebedev87, a nit pick, but your commands:
Doesn't produce the same versions that you have bumped to. I get
Sorry, I am one of those people that like to reproduce the bump as a sanity check 😄 I think you need to pin your versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No major issue, just questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I have missed a conversation about this, but doesn't splitting the bump & the vendor into two separate commits make the bump commit "broken"?
It could be just my personal philosophy, but I thought best practice was to avoid creating commits that don't compile. Was the motivation just to make reviews easier? I'm open to differing opinions, feel free to disagree, just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminded me of this discussion we had in C-I-O. I think I keep trying to stick to what I said in the discussion: put the vendor dir into a dedicated commit if either is true:
- it's more convenient for the reviewer
- the PR will be backported (vendored files may cause a lot of cherry-pick conflicts)
Was the motivation just to make reviews easier?
Yes. Will rebase the PR (fixup the vendor commit) once ready to move on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a small miscommunication, in the CIO discussion, I was referring to splitting out the vendor & bump commit as one together. I referred to that as the "vendor commit" there.
But in this PR, we have a "bump commit" and "vendor dir commit" separate. By bump commit, I mean the go get ... && go mod tidy
commands (b871828). By "vendor dir commit", I mean go mod vendor
(67ec0f2).
Will rebase the PR (fixup the vendor commit) once ready to move on.
What does that mean exactly? Are you saying your going to combine your b871828 67ec0f2 commits before merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a small miscommunication, in the openshift/cluster-ingress-operator#1097 (comment), I was referring to splitting out the vendor & bump commit as one together. I referred to that as the "vendor commit" there.
But in this PR, we have a "bump commit" and "vendor dir commit" separate. By bump commit, I mean the go get ... && go mod tidy commands (b871828). By "vendor dir commit", I mean go mod vendor (67ec0f2).
Got it now. Yes, I understood the vendor commit as a commit with vendor dir only.
Will rebase the PR (fixup the vendor commit) once ready to move on.
What does that mean exactly? Are you saying your going to combine your b871828 67ec0f2 commits before merging?
Yes, will leave the PR will 2 commits: "bump+vendor dir" and "golangci-lint commit".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I think we are on the same page now. Thanks!
Dockerfile.openshift
Outdated
WORKDIR /external-dns-operator | ||
COPY . . | ||
RUN make build-operator | ||
|
||
FROM registry.ci.openshift.org/ocp/4.12:base | ||
FROM registry.ci.openshift.org/ocp/4.17:base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remind me how this works? I think we still substitute all of the base image no per the release repo?
Regardless, I think at one point we discussed trying to get this hooked up with the Art bot so it stays in sync. I think this won't be very sustainable to keep updating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now external-dns-operator repo controls the base, builder and build root images from the repository. So when we need to bump golang or base image we submit a PR only into this repo. That means less automation from ART indeed. But we disabled it to get rid of openshift release branches created automatically. That was causing a confusion about the adherence of ExternalDNS Operator to OCP lifecycle.
As a matter of fact, Dockerfile.openshift
is not even used anymore. I updated Dockerfile.openshift
just to be consistent but it can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of fact, Dockerfile.openshift is not even used anymore. I updated Dockerfile.openshift just to be consistent but it can be removed.
Oh, so this Dockerfile isn't even used? That makes more sense why this is super outdated. Now I see the regular Dockerfile
uses it's own non-OpenShift images, which makes my question about the ART bumps irrelevant.
Should we just delete it then or does it serve a purpose still?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just delete it then or does it serve a purpose still?
No it doesn't. Let me remove it.
Upd: done.
@alebedev87: This pull request references NE-1802 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@alebedev87: This pull request references NE-1802 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@gcs278 : Pinned the |
e2e TestExternalDNSRecordLifecycle is failing. Let's see if that is intermittent?
|
@@ -211,7 +211,7 @@ endef | |||
.PHONY: lint | |||
## Checks the code with golangci-lint | |||
lint: $(GOLANGCI_LINT_BIN) | |||
$(GOLANGCI_LINT_BIN) run -c .golangci.yaml --deadline=30m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - is there another way to give this a time limit, now that the deadline
flag is deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's already in the golangci-lint config. I don't see the reason to limit in 2 places, so I removed the flag completely. Also, 30m
was an overkill, this test should not take more than a couple of seconds.
Upd: maybe a "couple of seconds" was a wrong estimation (failed job). Increased to 10 minutes.
KUSTOMIZE := go run sigs.k8s.io/kustomize/kustomize/v4 | ||
K8S_ENVTEST_VERSION := 1.21.4 | ||
KUSTOMIZE := go run sigs.k8s.io/kustomize/kustomize/v5 | ||
K8S_ENVTEST_VERSION := 1.30.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to use 1.30.0 instead of 1.30.3 as is used in the go modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. sigs.k8s.io/controller-runtime/tools/setup-envtest
gets the artifacts from kubebuilder-tools
which used to be published from the kubebuilder repository. Here is the link to all the kubebuilder tools releases available which I found in their latest release. There is no 1.30.3
release, that why I'm getting an error when I'm trying to use 1.30.3
:
$ make test
go run sigs.k8s.io/controller-tools/cmd/controller-gen "crd:preserveUnknownFields=false" rbac:roleName=external-dns-operator webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go run sigs.k8s.io/controller-tools/cmd/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
unable to fetch hash for requested version: unable fetch metadata for kubebuilder-tools-1.30.3-linux-amd64.tar.gz -- got status "404 Not Found" from GCS
I think that we'll have to upgrade to the latest version of sigs.k8s.io/controller-runtime/tools/setup-envtest
and start fetching the envtest binaries from the new location. The problem is that we cannot do it right now as the latest version uses 1.31.z
k8s.io dependencies. So we have to postpone it to the next API bump.
source.Kind[client.Object](operatorCache, &corev1.ConfigMap{}, | ||
&handler.EnqueueRequestForObject{}, | ||
predicate.And(predicate.NewPredicateFuncs(ctrlutils.InNamespace(config.SourceNamespace)), predicate.NewPredicateFuncs(ctrlutils.HasName(config.CAConfigMapName))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: was this manual or automatic? If source.Kind changed this much, it might be better to put all the signature parameters on the same line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this manual or automatic?
Manual, I had to do it myself. As far as I know, no migration tools were provided by controller-runtime
.
If source.Kind changed this much, it might be better to put all the signature parameters on the same line.
Yes, source.Kind
is a generic function now. Wouldn't it be too lengthy on a single line?
The failure here is probably due to a too small DnsPollingTimeout. 3 Minutes isn't enough to provision the LB, much less resolve the DNS name. I saw in Loki that the GCP CCM took 5 minutes to provision the LB. That's quite long, probably an anomaly with the GCP API being slow. It's a flake to keep an eye on. |
@gcs278 : 3 minutes may be too short indeed. Some time ago, I decreased it from 15 minutes to reduce the elapsed time needed for a failed run. I'll try to do the happy medium and will increase it to 7 minutes. |
7ab2709
to
34958da
Compare
Increased the golangci-lint timeout to |
/label px-approved No TE is needed for this feature, only release notes. |
/label docs-approved |
No release note for this specific change as we do not make release notes for bumping kubernetes dependencies. |
All Azure tests failed to get DNS records, not much can be found in Loki. Let's see whether it's a reproducible issue. /retest |
6 similar comments
- Bump Golang to 1.22: - Update go.mod - Update Dockerfiles - Add `CGO_ENABLED=1` to `go test -race` (golang/go#51235). - Bump k8s.io/* modules to 0.30.3 and OpenShift API to the latest. - Bump Controller runtime bumped to 0.18.5: - Controller's `Watch` function now has a single generic source parameter (kubernetes-sigs/controller-runtime#2783). - Manager's `CertDir` option removed, now using the dedicated webhook server option (kubernetes-sigs/controller-runtime#2422). - Cache's `Namespaces` option replaced by `DefaultNamespaces` (kubernetes-sigs/controller-runtime#2421). - Regenerate CRD and bundle manifests: - ExternalDNS API uses `metav1.LabelSelector` for the label filtering. It was updated with `+listType=atomic` marker which resulted in the addition of `x-kubernetes-list-type: atomic` to CRD. - Bump `kustomize` to v5 to fix a conflict caused by k8s.io bumps: - `kyaml` unable to use the bumped `github.com/google/gnostic-models/openapiv2` package.
- Update deprecated fields in `.golangci.yaml`. - Remove deprecated `--deadline` flag. - Replace deprecated `k8s.io/utils/pointer` with new generic `k8s.io/utils/ptr` package.
CI and Makefile use default Dockerfile.
Increase the DNS polling timeout to 7 minutes to prevent false positives on providers with slower load balancer provisioning, such as GCP, where it may take up to 5 minutes.
The RHEL9 migration (openshift/external-dns#61) reset the latest tag for external-dns repository in quay.io.
e506cf5
to
f1b2155
Compare
The problems seen in Azure were related to removed operand image. The external-dns instances could not start. I blame the changes in the image mirroring in release repository (initial script was preserving all unique images) as well as my unsolicited tag I used for the tests (unlikely though). Anyway the operand was migrated to RHEL9 so I rebump the operand image. |
/lgtm |
@alebedev87: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
CGO_ENABLED=1
togo test -race
(doc: document race detector cgo requirements golang/go#51235).Watch
function now has a single generic source parameter(⚠️ Source, Event, Predicate, Handler: Add generics support kubernetes-sigs/controller-runtime#2783).
CertDir
option removed, now using the dedicated webhook server option(⚠ Remove deprecated manager, webhook and cluster options kubernetes-sigs/controller-runtime#2422).
Namespaces
option replaced byDefaultNamespaces
(⚠️ Allow configuring more granular cache filtering kubernetes-sigs/controller-runtime#2421).
kustomize
to v5 to fix a conflict caused by k8s.io bumps:kyaml
unable to use the bumpedgithub.jparrowsec.cn/google/gnostic-models/openapiv2
package.metav1.LabelSelector
for the label filtering.It was updated with
+listType=atomic
marker which resulted inthe addition of
x-kubernetes-list-type: atomic
to CRD.Commands to reproduce the bump: