NE-1802: Bump Golang, k8s.io, OpenShift API, and controller-runtime #225

Merged
merged 5 commits into from
Sep 17, 2024

Conversation

alebedev87
Contributor

@alebedev87 alebedev87 commented Aug 13, 2024

Commands to reproduce the bump:

go get k8s.io/api@v0.30.3
go get k8s.io/client-go@v0.30.3
go get k8s.io/apiserver@v0.30.3
go get k8s.io/apiextensions-apiserver@v0.30.3
go get github.com/openshift/[email protected]
go get sigs.k8s.io/controller-runtime@v0.18.5
go get sigs.k8s.io/kustomize/kustomize/[email protected]
go mod tidy
go mod vendor

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 13, 2024
@openshift-ci-robot

openshift-ci-robot commented Aug 13, 2024

@alebedev87: This pull request references NE-1802 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

@openshift-ci openshift-ci bot requested review from gcs278 and Miciah August 13, 2024 14:21
@alebedev87
Contributor Author

  • golangci-lint was running out of memory (happens often when the version is out of date). Bumped golangci-lint to the latest version.
  • kustomize conflicted with the k8s.io bumps. Bumped kustomize to v5.

@alebedev87
Contributor Author

Dockerfiles updated with golang 1.22.

@alebedev87
Contributor Author

/retest

@alebedev87
Contributor Author

Commits reorganized.

@alebedev87
Contributor Author

/retest

Azure installation failed.

@candita

candita commented Aug 14, 2024

/assign @gcs278
/assign

@openshift-ci-robot

openshift-ci-robot commented Aug 14, 2024

@alebedev87: This pull request references NE-1802 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Commands to reproduce the bump:

go get k8s.io/api
go get k8s.io/client-go
go get k8s.io/apiserver
go get k8s.io/apiextensions-apiserver
go get github.com/openshift/api                                                                                        
go get sigs.k8s.io/controller-runtime
go get sigs.k8s.io/kustomize/kustomize/v5
go mod tidy
go mod vendor

Commands to test the bump:

make bundle build test verify

@@ -1,4 +1,4 @@
 build_root_image:
   name: release
   namespace: openshift
-  tag: rhel-8-release-golang-1.19-openshift-4.12
+  tag: rhel-8-release-golang-1.22-openshift-4.17

Should you be using rhel-9 in the name instead of rhel-8 ?

Contributor Author

I was thinking of postponing the RHEL9 migration so as not to put all the eggs in one basket (the rebase from upstream is huge, and this bump turned into a big one). Let's see whether we'll have enough time to do the RHEL9 migration too. Either way, it will be in a dedicated PR.

@candita

candita commented Aug 15, 2024

/retest

@gcs278
Contributor

gcs278 commented Aug 15, 2024

@alebedev87, a nitpick, but your commands:

go get k8s.io/api
go get k8s.io/client-go
go get k8s.io/apiserver
go get k8s.io/apiextensions-apiserver
go get github.com/openshift/api                                                                                        
go get sigs.k8s.io/controller-runtime
go get sigs.k8s.io/kustomize/kustomize/v5
go mod tidy
go mod vendor

don't produce the same versions that you bumped to. I now get v0.31.0 k8s.io packages and logr v1.4.1, and I also get this error:

go: github.com/openshift/external-dns-operator/tools imports
	sigs.k8s.io/kustomize/kustomize/v4 imports
	sigs.k8s.io/kustomize/kustomize/v4/commands imports
	sigs.k8s.io/kustomize/kustomize/v4/commands/create imports
	sigs.k8s.io/kustomize/api/loader: module sigs.k8s.io/kustomize/api@latest found (v0.17.3), but does not contain package sigs.k8s.io/kustomize/api/loader

Sorry, I am one of those people that like to reproduce the bump as a sanity check 😄 I think you need to pin your versions.

Contributor

@gcs278 gcs278 left a comment

No major issue, just questions.

Contributor

Sorry if I have missed a conversation about this, but doesn't splitting the bump & the vendor into two separate commits make the bump commit "broken"?

It could be just my personal philosophy, but I thought best practice was to avoid creating commits that don't compile. Was the motivation just to make reviews easier? I'm open to differing opinions, feel free to disagree, just curious.

Contributor Author

This reminds me of the discussion we had in C-I-O (cluster-ingress-operator). I'm trying to stick to what I said there: put the vendor dir into a dedicated commit if either of the following is true:

  • it's more convenient for the reviewer
  • the PR will be backported (vendored files may cause a lot of cherry-pick conflicts)

> Was the motivation just to make reviews easier?

Yes. Will rebase the PR (fixup the vendor commit) once ready to move on.

Contributor

There might be a small miscommunication: in the CIO discussion, I was referring to splitting out the vendor and bump changes together as one commit. I referred to that as the "vendor commit" there.

But in this PR, we have a "bump commit" and "vendor dir commit" separate. By bump commit, I mean the go get ... && go mod tidy commands (b871828). By "vendor dir commit", I mean go mod vendor (67ec0f2).

> Will rebase the PR (fixup the vendor commit) once ready to move on.

What does that mean exactly? Are you saying you're going to combine your b871828 and 67ec0f2 commits before merging?

Contributor Author

@alebedev87 alebedev87 Aug 16, 2024

> There might be a small miscommunication: in openshift/cluster-ingress-operator#1097 (comment), I was referring to splitting out the vendor and bump changes together as one commit. I referred to that as the "vendor commit" there.
>
> But in this PR, we have a "bump commit" and "vendor dir commit" separate. By bump commit, I mean the go get ... && go mod tidy commands (b871828). By "vendor dir commit", I mean go mod vendor (67ec0f2).

Got it now. Yes, I understood the vendor commit as a commit with vendor dir only.

> Will rebase the PR (fixup the vendor commit) once ready to move on.
>
> What does that mean exactly? Are you saying you're going to combine your b871828 and 67ec0f2 commits before merging?

Yes, I will leave the PR with 2 commits: "bump+vendor dir" and "golangci-lint commit".

Contributor

Okay I think we are on the same page now. Thanks!

WORKDIR /external-dns-operator
COPY . .
RUN make build-operator

-FROM registry.ci.openshift.org/ocp/4.12:base
+FROM registry.ci.openshift.org/ocp/4.17:base
Contributor

Can you remind me how this works? I think we still substitute all of the base images now, per the release repo?

Regardless, I think at one point we discussed trying to get this hooked up with the ART bot so it stays in sync. I don't think it will be sustainable to keep updating this manually.

Contributor Author

As of now, the external-dns-operator repo controls the base, builder, and build root images itself. So when we need to bump Golang or the base image, we submit a PR only to this repo. That indeed means less automation from ART, but we disabled it to get rid of the OpenShift release branches that were created automatically. Those were causing confusion about the adherence of the ExternalDNS Operator to the OCP lifecycle.

As a matter of fact, Dockerfile.openshift is not even used anymore. I updated Dockerfile.openshift just to be consistent but it can be removed.

Contributor

> As a matter of fact, Dockerfile.openshift is not even used anymore. I updated Dockerfile.openshift just to be consistent but it can be removed.

Oh, so this Dockerfile isn't even used? That explains why it is so outdated. Now I see the regular Dockerfile uses its own non-OpenShift images, which makes my question about the ART bumps irrelevant.

Should we just delete it then or does it serve a purpose still?

Contributor Author

@alebedev87 alebedev87 Aug 16, 2024

> Should we just delete it then or does it serve a purpose still?

No it doesn't. Let me remove it.

Upd: done.

@openshift-ci-robot

openshift-ci-robot commented Aug 16, 2024

@alebedev87: This pull request references NE-1802 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Commands to reproduce the bump:

go get k8s.io/api@v0.30.3
go get k8s.io/client-go@v0.30.3
go get k8s.io/apiserver@v0.30.3
go get k8s.io/apiextensions-apiserver@v0.30.3
go get github.com/openshift/[email protected]
go get sigs.k8s.io/controller-runtime@v0.18.5
go get sigs.k8s.io/kustomize/kustomize/[email protected]
go mod tidy
go mod vendor

Commands to test the bump:

make bundle build test verify

@alebedev87
Contributor Author

alebedev87 commented Aug 16, 2024

@gcs278: I pinned the go get commands to exact versions. Indeed, Kubernetes 1.31 was released at the same time I posted the PR.

@candita

candita commented Aug 20, 2024

The e2e test TestExternalDNSRecordLifecycle is failing. Let's see whether that is intermittent.
/test e2e-gcp-operator

operator_test.go:212: Ensuring test namespace
operator_test.go:218: Creating credentials secret
operator_test.go:225: Creating external dns instance
operator_test.go:235: Creating source service
common.go:279: Getting IPs of service's load balancer
(repeats)
common.go:279: Getting IPs of service's load balancer
operator_test.go:247: failed to get service IPs external-dns-test/test-service: failed to get loadbalancer IPs for service test-service/external-dns-test: client rate limiter Wait returned an error: context deadline exceeded

--- FAIL: TestExternalDNSRecordLifecycle (180.33s)

@@ -211,7 +211,7 @@ endef
.PHONY: lint
## Checks the code with golangci-lint
lint: $(GOLANGCI_LINT_BIN)
$(GOLANGCI_LINT_BIN) run -c .golangci.yaml --deadline=30m

Question - is there another way to give this a time limit, now that the deadline flag is deprecated?

Contributor Author

@alebedev87 alebedev87 Aug 28, 2024

Yes. It's already set in the golangci-lint config. I don't see a reason to set the limit in two places, so I removed the flag completely. Also, 30m was overkill; this check should not take more than a couple of seconds.

Upd: "a couple of seconds" was probably a wrong estimate (failed job). Increased to 10 minutes.

-KUSTOMIZE := go run sigs.k8s.io/kustomize/kustomize/v4
-K8S_ENVTEST_VERSION := 1.21.4
+KUSTOMIZE := go run sigs.k8s.io/kustomize/kustomize/v5
+K8S_ENVTEST_VERSION := 1.30.0

Any reason to use 1.30.0 instead of 1.30.3 as is used in the go modules?

Contributor Author

Yes. sigs.k8s.io/controller-runtime/tools/setup-envtest gets the artifacts from kubebuilder-tools, which used to be published from the kubebuilder repository. Here is the link to all the available kubebuilder-tools releases, which I found in their latest release. There is no 1.30.3 release, which is why I get an error when I try to use 1.30.3:

$ make test
go run sigs.k8s.io/controller-tools/cmd/controller-gen "crd:preserveUnknownFields=false" rbac:roleName=external-dns-operator webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go run sigs.k8s.io/controller-tools/cmd/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
unable to fetch hash for requested version: unable fetch metadata for kubebuilder-tools-1.30.3-linux-amd64.tar.gz -- got status "404 Not Found" from GCS

I think that we'll have to upgrade to the latest version of sigs.k8s.io/controller-runtime/tools/setup-envtest and start fetching the envtest binaries from the new location. The problem is that we cannot do it right now because the latest version uses the 1.31.z k8s.io dependencies, so we have to postpone it to the next API bump.

Comment on lines +87 to +89
source.Kind[client.Object](operatorCache, &corev1.ConfigMap{},
&handler.EnqueueRequestForObject{},
predicate.And(predicate.NewPredicateFuncs(ctrlutils.InNamespace(config.SourceNamespace)), predicate.NewPredicateFuncs(ctrlutils.HasName(config.CAConfigMapName))),

nit: was this manual or automatic? If source.Kind changed this much, it might be better to put all the signature parameters on the same line.

Contributor Author

@alebedev87 alebedev87 Aug 28, 2024

> was this manual or automatic?

Manual, I had to do it myself. As far as I know, no migration tools were provided by controller-runtime.

> If source.Kind changed this much, it might be better to put all the signature parameters on the same line.

Yes, source.Kind is a generic function now. Wouldn't it be too lengthy on a single line?
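
For readers unfamiliar with the filter in the snippet under review, the following is a minimal stand-alone sketch of what combining the two predicates with `predicate.And` amounts to. It uses only local stand-in types: `object`, `predicate`, `and`, `inNamespace`, and `hasName` are hypothetical names defined here, not controller-runtime's API, and the namespace and ConfigMap names are made-up placeholders for `config.SourceNamespace` and `config.CAConfigMapName`.

```go
package main

import "fmt"

// object is a stand-in for client.Object with only the fields the filters inspect.
type object struct {
	Namespace, Name string
}

// predicate is a stand-in for a predicate built from a plain filter function.
type predicate func(o object) bool

// and passes an object only if every given predicate passes it.
func and(preds ...predicate) predicate {
	return func(o object) bool {
		for _, p := range preds {
			if !p(o) {
				return false
			}
		}
		return true
	}
}

// inNamespace matches objects in the given namespace.
func inNamespace(ns string) predicate {
	return func(o object) bool { return o.Namespace == ns }
}

// hasName matches objects with the given name.
func hasName(name string) predicate {
	return func(o object) bool { return o.Name == name }
}

func main() {
	// Placeholder values, assumed for illustration only.
	filter := and(inNamespace("example-ns"), hasName("example-ca"))

	candidates := []object{
		{Namespace: "example-ns", Name: "example-ca"},
		{Namespace: "example-ns", Name: "other"},
		{Namespace: "other-ns", Name: "example-ca"},
	}
	for _, o := range candidates {
		fmt.Printf("%s/%s enqueued: %v\n", o.Namespace, o.Name, filter(o))
	}
}
```

Only the first candidate satisfies both conditions, which is the behaviour the watch relies on to react to a single ConfigMap.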

@gcs278
Contributor

gcs278 commented Aug 20, 2024

go get k8s.io/api@v0.30.3
go get k8s.io/client-go@v0.30.3
go get k8s.io/apiserver@v0.30.3
go get k8s.io/apiextensions-apiserver@v0.30.3
go get github.com/openshift/[email protected]
go get sigs.k8s.io/controller-runtime@v0.18.5
go get sigs.k8s.io/kustomize/kustomize/[email protected]

The failure here is probably due to DnsPollingTimeout being too small. 3 minutes isn't enough to provision the LB, much less resolve the DNS name. I saw in Loki that the GCP CCM took 5 minutes to provision the LB. That's quite long, probably an anomaly caused by a slow GCP API. It's a flake to keep an eye on.

@alebedev87
Contributor Author

@gcs278: 3 minutes may indeed be too short. Some time ago, I decreased it from 15 minutes to reduce the time a failed run takes. I'll try to find a happy medium and increase it to 7 minutes.

@alebedev87
Contributor Author

Increased the golangci-lint timeout to 10m.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 11, 2024
@alebedev87
Contributor Author

/label px-approved

No TE is needed for this feature, only release notes.

@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label Sep 12, 2024
@jmanthei

/label docs-approved

@openshift-ci openshift-ci bot added the docs-approved Signifies that Docs has signed off on this PR label Sep 13, 2024
@jmanthei

/label px-approved

No TE is needed for this feature, only release notes.

No release note for this specific change as we do not make release notes for bumping kubernetes dependencies.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD b2a62fe and 2 for PR HEAD e506cf5 in total

@alebedev87
Contributor Author

All Azure tests failed to get DNS records, not much can be found in Loki. Let's see whether it's a reproducible issue.

/retest

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD b2a62fe and 2 for PR HEAD e506cf5 in total

6 similar comments

- Bump Golang to 1.22:
  - Update go.mod
  - Update Dockerfiles
  - Add `CGO_ENABLED=1` to `go test -race` (golang/go#51235).
- Bump k8s.io/* modules to 0.30.3 and OpenShift API to the latest.
- Bump controller-runtime to 0.18.5:
    - Controller's `Watch` function now has a single generic source parameter
      (kubernetes-sigs/controller-runtime#2783).
    - Manager's `CertDir` option removed, now using the dedicated webhook server option
      (kubernetes-sigs/controller-runtime#2422).
    - Cache's `Namespaces` option replaced by `DefaultNamespaces`
      (kubernetes-sigs/controller-runtime#2421).
- Regenerate CRD and bundle manifests:
    - ExternalDNS API uses `metav1.LabelSelector` for the label filtering.
      It was updated with `+listType=atomic` marker which resulted in
      the addition of `x-kubernetes-list-type: atomic` to CRD.
- Bump `kustomize` to v5 to fix a conflict caused by k8s.io bumps:
    - `kyaml` unable to use the bumped `github.com/google/gnostic-models/openapiv2` package.
- Update deprecated fields in `.golangci.yaml`.
- Remove deprecated `--deadline` flag.
- Replace deprecated `k8s.io/utils/pointer` with new generic `k8s.io/utils/ptr` package.
CI and Makefile use default Dockerfile.
Increase the DNS polling timeout to 7 minutes to prevent false positives on
providers with slower load balancer provisioning, such as GCP, where it may
take up to 5 minutes.
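
As an illustration of the `k8s.io/utils/pointer` to `k8s.io/utils/ptr` replacement mentioned in the commit message above, here is a minimal sketch of the generic helper pattern. `To` and `Deref` are local stand-ins written to have the same shape as `ptr.To` and `ptr.Deref`, so the example builds without any Kubernetes modules; the variable names are made up for illustration.

```go
package main

import "fmt"

// To returns a pointer to a copy of v. Local stand-in with the same shape
// as ptr.To from k8s.io/utils/ptr.
func To[T any](v T) *T {
	return &v
}

// Deref returns *p, or def when p is nil. Local stand-in with the same
// shape as ptr.Deref.
func Deref[T any](p *T, def T) T {
	if p == nil {
		return def
	}
	return *p
}

func main() {
	replicas := To(int32(3)) // with the old package: pointer.Int32(3)
	enabled := To(true)      // with the old package: pointer.Bool(true)
	var unset *string        // nil pointer, falls back to the default below

	fmt.Println(*replicas, *enabled, Deref(unset, "default"))
}
```

One generic function covers every type, which is why the per-type helpers of the old package are deprecated.
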
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD b2a62fe and 2 for PR HEAD e506cf5 in total

The RHEL9 migration (openshift/external-dns#61)
reset the latest tag for external-dns repository in quay.io.
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 17, 2024
@alebedev87
Contributor Author

alebedev87 commented Sep 17, 2024

The problems seen in Azure were caused by a removed operand image: the external-dns instances could not start. I blame the changes to image mirroring in the release repository (the initial script preserved all unique images) and, less likely, the unsolicited tag I used for the tests.

Anyway, the operand has been migrated to RHEL9, so I re-bumped the operand image.

@gcs278
Contributor

gcs278 commented Sep 17, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 17, 2024
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD b2a62fe and 2 for PR HEAD f1b2155 in total

Contributor

openshift-ci bot commented Sep 17, 2024

@alebedev87: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit 63fb086 into openshift:main Sep 17, 2024
10 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
6 participants