Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve cluster creation success in Korea #1196

Merged
merged 4 commits into from
Dec 1, 2020
Merged

Conversation

jim-minter
Copy link
Member

@jim-minter jim-minter commented Nov 26, 2020

gcs.prod.monitoring.core.windows.net in Korea Central currently has a bad server which when our internet checker hits it causes our cluster creations to fail irrevocably due to golang/go#36026. Workaround.

@github-actions github-actions bot added the needs-rebase branch needs a rebase label Nov 26, 2020
@github-actions
Copy link

Please rebase pull request.

@github-actions github-actions bot removed the needs-rebase branch needs a rebase label Nov 26, 2020
@hawkowl
Copy link
Collaborator

hawkowl commented Nov 26, 2020

Seems good.

I'm not sure about never raising the error from the internet checker in the logs, though -- it seems like it will always requeue it, forever?

@mjudeikis
Copy link
Contributor

If I read this right, it will try 6 times, and again after 1 hours return reconcile.Result{RequeueAfter: time.Hour, Requeue: true}, err So should be acceptable

@mjudeikis
Copy link
Contributor

Seems good.

I'm not sure about never raising the error from the internet checker in the logs, though -- it seems like it will always requeue it, forever?

Is it not a case that requeue on failure will be logged out by the framework?

Copy link
Contributor

@mjudeikis mjudeikis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we might want to change the waiting logic a bit

@@ -59,7 +59,9 @@ func (r *CheckerController) Reconcile(request ctrl.Request) (ctrl.Result, error)
if thisErr != nil {
// do all checks even if there is an error
err = thisErr

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Can't we use err instead of thisErr? The only real error-generating statement this this function is c.Check(ctx) so one variable should be enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good questions, why this is not enough?

for _, c := range r.checkers {
		err := c.Check(ctx)
		if err != nil {
			// do all checks even if there is an error
			if err != errRequeue {
				r.log.Errorf("checker %s failed with %v", c.Name(), err)
			}
		}
	}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's subtle. Consider the following case: checker 0 returns an error, and checker 1 returns nil. In that case by your code we would return foo, nil when we exit the function, but in fact we want to return foo, err.

@@ -63,6 +52,24 @@ func (r *InternetChecker) Name() string {

// Reconcile will keep checking that the cluster can connect to essential services.
func (r *InternetChecker) Check(ctx context.Context) error {
cli := &http.Client{
Transport: &http.Transport{
// We set DisableKeepAlives for two reasons:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good explanation

@jim-minter jim-minter merged commit ec43431 into Azure:master Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants