Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize route backends after module updates #243

Merged
merged 18 commits into from
Jun 12, 2023
Merged

Conversation

CarlAmko
Copy link
Contributor

@CarlAmko CarlAmko commented May 31, 2023

Resolves #208

What

Applying edge routes' backend is now deferred until after route modules are successfully applied. If route modules fail to apply for any reason, the route's backend remains unset, thus closing off access to the edge from that route.

This approach was inspired by Niji's comment in the parent issue.

How

  1. Omit backend from new routes / Remove backend from current routes
  2. Route modules now try to apply
    2a. If route module application was successful, routes are then updated with their specified backend
    2b. If unsuccessful, reconcileRoutes returns an error and the route remains backend-less
  3. Route statuses are updated

Breaking Changes

Edge routes will no longer be valid if an error occurs while applying route modules.

Validation

Given this sample manifest.yml:

Expand

kind: NgrokModuleSet
apiVersion: ingress.k8s.ngrok.com/v1alpha1
metadata:
  name: auth
modules:
  oauth:
    google:
      optionsPassthrough: true
      inactivityTimeout: 10m
      maximumDuration: 24h
      authCheckInterval: 20m
      clientSecret:
        name: auth
        key: client-secret
      emailDomains:
      - ngrok.com
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: game-2048
  annotations:
    k8s.ngrok.com/modules: auth
spec:
  ingressClassName: ngrok
  rules:
    - host: test-ingress-12345.ngrok.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: game-2048
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: game-2048
spec:
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector:
    app: game-2048

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: game-2048
spec:
  replicas: 1
  selector:
    matchLabels:
      app: game-2048
  template:
    metadata:
      labels:
        app: game-2048
    spec:
      containers:
        - name: backend
          image: alexwhen/docker-2048
          ports:
            - name: http
              containerPort: 80

Creating a new route

  1. Apply the manifest kubectl apply -f manifest.yml
  2. Check the ingress controller pod logs kubectl logs ngrok-ingress-controller-kubernetes-ingress-controller-<HASH> -f
  3. Observe for error logs / events suggesting a route module application failure
    image
  4. Attempting to hit the route results in an ngrok 6015 error / dashboard error
    image

image

Updating a current route

  1. Edit the above manifest to remove the clientSecret configuration.
  2. Re-apply the manifest to create a valid route + backend
    image
  3. Now undo the edits and attempt to apply a clientSecret, like above.
  4. This should fail, as there are no secrets loaded into your environment.
  5. Observe that backend is still running with the previous (valid) configuration.

Future Work

  • Check for all route prerequisites before attempting any route updates. This would ideally combine with the current ipPolicies validation so that we can evaluate other module criteria (such as the correctly set secrets) before executing updates.
  • Update this approach to incorporate better wholesale validation before applying updates to minimize downtime. Niji captured some theoretical approaches to solving this.

@github-actions github-actions bot added the area/controller Issues dealing with the controller label May 31, 2023
@CarlAmko CarlAmko force-pushed the carl/208_fail-closed branch from 5b64299 to 8ba1716 Compare May 31, 2023 17:56
@CarlAmko CarlAmko marked this pull request as ready for review May 31, 2023 18:13
@CarlAmko CarlAmko requested a review from a team as a code owner May 31, 2023 18:13
@CK-Ward CK-Ward requested a review from jonstacks June 1, 2023 23:25
Copy link
Contributor

@nikolay-ngrok nikolay-ngrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is working, even tho it has some unexpected errors popping out in the UI

@CarlAmko
Copy link
Contributor Author

CarlAmko commented Jun 2, 2023

it has some unexpected errors popping out in the UI

In the UI? How do you mean? ngrok dashboard?

@nikolay-ngrok
Copy link
Contributor

Yep, the ngrok dashboard. Not necessarily something ppl will look at tho 🤔

@CarlAmko CarlAmko requested a review from nikolay-ngrok June 2, 2023 15:21
@alex-bezek
Copy link
Collaborator

the controller still continues to re-sync on failures at its specified interval

If we get an error that there is no sense in retrying (ie: bad config, real error from ngrok, not just a netwrok flake, etc) you can return a Result object in your reconcile function that specifies not to re-queue it

You could probably create a specific error type https://github.com/ngrok/kubernetes-ingress-controller/blob/main/internal/errors/errors.go and check for that when returning in the reconciler

@alex-bezek
Copy link
Collaborator

@nijikokun 's comment also talked about creating a static response backend to use as the placeholder while updating

Static backend that says 'Initializing edge'

image
This message actually makes some sense and does have our styling vs a static response backend is going to be some pretty simple text. I would just verify with Product that what you have without the custom static response backend is an ok user experience

Copy link
Collaborator

@alex-bezek alex-bezek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions and maybe future scope items

@CarlAmko
Copy link
Contributor Author

CarlAmko commented Jun 5, 2023

the controller still continues to re-sync on failures at its specified interval

If we get an error that there is no sense in retrying (ie: bad config, real error from ngrok, not just a netwrok flake, etc) you can return a Result object in your reconcile function that specifies not to re-queue it

You could probably create a specific error type https://github.com/ngrok/kubernetes-ingress-controller/blob/main/internal/errors/errors.go and check for that when returning in the reconciler

Awesome. I'll go ahead and do this. 👍🏼

Copy link
Contributor

@euank euank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for bearing with me on all my comments!

The commits could probably use some rebasing/rewording to make git history read better, but code-wise, this makes sense.

@CarlAmko
Copy link
Contributor Author

CarlAmko commented Jun 9, 2023

Nice, thanks for bearing with me on all my comments!

The commits could probably use some rebasing/rewording to make git history read better, but code-wise, this makes sense.

Yeah, I'm planning to squash them honestly, just because we changed approaches a few times midflight. Agree!

@CarlAmko CarlAmko merged commit 9dac779 into main Jun 12, 2023
@CarlAmko CarlAmko deleted the carl/208_fail-closed branch June 12, 2023 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Issues dealing with the controller
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ingress controller "fails open" if modules have issues
5 participants