compactor went into tailspin #3914

Closed
bboreham opened this issue Mar 5, 2021 · 7 comments · Fixed by #4262

Comments

@bboreham
Contributor

bboreham commented Mar 5, 2021

Describe the bug

Consul was down at the time the compactor started, and the compactor never recovered:

level=info ts=2021-03-04T20:23:01.350819279Z caller=main.go:188 msg="Starting Cortex" version="(version=1.6.0, branch=master, revision=56f794d)"
level=info ts=2021-03-04T20:23:01.35627101Z caller=module_service.go:59 msg=initialising module=server
level=info ts=2021-03-04T20:23:01.358032071Z caller=module_service.go:59 msg=initialising module=compactor
level=info ts=2021-03-04T20:23:01.351964058Z caller=server.go:229 http=[::]:80 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2021-03-04T20:23:01.356491008Z caller=module_service.go:59 msg=initialising module=memberlist-kv
level=info ts=2021-03-04T20:23:01.36340712Z caller=compactor.go:373 component=compactor msg="waiting until compactor is ACTIVE in the ring"
level=info ts=2021-03-04T20:23:01.366220882Z caller=lifecycler.go:527 msg="not loading tokens from file, tokens file path is empty"
level=error ts=2021-03-04T20:23:01.391200499Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.394262836Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.391467754Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.399955867Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.397642587Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.402454173Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.404896477Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.407444107Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.410901197Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.413247689Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:01.4157827Z caller=client.go:147 msg="error getting key" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:02.848432818Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:06.895736201Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:14.397161217Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:25.078769837Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:23:49.041899802Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:24:35.634427793Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:25:35.127795679Z caller=client.go:234 msg="error getting path" key=compactor err="Get \"http://consul.cortex.svc.cluster.local:8500/v1/kv/compactor?stale=&wait=10000ms\": dial tcp: lookup consul.cortex.svc.cluster.local on 10.0.0.10:53: no such host"
level=error ts=2021-03-04T20:26:37.176005708Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:27:42.508689928Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:28:35.350979964Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:29:41.117751261Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=error ts=2021-03-04T20:30:39.87687265Z caller=client.go:234 msg="error getting path" key=compactor err="Unexpected response code: 500"
level=info ts=2021-03-04T20:31:36.036454597Z caller=client.go:247 msg="value is nil" key=compactor index=22
level=info ts=2021-03-04T20:31:36.930199939Z caller=client.go:247 msg="value is nil" key=compactor index=24
level=info ts=2021-03-04T20:31:37.932395975Z caller=client.go:247 msg="value is nil" key=compactor index=25
level=info ts=2021-03-04T20:31:41.887641219Z caller=client.go:247 msg="value is nil" key=compactor index=28
level=info ts=2021-03-04T20:31:41.941646251Z caller=client.go:247 msg="value is nil" key=compactor index=29
level=info ts=2021-03-04T20:31:46.926493102Z caller=client.go:247 msg="value is nil" key=compactor index=31
level=info ts=2021-03-04T20:31:46.967013886Z caller=client.go:247 msg="value is nil" key=compactor index=32
...
level=info ts=2021-03-05T10:12:35.156472897Z caller=client.go:247 msg="value is nil" key=compactor index=78733
level=info ts=2021-03-05T10:12:35.455228914Z caller=client.go:247 msg="value is nil" key=compactor index=78734
level=info ts=2021-03-05T10:12:36.157313862Z caller=client.go:247 msg="value is nil" key=compactor index=78736
level=info ts=2021-03-05T10:12:37.157305834Z caller=client.go:247 msg="value is nil" key=compactor index=78738

We have compactor sharding turned on:

        - -compactor.ring.consul.hostname=consul.cortex.svc.cluster.local:8500
        - -compactor.ring.prefix=
        - -compactor.ring.store=consul
        - -compactor.sharding-enabled=true

Expected behavior
I think it should exit with an error in this situation; crashlooping would make the fault more obvious to the operator, and after a few restarts it would have managed to talk to Consul in my case.

@pracucci
Contributor

pracucci commented Mar 5, 2021

Isn't it a temporary error from which it will eventually recover once Consul gets back online?

@bboreham
Contributor Author

bboreham commented Mar 5, 2021

Well, it stayed like this for ~12 hours and recovered when I restarted the compactor.

@bboreham
Contributor Author

bboreham commented Mar 5, 2021

I edited the problem description to show the end of the logfile.

@pracucci
Contributor

pracucci commented Mar 5, 2021

My bad. I thought the error occurred while checking if a tenant is owned by the compactor replica. It actually happened here:

level.Info(c.logger).Log("msg", "waiting until compactor is ACTIVE in the ring")
if err := ring.WaitInstanceState(ctx, c.ring, c.ringLifecycler.ID, ring.ACTIVE); err != nil {
	return err
}

I agree, we should have a timeout.

@bboreham
Contributor Author

bboreham commented Mar 5, 2021

While I'm here, that message "value is nil" doesn't provide much "info".

@stale

stale bot commented Jun 3, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@pracucci
Contributor

Still valid and we actually have a PR open.
