Thanos Compactor: overlaps found while gathering blocks. #1089
Comments
Hi, what's the output of the

I

Sorry, I mean
That makes more sense - the inspector output is below. Note - the 0.2.1 image didn't support
This looks to me like the simplest configuration error: all Prometheus sources are uploading blocks from Prometheus instances that have exactly the same labels (wrong - the labels have to be unique). Is that the case?
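For context, "unique labels" here means the external labels each Prometheus instance attaches to the blocks it uploads via the sidecar. A minimal sketch of what that looks like in prometheus.yml, with purely hypothetical values:

```yaml
# prometheus.yml (per instance) -- external labels are attached to every block
# the sidecar uploads; the full label set must be unique per Prometheus instance.
global:
  external_labels:
    cluster: prod            # hypothetical value
    replica: prometheus-0    # must differ on the other replica (e.g. prometheus-1)
```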
Interesting... yes, I see that now. I've deleted all the duplicate blocks that I could find and the compactor appears to be running smoothly again. I'll continue to monitor it over the next few days and make sure it doesn't start writing multiple blocks again. Can you think of any reason Thanos would be writing out two blocks from each replica? At this point it's hard to determine whether it was a Thanos issue or a configuration issue, since the prometheus-operator obfuscates a lot of the specific configuration of the resources.
Did you set the TSDB min block time to be equal to the TSDB max block time? Perhaps local compaction wasn't turned off. We do check this now in newer versions if the sidecar is configured to upload blocks.
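For reference, the requirement mentioned here is normally satisfied by setting Prometheus's TSDB min and max block durations to the same value (2h), which disables Prometheus's local compaction so the sidecar only ever uploads fresh 2h blocks. A sketch of the relevant container args, assuming a plain Prometheus StatefulSet (image tag and surrounding fields are illustrative):

```yaml
# Fragment of a Prometheus container spec (illustrative).
# Equal min/max block durations disable local compaction, which is required
# when the Thanos sidecar uploads blocks to object storage.
containers:
- name: prometheus
  image: prom/prometheus:v2.9.2
  args:
  - --config.file=/etc/prometheus/prometheus.yml
  - --storage.tsdb.path=/prometheus
  - --storage.tsdb.min-block-duration=2h
  - --storage.tsdb.max-block-duration=2h
```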
So I cleared out all of the duplicate blocks and the compactor ran fine for a day, but when I checked again the following day there were a bunch of duplicates again. I cleared them all out again and changed the image to version 0.4.0-rc.1 and it's been running fine ever since. I think it may have been due to something in the older version. The compactor has been running fine for the last few days now, so I'll close this ticket. Thanks for everyone's help!
I also met the same problem; my compactor manifest (image improbable/thanos:v0.5.0) is below:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-compactor
  serviceName: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
      - args:
        - compact
        - --log.level=debug
        - --data-dir=/var/thanos/store
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --wait
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objstore-config
        image: improbable/thanos:v0.5.0
        name: thanos-compactor
        ports:
        - containerPort: 10902
          name: http
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      volumes:
      - emptyDir: {}
        name: data
  volumeClaimTemplates: []
```
@chrisghill In your message where you pasted the output of

I'm having the same problem but I cannot figure it out. My labels look unique:
Image: improbable/thanos:v0.2.1
What happened
We're running prometheus-operator in our Kubernetes cluster, with Thanos enabled. We've been running just Kube-Prometheus for ~6 months and we added Thanos about a month ago. I was circling back to verify that compaction was working correctly, and when I checked the compactor logs I saw that it had failed and was continuing to fail every time it ran.
I'm aware there are other tickets referencing this issue, but it appears the solution was to delete one or two offending blocks. In my case it appears to be every single block.
Also, I bumped the version of the thanos-compactor image to v0.4.0-rc.1 and it is experiencing the same issue.
What you expected to happen
The compactor to run successfully.
How to reproduce it (as minimally and precisely as possible):
Every time I restart it, the compactor pod halts with the overlap error.
Full logs to relevant components
Anything else we need to know
I'll post the manifests of everything here:
Prometheus Statefulset
Note that we're running 2 Prometheus servers, each with a 500GB PVC. The config is pretty standard (a sketch of the external-label side of this setup follows this list of manifests).
Thanos-store Statefulset
Thanos-compactor Statefulset
Objstore-config Secret
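Since this setup is prometheus-operator based, the external labels end up being controlled by the Prometheus custom resource rather than a hand-written prometheus.yml. A rough sketch of the relevant fields is below; names and values are illustrative, and exact field names (particularly under thanos:) may differ between operator versions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  replicas: 2
  # externalLabels are attached to every uploaded block; combined with the
  # per-replica label the operator normally injects (e.g. prometheus_replica),
  # each replica's label set should be unique.
  externalLabels:
    cluster: my-cluster   # hypothetical
  thanos:
    baseImage: improbable/thanos
    version: v0.5.0
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 500Gi
```

If the per-replica label is missing from the uploaded blocks, both replicas produce blocks with identical label sets, which is exactly the overlap the compactor complains about; checking the label column of the bucket inspector output is the quickest way to verify this.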