Request: shard (group) count metrics for influxdb #1221
Good idea, should be simple to implement as well.
@sparrc that was fast! Thanks.
I can confirm this is working correctly with Telegraf.
@sparrc this is not what I expected. What we need is a gauge, so we can see when shards are cleaned up as well as when they are created. Repeated creation followed by deletion indicates a problem where old data is being inserted.
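To illustrate why a gauge helps, here is a minimal Go sketch. It assumes a plugin samples the *current* number of shards at each collection interval. Summing the upward and downward steps between consecutive samples separates creation from deletion, and a lot of both with little net change is the create-then-delete churn described above. The function `shardChurn` and the sample values are hypothetical.

```go
package main

import "fmt"

// shardChurn walks consecutive gauge samples (current shard counts) and
// returns the total shards added, the total removed, and the net change.
// It only sees changes between samples: a shard created and deleted within
// a single interval is invisible to a gauge.
func shardChurn(samples []int) (created, deleted, net int) {
	for i := 1; i < len(samples); i++ {
		d := samples[i] - samples[i-1]
		if d > 0 {
			created += d
		} else {
			deleted -= d // d <= 0, so this adds |d|
		}
	}
	if len(samples) > 1 {
		net = samples[len(samples)-1] - samples[0]
	}
	return created, deleted, net
}

func main() {
	// Hypothetical samples: shards repeatedly created, then removed by retention.
	samples := []int{100, 104, 100, 105, 100}
	c, d, n := shardChurn(samples)
	fmt.Printf("created=%d deleted=%d net=%d\n", c, d, n) // created=9 deleted=9 net=0
}
```

A counter of "shards ever created" would only show the `created` side, which is why the chart keeps climbing even when the real shard count is stable.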
I don't understand what you mean.
This is the output in Grafana (screenshot not preserved). But in fact the shards are not growing all the time: we currently have about 100 shards, and this will fall back to about 70 once retention kicks in. So I expect the chart to look like this (sketch not preserved). The first pattern indicates that something is writing old data that is constantly being cleaned up.
can you provide the output from
I can't send you the output, but here are some observations:
So I guess the reported shards are all the shards ever created, and the filtered tsm1_filestore results are actually what
I see, that's a pretty easy fix then. I thought that all shard entries were supposed to be counted.
I expected to see the same output as
@sparrc any idea when the nightlies are built, so I can test this before release?
I believe it's 2 or 3 AM US/PST.
Hmm, I'll try around the same time tomorrow then.
Just downloaded the nightly. After installation I see a very small drop, but the number in n_shards is still way too high. I also just noticed that my previous observations might not be correct:

```console
# skipping headers with the grep
$ influx -username xxx -password xxx -execute "SHOW SHARDS" | grep "T00:00:00Z" | wc -l
61
$ curl -s http://localhost:8086/debug/vars | grep shard | wc -l
2408
$ curl -s http://localhost:8086/debug/vars | grep shard | grep -v "\"values\": {}" | wc -l
2373
$ curl -s http://localhost:8086/debug/vars | grep tsm1_filestore | wc -l
2408
$ curl -s http://localhost:8086/debug/vars | grep tsm1_filestore | grep -v "\"values\": {}" | wc -l
42
```

I guess contacting somebody from the InfluxDB team and asking what to filter on, or looking into the source, might be a better idea?
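The `grep tsm1_filestore | grep -v '"values": {}'` pipeline can be approximated more robustly by decoding the JSON. This is a minimal Go sketch that assumes each statistics entry in `/debug/vars` is an object with `name`, `tags`, and `values` fields; that shape, the function `countNonEmpty`, and the sample payload are assumptions for illustration, not taken from InfluxDB documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statEntry is the ASSUMED shape of one statistics entry in /debug/vars:
// a measurement name, its tags, and its values.
type statEntry struct {
	Name   string                 `json:"name"`
	Tags   map[string]string      `json:"tags"`
	Values map[string]interface{} `json:"values"`
}

// countNonEmpty counts entries whose name equals want and whose values map
// is non-empty, roughly mirroring
//   grep <want> | grep -v '"values": {}' | wc -l
// Top-level entries that do not decode into statEntry (e.g. arrays) are skipped.
func countNonEmpty(raw []byte, want string) (int, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(raw, &top); err != nil {
		return 0, err
	}
	n := 0
	for _, msg := range top {
		var e statEntry
		if err := json.Unmarshal(msg, &e); err != nil {
			continue // not an object of the assumed shape
		}
		if e.Name == want && len(e.Values) > 0 {
			n++
		}
	}
	return n, nil
}

func main() {
	// Hypothetical payload with one populated and one empty filestore entry.
	sample := []byte(`{
		"cmdline": ["influxd"],
		"tsm1_filestore:1": {"name": "tsm1_filestore", "tags": {"id": "1"}, "values": {"diskBytes": 1024}},
		"tsm1_filestore:2": {"name": "tsm1_filestore", "tags": {"id": "2"}, "values": {}}
	}`)
	n, err := countNonEmpty(sample, "tsm1_filestore")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 1
}
```

Unlike `grep | wc -l`, this counts entries rather than lines, so it does not depend on how the endpoint wraps its output.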
```go
// Shards associated with deleted shard groups are effectively deleted.
// Don't list them.
if sgi.Deleted() {
	continue
}
```

Maybe the /debug/vars call should also apply that filter?
That doesn't look like it's going to be possible, so we'll probably need to start running queries against the database for some of these metrics. Unfortunately, that will also require a considerably larger change, because we need a user we can authenticate as.
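Querying the database instead could be sketched as below: send `SHOW SHARDS` to the InfluxDB 1.x `/query` endpoint with HTTP basic auth, then count the returned rows. The response shape (one series per database, one row per shard) is an assumption about the 1.x JSON format, and `newShowShardsRequest` and `countShardRows` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// newShowShardsRequest builds a GET request for SHOW SHARDS against the
// /query endpoint, authenticating with HTTP basic auth.
func newShowShardsRequest(baseURL, user, pass string) (*http.Request, error) {
	q := url.Values{}
	q.Set("q", "SHOW SHARDS")
	req, err := http.NewRequest(http.MethodGet, baseURL+"/query?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	return req, nil
}

// queryResponse models only the ASSUMED parts of the response we need.
type queryResponse struct {
	Results []struct {
		Series []struct {
			Name   string          `json:"name"`
			Values [][]interface{} `json:"values"`
		} `json:"series"`
		Err string `json:"error"`
	} `json:"results"`
}

// countShardRows counts shard rows across all series in a SHOW SHARDS response.
func countShardRows(body []byte) (int, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	n := 0
	for _, r := range resp.Results {
		if r.Err != "" {
			return 0, fmt.Errorf("query error: %s", r.Err)
		}
		for _, s := range r.Series {
			n += len(s.Values)
		}
	}
	return n, nil
}

func main() {
	req, err := newShowShardsRequest("http://localhost:8086", "xxx", "xxx")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
	// Sending it would use http.DefaultClient.Do(req); here we parse a
	// hypothetical response body instead.
	sample := []byte(`{"results":[{"statement_id":0,"series":[
		{"name":"_internal","columns":["id","database"],"values":[[1,"_internal"],[2,"_internal"]]},
		{"name":"telegraf","columns":["id","database"],"values":[[3,"telegraf"]]}]}]}`)
	n, err := countShardRows(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 3
}
```

Because SHOW SHARDS already skips shards in deleted groups, counting its rows would give the live shard count, at the cost of needing credentials.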
I would prefer to create a ticket for InfluxDB that adds this deleted check to /debug/vars.
Closing due to lack of interest and discussion. Feel free to reopen if desired.
We had problems with InfluxDB when large numbers of new shards were being created (influxdata/influxdb#6635).
With this metric we would have been able to correlate our crashes with shard creation.