Request: shard (group) count metrics for influxdb #1221

francisdb · 2016-05-18T15:09:39Z

We had problems with InfluxDB when large numbers of new shards were being created. (influxdata/influxdb#6635)
With this metric we would have been able to correlate our crashes with the shard creation.

sparrc · 2016-05-18T16:13:09Z

good idea, should be simple to implement as well

closes #1221

francisdb · 2016-05-18T19:12:17Z

@sparrc that was fast! Thanks

francisdb · 2016-05-24T15:15:11Z

I can confirm this is working correctly with telegraf 0.13.1

francisdb · 2016-05-24T15:59:22Z

@sparrc this is not what I expected. What we need is a gauge, so we can see where shards are cleaned up as well as created. Repeated creation followed by deletion indicates a problem where old data is being inserted.

sparrc · 2016-05-24T16:53:47Z

I don't understand what you mean

francisdb · 2016-05-24T18:24:56Z

This is the output in grafana:

But in fact the number of shards is not growing all the time; we currently have about 100 shards. This will fall back to about 70 once retention kicks in.

So I expect the chart to go like
70 -> 100 -> 70 -> 100
instead of
70 -> 100 -> 130 -> 160

The first pattern indicates that something is writing old data that is being cleaned up all the time.

sparrc · 2016-05-24T18:49:07Z

can you provide the output from http://<host>:8086/debug/vars in a gist or attachment?

francisdb · 2016-05-24T19:27:03Z

I can't send you the output but here are some observations:

SHOW SHARDS in influx shows ~100 rows
sudo find /var/lib/influxdb/data/ -type d | wc -l returns ~100
the shard: section in debug/vars shows far more shards, ~350
filtering out the shard: entries with "values": {} leaves about 270 rows
looking at tsm1_filestore: and filtering out the ones with "values": {} leaves ~100 rows

So I guess the reported shards are all the shards ever created and the filtered tsm1_filestore results are actually what SHOW SHARDS reports.

sparrc · 2016-05-24T20:01:16Z

I see, that's a pretty easy fix then, I thought that all shard entries were supposed to be counted.

francisdb · 2016-05-24T20:47:43Z

I expected to see the same output as SHOW SHARDS, maybe both are interesting?

francisdb · 2016-05-24T20:53:37Z

This is what the current results are if influx is restarted (influxdb 0.12.2)

closes #1221

francisdb · 2016-05-25T13:11:34Z

@sparrc any idea when the nightlies are built so I can test this before release?

sparrc · 2016-05-25T13:25:36Z

I believe it's 2am or 3am US/PST

francisdb · 2016-05-25T13:29:34Z

hmm, I'll try about the same time tomorrow then

francisdb · 2016-05-26T14:02:00Z

Just downloaded the nightly

 Package: telegraf
 Version: 0.14.0~n201605260840-0

After installation I see a very small drop, but the number in n_shards is still far too high

I also just saw that my previous observations might not be correct

#skipping headers with the grep
influx -username xxx -password xxx -execute "SHOW SHARDS" | grep "T00:00:00Z" | wc -l
61

curl -s http://localhost:8086/debug/vars | grep shard | wc -l
2408

curl -s http://localhost:8086/debug/vars | grep shard | grep -v "\"values\": {}" | wc -l
2373

curl -s http://localhost:8086/debug/vars | grep tsm1_filestore | wc -l
2408

curl -s http://localhost:8086/debug/vars | grep tsm1_filestore | grep -v "\"values\": {}" | wc -l
42

I guess contacting somebody on the InfluxDB team and asking what to filter on, or looking into the source, might be a better idea?

francisdb · 2016-05-26T14:10:24Z

https://github.com/influxdata/influxdb/blob/cebe256773387b65a1e35658039cfd4df540f402/coordinator/statement_executor.go#L666

// Shards associated with deleted shard groups are effectively deleted.
// Don't list them.
if sgi.Deleted() {
    continue
}

maybe the /debug/vars call should also apply that filter?

sparrc · 2016-05-26T15:29:06Z

doesn't look like that's going to be possible, so we'll probably need to start running queries on the db for some of these metrics. Unfortunately that will also require a considerably larger change, because we need a user we can authenticate as.

francisdb · 2016-05-26T18:56:16Z

I would prefer to create a ticket for InfluxDB asking for this deleted check to be applied to /debug/vars

sjwang90 · 2020-05-29T20:26:18Z

Closing due to lack of interest and discussion. Feel free to reopen if desired.

sparrc added a commit that referenced this issue May 18, 2016

influxdb input: Add shard counter

b065573

closes #1221

sparrc mentioned this issue May 18, 2016

influxdb input: Add shard counter #1222

Merged

sparrc closed this as completed in #1222 May 18, 2016

sparrc reopened this May 24, 2016

sparrc added a commit that referenced this issue May 25, 2016

only count shard if it's non-empty

2ded98d

closes #1221

sparrc added a commit that referenced this issue May 25, 2016

only count shard if it's non-empty

916d15f

closes #1221

sparrc closed this as completed in 6351aa5 May 25, 2016

sparrc reopened this May 26, 2016

sjwang90 closed this as completed May 29, 2020
