Compaction planning fixes #8420

Merged
merged 3 commits into from
May 23, 2017
jwilder commented May 23, 2017

Required for all non-trivial PRs
  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

This fixes several issues related to compaction planning.

  • A bug introduced in Compaction limits #8348 meant that planned files were never released.
  • A failure to write a snapshot masked the underlying file creation error.
  • A race in findGenerations could leave it stuck returning no files.

jwilder added 3 commits May 23, 2017 12:05
It was possible for findGenerations to get stuck returning
no files even when generations existed on disk.

The defer was never executed because the planning happens in a
long-running goroutine that loops.  The plans need to be released
immediately after applying them.

The root error when creating a tmp file while writing a snapshot
was hidden, making it difficult to determine why snapshots were
failing.
jwilder merged commit 2c91eab into master May 23, 2017
jwilder deleted the jw-snap-err branch May 23, 2017 19:59
jwilder removed the review label May 23, 2017
garceri commented May 24, 2017

Compiled with the fixes from #8420, cleaned the data/wal/meta directories, and started InfluxDB. I'm getting the same behavior: no compaction errors (or any messages about compaction at all) in the logs. It was working fine, then..

[root@ip-10-10-0-228 ~]# df -h /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-wal /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-data
Filesystem                Size      Used Available Use% Mounted on
/dev/xvdg               196.7G    869.7M    185.9G   0% /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-wal
/dev/xvdf               295.2G     65.1M    281.1G   0% /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-data
[root@ip-10-10-0-228 ~]# df -h /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-wal /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-data
Filesystem                Size      Used Available Use% Mounted on
/dev/xvdg               196.7G      1.6G    185.1G   1% /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-wal
/dev/xvdf               295.2G     21.6G    259.6G   8% /var/lib/rancher/volumes/rancher-ebs/ebs_influxdb-data

These are the contents of the log (with httpd and query messages filtered out):

[root@ip-10-10-0-228 13]# docker logs 94a2d24eaebb 2>&1 |grep -v httpd |grep -v query

 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

[I] 2017-05-23T19:50:19Z InfluxDB starting, version 1.2.2, branch jw-snap-err, commit 29e4287fd29e7bfb2d0c1c51322432af701225fb
[I] 2017-05-23T19:50:19Z Go version go1.8.1, GOMAXPROCS set to 2
[I] 2017-05-23T19:50:19Z Using configuration at: /etc/influxdb/influxdb.conf
[I] 2017-05-23T19:50:19Z Using data dir: /data/influxdb/data service=store
[I] 2017-05-23T19:50:19Z opened service service=subscriber
[I] 2017-05-23T19:50:19Z Starting monitor system service=monitor
[I] 2017-05-23T19:50:19Z 'build' registered for diagnostics monitoring service=monitor
[I] 2017-05-23T19:50:19Z 'runtime' registered for diagnostics monitoring service=monitor
[I] 2017-05-23T19:50:19Z 'network' registered for diagnostics monitoring service=monitor
[I] 2017-05-23T19:50:19Z 'system' registered for diagnostics monitoring service=monitor
[I] 2017-05-23T19:50:19Z Starting precreation service with check interval of 10m0s, advance period of 30m0s service=shard-precreation
[I] 2017-05-23T19:50:19Z Starting snapshot service service=snapshot
[I] 2017-05-23T19:50:19Z Starting retention policy enforcement service with check interval of 30m0s service=retention
[I] 2017-05-23T19:50:19Z Starting graphite service, batch size 5000, batch timeout 1s service=graphite addr=:2003
[I] 2017-05-23T19:50:19Z 'graphite:tcp::2003' registered for diagnostics monitoring service=monitor
[I] 2017-05-23T19:50:19Z Listening on TCP: [::]:2003 service=graphite addr=:2003
[I] 2017-05-23T19:50:19Z Listening for signals
[I] 2017-05-23T19:50:19Z Sending usage statistics to usage.influxdata.com
[I] 2017-05-23T19:50:19Z Storing statistics in database '_internal' retention policy 'monitor', at interval 10s service=monitor
[I] 2017-05-23T20:20:19Z retention policy shard deletion check commencing service=retention
[I] 2017-05-23T20:50:19Z retention policy shard deletion check commencing service=retention
[I] 2017-05-23T21:03:11Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/6 written in 13.575555ms engine=tsm1
[I] 2017-05-23T21:20:19Z retention policy shard deletion check commencing service=retention
[I] 2017-05-23T21:50:19Z retention policy shard deletion check commencing service=retention
[I] 2017-05-23T22:20:19Z retention policy shard deletion check commencing service=retention
[I] 2017-05-23T22:50:19Z retention policy shard deletion check commencing service=retention
[I] 2017-05-23T23:20:19Z retention policy shard deletion check commencing service=retention
[I] 2017-05-23T23:30:19Z new shard group 7 successfully precreated for database _internal, retention policy monitor service=metaclient
[I] 2017-05-23T23:44:11Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:11Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 284.508755ms engine=tsm1
[I] 2017-05-23T23:44:11Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:12Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:12Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 272.665117ms engine=tsm1
[I] 2017-05-23T23:44:12Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:13Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:13Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 274.509375ms engine=tsm1
[I] 2017-05-23T23:44:13Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:14Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:14Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 275.367036ms engine=tsm1
[I] 2017-05-23T23:44:14Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:15Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:15Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 281.434839ms engine=tsm1
[I] 2017-05-23T23:44:15Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:16Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:16Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 652.751358ms engine=tsm1
[I] 2017-05-23T23:44:16Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:17Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:17Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 271.625156ms engine=tsm1
[I] 2017-05-23T23:44:17Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:18Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:18Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 274.495871ms engine=tsm1
[I] 2017-05-23T23:44:18Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:19Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:19Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 276.933868ms engine=tsm1
[I] 2017-05-23T23:44:19Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:20Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:20Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 275.980105ms engine=tsm1
[I] 2017-05-23T23:44:20Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:21Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:21Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 279.672511ms engine=tsm1
[I] 2017-05-23T23:44:21Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:22Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:22Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 270.917213ms engine=tsm1
[I] 2017-05-23T23:44:22Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:23Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:23Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 314.955125ms engine=tsm1
[I] 2017-05-23T23:44:23Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:24Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:24Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 278.192211ms engine=tsm1
[I] 2017-05-23T23:44:24Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:25Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:25Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 348.018476ms engine=tsm1
[I] 2017-05-23T23:44:25Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:26Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:26Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 286.781978ms engine=tsm1
[I] 2017-05-23T23:44:26Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:27Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:27Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 271.066347ms engine=tsm1
[I] 2017-05-23T23:44:27Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:28Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:28Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 271.063343ms engine=tsm1
[I] 2017-05-23T23:44:28Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:29Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:29Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 274.267954ms engine=tsm1
[I] 2017-05-23T23:44:29Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:30Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:30Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 281.407001ms engine=tsm1
[I] 2017-05-23T23:44:30Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:31Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:31Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 523.680446ms engine=tsm1
[I] 2017-05-23T23:44:31Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:32Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:32Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 274.511117ms engine=tsm1
[I] 2017-05-23T23:44:32Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:33Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:33Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 271.734162ms engine=tsm1
[I] 2017-05-23T23:44:33Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:34Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:34Z Snapshot for path /data/influxdb/data/telegraf-dev/autogen/5 written in 277.049705ms engine=tsm1
[I] 2017-05-23T23:44:34Z error writing snapshot: indirectIndex: not enough data for max time engine=tsm1
[I] 2017-05-23T23:44:35Z error adding new TSM files from snapshot: indirectIndex: not enough data for max time engine=tsm1
...
