-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(storage): Offline Series Compaction #16542
Conversation
fdc1a8d
to
72fa14e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that it's that fast. I seem to recall it being quite a bit slower when I prototyped it. Is the main improvement you made to add the segment compaction directly on the segment itself?
Couple of other questions:
- what steps have been taken to test/verify this?
- are you at all concerned about the size of the segments no longer being powers of two after compaction?
33a23ba
to
e1346d8
Compare
@e-dard I'm testing on an 8-core i9 and it's running in parallel. I was surprised by the speed too though. I added a bunch of I'm OOO today and tomorrow but I'll add some additional test coverage. I did hit a bug with the use of I'm not concerned about the sizing being off. That was originally added so that we could create a file with a fixed size for Windows. |
Added test coverage for the series compaction. I'm getting a CI error from Semantic Pull Request but I can't figure out why. |
@benbjohnson the PR title also has to be in the same format |
@8none1 I think we should get some real-world testing of this under our belts before 1.8 ships. Perhaps by taking a real workload and verifying the tool does the right thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
96e9f6b
to
f3f041e
Compare
f3f041e
to
34c0fda
Compare
v1.8.4 [2021-01-27] ------------------- ### Bugfixes - [#19696](influxdata/influxdb#19697): fix(flux): add durations to Flux logging v1.8.3 [2020-09-30] ------------------- ### Features - [#19187](influxdata/influxdb#19187): feat: Collect values written stats. - [#19611](influxdata/influxdb#19611): feat: Add -lponly flag to export sub-command. ### Bugfixes - [#19409](influxdata/influxdb#19409): chore: update uuid library from satori to gofrs. - [#19439](influxdata/influxdb#19439): fix(storage): ArrayFilterCursor truncation for multi-block data. - [#19460](influxdata/influxdb#19460): chore: Use latest version of influxql package. - [#19512](influxdata/influxdb#19512): chore: Quiet static analysis tools. - [#19592](influxdata/influxdb#19592): fix(services/storage): multi measurement queries return all applicable series. - [#19612](influxdata/influxdb#19612): fix: lock map before writes. v1.8.2 [2020-08-13] ------------------- ### Bugfixes - [#19253](influxdata/influxdb#19253): fix(tsdb): Revert disable series id set cache size by default. v1.8.1 [2020-07-08] ------------------- ### Features - [#18457](influxdata/influxdb#18457): feat(query): Parallelize field iterator planning. - [#18886](influxdata/influxdb#18886): feat(http): Allow user supplied HTTP headers. ### Bugfixes - [#17319](influxdata/influxdb#17319): fix(flux): buckets call no longer panics. - [#18212](influxdata/influxdb#18212): fix(tsdb): Defer closing of underlying SeriesIDSetIterators. - [#18286](influxdata/influxdb#18286): fix(tsdb): Disable series id set cache size by default. - [#18299](influxdata/influxdb#18299): refactor(http): Simplify Authorizer. - [#18694](influxdata/influxdb#18694): fix(tsi1): wait deleting epoch before dropping shard. - [#18687](influxdata/influxdb#18687): perf(tsi1): batch write tombstone entries when dropping/deleting. - [#18826](influxdata/influxdb#18826): fix: gracefully handle errors when creating snapshots. - [#18849](influxdata/influxdb#18849): chore(build): remove all of the go1.12 references from build. v1.8.0 [2020-04-11] ------------------- ### Features - [#15952](influxdata/influxdb#15952): Add influx_inspect verify-tombstone tool. - [#16542](influxdata/influxdb#16542): Add offline series compaction to influx_inspect buildtsi. - [#16599](influxdata/influxdb#16599): Make influx CLI support custom HTTP endpoint. - [#16908](influxdata/influxdb#16908): Add support for InfluxDB 2.0 write API. - [#17621](influxdata/influxdb#17621): Update Flux to v0.65.0. - [#17188](influxdata/influxdb#17188): Enhance support for bound parameters. ### Bugfixes - [#10503](influxdata/influxdb#10503): Delete rebuilds series index when series to be deleted are only found in cache. - [#10504](https://github.com/influxdata/influxdb/issue/10504): Delete rebuilds series index when series to be deleted are outside timerange. - [#14485](influxdata/influxdb#14485): Parse Accept header correctly. - [#16524](influxdata/influxdb#16524): Upgrade compaction error log from `Info` to `Warn`. - [#16525](influxdata/influxdb#16525): Remove double increment of meta index. - [#16595](influxdata/influxdb#16595): Improve series cardinality limit for inmem index. - [#16606](influxdata/influxdb#16606): Ensure all block data returned. - [#16627](influxdata/influxdb#16627): Skip WriteSnapshot during backup if snapshotter is busy. - [#16709](influxdata/influxdb#16709): Reduce influxd and influx startup time if Flux isn't used. - [#16762](influxdata/influxdb#16762): Fix bugs in -compact-series-file. - [#16944](influxdata/influxdb#16944): Update to Go 1.13.8 and Go modules. - [#17032](influxdata/influxdb#17032): Fix a SIGSEGV when accessing tsi active log. - [#17656](influxdata/influxdb#17656): Verify precision in write requests. - [#17698](influxdata/influxdb#17698): Enable configuration of TLS 1.3.
v1.8.4 [2021-01-27] ------------------- ### Bugfixes - [#19696](influxdata/influxdb#19697): fix(flux): add durations to Flux logging v1.8.3 [2020-09-30] ------------------- ### Features - [#19187](influxdata/influxdb#19187): feat: Collect values written stats. - [#19611](influxdata/influxdb#19611): feat: Add -lponly flag to export sub-command. ### Bugfixes - [#19409](influxdata/influxdb#19409): chore: update uuid library from satori to gofrs. - [#19439](influxdata/influxdb#19439): fix(storage): ArrayFilterCursor truncation for multi-block data. - [#19460](influxdata/influxdb#19460): chore: Use latest version of influxql package. - [#19512](influxdata/influxdb#19512): chore: Quiet static analysis tools. - [#19592](influxdata/influxdb#19592): fix(services/storage): multi measurement queries return all applicable series. - [#19612](influxdata/influxdb#19612): fix: lock map before writes. v1.8.2 [2020-08-13] ------------------- ### Bugfixes - [#19253](influxdata/influxdb#19253): fix(tsdb): Revert disable series id set cache size by default. v1.8.1 [2020-07-08] ------------------- ### Features - [#18457](influxdata/influxdb#18457): feat(query): Parallelize field iterator planning. - [#18886](influxdata/influxdb#18886): feat(http): Allow user supplied HTTP headers. ### Bugfixes - [#17319](influxdata/influxdb#17319): fix(flux): buckets call no longer panics. - [#18212](influxdata/influxdb#18212): fix(tsdb): Defer closing of underlying SeriesIDSetIterators. - [#18286](influxdata/influxdb#18286): fix(tsdb): Disable series id set cache size by default. - [#18299](influxdata/influxdb#18299): refactor(http): Simplify Authorizer. - [#18694](influxdata/influxdb#18694): fix(tsi1): wait deleting epoch before dropping shard. - [#18687](influxdata/influxdb#18687): perf(tsi1): batch write tombstone entries when dropping/deleting. - [#18826](influxdata/influxdb#18826): fix: gracefully handle errors when creating snapshots. - [#18849](influxdata/influxdb#18849): chore(build): remove all of the go1.12 references from build. v1.8.0 [2020-04-11] ------------------- ### Features - [#15952](influxdata/influxdb#15952): Add influx_inspect verify-tombstone tool. - [#16542](influxdata/influxdb#16542): Add offline series compaction to influx_inspect buildtsi. - [#16599](influxdata/influxdb#16599): Make influx CLI support custom HTTP endpoint. - [#16908](influxdata/influxdb#16908): Add support for InfluxDB 2.0 write API. - [#17621](influxdata/influxdb#17621): Update Flux to v0.65.0. - [#17188](influxdata/influxdb#17188): Enhance support for bound parameters. ### Bugfixes - [#10503](influxdata/influxdb#10503): Delete rebuilds series index when series to be deleted are only found in cache. - [#10504](https://github.com/influxdata/influxdb/issue/10504): Delete rebuilds series index when series to be deleted are outside timerange. - [#14485](influxdata/influxdb#14485): Parse Accept header correctly. - [#16524](influxdata/influxdb#16524): Upgrade compaction error log from `Info` to `Warn`. - [#16525](influxdata/influxdb#16525): Remove double increment of meta index. - [#16595](influxdata/influxdb#16595): Improve series cardinality limit for inmem index. - [#16606](influxdata/influxdb#16606): Ensure all block data returned. - [#16627](influxdata/influxdb#16627): Skip WriteSnapshot during backup if snapshotter is busy. - [#16709](influxdata/influxdb#16709): Reduce influxd and influx startup time if Flux isn't used. - [#16762](influxdata/influxdb#16762): Fix bugs in -compact-series-file. - [#16944](influxdata/influxdb#16944): Update to Go 1.13.8 and Go modules. - [#17032](influxdata/influxdb#17032): Fix a SIGSEGV when accessing tsi active log. - [#17656](influxdata/influxdb#17656): Verify precision in write requests. - [#17698](influxdata/influxdb#17698): Enable configuration of TLS 1.3.
This pull request adds the ability to offline compact existing series files into the
influx_inspect buildtsi
tool:It iterates over all series in each segment and rewrites all series that have not been tombstoned in the index to a new
.tmp
file next to the segment. Once all segments are converted, the temporary files overwrite the original segments.The included benchmark shows that compacting 1M series takes only about 63ms:
% go test -bench=BenchmarkSeriesFile_Compaction -run=XXX -v ./tsdb goos: darwin goarch: amd64 pkg: github.com/influxdata/influxdb/tsdb BenchmarkSeriesFile_Compaction-16 21 62932487 ns/op PASS ok github.com/influxdata/influxdb/tsdb 14.363s
Describe your proposed changes here.