-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pin chunk
and index
format to schema
version.
#10213
Conversation
chunk
and index
format to schema
version.chunk
and index
format to schema
version.
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
1971171
to
c904bf9
Compare
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
…fig` Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
chunk
and index
format to schema
version.chunk
and index
format to schema
version.
Signed-off-by: Kaviraj <[email protected]>
Are there any updates we need to make to the documentation when this change goes through? |
I think you mean the non-index labels changes went into ChunkV4. I think yes. @sandeepsukhani @vlad-diachenko @salvacorts worked on this.
Yes. Schema v13 needs to be added. I will do that either in this PR or follow up PR. |
pkg/storage/stores/indexshipper/compactor/retention/retention_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! Thank you Kavi, this will ease our lives a lot when implementing and rolling out new chunk formats 🚀
I left out some minor comments.
pkg/ingester/stream.go
Outdated
@@ -594,7 +611,7 @@ func (s *stream) resetCounter() { | |||
|
|||
func headBlockType(unorderedWrites bool) chunkenc.HeadBlockFmt { | |||
if unorderedWrites { | |||
return chunkenc.DefaultHeadBlockFmt | |||
return chunkenc.UnorderedWithNonIndexedLabelsHeadBlockFmt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it would be good to have a DefaultUnorderedHeadFmt
const to use in this case and similar cases (if any) that way, we don't need to manually find and replace these occurrences but just the constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea. Only problem is once we export such variables outside the chunkenc
, people use it instead of taking it from schema
config.
Besides, particularly here this unordered head fmt should also depends on either chunkv3 or chunkv4
(cannot use normal UnorderedBlockFmt with chunk4 for example).
I handled it differently by exporting ChunkHeadFormatFor(chunkfmt)
api from chunkenc
package.
pkg/storage/stores/indexshipper/compactor/retention/retention_test.go
Outdated
Show resolved
Hide resolved
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work Kavi! The changes look good to me, modulo Salva's suggestions which are yet to be resolved. While reviewing the code, I had similar concerns about having to update the tests outside of the chunkenc
package to use the latest chunk format. I think we can put a LatestPeriodConfig
or RecommendedPeriodConfig
somewhere for the tests and use it for tests building chunks. The chunkenc
package can then have most tests running against all supported chunk formats.
Signed-off-by: Kaviraj <[email protected]>
Thanks for the review @sandeepsukhani. Agree. Basically I tweaked tests inside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -210,7 +210,7 @@ func (r *ingesterRecoverer) Close() { | |||
s.unorderedWrites = isAllowed | |||
|
|||
if !isAllowed && old { | |||
err := s.chunks[len(s.chunks)-1].chunk.ConvertHead(headBlockType(isAllowed)) | |||
err := s.chunks[len(s.chunks)-1].chunk.ConvertHead(headBlockType(s.chunkFormat, isAllowed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
headBlockType
actually doesn't make sense to me because isAllowed
here is always going to be false. Anyways, not going to block the PR on this.
Will merge this once getting some reviews from @owen-d as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One fix needed then lgtm
pkg/ingester/instance.go
Outdated
schemaconfig *config.SchemaConfig | ||
|
||
// precomputed `Chunk` and `Head` formats parsed from schemaconfig | ||
chunkfmt byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be detected when writing a chunk within the stream
, not on the instance. The instance
may be exist across schema boundaries. This leaves us with two choices:
- set the format based on the min timestamp in a chunk when it's cut
- set the format based on the max timestamp in a chunk when it's cut
I think (1) is the more conservative approach and thus more appropriate. We may write an older format to a newer schema bucket, but we won't write a newer format to an older bucket. To be exact, the chunk is written once then added to the index buckets for different days. I'd rather us have some v13-schema indices reference v12 compatible chunks than vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. thanks @owen-d for the explanation.
In fact my initial commit used "per-stream" chunk format selection, but was using model.Now()
instead of min/max
timestamp. Makes more sense to use correct timestamp here.
Also agree with min
timestamp.
Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
Hello @kavirajk!
Please, if the current pull request addresses a bug fix, label it with the |
We pin all three `Chunk`, `HeadBlock` and `TSDB` Version to `schema` version in period config. This is the following mapping (after being discussed with @owen-d and @sandeepsukhani ) `v12` (current existing schema) - ChunkFormatV3 (UnorderedHeadBlock) + TSDBv2 `v13` (introducing new schema) - ChunkFormatV4 (UnorderedWithNonIndexedLabelsHeadBlockFmt) + TSDBv3 Note the new schema `v13` supports the latest chunk and index format. **NOTES for Reviewer** 1. General approach is we removed the idea of `index.LiveFormat`, `chunkenc.DefaultChunkFormat` and `chunkenc.DefaultHeadBlockFmt` and made following two changes. These variables were used before to tie chunk and tsdb formats specific to Loki versions. This PR remove that coupling and pin these formats to `schema` version instead. 1. These variables were replaced with explicit chunk and index formats within those packages (and it's tests) 2. If these variables were used outside it's own packages say by ingester, compactor, etc. Then we extract correct chunk and index versions from the `schema` config. 2. Add two methods to `periodConfig`. (1) `ChunkFormat()` returning chunk and head format tied to schema (2) `TSDBFormat()` returning tsdb format tied to schema. 2. Other ideas I thought of doing but didn't end up doing is make `ChunkFormat` and `IndexFormat` as separate type (rather than `byte` and `int` currently. Similar to `HeadBlockFmt` type). But didnt' do it eventually to keep the PR small and don't want to complicate with lots of changes. 4. Moved couple of test cases from `chunkenc` to `config` package, because the test case was actually testing methods on `schemaconfig` and it was creating cycling dependencies. --------- Signed-off-by: Kaviraj <[email protected]> (cherry picked from commit bbfb13c)
…10333) Backport bbfb13c from #10213 --- We pin all three `Chunk`, `HeadBlock` and `TSDB` Version to `schema` version in period config. This is the following mapping (after being discussed with @owen-d and @sandeepsukhani ) `v12` (current existing schema) - ChunkFormatV3 (UnorderedHeadBlock) + TSDBv2 `v13` (introducing new schema) - ChunkFormatV4 (UnorderedWithNonIndexedLabelsHeadBlockFmt) + TSDBv3 Note the new schema `v13` supports the latest chunk and index format. **NOTES for Reviewer** 1. General approach is we removed the idea of `index.LiveFormat`, `chunkenc.DefaultChunkFormat` and `chunkenc.DefaultHeadBlockFmt` and made following two changes. These variables were used before to tie chunk and tsdb formats specific to Loki versions. This PR remove that coupling and pin these formats to `schema` version instead. 1. These variables were replaced with explicit chunk and index formats within those packages (and it's tests) 2. If these variables were used outside it's own packages say by ingester, compactor, etc. Then we extract correct chunk and index versions from the `schema` config. 2. Add two methods to `periodConfig`. (1) `ChunkFormat()` returning chunk and head format tied to schema (2) `TSDBFormat()` returning tsdb format tied to schema. 2. Other ideas I thought of doing but didn't end up doing is make `ChunkFormat` and `IndexFormat` as separate type (rather than `byte` and `int` currently. Similar to `HeadBlockFmt` type). But didnt' do it eventually to keep the PR small and don't want to complicate with lots of changes. 4. Moved couple of test cases from `chunkenc` to `config` package, because the test case was actually testing methods on `schemaconfig` and it was creating cycling dependencies. Co-authored-by: Kaviraj Kanagaraj <[email protected]>
We pin all three
Chunk
,HeadBlock
andTSDB
Version toschema
version in period config.This is the following mapping (after being discussed with @owen-d and @sandeepsukhani )
v12
(current existing schema) - ChunkFormatV3 (UnorderedHeadBlock) + TSDBv2v13
(introducing new schema) - ChunkFormatV4 (UnorderedWithNonIndexedLabelsHeadBlockFmt) + TSDBv3Note the new schema
v13
supports the latest chunk and index format.NOTES for Reviewer
General approach is we removed the idea of
index.LiveFormat
,chunkenc.DefaultChunkFormat
andchunkenc.DefaultHeadBlockFmt
and made following two changes. These variables were used before to tie chunk and tsdb formats specific to Loki versions. This PR remove that coupling and pin these formats toschema
version instead.schema
config.Add two methods to
periodConfig
. (1)ChunkFormat()
returning chunk and head format tied to schema (2)TSDBFormat()
returning tsdb format tied to schema.Other ideas I thought of doing but didn't end up doing is make
ChunkFormat
andIndexFormat
as separate type (rather thanbyte
andint
currently. Similar toHeadBlockFmt
type). But didnt' do it eventually to keep the PR small and don't want to complicate with lots of changes.Moved couple of test cases from
chunkenc
toconfig
package, because the test case was actually testing methods onschemaconfig
and it was creating cycling dependencies.