Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added ability to specify head block format version #9841

Closed
wants to merge 1 commit into from

Conversation

vlad-diachenko
Copy link
Contributor

@vlad-diachenko vlad-diachenko commented Jul 1, 2023

What this PR does / why we need it:

Currently, the first byte of head block data identifies the type of the head block(ordered/unordered). However, using this approach, it's not possible to distinguish the version of the data that is stored in this head block.
In the next version of the chunk format (V4), we add additional data and when we deserialize the data, it's necessary to know the version of the data format to use different deserialization strategies.

To make it possible, the first byte of the head block now specifies the version of the head block, not the type(ordered/unordered).

Special notes for your reviewer:

  1. This PR does not enable the new format of the head block format and it will be enabled once chunk format V4 is added in this PR.
  2. VersionedHeadBlockFmtV1 byte = 128 - as long as previously the first byte of the head block was the chunk format version, I decided to use 128 as a baseline for head block format versioning. it allows to avoid the collision with chunk formats v1,v2 if somebody decides to upgrade to the latest version of Loki from some legacy version.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR

@vlad-diachenko vlad-diachenko requested a review from a team as a code owner July 1, 2023 04:08
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me. I just left some minor suggestions.

@@ -77,6 +77,9 @@ const (
_
OrderedHeadBlockFmt
UnorderedHeadBlockFmt

// head block format that stores chunk format version as well as type of head block (ordered/unordered)
VersionedHeadBlockFmtV1 byte = 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go with 5 here since 128 won't let us increment the version anymore. I do not see any problem with going with 5, so please let me know if I am missing something.

@@ -131,14 +134,21 @@ type block struct {
// This block holds the un-compressed entries. Once it has enough data, this is
// emptied into a block with only compressed entries.
type headBlock struct {
chunkDataFormat byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the byte type, what do you think about defining a type called chunkFormat and using that wherever it is required? It makes it easier to find the references.

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it possible, the first byte of the head block now specifies the version of the head block, not the type(ordered/unordered).

Conceptually, these are the same thing. Ordered vs Unordered are just the two versions we support (with clearer names than something like HeadV1 vs HeadV2). Even if we do go this new route you suggest, I don't see a reason to start with VersionedHeadBlockFmtV1 = 128 (we could just use the next incremented number and add a second bytes encoded after that).

Can we go about this in a different way?

I think it's much simpler to add a new head format version (UnorderedWithMetadataFmt) that will (1) accept unordered entries and (2) encode non-indexed metadata. It's not intended that we'll need to mix-and-match different headblock vs chunk formats. Asides the issue of some of this being conceptually impossible (we can't have a head block accept non-indexed-metadata but then write to an older chunk format that doens't support it), it's intended that loki starts writing new versions while being able to read old versions and not the other way around.

In summary, I think it's clearer and simpler to add a new head format that accepts unordered entries with optional non-indexed-metadata: UnorderedWithMetadataFmt. WDYT?

@vlad-diachenko
Copy link
Contributor Author

vlad-diachenko commented Jul 6, 2023

To make it possible, the first byte of the head block now specifies the version of the head block, not the type(ordered/unordered).

Conceptually, these are the same thing. Ordered vs Unordered are just the two versions we support (with clearer names than something like HeadV1 vs HeadV2). Even if we do go this new route you suggest, I don't see a reason to start with VersionedHeadBlockFmtV1 = 128 (we could just use the next incremented number and add a second bytes encoded after that).

Can we go about this in a different way?

I think it's much simpler to add a new head format version (UnorderedWithMetadataFmt) that will (1) accept unordered entries and (2) encode non-indexed metadata. It's not intended that we'll need to mix-and-match different headblock vs chunk formats. Asides the issue of some of this being conceptually impossible (we can't have a head block accept non-indexed-metadata but then write to an older chunk format that doens't support it), it's intended that loki starts writing new versions while being able to read old versions and not the other way around.

In summary, I think it's clearer and simpler to add a new head format that accepts unordered entries with optional non-indexed-metadata: UnorderedWithMetadataFmt. WDYT?

Hey @owen-d . yes, if we are going to remove/deprecate ordered writes, we can just add a new format of a head like UnorderedWithMetadataFmt. However, currently, each user can disable/enable unordered-writes and Loki decides what format of head block to use: OrderedHeadBlockFmt, UnorderedHeadBlockFmt. So, with the current implementation of Loki, it's necessary to add 2 new formats OrderedWithMetadataFmt and UnorderedWithMetadataFmt because some users can have unordered-writes disabled but still want to use non-indexed metadata.

Also, even with the current implementation, it does not stop us from removing/deprecating ordered writes in the future.

So, I would go with my solution, and just change the value of VersionedHeadBlockFmtV1 to 5, so it would allow us to add new versions without any issues in the future and add as many options to head block as we want...

WDYT @owen-d , @sandeepsukhani , @salvacorts ?

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the route we chose, the main thing we're going to need to implement is the ability to write entries with non-indexed-metadata in both the unordered head block as well as in the blocks. If we choose to do this with the old ordered-only head as well instead of deprecating+removing ordered-only writes, that's fine too :).

The problem with using your (head_version, block_version) tuple is that it forces us to support the matrix of all head_versions and all block_versions in the future. In reality, we may want to add a new head_version or block_version without needing to make it compatible with all formats of the other.

Does that make sense?

I think we should use the explicit variant I've suggested and quickly deprecate the ordered-only ingestion variants as a followup

@@ -369,10 +376,10 @@ func (hb *unorderedHeadBlock) Serialise(pool WriterPool) ([]byte, error) {
}

func (hb *unorderedHeadBlock) Convert(version HeadBlockFmt) (HeadBlock, error) {
if version > OrderedHeadBlockFmt {
if version == UnorderedHeadBlockFmt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for changing this, I'm not sure what I was thinking :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you wanted to make this logic flexible to use unorderedHeadBlock for all the versions after OrderedHeadBlockFmt ;) like 4, 5,6,7,etc...

@sandeepsukhani
Copy link
Contributor

However, currently, each user can disable/enable unordered-writes and Loki decides what format of head block to use: OrderedHeadBlockFmt, UnorderedHeadBlockFmt.

We default to enabling unordered writes and it is highly unlikely someone would go out of the way and disable unordered writes. Less code is always better so I think it would be better to keep it simple i.e always create unordered blocks and use chunk format version going forward.

@vlad-diachenko
Copy link
Contributor Author

closed in favor of 529ad9d

@chaudum chaudum deleted the vlad.diachenko/head_block_versioning branch December 20, 2024 10:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants