[PLAT-102919] auto delete overlapped blocks + support filter based on compaction level #16

Merged
merged 5 commits into from
Mar 13, 2024

Conversation

jnyi
Collaborator

@jnyi jnyi commented Mar 12, 2024

Currently Thanos halts if it detects overlapping blocks. An incorrect setup on our side caused blocks to overlap with each other, and we need to clean them up automatically. This might also address the upstream issue "allow overlapping source block for the compaction plan" (thanos-io/thanos issue #5755).

See more in investigation doc: https://docs.google.com/document/d/1VGQUoXI8k1QntMPxiEL4z_Fj9lx5STrLyzof3GqZ298/edit

This PR also adds support for filtering by compaction level, using __block_level in relabel configs.
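To illustrate the idea of filtering on compaction level, here is a minimal sketch that exposes the level as a synthetic __block_level label and evaluates keep/drop rules against it. It assumes semantics similar to Prometheus relabeling (fully anchored regexes, keep and drop actions). The rule type, mustRule, and selected are hypothetical names for illustration and are not the code in this PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// blockLevelLabel is the synthetic label name mentioned in the PR description.
const blockLevelLabel = "__block_level"

// rule is a simplified, hypothetical stand-in for one relabel rule with a
// single source label and a keep or drop action.
type rule struct {
	SourceLabel string
	Regex       *regexp.Regexp
	Drop        bool // false means "keep"
}

// mustRule compiles expr as a fully anchored regex (an assumption borrowed
// from how Prometheus relabeling treats regexes).
func mustRule(source, expr string, drop bool) rule {
	return rule{SourceLabel: source, Regex: regexp.MustCompile("^(?:" + expr + ")$"), Drop: drop}
}

// selected reports whether a block with the given external labels and
// compaction level survives all rules.
func selected(labels map[string]string, level int, rules []rule) bool {
	// Copy the labels so the caller's map is not mutated, then expose the
	// compaction level as a label value.
	lset := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		lset[k] = v
	}
	lset[blockLevelLabel] = strconv.Itoa(level)

	for _, r := range rules {
		matched := r.Regex.MatchString(lset[r.SourceLabel])
		// A drop rule that matches, or a keep rule that does not, filters the block out.
		if matched == r.Drop {
			return false
		}
	}
	return true
}

func main() {
	// Drop every block that is still at compaction level 1.
	rules := []rule{mustRule(blockLevelLabel, "1", true)}
	labels := map[string]string{"cluster": "a"}
	fmt.Println(selected(labels, 1, rules)) // false
	fmt.Println(selected(labels, 2, rules)) // true
}
```

Modeling the level as a label keeps the filter in the same configuration surface as the existing label-based selection, which appears to be the intent of the description.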

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

Signed-off-by: Yi Jin <[email protected]>
@@ -418,6 +425,7 @@ func NewGroup(
groupGarbageCollectedBlocks prometheus.Counter,
blocksMarkedForDeletion prometheus.Counter,
blocksMarkedForNoCompact prometheus.Counter,
blocksOverlapped prometheus.Counter,
Collaborator

@hczhu-db hczhu-db Mar 12, 2024

It looks odd that this counter name needs to be repeated 5 times in different places.

level.Warn(cg.logger).Log("msg", "found overlapping block in plan that are not the first",
"first", kept.String(), "block", m.String())
kept = m
} else if m.MinTime < kept.MinTime || m.MaxTime > kept.MaxTime {
Collaborator

@hczhu-db hczhu-db Mar 12, 2024

Two disjoint blocks (e.g., [0, 1) and [2, 3)) will take this branch as well. Should they? I would expect something like:

if kept contains m {
} else if m contains kept {
} else {
  halt;
}
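The three-way check suggested above can be made explicit by classifying how two half-open time ranges relate. This is only a sketch: the span type, the relation constants, and classify are hypothetical names and do not come from the PR. It adds a fourth outcome for the disjoint case raised in this comment.

```go
package main

import "fmt"

// span is a half-open time range [Min, Max), mirroring a block's
// MinTime/MaxTime. The type is hypothetical and not taken from the PR.
type span struct{ Min, Max int64 }

type relation string

const (
	disjoint   relation = "disjoint"
	aContainsB relation = "a contains b"
	bContainsA relation = "b contains a"
	partial    relation = "partial overlap"
)

// classify reports how two half-open ranges relate to each other.
// Identical ranges are reported as "a contains b".
func classify(a, b span) relation {
	switch {
	case a.Max <= b.Min || b.Max <= a.Min:
		return disjoint
	case a.Min <= b.Min && b.Max <= a.Max:
		return aContainsB
	case b.Min <= a.Min && a.Max <= b.Max:
		return bContainsA
	default:
		return partial
	}
}

func main() {
	fmt.Println(classify(span{0, 1}, span{2, 3}))  // disjoint
	fmt.Println(classify(span{0, 10}, span{2, 5})) // a contains b
	fmt.Println(classify(span{2, 5}, span{0, 10})) // b contains a
	fmt.Println(classify(span{0, 5}, span{3, 8}))  // partial overlap
}
```

With the cases separated like this, only the partial-overlap result would need to halt, and the disjoint result could simply move on to the next block.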

Collaborator Author

@jnyi jnyi Mar 12, 2024

Actually, this triggered a halt because of partially overlapping blocks:

ts=2024-03-12T05:59:15.143363282Z caller=compact.go:533 level=error name=pantheon-compactor
 msg="critical error detected; halting" err="compaction: group 0@9802502949926001568: found partially 
overlapped block: 
01HQHWVPCPH519JQN3RC2HRM6D (min time: 1708747200000, max time: 1708819200000) vs 
01HQF7C1GKTJZSW7B0BWTRS6K0 (min time: 1708732800000, max time: 1708761600000)"

Collaborator Author

It comes from this group:
01HQF7C1GKTJZSW7B0BWTRS6K0 (min time: 1708732800000, max time: 1708761600000)
01HQD12MFPJ9EYMQGXRFWFHNE3 (min time: 1708740000000, max time: 1708761600000)
01HQHWVPCPH519JQN3RC2HRM6D (min time: 1708747200000, max time: 1708819200000)
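To see why this group ends in a halt, the sketch below walks the three blocks in MinTime order, collects blocks that are fully covered by the block currently kept, and returns an error on a partial overlap. The block IDs and timestamps are the ones listed above; the meta type, the resolve function, and the error text are hypothetical and only approximate the logic under review.

```go
package main

import (
	"fmt"
	"sort"
)

// meta is a hypothetical stand-in for block metadata; only the fields
// needed for the overlap check are modeled.
type meta struct {
	ID               string
	MinTime, MaxTime int64
}

// group holds the three blocks quoted in this comment.
var group = []meta{
	{"01HQF7C1GKTJZSW7B0BWTRS6K0", 1708732800000, 1708761600000},
	{"01HQD12MFPJ9EYMQGXRFWFHNE3", 1708740000000, 1708761600000},
	{"01HQHWVPCPH519JQN3RC2HRM6D", 1708747200000, 1708819200000},
}

// resolve walks blocks in MinTime order. Blocks fully covered by another
// block are returned as deletion candidates; a partial overlap is an error.
func resolve(blocks []meta) (del []string, err error) {
	if len(blocks) == 0 {
		return nil, nil
	}
	sorted := append([]meta(nil), blocks...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].MinTime < sorted[j].MinTime })

	kept := sorted[0]
	for _, m := range sorted[1:] {
		switch {
		case m.MinTime >= kept.MaxTime: // disjoint: m becomes the new reference
			kept = m
		case m.MaxTime <= kept.MaxTime: // kept contains m
			del = append(del, m.ID)
		case m.MinTime == kept.MinTime: // m contains kept (same start, later end)
			del = append(del, kept.ID)
			kept = m
		default: // m starts inside kept and ends after it
			return del, fmt.Errorf("partially overlapped block: %s vs %s", m.ID, kept.ID)
		}
	}
	return del, nil
}

func main() {
	del, err := resolve(group)
	fmt.Println("delete:", del)
	fmt.Println("error:", err)
}
```

For this group, 01HQD12MFPJ9EYMQGXRFWFHNE3 is fully covered by 01HQF7C1GKTJZSW7B0BWTRS6K0 and becomes a deletion candidate, while 01HQHWVPCPH519JQN3RC2HRM6D starts inside the first block and ends after it, which is the same pair the halting log reports.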

Collaborator

I meant that the condition m.MinTime < kept.MinTime || m.MaxTime > kept.MaxTime also matches the disjoint case, not only the overlapping case.
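A quick way to check this point is to evaluate the quoted condition next to a plain overlap test. The block type and the two helper functions below are hypothetical; only the boolean expression in quoted is copied from the diff.

```go
package main

import "fmt"

// block models only the time range of a block (hypothetical type).
type block struct{ MinTime, MaxTime int64 }

// quoted is the condition from the diff, copied as-is.
func quoted(kept, m block) bool {
	return m.MinTime < kept.MinTime || m.MaxTime > kept.MaxTime
}

// overlaps reports whether two half-open ranges share any time.
func overlaps(a, b block) bool {
	return a.MinTime < b.MaxTime && b.MinTime < a.MaxTime
}

func main() {
	// Disjoint pair from the earlier comment: [0, 1) and [2, 3).
	kept, m := block{0, 1}, block{2, 3}
	fmt.Println(quoted(kept, m), overlaps(kept, m)) // true false

	// Partially overlapping pair: [0, 5) and [3, 8).
	kept, m = block{0, 5}, block{3, 8}
	fmt.Println(quoted(kept, m), overlaps(kept, m)) // true true
}
```

The condition is true in both cases, so on its own it cannot tell a disjoint block from a partially overlapping one; an explicit overlap test would be needed before halting.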

Comment on lines 1052 to 1060
cg.blocksOverlapped.Inc()
if err := os.RemoveAll(filepath.Join(dir, m.ULID.String())); err != nil {
return errors.Wrapf(err, "remove old block dir %s", m.String())
}
if blockDeletableChecker.CanDelete(cg, m.ULID) {
level.Warn(cg.logger).Log("msg", "deleting overlapping block", "block", m.String(),
"level", m.Compaction.Level, "source", m.Thanos.Source, "labels", m.Thanos.Labels)
return block.Delete(ctx, cg.logger, cg.bkt, m.ULID)
}
Collaborator

This is not future-proof. If upstream adds another condition for block deletion that is not covered by blockDeletableChecker, this deletion logic would break. We need to be careful when pulling in upstream changes.

Collaborator Author

Makes sense. I am thinking of removing the block checker and forcing deletion here.

@jnyi jnyi requested a review from hczhu-db March 13, 2024 01:44
@jnyi jnyi changed the title [PLAT-102919] auto delete overlapped blocks [PLAT-102919] auto delete overlapped blocks + support filter based on compaction level Mar 13, 2024
@jnyi jnyi merged commit d0a908a into databricks:db_main Mar 13, 2024
11 checks passed
2 participants