
Bucket-rewrite: the new and old blocks are combined by the compactor to create another block #4550

Closed
ilangofman opened this issue Aug 10, 2021 · 7 comments

Comments

ilangofman (Contributor)

Thanos, Prometheus and Golang version used:

Thanos: Using https://github.com/thanos-io/thanos/tree/aa148f8fdb28
Prometheus: prometheus, version 2.27.1
Golang: go1.16.6 

What happened:

When the bucket rewrite tool runs, it creates a new block and marks the old one for deletion.

The old block is not deleted right away, as it waits for the deletion delay to pass. However, while it is marked for deletion, the compactor can still use it for compaction, because it ignores the deletion marker for a period of time, as explained here:
https://github.com/thanos-io/thanos/blob/aa148f8fdb28/cmd/thanos/compact.go#L224-L227.

When the compactor runs and picks up both the new block and the block marked for deletion, it sees two blocks with overlapping time intervals and proceeds to combine them. The combined block it produces contains the old, deleted data, and the block created by the bucket rewrite tool is then marked for deletion.

What you expected to happen:

That the new and old blocks are not merged together to create another block.

I think this occurs because the source blocks listed inside the meta.json files of the two blocks differ: the original block lists all of its source blocks, while the new block lists only itself as its source.

I think a possible fix for this is to add source blocks inside the meta.json of the new block. I believe they would need to be the source blocks of the rewritten block plus the rewritten block itself. This way the compactor would handle the two blocks similarly to how it avoids combining compacted blocks with their source blocks. There could be other ways to handle this as well. There was also some discussion in Cortex about not compacting any block marked for deletion at all, which I believe would also fix this issue: cortexproject/cortex#4328.
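
Roughly, I imagine something like the sketch below, using the Prometheus TSDB meta types that Thanos embeds in meta.json. This is only a sketch: the helper name is made up, and I haven't verified exactly where it would hook into the rewrite tool.

```go
package rewrite

import (
	"github.com/oklog/ulid"
	"github.com/prometheus/prometheus/tsdb"
)

// addRewriteSources is a hypothetical helper for the bucket rewrite tool: it
// copies the original block's compaction sources into the rewritten block's
// meta and appends the original block's own ULID, mirroring what the
// compactor records for a compacted block, so the planner treats the two
// blocks as related rather than as overlapping peers.
func addRewriteSources(newMeta, origMeta *tsdb.BlockMeta) {
	sources := make([]ulid.ULID, 0, len(origMeta.Compaction.Sources)+1)
	sources = append(sources, origMeta.Compaction.Sources...)
	sources = append(sources, origMeta.ULID)
	newMeta.Compaction.Sources = sources
}
```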

How to reproduce it (as minimally and precisely as possible):

  • Run the bucket rewrite tool on a block with the option to mark the old block for deletion.
  • Run the compactor soon after.
  • The compactor should see the two blocks and, because of the overlapping time intervals, combine them into a new block with the same data as the original block. The block created by the bucket rewrite (the one without the deleted data) is then marked for deletion.

Full logs to relevant components:

Anything else we need to know:

I am running the bucket rewrite logic from inside Cortex. I suspect the behavior is the same in Thanos, because Cortex runs the compactor through the Thanos code that detects the overlapping blocks.

ilangofman changed the title from "Buket-rewrite: the new and old blocks are combined by the compactor to create another block" to "Bucket-rewrite: the new and old blocks are combined by the compactor to create another block" Aug 10, 2021
yeya24 (Contributor) commented Aug 11, 2021

Yes, I have seen the same problem before.

For the solution you proposed, do you want to add the original sources to the new block as well? In that case, how do you deal with duplicated compaction sources? https://github.com/thanos-io/thanos/blob/main/pkg/compact/compact.go#L733 I might be wrong, though.

cortexproject/cortex#4328 tunes the delete-delay and consistency-delay values so that the old and new blocks are not both picked up by the compactor. That is a workaround rather than a solution.

What about adding a new block metadata filter that filters out the original block? Since the original block ID is present in the new block's metadata, we can remove the original block to avoid compacting it.

Another solution might be to add both a deletion marker and a no-compact marker to the original block; this seems like the easiest way. WDYT?
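
Roughly something like this, as a minimal sketch. The block.MarkForDeletion / block.MarkForNoCompact signatures are based on the revision linked in the issue description and may have drifted since, and the counters and import paths here are just placeholders:

```go
package rewrite

import (
	"context"

	"github.com/go-kit/kit/log"
	"github.com/oklog/ulid"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/thanos-io/thanos/pkg/block"
	"github.com/thanos-io/thanos/pkg/block/metadata"
	"github.com/thanos-io/thanos/pkg/objstore"
)

// markOriginal marks the pre-rewrite block both for deletion and as
// no-compact, so the compactor never merges it with the rewritten block,
// even while the deletion delay has not yet expired.
func markOriginal(ctx context.Context, logger log.Logger, bkt objstore.Bucket, id ulid.ULID) error {
	// Placeholder counters; real code would use registered metrics.
	deleted := prometheus.NewCounter(prometheus.CounterOpts{Name: "rewrite_blocks_marked_for_deletion_total"})
	noCompact := prometheus.NewCounter(prometheus.CounterOpts{Name: "rewrite_blocks_marked_for_no_compact_total"})

	if err := block.MarkForDeletion(ctx, logger, bkt, id, "rewritten by bucket rewrite", deleted); err != nil {
		return err
	}
	return block.MarkForNoCompact(ctx, logger, bkt, id, metadata.ManualNoCompactReason, "rewritten by bucket rewrite", noCompact)
}
```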

ilangofman (Contributor, Author) commented Aug 12, 2021

I was experimenting with setting the sources of the new block to the original block ID plus the original block's sources. I also added the original block to the list of parent blocks (not sure if this makes a difference). My thought process was to mimic how compaction creates the meta.json file: after compaction produces a new block, it doesn't combine that block with the smaller source blocks, because the smaller blocks are listed as sources of the new one. I don't know whether this is the best approach, and it could confuse someone looking at the new meta.json file.

So if the original block is BlockA, with sources Block1 and Block2,
the rewritten block would be BlockB, with sources Block1, Block2, and BlockA.

> Another solution might be to add both a deletion marker and a no-compact marker to the original block; this seems like the easiest way. WDYT?

One case that could potentially arise: suppose a new block was recently created through regular compaction (let's call it BlockA), and the source blocks of BlockA are marked for deletion as usual. If we were to rewrite BlockA (creating BlockB) and place a no-compact marker on BlockA, that would avoid the issue of combining BlockA and BlockB. However, if BlockA was only recently created, the ignoreDeletionMarkFilter might still let the compactor select the source blocks of BlockA for compaction, and I think it would then combine those source blocks with BlockB because of the overlapping time intervals. I think this is a possible scenario, but I am not 100% sure. Please let me know your thoughts.

> What about adding a new block metadata filter that filters out the original block? Since the original block ID is present in the new block's metadata, we can remove the original block to avoid compacting it.

I think this makes sense. If the scenario described above is also valid, then we would have to filter out all of the original source blocks as well, which are currently present in the bucket rewrite field inside the meta.json file.
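
As a rough sketch of that filter: the Filter signature here is simplified compared to the real block.MetadataFilter interface (which also takes a synced gauge), and the rewrite/sources field names would need to be checked against what the rewrite tool actually writes for the Thanos version in use.

```go
package rewrite

import (
	"context"

	"github.com/oklog/ulid"
	"github.com/thanos-io/thanos/pkg/block/metadata"
)

// ExcludeRewrittenSourcesFilter drops every block whose ULID is recorded as a
// rewrite source of some other (newer) block, so the compactor never sees the
// original and the rewritten block side by side.
type ExcludeRewrittenSourcesFilter struct{}

func (f *ExcludeRewrittenSourcesFilter) Filter(_ context.Context, metas map[ulid.ULID]*metadata.Meta) error {
	rewritten := map[ulid.ULID]struct{}{}
	for _, m := range metas {
		// Field names below assume the rewrite tool records its source block
		// IDs in the block's Thanos metadata; verify against the version in use.
		for _, rw := range m.Thanos.Rewrites {
			for _, src := range rw.Sources {
				rewritten[src] = struct{}{}
			}
		}
	}
	for id := range rewritten {
		delete(metas, id)
	}
	return nil
}
```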

stale bot commented Oct 11, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Oct 11, 2021
stale bot commented Oct 30, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

1 similar comment
stale bot commented Jan 9, 2022

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jan 9, 2022
@yeya24 yeya24 reopened this Jan 10, 2022
@stale stale bot removed the stale label Jan 10, 2022
stale bot commented Apr 17, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 17, 2022
stale bot commented May 1, 2022

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed May 1, 2022