
feat(compactor): calculate pending bytes for scale compactor #6497

Closed

Conversation

Little-Wallace
Contributor

@Little-Wallace Little-Wallace commented Nov 21, 2022

Signed-off-by: Little-Wallace [email protected]

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

PLEASE DO NOT LEAVE THIS EMPTY !!!

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • Summarize your change (mandatory)
  • How does this PR work? Need a brief introduction for the changed logic (optional)
  • Describe clearly one logical change and avoid lazy messages (optional)
  • Describe any limitations of the current code (optional)

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

#6477

Signed-off-by: Little-Wallace <[email protected]>
@Li0k Li0k self-requested a review November 21, 2022 10:55
@hzxa21
Collaborator

hzxa21 commented Nov 22, 2022

2022-11-21T11:03:13.441880Z ERROR risingwave_storage::monitor::monitored_store: Failed in iter: Hummock error: Other error meta cache lookup request dedup get cancel: RecvError(()).
  backtrace of inner error:
   0: <risingwave_storage::hummock::error::HummockError as core::convert::From<risingwave_storage::hummock::error::HummockErrorInner>>::from
             at ./src/storage/src/hummock/error.rs:63:10
   1: <T as core::convert::Into<U>>::into

Related to #6483 ?

@codecov

codecov bot commented Nov 23, 2022

Codecov Report

Merging #6497 (a06d7ef) into main (3ec1ea0) will increase coverage by 0.02%.
The diff coverage is 83.13%.

@@            Coverage Diff             @@
##             main    #6497      +/-   ##
==========================================
+ Coverage   73.35%   73.37%   +0.02%     
==========================================
  Files        1010     1010              
  Lines      161906   162215     +309     
==========================================
+ Hits       118763   119024     +261     
- Misses      43143    43191      +48     
Flag Coverage Δ
rust 73.37% <83.13%> (+0.02%) ⬆️


Impacted Files Coverage Δ
src/meta/src/rpc/service/hummock_service.rs 3.03% <0.00%> (-0.08%) ⬇️
src/meta/src/hummock/compaction_schedule_policy.rs 92.94% <60.00%> (-4.81%) ⬇️
...a/src/hummock/compaction/tier_compaction_picker.rs 98.61% <66.66%> (-0.34%) ⬇️
src/meta/src/hummock/compaction_scheduler.rs 82.84% <75.00%> (-0.15%) ⬇️
src/meta/src/hummock/compaction/mod.rs 83.73% <79.06%> (-1.21%) ⬇️
src/meta/src/hummock/manager/mod.rs 75.72% <84.44%> (+0.35%) ⬆️
...ta/src/hummock/manager/compaction_group_manager.rs 63.88% <85.71%> (+0.19%) ⬆️
src/meta/src/hummock/compaction/level_selector.rs 96.67% <96.99%> (+0.07%) ⬆️
...hummock/compaction/base_level_compaction_picker.rs 100.00% <100.00%> (ø)
...src/hummock/compaction/manual_compaction_picker.rs 99.47% <100.00%> (+0.85%) ⬆️
... and 8 more


Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
@@ -82,6 +88,81 @@ impl DynamicLevelSelector {
inner: LevelSelectorCore::new(config, overlap_strategy),
}
}

fn calculate_l0_overlap(
Collaborator

Please correct me if I am wrong: this method calculates "total size of non-pending L0 files + total size of base level SSTs overlapping with the non-pending L0 SSTs". Based on this, I have the following questions:

  1. Shouldn't we use if !handlers[0].is_pending_compact(&table_info.id) in L102 so that we actually use non-pending L0 SSTs to calculate the overlap? (See the sketch after this list.)
  2. output_files in L108 represents the pending SSTs in the base level whose output goes to the base level itself. IIUC, we should consider these SSTs in next_level_size instead of ignoring them in L116. To be more precise, I think we should consider non-pending SSTs and pending SSTs with target level == base level for next_level_size.
  3. Do we intentionally ignore L0 sub-level compaction?
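
A self-contained toy sketch of what point 1 suggests, filtering out pending L0 SSTs before summing sizes; the struct and helper below are hypothetical stand-ins for illustration, not the RisingWave types:

use std::collections::HashSet;

// Hypothetical stand-in for an SST descriptor; only the fields needed for the sketch.
struct SstInfo {
    id: u64,
    file_size: u64,
}

// Sum the sizes of L0 SSTs that are not already pending compaction,
// mirroring the suggested `!handlers[0].is_pending_compact(&table_info.id)` check.
fn non_pending_l0_size(l0: &[SstInfo], pending_ids: &HashSet<u64>) -> u64 {
    l0.iter()
        .filter(|sst| !pending_ids.contains(&sst.id))
        .map(|sst| sst.file_size)
        .sum()
}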

if compact_bytes + compacting_file_size + target_bytes >= select_level.total_file_size {
break;
}
if output_files.contains(&sst.id) {
Collaborator

Why do we ignore pending SSTs whose output goes to the select level?

Contributor Author

fixed

Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
}
}
let mut next_level_size = 0;
let next_level_files =
Contributor

The logic is understandable now, but the variable names make it easy to get lost while reading.

I tried to modify several names to distinguish between l0 and base_level; how about this?

total_level_size -> l0_total_level_size

next_level_files -> pending_base_level_files 
next_level_size -> base_level_size

handlers: &[LevelHandler],
) -> u64 {
if select_level.total_file_size <= target_bytes {
return 0;
Contributor

Why do we use zero instead of an exact value when <= target_bytes? Is it because the next compact_task won't be spawned?

level_handlers: &[LevelHandler],
) -> u64 {
let ctx = self.inner.calculate_level_base_size(levels);
let mut pending_compaction_bytes = self.calculate_l0_overlap(
Contributor

Rename to l0_pending_compaction_bytes?


use crate::hummock::compaction::level_selector::{DynamicLevelSelector, LevelSelector};
use crate::hummock::compaction::manual_compaction_picker::ManualCompactionSelector;
use crate::hummock::compaction::overlap_strategy::{OverlapStrategy, RangeOverlapStrategy};
use crate::hummock::level_handler::LevelHandler;

// We assume that every core can compact data at 50MB/s, so when there is 32GB of data
// waiting to be compacted, a new compactor node with 8 cores could consume it within 2 minutes.
const COMPACTION_BYTES_PER_CORE: u64 = 4 * 1024 * 1024 * 1024;
Contributor

This assumption may not hold for all machine types. Should we make it a configuration option, or at least add a TODO in this PR?
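
For reference, a rough sketch of the arithmetic behind that constant; the helper below is hypothetical and only illustrates how pending bytes could translate into a core count under the 50MB/s-per-core assumption:

// Same constant as in the diff above: ~4 GiB of pending compaction bytes per core.
const COMPACTION_BYTES_PER_CORE: u64 = 4 * 1024 * 1024 * 1024;

// Hypothetical helper: cores needed to absorb the backlog, rounding up.
// At ~50MB/s per core, one core drains 4 GiB in roughly 4 * 1024 / 50 ≈ 82 seconds,
// which is where the "within 2 minutes" estimate in the code comment comes from.
fn suggested_cores(pending_compaction_bytes: u64) -> u64 {
    (pending_compaction_bytes + COMPACTION_BYTES_PER_CORE - 1) / COMPACTION_BYTES_PER_CORE
}

// Example: 32 GiB of pending bytes -> 8 cores, matching the 8-core node in the comment.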

< compactor.max_concurrent_task_number()
{
.unwrap_or(&0);
if running_task < 2 * compactor.max_concurrent_task_number() {
Contributor

In this PR, the limit of running_task per compactor is max_concurrent_task_number * 2 when the scheduler is in the Burst state, right?
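
A minimal sketch of the cap that question describes, assuming a Burst scheduler state doubles the per-compactor limit; the enum and function below are hypothetical, not the PR's actual types:

// Hypothetical scheduler state, for illustration only.
enum SchedulerState {
    Normal,
    Burst,
}

// Whether one more task may be assigned to a compactor in the given state.
fn can_assign_task(state: &SchedulerState, running_task: u64, max_concurrent_task_number: u64) -> bool {
    match state {
        // Normal: stay within the configured concurrency.
        SchedulerState::Normal => running_task < max_concurrent_task_number,
        // Burst: allow up to twice the configured concurrency per compactor.
        SchedulerState::Burst => running_task < 2 * max_concurrent_task_number,
    }
}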

@github-actions
Contributor

This PR has been open for 60 days with no activity. Could you please update the status? Feel free to ping a reviewer if you are waiting for review.

@hzxa21
Collaborator

hzxa21 commented Feb 3, 2023

@Little-Wallace @Li0k Any updates? Are we still working on this?
