Lightning: optimize the estimated size #41943

Merged
18 commits merged on Mar 14, 2023

Conversation

okJiang
Member

@okJiang okJiang commented Mar 6, 2023

What problem does this PR solve?

Issue Number: close #41942

Problem Summary:

What is changed and how it works?

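TiKV typically compresses data at a ratio of about 3:1, so for backends other than the TiDB backend the estimated size (source data plus indexes) is divided by 3.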
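A minimal sketch of how such a ratio might be applied to a source size. The constant follows the roughly 3:1 ratio described above; `estimateStoredSize` and its `tidbBackend` flag are hypothetical names used for illustration and are not Lightning's actual API.

```go
package main

import "fmt"

// compressionRatio assumes stored data is roughly one third of the
// source size (the ~3:1 ratio described above).
const compressionRatio = float64(1) / 3

// estimateStoredSize is a hypothetical helper. It scales the source size
// (data plus indexes, in bytes) by the compression ratio unless the TiDB
// backend is in use, in which case the size is returned unchanged.
func estimateStoredSize(sizeWithIndex int64, tidbBackend bool) int64 {
	if tidbBackend {
		return sizeWithIndex
	}
	return int64(float64(sizeWithIndex) * compressionRatio)
}

func main() {
	const gib = int64(1) << 30
	// Print both estimates in whole GiB for a 300 GiB source.
	fmt.Println(estimateStoredSize(300*gib, false) / gib)
	fmt.Println(estimateStoredSize(300*gib, true) / gib)
}
```

Using a floating-point ratio rather than integer division by 3 keeps the constant reusable if the assumed ratio is later tuned to a non-integer value.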

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Member

ti-chi-bot commented Mar 6, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • dsdashun
  • gozssky

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 6, 2023
@okJiang
Member Author

okJiang commented Mar 6, 2023

/check-issue-triage-complete

@sleepymole sleepymole added the component/lightning This issue is related to Lightning of TiDB. label Mar 6, 2023
br/pkg/lightning/restore/get_pre_info.go (outdated, resolved)
br/pkg/lightning/restore/precheck_impl.go (outdated, resolved)
@@ -115,7 +115,8 @@ const (
status VARCHAR(32) NOT NULL,
state TINYINT(1) NOT NULL DEFAULT 0 COMMENT '0: normal, 1: exited before finish',
source_bytes BIGINT(20) UNSIGNED NOT NULL DEFAULT 0,
cluster_avail BIGINT(20) UNSIGNED NOT NULL DEFAULT 0,
Contributor

For existing clusters, this table has already been created, and the table is only created if it doesn't exist. This means that on those clusters the table schema will remain the old one, without the two new fields.

Member Author

Lightning is typically used only once. Is that fine?

@@ -561,6 +561,10 @@ func (p *PreRestoreInfoGetterImpl) EstimateSourceDataSize(ctx context.Context, o
}
}

if !isTiDBBackend(p.cfg) {
sizeWithIndex = sizeWithIndex / 3
Contributor

Can you add a comment here describing why the size should be divided by 3?

@@ -153,13 +162,23 @@ func (ci *clusterResourceCheckItem) Check(ctx context.Context) (*CheckResult, er
return nil, errors.Trace(err)
}
estimateSize := clusterSource * replicaCount
Contributor

Here, the estimated size is multiplied by the number of replicas. However, you have divided the available size of the entire cluster into TiKV size and TiFlash size. There may be false alarms. For example, suppose there's only one TiFlash node in a cluster, and it is sufficient to store one replica. However, the check indicates that this node is not large enough to store multiple replicas. In this situation, the check should pass IMO.

Member Author

After 263bba7 and a898ff6, the TiFlash check should work the same way as the TiKV check, which I think affects the result you describe.
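The concern in this thread can be sketched as checking each storage component against its own available space instead of one cluster-wide total. This is an illustration only: `componentCheck`, `checkComponents`, and `gib` are hypothetical names, the numbers are made up, and the message wording is modeled loosely on the check output quoted in this conversation.

```go
package main

import "fmt"

// componentCheck is a hypothetical record of one storage component's needs.
type componentCheck struct {
	name      string // e.g. "TiKV" or "TiFlash"
	required  uint64 // estimated bytes, replicas already included
	available uint64 // free bytes reported for this component
}

// gib formats a byte count in GiB with one decimal place.
func gib(b uint64) string {
	return fmt.Sprintf("%.1fGiB", float64(b)/float64(1<<30))
}

// checkComponents returns one message per component whose estimated
// requirement exceeds that component's own available space.
func checkComponents(items []componentCheck) []string {
	var msgs []string
	for _, it := range items {
		if it.required > it.available {
			msgs = append(msgs, fmt.Sprintf(
				"%s requires more storage space. Estimated required size: %s. Actual size: %s.",
				it.name, gib(it.required), gib(it.available)))
		}
	}
	return msgs
}

func main() {
	const g = uint64(1) << 30
	items := []componentCheck{
		// Three TiKV replicas of a 100 GiB data set.
		{name: "TiKV", required: 300 * g, available: 250 * g},
		// A single TiFlash replica of the same data set.
		{name: "TiFlash", required: 100 * g, available: 120 * g},
	}
	for _, m := range checkComponents(items) {
		fmt.Println(m)
	}
}
```

In this made-up scenario only TiKV is reported: the single TiFlash node holds one replica comfortably, so it does not raise a false alarm.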

@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 6, 2023
br/pkg/lightning/restore/get_pre_info.go (outdated, resolved)
br/pkg/lightning/restore/precheck_impl.go (outdated, resolved)
br/pkg/lightning/restore/precheck_impl.go (outdated, resolved)
@@ -54,6 +54,9 @@ import (
"golang.org/x/exp/maps"
)

// compressionRatio is the tikv/tiflash's compression ratio
const compressionRatio = float64(1) / 3
Member Author

@okJiang okJiang Mar 6, 2023

After asking a TiFlash team member: the size of one replica in TiKV is similar to the size of one replica in TiFlash.

Contributor

@sleepymole sleepymole left a comment

Rest LGTM.

} else {
// if sample data failed due to max-error, fallback to use source size
sizeWithIndex += tbl.TotalSize
tableSize = int64(float64(tbl.TotalSize) * tbl.IndexRatio)
Contributor

Can we remove the "if tbl.IndexRatio > 0" check? It seems tbl.IndexRatio cannot be zero.

Member Author

It is removed in 931f006

estimateSize := clusterSource * replicaCount
if estimateSize > clusterAvail {
estimateTikvSize := tikvSourceSize * replicaCount
// note: tiflashSourceSize contains replicaCount
Contributor

Can we also make tikvSourceSize include replicaCount?

Member Author

Yes, we can. Done in 931f006.

@@ -99,6 +99,10 @@ func (s *tableRestoreSuiteBase) setupSuite(t *testing.T) {
core, err := ddl.MockTableInfo(se, node.(*ast.CreateTableStmt), 0xabcdef)
require.NoError(t, err)
core.State = model.StatePublic
core.TiFlashReplica = &model.TiFlashReplicaInfo{
Contributor

Please add a test case for multiple tables with different tiflash replicas.

Member Author

Added in 931f006.
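The case requested in this thread, several tables with different TiFlash replica counts, might be accumulated along the following lines. This is a sketch under stated assumptions: `tableInfo` and `sumRequired` are hypothetical and are not the types used in the PR, and one TiFlash replica is assumed to take about as much space as one TiKV replica, as mentioned earlier in the review.

```go
package main

import "fmt"

// tableInfo is a hypothetical, simplified view of one table to import.
type tableInfo struct {
	name            string
	estimatedSize   uint64 // bytes for one replica, after compression
	tiflashReplicas uint64 // 0 when the table has no TiFlash replica
}

// sumRequired returns the total bytes needed on TiKV and on TiFlash.
// Both totals already include their replica counts, so callers can
// compare each one directly against that component's available space.
func sumRequired(tables []tableInfo, tikvReplicas uint64) (tikv, tiflash uint64) {
	for _, t := range tables {
		tikv += t.estimatedSize * tikvReplicas
		tiflash += t.estimatedSize * t.tiflashReplicas
	}
	return tikv, tiflash
}

func main() {
	tables := []tableInfo{
		{name: "t1", estimatedSize: 10, tiflashReplicas: 0},
		{name: "t2", estimatedSize: 20, tiflashReplicas: 1},
		{name: "t3", estimatedSize: 30, tiflashReplicas: 2},
	}
	tikv, tiflash := sumRequired(tables, 3)
	fmt.Println("TiKV bytes:", tikv, "TiFlash bytes:", tiflash)
}
```

Folding the replica count into both totals keeps the two values symmetric, which is the property the reviewer asked for with tikvSourceSize.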

@ti-chi-bot ti-chi-bot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 8, 2023
@ti-chi-bot ti-chi-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 8, 2023
@okJiang okJiang force-pushed the optimize-estimate-size branch from 7c637cf to 931f006 Compare March 8, 2023 07:19
@okJiang
Member Author

okJiang commented Mar 9, 2023

ptal again~ @dsdashun @gozssky

@ti-chi-bot
Member

@yabola: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 13, 2023
@sleepymole
Contributor

@okJiang Did you do any tests on real clusters?

@okJiang
Member Author

okJiang commented Mar 13, 2023

@okJiang Did you do any tests on real clusters?

Yes, but more detailed testing is underway.

resp="{"check_results":[{"check_item":"CHECK_ITEM_TARGET_CLUSTER_SIZE", "has_passed":false, "msg":"TiKV requires more storage space. Estimated required size: 312.6GiB. Actual size: 271.6GiB. Please increase storage to prevent import task failures."}

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 14, 2023
@okJiang
Member Author

okJiang commented Mar 14, 2023

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 309a05d

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 14, 2023
@okJiang okJiang merged commit 4ac0120 into pingcap:master Mar 14, 2023
Labels
component/lightning This issue is related to Lightning of TiDB. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lightning: estimated size is inaccurate
5 participants