stats: correct fast analyze stats calculation #10766

Merged
alivxxx merged 3 commits into pingcap:master from the fast-analyze branch on Jun 20, 2019

Conversation

alivxxx
Contributor

@alivxxx alivxxx commented Jun 11, 2019

What problem does this PR solve?

The stats built by fast analyze were wrong:

  • TotColSize was the column size within the sample, not the total column size of the table.
  • When building the CM Sketch with top N, each value's count was scaled by rowCount/sampleSize, even if the value occurred only once in the sample. This overestimates the row count, and it also conflicts with the calculation of defaultValue, which treats values that occur only once the same as values that do not occur at all.

What is changed and how it works?

  • Scale the total column size by rowCount/sampleSize.
  • For values that occurred only once in the sample, use defaultValue as their row count.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • None

Related changes

  • Need to cherry-pick to the release branch


codecov bot commented Jun 11, 2019

Codecov Report

Merging #10766 into master will decrease coverage by 0.0239%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##             master     #10766       +/-   ##
===============================================
- Coverage   80.9332%   80.9093%   -0.024%     
===============================================
  Files           419        419               
  Lines         88746      88750        +4     
===============================================
- Hits          71825      71807       -18     
- Misses        11690      11711       +21     
- Partials       5231       5232        +1

@alivxxx alivxxx requested a review from winoros June 12, 2019 10:58
Member

@zz-jason zz-jason left a comment

LGTM

@zz-jason zz-jason added the status/LGT1 and priority/release-blocker labels Jun 13, 2019
Contributor

@qw4990 qw4990 left a comment

LGTM

Contributor

qw4990 commented Jun 20, 2019

/run-all-tests

Contributor Author

alivxxx commented Jun 20, 2019

/run-sqllogic-test-2

@alivxxx alivxxx added the status/LGT2 label and removed the status/LGT1 label Jun 20, 2019
@alivxxx alivxxx merged commit 8c81e43 into pingcap:master Jun 20, 2019
@alivxxx alivxxx deleted the fast-analyze branch June 20, 2019 06:02
alivxxx added a commit to alivxxx/tidb that referenced this pull request Jun 24, 2019
Labels
component/statistics; priority/release-blocker (this issue blocks a release, please solve it ASAP); status/LGT2 (indicates that a PR has LGTM 2); type/bugfix (this PR fixes a bug)
5 participants