Statistics of TIMESTAMP column is stored as datetime string without timezone information #52429

Open
winoros opened this issue Apr 8, 2024 · 4 comments
Assignees
Labels
affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/statistics severity/moderate sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@winoros
Member

winoros commented Apr 8, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

image

As the screenshot shows, the type of this column is TIMESTAMP, but its histogram is stored as datetime strings without time zone information. The time zone is also not UTC; it is determined by the session that executes the ANALYZE command.

This is incorrect and introduces estimation errors whenever a query runs in a session whose time zone differs from that of the ANALYZE session.
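To make the failure mode concrete, here is a minimal Python sketch of the effect. It is not TiDB code: the `render` helper, the two session time zones, and the sample rows are all hypothetical, and the "histogram" is reduced to a plain list of bound strings. It only illustrates how zone-less strings written under one session time zone mislead a comparison made under another.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def render(instant, session_tz):
    """Format an absolute instant as a zone-less datetime string in session_tz."""
    return instant.astimezone(session_tz).strftime(FMT)

# Hypothetical sessions: ANALYZE runs at UTC+8, the query runs at UTC.
analyze_tz = timezone(timedelta(hours=8))
query_tz = timezone.utc

# Three rows, modelled as absolute instants (01:00, 02:00, 03:00 UTC).
rows = [datetime(2024, 4, 8, h, 0, 0, tzinfo=timezone.utc) for h in (1, 2, 3)]

# Bounds as the ANALYZE session would write them: strings with no zone.
stored_bounds = [render(r, analyze_tz) for r in rows]

# Predicate `ts <= '2024-04-08 02:00:00'` issued from the UTC session.
literal = "2024-04-08 02:00:00"

# What the query actually matches, evaluated in the query session's zone.
true_count = sum(render(r, query_tz) <= literal for r in rows)

# What a naive string comparison against the stored bounds suggests.
estimated_count = sum(b <= literal for b in stored_bounds)

print(stored_bounds, true_count, estimated_count)
```

In this sketch the stored bounds are shifted by eight hours relative to what the UTC session sees, so the naive comparison finds no matching bounds even though two rows satisfy the predicate.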

@winoros winoros added type/bug The issue is confirmed as a bug. sig/planner SIG: Planner severity/moderate affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. labels Apr 8, 2024
@winoros
Member Author

winoros commented Apr 8, 2024

This is the root cause of #41985

@winoros
Member Author

winoros commented Apr 8, 2024

The histogram of a TIMESTAMP column is already stored as string values, so for compatibility reasons it is not easy to switch to storing the original value.

A possible fix is to always store the datetime string in the UTC time zone and do the conversion when doing row count estimation.
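A rough Python sketch of that idea follows. It is an illustration under stated assumptions, not the planned TiDB implementation: `to_stored`, `literal_to_utc`, and `estimate_le` are hypothetical names, the histogram is again reduced to a list of bound strings, and the conversion is done by moving the query literal into UTC (converting the stored bounds into the session time zone would be the other option).

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def to_stored(instant):
    """Serialize an instant as a UTC datetime string (the proposed stored form)."""
    return instant.astimezone(timezone.utc).strftime(FMT)

def literal_to_utc(literal, session_tz):
    """Interpret a session-local literal and re-render it as a UTC string."""
    local = datetime.strptime(literal, FMT).replace(tzinfo=session_tz)
    return to_stored(local)

def estimate_le(stored_bounds, literal, session_tz):
    """Count stored UTC bounds that are <= the literal, comparing in UTC."""
    key = literal_to_utc(literal, session_tz)
    return sum(b <= key for b in stored_bounds)

# Same three hypothetical rows: 01:00, 02:00, 03:00 UTC.
rows = [datetime(2024, 4, 8, h, 0, 0, tzinfo=timezone.utc) for h in (1, 2, 3)]
stored = [to_stored(r) for r in rows]

# The same instant expressed in two session time zones.
from_utc = estimate_le(stored, "2024-04-08 02:00:00", timezone.utc)
from_plus8 = estimate_le(stored, "2024-04-08 10:00:00", timezone(timedelta(hours=8)))

print(stored, from_utc, from_plus8)
```

Because the stored strings no longer depend on the ANALYZE session, both sessions in this sketch get the same count for the same instant. The string format stays unchanged, which is what keeps the approach compatible with already-stored histograms in principle.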

@winoros
Member Author

winoros commented Apr 9, 2024

The TopN and the index's histogram are correct. They follow the normal encoding and decoding procedure.

@ti-chi-bot ti-chi-bot bot added the may-affects-5.4 This bug maybe affects 5.4.x versions. label Apr 9, 2024
@winoros winoros removed the may-affects-5.4 This bug maybe affects 5.4.x versions. label Apr 9, 2024
@winoros
Member Author

winoros commented Apr 9, 2024

It's a long-standing issue. I don't think I will solve it before 8.1.0 is released. I'll fix it in a minor version.

5 participants