Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syncer(dm) : fix default collation with upstream in create table statement #3575

Merged
merged 23 commits into from
Dec 6, 2021
Merged

syncer(dm) : fix default collation with upstream in create table statement #3575

merged 23 commits into from
Dec 6, 2021

Conversation

WizardXiao
Copy link
Contributor

@WizardXiao WizardXiao commented Nov 22, 2021

What problem does this PR solve?

fix https://github.com/pingcap/ticdc/issues/3420

in this case:
sql1: create database if not exists test
sql2: create database test CHARACTER SET=utf8mb4
sql3: create table test.t1 (id int)
sql4: create table test.t1 (id int) CHARSET=utf8mb4

if syncer send the same ddl to downstream tidb, database or table will has diffrent collation from upstream mysql, because the default collation is different between tidb and mysql.

What is changed and how it works?

it will adjust collation when sync create database or create table statement.

for database, it will add collation by binlog query event and handle ddl like these:
sql1: create database if not exists test COLLATE = utf8mb4_general_ci
sql2: create database test CHARACTER SET=utf8mb4 COLLATE = utf8mb4_general_ci

for table has no charset and collation , as we can not get collation from binlog, so there will have issue about timeliness, because dm can set start position at increment mode. so we will check and add warning log in dm.
in this case, downstream db will use collation from database for table, so user can use dm sync dabase ddl sql, but if not, user should make sure dabase's collation is right if create dabase manually. so the ddl will be handled like that:
sql3: create table test.t1 (id int) and warning log detect create table risk which use implicit charset and collation

for table has charset but no collation, we will add default collation by SHOW CHARACTER SET, so the dll will be handled like that:
sql4: create table test.t1 (id int) CHARSET=utf8mb4 COLLATE = utf8mb4_general_ci

Check List

Tests

  • Unit test

Release note

Fix a bug when upstream use implicit collation in create database or create table. 
It will add collation when there is no charset and collation in create database and it will not add collation in create table as table can inherit database. 
By the way, it will not directly alter downstream schema, so that user need make sure consistence when create database or table manually.

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Nov 22, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • glorv
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 22, 2021
@WizardXiao WizardXiao added the area/dm Issues or PRs related to DM. label Nov 22, 2021
@WizardXiao
Copy link
Contributor Author

/run-all-tests

dm/pkg/utils/db.go Outdated Show resolved Hide resolved
dm/pkg/utils/db.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
dm/syncer/syncer.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
@WizardXiao
Copy link
Contributor Author

/run-all-tests

@WizardXiao WizardXiao requested a review from lance6716 November 23, 2021 08:15
dm/pkg/utils/db.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
@lance6716 lance6716 added the needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. label Nov 23, 2021
dm/pkg/utils/db.go Outdated Show resolved Hide resolved
Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we should check dump/load phase will correctly handle collation.

dm/syncer/ddl.go Outdated Show resolved Hide resolved
@lance6716
Copy link
Contributor

by SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES we get currently collation of the table, but what we want is get collation at the replication time. (maybe if we continue replication we'll see alter collations?) please check if we can achieve that.

we have done some work of get SQL mode from the header of query event, maybe this would help
https://github.com/pingcap/ticdc/blob/7a5634bbc0b7b53f7fb70afe4c193fba6ab199b1/dm/pkg/binlog/event/util.go#L359

@Ehco1996
Copy link
Contributor

by SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES we get currently collation of the table, but what we want is get collation at the replication time. (maybe if we continue replication we'll see alter collations?) please check if we can achieve that.

we have done some work of get SQL mode from the header of query event, maybe this would help

https://github.com/pingcap/ticdc/blob/7a5634bbc0b7b53f7fb70afe4c193fba6ab199b1/dm/pkg/binlog/event/util.go#L359

seems we can get replication-time-connection from binlog QueryEvent ptal @WizardXiao
https://dev.mysql.com/doc/internals/en/query-event.html

@pingcap pingcap deleted a comment from Ehco1996 Nov 24, 2021
@WizardXiao
Copy link
Contributor Author

by SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES we get currently collation of the table, but what we want is get collation at the replication time. (maybe if we continue replication we'll see alter collations?) please check if we can achieve that.
we have done some work of get SQL mode from the header of query event, maybe this would help
https://github.com/pingcap/ticdc/blob/7a5634bbc0b7b53f7fb70afe4c193fba6ab199b1/dm/pkg/binlog/event/util.go#L359

seems we can get replication-time-connection from binlog QueryEvent ptal @WizardXiao https://dev.mysql.com/doc/internals/en/query-event.html

ok, tanks, i will try.

Copy link
Contributor

@lichunzhu lichunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do the same thing to columns and databases?

@WizardXiao
Copy link
Contributor Author

Do we need to do the same thing to columns and databases?

i think this pr will only support tables and if anything go well, we may think about columns and databases later.

@WizardXiao
Copy link
Contributor Author

After some test about mysql binlog, there is a problem when we use Binlog::QUERY_EVENT:status_vars to make sure the timeliness of Collation, because there is no Q_CHARSET_DATABASE_CODE which is characterset and collation of the schema in Mysql 8.0.26. so maybe we need diffrent way to handle full or incremental phase, and maybe will sacrifice some timeliness.

@Ehco1996
Copy link
Contributor

Ehco1996 commented Nov 26, 2021

> because there is no Q_CHARSET_DATABASE_CODE which is characterset and collation of the schema in Mysql 8.0.26.

could you please add the doc link here? i can not find this in mysql doc 😅

> we need diffrent way to handle handle full or incremental phase

this is also confuses me, in full pahse, loader directy execute create-table-sql that exported by dumpling with the collation from source 🤔, so we just need to adjust collation in incremental phase

@lance6716
Copy link
Contributor

because there is no Q_CHARSET_DATABASE_CODE which is characterset and collation of the schema in Mysql 8.0.26.

Not 8.0.26, almost every modern versions
https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/libbinlogevents/include/statement_events.h#L313

@lance6716
Copy link
Contributor

lance6716 commented Nov 26, 2021

this is also confuses me, in full pahse, loader directy execute create-table-sql that exported by dumpling with the collation from source 🤔, so we just need to adjust collation in incremental phase

SHOW CREATE TABLE will not write collation when it's the default collation of the charset. please see below comments

https://github.com/mysql/mysql-server/blob/3290a66c89eb1625a7058e0ef732432b6952b435/sql/sql_show.cc#L1983-L1993

@lance6716 lance6716 changed the title dm/syncer : fix default collation with upstream in create table statement syncer(dm) : fix default collation with upstream in create table statement Nov 26, 2021
@Ehco1996
Copy link
Contributor

Ehco1996 commented Nov 26, 2021

this is also confuses me, in full pahse, loader directy execute create-table-sql that exported by dumpling with the collation from source 🤔, so we just need to adjust collation in incremental phase

SHOW CREATE TABLE will not write collation when it's the default collation of the charset. please see below comments

https://github.com/mysql/mysql-server/blob/3290a66c89eb1625a7058e0ef732432b6952b435/sql/sql_show.cc#L1983-L1993

thanks! i think maybe adjust charset and collation for dumpling and syncer with INFORMATION_SCHEMA.TABLES is enough?

  • dumpling: there is no need to get collation-at-the-replication-time
  • syncer: when a create-table-sql-without-collation comes from binlog, use the current collation of the upstream table as the default collation there does not seem to be a problem with the correctness

this may cause correctness problems when default collation is different between upstream and downstream (select result order)

@glorv
Copy link
Contributor

glorv commented Nov 26, 2021

@WizardXiao I think this PR deserve a release note as it's an important bugfix.

@dveeden
Copy link
Contributor

dveeden commented Dec 3, 2021

I assume this works fine with the New framework for collations enabled? Do we need to add tests for that?

@WizardXiao
Copy link
Contributor Author

/run-all-tests

@WizardXiao
Copy link
Contributor Author

I assume this works fine with the New framework for collations enabled? Do we need to add tests for that?
Current approach will sync collation to tidb and will not check collation conf in tidb, so if new_collation_enabled is false, tidb may throw error, and user can set new_collation_enabled be true.

@WizardXiao
Copy link
Contributor Author

/run-verify-ci

dm/pkg/terror/error_list.go Outdated Show resolved Hide resolved
Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this works fine with the New framework for collations enabled? Do we need to add tests for that?

This PR only handle the increment phase, pingcap/tidb#30292 handles dump/load phase. A testing PR will be submitted after including them two.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 6, 2021
@WizardXiao
Copy link
Contributor Author

/run-all-tests

dm/pkg/binlog/event/util.go Show resolved Hide resolved
dm/pkg/binlog/event/util.go Outdated Show resolved Hide resolved
dm/pkg/terror/error_list.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
dm/syncer/ddl.go Outdated Show resolved Hide resolved
@WizardXiao
Copy link
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 6, 2021
@glorv
Copy link
Contributor

glorv commented Dec 6, 2021

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 89113bf

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 6, 2021
@ti-chi-bot
Copy link
Member

@WizardXiao: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot merged commit 383c7b5 into pingcap:master Dec 6, 2021
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Dec 6, 2021
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #3753.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dm Issues or PRs related to DM. needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DM migrates table with incorrect collation
10 participants