
lightning: refine fetching table structure error handling #44801

Merged · 3 commits · Jun 29, 2023

Conversation

dsdashun (Contributor):

What problem does this PR solve?

Issue Number: close #44614

Problem Summary:

What is changed and how it works?

If an error is encountered while fetching a table's auto-ID column information, log a warning and do not add that table's structure to the table structure map. The error is thus ignored when the table is not relevant to the import, so the import can proceed without issues.
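The warn-and-skip policy can be sketched in plain Go. This is a minimal illustration, not Lightning's actual code: `TableInfo`, `fetchAutoIDInfo`, and `buildTableMap` are hypothetical stand-ins for `model.TableInfo` and the real fetch logic.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// TableInfo stands in for Lightning's model.TableInfo; every name in this
// sketch is illustrative, not the actual Lightning API.
type TableInfo struct{ Name string }

// fetchAutoIDInfo simulates fetching a table's auto-ID column information
// from the target database; it fails for a table dropped concurrently.
func fetchAutoIDInfo(name string) (*TableInfo, error) {
	if name == "dropped_tbl" {
		return nil, errors.New("Error 1146: Table 'dropped_tbl' doesn't exist")
	}
	return &TableInfo{Name: name}, nil
}

// buildTableMap implements the warn-and-skip policy: on a fetch error it
// logs a warning and leaves the table out of the map instead of aborting.
func buildTableMap(names []string) map[string]*TableInfo {
	m := make(map[string]*TableInfo)
	for _, name := range names {
		info, err := fetchAutoIDInfo(name)
		if err != nil {
			log.Printf("warn: failed to fetch structure of table %q, skipping: %v", name, err)
			continue // the error only matters if the table is needed later
		}
		m[name] = info
	}
	return m
}

func main() {
	m := buildTableMap([]string{"tableA", "dropped_tbl"})
	fmt.Println(len(m), m["tableA"] != nil)
}
```

If `dropped_tbl` is not part of the import, its absence from the map is harmless; if it is needed, the import fails later with a clear error for that table.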

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot added labels on Jun 19, 2023: release-note-none (Denotes a PR that doesn't merit a release note.), do-not-merge/needs-triage-completed, size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
dsdashun (Contributor, Author):

/run-integration-br-test

dsdashun (Contributor, Author):

/run-integration-br-test

ti-chi-bot added labels on Jun 20, 2023: needs-cherry-pick-release-5.3 (Type: Need cherry pick to release-5.3), plus needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.5, and needs-cherry-pick-release-7.1 (Should cherry pick this PR to the corresponding release branch.), and removed do-not-merge/needs-triage-completed.
failpoint.Inject(
	"FetchRemoteTableModels_BeforeFetchTableAutoIDInfos",
	func() {
		fmt.Println("failpoint: FetchRemoteTableModels_BeforeFetchTableAutoIDInfos")
Contributor:

We can use a nil inject function when the inject verb is sleep, like FailAfterWriteRows.

dsdashun (Contributor, Author):

Yes, I understand your point. I print that message so the integration test script can verify that my DROP TABLE executes before this failpoint's hook is triggered, which makes debugging easier. Do you think it's OK to keep the print here?

@@ -365,6 +365,19 @@ func (p *PreImportInfoGetterImpl) GetAllTableStructures(ctx context.Context, opt

func (p *PreImportInfoGetterImpl) getTableStructuresByFileMeta(ctx context.Context, dbSrcFileMeta *mydump.MDDatabaseMeta, getPreInfoCfg *ropts.GetPreInfoConfig) ([]*model.TableInfo, error) {
	dbName := dbSrcFileMeta.Name
	failpoint.Inject(
		"getTableStructuresByFileMeta_BeforeFetchRemoteTableModels",
Contributor:

Can the caller directly set FetchRemoteTableModels_BeforeFetchTableAutoIDInfos?

dsdashun (Contributor, Author):

I previously tried that approach, setting a 5-second sleep on that failpoint. During the integration test the failpoint was triggered twice, for 10 seconds of total delay, but I only cared about the second trigger: the drop table had to complete within that second sleep window. To achieve this, I added an 8-second sleep to the integration test script (5 < 8 < 10). This worked, but it lacked flexibility: if there were more or fewer calls to FetchRemoteTableModels, the script's sleep would need to be re-tuned accordingly.

Therefore, I now use a different approach: I introduce a failpoint here that enables the specific failpoint I want to sleep on. This way, the script's sleep time only needs to track the sleep duration of a single failpoint trigger, making the behavior more controllable.
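The two-stage idea above can be sketched with plain Go, replacing the failpoint machinery with an atomic switch (all names here are hypothetical, not the PR's code): the sleep hook inside the fetch function stays inert during the first pre-check pass and is enabled only after the schemas exist, so only the second call is delayed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hookEnabled mimics enabling a failpoint at runtime; in the PR this is
// done by a second failpoint that turns on the sleeping one. Illustrative
// names only.
var hookEnabled atomic.Bool

// fetchRemoteTableModels returns "slept" when the hook fires, standing in
// for a real time.Sleep inside the failpoint body, and "fast" otherwise.
func fetchRemoteTableModels() string {
	if hookEnabled.Load() {
		return "slept"
	}
	return "fast"
}

func main() {
	first := fetchRemoteTableModels() // pre-check pass: hook still disabled
	hookEnabled.Store(true)           // flipped once the schemas are created
	second := fetchRemoteTableModels()
	fmt.Println(first, second)
}
```

Because the switch is flipped between the two calls, the test script no longer has to reason about how many times the fetch function runs in total.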

Contributor:

I'm not sure about this approach. Aren't we currently, equivalently, setting sleep(6000) for FetchRemoteTableModels_BeforeFetchTableAutoIDInfos? If so, we can set it directly.

Another suggestion: maybe we could let FetchRemoteTableModels_BeforeFetchTableAutoIDInfos use b.db to drop the table, to mimic the user action happening at that point 😂, so it would no longer depend on the sleep duration.

dsdashun (Contributor, Author):

> I'm not sure this approach, are we currently equivalently set sleep(6000) for FetchRemoteTableModels_BeforeFetchTableAutoIDInfos? so we can directly set it.

If we directly enable FetchRemoteTableModels_BeforeFetchTableAutoIDInfos with sleep(6000), it will actually sleep 12 seconds in total. The first 6 seconds come from here:

err := worker.makeJobs(rc.dbMetas, rc.preInfoGetter.FetchRemoteTableModels)
=>
tables, _ := getTables(worker.ctx, dbMeta.Name)

At this time, the table schemas have not been created, and the errors are totally ignored.

The second 6 seconds come from here:

dbInfos, err := rc.preInfoGetter.GetAllTableStructures(ctx)
=>
dbInfos, err = LoadSchemaInfo(ctx, p.dbMetas, func(ctx context.Context, dbName string) ([]*model.TableInfo, error) {
return p.getTableStructuresByFileMeta(ctx, p.mdDBMetaMap[dbName], getPreInfoCfg)
})
=>
currentTableInfosFromDB, err := p.targetInfoGetter.FetchRemoteTableModels(ctx, dbName)

At this time, the table schemas have been created.

In my approach, the FetchRemoteTableModels_BeforeFetchTableAutoIDInfos failpoint is initially disabled. After the schemas are created, I enable it here, ensuring the tables already exist during the failpoint's sleep window, so the errors are not completely ignored.

For the alternative method you mentioned, we would still need a precise way to trigger the failpoint only once, during the second attempt, to ensure the error is not ignored. If the FetchRemoteTableModels_BeforeFetchTableAutoIDInfos failpoint were triggered twice and the DROP TABLE executed twice, the following would occur:

  • The first time, the DROP TABLE executes successfully, but the resulting error is ignored.
  • The second time, the DROP TABLE cannot succeed because the table has already been dropped, so the drop does not happen mid-fetch and no error surfaces at all.
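The two-trigger scenario above can be checked with a small stdlib simulation (hypothetical names; a map stands in for the target database): the first drop succeeds, the second fails because the table is already gone.

```go
package main

import "fmt"

// dropTable simulates executing DROP TABLE against the target: it succeeds
// once and then fails with "table doesn't exist", mirroring why a failpoint
// hook that always drops the table is only useful on its first trigger.
// Illustrative code, not Lightning's.
func dropTable(tables map[string]bool, name string) error {
	if !tables[name] {
		return fmt.Errorf("Error 1146: Table '%s' doesn't exist", name)
	}
	delete(tables, name)
	return nil
}

func main() {
	tables := map[string]bool{"t1": true}
	err1 := dropTable(tables, "t1") // first trigger: drop succeeds
	err2 := dropTable(tables, "t1") // second trigger: table already gone
	fmt.Println(err1 == nil, err2 != nil)
}
```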

dsdashun (Contributor, Author):

> And another suggestion is maybe we let FetchRemoteTableModels_BeforeFetchTableAutoIDInfos use b.db to drop the table, to mimic the user action happens at that point 😂 so it's no longer related to sleep duration

This suggestion is technically feasible and presents a good option. However, I am currently struggling with whether to include "DROP TABLE" in the script or in the failpoint hook.

  • If we write "DROP TABLE" in the script, it offers flexibility as only the script needs to be modified. However, this approach also has some drawbacks, such as unnecessary additional sleep time that may cause uncontrollable execution sequences. We would then need to tune the sleep time in the script.
  • On the other hand, if we write "DROP TABLE" in the failpoint, it eliminates the need for additional sleep time and provides controllable logic. However, it may feel like we are coupling the testing logic into the Lightning code.

Which approach do you prefer?

Contributor:

Currently I prefer that we use the failpoint to do it. Let's wait for the opinion of the second reviewer.

@@ -141,10 +141,11 @@ func NewTargetInfoGetter(db *sql.DB) backend.TargetInfoGetter {
// TODO: refactor
func (b *targetInfoGetter) FetchRemoteTableModels(ctx context.Context, schemaName string) ([]*model.TableInfo, error) {
Contributor:

In the Lightning logic, is it easy to pass the table-level allow list to this function, so we don't need to care about unrelated tables?

No need to change in this PR
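The reviewer's allow-list idea could look roughly like this sketch. The function name, signature, and filtering shape are assumptions for illustration, not Lightning's actual FetchRemoteTableModels API.

```go
package main

import "fmt"

// fetchRemoteTableModelsFiltered returns only the tables on the allow list,
// so the structures of unrelated tables are never fetched and any errors
// they would produce cannot surface. Hypothetical helper.
func fetchRemoteTableModelsFiltered(all []string, allowed map[string]bool) []string {
	var out []string
	for _, t := range all {
		if allowed[t] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	got := fetchRemoteTableModelsFiltered(
		[]string{"tableA", "tableB"},
		map[string]bool{"tableA": true},
	)
	fmt.Println(got)
}
```

With such a filter, a concurrently dropped but irrelevant table (like `tableB` in the linked issue) would never be queried in the first place.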

Contributor:

Can you check if this idea will work? But we don't need to change this PR


@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jun 26, 2023
@niubell niubell requested a review from GMHDBJD June 29, 2023 09:00
@GMHDBJD (Contributor) left a comment:

LGTM

@ti-chi-bot ti-chi-bot bot added the lgtm label Jun 29, 2023
ti-chi-bot bot commented Jun 29, 2023:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: GMHDBJD, lance6716

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 29, 2023
ti-chi-bot bot commented Jun 29, 2023:

[LGTM Timeline notifier]

Timeline:

  • 2023-06-26 02:53:07.307042309 +0000 UTC m=+589752.708292752: ☑️ agreed by lance6716.
  • 2023-06-29 09:04:12.231916705 +0000 UTC m=+871217.633167139: ☑️ agreed by GMHDBJD.

dsdashun (Contributor, Author):

/retest-required

tiprow bot commented Jun 29, 2023:

@dsdashun: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest-required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dsdashun (Contributor, Author):

/run-integration-br-test

ti-chi-bot (Member):

In response to a cherrypick label: new pull request created to branch release-6.5: #45059.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jun 29, 2023
ti-chi-bot (Member):

In response to a cherrypick label: new pull request created to branch release-7.1: #45060.

ti-chi-bot (Member):

In response to a cherrypick label: new pull request created to branch release-5.3: #45061.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jun 29, 2023
ti-chi-bot (Member):

In response to a cherrypick label: new pull request created to branch release-5.4: #45062.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jun 29, 2023
ti-chi-bot (Member):

In response to a cherrypick label: new pull request created to branch release-6.1: #45063.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jun 29, 2023
ti-chi-bot bot commented Jun 30, 2023:

@dsdashun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: idc-jenkins-ci-tidb/check_dev_2 · Commit: 9c6259e · Details: link · Required: unknown · Rerun command: /test check-dev2

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


dsdashun (Contributor, Author):

/cherry-pick release-6.5-20230424-v6.5.2

@dsdashun dsdashun deleted the fix-44614 branch June 30, 2023 08:14
ti-chi-bot (Member):

@dsdashun: new pull request created to branch release-6.5-20230424-v6.5.2: #45082.

In response to this:

/cherry-pick release-6.5-20230424-v6.5.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Labels
  • approved
  • lgtm
  • needs-cherry-pick-release-5.3 (Type: Need cherry pick to release-5.3)
  • needs-cherry-pick-release-5.4 (Should cherry pick this PR to release-5.4 branch.)
  • needs-cherry-pick-release-6.1 (Should cherry pick this PR to release-6.1 branch.)
  • needs-cherry-pick-release-6.5 (Should cherry pick this PR to release-6.5 branch.)
  • needs-cherry-pick-release-7.1 (Should cherry pick this PR to release-7.1 branch.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

Successfully merging this pull request may close these issues.

tidb-lightning tidb backend import tableA but report Error 1146: Table 'tableB' doesn't exist
4 participants