-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lightning: refine fetching table structure error handling #44801
Conversation
/run-integration-br-test |
/run-integration-br-test |
failpoint.Inject( | ||
"FetchRemoteTableModels_BeforeFetchTableAutoIDInfos", | ||
func() { | ||
fmt.Println("failpoint: FetchRemoteTableModels_BeforeFetchTableAutoIDInfos") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use nil inject function when the inject verb is sleep, like FailAfterWriteRows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand your point. I print that out to verify in the integration test script that my drop table
is executed before this failpoint's hook is triggered. This makes it easier to debug. Do you think it's OK to keep the printing message here?
@@ -365,6 +365,19 @@ func (p *PreImportInfoGetterImpl) GetAllTableStructures(ctx context.Context, opt | |||
|
|||
func (p *PreImportInfoGetterImpl) getTableStructuresByFileMeta(ctx context.Context, dbSrcFileMeta *mydump.MDDatabaseMeta, getPreInfoCfg *ropts.GetPreInfoConfig) ([]*model.TableInfo, error) { | |||
dbName := dbSrcFileMeta.Name | |||
failpoint.Inject( | |||
"getTableStructuresByFileMeta_BeforeFetchRemoteTableModels", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the caller directly set FetchRemoteTableModels_BeforeFetchTableAutoIDInfos
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have previously attempted this approach where I set a sleep timer of 5 seconds to activate that failpoint. During the integration test, the failpoint was triggered twice, resulting in a total of 10 seconds of delay. However, I was only concerned with the second trigger. In this second trigger, I needed to complete the drop table
operation within the sleep period. To achieve this, I added an 8-second sleep timer in the integration test script (5 < 8 < 10). Although this approach worked, it lacked flexibility. If there were more or fewer calls to FetchRemoteTableModels
, I would need to adjust the sleep timer accordingly.
Therefore, I now use a different approach where I introduce a failpoint here that enables the specific failpoint I want to sleep. This way, I only need to adjust the sleep time together with the total sleep time of a single failpoint trigger, making the behavior more controllable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this approach, are we currently equivalently set sleep(6000)
for FetchRemoteTableModels_BeforeFetchTableAutoIDInfos
? so we can directly set it.
And another suggestion is maybe we let FetchRemoteTableModels_BeforeFetchTableAutoIDInfos
use b.db
to drop the table, to mimic the user action happens at that point 😂 so it's no longer related to sleep duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this approach, are we currently equivalently set
sleep(6000)
forFetchRemoteTableModels_BeforeFetchTableAutoIDInfos
? so we can directly set it.
If directly enable FetchRemoteTableModels_BeforeFetchTableAutoIDInfos
with sleep(6000)
, it will actually sleep 12 seconds. The first 6 seconds is from here:
tidb/br/pkg/lightning/importer/import.go
Line 807 in e341791
err := worker.makeJobs(rc.dbMetas, rc.preInfoGetter.FetchRemoteTableModels) |
tidb/br/pkg/lightning/importer/import.go
Line 601 in e341791
tables, _ := getTables(worker.ctx, dbMeta.Name) |
At this time, the table schemas have not been created, and the errors are totally ignored.
The second 6 seconds is from here:
tidb/br/pkg/lightning/importer/import.go
Line 813 in e341791
dbInfos, err := rc.preInfoGetter.GetAllTableStructures(ctx) |
tidb/br/pkg/lightning/importer/get_pre_info.go
Lines 356 to 358 in e341791
dbInfos, err = LoadSchemaInfo(ctx, p.dbMetas, func(ctx context.Context, dbName string) ([]*model.TableInfo, error) { | |
return p.getTableStructuresByFileMeta(ctx, p.mdDBMetaMap[dbName], getPreInfoCfg) | |
}) |
currentTableInfosFromDB, err := p.targetInfoGetter.FetchRemoteTableModels(ctx, dbName) |
At this time, the table schemas have been created.
For my approach, I initially did not enable the FetchRemoteTableModels_BeforeFetchTableAutoIDInfos
failpoint. After creating the schemas, I enabled the failpoint here to ensure that the tables have been created during the sleep time of the failpoint. This way, any errors will not be completely ignored.
For the alternative method you mentioned, we still need a precise way to trigger the failpoint only once during the second attempt, ensuring that the error will not be ignored. If the FetchRemoteTableModels_BeforeFetchTableAutoIDInfos
failpoint is triggered twice and the drop table
is executed twice, the following will occur:
- The first time, the
drop table
is executed successfully, but the generated error is ignored. - The second time, the
drop table
cannot be executed successfully because the table has already been dropped. We won't encounter any error because thedrop table
is not executed halfway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And another suggestion is maybe we let
FetchRemoteTableModels_BeforeFetchTableAutoIDInfos
useb.db
to drop the table, to mimic the user action happens at that point 😂 so it's no longer related to sleep duration
This suggestion is technically feasible and presents a good option. However, I am currently struggling with whether to include "DROP TABLE" in the script or in the failpoint hook.
- If we write "DROP TABLE" in the script, it offers flexibility as only the script needs to be modified. However, this approach also has some drawbacks, such as unnecessary additional sleep time that may cause uncontrollable execution sequences. We would then need to tune the sleep time in the script.
- On the other hand, if we write "DROP TABLE" in the failpoint, it eliminates the need for additional sleep time and provides controllable logic. However, it may feel like we are coupling the testing logic into the Lightning code.
Which approach do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently I prefer we use failpoint to do it. Let's wait the option from the second reviewer
@@ -141,10 +141,11 @@ func NewTargetInfoGetter(db *sql.DB) backend.TargetInfoGetter { | |||
// TODO: refactor | |||
func (b *targetInfoGetter) FetchRemoteTableModels(ctx context.Context, schemaName string) ([]*model.TableInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the lightning logic, is it easy to pass the table-level allow list to this function, so we don't need to care about unrelated tables?
No need to change in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if this idea will work? But we don't need to change this PR
@@ -365,6 +365,19 @@ func (p *PreImportInfoGetterImpl) GetAllTableStructures(ctx context.Context, opt | |||
|
|||
func (p *PreImportInfoGetterImpl) getTableStructuresByFileMeta(ctx context.Context, dbSrcFileMeta *mydump.MDDatabaseMeta, getPreInfoCfg *ropts.GetPreInfoConfig) ([]*model.TableInfo, error) { | |||
dbName := dbSrcFileMeta.Name | |||
failpoint.Inject( | |||
"getTableStructuresByFileMeta_BeforeFetchRemoteTableModels", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently I prefer we use failpoint to do it. Let's wait the option from the second reviewer
@@ -141,10 +141,11 @@ func NewTargetInfoGetter(db *sql.DB) backend.TargetInfoGetter { | |||
// TODO: refactor | |||
func (b *targetInfoGetter) FetchRemoteTableModels(ctx context.Context, schemaName string) ([]*model.TableInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if this idea will work? But we don't need to change this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: GMHDBJD, lance6716 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required |
@dsdashun: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/run-integration-br-test |
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
@dsdashun: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@dsdashun: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/cherry-pick release-6.5-20230424-v6.5.2 |
@dsdashun: new pull request created to branch In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
Signed-off-by: ti-chi-bot <[email protected]>
What problem does this PR solve?
Issue Number: close #44614
Problem Summary:
What is changed and how it works?
If any errors are encountered while fetching column auto information, log a warning and refrain from adding the table structure to the table structure map. This approach allows the error to be ignored if the table is not relevant, making the importing process proceed without any issues.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.