ddl: handle store closed in doDDLJob #18844
Codecov Report
@@ Coverage Diff @@
## master #18844 +/- ##
================================================
- Coverage 79.1739% 79.1450% -0.0290%
================================================
Files 548 548
Lines 148545 147883 -662
================================================
- Hits 117609 117042 -567
+ Misses 21449 21370 -79
+ Partials 9487 9471 -16
PTAL @zimulala
Rest LGTM
/run-all-tests
LGTM
@zimulala PTAL, thanks!
c.Assert(s.getDDLSchemaVer(c, d), GreaterEqual, ver)
d.restartWorkers(context.Background())
Why remove this test? Will deleting this test reduce test scenarios?
d.Stop() will call d.cancel(), which will cause doDDLJob to quit.
Instead of removing these test scenarios, we can replace done <- d.doDDLJob(ctx, job) with a check that the job is in history.
session/session_test.go
Outdated
c.Assert(failpoint.Enable("github.com/pingcap/tidb/ddl/storeCloseInLoop", `return(2)`), IsNil)
go func() {
	time.Sleep(time.Second)
I think using Sleep together with go will make the test unstable, and "sleep 2s" makes the test take a long time to run.
/run-all-tests
It seems an error unrelated to this PR can be reproduced:
|
PTAL @zimulala
This reverts commit df93c43.
LGTM
Sorry, there are still some concurrency problems. I'll fix them when I have some spare time 😢
// It only starts the original workers.
func (d *ddl) restartWorkers(ctx context.Context) {
	d.cancel()
	d.wg.Wait()
Added these two lines ⬆️
CI is now failing because
/run-all-tests
Oh, finally CI passed
LGTM
/run-all-tests
@@ -488,9 +490,17 @@ func (d *ddl) doDDLJob(ctx sessionctx.Context, job *model.Job) error {
		metrics.HandleJobHistogram.WithLabelValues(job.Type.String(), metrics.RetLabel(err)).Observe(time.Since(startTime).Seconds())
	}()
	for {
		failpoint.Inject("storeCloseInLoop", func(_ failpoint.Value) {
			d.cancel()
We had better call d.Stop here; otherwise cleanup work, such as the del range jobs, won't be triggered.
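To illustrate the difference the comment points at, here is a hedged sketch with a hypothetical service type standing in for the ddl struct. It assumes, as the comment implies, that Stop cancels the context, waits for the workers, and then runs cleanup, whereas calling cancel alone only does the first step. The real d.Stop may do more or differ in detail.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// service is a hypothetical stand-in for the ddl struct.
type service struct {
	cancel  context.CancelFunc
	wg      sync.WaitGroup
	cleaned bool
}

// newService starts one worker that exits when the context is cancelled.
func newService() *service {
	ctx, cancel := context.WithCancel(context.Background())
	s := &service{cancel: cancel}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		<-ctx.Done()
	}()
	return s
}

// Stop cancels the workers, waits for them to exit, then runs cleanup.
// Calling s.cancel() alone would skip the last two steps.
func (s *service) Stop() {
	s.cancel()
	s.wg.Wait()
	s.cleaned = true // placeholder for cleanup such as del range
}

func main() {
	s := newService()
	s.Stop()
	fmt.Println(s.cleaned)
}
```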
What problem does this PR solve?
Issue Number: close #18714 , pingcap/dm#792
Problem Summary: add an exit branch in doDDLJob
What is changed and how it works?
What's Changed:
Detect d.ctx.Done() in doDDLJob, which may be caused by a closed worker, a closed store, and so on, to avoid a dead loop. Note that doDDLJob now fails immediately when the worker is closed, so some 5-year-old tests were changed.
How it Works:
Add an exit branch in doDDLJob, and add a unit test to verify it.
Related changes
Check List
Tests
Side effects
Release note