jobs: remove FOR UPDATE clause when updating job #67660

Merged: 1 commit into cockroachdb:master on Jul 29, 2021

ajwerner

In CockroachDB currently, the FOR UPDATE lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow. There is an
exception, which is IMPORT incrementing a sequence. Nevertheless, the retry
behavior there seems sound. The other exception is pausing or canceling jobs.
I think that in that case we prefer to invalidate the work of the transaction,
as our intention is to cancel it.

If cockroach implemented UPGRADE locks (#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which could result in
long wait times when trying to run SHOW JOBS.
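
As a loose analogy for why readers stall behind an exclusive lock, here is a minimal Go sketch. It uses `sync.RWMutex` (with `TryRLock`, available from Go 1.18) in place of CockroachDB's row locks. The `jobRow` type and `canRead` helper are hypothetical, and the sketch makes no attempt to model CockroachDB's actual lock table or transaction protocol.

```go
package main

import (
	"fmt"
	"sync"
)

// jobRow is a hypothetical stand-in for a single system.jobs row.
type jobRow struct {
	mu     sync.RWMutex
	status string
}

// canRead reports whether a reader could access the row right now
// without waiting for a lock holder to finish.
func (r *jobRow) canRead() bool {
	if !r.mu.TryRLock() {
		return false
	}
	r.mu.RUnlock()
	return true
}

func main() {
	row := &jobRow{status: "running"}

	// Analogue of SELECT ... FOR UPDATE: the updating transaction holds
	// an exclusive lock until it finishes.
	row.mu.Lock()
	fmt.Println("reader can proceed during update:", row.canRead())

	// Analogue of commit: the lock is released and readers go through.
	row.mu.Unlock()
	fmt.Println("reader can proceed after commit:", row.canRead())
}
```

In this analogy the reader plays the role of SHOW JOBS or the adoption loop, and the time between `Lock` and `Unlock` plays the role of a long checkpointing transaction.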

@ajwerner ajwerner requested review from sajjadrizvi, adityamaru and a team July 15, 2021 13:49

@@ -119,7 +119,7 @@ func (j *Job) Update(ctx context.Context, txn *kv.Txn, updateFn UpdateFn) error
 	var payload *jobspb.Payload
 	var progress *jobspb.Progress
 	if err := j.runInTxn(ctx, txn, func(ctx context.Context, txn *kv.Txn) error {
-		stmt := "SELECT status, payload, progress FROM system.jobs WHERE id = $1 FOR UPDATE"
+		stmt := "SELECT status, payload, progress FROM system.jobs WHERE id = $1"
 		if j.sessionID != "" {
 			stmt = "SELECT status, payload, progress, claim_session_id FROM system." +
 				"jobs WHERE id = $1 FOR UPDATE"
Member

Do we want to remove the FOR UPDATE locking clause from this statement as well?

Contributor Author

Yes, whoops.

@@ -119,7 +119,7 @@ func (j *Job) Update(ctx context.Context, txn *kv.Txn, updateFn UpdateFn) error
 	var payload *jobspb.Payload
 	var progress *jobspb.Progress
 	if err := j.runInTxn(ctx, txn, func(ctx context.Context, txn *kv.Txn) error {
-		stmt := "SELECT status, payload, progress FROM system.jobs WHERE id = $1 FOR UPDATE"
+		stmt := "SELECT status, payload, progress FROM system.jobs WHERE id = $1"
Member

Would it be worth retaining the FOR UPDATE clause for calls to (*Job).Update that do intend to update the system.jobs row? Can we distinguish those from the calls that only use this function to read, by checking whether updateFn is nil (after replacing no-op functions with nil functions)?

Contributor Author

I don't think that's the tool we want to use. In almost all cases the job is either updated by its coordinator or from pause or cancel. In those cases, I'd actually prefer a pause or cancel to be able to overwrite the status during a long-running update of a job. The only case I think we want locking is this one right here:

err := j.Registry.UpdateJobWithTxn(ctx, j.JobID, txn, resolveChunkFunc)

I think it's the only place where we try to modify the job from multiple nodes concurrently during normal interaction. I'm going to do some plumbing to retain the locking in that call.
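
One way such plumbing could look, as a minimal sketch rather than the actual change: a shared update path takes a boolean that decides whether the read locks the row, the default entry point passes `false`, and a dedicated entry point passes `true`. The `job` type, `updateFn` signature, `UpdateWithReadLock` name, and `lastStmt` field are all hypothetical, and nothing here talks to a database.

```go
package main

import "fmt"

// updateFn is a hypothetical, simplified stand-in for the real UpdateFn.
type updateFn func(status string) string

// job records the last SELECT it would have issued so the example can
// show which variant each entry point picks.
type job struct {
	status   string
	lastStmt string
}

const selectStmt = "SELECT status, payload, progress FROM system.jobs WHERE id = $1"

// update is the shared implementation. useReadLock decides whether the
// initial read takes a FOR UPDATE lock on the row.
func (j *job) update(useReadLock bool, fn updateFn) {
	j.lastStmt = selectStmt
	if useReadLock {
		j.lastStmt += " FOR UPDATE"
	}
	j.status = fn(j.status)
}

// Update is the default path: no lock, so a pause or cancel can
// overwrite the status during a long-running update.
func (j *job) Update(fn updateFn) { j.update(false, fn) }

// UpdateWithReadLock is for the rare caller that modifies the job
// concurrently from multiple nodes and wants to serialize on the row.
func (j *job) UpdateWithReadLock(fn updateFn) { j.update(true, fn) }

func main() {
	j := &job{status: "running"}
	noop := func(s string) string { return s }

	j.Update(noop)
	fmt.Println(j.lastStmt)

	j.UpdateWithReadLock(noop)
	fmt.Println(j.lastStmt)
}
```

Keeping the flag out of the default entry point means existing callers are untouched, and only the call site that needs locking has to opt in.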

@ajwerner ajwerner force-pushed the ajwerner/remove-jobs-for-update branch from e589ffe to 970813d Compare July 16, 2021 01:25
@ajwerner
Contributor Author

@nvanbenschoten, @adityamaru see how this makes you feel.

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 1 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @sajjadrizvi)


pkg/jobs/registry.go, line 557 at r2 (raw file):

// that a txn will be automatically created.
func (r *Registry) UpdateJobWithTxn(
	ctx context.Context, jobID jobspb.JobID, txn *kv.Txn, useReadLock bool, updateFunc UpdateFn,

This new param could use a comment.


pkg/jobs/update.go, line 119 at r2 (raw file):

// defined in jobs.go.
func (j *Job) Update(ctx context.Context, txn *kv.Txn, updateFn UpdateFn) error {
	const useForUpdate = false

nit: we use three different names for this. useReadLock, useForUpdate, and useForUpdateReadLock. Consider consolidating.


pkg/jobs/update.go, line 270 at r2 (raw file):

	switch {
	case hasSessionID && !useForUpdate:
		return "SELECT " + columnsWithSession + from

This is fine, though I'd imagine the logic would be easier to read like:

cols := columnsWithoutSession
if hasSessionID {
    cols = columnsWithSession
}
sfu := ""
if useForUpdate {
    sfu = " FOR UPDATE"
}
return "SELECT " + cols + from + sfu

Was the intention to avoid runtime string concatenation costs?

@sajjadrizvi sajjadrizvi left a comment

:lgtm:

Reviewed 1 of 3 files at r2.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @ajwerner)

In CockroachDB currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow.

There is an exception which is IMPORT incrementing a sequence. In that case,
which motivated the initial locking addition, we'll leave the locking.

The other exception is pausing or canceling jobs. I think that in that case
we prefer to invalidate the work of the transaction as our intention is to
cancel it.

If cockroach implemented UPGRADE locks (cockroachdb#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which could result in
long wait times when trying to run SHOW JOBS.
@ajwerner ajwerner force-pushed the ajwerner/remove-jobs-for-update branch from 970813d to 269bf63 Compare July 29, 2021 15:21
Contributor Author

@ajwerner ajwerner left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nvanbenschoten and @sajjadrizvi)


pkg/jobs/registry.go, line 557 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This new param could use a comment.

Done.


pkg/jobs/update.go, line 119 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: we use three different names for this. useReadLock, useForUpdate, and useForUpdateReadLock. Consider consolidating.

Done.


pkg/jobs/update.go, line 270 at r2 (raw file):

Was the intention to avoid runtime string concatenation costs?

Yes, that was in my head when writing this. Probably silly. I reworked it a bit for readability, but it still returns a constant. I think the new thing is better than the old thing and is good enough.
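
For illustration, here is one way to keep the selection readable while still returning a constant. This is a sketch under assumptions, not the code that was merged: the `getSelectStmt` name and the four `stmt...` constants are hypothetical, while `columnsWithSession`, `from`, and the SQL text follow the snippets quoted in this review. Go evaluates constant string concatenation at compile time, so no strings are built at run time.

```go
package main

import "fmt"

const (
	columnsWithoutSession = "status, payload, progress"
	columnsWithSession    = columnsWithoutSession + ", claim_session_id"
	from                  = " FROM system.jobs WHERE id = $1"
	forUpdate             = " FOR UPDATE"

	// Constant expressions: the compiler folds these concatenations.
	stmtWithoutSession          = "SELECT " + columnsWithoutSession + from
	stmtWithoutSessionForUpdate = stmtWithoutSession + forUpdate
	stmtWithSession             = "SELECT " + columnsWithSession + from
	stmtWithSessionForUpdate    = stmtWithSession + forUpdate
)

// getSelectStmt (hypothetical name) picks one of four precomputed
// statements based on whether the job has a session ID and whether the
// caller asked for a locking read.
func getSelectStmt(hasSessionID, useReadLock bool) string {
	if hasSessionID {
		if useReadLock {
			return stmtWithSessionForUpdate
		}
		return stmtWithSession
	}
	if useReadLock {
		return stmtWithoutSessionForUpdate
	}
	return stmtWithoutSession
}

func main() {
	fmt.Println(getSelectStmt(false, false))
	fmt.Println(getSelectStmt(true, true))
}
```

The concatenating version suggested in the review is shorter. This variant trades a few extra constants for zero allocations on a path that likely runs rarely enough that either choice is fine.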

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)

@ajwerner
Contributor Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Jul 29, 2021

Build succeeded:

@craig craig bot merged commit 539496d into cockroachdb:master Jul 29, 2021