Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: reduce contention on SessionRegistry mutex #95553

Merged
merged 1 commit into from
Jan 20, 2023
Merged

sql: reduce contention on SessionRegistry mutex #95553

merged 1 commit into from
Jan 20, 2023

Conversation

ecwall
Copy link
Contributor

@ecwall ecwall commented Jan 19, 2023

Fixes #95535

This reduces SessionRegistry contention in 2 ways:

  1. Change SessionRegistry syncutil.Mutex to syncutil.RWMutex and use RLock() in place of Lock() when not modifying the registry.
  2. Only lock reading/writing SessionRegistry maps with mutex instead of including the session operations in the lock.

Release note (bug fix): Reduce register session, deregister session, and session cancel query contention.

@ecwall ecwall requested a review from a team January 19, 2023 21:45
@ecwall ecwall requested review from a team as code owners January 19, 2023 21:45
@ecwall ecwall requested a review from yuzefovich January 19, 2023 21:45
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ecwall ecwall requested a review from rafiss January 19, 2023 21:47
session.cancelSession()
return &serverpb.CancelSessionResponse{Canceled: true}, nil
}
session, ok := r.getSessionByID(sessionID)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now uses the map to look up the session instead of scanning.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @yuzefovich)


pkg/server/status.go line 223 at r1 (raw file):

		b.sessionRegistry.RLock()
		defer b.sessionRegistry.RUnlock()
		sessions = b.sessionRegistry.SerializeAll(false /* lock */)

it looks like this is the only instance where false is passed to SerializeAll. and the lock is just acquired above. so can we just get rid of the boolean parameter, and always do the locking inside of SerializeAll?


pkg/sql/exec_util.go line 2100 at r1 (raw file):

	sessions := make([]registrySession, len(r.sessions))
	i := 0
	for _, session := range r.sessions {

nit: this could be for i, session := range r.session so you don't need to increment i manually

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finds!

Reviewed 4 of 5 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)


pkg/server/status.go line 223 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

it looks like this is the only instance where false is passed to SerializeAll. and the lock is just acquired above. so can we just get rid of the boolean parameter, and always do the locking inside of SerializeAll?

+1 - I think we'll just want to RLock and RUnlock in getSessions unconditionally, and everything else should just work.


pkg/sql/exec_util.go line 2100 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: this could be for i, session := range r.session so you don't need to increment i manually

I don't think it would work since i would be a key in the map (i.e. clusterunique.ID) and not an integer index.

We could instead change the allocation of sessions to make([]registrySession, 0, len(r.sessions)) and then use append in order to avoid the index altogether.


pkg/sql/exec_util.go line 2186 at r2 (raw file):

}

// SerializeAll returns a slice of all sessions in the registry, converted to serverpb.Sessions.

nit: wrap comments at 80 characters.


pkg/sql/exec_util.go line 2192 at r2 (raw file):

	i := 0
	for _, s := range sessions {
		response[i] = s.serialize()

Currently there is no increment of i, so this seems broken - we should probably just keep the appending behavior.

Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)


pkg/server/status.go line 223 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

+1 - I think we'll just want to RLock and RUnlock in getSessions unconditionally, and everything else should just work.

The difference is that b.closedSessionCache.GetSerializedSessions() is also being locked without being unlocked in between. Why does that not need to be included in the lock?

Or are you suggesting to lock here -> lock in getSessions -> unlock in getSessions -> unlock here?


pkg/sql/exec_util.go line 2100 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I don't think it would work since i would be a key in the map (i.e. clusterunique.ID) and not an integer index.

We could instead change the allocation of sessions to make([]registrySession, 0, len(r.sessions)) and then use append in order to avoid the index altogether.

Changed to append + make([]registrySession, 0, len(r.sessions))


pkg/sql/exec_util.go line 2186 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: wrap comments at 80 characters.

Fixed.


pkg/sql/exec_util.go line 2192 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Currently there is no increment of i, so this seems broken - we should probably just keep the appending behavior.

Changed to append.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)


pkg/server/status.go line 223 at r1 (raw file):

Previously, ecwall (Evan Wall) wrote…

The difference is that b.closedSessionCache.GetSerializedSessions() is also being locked without being unlocked in between. Why does that not need to be included in the lock?

Or are you suggesting to lock here -> lock in getSessions -> unlock in getSessions -> unlock here?

But it's a different lock, right? So we'll have b.sessionRegistry.RLock() and b.sessionRegistry.RUnlock() inside of b.sessionRegistry.SerializeAll(), once that method returns, we do b.closedSessionCache.mu.Lock() and b.closedSessionCache.mu.Unlock() inside of GetSerializedSessions. Two locks seem independent of each other, or am I missing something?


pkg/sql/exec_util.go line 2065 at r3 (raw file):

// Use register() and deregister() to modify this registry.
type SessionRegistry struct {
	syncutil.RWMutex

nit: it's probably better to name this mutex, i.e. mu syncutil.RWMutex) so that only SessionRegistry object itself would lock / unlock it as necessary. Also it'd be probably even nicer to move the other two fields into an anonymous struct

type SessionRegistry struct {
  mu struct {
    syncutil.RWMutex
    sessionsByID ...
    sessionsByCancelKey ...
  }
}

Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)


pkg/server/status.go line 223 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

But it's a different lock, right? So we'll have b.sessionRegistry.RLock() and b.sessionRegistry.RUnlock() inside of b.sessionRegistry.SerializeAll(), once that method returns, we do b.closedSessionCache.mu.Lock() and b.closedSessionCache.mu.Unlock() inside of GetSerializedSessions. Two locks seem independent of each other, or am I missing something?

The way it is special cased, I assumed that the sessionRegistry.Lock() was required over the whole operation to prevent duplicates somehow, but I didn't look into it to verify that so I'll do that next. I would like to simplify it like you suggested if possible.

Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)


pkg/server/status.go line 223 at r1 (raw file):

Previously, ecwall (Evan Wall) wrote…

The way it is special cased, I assumed that the sessionRegistry.Lock() was required over the whole operation to prevent duplicates somehow, but I didn't look into it to verify that so I'll do that next. I would like to simplify it like you suggested if possible.

It looks like it was just to make sure that the closed sessions did not overlap with the open ones. I changed it to not lock over the whole check and to just prefer closed sessions to open ones when populating userSessions.


pkg/sql/exec_util.go line 2065 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: it's probably better to name this mutex, i.e. mu syncutil.RWMutex) so that only SessionRegistry object itself would lock / unlock it as necessary. Also it'd be probably even nicer to move the other two fields into an anonymous struct

type SessionRegistry struct {
  mu struct {
    syncutil.RWMutex
    sessionsByID ...
    sessionsByCancelKey ...
  }
}

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :lgtm: but probably worth waiting for Rafi to take a look too. I'm assuming that we'll want to backport this, right? Consider adding the corresponding labels.

Reviewed 4 of 5 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)


pkg/server/status.go line 223 at r1 (raw file):

Previously, ecwall (Evan Wall) wrote…

It looks like it was just to make sure that the closed sessions did not overlap with the open ones. I changed it to not lock over the whole check and to just prefer closed sessions to open ones when populating userSessions.

Oh, I initially missed that comment about deduplicating sessions.

@ecwall ecwall requested a review from a team as a code owner January 20, 2023 13:56
Fixes #95535

This reduces `SessionRegistry` contention in 2 ways:
1) Change `SessionRegistry` `syncutil.Mutex` to `syncutil.RWMutex` and use
`RLock()` in place of `Lock()` when not modifying the registry.
2) Only lock reading/writing `SessionRegistry` maps with mutex instead of
including the session operations in the lock.

Release note (bug fix): Reduce register session, deregister session, and
session cancel query contention.
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thanks for improving this

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)

@ecwall
Copy link
Contributor Author

ecwall commented Jan 20, 2023

bors r=yuzefovich,rafiss

@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Build succeeded:

@craig craig bot merged commit 875420d into cockroachdb:master Jan 20, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jan 20, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 56d3687 to blathers/backport-release-22.1-95553: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: reduce contention between session (de)registration and query cancelation
4 participants