[IMPROVED] Check stream state performance #5963
Conversation
When checking interest state for interest or workqueue streams, we would check all msgs from the stream's first sequence through ack floor and up to delivered. We do this to make sure our ack state is correct. In cases where there were a lot of messages still in the stream due to offline or slow consumers, this could be a heavy load on a server. This improvement uses LoadNextMsg() to efficiently skip ahead, and we now remember our checked floor and do not repeat checks for messages below our check floor on subsequent runs. Signed-off-by: Derek Collison <[email protected]>
We used to check at 5s and then every 30s. However, we already check once the log replay is complete, so now we just check every ~2.5 minutes. Signed-off-by: Derek Collison <[email protected]>
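For illustration only, the new cadence could look like the sketch below; the constant and function names are assumptions, not the server's actual identifiers:

```go
package sketch // illustrative only; identifiers are not from the server

import "time"

// Previously: first check at 5s, then every 30s. A check already runs
// when log replay completes, so the periodic cadence is now ~2.5 minutes.
const checkInterestInterval = 150 * time.Second

func runInterestStateChecks(check func(), done <-chan struct{}) {
	t := time.NewTicker(checkInterestInterval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			check()
		case <-done:
			return
		}
	}
}
```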
LGTM overall but just a question.
```go
psi.fblk = i
// We only require read lock here as that is desirable,
// so we need to do this in a go routine to acquire write lock.
go func() {
```
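For context, here is a minimal sketch of the pattern under discussion, assuming stand-in types; aside from LoadNextMsg(), the identifiers (msgStore, psim, fixStaleFblk) are illustrative and not the actual filestore code:

```go
package sketch // illustrative, not the actual filestore code

import "sync"

// Stand-ins for the filestore types; names are assumptions.
type psi struct {
	fblk uint64 // first block that may hold msgs for this subject
}

type msgStore struct {
	mu   sync.RWMutex
	psim map[string]*psi // per-subject info map (PSIM)
}

// fixStaleFblk is called while the caller holds ms.mu.RLock, so that
// LoadNextMsg() can keep running in parallel; the write lock needed
// for the fixup is acquired in a separate goroutine.
func (ms *msgStore) fixStaleFblk(subj string, i uint64) {
	go func() {
		ms.mu.Lock()
		defer ms.mu.Unlock()
		// Re-check under the write lock and only ever move fblk forward,
		// so concurrent fixups may duplicate work but never regress state.
		if p, ok := ms.psim[subj]; ok && i > p.fblk {
			p.fblk = i
		}
	}()
}
```

The forward-only update under the write lock is what makes duplicated goroutines harmless, as the thread notes further down.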
Do we need to guard against accidentally creating multiple of these goroutines for the same store?
I don't think so, but I was thinking the same thing.
We essentially have three options:
- Promote to write lock at call sites, but I want LoadNextMsg() to be able to operate in parallel.
- Do not do any fixups in this function to stale fblks.
- The proposal above (we could modify it to funnel through a single Go routine, but it should not be common).
I'm wondering if a simple CompareAndSwap and a deferred clear would be enough just to ensure only one fixup runs at a time for a given store. I'd be worried that we could end up with multiple of these doing the same work at the same time, which could compound the issue.
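As a hedged sketch of that suggestion (illustrative names, not actual filestore code), an atomic.Bool guard with a deferred clear would look roughly like this:

```go
package sketch // illustrative only

import (
	"sync"
	"sync/atomic"
)

// msgStore is a stand-in for the filestore type.
type msgStore struct {
	mu          sync.RWMutex
	fixupActive atomic.Bool
}

// scheduleFixup spawns at most one fixup goroutine at a time for this
// store: CompareAndSwap takes the guard, the deferred Store clears it.
func (ms *msgStore) scheduleFixup(fix func()) {
	if !ms.fixupActive.CompareAndSwap(false, true) {
		return // a fixup is already in flight; skip duplicating the work
	}
	go func() {
		defer ms.fixupActive.Store(false)
		ms.mu.Lock()
		defer ms.mu.Unlock()
		fix()
	}()
}
```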
Could be for a different set of PSIM entries though, so a simple boolean state would not suffice IMO.
Good point, we could go with this for now and keep an eye on the goroutines. It may not be an issue.
I think the worst case is we could duplicate work but not invalidate state. Hence the checks to only move it forward.
LGTM
Signed-off-by: Maurice van Veen <[email protected]>
When checking interest state for interest or workqueue streams, we would check all msgs from the stream's first sequence through ack floor and up to delivered.
We do this to make sure our ack state is correct. In cases where there were a lot of messages still in the stream due to offline or slow consumers, this could be a heavy load on a server.
This improvement uses LoadNextMsg() to efficiently skip ahead, and we now remember our checked floor and do not repeat checks for messages below our check floor on subsequent runs.
This change also highlighted a data race in filestore that is fixed here as well.
Signed-off-by: Derek Collison <[email protected]>
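To make the description concrete, here is a hedged sketch of the skip-ahead plus check-floor approach. Only LoadNextMsg() is named in the PR; its signature here is simplified, and every other identifier (StreamStore, consumerCheck, checkedFloor, verify) is an assumption for illustration:

```go
package sketch // hedged illustration, not the actual server code

type StoredMsg struct {
	Subject string
	Seq     uint64
}

// StreamStore is an illustrative slice of the message-store API.
type StreamStore interface {
	FirstSeq() uint64
	// LoadNextMsg returns the next message at or after start that matches
	// filter, along with its sequence (signature simplified here).
	LoadNextMsg(filter string, start uint64) (*StoredMsg, uint64, error)
}

type consumerCheck struct {
	checkedFloor uint64 // highest sequence already verified on a prior run
}

// checkAckState walks from the stream's first sequence (or the remembered
// check floor, if higher) up to delivered, letting LoadNextMsg skip over
// gaps instead of loading every sequence one by one.
func (c *consumerCheck) checkAckState(store StreamStore, filter string, delivered uint64, verify func(*StoredMsg)) {
	seq := store.FirstSeq()
	if c.checkedFloor > seq {
		seq = c.checkedFloor
	}
	for seq <= delivered {
		sm, nseq, err := store.LoadNextMsg(filter, seq)
		if err != nil || nseq > delivered {
			break
		}
		verify(sm)
		seq = nseq + 1
	}
	// Remember how far we verified so later runs never re-check below it.
	if seq > c.checkedFloor {
		c.checkedFloor = seq
	}
}
```

The key property is that the floor only moves forward, so subsequent runs skip every message already verified rather than rescanning from the stream's first sequence.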