-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fully lock adding node queues during hinted handoff #4353
Conversation
I believe this change address the issues with hinted-handoff not fully replicating all data to nodes that come back online after an outage.. A detailed explanation follows. During testing of of hinted-handoff (HH) under various scenarios, HH stats showed that the HH Processor was occasionally encountering errors while unmarshalling hinted data. This error was not handled completely correctly, and in clusters with more than 3 nodes, this could cause the HH service to stall until the node was restarted. This was the high-level reason why HH data was not being replicated. Furthermore by watching, at the byte-level, the hinted-handoff data it could be seen that HH segment block lengths were getting randomly set to 0, but the block data itself was fine (Block data contains hinted writes). This was the root cause of the unmarshalling errors outlined above. This, in turn, was tracked down to the HH system opening each segment file multiple times concurrently, which was not file-level thread-safe, so these mutiple open calls were corrupting the file. Finally, the reason a segment file was being opened multiple times in parallel was because WriteShard on the HH Processor was checking for node queues in an unsafe manner. Since WriteShard can be called concurrently this was adding queues for the same node more than once, and each queue-addition results in opening segment files. This change fixes the locking in WriteShard such the check for an existing HH queue for a given node is performed in a synchronized manner.
c35bd20
to
44d52ac
Compare
if queue, err = p.addQueue(ownerID); err != nil { | ||
if err := func() error { | ||
// Check again under write-lock. | ||
p.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This locking pattern is the same as used by the expvar
package in the standard libary.
func (p *Processor) WriteShard(shardID, ownerID uint64, points []models.Point) error { | ||
p.mu.RLock() | ||
queue, ok := p.queues[ownerID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the root cause. p.queues
was not being locked when checked for an entry for ownerID
.
Would be nice it we were able to validate this via a test. |
👍 That definitely needs synchronization. |
@corylanou - I am open to suggestions but I not sure how this is any One way we now can check - and I will make sure we do - is to always run On Wednesday, October 7, 2015, Cory LaNou [email protected] wrote:
|
Fully lock adding node queues during hinted handoff
@corylanou - let me know if you have any ideas for testing this. I will be enabling race-detection on a single burn-in now. |
@otoolep yeah, I don't have an easy answer. We could possibly add a test that does a bunch of routines all calling HH and |
I believe this change address the issues with hinted-handoff not fully replicating all data to nodes that come back online after an outage. A detailed explanation follows.
During testing of hinted-handoff (HH) under various scenarios, HH stats showed that the HH Processor was occasionally encountering errors while unmarshalling hinted data. This error was not handled completely correctly, and in clusters with more than 3 nodes, this could cause the HH service to stall until the node was restarted. This was the high-level reason why HH data was not being replicated.
Furthermore by watching, at the byte-level, the hinted-handoff data it could be seen that HH segment block lengths were getting randomly set to 0, but the block data itself was fine (Block data contains hinted writes). This was the root cause of the unmarshalling errors outlined above. This, in turn, was tracked down to the HH system opening each segment file multiple times concurrently. This is not thread-safe, so these mutiple
open
calls were corrupting the file.Finally, the reason a segment file was being opened multiple times in parallel was because
WriteShard
on the HH Processor was manipulating HH node queues in an unsafe manner. SinceWriteShard
can be called concurrently this was adding queues for the same node more than once, and each queue-addition results in opening segment files.This change fixes the locking in
WriteShard
such the check for an existing HH queue for a given node is performed in a synchronized manner.Before this change I could cause HH to fail 100% within a couple of minutes due to corrupt HH data, under specific testing. With this change in place everything looks fine.