backupccl: send chunks with fail scatters to random node in generative ssp #97589
Looks great! Left some nits.
One q: do we understand why kv fails to scatter?
log.Warningf(ctx, "scatter returned node 0. Route span starting at %s to current node %v because of hash error: %v",
scatterKey, nodeID, err)
} else {
randomNum := int(hash.Sum32())
nit: call this variable hashedKey instead?
done
if nodeID, ok := flowCtx.NodeID.OptionalNodeID(); ok {
cachedNodeIDs := cache.cachedNodeIDs()
if len(cachedNodeIDs) > 0 && len(importSpanChunk.entries) > 0 {
hash := fnv.New32a()
nit: i don't think the memory footprint of a hash is that big, but we could instantiate it outside of the loop and reset it here.
done
if nodeID, ok := flowCtx.NodeID.OptionalNodeID(); ok {
cachedNodeIDs := cache.cachedNodeIDs()
if len(cachedNodeIDs) > 0 && len(importSpanChunk.entries) > 0 {
nit: i don't think we need to check the length of importSpanChunk.entries, given that we grab the first entry above (line 360)
done
testDiskMonitor := execinfra.NewTestDiskMonitor(ctx, st)
defer testDiskMonitor.Stop(ctx)

// Set up the test so that the test context is canceled after the first entry
i think this comment can be removed.
removed
also, we should backport this to 23.1
bors r+
Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from 210650d to blathers/backport-release-23.1-97589: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 23.1.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
For chunks that fail to scatter, this patch routes the chunk to a random node
instead of the current node. Before the generative version, split and scatter
processors ran on every node, so routing chunks that failed to scatter to the
current node introduced no imbalance. The new generative split and scatter
processor runs on only one node, so without this change that single node would
process every chunk that failed to scatter.
Addresses runs 6 and 9 of #99206
Release note: None