gossip: remove frequent gossiping of gossip client connections #89613
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
(Apologies in advance if some tests fail in CI -- my laptop is able to build cockroach but not really set up to run the full test suite, so I'm relying on CI to point out any issues.)
Force-pushed from c1ae3a5 to 6cdf785.
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Thanks for the PR @a-robinson.
For example, when run in a mixed version cluster does debug zip run all of its sql commands against the local node or does it run some against remote nodes?
I'm not certain on this one. I think local node but I'm not familiar here. cc @irfansharif.
The failing test TestGossipPropagation seems to be due to no component actively removing expired infos. Previously the clientsTimer would tick in this test every 2 seconds, so it would eventually succeed.
I think a real cluster should be fine, since store gossip runs roughly every 10 seconds and so there is still something removing expired infos; this is specific to the test. Triggering gossip with a key inside the retry loop should fix it:
mustAdd(remote, "remote-trigger", nil, 2 * time.Minute)
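A rough sketch of that fix, reusing the mustAdd helper quoted above inside the standard SucceedsSoon retry helper from pkg/testutils; checkInfos is a placeholder for whatever assertion the test's retry loop already makes, and the exact structure is an assumption rather than the actual patch:

```go
// Sketch only: re-gossip a short-lived trigger key on every retry attempt so
// the gossip machinery keeps processing infos and eventually drops the expired
// ones, now that no background timer (the old clientsTimer) does this.
testutils.SucceedsSoon(t, func() error {
	mustAdd(remote, "remote-trigger", nil, 2*time.Minute)
	return checkInfos() // placeholder for the test's existing assertion
})
```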
if _, err := g.GetInfo(gossip.MakeGossipClientsKey(d.NodeID)); err == nil {
	alive[d.NodeID] = true
var gossipLiveness livenesspb.Liveness
if err := g.GetInfoProto(gossip.MakeNodeLivenessKey(d.NodeID), &gossipLiveness); err == nil {
Was this change needed to pass some tests?
I haven't actually tried running any tests without this. It just seemed like it would be pretty bad if crdb_internal.gossip_nodes suddenly switched to always saying that all nodes are not live (due to the gossip clients keys being missing), especially since IIRC it's the table that's used by CLI commands like cockroach node status.
And basing it on node liveness seems like a reasonable alternative, although I'm open to other ideas.
Thanks, I suspect so too. I just haven't had a good chance to fix it up yet, but will do soon.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @a-robinson, @irfansharif, and @kvoli)
pkg/sql/crdb_internal.go
line 4234 at r1 (raw file):
// crdbInternalGossipNetwork exposes the local view of the gossip network (i.e.
// the gossip client connections from source_id node to target_id node).
var crdbInternalGossipNetworkTable = virtualSchemaTable{
drive-by comment: I'd be slightly more inclined to make this an empty table (or go with the idea of only showing local clients) rather than removing the table outright. We could also have it do p.BufferClientNotice to warn that this table is no longer reliable.
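For illustration, a sketch of that suggestion: keep the table, return no rows, and buffer a notice. The populate signature, schema text, and pgnotice call are assumptions modeled on how other crdb_internal tables are typically defined, not the code that actually landed:

```go
// Sketch only: keep crdb_internal.gossip_network around but empty, and buffer
// a client notice so callers learn it is no longer populated. Field names and
// the populate signature are assumed to mirror neighboring virtualSchemaTable
// definitions rather than copied from the actual change.
var crdbInternalGossipNetworkTable = virtualSchemaTable{
	comment: `locally known edges in the gossip network (no longer populated)`,
	schema: `
CREATE TABLE crdb_internal.gossip_network (
  source_id INT NOT NULL,  -- source node of a gossip client connection
  target_id INT NOT NULL   -- target node of a gossip client connection
)`,
	populate: func(ctx context.Context, p *planner, _ catalog.DatabaseDescriptor,
		addRow func(...tree.Datum) error) error {
		p.BufferClientNotice(ctx, pgnotice.Newf(
			"the crdb_internal.gossip_network table is no longer populated and should not be relied upon"))
		return nil // intentionally no rows
	},
}
```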
I'll just take over this PR + fallout, through next week.
Thanks!
Congrats on getting 22.2 out! I'd hope this is something that will make it into v23.1? Let me know if it'd help for me to clean something up here, although it's unlikely that I'm the best person to make the call on what to do with the crdb_internal.gossip_network table (or any other compatibility concerns) at this point.
Force-pushed from 6cdf785 to e9d3ca0.
I need a reviewer here. @kvoli and @tbg, up for it? Alex did most of the work so you know it's good. Do review it with backportability to 22.2 (at the very least) in mind. Was thinking we'd let this bake for two-ish weeks pre-backport. If it doesn't feel eligible for a backport now that 22.2.0 was published publicly, I'll be a bit regretful of dropping it earlier but would understand.
I don't know if completely removing the gossip_network table is really the best idea or if it should just be left in and only populated with the clients from the local node.
I've re-added the table but left it empty. It just makes it slightly more palatable for a backport.
we could also have it do p.BufferClientNotice to warn that this table is no longer reliable
Done.
For example, when run in a mixed version cluster does debug zip run all of its sql commands against the local node or does it run some against remote nodes? If an old node ever tries to query the gossip_network table on a different node it could have a bad time.
Runs it against the local node. But we do want to support older-version CRDB binaries in a point release querying newer point-release CRDB servers, so keeping the table around seems ok. In mixed-version clusters, while the cluster version hasn't yet been finalized, we generally want to be able to use the older binary; there too, keeping the table around seems helpful. In a future release we can get rid of it entirely.
The failing test TestGossipPropagation seems due to no component actively removing expired infos.
Fixed.
This is clearly going to break the gossip roachtest, but between @irfansharif kindly volunteering to fix that up separately and his existing TODO in that file I've left that out of this change.
Fixed.
It just seemed like it would be pretty bad if crdb_internal.gossip_nodes suddenly switched to always saying that all nodes are not live (due to the gossip clients keys being missing). Especially since IIRC it's the table that's used by CLI commands like cockroach node status. And basing it on node liveness seems like a reasonable alternative, although I'm open to other ideas.
Keeping gossip_nodes backed by the gossiped view of node liveness seems better than using the KV node liveness. It's generally good that these tables are highly available and don't need leaseholder participation. We could get that with liveness too by looking at the in-memory state on the node, but I'll keep it as is.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @a-robinson and @kvoli)
Force-pushed from 42d6cb8 to faec9a5.
Force-pushed from faec9a5 to 2114678.
thanks for pushing this over the finish line!
Reviewed 5 of 22 files at r1, 8 of 17 files at r2, 1 of 2 files at r3, 3 of 11 files at r4, all commit messages.
Dismissed @kvoli from a discussion.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @a-robinson and @irfansharif)
-- commits
line 9 at r5:
Still reading the PR, but is there a change in the ./cockroach node status data sources (see cli roachtest change)? If so, worth pointing out.
pkg/cmd/roachtest/tests/cli.go
line 112 at r5 (raw file):
waitUntil([]string{
	"is_available is_live",
	"false false",
I am confused by this change, what's going on here?
pkg/sql/crdb_internal.go
line 4108 at r5 (raw file):
if err := g.GetInfoProto(gossip.MakeNodeLivenessKey(d.NodeID), &gossipLiveness); err == nil {
	if now.Before(gossipLiveness.Expiration.ToTimestamp().GoTime()) {
		alive[d.NodeID] = true
This works, but I wonder if this is the canonical way to do this sort of thing in unified architecture. I would've thought the tenant would have a liveness view of KV around somewhere (possibly powered by Gossip under the hood). Not to block the merge - after all this works - but maybe @knz can take a look and see if this is palatable long-term.
Do review it with backportability to 22.2 (at the very least) in mind. Was thinking we'd let this bake for two-ish weeks pre-backport. If it doesn't feel eligible for a backport now that 22.2.0 was published publicly, I'll be a bit regretful of dropping it earlier but would understand.
I don't see an issue backporting.
Reviewed 2 of 22 files at r1, 1 of 17 files at r2, 8 of 11 files at r4, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @a-robinson and @irfansharif)
TFTR!
bors r+
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @a-robinson and @tbg)
Previously, tbg (Tobias Grieger) wrote…
Still reading the PR, but is there a change in the ./cockroach node status data sources (see cli roachtest change)? If so, worth pointing out.
There's no change.
pkg/cmd/roachtest/tests/cli.go
line 112 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I am confused by this change, what's going on here?
It's because of the changes under pkg/sql/crdb_internal.go. Before, the is_live column was computed off of the existence of a gossip clients key, which for n1 in this test would've existed (n1 isn't taken down). But now that we no longer have these keys, is_live is backed by the gossiped node liveness, and n1 can't heartbeat its liveness because we've killed 2/3 nodes.
pkg/sql/crdb_internal.go
line 4108 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This works, but I wonder if this is the canonical way to do this sort of thing in unified architecture. I would've thought the tenant would have a liveness view of KV around somewhere (possibly powered by Gossip under the hood). Not to block the merge - after all this works - but maybe @knz can take a look and see if this is palatable long-term.
I'll defer to Rafa to sort out what a tenant-with-capabilities view of this would look like. I assume it's just going to be a table populated by data returned through the Connector interface, which in turn would do a very similar thing to what we're doing here.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
pkg/cmd/roachtest/tests/cli.go
line 112 at r5 (raw file):
Previously, irfansharif (irfan sharif) wrote…
It's because of the changes under pkg/sql/crdb_internal.go. Before, the is_live column was computed off of the existence of a gossip clients key, which for n1 in this test would've existed (n1 isn't taken down). But now that we no longer have these keys, is_live is backed by the gossiped node liveness, and n1 can't heartbeat its liveness because we've killed 2/3 nodes.
FWIW it may be possible to use gossiped store descriptors instead of gossiped liveness since my understanding is that those get refreshed every couple minutes, but I don't know whether that would really be better overall. Using liveness seemed better to me at the time, but I didn't consider what the output should look like during cluster instability.
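For what it's worth, a hedged sketch of that store-descriptor alternative; storeIDsForNode is a hypothetical helper for mapping a node to its store IDs, gossip.MakeStoreKey is the store-descriptor key constructor as I recall it, and this assumes store infos carry a TTL so their presence implies the node gossiped recently:

```go
// Sketch only: an alternative "recently alive" signal based on gossiped store
// descriptors instead of liveness. storeIDsForNode is a placeholder for
// however the caller maps a node to its stores.
for _, storeID := range storeIDsForNode(d.NodeID) {
	var desc roachpb.StoreDescriptor
	if err := g.GetInfoProto(gossip.MakeStoreKey(storeID), &desc); err == nil {
		// The store descriptor is re-gossiped every couple of minutes, so its
		// presence suggests the node was up recently.
		alive[d.NodeID] = true
		break
	}
}
```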
Build succeeded.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. Backport to branch 22.2.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
blathers backport 22.2
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. Backport to branch 22.2 failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
I think the 22.2 backport here broke backwards compatibility with 22.1. In a mixed 22.2/22.1 cluster, running …
Ah, right, 22.1 uses … I don't know if there's much we can do about this now. 22.1 is out of support with no further releases planned. We can consider backporting a retroactive 22.2 version gate, but it's not clear that it's worth the risk and effort.
I guess we'll have the same problem with mixed 22.2 patch releases before/after this backport, and with mixed 22.2/23.1 clusters before this backport. The right approach here would've been to add a 23.1 version gate and not backport to 22.2. I think the best option here would be to backport a 23.1 version gate, and to semi-revert the 22.2 backport such that it still gossips the …
For reference, this was initially released in v22.2.3.
Opened #103788 with a partial 22.2 revert. I'm not entirely convinced we want to do that, considering the cost of gossiping these, but breaking backwards compatibility isn't great either.
This was intentional BTW, since we did want to backport it to 22.2 and backports can't carry version gates. The …
I'm with you. I feel like a release note pointing to newer patch releases is sufficient: #103788 (review).
We can't add version gates, but we can piggyback on the existing V22_2 version gate. Of course, that still breaks patch release compat, but at least we stop the bleeding in mixed-major clusters.
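To make that concrete, a hedged sketch of what piggybacking on the existing gate could look like at the point where a node decides whether to gossip its clients key; nodeID, clientsBytes, clientsGossipTTL, and how the settings and gossip handles reach this call site are all placeholders and assumptions:

```go
// Sketch only: keep gossiping the clients key while the cluster may still
// contain pre-22.2 nodes, and stop once the existing V22_2 gate is active.
if !st.Version.IsActive(ctx, clusterversion.V22_2) {
	// Older nodes still derive is_live/connectivity from this key, so keep
	// publishing it until the cluster version has been finalized at 22.2.
	if err := g.AddInfo(gossip.MakeGossipClientsKey(nodeID), clientsBytes, clientsGossipTTL); err != nil {
		log.Warningf(ctx, "unable to gossip clients key: %v", err)
	}
}
```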
These gossip-clients keys make up two thirds or more of the gossip
traffic in various large clusters I've inspected, consuming almost an
entire CPU core in the worst case I've seen. They don't provide enough
value to justify that sort of ongoing cost, so this commit removes them
entirely as well as the periodic logging of the gossip network and the
crdb_internal.gossip_network table, both of which relied on them.
Release note (backward-incompatible change): We've stopped
supporting/populating the crdb_internal.gossip_network table. It was an
internal table with no API guarantees (so perhaps not meriting a release
note?).
Release note (performance improvement): Significantly reduced CPU usage
of the underlying gossip network in large clusters.
Informs #51838 (largely fixes it for practical purposes, although there's likely still more that could be done)
This is clearly going to break the gossip roachtest, but between @irfansharif kindly volunteering to fix that up separately and his existing TODO in that file I've left that out of this change.
I don't know if completely removing the gossip_network table is really the best idea or if it should just be left in and only populated with the clients from the local node. For example, when run in a mixed-version cluster, does debug zip run all of its SQL commands against the local node or does it run some against remote nodes? If an old node ever tries to query the gossip_network table on a different node it could have a bad time.
@irfansharif @kvoli