sqlmigrations: add default .meta zone config #17628
This seems to break assumptions for some tests (e.g. in
Wondering if we should specify a default
What are the general implications of putting 5x replication somewhere? Will folks running a three-node cluster get annoying warnings all the time and then have to change the zone config back manually to avoid them? It's more complicated, but what we really want is something that uses "at least three but preferably five replicas", right? Also, what fails with this change, and can't it be fixed if this PR is what we want to do?
Folks running a 3-node cluster will get warnings similar to those seen by folks running a 1-node cluster. I think the ranges end up in purgatory. "At least three but preferably five replicas" is exactly the semantics we get here: if there are only 3 nodes, everything should work fine (modulo warnings).
The test failures are likely just violations of assumptions about the default number of replicas for
PS: I could be convinced that we should hold off on this change until 1.2.
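A toy model of the "at least three but preferably five" behavior described above. It is not CockroachDB's allocator; it only assumes that each live node holds at most one replica of a range. Under that assumption a range that wants five replicas on a three-node cluster gets three, keeps working, and is flagged as under-replicated, which is where the warnings come from. `placeReplicas` and `replicationStatus` are hypothetical names.

```go
package main

import "fmt"

// replicationStatus is a hypothetical summary of a range's replication.
type replicationStatus struct {
	Placed          int  // replicas that can actually be placed
	UnderReplicated bool // true when Placed < desired; this is what triggers warnings
}

// placeReplicas assumes at most one replica per live node, so the number of
// placed replicas is capped by the node count rather than failing outright.
func placeReplicas(desired, liveNodes int) replicationStatus {
	placed := desired
	if liveNodes < placed {
		placed = liveNodes
	}
	return replicationStatus{Placed: placed, UnderReplicated: placed < desired}
}

func main() {
	for _, nodes := range []int{1, 3, 5, 7} {
		s := placeReplicas(5, nodes)
		fmt.Printf("nodes=%d placed=%d underReplicated=%v\n", nodes, s.Placed, s.UnderReplicated)
	}
}
```

In this model a three-node cluster sees the same kind of under-replication signal for a 5x zone that a one-node cluster sees for a 3x zone, which matches the comparison made above.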
I'm worried that it's off-putting to run a three-node cluster only to be told that you really need five nodes (and if you don't need five nodes, why is the default setup warning me about that?). Ideally that zone runs with five replicas when it can, but isn't unhappy when all it can get is three. We also haven't really run 5-fold replication in production, afaict. Should we do that before we upreplicate a system range for 1.1? I don't expect any concrete problems, but it'd be good to check (which sounds like an argument for postponing it to 1.2).
No, I was just curious whether there was anything "serious" there, but it doesn't seem that way.
Yeah, I think we'll want to add a separate zone for node liveness, as @bdarnell mentioned on #14990 (comment). Adding a new zone isn't much work.
Agreed. Also, once the range has been upreplicated to 4 or 5 replicas, you have to be careful about shrinking the cluster back to 3. This happens in Jesse's cross-DC migration demo, for example (a 3-node cluster temporarily becomes 6 nodes during the migration). I think we need to be careful about introducing this, which means waiting for 1.2 for the replication part. (We could do the TTL part now if that makes a significant difference on its own.)
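The shrinking hazard can be made concrete with Raft majority arithmetic. This is a minimal sketch assuming a range with n replicas needs floor(n/2)+1 of them to make progress; the helper names are hypothetical. Five replicas tolerate two simultaneous losses instead of one, but when a 6-node cluster is shrunk back to 3, a five-replica range loses its majority if three of the nodes holding its replicas are removed at once.

```go
package main

import "fmt"

// quorum returns the majority a Raft group of n replicas needs to make progress.
func quorum(n int) int { return n/2 + 1 }

// failuresTolerated is how many replicas can be lost while a majority remains.
func failuresTolerated(n int) int { return n - quorum(n) }

// survivesRemoval reports whether a range with n replicas keeps a majority
// when `removed` of its replicas disappear together (for example, nodes shut
// down before their replicas have been moved elsewhere).
func survivesRemoval(n, removed int) bool { return n-removed >= quorum(n) }

func main() {
	for _, n := range []int{3, 5} {
		fmt.Printf("replicas=%d quorum=%d tolerates=%d\n", n, quorum(n), failuresTolerated(n))
	}
	// 6 nodes -> 3 nodes: only one node lacks a replica of a 5x range, so at
	// least two of the three removed nodes hold one, and possibly all three.
	fmt.Println(survivesRemoval(5, 2), survivesRemoval(5, 3))
}
```

The design takeaway in this model is that replicas should be moved off nodes one at a time before those nodes are removed, rather than dropping several replica-holding nodes together.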
Let's just hold off on this PR until 1.2. Adjusting the TTL for meta ranges seems minor.
I've updated this PR to only set a lower TTL for the
Also while we're looking at this area, splitting the liveness span into its own zone and giving it a short TTL would be a good idea.
I've forgotten the precise details, but we recently saw a cluster with megabytes of historical meta version history, but 10s of kilobytes of live data. Yes, a future migration will be slightly more difficult, but only slightly. We'll have to read the meta zone config and make some determination of how to upgrade it.
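Why a shorter TTL keeps the meta ranges small: a simplified model (not CockroachDB's GC queue) of which versions of a single key survive garbage collection under a given TTL. It assumes reads at any timestamp at or after now-TTL must still be served, so every version newer than that threshold is kept, plus the newest version at or before it. Deletion tombstones and intents are ignored, and the update rate in `main` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// retainedVersions returns, newest first, the versions of one key that GC
// must keep so that reads at any timestamp >= now-ttl still see the right value.
func retainedVersions(timestamps []int64, now, ttl int64) []int64 {
	ts := append([]int64(nil), timestamps...) // copy; don't reorder the caller's slice
	sort.Slice(ts, func(i, j int) bool { return ts[i] > ts[j] })
	threshold := now - ttl
	var kept []int64
	for _, t := range ts {
		kept = append(kept, t)
		if t <= threshold {
			break // this version serves reads at the threshold; older ones are garbage
		}
	}
	return kept
}

func main() {
	// A hypothetical meta record rewritten once a minute for the past 24h.
	const now = int64(24 * 3600)
	var history []int64
	for t := int64(0); t < now; t += 60 {
		history = append(history, t)
	}
	for _, ttl := range []int64{24 * 3600, 3600, 600} {
		fmt.Printf("ttl=%ds keeps %d of %d versions\n", ttl, len(retainedVersions(history, now, ttl)), len(history))
	}
}
```

With a record rewritten once a minute, a 24h TTL keeps all 1440 versions, 1h keeps 60, and 10m keeps 10, which illustrates how historical versions can dwarf the live data on a frequently rewritten range.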
Didn't someone look at this? It looks like we do split the liveness span into its own range. I think the remaining bit would be to add a zone config for it. Let me take a look.
I was mistaken, we already have the ability to set a zone for timeseries data. I think all that is needed is to add a default zone. Let me do so.
I've added a default
I think 1m is overly aggressive, but it should still work just fine. Perhaps 10min. 1h for the meta SGTM.
pkg/sql/metric_test.go, line 80 at r2:
Rotted comment.
pkg/sqlmigrations/migrations_test.go, line 743 at r2:
Make this a pkg-local
I'd also use 10m for the
Do we need to put this behind a
pkg/sqlmigrations/migrations.go, line 522 at r3:
If there's already a config for the meta zone, we should update it (or leave it alone) instead of clobbering it.
pkg/sqlmigrations/migrations.go, line 528 at r3:
This should start with the
pkg/sqlmigrations/migrations.go, line 522 at r3: Previously, bdarnell (Ben Darnell) wrote…
I think that's handled by the
pkg/sqlmigrations/migrations.go, line 528 at r3: Previously, bdarnell (Ben Darnell) wrote…
If so,
I think it is OK for nodes to use different zone configs for the liveness span. Only one node (the leaseholder) will be applying that zone config, so the worst that can happen seems to be that we GC some keys earlier.
pkg/sqlmigrations/migrations.go, line 522 at r3: Previously, benesch (Nikhil Benesch) wrote…
We only add a zone config if one isn't already present. I've renamed
pkg/sqlmigrations/migrations.go, line 528 at r3: Previously, benesch (Nikhil Benesch) wrote…
Hmm, this is trickier than the other case because the user would have set a zone config on
pkg/sqlmigrations/migrations_test.go, line 743 at r2: Previously, tschottdorf (Tobias Schottdorf) wrote…
Done.
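The "only add a zone config if one isn't already present" rule, as a minimal sketch against an in-memory map. `ZoneConfig`, `zoneStore`, and `addDefaultZoneIfAbsent` are hypothetical names, not the migration's actual code; the TTLs mirror the 1h/10m values discussed in this PR, while the replica counts are illustrative.

```go
package main

import "fmt"

// ZoneConfig is a hypothetical stand-in modeling only two zone fields.
type ZoneConfig struct {
	NumReplicas  int
	GCTTLSeconds int
}

// zoneStore is a hypothetical in-memory stand-in for stored zone configs,
// keyed by zone name.
type zoneStore map[string]ZoneConfig

// addDefaultZoneIfAbsent writes cfg only when no config exists for name, so a
// user's existing setting is never clobbered and rerunning is harmless.
// It reports whether it wrote anything.
func addDefaultZoneIfAbsent(s zoneStore, name string, cfg ZoneConfig) bool {
	if _, ok := s[name]; ok {
		return false
	}
	s[name] = cfg
	return true
}

func main() {
	s := zoneStore{".meta": {NumReplicas: 5, GCTTLSeconds: 7200}} // pre-existing user config
	fmt.Println(addDefaultZoneIfAbsent(s, ".meta", ZoneConfig{NumReplicas: 3, GCTTLSeconds: 3600}))
	fmt.Println(addDefaultZoneIfAbsent(s, ".liveness", ZoneConfig{NumReplicas: 3, GCTTLSeconds: 600}))
	fmt.Println(s[".meta"].GCTTLSeconds, s[".liveness"].GCTTLSeconds)
}
```

Here the user's `.meta` setting survives untouched and only the missing `.liveness` zone is created, which is the "leave it alone" half of the review suggestion.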
pkg/sql/metric_test.go, line 80 at r2: Previously, tschottdorf (Tobias Schottdorf) wrote…
Done.
pkg/sqlmigrations/migrations.go, line 528 at r3:
I wasn't proposing a general inheritance hierarchy (that might be a good idea, but it's not what I'm suggesting here). I was just thinking of a one-time copy so that when liveness moves from
Just skipping this setting would lose the existing configuration. I think for consistency with the
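A minimal sketch of the one-time copy idea, assuming the new zone should start from whatever config governed its span until now and only take the new default TTL. `ZoneConfig` and `deriveZone` are hypothetical, and the previous config's values in `main` are illustrative, not settings from this PR.

```go
package main

import "fmt"

// ZoneConfig is a hypothetical stand-in modeling only two zone fields.
type ZoneConfig struct {
	NumReplicas  int
	GCTTLSeconds int
}

// deriveZone builds the config for a newly split-out zone from the config
// that previously applied to its span, so settings the user already chose
// (here, the replica count) carry over; only the GC TTL takes the new default.
func deriveZone(previous ZoneConfig, newTTLSeconds int) ZoneConfig {
	derived := previous // all fields are plain values, so assignment copies them
	derived.GCTTLSeconds = newTTLSeconds
	return derived
}

func main() {
	previous := ZoneConfig{NumReplicas: 5, GCTTLSeconds: 90000} // hypothetical user settings
	fmt.Printf("%+v\n", deriveZone(previous, 600))
}
```

Unlike simply skipping the new zone, this keeps the user's replication choice while still applying the shorter TTL, and it is a copy at migration time rather than a live inheritance link.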
pkg/sqlmigrations/migrations.go, line 528 at r3: Previously, bdarnell (Ben Darnell) wrote…
@benesch I'm not sure what fix you have in mind for
I've added additional logic to the migration to base the new
pkg/sqlmigrations/migrations.go, line 534 at r4:
I wonder if we should always update the
pkg/sqlmigrations/migrations.go, line 534 at r4: Previously, petermattis (Peter Mattis) wrote…
If there's an existing config with a non-default TTL, I think we should leave it as-is. If it differs only in other parameters, I think it would probably be a good idea to update it.
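One reading of the rule above, as a minimal sketch: treat a TTL that still equals the old default as "not customized" and replace it with the new TTL, otherwise leave the config alone; other fields are preserved either way. `reconcileTTL` is hypothetical, and the old default TTL passed in is an assumption supplied by the caller, not a value stated in this PR.

```go
package main

import "fmt"

// ZoneConfig is a hypothetical stand-in modeling only two zone fields.
type ZoneConfig struct {
	NumReplicas  int
	GCTTLSeconds int
}

// reconcileTTL applies newTTL unless the user has customized the TTL (it no
// longer equals oldDefaultTTL). Non-TTL fields are never touched. It reports
// whether the returned config differs in TTL handling from the input.
func reconcileTTL(existing ZoneConfig, oldDefaultTTL, newTTL int) (ZoneConfig, bool) {
	if existing.GCTTLSeconds != oldDefaultTTL {
		return existing, false // user chose a TTL; leave it as-is
	}
	existing.GCTTLSeconds = newTTL // existing is a copy, so the caller's value is unchanged
	return existing, true
}

func main() {
	const assumedOldDefault = 90000 // hypothetical previous default TTL, in seconds
	kept, changed := reconcileTTL(ZoneConfig{NumReplicas: 3, GCTTLSeconds: 120}, assumedOldDefault, 3600)
	fmt.Println(kept.GCTTLSeconds, changed)
	updated, changed := reconcileTTL(ZoneConfig{NumReplicas: 5, GCTTLSeconds: assumedOldDefault}, assumedOldDefault, 3600)
	fmt.Println(updated.NumReplicas, updated.GCTTLSeconds, changed)
}
```

Keying the decision on the TTL alone means a user who only changed, say, the replica count still gets the shorter TTL, while anyone who deliberately set a TTL keeps it.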
pkg/sqlmigrations/migrations.go, line 534 at r4: Previously, bdarnell (Ben Darnell) wrote…
Done. This made the logic here somewhat simpler. PTAL.
Figured out the acceptance failure; it will be fixed shortly. Is this good to go?
Default the .meta zone config to 1h GC TTL and default the .liveness zone config to 10m GC TTL. The shorter GC TTLs reflect the lack of need for ever performing historical queries on these ranges coupled with the desire to keep the meta and liveness ranges smaller.
See cockroachdb#16266
See cockroachdb#14990
Release note (general change): Clusters are now initialized with default .meta and .liveness zones with lower GC TTL configurations.
LGTM
Default the .meta zone config to 5 replicas and 1h GC TTL. The higher
replication reflects the relative danger of significant data loss and
unavailability for the meta ranges. The shorter GC TTL reflects the lack
of need for ever performing historical queries on these ranges coupled
with the desire to keep the meta ranges smaller.
See #16266
See #14990