Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlmigrations: add default .meta zone config #17628

Conversation

petermattis
Copy link
Collaborator

Default the .meta zone config to 5 replicas and 1h GC TTL. The higher
replication reflects the relative danger of significant data loss and
unavailability for the meta ranges. The shorter GC TTL reflects the lack
of need for ever performing historical queries on these ranges coupled
with the desire to keep the meta ranges smaller.

See #16266
See #14990

@petermattis petermattis requested review from a team August 14, 2017 13:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis
Copy link
Collaborator Author

This seems to break assumptions for some tests (e.g. in storage).

Wondering if we should specify a default .system zone config with a higher replication factor. Per #14990, the .system zone config covers both system tables and node liveness, so we can't set the TTL lower.

@tbg
Copy link
Member

tbg commented Aug 14, 2017

What are the general implications of putting a 5x somewhere? Will folks who are running a three node cluster get annoying warnings all the time and they then have to change the zone config back manually to avoid them? It's more complicated, but what we really want is something that uses "at least three but preferably five replicas", right?

Also, what fails with this change, and can't it be fixed if this PR is what we want to do?


Reviewed 2 of 2 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks pending.


Comments from Reviewable

@petermattis
Copy link
Collaborator Author

What are the general implications of putting a 5x somewhere? Will folks who are running a three node cluster get annoying warnings all the time and they then have to change the zone config back manually to avoid them? It's more complicated, but what we really want is something that uses "at least three but preferably five replicas", right?

Folks running a 3 node cluster will get similar warnings to folks running a 1 node cluster. I think the ranges end up in purgatory. The "at least three but preferably five replicas" is the semantics of what we get here. If there are only 3 nodes, everything should work fine (modulo warnings).

Also, what fails with this change, and can't it be fixed if this PR is what we want to do?

The test failures are likely just violation about assumptions of the default number of replicas for r1. I think, I haven't investigated. Or are you asking something else?

PS I could be convinced that we hold off on this change until 1.2.

@tbg
Copy link
Member

tbg commented Aug 14, 2017

The "at least three but preferably five replicas" is the semantics of what we get here. If there are only 3 nodes, everything should work fine (modulo warnings).

I'm worried that it's off-putting to run a three node cluster just to be told that you really need five nodes (and if you don't need five nodes, why is the default setup warning me about that?). Ideally that zone runs with five when it can, but isn't unhappy when all it can get is three.

We also haven't really run 5-fold replication in production afaict. Should we do that first before we upreplicate a system range for 1.1? I don't expect any concrete problems, but it'd be good to check (that sounds like an argument to postpone it to 1.2).

The test failures are likely just violation about assumptions of the default number of replicas for r1. I think, I haven't investigated. Or are you asking something else?

No, was just curious if there was anything "serious" there, but doesn't seem that way.

@a-robinson
Copy link
Contributor

Wondering if we should specify a default .system zone config with a higher replication factor. Per #14990, the .system zone config covers both system tables and node liveness, so we can't set the TTL lower.

Yeah, I think we'll want to add a separate zone for node liveness, as @bdarnell mentioned on #14990 (comment). Adding a new zone isn't much work.

@bdarnell
Copy link
Contributor

I'm worried that it's off-putting to run a three node cluster just to be told that you really need five nodes (and if you don't need five nodes, why is the default setup warning me about that?).

Agreed. Also, once the range has been upreplicated to 4 or 5 replicas, you have to be careful about shrinking the cluster back to 3. This happens in Jesse's cross-DC migration demo, for example (a 3-node cluster is temporarily 6 nodes during the migration). I think we need to be careful about introducing this, which means waiting for 1.2 (for the replication part. We could do the TTL part now if that makes a significant difference on its own)


Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@petermattis petermattis added this to the 1.2 milestone Aug 14, 2017
@petermattis
Copy link
Collaborator Author

Let's just hold off on this PR until 1.2. Adjusting the TTL for meta ranges seems minor.

@petermattis petermattis changed the title migrations/sqlmigrations: add default .meta zone config [DO NOT MERGE] migrations/sqlmigrations: add default .meta zone config Aug 14, 2017
@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 886076b to 01ee83d Compare September 16, 2017 17:11
@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 01ee83d to e63038f Compare December 10, 2017 18:52
@petermattis petermattis requested a review from a team as a code owner December 10, 2017 18:52
@petermattis petermattis requested review from a team December 10, 2017 18:52
@petermattis petermattis changed the title [DO NOT MERGE] migrations/sqlmigrations: add default .meta zone config sqlmigrations: add default .meta zone config Dec 10, 2017
@petermattis
Copy link
Collaborator Author

I've updated this PR to only set a lower TTL for the .meta zone. PTAL.

@petermattis petermattis requested review from bdarnell and tbg December 10, 2017 18:53
@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from e63038f to 257472e Compare December 10, 2017 20:02
@bdarnell
Copy link
Contributor

:lgtm:, but your last message said that only adjusting the TTL seems minor. Have we seen problems recently with lots of meta version history? Adding a migration for the TTL makes it slightly harder to add a subsequent migration in the future to change replication to 5 (if we decide to do so) since we'll no longer be able to use INSERT IGNORE.


Reviewed 10 of 10 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Also while we're looking at this area, splitting the liveness span into its own zone and giving it a short TTL would be a good idea.

@petermattis
Copy link
Collaborator Author

but your last message said that only adjusting the TTL seems minor. Have we seen problems recently with lots of meta version history? Adding a migration for the TTL makes it slightly harder to add a subsequent migration in the future to change replication to 5 (if we decide to do so) since we'll no longer be able to use INSERT IGNORE.

I've forgotten the precise details, but we recently saw a cluster with megabytes of historical meta version history, but 10s of kilobytes of live data. Yes, a future migration will be slightly more difficult, but only slightly. We'll have to read the meta zone config and make some determination of how to upgrade it.

Also while we're looking at this area, splitting the liveness span into its own zone and giving it a short TTL would be a good idea.

Didn't someone look at this. Looks like we do split the liveness into its own range. I think the remaining bit would be to add a zone config for it. Let me take a look.

@petermattis
Copy link
Collaborator Author

Also while we're looking at this area, splitting the liveness span into its own zone and giving it a short TTL would be a good idea.

Didn't someone look at this. Looks like we do split the liveness into its own range. I think the remaining bit would be to add a zone config for it. Let me take a look.

I was mistaken, we already have the ability to set a zone for timeseries data. I think all that is needed is to add a default zone. Let me do so.

@petermattis
Copy link
Collaborator Author

I've added a default .liveness zone with a 1m GC TTL. Any thoughts on what the default GC TTL should be for .meta and .liveness? Both values (1h and 1m) were pulled out of the air.

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 079741e to 4f3dce9 Compare December 11, 2017 17:05
@tbg
Copy link
Member

tbg commented Dec 11, 2017

I think 1m is overly aggressive, but it should still work just fine. Perhaps 10min. 1h for the meta SGTM.

:lgtm: unless @bdarnell thinks we shouldn't do this. (I don't see this as particularly pressing).


Reviewed 10 of 10 files at r2, 13 of 13 files at r3.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


pkg/sql/metric_test.go, line 80 at r2 (raw file):

	// DDL statements.
	accum := queryCounter{
		insertCount: 3, // version setting population migration and root user addition.

rotted comment.


pkg/sqlmigrations/migrations_test.go, line 743 at r2 (raw file):

	newMigrations := make([]migrationDescriptor, 0, len(backwardCompatibleMigrations))
	for _, m := range backwardCompatibleMigrations {
		if m.name == "add default .meta zone config" {

Make this a pkg-local const that you can use in both places.


Comments from Reviewable

@bdarnell
Copy link
Contributor

:lgtm:

I'd also use 10m for the .liveness ttl. Historical versions in the .meta ranges have been useful so I don't think it's a great idea to discard them, but I guess we do have space constraints there and the history is still available in the range-local data.

Do we need to put this behind a set cluster version check, or is it OK that during a mixed-version cluster nodes will have different ideas about which zone the liveness span is in? I think it'll be fine...


Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sqlmigrations/migrations.go, line 522 at r3 (raw file):

func addDefaultMetaAndLivenessZoneConfigs(ctx context.Context, r runner) error {
	metaZone := config.DefaultZoneConfig()

If there's already a config for the meta zone, we should update it (or leave it alone) instead of clobbering it.


pkg/sqlmigrations/migrations.go, line 528 at r3 (raw file):

	}

	livenessZone := config.DefaultZoneConfig()

This should start with the .system zone config instead of the default.


Comments from Reviewable

@benesch
Copy link
Contributor

benesch commented Dec 11, 2017

Reviewed 2 of 2 files at r1, 10 of 10 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sqlmigrations/migrations.go, line 522 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If there's already a config for the meta zone, we should update it (or leave it alone) instead of clobbering it.

I think that's handled by the INSERT ... ON CONFLICT (id) DO NOTHING below. I do wish we could use ALTER RANGE ... EXPERIMENTAL CONFIGURE ZONE here; the fact that we access these system tables with raw SQL (and sometimes worse, with raw KV) in so many places has me worried.


pkg/sqlmigrations/migrations.go, line 528 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This should start with the .system zone config instead of the default.

If so, ALTER RANGE liveness EXPERIMENTAL CONFIGURE ZONE... should do the same. In fact, in light of this new inheritance hierarchy, I have a weak preference for fixing ALTER RANGE liveness, then updating this migration to use the higher-level SQL syntax:

if n, err := exec(`SELECT system.zones WHERE id = blah`) {
  return err
} else if n == 0 {
  exec(`ALTER RANGE blah EXPERIMENTAL CONFIGURE ZONE '{stuff}'`)
}

Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 4f3dce9 to df0a060 Compare December 11, 2017 22:21
@petermattis
Copy link
Collaborator Author

I think it is ok to have nodes use different zone configs for the liveness span. Only one node will be applying that zone config (the leaseholder). Seems like the worst that can happen is we GC some keys earlier.


Review status: 8 of 13 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/sqlmigrations/migrations.go, line 522 at r3 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

I think that's handled by the INSERT ... ON CONFLICT (id) DO NOTHING below. I do wish we could use ALTER RANGE ... EXPERIMENTAL CONFIGURE ZONE here; the fact that we access these system tables with raw SQL (and sometimes worse, with raw KV) in so many places has me worried.

We only add a zone config if one isn't already present. I've renamed addZoneConfig to addZoneConfigIfNotPresent to make this clearer.


pkg/sqlmigrations/migrations.go, line 528 at r3 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

If so, ALTER RANGE liveness EXPERIMENTAL CONFIGURE ZONE... should do the same. In fact, in light of this new inheritance hierarchy, I have a weak preference for fixing ALTER RANGE liveness, then updating this migration to use the higher-level SQL syntax:

if n, err := exec(`SELECT system.zones WHERE id = blah`) {
  return err
} else if n == 0 {
  exec(`ALTER RANGE blah EXPERIMENTAL CONFIGURE ZONE '{stuff}'`)
}

Hmm, this is trickier than the other case because the user would have set a zone config on .system in order to affect the liveness range. Thoughts about what to do here? Probably safest to ignore setting the liveness config if the .system config exists. Or perhaps I should clone the .system config to the .liveness config.


pkg/sqlmigrations/migrations_test.go, line 743 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Make this a pkg-local const that you can use in both places.

Done.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch 3 times, most recently from dfb92c0 to 4a54398 Compare December 12, 2017 00:51
@petermattis
Copy link
Collaborator Author

Review status: 8 of 13 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/metric_test.go, line 80 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

rotted comment.

Done.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 7 of 13 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sqlmigrations/migrations.go, line 528 at r3 (raw file):

in light of this new inheritance hierarchy

I wasn't proposing a general inheritance hierarchy (that might be a good idea, but not what I'm suggesting here). I was just thinking of a one-time copy so that when liveness moves from .system to .liveness it doesn't lose any existing configuration.

Probably safest to ignore setting the liveness config if the .system config exists.

Just skipping this setting would lose the existing configuration. I think for consistency with the metaZone change, if a config exists for .system, we copy it to .liveness without modifying its TTL.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 4a54398 to 5dc3962 Compare December 13, 2017 14:15
@petermattis
Copy link
Collaborator Author

Review status: 7 of 13 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sqlmigrations/migrations.go, line 528 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

in light of this new inheritance hierarchy

I wasn't proposing a general inheritance hierarchy (that might be a good idea, but not what I'm suggesting here). I was just thinking of a one-time copy so that when liveness moves from .system to .liveness it doesn't lose any existing configuration.

Probably safest to ignore setting the liveness config if the .system config exists.

Just skipping this setting would lose the existing configuration. I think for consistency with the metaZone change, if a config exists for .system, we copy it to .liveness without modifying its TTL.

@benesch I'm not sure what fix you have in mind for ALTER RANGE liveness. Having it "merge" with an existing system zone config. Btw, SHOW ZONE CONFIGURATION FOR RANGE is very nice.

I've added additional logic to the migration so that base the new meta and liveness zone configs on the existing default and system zone configs respectively. The TTL seconds is only updated if they are the default. PTAL.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 5dc3962 to 3b7fbb2 Compare December 13, 2017 14:30
@petermattis
Copy link
Collaborator Author

Review status: 7 of 13 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/sqlmigrations/migrations.go, line 534 at r4 (raw file):

			return err
		}
		if id == keys.RootNamespaceID {

I wonder if we should always update the .meta zone config, even if one was already present.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 7 of 13 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/sqlmigrations/migrations.go, line 534 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder if we should always update the .meta zone config, even if one was already present.

If there's an existing config with a non-default TTL, I think we should leave it as-is. If it differs only in other parameters, I think it would probably be a good idea to update it.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 3b7fbb2 to 5da6e9e Compare December 15, 2017 14:36
@petermattis
Copy link
Collaborator Author

Review status: 4 of 13 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


pkg/sqlmigrations/migrations.go, line 534 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If there's an existing config with a non-default TTL, I think we should leave it as-is. If it differs only in other parameters, I think it would probably be a good idea to update it.

Done. This made the logic here somewhat simpler. PTAL.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch 2 times, most recently from b790647 to 27f5353 Compare December 15, 2017 18:51
@petermattis
Copy link
Collaborator Author

Figured out the acceptance failure and it will be fixed shortly. Is this good to go?

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 27f5353 to 9beedd5 Compare December 16, 2017 01:11
Default the .meta zone config to 1h GC TTL and default the .liveness
zone config to 10m GC TTL. The shorter GC TTLs reflect the lack of need
for ever performing historical queries on these ranges coupled with the
desire to keep the meta and liveness ranges smaller.

See cockroachdb#16266
See cockroachdb#14990

Release note (general change): Clusters are now initialized with default
.meta and .liveness zones with lower GC TTL configurations.
@bdarnell
Copy link
Contributor

LGTM


Review status: 4 of 14 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


Comments from Reviewable

@petermattis petermattis force-pushed the pmattis/default-meta-and-system-zone-configs branch from 9beedd5 to fcf822a Compare December 16, 2017 01:13
@petermattis petermattis merged commit a91ebbf into cockroachdb:master Dec 16, 2017
@petermattis petermattis deleted the pmattis/default-meta-and-system-zone-configs branch December 16, 2017 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants