Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: polish "cascading" zone configs #28901

Closed
tbg opened this issue Aug 21, 2018 · 21 comments
Closed

server: polish "cascading" zone configs #28901

tbg opened this issue Aug 21, 2018 · 21 comments
Labels
A-kv-client Relating to the KV client and the KV interface. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@tbg
Copy link
Member

tbg commented Aug 21, 2018

Split from #14990 (comment).

When a zone config is created, the parent zone config is copied and the copy adapted with the fields specified by the user. This means that if the parent zone config changes later, the child won't receive the update.

This is problematic. For example, the .default zone config specifies three replicas when a cluster is replicated. Correspondingly, the .liveness, .meta and system zone configs now have a replication factor of three.

If a user changes the replication factor to 5 cluster-wide, they are going to change the .default zone config to that effect. Any other zone configs won't receive this update and in particular the cluster won't actually be able to survive two outages because vital system ranges are only 3x replicated.

These semantics are questionable and uninutitive, so we should change them: don't fill the fields that are inherited from a parent; instead, walk the chain of parents whenever an unset field's value needs to be known.

@tbg tbg added the A-kv-client Relating to the KV client and the KV interface. label Aug 21, 2018
@tbg tbg added this to the 2.2 milestone Aug 21, 2018
@knz
Copy link
Contributor

knz commented Sep 13, 2018

A hurdle is that Go does not let us see the difference between "field is unset" and "field value set by user also happens to be the default value for that type in Go".

So really solving this will require changing at least the ZoneConfig struct (with booleans everywhere) and the marshal/unmarshal functions.

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. labels Sep 13, 2018
@ridwanmsharif ridwanmsharif self-assigned this Sep 17, 2018
@knz
Copy link
Contributor

knz commented Sep 18, 2018

See also #30375. There may be some incorrectness to solve as part of this work.

@ridwanmsharif
Copy link

As @knz mentioned, we must first change the ZoneConfig structure to support knowledge of an extra bit of information - whether the value is inherited or explicitly set. This information would be required for the NumReplicas, RangeMinBytes, RangeMaxBytes and GC at the very least. I wanted to ask if we needed that also for the Constraints and LeasePreferences?

One way of getting this extra information that is required is having booleans in the structure for each of them. Alternatively we could have a reserved value for each to serve as a flag that it has not been explicitly set (For example -1 for NumReplicas, RangeMinBytes, RangeMaxBytes). I think we do something similar for flagging if a zone is a subzone placeholder:

func (z *ZoneConfig) IsSubzonePlaceholder() bool {

Are there other ways of deducing whether the field is inherited?

@tbg
Copy link
Member Author

tbg commented Sep 18, 2018

To play devil's advocate, zero values make no sense (or aren't necessary to be used) for the four fields that you mentioned initially (though we might have to do some work to make sure users can't use zero values there). They do seem to make sense for Constraints and LeasePreferences.

For compat purposes, it might be easier to add a bool to each field that indicates whether the field is intentionally empty (or whether it just happens to be set to zero). This isn't pretty, but allows existing zone configs (which won't have any of the bools set) to work as today.

@knz
Copy link
Contributor

knz commented Sep 18, 2018

IF you use separate fields you need to flip the boolean: the default false must mean "the value has been set explicitly".

So probably the boolean will be called "inherit"

@ridwanmsharif
Copy link

So to confirm, this also applies to Constraints and LeasePreferences? Their values should also cascade if not explicitly set? The boolean values for each field would be something along the lines of ExplicitlySet so essentially all fields not set by the user are inherited by default (with the default zone config having all the booleans set to true), we have to change marshaling/unmarshaling to set these to true when explicitly set.

I also just wanted to note somthething for future discussions about the zone tables and subzone structures: While navigating the hierarchies of zones and subzones (with zones having their own list of subzones, where the list contains index subzones as well as partitions on index subzones), It becomes fairly hairy to figure out how to inherit fields. If they we're all in the system.zones table keyed by (descriptor_id, index_id, partition) it would make this a lot cleaner. Talked about it briefly with @benesch, maybe we could write up an RFC for changing the zone table layout at some time

@benesch
Copy link
Contributor

benesch commented Sep 18, 2018

To play devil's advocate, zero values make no sense (or aren't necessary to be used) for the four fields that you mentioned initially (though we might have to do some work to make sure users can't use zero values there). They do seem to make sense for Constraints and LeasePreferences.

That's unfortunately not true:

  // TTLSeconds specifies the maximum age of a value before it's
  // garbage collected. Only older versions of values are garbage
  // collected. Specifying <=0 mean older versions are never GC'd.

🤦‍♂️

It's also perfectly reasonable to set RangeMinBytes to zero to indicate that you don't want a range to be merged away when it's empty.

So to confirm, this also applies to Constraints and LeasePreferences? Their values should also cascade if not explicitly set? The boolean values for each field would be something along the lines of ExplicitlySet so essentially all fields not set by the user are inherited by default (with the default zone config having all the booleans set to true), we have to change marshaling/unmarshaling to set these to true when explicitly set.

I'll voice my vote in favor of making these not inherit. If your database has the constraints [+us] and your table has the constraints [+europe], I think it would be extraordinarily confusing if the resulting constraints on the table were [+us, +europe]. We'd have to invent some new constraint syntax to indicate that you wanted to start from a clean slate.

For compat purposes, it might be easier to add a bool to each field that indicates whether the field is intentionally empty (or whether it just happens to be set to zero). This isn't pretty, but allows existing zone configs (which won't have any of the bools set) to work as today.

Note that also removing gogoproto.nullable on each field in the proto is perfectly backwards compatible and allows you tell whether a given field is explicitly set or unset.

@a-robinson
Copy link
Contributor

a-robinson commented Sep 18, 2018

So to confirm, this also applies to Constraints and LeasePreferences? Their values should also cascade if not explicitly set? The boolean values for each field would be something along the lines of ExplicitlySet so essentially all fields not set by the user are inherited by default (with the default zone config having all the booleans set to true), we have to change marshaling/unmarshaling to set these to true when explicitly set.

I'll voice my vote in favor of making these not inherit. If your database has the constraints [+us] and your table has the constraints [+europe], I think it would be extraordinarily confusing if the resulting constraints on the table were [+us, +europe]. We'd have to invent some new constraint syntax to indicate that you wanted to start from a clean slate.

Their values should cascade if they're empty, but not if they're set to different things. I wouldn't want setting a table's RangeMinBytes to prevent it from inheriting its database's placement constraints.

What we should be striving for is that zone config updates to different zones are commutative. The end state should be the same regardless of whether I run:

ALTER DATABASE foo CONFIGURE ZONE USING gc.ttlseconds = 30;
ALTER TABLE foo.bar CONFIGURE ZONE USING constraints = '[+region=us-east]';

or

ALTER TABLE foo.bar CONFIGURE ZONE USING constraints = '[+region=us-east]';
ALTER DATABASE foo CONFIGURE ZONE USING gc.ttlseconds = 30;

Today that isn't the case, but it should be.

Note that also removing gogoproto.nullable on each field in the proto is perfectly backwards compatible and allows you tell whether a given field is explicitly set or unset.

That is what's for, after all :)

@a-robinson
Copy link
Contributor

What we should be striving for is that zone config updates to different zones are commutative. The end state should be the same regardless of whether I run:

Note that this is currently true if you're modifying the same field in each zone, e.g.:

ALTER DATABASE foo CONFIGURE ZONE USING constraints = '[+region=us-east]';
ALTER TABLE foo.bar CONFIGURE ZONE USING constraints = '[+region=us-west]';

Regardless of the order of the statements, table foo.bar will be constrained to us-west and everything else in database foo will be constrained to us-east.

@ridwanmsharif
Copy link

ridwanmsharif commented Sep 18, 2018

Other than the type difference (making it nullable changes the type to a ptr so we can check if the field is set) that seems a great idea. I think I'm going with that one if no one objects. Also as @a-robinson mentioned, if the values are set then they won't look them up in their parents so in your example @benesch:

I'll voice my vote in favor of making these not inherit. If your database has the constraints [+us] and your table has the constraints [+europe], I think it would be extraordinarily confusing if the resulting constraints on the table were [+us, +europe]. We'd have to invent some new constraint syntax to indicate that you wanted to start from a clean slate.

The table constraints would remain as [+europe]. If however the database has constraints [+us] and you have a table with no constraints explicitly set, it'll inherit the [+us] constraint. If the db changes its constraints to now be [+europe], that will apply to the table too. And I do think thats the behavior we want with the cascading zone configs.

@benesch
Copy link
Contributor

benesch commented Sep 18, 2018

The table constraints would remain as [+europe]. If however the database has constraints [+us] and you have a table with no constraints explicitly set, it'll inherit the [+us] constraint. If the db changes its constraints to now be [+europe], that will apply to the table too. And I do think thats the behavior we want with the cascading zone configs.

Yes, that's exactly what I mean. Sorry I wasn't more clear.

@knz
Copy link
Contributor

knz commented Sep 18, 2018 via email

craig bot pushed a commit that referenced this issue Oct 4, 2018
30426: rfc: Cascading Zone Configs r=ridwanmsharif a=ridwanmsharif

Adds an RFC to discuss proposals and design for cascading zone configs.
Discusses changes to the `ZoneConfig` structure, why it is needed, the
different ways we could change it and designs what the change  should
look like

Issue: #28901 

Release note: None

Co-authored-by: Ridwan Sharif <[email protected]>
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
ridwanmsharif pushed a commit to ridwanmsharif/cockroach that referenced this issue Oct 11, 2018
Very much a WIP. Modifies the `ZoneConfig` struct to
make fields optional now. If fields are not set, they
are looked for in the heirarchy. Refer to the RFC for
more details.

Issue: cockroachdb#28901
Implements most of RFC: cockroachdb#30426

Release note: None
craig bot pushed a commit that referenced this issue Oct 12, 2018
30611: sql, config: make zone configs cascade r=ridwanmsharif a=ridwanmsharif

Modifies the `ZoneConfig` struct to
make fields optional now. If fields are not set, they
are looked for in the heirarchy. Refer to the RFC for
more details.

Issue: #28901
Implements most of RFC: #30426

TODO:
- [x] Update `set_zone_config` to use the new structure and validate them
- [x] Exhaustively test cascading zone configs

Release note: Changes behavior of Zone Configs.

Co-authored-by: Ridwan Sharif <[email protected]>
@ridwanmsharif
Copy link

Zone configs do cascade now. Leaving this open though until the following is complete:

  • Add a column showing where each field is set from (as described in the RFC)
  • Add a SQL command of the form ALTER ... CONFIGURE ZONE UNSET filed_name to unset a specific field
  • Add some validation assertions to make sure users never set per-replica constraints without specifying the replication factor. Similarly for range min and max sizes.
  • Add a roachtest to test how this feature behaves with some nodes using the old zone config system.

@awoods187
Copy link
Contributor

@tbg should this be closed now that we have #30611

@tbg
Copy link
Member Author

tbg commented Mar 25, 2019

@awoods187 see the TODO list above. Most of that is UX so I'll defer to your opinion on how much of it should actually happen.

@awoods187 awoods187 changed the title server: make zone configs "cascading" server: polish "cascading" zone configs Mar 25, 2019
@awoods187
Copy link
Contributor

Ah I missed that--updated the title to make it clear

@knz
Copy link
Contributor

knz commented Apr 1, 2019

@awoods187 I think the remaining TODO items are for the SQL team not core any more. Although I might be convinced to work on this since I touched this code last and it's going to come under the umbrella of DUX.

@lunevalex
Copy link
Collaborator

@awoods187 moving to SQL based on the last comment from @knz, please close if no longer relevant.

@awoods187
Copy link
Contributor

That makes sense. @ajstorm I think this mostly translates to "locality" zone configurations which we've moved away from users interacting with directly in multi-region. Do you think we can close it out?

@ajstorm
Copy link
Collaborator

ajstorm commented Apr 6, 2021

The above thread is very long, and seems to contain a bunch of historical context that I don't have. @awoods187 are you able to net this out a bit more?

@rafiss rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021
@irfansharif
Copy link
Contributor

The remaining work here is UX, and tracked in #31398.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-client Relating to the KV client and the KV interface. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests