Skip to content

Commit

Permalink
spanconfig,kv: merge adjacent ranges with identical configs
Browse files Browse the repository at this point in the history
Fixes cockroachdb#72389.
Fixes cockroachdb#66063 (gated behind a cluster setting).

This should drastically reduce the total number of ranges in the system,
especially when running with a large number of tables and/or tenants. To
understand what the new set of split points are, consider the following
test snippet:

    exec-sql tenant=11
    CREATE DATABASE db;
    CREATE TABLE db.t0();
    CREATE TABLE db.t1();
    CREATE TABLE db.t2();
    CREATE TABLE db.t3();
    CREATE TABLE db.t4();
    CREATE TABLE db.t5();
    CREATE TABLE db.t6();
    CREATE TABLE db.t7();
    CREATE TABLE db.t8();
    CREATE TABLE db.t9();
    ALTER TABLE db.t5 CONFIGURE ZONE using num_replicas = 42;
    ----

    # If installing a unique zone config for a table in the middle, we
    # should observe three splits (before, the table itself, and after).

    diff offset=48
    ----
    --- gossiped system config span (legacy)
    +++ span config infrastructure (current)
    ...
     /Tenant/10                                 database system (tenant)
     /Tenant/11                                 database system (tenant)
    +/Tenant/11/Table/106                       range default
    +/Tenant/11/Table/111                       num_replicas=42
    +/Tenant/11/Table/112                       range default

This PR introduces two cluster settings to selectively opt into this
optimization: spanconfig.{tenant,host}_coalesce_adjacent.enabled
(defaulting to true and false respectively). We also don't coalesce
system table ranges on the host tenant. We had a few implementation
choices here:

(a) Down in KV, augment the spanconfig.Store to coalesce the in-memory
    state when updating entries. On every update, we'd check if the span
    we're writing to has the same config as the preceding and/or
    succeeding one, and if so, write out a larger span instead. We
    previously prototyped a form of this in cockroachdb#68491.

    Pros:
    - reduced memory footprint of spanconfig.Store
    - read path (computing splits) stays fast -- just have to look up
      the right span from the backing interval tree
    Cons:
    - uses more persisted storage than necessary
    - difficult to disable the optimization dynamically (though still
      possible -- we'd effectively have to restart the KVSubscriber and
      populate in-memory span config state using a store that
      does/doesn't coalesce configs)
    - difficult to implement; our original prototype did not have cockroachdb#73150
      which is important for reducing reconciliation round trips

(b) Same as (a) but coalesce configs up in the spanconfig.Store
    maintained in reconciler itself.

    Pros:
    - reduced storage use (both persisted and in-memory)
    - read path (computing splits) stays fast -- just have to look up
      the right span from the backing interval tree
    Cons:
    - very difficult to disable the optimization dynamically (each
      tenant process would need to be involved)
    - most difficult to implement

(c) Same as (a) but through another API on the spanconfig.Store
    interface that accepts only a single update at a time and does not
    generate deltas (not something we need down in KV). Removes the
    implementation complexity.

(d) Keep the contents of `system.span_configurations` and the in-memory
    state of spanconfig.Stores as it is today, uncoalesced. When
    determining split points, iterate through adjacent configs within
    the provided key bounds and see whether we could ignore certain
    split keys.

    Pros:
    - easiest to implement
    - easy to disable the optimization dynamically, for ex. through a
      cluster setting
    Cons:
    - uses more storage (persisted and in-memory) than necessary
    - read path (computing splits) is more expensive if iterating
      through adjacent configs

----

This PR implements option (d). For a benchmark on how slow (d) is going
to be in practice with varying numbers of entries to be scanning over
(10k, 100k, 1m):

    $ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v

    BenchmarkStoreComputeSplitKey
    BenchmarkStoreComputeSplitKey/num-entries=10000
    BenchmarkStoreComputeSplitKey/num-entries=10000-10               1166842 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=100000
    BenchmarkStoreComputeSplitKey/num-entries=100000-10             12273375 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=1000000
    BenchmarkStoreComputeSplitKey/num-entries=1000000-10           140591766 ns/op
    PASS

It's feasible that in the future we re-work this in favor of (c)
possibly.

Release note: None
Release justification: high benefit change to existing functionality
(affecting only multi-tenant clusters).
  • Loading branch information
irfansharif committed Apr 26, 2022
1 parent 0358dbb commit 42b4cd3
Show file tree
Hide file tree
Showing 24 changed files with 637 additions and 128 deletions.
4 changes: 4 additions & 0 deletions pkg/ccl/serverccl/statusccl/tenant_status_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,10 @@ func TestTenantStatusAPI(t *testing.T) {
ctx := context.Background()

knobs := tests.CreateTestingKnobs()
knobs.SpanConfig = &spanconfig.TestingKnobs{
// Some of these subtests expect multiple (uncoalesced) tenant ranges.
StoreDisableCoalesceAdjacent: true,
}

testHelper := newTestTenantHelper(t, 3 /* tenantClusterSize */, knobs)
defer testHelper.cleanup(ctx, t)
Expand Down
1 change: 1 addition & 0 deletions pkg/ccl/serverccl/statusccl/tenant_test_utils.go
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ func newTestTenantHelper(
t.Helper()

params, _ := tests.CreateTestServerParams()
params.Knobs = knobs
testCluster := serverutils.StartNewTestCluster(t, 1 /* numNodes */, base.TestClusterArgs{
ServerArgs: params,
})
Expand Down
175 changes: 111 additions & 64 deletions pkg/ccl/spanconfigccl/spanconfigcomparedccl/testdata/multitenant
Original file line number Diff line number Diff line change
Expand Up @@ -31,50 +31,132 @@ diff offset=50
reconcile tenant=11
----

# As soon as tenant-11 starts reconciling, we should observe more fine-grained
# span configs within its keyspan. This isn't true for the legacy system.
# Even after tenant-11 starts reconciling, we shouldn't observe fine-grained
# span configs within its keyspan. tenant-11 only has system tables, so
# everything will be just within the one range.

configs version=current offset=43 limit=5
diff offset=48 limit=10
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
/Table/50 database system (host)
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
/Tenant/11/Table/4 database system (tenant)
/Tenant/11/Table/5 database system (tenant)
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)

# Once tenant-11 starts adding user tables however (with different configs), we
# should see finer-grained splits). With 10 tables, all with 'range default',
# we should see a single coalesced range for the user tables.

exec-sql tenant=11
CREATE DATABASE db;
CREATE TABLE db.t0();
CREATE TABLE db.t1();
CREATE TABLE db.t2();
CREATE TABLE db.t3();
CREATE TABLE db.t4();
CREATE TABLE db.t5();
CREATE TABLE db.t6();
CREATE TABLE db.t7();
CREATE TABLE db.t8();
CREATE TABLE db.t9();
----

diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
+/Tenant/11/Table/106 range default


# If installing a unique zone config for a table in the middle, we should
# observe three splits (before, the table itself, and after).

exec-sql tenant=11
ALTER TABLE db.t5 CONFIGURE ZONE using num_replicas = 42;
----

configs version=legacy offset=43
diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
/Table/50 range system
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
+/Tenant/11/Table/106 range default
+/Tenant/11/Table/111 num_replicas=42
+/Tenant/11/Table/112 range default

diff offset=48 limit=10

# If adjacent tables also add the same config, they should be merged into one.

exec-sql tenant=11
ALTER TABLE db.t6 CONFIGURE ZONE using num_replicas = 42;
----

diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
+/Tenant/11/Table/4 database system (tenant)
+/Tenant/11/Table/5 database system (tenant)
+/Tenant/11/Table/6 database system (tenant)
+/Tenant/11/Table/7 database system (tenant)
+/Tenant/11/Table/11 database system (tenant)
+/Tenant/11/Table/12 database system (tenant)
+/Tenant/11/Table/13 database system (tenant)
+/Tenant/11/Table/14 database system (tenant)
...

# Sanity check that new tenant tables show up correctly.
+/Tenant/11/Table/106 range default
+/Tenant/11/Table/111 num_replicas=42
+/Tenant/11/Table/113 range default

exec-sql tenant=11
CREATE DATABASE db;
CREATE TABLE db.t1();
CREATE TABLE db.t2();
ALTER TABLE db.t1 CONFIGURE ZONE using num_replicas = 42, gc.ttlseconds = 1000;
ALTER TABLE db.t4 CONFIGURE ZONE using num_replicas = 42;
----

diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
+/Tenant/11/Table/106 range default
+/Tenant/11/Table/110 num_replicas=42
+/Tenant/11/Table/113 range default

# Dropping tables should drop the corresponding split point, picking the next
# applicable ones (if any).

exec-sql tenant=11
DROP TABLE db.t5;
----

diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
+/Tenant/11/Table/106 range default
+/Tenant/11/Table/110 num_replicas=42
+/Tenant/11/Table/113 range default

exec-sql tenant=11
DROP TABLE db.t4;
----

diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
+/Tenant/11/Table/106 range default
+/Tenant/11/Table/112 num_replicas=42
+/Tenant/11/Table/113 range default

exec-sql tenant=11
DROP TABLE db.t6;
----

diff offset=48
Expand All @@ -84,41 +166,6 @@ diff offset=48
...
/Tenant/10 database system (tenant)
/Tenant/11 database system (tenant)
+/Tenant/11/Table/4 database system (tenant)
+/Tenant/11/Table/5 database system (tenant)
+/Tenant/11/Table/6 database system (tenant)
+/Tenant/11/Table/7 database system (tenant)
+/Tenant/11/Table/11 database system (tenant)
+/Tenant/11/Table/12 database system (tenant)
+/Tenant/11/Table/13 database system (tenant)
+/Tenant/11/Table/14 database system (tenant)
+/Tenant/11/Table/15 database system (tenant)
+/Tenant/11/Table/19 database system (tenant)
+/Tenant/11/Table/20 database system (tenant)
+/Tenant/11/Table/21 database system (tenant)
+/Tenant/11/Table/23 database system (tenant)
+/Tenant/11/Table/24 database system (tenant)
+/Tenant/11/Table/25 database system (tenant)
+/Tenant/11/Table/26 database system (tenant)
+/Tenant/11/Table/27 database system (tenant)
+/Tenant/11/Table/28 database system (tenant)
+/Tenant/11/NamespaceTable/30 database system (tenant)
+/Tenant/11/NamespaceTable/Max database system (tenant)
+/Tenant/11/Table/32 database system (tenant)
+/Tenant/11/Table/33 database system (tenant)
+/Tenant/11/Table/34 database system (tenant)
+/Tenant/11/Table/35 database system (tenant)
+/Tenant/11/Table/36 database system (tenant)
+/Tenant/11/Table/37 database system (tenant)
+/Tenant/11/Table/39 database system (tenant)
+/Tenant/11/Table/40 database system (tenant)
+/Tenant/11/Table/41 database system (tenant)
+/Tenant/11/Table/42 database system (tenant)
+/Tenant/11/Table/43 database system (tenant)
+/Tenant/11/Table/44 database system (tenant)
+/Tenant/11/Table/46 database system (tenant)
+/Tenant/11/Table/50 database system (tenant)
+/Tenant/11/Table/106 ttl_seconds=1000 num_replicas=42
+/Tenant/11/Table/107 range default
+/Tenant/11/Table/106 range default

# vim:ft=diff
7 changes: 6 additions & 1 deletion pkg/kv/kvserver/client_spanconfigs_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/kv/kvserver"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/security"
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/spanconfig"
"github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigstore"
"github.com/cockroachdb/cockroach/pkg/sql/sessiondata"
Expand All @@ -37,7 +38,11 @@ import (
func TestSpanConfigUpdateAppliedToReplica(t *testing.T) {
defer leaktest.AfterTest(t)()

spanConfigStore := spanconfigstore.New(roachpb.TestingDefaultSpanConfig())
spanConfigStore := spanconfigstore.New(
roachpb.TestingDefaultSpanConfig(),
cluster.MakeTestingClusterSettings(),
nil,
)
mockSubscriber := newMockSpanConfigSubscriber(spanConfigStore)

ctx := context.Background()
Expand Down
1 change: 1 addition & 0 deletions pkg/server/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -565,6 +565,7 @@ func NewServer(cfg Config, stopper *stop.Stopper) (*Server, error) {
keys.SpanConfigurationsTableID,
1<<20, /* 1 MB */
fallbackConf,
cfg.Settings,
spanConfigKnobs,
)
}
Expand Down
1 change: 1 addition & 0 deletions pkg/server/server_sql.go
Original file line number Diff line number Diff line change
Expand Up @@ -990,6 +990,7 @@ func newSQLServer(ctx context.Context, cfg sqlServerArgs) (*SQLServer, error) {
execCfg,
codec,
cfg.TenantID,
cfg.Settings,
spanConfigKnobs,
)
spanConfig.manager = spanconfigmanager.New(
Expand Down
2 changes: 2 additions & 0 deletions pkg/spanconfig/spanconfigkvsubscriber/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ go_library(
"//pkg/kv/kvclient/rangefeed/rangefeedbuffer",
"//pkg/kv/kvclient/rangefeed/rangefeedcache",
"//pkg/roachpb",
"//pkg/settings/cluster",
"//pkg/spanconfig",
"//pkg/spanconfig/spanconfigstore",
"//pkg/sql/catalog",
Expand Down Expand Up @@ -54,6 +55,7 @@ go_test(
"//pkg/security",
"//pkg/security/securitytest",
"//pkg/server",
"//pkg/settings/cluster",
"//pkg/spanconfig",
"//pkg/spanconfig/spanconfigkvaccessor",
"//pkg/spanconfig/spanconfigtestutils",
Expand Down
1 change: 1 addition & 0 deletions pkg/spanconfig/spanconfigkvsubscriber/datadriven_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ func TestDataDriven(t *testing.T) {
dummyTableID,
10<<20, /* 10 MB */
spanconfigtestutils.ParseConfig(t, "FALLBACK"),
tc.Server(0).ClusterSettings(),
&spanconfig.TestingKnobs{
KVSubscriberRangeFeedKnobs: &rangefeedcache.TestingKnobs{
OnTimestampAdvance: func(ts hlc.Timestamp) {
Expand Down
14 changes: 9 additions & 5 deletions pkg/spanconfig/spanconfigkvsubscriber/kvsubscriber.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedbuffer"
"github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedcache"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/spanconfig"
"github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigstore"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
Expand Down Expand Up @@ -78,6 +79,7 @@ import (
type KVSubscriber struct {
fallback roachpb.SpanConfig
knobs *spanconfig.TestingKnobs
settings *cluster.Settings

rfc *rangefeedcache.Watcher

Expand Down Expand Up @@ -111,8 +113,12 @@ func New(
spanConfigurationsTableID uint32,
bufferMemLimit int64,
fallback roachpb.SpanConfig,
settings *cluster.Settings,
knobs *spanconfig.TestingKnobs,
) *KVSubscriber {
if knobs == nil {
knobs = &spanconfig.TestingKnobs{}
}
spanConfigTableStart := keys.SystemSQLCodec.IndexPrefix(
spanConfigurationsTableID,
keys.SpanConfigurationsTablePrimaryKeyIndexID,
Expand All @@ -121,13 +127,11 @@ func New(
Key: spanConfigTableStart,
EndKey: spanConfigTableStart.PrefixEnd(),
}
spanConfigStore := spanconfigstore.New(fallback)
if knobs == nil {
knobs = &spanconfig.TestingKnobs{}
}
spanConfigStore := spanconfigstore.New(fallback, settings, knobs)
s := &KVSubscriber{
fallback: fallback,
knobs: knobs,
settings: settings,
}
var rfCacheKnobs *rangefeedcache.TestingKnobs
if knobs != nil {
Expand Down Expand Up @@ -247,7 +251,7 @@ func (s *KVSubscriber) handleUpdate(ctx context.Context, u rangefeedcache.Update
func (s *KVSubscriber) handleCompleteUpdate(
ctx context.Context, ts hlc.Timestamp, events []rangefeedbuffer.Event,
) {
freshStore := spanconfigstore.New(s.fallback)
freshStore := spanconfigstore.New(s.fallback, s.settings, s.knobs)
for _, ev := range events {
freshStore.Apply(ctx, false /* dryrun */, ev.(*bufferEvent).Update)
}
Expand Down
2 changes: 2 additions & 0 deletions pkg/spanconfig/spanconfigkvsubscriber/kvsubscriber_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ import (

"github.com/cockroachdb/cockroach/pkg/keys"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/spanconfig"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
Expand Down Expand Up @@ -80,6 +81,7 @@ func TestGetProtectionTimestamps(t *testing.T) {
keys.SpanConfigurationsTableID,
1<<20, /* 1 MB */
roachpb.SpanConfig{},
cluster.MakeTestingClusterSettings(),
nil,
)
m := &manualStore{
Expand Down
1 change: 1 addition & 0 deletions pkg/spanconfig/spanconfigreconciler/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ go_library(
"//pkg/keys",
"//pkg/kv",
"//pkg/roachpb",
"//pkg/settings/cluster",
"//pkg/spanconfig",
"//pkg/spanconfig/spanconfigsqltranslator",
"//pkg/spanconfig/spanconfigstore",
Expand Down
Loading

0 comments on commit 42b4cd3

Please sign in to comment.