rfcs: new RFC on shared-process deployments for cluster virtualization #84700
Conversation
Force-pushed from 280f97a to 35a51a6.
Force-pushed from c99c186 to 5c21aba.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 13 at r1 (raw file):
same CockroachDB process serving the KV layer. This change is meant to (eventually) power a multitenant experience in
Nit: Can we also explicitly tie this to the unified architecture project in this summary? Internal readers may find an explicit linkage helpful to frame this within a larger body of work.
docs/RFCS/20220720_in_memory_tenants.md
line 124 at r1 (raw file):
deployment, a maximum of 2 of these in-memory tenants will be available to external network clients: one for application workloads and one for system operators. This will power the new "multi-tenant
Nit: Let's be consistent with terminology and call this the "unified architecture".
docs/RFCS/20220720_in_memory_tenants.md
line 128 at r1 (raw file):
self-hosted deployments. Serving more tenants over the network will be restricted via a license check, reserved for exclusive use inside Cockroach Cloud.
Nit: Should we be more explicit here and say that it's only for CC Serverless?
docs/RFCS/20220720_in_memory_tenants.md
line 164 at r1 (raw file):
Tenant names will be introduced via a new UNIQUE column in `system.tenants`. An entry will also be materialized for the system tenant with ID 1 and a
Nit: Should we qualify this with "On newly created clusters..."? Above we talk about how the system tenant will not always have ID 1.
docs/RFCS/20220720_in_memory_tenants.md
line 179 at r1 (raw file):
on a value separate from the tenant ID, and that can be changed at run time. This will become the new ADMIN/GUEST capability, and will be stored
Nit: Instead of "ADMIN/GUEST", could we call this something more generic like a "tenant privileges" capability? We may want to expand this beyond two modes in the future. Also, what about NONADMIN or UNPRIVILEGED instead of GUEST? GUEST may be difficult to interpret for readers who are unaware that the alternative is ADMIN.
docs/RFCS/20220720_in_memory_tenants.md
line 183 at r1 (raw file):
To support the restriction on network access, we will also introduce a new PRIVATE/EXPORTED capability. Only tenants with cap EXPORTED will
Nit: Call this "tenant visibility"? Also, does it make sense to go with HIDDEN/VISIBLE or PRIVATE/VISIBLE instead of EXPORTED?
docs/RFCS/20220720_in_memory_tenants.md
line 191 at r1 (raw file):
privileged operations to any secondary tenant. A candidate use case could be granting the capability to view KV-level timeseries to a non-ADMIN tenant.
Do we need any syntax around tenant privileges/visibility? Will we ever want the ability to change a tenant from privileged to non-privileged (aside from the migration case cited above) or the reverse?
docs/RFCS/20220720_in_memory_tenants.md
line 210 at r1 (raw file):
Also, there will be a reference counter (incremented by internal SQL queries and network connections), and the registry would stop a SQL server some time after its reference count drops to zero.
This reference counting sounds interesting (and powerful) but I wonder if it's overkill for a first cut. Did you ever contemplate an explicit "stop tenant" command to shut down a SQL server? The benefit of this approach is that there's less to implement in the short term, and the behavior is more predictable.
docs/RFCS/20220720_in_memory_tenants.md
line 321 at r1 (raw file):
or, alternatively, `AS TENANT <tenantname> <string>` (where the string is a quoted SQL statement), e.g. `AS TENANT app $$SELECT * FROM system.users$$`
What authorization, if any, would we need to perform when running this statement? Does the caller need to come from an ADMIN tenant to act on a non-ADMIN tenant?
Very nice to see this!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @nvanbenschoten, and @rafiss)
Force-pushed from 4c30c5a to a09cf18.
I'll only process nit comments in batches - leaving them as open comments for now.
Reviewed 1 of 1 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 183 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Call this "tenant visibility"? Also, does it make sense to go with HIDDEN/VISIBLE or PRIVATE/VISIBLE instead of EXPORTED?
I have thought hard about using the term "visibility" initially, and I am 70%+ confident it's not the right abstraction. If we really want to be pedantic, I'd go for the full words "WITHOUT_NETWORK" / "WITH_NETWORK".
docs/RFCS/20220720_in_memory_tenants.md
line 191 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Do we need any syntax around tenant privileges/visibility? Will we ever want the ability to change a tenant from privileged to non-privileged (aside from the migration case cited above) or the reverse?
Not in a first iteration. Added words about it.
docs/RFCS/20220720_in_memory_tenants.md
line 210 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
This reference counting sounds interesting (and powerful) but I wonder if it's overkill for a first cut. Did you ever contemplate an explicit "stop tenant" command to shut down a SQL server? The benefits to this approach is that there's less to implement in the short term, and the behavior is more predictable.
The TTL would be configurable. If folks want the servers to linger, they can configure the TTL to "forever". It's generally bad design to introduce registry data structures without an invalidation protocol, so I'm not going to do that.
docs/RFCS/20220720_in_memory_tenants.md
line 321 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
What authorization, if any, would we need to perform when running this statement? Does the caller need to come from an ADMIN tenant to act on a non-ADMIN tenant?
Good Q. Added a sentence about that. Note that David is requesting we do not provision this facility too early. We might cut it from the RFC altogether.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
everywhere" deployment model for Cockroach Cloud Dedicated, and self-hosted deployments. Serving more tenants over the network will be restricted via a license check, reserved for exclusive use inside
Why not let everybody use as many tenants as they want (perhaps conditioned on having a license)?
In particular, if the Obs Service is to use tenants for anything, I think they need to be available broadly.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Why not let everybody use as many tenants as they want (perhaps conditioned on having a license)?
In particular, if the Obs Service is to use tenants for anything, I think they need to be available broadly.
I think you're right. I re-reviewed our licensing change proposal https://docs.google.com/document/d/1UQDSAqaYGE-bfeMQbz4nPz0cveaFSxo2YP0fL0Vqo0k/edit and I see that indeed there's no mention of locking down a specific number of tenants. I'll remove this limit in the next batch of updates.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
I'll remove this limit in the next batch of updates.
IMO, it's not just about what's possible (and allowed as part of our licensing), it's also about what we're willing (and able) to test. Unless we're prepared to properly validate scenarios where we run a large number of in-memory tenants through some fairly rigorous testing, I'd think we'd be better off blocking the creation of more than one tenant.
I could totally imagine us revisiting this at some point down the road when (a) we've seen that there's customer demand for this, and (b) we have the means to properly test it.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
Previously, ajstorm (Adam Storm) wrote…
I'll remove this limit in the next batch of updates.
IMO, it's not just about what's possible (and allowed as part of our licensing), it's also about what we're willing (and able) to test. Unless we're prepared to properly validate scenarios where we run a large number of in-memory tenants through some fairly rigorous testing, I'd think we'd be better off blocking the creation of more than one tenant.
I could totally imagine us revisiting this at some point down the road when (a) we've seen that there's customer demand for this, and (b) we have the means to properly test it.
Imposing an arbitrary limit because larger values haven't been tested is an unusual thing to do. We can use whatever language we want in our docs, and print messages on the command line, etc, but otherwise let people have fun.
@lunevalex is going on record elsewhere saying that we should run all dedicated / self-hosted customers using the serverless model, that is, with separate processes for all secondary tenants. This would make this RFC unnecessary and I could simply close this PR altogether. Thoughts?
Until we have a plan for something that would replace this, we should both leave this open, and continue working on it.
I'm fine with that, so long as (a) product is fine with it, and (b) we mark it as being in "preview" mode.
Force-pushed from a4099a4 to ca3795f.
In light of various convos with the tenant streaming and obs inf teams, I've simultaneously reduced the scope of the RFC (we don't need tenant capabilities in this RFC) and generalized the mechanisms (so we can use them in standalone SQL pods too, for help in testing tenant-to-tenant replication).
It's time to get the RFC more seriously reviewed. We'd like a prototype of this to make its way to 22.2, or very early in the 23.1 cycle.
Reviewed 1 of 1 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 13 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Can we also explicitly tie this to the unified architecture project in this summary? Internal readers may find an explicit linkage helpful to frame this within a larger body of work.
We're going to decouple this from unified architecture, and instead focus on tenant-to-tenant repl as the main use case.
docs/RFCS/20220720_in_memory_tenants.md
line 124 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Let's be consistent with terminology and call this the "unified architecture".
Removed the sentence.
docs/RFCS/20220720_in_memory_tenants.md
line 179 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Instead of "ADMIN/GUEST" could we call this something more generic like "tenant privileges" capability? We may want to expand this beyond two modes in the future. Also, what about NONADMIN or UNPRIVILEGED as opposed to GUEST. GUEST may be difficult to interpret for some who are unaware that the alternative is ADMIN.
We don't need tenant capabilities for this RFC. I removed this section entirely.
docs/RFCS/20220720_in_memory_tenants.md
line 321 at r1 (raw file):
Previously, knz (kena) wrote…
Good Q. Added a sentence about that. Note that David is requesting we do not provision this facility too early. We might cut it from the RFC altogether.
Removed.
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Imposing an arbitrary limit because larger values haven't been tested is an unusual thing to do. We can use whatever language we want in our docs, and print messages on the command line, etc, but otherwise let people have fun.
Done.
Force-pushed from 393f152 to 8df53ba.
Force-pushed from 8df53ba to 4aaab29.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_dynamic_tenant_servers.md
line 257 at r8 (raw file):
Internally, the HTTP service will route incoming requests to a different HTTP request mux depending on the `tenant` cookie, using the
For `/api/v2/`, it might be worth also supporting a header-based approach.
docs/RFCS/20220720_dynamic_tenant_servers.md
line 265 at r8 (raw file):
to set the `tenant` cookie to the specified value (e.g. `/tenant/app` sets the cookie to `app`, `/tenant/system` sets it to `system`) and redirect to the `/` URL.
Will it be possible to query the list of known tenant names without a session? My assumption is not since that data itself lives in SQL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_dynamic_tenant_servers.md
line 257 at r8 (raw file):
Previously, dhartunian (David Hartunian) wrote…
For `/api/v2/`, it might be worth also supporting a header-based approach.
I'll add it.
docs/RFCS/20220720_dynamic_tenant_servers.md
line 265 at r8 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Will it be possible to query the list of known tenant names without a session? My assumption is not since that data itself lives in SQL.
This is a good insight!
Yes, we definitely need a new endpoint to query the list of tenant names (if only to build the selector in the front-end).
We do not need a session for that. We can make the endpoint talk to the tenant registry directly.
Force-pushed from d5e450c to 61c25f5.
Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_dynamic_tenant_servers.md
line 257 at r8 (raw file):
Previously, knz (kena) wrote…
I'll add it.
Done.
docs/RFCS/20220720_dynamic_tenant_servers.md
line 265 at r8 (raw file):
Previously, knz (kena) wrote…
This is a good insight!
Yes, we definitely need a new endpoint to query the list of tenant names (if only to build the selector in the front-end).
We do not need a session for that. We can make the endpoint talk to the tenant registry directly.
Done.
Force-pushed from 61c25f5 to a10412a.
Force-pushed from a10412a to c11aae6.
Force-pushed from 3f8f319 to 20c2dd9.
89966: sql: add tenant names r=postamar,stevendanna,ecwall a=knz

As spelled out by #84700, this PR adds:
- a new `name` column to `system.tenants`;
- a new overload of `create_tenant()` which takes the tenant name as a parameter;
- a new built-in function `rename_tenant()`.

See the enclosed commits for details. I welcome alternate approaches to creating the name column, as long as they properly enforce uniqueness, enable by-name point lookups, and simplify the task of keeping the name coherent with the info payload.

Epic: CRDB-14537

90011: changefeedccl: Ensure correct file ordering. r=miretskiy a=miretskiy

When async flushing is enabled, the following sequence of events is possible (even if very unlikely):
* k@t1 is emitted, causing an async flush to write file f1
* k@t2 is emitted, causing an async flush to write file f2
* f2 is written out before f1.

In this unlikely scenario -- and the reason it is unlikely is that we have to generate megabytes of data to cause a flush, unless the file_size parameter is pathologically small -- if a client were to read the contents of the directory right after step 2, and then read the directory again after step 3, the client would first observe k@t2 and then observe k@t1 -- a violation of the ordering guarantees.

This PR fixes the issue by adopting a queue: if a flush is already in flight, the next flush is queued behind it.

It is possible that this simple approach decreases throughput. That would occur if the underlying storage (S3 or GCP) cannot keep up with writing out data before the next file comes in. However, at this point such a situation is unlikely for two reasons: one, the rest of the changefeed machinery must be able to generate one or more files' worth of data before the previous flush completes -- a task that is not trivial in itself -- and two, the changefeed must have enough memory allocated to it that the pushback mechanism does not trigger.

The above assumption was validated in a reasonably sized test -- i.e., a non-zero queue depth was never observed. Nonetheless, this PR also adds a log message which may help detect a situation where the sink cannot keep up with the incoming data rate. A more complex solution -- for example, allowing unlimited in-flight requests during backfill -- may be revisited later if the above assumption proves incorrect.

Fixes #89683

Release note: None.

90236: vendor: bump Pebble to 0090519bf0bb r=itsbilal a=coolcom200

```
0090519b metrics: expose fsync latency as prometheus histogram
```

Release note: None
Epic: None

90284: multiregionccl: add a datadriven test for secondary region failover r=arulajmani a=rafiss

Fixes #90169. This test shows that the lease holder and voters fail over to the secondary region when the primary region is taken offline.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Leon Fattakhov <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
This also updates previously published RFCs. Release note: None
Force-pushed from 20c2dd9 to d9f31ed.
I'll admit that I just skimmed it again and didn't do a deep review. Let's commit this for posterity's sake, as it represents what was implemented.
TFYR! bors r=stevendanna
Build failed (retrying...)
Build succeeded
Text of RFC: https://github.com/knz/cockroach/blob/20220720-rfc-in-memory-tenants/docs/RFCS/20220720_shared_process_deployments.md
Some prototype work ongoing here: #84608.
Epic: CRDB-14537