
rfcs: new RFC on shared-process deployments for cluster virtualization #84700

Merged: 1 commit merged into cockroachdb:master from the 20220720-rfc-in-memory-tenants branch on Aug 25, 2023

Conversation

@knz (Contributor) commented Jul 20, 2022

@cockroach-teamcity (Member): This change is Reviewable

@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch from 280f97a to 35a51a6 on July 20, 2022 11:51
@knz knz requested review from rafiss, ajstorm and dhartunian July 20, 2022 11:51
@knz knz marked this pull request as ready for review July 20, 2022 11:52
@knz knz requested a review from a team as a code owner July 20, 2022 11:52
@knz knz requested a review from nvanbenschoten July 20, 2022 11:52
@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch 3 times, most recently from c99c186 to 5c21aba on July 20, 2022 15:25
@ajstorm (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_in_memory_tenants.md line 13 at r1 (raw file):

same CockroachDB process serving the KV layer.

This change is meant to (eventually) power a multitenant experience in

Nit: Can we also explicitly tie this to the unified architecture project in this summary? Internal readers may find an explicit linkage helpful to frame this within a larger body of work.


docs/RFCS/20220720_in_memory_tenants.md line 124 at r1 (raw file):

deployment, a maximum of 2 of these in-memory tenants will be
available to external network clients: one for application workloads
and one for system operators. This will power the new "multi-tenant

Nit: Let's be consistent with terminology and call this the "unified architecture".


docs/RFCS/20220720_in_memory_tenants.md line 128 at r1 (raw file):

self-hosted deployments. Serving more tenants over the network will be
restricted via a license check, reserved for exclusive use inside
Cockroach Cloud.

Nit: Should we be more explicit here and say that it's only for CC Serverless?


docs/RFCS/20220720_in_memory_tenants.md line 164 at r1 (raw file):

Tenant names will be introduced via a new UNIQUE column in `system.tenants`.

An entry will also be materialized for the system tenant with ID 1 and a

Nit: Should we restrict this to "On newly created clusters..."? Above we talk about how the system tenant will not always be ID 1.


docs/RFCS/20220720_in_memory_tenants.md line 179 at r1 (raw file):

on a value separate from the tenant ID, and that can be changed at run time.

This will become the new ADMIN/GUEST capability, and will be stored

Nit: Instead of "ADMIN/GUEST" could we call this something more generic like "tenant privileges" capability? We may want to expand this beyond two modes in the future. Also, what about NONADMIN or UNPRIVILEGED as opposed to GUEST. GUEST may be difficult to interpret for some who are unaware that the alternative is ADMIN.


docs/RFCS/20220720_in_memory_tenants.md line 183 at r1 (raw file):

To support the restriction on network access, we will also introduce a
new PRIVATE/EXPORTED capability. Only tenants with cap EXPORTED will

Nit: Call this "tenant visibility"? Also, does it make sense to go with HIDDEN/VISIBLE or PRIVATE/VISIBLE instead of EXPORTED?


docs/RFCS/20220720_in_memory_tenants.md line 191 at r1 (raw file):

privileged operations to any secondary tenant. A candidate use case
could be granting the capability to view KV-level timeseries to a
non-ADMIN tenant.

Do we need any syntax around tenant privileges/visibility? Will we ever want the ability to change a tenant from privileged to non-privileged (aside from the migration case cited above) or the reverse?


docs/RFCS/20220720_in_memory_tenants.md line 210 at r1 (raw file):

Also, there will be a reference counter (incremented by internal SQL
queries and network connections), and the registry would stop a SQL
server some time after its reference count drops to zero.

This reference counting sounds interesting (and powerful), but I wonder if it's overkill for a first cut. Did you ever contemplate an explicit "stop tenant" command to shut down a SQL server? The benefit of this approach is that there's less to implement in the short term, and the behavior is more predictable.


docs/RFCS/20220720_in_memory_tenants.md line 321 at r1 (raw file):

or, alternatively,

`AS TENANT <tenantname> <string>`  (where the string is a quoted SQL statement), e.g. `AS TENANT app $$SELECT * FROM system.users$$`

What authorization, if any, would we need to perform when running this statement? Does the caller need to come from an ADMIN tenant to act on a non-ADMIN tenant?

@ajstorm (Collaborator) left a comment

Very nice to see this!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @nvanbenschoten, and @rafiss)

@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch 2 times, most recently from 4c30c5a to a09cf18 on July 20, 2022 15:46
@knz (Contributor, Author) left a comment

I'll only process nit comments in batches - leaving them as open comments for now.

Reviewed 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_in_memory_tenants.md line 183 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Call this "tenant visibility"? Also, does it make sense to go with HIDDEN/VISIBLE or PRIVATE/VISIBLE instead of EXPORTED?

I have thought hard about using the term "visibility" initially, and I am 70%+ confident it's not the right abstraction. If we really want to be pedantic, I'd go for the full words "WITHOUT_NETWORK" / "WITH_NETWORK".


docs/RFCS/20220720_in_memory_tenants.md line 191 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Do we need any syntax around tenant privileges/visibility? Will we ever want the ability to change a tenant from privileged to non-privileged (aside from the migration case cited above) or the reverse?

Not in a first iteration. Added words about it.


docs/RFCS/20220720_in_memory_tenants.md line 210 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

This reference counting sounds interesting (and powerful), but I wonder if it's overkill for a first cut. Did you ever contemplate an explicit "stop tenant" command to shut down a SQL server? The benefit of this approach is that there's less to implement in the short term, and the behavior is more predictable.

The TTL would be configurable. If folks want the servers to linger, they can configure the TTL to "forever". It's generally bad design to introduce registry data structures without an invalidation protocol, so I'm not going to do that.
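
To make the refcount-plus-TTL idea concrete, here is a minimal Go sketch of such a registry, assuming a hypothetical `server` interface and an injected start function; the names and structure are illustrative only, not the actual CockroachDB implementation:

```go
package tenantregistry

import (
	"sync"
	"time"
)

// server stands in for a per-tenant SQL server; Stop is whatever
// shutdown hook the real implementation exposes.
type server interface {
	Stop()
}

// entry tracks one running tenant server plus its reference count.
type entry struct {
	srv      server
	refCount int
	timer    *time.Timer // armed only while refCount == 0
}

// Registry keeps at most one server per tenant name and stops a
// server once its reference count has been zero for idleTTL.
type Registry struct {
	mu      sync.Mutex
	idleTTL time.Duration
	servers map[string]*entry
	start   func(tenantName string) server // injected constructor
}

func NewRegistry(idleTTL time.Duration, start func(string) server) *Registry {
	return &Registry{idleTTL: idleTTL, servers: map[string]*entry{}, start: start}
}

// Acquire returns the server for the given tenant, starting it on
// demand, and increments its reference count. Callers pair each
// Acquire with a Release.
func (r *Registry) Acquire(tenantName string) server {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.servers[tenantName]
	if !ok {
		e = &entry{srv: r.start(tenantName)}
		r.servers[tenantName] = e
	}
	if e.timer != nil {
		e.timer.Stop() // cancel a pending idle shutdown
		e.timer = nil
	}
	e.refCount++
	return e.srv
}

// Release decrements the reference count; when it reaches zero the
// server is stopped after idleTTL unless it is re-acquired first.
func (r *Registry) Release(tenantName string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.servers[tenantName]
	if !ok {
		return
	}
	e.refCount--
	if e.refCount > 0 {
		return
	}
	e.timer = time.AfterFunc(r.idleTTL, func() {
		r.mu.Lock()
		defer r.mu.Unlock()
		if cur, ok := r.servers[tenantName]; ok && cur.refCount == 0 {
			cur.srv.Stop()
			delete(r.servers, tenantName)
		}
	})
}
```

Setting `idleTTL` to a very large value approximates the "keep servers around forever" configuration mentioned above.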


docs/RFCS/20220720_in_memory_tenants.md line 321 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

What authorization, if any, would we need to perform when running this statement? Does the caller need to come from an ADMIN tenant to act on a non-ADMIN tenant?

Good Q. Added a sentence about that. Note that David is requesting we do not provision this facility too early. We might cut it from the RFC altogether.

@andreimatei (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_in_memory_tenants.md line 129 at r4 (raw file):

everywhere" deployment model for Cockroach Cloud Dedicated, and
self-hosted deployments. Serving more tenants over the network will be
restricted via a license check, reserved for exclusive use inside

Why not let everybody use as many tenants as they want (perhaps conditioned on having a license)?
In particular, if the Obs Service is to use tenants for anything, I think they need to be available broadly.

@knz (Contributor, Author) left a comment

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_in_memory_tenants.md line 129 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Why not let everybody use as many tenants as they want (perhaps conditioned on having a license)?
In particular, if the Obs Service is to use tenants for anything, I think they need to be available broadly.

I think you're right. I re-reviewed our licensing change proposal https://docs.google.com/document/d/1UQDSAqaYGE-bfeMQbz4nPz0cveaFSxo2YP0fL0Vqo0k/edit

and I see that indeed there's no mention of locking a specific number of tenants. I'll remove this limit in the next batch of updates.

@ajstorm ajstorm requested a review from andreimatei July 21, 2022 12:33
@ajstorm (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_in_memory_tenants.md line 129 at r4 (raw file):

I'll remove this limit in the next batch of updates.

IMO, it's not just about what's possible (and allowed as part of our licensing), it's also about what we're willing (and able) to test. Unless we're prepared to properly validate scenarios where we run a large number of in-memory tenants through some fairly rigorous testing, I'd think we'd be better off blocking the creation of more than one tenant.

I could totally imagine us revisiting this at some point down the road when (a) we've seen that there's customer demand for this, and (b) we have the means to properly test it.

@ajstorm ajstorm self-requested a review July 21, 2022 12:33
@andreimatei (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_in_memory_tenants.md line 129 at r4 (raw file):

Previously, ajstorm (Adam Storm) wrote…

I'll remove this limit in the next batch of updates.

IMO, it's not just about what's possible (and allowed as part of our licensing), it's also about what we're willing (and able) to test. Unless we're prepared to properly validate scenarios where we run a large number of in-memory tenants through some fairly rigorous testing, I'd think we'd be better off blocking the creation of more than one tenant.

I could totally imagine us revisiting this at some point down the road when (a) we've seen that there's customer demand for this, and (b) we have the means to properly test it.

Imposing an arbitrary limit because larger values haven't been tested is an unusual thing to do. We can use whatever language we want in our docs, and print messages on the command line, etc, but otherwise let people have fun.

@knz (Contributor, Author) commented Jul 21, 2022

@lunevalex is going on record elsewhere saying that we should run all dedicated / self-hosted customers using the serverless model, that is, with separate processes for all secondary tenants. This would make this RFC unnecessary and I could simply close this PR altogether.

Thoughts?

@ajstorm (Collaborator) commented Jul 21, 2022

This would make this RFC unnecessary and I could simply close this PR altogether.

Until we have a plan for something that would replace this, we should both leave this open, and continue working on it.

Imposing an arbitrary limit because larger values haven't been tested is an unusual thing to do. We can use whatever language we want in our docs, and print messages on the command line, etc, but otherwise let people have fun.

I'm fine with that, so long as (a) product is fine with it, and (b) we mark it as being in "preview" mode.

@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch 3 times, most recently from a4099a4 to ca3795f on August 10, 2022 10:53
@knz (Contributor, Author) left a comment

In light of various convos with the tenant streaming and obs inf teams, I've simultaneously reduced the scope of the RFC (we don't need tenant capabilities in this RFC) and generalized the mechanisms (so we can use them in standalone SQL pods too, for help in testing tenant-to-tenant replication).

It's time to get the RFC more seriously reviewed. We'd like a prototype of this to make its way to 22.2, or very early in the 23.1 cycle.

Reviewed 1 of 1 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_in_memory_tenants.md line 13 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Can we also explicitly tie this to the unified architecture project in this summary? Internal readers may find an explicit linkage helpful to frame this within a larger body of work.

We're going to decouple this from unified architecture, and instead focus on tenant-to-tenant repl as the main use case.


docs/RFCS/20220720_in_memory_tenants.md line 124 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Let's be consistent with terminology and call this the "unified architecture".

Removed the sentence.


docs/RFCS/20220720_in_memory_tenants.md line 179 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Instead of "ADMIN/GUEST" could we call this something more generic like "tenant privileges" capability? We may want to expand this beyond two modes in the future. Also, what about NONADMIN or UNPRIVILEGED as opposed to GUEST. GUEST may be difficult to interpret for some who are unaware that the alternative is ADMIN.

We don't need tenant capabilities for this RFC. I removed this section entirely.


docs/RFCS/20220720_in_memory_tenants.md line 321 at r1 (raw file):

Previously, knz (kena) wrote…

Good Q. Added a sentence about that. Note that David is requesting we do not provision this facility too early. We might cut it from the RFC altogether.

Removed.


docs/RFCS/20220720_in_memory_tenants.md line 129 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Imposing an arbitrary limit because larger values haven't been tested is an unusual thing to do. We can use whatever language we want in our docs, and print messages on the command line, etc, but otherwise let people have fun.

Done.

@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch 2 times, most recently from 393f152 to 8df53ba on August 10, 2022 10:56
@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch from 8df53ba to 4aaab29 on August 10, 2022 11:14
@dhartunian (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_dynamic_tenant_servers.md line 257 at r8 (raw file):

Internally, the HTTP service will route incoming requests to a
different HTTP request mux depending on the `tenant` cookie, using the

For /api/v2/, it might be worth also supporting a header-based approach.


docs/RFCS/20220720_dynamic_tenant_servers.md line 265 at r8 (raw file):

to set the `tenant` cookie to the specified value (e.g. `/tenant/app`
sets the cookie to `app`, `/tenant/system` sets the cookie to `system`) and
redirect to the `/` URL.

Will it be possible to query the list of known tenant names without a session? My assumption is not since that data itself lives in SQL.

@knz (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_dynamic_tenant_servers.md line 257 at r8 (raw file):

Previously, dhartunian (David Hartunian) wrote…

For /api/v2/, it might be worth also supporting a header-based approach.

I'll add it.
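
As a rough illustration only, here is what cookie-based routing with a header fallback for `/api/v2/` could look like in Go; the header name, the default tenant, and the mux-per-tenant layout are assumptions of this sketch, not the RFC's final design:

```go
package tenanthttp

import (
	"net/http"
	"strings"
)

const (
	tenantCookieName  = "tenant"        // cookie name from the RFC text above
	tenantHeaderName  = "X-Tenant-Name" // assumption: illustrative header name
	defaultTenantName = "system"        // assumption: fallback tenant
)

// tenantRouter holds one HTTP mux per in-memory tenant and picks one
// per request.
type tenantRouter struct {
	muxes map[string]*http.ServeMux
}

// tenantFor resolves the target tenant: a header for programmatic
// /api/v2/ clients, otherwise the "tenant" cookie, otherwise a default.
func (tr *tenantRouter) tenantFor(r *http.Request) string {
	if strings.HasPrefix(r.URL.Path, "/api/v2/") {
		if name := r.Header.Get(tenantHeaderName); name != "" {
			return name
		}
	}
	if c, err := r.Cookie(tenantCookieName); err == nil && c.Value != "" {
		return c.Value
	}
	return defaultTenantName
}

func (tr *tenantRouter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// /tenant/<name> sets the cookie and redirects to /, as described
	// in the RFC excerpt above.
	if strings.HasPrefix(r.URL.Path, "/tenant/") {
		name := strings.TrimPrefix(r.URL.Path, "/tenant/")
		http.SetCookie(w, &http.Cookie{Name: tenantCookieName, Value: name, Path: "/"})
		http.Redirect(w, r, "/", http.StatusTemporaryRedirect)
		return
	}
	mux, ok := tr.muxes[tr.tenantFor(r)]
	if !ok {
		http.Error(w, "unknown tenant", http.StatusNotFound)
		return
	}
	mux.ServeHTTP(w, r)
}
```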


docs/RFCS/20220720_dynamic_tenant_servers.md line 265 at r8 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Will it be possible to query the list of known tenant names without a session? My assumption is not since that data itself lives in SQL.

This is a good insight!

Yes, we definitely need a new endpoint to query the list of tenant names (if only to build the selector in the front-end).
We do not need a session for that. We can make the endpoint talk to the tenant registry directly.
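
A small Go sketch of that endpoint, reading names straight from a hypothetical registry interface so that no SQL session is needed (the path and names are illustrative, not the shipped API):

```go
package tenantapi

import (
	"encoding/json"
	"net/http"
)

// nameLister is the small slice of the tenant registry this endpoint
// needs; the interface is hypothetical.
type nameLister interface {
	TenantNames() []string
}

// TenantNamesHandler serves the list of known tenant names without
// requiring a SQL session: it reads directly from the registry.
func TenantNamesHandler(reg nameLister) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(struct {
			Tenants []string `json:"tenants"`
		}{Tenants: reg.TenantNames()})
	})
}
```

Wiring could then be as simple as `mux.Handle("/tenants", TenantNamesHandler(registry))`, and the front-end selector mentioned above could be populated from the response.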

@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch 4 times, most recently from d5e450c to 61c25f5 on August 15, 2022 09:26
@knz (Contributor, Author) left a comment

Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @nvanbenschoten, and @rafiss)


docs/RFCS/20220720_dynamic_tenant_servers.md line 257 at r8 (raw file):

Previously, knz (kena) wrote…

I'll add it.

Done.


docs/RFCS/20220720_dynamic_tenant_servers.md line 265 at r8 (raw file):

Previously, knz (kena) wrote…

This is a good insight!

Yes, we definitely need a new endpoint to query the list of tenant names (if only to build the selector in the front-end).
We do not need a session for that. We can make the endpoint talk to the tenant registry directly.

Done.

@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch from 61c25f5 to a10412a on August 15, 2022 09:33
@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch from a10412a to c11aae6 on August 23, 2022 10:11
@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch 2 times, most recently from 3f8f319 to 20c2dd9 on September 27, 2022 15:09
@nvanbenschoten nvanbenschoten removed their request for review September 27, 2022 19:41
@knz knz mentioned this pull request Oct 14, 2022
craig bot pushed a commit that referenced this pull request Oct 20, 2022
89966: sql: add tenant names r=postamar,stevendanna,ecwall a=knz

As spelled out by #84700, this PR adds:

-  a new `name` column to `system.tenants`;
-  a new overload to `create_tenant()` which takes the tenant name as parameter;
-  a new built-in function `rename_tenant()`.

See the enclosed commits for details. I welcome alternate approaches to creating the name column, as long as it properly enforces uniqueness, enables by-name point lookups, and simplifies the task of keeping it coherent with the info payload.

Epic: CRDB-14537

90011: changefeedccl: Ensure correct file ordering. r=miretskiy a=miretskiy

When async flushing is enabled, the following sequence of events is possible (even if very unlikely):

 * k@t1 is emitted, causing async flush to write file f1 
 * k@t2 is emitted, causing async flush to write file f2 
 * f2 is written out before f1.

In this unlikely scenario -- and the reason it's unlikely is that we have to generate megabytes of data to cause a flush, unless the file_size parameter was pathologically small -- if a client were to read the contents of the directory right after step 2, and then read the directory again after step 3, the client would first observe k@t2, and then observe k@t1 -- which is a violation of ordering guarantees.

This PR fixes this issue by adopting a queue so that if there is a pending flush in flight, the next flush is queued behind.

It is possible that this simple approach may result in a throughput decrease. This situation will occur if the underlying storage (s3 or gcp) cannot keep up with writing out data before the next file comes in. However, at this point, such a situation is unlikely for two reasons: one, the rest of the changefeed machinery must be able to generate one or more files' worth of data before the previous flush completes -- and this task in and of itself is not trivial; and two, the changefeed must have enough memory allocated to it so that the pushback mechanism does not trigger. The above assumption was validated in a reasonably sized test -- i.e., a non-zero queue depth was never observed. Nonetheless, this PR also adds a log message which may be helpful for detecting the situation where the sink cannot keep up with the incoming data rate.

A more complex solution -- for example, allowing unlimited in-flight requests during backfill -- may be revisited later, if the above assumption proves incorrect.
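
For readers unfamiliar with the approach, a simplified Go sketch of such a flush queue follows; the types and names are illustrative and do not mirror the actual changefeedccl code:

```go
package flushqueue

import "sync"

// file stands in for one output file produced by the changefeed.
type file struct {
	name string
	data []byte
}

// orderedFlusher writes at most one file at a time; anything emitted
// while a write is in flight is queued and drained in FIFO order, so
// f1 always reaches storage before f2.
type orderedFlusher struct {
	mu       sync.Mutex
	inFlight bool
	queue    []file
	upload   func(file) error // e.g. the s3/gcs write
}

func (o *orderedFlusher) flushAsync(f file) {
	o.mu.Lock()
	if o.inFlight {
		// A flush is already running: queue behind it.
		o.queue = append(o.queue, f)
		o.mu.Unlock()
		return
	}
	o.inFlight = true
	o.mu.Unlock()

	go func() {
		cur := f
		for {
			_ = o.upload(cur) // error handling elided in this sketch
			o.mu.Lock()
			if len(o.queue) == 0 {
				o.inFlight = false
				o.mu.Unlock()
				return
			}
			cur, o.queue = o.queue[0], o.queue[1:]
			o.mu.Unlock()
		}
	}()
}
```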

Fixes #89683

Release note: None.

90236: vendor: bump Pebble to 0090519bf0bb r=itsbilal a=coolcom200

```
0090519b metrics: expose fsync latency as prometheus histogram
```

Release note: None
Epic: None

90284: multiregionccl: add a datadriven test for secondary region failover r=arulajmani a=rafiss

fixes #90169

This test shows that the lease holder and voters fail over to the secondary region when the primary region is taken offline.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Leon Fattakhov <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
@rafiss rafiss removed their request for review November 23, 2022 21:52
This also updates previously published RFCs.

Release note: None
@knz knz force-pushed the 20220720-rfc-in-memory-tenants branch from 20c2dd9 to d9f31ed on August 24, 2023 11:05
@knz knz changed the title from "rfcs: new RFC on in-process tenant servers" to "rfcs: new RFC on shared-process deployments for cluster virtualization" on Aug 24, 2023
@stevendanna (Collaborator) left a comment

I'll admit that I just skimmed it again and didn't do a deep review. Let's commit this for posterity's sake, as it represents what was implemented.

@knz knz dismissed ajstorm’s stale review August 25, 2023 14:31

superseded

@knz (Contributor, Author) commented Aug 25, 2023

TFYR!

bors r=stevendanna

@craig (craig bot) commented Aug 25, 2023

Build failed (retrying...):

@craig (craig bot) commented Aug 25, 2023

Build succeeded:

@craig craig bot merged commit 8645582 into cockroachdb:master Aug 25, 2023