rfcs: new RFC on shared-process deployments for cluster virtualization #84700
Conversation
Force-pushed from 280f97a to 35a51a6.
Force-pushed from c99c186 to 5c21aba.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 13 at r1 (raw file):
same CockroachDB process serving the KV layer. This change is meant to (eventually) power a multitenant experience in
Nit: Can we also explicitly tie this to the unified architecture project in this summary? Internal readers may find an explicit linkage helpful to frame this within a larger body of work.
docs/RFCS/20220720_in_memory_tenants.md
line 124 at r1 (raw file):
deployment, a maximum of 2 of these in-memory tenants will be available to external network clients: one for application workloads and one for system operators. This will power the new "multi-tenant
Nit: Let's be consistent with terminology and call this the "unified architecture".
docs/RFCS/20220720_in_memory_tenants.md
line 128 at r1 (raw file):
self-hosted deployments. Serving more tenants over the network will be restricted via a license check, reserved for exclusive use inside Cockroach Cloud.
Nit: Should we be more explicit here and say that it's only for CC Serverless?
docs/RFCS/20220720_in_memory_tenants.md
line 164 at r1 (raw file):
Tenant names will be introduced via a new UNIQUE column in `system.tenants`. An entry will also be materialized for the system tenant with ID 1 and a
Nit: Should we qualify this with "On newly created clusters..."? Above we talk about how the system tenant will not always have ID 1.
docs/RFCS/20220720_in_memory_tenants.md
line 179 at r1 (raw file):
on a value separate from the tenant ID, and that can be changed at run time. This will become the new ADMIN/GUEST capability, and will be stored
Nit: Instead of "ADMIN/GUEST", could we call this something more generic like a "tenant privileges" capability? We may want to expand this beyond two modes in the future. Also, what about NONADMIN or UNPRIVILEGED instead of GUEST? GUEST may be difficult to interpret for readers who are unaware that the alternative is ADMIN.
docs/RFCS/20220720_in_memory_tenants.md
line 183 at r1 (raw file):
To support the restriction on network access, we will also introduce a new PRIVATE/EXPORTED capability. Only tenants with cap EXPORTED will
Nit: Call this "tenant visibility"? Also, does it make sense to go with HIDDEN/VISIBLE or PRIVATE/VISIBLE instead of EXPORTED?
docs/RFCS/20220720_in_memory_tenants.md
line 191 at r1 (raw file):
privileged operations to any secondary tenant. A candidate use case could be granting the capability to view KV-level timeseries to a non-ADMIN tenant.
Do we need any syntax around tenant privileges/visibility? Will we ever want the ability to change a tenant from privileged to non-privileged (aside from the migration case cited above) or the reverse?
docs/RFCS/20220720_in_memory_tenants.md
line 210 at r1 (raw file):
Also, there will be a reference counter (incremented by internal SQL queries and network connections), and the registry would stop a SQL server some time after its reference count drops to zero.
This reference counting sounds interesting (and powerful) but I wonder if it's overkill for a first cut. Did you ever contemplate an explicit "stop tenant" command to shut down a SQL server? The benefit of this approach is that there's less to implement in the short term, and the behavior is more predictable.
docs/RFCS/20220720_in_memory_tenants.md
line 321 at r1 (raw file):
or, alternatively, `AS TENANT <tenantname> <string>` (where the string is a quoted SQL statement), e.g. `AS TENANT app $$SELECT * FROM system.users$$`
What authorization, if any, would we need to perform when running this statement? Does the caller need to come from an ADMIN tenant to act on a non-ADMIN tenant?
Very nice to see this!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @nvanbenschoten, and @rafiss)
Force-pushed from 4c30c5a to a09cf18.
I'll only process nit comments in batches - leaving them as open comments for now.
Reviewed 1 of 1 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 183 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Call this "tenant visibility"? Also, does it make sense to go with HIDDEN/VISIBLE or PRIVATE/VISIBLE instead of EXPORTED?
I have thought hard about using the term "visibility" initially, and I am 70%+ confident it's not the right abstraction. If we really want to be pedantic, I'd go for the full words "WITHOUT_NETWORK" / "WITH_NETWORK".
docs/RFCS/20220720_in_memory_tenants.md
line 191 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Do we need any syntax around tenant privileges/visibility? Will we ever want the ability to change a tenant from privileged to non-privileged (aside from the migration case cited above) or the reverse?
Not in a first iteration. Added words about it.
docs/RFCS/20220720_in_memory_tenants.md
line 210 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
This reference counting sounds interesting (and powerful) but I wonder if it's overkill for a first cut. Did you ever contemplate an explicit "stop tenant" command to shut down a SQL server? The benefits to this approach is that there's less to implement in the short term, and the behavior is more predictable.
The TTL would be configurable. If folks want the servers to linger, they can configure the TTL to "forever". It's generally bad design to introduce registry data structures without an invalidation protocol, so I'm not going to do that.
docs/RFCS/20220720_in_memory_tenants.md
line 321 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
What authorization, if any, would we need to perform when running this statement? Does the caller need to come from an ADMIN tenant to act on a non-ADMIN tenant?
Good Q. Added a sentence about that. Note that David is requesting we do not provision this facility too early. We might cut it from the RFC altogether.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
everywhere" deployment model for Cockroach Cloud Dedicated, and self-hosted deployments. Serving more tenants over the network will be restricted via a license check, reserved for exclusive use inside
Why not let everybody use as many tenants as they want (perhaps conditioned on having a license)?
In particular, if the Obs Service is to use tenants for anything, I think they need to be available broadly.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Why not let everybody use as many tenants as they want (perhaps conditioned on having a license)?
In particular, if the Obs Service is to use tenants for anything, I think they need to be available broadly.
I think you're right. I re-reviewed our licensing change proposal https://docs.google.com/document/d/1UQDSAqaYGE-bfeMQbz4nPz0cveaFSxo2YP0fL0Vqo0k/edit and I see that indeed there's no mention of locking down a specific number of tenants. I'll remove this limit in the next batch of updates.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
I'll remove this limit in the next batch of updates.
IMO, it's not just about what's possible (and allowed as part of our licensing), it's also about what we're willing (and able) to test. Unless we're prepared to properly validate scenarios where we run a large number of in-memory tenants through some fairly rigorous testing, I'd think we'd be better off blocking the creation of more than one tenant.
I could totally imagine us revisiting this at some point down the road when (a) we've seen that there's customer demand for this, and (b) we have the means to properly test it.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
Previously, ajstorm (Adam Storm) wrote…
I'll remove this limit in the next batch of updates.
IMO, it's not just about what's possible (and allowed as part of our licensing), it's also about what we're willing (and able) to test. Unless we're prepared to properly validate scenarios where we run a large number of in-memory tenants through some fairly rigorous testing, I'd think we'd be better off blocking the creation of more than one tenant.
I could totally imagine us revisiting this at some point down the road when (a) we've seen that there's customer demand for this, and (b) we have the means to properly test it.
Imposing an arbitrary limit because larger values haven't been tested is an unusual thing to do. We can use whatever language we want in our docs, and print messages on the command line, etc, but otherwise let people have fun.
@lunevalex is going on record elsewhere saying that we should run all dedicated / self-hosted customers using the serverless model, that is, with separate processes for all secondary tenants. This would make this RFC unnecessary and I could simply close this PR altogether. Thoughts?
Until we have a plan for something that would replace this, we should both leave this open, and continue working on it.
I'm fine with that, so long as (a) product is fine with it, and (b) we mark it as being in "preview" mode.
Force-pushed from a4099a4 to ca3795f.
In light of various convos with the tenant streaming and obs inf teams, I've simultaneously reduced the scope of the RFC (we don't need tenant capabilities in this RFC) and generalized the mechanisms (so we can use them in standalone SQL pods too, for help in testing tenant-to-tenant replication).
It's time to get the RFC more seriously reviewed. We'd like a prototype of this to make its way to 22.2, or very early in the 23.1 cycle.
Reviewed 1 of 1 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_in_memory_tenants.md
line 13 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Can we also explicitly tie this to the unified architecture project in this summary? Internal readers may find an explicit linkage helpful to frame this within a larger body of work.
We're going to decouple this from unified architecture, and instead focus on tenant-to-tenant repl as the main use case.
docs/RFCS/20220720_in_memory_tenants.md
line 124 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Let's be consistent with terminology and call this the "unified architecture".
Removed the sentence.
docs/RFCS/20220720_in_memory_tenants.md
line 179 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Instead of "ADMIN/GUEST" could we call this something more generic like "tenant privileges" capability? We may want to expand this beyond two modes in the future. Also, what about NONADMIN or UNPRIVILEGED as opposed to GUEST. GUEST may be difficult to interpret for some who are unaware that the alternative is ADMIN.
We don't need tenant capabilities for this RFC. I removed this section entirely.
docs/RFCS/20220720_in_memory_tenants.md
line 321 at r1 (raw file):
Previously, knz (kena) wrote…
Good Q. Added a sentence about that. Note that David is requesting we do not provision this facility too early. We might cut it from the RFC altogether.
Removed.
docs/RFCS/20220720_in_memory_tenants.md
line 129 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Imposing an arbitrary limit because larger values haven't been tested is an unusual thing to do. We can use whatever language we want in our docs, and print messages on the command line, etc, but otherwise let people have fun.
Done.
Force-pushed from 393f152 to 8df53ba.
Force-pushed from 8df53ba to 4aaab29.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_dynamic_tenant_servers.md
line 257 at r8 (raw file):
Internally, the HTTP service will route incoming requests to a different HTTP request mux depending on the `tenant` cookie, using the
For `/api/v2/`, it might be worth also supporting a header-based approach.
docs/RFCS/20220720_dynamic_tenant_servers.md
line 265 at r8 (raw file):
to set the `tenant` cookie to the specified value (e.g. `/tenant/app` sets the cookie to `app`, `/tenant/system` sets it to `system`) and redirect to the `/` URL.
Will it be possible to query the list of known tenant names without a session? My assumption is not since that data itself lives in SQL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @knz, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_dynamic_tenant_servers.md
line 257 at r8 (raw file):
Previously, dhartunian (David Hartunian) wrote…
For `/api/v2/`, it might be worth also supporting a header-based approach.
I'll add it.
docs/RFCS/20220720_dynamic_tenant_servers.md
line 265 at r8 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Will it be possible to query the list of known tenant names without a session? My assumption is not since that data itself lives in SQL.
This is a good insight!
Yes, we definitely need a new endpoint to query the list of tenant names (if only to build the selector in the front-end).
We do not need a session for that. We can make the endpoint talk to the tenant registry directly.
Force-pushed from d5e450c to 61c25f5.
Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @andreimatei, @dhartunian, @nvanbenschoten, and @rafiss)
docs/RFCS/20220720_dynamic_tenant_servers.md
line 257 at r8 (raw file):
Previously, knz (kena) wrote…
I'll add it.
Done.
docs/RFCS/20220720_dynamic_tenant_servers.md
line 265 at r8 (raw file):
Previously, knz (kena) wrote…
This is a good insight!
Yes, we definitely need a new endpoint to query the list of tenant names (if only to build the selector in the front-end).
We do not need a session for that. We can make the endpoint talk to the tenant registry directly.
Done.
Force-pushed from 61c25f5 to a10412a.
Force-pushed from a10412a to c11aae6.
Force-pushed from 3f8f319 to 20c2dd9.
89966: sql: add tenant names r=postamar,stevendanna,ecwall a=knz

As spelled out by #84700, this PR adds:
- a new `name` column to `system.tenants`;
- a new overload of `create_tenant()` which takes the tenant name as a parameter;
- a new built-in function `rename_tenant()`.

See the enclosed commits for details. I welcome alternate approaches to creating the name column, as long as they properly enforce uniqueness, enable by-name point lookups, and simplify the task of keeping the name coherent with the info payload.

Epic: CRDB-14537

90011: changefeedccl: Ensure correct file ordering. r=miretskiy a=miretskiy

When async flushing is enabled, the following sequence of events is possible (even if very unlikely):
* k@t1 is emitted, causing an async flush to write file f1
* k@t2 is emitted, causing an async flush to write file f2
* f2 is written out before f1.

In this unlikely scenario -- and the reason it is unlikely is that we have to generate megabytes of data to cause a flush, unless the file_size parameter is pathologically small -- if a client were to read the contents of the directory right after step 2, and then read the directory again after step 3, the client would first observe k@t2 and then observe k@t1 -- a violation of the ordering guarantees.

This PR fixes the issue by adopting a queue: if a flush is already in flight, the next flush is queued behind it.

It is possible that this simple approach decreases throughput. That would occur if the underlying storage (S3 or GCP) cannot keep up with writing out data before the next file comes in. However, at this point such a situation is unlikely for two reasons: one, the rest of the changefeed machinery must be able to generate one or more files' worth of data before the previous flush completes -- a task that is not trivial in itself -- and two, the changefeed must have enough memory allocated to it that the pushback mechanism does not trigger.

The above assumption was validated in a reasonably sized test -- i.e., a non-zero queue depth was never observed. Nonetheless, this PR also adds a log message which may help detect a situation where the sink cannot keep up with the incoming data rate. A more complex solution -- for example, allowing unlimited in-flight requests during backfill -- may be revisited later if the above assumption proves incorrect.

Fixes #89683

Release note: None.

90236: vendor: bump Pebble to 0090519bf0bb r=itsbilal a=coolcom200

```
0090519b metrics: expose fsync latency as prometheus histogram
```

Release note: None
Epic: None

90284: multiregionccl: add a datadriven test for secondary region failover r=arulajmani a=rafiss

Fixes #90169. This test shows that the lease holder and voters fail over to the secondary region when the primary region is taken offline.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Leon Fattakhov <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
This also updates previously published RFCs. Release note: None
Force-pushed from 20c2dd9 to d9f31ed.
I'll admit that I just skimmed it again and didn't do a deep review. Let's commit this for posterity's sake, as it represents what was implemented.
TFYR! bors r=stevendanna
Build failed (retrying...)
Build succeeded
Text of RFC: https://github.com/knz/cockroach/blob/20220720-rfc-in-memory-tenants/docs/RFCS/20220720_shared_process_deployments.md
Some prototype work ongoing here: #84608.
Epic: CRDB-14537