-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlproxyccl: proxy / serverless clusters block use of db names containing a period (.
)
#92526
Comments
Proposal by @jaylim-crl:
Note: we would need to impose the restriction in point (1) on all clusters (including SH / dedicated), not just serverless, so that customers can migrate existing apps from one product to another without surprise. |
Should we choose to go this way, we'd make the reserved prefix configurable (via a cluster setting). |
Does postgres allow dots in database names? This seems like it might introduce ambiguity with the notation for database.schema.table. Or is the problem that you could quote things in sql so that would work but there is no such quoting for the connection string? I would prefer a simpler rule like "no dots" instead of a more complex one that only matches Does the introduction of SNI-based routing give us a path to deprecating and removing this parsing of the database name? IIRC putting the tenant ID in the database name was the last choice, if neither SNI nor the options parameter were supported. |
Yes.
No. Double quotes are required in that case. That said if a db name contains periods, it's possible to specify it in the conn URL without quoting:
That's what we started with. The db names can already contain periods, that cat is out of the bag.
For serverless, yes. But generally we've wanted to offer a non-SNI option as well, and the other alternative, via the
Correct, but we do want a solution for #92580 that works with non-TLS configurations as well, and with drivers that don't support |
My current thinking is that the reserved prefix would be a configurable regular expression. Its default value (used in CC and SH clusters using default) would be something a bit more generic, something like |
This is not quite right, because tenant names are not required to have two words in them. This is only the form of the default cluster name that we generate, but the user can override it with whatever they like. The pattern that we’d have to prohibit would instead be: “XXX-NN.” |
This is the regex that we use for cluster names: cockroach/pkg/ccl/sqlproxyccl/proxy_handler.go Lines 48 to 55 in 3910ac4
XXX would be the cluster name, and NN will always be a number, i.e. tenant ID. |
That regexp is too inclusive. It would forbid the (imho, legitimate) use of That's why i insisted on having the hyphen and a minimum of two words ( Also we will need to talk about that number at the end if we start considering using tenant names instead of IDs |
Oh and so the user can customize their cluster name... We really have made a mess of this. The complete disregard for PG identifier rules in that decision process is... Unsettling. |
Okay at this point I'm starting to think we should push for a change in the serverless connection URLs to mandate a prefix before the cluster name. For example the URL for cluster name Then we'd reserve the db namespace that starts with Then we'd go to our serverless users who use the db name variant to add that prefix in their apps. After that prefix becomes mandatory (in the db name position) we will become able to recognize composite db names reliably, as it will be clear when the cluster name prefix is not included. |
Currently the sqlproxy is implemented so that if SNI information is provided, SNI is only used if the sqlproxy is unable to extract a tenant from the option or the DB. I think we can sidestep most of the pain by changing the precedence:
|
Do you mean by "the behavior is a little weird" that IF the fall back ends up using the db name, we will always pick up the cluster name before the period, but there can still be a db name with a period after that? |
Yep! If someone selects a database name that starts with a tenant name and the client does not support SNI it can lead to a confusing error message. The user may see a message saying that the tenant does not exist or an authentication failure if the tenant does exist. |
I think I'm ok with that. I'll modify #92619 accordingly. |
Note that these will not match in the case where a tenant cluster has been moved from one host to another. For example:
The
In other words, we're looking for an SNI name that contains a host short-id component, because that implies that it also contains a tenant name. There are a couple of other possible misunderstandings here that I want to ensure are clear:
The non-ambiguous way to parse the database name in this case would be to assume that the database name is always the part after the last period (i.e. not that the cluster name is always the part before the first period). Database names with dots would therefore result in a cluster routing error (if SNI and PG option are not present). We could give users a workaround by allowing quotes to be used around the database name (e.g. For example, we could say that when a routing ID is used in a DNS name, it's in the form
However, I expect users to get confused by a difference like that, so there's a usability tradeoff here. We were hoping to publish cluster routing IDs in the Cloud dashboard, so that customers can conveniently use this and so that Support can easily look up clusters. If the routing ID takes a different form depending on where it's used, it's going to get confusing... |
This is not the only way to do this. Could we instead consider constraining routing IDs to always match a regexp we'd decide in advance? We could then make this an optional part of the cluster name specification. |
Can you give more details on what you mean? Today, the regex for a routing ID would roughly be:
Constraining it further would be problematic, as we've already rolled them out, and they're part of important user-visible and aesthetically-important connection strings. Also, remember that we have today's tenant name that we need to continue supporting for backwards-compat, which has this form: |
What I'm requesting is to agree that any current and future routing ID has a structure we can recognize already today (i.e. we can describe today the structure of all future tenant name + routing combinations) |
The routing ID is described in the RFC I referenced (though note that we used the serverless cluster name rather than the constant string "serverless" as described there), with this structure:
The only part under end-user control is the
As far as we know today, this routing ID is sufficient for our present and future needs. Can you give a specific example of what you're proposing we change (while still preserving a high-level of backwards-compat)? |
What I propose is to reserve an open name space for any and all future extensions besides / in addition to the short ID, in a way that guarantees space for a composite database name using periods afterwards. The problem I see is the following: again with this host ID we're looking at a closed namespace, and if we ever need to extend we will once again infringe on the space available for database names. Let me replay this for you. When this work started we had a regexp for cluster name + tenant ID. This was already a closed namespace—it did not leave any room for extension:
Within that namespace, the only possible way to add extensions would have been to reserve certain tenant IDs as marker for extensions, and then play "under" that, for example by saying "if the tenant ID part starts with a zero, then the string before that should be interpreted as, say, extension name then hyphen then cluster name". However, in the new RFC you didn't go this way. Instead, you carved another namespace: a period followed by the short ID. By doing so, you chipped once again away at the realm of possible database names containing periods. And the result is still a closed namespace: any short ID that matches the short ID regexp is a valid routing ID. Again there's no room for extensions. So I predict the next time we will need to do something you will again suggest using a period or something similar. This is not good namespace design. The proper way to do this is the following:
The customer for the "outer" format is for the other layers of the stack, which are not and need not be aware of our infrastructure. The customer for the "inner" format is our own infrastructure. As it stands, your standpoint so far has been "everything up to the last period in the database name is reserved as routing information" and that is unacceptable. We need something better. I would be OK with a format that's a little bit more complex such as "either one or two periods, as long as the routing ID has some structure" but that needs to be the end of that complexity, and so we must ensure that format has space for extensions. It does not seem to currently have space for extensions. |
Let's continue this convo under the suggestion from Jeff. @jeffswenson could you assist here?
Can we spell out the rules and the regular expressions a bit more precisely?
|
It doesn't know this today. We would need to provide the proxy with a list of DNS suffixes that enable SNI. The SNI enable list would contain all DNS suffixes that may be routed to the cluster. Alternatively, we could supply a list of DNS suffixes that disable SNI. We would use the disable list to ignore SNI strings that contain the old "freetier" dns names.
We ended up dropping the Serverless prefix and replaced the prefix with the cluster's name. Example
If we introduce namespace now, we will still need to support thousands of production clusters that do not use the new namespace. I see two parts of this we need to be look out for: Parsing ambiguityLooking at the string "foo-bar-12.abcd.bannana". It is unclear if it should be parsed as:
@andy-kimball I'm skeptical anyone knows that they can route using the routing id. All of the documentation I can find refers to the <cluster_name>-<tenant_id>.<database_name> format. I think we should check if anyone is using. If it is unused we should remove support from the sqlproxy for parsing a routing id from the database name. The format is fundamentally ambiguous. If we remove the extra '.' in the routing ID, then the cluster and database names are unambiguous. The segment up to the first '.' is the cluster name and the remainder is the database name. We could adjust the control plane to avoid the creation of non-unique (cluster-name, tenant-id) pairs. Name conflicts are unlikely, and we can avoid them by incrementing the tenant id until a non-conflicting id is found. For the small handful of tenants that exist with ambiguous names, we can avoid migrating them to the same host cluster, or if it looks like the cluster is unused, we could rename the cluster's tenant ID on migration. Error MessagesIf we see the string "foo-12.bar" and tenant "foo-12" does not exist, we don't know which error should be returned to the client. It may be they forgot the tenant and "foo-12.bar" is the database name, or it may be they mistyped the tenant and foo-12 does not exist. I think we can handle this scenario by including the parsed representation in the error message. E.g. return an error like: 'Tenant "foo-12" does not exist. No SNI or cluster option was found. Parsed the tenant from the database parameter "foo-12.bar"' |
It looks like parsing
|
The way I understand it is that the RFC proposal was accepted and the work to modify sqlproxy is currently queued. Are you suggesting the discussion at hand will preempt that work? |
Yes, that was never implemented. Today, we can only route by |
It's ambiguous just with a single period. Because the tenant name is optional in the database name, there's no way to tell whether Therefore, we're already in a place where there's some ambiguity, and need to continue supporting existing connection strings. However, as @jeffswenson said, we haven't implemented parsing of the I'm very concerned about the aesthetics of the various proposals (I think that's just as important as unambiguity), but as Jeff said, embedding the tenant name in a database name is a "fallback for a fallback", so aesthetics are less of a concern there. I'd be OK with a format like:
If the database name starts with Here are a couple more answers:
Today, it's
The RFC suggested we move to a constant
Note this is exactly how we parse qualified names in SQL - you need to use quotes to disambiguate when you use dots in names, e.g.:
In this case, everything after the last period (outside of quotes) is the table name, and not using quotes will cause ambiguities. I think it was perfectly reasonable to use similar rules when we decided to qualify database names in PG connection strings. Using a prefix to disambiguate rather than using SQL qualified name rules is just your personal preference and not the only "acceptable" solution. |
"This is exactly how" is factually incorrect. The SQL dialect has the benefit of opt-in quotes to disambiguate. The connection URL does not have that luxury. I'll reproduce jeff and i's argument in my next comment. |
We chose to extend what's acceptable in the connection URL. We have no requirement to be exactly compatible there in every edge case. I think your underlying assumption is incorrect - that we have to exactly replicate the semantics of PG connection strings in our Serverless offering. I disagree that this is one of our requirements. |
Here's what Jeff and I discussed Requirements on the shape of the solution
Some useful properties we can take advantage of
ProposalIn serverless proxy, we would change the routing logic as follows:
Examples for serverless:
In the dedicated/SH multitenant case, we would implement the routing logic as follows:
To support the last rule above, we would also modify the SQL dialect to reject the creation of any DB with a name that matches We find this restriction to be much less likely to be backward-incompatible with existing single-tenant deployments than blocking the entire This will also provide us a mechanism for extension, just like the separation using Examples for dedicated/SH:
AppendixServerless-only, optional host routing ID in database name.How to find a host routing ID in the fallback of the fallback in the database name? In that case, separate with additional punctuation. This is possible because pg db names can contain extra punctuation. Some candidates:
|
Overall, this proposal would be acceptable to me. Can you put it in a Google doc, so that it's easy to make comments on specific details? |
PR with finalized RFC #93537 |
93537: docs: new RFC on tenant routing r=JeffSwenson,andy-kimball a=knz [Rendered RFC text here](https://github.com/knz/cockroach/blob/20221212-tenant-routing/docs/RFCS/20221212_tenant_routing.md) Informs #92619. Informs #92526. Informs #92580. Epic: CRDB-22385 Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
Describe the problem
If I do
CREATE DATABASE "my.db"
, I am unable to connect to the dbmy.db
using a SQL clients in CC serverless, for example with a URL of the typepostgresq://...@.../mycluster.my.db
The problem is that the sql proxy decomposes the provided db name using
strings.Split
, and then discards any string after the second period. So if the provided db name ismycluster.my.db
, the result of the db name parsing is cluster ==mycluster
, dbname ==my
, and the.db
suffix is stripped.To Reproduce
see above
Expected behavior
Complex db names with periods in their name should be useable by SQL clients.
cc @jaylim-crl @jeffswenson for triage
Jira issue: CRDB-21837
Epic: CRDB-22385
The text was updated successfully, but these errors were encountered: