-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/sqlproxyccl: add server name indication (SNI) support #79477
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @darinpp)
pkg/ccl/sqlproxyccl/frontend_admitter.go, line 84 at r1 (raw file):
conn: conn, err: newErrorf(codeClientReadFailed, "receiving post-TLS startup message: %v", err), sniServerName: sniServerName,
nit: is it necessary to return an SNI server name in the context of an error? Usually it’s more ergonomic to return nothing if there’s an error (though the conn field is an exception here).
pkg/ccl/sqlproxyccl/proxy_handler.go, line 536 at r1 (raw file):
// No cluster identifiers were specified. if clusterIdentifierDB == "" && clusterIdentifierOpt == "" { // Try SNI if the tenant id and cluster name not provided in any other way.
This implicit behavior is fine for now, but I wonder if this will cause issues in the future. There was a UX issue back then where users specified identifiers through both options and database parameters at the same time. In that case, we returned an error when both of them do not match.
That may not be applicable here since the first part of the SNI name don’t reflect the cluster name. That said, should we return an error if a user supplied an SNI server name and identifiers through at least one of the other two parameters?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
hostname := parts[0] hostnameParts := strings.Split(hostname, "-") if len(hostnameParts) == 2 && strings.EqualFold("serverless", hostnameParts[0]) {
One thing that’s strange to me in this PR is hard coding the serverless string here. This will limit how clients use the proxy in the future. Is there a way to remove this hardcoding?
Depending on how we set the DNS entries up, and how much we care about this, skipping validation means that a user could do “foo-10.5xj…”, and still connect to the server. If we care about this, we could do the validation later (after the parsing step). The directory cache could have a DNS field that the proxy can validate the whole server name against, just like what we do to the ClusterName field.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 545 at r1 (raw file):
if len(hostnameParts) == 2 && strings.EqualFold("serverless", hostnameParts[0]) { if tenID, err := strconv.ParseUint(hostnameParts[1], 10, 64); err == nil { return fe.msg, "", roachpb.MakeTenantID(tenID), nil
If the tenant ID is 0, I believe we will panic here. Not sure about the negative case. I think we want some checks like what we did below here to handle that case, if it’s true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @darinpp)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 538 at r1 (raw file):
// Try SNI if the tenant id and cluster name not provided in any other way. if fe.sniServerName != "" { // Try to obtain tenant ID from SNI
nit: extract this into a parseSNI helper to maintain symmetry with the other branches.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 541 at r1 (raw file):
parts := strings.Split(fe.sniServerName, ".") if len(parts) != 0 { hostname := parts[0]
I spent some time thinking about how we will evolve this function once we support migrating a tenant between clusters. A few observations:
- We do not need to extract the cluster name. It exists for routing non-production connections.
- We should probably purge tenant ID from the proxy. The proxy should be reworked to think in terms of the routing ID.
- From the perspective of the sqlproxy and tenantdir, the unique routing identifier for the tenant would be: "serverless-10.abc".
- As long as we do not migrate the tenant cluster, the sqlproxy can translate the legacy identifiers into the routing id "e.g. --cluster running-fox-4 -> serverless-4.abc".
- Reworking the sqlproxy to think in terms of routing id will require changes to the tenantdir interface.
Given the scope of refactoring needed to migrate from tenant id to routing id, I think it is reasonable to defer the work to the project adding tenant migration support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jaylim-crl and @jeffswenson)
pkg/ccl/sqlproxyccl/frontend_admitter.go, line 84 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
nit: is it necessary to return an SNI server name in the context of an error? Usually it’s more ergonomic to return nothing if there’s an error (though the conn field is an exception here).
not necessary. Removed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 536 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
This implicit behavior is fine for now, but I wonder if this will cause issues in the future. There was a UX issue back then where users specified identifiers through both options and database parameters at the same time. In that case, we returned an error when both of them do not match.
That may not be applicable here since the first part of the SNI name don’t reflect the cluster name. That said, should we return an error if a user supplied an SNI server name and identifiers through at least one of the other two parameters?
The code is modified to return an error if an SNI provided tenant ID exists and simultaneously a tenant ID is provided through database/options and the tenant IDs between the sources don't match.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 538 at r1 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
nit: extract this into a parseSNI helper to maintain symmetry with the other branches.
done
pkg/ccl/sqlproxyccl/proxy_handler.go, line 541 at r1 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
I spent some time thinking about how we will evolve this function once we support migrating a tenant between clusters. A few observations:
- We do not need to extract the cluster name. It exists for routing non-production connections.
- We should probably purge tenant ID from the proxy. The proxy should be reworked to think in terms of the routing ID.
- From the perspective of the sqlproxy and tenantdir, the unique routing identifier for the tenant would be: "serverless-10.abc".
- As long as we do not migrate the tenant cluster, the sqlproxy can translate the legacy identifiers into the routing id "e.g. --cluster running-fox-4 -> serverless-4.abc".
- Reworking the sqlproxy to think in terms of routing id will require changes to the tenantdir interface.
Given the scope of refactoring needed to migrate from tenant id to routing id, I think it is reasonable to defer the work to the project adding tenant migration support.
The plan is to eventually eliminate the cluster name altogether. For now - we allow connections w/o cluster name if the tenant ID is received through SNI.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
One thing that’s strange to me in this PR is hard coding the serverless string here. This will limit how clients use the proxy in the future. Is there a way to remove this hardcoding?
Depending on how we set the DNS entries up, and how much we care about this, skipping validation means that a user could do “foo-10.5xj…”, and still connect to the server. If we care about this, we could do the validation later (after the parsing step). The directory cache could have a DNS field that the proxy can validate the whole server name against, just like what we do to the ClusterName field.
I don't like this too but this is what the routing RFC says. I suggested to Andy either to allow cluster name specified through the SNI or just ignore the prefix. He wanted to not allow either and stick to the RFC. Given however that allowing empty cluster name is unique to SNI, the check for the cluster name being serverless
should be in clusterNameAndTenantFromParams
as upon existing we don't pass information about how we obtained the cluster id.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 545 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
If the tenant ID is 0, I believe we will panic here. Not sure about the negative case. I think we want some checks like what we did below here to handle that case, if it’s true.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @jeffswenson)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 536 at r1 (raw file):
Previously, darinpp wrote…
The code is modified to return an error if an SNI provided tenant ID exists and simultaneously a tenant ID is provided through database/options and the tenant IDs between the sources don't match.
FYI, the current logic would allow something like postgres://bob:[email protected]:%s/tenant-cluster-28.defaultdb?sslmode=require
to go through, but I guess this is fine. It's difficult to enforce what goes into the SNI field especially when the structure is tied to how we want serverless routing to work.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
Previously, darinpp wrote…
I don't like this too but this is what the routing RFC says. I suggested to Andy either to allow cluster name specified through the SNI or just ignore the prefix. He wanted to not allow either and stick to the RFC. Given however that allowing empty cluster name is unique to SNI, the check for the cluster name being
serverless
should be inclusterNameAndTenantFromParams
as upon existing we don't pass information about how we obtained the cluster id.
Hm, interesting. I guess this is fine for now. If there's an issue in the future, we can refine.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 623 at r2 (raw file):
"Is '%d' (SNI) or '%d' (database/options) the identifier for the cluster that you're connecting to?", sniTenID.InternalValue, tenID) err = errors.WithHint(err, clusterIdentifierHint)
We should update clusterIdentifierHint
as well. Currently it only shows 2 methods:
cockroach/pkg/ccl/sqlproxyccl/proxy_handler.go
Lines 690 to 703 in 7a96183
const clusterIdentifierHint = `Ensure that your cluster identifier is uniquely specified using any of the | |
following methods: | |
1) Database parameter: | |
Use "<cluster identifier>.<database name>" as the database parameter. | |
(e.g. database="active-roach-42.defaultdb") | |
2) Options parameter: | |
Use "--cluster=<cluster identifier>" as the options parameter. | |
(e.g. options="--cluster=active-roach-42") | |
For more details, please visit our docs site at: | |
https://www.cockroachlabs.com/docs/cockroachcloud/connect-to-a-serverless-cluster | |
` |
pkg/ccl/sqlproxyccl/proxy_handler.go, line 645 at r2 (raw file):
hostnameParts := strings.Split(hostname, "-") if len(hostnameParts) == 2 && strings.EqualFold("serverless", hostnameParts[0]) { if tenID, err := strconv.ParseUint(hostnameParts[1], 10, 64); err == nil && tenID > 0 {
nit: can we check tenID >= roachpb.MinTenantID.ToUint64()
here? This would exclude the system tenant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the tests for this new code still to come?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @jeffswenson)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 541 at r1 (raw file):
Previously, darinpp wrote…
The plan is to eventually eliminate the cluster name altogether. For now - we allow connections w/o cluster name if the tenant ID is received through SNI.
We can never truly eliminate the cluster name, b/c we need to keep it around for backwards-compat. But we do want to deprecate it.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Hm, interesting. I guess this is fine for now. If there's an issue in the future, we can refine.
The prefix just follows our existing convention of using "admin-", "services-", "proxy-", etc. in front of our DNS names to distinguish different endpoints. Why is that strange? I thought about using "proxy-", but that seemed like "Engineer naming", meaning a name that only make sense to the engineers building the service, and not to the customer. I settled on "serverless-" since developers will understand they're connecting to a Serverless cluster.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 638 at r2 (raw file):
// will be true. If not - false. func parseSNI(sniServerName string) (roachpb.TenantID, bool) { if sniServerName != "" {
NIT: Why not make the most important code path get the least nesting, similar to this (for easier reading):
if sniServerName == "" {
return roachpb.MaxTenantID, false
}
parts := strings.Split(sniServerName, ".")
if len(parts) == 0 {
return roachpb.MaxTenantID, false
}
and so on...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @jeffswenson)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
The prefix just follows our existing convention of using "admin-", "services-", "proxy-", etc. in front of our DNS names to distinguish different endpoints. Why is that strange? I thought about using "proxy-", but that seemed like "Engineer naming", meaning a name that only make sense to the engineers building the service, and not to the customer. I settled on "serverless-" since developers will understand they're connecting to a Serverless cluster.
It's strange because serverless is a CockroachCloud thing. For example, if we wanted to put this proxy in front of the app tenant (as part of the unification work), serverless-X
wouldn't make sense, and I'd imagine tenant-X
in that case. I was thinking of making this something more configurable, i.e. enforced by the directory. We could add a new SNIPrefix field in addition to the ClusterName field, and check against that.
If a user uses foo-bar-baz-10.5xj....
, we would first extract the tenant ID by splitting on -, and taking the last part. The tenantID will be used to retrieve data from the directory cache. Once we have that, we'd validate foo-bar-baz-10
against the SNIPrefix
(or something along those lines).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @jeffswenson)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
It's strange because serverless is a CockroachCloud thing. For example, if we wanted to put this proxy in front of the app tenant (as part of the unification work),
serverless-X
wouldn't make sense, and I'd imaginetenant-X
in that case. I was thinking of making this something more configurable, i.e. enforced by the directory. We could add a new SNIPrefix field in addition to the ClusterName field, and check against that.If a user uses
foo-bar-baz-10.5xj....
, we would first extract the tenant ID by splitting on -, and taking the last part. The tenantID will be used to retrieve data from the directory cache. Once we have that, we'd validatefoo-bar-baz-10
against theSNIPrefix
(or something along those lines).
Ah I see your concern. Yeah, I'd be fine making it configurable somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, they are already added. Is there a specific case that you want to see tested that isn't currently?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @darinpp, @jaylim-crl, and @jeffswenson)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 536 at r1 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
FYI, the current logic would allow something like
postgres://bob:[email protected]:%s/tenant-cluster-28.defaultdb?sslmode=require
to go through, but I guess this is fine. It's difficult to enforce what goes into the SNI field especially when the structure is tied to how we want serverless routing to work.
I really don't see the value into putting a lot of effort into ensuring the host name is formatted in a specific way. Seems like the end result is just a better error message.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Ah I see your concern. Yeah, I'd be fine making it configurable somehow.
What would happen if we don't do any of that? Why is it a problem to allow connecting to the server if (options/database is not present and) the host name is <anything>-42.<the rest>
? If the user wanted to connect to their host name - they will do w/o the checks. If they didn't want - also not a problem as they will connect to someone else's cluster and get rejected by not having the right credentials.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 623 at r2 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
We should update
clusterIdentifierHint
as well. Currently it only shows 2 methods:cockroach/pkg/ccl/sqlproxyccl/proxy_handler.go
Lines 690 to 703 in 7a96183
const clusterIdentifierHint = `Ensure that your cluster identifier is uniquely specified using any of the following methods: 1) Database parameter: Use "<cluster identifier>.<database name>" as the database parameter. (e.g. database="active-roach-42.defaultdb") 2) Options parameter: Use "--cluster=<cluster identifier>" as the options parameter. (e.g. options="--cluster=active-roach-42") For more details, please visit our docs site at: https://www.cockroachlabs.com/docs/cockroachcloud/connect-to-a-serverless-cluster `
added option 3
pkg/ccl/sqlproxyccl/proxy_handler.go, line 638 at r2 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Why not make the most important code path get the least nesting, similar to this (for easier reading):
if sniServerName == "" { return roachpb.MaxTenantID, false } parts := strings.Split(sniServerName, ".") if len(parts) == 0 { return roachpb.MaxTenantID, false }
and so on...
done
pkg/ccl/sqlproxyccl/proxy_handler.go, line 645 at r2 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
nit: can we check
tenID >= roachpb.MinTenantID.ToUint64()
here? This would exclude the system tenant.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @darinpp and @jeffswenson)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 536 at r1 (raw file):
Previously, darinpp wrote…
I really don't see the value into putting a lot of effort into ensuring the host name is formatted in a specific way. Seems like the end result is just a better error message.
Yes, I think we may eventually want those better error messages, and this all relates to the connection experience. We could refine this in the future. This is fine as-is today.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 543 at r1 (raw file):
Previously, darinpp wrote…
What would happen if we don't do any of that? Why is it a problem to allow connecting to the server if (options/database is not present and) the host name is -42.? If the user wanted to connect to their host name - they will do w/o the checks. If they didn't want - also not a problem as they will connect to someone else's cluster and get rejected by not having the right credentials.
I think this is just mostly for connection experience. We've seen various issues with the options and database parameters, and I don't know if we'll experience the same here.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 749 at r3 (raw file):
3) Host name: Use secure connection to the host name assigned to your cluster.
Should we mention SNI here? I don't know what's best. The comment earlier states the following:
// We currently support embedding the cluster identifier in three ways:
//
// - Through server name identification (SNI) when using TLS connections
// (e.g. serverless-101.5xj.gcp-us-central1.cockroachlabs.cloud)
//
// - Within the database param (e.g. "happy-koala-3.defaultdb")
//
// - Within the options param (e.g. "... --cluster=happy-koala-5 ...").
// PostgreSQL supports three different ways to set a run-time parameter
// through its command-line options, i.e. "-c NAME=VALUE", "-cNAME=VALUE", and
// "--NAME=VALUE".
Also cc: @emily-horing @ianjevans on the updated option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @darinpp, @emily-horing, @ianjevans, @jaylim-crl, and @jeffswenson)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 749 at r3 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Should we mention SNI here? I don't know what's best. The comment earlier states the following:
// We currently support embedding the cluster identifier in three ways: // // - Through server name identification (SNI) when using TLS connections // (e.g. serverless-101.5xj.gcp-us-central1.cockroachlabs.cloud) // // - Within the database param (e.g. "happy-koala-3.defaultdb") // // - Within the options param (e.g. "... --cluster=happy-koala-5 ..."). // PostgreSQL supports three different ways to set a run-time parameter // through its command-line options, i.e. "-c NAME=VALUE", "-cNAME=VALUE", and // "--NAME=VALUE".
Also cc: @emily-horing @ianjevans on the updated option.
I changed it a bit to indicate SNI and TLS is needed.
Previously the proxy supported two ways of providing tenant id and cluster name information. One was through the database name. The second was through the options parameter. As part of the serverless routing changes, we want to support routing with a tenant id passed through SNI. This PR adds the ability to route using SNI info. Release justification: Low risk, high reward changes to existing functionality Release note: None
bors r+ |
Build succeeded: |
Previously the proxy supported two ways of providing tenant id and
cluster name information. One was through the database name. The second
was through the options parameter. As part of the serverless routing
changes, we want to support routing with a tenant id passed through SNI.
This PR adds the ability to route using SNI info.
Release justification: Low risk, high reward changes to existing functionality
Release note: None