-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/sqlproxyccl: CC code migration to DB #65164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the explanatory commit message. Thanks for all the efforts that went into it.
Reviewed 34 of 34 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @darinpp)
pkg/cli/mt_proxy.go, line 1 at r1 (raw file):
// Copyright 2021 The Cockroach Authors.
Can you add this file underneath the pattern for pkg/cli/start*.go
in .github/CODEOWNERS
thanks
(you may need to rebase first to pick up the latest codeowner changes)
pkg/cli/mt_proxy.go, line 33 at r1 (raw file):
var mtStartSQLProxyCmd = &cobra.Command{ Use: "start-proxy <basepath>",
The use and short strings should not mention a positional argument, since it doesn't take positional arguments (cobra.NoArgs
below)
If you want to explain how to use this, give an example with the --xxx
flags here.
pkg/cli/mt_proxy.go, line 46 at r1 (raw file):
func init() { f := mtStartSQLProxyCmd.Flags() f.StringVar(&proxyOpts.Denylist, "denylist-file", "",
Can you move this logic to flags.go
and then add the definition for all these flags in cliflags/flags.go
alongside the others, thanks.
The default values go to context.go
.
pkg/cli/mt_proxy.go, line 141 at r1 (raw file):
} ctx, _ = stopper.WithCancelOnQuiesce(ctx) return
style: return ctx, stopper, err
for readability
pkg/cli/mt_start_sql.go, line 1 at r1 (raw file):
// Copyright 2020 The Cockroach Authors.
ditto codeowners
pkg/cli/mt_test_directory.go, line 1 at r1 (raw file):
// Copyright 2021 The Cockroach Authors.
ditto codeowners
089440a
to
995c6bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @knz)
pkg/cli/mt_proxy.go, line 1 at r1 (raw file):
Previously, knz (kena) wrote…
Can you add this file underneath the pattern for
pkg/cli/start*.go
in.github/CODEOWNERS
thanks
(you may need to rebase first to pick up the latest codeowner changes)
added
pkg/cli/mt_proxy.go, line 33 at r1 (raw file):
Previously, knz (kena) wrote…
The use and short strings should not mention a positional argument, since it doesn't take positional arguments (
cobra.NoArgs
below)
If you want to explain how to use this, give an example with the--xxx
flags here.
fixed
pkg/cli/mt_proxy.go, line 46 at r1 (raw file):
Previously, knz (kena) wrote…
Can you move this logic to
flags.go
and then add the definition for all these flags incliflags/flags.go
alongside the others, thanks.
The default values go tocontext.go
.
done
pkg/cli/mt_proxy.go, line 141 at r1 (raw file):
Previously, knz (kena) wrote…
style:
return ctx, stopper, err
for readability
done
pkg/cli/mt_start_sql.go, line 1 at r1 (raw file):
Previously, knz (kena) wrote…
ditto codeowners
done
pkg/cli/mt_test_directory.go, line 1 at r1 (raw file):
Previously, knz (kena) wrote…
ditto codeowners
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @knz)
pkg/ccl/sqlproxyccl/authentication.go, line 20 at r1 (raw file):
// the connections is considered authenticated. If that doesn't happen, it // returns an error. var Authenticate = func(clientConn, crdbConn net.Conn) error {
How come this needs to be exported? All the uses I see are from the same package.
pkg/ccl/sqlproxyccl/backend_dialer.go, line 22 at r1 (raw file):
// BackendDial is an example backend dialer that does a TCP/IP connection // to a backend, SSL and forwards the start message. var BackendDial = func(
NIT: Add a comment explaining why we're using the variable.
pkg/ccl/sqlproxyccl/proxy.go, line 26 at r1 (raw file):
// UpdateMetricsForError updates the metrics relevant for the type of the // error message. func UpdateMetricsForError(metrics *Metrics, err error) {
Do these functions need to be exported?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 72 at r1 (raw file):
// Insecure if set, will not use TLS for the backend connection. For testing. Insecure bool // Routing rule for constructing the backend address for each incoming
NIT: Start comment with matching name of field here and other places in this file.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 96 at r1 (raw file):
// IncomingTLSCert is the managed cert of the proxy endpoint to // which clients connect. IncomingCert certmgr.Cert
How come all these fields are exported? Aren't the ProxyOptions
what callers should set? Let's create better OO encapsulation now that everything's in the same package and only export what needs to be exported (and what should be exported goes into ProxyOptions
).
pkg/ccl/sqlproxyccl/proxy_handler.go, line 127 at r1 (raw file):
// return on context cancellation. // See `TestProxyKeepAlive` for example usage. KeepAliveLoop func(context.Context) error
Add a TODO to get rid of this, since it's a strange and inefficient design that requires an extra goroutine per connection.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 156 at r1 (raw file):
if handler.DirectoryAddr != "" { conn, err := grpc.Dial(handler.DirectoryAddr, grpc.WithInsecure())
What happens to the ProxyHandler
if the GRPC connection to the tenantdir fails? Will it be able to recover somehow?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 194 at r1 (raw file):
clientErr := &CodeError{CodeParamsRoutingFailed, err} log.Errorf(ctx, "unable to extract cluster name and tenant id: %s", clientErr.Error()) SendErrToClient(conn, clientErr)
How come there's not one helper that both sends the error to the client and updates metrics so that it's less likely people will forget to call both?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 213 at r1 (raw file):
} if err = handler.validateAccessAndAdmitConnection(ctx, tenID, ipAddr); err != nil {
It's a bit strange that we call FrontEndAdmitter
but then call ...AdmitConnection
. How come there's not just one call that "admits" a connection?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 226 at r1 (raw file):
var crdbConn net.Conn var outgoingAddress string for i := 1; i < 10; i++ {
I don't think we should be retrying 10 times like this - it's an arbitrary number. Also, there's no backoff here, so we'll hammer the various backends we call. I think this logic should loop infinitely (or maybe with a really long timeout like 5 minutes), coupled with exponential backoff (say starting at 10ms and backing off to retrying once per second).
pkg/ccl/sqlproxyccl/proxy_handler.go, line 238 at r1 (raw file):
if err != nil && errors.As(err, &codeErr) && codeErr.Code == CodeBackendDown &&
Looking at the list of failure codes, it looks like there are quite a few we'd want to retry on:
CodeBackendDown
CodeBackendReadFailed
CodeBackendWriteFailed
CodeBackendDisconnected
I could imagine one or more of those being returned if the backend comes down in the middle of us establishing a connection with it.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 239 at r1 (raw file):
errors.As(err, &codeErr) && codeErr.Code == CodeBackendDown && handler.Directory != nil {
Doesn't handler.Directory always need to be non-nil? It seems painful to have to check this every time we use the directory otherwise. If we need a stub directory for testing, let's create one of those, rather than having to check for nil here and other places.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 276 at r1 (raw file):
defer cancel() go func() {
Can you add a comment reminding us to find a better wayward to cut access, since having a goroutine on every connection is not ideal. Also, this design does not allow us to cut a connection immediately; we have to wait until the goroutine wakes up after a random interval. We talked before about a connection directory which holds all active connections that would allow us to immediately close connections. Please file a JIRA issue on this as well, as it could make a good starter project.
pkg/ccl/sqlproxyccl/server.go, line 30 at r1 (raw file):
// health check and prometheus metrics. type Server struct { connHandler func(ctx context.Context, metrics *Metrics, proxyConn *Conn) error
NIT: I think you should define a ProxyHandlerFunc
type to make this neater.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @darinpp)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 226 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I don't think we should be retrying 10 times like this - it's an arbitrary number. Also, there's no backoff here, so we'll hammer the various backends we call. I think this logic should loop infinitely (or maybe with a really long timeout like 5 minutes), coupled with exponential backoff (say starting at 10ms and backing off to retrying once per second).
You can use the pkg/util/retry
package which already has this logic.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 276 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Can you add a comment reminding us to find a better wayward to cut access, since having a goroutine on every connection is not ideal. Also, this design does not allow us to cut a connection immediately; we have to wait until the goroutine wakes up after a random interval. We talked before about a connection directory which holds all active connections that would allow us to immediately close connections. Please file a JIRA issue on this as well, as it could make a good starter project.
Since this is in the crdb repo now, please file a github issue instead (then we can fish the jira ticket created by exalate for prioritization)
pkg/cli/mt_test_directory.go, line 29 at r2 (raw file):
Run a test directory service. `, Example: ` cockroach mt test-directory`,
Since you're not demonstrating command-line flags, there's no need for an example here (your example is not different from the canonical name of the command)
pkg/cli/cliflags/flags_mt.go, line 29 at r2 (raw file):
DenyList = FlagInfo{ Name: "denylist-file", EnvVar: "COCKROACH_DENYLIST_FILE",
Are you sure you want environment variables for all these parameters? We typically only use env vars for those things more likely to be overridden in CI or in our team's test computers, not for production systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @knz)
pkg/ccl/sqlproxyccl/authentication.go, line 20 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
How come this needs to be exported? All the uses I see are from the same package.
I changed them so they are not exported anymore.
pkg/ccl/sqlproxyccl/backend_dialer.go, line 22 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Add a comment explaining why we're using the variable.
Added.
pkg/ccl/sqlproxyccl/proxy.go, line 26 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Do these functions need to be exported?
No, they don't. Changed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 72 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Start comment with matching name of field here and other places in this file.
Fixed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 96 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
How come all these fields are exported? Aren't the
ProxyOptions
what callers should set? Let's create better OO encapsulation now that everything's in the same package and only export what needs to be exported (and what should be exported goes intoProxyOptions
).
Changed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 127 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Add a TODO to get rid of this, since it's a strange and inefficient design that requires an extra goroutine per connection.
I actually removed this. Not needed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 156 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
What happens to the
ProxyHandler
if the GRPC connection to the tenantdir fails? Will it be able to recover somehow?
Yes, this is the default behavior. I added a test that test disconnect and automatic reconnect.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 194 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
How come there's not one helper that both sends the error to the client and updates metrics so that it's less likely people will forget to call both?
I added a single helper.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 213 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
It's a bit strange that we call
FrontEndAdmitter
but then call...AdmitConnection
. How come there's not just one call that "admits" a connection?
Agree. I renamed the admitter
to throttler
as this reflects more precisely the function.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 226 at r1 (raw file):
Previously, knz (kena) wrote…
You can use the
pkg/util/retry
package which already has this logic.
added infinite retries and used the retry
package
pkg/ccl/sqlproxyccl/proxy_handler.go, line 238 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Looking at the list of failure codes, it looks like there are quite a few we'd want to retry on:
CodeBackendDown CodeBackendReadFailed CodeBackendWriteFailed CodeBackendDisconnected
I could imagine one or more of those being returned if the backend comes down in the middle of us establishing a connection with it.
I can't see a case where a code other than CodeBackendDown
can be retried. Once we start the passing of the messages back and forth, there is no way to retry and the only possibility is to report the error back to the client.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 239 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Doesn't handler.Directory always need to be non-nil? It seems painful to have to check this every time we use the directory otherwise. If we need a stub directory for testing, let's create one of those, rather than having to check for nil here and other places.
It doesn't need to be nil right now. We still support a custom rule to connect to the backend. We will need to support this at least until we have the suspend/resume working. It may make sense to keep after that as not every use of the proxy should need a directory service.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 276 at r1 (raw file):
Previously, knz (kena) wrote…
Since this is in the crdb repo now, please file a github issue instead (then we can fish the jira ticket created by exalate for prioritization)
Added a comment and will create a github issue as well.
pkg/ccl/sqlproxyccl/server.go, line 30 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: I think you should define a
ProxyHandlerFunc
type to make this neater.
Defined.
pkg/cli/mt_test_directory.go, line 29 at r2 (raw file):
Previously, knz (kena) wrote…
Since you're not demonstrating command-line flags, there's no need for an example here (your example is not different from the canonical name of the command)
Agree. I removed the examples.
pkg/cli/cliflags/flags_mt.go, line 29 at r2 (raw file):
Previously, knz (kena) wrote…
Are you sure you want environment variables for all these parameters? We typically only use env vars for those things more likely to be overridden in CI or in our team's test computers, not for production systems.
Right. No need for environment variables. I removed them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 25 of 25 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @darinpp)
pkg/ccl/sqlproxyccl/backend_dialer.go, line 22 at r3 (raw file):
// BackendDial is an example backend dialer that does a TCP/IP connection // to a backend, SSL and forwards the start message. It is defined as a variable // so it can be redirected for testing.
nit: if this is meant only for use in tests, you can call it var TestingBackendDial = ...
This makes it easier to flag assignments of the form sqlproxyccl.TestingXXX = ...
in non-test code during code reviews elsewhere.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 218 at r3 (raw file):
outgoingAddress, err = handler.OutgoingAddress(ctx, clusterName, tenID) if err != nil { continue
your previous code was not retrying in this case. I think we still need to handle this one as an error.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 229 at r3 (raw file):
codeErr.Code == CodeBackendDown && handler.directory != nil { _ = handler.directory.ReportFailure(ctx, roachpb.MakeTenantID(tenID), outgoingAddress)
Why do you ignore the error return here?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 261 at r3 (raw file):
defer cancel() // TODO: starting a new go routine for every connection here is inefficient.
nit: TODO(darinpp): ...
- we have a linter
b0736b0
to
391eea1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @knz)
pkg/ccl/sqlproxyccl/backend_dialer.go, line 22 at r3 (raw file):
Previously, knz (kena) wrote…
nit: if this is meant only for use in tests, you can call it
var TestingBackendDial = ...
This makes it easier to flag assignments of the form
sqlproxyccl.TestingXXX = ...
in non-test code during code reviews elsewhere.
The code that calls the BackendDial
expects to call that exact name. So if I want to hook up the method for testing, I had to use the same exact name. Having func BackendDial(...
doesn't allow redirecting for test purposes. I don't think you are suggesting that a prefix all hookable methods with Test..
pkg/ccl/sqlproxyccl/proxy_handler.go, line 218 at r3 (raw file):
Previously, knz (kena) wrote…
your previous code was not retrying in this case. I think we still need to handle this one as an error.
Retrying is what we want in this case. An error can only occur when using the directory and according to Andy - we want to retry in cases. Especially for the case where the directory server is temporary down and all RPC calls report unable to connect.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 229 at r3 (raw file):
Previously, knz (kena) wrote…
Why do you ignore the error return here?
There isn't much we can do with the error in this exact case. Perhaps log it but given that this is in endless loop - it wouldn't make sense to log on each iteration.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 261 at r3 (raw file):
Previously, knz (kena) wrote…
nit:
TODO(darinpp): ...
- we have a linter
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @knz)
pkg/ccl/sqlproxyccl/backend_dialer.go, line 22 at r3 (raw file):
Previously, knz (kena) wrote…
nit: if this is meant only for use in tests, you can call it
var TestingBackendDial = ...
This makes it easier to flag assignments of the form
sqlproxyccl.TestingXXX = ...
in non-test code during code reviews elsewhere.
It's used in production code, it's just that we hook it during testing so that we can stub it out.
pkg/ccl/sqlproxyccl/backend_dialer.go, line 23 at r4 (raw file):
// to a backend, SSL and forwards the start message. It is defined as a variable // so it can be redirected for testing. var BackendDial = func(
NIT: Does BackendDial
need to be exported?
pkg/ccl/sqlproxyccl/frontend_admitter.go, line 24 at r4 (raw file):
// on whether the error happened before the connection was upgraded to TLS or not // it will either be the original or the TLS connection. var FrontendAdmit = func(
NIT: Does this need to be exported? Can you do a pass over all the files and review what should be exported or not? It's hard for me to flag every place.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 238 at r1 (raw file):
Previously, darinpp wrote…
I can't see a case where a code other than
CodeBackendDown
can be retried. Once we start the passing of the messages back and forth, there is no way to retry and the only possibility is to report the error back to the client.
OK
pkg/ccl/sqlproxyccl/proxy_handler.go, line 218 at r3 (raw file):
Previously, darinpp wrote…
Retrying is what we want in this case. An error can only occur when using the directory and according to Andy - we want to retry in cases. Especially for the case where the directory server is temporary down and all RPC calls report unable to connect.
Right, we don't want to return errors on transient failures that our system should "self-heal" from given more time.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 229 at r3 (raw file):
Previously, darinpp wrote…
There isn't much we can do with the error in this exact case. Perhaps log it but given that this is in endless loop - it wouldn't make sense to log on each iteration.
One idea: you could log only if randNum(0, 50) == 0, or similar.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 48 at r4 (raw file):
// alphanumeric characters, with dashes allowed within the name (but not as a // starting or ending character). ClusterNameRegex = regexp.MustCompile("^[a-z0-9][a-z0-9-]{4,18}[a-z0-9]$")
NIT: Another variable that doesn't need to be exported.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 63 at r4 (raw file):
// ProxyOptions is the information needed to construct a new ProxyHandler. type ProxyOptions struct { Denylist string
NIT: Can you add comments for the uncommented fields given these are all exported?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 102 at r4 (raw file):
incomingCert certmgr.Cert // denyListService provides access control
NIT: Periods at end of comments.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 263 at r4 (raw file):
// TODO(darinpp): starting a new go routine for every connection here is inefficient. // Change to maintain a map of connections by IP/tenant and have single // go routine that closes connections that are blacklisted.
blacklisted => blocklisted
pkg/ccl/sqlproxyccl/proxy_handler.go, line 317 at r4 (raw file):
// OutgoingAddress resolves a tenant ID and a tenant cluster name to an IP of // the backend. func (handler *ProxyHandler) OutgoingAddress(
NIT: Does this need to be exported?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 388 at r4 (raw file):
// through its command-line options, i.e. "-c NAME=VALUE", "-cNAME=VALUE", and // "--NAME=VALUE". func ClusterNameAndTenantFromParams(
NIT: exporting necessary?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 522 at r4 (raw file):
// IncomingTLSConfig gets back the current TLS config for the incoiming client // connection endpoint. func (handler *ProxyHandler) IncomingTLSConfig() *tls.Config {
NIT: exporting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andy-kimball FYI, reviewable has special support for comments that start with "nit:", but your usage of "NIT:" in capitals defeats that special functionality.
Reviewed 1 of 1 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @darinpp)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 218 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Right, we don't want to return errors on transient failures that our system should "self-heal" from given more time.
What's wrong here is that the error is discarded entirely. If the thing ends up retrying forever because of such an error, we'd be blind to this fact.
We need to make these errors observable, even during a retry loop.
Please use log.Every
for this (e.g. max 1 such error per second).
pkg/ccl/sqlproxyccl/proxy_handler.go, line 229 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
One idea: you could log only if randNum(0, 50) == 0, or similar.
ditto log.Every
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @knz)
pkg/ccl/sqlproxyccl/backend_dialer.go, line 23 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Does
BackendDial
need to be exported?
I changed so it isn't exported. It all boils down to whether we want to use the code as a building blocks or not. Sounds like we won't use it so I changed pretty much all I can change to not be exported. We won't have the option to construct custom handler on the CC side.
pkg/ccl/sqlproxyccl/frontend_admitter.go, line 24 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Does this need to be exported? Can you do a pass over all the files and review what should be exported or not? It's hard for me to flag every place.
not exported anymore.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 218 at r3 (raw file):
Previously, knz (kena) wrote…
What's wrong here is that the error is discarded entirely. If the thing ends up retrying forever because of such an error, we'd be blind to this fact.
We need to make these errors observable, even during a retry loop.
Please uselog.Every
for this (e.g. max 1 such error per second).
ok. I added a log message using log.Every
pkg/ccl/sqlproxyccl/proxy_handler.go, line 229 at r3 (raw file):
Previously, knz (kena) wrote…
ditto
log.Every
done
pkg/ccl/sqlproxyccl/proxy_handler.go, line 48 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Another variable that doesn't need to be exported.
fixed. Not exported anymore.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 63 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Can you add comments for the uncommented fields given these are all exported?
added comments to all fields.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 102 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Periods at end of comments.
done.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 263 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
blacklisted => blocklisted
done.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 317 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: Does this need to be exported?
fixed
pkg/ccl/sqlproxyccl/proxy_handler.go, line 388 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: exporting necessary?
nope. Fixed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 522 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: exporting?
also fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @darinpp, and @knz)
pkg/ccl/sqlproxyccl/backend_dialer.go, line 23 at r4 (raw file):
Previously, darinpp wrote…
I changed so it isn't exported. It all boils down to whether we want to use the code as a building blocks or not. Sounds like we won't use it so I changed pretty much all I can change to not be exported. We won't have the option to construct custom handler on the CC side.
If we need it exported, I have no objection. I just think the default should be to not export unless we have a proven need to use it elsewhere. So if you need this in the managed-service (e.g. to implement susqlproxy), then by all means export it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @knz)
pkg/ccl/sqlproxyccl/backend_dialer.go, line 23 at r4 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
If we need it exported, I have no objection. I just think the default should be to not export unless we have a proven need to use it elsewhere. So if you need this in the managed-service (e.g. to implement susqlproxy), then by all means export it.
At this point it is a bit hard to say what is needed and what isn't. I'll leave all to be private and then change as needed.
6f5e91a
to
1cc8cb4
Compare
pkg/testutils/hook.go, line 18 at r5 (raw file):
Question: is it OK to hook globals in the CRDB codebase? Are all tests supposed to be able to run in parallel in the same process, or is it OK if some cannot? Perhaps @knz knows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting good, but you've fat fingered a Server
-> server
search/replace operation. A few files need to be reverted.
Reviewed 23 of 23 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @darinpp)
pkg/ccl/sqlproxyccl/authentication.go, line 46 at r5 (raw file):
case *pgproto3.ReadyForQuery: // Server has authenticated the connection successfully and is ready to // Serve queries.
incorrect replace here
pkg/ccl/sqlproxyccl/authentication.go, line 51 at r5 (raw file):
// Server has authenticated the connection; keep reading messages until // `pgproto3.ReadyForQuery` is encountered which signifies that server // is ready to Serve queries.
ditto
pkg/ccl/sqlproxyccl/proxy_handler.go, line 231 at r5 (raw file):
var outgoingAddressErrs, codeBackendDownErrs, reportFailureErrs int for r := retry.StartWithCtx(ctx, retryOpts); r.Next(); { shouldLog := everyMinute.ShouldLog()
don't precompute the shouldLog
, and instead repeat the .ShouldLog()
expression both times below. If the call to backendDial()
takes a minute or two for whatever reason, we'd lose some log entries otherwise.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 237 at r5 (raw file):
if shouldLog { log.Ops.Errorf(ctx, "outgoing address: %v (%d errors in the past minute)",
"outgoing address (%d errors skipped): %v"
This makes it clearer if the error message contains newlines.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 257 at r5 (raw file):
if shouldLog { log.Ops.Errorf(ctx, "backend dial: %v (%d errors in the past minute)",
ditto
pkg/ccl/sqlproxyccl/proxy_handler.go, line 268 at r5 (raw file):
if shouldLog { log.Ops.Errorf(ctx, "report failure: %v (%d errors in the past minute)",
ditto
pkg/server/server.go, line 1192 at r5 (raw file):
// Set up the init server. We have to do this relatively early because we // can't call RegisterInitServer() after `grpc.serve`, which is called in
incorrect chang ehere
pkg/server/server.go, line 1738 at r5 (raw file):
s.oidc = oidc // serve UI assets.
incorrect change here
pkg/server/server.go, line 2124 at r5 (raw file):
// (Server.Start) will call this at the right moment. startRPCServer = func(ctx context.Context) { // serve the gRPC endpoint.
ditto
pkg/server/server.go, line 2177 at r5 (raw file):
} // serve the plain HTTP (non-TLS) connection over clearL.
ditto
pkg/server/server.go, line 2198 at r5 (raw file):
} // serve the HTTP endpoint. This will be the original httpLn
ditto
pkg/server/status.go, line 897 at r5 (raw file):
var dir string switch req.Type { //TODO(ridwanmsharif): serve logfiles so debug-zip can fetch them
incorrect change here
pkg/testutils/hook.go, line 18 at r5 (raw file):
Andy I will answer your question below.
However, first let me share an opinion: to be honest I am not a fan of this function.
We could have implemented this many years ago and yet we chose not to do so: instead, we implement TestingXXX
functions that do this value replacement:
utilccl.TestingEnableEnterprise()
jobs.TestingSetProgressThresholds()
sql.TestingOverrideExplainEnvVersion()
etc
I think this is the pattern that should be used here.
There are two reasons for this. One is that it allows us to lint that these global override functions are not used outside of test code.
The other one is that TestingXXXS
functions make it possible to hide the global variables from other packages. Only the TestingXXX
function needs to be exported. This reduces the risk of cross-package dependencies on mutable global variables, which is generally not idiomatic go.
Regarding the question:
Are all tests supposed to be able to run in parallel in the same process, or is it OK if some cannot?
FYI: Tests from different packages run in different processes.
Tests from the same package run inside the same process, but are not run concurrently with each other by default. Tests running with t.Paralle()
can only run concurrently with other tests using t.Parallel()
too and the test harness blocks progress on all other tests that don't use t.Parallel()
while they run.
pkg/util/grpcutil/grpc_log.go, line 234 at r5 (raw file):
// connection, gRPC is unhappy that the TLS handshake did not // complete. We don't care. incomingConnSpamReSrc = `^grpc: Server\.serve failed to complete security handshake from "[^"]*": EOF`
That change here doesn't seem right.
pkg/util/log/fluent_client_test.go, line 112 at r5 (raw file):
t.Logf("got client: %v", conn.RemoteAddr()) // serve the connection.
incorrect change here
pkg/util/netutil/net.go, line 106 at r5 (raw file):
ctx := context.TODO() // net/http.(*Server).serve/http2.ConfigureServer are not thread safe with
incorrect change here
pkg/util/netutil/net.go, line 132 at r5 (raw file):
ctx context.Context, stopper *stop.Stopper, l net.Listener, serveConn func(net.Conn), ) error { // Inspired by net/http.(*Server).serve
ditto
pkg/util/netutil/net.go, line 155 at r5 (raw file):
go func() { defer stopper.Recover(ctx) s.Server.ConnState(rw, http.StateNew) // before serve can return
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @darinpp, and @knz)
pkg/testutils/hook.go, line 18 at r5 (raw file):
Previously, darinpp wrote…
I renamed the
HookGlobal
toTestingHook
. The intention is to only call this from tests. The name should help us lint this. The examples that you give work similarly (but most seem to hook data) and are just custom made for each case. Having a custom method that hooks by usingTestingHook
is still possible - see for examplehookFrontendAdmit
. If for test purposes the func replacement was the same for all tests, then this could happen inTestMain
in a similar way as some of your examples. Lastly - this isn't just hooking global functions. In my case funcs that are hooked are private to the package. But you can also use the same to hook a handler that is a struct field. I added a test forTestingHook
that shows hooking and restoring the field of a struct.
Raphael, to give some context: this function was added in CC by Peter, and the entire org has begun using it extensively. It's a very nice, general way to intercept functions that call out to other sub-systems and to temporarily update variables. I think this pattern is better than specific Testing
functions that are just boilerplate that you have to write every time. In fact, we were doing something similar in CC (i.e. creating custom TestingXXX
functions) when Peter came up with this new generalized pattern, and it's really made our testing easier and nicer, with less boilerplate needed.
Note that we most often use HookGlobal
in CC when we're hooking private variables and functions in a test.
ab400b7
to
9f38b99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 34 files at r1, 20 of 21 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @darinpp, and @knz)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 522 at r4 (raw file):
Previously, darinpp wrote…
also fixed
Hm, still seems to be exporting.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 372 at r6 (raw file):
} // This doesn't verify the name part of the tenant and relies just on the int id.
I'd imagine we want to eventually verify the cluster name of the tenant. Could we add a TODO for that?
Also, any reason why we skipped that here, or do we have a different idea in mind now?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 473 at r6 (raw file):
Quoted 4 lines of code…
tenID, err := strconv.ParseUint(tenantIDStr, 10, 64) if err != nil { return msg, "", 0, errors.Wrapf(err, "cannot parse %s as uint64", tenantIDStr) }
nit: we can move this into roachpb/tenant.go, and create a MakeTenantIDFromString
or something similar:
cockroach/pkg/roachpb/tenant.go
Lines 44 to 48 in 803ad8f
// MakeTenantID constructs a new TenantID from the provided uint64. | |
func MakeTenantID(id uint64) TenantID { | |
checkValid(id) | |
return TenantID{id} | |
} |
If this returns roachpb.TenantID
, then we don't need to call roachpb.MakeTenantID
everywhere here.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 530 at r6 (raw file):
// CockroachDB currently does not support the options parameter, so the parsing // logic is built on that assumption. If we do start supporting options in // CockroachDB itself, then we should revisit this.
This comment is outdated. CRDB now supports the options parameter as of #59621. At a glance, the logic seems fine as-is, but worth adding some tests, if we don't have one yet.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 581 at r6 (raw file):
} // setupIncomingCert will setup a manged cert for the incoming connections.
nit: s/manged/managed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @darinpp, @jaylim-crl, and @knz)
pkg/ccl/sqlproxyccl/proxy_handler.go, line 522 at r4 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Hm, still seems to be exporting.
Right. Fixed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 372 at r6 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
I'd imagine we want to eventually verify the cluster name of the tenant. Could we add a TODO for that?
Also, any reason why we skipped that here, or do we have a different idea in mind now?
Right. I changed it to pass the name
so now the verification works.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 473 at r6 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
tenID, err := strconv.ParseUint(tenantIDStr, 10, 64) if err != nil { return msg, "", 0, errors.Wrapf(err, "cannot parse %s as uint64", tenantIDStr) }
nit: we can move this into roachpb/tenant.go, and create a
MakeTenantIDFromString
or something similar:cockroach/pkg/roachpb/tenant.go
Lines 44 to 48 in 803ad8f
// MakeTenantID constructs a new TenantID from the provided uint64. func MakeTenantID(id uint64) TenantID { checkValid(id) return TenantID{id} } If this returns
roachpb.TenantID
, then we don't need to callroachpb.MakeTenantID
everywhere here.
There is no point of creating MakeTenantIDFromString
as this is called just once and then all other code works with roachpb.TenantID
. I changed the return type of clusterNameAndTenantFromParams
so it now returns roachpb.TenantID
and eliminates the need to call MakeTenantID
in the other few places. Few other spots however now need to call TenantID.ToUint64()
so overall it isn't a big gain.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 530 at r6 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
// CockroachDB currently does not support the options parameter, so the parsing // logic is built on that assumption. If we do start supporting options in // CockroachDB itself, then we should revisit this.
This comment is outdated. CRDB now supports the options parameter as of #59621. At a glance, the logic seems fine as-is, but worth adding some tests, if we don't have one yet.
The options support will have to come in another PR
pkg/ccl/sqlproxyccl/proxy_handler.go, line 581 at r6 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
nit: s/manged/managed
corrected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start a proxy server that uses the directory server
./cockroach mt start-proxy --directory=:36257 --listen-metrics=:8081 --log="{sinks: {stderr: {filter: info}}}" --insecure
Does a SIGINT work for you here?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @darinpp, @jaylim-crl, and @knz)
pkg/cli/mt_proxy.go, line 54 at r7 (raw file):
return err } stopper.AddCloser(stop.CloserFn(func() { _ = proxyLn.Close() }))
Based on local testing, the stopper seems to be blocked waiting on the proxy server to terminate, which never terminates since we only close the listener as part of the closer, which happens after quiescing:
cockroach/pkg/util/stop/stopper.go
Lines 469 to 476 in e497fd1
s.Quiesce(ctx) | |
// Run the closers without holding s.mu. There's no concern around new | |
// closers being added; we've marked this stopper as `stopping` above, so | |
// any attempts to do so will be refused. | |
for _, c := range s.mu.closers { | |
c.Close() | |
} |
I think we should close the listener as part of another async task which waits for the quiesce operation to occur. Something similar to
cockroach/pkg/util/netutil/net.go
Lines 46 to 53 in e497fd1
waitQuiesce := func(context.Context) { | |
<-stopper.ShouldQuiesce() | |
FatalIfUnexpected(ln.Close()) | |
} | |
if err := stopper.RunAsyncTask(ctx, "listen-quiesce", waitQuiesce); err != nil { | |
waitQuiesce(ctx) | |
return nil, err | |
} |
I implemented that change locally, and the behavior seems to be what I had expected - the listeners close first, followed by stoppers, i.e. sending a SIGINT or SIGTERM eventually terminates the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @darinpp, @jaylim-crl, and @knz)
pkg/cli/mt_proxy.go, line 54 at r7 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Based on local testing, the stopper seems to be blocked waiting on the proxy server to terminate, which never terminates since we only close the listener as part of the closer, which happens after quiescing:
cockroach/pkg/util/stop/stopper.go
Lines 469 to 476 in e497fd1
s.Quiesce(ctx) // Run the closers without holding s.mu. There's no concern around new // closers being added; we've marked this stopper as `stopping` above, so // any attempts to do so will be refused. for _, c := range s.mu.closers { c.Close() } I think we should close the listener as part of another async task which waits for the quiesce operation to occur. Something similar to
cockroach/pkg/util/netutil/net.go
Lines 46 to 53 in e497fd1
waitQuiesce := func(context.Context) { <-stopper.ShouldQuiesce() FatalIfUnexpected(ln.Close()) } if err := stopper.RunAsyncTask(ctx, "listen-quiesce", waitQuiesce); err != nil { waitQuiesce(ctx) return nil, err } I implemented that change locally, and the behavior seems to be what I had expected - the listeners close first, followed by stoppers, i.e. sending a SIGINT or SIGTERM eventually terminates the process.
fixed.
Previsouly the sql proxy code was in the CC repo. This was making the testing of the proxy against a live SQL server hard and was also requiring a frequent cockroach repo bumps in case of changes. This moves all the code from the CC report to the DB repo so now the proxy is part of the cockroach executable. More detailed list of changed: * The old, sample star-proxy code has been retired in favor of the code moving over from the CC repo. * The code that handles individual connections to the backend has been separated into a new ProxyHandler. Added tests for the proxy handler. * BackendConfig has been retired. * Using stop.Stopper to control the shutdown of the proxy. * Added a command under mt that can be used to run the test directory server. * Added proxy options to control idle timeout, rate limits, config options, use of directory server etc. * Added code to monitor and handle os signals (HUP, TERM, INT). * Intergated the cert manager so the certificates can be reloaded on external signal. * Fixed the SQL tenant process so now the idle timeout causes the stopper to quiesce and the process to terminate successfuly. * Set up the logging for the new proxy. * Added a self-signed cert type to the cert manager to be used when testing secure connections witout generating explicit key/cert files. * Moved the HookGlobal code from CC that can be used for temporary hooks during testing. Here is how to test end to end the proxy, SQL tenant and host server, using the test directory: ``` # Start a host server in insecure mode. Tenants should already have been configured. ./cockroach start-single-node --insecure --log="{sinks: {stderr: {filter: info}}}" # Start a test directory server ./cockroach mt test-directory --port 36257 --log="{sinks: {stderr: {filter: info}}}" # Start a proxy server that uses the directory server ./cockroach mt start-proxy --directory=:36257 --listen-metrics=:8081 --log="{sinks: {stderr: {filter: info}}}" --insecure # Start a SQL client for one of the tenants. ./cockroach sql --url="postgresql://[email protected]:46257/dim-dog-2.defaultdb" --insecure ``` Release note: None
I'm done addressing feedback and will re-base and merge if tests pass. @knz - I fixed all the issues you raised so I'll assume you don't have more feedback. |
bors r+ |
Build succeeded: |
In cockroachdb#65164, we migrated the sqlproxy in the CC code to the DB repository, and there were a few buglets: - sqlproxy crashes when the tenant ID supplied in the connection string is 0 - sqlproxy leaks internal parsing errors to the client This patch hides internal parsing errors, and replaces them with friendly user-facing errors (e.g. "Invalid cluster name"). We also add a bounds check to the parsed tenant ID so that the process does not crash on an invalid tenant ID. More tests were added as well. Release note: None
In cockroachdb#65164, we migrated the sqlproxy in the CC code to the DB repository, and there were a few buglets: - sqlproxy crashes when the tenant ID supplied in the connection string is 0 because roachpb.MakeTenantID panics when the tenant ID is 0. - sqlproxy leaks internal parsing errors to the client. This patch hides internal parsing errors, and replaces them with friendly user-facing errors (e.g. "invalid cluster name"). We also add a bounds check to the parsed tenant ID so that the process does not crash on an invalid tenant ID. More tests were added as well. Release note: None
66303: colexec: optimize the external sort for top K case r=yuzefovich a=yuzefovich **colexec: extend external sort benchmark for top K case** Release note: None **colexec: optimize the external sort for top K case** Previously, if the top K sort spilled to disk, we used the general external sort. However, we could easily optimize that case with the knowledge that only K tuples are needed by the output. Namely, we can use the in-memory top K sort (in order to create each new partition of the desired size) and also limit the size of each merged partition by K tuples. This commit adds these optimizations. Addresses: #45192. Release note: None 66379: colexecbase: add casts from decimals to ints and floats r=yuzefovich a=yuzefovich This commit adds vectorized casts from decimals to ints (of all widths) and floats. Addresses: #48135. Release note: None 66412: sqlproxyccl: minor fixes and enhancements to the proxy handler and denylist r=jaylim-crl a=jaylim-crl #### sqlproxyccl: allow denylist entries that do not expire Previously, we assumed that all denylist entries have an expiration key. When denylist entries do not specify an expiration key, the entries are marked as expired right away since their values default to the zero instant time. This might be cumbersome for operators to specify an expiration when the intention was to not allow the rule to expire at all. This patch changes the behavior of the denylist such that entries without any expiration keys represent rules that do not expire. #### sqlproxyccl: minor fixes around the proxy handler In #65164, we migrated the sqlproxy in the CC code to the DB repository, and there were a few buglets: - sqlproxy crashes when the tenant ID supplied in the connection string is 0 because roachpb.MakeTenantID panics when the tenant ID is 0. - sqlproxy leaks internal parsing errors to the client. This patch hides internal parsing errors, and replaces them with friendly user-facing errors (e.g. "Invalid cluster name"). We also add a bounds check to the parsed tenant ID so that the process does not crash on an invalid tenant ID. More tests were added as well. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Jay Lim <[email protected]>
Previsouly the sql proxy code was in the CC repo. This was making the
testing of the proxy against a live SQL server hard and was also
requiring a frequent cockroach repo bumps in case of changes.
This moves all the code from the CC report to the DB repo so now the
proxy is part of the cockroach executable.
More detailed list of changed:
moving over from the CC repo.
separated into a new ProxyHandler. Added tests for the proxy handler.
server.
options, use of directory server etc.
external signal.
stopper to quiesce and the process to terminate successfuly.
testing secure connections witout generating explicit key/cert files.
during testing.
Here is how to test end to end the proxy, SQL tenant and host server,
using the test directory:
Release note: None