Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: gRPC server does not set NextProtocols #136367

Open
tbg opened this issue Nov 28, 2024 · 1 comment
Open

server: gRPC server does not set NextProtocols #136367

tbg opened this issue Nov 28, 2024 · 1 comment
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-server-and-security DB Server & Security

Comments

@tbg
Copy link
Member

tbg commented Nov 28, 2024

Describe the problem

While upgrading gRPC from v1.56.3 to v1.68.0, I noticed that post-bump CRDB was unable to connect to pre-bump CRDB nodes, the nodes using gRPC at v1.68.0 would print

W241128 13:28:08.615942 205 server/init.go:402 ⋮ [T1,Vsystem,n?] 25 outgoing join rpc to ‹127.0.0.1:29000› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property"›

This check is new in v1.68.0 (as in, it's not yet there in v1.56.3). So this problem likely persists, but it has not caused any issues. This new check was added to provide a better failure mode. Apparently every HTTP2 server needs to support ALPN, but our HTTP2 server is gRPC and it is a bit of a mystery how we end up failing this check, especially given that everything works once we disable the check, which we can do via an env var.

See the full initial analysis here: #136278 (comment)

With a fork of grpc-go that turns the check off, #136278 can merge. This issue serves as a reminder to get to the bottom of this problem, as future versions of gRPC are likely to hard-code the check, and we don't want to have to be on a fork forever.

To Reproduce

See #136278 (comment)

Jira issue: CRDB-45001

@tbg tbg added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. branch-master Failures and bugs on the master branch. T-server-and-security DB Server & Security labels Nov 28, 2024
tbg added a commit to tbg/cockroach that referenced this issue Nov 28, 2024
This works around the issue discussed in cockroachdb#136367.  Apparently, we have
some misconfiguration or bug in gRPC v1.56.3 that makes the gRPC server
seem unable to properly support HTTP2.  This effectively breaks
communication between CRDB nodes at these two different gRPC versions.

Switch to a fork that disables the check (there is no other way to
disable it other than changing code).
@tbg
Copy link
Member Author

tbg commented Nov 28, 2024

Oh I understand it now. Our TLS config uses GetConfigForClient:

/pkg/security/certificate_manager.go#L416-L421

func (cm *CertificateManager) GetServerTLSConfig() (*tls.Config, error) {
	if _, err := cm.getEmbeddedServerTLSConfig(nil); err != nil {
		return nil, err
	}
	return &tls.Config{
		GetConfigForClient: cm.getEmbeddedServerTLSConfig,

in v1.56.3, gRPC does add h2 to NextProtocols, but only for the config itself. It doesn't make sure that the configs returned by GetConfigForClient do the same:

// NewTLS uses c to construct a TransportCredentials based on TLS.
func NewTLS(c *tls.Config) TransportCredentials {
	tc := &tlsCreds{credinternal.CloneTLSConfig(c)}
	tc.config.NextProtos = credinternal.AppendH2ToNextProtos(tc.config.NextProtos)
	return tc
}

In v1.68.0, this has been rectified in a fairly recent change: grpc/grpc-go#7813

https://github.com/grpc/grpc-go/blob/v1.68.0/credentials/tls.go#L201-L215

The long and short of it is that we need to make sure to disable this ALPN check until every CRDB version our binary might need to communicate with contains the v1.68.0 upgrade. That should be straightforward, and thanks to mixed-version testing it'll be pretty obvious if we get it wrong either. I'll sprinkle comments in go.mod to that effect.

tbg added a commit to tbg/cockroach that referenced this issue Nov 28, 2024
tbg added a commit to tbg/cockroach that referenced this issue Nov 29, 2024
This works around the issue discussed in cockroachdb#136367.  Apparently, we have
some misconfiguration or bug in gRPC v1.56.3 that makes the gRPC server
seem unable to properly support HTTP2.  This effectively breaks
communication between CRDB nodes at these two different gRPC versions.

Switch to a fork that disables the check (there is no other way to
disable it other than changing code).
tbg added a commit to tbg/cockroach that referenced this issue Nov 29, 2024
tbg added a commit to tbg/cockroach that referenced this issue Dec 2, 2024
This works around the issue discussed in cockroachdb#136367.  Apparently, we have
some misconfiguration or bug in gRPC v1.56.3 that makes the gRPC server
seem unable to properly support HTTP2.  This effectively breaks
communication between CRDB nodes at these two different gRPC versions.

Switch to a fork that disables the check (there is no other way to
disable it other than changing code).
tbg added a commit to tbg/cockroach that referenced this issue Dec 2, 2024
tbg added a commit to tbg/cockroach that referenced this issue Dec 4, 2024
This works around the issue discussed in cockroachdb#136367.  Apparently, we have
some misconfiguration or bug in gRPC v1.56.3 that makes the gRPC server
seem unable to properly support HTTP2.  This effectively breaks
communication between CRDB nodes at these two different gRPC versions.

Switch to a fork that disables the check (there is no other way to
disable it other than changing code).
tbg added a commit to tbg/cockroach that referenced this issue Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-server-and-security DB Server & Security
Projects
None yet
Development

No branches or pull requests

1 participant