Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Resolver and Balancer APIs (gRFC L9) #1408

Merged
merged 14 commits into from
Aug 31, 2017

Conversation

menghanl
Copy link
Contributor

@menghanl menghanl commented Jul 31, 2017

This is the first step for #1388.
See gRFC L9 for details: grpc/proposal#30

This PR adds package balancer and resolver.
It changes ClientConn internals to new APIs and adds a wrapper for v1 balancer.

It does minimal change to ClientConn's behavior:

  • Blocking dial blocks until the ClientConn's state is READY

$new-bar-2$

@menghanl menghanl changed the title Add new Resolver and Balancer APIs Add new Resolver and Balancer APIs (gRFC L9) Jul 31, 2017
@dfawley dfawley self-requested a review August 4, 2017 20:50
@dfawley dfawley self-assigned this Aug 4, 2017
@menghanl menghanl force-pushed the bar_api_change branch 6 times, most recently from ab7246e to 6aaa555 Compare August 7, 2017 23:34

var (
// m is a map from name to balancer builder.
m map[string]Builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m = make(map[string]Builder) to save the line in init()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Get returns the resolver builder registered with the given name.
// If no builder is register with the name, the default pickfirst will
// be used.
func Get(name string) (b Builder, ok bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "ok" necessary? Or can the user check b == nil?

or even:

var defaultBuilder Builder = &pickfirst.Builder{}

-- now "!ok" is not possible (once implemented).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
Return defaultBuilder if map lookup is not ok.

// SubConnection represents a gRPC sub connection.
// Each sub connection contains a list of addresses. gRPC will
// try to connect to them (in sequence), and stop trying the
// remainings if one connection was successful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"// remainder once one connection is successful"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// The reconnect backoff will be applied on the list, not a single address.
// For example, try_on_all_addresses -> backoff -> try_on_all_addresses.
//
// All SubConnection starts in IDLE, and will not try to connect. To trigger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"All SubConnections start"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// All SubConnection starts in IDLE, and will not try to connect. To trigger
// the connecting, Balancers must call Connect.
// When the connection encounters an error, it will reconnect immediately.
// When the connection becomes IDLE, it will not reconnect unless Connect is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we document when this could happen (that a connection goes to IDLE)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be documented in the connectivity semantics doc. Also, it's not clear when a SubConn should go IDLE now...


// Package resolver defines APIs for name resolution in gRPC.
// All APIs in this package are experimental.
package resolver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the comments in the balancer package apply here as well.

Copy link
Contributor Author

@menghanl menghanl Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// SetDefaultScheme sets the default scheme that will be used.
// The default default scheme is "dns".
func SetDefaultScheme(scheme string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: balancer comments, how about setting the default scheme to a Builder instead of a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolver is a bit different from balancer I think.
For balancer, users cannot override the default balancer. The default will always be pickfirst.
For resolver, the default is whatever the users set here.

If we read the map in SetDefaultScheme function, we need to make sure the builder was already registered.

We could make this SetDefaultBuilder(b Builder).
Then if users set the default resolver to one with scheme "blah", and then dial with "blah:///", we should find the correct resolver to use. So we need to register the default Builder as well.

// of resolved addresses.
// The address list should be the complete list of new addresses.
// Two consecutive call with the same addresses will be noop. It's preferred
// that resolver doesn't call this consecutively with non-updated addresses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we even need this sentence. Like, if the resolver refreshes every 30 minutes and calls this, it's not a problem. The idea was to avoid a busy-loop situation, right? I'm not sure who would implement a resolver to continually call NewAddress, or what line of thinking might even lead to that.

// Resolver watches for the updates on the specified target.
// Updates include address updates and service config updates.
type Resolver interface {
// ResolveNow will be called by gRPC to try to resolve the target name again.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For further discussion: can we avoid the need for this by having grpc start up a new Resolver for the same target?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting up a new resolver means we need to support switching resolver.
This sounds more complicated than just calling a function on the existing resolver...

clientconn.go Outdated
}

// ConnectivityState indicates the state of a client connection.
type ConnectivityState int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes will break anyone using the connectivity state API (it's new, but, still).

Maybe we can push a PR with only that change in it ASAP while we iterate on this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #1430

@menghanl menghanl force-pushed the bar_api_change branch 6 times, most recently from 8997c7e to d3ded16 Compare August 10, 2017 19:12
Add and use connectivity package for states (grpc#1430)

* Add and use connectivity package
* Mark cc state APIs as experimental

Add balancer, resolver and connectivity package.

add balancer_v1_wrapper.go and remove grpclb.go

all test passed, how?

end2end passed, nil pointer failure, ac.wait return nil transport

fix ac.wait return nil transport, races not fixed (accessing cc.balancer)

end2end passed, TODO grpclb

all test passed

add grpclb back

move building balancer out from a goroutine to avoid race

Otherwise, Dial() and Close() may race.

Mark new APIs experimental

split resetAddrConn into newAddrConn and connect

add acBalancerWrapper

rename file to balancer_conn_wrappers.go

remove grpclog.Print

make TODO(bar)

remove Print

comments fixes

add missing license

fix build failure after merge

fix race in grpclb
@dfawley dfawley assigned menghanl and unassigned dfawley Aug 23, 2017
@menghanl menghanl assigned dfawley and unassigned menghanl Aug 23, 2017
clientconn.go Outdated
dopts dialOptions
events trace.EventLog
acbw *acBalancerWrapper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a SubConn type instead? We only need it so we can notify the balancer that the ac's state changed, IIUC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if !ok {
return
}
ccb.cc.removeAddrConn(acbw.getAddrConn(), errConnClosing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be errConnDrain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// If it's in the list, the connection will be kept.
// If it's not in the list, the connection will gracefully closed, and
// a new connection will be created.
UpdateAddresses([]resolver.Address)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What SubConn state changes could result from this call if the current address is not in the new list? It should depend on the current state of the subconn, presumably:

Idle->Idle (Would this trigger a callback?)
Connecting->Connecting (to new address) ? (Would this trigger a callback?)
Ready->TransientFailure->Connecting->Ready ?
TransientFailure->Connecting->Ready
Shutdown->Shutdown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Idle and Shutdown, it won't trigger a callback.
For the other 3 cases, yes, they can all happen.
We actually can have Ready->Connecting->Ready.

I updated the comment to mention this can trigger state update for the subconn.
I also added a TODO to check all the state transitions.

// The returned put() function will be called once the rpc has finished, with the
// final status of that RPC.
// put may be nil if balancer doesn't care about the RPC status.
Pick(ctx context.Context, opts PickOptions) (conn SubConn, put func(PutInfo), err error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/put/done/g?

Or maybe rpcDone or rpcEnd?

(I would rather this not seem like the opposite operation as Pick().)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dfawley dfawley added the Type: API Change Breaking API changes (experimental APIs only!) label Aug 24, 2017
if oldSC != nil {
// Teardown old sc.
delete(bw.conns, oldA)
delete(bw.connSt, oldSC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q. Why is this not protected via bw.mu as in line 187-188?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I believe this needs to be protected by the mutex. Will fix it soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Please take a look.
Thanks for pointing this out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Np. Thanks!

@menghanl menghanl added this to the 1.7 Release milestone Aug 30, 2017
@menghanl menghanl merged commit 8233e12 into grpc:master Aug 31, 2017
@menghanl menghanl deleted the bar_api_change branch August 31, 2017 17:59
@menghanl menghanl mentioned this pull request Nov 29, 2017
8 tasks
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: API Change Breaking API changes (experimental APIs only!)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants