-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A21-Service-Config-Error-Handling #100
Conversation
@yashykt, this looks like a previous proposal. Please ping the PR when it is up-to-date with the latest proposal. |
…ad balancing policy
@ejona86 From the meeting, we had discussed primarily 3 changes -
2 seemed like it should be a different proposal, while 3 doesn't seem to be a change, but I guess it is better to call it explicitly. I have added the two. Thanks for looking! |
3d4e5f4
to
604d2cf
Compare
604d2cf
to
f0f83ac
Compare
@markdroth PTAL. I have omitted details on the current state of the various fields with their invalid values. Let me know if those should be added here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will need to be changes, but this looks pretty good. Since I'll be OOO, I'm fine if it goes in without another review from me.
A21-Service-Config-Error-Handling.md
Outdated
configuration. | ||
1. If it previously received an update with a valid service config, it should | ||
continue using the configuration from that config. | ||
2. If this is a new client or a client which has not yet received a valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be clear: a null (non-present) service config is a valid service config for this. If there previously was not a service config, and now there is one but it is invalid, that should fall into case (1). Note that empty is different than null.
I think I would rephrase it like this:
 1. If it previously had selected a valid service config or no service config, it
 should continue using that previous configuration. An empty service config is valid.
2. If the client has not previously selected a valid service config or no service
 config, for example a new client, then:
@markdroth and I just had a conversation, and since the service config is supposed to be atomic along with the addresses, we sort of agreed that the name resolver could fail the entire update (no longer provide address updates while the service config is broken). At that point, it would be become more like:
 1. If it previously had selected addresses and service config (or no service config), it
should continue using that previous configuration (including address). An empty
service config is valid.
2. If it has not previously selected addresses and service config, for example a new
client, then:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepting the first rephrasing suggestion.
I don't think failing the entire update works. In the scenario that we have a bad service config, and the addresses change later, the entire system would go down even with a default service config configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I agree that we'd have the name resolver return the tuple of (list of addresses, magic cookie for parsed service config). The Channel however may decide to fail the entire update if there isn't a default service config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks really good. There's only one substantive issue (the one about the API semantics between the resolver and the channel); the other stuff is mostly minor nits.
Thanks for doing this!
Reviewed 1 of 1 files at r6.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @carl-mastrangelo, @ejona86, @lyuxuan, and @yashykt)
A21-Service-Config-Error-Handling.md, line 49 at r5 (raw file):
Previously, yashykt (Yash Tibrewal) wrote…
Accepting the first rephrasing suggestion.
I don't think failing the entire update works. In the scenario that we have a bad service config, and the addresses change later, the entire system would go down even with a default service config configured.
I think the way we want this to work is the following:
- When the resolver returns a result, it returns two things:
- A list of addresses (which may be empty, as per the request routing design).
- A parsed service config or an error indicating that the config parsing failed.
- The resolver always returns the most recent data it has, even if that most recent data is an error indicating that parsing failed. (In other words, the resolver is not responsible for caching previously parsed service configs; that's the channel's responsibility.)
- When the channel gets a result from the resolver, if the resolver indicates that service config parsing failed, the channel can decide how to handle it.
- If the channel has a parsed config that was returned by the resolver as part of a previous result, it can keep using that.
- Otherwise, if the channel has a default config set via the client API, it can use that.
- If neither of the above two options are available, the channel can ignore the resolver result and put itself into state TRANSIENT_FAILURE.
A21-Service-Config-Error-Handling.md, line 144 at r5 (raw file):
Previously, yashykt (Yash Tibrewal) wrote…
view comment above this failing when new addresses are received with a bad service config and the entire system going down even when a default service config is provided.
See above. I think we want the caching to happen in the channel, not the resolver.
A21-Service-Config-Error-Handling.md, line 66 at r6 (raw file):
of the service config needs to be in a valid JSON format. If the service config is well formatted, the client validates each field that it recognizes while ignoring the unknown fields, i.e., unknown fields do not affect validation. For
Suggest starting a new paragraph between these two sentences.
A21-Service-Config-Error-Handling.md, line 67 at r6 (raw file):
is well formatted, the client validates each field that it recognizes while ignoring the unknown fields, i.e., unknown fields do not affect validation. For known fields, valid values are defined by the field itself. Fields can accept
I think the wording "defined by the field itself" is a bit vague. What we should say here is that any future gRFC adding new fields to the service config must clearly define the validation criteria for the fields' values.
Note that since this requirement did not exist previously, we need to define the criteria for all existing fields in this gRFC. Alternatively, we could retroactively alter all existing gRFCs that add service config fields to define their criteria, which would require modifications to the following:
- the fields defined in the original service config design
- A6: Client Retries
- A17: Client-side Health Checking
- A24: LB Policy Config
A21-Service-Config-Error-Handling.md, line 68 at r6 (raw file):
ignoring the unknown fields, i.e., unknown fields do not affect validation. For known fields, valid values are defined by the field itself. Fields can accept lists where some of the values may be unknown. As long as there is atleast one
s/atleast/at least/
A21-Service-Config-Error-Handling.md, line 173 at r6 (raw file):
require us to support partial roll-backs.) C -
These three are all being formatted on the same line, which is probably not what you intended.
|
||
### Behavior on receiving a new gRPC Config | ||
|
||
1. A gRPC config can have multiple service configs. When multiple service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been calling this the Service Config "choice".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it better to be verbose here, and be clear about what needs to be done from the point after receiving the grpc config
A21-Service-Config-Error-Handling.md
Outdated
config. An empty service config is valid. | ||
2. If the client has not previously selected a valid service config or no | ||
service config, for example a new client, then: | ||
1. It uses the default service config, if configured. Note that, an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be clarified as well? If the service config is passed as a channel arg (or via some, non-resolver based way), what should the error handling look like? This is also the same scenario as this bullet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default service config also needs to be validated if that's what you mean.
cb93a29
to
fb97f12
Compare
@markdroth I will send out PRs for the already defined fields that lack clarity on validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the resolver does not see the default service config, then we would need a different way to validate the default service config
The channel (and not the resolver) will be responsible for validating the default config, and it can use the same API that the resolver would use for doing the parsing.
There is a related question of having a way for applications to parse a service config and check that it's valid. We may want to discuss providing some sort of API for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I'm approving this, but please don't merge until you've gotten approval on the PRs to update the existing docs that define service config fields.
Thanks!
A24 update: Add validation rules as per #100
A17 update: Specify validation rule as per #100
PRs #128, #129, #130 and grpc/grpc#18225 have been merged to update existing service config fields with the validation criteria |
Thanks for all the reviews! |
Discussion thread at https://groups.google.com/forum/#!topic/grpc-io/bAxyse6c2eI