Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A21-Service-Config-Error-Handling #100

Merged
merged 13 commits into from
Mar 4, 2019

Conversation

yashykt
Copy link
Member

@yashykt yashykt commented Sep 18, 2018

@ejona86
Copy link
Member

ejona86 commented Oct 1, 2018

@yashykt, this looks like a previous proposal. Please ping the PR when it is up-to-date with the latest proposal.

@yashykt
Copy link
Member Author

yashykt commented Oct 1, 2018

@ejona86 From the meeting, we had discussed primarily 3 changes -

  1. Service config validation should ignore unknown fields.
  2. Changes to loadbalancing policy to make it a list.
  3. Behavior when the two service config channel arguments are set.

2 seemed like it should be a different proposal, while 3 doesn't seem to be a change, but I guess it is better to call it explicitly. I have added the two. Thanks for looking!

@yashykt yashykt force-pushed the serviceconfigerrorhandling branch from 604d2cf to f0f83ac Compare February 12, 2019 19:39
@yashykt
Copy link
Member Author

yashykt commented Feb 12, 2019

@markdroth PTAL. I have omitted details on the current state of the various fields with their invalid values. Let me know if those should be added here.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will need to be changes, but this looks pretty good. Since I'll be OOO, I'm fine if it goes in without another review from me.

configuration.
1. If it previously received an update with a valid service config, it should
continue using the configuration from that config.
2. If this is a new client or a client which has not yet received a valid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be clear: a null (non-present) service config is a valid service config for this. If there previously was not a service config, and now there is one but it is invalid, that should fall into case (1). Note that empty is different than null.

I think I would rephrase it like this:

	1. If it previously had selected a valid service config or no service config, it
	should continue using that previous configuration. An empty service config is valid.
	2. If the client has not previously selected a valid service config or no service
	config, for example a new client, then:

@markdroth and I just had a conversation, and since the service config is supposed to be atomic along with the addresses, we sort of agreed that the name resolver could fail the entire update (no longer provide address updates while the service config is broken). At that point, it would be become more like:

	1. If it previously had selected addresses and service config (or no service config), it
	should continue using that previous configuration (including address). An empty
	service config is valid.
	2. If it has not previously selected addresses and service config, for example a new
	client, then:

Copy link
Member Author

@yashykt yashykt Feb 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting the first rephrasing suggestion.

I don't think failing the entire update works. In the scenario that we have a bad service config, and the addresses change later, the entire system would go down even with a default service config configured.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I agree that we'd have the name resolver return the tuple of (list of addresses, magic cookie for parsed service config). The Channel however may decide to fail the entire update if there isn't a default service config.

A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks really good. There's only one substantive issue (the one about the API semantics between the resolver and the channel); the other stuff is mostly minor nits.

Thanks for doing this!

Reviewed 1 of 1 files at r6.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @carl-mastrangelo, @ejona86, @lyuxuan, and @yashykt)


A21-Service-Config-Error-Handling.md, line 49 at r5 (raw file):

Previously, yashykt (Yash Tibrewal) wrote…

Accepting the first rephrasing suggestion.

I don't think failing the entire update works. In the scenario that we have a bad service config, and the addresses change later, the entire system would go down even with a default service config configured.

I think the way we want this to work is the following:

  • When the resolver returns a result, it returns two things:
    • A list of addresses (which may be empty, as per the request routing design).
    • A parsed service config or an error indicating that the config parsing failed.
  • The resolver always returns the most recent data it has, even if that most recent data is an error indicating that parsing failed. (In other words, the resolver is not responsible for caching previously parsed service configs; that's the channel's responsibility.)
  • When the channel gets a result from the resolver, if the resolver indicates that service config parsing failed, the channel can decide how to handle it.
    • If the channel has a parsed config that was returned by the resolver as part of a previous result, it can keep using that.
    • Otherwise, if the channel has a default config set via the client API, it can use that.
    • If neither of the above two options are available, the channel can ignore the resolver result and put itself into state TRANSIENT_FAILURE.

A21-Service-Config-Error-Handling.md, line 144 at r5 (raw file):

Previously, yashykt (Yash Tibrewal) wrote…

view comment above this failing when new addresses are received with a bad service config and the entire system going down even when a default service config is provided.

See above. I think we want the caching to happen in the channel, not the resolver.


A21-Service-Config-Error-Handling.md, line 66 at r6 (raw file):

of the service config needs to be in a valid JSON format. If the service config
is well formatted, the client validates each field that it recognizes while
ignoring the unknown fields, i.e., unknown fields do not affect validation. For

Suggest starting a new paragraph between these two sentences.


A21-Service-Config-Error-Handling.md, line 67 at r6 (raw file):

is well formatted, the client validates each field that it recognizes while
ignoring the unknown fields, i.e., unknown fields do not affect validation. For
known fields, valid values are defined by the field itself. Fields can accept

I think the wording "defined by the field itself" is a bit vague. What we should say here is that any future gRFC adding new fields to the service config must clearly define the validation criteria for the fields' values.

Note that since this requirement did not exist previously, we need to define the criteria for all existing fields in this gRFC. Alternatively, we could retroactively alter all existing gRFCs that add service config fields to define their criteria, which would require modifications to the following:


A21-Service-Config-Error-Handling.md, line 68 at r6 (raw file):

ignoring the unknown fields, i.e., unknown fields do not affect validation. For
known fields, valid values are defined by the field itself. Fields can accept
lists where some of the values may be unknown. As long as there is atleast one

s/atleast/at least/


A21-Service-Config-Error-Handling.md, line 173 at r6 (raw file):

require us to support partial roll-backs.)

C -

These three are all being formatted on the same line, which is probably not what you intended.


### Behavior on receiving a new gRPC Config

1. A gRPC config can have multiple service configs. When multiple service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have been calling this the Service Config "choice".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it better to be verbose here, and be clear about what needs to be done from the point after receiving the grpc config

config. An empty service config is valid.
2. If the client has not previously selected a valid service config or no
service config, for example a new client, then:
1. It uses the default service config, if configured. Note that, an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be clarified as well? If the service config is passed as a channel arg (or via some, non-resolver based way), what should the error handling look like? This is also the same scenario as this bullet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default service config also needs to be validated if that's what you mean.

A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
@yashykt yashykt force-pushed the serviceconfigerrorhandling branch from cb93a29 to fb97f12 Compare February 28, 2019 23:20
@yashykt
Copy link
Member Author

yashykt commented Feb 28, 2019

@markdroth I will send out PRs for the already defined fields that lack clarity on validation.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the resolver does not see the default service config, then we would need a different way to validate the default service config

The channel (and not the resolver) will be responsible for validating the default config, and it can use the same API that the resolver would use for doing the parsing.

There is a related question of having a way for applications to parse a service config and check that it's valid. We may want to discuss providing some sort of API for that.

A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
A21-Service-Config-Error-Handling.md Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

I'm approving this, but please don't merge until you've gotten approval on the PRs to update the existing docs that define service config fields.

Thanks!

@yashykt
Copy link
Member Author

yashykt commented Mar 4, 2019

PRs #128, #129, #130 and grpc/grpc#18225 have been merged to update existing service config fields with the validation criteria

@yashykt
Copy link
Member Author

yashykt commented Mar 4, 2019

Thanks for all the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants