Allow ingester's read path to return gRPC errors #6680
I think I see a bug in pkg/ingester/active_series.go.
Please see comments.
pkg/ingester/errors_test.go
Outdated
}

func TestHandleReadErrorWithHTTPGRPC(t *testing.T) {
	originalMsg := "this is an error"
[Nit] Suggested change:
- originalMsg := "this is an error"
+ const originalMsg = "this is an error"
pkg/ingester/errors.go
Outdated
func handleReadError(err error) error {
	var (
Shouldn't we do this?
- func handleReadError(err error) error {
- 	var (
+ func handleReadError(err error) error {
+ 	if err == nil {
+ 		return nil
+ 	}
+ 	var (
I don't see the check in the calls.
Wait, I was wrong: I hadn't noticed that what we call is an Ingester method with the same name. Can we just move this code into that method? It's confusing to have a method that sometimes calls a function with the same name, IMO.
All handleError-like functions are placed in errors.go, but as the documentation of some of them states, some of them are there only for backwards compatibility and will be removed in 2.12.
WDYT if we do this kind of improvement once we have removed the legacy code?
In Mimir 2.12 we should get rid of -ingester.return-only-grpc-errors too.
All handleError-like functions are placed in errors.go, but as the documentation of some of them states, some of them are there only for backwards compatibility and will be removed in 2.12.
The documentation of this method doesn't state anything. It's just called handleReadError and we need to look into it to understand what it's doing. I'd rather call it mapReadErrorToGRPCStatus.
OTOH, the caller code is:
func (i *Ingester) handleReadError(err error) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return err
	}
	if i.cfg.ReturnOnlyGRPCErrors {
		return handleReadError(err)
	}
	return handleReadErrorWithHTTPGRPC(err)
}
Which doesn't clarify what is happening (what is handling? it should be mapping IMO).
WDYT if we do this kind of improvement once we removed the legacy code?
I foresee this code staying here for at least 6 to 9 months, so I would prioritize making it as expressive and clean as possible now.
I also noticed that handleReadError isn't handling errors, but translating them.
I just kept the same naming convention that was used previously. I will now rename all the methods.
I also agree with Oleg that handleReadError can be inlined into Ingester.handleReadError, since it's just cognitive overhead for this tiny amount of code to be in its own function. Another benefit of inlining it would be that the test would be on Ingester.handleReadError instead, for better coverage (and again less total complexity).
I wouldn't do it right now, but only once the HTTP status codes are removed.
Currently, the goal of Ingester.HandleReadError() (renamed to Ingester.MapReadErrorToErrorWithStatus()) is to check whether -ingester.return-only-grpc-errors is true or false, and to call the right function that will do the actual mapping. The corresponding test covers this behavior.
The actual mapping is currently done in errors.go (because all the error-related things are in that source file), and the tests for both cases of mapping (httpgrpc and grpc) are done there. Once we remove the legacy part, it will make sense to move the mapping into ingester.go, IMO.
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
Re:
And all the comments about removing stuff in 2.12: do we have a migration plan sketched in some issue to understand which version will carry what? We're delivering the

Removing this flag (and assuming a truthy value, right?) means that we're removing the return of non-gRPC status codes from the ingester. Before removing something, we need to keep the feature as deprecated for at least two versions. If you just remove the flag in 2.12 and remove the code commented as meant to be removed, someone updating from 2.10 will have their distributors receiving gRPC status messages that they don't understand.

Additionally, I see code in the distributor error handling saying it should be removed in 2.12: but since nobody will have the flag enabled in 2.11, it means that when updating to 2.12, some distributors will be updated to not handle HTTP status codes, while ingesters are still returning HTTP status codes.

Summarizing: we need to have an issue/markdown documenting the migration path, and what's changing in each version. All of that should respect the two-versions deprecation (not removing) policy.
@colega What is the best place to write the migration path? The idea was really to get rid of this in Mimir 2.12.0, but we are slow with moving ahead. Then we'll think about the actual migration plan. I am expecting this one to be the last ingester-related change. And the CLI flag in question is ingester-related.
require.Error(t, tooBusyError)
require.Equal(t, "the ingester is currently too busy to process queries, try again later", tooBusyError.Error())
[Nit] Simplification
- require.Error(t, tooBusyError)
- require.Equal(t, "the ingester is currently too busy to process queries, try again later", tooBusyError.Error())
+ require.EqualError(t, tooBusyError, "the ingester is currently too busy to process queries, try again later")
Signed-off-by: Yuri Nikolic <[email protected]>
I think an issue is the easiest way to go.
We need to write a plan before stating versions.
Signed-off-by: Yuri Nikolic <[email protected]>
@colega So what is the best way to merge this PR? Remove versions
As agreed via Slack, the following actions will be done:
pkg/distributor/errors.go
Outdated
// TODO This code is needed for backwards compatibility, since ingesters may still return
// errors created by httpgrpc.Errorf(). If pushErr is one of those errors, we just propagate
- // it. This code should be removed in mimir 2.12.0.
+ // it. This code should be removed together with the removal of `-ingester.return-only-grpc-errors`.
I can't be sure that this code should be removed when the flag is removed. That is a conclusion that should come from a migration plan, which we don't have yet.
Please review this comment once you have the migration plan.
I have removed all related TODOs.
Thanks.
pkg/util/http.go
Outdated
// IsHTTPStatusCode returns true if the given code is a valid HTTP status code, or false otherwise.
func IsHTTPStatusCode(code codes.Code) bool {
	httpStatus := http.StatusText(int(code))
	return httpStatus != ""
}
This needs a test. I can imagine a future Go version making http.StatusText return an "Unknown" string for unknown codes, which would break this.
I will replace it with the previous implementation
return int(statusCode) >= 100 && int(statusCode) < 600
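As a rough illustration of the two approaches, the sketch below compares the range check with the http.StatusText lookup. A plain int stands in for the gRPC codes.Code type, and the lower-case function names are hypothetical; only net/http from the standard library is used.

```go
package main

import (
	"fmt"
	"net/http"
)

// isHTTPStatusCode reports whether code falls in the HTTP status range [100, 600).
// A plain int is an assumption standing in for codes.Code.
func isHTTPStatusCode(code int) bool {
	return code >= 100 && code < 600
}

// hasStatusText is the lookup-based variant: it accepts only codes for which
// net/http knows a status text, so its result depends on that table.
func hasStatusText(code int) bool {
	return http.StatusText(code) != ""
}

func main() {
	// 14 is the numeric value of the gRPC code Unavailable; 503 is an HTTP status.
	for _, code := range []int{14, 99, 100, 503, 599, 600} {
		fmt.Printf("code=%d range=%t statusText=%t\n", code, isHTTPStatusCode(code), hasStatusText(code))
	}
}
```

The range check does not depend on the contents of the net/http status table, which is the property the review comment asks for. The two variants disagree on unassigned values inside the range, such as 599.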
Signed-off-by: Yuri Nikolic <[email protected]>
Signed-off-by: Yuri Nikolic <[email protected]>
What this PR does
In #6443 we introduced the experimental CLI flag -ingester.return-only-grpc-errors that, when set to true, makes the ingester's write path return errors with gRPC codes only. This PR extends the effect of setting this flag to true to the ingester's read path. The default value remains false, meaning that the semantics of both the ingester's write and read paths are the same as before #6443 and this PR, respectively.

This PR also enriches the mimirpb.ErrorCause enum with an additional symbolic value, mimirpb.TOO_BUSY.

This table represents the differences between errors returned by the ingester's read path depending on the -ingester.return-only-grpc-errors configuration.

| Error | -ingester.return-only-grpc-errors=false | -ingester.return-only-grpc-errors=true |
| --- | --- | --- |
| tooBusyError | httpgrpc.Errorf() - code: http.StatusServiceUnavailable | status.New() - code: codes.ResourceExhausted - cause: mimirpb.TOO_BUSY |
| unavailableError | status.New() - code: codes.Unavailable - remark: opens circuit breaker | status.New() - code: codes.Unavailable - cause: mimirpb.SERVICE_UNAVAILABLE - remark: opens circuit breaker |
| ingesterError interface | Ingester.Push() - implicitly assigned codes.Unknown by gRPC | status.New() - code: codes.Internal - cause: mimirpb.UNKNOWN_CAUSE |
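The mapping described in this section can be sketched as a small lookup, assuming the flag is false in one configuration and true in the other. Everything here is illustrative: the errorKind type, the mapping struct and mapReadError are hypothetical names, codes and causes are kept as plain strings rather than real gRPC or mimirpb values, and the third kind is only an approximate reading of the ingesterError row.

```go
package main

import "fmt"

// errorKind enumerates the three error rows described above (hypothetical type).
type errorKind int

const (
	tooBusy errorKind = iota
	unavailable
	genericIngesterError // approximate label for the ingesterError row
)

// mapping holds the symbolic code and cause; cause is empty when none is attached.
type mapping struct {
	code  string
	cause string
}

// mapReadError returns the code and cause for a kind of error, depending on
// whether only gRPC errors may be returned.
func mapReadError(kind errorKind, grpcOnly bool) mapping {
	if grpcOnly {
		switch kind {
		case tooBusy:
			return mapping{"codes.ResourceExhausted", "mimirpb.TOO_BUSY"}
		case unavailable:
			return mapping{"codes.Unavailable", "mimirpb.SERVICE_UNAVAILABLE"}
		default:
			return mapping{"codes.Internal", "mimirpb.UNKNOWN_CAUSE"}
		}
	}
	switch kind {
	case tooBusy:
		return mapping{"http.StatusServiceUnavailable", ""}
	case unavailable:
		return mapping{"codes.Unavailable", ""}
	default:
		// Without an explicit status, gRPC reports codes.Unknown.
		return mapping{"codes.Unknown", ""}
	}
}

func main() {
	for _, grpcOnly := range []bool{false, true} {
		for _, kind := range []errorKind{tooBusy, unavailable, genericIngesterError} {
			fmt.Printf("grpcOnly=%t kind=%d -> %+v\n", grpcOnly, kind, mapReadError(kind, grpcOnly))
		}
	}
}
```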
Which issue(s) this PR fixes or relates to
Part of #6008.
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]