Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ingester's read path to return gRPC errors #6680

Merged
merged 7 commits into from
Nov 24, 2023

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Nov 17, 2023

What this PR does

In #6443 we introduced the experimental CLI flag -ingester.return-grpc-errors-only that, when set to true makes ingester's write path return errors with gRPC codes only. This PR extends the effect of setting this flag to true to the ingester's read path. The default value remains false, meaning that the semantics of both ingester's write and read path is the same as before #6443 and this PR respectively.

This PR also enriches the mimirpb.ErrorCause enum with an additional symbolic value mimirpb.TOO_BUSY.

This table represents the differences between errors returned by ingester's read path depending on the -ingester.return-grpc-errors-only configuration.

error -ingester.grpc-errors-enabled=false -ingester.grpc-errors-enabled=false
tooBusyError - created by: httpgrpc.Errorf()
- code: http.StatusServiceUnavailable
- created by: status.New()
- code: codes.ResourceExhausted
- cause: mimirpb.TOO_BUSY
unavailableError - created by: status.New()
- code: codes.Unavailable
- remark: opens circuit breaker
- created by: status.New()
- code: codes.Unavailable
- cause: mimirpb.SERVICE_UNAVAILABLE
- remark: opens circuit breaker
other errors not implementing
ingesterError interface
- propagated by Ingester.Push()
- implicitly assigned codes.Unknown by gRPC
- created by: status.New()
- code: codes.Internal
- cause: mimirpb.UNKNOWN_CAUSE

Which issue(s) this PR fixes or relates to

Part of #6008.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@duricanikolic duricanikolic force-pushed the yuri/handling-errors-new branch from ff2322b to 1a38827 Compare November 23, 2023 15:12
@duricanikolic duricanikolic changed the title Making tooBusyError a gRPC error Allow ingester's read path to return gRPC errors Nov 23, 2023
@duricanikolic duricanikolic force-pushed the yuri/handling-errors-new branch from 1a38827 to 1bc3fc2 Compare November 23, 2023 15:18
@duricanikolic duricanikolic self-assigned this Nov 23, 2023
@duricanikolic duricanikolic marked this pull request as ready for review November 23, 2023 15:35
@duricanikolic duricanikolic requested a review from a team as a code owner November 23, 2023 15:35
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see a bug in pkg/ingester/active_series.go.

CHANGELOG.md Outdated Show resolved Hide resolved
pkg/ingester/active_series.go Outdated Show resolved Hide resolved
@aknuds1 aknuds1 requested a review from a team November 23, 2023 16:03
@aknuds1 aknuds1 added enhancement New feature or request component/ingester labels Nov 23, 2023
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments.

pkg/ingester/errors_test.go Outdated Show resolved Hide resolved
pkg/ingester/errors_test.go Outdated Show resolved Hide resolved
}

func TestHandleReadErrorWithHTTPGRPC(t *testing.T) {
originalMsg := "this is an error"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit]

Suggested change
originalMsg := "this is an error"
const originalMsg = "this is an error"

pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
pkg/ingester/ingester.go Outdated Show resolved Hide resolved
@aknuds1 aknuds1 requested a review from a team November 23, 2023 16:43
Comment on lines 599 to 600
func handleReadError(err error) error {
var (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do this?

Suggested change
func handleReadError(err error) error {
var (
func handleReadError(err error) error {
if err == nil {
return nil
}
var (

I don't see the check in the calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm wrong because I didn't see that what we call is an ingester function with the same name. Can we just put this code into that method? It's confusing to have a method which sometimes calls a function with same name, IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All handleError-like functions are placed in errors.go, but as the documentation of some of them states, some of them are there only for backwards compatibility and will be removed in 2.12.
WDYT if we do this kind of improvement once we removed the legacy code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In mimir 2.12 we should get rid of -ingester.return-only-grpc-errors too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All handleError-like functions are placed in errors.go, but as the documentation of some of them states, some of them are there only for backwards compatibility and will be removed in 2.12.

The documentation of this method doesn't state anything. It's just called handleReadError and we need to look into it to understand what it's doing. I'd rather call it mapReadErrorToGRPCStatus.

OTOH, the caller code is:

func (i *Ingester) handleReadError(err error) error {
	if err == nil {
		return nil
	}

	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return err
	}

	if i.cfg.ReturnOnlyGRPCErrors {
		return handleReadError(err)
	}
	return handleReadErrorWithHTTPGRPC(err)
}

Which doesn't clarify what is happening (what is handling? it should be mapping IMO).

WDYT if we do this kind of improvement once we removed the legacy code?

I forsee this code staying here for at least between 6 and 9 months, so I would prioritize making it the most expressive and cleaner now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed that handleReadError isn't handling errors, but translating them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just kept the same naming convention that was used previously. I will now rename all the methods.

Copy link
Contributor

@aknuds1 aknuds1 Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with Oleg that handleReadError can be inlined into Ingester.handleReadError since it's just cognitive overhead for this tiny amount of code to be in its own function. Another benefit of inlining it would be that the test would be on ingester.handleReadError instead, for better coverage (and again less total complexity).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do it right now but only once the HTTP status codes are removed.
Currently, the goal of Ingester.HandleReadError() (renamed in Ingester.MapReadErrorToErrorWithStatus()) is to understand whether the -ingester.return-only-grpc-errors is true or false, and to call the right function that will do the actual mapping. The corresponding test tests this behavior.

The actual mapping is currently done in errors.go (because all the error-related things are in that source), and the tests for both cases of mapping (httpgrpc and grpc) is done there. Once we remove the legacy part, it will make sense to move the mapping in ingester.go. IMO

@duricanikolic duricanikolic force-pushed the yuri/handling-errors-new branch from 1bc3fc2 to de22b89 Compare November 23, 2023 17:44
Signed-off-by: Yuri Nikolic <[email protected]>
@colega
Copy link
Contributor

colega commented Nov 24, 2023

Re:

@duricanikolic: In mimir 2.12 we should get rid of -ingester.return-only-grpc-errors too.

And all the comments about removing stuff in 2.12:

Do we have a migration plan scratched in some issue to understand which version will carry what?

We're delivering the -ingester.return-only-grpc-errors flag in 2.11, but note that nobody will have this flag enabled.

Removing this flag (and assuming a truthy value, right?) means that we're removing the return of non-grpc status codes from the ingester. Before removing something, we need to keep the feature as deprecated for at least two versions.

If you just remove the flag in 2.12 and remove the code commented as meant to be removed, someone updating from 2.10 will have their distributors receiving gRPC status messages that they don't understand.

Additionally, I see code in the distributor error handling saying it should be removed in 2.12: but since nobody will have the flag enabled in 2.11, it means that when updating to 2.12, some distributors will be updated to not handle HTTP status code, while ingesters are still returning HTTP status codes.

Summarizing: we need to have an issue/markdown documenting the migration path, and what's changing in each version. All of that should respect the two-versions deprecation (not removing) policy.

@duricanikolic
Copy link
Contributor Author

duricanikolic commented Nov 24, 2023

Re:

@duricanikolic: In mimir 2.12 we should get rid of -ingester.return-only-grpc-errors too.

And all the comments about removing stuff in 2.12:

Do we have a migration plan scratched in some issue to understand which version will carry what?

We're delivering the -ingester.return-only-grpc-errors flag in 2.11, but note that nobody will have this flag enabled.

Removing this flag (and assuming a truthy value, right?) means that we're removing the return of non-grpc status codes from the ingester. Before removing something, we need to keep the feature as deprecated for at least two versions.

If you just remove the flag in 2.12 and remove the code commented as meant to be removed, someone updating from 2.10 will have their distributors receiving gRPC status messages that they don't understand.

Additionally, I see code in the distributor error handling saying it should be removed in 2.12: but since nobody will have the flag enabled in 2.11, it means that when updating to 2.12, some distributors will be updated to not handle HTTP status code, while ingesters are still returning HTTP status codes.

Summarizing: we need to have an issue/markdown documenting the migration path, and what's changing in each version. All of that should respect the two-versions deprecation (not removing) policy.

@colega What is the best place to write the migration path?

The idea was really to get rid of this in mimir 2.12.0, but we are slow with moving ahead.
I have now replaced the related 2.12.0 occurrences with 2.14.0.

Then we'll think about actual migration plan. I am expecting this one to be the last ingester-related change. And the CLI flag in question is ingester-related.

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side it looks pretty good now, although I agree with @colega that it would be beneficial to inline handleReadError (unless I'm overlooking something).

Before merging, @colega's review should be concluded though :)

pkg/ingester/errors.go Show resolved Hide resolved
Comment on lines +241 to +242
require.Error(t, tooBusyError)
require.Equal(t, "the ingester is currently too busy to process queries, try again later", tooBusyError.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit] Simplification

Suggested change
require.Error(t, tooBusyError)
require.Equal(t, "the ingester is currently too busy to process queries, try again later", tooBusyError.Error())
require.EqualError(t, tooBusyError, "the ingester is currently too busy to process queries, try again later")

Comment on lines 599 to 600
func handleReadError(err error) error {
var (
Copy link
Contributor

@aknuds1 aknuds1 Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with Oleg that handleReadError can be inlined into Ingester.handleReadError since it's just cognitive overhead for this tiny amount of code to be in its own function. Another benefit of inlining it would be that the test would be on ingester.handleReadError instead, for better coverage (and again less total complexity).

Signed-off-by: Yuri Nikolic <[email protected]>
@colega
Copy link
Contributor

colega commented Nov 24, 2023

What is the best place to write the migration path?

I think an issue is the easiest way to go.

I have now replaced the related 2.12.0 occurrences with 2.14.0.

We need to write a plan before stating versions.

@duricanikolic
Copy link
Contributor Author

We need to write a plan before stating versions.

@colega So what is the best way to merge this PR? Remove versions

What is the best place to write the migration path?

I think an issue is the easiest way to go.

I have now replaced the related 2.12.0 occurrences with 2.14.0.

We need to write a plan before stating versions.

As agreed via Slack, the following actions will be done:

  • Mentions of mimir versions where the flag and methods will be removed are removed from the current TODOs
  • The issue All errors should be safe to be wrapped and retained #6008 will be updated with the migration path
  • Once the migration path is approved, the TODOs will be denoted with the correct versions

// TODO This code is needed for backwards compatibility, since ingesters may still return
// errors created by httpgrpc.Errorf(). If pushErr is one of those errors, we just propagate
// it. This code should be removed in mimir 2.12.0.
// it. This code should be removed together with the removal of `-ingester.return-only-grpc-errors`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't be sure that this code should be removed when the flag is removed. That is a conclusion that should come from a migration plan, which we don't have yet.

Please review this comment once you have the migration plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed all related TODOs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

pkg/util/http.go Outdated
Comment on lines 347 to 351
// IsHTTPStatusCode returns true if the given code is a valid HTTP status code, or false otherwise.
func IsHTTPStatusCode(code codes.Code) bool {
httpStatus := http.StatusText(int(code))
return httpStatus != ""
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test. I can imagine a next golang version returning "Unknown" string for unknown http.StatusText and breaking this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will replace it with the previous implementation

return int(statusCode) >= 100 && int(statusCode) < 600

@duricanikolic duricanikolic merged commit bdb2e2d into main Nov 24, 2023
27 checks passed
@duricanikolic duricanikolic deleted the yuri/handling-errors-new branch November 24, 2023 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/ingester enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants