Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP retry spec clarification needed #2737

Closed
mateuszrzeszutek opened this issue Aug 17, 2022 · 3 comments · Fixed by #2743
Closed

HTTP retry spec clarification needed #2737

mateuszrzeszutek opened this issue Aug 17, 2022 · 3 comments · Fixed by #2743
Assignees
Labels
area:semantic-conventions Related to semantic conventions [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR semconv:HTTP spec:trace Related to the specification/trace directory

Comments

@mateuszrzeszutek
Copy link
Member

What are you trying to achieve?

I started working on implementing the HTTP retry spec in the Java HTTP client instrumentations, and I ran into several questions that I'd like to clarify.

  • According to this example redirects don't add the http.retry_count attribute, is that right?
  • What about authorization scenarios? E.g. the server returns 401, so the HTTP client automatically adds a new Authorization header and retries the request. Should this scenario follow the same behavior as redirects?
  • Does the retry counter reset if e.g. redirect requests are retried? E.g.
    GET / - 500 (CLIENT, trace=t1, span=s1)
     |
     --- server (SERVER, trace=t1, span=s2)
    
    GET / - 302 (CLIENT, trace=t2, span=s1, http.retry_count=1)
     |
     --- server (SERVER, trace=t2, span=s2)
    
    GET /hello - 500 (CLIENT, trace=t3, span=s1)
     |
     --- server (SERVER, trace=t3, span=s2)
    
    GET /hello - 200 (CLIENT, trace=t4, span=s1, http.retry_count=1)
     |
     --- server (SERVER, trace=t4, span=s2)
    
  • I've skimmed through several HTTP clients sources, and some of them use the same code for all kind of retries (redirects, auth, actual retries). Would it be correct to assume that status codes >=400 (or 500? depending on the auth scenario I guess) should be treated as retries? Request finished with 3xx (and lower) would not get the retry count attribute.

Additional context.

open-telemetry/opentelemetry-java-instrumentation#5722

@mateuszrzeszutek mateuszrzeszutek added the spec:trace Related to the specification/trace directory label Aug 17, 2022
@Oberon00
Copy link
Member

Oberon00 commented Aug 17, 2022

I feel like retry & redirect should be counted with the same counter (they are both "re-sends", see https://www.rfc-editor.org/rfc/rfc9110#name-redirection-3xx). One could add something like a "resend_reason" that, apart from distinguishing redirect and other retries, could also give more detailed information on the "retry" (e.g. retry with authorization, retry after network error, retry after HTTP error)

@Oberon00 Oberon00 added semconv:HTTP area:semantic-conventions Related to semantic conventions labels Aug 17, 2022
@tigrannajaryan tigrannajaryan removed their assignment Aug 17, 2022
@tigrannajaryan tigrannajaryan added help wanted Extra attention is needed [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR labels Aug 17, 2022
@tigrannajaryan
Copy link
Member

We need an assignee for this.

@mateuszrzeszutek
Copy link
Member Author

I feel like retry & redirect should be counted with the same counter (they are both "re-sends", see https://www.rfc-editor.org/rfc/rfc9110#name-redirection-3xx).

I totally agree with that, after thinking about this for a while I too came to a conclusion that we should count all retries, no matter the reason. That's mostly because:

  • if we were to only count >=500 responses as retries, then this behavior would be in contradiction with how the HTTP client span status is calculated: some errors (40x), even if they cause the request to be re-sent, are not treated as retries.
  • if we were to count >=400 responses as retries, then there would be a weird discrepancy between how redirects and auth errors are treated. Both are more or less standard scenarios for re-sending the request in the HTTP spec ("The user agent MAY repeat the request with a new or replaced Authorization header field", and, in fact, probably most of the HTTP clients are able to do that automatically), but for some reason we're treating one of them in a different way.
  • so, to sum up, to me the most sensible solution is to bump the retry counter every time the HTTP request gets re-sent, no matter what the cause is.

One could add something like a "resend_reason" that, apart from distinguishing redirect and other retries, could also give more detailed information on the "retry" (e.g. retry with authorization, retry after network error, retry after HTTP error)

This could be quite easily derived from the HTTP status (or the exception event, in case of some network/IO error), so I think I'd initially skip this.

We need an assignee for this.

If it's about actually implementing the changes and making a PR, you can assign this to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR semconv:HTTP spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants