migrate retryCallCluster for new ES client #71412
return iif(
  () =>
    error instanceof esErrors.NoLivingConnectionsError ||
    error instanceof esErrors.ConnectionError ||
    error instanceof esErrors.TimeoutError ||
    retryMigrationStatusCodes.includes(error.statusCode) ||
    error?.body?.error?.type === 'snapshot_in_progress_exception',
  timer(delay),
  throwError(error)
This is a direct adaptation of kibana/src/core/server/elasticsearch/legacy/retry_call_cluster.ts, lines 62 to 74 in 159369b:
() => {
  return (
    error instanceof esErrors.NoConnections ||
    error instanceof esErrors.ConnectionFault ||
    error instanceof esErrors.ServiceUnavailable ||
    error instanceof esErrors.RequestTimeout ||
    error instanceof esErrors.AuthenticationException ||
    error instanceof esErrors.AuthorizationException ||
    // @ts-expect-error
    error instanceof esErrors.Gone ||
    error?.body?.error?.type === 'snapshot_in_progress_exception'
  );
},
@delvedor Could you take a look and check whether I made any mapping mistakes?
The legacy client has a bug where both the HTTP timeout error (408) and the socket timeout error fall into the `RequestTimeout` error.
In the new client, the socket timeout is a `TimeoutError`, while the HTTP timeout is a `ResponseError` with the `statusCode` property set to 408.
Not sure which one you are checking here.
I'd say we probably want both. @rudolf can you confirm?
Yes, we want to retry all operations, regardless of where the timeout occurred.
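Retrying on both kinds of timeout means the check has to recognize two different error shapes. A minimal sketch, assuming only the `name` and `statusCode` fields described in the review comment; `EsErrorLike` and `isTimeout` are hypothetical names, not part of the client:

```typescript
// Hypothetical shape: only the fields this check reads.
interface EsErrorLike {
  name?: string;
  statusCode?: number;
}

// A socket timeout surfaces as TimeoutError, while an HTTP timeout surfaces
// as a ResponseError with statusCode 408. Both are treated as timeouts here.
function isTimeout(error: EsErrorLike): boolean {
  return (
    error.name === 'TimeoutError' ||
    (error.name === 'ResponseError' && error.statusCode === 408)
  );
}

console.log(isTimeout({ name: 'TimeoutError' })); // true
console.log(isTimeout({ name: 'ResponseError', statusCode: 408 })); // true
console.log(isTimeout({ name: 'ResponseError', statusCode: 404 })); // false
```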
From a performance point of view, instead of checking via `instanceof`, you could use `error.name === {name}`, which is an order of magnitude faster :)
Each error object of the client has a corresponding `.name` property: https://github.com/elastic/elasticsearch-js/blob/master/lib/errors.js
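Applied to the connection-level errors from the first snippet, the suggestion could look like the sketch below. It assumes the client sets `.name` to exactly these strings (worth checking against the linked errors.js); `RETRYABLE_ERROR_NAMES` and `isRetryableByName` are illustrative names:

```typescript
// Names of the connection-level errors to retry.
// Assumption: these match the `.name` values set by the client's errors.js.
const RETRYABLE_ERROR_NAMES: ReadonlySet<string> = new Set([
  'NoLivingConnectionsError',
  'ConnectionError',
  'TimeoutError',
]);

// One Set lookup on a string replaces a chain of instanceof checks.
function isRetryableByName(error: { name?: string }): boolean {
  return error.name !== undefined && RETRYABLE_ERROR_NAMES.has(error.name);
}

console.log(isRetryableByName(new Error('boom'))); // false (name is "Error")
console.log(isRetryableByName({ name: 'ConnectionError' })); // true
```

The trade-off is that string literals are no longer checked by the compiler against the client's classes.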
 * @internal
 */
export const retryCallCluster = <TResponse, TContext>(
  apiCaller: ApiCaller<TResponse, TContext>
Could we delegate type inference to the `apiCaller`?
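import { defer, iif, throwError, timer } from 'rxjs';
import { concatMap, retryWhen } from 'rxjs/operators';
import { errors as esErrors } from '@elastic/elasticsearch';

export const retryCallCluster = <T extends Promise<unknown>>(apiCaller: () => T): T => {
  return defer(apiCaller)
    .pipe(
      retryWhen((errors) =>
        errors.pipe(
          concatMap((error, i) =>
            iif(
              () => error instanceof esErrors.NoLivingConnectionsError,
              timer(1000),
              throwError(error)
            )
          )
        )
      )
    )
    .toPromise() as T;
};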
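The point of the suggestion is that the response type is inferred from `apiCaller` rather than spelled out through `TResponse`/`TContext`. The sketch below shows the same inference with a plain `async` loop instead of RxJS, so it runs without dependencies; `retryWithInference`, its `isRetryable` parameter and `delayMs` are hypothetical, not Kibana's API:

```typescript
// Hypothetical promise-based counterpart of the RxJS suggestion: the resolved
// type T is inferred from apiCaller, so callers pass no type arguments.
async function retryWithInference<T>(
  apiCaller: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  delayMs = 1000
): Promise<T> {
  for (;;) {
    try {
      return await apiCaller();
    } catch (error) {
      if (!isRetryable(error)) throw error;
      // Wait before the next attempt, like timer(1000) in the RxJS version.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: fails twice with a retryable error, then succeeds.
let attempts = 0;
const result = retryWithInference(
  async () => {
    attempts += 1;
    if (attempts < 3) {
      throw Object.assign(new Error('down'), { name: 'NoLivingConnectionsError' });
    }
    return { body: { acknowledged: true } };
  },
  (error) => (error as Error).name === 'NoLivingConnectionsError',
  10
);
// `result` is inferred as Promise<{ body: { acknowledged: boolean } }>.
result.then((r) => console.log(r.body.acknowledged, attempts)); // true 3
```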
Yeah, I thought I had it with my `APICaller` type, but nope. Fixed.
.pipe(
  retryWhen((errors) =>
    errors.pipe(
      concatMap((error, i) =>
nit: `i` is not used
Pinging @elastic/kibana-platform (Team:Platform)
import { Logger } from '../../logging';

const retryResponseStatuses = [
  503, // ServiceUnavailable
Note: if the status code of a response is 502, 503 or 504, the client performs an automatic failover internally, so having it here could be a duplication.
For 4xx status codes, the client does not perform any automatic failover or retry.
Good to know. Having this feature for our 'end' consumers is great. For the migration needs, we need to retry indefinitely, so keeping the logic in a single place is probably better. We don't need to disable the internal failover here though; the duplication is probably fine.
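For the migration variant, one way to carry over the legacy checks is to express the HTTP-level legacy classes as status codes and keep the connection-level ones as error names. This is a hypothetical mapping, assuming the standard HTTP statuses behind the legacy classes (ServiceUnavailable 503, AuthenticationException 401, AuthorizationException 403, RequestTimeout 408, Gone 410); the constant and function names are illustrative, and only the retry decision is shown, not the unbounded retry loop:

```typescript
// Hypothetical mapping of legacy error classes to HTTP status codes.
const RETRY_MIGRATION_STATUS_CODES = [
  503, // ServiceUnavailable (the client also fails over on this internally)
  401, // AuthenticationException
  403, // AuthorizationException
  408, // RequestTimeout at the HTTP level
  410, // Gone
];

// Connection-level failures: NoConnections, ConnectionFault and the
// socket-level part of RequestTimeout in the legacy client.
const RETRY_MIGRATION_ERROR_NAMES = ['NoLivingConnectionsError', 'ConnectionError', 'TimeoutError'];

interface EsErrorLike {
  name?: string;
  statusCode?: number;
  body?: { error?: { type?: string } };
}

function isRetryableForMigration(error: EsErrorLike): boolean {
  return (
    (error.name !== undefined && RETRY_MIGRATION_ERROR_NAMES.includes(error.name)) ||
    (error.statusCode !== undefined && RETRY_MIGRATION_STATUS_CODES.includes(error.statusCode)) ||
    error.body?.error?.type === 'snapshot_in_progress_exception'
  );
}

console.log(isRetryableForMigration({ name: 'ResponseError', statusCode: 503 })); // true
console.log(isRetryableForMigration({ name: 'ResponseError', statusCode: 400 })); // false
```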
}
return iif(
  () =>
    error.name === 'NoLivingConnectionsError' ||
minor: it's not type safe anymore
True, however the unit tests assert behavior against concrete errors from the library, so I guess this is alright. But I can revert the change if we prefer `instanceof`-based checks.
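A possible middle ground on type safety, not what the PR does: compare against each class's own `name` instead of a string literal, so renaming or removing a class fails to compile while the runtime check stays a string comparison. This sketch assumes each instance's `.name` equals its class name (true for the stand-in classes below; to be verified against the client's errors.js) and that class names are not minified:

```typescript
// Stand-in classes: each instance's name matches its class name,
// which is the property this technique relies on.
class NoLivingConnectionsError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'NoLivingConnectionsError';
  }
}

class ConnectionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ConnectionError';
  }
}

// Referencing the classes keeps the compiler involved; the lookup itself
// is still a string comparison on error.name.
const RETRYABLE_NAMES = new Set<string>([NoLivingConnectionsError.name, ConnectionError.name]);

const isRetryable = (error: Error): boolean => RETRYABLE_NAMES.has(error.name);

console.log(isRetryable(new ConnectionError('socket hang up'))); // true
console.log(isRetryable(new TypeError('unrelated'))); // false
```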
Nope, that's optional. Feel free to merge.
const successReturn = elasticsearchClientMock.createClientResponse({ ...dummyBody });

let i = 0;
client.asyncSearch.get.mockImplementation(() => {
Optional: I usually use a chain of `mockImplementationOnce` calls.
💚 Build Succeeded
* adapt retryCallCluster for new ES client
* review comments
* retry on 408 ResponseError
* use error name instead of instanceof base check
* use error name instead of instanceof base check bis
* use mockImplementationOnce chaining

Co-authored-by: restrry <[email protected]>
Summary
Part of #35508
Migrate `retryCallCluster` and `migrationRetryCallCluster` to be usable with the new ES client.
Checklist