-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Alerting] Show more user friendly ES error message when executor fails #96254
[Alerting] Show more user friendly ES error message when executor fails #96254
Conversation
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
@elasticmachine merge upstream |
I created an index threshold rule with a
in the logs. It looks like the timeSeries query called by the index threshold rule executor does its own try catch logic for the ES query to handle and log errors without actually bubbling up the error to the framework, so you may need to update the error logging in there as well. |
@ymao1 Thank you! I'll look into that |
For some reason I'm thinking there is a built-in restriction in using a runtime field as the timestamp field in index patterns. That would obviously result in a table scan for every query that uses date ranges, which many of the alerts use, which is why I think it's disallowed. So - it might be possible/better to special case this in alerting as well - if we can determine the field used for the date is actually a runtime field, we could potentially disallow that usage when validating the alert params during create/edit. And after the alert is created/edited, and the user later changes the field to a runtime field, we could detect this situation and provide an explicit error message, instead of trying to infer the error state from the elasticsearch response. |
@ymao1 All fixed up and ready for another review! |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just some comments about additional tests and a question about deduping the reason string.
import { getEsErrorMessage } from './es_error_parser'; | ||
|
||
describe('ES error parser', () => { | ||
test('should return all the cause of the error', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for when there are nested caused_by
reasons?
return getEsCause(obj.caused_by, updated); | ||
} | ||
|
||
if (obj.failed_shards && obj.failed_shards.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test for this case? When there is no caused_by
but there is an array of failed_shards
} | ||
|
||
// Recursively find all the "caused by" reasons | ||
return getEsCause(obj.caused_by, updated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It looks like sometimes the recursively nested caused_by
reasons can be pretty redundant. Could we possibly dedupe them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks all. I'm finishing some last minute item and will get back to this PR later this week. |
94e000f
to
0320100
Compare
💚 Build Succeeded
Metrics [docs]
History
To update your PR or re-run it, just comment with: |
…ls (elastic#96254) * WIP for ES error parser * Fix tests * Ensure the error shows up in the UI too * wip * Handle multiple types here * Fix tests * PR feedback Co-authored-by: Kibana Machine <[email protected]>
…ls (#96254) (#97259) * WIP for ES error parser * Fix tests * Ensure the error shows up in the UI too * wip * Handle multiple types here * Fix tests * PR feedback Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
Resolves to #95516
This PR attempts to identify the root cause of an ES exception and display that in the error message. Unfortunately, it does not appear as if error formats are not normalized across all ES APIs so we need to look in a few places for the correct root cause.
The logic in this PR is borrowed from work by the ES UI team -> https://github.com/elastic/kibana/blob/master/src/plugins/es_ui_shared/__packages_do_not_import__/errors/es_error_parser.ts. I decided to copy/modify as we aren't using all of the logic and we might need to change it over time for our own purposes.
Here is the difference in the server log and UI:
When a runtime field changes type to something incompatible with the existing query
When the provided query is invalid
We do have a little more information available so please feel free to suggest how the messaging could be better and what other information we should show.