Investigate how alerting handles index pattern field changes and removals #93501

mikecote · 2021-03-03T19:53:22Z

Investigate how alerting handles index pattern field changes and removals.

elasticmachine · 2021-03-03T19:53:24Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

pmuellr · 2021-03-09T16:33:40Z

I'm thinking this is about rule type authoring vs an alerting framework thing? So we need to make sure all the rule type executors are "safe" when it comes to dealing with runtime fields?

mikecote · 2021-03-09T17:06:05Z

We should inform #92753 about other rule types.

pmuellr · 2021-03-09T18:36:16Z

I posted a comment #92753 (comment) regarding the other rule types.

ymao1 · 2021-03-25T19:25:29Z

To investigate this, I used the es-apm-sys-sim to generate data and added the following runtime fields to the mapping:

{
  "runtime": {
    "second_timestamp": {
      "type": "date",
      "script": {
        "source": "emit(doc['@timestamp'].getValue().getMillis())"
      }
    },
    "free_memory": {
      "type": "double",
      "script": {
        "source": "emit(100 * doc['system.memory.actual.free'].value / doc['system.memory.total'].value)"
      }
    },
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      }
    }
  }
}
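For readers less familiar with Painless, here is a rough TypeScript sketch of what the three scripts above compute. The `SimDoc` shape and the function names are assumptions made for illustration. The sketch also assumes the memory fields hold floating-point values, so it ignores any integer-division behavior Painless may apply to integer-typed fields.

```typescript
// Hypothetical document shape for this sketch; only the fields the scripts read.
interface SimDoc {
  "@timestamp": string; // ISO 8601, e.g. "2021-03-25T18:35:54.545Z"
  "system.memory.actual.free": number;
  "system.memory.total": number;
}

const DAY_NAMES = [
  "Sunday",
  "Monday",
  "Tuesday",
  "Wednesday",
  "Thursday",
  "Friday",
  "Saturday",
];

// second_timestamp (date): the original timestamp as epoch milliseconds.
function secondTimestamp(doc: SimDoc): number {
  return Date.parse(doc["@timestamp"]);
}

// free_memory (double): free memory as a percentage of total memory.
function freeMemory(doc: SimDoc): number {
  return (100 * doc["system.memory.actual.free"]) / doc["system.memory.total"];
}

// day_of_week (keyword): full English day name, evaluated in UTC.
function dayOfWeek(doc: SimDoc): string {
  return DAY_NAMES[new Date(doc["@timestamp"]).getUTCDay()];
}
```

Each field has a natural type (date, double, keyword), which is what the update tests below deliberately break.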

Index threshold rule type

There are 3 places to specify fields within the Index threshold rule: timestamp field, metric aggregation field and group by field.

On runtime field create
The list of runtime fields showed up in the create rule flyout with no additional intervention. The preview chart populated correctly when using runtime fields.

On runtime field update
After creating rules using the runtime fields, I updated the runtime field mapping definitions so that they each had a different type than they started out with:

Timestamp field - When selected runtime field was updated to be a non-date type, rule execution would fail with a search_phase_execution_exception error. The actual error from ES is a little more descriptive:

When updating from date type to double type:

{
  "type": "query_shard_exception",
  "reason": "failed to create query: For input string: \"2021-03-25T18:35:54.545Z\"",
  "index_uuid": "znV1kqQrTEuOWgad1KOBBw",
  "index": "es-apm-sys-sim",
  "caused_by": {
    "type": "number_format_exception",
    "reason": "For input string: \"2021-03-25T18:35:54.545Z\""
  }
}

When updating from date type to keyword type:

{
  "type": "illegal_argument_exception",
  "reason": "Field [second_timestamp] of type [keyword] does not support custom formats",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Field [second_timestamp] of type [keyword] does not support custom formats"
  }
}

Metric agg field
- When selected runtime field was updated from double type to date type, the rule execution would continue and the date as epoch millis would be used as the value for the metric agg.
- When selected runtime field was updated from double type to keyword type, rule execution would fail with a search_phase_execution_exception error. The actual error from ES:

{
  "type": "illegal_argument_exception",
  "reason": "Field [free_memory] of type [keyword] is not supported for aggregation [avg]",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Field [free_memory] of type [keyword] is not supported for aggregation [avg]"
  }
}

Group by field - When selected runtime field was updated to be a non-keyword type, rule execution would succeed but the group by buckets would be unexpected (epoch millis or a numeric value).

On runtime field delete
Rule execution looks normal, no errors in any logs. Query just doesn't return any results. This would lead to a confusing experience for the user since the rule is not failing but they would not be getting alerted when expected.

Elasticsearch query rule type

There are 2 places to specify fields within the Elasticsearch query rule: the timestamp field and the query DSL. Using runtime fields in the timestamp field exhibited the same behavior as described above for the Index threshold rule. The behavior of the DSL query itself depended on the content of the query and the type of runtime field mapping update. For example, I set up a rule with a range query on a numeric runtime field and then updated the mapping to be keyword; the query executed without errors (just no hits). I also set up a rule with a term query on a keyword runtime field and then updated the mapping to be numeric, and received a search_phase_execution_exception where the underlying ES error was:

{
  "type": "query_shard_exception",
  "reason": "failed to create query: For input string: \"Thursday\"",
  "index_uuid": "znV1kqQrTEuOWgad1KOBBw",
  "index": "es-apm-sys-sim",
  "caused_by": {
    "type": "number_format_exception",
    "reason": "For input string: \"Thursday\""
  }
}
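For illustration only, the two kinds of DSL query described above might look roughly like the following. The field names come from the runtime mapping earlier in this comment and the term value matches the error above, but the exact queries used in the investigation are not shown in this thread, and the threshold `20` is made up.

```typescript
// Range query on the numeric runtime field. The threshold is hypothetical.
// After the mapping changed to keyword, a query like this ran without errors
// but returned no hits.
const rangeOnFreeMemory = {
  query: { range: { free_memory: { lt: 20 } } },
};

// Term query on the keyword runtime field.
// After the mapping changed to a numeric type, a query like this failed
// because "Thursday" cannot be parsed as a number.
const termOnDayOfWeek = {
  query: { term: { day_of_week: "Thursday" } },
};
```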

ymao1 · 2021-03-25T20:20:34Z

I think in the short term, we should be logging better error messages when rule execution fails due to this condition. search_phase_execution_exception is not very descriptive and it would be more helpful to capture the more descriptive error message from ES in the event log.
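As a minimal sketch of what capturing the more descriptive message could look like, the helper below walks the `caused_by` chain of an error object shaped like the ones pasted in the previous comment. The type and function names are hypothetical, and locating this object inside the full search_phase_execution_exception response is deliberately left out.

```typescript
// Minimal shape of an Elasticsearch error cause; only the fields used here.
interface EsErrorCause {
  type: string;
  reason?: string;
  caused_by?: EsErrorCause;
}

// Follow the caused_by chain and return the innermost cause.
function innermostCause(cause: EsErrorCause): EsErrorCause {
  let current = cause;
  while (current.caused_by) {
    current = current.caused_by;
  }
  return current;
}

// Build a one-line message from the outer cause, appending the innermost
// cause only when it adds information.
function describeEsError(cause: EsErrorCause): string {
  const outer = `${cause.type}: ${cause.reason ?? "no reason given"}`;
  const inner = innermostCause(cause);
  const isDuplicate = inner.type === cause.type && inner.reason === cause.reason;
  if (inner === cause || isDuplicate) {
    return outer;
  }
  return `${outer} (caused by ${inner.type}: ${inner.reason ?? "no reason given"})`;
}
```

For the date-to-double case above, this would produce a message naming both query_shard_exception and number_format_exception. For the keyword cases, where the outer and inner causes are identical, the reason is reported once.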

In the long term, would we want to validate the fields used in the query before executing the query? Seems like overkill to do it on each rule execution, but when/where would we want to do this? Maybe this could be something that is done as part of the explain feature if we implement that? At least at that point, the user could see the underlying query that is run for a rule.
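One possible shape for such a validation step is sketched below. It compares the field types a rule expects against a simplified map of field name to type names, modeled on the `fields` section of the Elasticsearch field capabilities response. All names here are hypothetical, fetching the capabilities is not shown, and this is not an existing alerting framework API.

```typescript
// Simplified field capabilities: field name -> type name -> capabilities.
type FieldCaps = Record<string, Record<string, unknown>>;

// What a rule expects of one of its configured fields.
interface FieldExpectation {
  field: string;
  role: string; // e.g. "timestamp", "metric aggregation", "group by"
  allowedTypes: string[];
}

// Return a human-readable problem for each missing or mistyped field.
function findFieldProblems(
  expectations: FieldExpectation[],
  caps: FieldCaps
): string[] {
  const problems: string[] = [];
  for (const { field, role, allowedTypes } of expectations) {
    const actualTypes = Object.keys(caps[field] ?? {});
    if (actualTypes.length === 0) {
      // Covers the delete case, where the query otherwise silently returns nothing.
      problems.push(`${role} field [${field}] no longer exists`);
      continue;
    }
    const unexpected = actualTypes.filter((t) => !allowedTypes.includes(t));
    if (unexpected.length > 0) {
      problems.push(
        `${role} field [${field}] has type [${unexpected.join(", ")}], ` +
          `expected one of [${allowedTypes.join(", ")}]`
      );
    }
  }
  return problems;
}
```

A check like this would flag both the type-change cases and the delete case described earlier, whether it runs on rule edit, as part of an explain feature, or on execution.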

ymao1 · 2021-03-26T12:59:45Z

Created #95523, #95520 and #95516 to capture work that might be done as an outcome of this investigation. Closing this investigation issue.

mikecote added chore Feature:Alerting Feature:Actions Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Mar 3, 2021

mikecote mentioned this issue Mar 3, 2021

Ensure Kibana Apps gracefully handle index pattern field changes and removal #92753

Closed

ymao1 self-assigned this Mar 25, 2021

ymao1 mentioned this issue Mar 25, 2021

[Alerting] Investigate runtime fields defined on an index mapping #75791

Closed

This was referenced Mar 26, 2021

[Alerting] Log more descriptive error messages when runtime field mappings are updated to be incompatible with original query #95516

Closed

[Alerting] Do we need to validate field mappings before rule execution? #95520

Closed

ymao1 closed this as completed Mar 26, 2021

ymao1 mentioned this issue Mar 26, 2021

[Alerting] Editing stack rules should provide warning if fields have unexpected mapping #95523

Open

ymao1 mentioned this issue May 26, 2021

There are insufficient functional tests for runtime field support in Stack Rules #100738

Open

kobelb added the needs-team Issues missing a team label label Jan 31, 2022

botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
