Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Client-Side Job Validation v6.3 #18349

Closed
39 tasks done
elasticmachine opened this issue Jan 30, 2018 · 6 comments
Closed
39 tasks done

[ML] Client-Side Job Validation v6.3 #18349

elasticmachine opened this issue Jan 30, 2018 · 6 comments

Comments

@elasticmachine
Copy link
Contributor

elasticmachine commented Jan 30, 2018

Original comment by @walterra:

(Client-Side means x-pack-kibana instead of x-pack-elasticsearch in this context)

This is a follow up issue to LINK REDACTED to keep track of the actual implementation.

UI/Job Validation Framework

  • General boilerplate/framework to support validation checks including UI prototype LINK REDACTED
  • Refactor UI to use EUI/React LINK REDACTED LINK REDACTED
  • Enable Buttons in jobs list and wizards LINK REDACTED LINK REDACTED
  • Change UI to display "success" messages for test categories LINK REDACTED
  • Include job validation message status in messages.json LINK REDACTED
  • Deduplicate messages LINK REDACTED
  • Loading indicator / status icon for job validation button LINK REDACTED
  • Add check for duplicated detectors to population wizard LINK REDACTED
  • "Learn more" links should link to current version LINK REDACTED
  • Fix "undefined" fieldname LINK REDACTED

Bugs

Checks v6.3

  • Basic checks (previously advanced job wizard form checks) moved to server LINK REDACTED
  • Check cardinality of fields
    • Basic checks LINK REDACTED
    • by fields and other tweaks LINK REDACTED
    • Rules for skipping checks LINK REDACTED
  • Check bucket_span
    • invalid if zero LINK REDACTED
    • test if > 1 day LINK REDACTED
    • test if different from bucket-estimation (trigger INFO level message) LINK REDACTED (disabled for now)
  • Check influencers
    • are there any? LINK REDACTED
    • Rules for skipping checks LINK REDACTED
  • Improve start/end time handling
    • Add optional support for start/end timestamps to the node API endpoint. Use this for the job validation checks triggered from wizards instead of hijacking data_count. LINK REDACTED
    • If start/end is not set, try to fallback to timespan of original data LINK REDACTED
    • Check for data before the unix epoch (see LINK REDACTED)
    • Check if time range is 25 times the bucket span LINK REDACTED or at least 2h LINK REDACTED
  • Check against model memory limit LINK REDACTED
  • Check job does not contain duplicate detectors (same function and field names) LINK REDACTED
  • Check for non-aggregatable fields LINK REDACTED LINK REDACTED
  • If model_plot_config enabled and cardinality of by/over/partition >100 then warn this is a resource intensive job LINK REDACTED

Unit Tests

They should cover all possible message ids being returned.

  • Basic tests LINK REDACTED LINK REDACTED
  • Cardinality LINK REDACTED
  • Bucket Span LINK REDACTED
  • Influencers LINK REDACTED
  • Time Range LINK REDACTED
  • Model Memory Limit LINK REDACTED
@elasticmachine
Copy link
Contributor Author

Original comment by @sophiec20:

Awesome first review... feedback coming...

A1: Repeated error for each detector - any way to de-dup?
Double over_field value check.
image
and also
image

A2: Links to blogs
I don't think we should link to blogs from inside the UI. It seems a bit too temporary and unable to version control. We could link to a page in the docs, which could then link to a blog - however should we do this either? @lcawl thoughts? is there precedence in say upgrade wiz or add data?

A3: bucket span test is very quick (and always seems to be 15m) - is this actually plugged in?

@elasticmachine
Copy link
Contributor Author

Original comment by @walterra:

Yes those blog links are temporary, Lisa is already working on replacing them here: LINK REDACTED

@elasticmachine
Copy link
Contributor Author

Original comment by @jgowdyelastic:

Just making a note here so I don't forget.
The current way that the validation button has been added to the job wizards might need addressing.
The job config is passed to the mlValidateJob directive by a function call:
job="getJobFromConfig()"
this means this function is called on every digest cycle.
Also as a side effect it means the job id validation is called every digest cycle. So the user gets this error message as soon as the page loads:
image

Perhaps a reference to getJobFromConfig could be passed across instead and only called when the Validate job button is pressed.

@elasticmachine
Copy link
Contributor Author

Original comment by @walterra:

  • there's now a check in place to detect duplicate detectors LINK REDACTED (already in master/6.x)
  • job validation message deduplication is added in LINK REDACTED
  • the above problem in the wizards described by James is solved in LINK REDACTED

@elasticmachine
Copy link
Contributor Author

elasticmachine commented Mar 27, 2018

Original comment by @walterra:

I cleaned up the issue's description an moved validation checks planned for v6.4 to #18368.

@walterra
Copy link
Contributor

walterra commented May 8, 2018

Created a follow up issue to track further progress for v6.3.x #18898

@walterra walterra closed this as completed May 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants