
Improve categorization UX for checking validity of datasets #11

Closed
mukeshelastic opened this issue Feb 5, 2020 · 4 comments
Labels: design

@mukeshelastic

Summary of the problem
By design, categorization produces better results on unstructured machine data with repeated patterns. A lack of sufficient training data can also degrade the quality of categorization results. We'd like to inform users when their datasets have either of these problems, and tell them what action to take to get better categorization results.

Ideal solution (optional)
When a user creates the job for the first time, or reconfigures it to add more datasets, we should show a warning message at the top of the results screen identifying the specific datasets that are unsuitable for categorization, either due to high cardinality or due to an insufficiently long training dataset, and suggest filtering them out. Users can choose not to take action; however, we will show the message again when the categories page is reloaded.

@katrin-freihofner
Contributor

I had a sync with @mukeshelastic yesterday. It seems like there are two cases we need to handle:

  1. If the data does not provide meaningful patterns -> the categories are not as useful. We can only verify this while the job is running.

  2. We have too little training data (there should be a way to verify this upfront). This means we can warn our users during ML job creation (creation should still be possible).

In both cases we need a way to show which dataset(s) are affected and which actions can be taken.
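
To make the two cases concrete, here is a minimal TypeScript sketch of how they might be modeled. All names here (`CategorizationWarning`, `checkTrainingDataVolume`, `MIN_TRAINING_DOCS`) and the threshold are hypothetical, not taken from the actual Kibana/ML code:

```typescript
// Hypothetical modeling of the two warning cases; none of these names
// come from the actual Kibana/ML codebase.

type CategorizationWarning =
  | { kind: 'no_meaningful_patterns'; dataset: string } // only detectable while the job runs
  | { kind: 'too_little_training_data'; dataset: string; docCount: number }; // detectable upfront

// Assumed threshold, purely illustrative.
const MIN_TRAINING_DOCS = 10_000;

// The upfront check, usable on the setup screen during job creation.
function checkTrainingDataVolume(
  datasets: Array<{ name: string; docCount: number }>
): CategorizationWarning[] {
  return datasets
    .filter((d) => d.docCount < MIN_TRAINING_DOCS)
    .map((d) => ({
      kind: 'too_little_training_data' as const,
      dataset: d.name,
      docCount: d.docCount,
    }));
}
```

The upfront check could run on the setup screen during job creation, while the pattern check would be reported by the running job.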

@katrin-freihofner
Contributor

1. If the data does not provide meaningful patterns...

- ...in one dataset

[Screenshot: 2020-02-20 at 11:56:48]

- ...in multiple datasets

[Screenshot: 2020-02-20 at 11:56:39]

2. We have too little training data...

- ...in one dataset

[Screenshot: 2020-02-20 at 11:56:55]

Setup screen (during job creation)

[Screenshot: 2020-02-20 at 11:55:58]

- ...in multiple datasets

[Screenshot: 2020-02-20 at 11:57:03]

Setup screen (during job creation)

[Screenshot: 2020-02-20 at 11:56:10]

Additional indicators (alert icon) to highlight what needs to change

[Screenshot: 2020-02-20 at 12:06:56]

If both problems occur:

[Screenshot: 2020-02-20 at 11:57:17]

A few notes

  • The warning messages on the categorization view should be dismissible (note the X)
  • The warning messages on the setup screen can't be dismissed
  • There can be 0-2 callouts on the categorization view (a rough sketch follows below)
  • If we detect that the dataset cannot be categorized in a meaningful way and the user navigates to the setup screen, we should also show this on the setup screen.
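
As a rough illustration of the 0-2 callout rule, a sketch using EUI's `EuiCallOut`. The component names from `@elastic/eui` are real, but the warning shape, copy, and the dismiss wiring for the X are assumptions, not the final design:

```tsx
import React from 'react';
import { EuiCallOut, EuiSpacer } from '@elastic/eui';

// Hypothetical warning shape, mirroring the sketch in the earlier comment.
interface Warning {
  kind: 'no_meaningful_patterns' | 'too_little_training_data';
  dataset: string;
}

// Renders zero, one, or two callouts on the categorization view, one per check that fired.
export const CategorizationWarnings: React.FC<{ warnings: Warning[] }> = ({ warnings }) => (
  <>
    {warnings.map((w) => (
      <React.Fragment key={w.kind}>
        <EuiCallOut
          title={`Dataset "${w.dataset}" may not produce useful categories`}
          color="warning"
          iconType="alert"
        >
          {/* The suggested action and the docs link would go here. */}
        </EuiCallOut>
        <EuiSpacer size="s" />
      </React.Fragment>
    ))}
  </>
);
```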

@katrin-freihofner
Contributor

A few updates based on the feedback I got from @mukeshelastic:

  • The warning callout in the categories view cannot be dismissed
  • I added a link to the docs
  • If there is too little training data and no meaningful categorization, we combine these into one warning (a rough sketch of the combining rule follows below).
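
A tiny sketch of that combining rule, again with hypothetical names and message copy that are not the final wording:

```typescript
type WarningKind = 'no_meaningful_patterns' | 'too_little_training_data';

interface DatasetWarning {
  kind: WarningKind;
  dataset: string;
}

// Collapse per-dataset warnings: if both kinds fire for one dataset, emit a single combined message.
function combineWarnings(warnings: DatasetWarning[]): Array<{ dataset: string; message: string }> {
  const byDataset = new Map<string, WarningKind[]>();
  for (const w of warnings) {
    byDataset.set(w.dataset, [...(byDataset.get(w.dataset) ?? []), w.kind]);
  }
  return [...byDataset.entries()].map(([dataset, kinds]) => ({
    dataset,
    message:
      kinds.length > 1
        ? 'This dataset has too little training data and does not categorize in a meaningful way.'
        : kinds[0] === 'too_little_training_data'
        ? 'This dataset has too little training data.'
        : 'This dataset does not categorize in a meaningful way.',
  }));
}
```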

This is an example of how the updated version looks (multiple datasets producing too many categories):
[Screenshot: 2020-02-21 at 12:20:43]

We also talked about an easier way to access and disable/enable individual datasets. This is an example of how the setup screen could look:
[Screenshot: 2020-02-21 at 12:19:59]

@katrin-freihofner
Contributor

Quick update for the setup screen. With this version we are better prepared to display a large number of datasets:

[Screenshot: 2020-02-28 at 09:13:39]
