
Improve categorization UX for checking validity of datasets #11

Closed
mukeshelastic opened this issue Feb 5, 2020 · 4 comments
Labels: design

@mukeshelastic

Summary of the problem
By design, categorization produces better results on unstructured machine data with repeated patterns. A lack of sufficient training data can also degrade the quality of categorization results. We'd like to inform users when their datasets have either of these problems, and tell them what action to take to get better categorization results.

Ideal solution (optional)
When a user creates the job for the first time, or reconfigures it to add more datasets, we should show a warning message at the top of the results screen identifying the specific datasets that are unsuitable for categorization, either due to high cardinality or due to an insufficiently long training dataset, and suggest filtering them out. Users can choose not to take action; however, we will show the message again when the categories page is reloaded.

@katrin-freihofner
Contributor

I had a sync with @mukeshelastic yesterday. It seems like there are two cases we need to handle:

  1. If the data does not provide meaningful patterns -> the categories are not as useful. We can only verify this while the job is running.

  2. We have too little training data (there should be a way to verify this upfront). This means we can warn our users during ML job creation (creation should still be possible).

In both cases we need a way to show which dataset(s) are affected and which actions can be taken.
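
To make the two cases concrete, here is a minimal TypeScript sketch of how they might be modeled. All names here (`CategorizationWarning`, `checkTrainingDataVolume`, `MIN_TRAINING_DOCS`) and the threshold are hypothetical, not taken from the actual Kibana/ML code:

```typescript
// Hypothetical modeling of the two warning cases; none of these names
// come from the actual Kibana/ML codebase.

type CategorizationWarning =
  | { kind: 'no_meaningful_patterns'; dataset: string } // only detectable while the job runs
  | { kind: 'too_little_training_data'; dataset: string; docCount: number }; // detectable upfront

// Assumed threshold, purely illustrative.
const MIN_TRAINING_DOCS = 10_000;

// The upfront check, usable on the setup screen during job creation.
function checkTrainingDataVolume(
  datasets: Array<{ name: string; docCount: number }>
): CategorizationWarning[] {
  return datasets
    .filter((d) => d.docCount < MIN_TRAINING_DOCS)
    .map((d) => ({
      kind: 'too_little_training_data' as const,
      dataset: d.name,
      docCount: d.docCount,
    }));
}
```

The upfront check could run on the setup screen during job creation, while the pattern check would be reported by the running job.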

@katrin-freihofner
Contributor

1. If the data does not provide meaningful patterns...

- ...in one dataset

[Screenshot: 2020-02-20 at 11:56:48]

- ...in multiple datasets

[Screenshot: 2020-02-20 at 11:56:39]

2. We have too little training data...

- ...in one dataset

[Screenshot: 2020-02-20 at 11:56:55]

Setup screen (during job creation)

[Screenshot: 2020-02-20 at 11:55:58]

- ...in multiple datasets

[Screenshot: 2020-02-20 at 11:57:03]

Setup screen (during job creation)

[Screenshot: 2020-02-20 at 11:56:10]

Additional indicators (alert icon) to highlight what needs to change

[Screenshot: 2020-02-20 at 12:06:56]

If both problems occur:

[Screenshot: 2020-02-20 at 11:57:17]

A few notes

  • The warning messages on the categorization view should be dismissible (note the X)
  • The warning messages on the setup screen can't be dismissed
  • There can be 0-2 callouts on the categorization view (a rough sketch follows below)
  • If we detect that the dataset cannot be categorized in a meaningful way and the user navigates to the setup screen, we should also show this on the setup screen.
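
As a rough illustration of the 0-2 callout rule, a sketch using EUI's `EuiCallOut`. The component names from `@elastic/eui` are real, but the warning shape, copy, and the dismiss wiring for the X are assumptions, not the final design:

```tsx
import React from 'react';
import { EuiCallOut, EuiSpacer } from '@elastic/eui';

// Hypothetical warning shape, mirroring the sketch in the earlier comment.
interface Warning {
  kind: 'no_meaningful_patterns' | 'too_little_training_data';
  dataset: string;
}

// Renders zero, one, or two callouts on the categorization view, one per check that fired.
export const CategorizationWarnings: React.FC<{ warnings: Warning[] }> = ({ warnings }) => (
  <>
    {warnings.map((w) => (
      <React.Fragment key={w.kind}>
        <EuiCallOut
          title={`Dataset "${w.dataset}" may not produce useful categories`}
          color="warning"
          iconType="alert"
        >
          {/* The suggested action and the docs link would go here. */}
        </EuiCallOut>
        <EuiSpacer size="s" />
      </React.Fragment>
    ))}
  </>
);
```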

@katrin-freihofner
Contributor

A few updates based on the feedback I got from @mukeshelastic:

  • The warning callout in the categories view cannot be dismissed
  • I added a link to the docs
  • If there is too little training data and no meaningful categorization, we combine these into one warning (a rough sketch of the combining rule follows below).
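
A tiny sketch of that combining rule, again with hypothetical names and message copy that are not the final wording:

```typescript
type WarningKind = 'no_meaningful_patterns' | 'too_little_training_data';

interface DatasetWarning {
  kind: WarningKind;
  dataset: string;
}

// Collapse per-dataset warnings: if both kinds fire for one dataset, emit a single combined message.
function combineWarnings(warnings: DatasetWarning[]): Array<{ dataset: string; message: string }> {
  const byDataset = new Map<string, WarningKind[]>();
  for (const w of warnings) {
    byDataset.set(w.dataset, [...(byDataset.get(w.dataset) ?? []), w.kind]);
  }
  return [...byDataset.entries()].map(([dataset, kinds]) => ({
    dataset,
    message:
      kinds.length > 1
        ? 'This dataset has too little training data and does not categorize in a meaningful way.'
        : kinds[0] === 'too_little_training_data'
        ? 'This dataset has too little training data.'
        : 'This dataset does not categorize in a meaningful way.',
  }));
}
```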

This is an example of how the updated version looks (multiple datasets producing too many categories):
[Screenshot: 2020-02-21 at 12:20:43]

We also talked about an easier way to access and disable/enable individual datasets. This is an example of how the setup screen could look:
[Screenshot: 2020-02-21 at 12:19:59]

@katrin-freihofner
Contributor

Quick update for the setup screen. With this version we are better prepared to display a large number of datasets:

[Screenshot: 2020-02-28 at 09:13:39]
