Improve categorization UX for checking validity of datasets #11
I had a sync with @mukeshelastic yesterday. It seems like there are two cases we need to handle:
In both cases we need a way to show which dataset(s) are affected and which actions can be taken.

1. If the data does not provide meaningful patterns...
   - ...in one dataset
   - ...in multiple datasets
2. We have too little training data...
   - ...in one dataset: Setup screen (during job creation)
   - ...in multiple datasets: Setup screen (during job creation)

Additional indicators (alert icon) to highlight what needs to change if both happen.

A few notes
A few updates according to the feedback I got from @mukeshelastic

This is an example of what the updated version looks like (multiple datasets producing too many categories). We also talked about an easier way to access and disable/enable individual datasets. This is an example of how the setup screen could look:
Summary of the problem
Categorization by design provides better results on unstructured machine data with repeated patterns. Additionally, a lack of sufficient training data can also degrade the quality of categorization results. We'd like to inform users if their datasets have either of these problems and tell them what action to take to get better categorization results.
Ideal solution (optional)
When a user creates the job for the first time, or reconfigures it to add more datasets, we should show a warning message at the top of the results screen describing the specific datasets that are not suitable for categorization, either due to high cardinality or the lack of a sufficiently long training dataset, and suggest filtering them out. Users can choose not to take action; however, we will show this message again when the categories page is reloaded.
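To make the two warning conditions concrete, here is a minimal sketch of what such a per-dataset validation check could look like. The interface fields, threshold values, and function names are illustrative assumptions for this discussion, not the actual Elastic ML implementation:

```typescript
// Hypothetical per-dataset stats gathered before/after job creation.
interface DatasetStats {
  name: string;
  categoryCount: number; // distinct categories produced (proxy for cardinality)
  docCount: number;      // documents available as training data
}

type ValidationIssue =
  | { dataset: string; reason: 'high-cardinality' }
  | { dataset: string; reason: 'insufficient-training-data' };

// Assumed cutoffs; real values would need tuning/product input.
const MAX_CATEGORIES = 500;
const MIN_TRAINING_DOCS = 1000;

// Returns one issue per failing condition so the UI can list exactly
// which dataset(s) are affected and which action to suggest.
function validateDatasets(stats: DatasetStats[]): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  for (const s of stats) {
    if (s.categoryCount > MAX_CATEGORIES) {
      issues.push({ dataset: s.name, reason: 'high-cardinality' });
    }
    if (s.docCount < MIN_TRAINING_DOCS) {
      issues.push({ dataset: s.name, reason: 'insufficient-training-data' });
    }
  }
  return issues;
}
```

A warning banner on the results screen could then be driven directly by the returned list: empty means no banner, otherwise group issues by dataset and render the suggested filter action next to each.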