[ML] Ability to move anomaly detection jobs between clusters #37987
Pinging @elastic/ml-core
Another idea on this enhancement. Sometimes it is necessary to "clone" a job on the same cluster, but keep the state. The scenario is:
This would be fixed if the user could "export/import" on the same cluster (with the model state).
I can't easily think of a scenario where the reasons for the current job getting locked up would not also apply to the exported/imported version. That said, I agree we should explore other reasons for exporting/importing within the same cluster, such as moving from a shared results index to a dedicated one. We should also be careful that this is not (ab)used as a way to bootstrap a job, because in general it will take longer to unlearn a model trained on different data than to learn from scratch on the right data. Advice on best practice should be given in the docs.
I can; this is a real bug that occurred. The Java code threw an exception during
Ability to "snapshot" and "restore" job and datafeed configurations so that they can be moved between clusters.
This could be just the job and datafeed config: for example, if a job had been proven in a staging environment, its config could then easily be transferred to production. It would also allow job configs to be stored in git (for example), from where they can be easily recreated; the current GET jobs API does not return a clean config that can be recreated as-is (see the sketch below).
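As a rough illustration of the config-only case, here is a minimal Python sketch that copies a job config between clusters using the existing ML REST APIs. The cluster URLs, job id, and the exact set of read-only fields to strip are assumptions; the fields returned by GET that a PUT rejects vary by Elasticsearch version, which is part of why a first-class export/import would help.

```python
# Sketch: copy an anomaly detection job config (not its model state) from a
# staging cluster to a production cluster. Illustrative only; field names and
# URLs are assumptions, not a supported workflow.
import requests

STAGING = "https://staging-es:9200"   # placeholder URLs
PROD = "https://prod-es:9200"
JOB_ID = "my-job"                     # placeholder job id

# Fields returned by GET _ml/anomaly_detectors that the create API will not
# accept back (illustrative list, not exhaustive and version-dependent).
READ_ONLY_JOB_FIELDS = {
    "job_id", "job_type", "job_version", "create_time", "finished_time",
    "model_snapshot_id", "data_counts", "model_size_stats", "state",
}

def export_job(job_id: str) -> dict:
    resp = requests.get(f"{STAGING}/_ml/anomaly_detectors/{job_id}")
    resp.raise_for_status()
    job = resp.json()["jobs"][0]
    # Strip the read-only fields so the result is a "clean" re-creatable config.
    return {k: v for k, v in job.items() if k not in READ_ONLY_JOB_FIELDS}

def import_job(job_id: str, config: dict) -> dict:
    resp = requests.put(f"{PROD}/_ml/anomaly_detectors/{job_id}", json=config)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    config = export_job(JOB_ID)   # the cleaned config could also be committed to git here
    print(import_job(JOB_ID, config))
```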
Or, in a DR scenario, the whole job could be moved, including model and persisted state. The job could then be set to continue from where it left off (or from the time of the latest persisted state), provided the source indices were also available to read (see the second sketch below). This would also be applicable to a migration / side-by-side upgrade scenario.
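For the "continue from the latest persisted state" part, the sketch below shows roughly what the resume step could look like on the target cluster, assuming the job config and the ML state/results indices had already been made available there (which is exactly the gap this issue asks to close). The URLs and ids are placeholders.

```python
# Sketch: resume a job from its latest persisted model snapshot on a DR cluster.
# Assumes the job and its state already exist on this cluster; illustrative only.
import requests

ES = "https://dr-es:9200"             # placeholder URL
JOB_ID = "my-job"                     # placeholder job id
DATAFEED_ID = "datafeed-my-job"       # placeholder datafeed id

# Find the most recent persisted model snapshot for the job.
snapshots = requests.get(
    f"{ES}/_ml/anomaly_detectors/{JOB_ID}/model_snapshots",
    params={"sort": "timestamp", "desc": "true", "size": 1},
).json()["model_snapshots"]
latest = snapshots[0]

# Revert the job to that snapshot, discarding any newer intermediate results.
requests.post(
    f"{ES}/_ml/anomaly_detectors/{JOB_ID}/model_snapshots/"
    f"{latest['snapshot_id']}/_revert",
    json={"delete_intervening_results": True},
)

# Open the job and restart its datafeed from the snapshot's latest record time,
# so analysis resumes roughly where it left off.
requests.post(f"{ES}/_ml/anomaly_detectors/{JOB_ID}/_open")
requests.post(
    f"{ES}/_ml/datafeeds/{DATAFEED_ID}/_start",
    params={"start": latest["latest_record_time_stamp"]},
)
```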
I suggest we be careful about overloading the snapshot/restore terminology.
Note: This has often been requested and discussed, but seems to be lacking an issue (that I can find).