The original design of the datafeed envisioned it as a general-purpose tool that could be used with different types of jobs rather than just anomaly detection. As the code has evolved, the datafeed has come to look more like a single-purpose tool dedicated to feeding data to anomaly detector jobs (query delay, aggregations, write to autodetect), and it is not easily adaptable to future use cases or job types. It was also imagined that a single datafeed could feed multiple jobs, but aggregations reduce the data volume efficiently enough that we have not needed this. And because the ideal aggregation interval is a function of bucket span, it is not always appropriate to feed the same data to multiple jobs with different bucket spans. To some extent multi-bucket anomalies have mitigated this requirement.
The change to move configuration out of the cluster state (#32905) has shown that the current arrangement is vulnerable to inconsistencies, as the datafeed and job are defined in separate documents that can change independently. Given that a datafeed is tightly coupled to its job, the configuration could be defined inside the job itself, which is how the UI already presents the datafeed: as part of the job. This would simplify the code, as only one document needs to be read, and would ensure consistency. This needn't break the REST API, as the datafeeds can be extracted from the jobs without the client having any knowledge of where they came from.
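To make the idea concrete, here is a minimal sketch of extracting datafeeds from job documents. It is not how the ML plugin stores configs today: the embedded `datafeed_config` layout, the sample documents, and the helper names `extract_datafeeds` and `get_datafeed` are all assumptions for illustration only.

```python
from copy import deepcopy
from typing import Any, Optional

# Hypothetical layout: each job document optionally embeds its datafeed.
# This is an illustrative assumption, not the current storage format.
JOB_DOCS: list[dict[str, Any]] = [
    {
        "job_id": "web-traffic",
        "analysis_config": {"bucket_span": "15m"},
        "datafeed_config": {
            "datafeed_id": "datafeed-web-traffic",
            "indices": ["web-logs-*"],
            "query_delay": "60s",
        },
    },
    {
        # A job with no datafeed (for example, one fed by posting data).
        "job_id": "no-feed",
        "analysis_config": {"bucket_span": "1h"},
    },
]


def extract_datafeeds(job_docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build standalone datafeed views from the embedded configs.

    The client sees the same shape as a separately stored datafeed:
    the embedded config plus the id of the owning job.
    """
    feeds = []
    for job in job_docs:
        embedded = job.get("datafeed_config")
        if embedded is None:
            continue
        feed = deepcopy(embedded)  # never mutate the stored job document
        feed["job_id"] = job["job_id"]
        feeds.append(feed)
    return feeds


def get_datafeed(
    job_docs: list[dict[str, Any]], datafeed_id: str
) -> Optional[dict[str, Any]]:
    """Look up one datafeed by id, or return None if no job embeds it."""
    for feed in extract_datafeeds(job_docs):
        if feed["datafeed_id"] == datafeed_id:
            return feed
    return None
```

Because the job and its datafeed live in one document, a single read or write covers both, so the two configs cannot drift apart between updates.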
I'm not advocating making the change today, but if the burden of maintaining separate configs for datafeeds and anomaly detector jobs grows, the refactor should be made.
Another thing is that even if datafeeds were made generic enough that they could be reused for some future type of job, that wouldn't preclude storing the datafeeds for anomaly detector jobs inside the anomaly detector job config.
Since #37349, jobs and datafeeds are even more tightly coupled. The fact that we have to do these checks to make jobs and datafeeds work together highlights that it was probably the wrong decision to separate them in the first place. However, the work to combine them now would be huge. It would be a similar project to the one that moved ML configs from cluster state to an index, so it would have to be done around the 7.last -> 8.0 timeframe and would take 3-4 person-months of effort to do in a way that was backwards compatible for end users. These changes would also have more impact on the UI than the ML config migration project, so the total time taken would be even greater. I'm not sure that having separate jobs and datafeeds causes enough pain to justify all this complex rework.