
[ML] Add queue_capacity setting to start deployment API #79433

Conversation

dimitris-athanasiou (Contributor)

Adds a setting to the start trained model deployment API
that allows configuring the size of the queueing mechanism
that handles inference requests.

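In other words, each deployment gets a bounded buffer for incoming inference requests, and queue_capacity controls its size. A minimal sketch of the behaviour being configured, assuming a bounded blocking queue (the class and method names below are invented for illustration, not the PR's code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only: a bounded per-deployment queue whose size comes
// from the queue_capacity given on the start request. Names are hypothetical.
final class DeploymentInferenceQueue {
    private final BlockingQueue<Runnable> pending;

    DeploymentInferenceQueue(int queueCapacity) {
        this.pending = new ArrayBlockingQueue<>(queueCapacity);
    }

    // Returns false when the queue is full, i.e. the inference request is
    // rejected rather than buffered without bound.
    boolean tryEnqueue(Runnable inferenceRequest) {
        return pending.offer(inferenceRequest);
    }
}
```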
@elasticmachine added the Team:ML label on Oct 19, 2021
@elasticmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou (Contributor Author) commented Oct 19, 2021

A few thoughts:

  • I opted for a per-deployment queue capacity setting because it lets users scale up the deployments that see lots of traffic while not letting smaller deployments waste memory.
  • The name queue_capacity was the simplest name I could think of. However, it may be best to qualify it a bit more. Let me know what you think.
  • We could also introduce a cluster setting for the default value so users can change it without having to pass it on every call to the start API. But we can always do that in the future if it proves useful, so I haven't added it in this PR (see the sketch after this list for what such a setting could look like).
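
For the cluster-wide default floated in the last bullet, a minimal sketch of what such a setting could look like (the holder class, setting key, default, and minimum below are all invented for illustration; no such setting is part of this PR):

```java
import org.elasticsearch.common.settings.Setting;

// Hypothetical sketch of a cluster-wide default queue capacity.
class MlDeploymentSettings {
    static final Setting<Integer> DEFAULT_QUEUE_CAPACITY = Setting.intSetting(
        "xpack.ml.trained_models.default_queue_capacity", // invented key
        1024,                                             // invented default
        1,                                                // minimum allowed value
        Setting.Property.NodeScope,
        Setting.Property.Dynamic
    );
}
```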

benwtrent (Member)

@dimitris-athanasiou I am not sure about adding a deployment setting. My gut tells me that this should initially be a cluster setting (like enrich). If we see the need for adding a deployment setting, then we can add one. I think we are making this decision too soon.

dimitris-athanasiou (Contributor Author)

@benwtrent Sure, I can also see it being a good start to have a cluster setting, and we can add a per-deployment param in the future if it proves useful. I'll change the PR to do so.

dimitris-athanasiou (Contributor Author)

In fact, it doesn't make sense to change this PR. So I'll just close it and open a new one.

droberts195 (Contributor)

I think this is quite different to enrich. In the enrich case, the same queue is used for all enrichment; in the inference case, the queue is per native process.

If we have a cluster setting, how dynamically will it take effect? If changes to the cluster setting only take effect when a deployment is restarted, then that's a very good reason to keep this as a per-deployment setting: it can be changed for the deployment that's having problems without disturbing any other deployments or creating confusion about which queue size applies to which deployment. If the cluster setting takes effect as soon as it's changed, without restarting any deployment, then this is not a consideration, and the cluster setting would actually be better.
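
For reference, a cluster setting registered as dynamic can take effect immediately via an update consumer; a minimal sketch under the assumption of the hypothetical setting sketched earlier (resizeQueues is an invented helper):

```java
import org.elasticsearch.common.settings.ClusterSettings;

// Sketch: a setting registered with Setting.Property.Dynamic can be watched
// via an update consumer, so changes made through the cluster settings API
// apply without restarting anything. Both names below are hypothetical.
class QueueCapacityUpdater {
    QueueCapacityUpdater(ClusterSettings clusterSettings) {
        clusterSettings.addSettingsUpdateConsumer(
            MlDeploymentSettings.DEFAULT_QUEUE_CAPACITY,
            this::resizeQueues
        );
    }

    private void resizeQueues(int newCapacity) {
        // Hypothetical: apply the new default to deployments that were
        // started without an explicit queue_capacity.
    }
}
```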

dimitris-athanasiou (Contributor Author)

Based on Dave's points and offline discussion we have decided to keep this as a per-deployment setting.

Comment on lines -283 to -293
```java
if (modelBytes < 0) {
    throw new IllegalArgumentException("modelBytes must be non-negative");
}
this.inferenceThreads = inferenceThreads;
if (inferenceThreads < 1) {
    throw new IllegalArgumentException(INFERENCE_THREADS + " must be positive");
}
this.modelThreads = modelThreads;
if (modelThreads < 1) {
    throw new IllegalArgumentException(MODEL_THREADS + " must be positive");
}
```
benwtrent (Member)

I think these validations should remain, especially modelBytes, as negative bytes will break a ton of logic downstream (node allocation, etc.).

dimitris-athanasiou (Contributor Author)

We validate those elsewhere. For example, we fetch model bytes from TrainedModelDefinitionDoc, where we check the value is positive. The same goes for the threading and queue params, which are validated on the start request. The benefit of having no validation here is that this object is persisted in the cluster state, so deciding to change the range of valid values in the future could leave the cluster unable to start. We have adequate validations around those in case someone changes the cluster state directly (e.g. the native process will fail to launch for invalid values).
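
For context, the request-time validation mentioned here typically lives in the ActionRequest's validate() method. A rough sketch of how the start request could guard these values (field names are illustrative, not the PR's actual code):

```java
import org.elasticsearch.action.ActionRequestValidationException;

import static org.elasticsearch.action.ValidateActions.addValidationError;

// Illustrative only: the standard ActionRequest#validate() pattern applied to
// the params discussed above, as it would appear on the request class.
public ActionRequestValidationException validate() {
    ActionRequestValidationException e = null;
    if (queueCapacity < 1) {
        e = addValidationError("[queue_capacity] must be a positive integer", e);
    }
    if (inferenceThreads < 1) {
        e = addValidationError("[inference_threads] must be a positive integer", e);
    }
    if (modelThreads < 1) {
        e = addValidationError("[model_threads] must be a positive integer", e);
    }
    return e; // null means the request is valid
}
```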

dimitris-athanasiou merged commit 7d637b8 into elastic:master on Oct 19, 2021
dimitris-athanasiou deleted the inference-queue-capacity-setting-reopen branch on Oct 19, 2021 at 20:57
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Oct 20, 2021
* upstream/master: (24 commits)
  Implement framework for migrating system indices (elastic#78951)
  Improve transient settings deprecation message (elastic#79504)
  Remove getValue and getValues from Field (elastic#79516)
  Store Template's mappings as bytes for disk serialization (elastic#78746)
  [ML] Add queue_capacity setting to start deployment API (elastic#79433)
  [ML] muting rest compat test issue elastic#79518 (elastic#79519)
  Avoid redundant available indices check (elastic#76540)
  Re-enable BWC tests
  TEST Ensure password 14 chars length on Kerberos FIPS tests (elastic#79496)
  [DOCS] Temporarily remove APM links (elastic#79411)
  Fix CCSDuelIT for skipped shards (elastic#79490)
  Add other time accounting in HotThreads (elastic#79392)
  Add deprecation info API entries for deprecated monitoring settings (elastic#78799)
  Add note in breaking changes for nameid_format (elastic#77785)
  Use 'migration' instead of 'upgrade' in GET system feature migration status responses (elastic#79302)
  Upgrade lucene version 8b68bf60c98 (elastic#79461)
  Use Strings#EMPTY_ARRAY (elastic#79452)
  Quicker shared cache file preallocation (elastic#79447)
  [ML] Removing some code that's obsolete for 8.0 (elastic#79444)
  Ensure indexing_data CCR requests are compressed (elastic#79413)
  ...
Labels: :ml (Machine learning), >non-issue, Team:ML, v8.0.0-beta1