[ML] Add queue_capacity setting to start deployment API #79433
Conversation
Adds a setting to the start trained model deployment API that allows configuring the size of the queueing mechanism that handles inference requests.
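As a rough illustration of the mechanism being configured, here is a minimal sketch of a capacity-bounded inference queue in Java; the class and method names are hypothetical, not the PR's actual implementation. Requests beyond the configured capacity are rejected immediately rather than buffered without bound.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a capacity-bounded queue in front of the native
// inference process. All names here are illustrative, not from the PR.
final class InferenceRequestQueue {
    private final BlockingQueue<Runnable> queue;

    InferenceRequestQueue(int queueCapacity) {
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
    }

    // offer() returns false immediately when the queue is full, so the caller
    // can fail fast with a rejection error instead of blocking.
    boolean tryEnqueue(Runnable inferenceRequest) {
        return queue.offer(inferenceRequest);
    }
}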
Pinging @elastic/ml-core (Team:ML)
A few thoughts:
@dimitris-athanasiou I am not sure about adding a deployment setting. My gut tells me that this should initially be a cluster setting (like enrich). If we see the need for adding a deployment setting, then we can add one. I think we are making this decision too soon.
@benwtrent Sure, I can also see it being a good start to have a cluster setting; we can add a per-deployment param in the future if it proves useful. I'll change the PR to do so.
In fact, it doesn't make sense to change this PR, so I'll just close it and open a new one.
I think this is quite different to enrich. In the enrich case the same queue is used for all enrichment; in the inference case the queue is per native process. If we have a cluster setting, how dynamically will it take effect? If changes to the cluster setting would only take effect on restarting a deployment, then I think that's a very good reason to keep this as a per-deployment setting: it can then be changed for the deployment that's having problems without disturbing any other deployments or creating confusion about which queue size applies to which deployment. If the cluster setting can take effect as soon as it's changed, without restarting any deployment, then this is not a consideration, and then the cluster setting would actually be better.
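For context on the distinction Dave draws, below is a minimal sketch of a dynamic cluster setting in Elasticsearch; the setting key and default are hypothetical, and the PR ultimately did not take this route. A setting registered with Setting.Property.Dynamic can be updated at runtime via the cluster settings API, and an update consumer observes the new value without any restart.

import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Setting;

// Hypothetical sketch only: a dynamic, node-scoped queue-capacity setting.
// The setting key and default below are invented for illustration.
class QueueCapacityClusterSetting {

    static final Setting<Integer> QUEUE_CAPACITY = Setting.intSetting(
        "xpack.ml.inference.queue_capacity",  // hypothetical key
        10_000,                               // hypothetical default
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );

    private volatile int queueCapacity = 10_000; // matches the hypothetical default

    QueueCapacityClusterSetting(ClusterSettings clusterSettings) {
        // The consumer fires on every settings update, so a running deployment
        // could pick up the new capacity immediately, with no restart.
        clusterSettings.addSettingsUpdateConsumer(QUEUE_CAPACITY, v -> this.queueCapacity = v);
    }
}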
Based on Dave's points and an offline discussion, we have decided to keep this as a per-deployment setting.
.../src/main/java/org/elasticsearch/xpack/core/ml/action/StartTrainedModelDeploymentAction.java
if (modelBytes < 0) {
    throw new IllegalArgumentException("modelBytes must be non-negative");
}
this.inferenceThreads = inferenceThreads;
if (inferenceThreads < 1) {
    throw new IllegalArgumentException(INFERENCE_THREADS + " must be positive");
}
this.modelThreads = modelThreads;
if (modelThreads < 1) {
    throw new IllegalArgumentException(MODEL_THREADS + " must be positive");
}
I think these validations should remain, especially the one for modelBytes, as negative bytes will break a ton of logic downstream (node allocation, etc.).
We validate those elsewhere. For example, we fetch model bytes from TrainedModelDefinitionDoc, where we check the value is positive. The same goes for the threading and queue params, which are validated on the start request. The benefit of having no validation here is that this object is persisted in the cluster state, and deciding to change the range of valid values in the future could result in the cluster not being able to start. We have adequate validations around those in case someone changes the cluster state directly (e.g. the native process will fail to launch for invalid values).
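To make the trade-off concrete, here is a minimal sketch of request-time validation along the lines described above; the method shape and error messages are illustrative, not the PR's exact code.

import org.elasticsearch.action.ActionRequestValidationException;

import static org.elasticsearch.action.ValidateActions.addValidationError;

// Illustrative sketch: rejecting bad values on the (transient) start request
// keeps the constraint out of the persisted cluster-state object, so valid
// ranges can change later without breaking previously persisted task params.
class StartRequestValidation {

    static ActionRequestValidationException validate(int inferenceThreads, int modelThreads, int queueCapacity) {
        ActionRequestValidationException e = null;
        if (inferenceThreads < 1) {
            e = addValidationError("[inference_threads] must be a positive integer", e);
        }
        if (modelThreads < 1) {
            e = addValidationError("[model_threads] must be a positive integer", e);
        }
        if (queueCapacity < 1) {
            e = addValidationError("[queue_capacity] must be a positive integer", e);
        }
        return e; // null means the request is valid
    }
}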
* upstream/master: (24 commits)
  - Implement framework for migrating system indices (elastic#78951)
  - Improve transient settings deprecation message (elastic#79504)
  - Remove getValue and getValues from Field (elastic#79516)
  - Store Template's mappings as bytes for disk serialization (elastic#78746)
  - [ML] Add queue_capacity setting to start deployment API (elastic#79433)
  - [ML] muting rest compat test issue elastic#79518 (elastic#79519)
  - Avoid redundant available indices check (elastic#76540)
  - Re-enable BWC tests
  - TEST Ensure password 14 chars length on Kerberos FIPS tests (elastic#79496)
  - [DOCS] Temporarily remove APM links (elastic#79411)
  - Fix CCSDuelIT for skipped shards (elastic#79490)
  - Add other time accounting in HotThreads (elastic#79392)
  - Add deprecation info API entries for deprecated monitoring settings (elastic#78799)
  - Add note in breaking changes for nameid_format (elastic#77785)
  - Use 'migration' instead of 'upgrade' in GET system feature migration status responses (elastic#79302)
  - Upgrade lucene version 8b68bf60c98 (elastic#79461)
  - Use Strings#EMPTY_ARRAY (elastic#79452)
  - Quicker shared cache file preallocation (elastic#79447)
  - [ML] Removing some code that's obsolete for 8.0 (elastic#79444)
  - Ensure indexing_data CCR requests are compressed (elastic#79413)
  - ...