
[ML] Add queue_capacity setting to start deployment API #79433

Conversation

dimitris-athanasiou (Contributor)

Adds a setting to the start trained model deployment API
that allows configuring the size of the queueing mechanism
that handles inference requests.

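In other words, each deployment gets a bounded buffer for incoming inference requests, and queue_capacity controls its size. A minimal sketch of the behaviour being configured, assuming a bounded blocking queue (the class and method names below are invented for illustration, not the PR's code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only: a bounded per-deployment queue whose size comes
// from the queue_capacity given on the start request. Names are hypothetical.
final class DeploymentInferenceQueue {
    private final BlockingQueue<Runnable> pending;

    DeploymentInferenceQueue(int queueCapacity) {
        this.pending = new ArrayBlockingQueue<>(queueCapacity);
    }

    // Returns false when the queue is full, i.e. the inference request is
    // rejected rather than buffered without bound.
    boolean tryEnqueue(Runnable inferenceRequest) {
        return pending.offer(inferenceRequest);
    }
}
```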
@elasticmachine added the Team:ML label on Oct 19, 2021
@elasticmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou (Contributor Author) commented Oct 19, 2021

A few thoughts:

  • I opted for a per-deployment queue capacity setting because it lets users scale up the deployments that see lots of traffic while not letting smaller deployments waste memory.
  • The name queue_capacity was the simplest name I could think of. However, it may be best to qualify it a bit more. Let me know what you think.
  • We could also introduce a cluster setting for the default value so users can change it without having to pass it on every call to the start API. But we can always do that in the future if it proves useful, so I haven't added it in this PR (see the sketch after this list for what such a setting could look like).
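
For the cluster-wide default floated in the last bullet, a minimal sketch of what such a setting could look like (the holder class, setting key, default, and minimum below are all invented for illustration; no such setting is part of this PR):

```java
import org.elasticsearch.common.settings.Setting;

// Hypothetical sketch of a cluster-wide default queue capacity.
class MlDeploymentSettings {
    static final Setting<Integer> DEFAULT_QUEUE_CAPACITY = Setting.intSetting(
        "xpack.ml.trained_models.default_queue_capacity", // invented key
        1024,                                             // invented default
        1,                                                // minimum allowed value
        Setting.Property.NodeScope,
        Setting.Property.Dynamic
    );
}
```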

benwtrent (Member)

@dimitris-athanasiou I am not sure about adding a deployment setting. My gut tells me that this should initially be a cluster setting (like enrich). If we see the need for adding a deployment setting, then we can add one. I think we are making this decision too soon.

dimitris-athanasiou (Contributor Author)

@benwtrent Sure, I can also see it being a good start to have a cluster setting, and we can add a per-deployment param in the future if it proves useful. I'll change the PR to do so.

dimitris-athanasiou (Contributor Author)

In fact, it doesn't make sense to change this PR. So I'll just close it and open a new one.

droberts195 (Contributor)

I think this is quite different to enrich. In the enrich case, the same queue is used for all enrichment; in the inference case, the queue is per native process.

If we have a cluster setting, how dynamically will it take effect? If changes to the cluster setting only take effect when a deployment is restarted, then that's a very good reason to keep this as a per-deployment setting: it can be changed for the deployment that's having problems without disturbing any other deployments or creating confusion about which queue size applies to which deployment. If the cluster setting takes effect as soon as it's changed, without restarting any deployment, then this is not a consideration, and the cluster setting would actually be better.
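
For reference, a cluster setting registered as dynamic can take effect immediately via an update consumer; a minimal sketch under the assumption of the hypothetical setting sketched earlier (resizeQueues is an invented helper):

```java
import org.elasticsearch.common.settings.ClusterSettings;

// Sketch: a setting registered with Setting.Property.Dynamic can be watched
// via an update consumer, so changes made through the cluster settings API
// apply without restarting anything. Both names below are hypothetical.
class QueueCapacityUpdater {
    QueueCapacityUpdater(ClusterSettings clusterSettings) {
        clusterSettings.addSettingsUpdateConsumer(
            MlDeploymentSettings.DEFAULT_QUEUE_CAPACITY,
            this::resizeQueues
        );
    }

    private void resizeQueues(int newCapacity) {
        // Hypothetical: apply the new default to deployments that were
        // started without an explicit queue_capacity.
    }
}
```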

dimitris-athanasiou (Contributor Author)

Based on Dave's points and offline discussion we have decided to keep this as a per-deployment setting.

Comment on lines -283 to -293
```java
if (modelBytes < 0) {
    throw new IllegalArgumentException("modelBytes must be non-negative");
}
this.inferenceThreads = inferenceThreads;
if (inferenceThreads < 1) {
    throw new IllegalArgumentException(INFERENCE_THREADS + " must be positive");
}
this.modelThreads = modelThreads;
if (modelThreads < 1) {
    throw new IllegalArgumentException(MODEL_THREADS + " must be positive");
}
```
benwtrent (Member)

I think these validations should remain, especially modelBytes, as negative bytes will break a ton of logic downstream (node allocation, etc.).

dimitris-athanasiou (Contributor Author)

We validate those elsewhere. For example, we fetch model bytes from TrainedModelDefinitionDoc, where we check the value is positive. The same goes for the threading and queue params, which are validated on the start request. The benefit of having no validation here is that this object is persisted in the cluster state, so deciding to change the range of valid values in the future could leave the cluster unable to start. We have adequate validations around those in case someone changes the cluster state directly (e.g. the native process will fail to launch for invalid values).
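
For context, the request-time validation mentioned here typically lives in the ActionRequest's validate() method. A rough sketch of how the start request could guard these values (field names are illustrative, not the PR's actual code):

```java
import org.elasticsearch.action.ActionRequestValidationException;

import static org.elasticsearch.action.ValidateActions.addValidationError;

// Illustrative only: the standard ActionRequest#validate() pattern applied to
// the params discussed above, as it would appear on the request class.
public ActionRequestValidationException validate() {
    ActionRequestValidationException e = null;
    if (queueCapacity < 1) {
        e = addValidationError("[queue_capacity] must be a positive integer", e);
    }
    if (inferenceThreads < 1) {
        e = addValidationError("[inference_threads] must be a positive integer", e);
    }
    if (modelThreads < 1) {
        e = addValidationError("[model_threads] must be a positive integer", e);
    }
    return e; // null means the request is valid
}
```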

dimitris-athanasiou merged commit 7d637b8 into elastic:master on Oct 19, 2021
dimitris-athanasiou deleted the inference-queue-capacity-setting-reopen branch on Oct 19, 2021 at 20:57
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Oct 20, 2021
* upstream/master: (24 commits)
  Implement framework for migrating system indices (elastic#78951)
  Improve transient settings deprecation message (elastic#79504)
  Remove getValue and getValues from Field (elastic#79516)
  Store Template's mappings as bytes for disk serialization (elastic#78746)
  [ML] Add queue_capacity setting to start deployment API (elastic#79433)
  [ML] muting rest compat test issue elastic#79518 (elastic#79519)
  Avoid redundant available indices check (elastic#76540)
  Re-enable BWC tests
  TEST Ensure password 14 chars length on Kerberos FIPS tests (elastic#79496)
  [DOCS] Temporarily remove APM links (elastic#79411)
  Fix CCSDuelIT for skipped shards (elastic#79490)
  Add other time accounting in HotThreads (elastic#79392)
  Add deprecation info API entries for deprecated monitoring settings (elastic#78799)
  Add note in breaking changes for nameid_format (elastic#77785)
  Use 'migration' instead of 'upgrade' in GET system feature migration status responses (elastic#79302)
  Upgrade lucene version 8b68bf60c98 (elastic#79461)
  Use Strings#EMPTY_ARRAY (elastic#79452)
  Quicker shared cache file preallocation (elastic#79447)
  [ML] Removing some code that's obsolete for 8.0 (elastic#79444)
  Ensure indexing_data CCR requests are compressed (elastic#79413)
  ...
Labels: :ml (Machine learning), >non-issue, Team:ML, v8.0.0-beta1