Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Group
DeploymentHandle
autoscaling metrics pushes by process #45957base: master
Are you sure you want to change the base?
[Serve] Group
DeploymentHandle
autoscaling metrics pushes by process #45957Changes from all commits
9b3e368
45dc6e5
7cf7299
84f6c10
4a48902
af5d76a
415a44b
088800f
86da5fa
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zcin @edoakes I'm working on resurrecting this PR and I think this was the main sticking point - the shared metrics pusher wouldn't respect the
autoscaling_config.metrics_interval_s
in the current form of the PR. Can we deprecate that parameter? Would we need to feature-flag this PR so it can be introduced gradually? Or maybe we want to preserve it and have a different shared pusher for eachmetric_interval_s
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with making the
metrics_interval_s
a cluster-level option configured via env var. WDYT Cindy?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I like that option!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think making the metrics interval a cluster level option makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good - I will take that approach when I get back from the holidays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it turns out there already is an env var for this,
HANDLE_METRIC_PUSH_INTERVAL_S
ray/python/ray/serve/_private/constants.py
Lines 148 to 151 in 4742373
ray/python/ray/serve/_private/router.py
Line 350 in 86da5fa
The
autoscaling_config.metrics_interval_s
is also used in a variety of other places, like for controlling how often metrics are recordedray/python/ray/serve/_private/router.py
Lines 200 to 203 in 86da5fa
Should I remove all of those and just use the single env var, or keep the other uses and only "ignore" the setting here for the shared pusher?
I'm inclined to remove all of them for consistency but that's a much bigger blast radius...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove
autoscaling_config.metrics_interval_s
and just useRAY_SERVE_HANDLE_AUTOSCALING_METRIC_RECORD_PERIOD_S
for recording metrics, because the interval at which we record metrics is usually more frequent than pushing metrics to the controller. And then we can useHANDLE_METRIC_PUSH_INTERVAL_S
, but perhaps renaming it to include "autoscaling" would be better going forward, if it is to replaceautoscaling_config.metrics_interval_s
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this
else
block - it seems like even ifRAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE = False
we still register the pushing task? It just won't have any data because the_add_autoscaling_metrics_point
task isn't registered? 🤔