[Bugfix][Core] Restore logging of stats in the async engine #4150
Conversation
# Log stats.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))
Would it be worth considering moving the call to stat_logger.log() into _process_model_outputs()? Currently, it's being invoked in both LLMEngine.step() and AsyncLLMEngine.step_async(), duplicating the functionality.
vllm/vllm/engine/llm_engine.py, lines 542 to 544 in fe3b5bb:
# Log stats.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))
What are your thoughts?
I feel like having it here makes more sense for abstraction? It's technically not related to processing the model output.
Alternatively, we could create a method like post_process_step(), put the logging logic there, and just call it from both the sync and async paths?
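For illustration, a minimal sketch of what that could look like (the name post_process_step() comes from the suggestion above; everything else is an assumption about how it would be wired in, not existing vllm code):

```python
# Hypothetical helper, not actual vllm code: a shared tail for step() and
# step_async() so the stats logging lives in exactly one place.
def post_process_step(self, scheduler_outputs) -> None:
    """Emit engine stats once per engine iteration, if enabled."""
    if self.log_stats:
        self.stat_logger.log(self._get_stats(scheduler_outputs))
```

Both step() and step_async() would then end with a call to self.post_process_step(scheduler_outputs).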
> it is technically not related to process model output

It's just logging to me and could be folded in anywhere, lol. I feel it's simplest to just fold it into _process_model_outputs, and maybe make _process_model_outputs accept scheduler_outputs instead of scheduled_seq_groups and ignored_seq_groups?
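Roughly the shape being suggested, as a sketch only (the body is elided, and _build_request_outputs is a hypothetical placeholder for the existing processing, not a real vllm method):

```python
# Illustrative sketch, not the actual vllm implementation: let
# _process_model_outputs() take the whole SchedulerOutputs and own the logging.
def _process_model_outputs(self, output, scheduler_outputs):
    # Existing per-sequence-group processing, represented here by a
    # hypothetical placeholder helper.
    request_outputs = self._build_request_outputs(
        output, scheduler_outputs.scheduled_seq_groups,
        scheduler_outputs.ignored_seq_groups)

    # Logging now happens in one place, shared by step() and step_async().
    if self.log_stats:
        self.stat_logger.log(self._get_stats(scheduler_outputs))

    return request_outputs
```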
IMO relevant logging should go into the relevant API. E.g., logging about e2e stats is a little bit weird in the process-model-output API, but it makes sense in the step API. It is fine now because of all the assumptions we have (process_model_output is called only by step), so not a very strong opinion or a blocker at the moment.
Is it possible to add a simple regression test?
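One possible shape for such a test, heavily hedged: the import paths, the engine.engine.stat_logger attribute chain, the AsyncEngineArgs/generate() signatures, and the pytest-asyncio dependency are all assumptions that would need checking against the current code base, and facebook/opt-125m is just an arbitrary small model:

```python
import pytest
from unittest.mock import patch

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


@pytest.mark.asyncio
async def test_async_engine_logs_stats():
    # Build a small async engine with stat logging enabled (assumed args).
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m", disable_log_stats=False))

    # Spy on the stat logger and drive one request through the async engine.
    with patch.object(engine.engine.stat_logger, "log") as mock_log:
        async for _ in engine.generate("Hello, my name is",
                                       SamplingParams(max_tokens=8),
                                       request_id="test-request"):
            pass

    assert mock_log.called, "stat_logger.log() was never invoked by step_async()"
```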
output, scheduler_outputs.scheduled_seq_groups,
scheduler_outputs.ignored_seq_groups)

# Log stats.
if self.log_stats:
Nit: I think you can just do the logging before the code below, instead of assigning the result to request_outputs:
return self._process_model_outputs(
output, scheduler_outputs.scheduled_seq_groups,
scheduler_outputs.ignored_seq_groups)
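In other words, the nit is proposing this ordering (sketch):

```python
# Log first, then return the processed outputs directly instead of going
# through an intermediate request_outputs variable.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))

return self._process_model_outputs(
    output, scheduler_outputs.scheduled_seq_groups,
    scheduler_outputs.ignored_seq_groups)
```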
I'm not sure it will work, as stat_logger.log() should be called after seq_group.maybe_set_first_token_time(now).
vllm/vllm/engine/llm_engine.py, line 467 in e8cc796:
seq_group.maybe_set_first_token_time(now)
Is first_token_time used in stat_logger? https://github.com/search?q=repo%3Avllm-project%2Fvllm%20first_token_time&type=code
My bad, it's not. However, I still think that the logging should happen after _process_model_outputs(). That's because during logging we calculate the latency between tokens, which I believe should include the processing of the model outputs.
vllm/vllm/engine/llm_engine.py, lines 601 to 603 in fe3b5bb:
# Time since last token.
# (n.b. updates seq_group.metrics.last_token_time)
time_last_iters.append(seq_group.get_last_latency(now))
Let's merge it after adding a simple regression test!
It seems the OpenAI API server is not reporting the tokens/s metric correctly anymore (on main as of today). This PR won't affect this, but any ideas?
Thank you for fixing it. I'm merging this to unblock release. Please follow up with testing which is desperately needed!
Following the refactor of #3894, I noticed that the async engine has stopped logging stats to Prometheus.

The reason is that the call to stat_logger.log() was relocated from the _process_model_outputs() method (which is shared between the async and non-async engines) to LLMEngine.step(), which is used exclusively by the non-async engine. This PR fixes this by reintroducing the call to stat_logger.log() in the async engine.

cc @cadedaniel
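For reference, the restored tail of step_async() looks roughly like this (condensed sketch based on the snippets quoted in the review above, not the verbatim diff):

```python
request_outputs = self._process_model_outputs(
    output, scheduler_outputs.scheduled_seq_groups,
    scheduler_outputs.ignored_seq_groups)

# Log stats.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))

return request_outputs
```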