
[Bugfix][Core] Restore logging of stats in the async engine #4150

Merged 1 commit into vllm-project:main on Apr 19, 2024

Conversation

@ronensc (Contributor) commented Apr 17, 2024:

Following the refactor of #3894, I noticed that the async engine has stopped logging stats to Prometheus.

The reason is that the call to stat_logger.log() was relocated from _process_model_outputs() (which is shared between the async and non-async engines) to LLMEngine.step(), which is used only by the non-async engine.

This PR fixes this by reintroducing the call to stat_logger.log() in the async engine.
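
For reference, the restored call lands roughly like this in AsyncLLMEngine.step_async() (a simplified sketch; the diff snippet below shows the exact lines):

    request_outputs = self._process_model_outputs(
        output, scheduler_outputs.scheduled_seq_groups,
        scheduler_outputs.ignored_seq_groups)

    # Log stats.
    if self.log_stats:
        self.stat_logger.log(self._get_stats(scheduler_outputs))

    return request_outputs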

cc @cadedaniel

Comment on lines +224 to +226
# Log stats.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))
@ronensc (Contributor, Author):
Would it be worth considering moving the call to stat_logger.log() into _process_model_outputs()?
Currently, it is invoked in both LLMEngine.step() and AsyncLLMEngine.step_async(), which duplicates the logic.

# Log stats.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))

What are your thoughts?

@rkooo567 (Collaborator):

I feel like having it here makes more sense for abstraction? It is technically not related to processing the model output.

Collaborator:

Alternatively, we could create a method like post_process_step(), put the logging logic there, and just call it from both the sync and async paths?
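
A rough sketch of that suggestion (the method name _post_process_step() and its placement are hypothetical, not part of this PR):

    def _post_process_step(self, scheduler_outputs) -> None:
        # Hypothetical shared hook, called at the end of both
        # LLMEngine.step() and AsyncLLMEngine.step_async().
        if self.log_stats:
            self.stat_logger.log(self._get_stats(scheduler_outputs))

Both step() and step_async() would then call self._post_process_step(scheduler_outputs) just before returning.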

@sfc-gh-zhwang (Contributor) commented Apr 18, 2024:

Regarding "it is technically not related to process model output": it's just logging to me and could be folded in anywhere, lol.

I feel it's simpler to just fold it into _process_model_outputs(). Maybe make _process_model_outputs() accept scheduler_outputs instead of scheduled_seq_groups and ignored_seq_groups?
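
A sketch of that alternative signature (hypothetical; _build_request_outputs() is a placeholder standing in for the method's existing body):

    def _process_model_outputs(self, output, scheduler_outputs):
        # Unpack what the current signature receives as separate arguments.
        scheduled_seq_groups = scheduler_outputs.scheduled_seq_groups
        ignored_seq_groups = scheduler_outputs.ignored_seq_groups

        # Existing output-processing logic (placeholder).
        request_outputs = self._build_request_outputs(
            output, scheduled_seq_groups, ignored_seq_groups)

        # With scheduler_outputs in scope, the stats logging could fold in here.
        if self.log_stats:
            self.stat_logger.log(self._get_stats(scheduler_outputs))
        return request_outputs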

@rkooo567 (Collaborator) commented Apr 18, 2024:

IMO relevant logging should go with the relevant API. E.g., logging about e2e stats is a little weird in the process-model-output API, but it makes sense in the step API. It is fine now because of the assumptions we have (_process_model_outputs is called only by step), so this isn't a strong opinion or a blocker at the moment.

@rkooo567 (Collaborator) left a review comment:

Is it possible to add a simple regression test?


output, scheduler_outputs.scheduled_seq_groups,
scheduler_outputs.ignored_seq_groups)

# Log stats.
if self.log_stats:
A Contributor commented:

Nit: I think you can just do the logging before this and return the result of _process_model_outputs() directly, instead of assigning it to request_outputs:

        return self._process_model_outputs(
            output, scheduler_outputs.scheduled_seq_groups,
            scheduler_outputs.ignored_seq_groups)
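
In other words, the suggestion is roughly (a sketch, not the merged code):

    # Log stats first ...
    if self.log_stats:
        self.stat_logger.log(self._get_stats(scheduler_outputs))

    # ... then return the processed outputs directly, without the
    # intermediate request_outputs variable.
    return self._process_model_outputs(
        output, scheduler_outputs.scheduled_seq_groups,
        scheduler_outputs.ignored_seq_groups)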

@ronensc (Contributor, Author):

I'm not sure it will work, as stat_logger.log() should be called after seq_group.maybe_set_first_token_time(now).

seq_group.maybe_set_first_token_time(now)

@ronensc (Contributor, Author) commented Apr 18, 2024:

My bad, it's not. However, I still think the logging should happen after _process_model_outputs(), because during logging we calculate the latency between tokens, which I believe should include the processing of the model outputs.

# Time since last token.
# (n.b. updates seq_group.metrics.last_token_time)
time_last_iters.append(seq_group.get_last_latency(now))
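
A simplified reading of the snippet above (based on its comment; the real implementation may differ): get_last_latency(now) measures the time since the previous token and updates the stored timestamp, so calling log() before _process_model_outputs() would leave the output-processing time out of that interval.

    # Roughly what get_last_latency(now) does, per the comment above:
    latency = now - seq_group.metrics.last_token_time
    seq_group.metrics.last_token_time = now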

@rkooo567 (Collaborator):

Let's merge it after adding a simple regression test!

@mgoin (Member) commented Apr 19, 2024:

It seems the OpenAI API server is no longer reporting the tokens/s metric correctly (on main as of today):

INFO 04-19 13:38:10 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 49 reqs, Swapped: 0 reqs, Pending: 981 reqs, GPU KV cache usage: 4.5%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:20 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 79 reqs, Swapped: 0 reqs, Pending: 951 reqs, GPU KV cache usage: 6.9%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:30 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 99 reqs, Swapped: 0 reqs, Pending: 931 reqs, GPU KV cache usage: 8.6%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:40 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 129 reqs, Swapped: 0 reqs, Pending: 901 reqs, GPU KV cache usage: 11.2%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:50 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 149 reqs, Swapped: 0 reqs, Pending: 881 reqs, GPU KV cache usage: 12.9%, CPU KV cache usage: 0.0%

This PR won't affect this, but any ideas?

@esmeetu mentioned this pull request on Apr 19, 2024.
@simon-mo (Collaborator) left a review comment:
Thank you for fixing it. I'm merging this to unblock the release. Please follow up with testing, which is desperately needed!

@simon-mo merged commit 7be4f56 into vllm-project:main on Apr 19, 2024. 46 checks passed.
alexeykondrat pushed a commit to alexeykondrat/ci-vllm that referenced this pull request on May 1, 2024.
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request on May 7, 2024.
@ronensc deleted the fix-log-stats branch on May 29, 2024.