
[Bugfix][Core] Restore logging of stats in the async engine #4150

Merged 1 commit into vllm-project:main on Apr 19, 2024

Conversation

@ronensc (Contributor) commented Apr 17, 2024:

Following the refactor of #3894, I noticed that the async engine has stopped logging stats to Prometheus.

The reason is that the call to stat_logger.log() was relocated from _process_model_outputs() (which is shared between the async and non-async engines) to LLMEngine.step(), which is used only by the non-async engine.

This PR fixes this by reintroducing the call to stat_logger.log() in the async engine.
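
For reference, the restored call lands roughly like this in AsyncLLMEngine.step_async() (a simplified sketch; the diff snippet below shows the exact lines):

    request_outputs = self._process_model_outputs(
        output, scheduler_outputs.scheduled_seq_groups,
        scheduler_outputs.ignored_seq_groups)

    # Log stats.
    if self.log_stats:
        self.stat_logger.log(self._get_stats(scheduler_outputs))

    return request_outputs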

cc @cadedaniel

Comment on lines +224 to +226
# Log stats.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))
@ronensc (Contributor, Author):
Would it be worth considering moving the call to stat_logger.log() into _process_model_outputs()?
Currently, it is invoked in both LLMEngine.step() and AsyncLLMEngine.step_async(), which duplicates the logic.

# Log stats.
if self.log_stats:
    self.stat_logger.log(self._get_stats(scheduler_outputs))

What are your thoughts?

@rkooo567 (Collaborator):

I feel like having it here makes more sense for abstraction? It is technically not related to processing the model output.

Collaborator:

Alternatively, we could create a method like post_process_step(), put the logging logic there, and just call it from both the sync and async paths?
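
A rough sketch of that suggestion (the method name _post_process_step() and its placement are hypothetical, not part of this PR):

    def _post_process_step(self, scheduler_outputs) -> None:
        # Hypothetical shared hook, called at the end of both
        # LLMEngine.step() and AsyncLLMEngine.step_async().
        if self.log_stats:
            self.stat_logger.log(self._get_stats(scheduler_outputs))

Both step() and step_async() would then call self._post_process_step(scheduler_outputs) just before returning.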

@sfc-gh-zhwang (Contributor) commented Apr 18, 2024:

Regarding "it is technically not related to process model output": it's just logging to me and could be folded in anywhere, lol.

I feel it's simpler to just fold it into _process_model_outputs(). Maybe make _process_model_outputs() accept scheduler_outputs instead of scheduled_seq_groups and ignored_seq_groups?
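
A sketch of that alternative signature (hypothetical; _build_request_outputs() is a placeholder standing in for the method's existing body):

    def _process_model_outputs(self, output, scheduler_outputs):
        # Unpack what the current signature receives as separate arguments.
        scheduled_seq_groups = scheduler_outputs.scheduled_seq_groups
        ignored_seq_groups = scheduler_outputs.ignored_seq_groups

        # Existing output-processing logic (placeholder).
        request_outputs = self._build_request_outputs(
            output, scheduled_seq_groups, ignored_seq_groups)

        # With scheduler_outputs in scope, the stats logging could fold in here.
        if self.log_stats:
            self.stat_logger.log(self._get_stats(scheduler_outputs))
        return request_outputs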

@rkooo567 (Collaborator) commented Apr 18, 2024:

IMO relevant logging should go with the relevant API. E.g., logging about e2e stats is a little weird in the process-model-output API, but it makes sense in the step API. It is fine now because of the assumptions we have (_process_model_outputs is called only by step), so this isn't a strong opinion or a blocker at the moment.

@rkooo567 (Collaborator) left a review comment:

Is it possible to add a simple regression test?


output, scheduler_outputs.scheduled_seq_groups,
scheduler_outputs.ignored_seq_groups)

# Log stats.
if self.log_stats:
A Contributor commented:

Nit: I think you can just do the logging before this and return the result of _process_model_outputs() directly, instead of assigning it to request_outputs:

        return self._process_model_outputs(
            output, scheduler_outputs.scheduled_seq_groups,
            scheduler_outputs.ignored_seq_groups)
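
In other words, the suggestion is roughly (a sketch, not the merged code):

    # Log stats first ...
    if self.log_stats:
        self.stat_logger.log(self._get_stats(scheduler_outputs))

    # ... then return the processed outputs directly, without the
    # intermediate request_outputs variable.
    return self._process_model_outputs(
        output, scheduler_outputs.scheduled_seq_groups,
        scheduler_outputs.ignored_seq_groups)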

@ronensc (Contributor, Author):

I'm not sure it will work, as stat_logger.log() should be called after seq_group.maybe_set_first_token_time(now).

seq_group.maybe_set_first_token_time(now)

@ronensc (Contributor, Author) commented Apr 18, 2024:

My bad, it's not. However, I still think the logging should happen after _process_model_outputs(), because during logging we calculate the latency between tokens, which I believe should include the processing of the model outputs.

# Time since last token.
# (n.b. updates seq_group.metrics.last_token_time)
time_last_iters.append(seq_group.get_last_latency(now))
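
A simplified reading of the snippet above (based on its comment; the real implementation may differ): get_last_latency(now) measures the time since the previous token and updates the stored timestamp, so calling log() before _process_model_outputs() would leave the output-processing time out of that interval.

    # Roughly what get_last_latency(now) does, per the comment above:
    latency = now - seq_group.metrics.last_token_time
    seq_group.metrics.last_token_time = now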

@rkooo567 (Collaborator):

Let's merge it after adding a simple regression test!

@mgoin (Member) commented Apr 19, 2024:

It seems the OpenAI API server is no longer reporting the tokens/s metric correctly (on main as of today):

INFO 04-19 13:38:10 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 49 reqs, Swapped: 0 reqs, Pending: 981 reqs, GPU KV cache usage: 4.5%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:20 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 79 reqs, Swapped: 0 reqs, Pending: 951 reqs, GPU KV cache usage: 6.9%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:30 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 99 reqs, Swapped: 0 reqs, Pending: 931 reqs, GPU KV cache usage: 8.6%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:40 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 129 reqs, Swapped: 0 reqs, Pending: 901 reqs, GPU KV cache usage: 11.2%, CPU KV cache usage: 0.0%
INFO 04-19 13:38:50 metrics.py:224] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 149 reqs, Swapped: 0 reqs, Pending: 881 reqs, GPU KV cache usage: 12.9%, CPU KV cache usage: 0.0%

This PR won't affect this, but any ideas?

@esmeetu mentioned this pull request on Apr 19, 2024.
@simon-mo (Collaborator) left a review comment:
Thank you for fixing it. I'm merging this to unblock the release. Please follow up with testing, which is desperately needed!

@simon-mo merged commit 7be4f56 into vllm-project:main on Apr 19, 2024. 46 checks passed.
alexeykondrat pushed a commit to alexeykondrat/ci-vllm that referenced this pull request on May 1, 2024.
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request on May 7, 2024.
@ronensc deleted the fix-log-stats branch on May 29, 2024.