server bench: fix bench not waiting for model load #7284
Merged
While working on #6828 I noticed that when using a large static n-gram cache the benchmark would report 0 iterations for the first 8 minutes and then 30 iterations for the last 2 minutes. What seems to be happening is that `bench.py` doesn't correctly wait for the server to be ready, so the clock starts ticking while the n-gram cache is still being loaded. From what I can tell, loading the model from disk can have the same issue if it is, for example, on an HDD.

This PR makes `bench.py` wait for response 200 (`SERVER_STATE_READY`) from the health endpoint to check whether the server is actually ready. I'm not sure if there is a better way to implement this than what I did; I'm definitely open to suggestions.
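As a minimal sketch of the idea, the readiness wait could look roughly like the following, assuming the server's `/health` endpoint returns HTTP 200 once it reaches `SERVER_STATE_READY` and a non-200 status (or refuses connections) while still loading. The helper name `wait_for_server_ready`, the use of `requests`, and the timeout values are illustrative, not necessarily how `bench.py` implements it:

```python
import time
import requests

def wait_for_server_ready(base_url: str, timeout: float = 600.0, interval: float = 1.0) -> None:
    """Poll the /health endpoint until the server reports it is ready (HTTP 200).

    Raises TimeoutError if the server does not become ready within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            resp = requests.get(f"{base_url}/health", timeout=5)
            if resp.status_code == 200:  # SERVER_STATE_READY
                return
        except requests.exceptions.ConnectionError:
            # Server is not accepting connections yet (e.g. still loading the model).
            pass
        time.sleep(interval)
    raise TimeoutError("server did not become ready before the timeout")

# Example: block until the server is ready, and only then start the benchmark clock.
# wait_for_server_ready("http://localhost:8080")
```

The key point is that the benchmark's timing only begins after this call returns, so slow model or n-gram cache loading no longer eats into the measured duration.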