Promptly respond to timeout requests under reject policy #333

kthui · 2024-03-04T18:01:25Z

Related PR: triton-inference-server/server#6938

Currently, the dynamic batch scheduler rejects timed-out requests only when a payload slot is available and a new batch of requests is formed and submitted for inference. A payload slot becomes available only when the previous batched inference request completes, which can take a long time depending on the model. As a result, the rejection of timed-out requests can be significantly delayed.

This change adds the ability for the dynamic batch scheduler to reject timed-out requests while waiting for the availability of a payload slot.
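The idea can be illustrated with a minimal sketch, which is not the code in this PR. It assumes a simplified scheduler whose waiter wakes up at a fixed interval through `std::condition_variable::wait_for` and rejects expired requests on each wake-up. `SchedulerSketch`, `QueuedRequest`, `RejectTimedOut`, `WaitForSlot`, and `ReleaseSlot` are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a queued inference request.
struct QueuedRequest {
  int id;
  Clock::time_point deadline;
};

// Hypothetical, simplified scheduler state (not Triton's actual classes).
struct SchedulerSketch {
  std::mutex mu;
  std::condition_variable cv;
  std::deque<QueuedRequest> queue;
  bool slot_available = false;
  std::size_t rejected = 0;

  // Drop every request whose deadline has passed. Caller must hold `mu`.
  void RejectTimedOut(Clock::time_point now) {
    for (auto it = queue.begin(); it != queue.end();) {
      if (it->deadline <= now) {
        ++rejected;
        it = queue.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Block until a slot is free. Expired requests are rejected before every
  // wait and after every wake-up, so a rejection is delayed by at most
  // roughly one `interval` instead of the full model execution time.
  void WaitForSlot(std::chrono::microseconds interval) {
    assert(interval.count() > 0);
    std::unique_lock<std::mutex> lock(mu);
    for (;;) {
      RejectTimedOut(Clock::now());
      if (slot_available) {
        return;
      }
      // Returns early when notified, otherwise after `interval`.
      cv.wait_for(lock, interval, [this] { return slot_available; });
    }
  }

  // Called when the previous batch completes.
  void ReleaseSlot() {
    {
      std::lock_guard<std::mutex> lock(mu);
      slot_available = true;
    }
    cv.notify_all();
  }
};
```

In this sketch the waiting thread does the rejection itself, so no separate timer thread is needed; the trade-off is that the rejection latency is bounded by the chosen interval rather than being immediate.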

oandreeva-nv

Nice approach!

GuanLuo · 2024-03-08T21:00:55Z

src/dynamic_batch_scheduler.cc

-          // Recapture the outer most lock to keep making progress.
-          lock.lock();
-        }
+        WaitForPayloadSlotAvailable(&lock, default_wait_microseconds);


default_wait_microseconds is a fixed value, so wouldn't there still be a delay of up to 0.5 seconds? Or is that acceptable for rejecting requests?

oandreeva-nv · 2024-03-08

I think 0.5s should be fine for now. What they were complaining about was a scenario with a 2s timeout and a model that executes for 10s, so timed-out requests wait an extra 8 seconds before being returned.

We can always adjust it upon their feedback. @kthui, what do you think?

kthui · 2024-03-08

Yes. I think we can have them try this version first; we can always reduce the delay interval if they ask. I have filed an enhancement ticket to remove the need for a manually set interval.

kthui · 2024-03-08

> wouldn't there still be a delay of up to 0.5 seconds? Or is that acceptable for rejecting requests?

I think it is acceptable. They mentioned that "[some large model] could last for tens of seconds", which is what made this a concern, and 0.5 seconds is significantly less than tens of seconds.
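The numbers discussed in this thread can be checked with a short sketch. It assumes the worst case in which a request is queued just as a batch starts executing, and it ignores thread scheduling overhead. `ExtraDelayBefore` and `WorstExtraDelayAfter` are hypothetical helper names, not functions from the codebase.

```cpp
#include <algorithm>
#include <chrono>

using std::chrono::milliseconds;

// Before the change: a timed-out request is rejected only once the running
// batch finishes, so it waits for the remainder of the execution time.
inline milliseconds ExtraDelayBefore(milliseconds timeout, milliseconds exec) {
  return std::max(exec - timeout, milliseconds(0));
}

// After the change: the waiter wakes up every `interval`, so the extra wait
// is bounded by the interval (and never exceeds the old delay).
inline milliseconds WorstExtraDelayAfter(
    milliseconds timeout, milliseconds exec, milliseconds interval) {
  return std::min(ExtraDelayBefore(timeout, exec), interval);
}
```

With a 2s timeout, a 10s execution, and a 0.5s interval, `ExtraDelayBefore` gives 8000 ms and `WorstExtraDelayAfter` gives 500 ms, matching the figures quoted above.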

kthui mentioned this pull request Mar 5, 2024

Add test for max queue delay timeout prompt response triton-inference-server/server#6938

Merged

kthui marked this pull request as ready for review March 6, 2024 00:51

kthui requested review from tanmayv25, GuanLuo and oandreeva-nv March 6, 2024 00:51

oandreeva-nv reviewed Mar 6, 2024

src/scheduler_utils.cc (outdated, resolved)

kthui added 6 commits March 6, 2024 17:26

Promptly respond to timeout requests under reject policy

f8f167e

Remove wrapping CV

1386880

Revert "Remove wrapping CV"

ce7096b

This reverts commit b682018.

Add wait timeout to cv

4fb2f91

Fix typo

1039de9

Use different loop

0c968ec

kthui force-pushed the jacky-dynamic-batch-timeout branch from 558f59c to 0c968ec March 7, 2024 01:26

kthui requested a review from oandreeva-nv March 7, 2024 20:17

oandreeva-nv reviewed Mar 7, 2024

src/dynamic_batch_scheduler.cc (outdated, resolved)

oandreeva-nv previously approved these changes Mar 7, 2024

Use CV wait for and avoid a separate thread

6b21d35

kthui dismissed oandreeva-nv’s stale review via 6b21d35 March 8, 2024 01:46

kthui requested a review from oandreeva-nv March 8, 2024 19:28

oandreeva-nv approved these changes Mar 8, 2024

GuanLuo reviewed Mar 8, 2024

kthui merged commit 9f1fad2 into main Mar 8, 2024
1 check passed

kthui deleted the jacky-dynamic-batch-timeout branch March 8, 2024 23:08
