[Serve] Endpoints should return an error when the cluster is overloaded #22670

Closed
2 tasks done
frreiss opened this issue Feb 25, 2022 · 1 comment
Labels
enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks serve Ray Serve Related Issue
Milestone

Comments


frreiss commented Feb 25, 2022

Search before asking

  • I searched the issues and found no similar feature request.

Description

The Ray Serve Router should detect that it does not have the resources to handle additional requests in a timely fashion and return an appropriate HTTP error instead of attempting to take on more work.

Here's a suggested policy for how this could work:

IF a request arrives and the following conditions are met:
  * All replicas have reached their `max_concurrent_queries` quotas
  * There are insufficient resources to allocate additional replicas
  * The backlog of requests (i.e., requests not yet assigned to a replica)
    exceeds a user-configurable size, OR the age of the oldest request
    in the backlog exceeds a user-configurable timeout
THEN return an HTTP error (such as 503, service unavailable) instead of enqueuing the request
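
As an illustration only, the policy above can be written as a small admission check. This is a minimal sketch, not Ray Serve code: `RouterState`, `OverloadConfig`, and `admit` are hypothetical names, and the sketch assumes the Router can already observe replica saturation, cluster capacity, backlog size, and the age of the oldest backlogged request.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Optional


@dataclass
class RouterState:
    """Hypothetical snapshot of router state; not a Ray Serve type."""
    all_replicas_saturated: bool   # every replica is at max_concurrent_queries
    can_add_replicas: bool         # cluster has resources for another replica
    backlog_size: int              # requests not yet assigned to a replica
    oldest_backlog_age_s: float    # age of the oldest backlogged request (seconds)


@dataclass
class OverloadConfig:
    """Hypothetical user-configurable thresholds."""
    max_backlog_size: int
    max_backlog_age_s: float


def admit(state: RouterState, config: OverloadConfig) -> Optional[HTTPStatus]:
    """Return None to enqueue the request, or an HTTP status to reject it."""
    backlog_exceeded = (
        state.backlog_size > config.max_backlog_size
        or state.oldest_backlog_age_s > config.max_backlog_age_s
    )
    # Reject only when all three conditions from the policy hold at once.
    if (
        state.all_replicas_saturated
        and not state.can_add_replicas
        and backlog_exceeded
    ):
        return HTTPStatus.SERVICE_UNAVAILABLE
    return None
```

Keeping the check a pure function of a state snapshot would make the policy easy to unit test independently of the Router's internals.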

Returning an error in this case would not only help to prevent the service from thrashing, but would also provide feedback to upstream proxies and queues that the service is overloaded.

Use case

While benchmarking some use cases involving deploying expensive models on Ray Serve, I've observed that, under high load, Serve can get into the following state:

  • All replicas have reached their `max_concurrent_queries` quotas
  • There are insufficient resources to allocate additional replicas
  • The Serve Router continues to accept incoming HTTP requests but cannot assign them to a replica
  • An unbounded backlog of pending requests accumulates inside the Router
  • Client response times grow indefinitely
  • Server memory consumption grows indefinitely

In this situation, it would be better if Serve could be configured to return an error instead of enqueuing additional requests.
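
To make the growth concrete, here is a rough back-of-the-envelope model. It is a sketch under simplifying assumptions (constant arrival rate, constant aggregate service rate, backlog starting empty), and the rates used in the example are made up rather than taken from the benchmarks described above.

```python
def backlog_after(seconds: float, arrival_rate: float, service_rate: float) -> float:
    """Requests waiting after `seconds` of sustained load, starting from empty."""
    return max(0.0, (arrival_rate - service_rate) * seconds)


def wait_estimate(backlog: float, service_rate: float) -> float:
    """Approximate queueing delay (seconds) for a request joining behind `backlog`."""
    return backlog / service_rate


# Hypothetical numbers: 120 requests/s arriving, 100 requests/s served.
for elapsed in (60.0, 600.0):
    pending = backlog_after(elapsed, arrival_rate=120.0, service_rate=100.0)
    print(elapsed, pending, wait_estimate(pending, service_rate=100.0))
```

In this model, any sustained arrival rate above the service rate makes both the backlog and the expected wait grow linearly with time, which matches the unbounded latency and memory growth listed above.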

Related issues

#21438 dealt with a particularly nasty version of this issue where the client cancels and reissues requests, but it did not address the underlying problem of Serve queuing up an unbounded number of requests.

#21161 proposes handling overloads by paging the excess requests to an on-disk queue. This approach makes sense in some applications, but in many cases it is better to return an error instead of queuing a request for an unbounded amount of time. For example, the application may be latency-sensitive.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@frreiss frreiss added the enhancement Request for new feature and/or capability label Feb 25, 2022
@jiaodong jiaodong added the serve Ray Serve Related Issue label Feb 25, 2022
@shrekris-anyscale shrekris-anyscale added this to the Serve backlog milestone Feb 28, 2022
@edoakes edoakes removed the platform label Apr 25, 2022
@sihanwang41 sihanwang41 added the P1 Issue that should be fixed within a few weeks label Mar 23, 2023
@akshay-anyscale
Contributor

There are a few new configurations around this now.


7 participants