Exponential back-off on 503s (#165)
Some LLM providers return 503s (overloaded) in circumstances similar to 429s (rate limited)
and recommend performing the same back-off.
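To make "the same back-off" concrete, the sketch below computes the sleep durations an exponential back-off produces under a fixed waiting budget. `BACKOFF_EXPONENT` and `RATE_LIMIT_MAX_WAIT_MS` copy the values of the constants in `proxy.ts`. The 1-second `INITIAL_DELAY_MS`, the `backoffSchedule` helper, and trimming the final sleep to the remaining budget are assumptions for illustration, not the proxy's actual implementation.

```typescript
// Hypothetical sketch: the delay schedule of an exponential back-off.
// BACKOFF_EXPONENT and RATE_LIMIT_MAX_WAIT_MS copy the values in proxy.ts;
// INITIAL_DELAY_MS is an assumed starting delay, not taken from the source.
const BACKOFF_EXPONENT = 2;
const RATE_LIMIT_MAX_WAIT_MS = 45 * 1000;
const INITIAL_DELAY_MS = 1000; // assumption

function backoffSchedule(
  initialDelayMs: number,
  exponent: number,
  maxTotalMs: number,
): number[] {
  if (initialDelayMs <= 0 || exponent < 1) {
    throw new RangeError("initialDelayMs must be > 0 and exponent >= 1");
  }
  const delays: number[] = [];
  let total = 0;
  let delay = initialDelayMs;
  // Keep sleeping while the total waited time is under the budget,
  // mirroring a `totalWaitedTime < RATE_LIMIT_MAX_WAIT_MS` style check.
  while (total < maxTotalMs) {
    // Trim the last sleep so the total never exceeds the budget.
    const next = Math.min(delay, maxTotalMs - total);
    delays.push(next);
    total += next;
    delay *= exponent; // grow the delay geometrically
  }
  return delays;
}

const schedule = backoffSchedule(
  INITIAL_DELAY_MS,
  BACKOFF_EXPONENT,
  RATE_LIMIT_MAX_WAIT_MS,
);
console.log(schedule.join(", ")); // 1000, 2000, 4000, 8000, 16000, 14000
```

With these assumed values the delay doubles each round until the last sleep is trimmed, so the total wait is exactly 45 seconds.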
ankrgyl authored Feb 25, 2025
1 parent 6ddc3af commit abebd17
Showing 1 changed file with 12 additions and 1 deletion.
packages/proxy/src/proxy.ts (13 changes: 12 additions, 1 deletion)
@@ -793,6 +793,7 @@ interface ModelResponse {
 }

 const RATE_LIMIT_ERROR_CODE = 429;
+const OVERLOADED_ERROR_CODE = 503;
 const RATE_LIMIT_MAX_WAIT_MS = 45 * 1000; // Wait up to 45 seconds while retrying
 const BACKOFF_EXPONENT = 2;

@@ -804,6 +805,15 @@ const TRY_ANOTHER_ENDPOINT_ERROR_CODES = [
   // 429 is rate limiting. We may want to track stats about this and potentially handle more
   // intelligently, eg if all APIs are rate limited, back off and try something else.
   RATE_LIMIT_ERROR_CODE,
+
+  // 503 is overloaded. We may want to track stats about this and potentially handle more
+  // intelligently, eg if all APIs are overloaded, back off and try something else.
+  OVERLOADED_ERROR_CODE,
 ];

+const RATE_LIMITING_ERROR_CODES = [
+  RATE_LIMIT_ERROR_CODE,
+  OVERLOADED_ERROR_CODE,
+];
+
 let loopIndex = 0;
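A simplified model of how the two lists might be consulted: `TRY_ANOTHER_ENDPOINT_ERROR_CODES` decides whether to move on to the next endpoint, and `RATE_LIMITING_ERROR_CODES` decides whether a failure on the last endpoint is worth backing off on. Only 429 and 503 are included because the earlier entries of `TRY_ANOTHER_ENDPOINT_ERROR_CODES` are outside this hunk. The `Action` type, the `nextAction` helper, treating codes below 400 as success, and ignoring the waiting budget are all hypothetical simplifications.

```typescript
// Hypothetical sketch of how the two lists might be consulted. Only 429 and
// 503 appear here; the real TRY_ANOTHER_ENDPOINT_ERROR_CODES array has earlier
// entries that this hunk does not show.
const RATE_LIMIT_ERROR_CODE = 429;
const OVERLOADED_ERROR_CODE = 503;

const TRY_ANOTHER_ENDPOINT_ERROR_CODES: number[] = [
  RATE_LIMIT_ERROR_CODE,
  OVERLOADED_ERROR_CODE,
];

const RATE_LIMITING_ERROR_CODES: number[] = [
  RATE_LIMIT_ERROR_CODE,
  OVERLOADED_ERROR_CODE,
];

type Action = "success" | "try-next-endpoint" | "back-off" | "fail";

// Decide what to do after one endpoint responds with `httpCode`.
// `isLastEndpoint` is true when no untried endpoints remain.
function nextAction(httpCode: number, isLastEndpoint: boolean): Action {
  if (httpCode < 400) {
    return "success"; // assumption: any non-error status ends the loop
  }
  if (!isLastEndpoint && TRY_ANOTHER_ENDPOINT_ERROR_CODES.includes(httpCode)) {
    return "try-next-endpoint";
  }
  if (isLastEndpoint && RATE_LIMITING_ERROR_CODES.includes(httpCode)) {
    return "back-off"; // sleep, then restart from the first endpoint
  }
  return "fail";
}

console.log(nextAction(503, false)); // try-next-endpoint
console.log(nextAction(503, true)); // back-off
console.log(nextAction(500, true)); // fail
```

In this model, removing `OVERLOADED_ERROR_CODE` from `RATE_LIMITING_ERROR_CODES` would make `nextAction(503, true)` return `"fail"`, which corresponds to the old `httpCode === RATE_LIMIT_ERROR_CODE` check.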
@@ -969,7 +979,8 @@ async function fetchModelLoop(
 // loop, and we haven't waited the maximum allotted time, then
 // sleep for a bit, and reset the loop.
 if (
-  httpCode === RATE_LIMIT_ERROR_CODE &&
+  httpCode !== undefined &&
+  RATE_LIMITING_ERROR_CODES.includes(httpCode) &&
   i === secrets.length - 1 &&
   totalWaitedTime < RATE_LIMIT_MAX_WAIT_MS
 ) {
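A runnable sketch of the retry loop around this condition: endpoints are tried in order, and when the last one returns 429 or 503 while the waiting budget is not exhausted, the loop sleeps and restarts from the first endpoint. `fetchWithBackoff`, `CallEndpoint`, the injected `sleep` callback, the 1-second initial delay, and trying every endpoint regardless of its error code are assumptions; the real `fetchModelLoop` is async and does more than this. The sketch also suggests why the new `httpCode !== undefined` guard is there: `includes` on a `number[]` only accepts a `number`, so if `httpCode` is typed `number | undefined` (presumably the case in `proxy.ts`), strict TypeScript needs the check to narrow it first.

```typescript
// Hypothetical, simplified model of the retry loop in fetchModelLoop.
// fetchWithBackoff, CallEndpoint and the injected sleep are illustrative
// stand-ins, not proxy.ts APIs. The real loop is async and awaits a timer.
const RATE_LIMIT_ERROR_CODE = 429;
const OVERLOADED_ERROR_CODE = 503;
const RATE_LIMIT_MAX_WAIT_MS = 45 * 1000;
const BACKOFF_EXPONENT = 2;
const INITIAL_DELAY_MS = 1000; // assumption: not taken from proxy.ts

const RATE_LIMITING_ERROR_CODES: number[] = [
  RATE_LIMIT_ERROR_CODE,
  OVERLOADED_ERROR_CODE,
];

// Calls endpoint `index` and returns its HTTP status, or undefined if no
// response was received (e.g. a network error).
type CallEndpoint = (index: number) => number | undefined;

function fetchWithBackoff(
  endpointCount: number,
  callEndpoint: CallEndpoint,
  sleep: (ms: number) => void,
): number | undefined {
  let totalWaitedTime = 0;
  let delayMs = INITIAL_DELAY_MS;
  let httpCode: number | undefined;
  for (let i = 0; i < endpointCount; i++) {
    httpCode = callEndpoint(i);
    if (httpCode !== undefined && httpCode < 400) {
      return httpCode; // success: stop trying endpoints
    }
    // Same shape as the condition in the diff. Without the undefined check,
    // strict TypeScript rejects includes(httpCode) on a number[].
    if (
      httpCode !== undefined &&
      RATE_LIMITING_ERROR_CODES.includes(httpCode) &&
      i === endpointCount - 1 &&
      totalWaitedTime < RATE_LIMIT_MAX_WAIT_MS
    ) {
      // Sleep without exceeding the remaining budget, then grow the delay.
      const waitMs = Math.min(delayMs, RATE_LIMIT_MAX_WAIT_MS - totalWaitedTime);
      sleep(waitMs);
      totalWaitedTime += waitMs;
      delayMs *= BACKOFF_EXPONENT;
      i = -1; // reset the loop: the i++ brings it back to the first endpoint
    }
  }
  // No success: return the last status seen (undefined if no response).
  return httpCode;
}

// Example: two endpoints, both overloaded or rate limited at first.
const responses = [503, 503, 503, 429, 200];
let call = 0;
const slept: number[] = [];
const result = fetchWithBackoff(2, () => responses[call++], (ms) => slept.push(ms));
// result is 200; slept is [1000, 2000]
```

Injecting `sleep` keeps the example synchronous and lets a caller record the delays instead of actually waiting; an async version would `await` a timer at that point instead.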