Disallow new requests after ≥10 and ≥1% undocumented error responses #82

Gallaecio · 2025-02-16T22:08:13Z

Continuation of Gallaecio#1

It’s a rather hacky approach, to be honest:

The number of total responses and undocumented error responses is tracked in class variables. Tests involve monkey patching to reset or hard-code their values.
- If we want to keep the implementation at tenacity level, this is the cleanest path I saw, although I don’t discard there being better ways.
  
  The use of the class to count, instead of an instance, is due to tenacity creating copies of the instance for each function wrapping. The specifics are not completely clear to me, I am not 100% sure there is no way around it, but this line seems problematic for instance-based storage of these counters. The statistics on the next line are also not global, and get reset on every wrapped call.
- We could consider implementing this logic outside tenacity, in the client code. It would allow for a cleaner implementation, e.g. based on stats. The main issue I see, and it might be considered minor, is that then we cannot stop new requests as soon as the condition is met, we still need to wait for the retries of 1 of the on-going requests to finish, which means we may sometimes stop new requests only after e.g. 11+ and not only 10 undocumented error responses. There is also the issue of not being able to customize this logic through a custom retry policy class, but we might not want to support that to begin with, and if we did, we could instead provide some client parameter(s) instead.

I made some other decisions I’m not entirely sure about:

Did not make the aggressive retry policy behave any different here, i.e. it also stops new requests on ≥10 and ≥1%, and not on ≥20 and ≥2%.
Did not provide any facility to override these settings in custom retry policy classes, assuming we might want to discourage overriding this, while still allowing it.

Post-merge work:

Prepare a PR for scrapy-zyte-api that closes a spider with a specific close reason upon getting the TooManyUndocumentedErrors exception.

lopuhin · 2025-02-17T19:35:46Z

@Gallaecio would errors like below be considered unknown here? Or only those with defined response code? Checking just in case, as I've seen a few jobs with >1% of these.

scrapy-zyte-api/exception_types/<class 'asyncio.exceptions.TimeoutError'>

Gallaecio · 2025-02-17T20:07:44Z

Those would count as network errors, a separate kind.

Gallaecio added 4 commits December 30, 2024 17:25

Make the default retry policy the aggressive one with half the attempts

9d60a9a

Initial stab at circuit break for undocumented erors

d052ed1

Remove unnecessary comments

2d2c56f

unquote type hints

6394d94

Gallaecio requested review from kmike and wRAR February 16, 2025 22:08

Gallaecio changed the title ~~Retry 5xx global limit~~ Disallow new requests after ≥10 and ≥1% undocumented error responses Feb 16, 2025

Gallaecio mentioned this pull request Feb 16, 2025

Disallow new requests after ≥10 and ≥1% undocumented error responses Gallaecio/python-zyte-api#1

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disallow new requests after ≥10 and ≥1% undocumented error responses #82

Disallow new requests after ≥10 and ≥1% undocumented error responses #82

Gallaecio commented Feb 16, 2025

lopuhin commented Feb 17, 2025

Gallaecio commented Feb 17, 2025

Disallow new requests after ≥10 and ≥1% undocumented error responses #82

Are you sure you want to change the base?

Disallow new requests after ≥10 and ≥1% undocumented error responses #82

Conversation

Gallaecio commented Feb 16, 2025

lopuhin commented Feb 17, 2025

Gallaecio commented Feb 17, 2025