Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow new requests after ≥10 and ≥1% undocumented error responses #82

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Gallaecio
Copy link
Contributor

Continuation of Gallaecio#1

It’s a rather hacky approach, to be honest:

  • The number of total responses and undocumented error responses is tracked in class variables. Tests involve monkey patching to reset or hard-code their values.

    • If we want to keep the implementation at tenacity level, this is the cleanest path I saw, although I don’t discard there being better ways.

      The use of the class to count, instead of an instance, is due to tenacity creating copies of the instance for each function wrapping. The specifics are not completely clear to me, I am not 100% sure there is no way around it, but this line seems problematic for instance-based storage of these counters. The statistics on the next line are also not global, and get reset on every wrapped call.

    • We could consider implementing this logic outside tenacity, in the client code. It would allow for a cleaner implementation, e.g. based on stats. The main issue I see, and it might be considered minor, is that then we cannot stop new requests as soon as the condition is met, we still need to wait for the retries of 1 of the on-going requests to finish, which means we may sometimes stop new requests only after e.g. 11+ and not only 10 undocumented error responses. There is also the issue of not being able to customize this logic through a custom retry policy class, but we might not want to support that to begin with, and if we did, we could instead provide some client parameter(s) instead.

I made some other decisions I’m not entirely sure about:

  • Did not make the aggressive retry policy behave any different here, i.e. it also stops new requests on ≥10 and ≥1%, and not on ≥20 and ≥2%.
  • Did not provide any facility to override these settings in custom retry policy classes, assuming we might want to discourage overriding this, while still allowing it.

Post-merge work:

  • Prepare a PR for scrapy-zyte-api that closes a spider with a specific close reason upon getting the TooManyUndocumentedErrors exception.

@Gallaecio Gallaecio requested review from kmike and wRAR February 16, 2025 22:08
@Gallaecio Gallaecio changed the title Retry 5xx global limit Disallow new requests after ≥10 and ≥1% undocumented error responses Feb 16, 2025
@lopuhin
Copy link

lopuhin commented Feb 17, 2025

@Gallaecio would errors like below be considered unknown here? Or only those with defined response code? Checking just in case, as I've seen a few jobs with >1% of these.

scrapy-zyte-api/exception_types/<class 'asyncio.exceptions.TimeoutError'>

@Gallaecio
Copy link
Contributor Author

Those would count as network errors, a separate kind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants