Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[file_utils] do not gobble certain kinds of requests.ConnectionError #10235

Merged
merged 2 commits into from
Mar 18, 2021

Conversation

julien-c
Copy link
Member

Backport from huggingface/huggingface_hub@34b7b70

might close (or at the very least provide more transparency into) #8690, #10067, and others

@julien-c julien-c requested review from LysandreJik and sgugger and removed request for LysandreJik February 17, 2021 15:08
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Out of curiosity, is huggingface_hub ready to be used as a dependency to transformers? It would make life easier to have the code in one place only.

@julien-c
Copy link
Member Author

@sgugger Definitely on my radar at some point.

For now though it's useful for me to have a more experimental codebase where the API can change/break :)

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I don't have sufficient requests knowledge to be sure this catches all exceptions that we want to catch

@julien-c
Copy link
Member Author

LGTM but I don't have sufficient requests knowledge to be sure this catches all exceptions that we want to catch

we're in the same boat, sailor

if isinstance(exc, requests.exceptions.SSLError) or isinstance(exc, requests.exceptions.ProxyError):
raise exc
# Otherwise, our Internet connection is down.
# etag is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really the key point of the discussion, but it would be much more readable to write this:

  • by putting just what can raise exceptions in the try/except block
  • without isinstance checks -- except does that already
        try:
            r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
            r.raise_for_status()
        except (requests.exceptions.SSLError, requests.exceptions.ProxyError):
            # non recoverable errors -- user has to fix their system
            raise
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
            # recoverable errors -- likely the network failed during the download
            etag = None
        else:
            etag = r.headers.get("X-Linked-Etag") or r.headers.get("ETag")
            # rest of etag-related logic goes here

@aaugustin
Copy link
Contributor

In order to validate your strategy, I took the list of every exception in the public API of requests: https://2.python-requests.org/en/master/api/#exceptions

I built the inheritance tree between exceptions: https://requests.readthedocs.io/en/master/_modules/requests/exceptions/

Here's a readable version:

IOError
+-- RequestException 
    +-- HTTPError
    +-- ConnectionError
    |  +-- ProxyError
    |  +-- SSLError
    |  +-- ConnectTimeout (also inherits Timeout)
    +-- Timeout
    |  +-- ConnectTimeout (also inherits ConnectionError)
    |  +-- ReadTimeout
    +-- URLRequired
    +-- TooManyRedirects
    +-- MissingSchema (also inherits ValueError)
    +-- InvalidSchema (also inherits ValueError)
    +-- InvalidURL (also inherits ValueError)
    |  +-- InvalidProxyURL
    +-- InvalidHeader (also inherits ValueError)
    +-- ChunkedEncodingError
    +-- ContentDecodingError (also inherits urllib3.exceptions.HTTPError)
    +-- StreamConsumedError (also inherits TypeError)
    +-- RetryError
    +-- UnrewindableBodyError

Multiple inheritance is most likely for backwards-compatibility when changing exception tyes. For example, if you want to raise the more accurate ContentDecodingError instead of a generic urllib3.exceptions.HTTPError, making ContentDecodingError inherit urllib3.exceptions.HTTPError ensures you don't break the code of users who used to catch urllib3.exceptions.HTTPError.

I assume you want to tell apart situations where it's worth retrying from situations it isn't worth retrying, because there's a configuration issue that won't solve itself by retrying.

Based on the above, on the documentation, and on a quick search in the source code of requests, what you're doing looks correct to me, modulo my suggestion on the style.

@LysandreJik
Copy link
Member

Applied most of your comments @aaugustin, merging after internal review from @julien-c

@LysandreJik LysandreJik merged commit 4f3e93c into master Mar 18, 2021
@LysandreJik LysandreJik deleted the raise_some_connectionerrors branch March 18, 2021 16:37
@aaugustin
Copy link
Contributor

👍

Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
…uggingface#10235)

* do not gobble certain kinds of requests.ConnectionError

* Apply review comments

Co-authored-by: Lysandre <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants