-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not pre authenticate requests #12496
base: main
Are you sure you want to change the base?
Conversation
It's not listing all the choices and the default shown is hardcoded to the wrong value.
I haven't had a complete look at this PR yet (it'll take more time than I have available right now) but I'm not 100% clear how this addresses the token-based authentication case. If my understanding is correct, the following sequence of events will occur:
So what have we gained? We've done an extra HTTP request and we're no better off. In fact, we're worse off, because we just did an extra HTTP request, and all we got from it was confirmation that the server needs authentication. But we already knew that, because the user said so, by providing authentication info! I think the most important thing to do here is to start from the user experience. If I'm trying to
The first thing we need to clarify (and document) is what the user needs to do. With all of these scenarios, the important implementation question that follows on from designing a good UI is "how does pip tell, from the information provided by the user, what to do next?" If pip can't distinguish between two cases where it needs to take different actions, then we need to go back and reconsider the UI. Of course, if this is necessary, we are going to have to consider backward compatibility and transition. But at least we'll be doing so with a clear understanding of why it's needed. |
We did gain the knowledge that authentication is required. As such we do the work to look for credentials in pip/src/pip/_internal/network/auth.py Line 500 in ab7a034
In the token scenario, we will not discover any password but we still have a username (actually the token): pip/src/pip/_internal/network/auth.py Lines 416 to 418 in ab7a034
And we nevertheless cache the token and None as the username and password: pip/src/pip/_internal/network/auth.py Lines 521 to 522 in ab7a034
Subsequent request will hit the cache and use the token because the username is not None: pip/src/pip/_internal/network/auth.py Lines 459 to 463 in ab7a034
Note: At least on Ubuntu with SecretService, it is technically possible to avoid the caching while still having credentials. You must create manually an entry in the keyring for the URL or netloc with a user set to None and a non-None password (
pip/docs/html/topics/authentication.md Lines 22 to 27 in ab7a034
pip/docs/html/topics/authentication.md Lines 16 to 20 in ab7a034
pip/docs/html/topics/authentication.md Lines 48 to 54 in ab7a034
pip/docs/html/topics/authentication.md Lines 99 to 123 in ab7a034
That one is the least clear in the doc. It is mentioned here: pip/docs/html/topics/authentication.md Lines 156 to 158 in ab7a034
I'll happily make a second pass on it.
Maybe I should add a FAQ section to the authentication topic
When the userinfo part of the URL does not contain a password, pip does not have enough information to figure out what to do with it. Currently it assumes that the user meant that the password is an empty string and uses that (from main branch): pip/src/pip/_internal/network/auth.py Lines 415 to 421 in e57cf4f
That's mainly a side effect of #8687 that switched the default value of The previous behaviour returned correct credentials in all case but due to other issues with the previous usage of keyring it also resulted in blocking keyring pop-up. I will not pretend I've fully understood the problems there and how well they were addressed but my understanding is that, as of now, they are mainly solved and we only get a single pop-up to unlock the keyring if it is locked and if the unlock succeed (of course, if it fails you get more pop-up as it tries more URL) (tested on Ubuntu 22.04 with both With my implementation, querying keyring is delayed until the 401 is hit as done by the current one. And the problematic guessing of the user intent is also delayed until the 401 which is the same point where we know we have to disturb the user in the non-token scenario. In the token scenario, we might still be disturbing him if the keyring is locked but then the token will be cached.
This might indeed be the simplest solution. Given the token scenario is technically identical to the one where a username is provided with an empty password, i.e.
netrc would still be checked if no password is found.
I'm afraid disambiguating the token scenario in anyway will be a breaking change but I'm fine with the idea. |
Let me turn the question around. If the server returns a response that says authentication is not required, but the user has supplied credentials, should we ignore those credentials? Send them anyway (maybe the user is trying to work around a broken server that doesn't send the right response)? And do we need to know that authentication is required anyway? If we have credentials, send them. If we don't, then don't. The problem case isn't that we don't know we need to send credentials, it's that we can't work out what credentials to send (token or username/password).
That makes me uncomfortable, it suggests that we're using "do we have saved credentials" as a disambiguating heuristic. What if we want to fall back to prompting the user for a password? Or if the saved credentials are "there's no password for this user"? Maybe it can work, but it feels fragile.
You seem to be assuming that keyring is the preferred solution. But pip deliberately made keyring support optional, so we need to allow for "I don't use the keyring support" as a valid answer. And while you say "It is recommended to avoid storing password in clear text", I think we need to be careful to remember that "not recommended" is not the same as "not supported". I might be using a plain text configuration file that's encrypted and locked down by my OS.
Maybe it will be. How do other applications that take URLs disambiguate? We shouldn't be inventing our own solution here, we should be following common practice (IMO). Just to be clear, my main concern is that we're doing a lot of restructuring to address one issue, which is that Artifactory implements a rate-limiting security feature that interacts badly with our approach to authentication. I'm willing to accept that Artifactory's behaviour is reasonable, but if that's the case, why don't any other applications hit the same issue as pip? At the highest level, it's "because they use a different pattern of calls to authenticate" - but what do they do? And can pip do the same? Fundamentally, this issue isn't about the pre-authenticating, or about the server responding "Needs authentication" or not. It's about disambiguating authentication with a token from authentication with a username but no password. So basically, #6818 wasn't sufficient. I'm -1 on layering further fixes onto the behaviour if there's an underlying (and fundamental) problem that we still haven't solved. Is there no standard that defines how access tokens should be specified in a URL? I found https://datatracker.ietf.org/doc/html/rfc6750#section-2.3, but that seems linked to OAuth, so may not be a general mechanism. At this point, I'm way out of my depth, I'm afraid. |
Well, the HTTP protocol does not specify any status code for that. So I don't think this is a valid point.
Yes, that's the actual problem. We don't know if what we get is a complete set of credentials or just a pointer for finding more.
That's definitely the heuristic implemented.
That is still happening if we did not find a username and a password: pip/src/pip/_internal/network/auth.py Lines 510 to 511 in ab7a034
Note that the condition is the same as what is currently in main: pip/src/pip/_internal/network/auth.py Lines 499 to 500 in 99a82b7
That would work as that's exactly what the token scenario ends up doing.
I did not mean to say that keyring is the preferred solution; it just happens that currently it is the only solution offered by pip. Should you have your plain text config file encrypted at rest, pip would be totally unaware of it.
npm store the credentials in its configuration file but separate from the registry URL: https://docs.npmjs.com/cli/v9/configuring-npm/npmrc#auth-related-configuration docker store the credentials either in your keyring with the help of a credential helper executable or as a separate poetry does not support a keyring but it stores the credentials in its config file independently from the repository URL (and using special settings for basic auth, pypi token, client certs): https://python-poetry.org/docs/repositories/#configuring-credentials and https://python-poetry.org/docs/configuration/#http-basicnameusernamepassword That's only three (significant?) use case but none tries to store the credentials in the URL.
My understanding is that they don't even face the problem because they store credentials in a way that completely avoid any guesswork and so they just send the right header from the start.
Correct.
I can only agree with you on that!
That's very unlikely given RFC9110 HTTP Semantics has an explicit section entitled Deprecation of userinfo in http(s) URIs. However, that section acknowledge that application tends to use it for configuration purposes. But as shown above, embedding them is not really what I would call mainstream. |
Pip supports a netrc file which as far as I understand it is a relatively standard way of storing authentication data in a config file. I wonder - maybe the message here is actually that pip should deprecate including authentication data in URLs, on the basis that other tools don't support it and RFC9110 deprecates it? It would be a breaking change, and we wouldn't be able to do it quickly, but it would simplify things and bring us in line with common practice, which seems like a good thing. Which begs the question - if pip didn't support authinfo in the URL, what would you be doing and would you then not be triggering the Artifactory behaviour that started all of this? I'm out of time to dig into this much further at the moment, so let's see if any of the other pip maintainers want to weigh in on the discussion. I'll get back to it when I get some more free time. |
Looks like the issue of using a simpler external credential manager similar to what docker is doing was already discussed in #10389. I'll try to revive this issue. |
Hello, I just want to warn that this PR should not be merged or it will break authentication with Artifactory and other code hosts that implement authentication.
This premise is incorrect. Servers like artifactory have fine grained access control per user and per project. This PR assumes that the server would reply with a 401 but it is incorrect. HTTP requests to the server should always pass credentials upfront. |
This PR addresses the issue raised in #11721 and specifically the problems with Artifactory temporarily suspending users due to an initial request with an
Autentication
header being constructed with only the username provided in the (index-)url and without attempting to find a password in keyring. It supersedes #12473.The problem stem from #6818 that solve #6796 requesting support for token in URL. The semantic of the userinfo part of the URL is not well specified. We cannot distinguish the case of a token and a username without password only from the URL. A token can be identified as such only after searching for a password corresponding to what could be a username has failed. To avoid the potentially costly / blocking request to keyring, pip considered the userinfo without password to be a token and sends an authenticated requests to the server. If the server accepts this as a token, all is fine. Otherwise, as is the case with Artifactory, this is seen as a failed authentication attempt and subjected to throttling. That's what happens with Artifactory (which can be seen as being overly protective) that temporarily suspend the user named in the
Authentication
header for 1s (initially). While short, this is still enough to block the second attempt made while handling the 401 during which pip would have found a complete set of credentials.This PR is based on the reasoning that only the server actually knows whether authentication is required or not and as such pip should first attempt to access the resource without any
Authentication
header. This obviously prevent the server from temporarily suspending the user because it no longer have enough information to identify the relevant user.If the server requires authentication, it will let us know by returning a 401 Unauthorized response with a
WWW-Authenticate
header which is meant to tell us what kind of challenge to use. The code is not changed with respect to the challenge supported and still assumes blindly that the server want us to use Basic authentication. (The proposed code could be extended to store the kind of authentication challenge to use alongside the credentials. As I've still to see a server advertising something other than Basic, I've considered this out of scope.)Once we know that credentials are actually required, attempts are made to obtain them from (in order):
keyring
if available (depends on--keyring-provider
and--no-input
)netrc
The credentials are then cached in order to be reused for subsequent requests to the server.
Things to note compared to the current implementation:
netrc
is now the last resort because it is the least specific. Keyring allows to store credentials per service, which can be an URL butnetrc
only supportnetloc
. Of course, the difference only matters when configuring credentials in bothkeyring
andnetrc
.netloc
with some logic to check the usernames are the same. Instead the caching is per url-prefix which allows for multiple entries for the samenetloc
but with different common path. As such, caching will work for server hosting multiple indexes with different user per path unlike the current implementation.The PR consist of:
_get_new_credentials
and_get_index_url
which now return the url_prefix to use for caching the credentials too_get_url_and_credentials
are removed too_cache_required_credentials
and_get_required_credentials