Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filing waivers in Fedora fails with [ERROR] waiverdb.app: User info field 'sub' is unavailable; available are: dict_keys([]) #219

Closed
AdamWill opened this issue Jul 16, 2024 · 32 comments

Comments

@AdamWill
Copy link
Contributor

Several folks have reported recently that filing waivers for Fedora no longer works (I am tempted to call this a feature, but I'd probably get pelted with rotten fruit). Looking at the logs, we see:

[ERROR] waiverdb.app: User info field 'sub' is unavailable; available are: dict_keys([])
10.131.2.2 - - [16/Jul/2024:11:09:40 +0000] "POST /api/v1.0/waivers/ HTTP/1.1" 401 43 "-" "python-requests/2.31.0"

The relevant code was heavily refactored in 2a4d031 and then touched again in abea90e , so something in those two commits is likely the issue. I will see if I can figure it out this afternoon.

CC @xvitaly @omajid

@AdamWill
Copy link
Contributor Author

hmm, this may be specific to filing waivers via Bodhi. It looks to me like this code is correct assuming a normally-logged-in user is filing a waiver directly in waiverdb, but that is not how it's usually done in Fedora; people usually file waivers by clicking the 'waive' button in Bodhi, which causes Bodhi to do it. I suspect the way we let Bodhi submit waivers doesn't look sufficiently like a normally-logged-in-via-OIDC session and that's causing this to fail. But I'm having trouble digging up exactly how we have things set up so Bodhi has privs on waiverdb.

@AdamWill
Copy link
Contributor Author

okay, I think I found how we let Bodhi file to waiverdb - it's a special long-lasting token. The token value itself is an ansible secret. Bodhi has that token in its config. I guess the token was manually generated as I can find no other reference to it in ansible anywhere.

When Bodhi sends a waiver to waiverdb, it includes that token in the headers.

So...either this updated code for authorized user checking in waiverdb either needs to still handle that setup, or we need to tweak that setup somehow so that session.oidc_auth_profile is populated when Bodhi posts a waiver?

AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
@AdamWill
Copy link
Contributor Author

#220 might fix this, but it's just a guess based on https://flask-oidc.readthedocs.io/en/latest/#resource-server .

AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 16, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 17, 2024
@hluk
Copy link
Member

hluk commented Jul 17, 2024

Oh, maybe #218 fixes this.

@hluk
Copy link
Member

hluk commented Jul 17, 2024

The fix was deployed to stage (waiverdb.stg.fedoraproject.org).

BTW, it should also be possible to use the new web form: https://waiverdb.stg.fedoraproject.org/api/v1.0/waivers/new

@AdamWill
Copy link
Contributor Author

Great, I'll see if I can test it (it's a bit tricky to get staging Bodhi to file waivers, but I can probably bodge it somehow). The form is nice, but still much harder work than just hitting the button in Bodhi, which figures out the appropriate details of the waiver(s) automatically.

@hluk
Copy link
Member

hluk commented Jul 17, 2024

You pass the default form field values through query parameters, letting user to specify just the comment.
Example: https://waiverdb.stg.fedoraproject.org/api/v1.0/waivers/new?subject_type=TYPE&subject_identifier=ID&testcase=TEST&product_version=PV&comment=&scenario=

@AdamWill
Copy link
Contributor Author

AdamWill commented Jul 17, 2024

well, I mean, sure, but then something has to construct that URL. Are you saying Bodhi could do that instead of just auto-generating waivers? I guess it could, I'm not sure whether it's an improvement...

edit...well, maybe it would be an improvement...it would mean we wouldn't need this magic token/superuser setup...but it does mean users would have to fill in the form multiple times to waive multiple failures...not that I'm against more hurdles in the way of the waiver cannon...

ryanlerch pushed a commit to fedora-infra/ansible that referenced this issue Jul 18, 2024
This is to help us figure out
release-engineering/waiverdb#219 , it
adds an auth debugging endpoint so we can verify exactly what
fields are present when doing token auth...

Signed-off-by: Adam Williamson <[email protected]>
@zdohnal
Copy link

zdohnal commented Jul 18, 2024

Hi all,

any ETA on this? I have build with tests which failed on infrastructure error, and I can't waive them due this issue :) . e.g. https://bodhi.fedoraproject.org/updates/FEDORA-2024-4048cd7d28 .

@hluk
Copy link
Member

hluk commented Jul 18, 2024

any ETA on this? I have build with tests which failed on infrastructure error, and I can't waive them due this issue :) . e.g. https://bodhi.fedoraproject.org/updates/FEDORA-2024-4048cd7d28 .

I've released the fix to production. Can you check if it works now?

Otherwise, we can rollback to an older working version.

@mvalik
Copy link
Contributor

mvalik commented Jul 18, 2024

We've tested yesterday using my test_auth point with @AdamWill
It seems to me that I found a solution. I only need a valid token to test if it really works.

@AdamWill
Copy link
Contributor Author

So, latest issue we found is a difference between RH internal (keycloak) and Fedora (ipsilon) auth providers; waiverdb is now relying on token introspection via flask-oidc instead of validating the token itself, but flask-oidc expects the server to include a token_introspection_endpoint URL in its config and ipsilon did not. https://pagure.io/ipsilon/pull-request/406 fixes that, I'll look at getting it deployed.

@AdamWill
Copy link
Contributor Author

ugh, so now that's fixed, we've got another error:

[2024-07-18 18:55:49 +0000] [8] [ERROR] Error handling request /api/v1.0/test_auth
Traceback (most recent call last):
  File "/venv/lib64/python3.11/site-packages/gunicorn/workers/gthread.py", line 282, in handle
    keepalive = self.handle_request(req, conn)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/gunicorn/workers/gthread.py", line 334, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask/app.py", line 2552, in __call__
    return self.wsgi_app(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask/app.py", line 2532, in wsgi_app
    response = self.handle_exception(e)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_cors/extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
                                                ^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_restx/api.py", line 671, in error_router
    return original_handler(f)
           ^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_restx/api.py", line 669, in error_router
    return self.handle_error(e)
           ^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask/app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_cors/extension.py", line 178, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
                                                ^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_restx/api.py", line 671, in error_router
    return original_handler(f)
           ^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_restx/api.py", line 669, in error_router
    return self.handle_error(e)
           ^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_restx/api.py", line 402, in wrapper
    resp = resource(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask/views.py", line 107, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/flask_restx/resource.py", line 41, in dispatch_request
    resp = meth(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/authlib/integrations/flask_oauth2/resource_protector.py", line 98, in decorated
    self.acquire_token(**claims)
  File "/venv/lib64/python3.11/site-packages/authlib/integrations/flask_oauth2/resource_protector.py", line 69, in acquire_token
    token = self.validate_request(request=request, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib64/python3.11/site-packages/authlib/oauth2/rfc6749/resource_protector.py", line 139, in validate_request
    validator.validate_token(token, scopes, request, **kwargs)
  File "/venv/lib64/python3.11/site-packages/authlib/oauth2/rfc7662/token_validator.py", line 31, in validate_token
    if not token or not token['active']:
                        ~~~~~^^^^^^^^^^
KeyError: 'active'

@AdamWill
Copy link
Contributor Author

Note, I'm pretty sure this is not a waiverdb issue. All waiverdb is doing here is decorating a route with the @oidc.accept_token() decorator, and we're blowing up in the handling of that decorator.

It's a bug somewhere between ipsilon, flask-oidc and authlib, I think. @abompard @puiterwijk any ideas?

@AdamWill
Copy link
Contributor Author

possibly something should be calling ensure_active_token between the token being authenticated and it being validated? authlib.oauth2.rfc6749.resource_protector.ResourceProtector.validate_request does token = validator.authenticate_token(token_string), which winds up hitting flask_oidc.IntrospectTokenValidator.introspect_token, which gets the endpoint URL then calls authlib.oauth2.client.OAuth2Client.introspect_token. That part is all going through okay, it looks like, and nothing in that chain does ensure_active_token AFAICS. Then authlib.oauth2.rfc6749.resource_protector.ResourceProtector.validate_request does validator.validate_token(token, scopes, request, **kwargs), which is where we blow up; that's authlib.oauth2.rfc7662.IntrospectTokenValidator.validate_token, which expects the token to be active, but it's not. I don't know if adding an ensure_active_token call between the validate_request and validate_token steps in authlib would help or be correct, or maybe if flask-oidc's introspect_token should do it at the end, or something?

@AdamWill
Copy link
Contributor Author

ehh, not sure that's it. I messed around with a test script that hits the token introspection endpoint:

#!/bin/python

import requests
import json

token = "REALTOKENHERE"
client_secret = "REALCLIENTSECRETHERE"
tokentype = "Bearer"

params = {"token": token, "client_id": "waiverdb-stg", "client_secret": client_secret}
rv = requests.post(f"https://id.stg.fedoraproject.org/openidc/TokenInfo", params=params, headers={'Content-type': 'application/x-www-form-urlencoded'})
print(rv)
print(rv.text)

If I run that script with the correct token string and client secret, I get a response that looks like it ought to make everything happy (it has an active key with value True, and a sub key):

{"active": true, "scope": "openid https://waiverdb.stg.fedoraproject.org/oidc/create-waiver", "client_id": "bodhi", "username": "bodhi@service", "token_type": "Bearer", "exp": "1971903148", "iat": "1656543148", "sub": "bodhi@service", "aud": "bodhi", "iss": "https://id.stg.fedoraproject.org/openidc/"}

If I run that script with an invalid token value - e.g. the correct token prefixed with "Bearer ", as I wondered if the problem might be that nothing's stripping that prefix from the token string - I get this response:

Response [200]>
{"active": false}

which shouldn't cause the problem we're seeing, because it does have an active key. Instead of a KeyError we should get the InvalidTokenError that IntrospectTokenValidator.validate_token wants to raise, if that was the response we're really getting. So...I don't know what's going on :(

@AdamWill
Copy link
Contributor Author

a weird thing about the failure we're getting here is that it implies token is not empty or None. It must be a dict, but just a dict that has no active key. That's pretty weird, actually, and I've no idea how it can be happening...

@AdamWill
Copy link
Contributor Author

AdamWill commented Jul 19, 2024

okay, progress: this is because authlib is using the wrong token_endpoint_auth_method. it's using client_secret_basic not client_secret_post. I don't know why yet, though. We do set token_endpoint_auth_method="client_secret_post" in the ipsilon client config for the waiverdb-stg client. I guess somewhere between flask-oidc and authlib this just doesn't get set?

I'm now drowning in the stuff in authlib.integrations.base_client.registry.BaseOAuth and authlib.integrations.flask_client.integration.FlaskIntegration, trying to work out if there's a mismatch between how flask-oidc is specifying the method when registering the app:

from authlib.integrations.flask_client import OAuth
...
        self.oauth = OAuth(app)
        self.oauth.register(
            name="oidc",
            server_metadata_url=app.config["OIDC_SERVER_METADATA_URL"],
            client_kwargs={
                "scope": app.config["OIDC_SCOPES"],
                "token_endpoint_auth_method": app.config[
                    "OIDC_INTROSPECTION_AUTH_METHOD"
                ],
            },
            update_token=self._update_token,
        )

...and how authlib actually expects it to be done. It is hard to puzzle out.

edit: aha, I think I've got it, I think it's a bug in authlib. testing a fix.

@hluk
Copy link
Member

hluk commented Jul 19, 2024

Wow, thank for the updates! This makes me think that making OIDC work in multiple infras is impossible task. 😿

@AdamWill
Copy link
Contributor Author

lepture/authlib#662 fixes this in my testing at least. Of course, we have to get it into the waiverdb container to fix this for real. I am using a hideous method to do it for testing - carrying a copy of client.py in the waiverdb tree and copying it over the "real" one in the Dockerfile - but ideally I'd like to not have to put that into production. I'm gonna go to bed and see if there's any action on the PR overnight.

@AdamWill
Copy link
Contributor Author

AdamWill commented Jul 19, 2024

Wow, thank for the updates! This makes me think that making OIDC work in multiple infras is impossible task. 😿

ehh, well in this specific case - if I'm right - that's kinda incidental...at a guess, maybe for keycloak, client_secret_basic is the correct method so this is hidden? Otherwise I can't see how this would work for keycloak either; it definitely seems to be an authlib issue rather than an ipsilon one. edit: or maybe nothing uses token auth for the internal waiverdb so this just isn't being used at all, but if anything was, it'd also be broken?

@hluk
Copy link
Member

hluk commented Jul 19, 2024

or maybe nothing uses token auth for the internal waiverdb so this just isn't being used at all, but if anything was, it'd also be broken?

Yeah, that is that case. We still support the old kerberos/keytab authentication (so curl --negotiate works) and new waiver form (web browser + cookies + server-side session).

To get the token auth working internally, you would need to request a special service account. Currently, this is only tested with the waiverdb keycloak account via docker-compose up && tox -e functional.

The token auth could be handy in some cases (especially when delegating waiving through a different service) but I would prefer users to go to the prefilled waiver form because waiving tests should not be usually automated.

@AdamWill
Copy link
Contributor Author

AdamWill commented Jul 19, 2024

ugh...well, my fix makes my test script that hits the test auth endpoint with the correct token work, but waiving from staging bodhi still doesn't work (with the User info field 'sub' is unavailable; available are: dict_keys([]) problem), and I don't know why. now putting some of the debugging stuff back to see if I can figure it out.

@AdamWill
Copy link
Contributor Author

oh, hah, I think I see why. the endpoint bodhi hits is served by WaiversResource, which doesn't have the @oidc.accept_token() decorator.

@hluk
Copy link
Member

hluk commented Jul 19, 2024

oh, hah, I think I see why. the endpoint bodhi hits is served by WaiversResource, which doesn't have the @oidc.accept_token() decorator.

Hmm, I thought that would be fixed in #221. Did you test that fix? cc @mvalik

We cannot use the decorator because we want to support also the other auth methods (kerberos; based on AUTH_METHODS option).

@AdamWill
Copy link
Contributor Author

AdamWill commented Jul 19, 2024

oh, I don't have that PR on the branch I'm working from. If that's preferable to adding the decorator, I can do it. I was assuming the decorator was rather...additive, than exclusive, IYSWIM.

@AdamWill
Copy link
Contributor Author

AdamWill commented Jul 19, 2024

I confirmed at least that adding the decorator fixes waiving from staging Bodhi. waiver 5424 at https://waiverdb.stg.fedoraproject.org/api/v1.0/waivers/ was created that way. It's late here, so I'm going to bed, we can figure out the details later. note that currently I've set Fedora staging waiverdb to use my own container remote - https://quay.io/repository/adamwill/waiverdb-test - so you can't push anything there right now. It's running commit 4548471 , plus the @oidc.accept_token() added to WaiversResource.post() , and /venv/lib/python3.11/site-packages/authlib/oauth2/client.py in the container replaced with a patched version containing the lepture/authlib#662 fix. with those changes, waiving from bodhi works for me.

@AdamWill
Copy link
Contributor Author

AdamWill commented Jul 19, 2024

well, I lied about going to bed...I tweaked it so now staging waiverdb is running 4548471 plus the authlib fix and the changes from #221 but not adding the decorator to WaiversResource.post(), and it looks like filing waivers from bodhi still works. so, I think #221 is good, if we can't use the decorator.

AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 19, 2024
@AdamWill
Copy link
Contributor Author

OK, #223 is a much nicer way to pull a fixed authlib into the container build. With that, on top of #221 (which I see was merged), waiving works and I think it's clean enough to deploy to prod.

AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 19, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 19, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 19, 2024
AdamWill added a commit to AdamWill/waiverdb that referenced this issue Jul 31, 2024
hluk pushed a commit that referenced this issue Aug 1, 2024
@mvalik
Copy link
Contributor

mvalik commented Aug 1, 2024

The issue has been resolved, closing.

@mvalik mvalik closed this as completed Aug 1, 2024
@AdamWill
Copy link
Contributor Author

AdamWill commented Aug 1, 2024

well, it looks like the container build of waiverdb with the fix has been tagged as 'stage', 'latest' and 'master', but not 'prod' or 'prod-fedora'. Before I started poking at this, staging used the 'latest' tag, but prod uses 'prod-fedora', so we need the fixed container to have that tag in order to go back to 'normal'. If we flip the ansible config back to how it used to be right now, prod will break.

@hluk
Copy link
Member

hluk commented Aug 1, 2024

@AdamWill I have released the new version with prod-fedora tag just now!

ryanlerch pushed a commit to fedora-infra/ansible that referenced this issue Nov 27, 2024
The fixes for release-engineering/waiverdb#219
were merged and the factory2 'latest' image now includes them.
'prod-fedora' does not, so for now, we'll have prod use 'latest'
instead.

Signed-off-by: Adam Williamson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants