#27919: SAML authentication API access with expired access token - "expired access token should be automatically refreshed"
Labels: apis, SAML, security
Comments
Pinging @elastic/kibana-security |
This is being caused by some changes that ES made, and is being investigated by @jkakavas |
This doesn't appear to be easily reproducible on the ES side either. It looks like both our authentication via an expired Bearer token and the API to refresh an already used refresh token respond as expected. That said, the fact that this started being flaky as soon as elastic/elasticsearch#36893 was merged is no coincidence. Would it be possible to get DEBUG logs (TRACE from ES would be awesome too) to help us get to the bottom of it? |
Thanks @jkakavas, let me see what I can find as far as debug information. |
Unfortunately, I do not see additional logs in https://kibana-ci.elastic.co/job/elastic+kibana+pull-request/JOB=x-pack-ciGroup6,node=immutable/5462/console; it looks like the ES logs are mostly truncated in the output. For Kibana, I'm not sure what we expected to see, but I see:
We synced with @kobelb yesterday, but I'll leave this here too as I am not sure whether it has been applied: the logging settings for Elasticsearch that would allow us to troubleshoot this are
|
@jkakavas - Yeah, I don't see any ES debug info in the Jenkins console, and I don't see any debug settings set for ES (they would go here: saml_api_integration/config.js). Even without those settings I did not see anything print; I'm not sure whether I should have - that check was just to see if the functional test runner is hiding any ES output. I tried running locally and could not reproduce the problem, but I noticed this exception in the logs (with no accompanying failure): java.lang.IllegalStateException: Channel is already closed - I'm not sure whether this should be caught or not. I have also asked infra for access to the Jenkins worker so we can try debugging there, since the error happened again today; and/or @kobelb, if you want to add the logging settings, we can retest on Jenkins. |
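For reference, this kind of logging could be wired into the functional test config. The sketch below is a guess at what that might look like, not the settings @jkakavas actually shared: the config path, the surrounding config shape, and the logger names are assumptions.

```js
// Hypothetical additions to x-pack/test/saml_api_integration/config.js (a sketch,
// not the settings referenced above). Logger names and config shape are assumptions.
export default async function ({ readConfigFile }) {
  // Reuse whatever base config the SAML API integration suite already extends.
  const baseConfig = await readConfigFile(require.resolve('../api_integration/config'));

  return {
    ...baseConfig.getAll(),
    esTestCluster: {
      ...baseConfig.get('esTestCluster'),
      serverArgs: [
        ...baseConfig.get('esTestCluster.serverArgs'),
        // Turn up token-service logging so token creation, refresh, and expiry
        // show up in the CI output.
        'logger.org.elasticsearch.xpack.security.authc.TokenService=trace',
        'logger.org.elasticsearch.xpack.security.authc=debug',
      ],
    },
  };
}
```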
Thanks @kobelb ! |
@jkakavas @kobelb - I was able to reproduce this problem consistently on an Ubuntu system. @kobelb, I used the settings you had in your PR when I ran it, and I will attach the logs; the errors are towards the end of the file. You should be able to reproduce on Ubuntu - the problem did not occur for me on Darwin. |
These are the errors I am seeing - I put them in bold below: Failure in [refresh token] for id [token_zEnq_xiFR2uVJyGZbii0WQ] - [[token has already been refreshed]] Failed to create token |
@liza-mae any chance you could upload all of the logs, including the ones from Kibana? |
Thanks @liza-mae !!! |
@liza-mae I can't reproduce this locally (I'm on Ubuntu too). The test passes each time. I used
to run the test server and then
to run the specific test. I can see similar output to the one you see, but this is expected
should be shown as we try to use a refresh token twice and expect the first one to succeed but the second one to fail (and it fails with the message above) |
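To make that expectation concrete, here is a rough sketch (not the actual test code) of refreshing the same refresh token twice against the ES token API: the first call should succeed and the second should fail with the "token has already been refreshed" message. The URL, credentials, and 7.x-style endpoint path are assumptions (on 6.x the path is /_xpack/security/oauth2/token); global fetch assumes Node 18+.

```js
// Sketch only: refresh the same refresh token twice and observe the second failure.
// ES URL, credentials, and endpoint path are assumptions.
const ES_URL = 'http://localhost:9200';
const basicAuth = Buffer.from('elastic:changeme').toString('base64');

async function refreshAccessToken(token) {
  const res = await fetch(`${ES_URL}/_security/oauth2/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Basic ${basicAuth}` },
    body: JSON.stringify({ grant_type: 'refresh_token', refresh_token: token }),
  });
  return { status: res.status, body: await res.json() };
}

(async () => {
  const first = await refreshAccessToken(process.env.REFRESH_TOKEN);
  console.log('first refresh:', first.status);   // expected to succeed with a new token pair
  const second = await refreshAccessToken(process.env.REFRESH_TOKEN);
  console.log('second refresh:', second.status); // expected to fail: token has already been refreshed
})();
```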
@jkakavas interesting - the instances that failed on Jenkins were all on Ubuntu, and when I tried it, it failed for me too; today there was no failure, but it looks like that run was on CentOS. Did the log I attached help? |
The Kibana logs for the failing test are as follows (grepped for only SAML messages at debug level; annotations are my own):
The question is why the third authentication request to
which makes sense - the access token is expired. I would like to see what Kibana does afterwards and how ES responds, but this is missing |
I added some more logs here: 812ab70 and will keep running it on CI so we can see what error message Kibana is getting back. I believe we aren't recognizing ES's response as an "invalid refresh token" error and are just giving up with a 401. |
It just failed! It looks like we're getting back the following:
and our code to detect if it's an invalid refresh token: https://github.com/elastic/kibana/blob/master/x-pack/plugins/security/server/lib/authentication/providers/saml.js#L52 is hard-coded to handle responses like the following:
should we also be checking for that new error that we're seeing sporadically? |
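For illustration, the kind of hard-coded check being described looks roughly like the sketch below; this is not the actual saml.js code, and the error/header shape is an assumption. Only one exact expected value is recognized, so any other 401 variant falls through.

```js
// Simplified illustration of a hard-coded expired-token check, not the actual
// saml.js implementation; the error/header shape is an assumption.
const EXPECTED_WWW_AUTHENTICATE =
  'Bearer realm="security", error="invalid_token", error_description="The access token expired"';

function isAccessTokenExpiredError(error) {
  // Recognizes exactly one header value; a 401 with any other body or header
  // (like the one seen sporadically above) is treated as a plain auth failure
  // instead of triggering a token refresh.
  const header = error && error.headers && error.headers['www-authenticate'];
  return !!error && error.statusCode === 401 && header === EXPECTED_WWW_AUTHENTICATE;
}
```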
The error above looks like it is thrown at the authentication attempt and not when trying to refresh the token. Since the call to the token API with the refresh grant is made with the internal user, I can't see how that would result in a 401. This is from the run that failed: https://kibana-ci.elastic.co/job/elastic+kibana+pull-request/JOB=x-pack-ciGroup6,node=immutable/5775/consoleText
I see a request with an expired token getting a 401 at 22:29:27.323, then a successful attempt to refresh the token at 22:29:29.371, and then a subsequent request (with the new token? or the old one?) that gets a 401 at 22:29:29.397. Then a failure message:
but no logs around it? :/ I'm not sure the test output makes sense to me, to be honest |
Is there any way to figure out if the test fails on this
|
Could it be that we're making a wrong assumption here? kibana/x-pack/plugins/security/server/lib/authentication/providers/saml.js, lines 284 to 306 in ab0d678
What if the error we catch is not from the call to
/api/security/v1/me and not a 401 as expected.
The question then is why the request with a newly created access token would return such an error. This is only returned when the token is expired (hard to believe, as it was just created), with a trace log like:
or if the access token has been explicitly invalidated, with a trace log like
but I don't see any ES logs in the build logs of the job that failed |
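To spell out the assumption being questioned above: the pattern under suspicion is, roughly, a catch block that treats any error as a 401 coming from the request made with the expired access token. The sketch below is a hypothetical illustration of that pattern, not the code at saml.js lines 284 to 306; callWithToken, refreshAndRetry, and getErrorStatusCode are assumed helpers.

```js
// Hypothetical illustration of the assumption under discussion (not the actual
// saml.js code). callWithToken, refreshAndRetry, and getErrorStatusCode are
// assumed helpers.
async function authenticateViaState(request, { accessToken, refreshToken }) {
  try {
    return await callWithToken(request, accessToken);
  } catch (err) {
    // Assumption: err was caused by the expired access token and is a 401.
    // If it actually comes from another call, or carries a different status,
    // the flow below takes the wrong branch.
    if (getErrorStatusCode(err) !== 401) {
      throw err;
    }
    return await refreshAndRetry(request, refreshToken);
  }
}
```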
I agree. The following is the general flow of the "expired access token should be automatically refreshed" test:
As far as I can tell, the test failure is related to step (5) above. When we try to use the old access/refresh tokens when calling
I'm adding some more logs to confirm this; if there are any others you'd like to see, please let me know. |
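A rough sketch of the step (5) assertion described above (hypothetical, not the actual functional test): after Kibana has refreshed the session, re-using the old tokens is expected to be rejected with a 400, whereas the flaky runs see a 401 instead, matching the "expected 400 Bad Request, got 401 Unauthorized" failure at the bottom of this issue. The endpoint, header names, and cookie handling are assumptions.

```js
// Hypothetical sketch of the step (5) check, not the actual test code.
// The endpoint and headers are assumptions.
const supertest = require('supertest');

async function expectOldTokensRejected(kibanaUrl, oldSessionCookie) {
  await supertest(kibanaUrl)
    .get('/api/security/v1/me')       // endpoint mentioned earlier in the thread
    .set('kbn-xsrf', 'xxx')
    .set('Cookie', oldSessionCookie)  // cookie still carrying the superseded token pair
    .expect(400);                     // flaky runs get 401 here instead
}
```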
Aha! I see we have this: private static final String EXPIRED_TOKEN_WWW_AUTH_VALUE = "Bearer realm=\"" + XPackField.SECURITY +
"\", error=\"invalid_token\", error_description=\"The access token expired\"";
That should be
I'll push a fix to see if this resolves your flaky test |
Thanks @jkakavas ! |
Turns out this cannot be "fixed" on our side; see elastic/elasticsearch#37196 (review). The
is the proper error message for an expired token, so it looks like you need to adjust the code that checks whether you got an expired access token error. |
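One possible shape for that adjustment, sketched under the same assumptions as the earlier snippet (this is not the fix that actually landed in Kibana): instead of comparing the WWW-Authenticate header against one exact string, accept any 401 whose header carries error="invalid_token" and an "expired" description.

```js
// Sketch of a more tolerant expired-token check; not the actual Kibana fix.
function isAccessTokenExpiredError(error) {
  if (!error || error.statusCode !== 401) {
    return false;
  }
  const header = (error.headers && error.headers['www-authenticate']) || '';
  // Match on the error code and a loose "expired" description rather than one
  // exact header string, so minor wording differences from ES don't break it.
  return header.includes('error="invalid_token"') && /expired/i.test(header);
}
```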
@jkakavas thanks for looking into this, I'll make that change on our side. |
@jkakavas I assume this should be backported to 6.x, as this could occur there as well, it's just not being hit by CI at the moment? |
@kobelb The fact that you started getting it now in the tests has to do with elastic/elasticsearch#36893, which will not be backported to 6.x. However, I believe you should backport, as the |
@jkakavas I'm getting the following response, and from what I can tell it isn't valid JSON for us to parse to look at the error:
I was able to replicate this using cURL, as I was paranoid our ES client library might've been mucking with the response, but it looks like the response itself isn't valid. |
@kobelb what is the request you're sending? I'm not sure how this happens. This is the response I'm getting after my access token has expired for
{
"error": {
"root_cause": [
{
"type": "security_exception",
"reason": "token expired",
"header": {
"WWW-Authenticate": "Bearer realm=\"security\", error=\"invalid_token\", error_description=\"The access token expired\""
}
}
],
"type": "security_exception",
"reason": "token expired",
"header": {
"WWW-Authenticate": "Bearer realm=\"security\", error=\"invalid_token\", error_description=\"The access token expired\""
}
},
"status": 401
} |
@jkakavas my recreation steps were rather manual, and only occasionally returned the response that I was looking for. I continued to run our functional test server and our functional tests while logging the access token that is used for the _security/_authenticate API, and when I saw our tests fail with a 401 as opposed to the expected 400, I grabbed that access token and continued to hit the ES API until I got the unanticipated response. My saml-logs branch of Kibana has the log statements that I refer to below. After cloning kibana, inside the x-pack folder I run the following to start up Kibana and ES:
Then, also from within the x-pack folder, I run the following to actually run the problematic test:
It normally takes me a few runs of this to get the test failure. At that point, I'll grab the access token from the logs that the ES/Kibana server from the first command outputs, and run a curl similar to the following against ES:
Eventually, it'll give me that response that isn't proper JSON. |
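For anyone retracing this, here is a rough Node equivalent of that curl step (a sketch; the port is an assumption, on 6.x the path would be /_xpack/security/_authenticate, and global fetch assumes Node 18+). It prints the raw body before attempting to parse it, which makes the not-quite-JSON responses easy to spot.

```js
// Sketch: replay the _authenticate call with a captured access token and print
// the raw body so a malformed response stands out. Port/path are assumptions.
const token = process.env.ACCESS_TOKEN; // token grabbed from the Kibana/ES logs

(async () => {
  const res = await fetch('http://localhost:9200/_security/_authenticate', {
    headers: { Authorization: `Bearer ${token}` },
  });
  const raw = await res.text();
  console.log('status:', res.status);
  console.log('raw body:', raw);
  try {
    JSON.parse(raw);
    console.log('body parses as JSON');
  } catch (e) {
    console.log('body is NOT valid JSON:', e.message);
  }
})();
```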
Any reason why you also use the credentials with |
Just sloppiness... I was able to replicate the issue without the
I don't know if it helps at all, but I just got the following response as well, which is also improper JSON:
|
This should be fixed by elastic/elasticsearch@2a79c46 and I already see Kibana's CI is looking better. Let's leave this open until we're certain the problem is resolved |
Closing this for now, as the tests haven't failed for more than a day (last on |
Error: expected 400 "Bad Request", got 401 "Unauthorized"
at Test._assertStatus (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:268:12)
at Test._assertFunction (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:283:11)
at Test.assert (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:173:18)
at assert (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:131:12)
at /var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:128:5
at Test.Request.callback (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/index.js:718:3)
at parser (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/index.js:906:18)
at IncomingMessage.res.on (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/parsers/json.js:19:7)
at endReadableNT (_stream_readable.js:1094:12)
at process._tickCallback (internal/process/next_tick.js:63:19)
Branch: master