Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apis SAML security SAML authentication API access with expired access token. expired access token should be automatically refreshed #27919

Closed
liza-mae opened this issue Jan 2, 2019 · 35 comments
Assignees
Labels
bug Fixes for quality problems that affect the customer experience failed-test A test failure on a tracked branch, potentially flaky-test Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more!

Comments

@liza-mae
Copy link
Contributor

liza-mae commented Jan 2, 2019

Error: expected 400 "Bad Request", got 401 "Unauthorized"
at Test._assertStatus (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:268:12)
at Test._assertFunction (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:283:11)
at Test.assert (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:173:18)
at assert (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:131:12)
at /var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:128:5
at Test.Request.callback (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/index.js:718:3)
at parser (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/index.js:906:18)
at IncomingMessage.res.on (/var/lib/jenkins/workspace/elastic+kibana+master/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/parsers/json.js:19:7)
at endReadableNT (_stream_readable.js:1094:12)
at process._tickCallback (internal/process/next_tick.js:63:19)

Branch: master

@liza-mae liza-mae added test Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! test_xpack_functional failed-test A test failure on a tracked branch, potentially flaky-test labels Jan 2, 2019
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-security

@kobelb
Copy link
Contributor

kobelb commented Jan 2, 2019

This is being caused by some changes that ES made, and is being investigated by @jkakavas

@jkakavas
Copy link
Member

jkakavas commented Jan 2, 2019

This doesn't appear to be easily reproducible on ES side either. It looks like both our authentication via an expired Bearer token and the API to refresh an already used refresh token respond as expected. That said, the fact that this started being flaky as soon as elastic/elasticsearch#36893 was merged is no coincidence. Would it be possible to get DEBUG logs (TRACE from ES would be awesome too) to help us get to the bottom of it?

@jkakavas jkakavas self-assigned this Jan 2, 2019
@liza-mae
Copy link
Contributor Author

liza-mae commented Jan 2, 2019

Thanks @jkakavas, let me see what I can find as far as debug information.

@jkakavas
Copy link
Member

jkakavas commented Jan 3, 2019

I unfortunately do not see additional logs in https://kibana-ci.elastic.co/job/elastic+kibana+pull-request/JOB=x-pack-ciGroup6,node=immutable/5462/console as it looks like ES logs are mostly truncated in the output and for kibana, I'm not sure what we expected to see but I see :

 warn debug logs are being captured, only error logs will be written to the console

We synced with @kobelb yesterday but I'll leave this here too as I am not sure if this has been applied: The logging settings for elasticsearch that would allow us to troubleshoot this are

org.elasticsearch.xpack.security.authc to DEBUG, org.elasticsearch.xpack.security.authc.TokenService to TRACE and org.elasticsearch.xpack.security.rest.action.oauth2 toDEBUG

@liza-mae
Copy link
Contributor Author

liza-mae commented Jan 3, 2019

@jkakavas - Yea I don't see any ES debug info in the Jenkins console and I don't see any debug settings set for ES (would be here: saml_api_integration/config.js). Even without those settings I did not see anything print, not sure if I should have - that is just to check if the functional test runner is not hiding any ES output.

I tried running locally and could not reproduce the problem but I noticed this exception in the logs but no failure: java.lang.IllegalStateException: Channel is already closed - not sure if this should be caught or not.

I have also asked infra to see if I can get access to the Jenkins worker so we can see if we can debug on there since the error happened again today and/or @kobelb if you want to add the logging settings and we can retest on Jenkins.

@kobelb
Copy link
Contributor

kobelb commented Jan 3, 2019

@liza-mae I threw up #27997 to run these tests with more verbose logging.

@liza-mae
Copy link
Contributor Author

liza-mae commented Jan 3, 2019

Thanks @kobelb !

@liza-mae
Copy link
Contributor Author

liza-mae commented Jan 3, 2019

@jkakavas @kobelb - I was able to reproduce this problem on Ubuntu system consistently. @kobelb I took the settings you had in your PR when I ran it and I will attach the logs, the errors will be towards the end of the file. You should be able to reproduce on Ubuntu - problem did not occur for me on Darwin.

@liza-mae
Copy link
Contributor Author

liza-mae commented Jan 3, 2019

These are the errors I am seeing - I put them in bold below:

Failure in [refresh token] for id [token_zEnq_xiFR2uVJyGZbii0WQ] - [[token has already been refreshed]]
│ org.elasticsearch.ElasticsearchSecurityException: invalid_grant
│ at org.elasticsearch.xpack.security.authc.TokenService.invalidGrantException(TokenService.java:1195) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.TokenService.checkTokenDocForRefresh(TokenService.java:807) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.TokenService.lambda$innerRefresh$32(TokenService.java:732) ~[?:?]
│ at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:60) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:271) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:255) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1094) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1175) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1155) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.HandledTransportAction$ChannelActionListener.onResponse(HandledTransportAction.java:93) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.HandledTransportAction$ChannelActionListener.onResponse(HandledTransportAction.java:77) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$1.doRun(TransportSingleShardAction.java:117) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
│ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
│ at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

Failed to create token
│ org.elasticsearch.ElasticsearchSecurityException: invalid_grant
│ at org.elasticsearch.xpack.security.authc.TokenService.invalidGrantException(TokenService.java:1195) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.TokenService.checkTokenDocForRefresh(TokenService.java:807) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.TokenService.lambda$innerRefresh$32(TokenService.java:732) ~[?:?]
│ at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:60) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:271) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:255) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1094) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1175) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1155) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.HandledTransportAction$ChannelActionListener.onResponse(HandledTransportAction.java:93) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.HandledTransportAction$ChannelActionListener.onResponse(HandledTransportAction.java:77) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$1.doRun(TransportSingleShardAction.java:117) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
│ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
│ at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
java.lang.IllegalStateException: Channel is already closed
│ at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.close(RestController.java:504) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:497) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:58) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:49) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:60) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.action.user.TransportAuthenticateAction.doExecute(TransportAuthenticateAction.java:48) ~[?:?]
│ at org.elasticsearch.xpack.security.action.user.TransportAuthenticateAction.doExecute(TransportAuthenticateAction.java:26) ~[?:?]
│ at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:141) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$apply$0(SecurityActionFilter.java:90) ~[?:?]
│ at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$authorizeRequest$4(SecurityActionFilter.java:179) ~[?:?]
│ at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.maybeRun(AuthorizationUtils.java:177) ~[?:?]
│ at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.setRunAsRoles(AuthorizationUtils.java:171) ~[?:?]
│ at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.authorize(AuthorizationUtils.java:159) ~[?:?]
│ at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.authorizeRequest(SecurityActionFilter.java:181) ~[?:?]
│ at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$3(SecurityActionFilter.java:159) ~[?:?]
│ at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:180) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$4(AuthenticationService.java:213) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:224) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:178) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:137) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:107) ~[?:?]
│ at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.applyInternal(SecurityActionFilter.java:158) ~[?:?]
│ at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:111) ~[?:?]
│ at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:139) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:117) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:60) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:393) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.rest.action.RestAuthenticateAction.lambda$innerPrepareRequest$1(RestAuthenticateAction.java:59) ~[?:?]
│ at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:97) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.rest.SecurityRestFilter.lambda$handleRequest$0(SecurityRestFilter.java:58) ~[?:?]
│ at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$writeAuthToContext$21(AuthenticationService.java:429) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.writeAuthToContext(AuthenticationService.java:438) ~[?:?]
│ at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$0(AuthenticationService.java:184) ~[?:?]
│ at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.xpack.security.authc.TokenService.lambda$checkIfTokenIsValid$40(TokenService.java:1044) ~[?:?]
│ at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:60) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:271) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:255) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1094) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
│ at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1175) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]

@kobelb
Copy link
Contributor

kobelb commented Jan 3, 2019

@liza-mae any chance you could upload all of the logs, including the ones from Kibana?

@liza-mae
Copy link
Contributor Author

liza-mae commented Jan 3, 2019

@kobelb log attached, let me know if you need additional info, thanks!

mylog.txt

@kobelb
Copy link
Contributor

kobelb commented Jan 3, 2019

Thanks @liza-mae !!!

@jkakavas
Copy link
Member

jkakavas commented Jan 4, 2019

@liza-mae I can't reproduce this locally (I'm on Ubuntu too). The tests passes each time.

I used

node scripts/functional_tests_server.js --config x-pack/test/saml_api_integration/config.js

to run the test server and then

node scripts/functional_test_runner.js --config x-pack/test/saml_api_integration/config.js --grep="expired access token should be automatically refreshed with two concurrent requests"

to run the specific test.

I can see similar output to the one you see, but this is expected

Failure in [refresh token] for id [token_zEnq_xiFR2uVJyGZbii0WQ] - [[token has already been refreshed]]

should be shown as we try to use a refresh token twice and expect the first one to succeed but the second one to fail (and it fails with the message above)

@liza-mae
Copy link
Contributor Author

liza-mae commented Jan 4, 2019

@jkakavas interesting on the instances that failed on Jenkins they were all on ubuntu and when I tried it failed for me, also today no failure - but it looks like it ran on centos. Did the log I attach help?

@jkakavas
Copy link
Member

jkakavas commented Jan 4, 2019

The kibana logs for the failing test are ( grep only saml messages on debug, annotations my own )

[ First authentication request]
[20:12:40.074] [debug][saml][security] Trying to authenticate user request to /api/security/v1/me.
[20:12:40.075] [debug][saml][security] Trying to authenticate via header.
[20:12:40.075] [debug][saml][security] Authorization header is not presented.
[20:12:40.075] [debug][saml][security] Trying to authenticate via state.
[ First authentication request fails because the access token is expired as expected]
[20:12:40.079] [debug][saml][security] Failed to authenticate request via state: [security_exception] token expired, with { header={ WWW-Authenticate="Bearer realm=\"security\", error=\"invalid_token\", error_description=\"The access token expired\"" } } 
[ First authentication request - Successful attempt to refresh the access token]
[20:12:40.079] [debug][saml][security] Trying to refresh access token.
[20:12:42.074] [debug][saml][security] Access token has been successfully refreshed.
[20:12:42.078] [debug][saml][security] Request has been authenticated via refreshed token.
[ Second authentication request]
[20:12:42.093] [debug][saml][security] Trying to authenticate user request to /api/security/v1/me.
[20:12:42.093] [debug][saml][security] Trying to authenticate via header.
[20:12:42.093] [debug][saml][security] Authorization header is not presented.
[20:12:42.093] [debug][saml][security] Trying to authenticate via state.
[ Second authentication request fails because the access token is expired as expected]
[20:12:42.098] [debug][saml][security] Failed to authenticate request via state: [security_exception] token expired, with { header={ WWW-Authenticate="Bearer realm=\"security\", error=\"invalid_token\", error_description=\"The access token expired\"" } } 
[ Second authentication request - Failed attempt to refresh the access token as expected]
[20:12:42.098] [debug][saml][security] Trying to refresh access token.
[20:12:42.110] [debug][saml][security] Failed to refresh access token: invalid_grant
[ Third authentication request]
[20:12:42.140] [debug][saml][security] Trying to authenticate user request to /api/security/v1/me.
[20:12:42.141] [debug][saml][security] Trying to authenticate via header.
[20:12:42.141] [debug][saml][security] Authorization header is not presented.
[20:12:42.141] [debug][saml][security] Trying to authenticate via state.
[ Third authentication request fails because the access token is expired as expected ]
[20:12:42.149] [debug][saml][security] Failed to authenticate request via state: Authentication Exception

The question is why the third authentication request to /api/security/v1/me returns a 401 instead of a 400 and there is nothing in the logs of Kibana or Elasticsearch that would indicate what the reason is. Last log message from ES is

info [o.e.x.s.a.TokenService] [liza-mach] Failure in [validate token] for id [3JByL_nWQVeD1Wp6EjUmvA]
                 │      org.elasticsearch.ElasticsearchSecurityException: token expired

which makes sense, the access token is expired. I would like to see what Kibana does afterwards and how does ES respond but this is missing

@kobelb
Copy link
Contributor

kobelb commented Jan 4, 2019

which makes sense, the access token is expired. I would like to see what Kibana does afterwards and how does ES respond but this is missing

I added some more logs here: 812ab70 and will keep running it on CI so we can see what error message Kibana is getting back. I believe that we aren't seeing ES's response as a "invalid refresh token" error and just give up with a 401.

@kobelb
Copy link
Contributor

kobelb commented Jan 4, 2019

It just failed! It looks like we're getting back the following:

{"username":"[email protected]","roles":[],"full_name":null,"email":null,"metadata":{"saml_nameid":"[email protected]","saml(urn:oid:0.0.7)":["[email protected]"],"saml_nameid_format":"urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"},"enabled":true,"authentication_realm":{"name":"saml1","type":"saml"},"lookup_realm":{"name":"saml1","type":"saml"}} error=\\"invalid_token\\", error_description=\\"The access token expired\\""

and our code to detect if it's an invalid refresh token: https://github.com/elastic/kibana/blob/master/x-pack/plugins/security/server/lib/authentication/providers/saml.js#L52 is hard-coded to handle responses like the following:

{ error:    { root_cause: [ [Object] ],      type: 'security_exception',      reason: 'token expired',      header:       { 'WWW-Authenticate':          'Bearer realm="security", error="invalid_token", error_description="The access token expired"' } },   status: 401 }

should we also be checking for that new error that we're seeing sporatically?

@jkakavas
Copy link
Member

jkakavas commented Jan 6, 2019

should we also be checking for that new error that we're seeing sporatically?

The error above looks like it is thrown at authentication attempt and not at trying to refresh the token. Since the call to the token API with the refresh grant is made with the internal user, I can't see how this would result in a 401 .

This is from the one that failed: https://kibana-ci.elastic.co/job/elastic+kibana+pull-request/JOB=x-pack-ciGroup6,node=immutable/5775/consoleText

respons [22:29:27.305]  GET /api/security/v1/me 200 2082ms - 9.0B
log   [22:29:27.307] [debug][saml][security] Trying to authenticate user request to /api/security/v1/me.
log   [22:29:27.308] [debug][saml][security] Trying to authenticate via header.
log   [22:29:27.308] [debug][saml][security] Authorization header is not presented.
log   [22:29:27.308] [debug][saml][security] Trying to authenticate via state.
log   [22:29:27.323] [debug][saml][security] Failed to authenticate request via state: [security_exception] token expired, with { header={ WWW-Authenticate="Bearer realm=\"security\", error=\"invalid_token\", error_description=\"The access token expired\"" } } 
log   [22:29:27.324] [debug][saml][security] Unable to authenticateViaState: { error:
 { root_cause: [ [Object] ],
   type: 'security_exception',
   reason: 'token expired',
   header:
    { 'WWW-Authenticate':
       'Bearer realm="security", error="invalid_token", error_description="The access token expired"' } },
status: 401 }
log   [22:29:27.324] [debug][saml][security] Trying to refresh access token.
log   [22:29:29.363] [debug][saml][security] Access token has been successfully refreshed.
log   [22:29:29.371] [debug][saml][security] Request has been authenticated via refreshed token.
log   [22:29:29.397] [debug][saml][security] Trying to authenticate user request to /api/security/v1/me.
log   [22:29:29.398] [debug][saml][security] Trying to authenticate via header.
log   [22:29:29.398] [debug][saml][security] Authorization header is not presented.
log   [22:29:29.398] [debug][saml][security] Trying to authenticate via state.
log   [22:29:29.409] [debug][saml][security] Failed to authenticate request via state: Authentication Exception
log   [22:29:29.409] [debug][saml][security] Unable to authenticateViaState: '{"username":"[email protected]","roles":[],"full_name":null,"email":null,"metadata":{"saml_nameid":"[email protected]","saml(urn:oid:0.0.7)":["[email protected]"],"saml_nameid_format":"urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"},"enabled":true,"authentication_realm":{"name":"saml1","type":"saml"},"lookup_realm":{"name":"saml1","type":"saml"}} error=\\"invalid_token\\", error_description=\\"The access token expired\\""}},"status":401}'
log   [22:29:29.410] [info][authentication] Authentication attempt failed: Authentication Exception
log   [22:29:29.410] [debug][license][xpack] Calling [data] Elasticsearch _xpack API. Polling frequency: 30001
└- ✖ fail: "apis SAML security SAML authentication API access with expired access token. expired access token should be automatically refreshed"
│      Error: expected 400 "Bad Request", got 401 "Unauthorized"
│       at Test._assertStatus (/var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:268:12)
│       at Test._assertFunction (/var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:283:11)
│       at Test.assert (/var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:173:18)
│       at assert (/var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:131:12)
│       at /var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/supertest/lib/test.js:128:5
│       at Test.Request.callback (/var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/index.js:718:3)
│       at parser (/var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/index.js:906:18)
│       at IncomingMessage.res.on (/var/lib/jenkins/workspace/elastic+kibana+pull-request/JOB/x-pack-ciGroup6/node/immutable/kibana/node_modules/superagent/lib/node/parsers/json.js:19:7)
│       at endReadableNT (_stream_readable.js:1094:12)
│       at process._tickCallback (internal/process/next_tick.js:63:19)
│ 
│ 
respons [22:29:29.396]  GET /api/security/v1/me 401 48ms - 9.0B

I see a request with an expired token getting a 401 at 22:29:27.323 then a successful attempt to refresh the token at 22:29:29.371 and then a subsequent request (with the new token ? or the old one? ) that gets a 401 at 22:29:29.397 .

Then an failure message:

 fail: "apis SAML security SAML authentication API access with expired access token. expired access token should be automatically refreshed"
│      Error: expected 400 "Bad Request", got 401 "Unauthorized"

but no logs around it ? :/ I'm not sure the test output makes sense to me to be honest

@jkakavas
Copy link
Member

jkakavas commented Jan 6, 2019

Is there any way to figure out if the test fails on this

expect(apiResponseTwo.body).to.eql({
assertion or on the next
expect(apiResponseWithExpiredToken.body).to.eql({
one ?

@jkakavas
Copy link
Member

jkakavas commented Jan 6, 2019

Could it be that we're making a wrong assumption here ?:

try {
// Token should be refreshed by the same user that obtained that token.
const {
access_token: newAccessToken,
refresh_token: newRefreshToken
} = await this._options.client.callWithInternalUser(
'shield.getAccessToken',
{ body: { grant_type: 'refresh_token', refresh_token: refreshToken } }
);
this._options.log(['debug', 'security', 'saml'], 'Access token has been successfully refreshed.');
request.headers.authorization = `Bearer ${newAccessToken}`;
const user = await this._options.client.callWithRequest(request, 'shield.authenticate');
this._options.log(['debug', 'security', 'saml'], 'Request has been authenticated via refreshed token.');
return AuthenticationResult.succeeded(
user,
{ accessToken: newAccessToken, refreshToken: newRefreshToken }
);
} catch (err) {

What if the error we catch is not from the call to shield.getAccessToken on 290 that attempts to refresh the token but from shield.authenticate in line 298? That would explain why the if in

if (isInvalidRefreshTokenError(err)) {
is false and so a 401 is returned from /api/security/v1/me and not a 401 as expected.

Question then is why would the request with a newly created access token return such an error? This is only returned when the token is expired (hard to believe as it was just created) with a trace log like:

Failure in [validate token] for id [XXXXXXXXXX]

or if the access token has been explicitly invalidated with a trace log like

Failure in [check token state] for id [XXXXXXXXXX]

but I don't see any ES logs in the build logs of the job that failed

@kobelb
Copy link
Contributor

kobelb commented Jan 7, 2019

The error above looks like it is thrown at authentication attempt and not at trying to refresh the token. Since the call to the token API with the refresh grant is made with the internal user, I can't see how this would result in a 401 .

I agree. The following is the general flow of the "expired access token should be automatically refreshed" test:

  1. GET /abc/xyz to get the handshake response
  2. POST /api/security/v1/saml to auth with the SAML response tied to the previous handshake response
  3. Wait until the access token has expired
  4. GET /api/security/v1/me which will use the refresh token to use a new access token
  5. GET /api/security/v1/me using the old access/refresh tokens returned by (2) to ensure we get back a 400.
  6. GET /api/security/v1/me using the new access/refresh tokens returned by (4) to ensure we get a 200.

As far as I can tell, the test failure is related to step (5) above. When we try to use the old access/refresh tokens when calling shield.authenticate we expect to get back a response that we classify as an "access token expired" error, but we're getting back the unanticipated response below:

{"username":"[email protected]","roles":[],"full_name":null,"email":null,"metadata":{"saml_nameid":"[email protected]","saml(urn:oid:0.0.7)":["[email protected]"],"saml_nameid_format":"urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"},"enabled":true,"authentication_realm":{"name":"saml1","type":"saml"},"lookup_realm":{"name":"saml1","type":"saml"}} error=\\"invalid_token\\", error_description=\\"The access token expired\\""

I'm adding some more logs to confirm this, if there are any others you'd like to see, please let me know.

@jkakavas
Copy link
Member

jkakavas commented Jan 7, 2019

Aha ! I see we have this

https://github.com/elastic/elasticsearch/blob/9d0e0eb0f3c13404d862795802aa6f0ee7e756b8/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java#L150

private static final String EXPIRED_TOKEN_WWW_AUTH_VALUE = "Bearer realm=\"" + XPackField.SECURITY +
"\", error=\"invalid_token\", error_description=\"The access token expired\"";

That should be error=\"token_expired\" instead. What is worrying is that this is there forever and I can't figure out why we starting returning this now only. Could be related to the reordering I did for token decoding and validation in elastic/elasticsearch#36893 but I'd like to understand why.

I'll push a fix to see if this resolves your flaky test

@kobelb
Copy link
Contributor

kobelb commented Jan 7, 2019

Thanks @jkakavas !

@jkakavas
Copy link
Member

jkakavas commented Jan 7, 2019

Turns out this cannot be "fixed" on our side see elastic/elasticsearch#37196 (review). The

error=\\"invalid_token\\", error_description=\\"The access token expired\\

is the proper error message for an expired token so it looks like you need to adjust the code that checks whether you got an expired access token error.

@kobelb
Copy link
Contributor

kobelb commented Jan 7, 2019

@jkakavas thanks for looking into this, I'll make that change on our side.

@kobelb
Copy link
Contributor

kobelb commented Jan 7, 2019

@jkakavas I assume this should be backported to 6.x, as this could occur there as well, it's just not being hit by CI at the moment?

@jkakavas
Copy link
Member

jkakavas commented Jan 7, 2019

@kobelb The fact that you started getting it now in the tests has to do with elastic/elasticsearch#36893 which will not be backported to 6.x. However I believe you should backport as the invalid_token error is a proper one to be looking for no matter why this was not triggered up until now ( I will still try and get to the bottom of what actually changed with #36893 for this behavior )

@kobelb
Copy link
Contributor

kobelb commented Jan 7, 2019

@jkakavas when I'm getting the following response, from what I can tell this isn't valid JSON for us to parse to look at the error:

{"username":"[email protected]","roles":[],"full_name":null,"email":null,"metadata":{"saml_nameid":"[email protected]","saml(urn:oid:0.0.7)":["[email protected]"],"saml_nameid_format":"urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"},"enabled":true,"authentication_realm":{"name":"saml1","type":"saml"},"lookup_realm":{"name":"saml1","type":"saml"}} error=\"invalid_token\", error_description=\"The access token expired\""}},"status":401}

I was able to replicate this using cURL, as I was paranoid our ES client library might've been mucking with the response, but it looks like the response itself isn't valid.

@jkakavas
Copy link
Member

jkakavas commented Jan 7, 2019

@kobelb what is the request you're sending? Not sure how this error=\"invalid_token\", error_description=\"The access token expired\""}},"status":401} part makes it into that response above which looks otherwise successful. Can you share your replication steps?

This is the response I'm getting after my access token has expired for localhost:9200/_xpack/security/authenticate i.e.:

{
    "error": {
        "root_cause": [
            {
                "type": "security_exception",
                "reason": "token expired",
                "header": {
                    "WWW-Authenticate": "Bearer realm=\"security\", error=\"invalid_token\", error_description=\"The access token expired\""
                }
            }
        ],
        "type": "security_exception",
        "reason": "token expired",
        "header": {
            "WWW-Authenticate": "Bearer realm=\"security\", error=\"invalid_token\", error_description=\"The access token expired\""
        }
    },
    "status": 401
}

@kobelb
Copy link
Contributor

kobelb commented Jan 8, 2019

@jkakavas my recreation steps were rather manual, and only occasionally returned the response that I was looking for. I continued to run our functional test server, and our functional tests while logging the access token that is used for the _security/_authenticate API and when I saw our tests fail with the 401 as opposed to the 401, I grabbed that access token and continued to hit the ES API until I got the unanticipated response. My saml-logs branch of Kibana has the log statements that I refer to below.

After cloning kibana, inside the x-pack folder I run the following to start up Kibana and ES:

node ./scripts/functional_tests_server.js --config test/saml_api_integration/config.js

Then from also within the x-pack folder, I run the following to actually run the problematic test:

node ../scripts/functional_test_runner.js --config test/saml_api_integration/config.js --grep "apis SAML security SAML authentication API access with expired access token. expired access token should be automatically refreshed$"

It normally takes me a few times of running this to get the test failure. At that point, I'll grab the access token from the logs that the ES/Kibana server output from the first command, and run a curl similar to the following against ES:

curl -u elastic:changeme -H 'Accept: application/json' -XGET http://localhost:9220/_security/_authenticate -H 'Authorization: Bearer o6CrAyD6pEvZW+c99rY8/sDHhPl4V6/jlnDIbkriCoUtnECQtAg4128yf4PzPgy4Vtq50YAiFxGwRzSZc6Ej2dQRQnMb20+ugWjek2uI9QybC7DFyKH4lRd8+OLjLQlWlEI='

Eventually, it'll give me that response that isn't proper JSON.

@jkakavas
Copy link
Member

jkakavas commented Jan 8, 2019

Any reason why you also use the credentials with -u ? This will make curl add an additional Authorization: Basic base64encoded header to your request

@kobelb
Copy link
Contributor

kobelb commented Jan 8, 2019

Any reason why you also use the credentials with -u ? This will make curl add an additional Authorization: Basic base64encoded header to your request

Just sloppiness... I was able to replicate the issue without the -u parameter as well.

I don't know if it helps at all, but I just got the following response also, which is also improper JSON:

{"error":{"root_cause":[{"type":"illegal_state_exception","reason":"Channel is already closed"}],"type":"illegal_state_exception","reason":"Channel is already closed"},"status":500}id-format:unspecified"},"enabled":true,"authentication_realm":{"name":"saml1","type":"saml"},"lookup_realm":{"name":"saml1","type":"saml"}} error=\"invalid_token\", error_description=\"The access token expired\""}},"status":401}

@jkakavas
Copy link
Member

jkakavas commented Jan 9, 2019

This should be fixed by elastic/elasticsearch@2a79c46 and I already see Kibana's CI is looking better. Let's leave this open until we're certain the problem is resolved

@jkakavas
Copy link
Member

Closing this for now as the tests haven't failed for more than a day ( last on January 9th 2019, 10:47:10.476 ). Feel free to reopen if we see something similar. I'm obliged this was caught by Kibana's CI, sorry for the noise it caused

@liza-mae liza-mae added bug Fixes for quality problems that affect the customer experience and removed bug Fixes for quality problems that affect the customer experience test test_xpack_functional labels Jan 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience failed-test A test failure on a tracked branch, potentially flaky-test Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants