Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auth overhaul (access tokens, refresh tokens, api tokens) #3636

Closed
wants to merge 1 commit into from

Conversation

sunaurus
Copy link
Collaborator

@sunaurus sunaurus commented Jul 16, 2023

This PR is not complete (missing items detailed below), but I am submitting it already as a draft to get some early feedback. Please check the description below before checking code - I would really appreciate feedback on the overall design which is included in the description. But comments on the partially complete code are of course welcome as well.


Introduction

This PR contains an overhaul of Lemmy authentication. It introduces three new authentication tokens: access tokens, refresh tokens, and api tokens (more details below).

The changes are intended to be backwards compatible - the existing /login endpoint will become deprecated but will remain operational until we are ready to remove it in a future version.

What is wrong with our current authentication?

  1. Auth tokens never expire: [Security]: No token/session invalidation after logout/reset password #3364
  2. Auth sessions can't be revoked by users
  3. There is no support for httpOnly cookie based auth: Consider using HttpOnly for jwt cookie lemmy-ui#1252
  4. There is no support for api token based auth - all 3rd party apps require user passwords
  5. All auth tokens have full access to everything, their scope can't be limited

This PR contains intends to solve all these issues.

Proposed solution

This PR proposes to replace the existing auth token with 3 new types of tokens:

Access token

This token can be acquired with either a refresh token or an API token.

The new access token is intended to be a backwards compatible drop-in replacement for the existing auth token, with a few key differences:

  • It expires within 5 minutes (so even if it leaks, it can only be abused within 5 minutes of the leak)
  • It contains a method claim, which can be used later to limit certain activities to specific methods (for example, disallow password changes if the access token was obtained via an API token)

Refresh token

This token can be acquired using username + password (+ 2fa).

It lives in a secure httpOnly cookie (can't be read from browser js), which is limited only to the /api/v3/get_access_token path.

This is intended only for trusted web interfaces (such as lemmy-ui) and can be used to create access tokens with full access to the user. Each refresh token can be considered a separate "session". Each token records its last use time, as well as last use ip address - these values can be displayed to users in some new security UI so they get an overview of their active sessions. Each refresh token expires 2 weeks after it was last used, or when revoked manually by a user.

API token

This token must be manually created by users with a specific label and expiry date.

This is intended for 3rd party apps to avoid users from entering their passwords directly into untrusted code. The api token can be used similarly to refresh tokens to request access tokens, but the created access tokens would have limited access. Each API token will also record their last use time as well as last use ip address. API tokens expire after their user defined expiry date, or when revoked manually.


To summarize the general flow:

  1. Acquire either a refresh token (if trusted web ui) or an API token (if 3rd party app)
  2. Request access token using the token from step 1
  3. Make all API requests with access token from step 2
  4. If access token is close to expiry (or last request failed due to token), get a new access token (and retry last request)
  5. If getting access token fails due to a token error, assume the (refresh or api) token has expired and go back to step 1

Rollout plan

  1. Release the new logic in a minor Lemmy version
  2. Add a migration guide to release notes to allow app developers to migrate to the new APIs
  3. Update Lemmy-ui to use the new endpoints
  4. After some time has passed, remove the old /login endpoint in a backwards-incompatible Lemmy update

TODO in this PR

  • Add refresh token list & revoke endpoints
  • Add api token create & list & revoke endpoints
  • Disallow some actions (new api token creation + password change + reading user e-mail?) when access token method is Api
  • Add some tests

TODO in future PRs

  • Switch lemmy-ui to use new authentication
  • Add security page to lemmy-ui, where users can see and revoke their sessions (refresh tokens), as well as see/revoke/create API tokens
  • Add method for 3rd party apps to redirect users to an API token creation page (with a potential return_url to automatically get back to the app with the created token)

@sunaurus sunaurus marked this pull request as draft July 16, 2023 18:00
@0xAnansi
Copy link

As discussed in matrix, there is an issue in the logic that I briefly described in the ticket #3364 since it does not check for token reuse.

The proposed PR switch from never ending access token, to medium lifespan refresh tokens and short lived access tokens.
While this is far better, the issue is that since the refresh token has to be stored in the client, it's usually still vulnerable to a lot of different attack vectors, one of the worst being a malicious add-on or even some injections that could bypass the httponly protection for the cookie. In the end, since the browser has to read this cookie, anything running on the browser could read it with enough effort.

The mitigation I highlighted in the previous ticket was to implement re-use checks for the refresh token, and make them one time only.

Every time an access token is generated, it should be accompanied by a new refresh token. This refresh token will be silently sent to the backend before the access token expires to get a new access token, and a new refresh token for the next silent refresh.

If this refresh token has been used before, it means that someone else is using a stolen token, therefore you need to wipe all refresh tokens linked to the user, killing all further silent refresh and forcing a hard re-auth.

If this keeps happening, a warning should be sent to the user since his browser/account has most likely been compromised.

This would mean that the DB schema should be modified to have a separate last_used_date and created_date, to be able to check for the following scenarios:

  • User has a valid token that can still be used -> do nothing
  • User has a valid token that has expired but was never used -> remove it when created + lifespan < today
  • User has already used this token -> remove all user's token and warn if necessary

@bqv
Copy link

bqv commented Jul 17, 2023

Relevant Chat:

w: sunaurus (lemm.ee): Just wondering if you have any plans to address this issue regarding the JWT 'sub' not being a string in your Auth overhaul PR?

sunaurus (lemm.ee): That's a breaking change, so most likely not

sunaurus (lemm.ee): I have no idea what clients might already be depending on it being like it is 😅

me: It shouldnt matter? Have a two tier system, the old login flow returns a jwt with integer subject still, but the new auth flow returns a correct jwt. Clients using the old flow will never even see the new flow jwts

me: I.e. It need not be a breaking change, but if not done now it will be

gene c: good ole versioned apis

cr4yfish: I'd say it's better to address this now instead of sometime later since the situation will only get worse with more clients depending on it. But it's your call :).

Worth mentioning so not passed over

@aeharding
Copy link

aeharding commented Jul 17, 2023

Is there a good place to provide feedback on this other than the PR? I have a few concerns. Specifically, I don't think this increases security at all for 3rd party apps that aren't/can't use an httponly cookie, and it also dramatically increases the barrier to build a 3rd party lemmy app with tons of logic required to manage tokens: verify if the access token is valid, refresh the token with the refresh token if the refresh token is valid before any the request is made, etc etc.

Instead of the approach in this PR, I'd much rather see:

  • a way to revoke the current Lemmy tokens (✅ user being hacked)
  • making the current tokens httponly (for lemmy-ui) (✅ solves xss)
  • adding an OAuth flow for 3rd party apps (✅ solves typing password into 3rd party code)

@sunaurus
Copy link
Collaborator Author

@0xAnansi I agree that refresh token rotation is a good idea, will add it soon. Probably tracking re-use and notifiying users about it will be out of scope here, though

@sunaurus
Copy link
Collaborator Author

sunaurus commented Jul 17, 2023

@aeharding Your comment is a bit confusing for me:

Instead of the approach in this PR, I'd much rather see:

a way to revoke the current Lemmy tokens (✅ user being hacked)

This proposal adds that

making the current tokens httponly (for lemmy-ui) (✅ solves xss)

This proposal adds that (for the refresh tokens, which are already more secure than the current auth tokens thanks to having an expiry)

adding an OAuth flow for 3rd party apps (✅ solves typing password into 3rd party code)

Earlier you complained about barrier to implement 3rd party apps. I guarantee you that barrier is higher with oauth than with the API tokens in this proposal - oauth brings a lot of extra complexity, but doesn't really seem to bring a lot of extra value. By the way, with oauth, clients also need to have very similar logic to manage tokens.

Regarding oauth in general:

One of the central concepts in oauth is client identification, but this does not really make a lot of sense with a federated system - there is no central registry to keep track of clients. Every 3rd party app would need to register a new client_id on every new Lemmy instance it's used on - the whole client_id becomes kind of meaningless.

Basically, oauth adds a lot of extra complexity for no real benefit - it seems to me that implementing it on Lemmy would just be buzzword-driven development 😅By the way, I'm saying that as somebody who has implemented oauth several times in different projects - I definitely recognize that it has its uses, but it doesn't seem super useful in a federated situation.

This is not to say that Lemmy shouldn't support logging in with external oauth providers - I think that would be a super cool feature, but it's orthogonal to this PR.


Let me address one more thing you said:

I don't think this increases security at all for 3rd party apps

Security is increased by:

  1. Adding an expiry to API tokens (currently there is none)
  2. Using short-lived auth tokens for the majority of requests (long-lived tokens are no longer transmitted all the time)
  3. Reducing access when using API tokens (currently any auth token can be used to fully take over an account)
  4. Enabling API token revocation (currently impossible)

@aeharding
Copy link

aeharding commented Jul 17, 2023

@sunaurus Maybe I'm misunderstanding.

For context, currently I use the lemmy-js-client library to make calls.

With this PR, before every single API request I'll need to add logic to first check if the 5-min token is still valid, make a request using the refresh token (which is still in JS context anyways) to refresh the access token, and then somehow coordinate this to only do the access token refresh once for all pending requests if there are multiple requests that need to be made at the same time.

I've dealt with refresh tokens before and they add a lot of extra complexity.

Earlier you complained about barrier to implement 3rd party apps. I guarantee you that barrier is higher with oauth than with the API tokens in this proposal - oauth brings a lot of extra complexity, but doesn't really seem to bring a lot of extra value. By the way, with oauth, clients also need to have very similar logic to manage tokens.

I'm confused the logic you are expecting for logging a user into a 3rd party app is, so I'd appreciate if you can clarify. As I understand it, an API token would be something that end users would have to be exposed to and understand. If this is true, this is unacceptable. End users cannot be expected to know what an API token is and go to an instance to create one. The only use case I can see for this would be for technical users making bots and integrations, not 3rd party clients.

Assuming this is true, most if not all 3rd party clients will just continue to accept username and password for the user in the 3rd party code.

By the way, with oauth, clients also need to have very similar logic to manage tokens.

I don't believe Mastodon requires juggling both a refresh and an access token. I believe there's only one token. https://docs.joinmastodon.org/spec/oauth/

@Nutomic
Copy link
Member

Nutomic commented Jul 19, 2023

Making the cookie httponly doesnt require separate tokens. It would be enough if Lemmy could read the token directly from cookie, instead of requiring to pass it via parameter. #3596 is going in this direction, later we can add a middleware which handles auth.

Im also not sure how effective the 5 minute lifetime of api token would be. An attacker doesnt need to do any manual action with a stolen token, instead he could setup a script which performs such actions via api. So it does limit the attack surface, but Im not sure its worth the extra complexity and bugs.

In any case a db table for tokens is a good idea. We can use this to invalidate tokens. Basically check with each api call if the given token is stored in the db. Implement a new logout api call which deletes the current token from db. Change password apis also delete all tokens for current user from db.

@0xAnansi
Copy link

@Nutomic indeed, if we don't check for re-use, it's not as useful since you will not detect the actual attack.
The fact that they're single use makes it far harder to exploit if the user is still using his session since you're running into a race condition with the legitimate user.

That goes for user token as well as API tokens.

@erlend-sh
Copy link

Does this take #1368 into account?

@bqv
Copy link

bqv commented Jul 23, 2023

Does this take #1368 into account?

#3636 (comment)

@fvanzee fvanzee mentioned this pull request Jul 25, 2023
@Security-Chief-Odo
Copy link

Security-Chief-Odo commented Jul 28, 2023

Wanted to share this documentation for password reset guidance that may be useful when creating a new implementation of such.

@Nutomic
Copy link
Member

Nutomic commented Aug 4, 2023

I opened #3818 which solves these issues in a simpler way. It adds a db table for login tokens, but otherwise keeps the single token like it is now.

sub: local_user_id,
iss: hostname.to_string(),
iat: Utc::now().timestamp(),
exp: Some((Utc::now() + Duration::minutes(5)).timestamp()),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes:

  1. It might be nice for this to be a configurable value.
  2. 5 minutes seems like an aggressive default for access tokens. This isn't a banking app; the impact of leaked tokens for most users is probably minimal. I looked around the Internet for best practices, and they vary, but I think 1 hour would be sensible; it's the default for Okta.
  3. I suppose I'd be concerned about admin tokens leaking, so perhaps it would be prudent to set a lower timeout for admin tokens and a higher one for normal users?

@bqv
Copy link

bqv commented Aug 8, 2023

Are the JWTs just gonna stay broken then...

@dessalines
Copy link
Member

When I get some time, I plan on researching this myself as well, but does anyone have good links (maybe from stackoverflow), for a high-level discussion of the post popular / used auth methods outside of oauth, that utilize some of these concepts above?

More than anything, I want to make sure we aren't adding any unecessary complication.

@erlend-sh
Copy link

Some good discussions here: https://hn.algolia.com/?q=auth

I imagine @discourse makes for a good reference implementation, since it’s got very similar design goals as Lemmy and is also using Postgres. Supabase also bases its auth heavily on Postgres and documents the approach in great detail.

@TheBrokenRail
Copy link

With this PR, before every single API request I'll need to add logic to first check if the 5-min token is still valid, make a request using the refresh token (which is still in JS context anyway) to refresh the access token, and then somehow coordinate this to only do the access token refresh once for all pending requests if there are multiple requests that need to be made at the same time.

I agree with @aeharding on this. While the current situation is obviously terrible (no token scoping, expiration, or revocation), this solution seems to over-correct and over-complicate things.

Requiring the token to be refreshed every 5 minutes, just sounds like a fragile system to me. Especially if the refresh token also changes every refresh.

For instance, what if you have an Android app, and two activities refresh their tokens at roughly the same time? The first one would succeed, generating new access and refresh tokens, while the second one would fail (and notify the user that their account was hacked) because its refresh token was just invalidated. You'd have to keep everything completely synchronized to make sure that only one refresh was happening at a time and everything was all using the same refresh + access tokens. Code like that is infamously difficult.

This might seem simple to implement when only one API call is made at a time, but the moment that isn't the case, things would get difficult, complicated, and buggy really quickly in my opinion.

@0xAnansi
Copy link

The described process is the de-facto standard for all web applications that provide access to native clients and 3rd party tools.

Each client will have its own "string" of tokens, in your example, the client is the app, not an activity. In this kind of case there is a single worker handling the token renewal, so there is no "changing at the same time", the app's token will change, and the drift between the new token lifespan and the old one will ensure that older calls made with the older token will still work.

In any case, if something goes wrong because of 403 caused by a dead token, you only have to check client side if the token has changed and retry with the new one silently, and you only lose the equivalent of one request time.

I don't see any other way to ensure both an adequate level of security, and a way to easily provide access to 3rd parties that user want to submit their credentials to.

This is the most common OAuth use case, and its most common implementation. Also, while longer lifetime are technically possible, my level of confidence in the security of the Lemmy codebase makes me weight in favor of the shortest possible at the very least until everything becomes more stable.

While the token rotation will fix the current issue with the way the tokens are handled, not having an actual 3rd party auth system makes any tool building difficult. You cannot "log as" without having to store user IDs, which is a big no for a lot of reasons.

With OAuth, you can give access to your information and revoke access at any time without compromising your account.

@Nutomic
Copy link
Member

Nutomic commented Sep 11, 2023

@sunaurus Can you say when you will continue working on this? Or maybe its better I finish #3818 first, and we implement further changes in separate PRs.

@kalleep
Copy link

kalleep commented Sep 13, 2023

I agree with @0xAnansi that rotation of the refresh token should happen when requesting a new access token and the old refresh token should be invalidated, then on reuse detection all refresh tokens in the chain should be invalidated.

For instance, what if you have an Android app, and two activities refresh their tokens at roughly the same time? The first one would succeed, generating new access and refresh tokens, while the second one would fail (and notify the user that their account was hacked) because its refresh token was just invalidated. You'd have to keep everything completely synchronized to make sure that only one refresh was happening at a time and everything was all using the same refresh + access tokens. Code like that is infamously difficult.

This can be mitigate by having a short grace period for the previous refresh token so it still can be used to issue new access tokens for a short period, see https://developer.okta.com/docs/guides/refresh-tokens/main/#grace-period-for-token-rotation

@RikudouSage
Copy link

RikudouSage commented Sep 15, 2023

This proposal seems like a mess combining multiple things into one. Why not go with a (modified) OAuth? For example, apps could create the OAuth client on-the-fly when encountering a new instance. This is what Mastodon does. Or we could be extra fancy and federate OAuth clients using AP.

But please, implement some widely-used standard, don't make up a Frankenstein's monster auth.

Edit: It's even an official standard: https://www.rfc-editor.org/rfc/rfc7591. And we could use another standard for automatic discoverability of OAuth: https://www.rfc-editor.org/rfc/rfc8414.html

@0xAnansi
Copy link

When I get some time, I plan on researching this myself as well, but does anyone have good links (maybe from stackoverflow), for a high-level discussion of the post popular / used auth methods outside of oauth, that utilize some of these concepts above?

More than anything, I want to make sure we aren't adding any unecessary complication.

@dessalines Completely missed your message. I went over this article and it seems pretty good at explaining the different ways to authenticate users on modern web apps https://testdriven.io/blog/web-authentication-methods/

Just know that OAuth is just one flavor of token based authentication!

@phiresky
Copy link
Collaborator

Without looking too closely, I'd agree that it might be better to "simply" implement full oauth instead of a custom solution and use that for all authentication (including the official client). There's this library for oauth2 server in Rust https://github.com/HeroicKatora/oxide-auth that looks fairly good.

@Nutomic
Copy link
Member

Nutomic commented Sep 18, 2023

Closing this in favor of #3818 which is much simpler and doesnt involve any breaking changes for clients. Further improvements (like oauth) can be implemented in separate PRs.

@Nutomic Nutomic closed this Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.