[#687] trim payload before try to parse it. #721

bmarwell · 2022-04-16T12:08:26Z

fixes #687

Hint: I really don't like your heuristics.
Imagine this scenario:

You have a binary payload which is by design not intended to be json, e.g. HOCON (typesafe config).
The parser won't parse it as either JSON nor as payload just because it starts and ends with { and }.

Better approach:

Try to parse as JSON no matter what
If it fails, just log info message and use the payload as-is.

If you are interested in this approach, I can create another PR. :)

bdemers

I agree, the checking for the leading { is a pain.
We have talked about this a few times as @lhazlewood has been working away on the "jwe" branch.

IIRC, we talked about using the cty header, for this.

The cty header:

It will typically not be used by applications when the kind of object is already known. This parameter is ignored by JWS implementations; any processing of this parameter is performed by the JWS application. Use of this Header Parameter is OPTIONAL.

Per spec, the JWT payload could be anything, though its usually JSON, so the default should assume JSON, and anything else should use a parseBytes() method.

This came up recently too, this method isn't 100% correct, it should return a byte[] (well, and have it's name tweaked)

jjwt/api/src/main/java/io/jsonwebtoken/JwtParser.java

Line 506 in 72973f9

Jwt<Header, String> parsePlaintextJwt(String plaintextJwt)

If interested in hacking on this send a PR to the jwe branch 😄

impl/src/main/java/io/jsonwebtoken/impl/DefaultJwtParser.java

bdemers · 2022-04-18T17:02:40Z

Also, thanks @bmarwell!

lhazlewood · 2022-04-18T17:49:04Z

The concept of 'plaintext' (as a String) will be going away for 1.0. JJWT's concept of String vs JSON body support was created when the JWT RFCs weren't yet finalized. Now that they are final, the reality is that a JWT can contain any type of payload, with JSON and JSON claims being the most common type of payload.

This is very cool IMO, and not many folks are aware of this super cool capability. It means that JWTs can be used to represent images, pdf documents, etc - basically anything. JWT is now effectively a general-purpose messaging format, not something limited to just identity assertions.

As such, if it's not a commonly-expected JSON claims payload, the cty header should be set, and the application developer will be able to get that body as a byte[] and do what they want with it.

As @bdemers quoted above, if the cty header param is set, then the JWT RFC spec is clear that a JWT library like JJWT must ignore the body/payload, because it is expected that the application developer will handle the payload themselves. As a result, the jwe branch currently has the heuristic written as such:

jjwt/impl/src/main/java/io/jsonwebtoken/impl/DefaultJwtParser.java

Lines 511 to 514 in 3ba22fe

    
           String payload = new String(bytes, Strings.UTF_8); 
        
           Claims claims = null; 
        
           if (!payload.isEmpty() && !hasContentType(header) && payload.charAt(0) == '{' && payload.charAt(payload.length() - 1) == '}') { //likely to be json, parse it:

(but without the raw byte[] array stuff yet - that will be handled in subsequent work before releasing 1.0).

Now - with regards to trimming the resulting String before employing any heuristics:

Because the payload can be anything - even a normal string - we can't alter the payload for trying to detect if it's JSON (because that leading whitespace could be an expected part of a non-JSON payload). If anything, the best thing to do is extract this logic out to a helper method, e.g.

private static boolean isLikelyJson(String payload, Header<?> header) {
   // 1. simple null/empty check:
    if (Strings.isEmpty(payload) {
        return false;
    }
    // 2. if (hasContentType(header), return `false`.

    // 3. Start at the beginning of the string, and iterate character by character, stopping at the first non-whitespace character
    //     if, a `{`, continue, if not, return `false`

    // 4.  Start at the end of the string and iterate character by character, stopping at the first non-whitespace character
    //.    if a `}`, return `true`, if not, return `false`
}

Something to that effect. This ensures we don't create another non-mutable String in the String table, and doesn't alter the intended payload.

bmarwell · 2022-04-19T07:38:35Z

Now - with regards to trimming the resulting String before employing any heuristics:

Because the payload can be anything - even a normal string - we can't alter the payload for trying to detect if it's JSON (because that leading whitespace could be an expected part of a non-JSON payload). If anything, the best thing to do is extract this logic out to a helper method, e.g.
private static boolean isLikelyJson(String payload, Header<?> header) {
   // 1. simple null/empty check:
    if (Strings.isEmpty(payload) {
        return false;
    }
    // 2. if (hasContentType(header), return `false`.

    // 3. Start at the beginning of the string, and iterate character by character, stopping at the first non-whitespace character
    //     if, a `{`, continue, if not, return `false`

    // 4.  Start at the end of the string and iterate character by character, stopping at the first non-whitespace character
    //.    if a `}`, return `true`, if not, return `false`
}
Something to that effect. This ensures we don't create another non-mutable String in the String table, and doesn't alter the intended payload.

Even after reading this, I still don't get why just trying to parse it is not an option. Besides, my option does not alter the input...

If you are just trying to parse JSON and add an try-except, the parser will take care of superfluous whitespace.

Anyway, I was trying to get a quick fix into the current version. Your branch might take another month or so to evolve. So why not start with an 80% solution?

Long story short:
This is a serious but in jjwt. User delivers perfectly fine JSON and we reject it because of choosing heuristics instead of using a parser.

lhazlewood · 2022-04-19T20:21:05Z

Even after reading this, I still don't get why just trying to parse it is not an option. Besides, my option does not alter the input...

It doesn't alter the input directly, but it does add a potentially-large temporary String to the heap unnecessarily (the JWT payload is often the largest String in the JWT token in most production applications). Given how many hundreds or thousands of JWTs that can be parsed per second in many apps (every single http request, JMS or MQTT message, etc), it's a performance hit that I wouldn't want to impose on application developers. Even though the JVM is pretty efficient, if we can avoid the hit to the heap, all the better.

If you are just trying to parse JSON and add an try-except, the parser will take care of superfluous whitespace.

Invoking a JSON parser and swallowing a subsequent exception when the payload isn't JSON is a heavyweight operation that could significantly impact performance - better to avoid it if possible. I've also never really been a fan of using try/catch scenarios for expected logic, because it's not 'exceptional' any longer. But that's less of a concern for me - performance is more important IMO.

But even if we ignore the heap and potential performance impact, there's a bigger problem here: How do you distinguish between JSON-but-invalid-JSON and not-JSON-but-should-be-swallowed/ignored exceptions?

They need to be handled differently, and there's no way to know really without inspecting the cty header, and that's not being done today. And unfortunately we probably can't add the cty check until the next major release (i.e. 1.0), because it fundamentally changes the semantics of how JJWT works today: if anyone is setting that header today, and they are still expecting JJWT to parse the body as Claims, JJWT would stop working for them as they expect. This kind of change should ideally only be done on a major release IMO.

Anyway, I was trying to get a quick fix into the current version. Your branch might take another month or so to evolve. So why not start with an 80% solution?

We can do that today. The quick fix is to create the helper method isLikelyJson as suggested above, but not check the cty header as I proposed above. (i.e. do steps 1, 3, and 4).

This ensures there is zero heap or performance penalty, and retains current behavior semantics until we can enforce the cty check for the 1.0 release.

This is a serious but in jjwt. User delivers perfectly fine JSON and we reject it because of choosing heuristics instead of using a parser.

I agree we should be more tolerant here per the Robustness Principle, but in fairness, I wouldn't exactly call it serious. To the best of my knowledge, this is the first report I can remember in many years of leading whitespace being an issue, probably because practically every other JWT library doesn't add extraneous whitespace.

That said, we do need address this, and I think(?) the minimal 3-part isLikelyJson suggestion above will solve all concerns until we can add the cty check to it for 1.0.

Does that work?

bmarwell · 2022-04-20T11:20:33Z

Does that work?

Phew, I don’t really know… for several reasons.

It doesn't alter the input directly, but it does add a potentially-large temporary String to the heap unnecessarily (the JWT payload is often the largest String in the JWT token in most production applications). Given how many hundreds or thousands of JWTs that can be parsed per second in many apps (every single http request, JMS or MQTT message, etc), it's a performance hit that I wouldn't want to impose on application developers. Even though the JVM is pretty efficient, if we can avoid the hit to the heap, all the better.

That's a potential +1 for the Json parser, btw. ;-)

If you are just trying to parse JSON and add an try-except, the parser will take care of superfluous whitespace.
Invoking a JSON parser and swallowing a subsequent exception when the payload isn't JSON is a heavyweight operation that could significantly impact performance - better to avoid it if possible.

Is it? They can operate on streams and fail early, especially if you tell them to deserialize a JsonObject.
The instance of the JSON parser is somewhere in-memory already, probably.
If you'd really want performance optimizations, they should use a (ByteArray)InputStream or something.

That said, the more effort we put into heuristics or any other isLikelyJson method, the lesser the gap to parsing and try-catching an exception.

I've also never really been a fan of using try/catch scenarios for expected logic, because it's not 'exceptional' any longer. But that's less of a concern for me - performance is more important IMO.

I totally agree here.

But even if we ignore the heap and potential performance impact, there's a bigger problem here: How do you distinguish between JSON-but-invalid-JSON and not-JSON-but-should-be-swallowed/ignored exceptions?

At the moment we have an even bigger problem described in #678 by the original author:
Not sure how much we want to blow up the isLikelyJson() method, but there is a good chance that we miss valid json and give the user a String/byte[] payload when we should not.

At least that is a bigger problem for me, because jjwt is already using JSON parsers which would solve the problem.
As you mentioned: Yes, there is a cost. But how high is that cost compared to delivering a String instead of Claims at (seemingly) random?

The workaround would be for users to implement the callback parseJws/parseJwt methods where THEY do the try-catch logic. Not very user friendly when the library could do this for you.

I fully agree with all the following cty stuff, but I am still looking for a fix for a minor version.

They need to be handled differently, and there's no way to know really without inspecting the cty header, and that's not being done today. And unfortunately we probably can't add the cty check until the next major release (i.e. 1.0), because it fundamentally changes the semantics of how JJWT works today: if anyone is setting that header today, and they are still expecting JJWT to parse the body as Claims, JJWT would stop working for them as they expect. This kind of change should ideally only be done on a major release IMO.

Anyway, I was trying to get a quick fix into the current version. Your branch might take another month or so to evolve. So why not start with an 80% solution?

We can do that today. The quick fix is to create the helper method isLikelyJson as suggested above, but not check the cty header as I proposed above. (i.e. do steps 1, 3, and 4).

This ensures there is zero heap or performance penalty, and retains current behavior semantics until we can enforce the cty check for the 1.0 release.

As mentioned above, there is a high potential that we are not giving back a ClaimJWT/JWS but instead a byte[] payload when the JWT contained perfectly valid JSON.
Unless you want to write a really fancy isLikelyJson method.
… which will undermine your performance assumptions, because the JSON libs already do this for you (and can probably do it better).

That said, we do need address this, and I think(?) the minimal 3-part isLikelyJson suggestion above will solve all concerns until we can add the cty check to it for 1.0.

🤷🏻‍♂️
-0.5 for isLikelyJson because either it gets very big and complicated so there is no performance saving, or it will miss some non-edge-cases.

I mean, we could do a performance measurement.

lhazlewood · 2022-04-20T19:53:07Z

Invoking a JSON parser and swallowing a subsequent exception when the payload isn't JSON is a heavyweight operation that could significantly impact performance - better to avoid it if possible.

Is it?

I'm fairly confident that trying logic when not needed, throwing an exception, filling in a stack trace, propagating the call stack, etc, is most definitely slower and more cumbersome than just skipping whitespace in a loop, especially when 99.99% of payloads never have leading or trailing whitespace.

That said, the more effort we put into heuristics or any other isLikelyJson method, the lesser the gap to parsing and try-catching an exception.

I'm not worried about this particular case because the fix is trivial: We'd be keeping logic that has been working for 7+ years, and just accounting for any leading or trailing whitespace that might occur. This is what the JWT RFC requires for JSON Claims anyway. Once that logic exists, I'm pretty sure it won't need to change for another 7 years (if ever).

We can go straight to the 'if the cty header is missing then parse blindly' approach for the 1.0 release and remove all heuristics entirely, but I'm unwilling to make that change now because that would actually change behavior for app developers, and we can't do that until 1.0.

So in summary, skipping leading and trailing whitespace in the payload before checking for a { and } 1) requires the least amount of change or risk to existing JJWT users and 2) fully resolves #687. I'm not comfortable with the parse and try/catch approach until 1.0.

I'm really not trying to be difficult, I promise! I like the parse/try/catch approach, but I want to do the bare minimum necessary to resolve #687 without adding any additional risk whatsoever.

bmarwell · 2022-04-21T04:40:46Z

Ok, I'll try to come up with something new.

lhazlewood · 2023-09-16T21:25:57Z

Closing this just as a matter of housekeeping: the corresponding logic for this is in master (after #279 was merged), here:

jjwt/impl/src/main/java/io/jsonwebtoken/impl/DefaultJwtParser.java

Lines 257 to 325 in fa1e32b

    
               /** 
        
                * Returns {@code true} IFF the specified payload starts with a <code>&#123;</code> character and ends with a 
        
                * <code>&#125;</code> character, ignoring any leading or trailing whitespace as defined by 
        
                * {@link Character#isWhitespace(char)}.  This does not guarantee JSON, just that it is likely JSON and 
        
                * should be passed to a JSON Deserializer to see if it is actually JSON.  If this {@code returns false}, it 
        
                * should be considered a byte[] payload and <em>not</em> delegated to a JSON Deserializer. 
        
                * 
        
                * @param payload the byte array that could be JSON 
        
                * @return {@code true} IFF the specified payload starts with a <code>&#123;</code> character and ends with a 
        
                * <code>&#125;</code> character, ignoring any leading or trailing whitespace as defined by 
        
                * {@link Character#isWhitespace(char)} 
        
                * @since JJWT_RELEASE_VERSION 
        
                */ 
        
               private static boolean isLikelyJson(byte[] payload) { 
        
                   int len = Bytes.length(payload); 
        
                   if (len == 0) { 
        
                       return false; 
        
                   } 
        
                   int maxIndex = len - 1; 
        
                   int jsonStartIndex = -1; // out of bounds means didn't find any 
        
                   int jsonEndIndex = len; // out of bounds means didn't find any 
        
                   for (int i = 0; i < len; i++) { 
        
                       int c = payload[i]; 
        
                       if (c == '{') { 
        
                           jsonStartIndex = i; 
        
                           break; 
        
                       } 
        
                   } 
        
                   if (jsonStartIndex == -1) { //exhausted entire payload, didn't find starting '{', can't be a JSON object 
        
                       return false; 
        
                   } 
        
                   if (jsonStartIndex > 0) { 
        
                       // we found content at the start of the payload, but before the first '{' character, so we need to check 
        
                       // to see if any of it (when UTF-8 decoded) is not whitespace. If so, it can't be a valid JSON object. 
        
                       byte[] leading = new byte[jsonStartIndex]; 
        
                       System.arraycopy(payload, 0, leading, 0, jsonStartIndex); 
        
                       String s = new String(leading, StandardCharsets.UTF_8); 
        
                       if (Strings.hasText(s)) { // found something before '{' that isn't whitespace; can't be a valid JSON object 
        
                           return false; 
        
                       } 
        
                   } 
        
                   for (int i = maxIndex; i > jsonStartIndex; i--) { 
        
                       int c = payload[i]; 
        
                       if (c == '}') { 
        
                           jsonEndIndex = i; 
        
                           break; 
        
                       } 
        
                   } 
        
                   if (jsonEndIndex > maxIndex) { // found start '{' char, but no closing '} char. Can't be a JSON object 
        
                       return false; 
        
                   } 
        
                   if (jsonEndIndex < maxIndex) { 
        
                       // We found content at the end of the payload, after the last '}' character.  We need to check to see if 
        
                       // any of it (when UTF-8 decoded) is not whitespace. If so, it's not a valid JSON object payload. 
        
                       int size = maxIndex - jsonEndIndex; 
        
                       byte[] trailing = new byte[size]; 
        
                       System.arraycopy(payload, jsonEndIndex + 1, trailing, 0, size); 
        
                       String s = new String(trailing, StandardCharsets.UTF_8); 
        
                       return !Strings.hasText(s); // if just whitespace after last '}', we can assume JSON and try and parse it 
        
                   } 
        
                   return true; 
        
               }

[jwtk#687] trim payload before try to parse it.

d76c009

bdemers reviewed Apr 18, 2022

View reviewed changes

impl/src/main/java/io/jsonwebtoken/impl/DefaultJwtParser.java Outdated Show resolved Hide resolved

[jwtk#687] Trim payload

f4f1f62

lhazlewood closed this Sep 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[#687] trim payload before try to parse it. #721

[#687] trim payload before try to parse it. #721

bmarwell commented Apr 16, 2022

bdemers left a comment

bdemers commented Apr 18, 2022

lhazlewood commented Apr 18, 2022 •

edited

Loading

bmarwell commented Apr 19, 2022

lhazlewood commented Apr 19, 2022

bmarwell commented Apr 20, 2022

lhazlewood commented Apr 20, 2022

bmarwell commented Apr 21, 2022

lhazlewood commented Sep 16, 2023

[#687] trim payload before try to parse it. #721

[#687] trim payload before try to parse it. #721

Conversation

bmarwell commented Apr 16, 2022

bdemers left a comment

Choose a reason for hiding this comment

bdemers commented Apr 18, 2022

lhazlewood commented Apr 18, 2022 • edited Loading

bmarwell commented Apr 19, 2022

lhazlewood commented Apr 19, 2022

bmarwell commented Apr 20, 2022

lhazlewood commented Apr 20, 2022

bmarwell commented Apr 21, 2022

lhazlewood commented Sep 16, 2023

lhazlewood commented Apr 18, 2022 •

edited

Loading