-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDRIVER-3380] Fixup JSON specials and decimal128 parsing #988
[CDRIVER-3380] Fixup JSON specials and decimal128 parsing #988
Conversation
@@ -542,7 +542,7 @@ bson_decimal128_from_string_w_len (const char *string, /* IN */ | |||
continue; | |||
} | |||
|
|||
if (ndigits_stored < 34) { | |||
if (ndigits_stored < BSON_DECIMAL128_MAX_DIGITS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No functional change, just replace a magic number.
@@ -634,7 +634,7 @@ bson_decimal128_from_string_w_len (const char *string, /* IN */ | |||
/* Shift exponent to significand and decrease */ | |||
last_digit++; | |||
|
|||
if (last_digit - first_digit > BSON_DECIMAL128_MAX_DIGITS) { | |||
if (last_digit - first_digit >= BSON_DECIMAL128_MAX_DIGITS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If last_digit - first_digit
is exactly 34
(BSON_DECIMAL128_MAX_DIGITS
), we want to treat that as out-of-bounds as well.
src/libbson/src/bson/bson-json.c
Outdated
case BSON_JSON_LF_CODE: | ||
case BSON_JSON_LF_DECIMAL128: | ||
case BSON_JSON_LF_DOUBLE: | ||
case BSON_JSON_LF_INT32: | ||
case BSON_JSON_LF_INT64: | ||
case BSON_JSON_LF_MAXKEY: | ||
case BSON_JSON_LF_MINKEY: | ||
case BSON_JSON_LF_OID: | ||
case BSON_JSON_LF_OPTIONS: | ||
case BSON_JSON_LF_REGEX: // $regex lands here, but $regularExpression does | ||
// not | ||
case BSON_JSON_LF_REGULAR_EXPRESSION_OPTIONS: | ||
case BSON_JSON_LF_REGULAR_EXPRESSION_PATTERN: | ||
case BSON_JSON_LF_SYMBOL: | ||
case BSON_JSON_LF_UNDEFINED: | ||
case BSON_JSON_LF_UUID: | ||
_bson_json_read_set_error ( | ||
reader, | ||
"Unexpected nested object value for \"%s\" key", | ||
reader->bson.unescaped.buf); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all of these cases, we are parsing a special "$dollar"
object key that does not want a nested sub-object value. Previously, this would fall through and it would later treat the closing brace }
of the sub-object as the closing of the entire special object itself. This fixes CDRIVER-3380.
src/libbson/src/jsonsl/jsonsl.c
Outdated
if (not_u && not_a) { | ||
VERIFY_SPECIAL ("null", 4); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a bug in NaN
/null
parsing in jsonsl
. Previously: Any byte with value 'n'
would then be expected to be followed by a u
or an A
/a
character, but this check did not error if 'n'
was followed by anything else, and the contiguous sequence of characters would be thrown away and ignored. That is, the string [nope]
would parse as an empty array, because the nope
would be erroneously ignored. This had fun effects as in {"foo": nope}
, where we generate a key with no associated value. Woops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. This may be worth submitting a PR to jsonsl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that, but I also discovered Roberto's existing open PR that has been sitting open for a good while 🙂.
@@ -2727,6 +2727,7 @@ test_bson_json_errors (void) | |||
"Invalid read of boolean in state IN_BSON_TYPE"}, | |||
{"{\"x\": {\"$dbPointer\": true}}", | |||
"Invalid read of boolean in state IN_BSON_TYPE_DBPOINTER_STARTMAP"}, | |||
{"[{\"$code\": {}}]", "Unexpected nested object value for \"$code\" key"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test case to reject {"$code": {}}
, which does not accept an object value. (Tests CDRIVER-3380)
src/libbson/src/bson/bson-json.c
Outdated
/* special case, we started parsing {$type: {$numberInt: "2"}} and we | ||
* expected a legacy Binary format. now we see the second "{", so | ||
* backtrack and parse $type query operator. */ | ||
bson->read_state = BSON_JSON_IN_START_MAP; | ||
bson->read_state = BSON_JSON_IN_START_MAP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bson->read_state = BSON_JSON_IN_START_MAP; |
Accidental line duplication?
src/libbson/src/bson/bson-json.c
Outdated
case BSON_JSON_LF_REGEX: // $regex lands here, but $regularExpression does | ||
// not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the significance of this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to clear up confusion for future reader, as I was initially confused by this. read_state
of BSON_JSON_IN_BSON_TYPE
is used when $regex
is found, but BSON_JSON_IN_BSON_TYPE_REGEX_STARTMAP
is used for $regularExpression
, which will instead go to below on line ~1228. They're both called "regex" in their respective enumerators, but they behave differently when parsing.
src/libbson/src/jsonsl/jsonsl.c
Outdated
bool not_u = CUR_CHAR != 'u'; | ||
bool not_a = tolower (CUR_CHAR) != 'a'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool not_u = CUR_CHAR != 'u'; | |
bool not_a = tolower (CUR_CHAR) != 'a'; | |
const bool not_u = CUR_CHAR != 'u'; | |
const bool not_a = tolower (CUR_CHAR) != 'a'; |
src/libbson/src/jsonsl/jsonsl.c
Outdated
bool not_u = CUR_CHAR != 'u'; | ||
bool not_a = tolower (CUR_CHAR) != 'a'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: rename and redefine not_u
and not_a
as is_u
and is_a
respectively to avoid possibility of double-negatives in conditions (e.g. !not_u
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the negation here to maintain the intent of the original code, which is "if the char is not c
, unset the bit that corresponds to c
". Inverting that would be "if the char is c
, do not unset the bit that corresponds to c
", which also contains a double negative "do not unset". Storing them as variables is just to dedupe the later &&
check.
src/libbson/src/jsonsl/jsonsl.c
Outdated
if (not_u && not_a) { | ||
VERIFY_SPECIAL ("null", 4); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming: is not_u
really supposed to be in this conditional? Given we are verifying the value to be "null"
, the check being conditioned on not_u
seems contradictory at first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to put a comment explaining this. The VERIFY_SPECIAL
here will unconditionally generate an error about an invalid "special" token. This is to halt JSON tokenization if a bare n
is followed by anything other than a u
or an a
. The choice of "null"
as the string here is just to match what the user more likely meant to type (as NaN is an extension in jsonsl).
src/libbson/src/jsonsl/jsonsl.c
Outdated
state->special_flags &= ~JSONSL_SPECIALf_NAN; | ||
} | ||
if (not_u && not_a) { | ||
/* This veryify will always fail, as we have an 'n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* This veryify will always fail, as we have an 'n' | |
/* This verify will always fail, as we have an 'n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/libbson/src/jsonsl/jsonsl.c
Outdated
if (not_u && not_a) { | ||
VERIFY_SPECIAL ("null", 4); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. This may be worth submitting a PR to jsonsl.
Certain token sequences leave the bson-from-json parser unhappy. This addresses those.