Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CDRIVER-3380] Fixup JSON specials and decimal128 parsing #988

Merged
merged 6 commits into from
May 11, 2022

Conversation

vector-of-bool
Copy link
Collaborator

Certain token sequences leave the bson-from-json parser unhappy. This addresses those.

@@ -542,7 +542,7 @@ bson_decimal128_from_string_w_len (const char *string, /* IN */
continue;
}

if (ndigits_stored < 34) {
if (ndigits_stored < BSON_DECIMAL128_MAX_DIGITS) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functional change, just replace a magic number.

@@ -634,7 +634,7 @@ bson_decimal128_from_string_w_len (const char *string, /* IN */
/* Shift exponent to significand and decrease */
last_digit++;

if (last_digit - first_digit > BSON_DECIMAL128_MAX_DIGITS) {
if (last_digit - first_digit >= BSON_DECIMAL128_MAX_DIGITS) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If last_digit - first_digit is exactly 34 (BSON_DECIMAL128_MAX_DIGITS), we want to treat that as out-of-bounds as well.

Comment on lines 1194 to 1214
case BSON_JSON_LF_CODE:
case BSON_JSON_LF_DECIMAL128:
case BSON_JSON_LF_DOUBLE:
case BSON_JSON_LF_INT32:
case BSON_JSON_LF_INT64:
case BSON_JSON_LF_MAXKEY:
case BSON_JSON_LF_MINKEY:
case BSON_JSON_LF_OID:
case BSON_JSON_LF_OPTIONS:
case BSON_JSON_LF_REGEX: // $regex lands here, but $regularExpression does
// not
case BSON_JSON_LF_REGULAR_EXPRESSION_OPTIONS:
case BSON_JSON_LF_REGULAR_EXPRESSION_PATTERN:
case BSON_JSON_LF_SYMBOL:
case BSON_JSON_LF_UNDEFINED:
case BSON_JSON_LF_UUID:
_bson_json_read_set_error (
reader,
"Unexpected nested object value for \"%s\" key",
reader->bson.unescaped.buf);
break;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all of these cases, we are parsing a special "$dollar" object key that does not want a nested sub-object value. Previously, this would fall through and it would later treat the closing brace } of the sub-object as the closing of the entire special object itself. This fixes CDRIVER-3380.

Comment on lines 468 to 470
if (not_u && not_a) {
VERIFY_SPECIAL ("null", 4);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a bug in NaN/null parsing in jsonsl. Previously: Any byte with value 'n' would then be expected to be followed by a u or an A/a character, but this check did not error if 'n' was followed by anything else, and the contiguous sequence of characters would be thrown away and ignored. That is, the string [nope] would parse as an empty array, because the nope would be erroneously ignored. This had fun effects as in {"foo": nope}, where we generate a key with no associated value. Woops.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. This may be worth submitting a PR to jsonsl.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but I also discovered Roberto's existing open PR that has been sitting open for a good while 🙂.

@@ -2727,6 +2727,7 @@ test_bson_json_errors (void)
"Invalid read of boolean in state IN_BSON_TYPE"},
{"{\"x\": {\"$dbPointer\": true}}",
"Invalid read of boolean in state IN_BSON_TYPE_DBPOINTER_STARTMAP"},
{"[{\"$code\": {}}]", "Unexpected nested object value for \"$code\" key"},
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test case to reject {"$code": {}}, which does not accept an object value. (Tests CDRIVER-3380)

@vector-of-bool vector-of-bool changed the title Fixup JSON specials and decimal128 parsing [CDRIVER-3380] Fixup JSON specials and decimal128 parsing May 10, 2022
/* special case, we started parsing {$type: {$numberInt: "2"}} and we
* expected a legacy Binary format. now we see the second "{", so
* backtrack and parse $type query operator. */
bson->read_state = BSON_JSON_IN_START_MAP;
bson->read_state = BSON_JSON_IN_START_MAP;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bson->read_state = BSON_JSON_IN_START_MAP;

Accidental line duplication?

Comment on lines 1203 to 1204
case BSON_JSON_LF_REGEX: // $regex lands here, but $regularExpression does
// not
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significance of this comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to clear up confusion for future reader, as I was initially confused by this. read_state of BSON_JSON_IN_BSON_TYPE is used when $regex is found, but BSON_JSON_IN_BSON_TYPE_REGEX_STARTMAP is used for $regularExpression, which will instead go to below on line ~1228. They're both called "regex" in their respective enumerators, but they behave differently when parsing.

Comment on lines 460 to 461
bool not_u = CUR_CHAR != 'u';
bool not_a = tolower (CUR_CHAR) != 'a';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool not_u = CUR_CHAR != 'u';
bool not_a = tolower (CUR_CHAR) != 'a';
const bool not_u = CUR_CHAR != 'u';
const bool not_a = tolower (CUR_CHAR) != 'a';

Comment on lines 460 to 461
bool not_u = CUR_CHAR != 'u';
bool not_a = tolower (CUR_CHAR) != 'a';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion: rename and redefine not_u and not_a as is_u and is_a respectively to avoid possibility of double-negatives in conditions (e.g. !not_u).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the negation here to maintain the intent of the original code, which is "if the char is not c, unset the bit that corresponds to c". Inverting that would be "if the char is c, do not unset the bit that corresponds to c", which also contains a double negative "do not unset". Storing them as variables is just to dedupe the later && check.

Comment on lines 468 to 470
if (not_u && not_a) {
VERIFY_SPECIAL ("null", 4);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming: is not_u really supposed to be in this conditional? Given we are verifying the value to be "null", the check being conditioned on not_u seems contradictory at first glance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to put a comment explaining this. The VERIFY_SPECIAL here will unconditionally generate an error about an invalid "special" token. This is to halt JSON tokenization if a bare n is followed by anything other than a u or an a. The choice of "null" as the string here is just to match what the user more likely meant to type (as NaN is an extension in jsonsl).

@vector-of-bool vector-of-bool requested a review from eramongodb May 11, 2022 18:45
state->special_flags &= ~JSONSL_SPECIALf_NAN;
}
if (not_u && not_a) {
/* This veryify will always fail, as we have an 'n'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* This veryify will always fail, as we have an 'n'
/* This verify will always fail, as we have an 'n'

Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 468 to 470
if (not_u && not_a) {
VERIFY_SPECIAL ("null", 4);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. This may be worth submitting a PR to jsonsl.

@vector-of-bool vector-of-bool merged commit c9add86 into mongodb:master May 11, 2022
@vector-of-bool vector-of-bool deleted the fixup-json-specials branch May 25, 2022 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants