Skip to content

Commit

Permalink
Add fix and tests for \x with no hex digits
Browse files Browse the repository at this point in the history
  • Loading branch information
NWilson committed Sep 17, 2024
1 parent 5e75d9b commit 4d2bb33
Show file tree
Hide file tree
Showing 7 changed files with 58 additions and 14 deletions.
23 changes: 21 additions & 2 deletions src/pcre2_compile.c
Original file line number Diff line number Diff line change
Expand Up @@ -2093,10 +2093,29 @@ else

else
{
c = 0;
if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff) break; /* Not a hex digit */
/* Perl has the surprising/broken behaviour that \x without following
hex digits is treated as an escape for NUL. Their source code laments
this but keeps it for backwards compatibility. A warning is printed
when "use warnings" is enabled, and it's forbidden when "use strict"
is enabled. Because we don't have warnings, we simply forbid it. */
if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff)
{
/* Not a hex digit */
*errorcodeptr = ERR78;
break;
}
ptr++;
c = cc;

/* With "use re 'strict'" Perl actually requires exactly two digits (error
for both \xA and \xAAA). This seems overly strict, and there seems
little incentive to align with that, given the backwards-compatibility
cost.
For comparison, note that other engines disagree. For example:
- Java allows 1 or 2 hex digits. Error if 0 digits. No error if >2 digits
- .NET requires 2 hex digits. Error if 0, 1 digits. No error if >2 digits.
*/
if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff) break; /* Not a hex digit */
ptr++;
c = (c << 4) | cc;
Expand Down
2 changes: 1 addition & 1 deletion src/pcre2_error.c
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ static const unsigned char compile_error_texts[] =
"using UCP is disabled by the application\0"
"name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
"character code point value in \\u.... sequence is too large\0"
"digits missing in \\x{} or \\o{} or \\N{U+}\0"
"digits missing after \\x or in \\x{} or \\o{} or \\N{U+}\0"
"syntax error or number too big in (?(VERSION condition\0"
/* 80 */
"internal error: unknown opcode in auto_possessify()\0"
Expand Down
12 changes: 9 additions & 3 deletions testdata/testinput1
Original file line number Diff line number Diff line change
Expand Up @@ -5610,9 +5610,15 @@ name)/mark
/^ (?:(?<A>A)|(?'B'B)(?<A>A)) (?('A')x) (?(<B>)y)$/x,dupnames
Ax
BAxy

/^A\xZ/
A\0Z

/^A\xBz/
A\x{0B}z

/^A\xABz/
A\x{AB}z

/^A\xABCz/
A\x{AB}Cz

/^A\o{123}B/
A\123B
Expand Down
4 changes: 4 additions & 0 deletions testdata/testinput2
Original file line number Diff line number Diff line change
Expand Up @@ -3983,6 +3983,10 @@

/\xthing/

/^A\xZ/

/^A\x/

/\x{}/

/\x{whatever}/
Expand Down
16 changes: 12 additions & 4 deletions testdata/testoutput1
Original file line number Diff line number Diff line change
Expand Up @@ -8919,10 +8919,18 @@ No match
1: <unset>
2: B
3: A

/^A\xZ/
A\0Z
0: A\x00Z

/^A\xBz/
A\x{0B}z
0: A\x0bz

/^A\xABz/
A\x{AB}z
0: A\xabz

/^A\xABCz/
A\x{AB}Cz
0: A\xabCz

/^A\o{123}B/
A\123B
Expand Down
13 changes: 10 additions & 3 deletions testdata/testoutput2
Original file line number Diff line number Diff line change
Expand Up @@ -13239,7 +13239,7 @@ Failed: error 167 at offset 5: non-hex character in \x{} (closing brace missing?
Failed: error 167 at offset 7: non-hex character in \x{} (closing brace missing?)

/^A\x{/
Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 5: digits missing after \x or in \x{} or \o{} or \N{U+}

/[ab]++/B,no_auto_possess
------------------------------------------------------------------
Expand Down Expand Up @@ -13451,15 +13451,22 @@ Failed: error 133 at offset 7: parentheses are too deeply nested (stack check)
Failed: error 155 at offset 2: missing opening brace after \o

/\o{}/
Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 3: digits missing after \x or in \x{} or \o{} or \N{U+}

/\o{whatever}/
Failed: error 164 at offset 3: non-octal character in \o{} (closing brace missing?)

/\xthing/
Failed: error 178 at offset 2: digits missing after \x or in \x{} or \o{} or \N{U+}

/^A\xZ/
Failed: error 178 at offset 4: digits missing after \x or in \x{} or \o{} or \N{U+}

/^A\x/
Failed: error 178 at offset 4: digits missing after \x or in \x{} or \o{} or \N{U+}

/\x{}/
Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 3: digits missing after \x or in \x{} or \o{} or \N{U+}

/\x{whatever}/
Failed: error 167 at offset 3: non-hex character in \x{} (closing brace missing?)
Expand Down
2 changes: 1 addition & 1 deletion testdata/testoutput5
Original file line number Diff line number Diff line change
Expand Up @@ -4702,7 +4702,7 @@ Callout 0: last capture = 1
Failed: error 193 at offset 2: \N{U+dddd} is supported only in Unicode (UTF) mode

/\N{U+}/utf
Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+}
Failed: error 178 at offset 5: digits missing after \x or in \x{} or \o{} or \N{U+}

/\N{U}/
Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u
Expand Down

0 comments on commit 4d2bb33

Please sign in to comment.