-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use regress
to implement JS regex usage for pattern
and patternProperties
+ use unicode mode regexes by default
#511
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Docs cover the expected usage, which is not implemented in this initial change. The rename and alias is all that's covered here.
When an alternate RegexImplementation is selected, it will be passed down to the code which builds a Validator. The Validator is extended with the keyword validator provided by the RegexImplementation. Because this uses the `extend()` interface, a test which subclassed a validator broke -- this is documented in `jsonschema` as unsupported usage, so the test simply had to be updated to use supported interfaces.
Change RegexImplementation to dispatch between variants at init time, so that each call to the relevant methods is faster and does not re-check the variant. Minor tool config fixes: - flake8 ignore vs extend-ignore (whoops) - tox.ini depends needed to run in parallel
This allows the `check_schema` call to succeed when it requires that the regex implementation is regress. Patterns which can compile as regress patterns are thereby supported. Add initial acceptance tests for regress 'pattern' support, which revealed this issue.
This is a variant of the regress engine usage in which the `u` flag is not set. It is needed for usages which rely on unicode escapes *not* being interpreted. In particular, the azure pipelines schema won't pass metaschema validation (let alone apply correctly) if `pattern` interpretation uses unicode-mode regexes.
Primarily, make the regress forms inherit and share most of their code.
sirosen
force-pushed
the
use-regress-for-patterns
branch
from
January 1, 2025 19:53
bac71eb
to
e9aa64e
Compare
Rather than relying directly on `validator_cls.check_schema`, reimplement this functionality in-tree here to allow for customizations to the validator class before it is run on the user's schema. A new test uses an invalid regex under the regress unicode engine which is valid in the python engine to ensure that consistent checking is applied. Manual testing revealed that the `_fail()` message production for SchemaError was showing error information twice, which is fixed without a new test to guarantee the new behavior.
sirosen
force-pushed
the
use-regress-for-patterns
branch
from
January 8, 2025 03:24
62aedbd
to
cc12d98
Compare
In the implementation prior to this change, passing `--disable-formats` would impact not only the "actual" schema validator, but also the validator built to evaluate the schema against its metaschema. As a result, `--disable-formats *` and similar would enable schemas to run which previously should have been caught as invalid. Furthermore, the customized format checker which had extensions for date and time evaluation added was used, and any other customizations to format checking would implicitly be shared with the metaschema check. To resolve, refactor format checker building to allow it to be used more directly for the metaschema check, and add test cases to confirm that a bad regex in a `pattern` is always rejected, even when `--disable-formats regex` or similar is used.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changelog
Regular expression interpretation in
"pattern"
,"patternProperties"
, and"format": "regex"
usages now uses unicode-mode JS regular expressions by default. (resolves Support ECMAScript unicode-mode RegExp usage for 'pattern' and 'patternProperties' #353)--regex-variant nonunicode
to get non-unicode JS regular expressions, the default behavior from previous versions.jsonschema
library'sextend()
API to control thepattern
andpatternProperties
keywords.Notes
Right now, this is still usingValidator.check_schema
to do metaschema checks, which means the regex engine for metaschema checking is always python.When format checking is disabled, we currently fall-back to using the python-based format checker. This is a subtle flaw, in that it would be better to use the desired regex engine even when format checking is disabled.This issue is addressed as well.