-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Reclass reference parsing #2
Conversation
The reference parser is implemented with the [nom] library which allows easily writing parsers through combinator functions. The parser has been implemented by replicating the grammar supported by the Python parser (cf. [`get_ref_parser`]). Notably, the combinator functions are named roughly the same as their corresponding `pyparsing` functions in the Python implementation. Note that the Rust implementation doesn't yet support inventory queries (default syntax `$[...]`). However, the Reclass reference `${...}` parsing should be compatible with the Python parser including the extension which allows nested references. The parser implementation comes with a large amount of unit tests for edge and corner cases which can appear when working with references. [nom]: https://docs.rs/nom [`get_ref_parser`]: https://github.com/kapicorp/reclass/blob/856b34cb77811d665c6346883238d436ac5c4924/reclass/values/parser_funcs.py#L103-L168
0770b89
to
6f7814c
Compare
Provide pretty-printed form which uses nom's `convert_error` as `fmt::Display` for our custom type. This should allow us to provide reasonable errors in Python by converting the pretty-printed form into a Python exception. Note that `convert_error()` doesn't produce perfect error descriptions, but it will highlight the position in the reference which caused the error. Having the structure introduced here will also allow us to improve the error format down the line without having to change clients of the reference parser.
Feel free to ping me in chat if you have questions regarding Rust or the |
We implement trait `fmt::Display` for the parsed Reclass reference Token type to recreate the originally parsed String including escape sequences. This replaces the ambiguously named `as_string()` method on `Token` which was only used in the parser's `coalesce_literals()` to extract the Enum value for merging adjacent Literal tokens. We now directly extract the Enum value in `coalesce_literals()`, to avoid providing a misleading function `as_string()` which produces incorrect string representations of `Token`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not understand how the escapes worked so i prodded around a bit and found two test cases that should work if I understand correctly, but they don't:
#[test]
fn test_double_escape_then_escape() {
let refstr = r#"\\\${foo}"#.to_string();
assert_eq!(
parse_ref(&refstr),
Ok(("", Token::literal_from_str(r"\${foo}")))
)
}
#[test]
fn test_parse_double_double_escaped_ref() {
let refstr = r#"\\\\${foo}"#.to_string();
assert_eq!(
parse_ref(&refstr),
Ok((
"",
Token::Combined(vec![
Token::literal_from_str("\\\\"),
Token::Ref(vec![Token::literal_from_str("foo")])
])
))
)
}
failures:
---- refs::parser::test_parser_funcs::test_double_escape_then_escape stdout ----
thread 'refs::parser::test_parser_funcs::test_double_escape_then_escape' panicked at 'assertion failed: `(left == right)`
left: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`,
right: `Ok(("", Literal("\\${foo}")))`', src/refs/parser.rs:408:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref stdout ----
thread 'refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref' panicked at 'assertion failed: `(left == right)`
left: `Ok(("", Combined([Literal("\\\\\\"), Ref([Literal("foo")])])))`,
right: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`', src/refs/parser.rs:432:9
failures:
refs::parser::test_parser_funcs::test_double_escape_then_escape
refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref
Looks like this currently matches the behavior of the Python Reclass parser:
Looking at the Python implementation, the escape sequences are only taken into account when they precede a reference, cf. https://github.com/kapicorp/reclass/blob/856b34cb77811d665c6346883238d436ac5c4924/reclass/values/parser_funcs.py#L109-L112. From what I understand, other backslashes don't need to be escaped, and will be consumed as single characters. So the first example I don't know if we should try to change our implementation to deviate from this behavior, as it might break existing inventories in unexpected ways. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. It does make sense to keep compatibility with the python parser.
It would be good to have these corner case tests with an explanation for future reference in the code base tho. |
Added 622235e |
Reclass's reference parsing only requires escaping backslashes that should be literals when they precede a reference opening or closing symbol. Other backslashes don't need to be escaped. The parser will try to parse backslashes as single characters first, and will only interpret them as escape characters when they precede a reference opening or closing symbol. We add three test cases which illustrate this behavior: `\\\${foo}` which is parsed as a single `\` followed by a double-escaped reference, `\\\\${foo}` which is parsed as two `\` followed by a double-escaped reference, and `${foo\\\}` which is parsed as a reference to `foo\\`.
The reference parser is implemented with the nom library which allows easily writing parsers through combinator functions. The parser has been implemented by replicating the grammar supported by the Python parser (cf.
get_ref_parser
). Notably, the combinator functions are named roughly the same as their correspondingpyparsing
functions in the Python implementation.Note that the Rust implementation doesn't yet support inventory queries (default syntax
$[...]
) or custom start and end markers for parameter references. However, the Reclass reference${...}
parsing should be compatible with the Python parser including the extension which allows nested references and escaping references with\
.The parser implementation comes with a large amount of unit tests for edge and corner cases which can appear when working with references.
Checklist
bug
,enhancement
,documentation
,change
,breaking
,dependency
,internal
as they show up in the changelog