Implement Reclass reference parsing #2

simu · 2023-08-29T12:30:18Z

The reference parser is implemented with the nom library which allows easily writing parsers through combinator functions. The parser has been implemented by replicating the grammar supported by the Python parser (cf. get_ref_parser). Notably, the combinator functions are named roughly the same as their corresponding pyparsing functions in the Python implementation.

Note that the Rust implementation doesn't yet support inventory queries (default syntax $[...]) or custom start and end markers for parameter references. However, the Reclass reference ${...} parsing should be compatible with the Python parser including the extension which allows nested references and escaping references with \.

The parser implementation comes with a large amount of unit tests for edge and corner cases which can appear when working with references.

Checklist

The PR has a meaningful title. The title will be used to auto generate the changelog
PR contains a single logical change (to build a better changelog).
Update tests.
Categorize the PR by setting a good title and adding one of the labels:
bug, enhancement, documentation, change, breaking, dependency, internal
as they show up in the changelog

The reference parser is implemented with the [nom] library which allows easily writing parsers through combinator functions. The parser has been implemented by replicating the grammar supported by the Python parser (cf. [`get_ref_parser`]). Notably, the combinator functions are named roughly the same as their corresponding `pyparsing` functions in the Python implementation. Note that the Rust implementation doesn't yet support inventory queries (default syntax `$[...]`). However, the Reclass reference `${...}` parsing should be compatible with the Python parser including the extension which allows nested references. The parser implementation comes with a large amount of unit tests for edge and corner cases which can appear when working with references. [nom]: https://docs.rs/nom [`get_ref_parser`]: https://github.com/kapicorp/reclass/blob/856b34cb77811d665c6346883238d436ac5c4924/reclass/values/parser_funcs.py#L103-L168

Provide pretty-printed form which uses nom's `convert_error` as `fmt::Display` for our custom type. This should allow us to provide reasonable errors in Python by converting the pretty-printed form into a Python exception. Note that `convert_error()` doesn't produce perfect error descriptions, but it will highlight the position in the reference which caused the error. Having the structure introduced here will also allow us to improve the error format down the line without having to change clients of the reference parser.

simu · 2023-08-29T16:21:45Z

Feel free to ping me in chat if you have questions regarding Rust or the nom library

We implement trait `fmt::Display` for the parsed Reclass reference Token type to recreate the originally parsed String including escape sequences. This replaces the ambiguously named `as_string()` method on `Token` which was only used in the parser's `coalesce_literals()` to extract the Enum value for merging adjacent Literal tokens. We now directly extract the Enum value in `coalesce_literals()`, to avoid providing a misleading function `as_string()` which produces incorrect string representations of `Token`.

bastjan

I did not understand how the escapes worked so i prodded around a bit and found two test cases that should work if I understand correctly, but they don't:

    #[test]
    fn test_double_escape_then_escape() {
        let refstr = r#"\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok(("", Token::literal_from_str(r"\${foo}")))
        )
    }


    #[test]
    fn test_parse_double_double_escaped_ref() {
        let refstr = r#"\\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok((
                "",
                Token::Combined(vec![
                    Token::literal_from_str("\\\\"),
                    Token::Ref(vec![Token::literal_from_str("foo")])
                ])
            ))
        )
    }

failures:

---- refs::parser::test_parser_funcs::test_double_escape_then_escape stdout ----
thread 'refs::parser::test_parser_funcs::test_double_escape_then_escape' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Literal("\\${foo}")))`', src/refs/parser.rs:408:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref stdout ----
thread 'refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`', src/refs/parser.rs:432:9


failures:
    refs::parser::test_parser_funcs::test_double_escape_then_escape
    refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref

src/refs/parser.rs

simu · 2023-08-30T08:38:04Z

I did not understand how the escapes worked so i prodded around a bit and found two test cases that should work if I understand correctly, but they don't:

    #[test]
    fn test_double_escape_then_escape() {
        let refstr = r#"\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok(("", Token::literal_from_str(r"\${foo}")))
        )
    }


    #[test]
    fn test_parse_double_double_escaped_ref() {
        let refstr = r#"\\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok((
                "",
                Token::Combined(vec![
                    Token::literal_from_str("\\\\"),
                    Token::Ref(vec![Token::literal_from_str("foo")])
                ])
            ))
        )
    }

failures:

---- refs::parser::test_parser_funcs::test_double_escape_then_escape stdout ----
thread 'refs::parser::test_parser_funcs::test_double_escape_then_escape' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Literal("\\${foo}")))`', src/refs/parser.rs:408:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref stdout ----
thread 'refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`', src/refs/parser.rs:432:9


failures:
    refs::parser::test_parser_funcs::test_double_escape_then_escape
    refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref

Looks like this currently matches the behavior of the Python Reclass parser:

simon@phoenix:~/work/syn/reclass-rs $ cat inventory/targets/n1.yml 
parameters:
  foo: qux
  bar: \\\${foo}
  baz: \\\\${foo}
simon@phoenix:~/work/syn/reclass-rs $ kapitan inventory -t n1
__reclass__:
  environment: base
  name: n1
  node: n1
  timestamp: Wed Aug 30 10:40:39 2023
  uri: yaml_fs:///home/simon/work/syn/reclass-rs/inventory/targets/n1.yml
applications: []
classes: []
environment: base
exports: {}
parameters:
  _reclass_:
    environment: base
    name:
      full: n1
      parts:
        - n1
      path: n1
      short: n1
  bar: \\qux
  baz: \\\qux
  foo: qux

Looking at the Python implementation, the escape sequences are only taken into account when they precede a reference, cf. https://github.com/kapicorp/reclass/blob/856b34cb77811d665c6346883238d436ac5c4924/reclass/values/parser_funcs.py#L109-L112. From what I understand, other backslashes don't need to be escaped, and will be consumed as single characters.

So the first example \\\${foo} is interpreted as a freestanding \ followed by a double-escaped ref_open by both implementations, and the second example \\\\${foo} is interpreted as two freestanding \ followed by a double-escaped ref_open.

I don't know if we should try to change our implementation to deviate from this behavior, as it might break existing inventories in unexpected ways.

… unreachable

bastjan

Thanks for the explanation. It does make sense to keep compatibility with the python parser.

bastjan · 2023-08-30T08:56:22Z

Thanks for the explanation. It does make sense to keep compatibility with the python parser.

It would be good to have these corner case tests with an explanation for future reference in the code base tho.

simu · 2023-08-30T09:10:35Z

Thanks for the explanation. It does make sense to keep compatibility with the python parser.

It would be good to have these corner case tests with an explanation for future reference in the code base tho.

Added 622235e

Reclass's reference parsing only requires escaping backslashes that should be literals when they precede a reference opening or closing symbol. Other backslashes don't need to be escaped. The parser will try to parse backslashes as single characters first, and will only interpret them as escape characters when they precede a reference opening or closing symbol. We add three test cases which illustrate this behavior: `\\\${foo}` which is parsed as a single `\` followed by a double-escaped reference, `\\\\${foo}` which is parsed as two `\` followed by a double-escaped reference, and `${foo\\\}` which is parsed as a reference to `foo\\`.

simu added the enhancement New feature or request label Aug 29, 2023

simu force-pushed the feat/ref-parser branch from c17243e to f205831 Compare August 29, 2023 12:49

simu added 3 commits August 29, 2023 15:55

Enable more clippy lints

9a6260e

Switch to nom's VerboseError type for all parser combinators

a6eac0a

simu force-pushed the feat/ref-parser branch 2 times, most recently from 0770b89 to 6f7814c Compare August 29, 2023 15:39

simu added 3 commits August 29, 2023 17:51

Move parse_ref() tests to refs/mod.rs

c549d90

Add tests for parsing strings which contain no references

8c48e9f

simu force-pushed the feat/ref-parser branch from 6f7814c to 002cf53 Compare August 29, 2023 15:51

simu marked this pull request as ready for review August 29, 2023 16:21

simu requested a review from a team August 29, 2023 16:21

simu added 4 commits August 29, 2023 18:45

Add docstrings in refs/parser.rs

41de0f8

Add docstrings for Token

b724867

Add docstrings for the top-level reference parsing interface

daaf5ca

simu force-pushed the feat/ref-parser branch from 002cf53 to daaf5ca Compare August 29, 2023 16:46

bastjan reviewed Aug 30, 2023

View reviewed changes

src/refs/parser.rs Outdated Show resolved Hide resolved

Replace panic!() with unreachable!() for branches which should be…

112b22a

… unreachable

bastjan approved these changes Aug 30, 2023

View reviewed changes

simu force-pushed the feat/ref-parser branch from a19155c to 622235e Compare August 30, 2023 09:18

simu merged commit f3b9eac into main Aug 30, 2023

simu deleted the feat/ref-parser branch August 30, 2023 12:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Reclass reference parsing #2

Implement Reclass reference parsing #2

simu commented Aug 29, 2023 •

edited

Loading

simu commented Aug 29, 2023 •

edited

Loading

bastjan left a comment

simu commented Aug 30, 2023 •

edited

Loading

bastjan left a comment

bastjan commented Aug 30, 2023

simu commented Aug 30, 2023 •

edited

Loading

Implement Reclass reference parsing #2

Implement Reclass reference parsing #2

Conversation

simu commented Aug 29, 2023 • edited Loading

Checklist

simu commented Aug 29, 2023 • edited Loading

bastjan left a comment

Choose a reason for hiding this comment

simu commented Aug 30, 2023 • edited Loading

bastjan left a comment

Choose a reason for hiding this comment

bastjan commented Aug 30, 2023

simu commented Aug 30, 2023 • edited Loading

simu commented Aug 29, 2023 •

edited

Loading

simu commented Aug 29, 2023 •

edited

Loading

simu commented Aug 30, 2023 •

edited

Loading

simu commented Aug 30, 2023 •

edited

Loading