Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Reclass reference parsing #2

Merged
merged 12 commits into from
Aug 30, 2023
Merged

Implement Reclass reference parsing #2

merged 12 commits into from
Aug 30, 2023

Conversation

simu
Copy link
Member

@simu simu commented Aug 29, 2023

The reference parser is implemented with the nom library which allows easily writing parsers through combinator functions. The parser has been implemented by replicating the grammar supported by the Python parser (cf. get_ref_parser). Notably, the combinator functions are named roughly the same as their corresponding pyparsing functions in the Python implementation.

Note that the Rust implementation doesn't yet support inventory queries (default syntax $[...]) or custom start and end markers for parameter references. However, the Reclass reference ${...} parsing should be compatible with the Python parser including the extension which allows nested references and escaping references with \.

The parser implementation comes with a large amount of unit tests for edge and corner cases which can appear when working with references.

Checklist

  • The PR has a meaningful title. The title will be used to auto generate the changelog
  • PR contains a single logical change (to build a better changelog).
  • Update tests.
  • Categorize the PR by setting a good title and adding one of the labels:
    bug, enhancement, documentation, change, breaking, dependency, internal
    as they show up in the changelog

@simu simu added the enhancement New feature or request label Aug 29, 2023
@simu simu force-pushed the feat/ref-parser branch from c17243e to f205831 Compare August 29, 2023 12:49
simu added 3 commits August 29, 2023 15:55
The reference parser is implemented with the [nom] library which allows
easily writing parsers through combinator functions. The parser has been
implemented by replicating the grammar supported by the Python parser
(cf. [`get_ref_parser`]). Notably, the combinator functions are named
roughly the same as their corresponding `pyparsing` functions in the
Python implementation.

Note that the Rust implementation doesn't yet support inventory queries
(default syntax `$[...]`). However, the Reclass reference `${...}`
parsing should be compatible with the Python parser including the
extension which allows nested references.

The parser implementation comes with a large amount of unit tests for
edge and corner cases which can appear when working with references.

[nom]: https://docs.rs/nom
[`get_ref_parser`]: https://github.com/kapicorp/reclass/blob/856b34cb77811d665c6346883238d436ac5c4924/reclass/values/parser_funcs.py#L103-L168
@simu simu force-pushed the feat/ref-parser branch 2 times, most recently from 0770b89 to 6f7814c Compare August 29, 2023 15:39
simu added 3 commits August 29, 2023 17:51
Provide pretty-printed form which uses nom's `convert_error` as
`fmt::Display` for our custom type. This should allow us to provide
reasonable errors in Python by converting the pretty-printed form into a
Python exception.

Note that `convert_error()` doesn't produce perfect error descriptions,
but it will highlight the position in the reference which caused the
error. Having the structure introduced here will also allow us to
improve the error format down the line without having to change clients
of the reference parser.
@simu simu force-pushed the feat/ref-parser branch from 6f7814c to 002cf53 Compare August 29, 2023 15:51
@simu simu marked this pull request as ready for review August 29, 2023 16:21
@simu simu requested a review from a team August 29, 2023 16:21
@simu
Copy link
Member Author

simu commented Aug 29, 2023

Feel free to ping me in chat if you have questions regarding Rust or the nom library

simu added 4 commits August 29, 2023 18:45
We implement trait `fmt::Display` for the parsed Reclass reference Token
type to recreate the originally parsed String including escape
sequences. This replaces the ambiguously named `as_string()` method on
`Token` which was only used in the parser's `coalesce_literals()` to
extract the Enum value for merging adjacent Literal tokens.

We now directly extract the Enum value in `coalesce_literals()`, to
avoid providing a misleading function `as_string()` which produces
incorrect string representations of `Token`.
@simu simu force-pushed the feat/ref-parser branch from 002cf53 to daaf5ca Compare August 29, 2023 16:46
Copy link

@bastjan bastjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand how the escapes worked so i prodded around a bit and found two test cases that should work if I understand correctly, but they don't:

    #[test]
    fn test_double_escape_then_escape() {
        let refstr = r#"\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok(("", Token::literal_from_str(r"\${foo}")))
        )
    }


    #[test]
    fn test_parse_double_double_escaped_ref() {
        let refstr = r#"\\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok((
                "",
                Token::Combined(vec![
                    Token::literal_from_str("\\\\"),
                    Token::Ref(vec![Token::literal_from_str("foo")])
                ])
            ))
        )
    }
failures:

---- refs::parser::test_parser_funcs::test_double_escape_then_escape stdout ----
thread 'refs::parser::test_parser_funcs::test_double_escape_then_escape' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Literal("\\${foo}")))`', src/refs/parser.rs:408:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref stdout ----
thread 'refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`', src/refs/parser.rs:432:9


failures:
    refs::parser::test_parser_funcs::test_double_escape_then_escape
    refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref

@simu
Copy link
Member Author

simu commented Aug 30, 2023

I did not understand how the escapes worked so i prodded around a bit and found two test cases that should work if I understand correctly, but they don't:

    #[test]
    fn test_double_escape_then_escape() {
        let refstr = r#"\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok(("", Token::literal_from_str(r"\${foo}")))
        )
    }


    #[test]
    fn test_parse_double_double_escaped_ref() {
        let refstr = r#"\\\\${foo}"#.to_string();
        assert_eq!(
            parse_ref(&refstr),
            Ok((
                "",
                Token::Combined(vec![
                    Token::literal_from_str("\\\\"),
                    Token::Ref(vec![Token::literal_from_str("foo")])
                ])
            ))
        )
    }
failures:

---- refs::parser::test_parser_funcs::test_double_escape_then_escape stdout ----
thread 'refs::parser::test_parser_funcs::test_double_escape_then_escape' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Literal("\\${foo}")))`', src/refs/parser.rs:408:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref stdout ----
thread 'refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("", Combined([Literal("\\\\\\"), Ref([Literal("foo")])])))`,
 right: `Ok(("", Combined([Literal("\\\\"), Ref([Literal("foo")])])))`', src/refs/parser.rs:432:9


failures:
    refs::parser::test_parser_funcs::test_double_escape_then_escape
    refs::parser::test_parser_funcs::test_parse_double_double_escaped_ref

Looks like this currently matches the behavior of the Python Reclass parser:

simon@phoenix:~/work/syn/reclass-rs $ cat inventory/targets/n1.yml 
parameters:
  foo: qux
  bar: \\\${foo}
  baz: \\\\${foo}
simon@phoenix:~/work/syn/reclass-rs $ kapitan inventory -t n1
__reclass__:
  environment: base
  name: n1
  node: n1
  timestamp: Wed Aug 30 10:40:39 2023
  uri: yaml_fs:///home/simon/work/syn/reclass-rs/inventory/targets/n1.yml
applications: []
classes: []
environment: base
exports: {}
parameters:
  _reclass_:
    environment: base
    name:
      full: n1
      parts:
        - n1
      path: n1
      short: n1
  bar: \\qux
  baz: \\\qux
  foo: qux

Looking at the Python implementation, the escape sequences are only taken into account when they precede a reference, cf. https://github.com/kapicorp/reclass/blob/856b34cb77811d665c6346883238d436ac5c4924/reclass/values/parser_funcs.py#L109-L112. From what I understand, other backslashes don't need to be escaped, and will be consumed as single characters.

So the first example \\\${foo} is interpreted as a freestanding \ followed by a double-escaped ref_open by both implementations, and the second example \\\\${foo} is interpreted as two freestanding \ followed by a double-escaped ref_open.

I don't know if we should try to change our implementation to deviate from this behavior, as it might break existing inventories in unexpected ways.

Copy link

@bastjan bastjan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. It does make sense to keep compatibility with the python parser.

@bastjan
Copy link

bastjan commented Aug 30, 2023

Thanks for the explanation. It does make sense to keep compatibility with the python parser.

It would be good to have these corner case tests with an explanation for future reference in the code base tho.

@simu
Copy link
Member Author

simu commented Aug 30, 2023

Thanks for the explanation. It does make sense to keep compatibility with the python parser.

It would be good to have these corner case tests with an explanation for future reference in the code base tho.

Added 622235e

Reclass's reference parsing only requires escaping backslashes that
should be literals when they precede a reference opening or closing
symbol. Other backslashes don't need to be escaped. The parser will try
to parse backslashes as single characters first, and will only interpret
them as escape characters when they precede a reference opening or
closing symbol.

We add three test cases which illustrate this behavior: `\\\${foo}`
which is parsed as a single `\` followed by a double-escaped reference,
`\\\\${foo}` which is parsed as two `\` followed by a double-escaped
reference, and `${foo\\\}` which is parsed as a reference to `foo\\`.
@simu simu force-pushed the feat/ref-parser branch from a19155c to 622235e Compare August 30, 2023 09:18
@simu simu merged commit f3b9eac into main Aug 30, 2023
@simu simu deleted the feat/ref-parser branch August 30, 2023 12:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants