Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the low-level SMT parser #35

Open
AdrienChampion opened this issue Aug 15, 2022 · 4 comments
Open

Rewrite the low-level SMT parser #35

AdrienChampion opened this issue Aug 15, 2022 · 4 comments

Comments

@AdrienChampion
Copy link
Contributor

SmtParser uses a handmade parser that is showing its limits more and more (#33)

Would love to have something cleaner, though it would need to be as efficient as the current parser. Needs investigation (peg, nom, ?) and some benchmarking.

@filippodebortoli
Copy link

A PEG for SMT-LIB 2.6 (syntax up to Section 3.5 of the standard document) in pest could look like this:

WHITESPACE = _{ "\t" | "\n" | "\r" | " " }
COMMENT = _{ ";" ~ ANY* ~ NEWLINE }

printable_char = @{ '\x20'..'\x7e' | '\x80'..'\x8f' }
numeral = @{ "0" | ('1'..'9') ~ ASCII_DIGIT* }
decimal = @{ numeral ~ "." ~ "0"* ~ numeral }
hexadecimal = @{ "#x" ~ ASCII_HEX_DIGIT+ }
binary = @{ "#b" ~ ('0'..'1')+}
escaped_double_quote = _{ "\"\"" }
string = @{ "\"" ~ (printable_char | WHITESPACE | escaped_double_quote)* "\"" }
reserved_word = @{ "BINARY" | "DECIMAL" | "HEXADECIMAL" | "NUMERAL" | "STRING" | "_" | "!" | "as" | "let" | "exists" | "forall" | "match" | "par" }
char_list = { "~" | "!" | "@" | "$" | "%" | "^" | "&" | "*" | "_" | "-" | "+" | "=" | "<" | ">" | "." | "?" | "/"}
sym_simple = @{ (ASCII_ALPHA | char_list) ~ (ASCII_ALPHANUMERIC | char_list)* }
sym_quoted = @{ "|" ~ (!("\\" | "|") ~ (WHITESPACE | printable_char))* ~ "|"}
symbol = { sym_simple | sym_quoted }
keyword = {":" ~ sym_simple }
spec_constant = _{ numeral | decimal | hexadecimal | binary | string }
s_expr = @{ spec_constant | symbol | reserved_word | keyword | "(" ~ s_expr* ~ ")" }
index = { numeral | symbol }
identifier = { symbol | "(_ " ~ symbol ~ " " ~ index+ ")" }
attribute_value = @{ spec_constant | symbol | "(" ~ s_expr* ~ ")" }
attribute = @{ keyword | keyword ~ " " ~ attribute_value}
sort = @{ identifier | "(" ~ identifier ~ " " ~ sort+ ")" }

The next steps would be to encode syntax for: Terms and formulas, theory declarations, logic declarations, scripts.
While writing the grammar file could be somewhat easy, I'd expect the task of rewriting the parsing functions to be cumbersome.

@AdrienChampion
Copy link
Contributor Author

@filippodebortoli are you confident that peg would be as efficient as a handmade parser?

Generally one ignores such concerns because printing/parsing time should be (very) small compared to solving time. The issue is that efficient verification tools go to great length to issue checks that are as small as possible, in some cases extremely small.

Usually, this means that the tool issues a lot of small checks as opposed to one/a few big one(s). In this context I think parsing time can be a concern, but that's just my opinion 🐱

@filippodebortoli
Copy link

@filippodebortoli are you confident that peg would be as efficient as a handmade parser?

I am not confident enough to claim that it would yield a parser that competes with or outperforms the current, handmade one. I have the feeling that it would simplify a lot the parser's implementation and make it easier to maintain.
There is one clear drawback: the parser provided by pest only accepts str as input, which is far more restrictive than what SmtParser currently allows for.

@AdrienChampion
Copy link
Contributor Author

I wrote this a looonnng time ago, I need to take a look but right now I lean more towards keeping the handwritten parser, especially since it's quite small. I don't remember what the parser's code looks like but I expect that its maintainability can be improved greatly :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants