parse expressions outside of functions #20

the-mikedavis · 2022-02-28T03:00:10Z

This PR is actually more of a question than a solution 😅

Parsing expressions at the top level (directly under `source_file`, outside of the functions where they would normally be parsed) is advantageous when injecting Gleam into another document, as you might do with markdown. This is pertinent to #14 because, as far as I know, GitHub uses the tree-sitter parser for markdown's fenced code blocks. For example, in #19 I have this code block:

```gleam
<<code:int-size(8)-unit(2), reason:utf8>>
```

This is not a valid Gleam program, but it is a valid Gleam "snippet," so to speak.

What would you think about allowing expressions at the top level like this?

(I haven't really looked at the changed import tests yet. It looks like some ERROR nodes made their way into the tests, and this PR ends up changing the error behavior.)

J3RN · 2022-03-04T00:41:30Z

Sorry for my delayed response!

Generally speaking, I'd like to follow the example of the first-party tree-sitter grammars, as they're probably the best "documentation" for tree-sitter grammars. Of those, Ruby, JavaScript, and Elixir all actually allow top-level expressions. I think I'll have to look at Rust and/or Go to see A) whether top-level expressions are allowed in the language itself (I don't think so?) and B) whether the tree-sitter grammar allows them. This is on my to-do list 😅

> looks like some error nodes made their way into the tests and this PR ends up changing the error behavior.

At some point I thought it was a good idea to test the error cases to ensure that the errors made sense or were helpful. Unfortunately, tree-sitter gives us very little control over error recovery, so it's probably a bad idea after all. All this to say, those tests should be fine to remove 👍

the-mikedavis · 2022-03-12T01:02:59Z

I did some looking around, and it looks like:

- tree-sitter-go allows them via `_statement -> _simple_statement -> _expression`
- tree-sitter-rust allows them via `expression_statement -> _expression`

For Rust, it looks like tree-sitter-rust is permissive about top-level expressions where the Rust language itself is not. For example, this program:

```rust
let x = 2 + 3;
fn main() {
    println!("Hello, world!");
}
```

gives a compile error:

```
error: expected item, found keyword `let`
 --> src/main.rs:1:1
  |
1 | let x = 2 + 3;
  | ^^^ expected item

error: could not compile `foo` due to previous error
```

but it is parsed successfully by tree-sitter-rust:

```
(source_file [0, 0] - [4, 0]
  (let_declaration [0, 0] - [0, 14]
    pattern: (identifier [0, 4] - [0, 5])
    value: (binary_expression [0, 8] - [0, 13]
      left: (integer_literal [0, 8] - [0, 9])
      right: (integer_literal [0, 12] - [0, 13])))
  (function_item [1, 0] - [3, 1]
    name: (identifier [1, 3] - [1, 7])
    parameters: (parameters [1, 7] - [1, 9])
    body: (block [1, 10] - [3, 1]
      (expression_statement [2, 4] - [2, 30]
        (macro_invocation [2, 4] - [2, 29]
          macro: (identifier [2, 4] - [2, 11])
          (token_tree [2, 12] - [2, 29]
            (string_literal [2, 13] - [2, 28])))))))
```

I don't have a Go toolchain set up, though, so I'm not sure about that one 😅

This change allows the parser to return valid nodes for expressions at the "top level" of a document, where "top level" means "not within a function." Such code is actually invalid Gleam: for example, you cannot write a `case` expression outside of a function body. It is desirable for the tree-sitter parser, though, because the parser will end up being used in flexible situations, such as one-off highlights in fenced markdown blocks, e.g.:

```gleam
<<code:int-size(8)-unit(2), reason:utf8>>
```

This is a common usage in an editor or on GitHub.

J3RN

This looks good to me! 🎉 I'm leaving this open for the moment in case you just want to remove those error tests which are likely not useful 😁

J3RN · 2022-03-24T19:26:49Z

test/corpus/imports.txt

```diff
+    module: (module)
+    (ERROR)
+    imports: (unqualified_imports
+      (unqualified_import
+        name: (identifier))
+      (unqualified_import
+        name: (identifier)))))
```

We should probably just remove this test 😁

J3RN · 2022-03-24T19:27:08Z

test/corpus/imports.txt

```diff
  (ERROR)
+  (record
+    name: (type_identifier))
```

This one also 😅

the-mikedavis · 2022-03-24T20:10:25Z

I wish there were a way to test syntax errors with more stability. I know tree-sitter-nix recently removed test cases like these because of the churn they caused when updating tree-sitter-cli versions. Such tests are nice not only for showing that bad syntax produces an ERROR node, but also for testing how the parser behaves with an incomplete document (IIRC that's what tree-sitter-nix was using them for).

the-mikedavis marked this pull request as ready for review March 12, 2022 01:31

J3RN approved these changes Mar 24, 2022

remove test cases for invalid syntax

05ea2a7

J3RN merged commit 5b9171b into gleam-lang:main Mar 24, 2022

the-mikedavis deleted the md-top-level-expressions branch March 24, 2022 20:13

the-mikedavis mentioned this pull request Apr 2, 2023

Reclassify let/use as statements #52

Merged
