Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse expressions outside of functions #20

Merged
merged 2 commits into from
Mar 24, 2022
Merged

parse expressions outside of functions #20

merged 2 commits into from
Mar 24, 2022

Conversation

the-mikedavis
Copy link
Member

This PR is actually more of a question than a solution 😅

Parsing expressions in the top-level (under source_file, outside of functions where they would normally be parsed) is advantageous when injecting Gleam, as you might do with markdown. This is pertinent to #14 because as far as I know, GitHub uses the tree-sitter parser for markdown's fenced code blocks. For example in #19, I have a code block

```gleam
<<code:int-size(8)-unit(2), reason:utf8>>
```

Which is not a valid gleam program but is a valid gleam "snippet" so to speak.

What would you think about allowing expressions in the top-level like this?

(I didn't really look at those changed import tests yet, looks like some error nodes made their way into the tests and this PR ends up changing the error behavior.)

@J3RN
Copy link
Member

J3RN commented Mar 4, 2022

Sorry for my delayed response!

Generally speaking, I'd like to follow the example of the first-party tree-sitter grammars, as they're probably the best "documentation" for tree-sitter grammars. As far as those go, Ruby, JavaScript, and Elixir all actually allow top-level expressions. I think I'll have to look at Rust and/or Go to see if either A) top level expressions are allowed in the language (I don't think so?) and B) if the tree-sitter grammar allows top-level expressions. This is on my to-do list 😅

looks like some error nodes made their way into the tests and this PR ends up changing the error behavior.

At some point I thought it was a good idea to test the error cases to ensure that the errors made sense or were helpful. Unfortunately tree-sitter gives us very little control over this, and it's probably a bad idea after all. All this to say, that test (or tests) should be fine to remove 👍

@the-mikedavis
Copy link
Member Author

the-mikedavis commented Mar 12, 2022

I did some looking around and it looks like

For rust it looks like tree-sitter-rust is permissive about top-level expressions where the rust language is not. E.g.

let x = 2 + 3;
fn main() {
    println!("Hello, world!");
}
gives an error
error: expected item, found keyword `let`
 --> src/main.rs:1:1
  |
1 | let x = 2 + 3;
  | ^^^ expected item

error: could not compile `foo` due to previous error
but is parsed successfully by tree-sitter-rust...
(source_file [0, 0] - [4, 0]
  (let_declaration [0, 0] - [0, 14]
    pattern: (identifier [0, 4] - [0, 5])
    value: (binary_expression [0, 8] - [0, 13]
      left: (integer_literal [0, 8] - [0, 9])
      right: (integer_literal [0, 12] - [0, 13])))
  (function_item [1, 0] - [3, 1]
    name: (identifier [1, 3] - [1, 7])
    parameters: (parameters [1, 7] - [1, 9])
    body: (block [1, 10] - [3, 1]
      (expression_statement [2, 4] - [2, 30]
        (macro_invocation [2, 4] - [2, 29]
          macro: (identifier [2, 4] - [2, 11])
          (token_tree [2, 12] - [2, 29]
            (string_literal [2, 13] - [2, 28])))))))

I don't have a compiler tool-chain setup for go though so I'm not sure about that one 😅

This change allows the parser to return valid nodes for expressions
on the "top-level" of a document.

Here "top-level" is read as "not within a function." This is actually
invalid Gleam code: for example, you cannot write a `case/2` statement
outside of a function body. This is desirable for the tree-sitter
parser, though, because the parser will end up being used in flexible
situations, such as one-off highlights in fenced markdown blocks, e.g.:

    ```gleam
    <<code:int-size(8)-unit(2), reason:utf8>>
    ```

Which is a common usage in an editor, or on GitHub.
@the-mikedavis the-mikedavis marked this pull request as ready for review March 12, 2022 01:31
Copy link
Member

@J3RN J3RN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! 🎉 I'm leaving this open for the moment in case you just want to remove those error tests which are likely not useful 😁

Comment on lines 59 to 65
module: (module)
(ERROR)
imports: (unqualified_imports
(unqualified_import
name: (identifier))
(unqualified_import
name: (identifier)))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just remove this test 😁

Comment on lines 104 to 106
(ERROR)
(record
name: (type_identifier))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one also 😅

@the-mikedavis
Copy link
Member Author

I wish there were a way to test that bad syntax errors with more stability. I know tree-sitter-nix recently removed test cases like that because of the churn when updating tree-sitter-cli versions. It's nice not only to show that bad syntax does make an ERROR node but also it can help test how the parser behaves with an incomplete document (iirc that's what tree-sitter-nix was using it for)

@J3RN J3RN merged commit 5b9171b into gleam-lang:main Mar 24, 2022
@the-mikedavis the-mikedavis deleted the md-top-level-expressions branch March 24, 2022 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants