Support for the prism compiler #44

kddnewton · 2024-05-20T17:50:29Z

No description provided.

mame · 2024-05-20T19:36:56Z

Nice! That's an elegant implementation using the Prism API.

We need a CI job that runs with prism. Could you add one, please?

kddnewton · 2024-05-20T19:43:57Z

Yes! Done!

mame · 2024-05-20T19:52:54Z

lib/error_highlight/base.rb

+    return unless path
+
+    lineno = loc.lineno
+    column = RubyVM::AbstractSyntaxTree.node_id_for_backtrace_location(loc)


@kddnewton Hey, does the prism-derived iseq store columns in the node_id field? That is a bit hacky. Which column is stored in the field?

I'm a little concerned about the information lost by projecting a node onto (lineno, column). For example, if the column of the call foo.bar.baz is the first column of the receiver (i.e., f), then foo.bar and foo.bar.baz would be indistinguishable.
Do you always use the column of a surface position where the node is bottom-most in the tunnel (somewhere in .baz in this example)?

kddnewton · 2024-05-21

Hey @mame, I have been nervous about this as well for a while, but I think it might be okay. When creating instructions, the current compile.c compiles with (lineno, node_id) for most instructions. The prism compiler compiles with (lineno, column) instead. So in the case of the call nodes that you specified (foo.bar.baz), the columns are 0, 4, and 8 (they match up to the location of the name of the method).

The tunnel method always goes as far as it can down into the nodes, so it's effectively a function tunnel(line, column) => {node}. It's stable across parses, so it will always be consistent. Because we walk backward through the list to find the bottom-most node, it should always give us the correct call.
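The lookup described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: SketchNode, tunnel, and find_call are hypothetical stand-ins rather than Prism's real classes or the code in this PR, and the sketch handles a single line only. It models foo.bar.baz with method-name columns 0, 4, and 8.

```ruby
# Hypothetical, simplified node model (NOT Prism's real node classes).
# start_col/end_col delimit the whole node (end exclusive); message_col is
# the column where the method name starts.
SketchNode = Struct.new(:name, :start_col, :end_col, :message_col, :children) do
  def cover?(column)
    start_col <= column && column < end_col
  end
end

# Collect every node on the path from the root down to the deepest node
# that still contains the given column.
def tunnel(node, column, path = [])
  return path unless node.cover?(column)

  path << node
  child = node.children.find { |c| c.cover?(column) }
  child ? tunnel(child, column, path) : path
end

# Walk the path backward (deepest node first) and pick the call whose
# method name starts at the recorded column.
def find_call(root, column)
  tunnel(root, column).reverse_each.find { |node| node.message_col == column }
end

# foo.bar.baz: every call starts at column 0, but the names start at 0, 4, 8.
foo = SketchNode.new("foo", 0, 3, 0, [])
bar = SketchNode.new("foo.bar", 0, 7, 4, [foo])
baz = SketchNode.new("foo.bar.baz", 0, 11, 8, [bar])

[0, 4, 8].each do |column|
  puts "column #{column} -> #{find_call(baz, column).name}"
end
```

In this model all three calls share start column 0, so recording the start of the node would be ambiguous, while recording the column of the method name picks out exactly one call.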

I'm open to changing it, as I have definitely been considering the things you're asking for here as well. However, I think it is working as it is and will continue to work as it's currently set up. (The reason I'm reluctant to add a node id is just because I don't want to add another 4 bytes to every node.) What do you think?

mame · 2024-05-21

Hmmm. In a trivial example, Prism.parse("1") results in ProgramNode -> StatementsNode -> IntegerNode, but they all have exactly the same location and cannot be distinguished by (lineno, column).
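A minimal sketch of this collision, using a hypothetical SpanNode struct instead of real Prism nodes (the spans are written out by hand for the source "1"):

```ruby
# Hypothetical stand-in for a parsed node: only its type and source span.
SpanNode = Struct.new(:type, :line, :start_col, :end_col)

# Hand-written spans for the three nodes produced by parsing "1".
span_nodes = [
  SpanNode.new(:ProgramNode, 1, 0, 1),
  SpanNode.new(:StatementsNode, 1, 0, 1),
  SpanNode.new(:IntegerNode, 1, 0, 1)
]

# Group by the (lineno, column) key; any group with more than one entry
# cannot be told apart by that key alone.
ambiguous = span_nodes
  .group_by { |node| [node.line, node.start_col] }
  .select { |_key, group| group.size > 1 }

ambiguous.each do |(line, column), group|
  puts "(#{line}, #{column}) matches #{group.map(&:type).join(', ')}"
end
```

A lookup keyed on (lineno, column) therefore needs an extra rule, such as the node type or "take the deepest node", to choose among the candidates.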

Indeed, I don't think this will be a problem in most cases, including error_highlight. However, why do you avoid node_id?

> The reason I'm reluctant to add a node id is just because I don't want to add another 4 bytes to every node.

You can save node_id instead of column. There should be no difference in performance, since the entire file will still be reparsed.

I know node_id seems not very cool (it was suggested by @ko1 and I didn't like it much at first myself). However, I am now happy with it, as it has worked well so far. If there is no clear problem, I vote not to change it.

kddnewton · 2024-05-21

We don't store column on the node at all; we only have offsets from the beginning of the file. Line/column information is calculated lazily for the users that need it. Our nodes are basically:

- 2 bytes type
- 2 bytes flags
- 8 bytes start pointer
- 8 bytes end pointer
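The lazy line/column calculation mentioned above can be sketched as follows. This is a simplified illustration, not Prism's actual implementation: line_offsets and line_and_column are hypothetical helper names, and columns are counted in bytes.

```ruby
# Byte offset at which each line starts. The first line always starts at 0.
def line_offsets(source)
  offsets = [0]
  source.each_byte.with_index do |byte, index|
    offsets << (index + 1) if byte == 10 # 10 is "\n"
  end
  offsets
end

# Convert a byte offset into a 1-based line and 0-based byte column by
# binary-searching for the last line start that is <= byte_offset.
def line_and_column(offsets, byte_offset)
  next_line = offsets.bsearch_index { |start| start > byte_offset }
  line_index = (next_line || offsets.size) - 1
  [line_index + 1, byte_offset - offsets[line_index]]
end

source = "foo\nbar.baz\n"
offsets = line_offsets(source)
p line_and_column(offsets, 8)
```

Because only offsets are stored, the table of line starts is built once per file and each conversion is a binary search.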

I've also got a branch where we're experimenting with tagged pointers so that it's:

- 2 bytes type
- 2 bytes flags
- 2 bytes start offset
- 2 bytes length

This is really nice because we can pack a node into a single pointer-sized word without having to allocate anything. That is why I'm pushing back on this: with a 32-bit node ID there's no way to do tagged pointers, so we would always have to allocate.
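The size argument can be checked with a little arithmetic. The sketch below is an assumption-laden illustration, not code from the branch mentioned above: pack_node and unpack_node are hypothetical helpers, the field widths are the ones listed in this comment, and the tag bits a real tagged pointer would need are ignored.

```ruby
# Pack four 16-bit fields (type, flags, start offset, length) into one
# 64-bit integer, most significant field first.
def pack_node(type, flags, start_offset, length)
  fields = [type, flags, start_offset, length]
  unless fields.all? { |field| field.between?(0, 0xFFFF) }
    raise ArgumentError, "each field must fit in 16 bits"
  end

  fields.reduce(0) { |word, field| (word << 16) | field }
end

# Recover the four fields from the packed word.
def unpack_node(word)
  3.downto(0).map { |index| (word >> (index * 16)) & 0xFFFF }
end

current_layout_bytes = 2 + 2 + 8 + 8          # type, flags, two pointers
packed_layout_bytes = 2 + 2 + 2 + 2           # type, flags, offset, length
with_node_id_bytes = packed_layout_bytes + 4  # plus a 32-bit node ID

word = pack_node(7, 1, 300, 11)
puts "current: #{current_layout_bytes} bytes, packed: #{packed_layout_bytes} bytes"
puts "packed with node ID: #{with_node_id_bytes} bytes"
p unpack_node(word)
```

With these widths the packed layout is exactly 8 bytes, while adding a 4-byte node ID brings it to 12 bytes, which no longer fits in a 64-bit word.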

mame · 2024-05-21

Hmmm, can you elaborate on that idea a bit more? Or do you have documentation or suggestions? I am wondering how attributes and child elements would be represented.

If we want to reduce the size that much, is there any way to use a reproducible node identification method, such as post-order traversal? Then there would be no need to store the node_id.
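One way to read this suggestion is sketched below. It is only an illustration: TreeNode, post_order_ids, build_tree, and ids_by_type are hypothetical names, and the tree is built by hand to mirror the Prism.parse("1") example rather than produced by a real parser.

```ruby
# Hypothetical minimal tree node: a type and a list of children.
TreeNode = Struct.new(:type, :children)

# Assign IDs in post-order: children first, then the node itself. The IDs
# depend only on the tree shape, so nothing has to be stored on the nodes.
def post_order_ids(node, ids = {}.compare_by_identity)
  node.children.each { |child| post_order_ids(child, ids) }
  ids[node] = ids.size
  ids
end

# Stand-in for parsing "1": ProgramNode -> StatementsNode -> IntegerNode.
def build_tree
  integer = TreeNode.new(:IntegerNode, [])
  statements = TreeNode.new(:StatementsNode, [integer])
  TreeNode.new(:ProgramNode, [statements])
end

def ids_by_type(root)
  post_order_ids(root).map { |node, id| [node.type, id] }.to_h
end

first_parse = ids_by_type(build_tree)
second_parse = ids_by_type(build_tree)
p first_parse
```

Two independent builds of the same tree receive identical IDs, and the three nodes that share one location each get a distinct ID. The cost is a full traversal to map an ID back to a node.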

mame · 2024-06-07

Record: I talked with @kddnewton and @ko1, and we decided to go with this for now.

Support for the prism compiler

c1279a8

kddnewton requested a review from mame May 20, 2024 17:50

Add prism GitHub CI workflow

85a9aa0

mame reviewed May 20, 2024


mame approved these changes Jun 7, 2024


mame merged commit 24f4b44 into master Jun 7, 2024
6 checks passed

mame deleted the prism branch June 7, 2024 13:15
