-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for the prism compiler #44
Conversation
Nice! That's an elegant implementation using Prism API. We need a CI with prism. Could you add it please? |
Yes! Done! |
return unless path | ||
|
||
lineno = loc.lineno | ||
column = RubyVM::AbstractSyntaxTree.node_id_for_backtrace_location(loc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kddnewton Hey, does the prism-derived iseq have columns in the node_id field? It is a bit hacky. What column is in the field?
I'm a little concerned the information loss of projecting node to lineno and column. If the column of the call foo.bar.baz
for example is the first column of the receiver (i.e., f
), then foo.bar
and foo.bar.baz
would be indistinguishable.
Do you always use the column of a surface position where the node is bottom-most in the tunnel? (somewhere in .baz
in this example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mame — I have been nervous about this as well for a while, but I think it might be okay. When we're creating the instructions in the compiler, the current compile.c
compiles with (lineno, node_id)
for most instructions. We compile with (lineno, column)
. So in the case of the call nodes that you specified (foo.bar.baz
), the columns are 0, 4, and 8 (they match up to the location of the name of the method).
The tunnel method always goes as far is it can down into the nodes, so it's effectively a function of tunnel(line, column) => {node}
. It's stable across parses, so it will always be consistent. Because we walk backward through the list to find the bottom-most node, it should always give us the correct call.
I'm open to changing it, as I have definitely been considering the things you're asking for here as well. However, I think it is working as it is and will continue to work as it's currently set up. (The reason I'm reluctant to add a node id is just because I don't want to add another 4 bytes to every node.) What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. In a trivial example, Prism.parse("1")
results in PrismNode -> StatementsNode -> IntegerNode, but they all have the completely same location and cannot be distinguished by (lineno, column).
Indeed, I don't think this will be a problem in most cases, including error_highlight. However, why do you avoid node_id?
The reason I'm reluctant to add a node id is just because I don't want to add another 4 bytes to every node.
You can save node_id instead of column. There should be no difference in performance, since the entire file will still be reparsed.
I know node_id seems not very cool (it was suggested by @ko1 and I didn't like it much at first myself). However, I am now happy with it, as it has worked well so far. If there is no clear problem, I vote not to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't store column on the node at all, we only have offsets from the beginning of the file. line/column information is calculated lazily for the users that need it. Our nodes are basically:
- 2 bytes type
- 2 bytes flags
- 8 bytes start pointer
- 8 bytes end pointer
I've also got a branch where we're experimenting with tagged pointers so that it's:
- 2 bytes type
- 2 bytes flags
- 2 bytes start offset
- 2 bytes length
which is really nice because we can pack the nodes into a single pointer without having to allocate anything. So this is why I'm pushing back on this because if we have a 32-bit node ID, there's no way to do tagged pointers, we'll always have to allocate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, can you elaborate on that idea a bit more? Or do you have documentation or suggestions? I am wondering how attributes and child elements would be represented.
If we want to reduce the size that much, is there any way to use a reproducible node identification method, such as post order traversal? There is no need to save the node_id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Record: I talked with @kddnewton and @ko1, and we decided to go with this once.
No description provided.