Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for the prism compiler #44

Merged
merged 2 commits into from
Jun 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions .github/workflows/ruby.yml
Original file line number Diff line number Diff line change
Expand Up @@ -32,3 +32,15 @@ jobs:
- name: Run the test suite
run: |
RUBYOPT=--disable-error_highlight bundle exec rake TESTOPT=-v

prism:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: ruby/setup-ruby@v1
with:
ruby-version: head
bundler-cache: true
- name: Run the test suite
run: |
RUBYOPT="--disable-error_highlight --parser=prism" bundle exec rake TESTOPT=-v
307 changes: 304 additions & 3 deletions lib/error_highlight/base.rb
Original file line number Diff line number Diff line change
Expand Up @@ -60,14 +60,14 @@ def self.spot(obj, **opts)
rescue RuntimeError => error
# RubyVM::AbstractSyntaxTree.of raises an error with a message that
# includes "prism" when the ISEQ was compiled with the prism compiler.
# In this case, we'll set the node to `nil`. In the future, we will
# reparse with the prism parser and pass the parsed node to Spotter.
# In this case, we'll try to parse again with prism instead.
raise unless error.message.include?("prism")
prism_find(loc, **opts)
end

Spotter.new(node, **opts).spot

when RubyVM::AbstractSyntaxTree::Node
when RubyVM::AbstractSyntaxTree::Node, Prism::Node
Spotter.new(obj, **opts).spot

else
Expand All @@ -81,6 +81,71 @@ def self.spot(obj, **opts)
return nil
end

# Accepts a Thread::Backtrace::Location object and returns a Prism::Node
# corresponding to the location in the source code.
def self.prism_find(loc, point_type: :name, name: nil)
require "prism"
return nil if Prism::VERSION < "0.29.0"

path = loc.absolute_path
return unless path

lineno = loc.lineno
column = RubyVM::AbstractSyntaxTree.node_id_for_backtrace_location(loc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kddnewton Hey, does the prism-derived iseq have columns in the node_id field? It is a bit hacky. What column is in the field?

I'm a little concerned the information loss of projecting node to lineno and column. If the column of the call foo.bar.baz for example is the first column of the receiver (i.e., f), then foo.bar and foo.bar.baz would be indistinguishable.
Do you always use the column of a surface position where the node is bottom-most in the tunnel? (somewhere in .baz in this example)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mame — I have been nervous about this as well for a while, but I think it might be okay. When we're creating the instructions in the compiler, the current compile.c compiles with (lineno, node_id) for most instructions. We compile with (lineno, column). So in the case of the call nodes that you specified (foo.bar.baz), the columns are 0, 4, and 8 (they match up to the location of the name of the method).

The tunnel method always goes as far is it can down into the nodes, so it's effectively a function of tunnel(line, column) => {node}. It's stable across parses, so it will always be consistent. Because we walk backward through the list to find the bottom-most node, it should always give us the correct call.

I'm open to changing it, as I have definitely been considering the things you're asking for here as well. However, I think it is working as it is and will continue to work as it's currently set up. (The reason I'm reluctant to add a node id is just because I don't want to add another 4 bytes to every node.) What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. In a trivial example, Prism.parse("1") results in PrismNode -> StatementsNode -> IntegerNode, but they all have the completely same location and cannot be distinguished by (lineno, column).

Indeed, I don't think this will be a problem in most cases, including error_highlight. However, why do you avoid node_id?

The reason I'm reluctant to add a node id is just because I don't want to add another 4 bytes to every node.

You can save node_id instead of column. There should be no difference in performance, since the entire file will still be reparsed.

I know node_id seems not very cool (it was suggested by @ko1 and I didn't like it much at first myself). However, I am now happy with it, as it has worked well so far. If there is no clear problem, I vote not to change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't store column on the node at all, we only have offsets from the beginning of the file. line/column information is calculated lazily for the users that need it. Our nodes are basically:

  • 2 bytes type
  • 2 bytes flags
  • 8 bytes start pointer
  • 8 bytes end pointer

I've also got a branch where we're experimenting with tagged pointers so that it's:

  • 2 bytes type
  • 2 bytes flags
  • 2 bytes start offset
  • 2 bytes length

which is really nice because we can pack the nodes into a single pointer without having to allocate anything. So this is why I'm pushing back on this because if we have a 32-bit node ID, there's no way to do tagged pointers, we'll always have to allocate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, can you elaborate on that idea a bit more? Or do you have documentation or suggestions? I am wondering how attributes and child elements would be represented.

If we want to reduce the size that much, is there any way to use a reproducible node identification method, such as post order traversal? There is no need to save the node_id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record: I talked with @kddnewton and @ko1, and we decided to go with this once.

tunnel = Prism.parse_file(path).value.tunnel(lineno, column)

# Prism provides the Prism::Node#tunnel API to find all of the nodes that
# correspond to the given line and column in the source code, with the first
# node in the list being the top-most node and the last node in the list
# being the bottom-most node.
tunnel.each_with_index.reverse_each.find do |part, index|
case part
when Prism::CallNode, Prism::CallOperatorWriteNode, Prism::IndexOperatorWriteNode, Prism::LocalVariableOperatorWriteNode
# If we find any of these nodes, we can stop searching as these are the
# nodes that triggered the exceptions.
break part
when Prism::ConstantReadNode, Prism::ConstantPathNode
if index != 0 && tunnel[index - 1].is_a?(Prism::ConstantPathOperatorWriteNode)
# If we're inside of a constant path operator write node, then this
# constant path may be highlighting a couple of different kinds of
# parts.
if part.name == name
# Explicitly turn off Foo::Bar += 1 where Foo and Bar are on
# different lines because error highlight expects this to not work.
break nil if part.delimiter_loc.end_line != part.name_loc.start_line

# Otherwise, because we have matched the name we can return this
# part.
break part
end

# If we haven't matched the name, it's the operator that we're looking
# for, and we can return the parent node here.
break tunnel[index - 1]
elsif part.name == name
# If we have matched the name of the constant, then we can return this
# inner node as the node that triggered the exception.
break part
else
# If we are at the beginning of the tunnel or we are at the beginning
# of a constant lookup chain, then we will return this node.
break part if index == 0 || !tunnel[index - 1].is_a?(Prism::ConstantPathNode)
end
when Prism::LocalVariableReadNode, Prism::ParenthesesNode
# If we find any of these nodes, we want to continue searching up the
# tree because these nodes cannot trigger the exceptions.
false
else
# If we find a different kind of node that we haven't already handled,
# we don't know how to handle it so we'll stop searching and assume this
# is not an exception we can decorate.
break nil
end
end
end

private_class_method :prism_find

class Spotter
class NonAscii < Exception; end
private_constant :NonAscii
Expand Down Expand Up @@ -205,6 +270,48 @@ def spot

when :OP_CDECL
spot_op_cdecl

when :call_node
case @point_type
when :name
prism_spot_call_for_name
when :args
prism_spot_call_for_args
end

when :local_variable_operator_write_node
case @point_type
when :name
prism_spot_local_variable_operator_write_for_name
when :args
prism_spot_local_variable_operator_write_for_args
end

when :call_operator_write_node
case @point_type
when :name
prism_spot_call_operator_write_for_name
when :args
prism_spot_call_operator_write_for_args
end

when :index_operator_write_node
case @point_type
when :name
prism_spot_index_operator_write_for_name
when :args
prism_spot_index_operator_write_for_args
end

when :constant_read_node
prism_spot_constant_read

when :constant_path_node
prism_spot_constant_path

when :constant_path_operator_write_node
prism_spot_constant_path_operator_write

end

if @snippet && @beg_column && @end_column && @beg_column < @end_column
Expand Down Expand Up @@ -548,6 +655,200 @@ def fetch_line(lineno)
@beg_lineno = @end_lineno = lineno
@snippet = @fetch[lineno]
end

# Take a location from the prism parser and set the necessary instance
# variables.
def prism_location(location)
@beg_lineno = location.start_line
@beg_column = location.start_column
@end_lineno = location.end_line
@end_column = location.end_column
@snippet = @fetch[@beg_lineno, @end_lineno]
end

# Example:
# x.foo
# ^^^^
# x.foo(42)
# ^^^^
# x&.foo
# ^^^^^
# x[42]
# ^^^^
# x.foo = 1
# ^^^^^^
# x[42] = 1
# ^^^^^^
# x + 1
# ^
# +x
# ^
# foo(42)
# ^^^
# foo 42
# ^^^
# foo
# ^^^
def prism_spot_call_for_name
# Explicitly turn off foo.() syntax because error_highlight expects this
# to not work.
return nil if @node.name == :call && @node.message_loc.nil?

location = @node.message_loc || @node.call_operator_loc || @node.location
location = @node.call_operator_loc.join(location) if @node.call_operator_loc&.start_line == location.start_line

# If the method name ends with "=" but the message does not, then this is
# a method call using the "attribute assignment" syntax
# (e.g., foo.bar = 1). In this case we need to go retrieve the = sign and
# add it to the location.
if (name = @node.name).end_with?("=") && [email protected]_with?("=")
location = location.adjoin("=")
end

prism_location(location)

if !name.end_with?("=") && !name.match?(/[[:alpha:]_\[]/)
# If the method name is an operator, then error_highlight only
# highlights the first line.
fetch_line(location.start_line)
end
end

# Example:
# x.foo(42)
# ^^
# x[42]
# ^^
# x.foo = 1
# ^
# x[42] = 1
# ^^^^^^^
# x[] = 1
# ^^^^^
# x + 1
# ^
# foo(42)
# ^^
# foo 42
# ^^
def prism_spot_call_for_args
# Explicitly turn off foo.() syntax because error_highlight expects this
# to not work.
return nil if @node.name == :call && @node.message_loc.nil?

if @node.name == :[]= && @node.opening == "[" && (@node.arguments&.arguments || []).length == 1
prism_location(@node.opening_loc.copy(start_offset: @node.opening_loc.start_offset + 1).join(@node.arguments.location))
else
prism_location(@node.arguments.location)
end
end

# Example:
# x += 1
# ^
def prism_spot_local_variable_operator_write_for_name
prism_location(@node.binary_operator_loc.chop)
end

# Example:
# x += 1
# ^
def prism_spot_local_variable_operator_write_for_args
prism_location(@node.value.location)
end

# Example:
# x.foo += 42
# ^^^ (for foo)
# x.foo += 42
# ^ (for +)
# x.foo += 42
# ^^^^^^^ (for foo=)
def prism_spot_call_operator_write_for_name
if [email protected]_with?(/[[:alpha:]_]/)
prism_location(@node.binary_operator_loc.chop)
else
location = @node.message_loc
if @node.call_operator_loc.start_line == location.start_line
location = @node.call_operator_loc.join(location)
end

location = location.adjoin("=") if @name.end_with?("=")
prism_location(location)
end
end

# Example:
# x.foo += 42
# ^^
def prism_spot_call_operator_write_for_args
prism_location(@node.value.location)
end

# Example:
# x[1] += 42
# ^^^ (for [])
# x[1] += 42
# ^ (for +)
# x[1] += 42
# ^^^^^^ (for []=)
def prism_spot_index_operator_write_for_name
case @name
when :[]
prism_location(@node.opening_loc.join(@node.closing_loc))
when :[]=
prism_location(@node.opening_loc.join(@node.closing_loc).adjoin("="))
else
# Explicitly turn off foo[] += 1 syntax when the operator is not on
# the same line because error_highlight expects this to not work.
return nil if @node.binary_operator_loc.start_line != @node.opening_loc.start_line

prism_location(@node.binary_operator_loc.chop)
end
end

# Example:
# x[1] += 42
# ^^^^^^^^
def prism_spot_index_operator_write_for_args
opening_loc =
if @node.arguments.nil?
@node.opening_loc.copy(start_offset: @node.opening_loc.start_offset + 1)
else
@node.arguments.location
end

prism_location(opening_loc.join(@node.value.location))
end

# Example:
# Foo
# ^^^
def prism_spot_constant_read
prism_location(@node.location)
end

# Example:
# Foo::Bar
# ^^^^^
def prism_spot_constant_path
if @node.parent && @node.parent.location.end_line == @node.location.end_line
fetch_line(@node.parent.location.end_line)
prism_location(@node.delimiter_loc.join(@node.name_loc))
else
fetch_line(@node.location.end_line)
location = @node.name_loc
location = @node.delimiter_loc.join(location) if @node.delimiter_loc.end_line == location.start_line
prism_location(location)
end
end

# Example:
# Foo::Bar += 1
# ^^^^^^^^
def prism_spot_constant_path_operator_write
prism_location(@node.binary_operator_loc.chop)
end
end

private_constant :Spotter
Expand Down
10 changes: 9 additions & 1 deletion test/test_error_highlight.rb
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,13 @@
require "tempfile"

class ErrorHighlightTest < Test::Unit::TestCase
# We can't revisit instruction sequences to find node ids if the prism
# compiler was used instead of the parse.y compiler. In that case, we'll omit
# some tests.
def self.compiling_with_prism?
RubyVM::InstructionSequence.compile("").to_a[4][:parser] == :prism
end

class DummyFormatter
def self.message_for(corrections)
""
Expand Down Expand Up @@ -868,7 +875,7 @@ def test_COLON2_4
end
end

if ErrorHighlight.const_get(:Spotter).const_get(:OPT_GETCONSTANT_PATH)
if ErrorHighlight.const_get(:Spotter).const_get(:OPT_GETCONSTANT_PATH) && !compiling_with_prism?
def test_COLON2_5
# Unfortunately, we cannot identify which `NotDefined` caused the NameError
assert_error_message(NameError, <<~END) do
Expand Down Expand Up @@ -1334,6 +1341,7 @@ def test_spot_with_backtrace_location

def test_spot_with_node
omit unless RubyVM::AbstractSyntaxTree.respond_to?(:node_id_for_backtrace_location)
omit if ErrorHighlightTest.compiling_with_prism?

begin
raise_name_error
Expand Down