Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stop gensym identifiers hijacking routine decl names in templates #23392

Merged
merged 4 commits into from
Apr 9, 2024

Conversation

metagn
Copy link
Collaborator

@metagn metagn commented Mar 11, 2024

fixes #23326

In a routine declaration node in a template, if the routine is marked as gensym, the compiler adds it as a new symbol to a preliminary scope of the template. If it's not marked as gensym, then it searches the preliminary scope of the template for the name of the routine, then when it matches a template parameter or a gensym identifier, the compiler replaces the name node with a symbol node of the found symbol.

This makes sense for the template parameter since it has to be replaced later, but not really for the gensym identifier, as it doesn't allow us to inject a routine with the same name as an identifier previously declared as gensym (the problem in #23326 is when this is in another when branch).

However this is the only channel to reuse a gensym symbol in a declaration, so maybe removing it has side effects. For example if we have:

proc foo(x: int) {.gensym.} = discard
proc foo(x: float) {.gensym.} = discard

it will not behave the same as

proc foo(x: int) {.gensym.} = discard
proc foo(x: float) = discard

behaved previously, which maybe allowed overloading over the gensym'd symbols.

A note to the "undeclared identifier" error message has also been added for a potential error code that implicitly depended on the old behavior might give, namely undeclared identifier: 'abc`gensym123', which happens when in a template an identifier is first declared gensym in code that doesn't compile, then as a routine which injects by default, then the identifier is used.

@metagn
Copy link
Collaborator Author

metagn commented Mar 11, 2024

The breaks are from arraymancer:

https://github.com/mratsim/Arraymancer/blob/7d6d21cbcafda25201f4fd2fb481fb1316704813/src/arraymancer/tensor/private/p_accessors.nim#L217-L220

This is even in this module's documentation:

when getSubType(t) is KnownSupportsCopyMem:
  let data = t.unsafe_raw_offset()
else:
   template data: untyped = t

The compiler makes a new gensym in the let statement, ignores the template symbol, uses the gensym let symbol in later code, but the let statement is not compiled on instantiation, only the template symbol is, and the output code refers to the nonexistant gensym symbol, giving the error message "undeclared identifier data`12345".

It works when declaring the template gensym which is probably also the correct choice, but this is still an annoying break. Maybe we could get a better error message by detecting if an undeclared identifier error is about a gensym and add ", maybe a gensym symbol in a when statement was not compiled", and warn about mixing gensym and injected symbols in documentation, or just about undeclared gensym symbols in general (when false: let x {.gensym.} = ...).

As mentioned in #23326 we could also require an explicit inject for ths new behavior and keep the old behavior for no inject. But this means the compiler distinguishes between implicit and explicit injects which is harder to reason about. Or we "infer" a routine to be gensym if a gensym symbol with its name exists, but this is also weird, though maybe these are the same thing since they would both mirror the current behavior.

@metagn metagn marked this pull request as ready for review April 6, 2024 04:32
@metagn metagn closed this Apr 6, 2024
@metagn metagn reopened this Apr 6, 2024
@Araq Araq merged commit 73b0b0d into nim-lang:devel Apr 9, 2024
32 checks passed
Copy link
Contributor

github-actions bot commented Apr 9, 2024

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 73b0b0d

Hint: mm: orc; opt: speed; options: -d:release
178523 lines; 8.399s; 753.098MiB peakmem

metagn added a commit to metagn/Nim that referenced this pull request Apr 16, 2024
Was introduced to handle a break in nim-lang#23392, according to nim-lang#23503 (comment) should not be needed anymore
ringabout pushed a commit that referenced this pull request Apr 17, 2024
Was introduced to handle a break in #23392, according to
#23503 (comment)
should not be needed anymore
narimiran pushed a commit that referenced this pull request May 21, 2024
…3392)

fixes #23326

In a routine declaration node in a template, if the routine is marked as
`gensym`, the compiler adds it as a new symbol to a preliminary scope of
the template. If it's not marked as gensym, then it searches the
preliminary scope of the template for the name of the routine, then when
it matches a template parameter or a gensym identifier, the compiler
replaces the name node with a symbol node of the found symbol.

This makes sense for the template parameter since it has to be replaced
later, but not really for the gensym identifier, as it doesn't allow us
to inject a routine with the same name as an identifier previously
declared as gensym (the problem in #23326 is when this is in another
`when` branch).

However this is the only channel to reuse a gensym symbol in a
declaration, so maybe removing it has side effects. For example if we
have:

```nim
proc foo(x: int) {.gensym.} = discard
proc foo(x: float) {.gensym.} = discard
```

it will not behave the same as

```nim
proc foo(x: int) {.gensym.} = discard
proc foo(x: float) = discard
```

behaved previously, which maybe allowed overloading over the gensym'd
symbols.

A note to the "undeclared identifier" error message has also been added
for a potential error code that implicitly depended on the old behavior
might give, namely ``undeclared identifier: 'abc`gensym123'``, which
happens when in a template an identifier is first declared gensym in
code that doesn't compile, then as a routine which injects by default,
then the identifier is used.

(cherry picked from commit 73b0b0d)
narimiran pushed a commit that referenced this pull request May 21, 2024
…3392)

fixes #23326

In a routine declaration node in a template, if the routine is marked as
`gensym`, the compiler adds it as a new symbol to a preliminary scope of
the template. If it's not marked as gensym, then it searches the
preliminary scope of the template for the name of the routine, then when
it matches a template parameter or a gensym identifier, the compiler
replaces the name node with a symbol node of the found symbol.

This makes sense for the template parameter since it has to be replaced
later, but not really for the gensym identifier, as it doesn't allow us
to inject a routine with the same name as an identifier previously
declared as gensym (the problem in #23326 is when this is in another
`when` branch).

However this is the only channel to reuse a gensym symbol in a
declaration, so maybe removing it has side effects. For example if we
have:

```nim
proc foo(x: int) {.gensym.} = discard
proc foo(x: float) {.gensym.} = discard
```

it will not behave the same as

```nim
proc foo(x: int) {.gensym.} = discard
proc foo(x: float) = discard
```

behaved previously, which maybe allowed overloading over the gensym'd
symbols.

A note to the "undeclared identifier" error message has also been added
for a potential error code that implicitly depended on the old behavior
might give, namely ``undeclared identifier: 'abc`gensym123'``, which
happens when in a template an identifier is first declared gensym in
code that doesn't compile, then as a routine which injects by default,
then the identifier is used.

(cherry picked from commit 73b0b0d)
narimiran pushed a commit that referenced this pull request May 24, 2024
Was introduced to handle a break in #23392, according to
#23503 (comment)
should not be needed anymore

(cherry picked from commit 49e1ca0)
metagn added a commit to metagn/Nim that referenced this pull request Jul 15, 2024
Araq pushed a commit that referenced this pull request Jul 16, 2024
…23842)

fixes #23813, partially reverts #23392

Before #23392, if a `gensym` symbol was defined before a proc with the
same name in a template even with an `inject` annotation, the proc would
be `gensym`. After #23392 the proc was instead changed to be `inject` as
long as no `gensym` annotation was given. Now, to keep compatibility
with the old behavior, the behavior is changed back to infer the proc as
`gensym` when no `inject` annotation is given, however an explicit
`inject` annotation will still inject the proc. This is also documented
in the manual as the old behavior was undocumented and the new behavior
is slightly different.
narimiran pushed a commit that referenced this pull request Jul 16, 2024
…23842)

fixes #23813, partially reverts #23392

Before #23392, if a `gensym` symbol was defined before a proc with the
same name in a template even with an `inject` annotation, the proc would
be `gensym`. After #23392 the proc was instead changed to be `inject` as
long as no `gensym` annotation was given. Now, to keep compatibility
with the old behavior, the behavior is changed back to infer the proc as
`gensym` when no `inject` annotation is given, however an explicit
`inject` annotation will still inject the proc. This is also documented
in the manual as the old behavior was undocumented and the new behavior
is slightly different.

(cherry picked from commit cd94608)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symbols in when false section affect lookup
2 participants