lazy symbol semantic checking #416

timotheecour · 2021-08-26T07:22:30Z

TLDR: this would obsolete coreReordering, avoid need for fwd declarations, solve cyclic dependencies, massively speedup compilation times and likely decrease binary sizes.

proposal

Sem-check symbols lazily, deferring semcheck of declaration and body until they're needed:

a symbol declared gets added to current scope but isn't semchecked yet
a reference to an overloaded symbol (ie a rountine call fn(a,b) triggers semcheck of declarations of each overload of fn (only the declaration, not the body)
once a call is resolved to a symbol, that symbol's body is semchecked, at which point a new scope is created (with parent scope = scope where symbol was declared) and semcheck recurses in this scope in same way as for the top-level scope

At a high-level it works similarly to how nim implements dead code elimination in backend, by processing declarations top-down, which has the built-in property of skipping unused declarations (even if those appear in a cycle, eg if foo calls bar but no top-level code calls (transitively) either foo or bar, both foo and bar will be dead-code eliminated as a natural property.

semcheck as depth first traversal with deferred semchecking

We define a processing stack containing declaration scopes (PScope), and semcheck consists in lazily processing statements in a scope and recursing when a non-declaration statement is visited that requires semchecking a symbol. Processing starts by pushing the top-level scope of the main project module to the processing stack.

There is no need for fwd declarations nor complex data structures nor keeping track of scopes attached to declarations; all that's needed is a stack of scopes.

At any given time during semcheck, you only need to keep track of a stack of N scopes where N is the processing depth (eg: fn1 calls fn2 calls fn3 => N = 3). When a symbol f is declared in a scope S, f's declaration scope S will grow if new declarations occur after f is declared (in same lexical scope); when a statement (eg: f(), which can occur in any scope where f is in scope) triggers semcheck of f (delaration or body), a new scope is created (whose parent is S).

The parent relationship implicitly defines a tree (rooted at module top-level), but we never need to visit the children of a scope; all that's needed is walking up the scope when doing symbol resolution (ident => symbol).

proc a00=discard
proc a01=
  proc a11=
    a12()
  proc a12=
    a00()
    a11()
  a12()

a01()

processing steps:
start with top-level scope S0, initially empty and with S0.parent = nil; and push S0 to the stack

register a00, a01; stack = [S0] # S0 contains a00, a01
visit a01; stack = [S0, S1] # S1.parent = S0
- register a11, a12; stack = [S0, S1]
- visit a12; stack = [S0, S1, S2]
  - visit a00; stack = [S0, S1, S2, S3] # S3.parent = S0, not S2!
  - visit a11; stack = [S0, S1, S2, S4] # s4.parent = S2
    - visit a12(cached)

semcheck consists in doing depth first search in the symbol graph, where recursion is triggered for each statement that requires semchecking a declaration (a new declaration doesn't require semchecking nor recursion; instead the symbol just declared is merely added to the current scope). Each time a symbol (in a declaration scope S) is semchecked, a new scope is created (with parent S) and pushed to the stack; it is popped when semchecking for this symbol is completed.

this can be represented simply as follows:

type
  PScope = ref object
    parent: PScope
    symbols: TStrTable
var stack: seq[PScope] # the processing stack

in particular, the position in processingStack is unrelated to the parent field.

example

# m1.nim
type
  A = object
  B = typeof(fn3())
  C = seq[B]

proc fn1(a: C) =
  from m2 import fn2
  fn2(a)

proc fn3(): int = 1

# until this point, the symbol table just contains shallow entries: `A(type) , B(type), C(type), fn1 (proc), fn3(proc)`;
# in particular, fn1, fn3, C have not been semchecked.
# If the module ended here, that would be the end of semcheck

discard fn3() # this triggers semcheck of fn3

var a: C # this triggers semcheck of C => B => fn3 (cached)
discard fn1(a) # this triggers semcheck of fn1 => registers m2; then fn2(a) triggers import of m2, and semcheck of `fn2`

scoping rules

the declaration scope looks both before and after a declaration

proc fn1 = discard
# scope: fn1
block:
  proc fn2 =
    # when body is semchecked (triggered by `fn2()`), the symbols in scope will be: [fn1, fn2, fn3]
    fn3()
  proc fn3 = discard
  fn2()
# fn2 and fn3 are not in scope anymore (symbols in scope = [fn1])

behavior of `declared`

const a = declared(foo)
static: echo a # prints `false` because a top-level code triggers semcheck of `a` and `foo` isn't yet declared
proc foo() = discard

contrast with:

const a = declared(foo)
proc foo() = discard
static: echo a # prints true # the top-level code triggers after foo is declared

behavior of lazy imports

this just follows from the above rules:

# a.nim
proc a1=
  from b import b1
  b1()
proc a2=
  from b import b2
  b2()

a1()
a2()

# b.nim
proc b0 = discard
proc b1 = b0()
proc b2 = discard

semcheck steps:

symbol table registers a1, a2 (without semchecking those)
a1() is seen, triggers semcheck of a1
- from b import b1 is seen, registers symbols b, b1 (no semcheck nor import yet)
- b1() is seen, triggers semcheck of b1, which triggers import of b, then of b1()
  - b0() is seen, triggers semcheck of b0
a2() is seen, triggers semcheck of a2
- from b import b2 is seen, registers symbols, b, b2
- b2() is seen, triggers semcheck of b2

benefits

this would solve cyclic dependencies
forward declarations become un-needed
this is a more principled and effective way to achieve {.experimental: "codeReordering".}, and subsumes this feature
this would massively speed up compilation times, and possibly also binary sizes; the dead code elimination that nim currently does cannot recover from certain earlier decisions such as generating module initializations for modules that would be skipped thanks to lazy symbol resolution
there would be no need for include files anymore (these are still pervasive in compiler code)

links

i had a very old/outdated POC for something related to this
EDIT: now implemented, see Implement RFC 416: lazy symbol semantic checking: enable cyclic deps, remove need for fwd decls, etc Nim#18818

The text was updated successfully, but these errors were encountered:

konsumlamm · 2021-08-26T10:41:45Z

Are you suggesting that unused definitions just aren't semchecked at all? I don't think that would be a good idea.

Araq · 2021-08-26T11:36:24Z

This is very good but at the end of the module symbols should be checked even if unused. (The compiler could skip this step if the module hasn't changed between runs, providing the speed advantage of your "laziness".)

timotheecour · 2021-08-26T17:21:52Z

at the end of the module symbols should be checked even if unused

whether to semcheck unused symbols could very well be controlled via a flag --lazysemcheck:skipunused, otherwise you'd lose the (potentially very large) performance benefit on compile times; in particular during development, or for tools like inim etc where compile times are a critical aspect (even for 1st compilation).

The compiler could skip this step if the module hasn't changed between runs

possible, TBD

Araq · 2021-08-26T18:41:20Z

whether to semcheck unused symbols could very well be controlled via a flag --lazysemcheck:skipunused, otherwise you'd lose the (potentially very large) performance benefit on compile times; in particular during development, or for tools like inim etc where compile times are a critical aspect (even for 1st compilation).

Well inim should be allowed to cheat, yes. Not sure if that needs a switch or if inim should import the compiler as a library but it's a minor point.

Araq · 2021-08-30T12:12:33Z

I take it back, lazy sem'checking seems far too powerful an idea for weakening it from the beginning.

Varriount · 2021-09-01T07:04:00Z

This certainly looks promising in concept... I wonder if there has been any research done in this area?

Araq · 2021-09-08T06:45:27Z

Additional remark: The performance benefits of lazy sem'checking for code bases that have little dead code like the Nim compiler should not be all that great. IC does not have this problem.

Varriount · 2021-09-08T20:44:58Z

Additional remark: The performance benefits of lazy sem'checking for code bases that have little dead code like the Nim compiler should not be all that great. IC does not have this problem.

The compiler is likely atypical in this case though - I can't speak for large Nim applications specifically, but Python, JavaScript, and Java applications all tend to have moderate amounts of dead code via external libraries.

It should also be noted that, at least so far, IC has been fairly "fragile", especially in the case of Nim's compile-time execution features. For example, the macrocache module had to be created because IC can't handle global compile-time global variables very well.

Araq · 2021-09-09T09:41:53Z

For example, the macrocache module had to be created because IC can't handle global compile-time global variables very well.

So? There are no other alternatives around, "let's recompile everything and hope for plenty of dead code" isn't a solution...

Varriount · 2021-09-09T19:48:43Z

So? There are no other alternatives around, "let's recompile everything and hope for plenty of dead code" isn't a solution...

Sure. The point I'm trying to make though is that just like the benefits from the proposed "lazy semantic checking", the benefits from incremental compilation can be highly situational. Neither one seems superior to the other in this regard.

Araq · 2021-09-10T08:01:49Z

Well they are not in conflict in principle, you can have a compiler that uses both IC and lazy sem'checking.

This was referenced Aug 26, 2021

[WIP] lazy symbol semantic checking timotheecour/Nim#804

Open

import at local scope #410

Closed

Araq added the Let's see what it breaks accepted RFC label Aug 30, 2021

Araq mentioned this issue Aug 30, 2021

atlas tries to checkout incorrect commit when a requires: dep doesn't specify a commit nim-lang/Nim#18753

Closed

This was referenced Sep 1, 2021

parametric effects: proc fn(a: proc()) {.raises: [a].} #408

Closed

Implement RFC 416: lazy symbol semantic checking: enable cyclic deps, remove need for fwd decls, etc nim-lang/Nim#18818

Closed

Araq added a commit to nim-lang/Nim that referenced this issue Sep 8, 2021

alternative to nim-lang/RFCs#416; WIP

591a44c

quantimnot mentioned this issue Nov 17, 2021

version-2-0 branch and CI #436

Closed

metagn mentioned this issue Nov 24, 2021

Improvements to defines #438

Closed

metagn mentioned this issue Jan 14, 2022

RFC: type section macros #66

Closed

metagn mentioned this issue May 4, 2022

Roadmap for Nim #437

Closed

33 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lazy symbol semantic checking #416

lazy symbol semantic checking #416

timotheecour commented Aug 26, 2021 •

edited

Loading

konsumlamm commented Aug 26, 2021

Araq commented Aug 26, 2021 •

edited

Loading

timotheecour commented Aug 26, 2021 •

edited

Loading

Araq commented Aug 26, 2021

Araq commented Aug 30, 2021

Varriount commented Sep 1, 2021

Araq commented Sep 8, 2021

Varriount commented Sep 8, 2021 •

edited

Loading

Araq commented Sep 9, 2021

Varriount commented Sep 9, 2021

Araq commented Sep 10, 2021

lazy symbol semantic checking #416

lazy symbol semantic checking #416

Comments

timotheecour commented Aug 26, 2021 • edited Loading

proposal

semcheck as depth first traversal with deferred semchecking

example

scoping rules

behavior of declared

behavior of lazy imports

benefits

links

konsumlamm commented Aug 26, 2021

Araq commented Aug 26, 2021 • edited Loading

timotheecour commented Aug 26, 2021 • edited Loading

Araq commented Aug 26, 2021

Araq commented Aug 30, 2021

Varriount commented Sep 1, 2021

Araq commented Sep 8, 2021

Varriount commented Sep 8, 2021 • edited Loading

Araq commented Sep 9, 2021

Varriount commented Sep 9, 2021

Araq commented Sep 10, 2021

timotheecour commented Aug 26, 2021 •

edited

Loading

behavior of `declared`

Araq commented Aug 26, 2021 •

edited

Loading

timotheecour commented Aug 26, 2021 •

edited

Loading

Varriount commented Sep 8, 2021 •

edited

Loading