lazy symbol semantic checking #416
Comments
Are you suggesting that unused definitions just aren't semchecked at all? I don't think that would be a good idea.
whether to semcheck unused symbols could very well be controlled via a flag
possible, TBD
Well inim should be allowed to cheat, yes. Not sure if that needs a switch or if inim should import the compiler as a library but it's a minor point.
I take it back, lazy sem'checking seems far too powerful an idea for weakening it from the beginning.
This certainly looks promising in concept... I wonder if there has been any research done in this area?
Additional remark: The performance benefits of lazy sem'checking for code bases that have little dead code like the Nim compiler should not be all that great. IC does not have this problem.
The compiler is likely atypical in this case though - I can't speak for large Nim applications specifically, but Python, JavaScript, and Java applications all tend to have moderate amounts of dead code via external libraries. It should also be noted that, at least so far, IC has been fairly "fragile", especially in the case of Nim's compile-time execution features. For example, the macrocache module had to be created because IC can't handle compile-time global variables very well.
So? There are no other alternatives around, "let's recompile everything and hope for plenty of dead code" isn't a solution...
Sure. The point I'm trying to make though is that just like the benefits from the proposed "lazy semantic checking", the benefits from incremental compilation can be highly situational. Neither one seems superior to the other in this regard.
Well they are not in conflict in principle, you can have a compiler that uses both IC and lazy sem'checking.
TLDR: this would obsolete codeReordering, avoid the need for fwd declarations, solve cyclic dependencies, massively speed up compilation times and likely decrease binary sizes.
proposal
Sem-check symbols lazily, deferring semcheck of declaration and body until they're needed:
`fn(a,b)` triggers semcheck of the declarations of each overload of `fn` (only the declaration, not the body)
At a high level it works similarly to how nim implements dead code elimination in the backend, by processing declarations top-down, which has the built-in property of skipping unused declarations (even if those appear in a cycle, eg if `foo` calls `bar` but no top-level code calls (transitively) either foo or bar, both foo and bar will be dead-code eliminated as a natural property).
semcheck as depth first traversal with deferred semchecking
We define a processing stack containing declaration scopes (PScope), and semcheck consists in lazily processing statements in a scope and recursing when a non-declaration statement is visited that requires semchecking a symbol. Processing starts by pushing the top-level scope of the main project module to the processing stack.
There is no need for fwd declarations nor complex data structures nor keeping track of scopes attached to declarations; all that's needed is a stack of scopes.
At any given time during semcheck, you only need to keep track of a stack of N scopes where N is the processing depth (eg: fn1 calls fn2 calls fn3 => N = 3). When a symbol `f` is declared in a scope `S`, `f`'s declaration scope `S` will grow if new declarations occur after `f` is declared (in the same lexical scope); when a statement (eg: `f()`, which can occur in any scope where f is in scope) triggers semcheck of `f` (declaration or body), a new scope is created (whose parent is `S`).
The parent relationship implicitly defines a tree (rooted at module top-level), but we never need to visit the children of a scope; all that's needed is walking up the scope when doing symbol resolution (ident => symbol).
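The lookup rule above can be sketched in a few lines. This is a minimal Python model, not Nim compiler code; the `Scope` class and its `declare` and `lookup` methods are hypothetical names introduced for illustration, and only the `parent` link comes from the text.

```python
from typing import Dict, Optional


class Scope:
    """A declaration scope; `parent` is the only link that is ever followed."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.parent = parent
        self.symbols: Dict[str, str] = {}  # identifier -> symbol description

    def declare(self, name: str, info: str) -> None:
        # Declaring only registers the symbol; nothing is semchecked here.
        self.symbols[name] = info

    def lookup(self, name: str) -> Optional[str]:
        # Symbol resolution walks up the parent chain; children are never visited.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


top = Scope()                    # S: the scope in which f is declared
top.declare("f", "proc f")
top.declare("g", "proc g")       # declared after f, so S has grown

# A statement such as f() triggers semcheck of f: new scope whose parent is S.
body_of_f = Scope(parent=top)

found_g = body_of_f.lookup("g")          # g is visible from inside f
found_missing = body_of_f.lookup("h")    # never declared anywhere
```

Because the new scope only stores a reference to `S`, anything added to `S` before the semcheck of `f` is triggered is visible from `f`, without any bookkeeping attached to the declaration itself.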
processing steps:
start with top-level scope S0, initially empty and with S0.parent = nil; and push S0 to the stack
semcheck consists in doing depth first search in the symbol graph, where recursion is triggered for each statement that requires semchecking a declaration (a new declaration doesn't require semchecking nor recursion; instead the symbol just declared is merely added to the current scope). Each time a symbol (in a declaration scope S) is semchecked, a new scope is created (with parent S) and pushed to the stack; it is popped when semchecking for this symbol is completed.
this can be represented simply as follows:
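A minimal Python model of such a representation, offered as a sketch rather than compiler code: `PScope`, `parent` and the processing stack come from the text, while the `name` and `symbols` fields are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PScope:
    name: str                                  # hypothetical label, for display only
    parent: Optional["PScope"] = None
    symbols: Dict[str, str] = field(default_factory=dict)


processing_stack: List[PScope] = []

s0 = PScope("module top-level")                # S0, with no parent
processing_stack.append(s0)

# Both procs are declared at top level, so S0 is their declaration scope.
s0.symbols["fn1"] = "proc"
s0.symbols["fn2"] = "proc"

# A top-level call fn1() triggers semcheck of fn1: new scope, parent S0.
fn1_scope = PScope("fn1", parent=s0)
processing_stack.append(fn1_scope)

# fn1's body calls fn2, which triggers semcheck of fn2. Its parent is fn2's
# declaration scope (S0), not the scope that happens to be below it on the stack.
fn2_scope = PScope("fn2", parent=s0)
processing_stack.append(fn2_scope)

stack_names = [s.name for s in processing_stack]
fn2_parent_name = fn2_scope.parent.name

processing_stack.pop()                         # semcheck of fn2 completed
processing_stack.pop()                         # semcheck of fn1 completed
```

Here `fn2`'s scope sits directly above `fn1`'s on the stack, yet its parent is the top-level scope.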
in particular, the position in processingStack is unrelated to the `parent` field.
example
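A small worked example, sketched as a Python simulation under stated assumptions rather than as Nim compiler code. The module layout (`PROCS`, `TOP_LEVEL_CALLS`) and the function `lazy_semcheck` are hypothetical; the scenario mirrors the text: fn1 calls fn2 calls fn3, while foo and bar only call each other.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical module: each proc name maps to the procs its body calls.
PROCS: Dict[str, List[str]] = {
    "fn1": ["fn2"],
    "fn2": ["fn3"],
    "fn3": [],
    "foo": ["bar"],   # foo and bar form a cycle that nothing else reaches
    "bar": ["foo"],
}
TOP_LEVEL_CALLS: List[str] = ["fn1"]


def lazy_semcheck(procs: Dict[str, List[str]],
                  top_level_calls: List[str]) -> Tuple[List[str], int]:
    """Depth-first traversal that only visits procs reachable from top level."""
    checked: List[str] = []      # procs in the order their semcheck completed
    seen: Set[str] = set()       # guards against re-checking and recursion
    stack: List[str] = []        # proc scopes currently being processed
    max_depth = 0

    def visit(name: str) -> None:
        nonlocal max_depth
        if name in seen:
            return
        seen.add(name)
        stack.append(name)                   # push a scope for this symbol
        max_depth = max(max_depth, len(stack))
        for callee in procs[name]:           # each call triggers a semcheck
            visit(callee)
        stack.pop()                          # semcheck for this symbol is done
        checked.append(name)

    for name in top_level_calls:
        visit(name)
    return checked, max_depth


checked, max_depth = lazy_semcheck(PROCS, TOP_LEVEL_CALLS)
skipped = sorted(set(PROCS) - set(checked))
```

In this model `foo` and `bar` are never visited, even though they reference each other, and the stack never holds more than three proc scopes at once (the top-level scope is not counted here).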
scoping rules
the declaration scope looks both before and after a declaration
behavior of `declared`
contrast with:
behavior of lazy imports
this just follows from the above rules:
semcheck steps:
`from b import b1` is seen, registers symbols b, b1 (no semcheck nor import yet)
`from b import b2` is seen, registers symbols b, b2
benefits
`{.experimental: "codeReordering".}`, and subsumes this feature
links