Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allocate closure env on stack if viable (wip) #14881

Closed
wants to merge 5 commits into from

Conversation

recloser
Copy link
Contributor

@recloser recloser commented Jul 2, 2020

Opening a PR to check how it does on the tests.

If no inner proc is actually passed somewhere we can make the closure env a non ref object and pass it to the inner procs as a ptr instead. This should make some common uses of inner procs much more performant. It would also make it valid to capture an outer var argument inside non escaping inner procs (not implemented yet).
Not sure if the check for this (anyInnerProcMightEscape) covers all the cases thoroughly though, so it needs to be reviewed.

@Araq
Copy link
Member

Araq commented Jul 3, 2020

Not sure if the check for this (anyInnerProcMightEscape) covers all the cases thoroughly though, so it needs to be reviewed.

You need to outline the ideas behind it. Preferable as comment in the code itself.

@Araq
Copy link
Member

Araq commented Jul 5, 2020

You really need to explain the optimization and how it is sound for a precise GC that expects "env" to be a pointer pointing into the GC's heap.

@recloser
Copy link
Contributor Author

recloser commented Jul 5, 2020

Consider this code:

proc stuff =
  var x = 123
  proc inner =
    x = doSomething(x)
  inner()

Without the PR it will be transformed into something like this during lambda lifting:

proc inner(_env2: ref object) =
    _env2.x = doSomething(_env2.x)

proc stuff =
  var _env1: ref object # unconditionally a ref object
  new(_env1)
  _env1.x = 123
  var _closure = (inner, env1)
  _closure[0](_closure[1])

As you can see the heap allocations here are completely pointless as the env never actually escapes the stack, since inner is just called locally and never passed to some other proc or assigned somewhere.

During lambda lifting we can scan stuff's AST and see if any of its inner procs are assigned or passed somewhere, and if that isn't the case we should be safe to allocate the env object on stack.

What this PR intends to do is transform this simple case into something like this instead, as long as we determined that the env can't escape the stack:

proc inner(_env2: ptr object) =
    # ^ inner's env param becomes just a regular pointer
    _env2.x = doSomething(_env2.x)

proc stuff =
  var _env1: object # regular stack object is safe in this case
  _env1.x = 123
  var _closure = (inner, addr(env1))
  _closure[0](_closure[1])

GC that expects "env" to be a pointer pointing into the GC's heap.

If the env is stack allocated I use ptrs instead of refs for all the inner procs params, and likewise for the closure tuple store just the env ptr, so GC shouldn't touch them at all. Do you mean that the GC code has some special handling for env and it always assumes it to be a ref? If that's the case that would be a blocker. Wouldn't it be possible to make it discern between a ref and a ptr env?

BTW: Not strictly related to this PR, but something I ran into - not sure if this is a bug. If a proc sym is gensymmed it will never have the sfAddrTaken flag set, example:

import macros

macro makeSomething(name, alias: untyped; useGenSym = false) =
  let 
    s = if useGenSym.boolVal: genSym(nskProc, name.strVal) else: name
  return quote do:
    proc `s`: int = 1
    let `alias` = `s` # sfAddrTaken is present on `s` only if useGenSym is false

proc main =
  makeSomething(foobar, baz, true)
  echo baz()
  makeSomething(foobar2, baz2, false)
  echo baz2()

main()

# in cgen:
# proc genProc(m: BModule, prc: PSym) =
#   if "foobar" in prc.name.s:
#     echo prc, " ", prc.flags
# ...
# outputs something like:
# foobar@123 {sfUsed, sfGenSym}
# foobar2@234 {sfUsed, sfAddrTaken}

@disruptek
Copy link
Contributor

Good idea; definitely watching with interest. 👍

@Araq
Copy link
Member

Araq commented Jul 6, 2020

If the env is stack allocated I use ptrs instead of refs for all the inner procs params, and likewise for the closure tuple store just the env ptr, so GC shouldn't touch them at all.

Good!

Do you mean that the GC code has some special handling for env and it always assumes it to be a ref? If that's the case that would be a blocker.

No, that's fine but the compiler needs to be reviewed for all the cases where we assume that tyProc and callConv == ccClosure assumes a ref environment as now ptr is a possibility too. The compiler needs to be reviewed, not the GCs.

BTW: Not strictly related to this PR, but something I ran into - not sure if this is a bug. If a proc sym is gensymmed it will never have the sfAddrTaken flag set, example:

I think we should remove sfAddrTaken.

@recloser
Copy link
Contributor Author

recloser commented Jul 9, 2020

Ugh, wasted so much time trying to figure out why some simple tests fail before realizing that ptrs are totally broken on the VM. Blocked by #13887.

I think we should remove sfAddrTaken.

Yeah - now that I've run into even more instances of it not being properly set on syms beyond of what I've mentioned above I see that it has been quite neglected. Guess we're gonna have to do it the hard way.

@timotheecour
Copy link
Member

timotheecour commented Jul 14, 2020

see also #14976 which seems related; i was initially using sfAddrTaken but then realized it was the wrong thing to use (for many reasons) and is also buggy: nim-lang/compilerdev#12; I'm instead computing a dependency graph between symbols and derefence level; this could be useful for this PR, to figure out the set of symbols a closure depends on by reference

@stale
Copy link

stale bot commented Jul 14, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Jul 14, 2021
@stale stale bot closed this Aug 13, 2021
@recloser
Copy link
Contributor Author

I believe that this PR was more or less working sans for the blocker cause by the still existing VM bug. If it were possible to run the closure codegen separately for VM and C we could simply not generate stack closures on the VM - but I'm not sure how to implement this separation, or whether it is realistic at all, so any hints would be appreciated.

And just to recap why I believe implementing this notion of 'stack closures' into the compiler is worthwhile, here are the issues it would address:

  • The obvious one: no pointless heap allocs for closures that never escape the stack.
  • Heap closures prevent optimizations by the C compiler. The assembly generated for closures that don't escape the stack with this PR is far more efficient - the intermediate closure env is essentially optimized away.
  • Stack closures avoid unnecessary copies for captured variables, eg.
proc foo(x: seq[int]) =
  proc bar():
    echo x[0]
  bar() 

With stack closures bar in the above example can simply capture x by address, therefore avoiding a sneaky perf pitfall.

  • More expressive code:
proc foo(x: var seq[int]) =
  proc bar() =
    x &= 123
  bar()

Even though this code is safe, currently the compiler will fail with an error complaining that capturing x would violate memory safety. With stack closures we could safely relax this limitation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants