Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

init checks and 'out' parameters #14521

Merged
merged 17 commits into from
Jun 23, 2020
Merged

init checks and 'out' parameters #14521

merged 17 commits into from
Jun 23, 2020

Conversation

Araq
Copy link
Member

@Araq Araq commented May 31, 2020

No description provided.

@Araq
Copy link
Member Author

Araq commented May 31, 2020

Note: Needs an RFC.

@Clyybber
Copy link
Contributor

Clyybber commented May 31, 2020

I think a callsite magic to not prevent observable stores is a better idea than introducing out params.
I fear that the existance of outparams will lead people will write two versions of every proc, one with the result being an outparam and the other normally. IMO they should never be needed.

@Araq
Copy link
Member Author

Araq commented May 31, 2020

It is not about observable stores though, it's about "definite assignment" analysis.

@timotheecour
Copy link
Member

if 99% of the time tyOut is treated like tyVar, why not instead introduce tfOut and use that flag in the 1% code locations where you need to distinguish

As you mentioned, new type kinds are expensive in terms of code maintenance (and well, i was kind of hoping to use the tyOpt spot for tyAliasSym in #11992, but it's not just about that)

@Araq
Copy link
Member Author

Araq commented Jun 1, 2020

if 99% of the time tyOut is treated like tyVar, why not instead introduce tfOut and use that flag in the 1% code locations where you need to distinguish

Fair enough. I thought a type flag makes the language less "introspectable" because you cannot query a type flag as easily from macros.nim.

@timotheecour
Copy link
Member

timotheecour commented Jun 1, 2020

Fair enough. I thought a type flag makes the language less "introspectable" because you cannot query a type flag as easily from macros.nim.

the same will hold for macro writers; I am better that consolidating var+out will lead to less bugs even for macro writers because it'll avoid cases like "oh i forgot to handle out kind", and the majority of cases (like the 99% here) would not care to distinguish out vs var.

Whether to expose type flags is a different story (because of minmizing leaks of internal detail) but it's totally find to introduce an api analog to proc querySetting*(setting: SingleValueSetting): string but for types, where we'd only expose things that are needed for macro clients, possibly tfOut

=> stable macro API hiding internal details that may change

after all, out only makes sense in that context, unlike things like ptr T where ptr makes sense in all contexts, and the minor (in terms of spec difference) difference bw var and out doesnt' justify using a new decorator out

@Araq
Copy link
Member Author

Araq commented Jun 1, 2020

why not instead proc fun(a {.out.}: var int) = discard ?

  1. It encourages "lying" as the shorter version is the wrong one.
  2. Compat problems because version 1.0 lacks pragmas for parameters...

@timotheecour
Copy link
Member

Compat problems because version 1.0 lacks pragmas for parameters...

this compiles with 1.0

when false:
  proc main(a {.out2.}: var int)=
    echo a
  main()

(but in other cases this is a recurring problem: lexer tries to to too much, instead of deferring to semantic; It makes since less useful; we can patch nim 1.0.x+1.2.x to make lexer produce an error node for yet unsupported lexical things, so that since (or when false) will produce no error)

shorter version is the wrong one

that's definitely application specific, there are legitimate cases where var is what you need (eg passing a Table by var). Unless I'm misunderstanding the yet-to-be-written RFC.

@Araq
Copy link
Member Author

Araq commented Jun 1, 2020

Unless I'm misunderstanding the yet-to-be-written RFC.

I think you do, read the spec additions please in this PR.

@timotheecour
Copy link
Member

timotheecour commented Jun 1, 2020

I had only read changelog + skimmed code, hadn't seen manual.rst diff; now that I've read it I have many questions

  • my point above remains more valid than ever. I'd actually want to use var instead of out in almost all cases in my code: input-output parameters is the most flexible, practical, even with performance in mind. Not to mention, append semantics is an important use case that out can't do.

It is enforced that every code path assigns a value to every out parameter before the routine returns

that seems to imply that's also the case if it throws; if so, that severely restricts what you can do for initialization.
In fact even the example you gave violates the spec:

proc divmod(a, b: int; res, remainder: out int) =
    res = a div b # if this throws
    remainder = a mod b
var res, remainder
divmod(0,0, res,remainder) # this raises, and `remainder` was unassigned even though the routine returned

it's even worse in following example:

proc divmod(a, b: int; a2: out int, res: out int) =
  a2 = a # observable store
  res = a div b # raises  

divmod(0,0, a2, res) # a2 was modified, but not res (see also ObservableStores discussion)

so that implies you can only initialize the out params inside the proc with code that doesn't raise (or only raises defects AND user has --panics:on)

that means the only thing you can use in practice is use builtin constructors eg myparam = A(x: 10, y: "foo"), even if user doesn't care about any of those out guarantees

performance

It is enforced that every code path assigns a value to every out parameter before the routine returns and before a read from the out parameter can be performed

If I'm reading this right it means the proc itself can't read any out params before all the out params were assigned (otherwise it'd be redundant with before the routine returns)

A typical use case for var param is buffer reuse, eg:

proc toStr[T](result: var string, b: T) =
  result.setLen 0
  # etc

with out param, this wouldn't work because you'd require: result = "" before any operation.
in particular, result.setLen 0 followed by result.add $T is actually violating the out spec since it reads result.capacity. Even if you allow this (say because it's "builtin"), that doesn't generalize to user defined types, eg if user has a custom type, it'd prevent any use of caching, debugging fields, isInitialized fields etc; in effect user would have to decouple the "input-output fields" from the "output-only fields" before calling such proc. I don't see how that'd even work with a user-defined MySeq[T] type (with len, capacity, data)

breaking change

Seeing as you had to add result = 0 (etc) initializers in many places, this makes me wonder whether this would cause code breakage.

out = {.requiresInit.} ?

can't {.requiresInit.} fit the bill for out, eg: proc fn(a: int, b {.requiresInit.}: var int) = discard

desired goals + alternative approach?

I'm wondering what are the real benefits here to outweigh the other factors, and if there's alternative approach.
it seems like the key sentence is:

There is no semantic difference between ``proc p(x: out T) {.raises: [].}`` and ``proc p(): T {.raises: [].}``
but how these constructs are mapped to machine code might differ.

but the proposal needs more details here; if there is no semantic difference, then the only reason for out is that it would be more efficient; but maybe there's another way to achieve that.
Eg instead of nimZeroMem for result variable, semantic phase could use a user defined reinit which is similar to reset except it allows keeping memory buffers; it's the generalization of setLen 0 for user defined types. eg:

proc joinPath(a, b: string): string =
  # result.reinit # implicitly called by semantic phase
  result.add a & '/' & b

var c: string
for i in 0..<10:
  c = joinPath("foo", "bar") # calls c.reinit before each iteration, which for string is `c.setLen 0`

=> no allocs, buffers are reused, and the friendlier return syntax is used

@Araq
Copy link
Member Author

Araq commented Jun 1, 2020

desired goals + alternative approach?

More precise C interop (see stat) plus it allows us to improve the existing practice of using (res: var T): bool instead of Option[T]. Note that we all agree the existing practice is bad but it turns old lying code into code that doesn't lie anymore without API breakages. I will expand this point in the RFC.

Futhermore, out parameters finally allow us to write re-usable constructors for object hierarchies. More in the RFC.

@Araq
Copy link
Member Author

Araq commented Jun 1, 2020

but in other cases this is a recurring problem: lexer tries to to too much, instead of deferring to semantic; It makes since less useful; we can patch nim 1.0.x+1.2.x to make lexer produce an error node for yet unsupported lexical things, so that since (or when false) will produce no error

I think more often than not we got it right, see for example how the parser always allowed for let x {.importc.} .

@timotheecour
Copy link
Member

timotheecour commented Jun 1, 2020

More precise C interop (see stat)

for me, that's just 1 of several parameter specfiers, eg: restrict, const, and I don't see a reason to single this out.
Note also that, unlike restrict and const, out doens't exist in C so it's purely a matter of "faith" in fact that C function behaves like an out param.

And in many cases it doesn't behave like out:

  • sprintf (input buffer only partially written to)
  • memmove (even memcpy, which is undefined behavior if regions overlap; not sure whether UB counts)
  • realloc (more like input output)

semantic of out in D

in D, out just means parameter a is initialized upon function entry with a.init (our default(T)); that actually is IMO already a better semantic as it's done by the compiler instead of require require user to do it (and then checking with static analysis); and can be optimized away when compiler can prove its safe to do so. And it's in fact similar to how nim handles result initialization (or variable initialization) via nimZeroMem.

That said, IIRC it's not a very popular feature as oftentimes input output params are more practical.

@Araq
Copy link
Member Author

Araq commented Jun 1, 2020

that actually is IMO already a better semantic as it's done by the compiler instead of require require user to do it (and then checking with static analysis); and can be optimized away when compiler can prove its safe to do so.

Maybe sure, but for now I consider it an unimportant detail.

@planetis-m
Copy link
Contributor

I think out parameters is a different enough concept (from var) that deserve their one language constract and not hidden / confused with var.

@zah
Copy link
Member

zah commented Jun 3, 2020

BTW, out is already used for specifying covariant generic parameters.

I would check if this patch breaks the test suite with -d:nimEnableCovariance (Araq, as a reminder, you've promised to consider making this the default)

@Araq
Copy link
Member Author

Araq commented Jun 3, 2020

-d:nimEnableCovariance (Araq, as a reminder, you've promised to consider making this the default)

I have no interest whatsoever in covariance. :-/

@zah
Copy link
Member

zah commented Jun 3, 2020

The covariant generic parameters seem like an obscure feature now, but they will become quite important when people start defining smart pointer types in Nim. The feature is essential for preserving the sub-type relationship between ptr Derived and ptr Base when using custom pointer types.

@timotheecour
Copy link
Member

@zah => timotheecour#285 to avoid hijacking this

@Araq
Copy link
Member Author

Araq commented Jun 4, 2020

Note: The upcoming RFC for out parameters will likely be considerably different from this implementation.

@Araq
Copy link
Member Author

Araq commented Jun 23, 2020

Note: I removed out parameters again but kept the minimal support for it in the parser because it might be useful if we ever add this feature to Nim.

@Araq Araq merged commit da29222 into devel Jun 23, 2020
@Araq Araq deleted the araq-init-checks branch June 23, 2020 08:54
@timotheecour
Copy link
Member

@Araq @Clyybber I just want to make sure we're not going in circles

some of the changes in this PR relating to result initialization are not so good IMO (if they're indeed needed, it would need some kind of explanation):

for examples things like this:
https://github.com/nim-lang/Nim/pull/14521/files#diff-31a643844935ff24fb24f54b472f38cbR110
which adds result = "" which doesn't seem necessary; since no change in semantics to result read before initialization was done so it's not warranted, and in fact a ton of code relies on this so it doesn't make sense to have these initializations added in cases like this

and in fact that change gets reverted here: https://github.com/nim-lang/Nim/pull/14777/files?file-filters%5B%5D=#diff-31a643844935ff24fb24f54b472f38cbL110

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants