Statement-ize Looping Forms #955

Closed
wants to merge 2 commits into from

Conversation

pnkfelix
Member

@pnkfelix pnkfelix commented Mar 9, 2015

Restrict grammar of Rust language for 1.0 so that all looping syntactic forms (for, loop, while, and while let) are statements (instead of expressions). Forms like let d = loop { }; and (while foo() { }) all produce errors at parse time.

Rendered draft

Spawned off of discussion of RFC #352

@Ericson2314
Contributor

So right now we have the following:

    { loop { break } - 3 } // ==> -3
    { (loop { break } - 3) } // type error
    { if false {} else {} - 3 } // ==> -3
    { (if false {} else {} - 3) } // type error
    { { } - 3 } // ==> -3
    { ({ } - 3) } // type error

See http://is.gd/zSjrKp.
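
The statement-position cases can be checked with a small program. This is a minimal sketch, assuming the parser ends a block-like expression statement at its closing brace; the function names are made up for illustration. Each body is read as a `()`-typed statement followed by the tail expression `-3`, not as a subtraction:

```rust
// Each function body is parsed as a block-like statement of type ()
// followed by a separate tail expression `-3`.
fn after_loop() -> i32 {
    loop { break } - 3
}

fn after_if() -> i32 {
    if false {} else {} - 3
}

fn after_block() -> i32 {
    {} - 3
}

fn main() {
    println!("{} {} {}", after_loop(), after_if(), after_block());
}
```

Wrapping any of the three bodies in parentheses forces the subtraction reading, which is where the type errors in the listing come from.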

If so, I would keep the existing symmetry and just mandate a final semicolon for a block that ends in a loop. It's weird in the short term, but in the long run it is the most consistent option, because it treats all "curly brace expressions" the same.

I suppose I'd like even more to get rid of the parenthesis rules on all of these and just require more semicolons, but that would break more code.

@pnkfelix
Member Author

pnkfelix commented Mar 9, 2015

@Ericson2314 I'm sorry, I cannot parse your sentence

If so, would keep the existing symmetry and just mandate a final semicolon for a block that ends in a loop.

(Did you mean to have an "I" in there, as in "I would"?)

I need a more concrete elaboration of what alternative you are proposing; I honestly cannot infer what it is from what you have written here.

@Ericson2314
Contributor

@pnkfelix I am sorry, both for the typo and the lack of clarity. I did mean "I would" (edited for posterity). What I am proposing is disallowing ending a block with a loop, unless the loop is followed by a semicolon. This gives us:

    { loop { break } - 3 }         // ==> -3
    { (loop { break } - 3) }       // type error
    { loop { break } }             // syntax error for now
    { loop { break }; }            // ==> ()

    { if false {} else {} - 3 }    // ==> -3
    { (if false {} else {} - 3) }  // type error
    { if false { 3 } else { 3 } }  // ==> 3
    { if false { 3 } else { 3 }; } // ==> ()

    { { } - 3 }                    // ==> -3
    { ({ } - 3) }                  // type error
    { { 3 } }                      // ==> 3
    { { 3 }; }                     // ==> ()

Which can then become backwards compatibly:

    { loop { break } - 3 }         // ==> -3
    { (loop { break } - 3) }       // type error
    { loop { break 3 } }           // ==> 3
    { loop { break 3 }; }          // ==> ()

    { if false {} else {} - 3 }    // ==> -3
    { (if false {} else {} - 3) }  // type error
    { if false { 3 } else { 3 } }  // ==> 3
    { if false { 3 } else { 3 }; } // ==> ()

    { { } - 3 }                    // ==> -3
    { ({ } - 3) }                  // type error
    { { 3 } }                      // ==> 3
    { { 3 }; }                     // ==> ()

@pnkfelix
Member Author

pnkfelix commented Mar 9, 2015

@Ericson2314 I will add that to alternatives. I can see the appeal, yet it would (IMO) make a lot of code a teensy bit uglier.

@Ericson2314
Contributor

Thanks!

@nikomatsakis
Contributor

I am not keen on requiring more semicolons than we require now -- seems like we could add these extensions backwards compatibly even without this, though it would mean that while/for loops with a "break with value" would have type Option<T> vs () (asymmetric).

That said, I am unpersuaded as to the need for the original feature itself -- I agree it makes for elegant code in some cases to break with a value, but you can achieve many if not all of the same ends using iterators, filter_map, and a single call to next().
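
To make the iterator alternative concrete, here is a minimal sketch of a search that would otherwise want a value-carrying break. The helper name and the predicate are made up for illustration:

```rust
// Hypothetical helper: return half of the first even number in the slice.
// `filter_map` keeps only the elements that map to `Some`, and a single
// call to `next()` stops at the first one, like a `break` with a value.
fn first_even_halved(xs: &[i32]) -> Option<i32> {
    xs.iter()
        .filter_map(|&x| if x % 2 == 0 { Some(x / 2) } else { None })
        .next()
}

fn main() {
    println!("{:?}", first_even_halved(&[3, 5, 8, 10]));
    println!("{:?}", first_even_halved(&[1, 3]));
}
```

The "loop ran to completion" case falls out as `None` without any extra syntax.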

@pnkfelix
Member Author

though it would mean that while/for loops with a "break with value" would have type Option<T> vs () (asymmetric).

(To be clear, there are other options, such as a required else clause attached to such for/while-loops, as described in RFC #352. But of course, those details are not really germane to this RFC discussion; this RFC is solely about answering the question about how much future proofing we wish to do today.)

@lilyball
Contributor

This feels a bit odd to me. Without requiring semicolons, it feels inconsistent with the rest of the language. And requiring semicolons is ugly.

As near as I can tell, the only thing this accomplishes is allowing looping constructs to always evaluate to a non-() type (when not in statement position) in the future, without breaking code that expressly uses looping constructs as expressions and expects a () result (which is rare). But I'm not convinced that's a worthwhile change. I can see the merits of allowing for looping constructs to sometimes have non-() types (e.g. with a break <expr> coupled with an else block), but it would feel surprising to me to have looping constructs always have a non-() type, and even more-so if they only did this when used in a non-statement position.

@brson
Contributor

brson commented Mar 11, 2015

I'm in favor of the sentiment as futureproofing.

The phrase "then we probably will require the use of parentheses around such forms when they appear as the tail expression of a block" is worrying since putting the expression-loop in tail position might be a common case for the feature.

@Ericson2314
Contributor

@kballard The idea isn't that loops always have a non-() type in expression position, but that, absent a concrete plan, assuming they might is the only safe choice for compatibility. Also it is better to think of the loops in "statement position" as expressions with their value thrown away, rather than a "true statement" like a let binding. That way, the type of the loop itself is context-free.

@nikomatsakis
Contributor

Overall I still trend negative on this proposal, because I think the scheme we have is working pretty well and the need to change it is small. But I just re-read the RFC briefly and realized I was slightly confused about the distinction between the alternatives and main proposal. Unfortunately, now that I grok it better, I feel like there is a fundamental conflict between C-like and expression-like usage that makes me dislike the idea of while and for uniformly adopting Option type (see below).

In particular, (iiuc) the RFC is saying that a looping control-flow construct cannot be in the tail expression of a block. That is, in the following program:

    fn foo() {
        while something { ...; break; ... }
    }

the while loop here would be parsed today as a tail expression of type () but in this proposal would be parsed as a statement and there would be no tail expression.

This implies that in the future, if while loops uniformly had type Option<T> for some T, then the following program would not type check:

    fn bar(i: i32) -> Option<i32> {
        while something(i) { if something_else(i) { break i; } ... }
    }

Instead one would be required to write return while... or (while...) or something like that.

On the other hand, if we adopted the alternative that @Ericson2314 proposed, then we would find that the fallout would be much larger, because we would need a ;. I would go so far as to call it a non-starter for me, because I think that the original fn we saw (with unit return type) ought to be legal.

This conflict seems to me to be somewhat fundamental though, and seems to suggest to me that making while and for uniformly have Option type is misguided. That is, we can't accept both the foo and bar examples above. The only way to accept both would be if we made a while loop with a "value-less" break (as today) have () type, and a while loop with a value-carrying break have Option type. Which also means we don't need to change anything today to prepare for that possibility.

Is there a flaw in this reasoning? I guess the flaw might be that we want both foo and bar to continue parsing -- perhaps one might be willing to sacrifice foo or bar (for example, by requiring foo to have a ;).
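
For reference, here is a sketch of how the two shapes can be written without any "break with value" feature. The loop conditions are stand-ins for something(i) and something_else(i), which the thread does not define:

```rust
// `foo`: a unit-returning function whose body ends in a `while` loop,
// with no trailing semicolon. The loop is the tail expression of type ().
fn foo(mut n: u32) {
    while n > 0 {
        if n == 3 { break; }
        n -= 1;
    }
}

// `bar`: the value-producing search, using an explicit `return` inside
// the loop and `None` as the tail expression instead of `break i`.
fn bar(start: i32) -> Option<i32> {
    let mut i = start;
    while i < 100 {
        // Stand-in condition: stop at the first multiple of 7.
        if i % 7 == 0 { return Some(i); }
        i += 1;
    }
    None
}

fn main() {
    foo(5);
    println!("{:?} {:?}", bar(10), bar(100));
}
```

The `return` form only works when the loop is the last thing the function does, which is part of why a value-carrying break is attractive.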

@nikomatsakis
Contributor

In re-reading @brson's comment I realize he put my entire post into 1 sentence. ;)

@pnkfelix
Member Author

Okay, so far it sounds like core team may be trending towards the second listed alternative:

Obvious 2: We could decide that non-unit looping expressions are only worth adopting if they can be added without requiring a change like this.

I'll keep this RFC up for a while to see if anyone comes in with strong arguments against that view.


Followup note: In all honesty, I tend to agree with both the commentary from @brson and the analysis from @nikomatsakis.

I have only one counter-argument against this point from @brson (and it's quite a weak counter):

The phrase "then we probably will require the use of parentheses around such forms when they appear as the tail expression of a block" is worrying since putting the expression-loop in tail position might be a common case for the feature.

A similar situation of requiring a parenthesis can arise in if-statements, as noted in the appendix -- note that the example there is not choosing between compiler-error versus code-running, but rather between code runs with one result versus another depending on whether you put an opening parenthesis before the if.

(However, that example is definitely a strawman, constructed solely to illustrate a corner case, that I do not expect to see perhaps ever in practice. @brson's note wins out because of the key phrase: "common case".)

@aidancully

For what it's worth (which, I acknowledge, probably isn't much), I am in favor of requiring semicolons on the end of looping expressions. I disagree with the aesthetic concern about requiring new semicolons: I actually consider it more elegant, and more revealing of programmer intention, to require semicolons after loops than to treat the braces from loops (and if statements) specially. And I think we are currently treating loops specially:

  • Where the language requires the result of evaluation to be (), a semicolon is usually required.
  • Where the language requires braces, the result of evaluation is usually allowed to be other than ().

Requiring a semicolon after a loop statement reveals programmer intention that the value returned by the looping expression is (currently, until loops can return non-unit values) meaningless. Regarding the appendix, it is surprising that those two if expressions behave differently. If a semicolon were required on the end of an if statement, it would resolve the ambiguity, and the surprise would disappear.

@Ericson2314
Contributor

While I do prefer the semicolon route, here's an alternative plan that is fully backwards compatible:

First some context, I have two guiding principles:

  • break; should mean the same thing as break ();. This preserves the symmetry with return, and more broadly (though subjectively) aligns with the other cases of syntactic support for eliding ().
  • loop { } is the most powerful looping construct---all others can be desugared to it. If we want to add new functionality to loops, it is most important that it be available with loop { }, as the other types of loops can be manually desugared by the programmer to take advantage of the new functionality.

So let's add:

    a : A
    ---------------------------------
    loop { ... break a; ... } : A

All our existing loop { }s desugar to

    loop { ... break (); ... } : ()

and thus keep their type from today.
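
A minimal sketch of both rules, assuming a compiler that accepts break with a value inside loop (later Rust releases do). The function names are made up for illustration:

```rust
// A `loop` whose `break` carries a value of type A has type A.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut p = 1;
    loop {
        if p > limit { break p; }
        p *= 2;
    }
}

// A plain `break;` behaves like `break ();`, so the loop keeps type ().
fn unit_loop() {
    let unit: () = loop { break; };
    unit
}

fn main() {
    unit_loop();
    println!("{}", first_power_of_two_above(10));
}
```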

@pnkfelix
Member Author

@Ericson2314 I know you already said this new alternative plan is fully backwards compatible, but just to be 100% clear: This RFC was meant to address future-proofing concerns to accommodate #352 / #961 .

If I understand what you said properly, you are advocating (or at least outlining) a plan that requires no changes from Rust as it stands today; i.e. no future proofing, right? I only ask to try to clarify whether you are suggesting that we can indeed close this RFC.

@Ericson2314
Contributor

@pnkfelix Yes, exactly. That plan would replace #352 / #961 and close this.

@Ericson2314
Contributor

For the record, a third plan is to make it so that for- and while-loops without explicit breaks return () too. Since there is only one way to exit those loops, I don't see what useful information could be returned. But I'd still like for ... / while ... { ... break (); ... } to return Option<()>, both for consistency with breaking with a value of an arbitrary type, and pragmatically to distinguish between the two ways the loop can be exited. The only way this plan would be backwards compatible would be to make break; and break (); mean different things, but I consider that a non-starter.

@glaebhoerl
Contributor

For what it's worth, my preference is (and has been all along) for the approach where Option is not special cased, for and while are extended with an optional else clause (whatever we name it), the return type of the various looping constructs depend on the presence or absence of break and/or else, and break is short for break () (as @Ericson2314 also writes).

So:

    x, y: T

    loop { }: ! // for<T> T
    loop { break }: ()
    loop { break x }: T
    while (foo) { }: ()
    while (foo) { break }: ()
    while (foo) { break x }: type error
    while (foo) { break x } else { y }: T

with for..in the same as while. (As far as I'm aware this is backwards compatible.)
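
The last row can be emulated by hand-desugaring the while into a loop, assuming only the loop { break x }: T row is available. This is a sketch with a made-up search function, not part of any proposal:

```rust
// Hypothetical search: first element greater than `min`, else `fallback`.
// Emulates the proposed `while cond { ... break x } else { y }`.
fn first_above(xs: &[i32], min: i32, fallback: i32) -> i32 {
    let mut i = 0;
    loop {
        // The `while` condition failing plays the role of the `else` arm.
        if i >= xs.len() { break fallback; }
        // A value-carrying `break` from the loop body.
        if xs[i] > min { break xs[i]; }
        i += 1;
    }
}

fn main() {
    println!("{}", first_above(&[1, 5, 9], 4, -1));
    println!("{}", first_above(&[1, 2], 4, -1));
}
```

Both exits produce the same type T, which is why the else arm is required once the body breaks with a value.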

@Ericson2314
Contributor

@glaebhoerl Ah, I initially thought of the else rules as "functional break requires else", which breaks the equivalence between break and break (). But if a missing else is instead thought of as shorthand for else { () }, the rules are the same as with if (if x { () } is a legal expression), and the equivalence is preserved. I'm not wild about else, as Option gives the same expressive power without extra syntax, but it does help with compatibility and for that I might forgive it.

So in sum, if the core team changes their minds and decides semicolons are OK after all, then I like that and eventually arriving at:

given x:T, y: U
loop { }: !
loop { break }: ()
loop { break x }: T
while foo { }: ()
while foo { break }: Option<()>
while foo { break x }: Option<T>
for _ in foo { }: ()
for _ in foo { break }: Option<()>
for _ in foo { break x }: Option<T>
while let PAT = y { }: U // y if it doesn't match the pattern
while let PAT = y { break }: Result<(), U>
while let PAT = y { break x }: Result<T, U>

If semicolons are a no-go, I like @glaebhoerl's plan (but what do you do with while let?), or my "plan 2" (#955 (comment)), which is forwards compatible with @glaebhoerl's plan and backwards compatible with the status quo, and thus can be viewed as a stepping stone.

@bill-myers

Requiring semicolons seems a non-starter since other C-like languages don't and it would be extremely surprising for programmers (most likely, expletives would be uttered upon discovering this rule).

Doing nothing and in the future making "break" and "break ()" do different things instead seems completely fine and perhaps the best option, and might actually be good as it emphasizes the fact that the behavior of "break ()" is something new compared to current C-like languages (where for/while is not an expression), and also emphasizes that the programmer wants to know whether the loop was exited by breaking or not.

@Diggsey
Contributor

Diggsey commented Mar 12, 2015

I like @glaebhoerl 's plan too, because it doesn't change the meaning of existing loops, but merely extends them in a consistent fashion. I also prefer the "else" syntax to using "Option", as it makes the common use-case clear and simple:

    for x in xs {
        if x.foo == y { break x; }
    } else {
        // Handle missing case
    }

While of course the same thing can be achieved with "Option", it requires either an additional "if" or "match" statement, or a closure. Using "else" just feels leaner and more straightforward, even though it is technically introducing new syntax.
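
For comparison, here is a sketch of the "Option" version of that search. The Item type, its name field, and the "missing" fallback are assumptions made up for illustration:

```rust
// Hypothetical element type with the `foo` field used in the search.
struct Item {
    foo: u32,
    name: &'static str,
}

// The search expressed with `Option`: a closure for the predicate,
// plus the extra `match` to handle the missing case.
fn name_for(xs: &[Item], y: u32) -> &'static str {
    match xs.iter().find(|x| x.foo == y) {
        Some(x) => x.name,
        None => "missing", // handle missing case
    }
}

fn main() {
    let xs = [Item { foo: 1, name: "one" }, Item { foo: 2, name: "two" }];
    println!("{} {}", name_for(&xs, 2), name_for(&xs, 9));
}
```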

@aidancully

@bill-myers Thank you for putting your case against semicolons so plainly. I disagree because brace-blocks mean something completely different in Rust than they do in other C-like languages: in other languages, they are used to group statements. In Rust, a brace-block is generally an expression. Expressions are generally terminated with semicolons. Requiring a semicolon makes this fundamental difference in brace-usage actually appear different in code, which I think is to the good.

If semicolons were required, then a programmer used to C who forgets a semicolon on the end of a while-loop would probably utter an expletive. But that would be followed by an "aha" moment when they understand what's going on, and the language's behavior and design would become more clear as a result. That is, the confusion is brief, and isolated to developers learning the language for the first time (and who should, as such, be expecting to learn new concepts as they try the language out).

On the other hand, if semicolons are not required to terminate statements, then when you run into situations like @pnkfelix described in the appendix to this RFC (showing an ambiguity between an if statement and an if expression), it would be confusing even to intermediate Rust developers. There is no initial confusion or expletive uttered when working with the language for the first time, but on the other hand, that allows a possibly mistaken interpretation of brace-expressions to become ingrained in the reader's mind, so that the expletive is uttered much later, more rarely, but with more force.

To make this more concrete, I'd advocate for a rule that all statements should be terminated with semicolons. Braces in most Rust code blocks denote expressions. Expressions and statements should be unambiguous.

@pnkfelix
Member Author

pnkfelix commented Apr 9, 2015

(i am no longer a proponent of doing this, if i ever was ...; closing.)

@pnkfelix pnkfelix closed this Apr 9, 2015
@Ericson2314
Contributor

To be clear, is just the stopgap ruled out now? Are the backwards compatible plans still on the table for post 1.0?

@pnkfelix
Member Author

@Ericson2314 the hypothetical feature of allowing loops to return values is still a potential future feature, at least in the sense that #961 remains open (just postponed for post-1.0).

The particular detail of one kind of future proofing proposed by this RFC #955, however, is no longer on the table. (Or at least, I am no longer pushing for it, and it is quite unlikely to get put into 1.0.)

@Ericson2314
Contributor

Gotcha---didn't realize a postponement issue was made when the original RFC was closed.

9 participants