diff --git a/0000-template.md b/0000-template.md index 629e4e4a37e..ef898e3360a 100644 --- a/0000-template.md +++ b/0000-template.md @@ -1,29 +1,47 @@ +- Feature Name: (fill me in with a unique ident, my_awesome_feature) - Start Date: (fill me in with today's date, YYYY-MM-DD) -- RFC PR #: (leave this empty) -- Rust Issue #: (leave this empty) +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) # Summary +[summary]: #summary One para explanation of the feature. # Motivation +[motivation]: #motivation Why are we doing this? What use cases does it support? What is the expected outcome? # Detailed design +[design]: #detailed-design This is the bulk of the RFC. Explain the design in enough detail for somebody familiar with the language to understand, and for somebody familiar with the compiler to implement. This should get into specifics and corner-cases, and include examples of how the feature is used. +# How We Teach This +[how-we-teach-this]: #how-we-teach-this + +What names and terminology work best for these concepts and why? +How is this idea best presented—as a continuation of existing Rust patterns, or as a wholly new one? + +Would the acceptance of this proposal change how Rust is taught to new users at any level? +How should this feature be introduced and taught to existing Rust users? + +What additions or changes to the Rust Reference, _The Rust Programming Language_, and/or _Rust by Example_ does it entail? + # Drawbacks +[drawbacks]: #drawbacks Why should we *not* do this? # Alternatives +[alternatives]: #alternatives What other designs have been considered? What is the impact of not doing this? # Unresolved questions +[unresolved]: #unresolved-questions What parts of the design are still TBD? diff --git a/README.md b/README.md index db8eeea30de..5680d0b0e2b 100644 --- a/README.md +++ b/README.md @@ -1,78 +1,265 @@ # Rust RFCs +[Rust RFCs]: #rust-rfcs -Many changes, including bug fixes and documentation improvements can be +Many changes, including bug fixes and documentation improvements can be implemented and reviewed via the normal GitHub pull request workflow. -Some changes though are "substantial", and we ask that these be put -through a bit of a design process and produce a consensus among the Rust -community and the [core team]. +Some changes though are "substantial", and we ask that these be put +through a bit of a design process and produce a consensus among the Rust +community and the [sub-team]s. The "RFC" (request for comments) process is intended to provide a -consistent and controlled path for new features to enter the language -and standard libraries, so that all stakeholders can be confident about +consistent and controlled path for new features to enter the language +and standard libraries, so that all stakeholders can be confident about the direction the language is evolving in. +## Table of Contents +[Table of Contents]: #table-of-contents +* [Opening](#rust-rfcs) +* [Table of Contents] +* [When you need to follow this process] +* [Before creating an RFC] +* [What the process is] +* [The role of the shepherd] +* [The RFC life-cycle] +* [Reviewing RFC's] +* [Implementing an RFC] +* [RFC Postponement] +* [Help this is all too informal!] + + ## When you need to follow this process +[When you need to follow this process]: #when-you-need-to-follow-this-process -You need to follow this process if you intend to make "substantial" -changes to the Rust distribution. 
What constitutes a "substantial" -change is evolving based on community norms, but may include the following. +You need to follow this process if you intend to make "substantial" changes to +Rust, Cargo, Crates.io, or the RFC process itself. What constitutes a +"substantial" change is evolving based on community norms and varies depending +on what part of the ecosystem you are proposing to change, but may include the +following. - Any semantic or syntactic change to the language that is not a bugfix. - - Changes to the interface between the compiler and libraries, -including lang items and intrinsics. - - Additions to `std` + - Removing language features, including those that are feature-gated. + - Changes to the interface between the compiler and libraries, including lang + items and intrinsics. + - Additions to `std`. Some changes do not require an RFC: - - Rephrasing, reorganizing, refactoring, or otherwise "changing shape + - Rephrasing, reorganizing, refactoring, or otherwise "changing shape does not change meaning". - - Additions that strictly improve objective, numerical quality -criteria (warning removal, speedup, better platform coverage, more + - Additions that strictly improve objective, numerical quality +criteria (warning removal, speedup, better platform coverage, more parallelism, trap more errors, etc.) - - Additions only likely to be _noticed by_ other developers-of-rust, + - Additions only likely to be _noticed by_ other developers-of-rust, invisible to users-of-rust. -If you submit a pull request to implement a new feature without going -through the RFC process, it may be closed with a polite request to +If you submit a pull request to implement a new feature without going +through the RFC process, it may be closed with a polite request to submit an RFC first. +For more details on when an RFC is required, please see the following specific +guidelines, these correspond with some of the Rust community's +[sub-teams](http://www.rust-lang.org/team.html): + +* [language changes](lang_changes.md), +* [library changes](libs_changes.md), +* [compiler changes](compiler_changes.md). + + +## Before creating an RFC +[Before creating an RFC]: #before-creating-an-rfc + +A hastily-proposed RFC can hurt its chances of acceptance. Low quality +proposals, proposals for previously-rejected features, or those that +don't fit into the near-term roadmap, may be quickly rejected, which +can be demotivating for the unprepared contributor. Laying some +groundwork ahead of the RFC can make the process smoother. + +Although there is no single way to prepare for submitting an RFC, it +is generally a good idea to pursue feedback from other project +developers beforehand, to ascertain that the RFC may be desirable: +having a consistent impact on the project requires concerted effort +toward consensus-building. + +The most common preparations for writing and submitting an RFC include +talking the idea over on #rust-internals, filing and discusssing ideas +on the [RFC issue tracker][issues], and occasionally posting +'pre-RFCs' on [the developer discussion forum][discuss] for early +review. + +As a rule of thumb, receiving encouraging feedback from long-standing +project developers, and particularly members of the relevant [sub-team] +is a good indication that the RFC is worth pursuing. 
+ +[issues]: https://github.com/rust-lang/rfcs/issues +[discuss]: http://internals.rust-lang.org/ + + ## What the process is +[What the process is]: #what-the-process-is -In short, to get a major feature added to Rust, one must first get the -RFC merged into the RFC repo as a markdown file. At that point the RFC -is 'active' and may be implemented with the goal of eventual inclusion +In short, to get a major feature added to Rust, one must first get the +RFC merged into the RFC repo as a markdown file. At that point the RFC +is 'active' and may be implemented with the goal of eventual inclusion into Rust. * Fork the RFC repo http://github.com/rust-lang/rfcs -* Copy `0000-template.md` to `active/0000-my-feature.md` (where -'my-feature' is descriptive. don't assign an RFC number yet). -* Fill in the RFC -* Submit a pull request. The pull request is the time to get review of -the design from the larger community. -* Build consensus and integrate feedback. RFCs that have broad support -are much more likely to make progress than those that don't receive any -comments. -* Eventually, somebody on the [core team] will either accept the RFC by -merging the pull request and assigning the RFC a number, at which point -the RFC is 'active', or reject it by closing the pull request. - -Once an RFC becomes active then authors may implement it and submit the -feature as a pull request to the Rust repo. An 'active' is not a rubber -stamp, and in particular still does not mean the feature will ultimately -be merged; it does mean that in principle all the major stakeholders -have agreed to the feature and are amenable to merging it. - -Modifications to active RFC's can be done in followup PR's. An RFC that -makes it through the entire process to implementation is considered -'complete' and is moved to the 'complete' folder; an RFC that fails -after becoming active is 'inactive' and moves to the 'inactive' folder. +* Copy `0000-template.md` to `text/0000-my-feature.md` (where 'my-feature' is +descriptive. don't assign an RFC number yet). +* Fill in the RFC. Put care into the details: RFCs that do not present +convincing motivation, demonstrate understanding of the impact of the design, or +are disingenuous about the drawbacks or alternatives tend to be poorly-received. +* Submit a pull request. As a pull request the RFC will receive design feedback +from the larger community, and the author should be prepared to revise it in +response. +* Each pull request will be labeled with the most relevant [sub-team]. +* Each sub-team triages its RFC pull requests. The sub-team will either close +the pull request (for RFCs that clearly will not be accepted) or assign it a +*shepherd*. The shepherd is a trusted developer who is familiar with the RFC +process, who will help to move the RFC forward, and ensure that the right people +see and review it. +* Build consensus and integrate feedback. RFCs that have broad support are much +more likely to make progress than those that don't receive any comments. The +shepherd assigned to your RFC should help you get feedback from Rust developers +as well. +* The shepherd may schedule meetings with the author and/or relevant +stakeholders to discuss the issues in greater detail. +* The sub-team will discuss the RFC pull request, as much as possible in the +comment thread of the pull request itself. Offline discussion will be summarized +on the pull request comment thread. +* RFCs rarely go through this process unchanged, especially as alternatives and +drawbacks are shown. 
You can make edits, big and small, to the RFC to
+clarify or change the design, but make changes as new commits to the pull
+request, and leave a comment on the pull request explaining your changes.
+Specifically, do not squash or rebase commits after they are visible on the pull
+request.
+* Once both proponents and opponents have clarified and defended positions and
+the conversation has settled, the RFC will enter its *final comment period*
+(FCP). This is a final opportunity for the community to comment on the pull
+request and is a reminder for all members of the sub-team to be aware of the
+RFC.
+* The FCP lasts one week. It may be extended if consensus between sub-team
+members cannot be reached. At the end of the FCP, the [sub-team] will either
+accept the RFC by merging the pull request, assigning the RFC a number
+(corresponding to the pull request number), at which point the RFC is 'active',
+or reject it by closing the pull request. How exactly the sub-team decides on an
+RFC is up to the sub-team.
+
+
+## The role of the shepherd
+[The role of the shepherd]: #the-role-of-the-shepherd
+
+During triage, every RFC will either be closed or assigned a shepherd from the
+relevant sub-team. The role of the shepherd is to move the RFC through the
+process. This starts with simply reading the RFC in detail and providing initial
+feedback. The shepherd should also solicit feedback from people who are likely
+to have strong opinions about the RFC. When this feedback has been incorporated
+and the RFC seems to be in a steady state, the shepherd and/or sub-team leader
+will announce an FCP. In general, the idea here is to "front-load" as much of
+the feedback as possible before the point where we actually reach a decision -
+by the end of the FCP, the decision on whether or not to accept the RFC should
+usually be obvious from the RFC discussion thread. On occasion, there may not be
+consensus but discussion has stalled. In this case, the relevant team will make
+a decision.
+
+
+## The RFC life-cycle
+[The RFC life-cycle]: #the-rfc-life-cycle
+
+Once an RFC becomes active then authors may implement it and submit
+the feature as a pull request to the Rust repo. Being 'active' is not
+a rubber stamp, and in particular still does not mean the feature will
+ultimately be merged; it does mean that in principle all the major
+stakeholders have agreed to the feature and are amenable to merging
+it.
+
+Furthermore, the fact that a given RFC has been accepted and is
+'active' implies nothing about what priority is assigned to its
+implementation, nor does it imply anything about whether a Rust
+developer has been assigned the task of implementing the feature.
+While it is not *necessary* that the author of the RFC also write the
+implementation, it is by far the most effective way to see an RFC
+through to completion: authors should not expect that other project
+developers will take on responsibility for implementing their accepted
+feature.
+
+Modifications to active RFC's can be done in follow-up pull requests. We strive
+to write each RFC in a manner that will reflect the final design of
+the feature; but the nature of the process means that we cannot expect
+every merged RFC to actually reflect what the end result will be at
+the time of the next major release.
+
+In general, once accepted, RFCs should not be substantially changed. Only very
+minor changes should be submitted as amendments. More substantial changes should
+be new RFCs, with a note added to the original RFC.
Exactly what counts as a +"very minor change" is up to the sub-team to decide. There are some more +specific guidelines in the sub-team RFC guidelines for the [language](lang_changes.md), +[libraries](libs_changes.md), and [compiler](compiler_changes.md). + + +## Reviewing RFC's +[Reviewing RFC's]: #reviewing-rfcs + +While the RFC pull request is up, the shepherd may schedule meetings with the +author and/or relevant stakeholders to discuss the issues in greater +detail, and in some cases the topic may be discussed at a sub-team +meeting. In either case a summary from the meeting will be +posted back to the RFC pull request. + +A sub-team makes final decisions about RFCs after the benefits and drawbacks are +well understood. These decisions can be made at any time, but the sub-team will +regularly issue decisions. When a decision is made, the RFC pull request will +either be merged or closed. In either case, if the reasoning is not clear from +the discussion in thread, the sub-team will add a comment describing the +rationale for the decision. + + +## Implementing an RFC +[Implementing an RFC]: #implementing-an-rfc + +Some accepted RFC's represent vital features that need to be +implemented right away. Other accepted RFC's can represent features +that can wait until some arbitrary developer feels like doing the +work. Every accepted RFC has an associated issue tracking its +implementation in the Rust repository; thus that associated issue can +be assigned a priority via the triage process that the team uses for +all issues in the Rust repository. + +The author of an RFC is not obligated to implement it. Of course, the +RFC author (like any other developer) is welcome to post an +implementation for review after the RFC has been accepted. + +If you are interested in working on the implementation for an 'active' +RFC, but cannot determine if someone else is already working on it, +feel free to ask (e.g. by leaving a comment on the associated issue). + + +## RFC Postponement +[RFC Postponement]: #rfc-postponement + +Some RFC pull requests are tagged with the 'postponed' label when they are +closed (as part of the rejection process). An RFC closed with “postponed” is +marked as such because we want neither to think about evaluating the proposal +nor about implementing the described feature until some time in the future, and +we believe that we can afford to wait until then to do so. Historically, +"postponed" was used to postpone features until after 1.0. Postponed pull +requests may be re-opened when the time is right. We don't have any formal +process for that, you should ask members of the relevant sub-team. + +Usually an RFC pull request marked as “postponed” has already passed +an informal first round of evaluation, namely the round of “do we +think we would ever possibly consider making this change, as outlined +in the RFC pull request, or some semi-obvious variation of it.” (When +the answer to the latter question is “no”, then the appropriate +response is to close the RFC, not postpone it.) + ### Help this is all too informal! +[Help this is all too informal!]: #help-this-is-all-too-informal -The process is intended to be as lightweight as reasonable for the -present circumstances. As usual, we are trying to let the process be -driven by consensus and community norms, not impose more structure than +The process is intended to be as lightweight as reasonable for the +present circumstances. 
As usual, we are trying to let the process be +driven by consensus and community norms, not impose more structure than necessary. -[core team]: https://github.com/mozilla/rust/wiki/Note-core-team +[sub-team]: http://www.rust-lang.org/team.html diff --git a/active/0003-opt-in-builtin-traits.md b/active/0003-opt-in-builtin-traits.md deleted file mode 100644 index 499aa93f428..00000000000 --- a/active/0003-opt-in-builtin-traits.md +++ /dev/null @@ -1,437 +0,0 @@ -- Start Date: 2014-03-24 -- RFC PR #: 19 -- Rust Issue #: 13231 - -# Summary - -- Rather than determining membership in the builtin traits - automatically, use `impl` (and `#\[deriving]`) declarations as with - other traits. -- The compiler will check that for each such `impl` declaration the - type meets certain criteria (i.e., to implement `Send` for a struct - `S`, all fields of `S` must have types which are `Send`). -- To check for membership in a builtin trait, we employ a slightly - modified version of the standard trait matching algorithm. - Modifications are needed because the language cannot express the - full set of impls we would require. -- Rename `Pod` trait to `Copy`. - -# Motivation - -In today's Rust, there are a number of builtin traits (sometimes -called "kinds"): `Send`, `Share`, and `Pod` (in the future, perhaps -`Sized`, but the details of that differ and will addressed in the DST -RFC). These are expressed as traits, but they are quite unlike other -traits in certain ways. One way is that they do not have any methods; -instead, implementing a trait like `Send` indicates that the type has -certain properties (defined below). The biggest difference, though, is -that these traits are not implemented manually by users. Instead, the -compiler decides automatically whether or not a type implements them -based on the contents of the type. - -This RFC argues to change this system and instead have users manually -implement the builtin traits for new types that they define. -Naturally there would be `#[deriving]` options as well for -convenience. The compiler's rules (e.g., that a sendable value cannot -reach a non-sendable value) would still be enforced, but at the point -where a builtin trait is explicitly implemented, rather than being -automatically deduced. - -There are a couple of reasons to make this change: - -1. **Consistency.** All other traits are opt-in, including very common - traits like `Eq` and `Clone`. It is somewhat surprising that the - builtin traits act differently. -2. **API Stability.** The builtin traits that are implemented by a - type are really part of its public API, but unlike other similar - things they are not declared. This means that seemingly innocent - changes to the definition of a type can easily break downstream - users. For example, imagine a type that changes from POD to non-POD - -- suddenly, all references to instances of that type go from - copies to moves. Similarly, a type that goes from sendable to - non-sendable can no longer be used as a message. By opting in to - being POD (or sendable, etc), library authors make explicit what - properties they expect to maintain, and which they do not. -3. **Pedagogy.** Many users find the distinction between pod types - (which copy) and linear types (which move) to be surprising. Making - pod-ness opt-in would help to ease this confusion. -4. **Safety and correctness.** In the presence of unsafe code, - compiler inference is unsound, and it is unfortunate that users - must remember to "opt out" from inapplicable kinds. 
There are also - concerns about future compatibility. Even in safe code, it can also - be useful to impose additional usage constriants beyond those - strictly required for type soundness. - -More details about these points are provided after the -`Detailed design` section. - -# Detailed design - -I will first cover the existing builtin traits and define what they -are used for. I will then explain each of the above reasons in more -detail. Finally, I'll give some syntax examples. - -## The builtin traits - -We currently define the following builtin traits: - -- `Send` -- a type that deeply owns all its contents. - (Examples: `int`, `~int`, `Cell`, not `&int` or `Rc`) -- `Pod` -- "plain old data" which can be safely copied via memcpy. - (Examples: `int`, `&int`, not `~int` or `&mut int`) -- `Share` -- a type which is threadsafe when accessed via an `&T` - reference. (Examples: `int`, `~int`, `&int`, `&mut int`, - `Atomic`, not `Cell` or `Rc`) - -Note that `Pod` is a proper subset of `Send`, but `Send` and `Share` -are unrelated: - -- `Cell` is `Send` but not `Share`. -- `&uint` is `Share` but not `Send`. - -## Proposed syntax - -Under this proposal, for a struct or enum to be considered send, -share, or pod, those traits must be explicitly implemented: - - struct Foo { ... } - impl Send for Foo { } - impl Pod for Foo { } - impl Share for Foo { } - -As usual, deriving forms would be available. - -Builtin traits can only be implemented for struct or enum types and -only within the crate in which that struct or enum is defined (see the -section on *Matching and Coherence* below). Whenever a builtin trait -is implemented, the compiler will enforce that all fields or that -struct/enum are of a type which implements the trait (or else of -`Unsafe` type, which matches all traits, see *Matching and -Coherence*). - - struct Foo<'a> { x: &'a int } - - // ERROR: Cannot implement `Send` because the field `x` has type - // `&'a int` which is not sendable. - impl<'a> Send for Foo<'a> { } - -For generic types, conditional impls are often required to avoid -errors. In the case of `Option`, for example, we must know that the -type `T` implements (e.g.) `Send` before we can implement `Send` for -`Option`: - - enum Option { Some(T), None } - impl Send for Option { } // ERROR: T may not implement `Send` - -Rewriting that code using a conditional impl would be fine: - - enum Option { Some(T), None } - impl Send for Option { } // ERROR: T may not implement `Send` - -(This is of course precisely what `#[deriving(Send)]` would generate.) - -## Naming of Pod - -Part of the proposal is to rename `Pod` to `Copy` so as to better -align the names of the builtin traits (they would not all be verbs). - -## Copy and linearity - -One of the most important aspects of this proposal is that the `Copy` -trait would be something that one "opts in" to. This means that -structs and enums would *move by default* unless their type is -explicitly declared to be `Copy`. So, for example, the following code -would be in error: - - struct Point { x: int, y: int } - ... - let p = Point { x: 1, y: 2 }; - let q = p; // moves p - print(p.x); // ERROR - -To allow that example, one would have to impl `Copy` for `Point`: - - struct Point { x: int, y: int } - impl Copy for Point { } - ... - let p = Point { x: 1, y: 2 }; - let q = p; // copies p, because Point is Pod - print(p.x); // OK - -Effectively this change introduces a three step ladder for types: - -1. 
If you do nothing, your type is *linear*, meaning that it moves - from place to place and can never be copied in any way. (We need a - better name for that.) -2. If you implement `Clone`, your type is *cloneable*, meaning that it - moves from place to place, but it can be explicitly cloned. This is - suitable for cases where copying is expensive. -3. If you implement `Copy`, your type is *copyable*, meaning that - it is just copied by default without the need for an explicit - clone. This is suitable for small bits of data like ints or - points. - -What is nice about this change is that when a type is defined, the -user makes an *explicit choice* between these three options. - -## Matching and coherence - -In general, determining whether a type implements a builtin trait can -follow the existing trait matching algorithm, but it will have to be -somewhat specialized. The problem is that we are somewhat limited in -the kinds of impls that we can write, so some of the implementations -we would want must be "hard-coded". - -Specifically we are limited around tuples, fixed-length array types, -proc types, closure types, and trait types: - -- *Fixed-length arrays:* A fixed-length array `[T, ..n]` is `Send/Copy/Share` - if `T` is `Send/Copy/Share`, regardless of `n`. (Conceivably, we could - also say that if `n` is `0`, then `[T, ..n]` is `Send/Copy/Share` regardless - of `T`). -- *Tuples*: A tuple `(T_0, ..., T_n)` is `Send/Copy/Share` depending - if, for all `i`, `T_i` is `Send/Copy/Share`. -- *Closures*: A closure type `|T_0, ..., T_n|:K -> T_n+1` is never - `Send` nor `Copy`. It is `Share` iff `K` is `Share`. -- *Procs*: A proc type `proc(T_0, ..., T_n):K -> T_n+1` is - never `Copy`. It is `Send/Share` iff `K` is `Send/Share`. -- *Trait objects*: A trait object type `Trait:K` (assuming DST here ;) is - never `Copy`. It may be `Send/Share` iff `K` is `Send/Share`. - -We cannot currently express the above conditions using impls. We may -at some point in the future grow the ability to express some of them. -For now, though, these "impls" will be hardcoded into the algorithm. - -Otherwise, the complete list of builtin impls is roughly like this -(undoubtedly I am missing a few things): - - trait Send; - trait Share; - trait Copy; // aka Pod - - impl Copy for "scalars like uint, u8, etc" { } - impl Copy for *T { } - impl<'a,T> Copy for &'a T { } - - impl Send for "scalars like uint, u8, etc" { } - impl for *T { } - impl for ~T { } - - impl Share for "scalars like uint, u8, etc" { } - impl for *T { } - impl for ~T { } - impl<'a,T:Share> for &'a T { } - impl<'a,T:Share> for &'a mut T { } // (if this surprises you, see * below) - -Per the usual coherence rules, since we will have the above impls in -`libstd`, and we will have impls for types like tuples and -fixed-length arrays baked in, the only impls that end users are -permitted to write are impls for struct and enum types that they -define themselves. This is simply an extra coherence rule, hard-coded -because some of the impls (e.g., for tuples) are hard-coded. - -(\*) Wait, `&mut T` is `Share`? How is that threadsafe? - -Somewhat surprisingly, `&mut T` is share. Remember, a type `U` is -share if all possible operations on `&U` are threadsafe. In this case, -`U` is `&mut T`, this means we have to consider what operations are -possible on a `& &mut T`. In that case ,the `&mut T` is found in an -aliasable location and hence is immutable (if you can find a counter -example, that's definitely a bug). 
- -Moreover, there is one further exception to the rules. The -`Unsafe` type is *always* considered to implement `Share`, no -matter the type `T`. `Send` and `Copy` are implemented if `T` is -`Send` and `Copy`. The motivation here is that we want to be able to -permit a type like `Mutex` to be `Share` even if it closes over data -that is not `Share`. - -# Implementation plan - -Here is a loose implementation plan that @flaper87 and I worked -out. No doubt things will change along the way. - -1. Create a nicely encapsulated subroutine S to check whether type T - meets bound B For example, to test that some type T is Pod. @eddyb - did something recently you can use as an example, where he added - some code to do vtable matching for the Drop trait from trans. One - catch is that we will definitely want some sort of cache. - -2. Modify the vtable code to handle builtin bounds and add builtin - impls (see below) - - We'll need special code to accommodate the types detailed above - -3. Use the subroutine S in moves.rs to do the "is pod" check. - -4. Same for rustc::middle::kind, except that we should move the "check - bounds on type parameters" into type check. - - Why do this? Because these checks will now be so close to vtable - matching it no longer makes sense to do them in `kind.rs` - -5. Check to make sure that the impls the user provides are safe: - - User-defined impls can only apply to enums or structs - - If implementing a builtin trait T for a struct type S, each - field of S must have a type that implements S. - - same for enums, but "for each variant, for each argument" essentially - -# Expanded motivation - -Now that the detailed design is presented, I wanted to expand more on -the motivation. - -## Consistency - -This change would bring the builtin traits more in line with other -common traits, such as `Eq` and `Clone`. On a historical note, this -proposal continues a trend, in that both of those operations used to -be natively implemented by the compiler as well. - -## API Stability - -The set of builtin traits implemented by a type must be considered -part of its public inferface. At present, though, it's quite invisible -and not under user control. If a type is changed from `Pod` to -non-pod, or `Send` to non-send, no error message will result until -client code attempts to use an instance of that type. In general we -have tried to avoid this sort of situation, and instead have each -declaration contain enough information to check it indepenently of its -uses. Issue #12202 describes this same concern, specifically with -respect to stability attributes. - -Making opt-in explicit effectively solves this problem. It is clearly -written out which traits a type is expected to fulfill, and if the -type is changed in such a way as to violate one of these traits, an -error will be reported at the `impl` site (or `#[deriving]` -declaration). - -## Pedagogy - -When users first start with Rust, ownership and ownership transfer is -one of the first things that they must learn. This is made more -confusing by the fact that types are automatically divided into pod -and non-pod without any sort of declaration. It is not necessarily -obvious why a `T` and `~T` value, which are *semantically equivalent*, -behave so differently by default. Makes the pod category something you -opt into means that types will all be linear by default, which can -make teaching and leaning easier. 
- -## Safety and correctness: unsafe code - -For safe code, the compiler's rules for deciding whether or not a type -is sendable (and so forth) are perfectly sound. However, when unsafe -code is involved, the compiler may draw the wrong conclusion. For such -cases, types must *opt out* of the builtin traits. - -In general, the *opt out* approach seems to be hard to reason about: -many people (including myself) find it easier to think about what -properties a type *has* than what properties it *does not* have, -though clearly the two are logically equivalent in this binary world -we programmer's inhabit. - -More concretely, opt out is dangerous because it means that types with -unsafe methods are generally *wrong by default*. As an example, -consider the definition of the `Cell` type: - - struct Cell { - priv value: T - } - -This is a perfectly ordinary struct, and hence the compiler would -conclude that cells are freezable (if `T` is freezable) and so forth. -However, the *methods* attached to `Cell` use unsafe magic to mutate -`value`, even when the `Cell` is aliased: - - impl Cell { - pub fn set(&self, value: T) { - unsafe { - *cast::transmute_mut(&self.value) = value - } - } - } - -To accommodate this, we currently use *marker types* -- special types -known to the compiler which are considered nonpod and so forth. Therefore, -the full definition of `Cell` is in fact: - - pub struct Cell { - priv value: T, - priv marker1: marker::InvariantType, - priv marker2: marker::NoFreeze, - } - -Note the two markers. The first, `marker1`, is a hint to the variance -engine indicating that the type `Cell` must be invariant with respect -to its type argument. The second, `marker2`, indicates that `Cell` is -non-freeze. This then informs the compiler that the referent of a -`&Cell` can't be considered immutable. The problem here is that, if -you don't know to opt-out, you'll wind up with a type definition that -is unsafe. - -This argument is rather weakened by the continued necessity of a -`marker::InvariantType` marker. This could be read as an argument -towards explicit variance. However, I think that in this particular -case, the better solution is to introduce the `Mut` type described -in #12577 -- the `Mut` type would give us the invariance. - -Using `Mut` brings us back to a world where any type that uses -`Mut` to obtain interior mutability is correct by default, at least -with respect to the builtin kinds. Types like `Atomic` and -`Volatile`, which guarantee data race freedom, would therefore have -to *opt in* to the `Share` kind, and types like `Cell` would simply -do nothing. - -## Safety and correctness: future compatibility - -Another concern about having the compiler automatically infer -membership into builtin bounds is that we may find cause to add new -bounds in the future. In that case, existing Rust code which uses -unsafe methods might be inferred incorrectly, because it would not -know to opt out of those future bounds. Therefore, any future bounds -will *have* to be opt out anyway, so perhaps it is best to be -consistent from the start. - -## Safety and correctness: semantic constraints - -Even if type safety is maintained, some types ought not to be copied -for semantic reasons. An example from the compiler is the -`Datum` type, which is used in code generation to represent -the computed result of an rvalue expression. 
At present, the type -`Rvalue` implements a (empty) destructor -- the sole purpose of this -destructor is to ensure that datums are not consumed more than once, -because this would likely correspond to a code gen bug, as it would -mean that the result of the expression evaluation is consumed more -than once. Another example might be a newtype'd integer used for -indexing into a thread-local array: such a value ought not to be -sendable. And so forth. Using marker types for these kinds of -situations, or empty destructors, is very awkward. Under this -proposal, users needs merely refrain from implementing the relevant -traits. - -# Alternatives and counterarguments - -The downsides of this proposal are: - -- There is some annotation burden. I had intended to gather statistics - to try and measure this but have not had the time. - -- If a library forgets to implement all the relevant traits for a - type, there is little recourse for users of that library beyond pull - requests to the original repository. This is already true with - traits like `Eq` and `Ord`. However, as SiegeLord noted on IRC, that - you can often work around the absence of `Eq` with a newtype - wrapper, but this is not true if a type fails to implement `Send` or - `Copy`. This danger (forgetting to implement traits) is essentially - the counterbalance to the "forward compatbility" case made above: - where implementing traits by default means types may implement too - much, forcing explicit opt in means types may implement too little. - One way to mitigate this problem would be to have a lint for when an - impl of some kind (etc) would be legal, but isn't implemented, at - least for publicly exported types in library crates. - -What other designs have been considered? What is the impact of not doing this? - -# Unresolved questions - -Do we want some kind of shorthand for common trait combinations? I -originally proposed `Data` but we couldn't settle on what a useful set -of trait combinations would be. This can easily be added later. diff --git a/compiler_changes.md b/compiler_changes.md new file mode 100644 index 00000000000..4b9f8cdf17f --- /dev/null +++ b/compiler_changes.md @@ -0,0 +1,53 @@ +# RFC policy - the compiler + +We have not previously had an RFC system for compiler changes, so policy here is +likely to change as we get the hang of things. We don't want to slow down most +compiler development, but on the other hand we do want to do more design work +ahead of time on large additions and refactorings. + +Compiler RFCs will be managed by the compiler sub-team, and tagged `T-compiler`. +The compiler sub-team will do an initial triage of new PRs within a week of +submission. The result of triage will either be that the PR is assigned to a +member of the sub-team for shepherding, the PR is closed because the sub-team +believe it should be done without an RFC, or closed because the sub-team feel it +should clearly not be done and further discussion is not necessary. We'll follow +the standard procedure for shepherding, final comment period, etc. + +Where there is significant design work for the implementation of a language +feature, the preferred workflow is to submit two RFCs - one for the language +design and one for the implementation design. The implementation RFC may be +submitted later if there is scope for large changes to the language RFC. 
+
+
+## Changes which need an RFC
+
+* New lints (these fall under the lang team)
+* Large refactorings or redesigns of the compiler
+* Changing the API presented to syntax extensions or other compiler plugins in
+  non-trivial ways
+* Adding, removing, or changing a stable compiler flag
+* The implementation of new language features where there is significant change
+  or addition to the compiler. There is obviously some room for interpretation
+  about what constitutes a "significant" change and how much detail the
+  implementation RFC needs. For guidance, [associated items](text/0195-associated-items.md)
+  and [UFCS](text/0132-ufcs.md) would clearly need an implementation RFC,
+  [type ascription](text/0803-type-ascription.md) and
+  [lifetime elision](text/0141-lifetime-elision.md) would not.
+* Any other change which causes backwards incompatible changes to stable
+  behaviour of the compiler, language, or libraries
+
+
+## Changes which don't need an RFC
+
+* Bug fixes, improved error messages, etc.
+* Minor refactoring/tidying up
+* Implementing language features which have an accepted RFC, where the
+  implementation does not significantly change the compiler or require
+  significant new design work
+* Adding unstable API for tools (note that all compiler API is currently unstable)
+* Adding, removing, or changing an unstable compiler flag (if the compiler flag
+  is widely used there should be at least some discussion on discuss, or an RFC
+  in some cases)
+
+If in doubt it is probably best to just announce the change you want to make to
+the compiler subteam on discuss or IRC, and see if anyone feels it needs an RFC.
diff --git a/lang_changes.md b/lang_changes.md
new file mode 100644
index 00000000000..bc09d9a417e
--- /dev/null
+++ b/lang_changes.md
@@ -0,0 +1,38 @@
+# RFC policy - language design
+
+Pretty much every change to the language needs an RFC. Note that new
+lints (or major changes to an existing lint) are considered changes to
+the language.
+
+Language RFCs are managed by the language sub-team, and tagged `T-lang`. The
+language sub-team will do an initial triage of new PRs within a week of
+submission. The result of triage will either be that the PR is assigned to a
+member of the sub-team for shepherding, the PR is closed as postponed because
+the subteam believe it might be a good idea, but is not currently aligned with
+Rust's priorities, or the PR is closed because the sub-team feel it should
+clearly not be done and further discussion is not necessary. In the latter two
+cases, the sub-team will give a detailed explanation. We'll follow the standard
+procedure for shepherding, final comment period, etc.
+
+
+## Amendments
+
+Sometimes in the implementation of an RFC, changes are required. In general
+these don't require an RFC as long as they are very minor and in the spirit of
+the accepted RFC (essentially bug fixes). In this case implementers should
+submit an RFC PR which amends the accepted RFC with the new details. Although
+the RFC repository is not intended as a reference manual, it is preferred that
+RFCs do reflect what was actually implemented. Amendment RFCs will go through
+the same process as regular RFCs, but should be less controversial and thus
+should move more quickly.
+
+When a change is more dramatic, it is better to create a new RFC. The RFC should
+be standalone and reference the original, rather than modifying the existing
+RFC. You should add a comment to the original RFC referencing the new RFC
+as part of the PR.
+ +Obviously there is some scope for judgment here. As a guideline, if a change +affects more than one part of the RFC (i.e., is a non-local change), affects the +applicability of the RFC to its motivating use cases, or there are multiple +possible new solutions, then the feature is probably not 'minor' and should get +a new RFC. diff --git a/libs_changes.md b/libs_changes.md new file mode 100644 index 00000000000..31f1de0210d --- /dev/null +++ b/libs_changes.md @@ -0,0 +1,114 @@ +# RFC guidelines - libraries sub-team + +# Motivation + +* RFCs are heavyweight: + * RFCs generally take at minimum 2 weeks from posting to land. In + practice it can be more on the order of months for particularly + controversial changes. + * RFCs are a lot of effort to write; especially for non-native speakers or + for members of the community whose strengths are more technical than literary. + * RFCs may involve pre-RFCs and several rewrites to accommodate feedback. + * RFCs require a dedicated shepherd to herd the community and author towards + consensus. + * RFCs require review from a majority of the subteam, as well as an official + vote. + * RFCs can't be downgraded based on their complexity. Full process always applies. + Easy RFCs may certainly land faster, though. + * RFCs can be very abstract and hard to grok the consequences of (no implementation). + +* PRs are low *overhead* but potentially expensive nonetheless: + * Easy PRs can get insta-merged by any rust-lang contributor. + * Harder PRs can be easily escalated. You can ping subject-matter experts for second + opinions. Ping the whole team! + * Easier to grok the full consequences. Lots of tests and Crater to save the day. + * PRs can be accepted optimistically with bors, buildbot, and the trains to guard + us from major mistakes making it into stable. The size of the nightly community + at this point in time can still mean major community breakage regardless of trains, + however. + * HOWEVER: Big PRs can be a lot of work to make only to have that work rejected for + details that could have been hashed out first. + +* RFCs are *only* meaningful if a significant and diverse portion of the +community actively participates in them. The official teams are not +sufficiently diverse to establish meaningful community consensus by agreeing +amongst themselves. + +* If there are *tons* of RFCs -- especially trivial ones -- people are less +likely to engage with them. Official team members are super busy. Domain experts +and industry professionals are super busy *and* have no responsibility to engage +in RFCs. Since these are *exactly* the most important people to get involved in +the RFC process, it is important that we be maximally friendly towards their +needs. + + +# Is an RFC required? + +The overarching philosophy is: *do whatever is easiest*. If an RFC +would be less work than an implementation, that's a good sign that an RFC is +necessary. That said, if you anticipate controversy, you might want to short-circuit +straight to an RFC. For instance new APIs almost certainly merit an RFC. Especially +as `std` has become more conservative in favour of the much more agile cargoverse. + +* **Submit a PR** if the change is a: + * Bugfix + * Docfix + * Obvious API hole patch, such as adding an API from one type to a symmetric type. + e.g. 
`Vec<T> -> Box<[T]>` clearly motivates adding `String -> Box<str>`
+    (see the sketch at the end of this document)
+  * Minor tweak to an unstable API (renaming, generalizing)
+  * Implementing an "obvious" trait like Clone/Debug/etc
+* **Submit an RFC** if the change is a:
+  * New API
+  * Semantic change to a stable API
+  * Generalization of a stable API (e.g. how we added Pattern or Borrow)
+  * Deprecation of a stable API
+  * Nontrivial trait impl (because all trait impls are insta-stable)
+* **Do the easier thing** if uncertain. (choosing a path is not final)
+
+
+# Non-RFC process
+
+* A (non-RFC) PR is likely to be **closed** if clearly not acceptable:
+  * Disproportionate breaking change (small inference breakage may be acceptable)
+  * Unsound
+  * Doesn't fit our general design philosophy around the problem
+  * Better as a crate
+  * Too marginal for std
+  * Significant implementation problems
+
+* A PR may also be closed because an RFC is appropriate.
+
+* A (non-RFC) PR may be **merged as unstable**. In this case, the feature
+should have a fresh feature gate and an associated tracking issue for
+stabilisation. Note that trait impls and docs are insta-stable and thus have no
+tracking issue. This may imply requiring a higher level of scrutiny for such
+changes.
+
+However, an accepted RFC is not a rubber-stamp for merging an implementation PR.
+Nor must an implementation PR perfectly match the RFC text. Implementation details
+may merit deviations, though obviously they should be justified. The RFC may be
+amended if deviations are substantial, but this is not generally necessary. RFCs should
+favour immutability. The RFC + Issue + PR should form a total explanation of the
+current implementation.
+
+* Once something has been merged as unstable, a shepherd should be assigned
+  to promote and obtain feedback on the design.
+
+* Every time a release cycle ends, the libs team assesses the current unstable
+  APIs and selects some number of them for potential stabilization during the
+  next cycle. These are announced for FCP at the beginning of the cycle, and
+  (possibly) stabilized just before the beta is cut.
+
+* After the final comment period, an API should ideally take one of two paths:
+  * **Stabilize** if the change is desired, and consensus is reached
+  * **Deprecate** if the change is undesired, and consensus is reached
+  * **Extend the FCP** if the change cannot meet consensus
+  * If consensus *still* can't be reached, consider requiring a new RFC or
+    just deprecating as "too controversial for std".
+
+* If any problems are found with a newly stabilized API during its beta period,
+  *strongly* favour reverting stability in order to prevent stabilizing a bad
+  API. Due to the speed of the trains, this is not a serious delay (~2-3 months
+  if it's not a major problem).
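+
+As a footnote to the "obvious API hole patch" example above: the argument is one
+of symmetry between an existing conversion and a proposed one, as in this small
+sketch (written against today's standard library; the point is the shape of the
+API, not these particular method names).
+
+```rust
+fn main() {
+    // An existing conversion: `Vec<T>` can already shed its excess capacity
+    // and become an owned slice.
+    let boxed_slice: Box<[u8]> = vec![1, 2, 3].into_boxed_slice();
+
+    // The symmetric conversion for strings: `String` to an owned `str`.
+    let boxed_str: Box<str> = String::from("hello").into_boxed_str();
+
+    println!("{:?} / {:?}", boxed_slice, boxed_str);
+}
+```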
+ + diff --git a/complete/0004-private-fields.md b/text/0001-private-fields.md similarity index 97% rename from complete/0004-private-fields.md rename to text/0001-private-fields.md index a37879467df..cf5d881a04a 100644 --- a/complete/0004-private-fields.md +++ b/text/0001-private-fields.md @@ -1,6 +1,6 @@ - Start Date: 2014-03-11 -- RFC PR #: 1 -- Rust Issue #: 8122 +- RFC PR: [rust-lang/rfcs#1](https://github.com/rust-lang/rfcs/pull/1) +- Rust Issue: [rust-lang/rust#8122](https://github.com/rust-lang/rust/issues/8122) # Summary diff --git a/active/0001-rfc-process.md b/text/0002-rfc-process.md similarity index 80% rename from active/0001-rfc-process.md rename to text/0002-rfc-process.md index cae5fdedd6e..4bd0c1f02a8 100644 --- a/active/0001-rfc-process.md +++ b/text/0002-rfc-process.md @@ -1,6 +1,6 @@ - Start Date: 2014-03-11 -- RFC PR #: 2, 6 -- Rust Issue #: N/A +- RFC PR: [rust-lang/rfcs#2](https://github.com/rust-lang/rfcs/pull/2), [rust-lang/rfcs#6](https://github.com/rust-lang/rfcs/pull/6) +- Rust Issue: N/A # Summary @@ -34,6 +34,7 @@ changes to the Rust distribution. What constitutes a "substantial" change is evolving based on community norms, but may include the following. - Any semantic or syntactic change to the language that is not a bugfix. + - Removing language features, including those that are feature-gated. - Changes to the interface between the compiler and libraries, including lang items and intrinsics. - Additions to `std` @@ -59,8 +60,8 @@ RFC merged into the RFC repo as a markdown file. At that point the RFC is 'active' and may be implemented with the goal of eventual inclusion into Rust. -* Fork the RFC repo http://github.com/rust-lang/rfcs -* Copy `0000-template.md` to `active/0000-my-feature.md` (where +* Fork the RFC repo https://github.com/rust-lang/rfcs +* Copy `0000-template.md` to `text/0000-my-feature.md` (where 'my-feature' is descriptive. don't assign an RFC number yet). * Fill in the RFC * Submit a pull request. The pull request is the time to get review of @@ -70,16 +71,19 @@ are much more likely to make progress than those that don't receive any comments. Eventually, somebody on the [core team] will either accept the RFC by -merging the pull request and assigning the RFC a number, at which point -the RFC is 'active', or reject it by closing the pull request. +merging the pull request, at which point the RFC is 'active', or +reject it by closing the pull request. Whomever merges the RFC should do the following: -* Assign a sequential id. -* Add the file in the active directory. -* Create a corresponding issue on Rust. -* Fill in the remaining metadata in the RFC header, including the original - PR # and Rust issue #. +* Assign an id, using the PR number of the RFC pull request. (If the RFC + has multiple pull requests associated with it, choose one PR number, + preferably the minimal one.) +* Add the file in the `text/` directory. +* Create a corresponding issue on [Rust repo](https://github.com/rust-lang/rust) +* Fill in the remaining metadata in the RFC header, including links for + the original pull request(s) and the newly created Rust issue. +* Add an entry in the [Active RFC List] of the root `README.md`. * Commit everything. Once an RFC becomes active then authors may implement it and submit the @@ -90,9 +94,11 @@ have agreed to the feature and are amenable to merging it. Modifications to active RFC's can be done in followup PR's. 
An RFC that makes it through the entire process to implementation is considered -'complete' and is moved to the 'complete' folder; an RFC that fails +'complete' and is removed from the [Active RFC List]; an RFC that fails after becoming active is 'inactive' and moves to the 'inactive' folder. +[Active RFC List]: ../README.md#active-rfc-list + # Alternatives Retain the current informal RFC process. The newly proposed RFC process is diff --git a/complete/0002-attribute-usage.md b/text/0003-attribute-usage.md similarity index 95% rename from complete/0002-attribute-usage.md rename to text/0003-attribute-usage.md index 36af8a5b7a4..4891d74944f 100644 --- a/complete/0002-attribute-usage.md +++ b/text/0003-attribute-usage.md @@ -1,6 +1,6 @@ - Start Date: 2012-03-20 -- RFC PR #: 3 -- Rust Issue #: 14373 +- RFC PR: [rust-lang/rfcs#3](https://github.com/rust-lang/rfcs/pull/3) +- Rust Issue: [rust-lang/rust#14373](https://github.com/rust-lang/rust/issues/14373) # Summary diff --git a/active/0005-new-intrinsics.md b/text/0008-new-intrinsics.md similarity index 90% rename from active/0005-new-intrinsics.md rename to text/0008-new-intrinsics.md index ecf56bad70d..78043cf9270 100644 --- a/active/0005-new-intrinsics.md +++ b/text/0008-new-intrinsics.md @@ -1,6 +1,10 @@ - Start Date: 2014-03-14 -- RFC PR #: 8 -- Rust Issue #: (leave this empty) +- RFC PR: [rust-lang/rfcs#8](https://github.com/rust-lang/rfcs/pull/8) +- Rust Issue: + +** Note: this RFC was never implemented and has been retired. The +design may still be useful in the future, but before implementing we +would prefer to revisit it so as to be sure it is up to date. ** # Summary diff --git a/text/0016-more-attributes.md b/text/0016-more-attributes.md new file mode 100644 index 00000000000..3cd6554f070 --- /dev/null +++ b/text/0016-more-attributes.md @@ -0,0 +1,215 @@ +- Start Date: 2014-03-20 +- RFC PR: [rust-lang/rfcs#16](https://github.com/rust-lang/rfcs/pull/16) +- Rust Issue: [rust-lang/rust#15701](https://github.com/rust-lang/rust/issues/15701) + +# Summary + +Allow attributes on more places inside functions, such as statements, +blocks and expressions. + +# Motivation + +One sometimes wishes to annotate things inside functions with, for +example, lint `#[allow]`s, conditional compilation `#[cfg]`s, and even +extra semantic (or otherwise) annotations for external tools. + +For the lints, one can currently only activate lints at the level of +the function which is possibly larger than one needs, and so may allow +other "bad" things to sneak through accidentally. E.g. + +```rust +#[allow(uppercase_variable)] +let L = List::new(); // lowercase looks like one or capital i +``` + +For the conditional compilation, the work-around is duplicating the +whole containing function with a `#[cfg]`, or breaking the conditional +code into a its own function. This does mean that any variables need +to be explicitly passed as arguments. + +The sort of things one could do with other arbitrary annotations are + +```rust +#[allowed_unsafe_actions(ffi)] +#[audited="2014-04-22"] +unsafe { ... } +``` + +and then have an external tool that checks that that `unsafe` block's +only unsafe actions are FFI, or a tool that lists blocks that have +been changed since the last audit or haven't been audited ever. + +The minimum useful functionality would be supporting attributes on +blocks and `let` statements, since these are flexible enough to allow +for relatively precise attribute handling. 
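+
+For a concrete sense of the second work-around, here is a minimal sketch of the
+status quo (the `verbose` flag is a hypothetical `--cfg` option, not something
+introduced by this RFC): the conditional code has to move into its own
+`#[cfg]`-gated function, and the data it needs must be passed in explicitly.
+
+```rust
+// Today: duplicate the function per configuration, or split the conditional
+// part out and thread the needed variables through as arguments.
+#[cfg(verbose)]
+fn report(total: u32) { println!("total = {}", total); }
+#[cfg(not(verbose))]
+fn report(_total: u32) {}
+
+fn process(items: &[u32]) {
+    let mut total = 0;
+    for &x in items.iter() { total += x; }
+    report(total);
+}
+
+fn main() { process(&[1, 2, 3]); }
+```
+
+With attributes allowed on blocks, the `#[cfg(verbose)]` code could instead stay
+inline inside `process`, as the detailed design below shows.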
+ +# Detailed design + +Normal attribute syntax on `let` statements, blocks and expressions. + +```rust +fn foo() { + #[attr1] + let x = 1; + + #[attr2] + { + // code + } + + #[attr3] + unsafe { + // code + } + #[attr4] foo(); + + let x = #[attr5] 1; + + qux(3 + #[attr6] 2); + + foo(x, #[attr7] y, z); +} +``` + +Attributes bind tighter than any operator, that is `#[attr] x op y` is +always parsed as `(#[attr] x) op y`. + +## `cfg` + +It is definitely an error to place a `#[cfg]` attribute on a +non-statement expressions, that is, `attr1`--`attr4` can possibly be +`#[cfg(foo)]`, but `attr5`--`attr7` cannot, since it makes little +sense to strip code down to `let x = ;`. + +However, like `#ifdef` in C/C++, widespread use of `#[cfg]` may be an +antipattern that makes code harder to read. This RFC is just adding +the ability for attributes to be placed in specific places, it is not +mandating that `#[cfg]` actually be stripped in those places (although +it should be an error if it is ignored). + +## Inner attributes + +Inner attributes can be placed at the top of blocks (and other +structure incorporating a block) and apply to that block. + +```rust +{ + #![attr11] + + foo() +} + +match bar { + #![attr12] + + _ => {} +} + +// are the same as + +#[attr11] +{ + foo() +} + +#[attr12] +match bar { + _ => {} +} +``` + +## `if` + +Attributes would be disallowed on `if` for now, because the +interaction with `if`/`else` chains are funky, and can be simulated in +other ways. + +```rust +#[cfg(not(foo))] +if cond1 { +} else #[cfg(not(bar))] if cond2 { +} else #[cfg(not(baz))] { +} +``` + +There is two possible interpretations of such a piece of code, +depending on if one regards the attributes as attaching to the whole +`if ... else` chain ("exterior") or just to the branch on which they +are placed ("interior"). + +- `--cfg foo`: could be either removing the whole chain (exterior) or + equivalent to `if cond2 {} else {}` (interior). +- `--cfg bar`: could be either `if cond1 {}` (*e*) or `if cond1 {} + else {}` (*i*) +- `--cfg baz`: equivalent to `if cond1 {} else if cond2 {}` (no subtlety). +- `--cfg foo --cfg bar`: could be removing the whole chain (*e*) or the two + `if` branches (leaving only the `else` branch) (*i*). + +(This applies to any attribute that has some sense of scoping, not +just `#[cfg]`, e.g. `#[allow]` and `#[warn]` for lints.) + +As such, to avoid confusion, attributes would not be supported on +`if`. Alternatives include using blocks: + +```rust +#[attr] if cond { ... } else ... +// becomes, for an exterior attribute, +#[attr] { + if cond { ... } else ... +} +// and, for an interior attribute, +if cond { + #[attr] { ... } +} else ... +``` + +And, if the attributes are meant to be associated with the actual +branching (e.g. a hypothetical `#[cold]` attribute that indicates a +branch is unlikely), one can annotate `match` arms: + +```rust +match cond { + #[attr] true => { ... } + #[attr] false => { ... } +} +``` + +# Drawbacks + +This starts mixing attributes with nearly arbitrary code, possibly +dramatically restricting syntactic changes related to them, for +example, there was some consideration for using `@` for attributes, +this change may make this impossible (especially if `@` gets reused +for something else, e.g. Python is +[using it for matrix multiplication](http://legacy.python.org/dev/peps/pep-0465/)). It +may also make it impossible to use `#` for other things. 
+ +As stated above, allowing `#[cfg]`s everywhere can make code harder to +reason about, but (also stated), this RFC is not for making such +`#[cfg]`s be obeyed, it just opens the language syntax to possibly +allow it. + +# Alternatives + +These instances could possibly be approximated with macros and helper +functions, but to a low degree degree (e.g. how would one annotate a +general `unsafe` block). + +Only allowing attributes on "statement expressions" that is, +expressions at the top level of a block, this is slightly limiting; +but we can expand to support other contexts backwards compatibly in +the future. + +The `if`/`else` issue may be able to be resolved by introducing +explicit "interior" and "exterior" attributes on `if`: by having +`#[attr] if cond { ...` be an exterior attribute (applying to the +whole `if`/`else` chain) and `if cond #[attr] { ... ` be an interior +attribute (applying to only the current `if` branch). There is no +difference between interior and exterior for an `else {` branch, and +so `else #[attr] {` is sufficient. + + +# Unresolved questions + +Are the complications of allowing attributes on arbitrary +expressions worth the benefits? diff --git a/text/0019-opt-in-builtin-traits.md b/text/0019-opt-in-builtin-traits.md new file mode 100644 index 00000000000..c29e0356326 --- /dev/null +++ b/text/0019-opt-in-builtin-traits.md @@ -0,0 +1,531 @@ +- Start Date: 2014-09-18 +- RFC PR #: [rust-lang/rfcs#19](https://github.com/rust-lang/rfcs/pull/19), [rust-lang/rfcs#127](https://github.com/rust-lang/rfcs/pull/127) +- Rust Issue #: [rust-lang/rust#13231](https://github.com/rust-lang/rust/issues/13231) + +# Summary + +The high-level idea is to add language features that simultaneously +achieve three goals: + +1. move `Send` and `Share` out of the language entirely and into the + standard library, providing mechanisms for end users to easily + implement and use similar "marker" traits of their own devising; +2. make "normal" Rust types sendable and sharable by default, without + the need for explicit opt-in; and, +3. continue to require "unsafe" Rust types (those that manipulate + unsafe pointers or implement special abstractions) to "opt-in" to + sendability and sharability with an unsafe declaration. + +These goals are achieved by two changes: + +1. **Unsafe traits:** An *unsafe trait* is a trait that is unsafe to + implement, because it represents some kind of trusted + assertion. Note that unsafe traits are perfectly safe to + *use*. `Send` and `Share` are examples of unsafe traits: + implementing these traits is effectively an assertion that your + type is safe for threading. +2. **Default and negative impls:** A *default impl* is one that + applies to all types, except for those types that explicitly *opt + out*. For example, there would be a default impl for `Send`, + indicating that all types are `Send` "by default". + + To counteract a default impl, one uses a *negative impl* that + explicitly opts out for a given type `T` and any type that contains + `T`. For example, this RFC proposes that unsafe pointers `*T` will + opt out of `Send` and `Share`. This implies that unsafe pointers + cannot be sent or shared between threads by default. It also + implies that any structs which contain an unsafe pointer cannot be + sent. In all examples encountered thus far, the set of negative + impls is fixed and can easily be declared along with the trait + itself. + + Safe wrappers like `Arc`, `Atomic`, or `Mutex` can opt to implement + `Send` and `Share` explicitly. 
This will then cause them to be considered sendable (or sharable) even though they contain unsafe pointers etc.

Based on these two mechanisms, we can remove the notion of `Send` and
`Share` as builtin concepts. Instead, these would become unsafe traits
with default impls (defined purely in the library). The library would
explicitly *opt out* of `Send`/`Share` for certain types, like unsafe
pointers (`*T`) or interior mutability (`Unsafe`). Any type,
therefore, which contains an unsafe pointer would be confined (by
default) to a single thread. Safe wrappers around those types, like
`Arc`, `Atomic`, or `Mutex`, can then opt back in by explicitly
implementing `Send` (these impls would have to be designated as unsafe).

# Motivation

Since proposing opt-in builtin traits, I have become increasingly
concerned about the notion of having `Send` and `Share` be strictly
opt-in. There are two main reasons for my concern:

1. Rust is very close to being a language where computations can be
   parallelized by default. Making `Send`, and *especially* `Share`,
   opt-in makes that harder to achieve.
2. The model followed by `Send`/`Share` cannot easily be extended to
   other traits in the future, nor can it be extended by end users with
   their own similar traits. It is worrisome that I have come across
   several use cases already which might require such extension
   (described below).

To elaborate on those two points: with respect to parallelization, for
the most part, Rust types are threadsafe "by default". To make
something non-threadsafe, you must employ unsynchronized interior
mutability (e.g., `Cell`, `RefCell`) or unsynchronized shared ownership
(`Rc`). In both cases, there are also synchronized variants available
(`Mutex`, `Arc`, etc.). This implies that we can make APIs to enable
intra-task parallelism and they will work ubiquitously, so long as
people avoid `Cell` and `Rc` when not needed. Explicit opt-in
threatens that future, however, because fewer types will implement
`Share`, even if they are in fact threadsafe.

With respect to extensibility, it is particularly worrisome that if a
library forgets to implement `Send` or `Share`, downstream clients are
stuck. They cannot, for example, use a newtype wrapper, because it
would be illegal to implement `Send` on the newtype. This implies that
all libraries must be vigilant about implementing `Send` and `Share`
(even more so than with other pervasive traits like `Eq` or `Ord`).
The current plan is to address this via lints and perhaps some
convenient deriving syntax, which may be adequate for `Send` and
`Share`. But if we wish to add new "classification" traits in the
future, these new traits won't have been around from the start, and
hence won't be implemented by all existing code.

Another concern of mine is that end users cannot define classification
traits of their own. For example, one might like to define a trait for
"tainted" data, and then test to ensure that tainted data doesn't pass
through some generic routine. There is no particular way to do this
today.

More examples of classification traits that have come up recently in
various discussions:

- `Snapshot` (nee `Freeze`), which defines *logical* immutability
  rather than *physical* immutability. `Rc`, for example, would
  be considered `Snapshot`. `Snapshot` could be useful because
  `Snapshot+Clone` indicates a type whose value can be safely
  "preserved" by cloning it.
- `NoManaged`, a type which does not contain managed data.
  This might be useful for integrating garbage collection with custom
  allocators which do not wish to serve as potential roots.
- `NoDrop`, a type which does not contain an explicit destructor. This
  can be used to avoid nasty GC quandaries.

All three of these (`Snapshot`, `NoManaged`, `NoDrop`) can be easily
defined using traits with default impls.

A final, somewhat weaker, motivator is aesthetics. Ownership has allowed
us to move threading almost entirely into libraries. The one exception
is that the `Send` and `Share` types remain built-in. Opt-in traits
make them *less* built-in, but still require custom logic in the
"impl matching" code as well as special safety checks when
`Send` or `Share` are implemented.

After the changes I propose, the only traits which would be
specifically understood by the compiler are `Copy` and `Sized`. I
consider this acceptable, since those two traits are intimately tied
to the core Rust type system, unlike `Send` and `Share`.

# Detailed design

## Unsafe traits

Certain traits like `Send` and `Share` are critical to memory safety.
Nonetheless, it is not feasible to check the thread-safety of all
types that implement `Send` and `Share`. Therefore, we introduce a
notion of an *unsafe trait* -- this is a trait that is unsafe to
implement, because implementing it carries semantic guarantees that,
if compromised, threaten memory safety in a deep way.

An unsafe trait is declared like so:

    unsafe trait Foo { ... }

To implement an unsafe trait, one must mark the impl as unsafe:

    unsafe impl Foo for Bar { ... }

Designating an impl as unsafe does not automatically mean that the
body of the methods is an unsafe block. Each method in the trait must
also be declared as unsafe if it is to be considered unsafe.

Unsafe traits are only unsafe to *implement*. It is always safe to
reference an unsafe trait. For example, the following function is
safe:

    fn foo<T:Foo>(x: T) { ... }

It is also safe to *opt out* of an unsafe trait (as discussed in the
next section).

## Default and negative impls

We add a notion of a *default impl*, written:

    impl Trait for .. { }

Default impls are subject to various limitations:

1. The default impl must appear in the same module as `Trait` (or a submodule).
2. `Trait` must not define any methods.

We further add the notion of a *negative impl*, written:

    impl !Trait for Foo { }

Negative impls are only permitted if `Trait` has a default impl.
Negative impls are subject to the usual orphan rules, but they are
permitted to be overlapping. This makes sense because negative impls
are not providing an implementation and hence we are not forced to
select between them. For similar reasons, negative impls never need to
be marked unsafe, even if they reference an unsafe trait.

Intuitively, to check whether a trait `Foo` that contains a default
impl is implemented for some type `T`, we first check for explicit
(positive) impls that apply to `T`. If any are found, then `T`
implements `Foo`. Otherwise, we check for negative impls. If any are
found, then `T` does not implement `Foo`. If neither positive nor
negative impls were found, we proceed to check the component types of
`T` (i.e., the types of a struct's fields) to determine whether all of
them implement `Foo`. If so, then `Foo` is considered implemented by
`T`.
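
As a sketch of how this procedure would play out under the syntax proposed above (the trait `Threadsafe` and the types `Handle` and `Wrapper` below are hypothetical, used only for illustration):

    unsafe trait Threadsafe { }
    unsafe impl Threadsafe for .. { }    // default impl: types are Threadsafe unless opted out
    impl<T> !Threadsafe for *mut T { }   // negative impl: raw pointers opt out

    // Neither struct is mentioned by any impl, so the component check applies:
    // `Handle` contains a `*mut u8`, which has opted out, so `Handle` is not
    // `Threadsafe`; `Wrapper` is then not `Threadsafe` because `Handle` is not.
    struct Handle { ptr: *mut u8 }
    struct Wrapper { h: Handle }

    // An explicit (unsafe) positive impl takes precedence over the component
    // check; with it, `Handle` is `Threadsafe`, and therefore so is `Wrapper`.
    unsafe impl Threadsafe for Handle { }
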
One non-obvious part of the procedure is that, as we recursively
examine the component types of `T`, we add to our list of assumptions
that `T` implements `Foo`. This allows recursive types like

    struct List<T> { data: T, next: Option<Box<List<T>>> }

to be checked successfully. Otherwise, we would recurse infinitely.
(This procedure is directly analogous to what the existing
`TypeContents` code does.)

Note that there exist types that expand to an infinite tree of types.
Such types cannot be successfully checked with a recursive impl; they
will simply overflow the builtin depth checking. However, such types
also break code generation under monomorphization (we cannot create a
finite set of LLVM types that correspond to them) and are in general
not supported. Here is an example of such a type:

    struct Foo<T> {
        data: Option<Foo<Vec<T>>>
    }

The difference between `Foo` and `List` above is that `Foo<T>`
references `Foo<Vec<T>>`, which will then in turn reference
`Foo<Vec<Vec<T>>>` and so on.

## Modeling Send and Share using default traits

The `Send` and `Share` traits will be modeled entirely in the library
as follows. First, we declare the two traits as follows:

    unsafe trait Send { }
    unsafe impl Send for .. { }

    unsafe trait Share { }
    unsafe impl Share for .. { }

Both traits are declared as unsafe because declaring that a type is
`Send` and `Share` has ramifications for memory safety (and data-race
freedom) that the compiler cannot, itself, check.

Next, we will add *opt out* impls of `Send` and `Share` for the
various unsafe types:

    impl<T> !Send for *T { }
    impl<T> !Share for *T { }

    impl<T> !Send for *mut T { }
    impl<T> !Share for *mut T { }

    impl<T> !Share for Unsafe<T> { }

Note that it is not necessary to write unsafe to *opt out* of an
unsafe trait, as that is the default state.

Finally, we will add *opt in* impls of `Send` and `Share` for the
various safe wrapper types as needed. Here I give one example, which
is `Mutex`. `Mutex` is interesting because it has the property that it
converts a type `T` from being `Sendable` to something `Sharable`:

    unsafe impl<T:Send> Send for Mutex<T> { }
    unsafe impl<T:Send> Share for Mutex<T> { }

## The `Copy` and `Sized` traits

The final two builtin traits are `Copy` and `Sized`. This RFC does not
propose any changes to those two traits but rather relies on the
specification from [the original opt-in RFC](0003-opt-in-builtin-traits.md).

### Controlling copy vs move with the `Copy` trait

The `Copy` trait is "opt-in" for user-declared structs and enums. A
struct or enum type is considered to implement the `Copy` trait only
if it is explicitly declared to implement the `Copy` trait. This means
that structs and enums would *move by default* unless their type is
explicitly declared to be `Copy`. So, for example, the following code
would be in error:

    struct Point { x: int, y: int }
    ...
    let p = Point { x: 1, y: 2 };
    let q = p; // moves p
    print(p.x); // ERROR

To allow that example, one would have to impl `Copy` for `Point`:

    struct Point { x: int, y: int }
    impl Copy for Point { }
    ...
    let p = Point { x: 1, y: 2 };
    let q = p; // copies p, because Point is Copy
    print(p.x); // OK

Effectively, there is a three-step ladder for types:

1. If you do nothing, your type is *linear*, meaning that it moves
   from place to place and can never be copied in any way. (We need a
   better name for that.)
2. If you implement `Clone`, your type is *cloneable*, meaning that it
   moves from place to place, but it can be explicitly cloned.
This is + suitable for cases where copying is expensive. +3. If you implement `Copy`, your type is *copyable*, meaning that + it is just copied by default without the need for an explicit + clone. This is suitable for small bits of data like ints or + points. + +What is nice about this change is that when a type is defined, the +user makes an *explicit choice* between these three options. + +### Determining whether a type is `Sized` + +Per the DST specification, the array types `[T]` and object types like +`Trait` are unsized, as are any structs that embed one of those +types. The `Sized` trait can never be explicitly implemented and +membership in the trait is always automatically determined. + +### Matching and coherence for the builtin types `Copy` and `Sized` + +In general, determining whether a type implements a builtin trait can +follow the existing trait matching algorithm, but it will have to be +somewhat specialized. The problem is that we are somewhat limited in +the kinds of impls that we can write, so some of the implementations +we would want must be "hard-coded". + +Specifically we are limited around tuples, fixed-length array types, +proc types, closure types, and trait types: + +- *Fixed-length arrays:* A fixed-length array `[T, ..n]` is `Copy` + if `T` is `Copy`. It is always `Sized` as `T` is required to be `Sized`. +- *Tuples*: A tuple `(T_0, ..., T_n)` is `Copy/Sized` depending if, + for all `i`, `T_i` is `Copy/Sized`. +- *Trait objects* (including procs and closures): A trait object type + `Trait:K` (assuming DST here ;) is never `Copy` nor `Sized`. + +We cannot currently express the above conditions using impls. We may +at some point in the future grow the ability to express some of them. +For now, though, these "impls" will be hardcoded into the algorithm as +if they were written in libstd. + +Per the usual coherence rules, since we will have the above impls in +`libstd`, and we will have impls for types like tuples and +fixed-length arrays baked in, the only impls that end users are +permitted to write are impls for struct and enum types that they +define themselves. Although this rule is in the general spirit of the +coherence checks, it will have to be written specially. + +# Design discussion + +#### Why unsafe traits + +Without unsafe traits, it would be possible to +create data races without using the `unsafe` keyword: + + struct MyStruct { foo: Cell } + impl Share for MyStruct { } + +#### Balancing abstraction, safety, and convenience. + +In general, the existence of default traits is *anti-abstraction*, in +the sense that it exposes implementation details a library might +prefer to hide. Specifically, adding new private fields can cause your +types to become non-sendable or non-sharable, which may break +downstream clients without your knowing. This is a known challenge +with parallelism: knowing whether it is safe to parallelize relies on +implementation details we have traditionally tried to keep secret from +clients (often it is said that parallelism is "anti-modular" or +"anti-compositional" for this reason). + +I think this risk must be weighed against the limitations of requiring +total opt in. Requiring total opt in not only means that some types +will accidentally fail to implement send or share when they could, but +it also means that libraries which wish to employ marker traits cannot +be composed with other libraries that are not aware of those marker +traits. In effect, opt-in is anti-modular in its own way. 
+ +To be more specific, imagine that library A wishes to define a +`Untainted` trait, and it specifically opts out of `Untainted` for +some base set of types. It then wishes to have routines that only +operate on `Untained` data. Now imagine that there is some other +library B that defines a nifty replacement for `Vector`, +`NiftyVector`. Finally, some library C wishes to use a +`NiftyVector`, which should not be considered tainted, because +it doesn't reference any tainted strings. However, `NiftyVector` +does not implement `Untainted` (nor can it, without either library A +or libary B knowing about one another). Similar problems arise for any +trait, of course, due to our coherence rules, but often they can be +overcome with new types. Not so with `Send` and `Share`. + +#### Other use cases + +Part of the design involves making space for other use cases. I'd like +to skech out how some of those use cases can be implemented briefly. +This is not included in the *Detailed design* section of the RFC +because these traits generally concern other features and would be +added under RFCs of their own. + +**Isolating snapshot types.** It is useful to be able to identify +types which, when cloned, result in a logical *snapshot*. That is, a +value which can never be mutated. Note that there may in fact be +mutation under the covers, but this mutation is not visible to the +user. An example of such a type is `Rc` -- although the ref count +on the `Rc` may change, the user has no direct access and so `Rc` +is still logically snapshotable. However, not all `Rc` instances are +snapshottable -- in particular, something like `Rc>` is not. + + trait Snapshot { } + impl Snapshot for .. { } + + // In general, anything that can reach interior mutability is not + // snapshotable. + impl !Snapshot for Unsafe { } + + // But it's ok for Rc. + impl Snapshot for Rc { } + +Note that these definitions could all occur in a library. That is, the +`Rc` type itself doesn't need to know about the `Snapshot` trait. + +**Preventing access to managed data.** As part of the GC design, we +expect it will be useful to write specialized allocators or smart +pointers that explicitly do *not* support tracing, so as to avoid any +kind of GC overhead. The general idea is that there should be a bound, +let's call it `NoManaged`, that indicates that a type cannot reach +managed data and hence does not need to be part of the GC's root +set. This trait could be implemented as follows: + + unsafe trait NoManaged { } + unsafe impl NoManaged for .. { } + impl !NoManaged for Gc { } + +**Preventing access to destructors.** It is generally recognized that +allowing destructors to escape into managed data -- frequently +referred to as finalizers -- is a bad idea. Therefore, we would +generally like to ensure that anything is placed into a managed box +does not implement the drop trait. Instead, we would prefer to regular +the use of drop through a guardian-like API, which basically means +that destructors are not asynchronously executed by the GC, as they +would be in Java, but rather enqueued for the mutator thread to run +synchronously at its leisure. In order to handle this, though, we +presumably need some sort of guardian wrapper types that can take a +value which has a destructor and allow it to be embedded within +managed data. We can summarize this in a trait `GcSafe` as follows: + + unsafe trait GcSafe { } + unsafe impl GcSafe for .. { } + + // By default, anything which has drop trait is not GcSafe. 
+ impl !GcSafe for T { } + + // But guardians are, even if `T` has drop. + impl GcSafe for Guardian { } + +#### Why are `Copy` and `Sized` different? + +The `Copy` and `Sized` traits remain builtin to the compiler. This +makes sense because they are intimately tied to analyses the compiler +performs. For example, the running of destructors and tracking of +moves requires knowing which types are `Copy`. Similarly, the +allocation of stack frames need to know whether types are fully +`Sized`. In contrast, sendability and sharability has been fully +exported to libraries at this point. + +In addition, opting in to `Copy` makes sense for several reasons: + +- Experience has shown that "data-like structs", for which `Copy` is + most appropriate, are a very small percentage of the total. +- Changing a public API from being copyable to being only movable has + a outsized impact on users of the API. It is common however that as + APIs evolve they will come to require owned data (like a `Vec`), + even if they do not initially, and hence will change from being + copyable to only movable. Opting in to `Copy` is a way of saying + that you never foresee this coming to pass. +- Often it is useful to create linear "tokens" that do not themselves + have data but represent permissions. This can be done today using + markers but it is awkward. It becomes much more natural under this + proposal. + +# Drawbacks + +**API stability.** The main drawback of this approach over the +existing opt-in approach seems to be that a type may be "accidentally" +sendable or sharable. I discuss this above under the heading of +"balancing abstraction, safety, and convenience". One point I would +like to add here, as it specifically pertains to API stability, is +that a library may, if they choose, opt out of `Send` and `Share` +pre-emptively, in order to "reserve the right" to add non-sendable +things in the future. + +# Alternatives + +- The existing opt-in design is of course an alternative. + +- We could also simply add the notion of `unsafe` traits and *not* + default impls and then allow types to unsafely implement `Send` or + `Share`, bypassing the normal safety guidelines. This gives an + escape valve for a downstream client to assert that something is + sendable which was not declared as sendable. However, such a + solution is deeply unsatisfactory, because it rests on the + downstream client making an assertion about the implementation of + the library it uses. If that library should be updated, the client's + assumptions could be invalidated, but no compilation errors will + result (the impl was already declared as unsafe, after all). + +# Phasing + +Many of the mechanisms described in this RFC are not needed +immediately. Therefore, we would like to implement a minimal +"forwards compatible" set of changes now and then leave the remaining +work for after the 1.0 release. The builtin rules that the compiler +currently implements for send and share are quite close to what is +proposed in this RFC. The major change is that unsafe pointers and the +`UnsafeCell` type are currently considered sendable. + +Therefore, to be forwards compatible in the short term, we can use the +same hybrid of builtin and explicit impls for `Send` and `Share` that +we use for `Copy`, with the rule that unsafe pointers and `UnsafeCell` +are not considered sendable. We must also implement the `unsafe trait` +and `unsafe impl` concept. 
+ +What this means in practice is that using `*const T`, `*mut T`, and +`UnsafeCell` will make a type `T` non-sendable and non-sharable, and +`T` must then explicitly implement `Send` or `Share`. + +# Unresolved questions + +- The terminology of "unsafe trait" seems somewhat misleading, since + it seems to suggest that "using" the trait is unsafe, rather than + implementing it. One suggestion for an alternate keyword was + `trusted trait`, which might dovetail with the use of `trusted` to + specify a trusted block of code. If we did use `trusted trait`, it + seems that all impls would also have to be `trusted impl`. +- Perhaps we should declare a trait as a "default trait" directly, + rather than using the `impl Drop for ..` syntax. I don't know + precisely what syntax to use, though. +- Currently, there are special rules relating to object types and + the builtin traits. If the "builtin" traits are no longer builtin, + we will have to generalize object types to be simply a set of trait + references. This is already planned but merits a second RFC. Note + that no changes here are required for the 1.0, since the phasing + plan dictates that builtin traits remain special until after 1.0. diff --git a/complete/0006-remove-priv.md b/text/0026-remove-priv.md similarity index 92% rename from complete/0006-remove-priv.md rename to text/0026-remove-priv.md index c633f6f2c2d..3f19f36e1a4 100644 --- a/complete/0006-remove-priv.md +++ b/text/0026-remove-priv.md @@ -1,6 +1,6 @@ - Start Date: 2014-03-31 -- RFC PR #: 26 -- Rust Issue #: 13535 +- RFC PR: [rust-lang/rfcs#26](https://github.com/rust-lang/rfcs/pull/26) +- Rust Issue: [rust-lang/rust#13535](https://github.com/rust-lang/rust/issues/13535) # Summary diff --git a/active/0011-bounded-type-parameters.md b/text/0034-bounded-type-parameters.md similarity index 95% rename from active/0011-bounded-type-parameters.md rename to text/0034-bounded-type-parameters.md index fdad0b437a4..96504d94336 100644 --- a/active/0011-bounded-type-parameters.md +++ b/text/0034-bounded-type-parameters.md @@ -1,6 +1,6 @@ - Start Date: 2014-04-05 -- RFC PR #: -- Rust Issue #: +- RFC PR: [rust-lang/rfcs#34](https://github.com/rust-lang/rfcs/pull/34) +- Rust Issue: [rust-lang/rust#15759](https://github.com/rust-lang/rust/issues/15759) # Summary diff --git a/complete/0012-libstd-facade.md b/text/0040-libstd-facade.md similarity index 99% rename from complete/0012-libstd-facade.md rename to text/0040-libstd-facade.md index a7cddde8134..3ad2d525356 100644 --- a/complete/0012-libstd-facade.md +++ b/text/0040-libstd-facade.md @@ -1,6 +1,6 @@ - Start Date: 2014-04-08 -- RFC PR #: 40 -- Rust Issue #: 13851 +- RFC PR: [rust-lang/rfcs#40](https://github.com/rust-lang/rfcs/pull/40) +- Rust Issue: [rust-lang/rust#13851](https://github.com/rust-lang/rust/issues/13851) # Summary diff --git a/complete/0007-regexps.md b/text/0042-regexps.md similarity index 98% rename from complete/0007-regexps.md rename to text/0042-regexps.md index 1df9c4962fc..89973b3c7b8 100644 --- a/complete/0007-regexps.md +++ b/text/0042-regexps.md @@ -1,6 +1,6 @@ - Start Date: 2014-04-12 -- RFC PR #: 42 -- Rust Issue #: 13700 +- RFC PR: [rust-lang/rfcs#42](https://github.com/rust-lang/rfcs/pull/42) +- Rust Issue: [rust-lang/rust#13700](https://github.com/rust-lang/rust/issues/13700) # Summary diff --git a/active/0024-traits.md b/text/0048-traits.md similarity index 99% rename from active/0024-traits.md rename to text/0048-traits.md index 7037a108752..7a1c624993c 100644 --- a/active/0024-traits.md +++ 
b/text/0048-traits.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-10 -- RFC PR #: 48 -- Rust Issue #: 5527 +- RFC PR: [rust-lang/rfcs#48](https://github.com/rust-lang/rfcs/pull/48) +- Rust Issue: [rust-lang/rust#5527](https://github.com/rust-lang/rust/issues/5527) # Summary diff --git a/complete/0008-match-arm-attributes.md b/text/0049-match-arm-attributes.md similarity index 93% rename from complete/0008-match-arm-attributes.md rename to text/0049-match-arm-attributes.md index bd9ee06a7af..54ed3321828 100644 --- a/complete/0008-match-arm-attributes.md +++ b/text/0049-match-arm-attributes.md @@ -1,6 +1,6 @@ - Start Date: 2014-03-20 -- RFC PR #: 49 -- Rust Issue #: 12812 +- RFC PR: [rust-lang/rfcs#49](https://github.com/rust-lang/rfcs/pull/49) +- Rust Issue: [rust-lang/rust#12812](https://github.com/rust-lang/rust/issues/12812) # Summary diff --git a/complete/0015-assert.md b/text/0050-assert.md similarity index 87% rename from complete/0015-assert.md rename to text/0050-assert.md index c10af91e5cd..ae1dc69eada 100644 --- a/complete/0015-assert.md +++ b/text/0050-assert.md @@ -1,6 +1,6 @@ - Start Date: 2014-04-18 -- RFC PR #: 50 -- Rust Issue #: 13789 +- RFC PR: [rust-lang/rfcs#50](https://github.com/rust-lang/rfcs/pull/50) +- Rust Issue: [rust-lang/rust#13789](https://github.com/rust-lang/rust/issues/13789) # Summary diff --git a/active/0014-remove-tilde.md b/text/0059-remove-tilde.md similarity index 93% rename from active/0014-remove-tilde.md rename to text/0059-remove-tilde.md index 2b1cdccc21d..c638d9bfee8 100644 --- a/active/0014-remove-tilde.md +++ b/text/0059-remove-tilde.md @@ -1,6 +1,6 @@ - Start Date: 2014-04-30 -- RFC PR #: 59 -- Rust Issue #: 13885 +- RFC PR: [rust-lang/rfcs#59](https://github.com/rust-lang/rfcs/pull/59) +- Rust Issue: [rust-lang/rust#13885](https://github.com/rust-lang/rust/issues/13885) # Summary diff --git a/complete/0019-rename-strbuf.md b/text/0060-rename-strbuf.md similarity index 77% rename from complete/0019-rename-strbuf.md rename to text/0060-rename-strbuf.md index 5d6dea0b954..7adb8b0786d 100644 --- a/complete/0019-rename-strbuf.md +++ b/text/0060-rename-strbuf.md @@ -1,6 +1,6 @@ - Start Date: 2014-04-30 -- RFC PR #: 60 -- Rust Issue #: 14312 +- RFC PR: [rust-lang/rfcs#60](https://github.com/rust-lang/rfcs/pull/60) +- Rust Issue: [rust-lang/rust#14312](https://github.com/rust-lang/rust/issues/14312) # Summary diff --git a/complete/0016-module-file-system-hierarchy.md b/text/0063-module-file-system-hierarchy.md similarity index 93% rename from complete/0016-module-file-system-hierarchy.md rename to text/0063-module-file-system-hierarchy.md index ef6f0a88609..807f2c943d7 100644 --- a/complete/0016-module-file-system-hierarchy.md +++ b/text/0063-module-file-system-hierarchy.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-02 -- RFC PR #: 63 -- Rust Issue #: 14180 +- RFC PR: [rust-lang/rfcs#63](https://github.com/rust-lang/rfcs/pull/63) +- Rust Issue: [rust-lang/rust#14180](https://github.com/rust-lang/rust/issues/14180) # Summary diff --git a/active/0031-better-temporary-lifetimes.md b/text/0066-better-temporary-lifetimes.md similarity index 92% rename from active/0031-better-temporary-lifetimes.md rename to text/0066-better-temporary-lifetimes.md index 0216afaf90d..7f2fdad2b24 100644 --- a/active/0031-better-temporary-lifetimes.md +++ b/text/0066-better-temporary-lifetimes.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-04 -- RFC PR #: 0031 -- Rust Issue #: 15023 +- RFC PR: [rust-lang/rfcs#66](https://github.com/rust-lang/rfcs/pull/66) +- Rust Issue: 
[rust-lang/rust#15023](https://github.com/rust-lang/rust/issues/15023) # Summary diff --git a/complete/0032-const-unsafe-pointers.md b/text/0068-const-unsafe-pointers.md similarity index 97% rename from complete/0032-const-unsafe-pointers.md rename to text/0068-const-unsafe-pointers.md index fad4158ba61..fe6d54ec833 100644 --- a/complete/0032-const-unsafe-pointers.md +++ b/text/0068-const-unsafe-pointers.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-11 -- RFC PR #: 68 -- Rust Issue #: 7362 +- RFC PR: [rust-lang/rfcs#68](https://github.com/rust-lang/rfcs/pull/68) +- Rust Issue: [rust-lang/rust#7362](https://github.com/rust-lang/rust/issues/7362) # Summary diff --git a/complete/0023-ascii-literals.md b/text/0069-ascii-literals.md similarity index 95% rename from complete/0023-ascii-literals.md rename to text/0069-ascii-literals.md index 7e7c542dbb0..cffcf2ff641 100644 --- a/complete/0023-ascii-literals.md +++ b/text/0069-ascii-literals.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-05 -- RFC PR #: 69 -- Rust Issue #: 14646 +- RFC PR: [rust-lang/rfcs#69](https://github.com/rust-lang/rfcs/pull/69) +- Rust Issue: [rust-lang/rust#14646](https://github.com/rust-lang/rust/issues/14646) # Summary diff --git a/complete/0017-const-block-expr.md b/text/0071-const-block-expr.md similarity index 94% rename from complete/0017-const-block-expr.md rename to text/0071-const-block-expr.md index 389d9e39b6f..f521c25ced2 100644 --- a/complete/0017-const-block-expr.md +++ b/text/0071-const-block-expr.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-07 -- RFC PR #: 71 -- Rust Issue #: 14181 +- RFC PR: [rust-lang/rfcs#71](https://github.com/rust-lang/rfcs/pull/71) +- Rust Issue: [rust-lang/rust#14181](https://github.com/rust-lang/rust/issues/14181) # Summary diff --git a/active/0018-undefined-struct-layout.md b/text/0079-undefined-struct-layout.md similarity index 96% rename from active/0018-undefined-struct-layout.md rename to text/0079-undefined-struct-layout.md index 16cc2bbfc93..cadf4ecacd3 100644 --- a/active/0018-undefined-struct-layout.md +++ b/text/0079-undefined-struct-layout.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-17 -- RFC PR #: 79 -- Rust Issue #: 14309 +- RFC PR: [rust-lang/rfcs#79](https://github.com/rust-lang/rfcs/pull/79) +- Rust Issue: [rust-lang/rust#14309](https://github.com/rust-lang/rust/issues/14309) # Summary diff --git a/complete/0020-pattern-macros.md b/text/0085-pattern-macros.md similarity index 93% rename from complete/0020-pattern-macros.md rename to text/0085-pattern-macros.md index 1a834875699..7f2abeeff32 100644 --- a/complete/0020-pattern-macros.md +++ b/text/0085-pattern-macros.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-21 -- RFC PR #: 85 -- Rust Issue #: 14473 +- RFC PR: [rust-lang/rfcs#85](https://github.com/rust-lang/rfcs/pull/85) +- Rust Issue: [rust-lang/rust#14473](https://github.com/rust-lang/rust/issues/14473) # Summary diff --git a/complete/0022-plugin-registrar.md b/text/0086-plugin-registrar.md similarity index 95% rename from complete/0022-plugin-registrar.md rename to text/0086-plugin-registrar.md index 9cd9a2ec3ae..d726b0c06b7 100644 --- a/complete/0022-plugin-registrar.md +++ b/text/0086-plugin-registrar.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-22 -- RFC PR #: 86 -- Rust Issue #: 14637 +- RFC PR: [rust-lang/rfcs#86](https://github.com/rust-lang/rfcs/pull/86) +- Rust Issue: [rust-lang/rust#14637](https://github.com/rust-lang/rust/issues/14637) # Summary diff --git a/complete/0027-trait-bounds-with-plus.md b/text/0087-trait-bounds-with-plus.md similarity index 89% rename from 
complete/0027-trait-bounds-with-plus.md rename to text/0087-trait-bounds-with-plus.md index 39ca3ce72aa..9098e2c6aaf 100644 --- a/complete/0027-trait-bounds-with-plus.md +++ b/text/0087-trait-bounds-with-plus.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-22 -- RFC PR #: 87 -- Rust Issue #: 12778 +- RFC PR: [rust-lang/rfcs#87](https://github.com/rust-lang/rfcs/pull/87) +- Rust Issue: [rust-lang/rust#12778](https://github.com/rust-lang/rust/issues/12778) # Summary diff --git a/complete/0029-loadable-lints.md b/text/0089-loadable-lints.md similarity index 97% rename from complete/0029-loadable-lints.md rename to text/0089-loadable-lints.md index 899ba86d70e..ae5fa935f57 100644 --- a/complete/0029-loadable-lints.md +++ b/text/0089-loadable-lints.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-23 -- RFC PR #: 89 -- Rust Issue #: 14067 +- RFC PR: [rust-lang/rfcs#89](https://github.com/rust-lang/rfcs/pull/89) +- Rust Issue: [rust-lang/rust#14067](https://github.com/rust-lang/rust/issues/14067) # Summary diff --git a/active/0021-lexical-syntax-simplification.md b/text/0090-lexical-syntax-simplification.md similarity index 97% rename from active/0021-lexical-syntax-simplification.md rename to text/0090-lexical-syntax-simplification.md index 8a9f30d6621..19dd5c4d2db 100644 --- a/active/0021-lexical-syntax-simplification.md +++ b/text/0090-lexical-syntax-simplification.md @@ -1,6 +1,6 @@ - Start Date: 2014-05-23 -- RFC PR #: 90 -- Rust Issue #: 14504 +- RFC PR: [rust-lang/rfcs#90](https://github.com/rust-lang/rfcs/pull/90) +- Rust Issue: [rust-lang/rust#14504](https://github.com/rust-lang/rust/issues/14504) # Summary diff --git a/complete/0025-struct-grammar.md b/text/0092-struct-grammar.md similarity index 95% rename from complete/0025-struct-grammar.md rename to text/0092-struct-grammar.md index 28bf25ebf23..c59ecd4f5a9 100644 --- a/complete/0025-struct-grammar.md +++ b/text/0092-struct-grammar.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-10 -- RFC PR #: 92 -- Rust Issue #: 14803 +- RFC PR: [rust-lang/rfcs#92](https://github.com/rust-lang/rfcs/pull/92) +- Rust Issue: [rust-lang/rust#14803](https://github.com/rust-lang/rust/issues/14803) # Summary diff --git a/complete/0026-remove-format-intl.md b/text/0093-remove-format-intl.md similarity index 96% rename from complete/0026-remove-format-intl.md rename to text/0093-remove-format-intl.md index edd40c2a787..1c6f2204024 100644 --- a/complete/0026-remove-format-intl.md +++ b/text/0093-remove-format-intl.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-10 -- RFC PR #: 93 -- Rust Issue #: 14812 +- RFC PR: [rust-lang/rfcs#93](https://github.com/rust-lang/rfcs/pull/93) +- Rust Issue: [rust-lang/rust#14812](https://github.com/rust-lang/rust/issues/14812) # Summary diff --git a/complete/0028-partial-cmp.md b/text/0100-partial-cmp.md similarity index 96% rename from complete/0028-partial-cmp.md rename to text/0100-partial-cmp.md index ba8a9cf33b4..5069559598e 100644 --- a/complete/0028-partial-cmp.md +++ b/text/0100-partial-cmp.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-01 -- RFC PR #: 100 -- Rust Issue #: 14987 +- RFC PR: [rust-lang/rfcs#100](https://github.com/rust-lang/rfcs/pull/100) +- Rust Issue: [rust-lang/rust#14987](https://github.com/rust-lang/rust/issues/14987) # Summary diff --git a/active/0036-pattern-guards-with-bind-by-move.md b/text/0107-pattern-guards-with-bind-by-move.md similarity index 96% rename from active/0036-pattern-guards-with-bind-by-move.md rename to text/0107-pattern-guards-with-bind-by-move.md index 19eb9e50953..485ebd7b0e7 100644 --- 
a/active/0036-pattern-guards-with-bind-by-move.md +++ b/text/0107-pattern-guards-with-bind-by-move.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-05 -- RFC PR #: 107 -- Rust Issue #: 15287 +- RFC PR: [rust-lang/rfcs#107](https://github.com/rust-lang/rfcs/pull/107) +- Rust Issue: [rust-lang/rust#15287](https://github.com/rust-lang/rust/issues/15287) # Summary diff --git a/active/0035-remove-crate-id.md b/text/0109-remove-crate-id.md similarity index 98% rename from active/0035-remove-crate-id.md rename to text/0109-remove-crate-id.md index bd613e7f409..9910df83e72 100644 --- a/active/0035-remove-crate-id.md +++ b/text/0109-remove-crate-id.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-24 -- RFC PR #: 109 -- Rust Issue #: 14470 +- RFC PR: [rust-lang/rfcs#109](https://github.com/rust-lang/rfcs/pull/109) +- Rust Issue: [rust-lang/rust#14470](https://github.com/rust-lang/rust/issues/14470) # Summary diff --git a/active/0034-index-traits.md b/text/0111-index-traits.md similarity index 88% rename from active/0034-index-traits.md rename to text/0111-index-traits.md index 9e2acf553cb..5045fe4e7df 100644 --- a/active/0034-index-traits.md +++ b/text/0111-index-traits.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-09 -- RFC PR #: #111 -- Rust Issue #: #6515 +- RFC PR: [rust-lang/rfcs#111](https://github.com/rust-lang/rfcs/pull/111) +- Rust Issue: [rust-lang/rust#6515](https://github.com/rust-lang/rust/issues/6515) # Summary diff --git a/complete/0033-remove-cross-borrowing.md b/text/0112-remove-cross-borrowing.md similarity index 84% rename from complete/0033-remove-cross-borrowing.md rename to text/0112-remove-cross-borrowing.md index 9d8b6a7db6b..50c5c365a3f 100644 --- a/complete/0033-remove-cross-borrowing.md +++ b/text/0112-remove-cross-borrowing.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-09 -- RFC PR #: 112 -- Rust Issue #: #10504 +- RFC PR: [rust-lang/rfcs#112](https://github.com/rust-lang/rfcs/pull/112) +- Rust Issue: [rust-lang/rust#10504](https://github.com/rust-lang/rust/issues/10504) # Summary diff --git a/text/0114-closures.md b/text/0114-closures.md new file mode 100644 index 00000000000..adc63f44f0c --- /dev/null +++ b/text/0114-closures.md @@ -0,0 +1,437 @@ +- Start Date: 2014-07-29 +- RFC PR: [rust-lang/rfcs#114](https://github.com/rust-lang/rfcs/pull/114) +- Rust Issue: [rust-lang/rust#16095](https://github.com/rust-lang/rust/issues/16095) + +# Summary + +- Convert function call `a(b, ..., z)` into an overloadable operator + via the traits `Fn`, `FnShare`, and `FnOnce`, where `A` + is a tuple `(B, ..., Z)` of the types `B...Z` of the arguments + `b...z`, and `R` is the return type. The three traits differ in + their self argument (`&mut self` vs `&self` vs `self`). +- Remove the `proc` expression form and type. +- Remove the closure types (though the form lives on as syntactic + sugar, see below). +- Modify closure expressions to permit specifying by-reference vs + by-value capture and the receiver type: + - Specifying by-reference vs by-value closures: + - `ref |...| expr` indicates a closure that captures upvars from the + environment by reference. This is what closures do today and the + behavior will remain unchanged, other than requiring an explicit + keyword. + - `|...| expr` will therefore indicate a closure that captures upvars + from the environment by value. As usual, this is either a copy or + move depending on whether the type of the upvar implements `Copy`. 
+ - Specifying receiver mode (orthogonal to capture mode above): + - `|a, b, c| expr` is equivalent to `|&mut: a, b, c| expr` + - `|&mut: ...| expr` indicates that the closure implements `Fn` + - `|&: ...| expr` indicates that the closure implements `FnShare` + - `|: a, b, c| expr` indicates that the closure implements `FnOnce`. +- Add syntactic sugar where `|T1, T2| -> R1` is translated to + a reference to one of the fn traits as follows: + - `|T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>` + - `|&mut: T1, ..., Tn| -> R` is translated to `Fn<(T1, ..., Tn), R>` + - `|&: T1, ..., Tn| -> R` is translated to `FnShare<(T1, ..., Tn), R>` + - `|: T1, ..., Tn| -> R` is translated to `FnOnce<(T1, ..., Tn), R>` + +One aspect of closures that this RFC does *not* describe is that we +must permit trait references to be universally quantified over regions +as closures are today. A description of this change is described below +under *Unresolved questions* and the details will come in a +forthcoming RFC. + +# Motivation + +Over time we have observed a very large number of possible use cases +for closures. The goal of this RFC is to create a unified closure +model that encompasses all of these use cases. + +Specific goals (explained in more detail below): + +1. Give control over inlining to users. +2. Support closures that bind by reference and closures that bind by value. +3. Support different means of accessing the closure environment, + corresponding to `self`, `&self`, and `&mut self` methods. + +As a side benefit, though not a direct goal, the RFC reduces the +size/complexity of the language's core type system by unifying +closures and traits. + +## The core idea: unifying closures and traits + +The core idea of the RFC is to unify closures, procs, and +traits. There are a number of reasons to do this. First, it simplifies +the language, because closures, procs, and traits already served +similar roles and there was sometimes a lack of clarity about which +would be the appropriate choice. However, in addition, the unification +offers increased expressiveness and power, because traits are a more +generic model that gives users more control over optimization. + +The basic idea is that function calls become an overridable operator. +Therefore, an expression like `a(...)` will be desugar into an +invocation of one of the following traits: + + trait Fn { + fn call(&mut self, args: A) -> R; + } + + trait FnShare { + fn call_share(&self, args: A) -> R; + } + + trait FnOnce { + fn call_once(self, args: A) -> R; + } + +Essentially, `a(b, c, d)` becomes sugar for one of the following: + + Fn::call(&mut a, (b, c, d)) + FnShare::call_share(&a, (b, c, d)) + FnOnce::call_once(a, (b, c, d)) + +To integrate with this, closure expressions are then translated into a +fresh struct that implements one of those three traits. The precise +trait is currently indicated using explicit syntax but may eventually +be inferred. + +This change gives user control over virtual vs static dispatch. This +works in the same way as generic types today: + + fn foo(x: &mut Fn<(int,),int>) -> int { + x(2) // virtual dispatch + } + + fn foo>(x: &mut F) -> int { + x(2) // static dispatch + } + +The change also permits returning closures, which is not currently +possible (the example relies on the proposed `impl` syntax from +rust-lang/rfcs#105): + + fn foo(x: impl Fn<(int,),int>) -> impl Fn<(int,),int> { + |v| x(v * 2) + } + +Basically, in this design there is nothing special about a closure. 
+Closure expressions are simply a convenient way to generate a struct +that implements a suitable `Fn` trait. + +## Bind by reference vs bind by value + +When creating a closure, it is now possible to specify whether the +closure should capture variables from its environment ("upvars") by +reference or by value. The distinction is indicated using the leading +keyword `ref`: + + || foo(a, b) // captures `a` and `b` by value + + ref || foo(a, b) // captures `a` and `b` by reference, as today + +### Reasons to bind by value + +Bind by value is useful when creating closures that will escape from +the stack frame that created them, such as task bodies (`spawn(|| +...)`) or combinators. It is also useful for moving values out of a +closure, though it should be possible to enable that with bind by +reference as well in the future. + +### Reasons to bind by reference + +Bind by reference is useful for any case where the closure is known +not to escape the creating stack frame. This frequently occurs +when using closures to encapsulate common control-flow patterns: + + map.insert_or_update_with(key, value, || ...) + opt_val.unwrap_or_else(|| ...) + +In such cases, the closure frequently wishes to read or modify local +variables on the enclosing stack frame. Generally speaking, then, such +closures should capture variables by-reference -- that is, they should +store a reference to the variable in the creating stack frame, rather +than copying the value out. Using a reference allows the closure to +mutate the variables in place and also avoids moving values that are +simply read temporarily. + +The vast majority of closures in use today are should be "by +reference" closures. The only exceptions are those closures that wish +to "move out" from an upvar (where we commonly use the so-called +"option dance" today). In fact, even those closures could be "by +reference" closures, but we will have to extend the inference to +selectively identify those variables that must be moved and take those +"by value". + +# Detailed design + +## Closure expression syntax + +Closure expressions will have the following form (using EBNF notation, +where `[]` denotes optional things and `{}` denotes a comma-separated +list): + + CLOSURE = ['ref'] '|' [SELF] {ARG} '|' ['->' TYPE] EXPR + SELF = ':' | '&' ':' | '&' 'mut' ':' + ARG = ID [ ':' TYPE ] + +The optional keyword `ref` is used to indicate whether this closure +captures *by reference* or *by value*. + +Closures are always translated into a fresh struct type with one field +per upvar. In a by-value closure, the types of these fields will be +the same as the types of the corresponding upvars (modulo `&mut` +reborrows, see below). In a by-reference closure, the types of these +fields will be a suitable reference (`&`, `&mut`, etc) to the +variables being borrowed. + +### By-value closures + +The default form for a closure is by-value. This implies that all +upvars which are referenced are copied/moved into the closure as +appropriate. There is one special case: if the type of the value to be +moved is `&mut`, we will "reborrow" the value when it is copied into +the closure. That is, given an upvar `x` of type `&'a mut T`, the +value which is actually captured will have type `&'b mut T` where `'b +<= 'a`. This rule is consistent with our general treatment of `&mut`, +which is to aggressively reborrow wherever possible; moreover, this +rule cannot introduce additional compilation errors, it can only make +more programs successfully typecheck. 
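
As a rough sketch of the struct translation described above, a by-value closure such as `|v| v + step`, where `step` is an upvar of type `int`, could be thought of as expanding to something like the following (the struct name `AdderClosure` and the generated code are purely illustrative; this RFC does not prescribe the exact expansion):

    struct AdderClosure {
        step: int, // upvar captured by value
    }

    impl Fn<(int,), int> for AdderClosure {
        fn call(&mut self, args: (int,)) -> int {
            let (v,) = args;
            v + self.step
        }
    }

A call like `adder(2)` on a value `adder` of this type is then simply the sugared form of `Fn::call(&mut adder, (2,))`.
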
+ +### By-reference closures + +A *by-reference* closure is a convenience form in which values used in +the closure are converted into references before being captured. +By-reference closures are always rewritable into by-value closures if +desired, but the rewrite can often be cumbersome and annoying. + +Here is a (rather artificial) example of a by-reference closure in +use: + + let in_vec: Vec = ...; + let mut out_vec: Vec = Vec::new(); + let opt_int: Option = ...; + + opt_int.map(ref |v| { + out_vec.push(v); + in_vec.fold(v, |a, &b| a + b) + }); + +This could be rewritten into a by-value closure as follows: + + let in_vec: Vec = ...; + let mut out_vec: Vec = Vec::new(); + let opt_int: Option = ...; + + opt_int.map({ + let in_vec = &in_vec; + let out_vec = &mut in_vec; + |v| { + out_vec.push(v); + in_vec.fold(v, |a, &b| a + b) + } + }) + +In this case, the capture closed over two variables, `in_vec` and +`out_vec`. As you can see, the compiler automatically infers, for each +variable, how it should be borrowed and inserts the appropriate +capture. + +In the body of a `ref` closure, the upvars continue to have the same +type as they did in the outer environment. For example, the type of a +reference to `in_vec` in the above example is always `Vec`, +whether or not it appears as part of a `ref` closure. This is not only +convenient, it is required to make it possible to infer whether each +variable is borrowed as an `&T` or `&mut T` borrow. + +Note that there are some cases where the compiler internally employs a +form of borrow that is not available in the core language, +`&uniq`. This borrow does not permit aliasing (like `&mut`) but does +not require mutability (like `&`). This is required to allow +transparent closing over of `&mut` pointers as +[described in this blog post][p]. + +**Evolutionary note:** It is possible to evolve by-reference +closures in the future in a backwards compatible way. The goal would +be to cause more programs to type-check by default. Two possible +extensions follow: + +- Detect when values are *moved* and hence should be taken by value + rather than by reference. (This is only applicable to once + closures.) +- Detect when it is only necessary to borrow a sub-path. Imagine a + closure like `ref || use(&context.variable_map)`. Currently, this + closure will borrow `context`, even though it only *uses* the field + `variable_map`. As a result, it is sometimes necessary to rewrite + the closure to have the form `{let v = &context.variable_map; || + use(v)}`. In the future, however, we could extend the inference so + that rather than borrowing `context` to create the closure, we would + borrow `context.variable_map` directly. + +## Closure sugar in trait references + +The current type for closures, `|T1, T2| -> R`, will be repurposed as +syntactic sugar for a reference to the appropriate `Fn` trait. This +shorthand be used any place that a trait reference is appropriate. The +full type will be written as one of the following: + + <'a...'z> |T1...Tn|: K -> R + <'a...'z> |&mut: T1...Tn|: K -> R + <'a...'z> |&: T1...Tn|: K -> R + <'a...'z> |: T1...Tn|: K -> R + +Each of which would then be translated into the following trait +references, respectively: + + <'a...'z> Fn<(T1...Tn), R> + K + <'a...'z> Fn<(T1...Tn), R> + K + <'a...'z> FnShare<(T1...Tn), R> + K + <'a...'z> FnOnce<(T1...Tn), R> + K + +Note that the bound lifetimes `'a...'z` are not in scope for the bound +`K`. 
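
As a usage sketch of this sugar (the function name `call_twice` is made up for illustration), the following two generic signatures would be equivalent under the translation given above:

    // Using the sugar as a bound:
    fn call_twice<F: |int| -> int>(f: &mut F, x: int) -> int {
        f(x) + f(x)
    }

    // The same thing written with the underlying trait reference:
    fn call_twice<F: Fn<(int,), int>>(f: &mut F, x: int) -> int {
        f(x) + f(x)
    }
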
+ +# Drawbacks + +This model is more complex than the existing model in some respects +(but the existing model does not serve the full set of desired use cases). + +# Alternatives + +There is one aspect of the design that is still under active +discussion: + +**Introduce a more generic sugar.** It was proposed that we could +introduce `Trait(A, B) -> C` as syntactic sugar for `Trait<(A,B),C>` +rather than retaining the form `|A,B| -> C`. This is appealing but +removes the correspondence between the expression form and the +corresponding type. One (somewhat open) question is whether there will +be additional traits that mirror fn types that might benefit from this +more general sugar. + +**Tweak trait names.** In conjunction with the above, there is some +concern that the type name `fn(A) -> B` for a bare function with no +environment is too similar to `Fn(A) -> B` for a closure. To remedy +that, we could change the name of the trait to something like +`Closure(A) -> B` (naturally the other traits would be renamed to +match). + +Then there are a large number of permutations and options that were +largely rejected: + +**Only offer by-value closures.** We tried this and found it +required a lot of painful rewrites of perfectly reasonable code. + +**Make by-reference closures the default.** We felt this was +inconsistent with the language as a whole, which tends to make "by +value" the default (e.g., `x` vs `ref x` in patterns, `x` vs `&x` in +expressions, etc.). + +**Use a capture clause syntax that borrows individual variables.** "By +value" closures combined with `let` statements already serve this +role. Simply specifying "by-reference closure" also gives us room to +continue improving inference in the future in a backwards compatible +way. Moreover, the syntactic space around closures expressions is +extremely constrained and we were unable to find a satisfactory +syntax, particularly when combined with self-type annotations. +Finally, if we decide we *do* want the ability to have "mostly +by-value" closures, we can easily extend the current syntax by writing +something like `(ref x, ref mut y) || ...` etc. + +**Retain the proc expression form.** It was proposed that we could +retain the `proc` expression form to specify a by-value closure and +have `||` expressions be by-reference. Frankly, the main objection to +this is that nobody likes the `proc` keyword. + +**Use variadic generics in place of tuple arguments.** While variadic +generics are an interesting addition in their own right, we'd prefer +not to introduce a dependency between closures and variadic +generics. Having all arguments be placed into a tuple is also a +simpler model overall. Moreover, native ABIs on platforms of interest +treat a structure passed by value identically to distinct +arguments. Finally, given that trait calls have the "Rust" ABI, which +is not specified, we can always tweak the rules if necessary (though +there are advantages for tooling when the Rust ABI closely matches the +native ABI). + +**Use inference to determine the self type of a closure rather than an +annotation.** We retain this option for future expansion, but it is +not clear whether we can always infer the self type of a +closure. Moreover, using inference rather a default raises the +question of what to do for a type like `|int| -> uint`, where +inference is not possible. + +**Default to something other than `&mut self`.** It is our belief that +this is the most common use case for closures. + +# Transition plan + +TBD. 
pcwalton is working furiously as we speak. + +# Unresolved questions + +**What relationship should there be between the closure +traits?** On the one hand, there is clearly a relationship between the +traits. For example, given a `FnShare`, one can easily implement +`Fn`: + + impl> Fn for T { + fn call(&mut self, args: A) -> R { + (&*self).call_share(args) + } + } + +Similarly, given a `Fn` or `FnShare`, you can implement `FnOnce`. From +this, one might derive a subtrait relationship: + + trait FnOnce { ... } + trait Fn : FnOnce { ... } + trait FnShare : Fn { ... } + +Employing this relationship, however, would require that any manual +implement of `FnShare` or `Fn` must implement adapters for the other +two traits, since a subtrait cannot provide a specialized default of +supertrait methods (yet?). On the other hand, having no relationship +between the traits limits reuse, at least without employing explicit +adapters. + +Other alternatives that have been proposed to address the problem: + +- Use impls to implement the fn traits in terms of one another, + similar to what is shown above. The problem is that we would need to + implement `FnOnce` both for all `T` where `T:Fn` and for all `T` + where `T:FnShare`. This will yield coherence errors unless we extend + the language with a means to declare traits as mutually exclusive + (which might be valuable, but no such system has currently been + proposed nor agreed upon). + +- Have the compiler implement multiple traits for a single closure. + As with supertraits, this would require manual implements to + implement multiple traits. It would also require generic users to + write `T:Fn+FnMut` or else employ an explicit adapter. On the other + hand, it preserves the "one method per trait" rule described below. + +**Can we optimize away the trait vtable?** The runtime representation +of a reference `&Trait` to a trait object (and hence, under this +proposal, closures as well) is a pair of pointers `(data, vtable)`. It +has been proposed that we might be able to optimize this +representation to `(data, fnptr)` so long as `Trait` has a single +function. This slightly improves the performance of invoking the +function as one need not indirect through the vtable. The actual +implications of this on performance are unclear, but it might be a +reason to keep the closure traits to a single method. + +## Closures that are quantified over lifetimes + +A separate RFC is needed to describe bound lifetimes in trait +references. For example, today one can write a type like `<'a> |&'a A| +-> &'a B`, which indicates a closure that takes and returns a +reference with the same lifetime specified by the caller at each +call-site. Note that a trait reference like `Fn<(&'a A), &'a B>`, +while syntactically similar, does *not* have the same meaning because +it lacks the universal quantifier `<'a>`. Therefore, in the second +case, `'a` refers to some specific lifetime `'a`, rather than being a +lifetime parameter that is specified at each callsite. The high-level +summary of the change therefore is to permit trait references like +`<'a> Fn<(&'a A), &'a B>`; in this case, the value of `<'a>` will be +specified each time a method or other member of the trait is accessed. 
+ +[p]: http://smallcultfollowing.com/babysteps/blog/2014/05/13/focusing-on-ownership/ diff --git a/complete/0030-rm-integer-fallback.md b/text/0115-rm-integer-fallback.md similarity index 93% rename from complete/0030-rm-integer-fallback.md rename to text/0115-rm-integer-fallback.md index 9c902421aca..6b526df4c08 100644 --- a/complete/0030-rm-integer-fallback.md +++ b/text/0115-rm-integer-fallback.md @@ -1,6 +1,6 @@ - Start Date: 2014-06-11 -- RFC PR #: 115 -- Rust Issue #: 6023 +- RFC PR: [rust-lang/rfcs#115](https://github.com/rust-lang/rfcs/pull/115) +- Rust Issue: [rust-lang/rust#6023](https://github.com/rust-lang/rust/issues/6023) # Summary diff --git a/text/0116-no-module-shadowing.md b/text/0116-no-module-shadowing.md new file mode 100644 index 00000000000..2cba84d175f --- /dev/null +++ b/text/0116-no-module-shadowing.md @@ -0,0 +1,191 @@ +- Start Date: 2014-06-12 +- RFC PR #: https://github.com/rust-lang/rfcs/pull/116 +- Rust Issue #: https://github.com/rust-lang/rust/issues/16464 + +# Summary + +Remove or feature gate the shadowing of view items on the same scope level, in order to have less +complicated semantic and be more future proof for module system changes or experiments. + +This means the names brought in scope by `extern crate` and `use` may never collide with +each other, nor with any other item (unless they live in different namespaces). +Eg, this will no longer work: + +```rust +extern crate foo; +use foo::bar::foo; // ERROR: There is already a module `foo` in scope +``` + +Shadowing would still be allowed in case of lexical scoping, so this continues to work: + +```rust +extern crate foo; + +fn bar() { + use foo::bar::foo; // Shadows the outer foo + + foo::baz(); +} + +``` +# Definitions +Due to a certain lack of official, clearly defined semantics and terminology, a list of relevant +definitions is included: + +- __Scope__ + A _scope_ in Rust is basically defined by a block, following the rules of lexical + scoping: + + ``` + scope 1 (visible: scope 1) + { + scope 1-1 (visible: scope 1, scope 1-1) + { + scope 1-1-1 (visible: scope 1, scope 1-1, scope 1-1-1) + } + scope 1-1 + { + scope 1-1-2 + } + scope 1-1 + } + scope 1 + ``` + + Blocks include block expressions, `fn` items and `mod` items, but not things like + `extern`, `enum` or `struct`. Additionally, `mod` is special in that it isolates itself from + parent scopes. +- __Scope Level__ + Anything with the same name in the example above is on the same scope level. + In a scope level, all names defined in parent scopes are visible, but can be shadowed + by a new definition with the same name, which will be in scope for that scope itself and all its + child scopes. +- __Namespace__ + Rust has different namespaces, and the scoping rules apply to each one separately. + The exact number of different namespaces is not well defined, but they are roughly + - types (`enum Foo {}`) + - modules (`mod foo {}`) + - item values (`static FOO: uint = 0;`) + - local values (`let foo = 0;`) + - lifetimes (`impl<'a> ...`) + - macros (`macro_rules! foo {...}`) +- __Definition Item__ + Declarations that create new entities in a crate are called (by the author) + definition items. They include `struct`, `enum`, `mod`, `fn`, etc. + Each of them creates a name in the type, module, item value or macro namespace in the same + scope level they are written in. +- __View Item__ + Declarations that just create aliases to existing declarations in a crate are called + view items. 
They include `use` and `extern crate`, and also create a name in the type, + module, item value or macro namespace in the same scope level they are written in. +- __Item__ + Both definition items and view items together are collectively called items. +- __Shadowing__ + While the principle of shadowing exists in all namespaces, there are different forms of it: + - item-style: Declarations shadow names from outer scopes, and are visible everywhere in their + own, including lexically before their own definition. + This requires there to be only one definition with the same name and namespace per scope level. + Types, modules, item values and lifetimes fall under these rules. + - sequentially: Declarations shadow names that are lexically before them, both in parent scopes + and their own. This means you can reuse the same name in the same scope, but a definition + will not be visibly before itself. This is how local values and macros work. + (Due to sequential code execution and parsing, respectively) + - _view item_: + A special case exists with view items; In the same scope level, + `extern crate` creates entries in the module namespace, which are shadowable by names created + with `use`, which are shadowable with any definition item. + __The singular goal of this RFC is to remove this shadowing behavior of view items__ + +# Motivation + +As explained above, what is currently visible under which namespace in a given scope is determined +by a somewhat complicated three step process: + +1. First, every `extern crate` item creates a name in the module namespace. +2. Then, every `use` can create a name in any namespace, + shadowing the `extern crate` ones. +3. Lastly, any definition item can shadow any name brought in scope by both `extern crate` and `use`. + +These rules have developed mostly in response to the older, more complicated import system, and +the existence of wildcard imports (`use foo::*`). +In the case of wildcard imports, this shadowing behavior prevents local code from breaking if the +source module gets updated to include new names that happen to be defined locally. + +However, wildcard imports are now feature gated, and name conflicts in general can be resolved by +using the renaming feature of `extern crate` and `use`, so in the current non-gated state +of the language there is no need for this shadowing behavior. + +Gating it off opens the door to remove it altogether in a backwards compatible way, or to +re-enable it in case wildcard imports are officially supported again. + +It also makes the mental model around items simpler: Any shadowing of items happens through +lexical scoping only, and every item can be considered unordered and mutually recursive. + +If this RFC gets accepted, a possible next step would be a RFC to lift the ordering restriction +between `extern crate`, `use` and definition items, which would make them truly behave the same in +regard to shadowing and the ability to be reordered. It would also lift the weirdness of +`use foo::bar; mod foo;`. + +Implementing this RFC would also not change anything about how name resolution works, as its just +a tightening of the existing rules. + +# Drawbacks + +- Feature gating import shadowing might break some code using `#[feature(globs)]`. +- The behavior of `libstd`s prelude becomes more magical if it still allows shadowing, + but this could be de-magified again by a new feature, see below in unresolved questions. +- Or the utility of `libstd`s prelude becomes more restricted if it doesn't allow shadowing. 
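The behavior being gated, shown concretely with the `foo`/`bar` names from the example at the top of this RFC (illustrative only; under the gate, this sequence at one scope level is rejected unless `#[feature(import_shadowing)]` is enabled):

```rust
// Illustrative only: the three-step shadowing order that this RFC feature-gates.
extern crate foo;    // step 1: `foo` enters the module namespace
use foo::bar::foo;   // step 2: the `use` shadows the `extern crate` name
struct foo;          // step 3: a definition item shadows the `use`
```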
+
+# Detailed design
+
+A new feature gate `import_shadowing` gets created.
+
+During the name resolution phase of compilation, every time the compiler detects a shadowing
+between `extern crate`, `use` and definition items in the same scope level,
+it bails out unless the feature gate is enabled. This amounts to two rules:
+
+- Items in the same scope level and either the type, module, item value or lifetime namespace
+  may not shadow each other in the respective namespace.
+- Items may shadow names from outer scopes in any namespace.
+
+Just like for the `globs` feature, the `libstd` prelude import would be exempt from this,
+and still be allowed to be shadowed.
+
+# Alternatives
+
+The alternative is to do nothing, and risk running into a backwards compatibility hazard,
+or committing to make a final design decision around the whole module system before 1.0 gets
+released.
+
+# Unresolved questions
+
+- It is unclear how the `libstd` prelude fits into this.
+
+  On the one hand, it basically acts like a hidden `use std::prelude::*;` import
+  which ignores the `globs` feature, so it could simply also ignore the
+  `import_shadowing` feature as well, and the rule becomes that the prelude is a magic
+  compiler feature that injects imports into every module but doesn't prevent the user
+  from taking the same names.
+
+  On the other hand, it is also conceivable to simply forbid shadowing of prelude items as well,
+  as defining things with the same name as std exports is not recommended anyway, and this would
+  nicely enforce that. It would however mean that the prelude can not change without breaking
+  backwards compatibility, which might be too restricting.
+
+  A compromise would be to specialize wildcard imports into a new `prelude use` feature, which
+  has the explicit properties of being shadow-able and using a wildcard import. `libstd`'s prelude
+  could then simply use that, and users could define and use their own preludes as well.
+  But that's a somewhat orthogonal feature, and should be discussed in its own RFC.
+
+- Interaction with overlapping imports.
+
+  Right now it's legal to write this:
+
+  ```rust
+  fn main() {
+      use Bar = std::result::Result;
+      use Bar = std::option::Option;
+      let x: Bar = None;
+  }
+  ```
+
+  where the latter `use` shadows the former. This would have to be forbidden as well,
+  however the current semantics seem like an accident anyway.
diff --git a/text/0123-share-to-threadsafe.md b/text/0123-share-to-threadsafe.md
new file mode 100644
index 00000000000..0d8739d7748
--- /dev/null
+++ b/text/0123-share-to-threadsafe.md
@@ -0,0 +1,46 @@
+- Start Date: 2014-06-15
+- RFC PR #: [rust-lang/rfcs#123](https://github.com/rust-lang/rfcs/pull/123)
+- Rust Issue #: [rust-lang/rust#16281](https://github.com/rust-lang/rust/issues/16281)
+
+# Summary
+
+Rename the `Share` trait to `Sync`.
+
+# Motivation
+
+With interior mutability, the name "immutable pointer" for a value of type `&T`
+is not quite accurate. Instead, the term "shared reference" is becoming popular
+to refer to values of type `&T`. The usage of the term "shared" is in conflict
+with the `Share` trait, which is intended for types which can be safely shared
+concurrently with a shared reference.
+
+# Detailed design
+
+Rename the `Share` trait in `std::kinds` to `Sync`. Documentation would
+refer to `&T` as a shared reference and the notion of "shared" would simply mean
+"many references" while `Sync` implies that it is safe to share among many
+threads.
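A small usage sketch of the renamed trait in today's terms (modern paths such as `std::thread::scope` are used here instead of the `std::kinds` module named above): a `&T` can be handed to several threads at once precisely when `T` is `Sync`.

```rust
use std::thread;

// Sharing one `&T` across threads requires `T: Sync`.
fn fan_out<T: Sync + std::fmt::Debug>(value: &T) {
    thread::scope(|s| {
        s.spawn(|| println!("thread one sees {:?}", value));
        s.spawn(|| println!("thread two sees {:?}", value));
    });
}

fn main() {
    fan_out(&vec![1, 2, 3]);
}
```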
+
+# Drawbacks
+
+The name `Sync` may invoke conceptions of "synchronized" from languages such as
+Java where locks are used, rather than meaning "safe to access in a shared
+fashion across tasks".
+
+# Alternatives
+
+As with any bikeshed, there are a number of other names which could be possible for
+this trait:
+
+* `Concurrent`
+* `Synchronized`
+* `Threadsafe`
+* `Parallel`
+* `Threaded`
+* `Atomic`
+* `DataRaceFree`
+* `ConcurrentlySharable`
+
+# Unresolved questions
+
+None.
diff --git a/text/0130-box-not-special.md b/text/0130-box-not-special.md
new file mode 100644
index 00000000000..41e08c5b7b7
--- /dev/null
+++ b/text/0130-box-not-special.md
@@ -0,0 +1,118 @@
+- Start Date: 2014-07-29
+- RFC PR: [rust-lang/rfcs#130](https://github.com/rust-lang/rfcs/pull/130)
+- Rust Issue: [rust-lang/rust#16094](https://github.com/rust-lang/rust/issues/16094)
+
+# Summary
+
+Remove special treatment of `Box<T>` from the borrow checker.
+
+# Motivation
+
+Currently the `Box<T>` type is special-cased and converted to the old
+`~T` internally. This is mostly invisible to the user, but it shows up
+in some places that give special treatment to `Box<T>`. This RFC is
+specifically concerned with the fact that the borrow checker has
+greater precision when dereferencing `Box<T>` vs other smart pointers
+that rely on the `Deref` traits. Unlike the other kinds of special
+treatment, we do not currently have a plan for how to extend this
+behavior to all smart pointer types, and hence we would like to remove
+it.
+
+Here is an example that illustrates the extra precision afforded to
+`Box<T>` vs other types that implement the `Deref` traits. The
+following program, written using the `Box<T>` type, compiles
+successfully:
+
+    struct Pair {
+        a: uint,
+        b: uint
+    }
+
+    fn example1(mut smaht: Box<Pair>) {
+        let a = &mut smaht.a;
+        let b = &mut smaht.b;
+        ...
+    }
+
+This program compiles because the type checker can see that
+`(*smaht).a` and `(*smaht).b` are always distinct paths. In contrast,
+if I use a smart pointer, I get compilation errors:
+
+    fn example2(cell: RefCell<Pair>) {
+        let mut smaht: RefMut<Pair> = cell.borrow_mut();
+        let a = &mut smaht.a;
+
+        // Error: cannot borrow `smaht` as mutable more than once at a time
+        let b = &mut smaht.b;
+    }
+
+To see why this is, consider the desugaring:
+
+    fn example2(smaht: RefCell<Pair>) {
+        let mut smaht = smaht.borrow_mut();
+
+        let tmp1: &mut Pair = smaht.deref_mut(); // borrows `smaht`
+        let a = &mut tmp1.a;
+
+        let tmp2: &mut Pair = smaht.deref_mut(); // borrows `smaht` again!
+        let b = &mut tmp2.b;
+    }
+
+It is a violation of the Rust type system to invoke `deref_mut` when
+the reference to `a` is valid and usable, since `deref_mut` requires
+`&mut self`, which in turn implies no alias to `self` or anything
+owned by `self`.
+
+This desugaring suggests how the problem can be worked around in user
+code. The idea is to pull the result of the deref into a new temporary:
+
+    fn example3(smaht: RefCell<Pair>) {
+        let mut smaht: RefMut<Pair> = smaht.borrow_mut();
+        let temp: &mut Pair = &mut *smaht;
+        let a = &mut temp.a;
+        let b = &mut temp.b;
+    }
+
+# Detailed design
+
+Removing this treatment from the borrow checker basically means
+changing the construction of loan paths for unique pointers.
+
+I don't actually know how best to implement this in the borrow
+checker, particularly concerning the desire to keep the ability to
+move out of boxes and use them in patterns. This requires some
+investigation.
The easiest and best way may be to "do it right" and is +probably to handle derefs of `Box` in a similar way to how +overloaded derefs are handled, but somewhat differently to account for +the possibility of moving out of them. Some investigation is needed. + +# Drawbacks + +The borrow checker rules are that much more restrictive. + +# Alternatives + +We have ruled out inconsistent behavior between `Box` and other smart +pointer types. We considered a number of ways to extend the current +treatment of box to other smart pointer types: + +1. *Require* compiler to introduce deref temporaries automatically + where possible. This is plausible as a future extension but + requires some thought to work through all cases. It may be + surprising. Note that this would be a required optimization because + if the optimization is not performed it affects what programs can + successfully type check. (Naturally it is also observable.) + +2. Some sort of unsafe deref trait that acknolwedges possibliity of + other pointers into the referent. Unappealing because the problem + is not that bad as to require unsafety. + +3. Determining conditions (perhaps based on parametricity?) where it + is provably safe to call deref. It is dubious and unknown if such + conditions exist or what that even means. Rust also does not really + enjoy parametricity properties due to presence of reflection and + unsafe code. + +# Unresolved questions + +Best implementation strategy. diff --git a/text/0131-target-specification.md b/text/0131-target-specification.md new file mode 100644 index 00000000000..2bbb008af6b --- /dev/null +++ b/text/0131-target-specification.md @@ -0,0 +1,97 @@ +- Start Date: 2014-06-18 +- RFC PR: [rust-lang/rfcs#131](https://github.com/rust-lang/rfcs/pull/131) +- Rust Issue: [rust-lang/rust#16093](https://github.com/rust-lang/rust/issues/16093) + +# Summary + +*Note:* This RFC discusses the behavior of `rustc`, and not any changes to the +language. + +Change how target specification is done to be more flexible for unexpected +usecases. Additionally, add support for the "unknown" OS in target triples, +providing a minimum set of target specifications that is valid for bare-metal +situations. + +# Motivation + +One of Rust's important use cases is embedded, OS, or otherwise "bare metal" +software. At the moment, we still depend on LLVM's split-stack prologue for +stack safety. In certain situations, it is impossible or undesirable to +support what LLVM requires to enable this (on x86, a certain thread-local +storage setup). Additionally, porting `rustc` to a new platform requires +modifying the compiler, adding a new OS manually. + +# Detailed design + +A target triple consists of three strings separated by a hyphen, with a +possible fourth string at the end preceded by a hyphen. The first is the +architecture, the second is the "vendor", the third is the OS type, and the +optional fourth is environment type. In theory, this specifies precisely what +platform the generated binary will be able to run on. All of this is +determined not by us but by LLVM and other tools. When on bare metal or a +similar environment, there essentially is no OS, and to handle this there is +the concept of "unknown" in the target triple. When the OS is "unknown", +no runtime environment is assumed to be present (including things such as +dynamic linking, threads/thread-local storage, IO, etc). 
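For orientation, a familiar triple decomposes into the four pieces described above (shown for illustration only):

```
x86_64 - unknown - linux   - gnu
 arch    vendor    OS type   environment
```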
+ +Rather than listing specific targets for special treatment, introduce a +general mechanism for specifying certain characteristics of a target triple. +Redesign how targets are handled around this specification, including for the +built-in targets. Extend the `--target` flag to accept a file name of a target +specification. A table of the target specification flags and their meaning: + +* `data-layout`: The [LLVM data +layout](http://llvm.org/docs/LangRef.html#data-layout) to use. Mostly included +for completeness; changing this is unlikely to be used. +* `link-args`: Arguments to pass to the linker, unconditionally. +* `cpu`: Default CPU to use for the target, overridable with `-C target-cpu` +* `features`: Default target features to enable, augmentable with `-C + target-features`. +* `dynamic-linking-available`: Whether the `dylib` crate type is allowed. +* `split-stacks-supported`: Whether there is runtime support that will allow + LLVM's split stack prologue to function as intended. +* `llvm-target`: What target to pass to LLVM. +* `relocation-model`: What relocation model to use by default. +* `target_endian`, `target_word_size`: Specify the strings used for the + corresponding `cfg` variables. +* `code-model`: Code model to pass to LLVM, overridable with `-C code-model`. +* `no-redzone`: Disable use of any stack redzone, overridable with `-C + no-redzone` + +Rather than hardcoding a specific set of behaviors per-target, with no +recourse for escaping them, the compiler would also use this mechanism when +deciding how to build for a given target. The process would look like: + +1. Look up the target triple in an internal map, and load that configuration + if it exists. If that fails, check if the target name exists as a file, and + try loading that. If the file does not exist, look up `.json` in + the `RUST_TARGET_PATH`, which is a colon-separated list of directories. +2. If `-C linker` is specified, use that instead of the target-specified + linker. +3. If `-C link-args` is given, add those to the ones specified by the target. +4. If `-C target-cpu` is specified, replace the target `cpu` with it. +5. If `-C target-feature` is specified, add those to the ones specified by the + target. +6. If `-C relocation-model` is specified, replace the target + `relocation-model` with it. +7. If `-C code-model` is specified, replace the target `code-model` with it. +8. If `-C no-redzone` is specified, replace the target `no-redzone` with true. + + +Then during compilation, this information is used at the proper places rather +than matching against an enum listing the OSes we recognize. The `target_os`, +`target_family`, and `target_arch` `cfg` variables would be extracted from the +`--target` passed to rustc. + +# Drawbacks + +More complexity. However, this is very flexible and allows one to use Rust on +a new or non-standard target *incredibly easy*, without having to modify the +compiler. rustc is the only compiler I know of that would allow that. + +# Alternatives + +A less holistic approach would be to just allow disabling split stacks on a +per-crate basis. Another solution could be adding a family of targets, +`-unknown-unknown`, which omits all of the above complexity but does not +allow extending to new targets easily. 
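To make the proposed mechanism concrete, here is a hypothetical specification file that could be passed to `--target`; the key names come from the table above, while every value (and the file name `my-bare-metal.json`) is invented for illustration.

```json
{
    "llvm-target": "x86_64-unknown-unknown",
    "data-layout": "e-m:e-i64:64-f80:128-n8:16:32:64-S128",
    "target_endian": "little",
    "target_word_size": "64",
    "cpu": "x86-64",
    "features": "-mmx,-sse",
    "dynamic-linking-available": false,
    "split-stacks-supported": false,
    "relocation-model": "static",
    "code-model": "kernel",
    "no-redzone": true,
    "link-args": ["-nostdlib"]
}
```

Invocation would then look like `rustc --target my-bare-metal.json ...`, with the `-C` overrides from the numbered list applied on top.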
diff --git a/text/0132-ufcs.md b/text/0132-ufcs.md new file mode 100644 index 00000000000..686c58e9244 --- /dev/null +++ b/text/0132-ufcs.md @@ -0,0 +1,212 @@ +- Start Date: 2014-03-17 +- RFC PR #: [#132](https://github.com/rust-lang/rfcs/pull/132) +- Rust Issue #: [#16293](https://github.com/rust-lang/rust/issues/16293) + +# Summary + +This RFC describes a variety of extensions to allow any method to be +used as first-class functions. The same extensions also allow for +trait methods without receivers to be invoked in a more natural +fashion. + +First, at present, the notation `path::method()` can be used to invoke +inherent methods on types. For example, `Vec::new()` is used to create +an instance of a vector. This RFC extends that notion to also cover +trait methods, so that something like `T::size_of()` or `T::default()` +is legal. + +Second, currently it is permitted to reference so-called "static +methods" from traits using a function-like syntax. For example, one +can write `Default::default()`. This RFC extends that notation so it +can be used with any methods, whether or not they are defined with a +receiver. (In fact, the distinction between static methods and other +methods is completely erased, as per the method lookup of RFC PR #48.) + +Third, we introduce an unambiguous if verbose notation that permits +one to precisely specify a trait method and its receiver type in one +form. Specifically, the notation `::item` can be used +to designate an item `item`, defined in a trait `TraitRef`, as +implemented by the type `T`. + +# Motivation + +There are several motivations: + +- There is a need for an unambiguous way to invoke methods. This is typically + a fallback for when the more convenient invocation forms fail: + - For example, when multiple traits are in scope that all define the same + method for the same types, there must be a way to disambiguate which + method you mean. + - It is sometimes desirable not to have autoderef: + - For methods like `clone()` that apply to almost all types, it is + convenient to be more specific about which precise type you want + to clone. To get this right with autoderef, one must know the + precise rules being used, which is contrary to the "DWIM" + intention. + - For types that implement `Deref`, UFCS can be used to + unambiguously differentiate between methods invoked on the smart + pointer itself and methods invoked on its referent. +- There are many methods, such as `SizeOf::size_of()`, that return properties + of the type alone and do not naturally take any argument that can be used + to decide which trait impl you are referring to. + - This proposal introduces a variety of ways to invoke such methods, + varying in the amount of explicit information one includes: + - `T::size_of()` -- shorthand, but only works if `T` is a path + - `::size_of()` -- infers the trait `SizeOf` based on the traits in scope, + just as with a method call + - `::size_of()` -- completely unambiguous + +# Detailed design + +### Path syntax + +The syntax of paths is extended as follows: + + PATH = ID_SEGMENT { '::' ID_SEGMENT } + | TYPE_SEGMENT { '::' ID_SEGMENT } + | ASSOC_SEGMENT '::' ID_SEGMENT { '::' ID_SEGMENT } + ID_SEGMENT = ID [ '::' '<' { TYPE ',' TYPE } '>' ] + TYPE_SEGMENT = '<' TYPE '>' + ASSOC_SEGMENT = '<' TYPE 'as' TRAIT_REFERENCE '>' + +Examples of valid paths. In these examples, capitalized names refer to +types (though this doesn't affect the grammar). 
+ + a::b::c + a::::b::c + T::size_of + ::size_of + ::size_of + Eq::eq + Eq::::eq + Zero::zero + +### Normalization of path that reference types + +Whenever a path like `...::a::...` resolves to a type (but not a +*trait*), it is rewritten (internally) to `<...::a>::...`. + +Note that there is a subtle distinction between the following paths: + + ToStr::to_str + ::to_str + +In the former, we are selecting the member `to_str` from the trait `ToStr`. +The result is a function whose type is basically equivalent to: + + fn to_str(self: &Self) -> String + +In the latter, we are selecting the member `to_str` from the *type* +`ToStr` (i.e., an `ToStr` object). Resolving type members is +different. In this case, it would yield a function roughly equivalent +to: + + fn to_str(self: &ToStr) -> String + +This subtle distinction arises from the fact that we pun on the trait +name to indicate both a type and a reference to the trait itself. In +this case, depending on which interpretation we choose, the path +resolution rules differ slightly. + +### Paths that begin with a TYPE_SEGMENT + +When a path begins with a TYPE_SEGMENT, it is a type-relative path. If +this is the complete path (e.g., ``), then the path resolves to +the specified type. If the path continues (e.g., `::size_of`) +then the next segment is resolved using the following procedure. The +procedure is intended to mimic method lookup, and hence any changes to +method lookup may also change the details of this lookup. + +Given a path `::m::...`: + +1. Search for members of inherent impls defined on `T` (if any) with + the name `m`. If any are found, the path resolves to that item. +2. Otherwise, let `IN_SCOPE_TRAITS` be the set of traits that are in + scope and which contain a member named `m`: + - Let `IMPLEMENTED_TRAITS` be those traits from `IN_SCOPE_TRAITS` + for which an implementation exists that (may) apply to `T`. + - There can be ambiguity in the case that `T` contains type inference + variables. + - If `IMPLEMENTED_TRAITS` is not a singleton set, report an ambiguity + error. Otherwise, let `TRAIT` be the member of `IMPLEMENTED_TRAITS`. + - If `TRAIT` is ambiguously implemented for `T`, report an + ambiguity error and request further type information. + - Otherwise, rewrite the path to `::m::...` and + continue. + +### Paths that begin with an ASSOC_SEGMENT + +When a path begins with an ASSOC_SEGMENT, it is a reference to an +associated item defined from a trait. Note that such paths must always +have a follow-on member `m` (that is, `` is not a complete +path, but `::m` is). + +To resolve the path, first search for an applicable implementation of +`Trait` for `T`. If no implementation can be found -- or the result is +ambiguous -- then report an error. + +Otherwise: + +- Determine the types of output type parameters for `Trait` from the + implementation. +- If output type parameters were specified in the path, ensure that they + are compatible with those specified on the impl. + - For example, if the path were `>`, and + the impl is declared as `impl SomeTrait for int`, then an error + would be reported because `char` and `uint` are not compatible. +- Resolve the path to the member of the trait with the substitution composed + of the output type parameters from the impl and `Self => T`. + +# Alternatives + +We have explored a number of syntactic alternatives. This has been selected +as being the only one that is simultaneously: + +- Tolerable to look at. 
+- Able to convey *all* necessary information along with auxiliary information + the user may want to verify: + - Self type, type of trait, name of member, type output parameters + +Here are some leading candidates that were considered along with their +equivalents in the syntax proposed by this RFC. The reasons for their +rejection are listed: + + module::type::(Trait::member) ::member + --> semantics of parentheses considered too subtle + --> cannot accomodate types that are not paths, like `[int]` + + (type: Trait)::member ::member + --> complicated to parse + --> cannot accomodate types that are not paths, like `[int]` + + ... (I can't remember all the rest) + +One variation that is definitely possible is that we could use the `:` +rather than the keyword `as`: + + ::member ::member + --> no real objection. `as` was chosen because it mimics the + syntax for constructing a trait object. + +# Unresolved questions + +Is there a better way to disambiguate a reference to a trait item +`ToStr::to_str` versus a reference to a member of the object type +`::to_str`? I personally do not think so: so long as we pun on +the name of the trait, the potential for confusion will +remain. Therefore, the only two possibilities I could come up with are +to try and change the question: + +- One answer might be that we simply make the second form meaningless + by prohibiting inherent impls on object types. But there remains a + utility to being able to write something like `::is_sized()` + (where `is_sized()` is an example of a trait fn that could apply to + both sized and unsized types). Moreover, artificially restricting + object types just for this reason doesn't seem right. + +- Another answer is to change the syntax of object types. I have + sometimes considered that `impl ToStr` might be better suited as the + object type and then `ToStr` could be used as syntactic sugar for a + type parameter. But there exists a lot of precedent for the current + approach and hence I think this is likely a bad idea (not to mention + that it'a a drastic change). diff --git a/text/0135-where.md b/text/0135-where.md new file mode 100644 index 00000000000..24e24529874 --- /dev/null +++ b/text/0135-where.md @@ -0,0 +1,463 @@ +- Start Date: 2014-09-30 +- RFC PR #: https://github.com/rust-lang/rfcs/pull/135 +- Rust Issue #: https://github.com/rust-lang/rust/issues/17657 + +# Summary + +Add `where` clauses, which provide a more expressive means of +specifying trait parameter bounds. A `where` clause comes after a +declaration of a generic item (e.g., an impl or struct definition) and +specifies a list of bounds that must be proven once precise values are +known for the type parameters in question. The existing bounds +notation would remain as syntactic sugar for where clauses. + +So, for example, the `impl` for `HashMap` could be changed from this: + + impl HashMap + { + .. + } + +to the following: + + impl HashMap + where K : Hash + Eq + { + .. + } + +The full grammar can be found in the detailed design. + +# Motivation + +The high-level bit is that the current bounds syntax does not scale to +complex cases. Introducing `where` clauses is a simple extension that +gives us a lot more expressive power. In particular, it will allow us +to refactor the operator traits to be in a convenient, multidispatch +form (e.g., so that user-defined mathematical types can be added to +`int` and vice versa). (It's also worth pointing out that, once #5527 +lands at least, implementing where clauses will be very little work.) 
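For orientation before the detailed limitations below, a minimal sketch in post-1.0 syntax of the two equivalent ways of writing the same bounds (the function names and bodies are invented for illustration):

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Bounds written inline in the type parameter list...
fn lookup<K: Hash + Eq, V: Clone>(map: &HashMap<K, V>, key: &K) -> Option<V> {
    map.get(key).cloned()
}

// ...and the same signature with the bounds moved into a `where` clause.
fn lookup_where<K, V>(map: &HashMap<K, V>, key: &K) -> Option<V>
where
    K: Hash + Eq,
    V: Clone,
{
    map.get(key).cloned()
}

fn main() {
    let mut m = HashMap::new();
    m.insert("a", 1);
    assert_eq!(lookup(&m, &"a"), lookup_where(&m, &"a"));
}
```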
+ +Here is a list of limitations with the current bounds syntax that are +overcome with the `where` syntax: + +- **It cannot express bounds on anything other than type parameters.** + Therefore, if you have a function generic in `T`, you can write + `T:MyTrait` to declare that `T` must implement `MyTrait`, but you + can't write `Option : MyTrait` or `(int, T) : MyTrait`. These + forms are less commonly required but still important. + +- **It does not work well with associated types.** This is because + there is no space to specify the value of an associated type. Other + languages use `where` clauses (or something analogous) for this + purpose. + +- **It's just plain hard to read.** Experience has shown that as the + number of bounds grows, the current syntax becomes hard to read and + format. + +Let's examine each case in detail. + +### Bounds are insufficiently expressive + +Currently bounds can only be declared on type parameters. But there +are situations where one wants to declare bounds not on the type +parameter itself but rather a type that includes the type parameter. + +#### Partially generic types + +One situation where this is occurs is when you want to write functions +where types are partially known and have those interact with other +functions that are fully generic. To explain the situation, let's +examine some code adapted from rustc. + +Imagine I have a table parameterized by a value type `V` and a key +type `K`. There are also two traits, `Value` and `Key`, that describe +the keys and values. Also, each type of key is linked to a specific +value: + + struct Table> { ... } + trait Key { ... } + trait Value { ... } + +Now, imagine I want to write some code that operates over all keys +whose value is an `Option` for some `T`: + + fn example>(table: &Table, K>) { ... } + +This seems reasonable, but this code will not compile. The problem is +that the compiler needs to know that the value type implements +`Value`, but here the value type is `Option`. So we'd need to +declare `Option : Value`, which we cannot do. + +There are workarounds. I might write a new trait `OptionalValue`: + + trait OptionalValue { + fn as_option<'a>(&'a self) -> &'a Option; // identity fn + } + +and then I could write my example as: + + fn example,K:Key(table: &Table) { ... } + +But this is making my example function, already a bit complicated, +become quite obscure. + +#### Multidispatch traits + +Another situation where a similar problem is encountered is +*multidispatch traits* (aka, multiparameter type classes in Haskell). +The idea of a multidispatch trait is to be able to choose the impl +based not just on one type, as is the most common case, but on +multiple types (usually, but not always, two). + +Multidispatch is rarely needed because the *vast* majority of traits +are characterized by a single type. But when you need it, you really +need it. One example that arises in the standard library is the traits +for binary operators like `+`. Today, the `Add` trait is defined using +only single-dispatch (like so): + +``` +pub trait Add { + fn add(&self, rhs: &Rhs) -> Sum; +} +``` + +The expression `a + b` is thus sugar for `Add::add(&a, &b)`. Because +of how our trait system works, this means that only the type of the +left-hand side (the `Self` parameter) will be used to select the +impl. 
The type for the right-hand side (`Rhs`) along with the type of +their sum (`Sum`) are defined as trait parameters, which are always +*outputs* of the trait matching: that is, they are specified by the +impl and are not used to select which impl is used. + +This setup means that addition is not as extensible as we would +like. For example, the standard library includes implementations of +this trait for integers and other built-in types: + +``` +impl Add for int { ... } +impl Add for f32 { ... } +``` + +The limitations of this setup become apparent when we consider how a +hypothetical user library might integrate. Imagine a library L that +defines a type `Complex` representing complex numbers: + +``` +struct Complex { ... } +``` + +Naturally, it should be possible to add complex numbers and integers. +Since complex number addition is commutative, it should be possible to +write both `1 + c` and `c + 1`. Thus one might try the following +impls: + +``` +impl Add for Complex { ... } // 1. Complex + int +impl Add for int { ... } // 2. int + Complex +impl Add for Complex { ... } // 3. Complex + Complex +``` + +Due to the coherence rules, however, this setup will not work. There +are in fact three errors. The first is that there are two impls of +`Add` defined for `Complex` (1 and 3). The second is that there are +two impls of `Add` defined for `int` (the one from the standard +library and 2). The final error is that impl 2 violates the orphan +rule, since the type `int` is not defined in the current crate. + +This is not a new problem. Object-oriented languages, with their focus +on single dispatch, have long had trouble dealing with binary +operators. One common solution is double dispatch, an awkward but +effective pattern in which no type ever implements `Add` +directly. Instead, we introduce "indirection" traits so that, e.g., +`int` is addable to anything that implements `AddToInt` and so +on. This is not my preferred solution so I will not describe it in +detail, but rather refer readers to [this blog post][bp] where I +describe how it works. + +An alternative to double dispatch is to define `Add` on tuple types +`(LHS, RHS)` rather than on a single value. Imagine that the `Add` +trait were defined as follows: + + trait Add { + fn add(self) -> Sum; + } + + impl Add for (int, int) { + fn add(self) -> int { + let (x, y) = self; + x + y + } + } + +Now the expression `a + b` would be sugar for `Add::add((a, b))`. +This small change has several interesting ramifications. For one +thing, the library L can easily extend `Add` to cover complex numbers: + +``` +impl Add for (Complex, int) { ... } +impl Add for (int, Complex) { ... } +impl Add for (Complex, Complex) { ... } +``` + +These impls do not violate the coherence rules because they are all +applied to distinct types. Moreover, none of them violate the orphan +rule because each of them is a tuple involving at least one type local +to the library. + +One downside of this `Add` pattern is that there is no way within the +trait definition to refer to the type of the left- or right-hand side +individually; we can only use the type `Self` to refer to the tuple of +both types. In the *Discussion* section below, I will introduce +an extended "multi-dispatch" pattern that addresses this particular +problem. + +There is however another problem that where clauses help to +address. 
Imagine that we wish to define a function to increment +complex numbers: + + fn increment(c: Complex) -> Complex { + 1 + c + } + +This function is pretty generic, so perhaps we would like to +generalize it to work over anything that can be added to an int. We'll +use our new version of the `Add` trait that is implemented over +tuples: + + fn increment(c: T) -> T { + 1 + c + } + +At this point we encounter the problem. What bound should we give for +`T`? We'd like to write something like `(int, T) : Add` -- that +is, `Add` is implemented for the tuple `(int, T)` with the sum type +`T`. But we can't write that, because the current bounds syntax is too +limited. + +Where clauses give us an answer. We can write a generic version of +`increment` like so: + + fn increment(c: T) -> T + where (int, T) : Add + { + 1 + c + } + +### Associated types + +It is unclear exactly what form associated types will have in Rust, +but it is [well documented][comparison] that our current design, in +which type parameters decorate traits, does not scale particularly +well. (For curious readers, there are [several][part1] [blog][part2] +[posts][pnkfelix] exploring the design space of associated types with +respect to Rust in particular.) + +The high-level summary of associated types is that we can replace +a generic trait like `Iterator`: + + trait Iterator { + fn next(&mut self) -> Option; + } + +With a version where the type parameter is a "member" of the +`Iterator` trait: + + trait Iterator { + type E; + + fn next(&mut self) -> Option; + } + +This syntactic change helps to highlight that, for any given type, the +type `E` is *fixed* by the impl, and hence it can be considered a +member (or output) of the trait. It also scales better as the number +of associated types grows. + +One challenge with this design is that it is not clear how to convert +a function like the following: + + fn sum>(i: I) -> int { + ... + } + +With associated types, the reference `Iterator` is no longer +valid, since the trait `Iterator` doesn't have type parameters. + +The usual solution to this problem is to employ a where clause: + + fn sum(i: I) -> int + where I::E == int + { + ... + } + +We can also employ where clauses with object types via a syntax like +`&Iterator` (admittedly somewhat wordy) + +## Readability + +When writing very generic code, it is common to have a large number of +parameters with a large number of bounds. Here is some example +function extracted from `rustc`: + + fn set_var_to_merged_bounds>>( + &self, + v_id: V, + a: &Bounds, + b: &Bounds, + rank: uint) + -> ures; + +Definitions like this are very difficult to read (it's hard to even know +how to *format* such a definition). + +Using a `where` clause allows the bounds to be separated from the list +of type parameters: + + fn set_var_to_merged_bounds(&self, + v_id: V, + a: &Bounds, + b: &Bounds, + rank: uint) + -> ures + where T:Clone, // it is legal to use individual clauses... + T:InferStr, + T:LatticeValue, + V:Clone+Eq+ToStr+Vid+UnifyVid>, // ...or use `+` + { + .. + } + +This helps to separate out the function signature from the extra +requirements that the function places on its types. + +If I may step aside from the "impersonal voice" of the RFC for a +moment, I personally find that when writing generic code it is helpful +to focus on the types and signatures, and come to the bounds +later. Where clauses help to separate these distinctions. Naturally, +your mileage may vary. - nmatsakis + +# Detailed design + +### Where can where clauses appear? 
+ +Where clauses can be added to anything that can be parameterized with +type/lifetime parameters with the exception of trait method +definitions: `impl` declarations, `fn` declarations, and `trait` and +`struct` definitions. They appear as follows: + + impl Foo + where ... + { } + + impl Foo for C + where ... + { } + + impl Foo for C + { + fn foo -> C + where ... + { } + } + + fn foo -> C + where ... + { } + + struct Foo + where ... + { } + + trait Foo : C + where ... + { } + +#### Where clauses cannot (yet) appear on trait methods + +Note that trait method definitions were specifically excluded from the +list above. The reason is that including where clauses on a trait +method raises interesting questions for what it means to implement the +trait. Using where clauses it becomes possible to define methods that +do not necessarily apply to all implementations. We intend to enable +this feature but it merits a second RFC to delve into the details. + +### Where clause grammar + +The grammar for a `where` clause would be as follows (BNF): + + WHERE = 'where' BOUND { ',' BOUND } [,] + BOUND = TYPE ':' TRAIT { '+' TRAIT } [+] + TRAIT = Id [ '<' [ TYPE { ',' TYPE } [,] ] '>' ] + TYPE = ... (same type grammar as today) + +### Semantics + +The meaning of a where clause is fairly straightforward. Each bound in +the where clause must be proven by the caller after substitution of +the parameter types. + +One interesting case concerns trivial where clauses where the +self-type does not refer to any of the type parameters, such as the +following: + + fn foo() + where int : Eq + { ... } + +Where clauses like these are considered an error. They have no +particular meaning, since the callee knows all types involved. This is +a conservative choice: if we find that we do desire a particular +interpretation for them, we can always make them legal later. + +# Drawbacks + +This RFC introduces two ways to declare a bound. + +# Alternatives + +**Remove the existing trait bounds.** I decided against this both to +avoid breaking lots of existing code and because the existing syntax +is convenient much of the time. + +**Embed where clauses in the type parameter list.** One alternative +syntax that was proposed is to embed a where-like clause in the type +parameter list. Thus the `increment()` example + + fn increment(c: T) -> T + where () : Add + { + 1 + c + } + +would become something like: + + fn increment>(c: T) -> T + { + 1 + c + } + +This is unfortunately somewhat ambiguous, since a bound like `T:Eq` +could either be declared a type parameter `T` or as a condition that +the (existing) type `T` implement `Eq`. + +**Use a colon intead of the keyword.** There is some precedent for +this from the type state days. 
Unfortunately, it doesn't work with +traits due to the supertrait list, and it also doesn't look good with +the use of `:` as a trait-bound separator: + + fn increment(c: T) -> T + : () : Add + { + 1 + c + } + +[bp]: http://smallcultfollowing.com/babysteps/blog/2012/10/04/refining-traits-slash-impls/ +[comparison]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.122 +[pnkfelix]: http://blog.pnkfx.org/blog/2013/04/22/designing-syntax-for-associated-items-in-rust/#background +[part1]: http://www.smallcultfollowing.com/babysteps/blog/2013/04/02/associated-items/ +[part2]: http://www.smallcultfollowing.com/babysteps/blog/2013/04/03/associated-items-continued/ + diff --git a/text/0136-no-privates-in-public.md b/text/0136-no-privates-in-public.md new file mode 100644 index 00000000000..4a04cdfbf68 --- /dev/null +++ b/text/0136-no-privates-in-public.md @@ -0,0 +1,275 @@ +- Start Date: 2014-06-24 +- RFC PR #: [#136](https://github.com/rust-lang/rfcs/pull/136) +- Rust Issue #: [#16463](https://github.com/rust-lang/rust/issues/16463) + +# Summary + +Require a feature gate to expose private items in public APIs, until we grow the +appropriate language features to be able to remove the feature gate and forbid +it entirely. + +# Motivation + +Privacy is central to guaranteeing the invariants necessary to write +correct code that employs unsafe blocks. Although the current language +rules prevent a private item from being directly named from outside +the current module, they still permit direct access to private items +in some cases. For example, a public function might return a value of +private type. A caller from outside the module could then invoke this +function and, thanks to type inference, gain access to the private +type (though they still could not invoke public methods or access +public fields). This access could undermine the reasoning of the +author of the module. Fortunately, it is not hard to prevent. + +# Detailed design + +## Overview + +The general idea is that: + + * If an item is declared as public, items referred to in the + public-facing parts of that item (e.g. its type) must themselves be + declared as public. + +Details follow. + +## The rules + +These rules apply as long as the feature gate is not enabled. After the feature +gate has been removed, they will apply always. + +### When is an item "public"? + +Items that are explicitly declared as `pub` are always public. In +addition, items in the `impl` of a trait (not an inherent impl) are +considered public if all of the following conditions are met: + + * The trait being implemented is public. + * All input types (currently, the self type) of the impl are public. + * *Motivation:* If any of the input types or the trait is public, it + should be impossible for an outside to access the items defined in + the impl. They cannot name the types nor they can get direct access + to a value of those types. + +### What restrictions apply to public items? + +The rules for various kinds of public items are as follows: + + * If it is a `static` declaration, items referred to in its type must be public. + + * If it is an `fn` declaration, items referred to in its trait bounds, argument + types, and return type must be public. + + * If it is a `struct` or `enum` declaration, items referred to in its trait + bounds and in the types of its `pub` fields must be public. + + * If it is a `type` declaration, items referred to in its definition must be + public. 
+ + * If it is a `trait` declaration, items referred to in its super-traits, in the + trait bounds of its type parameters, and in the signatures of its methods + (see `fn` case above) must be public. + +### Examples + +Here are some examples to demonstrate the rules. + +#### Struct fields + +```` +// A private struct may refer to any type in any field. +struct Priv { + a: Priv, + b: Pub, + pub c: Priv +} + +enum Vapor { X, Y, Z } // Note that A is not used + +// Public fields of a public struct may only refer to public types. +pub struct Item { + // Private field may reference a private type. + a: Priv, + + // Public field must refer to a public type. + pub b: Pub, + + // ERROR: Public field refers to a private type. + pub c: Priv, + + // ERROR: Public field refers to a private type. + // For the purposes of this test, we do not descend into the type, + // but merely consider the names that appear in type parameters + // on the type, regardless of usage (or lack thereof) within the type + // definition itself. + pub d: Vapor, +} + +pub struct Pub { ... } +```` + +#### Methods + +``` +struct Priv { .. } +pub struct Pub { .. } +pub struct Foo { .. } + +impl Foo { + // Illegal: public method with argument of private type. + pub fn foo(&self, p: Priv) { .. } +} +``` + +#### Trait bounds + +``` +trait PrivTrait { ... } + +// Error: type parameter on public item bounded by a private trait. +pub struct Foo { ... } + +// OK: type parameter on private item. +struct Foo { ... } +``` + +#### Trait definitions + +``` +struct PrivStruct { ... } + +pub trait PubTrait { + // Error: private struct referenced from method in public trait + fn method(x: PrivStruct) { ... } +} + +trait PrivTrait { + // OK: private struct referenced from method in private trait + fn method(x: PrivStruct) { ... } +} +``` + +#### Implementations + +To some extent, implementations are prevented from exposing private +types because their types must match the trait. However, that is not +true with generics. + +``` +pub trait PubTrait { + fn method(t: T); +} + +struct PubStruct { ... } + +struct PrivStruct { ... } + +impl PubTrait for PubStruct { + // ^~~~~~~~~~ Error: Private type referenced from impl of + // public trait on a public type. [Note: this is + // an "associated type" here, not an input.] + + fn method(t: PrivStruct) { + // ^~~~~~~~~~ Error: Private type in method signature. + // + // Implementation note. It may not be a good idea to report + // an error here; I think private types can only appear in + // an impl by having an associated type bound to a private + // type. + } +} +``` + +#### Type aliases + +Note that the path to the public item does not have to be private. + +``` +mod impl { + pub struct Foo { ... } +} +pub type Bar = self::impl::Foo; +``` + +### Negative examples + +The following examples should fail to compile under these rules. + +#### Non-public items referenced by a pub use + +These examples are illegal because they use a `pub use` to re-export +a private item: + +```` +struct Item { ... } +pub mod module { + // Error: Item is not declared as public, but is referenced from + // a `pub use`. + pub use Item; +} +```` + +```` +struct Foo { ... } +// Error: Non-public item referenced by `pub use`. +pub use Item = Foo; +```` + +If it was desired to have a private name that is publicly "renamed" using a pub +use, that can be achieved using a module: + +``` +mod impl { + pub struct ItemPriv; +} +pub use Item = self::impl::ItemPriv; +``` + +# Drawbacks + +Adds a (temporary) feature gate. 
+ +Requires some existing code to opt-in to the feature gate before +transitioning to a more explicit alternative. + +Requires effort to implement. + +# Alternatives + +If we stick with the status quo, we'll have to resolve several bizarre questions +and keep supporting its behavior indefinitely after 1.0. + +Instead of a feature gate, we could just ban these things outright right away, +at the cost of temporarily losing some convenience and a small amount of +expressiveness before the more principled replacement features are implemented. + +We could make an exception for private supertraits, as these are not quite as +problematic as the other cases. However, especially given that a more principled +alternative is known (private methods), I would rather not make any exceptions. + +The original design of this RFC had a stronger notion of "public" +which also considered whether a public path existed to the item. In +other words, a module `X` could not refer to a public item `Y` from a +submodule `Z`, unless `X` also exposed a public path to `Y` (whether +that be because `Z` was public, or via a `pub use`). This definition +strengthened the basic guarantee of "private things are only directly +accessible from within the current module" to include the idea that +public functions in outer modules cannot accidentally refer to public +items from inner modules unless there is a public path from the outer +to the inner module. Unfortunately, these rules were complex to state +concisely and also hard to understand in practice; when an error +occurred under these rules, it was very hard to evaluate whether the +error was legitimate. The newer rules are simpler while still +retaining the basic privacy guarantee. + +One important advantage of the earlier approach, and a scenario not +directly addressed in this RFC, is that there may be items which are +declared as public by an inner module but *still* not intended to be +exposed to the world at large (in other words, the items are only +expected to be used within some subtree). A special case of this is +crate-local data. In the older rules, the "intended scope" of privacy +could be somewhat inferred from the existence (or non-existence) of +`pub use` declarations. However, in the author's opinion, this +scenario would be best addressed by making `pub` declarations more +expressive so that the intended scope can be stated directly. + diff --git a/text/0139-remove-cross-borrowing-entirely.md b/text/0139-remove-cross-borrowing-entirely.md new file mode 100644 index 00000000000..d505fc6ad64 --- /dev/null +++ b/text/0139-remove-cross-borrowing-entirely.md @@ -0,0 +1,31 @@ +- Start Date: 2014-06-25 +- RFC PR: [rust-lang/rfcs#139](https://github.com/rust-lang/rfcs/pull/139) +- Rust Issue: [rust-lang/rust#10504](https://github.com/rust-lang/rust/issues/10504) + +# Summary + +Remove the coercion from `Box` to `&T` from the language. + +# Motivation + +The coercion between `Box` to `&T` is not replicable by user-defined smart pointers and has been found to be rarely used [1]. We already removed the coercion between `Box` and `&mut T` in RFC 33. + +# Detailed design + +The coercion between `Box` and `&T` should be removed. + +Note that methods that take `&self` can still be called on values of type `Box` without any special referencing or dereferencing. That is because the semantics of auto-deref and auto-ref conspire to make it work: the types unify after one autoderef followed by one autoref. + +# Drawbacks + +Borrowing from `Box` to `&T` may be convenient. 
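The convenience lost is small in practice; a short sketch in today's syntax of the explicit forms that remain available (variable and function names invented for illustration):

```rust
fn print_len(v: &Vec<i32>) {
    println!("{}", v.len());
}

fn main() {
    let b: Box<Vec<i32>> = Box::new(vec![1, 2, 3]);
    print_len(&*b);   // explicit reborrow from Box<T> to &T
    let _n = b.len(); // &self methods still resolve via auto-deref/auto-ref
}
```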
+ +# Alternatives + +The impact of not doing this is that the coercion will remain. + +# Unresolved questions + +None. + +[1]: https://github.com/rust-lang/rust/pull/15171 diff --git a/text/0141-lifetime-elision.md b/text/0141-lifetime-elision.md new file mode 100644 index 00000000000..9979ba362a2 --- /dev/null +++ b/text/0141-lifetime-elision.md @@ -0,0 +1,322 @@ +- Start Date: (2014-06-24) +- RFC PR: [rust-lang/rfcs#141](https://github.com/rust-lang/rfcs/pull/141) +- Rust Issue: [rust-lang/rust#15552](https://github.com/rust-lang/rust/issues/15552) + +# Summary + +This RFC proposes to + +1. Expand the rules for eliding lifetimes in `fn` definitions, and +2. Follow the same rules in `impl` headers. + +By doing so, we can avoid writing lifetime annotations ~87% of the time that +they are currently required, based on a survey of the standard library. + +# Motivation + +In today's Rust, lifetime annotations make code more verbose, both for methods + +```rust +fn get_mut<'a>(&'a mut self) -> &'a mut T +``` + +and for `impl` blocks: + +```rust +impl<'a> Reader for BufReader<'a> { ... } +``` + +In the vast majority of cases, however, the lifetimes follow a very simple +pattern. + +By codifying this pattern into simple rules for filling in elided lifetimes, we +can avoid writing any lifetimes in ~87% of the cases where they are currently +required. + +Doing so is a clear ergonomic win. + +# Detailed design + +## Today's lifetime elision rules + +Rust currently supports eliding lifetimes in functions, so that you can write + +```rust +fn print(s: &str); +fn get_str() -> &str; +``` + +instead of + +```rust +fn print<'a>(s: &'a str); +fn get_str<'a>() -> &'a str; +``` + +The elision rules work well for functions that consume references, but not for +functions that produce them. The `get_str` signature above, for example, +promises to produce a string slice that lives arbitrarily long, and is +either incorrect or should be replaced by + +```rust +fn get_str() -> &'static str; +``` + +Returning `'static` is relatively rare, and it has been proposed to make leaving +off the lifetime in output position an error for this reason. + +Moreover, lifetimes cannot be elided in `impl` headers. + +## The proposed rules + +### Overview + +This RFC proposes two changes to the lifetime elision rules: + +1. Since eliding a lifetime in output position is usually wrong or undesirable + under today's elision rules, interpret it in a different and more useful way. + +2. Interpret elided lifetimes for `impl` headers analogously to `fn` definitions. + +### Lifetime positions + +A _lifetime position_ is anywhere you can write a lifetime in a type: + +```rust +&'a T +&'a mut T +T<'a> +``` + +As with today's Rust, the proposed elision rules do _not_ distinguish between +different lifetime positions. For example, both `&str` and `Ref` have +elided a single lifetime. + +Lifetime positions can appear as either "input" or "output": + +* For `fn` definitions, input refers to the types of the formal arguments + in the `fn` definition, while output refers to + result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in + input position and two lifetimes in output position. + Note that the input positions of a `fn` method definition do not + include the lifetimes that occur in the method's `impl` header + (nor lifetimes that occur in the trait header, for a default method). + + +* For `impl` headers, input refers to the lifetimes appears in the type + receiving the `impl`, while output refers to the trait, if any. 
So `impl<'a> + Foo<'a>` has `'a` in input position, while `impl<'a, 'b, 'c> + SomeTrait<'b, 'c> for Foo<'a, 'c>` has `'a` in input position, `'b` + in output position, and `'c` in both input and output positions. + +### The rules + +* Each elided lifetime in input position becomes a distinct lifetime + parameter. This is the current behavior for `fn` definitions. + +* If there is exactly one input lifetime position (elided or not), that lifetime + is assigned to _all_ elided output lifetimes. + +* If there are multiple input lifetime positions, but one of them is `&self` or + `&mut self`, the lifetime of `self` is assigned to _all_ elided output lifetimes. + +* Otherwise, it is an error to elide an output lifetime. + +Notice that the _actual_ signature of a `fn` or `impl` is based on the expansion +rules above; the elided form is just a shorthand. + +### Examples + +```rust +fn print(s: &str); // elided +fn print<'a>(s: &'a str); // expanded + +fn debug(lvl: uint, s: &str); // elided +fn debug<'a>(lvl: uint, s: &'a str); // expanded + +fn substr(s: &str, until: uint) -> &str; // elided +fn substr<'a>(s: &'a str, until: uint) -> &'a str; // expanded + +fn get_str() -> &str; // ILLEGAL + +fn frob(s: &str, t: &str) -> &str; // ILLEGAL + +fn get_mut(&mut self) -> &mut T; // elided +fn get_mut<'a>(&'a mut self) -> &'a mut T; // expanded + +fn args(&mut self, args: &[T]) -> &mut Command // elided +fn args<'a, 'b, T:ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command // expanded + +fn new(buf: &mut [u8]) -> BufWriter; // elided +fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a> // expanded + +impl Reader for BufReader { ... } // elided +impl<'a> Reader for BufReader<'a> { .. } // expanded + +impl Reader for (&str, &str) { ... } // elided +impl<'a, 'b> Reader for (&'a str, &'b str) { ... } // expanded + +impl StrSlice for &str { ... } // elided +impl<'a> StrSlice<'a> for &'a str { ... } // expanded + +trait Bar<'a> { fn bound(&'a self) -> &int { ... } fn fresh(&self) -> &int { ... } } // elided +trait Bar<'a> { fn bound(&'a self) -> &'a int { ... } fn fresh<'b>(&'b self) -> &'b int { ... } } // expanded + +impl<'a> Bar<'a> for &'a str { + fn bound(&'a self) -> &'a int { ... } fn fresh(&self) -> &int { ... } // elided +} +impl<'a> Bar<'a> for &'a str { + fn bound(&'a self) -> &'a int { ... } fn fresh<'b>(&'b self) -> &'b int { ... } // expanded +} + +// Note that when the impl reuses the same signature (with the same elisions) +// from the trait definition, the expanded forms will also match, and thus +// the `impl` will be compatible with the `trait`. + +impl Bar for &str { fn bound(&self) -> &int { ... } } // elided +impl<'a> Bar<'a> for &'a str { fn bound<'b>(&'b self) -> &'b int { ... } } // expanded + +// Note that the preceding example's expanded methods do not match the +// signatures from the above trait definition for `Bar`; in the general +// case, if the elided signatures between the `impl` and the `trait` do +// not match, an expanded `impl` may not be compatible with the given +// `trait` (and thus would not compile). + +impl Bar for &str { fn fresh(&self) -> &int { ... } } // elided +impl<'a> Bar<'a> for &'a str { fn fresh<'b>(&'b self) -> &'b int { ... } } // expanded + +impl Bar for &str { + fn bound(&'a self) -> &'a int { ... } fn fresh(&self) -> &int { ... } // ILLEGAL: unbound 'a +} + +``` + +## Error messages + +Since the shorthand described above should eliminate most uses of explicit +lifetimes, there is a potential "cliff". 
When a programmer first encounters a +situation that requires explicit annotations, it is important that the compiler +gently guide them toward the concept of lifetimes. + +An error can arise with the above shorthand only when the program elides an +output lifetime and neither of the rules can determine how to annotate it. + +### For `fn` + +The error message should guide the programmer toward the concept of lifetime by +talking about borrowed values: + +> This function's return type contains a borrowed value, but the signature does +> not say which parameter it is borrowed from. It could be one of a, b, or +> c. Mark the input parameter it borrows from using lifetimes, +> e.g. [generated example]. See [url] for an introduction to lifetimes. + +This message is slightly inaccurate, since the presence of a lifetime parameter +does not necessarily imply the presence of a borrowed value, but there are no +known use-cases of phantom lifetime parameters. + +### For `impl` + +The error case on `impl` is exceedingly rare: it requires (1) that the `impl` is +for a trait with a lifetime argument, which is uncommon, and (2) that the `Self` +type has multiple lifetime arguments. + +Since there are no clear "borrowed values" for an `impl`, this error message +speaks directly in terms of lifetimes. This choice seems warranted given that a +programmer implementing a trait with lifetime parameters will almost certainly +already understand lifetimes. + +> TraitName requires lifetime arguments, and the impl does not say which +> lifetime parameters of TypeName to use. Mark the parameters explicitly, +> e.g. [generated example]. See [url] for an introduction to lifetimes. + +## The impact + +To assess the value of the proposed rules, we conducted a survey of the code +defined _in_ `libstd` (as opposed to the code it reexports). This corpus is +large and central enough to be representative, but small enough to easily +analyze. + +We found that of the 169 lifetimes that currently require annotation for +`libstd`, 147 would be elidable under the new rules, or 87%. + +_Note: this percentage does not include the large number of lifetimes that are +already elided with today's rules._ + +The detailed data is available at: +https://gist.github.com/aturon/da49a6d00099fdb0e861 + +# Drawbacks + +## Learning lifetimes + +The main drawback of this change is pedagogical. If lifetime annotations are +rarely used, newcomers may encounter error messages about lifetimes long before +encountering lifetimes in signatures, which may be confusing. Counterpoints: + +* This is already the case, to some extent, with the current elision rules. + +* Most existing error messages are geared to talk about specific borrows not + living long enough, pinpointing their _locations_ in the source, rather than + talking in terms of lifetime annotations. When the errors do mention + annotations, it is usually to suggest specific ones. + +* The proposed error messages above will help programmers transition out of the + fully elided regime when they first encounter a signature requiring it. + +* When combined with a good tutorial on the borrow/lifetime system (which should + be introduced early in the documentation), the above should provide a + reasonably gentle path toward using and understanding explicit lifetimes. + +Programmers learn lifetimes once, but will use them many times. Better to favor +long-term ergonomics, if a simple elision rule can cover 87% of current lifetime +uses (let alone the currently elided cases). 
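+
+For concreteness, the transition that the proposed messages are meant to
+smooth over might look like the following sketch (the function name here is
+invented for illustration and is not taken from the survey):
+
+```rust
+// Rejected under the rules above: two input lifetime positions, neither of
+// them `&self`, so the elided output lifetime cannot be resolved.
+fn first_word(line: &str, sep: &str) -> &str;
+
+// The explicit form the error message would suggest:
+fn first_word<'a>(line: &'a str, sep: &str) -> &'a str;
+```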
+ +## Subtlety for non-`&` types + +While the rules are quite simple and regular, they can be subtle when applied to +types with lifetime positions. To determine whether the signature + +```rust +fn foo(r: Bar) -> Bar +``` + +is actually using lifetimes via the elision rules, you have to know whether +`Bar` has a lifetime parameter. But this subtlety already exists with the +current elision rules. The benefit is that library types like `Ref<'a, T>` get +the same status and ergonomics as built-ins like `&'a T`. + +# Alternatives + +* Do not include _output_ lifetime elision for `impl`. Since traits with lifetime + parameters are quite rare, this would not be a great loss, and would simplify + the rules somewhat. + +* Only add elision rules for `fn`, in keeping with current practice. + +* Only add elision for explicit `&` pointers, eliminating one of the drawbacks + mentioned above. Doing so would impose an ergonomic penalty on abstractions, + though: `Ref` would be more painful to use than `&`. + +# Unresolved questions + +The `fn` and `impl` cases tackled above offer the biggest bang for the buck for +lifetime elision. But we may eventually want to consider other opportunities. + +## Double lifetimes + +Another pattern that sometimes arises is types like `&'a Foo<'a>`. We could +consider an additional elision rule that expands `&Foo` to `&'a Foo<'a>`. + +However, such a rule could be easily added later, and it is unclear how common +the pattern is, so it seems best to leave that for a later RFC. + +## Lifetime elision in `struct`s + +We may want to allow lifetime elision in `struct`s, but the cost/benefit +analysis is much less clear. In particular, it could require chasing an +arbitrary number of (potentially private) `struct` fields to discover the source +of a lifetime parameter for a `struct`. There are also some good reasons to +treat elided lifetimes in `struct`s as `'static`. + +Again, since shorthand can be added backwards-compatibly, it seems best to wait. diff --git a/text/0151-capture-by-value.md b/text/0151-capture-by-value.md new file mode 100644 index 00000000000..5f4a66d05cc --- /dev/null +++ b/text/0151-capture-by-value.md @@ -0,0 +1,33 @@ +- Start Date: 2014-07-02 +- RFC PR: [rust-lang/rfcs#151](https://github.com/rust-lang/rfcs/pull/151) +- Rust Issue: [rust-lang/rust#12831](https://github.com/rust-lang/rust/issues/12831) + +# Summary + +Closures should capture their upvars by value unless the `ref` keyword is used. + +# Motivation + +For unboxed closures, we will need to syntactically distinguish between captures by value and captures by reference. + +# Detailed design + +This is a small part of #114, split off to separate it from the rest of the discussion going on in that RFC. + +Closures should capture their upvars (closed-over variables) by value unless the `ref` keyword precedes the opening `|` of the argument list. Thus `|x| x + 2` will capture `x` by value (and thus, if `x` is not `Copy`, it will move `x` into the closure), but `ref |x| x + 2` will capture `x` by reference. + +In an unboxed-closures world, the immutability/mutability of the borrow (as the case may be) is inferred from the type of the closure: `Fn` captures by immutable reference, while `FnMut` captures by mutable reference. In a boxed-closures world, the borrows are always mutable. + +# Drawbacks + +It may be that `ref` is unwanted complexity; it only changes the semantics of 10%-20% of closures, after all. 
This does not add any core functionality to the language, as a reference can always be made explicitly and then captured. However, there are a *lot* of closures, and the workaround to capture a reference by value is painful. + +# Alternatives + +As above, the impact of not doing this is that reference semantics would have to be achieved. However, the diff against current Rust was thousands of lines of pretty ugly code. + +Another alternative would be to annotate each individual upvar with its capture semantics, like capture clauses in C++11. This proposal does not preclude adding that functionality should it be deemed useful in the future. Note that C++11 provides a syntax for capturing all upvars by reference, exactly as this proposal does. + +# Unresolved questions + +None. diff --git a/text/0155-anonymous-impl-only-in-same-module.md b/text/0155-anonymous-impl-only-in-same-module.md new file mode 100644 index 00000000000..8bc961ec716 --- /dev/null +++ b/text/0155-anonymous-impl-only-in-same-module.md @@ -0,0 +1,136 @@ +- Start Date: 2014-07-04 +- RFC PR #: [rust-lang/rfcs#155](https://github.com/rust-lang/rfcs/pull/155) +- Rust Issue #: [rust-lang/rust#17059](https://github.com/rust-lang/rust/issues/17059) + +# Summary + +Require "anonymous traits", i.e. `impl MyStruct` to occur only in the same module that `MyStruct` is defined. + +# Motivation + +Before I can explain the motivation for this, I should provide some background +as to how anonymous traits are implemented, and the sorts of bugs we see with +the current behaviour. The conclusion will be that we effectively already only +support `impl MyStruct` in the same module that `MyStruct` is defined, and +making this a rule will simply give cleaner error messages. + +- The compiler first sees `impl MyStruct` during the resolve phase, specifically + in `Resolver::build_reduced_graph()`, called by `Resolver::resolve()` in + `src/librustc/middle/resolve.rs`. This is before any type checking (or type + resolution, for that matter) is done, so the compiler trusts for now that + `MyStruct` is a valid type. +- If `MyStruct` is a path with more than one segment, such as `mymod::MyStruct`, + it is silently ignored (how was this not flagged when the code was written??), + which effectively causes static methods in such `impl`s to be dropped on the + floor. A silver lining here is that nothing is added to the current module + namespace, so the shadowing bugs demonstrated in the next bullet point do not + apply here. (To locate this bug in the code, find the `match` immediately following + the `FIXME (#3785)` comment in `resolve.rs`.) This leads to the following +```` +mod break1 { + pub struct MyGuy; + + impl MyGuy { + pub fn do1() { println!("do 1"); } + } +} + +impl break1::MyGuy { + fn do2() { println!("do 2"); } +} + +fn main() { + break1::MyGuy::do1(); + break1::MyGuy::do2(); +} +```` +```` +:15:5: 15:23 error: unresolved name `break1::MyGuy::do2`. +:15 break1::MyGuy::do2(); +```` + as noticed by @huonw in https://github.com/rust-lang/rust/issues/15060 . +- If one does not exist, the compiler creates a submodule `MyStruct` of the + current module, with `kind` `ImplModuleKind`. Static methods are placed into + this module. If such a module already exists, the methods are appended to it, + to support multiple `impl MyStruct` blocks within the same module. If a module + exists that is not `ImplModuleKind`, the compiler signals a duplicate module + definition error. 
+- Notice at this point that if there is a `use MyStruct`, the compiler will act + as though it is unaware of this. This is because imports are not resolved yet + (they are in `Resolver::resolve_imports()` called immediately after + `Resolver::build_reduced_graph()` is called). In the final resolution step, + `MyStruct` will be searched in the namespace of the current module, checking + imports only as a fallback (and only in some contexts), so the `use MyStruct` is + effectively shadowed. If there is an `impl MyStruct` in the file being imported + from, the user expects that the new `impl MyStruct` will append to that one, + same as if they are in the original file. This leads to the original bug report + https://github.com/rust-lang/rust/issues/15060 . +- In fact, even if no methods from the import are used, the name `MyStruct` will + not be associated to a type, so that +```` +trait T {} +impl Vec { + fn from_slice<'a>(x: &'a [uint]) -> Vec { + fail!() + } +} +fn main() { let r = Vec::from_slice(&[1u]); } +```` +```` +error: found module name used as a type: impl Vec::Vec (id=5) +impl Vec +```` + which @Ryman noticed in https://github.com/rust-lang/rust/issues/15060 . The + reason for this is that in `Resolver::resolve_crate()`, the final step of + `Resolver::resolve()`, the type of an anonymous `impl` is determined by + `NameBindings::def_for_namespace(TypeNS)`. This function searches the namespace + `TypeNS` (which is not affected by imports) for a type; failing that it + tries for a module; failing that it returns `None`. The result is that when + typeck runs, it sees `impl [module name]` instead of `impl [type name]`. + + +The main motivation of this RFC is to clear out these bugs, which do not make +sense to a user of the language (and had me confused for quite a while). + +A secondary motivation is to enforce consistency in code layout; anonymous traits +are used the way that class methods are used in other languages, and the data +and methods of a struct should be defined nearby. + +# Detailed design + +I propose three changes to the language: + +- `impl` on multiple-ident paths such as `impl mymod::MyStruct` is disallowed. + Since this currently suprises the user by having absolutely no effect for + static methods, support for this is already broken. +- `impl MyStruct` must occur in the same module that `MyStruct` is defined. + This is to prevent the above problems with `impl`-across-modules. + Migration path is for users to just move code between source files. + +# Drawbacks + +Static methods on `impl`s-away-from-definition never worked, while non-static +methods can be implemented using non-anonymous traits. So there is no loss in +expressivity. However, using a trait where before there was none may be clumsy, +since it might not have a sensible name, and it must be explicitly imported by +all users of the trait methods. + +For example, in the stdlib `src/libstd/io/fs.rs` we see the code `impl path::Path` +to attach (non-static) filesystem-related methods to the `Path` type. This would +have to be done via a `FsPath` trait which is implemented on `Path` and exported +alongside `Path` in the prelude. + +It is worth noting that this is the only instance of this RFC conflicting with +current usage in the stdlib or compiler. + +# Alternatives + +- Leaving this alone and fixing the bugs directly. This is really hard. To do it + properly, we would need to seriously refactor resolve. + +# Unresolved questions + +None. 
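+
+For concreteness, the trait-based workaround mentioned under Drawbacks might
+look roughly like the following sketch (the trait and method names are
+invented for illustration, not taken from the standard library):
+
+```rust
+// Instead of `impl path::Path` far from Path's definition, declare a named
+// trait next to the new methods and implement it for Path.
+pub trait FsPath {
+    fn describe(&self) -> String;
+}
+
+impl FsPath for path::Path {
+    fn describe(&self) -> String {
+        format!("{}", self.display())
+    }
+}
+
+// Callers must import FsPath (e.g. via the prelude) to call `.describe()`,
+// which is the clumsiness the Drawbacks section points out.
+```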
+ + + diff --git a/text/0160-if-let.md b/text/0160-if-let.md new file mode 100644 index 00000000000..462477ebd76 --- /dev/null +++ b/text/0160-if-let.md @@ -0,0 +1,228 @@ +- Start Date: 2014-08-26 +- RFC PR #: [rust-lang/rfcs#160](https://github.com/rust-lang/rfcs/pull/160) +- Rust Issue #: [rust-lang/rust#16779](https://github.com/rust-lang/rust/issues/16779) + +# Summary + +Introduce a new `if let PAT = EXPR { BODY }` construct. This allows for refutable pattern matching +without the syntactic and semantic overhead of a full `match`, and without the corresponding extra +rightward drift. Informally this is known as an "if-let statement". + +# Motivation + +Many times in the past, people have proposed various mechanisms for doing a refutable let-binding. +None of them went anywhere, largely because the syntax wasn't great, or because the suggestion +introduced runtime failure if the pattern match failed. + +This proposal ties the refutable pattern match to the pre-existing conditional construct (i.e. `if` +statement), which provides a clear and intuitive explanation for why refutable patterns are allowed +here (as opposed to a `let` statement which disallows them) and how to behave if the pattern doesn't +match. + +The motivation for having any construct at all for this is to simplify the cases that today call for +a `match` statement with a single non-trivial case. This is predominately used for unwrapping +`Option` values, but can be used elsewhere. + +The idiomatic solution today for testing and unwrapping an `Option` looks like + +```rust +match optVal { + Some(x) => { + doSomethingWith(x); + } + None => {} +} +``` + +This is unnecessarily verbose, with the `None => {}` (or `_ => {}`) case being required, and +introduces unnecessary rightward drift (this introduces two levels of indentation where a normal +conditional would introduce one). + +The alternative approach looks like this: + +```rust +if optVal.is_some() { + let x = optVal.unwrap(); + doSomethingWith(x); +} +``` + +This is generally considered to be a less idiomatic solution than the `match`. It has the benefit of +fixing rightward drift, but it ends up testing the value twice (which should be optimized away, but +semantically speaking still happens), with the second test being a method that potentially +introduces failure. From context, the failure won't happen, but it still imposes a semantic burden +on the reader. Finally, it requires having a pre-existing let-binding for the optional value; if the +value is a temporary, then a new let-binding in the parent scope is required in order to be able to +test and unwrap in two separate expressions. + +The `if let` construct solves all of these problems, and looks like this: + +```rust +if let Some(x) = optVal { + doSomethingWith(x); +} +``` + +# Detailed design + +The `if let` construct is based on the precedent set by Swift, which introduced its own `if let` +statement. In Swift, `if let var = expr { ... }` is directly tied to the notion of optional values, +and unwraps the optional value that `expr` evaluates to. In this proposal, the equivalent is `if let +Some(var) = expr { ... }`. + +Given the following rough grammar for an `if` condition: + +``` +if-expr = 'if' if-cond block else-clause? +if-cond = expression +else-clause = 'else' block | 'else' if-expr +``` + +The grammar is modified to add the following productions: + +``` +if-cond = 'let' pattern '=' expression +``` + +The `expression` is restricted to disallow a trailing braced block (e.g. 
for struct literals) the +same way the `expression` in the normal `if` statement is, to avoid ambiguity with the then-block. + +Contrary to a `let` statement, the pattern in the `if let` expression allows refutable patterns. The +compiler should emit a warning for an `if let` expression with an irrefutable pattern, with the +suggestion that this should be turned into a regular `let` statement. + +Like the `for` loop before it, this construct can be transformed in a syntax-lowering pass into the +equivalent `match` statement. The `expression` is given to `match` and the `pattern` becomes a match +arm. If there is an `else` block, that becomes the body of the `_ => {}` arm, otherwise `_ => {}` is +provided. + +Optionally, one or more `else if` (not `else if let`) blocks can be placed in the same `match` using +pattern guards on `_`. This could be done to simplify the code when pretty-printing the expansion +result. Otherwise, this is an unnecessary transformation. + +Due to some uncertainty regarding potentially-surprising fallout of AST rewrites, and some worries +about exhaustiveness-checking (e.g. a tautological `if let` would be an error, which may be +unexpected), this is put behind a feature gate named `if_let`. + +## Examples + +Source: + +```rust +if let Some(x) = foo() { + doSomethingWith(x) +} +``` + +Result: + +```rust +match foo() { + Some(x) => { + doSomethingWith(x) + } + _ => {} +} +``` + +Source: + +```rust +if let Some(x) = foo() { + doSomethingWith(x) +} else { + defaultBehavior() +} +``` + +Result: + +```rust +match foo() { + Some(x) => { + doSomethingWith(x) + } + _ => { + defaultBehavior() + } +} +``` + +Source: + +```rust +if cond() { + doSomething() +} else if let Some(x) = foo() { + doSomethingWith(x) +} else { + defaultBehavior() +} +``` + +Result: + +```rust +if cond() { + doSomething() +} else { + match foo() { + Some(x) => { + doSomethingWith(x) + } + _ => { + defaultBehavior() + } + } +} +``` + +With the optional addition specified above: + +```rust +if let Some(x) = foo() { + doSomethingWith(x) +} else if cond() { + doSomething() +} else if other_cond() { + doSomethingElse() +} +``` + +Result: + +```rust +match foo() { + Some(x) => { + doSomethingWith(x) + } + _ if cond() => { + doSomething() + } + _ if other_cond() => { + doSomethingElse() + } + _ => {} +} +``` + +# Drawbacks + +It's one more addition to the grammar. + +# Alternatives + +This could plausibly be done with a macro, but the invoking syntax would be pretty terrible and +would largely negate the whole point of having this sugar. + +Alternatively, this could not be done at all. We've been getting alone just fine without it so far, +but at the cost of making `Option` just a bit more annoying to work with. + +# Unresolved questions + +It's been suggested that alternates or pattern guards should be allowed. I think if you need those +you could just go ahead and use a `match`, and that `if let` could be extended to support those in +the future if a compelling use-case is found. + +I don't know how many `match` statements in our current code base could be replaced with this +syntax. Probably quite a few, but it would be informative to have real data on this. 
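+
+For the pattern-guard case raised above, a plain `match` already expresses it
+today; a small sketch reusing the names from the earlier examples:
+
+```rust
+// Not proposed as `if let` syntax; the existing match form suffices.
+match optVal {
+    Some(x) if x > 0 => doSomethingWith(x),
+    _ => {}
+}
+```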
diff --git a/text/0164-feature-gate-slice-pats.md b/text/0164-feature-gate-slice-pats.md new file mode 100644 index 00000000000..4712e3aa342 --- /dev/null +++ b/text/0164-feature-gate-slice-pats.md @@ -0,0 +1,30 @@ +- Start Date: 2014-07-14 +- RFC PR #: [rust-lang/rfcs#164](https://github.com/rust-lang/rfcs/pull/164) +- Rust Issue #: [rust-lang/rust#16951](https://github.com/rust-lang/rust/issues/16951) + +# Summary + +Rust's support for pattern matching on slices has grown steadily and incrementally without a lot of oversight. +We have concern that Rust is doing too much here, and that the complexity is not worth it. This RFC proposes +to feature gate multiple-element slice matches in the head and middle positions (`[xs.., 0, 0]` and `[0, xs.., 0]`). + +# Motivation + +Some general reasons and one specific: first, the implementation of Rust's match machinery is notoriously complex, and not well-loved. Removing features is seen as a valid way to reduce complexity. Second, slice matching in particular, is difficult to implement, while also being of only moderate utility (there are many types of collections - slices just happen to be built into the language). Finally, the exhaustiveness check is not correct for slice patterns because of their complexity; it's not known if it +can be done correctly, nor whether it is worth the effort to do so. + +# Detailed design + +The `advanced_slice_patterns` feature gate will be added. When the compiler encounters slice pattern matches in head or middle position it will emit a warning or error according to the current settings. + +# Drawbacks + +It removes two features that some people like. + +# Alternatives + +Fixing the exhaustiveness check would allow the feature to remain. + +# Unresolved questions + +N/A diff --git a/text/0168-mod.md b/text/0168-mod.md new file mode 100644 index 00000000000..71ca89377fa --- /dev/null +++ b/text/0168-mod.md @@ -0,0 +1,59 @@ +- Start Date: 2014-06-06 +- RFC PR: [rust-lang/rfcs#168](https://github.com/rust-lang/rfcs/pull/168) +- Rust Issue: [rust-lang/rust#15722](https://github.com/rust-lang/rust/issues/15722) +- Author: Tommit (edited by nrc) + + +# Summary + +Add syntax sugar for importing a module and items in that module in a single +view item. + + +# Motivation + +Make use clauses more concise. + + +# Detailed design + +The `mod` keyword may be used in a braced list of modules in a `use` item to +mean the prefix module for that list. For example, writing `prefix::{mod, +foo};` is equivalent to writing + +``` +use prefix; +use prefix::foo; +``` + +The `mod` keyword cannot be used outside of braces, nor can it be used inside +braces which do not have a prefix path. Both of the following examples are +illegal: + +``` +use module::mod; +use {mod, foo}; +``` + +A programmer may write `mod` in a module list with only a single item. E.g., +`use prefix::{mod};`, although this is considered poor style and may be forbidden +by a lint. (The preferred version is `use prefix;`). + + +# Drawbacks + +Another use of the `mod` keyword. + +We introduce a way (the only way) to have paths in use items which do not +correspond with paths which can be used in the program. For example, with `use +foo::bar::{mod, baz};` the programmer can use `foo::bar::baz` in their program +but not `foo::bar::mod` (instead `foo::bar` is imported). + +# Alternatives + +Don't do this. 
+ + +# Unresolved questions + +N/A diff --git a/text/0169-use-path-as-id.md b/text/0169-use-path-as-id.md new file mode 100644 index 00000000000..f0f2313cc3d --- /dev/null +++ b/text/0169-use-path-as-id.md @@ -0,0 +1,206 @@ +- Start Date: 2014-07-16 +- RFC PR #: [#169](https://github.com/rust-lang/rfcs/pull/169) +- Rust Issue #: https://github.com/rust-lang/rust/issues/16461 + +# Summary + +Change the rebinding syntax from `use ID = PATH` to `use PATH as ID`, +so that paths all line up on the left side, and imported identifers +are all on the right side. Also modify `extern crate` syntax +analogously, for consistency. + +# Motivation + +Currently, the view items at the start of a module look something like +this: + +```rust +mod old_code { + use a::b::c::d::www; + use a::b::c::e::xxx; + use yyy = a::b::yummy; + use a::b::c::g::zzz; +} +``` + +This means that if you want to see what identifiers have been +imported, your eyes need to scan back and forth on both the left-hand +side (immediately beside the `use`) and the right-hand side (at the +end of each line). In particular, note that `yummy` is *not* in scope +within the body of `old_code` + +This RFC proposes changing the grammar of Rust so that the example +above would look like this: + +```rust +mod new_code { + use a::b::c::d::www; + use a::b::c::e::xxx; + use a::b::yummy as yyy; + use a::b::c::g::zzz; +} +``` + +There are two benefits we can see by comparing `mod old_code` and `mod +new_code`: + + * As alluded to above, now all of the imported identfifiers are on + the right-hand side of the block of view items. + + * Additionally, the left-hand side looks much more regular, since one + sees the straight lines of `a::b::` characters all the way down, + which makes the *actual* differences between the different paths + more visually apparent. + +# Detailed design + +Currently, the grammar for use statements is something like: + +``` + use_decl : "pub" ? "use" [ ident '=' path + | path_glob ] ; +``` + +Likewise, the grammar for extern crate declarations is something like: + +``` + extern_crate_decl : "extern" "crate" ident [ '(' link_attrs ')' ] ? [ '=' string_lit ] ? ; +``` + +This RFC proposes changing the grammar for use statements to something like: + +``` + use_decl : "pub" ? "use" [ path "as" ident + | path_glob ] ; +``` + +and the grammar for extern crate declarations to something like: + +``` + extern_crate_decl : "extern" "crate" [ string_lit "as" ] ? ident [ '(' link_attrs ')' ] ? ; +``` + +Both `use` and `pub use` forms are changed to use `path as ident` +instead of `ident = path`. The form `use path as ident` has the same +constraints and meaning that `use ident = path` has today. + +Nothing about path globs is changed; the view items that use +`ident = path` are disjoint from the view items that use path globs, +and that continues to be the case under `path as ident`. + +The old syntaxes + `"use" ident '=' path` +and + `"extern" "crate" ident '=' string_lit` +are removed (or at least deprecated). + +# Drawbacks + +* `pub use export = import_path` may be preferred over `pub use + import_path as export` since people are used to seeing the name + exported by a `pub` item on the left-hand side of an `=` sign. + (See "Have distinct rebinding syntaxes for `use` and `pub use`" + below.) + +* The 'as' keyword is not currently used for any binding form in Rust. + Adopting this RFC would change that precedent. + (See "Change the signaling token" below.) 
+ +# Alternatives + +## Keep things as they are + +This just has the drawbacks outlined in the motivation: the left-hand +side of the view items are less regular, and one needs to scan both +the left- and right-hand sides to see all the imported identifiers. + +## Change the signaling token + +Go ahead with switch, so imported identifier is on the left-hand side, +but use a different token than `as` to signal a rebinding. + +For example, we could use `@`, as an analogy with its use as a binding +operator in match expressions: + +```rust +mod new_code { + use a::b::c::d::www; + use a::b::c::e::xxx; + use a::b::yummy @ yyy; + use a::b::c::g::zzz; +} +``` +(I do not object to `path @ ident`, though I find it somehow more +"line-noisy" than `as` in this context.) + +Or, we could use `=`: + +```rust +mod new_code { + use a::b::c::d::www; + use a::b::c::e::xxx; + use a::b::yummy = yyy; + use a::b::c::g::zzz; +} +``` +(I *do* object to `path = ident`, since typically when `=` is used to +bind, the identifier being bound occurs on the left-hand side.) + +Or, we could use `:`, by (weak) analogy with struct pattern syntax: +```rust +mod new_code { + use a::b::c::d::www; + use a::b::c::e::xxx; + use a::b::yummy : yyy; + use a::b::c::g::zzz; +} +``` +(I cannot figure out if this is genius or madness. Probably madness, +especially if one is allowed to omit the whitespace around the `:`) + +## Have distinct rebinding syntaxes for `use` and `pub use` + +If people really like having `ident = path` for `pub use`, by the +reasoning presented above that people are used to seeing the name +exported by a `pub` item on the left-hand side of an `=` sign, then we +could support that by continuing to support `pub use ident = path`. + +If we were to go down that route, I would prefer to have distinct +notions of the exported name and imported name, so that: + +`pub use a = foo::bar;` would actually *import* `bar` (and `a` would +just be visible as an *export*), and then one could rebind for export +and import simultaneously, like so: +`pub use exported_bar = foo::bar as imported_bar;` + +But really, is `pub use foo::bar as a` all that bad? + +## Allow `extern crate ident as ident` + +As written, this RFC allows for two variants of `extern_crate_decl`: + +```rust +extern crate old_name; +extern crate "old_name" as new_name; +``` + +These are just analogous to the current options that use `=` instead of `as`. + +However, the RFC comment dialogue suggested also allowing a renaming +form that does not use a string literal: + +```rust +extern crate old_name as new_name; +``` + +I have no opinion on whether this should be added or not. Arguably +this choice is orthgonal to the goals of this RFC (since, if this is a +good idea, it could just as well be implemented with the `=` syntax). +Perhaps it should just be filed as a separate RFC on its own. + +# Unresolved questions + +* In the revised `extern crate` form, is it best to put the + `link_attrs` after the identifier, as written above? Or would it be + better for them to come after the `string_literal` when using the + `extern crate string_literal as ident` form? 
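+
+For reference, the two `extern crate` forms produced by the revised grammar
+would look like this (the crate names are placeholders for illustration):
+
+```rust
+extern crate foo;                      // plain form, unchanged
+extern crate "foo-bar" as foobar;      // string literal renamed with `as`
+```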
diff --git a/text/0179-and-mut-patterns.md b/text/0179-and-mut-patterns.md
new file mode 100644
index 00000000000..183e7c50131
--- /dev/null
+++ b/text/0179-and-mut-patterns.md
@@ -0,0 +1,83 @@
+- Start Date: 2014-07-23
+- RFC PR: [rust-lang/rfcs#179](https://github.com/rust-lang/rfcs/pull/179)
+- Rust Issue: [rust-lang/rust#20496](https://github.com/rust-lang/rust/issues/20496)
+
+# Summary
+
+Change pattern matching on an `&mut T` to `&mut <pat>`, away from its
+current `&<pat>` syntax.
+
+# Motivation
+
+Pattern matching mirrors construction for almost all types, *except*
+`&mut`, which is constructed with `&mut <expr>` but destructured with
+`&<pat>`. This is almost certainly an unnecessary inconsistency.
+
+This can and does lead to confusion, since people expect the pattern
+syntax to match construction, but a pattern like `&mut (ref mut x, _)` is
+actually currently a parse error:
+
+```rust
+fn main() {
+    let &mut (ref mut x, _);
+}
+```
+
+```
+and-mut-pat.rs:2:10: 2:13 error: expected identifier, found path
+and-mut-pat.rs:2     let &mut (ref mut x, _);
+                          ^~~
+```
+
+Another (rarer) way it can be confusing is the pattern `&mut x`. It is
+expected that this binds `x` to the contents of a `&mut T`
+pointer... which it does, but as a mutable binding (it is parsed as
+`&(mut x)`), meaning something like
+
+```rust
+for &mut x in some_iterator_over_and_mut {
+    println!("{}", x)
+}
+```
+
+gives an unused mutability warning. NB. it's somewhat rare that one
+would want to pattern match to directly bind a name to the contents of
+a `&mut` (since the normal reason to have a `&mut` is to mutate the
+thing it points at, but this pattern is (byte) copying the data out,
+both before and after this change), but can occur if a type only
+offers a `&mut` iterator, i.e. types for which a `&` one is no more
+flexible than the `&mut` one.
+
+# Detailed design
+
+Add `<pat> := &mut <pat>` to the pattern grammar, and require that it is used
+when matching on a `&mut T`.
+
+# Drawbacks
+
+It makes matching through a `&mut` more verbose: `for &mut (ref mut x,
+_) in v.mut_iter()` instead of `for &(ref mut x, _) in
+v.mut_iter()`.
+
+Macros wishing to pattern match on either `&` or `&mut` need to handle
+each case, rather than performing both with a single `&`. However,
+macros handling these types already need special `mut` vs. not
+handling if they ever name the types, or if they use `ref` vs. `ref
+mut` subpatterns.
+
+It also makes obtaining the current behaviour (binding by-value the
+contents of a reference to a mutable local) slightly harder. For a
+`&mut T` the pattern becomes `&mut mut x`, and, at the moment, for a
+`&T`, it must be matched with `&x` and then rebound with `let mut x =
+x;` (since disambiguating like `&(mut x)` doesn't yet work). However,
+based on some loose grepping of the Rust repo, both of these are very
+rare.
+
+# Alternatives
+
+None.
+
+# Unresolved questions
+
+None.
diff --git a/text/0184-tuple-accessors.md b/text/0184-tuple-accessors.md
new file mode 100644
index 00000000000..aa4d06fec40
--- /dev/null
+++ b/text/0184-tuple-accessors.md
@@ -0,0 +1,74 @@
+- Start Date: 2014-07-24
+- RFC PR #: https://github.com/rust-lang/rfcs/pull/184
+- Rust Issue #: https://github.com/rust-lang/rust/issues/16950
+
+Summary
+=======
+
+Add simple syntax for accessing values within tuples and tuple structs behind a
+feature gate.
+
+Motivation
+==========
+
+Right now accessing fields of tuples and tuple structs is incredibly painful—one
+must rely on pattern-matching alone to extract values.
This became such a +problem that twelve traits were created in the standard library +(`core::tuple::Tuple*`) to make tuple value accesses easier, adding `.valN()`, +`.refN()`, and `.mutN()` methods to help this. But this is not a very nice +solution—it requires the traits to be implemented in the standard library, not +the language, and for those traits to be imported on use. On the whole this is +not a problem, because most of the time `std::prelude::*` is imported, but this +is still a hack which is not a real solution to the problem at hand. It also +only supports tuples of length up to twelve, which is normally not a problem but +emphasises how bad the current situation is. + +Detailed design +=============== + +Add syntax of the form `.` for accessing values within tuples and +tuple structs. This (and the functionality it provides) would only be allowed +when the feature gate `tuple_indexing` is enabled. This syntax is recognised +wherever an unsuffixed integer literal is found in place of the normal field or +method name expected when accessing fields with `.`. Because the parser would be +expecting an integer, not a float, an expression like `expr.0.1` would be a +syntax error (because `0.1` would be treated as a single token). + +Tuple/tuple struct field access behaves the same way as accessing named fields +on normal structs: + +```rust +// With tuple struct +struct Foo(int, int); +let mut foo = Foo(3, -15); +foo.0 = 5; +assert_eq!(foo.0, 5); + +// With normal struct +struct Foo2 { _0: int, _1: int } +let mut foo2 = Foo2 { _0: 3, _1: -15 }; +foo2._0 = 5; +assert_eq!(foo2._0, 5); +``` + +Effectively, a tuple or tuple struct field is just a normal named field with an +integer for a name. + +Drawbacks +========= + +This adds more complexity that is not strictly necessary. + +Alternatives +============ + +Stay with the status quo. Either recommend using a struct with named fields or +suggest using pattern-matching to extract values. If extracting individual +fields of tuples is really necessary, the `TupleN` traits could be used instead, +and something like `#[deriving(Tuple3)]` could possibly be added for tuple +structs. + +Unresolved questions +==================== + +None. diff --git a/text/0192-bounds-on-object-and-generic-types.md b/text/0192-bounds-on-object-and-generic-types.md new file mode 100644 index 00000000000..060a7081a7d --- /dev/null +++ b/text/0192-bounds-on-object-and-generic-types.md @@ -0,0 +1,452 @@ +- Start Date: 2014-08-06 +- RFC PR: https://github.com/rust-lang/rfcs/pull/192 +- Rust Issue: https://github.com/rust-lang/rust/issues/16462 + +# Summary + +- Remove the special-case bound `'static` and replace with a generalized + *lifetime bound* that can be used on objects and type parameters. +- Remove the rules that aim to prevent references from being stored + into objects and replace with a simple lifetime check. +- Tighten up type rules pertaining to reference lifetimes and + well-formed types containing references. +- Introduce explicit lifetime bounds (`'a:'b`), with the meaning that + the lifetime `'a` outlives the lifetime `'b`. These exist today but + are always inferred; this RFC adds the ability to specify them + explicitly, which is sometimes needed in more complex cases. + +# Motivation + +Currently, the type system is not supposed to allow references to +escape into object types. However, there are various bugs where it +fails to prevent this from happening. Moreover, it is very useful (and +frequently necessary) to store a reference into an object. 
Moreover, +the current treatment of generic types is in some cases naive and not +obviously sound. + +# Detailed design + +## Lifetime bounds on parameters + +The heart of the new design is the concept of a *lifetime bound*. In fact, +this (sort of) exists today in the form of the `'static` bound: + + fn foo(x: A) { ... } + +Here, the notation `'static` means "all borrowed content within `A` +outlives the lifetime `'static`". (Note that when we say that +something outlives a lifetime, we mean that it lives *at least that +long*. In other words, for any lifetime `'a`, `'a` outlives `'a`. This +is similar to how we say that every type `T` is a subtype of itself.) + +In the newer design, it is possible to use an arbitrary lifetime as a +bound, and not just `'static`: + + fn foo<'a, A:'a>(x: A) { ... } + +Explicit lifetime bounds are in fact only rarely necessary, for two +reasons: + +1. The compiler is often able to infer this relationship from the argument + and return types. More on this below. +2. It is only important to bound the lifetime of a generic type like + `A` when one of two things is happening (and both of these are + cases where the inference generally is sufficient): + - A borrowed pointer to an `A` instance (i.e., value of type `&A`) + is being consumed or returned. + - A value of type `A` is being closed over into an object reference + (or closure, which per the unboxed closures RFC is really the + same thing). + +Note that, per RFC 11, these lifetime bounds may appear in types as +well (this is important later on). For example, an iterator might be +declared: + + struct Items<'a, T:'a> { + v: &'a Collection + } + +Here, the constraint `T:'a` indicates that the data being iterated +over must live at least as long as the collection (logically enough). + +## Lifetime bounds on object types + +Like parameters, all object types have a lifetime bound. Unlike +parameter types, however, object types are *required* to have exactly +one bound. This bound can be either specified explicitly or derived +from the traits that appear in the object type. In general, the rule is +as follows: + +- If an explicit bound is specified, use that. +- Otherwise, let S be the set of lifetime bounds we can derive. +- Otherwise, if S contains 'static, use 'static. +- Otherwise, if S is a singleton set, use that. +- Otherwise, error. + +Here are some examples: + + trait IsStatic : 'static { } + trait Is<'a> : 'a { } + + // Type Bounds + // IsStatic 'static + // Is<'a> 'a + // IsStatic+Is<'a> 'static+'a + // IsStatic+'a 'static+'a + // IsStatic+Is<'a>+'b 'static,'a,'b + +Object types must have exactly one bound -- zero bounds is not +acceptable. Therefore, if an object type with no derivable bounds +appears, we will supply a default lifetime using the normal rules: + + trait Writer { /* no derivable bounds */ } + struct Foo<'a> { + Box, // Error: try Box or Box + Box, // OK: Send implies 'static + &'a Writer, // Error: try &'a (Writer+'a) + } + + fn foo(a: Box, // OK: Sugar for Box where 'a fresh + b: &Writer) // OK: Sugar for &'b (Writer+'c) where 'b, 'c fresh + { ... } + +This kind of annotation can seem a bit tedious when using object types +extensively, though type aliases can help quite a bit: + + type WriterObj = Box; + type WriterRef<'a> = &'a (Writer+'a); + +The unresolved questions section discussed possibles ways to lighten +the burden. + +See Appendix B for the motivation on why object types are permitted to +have exactly one lifetime bound. 
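+
+As a small sketch of the defaults and errors described above (reusing the
+`Writer` trait from the examples; the other names are illustrative only):
+
+```rust
+// In argument position the missing bound is filled in with fresh lifetimes.
+fn log_to(out: Box<Writer>, msg: &str) { ... }   // sugar for Box<Writer+'a>, 'a fresh
+
+// Inside a type definition the bound must be written out explicitly.
+struct Logger<'a> {
+    out: &'a (Writer+'a),
+    fallback: Box<Writer+'static>,
+}
+```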
+ +## Specifying relations between lifetimes + +Currently, when a type or fn has multiple lifetime parameters, there +is no facility to explicitly specify a relationship between them. For +example, in a function like this: + + fn foo<'a, 'b>(...) { ... } + +the lifetimes `'a` and `'b` are declared as independent. In some +cases, though, it can be important that there be a relation between +them. In most cases, these relationships can be inferred (and in fact +are inferred today, see below), but it is useful to be able to state +them explicitly (and necessary in some cases, see below). + +A *lifetime bound* is written `'a:'b` and it means that "`'a` outlives +`'b`". For example, if `foo` were declared like so: + + fn foo<'x, 'y:'x>(...) { ... } + +that would indicate that the lifetime '`x` was shorter than (or equal +to) `'y`. + +## The "type must outlive" and well-formedness relation + +Many of the rules to come make use of a "type must outlive" relation, +written `T outlives 'a`. This relation means primarily that all +borrowed data in `T` is known to have a lifetime of at least '`a` +(hence the name). However, the relation also guarantees various basic +lifetime constraints are met. For example, for every reference type +`&'b U` that is found within `T`, it would be required that `U +outlives 'b` (and that `'b` outlives `'a`). + +In fact, `T outlives 'a` is defined on another function `WF(T:'a)`, +which yields up a list of lifetime relations that must hold for `T` to +be well-formed and to outlive `'a`. It is not necessary to understand +the details of this relation in order to follow the rest of the RFC, I +will defer its precise specification to an appendix below. + +For this section, it suffices to give some examples: + + // int always outlives any region + WF(int : 'a) = [] + + // a reference with lifetime 'a outlives 'b if 'a outlives 'b + WF(&'a int : 'b) = ['a : 'b] + + // the outer reference must outlive 'c, and the inner reference + // must outlive the outer reference + WF(&'a &'b int : 'c) = ['a : 'c, 'b : 'a] + + // Object type with bound 'static + WF(SomeTrait+'static : 'a) = ['static : 'a] + + // Object type with bound 'a + WF(SomeTrait+'a : 'b) = ['a : 'b] + +## Rules for when object closure is legal + +Whenever data of type `T` is closed over to form an object, the type +checker will require that `T outlives 'a` where `'a` is the primary +lifetime bound of the object type. + +## Rules for types to be well-formed + +Currently we do not apply any tests to the types that appear in type +declarations. Per RFC 11, however, this should change, as we intend to +enforce trait bounds on types, wherever those types appear. Similarly, +we should be requiring that types are well-formed with respect to the +`WF` function. This means that a type like the following would be +illegal without a lifetime bound on the type parameter `T`: + + struct Ref<'a, T> { c: &'a T } + +This is illegal because the field `c` has type `&'a T`, which is only +well-formed if `T:'a`. Per usual practice, this RFC does not propose +any form of inference on struct declarations and instead requires all +conditions to be spelled out (this is in contrast to fns and methods, +see below). + +## Rules for expression type validity + +We should add the condition that for every expression with lifetime +`'e` and type `T`, then `T outlives 'e`. We already enforce this in +many special cases but not uniformly. 
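+
+Tying the last two subsections together, the declaration-site fix for the
+`Ref` example above is simply to spell out the required bound:
+
+```rust
+// The field `c: &'a T` is well-formed only if `T` outlives `'a`, so the
+// bound appears on the type definition rather than being inferred.
+struct Ref<'a, T:'a> {
+    c: &'a T
+}
+```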
+ +## Inference + +The compiler will infer lifetime bounds on both type parameters and +region parameters as follows. Within a function or method, we apply +the wellformedness function `WF` to each function or parameter type. +This yields up a set of relations that must hold. The idea here is +that the caller could not have type checked unless the types of the +arguments were well-formed, so that implies that the callee can assume +that those well-formedness constraints hold. + +As an example, in the following function: + + fn foo<'a, A>(x: &'a A) { ... } + +the callee here can assume that the type parameter `A` outlives the +lifetime `'a`, even though that was not explicitly declared. + +Note that the inference also pulls in constraints that were declared +on the types of arguments. So, for example, if there is a type `Items` +declared as follows: + + struct Items<'a, T:'a> { ... } + +And a function that takes an argument of type `Items`: + + fn foo<'a, T>(x: Items<'a, T>) { ... } + +The inference rules will conclude that `T:'a` because the `Items` type +was declared with that bound. + +In practice, these inference rules largely remove the need to manually +declare lifetime relations on types. When porting the existing library +and rustc over to these rules, I had to add explicit lifetime bounds +to exactly one function (but several types, almost exclusively +iterators). + +Note that this sort of inference is already done. This RFC simply +proposes a more extensive version that also includes bounds of the +form `X:'a`, where `X` is a type parameter. + +# What does all this mean in practice? + +This RFC has a lot of details. The main implications for end users are: + +1. Object types must specify a lifetime bound when they appear in a type. + This most commonly means changing `Box` to `Box` + and `&'a Trait` to `&'a Trait+'a`. +2. For types that contain references to generic types, lifetime bounds + are needed in the type definition. This comes up most often in iterators: + + struct Items<'a, T:'a> { + x: &'a [T] + } + + Here, the presence of `&'a [T]` within the type definition requires + that the type checker can show that `T outlives 'a` which in turn + requires the bound `T:'a` on the type definition. These bounds are + rarely outside of type definitions, because they are almost always + implied by the types of the arguments. +3. It is sometimes, but rarely, necessary to use lifetime bounds, + specifically around double indirections (references to references, + often the second reference is contained within a struct). For + example: + + struct GlobalContext<'global> { + arena: &'global Arena + } + + struct LocalContext<'local, 'global:'local> { + x: &'local mut Context<'global> + } + + Here, we must know that the lifetime `'global` outlives `'local` in + order for this type to be well-formed. + +# Phasing + +Some parts of this RFC require new syntax and thus must be phased in. +The current plan is to divide the implementation three parts: + +1. Implement support for everything in this RFC except for region bounds + and requiring that every expression type be well-formed. Enforcing + the latter constraint leads to type errors that require lifetime + bounds to resolve. +2. Implement support for `'a:'b` notation to be parsed under a feature + gate `issue_5723_bootstrap`. +3. 
Implement the final bits of the RFC: + - Bounds on lifetime parameters + - Wellformedness checks on every expression + - Wellformedness checks in type definitions + +Parts 1 and 2 can be landed simultaneously, but part 3 requires a +snapshot. Parts 1 and 2 have largely been written. Depending on +precisely how the timing works out, it might make sense to just merge +parts 1 and 3. + +# Drawbacks / Alternatives + +If we do not implement some solution, we could continue with the +current approach (but patched to be sound) of banning references from +being closed over in object types. I consider this a non-starter. + +# Unresolved questions + +## Inferring wellformedness bounds + +Under this RFC, it is required to write bounds on struct types which are +in principle inferable from their contents. For example, iterators +tend to follow a pattern like: + + struct Items<'a, T:'a> { + x: &'a [T] + } + +Note that `T` is bounded by `'a`. It would be possible to infer these +bounds, but I've stuck to our current principle that type definitions +are always fully spelled out. The danger of inference is that it +becomes unclear *why* a particular constraint exists if one must +traverse the type hierarchy deeply to find its origin. This could +potentially be addressed with better error messages, though our track +record for lifetime error messages is not very good so far. + +Also, there is a potential interaction between this sort of inference +and the description of default trait bounds below. + +## Default trait bounds + +When referencing a trait object, it is almost *always* the case that one follows +certain fixed patterns: + +- `Box` +- `Rc` (once DST works) +- `&'a (Trait+'a)` +- and so on. + +You might think that we should simply provide some kind of defaults +that are sensitive to where the `Trait` appears. The same is probably +true of struct type parameters (in other words, `&'a SomeStruct<'a>` +is a very common pattern). + +However, there are complications: + +- What about a type like `struct Ref<'a, T:'a> { x: &'a T }`? `Ref<'a, + Trait>` should really work the same way as `&'a Trait`. One way that + I can see to do this is to drive the defaulting based on the default + trait bounds of the `T` type parameter -- but if we do that, it is + both a non-local default (you have to consult the definition of + `Ref`) and interacts with the potential inference described in the + previous section. +- There *are* reasons to want a type like `Box`. For example, + the macro parser includes a function like: + + fn make_macro_ext<'cx>(cx: &'cx Context, ...) -> Box + + In other words, this function returns an object that closes over the + macro context. In such a case, if `Box` implies a static + bound, then taking ownership of this macro object would require a signature + like: + + fn take_macro_ext<'cx>(b: Box) { } + + Note that the `'cx` variable is only used in one place. It's purpose + is just to disable the `'static` default that would otherwise be + inserted. + +# Appendix: Definition of the outlives relation and well-formedness + +To make this more specific, we can "formally" model the Rust type +system as: + + T = scalar (int, uint, fn(...)) // Boring stuff + | *const T // Unsafe pointer + | *mut T // Unsafe pointer + | Id
<P*>                              // Nominal type (struct, enum)
+      | &'x T                           // Reference
+      | &'x mut T                       // Mutable reference
+      | {TraitReference<P*>}+'x         // Object type
+      | X                               // Type variable
+    P = {'x} + {T}
+
+We can define a function `WF(T : 'a)` which, given a type `T` and
+lifetime `'a` yields a list of `'b:'c` or `X:'d` pairs. For each pair
+`'b:'c`, the lifetime `'b` must outlive the lifetime `'c` for the type
+`T` to be well-formed in a location with lifetime `'a`. For each pair
+`X:'d`, the type parameter `X` must outlive the lifetime `'d`.
+
+- `WF(int : 'a)` yields an empty list
+- `WF(X:'a)` where `X` is a type parameter yields `(X:'a)`.
+- `WF(Foo<P*>:'a)` where `Foo<P*>
` is an enum or struct type yields: + - For each lifetime parameter `'b` that is contravariant or invariant, + `'b : 'a`. + - For each type parameter `T` that is covariant or invariant, the + results of `WF(T : 'a)`. + - The lifetime bounds declared on `Foo`'s lifetime or type parameters. + - The reasoning here is that if we can reach borrowed data with + lifetime `'a` through `Foo<'a>`, then `'a` must be contra- or + invariant. Covariant lifetimes only occur in "setter" + situations. Analogous reasoning applies to the type case. +- `WF(T:'a)` where `T` is an object type: + - For the primary bound `'b`, `'b : 'a`. + - For each derived bound `'c` of `T`, `'b : 'c` + - Motivation: The primary bound of an object type implies that all + other bounds are met. This simplifies some of the other + formulations and does not represent a loss of expressiveness. + +We can then say that `T outlives 'a` if all lifetime relations +returned by `WF(T:'a)` hold. + +# Appendix B: Why object types must have exactly one bound + +The motivation is that handling multiple bounds is overwhelmingly +complicated to reason about and implement. In various places, +constraints arise of the form `all i. exists j. R[i] <= R[j]`, where +`R` is a list of lifetimes. This is challenging for lifetime +inference, since there are many options for it to choose from, and +thus inference is no longer a fixed-point iteration. Moreover, it +doesn't seem to add any particular expressiveness. + +The places where this becomes important are: + +- Checking lifetime bounds when data is closed over into an object type +- Subtyping between object types, which would most naturally be + contravariant in the lifetime bound + +Similarly, requiring that the "master" bound on object lifetimes outlives +all other bounds also aids inference. Now, given a type like the +following: + + trait Foo<'a> : 'a { } + trait Bar<'b> : 'b { } + + ... + + let x: Box+Bar<'b>> + +the inference engine can create a fresh lifetime variable `'0` for the +master bound and then say that `'0:'a` and `'0:'b`. Without the +requirement that `'0` be a master bound, it would be somewhat unclear +how `'0` relates to `'a` and `'b` (in fact, there would be no +necessary relation). But if there is no necessary relation, then when +closing over data, one would have to ensure that the closed over data +outlives *all* derivable lifetime bounds, which again creates a +constraint of the form `all i. exists j.`. diff --git a/text/0194-cfg-syntax.md b/text/0194-cfg-syntax.md new file mode 100644 index 00000000000..769e4a3c808 --- /dev/null +++ b/text/0194-cfg-syntax.md @@ -0,0 +1,102 @@ +- Start Date: 2014-08-09 +- RFC PR #: [rust-lang/rfcs#194](https://github.com/rust-lang/rfcs/pull/194) +- Rust Issue: [rust-lang/rust#17490](https://github.com/rust-lang/rust/issues/17490) + +# Summary + +The `#[cfg(...)]` attribute provides a mechanism for conditional compilation of +items in a Rust crate. This RFC proposes to change the syntax of `#[cfg]` to +make more sense as well as enable expansion of the conditional compilation +system to attributes while maintaining a single syntax. + +# Motivation + +In the current implementation, `#[cfg(...)]` takes a comma separated list of +`key`, `key = "value"`, `not(key)`, or `not(key = "value")`. An individual +`#[cfg(...)]` attribute "matches" if *all* of the contained cfg patterns match +the compilation environment, and an item preserved if it *either* has no +`#[cfg(...)]` attributes or *any* of the `#[cfg(...)]` attributes present +match. 

This is problematic for several reasons:

* It is excessively verbose in certain situations. For example, implementing
  the equivalent of `(a AND (b OR c OR d))` requires three separate
  attributes and `a` to be duplicated in each.
* It differs from all other attributes in that all `#[cfg(...)]` attributes on
  an item must be processed together instead of in isolation. This change
  will move `#[cfg(...)]` closer to implementation as a normal syntax
  extension.

# Detailed design

The `<p>` inside of `#[cfg(<p>)]` will be called a *cfg pattern* and have a
simple recursive syntax:

* `key` is a cfg pattern and will match if `key` is present in the
  compilation environment.
* `key = "value"` is a cfg pattern and will match if a mapping from `key`
  to `value` is present in the compilation environment. At present, key-value
  pairs only exist for compiler defined keys such as `target_os` and
  `endian`.
* `not(<p>)` is a cfg pattern if `<p>` is, and it matches if `<p>` does not
  match.
* `all(<p>, ...)` is a cfg pattern if all of the comma-separated `<p>`s are cfg
  patterns, and it matches if all of them match.
* `any(<p>, ...)` is a cfg pattern if all of the comma-separated `<p>`s are cfg
  patterns, and it matches if any of them match.

If an item is tagged with `#[cfg(<p>)]`, that item will be stripped from the
AST if the cfg pattern `<p>` does not match.

One implementation hazard is that the semantics of
```rust
#[cfg(a)]
#[cfg(b)]
fn foo() {}
```
will change from "include `foo` if *either of* `a` and `b` are present in the
compilation environment" to "include `foo` if *both of* `a` and `b` are present
in the compilation environment". To ease the transition, the old semantics of
multiple `#[cfg(...)]` attributes will be maintained as a special case, with a
warning. After some reasonable period of time, the special case will be
removed.

In addition, `#[cfg(a, b, c)]` will be accepted with a warning and be
equivalent to `#[cfg(all(a, b, c))]`. Again, after some reasonable period of
time, this behavior will be removed as well.

The `cfg!()` syntax extension will be modified to accept cfg patterns as well.
A `#[cfg_attr(<p>, <attr>)]` syntax extension will be added
([PR 16230](https://github.com/rust-lang/rust/pull/16230)) which will expand to
`#[<attr>]` if the cfg pattern `<p>` matches. The test harness's `#[ignore]`
attribute will have its built-in cfg filtering functionality stripped in favor
of `#[cfg_attr(<p>, ignore)]`.
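
To make the nested patterns concrete, here is a small self-contained sketch
using today's stable syntax; the function names and the particular `target_os`
values are invented for the example, not taken from the RFC.

```rust
#![allow(dead_code)]

// `a AND (b OR c)` is a single attribute instead of several:
#[cfg(all(unix, any(target_os = "linux", target_os = "macos")))]
fn unix_only_setup() {}

// `cfg_attr` applies another attribute only when the pattern matches:
#[test]
#[cfg_attr(target_os = "windows", ignore)]
fn slow_test() {}

fn main() {
    // `cfg!` accepts the same patterns in expression position.
    if cfg!(not(target_os = "windows")) {
        println!("not running on Windows");
    }
}
```
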
(&self, pred: P) where P: Predicate { ... } + ... +} +``` + +Since these two patterns are particularly common throughout `std`, this RFC +proposes adding both of the above traits, and using them to cut down on the +number of method variants. + +In particular, some methods on string slices currently work with `CharEq`, which +is similar to `Predicate`: + +```rust +pub trait CharEq { + fn matches(&mut self, char) -> bool; + fn only_ascii(&self) -> bool; +} +``` + +The difference is the `only_ascii` method, which is used to optimize certain +operations when the predicate only holds for characters in the ASCII range. + +To keep these optimizations intact while connecting to `Predicate`, this RFC +proposes the following restructuring of `CharEq`: + +```rust +pub trait CharPredicate: Predicate { + fn only_ascii(&self) -> bool { + false + } +} +``` + +### Why not leverage unboxed closures? + +A natural question is: why not use the traits for unboxed closures to achieve a +similar effect? For example, you could imagine writing a blanket `impl` for +`Fn(&T) -> bool` for any `T: PartialEq`, which would allow `PartialEq` values to +be used anywhere a predicate-like closure was requested. + +The problem is that these blanket `impl`s will often conflict. In particular, +*any* type `T` could implement `Fn() -> T`, and that single blanket `impl` would +preclude any others (at least, assuming that unboxed closure traits treat the +argument and return types as associated (output) types). + +In addition, the explicit use of traits like `Predicate` makes the intended +semantics more clear, and the overloading less surprising. + +## The APIs + +Now we'll delve into the detailed APIs for the various concrete +collections. These APIs will often be given in tabular form, grouping together +common APIs across multiple collections. When writing these function signatures: + +* We will assume a type parameter `T` for `Vec`, `BinaryHeap`, `DList` and `RingBuf`; +we will also use this parameter for APIs on `String`, where it should be +understood as `char`. + +* We will assume type parameters `K: Borrow` and `V` for `HashMap` and +`TreeMap`; for `TrieMap` and `SmallIntMap` the `K` is assumed to be `uint` + +* We will assume a type parameter `K: Borrow` for `HashSet` and `TreeSet`; for + `BitvSet` it is assumed to be `uint`. + +We will begin by outlining the most widespread APIs in tables, making it easy to +compare names and signatures across different kinds of collections. Then we will +focus on some APIs specific to particular classes of collections -- e.g. sets +and maps. Finally, we will briefly discuss APIs that are specific to a single +concrete collection. + +### Construction + +All of the collections should support a static function: + +```rust +fn new() -> Self +``` + +that creates an empty version of the collection; the constructor may take +arguments needed to set up the collection, e.g. the capacity for `LruCache`. + +Several collections also support separate constructors for providing capacities in +advance; these are discussed [below](#capacity-management). + +#### The `FromIterator` trait + +All of the collections should implement the `FromIterator` trait: + +```rust +pub trait FromIterator { + type A: + fn from_iter(T) -> Self where T: IntoIterator; +} +``` + +Note that this varies from today's `FromIterator` by consuming an `IntoIterator` +rather than `Iterator`. As explained [above](#intoiterator-and-iterable), this +choice is strictly more general and will not break any existing code. 
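
As a usage sketch in present-day Rust (where `FromIterator::from_iter` does
consume an `IntoIterator`, as proposed here), `collect` is the usual way this
constructor is reached:

```rust
use std::collections::HashMap;

fn main() {
    // `collect` is the user-facing entry point to `FromIterator`; any
    // `IntoIterator` value (here, a Vec of key/value pairs) works as a source.
    let pairs = vec![("a", 1), ("b", 2), ("a", 3)];
    let map: HashMap<&str, i32> = pairs.into_iter().collect();

    // For maps, a repeated key keeps the last value inserted.
    assert_eq!(map["a"], 3);

    // The same mechanism builds any other collection from an iterator.
    let squares: Vec<i32> = (1..=3).map(|n| n * n).collect();
    assert_eq!(squares, [1, 4, 9]);
}
```
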
+ +This constructor initializes the collection with the contents of the +iterator. For maps, the iterator is over key/value pairs, and the semantics is +equivalent to inserting those pairs in order; if keys are repeated, the last +value is the one left in the map. + +### Insertion + +The table below gives methods for inserting items into various concrete collections: + +Operation | Collections +--------- | ----------- +`fn push(&mut self, T)` | `Vec`, `BinaryHeap`, `String` +`fn push_front(&mut self, T)` | `DList`, `RingBuf` +`fn push_back(&mut self, T)` | `DList`, `RingBuf` +`fn insert(&mut self, uint, T)` | `Vec`, `RingBuf`, `String` +`fn insert(&mut self, K::Owned) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `BitvSet` +`fn insert(&mut self, K::Owned, V) -> Option` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` +`fn append(&mut self, Self)` | `DList` +`fn prepend(&mut self, Self)` | `DList` + +There are a few changes here from the current state of affairs: + +* The `DList` and `RingBuf` data structures no longer provide `push`, but rather + `push_front` and `push_back`. This change is based on (1) viewing them as + deques and (2) not giving priority to the "front" or the "back". + +* The `insert` method on maps returns the value previously associated with the + key, if any. Previously, this functionality was provided by a `swap` method, + which has been dropped (consolidating needless method variants.) + +Aside from these changes, a number of insertion methods will be deprecated +(e.g. the `append` and `append_one` methods on `Vec`). These are discussed +further in the section on "specialized operations" +[below](#specialized-operations). + +#### The `Extend` trait (was: `Extendable`) + +In addition to the standard insertion operations above, *all* collections will +implement the `Extend` trait. This trait was previously called `Extendable`, but +in general we +[prefer to avoid](http://aturon.github.io/style/naming/README.html) `-able` +suffixes and instead name the trait using a verb (or, especially, the key method +offered by the trait.) + +The `Extend` trait allows data from an arbitrary iterator to be inserted into a +collection, and will be defined as follows: + +```rust +pub trait Extend: FromIterator { + fn extend(&mut self, T) where T: IntoIterator; +} +``` + +As with `FromIterator`, this trait has been modified to take an `IntoIterator` +value. + +### Deletion + +The table below gives methods for removing items into various concrete collections: + +Operation | Collections +--------- | ----------- +`fn clear(&mut self)` | *all* +`fn pop(&mut self) -> Option` | `Vec`, `BinaryHeap`, `String` +`fn pop_front(&mut self) -> Option` | `DList`, `RingBuf` +`fn pop_back(&mut self) -> Option` | `DList`, `RingBuf` +`fn remove(&mut self, uint) -> Option` | `Vec`, `RingBuf`, `String` +`fn remove(&mut self, &K) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `BitvSet` +`fn remove(&mut self, &K) -> Option` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` +`fn truncate(&mut self, len: uint)` | `Vec`, `String`, `Bitv`, `DList`, `RingBuf` +`fn retain

(&mut self, f: P) where P: Predicate` | `Vec`, `DList`, `RingBuf` +`fn dedup(&mut self)` | `Vec`, `DList`, `RingBuf` where `T: PartialEq` + +As with the insertion methods, there are some differences from today's API: + +* The `DList` and `RingBuf` data structures no longer provide `pop`, but rather + `pop_front` and `pop_back` -- similarly to the `push` methods. + +* The `remove` method on maps returns the value previously associated with the + key, if any. Previously, this functionality was provided by a separate `pop` + method, which has been dropped (consolidating needless method variants.) + +* The `retain` method takes a `Predicate`. + +* The `truncate`, `retain` and `dedup` methods are offered more widely. + +Again, some of the more specialized methods are not discussed here; see +"specialized operations" [below](#specialized-operations). + +### Inspection/mutation + +The next table gives methods for inspection and mutation of existing items in collections: + +Operation | Collections +--------- | ----------- +`fn len(&self) -> uint` | *all* +`fn is_empty(&self) -> bool` | *all* +`fn get(&self, uint) -> Option<&T>` | `[T]`, `Vec`, `RingBuf` +`fn get_mut(&mut self, uint) -> Option<&mut T>` | `[T]`, `Vec`, `RingBuf` +`fn get(&self, &K) -> Option<&V>` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` +`fn get_mut(&mut self, &K) -> Option<&mut V>` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` +`fn contains

(&self, P) where P: Predicate` | `[T]`, `str`, `Vec`, `String`, `DList`, `RingBuf`, `BinaryHeap` +`fn contains(&self, &K) -> bool` | `HashSet`, `TreeSet`, `TrieSet`, `EnumSet` +`fn contains_key(&self, &K) -> bool` | `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap` + +The biggest changes from the current APIs are: + +* The `find` and `find_mut` methods have been renamed to `get` and `get_mut`. + Further, all `get` methods return `Option` values and do not invoke `fail!`. + This is part of a general convention described in the next section (on the + `Index` traits). + +* The `contains` method is offered more widely. + +* There is no longer an equivalent of `find_copy` (which should be called + `find_clone`). Instead, we propose to add the following method to the `Option<&'a T>` + type where `T: Clone`: + + ```rust + fn cloned(self) -> Option { + self.map(|x| x.clone()) + } + ``` + + so that `some_map.find_copy(key)` will instead be written + `some_map.find(key).cloned()`. This method chain is slightly longer, but is + more clear and allows us to drop the `_copy` variants. Moreover, *all* users + of `Option` benefit from the new convenience method. + +#### The `Index` trait + +The `Index` and `IndexMut` traits provide indexing notation like `v[0]`: + +```rust +pub trait Index { + type Index; + type Result; + fn index(&'a self, index: &Index) -> &'a Result; +} + +pub trait IndexMut { + type Index; + type Result; + fn index_mut(&'a mut self, index: &Index) -> &'a mut Result; +} +``` + +These traits will be implemented for: `[T]`, `Vec`, `RingBuf`, `HashMap`, `TreeMap`, `TrieMap`, `SmallIntMap`. + +As a general convention, implementation of the `Index` traits will *fail the +task* if the index is invalid (out of bounds or key not found); they will +therefor return direct references to values. Any collection implementing `Index` +(resp. `IndexMut`) should also provide a `get` method (resp. `get_mut`) as a +non-failing variant that returns an `Option` value. + +This allows us to keep indexing notation maximally concise, while still +providing convenient non-failing variants (which can be used to provide a check +for index validity). + +### Iteration + +Every collection should provide the standard trio of iteration methods: + +```rust +fn iter(&'a self) -> Items<'a>; +fn iter_mut(&'a mut self) -> ItemsMut<'a>; +fn into_iter(self) -> ItemsMove; +``` + +and in particular implement the `IntoIterator` trait on both the collection type +and on (mutable) references to it. + +### Capacity management + +many of the collections have some notion of "capacity", which may be fixed, grow +explicitly, or grow implicitly: + +- No capacity/fixed capacity: `DList`, `TreeMap`, `TreeSet`, `TrieMap`, `TrieSet`, slices, `EnumSet` +- Explicit growth: `LruCache` +- Implicit growth: `Vec`, `RingBuf`, `HashMap`, `HashSet`, `BitvSet`, `BinaryHeap` + +Growable collections provide functions for capacity management, as follows. + +#### Explicit growth + +For explicitly-grown collections, the normal constructor (`new`) takes a +capacity argument. Capacity can later be inspected or updated as follows: + +```rust +fn capacity(&self) -> uint +fn set_capacity(&mut self, capacity: uint) +``` + +(Note, this renames `LruCache::change_capacity` to `set_capacity`, the +prevailing style for setter method.) 
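
Tying the `get`/indexing convention above to concrete code, here is a minimal
sketch with today's `Vec` and `HashMap`, which follow this split: indexing
asserts validity and panics otherwise, while `get` returns an `Option`.

```rust
use std::collections::HashMap;

fn main() {
    let v = vec![10, 20, 30];

    // `get` is the non-panicking variant and returns an Option...
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(9), None);

    // ...while indexing is concise but fails on an invalid index.
    assert_eq!(v[1], 20);

    let mut scores = HashMap::new();
    scores.insert("alice", 3u32);

    // The same convention for maps: checked lookup via `get`, `[]` to
    // assert that the key is present.
    assert_eq!(scores.get("bob"), None);
    assert_eq!(scores["alice"], 3);
}
```
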
+ +#### Implicit growth + +For implicitly-grown collections, the normal constructor (`new`) does not take a +capacity, but there is an explicit `with_capacity` constructor, along with other +functions to work with the capacity later on: + +```rust +fn with_capacity(uint) -> Self +fn capacity(&self) -> uint +fn reserve(&mut self, additional: uint) +fn reserve_exact(&mut self, additional: uint) +fn shrink_to_fit(&mut self) +``` + +There are some important changes from the current APIs: + +* The `reserve` and `reserve_exact` methods now take as an argument the *extra* + space to reserve, rather than the final desired capacity, as this usage is + vastly more common. The `reserve` function may grow the capacity by a larger + amount than requested, to ensure amortization, while `reserve_exact` will + reserve exactly the requested additional capacity. The `reserve_additional` + methods are deprecated. + +* The `with_capacity` constructor does *not* take any additional arguments, for + uniformity with `new`. This change affects `Bitv` in particular. + +#### Bounded iterators + +Some of the maps (e.g. `TreeMap`) currently offer specialized iterators over +their entries starting at a given key (called `lower_bound`) and above a given +key (called `upper_bound`), along with `_mut` variants. While the functionality +is worthwhile, the names are not very clear, so this RFC proposes the following +replacement API (thanks to [@Gankro for the suggestion](https://github.com/rust-lang/rfcs/pull/235#issuecomment-55512788)): + +```rust +Bound { + /// An inclusive bound + Included(T), + + /// An exclusive bound + Excluded(T), + + Unbounded, +} + +/// Creates a double-ended iterator over a sub-range of the collection's items, +/// starting at min, and ending at max. If min is `Unbounded`, then it will +/// be treated as "negative infinity", and if max is `Unbounded`, then it will +/// be treated as "positive infinity". Thus range(Unbounded, Unbounded) will yield +/// the whole collection. +fn range(&self, min: Bound<&T>, max: Bound<&T>) -> RangedItems<'a, T>; + +fn range_mut(&self, min: Bound<&T>, max: Bound<&T>) -> RangedItemsMut<'a, T>; +``` + +These iterators should be provided for any maps over ordered keys (`TreeMap`, +`TrieMap` and `SmallIntMap`). + +In addition, analogous methods should be provided for sets over ordered keys +(`TreeSet`, `TrieSet`, `BitvSet`). + +### Set operations + +#### Comparisons + +All sets should offer the following methods, as they do today: + +```rust +fn is_disjoint(&self, other: &Self) -> bool; +fn is_subset(&self, other: &Self) -> bool; +fn is_superset(&self, other: &Self) -> bool; +``` + +#### Combinations + +Sets can also be combined using the standard operations -- union, intersection, +difference and symmetric difference (exclusive or). Today's APIs for doing so +look like this: + +```rust +fn union<'a>(&'a self, other: &'a Self) -> I; +fn intersection<'a>(&'a self, other: &'a Self) -> I; +fn difference<'a>(&'a self, other: &'a Self) -> I; +fn symmetric_difference<'a>(&'a self, other: &'a Self) -> I; +``` + +where the `I` type is an iterator over keys that varies by concrete +set. Working with these iterators avoids materializing intermediate +sets when they're not needed; the `collect` method can be used to +create sets when they are. This RFC proposes to keep these names +intact, following the +[RFC](https://github.com/rust-lang/rfcs/pull/344) on iterator +conventions. 
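
For illustration, this is how the iterator-returning combination methods look
on today's `HashSet`: the methods yield borrowed elements, and `collect` is
only needed when a materialized set is actually wanted.

```rust
use std::collections::HashSet;

fn main() {
    let a = HashSet::from([1, 2, 3]);
    let b = HashSet::from([2, 3, 4]);

    // The combination methods return iterators over borrowed elements...
    let mut common: Vec<&i32> = a.intersection(&b).collect();
    common.sort();
    assert_eq!(common, [&2, &3]);

    // ...and `collect` materializes a new set only when one is needed.
    let combined: HashSet<i32> = a.union(&b).cloned().collect();
    assert_eq!(combined.len(), 4);
}
```
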
+ +Sets should also implement the `BitOr`, `BitAnd`, `BitXor` and `Sub` traits from +`std::ops`, allowing overloaded notation `|`, `&`, `|^` and `-` to be used with +sets. These are equivalent to invoking the corresponding `iter_` method and then +calling `collect`, but for some sets (notably `BitvSet`) a more efficient direct +implementation is possible. + +Unfortunately, we do not yet have a set of traits corresponding to operations +`|=`, `&=`, etc, but again in some cases doing the update in place may be more +efficient. Right now, `BitvSet` is the only concrete set offering such operations: + +```rust +fn union_with(&mut self, other: &BitvSet) +fn intersect_with(&mut self, other: &BitvSet) +fn difference_with(&mut self, other: &BitvSet) +fn symmetric_difference_with(&mut self, other: &BitvSet) +``` + +This RFC punts on the question of naming here: it does *not* propose a new set +of names. Ideally, we would add operations like `|=` in a separate RFC, and use +those conventionally for sets. If not, we will choose fallback names during the +stabilization of `BitvSet`. + +### Map operations + +#### Combined methods + +The `HashMap` type currently provides a somewhat bewildering set of `find`/`insert` variants: + +```rust +fn find_or_insert(&mut self, k: K, v: V) -> &mut V +fn find_or_insert_with<'a>(&'a mut self, k: K, f: |&K| -> V) -> &'a mut V +fn insert_or_update_with<'a>(&'a mut self, k: K, v: V, f: |&K, &mut V|) -> &'a mut V +fn find_with_or_insert_with<'a, A>(&'a mut self, k: K, a: A, found: |&K, &mut V, A|, not_found: |&K, A| -> V) -> &'a mut V +``` + +These methods are used to couple together lookup and insertion/update +operations, thereby avoiding an extra lookup step. However, the current set of +method variants seems overly complex. + +There is [another RFC](https://github.com/rust-lang/rfcs/pull/216) already in +the queue addressing this problem in a very nice way, and this RFC defers to +that one + +#### Key and value iterators + +In addition to the standard iterators, maps should provide by-reference +convenience iterators over keys and values: + +```rust +fn keys(&'a self) -> Keys<'a, K> +fn values(&'a self) -> Values<'a, V> +``` + +While these iterators are easy to define in terms of the main `iter` method, +they are used often enough to warrant including convenience methods. + +### Specialized operations + +Many concrete collections offer specialized operations beyond the ones given +above. These will largely be addressed through the API stabilization process +(which focuses on local API issues, as opposed to general conventions), but a +few broad points are addressed below. + +#### Relating `Vec` and `String` to slices + +One goal of this RFC is to supply all of the methods on (mutable) slices on +`Vec` and `String`. There are a few ways to achieve this, so concretely the +proposal is for `Vec` to implement `Deref<[T]>` and `DerefMut<[T]>`, and +`String` to implement `Deref`. This will automatically allow all slice +methods to be invoked from vectors and strings, and will allow writing `&*v` +rather than `v.as_slice()`. + +In this scheme, `Vec` and `String` are really "smart pointers" around the +corresponding slice types. While counterintuitive at first, this perspective +actually makes a fair amount of sense, especially with DST. + +(Initially, it was unclear whether this strategy would play well with method +resolution, but the planned resolution rules should work fine.) 
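
A short sketch of what this buys in practice, assuming `Vec<T>` derefs to
`[T]` and `String` derefs to `str` as proposed (which is how the types behave
in current `std`):

```rust
fn print_words(s: &str) {
    for word in s.split_whitespace() {
        println!("{}", word);
    }
}

fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let owned = String::from("deref makes this work");
    let nums = vec![1, 2, 3];

    // Slice and str methods are available directly on the owning types...
    assert!(owned.starts_with("deref"));
    assert_eq!(nums.len(), 3);

    // ...and `&*` re-borrows through Deref, giving &str / &[T] without
    // calling `as_slice`.
    print_words(&*owned);
    assert_eq!(total(&*nums), 6);
}
```
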
+ +#### `String` API + +One of the key difficulties with the `String` API is that strings use utf8 +encoding, and some operations are only efficient when working at the byte level +(and thus taking this encoding into account). + +As a general principle, we will move the API toward the following convention: +index-related operations always work in terms of bytes, other operations deal +with chars by default (but can have suffixed variants for working at other +granularities when appropriate.) + +#### `DList` + +The `DList` type offers a number of specialized methods: + +```rust +swap_remove, insert_when, insert_ordered, merge, rotate_forward and rotate_backward +``` + +Prior to stabilizing the `DList` API, we will attempt to simplify its API +surface, possibly by using idea from the +[collection views RFC](https://github.com/rust-lang/rfcs/pull/216). + +### Minimizing method variants via iterators + +#### Partitioning via `FromIterator` + +One place we can move toward iterators is functions like `partition` and +`partitioned` on vectors and slices: + +```rust +// on Vec +fn partition(self, f: |&T| -> bool) -> (Vec, Vec); + +// on [T] where T: Clone +fn partitioned(&self, f: |&T| -> bool) -> (Vec, Vec); +``` + +These two functions transform a vector/slice into a pair of vectors, based on a +"partitioning" function that says which of the two vectors to place elements +into. The `partition` variant works by moving elements of the vector, while +`paritioned` clones elements. + +There are a few unfortunate aspects of an API like this one: + +* It's specific to vectors/slices, although in principle both the source and + target containers could be more general. + +* The fact that two variants have to be exposed, for owned versus clones, is + somewhat unfortunate. + +This RFC proposes the following alternative design: + +```rust +pub enum Either { + pub Left(T), + pub Right(U), +} + +impl FromIterator for (A, B) where A: Extend, B: Extend { + fn from_iter(mut iter: I) -> (A, B) where I: IntoIterator> { + let mut left: A = FromIterator::from_iter(None::); + let mut right: B = FromIterator::from_iter(None::); + + for item in iter { + match item { + Left(t) => left.extend(Some(t)), + Right(u) => right.extend(Some(u)), + } + } + + (left, right) + } +} + +trait Iterator { + ... + fn partition(self, |&A| -> bool) -> Partitioned { ... } +} + +// where Partitioned: Iterator> +``` + +This design drastically generalizes the partitioning functionality, allowing it +be used with arbitrary collections and iterators, while removing the +by-reference and by-value distinction. + +Using this design, you have: + +```rust +// The following two lines are equivalent: +let (u, w) = v.partition(f); +let (u, w): (Vec, Vec) = v.into_iter().partition(f).collect(); + +// The following two lines are equivalent: +let (u, w) = v.as_slice().partitioned(f); +let (u, w): (Vec, Vec) = v.iter_cloned().partition(f).collect(); +``` + +There is some extra verbosity, mainly due to the type annotations for `collect`, +but the API is much more flexible, since the partitioned data can now be +collected into other collections (or even differing collections). In addition, +partitioning is supported for *any* iterator. 
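
For comparison, this is how iterator-based partitioning looks in current Rust,
where `Iterator::partition` collects directly into any two `Default + Extend`
collections rather than going through an `Either`-yielding adapter; the flavor
is the same even though the plumbing differs from the sketch above.

```rust
fn main() {
    let v = vec![1, 2, 3, 4, 5, 6];

    // By-value partitioning: consume the vector through its iterator.
    let (evens, odds): (Vec<i32>, Vec<i32>) =
        v.into_iter().partition(|n| n % 2 == 0);
    assert_eq!(evens, [2, 4, 6]);
    assert_eq!(odds, [1, 3, 5]);

    // Because it is an iterator adapter, the same call partitions any
    // iterator into any extendable collection, e.g. chars into Strings.
    let (digits, rest): (String, String) =
        "a1b2c3".chars().partition(|c| c.is_ascii_digit());
    assert_eq!(digits, "123");
    assert_eq!(rest, "abc");
}
```
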
+ +#### Removing methods like `from_elem`, `from_fn`, `grow`, and `grow_fn` + +Vectors and some other collections offer constructors and growth functions like +the following: + +```rust +fn from_elem(length: uint, value: T) -> Vec +fn from_fn(length: uint, op: |uint| -> T) -> Vec +fn grow(&mut self, n: uint, value: &T) +fn grow_fn(&mut self, n: uint, f: |uint| -> T) +``` + +These extra variants can easily be dropped in favor of iterators, and this RFC +proposes to do so. + +The `iter` module already contains a `Repeat` iterator; this RFC proposes to add +a free function `repeat` to `iter` as a convenience for `iter::Repeat::new`. + +With that in place, we have: + +```rust +// Equivalent: +let v = Vec::from_elem(n, a); +let v = Vec::from_iter(repeat(a).take(n)); + +// Equivalent: +let v = Vec::from_fn(n, f); +let v = Vec::from_iter(range(0, n).map(f)); + +// Equivalent: +v.grow(n, a); +v.extend(repeat(a).take(n)); + +// Equivalent: +v.grow_fn(n, f); +v.extend(range(0, n).map(f)); +``` + +While these replacements are slightly longer, an important aspect of ergonomics +is *memorability*: by placing greater emphasis on iterators, programmers will +quickly learn the iterator APIs and have those at their fingertips, while +remembering ad hoc method variants like `grow_fn` is more difficult. + +#### Long-term: removing `push_all` and `push_all_move` + +The `push_all` and `push_all_move` methods on vectors are yet more API variants +that could, in principle, go through iterators: + +```rust +// The following are *semantically* equivalent +v.push_all(some_slice); +v.extend(some_slice.iter_cloned()); + +// The following are *semantically* equivalent +v.push_all_move(some_vec); +v.extend(some_vec); +``` + +However, currently the `push_all` and `push_all_move` methods can rely +on the *exact* size of the container being pushed, in order to elide +bounds checks. We do not currently have a way to "trust" methods like +`len` on iterators to elide bounds checks. A separate RFC will +introduce the notion of a "trusted" method which should support such +optimization and allow us to deprecate the `push_all` and +`push_all_move` variants. (This is unlikely to happen before 1.0, so +the methods will probably still be included with "experimental" +status, and likely with different names.) + +# Alternatives + +## `Borrow` and the `Equiv` problem + +### Variants of `Borrow` + +The original version of `Borrow` was somewhat more subtle: + +```rust +/// A trait for borrowing. +/// If `T: Borrow` then `&T` represents data borrowed from `T::Owned`. +trait Borrow for Sized? { + /// The type being borrowed from. + type Owned; + + /// Immutably borrow from an owned value. + fn borrow(&Owned) -> &Self; + + /// Mutably borrow from an owned value. + fn borrow_mut(&mut Owned) -> &mut Self; +} + +trait ToOwned: Borrow { + /// Produce a new owned value, usually by cloning. 
+ fn to_owned(&self) -> Owned; +} + +impl Borrow for A { + type Owned = A; + fn borrow(a: &A) -> &A { + a + } + fn borrow_mut(a: &mut A) -> &mut A { + a + } +} + +impl ToOwned for A { + fn to_owned(&self) -> A { + self.clone() + } +} + +impl Borrow for str { + type Owned = String; + fn borrow(s: &String) -> &str { + s.as_slice() + } + fn borrow_mut(s: &mut String) -> &mut str { + s.as_mut_slice() + } +} + +impl ToOwned for str { + fn to_owned(&self) -> String { + self.to_string() + } +} + +impl Borrow for [T] { + type Owned = Vec; + fn borrow(s: &Vec) -> &[T] { + s.as_slice() + } + fn borrow_mut(s: &mut Vec) -> &mut [T] { + s.as_mut_slice() + } +} + +impl ToOwned for [T] { + fn to_owned(&self) -> Vec { + self.to_vec() + } +} + +impl HashMap where K: Borrow + Hash + Eq { + fn find(&self, k: &K) -> &V { ... } + fn insert(&mut self, k: K::Owned, v: V) -> Option { ... } + ... +} + +pub enum Cow<'a, T> where T: ToOwned { + Shared(&'a T), + Owned(T::Owned) +} +``` + +This approach ties `Borrow` directly to the borrowed data, and uses an +associated type to *uniquely determine* the corresponding owned data type. + +For string keys, we would use `HashMap`. Then, the `find` method would +take an `&str` key argument, while `insert` would take an owned `String`. On the +other hand, for some other type `Foo` a `HashMap` would take +`&Foo` for `find` and `Foo` for `insert`. (More discussion on the choice of +ownership is given in the [alternatives section](#ownership-management-for-keys). + +**Benefits of this alternative**: + +* Unlike the current `_equiv` or `find_with` methods, or the proposal in the +RFC, this approach guarantees coherence about hashing or ordering. For example, +`HashMap` above requires that `K` (the borrowed key type) is `Hash`, and will +produce hashes from owned keys by first borrowing from them. + +* Unlike the proposal in this RFC, the signature of the methods for maps is + *very simple* -- essentially the same as the current `find`, `insert`, etc. + +* Like the proposal in this RFC, there is only a single `Borrow` + trait, so it would be possible to standardize on a `Map` trait later + on and include these APIs. The trait could be made somewhat simpler + with this alternative form of `Borrow`, but can be provided in + either case; see + [these](https://github.com/rust-lang/rfcs/pull/235#issuecomment-55976755) + [comments](https://github.com/rust-lang/rfcs/pull/235#issuecomment-56070223) + for details. + +* The `Cow` data type is simpler than in the RFC's proposal, since it does not + need a type parameter for the owned data. + +**Drawbacks of this alternative**: + +* It's quite subtle that you want to use `HashMap` rather than + `HashMap`. That is, if you try to use a map in the "obvious way" + you will not be able to use string slices for lookup, which is part of what + this RFC is trying to achieve. The same applies to `Cow`. + +* The design is somewhat less flexible than the one in the RFC, because (1) + there is a fixed choice of owned type corresponding to each borrowed type and + (2) you cannot use multiple borrow types for lookups at different types + (e.g. using `&String` sometimes and `&str` other times). On the other hand, + these restrictions guarantee coherence of hashing/equality/comparison. + +* This version of `Borrow`, mapping from borrowed to owned data, is + somewhat less intuitive. + +On the balance, the approach proposed in the RFC seems better, because using the +map APIs in the obvious ways works by default. 
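
For a concrete feel of `Borrow`-based lookup, here is a sketch against today's
`HashMap`, where owned `String` keys can be queried with a plain `&str`; the
generic `lookup` helper is written out only to show where the `K: Borrow<Q>`
bound sits.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

fn lookup<'a, K, Q, V>(map: &'a HashMap<K, V>, key: &Q) -> Option<&'a V>
where
    K: Borrow<Q> + Hash + Eq,
    Q: Hash + Eq + ?Sized,
{
    map.get(key)
}

fn main() {
    let mut map: HashMap<String, u32> = HashMap::new();

    // Insertion takes the owned key type...
    map.insert("hello".to_string(), 1);

    // ...while lookup accepts anything the key can be borrowed as,
    // so no String needs to be allocated just to search.
    assert_eq!(map.get("hello"), Some(&1));
    assert_eq!(lookup(&map, "hello"), Some(&1));
}
```
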
+ +### The `HashMapKey` trait and friends + +An earlier proposal for solving the `_equiv` problem was given in the +[associated items RFC](https://github.com/rust-lang/rfcs/pull/195)): + +```rust +trait HashMapKey : Clone + Hash + Eq { + type Query: Hash = Self; + fn compare(&self, other: &Query) -> bool { self == other } + fn query_to_key(q: &Query) -> Self { q.clone() }; +} + +impl HashMapKey for String { + type Query = str; + fn compare(&self, other: &str) -> bool { + self.as_slice() == other + } + fn query_to_key(q: &str) -> String { + q.into_string() + } +} + +impl HashMap where K: HashMapKey { + fn find(&self, q: &K::Query) -> &V { ... } +} +``` + +This solution has several drawbacks, however: + +* It requires a separate trait for different kinds of maps -- one for `HashMap`, + one for `TreeMap`, etc. + +* It requires that a trait be implemented on a given key without providing a + blanket implementation. Since you also need different traits for different + maps, it's easy to imagine cases where a out-of-crate type you want to use as + a key doesn't implement the key trait, forcing you to newtype. + +* It doesn't help with the `MaybeOwned` problem. + +### Daniel Micay's hack + +@strcat has a [PR](https://github.com/rust-lang/rust/pull/16713) that makes it +possible to, for example, coerce a `&str` to an `&String` value. + +This provides some help for the `_equiv` problem, since the `_equiv` methods +could potentially be dropped. However, there are a few downsides: + +* Using a map with string keys is still a bit more verbose: + + ```rust + map.find("some static string".as_string()) // with the hack + map.find("some static string") // with this RFC + ``` + +* The solution is specialized to strings and vectors, and does not necessarily + support user-defined unsized types or slices. + +* It doesn't help with the `MaybeOwned` problem. + +* It exposes some representation interplay between slices and references to + owned values, which we may not want to commit to or reveal. + +## For `IntoIterator` + +### Handling of `for` loops + +The fact that `for x in v` moves elements from `v`, while `for x in v.iter()` +yields references, may be a bit surprising. On the other hand, moving is the +default almost everywhere in Rust, and with the proposed approach you get to use `&` and +`&mut` to easily select other forms of iteration. + +(See +[@huon's comment](https://github.com/rust-lang/rfcs/pull/235/files#r17697796) +for additional drawbacks.) + +Unfortunately, it's a bit tricky to make for use by-ref iterators instead. The +problem is that an iterator is `IntoIterator`, but it is not `Iterable` (or +whatever we call the by-reference trait). Why? Because `IntoIterator` gives you +an iterator that can be used only *once*, while `Iterable` allows you to ask for +iterators repeatedly. + +If `for` demanded an `Iterable`, then `for x in v.iter()` and `for x in v.iter_mut()` +would cease to work -- we'd have to find some other approach. It might be +doable, but it's not obvious how to do it. + +### Input versus output type parameters + +An important aspect of the `IntoIterator` design is that the element type is an +associated type, *not* an input type. + +This is a tradeoff: + +* Making it an associated type means that the `for` examples work, because the + type of `Self` uniquely determines the element type for iteration, aiding type + inference. + +* Making it an input type would forgo those benefits, but would allow some + additional flexibility. 
For example, you could implement `IntoIterator` for + an iterator on `&A` when `A` is cloned, therefore *implicitly* cloning as + needed to make the ownership work out (and obviating the need for + `iter_cloned`). However, we have generally kept away from this kind of + implicit magic, *especially* when it can involve hidden costs like cloning, so + the more explicit design given in this RFC seems best. + +# Downsides + +Design tradeoffs were discussed inline. + +# Unresolved questions + +## Unresolved conventions/APIs + +As mentioned [above](#combinations), this RFC does not resolve the question of +what to call set operations that update the set in place. + +It likewise does not settle the APIs that appear in only single concrete +collections. These will largely be handled through the API stabilization +process, unless radical changes are proposed. + +Finally, additional methods provided via the `IntoIterator` API are left for +future consideration. + +## Coercions + +Using the `Borrow` trait, it might be possible to safely add a coercion for auto-slicing: + +``` + If T: Borrow: + coerce &'a T::Owned to &'a T + coerce &'a mut T::Owned to &'a mut T +``` + +For sized types, this coercion is *forced* to be trivial, so the only time it +would involve running user code is for unsized values. + +A general story about such coercions will be left to a +[follow-up RFC](https://github.com/rust-lang/rfcs/pull/241). diff --git a/text/0236-error-conventions.md b/text/0236-error-conventions.md new file mode 100644 index 00000000000..071a61f41b4 --- /dev/null +++ b/text/0236-error-conventions.md @@ -0,0 +1,241 @@ +- Start Date: 2014-10-30 +- RFC PR #: [rust-lang/rfcs#236](https://github.com/rust-lang/rfcs/pull/236) +- Rust Issue #: [rust-lang/rust#18466](https://github.com/rust-lang/rust/issues/18466) + +# Summary + +This is a *conventions* RFC for formalizing the basic conventions around error +handling in Rust libraries. + +The high-level overview is: + +* For *catastrophic errors*, abort the process or fail the task depending on + whether any recovery is possible. + +* For *contract violations*, fail the task. (Recover from programmer errors at a coarse grain.) + +* For *obstructions to the operation*, use `Result` (or, less often, + `Option`). (Recover from obstructions at a fine grain.) + +* Prefer liberal function contracts, especially if reporting errors in input + values may be useful to a function's caller. + +This RFC follows up on [two](https://github.com/rust-lang/rfcs/pull/204) +[earlier](https://github.com/rust-lang/rfcs/pull/220) attempts by giving more +leeway in when to fail the task. + +# Motivation + +Rust provides two basic strategies for dealing with errors: + +* *Task failure*, which unwinds to at least the task boundary, and by default + propagates to other tasks through poisoned channels and mutexes. Task failure + works well for coarse-grained error handling. + +* *The Result type*, which allows functions to signal error conditions through + the value that they return. Together with a lint and the `try!` macro, + `Result` works well for fine-grained error handling. + +However, while there have been some general trends in the usage of the two +handling mechanisms, we need to have formal guidelines in order to ensure +consistency as we stabilize library APIs. That is the purpose of this RFC. + +For the most part, the RFC proposes guidelines that are already followed today, +but it tries to motivate and clarify them. 
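
A small example of the fine-grained, `Result`-based style (written with the
`?` operator, which later took over the role played by the `try!` macro at the
time of this RFC); the file name is arbitrary.

```rust
use std::fs::File;
use std::io::{self, Read};

// An obstruction (missing file, I/O failure) is reported to the caller as a
// `Result`, so the caller chooses how to recover at a fine grain.
fn read_config(path: &str) -> io::Result<String> {
    let mut contents = String::new();
    File::open(path)?.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() {
    match read_config("app.conf") {
        Ok(text) => println!("read {} bytes of config", text.len()),
        Err(e) => eprintln!("could not read config: {}", e),
    }
}
```
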
+ +# Detailed design + +Errors fall into one of three categories: + +* Catastrophic errors, e.g. out-of-memory. +* Contract violations, e.g. wrong input encoding, index out of bounds. +* Obstructions, e.g. file not found, parse error. + +The basic principle of the conventions is that: + +* Catastrophic errors and programming errors (bugs) can and should only be +recovered at a *coarse grain*, i.e. a task boundary. +* Obstructions preventing an operation should be reported at a maximally *fine +grain* -- to the immediate invoker of the operation. + +## Catastrophic errors + +An error is _catastrophic_ if there is no meaningful way for the current task to +continue after the error occurs. + +Catastrophic errors are _extremely_ rare, especially outside of `libstd`. + +**Canonical examples**: out of memory, stack overflow. + +### For catastrophic errors, fail the task. + +For errors like stack overflow, Rust currently aborts the process, but +could in principle fail the task, which (in the best case) would allow +reporting and recovery from a supervisory task. + +## Contract violations + +An API may define a contract that goes beyond the type checking enforced by the +compiler. For example, slices support an indexing operation, with the contract +that the supplied index must be in bounds. + +Contracts can be complex and involve more than a single function invocation. For +example, the `RefCell` type requires that `borrow_mut` not be called until all +existing borrows have been relinquished. + +### For contract violations, fail the task. + +A contract violation is always a bug, and for bugs we follow the Erlang +philosophy of "let it crash": we assume that software *will* have bugs, and we +design coarse-grained task boundaries to report, and perhaps recover, from these +bugs. + +### Contract design + +One subtle aspect of these guidelines is that the contract for a function is +chosen by an API designer -- and so the designer also determines what counts as +a violation. + +This RFC does not attempt to give hard-and-fast rules for designing +contracts. However, here are some rough guidelines: + +* Prefer expressing contracts through static types whenever possible. + +* It *must* be possible to write code that uses the API without violating the + contract. + +* Contracts are most justified when violations are *inarguably* bugs -- but this + is surprisingly rare. + +* Consider whether the API client could benefit from the contract-checking + logic. The checks may be expensive. Or there may be useful programming + patterns where the client does not want to check inputs before hand, but would + rather attempt the operation and then find out whether the inputs were invalid. + +* When a contract violation is the *only* kind of error a function may encounter + -- i.e., there are no obstructions to its success other than "bad" inputs -- + using `Result` or `Option` instead is especially warranted. Clients can then use + `unwrap` to assert that they have passed valid input, or re-use the error + checking done by the API for their own purposes. + +* When in doubt, use loose contracts and instead return a `Result` or `Option`. + +## Obstructions + +An operation is *obstructed* if it cannot be completed for some reason, even +though the operation's contract has been satisfied. Obstructed operations may +have (documented!) side effects -- they are not required to roll back after +encountering an obstruction. 
However, they should leave the data structures in +a "coherent" state (satisfying their invariants, continuing to guarantee safety, +etc.). + +Obstructions may involve external conditions (e.g., I/O), or they may involve +aspects of the input that are not covered by the contract. + +**Canonical examples**: file not found, parse error. + +### For obstructions, use `Result` + +The +[`Result` type](http://static.rust-lang.org/doc/master/std/result/index.html) +represents either a success (yielding `T`) or failure (yielding `E`). By +returning a `Result`, a function allows its clients to discover and react to +obstructions in a fine-grained way. + +#### What about `Option`? + +The `Option` type should not be used for "obstructed" operations; it +should only be used when a `None` return value could be considered a +"successful" execution of the operation. + +This is of course a somewhat subjective question, but a good litmus +test is: would a reasonable client ever ignore the result? The +`Result` type provides a lint that ensures the result is actually +inspected, while `Option` does not, and this difference of behavior +can help when deciding between the two types. + +Another litmus test: can the operation be understood as asking a +question (possibly with sideeffects)? Operations like `pop` on a +vector can be viewed as asking for the contents of the first element, +with the side effect of removing it if it exists -- with an `Option` +return value. + +## Do not provide both `Result` and `fail!` variants. + +An API should not provide both `Result`-producing and `fail`ing versions of an +operation. It should provide just the `Result` version, allowing clients to use +`try!` or `unwrap` instead as needed. This is part of the general pattern of +cutting down on redundant variants by instead using method chaining. + +There is one exception to this rule, however. Some APIs are strongly oriented +around failure, in the sense that their functions/methods are explicitly +intended as assertions. If there is no other way to check in advance for the +validity of invoking an operation `foo`, however, the API may provide a +`foo_catch` variant that returns a `Result`. + +The main examples in `libstd` that *currently* provide both variants are: + +* Channels, which are the primary point of failure propagation between tasks. As + such, calling `recv()` is an _assertion_ that the other end of the channel is + still alive, which will propagate failures from the other end of the + channel. On the other hand, since there is no separate way to atomically test + whether the other end has hung up, channels provide a `recv_opt` variant that + produces a `Result`. + + > Note: the `_opt` suffix would be replaced by a `_catch` suffix if this RFC + > is accepted. + +* `RefCell`, which provides a dynamic version of the borrowing rules. Calling + the `borrow()` method is intended as an assertion that the cell is in a + borrowable state, and will `fail!` otherwise. On the other hand, there is no + separate way to check the state of the `RefCell`, so the module provides a + `try_borrow` variant that produces a `Result`. + + > Note: the `try_` prefix would be replaced by a `_catch` catch if this RFC is + > accepted. + +(Note: it is unclear whether these APIs will continue to provide both variants.) + +# Drawbacks + +The main drawbacks of this proposal are: + +* Task failure remains somewhat of a landmine: one must be sure to document, and + be aware of, all relevant function contracts in order to avoid task failure. 
+ +* The choice of what to make part of a function's contract remains somewhat + subjective, so these guidelines cannot be used to decisively resolve + disagreements about an API's design. + +The alternatives mentioned below do not suffer from these problems, but have +drawbacks of their own. + +# Alternatives + +[Two](https://github.com/rust-lang/rfcs/pull/204) +[alternative](https://github.com/rust-lang/rfcs/pull/220) designs have been +given in earlier RFCs, both of which take a much harder line on using `fail!` +(or, put differently, do not allow most functions to have contracts). + +As was +[pointed out by @SiegeLord](https://github.com/rust-lang/rfcs/pull/220#issuecomment-54715268), +however, mixing what might be seen as contract violations with obstructions can +make it much more difficult to write obstruction-robust code; see the linked +comment for more detail. + +## Naming + +There are numerous possible suffixes for a `Result`-producing variant: + +* `_catch`, as proposed above. As + [@kballard points out](https://github.com/rust-lang/rfcs/pull/236#issuecomment-55344336), + this name connotes exception handling, which could be considered + misleading. However, since it effectively prevents further unwinding, catching + an exception may indeed be the right analogy. + +* `_result`, which is straightforward but not as informative/suggestive as some + of the other proposed variants. + +* `try_` prefix. Also connotes exception handling, but has an unfortunately + overlap with the common use of `try_` for nonblocking variants (which is in + play for `recv` in particular). diff --git a/text/0240-unsafe-api-location.md b/text/0240-unsafe-api-location.md new file mode 100644 index 00000000000..58697bf4616 --- /dev/null +++ b/text/0240-unsafe-api-location.md @@ -0,0 +1,162 @@ +- Start Date: 2014-10-07 +- RFC PR: [rust-lang/rfcs#240](https://github.com/rust-lang/rfcs/pull/240) +- Rust Issue: [rust-lang/rust#17863](https://github.com/rust-lang/rust/issues/17863) + +# Summary + +This is a *conventions RFC* for settling the location of `unsafe` APIs relative +to the types they work with, as well as the use of `raw` submodules. + +The brief summary is: + +* Unsafe APIs should be made into methods or static functions in the same cases + that safe APIs would be. + +* `raw` submodules should be used only to *define* explicit low-level + representations. + +# Motivation + +Many data structures provide unsafe APIs either for avoiding checks or working +directly with their (otherwise private) representation. For example, `string` +provides: + +* An `as_mut_vec` method on `String` that provides a `Vec` view of the + string. This method makes it easy to work with the byte-based representation + of the string, but thereby also allows violation of the utf8 guarantee. + +* A `raw` submodule with a number of free functions, like `from_parts`, that + constructs a `String` instances from a raw-pointer-based representation, a + `from_utf8` variant that does not actually check for utf8 validity, and so + on. The unifying theme is that all of these functions avoid checking some key + invariant. + +The problem is that currently, there is no clear/consistent guideline about +which of these APIs should live as methods/static functions associated with a +type, and which should live in a `raw` submodule. Both forms appear throughout +the standard library. 
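
For reference, the checked and unchecked forms of the UTF-8 constructor as
they appear on today's `String`; the unchecked variant is `unsafe` because the
caller vouches for the invariant instead of having it verified.

```rust
fn main() {
    let bytes = vec![104, 105]; // the ASCII bytes for "hi"

    // Checked constructor: returns a Result and needs no `unsafe`.
    let checked = String::from_utf8(bytes.clone()).expect("valid UTF-8");
    assert_eq!(checked, "hi");

    // Unchecked variant: the caller promises the bytes are valid UTF-8,
    // so the call site must opt in with an `unsafe` block.
    let unchecked = unsafe { String::from_utf8_unchecked(bytes) };
    assert_eq!(unchecked, "hi");
}
```
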
+ +# Detailed design + +The proposed convention is: + +* When an unsafe function/method is clearly "about" a certain type (as a way of + constructing, destructuring, or modifying values of that type), it should be a + method or static function on that type. This is the same as the convention for + placement of safe functions/methods. So functions like + `string::raw::from_parts` would become static functions on `String`. + +* `raw` submodules should only be used to *define* low-level + types/representations (and methods/functions on them). Methods for converting + to/from such low-level types should be available directly on the high-level + types. Examples: `core::raw`, `sync::raw`. + +The benefits are: + +* *Ergonomics*. You can gain easy access to unsafe APIs merely by having a value + of the type (or, for static functions, importing the type). + +* *Consistency and simplicity*. The rules for placement of unsafe APIs are the + same as those for safe APIs. + +The perspective here is that marking APIs `unsafe` is enough to deter their use +in ordinary situations; they don't need to be further distinguished by placement +into a separate module. + +There are also some naming conventions to go along with unsafe static functions +and methods: + +* When an unsafe function/method is an unchecked variant of an otherwise safe + API, it should be marked using an `_unchecked` suffix. + + For example, the `String` module should provide both `from_utf8` and + `from_utf8_unchecked` constructors, where the latter does not actually check + the utf8 encoding. The `string::raw::slice_bytes` and + `string::raw::slice_unchecked` functions should be merged into a single + `slice_unchecked` method on strings that checks neither bounds nor utf8 + boundaries. + +* When an unsafe function/method produces or consumes a low-level representation + of a data structure, the API should use `raw` in its name. Specifically, + `from_raw_parts` is the typical name used for constructing a value from e.g. a + pointer-based representation. + +* Otherwise, *consider* using a name that suggests *why* the API is unsafe. In + some cases, like `String::as_mut_vec`, other stronger conventions apply, and the + `unsafe` qualifier on the signature (together with API documentation) is + enough. + +The unsafe methods and static functions for a given type should be placed in +their own `impl` block, at the end of the module defining the type; this will +ensure that they are grouped together in rustdoc. (Thanks @kballard for the +suggestion.) + +# Drawbacks + +One potential drawback of these conventions is that the documentation for a +module will be cluttered with rarely-used `unsafe` APIs, whereas the `raw` +submodule approach neatly groups these APIs. But rustdoc could easily be +changed to either hide or separate out `unsafe` APIs by default, and in the +meantime the `impl` block grouping should help. + +More specifically, the convention of placing unsafe constructors in `raw` makes +them very easy to find. But the usual `from_` convention, together with the +naming conventions suggested above, should make it fairly easy to discover such +constructors even when they're supplied directly as static functions. + +More generally, these conventions give `unsafe` APIs more equal status with safe +APIs. Whether this is a *drawback* depends on your philosophy about the status +of unsafe programming. But on a technical level, the key point is that the APIs +are marked `unsafe`, so users still have to opt-in to using them. 
*Ed note: from +my perspective, low-level/unsafe programming is important to support, and there +is no reason to penalize its ergonomics given that it's opt-in anyway.* + +# Alternatives + +There are a few alternatives: + +* Rather than providing unsafe APIs directly as methods/static functions, they + could be grouped into a single extension trait. For example, the `String` type + could be accompanied by a `StringRaw` extension trait providing APIs for + working with raw string representations. This would allow a clear grouping of + unsafe APIs, while still providing them as methods/static functions and + allowing them to easily be imported with e.g. `use std::string::StringRaw`. + On the other hand, it still further penalizes the raw APIs (beyond marking + them `unsafe`), and given that rustdoc could easily provide API grouping, it's + unclear exactly what the benefit is. + +* ([Suggested by @kballard](https://github.com/rust-lang/rfcs/pull/240#issuecomment-55635468)): + + > Use `raw` for functions that construct a value of the type without checking + > for one or more invariants. + + The advantage is that it's easy to find such invariant-ignoring functions. The + disadvantage is that their ergonomics is worsened, since they much be + separately imported or referenced through a lengthy path: + + ```rust + // Compare the ergonomics: + string::raw::slice_unchecked(some_string, start, end) + some_string.slice_unchecked(start, end) + ``` + +* Another suggestion by @kballard is to keep the basic structure of `raw` + submodules, but use associated types to improve the ergonomics. Details (and + discussions of pros/cons) are in + [this comment](https://github.com/rust-lang/rfcs/pull/240/files#r17572875). + +* Use `raw` submodules to group together *all* manipulation of low-level + representations. No module in `std` currently does this; existing modules + provide some free functions in `raw`, and some unsafe methods, without a clear + driving principle. The ergonomics of moving *everything* into free functions + in a `raw` submodule are quite poor. + +# Unresolved questions + +The `core::raw` module provides structs with public representations equivalent +to several built-in and library types (boxes, closures, slices, etc.). It's not +clear whether the name of this module, or the location of its contents, should +change as a result of this RFC. The module is a special case, because not all of +the types it deals with even have corresponding modules/type declarations -- so +it probably suffices to leave decisions about it to the API stabilization +process. diff --git a/text/0241-deref-conversions.md b/text/0241-deref-conversions.md new file mode 100644 index 00000000000..19ab298708d --- /dev/null +++ b/text/0241-deref-conversions.md @@ -0,0 +1,275 @@ +- Start Date: 2014-09-16 +- RFC PR: [rust-lang/rfcs#241](https://github.com/rust-lang/rfcs/pull/241) +- Rust Issue: [rust-lang/rust#21432](https://github.com/rust-lang/rust/issues/21432) + +# Summary + +Add the following coercions: + +* From `&T` to `&U` when `T: Deref`. +* From `&mut T` to `&U` when `T: Deref`. +* From `&mut T` to `&mut U` when `T: DerefMut` + +These coercions eliminate the need for "cross-borrowing" (things like `&**v`) +and calls to `as_slice`. + +# Motivation + +Rust currently supports a conservative set of *implicit coercions* that are used +when matching the types of arguments against those given for a function's +parameters. 
For example, if `T: Trait` then `&T` is implicitly coerced to +`&Trait` when used as a function argument: + +```rust +trait MyTrait { ... } +struct MyStruct { ... } +impl MyTrait for MyStruct { ... } + +fn use_trait_obj(t: &MyTrait) { ... } +fn use_struct(s: &MyStruct) { + use_trait_obj(s) // automatically coerced from &MyStruct to &MyTrait +} +``` + +In older incarnations of Rust, in which types like vectors were built in to the +language, coercions included things like auto-borrowing (taking `T` to `&T`), +auto-slicing (taking `Vec` to `&[T]`) and "cross-borrowing" (taking `Box` +to `&T`). As built-in types migrated to the library, these coercions have +disappeared: none of them apply today. That means that you have to write code +like `&**v` to convert `&Box` or `Rc>` to `&T` and `v.as_slice()` +to convert `Vec` to `&T`. + +The ergonomic regression was coupled with a promise that we'd improve things in +a more general way later on. + +"Later on" has come! The premise of this RFC is that (1) we have learned some +valuable lessons in the interim and (2) there is a quite conservative kind of +coercion we can add that dramatically improves today's ergonomic state of +affairs. + +# Detailed design + +## Design principles + +### The centrality of ownership and borrowing + +As Rust has evolved, +[a theme has emerged](http://blog.rust-lang.org/2014/09/15/Rust-1.0.html): +*ownership* and *borrowing* are the focal point of Rust's design, and the key +enablers of much of Rust's achievements. + +As such, reasoning about ownership/borrowing is a central aspect of programming +in Rust. + +In the old coercion model, borrowing could be done completely implicitly, so an +invocation like: + +```rust +foo(bar, baz, quux) +``` + +might move `bar`, immutably borrow `baz`, and mutably borrow `quux`. To +understand the flow of ownership, then, one has to be aware of the details of +all function signatures involved -- it is not possible to see ownership at a +glance. + +When +[auto-borrowing was removed](https://mail.mozilla.org/pipermail/rust-dev/2013-November/006849.html), +this reasoning difficulty was cited as a major motivator: + +> Code readability does not necessarily benefit from autoref on arguments: + + ```rust + let a = ~Foo; + foo(a); // reading this code looks like it moves `a` + fn foo(_: &Foo) {} // ah, nevermind, it doesn't move `a`! + + let mut a = ~[ ... ]; + sort(a); // not only does this not move `a`, but it mutates it! + ``` + +Having to include an extra `&` or `&mut` for arguments is a slight +inconvenience, but it makes it much easier to track ownership at a glance. +(Note that ownership is not *entirely* explicit, due to `self` and macros; see +the [appendix](#appendix-ownership-in-rust-today).) + +This RFC takes as a basic principle: **Coercions should never implicitly borrow from owned data**. + +This is a key difference from the +[cross-borrowing RFC](https://github.com/rust-lang/rfcs/pull/226). + +### Limit implicit execution of arbitrary code + +Another positive aspect of Rust's current design is that a function call like +`foo(bar, baz)` does not invoke arbitrary code (general implicit coercions, as +found in e.g. Scala). It simply executes `foo`. + +The tradeoff here is similar to the ownership tradeoff: allowing arbitrary +implicit coercions means that a programmer must understand the types of the +arguments given, the types of the parameters, and *all* applicable coercion code +in order to understand what code will be executed. 
While arbitrary coercions are +convenient, they come at a substantial cost in local reasoning about code. + +Of course, method dispatch can implicitly execute code via `Deref`. But `Deref` +is a pretty specialized tool: + +* Each type `T` can only deref to *one* other type. + + (Note: this restriction is not currently enforced, but will be enforceable + once [associated types](https://github.com/rust-lang/rfcs/pull/195) land.) + +* Deref makes all the methods of the target type visible on the source type. +* The source and target types are both references, limiting what the `deref` + code can do. + +These characteristics combined make `Deref` suitable for smart pointer-like +types and little else. They make `Deref` implementations relatively rare. And as +a consequence, you generally know when you're working with a type implementing +`Deref`. + +This RFC takes as a basic principle: **Coercions should narrowly limit the code they execute**. + +Coercions through `Deref` are considered narrow enough. + +## The proposal + +The idea is to introduce a coercion corresponding to `Deref`/`DerefMut`, but +*only* for already-borrowed values: + +* From `&T` to `&U` when `T: Deref`. +* From `&mut T` to `&U` when `T: Deref`. +* From `&mut T` to `&mut U` when `T: DerefMut` + +These coercions are applied *recursively*, similarly to auto-deref for method +dispatch. + +Here is a simple pseudocode algorithm for determining the applicability of +coercions. Let `HasBasicCoercion(T, U)` be a procedure for determining whether +`T` can be coerced to `U` using today's coercion rules (i.e. without deref). +The general `HasCoercion(T, U)` procedure would work as follows: + +``` +HasCoercion(T, U): + + if HasBasicCoercion(T, U) then + true + else if T = &V and V: Deref then + HasCoercion(&W, U) + else if T = &mut V and V: Deref then + HasCoercion(&W, U) + else if T = &mut V and V: DerefMut then + HasCoercion(&W, U) + else + false +``` + +Essentially, the procedure looks for applicable "basic" coercions at increasing +levels of deref from the given argument, just as method resolution searches for +applicable methods at increasing levels of deref. + +Unlike method resolution, however, this coercion does *not* automatically borrow. + +### Benefits of the design + +Under this coercion design, we'd see the following ergonomic improvements for +"cross-borrowing": + +```rust +fn use_ref(t: &T) { ... } +fn use_mut(t: &mut T) { ... } + +fn use_rc(t: Rc) { + use_ref(&*t); // what you have to write today + use_ref(&t); // what you'd be able to write +} + +fn use_mut_box(t: &mut Box) { + use_mut(&mut *t); // what you have to write today + use_mut(t); // what you'd be able to write + + use_ref(*t); // what you have to write today + use_ref(t); // what you'd be able to write +} + +fn use_nested(t: &Box) { + use_ref(&**t); // what you have to write today + use_ref(t); // what you'd be able to write (note: recursive deref) +} +``` + +In addition, if `Vec: Deref<[T]>` (as proposed +[here](https://github.com/rust-lang/rfcs/pull/235)), slicing would be automatic: + +```rust +fn use_slice(s: &[u8]) { ... 
} + +fn use_vec(v: Vec) { + use_slice(v.as_slice()); // what you have to write today + use_slice(&v); // what you'd be able to write +} + +fn use_vec_ref(v: &Vec) { + use_slice(v.as_slice()); // what you have to write today + use_slice(v); // what you'd be able to write +} +``` + +### Characteristics of the design + +The design satisfies both of the principles laid out in the Motivation: + +* It does not introduce implicit borrows of owned data, since it only applies to + already-borrowed data. + +* It only applies to `Deref` types, which means there is only limited potential + for implicitly running unknown code; together with the expectation that + programmers are generally aware when they are using `Deref` types, this should + retain the kind of local reasoning Rust programmers can do about + function/method invocations today. + +There is a *conceptual model* implicit in the design here: `&` is a "borrow" +operator, and richer coercions are available between borrowed types. This +perspective is in opposition to viewing `&` primarily as adding a layer of +indirection -- a view that, given compiler optimizations, is often inaccurate +anyway. + +# Drawbacks + +As with any mechanism that implicitly invokes code, deref coercions make it more +complex to fully understand what a given piece of code is doing. The RFC argued +inline that the design conserves local reasoning in practice. + +As mentioned above, this coercion design also changes the mental model +surrounding `&`, and in particular somewhat muddies the idea that it creates a +pointer. This change could make Rust more difficult to learn (though note that +it puts *more* attention on ownership), though it would make it more convenient +to use in the long run. + +# Alternatives + +The main alternative that addresses the same goals as this RFC is the +[cross-borrowing RFC](https://github.com/rust-lang/rfcs/pull/226), which +proposes a more aggressive form of deref coercion: it would allow converting +e.g. `Box` to `&T` and `Vec` to `&[T]` directly. The advantage is even +greater convenience: in many cases, even `&` is not necessary. The disadvantage +is the change to local reasoning about ownership: + +```rust +let v = vec![0u8, 1, 2]; +foo(v); // is v moved here? +bar(v); // is v still available? +``` + +Knowing whether `v` is moved in the call to `foo` requires knowing `foo`'s +signature, since the coercion would *implicitly borrow* from the vector. + +# Appendix: ownership in Rust today + +In today's Rust, ownership transfer/borrowing is explicit for all +function/method arguments. It is implicit only for: + +* *`self` on method invocations.* In practice, the name and context of a method + invocation is almost always sufficient to infer its move/borrow semantics. + +* *Macro invocations.* Since macros can expand into arbitrary code, macro + invocations can appear to move when they actually borrow. diff --git a/text/0243-trait-based-exception-handling.md b/text/0243-trait-based-exception-handling.md new file mode 100644 index 00000000000..946428d000f --- /dev/null +++ b/text/0243-trait-based-exception-handling.md @@ -0,0 +1,698 @@ +- Feature-gates: `question_mark`, `try_catch` +- Start Date: 2014-09-16 +- RFC PR #: [rust-lang/rfcs#243](https://github.com/rust-lang/rfcs/pull/243) +- Rust Issue #: [rust-lang/rust#31436](https://github.com/rust-lang/rust/issues/31436) + + +# Summary + +Add syntactic sugar for working with the `Result` type which models common +exception handling constructs. 
+ +The new constructs are: + + * An `?` operator for explicitly propagating "exceptions". + + * A `catch { ... }` expression for conveniently catching and handling + "exceptions". + +The idea for the `?` operator originates from [RFC PR 204][204] by +[@aturon](https://github.com/aturon). + +[204]: https://github.com/rust-lang/rfcs/pull/204 + + +# Motivation and overview + +Rust currently uses the `enum Result` type for error handling. This solution is +simple, well-behaved, and easy to understand, but often gnarly and inconvenient +to work with. We would like to solve the latter problem while retaining the +other nice properties and avoiding duplication of functionality. + +We can accomplish this by adding constructs which mimic the exception-handling +constructs of other languages in both appearance and behavior, while improving +upon them in typically Rustic fashion. Their meaning can be specified by a +straightforward source-to-source translation into existing language constructs, +plus a very simple and obvious new one. (They may also, but need not +necessarily, be implemented in this way.) + +These constructs are strict additions to the existing language, and apart from +the issue of keywords, the legality and behavior of all currently existing Rust +programs is entirely unaffected. + +The most important additions are a postfix `?` operator for +propagating "exceptions" and a `catch {..}` expression for catching +them. By an "exception", for now, we essentially just mean the `Err` +variant of a `Result`, though the Unresolved Questions includes some +discussion of extending to other types. + +## `?` operator + +The postfix `?` operator can be applied to `Result` values and is equivalent to +the current `try!()` macro. It either returns the `Ok` value directly, or +performs an early exit and propagates the `Err` value further out. (So given +`my_result: Result`, we have `my_result?: Foo`.) This allows it to be +used for e.g. conveniently chaining method calls which may each "throw an +exception": + + foo()?.bar()?.baz() + +Naturally, in this case the types of the "exceptions thrown by" `foo()` and +`bar()` must unify. Like the current `try!()` macro, the `?` operator will also +perform an implicit "upcast" on the exception type. + +When used outside of a `catch` block, the `?` operator propagates the exception to +the caller of the current function, just like the current `try!` macro does. (If +the return type of the function isn't a `Result`, then this is a type error.) +When used inside a `catch` block, it propagates the exception up to the innermost +`catch` block, as one would expect. + +Requiring an explicit `?` operator to propagate exceptions strikes a very +pleasing balance between completely automatic exception propagation, which most +languages have, and completely manual propagation, which we'd have apart from +the `try!` macro. It means that function calls remain simply function calls +which return a result to their caller, with no magic going on behind the scenes; +and this also *increases* flexibility, because one gets to choose between +propagation with `?` or consuming the returned `Result` directly. + +The `?` operator itself is suggestive, syntactically lightweight enough to not +be bothersome, and lets the reader determine at a glance where an exception may +or may not be thrown. It also means that if the signature of a function changes +with respect to exceptions, it will lead to type errors rather than silent +behavior changes, which is a good thing. 
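+
+As a small, self-contained sketch (the `read_config` function and its use of
+`std::io` are illustrative only, not part of the proposal), explicit
+propagation with `?` looks like this in practice:
+
+    use std::fs::File;
+    use std::io::{self, Read};
+
+    // Each `?` marks a point where an `Err` is propagated to the caller; on
+    // the `Ok` path the unwrapped value is used directly, so calls chain.
+    fn read_config(path: &str) -> Result<String, io::Error> {
+        let mut contents = String::new();
+        File::open(path)?.read_to_string(&mut contents)?;
+        Ok(contents)
+    }
+
+    fn main() {
+        match read_config("config.toml") {
+            Ok(text) => println!("read {} bytes", text.len()),
+            Err(e) => println!("could not read config: {}", e),
+        }
+    }
+
+The signature still records, via its `Result` return type, that the function
+may "throw" an `io::Error`; the `?` only removes the propagation boilerplate.
+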
Finally, because exceptions are tracked +in the type system, and there is no silent propagation of exceptions, and all +points where an exception may be thrown are readily apparent visually, this also +means that we do not have to worry very much about "exception safety". + +### Exception type upcasting + +In a language with checked exceptions and subtyping, it is clear that if a +function is declared as throwing a particular type, its body should also be able +to throw any of its subtypes. Similarly, in a language with structural sum types +(a.k.a. anonymous `enum`s, polymorphic variants), one should be able to throw a +type with fewer cases in a function declaring that it may throw a superset of +those cases. This is essentially what is achieved by the common Rust practice of +declaring a custom error `enum` with `From` `impl`s for each of the upstream +error types which may be propagated: + + enum MyError { + IoError(io::Error), + JsonError(json::Error), + OtherError(...) + } + + impl From for MyError { ... } + impl From for MyError { ... } + +Here `io::Error` and `json::Error` can be thought of as subtypes of `MyError`, +with a clear and direct embedding into the supertype. + +The `?` operator should therefore perform such an implicit conversion, in the +nature of a subtype-to-supertype coercion. The present RFC uses the +`std::convert::Into` trait for this purpose (which has a blanket `impl` +forwarding from `From`). The precise requirements for a conversion to be "like" +a subtyping coercion are an open question; see the "Unresolved questions" +section. + +## `catch` expressions + +This RFC also introduces an expression form `catch {..}`, which serves +to "scope" the `?` operator. The `catch` operator executes its +associated block. If no exception is thrown, then the result is +`Ok(v)` where `v` is the value of the block. Otherwise, if an +exception is thrown, then the result is `Err(e)`. Note that unlike +other languages, a `catch` block always catches all errors, and they +must all be coercable to a single type, as a `Result` only has a +single `Err` type. This dramatically simplifies thinking about the +behavior of exception-handling code. + +Note that `catch { foo()? }` is essentially equivalent to `foo()`. +`catch` can be useful if you want to coalesce *multiple* potential +exceptions -- `catch { foo()?.bar()?.baz()? }` -- into a single +`Result`, which you wish to then e.g. pass on as-is to another +function, rather than analyze yourself. (The last example could also +be expressed using a series of `and_then` calls.) + +# Detailed design + +The meaning of the constructs will be specified by a source-to-source +translation. We make use of an "early exit from any block" feature +which doesn't currently exist in the language, generalizes the current +`break` and `return` constructs, and is independently useful. + +## Early exit from any block + +The capability can be exposed either by generalizing `break` to take an optional +value argument and break out of any block (not just loops), or by generalizing +`return` to take an optional lifetime argument and return from any block, not +just the outermost block of the function. This feature is only used in this RFC +as an explanatory device, and implementing the RFC does not require exposing it, +so I am going to arbitrarily choose the `break` syntax for the following and +won't discuss the question further. + +So we are extending `break` with an optional value argument: `break 'a EXPR`. 
+This is an expression of type `!` which causes an early return from the +enclosing block specified by `'a`, which then evaluates to the value `EXPR` (of +course, the type of `EXPR` must unify with the type of the last expression in +that block). This works for any block, not only loops. + +A completely artificial example: + + 'a: { + let my_thing = if have_thing() { + get_thing() + } else { + break 'a None + }; + println!("found thing: {}", my_thing); + Some(my_thing) + } + +Here if we don't have a thing, we escape from the block early with `None`. + +If no value is specified, it defaults to `()`: in other words, the current +behavior. We can also imagine there is a magical lifetime `'fn` which refers to +the lifetime of the whole function: in this case, `break 'fn` is equivalent to +`return`. + +Again, this RFC does not propose generalizing `break` in this way at this time: +it is only used as a way to explain the meaning of the constructs it does +propose. + + +## Definition of constructs + +Finally we have the definition of the new constructs in terms of a +source-to-source translation. + +In each case except the first, I will provide two definitions: a single-step +"shallow" desugaring which is defined in terms of the previously defined new +constructs, and a "deep" one which is "fully expanded". + +Of course, these could be defined in many equivalent ways: the below definitions +are merely one way. + + * Construct: + + EXPR? + + Shallow: + + match EXPR { + Ok(a) => a, + Err(e) => break 'here Err(e.into()) + } + + Where `'here` refers to the innermost enclosing `catch` block, or to `'fn` if + there is none. + + The `?` operator has the same precedence as `.`. + + * Construct: + + catch { + foo()?.bar() + } + + Shallow: + + 'here: { + Ok(foo()?.bar()) + } + + Deep: + + 'here: { + Ok(match foo() { + Ok(a) => a, + Err(e) => break 'here Err(e.into()) + }.bar()) + } + +The fully expanded translations get quite gnarly, but that is why it's good that +you don't have to write them! + +In general, the types of the defined constructs should be the same as the types +of their definitions. + +(As noted earlier, while the behavior of the constructs can be *specified* using +a source-to-source translation in this manner, they need not necessarily be +*implemented* this way.) + +As a result of this RFC, both `Into` and `Result` would have to become lang +items. + + +## Laws + +Without any attempt at completeness, here are some things which should be true: + + * `catch { foo() } ` = `Ok(foo())` + * `catch { Err(e)? } ` = `Err(e.into())` + * `catch { try_foo()? } ` = `try_foo().map_err(Into::into)` + +(In the above, `foo()` is a function returning any type, and `try_foo()` is a +function returning a `Result`.) + +## Feature gates + +The two major features here, the `?` syntax and `catch` expressions, +will be tracked by independent feature gates. Each of the features has +a distinct motivation, and we should evaluate them independently. + +# Unresolved questions + +These questions should be satisfactorally resolved before stabilizing the +relevant features, at the latest. + +## Optional `match` sugar + +Originally, the RFC included the ability to `match` the errors caught +by a `catch` by writing `catch { .. } match { .. 
}`, which could be translated +as follows: + + * Construct: + + catch { + foo()?.bar() + } match { + A(a) => baz(a), + B(b) => quux(b) + } + + Shallow: + + match (catch { + foo()?.bar() + }) { + Ok(a) => a, + Err(e) => match e { + A(a) => baz(a), + B(b) => quux(b) + } + } + + Deep: + + match ('here: { + Ok(match foo() { + Ok(a) => a, + Err(e) => break 'here Err(e.into()) + }.bar()) + }) { + Ok(a) => a, + Err(e) => match e { + A(a) => baz(a), + B(b) => quux(b) + } + } + +However, it was removed for the following reasons: + +- The `catch` (originally: `try`) keyword adds the real expressive "step up" here, the `match` (originally: `catch`) was just sugar for `unwrap_or`. +- It would be easy to add further sugar in the future, once we see how `catch` is used (or not used) in practice. +- There was some concern about potential user confusion about two aspects: + - `catch { }` yields a `Result` but `catch { } match { }` yields just `T`; + - `catch { } match { }` handles all kinds of errors, unlike `try/catch` in other languages which let you pick and choose. + +It may be worth adding such a sugar in the future, or perhaps a +variant that binds irrefutably and does not immediately lead into a +`match` block. + +## Choice of keywords + +The RFC to this point uses the keyword `catch`, but there are a number +of other possibilities, each with different advantages and drawbacks: + + * `try { ... } catch { ... }` + + * `try { ... } match { ... }` + + * `try { ... } handle { ... }` + + * `catch { ... } match { ... }` + + * `catch { ... } handle { ... }` + + * `catch ...` (without braces or a second clause) + +Among the considerations: + + * Simplicity. Brevity. + + * Following precedent from existing, popular languages, and familiarity with + respect to their analogous constructs. + + * Fidelity to the constructs' actual behavior. For instance, the first clause + always catches the "exception"; the second only branches on it. + + * Consistency with the existing `try!()` macro. If the first clause is called + `try`, then `try { }` and `try!()` would have essentially inverse meanings. + + * Language-level backwards compatibility when adding new keywords. I'm not sure + how this could or should be handled. + +## Semantics for "upcasting" + +What should the contract for a `From`/`Into` `impl` be? Are these even the right +`trait`s to use for this feature? + +Two obvious, minimal requirements are: + + * It should be pure: no side effects, and no observation of side effects. (The + result should depend *only* on the argument.) + + * It should be total: no panics or other divergence, except perhaps in the case + of resource exhaustion (OOM, stack overflow). + +The other requirements for an implicit conversion to be well-behaved in the +context of this feature should be thought through with care. + +Some further thoughts and possibilities on this matter, only as brainstorming: + + * It should be "like a coercion from subtype to supertype", as described + earlier. The precise meaning of this is not obvious. + + * A common condition on subtyping coercions is coherence: if you can + compound-coerce to go from `A` to `Z` indirectly along multiple different + paths, they should all have the same end result. + + * It should be lossless, or in other words, injective: it should map each + observably-different element of the input type to observably-different + elements of the output type. 
(Observably-different means that it is possible + to write a program which behaves differently depending on which one it gets, + modulo things that "shouldn't count" like observing execution time or + resource usage.) + + * It should be unambiguous, or preserve the meaning of the input: + `impl From for u32` as `x as u32` feels right; as `(x as u32) * 12345` + feels wrong, even though this is perfectly pure, total, and injective. What + this means precisely in the general case is unclear. + + * The types converted between should the "same kind of thing": for instance, + the *existing* `impl From for Ipv4Addr` feels suspect on this count. + (This perhaps ties into the subtyping angle: `Ipv4Addr` is clearly not a + supertype of `u32`.) + +## Forwards-compatibility + +If we later want to generalize this feature to other types such as `Option`, as +described below, will we be able to do so while maintaining backwards-compatibility? + +## Monadic do notation + +There have been many comparisons drawn between this syntax and monadic +do notation. Before stabilizing, we should determine whether we plan +to make changes to better align this feature with a possible `do` +notation (for example, by removing the implicit `Ok` at the end of a +`catch` block). Note that such a notation would have to extend the +standard monadic bind to accommodate rich control flow like `break`, +`continue`, and `return`. + +# Drawbacks + + * Increases the syntactic surface area of the language. + + * No expressivity is added, only convenience. Some object to "there's more than + one way to do it" on principle. + + * If at some future point we were to add higher-kinded types and syntactic + sugar for monads, a la Haskell's `do` or Scala's `for`, their functionality + may overlap and result in redundancy. However, a number of challenges would + have to be overcome for a generic monadic sugar to be able to fully supplant + these features: the integration of higher-kinded types into Rust's type + system in the first place, the shape of a `Monad` `trait` in a language with + lifetimes and move semantics, interaction between the monadic control flow + and Rust's native control flow (the "ambient monad"), automatic upcasting of + exception types via `Into` (the exception (`Either`, `Result`) monad normally + does not do this, and it's not clear whether it can), and potentially others. + + +# Alternatives + + * Don't. + + * Only add the `?` operator, but not `catch` expressions. + + * Instead of a built-in `catch` construct, attempt to define one using + macros. However, this is likely to be awkward because, at least, macros may + only have their contents as a single block, rather than two. Furthermore, + macros are excellent as a "safety net" for features which we forget to add + to the language itself, or which only have specialized use cases; but + generally useful control flow constructs still work better as language + features. + + * Add [first-class checked exceptions][notes], which are propagated + automatically (without an `?` operator). + + This has the drawbacks of being a more invasive change and duplicating + functionality: each function must choose whether to use checked exceptions + via `throws`, or to return a `Result`. While the two are isomorphic and + converting between them is easy, with this proposal, the issue does not even + arise, as exception handling is defined *in terms of* `Result`. 
Furthermore, + automatic exception propagation raises the specter of "exception safety": how + serious an issue this would actually be in practice, I don't know - there's + reason to believe that it would be much less of one than in C++. + + * Wait (and hope) for HKTs and generic monad sugar. + +[notes]: https://github.com/glaebhoerl/rust-notes/blob/268266e8fbbbfd91098d3bea784098e918b42322/my_rfcs/Exceptions.txt + + +# Future possibilities + +## Expose a generalized form of `break` or `return` as described + +This RFC doesn't propose doing so at this time, but as it would be an independently useful feature, it could be added as well. + +## `throw` and `throws` + +It is possible to carry the exception handling analogy further and also add +`throw` and `throws` constructs. + +`throw` is very simple: `throw EXPR` is essentially the same thing as +`Err(EXPR)?`; in other words it throws the exception `EXPR` to the innermost +`catch` block, or to the function's caller if there is none. + +A `throws` clause on a function: + + fn foo(arg: Foo) -> Bar throws Baz { ... } + +would mean that instead of writing `return Ok(foo)` and `return Err(bar)` in the +body of the function, one would write `return foo` and `throw bar`, and these +are implicitly turned into `Ok` or `Err` for the caller. This removes syntactic +overhead from both "normal" and "throwing" code paths and (apart from `?` to +propagate exceptions) matches what code might look like in a language with +native exceptions. + +## Generalize over `Result`, `Option`, and other result-carrying types + +`Option` is completely equivalent to `Result` modulo names, and many +common APIs use the `Option` type, so it would be useful to extend all of the +above syntax to `Option`, and other (potentially user-defined) +equivalent-to-`Result` types, as well. + +This can be done by specifying a trait for types which can be used to "carry" +either a normal result or an exception. There are several different, equivalent +ways to formulate it, which differ in the set of methods provided, but the +meaning in any case is essentially just that you can choose some types `Normal` +and `Exception` such that `Self` is isomorphic to `Result`. + +Here is one way: + + #[lang(result_carrier)] + trait ResultCarrier { + type Normal; + type Exception; + fn embed_normal(from: Normal) -> Self; + fn embed_exception(from: Exception) -> Self; + fn translate>(from: Self) -> Other; + } + +For greater clarity on how these methods work, see the section on `impl`s below. +(For a simpler formulation of the trait using `Result` directly, see further +below.) + +The `translate` method says that it should be possible to translate to any +*other* `ResultCarrier` type which has the same `Normal` and `Exception` types. +This may not appear to be very useful, but in fact, this is what can be used to +inspect the result, by translating it to a concrete, known type such as +`Result` and then, for example, pattern matching on it. + +Laws: + + 1. For all `x`, `translate(embed_normal(x): A): B ` = `embed_normal(x): B`. + 2. For all `x`, `translate(embed_exception(x): A): B ` = `embed_exception(x): B`. + 3. For all `carrier`, `translate(translate(carrier: A): B): A` = `carrier: A`. + +Here I've used explicit type ascription syntax to make it clear that e.g. the +types of `embed_` on the left and right hand sides are different. 
+ +The first two laws say that embedding a result `x` into one result-carrying type +and then translating it to a second result-carrying type should be the same as +embedding it into the second type directly. + +The third law says that translating to a different result-carrying type and then +translating back should be a no-op. + + +## `impl`s of the trait + + impl ResultCarrier for Result { + type Normal = T; + type Exception = E; + fn embed_normal(a: T) -> Result { Ok(a) } + fn embed_exception(e: E) -> Result { Err(e) } + fn translate>(result: Result) -> Other { + match result { + Ok(a) => Other::embed_normal(a), + Err(e) => Other::embed_exception(e) + } + } + } + +As we can see, `translate` can be implemented by deconstructing ourself and then +re-embedding the contained value into the other result-carrying type. + + impl ResultCarrier for Option { + type Normal = T; + type Exception = (); + fn embed_normal(a: T) -> Option { Some(a) } + fn embed_exception(e: ()) -> Option { None } + fn translate>(option: Option) -> Other { + match option { + Some(a) => Other::embed_normal(a), + None => Other::embed_exception(()) + } + } + } + +Potentially also: + + impl ResultCarrier for bool { + type Normal = (); + type Exception = (); + fn embed_normal(a: ()) -> bool { true } + fn embed_exception(e: ()) -> bool { false } + fn translate>(b: bool) -> Other { + match b { + true => Other::embed_normal(()), + false => Other::embed_exception(()) + } + } + } + +The laws should be sufficient to rule out any "icky" impls. For example, an impl +for `Vec` where an exception is represented as the empty vector, and a normal +result as a single-element vector: here the third law fails, because if the +`Vec` has more than one element *to begin with*, then it's not possible to +translate to a different result-carrying type and then back without losing +information. + +The `bool` impl may be surprising, or not useful, but it *is* well-behaved: +`bool` is, after all, isomorphic to `Result<(), ()>`. + +### Other miscellaneous notes about `ResultCarrier` + + * Our current lint for unused results could be replaced by one which warns for + any unused result of a type which implements `ResultCarrier`. + + * If there is ever ambiguity due to the result-carrying type being + underdetermined (experience should reveal whether this is a problem in + practice), we could resolve it by defaulting to `Result`. + + * Translating between different result-carrying types with the same `Normal` + and `Exception` types *should*, but may not necessarily *currently* be, a + machine-level no-op most of the time. + + We could/should make it so that: + + * repr(`Option`) = repr(`Result`) + * repr(`bool`) = repr(`Option<()>`) = repr(`Result<(), ()>`) + + If these hold, then `translate` between these types could in theory be + compiled down to just a `transmute`. (Whether LLVM is smart enough to do + this, I don't know.) + + * The `translate()` function smells to me like a natural transformation between + functors, but I'm not category theorist enough for it to be obvious. + + +### Alternative formulations of the `ResultCarrier` trait + +All of these have the form: + + trait ResultCarrier { + type Normal; + type Exception; + ...methods... + } + +and differ only in the methods, which will be given. + +#### Explicit isomorphism with `Result` + + fn from_result(Result) -> Self; + fn to_result(Self) -> Result; + +This is, of course, the simplest possible formulation. 
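+
+For concreteness, here is a minimal sketch of this formulation together with an
+`Option` impl; the spelled-out signatures and the round-trip check are
+assumptions added for illustration, not text from the proposal:
+
+    trait ResultCarrier {
+        type Normal;
+        type Exception;
+        fn from_result(r: Result<Self::Normal, Self::Exception>) -> Self;
+        fn to_result(self) -> Result<Self::Normal, Self::Exception>;
+    }
+
+    impl<T> ResultCarrier for Option<T> {
+        type Normal = T;
+        type Exception = ();
+        fn from_result(r: Result<T, ()>) -> Option<T> {
+            match r {
+                Ok(v) => Some(v),
+                Err(()) => None,
+            }
+        }
+        fn to_result(self) -> Result<T, ()> {
+            match self {
+                Some(v) => Ok(v),
+                None => Err(()),
+            }
+        }
+    }
+
+    fn main() {
+        // Round-tripping through `Result` is the identity, as the laws below
+        // require.
+        let carrier: Option<u32> = ResultCarrier::from_result(Ok(7));
+        assert_eq!(carrier, Some(7));
+        assert_eq!(carrier.to_result(), Ok(7));
+    }
+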
+ +The drawbacks are that it, in some sense, privileges `Result` over other +potentially equivalent types, and that it may be less efficient for those types: +for any non-`Result` type, every operation requires two method calls (one into +`Result`, and one out), whereas with the `ResultCarrier` trait in the main text, +they only require one. + +Laws: + + * For all `x`, `from_result(to_result(x))` = `x`. + * For all `x`, `to_result(from_result(x))` = `x`. + +Laws for the remaining formulations below are left as an exercise for the +reader. + +#### Avoid privileging `Result`, most naive version + + fn embed_normal(Normal) -> Self; + fn embed_exception(Exception) -> Self; + fn is_normal(&Self) -> bool; + fn is_exception(&Self) -> bool; + fn assert_normal(Self) -> Normal; + fn assert_exception(Self) -> Exception; + +Of course this is horrible. + +#### Destructuring with HOFs (a.k.a. Church/Scott-encoding) + + fn embed_normal(Normal) -> Self; + fn embed_exception(Exception) -> Self; + fn match_carrier(Self, FnOnce(Normal) -> T, FnOnce(Exception) -> T) -> T; + +This is probably the right approach for Haskell, but not for Rust. + +With this formulation, because they each take ownership of them, the two +closures may not even close over the same variables! + +#### Destructuring with HOFs, round 2 + + trait BiOnceFn { + type ArgA; + type ArgB; + type Ret; + fn callA(Self, ArgA) -> Ret; + fn callB(Self, ArgB) -> Ret; + } + + trait ResultCarrier { + type Normal; + type Exception; + fn normal(Normal) -> Self; + fn exception(Exception) -> Self; + fn match_carrier(Self, BiOnceFn) -> T; + } + +Here we solve the environment-sharing problem from above: instead of two objects +with a single method each, we use a single object with two methods! I believe +this is the most flexible and general formulation (which is however a strange +thing to believe when they are all equivalent to each other). Of course, it's +even more awkward syntactically. diff --git a/text/0246-const-vs-static.md b/text/0246-const-vs-static.md new file mode 100644 index 00000000000..9daf3df619b --- /dev/null +++ b/text/0246-const-vs-static.md @@ -0,0 +1,237 @@ +- Start Date: 2014-08-08 +- RFC PR: [rust-lang/rfcs#246](https://github.com/rust-lang/rfcs/pull/246) +- Rust Issue: [rust-lang/rust#17718](https://github.com/rust-lang/rust/issues/17718) + +# Summary + +Divide global declarations into two categories: + +- **constants** declare *constant values*. These represent a value, + not a memory address. This is the most common thing one would reach + for and would replace `static` as we know it today in almost all + cases. +- **statics** declare *global variables*. These represent a memory + address. They would be rarely used: the primary use cases are + global locks, global atomic counters, and interfacing with legacy C + libraries. + +# Motivation + +We have been wrestling with the best way to represent globals for some +times. There are a number of interrelated issues: + +- *Significant addresses and inlining:* For optimization purposes, it + is useful to be able to inline constant values directly into the + program. It is even more useful if those constant values do not have + known addresses, because that means the compiler is free to replicate + them as it wishes. Moreover, if a constant is inlined into downstream + crates, then they must be recompiled whenever that constant changes. +- *Read-only memory:* Whenever possible, we'd like to place large + constants into read-only memory. 
But this means that the data must + be truly immutable, or else a segfault will result. +- *Global atomic counters and the like:* We'd like to make it possible + for people to create global locks or atomic counters that can be + used without resorting to unsafe code. +- *Interfacing with C code:* Some C libraries require the use of + global, mutable data. Other times it's just convenient and threading + is not a concern. +- *Initializer constants:* There must be a way to have initializer + constants for things like locks and atomic counters, so that people + can write `static MY_COUNTER: AtomicUint = INIT_ZERO` or some + such. It should not be possible to modify these initializer + constants. + +The current design is that we have only one keyword, `static`, which +declares a global variable. By default, global variables do not have +significant addresses and can be inlined into the program. You can make +a global variable have a *significant* address by marking it +`#[inline(never)]`. Furthermore, you can declare a mutable global +using `static mut`: all accesses to `static mut` variables are +considered unsafe. Because we wish to allow `static` values to be +placed in read-only memory, they are forbidden from having a type that +includes interior mutable data (that is, an appearance of `UnsafeCell` +type). + +Some concrete problems with this design are: + +- There is no way to have a safe global counter or lock. Those must be + placed in `static mut` variables, which means that access to them is + illegal. To resolve this, there is an alternative proposal, according + to which, access to `static mut` is considered safe if the type of the + static mut meets the `Sync` trait. +- The significance (no pun intended) of the `#[inline(never)]` annotation + is not intuitive. +- There is no way to have a generic type constant. + +Other less practical and more aesthetic concerns are: + +- Although `static` and `let` look and feel analogous, the two behave + quite differently. Generally speaking, `static` declarations do not + declare variables but rather values, which can be inlined and which + do not have fixed addresses. You cannot have interior mutability in + a `static` variable, but you can in a `let`. So that `static` + variables can appear in patterns, it is illegal to shadow a `static` + variable -- but `let` variables cannot appear in patterns. Etc. +- There are other constructs in the language, such as nullary enum + variants and nullary structs, which look like global data but in + fact act quite differently. They are actual values which do not have + addresses. They are categorized as rvalues and so forth. + +# Detailed design + +## Constants + +Reintroduce a `const` declaration which declares a *constant*: + + const name: type = value; + +Constants may be declared in any scope. They cannot be shadowed. +Constants are considered rvalues. Therefore, taking the address of a +constant actually creates a spot on the local stack -- they by +definition have no significant addresses. Constants are intended to +behave exactly like nullary enum variants. + +### Possible extension: Generic constants + +As a possible extension, it is perfectly reasonable for constants to +have generic parameters. For example, the following constant is legal: + + struct WrappedOption { value: Option } + const NONE = WrappedOption { value: None } + +Note that this makes no sense for a `static` variable, which represents +a memory location and hence must have a concrete type. 
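+
+To make the value-versus-location contrast concrete, here is a sketch of the
+motivating global-counter case, written with today's standard atomic types
+rather than the `INIT_ZERO`-style initializer constants discussed above (that
+substitution is an assumption for illustration):
+
+    use std::sync::atomic::{AtomicUsize, Ordering};
+
+    // A `const` names a value: it has no significant address and may be
+    // inlined wherever it is used, including in the static's initializer.
+    const INITIAL: usize = 0;
+
+    // A `static` names a single memory location. Because `AtomicUsize` is
+    // `Sync`, reading and updating it needs no `static mut` and no `unsafe`,
+    // which is exactly the outcome this design is after.
+    static COUNTER: AtomicUsize = AtomicUsize::new(INITIAL);
+
+    fn next_id() -> usize {
+        COUNTER.fetch_add(1, Ordering::Relaxed)
+    }
+
+    fn main() {
+        assert_eq!(next_id(), 0);
+        assert_eq!(next_id(), 1);
+    }
+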
+ +### Possible extension: constant functions + +It is possible to imagine constant functions as well. This could help +to address the problem of encapsulating initialization. To avoid the +need to specify what kinds of code can execute in a constant function, +we can limit them syntactically to a single constant expression that +can be expanded at compilation time (no recursion). + + struct LockedData { lock: Lock, value: T } + + const LOCKED(t: T) -> LockedData { + LockedData { lock: INIT_LOCK, value: t } + } + +This would allow us to make the `value` field on `UnsafeCell` private, +among other things. + +## Static variables + +Repurpose the `static` declaration to declare static variables +only. Static variables always have single addresses. `static` +variables can optionally be declared as `mut`. The lifetime of a +`static` variable is `'static`. It is not legal to move from a static. +Accesses to a static variable generate actual reads and writes: the +value is not inlined (but see "Unresolved Questions" below). + +Non-`mut` statics must have a type that meets the `Sync` bound. All +access to the static is considered safe (that is, reading the variable +and taking its address). If the type of the static does not contain +an `UnsafeCell` in its interior, the compiler may place it in +read-only memory, but otherwise it must be placed in mutable memory. + +`mut` statics may have any type. All access is considered unsafe. +They may not be placed in read-only memory. + +## Globals referencing Globals + +### const => const + +It is possible to create a `const` or a `static` which references another +`const` or another `static` by its address. For example: + + struct SomeStruct { x: uint } + const FOO: SomeStruct = SomeStruct { x: 1 }; + const BAR: &'static SomeStruct = &FOO; + +Constants are generally inlined into the stack frame from which they are +referenced, but in a static context there is no stack frame. Instead, the +compiler will reinterpret this as if it were written as: + + struct SomeStruct { x: uint } + const FOO: SomeStruct = SomeStruct { x: 1 }; + const BAR: &'static SomeStruct = { + static TMP: SomeStruct = FOO; + &TMP + }; + +Here a `static` is introduced to be able to give the `const` a pointer which +does indeed have the `'static` lifetime. Due to this rewriting, the compiler +will disallow `SomeStruct` from containing an `UnsafeCell` (interior +mutability). In general, a constant A cannot reference the address of another +constant B if B contains an `UnsafeCell` in its interior. + +### const => static + +It is illegal for a constant to refer to another static. A constant represents a +*constant* value while a static represents a memory location, and this sort of +reference is difficult to reconcile in light of their definitions. + +### static => const + +If a `static` references the address of a `const`, then a similar rewriting +happens, but there is no interior mutability restriction (only a `Sync` +restriction). + +### static => static + +It is illegal for a `static` to reference another `static` by value. It is +required that all references be borrowed. Additionally, not all kinds of borrows +are allowed, only explicitly taking the address of another static is allowed. +For example, interior borrows of fields and elements or accessing elements of an +array are both disallowed. + +If a by-value reference were allowed, then this sort of reference would require +that the static being referenced fall into one of two categories: + +1. It's an initializer pattern. 
This is the purpose of `const`, however. +2. The values are kept in sync. This is currently technically infeasible. + +Instead of falling into one of these two categories, the compiler will instead +disallow any references to statics by value (from other statics). + +## Patterns + +Today, a `static` is allowed to be used in pattern matching. With the +introduction of `const`, however, a `static` will be forbidden from appearing +in a pattern match, and instead only a `const` can appear. + +# Drawbacks + +This RFC introduces two keywords for global data. Global data is kind +of an edge feature so this feels like overkill. (On the other hand, +the only keyword that most Rust programmers should need to know is +`const` -- I imagine `static` variables will be used quite rarely.) + +# Alternatives + +The other design under consideration is to keep the current split but +make access to `static mut` be considered safe if the type of the +static mut is `Sync`. For the details of this discussion, please see +[RFC 177](https://github.com/rust-lang/rfcs/pull/177). + +One serious concern is with regard to timing. Adding more things to +the Rust 1.0 schedule is inadvisable. Therefore, it would be possible +to take a hybrid approach: keep the current `static` rules, or perhaps +the variation where access to `static mut` is safe, for the time +being, and create `const` declarations after Rust 1.0 is released. + +# Unresolved questions + +- Should the compiler be allowed to inline the values of `static` + variables which are deeply immutable (and thus force recompilation)? + +- Should we permit `static` variables whose type is not `Sync`, but + simply make access to them unsafe? + +- Should we permit `static` variables whose type is not `Sync`, but whose + initializer value does not actually contain interior mutability? For example, + a `static` of `Option>` with the initializer of `None` is in + theory safe. + +- How hard are the envisioned extensions to implement? If easy, they + would be nice to have. If hard, they can wait. diff --git a/text/0255-object-safety.md b/text/0255-object-safety.md new file mode 100644 index 00000000000..bb5284fa331 --- /dev/null +++ b/text/0255-object-safety.md @@ -0,0 +1,216 @@ +- Start Date: 2014-09-22 +- RFC PR: [rust-lang/rfcs#255](https://github.com/rust-lang/rfcs/pull/255) +- Rust Issue: [rust-lang/rust#17670](https://github.com/rust-lang/rust/issues/17670) + +# Summary + +Restrict which traits can be used to make trait objects. + +Currently, we allow any traits to be used for trait objects, but restrict the +methods which can be called on such objects. Here, we propose instead +restricting which traits can be used to make objects. Despite being less +flexible, this will make for better error messages, less surprising software +evolution, and (hopefully) better design. The motivation for the proposed change +is stronger due to part of the DST changes. + +# Motivation + +Part of the planned, in progress DST work is to allow trait objects where a +trait is expected. Example: + +```rust +fn foo(y: &T) { ... } + +fn bar(x: &SomeTrait) { + foo(x) +} +``` + +Previous to DST the call to `foo` was not expected to work because `SomeTrait` +was not a type, so it could not instantiate `T`. With DST this is possible, and +it makes intuitive sense for this to work (an alternative is to require `impl +SomeTrait for SomeTrait { ... }`, but that seems weird and confusing and rather +like boilerplate. Note that the precise mechanism here is out of scope for this +RFC). 
+ +This is only sound if the trait is /object-safe/. We say a method `m` on trait +`T` is object-safe if it is legal (in current Rust) to call `x.m(...)` where `x` +has type `&T`, i.e., `x` is a trait object. If all methods in `T` are object- +safe, then we say `T` is object-safe. + +If we ignore this restriction we could allow code such as the following: + +```rust +trait SomeTrait { + fn foo(&self, other: &Self) { ... } // assume self and other have the same concrete type +} + +fn bar(x: &T, y: &T) { + x.foo(y); // x and y may have different concrete types, pre-DST we could + // assume that x and y had the same concrete types. +} + +fn baz(x: &SomeTrait, y: &SomeTrait) { + bar(x, y) // x and y may have different concrete types +} +``` + +This RFC proposes enforcing object-safety when trait objects are created, rather +than where methods on a trait object are called or where we attempt to match +traits. This makes both method call and using trait objects with generic code +simpler. The downside is that it makes Rust less flexible, since not all traits +can be used to create trait objects. + +Software evolution is improved with this proposal: imagine adding a non-object- +safe method to a previously object-safe trait. With this proposal, you would +then get errors wherever a trait-object is created. The error would explain why +the trait object could not be created and point out exactly which method was to +blame and why. Without this proposal, the only errors you would get would be +where a trait object is used with a generic call and would be something like +"type error: SomeTrait does not implement SomeTrait" - no indication that the +non-object-safe method were to blame, only a failure in trait matching. + +Another advantage of this proposal is that it implies that all +method-calls can always be rewritten into an equivalent [UFCS] +call. This simplifies the "core language" and makes method dispatch +notation -- which involves some non-trivial inference -- into a kind +of "sugar" for the more explicit UFCS notation. + +# Detailed design + +To be precise about object-safety, an object-safe method must meet one +of the following conditions: + +* require `Self : Sized`; or, +* meet all of the following conditions: + * must not have any type parameters; and, + * must have a receiver that has type `Self` or which dereferences to the `Self` type; + - for now, this means `self`, `&self`, `&mut self`, or `self: Box`, + but eventually this should be extended to custom types like + `self: Rc` and so forth. + * must not use `Self` (in the future, where we allow arbitrary types + for the receiver, `Self` may only be used for the type of the + receiver and only where we allow `Sized?` types). + +A trait is object-safe if all of the following conditions hold: + +* all of its methods are object-safe; and, +* the trait does not require that `Self : Sized` (see also [RFC 546]). + +When an expression with pointer-to-concrete type is coerced to a trait object, +the compiler will check that the trait is object-safe (in addition to the usual +check that the concrete type implements the trait). It is an error for the trait +to be non-object-safe. + +Note that a trait can be object-safe even if some of its methods use +features that are not supported with an object receiver. This is true +when code that attempted to use those features would only work if the +`Self` type is `Sized`. This is why all methods that require +`Self:Sized` are exempt from the typical rules. 
This is also why +by-value self methods are permitted, since currently one cannot invoke +pass an unsized type by-value (though we consider that a useful future +extension). + +# Drawbacks + +This is a breaking change and forbids some safe code which is legal +today. This can be addressed in two ways: splitting traits, or adding +`where Self:Sized` clauses to methods that cannot not be used with +objects. + +### Example problem + +Here is an example trait that is not object safe: + +```rust +trait SomeTrait { + fn foo(&self) -> int { ... } + + // Object-safe methods may not return `Self`: + fn new() -> Self; +} +``` + +### Splitting a trait + +One option is to split a trait into object-safe and non-object-safe +parts. We hope that this will lead to better design. We are not sure +how much code this will affect, it would be good to have data about +this. + +```rust +trait SomeTrait { + fn foo(&self) -> int { ... } +} + +trait SomeTraitCtor : SomeTrait { + fn new() -> Self; +} +``` + +### Adding a where-clause + +Sometimes adding a second trait feels like overkill. In that case, it +is often an option to simply add a `where Self:Sized` clause to the +methods of the trait that would otherwise violate the object safety +rule. + +```rust +trait SomeTrait { + fn foo(&self) -> int { ... } + + fn new() -> Self + where Self : Sized; // this condition is new +} +``` + +The reason that this makes sense is that if one were writing a generic +function with a type parameter `T` that may range over the trait +object, that type parameter would have to be declared `?Sized`, and +hence would not have access to the `new` method: + +```rust +fn baz(t: &T) { + let v: T = SomeTrait::new(); // illegal because `T : Sized` is not known to hold +} +``` + +However, if one writes a function with sized type parameter, which +could never be a trait object, then the `new` function becomes +available. + +```rust +fn baz(t: &T) { + let v: T = SomeTrait::new(); // OK +} +``` + +# Alternatives + +We could continue to check methods rather than traits are +object-safe. When checking the bounds of a type parameter for a +function call where the function is called with a trait object, we +would check that all methods are object-safe as part of the check that +the actual type parameter satisfies the formal bounds. We could +probably give a different error message if the bounds are met, but the +trait is not object-safe. + +We might in the future use finer-grained reasoning to permit more +non-object-safe methods from appearing in the trait. For example, we +might permit `fn foo() -> Self` because it (implicitly) requires that +`Self` be sized. Similarly, we might permit other tests beyond just +sized-ness. Any such extension would be backwards compatible. + +# Unresolved questions + +N/A + +# Edits + +* 2014-02-09. Edited by Nicholas Matsakis to (1) include the + requirement that object-safe traits do not require `Self:Sized` and + (2) specify that methods may include `where Self:Sized` to overcome + object safety restrictions. 
+ +[UFCS]: 0132-ufcs.md +[RFC 546]: 0546-Self-not-sized-by-default.md diff --git a/text/0256-remove-refcounting-gc-of-t.md b/text/0256-remove-refcounting-gc-of-t.md new file mode 100644 index 00000000000..aff8e8309ab --- /dev/null +++ b/text/0256-remove-refcounting-gc-of-t.md @@ -0,0 +1,237 @@ +- Start Date: 2014-09-19 +- RFC PR: https://github.com/rust-lang/rfcs/pull/256 +- Rust Issue: https://github.com/rust-lang/rfcs/pull/256 + +# Summary + +Remove the reference-counting based `Gc` type from the standard +library and its associated support infrastructure from `rustc`. + +Doing so lays a cleaner foundation upon which to prototype a proper +tracing GC, and will avoid people getting incorrect impressions of +Rust based on the current reference-counting implementation. + +# Motivation + +## Ancient History + +Long ago, the Rust language had integrated support for automatically +managed memory with arbitrary graph structure (notably, multiple +references to the same object), via the type constructors `@T` and +`@mut T` for any `T`. The intention was that Rust would provide a +task-local garbage collector as part of the standard runtime for Rust +programs. + +As a short-term convenience, `@T` and `@mut T` were implemented via +reference-counting: each instance of `@T`/`@mut T` had a reference +count added to it (as well as other meta-data that were again for +implementation convenience). To support this, the `rustc` compiler +would emit, for any instruction copying or overwriting an instance of +`@T`/`@mut T`, code to update the reference count(s) accordingly. + +(At the same time, `@T` was still considered an instance of `Copy` by +the compiler. Maintaining the reference counts of `@T` means that you +*cannot* create copies of a given type implementing `Copy` by +`memcpy`'ing blindly; one must distinguish so-called "POD" data that +is `Copy and contains no `@T` from "non-POD" `Copy` data that can +contain `@T` and thus must be sure to update reference counts when +creating a copy.) + +Over time, `@T` was replaced with the library type `Gc` (and `@mut +T` was rewritten as `Gc>`), but the intention was that Rust +would still have integrated support for a garbage collection. To +continue supporting the reference-count updating semantics, the +`Gc` type has a lang item, `"gc"`. In effect, all of the compiler +support for maintaining the reference-counts from the prior `@T` was +still in place; the move to a library type `Gc` was just a shift in +perspective from the end-user's point of view (and that of the +parser). + +## Recent history: Removing uses of Gc from the compiler + +Largely due to the tireless efforts of `eddyb`, one of the primary +clients of `Gc`, namely the `rustc` compiler itself, has little to +no remaining uses of `Gc`. + +## A new hope + +This means that we have an opportunity now, to remove the `Gc` type +from `libstd`, and its associated built-in reference-counting support +from `rustc` itself. + +I want to distinguish removal of the particular reference counting +`Gc` from our compiler and standard library (which is what is being +proposed here), from removing the goal of supporting a garbage +collected `Gc` in the future. I (and I think the majority of the +Rust core team) still believe that there are use cases that would be +well handled by a proper tracing garbage collector. 
+ +The expected outcome of removing reference-counting `Gc` are as follows: + + * A cleaner compiler code base, + + * A cleaner standard library, where `Copy` data can be indeed copied + blindly (assuming the source and target types are in agreement, + which is required for a tracing GC), + + * It would become impossible for users to use `Gc` and then get + incorrect impressions about how Rust's GC would behave in the + future. In particular, if we leave the reference-counting `Gc` + in place, then users may end up depending on implementation + artifacts that we would be pressured to continue supporting in the + future. (Note that `Gc` is already marked "experimental", so + this particular motivation is not very strong.) + +# Detailed design + +Remove the `std::gc` module. This, I believe, is the extent of the +end-user visible changes proposed by this RFC, at least for users who +are using `libstd` (as opposed to implementing their own). + +Then remove the `rustc` support for `Gc`. As part of this, we can +either leave in or remove the `"gc"` and `"managed_heap"` entries in +the lang items table (in case they could be of use for a future GC +implementation). I propose leaving them, but it does not matter +terribly to me. The important thing is that once `std::gc` is gone, +then we can remove the support code associated with those two lang +items, which is the important thing. + +# Drawbacks + +Taking out the reference-counting `Gc` now may lead people to think +that Rust will never have a `Gc`. + + * In particular, having `Gc` in place now means that it is easier + to argue for putting in a tracing collector (since it would be a + net win over the status quo, assuming it works). + + (This sub-bullet is a bit of a straw man argument, as I suspect any + community resistance to adding a tracing GC will probably be + unaffected by the presence or absence of the reference-counting + `Gc`.) + + * As another related note, it may confuse people to take out a + `Gc` type now only to add another implementation with the same + name later. (Of course, is that more or less confusing than just + replacing the underlying implementation in such a severe manner.) + +Users may be using `Gc` today, and they would have to switch to +some other option (such as `Rc`, though note that the two are not +100% equivalent; see [Gc versus Rc] appendix). + +# Alternatives + +Keep the `Gc` implementation that we have today, and wait until we +have a tracing GC implemented and ready to be deployed before removing +the reference-counting infrastructure that had been put in to support +`@T`. (Which may never happen, since adding a tracing GC is only a +goal, not a certainty, and thus we may be stuck supporting the +reference-counting `Gc` until we eventually do decide to remove +`Gc` in the future. So this RFC is just suggesting we be proactive +and pull that band-aid off now. + +# Unresolved questions + +None yet. + +# Appendices + +## Gc versus Rc + +There are performance differences between the current ref-counting +`Gc` and the library type `Rc`, but such differences are beneath +the level of abstraction of interest to this RFC. The main user +observable difference between the ref-counting `Gc` and the library +type `Rc` is that cyclic structure allocated via `Gc` will be +torn down when the task itself terminates successfully or via unwind. + +The following program illustrates this difference. 
If you have a program that is using `Gc<T>` and is relying on this
tear-down behavior at task death, then switching to `Rc<T>` will not
suffice.

```rust
use std::cell::RefCell;
use std::gc::{GC,Gc};
use std::io::timer;
use std::rc::Rc;
use std::time::Duration;

struct AnnounceDrop { name: String }

#[allow(non_snake_case)]
fn AnnounceDrop<S:Str>(s: S) -> AnnounceDrop {
    AnnounceDrop { name: s.as_slice().to_string() }
}

impl Drop for AnnounceDrop {
    fn drop(&mut self) {
        println!("dropping {}", self.name);
    }
}

struct RcCyclic<D> { _on_drop: D, recur: Option<Rc<RefCell<RcCyclic<D>>>> }
struct GcCyclic<D> { _on_drop: D, recur: Option<Gc<RefCell<GcCyclic<D>>>> }

type RRRcell<D> = Rc<RefCell<RcCyclic<D>>>;
type GRRcell<D> = Gc<RefCell<GcCyclic<D>>>;

fn make_rc_and_gc<S:Str>(name: S) -> (RRRcell<AnnounceDrop>, GRRcell<AnnounceDrop>) {
    let name = name.as_slice().to_string();
    let rc_cyclic = Rc::new(RefCell::new(RcCyclic {
        _on_drop: AnnounceDrop(name.clone().append("-rc")),
        recur: None,
    }));

    let gc_cyclic = box (GC) RefCell::new(GcCyclic {
        _on_drop: AnnounceDrop(name.append("-gc")),
        recur: None,
    });

    (rc_cyclic, gc_cyclic)
}

fn make_proc(name: &str, sleep_time: i64, and_then: proc():Send) -> proc():Send {
    let name = name.to_string();
    proc() {
        let (rc_cyclic, gc_cyclic) = make_rc_and_gc(name);

        rc_cyclic.borrow_mut().recur = Some(rc_cyclic.clone());
        gc_cyclic.borrow_mut().recur = Some(gc_cyclic);

        timer::sleep(Duration::seconds(sleep_time));

        and_then();
    }
}

fn main() {
    let (_rc_noncyclic, _gc_noncyclic) = make_rc_and_gc("main-noncyclic");

    spawn(make_proc("success-cyclic", 2, proc () {}));

    spawn(make_proc("failure-cyclic", 1, proc () { fail!("Oop"); }));

    println!("Hello, world!")
}
```

The above program produces output as follows:

```
% rustc gc-vs-rc-sample.rs && ./gc-vs-rc-sample
Hello, world!
dropping main-noncyclic-gc
dropping main-noncyclic-rc
task '<unnamed>' failed at 'Oop', gc-vs-rc-sample.rs:60
dropping failure-cyclic-gc
dropping success-cyclic-gc
```

This illustrates that both `Gc<T>` and `Rc<T>` will be reclaimed when
used to represent non-cyclic data (the cases labelled
`main-noncyclic-gc` and `main-noncyclic-rc`). But when you actually
complete the cyclic structure, then in the tasks that run to
completion (either successfully or unwinding from a failure), we still
manage to drop the `Gc<T>` cyclic structures, as illustrated by the
printouts from the cases labelled `failure-cyclic-gc` and
`success-cyclic-gc`.
diff --git a/text/0320-nonzeroing-dynamic-drop.md b/text/0320-nonzeroing-dynamic-drop.md
new file mode 100644
index 00000000000..81069c29e7f
--- /dev/null
+++ b/text/0320-nonzeroing-dynamic-drop.md
@@ -0,0 +1,760 @@
- Feature Name: (none for the bulk of RFC); unsafe_no_drop_flag
- Start Date: 2014-09-24
- RFC PR: [rust-lang/rfcs#320](https://github.com/rust-lang/rfcs/pull/320)
- Rust Issue: [rust-lang/rust#5016](https://github.com/rust-lang/rust/issues/5016)

# Summary

Remove drop flags from values implementing `Drop`, and remove
automatic memory zeroing associated with dropping values.

Keep dynamic drop semantics, by having each function maintain a
(potentially empty) set of auto-injected boolean flags for the drop
obligations for the function that need to be tracked dynamically
(which we will call "dynamic drop obligations").

# Motivation

Currently, implementing `Drop` on a struct (or enum) injects a hidden
bit, known as the "drop-flag", into the struct (and likewise, each of
the enum variants).
The drop-flag, in tandem with Rust's implicit +zeroing of dropped values, tracks whether a value has already been +moved to another owner or been dropped. (See the ["How dynamic drop +semantics works"](#how-dynamic-drop-semantics-works) appendix for more +details if you are unfamiliar with this part of Rust's current +implementation.) + +However, the above implementation is sub-optimal; problems include: + + * Most important: implicit memory zeroing is a hidden cost that today + all Rust programs pay, in both execution time and code size. + With the removal of the drop flag, we can remove implicit memory + zeroing (or at least revisit its utility -- there may be other + motivations for implicit memory zeroing, e.g. to try to keep secret + data from being exposed to unsafe code). + + * Hidden bits are bad: Users coming from a C/C++ background + expect `struct Foo { x: u32, y: u32 }` to occupy 8 bytes, but if + `Foo` implements `Drop`, the hidden drop flag will cause it to + double in size (16 bytes). + See the [Program illustrating semantic impact of hidden drop flag] + appendix for a concrete illustration. Note that when `Foo` + implements `Drop`, each instance of `Foo` carries a drop-flag, even + in contexts like a `Vec` where a program + cannot actually move individual values out of the collection. + Thus, the amount of extra memory being used by drop-flags is not + bounded by program stack growth; the memory wastage is strewn + throughout the heap. + +An earlier RFC (the withdrawn [RFC PR #210]) suggested resolving this +problem by switching from a dynamic drop semantics to a "static drop +semantics", which was defined in that RFC as one that performs drop of +certain values earlier to ensure that the set of drop-obligations does +not differ at any control-flow merge point, i.e. to ensure that the +set of values to drop is statically known at compile-time. + +[RFC PR #210]: https://github.com/rust-lang/rfcs/pull/210 + +However, discussion on the [RFC PR #210] comment thread pointed out +its policy for inserting early drops into the code is non-intuitive +(in other words, that the drop policy should either be more +aggressive, a la [RFC PR #239], or should stay with the dynamic drop +status quo). Also, the mitigating mechanisms proposed by that RFC +(`NoisyDrop`/`QuietDrop`) were deemed unacceptable. + +[RFC PR #239]: https://github.com/rust-lang/rfcs/pull/239 + +So, static drop semantics are a non-starter. Luckily, the above +strategy is not the only way to implement dynamic drop semantics. +Rather than requiring that the set of drop-obligations be the same at +every control-flow merge point, we can do a intra-procedural static +analysis to identify the set of drop-obligations that differ at any +merge point, and then inject a set of stack-local boolean-valued +drop-flags that dynamically track them. That strategy is what this +RFC is describing. + +The expected outcomes are as follows: + + * We remove the drop-flags from all structs/enums that implement + `Drop`. (There are still the injected stack-local drop flags, but + those should be cheaper to inject and maintain.) + + * Since invoking drop code is now handled by the stack-local drop + flags and we have no more drop-flags on the values themselves, + we can (and will) remove memory zeroing. + + * Libraries currently relying on drop doing memory zeroing (i.e. 
libraries that check whether content is zero to decide whether its
   `fn drop` has been invoked) will need to be revised, since we will
   not have implicit memory zeroing anymore.

 * In the common case, most libraries using `Drop` will not need to
   change at all from today, apart from the caveat in the previous
   bullet.

# Detailed design

## Drop obligations

No struct or enum has an implicit drop-flag. When a local variable is
initialized, that establishes a set of "drop obligations": a set of
structural paths (e.g. a local `a`, or a path to a field `b.f.y`) that
need to be dropped (or moved away to a new owner).

The drop obligations for a local variable `x` of struct-type `T` are
computed from analyzing the structure of `T`. If `T` itself
implements `Drop`, then `x` is a drop obligation. If `T` does not
implement `Drop`, then the set of drop obligations is the union of the
drop obligations of the fields of `T`.

When a path is moved to a new location, or consumed by a function call,
or when control flow reaches the end of its owner's lexical scope,
the path is removed from the set of drop obligations.

At control-flow merge points, e.g. nodes that have predecessor nodes
P_1, P_2, ..., P_k with drop obligation sets S_1, S_2, ... S_k, we

 * First identify the set of drop obligations that differ between the
   predecessor nodes, i.e. the set:

   `(S_1 | S_2 | ... | S_k) \ (S_1 & S_2 & ... & S_k)`

   where `|` denotes set-union, `&` denotes set-intersection,
   `\` denotes set-difference. These are the dynamic drop obligations
   induced by this merge point. Note that if `S_1 = S_2 = ... = S_k`,
   the above set is empty.

 * The set of drop obligations for the merge point itself is the
   union of the drop-obligations from all predecessor points in
   the control flow, i.e. `(S_1 | S_2 | ... | S_k)` in the
   above notation.

   (One could also just use the intersection here; it actually makes
   no difference to the static analysis, since all of the elements of
   the difference

   `(S_1 | S_2 | ... | S_k) \ (S_1 & S_2 & ... & S_k)`

   have already been added to the set of dynamic drop obligations.
   But the overall code transformation is clearer if one keeps
   the dynamic drop obligations in the set of drop obligations.)

## Stack-local drop flags

For every dynamic drop obligation induced by a merge point, the compiler
is responsible for ensuring that its drop code is run at some point.
If necessary, it will inject and maintain a boolean flag analogous to

```rust
enum NeedsDropFlag { NeedsLocalDrop, DoNotDrop }
```

Some compiler analysis may be able to identify dynamic drop
obligations that do not actually need to be tracked. Therefore, we do
not specify the precise set of boolean flags that are injected.

## Example of code with dynamic drop obligations

The function `f2` below was copied from the static drop [RFC PR #210];
it has differing sets of drop obligations at a merge point,
necessitating a potential injection of a `NeedsDropFlag`.

```rust
fn f2() {

    // At the outset, the set of drop obligations is
    // just the set of moved input parameters (empty
    // in this case).
+ + // DROP OBLIGATIONS + // ------------------------ + // { } + let pDD : Pair = ...; + pDD.x = ...; + // {pDD.x} + pDD.y = ...; + // {pDD.x, pDD.y} + let pDS : Pair = ...; + // {pDD.x, pDD.y, pDS.x} + let some_d : Option; + // {pDD.x, pDD.y, pDS.x} + if test() { + // {pDD.x, pDD.y, pDS.x} + { + let temp = xform(pDD.y); + // {pDD.x, pDS.x, temp} + some_d = Some(temp); + // {pDD.x, pDS.x, temp, some_d} + } // END OF SCOPE for `temp` + // {pDD.x, pDS.x, some_d} + + // MERGE POINT PREDECESSOR 1 + + } else { + { + // {pDD.x, pDD.y, pDS.x} + let z = D; + // {pDD.x, pDD.y, pDS.x, z} + + // This drops `pDD.y` before + // moving `pDD.x` there. + pDD.y = pDD.x; + + // { pDD.y, pDS.x, z} + some_d = None; + // { pDD.y, pDS.x, z, some_d} + } // END OF SCOPE for `z` + // { pDD.y, pDS.x, some_d} + + // MERGE POINT PREDECESSOR 2 + + } + + // MERGE POINT: set of drop obligations do not + // match on all incoming control-flow paths. + // + // Predecessor 1 has drop obligations + // {pDD.x, pDS.x, some_d} + // and Predecessor 2 has drop obligations + // { pDD.y, pDS.x, some_d}. + // + // Therefore, this merge point implies that + // {pDD.x, pDD.y} are dynamic drop obligations, + // while {pDS.x, some_d} are potentially still + // resolvable statically (and thus may not need + // associated boolean flags). + + // The resulting drop obligations are the following: + + // {pDD.x, pDD.y, pDS.x, some_d}. + + // (... some code that does not change drop obligations ...) + + // {pDD.x, pDD.y, pDS.x, some_d}. + + // END OF SCOPE for `pDD`, `pDS`, `some_d` +} +``` + +After the static analysis has identified all of the dynamic drop +obligations, code is injected to maintain the stack-local drop flags +and to do any necessary drops at the appropriate points. +Below is the updated `fn f2` with an approximation of the injected code. + +Note: we say "approximation", because one does need to ensure that the +drop flags are updated in a manner that is compatible with potential +task `fail!`/`panic!`, because stack unwinding must be informed which +state needs to be dropped; i.e. you need to initialize `_pDD_dot_x` +before you start to evaluate a fallible expression to initialize +`pDD.y`. + + +```rust +fn f2_rewritten() { + + // At the outset, the set of drop obligations is + // just the set of moved input parameters (empty + // in this case). + + // DROP OBLIGATIONS + // ------------------------ + // { } + let _drop_pDD_dot_x : NeedsDropFlag; + let _drop_pDD_dot_y : NeedsDropFlag; + + _drop_pDD_dot_x = DoNotDrop; + _drop_pDD_dot_y = DoNotDrop; + + let pDD : Pair; + pDD.x = ...; + _drop_pDD_dot_x = NeedsLocalDrop; + pDD.y = ...; + _drop_pDD_dot_y = NeedsLocalDrop; + + // {pDD.x, pDD.y} + let pDS : Pair = ...; + // {pDD.x, pDD.y, pDS.x} + let some_d : Option; + // {pDD.x, pDD.y, pDS.x} + if test() { + // {pDD.x, pDD.y, pDS.x} + { + _drop_pDD_dot_y = DoNotDrop; + let temp = xform(pDD.y); + // {pDD.x, pDS.x, temp} + some_d = Some(temp); + // {pDD.x, pDS.x, temp, some_d} + } // END OF SCOPE for `temp` + // {pDD.x, pDS.x, some_d} + + // MERGE POINT PREDECESSOR 1 + + } else { + { + // {pDD.x, pDD.y, pDS.x} + let z = D; + // {pDD.x, pDD.y, pDS.x, z} + + // This drops `pDD.y` before + // moving `pDD.x` there. + _drop_pDD_dot_x = DoNotDrop; + pDD.y = pDD.x; + + // { pDD.y, pDS.x, z} + some_d = None; + // { pDD.y, pDS.x, z, some_d} + } // END OF SCOPE for `z` + // { pDD.y, pDS.x, some_d} + + // MERGE POINT PREDECESSOR 2 + + } + + // MERGE POINT: set of drop obligations do not + // match on all incoming control-flow paths. 
+ // + // Predecessor 1 has drop obligations + // {pDD.x, pDS.x, some_d} + // and Predecessor 2 has drop obligations + // { pDD.y, pDS.x, some_d}. + // + // Therefore, this merge point implies that + // {pDD.x, pDD.y} are dynamic drop obligations, + // while {pDS.x, some_d} are potentially still + // resolvable statically (and thus may not need + // associated boolean flags). + + // The resulting drop obligations are the following: + + // {pDD.x, pDD.y, pDS.x, some_d}. + + // (... some code that does not change drop obligations ...) + + // {pDD.x, pDD.y, pDS.x, some_d}. + + // END OF SCOPE for `pDD`, `pDS`, `some_d` + + // rustc-inserted code (not legal Rust, since `pDD.x` and `pDD.y` + // are inaccessible). + + if _drop_pDD_dot_x { mem::drop(pDD.x); } + if _drop_pDD_dot_y { mem::drop(pDD.y); } +} +``` + +Note that in a snippet like +```rust + _drop_pDD_dot_y = DoNotDrop; + let temp = xform(pDD.y); +``` +this is okay, in part because the evaluating the identifier `xform` is +infallible. If instead it were something like: +```rust + _drop_pDD_dot_y = DoNotDrop; + let temp = lookup_closure()(pDD.y); +``` +then that would not be correct, because we need to set +`_drop_pDD_dot_y` to `DoNotDrop` after the `lookup_closure()` +invocation. + +It may probably be more intellectually honest to write the transformation like: +```rust + let temp = lookup_closure()({ _drop_pDD_dot_y = DoNotDrop; pDD.y }); +``` + + +## Control-flow sensitivity + +Note that the dynamic drop obligations are based on a control-flow +analysis, *not* just the lexical nesting structure of the code. + +In particular: If control flow splits at a point like an if-expression, +but the two arms never meet, then they can have completely +sets of drop obligations. + +This is important, since in coding patterns like loops, one +often sees different sets of drop obligations prior to a `break` +compared to a point where the loop repeats, such as a `continue` +or the end of a `loop` block. + +```rust + // At the outset, the set of drop obligations is + // just the set of moved input parameters (empty + // in this case). + + // DROP OBLIGATIONS + // ------------------------ + // { } + let mut pDD : Pair = mk_dd(); + let mut maybe_set : D; + + // { pDD.x, pDD.y } + 'a: loop { + // MERGE POINT + + // { pDD.x, pDD.y } + if test() { + // { pDD.x, pDD.y } + consume(pDD.x); + // { pDD.y } + break 'a; + } + // *not* merge point (only one path, the else branch, flows here) + + // { pDD.x, pDD.y } + + // never falls through; must merge with 'a loop. + } + + // RESUME POINT: break 'a above flows here + + // { pDD.y } + + // This is the point immediately preceding `'b: loop`; (1.) below. + + 'b: loop { + // MERGE POINT + // + // There are *three* incoming paths: (1.) the statement + // preceding `'b: loop`, (2.) the `continue 'b;` below, and + // (3.) the end of the loop's block below. The drop + // obligation for `maybe_set` originates from (3.). + + // { pDD.y, maybe_set } + + consume(pDD.y); + + // { , maybe_set } + + if test() { + // { , maybe_set } + pDD.x = mk_d(); + // { pDD.x , maybe_set } + break 'b; + } + + // *not* merge point (only one path flows here) + + // { , maybe_set } + + if test() { + // { , maybe_set } + pDD.y = mk_d(); + + // This is (2.) referenced above. { pDD.y, maybe_set } + continue 'b; + } + // *not* merge point (only one path flows here) + + // { , maybe_set } + + pDD.y = mk_d(); + // This is (3.) referenced above. { pDD.y, maybe_set } + + maybe_set = mk_d(); + g(&maybe_set); + + // This is (3.) referenced above. 
{ pDD.y, maybe_set } + } + + // RESUME POINT: break 'b above flows here + + // { pDD.x , maybe_set } + + // when we hit the end of the scope of `maybe_set`; + // check its stack-local flag. +``` + +Likewise, a `return` statement represents another control flow jump, +to the end of the function. + +## Remove implicit memory zeroing + +With the above in place, the remainder is relatively trivial. +The compiler can be revised to no longer inject a drop flag into +structs and enums that implement `Drop`, and likewise memory zeroing can +be removed. + +Beyond that, the libraries will obviously need to be audited for +dependence on implicit memory zeroing. + +# Drawbacks + +The only reasons not do this are: + + 1. Some hypothetical reason to *continue* doing implicit memory zeroing, or + + 2. We want to abandon dynamic drop semantics. + +At this point Felix thinks the Rust community has made a strong +argument in favor of keeping dynamic drop semantics. + +# Alternatives + +* Static drop semantics [RFC PR #210] has been referenced frequently + in this document. + +* Eager drops [RFC PR #239] is the more aggressive semantics that + would drop values immediately after their final use. This would + probably invalidate a number of RAII style coding patterns. + +# Optional Extensions + +## A lint identifying dynamic drop obligations + +Add a lint (set by default to `allow`) that reports potential dynamic +drop obligations, so that end-user code can opt-in to having them +reported. The expected benefits of this are: + + 1. developers may have intended for a value to be moved elsewhere on + all paths within a function, and, + + 2. developers may want to know about how many boolean dynamic drop + flags are potentially being injected into their code. + +# Unresolved questions + +## How to handle moves out of `array[index_expr]` + +Niko pointed out to me today that my prototype was not addressing +moves out of `array[index_expr]` properly. I was assuming +that we would just make such an expression illegal (or that they +should already be illegal). + +But they are not already illegal, and above assumption that we +would make it illegal should have been explicit. That, or we +should address the problem in some other way. + +To make this concrete, here is some code that runs today: + +```rust +#[deriving(Show)] +struct AnnounceDrop { name: &'static str } + +impl Drop for AnnounceDrop { + fn drop(&mut self) { println!("dropping {}", self.name); } +} + +fn foo(a: [A, ..3], i: uint) -> A { + a[i] +} + +fn main() { + let a = [AnnounceDrop { name: "fst" }, + AnnounceDrop { name: "snd" }, + AnnounceDrop { name: "thd" }]; + let r = foo(a, 1); + println!("foo returned {}", r); +} +``` + +This prints: +``` +dropping fst +dropping thd +foo returned AnnounceDrop { name: snd } +dropping snd +``` + +because it first moves the entire array into `foo`, and then `foo` +returns the second element, but still needs to drop the rest of the +array. + +Embedded drop flags and zeroing support this seamlessly, of course. +But the whole point of this RFC is to get rid of the embedded +per-value drop-flags. + +If we want to continue supporting moving out of `a[i]` (and we +probably do, I have been converted on this point), then the drop flag +needs to handle this case. Our current thinking is that we can +support it by using a single *`uint`* flag (as opposed to the booleans +used elsewhere) for such array that has been moved out of. 
The `uint` +flag represents "drop all elements from the array *except* for the one +listed in the flag." (If it is only moved out of on one branch and +not another, then we would either use an `Option`, or still use +`uint` and just represent unmoved case via some value that is not +valid index, such as the length of the array). + +## Should we keep `#[unsafe_no_drop_flag]` ? + +Currently there is an `unsafe_no_drop_flag` attribute that is used to +indicate that no drop flag should be associated with a struct/enum, +and instead the user-written drop code will be run multiple times (and +thus must internally guard itself from its own side-effects; e.g. do +not attempt to free the backing buffer for a `Vec` more than once, by +tracking within the `Vec` itself if the buffer was previously freed). + +The "obvious" thing to do is to remove `unsafe_no_drop_flag`, since +the per-value drop flag is going away. However, we *could* keep the +attribute, and just repurpose its meaning to instead mean the +following: *Never* inject a dynamic stack-local drop-flag for this +value. Just run the drop code multiple times, just like today. + +In any case, since the semantics of this attribute are unstable, we +will feature-gate it (with feature name `unsafe_no_drop_flag`). + +# Appendices + +## How dynamic drop semantics works + +(This section is just presenting background information on the +semantics of `drop` and the drop-flag as it works in Rust today; it +does not contain any discussion of the changes being proposed by this +RFC.) + +A struct or enum implementing `Drop` will have its drop-flag +automatically set to a non-zero value when it is constructed. When +attempting to drop the struct or enum (i.e. when control reaches the +end of the lexical scope of its owner), the injected glue code will +only execute its associated `fn drop` if its drop-flag is non-zero. + +In addition, the compiler injects code to ensure that when a value is +moved to a new location in memory or dropped, then the original memory +is entirely zeroed. + +A struct/enum definition implementing `Drop` can be tagged with the +attribute `#[unsafe_no_drop_flag]`. When so tagged, the struct/enum +will not have a hidden drop flag embedded within it. In this case, the +injected glue code will execute the associated glue code +unconditionally, even though the struct/enum value may have been moved +to a new location in memory or dropped (in either case, the memory +representing the value will have been zeroed). + +The above has a number of implications: + + * A program can manually cause the drop code associated with a value + to be skipped by first zeroing out its memory. + + * A `Drop` implementation for a struct tagged with `unsafe_no_drop_flag` + must assume that it will be called more than once. (However, every + call to `drop` after the first will be given zeroed memory.) + +### Program illustrating semantic impact of hidden drop flag + +```rust +#![feature(macro_rules)] + +use std::fmt; +use std::mem; + +#[deriving(Clone,Show)] +struct S { name: &'static str } + +#[deriving(Clone,Show)] +struct Df { name: &'static str } + +#[deriving(Clone,Show)] +struct Pair{ x: X, y: Y } + +static mut current_indent: uint = 0; + +fn indent() -> String { + String::from_char(unsafe { current_indent }, ' ') +} + +impl Drop for Df { + fn drop(&mut self) { + println!("{}dropping Df {}", indent(), self.name) + } +} + +macro_rules! 
struct_Dn { + ($Dn:ident) => { + + #[unsafe_no_drop_flag] + #[deriving(Clone,Show)] + struct $Dn { name: &'static str } + + impl Drop for $Dn { + fn drop(&mut self) { + if unsafe { (0,0) == mem::transmute::<_,(uint,uint)>(self.name) } { + println!("{}dropping already-zeroed {}", + indent(), stringify!($Dn)); + } else { + println!("{}dropping {} {}", + indent(), stringify!($Dn), self.name) + } + } + } + } +} + +struct_Dn!(DnA) +struct_Dn!(DnB) +struct_Dn!(DnC) + +fn take_and_pass(t: T) { + println!("{}t-n-p took and will pass: {}", indent(), &t); + unsafe { current_indent += 4; } + take_and_drop(t); + unsafe { current_indent -= 4; } +} + +fn take_and_drop(t: T) { + println!("{}t-n-d took and will drop: {}", indent(), &t); +} + +fn xform(mut input: Df) -> Df { + input.name = "transformed"; + input +} + +fn foo(b: || -> bool) { + let mut f1 = Df { name: "f1" }; + let mut n2 = DnC { name: "n2" }; + let f3 = Df { name: "f3" }; + let f4 = Df { name: "f4" }; + let f5 = Df { name: "f5" }; + let f6 = Df { name: "f6" }; + let n7 = DnA { name: "n7" }; + let _fx = xform(f6); // `f6` consumed by `xform` + let _n9 = DnB { name: "n9" }; + let p = Pair { x: f4, y: f5 }; // `f4` and `f5` moved into `p` + let _f10 = Df { name: "f10" }; + + println!("foo scope start: {}", (&f3, &n7)); + unsafe { current_indent += 4; } + if b() { + take_and_pass(p.x); // `p.x` consumed by `take_and_pass`, which drops it + } + if b() { + take_and_pass(n7); // `n7` consumed by `take_and_pass`, which drops it + } + + // totally unsafe: manually zero the struct, including its drop flag. + unsafe fn manually_zero(s: &mut S) { + let len = mem::size_of::(); + let p : *mut u8 = mem::transmute(s); + for i in range(0, len) { + *p.offset(i as int) = 0; + } + } + unsafe { + manually_zero(&mut f1); + manually_zero(&mut n2); + } + println!("foo scope end"); + unsafe { current_indent -= 4; } + + // here, we drop each local variable, in reverse order of declaration. + // So we should see the following drop sequence: + // drop(f10), printing "Df f10" + // drop(p) + // ==> drop(p.y), printing "Df f5" + // ==> attempt to drop(and skip) already-dropped p.x, no-op + // drop(_n9), printing "DnB n9" + // drop(_fx), printing "Df transformed" + // attempt to drop already-dropped n7, printing "already-zeroed DnA" + // no drop of `f6` since it was consumed by `xform` + // no drop of `f5` since it was moved into `p` + // no drop of `f4` since it was moved into `p` + // drop(f3), printing "f3" + // attempt to drop manually-zeroed `n2`, printing "already-zeroed DnC" + // attempt to drop manually-zeroed `f1`, no-op. +} + +fn main() { + foo(|| true); +} +``` diff --git a/text/0326-restrict-xXX-to-ascii.md b/text/0326-restrict-xXX-to-ascii.md new file mode 100644 index 00000000000..16d05784631 --- /dev/null +++ b/text/0326-restrict-xXX-to-ascii.md @@ -0,0 +1,343 @@ +- Start Date: 2014-09-26 +- RFC PR: 326 +- Rust Issue: https://github.com/rust-lang/rust/issues/18062 + +# Summary + +In string literal contexts, restrict `\xXX` escape sequences to just +the range of ASCII characters, `\x00` -- `\x7F`. `\xXX` inputs in +string literals with higher numbers are rejected (with an error +message suggesting that one use an `\uNNNN` escape). + +# Motivation +[Motivation]: #motivation + +In a string literal context, the current `\xXX` character escape +sequence is potentially confusing when given inputs greater than +`0x7F`, because it does not encode that byte literally, but instead +encodes whatever the escape sequence `\u00XX` would produce. 
Thus, for inputs greater than `0x7F`, `\xXX` will encode multiple
bytes into the generated string literal, as illustrated in the
[Rust example] appendix.

This is different from what C/C++ programmers might expect (see the
[Behavior of xXX in C] appendix).

(It would not be legal to encode the single byte literally into the
string literal, since then the string would not be well-formed UTF-8.)

It has been suggested that the `\xXX` character escape should be
removed entirely (at least from string literal contexts). This RFC is
taking a slightly less aggressive stance: keep `\xXX`, but only for
ASCII inputs when it occurs in string literals. This way, people can
continue using this escape format (which is shorter than the `\uNNNN`
format) when it makes sense.

Here are some links to discussions on this topic, including direct
comments that suggest exactly the strategy of this RFC.

 * https://github.com/rust-lang/rfcs/issues/312
 * https://github.com/rust-lang/rust/issues/12769
 * https://github.com/rust-lang/rust/issues/2800#issuecomment-31477259
 * https://github.com/rust-lang/rfcs/pull/69#issuecomment-43002505
 * https://github.com/rust-lang/rust/issues/12769#issuecomment-43574856
 * https://github.com/rust-lang/meeting-minutes/blob/master/weekly-meetings/2014-01-21.md#xnn-escapes-in-strings
 * https://mail.mozilla.org/pipermail/rust-dev/2012-July/002025.html

Note in particular the meeting minutes bullet, where the team
explicitly decided to keep things "as they are".

However, at the time of that meeting, Rust did not have byte string
literals; people were converting string-literals into byte arrays via
the `bytes!` macro. (Likewise, the rust-dev post is also from a time,
summer 2012, when we did not have byte-string literals.)

We are in a different world now. The fact that now `\xXX` denotes a
code unit in a byte-string literal, but in a string literal denotes a
codepoint, does not seem elegant; it rather seems like a source of
confusion. (Caveat: While Felix does believe this assertion, this
context-dependent interpretation of `\xXX` does have precedent
in both Python and Racket; see the [Racket example] and [Python example]
appendices.)

By restricting `\xXX` to the range `0x00`--`0x7F`, we side-step the
question of "is it a code unit or a code point?" entirely (which was
the *real* context of both the rust-dev thread and the meeting minutes
bullet). This RFC is a far more conservative choice that we can
safely make for the short term (i.e. for the 1.0 release) than it
would have been to switch to a "`\xXX` is a code unit" interpretation.

The expected outcome is reduced confusion for C/C++ programmers (which
is, after all, our primary target audience for conversion), and for
programmers coming from any other language where `\xXX` never results
in more than one byte. The error message will point them to the syntax
they need to adopt.

# Detailed design

In string literal contexts, `\xXX` inputs with `XX > 0x7F` are
rejected (with an error message that mentions either, or both, of
`\uNNNN` escapes and the byte-string literal format `b".."`).

The full byte range remains supported when `\xXX` is used in
byte-string literals, `b"..."`.

Raw strings by design do not offer escape sequences, so they are
unchanged.
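To make the rule concrete, the following sketch (not part of the original RFC
text) shows which literals would be accepted under this design, written in the
escape syntax of the time (`\uNNNN`):

```rust
fn main() {
    let a = "\x41";        // accepted: 0x41 is ASCII, same as "A"
    let b = "\u00C9";      // accepted: non-ASCII code points use \uNNNN
    let c = b"\x80\xFF";   // accepted: byte-string literals keep the full 0x00--0xFF range
    // let d = "\x80";     // rejected under this RFC: \xXX in a string literal
    //                     // must be at most 0x7F; write "\u0080" instead
    println!("{} {} {}", a, b, c.len());
}
```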
+ +Character and string escaping routines (such as +`core::char::escape_unicode`, and such as used by the `"{:?}"` +formatter) are updated so that string inputs that previously would +previously have printed `\xXX` with `XX > 0x7F` are updated to use +`\uNNNN` escapes instead. + +# Drawbacks + +Some reasons not to do this: + + * we think that the current behavior is intuitive, + + * it is consistent with language X (and thus has precedent), + + * existing libraries are relying on this behavior, or + + * we want to optimize for inputting characters with codepoints + in the range above `0x7F` in string-literals, rather than + optimizing for ASCII. + +The thesis of this RFC is that the first bullet is a falsehood. + +While there is some precedent for the "`\xXX` is code point" +interpretation in some languages, the [majority] do seem to favor the +"`\xXX` is code unit" point of view. The proposal of this RFC is +side-stepping the distinction by limiting the input range for `\xXX`. + +[majority]: https://mail.mozilla.org/pipermail/rust-dev/2012-July/002025.html + +The third bullet is a strawman since we have not yet released 1.0, and +thus everything is up for change. + +This RFC makes no comment on the validity of the fourth bullet. + +# Alternatives + +* We could remove `\xXX` entirely from string literals. This would + require people to use the `\uNNNN` escape format even for bytes in the + range `00`--`0x7F`, which seems annoying. + +* We could switch `\xXX` from meaning code point to meaning code unit + in both string literal and byte-string literal contexts. This + was previously considered and explicitly rejected in an earlier + meeting, as discussed in the [Motivation] section. + +# Unresolved questions + +None. + +# Appendices + +## Behavior of xXX in C +[Behavior of xXX in C]: #behavior-of-xxx-in-c + +Here is a C program illustrating how `xXX` escape sequences are treated +in string literals in that context: + +```c +#include + +int main() { + char *s; + + s = "a"; + printf("s[0]: %d\n", s[0]); + printf("s[1]: %d\n", s[1]); + + s = "\x61"; + printf("s[0]: %d\n", s[0]); + printf("s[1]: %d\n", s[1]); + + s = "\x7F"; + printf("s[0]: %d\n", s[0]); + printf("s[1]: %d\n", s[1]); + + s = "\x80"; + printf("s[0]: %d\n", s[0]); + printf("s[1]: %d\n", s[1]); + return 0; +} +``` + +Its output is the following: +``` +% gcc example.c && ./a.out +s[0]: 97 +s[1]: 0 +s[0]: 97 +s[1]: 0 +s[0]: 127 +s[1]: 0 +s[0]: -128 +s[1]: 0 +``` + +## Rust example +[Rust example]: #rust-example + +Here is a Rust program that explores the various ways `\xXX` sequences are +treated in both string literal and byte-string literal contexts. + +```rust + #![feature(macro_rules)] + +fn main() { + macro_rules! print_str { + ($r:expr, $e:expr) => { { + println!("{:>20}: \"{}\"", + format!("\"{}\"", $r), + $e.escape_default()) + } } + } + + macro_rules! print_bstr { + ($r:expr, $e:expr) => { { + println!("{:>20}: {}", + format!("b\"{}\"", $r), + $e) + } } + } + + macro_rules! print_bytes { + ($r:expr, $e:expr) => { + println!("{:>9}.as_bytes(): {}", format!("\"{}\"", $r), $e.as_bytes()) + } } + + // println!("{}", b"\u0000"); // invalid: \uNNNN is not a byte escape. 
+ print_str!(r"\0", "\0"); + print_bstr!(r"\0", b"\0"); + print_bstr!(r"\x00", b"\x00"); + print_bytes!(r"\x00", "\x00"); + print_bytes!(r"\u0000", "\u0000"); + println!(""); + print_str!(r"\x61", "\x61"); + print_bstr!(r"a", b"a"); + print_bstr!(r"\x61", b"\x61"); + print_bytes!(r"\x61", "\x61"); + print_bytes!(r"\u0061", "\u0061"); + println!(""); + print_str!(r"\x7F", "\x7F"); + print_bstr!(r"\x7F", b"\x7F"); + print_bytes!(r"\x7F", "\x7F"); + print_bytes!(r"\u007F", "\u007F"); + println!(""); + print_str!(r"\x80", "\x80"); + print_bstr!(r"\x80", b"\x80"); + print_bytes!(r"\x80", "\x80"); + print_bytes!(r"\u0080", "\u0080"); + println!(""); + print_str!(r"\xFF", "\xFF"); + print_bstr!(r"\xFF", b"\xFF"); + print_bytes!(r"\xFF", "\xFF"); + print_bytes!(r"\u00FF", "\u00FF"); + println!(""); + print_str!(r"\u0100", "\u0100"); + print_bstr!(r"\x01\x00", b"\x01\x00"); + print_bytes!(r"\u0100", "\u0100"); +} +``` + +In current Rust, it generates output as follows: +``` +% rustc --version && echo && rustc example.rs && ./example +rustc 0.12.0-pre (d52d0c836 2014-09-07 03:36:27 +0000) + + "\0": "\x00" + b"\0": [0] + b"\x00": [0] + "\x00".as_bytes(): [0] + "\u0000".as_bytes(): [0] + + "\x61": "a" + b"a": [97] + b"\x61": [97] + "\x61".as_bytes(): [97] + "\u0061".as_bytes(): [97] + + "\x7F": "\x7f" + b"\x7F": [127] + "\x7F".as_bytes(): [127] + "\u007F".as_bytes(): [127] + + "\x80": "\x80" + b"\x80": [128] + "\x80".as_bytes(): [194, 128] + "\u0080".as_bytes(): [194, 128] + + "\xFF": "\xff" + b"\xFF": [255] + "\xFF".as_bytes(): [195, 191] + "\u00FF".as_bytes(): [195, 191] + + "\u0100": "\u0100" + b"\x01\x00": [1, 0] + "\u0100".as_bytes(): [196, 128] +% +``` + +Note that the behavior of `\xXX` on byte-string literals matches the +expectations established by the C program in [Behavior of xXX in C]; +that is good. The problem is the behavior of `\xXX` for `XX > 0x7F` +in string-literal contexts, namely in the fourth and fifth examples +where the `.as_bytes()` invocations are showing that the underlying +byte array has two elements instead of one. + +## Racket example +[Racket example]: #racket-example + +``` +% racket +Welcome to Racket v5.93. +> (define a-string "\xbb\n") +> (display a-string) +» +> (bytes-length (string->bytes/utf-8 a-string)) +3 +> (define a-byte-string #"\xc2\xbb\n") +> (bytes-length a-byte-string) +3 +> (display a-byte-string) +» +> (exit) +% +``` + +The above code illustrates that in Racket, the `\xXX` escape sequence +denotes a code unit in byte-string context (`#".."` in that language), +while it denotes a code point in string context (`".."`). + +## Python example +[Python example]: #python-example + +``` +% python +Python 2.7.5 (default, Mar 9 2014, 22:15:05) +[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin +Type "help", "copyright", "credits" or "license" for more information. +>>> a_string = u"\xbb\n"; +>>> print a_string +» + +>>> len(a_string.encode("utf-8")) +3 +>>> a_byte_string = "\xc2\xbb\n"; +>>> len(a_byte_string) +3 +>>> print a_byte_string +» + +>>> exit() +% +``` + +The above code illustrates that in Python, the `\xXX` escape sequence +denotes a code unit in byte-string context (`".."` in that language), +while it denotes a code point in *unicode* string context (`u".."`). 
diff --git a/text/0339-statically-sized-literals.md b/text/0339-statically-sized-literals.md new file mode 100644 index 00000000000..8b370509d67 --- /dev/null +++ b/text/0339-statically-sized-literals.md @@ -0,0 +1,202 @@ +- Start Date: 2014-09-29 +- RFC PR: [rust-lang/rfcs#339](https://github.com/rust-lang/rfcs/pull/339) +- Rust Issue: [rust-lang/rust#18465](https://github.com/rust-lang/rust/issues/18465) + +# Summary + +Change the types of byte string literals to be references to statically sized types. +Ensure the same change can be performed backward compatibly for string literals in the future. + +# Motivation + +Currently byte string and string literals have types `&'static [u8]` and `&'static str`. +Therefore, although the sizes of the literals are known at compile time, they are erased from their types and inaccessible until runtime. +This RFC suggests to change the type of byte string literals to `&'static [u8, ..N]`. +In addition this RFC suggest not to introduce any changes to `str` or string literals, that would prevent a backward compatible addition of strings of fixed size `FixedString` (the name FixedString in this RFC is a placeholder and is open for bikeshedding) and the change of the type of string literals to `&'static FixedString` in the future. + +`FixedString` is essentially a `[u8, ..N]` with UTF-8 invariants and additional string methods/traits. +It fills the gap in the vector/string chart: + +`Vec` | `String` +---------|-------- +`[T, ..N]` | ??? +`&[T]` | `&str` + +Today, given the lack of non-type generic parameters and compile time (function) evaluation (CTE), strings of fixed size are not very useful. +But after introduction of CTE the need in compile time string operations will raise rapidly. +Even without CTE but with non-type generic parameters alone fixed size strings can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization. So the main motivation for changes today is forward compatibility. + +Examples of use for new literals, that are not possible with old literals: + +``` +// Today: initialize mutable array with byte string literal +let mut arr: [u8, ..3] = *b"abc"; +arr[0] = b'd'; + +// Future with CTE: compile time string concatenation +static LANG_DIR: FixedString<5 /*The size should, probably, be inferred*/> = *"lang/"; +static EN_FILE: FixedString<_> = LANG_DIR + *"en"; // FixedString implements Add +static FR_FILE: FixedString<_> = LANG_DIR + *"fr"; + +// Future without CTE: runtime "heapless" string concatenation +let DE_FILE = LANG_DIR + *"de"; // Performed at runtime if not optimized +``` + +# Detailed design + +Change the type of byte string literals from `&'static [u8]` to `&'static [u8, ..N]`. +Leave the door open for a backward compatible change of the type of string literals from `&'static str` to `&'static FixedString`. + +### Strings of fixed size + +If `str` is moved to the library today, then strings of fixed size can be implemented like this: +``` +struct str(T); +``` +Then string literals will have types `&'static str<[u8, ..N]>`. + +Drawbacks of this approach include unnecessary exposition of the implementation - underlying sized or unsized arrays `[u8]`/`[u8, ..N]` and generic parameter `T`. +The key requirement here is the autocoercion from reference to fixed string to string slice an we are unable to meet it now without exposing the implementation. 
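For comparison, the byte-string half of this proposal leans on the coercion
from `&[u8, ..N]` to `&[u8]`, which is what keeps existing callers compiling.
The following sketch (not from the original RFC, written in the array syntax of
the time, with an illustrative helper function) shows how the new literal type
is meant to behave:

```rust
fn len_of(bytes: &[u8]) -> uint { bytes.len() }

fn main() {
    // Under this RFC the literal has type &'static [u8, ..3] ...
    let fixed: &'static [u8, ..3] = b"abc";
    // ... so it can initialize a mutable fixed-size array by value,
    let mut arr: [u8, ..3] = *fixed;
    arr[0] = b'd';
    // ... while callers expecting a plain slice keep working.
    assert_eq!(len_of(fixed), 3);
    assert_eq!(arr[0], b'd');
}
```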
+ +In the future, after gaining the ability to parameterize on integers, strings of fixed size could be implemented in a better way: +``` +struct __StrImpl(T); // private + +pub type str = __StrImpl<[u8]>; // unsized referent of string slice `&str`, public +pub type FixedString = __StrImpl<[u8, ..N]>; // string of fixed size, public + +// &FixedString -> &str : OK, including &'static FixedString -> &'static str for string literals +``` +So, we don't propose to make these changes today and suggest to wait until generic parameterization on integers is added to the language. + +### Precedents + +C and C++ string literals are lvalue `char` arrays of fixed size with static duration. +C++ library proposal for strings of fixed size ([link][1]), the paper also contains some discussion and motivation. + +# Rejected alternatives and discussion + +## Array literals + +The types of array literals potentially can be changed from `[T, ..N]` to `&'a [T, ..N]` for consistency with the other literals and ergonomics. +The major blocker for this change is the inability to move out from a dereferenced array literal if `T` is not `Copy`. +``` +let mut a = *[box 1i, box 2, box 3]; // Wouldn't work without special-casing of array literals with regard to moving out from dereferenced borrowed pointer +``` +Despite that array literals as references have better usability, possible `static`ness and consistency with other literals. + +### Usage statistics for array literals + +Array literals can be used both as slices, when a view to array is sufficient to perform the task, and as values when arrays themselves should be copied or modified. +The exact estimation of the frequencies of both uses is problematic, but some regex search in the Rust codebase gives the next statistics: +In approximately *70%* of cases array literals are used as slices (explicit `&` on array literals, immutable bindings). +In approximately *20%* of cases array literals are used as values (initialization of struct fields, mutable bindings, boxes). +In the rest *10%* of cases the usage is unclear. + +So, in most cases the change to the types of array literals will lead to shorter notation. + +### Static lifetime + +Although all the literals under consideration are similar and are essentially arrays of fixed size, array literals are different from byte string and string literals with regard to lifetimes. +While byte string and string literals can always be placed into static memory and have static lifetime, array literals can depend on local variables and can't have static lifetime in general case. +The chosen design potentially allows to trivially enhance *some* array literals with static lifetime in the future to allow use like +``` +fn f() -> &'static [int] { + [1, 2, 3] +} +``` + +## Alternatives + +The alternative design is to make the literals the values and not the references. + +### The changes + +1) +Keep the types of array literals as `[T, ..N]`. +Change the types of byte literals from `&'static [u8]` to `[u8, ..N]`. +Change the types of string literals form `&'static str` to to `FixedString`. +2) +Introduce the missing family of types - strings of fixed size - `FixedString`. +... +3) +Add the autocoercion of array *literals* (not arrays of fixed size in general) to slices. +Add the autocoercion of new byte literals to slices. +Add the autocoercion of new string literals to slices. +Non-literal arrays and strings do not autocoerce to slices, in accordance with the general agreements on explicitness. 
+4) +Make string and byte literals lvalues with static lifetime. + +Examples of use: +``` +// Today: initialize mutable array with literal +let mut arr: [u8, ..3] = b"abc"; +arr[0] = b'd'; + +// Future with CTE: compile time string concatenation +static LANG_DIR: FixedString<_> = "lang/"; +static EN_FILE: FixedString<_> = LANG_DIR + "en"; // FixedString implements Add +static FR_FILE: FixedString<_> = LANG_DIR + "fr"; + +// Future without CTE: runtime "heapless" string concatenation +let DE_FILE = LANG_DIR + "de"; // Performed at runtime if not optimized +``` + +### Drawbacks of the alternative design + +Special rules about (byte) string literals being static lvalues add a bit of unnecessary complexity to the specification. + +In theory `let s = "abcd";` copies the string from static memory to stack, but the copy is unobservable an can, probably, be elided in most cases. + +The set of additional autocoercions has to exist for ergonomic purpose (and for backward compatibility). +Writing something like: +``` +fn f(arg: &str) {} +f("Hello"[]); +f(&"Hello"); +``` +for all literals would be just unacceptable. + +Minor breakage: +``` +fn main() { + let s = "Hello"; + fn f(arg: &str) {} + f(s); // Will require explicit slicing f(s[]) or implicit DST coersion from reference f(&s) +} +``` + +### Status quo + +Status quo (or partial application of the changes) is always an alternative. + +### Drawbacks of status quo + +Examples: +``` +// Today: can't use byte string literals in some cases +let mut arr: [u8, ..3] = [b'a', b'b', b'c']; // Have to use array literals +arr[0] = b'd'; + +// Future: FixedString is added, CTE is added, but the literal types remain old +let mut arr: [u8, ..3] = b"abc".to_fixed(); // Have to use a conversion method +arr[0] = b'd'; + +static LANG_DIR: FixedString<_> = "lang/".to_fixed(); // Have to use a conversion method +static EN_FILE: FixedString<_> = LANG_DIR + "en".to_fixed(); +static FR_FILE: FixedString<_> = LANG_DIR + "fr".to_fixed(); + +// Bad future: FixedString is not added +// "Heapless"/compile-time string operations aren't possible, or performed with "magic" like extended concat! or recursive macros. +``` +Note, that in the "Future" scenario the return *type* of `to_fixed` depends on the *value* of `self`, so it requires sufficiently advanced CTE, for example C++14 with its powerful `constexpr` machinery still doesn't allow to write such a function. + +# Drawbacks + +None. + +# Unresolved questions + +None. + + [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4121.pdf diff --git a/text/0341-remove-virtual-structs.md b/text/0341-remove-virtual-structs.md new file mode 100644 index 00000000000..ea7c5a0db8f --- /dev/null +++ b/text/0341-remove-virtual-structs.md @@ -0,0 +1,45 @@ +- Start Date: 2014-09-30 +- RFC PR: https://github.com/rust-lang/rfcs/pull/341 +- Rust Issue: https://github.com/rust-lang/rust/issues/17861 + +# Summary + +Removes the "virtual struct" (aka struct inheritance) feature, which +is currently feature gated. + +# Motivation + +Virtual structs were added experimentally prior to the RFC process as +a way of inheriting fields from one struct when defining a new struct. + +The feature was introduced and remains behind a feature gate. + +The motivations for removing this feature altogether are: + +1. The feature is likely to be replaced by a more general mechanism, + as part of the need to address hierarchies such as the DOM, ASTs, + and so on. 
See + [this post](http://discuss.rust-lang.org/t/summary-of-efficient-inheritance-rfcs/494/43) + for some recent discussion. + +2. The implementation is somewhat buggy and incomplete, and the + feature is not well-documented. + +3. Although it's behind a feature gate, keeping the feature around is + still a maintenance burden. + +# Detailed design + +Remove the implementation and feature gate for virtual structs. + +Retain the `virtual` keyword as reserved for possible future use. + +# Drawbacks + +The language will no longer offer any built-in mechanism for avoiding +repetition of struct fields. Macros offer a reasonable workaround +until a more general mechanism is added. + +# Unresolved questions + +None known. diff --git a/text/0342-keywords.md b/text/0342-keywords.md new file mode 100644 index 00000000000..a550d908248 --- /dev/null +++ b/text/0342-keywords.md @@ -0,0 +1,35 @@ +- Start Date: 2014-10-07 +- RFC PR: https://github.com/rust-lang/rfcs/pull/342 +- Rust Issue: https://github.com/rust-lang/rust/issues/17862 + +# Summary + +Reserve `abstract`, `final`, and `override` as possible keywords. + +# Motivation + +We intend to add some mechanism to Rust to support more efficient inheritance +(see, e.g., RFC PRs #245 and #250, and this +[thread](http://discuss.rust-lang.org/t/summary-of-efficient-inheritance-rfcs/494/43) +on discuss). Although we have not decided how to do this, we do know that we +will. Any implementation is likely to make use of keywords `virtual` (already +used, to remain reserved), `abstract`, `final`, and `override`, so it makes +sense to reserve these now to make the eventual implementation as backwards +compatible as possible. + +# Detailed design + +Make `abstract`, `final`, and `override` reserved keywords. + +# Drawbacks + +Takes a few more words out of the possible vocabulary of Rust programmers. + +# Alternatives + +Don't do this and deal with it when we have an implementation. This would mean +bumping the language version, probably. + +# Unresolved questions + +N/A diff --git a/text/0344-conventions-galore.md b/text/0344-conventions-galore.md new file mode 100644 index 00000000000..615c2ebfd77 --- /dev/null +++ b/text/0344-conventions-galore.md @@ -0,0 +1,295 @@ +- Start Date: 2014-10-15 +- RFC PR: [rust-lang/rfcs#344](https://github.com/rust-lang/rfcs/pull/344) +- Rust Issue: [rust-lang/rust#18074](https://github.com/rust-lang/rust/issues/18074) + +# Summary + +This is a conventions RFC for settling a number of remaining naming conventions: + +* Referring to types in method names +* Iterator type names +* Additional iterator method names +* Getter/setter APIs +* Associated types +* Trait naming +* Lint naming +* Suffix ordering +* Prelude traits + +It also proposes to standardize on lower case error messages within the compiler +and standard library. + +# Motivation + +As part of the ongoing API stabilization process, we need to settle naming +conventions for public APIs. This RFC is a continuation of that process, +addressing a number of smaller but still global naming issues. + +# Detailed design + +The RFC includes a number of unrelated naming conventions, broken down into +subsections below. + +## Referring to types in method names + +Function names often involve type names, the most common example being conversions +like `as_slice`. 
If the type has a purely textual name (ignoring parameters), it +is straightforward to convert between type conventions and function conventions: + +Type name | Text in methods +--------- | --------------- +`String` | `string` +`Vec` | `vec` +`YourType`| `your_type` + +Types that involve notation are less clear, so this RFC proposes some standard +conventions for referring to these types. There is some overlap on these rules; +apply the most specific applicable rule. + +Type name | Text in methods +--------- | --------------- +`&str` | `str` +`&[T]` | `slice` +`&mut [T]`| `mut_slice` +`&[u8]` | `bytes` +`&T` | `ref` +`&mut T` | `mut` +`*const T`| `ptr` +`*mut T` | `mut_ptr` + +The only surprise here is the use of `mut` rather than `mut_ref` for mutable +references. This abbreviation is already a fairly common convention +(e.g. `as_ref` and `as_mut` methods), and is meant to keep this very common case +short. + +## Iterator type names + +The current convention for iterator *type* names is the following: + +> Iterators require introducing and exporting new types. These types should use +> the following naming convention: +> +> * **Base name**. If the iterator yields something that can be described with a +> specific noun, the base name should be the pluralization of that noun +> (e.g. an iterator yielding words is called `Words`). Generic contains use the +> base name `Items`. +> +> * **Flavor prefix**. Iterators often come in multiple flavors, with the default +> flavor providing immutable references. Other flavors should prefix their name: +> +> * Moving iterators have a prefix of `Move`. +> * If the default iterator yields an immutable reference, an iterator +> yielding a mutable reference has a prefix `Mut`. +> * Reverse iterators have a prefix of `Rev`. + +(These conventions were established as part of +[this PR](https://github.com/rust-lang/rust/pull/8090) and later +[this one](https://github.com/rust-lang/rust/pull/11001).) + +These conventions have not yet been updated to reflect the +[recent change](https://github.com/rust-lang/rfcs/pull/199) to the iterator +method names, in part to allow for a more significant revamp. There are some +problems with the current rules: + +* They are fairly loose and therefore not mechanical or predictable. In + particular, the choice of noun to use for the base name is completely + arbitrary. + +* They are not always applicable. The `iter` module, for example, defines a + large number of iterator types for use in the adapter methods on `Iterator` + (e.g. `Map` for `map`, `Filter` for `filter`, etc.) The module does not follow + the convention, and it's not clear how it could do so. + +This RFC proposes to instead align the convention with the `iter` module: the +name of an iterator type should be the same as the method that produces the +iterator. + +For example: +* `iter` would yield an `Iter` +* `iter_mut` would yield an `IterMut` +* `into_iter` would yield an `IntoIter` + +These type names make the most sense when prefixed with their owning module, +e.g. `vec::IntoIter`. + +Advantages: + +* The rule is completely mechanical, and therefore highly predictable. + +* The convention can be (almost) universally followed: it applies equally well + to `vec` and to `iter`. + +Disadvantages: + +* `IntoIter` is not an ideal name. Note, however, that since we've moved to + `into_iter` as the method name, the existing convention (`MoveItems`) needs to + be updated to match, and it's not clear how to do better than `IntoItems` in + any case. 
+ +* This naming scheme can result in clashes if multiple containers are defined in + the same module. Note that this is *already* the case with today's + conventions. In most cases, this situation should be taken as an indication + that a more refined module hierarchy is called for. + +## Additional iterator method names + +An [earlier RFC](https://github.com/rust-lang/rfcs/pull/199) settled the +conventions for the "standard" iterator methods: `iter`, `iter_mut`, +`into_iter`. + +However, there are many cases where you also want "nonstandard" iterator +methods: `bytes` and `chars` for strings, `keys` and `values` for maps, +the various adapters for iterators. + +This RFC proposes the following convention: + +* Use `iter` (and variants) for data types that can be viewed as containers, + and where the iterator provides the "obvious" sequence of contained items. + +* If there is no single "obvious" sequence of contained items, or if there are + multiple desired views on the container, provide separate methods for these + that do *not* use `iter` in their name. The name should instead directly + reflect the view/item type being iterated (like `bytes`). + +* Likewise, for iterator adapters (`filter`, `map` and so on) or other + iterator-producing operations (`intersection`), use the clearest name to + describe the adapter/operation directly, and do not mention `iter`. + +* If not otherwise qualified, an iterator-producing method should provide an + iterator over immutable references. Use the `_mut` suffix for variants + producing mutable references, and the `into_` prefix for variants consuming + the data in order to produce owned values. + +## Getter/setter APIs + +Some data structures do not wish to provide direct access to their fields, but +instead offer "getter" and "setter" methods for manipulating the field state +(often providing checking or other functionality). + +The proposed convention for a field `foo: T` is: + +* A method `foo(&self) -> &T` for getting the current value of the field. +* A method `set_foo(&self, val: T)` for setting the field. (The `val` argument + here may take `&T` or some other type, depending on the context.) + +Note that this convention is about getters/setters on ordinary data types, *not* +on [builder objects](http://aturon.github.io/ownership/builders.html). The +naming conventions for builder methods are still open. + +## Associated types + +Unlike type parameters, the *names* of +[associated types](https://github.com/rust-lang/rfcs/pull/195) for a trait are a +meaningful part of its public API. + +Associated types should be given concise, but meaningful names, generally +following the convention for type names rather than generic. For example, use +`Err` rather than `E`, and `Item` rather than `T`. + +## Trait naming + +The wiki guidelines have long suggested naming traits as follows: + +> Prefer (transitive) verbs, nouns, and then adjectives; avoid grammatical suffixes (like `able`) + +Trait names like `Copy`, `Clone` and `Show` follow this convention. The +convention avoids grammatical verbosity and gives Rust code a distinctive flavor +(similar to its short keywords). + +This RFC proposes to amend the convention to further say: if there is a single +method that is the dominant functionality of the trait, consider using the same +name for the trait itself. This is already the case for `Clone` and `ToCStr`, +for example. + +According to these rules, `Encodable` should probably be `Encode`. 
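As a small illustration of the pattern being described (the trait and type
here are invented for this example, not taken from `std`):

```rust
// The trait is named after its single dominant method.
trait Encode {
    fn encode(&self) -> Vec<u8>;
}

struct Header { version: u8 }

impl Encode for Header {
    fn encode(&self) -> Vec<u8> { vec![self.version] }
}
```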
+ +There are some open questions about these rules; see Unresolved Questions below. + +## Lints + +Our lint names are +[not consistent](https://github.com/rust-lang/rust/issues/16545). While this may +seem like a minor concern, when we hit 1.0 the lint names will be locked down, +so it's worth trying to clean them up now. + +The basic rule is: the lint name should make sense when read as "allow +*lint-name*" or "allow *lint-name* items". For example, "allow +`deprecated` items" and "allow `dead_code`" makes sense, while "allow +`unsafe_block`" is ungrammatical (should be plural). + +Specifically, this RFC proposes that: + +* Lint names should state the bad thing being checked for, + e.g. `deprecated`, so that `#[allow(deprecated)]` (items) reads + correctly. Thus `ctypes` is not an appropriate name; `improper_ctypes` is. + +* Lints that apply to arbitrary items (like the stability lints) should just + mention what they check for: use `deprecated` rather than `deprecated_items`. + This keeps lint names short. (Again, think "allow *lint-name* items".) + +* If a lint applies to a specific grammatical class, mention that class and use + the plural form: use `unused_variables` rather than `unused_variable`. + This makes `#[allow(unused_variables)]` read correctly. + +* Lints that catch unnecessary, unused, or useless aspects of code + should use the term `unused`, e.g. `unused_imports`, `unused_typecasts`. + +* Use snake case in the same way you would for function names. + +## Suffix ordering + +Very occasionally, conventions will require a method to have multiple suffixes, +for example `get_unchecked_mut`. When feasible, design APIs so that this +situation does not arise. + +Because it is so rare, it does not make sense to lay out a complete convention +for the order in which various suffixes should appear; no one would be able to +remember it. + +However, the *mut* suffix is so common, and is now entrenched as showing up in +final position, that this RFC does propose one simple rule: if there are +multiple suffixes including `mut`, place `mut` last. + +## Prelude traits + +It is not currently possible to define inherent methods directly on basic data +types like `char` or slices. Consequently, `libcore` and other basic crates +provide one-off traits (like `ImmutableSlice` or `Char`) that are intended to be +implemented solely by these primitive types, and which are included in the +prelude. + +These traits are generally *not* designed to be used for generic programming, +but the fact that they appear in core libraries with such basic names makes it +easy to draw the wrong conclusion. + +This RFC proposes to use a `Prelude` suffix for these basic traits. Since the +traits are, in fact, included in the prelude their names do not generally appear +in Rust programs. Therefore, choosing a longer and clearer name will help avoid +confusion about the intent of these traits, and will avoid namespace polution. + +(There is one important drawback in today's Rust: associated functions in these +traits cannot yet be called directly on the types implementing the traits. These +functions are the one case where you would need to mention the trait by name, +today. Hopefully, this situation will change before 1.0; otherwise we may need a +separate plan for dealing with associated functions.) + +## Error messages + +Error messages -- including those produced by `fail!` and those placed in the +`desc` or `detail` fields of e.g. `IoError` -- should in general be in all lower +case. 
This applies to both `rustc` and `std`. + +This is already the predominant convention, but there are some inconsistencies. + +# Alternatives + +## Iterator type names + +The iterator type name convention could instead basically stick with today's +convention, but using suffixes instead of prefixes, and `IntoItems` rather than +`MoveItems`. + +# Unresolved questions + +How far should the rules for trait names go? Should we avoid "-er" suffixes, +e.g. have `Read` rather than `Reader`? diff --git a/text/0356-no-module-prefixes.md b/text/0356-no-module-prefixes.md new file mode 100644 index 00000000000..e2ba40a4050 --- /dev/null +++ b/text/0356-no-module-prefixes.md @@ -0,0 +1,68 @@ +- Start Date: 2014-10-15 +- RFC PR: [rust-lang/rfcs#356](https://github.com/rust-lang/rfcs/pull/356) +- Rust Issue: [rust-lang/rust#18073](https://github.com/rust-lang/rust/issues/18073) + +# Summary + +This is a conventions RFC that proposes that the items exported from a module +should *never* be prefixed with that module name. For example, we should have +`io::Error`, not `io::IoError`. + +(An alternative design is included that special-cases overlap with the +`prelude`.) + +# Motivation + +Currently there is no clear prohibition around including the module's name as a +prefix on an exported item, and it is sometimes done for type names that are +feared to be "popular" (like `Error` and `Result` being `IoError` and +`IoResult`) for clarity. + +This RFC include two designs: one that entirely rules out such prefixes, and one +that rules it out *except* for names that overlap with the prelude. Pros/cons +are given for each. + +# Detailed design + +The main rule being proposed is very simple: the items exported from a module +should never be prefixed with the module's name. + +Rationale: + +* Avoids needless stuttering like `io::IoError`. +* Any ambiguity can be worked around: + * Either qualify by the module, i.e. `io::Error`, + * Or rename on import: `use io::Error as IoError`. +* The rule is extremely simple and clear. + +Downsides: + +* The name may already exist in the module wanting to export it. + * If that's due to explicit imports, those imports can be renamed or + module-qualified (see above). + * If that's due to a *prelude* conflict, however, confusion may arise due to + the conventional *global* meaning of identifiers defined in the prelude + (i.e., programmers do not expect prelude imports to be shadowed). + +Overall, the RFC author believes that *if* this convention is adopted, confusion +around redefining prelude names would gradually go away, because (at least for +things like `Result`) we would come to expect it. + +# Alternative design + +An alternative rule would be to never prefix an exported item with the module's +name, *except* for names that are also defined in the prelude, which *must* be +prefixed by the module's name. + +For example, we would have `io::Error` and `io::IoResult`. + +Rationale: + +* Largely the same as the above, but less decisively. +* Avoids confusion around prelude-defined names. + +Downsides: + +* Retains stuttering for some important cases, e.g. custom `Result` types, which + are likely to be fairly common. +* Makes it even more problematic to expand the prelude in the future. 
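+
+For concreteness, a sketch of the two disambiguation strategies the main
+proposal relies on (module qualification and renaming on import). The `std::io`
+API shapes used here are illustrative assumptions, not taken from this RFC; only
+the proposed `io::Error` naming is:
+
+```rust
+use std::io::{self, Read};
+use std::io::Error as IoError; // rename on import where the short name would clash
+
+// Qualify by module at the use site: `io::Error`, not a prefixed `IoError` type.
+fn read_input() -> Result<String, io::Error> {
+    let mut buf = String::new();
+    match io::stdin().read_to_string(&mut buf) {
+        Ok(_) => Ok(buf),
+        Err(e) => Err(e),
+    }
+}
+
+fn describe(err: &IoError) -> String {
+    format!("I/O failure: {}", err)
+}
+```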
diff --git a/text/0369-num-reform.md b/text/0369-num-reform.md
new file mode 100644
index 00000000000..14e56c19ec4
--- /dev/null
+++ b/text/0369-num-reform.md
@@ -0,0 +1,432 @@
+- Start Date: 2014-09-16
+- RFC PR: [rust-lang/rfcs#369](https://github.com/rust-lang/rfcs/pull/369)
+- Rust Issue: [rust-lang/rust#18640](https://github.com/rust-lang/rust/issues/18640)
+
+# Summary
+
+This RFC is preparation for API stabilization for the `std::num` module. The
+proposal is to finish the simplification efforts started in
+[@bjz's reversal of the numerics hierarchy](https://github.com/rust-lang/rust/issues/10387).
+
+Broadly, the proposal is to collapse the remaining numeric hierarchy
+in `std::num`, and to provide only limited support for generic
+programming (roughly, only over primitive numeric types that vary
+based on size). Traits giving detailed numeric hierarchy can and
+should be provided separately through the Cargo ecosystem.
+
+Thus, this RFC proposes to flatten or remove most of the traits
+currently provided by `std::num`, and generally to simplify the module
+as much as possible in preparation for API stabilization.
+
+# Motivation
+
+## History
+
+Starting in early 2013, there was
+[an effort](https://github.com/rust-lang/rust/issues/4819) to design a
+comprehensive "numeric hierarchy" for Rust: a collection of traits classifying a
+wide variety of numbers and other algebraic objects. The intent was to allow
+highly-generic code to be written for algebraic structures and then instantiated
+to particular types.
+
+This hierarchy covered structures like bigints, but also primitive integer and
+float types. It was an enormous and long-running community effort.
+
+Later, [it was recognized](https://github.com/rust-lang/rust/issues/10387) that
+building such a hierarchy within `libstd` was misguided:
+
+> @bjz The API that resulted from #4819 attempted, like Haskell, to blend both
+> the primitive numerics and higher level mathematical concepts into one
+> API. This resulted in an ugly hybrid where neither goal was adequately met. I
+> think the libstd should have a strong focus on implementing fundamental
+> operations for the base numeric types, but no more. Leave the higher level
+> concepts to libnum or future community projects.
+
+The `std::num` module has thus been slowly migrating *away* from a large trait
+hierarchy toward a simpler one providing just APIs for primitive data types:
+this is
+[@bjz's reversal of the numerics hierarchy](https://github.com/rust-lang/rust/issues/10387).
+
+Alongside this effort, there are already external numerics packages like
+[@bjz's num-rs](https://github.com/bjz/num-rs).
+
+But we're not finished yet.
+ +## The current state of affairs + +The `std::num` module still contains quite a few traits that subdivide out +various features of numbers: + +```rust +pub trait Zero: Add { + fn zero() -> Self; + fn is_zero(&self) -> bool; +} + +pub trait One: Mul { + fn one() -> Self; +} + +pub trait Signed: Num + Neg { + fn abs(&self) -> Self; + fn abs_sub(&self, other: &Self) -> Self; + fn signum(&self) -> Self; + fn is_positive(&self) -> bool; + fn is_negative(&self) -> bool; +} + +pub trait Unsigned: Num {} + +pub trait Bounded { + fn min_value() -> Self; + fn max_value() -> Self; +} + +pub trait Primitive: Copy + Clone + Num + NumCast + PartialOrd + Bounded {} + +pub trait Num: PartialEq + Zero + One + Neg + Add + Sub + + Mul + Div + Rem {} + +pub trait Int: Primitive + CheckedAdd + CheckedSub + CheckedMul + CheckedDiv + + Bounded + Not + BitAnd + BitOr + + BitXor + Shl + Shr { + fn count_ones(self) -> uint; + fn count_zeros(self) -> uint { ... } + fn leading_zeros(self) -> uint; + fn trailing_zeros(self) -> uint; + fn rotate_left(self, n: uint) -> Self; + fn rotate_right(self, n: uint) -> Self; + fn swap_bytes(self) -> Self; + fn from_be(x: Self) -> Self { ... } + fn from_le(x: Self) -> Self { ... } + fn to_be(self) -> Self { ... } + fn to_le(self) -> Self { ... } +} + +pub trait FromPrimitive { + fn from_i64(n: i64) -> Option; + fn from_u64(n: u64) -> Option; + + // many additional defaulted methods + // ... +} + +pub trait ToPrimitive { + fn to_i64(&self) -> Option; + fn to_u64(&self) -> Option; + + // many additional defaulted methods + // ... +} + +pub trait NumCast: ToPrimitive { + fn from(n: T) -> Option; +} + +pub trait Saturating { + fn saturating_add(self, v: Self) -> Self; + fn saturating_sub(self, v: Self) -> Self; +} + +pub trait CheckedAdd: Add { + fn checked_add(&self, v: &Self) -> Option; +} + +pub trait CheckedSub: Sub { + fn checked_sub(&self, v: &Self) -> Option; +} + +pub trait CheckedMul: Mul { + fn checked_mul(&self, v: &Self) -> Option; +} + +pub trait CheckedDiv: Div { + fn checked_div(&self, v: &Self) -> Option; +} + +pub trait Float: Signed + Primitive { + // a huge collection of static functions (for constants) and methods + ... +} + +pub trait FloatMath: Float { + // an additional collection of methods +} +``` + +The `Primitive` traits are intended primarily to support a mechanism, +`#[deriving(FromPrimitive)]`, that makes it easy to provide +conversions from numeric types to C-like `enum`s. + +The `Saturating` and `Checked` traits provide operations that provide +special handling for overflow and other numeric errors. + +Almost all of these traits are currently included in the prelude. + +In addition to these traits, the `std::num` module includes a couple +dozen free functions, most of which duplicate methods available though +traits. + +## Where we want to go: a summary + +The goal of this RFC is to refactor the `std::num` hierarchy with the +following goals in mind: + +* Simplicity. + +* *Limited* generic programming: being able to work generically over + the natural classes of *primitive* numeric types that vary only by + size. There should be enough abstraction to support porting + `strconv`, the generic string/number conversion code used in `std`. + +* Minimizing dependencies for `libcore`. For example, it should not + require `cmath`. + +* Future-proofing for external numerics packages. The Cargo ecosystem + should ultimately provide choices of sophisticated numeric + hierarchies, and `std::num` should not get in the way. 
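+
+To make the "limited generic programming" goal concrete, here is a sketch of
+the kind of size-generic helper that should remain expressible. It is written
+against the `Int` trait proposed in the next section (so it is illustrative
+only, not code that compiles against today's `std::num`), and `checked_sum`
+itself is a hypothetical function:
+
+```rust
+// Generic only over which primitive integer type is used, not over a full
+// algebraic hierarchy: exactly the level of abstraction this RFC keeps.
+fn checked_sum<T: Int>(values: &[T]) -> Option<T> {
+    let mut total: T = Int::zero();
+    for &v in values.iter() {
+        match total.checked_add(v) {
+            Some(t) => total = t,
+            None => return None, // overflow
+        }
+    }
+    Some(total)
+}
+```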
+ +# Detailed design + +## Overview: the new hierarchy + +This RFC proposes to collapse the trait hierarchy in `std::num` to +just the following traits: + +* `Int`, implemented by all primitive integer types (`u8` - `u64`, `i8`-`i64`) + * `UnsignedInt`, implemented by `u8` - `u64` +* `Signed`, implemented by all signed primitive numeric types (`i8`-`i64`, `f32`-`f64`) +* `Float`, implemented by `f32` and `f64` + * `FloatMath`, implemented by `f32` and `f64`, which provides functionality from `cmath` + +These traits inherit from all applicable overloaded operator traits +(from `core::ops`). They suffice for generic programming over several +basic categories of primitive numeric types. + +As designed, these traits include a certain amount of redundancy +between `Int` and `Float`. The Alternatives section shows how this +could be factored out into a separate `Num` trait. But doing so +suggests a level of generic programming that these traits aren't +intended to support. + +The main reason to pull out `Signed` into its own trait is so that it +can be added to the prelude. (Further discussion below.) + +## Detailed definitions + +Below is the full definition of these traits. The functionality +remains largely as it is today, just organized into fewer traits: + +```rust +pub trait Int: Copy + Clone + PartialOrd + PartialEq + + Add + Sub + + Mul + Div + Rem + + Not + BitAnd + BitOr + + BitXor + Shl + Shr +{ + // Constants + fn zero() -> Self; // These should be associated constants when those are available + fn one() -> Self; + fn min_value() -> Self; + fn max_value() -> Self; + + // Deprecated: + // fn is_zero(&self) -> bool; + + // Bit twidling + fn count_ones(self) -> uint; + fn count_zeros(self) -> uint { ... } + fn leading_zeros(self) -> uint; + fn trailing_zeros(self) -> uint; + fn rotate_left(self, n: uint) -> Self; + fn rotate_right(self, n: uint) -> Self; + fn swap_bytes(self) -> Self; + fn from_be(x: Self) -> Self { ... } + fn from_le(x: Self) -> Self { ... } + fn to_be(self) -> Self { ... } + fn to_le(self) -> Self { ... 
} + + // Checked arithmetic + fn checked_add(self, v: Self) -> Option; + fn checked_sub(self, v: Self) -> Option; + fn checked_mul(self, v: Self) -> Option; + fn checked_div(self, v: Self) -> Option; + fn saturating_add(self, v: Self) -> Self; + fn saturating_sub(self, v: Self) -> Self; +} + +pub trait UnsignedInt: Int { + fn is_power_of_two(self) -> bool; + fn checked_next_power_of_two(self) -> Option; + fn next_power_of_two(self) -> Self; +} + +pub trait Signed: Neg { + fn abs(&self) -> Self; + fn signum(&self) -> Self; + fn is_positive(&self) -> bool; + fn is_negative(&self) -> bool; + + // Deprecated: + // fn abs_sub(&self, other: &Self) -> Self; +} + +pub trait Float: Copy + Clone + PartialOrd + PartialEq + Signed + + Add + Sub + + Mul + Div + Rem +{ + // Constants + fn zero() -> Self; // These should be associated constants when those are available + fn one() -> Self; + fn min_value() -> Self; + fn max_value() -> Self; + + // Classification and decomposition + fn is_nan(self) -> bool; + fn is_infinite(self) -> bool; + fn is_finite(self) -> bool; + fn is_normal(self) -> bool; + fn classify(self) -> FPCategory; + fn integer_decode(self) -> (u64, i16, i8); + + // Float intrinsics + fn floor(self) -> Self; + fn ceil(self) -> Self; + fn round(self) -> Self; + fn trunc(self) -> Self; + fn mul_add(self, a: Self, b: Self) -> Self; + fn sqrt(self) -> Self; + fn powi(self, n: i32) -> Self; + fn powf(self, n: Self) -> Self; + fn exp(self) -> Self; + fn exp2(self) -> Self; + fn ln(self) -> Self; + fn log2(self) -> Self; + fn log10(self) -> Self; + + // Conveniences + fn fract(self) -> Self; + fn recip(self) -> Self; + fn rsqrt(self) -> Self; + fn to_degrees(self) -> Self; + fn to_radians(self) -> Self; + fn log(self, base: Self) -> Self; +} + +// This lives directly in `std::num`, not `core::num`, since it requires `cmath` +pub trait FloatMath: Float { + // Exactly the methods defined in today's version +} +``` + +## Float constants, float math, and `cmath` + +This RFC proposes to: + +* Remove all float constants from the `Float` trait. These constants + are available directly from the `f32` and `f64` modules, and are not + really useful for the kind of generic programming these new traits + are intended to allow. + +* Continue providing various `cmath` functions as methods in the + `FloatMath` trait. Putting this in a separate trait means that + `libstd` depends on `cmath` but `libcore` does not. + +## Free functions + +All of the free functions defined in `std::num` are deprecated. + +## The prelude + +The prelude will only include the `Signed` trait, as the operations it +provides are widely expected to be available when they apply. + +The reason for removing the rest of the traits is two-fold: + +* The remaining operations are relatively uncommon. Note that various + overloaded operators, like `+`, work regardless of this choice. + Those doing intensive work with e.g. floats would only need to + import `Float` and `FloatMath`. + +* Keeping this functionality out of the prelude means that the names + of methods and associated items remain available for external + numerics libraries in the Cargo ecosystem. + +## `strconv`, `FromStr`, `ToStr`, `FromStrRadix`, `ToStrRadix` + +Currently, traits for converting from `&str` and to `String` are both +included, in their own modules, in `libstd`. This is largely due to +the desire to provide `impl`s for numeric types, which in turn relies +on `std::num::strconv`. + +This RFC proposes to: + +* Move the `FromStr` trait into `core::str`. 
+* Rename the `ToStr` trait to `ToString`, and move it to `collections::string`. +* Break up and revise `std::num::strconv` into separate, *private* + modules that provide the needed functionality for the `from_str` and + `to_string` methods. (Some of this functionality has already + migrated to `fmt` and been deprecated in `strconv`.) +* Move the `FromStrRadix` into `core::num`. +* Remove `ToStrRadix`, which is already deprecated in favor of `fmt`. + +## `FromPrimitive` and friends + +Ideally, the `FromPrimitive`, `ToPrimitive`, `Primitive`, `NumCast` +traits would all be removed in favor of a more principled way of +working with C-like enums. However, such a replacement is outside of +the scope of this RFC, so these traits are left (as `#[experimental]`) +for now. A follow-up RFC proposing a better solution should appear soon. + +In the meantime, see +[this proposal](https://github.com/rust-lang/rust/issues/10418) and +the discussion on +[this issue](https://github.com/rust-lang/rust/issues/10272) about +`Ordinal` for the rough direction forward. + +# Drawbacks + +This RFC somewhat reduces the potential for writing generic numeric +code with `std::num` traits. This is intentional, however: the new +design represents "just enough" generics to cover differently-sized +built-in types, without any attempt at general algebraic abstraction. + +# Alternatives + +The status quo is clearly not ideal, and as explained above there was +a long attempt at providing a more complete numeric hierarchy in `std`. +So *some* collapse of the hierarchy seems desirable. + +That said, there are other possible factorings. We could introduce the +following `Num` trait to factor out commonalities between `Int` and `Float`: + +```rust +pub trait Num: Copy + Clone + PartialOrd + PartialEq + + Add + Sub + + Mul + Div + Rem +{ + fn zero() -> Self; // These should be associated constants when those are available + fn one() -> Self; + fn min_value() -> Self; + fn max_value() -> Self; +} +``` + +However, it's not clear whether this factoring is worth having a more +complex hierarchy, especially because the traits are not intended for +generic programming at that level (and generic programming across +integer and floating-point types is likely to be extremely rare) + +The signed and unsigned operations could be offered on more types, +allowing removal of more traits but a less clear-cut semantics. + + +# Unresolved questions + +This RFC does not propose a replacement for +`#[deriving(FromPrimitive)]`, leaving the relevant traits in limbo +status. (See +[this proposal](https://github.com/rust-lang/rust/issues/10418) and +the discussion on +[this issue](https://github.com/rust-lang/rust/issues/10272) about +`Ordinal` for the rough direction forward.) diff --git a/text/0378-expr-macros.md b/text/0378-expr-macros.md new file mode 100644 index 00000000000..34808c403ed --- /dev/null +++ b/text/0378-expr-macros.md @@ -0,0 +1,75 @@ +- Start Date: 2014-10-09 +- RFC PR #: https://github.com/rust-lang/rfcs/pull/378 +- Rust Issue #: https://github.com/rust-lang/rust/issues/18635 + +Summary +======= + +Parse macro invocations with parentheses or square brackets as expressions no +matter the context, and require curly braces or a semicolon following the +invocation to invoke a macro as a statement. + +Motivation +========== + +Currently, macros that start a statement want to be a whole statement, and so +expressions such as `foo!().bar` don’t parse if they start a statement. 
The +reason for this is because sometimes one wants a macro that expands to an item +or statement (for example, `macro_rules!`), and forcing the user to add a +semicolon to the end is annoying and easy to forget for long, multi-line +statements. However, the vast majority of macro invocations are not intended to +expand to an item or statement, leading to frustrating parser errors. + +Unfortunately, this is not as easy to resolve as simply checking for an infix +operator after every statement-like macro invocation, because there exist +operators that are both infix and prefix. For example, consider the following +function: + +```rust +fn frob(x: int) -> int { + maybe_return!(x) + // Provide a default value + -1 +} +``` + +Today, this parses successfully. However, if a rule were added to the parser +that any macro invocation followed by an infix operator be parsed as a single +expression, this would still parse successfully, but not in the way expected: it +would be parsed as `(maybe_return!(x)) - 1`. This is an example of how it is +impossible to resolve this ambiguity properly without breaking compatibility. + +Detailed design +=============== + +Treat all macro invocations with parentheses, `()`, or square brackets, `[]`, as +expressions, and never attempt to parse them as statements or items in a block +context unless they are followed directly by a semicolon. Require all +item-position macro invocations to be either invoked with curly braces, `{}`, or +be followed by a semicolon (for consistency). + +This distinction between parentheses and curly braces has precedent in Rust: +tuple structs, which use parentheses, must be followed by a semicolon, while +structs with fields do not need to be followed by a semicolon. Many constructs +like `match` and `if`, which use curly braces, also do not require semicolons +when they begin a statement. + +Drawbacks +========= + +- This introduces a difference between different macro invocation delimiters, + where previously there was no difference. +- This requires the use of semicolons in a few places where it was not necessary + before. + +Alternatives +============ + +- Require semicolons after all macro invocations that aren’t being used as + expressions. This would have the downside of requiring semicolons after every + `macro_rules!` declaration. + +Unresolved questions +==================== + +None. diff --git a/text/0379-remove-reflection.md b/text/0379-remove-reflection.md new file mode 100644 index 00000000000..dea6b12a198 --- /dev/null +++ b/text/0379-remove-reflection.md @@ -0,0 +1,92 @@ +- Start Date: 2014-10-13 +- RFC PR: [rust-lang/rfcs#379](https://github.com/rust-lang/rfcs/pull/379) +- Rust Issue: [rust-lang/rust#18046](https://github.com/rust-lang/rust/issues/18046) + +# Summary + +* Remove reflection from the compiler +* Remove `libdebug` +* Remove the `Poly` format trait as well as the `:?` format specifier + +# Motivation + +In ancient Rust, one of the primary methods of printing a value was via the `%?` +format specifier. This would use reflection at runtime to determine how to print +a type. Metadata generated by the compiler (a `TyDesc`) would be generated to +guide the runtime in how to print a type. One of the great parts about +reflection was that it was quite easy to print any type. No extra burden was +required from the programmer to print something. 
+ +There are, however, a number of cons to this approach: + +* Generating extra metadata for many many types by the compiler can lead to + noticeable increases in compile time and binary size. +* This form of formatting is inherently not speedy. Widespread usage of `%?` led + to misleading benchmarks about formatting in Rust. +* Depending on how metadata is handled, this scheme makes it very difficult to + allow recompiling a library without recompiling downstream dependants. + +Over time, usage off the `?` formatting has fallen out of fashion for the +following reasons: + +* The `deriving`-based infrastructure was improved greatly and has started + seeing much more widespread use, especially for traits like `Clone`. +* The formatting language implementation and syntax has changed. The most common + formatter is now `{}` (an implementation of `Show`), and it is quite common to + see an implementation of `Show` on nearly all types (frequently via + `deriving`). This form of customizable-per-typformatting largely provides the + gap that the original formatting language did not provide, which was limited + to only primitives and `%?`. +* Compiler built-ins, such as `~[T]` and `~str` have been removed from the + language, and runtime reflection on `Vec` and `String` are far less useful + (they just print pointers, not contents). + +As a result, the `:?` formatting specifier is quite rarely used today, and +when it *is* used it's largely for historical purposes and the output is not of +very high quality any more. + +The drawbacks and today's current state of affairs motivate this RFC to +recommend removing this infrastructure entirely. It's possible to add it back in +the future with a more modern design reflecting today's design principles of +Rust and the many language changes since the infrastructure was created. + +# Detailed design + +* Remove all reflection infrastructure from the compiler. I am not personally + super familiar with what exists, but at least these concrete actions will be + taken. + * Remove the `visit_glue` function from `TyDesc`. + * Remove any form of `visit_glue` generation. + * (maybe?) Remove the `name` field of `TyDesc`. +* Remove `core::intrinsics::TyVisitor` +* Remove `core::intrinsics::visit_tydesc` +* Remove `libdebug` +* Remove `std::fmt::Poly` +* Remove the `:?` format specifier in the formatting language syntax. + +# Drawbacks + +The current infrastructure for reflection, although outdated, represents a +significant investment of work in the past which could be a shame to lose. While +present in the git history, this infrastructure has been updated over time, and +it will no longer receive this attention. + +Additionally, given an arbitrary type `T`, it would now be impossible to print +it in literally any situation. Type parameters will now require some bound, such +as `Show`, to allow printing a type. + +These two drawbacks are currently not seen as large enough to outweigh the gains +from reducing the surface area of the `std::fmt` API and reduction in +maintenance load on the compiler. + +# Alternatives + +The primary alternative to outright removing this infrastructure is to preserve +it, but flag it all as `#[experimental]` or feature-gated. The compiler could +require the `fmt_poly` feature gate to be enabled to enable formatting via `:?` +in a crate. This would mean that any backwards-incompatible changes could +continue to be made, and any arbitrary type `T` could still be printed. 
+ +# Unresolved questions + +* Can `core::intrinsics::TyDesc` be removed entirely? diff --git a/text/0380-stabilize-std-fmt.md b/text/0380-stabilize-std-fmt.md new file mode 100644 index 00000000000..65c048954be --- /dev/null +++ b/text/0380-stabilize-std-fmt.md @@ -0,0 +1,478 @@ +- Start Date: 2014-11-12 +- RFC PR: [rust-lang/rfcs#380](https://github.com/rust-lang/rfcs/pull/380) +- Rust Issue: [rust-lang/rust#18904](https://github.com/rust-lang/rust/issues/18904) + +# Summary + +Stabilize the `std::fmt` module, in addition to the related macros and +formatting language syntax. As a high-level summary: + +* Leave the format syntax as-is. +* Remove a number of superfluous formatting traits (renaming a few in the + process). + +# Motivation + +This RFC is primarily motivated by the need to stabilize `std::fmt`. In the past +stabilization has not required RFCs, but the changes envisioned for this module +are far-reaching and modify some parts of the language (format syntax), leading +to the conclusion that this stabilization effort required an RFC. + +# Detailed design + +The `std::fmt` module encompasses more than just the actual +structs/traits/functions/etc defined within it, but also a number of macros and +the formatting language syntax for describing format strings. Each of these +features of the module will be described in turn. + +## Formatting Language Syntax + +The [documented syntax](http://doc.rust-lang.org/std/fmt/#syntax) will not be +changing as-written. All of these features will be accepted wholesale +(considered stable): + +* Usage of `{}` for "format something here" placeholders +* `{{` as an escape for `{` (and vice-versa for `}`) +* Various format specifiers + * fill character for alignment + * actual alignment, left (`<`), center (`^`), and right (`>`). + * sign to print (`+` or `-`) + * minimum width for text to be printed + * both a literal count and a runtime argument to the format string + * precision or maximum width + * all of a literal count, a specific runtime argument to the format string, + and "the next" runtime argument to the format string. + * "alternate formatting" (`#`) + * leading zeroes (`0`) +* Integer specifiers of what to format (`{0}`) +* Named arguments (`{foo}`) + +### Using Format Specifiers + +While quite useful occasionally, there is no static guarantee that any +implementation of a formatting trait actually respects the format specifiers +passed in. For example, this code does not necessarily work as expected: + +```rust +#[deriving(Show)] +struct A; + +format!("{:10}", A); +``` + +All of the primitives for rust (strings, integers, etc) have implementations of +`Show` which respect these formatting flags, but almost no other implementations +do (notably those generated via `deriving`). + +This RFC proposes stabilizing the formatting flags, despite this current state +of affairs. There are in theory possible alternatives in which there is a +static guarantee that a type does indeed respect format specifiers when one is +provided, generating a compile-time error when a type doesn't respect a +specifier. These alternatives, however, appear to be too heavyweight and are +considered somewhat overkill. + +In general it's trivial to respect format specifiers if an implementation +delegates to a primitive or somehow has a buffer of what's to be formatted. To +cover these two use cases, the `Formatter` structure passed around has helper +methods to assist in formatting these situations. 
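+
+For instance, an implementation that wraps a string can honor the caller's
+fill, alignment, width, and precision flags simply by delegating to
+`Formatter::pad` (a sketch; the `Ascii` wrapper type is made up for this
+example):
+
+```rust
+use std::fmt;
+
+struct Ascii(String);
+
+impl fmt::Show for Ascii {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        // `pad` applies the format specifiers supplied by the caller, so
+        // `format!("{:10}", Ascii("hi".to_string()))` pads like `&str` does.
+        f.pad(&self.0)
+    }
+}
+```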
This is, however, quite rare +to fall into one of these two buckets, so the specifiers are largely ignored +(and the formatter is `write!`-n to directly). + +### Named Arguments + +Currently Rust does not support named arguments anywhere *except* for format +strings. Format strings can get away with it because they're all part of a macro +invocation (unlike the rest of Rust syntax). + +The worry for stabilizing a named argument syntax for the formatting language is +that if Rust ever adopts named arguments with a *different* syntax, it would be +quite odd having two systems. + +The most recently proposed [keyword argument +RFC](https://github.com/rust-lang/rfcs/pull/257) used `:` for the invocation +syntax rather than `=` as formatting does today. Additionally, today `foo = bar` +is a valid expression, having a value of type `()`. + +With these worries, there are one of two routes that could be pursued: + +1. The `expr = expr` syntax could be disallowed on the language level. This + could happen both in a total fashion or just allowing the expression + appearing as a function argument. For both cases, this will probably be + considered a "wart" of Rust's grammar. +2. The `foo = bar` syntax could be allowed in the macro with prior knowledge + that the default argument syntax for Rust, if one is ever developed, will + likely be different. This would mean that the `foo = bar` syntax in + formatting macros will likely be considered a wart in the future. + +Given these two cases, the clear choice seems to be accepting a wart in the +formatting macros themselves. It will likely be possible to extend the macro in +the future to support whatever named argument syntax is developed as well, and +the old syntax could be accepted for some time. + +## Formatting Traits + +Today there are 16 formatting traits. Each trait represents a "type" of +formatting, corresponding to the `[type]` production in the formatting syntax. +As a bit of history, the original intent was for each trait to declare what +specifier it used, allowing users to add more specifiers in newer crates. For +example the `time` crate could provide the `{:time}` formatting trait. This +design was seen as too complicated, however, so it was not landed. It does, +however, partly motivate why there is one trait per format specifier today. + +The 16 formatting traits and their format specifiers are: + +* *nothing* ⇒ `Show` +* `d` ⇒ `Signed` +* `i` ⇒ `Signed` +* `u` ⇒ `Unsigned` +* `b` ⇒ `Bool` +* `c` ⇒ `Char` +* `o` ⇒ `Octal` +* `x` ⇒ `LowerHex` +* `X` ⇒ `UpperHex` +* `s` ⇒ `String` +* `p` ⇒ `Pointer` +* `t` ⇒ `Binary` +* `f` ⇒ `Float` +* `e` ⇒ `LowerExp` +* `E` ⇒ `UpperExp` +* `?` ⇒ `Poly` + +This RFC proposes removing the following traits: + +* `Signed` +* `Unsigned` +* `Bool` +* `Char` +* `String` +* `Float` + +Note that this RFC would like to remove `Poly`, but that is covered by [a +separate RFC](https://github.com/rust-lang/rfcs/pull/379). + +Today by far the most common formatting trait is `Show`, and over time the +usefulness of these formatting traits has been reduced. The traits this RFC +proposes to remove are only assertions that the type provided actually +implements the trait, there are few known implementations of the traits which +diverge on how they are implemented. + +Additionally, there are a two of oddities inherited from ancient C: + +* Both `d` and `i` are wired to `Signed` +* One may reasonable expect the `Binary` trait to use `b` as its specifier. + +The remaining traits this RFC recommends leaving. 
The rationale for this is that +they represent alternate representations of primitive types in general, and are +also quite often expected when coming from other format syntaxes such as +C/Python/Ruby/etc. + +It would, of course, be possible to re-add any of these traits in a +backwards-compatible fashion. + +### Format type for `Binary` + +With the removal of the `Bool` trait, this RFC recommends renaming the specifier +for `Binary` to `b` instead of `t`. + +### Combining all traits + +A possible alternative to having many traits is to instead have one trait, such +as: + +```rust +pub trait Show { + fn fmt(...); + fn hex(...) { fmt(...) } + fn lower_hex(...) { fmt(...) } + ... +} +``` + +There are a number of pros to this design: + +* Instead of having to consider many traits, only one trait needs to be + considered. +* All types automatically implement all format types or zero format types. +* In a hypothetical world where a format string could be constructed at runtime, + this would alleviate the signature of such a function. The concrete type taken + for all its arguments would be `&Show` and then if the format string supplied + `:x` or `:o` the runtime would simply delegate to the relevant trait method. + +There are also a number of cons to this design, which motivate this RFC +recommending the remaining separation of these traits. + +* The "static assertion" that a type implements a relevant format trait becomes + almost nonexistent because all types either implement none or all formatting + traits. +* The documentation for the `Show` trait becomes somewhat overwhelming because + it's no longer immediately clear which method should be overridden for what. +* A hypothetical world with runtime format string construction could find a + different system for taking arguments. + +### Method signature + +Currently, each formatting trait has a signature as follows: + +```rust +fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result; +``` + +This implies that all formatting is considered to be a stream-oriented operation +where `f` is a sink to write bytes to. The `fmt::Result` type indicates that +some form of "write error" happened, but conveys no extra information. + +This API has a number of oddities: + +* The type `Formatter` has inherent `write` and `write_fmt` methods to be used + in conjuction with the `write!` macro return an instance of `fmt::Result`. +* The `Formatter` type also implements the `std::io::Writer` trait in order to + be able to pass around a `&mut Writer`. +* This relies on the duck-typing of macros and for the inherent `write_fmt` + method to trump the `Writer`'s `write_fmt` method in order to return an error + of the correct type. +* The `Result` return type is an enumeration with precisely one variant, + `FormatError`. + +Overall, this signature seems to be appropriate in terms of "give me a sink of +bytes to write myself to, and let me return an error if one happens". Due to +this, this RFC recommends that all formatting traits be marked `#[unstable]`. + +## Macros + +There are a number of prelude macros which interact with the format syntax: + +* `format_args` +* `format_args_method` +* `write` +* `writeln` +* `print` +* `println` +* `format` +* `fail` +* `assert` +* `debug_assert` + +All of these are `macro_rules!`-defined macros, except for `format_args` and +`format_args_method`. + +### Common syntax + +All of these macros take some form of prefix, while the trailing suffix is +always some instantiation of the formatting syntax. 
The suffix portion is +recommended to be considered `#[stable]`, and the sections below will discuss +each macro in detail with respect to its prefix and semantics. + +### format_args + +The fundamental purpose of this macro is to generate a value of type +`&fmt::Arguments` which represents a pending format computation. This structure +can then be passed at some point to the methods in `std::fmt` to actually +perform the format. + +The prefix of this macro is some "callable thing", be it a top-level function or +a closure. It cannot invoke a method because `foo.bar` is not a "callable thing" +to call the `bar` method on `foo`. + +Ideally, this macro would have no prefix, and would be callable like: + +```rust +use std::fmt; + +let args = format_args!("Hello {}!", "world"); +let hello_world = fmt::format(args); +``` + +Unfortunately, without an implementation of [RFC 31][rfc-31] this is not +possible. As a result, this RFC proposes a `#[stable]` consideration of this +macro and its syntax. + +[rfc-31]: https://github.com/rust-lang/rfcs/blob/master/active/0031-better-temporary-lifetimes.md + +### format_args_method + +The purpose of this macro is to solve the "call this method" case not covered +with the `format_args` macro. This macro was introduced fairly late in the game +to solve the problem that `&*trait_object` was not allowed. This is currently +allowed, however (due to DST). + +This RFC proposes immediately removing this macro. The primary user of this +macro is `write!`, meaning that the following code, which compiles today, would +need to be rewritten: + +```rust +let mut output = std::io::stdout(); +// note the lack of `&mut` in front +write!(output, "hello {}", "world"); +``` + +The `write!` macro would be redefined as: + +```rust +macro_rules! write( + ($dst:expr, $($arg:tt)*) => ({ + let dst = &mut *$dst; + format_args!(|args| { dst.write_fmt(args) }, $($arg)*) + }) +) +``` + +The purpose here is to borrow `$dst` *outside* of the closure to ensure that the +closure doesn't borrow too many of its contents. Otherwise, code such as this +would be disallowed + +```rust +write!(&mut my_struct.writer, "{}", my_struct.some_other_field); +``` + +### write/writeln + +These two macros take the prefix of "some pointer to a writer" as an argument, +and then format data into the write (returning whatever `write_fmt` returns). +These macros were originally designed to require a `&mut T` as the first +argument, but today, due to the usage of `format_args_method`, they can take any +`T` which responds to `write_fmt`. + +This RFC recommends marking these two macros `#[stable]` with the modification +above (removing `format_args_method`). The `ln` suffix to `writeln` will be +discussed shortly. + +### print/println + +These two macros take no prefix, and semantically print to a *task-local* stdout +stream. The purpose of a task-local stream is provide some form of buffering to +make stdout printing at all performant. + +This RFC recommends marking these two macros a `#[stable]`. + +#### The `ln` suffix + +The name `println` is one of the few locations in Rust where a short C-like +abbreviation is accepted rather than the more verbose, but clear, `print_line` +(for example). Due to the overwhelming precedent of other languages (even Java +uses `println`!), this is seen as an acceptable special case to the rule. + +### format + +This macro takes no prefix and returns a `String`. + +In ancient rust this macro was called its shorter name, `fmt`. 
Additionally, the +name `format` is somewhat inconsistent with the module name of `fmt`. Despite +this, this RFC recommends considering this macro `#[stable]` due to its +delegation to the `format` method in the `std::fmt` module, similar to how the +`write!` macro delegates to the `fmt::write`. + +### fail/assert/debug_assert + +The format string portions of these macros are recommended to be considered as +`#[stable]` as part of this RFC. The actual stability of the macros is not +considered as part of this RFC. + +## Freestanding Functions + +There are a number of [freestanding +functions](http://doc.rust-lang.org/std/fmt/index.html#functions) to consider in +the `std::fmt` module for stabilization. + +* `fn format(args: &Arguments) -> String` + + This RFC recommends `#[experimental]`. This method is largely an + implementation detail of this module, and should instead be used via: + + ```rust + let args: &fmt::Arguments = ...; + format!("{}", args) + ``` + +* `fn write(output: &mut FormatWriter, args: &Arguments) -> Result` + + This is somewhat surprising in that the argument to this function is not a + `Writer`, but rather a `FormatWriter`. This is technically speaking due to the + core/std separation and how this function is defined in core and `Writer` is + defined in std. + + This RFC recommends marking this function `#[experimental]` as the + `write_fmt` exists on `Writer` to perform the corresponding operation. + Consequently we may wish to remove this function in favor of the `write_fmt` + method on `FormatWriter`. + + Ideally this method would be removed from the public API as it is just an + implementation detail of the `write!` macro. + +* `fn radix(x: T, base: u8) -> RadixFmt` + + This function is a bit of an odd-man-out in that it is a constructor, but does + not follow the existing conventions of `Type::new`. The purpose of this + function is to expose the ability to format a number for any radix. The + default format specifiers `:o`, `:x`, and `:t` are essentially shorthands for + this function, except that the format types have specialized implementations + per radix instead of a generic implementation. + + This RFC proposes that this function be considered `#[unstable]` as its + location and naming are a bit questionable, but the functionality is desired. + +## Miscellaneous items + +* `trait FormatWriter` + + This trait is currently the actual implementation strategy of formatting, and + is defined specially in libcore. It is rarely used outside of libcore. It is + recommended to be `#[experimental]`. + + There are possibilities in moving `Reader` and `Writer` to libcore with the + error type as an associated item, allowing the `FormatWriter` trait to be + eliminated entirely. Due to this possibility, the trait will be experimental + for now as alternative solutions are explored. + +* `struct Argument`, `mod rt`, `fn argument`, `fn argumentstr`, + `fn argumentuint`, `Arguments::with_placeholders`, `Arguments::new` + + These are implementation details of the `Arguments` structure as well as the + expansion of the `format_args!` macro. It's recommended to mark these as + `#[experimental]` and `#[doc(hidden)]`. Ideally there would be some form of + macro-based privacy hygiene which would allow these to be truly private, but + it will likely be the case that these simply become stable and we must live + with them forever. + +* `struct Arguments` + + This is a representation of a "pending format string" which can be used to + safely execute a `Formatter` over it. 
This RFC recommends `#[stable]`.
+
+* `struct Formatter`
+
+  This instance is passed to all formatting trait methods and contains helper
+  methods for respecting formatting flags. This RFC recommends `#[unstable]`.
+
+  This RFC also recommends deprecating all public fields in favor of accessor
+  methods. This should help provide future extensibility as well as preventing
+  unnecessary mutation in the future.
+
+* `enum FormatError`
+
+  This enumeration only has one instance, `WriteError`. It is recommended to
+  make this a `struct` instead and rename it to just `Error`. The purpose of
+  this is to signal that an error has occurred as part of formatting, but it
+  does not provide a generic method to transmit any information other than
+  "an error happened" to maintain the ergonomics of today's usage. It's strongly
+  recommended that implementations of `Show` and friends are infallible and only
+  generate an error if the underlying `Formatter` returns an error itself.
+
+* `Radix`/`RadixFmt`
+
+  Like the `radix` function, this RFC recommends `#[unstable]` for both of these
+  pieces of functionality.
+
+# Drawbacks
+
+Today's macro system necessitates exporting many implementation details of the
+formatting system, which is unfortunate.
+
+# Alternatives
+
+A number of alternatives were laid out in the detailed description for various
+aspects.
+
+# Unresolved questions
+
+* How feasible and/or important is it to construct a format string at runtime
+  given the recommended stability levels in this RFC?
diff --git a/text/0385-module-system-cleanup.md b/text/0385-module-system-cleanup.md
new file mode 100644
index 00000000000..875c0780e14
--- /dev/null
+++ b/text/0385-module-system-cleanup.md
@@ -0,0 +1,188 @@
+# Module system cleanups
+
+- Start Date: 2014-10-10
+- RFC PR: [rust-lang/rfcs#385](https://github.com/rust-lang/rfcs/pull/385)
+- Rust Issue: [rust-lang/rust#18219](https://github.com/rust-lang/rust/issues/18219)
+
+# Summary
+
+- Lift the hard ordering restriction between `extern crate`, `use` and other items.
+- Allow `pub extern crate` as opposed to only private ones.
+- Allow `extern crate` in blocks/functions, and not just in modules.
+
+# Motivation
+
+The main motivation is consistency and simplicity:
+None of the changes proposed here change the module system in any meaningful way,
+they just remove weird forbidden corner cases that are all already possible to express today with workarounds.
+
+Thus, they make it easier to learn the system for beginners, and easier for developers to evolve their module hierarchies.
+
+## Lifting the ordering restriction between `extern crate`, `use` and other items.
+
+Currently, certain items need to be written in a fixed order: first all `extern crate`, then all `use` and then all other items.
+This has historical reasons, due to the older, more complex resolution algorithm, under which shadowing was allowed between those items in that order,
+and usability reasons, as it makes it easy to locate imports and library dependencies.
+
+However, after [RFC 50](https://github.com/rust-lang/rfcs/blob/master/complete/0050-no-module-shadowing.md) got accepted, there
+is only ever one item name in scope from any given source, so the historical "hard" reasons lose validity:
+Any resolution algorithm that used to first process all `extern crate`, then all `use` and then all items can still do so, it
+just has to filter out the relevant items from the whole module body, rather than from sequential sections of it.
+And any usability reasons for keeping the order can be better addressed with conventions and lints, rather than hard parser rules.
+
+(The exceptions here are the special-cased prelude, and globs and macros, which are feature-gated and out of scope for this proposal.)
+
+As it is, today the ordering rule is an unnecessary complication, as it routinely causes beginners to stumble over things like this:
+
+```rust
+mod foo;
+use foo::bar; // ERROR: Imports have to precede items
+```
+
+In addition, it doesn't even prevent certain patterns, as it is possible to work around the order restriction by using a submodule:
+
+```rust
+struct Foo;
+// One of many ways to expose the crate out of order:
+mod bar { extern crate bar; pub use self::bar::x; pub use self::bar::y; ... }
+```
+
+Which, with this RFC implemented, would be identical to:
+
+```rust
+struct Foo;
+extern crate bar;
+```
+
+Another use case is item macros/attributes that want to automatically include their crate dependencies.
+This is possible by having the macro expand to an item that links to the needed crate, e.g. like this:
+
+```rust
+#[my_attribute]
+struct UserType;
+```
+
+Expands to:
+
+```rust
+struct UserType;
+extern crate "MyCrate" as <gensym>;
+impl <gensym>::MyTrait for UserType { ... }
+```
+
+With the order restriction still in place, this requires the submodule workaround, which is unnecessarily verbose.
+
+As an example, [gfx-rs](https://github.com/gfx-rs/gfx-rs) currently employs this strategy.
+
+## Allow `pub extern crate` as opposed to only private ones.
+
+`extern crate` semantically is somewhere between `use`ing a module, and declaring one with `mod`,
+and is identical to both as far as the module path to it is concerned.
+As such, it's surprising that it's not possible to declare an `extern crate` as public,
+even though you can still make it so with a reexport:
+
+```rust
+mod foo {
+    extern crate "bar" as bar_;
+    pub use bar_ as bar;
+}
+```
+
+While it's generally not necessary to export an extern library directly, the need for it does arise
+occasionally during refactorings of huge crate collections,
+generally if a public module gets turned into its own crate.
+
+As an example, the author recalls stumbling over it during a refactoring of gfx-rs.
+
+## Allow `extern crate` in blocks/functions, and not just in modules.
+
+Similar to the point above, it's currently possible to both import and declare a module in a
+block expression or function body, but not to link to a library:
+
+```rust
+fn foo() {
+    let x = {
+        extern crate qux; // ERROR: Extern crate not allowed here
+        use bar::baz; // OK
+        mod bar { ... }; // OK
+        qux::foo()
+    };
+}
+```
+
+This is again an unnecessary restriction considering that you can declare modules and imports there,
+and thus can make an extern library reachable at that point:
+
+```rust
+fn foo() {
+    let x = {
+        mod qux { extern crate "qux" as qux_; pub use self::qux_ as qux; }
+        qux::foo()
+    };
+}
+```
+
+This again benefits macros and gives the developer the power to place external dependencies
+only needed for a single function lexically near it.
+
+## General benefits
+
+In general, the simplification and freedom added by these changes
+would positively affect the docs of Rust's module system (which is already often regarded as too complex by outsiders),
+and possibly admit other simplifications or RFCs based on the now-equality of view items and items in the module system.
+ +(As an example, the author is considering an RFC about merging the `use` and `type` features; +by lifting the ordering restriction they become more similar and thus more redundant) + +This also does not have to be a 1.0 feature, as it is entirely backwards compatible to implement, +and strictly allows more programs to compile than before. +However, as alluded to above it might be a good idea for 1.0 regardless + +# Detailed design + +- Remove the ordering restriction from resolve +- If necessary, change resolve to look in the whole scope block for view items, not just in a prefix of it. +- Make `pub extern crate` parse and teach privacy about it +- Allow `extern crate` view items in blocks + +# Drawbacks + +- The source of names in scope might be harder to track down +- Similarly, it might become confusing to see when a library dependency exist. + +However, these issues already exist today in one form or another, and can be addressed by proper +docs that make library dependencies clear, and by the fact that definitions are generally greppable in a file. + +# Alternatives + +As this just cleans up a few aspects of the module system, there isn't really an alternative +apart from not or only partially implementing it. + +By not implementing this proposal, the module system remains more complex for the user than necessary. + +# Unresolved questions + +- Inner attributes occupy the same syntactic space as items and view items, and are currently + also forced into a given order by needing to be written first. + This is also potentially confusing or restrictive for the same reasons as for the view items + mentioned above, especially in regard to the build-in crate attributes, and has one big issue: + It is currently not possible to load a syntax extension + that provides an crate-level attribute, as with the current macro system this would have to be written like this: + + ``` + #[phase(plugin)] + extern crate mycrate; + #![myattr] + ``` + + Which is impossible to write due to the ordering restriction. + However, as attributes and the macro system are also not finalized, this has not been included in + this RFC directly. +- This RFC does also explicitly not talk about wildcard imports and macros in regard to resolution, + as those are feature gated today and likely subject to change. In any case, it seems unlikely that + they will conflict with the changes proposed here, as macros would likely follow + the same module system rules where possible, and wildcard imports would + either be removed, or allowed in a way that doesn't conflict with explicitly imported names to + prevent compilation errors on upstream library changes (new public item may not conflict with downstream items). diff --git a/text/0387-higher-ranked-trait-bounds.md b/text/0387-higher-ranked-trait-bounds.md new file mode 100644 index 00000000000..0a716ffc8b7 --- /dev/null +++ b/text/0387-higher-ranked-trait-bounds.md @@ -0,0 +1,283 @@ +- Start Date: 2014-10-10 +- RFC PR: [rust-lang/rfcs#387](https://github.com/rust-lang/rfcs/pull/387) +- Rust Issue: [rust-lang/rust#18639](https://github.com/rust-lang/rust/issues/18639) + +# Summary + +- Add the ability to have trait bounds that are polymorphic over lifetimes. + +# Motivation + +Currently, closure types can be polymorphic over lifetimes. But +closure types are deprecated in favor of traits and object types as +part of RFC #44 (unboxed closures). We need to close the gap. 
+The canonical example of where you want this is if you would like a
+closure that accepts a reference with any lifetime. For example,
+today you might write:
+
+```rust
+fn with(callback: |&Data|) {
+    let data = Data { ... };
+    callback(&data)
+}
+```
+
+If we try to write this using unboxed closures today, we have a problem:
+
+```
+fn with<'a, T>(callback: T)
+    where T : FnMut(&'a Data)
+{
+    let data = Data { ... };
+    callback(&data)
+}
+
+// Note that the `()` syntax is shorthand for the following:
+fn with<'a, T>(callback: T)
+    where T : FnMut<(&'a Data,),()>
+{
+    let data = Data { ... };
+    callback(&data)
+}
+```
+
+The problem is that the argument type `&'a Data` must include a
+lifetime, and there is no lifetime one could write in the fn sig that
+represents "the stack frame of the `with` function". Naturally
+we have the same problem if we try to use an `FnMut` object (which is
+the closer analog to the original closure example):
+
+```rust
+fn with<'a>(callback: &mut FnMut(&'a Data))
+{
+    let data = Data { ... };
+    callback(&data)
+}
+
+fn with<'a>(callback: &mut FnMut<(&'a Data,),()>)
+{
+    let data = Data { ... };
+    callback(&data)
+}
+```
+
+Under this proposal, you would be able to write this code as follows:
+
+```
+// Using the FnMut(&Data) notation, the &Data is
+// in fact referencing an implicit bound lifetime, just
+// as with closures today.
+fn with<T>(callback: T)
+    where T : FnMut(&Data)
+{
+    let data = Data { ... };
+    callback(&data)
+}
+
+// If you prefer, you can use an explicit name,
+// introduced by the `for<'a>` syntax.
+fn with<T>(callback: T)
+    where T : for<'a> FnMut(&'a Data)
+{
+    let data = Data { ... };
+    callback(&data)
+}
+
+// No sugar at all.
+fn with<T>(callback: T)
+    where T : for<'a> FnMut<(&'a Data,),()>
+{
+    let data = Data { ... };
+    callback(&data)
+}
+```
+
+And naturally the object form(s) work as well:
+
+```rust
+// The preferred notation, using `()`, again introduces
+// implicit binders for omitted lifetimes:
+fn with(callback: &mut FnMut(&Data))
+{
+    let data = Data { ... };
+    callback(&data)
+}
+
+// Explicit names work too.
+fn with(callback: &mut for<'a> FnMut(&'a Data))
+{
+    let data = Data { ... };
+    callback(&data)
+}
+
+// The fully explicit notation requires an explicit `for`,
+// as before, to declare the bound lifetimes.
+fn with(callback: &mut for<'a> FnMut<(&'a Data,),()>)
+{
+    let data = Data { ... };
+    callback(&data)
+}
+```
+
+The syntax for `fn` types must be updated as well to use `for`.
+
+# Detailed design
+
+## For syntax
+
+We modify the grammar for a trait reference to include
+
+    for<'a...'z> Trait<T1, ..., Tn>
+    for<'a...'z> Trait(T1, ..., tn) -> Tr
+
+This syntax can be used in where clauses and types. The `for` syntax
+is not permitted in impls nor in qualified paths (`<T as Trait>`). In
+impls, the distinction between early and late-bound lifetimes is
+inferred. In qualified paths, which are used to select a member from
+an impl, no bound lifetimes are permitted.
+
+## Update syntax of fn types
+
+The existing bare fn types will be updated to use the same `for`
+notation. Therefore, `<'a> fn(&'a int)` becomes `for<'a> fn(&'a int)`.
+
+## Implicit binders when using parentheses notation and in fn types
+
+When using the `Trait(T1, ..., Tn)` notation, implicit binders are
+introduced for omitted lifetimes. In other words, `FnMut(&int)` is
+effectively shorthand for `for<'a> FnMut(&'a int)`, which is itself
+shorthand for `for<'a> FnMut<(&'a int,),()>`. No implicit binders are
+introduced when not using the parentheses notation (i.e.,
+`Trait`).
+These binders interact with lifetime elision in
+the usual way, and hence `FnMut(&Foo) -> &Bar` is shorthand for
+`for<'a> FnMut(&'a Foo) -> &'a Bar`. The same holds (and already
+holds today) for fn types.
+
+## Distinguishing early vs late bound lifetimes in impls
+
+We will distinguish early vs late-bound lifetimes on impls in the same
+way as we do for fns. Background on this process can be found in these
+two blog posts \[[1][1], [2][2]\]. The basic idea is to distinguish
+early-bound lifetimes, which must be substituted immediately, from
+late-bound lifetimes, which can be made into a higher-ranked trait
+reference.
+
+The rule is that any lifetime parameter `'x` declared on an impl is
+considered *early bound* if `'x` appears in any of the following locations:
+
+- the self type of the impl;
+- a where clause associated with the impl (here we assume that all bounds on
+  impl parameters are desugared into where clauses).
+
+All other lifetimes are considered *late bound*.
+
+When we decide what kind of trait-reference is *provided* by an impl,
+late bound lifetimes are moved into a `for` clause attached to the
+reference. Here are some examples:
+
+```rust
+// Here 'late does not appear in any where clause nor in the self type,
+// and hence it is late-bound. Thus this impl is considered to provide:
+//
+//     SomeType : for<'late> FnMut<(&'late Foo,),()>
+impl<'late> FnMut(&'late Foo) -> Bar for SomeType { ... }
+
+// Here 'early appears in the self type and hence it is early bound.
+// This impl thus provides:
+//
+//     SomeOtherType<'early> : FnMut<(&'early Foo,),()>
+impl<'early> FnMut(&'early Foo) -> Bar for SomeOtherType<'early> { ... }
+```
+
+This means that if there were a consumer that required a type which
+implemented `FnMut(&Foo)`, only `SomeType` could be used, not
+`SomeOtherType`:
+
+```rust
+fn foo<T>(t: T) where T : FnMut(&Foo) { ... }
+
+foo::<SomeType>(...) // ok
+foo::<SomeOtherType<'static>>(...) // not ok
+```
+
+[1]: http://smallcultfollowing.com/babysteps/blog/2013/10/29/intermingled-parameter-lists/
+[2]: http://smallcultfollowing.com/babysteps/blog/2013/11/04/intermingled-parameter-lists/
+
+## Instantiating late-bound lifetimes in a trait reference
+
+Whenever an associated item from a trait reference is accessed, all
+late-bound lifetimes are instantiated. Basically, this happens when a
+method is called, and so forth. Here are some examples:
+
+    fn foo<'b,T:for<'a> FnMut(&'a &'b Foo)>(t: T) {
+        t(...); // here, 'a is freshly instantiated
+        t(...); // here, 'a is freshly instantiated again
+    }
+
+Other times when a late-bound lifetime would be instantiated:
+
+- Accessing an associated constant, once those are implemented.
+- Accessing an associated type.
+
+Another way to state these rules is that bound lifetimes are not
+permitted in the traits found in qualified paths -- and things like
+method calls and accesses to associated items can all be desugared
+into calls via qualified paths. For example, the call `t(...)` above
+is equivalent to:
+
+    fn foo<'b,T:for<'a> FnMut(&'a &'b Foo)>(t: T) {
+        // Here, per the usual rules, the omitted lifetime on the outer
+        // reference will be instantiated with a fresh variable.
+        <T as FnMut(&&'b Foo)>::call_mut(&mut t, ...);
+        <T as FnMut(&&'b Foo)>::call_mut(&mut t, ...);
+    }
+
+## Subtyping of trait references
+
+The subtyping rules for trait references that involve higher-ranked
+lifetimes will be defined in an analogous way to the current subtyping
+rules for closures.
The high-level idea is to replace each +higher-ranked lifetime with a skolemized variable, perform the usual +subtyping checks, and then check whether those skolemized variables +would be being unified with anything else. The interested reader is +referred to +[Simon Peyton-Jones rather thorough but quite readable paper on the topic][spj] +or the documentation in +`src/librustc/middle/typeck/infer/region_inference/doc.rs`. + +The most important point is that the rules provide for subtyping that +goes from "more general" to "less general". For example, if I have a +trait reference like `for<'a> FnMut(&'a int)`, that would be usable +wherever a trait reference with a concrete lifetime, like +`FnMut(&'static int)`, is expected. + +[spj]: http://research.microsoft.com/en-us/um/people/simonpj/papers/higher-rank/ + +# Drawbacks + +This feature is needed. There isn't really any particular drawback beyond +language complexity. + +# Alternatives + +**Drop the keyword.** The `for` keyword is used due to potential +ambiguities surrounding UFCS notation. Under UFCS, it is legal to +write e.g. `::Foo::Bar` in a type context. This is awfully close to +something like `<'a> ::std::FnMut`. Currently, the parser could +probably use the lifetime distinction to know the difference, but +future extensions (see next paragraph) could allow types to be used as +well, and it is still possible we will opt to "drop the tick" in +lifetimes. Moreover, the syntax `<'a> FnMut(&'a uint)` is not exactly +beautiful to begin with. + +**Permit higher-ranked traits with type variables.** This RFC limits +"higher-rankedness" to lifetimes. It is plausible to extend the system +in the future to permit types as well, though only in where clauses +and not in types. For example, one might write: + + fn foo(t: IDENTITY) where IDENTITY : for FnMut(U) -> U { ... } + +# Unresolved questions + +None. Implementation is underway though not complete. diff --git a/text/0390-enum-namespacing.md b/text/0390-enum-namespacing.md new file mode 100644 index 00000000000..c24a10a92ad --- /dev/null +++ b/text/0390-enum-namespacing.md @@ -0,0 +1,333 @@ +- Start Date: 2014-07-16 +- RFC PR #: https://github.com/rust-lang/rfcs/pull/390 +- Rust Issue #: https://github.com/rust-lang/rust/issues/18478 + +# Summary + +The variants of an enum are currently defined in the same namespace as the enum +itself. This RFC proposes to define variants under the enum's namespace. + +## Note + +In the rest of this RFC, *flat enums* will be used to refer to the current enum +behavior, and *namespaced enums* will be used to refer to the proposed enum +behavior. + +# Motivation + +Simply put, flat enums are the wrong behavior. They're inconsistent with the +rest of the language and harder to work with. + +## Practicality + +Some people prefer flat enums while others prefer namespaced enums. It is +trivial to emulate flat enums with namespaced enums: +```rust +pub use MyEnum::*; + +pub enum MyEnum { + Foo, + Bar, +} +``` +On the other hand, it is *impossible* to emulate namespaced enums with the +current enum system. It would have to look something like this: +```rust +pub enum MyEnum { + Foo, + Bar, +} + +pub mod MyEnum { + pub use super::{Foo, Bar}; +} +``` +However, it is now forbidden to have a type and module with the same name in +the same namespace. This workaround was one of the rationales for the rejection +of the `enum mod` proposal previously. + +Many of the variants in Rust code today are *already* effectively namespaced, +by manual name mangling. 
As an extreme example, consider the enums in +`syntax::ast`: +```rust +pub enum Item_ { + ItemStatic(...), + ItemFn(...), + ItemMod(...), + ItemForeignMod(...), + ... +} + +pub enum Expr_ { + ExprBox(...), + ExprVec(...), + ExprCall(...), + ... +} + +... +``` +These long names are unavoidable as all variants of the 47 enums in the `ast` +module are forced into the same namespace. + +Going without name mangling is a risky move. Sometimes variants have to be +inconsistently mangled, as in the case of `IoErrorKind`. All variants are +un-mangled (e.g, `EndOfFile` or `ConnectionRefused`) except for one, +`OtherIoError`. Presumably, `Other` would be too confusing in isolation. One +also runs the risk of running into collisions as the library grows. + +## Consistency + +Flat enums are inconsistent with the rest of the language. Consider the set of +items. Some don't have their own names, such as `extern {}` blocks, so items +declared inside of them have no place to go but the enclosing namespace. Some +items do not declare any "sub-names", like `struct` definitions or statics. +Consider all other items, and how sub-names are accessed: +```rust +mod foo { + fn bar() {} +} + +foo::bar() +``` + +```rust +trait Foo { + type T; + + fn bar(); +} + +Foo::T +Foo::bar() +``` + +```rust +impl Foo { + fn bar() {} + fn baz(&self) {} +} + +Foo::bar() +Foo::baz(a_foo) // with UFCS +``` + +```rust +enum Foo { + Bar, +} + +Bar // ?? +``` + +Enums are the odd one out. + +Current Rustdoc output reflects this inconsistency. Pages in Rustdoc map to +namespaces. The documentation page for a module contains all names defined +in its namespace - structs, typedefs, free functions, reexports, statics, +enums, but *not* variants. Those are placed on the enum's own page, next to +the enum's inherent methods which *are* placed in the enum's namespace. In +addition, search results incorrectly display a path for variant results that +contains the enum itself, such as `std::option::Option::None`. These issues +can of course be fixed, but that will require adding more special cases to work +around the inconsistent behavior of enums. + +## Usability + +This inconsistency makes it harder to work with enums compared to other items. + +There are two competing forces affecting the design of libraries. On one hand, +the author wants to limit the size of individual files by breaking the crate +up into multiple modules. On the other hand, the author does not necessarily +want to expose that module structure to consumers of the library, as overly +deep namespace hierarchies are hard to work with. A common solution is to use +private modules with public reexports: +```rust +// lib.rs +pub use inner_stuff::{MyType, MyTrait}; + +mod inner_stuff; + +// a lot of code +``` +```rust +// inner_stuff.rs +pub struct MyType { ... } + +pub trait MyTrait { ... } + +// a lot of code +``` +This strategy does not work for flat enums in general. It is not all that +uncommon for an enum to have *many* variants - for example, take +[`rust-postgres`'s `SqlState` +enum](http://www.rust-ci.org/sfackler/rust-postgres/doc/postgres/error/enum.PostgresSqlState.html), +which contains 232 variants. It would be ridiculous to `pub use` all of them! +With namespaced enums, this kind of reexport becomes a simple `pub use` of the +enum itself. + +Sometimes a developer wants to use many variants of an enum in an "unqualified" +manner, without qualification by the containing module (with flat enums) or +enum (with namespaced enums). 
This is especially common for private, internal +enums within a crate. With flat enums, this is trivial within the module in +which the enum is defined, but very painful anywhere else, as it requires each +variant to be `use`d individually, which can get *extremely* verbose. For +example, take this [from +`rust-postgres`](https://github.com/sfackler/rust-postgres/blob/557a159a8a4a8e33333b06ad2722b1322e95566c/src/lib.rs#L97-L136): +```rust +use message::{AuthenticationCleartextPassword, + AuthenticationGSS, + AuthenticationKerberosV5, + AuthenticationMD5Password, + AuthenticationOk, + AuthenticationSCMCredential, + AuthenticationSSPI, + BackendKeyData, + BackendMessage, + BindComplete, + CommandComplete, + CopyInResponse, + DataRow, + EmptyQueryResponse, + ErrorResponse, + NoData, + NoticeResponse, + NotificationResponse, + ParameterDescription, + ParameterStatus, + ParseComplete, + PortalSuspended, + ReadyForQuery, + RowDescription, + RowDescriptionEntry}; +use message::{Bind, + CancelRequest, + Close, + CopyData, + CopyDone, + CopyFail, + Describe, + Execute, + FrontendMessage, + Parse, + PasswordMessage, + Query, + StartupMessage, + Sync, + Terminate}; +use message::{WriteMessage, ReadMessage}; +``` +A glob import can't be used because it would pull in other, unwanted names from +the `message` module. With namespaced enums, this becomes far simpler: +```rust +use messages::BackendMessage::*; +use messages::FrontendMessage::*; +use messages::{FrontendMessage, BackendMessage, WriteMessage, ReadMessage}; +``` + +# Detailed design + +The compiler's resolve stage will be altered to place the value and type +definitions for variants in their enum's module, just as methods of inherent +impls are. Variants will be handled differently than those methods are, +however. Methods cannot currently be directly imported via `use`, while +variants will be. The determination of importability currently happens at the +module level. This logic will be adjusted to move that determination to the +definition level. Specifically, each definition will track its "importability", +just as it currently tracks its "publicness". All definitions will be +importable except for methods in implementations and trait declarations. + +The implementation will happen in two stages. In the first stage, resolve will +be altered as described above. However, variants will be defined in *both* the +flat namespace and nested namespace. This is necessary t keep the compiler +bootstrapping. + +After a new stage 0 snapshot, the standard library will be ported and resolve +will be updated to remove variant definitions in the flat namespace. This will +happen as one atomic PR to keep the implementation phase as compressed as +possible. In addition, if unforseen problems arise during this set of work, we +can roll back the initial commit and put the change off until after 1.0, with +only a small pre-1.0 change required. This initial conversion will focus on +making the minimal set of changes required to port the compiler and standard +libraries by reexporting variants in the old location. Later work can alter +the APIs to take advantage of the new definition locations. 
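+
+For illustration, here is a small sketch (using a hypothetical `Message` enum,
+not an example from any existing library) of how code refers to variants once
+they are defined in the enum's namespace:
+
+```rust
+pub enum Message {
+    Quit,
+    Echo(String),
+}
+
+// Variants can be imported individually, like any other item...
+use self::Message::Quit;
+
+fn handle(m: Message) {
+    match m {
+        // ...or referred to through the enum's own path.
+        Quit => {}
+        Message::Echo(s) => drop(s),
+    }
+}
+```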
+ +## Library changes + +Library authors can use reexports to take advantage of enum namespacing without +causing too much downstream breakage: +```rust +pub enum Item { + ItemStruct(Foo), + ItemStatic(Bar), +} +``` +can be transformed to +```rust +pub use Item::Struct as ItemStruct; +pub use Item::Static as ItemStatic; + +pub enum Item { + Struct(Foo), + Static(Bar), +} +``` +To simply keep existing code compiling, a glob reexport will suffice: +```rust +pub use Item::*; + +pub enum Item { + ItemStruct(Foo), + ItemStatic(Bar), +} +``` +Once RFC #385 is implemented, it will be possible to write a syntax extension +that will automatically generate the reexport: +```rust +#[flatten] +pub enum Item { + ItemStruct(Foo), + ItemStatic(Bar), +} +``` + +# Drawbacks + +The transition period will cause enormous breakage in downstream code. It is +also a fairly large change to make to resolve, which is already a bit fragile. + +# Alternatives + +We can implement enum namespacing after 1.0 by adding a "fallback" case to +resolve, where variants can be referenced from their "flat" definition location +if no other definition would conflict in that namespace. In the grand scheme of +hacks to preserve backwards compatibility, this is not that bad, but still +decidedly worse than not having to worry about fallback at all. + +Earlier iterations of namespaced enum proposals suggested preserving flat enums +and adding `enum mod` syntax for namespaced enums. However, variant namespacing +isn't a large enough enough difference for the additon of a second way to +define enums to hold its own weight as a language feature. In addition, it +would simply cause confusion, as library authors need to decide which one they +want to use, and library consumers need to double check which place they can +import variants from. + +# Unresolved questions + +A recent change placed enum variants in the type as well as the value namespace +to allow for future language expansion. This broke some code that looked like +this: +```rust +pub enum MyEnum { + Foo(Foo), + Bar(Bar), +} + +pub struct Foo { ... } +pub struct Bar { ... } +``` +Is it possible to make such a declaration legal in a world with namespaced +enums? The variants `Foo` and `Bar` would no longer clash with the structs +`Foo` and `Bar`, from the perspective of a consumer of this API, but the +variant declarations `Foo(Foo)` and `Bar(Bar)` are ambiguous, since the `Foo` +and `Bar` structs will be in scope inside of the `MyEnum` declaration. diff --git a/text/0401-coercions.md b/text/0401-coercions.md new file mode 100644 index 00000000000..816879f4f89 --- /dev/null +++ b/text/0401-coercions.md @@ -0,0 +1,452 @@ +- Start Date: 2014-10-30 +- RFC PR #: https://github.com/rust-lang/rfcs/pull/401 +- Rust Issue #: https://github.com/rust-lang/rust/issues/18469 + +# Summary + +Describe the various kinds of type conversions available in Rust and suggest +some tweaks. + +Provide a mechanism for smart pointers to be part of the DST coercion system. + +Reform coercions from functions to closures. + +The `transmute` intrinsic and other unsafe methods of type conversion are not +covered by this RFC. + + +# Motivation + +It is often useful to convert a value from one type to another. This conversion +might be implicit or explicit and may or may not involve some runtime action. +Such conversions are useful for improving reuse of code, and avoiding unsafe +transmutes. + +Our current rules around type conversions are not well-described. 
The different +conversion mechanisms interact poorly and the implementation is somewhat ad-hoc. + +# Detailed design + +Rust has several kinds of type conversion: subtyping, coercion, and casting. +Subtyping and coercion are implicit, there is no syntax. Casting is explicit, +using the `as` keyword. The syntax for a cast expression is: + +``` +e_cast ::= e as U +``` + +Where `e` is any valid expression and `U` is any valid type (note that we +restrict in type checking the valid types for `U`). + +These conversions (and type equality) form a total order in terms of their +strength. For any types `T` and `U`, if `T == U` then `T` is also a subtype of +`U`. If `T` is a subtype of `U`, then `T` coerces to `U`, and if `T` coerces to +`U`, then `T` can be cast to `U`. + +There is an additional kind of coercion which does not fit into that total order +- implicit coercions of receiver expressions. (I will use 'expression coercion' +when I need to distinguish coercions in non-receiver position from coercions of +receivers). All expression coercions are valid receiver coercions, but not all +receiver coercions are valid casts. + +Finally, I will discuss function polymorphism, which is something of a coercion +edge case. + +## Subtyping + +Subtyping is implicit and can occur at any stage in type checking or inference. +Subtyping in Rust is very restricted and occurs only due to variance with +respect to lifetimes and between types with higher ranked lifetimes. If we were +to erase lifetimes from types, then the only subtyping would be due to type +equality. + + +## Coercions + +A coercion is implicit and has no syntax. A coercion can only occur at certain +coercion sites in a program, these are typically places where the desired type +is explicit or can be derived by propagation from explicit types (without type +inference). The base cases are: + +* In `let` statements where an explicit type is given: in `let _: U = e;`, `e` + is coerced to have type `U` + +* In statics and consts, similarly to `let` statements + +* In argument position for function calls. The value being coerced is the actual + parameter and it is coerced to the type of the formal parameter. For example, + where `foo` is defined as `fn foo(x: U) { ... }` and is called with `foo(e);`, + `e` is coerced to have type `U` + +* Where a field of a struct or variant is instantiated. E.g., where `struct Foo + { x: U }` and the instantiation is `Foo { x: e }`, `e` is coerced to have + type `U` + +* The result of a function, either the final line of a block if it is not semi- + colon terminated or any expression in a `return` statement. For example, for + `fn foo() -> U { e }`, `e` is coerced to have type `U` + +If the expression in one of these coercion sites is a coercion-propagating +expression, then the relevant sub-expressions in that expression are also +coercion sites. Propagation recurses from these new coercion sites. 
Propagating
+expressions and their relevant sub-expressions are:
+
+* Array literals, where the array has type `[U, ..n]`, each sub-expression in
+  the array literal is a coercion site for coercion to type `U`
+
+* Array literals with repeating syntax, where the array has type `[U, ..n]`, the
+  repeated sub-expression is a coercion site for coercion to type `U`
+
+* Tuples, where a tuple is a coercion site to type `(U_0, U_1, ..., U_n)`, each
+  sub-expression is a coercion site for the respective type, e.g., the zero-th
+  sub-expression is a coercion site to `U_0`
+
+* The box expression, if the expression has type `Box<U>`, the sub-expression is
+  a coercion site to `U` (I expect this to be generalised when `box` expressions
+  are)
+
+* Parenthesised sub-expressions (`(e)`), if the expression has type `U`, then
+  the sub-expression is a coercion site to `U`
+
+* Blocks, if a block has type `U`, then the last expression in the block (if it
+  is not semicolon-terminated) is a coercion site to `U`. This includes blocks
+  which are part of control flow statements, such as `if`/`else`, if the block
+  has a known type.
+
+
+Note that we do not perform coercions when matching traits (except for
+receivers, see below). If there is an impl for some type `U`, and `T` coerces to
+`U`, that does not constitute an implementation for `T`. For example, the
+following will not type check, even though it is OK to coerce `t` to `&T` and
+there is an impl for `&T`:
+
+```rust
+struct T;
+trait Trait {}
+
+fn foo<X: Trait>(t: X) {}
+
+impl<'a> Trait for &'a T {}
+
+
+fn main() {
+    let t: &mut T = &mut T;
+    foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T
+}
+```
+
+In a cast expression, `e as U`, the compiler will first attempt to coerce `e` to
+`U`, and only if that fails will the conversion rules for casts (see below) be
+applied.
+
+Coercion is allowed between the following types:
+
+* `T` to `U` if `T` is a subtype of `U` (the 'identity' case)
+
+* `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3`
+  (transitivity case)
+
+* `&mut T` to `&T`
+
+* `*mut T` to `*const T`
+
+* `&T` to `*const T`
+
+* `&mut T` to `*mut T`
+
+* `T` to `fn` if `T` is a closure that does not capture any local variables
+  in its environment.
+
+* `T` to `U` if `T` implements `CoerceUnsized<U>` (see below) and `T = Foo<...>`
+  and `U = Foo<...>` (for any `Foo`, when we get HKT I expect this could be a
+  constraint on the `CoerceUnsized` trait, rather than being checked here)
+
+* From TyCtor(`T`) to TyCtor(coerce_inner(`T`)) (these coercions could be
+  provided by implementing `CoerceUnsized` for all instances of TyCtor)
+  where TyCtor(`T`) is one of `&T`, `&mut T`, `*const T`, `*mut T`, or `Box<T>`.
+
+And where coerce_inner is defined as:
+
+* coerce_inner(`[T, ..n]`) = `[T]`;
+
+* coerce_inner(`T`) = `U` where `T` is a concrete type which implements the
+  trait `U`;
+
+* coerce_inner(`T`) = `U` where `T` is a sub-trait of `U`;
+
+* coerce_inner(`Foo<..., T, ...>`) = `Foo<..., coerce_inner(T), ...>` where
+  `Foo` is a struct and only the last field has type `T` and `T` is not part of
+  the type of any other fields;
+
+* coerce_inner(`(..., T)`) = `(..., coerce_inner(T))`.
+
+Note that coercing from a sub-trait to a super-trait is a new coercion and is
+non-trivial. One implementation strategy which avoids re-computation of vtables
+is given in RFC PR #250.
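+
+As a concrete illustration of the unsizing rules above (a sketch in today's
+syntax, with made-up function and variable names), each of the following is
+accepted because the expression sits at a coercion site:
+
+```rust
+use std::fmt::Debug;
+
+fn takes_slice(_: &[i32]) {}
+
+fn main() {
+    let arr = [1, 2, 3];
+    takes_slice(&arr);                           // &[i32; 3] coerces to &[i32] at the call site
+    let boxed: Box<[i32]> = Box::new([4, 5, 6]); // Box<[i32; 3]> coerces to Box<[i32]> in the `let`
+    let object: &dyn Debug = &arr;               // a concrete type coerces to a trait object
+    let _ = (boxed, object);
+}
+```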
+
+A note for the future: although there hasn't been an RFC nor much discussion, it
+is likely that post-1.0 we will add type ascription to the language (see #354).
+That will (probably) allow any expression to be annotated with a type (e.g.,
+`foo(a, b: T, c)`, a function call where the second argument has a type
+annotation).
+
+Type ascription is purely descriptive and does not cast the sub-expression to
+the required type. However, it seems sensible that type ascription would be a
+coercion site, and thus type ascription would be a way to make implicit
+coercions explicit. There is a danger that such coercions would be confused with
+casts. I hope the rule that casting should change the type and type ascription
+should not is enough of a discriminant. Perhaps we will need a style guideline
+to encourage either casts or type ascription to force an implicit coercion.
+Perhaps type ascription should not be a coercion site. Or perhaps we don't need
+type ascription at all if we allow trivial casts.
+
+
+### Custom unsizing coercions
+
+It should be possible to coerce smart pointers (e.g., `Rc`) in the same way as
+the built-in pointers. In order to do so, we provide two traits and an intrinsic
+to allow users to make their smart pointers work with the compiler's coercions.
+It might be possible to implement some of the coercions described for built-in
+pointers using this machinery, and whether that is a good idea or not is an
+implementation detail.
+
+```
+// Cannot be impl'ed - it really is quite a magical trait, see the cases below.
+trait Unsize<Sized? T> for Sized? {}
+```
+
+The `Unsize` trait is a marker trait and a lang item. It should not be
+implemented by users and user implementations will be ignored. The compiler will
+assume the following implementations, these correspond to the definition of
+coerce_inner, above; note that these cannot be expressed in real Rust:
+
+```
+impl Unsize<[T]> for [T, ..n] {}
+
+// Where T is a trait
+impl Unsize<T> for U {}
+
+// Where T and U are traits
+impl Unsize<T> for U {}
+
+// Where T and U are structs ... following the rules for coerce_inner
+impl Unsize<T> for U {}
+
+impl Unsize<(..., T)> for (..., U)
+    where U: Unsize<T> {}
+```
+
+The `CoerceUnsized` trait should be implemented by smart pointers and containers
+which want to be part of the coercions system.
+
+```
+trait CoerceUnsized<U> {
+    fn coerce(self) -> U;
+}
+```
+
+To help implement `CoerceUnsized`, we provide an intrinsic -
+`fat_pointer_convert`. This takes and returns raw pointers. The common case will
+be to take a thin pointer, unsize the contents, and return a fat pointer. But
+the exact behaviour depends on the types involved. This will perform any
+computation associated with a coercion (for example, adjusting or creating
+vtables). The implementation of fat_pointer_convert will match what the
+compiler must do in coerce_inner as described above.
+
+```
+intrinsic fn fat_pointer_convert<Sized? T, Sized? U>(t: *const T) -> *const U
+    where T : Unsize<U>;
+```
+
+Here is an example implementation of `CoerceUnsized` for `Rc`:
+
+```
+impl<Sized? T, U> CoerceUnsized<Rc<T>> for Rc<U>
+    where U: Unsize<T>
+{
+    fn coerce(self) -> Rc<T> {
+        let new_ptr: *const RcBox<T> = fat_pointer_convert(self._ptr);
+        Rc { _ptr: new_ptr }
+    }
+}
+```
+
+## Coercions of receiver expressions
+
+These coercions occur when matching the type of the receiver of a method call
+with the self type (i.e., the type of `e` in `e.m(...)`) or in field access.
+These coercions can be thought of as a feature of the `.` operator, they do not
+apply when using the UFCS form with the self argument in argument position. Only
+an expression before the dot is coerced as a receiver. When using the UFCS form
+of method call, arguments are only coerced according to the expression coercion
+rules. This matches the rules for dispatch - dynamic dispatch only happens using
+the `.` operator, not the UFCS form.
+
+In method calls the target type of the coercion is the concrete type of the impl
+in which the method is defined, modified by the type of `self`. Assuming the
+impl is for `T`, the target type is given by:
+
+      self         | target type
+-------------------|------------
+ `self`            | `T`
+ `&self`           | `&T`
+ `&mut self`       | `&mut T`
+ `self: Box<Self>` | `Box<T>`
+
+and likewise with any variations of the self type we might add in the future.
+
+For field access, the target type is `&T` (or `&mut T` for field assignment),
+where `T` is the struct with the named field.
+
+A receiver coercion consists of some number of dereferences (either compiler
+built-in (of a borrowed reference or `Box` pointer, not raw pointers) or custom,
+given by the `Deref` trait), one or zero applications of `coerce_inner` or use
+of the `CoerceUnsized` trait (as defined above, note that this requires we are
+at a type which has neither references nor dereferences at the top level), and
+up to two address-of operations (i.e., `T` to `&T`, `&mut T`, `*const T`, or
+`*mut T`, with a fresh lifetime.). The usual mutability rules for taking a
+reference apply. (Note that the implementation of the coercion isn't so simple,
+it is embedded in the search for candidate methods, but from the point of view
+of type conversions, that is not relevant).
+
+Alternatively, a receiver coercion may be thought of as a two stage process.
+First, we dereference and then take the address until the source type has the
+same shape (i.e., has the same kind and number of indirection) as the target
+type. Then we try to coerce the adjusted source type to the target type using
+the usual coercion machinery. I believe, but have not proved, that these two
+descriptions are equivalent.
+
+
+## Casts
+
+Casting is indicated by the `as` keyword. A cast `e as U` is valid if one of the
+following holds:
+
+ * `e` has type `T` and `T` coerces to `U`; *coercion-cast*
+ * `e` has type `*T`, `U` is `*U_0`, and either `U_0: Sized` or
+   unsize_kind(`T`) = unsize_kind(`U_0`); *ptr-ptr-cast*
+ * `e` has type `*T` and `U` is a numeric type, while `T: Sized`; *ptr-addr-cast*
+ * `e` is an integer and `U` is `*U_0`, while `U_0: Sized`; *addr-ptr-cast*
+ * `e` has type `T` and `T` and `U` are any numeric types; *numeric-cast*
+ * `e` is a C-like enum and `U` is an integer type; *enum-cast*
+ * `e` has type `bool` or `char` and `U` is an integer; *prim-int-cast*
+ * `e` has type `u8` and `U` is `char`; *u8-char-cast*
+ * `e` has type `&[T; n]` and `U` is `*const T`; *array-ptr-cast*
+ * `e` is a function pointer type and `U` has type `*T`,
+   while `T: Sized`; *fptr-ptr-cast*
+ * `e` is a function pointer type and `U` is an integer; *fptr-addr-cast*
+
+where `&.T` and `*T` are references of either mutability,
+and where unsize_kind(`T`) is the kind of the unsize info
+in `T` - the vtable for a trait definition (e.g. `fmt::Display` or
+`Iterator`, not `Iterator<u8>`) or a length (or `()` if `T: Sized`).
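+
+For instance, the following casts each fall under one of the categories above
+(a sketch; the variable names are illustrative only):
+
+```rust
+fn main() {
+    let n = 300i32 as u8;            // numeric-cast (truncates to 44)
+    let p = &n as *const u8;         // coercion-cast: &T coerces to *const T
+    let addr = p as usize;           // ptr-addr-cast
+    let arr = [1u16, 2, 3];
+    let raw: *const [u16] = &arr[..];
+    let bytes = raw as *const [u8];  // ptr-ptr-cast: both pointees carry a length
+    let _ = (addr, bytes);
+}
+```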
+ +Note that lengths are not adjusted when casting raw slices - +`T: *const [u16] as *const [u8]` creates a slice that only includes +half of the original memory. + +Casting is not transitive, that is, even if `e as U1 as U2` is a valid +expression, `e as U2` is not necessarily so (in fact it will only be valid if +`U1` coerces to `U2`). + +A cast may require a runtime conversion. + +There will be a lint for trivial casts. A trivial cast is a cast `e as T` where +`e` has type `U` and `U` is a subtype of `T`. The lint will be warn by default. + + +## Function type polymorphism + +Currently, functions may be used where a closure is expected by coercing a +function to a closure. We will remove this coercion and instead use the +following scheme: + +* Every function item has its own fresh type. This type cannot be written by the + programmer (i.e., it is expressible but not denotable). +* Conceptually, for each fresh function type, there is an automatically generated + implementation of the `Fn`, `FnMut`, and `FnOnce` traits. +* All function types are implicitly coercible to a `fn()` type with the + corresponding parameter types. +* Conceptually, there is an implementation of `Fn`, `FnMut`, and `FnOnce` for + every `fn()` type. +* `Fn`, `FnMut`, or `FnOnce` trait objects and references to type parameters + bounded by these traits may be considered to have the corresponding unboxed + closure type. This is a desugaring (alias), rather than a coercion. This is + an existing part of the unboxed closures work. + +These steps should allow for functions to be stored in variables with both +closure and function type. It also allows variables with function type to be +stored as a variable with closure type. Note that these have different +dynamic semantics, as described below. For example, + +``` +fn foo() { ... } // `foo` has a fresh and non-denotable type. + +fn main() { + let x: fn() = foo; // `foo` is coerced to `fn()`. + let y: || = x; // `x` is coerced to `&Fn` (a closure object), + // legal due to the `fn()` auto-impls. + + let z: || = foo; // `foo` is coerced to `&T` where `T` is fresh and + // bounded by `Fn`. Legal due to the fresh function + // type auto-impls. +} +``` + +The two kinds of auto-generated impls are rather different: the first case (for +the fresh and non-denotable function types) is a static call to `Fn::Call`, +which in turn calls the function with the given arguments. The first call would +be inlined (in fact, the impls and calls to them may be special-cased by the +compiler). In the second case (for `fn()` types), we must execute a virtual call +to find the implementing method and then call the function itself because the +function is 'wrapped' in a closure object. + + +## Changes required + +* Add cast from unsized slices to raw pointers (`&[V] to *V`); + +* allow coercions as casts and add lint for trivial casts; + +* ensure we support all coercion sites; + +* remove [T, ..n] to &[T]/*[T] coercions; + +* add raw pointer coercions; + +* add sub-trait coercions; + +* add unsized tuple coercions; + +* add all transitive coercions; + +* receiver coercions - add referencing to raw pointers, remove triple + referencing for slices; + +* remove function coercions, add function type polymorphism; + +* add DST/custom coercions. + + +# Drawbacks + +We are adding and removing some coercions. There is always a trade-off with +implicit coercions on making Rust ergonomic vs making it hard to comprehend due +to magical conversions. 
By changing this balance we might be making some things +worse. + + +# Alternatives + +These rules could be tweaked in any number of ways. + +Specifically for the DST custom coercions, the compiler could throw an error if +it finds a user-supplied implementation of the `Unsize` trait, rather than +silently ignoring them. + +# Amendments + +* Updated by [#1558](https://github.com/rust-lang/rfcs/pull/1558), which allows + coercions from a non-capturing closure to a function pointer. + +# Unresolved questions \ No newline at end of file diff --git a/text/0403-cargo-build-command.md b/text/0403-cargo-build-command.md new file mode 100644 index 00000000000..cb169b59346 --- /dev/null +++ b/text/0403-cargo-build-command.md @@ -0,0 +1,541 @@ +- Start Date: 2014-10-30 +- RFC PR: [rust-lang/rfcs#403](https://github.com/rust-lang/rfcs/pull/403) +- Rust Issue: [rust-lang/rust#18473](https://github.com/rust-lang/rust/issues/18473) + +# Summary + +Overhaul the `build` command internally and establish a number of conventions +around build commands to facilitate linking native code to Cargo packages. + +1. Instead of having the `build` command be some form of script, it will be a + Rust command instead +2. Establish a namespace of `foo-sys` packages which represent the native + library `foo`. These packages will have Cargo-based dependencies between + `*-sys` packages to express dependencies among C packages themselves. +3. Establish a set of standard environment variables for build commands which + will instruct how `foo-sys` packages should be built in terms of dynamic or + static linkage, as well as providing the ability to override where a package + comes from via environment variables. + +# Motivation + +Building native code is normally quite a tricky business, and the original +design of Cargo was to essentially punt on this problem. Today's "solution" +involves invoking an arbitrary `build` command in a sort of pseudo-shell with a +number of predefined environment variables. This ad-hoc solution was known to be +lacking at the time of implementing with the intention of identifying major pain +points over time and revisiting the design once we had more information. + +While today's "hands off approach" certainly has a number of drawbacks, one of +the upsides is that Cargo minimizes the amount of logic inside it as much as +possible. This proposal attempts to stress this point as much as possible by +providing a strong foundation on which to build robust build scripts, but not +baking all of the logic into Cargo itself. + +The time has now come to revisit the design, and some of the largest pain points +that have been identified are: + +1. Packages needs the ability to build differently on different platforms. +2. Projects should be able to control dynamic vs static at the top level. Note + that the term "project" here means "top level package". +3. It should be possible to use libraries of build tool functionality. Cargo is + indeed a package manager after all, and currently there is no way share a + common set of build tool functionality among different Cargo packages. +4. There is very little flexibility in locating packages, be it on the system, + in a build directory, or in a home build dir. +5. There is no way for two Rust packages to declare that they depend on the same + native dependency. +6. There is no way for C libraries to express their dependence on other C + libraries. +7. There is no way to encode a platform-specific dependency. 
+ +Each of these concerns can be addressed somewhat ad-hocly with a vanilla `build` +command, but Cargo can certainly provide a more comprehensive solution to these +problems. + +Most of these concerns are fairly self-explanatory, but specifically (2) may +require a bit more explanation: + +## Selecting linkage from the top level + +Conceptually speaking, a native library is largely just a collections of +symbols. The linkage involved in creating a final product is an implementation +detail that is almost always irrelevant with respect to the symbols themselves. + +When it comes to linking a native library, there are often a number of +overlapping and sometimes competing concerns: + +1. Most unix-like distributions with package managers highly recommend dynamic + linking of all dependencies. This reduces the overall size of an installation + and allows dependencies to be updated without updating the original + application. +2. Those who distribute binaries of an application to many platforms prefer + static linking as much as possible. This is largely done because the actual + set of libraries on the platforms being installed on are often unknown and + could be quite different than those linked to. Statically linking solves + these problems by reducing the number of dependencies for an application. +3. General developers of a package simply want a package to build at all costs. + It's ok to take a little bit longer to build, but if it takes hours of + googling obscure errors to figure out you needed to install `libfoo` it's + probably not ok. +4. Some native libraries have obscure linkage requirements. For example OpenSSL + on OSX likely wants to be linked dynamically due to the special keychain + support, but on linux it's more ok to statically link OpenSSL if necessary. + +The key point here is that the author of a library is not the one who dictates +how an application should be linked. The builder or packager of a library is the +one responsible for determining how a package should be linked. + +Today this is not quite how Cargo operates, depending on what flavor of syntax +extension you may be using. One of the goals of this re-working is to enable +top-level projects to make easier decisions about how to link to libraries, +where to find linked libraries, etc. + +# Detailed design + +Summary: + +* Add a `-l` flag to rustc +* Tweak an `include!` macro to rustc +* Add a `links` key to Cargo manifests +* Add platform-specific dependencies to Cargo manifests +* Allow pre-built libraries in the same manner as Cargo overrides +* Use Rust for build scripts +* Develop a convention of `*-sys` packages + +## Modifications to `rustc` + +A new flag will be added to `rustc`: + +``` + -l LIBRARY Link the generated crate(s) to the specified native + library LIBRARY. The name `LIBRARY` will have the format + `kind:name` where `kind` is one of: dylib, static, + framework. This corresponds to the `kind` key of the + `#[link]` attribute. The `name` specified is the name of + the native library to link. The `kind:` prefix may be + omitted and the `dylib` format will be assumed. +``` + +``` +rustc -l dylib:ssl -l static:z foo.rs +``` + +Native libraries often have widely varying dependencies depending on what +platforms they are compiled on. Often times these dependencies aren't even +constant among one platform! 
The reality we sadly have to face is that the +dependencies of a native library itself are sometimes unknown until *build +time*, at which point it's too late to modify the source code of the program to +link to a library. + +For this reason, the `rustc` CLI will grow the ability to link to arbitrary +libraries at build time. This is motivated by the build scripts which Cargo is +growing, but it likely useful for custom Rust compiles at large. + +Note that this RFC does not propose style guidelines nor suggestions for usage +of `-l` vs `#[link]`. For Cargo it will later recommend discouraging use of +`#[link]`, but this is not generally applicable to all Rust code in existence. + +## Declaration of native library dependencies + +Today Cargo has very little knowledge about what dependencies are being used by +a package. By knowing the exact set of dependencies, Cargo paves a way into the +future to extend its handling of native dependencies, for example downloading +precompiled libraries. This extension allows Cargo to better handle constraint 5 +above. + +```toml +[package] + +# This package unconditionally links to this list of native libraries +links = ["foo", "bar"] +``` + +The key `links` declares that the package will link to and provide the given C +libraries. Cargo will impose the restriction that the same C library *must not* +appear more than once in a dependency graph. This will prevent the same C +library from being linked multiple times to packages. + +If conflicts arise from having multiple packages in a dependency graph linking +to the same C library, the C dependency should be refactored into a common +Cargo-packaged dependency. + +It is illegal to define `links` without also defining `build`. + +## Platform-specific dependencies + +A number of native dependencies have various dependencies depending on what +platform they're building for. For example, libcurl does not depend on OpenSSL +on Windows, but it is a common dependency on unix-based systems. To this end, +Cargo will gain support for platform-specific dependencies, solving constriant 7 +above: + +```toml + +[target.i686-pc-windows-gnu.dependencies.crypt32] +git = "https://github.com/user/crypt32-rs" + +[target.i686-pc-windows-gnu.dependencies.winhttp] +path = "winhttp" +``` + +Here the top-level configuration key `target` will be a table whose sub-keys +are target triples. The dependencies section underneath is the same as the +top-level dependencies section in terms of functionality. + +Semantically, platform specific dependencies are activated whenever Cargo is +compiling for a the exact target. Dependencies in other `$target` sections +will not be compiled. + +However, when generating a lockfile, Cargo will always download all dependencies +unconditionally and perform resolution as if all packages were included. This is +done to prevent the lockfile from radically changing depending on whether the +package was last built on Linux or windows. This has the advantage of a stable +lockfile, but has the drawback that all dependencies must be downloaded, even if +they're not used. + +## Pre-built libraries + +A common pain point with constraints 1, 2, and cross compilation is that it's +occasionally difficult to compile a library for a particular platform. Other +times it's often useful to have a copy of a library locally which is linked +against instead of built or detected otherwise for debugging purposes (for +example). 
To facilitate these pain points, Cargo will support pre-built +libraries being on the system similar to how local package overrides are +available. + +Normal Cargo configuration will be used to specify where a library is and how +it's supposed to be linked against: + +```toml +# Each target triple has a namespace under the global `target` key and the +# `libs` key is a table for each native library. +# +# Each library can specify a number of key/value pairs where the values must be +# strings. The key/value pairs are metadata which are passed through to any +# native build command which depends on this library. The `rustc-flags` key is +# specially recognized as a set of flags to pass to `rustc` in order to link to +# this library. +[target.i686-unknown-linux-gnu.ssl] +rustc-flags = "-l static:ssl -L /home/build/root32/lib" +root = "/home/build/root32" +``` + +This configuration will be placed in the normal locations that `.cargo/config` +is found. The configuration will only be queried if the target triple being +built matches what's in the configuration. + +## Rust build scripts + +First pioneered by @tomaka in https://github.com/rust-lang/cargo/issues/610, the +`build` command will no longer be an actual command, but rather a build script +itself. This decision is motivated in solving constraints 1 and 3 above. The +major motivation for this recommendation is the realization that the only common +denominator for platforms that Cargo is running on is the fact that a Rust +compiler is available. The natural conclusion from this fact is for a build +script is to use Rust itself. + +Furthermore, Cargo itself which serves quite well as a dependency manager, so by +using Rust as a build tool it will be able to manage dependencies of the build +tool itself. This will allow third-party solutions for build tools to be +developed outside of Cargo itself and shared throughout the ecosystem of +packages. + +The concrete design of this will be the `build` command in the manifest being a +relative path to a file in the package: + +```toml +[package] +# ... +build = "build/compile.rs" +``` + +This file will be considered the entry point as a "build script" and will be +built as an executable. A new top-level dependencies array, `build-dependencies` +will be added to the manifest. These dependencies will all be available to the +build script as external crates. Requiring that the build command have a +separate set of dependencies solves a number of constraints: + +* When cross-compiling, the build tool as well as all of its dependencies are + required to be built for the host architecture instead of the target + architecture. A clear deliniation will indicate precisely what dependencies + need to be built for the host architecture. +* Common packages, such as one to build `cmake`-based dependencies, can develop + conventions around filesystem hierarchy formats to require minimum + configuration to build extra code while being easily identified as having + extra support code. + +This RFC does not propose a convention of what to name the build script files. + +Unlike `links`, it will be legal to specify `build` without specifying `links`. +This is motivated by the code generation case study below. + +### Inputs + +Cargo will provide a number of inputs to the build script to facilitate building +native code for the current package: + +* The `TARGET` environment variable will contain the target triple that the + native code needs to be built for. This will be passed unconditionally. 
+* The `NUM_JOBS` environment variable will indicate the number of parallel jobs
+  that the script itself should execute (if relevant).
+* The `CARGO_MANIFEST_DIR` environment variable will contain the directory of
+  the manifest of the package being built. Note that this is not the directory
+  of the package whose build command is being run.
+* The `OPT_LEVEL` environment variable will contain the requested optimization
+  level of code being built. This will be in the range 0-2. Note that this
+  variable is the same for all build commands.
+* The `PROFILE` environment variable will contain the currently active Cargo
+  profile being built. Note that this variable is the same for all build
+  commands.
+* The `DEBUG` environment variable will contain `true` or `false` depending on
+  whether the current profile specified that it should be debugged or not. Note
+  that this variable is the same for all build commands.
+* The `OUT_DIR` environment variable contains the location in which all output
+  should be placed. This should be considered a scratch area for compilations of
+  any bundled items.
+* The `CARGO_FEATURE_<foo>` environment variable will be present if the feature
+  `foo` is enabled for the package being compiled.
+* The `DEP_<foo>_<key>` environment variables will contain metadata about the
+  native dependencies for the current package. As the output section below will
+  indicate, each compilation of a native library can generate a set of output
+  metadata which will be passed through to dependencies. The only dependencies
+  available (`foo`) will be those in `links` for immediate dependencies of the
+  package being built. Note that each metadata `key` will be uppercased and `-`
+  characters transformed to `_` for the name of the environment variable.
+* If `links` is not present, then the command is unconditionally run with 0
+  command line arguments, otherwise:
+* The libraries that are requested via `links` are passed as command line
+  arguments. The pre-built libraries in `links` (detailed above) will be
+  filtered out and not passed to the build command. If there are no libraries to
+  build (they're all pre-built), the build command will not be invoked.
+
+### Outputs
+
+The responsibility of the build script is to ensure that all requested native
+libraries are available for the crate to compile. The conceptual output of the
+build script will be metadata on stdout explaining how the compilation
+went and whether it succeeded.
+
+An example output of a build command would be:
+
+```
+cargo:rustc-flags=-l static:foo -L /path/to/foo
+cargo:root=/path/to/foo
+cargo:libdir=/path/to/foo/lib
+cargo:include=/path/to/foo/include
+```
+
+Each line that begins with `cargo:` is interpreted as a line of metadata for
+Cargo to store. The remaining part of the line is of the form `key=value` (like
+environment variables).
+
+This output is similar to the pre-built libraries section above in that most
+key/value pairs are opaque metadata except for the special `rustc-flags` key.
+The `rustc-flags` key indicates to Cargo the flags necessary to link the
+specified libraries.
+
+For `rustc-flags` specifically, Cargo will propagate all `-L` flags transitively
+to all dependencies, and `-l` flags to the package being built. All metadata
+will only be passed to immediate dependants. Note that this recommends
+discouraging use of `#[link]`, as it is not the source code's responsibility to
+dictate linkage.
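+
+As a rough sketch of how these inputs and outputs fit together, a build script
+for a hypothetical `foo-sys` package might look something like the following
+(the library name, the emitted paths, and the `build_bundled_foo` helper are
+illustrative only, and the code is written against the current standard
+library):
+
+```rust
+use std::env;
+use std::process;
+
+fn main() {
+    // Inputs arrive through the environment variables described above.
+    let target = env::var("TARGET").unwrap();
+    let out_dir = env::var("OUT_DIR").unwrap();
+
+    // Stand-in for compiling a bundled copy of libfoo into OUT_DIR
+    // (a real script would invoke make/cmake or probe the system instead).
+    if !build_bundled_foo(&target, &out_dir) {
+        // A nonzero exit code tells Cargo that the build failed.
+        process::exit(1);
+    }
+
+    // Outputs are `cargo:` metadata lines printed on stdout.
+    println!("cargo:rustc-flags=-l static:foo -L {}", out_dir);
+    println!("cargo:root={}", out_dir);
+    println!("cargo:include={}/include", out_dir);
+}
+
+fn build_bundled_foo(_target: &str, _out_dir: &str) -> bool {
+    true // stub for the actual native build
+}
+```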
+ +If the build script exits with a nonzero exit code, then Cargo will consider it +to have failed and will abort compilation. + +### Input/Output rationale + +In general one of the purposes of a custom build command is to dynamically +determine the necessary dependencies for a library. These dependencies may have +been discovered through `pkg-config`, built locally, or even downloaded from a +remote. This set can often change, and is the impetus for the `rustc-flags` +metadata key. This key indicates what libraries should be linked (and how) along +with where to find the libraries. + +The remaining metadata flags are not as useful to `rustc` itself, but are quite +useful to interdependencies among native packages themselves. For example +libssh2 depends on OpenSSL on linux, which means it needs to find the +corresponding libraries and header files. The metadata keys serve as a vector +through which this information can be transmitted. The maintainer of the +`openssl-sys` package (described below) would have a build script responsible +for generating this sort of metadata so consumer packages can use it to build C +libraries themselves. + +## A set of `*-sys` packages + +This section will discuss a *convention* by which Cargo packages providing +native dependencies will be named, it is not proposed to have Cargo enforce this +convention via any means. These conventions are proposed to address constraints +5 and 6 above. + +Common C dependencies will be refactored into a package named `foo-sys` where +`foo` is the name of the C library that `foo-sys` will provide and link to. +There are two key motivations behind this convention: + +* Each `foo-sys` package will declare its own dependencies on other `foo-sys` + based packages +* Dependencies on native libraries expressed through Cargo will be subject to + version management, version locking, and deduplication as usual. + +Each `foo-sys` package is responsible for providing the following: + +* Declarations of all symbols in a library. Essentially each `foo-sys` library + is *only* a header file in terms of Rust-related code. +* Ensuring that the native library `foo` is linked to the `foo-sys` crate. This + guarantees that all exposed symbols are indeed linked into the crate. + +Dependencies making use of `*-sys` packages will not expose `extern` blocks +themselves, but rather use the symbols exposed in the `foo-sys` package +directly. Additionally, packages using `*-sys` packages should not declare a +`#[link]` directive to link to the native library as it's already linked to the +`*-sys` package. + +## Phasing strategy + +The modifications to the `build` command are breaking changes to Cargo. To ease +the transition, the build comand will be join'd to the root path of a crate, and +if the file exists and ends with `.rs`, it will be compiled as describe above. +Otherwise a warning will be printed and the fallback behavior will be +executed. + +The purpose of this is to help most build scripts today continue to work (but +not necessarily all), and pave the way forward to implement the newer +integration. + +## Case study: Cargo + +Cargo has a surprisingly complex set of C dependencies, and this proposal has +created an [example repository][example] for what the configuration of Cargo +would look like with respect to its set of C dependencies. 
+
+[example]: https://github.com/alexcrichton/complicated-linkage-example
+
+## Case study: generated code
+
+As the release of Rust 1.0 comes closer, the use of compiler plugins has become
+increasingly worrying. It is likely that plugins will not be available by
+default in the stable and beta release channels of Rust. Many core Cargo
+packages in the ecosystem today, such as gl-rs and iron, depend on plugins to
+build. Others, like rust-http, are already using compile-time code generation
+with a build script (which this RFC attempts to standardize).
+
+Taking a closer look at these crates' dependence on plugins reveals that the
+primary use case is generating Rust code at compile time. For gl-rs, this is
+done to bind a platform-specific and evolving API, and for rust-http this is
+done to make code more readable and easier to understand. In general, generating
+code at compile time is quite a useful ability for other applications such as
+bindgen (C bindings), DOM bindings (used in Servo), etc.
+
+Cargo's and Rust's support for compile-time generated code is quite lacking
+today, and overhauling the `build` command provides a nice opportunity to
+rethink this sort of functionality.
+
+With this motivation, this RFC proposes tweaking the `include!` macro to make it
+suitable for including generated code:
+
+```rust
+include!(concat!(env!("OUT_DIR"), "/generated.rs"));
+```
+
+Today this does not compile, as the argument to `include!` must be a string
+literal. This RFC proposes tweaking the semantics of the `include!` macro to
+expand its argument locally before testing for a string literal. This is similar
+to the behavior of the `format_args!` macro today.
+
+Using this, Cargo crates will have `OUT_DIR` present for compilations, and any
+generated Rust code can be produced by the `build` command and placed into
+`OUT_DIR`. The `include!` macro would then be used to include the contents of
+that code inside the appropriate module.
+
+## Case study: controlling linkage
+
+One of the motivations for this RFC and the redesign of the `build` command is
+to make linkage controls more explicit to Cargo itself rather than hardcoding
+particular linkages in source code. As proposed, however, this RFC does not bake
+any sort of dynamic-vs-static knowledge into Cargo itself.
+
+This design area is intentionally left untouched by Cargo in order to reduce the
+number of moving parts and also in an effort to simplify build commands as much
+as possible. There are, however, a number of methods to control how libraries
+are linked:
+
+1. First and foremost is the ability to override libraries via Cargo
+   configuration. Overridden native libraries are specified manually and
+   override whatever the "default" would have been otherwise.
+2. Delegation to arbitrary code running in build scripts allows the possibility
+   of specification through other means such as environment variables.
+3. Usage of common third-party build tools will allow for conventions about
+   selecting linkage to develop over time.
+
+Note that points 2 and 3 are intentionally vague, as this RFC does not have a
+specific recommendation for how scripts or tooling should respect linkage. By
+relying on a common set of dependencies to find native libraries, it is
+envisioned that these tools will grow a convention through which a linkage
+preference can be specified.
+
+For example, a possible `pkg-config` integration will be discussed.
+This tool can be used as a first-line-defense to help locate a library on the
+system as well as its dependencies. If a crate requests that `pkg-config` find
+the library `foo`, then the `pkg-config` crate could inspect some environment
+variables to determine how it operates:
+
+* If `FOO_NO_PKG_CONFIG` is set, then pkg-config immediately returns an error.
+  This helps users who want to force pkg-config to not find a package or force
+  the package to build a statically linked fallback.
+* If `FOO_DYNAMIC` is set, then pkg-config will only succeed if it finds a
+  dynamic version of `foo`. A similar meaning could be applied to `FOO_STATIC`.
+* If `PKG_CONFIG_ALL_DYNAMIC` is set, then it will act as if the package `foo`
+  is specifically requested to be dynamic (and similarly for static linking).
+
+Note that this is not a concrete design; it is just meant as an example of how a
+common third-party tool can develop a convention for controlling linkage outside
+of Cargo itself.
+
+Also note that this can mean that `cargo` itself may not succeed "by default" in
+all cases, and that larger projects with more flavorful configurations may want
+to pursue more fine-tuned control over how libraries are linked. It is intended
+that `cargo` will itself be driven with something such as a `Makefile` to
+perform this configuration (be it via the environment or via files).
+
+# Drawbacks
+
+* The system proposed here for linking native code is in general somewhat
+  verbose. In theory, well designed third-party Cargo crates can alleviate this
+  verbosity by providing much of the boilerplate, but it's unclear to what
+  extent they'll be able to do so.
+* None of the third-party crates with "convenient build logic" currently exist,
+  and it will take time to build these solutions.
+* Platform specific dependencies mean that the entire package graph must always
+  be downloaded, regardless of the platform.
+* In general, dealing with linkage is quite complex, and the conventions/systems
+  proposed here aren't exactly trivial and may be overkill for these purposes.
+
+* As can be seen in the [example repository][verbose], platform dependencies are
+  quite verbose and are difficult to work with when you want to exclude a
+  platform rather than include one.
+* Features themselves will also likely need to be platform-specific, but this
+  runs into a number of tricky situations and needs to be fleshed out.
+
+[verbose]: https://github.com/alexcrichton/complicated-linkage-example/blob/master/curl-sys/Cargo.toml#L9-L17
+
+# Alternatives
+
+* It has been proposed to support the `links` manifest key in the `features`
+  section as well. In the proposed scheme you would have to create an optional
+  dependency representing an optional native dependency, but this may be too
+  burdensome for some cases.
+
+* The build command could instead run a script from an external package rather
+  than a script inside of the package itself. The major drawback of this
+  approach is that even the tiniest of build scripts would require a full-blown
+  package which needs to be uploaded to the registry. Due to the verbosity of
+  creating so many packages, this was decided against.
+
+* Cargo remains fairly "dumb" with respect to how native libraries are linked,
+  and it's always a possibility that Cargo could grow more first-class support
+  for dealing with the linkage of C libraries.
+ +# Unresolved questions + +None diff --git a/text/0404-change-prefer-dynamic.md b/text/0404-change-prefer-dynamic.md new file mode 100644 index 00000000000..51a45dc1464 --- /dev/null +++ b/text/0404-change-prefer-dynamic.md @@ -0,0 +1,147 @@ +- Start Date: 2014-11-01 +- RFC PR: [#404](https://github.com/rust-lang/rfcs/pull/404) +- Rust Issue: [#18499](https://github.com/rust-lang/rust/issues/18499) + +# Summary + +When the compiler generates a dynamic library, alter the default behavior to +favor linking all dependencies statically rather than maximizing the number of +dynamic libraries. This behavior can be disabled with the existing +`-C prefer-dynamic` flag. + +# Motivation + +Long ago rustc used to only be able to generate dynamic libraries and as a +consequence all Rust libraries were distributed/used in a dynamic form. Over +time the compiler learned to create static libraries (dubbed rlibs). With this +ability the compiler had to grow the ability to choose between linking a library +either statically or dynamically depending on the available formats available to +the compiler. + +Today's heuristics and algorithm are [documented in the compiler][linkage], and +the general idea is that as soon as "statically link all dependencies" fails +then the compiler maximizes the number of dynamic dependencies. Today there is +also not a method of instructing the compiler precisely what form intermediate +libraries should be linked in the source code itself. The linkage can be +"controlled" by passing `--extern` flags with only one per dependency where the +desired format is passed. + +[linkage]: https://github.com/rust-lang/rust/blob/master/src/librustc/middle/dependency_format.rs + +While functional, these heuristics do not allow expressing an important use case +of building a dynamic library as a final product (as opposed to an intermediate +Rust library) while having all dependencies statically linked to the final +dynamic library. This use case has been seen in the wild a number of times, and +the current workaround is to generate a `staticlib` and then invoke the linker +directly to convert that to a `dylib` (which relies on rustc generating PIC +objects by default). + +The purpose of this RFC is to remedy this use case while largely retaining the +current abilities of the compiler today. + +# Detailed design + +In english, the compiler will change its heuristics for when a dynamic library +is being generated. When doing so, it will attempt to link all dependencies +statically, and failing that, will continue to maximize the number of dynamic +libraries which are linked in. + +The compiler will also repurpose the `-C prefer-dynamic` flag to indicate that +this behavior is not desired, and the compiler should maximize dynamic +dependencies regardless. + +In terms of code, the following patch will be applied to the compiler: + +```patch +diff --git a/src/librustc/middle/dependency_format.rs b/src/librustc/middle/dependency_format.rs +index 8e2d4d0..dc248eb 100644 +--- a/src/librustc/middle/dependency_format.rs ++++ b/src/librustc/middle/dependency_format.rs +@@ -123,6 +123,16 @@ fn calculate_type(sess: &session::Session, + return Vec::new(); + } + ++ // Generating a dylib without `-C prefer-dynamic` means that we're going ++ // to try to eagerly statically link all dependencies. This is normally ++ // done for end-product dylibs, not intermediate products. 
++ config::CrateTypeDylib if !sess.opts.cg.prefer_dynamic => { ++ match attempt_static(sess) { ++ Some(v) => return v, ++ None => {} ++ } ++ } ++ + // Everything else falls through below + config::CrateTypeExecutable | config::CrateTypeDylib => {}, + } +``` + +# Drawbacks + +None currently, but the next section of alternatives lists a few other methods +of possibly achieving the same goal. + +# Alternatives + +## Disallow intermediate dynamic libraries + +One possible solution to this problem is to completely disallow dynamic +libraries as a possible intermediate format for rust libraries. This would solve +the above problem in the sense that the compiler never has to make a choice. +This would also additionally cut the distribution size in roughly half because +only rlibs would be shipped, not dylibs. + +Another point in favor of this approach is that the story for dynamic libraries +in Rust (for Rust) is also somewhat lacking with today's compiler. The ABI of a +library changes quite frequently for unrelated changes, and it is thus +infeasible to expect to ship a dynamic Rust library to later be updated +in-place without recompiling downstream consumers. By disallowing dynamic +libraries as intermediate formats in Rust, it is made quite obvious that a Rust +library cannot depend on another dynamic Rust library. This would be codifying +the convention today of "statically link all Rust code" in the compiler itself. + +The major downside of this approach is that it would then be impossible to write +a plugin for Rust in Rust. For example compiler plugins would cease to work +because the standard library would be statically linked to both the `rustc` +executable as well as the plugin being loaded. + +In the common case duplication of a library in the same process does not tend to +have adverse side effects, but some of the more flavorful features tend to +interact adversely with duplication such as: + +* Globals with significant addresses (`static`s). These globals would all be + duplicated and have different addresses depending on what library you're + talking to. +* TLS/TLD. Any "thread local" or "task local" notion will be duplicated + across each library in the process. + +Today's design of the runtime in the standard library causes dynamically loaded +plugins with a statically linked standard library to fail very quickly as soon +as any runtime-related operations is performed. Note, however, that the runtime +of the standard library will likely be phased out soon, but this RFC considers +the cons listed above to be reasons to not take this course of action. + +## Allow fine-grained control of linkage + +Another possible alternative is to allow fine-grained control in the compiler to +explicitly specify how each library should be linked (as opposed to a blanked +prefer dynamic or not). + +Recent forays with native libraries in Cargo has led to the conclusion that +hardcoding linkage into source code is often a hazard and a source of pain down +the line. The ultimate decision of how a library is linked is often not up to +the author, but rather the developer or builder of a library itself. + +This leads to the conclusion that linkage control of this form should be +controlled through the command line instead, which is essentially already +possible today (via `--extern`). Cargo essentially does this, but the standard +libraries are shipped in dylib/rlib formats, causing the pain points listed in +the motivation. 
+ +As a result, this RFC does not recommend pursuing this alternative too far, but +rather considers the alteration above to the compiler's heuristics to be +satisfactory for now. + +# Unresolved questions + +None yet! diff --git a/text/0418-struct-variants.md b/text/0418-struct-variants.md new file mode 100644 index 00000000000..326d33a8e50 --- /dev/null +++ b/text/0418-struct-variants.md @@ -0,0 +1,126 @@ +- Start Date: 2014-10-25 +- RFC PR: [rust-lang/rfcs#418](https://github.com/rust-lang/rfcs/pull/418) +- Rust Issue: [rust-lang/rust#18641](https://github.com/rust-lang/rust/issues/18641) + +# Summary + +Just like structs, variants can come in three forms - unit-like, tuple-like, +or struct-like: +```rust +enum Foo { + Foo, + Bar(int, String), + Baz { a: int, b: String } +} +``` +The last form is currently feature gated. This RFC proposes to remove that gate +before 1.0. + +# Motivation + +Tuple variants with multiple fields can become difficult to work with, +especially when the types of the fields don't make it obvious what each one is. +It is not an uncommon sight in the compiler to see inline comments used to help +identify the various variants of an enum, such as this snippet from +`rustc::middle::def`: +```rust +pub enum Def { + // ... + DefVariant(ast::DefId /* enum */, ast::DefId /* variant */, bool /* is_structure */), + DefTy(ast::DefId, bool /* is_enum */), + // ... +} +``` +If these were changed to struct variants, this ad-hoc documentation would move +into the names of the fields themselves. These names are visible in rustdoc, +so a developer doesn't have to go source diving to figure out what's going on. +In addition, the fields of struct variants can have documentation attached. +```rust +pub enum Def { + // ... + DefVariant { + enum_did: ast::DefId, + variant_did: ast::DefId, + /// Identifies the variant as tuple-like or struct-like + is_structure: bool, + }, + DefTy { + did: ast::DefId, + is_enum: bool, + }, + // ... +} +``` + +As the number of fields in a variant increases, it becomes increasingly crucial +to use struct variants. For example, consider this snippet from +`rust-postgres`: +```rust +enum FrontendMessage<'a> { + // ... + Bind { + pub portal: &'a str, + pub statement: &'a str, + pub formats: &'a [i16], + pub values: &'a [Option>], + pub result_formats: &'a [i16] + }, + // ... +} +``` +If we convert `Bind` to a tuple variant: +```rust +enum FrontendMessage<'a> { + // ... + Bind(&'a str, &'a str, &'a [i16], &'a [Option>], &'a [i16]), + // ... +} +``` +we run into both the documentation issues discussed above, as well as ergonomic +issues. If code only cares about the `values` and `formats` fields, working +with a struct variant is nicer: +```rust +match msg { + // you can reorder too! + Bind { values, formats, .. } => ... + // ... +} +``` +versus +```rust +match msg { + Bind(_, _, formats, values, _) => ... + // ... +} +``` + +This feature gate was originally put in place because there were many serious +bugs in the compiler's support for struct variants. This is not the case today. +The issue tracker does not appear have any open correctness issues related to +struct variants and many libraries, including rustc itself, have been using +them without trouble for a while. + +# Detailed design + +Change the `Status` of the `struct_variant` feature from `Active` to +`Accepted`. + +The fields of struct variants use the same style of privacy as normal struct +fields - they're private unless tagged `pub`. 
This is inconsistent with tuple +variants, where the fields have inherited visibility. Struct variant fields +will be changed to have inhereted privacy, and `pub` will no longer be allowed. + +# Drawbacks + +Adding formal support for a feature increases the maintenance burden of rustc. + +# Alternatives + +If struct variants remain feature gated at 1.0, libraries that want to ensure +that they will continue working into the future will be forced to avoid struct +variants since there are no guarantees about backwards compatibility of +feature-gated parts of the language. + +# Unresolved questions + +N/A diff --git a/text/0430-finalizing-naming-conventions.md b/text/0430-finalizing-naming-conventions.md new file mode 100644 index 00000000000..4989d0d25d4 --- /dev/null +++ b/text/0430-finalizing-naming-conventions.md @@ -0,0 +1,85 @@ +- Start Date: 2014-11-02 +- RFC PR: [rust-lang/rfcs#430](https://github.com/rust-lang/rfcs/pull/430) +- Rust Issue: [rust-lang/rust#19091](https://github.com/rust-lang/rust/issues/19091) + +# Summary + +This conventions RFC tweaks and finalizes a few long-running de facto +conventions, including capitalization/underscores, and the role of the `unwrap` method. + +See [this RFC](https://github.com/rust-lang/rfcs/pull/328) for a competing proposal for `unwrap`. + +# Motivation + +This is part of the ongoing conventions formalization process. The +conventions described here have been loosely followed for a long time, +but this RFC seeks to nail down a few final details and make them +official. + +# Detailed design + +## General naming conventions + +In general, Rust tends to use `CamelCase` for "type-level" constructs +(types and traits) and `snake_case` for "value-level" constructs. More +precisely, the proposed (and mostly followed) conventions are: + +| Item | Convention | +| ---- | ---------- | +| Crates | `snake_case` (but prefer single word) | +| Modules | `snake_case` | +| Types | `CamelCase` | +| Traits | `CamelCase` | +| Enum variants | `CamelCase` | +| Functions | `snake_case` | +| Methods | `snake_case` | +| General constructors | `new` or `with_more_details` | +| Conversion constructors | `from_some_other_type` | +| Local variables | `snake_case` | +| Static variables | `SCREAMING_SNAKE_CASE` | +| Constant variables | `SCREAMING_SNAKE_CASE` | +| Type parameters | concise `CamelCase`, usually single uppercase letter: `T` | +| Lifetimes | short, lowercase: `'a` | + +### Fine points + +In `CamelCase`, acronyms count as one word: use `Uuid` rather than +`UUID`. In `snake_case`, acronyms are lower-cased: `is_xid_start`. + +In `snake_case` or `SCREAMING_SNAKE_CASE`, a "word" should never +consist of a single letter unless it is the last "word". So, we have +`btree_map` rather than `b_tree_map`, but `PI_2` rather than `PI2`. + +## `unwrap`, `into_foo` and `into_inner` + +There has been a [long](https://github.com/mozilla/rust/issues/13159) +[running](https://github.com/rust-lang/rust/pull/16436) +[debate](https://github.com/rust-lang/rust/pull/16436) +[about](https://github.com/rust-lang/rfcs/pull/328) the name of the +`unwrap` method found in `Option` and `Result`, but also a few other +standard library types. Part of the problem is that for some types +(e.g. `BufferedReader`), `unwrap` will never panic; but for `Option` +and `Result` calling `unwrap` is akin to asserting that the value is +`Some`/`Ok`. + +There's basic agreement that we should have an unambiguous term for +the `Option`/`Result` version of `unwrap`. 
Proposals have included +`assert`, `ensure`, `expect`, `unwrap_or_panic` and others; see the +links above for extensive discussion. No clear consensus has emerged. + +This RFC proposes a simple way out: continue to call the methods +`unwrap` for `Option` and `Result`, and rename *other* uses of +`unwrap` to follow conversion conventions. Whenever possible, these +panic-free unwrapping operations should be `into_foo` for some +concrete `foo`, but for generic types like `RefCell` the name +`into_inner` will suffice. By convention, these `into_` methods cannot +panic; and by (proposed) convention, `unwrap` should be reserved for +an `into_inner` conversion that *can*. + +# Drawbacks + +Not really applicable; we need to finalize these conventions. + +# Unresolved questions + +Are there remaining subtleties about the rules here that should be clarified? diff --git a/text/0438-precedence-of-plus.md b/text/0438-precedence-of-plus.md new file mode 100644 index 00000000000..4328be9cacb --- /dev/null +++ b/text/0438-precedence-of-plus.md @@ -0,0 +1,92 @@ +- Start Date: 2014-11-18 +- RFC PR: [rust-lang/rfcs#438](https://github.com/rust-lang/rfcs/pull/438) +- Rust Issue: [rust-lang/rust#19092](https://github.com/rust-lang/rust/issues/19092) + +# Summary + +Change the precedence of `+` (object bounds) in type grammar so that +it is similar to the precedence in the expression grammars. + +# Motivation + +Currently `+` in types has a much higher precedence than it does in expressions. +This means that for example one can write a type like the following: + +``` +&Object+Send +``` + +Whereas if that were an expression, parentheses would be required: + +```rust +&(Object+Send) +```` + +Besides being confusing in its own right, this loose approach with +regard to precedence yields ambiguities with unboxed closure bounds: + +```rust +fn foo(f: F) + where F: FnOnce(&int) -> &Object + Send +{ } +``` + +In this example, it is unclear whether `F` returns an object which is +`Send`, or whether `F` itself is `Send`. + +# Detailed design + +This RFC proposes that the precedence of `+` be made lower than unary +type operators. In addition, the grammar is segregated such that in +"open-ended" contexts (e.g., after `->`), parentheses are required to +use a `+`, whereas in others (e.g., inside `<>`), parentheses are not. +Here are some examples: + +```rust +// Before After Note +// ~~~~~~ ~~~~~ ~~~~ + &Object+Send &(Object+Send) + &'a Object+'a &'a (Object+'a) + Box Box + foo::(...) foo::(...) + Fn() -> Object+Send Fn() -> (Object+Send) // (*) + Fn() -> &Object+Send Fn() -> &(Object+Send) + +// (*) Must yield a type error, as return type must be `Sized`. +``` + +More fully, the type grammar is as follows (EBNF notation): + + TYPE = PATH + | '&' [LIFETIME] TYPE + | '&' [LIFETIME] 'mut' TYPE + | '*' 'const' TYPE + | '*' 'mut' TYPE + | ... + | '(' SUM ')' + SUM = TYPE { '+' TYPE } + PATH = IDS '<' SUM { ',' SUM } '>' + | IDS '(' SUM { ',' SUM } ')' '->' TYPE + IDS = ['::'] ID { '::' ID } + +Where clauses would use the following grammar: + + WHERE_CLAUSE = PATH { '+' PATH } + +One property of this grammar is that the `TYPE` nonterminal does not +require a terminator as it has no "open-ended" expansions. `SUM`, in +contrast, can be extended any number of times via the `+` token. Hence +is why `SUM` must be enclosed in parens to make it into a `TYPE`. + +# Drawbacks + +Common types like `&'a Foo+'a` become slightly longer (`&'a (Foo+'a)`). 
+ +# Alternatives + +We could live with the inconsistency between the type/expression +grammars and disambiguate where clauses in an ad-hoc way. + +# Unresolved questions + +None. diff --git a/text/0439-cmp-ops-reform.md b/text/0439-cmp-ops-reform.md new file mode 100644 index 00000000000..e7f01ddb89f --- /dev/null +++ b/text/0439-cmp-ops-reform.md @@ -0,0 +1,487 @@ +- Start Date: 2014-11-03 +- RFC PR: [rust-lang/rfcs#439](https://github.com/rust-lang/rfcs/pull/439) +- Rust Issue: [rust-lang/rfcs#19148](https://github.com/rust-lang/rust/issues/19148) + +# Summary + +This RFC proposes a number of design improvements to the `cmp` and +`ops` modules in preparation for 1.0. The impetus for these +improvements, besides the need for stabilization, is that we've added +several important language features (like multidispatch) that greatly +impact the design. Highlights: + +* Make basic unary and binary operators work by value and use associated types. +* Generalize comparison operators to work across different types; drop `Equiv`. +* Refactor slice notation in favor of *range notation* so that special + traits are no longer needed. +* Add `IndexSet` to better support maps. +* Clarify ownership semantics throughout. + +# Motivation + +The operator and comparison traits play a double role: they are lang +items known to the compiler, but are also library APIs that need to be +stabilized. + +While the traits have been fairly stable, a lot has changed in the +language recently, including the addition of multidispatch, associated +types, and changes to method resolution (especially around smart +pointers). These are all things that impact the ideal design of the traits. + +Since it is now relatively clear how these language features will work +at 1.0, there is enough information to make final decisions about the +construction of the comparison and operator traits. That's what this +RFC aims to do. + +# Detailed design + +The traits in `cmp` and `ops` can be broken down into several +categories, and to keep things manageable this RFC discusses each +category separately: + +* Basic operators: + * Unary: `Neg`, `Not` + * Binary: `Add`, `Sub`, `Mul`, `Div`, `Rem`, `Shl`, `Shr`, `BitAnd`, `BitOr`, `BitXor`, +* Comparison: `PartialEq`, `PartialOrd`, `Eq`, `Ord`, `Equiv` +* Indexing and slicing: `Index`, `IndexMut`, `Slice`, `SliceMut` +* Special traits: `Deref`, `DerefMut`, `Drop`, `Fn`, `FnMut`, `FnOnce` + +## Basic operators + +The basic operators include arithmetic and bitwise notation with both +unary and binary operators. + +### Current design + +Here are two example traits, one unary and one binary, for basic operators: + +```rust +pub trait Not { + fn not(&self) -> Result; +} + +pub trait Add { + fn add(&self, rhs: &Rhs) -> Result; +} +``` + +The rest of the operators follow the same pattern. Note that `self` +and `rhs` are taken by reference, and the compiler introduce *silent* +uses of `&` for the operands. + +The traits also take `Result` as an +[*input*](https://github.com/rust-lang/rfcs/pull/195) type. 
+ +### Proposed design + +This RFC proposes to make `Result` an associated (output) type, and to +make the traits work by value: + +```rust +pub trait Not { + type Result; + fn not(self) -> Result; +} + +pub trait Add { + type Result; + fn add(self, rhs: Rhs) -> Result; +} +``` + +The reason to make `Result` an associated type is straightforward: it +should be uniquely determined given `Self` and other input types, and +making it an associated type is better for both type inference and for +keeping things concise when using these traits in bounds. + +Making these traits work by value is motivated by cases like `DList` +concatenation, where you may want the operator to actually consume the +operands in producing its output (by welding the two lists together). + +It also means that the compiler does not have to introduce a silent +`&` for the operands, which means that the ownership semantics when +using these operators is much more clear. + +Fortunately, there is no loss in expressiveness, since you can always +implement the trait on reference types. However, for types that *do* +need to be taken by reference, there is a slight loss in ergonomics +since you may need to explicitly borrow the operands with `&`. The +upside is that the ownership semantics become clearer: they more +closely resemble normal function arguments. + +By keeping `Rhs` as an input trait on the trait, you can overload on the +types of both operands via +[multidispatch](https://github.com/rust-lang/rfcs/pull/195). By +defaulting `Rhs` to `Self`, in +[the future](https://github.com/rust-lang/rfcs/pull/213) it will be +possible to simply say `T: Add` as shorthand for `T: Add`, which is +the common case. + +Examples: + +```rust +// Basic setup for Copy types: +impl Add for uint { + type Result = uint; + fn add(self, rhs: uint) -> uint { ... } +} + +// Overloading on the Rhs: +impl Add for Complex { + type Result = Complex; + fn add(self, rhs: uint) -> Complex { ... } +} + +impl Add for Complex { + type Result = Complex; + fn add(self, rhs: Complex) -> Complex { ... } +} + +// Recovering by-ref semantics: +impl<'a, 'b> Add<&'a str> for &'b str { + type Result = String; + fn add(self, rhs: &'a str) -> String { ... } +} +``` + +## Comparison traits + +The comparison traits provide overloads for operators like `==` and `>`. + +### Current design + +Comparisons are subtle, because some types (notably `f32` and `f64`) +do not actually provide full equivalence relations or total +orderings. The current design therefore splits the comparison traits +into "partial" variants that do not promise full equivalence +relations/ordering, and "total" variants which inherit from them but +make stronger semantic guarantees. The floating point types implement +the partial variants, and the operators defer to them. But certain +collection types require e.g. total rather than partial orderings: + +```rust +pub trait PartialEq { + fn eq(&self, other: &Self) -> bool; + + fn ne(&self, other: &Self) -> bool { !self.eq(other) } +} + +pub trait Eq: PartialEq {} + +pub trait PartialOrd: PartialEq { + fn partial_cmp(&self, other: &Self) -> Option; + fn lt(&self, other: &Self) -> bool { .. } + fn le(&self, other: &Self) -> bool { .. } + fn gt(&self, other: &Self) -> bool { .. } + fn ge(&self, other: &Self) -> bool { .. 
} +} + +pub trait Ord: Eq + PartialOrd { + fn cmp(&self, other: &Self) -> Ordering; +} + +pub trait Equiv { + fn equiv(&self, other: &T) -> bool; +} +``` + +In addition there is an `Equiv` trait that can be used to compare +values of *different* types for equality, but does not correspond to +any operator sugar. (It was introduced in part to help solve some +problems in map APIs, which are now resolved in a different way.) + +The comparison traits all work by reference, and the compiler inserts +implicit uses of `&` to make this ergonomic. + +### Proposed design + +This RFC proposes to follow largely the same design strategy, but to +remove `Equiv` and instead generalize the traits via multidispatch: + +```rust +pub trait PartialEq { + fn eq(&self, other: &Rhs) -> bool; + + fn ne(&self, other: &Rhs) -> bool { !self.eq(other) } +} + +pub trait Eq: PartialEq {} + +pub trait PartialOrd: PartialEq { + fn partial_cmp(&self, other: &Rhs) -> Option; + fn lt(&self, other: &Rhs) -> bool { .. } + fn le(&self, other: &Rhs) -> bool { .. } + fn gt(&self, other: &Rhs) -> bool { .. } + fn ge(&self, other: &Rhs) -> bool { .. } +} + +pub trait Ord: Eq + PartialOrd { + fn cmp(&self, other: &Rhs) -> Ordering; +} +``` + +Due to the use of defaulting, this generalization loses no +ergonomics. However, it makes it *possible* to overload notation like +`==` to compare different types without needing an explicit +conversion. (Precisely *which* overloadings we provide in `std` will +be subject to API stabilization.) This more general design will allow +us to eliminate the `iter::order` submodule in favor of comparison +notation, for example. + +This design suffers from the problem that it is somewhat painful to +implement or derive `Eq`/`Ord`, which is the common case. We can +likely improve e.g. `#[deriving(Ord)]` to automatically derive +`PartialOrd`. See Alternatives for a more radical design (and the +reasons that it's not feasible right now.) + +## Indexing and slicing + +There are a few traits that support `[]` notation for indexing and slicing. + +### Current design: + +The current design is as follows: + +```rust +pub trait Index { + fn index<'a>(&'a self, index: &Index) -> &'a Result; +} + +pub trait IndexMut { + fn index_mut<'a>(&'a mut self, index: &Index) -> &'a mut Result; +} + +pub trait Slice for Sized? { + fn as_slice_<'a>(&'a self) -> &'a Result; + fn slice_from_or_fail<'a>(&'a self, from: &Idx) -> &'a Result; + fn slice_to_or_fail<'a>(&'a self, to: &Idx) -> &'a Result; + fn slice_or_fail<'a>(&'a self, from: &Idx, to: &Idx) -> &'a Result; +} + +// and similar for SliceMut... +``` + +The index and slice traits work somewhat differently. For +`Index`/`IndexMut`, the return value is *implicitly* dereferenced, so +that notation like `v[i] = 3` makes sense. If you want to get your +hands on the actual reference, you usually need an explicit `&`, for +example `&v[i]` or `&mut v[i]` (the compiler determines whether to use +`Index` or `IndexMut` by context). This follows the C notational +tradition. + +Slice notation, on the other hand, does *not* automatically dereference +and so requires a special `mut` marker: `v[mut 1..]`. + +For both of these traits, the indexes themselves are taken by +reference, and the compiler automatically introduces a `&` (so you +write `v[3]` not `v[&3]`). + +### Proposed design + +This RFC proposes to refactor the slice design into more modular +components, which as a side-product will make slicing automatically +dereference the result (consistently with indexing). 
The latter is +desirable because `&mut v[1..]` is more consistent with the rest of +the language than `v[mut 1..]` (and also makes the borrowing semantics +more explicit). + +#### Index revisions + +In the new design, the index traits take the index by value and the +compiler no longer introduces a silent `&`. This follows the same +design as for e.g. `Add` above, and for much the same reasons. That +means in particular that it will be possible to write `map["key"]` +rather than `map[*"key"]` when using a map with `String` keys, and +will still be possible to write `v[3]` for vectors. In addition, the +`Result` becomes an associated type, again following the same design +outlined above: + +```rust +pub trait Index for Sized? { + type Sized? Result; + fn index<'a>(&'a self, index: Idx) -> &'a Result; +} + +pub trait IndexMut for Sized? { + type Sized? Result; + fn index_mut<'a>(&'a mut self, index: Idx) -> &'a mut Result; +} +``` + +In addition, this RFC proposes another trait, `IndexSet`, that is used for `expr[i] = expr`: + +```rust +pub trait IndexSet { + type Val; + fn index_set<'a>(&'a mut self, index: Idx, val: Val); +} +``` + +(This idea is borrowed from +[@sfackler's earlier RFC](https://github.com/rust-lang/rfcs/pull/159/files).) + +The motivation for this trait is cases like `map["key"] = val`, which +should correspond to an *insertion* rather than a mutable lookup. With +today's setup, that expression would result in a panic if "key" was +not already present in the map. + +Of course, `IndexSet` and `IndexMut` overlap, since `expr[i] = expr` +could be interpreted using either. Some types may implement `IndexSet` +but not `IndexMut` (for example, if it doesn't make sense to produce +an interior reference). But for types providing both, the compiler +will use `IndexSet` to interpret the `expr[i] = expr` syntax. (You can +always get `IndexMut` by instead writing `* &mut expr[i] = expr`, but +this will likely be extremely rare.) + +#### Slice revisions + +The changes to slice notation are more radical: this RFC proposes to +remove the slice traits altogether! The replacement is to introduce +*range notation* and overload indexing on it. + +The current slice notation allows you to write `v[i..j]`, `v[i..]`, +`v[..j]` and `v[]`. The idea for handling the first three is to add +the following desugaring: + +```rust +i..j ==> Range(i, j) +i.. ==> RangeFrom(i) +..j ==> RangeTo(j) + +where + +struct Range(Idx, Idx); +struct RangeFrom(Idx); +struct RangeTo(Idx); +``` + +Then, to implement slice notation, you just implement `Index`/`IndexMut` with +`Range`, `RangeFrom`, and `RangeTo` index types. + +This cuts down on the number of special traits and machinery. It makes +indexing and slicing more consistent (since both will implicitly deref +their result); you'll write `&mut v[1..]` to get a mutable slice. It +also opens the door to other uses of the range notation: + +``` +for x in 1..100 { ... } +``` + +because the refactored design is more modular. + +What about `v[]` notation? The proposal is to desugar this to +`v[FullRange]` where `struct FullRange;`. + +Note that `..` is already used in a few places in the grammar, notably +fixed length arrays and functional record update. The former is at the +type level, however, and the latter is not ambiguous: `Foo { a: x, +.. bar}` since the `.. bar` component will never be parsed as an +expression. 
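+
+As a rough sketch in the style of the proposed traits above (illustrative only,
+not compilable today), a vector-like type would opt into slice notation simply
+by implementing `Index` for the range types; the desugarings in the comments
+follow the rules described above:
+
+```rust
+// Sketch: slicing a `Vec` from a starting index, using the proposed traits.
+impl<T> Index<RangeFrom<uint>> for Vec<T> {
+    type Sized? Result = [T];
+    fn index<'a>(&'a self, r: RangeFrom<uint>) -> &'a [T] { ... }
+}
+
+// With such impls in place, slicing desugars roughly as follows:
+// &v[2..7]      ~  &*v.index(Range(2u, 7u))
+// &v[2..]       ~  &*v.index(RangeFrom(2u))
+// &mut v[..7]   ~  &mut *v.index_mut(RangeTo(7u))
+// &v[]          ~  &*v.index(FullRange)
+```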
+ +## Special traits + +Finally, there are a few "special" traits that hook into the compiler +in various ways that go beyond basic operator overlaoding. + +### `Deref` and `DerefMut` + +The `Deref` and `DerefMut` traits are used for overloading +dereferencing, typically for smart pointers. + +The current traits look like so: + +```rust +pub trait Deref { + fn deref<'a>(&'a self) -> &'a Result; +} +``` + +but the `Result` type should become an associated type, dictating that +a smart pointer can only deref to a single other type (which is also +needed for inference and other magic around deref): + +```rust +pub trait Deref { + type Sized? Result; + fn deref<'a>(&'a self) -> &'a Result; +} +``` + +### `Drop` + +This RFC proposes no changes to the `Drop` trait. + +### Closure traits + +This RFC proposes no changes to the closure traits. The current design looks like: + +```rust +pub trait Fn { + fn call(&self, args: Args) -> Result; +} +``` + +and, given the way that multidispatch has worked out, it is safe and +more flexible to keep both `Args` and `Result` as input types (which +means that custom implementations could overload on either). In +particular, the sugar for these traits requires writing all of these +types anyway. + +These traits should *not* be exposed as `#[stable]` for 1.0, meaning +that you will not be able to implement or use them directly from the +[stable release channel](http://blog.rust-lang.org/2014/10/30/Stability.html). There +are a few reasons for this. For one, when bounding by these traits you +generally want to use the sugar `Fn (T, U) -> V` instead, which will +be stable. Keeping the traits themselves unstable leaves us room to +change their definition to support +[variadic generics](https://github.com/rust-lang/rfcs/issues/376) in +the future. + +# Drawbacks + +The main drawback is that implementing the above will take a bit of +time, which is something we're currently very short on. However, +stabilizing `cmp` and `ops` has always been part of the plan, and has +to be done for 1.0. + +# Alternatives + +## Comparison traits + +We could pursue a more aggressive change to the comparison traits by +not having `PartialOrd` be a super trait of `Ord`, but instead +providing a blanket `impl` for `PartialOrd` for any `T: +Ord`. Unfortunately, this design poses some problems when it comes to +things like tuples, which want to provide `PartialOrd` and `Ord` if +all their components do: you would end up with overlapping +`PartialOrd` `impl`s. It's possible to work around this, but at the +expense of additional language features (like "negative bounds", the +ability to make an `impl` apply only when certain things are *not* +true). + +Since it's unlikely that these other changes can happen in time for +1.0, this RFC takes a more conservative approach. + +## Slicing + +We may want to drop the `[]` notation. This notation was introduced to +improve ergonomics (from `foo(v.as_slice())` to `foo(v[]`), but now +that [collections reform](https://github.com/rust-lang/rfcs/pull/235) +is starting to land we can instead write `foo(&*v)`. If we also had +[deref coercions](https://github.com/rust-lang/rfcs/pull/241), that +would be just `foo(&v)`. + +While `&*v` notation is more ergonomic than `v.as_slice()`, it is also +somewhat intimidating notation for a situation that newcomers to the +language are likely to face quickly. + +In the opinion of this RFC author, we should either keep `[]` +notation, or provide deref coercions so that you can just say `&v`. 
+ +# Unresolved questions + +In the long run, we should support overloading of operators like `+=` +which often have a more efficient implementation than desugaring into +a `+` and an `=`. However, this can be added backwards-compatibly and +is not significantly blocking library stabilization, so this RFC +postpones consideration until a later date. diff --git a/text/0445-extension-trait-conventions.md b/text/0445-extension-trait-conventions.md new file mode 100644 index 00000000000..79d700711bd --- /dev/null +++ b/text/0445-extension-trait-conventions.md @@ -0,0 +1,148 @@ +- Start Date: 2014-11-05 +- RFC PR: [rust-lang/rfcs#445](https://github.com/rust-lang/rfcs/pull/445) +- Rust Issue: [rust-lang/rust#19324](https://github.com/rust-lang/rust/issues/19324) + +# Summary + +This is a conventions RFC establishing a definition and naming +convention for *extension traits*: `FooExt`. + +# Motivation + +This RFC is part of the ongoing API conventions and stabilization +effort. + +Extension traits are a programming pattern that makes it +possible to add methods to an existing type outside of the crate +defining that type. While they should be used sparingly, the new +[object safety rules](https://github.com/rust-lang/rfcs/pull/255) have +increased the need for this kind of trait, and hence the need for a +clear convention. + +# Detailed design + +## What is an extension trait? + +Rust currently allows inherent methods to be defined on a type only in +the crate where that type is defined. But it is often the case that +clients of a type would like to incorporate additional methods to +it. Extension traits are a pattern for doing so: + +```rust +extern crate foo; +use foo::Foo; + +trait FooExt { + fn bar(&self); +} + +impl FooExt for Foo { + fn bar(&self) { .. } +} +``` + +By defining a new trait, a client of `foo` can add new methods to `Foo`. + +Of course, adding methods via a new trait happens all the time. What +makes it an *extension* trait is that the trait is not designed for +*generic* use, but only as way of adding methods to a specific type or +family of types. + +This is of course a somewhat subjective distinction. Whenever +designing an extension trait, one should consider whether the trait +could be used in some more generic way. If so, the trait should be +named and exported as if it were just a "normal" trait. But traits +offering groups of methods that really only make sense in the context +of some particular type(s) are true extension traits. + +The new +[object safety rules](https://github.com/rust-lang/rfcs/pull/255) mean +that a trait can only be used for trait objects if *all* of its +methods are usable; put differently, it ensures that for "object safe +traits" there is always a canonical way to implement `Trait` for +`Box`. To deal with this new rule, it is sometimes necessary to +break traits apart into an object safe trait and extension traits: + +```rust +// The core, object-safe trait +trait Iterator { + fn next(&mut self) -> Option; +} + +// The extension trait offering object-unsafe methods +trait IteratorExt: Iterator { + fn chain>(self, other: U) -> Chain { ... } + fn zip>(self, other: U) -> Zip { ... } + fn map(self, f: |A| -> B) -> Map<'r, A, B, Self> { ... } + ... +} + +// A blanket impl +impl IteratorExt for I where I: Iterator { + ... 
+} +``` + +Note that, although this split-up definition is somewhat more complex, +it is also more flexible: because `Box>` will implement +`Iterator`, you can now use *all* of the adapter methods provided +in `IteratorExt` on trait objects, even though they are not object +safe. + +## The convention + +The proposed convention is, first of all, to (1) prefer adding default +methods to existing traits or (2) prefer generically useful traits to +extension traits whenever feasible. + +For true extension traits, there should be a clear type or trait that +they are extending. The extension trait should be called `FooExt` +where `Foo` is that type or trait. + +In some cases, the extension trait only applies conditionally. For +example, `AdditiveIterator` is an extension trait currently in `std` +that applies to iterators over numeric types. These extension traits +should follow a similar convention, putting together the type/trait +name and the qualifications, together with the `Ext` suffix: +`IteratorAddExt`. + +### What about `Prelude`? + +A [previous convention](https://github.com/rust-lang/rfcs/pull/344) +used a `Prelude` suffix for extension traits that were also part of +the `std` prelude; this new convention deprecates that one. + +## Future proofing + +In the future, the need for many of these extension traits may +disappear as other languages features are added. For example, +method-level `where` clauses will eliminate the need for +`AdditiveIterator`. And allowing inherent `impl`s like `impl +T { .. }` for the crate defining `Trait` would eliminate even more. + +However, there will always be *some* use of extension traits, and we +need to stabilize the 1.0 libraries prior to these language features +landing. So this is the proposed convention for now, and in the future +it may be possible to deprecate some of the resulting traits. + +# Alternatives + +It seems clear that we need *some* convention here. Other possible +suffixes would be `Util` or `Methods`, but `Ext` is both shorter and +connects to the name of the pattern. + +# Drawbacks + +In general, extension traits tend to require additional imports -- +especially painful when dealing with object safety. However, this is +more to do with the language as it stands today than with the +conventions in this RFC. + +Libraries are already starting to export their own `prelude` module +containing extension traits among other things, which by convention is +glob imported. + +In the long run, we should add a general "prelude" facility for +external libraries that makes it possible to *globally* import a small +set of names from the crate. Some early investigations of such a +feature are already under way, but are outside the scope of this RFC. diff --git a/text/0446-es6-unicode-escapes.md b/text/0446-es6-unicode-escapes.md new file mode 100644 index 00000000000..8124f205b9a --- /dev/null +++ b/text/0446-es6-unicode-escapes.md @@ -0,0 +1,84 @@ +- Start Date: 2014-11-05 +- RFC PR: https://github.com/rust-lang/rfcs/pull/446 +- Rust Issue: https://github.com/rust-lang/rust/issues/19739 + +# Summary + +Remove `\u203D` and `\U0001F4A9` unicode string escapes, and add +[ECMAScript 6-style](https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point) +`\u{1F4A9}` escapes instead. + +# Motivation + +The syntax of `\u` followed by four hexadecimal digits dates from when Unicode +was a 16-bit encoding, and only went up to U+FFFF. 
+`\U` followed by eight hex digits was added as a band-aid +when Unicode was extended to U+10FFFF, +but neither four nor eight digits particularly make sense now. + +Having two different syntaxes with the same meaning but that apply +to different ranges of values is inconsistent and arbitrary. +This proposal unifies them into a single syntax that has a precedent +in ECMAScript a.k.a. JavaScript. + + +# Detailed design + +In terms of the grammar in [The Rust Reference]( +http://doc.rust-lang.org/reference.html#character-and-string-literals), +replace: + +``` +unicode_escape : 'u' hex_digit 4 + | 'U' hex_digit 8 ; +``` + +with + +``` +unicode_escape : 'u' '{' hex_digit+ 6 '}' +``` + +That is, `\u{` followed by one to six hexadecimal digits, followed by `}`. + +The behavior would otherwise be identical. + +## Migration strategy + +In order to provide a graceful transition from the old `\uDDDD` and +`\UDDDDDDDD` syntax to the new `\u{DDDDD}` syntax, this feature +should be added in stages: + +* Stage 1: Add support for the new `\u{DDDDD}` syntax, without removing +previous support for `\uDDDD` and `\UDDDDDDDD`. + +* Stage 2: Warn on occurrences of `\uDDDD` and `\UDDDDDDDD`. Convert +all library code to use `\u{DDDDD}` instead of the old syntax. + +* Stage 3: Remove support for the old syntax entirely (preferably +during a separate release from the one that added the warning from +Stage 2). + +# Drawbacks + +* This is a breaking change and updating code for it manually is annoying. + It is however very mechanical, and we could provide scripts to automate it. +* Formatting templates already use curly braces. + Having multiple curly braces pairs in the same strings that have a very + different meaning can be surprising: + `format!("\u{e8}_{e8}", e8 = "é")` would be `"è_é"`. + However, there is a precedent of overriding characters: + `\` can start an escape sequence both in the Rust lexer for strings + and in regular expressions. + + +# Alternatives + +* Status quo: don’t change the escaping syntax. +* Add the new `\u{…}` syntax, but also keep the existing `\u` and `\U` syntax. + This is what ES 6 does, but only to keep compatibility with ES 5. + We don’t have that constaint pre-1.0. + +# Unresolved questions + +None so far. diff --git a/text/0447-no-unused-impl-parameters.md b/text/0447-no-unused-impl-parameters.md new file mode 100644 index 00000000000..159519119ab --- /dev/null +++ b/text/0447-no-unused-impl-parameters.md @@ -0,0 +1,168 @@ +- Start Date: 2014-11-06 +- RFC PR: https://github.com/rust-lang/rfcs/pull/447 +- Rust Issue: https://github.com/rust-lang/rust/issues/20598 + +# Summary + +Disallow unconstrained type parameters from impls. In practice this +means that every type parameter must either: + +1. appear in the trait reference of the impl, if any; +2. appear in the self type of the impl; or, +3. be bound as an associated type. + +This is an informal description, see below for full details. + +# Motivation + +Today it is legal to have impls with type parameters that are +effectively unconstrainted. This RFC proses to make these illegal by +requiring that all impl type parameters must appear in either the self +type of the impl or, if the impl is a trait impl, an (input) type +parameter of the trait reference. Type parameters can also be constrained +by associated types. + +There are many reasons to make this change. First, impls are not +explicitly instantiated or named, so there is no way for users to +manually specify the values of type variables; the values must be +inferred. 
If the type parameters do not appear in the trait reference +or self type, however, there is no basis on which to infer them; this +almost always yields an error in any case (unresolved type variable), +though there are some corner cases where the inferencer can find a +constraint. + +Second, permitting unconstrained type parameters to appear on impls +can potentially lead to ill-defined semantics later on. The current +way that the language works for cross-crate inlining is that the body +of the method is effectively reproduced within the target crate, but +in a fully elaborated form where it is as if the user specified every +type explicitly that they possibly could. This should be sufficient to +reproduce the same trait selections, even if the crate adds additional +types and additional impls -- but this cannot be guaranteed if there +are free-floating type parameters on impls, since their values are not +written anywhere. (This semantics, incidentally, is not only +convenient, but also required if we wish to allow for specialization +as a possibility later on.) + +Finally, there is little to no loss of expressiveness. The type +parameters in question can always be moved somewhere else. + +Here are some examples to clarify what's allowed and disallowed. In +each case, we also clarify how the example can be rewritten to be +legal. + +```rust +// Legal: +// - A is used in the self type. +// - B is used in the input trait type parameters. +impl SomeTrait> for Foo { + type Output = Result; +} + +// Legal: +// - A and B are used in the self type +impl Vec<(A,B)> { + ... +} + +// Illegal: +// - A does not appear in the self type nor trait type parameters. +// +// This sort of pattern can generally be written by making `Bar` carry +// `A` as a phantom type parameter, or by making `Elem` an input type +// of `Foo`. +impl Foo for Bar { + type Elem = A; // associated types do not count + ... +} + +// Illegal: B does not appear in the self type. +// +// Note that B could be moved to the method `get()` with no +// loss of expressiveness. +impl Foo { + fn do_something(&self) { + } + + fn get(&self) -> B { + B::Default + } +} + +// Legal: `U` does not appear in the input types, +// but it bound as an associated type of `T`. +impl Foo for T + where T : Bar { +} +``` + +# Detailed design + +Type parameters are legal if they are "constrained" according to the +following inference rules: + +``` +If T appears in the impl trait reference, + then: T is constrained + +If T appears in the impl self type, + then: T is constrained + +If >::U == V appears in the impl predicates, + and T0...Tn are constrained + and T0 as Trait is not the impl trait reference + then: V is constrained +``` + +The interesting rule is of course the final one. It says that type +parameters whose value is determined by an associated type reference +are legal. A simple example is: + +``` +impl Foo for T + where T : Bar +``` + +However, we have to be careful to avoid cases where the associated +type is an associated type of things that are not themselves +constrained: + +``` +impl Foo for T + where U: Bar +``` + +Similarly, the final clause in the rule aims to prevent an impl from +"self-referentially" constraining an output type parameter: + +``` +impl Bar for T + where T : Bar +``` + +This last case isn't that important because impls like this, when +used, tend to result in overflow in the compiler, but it's more +user-friendly to report an error earlier. 
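+
+For instance, the illegal example above whose parameter was only used by `get`
+can be rewritten by moving that parameter onto the method itself, as the
+comments there suggest. This is a sketch only; the `Foo` type and the use of the
+`Default` trait are purely illustrative:
+
+```rust
+// Legal rewrite: the previously unconstrained impl parameter `B` moves onto
+// the method that actually uses it.
+impl<A> Foo<A> {
+    fn do_something(&self) {
+    }
+
+    fn get<B: Default>(&self) -> B {
+        Default::default()
+    }
+}
+```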
+ +# Drawbacks + +This pattern requires a non-local rewrite to reproduce: + +``` +impl Foo for Bar { + type Elem = A; // associated types do not count + ... +} +``` + +# Alternatives + +To make these type parameters well-defined, we could also create a +syntax for specifying impl type parameter instantiations and/or have +the compiler track the full tree of impl type parameter instantiations +at type-checking time and supply this to the translation phase. This +approach rules out the possibility of impl specialization. + +# Unresolved questions + +None. diff --git a/text/0450-un-feature-gate-some-more-gates.md b/text/0450-un-feature-gate-some-more-gates.md new file mode 100644 index 00000000000..d1f5e0d11f1 --- /dev/null +++ b/text/0450-un-feature-gate-some-more-gates.md @@ -0,0 +1,56 @@ +- Start Date: 2014-12-02 +- RFC PR: [450](https://github.com/rust-lang/rfcs/pull/450) +- Rust Issue: [19469](https://github.com/rust-lang/rust/issues/19469) + +# Summary + +Remove the `tuple_indexing`, `if_let`, and `while_let` feature gates and add +them to the language. + +# Motivation + +## Tuple Indexing + +This feature has proven to be quite useful for tuples and struct variants, and +it allows for the removal of some unnecessary tuple accessing traits in the +standard library (TupleN). + +The implementation has also proven to be quite solid with very few reported +internal compiler errors related to this feature. + +## `if let` and `while let` + +This feature has also proven to be quite useful over time. Many projects are now +leveraging these feature gates which is a testament to their usefulness. + +Additionally, the implementation has also proven to be quite solid with very +few reported internal compiler errors related to this feature. + +# Detailed design + +* Remove the `if_let`, `while_let`, and `tuple_indexing` feature gates. +* Add these features to the language (do not require a feature gate to use them). +* Deprecate the `TupleN` traits in `std::tuple`. + +# Drawbacks + +Adding features to the language this late in the game is always somewhat of a +risky business. These features, while having baked for a few weeks, haven't had +much time to bake in the grand scheme of the language. These are both backwards +compatible to accept, and it could be argued that this could be done later +rather than sooner. + +In general, the major drawbacks of this RFC are the scheduling risks and +"feature bloat" worries. This RFC, however, is quite easy to implement (reducing +schedule risk) and concerns two fairly minor features which are unambiguously +nice to have. + +# Alternatives + +* Instead of un-feature-gating before 1.0, these features could be released + after 1.0 (if at all). The `TupleN` traits would then be required to be + deprecated for the entire 1.0 release cycle. + +# Unresolved questions + +None at the moment. diff --git a/text/0453-macro-reform.md b/text/0453-macro-reform.md new file mode 100644 index 00000000000..dac5bff5751 --- /dev/null +++ b/text/0453-macro-reform.md @@ -0,0 +1,295 @@ +- Start Date: 2014-11-05 +- RFC PR: [rust-lang/rfcs#453](https://github.com/rust-lang/rfcs/pull/453) +- Rust Issue: [rust-lang/rust#20008](https://github.com/rust-lang/rust/issues/20008) + +# Summary + +Various enhancements to macros ahead of their standardization in 1.0. + +**Note**: This is not the final Rust macro system design for all time. Rather, +it addresses the largest usability problems within the limited time frame for +1.0. 
It's my hope that a lot of these problems can be solved in nicer ways +in the long term (there is some discussion of this below). + +# Motivation + +`macro_rules!` has [many rough +edges](https://github.com/rust-lang/rfcs/issues/440). A few of the big ones: + +- You can't re-export macros +- Even if you could, names produced by the re-exported macro won't follow the re-export +- You can't use the same macro in-crate and exported, without the "curious inner-module" hack +- There's no namespacing at all +- You can't control which macros are imported from a crate +- You need the feature-gated `#[phase(plugin)]` to import macros + +These issues in particular are things we have a chance of addressing for 1.0. +This RFC contains plans to do so. + +# Semantic changes + +These are the substantial changes to the macro system. The examples also use +the improved syntax, described later. + +## `$crate` + +The first change is to disallow importing macros from an `extern crate` that is +not at the crate root. In that case, if + +```rust +extern crate "bar" as foo; +``` + +imports macros, then it's also introducing ordinary paths of the form +`::foo::...`. We call `foo` the *crate ident* of the `extern crate`. + +We introduce a special macro metavar `$crate` which expands to `::foo` when a +macro was imported through crate ident `foo`, and to nothing when it was +defined in the crate where it is being expanded. `$crate::bar::baz` will be an +absolute path either way. + +This feature eliminates the need for the "curious inner-module" and also +enables macro re-export (see below). It is [implemented and +tested](https://github.com/kmcallister/rust/commits/macro-reexport) but needs a +rebase. + +We can add a lint to warn about cases where an exported macro has paths that +are not absolute-with-crate or `$crate`-relative. This will have some +(hopefully rare) false positives. + +## Macro scope + +In this document, the "syntax environment" refers to the set of syntax +extensions that can be invoked at a given position in the crate. The names in +the syntax environment are simple unqualified identifiers such as `panic` and +`vec`. Informally we may write `vec!` to distinguish from an ordinary item. +However, the exclamation point is really part of the invocation syntax, not the +name, and some syntax extensions are invoked with no exclamation point, for +example item decorators like `deriving`. + +We introduce an attribute `macro_use` to specify which macros from an external +crate should be imported to the syntax environment: + +```rust +#[macro_use(vec, panic="fail")] +extern crate std; + +#[macro_use] +extern crate core; +``` + +The list of macros to import is optional. Omitting the list imports all macros, +similar to a glob `use`. (This is also the mechanism by which `std` will +inject its macros into every non-`no_std` crate.) + +Importing with rename is an optional part of this proposal that will be +implemented for 1.0 only if time permits. + +Macros imported this way can be used anywhere in the module after the +`extern crate` item, including in child modules. Since a macro-importing +`extern crate` must appear at the crate root, and view items come before +other items, this effectively means imported macros will be visible for +the entire crate. + +Any name collision between macros, whether imported or defined in-crate, is a +hard error. + +Many macros expand using other "helper macros" as an implementation detail. +For example, librustc's `declare_lint!` uses `lint_initializer!`. 
The client +should not know about this macro, although it still needs to be exported for +cross-crate use. For this reason we allow `#[macro_use]` on a macro +definition. + +```rust +/// Not to be imported directly. +#[macro_export] +macro_rules! lint_initializer { ... } + +/// Declare a lint. +#[macro_export] +#[macro_use(lint_initializer)] +macro_rules! declare_lint { + ($name:ident, $level:ident, $desc:expr) => ( + static $name: &'static $crate::lint::Lint + = &lint_initializer!($name, $level, $desc); + ) +} +``` + +The macro `lint_initializer!`, imported from the same crate as `declare_lint!`, +will be visible only during further expansion of the result of invoking +`declare_lint!`. + +`macro_use` on `macro_rules` is an optional part of this proposal that will be +implemented for 1.0 only if time permits. Without it, libraries that use +helper macros will need to list them in documentation so that users can import +them. + +Procedural macros need their own way to manipulate the syntax environment, but +that's an unstable internal API, so it's outside the scope of this RFC. + +# New syntax + +We also clean up macro syntax in a way that complements the semantic changes above. + +## `#[macro_use(...)] mod` + +The `macro_use` attribute can be applied to a `mod` item as well. The +specified macros will "escape" the module and become visible throughout the +rest of the enclosing module, including any child modules. A crate might start +with + +```rust +#[macro_use] +mod macros; +``` + +to define some macros for use by the whole crate, without putting those +definitions in `lib.rs`. + +Note that `#[macro_use]` (without a list of names) is equivalent to the +current `#[macro_escape]`. However, the new convention is to use an outer +attribute, in the file whose syntax environment is affected, rather than an +inner attribute in the file defining the macros. + +## Macro export and re-export + +Currently in Rust, a macro definition qualified by `#[macro_export]` becomes +available to other crates. We keep this behavior in the new system. A macro +qualified by `#[macro_export]` can be the target of `#[macro_use(...)]`, and +will be imported automatically when `#[macro_use]` is given with no list of +names. + +`#[macro_export]` has no effect on the syntax environment for the current +crate. + +We can also re-export macros that were imported from another crate. For +example, libcollections defines a `vec!` macro, which would now look like: + +```rust +#[macro_export] +macro_rules! vec { + ($($e:expr),*) => ({ + let mut _temp = $crate::vec::Vec::new(); + $(_temp.push($e);)* + _temp + }) +} +``` + +Currently, libstd duplicates this macro in its own `macros.rs`. Now it could +do + +```rust +#[macro_reexport(vec)] +extern crate collections; +``` + +as long as the module `std::vec` is interface-compatible with +`collections::vec`. + +(Actually the current libstd `vec!` is completely different for efficiency, but +it's just an example.) + +Because macros are exported in crate metadata as strings, macro re-export "just +works" as soon as `$crate` is available. It's implemented as part of the +`$crate` branch mentioned above. + +## `#[plugin]` attribute + +`#[phase(plugin)]` becomes simply `#[plugin]` and is still feature-gated. It +only controls whether to search for and run a plugin registrar function. The +plugin itself will decide whether it's to be linked at runtime, by calling a +`Registry` method. 
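For comparison, a minimal before-and-after sketch of the attribute change; the crate name is hypothetical and the enclosing feature gate attributes are omitted:

```rust
// Today, loading a plugin requires the phase attribute:
//
//     #[phase(plugin)]
//     extern crate my_plugin;
//
// Under this RFC the same import is written as follows, and it remains
// feature-gated:
#[plugin]
extern crate my_plugin;
```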
+ +`#[plugin]` can optionally take any [meta +items](http://doc.rust-lang.org/syntax/ast/enum.MetaItem_.html) as "arguments", +e.g. + +```rust +#[plugin(foo, bar=3, baz(quux))] +extern crate myplugin; +``` + +rustc itself will not interpret these arguments, but will make them available +to the plugin through a `Registry` method. This facilitates plugin +configuration. The alternative in many cases is to use interacting side +effects between procedural macros, which are harder to reason about. + +## Syntax convention + +`macro_rules!` already allows `{ }` for the macro body, but the convention is +`( )` for some reason. In accepting this RFC we would change to a `{ }` +convention for consistency with the rest of the language. + +## Reserve `macro` as a keyword + +A lot of the syntax alternatives discussed for this RFC involved a `macro` +keyword. The consensus is that macros are too unfinished to merit using the +keyword now. However, we should reserve it for a future macro system. + +# Implementation and transition + +I will coordinate implementation of this RFC, and I expect to write most of the +code myself. + +To ease the transition, we can keep the old syntax as a deprecated synonym, to +be removed before 1.0. + +# Drawbacks + +This is big churn on a major feature, not long before 1.0. + +We can ship improved versions of `macro_rules!` in a back-compatible way (in +theory; I would like to smoke test this idea before 1.0). So we could defer +much of this reform until after 1.0. The main reason not to is macro +import/export. Right now every macro you import will be expanded using your +local copy of `macro_rules!`, regardless of what the macro author had in mind. + +# Alternatives + +We could try to implement proper hygienic capture of crate names in macros. +This would be wonderful, but I don't think we can get it done for 1.0. + +We would have to actually parse the macro RHS when it's defined, find all the +paths it wants to emit (somehow), and then turn each crate reference within +such a path into a globally unique thing that will still work when expanded in +another crate. Right now libsyntax is oblivious to librustc's name resolution +rules, and those rules can't be applied until macro expansion is done, because +(for example) a macro can expand to a `use` item. + +nrc suggested dropping the `#![macro_escape]` functionality as part of this +reform. Two ways this could work out: + +- *All* macros are visible throughout the crate. This seems bad; I depend on + module scoping to stay (marginally) sane when working with macros. You can + have private helper macros in two different modules without worrying that + the names will clash. + +- Only macros at the crate root are visible throughout the crate. I'm also + against this because I like keeping `lib.rs` as a declarative description + of crates, modules, etc. without containing any actual code. Forcing the + user's hand as to which file a particular piece of code goes in seems + un-Rusty. + +# Unresolved questions + +Should we forbid `$crate` in non-exported macros? It seems useless, however I +think we should allow it anyway, to encourage the habit of writing `$crate::` +for any references to the local crate. + +Should `#[macro_reexport]` support the "glob" behavior of `#[macro_use]` with +no names listed? + +# Acknowledgements + +This proposal is edited by Keegan McAllister. 
It has been refined through many +engaging discussions with: + +* Brian Anderson, Shachaf Ben-Kiki, Lars Bergstrom, Nick Cameron, John Clements, Alex Crichton, Cathy Douglass, Steven Fackler, Manish Goregaokar, Dave Herman, Steve Klabnik, Felix S. Klock II, Niko Matsakis, Matthew McPherrin, Paul Stansifer, Sam Tobin-Hochstadt, Erick Tryzelaar, Aaron Turon, Huon Wilson, Brendan Zabarauskas, Cameron Zwarich +* *GitHub*: `@bill-myers` `@blaenk` `@comex` `@glaebhoerl` `@Kimundi` `@mitchmindtree` `@mitsuhiko` `@P1Start` `@petrochenkov` `@skinner` +* *Reddit*: `gnusouth` `ippa` `!kibwen` `Mystor` `Quxxy` `rime-frost` `Sinistersnare` `tejp` `UtherII` `yigal100` +* *IRC*: `bstrie` `ChrisMorgan` `cmr` `Earnestly` `eddyb` `tiffany` + +My apologies if I've forgotten you, used an un-preferred name, or accidentally +categorized you as several different people. Pull requests are welcome :) diff --git a/text/0458-send-improvements.md b/text/0458-send-improvements.md new file mode 100644 index 00000000000..ed4a2774950 --- /dev/null +++ b/text/0458-send-improvements.md @@ -0,0 +1,216 @@ +- Start Date: 2014-11-10 +- RFC PR: https://github.com/rust-lang/rfcs/pull/458 +- Rust Issue: https://github.com/rust-lang/rust/issues/22251 + +# Summary + +I propose altering the `Send` trait as proposed by RFC #17 as +follows: + +* Remove the implicit `'static` bound from `Send`. +* Make `&T` `Send` if and only if `T` is `Sync`. + ```rust + impl<'a, T> !Send for &'a T {} + + unsafe impl<'a, T> Send for &'a T where T: Sync + 'a {} + ``` +* Evaluate each `Send` bound currently in `libstd` and either leave it as-is, add an + explicit `'static` bound, or bound it with another lifetime parameter. + +# Motivation + +Currently, Rust has two types that deal with concurrency: `Sync` and `Send` + +If `T` is `Sync`, then `&T` is threadsafe (that is, can cross task boundaries without +data races). This is always true of any type with simple inherited mutability, and it is also true +of types with interior mutability that perform explicit synchronization (e.g. `Mutex` and +`Arc`). By fiat, in safe code all static items require a `Sync` bound. `Sync` is most +interesting as the proposed bound for closures in a fork-join concurrency model, where the thread +running the closure can be guaranteed to terminate before some lifetime `'a`, and as one of the +required bounds for `Arc`. + +If `T` is `Send`, then `T` is threadsafe to send between tasks. At an initial glance, +this type is harder to define. `Send` currently requires a `'static` bound, which excludes +types with non-'static references, and there are a few types (notably, `Rc` and +`local_data::Ref`) that opt out of `Send`. All static items other than those that are +`Sync` but not `Send` (in the stdlib this is just `local_data::Ref` and its derivatives) +are `Send`. `Send` is most interesting as a required bound for `Mutex`, channels, `spawn()`, and +other concurrent types and functions. + +This RFC is mostly motivated by the challenges of writing a safe interface for fork-join concurrency +in current Rust. Specifically: + +* It is not clear what it means for a type to be `Sync` but not `Send`. Currently there + is nothing in the type system preventing these types from being instantiated. In a fork-join + model with a bounded, non-`'static` lifetime `'a` for worker tasks, using a + `Sync + 'a` bound on a closure is the intended way to make sure the operation is safe to run + in another thread in parallel with the main thread. 
But there is no way of preventing the main + and worker tasks from concurrently accessing an item that is `Sync + NoSend`. +* Because `Send` has a `'static` bound, most concurrency constructs cannot be used if they have any non-static references in them, even in a thread with a bounded lifetime. It seems like there should be a way to extend `Send` to shorter lifetimes. But + naively removing the `'static` bound causes memory unsafety in existing APIs like Mutex. + +# Detailed Design + +## Proposal + +Extend the current meaning of `Send` in a (mostly) backwards-compatible way that +retains memory-safety, but allows for existing concurrent types like `Arc` and `Mutex` to be +used across non-`'static` boundaries. Use `Send` with a bounded lifetime instead of `Sync` for fork-join concurrency. + +The first proposed change is to remove the `'static` bound from `Send`. Without doing this, +we would have to write brand new types for fork-join libraries that took `Sync` bounds but were +otherwise identical to the existing implementations. For example, we cannot create a +`Mutex>` as long as `Mutex` requires a `'static` bound. By itself, +though, this causes unsafety. For example, a `Mutex<&'a Cell>` does not necessarily +actually lock the data in the `Cell`: + +```rust +let cell = Cell:new(true); +let ref_ = &cell; +let mutex = Mutex::new(&cell); +ref_.set(false); // Modifying the cell without locking the Mutex. +``` + +This leads us to our second refinement. We add the rule that `&T` is `Send` if and only if +`T` is `Sync`--in other words, we disallow `Send`ing shared references with a +non-threadsafe interior. We do, however, still allow `&mut T` where `T` is `Send`, even +if it is not `Sync`. This is safe because `&mut T` linearizes access--the only way to +access the the original data is through the unique reference, so it is safe to send to other +threads. Similarly, we allow `&T` where `T` is `Sync`, even if it is not `Send`, since by the definition of `Sync` `&T` is already known to be threadsafe. + +Note that this definition of `Send` is identical to the old definition of `Send` when +restricted to `'static` lifetimes in safe code. Since `static mut` items are not accessible +in safe code, and it is not possible to create a safe `&'static mut` outside of such an item, we +know that if `T: Send + 'static`, it either has only `&'static` references, or has no references at +all. Since `'static` references can only be created in `static` items and literals in safe code, and +all `static` items (and literals) are `Sync`, we know that any such references are `Sync`. Thus, our +new rule that `T` must be `Sync` for `&'static T` to be `Send` does not actually +remove `Send` from any existing types. And since `T` has no `&'static mut` references, +unless any were created in unsafe code, we also know that our rule allowing `&'static mut T` +did not add `Send` to any new types. We conclude that the second refinement is backwards compatible +with the old behavior, provided that old interfaces are updated to require `'static` bounds and they did not +create unsafe `'static` and `'static mut` references. But unsafe types like these were already not +guaranteed to be threadsafe by Rust's type system. + +Another important note is that with this definition, `Send` will fulfill the proposed role of `Sync` in a fork-join concurrency library. At present, to use `Sync` in a fork-join library one must make the implicit assumption that if `T` is `Sync`, `T` is `Send`. 
One might be tempted to codify this by making `Sync` a subtype of `Send`. Unfortunately, this is not always the case, though it should be most of the time. A type can be created with `&mut` methods that are not thread safe, but no `&`-methods that are not thread safe. An example would be a version of `Rc` called `RcMut`. `RcMut` would have a `clone_mut()` method that took `&mut self` and no other `clone()` method. `RcMut` could be thread-safely shared provided that a `&mut RcMut` was not sent to another thread. As long as that invariant was upheld, `RcMut` could only be cloned in its original thread and could not be dropped while shared (hence, `RcMut` is `Sync`) but a mutable reference could not be thread-safely shared, nor could it be moved into another thread (hence, `&mut RcMut` is not `Send`, which means that `RcMut` is not `Send`). Because `&T` is Send if `T` is Sync (per the new definition), adding a `Send` bound will guarantee that only shared pointers of this type are moved between threads, so our new definition of `Send` preserves thread safety in the presence of such types. + +Finally, we'd hunt through existing instances of `Send` in Rust libraries and replace them with +sensible defaults. For example, the `spawn()` APIs should all have `'static` bounds, +preserving current behavior. I don't think this would be too difficult, but it may be that there +are some edge cases here where it's tricky to determine what the right solution is. + +## More unusual types + +We discussed whether a type with a destructor that manipulated thread-local data could be non-`Send` even though `&mut T` was. In general it could not, because you can call a destructor through `&mut` references (through `swap` or simply assigning a new value to `*x` where `x: &mut T`). It was noted that since `&uniq T` cannot be dropped, this suggests a role for such types. + +Some unusual types proposed by `arielb1` and myself to explain why `T: Send` does not mean `&mut T` is threadsafe, and `T: Sync` does not imply `T: Send`. The first type is a bottom type, the second takes `self` by value (so `RcMainTask` is not `Send` but `&mut RcMainTask` is `Send`). + +Comments from arielb1: + +Observe that `RcMainTask::main_clone` would be unsafe outside the main task. + +`&mut Xyz` and `&mut RcMainTask` are perfectly fine `Send` types. However, `Xyz` is a bottom (can be used to violate memory safety), and `RcMainTask` is not `Send`. + +```rust +#![feature(tuple_indexing)] +use std::rc::Rc; +use std::mem; +use std::kinds::marker; + +// Invariant: &mut Xyz always points to a valid C xyz. +// Xyz rvalues don't exist. + +// These leak. I *could* wrap a box or arena, but that would +// complicate things. 
+ +extern "C" { + // struct Xyz; + fn xyz_create() -> *mut Xyz; + fn xyz_play(s: *mut Xyz); +} + +pub struct Xyz(marker::NoCopy); + +impl Xyz { + pub fn new() -> &'static mut Xyz { + unsafe { + let x = xyz_create(); + mem::transmute(x) + } + } + + pub fn play(&mut self) { + unsafe { xyz_play(mem::transmute(self)) } + } +} + +// Invariant: only the main task has RcMainTask values + +pub struct RcMainTask(Rc); +impl RcMainTask { + pub fn new(t: T) -> Option> { + if on_main_task() { + Some(RcMainTask(Rc::new(t))) + } else { None } + } + + pub fn main_clone(self) -> (RcMainTask, RcMainTask) { + let new = RcMainTask(self.0.clone()); + (self, new) + } +} + +impl Deref for RcMainTask { + fn deref(&self) -> &T { &*self.0 } +} + +// - by Sharp + +pub struct RcMut(Rc); +impl RcMut { + pub fn new(t: T) -> RcMut { + RcMut(Rc::new(t)) + } + + pub fn mut_clone(&mut self) -> RcMut { + RcMut(self.0.clone()) + } +} + +impl Deref for RcMut { + fn deref(&self) -> &T { &*self.0 } +} + +// fn on_main_task() -> bool { false /* XXX: implement */ } +// fn main() {} +``` + +# Drawbacks + +Libraries get a bit more complicated to write, since you may have to write `Send + 'static` where previously you just wrote `Send`. + +# Alternatives + +We could accept the status quo. This would mean that any existing `Sync` `NoSend` +type like those described above would be unsafe (that is, it would not be possible to write a non-`'static` closure with the correct bounds to make it safe to use), and it would not be possible to write a type like `Arc` for a `T` with a bounded lifetime, as well as other safe concurrency constructs for fork-join concurrency. I do not think this is a good alternative. + +We could do as proposed above, but change `Sync` to be a subtype of `Send`. Things wouldn't be too +different, but you wouldn't be able to write types like those discussed above. I am not sure that types like that are actually useful, but even if we did this I think you would usually want to use a `Send` bound anyway. + +We could do as proposed above, but instead of changing `Send`, create a new type for this +purpose. I suppose the advantage of this would be that user code currently using `Send` as a way to +get a `'static` bound would not break. However, I don't think it makes a lot of sense to keep the +current `Send` type around if this is implemented, since the new type should be backwards compatible +with it where it was being used semantically correctly. + +# Unresolved questions + +* Is the new scheme actually safe? I *think* it is, but I certainly haven't proved it. + +* Can this wait until after Rust 1.0, if implemented? I think it is backwards incompatible, but I +believe it will also be much easier to implement once opt-in kinds are fully implemented. + +* Is this actually necessary? I've asserted that I think it's important to be able to do the same +things in bounded-lifetime threads that you can in regular threads, but it may be that it isn't. + +* Are types that are `Sync` and `NoSend` actually useful? diff --git a/text/0459-disallow-shadowing.md b/text/0459-disallow-shadowing.md new file mode 100644 index 00000000000..20ffdacbdae --- /dev/null +++ b/text/0459-disallow-shadowing.md @@ -0,0 +1,91 @@ +- Start Date: 2014-11-29 +- RFC PR: [rust-lang/rfcs#459](https://github.com/rust-lang/rfcs/pull/459) +- Rust Issue: [rust-lang/rust#19390](https://github.com/rust-lang/rust/issues/19390) + +# Summary + +Disallow type/lifetime parameter shadowing. 
+ +# Motivation + +Today we allow type and lifetime parameters to be shadowed. This is a +common source of bugs as well as confusing errors. An example of such a confusing case is: + +```rust +struct Foo<'a> { + x: &'a int +} + +impl<'a> Foo<'a> { + fn set<'a>(&mut self, v: &'a int) { + self.x = v; + } +} + +fn main() { } +``` + +In this example, the lifetime parameter `'a` is shadowed on the method, leading to two +logically distinct lifetime parameters with the same name. This then leads to the error +message: + + mismatched types: expected `&'a int`, found `&'a int` (lifetime mismatch) + +which is obviously completely unhelpful. + +Similar errors can occur with type parameters: + +```rust +struct Foo { + x: T +} + +impl Foo { + fn set(&mut self, v: T) { + self.x = v; + } +} + +fn main() { } +``` + +Compiling this program yields: + + mismatched types: expected `T`, found `T` (expected type parameter, found a different type parameter) + +Here the error message was improved by [a recent PR][pr], but this is +still a somewhat confusing situation. + +Anecdotally, this kind of accidental shadowing is fairly frequent +occurrence. It recently arose on [this discuss thread][dt], for +example. + +[dt]: http://discuss.rust-lang.org/t/confused-by-lifetime-error-messages-tell-me-about-it/358/41?u=nikomatsakis +[pr]: https://github.com/rust-lang/rust/pull/18264 + +# Detailed design + +Disallow shadowed type/lifetime parameter declarations. An error would +be reported by the resolve/resolve-lifetime passes in the compiler and +hence fairly early in the pipeline. + +# Drawbacks + +We otherwise allow shadowing, so it is inconsistent. + +# Alternatives + +We could use a lint instead. However, we'd want to ensure that the +lint error messages were printed *before* type-checking begins. We +could do this, perhaps, by running the lint printing pass multiple +times. This might be useful in any case as the placement of lints in +the compiler pipeline has proven problematic before. + +We could also attempt to improve the error messages. Doing so for +lifetimes is definitely important in any case, but also somewhat +tricky due to the extensive inference. It is usually easier and more +reliable to help avoid the error in the first place. + +# Unresolved questions + +None. diff --git a/text/0461-tls-overhaul.md b/text/0461-tls-overhaul.md new file mode 100644 index 00000000000..073b7979b4b --- /dev/null +++ b/text/0461-tls-overhaul.md @@ -0,0 +1,331 @@ +- Start Date: 2014-11-11 +- RFC PR: https://github.com/rust-lang/rfcs/pull/461 +- Rust Issue: https://github.com/rust-lang/rust/issues/19175 + +# Summary + +Introduce a new thread local storage module to the standard library, `std::tls`, +providing: + +* Scoped TLS, a non-owning variant of TLS for any value. +* Owning TLS, an owning, dynamically initialized, dynamically destructed + variant, similar to `std::local_data` today. + +# Motivation + +In the past, the standard library's answer to thread local storage was the +`std::local_data` module. This module was designed based on the Rust task model +where a task could be either a 1:1 or M:N task. This design constraint has +[since been lifted][runtime-rfc], allowing for easier solutions to some of the +current drawbacks of the module. While redesigning `std::local_data`, it can +also be scrutinized to see how it holds up to modern-day Rust style, guidelines, +and conventions. 
+ +[runtime-rfc]: https://github.com/rust-lang/rfcs/blob/master/text/0230-remove-runtime.md + +In general the amount of work being scheduled for 1.0 is being trimmed down as +much as possible, especially new work in the standard library that isn't focused +on cutting back what we're shipping. Thread local storage, however, is such a +critical part of many applications and opens many doors to interesting sets of +functionality that this RFC sees fit to try and wedge it into the schedule. The +current `std::local_data` module simply doesn't meet the requirements of what +one may expect out of a TLS implementation for a language like Rust. + +## Current Drawbacks + +Today's implementation of thread local storage, `std::local_data`, suffers from +a few drawbacks: + +* The implementation is not super speedy, and it is unclear how to enhance the + existing implementation to be on par with OS-based TLS or `#[thread_local]` + support. As an example, today a lookup takes `O(log N)` time where N is the + number of set TLS keys for a task. + + This drawback is also not to be taken lightly. TLS is a fundamental building + block for rich applications and libraries, and an inefficient implementation + will only deter usage of an otherwise quite useful construct. + +* The types which can be stored into TLS are not maximally flexible. Currently + only types which ascribe to `'static` can be stored into TLS. It's often the + case that a type with references needs to be placed into TLS for a short + period of time, however. + +* The interactions between TLS destructors and TLS itself is not currently very + well specified, and it can easily lead to difficult-to-debug runtime panics or + undocumented leaks. + +* The implementation currently assumes a local `Task` is available. Once the + runtime removal is complete, this will no longer be a valid assumption. + +## Current Strengths + +There are, however, a few pros to the usage of the module today which should be +required for any replacement: + +* All platforms are supported. +* `std::local_data` allows consuming ownership of data, allowing it to live past + the current stack frame. + +## Building blocks available + +There are currently two primary building blocks available to Rust when building +a thread local storage abstraction, `#[thread_local]` and OS-based TLS. Neither +of these are currently used for `std::local_data`, but are generally seen as +"adequately efficient" implementations of TLS. For example, an TLS access of a +`#[thread_local]` global is simply a pointer offset, which when compared to a +`O(log N)` lookup is quite speedy! + +With these available, this RFC is motivated in redesigning TLS to make use of +these primitives. + +# Detailed design + +Three new modules will be added to the standard library: + +* The `std::sys::tls` module provides platform-agnostic bindings the OS-based + TLS support. This support is intended to only be used in otherwise unsafe code + as it supports getting and setting a `*mut u8` parameter only. + +* The `std::tls` module provides a dynamically initialized and dynamically + destructed variant of TLS. This is very similar to the current + `std::local_data` module, except that the implicit `Option` is not + mandated as an initialization expression is required. + +* The `std::tls::scoped` module provides a flavor of TLS which can store a + reference to any type `T` for a scoped set of time. This is a variant of TLS + not provided today. 
The backing idea is that if a reference only lives in TLS + for a fixed set of time then there's no need for TLS to consume ownership of + the value itself. + + This pattern of TLS is quite common throughout the compiler's own usage of + `std::local_data` and often more expressive as no dances are required to move + a value into and out of TLS. + +The design described below can be found as an existing cargo package: +https://github.com/alexcrichton/tls-rs. + +## The OS layer + +While LLVM has support for `#[thread_local]` statics, this feature is not +supported on all platforms that LLVM can target. Almost all platforms, however, +provide some form of OS-based TLS. For example Unix normally comes with +`pthread_key_create` while Windows comes with `TlsAlloc`. + +This RFC proposes introducing a `std::sys::tls` module which contains bindings +to the OS-based TLS mechanism. This corresponds to the `os` module in the +example implementation. While not currently public, the contents of `sys` are +slated to become public over time, and the API of the `std::sys::tls` module +will go under API stabilization at that time. + +This module will support "statically allocated" keys as well as dynamically +allocated keys. A statically allocated key will actually allocate a key on +first use. + +### Destructor support + +The major difference between Unix and Windows TLS support is that Unix supports +a destructor function for each TLS slot while Windows does not. When each Unix +TLS key is created, an optional destructor is specified. If any key has a +non-NULL value when a thread exits, the destructor is then run on that value. + +One possibility for this `std::sys::tls` module would be to not provide +destructor support at all (least common denominator), but this RFC proposes +implementing destructor support for Windows to ensure that functionality is not +lost when writing Unix-only code. + +Destructor support for Windows will be provided through a custom implementation +of tracking known destructors for TLS keys. + +## Scoped TLS + +As discussed before, one of the motivations for this RFC is to provide a method +of inserting any value into TLS, not just those that ascribe to `'static`. This +provides maximal flexibility in storing values into TLS to ensure any "thread +local" pattern can be encompassed. + +Values which do not adhere to `'static` contain references with a constrained +lifetime, and can therefore not be moved into TLS. They can, however, be +*borrowed* by TLS. This scoped TLS api provides the ability to insert a +reference for a particular period of time, and then a non-escaping reference can +be extracted at any time later on. + +In order to implement this form of TLS, a new module, `std::tls::scoped`, will +be added. It will be coupled with a `scoped_tls!` macro in the prelude. The API +looks like: + +```rust +/// Declares a new scoped TLS key. The keyword `static` is required in front to +/// emphasize that a `static` item is being created. There is no initializer +/// expression because this key initially contains no value. +/// +/// A `pub` variant is also provided to generate a public `static` item. +macro_rules! scoped_tls( + (static $name:ident: $t:ty) => (/* ... */); + (pub static $name:ident: $t:ty) => (/* ... */); +) + +/// A structure representing a scoped TLS key. +/// +/// This structure cannot be created dynamically, and it is accessed via its +/// methods. +pub struct Key { /* ... 
*/ } + +impl Key { + /// Insert a value into this scoped TLS slot for a duration of a closure. + /// + /// While `cb` is running, the value `t` will be returned by `get` unless + /// this function is called recursively inside of cb. + /// + /// Upon return, this function will restore the previous TLS value, if any + /// was available. + pub fn set(&'static self, t: &T, cb: || -> R) -> R { /* ... */ } + + /// Get a value out of this scoped TLS variable. + /// + /// This function takes a closure which receives the value of this TLS + /// variable, if any is available. If this variable has not yet been set, + /// then None is yielded. + pub fn with(&'static self, cb: |Option<&T>| -> R) -> R { /* ... */ } +} +``` + +The purpose of this module is to enable the ability to insert a value into TLS +for a scoped period of time. While able to cover many TLS patterns, this flavor +of TLS is not comprehensive, motivating the owning variant of TLS. + +### Variations + +Specifically the `with` API can be somewhat unwieldy to use. The `with` function +takes a closure to run, yielding a value to the closure. It is believed that +this is required for the implementation to be sound, but it also goes against +the "use RAII everywhere" principle found elsewhere in the stdlib. + +Additionally, the `with` function is more commonly called `get` for accessing a +contained value in the stdlib. The name `with` is recommended because it may be +possible in the future to express a `get` function returning a reference with a +lifetime bound to the stack frame of the caller, but it is not currently +possible to do so. + +The `with` functions yields an `Option<&T>` instead of `&T`. This is to cover +the use case where the key has not been `set` before it used via `with`. This is +somewhat unergonomic, however, as it will almost always be followed by +`unwrap()`. An alternative design would be to provide a `is_set` function and +have `with` `panic!` instead. + +## Owning TLS + +Although scoped TLS can store any value, it is also limited in the fact that it +cannot own a value. This means that TLS values cannot escape the stack from from +which they originated from. This is itself another common usage pattern of TLS, +and to solve this problem the `std::tls` module will provided support for +placing owned values into TLS. + +These values must not contain references as that could trigger a use-after-free, +but otherwise there are no restrictions on placing statics into owned TLS. The +module will support dynamic initialization (run on first use of the variable) as +well as dynamic destruction (implementors of `Drop`). + +The interface provided will be similar to what `std::local_data` provides today, +except that the `replace` function has no analog (it would be written with a +`RefCell>`). + +```rust +/// Similar to the `scoped_tls!` macro, except allows for an initializer +/// expression as well. +macro_rules! tls( + (static $name:ident: $t:ty = $init:expr) => (/* ... */) + (pub static $name:ident: $t:ty = $init:expr) => (/* ... */) +) + +pub struct Key { /* ... */ } + +impl Key { + /// Access this TLS variable, lazily initializing it if necessary. + /// + /// The first time this function is called on each thread the TLS key will + /// be initialized by having the specified init expression evaluated on the + /// current thread. + /// + /// This function can return `None` for the same reasons of static TLS + /// returning `None` (destructors are running or may have run). 
+ pub fn with(&'static self, f: |Option<&T>| -> R) -> R { /* ... */ } +} +``` + +### Destructors + +One of the major points about this implementation is that it allows for values +with destructors, meaning that destructors must be run when a thread exits. This +is similar to placing a value with a destructor into `std::local_data`. This RFC +attempts to refine the story around destructors: + +* A TLS key cannot be accessed while its destructor is running. This is + currently manifested with the `Option` return value. +* A TLS key *may* not be accessible after its destructor has run. +* Re-initializing TLS keys during destruction may cause memory leaks (e.g. + setting the key FOO during the destructor of BAR, and initializing BAR in the + destructor of FOO). An implementation will strive to destruct initialized + keys whenever possible, but it may also result in a memory leak. +* A `panic!` in a TLS destructor will result in a process abort. This is similar + to a double-failure. + +These semantics are still a little unclear, and the final behavior may still +need some more hammering out. The sample implementation suffers from a few extra +drawbacks, but it is believed that some more implementation work can overcome +some of the minor downsides. + +### Variations + +Like the scoped TLS variation, this key has a `with` function instead of the +normally expected `get` function (returning a reference). One possible +alternative would be to yield `&T` instead of `Option<&T>` and `panic!` if the +variable has been destroyed. Another possible alternative is to have a `get` +function returning a `Ref`. Currently this is unsafe, however, as there is no +way to ensure that `Ref` does not satisfy `'static`. If the returned +reference satisfies `'static`, then it's possible for TLS values to reference +each other after one has been destroyed, causing a use-after-free. + +# Drawbacks + +* There is no variant of TLS for statically initialized data. Currently the + `std::tls` module requires dynamic initialization, which means a slight + penalty is paid on each access (a check to see if it's already initialized). +* The specification of destructors on owned TLS values is still somewhat shaky + at best. It's possible to leak resources in unsafe code, and it's also + possible to have different behavior across platforms. +* Due to the usage of macros for initialization, all fields of `Key` in all + scenarios must be public. Note that `os` is excepted because its initializers + are a `const`. +* This implementation, while declared safe, is not safe for systems that do any + form of multiplexing of many threads onto one thread (aka green tasks or + greenlets). This RFC considers it the multiplexing systems' responsibility to + maintain native TLS if necessary, or otherwise strongly recommend not using + native TLS. + +# Alternatives + +Alternatives on the API can be found in the "Variations" sections above. + +Some other alternatives might include: + +* A 0-cost abstraction over `#[thread_local]` and OS-based TLS which does not + have support for destructors but requires static initialization. Note that + this variant still needs destructor support *somehow* because OS-based TLS + values must be pointer-sized, implying that the rust value must itself be + boxed (whereas `#[thread_local]` can support any type of any size). + +* A variant of the `tls!` macro could be used where dynamic initialization is + opted out of because it is not necessary for a particular use case. 
+ +* A [previous PR][prev-pr] from @thestinger leveraged macros more heavily than + this RFC and provided statically constructible Cell and RefCell equivalents + via the usage of `transmute`. The implementation provided did not, however, + include the scoped form of this RFC. + +[prev-pr]: https://github.com/rust-lang/rust/pull/17583 + +# Unresolved questions + +* Are the questions around destructors vague enough to warrant the `get` method + being `unsafe` on owning TLS? +* Should the APIs favor `panic!`-ing internally, or exposing an `Option`? diff --git a/text/0463-future-proof-literal-suffixes.md b/text/0463-future-proof-literal-suffixes.md new file mode 100644 index 00000000000..d7440b72541 --- /dev/null +++ b/text/0463-future-proof-literal-suffixes.md @@ -0,0 +1,133 @@ +- Start Date: 2014--28 +- RFC PR: [#463](https://github.com/rust-lang/rfcs/pull/463) +- Rust Issue: [#19088](https://github.com/rust-lang/rust/issues/19088) + +# Summary + +Include identifiers immediately after literals in the literal token to +allow future expansion, e.g. `"foo"bar` and a `1baz` are considered +whole (but semantically invalid) tokens, rather than two separate +tokens `"foo"`, `bar` and `1`, `baz` respectively. This allows future +expansion of handling literals without risking breaking (macro) code. + + +# Motivation + +Currently a few kinds of literals (integers and floats) can have a +fixed set of suffixes and other kinds do not include any suffixes. The +valid suffixes on numbers are: + + +```text +u, u8, u16, u32, u64 +i, i8, i16, i32, i64 +f32, f64 +``` + +Most things not in this list are just ignored and treated as an +entirely separate token (prefixes of `128` are errors: e.g. `1u12` has +an error `"invalid int suffix"`), and similarly any suffixes on other +literals are also separate tokens. For example: + +```rust +#![feature(macro_rules)] + +// makes a tuple +macro_rules! foo( ($($a: expr)*) => { ($($a, )+) } ) + +fn main() { + let bar = "suffix"; + let y = "suffix"; + + let t: (uint, uint) = foo!(1u256); + println!("{}", foo!("foo"bar)); + println!("{}", foo!('x'y)); +} +/* +output: +(1, 256) +(foo, suffix) +(x, suffix) +*/ +``` + +The compiler is eating the `1u` and then seeing the invalid suffix +`256` and so treating that as a separate token, and similarly for the +string and character literals. (This problem is only visible in +macros, since that is the only place where two literals/identifiers can be placed +directly adjacent.) + +This behaviour means we would be unable to expand the possibilities +for literals after freezing the language/macros, which would be +unfortunate, since [user defined literals in C++][cpp] are reportedly +very nice, proposals for "bit data" would like to use types like `u1` +and `u5` (e.g. [RFC PR 327][327]), and there are "fringe" types like +[`f16`][f16], [`f128`][f128] and `u128` that have uses but are not +common enough to warrant adding to the language now. + +[cpp]: http://en.cppreference.com/w/cpp/language/user_literal +[327]: https://github.com/rust-lang/rfcs/pull/327 +[f16]: http://en.wikipedia.org/wiki/Half-precision_floating-point_format +[f128]: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format + +# Detailed design + +The tokenizer will have grammar `literal: raw_literal identifier?` +where `raw_literal` covers strings, characters and numbers without +suffixes (e.g. `"foo"`, `'a'`, `1`, `0x10`). 
+ +Examples of "valid" literals after this change (that is, entities that +will be consumed as a single token): + +``` +"foo"bar "foo"_baz +'a'x 'a'_y + +15u16 17i18 19f20 21.22f23 +0b11u25 0x26i27 28.29e30f31 + +123foo 0.0bar +``` + +Placing a space between the letter of the suffix and the literal will +cause it to be parsed as two separate tokens, just like today. That is +`"foo"bar` is one token, `"foo" bar` is two tokens. + +The example above would then be an error, something like: + +```rust + let t: (uint, uint) = foo!(1u256); // error: literal with unsupported size + println!("{}", foo!("foo"bar)); // error: literal with unsupported suffix + println!("{}", foo!('x'y)); // error: literal with unsupported suffix +``` + +The above demonstrates that numeric suffixes could be special cased +to detect `u<...>` and `i<...>` to give more useful error messages. + +(The macro example there is definitely an error because it is using +the incorrectly-suffixed literals as `expr`s. If it was only +handling them as a token, i.e. `tt`, there is the possibility that it +wouldn't have to be illegal, e.g. `stringify!(1u256)` doesn't have to +be illegal because the `1u256` never occurs at runtime/in the type +system.) + +# Drawbacks + +None beyond outlawing placing a literal immediately before a pattern, +but the current behaviour can easily be restored with a space: `123u +456`. (If a macro is using this for the purpose of hacky generalised +literals, the unresolved question below touches on this.) + +# Alternatives + +Don't do this, or consider doing it for adjacent suffixes with an +alternative syntax, e.g. `10'bar` or `10$bar`. + +# Unresolved questions + +- Should it be the parser or the tokenizer rejecting invalid suffixes? + This is effectively asking if it is legal for syntax extensions to + be passed the raw literals? That is, can a `foo` procedural syntax + extension accept and handle literals like `foo!(1u2)`? + +- Should this apply to all expressions, e.g. `(1 + 2)bar`? diff --git a/text/0469-feature-gate-box-patterns.md b/text/0469-feature-gate-box-patterns.md new file mode 100644 index 00000000000..a3a0b39c68b --- /dev/null +++ b/text/0469-feature-gate-box-patterns.md @@ -0,0 +1,33 @@ +- Start Date: 2014-11-17 +- RFC PR: [rust-lang/rfcs#469](https://github.com/rust-lang/rfcs/pull/469) +- Rust Issue: [rust-lang/rust#21931](https://github.com/rust-lang/rust/issues/21931) + +# Summary + +Move `box` patterns behind a feature gate. + +# Motivation + +A recent RFC (https://github.com/rust-lang/rfcs/pull/462) proposed renaming `box` patterns to `deref`. The discussion that followed indicates that while the language community may be in favour of some sort of renaming, there is no significant consensus around any concrete proposal, including the original one or any that emerged from the discussion. + +This RFC proposes moving `box` patterns behind a feature gate to postpone that discussion and decision to when it becomes more clear how `box` patterns should interact with types other than `Box`. + +In addition, in the future `box` patterns are expected to be made more general by enabling them to destructure any type that implements one of the `Deref` family of traits. As such a generalisation may potentially lead to some currently valid programs being rejected due to the interaction with type inference or other language features, it is desirable that this particular feature stays feature gated until then. 
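For illustration, here is a minimal sketch of the kind of code affected (the function is hypothetical); after this change it compiles only with the gate described below enabled:

```rust
#![feature(box_patterns)]

// Destructuring a `Box` with a `box` pattern now requires opting in to the
// feature gate proposed by this RFC.
fn unwrap_box(b: Box<i32>) -> i32 {
    match b {
        box value => value,
    }
}

fn main() {
    assert_eq!(unwrap_box(Box::new(7)), 7);
}
```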
+ +# Detailed design + +A feature gate `box_patterns` will be defined and all uses of the `box` pattern will require said gate to be enabled. + +# Drawbacks + +Some currently valid Rust programs will have to opt in to another feature gate. + +# Alternatives + +Pursue https://github.com/rust-lang/rfcs/pull/462 before 1.0 and stabilise `box patterns` without a feature gate. + +Leave `box` patterns as-is without putting them behind a feature gate. + +# Unresolved questions + +None. diff --git a/text/0474-path-reform.md b/text/0474-path-reform.md new file mode 100644 index 00000000000..63b43bf8bb8 --- /dev/null +++ b/text/0474-path-reform.md @@ -0,0 +1,449 @@ +- Start Date: 2014-11-12 +- RFC PR: [rust-lang/rfcs#474](https://github.com/rust-lang/rfcs/pull/474) +- Rust Issue: [rust-lang/rust#20034](https://github.com/rust-lang/rust/issues/20034) + +# Summary + +This RFC reforms the design of the `std::path` module in preparation for API +stabilization. The path API must deal with many competing demands, and the +current design handles many of them, but suffers from some significant problems +given in "Motivation" below. The RFC proposes a redesign modeled loosely on the +current API that addresses these problems while maintaining the advantages of +the current design. + +# Motivation + +The design of a path abstraction is surprisingly hard. Paths work radically +differently on different platforms, so providing a cross-platform abstraction is +challenging. On some platforms, paths are not required to be in Unicode, posing +ergonomic and semantic difficulties for a Rust API. These difficulties are +compounded if one also tries to provide efficient path manipulation that does +not, for example, require extraneous copying. And, of course, the API should be +easy and pleasant to use. + +The current `std::path` module makes a strong effort to balance these design +constraints, but over time a few key shortcomings have emerged. + +## Semantic problems + +Most importantly, the current `std::path` module makes some semantic assumptions +about paths that have turned out to be incorrect. + +### Normalization + +Paths in `std::path` are always *normalized*, meaning that `a/../b` is treated +like `b` (among other things). Unfortunately, this kind of normalization changes +the meaning of paths when symbolic links are present: if `a` is a symbolic link, +then the relative paths `a/../b` and `b` may refer to completely different +locations. See [this issue](https://github.com/rust-lang/rust/issues/14028) for +more detail. + +For this reason, most path libraries do *not* perform full normalization of +paths, though they may normalize paths like `a/./b` to `a/b`. Instead, they +offer (1) methods to optionally normalize and (2) methods to normalize based on +the contents of the underlying file system. + +Since our current normalization scheme can silently and incorrectly alter the +meaning of paths, it needs to be changed. + +### Unicode and Windows + +In the original `std::path` design, it was assumed that all paths on Windows +were Unicode. However, it +[turns out](https://github.com/rust-lang/rust/issues/12056) that the Windows +filesystem APIs actually work with [UCS-2](http://en.wikipedia.org/wiki/UTF-16), +which roughly means that they accept arbitrary sequences of `u16` values but +interpret them as UTF-16 when it is valid to do so. 
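As a concrete illustration, here is a hedged sketch (using the current `String::from_utf16` API): a sequence containing a lone surrogate is accepted by the Windows filesystem APIs as a file name, but it is not valid UTF-16 and therefore has no `&str` representation.

```rust
fn main() {
    // "foo" followed by a lone surrogate code unit (0xD800). Windows accepts
    // this u16 sequence, but it is not valid UTF-16, so it cannot be
    // losslessly converted to a Rust string.
    let name: &[u16] = &[0x0066, 0x006f, 0x006f, 0xD800];
    assert!(String::from_utf16(name).is_err());
}
```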
+ +The current `std::path` implementation is built around the assumption that +Windows paths can be represented as Rust string slices, and will need to be +substantially revised. + +## Ergonomic problems + +Because paths in general are not in Unicode, the `std::path` module cannot rely on +an internal string or string slice representation. That in turn causes trouble +for methods like `dirname` that are intended to extract a subcomponent of a path +-- what should it return? + +There are basically three possible options, and today's `std::path` module +chooses *all* of them: + +* Yield a byte sequence: `dirname` yields an `&[u8]` +* Yield a string slice, accounting for potential non-UTF-8 values: `dirname_str` + yields an `Option<&str>` +* Yield another path: `dir_path` yields a `Path` + +This redundancy is present for most of the decomposition methods. The saving +grace is that, in general, path methods consume `BytesContainer` values, so one +can use the `&[u8]` variant but continue to work with other path methods. But in +general `&[u8]` values are not ergonomic to work with, and the explosion in +methods makes the module more (superficially) complex than one might expect. + +You might be tempted to provide only the third option, but `Path` values are +*owned* and *mutable*, so that would imply cloning on every decomposition +operation. For applications like Cargo that work heavily with paths, this would +be an unfortunate (and seemingly unnecessary) overhead. + +## Organizational problems + +Finally, the `std::path` module presents a somewhat complex API organization: + +* The `Path` type is a direct alias of a platform-specific path type. +* The `GenericPath` trait provides most of the common API expected on both platforms. +* The `GenericPathUnsafe` trait provides a few unsafe/unchecked functions for + performance reasons. +* The `posix` and `windows` submodules provide their own `Path` types and a + handful of platform-specific functionality (in particular, `windows` provides + support for working with volumes and "verbatim" paths prefixed with `\\?\`) + +This organization needs to be updated to match current conventions and +simplified if possible. + +One thing to note: with the current organization, it is possible to work with +non-native paths, which can sometimes be useful for interoperation. The new +design should retain this functionality. + +# Detailed design + +Note: this design is influenced by the +[Boost filesystem library](www.boost.org/doc/libs/1_57_0/libs/filesystem/doc/reference.html) +and [Scheme48](http://s48.org/1.8/manual/manual-Z-H-6.html#node_sec_5.15) and +[Racket's](http://plt.eecs.northwestern.edu/snapshots/current/doc/reference/windowspaths.html#%28part._windowspathrep%29) +approach to encoding issues on windows. + +## Overview + +The basic design uses DST to follow the same pattern as `Vec/[T]` and +`String/str`: there is a `PathBuf` type for owned, mutable paths and an unsized +`Path` type for slices. The various "decomposition" methods for extracting +components of a path all return slices, and `PathBuf` itself derefs to `Path`. + +The result is an API that is both efficient and ergonomic: there is no need to +allocate/copy when decomposing a path, but there is also no need to provide +multiple variants of methods to extract bytes versus Unicode strings. For +example, the `Path` slice type provides a *single* method for converting to a +`str` slice (when applicable). + +A key aspect of the design is that there is no internal normalization of paths +at all. 
Aside from solving the symbolic link problem, this choice also has +useful ramifications for the rest of the API, described below. + +The proposed API deals with the other problems mentioned above, and also brings +the module in line with current Rust patterns and conventions. These details +will be discussed after getting a first look at the core API. + +## The cross-platform API + +The proposed core, cross-platform API provided by the new `std::path` is as follows: + +```rust +// A sized, owned type akin to String: +pub struct PathBuf { .. } + +// An unsized slice type akin to str: +pub struct Path { .. } + +// Some ergonomics and generics, following the pattern in String/str and Vec/[T] +impl Deref for PathBuf { ... } +impl BorrowFrom for Path { ... } + +// A replacement for BytesContainer; used to cut down on explicit coercions +pub trait AsPath for Sized? { + fn as_path(&self) -> &Path; +} + +impl PathBuf where P: AsPath { + pub fn new(path: T) -> PathBuf; + + pub fn push(&mut self, path: &P); + pub fn pop(&mut self) -> bool; + + pub fn set_file_name(&mut self, file_name: &P); + pub fn set_extension(&mut self, extension: &P); +} + +// These will ultimately replace the need for `push_many` +impl FromIterator

<P> for PathBuf where P: AsPath { .. }
impl<P> Extend<P>

for PathBuf where P: AsPath { .. } + +impl Path where P: AsPath { + pub fn new(path: &str) -> &Path; + + pub fn as_str(&self) -> Option<&str> + pub fn to_str_lossy(&self) -> Cow; // Cow will replace MaybeOwned + pub fn to_owned(&self) -> PathBuf; + + // iterate over the components of a path + pub fn iter(&self) -> Iter; + + pub fn is_absolute(&self) -> bool; + pub fn is_relative(&self) -> bool; + pub fn is_ancestor_of(&self, other: &P) -> bool; + + pub fn path_relative_from(&self, base: &P) -> Option; + pub fn starts_with(&self, base: &P) -> bool; + pub fn ends_with(&self, child: &P) -> bool; + + // The "root" part of the path, if absolute + pub fn root_path(&self) -> Option<&Path>; + + // The "non-root" part of the path + pub fn relative_path(&self) -> &Path; + + // The "directory" portion of the path + pub fn dir_path(&self) -> &Path; + + pub fn file_name(&self) -> Option<&Path>; + pub fn file_stem(&self) -> Option<&Path>; + pub fn extension(&self) -> Option<&Path>; + + pub fn join(&self, path: &P) -> PathBuf; + + pub fn with_file_name(&self, file_name: &P) -> PathBuf; + pub fn with_extension(&self, extension: &P) -> PathBuf; +} + +pub struct Iter<'a> { .. } + +impl<'a> Iterator<&'a Path> for Iter<'a> { .. } + +pub const SEP: char = .. +pub const ALT_SEPS: &'static [char] = .. + +pub fn is_separator(c: char) -> bool { .. } +``` + +There is plenty of overlap with today's API, and the methods being retained here +largely have the same semantics. + +But there are also a few potentially surprising aspects of this design that merit +comment: + +* **Why does `PathBuf::new` take `IntoString`?** It needs an owned buffer + internally, and taking a string means that Unicode input is guaranteed, which + works on all platforms. (In general, the assumption is that non-Unicode paths + are most commonly produced by *reading* a path from the filesystem, rather + than creating now ones. As we'll see below, there are *platform-specific* ways + to crate non-Unicode paths.) + +* **Why no `Path::as_bytes` method?** There is no cross-platform way to expose + paths directly in terms of byte sequences, because each platform extends + beyond Unicode in its own way. In particular, Unix platforms accept arbitrary + u8 sequences, while Windows accepts arbitrary *u16* sequences (both modulo + disallowing interior 0s). The u16 sequences provided by Windows do not have a + canonical encoding as bytes; this RFC proposed to use + [WTF-8](http://simonsapin.github.io/wtf-8/) (see below), but does not reveal + that choice. + +* **What about interior nulls?** Currently various Rust system APIs will panic + when given strings containing interior null values because, while these are + valid UTF-8, it is not possible to send them as-is to C APIs that expect + null-terminated strings. The API here follows the same approach, panicking if + given a path with an interior null. + +* **Why do `file_name` and `extension` operations work with `Path` rather than + some other type?** In particular, it may seem strange to view an extension as + a path. But doing so allows us to not reveal platform differences about the + various character sets used in paths. By and large, extensions in practice will + be valid Unicode, so the various methods going to and from `str` will + suffice. But as with paths in general, there are platform-specific ways of + working with non-Unicode data, explained below. 
* **Where did `push_many` and friends go?** They're replaced by implementing
  `FromIterator` and `Extend`, following a similar pattern to the `Vec`
  type. (Some work will be needed to retain full efficiency when doing so.)

* **How does `Path::new` work?** The ability to directly get a `&Path` from an
  `&str` (i.e., with no allocation or other work) is a key part of the
  representation choices, which are described below.

* **Where is the `normalize` method?** Since the path type no longer internally
  normalizes, it may be useful to explicitly request normalization. This can be
  done by writing `let normalized: PathBuf = p.iter().collect()` for a path `p`,
  because the iterator performs some on-the-fly normalization (see
  below). **Note**: this normalization does *not* include removing `..`, for the
  reasons explained at the beginning of the RFC.

* **What does the iterator yield?** Unlike today's `components`, the `iter`
  method here will begin with `root_path` if there is one. Thus, `a/b/c` will
  yield `a`, `b` and `c`, while `/a/b/c` will yield `/`, `a`, `b` and `c`.

## Important semantic rules

The path API is designed to satisfy several semantic rules described below.
**Note that `==` here is *lazily* normalizing**, treating `./b` as `b` and
`a//b` as `a/b`; see the next section for more details.

Suppose `p` is some `&Path` and `dot == Path::new(".")`:

```rust
p == p.join(dot)
p == dot.join(p)

p == p.root_path().unwrap_or(dot)
      .join(p.relative_path())

p.relative_path() == match p.root_path() {
    None => p,
    Some(root) => p.path_relative_from(root).unwrap()
}

p == p.dir_path()
      .join(p.file_name().unwrap_or(dot))

p == p.iter().collect()

p == match p.file_name() {
    None => p,
    Some(name) => p.with_file_name(name)
}

p == match p.extension() {
    None => p,
    Some(ext) => p.with_extension(ext)
}

p == match (p.file_stem(), p.extension()) {
    (Some(stem), Some(ext)) => p.with_file_name(stem).with_extension(ext),
    _ => p
}
```

## Representation choices, Unicode, and normalization

A lot of the design in this RFC depends on a key property: both Unix and Windows paths can be easily represented as a flat byte sequence "compatible" with UTF-8. For Unix platforms, this is trivial: they accept any byte sequence, and will generally interpret the byte sequences as UTF-8 when valid to do so. For Windows, this representation involves a clever hack -- proposed formally as [WTF-8](http://simonsapin.github.io/wtf-8/) -- that encodes its native UCS-2 in a generalization of UTF-8. This RFC will not go into the details of that hack; please read Simon's excellent writeup if you're interested.

The upshot of all of this is that we can uniformly represent path slices as newtyped byte slices, and any UTF-8 encoded data will "do the right thing" on all platforms.

Furthermore, by not doing any internal, up-front normalization, it's possible to provide a `Path::new` that goes from `&str` to `&Path` with no intermediate allocation or validation. In the common case that you're working with Rust strings to construct paths, there is zero overhead. It also means that `Path::new(some_str).as_str() == Some(some_str)`.

The main downside of this choice is that some of the path functionality must cope with non-normalized paths. So, for example, the iterator must skip `.` path components (unless it is the entire path), and similarly for methods like `pop`.
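For illustration, here is a sketch of this lazy normalization in use, written against the proposed API above (so it is not runnable against today's `std::path`, and the exact assertions are only the expected behaviour):

```rust
use std::path::{Path, PathBuf};

fn main() {
    // `iter` normalizes on the fly: the `.` component and the doubled
    // separator below simply disappear from the yielded components.
    let p = Path::new("a/./b//c");
    let pieces: Vec<&Path> = p.iter().collect();
    assert_eq!(pieces.len(), 3); // `a`, `b`, `c`

    // Collecting into a `PathBuf` is the explicit way to request a
    // normalized copy (note that `..` is deliberately *not* removed).
    let normalized: PathBuf = p.iter().collect();
    assert_eq!(normalized.as_str(), Some("a/b/c"));
}
```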
In general, methods that yield new path slices are expected to work as if:

* `./b` is just `b`
* `a//b` is just `a/b`

and comparisons between paths should also behave as if the paths had been normalized in this way.

## Organization and platform-specific APIs

Finally, the proposed API is organized as `std::path` with `unix` and `windows` submodules, as today. However, there is no `GenericPath` or `GenericPathUnsafe`; instead, the API given above is implemented as a trivial wrapper around path implementations provided by either the `unix` or the `windows` submodule (based on `#[cfg]`). In other words:

* `std::path::windows::Path` works with Windows-style paths
* `std::path::unix::Path` works with Unix-style paths
* `std::path::Path` is a thin newtype wrapper around the current platform's path implementation

This organization makes it possible to manipulate foreign paths by working with the appropriate submodule.

In addition, each submodule defines some extension traits, explained below, that supplement the path API with functionality relevant to its variant of path.

But what if you're writing a platform-specific application and wish to use the extended functionality directly on `std::path::Path`? In this case, you will be able to import the appropriate extension trait via `os::unix` or `os::windows`, depending on your platform. This is part of a new, general strategy for explicitly "opting-in" to platform-specific features by importing from `os::some_platform` (where the `some_platform` submodule is available only on that platform).

### Unix

On Unix platforms, the only additional functionality is to let you work directly with the underlying byte representation of various path types:

```rust
pub trait UnixPathBufExt {
    fn from_vec(path: Vec<u8>) -> Self;
    fn into_vec(self) -> Vec<u8>;
}

pub trait UnixPathExt {
    fn from_bytes(path: &[u8]) -> &Self;
    fn as_bytes(&self) -> &[u8];
}
```

This is acceptable because the platform supports arbitrary byte sequences (usually interpreted as UTF-8).

### Windows

On Windows, the additional APIs allow you to convert to/from UCS-2 (roughly, arbitrary `u16` sequences interpreted as UTF-16 when applicable); because the name "UCS-2" does not have a clear meaning, these APIs use `u16_slice` and will be carefully documented. They also provide the remaining Windows-specific path decomposition functionality that today's path module supports.

```rust
pub trait WindowsPathBufExt {
    fn from_u16_slice(path: &[u16]) -> Self;
    fn make_non_verbatim(&mut self) -> bool;
}

pub trait WindowsPathExt {
    fn is_cwd_relative(&self) -> bool;
    fn is_vol_relative(&self) -> bool;
    fn is_verbatim(&self) -> bool;
    fn prefix(&self) -> PathPrefix;
    fn to_u16_slice(&self) -> Vec<u16>;
}

enum PathPrefix<'a> {
    Verbatim(&'a Path),
    VerbatimUNC(&'a Path, &'a Path),
    VerbatimDisk(&'a Path),
    DeviceNS(&'a Path),
    UNC(&'a Path, &'a Path),
    Disk(&'a Path),
}
```

# Drawbacks

The DST/slice approach is conceptually more complex than today's API, but in practice seems to yield a much tighter API surface.

# Alternatives

Due to the known semantic problems, it is not really an option to retain the current path implementation. As explained above, supporting UCS-2 also means that the various byte-slice methods in the current API are untenable, so the API also needs to change.
Probably the main alternative to the proposed API would be to *not* use DST/slices, and instead use owned paths everywhere (probably doing some normalization of `.` at the same time). While the resulting API would be simpler in some respects, it would also be substantially less efficient for common operations.

# Unresolved questions

It is not clear how best to incorporate the [WTF-8 implementation](https://github.com/SimonSapin/rust-wtf8) (or how much to incorporate) into `libstd`.

There has been a long debate over whether paths should implement `Show` given that they may contain non-UTF-8 data. This RFC does not take a stance on that (the API may include something like today's `display` adapter), but a follow-up RFC will address the question more generally.

diff --git a/text/0486-std-ascii-reform.md b/text/0486-std-ascii-reform.md new file mode 100644 index 00000000000..037b5deabab --- /dev/null +++ b/text/0486-std-ascii-reform.md @@ -0,0 +1,123 @@

- Start Date: 2014-11-27
- RFC PR: https://github.com/rust-lang/rfcs/pull/486
- Rust Issue: https://github.com/rust-lang/rust/issues/19908

# Summary

Move the `std::ascii::Ascii` type and related traits to a new Cargo package on crates.io, and instead expose its functionality for the `u8`, `[u8]`, `char`, and `str` types.

# Motivation

The `std::ascii::Ascii` type is a `u8` wrapper that enforces (unless `unsafe` code is used) that the value is in the ASCII range, similar to `char` with `u32` in the range of Unicode scalar values, and `String` with `Vec<u8>` containing well-formed UTF-8 data. `[Ascii]` and `Vec<Ascii>` are naturally strings of text entirely in the ASCII range.

Using the type system like this to enforce data invariants is interesting, but in practice `Ascii` is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII only, nor is it desirable to remove or replace non-ASCII bytes, even if ASCII-range-only operations are used. (For example, *ASCII case-insensitive matching* is common in HTML and CSS.)

Every single use of the `Ascii` type in the Rust distribution is only to use the `to_lowercase` or `to_uppercase` method, then immediately convert back to `u8` or `char`.

# Detailed design

The `Ascii` type as well as the `AsciiCast`, `OwnedAsciiCast`, `AsciiStr`, and `IntoBytes` traits should be copied into a new `ascii` Cargo package on crates.io. The `std::ascii` copy should be deprecated and removed at some point before Rust 1.0.

Currently, the `AsciiExt` trait is:

```rust
pub trait AsciiExt<T> {
    fn to_ascii_upper(&self) -> T;
    fn to_ascii_lower(&self) -> T;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
}

impl AsciiExt<String> for str { ... }
impl AsciiExt<Vec<u8>> for [u8] { ... }
```

It should gain new methods for the functionality that is being removed with `Ascii`, be implemented for `u8` and `char`, and (if this is stable enough yet) use an associated type instead of the `T` parameter:

```rust
pub trait AsciiExt {
    type Owned = Self;
    fn to_ascii_upper(&self) -> Owned;
    fn to_ascii_lower(&self) -> Owned;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
    fn is_ascii(&self) -> bool;

    // Maybe? See unresolved questions
    fn is_ascii_lowercase(&self) -> bool;
    fn is_ascii_uppercase(&self) -> bool;
    ...
}

impl AsciiExt for str { type Owned = String; ... }
impl AsciiExt for [u8] { type Owned = Vec<u8>; ... }
impl AsciiExt for char { ... }
impl AsciiExt for u8 { ... }
```
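As a usage sketch (method names as proposed above; the values are arbitrary, and which `is_*` predicates survive is an unresolved question below):

```rust
// Illustrative only: ASCII-range operations directly on the built-in types,
// with no `Ascii` wrapper type involved.
fn main() {
    assert!("Content-Length".eq_ignore_ascii_case("content-length"));
    assert_eq!('A'.to_ascii_lower(), 'a');
    assert_eq!(b'z'.to_ascii_upper(), b'Z');
    assert!(b"HTTP/1.1".is_ascii());
}
```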
The `OwnedAsciiExt` trait should stay as it is:

```rust
pub trait OwnedAsciiExt {
    fn into_ascii_upper(self) -> Self;
    fn into_ascii_lower(self) -> Self;
}

impl OwnedAsciiExt for String { ... }
impl OwnedAsciiExt for Vec<u8> { ... }
```

The `std::ascii::escape_default` function has little to do with ASCII. I *think* it’s relevant to `b'x'` and `b"foo"` byte literals, which have types `u8` and `&'static [u8]`. I suggest moving it into `std::u8`.

I (@SimonSapin) can help with the implementation work.

# Drawbacks

Code using `Ascii` (not only for e.g. `to_lowercase`) would need to install a Cargo package to get it. This is strictly more work than having it in `std`, but should still be easy.

# Alternatives

* The `Ascii` type could stay in `std::ascii`.
* Some variations per *Unresolved questions* below.

# Unresolved questions

* What to do with `std::ascii::escape_default`?
* Rename the `AsciiExt` and `OwnedAsciiExt` traits?
* Should they be in the prelude? The `Ascii` type and the related traits currently are.
* Are associated types stable enough yet? If not, `AsciiExt` should temporarily keep its type parameter.
* Which of all the `Ascii::is_*` methods should `AsciiExt` include? Those included should have `ascii` added to their name.
  * *Maybe* `is_lowercase`, `is_uppercase`, `is_alphabetic`, or `is_alphanumeric` could be useful, but I’d be fine with dropping them and reconsidering if someone asks for them. The same result can be achieved with `.is_ascii() &&` and the corresponding `UnicodeChar` method, which in most cases has an ASCII fast path. And in some cases it’s an easy range check like `'a' <= c && c <= 'z'`.
  * `is_digit` and `is_hex` are identical to `Char::is_digit(10)` and `Char::is_digit(16)`.
  * `is_blank`, `is_control`, `is_graph`, `is_print`, and `is_punctuation` are never used in the Rust distribution or Servo.

diff --git a/text/0490-dst-syntax.md b/text/0490-dst-syntax.md new file mode 100644 index 00000000000..f2a35500ad8 --- /dev/null +++ b/text/0490-dst-syntax.md @@ -0,0 +1,181 @@

- Start Date: 2014-11-29
- RFC PR: [490](https://github.com/rust-lang/rfcs/pull/490)
- Rust Issue: [19607](https://github.com/rust-lang/rust/issues/19607)

Summary
=======

Change the syntax for dynamically sized type parameters from `Sized? T` to `T: ?Sized`, and change the syntax for traits for dynamically sized types to `trait Foo for ?Sized`. Extend this new syntax to work with `where` clauses.

Motivation
==========

History of the DST syntax
-------------------------

When dynamically sized types were first designed, and even when they were first being implemented, the syntax for dynamically sized type parameters had not been fully settled on. Initially, dynamically sized type parameters were denoted by a leading `unsized` keyword:

```rust
fn foo<unsized T>(x: &T) { ... }
struct Foo<unsized T> { field: T }
// etc.
```

This is the syntax used in Niko Matsakis’s [initial design for DST](http://smallcultfollowing.com/babysteps/blog/2014/01/05/dst-take-5/). This syntax makes sense to those who are familiar with DST, but has some issues which could be perceived as problems for those learning to work with dynamically sized types:

- It implies that the parameter *must* be unsized, where really it’s only optional;
- It does not visually relate to the `Sized` trait, which is fundamentally related to declaring a type as unsized (removing the default `Sized` bound).

Later, Felix S.
Klock II [came up with an alternative syntax](http://blog.pnkfx.org/blog/2014/03/13/an-insight-regarding-dst-grammar-for-rust/) using the `type` keyword:

```rust
fn foo<type T>(x: &T) { ... }
struct Foo<type T> { field: T }
// etc.
```

The inspiration behind this is that the union of all sized types and all unsized types is simply all types. Thus, it makes sense for the most general type parameter to be written as `type T`.

This syntax resolves the first problem listed above (i.e., it no longer implies that the type *must* be unsized), but does not resolve the second. Additionally, it is possible that some people could be confused by the use of the `type` keyword, as it contains little meaning—one would assume a bare `T` as a *type* parameter to be a type already, so what does adding a `type` keyword mean?

Perhaps because of these concerns, the syntax for dynamically sized type parameters has since been changed one more time, this time to use the `Sized` trait’s name followed by a question mark:

```rust
fn foo<Sized? T>(x: &T) { ... }
struct Foo<Sized? T> { field: T }
// etc.
```

This syntax simply removes the implicit `Sized` bound on every type parameter using the `?` symbol. It resolves the problem about not mentioning `Sized` that the first two syntaxes didn’t. It also hints towards being related to sizedness, resolving the problem that plagued `type`. It also successfully states that unsizedness is only *optional*—that the parameter may be sized or unsized. This syntax has stuck, and is the syntax used today. Additionally, it could potentially be extended to other traits: for example, a new pointer type that cannot be dropped, `&uninit`, could be added, requiring that it be written to before being dropped. However, many generic functions assume that any parameter passed to them can be dropped. `Drop` could be made a default bound to resolve this, and `Drop?` would remove this bound from a type parameter.

The problem with `Sized? T`
---------------------------

There is some inconsistency present with the `Sized?` syntax. After going through multiple syntaxes for DST, all of which were keywords preceding type parameters, the `Sized?` annotation stayed *before* the type parameter’s name when it was adopted as the syntax for dynamically sized type parameters. This can be considered inconsistent in some ways—`Sized?` looks like a bound, contains a trait name like a bound does, and changes what types can unify with the type parameter like a bound does, but does not come *after* the type parameter’s name like a bound does. This also is inconsistent with Rust’s general pattern of not using C-style variable declarations (`int x`) but instead using a colon and placing the type after the name (`x: int`). (A type parameter is not strictly a variable declaration, but is similar: it declares a new name in a scope.) These problems together make `Sized?` the only marker that comes before type parameter or even variable names, and with the addition of negative bounds, it looks even more inconsistent (the trait name below is illustrative):

```rust
// Normal bound
fn foo<T: Trait>() {}
// Negative bound
fn foo<T: !Trait>() {}
// Generalising ‘anti-bound’
fn foo<Sized? T>() {}
```

The syntax also looks rather strange when recent features like associated types and `where` clauses are considered:

```rust
// This `where` clause syntax doesn’t work today, but perhaps should:
trait Foo<T> where Sized? T {
    type Sized? Bar;
}
```
Furthermore, the `?` on `Sized?` comes after the trait name, whereas most unary-operator-like symbols in the Rust language come before what they are attached to.

This RFC proposes to change the syntax for dynamically sized type parameters to `T: ?Sized` to resolve these issues.

Detailed design
===============

Change the syntax for dynamically sized type parameters to `T: ?Sized`:

```rust
fn foo<T: ?Sized>(x: &T) { ... }
struct Foo<T: ?Sized> { field: Box<T> }
trait Bar { type Baz: ?Sized; }
// etc.
```

Change the syntax for traits for dynamically-sized types to have a prefix `?` instead of a postfix one:

```rust
trait Foo for ?Sized { ... }
```

Allow using this syntax in `where` clauses:

```rust
fn foo<T>(x: &T) where T: ?Sized { ... }
```

Drawbacks
=========

- The current syntax uses position to distinguish between removing and adding bounds, while the proposed syntax only uses a symbol. Since `?Sized` is actually an anti-bound (it removes a bound), it (in some ways) makes sense to put it on the opposite side of a type parameter to show this.

- Only a single character separates adding a `Sized` bound and removing an implicit one. This shouldn’t be a problem in general, as adding a `Sized` bound to a type parameter is pointless (because it is implicitly there already). A lint could be added to check for explicit default bounds if this turns out to be a problem.

Alternatives
============

- Choose one of the previous syntaxes or a new syntax altogether. The drawbacks of the previous syntaxes are discussed in the ‘History of the DST syntax’ section of this RFC.

- Change the syntax to `T: Sized?` instead. This is less consistent with things like negative bounds (which would probably be something like `T: !Foo`), and uses a suffix operator, which is less consistent with other parts of Rust’s syntax. It is, however, closer to the current syntax (`Sized? T`), and looks more natural because of how `?` is used in natural languages such as English.

Unresolved questions
====================

None.

diff --git a/text/0494-c_str-and-c_vec-stability.md b/text/0494-c_str-and-c_vec-stability.md new file mode 100644 index 00000000000..50d2ba45c4b --- /dev/null +++ b/text/0494-c_str-and-c_vec-stability.md @@ -0,0 +1,205 @@

- Start Date: 2015-01-02
- RFC PR: https://github.com/rust-lang/rfcs/pull/494
- Rust Issue: https://github.com/rust-lang/rust/issues/20444

# Summary

* Remove the `std::c_vec` module.
* Move `std::c_str` under a new `std::ffi` module, not exporting the `c_str` module.
* Focus `CString` on *Rust-owned* bytes, providing a static assertion that a pile of bytes has no interior nuls but has a trailing nul.
* Provide convenience functions for translating *C-owned* types into slices in Rust.

# Motivation

The primary motivation for this RFC is to work out the stabilization of the `c_str` and `c_vec` modules. Both of these modules exist for interoperating with C types to ensure that values can cross the boundary of Rust and C relatively safely. These types also need to be designed with ergonomics in mind to ensure that it's tough to get them wrong and easy to get them right.

The current `CString` and `CVec` types are quite old and are long overdue for scrutiny, and they are currently serving a number of competing concerns:

1. A `CString` can both take ownership of a pointer as well as inspect a pointer.
2. A `CString` is always allocated/deallocated on the libc heap.
3. A `CVec` looks like a slice but does not quite act like one.
4. A `CString` looks like a byte slice but does not quite act like one.
5. There are a number of pieces of duplicated functionality throughout the standard library when dealing with raw C types. There are a number of conversion functions on the `Vec<u8>` and `String` types as well as in the `str` and `slice` modules.

In general, all of this functionality needs to be reconciled with one another to provide a consistent and coherent interface when operating with types originating from C.

# Detailed design

In refactoring, all usage could be categorized into one of three categories:

1. A Rust type wants to be passed into C.
2. A C type was handed to Rust, but Rust does not own it.
3. A C type was handed to Rust, and Rust owns it.

The current `CString` attempts to handle all three of these concerns at once, somewhat conflating desires. Additionally, `CVec` provides a fairly different interface than `CString` while providing similar functionality.

## A new `std::ffi`

> **Note**: an old implementation of the design below can be found [in a branch of mine][c_str]

[c_str]: https://github.com/alexcrichton/rust/blob/cstr/src/librustrt/c_str.rs

The entire `c_str` module will be deleted as-is today and replaced with the following interface at the new location `std::ffi`:

```rust
#[deriving(Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
pub struct CString { /* ... */ }

impl CString {
    pub fn from_slice(s: &[u8]) -> CString { /* ... */ }
    pub fn from_vec(s: Vec<u8>) -> CString { /* ... */ }
    pub unsafe fn from_vec_unchecked(s: Vec<u8>) -> CString { /* ... */ }

    pub fn as_slice(&self) -> &[libc::c_char] { /* ... */ }
    pub fn as_slice_with_nul(&self) -> &[libc::c_char] { /* ... */ }
    pub fn as_bytes(&self) -> &[u8] { /* ... */ }
    pub fn as_bytes_with_nul(&self) -> &[u8] { /* ... */ }
}

impl Deref<[libc::c_char]> for CString { /* ... */ }
impl Show for CString { /* ... */ }

pub unsafe fn c_str_to_bytes<'a>(raw: &'a *const libc::c_char) -> &'a [u8] { /* ... */ }
pub unsafe fn c_str_to_bytes_with_nul<'a>(raw: &'a *const libc::c_char) -> &'a [u8] { /* ... */ }
```

The new `CString` API is focused solely on providing a static assertion that a byte slice contains no interior nul bytes and there is a terminating nul byte. A `CString` is usable as a slice of `libc::c_char` similar to how a `Vec<T>` is usable as a slice, but a `CString` can also be viewed as a byte slice with a concrete `u8` type. The default of `libc::c_char` was chosen to ensure that `.as_ptr()` returns a pointer of the right type. Note that `CString` does not provide a `DerefMut` implementation, to maintain the static guarantee that there are no interior nul bytes.

### Constructing a `CString`

One of the major departures from today's API is how a `CString` is constructed. Today this can be done through the `CString::new` function or the `ToCStr` trait. These two construction vectors serve two very different purposes, one for C-originating data and one for Rust-originating data. This redesign of `CString` is solely focused on going from Rust to C (case 1 above) and only supports constructors in this flavor.

The first constructor, `from_slice`, is intended to allow `CString` to implement an on-the-stack buffer optimization in the future without having to resort to a `Vec<u8>` with its allocation. This is similar to the optimization performed by `with_c_str` today.
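For instance, a sketch of the Rust-to-C direction under this design (`libc::puts` simply stands in for any C function taking a `*const c_char`; this is illustrative rather than a prescribed pattern):

```rust
extern crate libc;

use std::ffi::CString;

fn main() {
    // Rust-owned bytes crossing into C: the constructor asserts that there
    // are no interior nul bytes and appends the trailing nul.
    let msg = CString::from_slice(b"hello from Rust");
    unsafe {
        // `as_ptr` is available through the `[libc::c_char]` slice view.
        libc::puts(msg.as_ptr());
    }
}
```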
Of the other two constructors, `from_vec` will consume a +vector, assert there are no 0 bytes, an then push a 0 byte on the end. The +`from_vec_unchecked` constructor will not perform the verification, but will +still push a zero. Note that both of these constructors expose the fact that a +`CString` is not necessarily valid UTF-8. + +The `ToCStr` trait is removed entirely (including from the prelude) in favor of +these construction functions. This could possibly be re-added in the future, but +for now it will be removed from the module. + +### Working with `*const libc::c_char` + +Instead of using `CString` to look at a `*const libc::c_char`, the module now +provides two conversion functions to go from a C string to a byte slice. The +signature of this function is similar to the new `std::slice::from_raw_buf` +function and will use the lifetime of the pointer itself as an anchor for the +lifetime of the returned slice. + +These two functions solve the use case (2) above where a C string just needs to +be inspected. Because a C string is fundamentally just a pile of bytes, it's +interpreted in Rust as a `u8` slice. With these two functions, all of the +following functions will also be deprecated: + +* `std::str::from_c_str` - this function should be replaced with + `ffi::c_str_to_bytes` plus one of `str::from_utf8` or + `str::from_utf8_unchecked`. +* `String::from_raw_buf` - similarly to `from_c_str`, each step should be + composed individually to perform the required checks. This would involve using + `ffi::c_str_to_bytes`, `str::from_utf8`, and `.to_string()`. +* `String::from_raw_buf_len` - this should be replaced the same way as + `String::from_raw_buf` except that `slice::from_raw_buf` is used instead of + `ffi`. + +## Removing `c_vec` + +The new `ffi` module serves as a solution to desires (1) and (2) above, but +the third use case is left unsolved so far. This is what the current `c_vec` +module is attempting to solve, but it does so in a somewhat ad-hoc fashion. The +constructor for the type takes a `proc` destructor to invoke when the vector is +dropped to allow for custom destruction. To make matters a little more +interesting, the `CVec` type provides a default constructor which invokes +`libc::free` on the pointer. + +Transferring ownership of pointers without a custom deallocation function is in +general quite a dangerous operation for libraries to perform. Not all platforms +support the ability to `malloc` in one library and `free` in the other, and this +is also generally considered an antipattern. + +Creating a custom wrapper struct with a simple `Deref` and `Drop` implementation +as necessary is likely to be sufficient for this use case, so this RFC proposes +removing the entire `c_vec` module with no replacement. It is expected that a +utility crate for interoperating with raw pointers in this fashion may manifest +itself on crates.io, and inclusion into the standard library can be considered +at that time. + +## Working with C Strings + +The design above has been implemented in [a branch][branch] of mine where the +fallout can be seen. The primary impact of this change is that the `to_c_str` +and `with_c_str` methods are no longer in the prelude by default, and +`CString::from_*` must be called in order to create a C string. + +[branch]: https://github.com/alexcrichton/rust/tree/cstr + +# Drawbacks + +* Whenever Rust works with a C string, it's tough to avoid the cost associated + with the initial length calculation. 
All types provided here involve + calculating the length of a C string up front, and no type is provided to + operate on a C string without calculating its length. + +* With the removal of the `ToCStr` trait, unnecessary allocations may be made + when converting to a `CString`. For example, a `Vec` can be called by + directly calling `CString::from_vec`, but it may be more frequently called via + `CString::from_slice`, resulting in an unnecessary allocation. Note, however, + that one would have to remember to call `into_c_str` on the `ToCStr` trait, so + it doesn't necessarily help too too much. + +* The ergonomics of operating C strings have been somewhat reduced as part of + this design. The `CString::from_slice` method is somewhat long to call + (compared to `to_c_string`), and convenience methods of going straight from a + `*const libc::c_char` were deprecated in favor of only supporting a conversion + to a slice. + +# Alternatives + +* There is an [alternative RFC](https://github.com/rust-lang/rfcs/pull/435) + which discusses pursuit of today's general design of the `c_str` module as + well as a refinement of its current types. + +* The `from_vec_unchecked` function could do precisely 0 work instead of always + pushing a 0 at the end. + +# Unresolved questions + +* On some platforms, `libc::c_char` is not necessarily just one byte, which + these types rely on. It's unclear how much this should affect the design of + this module as to how important these platforms are. + +* Are the `*_with_nul` functions necessary on `CString`? diff --git a/text/0495-array-pattern-changes.md b/text/0495-array-pattern-changes.md new file mode 100644 index 00000000000..eb7d8c83d1b --- /dev/null +++ b/text/0495-array-pattern-changes.md @@ -0,0 +1,152 @@ +- Start Date: 2014-12-03 +- RFC PR: [rust-lang/rfcs#495](https://github.com/rust-lang/rfcs/pull/495) +- Rust Issue: [rust-lang/rust#23121](https://github.com/rust-lang/rust/issues/23121) + +Summary +======= + +Change array/slice patterns in the following ways: + +- Make them only match on arrays (`[T; n]` and `[T]`), not slices; +- Make subslice matching yield a value of type `[T; n]` or `[T]`, not `&[T]` or + `&mut [T]`; +- Allow multiple mutable references to be made to different parts of the same + array or slice in array patterns (resolving rust-lang/rust [issue + #8636](https://github.com/rust-lang/rust/issues/8636)). + +Motivation +========== + +Before DST (and after the removal of `~[T]`), there were only two types based on +`[T]`: `&[T]` and `&mut [T]`. With DST, we can have many more types based on +`[T]`, `Box<[T]>` in particular, but theoretically any pointer type around a +`[T]` could be used. However, array patterns still match on `&[T]`, `&mut [T]`, +and `[T; n]` only, meaning that to match on a `Box<[T]>`, one must first convert +it to a slice, which disallows moves. This may prove to significantly limit the +amount of useful code that can be written using array patterns. + +Another problem with today’s array patterns is in subslice matching, which +specifies that the rest of a slice not matched on already in the pattern should +be put into a variable: + +```rust +let foo = [1i, 2, 3]; +match foo { + [head, tail..] => { + assert_eq!(head, 1); + assert_eq!(tail, &[2, 3]); + }, + _ => {}, +} +``` + +This makes sense, but still has a few problems. In particular, `tail` is a +`&[int]`, even though the compiler can always assert that it will have a length +of `2`, so there is no way to treat it like a fixed-length array. 
Also, all +other bindings in array patterns are by-value, whereas bindings using subslice +matching are by-reference (even though they don’t use `ref`). This can create +confusing errors because of the fact that the `..` syntax is the only way of +taking a reference to something within a pattern without using the `ref` +keyword. + +Finally, the compiler currently complains when one tries to take multiple +mutable references to different values within the same array in a slice pattern: + +```rust +let foo: &mut [int] = &mut [1, 2, 3]; +match foo { + [ref mut a, ref mut b] => ..., + ... +} +``` + +This fails to compile, because the compiler thinks that this would allow +multiple mutable borrows to the same value (which is not the case). + +Detailed design +=============== + +- Make array patterns match only on arrays (`[T; n]` and `[T]`). For example, + the following code: + + ```rust + let foo: &[u8] = &[1, 2, 3]; + match foo { + [a, b, c] => ..., + ... + } + ``` + + Would have to be changed to this: + + ```rust + let foo: &[u8] = &[1, 2, 3]; + match foo { + &[a, b, c] => ..., + ... + } + ``` + + This change makes slice patterns mirror slice expressions much more closely. + +- Make subslice matching in array patterns yield a value of type `[T; n]` (if + the array is of fixed size) or `[T]` (if not). This means changing most code + that looks like this: + + ```rust + let foo: &[u8] = &[1, 2, 3]; + match foo { + [a, b, c..] => ..., + ... + } + ``` + + To this: + + ```rust + let foo: &[u8] = &[1, 2, 3]; + match foo { + &[a, b, ref c..] => ..., + ... + } + ``` + + It should be noted that if a fixed-size array is matched on using subslice + matching, and `ref` is used, the type of the binding will be `&[T; n]`, *not* + `&[T]`. + +- Improve the compiler’s analysis of multiple mutable references to the same + value within array patterns. This would be done by allowing multiple mutable + references to different elements of the same array (including bindings from + subslice matching): + + ```rust + let foo: &mut [u8] = &mut [1, 2, 3, 4]; + match foo { + &[ref mut a, ref mut b, ref c, ref mut d..] => ..., + ... + } + ``` + +Drawbacks +========= + +- This will break a non-negligible amount of code, requiring people to add `&`s + and `ref`s to their code. + +- The modifications to subslice matching will require `ref` or `ref mut` to be + used in almost all cases. This could be seen as unnecessary. + +Alternatives +============ + +- Do a subset of this proposal; for example, the modifications to subslice + matching in patterns could be removed. + +Unresolved questions +==================== + +- What are the precise implications to the borrow checker of the change to + multiple mutable borrows in the same array pattern? Since it is a + backwards-compatible change, it can be implemented after 1.0 if it turns out + to be difficult to implement. diff --git a/text/0501-consistent_no_prelude_attributes.md b/text/0501-consistent_no_prelude_attributes.md new file mode 100644 index 00000000000..773493daf6d --- /dev/null +++ b/text/0501-consistent_no_prelude_attributes.md @@ -0,0 +1,78 @@ +- Start Date: (2014-12-06) +- RFC PR: https://github.com/rust-lang/rfcs/pull/501 +- Rust Issue: https://github.com/rust-lang/rust/issues/20561 + +# Summary + +Make name and behavior of the `#![no_std]` and `#![no_implicit_prelude]` attributes +consistent by renaming the latter to `#![no_prelude]` and having it only apply to the current +module. 
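As a sketch of the proposed semantics (the module and function names here are invented for the example):

```rust
mod outer {
    #![no_prelude]
    // No prelude names are available in this module; anything from
    // `std::prelude` must be imported explicitly.

    mod inner {
        // Under this proposal the prelude is injected here again, because
        // the attribute no longer applies to submodules.
        pub fn example() -> Option<String> { Some("ok".to_string()) }
    }
}
```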
# Motivation

Currently, Rust automatically inserts an implicit `extern crate std;` in the crate root that can be disabled with the `#[no_std]` attribute.

It also automatically inserts an implicit `use std::prelude::*;` in every module that can be disabled with the `#[no_implicit_prelude]` attribute.

Lastly, if `#[no_std]` is used, all modules automatically do not import the prelude, so the `#[no_implicit_prelude]` attribute is unneeded in those cases.

However, the latter attribute is inconsistent with the former in two regards:

- Naming-wise, it redundantly contains the word "implicit".
- Semantics-wise, it applies to the current module __and all submodules__.

That last one is surprising because normally, whether or not a module contains a certain import does not affect whether or not a submodule contains a certain import, so you'd expect an attribute that disables an implicit import to only apply to that module as well.

This behavior also gets in the way in some of the already rare cases where you want to disable the prelude while still linking to std.

As an example, the author had been made aware of this behavior of `#[no_implicit_prelude]` while attempting to prototype a variation of the `Iterator` traits, leading to code that looks like this:

```rust
mod my_iter {
    #![no_implicit_prelude]

    trait Iterator { /* ... */ }

    mod adapters {
        /* Tries to access the existing prelude, and fails to resolve */
    }
}
```

While such use cases might be resolved by just requiring an explicit `use std::prelude::*;` in the submodules, it seems like just making the attribute behave as expected is the better outcome.

Of course, for the cases where you want the prelude disabled for a whole subtree of modules, it would now become necessary to add a `#[no_prelude]` attribute in each of them - but that is consistent with imports in general.

# Detailed design

`libsyntax` needs to be changed to accept both the name `no_implicit_prelude` and `no_prelude` for the attribute. Then the attribute's effect on the AST needs to be changed to not deeply remove all imports, and all fallout of this change needs to be fixed in order for the new semantics to bootstrap.

Then a snapshot needs to be made, and all uses of `#[no_implicit_prelude]` can be changed to `#[no_prelude]` in both the main code base and user code.

Finally, the old attribute name should emit a deprecation warning, and be removed in time.

# Drawbacks

- The attribute is a rare use case to begin with, so any effort put into this would distract from more important stabilization work.

# Alternatives

- Keep the current behavior.
- Remove the `#[no_implicit_prelude]` attribute altogether, instead forcing users to use `#[no_std]` in combination with `extern crate std;` and `use std::prelude::*;`.
- Generalize preludes more to allow custom ones, which might supersede the attributes from this RFC.

diff --git a/text/0503-prelude-stabilization.md b/text/0503-prelude-stabilization.md new file mode 100644 index 00000000000..64607683683 --- /dev/null +++ b/text/0503-prelude-stabilization.md @@ -0,0 +1,325 @@

- Start Date: 2014-12-20
- RFC PR: https://github.com/rust-lang/rfcs/pull/503
- Rust Issue: https://github.com/rust-lang/rust/issues/20068

# Summary

Stabilize the `std::prelude` module by removing some of its less commonly used functionality.
+ +# Motivation + +The prelude of the standard library is included into all Rust programs by +default, and is consequently quite an important module to consider when +stabilizing the standard library. Some of the primary tasks of the prelude are: + +* The prelude is used to represent imports that would otherwise occur in nearly + all Rust modules. The threshold for entering the prelude is consequently quite + high as it is unlikely to be able to change in a backwards compatible fashion + as-is. +* Primitive types such as `str` and `char` are unable to have inherent methods + attached to them. In order to provide methods extension traits must be used. + All of these traits are members of the prelude in order to enable methods on + language-defined types. + +This RFC currently focuses on removing functionality from the prelude rather +than adding it. New additions can continue to happen before 1.0 and will be +evaluated on a case-by-case basis. The rationale for removal or inclusion will +be provided below. + +# Detailed Design + +The current `std::prelude` module was copied into the document of this RFC, and +each reexport should be listed below and categorized. The rationale for +inclusion of each type is included inline. + +## Reexports to retain + +This section provides the exact prelude that this RFC proposes: + +```rust +// Boxes are a ubiquitous type in Rust used for representing an allocation with +// a known fixed address. It is also one of the canonical examples of an owned +// type, appearing in many examples and tests. Due to its common usage, the Box +// type is present. +pub use boxed::Box; + +// These two traits are present to provide methods on the `char` primitive type. +// The two traits will be collapsed into one `CharExt` trait in the `std::char` +// module, however instead of reexporting two traits. +pub use char::{Char, UnicodeChar}; + +// One of the most common operations when working with references in Rust is the +// `clone()` method to promote the reference to an owned value. As one of the +// core concepts in Rust used by virtually all programs, this trait is included +// in the prelude. +pub use clone::Clone; + +// It is expected that these traits will be used in generic bounds much more +// frequently than there will be manual implementations. This common usage in +// bounds to provide the fundamental ability to compare two values is the reason +// for the inclusion of these traits in the prelude. +pub use cmp::{PartialEq, PartialOrd, Eq, Ord}; + +// Iterators are one of the most core primitives in the standard libary which is +// used to interoperate between any sort of sequence of data. Due to the +// widespread use, these traits and extension traits are all present in the +// prelude. +// +// The `Iterator*Ext` traits can be removed if generalized where clauses for +// methods are implemented, and they are currently included to represent the +// functionality provided today. The various traits other than `Iterator`, such +// as `DoubleEndedIterator` and `ExactSizeIterator` are provided in order to +// ensure that the methods are available like the `Iterator` methods. +pub use iter::{DoubleEndedIteratorExt, CloneIteratorExt}; +pub use iter::{Extend, ExactSizeIterator}; +pub use iter::{Iterator, IteratorExt, DoubleEndedIterator}; +pub use iter::{IteratorCloneExt}; +pub use iter::{IteratorOrdExt}; + +// As core language concepts and frequently used bounds on generics, these kinds +// are all included in the prelude by default. 
Note, however, that the exact +// set of kinds in the prelude will be determined by the stabilization of this +// module. +pub use kinds::{Copy, Send, Sized, Sync}; + +// One of Rust's fundamental principles is ownership, and understanding movement +// of types is key to this. The drop function, while a convenience, represents +// the concept of ownership and relinquishing ownership, so it is included. +pub use mem::drop; + +// As described below, very few `ops` traits will continue to remain in the +// prelude. `Drop`, however, stands out from the other operations for many of +// the similar reasons as to the `drop` function. +pub use ops::Drop; + +// Similarly to the `cmp` traits, these traits are expected to be bounds on +// generics quite commonly to represent a pending computation that can be +// executed. +pub use ops::{Fn, FnMut, FnOnce}; + +// The `Option` type is one of Rust's most common and ubiquitous types, +// justifying its inclusion into the prelude along with its two variants. +pub use option::Option::{mod, Some, None}; + +// In order to provide methods on raw pointers, these two traits are included +// into the prelude. It is expected that these traits will be renamed to +// `PtrExt` and `MutPtrExt`. +pub use ptr::{RawPtr, RawMutPtr}; + +// This type is included for the same reasons as the `Option` type. +pub use result::Result::{mod, Ok, Err}; + +// The slice family of traits are all provided in order to export methods on the +// language slice type. The `SlicePrelude` and `SliceAllocPrelude` will be +// collapsed into one `SliceExt` trait by the `std::slice` module. Many of the +// remaining traits require generalized where clauses on methods to be merged +// into the `SliceExt` trait, which may not happen for 1.0. +pub use slice::{SlicePrelude, SliceAllocPrelude, CloneSlicePrelude}; +pub use slice::{CloneSliceAllocPrelude, OrdSliceAllocPrelude}; +pub use slice::{PartialEqSlicePrelude, OrdSlicePrelude}; + +// These traits, like the above traits, are providing inherent methods on +// slices, but are not candidates for merging into `SliceExt`. Nevertheless +// these common operations are included for the purpose of adding methods on +// language-defined types. +pub use slice::{BoxedSlicePrelude, AsSlice, VectorVector}; + +// The str family of traits provide inherent methods on the `str` type. The +// `StrPrelude`, `StrAllocating`, and `UnicodeStrPrelude` traits will all be +// collapsed into one `StrExt` trait to be reexported in the prelude. The `Str` +// trait itself will be handled in the stabilization of the `str` module, but +// for now is included for consistency. Similarly, the `StrVector` trait is +// still undergoing stabilization but remains for consistency. +pub use str::{Str, StrPrelude}; +pub use str::{StrAllocating, UnicodeStrPrelude}; +pub use str::{StrVector}; + +// As the standard library's default owned string type, `String` is provided in +// the prelude. Many of the same reasons for `Box`'s inclusion apply to `String` +// as well. +pub use string::String; + +// Converting types to a `String` is seen as a common-enough operation for +// including this trait in the prelude. +pub use string::ToString; + +// Included for the same reasons as `String` and `Box`. +pub use vec::Vec; +``` + +## Reexports to remove + +All of the following reexports are currently present in the prelude and are +proposed for removal by this RFC. 
+ +```rust +// While currently present in the prelude, these traits do not need to be in +// scope to use the language syntax associated with each trait. These traits are +// also only rarely used in bounds on generics and are consequently +// predominately used for `impl` blocks. Due to this lack of need to be included +// into all modules in Rust, these traits are all removed from the prelude. +pub use ops::{Add, Sub, Mul, Div, Rem, Neg, Not}; +pub use ops::{BitAnd, BitOr, BitXor}; +pub use ops::{Deref, DerefMut}; +pub use ops::{Shl, Shr}; +pub use ops::{Index, IndexMut}; +pub use ops::{Slice, SliceMut}; + +// Now that tuple indexing is a feature of the language, these traits are no +// longer necessary and can be deprecated. +pub use tuple::{Tuple1, Tuple2, Tuple3, Tuple4}; +pub use tuple::{Tuple5, Tuple6, Tuple7, Tuple8}; +pub use tuple::{Tuple9, Tuple10, Tuple11, Tuple12}; + +// Interoperating with ascii data is not necessarily a core language operation +// and the ascii module itself is currently undergoing stabilization. The design +// will likely end up with only one trait (as opposed to the many listed here). +// The prelude will be responsible for providing unicode-respecting methods on +// primitives while requiring that ascii-specific manipulation is imported +// manually. +pub use ascii::{Ascii, AsciiCast, OwnedAsciiCast, AsciiStr}; +pub use ascii::IntoBytes; + +// Inclusion of this trait is mostly a relic of old behavior and there is very +// little need for the `into_cow` method to be ubiquitously available. Although +// mostly used in bounds on generics, this trait is not itself as commonly used +// as `FnMut`, for example. +pub use borrow::IntoCow; + +// The `c_str` module is currently undergoing stabilization as well, but it's +// unlikely for `to_c_str` to be a common operation in almost all Rust code in +// existence, so this trait, if it survives stabilization, is removed from the +// prelude. +pub use c_str::ToCStr; + +// This trait is `#[experimental]` in the `std::cmp` module and the prelude is +// intended to be a stable subset of Rust. If later marked #[stable] the trait +// may re-enter the prelude but it will be removed until that time. +pub use cmp::Equiv; + +// Actual usage of the `Ordering` enumeration and its variants is quite rare in +// Rust code. Implementors of the `Ord` and `PartialOrd` traits will likely be +// required to import these names, but it is not expected that Rust code at +// large will require these names to be in the prelude. +pub use cmp::Ordering::{mod, Less, Equal, Greater}; + +// With language-defined `..` syntax there is no longer a need for the `range` +// function to remain in the prelude. This RFC does, however, recommend leaving +// this function in the prelude until the `..` syntax is implemented in order to +// provide a smoother deprecation strategy. +pub use iter::range; + +// The FromIterator trait does not need to be present in the prelude as it is +// not adding methods to iterators and is mostly only required to be imported by +// implementors, which is not common enough for inclusion. +pub use iter::{FromIterator}; + +// Like `cmp::Equiv`, these two iterators are `#[experimental]` and are +// consequently removed from the prelude. +pub use iter::{RandomAccessIterator, MutableDoubleEndedIterator}; + +// I/O stabilization will have its own RFC soon, and part of that RFC involves +// creating a `std::io::prelude` module which will become the home for these +// traits. 
This RFC proposes leaving these in the current prelude, however, +// until the I/O stabilization is complete. +pub use io::{Buffer, Writer, Reader, Seek, BufferPrelude}; + +// These two traits are relics of an older `std::num` module which need not be +// included in the prelude any longer. Their methods are not called often, nor +// are they taken as bounds frequently enough to justify inclusion into the +// prelude. +pub use num::{ToPrimitive, FromPrimitive}; + +// As part of the Path stabilization RFC, these traits and structures will be +// removed from the prelude. Note that the ergonomics of opening a File today +// will decrease in the sense that `Path` must be imported, but eventually +// importing `Path` will not be necessary due to the `AsPath` trait. More +// details can be found in the path stabilization RFC. +pub use path::{GenericPath, Path, PosixPath, WindowsPath}; + +// This function is included in the prelude as a convenience function for the +// `FromStr::from_str` associated function. Inclusion of this method, however, +// is inconsistent with respect to the lack of inclusion of a `default` method, +// for example. It is also not necessarily seen as `from_str` being common +// enough to justify its inclusion. +pub use str::from_str; + +// This trait is currently only implemented for `Vec` which is likely to +// be removed as part of `std::ascii` stabilization, obsoleting the need for the +// trait and its inclusion in the prelude. +pub use string::IntoString; + +// The focus of Rust's story about concurrent program has been constantly +// shifting since it was incepted, and the prelude doesn't necessarily always +// keep up. Message passing is only one form of concurrent primitive that Rust +// provides, and inclusion in the prelude can provide the wrong impression that +// it is the *only* concurrent primitive that Rust offers. In order to +// facilitate a more unified front in Rust's concurrency story, these primitives +// will be removed from the prelude (and soon moved to std::sync as well). +// +// Additionally, while spawning a new thread is a common operation in concurrent +// programming, it is not a frequent operation in code in general. For example +// even highly concurrent applications may end up only calling `spawn` in one or +// two locations which does not necessarily justify its inclusion in the prelude +// for all Rust code in existence. +pub use comm::{sync_channel, channel}; +pub use comm::{SyncSender, Sender, Receiver}; +pub use task::spawn; +``` + +## Move to an inner `v1` module + +This RFC also proposes moving all reexports to `std::prelude::v1` module instead +of just inside `std::prelude`. The compiler will then start injecting `use +std::prelude::v1::*`. + +This is a pre-emptive move to help provide room to grow the prelude module over +time. It is unlikely that any reexports could ever be added to the prelude +backwards-compatibly, so newer preludes (which may happen over time) will have +to live in new modules. If the standard library grows multiple preludes over +time, then it is expected for crates to be able to specify which prelude they +would like to be compiled with. This feature is left as an open question, +however, and movement to an inner `v1` module is simply preparation for this +possible move happening in the future. + +The versioning scheme for the prelude over time (if it happens) is also left as +an open question by this RFC. 
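For concreteness, a sketch of what the injected import amounts to when written out by hand (the `manual` module and `greeting` function are invented for the example):

```rust
// A module that opts out of the implicit prelude can restore exactly the
// same names with the single glob import the compiler would have injected.
#[no_implicit_prelude]
mod manual {
    use std::prelude::v1::*;

    pub fn greeting() -> String {
        // `String` and `ToString` come from the explicit prelude import.
        "hello".to_string()
    }
}
```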
# Drawbacks

A fairly large amount of functionality was removed from the prelude in order to home in on the driving goals of the prelude, but this unfortunately means that many imports must be added throughout code currently using these reexports. It is expected, however, that the most painful removals will have roughly equal ergonomic replacements in the future. For example:

* Removal of `Path` and friends will retain the current level of ergonomics with no imports via the `AsPath` trait.
* Removal of `iter::range` will be compensated for by the *more* ergonomic `..` syntax.

Many other cases which may initially be seen as painful to migrate are intended to become aligned with other Rust conventions and practices today. For example, getting into the habit of importing implemented traits (such as the `ops` traits) is consistent with how many implementations will work. Similarly, the removal of synchronization primitives allows for consistency in the usage of all concurrent primitives that Rust provides.

# Alternatives

A number of alternatives were discussed above, and this section can otherwise largely be filled with various permutations of moving reexports between the "keep" and "remove" sections above.

# Unresolved Questions

This RFC is fairly aggressive about removing functionality from the prelude, but it is unclear how necessary this is. If Rust grows the ability to backwards-compatibly modify the prelude in some fashion (for example, introducing multiple preludes that can be opted into), then the aggressive removal may not be necessary.

If user-defined preludes are allowed in some form, it is also unclear how this would impact the inclusion of reexports in the standard library's prelude.

diff --git a/text/0504-show-stabilization.md b/text/0504-show-stabilization.md new file mode 100644 index 00000000000..6611c7c4371 --- /dev/null +++ b/text/0504-show-stabilization.md @@ -0,0 +1,191 @@

- Start Date: 2014-12-19
- RFC PR: https://github.com/rust-lang/rfcs/pull/504
- Rust Issue: https://github.com/rust-lang/rust/issues/20013

# Summary

Today's `Show` trait will be tasked with providing the ability to inspect the representation of implementors of the trait. A new trait, `String`, will be introduced to the `std::fmt` module in order to represent data that can essentially be serialized to a string, typically representing the precise internal state of the implementor.

The `String` trait will take over the `{}` format specifier and the `Show` trait will move to the now-open `{:?}` specifier.

# Motivation

The formatting traits today largely provide clear guidance about what they are intended for. For example, the `Binary` trait is intended for printing the binary representation of a data type. The ubiquitous `Show` trait, however, is not as well defined in its purpose. It is currently used for a number of use cases which are typically at odds with one another.

One of the use cases of `Show` today is to provide a "debugging view" of a type. This provides the easy ability to print *some* string representation of a type to a stream in order to debug an application. The `Show` trait, however, is also used for printing user-facing information. This flavor of usage is intended for display to all users as opposed to just developers. Finally, the `Show` trait is connected to the `ToString` trait, providing the `to_string` method unconditionally.
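For a concrete sense of this conflation, consider a sketch of today's behaviour (the `Point` type and the messages are invented for illustration):

```rust
#[deriving(Show)]
struct Point { x: int, y: int }

fn main() {
    // The same trait and the same `{}` specifier currently serve two very
    // different audiences:
    println!("debug: {}", Point { x: 1, y: 2 }); // developer-facing dump
    println!("{} items remaining", 3i);          // user-facing message
}
```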
+ +From these use cases of `Show`, a number of pain points have arisen over time: + +1. It's not clear whether all types should implement `Show` or not. Types like + `Path` quite intentionally avoid exposing a string representation (due to + paths not being valid UTF-8 always) and hence do not want a `to_string` + method to be defined on them. +2. It is quite common to use `#[deriving(Show)]` to easily print a Rust + structure. This is not possible, however, when particular members do not + implement `Show` (for example a `Path`). +3. Some types, such as a `String`, desire the ability to "inspect" the + representation as well as printing the representation. An inspection mode, + for example, would escape characters like newlines. +4. Common pieces of functionality, such as `assert_eq!` are tied to the `Show` + trait which is not necessarily implemented for all types. + +The purpose of this RFC is to clearly define what the `Show` trait is intended +to be used for, as well as providing guidelines to implementors of what +implementations should do. + +# Detailed Design + +As described in the motivation section, the intended use cases for the current +`Show` trait are actually motivations for two separate formatting traits. One +trait will be intended for all Rust types to implement in order to easily allow +debugging values for macros such as `assert_eq!` or general `println!` +statements. A separate trait will be intended for Rust types which are +faithfully represented as a string. These types can be represented as a string +in a non-lossy fashion and are intended for general consumption by more than +just developers. + +This RFC proposes naming these two traits `Show` and `String`, respectively. + +## The `String` trait + +A new formatting trait will be added to `std::fmt` as follows: + +```rust +pub trait String for Sized? { + fn fmt(&self, f: &mut Formatter) -> Result; +} +``` + +This trait is identical to all other formatting traits except for its name. The +`String` trait will be used with the `{}` format specifier, typically considered +the default specifier for Rust. + +An implementation of the `String` trait is an assertion that the type can be +faithfully represented as a UTF-8 string at all times. If the type can be +reconstructed from a string, then it is recommended, but not required, that the +following relation be true: + +```rust +assert_eq!(foo, from_str(format!("{}", foo).as_slice()).unwrap()); +``` + +If the type cannot necessarily be reconstructed from a string, then the output +may be less descriptive than the type can provide, but it is guaranteed to be +human readable for all users. + +It is **not** expected that all types implement the `String` trait. Not all +types can satisfy the purpose of this trait, and for example the following types +will not implement the `String` trait: + +* `Path` will abstain as it is not guaranteed to contain valid UTF-8 data. +* `CString` will abstain for the same reasons as `Path`. +* `RefCell` will abstain as it may not be accessed at all times to be + represented as a `String`. +* `Weak` references will abstain for the same reasons as `RefCell`. + +Almost all types that implement `Show` in the standard library today, however, +will implement the `String` trait. For example all primitive integer types, +vectors, slices, strings, and containers will all implement the `String` trait. +The output format will not change from what it is today (no extra escaping or +debugging will occur). 
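As an illustration, a minimal sketch of a by-hand implementation for a simple user-facing type (`Celsius` is invented for the example, and the exact output is not prescribed by this RFC):

```rust
use std::fmt;

struct Celsius(i64);

// A faithful, user-facing rendering: this is what `{}` will print.
impl fmt::String for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} °C", self.0)
    }
}
```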
+ +The compiler will **not** provide an implementation of `#[deriving(String)]` for +types. + +## The `Show` trait + +The current `Show` trait will not change location nor definition, but it will +instead move to the `{:?}` specifier instead of the `{}` specifier (which +`String` now uses). + +An implementation of the `Show` trait is expected for **all** types in Rust and +provides very few guarantees about the output. Output will typically represent +the internal state as faithfully as possible, but it is not expected that this +will always be true. The output of `Show` should never be used to reconstruct +the object itself as it is not guaranteed to be possible to do so. + +The purpose of the `Show` trait is to facilitate debugging Rust code which +implies that it needs to be maximally useful by extending to all Rust types. All +types in the standard library which do not currently implement `Show` will gain +an implementation of the `Show` trait including `Path`, `RefCell`, and `Weak` +references. + +Many implementations of `Show` in the standard library will differ from what +they currently are today. For example `str`'s implementation will escape all +characters such as newlines and tabs in its output. Primitive integers will +print the suffix of the type after the literal in all cases. Characters will +also be printed with surrounding single quotes while escaping values such as +newlines. The purpose of these implementations are to provide debugging views +into these types. + +Implementations of the `Show` trait are expected to never `panic!` and always +produce valid UTF-8 data. The compiler will continue to provide a +`#[deriving(Show)]` implementation to facilitate printing and debugging +user-defined structures. + +## The `ToString` trait + +Today the `ToString` trait is connected to the `Show` trait, but this RFC +proposes wiring it to the newly-proposed `String` trait instead. This switch +enables users of `to_string` to rely on the same guarantees provided by `String` +as well as not erroneously providing the `to_string` method on types that are +not intended to have one. + +It is strongly discouraged to provide an implementation of the `ToString` trait +and not the `String` trait. + +# Drawbacks + +It is inherently easier to understand fewer concepts from the standard library +and introducing multiple traits for common formatting implementations may lead +to frequently mis-remembering which to implement. It is expected, however, that +this will become such a common idiom in Rust that it will become second nature. + +This RFC establishes a convention that `Show` and `String` produce valid UTF-8 +data, but no static guarantee of this requirement is provided. Statically +guaranteeing this invariant would likely involve adding some form of +`TextWriter` which we are currently not willing to stabilize for the 1.0 +release. + +The default format specifier, `{}`, will quickly become unable to print many +types in Rust. Without a `#[deriving]` implementation, manual implementations +are predicted to be fairly sparse. This means that the defacto default may +become `{:?}` for inspecting Rust types, providing pressure to re-shuffle the +specifiers. Currently it is seen as untenable, however, for the default output +format of a `String` to include escaped characters (as opposed to printing the +string). Due to the debugging nature of `Show`, it is seen as a non-starter to +make it the "default" via `{}`. 
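
To make the intended split concrete, a small sketch (the outputs shown in the
comments are illustrative of the escaping behavior described above, not a
precise specification):

```rust
fn main() {
    let s = "two\nlines";
    println!("{}", s);   // printed as-is, across two lines
    println!("{:?}", s); // debugging view: the newline is escaped, e.g. two\nlines
}
```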
+ +It may be too ambitious to define that `String` is a non-lossy representation of +a type, eventually motivating other formatting traits. + +# Alternatives + +The names `String` and `Show` may not necessarily imply "user readable" and +"debuggable". An alternative proposal would be to use `Show` for user +readability and `Inspect` for debugging. This alternative also opens up the door +for other names of the debugging trait like `Repr`. This RFC, however, has +chosen `String` for user readability to provide a clearer connection with the +`ToString` trait as well as emphasizing that the type can be faithfully +represented as a `String`. Additionally, this RFC considers the name `Show` +roughly on par with other alternatives and would help reduce churn for code +migrating today. + +# Unresolved Questions + +None at this time. diff --git a/text/0505-api-comment-conventions.md b/text/0505-api-comment-conventions.md new file mode 100644 index 00000000000..432c82fdcad --- /dev/null +++ b/text/0505-api-comment-conventions.md @@ -0,0 +1,146 @@ +- Start Date: 2014-12-08 +- RFC PR: [rust-lang/rfcs#505](https://github.com/rust-lang/rfcs/pull/505) +- Rust Issue: N/A + +# Summary + +This is a conventions RFC, providing guidance on providing API documentation +for Rust projects, including the Rust language itself. + +# Motivation + +Documentation is an extremely important part of any project. It's important +that we have consistency in our documentation. + +For the most part, the RFC proposes guidelines that are already followed today, +but it tries to motivate and clarify them. + +# Detailed design + +There are a number of individual guidelines. Most of these guidelines are for +any Rust project, but some are specific to documenting `rustc` itself and the +standard library. These are called out specifically in the text itself. + +## Use line comments + +Avoid block comments. Use line comments instead: + +```rust +// Wait for the main task to return, and set the process error code +// appropriately. +``` + +Instead of: + +```rust +/* + * Wait for the main task to return, and set the process error code + * appropriately. + */ +``` + +Only use inner doc comments `//!` to write crate and module-level documentation, +nothing else. When using `mod` blocks, prefer `///` outside of the block: + +```rust +/// This module contains tests +mod test { + // ... +} +``` + +over + +```rust +mod test { + //! This module contains tests + + // ... +} +``` + +## Formatting + +The first line in any doc comment should be a single-line short sentence +providing a summary of the code. This line is used as a summary description +throughout Rustdoc's output, so it's a good idea to keep it short. + +All doc comments, including the summary line, should be properly punctuated. +Prefer full sentences to fragments. + +The summary line should be written in third person singular present indicative +form. Basically, this means write "Returns" instead of "Return". + +## Using Markdown + +Within doc comments, use Markdown to format your documentation. + +Use top level headings # to indicate sections within your comment. Common headings: + +* Examples +* Panics +* Failure + +Even if you only include one example, use the plural form: "Examples" rather +than "Example". Future tooling is easier this way. + +Use graves (`) to denote a code fragment within a sentence. + +Use triple graves (```) to write longer examples, like this: + + This code does something cool. 

    ```rust
    let x = foo();
    x.bar();
    ```

When appropriate, make use of Rustdoc's modifiers. Annotate triple grave
blocks with the appropriate formatting directive. While they default to Rust
in Rustdoc, prefer being explicit, so that syntax is highlighted even in
places that do not default to Rust, such as GitHub.

    ```rust
    println!("Hello, world!");
    ```

    ```ruby
    puts "Hello"
    ```

Rustdoc is able to test all Rust examples embedded inside of documentation, so
it's important to mark what is not Rust so your tests don't fail.

References and citations should be linked 'reference style'. Prefer

```
[Rust website][1]

[1]: http://www.rust-lang.org
```

to

```
[Rust website](http://www.rust-lang.org)
```

## English

This section applies to `rustc` and the standard library.

All documentation is standardized on American English, with regard to
spelling, grammar, and punctuation conventions. Language changes over time,
so this doesn't mean that there is always a correct answer to every grammar
question, but there is often some kind of formal consensus.

# Drawbacks

None.

# Alternatives

Not having documentation guidelines.

# Unresolved questions

None.
diff --git a/text/0507-release-channels.md b/text/0507-release-channels.md
new file mode 100644
index 00000000000..b1a89071a6e
--- /dev/null
+++ b/text/0507-release-channels.md
@@ -0,0 +1,471 @@
- Start Date: 2014-10-27
- RFC PR: [rust-lang/rfcs#507](https://github.com/rust-lang/rfcs/pull/507)
- Rust Issue: [rust-lang/rust#20445](https://github.com/rust-lang/rust/issues/20445)

# Summary

This RFC describes changes to the Rust release process, primarily the
division of Rust's time-based releases into 'release channels',
following the 'release train' model used by e.g. Firefox and Chrome;
as well as 'feature staging', which enables the continued development
of unstable language features and library APIs while providing
strong stability guarantees in stable releases.

It also redesigns and simplifies stability attributes to better
integrate with release channels and the other stability-moderating
system in the language, 'feature gates'. While this version of
stability attributes is only suitable for use by the standard
distribution, we leave open the possibility of adding a redesigned
system for the greater cargo ecosystem to annotate feature stability.

Finally, it discusses how Cargo may leverage feature gates to
determine compatibility of Rust crates with specific revisions of the
Rust language.

# Motivation

We soon intend to [provide stable releases][1] of Rust that offer
backwards compatibility with previous stable releases. Still, we
expect to continue developing new features at a rapid pace for some
time to come. We need to be able to provide these features to users
for testing as they are developed, while also providing strong
stability guarantees to users.

[1]: http://blog.rust-lang.org/2014/10/30/Stability.html

# Detailed design

The Rust release process moves to a 'release train' model, in which
there are three 'release channels' through which the official Rust
binaries are published: 'nightly', 'beta', and 'stable', and these
release channels correspond to development branches.

'Nightly' is exactly as today, and is where most development occurs; a
separate 'beta' branch provides time for vetting a release and fixing
bugs - particularly in backwards compatibility - before it gets wide
use. Each release cycle, beta gets promoted to stable (the release),
and nightly gets promoted to beta.

This model has a few benefits:

* It provides a window for testing the next release before committing
  to it. Currently we release straight from the (very active) master
  branch, with almost no testing.

* It provides a window in which library developers can test their code
  against the next release, and - importantly - report unintended
  breakage of stable features.

* It provides a testing ground for unstable features in the
  nightly release channel, while allowing the primary releases to
  contain only features which are complete and backwards-compatible
  ('feature-staging').

This proposal describes the practical impact of the release train on
users, particularly with regard to feature staging. A more detailed
description of the impact on the development process is [available
elsewhere][3].

## Versioning and releases

The nature of development and releases differs between channels, as
each serves a specific purpose: nightly is for active development,
beta is for testing and bugfixing, and stable is for final releases.

Each pending version of Rust progresses in sequence through the
'nightly' and 'beta' channels before being promoted to the 'stable'
channel, at which time the final commit is tagged and that version is
considered 'released'.

Development cycles are reduced to six weeks from the current twelve.

Under normal circumstances, the version is only bumped on the nightly
branch, once per development cycle, with the release channel
controlling the label (`-nightly`, `-beta`) appended to the version
number. Other circumstances, such as security incidents, may require
point releases on the stable channel, the policy around which is as
yet undetermined.

Builds of the 'nightly' channel are published every night based on the
content of the master branch. Each published build during a single
development cycle carries *the same version number*,
e.g. '1.0.0-nightly', though for debugging purposes rustc builds can
be uniquely identified by reporting the commit number from which they
were built. As today, published nightly artifacts are simply referred
to as 'rust-nightly' (not named after their version number). Artifacts
produced from the nightly release channel should be considered
transient, though we will maintain historical archives for the
convenience of projects that occasionally need to pin to specific
revisions.

Builds of the 'beta' channel are published periodically as fixes are
merged, and like the 'nightly' channel each published build during a
single development cycle retains the same version number, but can be
uniquely identified by the commit number. Beta artifacts are likewise
simply named 'rust-beta'.

We will ensure that it is convenient to perform continuous integration
of Cargo packages against the beta channel on Travis CI. This will
help detect any accidental breakage early, while not interfering with
their build status.

Stable builds are versioned and named the same as today's releases,
both with just a bare version number, e.g. '1.0.0'. They are
published at the beginning of each development cycle and once
published are never refreshed or overwritten. Provisions for stable
point releases will be made at a future time.

## Exceptions for the 1.0.0 beta period

Under the release train model version numbers are incremented
automatically each release cycle on a predetermined schedule. Six
weeks after 1.0.0 is released, 1.1.0 will be released, and six weeks
after that 1.2.0, etc.

The release cycles approaching 1.0.0 will break with this pattern to
give us leeway to extend 1.0.0 betas for multiple cycles until we are
confident the intended stability guarantees are in place.

In detail, when the development cycle begins in which we are ready to
publish the 1.0.0 beta, we will *not* publish anything on the stable
channel, and the release on the beta channel will be called
1.0.0-beta1. If 1.0.0 betas extend for multiple cycles, they will be
called 1.0.0-beta2, -beta3, etc., before being promoted to the stable
channel as 1.0.0 and beginning the release train process in full.

During the beta cycles, as with the normal release cycles, primary
development will be on the nightly branch, with only bugfixes on the
beta branch.

## Feature staging

In builds of Rust distributed through the 'beta' and 'stable' release
channels, it is impossible to turn on unstable features
by writing the `#[feature(...)]` attribute. This is accomplished
primarily through a new lint called `unstable_features`.
This lint is set to `allow` by default in nightlies and `forbid` in beta
and stable releases (and, being `forbid`, it cannot be disabled).

The `unstable_features` lint simply looks for all 'feature'
attributes and emits the message 'unstable feature'.

The decision to set the feature staging lint is driven by a new field
of the compilation `Session`, `disable_staged_features`. When set to
true the lint pass will configure the feature staging lint to
'forbid', with a `LintSource` of `ReleaseChannel`. When a
`ReleaseChannel` lint is triggered, in addition to the lint's error
message, it is accompanied by the note 'this feature may not be used
in the {channel} release channel', where `{channel}` is the name of
the release channel.

In feature-staged builds of Rust, rustdoc sets
`disable_staged_features` to *`false`*. Without doing so, it would not
be possible for rustdoc to successfully run against e.g. the
accompanying std crate, as rustdoc runs the lint pass. Additionally,
in feature-staged builds, rustdoc does not generate documentation for
unstable APIs for crates (read below for the impact of feature staging
on unstable APIs).

With staged features disabled, the Rust build itself is not possible,
and some portion of the test suite will fail. To build the compiler
itself and keep the test suite working, the build system activates
a hack via environment variables to disable the feature staging lint,
a mechanism that is not available under typical use. The build
system additionally includes a way to run the test suite with the
feature staging lint enabled, providing a means of tracking what
portion of the test suite can be run without invoking unstable
features.

The prelude causes complications with this scheme because prelude
injection presently uses two feature gates: globs, to import the
prelude, and phase, to import the standard `macro_rules!` macros. In
the short term this will be worked around with hacks in the
compiler. It's likely that these hacks can be removed before 1.0 if
globs and `macro_rules!` imports become stable.
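
To illustrate what feature staging looks like in practice, a sketch of a crate
that builds on the nightly channel but is rejected on beta and stable (`globs`
is used purely as an example of a then-gated feature; the diagnostic text is
as described above):

```rust
// Accepted on nightly. On the beta and stable channels the
// `unstable_features` lint is set to `forbid`, so this attribute produces
// roughly:
//   error: unstable feature
//   note: this feature may not be used in the beta release channel
#![feature(globs)]

use std::collections::*;

fn main() {}
```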

## Merging stability attributes and feature gates

In addition to the feature gates that, in conjunction with the
aforementioned `unstable_features` lint, manage the stable evolution
of *language* features, Rust *additionally* has another independent
system for managing the evolution of *library* features, 'stability
attributes'. This system, inspired by node.js, divides APIs into a
number of stability levels: `#[experimental]`, `#[unstable]`,
`#[stable]`, `#[frozen]`, `#[locked]`, and `#[deprecated]`, along with
unmarked functions (which are in most cases considered unstable).

As a simplifying measure stability attributes are unified with feature
gates, and thus tied to release channels and Rust language versions.

* All existing stability attributes are stripped of any semantic
  meaning by the compiler. Existing code that uses these attributes
  will continue to compile, but neither rustc nor rustdoc will
  interpret them in any way.
* New `#[staged_unstable(...)]`, `#[staged_stable(...)]`,
  and `#[staged_deprecated(...)]` attributes are added.
* All three require a `feature` parameter,
  e.g. `#[staged_unstable(feature = "chicken_dinner")]`. This signals
  that the item tagged by the attribute is part of the named feature.
* The `staged_stable` and `staged_deprecated` attributes require an
  additional parameter `since`, whose value is equal to a *version of
  the language* (where currently the language version is equal to the
  compiler version), e.g. `#[staged_stable(feature = "chicken_dinner",
  since = "1.6")]`.

All stability attributes continue to support an optional `description`
parameter.

The intent of adding the 'staged_' prefix to the stability attributes
is to leave the more desirable attribute names open for future use.

With these modifications, new API surface area becomes a new "language
feature" which is controlled via the `#[feature]` attribute just like
other normal language features. The compiler will disallow all usage
of `#[staged_unstable(feature = "foo")]` APIs unless the current crate
declares `#![feature(foo)]`. This enables crates to declare what API
features of the standard library they rely on without opting in to all
unstable API features.

Examples of APIs tagged with stability attributes:

```
#[staged_unstable(feature = "a")]
fn foo() { }

#[staged_stable(feature = "b", since = "1.6")]
fn bar() { }

#[staged_stable(feature = "c", since = "1.6")]
#[staged_deprecated(feature = "c", since = "1.7")]
fn baz() { }
```

Since *all* feature additions to Rust are associated with a language
version, source code can be finely analyzed for language
compatibility. Association with distinct feature names leads to a
straightforward process for tracking the progression of new features
into the language. More detail on these matters below.

Some additional restrictions are enforced by the compiler as a sanity
check that they are being used correctly.

* The `staged_deprecated` attribute *must* be paired with a
  `staged_stable` attribute, enforcing that the progression of all
  features is from 'staged_unstable' to 'staged_stable' to
  'staged_deprecated', and that the version in which the feature was
  promoted to stable is recorded and maintained as well as the version
  in which a feature was deprecated.
* Within a crate, the compiler enforces that for all APIs with the
  same feature name where any are marked `staged_stable`, all are
  either `staged_stable` or `staged_deprecated`. In other words, no
  single feature may be partially promoted from `unstable` to
  `stable`, but features may be partially deprecated. This ensures
  that no APIs are accidentally excluded from stabilization and that
  entire features may be considered either 'unstable' or 'stable'.

It's important to note that these stability attributes are *only known
to be useful to the standard distribution*, because of the explicit
linkage to language versions and release channels. There is, though, no
mechanism to explicitly forbid their use outside of the standard
distribution. A general mechanism for indicating API stability
will be reconsidered in the future.

### API lifecycle

These attributes slightly alter the process of how new APIs are added
to the standard library. First an API will be proposed via the RFC
process, and a name for the API feature being added will be assigned
at that time. When the RFC is accepted, the API will be added to the
standard library with an `#[staged_unstable(feature = "...")]`
attribute indicating what feature the API was assigned to.

After receiving test coverage from nightly users (who have opted into
the feature) or thorough review, all APIs with a given feature will be
changed from `staged_unstable` to `staged_stable`, adding `since =
"..."` to mark the version in which the promotion occurred, and the
feature is then considered stable and may be used on the stable release
channel.

When a stable API becomes deprecated, the `staged_deprecated` attribute
is added in addition to the existing `staged_stable` attribute, as
well as recording the version in which the deprecation was performed
with the `since` parameter.

(Occasionally unstable APIs may be deprecated for the sake of easing
user transitions, in which case they receive both the `staged_stable`
and `staged_deprecated` attributes at once.)

### Checking `#[feature]`

The names of features will no longer be a hardcoded list in the compiler,
due to the free-form nature of the `#[staged_unstable]` feature names.
Instead, the compiler will perform the following steps when inspecting
`#[feature]` attribute lists:

1. The compiler will discover all `#![feature]` directives
   enabled for the crate and calculate a list of all enabled features.
2. While compiling, all unstable language features used will be
   removed from this list. If a used feature is not enabled, then an
   error is generated.
3. A new pass, the stability pass, will be extracted from the current
   stability lint pass to detect usage of all unstable APIs. If an
   unstable API is used, an error is generated if the corresponding
   feature is not enabled, and otherwise the feature is removed from
   the list.
4. If the remaining list of enabled features is not empty, then those
   features were not used when compiling the current crate. The compiler
   will generate an error in this case unconditionally.

These steps ensure that the `#[feature]` attribute is used exhaustively
and will check unstable language and library features.

## Features, Cargo and version detection

Over time, it has become clear that, with an ever-growing number of Rust
releases, crates will want to be able to indicate which versions of Rust
they can be compiled with. Some specific use cases are:

* Although upgrades are highly encouraged, not all users upgrade
  immediately. Cargo should be able to help out with the process of
  downloading a new dependency and indicating that a newer version of
  the Rust compiler is required.
* Not all users will be able to continuously upgrade. Some enterprises,
  for example, may upgrade rarely for technical reasons. For such users,
  however, a large portion of the crates.io ecosystem becomes unusable
  once accepted features begin to propagate.
* Developers may wish to prepare new releases of libraries during the
  beta channel cycle in order to have libraries ready for the next
  stable release. In this window, however, published versions will not
  be compatible with the current stable compiler (they use new
  features).

To solve this problem, Cargo and crates.io will grow knowledge of the
minimum Rust language version required to compile a crate. Currently
the Rust language version coincides with the version of the `rustc`
compiler.

In the absence of user-supplied information about minimum language
version requirements, *Cargo will attempt to use feature information
to determine version compatibility*: by knowing in which version each
feature of the language and each feature of the library was
stabilized, and by detecting every feature used by a crate, rustc can
determine the minimum version required; and rustc may assume that the
crate will be compatible with future stable releases. There are two
caveats: first, conditional compilation makes it impossible in some
cases to detect all features in use, which may result in Cargo
detecting a minimum version less than that required on all
platforms. For this and other reasons Cargo will allow the minimum
version to be specified manually. Second, rustc cannot make any
assumptions about compatibility across major revisions of the
language.

To calculate this information, Cargo will compile crates just before
publishing. In this process, the Rust compiler will record all used
language features as well as all used `#[staged_stable]` APIs. Each
compiler will contain archival knowledge of which stable version of
the language each feature was added in, and each `#[staged_stable]`
API has the `since` metadata to tell which version of the compiler it
was released in. The compiler will calculate the maximum of all these
versions (language plus library features) to pass to Cargo. If any
`#[feature]` directive is detected, however, the required Rust
language version is "nightly".

Cargo will then pass this required language version to crates.io, which
will both store it in the index as well as present it as part of the UI.
Each crate will have a "badge" indicating what version of the Rust
compiler is needed to compile it. The "badge" may indicate that the
nightly or beta channels must be used if the version required has not
yet been released (this happens when a crate is published on a
non-stable channel). If the required language version is "nightly", then
the crate will permanently indicate that it requires the "nightly"
version of the language.

When resolving dependencies, Cargo will discard all incompatible
candidates based on the version of the available compiler. This will
enable authors to publish crates which rely on the current beta channel
while not interfering with users taking advantage of the stable channel.

# Drawbacks

Adding multiple release channels and reducing the release cycle from
12 to 6 weeks both increase the amount of release engineering work
required.

The major risk in feature staging is that, at the 1.0 release, not
enough of the language is available to foster a meaningful library
ecosystem around the stable release. While we might expect many users
to continue using nightly releases with or without this change, if the
stable 1.0 release cannot be used in any practical sense it will be
problematic from a PR perspective. Implementing this RFC will require
careful attention to the libraries it affects.

Recognizing this risk, we must put in place processes to monitor the
compatibility of known Cargo crates with the stable release channel,
using evidence drawn from those crates to prioritize the stabilization
of features and libraries. [This work has already begun][1], with
popular feature gates being ungated, and library stabilization work
being prioritized based on the needs of Cargo crates.

Syntax extensions, lints, and any program using the compiler APIs
will not be compatible with the stable release channel at 1.0, since it
is not possible to stabilize `#[plugin_registrar]` in time. Plugins
are very popular. This pain will partially be alleviated by a proposed
[Cargo] feature that enables Rust code generation. `macro_rules!`
*is* expected to be stable by 1.0, though.

[Cargo]: https://github.com/rust-lang/rfcs/pull/403
[1]: http://blog.rust-lang.org/2014/10/30/Stability.html

With respect to stability attributes and Cargo, the proposed design is
very specific to the standard library and the Rust compiler, without
being intended for use by third-party libraries. It is planned to extend
Cargo's own support for features (distinct from Rust features) to enable
this form of feature development in a first-class way through Cargo.
At this time, however, there are no concrete plans for this design and
it is unlikely to happen soon.

The attribute syntax is different for declaring feature names (a string)
and for turning them on (an ident). This is done as a judgment call that
in each context the given syntax looks best, accepting that, since this
is a feature not intended for general use, the discrepancy is not a
major problem.

Having Cargo do version detection through feature analysis is known
not to be foolproof, and may present further unknown obstacles.

# Alternatives

Leave feature gates and unstable APIs exposed to the stable
channel, as precedented by Haskell, web vendor prefixes, and node.js.

Make the beta channel a compromise between the nightly and stable
channels, allowing some set of unstable features and APIs. This
would allow more projects to use a 'more stable' release, but would
make beta no longer representative of the pending stable release.

# Unresolved questions

The exact method for working around the prelude's use of feature gates
is undetermined. Fixing [#18102] will complicate the situation, as the
prelude relies on a bug in lint checking to work at all.

[#18102]: https://github.com/rust-lang/rust/issues/18102

Rustdoc disables the feature-staging lints so they don't cause it to
fail, but I don't know why rustdoc needs to be running lints. It may
be possible to just stop running lints in rustdoc.

If stability attributes are only for std, that takes away the
`#[deprecated]` attribute from Cargo libs, to which it is more clearly
applicable.

What mechanism ensures that all APIs have stability coverage? Probably
they will just default to unstable with some 'default' feature name.
+ +# See Also + +* [Stability as a deliverable][1] +* [Prior work week discussion][2] +* [Prior detailed description of process changes][3] + +[1]: http://blog.rust-lang.org/2014/10/30/Stability.html +[2]: https://github.com/rust-lang/meeting-minutes/blob/master/workweek-2014-08-18/versioning.md) +[3]: http://discuss.rust-lang.org/t/rfc-impending-changes-to-the-release-process/508 diff --git a/text/0509-collections-reform-part-2.md b/text/0509-collections-reform-part-2.md new file mode 100644 index 00000000000..eb43c9c6062 --- /dev/null +++ b/text/0509-collections-reform-part-2.md @@ -0,0 +1,362 @@ +- Start Date: 2014-12-18 +- RFC PR: https://github.com/rust-lang/rfcs/pull/509 +- Rust Issue: https://github.com/rust-lang/rust/issues/19986 + +# Summary + +This RFC shores up the finer details of collections reform. In particular, where the +[previous RFC][part1] +focused on general conventions and patterns, this RFC focuses on specific APIs. It also patches +up any errors that were found during implementation of [part 1][part1]. Some of these changes +have already been implemented, and simply need to be ratified. + +# Motivation + +Collections reform stabilizes "standard" interfaces, but there's a lot that still needs to be +hashed out. + +# Detailed design + +## The fate of entire collections: + +* Stable: Vec, RingBuf, HashMap, HashSet, BTreeMap, BTreeSet, DList, BinaryHeap +* Unstable: Bitv, BitvSet, VecMap +* Move to [collect-rs](https://github.com/Gankro/collect-rs/) for incubation: +EnumSet, bitflags!, LruCache, TreeMap, TreeSet, TrieMap, TrieSet + +The stable collections have solid implementations, well-maintained APIs, are non-trivial, +fundamental, and clearly useful. + +The unstable collections are effectively "on probation". They're ok, but they need some TLC and +further consideration before we commit to having them in the standard library *forever*. Bitv in +particular won't have *quite* the right API without IndexGet *and* IndexSet. + +The collections being moved out are in poor shape. EnumSet is weird/trivial, bitflags is awkward, +LruCache is niche. Meanwhile Tree\* and Trie\* have simply bit-rotted for too long, without anyone +clearly stepping up to maintain them. Their code is scary, and their APIs are out of date. Their +functionality can also already reasonably be obtained through either HashMap or BTreeMap. + +Of course, instead of moving them out-of-tree, they could be left `experimental`, but that would +perhaps be a fate *worse* than death, as it would mean that these collections would only be +accessible to those who opt into running the Rust nightly. This way, these collections will be +available for everyone through the cargo ecosystem. Putting them in `collect-rs` also gives them +a chance to still benefit from a network effect and active experimentation. If they thrive there, +they may still return to the standard library at a later time. + +## Add the following methods: + +* To all collections +``` +/// Moves all the elements of `other` into `Self`, leaving `other` empty. +pub fn append(&mut self, other: &mut Self) +``` + +Collections know everything about themselves, and can therefore move data more +efficiently than any more generic mechanism. Vec's can safely trust their own capacity +and length claims. DList and TreeMap can also reuse nodes, avoiding allocating. + +This is by-ref instead of by-value for a couple reasons. First, it adds symmetry (one doesn't have +to be owned). 
Second, in the case of array-based structures, it allows `other`'s capacity to be +reused. This shouldn't have much expense in the way of making `other` valid, as almost all of our +collections are basically a no-op to make an empty version of if necessary (usually it amounts to +zeroing a few words of memory). BTree is the only exception the author is aware of (root is pre- +allocated +to avoid an Option). + +* To DList, Vec, RingBuf, BitV: +``` +/// Splits the collection into two at the given index. Useful for similar reasons as `append`. +pub fn split_off(&mut self, at: uint) -> Self; +``` + +* To all other "sorted" collections +``` +/// Splits the collection into two at the given key. Returns everything after the given key, +/// including the key. +pub fn split_off>(&mut self, at: B) -> Self; +``` + +Similar reasoning to `append`, although perhaps even more needed, as there's *no* other mechanism +for moving an entire subrange of a collection efficiently like this. `into_iterator` consumes +the whole collection, and using `remove` methods will do a lot of unnecessary work. For instance, +in the case of `Vec`, using `pop` and `push` will involve many length changes, bounds checks, +unwraps, and ultimately produce a *reversed* Vec. + +* To BitvSet, VecMap: + +``` +/// Reserves capacity for an element to be inserted at `len - 1` in the given +/// collection. The collection may reserve more space to avoid frequent reallocations. +pub fn reserve_len(&mut self, len: uint) + +/// Reserves the minimum capacity for an element to be inserted at `len - 1` in the given +/// collection. +pub fn reserve_len_exact(&mut self, len: uint) +``` + +The "capacity" of these two collections isn't really strongly related to the +number of elements they hold, but rather the largest index an element is stored at. +See Errata and Alternatives for extended discussion of this design. + +* For Ringbuf: +``` +/// Gets two slices that cover the whole range of the RingBuf. +/// The second one may be empty. Otherwise, it continues *after* the first. +pub fn as_slices(&'a self) -> (&'a [T], &'a [T]) +``` + +This provides some amount of support for viewing the RingBuf like a slice. Unfortunately +the RingBuf may be wrapped, making this impossible. See Alternatives for other designs. + +There is an implementation of this at rust-lang/rust#19903. + +* For Vec: +``` +/// Resizes the `Vec` in-place so that `len()` equals to `new_len`. +/// +/// Calls either `grow()` or `truncate()` depending on whether `new_len` +/// is larger than the current value of `len()` or not. +pub fn resize(&mut self, new_len: uint, value: T) where T: Clone +``` + +This is actually easy to implement out-of-tree on top of the current Vec API, but it has +been frequently requested. + +* For Vec, RingBuf, BinaryHeap, HashMap and HashSet: +``` +/// Clears the container, returning its owned contents as an iterator, but keeps the +/// allocated memory for reuse. +pub fn drain(&mut self) -> Drain; +``` + +This provides a way to grab elements out of a collection by value, without +deallocating the storage for the collection itself. + +There is a partial implementation of this at rust-lang/rust#19946. 
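
Taken together, a sketch of how the proposed methods might be used, assuming
they land with the signatures given above (`drain` is shown without arguments,
as proposed here):

```rust
fn main() {
    let mut a: Vec<uint> = vec![1, 2, 3];
    let mut b: Vec<uint> = vec![4, 5, 6];

    // `append` moves all of `b`'s elements into `a`, leaving `b` empty but
    // still holding its allocation for reuse.
    a.append(&mut b);
    assert!(b.is_empty());

    // `split_off` moves everything from index 4 onward into `tail`.
    let tail = a.split_off(4);
    assert_eq!(a.len(), 4);
    assert_eq!(tail.len(), 2);

    // `drain` yields the remaining elements by value while keeping `a`'s
    // allocated buffer around for reuse.
    let drained: Vec<uint> = a.drain().collect();
    assert!(a.is_empty());
    assert_eq!(drained.len(), 4);
}
```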
+ +============== +## Deprecate + +* `Vec::from_fn(n, f)` use `(0..n).map(f).collect()` +* `Vec::from_elem(n, v)` use `repeat(v).take(n).collect()` +* `Vec::grow` use `extend(repeat(v).take(n))` +* `Vec::grow_fn` use `extend((0..n).map(f))` +* `dlist::ListInsertion` in favour of inherent methods on the iterator + +============== + +## Misc Stabilization: + +* Rename `BinaryHeap::top` to `BinaryHeap::peek`. `peek` is a more clear name than `top`, and is +already used elsewhere in our APIs. + +* `Bitv::get`, `Bitv::set`, where `set` panics on OOB, and `get` returns an Option. `set` may want +to wait on IndexSet being a thing (see Alternatives). + +* Rename SmallIntMap to VecMap. (already done) + +* Stabilize `front`/`back`/`front_mut`/`back_mut` for peeking on the ends of Deques + +* Explicitly specify HashMap's iterators to be non-deterministic between iterations. This would +allow e.g. `next_back` to be implemented as `next`, reducing code complexity. This can be undone +in the future backwards-compatibly, but the reverse does not hold. + +* Move `Vec` from `std::vec` to `std::collections::vec`. + +* Stabilize RingBuf::swap + +============== + +## Clarifications and Errata from Part 1 + +* Not every collection can implement every kind of iterator. This RFC simply wishes to clarify +that iterator implementation should be a "best effort" for what makes sense for the collection. + +* Bitv was marked as having *explicit* growth capacity semantics, when in fact it is implicit +growth. It has the same semantics as Vec. + +* BitvSet and VecMap are part of a surprise *fourth* capacity class, which isn't really based on +the number of elements contained, but on the maximum index stored. This RFC proposes the name of +*maximum growth*. + +* `reserve(x)` should specifically reserve space for `x + len()` elements, as opposed to e.g. `x + +capacity()` elements. + +* Capacity methods should be based on a "best effort" model: + + * `capacity()` can be regarded as a *lower bound* on the number of elements that can be + inserted before a resize occurs. It is acceptable for more elements to be insertable. A + collection may also randomly resize before capacity is met if highly degenerate behaviour + occurs. This is relevant to HashMap, which due to its use of integer multiplication cannot + precisely compute its "true" capacity. It also may wish to resize early if a long chain of + collisions occurs. Note that Vec should make *clear* guarantees about the precision of + capacity, as this is important for `unsafe` usage. + + * `reserve_exact` may be subverted by the collection's own requirements (e.g. many collections + require a capacity related to a power of two for fast modular arithmetic). The allocator may + also give the collection more space than it requests, in which case it may as well use that + space. It will still give you at least as much capacity as you request. + + * `shrink_to_fit` may not shrink to the true minimum size for similar reasons as + `reserve_exact`. + + * Neither `reserve` nor `reserve_exact` can be trusted to reliably produce a specific + capacity. At best you can guarantee that there will be space for the number you ask for. + Although even then `capacity` itself may return a smaller number due to its own fuzziness. 
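
For example, a small sketch of the `reserve` clarification above (the capacity
shown is only a lower bound, per the "best effort" model):

```rust
fn main() {
    let mut v: Vec<uint> = vec![1, 2, 3];  // len() == 3
    // `reserve(10)` reserves space for `10 + len()` elements in total...
    v.reserve(10);
    // ...but `capacity()` is only guaranteed to be at least that much.
    assert!(v.capacity() >= 13);
}
```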
+ +============== + +## Entry API V2.0 + +The old Entry API: +``` +impl Map { + fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V> +} + +pub enum Entry<'a, K: 'a, V: 'a> { + Occupied(OccupiedEntry<'a, K, V>), + Vacant(VacantEntry<'a, K, V>), +} + +impl<'a, K, V> VacantEntry<'a, K, V> { + fn set(self, value: V) -> &'a mut V +} + +impl<'a, K, V> OccupiedEntry<'a, K, V> { + fn get(&self) -> &V + fn get_mut(&mut self) -> &mut V + fn into_mut(self) -> &'a mut V + fn set(&mut self, value: V) -> V + fn take(self) -> V +} +``` + +Based on feedback and collections reform landing, this RFC proposes the following new API: + +``` +impl Map { + fn entry<'a, O: ToOwned>(&'a mut self, key: &O) -> Entry<'a, O, V> +} + +pub enum Entry<'a, O: 'a, V: 'a> { + Occupied(OccupiedEntry<'a, O, V>), + Vacant(VacantEntry<'a, O, V>), +} + +impl Entry<'a, O: 'a, V:'a> { + fn get(self) -> Result<&'a mut V, VacantEntry<'a, O, V>> +} + +impl<'a, K, V> VacantEntry<'a, K, V> { + fn insert(self, value: V) -> &'a mut V +} + +impl<'a, K, V> OccupiedEntry<'a, K, V> { + fn get(&self) -> &V + fn get_mut(&mut self) -> &mut V + fn into_mut(self) -> &'a mut V + fn insert(&mut self, value: V) -> V + fn remove(self) -> V +} +``` + +Replacing get/get_mut with Deref is simply a nice ergonomic improvement. Renaming `set` and `take` +to `insert` and `remove` brings the API more inline with other collection APIs, and makes it +more clear what they do. The convenience method on Entry itself makes it just nicer to use. +Permitting the following `map.entry(key).get().or_else(|vacant| vacant.insert(Vec::new()))`. + +This API should be stabilized for 1.0 with the exception of the impl on Entry itself. + +# Alternatives + +## Traits vs Inherent Impls on Entries +The Entry API as proposed would leave Entry and its two variants defined by each collection. We +could instead make the actual concrete VacantEntry/OccupiedEntry implementors implement a trait. +This would allow Entry to be hoisted up to root of collections, with utility functions implemented +once, as well as only requiring one import when using multiple collections. This *would* require +that the traits be imported, unless we get inherent trait implementations. + +These traits can of course be introduced later. + +============== + +## Alternatives to ToOwned on Entries +The Entry API currently is a bit wasteful in the by-value key case. If, for instance, a user of a +`HashMap` happens to have a String they don't mind losing, they can't pass the String by +-value to the Map. They must pass it by-reference, and have it get cloned. + +One solution to this is to actually have the bound be IntoCow. This will potentially have some +runtime overhead, but it should be dwarfed by the cost of an insertion anyway, and would be a +clear win in the by-value case. + +Another alternative would be an *IntoOwned* trait, which would have the signature `(self) -> +Owned`, as opposed to the current ToOwned `(&self) -> Owned`. IntoOwned more closely matches the +semantics we actually want for our entry keys, because we really don't care about preserving them +after the conversion. This would allow us to dispatch to either a no-op or a full clone as +necessary. This trait would also be appropriate for the CoW type, and in fact all of our current +uses of the type. However the relationship between FromBorrow and IntoOwned is currently awkward +to express with our type system, as it would have to be implemented e.g. for `&str` instead of +`str`. 
IntoOwned also has trouble co-existing "fully" with ToOwned due to current lack of negative +bounds in where clauses. That is, we would want a blanket impl of IntoOwned for ToOwned, but this +can't be properly expressed for coherence reasons. + +This RFC does not propose either of these designs in favour of choosing the conservative ToOwned +now, with the possibility of "upgrading" into IntoOwned, IntoCow, or something else when we have a +better view of the type-system landscape. + +============== + +## Don't stabilize `Bitv::set` + +We could wait for IndexSet, Or make `set` return a result. +`set` really is redundant with an IndexSet implementation, and we +don't like to provide redundant APIs. On the other hand, it's kind of weird to have only `get`. + +============== + +## `reserve_index` vs `reserve_len` + +`reserve_len` is primarily motivated by BitvSet and VecMap, whose capacity semantics are largely +based around the largest index they have set, and not the number of elements they contain. This +design was chosen for its equivalence to `with_capacity`, as well as possible +future-proofing for adding it to other collections like `Vec` or `RingBuf`. + +However one could instead opt for `reserve_index`, which are effectively the same method, +but with an off-by-one. That is, `reserve_len(x) == reserve_index(x - 1)`. This more closely +matches the intent (let me have index `7`), but has tricky off-by-one with `capacity`. + +Alternatively `reserve_len` could just be called `reserve_capacity`. + +============== + +## RingBuf `as_slice` + +Other designs for this usecase were considered: + +``` +/// Attempts to get a slice over all the elements in the RingBuf, but may instead +/// have to return two slices, in the case that the elements aren't contiguous. +pub fn as_slice(&'a self) -> RingBufSlice<'a, T> + +enum RingBufSlice<'a, T> { + Contiguous(&'a [T]), + Split((&'a [T], &'a [T])), +} +``` + +``` +/// Gets a slice over all the elements in the RingBuf. This may require shifting +/// all the elements to make this possible. +pub fn to_slice(&mut self) -> &[T] +``` + +The one settled on had the benefit of being the simplest. In particular, having the enum wasn't +very helpful, because most code would just create an empty slice anyway in the contiguous case +to avoid code-duplication. + +# Unresolved questions + +`reserve_index` vs `reserve_len` and `Ringbuf::as_slice` are the two major ones. + +[part1]: https://github.com/rust-lang/rfcs/blob/master/text/0235-collections-conventions.md diff --git a/text/0517-io-os-reform.md b/text/0517-io-os-reform.md new file mode 100644 index 00000000000..a7f21e2440b --- /dev/null +++ b/text/0517-io-os-reform.md @@ -0,0 +1,1914 @@ +- Start Date: 2014-12-07 +- RFC PR: [rust-lang/rfcs#517](https://github.com/rust-lang/rfcs/pull/517) +- Rust Issue: [rust-lang/rust#21070](https://github.com/rust-lang/rust/issues/21070) + +# Summary +[Summary]: #summary + +This RFC proposes a significant redesign of the `std::io` and `std::os` modules +in preparation for API stabilization. The specific problems addressed by the +redesign are given in the [Problems] section below, and the key ideas of the +design are given in [Vision for IO]. + +# Note about RFC structure + +This RFC was originally posted as a single monolithic file, which made +it difficult to discuss different parts separately. + +It has now been split into a skeleton that covers (1) the problem +statement, (2) the overall vision and organization, and (3) the +`std::os` module. 

Other parts of the RFC are marked with `(stub)` and will be filed as
follow-up PRs against this RFC.

# Table of contents
[Table of contents]: #table-of-contents
* [Summary]
* [Table of contents]
* [Problems]
    * [Atomicity and the `Reader`/`Writer` traits]
    * [Timeouts]
    * [Posix and libuv bias]
    * [Unicode]
    * [stdio]
    * [Overly high-level abstractions]
    * [The error chaining pattern]
* [Detailed design]
    * [Vision for IO]
        * [Goals]
        * [Design principles]
            * [What cross-platform means]
            * [Relation to the system-level APIs]
            * [Platform-specific opt-in]
        * [Proposed organization]
    * [Revising `Reader` and `Writer`]
        * [Read]
        * [Write]
    * [String handling]
        * [Key observations]
        * [The design: `os_str`]
        * [The future]
    * [Deadlines] (stub)
    * [Splitting streams and cancellation] (stub)
    * [Modules]
        * [core::io]
            * [Adapters]
            * [Free functions]
            * [Seeking]
            * [Buffering]
            * [Cursor]
        * [The std::io facade]
            * [Errors]
            * [Channel adapters]
            * [stdin, stdout, stderr]
            * [Printing functions]
        * [std::env]
        * [std::fs]
            * [Free functions]
            * [Files]
            * [File kinds]
            * [File permissions]
        * [std::net]
            * [TCP]
            * [UDP]
            * [Addresses]
        * [std::process]
            * [Command]
            * [Child]
        * [std::os]
    * [Odds and ends]
        * [The io prelude]
* [Drawbacks]
* [Alternatives]
* [Unresolved questions]

# Problems
[Problems]: #problems

The `io` and `os` modules are the last large API surfaces of `std` that need to
be stabilized. While the basic functionality offered in these modules is
*largely* traditional, many problems with the APIs have emerged over time. The
RFC discusses the most significant problems below.

This section only covers specific problems with the current library; see
[Vision for IO] for a higher-level view.

## Atomicity and the `Reader`/`Writer` traits
[Atomicity and the `Reader`/`Writer` traits]: #atomicity-and-the-readerwriter-traits

One of the most pressing -- but also most subtle -- problems with `std::io` is
the lack of *atomicity* in its `Reader` and `Writer` traits.

For example, the `Reader` trait offers a `read_to_end` method:

```rust
fn read_to_end(&mut self) -> IoResult<Vec<u8>>
```

Executing this method may involve many calls to the underlying `read`
method. And it is possible that the first several calls succeed, and then a call
returns an `Err` -- which, like `TimedOut`, could represent a transient
problem. Unfortunately, given the above signature, there is no choice but to
simply _throw this data away_.

The `Writer` trait suffers from a more fundamental problem, since its primary
method, `write`, may actually involve several calls to the underlying system --
and if a failure occurs, there is no indication of how much was written.

Existing blocking APIs all have to deal with this problem, and Rust
can and should follow the existing tradition here. See
[Revising `Reader` and `Writer`] for the proposed solution.

## Timeouts
[Timeouts]: #timeouts

The `std::io` module supports "timeouts" on virtually all IO objects via a
`set_timeout` method. In this design, every IO object (file, socket, etc.) has
an optional timeout associated with it, and `set_timeout` mutates the
associated timeout. All subsequent blocking operations are implicitly subject
to this timeout.
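
For illustration, a hedged sketch of how the current stateful pattern tends to
be used (method names and types approximate the old `std::io` API):

```rust
use std::io::IoResult;
use std::io::net::tcp::TcpStream;

// A helper that quietly changes the stream's timeout as a side effect; any
// caller that had configured its own timeout is now affected without knowing.
fn read_header(stream: &mut TcpStream) -> IoResult<Vec<u8>> {
    stream.set_timeout(Some(50));
    stream.read_exact(16)
}
```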

This API choice suffers from two problems, one cosmetic and the other deeper:

* The "timeout" is
  [actually a *deadline*](https://github.com/rust-lang/rust/issues/15802) and
  should be named accordingly.

* The stateful API has poor composability: when passing a mutable reference to
  an IO object to another function, it's possible that the deadline has been
  changed. In other words, users of the API can easily interfere with each other
  by accident.

See [Deadlines] for the proposed solution.

## Posix and libuv bias
[Posix and libuv bias]: #posix-and-libuv-bias

The current `io` and `os` modules were originally designed when `librustuv` was
providing IO support, and to some extent they reflect the capabilities and
conventions of `libuv` -- which in turn are loosely based on Posix.

As such, the modules are not always ideal from a cross-platform standpoint, both
in terms of forcing Windows programming into a Posix mold, and in terms of
offering APIs that are not actually usable on all platforms.

The modules have historically also provided *no* platform-specific APIs.

Part of the goal of this RFC is to set out a clear and extensible story for both
cross-platform and platform-specific APIs in `std`. See [Design principles] for
the details.

## Unicode
[Unicode]: #unicode

Rust has followed the [utf8 everywhere](http://utf8everywhere.org/) approach to
its strings. However, at the borders to platform APIs, it is revealed that the
world is not, in fact, UTF-8 (or even Unicode) everywhere.

Currently our story for platform APIs is that we either assume they can take or
return Unicode strings (suitably encoded) or an uninterpreted byte
sequence. Sadly, this approach does *not* actually cover all platform needs, and
is also not highly ergonomic as presently implemented. (Consider `os::getenv`,
which introduces replacement characters (!), versus `os::getenv_as_bytes`,
which yields a `Vec<u8>`; neither is ideal.)

This topic was covered in some detail in the
[Path Reform RFC](https://github.com/rust-lang/rfcs/pull/474), but this RFC
gives a more general account in [String handling].

## `stdio`
[stdio]: #stdio

The `stdio` module provides access to readers/writers for `stdin`, `stdout` and
`stderr`, which is essential functionality. However, it *also* provides a means
of changing e.g. "stdout" -- but there is no connection between these two! In
particular, `set_stdout` affects only the writer that `println!` and friends
use, while `set_stderr` affects `panic!`.

This module needs to be clarified. See [The std::io facade] and
[Functionality moved elsewhere] for the detailed design.

## Overly high-level abstractions
[Overly high-level abstractions]: #overly-high-level-abstractions

There are a few places where `io` provides high-level abstractions over system
services without also providing more direct access to the service as-is. For
example:

* The `Writer` trait's `write` method -- a cornerstone of IO -- actually
  corresponds to an unbounded number of invocations of writes to the underlying
  IO object. This RFC changes `write` to follow more standard, lower-level
  practice; see [Revising `Reader` and `Writer`].

* Objects like `TcpStream` are `Clone`, which involves a fair amount of
  supporting infrastructure. This RFC tackles the problems that `Clone` was
  trying to solve more directly; see [Splitting streams and cancellation].

The motivation for going lower-level is described in [Design principles] below.

## The error chaining pattern
[The error chaining pattern]: #the-error-chaining-pattern

The `std::io` module is somewhat unusual in that most of the functionality it
provides is used through a few key traits (like `Reader`) and these traits are
in turn "lifted" over `IoResult`:

```rust
impl<R: Reader> Reader for IoResult<R> { ... }
```

This lifting and others make it possible to chain IO operations that might
produce errors, without any explicit mention of error handling:

```rust
File::open(some_path).read_to_end()
                      ^~~~~~~~~~~ can produce an error
      ^~~~ can produce an error
```

The result of such a chain is either `Ok` of the outcome, or `Err` of the first
error.

While this pattern is highly ergonomic, it does not fit particularly well into
our evolving error story
([interoperation](https://github.com/rust-lang/rfcs/pull/201) or
[try blocks](https://github.com/rust-lang/rfcs/pull/243)), and it is the only
module in `std` to follow this pattern.

Eventually, we would like to write

```rust
File::open(some_path)?.read_to_end()
```

to take advantage of the `FromError` infrastructure, hook into error handling
control flow, and provide good chaining ergonomics throughout *all* Rust APIs
-- all while keeping this handling a bit more explicit via the `?`
operator. (See https://github.com/rust-lang/rfcs/pull/243 for the rough
direction.)

In the meantime, this RFC proposes to phase out the use of impls for
`IoResult`. This will require use of `try!` for the time being.

(Note: this may put some additional pressure on at least landing the basic use
of `?` instead of today's `try!` before 1.0 final.)

# Detailed design
[Detailed design]: #detailed-design

There's a lot of material here, so the RFC starts with high-level goals,
principles, and organization, and then works its way through the various
modules involved.

## Vision for IO
[Vision for IO]: #vision-for-io

Rust's IO story has undergone significant evolution, starting from a
`libuv`-style pure green-threaded model to a dual green/native model and now to
a [pure native model](https://github.com/rust-lang/rfcs/pull/230). Given that
history, it's worthwhile to set out explicitly what is, and is not, in scope
for `std::io`.

### Goals
[Goals]: #goals

For Rust 1.0, the aim is to:

* Provide a *blocking* API based directly on the services provided by the native
  OS for native threads.

  These APIs should cover the basics (files, basic networking, basic process
  management, etc.) and suffice to write servers following the classic Apache
  thread-per-connection model. They should impose essentially zero cost over the
  underlying OS services; the core APIs should map down to a single syscall
  unless more are needed for cross-platform compatibility.

* Provide basic blocking abstractions and building blocks (various stream and
  buffer types and adapters) based on traditional blocking IO models but adapted
  to fit well within Rust.

* Provide hooks for integrating with low-level and/or platform-specific APIs.

* Ensure reasonable forwards-compatibility with future async IO models.

It is explicitly *not* a goal at this time to support asynchronous programming
models or nonblocking IO, nor is it a goal for the blocking APIs to eventually
be used in a nonblocking "mode" or style.

Rather, the hope is that the basic abstractions of files, paths, sockets, and
so on will eventually be usable directly within an async IO programming model
and/or with nonblocking APIs. This is the case for most existing languages,
which offer multiple interoperating IO models.

The *long term* intent is certainly to support async IO in some form,
but doing so will require new research and experimentation.

### Design principles
[Design principles]: #design-principles

Now that the scope has been clarified, it's important to lay out some broad
principles for the `io` and `os` modules. Many of these principles are already
being followed to some extent, but this RFC makes them more explicit and applies
them more uniformly.

#### What cross-platform means
[What cross-platform means]: #what-cross-platform-means

Historically, Rust's `std` has always been "cross-platform", but as discussed in
[Posix and libuv bias] this hasn't always played out perfectly. The proposed
policy is below. **With these policies, the APIs should largely feel like part
of "Rust" rather than part of any legacy, and they should enable truly portable
code**.

Except for an explicit opt-in (see [Platform-specific opt-in] below), all APIs
in `std` should be cross-platform:

* The APIs should **only expose a service or a configuration if it is supported on
  all platforms**, and if the semantics on those platforms is or can be made
  loosely equivalent. (The latter requires exercising some
  judgment). Platform-specific functionality can be handled separately
  ([Platform-specific opt-in]) and interoperate with normal `std` abstractions.

  This policy rules out functions like `chown` which have a clear meaning on
  Unix and no clear interpretation on Windows; the ownership and permissions
  models are *very* different.

* The APIs should **follow Rust's conventions**, including their naming, which
  should be platform-neutral.

  This policy rules out names like `fstat` that are the legacy of a particular
  platform family.

* The APIs should **never directly expose the representation** of underlying
  platform types, even if they happen to coincide on the currently-supported
  platforms. Cross-platform types in `std` should be newtyped.

  This policy rules out exposing e.g. error numbers directly as an integer type.

The next subsection gives detail on what these APIs should look like in relation
to system services.

#### Relation to the system-level APIs
[Relation to the system-level APIs]: #relation-to-the-system-level-apis

How should Rust APIs map into system services? This question breaks down along
several axes which are in tension with one another:

* **Guarantees**. The APIs provided in the mainline `io` modules should be
  predominantly safe, aside from the occasional `unsafe` function. In
  particular, the representation should be sufficiently hidden that most use
  cases are safe by construction. Beyond memory safety, though, the APIs should
  strive to provide a clear multithreaded semantics (using the `Send`/`Sync`
  kinds), and should use Rust's type system to rule out various kinds of bugs
  when it is reasonably ergonomic to do so (following the usual Rust
  conventions).

* **Ergonomics**. The APIs should present a Rust view of things, making use of
  the trait system, newtypes, and so on to make system services fit well with
  the rest of Rust.

* **Abstraction/cost**. On the other hand, the abstractions introduced in `std`
  must not induce significant costs over the system services -- or at least,
  there must be a way to safely access the services directly without incurring
  this penalty. When useful abstractions would impose an extra cost, they must
  be pay-as-you-go.

Putting the above bullets together, **the abstractions must be safe, and they
should be as high-level as possible without imposing a tax**.

* **Coverage**. Finally, the `std` APIs should over time strive for full
  coverage of non-niche, cross-platform capabilities.

#### Platform-specific opt-in
[Platform-specific opt-in]: #platform-specific-opt-in

Rust is a systems language, and as such it should expose seamless, no/low-cost
access to system services. In many cases, however, this cannot be done in a
cross-platform way, either because a given service is only available on some
platforms, or because providing a cross-platform abstraction over it would be
costly.

This RFC proposes *platform-specific opt-in*: submodules of `os` that are named
by platform, and made available via `#[cfg]` switches. For example, `os::unix`
can provide APIs only available on Unix systems, and `os::linux` can drill
further down into Linux-only APIs. (You could even imagine subdividing by OS
versions.) This is "opt-in" in the sense that, like the `unsafe` keyword, it is
very easy to audit for potential platform-specificity: just search for
`os::anyplatform`. Moreover, by separating out subsets like `linux`, it's clear
exactly how specific the platform dependency is.

The APIs in these submodules are intended to have the same flavor as other `io`
APIs and should interoperate seamlessly with cross-platform types, but:

* They should be named according to the underlying system services when there is
  a close correspondence.

* They may reveal the underlying OS type if there is nothing to be gained by
  hiding it behind an abstraction.

For example, the `os::unix` module could provide a `stat` function that takes a
standard `Path` and yields a custom struct. More interestingly, `os::linux`
might include an `epoll` function that could operate *directly* on many `io`
types (e.g. various socket types), without any explicit conversion to a file
descriptor; that's what "seamless" means.

Each of the platform modules will offer a custom `prelude` submodule,
intended for glob import, that includes all of the extension traits
applied to standard IO objects.

The precise design of these modules is in the very early stages and will likely
remain `#[unstable]` for some time.

### Proposed organization
[Proposed organization]: #proposed-organization

The `io` module is currently the biggest in `std`, with an entire hierarchy
nested underneath; it mixes general abstractions/tools with specific IO objects.
The `os` module is currently a bit of a dumping ground for facilities that don't
fit into the `io` category.

This RFC proposes to revamp the organization by flattening out the hierarchy
and clarifying the role of each module:

```
std
  env           environment manipulation
  fs            file system
  io            core io abstractions/adapters
    prelude     the io prelude
  net           networking
  os
    unix        platform-specific APIs
    linux         ..
    windows       ..
  os_str        platform-sensitive string handling
  process       process management
```

In particular:

* The contents of `os` will largely move to `env`, a new module for
  inspecting and updating the "environment" (including environment variables,
  CPU counts, arguments to `main`, and so on).

* The `io` module will include things like `Reader` and `BufferedWriter` --
  cross-cutting abstractions that are needed throughout IO.
+
+  The `prelude` submodule will export all of the traits and most of the types
+  for IO-related APIs; a single glob import should suffice to set you up for
+  working with IO. (Note: this goes hand-in-hand with *removing* the bits of
+  `io` currently in the prelude, as
+  [recently proposed](https://github.com/rust-lang/rfcs/pull/503).)
+
+* The root `os` module is used purely to house the platform submodules discussed
+  [above](#platform-specific-opt-in).
+
+* The `os_str` module is part of the solution to the Unicode problem; see
+  [String handling] below.
+
+* The `process` module over time will grow to include querying/manipulating
+  already-running processes, not just spawning them.
+
+## Revising `Reader` and `Writer`
+[Revising `Reader` and `Writer`]: #revising-reader-and-writer
+
+The `Reader` and `Writer` traits are the backbone of IO, representing
+the ability to (respectively) pull bytes from and push bytes to an IO
+object. The core operations provided by these traits follow a very
+long tradition for blocking IO, but they are still surprisingly subtle
+-- and they need to be revised.
+
+* **Atomicity and data loss**. As discussed
+  [above](#atomicity-and-the-reader-writer-traits), the `Reader` and
+  `Writer` traits currently expose methods that involve multiple
+  actual reads or writes, and data is lost when an error occurs after
+  some (but not all) operations have completed.
+
+  The proposed strategy for `Reader` operations is to (1) separate out
+  various deserialization methods into a distinct framework, (2)
+  *never* have the internal `read` implementations loop on errors, (3)
+  cut down on the number of non-atomic read operations and (4) adjust
+  the remaining operations to provide more flexibility when possible.
+
+  For writers, the main
+  change is to make `write` only perform a single underlying write
+  (returning the number of bytes written on success), and provide a
+  separate `write_all` method.
+
+* **Parsing/serialization**. The `Reader` and `Writer` traits
+  currently provide a large number of default methods for
+  (de)serialization of various integer types to bytes with a given
+  endianness. Unfortunately, these operations pose atomicity problems
+  as well (e.g., a read could fail after reading two of the bytes
+  needed for a `u32` value).
+
+  Rather than complicate the signatures of these methods, the
+  (de)serialization infrastructure is removed entirely -- in favor of
+  instead eventually introducing a much richer
+  parsing/formatting/(de)serialization framework that works seamlessly
+  with `Reader` and `Writer`.
+
+  Such a framework is out of scope for this RFC, but the
+  endian-sensitive functionality will be provided elsewhere
+  (likely out of tree).
+
+With those general points out of the way, let's look at the details.
+
+### `Read`
+[Read]: #read
+
+The updated `Reader` trait (and its extension) is as follows:
+
+```rust
+trait Read {
+    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Error>;
+
+    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> Result<(), Error> { ... }
+    fn read_to_string(&mut self, buf: &mut String) -> Result<(), Error> { ... }
+}
+
+// extension trait needed for object safety
+trait ReadExt: Read {
+    fn bytes(&mut self) -> Bytes { ... }
+
+    ... // more to come later in the RFC
+}
+impl<R: Read> ReadExt for R {}
+```
+
+Following the
+[trait naming conventions](https://github.com/rust-lang/rfcs/pull/344),
+the trait is renamed to `Read` reflecting the clear primary method it
+provides.
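+
+As a hedged illustration of the single-method core (using the signatures
+sketched above, which are not final), a reader over an in-memory slice could be
+as simple as:
+
+```rust
+struct SliceReader<'a> {
+    data: &'a [u8],
+}
+
+impl<'a> Read for SliceReader<'a> {
+    // A single, non-looping read: copy at most `buf.len()` bytes and report how
+    // many were copied; returns `Ok(0)` once the data is exhausted.
+    fn read(&mut self, buf: &mut [u8]) -> Result<usize, Error> {
+        let n = std::cmp::min(buf.len(), self.data.len());
+        for i in 0..n {
+            buf[i] = self.data[i];
+        }
+        self.data = &self.data[n..];
+        Ok(n)
+    }
+}
+```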
+
+The `read` method should not involve internal looping (even over
+errors like `EINTR`). It is intended to faithfully represent a single
+call to an underlying system API.
+
+The `read_to_end` and `read_to_string` methods now take explicit
+buffers as input. This has multiple benefits:
+
+* Performance. When it is known that reading will involve some large
+  number of bytes, the buffer can be preallocated in advance.
+
+* "Atomicity" concerns. For `read_to_end`, it's possible to use this
+  API to retain data collected so far even when a `read` fails in the
+  middle. For `read_to_string`, this is not the case, because UTF-8
+  validity cannot be ensured in such cases; but if intermediate
+  results are wanted, one can use `read_to_end` and convert to a
+  `String` only at the end.
+
+Convenience methods like these will retry on `EINTR`. This is partly
+under the assumption that in practice, EINTR will *most often* arise
+when interfacing with other code that changes a signal handler. Due to
+the global nature of these interactions, such a change can suddenly
+cause your own code to get an error irrelevant to it, and the code
+should probably just retry in those cases. In the case where you are
+using EINTR explicitly, `read` and `write` will be available to handle
+it (and you can always build your own abstractions on top).
+
+#### Removed methods
+
+The proposed `Read` trait is much slimmer than today's `Reader`. The vast
+majority of removed methods are parsing/deserialization, which were
+discussed above.
+
+The remaining methods (`read_exact`, `read_at_least`, `push`,
+`push_at_least`) were removed for various reasons:
+
+* `read_exact`, `read_at_least`: these are somewhat more obscure
+  conveniences that are not particularly robust due to lack of
+  atomicity.
+
+* `push`, `push_at_least`: these are special-cases for working with
+  `Vec<u8>`, which this RFC proposes to replace with a more general
+  mechanism described next.
+
+To provide some of this functionality in a more compositional way,
+extend `Vec<T>` with an unsafe method:
+
+```rust
+unsafe fn with_extra(&mut self, n: uint) -> &mut [T];
+```
+
+This method is equivalent to calling `reserve(n)` and then providing a
+slice to the memory starting just after `len()` entries. Using this
+method, clients of `Read` can easily recover the `push` method.
+
+### `Write`
+[Write]: #write
+
+The `Writer` trait is cut down to even smaller size:
+
+```rust
+trait Write {
+    fn write(&mut self, buf: &[u8]) -> Result<usize, Error>;
+    fn flush(&mut self) -> Result<(), Error>;
+
+    fn write_all(&mut self, buf: &[u8]) -> Result<(), Error> { .. }
+    fn write_fmt(&mut self, fmt: &fmt::Arguments) -> Result<(), Error> { .. }
+}
+```
+
+The biggest change here is to the semantics of `write`. Instead of
+repeatedly writing to the underlying IO object until all of `buf` is
+written, it attempts a *single* write and on success returns the
+number of bytes written. This follows the long tradition of blocking
+IO, and is a more fundamental building block than the looping write we
+currently have. Like `read`, it will propagate EINTR.
+
+For convenience, `write_all` recovers the behavior of today's `write`,
+looping until either the entire buffer is written or an error
+occurs. To meaningfully recover from an intermediate error and keep
+writing, code should work with `write` directly. Like the `Read`
+conveniences, `EINTR` results in a retry.
+
+The `write_fmt` method, like `write_all`, will loop until its entire
+input is written or an error occurs.
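+
+As a rough sketch (assuming the `Write` trait above; error-handling details are
+illustrative only), the provided `write_all` can be built from the single-shot
+`write` along these lines:
+
+```rust
+fn write_all<W: Write>(w: &mut W, mut buf: &[u8]) -> Result<(), Error> {
+    while !buf.is_empty() {
+        // One underlying write; it may accept only a prefix of `buf`. A real
+        // implementation would also retry on EINTR and treat a zero-byte write
+        // as an error rather than looping forever.
+        let n = try!(w.write(buf));
+        buf = &buf[n..];
+    }
+    Ok(())
+}
+```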
+
+The other methods include endian conversions (covered by
+serialization) and a few conveniences like `write_str` for other basic
+types. The latter, at least, is already uniformly (and extensibly)
+covered via the `write!` macro. The other helpers, as with `Read`,
+should migrate into a more general (de)serialization library.
+
+## String handling
+[String handling]: #string-handling
+
+The fundamental problem with Rust's full embrace of UTF-8 strings is that not
+all strings taken or returned by system APIs are Unicode, let alone UTF-8
+encoded.
+
+In the past, `std` has assumed that all strings are *either* in some form of
+Unicode (Windows), *or* are simply `u8` sequences (Unix). Unfortunately, this is
+wrong, and the situation is more subtle:
+
+* Unix platforms do indeed work with arbitrary `u8` sequences (without interior
+  nulls) and today's platforms usually interpret them as UTF-8 when displayed.
+
+* Windows, however, works with *arbitrary `u16` sequences* that are roughly
+  interpreted as UTF-16, but may not actually be valid UTF-16 -- an "encoding"
+  often called UCS-2; see http://justsolve.archiveteam.org/wiki/UCS-2 for a bit
+  more detail.
+
+What this means is that all of Rust's platforms go beyond Unicode, but they do
+so in different and incompatible ways.
+
+The current solution of providing both `str` and `[u8]` versions of
+APIs is therefore problematic for multiple reasons. For one, **the
+`[u8]` versions are not actually cross-platform** -- even today, they
+panic on Windows when given non-UTF-8 data, a platform-specific
+behavior. But they are also incomplete, because on Windows you should
+be able to work directly with UCS-2 data.
+
+### Key observations
+[Key observations]: #key-observations
+
+Fortunately, there is a solution that fits well with Rust's UTF-8 strings *and*
+offers the possibility of platform-specific APIs.
+
+**Observation 1**: it is possible to re-encode UCS-2 data in a way that is also
+  compatible with UTF-8. This is the
+  [WTF-8 encoding format](http://simonsapin.github.io/wtf-8/) proposed by Simon
+  Sapin. This encoding has some remarkable properties:
+
+* Valid UTF-8 data is valid WTF-8 data. When decoded to UCS-2, the result is
+  exactly what would be produced by going straight from UTF-8 to UTF-16. In
+  other words, making up some methods:
+
+  ```rust
+  my_utf8_data.to_wtf8().to_ucs2().as_u16_slice() == my_utf8_data.to_utf16().as_u16_slice()
+  ```
+
+* Valid UTF-16 data re-encoded as WTF-8 produces the corresponding UTF-8 data:
+
+  ```rust
+  my_utf16_data.to_wtf8().as_bytes() == my_utf16_data.to_utf8().as_bytes()
+  ```
+
+These two properties mean that, when working with Unicode data, the WTF-8
+encoding is highly compatible with both UTF-8 *and* UTF-16. In particular, the
+conversion from a Rust string to a WTF-8 string is a no-op, and the conversion
+in the other direction is just a validation.
+
+**Observation 2**: all platforms can *consume* Unicode data (suitably
+  re-encoded), and it's also possible to validate the data they produce as
+  Unicode and extract it.
+
+**Observation 3**: the non-Unicode spaces on various platforms are deeply
+  incompatible: there is no standard way to port non-Unicode data from one to
+  another. Therefore, the only cross-platform APIs are those that work entirely
+  with Unicode.
+
+### The design: `os_str`
+[The design: `os_str`]: #the-design-os_str
+
+The observations above lead to a somewhat radical new treatment of strings,
+first proposed in the
+[Path Reform RFC](https://github.com/rust-lang/rfcs/pull/474). This RFC proposes
+to introduce new string and string slice types that (opaquely) represent
+*platform-sensitive strings*, housed in the `std::os_str` module.
+
+The `OsString` type is analogous to `String`, and `OsStr` is analogous to `str`.
+Their backing implementation is platform-dependent, but they offer a
+cross-platform API:
+
+```rust
+pub mod os_str {
+    /// Owned OS strings
+    struct OsString {
+        inner: imp::Buf
+    }
+    /// Slices into OS strings
+    struct OsStr {
+        inner: imp::Slice
+    }
+
+    // Platform-specific implementation details:
+    #[cfg(unix)]
+    mod imp {
+        type Buf = Vec<u8>;
+        type Slice = [u8];
+        ...
+    }
+
+    #[cfg(windows)]
+    mod imp {
+        type Buf = Wtf8Buf; // See https://github.com/SimonSapin/rust-wtf8
+        type Slice = Wtf8;
+        ...
+    }
+
+    impl OsString {
+        pub fn from_string(String) -> OsString;
+        pub fn from_str(&str) -> OsString;
+        pub fn as_slice(&self) -> &OsStr;
+        pub fn into_string(Self) -> Result<String, OsString>;
+        pub fn into_string_lossy(Self) -> String;
+
+        // and ultimately other functionality typically found on vectors,
+        // but CRUCIALLY NOT as_bytes
+    }
+
+    impl Deref for OsString { ... }
+
+    impl OsStr {
+        pub fn from_str(value: &str) -> &OsStr;
+        pub fn as_str(&self) -> Option<&str>;
+        pub fn to_string_lossy(&self) -> CowString;
+
+        // and ultimately other functionality typically found on slices,
+        // but CRUCIALLY NOT as_bytes
+    }
+
+    trait IntoOsString {
+        fn into_os_str_buf(self) -> OsString;
+    }
+
+    impl IntoOsString for OsString { ... }
+    impl<'a> IntoOsString for &'a OsStr { ... }
+
+    ...
+}
+```
+
+These APIs make OS strings appear roughly as opaque vectors (you
+cannot see the byte representation directly), and can always be
+produced starting from Unicode data. They make it possible to collapse
+functions like `getenv` and `getenv_as_bytes` into a single function
+that produces an OS string, allowing the client to decide how (or
+whether) to extract Unicode data. It will be possible to do things
+like concatenate OS strings without ever going through Unicode.
+
+It will also likely be possible to do things like search for Unicode
+substrings. The exact details of the API are left open and are likely
+to grow over time.
+
+In addition to APIs like the above, there will also be
+platform-specific ways of viewing or constructing OS strings that
+reveal more about the space of possible values:
+
+```rust
+pub mod os {
+    #[cfg(unix)]
+    pub mod unix {
+        trait OsStringExt {
+            fn from_vec(Vec<u8>) -> Self;
+            fn into_vec(Self) -> Vec<u8>;
+        }
+
+        impl OsStringExt for os_str::OsString { ... }
+
+        trait OsStrExt {
+            fn as_byte_slice(&self) -> &[u8];
+            fn from_byte_slice(&[u8]) -> &Self;
+        }
+
+        impl OsStrExt for os_str::OsStr { ... }
+
+        ...
+    }
+
+    #[cfg(windows)]
+    pub mod windows {
+        // The following extension traits provide a UCS-2 view of OS strings
+
+        trait OsStringExt {
+            fn from_wide_slice(&[u16]) -> Self;
+        }
+
+        impl OsStringExt for os_str::OsString { ... }
+
+        trait OsStrExt {
+            fn to_wide_vec(&self) -> Vec<u16>;
+        }
+
+        impl OsStrExt for os_str::OsStr { ... }
+
+        ...
+    }
+
+    ...
+}
+```
+
+By placing these APIs under `os`, using them requires a clear *opt in*
+to platform-specific functionality.
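+
+For instance, combining the proposed `OsString` with the `env::var` signature
+described later in this RFC, a caller that only cares about Unicode data might
+look like the following sketch (names and signatures are still provisional):
+
+```rust
+fn shell() -> Option<String> {
+    // `env::var` (proposed) yields an `Option<OsString>`; the caller decides
+    // whether to insist on Unicode.
+    let raw = match env::var("SHELL") {
+        Some(v) => v,
+        None => return None,
+    };
+    // `into_string` succeeds only for valid Unicode, returning the original
+    // `OsString` otherwise.
+    raw.into_string().ok()
+}
+```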
+ +### The future +[The future]: #the-future + +Introducing an additional string type is a bit daunting, since many +existing APIs take and consume only standard Rust strings. Today's +solution demands that strings coming from the OS be assumed or turned +into Unicode, and the proposed API continues to allow that (with more +explicit and finer-grained control). + +In the long run, however, robust applications are likely to work +opaquely with OS strings far beyond the boundary to the system to +avoid data loss and ensure maximal compatibility. If this situation +becomes common, it should be possible to introduce an abstraction over +various string types and generalize most functions that work with +`String`/`str` to instead work generically. This RFC does *not* +propose taking any such steps now -- but it's important that we *can* +do so later if Rust's standard strings turn out to not be sufficient +and OS strings become commonplace. + +## Deadlines +[Deadlines]: #deadlines + +> To be added in a follow-up PR. + +## Splitting streams and cancellation +[Splitting streams and cancellation]: #splitting-streams-and-cancellation + +> To be added in a follow-up PR. + +## Modules +[Modules]: #modules + +Now that we've covered the core principles and techniques used +throughout IO, we can go on to explore the modules in detail. + +### `core::io` +[core::io]: #coreio + +Ideally, the `io` module will be split into the parts that can live in +`libcore` (most of it) and the parts that are added in the `std::io` +facade. This part of the organization is non-normative, since it +requires changes to today's `IoError` (which currently references +`String`); if these changes cannot be performed, everything here will +live in `std::io`. + +#### Adapters +[Adapters]: #adapters + +The current `std::io::util` module offers a number of `Reader` and +`Writer` "adapters". This RFC refactors the design to more closely +follow `std::iter`. Along the way, it generalizes the `by_ref` adapter: + +```rust +trait ReadExt: Read { + // ... eliding the methods already described above + + // Postfix version of `(&mut self)` + fn by_ref(&mut self) -> &mut Self { ... } + + // Read everything from `self`, then read from `next` + fn chain(self, next: R) -> Chain { ... } + + // Adapt `self` to yield only the first `limit` bytes + fn take(self, limit: u64) -> Take { ... } + + // Whenever reading from `self`, push the bytes read to `out` + #[unstable] // uncertain semantics of errors "halfway through the operation" + fn tee(self, out: W) -> Tee { ... } +} + +trait WriteExt: Write { + // Postfix version of `(&mut self)` + fn by_ref<'a>(&'a mut self) -> &mut Self { ... } + + // Whenever bytes are written to `self`, write them to `other` as well + #[unstable] // uncertain semantics of errors "halfway through the operation" + fn broadcast(self, other: W) -> Broadcast { ... } +} + +// An adaptor converting an `Iterator` to `Read`. +pub struct IterReader { ... } +``` + +As with `std::iter`, these adapters are object unsafe and hence placed +in an extension trait with a blanket `impl`. + +#### Free functions +[Free functions]: #free-functions + +The current `std::io::util` module also includes a number of primitive +readers and writers, as well as `copy`. These are updated as follows: + +```rust +// A reader that yields no bytes +fn empty() -> Empty; // in theory just returns `impl Read` + +impl Read for Empty { ... 
}
+
+// A reader that yields `byte` repeatedly (generalizes today's ZeroReader)
+fn repeat(byte: u8) -> Repeat;
+
+impl Read for Repeat { ... }
+
+// A writer that ignores the bytes written to it (/dev/null)
+fn sink() -> Sink;
+
+impl Write for Sink { ... }
+
+// Copies all data from a `Read` to a `Write`, returning the amount of data
+// copied.
+pub fn copy<R: Read, W: Write>(r: &mut R, w: &mut W) -> Result<u64, Error>
+```
+
+Like `write_all`, the `copy` method will discard the amount of data already
+written on any error and also discard any partially read data on a `write`
+error. This method is intended to be a convenience and `write` should be used
+directly if this is not desirable.
+
+#### Seeking
+[Seeking]: #seeking
+
+The seeking infrastructure is largely the same as today's, except that
+`tell` is removed and the `seek` signature is refactored with more precise
+types:
+
+```rust
+pub trait Seek {
+    // returns the new position after seeking
+    fn seek(&mut self, pos: SeekFrom) -> Result<u64, Error>;
+}
+
+pub enum SeekFrom {
+    Start(u64),
+    End(i64),
+    Current(i64),
+}
+```
+
+The old `tell` function can be regained via `seek(SeekFrom::Current(0))`.
+
+#### Buffering
+[Buffering]: #buffering
+
+The current `Buffer` trait will be renamed to `BufRead` for
+clarity (and to open the door to `BufWrite` at some later
+point):
+
+```rust
+pub trait BufRead: Read {
+    fn fill_buf(&mut self) -> Result<&[u8], Error>;
+    fn consume(&mut self, amt: uint);
+
+    fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>) -> Result<(), Error> { ... }
+    fn read_line(&mut self, buf: &mut String) -> Result<(), Error> { ... }
+}
+
+pub trait BufReadExt: BufRead {
+    // Split is an iterator over Result<Vec<u8>, Error>
+    fn split(&mut self, byte: u8) -> Split { ... }
+
+    // Lines is an iterator over Result<String, Error>
+    fn lines(&mut self) -> Lines { ... }
+
+    // Chars is an iterator over Result<char, Error>
+    fn chars(&mut self) -> Chars { ... }
+}
+```
+
+The `read_until` and `read_line` methods are changed to take explicit,
+mutable buffers, for similar reasons to `read_to_end`. (Note that
+buffer reuse is particularly common for `read_line`). These functions
+include the delimiters in the strings they produce, both for easy
+cross-platform compatibility (in the case of `read_line`) and for ease
+in copying data without loss (in particular, distinguishing whether
+the last line included a final delimiter).
+
+The `split` and `lines` methods provide iterator-based versions of
+`read_until` and `read_line`, and *do not* include the delimiter in
+their output. This matches conventions elsewhere (like `split` on
+strings) and is usually what you want when working with iterators.
+
+The `BufReader`, `BufWriter` and `BufStream` types stay
+essentially as they are today, except that for streams and writers the
+`into_inner` method yields the structure back in the case of a write error,
+and its behavior is clarified to writing out the buffered data without
+flushing the underlying writer:
+```rust
+// If writing fails, you get the unwritten data back
+fn into_inner(self) -> Result<W, IntoInnerError<W>>;
+
+pub struct IntoInnerError<W>(W, Error);
+
+impl<W> IntoInnerError<W> {
+    pub fn error(&self) -> &Error { ... }
+    pub fn into_inner(self) -> W { ... }
+}
+impl<W> FromError<IntoInnerError<W>> for Error { ... }
+```
+
+#### `Cursor`
+[Cursor]: #cursor
+
+Many applications want to view in-memory data as either an implementor of `Read`
+or `Write`. This is often useful when composing streams or creating test cases.
+This functionality primarily comes from the following implementations:
+
+```rust
+impl<'a> Read for &'a [u8] { ... 
}
+impl<'a> Write for &'a mut [u8] { ... }
+impl Write for Vec<u8> { ... }
+```
+
+While efficient, none of these implementations support seeking (via an
+implementation of the `Seek` trait). The implementations of `Read` and `Write`
+for these types are not quite as efficient when `Seek` needs to be used, so the
+`Seek`-ability will be opted-in to with a new `Cursor` structure with the
+following API:
+
+```rust
+pub struct Cursor<T> {
+    pos: u64,
+    inner: T,
+}
+impl<T> Cursor<T> {
+    pub fn new(inner: T) -> Cursor<T>;
+    pub fn into_inner(self) -> T;
+    pub fn get_ref(&self) -> &T;
+}
+
+// Error indicating that a negative offset was seeked to.
+pub struct NegativeOffset;
+
+impl Seek for Cursor<Vec<u8>> { ... }
+impl<'a> Seek for Cursor<&'a [u8]> { ... }
+impl<'a> Seek for Cursor<&'a mut [u8]> { ... }
+
+impl Read for Cursor<Vec<u8>> { ... }
+impl<'a> Read for Cursor<&'a [u8]> { ... }
+impl<'a> Read for Cursor<&'a mut [u8]> { ... }
+
+impl BufRead for Cursor<Vec<u8>> { ... }
+impl<'a> BufRead for Cursor<&'a [u8]> { ... }
+impl<'a> BufRead for Cursor<&'a mut [u8]> { ... }
+
+impl<'a> Write for Cursor<&'a mut [u8]> { ... }
+impl Write for Cursor<Vec<u8>> { ... }
+```
+
+A sample implementation can be found in [a gist][cursor-impl]. Using one
+`Cursor` structure makes it possible to emphasize that the only ability added is an
+implementation of `Seek` while still allowing all possible I/O operations for
+various types of buffers.
+
+[cursor-impl]: https://gist.github.com/alexcrichton/8224f57ed029929447bd
+
+It is not currently proposed to unify these implementations via a trait. For
+example a `Cursor<Rc<Vec<u8>>>` is a reasonable instance to have, but it will not
+have an implementation listed in the standard library to start out. It is
+considered a backwards-compatible addition to unify these various `impl` blocks
+with a trait.
+
+The following types will be removed from the standard library and replaced as
+follows:
+
+* `MemReader` -> `Cursor<Vec<u8>>`
+* `MemWriter` -> `Cursor<Vec<u8>>`
+* `BufReader` -> `Cursor<&[u8]>` or `Cursor<&mut [u8]>`
+* `BufWriter` -> `Cursor<&mut [u8]>`
+
+### The `std::io` facade
+[The std::io facade]: #the-stdio-facade
+
+The `std::io` module will largely be a facade over `core::io`, but it
+will add some functionality that can live only in `std`.
+
+#### `Errors`
+[Errors]: #error
+
+The `IoError` type will be renamed to `std::io::Error`, following our
+[non-prefixing convention](https://github.com/rust-lang/rfcs/pull/356).
+It will remain largely as it is today, but its fields will be made
+private. It may eventually grow a field to track the underlying OS
+error code.
+
+The `std::io::IoErrorKind` type will become `std::io::ErrorKind`, and
+`ShortWrite` will be dropped (it is no longer needed with the new
+`Write` semantics), which should decrease its footprint. The
+`OtherIoError` variant will become `Other` now that `enum`s are
+namespaced. Other variants may be added over time, such as `Interrupted`,
+as more errors are classified from the system.
+
+The `EndOfFile` variant will be removed in favor of returning `Ok(0)`
+from `read` on end of file (or `write` on an empty slice for example). This
+approach clarifies the meaning of the return value of `read`, matches Posix
+APIs, and makes it easier to use `try!` in the case that a "real" error should
+be bubbled out. (The main downside is that higher-level operations that might
+use `Result<T>` with some `T != usize` may need to wrap `IoError` in a
+further enum if they wish to forward unexpected EOF.)
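+
+To illustrate the `Ok(0)` convention, here is a small sketch against the
+proposed `Read` trait and `Error` type (not a normative example):
+
+```rust
+fn count_bytes<R: Read>(r: &mut R) -> Result<u64, Error> {
+    let mut buf = [0u8; 4096];
+    let mut total = 0u64;
+    loop {
+        match try!(r.read(&mut buf)) {
+            0 => return Ok(total), // end of file: no dedicated error variant needed
+            n => total += n as u64,
+        }
+    }
+}
+```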
+ +#### Channel adapters +[Channel adapters]: #channel-adapters + +The `ChanReader` and `ChanWriter` adapters will be left as they are today, and +they will remain `#[unstable]`. The channel adapters currently suffer from a few +problems today, some of which are inherent to the design: + +* Construction is somewhat unergonomic. First a `mpsc` channel pair must be + created and then each half of the reader/writer needs to be created. +* Each call to `write` involves moving memory onto the heap to be sent, which + isn't necessarily efficient. +* The design of `std::sync::mpsc` allows for growing more channels in the + future, but it's unclear if we'll want to continue to provide a reader/writer + adapter for each channel we add to `std::sync`. + +These types generally feel as if they're from a different era of Rust (which +they are!) and may take some time to fit into the current standard library. They +can be reconsidered for stabilization after the dust settles from the I/O +redesign as well as the recent `std::sync` redesign. At this time, however, this +RFC recommends they remain unstable. + +#### `stdin`, `stdout`, `stderr` +[stdin, stdout, stderr]: #stdin-stdout-stderr + +The current `stdio` module will be removed in favor of these constructors in the +`io` module: + +```rust +pub fn stdin() -> Stdin; +pub fn stdout() -> Stdout; +pub fn stderr() -> Stderr; +``` + +* `stdin` - returns a handle to a **globally shared** standard input of + the process which is buffered as well. Due to the globally shared nature of + this handle, all operations on `Stdin` directly will acquire a lock internally + to ensure access to the shared buffer is synchronized. This implementation + detail is also exposed through a `lock` method where the handle can be + explicitly locked for a period of time so relocking is not necessary. + + The `Read` trait will be implemented directly on the returned `Stdin` handle + but the `BufRead` trait will not be (due to synchronization concerns). The + locked version of `Stdin` (`StdinLock`) will provide an implementation of + `BufRead`. + + The design will largely be the same as is today with the `old_io` module. + + ```rust + impl Stdin { + fn lock(&self) -> StdinLock; + fn read_line(&mut self, into: &mut String) -> io::Result<()>; + fn read_until(&mut self, byte: u8, into: &mut Vec) -> io::Result<()>; + } + impl Read for Stdin { ... } + impl Read for StdinLock { ... } + impl BufRead for StdinLock { ... } + ``` + +* `stderr` - returns a **non buffered** handle to the standard error output + stream for the process. Each call to `write` will roughly translate to a + system call to output data when written to `stderr`. This handle is locked + like `stdin` to ensure, for example, that calls to `write_all` are atomic with + respect to one another. There will also be an RAII guard to lock the handle + and use the result as an instance of `Write`. + + ```rust + impl Stderr { + fn lock(&self) -> StderrLock; + } + impl Write for Stderr { ... } + impl Write for StderrLock { ... } + ``` + +* `stdout` - returns a **globally buffered** handle to the standard output of + the current process. The amount of buffering can be decided at runtime to + allow for different situations such as being attached to a TTY or being + redirected to an output file. The `Write` trait will be implemented for this + handle, and like `stderr` it will be possible to lock it and then use the + result as an instance of `Write` as well. 
+ + ```rust + impl Stdout { + fn lock(&self) -> StdoutLock; + } + impl Write for Stdout { ... } + impl Write for StdoutLock { ... } + ``` + +#### Windows and stdio +[Windows stdio]: #windows-and-stdio + +On Windows, standard input and output handles can work with either arbitrary +`[u8]` or `[u16]` depending on the state at runtime. For example a program +attached to the console will work with arbitrary `[u16]`, but a program attached +to a pipe would work with arbitrary `[u8]`. + +To handle this difference, the following behavior will be enforced for the +standard primitives listed above: + +* If attached to a pipe then no attempts at encoding or decoding will be done, + the data will be ferried through as `[u8]`. + +* If attached to a console, then `stdin` will attempt to interpret all input as + UTF-16, re-encoding into UTF-8 and returning the UTF-8 data instead. This + implies that data will be buffered internally to handle partial reads/writes. + Invalid UTF-16 will simply be discarded returning an `io::Error` explaining + why. + +* If attached to a console, then `stdout` and `stderr` will attempt to interpret + input as UTF-8, re-encoding to UTF-16. If the input is not valid UTF-8 then an + error will be returned and no data will be written. + +#### Raw stdio +[Raw stdio]: #raw-stdio + +> **Note**: This section is intended to be a sketch of possible raw stdio +> support, but it is not planned to implement or stabilize this +> implementation at this time. + +The above standard input/output handles all involve some form of locking or +buffering (or both). This cost is not always wanted, and hence raw variants will +be provided. Due to platform differences across unix/windows, the following +structure will be supported: + +```rust +mod os { + mod unix { + mod stdio { + struct Stdio { .. } + + impl Stdio { + fn stdout() -> Stdio; + fn stderr() -> Stdio; + fn stdin() -> Stdio; + } + + impl Read for Stdio { ... } + impl Write for Stdio { ... } + } + } + + mod windows { + mod stdio { + struct Stdio { ... } + struct StdioConsole { ... } + + impl Stdio { + fn stdout() -> io::Result; + fn stderr() -> io::Result; + fn stdin() -> io::Result; + } + // same constructors StdioConsole + + impl Read for Stdio { ... } + impl Write for Stdio { ... } + + impl StdioConsole { + // returns slice of what was read + fn read<'a>(&self, buf: &'a mut OsString) -> io::Result<&'a OsStr>; + // returns remaining part of `buf` to be written + fn write<'a>(&self, buf: &'a OsStr) -> io::Result<&'a OsStr>; + } + } + } +} +``` + +There are some key differences from today's API: + +* On unix, the API has not changed much except that the handles have been + consolidated into one type which implements both `Read` and `Write` (although + writing to stdin is likely to generate an error). +* On windows, there are two sets of handles representing the difference between + "console mode" and not (e.g. a pipe). When not a console the normal I/O traits + are implemented (delegating to `ReadFile` and `WriteFile`. The console mode + operations work with `OsStr`, however, to show how they work with UCS-2 under + the hood. + +#### Printing functions +[Printing functions]: #printing-functions + +The current `print`, `println`, `print_args`, and `println_args` functions will +all be "removed from the public interface" by [prefixing them with `__` and +marking `#[doc(hidden)]`][gh22607]. These are all implementation details of the +`print!` and `println!` macros and don't need to be exposed in the public +interface. 
+ +[gh22607]: https://github.com/rust-lang/rust/issues/22607 + +The `set_stdout` and `set_stderr` functions will be removed with no replacement +for now. It's unclear whether these functions should indeed control a thread +local handle instead of a global handle as whether they're justified in the +first place. It is a backwards-compatible extension to allow this sort of output +to be redirected and can be considered if the need arises. + +### `std::env` +[std::env]: #stdenv + +Most of what's available in `std::os` today will move to `std::env`, +and the signatures will be updated to follow this RFC's +[Design principles] as follows. + +**Arguments**: + +* `args`: change to yield an iterator rather than vector if possible; in any + case, it should produce an `OsString`. + +**Environment variables**: + +* `vars` (renamed from `env`): yields a vector of `(OsString, OsString)` pairs. +* `var` (renamed from `getenv`): take a value bounded by `AsOsStr`, + allowing Rust strings and slices to be ergonomically passed in. Yields an + `Option`. +* `var_string`: take a value bounded by `AsOsStr`, returning `Result` where `VarError` represents a non-unicode `OsString` or a "not + present" value. +* `set_var` (renamed from `setenv`): takes two `AsOsStr`-bounded values. +* `remove_var` (renamed from `unsetenv`): takes a `AsOsStr`-bounded value. + +* `join_paths`: take an `IntoIterator` where `T: AsOsStr`, yield a + `Result`. +* `split_paths` take a `AsOsStr`, yield an `Iterator`. + +**Working directory**: + +* `current_dir` (renamed from `getcwd`): yields a `PathBuf`. +* `set_current_dir` (renamed from `change_dir`): takes an `AsPath` value. + +**Important locations**: + +* `home_dir` (renamed from `homedir`): returns home directory as a `PathBuf` +* `temp_dir` (renamed from `tmpdir`): returns a temporary directly as a `PathBuf` +* `current_exe` (renamed from `self_exe_name`): returns the full path + to the current binary as a `PathBuf` in an `io::Result` instead of an + `Option`. + +**Exit status**: + +* `get_exit_status` and `set_exit_status` stay as they are, but with + updated docs that reflect that these only affect the return value of + `std::rt::start`. These will remain `#[unstable]` for now and a future RFC + will determine their stability. + +**Architecture information**: + +* `num_cpus`, `page_size`: stay as they are, but remain `#[unstable]`. A future + RFC will determine their stability and semantics. + +**Constants**: + +* Stabilize `ARCH`, `DLL_PREFIX`, `DLL_EXTENSION`, `DLL_SUFFIX`, + `EXE_EXTENSION`, `EXE_SUFFIX`, `FAMILY` as they are. +* Rename `SYSNAME` to `OS`. +* Remove `TMPBUF_SZ`. + +This brings the constants into line with our naming conventions elsewhere. + +#### Items to move to `os::platform` + +* `pipe` will move to `os::unix`. It is currently primarily used for + hooking to the IO of a child process, which will now be done behind + a trait object abstraction. + +#### Removed items + +* `errno`, `error_string` and `last_os_error` provide redundant, + platform-specific functionality and will be removed for now. They + may reappear later in `os::unix` and `os::windows` in a modified + form. +* `dll_filename`: deprecated in favor of working directly with the constants. +* `_NSGetArgc`, `_NSGetArgv`: these should never have been public. +* `self_exe_path`: deprecated in favor of `current_exe` plus path operations. +* `make_absolute`: deprecated in favor of explicitly joining with the working directory. 
+* all `_as_bytes` variants: deprecated in favor of yielding `OsString` values + +### `std::fs` +[std::fs]: #stdfs + +The `fs` module will provide most of the functionality it does today, +but with a stronger cross-platform orientation. + +Note that all path-consuming functions will now take an +`AsPath`-bounded parameter for ergonomic reasons (this will allow +passing in Rust strings and literals directly, for example). + +#### Free functions +[Free functions]: #free-functions + +**Files**: + +* `copy`. Take `AsPath` bound. +* `rename`. Take `AsPath` bound. +* `remove_file` (renamed from `unlink`). Take `AsPath` bound. + +* `metadata` (renamed from `stat`). Take `AsPath` bound. Yield a new + struct, `Metadata`, with no public fields, but `len`, `is_dir`, + `is_file`, `perms`, `accessed` and `modified` accessors. The various + `os::platform` modules will offer extension methods on this + structure. + +* `set_perms` (renamed from `chmod`). Take `AsPath` bound, and a + `Perms` value. The `Perms` type will be revamped + as a struct with private implementation; see below. + +**Directories**: + +* `create_dir` (renamed from `mkdir`). Take `AsPath` bound. +* `create_dir_all` (renamed from `mkdir_recursive`). Take `AsPath` bound. +* `read_dir` (renamed from `readdir`). Take `AsPath` bound. Yield a + newtypes iterator, which yields a new type `DirEntry` which has an + accessor for `Path`, but will eventually provide other information + as well (possibly via platform-specific extensions). +* `remove_dir` (renamed from `rmdir`). Take `AsPath` bound. +* `remove_dir_all` (renamed from `rmdir_recursive`). Take + `AsPath` bound. +* `walk_dir`. Take `AsPath` bound. Yield an iterator over `IoResult`. + +**Links**: + +* `hard_link` (renamed from `link`). Take `AsPath` bound. +* `soft_link` (renamed from `symlink`). Take `AsPath` bound. +* `read_link` (renamed form `readlink`). Take `AsPath` bound. + +#### Files +[Files]: #files + +The `File` type will largely stay as it is today, except that it will +use the `AsPath` bound everywhere. + +The `stat` method will be renamed to `metadata`, yield a `Metadata` +structure (as described above), and take `&self`. + +The `fsync` method will be renamed to `sync_all`, and `datasync` will be +renamed to `sync_data`. (Although the latter is not available on +Windows, it can be considered an optimization for `flush` and on +Windows behave identically to `sync_all`, just as it does on some Unix +filesystems.) + +The `path` method wil remain `#[unstable]`, as we do not yet want to +commit to its API. + +The `open_mode` function will be removed in favor of and will take an +`OpenOptions` struct, which will encompass today's `FileMode` and +`FileAccess` and support a builder-style API. + +#### File kinds +[File kinds]: #file-kinds + +The `FileType` type will be removed. As mentioned above, `is_file` and +`is_dir` will be provided directly on `Metadata`; the other types +need to be audited for compatibility across +platforms. Platform-specific kinds will be relegated to extension +traits in `std::os::platform`. + +It's possible that an +[extensible](https://github.com/rust-lang/rfcs/pull/757) `Kind` will +be added in the future. + +#### File permissions +[File permissions]: #file-permissions + +The permission models on Unix and Windows vary greatly -- even between +different filesystems within the same OS. 
Rather than offer an API +that has no meaning on some platforms, we will initially provide a +very limited `Perms` structure in `std::fs`, and then rich +extension traits in `std::os::unix` and `std::os::windows`. Over time, +if clear cross-platform patterns emerge for richer permissions, we can +grow the `Perms` structure. + +On the Unix side, the constructors and accessors for `Perms` +will resemble the flags we have today; details are left to the implementation. + +On the Windows side, initially there will be no extensions, as Windows +has a very complex permissions model that will take some time to build +out. + +For `std::fs` itself, `Perms` will provide constructors and +accessors for "world readable" -- and that is all. At the moment, that +is all that is known to be compatible across the platforms that Rust +supports. + +#### `PathExt` +[PathExt]: #pathext + +This trait will essentially remain stay as it is (renamed from +`PathExtensions`), following the same changes made to `fs` free functions. + +#### Items to move to `os::platform` + +* `lstat` will move to `os::unix` and remain `#[unstable]` *for now* + since it is not yet implemented for Windows. + +* `chown` will move to `os::unix` (it currently does *nothing* on + Windows), and eventually `os::windows` will grow support for + Windows's permission model. If at some point a reasonable + intersection is found, we will re-introduce a cross-platform + function in `std::fs`. + +* In general, offer all of the `stat` fields as an extension trait on + `Metadata` (e.g. `os::unix::MetadataExt`). + +### `std::net` +[std::net]: #stdnet + +The contents of `std::io::net` submodules `tcp`, `udp`, `ip` and +`addrinfo` will be retained but moved into a single `std::net` module; +the other modules are being moved or removed and are described +elsewhere. + +#### SocketAddr + +This structure will represent either a `sockaddr_in` or `sockaddr_in6` which is +commonly just a pairing of an IP address and a port. + +```rust +enum SocketAddr { + V4(SocketAddrV4), + V6(SocketAddrV6), +} + +impl SocketAddrV4 { + fn new(addr: Ipv4Addr, port: u16) -> SocketAddrV4; + fn ip(&self) -> &Ipv4Addr; + fn port(&self) -> u16; +} + +impl SocketAddrV6 { + fn new(addr: Ipv6Addr, port: u16, flowinfo: u32, scope_id: u32) -> SocketAddrV6; + fn ip(&self) -> &Ipv6Addr; + fn port(&self) -> u16; + fn flowinfo(&self) -> u32; + fn scope_id(&self) -> u32; +} +``` + +#### Ipv4Addr + +Represents a version 4 IP address. It has the following interface: + +```rust +impl Ipv4Addr { + fn new(a: u8, b: u8, c: u8, d: u8) -> Ipv4Addr; + fn any() -> Ipv4Addr; + fn octets(&self) -> [u8; 4]; + fn to_ipv6_compatible(&self) -> Ipv6Addr; + fn to_ipv6_mapped(&self) -> Ipv6Addr; +} +``` + +#### Ipv6Addr + +Represents a version 6 IP address. It has the following interface: + +```rust +impl Ipv6Addr { + fn new(a: u16, b: u16, c: u16, d: u16, e: u16, f: u16, g: u16, h: u16) -> Ipv6Addr; + fn any() -> Ipv6Addr; + fn segments(&self) -> [u16; 8] + fn to_ipv4(&self) -> Option; +} +``` + +#### TCP +[TCP]: #tcp + +The current `TcpStream` struct will be pared back from where it is today to the +following interface: + +```rust +// TcpStream, which contains both a reader and a writer + +impl TcpStream { + fn connect(addr: &A) -> io::Result; + fn peer_addr(&self) -> io::Result; + fn local_addr(&self) -> io::Result; + fn shutdown(&self, how: Shutdown) -> io::Result<()>; + fn try_clone(&self) -> io::Result; +} + +impl Read for TcpStream { ... } +impl Write for TcpStream { ... 
} +impl<'a> Read for &'a TcpStream { ... } +impl<'a> Write for &'a TcpStream { ... } +#[cfg(unix)] impl AsRawFd for TcpStream { ... } +#[cfg(windows)] impl AsRawSocket for TcpStream { ... } +``` + +* `clone` has been replaced with a `try_clone` function. The implementation of + `try_clone` will map to using `dup` on Unix platforms and + `WSADuplicateSocket` on Windows platforms. The `TcpStream` itself will no + longer be reference counted itself under the hood. +* `close_{read,write}` are both removed in favor of binding the `shutdown` + function directly on sockets. This will map to the `shutdown` function on both + Unix and Windows. +* `set_timeout` has been removed for now (as well as other timeout-related + functions). It is likely that this may come back soon as a binding to + `setsockopt` to the `SO_RCVTIMEO` and `SO_SNDTIMEO` options. This RFC does not + currently proposed adding them just yet, however. +* Implementations of `Read` and `Write` are provided for `&TcpStream`. These + implementations are not necessarily ergonomic to call (requires taking an + explicit reference), but they express the ability to concurrently read and + write from a `TcpStream` + +Various other options such as `nodelay` and `keepalive` will be left +`#[unstable]` for now. The `TcpStream` structure will also adhere to both `Send` +and `Sync`. + +The `TcpAcceptor` struct will be removed and all functionality will be folded +into the `TcpListener` structure. Specifically, this will be the resulting API: + +```rust +impl TcpListener { + fn bind(addr: &A) -> io::Result; + fn local_addr(&self) -> io::Result; + fn try_clone(&self) -> io::Result; + fn accept(&self) -> io::Result<(TcpStream, SocketAddr)>; + fn incoming(&self) -> Incoming; +} + +impl<'a> Iterator for Incoming<'a> { + type Item = io::Result; + ... +} +#[cfg(unix)] impl AsRawFd for TcpListener { ... } +#[cfg(windows)] impl AsRawSocket for TcpListener { ... } +``` + +Some major changes from today's API include: + +* The static distinction between `TcpAcceptor` and `TcpListener` has been + removed (more on this in the [socket][Sockets] section). +* The `clone` functionality has been removed in favor of `try_clone` (same + caveats as `TcpStream`). +* The `close_accept` functionality is removed entirely. This is not currently + implemented via `shutdown` (not supported well across platforms) and is + instead implemented via `select`. This functionality can return at a later + date with a more robust interface. +* The `set_timeout` functionality has also been removed in favor of returning at + a later date in a more robust fashion with `select`. +* The `accept` function no longer takes `&mut self` and returns `SocketAddr`. + The change in mutability is done to express that multiple `accept` calls can + happen concurrently. +* For convenience the iterator does not yield the `SocketAddr` from `accept`. + +The `TcpListener` type will also adhere to `Send` and `Sync`. + +#### UDP +[UDP]: #udp + +The UDP infrastructure will receive a similar face-lift as the TCP +infrastructure will: + +```rust +impl UdpSocket { + fn bind(addr: &A) -> io::Result; + fn recv_from(&self, buf: &mut [u8]) -> io::Result<(usize, SocketAddr)>; + fn send_to(&self, buf: &[u8], addr: &A) -> io::Result; + fn local_addr(&self) -> io::Result; + fn try_clone(&self) -> io::Result; +} + +#[cfg(unix)] impl AsRawFd for UdpSocket { ... } +#[cfg(windows)] impl AsRawSocket for UdpSocket { ... 
} +``` + +Some important points of note are: + +* The `send` and `recv` function take `&self` instead of `&mut self` to indicate + that they may be called safely in concurrent contexts. +* All configuration options such as `multicast` and `ttl` are left as + `#[unstable]` for now. +* All timeout support is removed. This may come back in the form of `setsockopt` + (as with TCP streams) or with a more general implementation of `select`. +* `clone` functionality has been replaced with `try_clone`. + +The `UdpSocket` type will adhere to both `Send` and `Sync`. + +#### Sockets +[Sockets]: #sockets + +The current constructors for `TcpStream`, `TcpListener`, and `UdpSocket` are +largely "convenience constructors" as they do not expose the underlying details +that a socket can be configured before it is bound, connected, or listened on. +One of the more frequent configuration options is `SO_REUSEADDR` which is set by +default for `TcpListener` currently. + +This RFC leaves it as an open question how best to implement this +pre-configuration. The constructors today will likely remain no matter what as +convenience constructors and a new structure would implement consuming methods +to transform itself to each of the various `TcpStream`, `TcpListener`, and +`UdpSocket`. + +This RFC does, however, recommend not adding multiple constructors to the +various types to set various configuration options. This pattern is best +expressed via a flexible socket type to be added at a future date. + +#### Addresses +[Addresses]: #addresses + +For the current `addrinfo` module: + +* The `get_host_addresses` should be renamed to `lookup_host`. +* All other contents should be removed. + +For the current `ip` module: + +* The `ToSocketAddr` trait should become `ToSocketAddrs` +* The default `to_socket_addr_all` method should be removed. + +The following implementations of `ToSocketAddrs` will be available: + +```rust +impl ToSocketAddrs for SocketAddr { ... } +impl ToSocketAddrs for SocketAddrV4 { ... } +impl ToSocketAddrs for SocketAddrV6 { ... } +impl ToSocketAddrs for (Ipv4Addr, u16) { ... } +impl ToSocketAddrs for (Ipv6Addr, u16) { ... } +impl ToSocketAddrs for (&str, u16) { ... } +impl ToSocketAddrs for str { ... } +impl ToSocketAddrs for &T { ... } +``` + +### `std::process` +[std::process]: #stdprocess + +Currently `std::io::process` is used only for spawning new +processes. The re-envisioned `std::process` will ultimately support +inspecting currently-running processes, although this RFC does not +propose any immediate support for doing so -- it merely future-proofs +the module. + +#### `Command` +[Command]: #command + +The `Command` type is a builder API for processes, and is largely in +good shape, modulo a few tweaks: + +* Replace `ToCStr` bounds with `AsOsStr`. +* Replace `env_set_all` with `env_clear` +* Rename `cwd` to `current_dir`, take `AsPath`. +* Rename `spawn` to `run` +* Move `uid` and `gid` to an extension trait in `os::unix` +* Make `detached` take a `bool` (rather than always setting the + command to detached mode). + +The `stdin`, `stdout`, `stderr` methods will undergo a more +significant change. By default, the corresponding options will be +considered "unset", the interpretation of which depends on how the +process is launched: + +* For `run` or `status`, these will inherit from the current process by default. +* For `output`, these will capture to new readers/writers by default. 
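+
+As a hedged sketch of what these defaults mean in practice (method names follow
+the renamings above; exact signatures are not fixed by this RFC):
+
+```rust
+// `output()` captures the child's stdout/stderr by default...
+let out = try!(Command::new("ls").arg("-l").output());
+
+// ...while `run()` inherits the parent's stdio by default.
+let mut child = try!(Command::new("pager").run());
+```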
+ +The `StdioContainer` type will be renamed to `Stdio`, and will not be +exposed directly as an enum (to enable growth and change over time). +It will provide a `Capture` constructor for capturing input or output, +an `Inherit` constructor (which just means to use the current IO +object -- it does not take an argument), and a `Null` constructor. The +equivalent of today's `InheritFd` will be added at a later point. + +#### `Child` +[Child]: #child + +We propose renaming `Process` to `Child` so that we can add a +more general notion of non-child `Process` later on (every +`Child` will be able to give you a `Process`). + +* `stdin`, `stdout` and `stderr` will be retained as public fields, + but their types will change to newtyped readers and writers to hide the internal + pipe infrastructure. +* The `kill` method is dropped, and `id` and `signal` will move to `os::platform` extension traits. +* `signal_exit`, `signal_kill`, `wait`, and `forget` will all stay as they are. +* `set_timeout` will be changed to use the `with_deadline` infrastructure. + +There are also a few other related changes to the module: + +* Rename `ProcessOutput` to `Output` +* Rename `ProcessExit` to `ExitStatus`, and hide its + representation. Remove `matches_exit_status`, and add a `status` + method yielding an `Option` +* Remove `MustDieSignal`, `PleaseExitSignal`. +* Remove `EnvMap` (which should never have been exposed). + +### `std::os` +[std::os]: #stdos + +Initially, this module will be empty except for the platform-specific +`unix` and `windows` modules. It is expected to grow additional, more +specific platform submodules (like `linux`, `macos`) over time. + +## Odds and ends +[Odds and ends]: #odds-and-ends + +> To be expanded in a follow-up PR. + +### The `io` prelude +[The io prelude]: #the-io-prelude + +The `prelude` submodule will contain most of the traits, types, and +modules discussed in this RFC; it is meant to provide maximal +convenience when working with IO of any kind. The exact contents of +the module are left as an open question. + +# Drawbacks +[Drawbacks]: #drawbacks + +This RFC is largely about cleanup, normalization, and stabilization of +our IO libraries -- work that needs to be done, but that also +represents nontrivial churn. + +However, the actual implementation work involved is estimated to be +reasonably contained, since all of the functionality is already in +place in some form (including `os_str`, due to @SimonSapin's +[WTF-8 implementation](https://github.com/SimonSapin/rust-wtf8)). + +# Alternatives +[Alternatives]: #alternatives + +The main alternative design would be to continue staying with the +Posix tradition in terms of naming and functionality (for which there +is precedent in some other languages). However, Rust is already +well-known for its strong cross-platform compatibility in `std`, and +making the library more Windows-friendly will only increase its appeal. + +More radically different designs (in terms of different design +principles or visions) are outside the scope of this RFC. + +# Unresolved questions +[Unresolved questions]: #unresolved-questions + +> To be expanded in follow-up PRs. + +## Wide string representation + +(Text from @SimonSapin) + +Rather than WTF-8, `OsStr` and `OsString` on Windows could use +potentially-ill-formed UTF-16 (a.k.a. "wide" strings), with a +different cost trade off. + +Upside: +* No conversion between `OsStr` / `OsString` and OS calls. + +Downsides: +* More expensive conversions between `OsStr` / `OsString` and `str` / `String`. 
+* These conversions have inconsistent performance characteristics between platforms. (Need to allocate on Windows, but not on Unix.) +* Some of them return `Cow`, which has some ergonomic hit. + +The API (only parts that differ) could look like: + +```rust +pub mod os_str { + #[cfg(windows)] + mod imp { + type Buf = Vec; + type Slice = [u16]; + ... + } + + impl OsStr { + pub fn from_str(&str) -> Cow; + pub fn to_string(&self) -> Option; + pub fn to_string_lossy(&self) -> CowString; + } + + #[cfg(windows)] + pub mod windows{ + trait OsStringExt { + fn from_wide_slice(&[u16]) -> Self; + fn from_wide_vec(Vec) -> Self; + fn into_wide_vec(self) -> Vec; + } + + trait OsStrExt { + fn from_wide_slice(&[u16]) -> Self; + fn as_wide_slice(&self) -> &[u16]; + } + } +} +``` diff --git a/text/0520-new-array-repeat-syntax.md b/text/0520-new-array-repeat-syntax.md new file mode 100644 index 00000000000..45a858377f0 --- /dev/null +++ b/text/0520-new-array-repeat-syntax.md @@ -0,0 +1,179 @@ +- Start Date: 2014-12-13 +- RFC PR: [520](https://github.com/rust-lang/rfcs/pull/520) +- Rust Issue: [19999](https://github.com/rust-lang/rust/issues/19999) + +# Summary + +Under this RFC, the syntax to specify the type of a fixed-length array +containing `N` elements of type `T` would be changed to `[T; N]`. Similarly, the +syntax to construct an array containing `N` duplicated elements of value `x` +would be changed to `[x; N]`. + +# Motivation + +[RFC 439](https://github.com/rust-lang/rfcs/blob/master/text/0439-cmp-ops-reform.md) +(cmp/ops reform) has resulted in an ambiguity that must be resolved. Previously, +an expression with the form `[x, ..N]` would unambiguously refer to an array +containing `N` identical elements, since there would be no other meaning that +could be assigned to `..N`. However, under RFC 439, `..N` should now desugar to +an object of type `RangeTo`, with `T` being the type of `N`. + +In order to resolve this ambiguity, there must be a change to either the syntax +for creating an array of repeated values, or the new range syntax. This RFC +proposes the former, in order to preserve existing functionality while avoiding +modifications that would make the range syntax less intuitive. + +# Detailed design + +The syntax `[T, ..N]` for specifying array types will be replaced by the new +syntax `[T; N]`. + +In the expression `[x, ..N]`, the `..N` will refer to an expression of type +`RangeTo` (where `T` is the type of `N`). As with any other array of two +elements, `x` will have to be of the same type, and the array expression will be +of type `[RangeTo; 2]`. + +The expression `[x; N]` will be equivalent to the old meaning of the syntax +`[x, ..N]`. Specifically, it will create an array of length `N`, each element of +which has the value `x`. + +The effect will be to convert uses of arrays such as this: + +```rust +let a: [uint, ..2] = [0u, ..2]; +``` + +to this: + +```rust +let a: [uint; 2] = [0u; 2]; +``` + +## Match patterns + +In match patterns, `..` is always interpreted as a wildcard for constructor +arguments (or for slice patterns under the `advanced_slice_patterns` feature +gate). This RFC does not change that. In a match pattern, `..` will always be +interpreted as a wildcard, and never as sugar for a range constructor. + +## Suggested implementation + +While not required by this RFC, one suggested transition plan is as follows: + +- Implement the new syntax for `[T; N]`/`[x; N]` proposed above. 
+ +- Issue deprecation warnings for code that uses `[T, ..N]`/`[x, ..N]`, allowing + easier identification of code that needs to be transitioned. + +- When RFC 439 range literals are implemented, remove the deprecated syntax and + thus complete the implementation of this RFC. + +# Drawbacks + +## Backwards incompatibility + +- Changing the method for specifying an array size will impact a large amount of + existing code. Code conversion can probably be readily automated, but will + still require some labor. + +## Implementation time + +This proposal is submitted very close to the anticipated release of Rust +1.0. Changing the array repeat syntax is likely to require more work than +changing the range syntax specified in RFC 439, because the latter has not yet +been implemented. + +However, this decision cannot be reasonably postponed. Many users have expressed +a preference for implementing the RFC 439 slicing syntax as currently specified +rather than preserving the existing array repeat syntax. This cannot be resolved +in a backwards-compatible manner if the array repeat syntax is kept. + +# Alternatives + +Inaction is not an alternative due to the ambiguity introduced by RFC 439. Some +resolution must be chosen in order for the affected modules in `std` to be +stabilized. + +## Retain the type syntax only + +In theory, it seems that the type syntax `[T, ..N]` could be retained, while +getting rid of the expression syntax `[x, ..N]`. The problem with this is that, +if this syntax was removed, there is currently no way to define a macro to +replace it. + +Retaining the current type syntax, but changing the expression syntax, would +make the language somewhat more complex and inconsistent overall. There seem to +be no advocates of this alternative so far. + +## Different array repeat syntax + +The comments in [pull request #498](https://github.com/rust-lang/rfcs/pull/498) +mentioned many candidates for new syntax other than the `[x; N]` form in this +RFC. The comments on the pull request of this RFC mentioned many more. + +- Instead of using `[x; N]`, use `[x for N]`. + + - This use of `for` would not be exactly analogous to existing `for` loops, + because those accept an iterator rather than an integer. To a new user, + the expression `[x for N]` would resemble a list comprehension + (e.g. Python's syntax is `[expr for i in iter]`), but in fact it does + something much simpler. + - It may be better to avoid uses of `for` that could complicate future + language features, e.g. returning a value other than `()` from loops, or + some other syntactic sugar related to iterators. However, the risk of + actual ambiguity is not that high. + +- Introduce a different symbol to specify array sizes, e.g. `[T # N]`, + `[T @ N]`, and so forth. + +- Introduce a keyword rather than a symbol. There are many other options, e.g. + `[x by N]`. The original version of this proposal was for `[N of x]`, but this + was deemed to complicate parsing too much, since the parser would not know + whether to expect a type or an expression after the opening bracket. + +- Any of several more radical changes. + +## Change the range syntax + +The main problem here is that there are no proposed candidates that seem as +clear and ergonomic as `i..j`. The most common alternative for slicing in other +languages is `i:j`, but in Rust this simply causes an ambiguity with a different +feature, namely type ascription. 
+ +## Limit range syntax to the interior of an index (use `i..j` for slicing only) + +This resolves the issue since indices can be distinguished from arrays. However, +it removes some of the benefits of RFC 439. For instance, it removes the +possibility of using `for i in 1..10` to loop. + +## Remove `RangeTo` from RFC 439 + +The proposal in pull request #498 is to remove the sugar for `RangeTo` (i.e., +`..j`) while retaining other features of RFC 439. This is the simplest +resolution, but removes some convenience from the language. It is also +counterintuitive, because `RangeFrom` (i.e. `i..`) is retained, and because `..` +still has several different meanings in the language (ranges, repetition, and +pattern wildcards). + +# Unresolved questions + +## Match patterns + +There will still be two semantically distinct uses of `..`, for the RFC 439 +range syntax and for wildcards in patterns. This could be considered harmful +enough to introduce further changes to separate the two. Or this could be +considered innocuous enough to introduce some additional range-related meaning +for `..` in certain patterns. + +It is possible that the new syntax `[x; N]` could itself be used within +patterns. + +This RFC does not attempt to address any of these issues, because the current +pattern syntax does not allow use of the repeated array syntax, and does not +contain an ambiguity. + +## Behavior of `for` in array expressions + +It may be useful to allow `for` to take on a new meaning in array expressions. +This RFC keeps this possibility open, but does not otherwise propose any +concrete changes to move towards or away from this feature. diff --git a/text/0522-self-impl.md b/text/0522-self-impl.md new file mode 100644 index 00000000000..6bd31ed53c1 --- /dev/null +++ b/text/0522-self-impl.md @@ -0,0 +1,49 @@ +- Start Date: 2014-12-13 +- RFC PR: [522](https://github.com/rust-lang/rfcs/pull/522) +- Rust Issue: [20000](https://github.com/rust-lang/rust/issues/20000) + +# Summary + +Allow `Self` type to be used in impls. + +# Motivation + +Allows macros which operate on methods to do more, more easily without having to +rebuild the concrete self type. Macros could use the literal self type like +programmers do, but that requires extra machinery in the macro expansion code +and extra work by the macro author. + +Allows easier copy and pasting of method signatures from trait declarations to +implementations. + +Is more succinct where the self type is complex. + +## Motivation for doing this now + +I'm hitting the macro problem in a side project. I wrote and hope to land the +compiler code to make it work, but it is ugly and this is a much nicer solution. +It is also really easy to implement, and since it is just a desugaring, it +should not add any additional complexity to the compiler. Obviously, this should +not block 1.0. + +# Detailed design + +When used inside an impl, `Self` is desugared during syntactic expansion to the +concrete type being implemented. `Self` can be used anywhere the desugared type +could be used. + +# Drawbacks + +There are some advantages to being explicit about the self type where it is +possible - clarity and fewer type aliases. + +# Alternatives + +We could just force authors to use the concrete type as we do currently. This +would require macro expansion code to make available the concrete type (or the +whole impl AST) to macros working on methods. The macro author would then +extract/construct the self type and use it instead of `Self`. + +# Unresolved questions + +None. 
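
As a concrete illustration of the desugaring described in the detailed design above (the `Pair` type is purely illustrative):

```rust
struct Pair<T>(T, T);

impl<T: Clone> Pair<T> {
    // Under this RFC, `Self` is expanded during syntactic expansion to the
    // concrete self type `Pair<T>`...
    fn duplicate(x: T) -> Self {
        Pair(x.clone(), x)
    }
    // ...exactly as if the return type had been written out as `Pair<T>`.
}
```
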
diff --git a/text/0526-fmt-text-writer.md b/text/0526-fmt-text-writer.md new file mode 100644 index 00000000000..d4b84dc4c22 --- /dev/null +++ b/text/0526-fmt-text-writer.md @@ -0,0 +1,104 @@ +- Start Date: 2014-12-30 +- RFC PR: https://github.com/rust-lang/rfcs/pull/526 +- Rust Issue: https://github.com/rust-lang/rust/issues/20352 + +# Summary + +Statically enforce that the `std::fmt` module can only create valid UTF-8 data +by removing the arbitrary `write` method in favor of a `write_str` method. + +# Motivation + +Today it is conventionally true that the output from macros like `format!` and +well as implementations of `Show` only create valid UTF-8 data. This is not +statically enforced, however. As a consequence the `.to_string()` method must +perform a `str::is_utf8` check before returning a `String`. + +This `str::is_utf8` check is currently [one of the most costly parts][bench1] +of the formatting subsystem while normally just being a redundant check. + +[bench1]: https://gist.github.com/alexcrichton/162a5f8f93062800c914 + +Additionally, it is possible to statically enforce the convention that `Show` +only deals with valid unicode, and as such the possibility of doing so should be +explored. + +# Detailed design + +The `std::fmt::FormatWriter` trait will be redefined as: + +```rust +pub trait Writer { + fn write_str(&mut self, data: &str) -> Result; + fn write_char(&mut self, ch: char) -> Result { + // default method calling write_str + } + fn write_fmt(&mut self, f: &Arguments) -> Result { + // default method calling fmt::write + } +} +``` + +There are a few major differences with today's trait: + +* The name has changed to `Writer` in accordance with [RFC 356][rfc356] +* The `write` method has moved from taking `&[u8]` to taking `&str` instead. +* A `write_char` method has been added. + +[rfc356]: https://github.com/rust-lang/rfcs/blob/master/text/0356-no-module-prefixes.md + +The corresponding methods on the `Formatter` structure will also be altered to +respect these signatures. + +The key idea behind this API is that the `Writer` trait only operates on unicode +data. The `write_str` method is a static enforcement of UTF-8-ness, and using +`write_char` follows suit as a `char` can only be a valid unicode codepoint. + +With this trait definition, the implementation of `Writer` for `Vec` will be +removed (note this is *not* the `io::Writer` implementation) in favor of an +implementation directly on `String`. The `.to_string()` method will change +accordingly (as well as `format!`) to write directly into a `String`, bypassing +all UTF-8 validity checks afterwards. + +This change [has been implemented][branch] in a branch of mine, and as expected +the [benchmark numbers have improved][bench2] for the much larger texts. + +[branch]: https://github.com/alexcrichton/rust/tree/fmt-text +[bench2]: https://gist.github.com/alexcrichton/182ccef5d8c2583a2423 + +Note that a key point of the changes implemented is that a call to `write!` into +an arbitrary `io::Writer` is *still valid* as it's still just a sink for bytes. +The changes outlined in this RFC will only affect `Show` and other formatting +trait implementations. As can be seen from the sample implementation, the +fallout is quite minimal with respect to the rest of the standard library. + +# Drawbacks + +A version of this RFC has been [previously postponed][rfc57], but this variant +is much less ambitious in terms of generic `TextWriter` support. At this time +the design of `fmt::Writer` is purposely conservative. 
+ +[rfc57]: https://github.com/rust-lang/rfcs/pull/57 + +There are currently some use cases today where a `&mut Formatter` is interpreted +as a `&mut Writer`, e.g. for the `Show` impl of `Json`. This is undoubtedly used +outside this repository, and it would break all of these users relying on the +binary functionality of the old `FormatWriter`. + +# Alternatives + +Another possible solution to specifically the performance problem is to have an +`unsafe` flag on a `Formatter` indicating that only valid utf-8 data was +written, and if all sub-parts of formatting set this flag then the data can be +assumed utf-8. In general relying on `unsafe` apis is less "pure" than relying +on the type system instead. + +The `fmt::Writer` trait can also be located as `io::TextWriter` instead to +emphasize its possible future connection with I/O, although there are not +concrete plans today to develop these connections. + +# Unresolved questions + +* It is unclear to what degree a `fmt::Writer` needs to interact with + `io::Writer` and the various adaptors/buffers. For example one would have to + implement their own `BufferedWriter` for a `fmt::Writer`. diff --git a/text/0528-string-patterns.md b/text/0528-string-patterns.md new file mode 100644 index 00000000000..a31dcd85dc0 --- /dev/null +++ b/text/0528-string-patterns.md @@ -0,0 +1,440 @@ +- Start Date: 2015-02-17 +- RFC PR: https://github.com/rust-lang/rfcs/pull/528 +- Rust Issue: https://github.com/rust-lang/rust/issues/22477 + +# Summary + +Stabilize all string functions working with search patterns around a new +generic API that provides a unified way to define and use those patterns. + +# Motivation + +Right now, string slices define a couple of methods for string +manipulation that work with user provided values that act as +search patterns. For example, `split()` takes an type implementing `CharEq` +to split the slice at all codepoints that match that predicate. + +Among these methods, the notion of what exactly is being used as a search +pattern varies inconsistently: Many work with the generic `CharEq`, +which only looks at a single codepoint at a time; and some +work with `char` or `&str` directly, sometimes duplicating a method to +provide operations for both. + +This presents a couple of issues: + +- The API is inconsistent. +- The API duplicates similar operations on different types. (`contains` vs `contains_char`) +- The API does not provide all operations for all types. (For example, no `rsplit` for `&str` patterns) +- The API is not extensible, eg to allow splitting at regex matches. +- The API offers no way to explicitly decide between different search algorithms + for the same pattern, for example to use Boyer-Moore string searching. 
+ +At the moment, the full set of relevant string methods roughly looks like this: + +```rust +pub trait StrExt for ?Sized { + fn contains(&self, needle: &str) -> bool; + fn contains_char(&self, needle: char) -> bool; + + fn split(&self, sep: Sep) -> CharSplits; + fn splitn(&self, sep: Sep, count: uint) -> CharSplitsN; + fn rsplitn(&self, sep: Sep, count: uint) -> CharSplitsN; + fn split_terminator(&self, sep: Sep) -> CharSplits; + fn split_str<'a>(&'a self, &'a str) -> StrSplits<'a>; + + fn match_indices<'a>(&'a self, sep: &'a str) -> MatchIndices<'a>; + + fn starts_with(&self, needle: &str) -> bool; + fn ends_with(&self, needle: &str) -> bool; + + fn trim_chars(&self, to_trim: C) -> &'a str; + fn trim_left_chars(&self, to_trim: C) -> &'a str; + fn trim_right_chars(&self, to_trim: C) -> &'a str; + + fn find(&self, search: C) -> Option; + fn rfind(&self, search: C) -> Option; + fn find_str(&self, &str) -> Option; + + // ... +} +``` + +This RFC proposes to fix those issues by providing a unified `Pattern` trait +that all "string pattern" types would implement, and that would be used by the string API +exclusively. + +This fixes the duplication, consistency, and extensibility problems, and also allows to define +newtype wrappers for the same pattern types that use different or specific +search implementations. + +As an additional design goal, the new abstractions should also not pose a problem +for optimization - like for iterators, a concrete instance should produce similar +machine code to a hardcoded optimized loop written in C. + +# Detailed design + +## New traits + +First, new traits will be added to the `str` module in the std library: + +```rust +trait Pattern<'a> { + type Searcher: Searcher<'a>; + fn into_matcher(self, haystack: &'a str) -> Self::Searcher; + + fn is_contained_in(self, haystack: &'a str) -> bool { /* default*/ } + fn match_starts_at(self, haystack: &'a str, idx: usize) -> bool { /* default*/ } + fn match_ends_at(self, haystack: &'a str, idx: usize) -> bool + where Self::Searcher: ReverseSearcher<'a> { /* default*/ } +} +``` + +A `Pattern` represents a builder for an associated type implementing a +family of `Searcher` traits (see below), and will be implemented by all types that +represent string patterns, which includes: + +- `&str` +- `char`, and everything else implementing `CharEq` +- Third party types like `&Regex` or `Ascii` +- Alternative algorithm wrappers like `struct BoyerMoore(&str)` + +```rust +impl<'a> Pattern<'a> for char { /* ... */ } +impl<'a, 'b> Pattern<'a> for &'b str { /* ... */ } + +impl<'a, 'b> Pattern<'a> for &'b [char] { /* ... */ } +impl<'a, F> Pattern<'a> for F where F: FnMut(char) -> bool { /* ... */ } + +impl<'a, 'b> Pattern<'a> for &'b Regex { /* ... */ } +``` + +The lifetime parameter on `Pattern` exists in order to allow threading the lifetime +of the haystack (the string to be searched through) through the API, and is a workaround +for not having associated higher kinded types yet. 
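
As a small usage sketch (not itself part of the proposed API), a function that is generic over any pattern type can then be written against `Pattern` alone, relying on the default methods above:

```rust
// Works uniformly for `char`, `&str`, `&[char]`, closures, `&Regex`, etc.
fn occurs_in<'a, P: Pattern<'a>>(haystack: &'a str, pat: P) -> bool {
    pat.is_contained_in(haystack)
}
```
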
+ +Consumers of this API can then call `into_searcher()` on the pattern to convert it into +a type implementing a family of `Searcher` traits: + +```rust +pub enum SearchStep { + Match(usize, usize), + Reject(usize, usize), + Done +} +pub unsafe trait Searcher<'a> { + fn haystack(&self) -> &'a str; + fn next(&mut self) -> SearchStep; + + fn next_match(&mut self) -> Option<(usize, usize)> { /* default*/ } + fn next_reject(&mut self) -> Option<(usize, usize)> { /* default*/ } +} +pub unsafe trait ReverseSearcher<'a>: Searcher<'a> { + fn next_back(&mut self) -> SearchStep; + + fn next_match_back(&mut self) -> Option<(usize, usize)> { /* default*/ } + fn next_reject_back(&mut self) -> Option<(usize, usize)> { /* default*/ } +} +pub trait DoubleEndedSearcher<'a>: ReverseSearcher<'a> {} +``` + +The basic idea of a `Searcher` is to expose a interface for +iterating through all connected string fragments of the haystack while classifing them as either a match, or a reject. + +This happens in form of the returned enum value. A `Match` needs to contain the start and end indices of a complete non-overlapping match, while a `Rejects` may be emitted for arbitary non-overlapping rejected parts of the string, as long as the start and end indices lie on valid utf8 boundaries. + +Similar to iterators, depending on the concrete implementation a searcher can have +additional capabilities that build on each other, which is why they will be +defined in terms of a three-tier hierarchy: + +- `Searcher<'a>` is the basic trait that all searchers need to implement. + It contains a `next()` method that returns the `start` and `end` indices of + the next match or reject in the haystack, with the search beginning at the front + (left) of the string. It also contains a `haystack()` getter for returning the + actual haystack, which is the source of the `'a` lifetime on the hierarchy. + The reason for this getter being made part of the trait is twofold: + - Every searcher needs to store some reference to the haystack anyway. + - Users of this trait will need access to the haystack in order + for the individual match results to be useful. +- `ReverseSearcher<'a>` adds an `next_back()` method, for also allowing to efficiently + search in reverse (starting from the right). + However, the results are not required to be equal to the results of + `next()` in reverse, (as would be the case for the `DoubleEndedIterator` trait) + because that can not be efficiently guaranteed for all searchers. (For an example, see further below) +- Instead `DoubleEndedSearcher<'a>` is provided as an marker trait for expressing + that guarantee - If a searcher implements this trait, all results found from the + left need to be equal to all results found from the right in reverse order. + +As an important last detail, both +`Searcher` and `ReverseSearcher` are marked as `unsafe` traits, even though the actual methods +aren't. This is because every implementation of these traits need to ensure that all +indices returned by `next()` and `next_back()` lie on valid utf8 boundaries +in the haystack. + +Without that guarantee, every single match returned by a matcher would need to be +double-checked for validity, which would be unnecessary and most likely +unoptimizable work. + +This is in contrast to the current hardcoded implementations, which can +make use of such guarantees because the concrete types are known +and all unsafe code needed for such optimizations is contained inside a single safe impl. 
+ +Given that most implementations of these traits will likely +live in the std library anyway, and are thoroughly tested, marking these traits `unsafe` +doesn't seem like a huge burden to bear for good, optimizable performance. + +### The role of the additional default methods + +`Pattern`, `Searcher` and `ReverseSearcher` each offer a few additional +default methods that give better optimization opportunities. + +Most consumers of the pattern API will use them to more narrowly constraint +how they are looking for a pattern, which given an optimized implementantion, +should lead to mostly optimal code being generated. + +### Example for the issue with double-ended searching + +Let the haystack be the string `"fooaaaaabar"`, and let the pattern be the string `"aa"`. + +Then a efficient, lazy implementation of the matcher searching from the left +would find these matches: + +`"foo[aa][aa]abar"` + +However, the same algorithm searching from the right would find these matches: + +`"fooa[aa][aa]bar"` + +This discrepancy can not be avoided without additional overhead or even +allocations for caching in the reverse matcher, and thus "matching from the front" needs to +be considered a different operation than "matching from the back". + +### Why `(uint, uint)` instead of `&str` + +> Note: This section is a bit outdated now + +It would be possible to define `next` and `next_back` to return `&str`s instead of `(uint, uint)` tuples. + +A concrete searcher impl could then make use of unsafe code to construct such an slice cheaply, +and by its very nature it is guaranteed to lie on utf8 boundaries, +which would also allow not marking the traits as unsafe. + +However, this approach has a couple of issues. For one, not every consumer of +this API cares about only the matched slice itself: + +- The `split()` family of operations cares about the slices _between_ matches. +- Operations like `match_indices()` and `find()` need to actually return the offset + to the start of the string as part of their definition. +- The `trim()` and `Xs_with()` family of operations need to compare individual match + offsets with each other and the start and end of the string. + +In order for these use cases to work with a `&str` match, the concrete adapters +would need to unsafely calculate the offset of a match `&str` to the start of the haystack `&str`. + +But that in turn would require matcher implementors to only return actual sub slices into +the haystack, and not random `static` string slices, as the API defined with `&str` would allow. + +In order to resolve that issue, you'd have to do one of: + +- Add the uncheckable API constraint of only requiring true subslices, which would make the traits + unsafe again, negating much of the benefit. +- Return a more complex custom slice type that still contains the haystack offset. + (This is listed as an alternative at the end of this RFC.) + +In both cases, the API does not really improve significantly, so `uint` indices have been chosen +as the "simple" default design. + +## New methods on `StrExt` + +With the `Pattern` and `Searcher` traits defined and implemented, the actual `str` +methods will be changed to make use of them: + +```rust +pub trait StrExt for ?Sized { + fn contains<'a, P>(&'a self, pat: P) -> bool where P: Pattern<'a>; + + fn split<'a, P>(&'a self, pat: P) -> Splits

where P: Pattern<'a>; + fn rsplit<'a, P>(&'a self, pat: P) -> RSplits

where P: Pattern<'a>; + fn split_terminator<'a, P>(&'a self, pat: P) -> TermSplits

where P: Pattern<'a>; + fn rsplit_terminator<'a, P>(&'a self, pat: P) -> RTermSplits

where P: Pattern<'a>; + fn splitn<'a, P>(&'a self, pat: P, n: uint) -> NSplits

where P: Pattern<'a>; + fn rsplitn<'a, P>(&'a self, pat: P, n: uint) -> RNSplits

where P: Pattern<'a>; + + fn matches<'a, P>(&'a self, pat: P) -> Matches

where P: Pattern<'a>; + fn rmatches<'a, P>(&'a self, pat: P) -> RMatches

where P: Pattern<'a>; + fn match_indices<'a, P>(&'a self, pat: P) -> MatchIndices

where P: Pattern<'a>; + fn rmatch_indices<'a, P>(&'a self, pat: P) -> RMatchIndices

where P: Pattern<'a>; + + fn starts_with<'a, P>(&'a self, pat: P) -> bool where P: Pattern<'a>; + fn ends_with<'a, P>(&'a self, pat: P) -> bool where P: Pattern<'a>, + P::Searcher: ReverseSearcher<'a>; + + fn trim_matches<'a, P>(&'a self, pat: P) -> &'a str where P: Pattern<'a>, + P::Searcher: DoubleEndedSearcher<'a>; + fn trim_left_matches<'a, P>(&'a self, pat: P) -> &'a str where P: Pattern<'a>; + fn trim_right_matches<'a, P>(&'a self, pat: P) -> &'a str where P: Pattern<'a>, + P::Searcher: ReverseSearcher<'a>; + + fn find<'a, P>(&'a self, pat: P) -> Option where P: Pattern<'a>; + fn rfind<'a, P>(&'a self, pat: P) -> Option where P: Pattern<'a>, + P::Searcher: ReverseSearcher<'a>; + + // ... +} +``` + +These are mainly the same pattern-using methods as currently existing, only +changed to uniformly use the new pattern API. The main differences are: + +- Duplicates like `contains(char)` and `contains_str(&str)` got merged into single generic methods. +- `CharEq`-centric naming got changed to `Pattern`-centric naming by changing `chars` + to `matches` in a few method names. +- A `Matches` iterator has been added, that just returns the pattern matches as `&str` slices. + Its uninteresting for patterns that look for a single string fragment, like the `char` and `&str` + matcher, but useful for advanced patterns like predicates over codepoints, or regular expressions. +- All operations that can work from both the front and the back consistently exist in two versions, + the regular front version, and a `r` prefixed reverse versions. As explained above, + this is because both represent different operations, and thus need to be handled as such. + To be more precise, the two can __not__ be abstracted over by providing a `DoubleEndedIterator` + implementations, as the different results would break the requirement for double ended iterators + to behave like a double ended queues where you just pop elements from both sides. + +_However_, all iterators will still implement `DoubleEndedIterator` if the underlying +matcher implements `DoubleEndedSearcher`, to keep the ability to do things like `foo.split('a').rev()`. + +## Transition and deprecation plans + +Most changes in this RFC can be made in such a way that code using the old hardcoded or `CharEq`-using +methods will still compile, or give deprecation warning. + +It would even be possible to generically implement `Pattern` for all `CharEq` types, +making the transition more painless. + +Long-term, post 1.0, it would be possible to define new sets of `Pattern` and `Searcher` +without a lifetime parameter by making use of higher kinded types in order to simplify the +string APIs. Eg, instead of `fn starts_with<'a, P>(&'a self, pat: P) -> bool where P: Pattern<'a>;` +you'd have `fn starts_with

(&self, pat: P) -> bool where P: Pattern;`. + +In order to not break backwards-compability, these can use the same generic-impl trick to +forward to the old traits, which would roughly look like this: + +```rust +unsafe trait NewPattern { + type Searcher<'a> where Searcher: NewSearcher; + + fn into_matcher<'a>(self, s: &'a str) -> Self::Searcher<'a>; +} + +unsafe impl<'a, P> Pattern<'a> for P where P: NewPattern { + type Searcher = ::Searcher<'a>; + + fn into_matcher(self, haystack: &'a str) -> Self::Searcher { + ::into_matcher(self, haystack) + } +} + +unsafe trait NewSearcher for Self<'_> { + fn haystack<'a>(self: &Self<'a>) -> &'a str; + fn next_match<'a>(self: &mut Self<'a>) -> Option<(uint, uint)>; +} + +unsafe impl<'a, M> Searcher<'a> for M<'a> where M: NewSearcher { + fn haystack(&self) -> &'a str { + ::haystack(self) + } + fn next_match(&mut self) -> Option<(uint, uint)> { + ::next_match(self) + } +} +``` + +Based on coherency experiments and assumptions about how future HKT will work, +the author is assuming that the above implementation will work, but can not experimentally prove it. + +> Note: There might be still an issue with this upgrade path on the concrete iterator types. + That is, `Split

` might turn into `Split<'a, P>`... Maybe require the `'a` from the beginning? + +In order for these new traits to fully replace the old ones without getting in their way, +the old ones need to not be defined in a way that makes them "final". +That is, they should be defined in their own submodule, like `str::pattern` that can grow +a sister module like `str::newpattern`, and not be exported in a global place like `str` or even +the `prelude` (which would be unneeded anyway). + +# Drawbacks + +- It complicates the whole machinery and API behind the implementation of matching on string patterns. +- The no-HKT-lifetime-workaround wart might be to confusing for something as commonplace as the string API. +- This add a few layers of generics, so compilation times and micro optimizations might suffer. + +# Alternatives + +> Note: This section is not updated to the new naming scheme + +In general: + +- Keep status quo, with all issues listed at the beginning. +- Stabilize on hardcoded variants, eg providing both `contains` and `contains_str`. + Similar to status quo, but no `CharEq` and thus no generics. + +Under the assumption that the lifetime parameter on the traits in this proposal +is too big a wart to have in the release string API, there is an primary alternative +that would avoid it: + +- Stabilize on a variant around `CharEq` - This would mean hardcoded `_str` methods, + generic `CharEq` methods, and no extensibility to types like `Regex`, but has a + upgrade path for later upgrading `CharEq` to a full-fledged, HKT-using `Pattern` API, by providing + back-comp generic impls. + +Next, there are alternatives that might make a positive difference in the authors opinion, but still have +some negative trade-offs: + +- With the `Matcher` traits having the unsafe constraint of returning results unique to the + current haystack already, they could just directly return a `(*const u8, *const u8)` pointing into it. + This would allow a few more micro-optimizations, as now the `matcher -> match -> final slice` + pipeline would no longer need to keep adding and subtracting the start address of the haystack + for immediate results. +- Extend `Pattern` into `Pattern` and `ReversePattern`, starting the forward-reverse split at the level of + patterns directly. The two would still be in a inherits-from relationship like + `Matcher` and `ReverseSearcher`, and be interchangeable if the later also implement `DoubleEndedSearcher`, + but on the `str` API where clauses like `where P: Pattern<'a>, P::Searcher: ReverseSearcher<'a>` + would turn into `where P: ReversePattern<'a>`. + +Lastly, there are alternatives that don't seem very favorable, but are listed for completeness sake: + +- Remove `unsafe` from the API by returning a special `SubSlice<'a>` type instead of `(uint, uint)` in each + match, that wraps the haystack and the + current match as a `(*start, *match_start, *match_end, *end)` pointer quad. It is unclear whether + those two additional words per match end up being an issue after monomorphization, but two of them + will be constant for the duration of the iteration, so changes are they won't matter. + The `haystack()` could also be removed that way, as each match already returns the haystack. + However, this still prevents removal of the lifetime parameters without HKT. +- Remove the lifetimes on `Matcher` and `Pattern` by requiring users of the API to store the haystack slice + themselves, duplicating it in the in-memory representation. 
+ However, this still runs into HKT issues with the impl of `Pattern`. +- Remove the lifetime parameter on `Pattern` and `Matcher` by making them fully unsafe API's, + and require implementations to unsafely transmuting back the lifetime of the haystack slice. +- Remove `unsafe` from the API by not marking the `Matcher` traits as `unsafe`, requiring users of the API + to explicitly check every match on validity in regard to utf8 boundaries. +- Allow to opt-in the `unsafe` traits by providing parallel safe and unsafe `Matcher` traits or methods, + with the one per default implemented in terms of the other. + +# Unresolved questions + +- Concrete performance is untested compared to the current situation. +- Should the API split in regard to forward-reverse matching be as symmetrical as possible, + or as minimal as possible? + In the first case, iterators like `Matches` and `RMatches` could both implement `DoubleEndedIterator` if a + `DoubleEndedSearcher` exists, in the latter only `Matches` would, with `RMatches` only providing the + minimum to support reverse operation. + A ruling in favor of symmetry would also speak for the `ReversePattern` alternative. + +# Additional extensions + +A similar abstraction system could be implemented for `String` APIs, so that for example `string.push("foo")`, +`string.push('f')`, `string.push('f'.to_ascii())` all work by using something like a `StringSource` trait. + +This would allow operations like `s.replace(®ex!(...), "foo")`, +which would be a method generic over both the pattern matched and the string fragment it gets replaced with: + +```rust +fn replace(&mut self, pat: P, with: S) where P: Pattern, S: StringSource { /* ... */ } +``` diff --git a/text/0529-conversion-traits.md b/text/0529-conversion-traits.md new file mode 100644 index 00000000000..6e2e6b878dd --- /dev/null +++ b/text/0529-conversion-traits.md @@ -0,0 +1,548 @@ +- Feature Name: convert +- Start Date: 2014-11-21 +- RFC PR: [rust-lang/rfcs#529](https://github.com/rust-lang/rfcs/pull/529) +- Rust Issue: [rust-lang/rust#23567](https://github.com/rust-lang/rust/issues/23567) + +# Summary + +This RFC proposes several new *generic conversion* traits. The +motivation is to remove the need for ad hoc conversion traits (like +`FromStr`, `AsSlice`, `ToSocketAddr`, `FromError`) whose *sole role* +is for generics bounds. Aside from cutting down on trait +proliferation, centralizing these traits also helps the ecosystem +avoid incompatible ad hoc conversion traits defined downstream from +the types they convert to or from. It also future-proofs against +eventual language features for ergonomic conversion-based overloading. + +# Motivation + +The idea of generic conversion traits has come up from +[time](https://github.com/rust-lang/rust/issues/7080) +[to](http://discuss.rust-lang.org/t/pre-rfc-add-a-coerce-trait-to-get-rid-of-the-as-slice-calls/415) +[time](http://discuss.rust-lang.org/t/pre-rfc-remove-fromerror-trait-add-from-trait/783/3), +and now that multidispatch is available they can be made to work +reasonably well. They are worth considering due to the problems they +solve (given below), and considering *now* because they would obsolete +several ad hoc conversion traits (and several more that are in the +pipeline) for `std`. + +## Problem 1: overloading over conversions + +Rust does not currently support arbitrary, implicit conversions -- and +for some good reasons. 
However, it is sometimes important +ergonomically to allow a single function to be *explicitly* overloaded +based on conversions. + +For example, the +[recently proposed path APIs](https://github.com/rust-lang/rfcs/pull/474) +introduce an `AsPath` trait to make various path operations ergonomic: + +```rust +pub trait AsPath { + fn as_path(&self) -> &Path; +} + +impl Path { + ... + + pub fn join(&self, path: &P) -> PathBuf { ... } +} +``` + +The idea in particular is that, given a path, you can join using a +string literal directly. That is: + +```rust +// write this: +let new_path = my_path.join("fixed_subdir_name"); + +// not this: +let new_path = my_path.join(Path::new("fixed_subdir_name")); +``` + +It's a shame to have to introduce new ad hoc traits every time such an +overloading is desired. And because the traits are ad hoc, it's also +not possible to program generically over conversions themselves. + +## Problem 2: duplicate, incompatible conversion traits + +There's a somewhat more subtle problem compounding the above: if the +author of the path API neglects to include traits like `AsPath` for +its core types, but downstream crates want to overload on those +conversions, those downstream crates may each introduce their own +conversion traits, which will not be compatible with one another. + +Having standard, generic conversion traits cuts down on the total +number of traits, and also ensures that all Rust libraries have an +agreed-upon way to talk about conversions. + +## Non-goals + +When considering the design of generic conversion traits, it's +tempting to try to do away will *all* ad hoc conversion methods. That +is, to replace methods like `to_string` and `to_vec` with a single +method `to::` and `to::>`. + +Unfortunately, this approach carries several ergonomic downsides: + +* The required `::< _ >` syntax is pretty unfriendly. Something like + `to` would be much better, but is unlikely to happen given + the current grammar. + +* Designing the traits to allow this usage is surprisingly subtle -- + it effectively requires *two traits* per type of generic conversion, + with blanket `impl`s mapping one to the other. Having such + complexity for *all conversions* in Rust seems like a non-starter. + +* Discoverability suffers somewhat. Looking through a method list and + seeing `to_string` is easier to comprehend (for newcomers + especially) than having to crawl through the `impl`s for a trait on + the side -- especially given the trait complexity mentioned above. + +Nevertheless, this is a serious alternative that will be laid out in +more detail below, and merits community discussion. + +# Detailed design + +## Basic design + +The design is fairly simple, although perhaps not as simple as one +might expect: we introduce a total of *four* traits: + +```rust +trait AsRef { + fn as_ref(&self) -> &T; +} + +trait AsMut { + fn as_mut(&mut self) -> &mut T; +} + +trait Into { + fn into(self) -> T; +} + +trait From { + fn from(T) -> Self; +} +``` + +The first three traits mirror our `as`/`into` conventions, but +add a bit more structure to them: `as`-style conversions are from +references to references and `into`-style conversions are between +arbitrary types (consuming their argument). + +A `To` trait, following our `to` conventions and converting from +references to arbitrary types, is possible but is deferred for now. + +The final trait, `From`, mimics the `from` constructors. This trait is +expected to outright replace most custom `from` constructors. See below. 
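
To make that last point concrete, here is a sketch (the `UserId` newtype is purely illustrative) of `From` standing in for a dedicated constructor:

```rust
struct UserId(u64);

// Instead of an ad hoc `UserId::from_u64` constructor, the conversion is
// written once against the generic trait...
impl From<u64> for UserId {
    fn from(raw: u64) -> UserId {
        UserId(raw)
    }
}

// ...and then used like an ordinary constructor:
fn main() {
    let id = UserId::from(42u64);
}
```
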
+ +**Why the reference restrictions?** + +If all of the conversion traits were between arbitrary types, you +would have to use generalized where clauses and explicit lifetimes even for simple cases: + +```rust +// Possible alternative: +trait As { + fn convert_as(self) -> T; +} + +// But then you get this: +fn take_as<'a, T>(t: &'a T) where &'a T: As<&'a MyType>; + +// Instead of this: +fn take_as(t: &T) where T: As; +``` + +If you need a conversion that works over any lifetime, you need to use +higher-ranked trait bounds: + +```rust +... where for<'a> &'a T: As<&'a MyType> +``` + +This case is particularly important when you cannot name a lifetime in +advance, because it will be created on the stack within the +function. It might be possible to add sugar so that `where &T: +As<&MyType>` expands to the above automatically, but such an elision +might have other problems, and in any case it would preclude writing +direct bounds like `fn foo`. + +The proposed trait definition essentially *bakes in* the needed +lifetime connection, capturing the most common mode of use for +`as`/`to`/`into` conversions. In the future, an HKT-based version of +these traits could likely generalize further. + +**Why have multiple traits at all**? + +The biggest reason to have multiple traits is to take advantage of the +lifetime linking explained above. In addition, however, it is a basic +principle of Rust's libraries that conversions are distinguished by +cost and consumption, and having multiple traits makes it possible to +(by convention) restrict attention to e.g. "free" `as`-style conversions +by bounding only by `AsRef`. + +Why have both `Into` and `From`? There are a few reasons: + +* Coherence issues: the order of the types is significant, so `From` + allows extensibility in some cases that `Into` does not. + +* To match with existing conventions around conversions and + constructors (in particular, replacing many `from` constructors). + +## Blanket `impl`s + +Given the above trait design, there are a few straightforward blanket +`impl`s as one would expect: + +```rust +// AsMut implies Into +impl<'a, T, U> Into<&'a mut U> for &'a mut T where T: AsMut { + fn into(self) -> &'a mut U { + self.as_mut() + } +} + +// Into implies From +impl From for U where T: Into { + fn from(t: T) -> U { t.into() } +} +``` + +## An example + +Using all of the above, here are some example `impl`s and their use: + +```rust +impl AsRef for String { + fn as_ref(&self) -> &str { + self.as_slice() + } +} +impl AsRef<[u8]> for String { + fn as_ref(&self) -> &[u8] { + self.as_bytes() + } +} + +impl Into> for String { + fn into(self) -> Vec { + self.into_bytes() + } +} + +fn main() { + let a = format!("hello"); + let b: &[u8] = a.as_ref(); + let c: &str = a.as_ref(); + let d: Vec = a.into(); +} +``` + +This use of generic conversions within a function body is expected to +be rare, however; usually the traits are used for generic functions: + +``` +impl Path { + fn join_path_inner(&self, p: &Path) -> PathBuf { ... } + + pub fn join_path>(&self, p: &P) -> PathBuf { + self.join_path_inner(p.as_ref()) + } +} +``` + +In this very typical pattern, you introduce an "inner" function that +takes the converted value, and the public API is a thin wrapper around +that. The main reason to do so is to avoid code bloat: given that the +generic bound is used only for a conversion that can be done up front, +there is no reason to monomorphize the entire function body for each +input type. 
+ +### An aside: codifying the generics pattern in the language + +This pattern is so common that we probably want to consider sugar for +it, e.g. something like: + +```rust +impl Path { + pub fn join_path(&self, p: ~Path) -> PathBuf { + ... + } +} +``` + +that would desugar into exactly the above (assuming that the `~` sigil +was restricted to `AsRef` conversions). Such a feature is out of scope +for this RFC, but it's a natural and highly ergonomic extension of the +traits being proposed here. + +## Preliminary conventions + +Would *all* conversion traits be replaced by the proposed ones? +Probably not, due to the combination of two factors (using the example +of `To`, despite its being deferred for now): + +* You still want blanket `impl`s like `ToString` for `Show`, but: +* This RFC proposes that specific conversion *methods* like + `to_string` stay in common use. + +On the other hand, you'd expect a blanket `impl` of `To` for +any `T: ToString`, and one should prefer bounding over `To` +rather than `ToString` for consistency. Basically, the role of +`ToString` is just to provide the ad hoc method name `to_string` in a +blanket fashion. + +So a rough, preliminary convention would be the following: + +* An *ad hoc conversion method* is one following the normal convention + of `as_foo`, `to_foo`, `into_foo` or `from_foo`. A "generic" + conversion method is one going through the generic traits proposed + in this RFC. An *ad hoc conversion trait* is a trait providing an ad + hoc conversion method. + +* Use ad hoc conversion methods for "natural", *outgoing* conversions + that should have easy method names and good discoverability. A + conversion is "natural" if you'd call it directly on the type in + normal code; "unnatural" conversions usually come from generic + programming. + + For example, `to_string` is a natural conversion for `str`, while + `into_string` is not; but the latter is sometimes useful in a + generic context -- and that's what the generic conversion traits can + help with. + +* On the other hand, favor `From` for all conversion constructors. + +* Introduce ad hoc conversion *traits* if you need to provide a + blanket `impl` of an ad hoc conversion method, or need special + functionality. For example, `to_string` needs a trait so that every + `Show` type automatically provides it. + +* For any ad hoc conversion method, *also* provide an `impl` of the + corresponding generic version; for traits, this should be done via a + blanket `impl`. + +* When using generics bounded over a conversion, always prefer to use + the generic conversion traits. For example, bound `S: To` + not `S: ToString`. This encourages consistency, and also allows + clients to take advantage of the various blanket generic conversion + `impl`s. + +* Use the "inner function" pattern mentioned above to avoid code + bloat. + +## Prelude changes + +*All* of the conversion traits are added to the prelude. There are two + reasons for doing so: + +* For `AsRef`/`AsMut`/`Into`, the reasoning is similar to the + inclusion of `PartialEq` and friends: they are expected to appear + ubiquitously as bounds. + +* For `From`, bounds are somewhat less common but the use of the + `from` constructor is expected to be rather widespread. + +# Drawbacks + +There are a few drawbacks to the design as proposed: + +* Since it does not replace all conversion traits, there's the + unfortunate case of having both a `ToString` trait and a + `To` trait bound. 
The proposed conventions go some distance + toward at least keeping APIs consistent, but the redundancy is + unfortunate. See Alternatives for a more radical proposal. + +* It may encourage more overloading over coercions, and also more + generics code bloat (assuming that the "inner function" pattern + isn't followed). Coercion overloading is not necessarily a bad + thing, however, since it is still explicit in the signature rather + than wholly implicit. If we do go in this direction, we can consider + language extensions that make it ergonomic *and* avoid code bloat. + +# Alternatives + +The original form of this RFC used the names `As.convert_as`, +`AsMut.convert_as_mut`, `To.convert_to` and `Into.convert_into` (though +still `From.from`). After discussion `As` was changed to `AsRef`, +removing the keyword collision of a method named `as`, and the +`convert_` prefixes were removed. + +--- + +The main alternative is one that attempts to provide methods that +*completely replace* ad hoc conversion methods. To make this work, a +form of double dispatch is used, so that the methods are added to +*every type* but bounded by a separate set of conversion traits. + +In this strawman proposal, the name "view shift" is used for `as` +conversions, "conversion" for `to` conversions, and "transformation" +for `into` conversions. These names are not too important, but needed +to distinguish the various generic methods. + +The punchline is that, in the end, we can write + +```rust +let s = format!("hello"); +let b = s.shift_view::<[u8]>(); +``` + +or, put differently, replace `as_bytes` with `shift_view::<[u8]>` -- +for better or worse. + +In addition to the rather large jump in complexity, this alternative +design also suffers from poor error messages. For example, if you +accidentally typed `shift_view::` instead, you receive: + +``` +error: the trait `ShiftViewFrom` is not implemented for the type `u8` +``` + +which takes a bit of thought and familiarity with the traits to fully +digest. Taken together, the complexity, error messages, and poor +ergonomics of things like `convert::` rather than `as_bytes` led +the author to discard this alternative design. 
+ +```rust +// VIEW SHIFTS + +// "Views" here are always lightweight, non-lossy, always +// successful view shifts between reference types + +// Immutable views + +trait ShiftViewFrom { + fn shift_view_from(&T) -> &Self; +} + +trait ShiftView { + fn shift_view(&self) -> &T where T: ShiftViewFrom; +} + +impl ShiftView for T { + fn shift_view>(&self) -> &U { + ShiftViewFrom::shift_view_from(self) + } +} + +// Mutable coercions + +trait ShiftViewFromMut { + fn shift_view_from_mut(&mut T) -> &mut Self; +} + +trait ShiftViewMut { + fn shift_view_mut(&mut self) -> &mut T where T: ShiftViewFromMut; +} + +impl ShiftViewMut for T { + fn shift_view_mut>(&mut self) -> &mut U { + ShiftViewFromMut::shift_view_from_mut(self) + } +} + +// CONVERSIONS + +trait ConvertFrom { + fn convert_from(&T) -> Self; +} + +trait Convert { + fn convert(&self) -> T where T: ConvertFrom; +} + +impl Convert for T { + fn convert(&self) -> U where U: ConvertFrom { + ConvertFrom::convert_from(self) + } +} + +impl ConvertFrom for Vec { + fn convert_from(s: &str) -> Vec { + s.to_string().into_bytes() + } +} + +// TRANSFORMATION + +trait TransformFrom { + fn transform_from(T) -> Self; +} + +trait Transform { + fn transform(self) -> T where T: TransformFrom; +} + +impl Transform for T { + fn transform(self) -> U where U: TransformFrom { + TransformFrom::transform_from(self) + } +} + +impl TransformFrom for Vec { + fn transform_from(s: String) -> Vec { + s.into_bytes() + } +} + +impl<'a, T, U> TransformFrom<&'a T> for U where U: ConvertFrom { + fn transform_from(x: &'a T) -> U { + x.convert() + } +} + +impl<'a, T, U> TransformFrom<&'a mut T> for &'a mut U where U: ShiftViewFromMut { + fn transform_from(x: &'a mut T) -> &'a mut U { + ShiftViewFromMut::shift_view_from_mut(x) + } +} + +// Example + +impl ShiftViewFrom for str { + fn shift_view_from(s: &String) -> &str { + s.as_slice() + } +} +impl ShiftViewFrom for [u8] { + fn shift_view_from(s: &String) -> &[u8] { + s.as_bytes() + } +} + +fn main() { + let s = format!("hello"); + let b = s.shift_view::<[u8]>(); +} +``` + +## Possible further work + +We could add a `To` trait. + +```rust +trait To { + fn to(&self) -> T; +} +``` + +As far as blanket `impl`s are concerned, there are a few simple ones: + +```rust +// AsRef implies To +impl<'a, T: ?Sized, U: ?Sized> To<&'a U> for &'a T where T: AsRef { + fn to(&self) -> &'a U { + self.as_ref() + } +} + +// To implies Into +impl<'a, T, U> Into for &'a T where T: To { + fn into(self) -> U { + self.to() + } +} +``` diff --git a/text/0531-define-rfc-scope.md b/text/0531-define-rfc-scope.md new file mode 100644 index 00000000000..686b1fd682a --- /dev/null +++ b/text/0531-define-rfc-scope.md @@ -0,0 +1,49 @@ +- Start Date: 2014-12-18 +- RFC PR: [531](https://github.com/rust-lang/rfcs/pull/531) +- Rust Issue: n/a + +# Summary + +According to current documents, the RFC process is required to make "substantial" changes to the Rust +distribution. It is currently lightweight, but lacks a definition for the Rust distribution. This RFC +aims to amend the process with a both broad and clear definition of "Rust distribution," while still +keeping the process itself in tact. + +# Motivation + +The motivation for this change comes from the recent decision for Crates.io to affirm its first come, +first serve policy. While there was discussion of the matter on a GitHub issue, this discussion was +rather low visibility. 
Regardless of the outcome of this particular decision, it highlights the +fact that there is not a clear place for thorough discussion of policy decisions related to the +outermost parts of Rust. + +# Detailed design + +To remedy this issue, there must be a defined scope for the RFC process. This definition would be +incorporated into the section titled "When you need to follow this process." The goal here is to be as +explicit as possible. This RFC proposes that the scope of the RFC process be defined as follows: + +* Rust +* Cargo +* Crates.io +* The RFC process itself + +This definition explicitly does not include: + +* Other crates maintained under the rust-lang organization, such as time. + +# Drawbacks + +The only particular drawback would be if this definition is too narrow, it might be restrictive. +However, this definition fortunately includes the ability to amend the RFC process. So, this +could be expanded if the need exists. + +# Alternatives + +The alternative is leaving the process as is. However, adding clarity at little to no cost should +be preferred as it lowers the barrier to entry for contributions, and increases the visibility of +potential changes that may have previously been discussed outside of an RFC. + +# Unresolved questions + +Are there other things that should be explicitly included as part of the scope of the RFC process right now? diff --git a/text/0532-self-in-use.md b/text/0532-self-in-use.md new file mode 100644 index 00000000000..1e0859ddd52 --- /dev/null +++ b/text/0532-self-in-use.md @@ -0,0 +1,69 @@ +- Start Date: 2014-12-19 +- RFC PR: [532](https://github.com/rust-lang/rfcs/pull/532) +- Rust Issue: [20361](https://github.com/rust-lang/rust/issues/20361) + +# Summary + +This RFC proposes the `mod` keyword used to refer +the immediate parent namespace in `use` items (`use a::b::{mod, c}`) +to be changed to `self`. + +# Motivation + +While this looks fine: + +````rust +use a::b::{mod, c}; + +pub mod a { + pub mod b { + pub type c = (); + } +} +```` + +This looks strange, since we are not really importing a module: + +````rust +use Foo::{mod, Bar, Baz}; + +enum Foo { Bar, Baz } +```` + +RFC #168 was written when there was no namespaced `enum`, +therefore the choice of the keyword was suboptimal. + +# Detailed design + +This RFC simply proposes to use `self` in place of `mod`. +This should amount to one line change to the parser, +possibly with a renaming of relevant AST node (`PathListMod`). + +# Drawbacks + +`self` is already used to denote a relative path in the `use` item. +While they can be clearly distinguished +(any use of `self` proposed in this RFC will appear inside braces), +this can cause some confusion to beginners. + +# Alternatives + +Don't do this. +Simply accept that `mod` also acts as a general term for namespaces. + +Allow `enum` to be used in place of `mod` when the parent item is `enum`. +This clearly expresses the intent and it doesn't reuse `self`. +However, this is not very future-proof for several reasons. + +* Any item acting as a namespace would need a corresponding keyword. + This is backward compatible but cumbersome. +* If such namespace is not defined with an item but only implicitly, + we may not have a suitable keyword to use. +* We currently import all items sharing the same name (e.g. `struct P(Q);`), + with no way of selectively importing one of them by the item type. + An explicit item type in `use` will imply that we *can* selectively import, + while we actually can't. + +# Unresolved questions + +None. 
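
For reference, under this proposal the two examples from the motivation would be written as:

```rust
use a::b::{self, c};

use Foo::{self, Bar, Baz};
```
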
diff --git a/text/0533-no-array-elem-moves.md b/text/0533-no-array-elem-moves.md
new file mode 100644
index 00000000000..c68271edd4b
--- /dev/null
+++ b/text/0533-no-array-elem-moves.md
@@ -0,0 +1,91 @@
+- Start Date: 2014-12-19
+- RFC PR: [rust-lang/rfcs#533](https://github.com/rust-lang/rfcs/pull/533)
+- Rust Issue: [rust-lang/rust#21963](https://github.com/rust-lang/rust/issues/21963)
+
+# Summary
+
+In order to prepare for an expected future implementation of
+[non-zeroing dynamic drop], remove support for:
+
+* moving individual elements into an *uninitialized* fixed-sized array, and
+
+* moving individual elements out of fixed-sized arrays `[T; n]`
+  (copying and borrowing such elements is still permitted).
+
+[non-zeroing dynamic drop]: https://github.com/rust-lang/rfcs/pull/320
+
+# Motivation
+
+If we want to continue supporting dynamic drop while also removing
+automatic memory zeroing and drop-flags, then we need to either (1.)
+adopt potentially complex code generation strategies to support arrays
+with only *some* elements initialized (as discussed in the [unresolved
+questions for RFC PR 320]), or (2.) remove support for constructing
+such arrays in safe code.
+
+[unresolved questions for RFC PR 320]: https://github.com/pnkfelix/rfcs/blob/6288739c584ee6830aa0f79f983c5e762269c562/active/0000-nonzeroing-dynamic-drop.md#how-to-handle-moves-out-of-arrayindex_expr
+
+This RFC proposes the second tack.
+
+The expectation is that relatively few libraries need to support
+moving out of fixed-sized arrays (and even fewer take advantage of
+being able to initialize individual elements of an uninitialized
+array, as supporting this was almost certainly not intentional in the
+language design). Therefore removing the feature from the language
+will present relatively little burden.
+
+# Detailed design
+
+If an expression `e` has type `[T; n]` and `T` does not implement
+`Copy`, then it will be illegal to use `e[i]` in an r-value position.
+
+If an expression `e` has type `[T; n]`, the expression `e[i] = <expr>`
+will be made illegal at points in the control flow where `e` has not
+yet been initialized.
+
+Note that it *remains* legal to overwrite an element of an initialized
+array, `e[i] = <expr>`, as today. This will continue to drop the
+overwritten element before moving the result of `<expr>` into place.
+
+Note also that the proposed change has no effect on the semantics of
+destructuring bind; i.e. `fn([a, b, c]: [Elem; 3]) { ... }` will
+continue to work as much as it does today.
+
+A prototype implementation has been posted at [Rust PR 21930].
+
+[Rust PR 21930]: https://github.com/rust-lang/rust/pull/21930
+
+# Drawbacks
+
+* Adopting this RFC introduces a limitation on the language based on a
+  hypothetical optimization that has not yet been implemented (though
+  much of the ground work for its supporting analyses is done).
+
+Also, as noted in the [comment thread from RFC PR 320]:
+
+[comment thread from RFC PR 320]: https://github.com/rust-lang/rfcs/pull/320#issuecomment-59533551
+
+* We support moving a single element out of an n-tuple, and "by
+  analogy" we should support moving out of `[T; n]`.
+  (Note that one can still move out of `[T; n]` in some cases
+  via destructuring bind.)
+
+* It is "nice" to be able to write
+
+  ```rust
+  fn grab_random_from(actions: [Action; 5]) -> Action { actions[rand_index()] }
+  ```
+
+  To express this now, one would be forced to instead use `clone()` (or
+  pass in a `Vec<Action>` and do some element swapping).
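+
+  As a concrete sketch of that workaround (an illustrative addition, not part
+  of the original RFC text; it assumes `Action: Clone` and the same
+  hypothetical `rand_index()` helper as in the example above), one would clone
+  the selected element instead of moving it out of the array:
+
+  ```rust
+  // Clone the chosen element rather than moving it out of `[Action; 5]`.
+  fn grab_random_from(actions: [Action; 5]) -> Action {
+      actions[rand_index()].clone()
+  }
+  ```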
+ + +# Alternatives + +We can just leave things as they are; there are hypothetical +code-generation strategies for supporting non-zeroing drop even with +this feature, as discussed in the [comment thread from RFC PR 320]. + +# Unresolved questions + +None + diff --git a/text/0534-deriving2derive.md b/text/0534-deriving2derive.md new file mode 100644 index 00000000000..8f710a039ec --- /dev/null +++ b/text/0534-deriving2derive.md @@ -0,0 +1,31 @@ +- Start Date: 2014-19-19 +- RFC PR: [534](https://github.com/rust-lang/rfcs/pull/534) +- Rust Issue: [20362](https://github.com/rust-lang/rust/issues/20362) + +# Summary + +Rename the `#[deriving(Foo)]` syntax extension to `#[derive(Foo)]`. + +# Motivation + +Unlike our other verb-based attribute names, "deriving" stands alone as a +present participle. By convention our attributes prefer "warn" rather than +"warning", "inline" rather than "inlining", "test" rather than "testing", and so +on. We also have a trend against present participles in general, such as with +`Encoding` being changed to `Encode`. + +It's also shorter to type, which is very important in a world without implicit +Copy implementations. + +Finally, if I may be subjective, `derive(Thing1, Thing2)` simply reads better +than `deriving(Thing1, Thing2)`. + +# Detailed design + +Rename the `deriving` attribute to `derive`. This should be a very simple find- +and-replace. + +# Drawbacks + +Participles the world over will lament the loss of their only foothold in this +promising young language. diff --git a/text/0544-rename-int-uint.md b/text/0544-rename-int-uint.md new file mode 100644 index 00000000000..4ee61964487 --- /dev/null +++ b/text/0544-rename-int-uint.md @@ -0,0 +1,250 @@ +- Start Date: 2014-12-28 +- RFC PR #: [rust-lang/rfcs#544](https://github.com/rust-lang/rfcs/pull/544) +- Rust Issue #: [rust-lang/rust#20639](https://github.com/rust-lang/rust/issues/20639) + +# Summary + +This RFC proposes that we rename the pointer-sized integer types `int/uint`, so as to avoid misconceptions and misuses. After extensive community discussions and several revisions of this RFC, the finally chosen names are `isize/usize`. + +# Motivation + +Currently, Rust defines two [machine-dependent integer types](http://doc.rust-lang.org/reference.html#machine-dependent-integer-types) `int/uint` that have the same number of bits as the target platform's pointer type. These two types are used for many purposes: indices, counts, sizes, offsets, etc. + +The problem is, `int/uint` *look* like default integer types, but pointer-sized integers are not good defaults, and it is desirable to discourage people from overusing them. + +And it is a quite popular opinion that, the best way to discourage their use is to rename them. + +Previously, the latest renaming attempt [RFC PR 464](https://github.com/rust-lang/rfcs/pull/464) was rejected. (Some parts of this RFC is based on that RFC.) [A tale of two's complement](http://discuss.rust-lang.org/t/a-tale-of-twos-complement/1062) states the following reasons: + +- Changing the names would affect literally every Rust program ever written. +- Adjusting the guidelines and tutorial can be equally effective in helping people to select the correct type. +- All the suggested alternative names have serious drawbacks. + +However: + +Rust was and is undergoing quite a lot of breaking changes. Even though the `int/uint` renaming will "break the world", it is not unheard of, and it is mainly a "search & replace". 
Also, a transition period can be provided, during which `int/uint` can be deprecated, while the new names can take time to replace them. So "to avoid breaking the world" shouldn't stop the renaming. + +`int/uint` have a long tradition of being the default integer type names, so programmers *will* be tempted to use them in Rust, even the experienced ones, no matter what the documentation says. The semantics of `int/uint` in Rust is quite different from that in many other mainstream languages. Worse, the Swift programming language, which is heavily influenced by Rust, has the types `Int/UInt` with *almost* the *same semantics* as Rust's `int/uint`, but it *actively encourages* programmers to use `Int` as much as possible. From [the Swift Programming Language](https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/TheBasics.html#//apple_ref/doc/uid/TP40014097-CH5-ID319): + +> Swift provides an additional integer type, Int, which has the same size as the current platform’s native word size: ... + +> Swift also provides an unsigned integer type, UInt, which has the same size as the current platform’s native word size: ... + +> Unless you need to work with a specific size of integer, always use Int for integer values in your code. This aids code consistency and interoperability. + +> Use UInt only when you specifically need an unsigned integer type with the same size as the platform’s native word size. If this is not the case, Int is preferred, even when the values to be stored are known to be non-negative. + +Thus, it is very likely that newcomers will come to Rust, expecting `int/uint` to be the preferred integer types, *even if they know that they are pointer-sized*. + +Not renaming `int/uint` violates the principle of least surprise, and is not newcomer friendly. + +Before the rejection of [RFC PR 464](https://github.com/rust-lang/rfcs/pull/464), the community largely settled on two pairs of candidates: `imem/umem` and `iptr/uptr`. As stated in previous discussions, the names have some drawbacks that may be unbearable. (Please refer to [A tale of two's complement](http://discuss.rust-lang.org/t/a-tale-of-twos-complement/1062) and related discussions for details.) + +This RFC originally proposed a new pair of alternatives `intx/uintx`. + +However, given the discussions about the previous revisions of this RFC, and the discussions in [Restarting the `int/uint` Discussion]( http://discuss.rust-lang.org/t/restarting-the-int-uint-discussion/1131), this RFC author (@CloudiDust) now believes that `intx/uintx` are not ideal. Instead, one of the other pairs of alternatives should be chosen. The finally chosen names are `isize/usize`. + +# Detailed Design + +- Rename `int/uint` to `isize/usize`, with them being their own literal suffixes. +- Update code and documentation to use pointer-sized integers more narrowly for their intended purposes. Provide a deprecation period to carry out these updates. + +`usize` in action: + +```rust +fn slice_or_fail<'b>(&'b self, from: &usize, to: &usize) -> &'b [T] +``` + +There are different opinions about which literal suffixes to use. The following section would discuss the alternatives. + +## Choosing literal suffixes: + +### `isize/usize`: + +* Pros: They are the same as the type names, very consistent with the rest of the integer primitives. +* Cons: They are too long for some, and may stand out too much as suffixes. However, discouraging people from overusing `isize/usize` is the point of this RFC. 
And if they are not overused, then this will not be a problem in practice. + +### `is/us`: + +* Pros: They are succinct as suffixes. +* Cons: They are actual English words, with `is` being a keyword in many programming languages and `us` being an abbreviation of "unsigned" (losing information) or "microsecond" (misleading). Also, `is/us` may be *too* short (shorter than `i64/u64`) and *too* pleasant to use, which can be a problem. + +Note: No matter which suffixes get chosen, it can be beneficial to reserve `is` as a keyword, but this is outside the scope of this RFC. + +### `iz/uz`: + +* Pros and cons: Similar to those of `is/us`, except that `iz/uz` are not actual words, which is an additional advantage. However it may not be immediately clear that `iz/uz` are abbreviations of `isize/usize`. + +### `i/u`: + +* Pros: They are very succinct. +* Cons: They are *too* succinct and carry the "default integer types" connotation, which is undesirable. + +### `isz/usz`: + +* Pros: They are the middle grounds between `isize/usize` and `is/us`, neither too long nor too short. They are not actual English words and it's clear that they are short for `isize/usize`. +* Cons: Not everyone likes the appearances of `isz/usz`, but this can be said about all the candidates. + +After community discussions, it is deemed that using `isize/usize` directly as suffixes is a fine choice and there is no need to introduce other suffixes. + +## Advantages of `isize/usize`: + +- The names indicate their common use cases (container sizes/indices/offsets), so people will know where to use them, instead of overusing them everywhere. +- The names follow the `i/u + {suffix}` pattern that is used by all the other primitive integer types like `i32/u32`. +- The names are newcomer friendly and have familiarity advantage over almost all other alternatives. +- The names are easy on the eyes. + +See **Alternatives B to L** for the alternatives to `isize/usize` that have been rejected. + +# Drawbacks + +## Drawbacks of the renaming in general: + +- Renaming `int`/`uint` requires changing much existing code. On the other hand, this is an ideal opportunity to fix integer portability bugs. + +## Drawbacks of `isize/usize`: + +- The names fail to indicate the precise semantics of the types - *pointer-sized integers*. (And they don't follow the `i32/u32` pattern as faithfully as possible, as `32` indicates the exact size of the types, but `size` in `isize/usize` is vague in this aspect.) +- The names favour some of the types' use cases over the others. +- The names remind people of C's `ssize_t/size_t`, but `isize/usize` don't share the exact same semantics with the C types. + +Familiarity is a double edged sword here. `isize/usize` are chosen not because they are perfect, but because they represent a good compromise between semantic accuracy, familiarity and code readability. Given good documentation, the drawbacks listed here may not matter much in practice, and the combined familiarity and readability advantage outweighs them all. + +# Alternatives + +## A. Keep the status quo: + +Which may hurt in the long run, especially when there is at least one (would-be?) high-profile language (which is Rust-inspired) taking the opposite stance of Rust. + +The following alternatives make different trade-offs, and choosing one would be quite a subjective matter. But they are all better than the status quo. + +## B. `iptr/uptr`: + +- Pros: "Pointer-sized integer", exactly what they are. 
+- Cons: C/C++ have `intptr_t/uintptr_t`, which are typically *only* used for storing casted pointer values. We don't want people to confuse the Rust types with the C/C++ ones, as the Rust ones have more typical use cases. Also, people may wonder why all data structures have "pointers" in their method signatures. Besides the "funny-looking" aspect, the names may have an incorrect "pointer fiddling and unsafe staff" connotation there, as `ptr` isn't usually seen in safe Rust code. + +In the following snippet: + +```rust +fn slice_or_fail<'b>(&'b self, from: &uptr, to: &uptr) -> &'b [T] +``` + +It feels like working with pointers, not integers. + +## C. `imem/umem`: + +When originally proposed, `mem`/`m` are interpreted as "memory numbers" (See @1fish2's comment in [RFC PR 464](https://github.com/rust-lang/rfcs/pull/464)): + +> `imem`/`umem` are "memory numbers." They're good for indexes, counts, offsets, sizes, etc. As memory numbers, it makes sense that they're sized by the address space. + +However this interpretation seems vague and not quite convincing, especially when all other integer types in Rust are named precisely in the "`i`/`u` + `{size}`" pattern, with no "indirection" involved. What is "memory-sized" anyway? But actually, they can be interpreted as **_mem_ory-pointer-sized**, and be a *precise* size specifier just like `ptr`. + +- Pros: Types with similar names do not exist in mainstream languages, so people will not make incorrect assumptions. +- Cons: `mem` -> *memory-pointer-sized* is definitely not as obvious as `ptr` -> *pointer-sized*. The unfamiliarity may turn newcomers away from Rust. + +Also, for some, `imem/umem` just don't feel like integers no matter how they are interpreted, especially under certain circumstances. In the following snippet: + +```rust +fn slice_or_fail<'b>(&'b self, from: &umem, to: &umem) -> &'b [T] +``` + +`umem` still feels like a pointer-like construct here (from "some memory" to "some other memory"), even though it doesn't have `ptr` in its name. + +## D. `intp/uintp` and `intm/uintm`: + +Variants of Alternatives B and C. Instead of stressing the `ptr` or `mem` part, they stress the `int` or `uint` part. + +They are more integer-like than `iptr/uptr` or `imem/umem` if one knows where to split the words. + +The problem here is that they don't strictly follow the `i/u + {size}` pattern, are of different lengths, and the more frequently used type `uintp`(`uintm`) has a longer name. Granted, this problem already exists with `int/uint`, but those two are names that everyone is familiar with. + +So they may not be as pretty as `iptr/uptr` or `imem/umem`. + +```rust +fn slice_or_fail<'b>(&'b self, from: &uintm, to: &uintm) -> &'b [T] +fn slice_or_fail<'b>(&'b self, from: &uintp, to: &uintp) -> &'b [T] +``` + +## E. `intx/uintx`: + +The original proposed names of this RFC, where `x` means "unknown/variable/platform-dependent". + +They share the same problems with `intp/uintp` and `intm/uintm`, while *in addition* failing to be specific enough. There are other kinds of platform-dependent integer types after all (like register-sized ones), so which ones are `intx/uintx`? + +## F. `idiff/usize`: + +There is a problem with `isize`: it most likely will remind people of C/C++ `ssize_t`. But `ssize_t` is in the POSIX standard, not the C/C++ ones, and is *not for index offsets* according to POSIX. The correct type for index offsets in C99 is `ptrdiff_t`, so for a type representing offsets, `idiff` may be a better name. 
+ +However, `isize/usize` have the advantage of being symmetrical, and ultimately, even with a name like `idiff`, some semantic mismatch between `idiff` and `ptrdiff_t` would still exist. Also, for fitting a casted pointer value, a type named `isize` is better than one named `idiff`. (Though both would lose to `iptr`.) + +## G. `iptr/uptr` *and* `idiff/usize`: + +Rename `int/uint` to `iptr/uptr`, with `idiff/usize` being aliases and used in container method signatures. + +This is for addressing the "not enough use cases covered" problem. Best of both worlds at the first glance. + +`iptr/uptr` will be used for storing casted pointer values, while `idiff/usize` will be used for offsets and sizes/indices, respectively. + +`iptr/uptr` and `idiff/usize` may even be treated as different types to prevent people from accidentally mixing their usage. + +This will bring the Rust type names quite in line with the standard C99 type names, which may be a plus from the familiarity point of view. + +However, this setup brings two sets of types that share the same underlying representations. C distinguishes between `size_t`/`uintptr_t`/`intptr_t`/`ptrdiff_t` not only because they are used under different circumstances, but also because the four may have representations that are potentially different from *each other* on some architectures. Rust assumes a flat memory address space and its `int/uint` types don't exactly share semantics with any of the C types if the C standard is strictly followed. + +Thus, even introducing four names would not fix the "failing to express the precise semantics of the types" problem. Rust just doesn't need to, and *shouldn't* distinguish between `iptr/idiff` and `uptr/usize`, doing so would bring much confusion for very questionable gain. + +## H. `isiz/usiz`: + +A pair of variants of `isize/usize`. This author believes that the missing `e` may be enough to warn people that these are not `ssize_t/size_t` with "Rustfied" names. But at the same time, `isiz/usiz` mostly retain the familiarity of `isize/usize`. + +However, `isiz/usiz` still hide the actual semantics of the types, and omitting but a single letter from a word does feel too hack-ish. + +```rust +fn slice_or_fail<'b>(&'b self, from: &usiz, to: &usiz) -> &'b [T] +``` + +## I. `iptr_size/uptr_size`: + +The names are very clear about the semantics, but are also irregular, too long and feel out of place. + +```rust +fn slice_or_fail<'b>(&'b self, from: &uptr_size, to: &uptr_size) -> &'b [T] +``` + +## J. `iptrsz/uptrsz`: + +Clear semantics, but still a bit too long (though better than `iptr_size/uptr_size`), and the `ptr` parts are still a bit concerning (though to a much less extent than `iptr/uptr`). On the other hand, being "a bit too long" may not be a disadvantage here. + +```rust +fn slice_or_fail<'b>(&'b self, from: &uptrsz, to: &uptrsz) -> &'b [T] +``` + +## K. `ipsz/upsz`: + +Now (and only now, which is the problem) it is clear where this pair of alternatives comes from. + +By shortening `ptr` to `p`, `ipsz/upsz` no longer stress the "pointer" parts in anyway. Instead, the `sz` or "size" parts are (comparatively) stressed. Interestingly, `ipsz/upsz` look similar to `isiz/usiz`. + +So this pair of names actually reflects both the precise semantics of "pointer-sized integers" and the fact that they are commonly used for "sizes". However, + +```rust +fn slice_or_fail<'b>(&'b self, from: &upsz, to: &upsz) -> &'b [T] +``` + +`ipsz/upsz` have gone too far. 
They are completely incomprehensible without the documentation. Many rightfully do not like letter soup. The only advantage here is that, no one would be very likely to think he/she is dealing with pointers. `iptrsz/uptrsz` are better in the comprehensibility aspect. + +## L. Others: + +There are other alternatives not covered in this RFC. Please refer to this RFC's comments and [RFC PR 464](https://github.com/rust-lang/rfcs/pull/464) for more. + +# Unresolved questions + +None. Necessary decisions about Rust's general integer type policies have been made in [Restarting the `int/uint` Discussion](http://discuss.rust-lang.org/t/restarting-the-int-uint-discussion/1131). + +# History + +Amended by [RFC 573][573] to change the suffixes from `is` and `us` to +`isize` and `usize`. Tracking issue for this amendment is +[rust-lang/rust#22496](https://github.com/rust-lang/rust/issues/22496). + +[573]: https://github.com/rust-lang/rfcs/pull/573 diff --git a/text/0546-Self-not-sized-by-default.md b/text/0546-Self-not-sized-by-default.md new file mode 100644 index 00000000000..fb08caecc2a --- /dev/null +++ b/text/0546-Self-not-sized-by-default.md @@ -0,0 +1,106 @@ +- Start Date: 2015-01-03 +- RFC PR: [rust-lang/rfcs#546](https://github.com/rust-lang/rfcs/pull/546) +- Rust Issue: [rust-lang/rust#20497](https://github.com/rust-lang/rust/issues/20497) + +# Summary + +1. Remove the `Sized` default for the implicitly declared `Self` + parameter on traits. +2. Make it "object unsafe" for a trait to inherit from `Sized`. + +# Motivation + +The primary motivation is to enable a trait object `SomeTrait` to +implement the trait `SomeTrait`. This was the design goal of enforcing +object safety, but there was a detail that was overlooked, which this +RFC aims to correct. + +Secondary motivations include: + +- More generality for traits, as they are applicable to DST. +- Eliminate the confusing and irregular `impl Trait for ?Sized` + syntax. +- Sidestep questions about whether the `?Sized` default is inherited + like other supertrait bounds that appear in a similar position. + +This change has been implemented. Fallout within the standard library +was quite minimal, since the default only affects default method +implementations. + +# Detailed design + +Currently, all type parameters are `Sized` by default, including the +implicit `Self` parameter that is part of a trait definition. To avoid +the default `Sized` bound on `Self`, one declares a trait as follows +(this example uses the syntax accepted in [RFC 490] but not yet +implemented): + +```rust +trait Foo for ?Sized { ... } +``` + +This syntax doesn't have any other precendent in the language. One +might expect to write: + +```rust +trait Foo : ?Sized { ... } +``` + +However, placing `?Sized` in the supertrait listing raises awkward +questions regarding inheritance. Certainly, when experimenting with +this syntax early on, we found it very surprising that the `?Sized` +bound was "inherited" by subtraits. At the same time, it makes no +sense to inherit, since all that the `?Sized` notation is saying is +"do not add `Sized`", and you can't inherit the absence of a +thing. Having traits simply not inherit from `Sized` by default +sidesteps this problem altogether and avoids the need for a special +syntax to supress the (now absent) default. + +Removing the default also has the benefit of making traits applicable +to more types by default. One particularly useful case is trait +objects. 
We are working towards a goal where the trait object for a +trait `Foo` always implements the trait `Foo`. Because the type `Foo` +is an unsized type, this is naturally not possible if `Foo` inherits +from `Sized` (since in that case every type that implements `Foo` must +also be `Sized`). + +The impact of this change is minimal under the current rules. This is +because it only affects default method implementations. In any actual +impl, the `Self` type is bound to a specific type, and hence it known +whether or not that type is `Sized`. This change has been implemented +and hence the fallout can be seen on [this branch] (specifically, +[this commit] contains the fallout from the standard library). That +same branch also implements the changes needed so that every trait +object `Foo` implements the trait `Foo`. + +[RFC 255]: https://github.com/rust-lang/rfcs/blob/master/text/0255-object-safety.md +[RFC 490]: https://github.com/rust-lang/rfcs/blob/master/text/0490-dst-syntax.md +[this branch]: https://github.com/nikomatsakis/rust/tree/impl-trait-for-trait-2 +[this commit]: https://github.com/nikomatsakis/rust/commit/d08a08ab82031b6f935bdaf160a28d9520ded1ab + +# Drawbacks + +The `Self` parameter is inconsistent with other type parameters if we +adopt this RFC. We believe this is acceptable since it is +syntactically distinguished in other ways (for example, it is not +declared), and the benefits are substantial. + +# Alternatives + +- Leave `Self` as it is. The change to object safety must be made in + any case, which would mean that for a trait object `Foo` to + implement the trait `Foo`, it would have to be declared `trait Foo + for Sized?`. Indeed, that would be necessary even to create a trait + object `Foo`. This seems like an untenable burden, so adopting this + design choice seems to imply reversing the decision that all trait + objects implement their respective traits ([RFC 255]). + +- Remove the `Sized` defaults altogether. This approach is purer, but + the annotation burden is substantial. We continue to experiment in + the hopes of finding an alternative to current blanket default, but + without success thus far (beyond the idea of doing global + inference). + +# Unresolved questions + +- None. diff --git a/text/0550-macro-future-proofing.md b/text/0550-macro-future-proofing.md new file mode 100644 index 00000000000..3cec600ab8c --- /dev/null +++ b/text/0550-macro-future-proofing.md @@ -0,0 +1,641 @@ +- Start Date: 2014-12-21 +- RFC PR: [550](https://github.com/rust-lang/rfcs/pull/550) +- Rust Issues: + - [20563](https://github.com/rust-lang/rust/pull/20563) + - [31135](https://github.com/rust-lang/rust/issues/31135) + +# Summary + +Future-proof the allowed forms that input to an MBE can take by requiring +certain delimiters following NTs in a matcher. In the future, it will be +possible to lift these restrictions backwards compatibly if desired. + +# Key Terminology + +- `macro`: anything invokable as `foo!(...)` in source code. +- `MBE`: macro-by-example, a macro defined by `macro_rules`. +- `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a subportion thereof. +- `macro parser`: the bit of code in the Rust parser that will parse the input using a grammar derived from all of the matchers. +- `fragment`: The class of Rust syntax that a given matcher will accept (or "match"). 
+- `repetition` : a fragment that follows a regular repeating pattern +- `NT`: non-terminal, the various "meta-variables" or repetition matchers that can appear in a matcher, specified in MBE syntax with a leading `$` character. +- `simple NT`: a "meta-variable" non-terminal (further discussion below). +- `complex NT`: a repetition matching non-terminal, specified via Kleene closure operators (`*`, `+`). +- `token`: an atomic element of a matcher; i.e. identifiers, operators, open/close delimiters, *and* simple NT's. +- `token tree`: a tree structure formed from tokens (the leaves), complex NT's, and finite sequences of token trees. +- `delimiter token`: a token that is meant to divide the end of one fragment and the start of the next fragment. +- `separator token`: an optional delimiter token in an complex NT that separates each pair of elements in the matched repetition. +- `separated complex NT`: a complex NT that has its own separator token. +- `delimited sequence`: a sequence of token trees with appropriate open- and close-delimiters at the start and end of the sequence. +- `empty fragment`: The class of invisible Rust syntax that separates tokens, i.e. whitespace, or (in some lexical contexts), the empty token sequence. +- `fragment specifier`: The identifier in a simple NT that specifies which fragment the NT accepts. +- `language`: a context-free language. + +Example: + +```rust +macro_rules! i_am_an_mbe { + (start $foo:expr $($i:ident),* end) => ($foo) +} +``` + +`(start $foo:expr $($i:ident),* end)` is a matcher. The whole matcher +is a delimited sequence (with open- and close-delimiters `(` and `)`), +and `$foo` and `$i` are simple NT's with `expr` and `ident` as their +respective fragment specifiers. + +`$(i:ident),*` is *also* an NT; it is a complex NT that matches a +comma-seprated repetition of identifiers. The `,` is the separator +token for the complex NT; it occurs in between each pair of elements +(if any) of the matched fragment. + +Another example of a complex NT is `$(hi $e:expr ;)+`, which matches +any fragment of the form `hi ; hi ; ...` where `hi +;` occurs at least once. Note that this complex NT does not +have a dedicated separator token. + +(Note that Rust's parser ensures that delimited sequences always occur +with proper nesting of token tree structure and correct matching of open- +and close-delimiters.) + +# Motivation + +In current Rust (version 0.12; i.e. pre 1.0), the `macro_rules` parser is very liberal in what it accepts +in a matcher. This can cause problems, because it is possible to write an +MBE which corresponds to an ambiguous grammar. When an MBE is invoked, if the +macro parser encounters an ambiguity while parsing, it will bail out with a +"local ambiguity" error. As an example for this, take the following MBE: + +```rust +macro_rules! foo { + ($($foo:expr)* $bar:block) => (/*...*/) +}; +``` + +Attempts to invoke this MBE will never succeed, because the macro parser +will always emit an ambiguity error rather than make a choice when presented +an ambiguity. In particular, it needs to decide when to stop accepting +expressions for `foo` and look for a block for `bar` (noting that blocks are +valid expressions). Situations like this are inherent to the macro system. On +the other hand, it's possible to write an unambiguous matcher that becomes +ambiguous due to changes in the syntax for the various fragments. As a +concrete example: + +```rust +macro_rules! 
bar { + ($in:ty ( $($arg:ident)*, ) -> $out:ty;) => (/*...*/) +}; +``` + +When the type syntax was extended to include the unboxed closure traits, +an input such as `FnMut(i8, u8) -> i8;` became ambiguous. The goal of this +proposal is to prevent such scenarios in the future by requiring certain +"delimiter tokens" after an NT. When extending Rust's syntax in the future, +ambiguity need only be considered when combined with these sets of delimiters, +rather than any possible arbitrary matcher. + +---- + +Another example of a potential extension to the language that +motivates a restricted set of "delimiter tokens" is +([postponed][Postponed 961]) [RFC 352][], "Allow loops to return +values other than `()`", where the `break` expression would now accept +an optional input expression: `break `. + + * This proposed extension to the language, combined with the facts that + `break` and `{ ... ? }` are Rust expressions, implies that + `{` should not be in the follow set for the `expr` fragment specifier. + + * Thus in a slightly more ideal world the following program would not be + accepted, because the interpretation of the macro could change if we + were to accept RFC 352: + + ```rust + macro_rules! foo { + ($e:expr { stuff }) => { println!("{:?}", $e) } + } + + fn main() { + loop { foo!(break { stuff }); } + } + ``` + + (in our non-ideal world, the program is legal in Rust versions 1.0 + through at least 1.4) + +[RFC 352]: https://github.com/rust-lang/rfcs/pull/352 + +[Postponed 961]: https://github.com/rust-lang/rfcs/issues/961 + +# Detailed design + +We will tend to use the variable "M" to stand for a matcher, +variables "t" and "u" for arbitrary individual tokens, +and the variables "tt" and "uu" for arbitrary token trees. +(The use of "tt" does present potential ambiguity with its +additional role as a fragment specifier; but it will be clear +from context which interpretation is meant.) + +"SEP" will range over separator tokens, +"OP" over the Kleene operators `*` and `+`, and +"OPEN"/"CLOSE" over matching token pairs surrounding a delimited sequence (e.g. `[` and `]`). + +We also use Greek letters "α" "β" "γ" "δ" to stand for potentially empty +token-tree sequences. (However, the +Greek letter "ε" (epsilon) has a special role in the presentation and +does not stand for a token-tree sequence.) + + * This Greek letter convention is usually just employed when the + presence of a sequence is a technical detail; in particular, when I + wish to *emphasize* that we are operating on a sequence of + token-trees, I will use the notation "tt ..." for the sequence, not + a Greek letter + +Note that a matcher is merely a token tree. A "simple NT", as +mentioned above, is an meta-variable NT; thus it is a +non-repetition. For example, `$foo:ty` is a simple NT but +`$($foo:ty)+` is a complex NT. + +Note also that in the context of this RFC, the term "token" generally +*includes* simple NTs. + +Finally, it is useful for the reader to keep in mind that according to +the definitions of this RFC, no simple NT matches +the empty fragment, and likewise no token matches +the empty fragment of Rust syntax. (Thus, the *only* NT that can match +the empty fragment is a complex NT.) + +## The Matcher Invariant + +This RFC establishes the following two-part invariant for valid matchers + + 1. For any two successive token tree sequences in a matcher `M` + (i.e. `M = ... tt uu ...`), we must have + FOLLOW(`... tt`) ⊇ FIRST(`uu ...`) + + 2. For any separated complex NT in a matcher, `M = ... $(tt ...) 
SEP OP ...`, + we must have + `SEP` ∈ FOLLOW(`tt ...`). + +The first part says that whatever actual token that comes after a +matcher must be somewhere in the predetermined follow set. This +ensures that a legal macro definition will continue to assign the same +determination as to where `... tt` ends and `uu ...` begins, even as +new syntactic forms are added to the language. + +The second part says that a separated complex NT must use a seperator +token that is part of the predetermined follow set for the internal +contents of the NT. This ensures that a legal macro definition will +continue to parse an input fragment into the same delimited sequence +of `tt ...`'s, even as new syntactic forms are added to the language. + +(This is assuming that all such changes are appropriately restricted, +by the definition of FOLLOW below, of course.) + +The above invariant is only formally meaningful if one knows what +FIRST and FOLLOW denote. We address this in the following sections. + +## FIRST and FOLLOW, informally + +FIRST and FOLLOW are defined as follows. + +A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M). + +Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may +also contain a distinguished non-token element ε ("epsilon"), which +indicates that M can match the empty fragment. (But FOLLOW(M) is +always just a set of tokens.) + +Informally: + + * FIRST(M): collects the tokens potentially used first when matching a fragment to M. + + * LAST(M): collects the tokens potentially used last when matching a fragment to M. + + * FOLLOW(M): the set of tokens allowed to follow immediately after some fragment + matched by M. + + In other words: t ∈ FOLLOW(M) if and only if there exists (potentially empty) token sequences α, β, γ, δ where: + * M matches β, + * t matches γ, and + * The concatenation α β γ δ is a parseable Rust program. + +We use the shorthand ANYTOKEN to denote the set of all tokens (including simple NTs). + + * (For example, if any token is legal after a matcher M, then FOLLOW(M) = ANYTOKEN.) + +(To review one's understanding of the above informal descriptions, the +reader at this point may want to jump ahead to the +[examples of FIRST/LAST][examples-of-first-and-last] before reading +their formal definitions.) + +## FIRST, LAST + +Below are formal inductive definitions for FIRST and LAST. + +"A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and +"A \ B" denotes set difference (i.e. all elements of A that are not present +in B). + +FIRST(M), defined by case analysis on the sequence M and the structure +of its first token-tree (if any): + + * if M is the empty sequence, then FIRST(M) = { ε }, + + * if M starts with a token t, then FIRST(M) = { t }, + + (Note: this covers the case where M starts with a delimited + token-tree sequence, `M = OPEN tt ... CLOSE ...`, in which case `t = OPEN` and + thus FIRST(M) = { `OPEN` }.) + + (Note: this critically relies on the property that no simple NT matches the + empty fragment.) + + * Otherwise, M is a token-tree sequence starting with a complex NT: + `M = $( tt ... ) OP α`, or `M = $( tt ... ) SEP OP α`, + (where `α` is the (potentially empty) sequence of token trees for the rest of the matcher). + + * Let sep_set = { SEP } if SEP present; otherwise sep_set = {}. 
+ + * If ε ∈ FIRST(`tt ...`), then FIRST(M) = (FIRST(`tt ...`) \ { ε }) ∪ sep_set ∪ FIRST(`α`) + + * Else if OP = `*`, then FIRST(M) = FIRST(`tt ...`) ∪ FIRST(`α`) + + * Otherwise (OP = `+`), FIRST(M) = FIRST(`tt ...`) + +Note: The ε-case above, + +> FIRST(M) = (FIRST(`tt ...`) \ { ε }) ∪ sep_set ∪ FIRST(`α`) + +may seem complicated, so lets take a moment to break it down. In the +ε case, the sequence `tt ...` may be empty. Therefore our first +token may be `SEP` itself (if it is present), or it may be the first +token of `α`); that's why the result is including "sep_set ∪ +FIRST(`α`)". Note also that if `α` itself may match the empty +fragment, then FIRST(`α`) will ensure that ε is included in our +result, and conversely, if `α` cannot match the empty fragment, then +we must *ensure* that ε is *not* included in our result; these two +facts together are why we can and should unconditionally remove ε +from FIRST(`tt ...`). + +---- + +LAST(M), defined by case analysis on M itself (a sequence of token-trees): + + * if M is the empty sequence, then LAST(M) = { ε } + + * if M is a singleton token t, then LAST(M) = { t } + + * if M is the singleton complex NT repeating zero or more times, + `M = $( tt ... ) *`, or `M = $( tt ... ) SEP *` + + * Let sep_set = { SEP } if SEP present; otherwise sep_set = {}. + + * if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set + + * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt ...`) ∪ { ε } + + * if M is the singleton complex NT repeating one or more times, + `M = $( tt ... ) +`, or `M = $( tt ... ) SEP +` + + * Let sep_set = { SEP } if SEP present; otherwise sep_set = {}. + + * if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set + + * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt ...`) + + * if M is a delimited token-tree sequence `OPEN tt ... CLOSE`, then LAST(M) = { `CLOSE` } + + * if M is a non-empty sequence of token-trees `tt uu ...`, + + * If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }). + + * Otherwise, the sequence `uu ...` must be non-empty; then LAST(M) = LAST(`uu ...`) + +NOTE: The presence or absence of SEP *is* relevant to the above +definitions, but solely in the case where the interior of the complex +NT could be empty (i.e. ε ∈ FIRST(interior)). (I overlooked this fact +in my first round of prototyping.) + +NOTE: The above definition for LAST assumes that we keep our +pre-existing rule that the seperator token in a complex NT is *solely* for +separating elements; i.e. that such NT's do not match fragments that +*end with* the seperator token. If we choose to lift this restriction +in the future, the above definition will need to be revised +accordingly. + +## Examples of FIRST and LAST +[examples-of-first-and-last]: #examples-of-first-and-last + +Below are some examples of FIRST and LAST. +(Note in particular how the special ε element is introduced and +eliminated based on the interation between the pieces of the input.) + +Our first example is presented in a tree structure to elaborate on how +the analysis of the matcher composes. (Some of the simpler subtrees +have been elided.) 
+ + INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g + ~~~~~~~~ ~~~~~~~ ~ + | | | + FIRST: { $d:ident } { $e:expr } { h } + + + INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ + ~~~~~~~~~~~~~~~~~~ ~~~~~~~ ~~~ + | | | + FIRST: { $d:ident } { h, ε } { f } + + INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~~~~ ~ + | | | | + FIRST: { $d:ident, ε } { h, ε, ; } { f } { g } + + + INPUT: $( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + | + FIRST: { $d:ident, h, ;, f } + +Thus: + + * FIRST(`$($d:ident $e:expr );* $( $(h)* );* $( f ;)+ g`) = { `$d:ident`, `h`, `;`, `f` } + +Note however that: + + * FIRST(`$($d:ident $e:expr );* $( $(h)* );* $($( f ;)+ g)*`) = { `$d:ident`, `h`, `;`, `f`, ε } + +Here are similar examples but now for LAST. + + * LAST(`$d:ident $e:expr`) = { `$e:expr` } + * LAST(`$( $d:ident $e:expr );*`) = { `$e:expr`, ε } + * LAST(`$( $d:ident $e:expr );* $(h)*`) = { `$e:expr`, ε, `h` } + * LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+`) = { `;` } + * LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+ g`) = { `g` } + + and again, changing the end part of matcher changes its last set considerably: + + * LAST(`$( $d:ident $e:expr );* $(h)* $($( f ;)+ g)*`) = { `$e:expr`, ε, `h`, `g` } + +## FOLLOW(M) + +Finally, the definition for `FOLLOW(M)` is built up incrementally atop +more primitive functions. + +We first assume a primitive mapping, `FOLLOW(NT)` (defined +[below][follow-nt]) from a simple NT to the set of allowed tokens for +the fragment specifier for that NT. + +Second, we generalize FOLLOW to tokens: FOLLOW(t) = FOLLOW(NT) if t is (a simple) NT. +Otherwise, t must be some other (non NT) token; in this case FOLLOW(t) = ANYTOKEN. + +Finally, we generalize FOLLOW to arbitrary matchers by composing the primitive +functions above: + +``` +FOLLOW(M) = FOLLOW(t1) ∩ FOLLOW(t2) ∩ ... ∩ FOLLOW(tN) + where { t1, t2, ..., tN } = (LAST(M) \ { ε }) +``` + +Examples of FOLLOW (expressed as equality relations between sets, to avoid +incorporating details of FOLLOW(NT) in these examples): + + * FOLLOW(`$( $d:ident $e:expr )*`) = FOLLOW(`$e:expr`) + * FOLLOW(`$( $d:ident $e:expr )* $(;)*`) = FOLLOW(`$e:expr`) ∩ ANYTOKEN = FOLLOW(`$e:expr`) + * FOLLOW(`$( $d:ident $e:expr )* $(;)* $( f |)+`) = ANYTOKEN + +## FOLLOW(NT) +[follow-nt]: #follownt + +Here is the definition for FOLLOW(NT), which maps every simple NT to +the set of tokens that are allowed to follow it, based on the fragment +specifier for the NT. + +The current legal fragment specifiers are: `item`, `block`, `stmt`, `pat`, +`expr`, `ty`, `ident`, `path`, `meta`, and `tt`. + +- `FOLLOW(pat)` = `{FatArrow, Comma, Eq, Or, Ident(if), Ident(in)}` +- `FOLLOW(expr)` = `{FatArrow, Comma, Semicolon}` +- `FOLLOW(ty)` = `{OpenDelim(Brace), Comma, FatArrow, Colon, Eq, Gt, Semi, Or, Ident(as), Ident(where), OpenDelim(Bracket), Nonterminal(Block)}` +- `FOLLOW(stmt)` = `FOLLOW(expr)` +- `FOLLOW(path)` = `FOLLOW(ty)` +- `FOLLOW(block)` = any token +- `FOLLOW(ident)` = any token +- `FOLLOW(tt)` = any token +- `FOLLOW(item)` = any token +- `FOLLOW(meta)` = any token + +(Note that close delimiters are valid following any NT.) + +## Examples of valid and invalid matchers + +With the above specification in hand, we can present arguments for +why particular matchers are legal and others are not. 
+ + * `($ty:ty < foo ,)` : illegal, because FIRST(`< foo ,`) = { `<` } ⊈ FOLLOW(`ty`) + + * `($ty:ty , foo <)` : legal, because FIRST(`, foo <`) = { `,` } is ⊆ FOLLOW(`ty`). + + * `($pa:pat $pb:pat $ty:ty ,)` : illegal, because FIRST(`$pb:pat $ty:ty ,`) = { `$pb:pat` } ⊈ FOLLOW(`pat`), and also FIRST(`$ty:ty ,`) = { `$ty:ty` } ⊈ FOLLOW(`pat`). + + * `( $($a:tt $b:tt)* ; )` : legal, because FIRST(`$b:tt`) = { `$b:tt` } is ⊆ FOLLOW(`tt`) = ANYTOKEN, as is FIRST(`;`) = { `;` }. + + * `( $($t:tt),* , $(t:tt),* )` : legal (though any attempt to actually use this macro will signal a local ambguity error during expansion). + + * `($ty:ty $(; not sep)* -)` : illegal, because FIRST(`$(; not sep)* -`) = { `;`, `-` } is not in FOLLOW(`ty`). + + * `($($ty:ty)-+)` : illegal, because separator `-` is not in FOLLOW(`ty`). + + +# Drawbacks + +It does restrict the input to a MBE, but the choice of delimiters provides +reasonable freedom and can be extended in the future. + +# Alternatives + +1. Fix the syntax that a fragment can parse. This would create a situation + where a future MBE might not be able to accept certain inputs because the + input uses newer features than the fragment that was fixed at 1.0. For + example, in the `bar` MBE above, if the `ty` fragment was fixed before the + unboxed closure sugar was introduced, the MBE would not be able to accept + such a type. While this approach is feasible, it would cause unnecessary + confusion for future users of MBEs when they can't put certain perfectly + valid Rust code in the input to an MBE. Versioned fragments could avoid + this problem but only for new code. +2. Keep `macro_rules` unstable. Given the great syntactical abstraction that + `macro_rules` provides, it would be a shame for it to be unusable in a + release version of Rust. If ever `macro_rules` were to be stabilized, this + same issue would come up. +3. Do nothing. This is very dangerous, and has the potential to essentially + freeze Rust's syntax for fear of accidentally breaking a macro. + +# Edit History + +- Updated by https://github.com/rust-lang/rfcs/pull/1209, which added + semicolons into the follow set for types. + +- Updated by https://github.com/rust-lang/rfcs/pull/1384: + * replaced detailed design with a specification-oriented presentation rather than an implementation-oriented algorithm. + * fixed some oversights in the specification that led to matchers like `$e:expr { stuff }` being accepted (which match fragments like `break { stuff }`, significantly limiting future language extensions), + * expanded the follows sets for `ty` to include `OpenDelim(Brace), Ident(where), Or` (since Rust's grammar already requires all of `|foo:TY| {}`, `fn foo() -> TY {}` and `fn foo() -> TY where {}` to work). + * expanded the follow set for `pat` to include `Or` (since Rust's grammar already requires `match (true,false) { PAT | PAT => {} }` and `|PAT| {}` to work); see also [RFC issue 1336][]. Also added `If` and `In` to follow set for `pat` (to make the specifiation match the old implementation). + +[RFC issue 1336]: https://github.com/rust-lang/rfcs/issues/1336 + +- Updated by https://github.com/rust-lang/rfcs/pull/1462, which added + open square bracket into the follow set for types. + +- Updated by https://github.com/rust-lang/rfcs/pull/1494, which adjusted + the follow set for types to include block nonterminals. + +# Appendices + +## Appendix A: Algorithm for recognizing valid matchers. 
+ +The detailed design above only sought to provide a *specification* for +what a correct matcher is (by defining FIRST, LAST, and FOLLOW, and +specifying the invariant relating FIRST and FOLLOW for all valid +matchers. + +The above specification can be implemented efficiently; we here give +one example algorithm for recognizing valid matchers. + + * This is not the only possible algorithm; for example, one could + precompute a table mapping every suffix of every token-tree + sequence to its FIRST set, by augmenting `FirstSet` below + accordingly. + + Or one could store a subset of such information during the + precomputation, such as just the FIRST sets for complex NT's, and + then use that table to inform a *forward scan* of the input. + + The latter is in fact what my prototype implementation does; I must + emphasize the point that the algorithm here is not prescriptive. + + * The intent of this RFC is that the specifications of FIRST + and FOLLOW above will take precedence over this algorithm if the two + are found to be producing inconsistent results. + +The algorithm for recognizing valid matchers `M` is named ValidMatcher. + +To define it, we will need a mapping from submatchers of M to the +FIRST set for that submatcher; that is handled by `FirstSet`. + +### Procedure FirstSet(M) + +*input*: a token tree `M` representing a matcher + +*output*: `FIRST(M)` + +``` +Let M = tts[1] tts[2] ... tts[n]. +Let curr_first = { ε }. + +For i in n down to 1 (inclusive): + Let tt = tts[i]. + + 1. If tt is a token, curr_first := { tt } + + 2. Else if tt is a delimited sequence `OPEN uu ... ClOSE`, + curr_first := { OPEN } + + 3. Else tt is a complex NT `$(uu ...) SEP OP` + + Let inner_first = FirstSet(`uu ...`) i.e. recursive call + + if OP == `*` or ε ∈ inner_first then + curr_first := curr_first ∪ inner_first + else + curr_first := inner_first + +return curr_first +``` + +(Note: If we were precomputing a full table in this procedure, we would need +a recursive invocation on (uu ...) in step 2 of the for-loop.) + +### Predicate ValidMatcher(M) + +To simplify the specification, we assume in this presentation that all +simple NT's have a valid fragment specifier (i.e., one that has an +entry in the FOLLOW(NT) table above. + +This algorithm works by scanning forward across the matcher M = α β, +(where α is the prefix we have scanned so far, and β is the suffix +that remains to be scanned). We maintain LAST(α) as we scan, and use +it to compute FOLLOW(α) and compare that to FIRST(β). + +*input*: a token tree, `M`, and a set of tokens that could follow it, `F`. + +*output*: LAST(M) (and also signals failure whenever M is invalid) + +``` +Let last_of_prefix = { ε } + +Let M = tts[1] tts[2] ... tts[n]. + +For i in 1 up to n (inclusive): + // For reference: + // α = tts[1] .. tts[i] + // β = tts[i+1] .. tts[n] + // γ is some outer token sequence; the input F represents FIRST(γ) + + 1. Let tt = tts[i]. + + 2. Let first_of_suffix; // aka FIRST(β γ) + + 3. let S = FirstSet(tts[i+1] .. tts[n]); + + 4. if ε ∈ S then + // (include the follow information if necessary) + + first_of_suffix := S ∪ F + + 5. else + + first_of_suffix := S + + 6. Update last_of_prefix via case analysis on tt: + + a. If tt is a token: + last_of_prefix := { tt } + + b. Else if tt is a delimited sequence `OPEN uu ... CLOSE`: + + i. run ValidMatcher( M = `uu ...`, F = { `CLOSE` }) + + ii. last_of_prefix := { `CLOSE` } + + c. Else, tt must be a complex NT, + in other words, `NT = $( uu .. ) SEP OP` or `NT = $( uu .. ) OP`: + + i. 
If SEP present, + let sublast = ValidMatcher( M = `uu ...`, F = first_of_suffix ∪ { `SEP` }) + + ii. else: + let sublast = ValidMatcher( M = `uu ...`, F = first_of_suffix) + + iii. If ε in sublast then: + last_of_prefix := last_of_prefix ∪ (sublast \ ε) + + iv. Else: + last_of_prefix := sublast + + 7. At this point, last_of_prefix == LAST(α) and first_of_suffix == FIRST(β γ). + + For each simple NT token t in last_of_prefix: + + a. If first_of_suffix ⊆ FOLLOW(t), then we are okay so far. + + b. Otherwise, we have found a token t whose follow set is not compatible + with the FIRST(β γ), and must signal failure. + +// After running the above for loop on all of `M`, last_of_prefix == LAST(M) + +Return last_of_prefix +``` + +This algorithm should be run on every matcher in every `macro_rules` +invocation, with `F` = { `EOF` }. If it rejects a matcher, an error +should be emitted and compilation should not complete. diff --git a/text/0556-raw-lifetime.md b/text/0556-raw-lifetime.md new file mode 100644 index 00000000000..ce1c9d36868 --- /dev/null +++ b/text/0556-raw-lifetime.md @@ -0,0 +1,135 @@ +- Start Date: 2015-01-06 +- RFC PR: [rust-lang/rfcs#556](https://github.com/rust-lang/rfcs/pull/556) +- Rust Issue: [rust-lang/rust#21923](https://github.com/rust-lang/rust/issues/21923) + +# Summary + +Establish a convention throughout the core libraries for unsafe functions +constructing references out of raw pointers. The goal is to improve usability +while promoting awareness of possible pitfalls with inferred lifetimes. + +# Motivation + +The current library convention on functions constructing borrowed +values from raw pointers has the pointer passed by reference, which +reference's lifetime is carried over to the return value. +Unfortunately, the lifetime of a raw pointer is often not indicative +of the lifetime of the pointed-to data. So the status quo eschews the +flexibility of inferring the lifetime from the usage, while falling short +of providing useful safety semantics in exchange. + +A typical case where the lifetime needs to be adjusted is in bindings +to a foregn library, when returning a reference to an object's +inner value (we know from the library's API contract that +the inner data's lifetime is bound to the containing object): +```rust +impl Outer { + fn inner_str(&self) -> &[u8] { + unsafe { + let p = ffi::outer_get_inner_str(&self.raw); + let s = std::slice::from_raw_buf(&p, libc::strlen(p)); + std::mem::copy_lifetime(self, s) + } + } +} +``` +Raw pointer casts also discard the lifetime of the original pointed-to value. + +# Detailed design + +The signature of `from_raw*` constructors will be changed back to what it +once was, passing a pointer by value: +```rust +unsafe fn from_raw_buf<'a, T>(ptr: *const T, len: uint) -> &'a [T] +``` +The lifetime on the return value is inferred from the call context. + +The current usage can be mechanically changed, while keeping an eye on +possible lifetime leaks which need to be worked around by e.g. providing +safe helper functions establishing lifetime guarantees, as described below. + +## Document the unsafety + +In many cases, the lifetime parameter will come annotated or elided from the +call context. 
The example above, adapted to the new convention, is safe +despite lack of any explicit annotation: +```rust +impl Outer { + fn inner_str(&self) -> &[u8] { + unsafe { + let p = ffi::outer_get_inner_str(&self.raw); + std::slice::from_raw_buf(p, libc::strlen(p)) + } + } +} +``` + +In other cases, the inferred lifetime will not be correct: +```rust + let foo = unsafe { ffi::new_foo() }; + let s = unsafe { std::slice::from_raw_buf(foo.data, foo.len) }; + + // Some lines later + unsafe { ffi::free_foo(foo) }; + + // More lines later + let guess_what = s[0]; + // The lifetime of s is inferred to extend to the line above. + // That code told you it's unsafe, didn't it? +``` + +Given that the function is unsafe, the code author should exercise due care +when using it. However, the pitfall here is not readily apparent at the +place where the invalid usage happens, so it can be easily committed by an +inexperienced user, or inadvertently slipped in with a later edit. + +To mitigate this, the documentation on the reference-from-raw functions +should include caveats warning about possible misuse and suggesting ways to +avoid it. When an 'anchor' object providing the lifetime is available, the +best practice is to create a safe helper function or method, taking a +reference to the anchor object as input for the lifetime parameter, like in +the example above. The lifetime can also be explicitly assigned with +`std::mem::copy_lifetime` or `std::mem::copy_lifetime_mut`, or annotated when +possible. + +## Fix copy_mut_lifetime + +To improve composability in cases when the lifetime does need to be assigned +explicitly, the first parameter of `std::mem::copy_mut_lifetime` +should be made an immutable reference. There is no reason for the lifetime +anchor to be mutable: the pointer's mutability is usually the relevant +question, and it's an unsafe function to begin with. This wart may +breed tedious, mut-happy, or transmute-happy code, when e.g. a container +providing the lifetime for a mutable view into its contents is not itself +necessarily mutable. + +# Drawbacks + +The implicitly inferred lifetimes are unsafe in sneaky ways, so care is +required when using the borrowed values. + +Changing the existing functions is an API break. + +# Alternatives + +An earlier revision of this RFC proposed adding a generic input parameter to +determine the lifetime of the returned reference: +```rust +unsafe fn from_raw_buf<'a, T, U: Sized?>(ptr: *const T, len: uint, + life_anchor: &'a U) + -> &'a [T] +``` +However, an object with a suitable lifetime is not always available +in the context of the call. In line with the general trend in Rust libraries +to favor composability, `std::mem::copy_lifetime` and +`std::mem::copy_lifetime_mut` should be the principal methods to explicitly +adjust a lifetime. + +# Unresolved questions + +Should the change in function parameter signatures be done before 1.0? + +# Acknowledgements + +Thanks to Alex Crichton for shepherding this proposal in a constructive and +timely manner. He has in fact rationalized the convention in its present form. 
diff --git a/text/0558-require-parentheses-for-chained-comparisons.md b/text/0558-require-parentheses-for-chained-comparisons.md
new file mode 100644
index 00000000000..3651d2e248d
--- /dev/null
+++ b/text/0558-require-parentheses-for-chained-comparisons.md
@@ -0,0 +1,71 @@
+- Start Date: 2015-01-07
+- RFC PR: [rust-lang/rfcs#558](https://github.com/rust-lang/rfcs/pull/558)
+- Rust Issue: [rust-lang/rust#20724](https://github.com/rust-lang/rust/issues/20724)
+
+# Summary
+
+Remove chaining of comparison operators (e.g. `a == b == c`) from the syntax.
+Instead, require extra parentheses (`(a == b) == c`).
+
+# Motivation
+
+```rust
+fn f(a: bool, b: bool, c: bool) -> bool {
+    a == b == c
+}
+```
+
+This code is currently accepted and is evaluated as `((a == b) == c)`.
+This may be confusing to programmers coming from languages like Python,
+where chained comparison operators are evaluated as `(a == b && b == c)`.
+
+In C, the same problem exists (and is exacerbated by implicit conversions).
+Style guides like MISRA C require the use of parentheses in this case.
+
+By requiring the use of parentheses, we avoid potential confusion now,
+and open up the possibility for Python-like chained comparisons post-1.0.
+
+Additionally, making the chain `f < b > (c)` invalid allows us to easily produce
+a diagnostic message: "Use `::<` instead of `<` if you meant to specify type arguments.",
+which would be a vast improvement over the current diagnostics for this mistake.
+
+# Detailed design
+
+Emit a syntax error when a comparison operator appears as an operand of another comparison operator
+(without being surrounded by parentheses).
+The comparison operators are `<` `>` `<=` `>=` `==` and `!=`.
+
+This is easily implemented directly in the parser.
+
+Note that this restriction on accepted syntax will effectively merge the precedence level 4 (`<` `>` `<=` `>=`) with level 3 (`==` `!=`).
+
+# Drawbacks
+
+It's a breaking change.
+
+In particular, code that currently uses the difference between precedence level 3 and 4 breaks
+and will require the use of parentheses:
+
+```rust
+if a < 0 == b < 0 { /* both negative or both non-negative */ }
+```
+
+However, I don't think this kind of code sees much use.
+The rustc codebase doesn't seem to have any occurrences of chained comparisons.
+
+# Alternatives
+
+As this RFC just makes the chained comparison syntax available for post-1.0 language features,
+pretty much every alternative (including returning to the status quo) can still be implemented later.
+
+If this RFC is not accepted, it will be impossible to add Python-style chained comparison operators later.
+
+A variation on this RFC would be to keep the separation between precedence level 3 and 4, and only reject programs
+where a comparison operator appears as an operand of another comparison operator of the same precedence level.
+This minimizes the breaking changes, but does not allow full Python-style chained comparison operators in the future
+(although a more limited form of them would still be possible).
+
+# Unresolved questions
+
+Is there real code that would get broken by this change?
+So far, I've been unable to find any.
diff --git a/text/0560-integer-overflow.md b/text/0560-integer-overflow.md
new file mode 100644
index 00000000000..539f225c1cd
--- /dev/null
+++ b/text/0560-integer-overflow.md
@@ -0,0 +1,548 @@
+- Start Date: 2014-06-30
+- RFC PR #: https://github.com/rust-lang/rfcs/pull/560
+- Rust Issue #: https://github.com/rust-lang/rust/issues/22020
+
+# Summary
+
+Change the semantics of the built-in fixed-size integer types from
+being defined as wrapping around on overflow to it being considered a
+program error (but *not* undefined behavior in the C
+sense). Implementations are *permitted* to check for overflow at any
+time (statically or dynamically). Implementations are *required* to at
+least check dynamically when `debug_assert!` assertions are
+enabled. Add a `WrappingOps` trait to the standard library with
+operations defined as wrapping on overflow for the limited number of
+cases where this is the desired semantics, such as hash functions.
+
+# Motivation
+
+Numeric overflow presents a difficult situation. On the one hand,
+overflow (and [underflow]) is known to be a common source of error in
+other languages. Rust, at least, does not have to worry about memory
+safety violations, but it is still possible for overflow to lead to
+bugs. Moreover, Rust's safety guarantees do not apply to `unsafe`
+code, which carries the
+[same risks as C code when it comes to overflow][phrack]. Unfortunately,
+banning overflow outright is not feasible at this time. Detecting
+overflow statically is not practical, and detecting it dynamically can
+be costly. Therefore, we have to steer a middle ground.
+
+[phrack]: http://phrack.org/issues/60/10.html#article
+[underflow]: http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types
+
+The RFC has several major goals:
+
+1. Ensure that code which intentionally uses wrapping semantics is
+   clearly identified.
+2. Help users to identify overflow problems and help those who wish to
+   be careful about overflow to do so.
+3. Ensure that users who wish to detect overflow can safely enable
+   overflow checks and dynamic analysis, both on their code and on
+   libraries they use, with a minimal risk of "false positives"
+   (intentional overflows leading to a panic).
+4. To the extent possible, leave room in the future to move towards
+   universal overflow checking if it becomes feasible. This may require
+   opt-in from end-users.
+
+To that end the RFC proposes two mechanisms:
+
+1. Optional, dynamic overflow checking. Ordinary arithmetic operations
+   (e.g., `a+b`) would conditionally check for overflow. If an
+   overflow occurs when checking is enabled, a thread panic will be
+   signaled. Specific intrinsics and library support are provided to
+   permit either explicit overflow checks or explicit wrapping.
+2. Overflow checking would be, by default, tied to debug assertions
+   (`debug_assert!`). It can be seen as analogous to a debug
+   assertion: an important safety check that is too expensive to
+   perform on all code.
+
+We expect that additional and finer-grained mechanisms for enabling
+overflows will be added in the future. One easy option is a
+command-line switch to enable overflow checking universally or within
+specific crates. Another option might be lexically scoped annotations
+to enable (or perhaps disable) overflow checking in specific
+blocks. Neither mechanism is detailed in this RFC at this time.
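+
+To make the two mechanisms concrete, here is a small sketch (not part of the
+RFC text, and using the `wrapping_add` method proposed later in this RFC) of
+how they are intended to interact:
+
+```rust
+fn main() {
+    let x: u8 = 255;
+
+    // Ordinary arithmetic: when overflow checking is enabled (for example in
+    // a build with debug assertions), the commented-out line below panics;
+    // when checking is off, the result wraps to 0.
+    // let y = x + 1;
+
+    // Explicit wrapping: always wraps, in every build configuration.
+    let y = x.wrapping_add(1);
+    assert_eq!(y, 0);
+}
+```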
+
+## Why tie overflow checking to debug assertions
+
+The reasoning behind connecting overflow checking and debug assertions
+is that it ensures that pervasive checking for overflow is performed
+*at some point* in the development cycle, even if it does not take
+place in shipping code for performance reasons. The goal of this is to
+prevent "lock-in" where code has a de-facto reliance on wrapping
+semantics, and thus incorrectly breaks when stricter checking is
+enabled.
+
+We would like to allow people to switch "pervasive" overflow checks on
+by default, for example. However, if the default is not to check for
+overflow, then it seems likely that a pervasive check like that could
+not be used, because libraries are sure to come to rely on wrapping
+semantics, even if accidentally.
+
+By making checked overflow the default for debugging code, we help
+ensure that users will encounter overflow errors in practice, and thus
+become aware that overflow in Rust is not the norm. It will also help
+debug simple errors, like unsigned underflow leading to an infinite
+loop.
+
+# Detailed design
+
+## Arithmetic operations with error conditions
+
+There are various operations which can sometimes produce error
+conditions (detailed below). Typically these error conditions
+correspond to under/overflow but not exclusively. It is the
+programmer's responsibility to avoid these error conditions: any
+failure to do so can be considered a bug, and hence can be flagged by
+static/dynamic analysis tools as an error. This is largely a
+semantic distinction, though.
+
+The result of an error condition depends upon the state of overflow
+checking, which can be either *enabled* or *default* (this RFC does
+not describe a way to disable overflow checking completely). If
+overflow checking is *enabled*, then an error condition always results
+in a panic. For efficiency reasons, this panic may be delayed over
+some number of pure operations, as described below.
+
+If overflow checking is *default*, that means that erroneous
+operations will produce a value as specified below. Note though that
+code which encounters an error condition is still considered buggy.
+In particular, Rust source code (in particular library code) cannot
+rely on wrapping semantics, and should always be written with the
+assumption that overflow checking *may* be enabled. This is because
+overflow checking may be enabled by a downstream consumer of the
+library.
+
+In the future, we could add some way to explicitly *disable* overflow
+checking in a scoped fashion. In that case, the result of each error
+condition would simply be the defined value produced in the *default*
+state when no panic occurs, and this would override any requests for
+overflow checking specified elsewhere. However, no mechanism for
+disabling overflow checks is provided by this RFC: instead, it is
+recommended that authors use the wrapping primitives.
+
+The error conditions that can arise, and their defined results, are as
+follows. The intention is that the defined results are the same as the
+defined results today. The only change is that now a panic may result.
+
+- The operations `+`, `-`, and `*` can underflow and overflow. When checking is
+  enabled this will panic. When checking is disabled this will two's complement
+  wrap.
+- The operations `/`, `%` for the arguments `INT_MIN` and `-1`
+  will unconditionally panic. This is unconditional for legacy reasons.
+- Shift operations (`<<`, `>>`) on a value of width `N` can be passed a shift value
+  >= `N`.
It is unclear what behaviour should result from this, so the shift value
+  is unconditionally masked to be modulo `N` to ensure that the argument is always
+  in range.
+
+## Enabling overflow checking
+
+Compilers should present a command-line option to enable overflow
+checking universally. Additionally, when building in a default "debug"
+configuration (i.e., whenever `debug_assert` would be enabled),
+overflow checking should be enabled by default, unless the user
+explicitly requests otherwise. The precise control of these settings
+is not detailed in this RFC.
+
+The goal of this rule is to ensure that, during debugging and normal
+development, overflow detection is on, so that users can be alerted to
+potential overflow (and, in particular, for code where overflow is
+expected and normal, they will be immediately guided to use the
+wrapping methods introduced below). However, because these checks will
+be compiled out whenever an optimized build is produced, final code
+will not pay a performance penalty.
+
+In the future, we may add additional means to control when overflow is
+checked, such as scoped attributes or a global, independent
+compile-time switch.
+
+## Delayed panics
+
+If an error condition should occur and a thread panic should result,
+the compiler is not required to signal the panic at the precise point
+of overflow. It is free to coalesce checks from adjacent pure
+operations. Panics may never be delayed across an unsafe block nor may
+they be skipped entirely, however. The precise details of how panics
+may be deferred -- and the definition of a pure operation -- can be
+hammered out over time, but the intention here is that, at minimum,
+overflow checks for adjacent numeric operations like `a+b-c` can be
+coalesced into a single check. Another useful example might be that,
+when summing a vector, the final overflow check could be deferred
+until the summation is complete.
+
+## Methods for explicit wrapping arithmetic
+
+For those use cases where explicit wraparound on overflow is required,
+such as hash functions, we must provide operations with such
+semantics. We accomplish this by providing the following methods defined
+in the inherent impls for the various integral types.
+
+```rust
+impl i32 { // and i8, i16, i64, isize, u8, u16, u32, u64, usize
+    fn wrapping_add(self, rhs: Self) -> Self;
+    fn wrapping_sub(self, rhs: Self) -> Self;
+    fn wrapping_mul(self, rhs: Self) -> Self;
+    fn wrapping_div(self, rhs: Self) -> Self;
+    fn wrapping_rem(self, rhs: Self) -> Self;
+
+    fn wrapping_lshift(self, amount: u32) -> Self;
+    fn wrapping_rshift(self, amount: u32) -> Self;
+}
+```
+
+These are implemented to preserve the pre-existing wrapping semantics
+unconditionally.
+
+### `Wrapping<T>` type for convenience
+
+For convenience, the `std::num` module also provides a `Wrapping<T>`
+newtype for which the operator overloads are implemented using the
+`WrappingOps` trait:
+
+    pub struct Wrapping<T>(pub T);
+
+    impl<T: WrappingOps> Add<Wrapping<T>, Wrapping<T>> for Wrapping<T> {
+        fn add(&self, other: &Wrapping<T>) -> Wrapping<T> {
+            self.wrapping_add(*other)
+        }
+    }
+
+    // Likewise for `Sub`, `Mul`, `Div`, and `Rem`
+
+Note that this is only for potential convenience. The type-based approach has the
+drawback that e.g. `Vec<T>` and `Vec<Wrapping<T>>` are incompatible types.
+
+## Lint
+
+In general it seems inadvisable to use operations with error
+conditions (like a naked `+` or `-`) in unsafe code. It would be
+better to use explicit `checked` or `wrapped` operations as
+appropriate.
The same holds for destructors, since unwinding in +destructors is inadvisable. Therefore, the RFC recommends a lint be +added against such operations, defaulting to warn, though the details +(such as the name of this lint) are not spelled out. + +# Drawbacks + +**Making choices is hard.** Having to think about whether wraparound +arithmetic is appropriate may cause an increased cognitive +burden. However, wraparound arithmetic is almost never the intended +behavior. Therefore, programmers should be able to keep using the +built-in integer types and to not think about this. Where wraparound +semantics are required, it is generally a specialized use case with +the implementor well aware of the requirement. + +**Loss of additive commutativity and benign overflows.** In some +cases, overflow behavior can be benign. For example, given an +expression like `a+b-c`, intermediate overflows are not harmful so +long as the final result is within the range of the integral type. To +take advantage of this property, code would have to be written to use +the wrapping constructs, such as `a.wrapping_add(b).wrapping_sub(c)`. +However, this drawback is counterbalanced by the large number of +arithmetic expressions which do not have the same behavior when +overflow occurs. A common example is `(max+min)/2`, which is a typical +ingredient for [binary searches and the like][BS] and can lead to very +surprising behavior. Moreover, the use of `wrapping_add` and +`wrapping_sub` to highlight the fact that the intermediate result may +overflow seems potentially useful to an end-reader. + +[BS]: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html + +**Danger of triggering additional panics from within unsafe code.** +This proposal creates more possibility for panics to occur, at least +when checks are enabled. As usual, a panic at an inopportune time can +lead to bugs if code is not exception safe. This is particularly +worrisome in unsafe code, where crucial safety guarantees can be +violated. However, this danger already exists, as there are numerous +ways to trigger a panic, and hence unsafe code must be written with +this in mind. It seems like the best advice is for unsafe code to +eschew the plain `+` and `-` operators, and instead prefer explicit +checked or wrapping operations as appropriate (hence the proposed +lint). Furthermore, the danger of an unexpected panic occurring in +unsafe code must be weighed against the danger of a (silent) overflow, +which can also lead to unsafety. + +**Divergence of debug and optimized code.** The proposal here causes +additional divergence of debug and optimized code, since optimized +code will not include overflow checking. It would therefore be +recommended that robust applications run tests both with and without +optimizations (and debug assertions). That said, this state of affairs +already exists. First, the use of `debug_assert!` causes +debug/optimized code to diverge, but also, optimizations are known to +cause non-trivial changes in behavior. For example, recursive (but +pure) functions may be optimized away entirely by LLVM. Therefore, it +always makes sense to run tests in both modes. This situation is not +unique to Rust; most major projects do something similar. Moreover, in +most languages, `debug_assert!` is in fact the only (or at least +predominant) kind of of assertion, and hence the need to run tests +both with and without assertions enabled is even stronger. 
+ +**Benchmarking.** Someone may conduct a benchmark of Rust with +overflow checks turned on, post it to the Internet, and mislead the +audience into thinking that Rust is a slow language. The choice of +defaults minimizes this risk, however, since doing an optimized build +in cargo (which ought to be a prerequisite for any benchmark) also +disables debug assertions (or ought to). + +**Impact of overflow checking on optimization.** In addition to the +direct overhead of checking for overflow, there is some additional +overhead when checks are enabled because compilers may have to forego +other optimizations or code motion that might have been legal. This +concern seems minimal since, in optimized builds, overflow checking +will not be enabled. Certainly if we ever decided to change the +default for overflow checking to *enabled* in optimized builds, we +would want to measure carefully and likely include some means of +disabling checks in particularly hot paths. + +# Alternatives and possible future directions + +## Do nothing for now + +Defer any action until later, as advocated by: + + * [Patrick Walton on June 22][PW22] + +Reasons this was not pursued: The proposed changes are relatively well-contained. +Doing this after 1.0 would require either breaking existing programs which rely +on wraparound semantics, or introducing an entirely new set of integer types and +porting all code to use those types, whereas doing it now lets us avoid +needlessly proliferating types. Given the paucity of circumstances where +wraparound semantics is appropriate, having it be the default is defensible only +if better options aren't available. + +## Scoped attributes to control runtime checking + +The [original RFC][GH] proposed a system of scoped attributes for +enabling/disabling overflow checking. Nothing in the current RFC +precludes us from going in this direction in the future. Rather, this +RFC is attempting to answer the question (left unanswered in the +original RFC) of what the behavior ought to be when no attribute is in +scope. + +The proposal for scoped attributes in the original RFC was as follows. +Introduce an `overflow_checks` attribute which can be used to turn +runtime overflow checks on or off in a given +scope. `#[overflow_checks(on)]` turns them on, +`#[overflow_checks(off)]` turns them off. The attribute can be applied +to a whole `crate`, a `mod`ule, an `fn`, or (as per [RFC 40][40]) a +given block or a single expression. When applied to a block, this is +analogous to the `checked { }` blocks of C#. As with lint attributes, +an `overflow_checks` attribute on an inner scope or item will override +the effects of any `overflow_checks` attributes on outer scopes or +items. Overflow checks can, in fact, be thought of as a kind of +run-time lint. Where overflow checks are in effect, overflow with the +basic arithmetic operations and casts on the built-in fixed-size +integer types will invoke task failure. Where they are not, the checks +are omitted, and the result of the operations is left unspecified (but +will most likely wrap). + +Significantly, turning `overflow_checks` on or off should only produce an +observable difference in the behavior of the program, beyond the time it takes +to execute, if the program has an overflow bug. + +It should also be emphasized that `overflow_checks(off)` only disables *runtime* +overflow checks. Compile-time analysis can and should still be performed where +possible. 
Perhaps the name could be chosen to make this more obvious, such as
+`runtime_overflow_checks`, but that starts to get overly verbose.
+
+Illustration of use:
+
+    // checks are on for this crate
+    #![overflow_checks(on)]
+
+    // but they are off for this module
+    #[overflow_checks(off)]
+    mod some_stuff {
+
+        // but they are on for this function
+        #[overflow_checks(on)]
+        fn do_thing() {
+            ...
+
+            // but they are off for this block
+            #[overflow_checks(off)] {
+                ...
+                // but they are on for this expression
+                let n = #[overflow_checks(on)] (a * b + c);
+                ...
+            }
+
+            ...
+        }
+
+        ...
+    }
+
+    ...
+
+[40]: https://github.com/rust-lang/rfcs/blob/master/active/0040-more-attributes.md
+
+## Checks off means wrapping on
+
+If we adopted a model of overflow checks, one could use an explicit
+request to turn overflow checks *off* as a signal that wrapping is
+desired. This would allow us to do without the `WrappingOps` trait
+and to avoid having unspecified results. See:
+
+ * [Daniel Micay on June 24][DM24_2]
+
+Reasons this was not pursued: The official semantics of a type should not change
+based on the context. It should be possible to make the choice between turning
+checks `on` or `off` solely based on performance considerations. It should be
+possible to distinguish cases where checking was too expensive from where
+wraparound was desired. (Wraparound is not usually desired.)
+
+## Different operators
+
+Have the usual arithmetic operators check for overflow, and introduce a new set
+of operators with wraparound semantics, as done by Swift. Alternately, do the
+reverse: make the normal operators wrap around, and introduce new ones which
+check.
+
+Reasons this was not pursued: New, strange operators would pose an entrance
+barrier to the language. The use cases for wraparound semantics are not common
+enough to warrant having a separate set of symbolic operators.
+
+## Different types
+
+Have separate sets of fixed-size integer types which wrap around on overflow and
+which are checked for overflow (e.g. `u8`, `u8c`, `i8`, `i8c`, ...).
+
+Reasons this was not pursued: Programmers might be confused by having to choose
+among so many types. Using different types would introduce compatibility hazards
+to APIs. `Vec<u8>` and `Vec<u8c>` are incompatible. Wrapping arithmetic is not
+common enough to warrant a whole separate set of types.
+
+## Just use `Checked*`
+
+Just use the existing `Checked` traits and a `Checked<T>` type after the same
+fashion as the `Wrapping<T>` in this proposal.
+
+Reasons this was not pursued: Wrong defaults. Doesn't enable distinguishing
+"checking is slow" from "wrapping is desired" from "it was the default".
+
+## Runtime-closed range types
+
+[As proposed by Bill Myers.][BM-RFC]
+
+Reasons this was not pursued: My brain melted. :(
+
+## Making `as` be checked
+
+The RFC originally specified that using `as` to convert between types
+would cause checked semantics. However, we now use `as` as a primitive
+type operator. This decision was discussed on the
+[discuss message board][as].
+
+The key points in favor of reverting `as` to its original semantics
+were:
+
+1. `as` is already a fairly low-level operator that can be used (for
+   example) to convert between `*mut T` and `*mut U`.
+2. `as` is the only way to convert types in constants, and hence it is
+   important that it covers all possibilities that constants might
+   need (eventually, [const fn][911] or other approaches may change
+   this, but those are not going to be stable for 1.0).
+3.
The [type ascription RFC][803] set the precedent that `as` is used + for "dangerous" coercions that require care. +4. Eventually, checked numeric conversions (and perhaps most or all + uses of `as`) can be ergonomically added as methods. The precise + form of this will be resolved in the future. [const fn][911] can + then allow these to be used in constant expressions. + +[as]: http://internals.rust-lang.org/t/on-casts-and-checked-overflow/1710/ +[803]: https://github.com/rust-lang/rfcs/pull/803 +[911]: https://github.com/rust-lang/rfcs/pull/911 + +# Unresolved questions + +None today (see Updates section below). + +# Future work + + * Look into adopting imprecise exceptions and a similar design to Ada's, and to + what is explored in the research on AIR (As Infinitely Ranged) semantics, to + improve the performance of checked arithmetic. See also: + + * [Cameron Zwarich on June 22][CZ22] + * [John Regehr on June 23][JR23_2] + + * Make it easier to use integer types of unbounded size, i.e. actual + mathematical integers and naturals. + +[BM-RFC]: https://github.com/bill-myers/rfcs/blob/no-integer-overflow/active/0000-no-integer-overflow.md +[PW22]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010494.html +[DM24_2]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010590.html +[CZ22]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010483.html +[JR23_2]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010527.html + +# Updates since being accepted + +Since it was accepted, the RFC has been updated as follows: + +1. The wrapping methods were moved to be inherent, since we gained the + capability for libstd to declare inherent methods on primitive + integral types. +2. `as` was changed to restore the behavior before the RFC (that is, + it truncates to the target bitwidth and reinterprets the highest + order bit, a.k.a. sign-bit, as necessary, as a C cast would). +3. Shifts were specified to mask off the bits of over-long shifts. +4. Overflow was specified to be two's complement wrapping (this was mostly + a clarification). +5. `INT_MIN / -1` and `INT_MIN % -1` panics. + +# Acknowledgements and further reading + +This RFC was [initially written by Gábor Lehel][GH] and was since +edited by Nicholas Matsakis into its current form. Although the text +has changed significantly, the spirit of the original is preserved (at +least in our opinion). The primary changes from the original are: + +1. Define the results of errors in some cases rather than using undefined values. +2. Move discussion of scoped attributes to the "future directions" section. +3. Define defaults for when overflow checking is enabled. + +Many aspects of this proposal and many of the ideas within it were +influenced and inspired by +[a discussion on the rust-dev mailing list][GL18]. The author is +grateful to everyone who provided input, and would like to highlight +the following messages in particular as providing motivation for the +proposal. 
+
+On the limited use cases for wrapping arithmetic:
+
+ * [Jerry Morrison on June 20][JM20]
+
+On the value of distinguishing where overflow is valid from where it is not:
+
+ * [Gregory Maxwell on June 18][GM18]
+ * [Gregory Maxwell on June 24][GM24]
+ * [Robert O'Callahan on June 24][ROC24]
+ * [Jerry Morrison on June 24][JM24]
+
+The idea of scoped attributes:
+
+ * [Daniel Micay on June 23][DM23]
+
+On the drawbacks of a type-based approach:
+
+ * [Daniel Micay on June 24][DM24]
+
+In general:
+
+ * [John Regehr on June 23][JR23]
+ * [Lars Bergstrom on June 24][LB24]
+
+Further credit is due to the commenters in the [GitHub discussion thread][GH].
+
+[GL18]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010363.html
+[GM18]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010371.html
+[JM20]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010410.html
+[DM23]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010566.html
+[JR23]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010558.html
+[GM24]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010580.html
+[ROC24]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010602.html
+[DM24]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010598.html
+[JM24]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010596.html
+[LB24]: https://mail.mozilla.org/pipermail/rust-dev/2014-June/010579.html
+[GH]: https://github.com/rust-lang/rfcs/pull/146
diff --git a/text/0563-remove-ndebug.md b/text/0563-remove-ndebug.md
new file mode 100644
index 00000000000..34f29fcda36
--- /dev/null
+++ b/text/0563-remove-ndebug.md
@@ -0,0 +1,63 @@
+- Start Date: (fill me in with today's date, YYYY-MM-DD)
+- RFC PR: [rust-lang/rfcs#563](https://github.com/rust-lang/rfcs/pull/563)
+- Rust Issue: [rust-lang/rust#22492](https://github.com/rust-lang/rust/issues/22492)
+
+# Summary
+
+Remove official support for the `ndebug` config variable and replace its current usage with a
+more appropriate `debug_assertions` compiler-provided config variable.
+
+# Motivation
+
+The usage of 'ndebug' to indicate a release build is a strange holdover from C/C++. It is not used
+much and is easy to forget about. Since it is used like any other value passed to the `cfg` flag, it
+does not interact with other flags such as `-g` or `-O`.
+
+The only current users of `ndebug` are the implementations of the `debug_assert!` macro. At the
+time of this writing, integer overflow checking will also be controlled by this variable. Since
+the optimisation setting does not influence `ndebug`, this means that code that the user expects to
+be optimised will still contain the overflow checking logic. Similarly, `debug_assert!` invocations
+are not removed, contrary to what intuition would suggest. Enabling optimisations should be seen
+as a request to make the user's code faster; removing `debug_assert!` and other checks is
+a natural consequence.
+
+# Detailed design
+
+The `debug_assertions` configuration variable, the replacement for the `ndebug` variable, will be
+compiler provided based on the value of the `opt-level` codegen flag, including the implied value
+from `-O`. Any value higher than 0 will disable the variable.
+
+Another codegen flag, `debug-assertions`, will override this, forcing it on or off based on the value
+passed to it.
+
+# Drawbacks
+
+Technically, this is a backwards incompatible change.
However the only usage of the `ndebug` variable in the +rust tree is in the implementation of `debug_assert!`, so it's unlikely that any external code is +using it. + +# Alternatives + +No real alternatives beyond different names and defaults. + +# Unresolved questions + +From the RFC discussion there remain some unresolved details: + +* brson + [writes](https://github.com/rust-lang/rfcs/pull/563#issuecomment-72549694), + "I have a minor concern that `-C debug-assertions` might not be the + right place for this command line flag - it doesn't really affect + code generation, at least in the current codebase (also `--cfg + debug_assertions` has the same effect).". +* huonw + [writes](https://github.com/rust-lang/rfcs/pull/563#issuecomment-72550619), + "It seems like the flag could be more than just a boolean, but + rather take a list of what to enable to allow fine-grained control, + e.g. none, overflow-checks, debug_cfg,overflow-checks, all. (Where + -C debug-assertions=debug_cfg acts like --cfg debug.)". +* huonw + [writes](https://github.com/rust-lang/rfcs/pull/563#issuecomment-74762795), + "if we want this to apply to more than just debug_assert do we want + to use a name other than -C debug-assertions?". + diff --git a/text/0565-show-string-guidelines.md b/text/0565-show-string-guidelines.md new file mode 100644 index 00000000000..aab154cd721 --- /dev/null +++ b/text/0565-show-string-guidelines.md @@ -0,0 +1,166 @@ +- Start Date: 2015-01-08 +- RFC PR: [rust-lang/rfcs#565](https://github.com/rust-lang/rfcs/pull/565) +- Rust Issue: [rust-lang/rust#21436](https://github.com/rust-lang/rust/issues/21436) + +# Summary + +A [recent RFC](https://github.com/rust-lang/rfcs/pull/504) split what was +previously `fmt::Show` into two traits, `fmt::Show` and `fmt::String`, with +format specifiers `{:?}` and `{}` respectively. + +That RFC did not, however, establish complete conventions for when to implement +which of the traits, nor what is expected from the output. That's what this RFC +seeks to do. + +It turns out that, due to the suggested conventions and other +concerns, renaming the traits is also desirable. + +# Motivation + +Part of the reason for splitting up `Show` in the first place was some tension +around the various use cases it was trying to cover, and the fact that it could +not cover them all simultaneously. Now that the trait has been split, this RFC +aims to provide clearer guidelines about their use. + +# Detailed design + +The design of the conventions stems from two basic desires: + +1. It should be easy to generate a debugging representation of + essentially any type. + +2. It should be possible to create user-facing text output via convenient + interpolation. + +Part of the premise behind (2) is that user-facing output cannot automatically +be "composed" from smaller pieces of user-facing output (via, say, +`#[derive]`). Most of the time when you're preparing text for a user +consumption, the output needs to be quite tailored, and interpolation via +`format` is a good tool for that job. + +As part of the conventions being laid out here, the RFC proposes to: + +1. Rename `fmt::Show` to `fmt::Debug`, and +2. Rename `fmt::String` to `fmt::Display`. + +## Debugging: `fmt::Debug` + +The `fmt::Debug` trait is intended for debugging. It should: + +* Be implemented on every type, usually via `#[derive(Debug)]`. +* Never panic. +* Escape away control characters. +* Introduce quotes and other delimiters as necessary to give a clear + representation of the data involved. 
+* Focus on the *runtime* aspects of a type; repeating information such as + suffixes for integer literals is not generally useful since that data is + readily available from the type definition. + +In terms of the output produced, the goal is make it easy to make sense of +compound data of various kinds without overwhelming debugging output +with every last bit of type information -- most of which is readily +available from the source. The following rules give rough guidance: + +* Scalars print as unsuffixed literals. +* Strings print as normal quoted notation, with escapes. +* Smart pointers print as whatever they point to (without further annotation). +* Fully public structs print as you'd normally construct them: + `MyStruct { f1: ..., f2: ... }` +* Enums print as you'd construct their variants (possibly with special + cases for things like `Option` and single-variant enums?). +* Containers print using *some* notation that makes their type and + contents clear. (Since we lack literals for all container types, + this will be ad hoc). + +It is *not* a *requirement* for the debugging output to be valid Rust +source. This is in general not possible in the presence of private +fields and other abstractions. However, when it is feasible to do so, +debugging output *should* match Rust syntax; doing so makes it easier +to copy debug output into unit tests, for example. + +## User-facing: `fmt::Display` + +The `fmt::Display` trait is intended for user-facing output. It should: + +* Be implemented for scalars, strings, and other basic types. +* Be implemented for generic wrappers like `Option` or smart pointers, where + the output can be wholly delegated to a *single* `fmt::Display` implementation + on the underlying type. +* *Not* be implemented for generic containers like `Vec` or even `Result`, + where there is no useful, general way to tailor the output for user consumption. +* Be implemented for *specific* user-defined types as useful for an application, + with application-defined user-facing output. In particular, applications will + often make their types implement `fmt::Display` specifically for use in + `format` interpolation. +* Never panic. +* Avoid quotes, escapes, and so on unless specifically desired for a user-facing purpose. +* Require use of an explicit adapter (like the `display` method in + `Path`) when it potentially looses significant information. + +A common pattern for `fmt::Display` is to provide simple "adapters", which are +types wrapping another type for the sole purpose of formatting in a certain +style or context. For example: + +```rust +pub struct ForHtml<'a, T>(&'a T); +pub struct ForCli<'a, T>(&'a T); + +impl MyInterestingType { + fn for_html(&self) -> ForHtml { ForHtml(self) } + fn for_cli(&self) -> ForCli { ForCli(self) } +} + +impl<'a> fmt::Display for ForHtml<'a, MyInterestingType> { ... } +impl<'a> fmt::Display for ForCli<'a, MyInterestingType> { ... } +``` + +## Rationale for format specifiers + +Given the above conventions, it should be clear that `fmt::Debug` is +much more commonly *implemented* on types than `fmt::Display`. Why, +then, use `{}` for `fmt::Display` and `{:?}` for `fmt::Debug`? Aren't +those the wrong defaults? + +There are two main reasons for this choice: + +* Debugging output usually makes very little use of interpolation. 
In general,
+  one is typically using `#[derive(Show)]` or `format!("{:?}",
+  something_to_debug)`, and the latter is better done via
+  [more direct convenience](https://github.com/SimonSapin/rust-std-candidates#the-show-debugging-macro).
+
+* When creating tailored string output via interpolation, the expected "default"
+  formatting for things like strings is unquoted and unescaped. It would be
+  surprising if the default specifiers below did not yield `"hello, world!"` as the
+  output string.
+
+  ```rust
+  format!("{}, {}!", "hello", "world")
+  ```
+
+In other words, although more types implement `fmt::Debug`, most
+meaningful uses of interpolation (other than in such implementations)
+will use `fmt::Display`, making `{}` the right choice.
+
+## Use in errors
+
+Right now, the (unstable) `Error` trait comes equipped with a `description`
+method yielding an `Option`. This RFC proposes to drop this method and
+instead inherit from `fmt::Display`. It likewise proposes to make `unwrap` in
+`Result` depend on and use `fmt::Display` rather than `fmt::Debug`.
+
+The reason in both cases is the same: although errors are often thought of in
+terms of debugging, the messages they result in are often presented directly to
+the user and should thus be tailored. Tying them to `fmt::Display` makes it
+easier to remember and add such tailoring, and less likely to spew a lot of
+unwanted internal representation.
+
+# Alternatives
+
+We've already explored an alternative where `Show` tries to play both of the
+roles above, and found it to be problematic. There may, however, be alternative
+conventions for a multi-trait world. The RFC author hopes this will emerge from
+the discussion thread.
+
+# Unresolved questions
+
+(Previous questions here have been resolved in an RFC update).
diff --git a/text/0572-rustc-attribute.md b/text/0572-rustc-attribute.md
new file mode 100644
index 00000000000..43555f2d0ba
--- /dev/null
+++ b/text/0572-rustc-attribute.md
@@ -0,0 +1,49 @@
+- Start Date: 2015-01-11
+- RFC PR: [#572](https://github.com/rust-lang/rfcs/pull/572)
+- Rust Issue: [#22203](https://github.com/rust-lang/rust/issues/22203)
+
+# Summary
+
+Feature gate unused attributes for backwards compatibility.
+
+# Motivation
+
+Interpreting the current backwards compatibility rules strictly, it's not possible to add any further
+language features that use new attributes. For example, if we wish to add a feature that expands
+the attribute `#[awesome_deriving(Encodable)]` into an implementation of `Encodable`, any existing code that
+contains uses of the `#[awesome_deriving]` attribute might be broken. While such attributes are useless in release 1.0 code
+(since syntax extensions aren't allowed yet), we still have a case of code that stops compiling after an update of a release build.
+
+
+# Detailed design
+
+We add a feature gate, `custom_attribute`, that disallows the use of any attributes not defined by the compiler or consumed in any other way.
+
+This is achieved by elevating the `unused_attribute` lint to a feature gate check (with the gate open, it reverts to being a lint). We'd also need to ensure that it runs after all the other lints (currently it runs as part of the main lint check and might warn about attributes which are actually consumed by other lints later on).
+
+Eventually, we can try for a namespacing system as described below, however with unused attributes feature gated, we need not worry about it until we start considering stabilizing plugins.
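+
+As an illustration (this example is not from the RFC text; `awesome_deriving`
+is the hypothetical attribute from the motivation above), code using an
+attribute the compiler does not recognize would have to opt in through the
+gate:
+
+```rust
+// Nightly-only: opt in to unknown attributes while the gate exists.
+#![feature(custom_attribute)]
+
+#[awesome_deriving(Encodable)] // unknown to the compiler; rejected without the gate
+struct Point {
+    x: i32,
+    y: i32,
+}
+
+fn main() {}
+```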
+
+# Drawbacks
+
+I don't see much of a drawback (except that the alternatives below might be more lucrative). This might make it harder for people who wish to use custom attributes for static analysis in 1.0 code.
+
+# Alternatives
+
+## Forbid `#[rustc_*]` and `#[rustc(...)]` attributes
+
+(This was the original proposal in the RFC)
+
+This is less restrictive for the user, but it restricts us to a form of namespacing for any future attributes which we may wish to introduce. This is suboptimal, since by the time plugins stabilize (which is when user-defined attributes become useful for release code) we may add many more attributes to the compiler and they will all have cumbersome names.
+
+## Do nothing
+
+If we do nothing we can still manage to add new attributes, however we will need to invent new syntax for it. This will probably be in the form of basic namespacing support
+(`#[rustc::awesome_deriving]`) or arbitrary token tree support (the use case will probably still end up looking something like `#[rustc::awesome_deriving]`)
+
+This has the drawback that the attribute parsing and representation will need to be overhauled before being able to add any new attributes to the compiler.
+
+# Unresolved questions
+
+Which proposal to use — disallowing `#[rustc_*]` and `#[rustc]` attributes, or just `#[forbid(unused_attribute)]`ing everything.
+
+The name of the feature gate could perhaps be improved.
\ No newline at end of file
diff --git a/text/0574-drain-range.md b/text/0574-drain-range.md
new file mode 100644
index 00000000000..b1982a48685
--- /dev/null
+++ b/text/0574-drain-range.md
@@ -0,0 +1,91 @@
+- Start Date: 2015-01-12
+- RFC PR #: https://github.com/rust-lang/rfcs/pull/574
+- Rust Issue #: https://github.com/rust-lang/rust/issues/23055
+
+# Summary
+
+Replace `Vec::drain` by a method that accepts a range parameter. Add
+`String::drain` with similar functionality.
+
+# Motivation
+
+Allowing a range parameter is strictly more powerful than the current version.
+E.g., see the following implementations of some `Vec` methods via the hypothetical
+`drain_range` method:
+
+```rust
+fn truncate(x: &mut Vec<u8>, len: usize) {
+    if len <= x.len() {
+        x.drain_range(len..);
+    }
+}
+
+fn remove(x: &mut Vec<u8>, index: usize) -> u8 {
+    x.drain_range(index).next().unwrap()
+}
+
+fn pop(x: &mut Vec<u8>) -> Option<u8> {
+    match x.len() {
+        0 => None,
+        n => x.drain_range(n-1).next()
+    }
+}
+
+fn drain(x: &mut Vec<u8>) -> DrainRange {
+    x.drain_range(0..)
+}
+
+fn clear(x: &mut Vec<u8>) {
+    x.drain_range(0..);
+}
+```
+
+With optimization enabled, those methods will produce code that runs as fast
+as the current versions. (They should not be implemented this way.)
+
+In particular, this method allows the user to remove a slice from a vector in
+`O(Vec::len)` instead of `O(Slice::len * Vec::len)`.
+
+# Detailed design
+
+Remove `Vec::drain` and add the following method:
+
+```rust
+/// Creates a draining iterator that clears the specified range in the Vec and
+/// iterates over the removed items from start to end.
+///
+/// # Panics
+///
+/// Panics if the range is decreasing or if the upper bound is larger than the
+/// length of the vector.
+pub fn drain<T: Trait>(&mut self, range: T) -> /* ... */;
+```
+
+Where `Trait` is some trait that is implemented for at least `Range<usize>`,
+`RangeTo<usize>`, `RangeFrom<usize>`, `FullRange`, and `usize`.
+
+The precise nature of the return value is to be determined during implementation
+and may or may not depend on `T`.
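+
+For illustration, here is roughly how the range-based method is expected to be
+used (a sketch based on the description above, not text from the RFC; it is
+close to the API that was eventually stabilized):
+
+```rust
+fn main() {
+    let mut v = vec![1, 2, 3, 4, 5];
+
+    // Remove and collect the middle of the vector.
+    let middle: Vec<i32> = v.drain(1..4).collect();
+    assert_eq!(middle, [2, 3, 4]);
+    assert_eq!(v, [1, 5]);
+
+    // A full-range drain replaces the old argument-less `drain()`.
+    v.drain(..);
+    assert!(v.is_empty());
+}
+```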
+ +Add `String::drain`: + +```rust +/// Creates a draining iterator that clears the specified range in the String +/// and iterates over the characters contained in the range. +/// +/// # Panics +/// +/// Panics if the range is decreasing, if the upper bound is larger than the +/// length of the String, or if the start and the end of the range don't lie on +/// character boundaries. +pub fn drain(&mut self, range: T) -> /* ... */; +``` + +Where `Trait` and the return value are as above but need not be the same. + +# Drawbacks + +- The function signature differs from other collections. +- It's not clear from the signature that `..` can be used to get the old behavior. +- The trait documentation will link to the `std::ops` module. It's not immediately apparent how the types in there are related to the `N..M` syntax. +- Some of these problems can be mitigated by solid documentation of the function itself. diff --git a/text/0580-rename-collections.md b/text/0580-rename-collections.md new file mode 100644 index 00000000000..ee4fcf76f97 --- /dev/null +++ b/text/0580-rename-collections.md @@ -0,0 +1,105 @@ +- Start Date: 2015-01-13 +- RFC PR: https://github.com/rust-lang/rfcs/pull/580 +- Rust Issue: https://github.com/rust-lang/rust/issues/22479 + +# Summary + +Rename (maybe one of) the standard collections, so as to make the names more consistent. Currently, among all the alternatives, renaming `BinaryHeap` to `BinHeap` is the slightly preferred solution. + +# Motivation + +In [this comment](http://www.reddit.com/r/programming/comments/2rvoha/announcing_rust_100_alpha/cnk31hf) in the Rust 1.0.0-alpha announcement thread in /r/programming, it was pointed out that Rust's std collections had inconsistent names. Particularly, the abbreviation rules of the names seemed unclear. + +The current collection names (and their longer versions) are: + +* `Vec` -> `Vector` +* `BTreeMap` +* `BTreeSet` +* `BinaryHeap` +* `Bitv` -> `BitVec` -> `BitVector` +* `BitvSet` -> `BitVecSet` -> `BitVectorSet` +* `DList` -> `DoublyLinkedList` +* `HashMap` +* `HashSet` +* `RingBuf` -> `RingBuffer` +* `VecMap` -> `VectorMap` + +The abbreviation rules do seem unclear. Sometimes the first word is abbreviated, sometimes the last. However there are also cases where the names are not abbreviated. `Bitv`, `BitvSet` and `DList` seem strange on first glance. Such inconsistencies are undesirable, as Rust should not give an impression as "the promising language that has strangely inconsistent naming conventions for its standard collections". + +Also, it should be noted that traditionally *ring buffers* have fixed sizes, but Rust's `RingBuf` does not. So it is preferable to rename it to something clearer, in order to avoid incorrect assumptions and surprises. + +# Detailed design + +First some general naming rules should be established. + +1. At least maintain module level consistency when abbreviations are concerned. +2. Prefer commonly used abbreviations. +3. When in doubt, prefer full names to abbreviated ones. +4. Don't be dogmatic. + +And the new names: + +* `Vec` +* `BTreeMap` +* `BTreeSet` +* `BinaryHeap` +* `Bitv` -> `BitVec` +* `BitvSet` -> `BitSet` +* `DList` -> `LinkedList` +* `HashMap` +* `HashSet` +* `RingBuf` -> `VecDeque` +* `VecMap` + +The following changes should be made: + +- Rename `Bitv`, `BitvSet`, `DList` and `RingBuf`. Change affected codes accordingly. +- If necessary, redefine the original names as aliases of the new names, and mark them as deprecated. 
After a transition period, remove the original names completely. + +## Why prefer full names when in doubt? + +The naming rules should apply not only to standard collections, but also to other codes. It is (comparatively) easier to maintain a higher level of naming consistency by preferring full names to abbreviated ones *when in doubt*. Because given a full name, there are possibly many abbreviated forms to choose from. Which one should be chosen and why? It is hard to write down guidelines for that. + +For example, the name `BinaryBuffer` has at least three convincing abbreviated forms: `BinBuffer`/`BinaryBuf`/`BinBuf`. Which one would be the most preferred? Hard to say. But it is clear that the full name `BinaryBuffer` is not a bad name. + +However, if there *is* a convincing reason, one should not hesitate using abbreviated names. A series of names like `BinBuffer/OctBuffer/HexBuffer` is very natural. Also, few would think that `AtomicallyReferenceCounted`, the full name of `Arc`, is a good type name. + +## Advantages of the new names: + +- `Vec`: The name of the most frequently used Rust collection is left unchanged (and by extension `VecMap`), so the scope of the changes are greatly reduced. `Vec` is an exception to the "prefer full names" rule because it is *the* collection in Rust. +- `BitVec`: `Bitv` is a very unusual abbreviation of `BitVector`, but `BitVec` is a good one given `Vector` is shortened to `Vec`. +- `BitSet`: Technically, `BitSet` is a synonym of `BitVec(tor)`, but it has `Set` in its name and can be interpreted as a set-like "view" into the underlying bit array/vector, so `BitSet` is a good name. No need to have an additional `v`. +- `LinkedList`: `DList` doesn't say much about what it actually is. `LinkedList` is not too long (like `DoublyLinkedList`) and it being a doubly-linked list follows Java/C#'s traditions. +- `VecDeque`: This name exposes some implementation details and signifies its "interface" just like `HashSet`, and it doesn't have the "fixed-size" connotation that `RingBuf` has. Also, `Deque` is commonly preferred to `DoubleEndedQueue`, it is clear that the former should be chosen. + +# Drawbacks + +- There will be breaking changes to standard collections that are already marked `stable`. + +# Alternatives + +## A. Keep the status quo: + +And Rust's standard collections will have some strange names and no consistent naming rules. + +## B. Also rename `Vec` to `Vector`: + +And by extension, `Bitv` to `BitVector` and `VecMap` to `VectorMap`. + +This means breaking changes at a larger scale. Given that `Vec` is *the* collection of Rust, we can have an exception here. + +## C. Rename `DList` to `DLinkedList`, not `LinkedList`: + +It is clearer, but also inconsistent with the other names by having a single-lettered abbreviation of `Doubly`. As Java/C# also have doubly-linked `LinkedList`, it is not necessary to use the additional `D`. + +## D. Also rename `BinaryHeap` to `BinHeap`. + +`BinHeap` can also mean `BinomialHeap`, so `BinaryHeap` is the better name here. + +## E. Rename `RingBuf` to `RingBuffer`, or do not rename `RingBuf` at all. + +Doing so would fail to stop people from making the incorrect assumption that Rust's `RingBuf`s have fixed sizes. + +# Unresolved questions + +None. 
diff --git a/text/0587-fn-return-should-be-an-associated-type.md b/text/0587-fn-return-should-be-an-associated-type.md new file mode 100644 index 00000000000..faf367974d6 --- /dev/null +++ b/text/0587-fn-return-should-be-an-associated-type.md @@ -0,0 +1,184 @@ +- Start Date: 2015-01-22 +- RFC PR: [rust-lang/rfcs#587](https://github.com/rust-lang/rfcs/pull/587) +- Rust Issue: [rust-lang/rust#21527](https://github.com/rust-lang/rust/issues/21527) + +# Summary + +The `Fn` traits should be modified to make the return type an associated type. + +# Motivation + +The strongest reason is because it would permit impls like the following +(example from @alexcrichton): + +```rust +impl Foo for F : FnMut() -> R { ... } +``` + +This impl is currently illegal because the parameter `R` is not +constrained. (This also has an impact on my attempts to add variance, +which would require a "phantom data" annotation for `R` for the same +reason; but that RFC is not quite ready yet.) + +Another related reason is that it often permits fewer type parameters. +Rather than having a distinct type parameter for the return type, the +associated type projection `F::Output` can be used. Consider the standard +library `Map` type: + +```rust +struct Map + where I : Iterator, + F : FnMut(A) -> B, +{ + ... +} + +impl Iterator for Map + where I : Iterator, + F : FnMut(A) -> B, +{ + type Item = B; + ... +} +``` + +This type could be equivalently written: + +```rust +struct Map + where I : Iterator, F : FnMut<(I::Item,)> +{ + ... +} + +impl Iterator for Map, + where I : Iterator, + F : FnMut<(I::Item,)>, +{ + type Item = F::Output; + ... +} +``` + +This example highlights one subtle point about the `()` notation, +which is covered below. + +# Detailed design + +The design has been implemented. You can see it in [this pull +request]. The `Fn` trait is modified to read as follows: + +```rust +trait Fn { + type Output; + fn call(&self, args: A) -> Self::Output; +} +``` + +The other traits are modified in an analogous fashion. + +[this pull request]: https://github.com/rust-lang/rust/pull/21019 + +### Parentheses notation + +The shorthand `Foo(...)` expands to `Foo<(...), Output=()>`. The +shorthand `Foo(..) -> B` expands to `Foo<(...), Output=B>`. This +implies that if you use the parenthetical notation, you must supply a +return type (which could be a new type parameter). If you would prefer +to leave the return type unspecified, you must use angle-bracket +notation. (Note that using angle-bracket notation with the `Fn` traits +is currently feature-gated, as [described here][18875].) + +[18875]: https://github.com/rust-lang/rust/issues/18875 + +This can be seen in the In the `Map` example from the +introduction. There the `<>` notation was used so that `F::Output` is +left unbound: + +```rust +struct Map + where I : Iterator, F : FnMut<(I::Item,)> +``` + +An alternative would be to retain the type parameter `B`: + +```rust +struct Map + where I : Iterator, F : FnMut(I::Item) -> B +``` + +Or to remove the bound on `F` from the type definition and use it only in the impl: + +```rust +struct Map + where I : Iterator +{ + ... +} + +impl Iterator for Map, + where I : Iterator, + F : FnMut(I::Item) -> B +{ + type Item = F::Output; + ... +} +``` + +Note that this final option is not legal without this change, because +the type parameter `B` on the impl woudl be unconstrained. 
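+
+As a present-day illustration of the same point (this example is not from the
+RFC; it uses today's stable syntax, where `Output` is the associated type this
+RFC introduces), the return type of a closure parameter can be named directly
+instead of threading an extra type parameter through the signature:
+
+```rust
+// The return type is reached through the associated type `F::Output`,
+// so no separate `R`/`B` parameter is needed.
+fn call_with_one<F>(f: F) -> F::Output
+    where F: FnOnce(i32) -> i32
+{
+    f(1)
+}
+
+fn main() {
+    assert_eq!(call_with_one(|x| x + 1), 2);
+}
+```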
+
+# Drawbacks
+
+### Cannot overload based on return type alone
+
+This change means that you cannot overload call notation to "model" a trait
+like `Default`:
+
+```rust
+trait Default {
+    fn default() -> Self;
+}
+```
+
+That is, I can't do something like the following:
+
+```rust
+struct Defaulty;
+impl<T> Fn<()> for Defaulty {
+    type Output = T;
+
+    fn call(&self) -> T {
+        Default::default()
+    }
+}
+```
+
+This is not possible because the impl type parameter `T` is not constrained.
+
+This does not seem like a particularly strong limitation. Overloaded
+call notation is already less general than full traits in various ways
+(for example, it lacks the ability to define a closure that always
+panics; that is, the `!` notation is not a type and hence something
+like `FnMut() -> !` is not legal). The ability to overload based on return type
+is not removed; it is simply not something you can model using overloaded operators.
+
+# Alternatives
+
+### Special syntax to represent the lack of an `Output` binding
+
+Rather than having people use angle-brackets to omit the `Output`
+binding, we could introduce some special syntax for this purpose. For
+example, `FnMut() -> ?` could desugar to `FnMut<()>` (whereas
+`FnMut()` alone desugars to `FnMut<(), Output=()>`). The first
+suggestion that is commonly made is `FnMut() -> _`, but that has an
+existing meaning in a function context (where `_` represents a fresh
+type variable).
+
+### Change meaning of `FnMut()` to not bind the output
+
+We could make `FnMut()` desugar to `FnMut<()>`, and hence require an
+explicit `FnMut() -> ()` to bind the return type to unit. This feels
+surprising and inconsistent.
+
+
diff --git a/text/0592-c-str-deref.md b/text/0592-c-str-deref.md
new file mode 100644
index 00000000000..9c803126e34
--- /dev/null
+++ b/text/0592-c-str-deref.md
@@ -0,0 +1,164 @@
+- Start Date: 2015-01-17
+- RFC PR: https://github.com/rust-lang/rfcs/pull/592
+- Rust Issue: https://github.com/rust-lang/rust/issues/22469
+
+# Summary
+
+Make `CString` dereference to a token type `CStr`, which designates
+null-terminated string data.
+
+```rust
+// Type-checked to only accept C strings
+fn safe_puts(s: &CStr) {
+    unsafe { libc::puts(s.as_ptr()) };
+}
+
+fn main() {
+    let s = CString::from_slice("A Rust string");
+    safe_puts(&s);
+}
+```
+
+# Motivation
+
+The type `std::ffi::CString` is used to prepare string data for passing
+as null-terminated strings to FFI functions. This type dereferences to a
+DST, `[libc::c_char]`. The slice type as it is, however, is a poor choice
+for representing borrowed C string data, since:
+
+1. A slice does not express the C string invariant at compile time.
+   Safe interfaces wrapping FFI functions cannot take slice references as is
+   without dynamic checks (when null-terminated slices are expected) or
+   building a temporary `CString` internally (in this case plain Rust slices
+   must be passed with no interior NULs).
+2. An allocated `CString` buffer is not the only desired source for
+   borrowed C string data. Specifically, it should be possible to interpret
+   a raw pointer, unsafely and at zero overhead, as a reference to a
+   null-terminated string, so that the reference can then be used safely.
+   However, in order to construct a slice (or a dynamically sized newtype
+   wrapping a slice), its length has to be determined, which is unnecessary
+   for the consuming FFI function that will only receive a thin pointer.
+ Another likely data source are string and byte string literals: provided + that a static string is null-terminated, there should be a way to pass it + to FFI functions without an intermediate allocation in `CString`. + +As a pattern of owned/borrowed type pairs has been established +thoughout other modules (see e.g. +[path reform](https://github.com/rust-lang/rfcs/pull/474)), +it makes sense that `CString` gets its own borrowed counterpart. + +# Detailed design + +This proposal introduces `CStr`, a type to designate a null-terminated +string. This type does not implement `Sized`, `Copy`, or `Clone`. +References to `CStr` are only safely obtained by dereferencing `CString` +and a few other helper methods, described below. A `CStr` value should provide +no size information, as there is intent to turn `CStr` into an +[unsized type](https://github.com/rust-lang/rfcs/issues/813), +pending resolution on that proposal. + +## Stage 1: CStr, a DST with a weight problem + +As current Rust does not have unsized types that are not DSTs, at this stage +`CStr` is defined as a newtype over a character slice: + +```rust +#[repr(C)] +pub struct CStr { + chars: [libc::c_char] +} + +impl CStr { + pub fn as_ptr(&self) -> *const libc::c_char { + self.chars.as_ptr() + } +} +``` + +`CString` is changed to dereference to `CStr`: + +```rust +impl Deref for CString { + type Target = CStr; + fn deref(&self) -> &CStr { ... } +} +``` + +In implementation, the `CStr` value needs a length for the internal slice. +This RFC provides no guarantees that the length will be equal to the length +of the string, or be any particular value suitable for safe use. + +## Stage 2: unsized CStr + +If unsized types are enabled later one way of another, the definition +of `CStr` would change to an unsized type with statically sized contents. +The authors of this RFC believe this would constitute no breakage to code +using `CStr` safely. With a view towards this future change, it's recommended +to avoid any unsafe code depending on the internal representation of `CStr`. + +## Returning C strings + +In cases when an FFI function returns a pointer to a non-owned C string, +it might be preferable to wrap the returned string safely as a 'thin' +`&CStr` rather than scan it into a slice up front. To facilitate this, +conversion from a raw pointer should be added (with an inferred lifetime +as per [the established convention](https://github.com/rust-lang/rfcs/pull/556)): +```rust +impl CStr { + pub unsafe fn from_ptr<'a>(ptr: *const libc::c_char) -> &'a CStr { + ... + } +} +``` + +For getting a slice out of a `CStr` reference, method `to_bytes` is +provided. The name is preferred over `as_bytes` to reflect the linear cost +of calculating the length. +```rust +impl CStr { + pub fn to_bytes(&self) -> &[u8] { ... } + pub fn to_bytes_with_nul(&self) -> &[u8] { ... } +} +``` + +An odd consequence is that it is valid, if wasteful, to call `to_bytes` on +a `CString` via auto-dereferencing. + +## Remove c_str_to_bytes + +The functions `c_str_to_bytes` and `c_str_to_bytes_with_nul`, with their +problematic lifetime semantics, are deprecated and eventually removed +in favor of composition of the functions described above: +`c_str_to_bytes(&ptr)` becomes `CStr::from_ptr(ptr).to_bytes()`. + +## Proof of concept + +The described interface changes are implemented in crate +[c_string](https://github.com/mzabaluev/rust-c-str). + +# Drawbacks + +The change of the deref target type is another breaking change to `CString`. 
+In practice the main purpose of borrowing from `CString` is to obtain a +raw pointer with `.as_ptr()`; for code which only does this and does not +expose the slice in type annotations, parameter signatures and so on, +the change should not be breaking since `CStr` also provides +this method. + +Making the deref target unsized throws away the length information +intrinsic to `CString` and makes it less useful as a container for bytes. +This is countered by the fact that there are general purpose byte containers +in the core libraries, whereas `CString` addresses the specific need to +convey string data from Rust to C-style APIs. + +# Alternatives + +If the proposed enhancements or other equivalent facilities are not adopted, +users of Rust can turn to third-party libraries for better convenience +and safety when working with C strings. This may result in proliferation of +incompatible helper types in public APIs until a dominant de-facto solution +is established. + +# Unresolved questions + +Need a `Cow`? diff --git a/text/0593-forbid-Self-definitions.md b/text/0593-forbid-Self-definitions.md new file mode 100644 index 00000000000..d7c1ecb9ac0 --- /dev/null +++ b/text/0593-forbid-Self-definitions.md @@ -0,0 +1,53 @@ +- Start Date: 2015-01-18 +- RFC PR: [rust-lang/rfcs#593](https://github.com/rust-lang/rfcs/pull/593) +- Rust Issue: [rust-lang/rust#22137](https://github.com/rust-lang/rust/issues/22137) + +# Summary + +Make `Self` a keyword. + +# Motivation + +Right now, `Self` is just a regular identifier that happens to get a special meaning +inside trait definitions and impls. Specifically, users are not forbidden from defining +a type called `Self`, which can lead to weird situations: + +```rust +struct Self; + +struct Foo; + +impl Foo { + fn foo(&self, _: Self) {} +} +``` + +This piece of code defines types called `Self` and `Foo`, +and a method `foo()` that because of the special meaning of `Self` has +the signature `fn(&Foo, Foo)`. + +So in this case it is not possible to define a method on `Foo` that takes the +actual type `Self` without renaming it or creating a renamed alias. + +It would also be highly unidiomatic to actually name the type `Self` +for a custom type, precisely because of this ambiguity, so preventing it outright seems like the right thing to do. + +Making the identifier `Self` an keyword would prevent this situation because the user could not use it freely for custom definitions. + +# Detailed design + +Make the identifier `Self` a keyword that is only legal to use inside a trait definition or impl to refer to the `Self` type. + +# Drawbacks + +It might be unnecessary churn because people already don't run into this +in practice. + +# Alternatives + +Keep the status quo. It isn't a problem in practice, and just means +`Self` is the special case of a contextual type definition in the language. + +# Unresolved questions + +None so far diff --git a/text/0599-default-object-bound.md b/text/0599-default-object-bound.md new file mode 100644 index 00000000000..8a31a9b83c5 --- /dev/null +++ b/text/0599-default-object-bound.md @@ -0,0 +1,366 @@ +- Start Date: 2015-02-12 +- RFC PR: https://github.com/rust-lang/rfcs/pull/599 +- Rust Issue: https://github.com/rust-lang/rust/issues/22211 + +# Summary + +Add a default lifetime bound for object types, so that it is no longer +necessary to write things like `Box` or `&'a +(Trait+'a)`. The default will be based on the context in which the +object type appears. 
Typically, object types that appear underneath a
+reference take the lifetime of the innermost reference under which
+they appear, and otherwise the default is `'static`. However,
+user-defined types with `T:'a` annotations override the default.
+
+Examples:
+
+- `&'a &'b SomeTrait` becomes `&'a &'b (SomeTrait+'b)`
+- `&'a Box<SomeTrait>` becomes `&'a Box<SomeTrait+'a>`
+- `Box<SomeTrait>` becomes `Box<SomeTrait+'static>`
+- `Rc<SomeTrait>` becomes `Rc<SomeTrait+'static>`
+- `std::cell::Ref<'a, SomeTrait>` becomes `std::cell::Ref<'a, SomeTrait+'a>`
+
+Cases where the lifetime bound is either given explicitly or can be
+inferred from the traits involved are naturally unaffected.
+
+# Motivation
+
+#### Current situation
+
+As described in [RFC 34][34], object types carry a single lifetime
+bound. Sometimes, this bound can be inferred based on the traits
+involved. Frequently, however, it cannot, and in that case the
+lifetime bound must be given explicitly. Some examples of situations
+where an error would be reported are as follows:
+
+```rust
+struct SomeStruct {
+    object: Box<Writer>, // <-- ERROR No lifetime bound can be inferred.
+}
+
+struct AnotherStruct<'a> {
+    callback: &'a Fn(), // <-- ERROR No lifetime bound can be inferred.
+}
+```
+
+Errors of this sort are a [common source of confusion][16948] for new
+users (partly due to a poor error message). To avoid errors, those examples
+would have to be written as follows:
+
+```rust
+struct SomeStruct {
+    object: Box<Writer+'static>,
+}
+
+struct AnotherStruct<'a> {
+    callback: &'a (Fn()+'a),
+}
+```
+
+Ever since it was introduced, there has been a desire to make this
+fully explicit notation more compact for common cases. In practice,
+the object bounds are almost always tightly linked to the context in
+which the object appears: it is relatively rare, for example, to have
+a boxed object type that is not bounded by `'static` or `Send` (e.g.,
+`Box<Writer+'a>`). Similarly, it is unusual to have a reference to an
+object where the object itself has a distinct bound (e.g., `&'a
+(Trait+'b)`). This is not to say these situations *never* arise; as
+we'll see below, both of these do arise in practice, but they are
+relatively unusual (and in fact there is never a good reason to do
+`&'a (Trait+'b)`, though there can be a reason to have `&'a mut
+(Trait+'b)`; see ["Detailed Design"](#detailed-design) for full details).
+
+The need for a shorthand is made somewhat more urgent by
+[RFC 458][458], which disconnects the `Send` trait from the `'static`
+bound. This means that object types that are written `Box<Writer+Send>`
+today would have to be written `Box<Writer+Send+'static>`.
+
+Therefore, the following examples would require explicit bounds:
+
+```rust
+trait Message : Send { }
+Box<Message>       // ERROR: 'static no longer inferred from `Send` supertrait
+Box<Writer+Send>   // ERROR: 'static no longer inferred from `Send` bound
+```
+
+#### The proposed rule
+
+This RFC proposes to use the context in which an object type appears
+to derive a sensible default. Specifically, the default begins as
+`'static`. Type constructors like `&` or user-defined structs can
+alter that default for their type arguments, as follows:
+
+- The default begins as `'static`.
+- `&'a X` and `&'a mut X` change the default for object bounds within `X` to be `'a`.
+- The defaults for user-defined types like `SomeType<X>` are driven by
+  the where-clauses defined on `SomeType`, see the next section for
+  details. The high-level idea is that if the where-clauses on
+  `SomeType` indicate that `X` will be borrowed for a lifetime `'a`,
+  then the default for objects appearing in `X` becomes `'a`.
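As a concrete sketch of this rule (not from the RFC; written against current Rust, where these defaults were adopted and object types are spelled `dyn Trait`). The `Draw`, `Scene`, and `View` names are invented for illustration:

```rust
// Sketch of the proposed defaults as they behave in today's Rust.
pub trait Draw {
    fn draw(&self);
}

// Not underneath a reference: the object bound defaults to 'static,
// i.e. this field is Vec<Box<dyn Draw + 'static>>.
pub struct Scene {
    pub items: Vec<Box<dyn Draw>>,
}

// Underneath a reference: the object bound defaults to the reference's
// lifetime, i.e. this field is &'a (dyn Draw + 'a).
pub struct View<'a> {
    pub current: &'a dyn Draw,
}
```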
+ +The motivation for these rules is basically that objects which are not +contained within a reference default to `'static`, and otherwise the +default is the lifetime of the reference. This is almost always what +you want. As evidence, consider the following statistics, which show +the frequency of trait references from three Rust projects. The final +column shows the percentage of uses that would be correctly predicted +by the proposed rule. + +As these statistics were gathered using `ack` and some simple regular +expressions, they only include cover those cases where an explicit +lifetime bound was required today. In function signatures, lifetime +bounds can always be omitted, and it is impossible to distinguish +`&SomeTrait` from `&SomeStruct` using only a regular +expression. However, we belive that the proposed rule would be +compatible with the existing defaults for function signatures in all +or virtually all cases. + +The first table shows the results for objects that appear within a `Box`: + +| package | `Box` | `Box` | `Box` | % | +|---------|-----------------|--------------------|-------------------|------| +| iron | 6 | 0 | 0 | 100% | +| cargo | 7 | 0 | 7 | 50% | +| rust | 53 | 28 | 20 | 80% | + +Here `rust` refers to both the standard library and rustc. As you can +see, cargo (and rust, specifically libsyntax) both have objects that +encapsulate borrowed references, leading to types +`Box`. This pattern is not aided by the current defaults +(though it is also not made any *more* explicit than it already +is). However, this is the minority. + +The next table shows the results for references to objects. + +| package | `&(Trait+Send)` | `&'a [mut] (Trait+'a)` | `&'a mut (Trait+'b)` | % | +|---------|-----------------|----------------------|--------------------|------| +| iron | 0 | 0 | 0 | 100% | +| cargo | 0 | 0 | 5 | 0% | +| rust | 1 | 9 | 0 | 100% | + +As before, the defaults would not help cargo remove its existing +annotations (though they do not get any worse), though all other cases +are resolved. (Also, from casual examination, it appears that cargo +could in fact employ the proposed defaults without a problem, though +the types would be different than the types as they appear in the +source today, but this has not been fully verified.) + +# Detailed design + +This section extends the high-level rule above with suppor for +user-defined types, and also describes potential interactions with +other parts of the system. + +**User-defined types.** The way that user-defined types like +`SomeType<...>` will depend on the where-clauses attached to +`SomeType`: + +- If `SomeType` contains a single where-clause like `T:'a`, where + `T` is some type parameter on `SomeType` and `'a` is some + lifetime, then the type provided as value of `T` will have a + default object bound of `'a`. An example of this is + `std::cell::Ref`: a usage like `Ref<'x, X>` would change the + default for object types appearing in `X` to be `'a`. +- If `SomeType` contains no where-clauses of the form `T:'a` then + the default is not changed. An example of this is `Box` or + `Rc`. Usages like `Box` would therefore leave the default + unchanged for object types appearing in `X`, which probably means + that the default would be `'static` (though `&'a Box` would + have a default of `'a`). +- If `SomeType` contains multiple where-clausess of the form `T:'a`, + then the default is cleared and explicit lifetiem bounds are + required. 
There are no known examples of this in the standard + library as this situation arises rarely in practice. + +The motivation for these rules is that `T:'a` annotations are only +required when a reference to `T` with lifetime `'a` appears somewhere +within the struct body. For example, the type `std::cell::Ref` is +defined: + +```rust +pub struct Ref<'b, T:'b> { + value: &'b T, + borrow: BorrowRef<'b>, +} +``` + +Because the field `value` has type `&'b T`, the declaration `T:'b` is +required, to indicate that borrowed pointers within `T` must outlive +the lifetime `'b`. This RFC uses this same signal to control the +defaults on objects types. + +It is important that the default is *not* driven by the actual types +of the fields within `Ref`, but solely by the where-clauses declared +on `Ref`. This is both because it better serves to separate interface +and implementation and because trying to examine the types of the +fields to determine the default would create a cycle in the case of +recursive types. + +**Precedence of this rule with respect to other defaults.** This rule +takes precedence over the existing existing defaults that are applied +in function signatures as well as those that are intended (but not yet +implemented) for `impl` declarations. Therefore: + +```rust +fn foo1(obj: &SomeTrait) { } +fn foo2(obj: Box) { } +``` + +expand under this RFC to: + +```rust +// Under this RFC: +fn foo1<'a>(obj: &'a (SomeTrait+'a)) { } +fn foo2(obj: Box) { } +``` + +whereas today those same functions expand to: + +```rust +// Under existing rules: +fn foo1<'a,'b>(obj: &'a (SomeTrait+'b)) { } +fn foo2(obj: Box) { } +``` + +The reason for this rule is that we wish to ensure that if one writes +a struct declaration, then any types which appear in the struct +declaration can be safely copy-and-pasted into a fn signature. For example: + +```rust +struct Foo { + x: Box, // equiv to `Box` +} + +fn bar(foo: &mut Foo, x: Box) { + foo.x = x; // (*) +} +``` + +The goal is to ensure that the line marked with `(*)` continues to +compile. If we gave the fn signature defaults precedence over the +object defaults, the assignment would in this case be illegal, because +the expansion of `Box` would be different. + +**Interaction with object coercion.** The rules specify that `&'a +SomeTrait` and `&'a mut SomeTrait` are expanded to `&'a +(SomeTrait+'a)`and `&'a mut (SomeTrait+'a)` respecively. Today, in fn +signatures, one would get the expansions `&'a (SomeTrait+'b)` and `&'a +mut (SomeTrait+'b)`, respectively. In the case of a shared reference +`&'a SomeTrait`, this difference is basically irrelevant, as the +lifetime bound can always be approximated to be shorter when needed. + +In the case a mutable reference `&'a mut SomeTrait`, however, using +two lifetime variables is *in principle* a more general expansion. The +reason has to do with "variance" -- specifically, because the proposed +expansion places the `'a` lifetime qualifier in the reference of a +mutable reference, the compiler will be unable to allow `'a` to be +approximated with a shorter lifetime. You may have experienced this if +you have types like `&'a mut &'a mut Foo`; the compiler is also forced +to be conservative about the lifetime `'a` in that scenario. + +However, in the specific case of object types, this concern is +ameliorated by the existing object coercions. These coercions permit +`&'a mut (SomeTrait+'a)` to be coerced to `&'b mut (SomeTrait+'c)` +where `'a : 'b` and `'a : 'c`. 
The reason that this is legal is +because unsized types (like object types) cannot be assigned, thus +sidestepping the variance concerns. This means that programs like the +following compile successfully (though you will find that you get +errors if you replace the object type `(Counter+'a)` with the +underlying type `&'a mut u32`): + +```rust +#![allow(unused_variables)] +#![allow(dead_code)] + +trait Counter { + fn inc_and_get(&mut self) -> u32; +} + +impl<'a> Counter for &'a mut u32 { + fn inc_and_get(&mut self) -> u32 { + **self += 1; + **self + } +} + +fn foo<'a>(x: &'a u32, y: &'a mut (Counter+'a)) { +} + +fn bar<'a>(x: &'a mut (Counter+'a)) { + let value = 2_u32; + foo(&value, x) +} + +fn main() { +} +``` + +This may seem surprising, but it's a reflection of the fact that +object types give the user less power than if the user had direct +access to the underlying data; the user is confined to accessing the +underlying data through a known interface. + +# Drawbacks + +**A. Breaking change.** This change has the potential to break some +existing code, though given the statistics gathered we believe the +effect will be minimal (in particular, defaults are only permitted in +fn signatures today, so in most existing code explicit lifetime bounds +are used). + +**B. Lifetime errors with defaults can get confusing.** Defaults +always carry some potential to surprise users, though it's worth +pointing out that the current rules are also a big source of +confusion. Further improvements like the current system for suggesting +alternative fn signatures would help here, of course (and are an +expected subject of investigation regardless). + +**C. Inferring `T:'a` annotations becomes inadvisable.** It has +sometimes been proposed that we should infer the `T:'a` annotations +that are currently required on structs. Adopting this RFC makes that +inadvisable because the effect of inferred annotations on defaults +would be quite subtle (one could ignore them, which is suboptimal, or +one could try to use them, but that makes the defaults that result +quite non-obvious, and may also introduce cyclic dependencies in the +code that are very difficult to resolve, since inferring the bounds +needed without knowing object lifetime bounds would be challenging). +However, there are good reasons not to want to infer those bounds in +any case. In general, Rust has adopted the principle that type +definitions are always fully explicit when it comes to reference +lifetimes, even though fn signatures may omit information (e.g., +omitted lifetimes, lifetime elision, etc). This principle arose from +past experiments where we used extensive inference in types and found +that this gave rise to particularly confounding errors, since the +errors were based on annotations that were inferred and hence not +always obvious. + +# Alternatives + +1. **Leave things as they are with an improved error message.** +Besides the general dissatisfaction with the current system, a big +concern here is that if [RFC 458][458] is accepted (which seems +likely), this implies that object types like `SomeTrait+Send` will now +require an explicit region bound. Most of the time, that would be +`SomeTrait+Send+'static`, which is very long indeed. We considered the +option of introducing a new trait, let's call it `Own` for now, that +is basically `Send+'static`. 
However, that required (1) finding a +reasonable name for `Own`; (2) seems to lessen one of the benefits of +[RFC 458][458], which is that lifetimes and other properties can be +considered orthogonally; and (3) does nothing to help with cases like +`&'a mut FnMut()`, which one would still have to write as `&'a mut +(FnMut()+'a)`. + +2. **Do not drive defaults with the `T:'a` annotations that appear on +structs.** An earlier iteration of this RFC omitted the consideration +of `T:'a` annotations from user-defined structs. While this retains +the option of inferring `T:'a` annotations, it means that objects +appearing in user-defined types like `Ref<'a, Trait>` get the wrong +default. + +# Unresolved questions + +None. + +[34]: https://github.com/rust-lang/rfcs/blob/master/text/0034-bounded-type-parameters.md +[16948]: https://github.com/rust-lang/rust/issues/16948 +[458]: https://github.com/rust-lang/rfcs/pull/458 diff --git a/text/0601-replace-be-with-become.md b/text/0601-replace-be-with-become.md new file mode 100644 index 00000000000..767c8e1f44f --- /dev/null +++ b/text/0601-replace-be-with-become.md @@ -0,0 +1,37 @@ +- Start Date: 2015-01-20 +- RFC PR: [rust-lang/rfcs#601](https://github.com/rust-lang/rfcs/pull/601/) +- Rust Issue: [rust-lang/rust#22141](https://github.com/rust-lang/rust/issues/22141) + +# Summary + +Rename the `be` reserved keyword to `become`. + +# Motivation + +A keyword needs to be reserved to support guaranteed tail calls in a backward-compatible way. Currently the keyword reserved for this purpose is `be`, but the `become` alternative was proposed in +the old [RFC](https://github.com/rust-lang/rfcs/pull/81) for guaranteed tail calls, which is now postponed and tracked in [PR#271](https://github.com/rust-lang/rfcs/issues/271). + +Some advantages of the `become` keyword are: + - it provides a clearer indication of its meaning ("this function becomes that function") + - its syntax results in better code alignment (`become` is exactly as long as `return`) + +The expected result is that users will be unable to use `become` as identifier, ensuring that it will be available for future language extensions. + +This RFC is not about implementing tail call elimination, only on whether the `be` keyword should be replaced with `become`. + +# Detailed design + +Rename the `be` reserved word to `become`. This is a very simple find-and-replace. + +# Drawbacks + +Some code might be using `become` as an identifier. + +# Alternatives + +The main alternative is to do nothing, i.e. to keep the `be` keyword reserved for supporting guaranteed tail calls in a backward-compatible way. Using `become` as the keyword for tail calls would not be backward-compatible because it would introduce a new keyword, which might have been used in valid code. + +Another option is to add the `become` keyword, without removing `be`. This would have the same drawbacks as the current proposal (might break existing code), but it would also guarantee that the `become` keyword is available in the future. 
+ +# Unresolved questions + diff --git a/text/0639-discriminant-intrinsic.md b/text/0639-discriminant-intrinsic.md new file mode 100644 index 00000000000..636410a01ec --- /dev/null +++ b/text/0639-discriminant-intrinsic.md @@ -0,0 +1,359 @@ +- Start Date: 2015-01-21 +- RFC PR: [rust-lang/rfcs#639](https://github.com/rust-lang/rfcs/pull/639) +- Rust Issue: [rust-lang/rust#24263](https://github.com/rust-lang/rust/issues/24263) + +# Summary + +Add a new intrinsic, `discriminant_value` that extracts the value of the discriminant for enum +types. + +# Motivation + +Many operations that work with discriminant values can be significantly improved with the ability to +extract the value of the discriminant that is used to distinguish between variants in an enum. While +trivial cases often optimise well, more complex ones would benefit from direct access to this value. + +A good example is the `SqlState` enum from the `postgres` crate (Listed at the end of this RFC). It +contains 233 variants, of which all but one contain no fields. The most obvious implementation of +(for example) the `PartialEq` trait looks like this: + +```rust +match (self, other) { + (&Unknown(ref s1), &Unknown(ref s2)) => s1 == s2, + (&SuccessfulCompletion, &SuccessfulCompletion) => true, + (&Warning, &Warning) => true, + (&DynamicResultSetsReturned, &DynamicResultSetsReturned) => true, + (&ImplicitZeroBitPadding, &ImplicitZeroBitPadding) => true, + . + . + . + (_, _) => false +} +``` + +Even with optimisations enabled, this code is very suboptimal, producing +[this code](https://gist.github.com/Aatch/c23a45634b10aaecad05). A way to extract the discriminant +would allow this code: + +```rust +match (self, other) { + (&Unknown(ref s1), &Unknown(ref s2)) => s1 == s2, + (l, r) => unsafe { + discriminant_value(l) == discriminant_value(r) + } +} +``` + +Which is compiled into [this IR](https://gist.github.com/Aatch/beb736b93a908aa67e84). + +# Detailed design + +## What is a discriminant? + +A discriminant is a value stored in an enum type that indicates which variant the value is. The most +common case is that the discriminant is stored directly as an extra field in the variant. However, +the discriminant may be stored in any place, and in any format. However, we can always extract the +discriminant from the value somehow. + +## Implementation + +For any given type, `discriminant_value` will return a `u64` value. The values returned are as +specified: + +* **Non-Enum Type**: Always 0 +* **C-Like Enum Type**: If no variants have fields, then the enum is considered "C-Like". The user + is able to specify discriminant values in this case, and the return value would be equivalent to + the result of casting the variant to a `u64`. +* **ADT Enum Type**: If any variant has a field, then the enum is conidered to be an "ADT" enum. The + user is not able to specify the discriminant value in this case. The precise values are + unspecified, but have the following characteristics: + + * The value returned for the same variant of the same enum type will compare as + equal. I.E. `discriminant_value(v) == discriminant_value(v)`. + * Two values returned for different variants will compare as unequal relative to their respective + listed positions. That means that if variant `A` is listed before variant `B`, then + `discriminant_value(A) < discriminant_value(B)`. + +Note the returned values for two differently-typed variants may compare in any way. + +# Drawbacks + +* Potentially exposes implementation details. 
However, relying the specific values returned from +`discriminant_value` should be considered bad practice, as the intrinsic provides no such guarantee. + +* Allows non-enum types to be provided. This may be unexpected by some users. + +# Alternatives + +* More strongly specify the values returned. This would allow for a broader range of uses, but + requires specifying behaviour that we may not want to. + +* Disallow non-enum types. Non-enum types do not have a discriminant, so trying to extract might be + considered an error. However, there is no compelling reason to disallow these types as we can + simply treat them as single-variant enums and synthesise a zero constant. Note that this is what + would be done for single-variant enums anyway. + +* Do nothing. Improvements to codegen and/or optimisation could make this uneccessary. The + "Sufficiently Smart Compiler" trap is a strong case against this reasoning though. There will + likely always be cases where the user can write more efficient code than the compiler can produce. + +# Unresolved questions + +* Should `#[derive]` use this intrinsic to improve derived implementations of traits? While + intrinsics are inherently unstable, `#[derive]`d code is compiler generated and therefore can be + updated if the intrinsic is changed or removed. + +# Appendix + +```rust +pub enum SqlState { + SuccessfulCompletion, + Warning, + DynamicResultSetsReturned, + ImplicitZeroBitPadding, + NullValueEliminatedInSetFunction, + PrivilegeNotGranted, + PrivilegeNotRevoked, + StringDataRightTruncationWarning, + DeprecatedFeature, + NoData, + NoAdditionalDynamicResultSetsReturned, + SqlStatementNotYetComplete, + ConnectionException, + ConnectionDoesNotExist, + ConnectionFailure, + SqlclientUnableToEstablishSqlconnection, + SqlserverRejectedEstablishmentOfSqlconnection, + TransactionResolutionUnknown, + ProtocolViolation, + TriggeredActionException, + FeatureNotSupported, + InvalidTransactionInitiation, + LocatorException, + InvalidLocatorException, + InvalidGrantor, + InvalidGrantOperation, + InvalidRoleSpecification, + DiagnosticsException, + StackedDiagnosticsAccessedWithoutActiveHandler, + CaseNotFound, + CardinalityViolation, + DataException, + ArraySubscriptError, + CharacterNotInRepertoire, + DatetimeFieldOverflow, + DivisionByZero, + ErrorInAssignment, + EscapeCharacterConflict, + IndicatorOverflow, + IntervalFieldOverflow, + InvalidArgumentForLogarithm, + InvalidArgumentForNtileFunction, + InvalidArgumentForNthValueFunction, + InvalidArgumentForPowerFunction, + InvalidArgumentForWidthBucketFunction, + InvalidCharacterValueForCast, + InvalidDatetimeFormat, + InvalidEscapeCharacter, + InvalidEscapeOctet, + InvalidEscapeSequence, + NonstandardUseOfEscapeCharacter, + InvalidIndicatorParameterValue, + InvalidParameterValue, + InvalidRegularExpression, + InvalidRowCountInLimitClause, + InvalidRowCountInResultOffsetClause, + InvalidTimeZoneDisplacementValue, + InvalidUseOfEscapeCharacter, + MostSpecificTypeMismatch, + NullValueNotAllowedData, + NullValueNoIndicatorParameter, + NumericValueOutOfRange, + StringDataLengthMismatch, + StringDataRightTruncationException, + SubstringError, + TrimError, + UnterminatedCString, + ZeroLengthCharacterString, + FloatingPointException, + InvalidTextRepresentation, + InvalidBinaryRepresentation, + BadCopyFileFormat, + UntranslatableCharacter, + NotAnXmlDocument, + InvalidXmlDocument, + InvalidXmlContent, + InvalidXmlComment, + InvalidXmlProcessingInstruction, + IntegrityConstraintViolation, + RestrictViolation, + 
NotNullViolation, + ForeignKeyViolation, + UniqueViolation, + CheckViolation, + ExclusionViolation, + InvalidCursorState, + InvalidTransactionState, + ActiveSqlTransaction, + BranchTransactionAlreadyActive, + HeldCursorRequiresSameIsolationLevel, + InappropriateAccessModeForBranchTransaction, + InappropriateIsolationLevelForBranchTransaction, + NoActiveSqlTransactionForBranchTransaction, + ReadOnlySqlTransaction, + SchemaAndDataStatementMixingNotSupported, + NoActiveSqlTransaction, + InFailedSqlTransaction, + InvalidSqlStatementName, + TriggeredDataChangeViolation, + InvalidAuthorizationSpecification, + InvalidPassword, + DependentPrivilegeDescriptorsStillExist, + DependentObjectsStillExist, + InvalidTransactionTermination, + SqlRoutineException, + FunctionExecutedNoReturnStatement, + ModifyingSqlDataNotPermittedSqlRoutine, + ProhibitedSqlStatementAttemptedSqlRoutine, + ReadingSqlDataNotPermittedSqlRoutine, + InvalidCursorName, + ExternalRoutineException, + ContainingSqlNotPermitted, + ModifyingSqlDataNotPermittedExternalRoutine, + ProhibitedSqlStatementAttemptedExternalRoutine, + ReadingSqlDataNotPermittedExternalRoutine, + ExternalRoutineInvocationException, + InvalidSqlstateReturned, + NullValueNotAllowedExternalRoutine, + TriggerProtocolViolated, + SrfProtocolViolated, + SavepointException, + InvalidSavepointException, + InvalidCatalogName, + InvalidSchemaName, + TransactionRollback, + TransactionIntegrityConstraintViolation, + SerializationFailure, + StatementCompletionUnknown, + DeadlockDetected, + SyntaxErrorOrAccessRuleViolation, + SyntaxError, + InsufficientPrivilege, + CannotCoerce, + GroupingError, + WindowingError, + InvalidRecursion, + InvalidForeignKey, + InvalidName, + NameTooLong, + ReservedName, + DatatypeMismatch, + IndeterminateDatatype, + CollationMismatch, + IndeterminateCollation, + WrongObjectType, + UndefinedColumn, + UndefinedFunction, + UndefinedTable, + UndefinedParameter, + UndefinedObject, + DuplicateColumn, + DuplicateCursor, + DuplicateDatabase, + DuplicateFunction, + DuplicatePreparedStatement, + DuplicateSchema, + DuplicateTable, + DuplicateAliaas, + DuplicateObject, + AmbiguousColumn, + AmbiguousFunction, + AmbiguousParameter, + AmbiguousAlias, + InvalidColumnReference, + InvalidColumnDefinition, + InvalidCursorDefinition, + InvalidDatabaseDefinition, + InvalidFunctionDefinition, + InvalidPreparedStatementDefinition, + InvalidSchemaDefinition, + InvalidTableDefinition, + InvalidObjectDefinition, + WithCheckOptionViolation, + InsufficientResources, + DiskFull, + OutOfMemory, + TooManyConnections, + ConfigurationLimitExceeded, + ProgramLimitExceeded, + StatementTooComplex, + TooManyColumns, + TooManyArguments, + ObjectNotInPrerequisiteState, + ObjectInUse, + CantChangeRuntimeParam, + LockNotAvailable, + OperatorIntervention, + QueryCanceled, + AdminShutdown, + CrashShutdown, + CannotConnectNow, + DatabaseDropped, + SystemError, + IoError, + UndefinedFile, + DuplicateFile, + ConfigFileError, + LockFileExists, + FdwError, + FdwColumnNameNotFound, + FdwDynamicParameterValueNeeded, + FdwFunctionSequenceError, + FdwInconsistentDescriptorInformation, + FdwInvalidAttributeValue, + FdwInvalidColumnName, + FdwInvalidColumnNumber, + FdwInvalidDataType, + FdwInvalidDataTypeDescriptors, + FdwInvalidDescriptorFieldIdentifier, + FdwInvalidHandle, + FdwInvalidOptionIndex, + FdwInvalidOptionName, + FdwInvalidStringLengthOrBufferLength, + FdwInvalidStringFormat, + FdwInvalidUseOfNullPointer, + FdwTooManyHandles, + FdwOutOfMemory, + FdwNoSchemas, + FdwOptionNameNotFound, + 
FdwReplyHandle, + FdwSchemaNotFound, + FdwTableNotFound, + FdwUnableToCreateExcecution, + FdwUnableToCreateReply, + FdwUnableToEstablishConnection, + PlpgsqlError, + RaiseException, + NoDataFound, + TooManyRows, + InternalError, + DataCorrupted, + IndexCorrupted, + Unknown(String), +} +``` + +# History + +This RFC was accepted on a provisional basis on 2015-10-04. The +intention is to implement and experiment with the proposed +intrinsic. Some concerns expressed in the RFC discussion that will +require resolution before the RFC can be fully accepted: + +- Using bounds such as `T:Reflect` to help ensure parametricity. +- Do we want to change the return type in some way? + - It may not be helpful if we expose discriminant directly in the + case of (potentially) negative discriminants. + - We might want to return something more opaque to guard against + unintended representation exposure. +- Does this intrinsic need to be unsafe? diff --git a/text/0640-debug-improvements.md b/text/0640-debug-improvements.md new file mode 100644 index 00000000000..b9755f8ccdd --- /dev/null +++ b/text/0640-debug-improvements.md @@ -0,0 +1,199 @@ +- Start Date: 2015-01-20 +- RFC PR: [rust-lang/rfcs#640](https://github.com/rust-lang/rfcs/pull/640) +- Rust Issue: [rust-lang/rust#23083](https://github.com/rust-lang/rust/issues/23083) + +# Summary + +The `Debug` trait is intended to be implemented by every type and display +useful runtime information to help with debugging. This RFC proposes two +additions to the fmt API, one of which aids implementors of `Debug`, and one +which aids consumers of the output of `Debug`. Specifically, the `#` format +specifier modifier will cause `Debug` output to be "pretty printed", and some +utility builder types will be added to the `std::fmt` module to make it easier +to implement `Debug` manually. + +# Motivation + +## Pretty printing + +The conventions for `Debug` format state that output should resemble Rust +struct syntax, without added line breaks. This can make output difficult to +read in the presense of complex and deeply nested structures: +```rust +HashMap { "foo": ComplexType { thing: Some(BufferedReader { reader: FileStream { path: "/home/sfackler/rust/README.md", mode: R }, buffer: 1013/65536 }), other_thing: 100 }, "bar": ComplexType { thing: Some(BufferedReader { reader: FileStream { path: "/tmp/foobar", mode: R }, buffer: 0/65536 }), other_thing: 0 } } +``` +This can be made more readable by adding appropriate indentation: +```rust +HashMap { + "foo": ComplexType { + thing: Some( + BufferedReader { + reader: FileStream { + path: "/home/sfackler/rust/README.md", + mode: R + }, + buffer: 1013/65536 + } + ), + other_thing: 100 + }, + "bar": ComplexType { + thing: Some( + BufferedReader { + reader: FileStream { + path: "/tmp/foobar", + mode: R + }, + buffer: 0/65536 + } + ), + other_thing: 0 + } +} +``` +However, we wouldn't want this "pretty printed" version to be used by default, +since it's significantly more verbose. + +## Helper types + +For many Rust types, a Debug implementation can be automatically generated by +`#[derive(Debug)]`. However, many encapsulated types cannot use the +derived implementation. For example, the types in std::io::buffered all have +manual `Debug` impls. They all maintain a byte buffer that is both extremely +large (64k by default) and full of uninitialized memory. Printing it in the +`Debug` impl would be a terrible idea. 
Instead, the implementation prints the +size of the buffer as well as how much data is in it at the moment: +https://github.com/rust-lang/rust/blob/0aec4db1c09574da2f30e3844de6d252d79d4939/src/libstd/io/buffered.rs#L48-L60 + +```rust +pub struct BufferedStream { + inner: BufferedReader> +} + +impl fmt::Debug for BufferedStream where S: fmt::Debug { + fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result { + let reader = &self.inner; + let writer = &self.inner.inner.0; + write!(fmt, "BufferedStream {{ stream: {:?}, write_buffer: {}/{}, read_buffer: {}/{} }}", + writer.inner, + writer.pos, writer.buf.len(), + reader.cap - reader.pos, reader.buf.len()) + } +} +``` + +A purely manual implementation is tedious to write and error prone. These +difficulties become even more pronounced with the introduction of the "pretty +printed" format described above. If `Debug` is too painful to manually +implement, developers of libraries will create poor implementations or omit +them entirely. Some simple structures to automatically create the correct +output format can significantly help ease these implementations: +```rust +impl fmt::Debug for BufferedStream where S: fmt::Debug { + fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result { + let reader = &self.inner; + let writer = &self.inner.inner.0; + fmt.debug_struct("BufferedStream") + .field("stream", writer.inner) + .field("write_buffer", &format_args!("{}/{}", writer.pos, writer.buf.len())) + .field("read_buffer", &format_args!("{}/{}", reader.cap - reader.pos, reader.buf.len())) + .finish() + } +} +``` + +# Detailed design + +## Pretty printing + +The `#` modifier (e.g. `{:#?}`) will be interpreted by `Debug` implementations +as a request for "pretty printed" output: + +* Non-compound output is unchanged from normal `Debug` output: e.g. `10`, + `"hi"`, `None`. +* Array, set and map output is printed with one element per line, indented four + spaces, and entries printed with the `#` modifier as well: e.g. +```rust +[ + "a", + "b", + "c" +] +``` +```rust +HashSet { + "a", + "b", + "c" +} +``` +```rust +HashMap { + "a": 1, + "b": 2, + "c": 3 +} +``` +* Struct and tuple struct output is printed with one field per line, indented + four spaces, and fields printed with the `#` modifier as well: e.g. +```rust +Foo { + field1: "hi", + field2: 10, + field3: false +} +``` +```rust +Foo( + "hi", + 10, + false +) +``` + +In all cases, pretty printed and non-pretty printed output should differ *only* +in the addition of newlines and whitespace. + +## Helper types + +Types will be added to `std::fmt` corresponding to each of the common `Debug` +output formats. They will provide a builder-like API to create correctly +formatted output, respecting the `#` flag as needed. A full implementation can +be found at https://gist.github.com/sfackler/6d6610c5d9e271146d11. (Note that +there's a lot of almost-but-not-quite duplicated code in the various impls. +It can probably be cleaned up a bit). For convenience, methods will be added +to `Formatter` which create them. An example of use of the `debug_struct` +method is shown in the Motivation section. In addition, the `padded` method +returns a type implementing `fmt::Writer` that pads input passed to it. This +is used inside of the other builders, but is provided here for use by `Debug` +implementations that require formats not provided with the other helpers. +```rust +impl Formatter { + pub fn debug_struct<'a>(&'a mut self, name: &str) -> DebugStruct<'a> { ... 
} + pub fn debug_tuple<'a>(&'a mut self, name: &str) -> DebugTuple<'a> { ... } + pub fn debug_set<'a>(&'a mut self, name: &str) -> DebugSet<'a> { ... } + pub fn debug_map<'a>(&'a mut self, name: &str) -> DebugMap<'a> { ... } + + pub fn padded<'a>(&'a mut self) -> PaddedWriter<'a> { ... } +} +``` + +# Drawbacks + +The use of the `#` modifier adds complexity to `Debug` implementations. + +The builder types are adding extra `#[stable]` surface area to the standard +library that will have to be maintained. + +# Alternatives + +We could take the helper structs alone without the pretty printing format. +They're still useful even if a library author doesn't have to worry about the +second format. + +# Unresolved questions + +The indentation level is currently hardcoded to 4 spaces. We could allow that +to be configured as well by using the width or precision specifiers, for +example, `{:2#?}` would pretty print with a 2-space indent. It's not totally +clear to me that this provides enough value to justify the extra complexity. diff --git a/text/0702-rangefull-expression.md b/text/0702-rangefull-expression.md new file mode 100644 index 00000000000..827d4f69c2d --- /dev/null +++ b/text/0702-rangefull-expression.md @@ -0,0 +1,61 @@ +- Start Date: 2015-01-21 +- RFC PR: [#702](https://github.com/rust-lang/rfcs/pull/702) +- Rust Issue: [#21879](https://github.com/rust-lang/rust/issues/21879) + +# Summary + +Add the syntax `..` for `std::ops::RangeFull`. + +# Motivation + +Range expressions `a..b`, `a..` and `..b` all have dedicated syntax and +produce first-class values. This means that they will be usable and +useful in custom APIs, so for consistency, the fourth slicing range, +`RangeFull`, could have its own syntax `..` + +# Detailed design + +`..` will produce a `std::ops::RangeFull` value when it is used in an +expression. This means that slicing the whole range of a sliceable +container is written `&foo[..]`. + +We should remove the old `&foo[]` syntax for consistency. Because of +this breaking change, it would be best to change this before Rust 1.0. + +As previously stated, when we have range expressions in the language, +they become convenient to use when stating ranges in an API. + +@Gankro fielded ideas where +methods like for example `.remove(index) -> element` on a collection +could be generalized by accepting either indices or ranges. Today's `.drain()` +could be expressed as `.remove(..)`. + +Matrix or multidimensional array APIs can use the range expressions for +indexing and/or generalized slicing and `..` represents selecting a full axis +in a multidimensional slice, i.e. `(1..3, ..)` slices the first axis and +preserves the second. + +Because of deref coercions, the very common conversions of String or Vec to +slices don't need to use slicing syntax at all, so the change in verbosity from +`[]` to `[..]` is not a concern. + +# Drawbacks + +* Removing the slicing syntax `&foo[]` is a breaking change. + +* `..` already appears in patterns, as in this example: + `if let Some(..) = foo { }`. This is not a conflict per se, but the + same syntax element is used in two different ways in Rust. + +# Alternatives + +* We could add this syntax later, but we would end up with duplicate + slicing functionality using `&foo[]` and `&foo[..]`. + +* `0..` could replace `..` in many use cases (but not for ranges in + ordered maps). + +# Unresolved questions + +Any parsing questions should already be mostly solved because of the +`a..` and `..b` cases. 
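A small usage sketch (not from the RFC; current Rust, where this syntax was adopted) showing that `..` is an ordinary expression producing a `RangeFull` value:

```rust
use std::ops::RangeFull;

fn main() {
    let v = vec![1, 2, 3, 4];

    // `..` by itself evaluates to a RangeFull value...
    let everything: RangeFull = ..;

    // ...and indexing with it slices the whole container.
    let all: &[i32] = &v[..];
    assert_eq!(all, &v[everything]);
    assert_eq!(all.len(), 4);
}
```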
diff --git a/text/0735-allow-inherent-impls-anywhere.md b/text/0735-allow-inherent-impls-anywhere.md
new file mode 100644
index 00000000000..8d700157884
--- /dev/null
+++ b/text/0735-allow-inherent-impls-anywhere.md
@@ -0,0 +1,73 @@
+- Start Date: 2015-02-19
+- RFC PR: [rust-lang/rfcs#735](https://github.com/rust-lang/rfcs/pull/735)
+- Rust Issue: [rust-lang/rust#22563](https://github.com/rust-lang/rust/issues/22563)
+
+# Summary
+
+Allow inherent implementations on types outside of the module they are defined in,
+effectively reverting [RFC PR 155](https://github.com/rust-lang/rfcs/pull/155).
+
+# Motivation
+
+The main motivation for disallowing such `impl` bodies was the implementation
+detail of fake modules being created to allow resolving `Type::method`, which
+only worked correctly for `impl Type {...}` if a `struct Type` or `enum Type`
+were defined in the same module. The old mechanism was obsoleted by UFCS,
+which desugars `Type::method` to `<Type>::method` and performs a type-based
+method lookup instead, with path resolution having no knowledge of inherent
+`impl`s - and all of that was implemented by [rust-lang/rust#22172](https://github.com/rust-lang/rust/pull/22172).
+
+Aside from invalidating the previous RFC's motivation, there is something to be
+said about dealing with restricted inherent `impl`s: it leads to non-DRY single-use
+extension traits, the worst offender being `AstBuilder` in libsyntax, with
+almost 300 lines of redundant method definitions.
+
+# Detailed design
+
+Remove the existing limitation, and only require that the `Self` type of the
+`impl` is defined in the same crate. This allows moving methods to other modules:
+```rust
+struct Player;
+
+mod achievements {
+    struct Achievement;
+    impl super::Player {
+        fn achieve(&mut self, _: Achievement) {}
+    }
+}
+```
+
+# Drawbacks
+
+Consistency, and the ease of finding method definitions by looking at the module
+where the type is defined, have been mentioned as advantages of this limitation.
+However, trait `impl`s already have that problem, and single-use extension traits
+could arguably be worse.
+
+# Alternatives
+
+- Leave it as it is. Seems unsatisfactory given that we're no longer limited
+  by implementation details.
+
+- We could go further and allow adding inherent methods to any type that could
+  implement a trait outside the crate:
+  ```rust
+  struct Point<T> { x: T, y: T }
+  impl<T> (Vec<Point<T>>, T) {
+      fn foo(&mut self) -> T { ... }
+  }
+  ```
+
+  The implementation would reuse the same coherence rules as for trait `impl`s,
+  and, for looking up methods, the "type definition to impl" map would be replaced
+  with a map from method name to a set of `impl`s containing that method.
+
+  *Technically*, I am not aware of any formulation that limits inherent methods
+  to user-defined types in the same crate, and this extra support could turn out
+  to have a straightforward implementation with no complications, but I'm trying
+  to present the whole situation to avoid issues in the future - even though I'm
+  not aware of backwards compatibility ones or any related to compiler internals.
+
+# Unresolved questions
+
+None.
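A runnable expansion of the snippet above, checked against current Rust (which adopted this rule); the `score` field and the string payload are invented for illustration:

```rust
pub struct Player {
    score: u32,
}

pub mod achievements {
    pub struct Achievement(pub &'static str);

    // Inherent impl outside the module that defines `Player`, but in the
    // same crate -- exactly what this RFC allows.
    impl super::Player {
        pub fn achieve(&mut self, a: Achievement) {
            println!("unlocked: {}", a.0);
            self.score += 10;
        }
    }
}

fn main() {
    let mut p = Player { score: 0 };
    p.achieve(achievements::Achievement("first steps"));
    println!("score: {}", p.score);
}
```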
diff --git a/text/0736-privacy-respecting-fru.md b/text/0736-privacy-respecting-fru.md new file mode 100644 index 00000000000..e7e7cf927ac --- /dev/null +++ b/text/0736-privacy-respecting-fru.md @@ -0,0 +1,328 @@ +- Start Date: 2015-01-26 +- RFC PR: https://github.com/rust-lang/rfcs/pull/736 +- Rust Issue: https://github.com/rust-lang/rust/issues/21407 + +# Summary + +Change Functional Record Update (FRU) for struct literal expressions +to respect struct privacy. + +# Motivation + +Functional Record Update is the name for the idiom by which one can +write `..` at the end of a struct literal expression to fill in +all remaining fields of the struct literal by using `` as the +source for them. + +```rust +mod foo { + pub struct Bar { pub a: u8, pub b: String, _cannot_construct: () } + + pub fn new_bar(a: u8, b: String) -> Bar { + Bar { a: a, b: b, _cannot_construct: () } + } +} + +fn main() { + let bar_1 = foo::new_bar(3, format!("bar one")); + + let bar_2a = foo::Bar { b: format!("bar two"), ..bar_1 }; // FRU! + + println!("bar_1: {} bar_2a: {}", bar_1.b, bar_2a.b); + + let bar_2b = foo::Bar { a: 17, ..bar_2a }; // FRU again! + + println!("bar_1: {} bar_2b: {}", bar_1.b, bar_2b.b); +} +``` + +Currently, Functional Record Update will freely move or copy all +fields not explicitly mentioned in the struct literal expression, +so the code above runs successfully. + +In particular, consider a case like this: + +```rust +#![allow(unstable)] +extern crate alloc; +use self::foo::Secrets; +mod foo { + use alloc; + #[allow(raw_pointer_derive)] + #[derive(Debug)] + pub struct Secrets { pub a: u8, pub b: String, ptr: *mut u8 } + + pub fn make_secrets(a: u8, b: String) -> Secrets { + let ptr = unsafe { alloc::heap::allocate(10, 1) }; + Secrets { a: a, b: b, ptr: ptr } + } + + impl Drop for Secrets { + fn drop(&mut self) { + println!("because of {}, deallocating {:p}", self.b, self.ptr); + unsafe { alloc::heap::deallocate(self.ptr, 10, 1); } + } + } +} + +fn main() { + let s_1 = foo::make_secrets(3, format!("ess one")); + let s_2 = foo::Secrets { b: format!("ess two"), ..s_1 }; // FRU ... + + println!("s_1.b: {} s_2.b: {}", s_1.b, s_2.b); + // at end of scope, ... both s_1 *and* s_2 get dropped. Boom! +} +``` + +This example prints the following (if one's memory allocator is not checking for double-frees): + +```text +s_1.b: ess one s_2.b: ess two +because of ess two, deallocating 0x7f00c182e000 +because of ess one, deallocating 0x7f00c182e000 +``` + +In particular, from reading the module `foo`, it appears that one is +attempting to preserve an invariant that each instance of `Secrets` +has its own unique `ptr` value; but this invariant is broken by the use +of FRU. + +Note that there is essentially no way around this abstraction +violation today; as shown for example in [Issue 21407], where +the backing storage for a `Vec` is duplicated in a second `Vec` +by use of the trivial FRU expression `{ ..t }` where `t: Vec`. + +[Issue 21407]: https://github.com/rust-lang/rust/issues/21407#issuecomment-71374092 + +Again, this is due to the current rule that Functional Record Update +will freely move or copy all fields not explicitly mentioned in the +struct literal expression, *regardless* of whether they are visible +(in terms of privacy) in the spot in code. 
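For concreteness (this example is not from the RFC): under the rule this RFC goes on to propose, which is what current Rust implements, FRU is rejected whenever the struct has a field that is not visible at the use site. The `secret` field below is invented for illustration:

```rust
mod foo {
    #[allow(dead_code)]
    pub struct Bar {
        pub a: u8,
        pub b: String,
        secret: u32, // private field
    }

    pub fn new_bar(a: u8, b: String) -> Bar {
        Bar { a, b, secret: 0 }
    }
}

fn main() {
    let bar_1 = foo::new_bar(3, "bar one".to_string());

    // Rejected today (field `secret` of struct `Bar` is private), so the
    // functional record update cannot duplicate the hidden field:
    // let bar_2 = foo::Bar { b: "bar two".to_string(), ..bar_1 };

    println!("{} {}", bar_1.a, bar_1.b);
}
```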
+ +This RFC proposes to change that rule, and say that a struct literal +expression using FRU is effectively expanded into a complete struct +literal with initializers for all fields (i.e., a struct literal that +does not use FRU), and that this expanded struct literal is subject to +privacy restrictions. + +The main motivation for this is to plug this abstraction-violating +hole with as little other change to the rules, implementation, and +character of the Rust language as possible. + + +# Detailed design + +As already stated above, the change proposed here is that a struct +literal expression using FRU is effectively expanded into a complete +struct literal with initializers for all fields (i.e., a struct +literal that does not use FRU), and that this expanded struct literal +is subject to privacy restrictions. + +(Another way to think of this change is: one can only use FRU with a +struct if one has visibility of all of its declared fields. If any +fields are hidden by privacy, then all forms of struct literal syntax +are unavailable, including FRU.) + +---- + +This way, the `Secrets` example above will be essentially equivalent to +```rust +#![allow(unstable)] +extern crate alloc; +use self::foo::Secrets; +mod foo { + use alloc; + #[allow(raw_pointer_derive)] + #[derive(Debug)] + pub struct Secrets { pub a: u8, pub b: String, ptr: *mut u8 } + + pub fn make_secrets(a: u8, b: String) -> Secrets { + let ptr = unsafe { alloc::heap::allocate(10, 1) }; + Secrets { a: a, b: b, ptr: ptr } + } + + impl Drop for Secrets { + fn drop(&mut self) { + println!("because of {}, deallocating {:p}", self.b, self.ptr); + unsafe { alloc::heap::deallocate(self.ptr, 10, 1); } + } + } +} + +fn main() { + let s_1 = foo::make_secrets(3, format!("ess one")); + // let s_2 = foo::Secrets { b: format!("ess two"), ..s_1 }; + // is rewritten to: + let s_2 = foo::Secrets { b: format!("ess two"), + /* remainder from FRU */ + a: s_1.a, ptr: s_1.ptr }; + + println!("s_1.b: {} s_2.b: {}", s_1.b, s_2.b); +} +``` + +which is rejected as field `ptr` of `foo::Secrets` is private and +cannot be accessed from `fn main` (both in terms of reading it from +`s_1`, but also in terms of using it to build a new instance of +`foo::Secrets`. + +---- + +(While the change to the language is described above in terms of +rewriting the code, the implementation need not go that route. In +particular, [this commit] shows a different strategy that is isolated +to the `librustc_privacy` crate.) + +[this commit]: https://github.com/pnkfelix/rust/commit/c651bac4189dc03d6a5637323b6ae02fc30e711a + +---- + +The proposed change is applied only to struct literal expressions. In +particular, enum struct variants are left unchanged, since all of +their fields are already implicitly public. + +# Drawbacks + +There is a use case for allowing private fields to be moved/copied via +FRU, which I call the "future extensibility" library design pattern: +it is a convenient way for a library author to tell clients to make +updated copies of a record in a manner that is oblivious to the +addition of new private fields to the struct (at least, new private +fields that implement `Copy`...). 
+ +For example, in Rust today without the change proposed here, in the +first example above using `Bar`, the author of the `mod foo` can +change `Bar` like so: + +```rust + pub struct Bar { pub a: u8, pub b: String, _hidden: u8 } + + pub fn new_bar(a: u8, b: String) -> Bar { + Bar { a: a, b: b, _hidden: 17 } + } +``` + +And all of the code from the `fn main` in the first example will +continue to run. + +Also, when the struct is moved (rather than copied) by the FRU +expression, the same pattern applies and works even when the new +private fields do not implement `Copy`. + +However, there is a small coding pattern that enables such continued +future-extensibility for library authors: divide the struct into the +entirely `pub` frontend, with one member that is the `pub` backend +with entirely private contents, like so: + +```rust +mod foo { + pub struct Bar { pub a: u8, pub b: String, pub _hidden: BarHidden } + pub struct BarHidden { _cannot_construct: () } + fn new_hidden() -> BarHidden { + BarHidden { _cannot_construct: () } + } + + pub fn new_bar(a: u8, b: String) -> Bar { + Bar { a: a, b: b, _hidden: new_hidden() } + } +} + +fn main() { + let bar_1 = foo::new_bar(3, format!("bar one")); + + let bar_2a = foo::Bar { b: format!("bar two"), ..bar_1 }; // FRU! + + println!("bar_1: {} bar_2a: {}", bar_1.b, bar_2a.b); + + let bar_2b = foo::Bar { a: 17, ..bar_2a }; // FRU again! + + println!("bar_1: {} bar_2b: {}", bar_1.b, bar_2b.b); +} +``` + +All hidden changes that one would have formerly made to `Bar` itself +are now made to `BarHidden`. The struct `Bar` is entirely public (including +the supposedly-hidden field named `_hidden`), and +thus can be legally be used with FRU in all client contexts that can +see the type `Bar`, even under the new rules proposed by this RFC. + + + +# Alternatives + +Most Important: If we do not do *something* about this, then both stdlib types like +`Vec` and user-defined types will fundmentally be unable to enforce +abstraction. In other words, the Rust language will be broken. + +---- + +glaebhoerl and pnkfelix outlined a series of potential alternatives, including this one. +Here is an attempt to transcribe/summarize them: + + 1. Change the FRU form `Bar { x: new_x, y: new_y, ..old_b }` so it + somehow is treated as consuming `old_b`, rather than + moving/copying each of the remaining fields in `old_b`. + + It is not totally clear what the semantics actually are for this + form. Also, there may not be time to do this properly for 1.0. + + 2. Try to adopt a data/abstract-type distinction along the lines of the one in [glaebhoerl's draft RFC]. + +[glaebhoerl's draft RFC]: https://raw.githubusercontent.com/glaebhoerl/rust-notes/master/my_rfcs/Distinguish%20data%20types%20from%20abstract%20types.txt + + As a special subnote on this alternative: While [glaebhoerl's draft RFC] proposed + syntactic forms for indicating the data/abstract-type distinction, we could + also (or instead) do it based solely on the presence of a single non-`pub` + field, as pointed out by glaebhoerl at the [comment here]. + +[comment here]: https://github.com/rust-lang/rust/issues/21407#issuecomment-71196581 + + (Another potential criterion could be "has *all* private fields."; see + related discussion below in the item "Outlaw the trivial FRU form Foo".) + + 3. let FRU keep its current privacy violating semantics, but also + make FRU something one must opt-in to support on a type. E.g. make + a builtin `FunUpdate` trait that a struct must implement in order + to be usable with FRU. 
(Or maybe it's an attribute you attach to
+    the struct item.)
+
+    This approach would impose a burden on all code today that makes
+    use of FRU, since every struct used with FRU would have to start
+    implementing `FunUpdate`. Thus, it is not a simple change for the
+    libraries and the overall ecosystem to absorb.
+
+ 4. Adopt this RFC, but add a builtin `HygienicFunUpdate` trait that
+    one can opt into to get the old (privacy-violating) semantics.
+
+    While this is obviously complicated, it has the advantage that it
+    has a staged landing strategy: We could just adopt and implement
+    this RFC for 1.0 beta. We could add `HygienicFunUpdate` at an
+    arbitrary point in the future; it would not have to be in the 1.0
+    release.
+
+    (For why the trait is named `HygienicFunUpdate`, see comment
+    thread on [Issue 21407].)
+
+ 5. Add a way for a struct item to opt out of FRU support entirely,
+    e.g. via an attribute.
+
+    This seems pretty fragile; i.e., easy to forget.
+
+ 6. Outlaw the trivial FRU form `Foo { .. }`. That is, to use
+    FRU, you have to use at least one field in the constructing
+    expression. Again, this implies that types like `Vec` and `HashMap`
+    will not be subject to the vulnerability outlined here.
+
+    This solves the vulnerability for types like `Vec` and `HashMap`,
+    but the `Secrets` example from the Motivation section still
+    breaks; the author of the `mod foo` library will need to write
+    their code more carefully to ensure that secret things are
+    contained in a separate struct with all private fields,
+    much like the `BarHidden` code pattern discussed above.
+
+# Unresolved questions
+
+How important is the "future extensibility" library design pattern
+described in the Drawbacks section? How many Cargo packages, if any,
+use it?
diff --git a/text/0738-variance.md b/text/0738-variance.md
new file mode 100644
index 00000000000..c4b70f44140
--- /dev/null
+++ b/text/0738-variance.md
@@ -0,0 +1,566 @@
+- Start Date: 2014-12-19
+- RFC PR: https://github.com/rust-lang/rfcs/pull/738
+- Rust Issue: https://github.com/rust-lang/rust/issues/22212
+
+# Summary
+
+- Use inference to determine the *variance* of input type parameters.
+- Make it an error to have unconstrained type/lifetime parameters.
+- Revamp the variance markers to make them more intuitive and less numerous.
+  In fact, there are only two: `PhantomData` and `PhantomFn`.
+- Integrate the notion of `PhantomData` into other automated compiler
+  analyses, notably OIBIT, that can otherwise be deceived into yielding
+  incorrect results.
+
+# Motivation
+
+## Why variance is good
+
+Today, all type parameters are invariant. This can be problematic
+around lifetimes. A particularly common example of where problems
+arise is in the use of `Option`. Here is a simple example. Consider
+this program, which has a struct containing two references:
+
+```
+struct List<'l> {
+    field1: &'l int,
+    field2: &'l int,
+}
+
+fn foo(field1: &int, field2: &int) {
+    let list = List { field1: field1, field2: field2 };
+    ...
+}
+
+fn main() { }
+```
+
+Here the function `foo` takes two references with distinct lifetimes.
+The variable `list` winds up being instantiated with a lifetime that
+is the intersection of the two (presumably, the body of `foo`). This
+is good.
+
+ +If we modify this program so that one of those references is optional, +however, we will find that it gets a compilation error: + +``` +struct List<'l> { + field1: &'l int, + field2: Option<&'l int>, +} + +fn foo(field1: &int, field2: Option<&int>) { + let list = List { field1: field1, field2: field2 }; + // ERROR: Cannot infer an appropriate lifetime + ... +} + +fn main() { } +``` + +The reason for this is that because `Option` is *invariant* with +respect to its argument type, it means that the lifetimes of `field1` +and `field2` must match *exactly*. It is not good enough for them to +have a common subset. This is not good. + +## What variance is + +[Variance][v] is a general concept that comes up in all languages that +combine subtyping and generic types. However, because in Rust all +subtyping is related to the use of lifetimes parameters, Rust uses +variance in a very particular way. Basically, variance is a +determination of when it is ok for lifetimes to be approximated +(either made bigger or smaller, depending on context). + +Let me give a few examples to try and clarify how variance works. +Consider this simple struct `Context`: + +```rust +struct Context<'data> { + data: &'data u32, + ... +} +``` + +Here the `Context` struct has one lifetime parameter, `data`, that +represents the lifetime of some data that it references. Now let's +imagine that the lifetime of the data is some lifetime we call +`'x`. If we have a context `cx` of type `Context<'x>`, it is ok to +(for example) pass `cx` as an argment where a value of type +`Context<'y>` is required, so long as `'x : 'y` ("`'x` outlives +`'y`"). That is, it is ok to approximate `'x` as a shorter lifetime +like `'y`. This makes sense because by changing `'x` to `'y`, we're +just pretending the data has a shorter lifetime than it actually has, +which can't do any harm. Here is an example: + +```rust +fn approx_context<'long,'short>(t: &Context<'long>, data: &'short Data) + where 'long : 'short +{ + // here we approximate 'long as 'short, but that's perfectly safe. + let u: &Context<'short> = t; + do_something(u, data) +} + +fn do_something<'x>(t: &Context<'x>, data: &'x Data) { + ... +} +``` + +This case has been traditionally called "contravariant" by Rust, +though some argue (somewhat persuasively) that +["covariant" is the better terminology][391]. In any case, this RFC +generally abandons the "variance" terminology in publicly exposed APIs +and bits of the language, making this a moot point (in this RFC, +however, I will stick to calling lifetimes which may be made smaller +"contravariant", since that is what we have used in the past). + +[391]: https://github.com/rust-lang/rfcs/issues/391 + +Next let's consider a struct with interior mutability: + +```rust +struct Table<'arg> { + cell: Cell<&'arg Foo> +} +``` + +In the case of `Table`, it is not safe for the compiler to approximate +the lifetime `'arg` at all. This is because `'arg` appears in a +mutable location (the interior of a `Cell`). 
Let me show you what +could happen if we did allow `'arg` to be approximated: + +```rust +fn innocent<'long>(t: &Table<'long>) { + { + let foo: Foo = ..; + evil(t, &foo); + } + t.cell.get() // reads `foo`, which has been destroyed +} + +fn evil<'long,'short>(t: &Table<'long>, s: &'short Foo) + where 'long : 'short +{ + // The following assignment is not legal, but it would be legal + let u: &Table<'short> = t; + u.cell.set(s); +} +``` + +Here the function `evil()` changes contents of `t.cell` to point at +data with a shorter lifetime than `t` originally had. This is bad +because the caller still has the old type (`Table<'long>`) and doesn't +know that data with a shorter lifetime has been inserted. (This is +traditionally called "invariant".) + +Finally, there can be cases where it is ok to make a lifetime +*longer*, but not shorter. This comes up (for example) in a type like +`fn(&'a u8)`, which may be safely treated as a `fn(&'static u8)`. + +[v]: http://en.wikipedia.org/wiki/Covariance_and_contravariance_%28computer_science%29 + +## Why variance should be inferred + +Actually, lifetime parameters already have a notion of variance, and +this varinace is fully inferred. In fact, the proper variance for type +parameters is *also* being inferred, we're just largely ignoring +it. (It's not completely ignored; it informs the variance of +lifetimes.) + +The main reason we chose inference over declarations is that variance +is rather tricky business. Most of the time, it's annoying to have to +think about it, since it's a purely mechanical thing. The main reason +that it pops up from time to time in Rust today (specifically, in +examples like the one above) is because we *ignore* the results of +inference and just make everything invariant. + +But in fact there is another reason to prefer inference. When manually +specifying variance, it is easy to get those manual specifications +wrong. There is one example later on where the author did this, but +using the mechanisms described in this RFC to guide the inference +actually led to the correct solution. + +## The corner case: unused parameters and parameters that are only used unsafely + +Unfortunately, variance inference only works if type parameters are +actually *used*. Otherwise, there is no data to go on. You might think +parameters would always be used, but this is not true. In particular, +some types have "phantom" type or lifetime parameters that are not +used in the body of the type. This generally occurs with unsafe code: + + struct Items<'vec, T> { // unused lifetime parameter 'vec + x: *mut T + } + + struct AtomicPtr { // unused type parameter T + data: AtomicUint // represents an atomically mutable *mut T, really + } + +Since these parameters are unused, the inference can reasonably +conclude that `AtomicPtr` and `AtomicPtr` are +interchangable: after all, there are no fields of type `T`, so what +difference does it make what value it has? This is not good (and in +fact we have behavior like this today for lifetimes, which is a common +source of error). + +To avoid this hazard, the RFC proposes to make it an error to have a +type or lifetime parameter whose variance is not constrained. Almost +always, the correct thing to do in such a case is to either remove the +parameter in question or insert a *marker type*. Marker types +basically inform the inference engine to pretend as if the type +parameter were used in particular ways. They are discussed in the next section. 
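+
+To make the payoff concrete, here is a small sketch (the names are made up,
+and it uses the `PhantomData` marker described in the next section). Once the
+otherwise-unused `'vec` is tied down via a phantom `&'vec [T]`, the parameter
+gets the ordinary covariant treatment: a `RawIter` borrowed for a long
+lifetime can be used where a shorter one is expected, but the compiler will
+no longer let the lifetime be lengthened or forged:
+
+```rust
+use std::marker::PhantomData;
+
+struct RawIter<'vec, T: 'vec> {
+    ptr: *const T,
+    // Pretend we also hold a `&'vec [T]`, even though only a raw pointer is stored.
+    _borrow: PhantomData<&'vec [T]>,
+}
+
+// Shortening the lifetime is fine (covariance)...
+fn shorten<'a: 'b, 'b, T: 'a>(it: RawIter<'a, T>) -> RawIter<'b, T> {
+    it
+}
+// ...but the reverse direction is rejected, unlike with an unconstrained `'vec`.
+```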
+
+## Revamping the marker types
+
+### The UnsafeCell type
+
+As today, the `UnsafeCell<T>` type is well-known to `rustc` and is
+always considered invariant with respect to its type parameter `T`.
+
+### Phantom data
+
+This RFC proposes to replace the existing marker types
+(`CovariantType`, `ContravariantLifetime`, etc.) with a single type,
+`PhantomData<T>`:
+
+```rust
+// Represents data of type `T` that is logically present, although the
+// type system cannot see it. This type is covariant with respect to `T`.
+struct PhantomData<T>;
+```
+
+An instance of `PhantomData<T>` is used to represent data that is
+logically present, although the type system cannot see
+it. `PhantomData<T>` is covariant with respect to its type parameter `T`.
+Here are some examples of uses of `PhantomData` from the standard library:
+
+```rust
+struct AtomicPtr<T> {
+    data: AtomicUint,
+
+    // Act as if we could reach a `*mut T` for variance. This will
+    // make `AtomicPtr<T>` *invariant* with respect to `T` (because `T` appears
+    // underneath the `mut` qualifier).
+    marker: PhantomData<*mut T>,
+}
+
+pub struct Items<'a, T: 'a> {
+    ptr: *const T,
+    end: *const T,
+
+    // Act as if we could reach a slice `[T]` with lifetime `'a`.
+    // Induces covariance on `T` and suitable variance on `'a`
+    // (covariance using the definition from rfcs#391).
+    marker: marker::PhantomData<&'a [T]>,
+}
+```
+
+Note that `PhantomData` can be used to induce covariance, invariance, or
+contravariance as desired:
+
+```rust
+PhantomData<T>           // covariance
+PhantomData<*mut T>      // invariance, but see "unresolved question"
+PhantomData<Cell<T>>     // invariance
+PhantomData<fn(T)>       // contravariance
+```
+
+Even better, the user doesn't really have to understand the terms
+covariance, invariance, or contravariance; they simply have to accurately
+model the kind of data that the type system should pretend is present.
+
+**Other uses for phantom data.** It turns out that phantom data is an
+important concept for other compiler analyses. One example is the
+OIBIT analysis, which decides whether certain traits (like `Send` and
+`Sync`) are implemented by recursively examining the fields of structs
+and enums. OIBIT should treat phantom data the same as normal
+fields. Another example is the ongoing work for removing the
+`#[unsafe_dtor]` annotation, which also sometimes requires a recursive
+analysis of a similar nature.
+
+### Phantom functions
+
+One limitation of the marker type `PhantomData` is that it cannot be
+used to constrain unused parameters appearing on traits. Consider
+the following example:
+
+```rust
+trait Dummy<T> { /* T is never used here! */ }
+```
+
+Normally, the variance of a trait type parameter would be determined
+based on where it appears in the trait's methods: but in this case
+there are no methods. Therefore, we introduce two special traits that
+can be used to induce variance. Similarly to `PhantomData`, these
+traits represent parts of the interface that are logically present, if
+not actually present:
+
+    // Act as if there were a method `fn foo(A) -> R`. Induces contravariance on A
+    // and covariance on R.
+    trait PhantomFn<A, R>;
+
+These traits should appear in the supertrait list. For example, the
+`Dummy` trait might be modified as follows:
+
+```rust
+trait Dummy<T> : PhantomFn() -> T { }
+```
+
+As you can see, the `()` notation can be used with `PhantomFn` as
+well.
+
+### Designating marker traits
+
+In addition to phantom fns, there is a convenient trait `MarkerTrait`
+that is intended for use as a supertrait for traits that designate
+sets of types.
These traits often have no methods and thus no actual +uses of `Self`. The builtin bounds are a good example: + +```rust +trait Copy : MarkerTrait { } +trait Sized : MarkerTrait { } +unsafe trait Send : MarkerTrait { } +unsafe trait Sync : MarkerTrait { } +``` + +`MarkerTrait` is not builtin to the language or specially understood +by the compiler, it simply encapsulates a common pattern. It is +implemented as follows: + +```rust +trait MarkerTrait for Sized? : PhantomFn(Self) -> bool { } +impl MarkerTrait for T { } +``` + +Intuitively, `MarkerTrait` extends `PhantomFn(Self)` because it is "as +if" the traits were defined like: + +```rust +trait Copy { + fn is_copyable(&self) -> bool { true } +} +``` + +Here, the type parameter `Self` appears in argument position, which is +contravariant. + +**Why contravariance?** To see why contravariance is correct, you have +to consider what it means for `Self` to be contravariant for a marker +trait. It means that if I have evidence that `T : Copy`, then I can +use that as evidence to show that `U +: Copy` if `U <: T`. More formally: + + (T : Copy) <: (U : Copy) // I can use `T:Copy` where `U:Copy` is expected... + U <: T // ...so long as `U <: T` + +More intuitively, it means that if a type `T` implements the marker, +than all of its subtypes must implement the marker. + +Because subtyping is exclusively tied to lifetimes in Rust, and most +marker traits are orthogonal to lifetimes, it actually rarely makes a +difference what choice you make here. But imagine that we have a +marker trait that requires `'static` (such as `Send` today, though +this may change). If we made marker traits covariant with respect to +`Self`, then `&'static Foo : Send` could be used as evidence that `&'x +Foo : Send` for any `'x`, because `&'static Foo <: &'x Foo`: + + (&'static Foo : Send) <: (&'x Foo : Send) // if things were covariant... + &'static Foo <: &'x Foo // ...we'd have the wrong relation here + +*Interesting side story: the author thought that covariance would be +correct for some time. It was only when attempting to phrase the +desired behavior as a fn that I realized I had it backward, and +quickly found the counterexample I give above. This gives me +confidence that expressing variance in terms of data and fns is more +reliable than trying to divine the correct results directly.* + +# Detailed design + +Most of the detailed design has already been covered in the motivation +section. + +#### Summary of changes required + +- Use variance results to inform subtyping of nominal types + (structs, enums). +- Use variance for the output type parameters on traits. +- Input type parameters of traits are considered invariant. +- Variance has no effect on the type parameters on an impl or fn; + rather those are freshly instantiated at each use. +- Report an error if the inference does not find any use of a type or + lifetime parameter *and* that parameter is not bound in an + associated type binding in some where clause. + +These changes have largely been implemented. You can view the results, +and the impact on the standard library, in +[this branch on nikomatsakis's repository][b]. Note though that as of +the time of this writing, the code is slightly outdated with respect +to this RFC in certain respects (which will clearly be rectified +ASAP). + +[b]: https://github.com/nikomatsakis/rust/tree/variance-3 + +#### Variance inference algorithm + +I won't dive too deeply into the inference algorithm that we are using +here. 
It is based on Section 4 of the paper +["Taming the Wildcards: Combining Definition- and Use-Site Variance"][taming] +published in PLDI'11 and written by Altidor et al. There is a fairly +detailed (and hopefully only slightly outdated) description in +[the code] as well. + +[taming]: http://people.cs.umass.edu/~yannis/variance-pldi11.pdf +[the code]: https://github.com/nikomatsakis/rust/blob/variance-3/src/librustc_typeck/variance.rs#L11-L205 + +#### Bivariance yields an error + +One big change from today is that if we compute a result of bivariance +as the variance for any type or lifetime parameter, we will report a +hard error. The error message explicitly suggests the use of a +`PhantomData` or `PhantomFn` marker as appropriate: + + type parameter `T` is never used; either remove it, or use a + marker such as `std::kinds::marker::PhantomData`" + +The goal is to help users as concretely as possible. The documentation +on the phantom markers should also be helpful in guiding users to make +the right choice (the ability to easily attach documentation to the +marker type was in fact the major factor that led us to adopt marker +types in the first place). + +#### Rules for associated types + +The only exception is when this type parameter is in fact +an output that is implied by where clauses declared on the type. As +an example of why this distinction is important, consider the type +`Map` declared here: + +```rust +struct Map +where I : Iterator, F : FnMut(A) -> B +{ + iter: I, + fn: F, +} +``` + +Neither the type `A` nor `B` are reachable from the fields declared +within `Map`, and hence the variance inference for them results in +bivariance. However, they are nonetheless constrained. In the case of +the parameter `A`, its value is determined by the type `I`, and `B` is +determined by the type `F` (note that [RFC 587][587] makes the return +type of `FnMut` an associated type). + +The analysis to decide when a type parameter is implied by other type +parameters is the same as that specified in [RFC 447][447]. + +[447]: https://github.com/rust-lang/rfcs/blob/master/text/0447-no-unused-impl-parameters.md#detailed-design +[587]: https://github.com/rust-lang/rfcs/blob/master/text/0587-fn-return-should-be-an-associated-type.md + +# Future possibilities + +**Make phantom data and fns more first-class.** One thing I would +consider in the future is to integrate phantom data and fns more +deeply into the language to improve usability. The idea would be to +add a phantom keyword and then permit the explicit declaration of +phantom fields and fns in structs and traits respectively: + +```rust +// Instead of +struct Foo { + pointer: *mut u8, + _marker: PhantomData +} +trait MarkerTrait : PhantomFn(Self) { +} + +// you would write: +struct Foo { + pointer: *mut u8, + phantom T +} +trait MarkerTrait { + phantom fn(Self); +} +``` + +Phantom fields would not need to be specified when creating an +instance of a type and (being anonymous) could never be named. They +exist solely to aid the analysis. This would improve the usability of +phantom markers greatly. + +# Alternatives + +**Default to a particular variance when a type or lifetime parameter +is unused.** A prior RFC advocated for this approach, mostly because +markers were seen as annoying to use. However, after some discussion, +it seems that it is more prudent to make a smaller change and retain +explicit declarations. Some factors that influenced this decision: + +- The importance of phantom data for other analyses like OIBIT. 
+
+- Many unused lifetime parameters (and some unused type parameters) are in
+  fact completely unnecessary. Defaulting to a particular variance would
+  not help in identifying these cases (though a better dead code lint might).
+- The only default that is always correct is invariance, and
+  invariance is typically too strong.
+- Phantom type parameters occur relatively rarely anyhow.
+
+**Remove variance inference and use fully explicit declarations.**
+Variance inference is a rare case where we do non-local inference
+across type declarations. It might seem more consistent to use
+explicit declarations. However, variance declarations are notoriously
+hard for people to understand. We were unable to come up with a
+suitable set of keywords or other system that felt sufficiently
+lightweight. Moreover, explicit annotations are error-prone when
+compared to the phantom data and fn approach (see the example in the
+section regarding marker traits).
+
+# Unresolved questions
+
+There is one significant unresolved question: the correct way to
+handle a `*mut` pointer. It was revealed recently that while the
+current treatment of `*mut T` is correct, it frequently yields overly
+conservative inference results in practice. At present the inference
+treats `*mut T` as invariant with respect to `T`: this is correct and
+sound, because a `*mut` represents aliasable, mutable data, and indeed
+the subtyping relation for `*mut T` is that `*mut T <: *mut U` only if `T = U`.
+
+However, in practice, `*mut` pointers are often used to build safe
+abstractions, the APIs of which do not in fact permit aliased
+mutation. Examples are `Vec`, `Rc`, `HashMap`, and so forth. In all of
+these cases, the correct variance is covariant -- but because of the
+conservative treatment of `*mut`, all of these types are being
+inferred to an invariant result.
+
+The complete solution to this seems to have two parts. First, for
+convenience and abstraction, we should not be building safe
+abstractions on raw `*mut` pointers anyway. We should have several
+convenient newtypes in the standard library, like `ptr::Unique`, that
+can be used instead, which would also help for handling OIBIT conditions
+and `NonZero` optimizations. In my branch I have used the existing (but
+unstable) type `ptr::Unique` for the primary role, which is kind of an
+"unsafe box". `Unique` should ensure that it is covariant with respect
+to its argument.
+
+However, this raises the question of how to implement `Unique` under
+the hood, and what to do with `*mut T` in general. There are various
+options:
+
+1. Change `*mut` so that it behaves like `*const`. This unfortunately
+   means that abstractions that introduce shared mutability have
+   a responsibility to add phantom data to that effect, something
+   like `PhantomData<*const Cell<T>>`. This seems non-obvious and
+   unnatural.
+
+2. Rewrite safe abstractions to use `*const` (or even `usize`) instead
+   of `*mut`, casting to `*mut` only when inside a `&mut self`
+   method. This is probably the most conservative option.
+
+3. Change variance to ignore `*mut` referents entirely. Add a lint to
+   detect types with a `*mut T` field and require some sort of explicit
+   marker that covers `T`. This is perhaps the most explicit
+   option. Like option 1, it creates the odd scenario that the
+   variance computation and subtyping relation diverge.
+
+Currently I lean towards option 2 (sketched below).
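+
+For concreteness, here is a rough sketch of what option 2 could look like for
+a hypothetical buffer type (the names below are made up for illustration and
+are not the actual standard library code): the pointer is stored as
+`*const T`, so variance inference only ever sees a covariant use of `T`, and
+the cast to `*mut T` happens solely inside a `&mut self` method, where
+exclusive access is already guaranteed by the borrow checker.
+
+```rust
+// Hypothetical sketch of option 2; not the real `Vec`/`Rc` implementation.
+struct RawBuf<T> {
+    ptr: *const T, // covariant use of `T`
+    len: usize,
+}
+
+impl<T> RawBuf<T> {
+    fn get(&self, i: usize) -> &T {
+        assert!(i < self.len);
+        unsafe { &*self.ptr.offset(i as isize) }
+    }
+
+    fn get_mut(&mut self, i: usize) -> &mut T {
+        assert!(i < self.len);
+        // The only place a `*mut T` is conjured: behind `&mut self`.
+        unsafe { &mut *(self.ptr as *mut T).offset(i as isize) }
+    }
+}
+```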
diff --git a/text/0769-sound-generic-drop.md b/text/0769-sound-generic-drop.md new file mode 100644 index 00000000000..1842124f203 --- /dev/null +++ b/text/0769-sound-generic-drop.md @@ -0,0 +1,1164 @@ +- Start Date: 2013-08-29 +- RFC PR: [rust-lang/rfcs#769](https://github.com/rust-lang/rfcs/pull/769) +- Rust Issue: [rust-lang/rust#8861](https://github.com/rust-lang/rust/issues/8861) + +# History + +2015.09.18 -- This RFC was partially superceded by RFC 1238, which +removed the parametricity-based reasoning in favor of an attribute. + +# Summary + +Remove `#[unsafe_destructor]` from the Rust language. Make it safe +for developers to implement `Drop` on type- and lifetime-parameterized +structs and enum (i.e. "Generic Drop") by imposing new rules on code +where such types occur, to ensure that the drop implementation cannot +possibly read or write data via a reference of type `&'a Data` where +`'a` could have possibly expired before the drop code runs. + +Note: This RFC is describing a feature that has been long in the +making; in particular it was previously sketched in Rust [Issue #8861] +"New Destructor Semantics" (the source of the tongue-in-cheek "Start +Date" given above), and has a [prototype implementation] that is being +prepared to land. The purpose of this RFC is two-fold: + + 1. standalone documentation of the (admittedly conservative) rules + imposed by the new destructor semantics, and + + 2. elicit community feedback on the rules, both in the form they will + take for 1.0 (which is relatively constrained) and the form they + might take in the future (which allows for hypothetical language + extensions). + +[Issue #8861]: https://github.com/rust-lang/rust/issues/8861 + +[prototype implementation]: https://github.com/pnkfelix/rust/tree/77afdb70a1d4d5a20069f12412bfeda3ccd145bf + +# Motivation + +Part of Rust's design is rich use of Resource Acquisition Is +Initialization (RAII) patterns, which requires destructors: code +attached to certain types that runs only when a value of the type goes +out of scope or is otherwise deallocated. In Rust, the `Drop` trait is +used for this purpose. + +Currently (as of Rust 1.0 alpha), a developer cannot implement `Drop` +on a type- or lifetime-parametric type (e.g. `struct Sneetch<'a>` or +`enum Zax`) without attaching the `#[unsafe_destructor]` attribute +to it. The reason this attribute is required is that the current +implementation allows for such destructors to inject unsoundness +accidentally (e.g. reads from or writes to deallocated memory, +accessing data when its representation invariants are no longer +valid). + +Furthermore, while some destructors can be implemented with no danger +of unsoundness, regardless of `T` (assuming that any `Drop` +implementation attached to `T` is itself sound), as soon as one wants +to interact with borrowed data within the `fn drop` code (e.g. access +a field `&'a StarOffMachine` from a value of type `Sneetch<'a>` ), +there is currently no way to enforce a rule that `'a` *strictly* +*outlive* the value itself. This is a huge gap in the language as it +stands: as soon as a developer attaches `#[unsafe_destructor]` to such +a type, it is imposing a subtle and *unchecked* restriction on clients +of that type that they will not ever allow the borrowed data to expire +first. + +## Lifetime parameterization: the Sneetch example +[The Sneetch example]: #lifetime-parameterization-the-sneetch-example + +If today Sylvester writes: + +```rust +// opt-in to the unsoundness! 
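+//
+// (Road map for this example: `StarOffMachine` marks itself unusable in its
+// own destructor, while each `Sneetch<'a>` destructor calls back into the
+// machine it borrowed. In `unwary_client`, `m` is declared after `s1` within
+// the same `let`, so `m` is dropped first, and `s1`'s destructor then calls
+// `remove_star` on a machine whose destructor has already run; the assertion
+// in `remove_star` stands in for that dangling access.)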
+#![feature(unsafe_destructor)] + +pub mod mcbean { + use std::cell::Cell; + + pub struct StarOffMachine { + usable: bool, + dollars: Cell, + } + + impl Drop for StarOffMachine { + fn drop(&mut self) { + let contents = self.dollars.get(); + println!("Dropping a machine; sending {} dollars to Sylvester.", + contents); + self.dollars.set(0); + self.usable = false; + } + } + + impl StarOffMachine { + pub fn new() -> StarOffMachine { + StarOffMachine { usable: true, dollars: Cell::new(0) } + } + pub fn remove_star(&self, s: &mut Sneetch) { + assert!(self.usable, + "No different than a read of a dangling pointer."); + self.dollars.set(self.dollars.get() + 10); + s.has_star = false; + } + } + + pub struct Sneetch<'a> { + name: &'static str, + has_star: bool, + machine: Cell>, + } + + impl<'a> Sneetch<'a> { + pub fn new(name: &'static str) -> Sneetch<'a> { + Sneetch { + name: name, + has_star: true, + machine: Cell::new(None) + } + } + + pub fn find_machine(&self, m: &'a StarOffMachine) { + self.machine.set(Some(m)); + } + } + + #[unsafe_destructor] + impl<'a> Drop for Sneetch<'a> { + fn drop(&mut self) { + if let Some(m) = self.machine.get() { + println!("{} says ``before I die, I want to join my \ + plain-bellied brethren.''", self.name); + m.remove_star(self); + } + } + } +} + +fn unwary_client() { + use mcbean::{Sneetch, StarOffMachine}; + let (s1, m, s2, s3); // (accommodate PR 21657) + s1 = Sneetch::new("Sneetch One"); + m = StarOffMachine::new(); + s2 = Sneetch::new("Sneetch Two"); + s3 = Sneetch::new("Sneetch Zee"); + + s1.find_machine(&m); + s2.find_machine(&m); + s3.find_machine(&m); +} + +fn main() { + unwary_client(); +} +``` + +This compiles today; if you run it, it prints the following: + +``` +Sneetch Zee says ``before I die, I want to join my plain-bellied brethren.'' +Sneetch Two says ``before I die, I want to join my plain-bellied brethren.'' +Dropping a machine; sending 20 dollars to Sylvester. +Sneetch One says ``before I die, I want to join my plain-bellied brethren.'' +thread '

' panicked at 'No different than a read of a dangling pointer.', :27 +``` + +Explanation: In Sylvester's code, the `Drop` implementation for +`Sneetch` invokes a method on the borrowed reference in the field +`machine`. This implies there is an implicit restriction on an value +`s` of type `Sneetch<'a>`: the lifetime `'a` must *strictly outlive* +`s`. + +(The example encodes this constraint in a dynamically-checked manner +via an explicit `usable` boolean flag that is only set to false in the +machine's own destructor; it is important to keep in mind that this is +just a method to illustrate the violation in a semi-reliable manner: +Using a machine after `usable` is set to false by its `fn drop` code +is analogous to dereferencing a `*mut T` that has been deallocated, or +similar soundness violations.) + +Sylvester's API does not encode the constraint "`'a` must strictly +outlive the `Sneetch<'a>`" explicitly; Rust currently has no way of +expressing the constraint that one lifetime be strictly greater than +another lifetime or type (the form `'a:'b` only formally says that +`'a` must live *at least* as long as `'b`). + +Thus, client code like that in `unwary_client` can inadvertantly set +up scenarios where Sylvester's code may break, and Sylvester might be +completely unaware of the vulnerability. + +## Type parameterization: the problem of trait bounds +[The Zook example]: #type-parameterization-the-problem-of-trait-bounds + +One might think that all instances of this problem can +be identified by the use of a lifetime-parametric `Drop` implementation, +such as `impl<'a> Drop for Sneetch<'a> { ..> }` + +However, consider this trait and struct: + +```rust +trait Button { fn push(&self); } +struct Zook { button: B, } +#[unsafe_destructor] +impl Drop for Zook { + fn drop(&mut self) { self.button.push(); } +} +``` +In this case, it is not obvious that there is anything wrong here. + +But if we continue the example: +```rust +struct Bomb { usable: bool } +impl Drop for Bomb { fn drop(&mut self) { self.usable = false; } } +impl Bomb { fn activate(&self) { assert!(self.usable) } } + +enum B<'a> { HarmlessButton, BigRedButton(&'a Bomb) } +impl<'a> Button for B<'a> { + fn push(&self) { + if let B::BigRedButton(borrowed) = *self { + borrowed.activate(); + } + } +} + +fn main() { + let (mut zook, ticking); + zook = Zook { button: B::HarmlessButton }; + ticking = Bomb { usable: true }; + zook.button = B::BigRedButton(&ticking); +} +``` +Within the `zook` there is a hidden reference to borrowed data, +`ticking`, that is assigned the same lifetime as `zook` but that +will be dropped before `zook` is. + +(These examples may seem contrived; see [Appendix A] for a far less +contrived example, that also illustrates how the use of borrowed data +can lie hidden behind type parameters.) + +## The proposal + +This RFC is proposes to fix this scenario, by having the compiler +ensure that types with destructors are only employed in contexts where +either any borrowed data with lifetime `'a` within the type either +strictly outlives the value of that type, or such borrowed data is +provably not accessible from any `Drop` implementation via a reference +of type `&'a`/`&'a mut`. This is the "Drop-Check" (aka `dropck`) rule. + +# Detailed design + +## The Drop-Check Rule +[The Drop-Check Rule]: #the-drop-check-rule + +The Motivation section alluded to the compiler enforcing a new rule. 
+Here is a more formal statement of that rule: + +Let `v` be some value (either temporary or named) +and `'a` be some lifetime (scope); +if the type of `v` owns data of type `D`, where +(1.) `D` has a lifetime- or type-parametric `Drop` implementation, and +(2.) the structure of `D` can reach a reference of type `&'a _`, and +(3.) either: + + * (A.) the `Drop impl` for `D` instantiates `D` at `'a` + directly, i.e. `D<'a>`, or, + + * (B.) the `Drop impl` for `D` has some type parameter with a + trait bound `T` where `T` is a trait that has at least + one method, + +then `'a` must strictly outlive the scope of `v`. + +(Note: This rule is using two phrases that deserve further +elaboration and that are discussed further in sections that +follow: ["the type owns data of type `D`"][type-ownership] +and ["must strictly outlive"][strictly-outlives].) + +(Note: When encountering a `D` of the form `Box`, we +conservatively assume that such a type has a `Drop` implementation +parametric in `'b`.) + +This rule allows much sound existing code to compile without complaint +from `rustc`. This is largely due to the fact that many `Drop` +implementations enjoy near-complete parametricity: They tend to not +impose any bounds at all on their type parameters, and thus the rule +does not apply to them. + +At the same time, this rule catches the cases where a destructor could +possibly reference borrowed data via a reference of type `&'a _` or +`&'a mut_`. Here is why: + +Condition (A.) ensures that a type like `Sneetch<'a>` +from [the Sneetch example] will only be +assigned to an expression `s` where `'a` strictly outlives `s`. + +Condition (B.) catches cases like `Zook>` from +[the Zook example], where the destructor's interaction with borrowed +data is hidden behind a method call in the `fn drop`. + +## Near-complete parametricity suffices + +### Noncopy types + +All non-`Copy` type parameters are (still) assumed to have a +destructor. Thus, one would be correct in noting that even a type +`T` with no bounds may still have one hidden method attached; namely, +its `Drop` implementation. + +However, the drop implementation for `T` can only be called when +running the destructor for value `v` if either: + + 1. the type of `v` owns data of type `T`, or + + 2. the destructor of `v` constructs an instance of `T`. + +In the first case, the Drop-Check rule ensures that `T` must satisfy +either Condition (A.) or (B.). In this second case, the freshly +constructed instance of `T` will only be able to access either +borrowed data from `v` itself (and thus such data will already have +lifetime that strictly outlives `v`) or data created during the +execution of the destructor. + +### `Any` instances + +All types implementing `Any` is forced to outlive `'static`. So one +should not be able to hide borrowed data behind the `Any` trait, and +therefore it is okay for the analysis to treat `Any` like a black box +whose destructor is safe to run (at least with respect to not +accessing borrowed data). + +## Strictly outlives +[strictly-outlives]: #strictly-outlives + +There is a notion of "strictly outlives" within the compiler +internals. (This RFC is not adding such a notion to the language +itself; expressing "'a strictly outlives 'b" as an API constraint is +not a strict necessity at this time.) 
+ +The heart of the idea is this: we approximate the notion of "strictly +outlives" by the following rule: if a value `U` needs to strictly +outlive another value `V` with code extent `S`, we could just say that +`U` needs to live at least as long as the parent scope of `S`. + +There are likely to be sound generalizations of the model given here +(and we will likely need to consider such to adopt future extensions +like Single-Entry-Multiple-Exit (SEME) regions, but that is out of +scope for this RFC). + +In terms of its impact on the language, the main change has already +landed in the compiler; see [Rust PR 21657], which added +`CodeExtent::Remainder`, for more direct details on the implications +of that change written in a user-oriented fashion. + +[Rust PR 21657]: https://github.com/rust-lang/rust/pull/21657 + +One important detail of the strictly-outlives relationship +that comes in part from [Rust PR 21657]: +All bindings introduced by a single `let` statement +are modeled as having the *same* lifetime. +In an example like +```rust +let a; +let b; +let (c, d); +... +``` +`a` strictly outlives `b`, and `b` strictly outlives both `c` and `d`. +However, `c` and `d` are modeled as having the same lifetime; neither +one strictly outlives the other. +(Of course, during code execution, one of them will be dropped before +the other; the point is that when `rustc` builds its internal +model of the lifetimes of data, it approximates and assigns them +both the same lifetime.) This is an important detail, +because there are situations where one *must* assign the same +lifetime to two distinct bindings in order to allow them to +mutually refer to each other's data. + +For more details on this "strictly outlives" model, see [Appendix B]. + +## When does one type own another +[type-ownership]: #when-does-one-type-own-another + +The definition of the Drop-Check Rule used the phrase +"if the type owns data of type `D`". + +This criteria is based on recursive descent of the +structure of an input type `E`. + + * If `E` itself has a Drop implementation that satisfies either + condition (A.) or (B.) then add, for all relevant `'a`, + the constraint that `'a` must outlive the scope of + the value that caused the recursive descent. + + * Otherwise, if we have previously seen `E` during the descent + then skip it (i.e. we assume a type has no destructor of interest + until we see evidence saying otherwise). + This check prevents infinite-looping when we + encounter recursive references to a type, which can arise + in e.g. `Option>`. + + * Otherwise, if `E` is a struct (or tuple), for each of the struct's + fields, recurse on the field's type (i.e., a struct owns its + fields). + + * Otherwise, if `E` is an enum, for each of the enum's variants, + and for each field of each variant, recurse on the field's type + (i.e., an enum owns its fields). + + * Otherwise, if `E` is of the form `& T`, `&mut T`, `* T`, or `fn (T, ...) -> T`, + then skip this `E` + (i.e., references, native pointers, and bare functions do not own + the types they refer to). + + * Otherwise, recurse on any immediate type substructure of `E`. + (i.e., an instantiation of a polymorphic type `Poly` is + assumed to own `T_1` and `T_2`; note that structs and enums *do + not* fall into this category, as they are handled up above; but + this does cover cases like `Box+'a>`). 
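+
+For a concrete feel of this descent, consider a made-up type whose fields are
+chosen purely to exercise the rules above:
+
+```rust
+struct Holder<'a, D: 'a> {
+    direct: D,             // a struct owns its fields: `Holder` owns this `D`
+    boxed: Option<Box<D>>, // an enum owns its variants' fields, and `Box<D>`
+                           // is assumed to own the `D` it points to
+    borrowed: &'a D,       // a reference does not own its referent
+    raw: *mut D,           // nor does a raw pointer (absent the `PhantomData`
+                           // special case discussed next)
+}
+```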
+ +### Phantom Data + +The above definition for type-ownership is (believed to be) sound for +pure Rust programs that do not use `unsafe`, but it does not suffice +for several important types without some tweaks. + +In particular, consider the implementation of `Vec`: +as of "Rust 1.0 alpha": +```rust +pub struct Vec { + ptr: NonZero<*mut T>, + len: uint, + cap: uint, +} +``` + +According to the above definition, `Vec` does not own `T`. +This is clearly wrong. + +However, it generalizing the rule to say that `*mut T` owns `T` would +be too conservative, since there are cases where one wants to use +`*mut T` to model references to state that are not owned. + +Therefore, we need some sort of marker, so that types like `Vec` +can express that values of that type own instances of `T`. +The `PhantomData` marker proposed by [RFC 738] ("Support variance +for type parameters") is a good match for this. +This RFC assumes that either [RFC 738] will be accepted, +or if necessary, this RFC will be amended so that it +itself adds the concept of `PhantomData` to the language. +Therefore, as an additional special case to the criteria above +for when the type `E` owns data of type `D`, we include: + + * If `E` is `PhantomData`, then recurse on `T`. + +[RFC 738]: https://github.com/rust-lang/rfcs/pull/738 + +## Examples of changes imposed by the Drop-Check Rule + +### Some cyclic structure is still allowed +[Cyclic structure still allowed]: #some-cyclic-structure-is-still-allowed + +Earlier versions of the Drop-Check rule were quite conservative, to +the point where cyclic data would be disallowed in many contexts. +The Drop-Check rule presented in this RFC was crafted to try +to keep many existing useful patterns working. + +In particular, cyclic structure is still allowed in many +contexts. Here is one concrete example: + +```rust +use std::cell::Cell; + +#[derive(Show)] +struct C<'a> { + v: Vec>>>, +} + +impl<'a> C<'a> { + fn new() -> C<'a> { + C { v: Vec::new() } + } +} + +fn f() { + let (mut c1, mut c2, mut c3); + c1 = C::new(); + c2 = C::new(); + c3 = C::new(); + + c1.v.push(Cell::new(None)); + c1.v.push(Cell::new(None)); + c2.v.push(Cell::new(None)); + c2.v.push(Cell::new(None)); + c3.v.push(Cell::new(None)); + c3.v.push(Cell::new(None)); + + c1.v[0].set(Some(&c2)); + c1.v[1].set(Some(&c3)); + c2.v[0].set(Some(&c2)); + c2.v[1].set(Some(&c3)); + c3.v[0].set(Some(&c1)); + c3.v[1].set(Some(&c2)); +} +``` + +In this code, each of the nodes { `c1`, `c2`, `c3` } contains a +reference to the two other nodes, and those references are stored in a +`Vec`. Note that all of the bindings are introduced by a single +let-statement; this is to accommodate the region inference system +which wants to assign a single code extent to the `'a` lifetime, as +discussed in the [strictly-outlives] section. + +Even though `Vec` itself is defined as implementing `Drop`, +it puts no bounds on `T`, and therefore that `Drop` implementation is +ignored by the Drop-Check rule. + +### Directly mixing cycles and `Drop` is rejected + +[The Sneetch example] illustrates a scenario were borrowed data is +dropped while there is still an outstanding borrow that will be +accessed by a destructor. In that particular example, one can easily +reorder the bindings to ensure that the `StarOffMachine` outlives all +of the sneetches. + +But there are other examples that have no such resolution. 
In +particular, graph-structured data where the destructor for each node +accesses the neighboring nodes in the graph; this simply cannot be +done soundly, because when there are cycles, there is no legal order in which to drop the nodes. + +(At least, we cannot do it soundly without imperatively removing a +node from the graph as the node is dropped; but we are not going to +attempt to support verifying such an invariant as part of this RFC; to +my knowledge it is not likely to be feasible with type-checking based +static analyses). + +In any case, we can easily show some code that will now start to be +rejected due to the Drop-Check rule: we take the same `C<'a>` example +of cyclic structure given above, but we now attach a `Drop` +implementation to `C<'a>`: + +```rust +use std::cell::Cell; + +#[derive(Show)] +struct C<'a> { + v: Vec>>>, +} + +impl<'a> C<'a> { + fn new() -> C<'a> { + C { v: Vec::new() } + } +} + +// (THIS IS NEW) +impl<'a> Drop for C<'a> { + fn drop(&mut self) { } +} + +fn f() { + let (mut c1, mut c2, mut c3); + c1 = C::new(); + c2 = C::new(); + c3 = C::new(); + + c1.v.push(Cell::new(None)); + c1.v.push(Cell::new(None)); + c2.v.push(Cell::new(None)); + c2.v.push(Cell::new(None)); + c3.v.push(Cell::new(None)); + c3.v.push(Cell::new(None)); + + c1.v[0].set(Some(&c2)); + c1.v[1].set(Some(&c3)); + c2.v[0].set(Some(&c2)); + c2.v[1].set(Some(&c3)); + c3.v[0].set(Some(&c1)); + c3.v[1].set(Some(&c2)); +} +``` + +Now the addition of `impl<'a> Drop for C<'a>` changes +the results entirely; + +The Drop-Check rule sees the newly added `impl<'a> Drop for C<'a>`, +which means that for every value of type `C<'a>`, `'a` must strictly +outlive the value. But in the binding +`let (mut c1, mut c2, mut c3)` , all three bindings are assigned +the same type `C<'scope_of_c1_c2_and_c3>`, where +`'scope_of_c1_c2_and_c3` does not strictly outlive any of the three. +Therefore this code will be rejected. + +(Note: it is irrelevant that the `Drop` implementation is a no-op +above. The analysis does not care what the contents of that code are; +it solely cares about the public API presented by the type to its +clients. After all, the `Drop` implementation for `C<'a>` could be +rewritten tomorrow to contain code that accesses the neighboring +nodes. + +### Some temporaries need to be given names + +Due to the way that `rustc` implements the [strictly-outlives] +relation in terms of code-extents, the analysis does not know in an +expression like `foo().bar().quux()` in what order the temporary +values `foo()` and `foo().bar()` will be dropped. + +Therefore, the Drop-Check rule sometimes forces one to rewrite the +code so that it is apparent to the compiler that the value from +`foo()` will definitely outlive the value from `foo().bar()`. + +Thus, on occasion one is forced to rewrite: +```rust +let q = foo().bar().quux(); +... +``` + +as: +```rust +let foo = foo(); +let q = foo.bar().quux() +... +``` + +or even sometimes as: +```rust +let foo = foo(); +let bar = foo.bar(); +let q = bar.quux(); +... +``` +depending on the types involved. + +In practice, pnkfelix saw this arise most often +with code like this: + +```rust +for line in old_io::stdin().lock().lines() { + ... +} +``` + +Here, the result of `stdin()` is a `StdinReader`, which holds a +`RaceBox` in a `Mutex` behind an `Arc`. The result of the `lock()` +method is a `StdinReaderGuard<'a>`, which owns a `MutexGuard<'a, +RaceBox>`. 
The `MutexGuard` has a `Drop` implementation that is +parametric in `'a`; thus, the Drop-Check rule insists that the +lifetime assigned to `'a` strictly outlive the `MutexGuard`. + +So, under this RFC, we rewrite the code like so: +```rust +let stdin = old_io::stdin(); +for line in stdin.lock().lines() { + ... +} +``` + +(pnkfelix acknowledges that this rewrite is unfortunate. Potential +future work would be to further revise the code extent system so that +the compiler knows that the temporary from `stdin()` will outlive the +temporary from `stdin().lock()`. However, such a change to the +code extents could have unexpected fallout, analogous to the +fallout that was associated with [Rust PR 21657].) + +### Mixing acyclic structure and `Drop` is sometimes rejected + +This is an example of sound code, accepted today, that is +unfortunately rejected by the Drop-Check rule (at least in pnkfelix's +prototype): + +```rust +#![feature(unsafe_destructor)] + +use std::cell::Cell; + +#[derive(Show)] +struct C<'a> { + f: Cell>>, +} + +impl<'a> C<'a> { + fn new() -> C<'a> { + C { f: Cell::new(None), } + } +} + +// force dropck to care about C<'a> +#[unsafe_destructor] +impl<'a> Drop for C<'a> { + fn drop(&mut self) { } +} + +fn f() { + let c2; + let mut c1; + + c1 = C::new(); + c2 = C::new(); + + c1.f.set(Some(&c2)); +} + +fn main() { + f(); +} +``` + +In principle this should work, since `c1` and `c2` are assigned to +distinct code extents, and `c1` will be dropped before `c2`. However, +in the prototype, the region inference system is determining that the +lifetime `'a` in `&'a C<'a>` (from the `c1.f.set(Some(&c2));` +statement) needs to cover the whole block, rather than just the block +remainder extent that is actually covered by the `let c2;`. + +(This may just be a bug somewhere in the prototype, but for the time +being pnkfelix is going to assume that it will be a bug that this RFC +is forced to live with indefinitely.) + +## Unsound APIs need to be revised or removed entirely +[Unsound APIs]: #unsound-apis-that-need-to-be-revised-or-removed-entirely + +While the Drop-Check rule is designed to ensure that safe Rust code is +sound in its use of destructors, it cannot assure us that unsafe code +is sound. It is the responsibility of the author of unsafe code to +ensure it does not perform unsound actions; thus, we need to audit our +own API's to ensure that the standard library is not providing +functionality that circumvents the Drop-Check rule. + +The most obvious instance of this is the `arena` crate: in particular: +one can use an instance of `arena::Arena` to create cyclic graph +structure where each node's destructor accesses (via `&_` references) +its neighboring nodes. 
+ +Here is a version of our running `C<'a>` example +(where we now do something interesting the destructor for `C<'a>`) +that demonstrates the problem: + +Example: +```rust +extern crate arena; + +use std::cell::Cell; + +#[derive(Show)] +struct C<'a> { + name: &'static str, + v: Vec>>>, + usable: bool, +} + +impl<'a> Drop for C<'a> { + fn drop(&mut self) { + println!("dropping {}", self.name); + for neighbor in self.v.iter().map(|v|v.get()) { + if let Some(neighbor) = neighbor { + println!(" {} checking neighbor {}", + self.name, neighbor.name); + assert!(neighbor.usable); + } + } + println!("done dropping {}", self.name); + self.usable = false; + + } +} + +impl<'a> C<'a> { + fn new(name: &'static str) -> C<'a> { + C { name: name, v: Vec::new(), usable: true } + } +} + +fn f() { + use arena::Arena; + let arena = Arena::new(); + let (c1, c2, c3); + + c1 = arena.alloc(|| C::new("c1")); + c2 = arena.alloc(|| C::new("c2")); + c3 = arena.alloc(|| C::new("c3")); + + c1.v.push(Cell::new(None)); + c1.v.push(Cell::new(None)); + c2.v.push(Cell::new(None)); + c2.v.push(Cell::new(None)); + c3.v.push(Cell::new(None)); + c3.v.push(Cell::new(None)); + + c1.v[0].set(Some(c2)); + c1.v[1].set(Some(c3)); + c2.v[0].set(Some(c2)); + c2.v[1].set(Some(c3)); + c3.v[0].set(Some(c1)); + c3.v[1].set(Some(c2)); +} +``` + +Calling `f()` results in the following printout: +``` +dropping c3 + c3 checking neighbor c1 + c3 checking neighbor c2 +done dropping c3 +dropping c1 + c1 checking neighbor c2 + c1 checking neighbor c3 +thread '
' panicked at 'assertion failed: neighbor.usable', ../src/test/compile-fail/dropck_untyped_arena_cycle.rs:19 +``` + +This is unsound. It should not be possible to express such a +scenario without using `unsafe` code. + +This RFC suggests that we revise the `Arena` API by adding a phantom +lifetime parameter to its type, and bound the values the arena +allocates by that phantom lifetime, like so: +```rust +pub struct Arena<'longer_than_self> { + _invariant: marker::InvariantLifetime<'longer_than_self>, + ... +} + +impl<'longer_than_self> Arena<'longer_than_self> { + pub fn alloc(&self, op: F) -> &mut T + where F: FnOnce() -> T { + ... + } +} +``` +Admittedly, this is a severe limitation, since it forces the data +allocated by the Arena to store only references to data that strictly +outlives the arena, regardless of whether the allocated data itself +even has a destructor. (I.e., `Arena` would become much weaker than +`TypedArena` when attempting to work with cyclic structures). +(pnkfelix knows of no way to fix this without adding further extensions +to the language, e.g. some way to express "this type's destructor accesses +none of its borrowed data", which is out of scope for this RFC.) + +Alternatively, we could just deprecate the `Arena` API, (which is not +marked as stable anyway. + +The example given here can be adapted to other kinds of backing +storage structures, in order to double-check whether the API is likely +to be sound or not. For example, the `arena::TypedArena` type +appears to be sound (as long as it carries `PhantomData` just like +`Vec` does). In particular, when one ports the above example to use +`TypedArena` instead of `Arena`, it is statically rejected by `rustc`. + +## The final goal: remove #[unsafe_destructor] + +Once all of the above pieces have landed, lifetime- and +type-parameterized `Drop` will be safe, and thus we will be able to +remove `#[unsafe_destructor]`! + +# Drawbacks + +* The Drop-Check rule is a little complex, and does disallow some + sound code that would compile today. + +* The change proposed in this RFC places restrictions on uses of types + with attached destructors, but provides no way for a type `Foo<'a>` to + state as part of its public interface that its drop implementation + will not read from any borrowed data of lifetime `'a`. (Extending the + language with such a feature is potential future work, but is out of + scope for this RFC.) + +* Some useful interfaces are going to be disallowed by this RFC. + For example, the RFC recommends that the current `arena::Arena` + be revised or simply deprecated, due to its unsoundness. + (If desired, we could add an `UnsafeArena` that continues + to support the current `Arena` API with the caveat that its users need to + *manually* enforce the constraint that the destructors do not access + data that has been already dropped. But again, that decision is out + of scope for this RFC.) + +# Alternatives + +We considered simpler versions of [the Drop-Check rule]; in +particular, an earlier version of it simply said that if the type of +`v` owns any type `D` that implements `Drop`, then for any lifetime +`'a` that `D` refers to, `'a` must strictly outlive the scope of `v`, +because the destructor for `D` might hypothetically access borrowed +data of lifetime `'a`. + + * This rule is simpler in the sense that it more obviously sound. + + * But this rule disallowed far more code; e.g. 
the [Cyclic structure + still allowed] example was rejected under this more naive rule, + because `C<'a>` owns D = `Vec>>>`, and this + particular D refers to `'a`. + +---- + +Sticking with the current `#[unsafe_destructor]` approach to lifetime- +and type-parametric types that implement `Drop` is not really tenable; +we need to do something (and we have been planning to do something +like this RFC for over a year). + +# Unresolved questions + +* Is the Drop-Check rule provably sound? pnkfelix has based his + argument on informal reasoning about parametricity, but it would be + good to put forth a more formal argument. (And in the meantime, + pnkfelix invites the reader to try to find holes in the rule, + preferably with concrete examples that can be fed into the + prototype.) + +* How much can covariance help with some of the lifetime issues? + + See in particular [Rust Issue 21198] "new scoping rules for safe + dtors may benefit from variance on type params" + +[Rust Issue 21198]: https://github.com/rust-lang/rust/issues/21198 + + Before adding Condition (B.) to [the Drop-Check Rule], it seemed + like enabling covariance in more standard library types was going to + be very important for landing this work. And even now, it is + possible that covariance could still play an important role. + But nonetheless, there are some API's whose current form is fundamentally + incompatible with covariance; e.g. the current `TypedArena` API + is fundamentally invariant with respect to `T`. + +# Appendices + +## Appendix A: Why and when would Drop read from borrowed data +[Appendix A]: #appendix-a-why-and-when-would-drop-read-from-borrowed-data + +Here is a story, about two developers, Julia and Kurt, and the code +they hacked on. + +Julia inherited some code, and it is misbehaving. It appears like +key/value entries that the code inserts into the standard library's +`HashMap` are not always retrievable from the map. Julia's current +hypothesis is that something is causing the keys' computed hash codes +to change dynamically, sometime after the entries have been inserted +into the map (but it is not obvious when or if this change occurs, nor +what its source might be). Julia thinks this hypothesis is plausible, +but does not want to audit all of the key variants for possible causes +of hash code corruption until after she has hard evidence confirming +the hypothesis. + +Julia writes some code that walks a hash map's internals and checks +that all of the keys produce a hash code that is consistent with their +location in the map. However, since it is not clear when the keys' +hash codes are changing, it is not clear where in the overall code +base she should add such checks. (The hash map is sufficiently large +that she cannot simply add calls to do this consistency check +everywhere.) + +However, there is one spot in the control flow that is a clear +contender: if the check is run right before the hash map is dropped, +then that would surely be sometime after the hypothesized corruption +had occurred. In other words, a destructor for the hash map seems +like a good place to start; Julia could make her own local copy of the +hash map library and add this check to a `impl Drop for +HashMap { ... }` implementation. + +In this new destructor code, Julia needs to invoke the hash-code +method on `K`. 
So she adds the bound `where K: Eq + Hash` to her +`HashMap` and its `Drop` implementation, along with the corresponding +code to walk the table's entries and check that the hash codes for all +the keys matches their position in the table. + +Using this, Julia manages confirms her hypothesis (yay). And since it +was a reasonable amount of effort to do this experiment, she puts this +variation of `HashMap` up on `crates.io`, calling it the +`CheckedHashMap` type. + +Sometime later, Kurt pulls a copy of `CheckHashMap` off of +`crates.io`, and he happens to write some code that looks like this: + +```rust +fn main() { + #[derive(PartialEq, Eq, Hash, Debug)] + struct Key<'a> { name: &'a str } + + { + let (key, mut map, name) : (Key, CheckedHashMap<&Key, String>, String); + name = format!("k1"); + map = CheckedHashMap::new(); + key = Key { name: &*name }; + map.map.insert(&key, format!("Value for k1")); + } +} +``` + +And, kaboom: when the map goes out of scope, the destructor for +`CheckedHashMap` attempts to compute a hashcode on a reference to +`key` that may not still be valid, and even if `key` is still valid, +it holds a reference to a slice of name that likewise may not still be +valid. + +This illustrates a case where one might legitimately mix destructor +code with borrowed data. (Is this example any less contrived than +[the Sneetch example]? That is in the eye of the beholder.) + +## Appendix B: strictly-outlives details +[Appendix B]: #appendix-b-strictly-outlives-details + +The rest of this section gets into some low-level details of parts of +how `rustc` is implemented, largely because the changes described here +do have an impact on what results the `rustc` region inference system +produces (or fails to produce). It serves mostly to explain (1.) why +[Rust PR 21657] was implemented, and (2.) why one may sometimes see +indecipherable region-inference errors. + +### Review: Code Extents + +(Nothing here is meant to be new; its just providing context for the +next subsection.) + +Every Rust expression evaluates to a value `V` that is either placed +into some location with an associated lifetime such as `'l`, or `V` is +associated with a block of code that statically delimits the `V`'s +runtime extent (i.e. we know from the function's text where `V` will +be dropped). In the `rustc` source, the blocks of code are sometimes +called "scopes" and sometimes "code extents"; I will try to stick to +the latter term here, since the word "scope" is terribly overloaded. + +Currently, the code extents in Rust are arranged into a tree hierarchy +structured similarly to the abstract syntax tree; for any given code +extent, the compiler can ask for its parent in this hierarchy. + +Every Rust expression `E` has an associated "terminating extent" +somewhere in its chain of parent code extents; temporary values +created during the execution of `E` are stored at stack locations +managed by `E`'s terminating extent. When we hit the end of the +terminating extent, all such temporaries are dropped. + +An example of a terminating extent: in a let-statement like: +```rust +let = ; +``` +the terminating extent of `` is the let-statement itself. So in +an example like: +```rust +let a1 = input.f().g();` +... +``` +there is a temporary value returned from `input.f()`, and it will live +until the end of the let statement, but not into the subsequent code +represented by `...`. 
(The value resulting from `input.f().g()`, on +the other hand, will be stored in `a1` and lives until the end of the +block enclosing the let statement.) + +(It is not important to this RFC to know the full set of rules +dictating which parent expressions are deemed terminating extents; we +just will assume that these things do exist.) + +For any given code extent `S`, the parent code extent `P` of `S`, if +it exists, potentially holds bits of code that will execute after `S` +is done. Any cleanup code for any values assigned to `P` will only +run after we have finished with *all* code associated with `S`. + +### A problem with 1.0 alpha code extents + +So, with the above established, we have a hint at how to express that +a lifetime `'a` needs to strictly outlive a particular code extent `S`: +simply say that `'a` needs to live at least long as `P`. + +However, this is a little too simplistic, at least for the Rust +compiler circa Rust 1.0 alpha. The main problem is that all the +bindings established by let statements in a block are assigned the +same code extent. + +This, combined with our simplistic definition, yields real problems. +For example, in: + +```rust +{ + use std::fmt; + #[derive(Debug)] struct DropLoud(&'static str, T); + impl Drop for DropLoud { + fn drop(&mut self) { println!("dropping {}:{:?}", self.0, self.1); } + } + + let c1 = DropLoud("c1", 1); + let c2 = DropLoud("c2", &c1); +} +``` + +In principle, the code above is legal: `c2` will be dropped before +`c1` is, and thus it is okay that `c2` holds a borrowed reference to +`c1` that will be read when `c2` is dropped (indirectly via the +`fmt::Debug` implementation. + +However, with the structure of code extents as of Rust 1.0 alpha, `c1` +and `c2` are both given the same code extent: that of the block +itself. Thus in that context, this definition of "strictly outlives" +indicates that `c1` does *not* strictly outlive `c2`, because `c1` +does not live at least as long as the parent of the block; it only +lives until the end of the block itself. + +This illustrates why "All the bindings established by let statements +in a block are assigned the same code extent" is a problem + +### Block Remainder Code Extents + +The solution proposed here (motivated by experience with the +prototype) is to introduce finer-grained code extents. This solution +is essentially [Rust PR 21657], which has already landed in `rustc`. +(That is in part why this is merely an appendix, rather than part of +the body of the RFC itself.) + +The code extents remain in a tree-hierarchy, but there are now extra +entries in the tree, which provide the foundation for a more precise +"strictly outlives" relation. + +We introduce a new code extent, called a "block remainder" extent, for +every let statement in a block, representing the suffix of the block +covered by the bindings in that let statement. + +For example, given `{ let (a, b) = EXPR_1; let c = EXPR_2; ... }`, +which previously had a code extent structure like: +``` +{ let (a, b) = EXPR_1; let c = EXPR_2; ... } + +----+ +----+ + +------------------+ +-------------+ ++------------------------------------------+ +``` +so the parent extent of each let statement was the whole block. + +But under the new rules, there are two new block remainder extents +introduced, with this structure: + +``` +{ let (a, b) = EXPR_1; let c = EXPR_2; ... 
} + +----+ +----+ + +------------------+ +-------------+ + +-------------------+ <-- new: block remainder 2 + +------------------------------------------+ <-- new: block remainder 1 ++---------------------------------------------+ +``` + +The first let-statement introduces a block remainder extent that +covers the lifetime for `a` and `b`. The second let-statement +introduces a block remainder extent that covers the lifetime for `c`. + +Each let-statement continues to be the terminating extent for its +initializer expression. But now, the parent of the extent of the +second let statement is a block remainder extent ("block remainder +2"), and, importantly, the parent of block remainder 2 is another +block remainder extent ("block remainder 1"). This way, we precisely +represent the lifetimes of the named values bound by each let +statement, and know that `a` and `b` both strictly outlive `c` +as well as the temporary values created during evaluation of +`EXPR_2`. +Likewise, `c` strictly outlives the bindings and temporaries created +in the `...` that follows it. + +### Why stop at let-statements? + +This RFC does *not* propose that we attempt to go further and track +the order of destruction of the values bound by a *single* let +statement. + +Such an experiment could be made part of future work, but for now, we +just continue to assign `a` and `b` to the same scope; the compiler +does not attempt to reason about what order they will be dropped in, +and thus we cannot for example reference data borrowed from `a` in any +destructor code for `b`. + +The main reason that we do not want to attempt to produce even finer +grain scopes, at least not right now, is that there are scenarios +where it is *important* to be able to assign the same region to two +distinct pieces of data; in particular, this often arises when one +wants to build cyclic structure, as discussed in +[Cyclic structure still allowed]. diff --git a/text/0771-std-iter-once.md b/text/0771-std-iter-once.md new file mode 100644 index 00000000000..ff205044d08 --- /dev/null +++ b/text/0771-std-iter-once.md @@ -0,0 +1,59 @@ +- Start Date: 2015-1-30 +- RFC PR: https://github.com/rust-lang/rfcs/pull/771 +- Rust Issue: https://github.com/rust-lang/rust/issues/24443 + +# Summary + +Add a `once` function to `std::iter` to construct an iterator yielding a given value one time, and an `empty` function to construct an iterator yielding no values. + +# Motivation + +This is a common task when working with iterators. Currently, this can be done in many ways, most of which are unergonomic, do not work for all types (e.g. requiring Copy/Clone), or both. `once` and `empty` are simple to implement, simple to use, and simple to understand. + +# Detailed design + +`once` will return a new struct, `std::iter::Once`, implementing Iterator. Internally, `Once` is simply a newtype wrapper around `std::option::IntoIter`. The actual body of `once` is thus trivial: + +```rust +pub struct Once(std::option::IntoIter); + +pub fn once(x: T) -> Once { + Once( + Some(x).into_iter() + ) +} +``` + +`empty` is similar: + +```rust +pub struct Empty(std::option::IntoIter); + +pub fn empty(x: T) -> Empty { + Empty( + None.into_iter() + ) +} +``` + +These wrapper structs exist to allow future backwards-compatible changes, and hide the implementation. + +# Drawbacks + +Although a tiny amount of code, it still does come with a testing, maintainance, etc. cost. 
+ +It's already possible to do this via `Some(x).into_iter()`, `std::iter::repeat(x).take(1)` (for `x: Clone`), `vec![x].into_iter()`, various contraptions involving `iterate`... + +The existence of the `Once` struct is not technically necessary. + +# Alternatives + +There are already many, many alternatives to this- `Option::into_iter()`, `iterate`... + +The `Once` struct could be not used, with `std::option::IntoIter` used instead. + +# Unresolved questions + +Naturally, `once` is fairly bikesheddable. `one_time`? `repeat_once`? + +Are versions of `once` that return `&T`/`&mut T` desirable? diff --git a/text/0803-type-ascription.md b/text/0803-type-ascription.md new file mode 100644 index 00000000000..e5e62c37ad2 --- /dev/null +++ b/text/0803-type-ascription.md @@ -0,0 +1,226 @@ +- Start Date: 2015-2-3 +- RFC PR: [rust-lang/rfcs#803](https://github.com/rust-lang/rfcs/pull/803) +- Rust Issue: [rust-lang/rust#23416](https://github.com/rust-lang/rust/issues/23416) +- Feature: `ascription` + +# Summary + +Add type ascription to expressions. (An earlier version of this RFC covered type +ascription in patterns too, that has been postponed). + +Type ascription on expression has already been implemented. + +See also discussion on [#354](https://github.com/rust-lang/rfcs/issues/354) and +[rust issue 10502](https://github.com/rust-lang/rust/issues/10502). + + +# Motivation + +Type inference is imperfect. It is often useful to help type inference by +annotating a sub-expression with a type. Currently, this is only possible by +extracting the sub-expression into a variable using a `let` statement and/or +giving a type for a whole expression or pattern. This is un- ergonomic, and +sometimes impossible due to lifetime issues. Specifically, where a variable has +lifetime of its enclosing scope, but a sub-expression's lifetime is typically +limited to the nearest semi-colon. + +Typical use cases are where a function's return type is generic (e.g., collect) +and where we want to force a coercion. + +Type ascription can also be used for documentation and debugging - where it is +unclear from the code which type will be inferred, type ascription can be used +to precisely communicate expectations to the compiler or other programmers. + +By allowing type ascription in more places, we remove the inconsistency that +type ascription is currently only allowed on top-level patterns. + +## Examples: + +(Somewhat simplified examples, in these cases there are sometimes better +solutions with the current syntax). + +Generic return type: + +``` +// Current. +let z = if ... { + let x: Vec<_> = foo.enumerate().collect(); + x +} else { + ... +}; + +// With type ascription. +let z = if ... { + foo.enumerate().collect(): Vec<_> +} else { + ... +}; +``` + +Coercion: + +``` +fn foo(a: T, b: T) { ... } + +// Current. +let x = [1u32, 2, 4]; +let y = [3u32]; +... +let x: &[_] = &x; +let y: &[_] = &y; +foo(x, y); + +// With type ascription. +let x = [1u32, 2, 4]; +let y = [3u32]; +... +foo(x: &[_], y: &[_]); +``` + +Generic return type and coercion: + +``` +// Current. +let x: T = { + let temp: U<_> = foo(); + temp +}; + +// With type ascription. +let x: T = foo(): U<_>; +``` + + +# Detailed design + +The syntax of expressions is extended with type ascription: + +``` +e ::= ... | e: T +``` + +where `e` is an expression and `T` is a type. Type ascription has the same +precedence as explicit coercions using `as`. + +When type checking `e: T`, `e` must have type `T`. 
The `must have type` test +includes implicit coercions and subtyping, but not explicit coercions. `T` may +be any well-formed type. + +At runtime, type ascription is a no-op, unless an implicit coercion was used in +type checking, in which case the dynamic semantics of a type ascription +expression are exactly those of the implicit coercion. + +@eddyb has implemented the expressions part of this RFC, +[PR](https://github.com/rust-lang/rust/pull/21836). + +This feature should land behind the `ascription` feature gate. + + +### coercion and `as` vs `:` + +A downside of type ascription is the overlap with explicit coercions (aka casts, +the `as` operator). To the programmer, type ascription makes implicit coercions +explicit (however, the compiler makes no distinction between coercions due to +type ascription and other coercions). In RFC 401, it is proposed that all valid +implicit coercions are valid explicit coercions. However, that may be too +confusing for users, since there is no reason to use type ascription rather than +`as` (if there is some coercion). Furthermore, if programmers do opt to use `as` +as the default whether or not it is required, then it loses its function as a +warning sign for programmers to beware of. + +To address this I propose two lints which check for: trivial casts and trivial +numeric casts. Other than these lints we stick with the proposal from #401 that +unnecessary casts will no longer be an error. + +A trivial cast is a cast `x as T` where `x` has type `U` and `x` can be +implicitly coerced to `T` or is already a subtype of `T`. + +A trivial numeric cast is a cast `x as T` where `x` has type `U` and `x` is +implicitly coercible to `T` or `U` is a subtype of `T`, and both `U` and `T` are +numeric types. + +Like any lints, these can be customised per-crate by the programmer. Both lints +are 'warn' by default. + +Although this is a somewhat complex scheme, it allows code that works today to +work with only minor adjustment, it allows for a backwards compatible path to +'promoting' type conversions from explicit casts to implicit coercions, and it +allows customisation of a contentious kind of error (especially so in the +context of cross-platform programming). + + +### Type ascription and temporaries + +There is an implementation choice between treating `x: T` as an lvalue or +rvalue. Note that when an rvalue is used in 'reference context' (e.g., the +subject of a reference operation), then the compiler introduces a temporary +variable. Neither option is satisfactory, if we treat an ascription expression +as an lvalue (i.e., no new temporary), then there is potential for unsoundness: + +``` +let mut foo: S = ...; +{ + let bar = &mut (foo: T); // S <: T, no coercion required + *bar = ... : T; +} +// Whoops, foo has type T, but the compiler thinks it has type S, where potentially T ` is a type ascription +expression): + +``` +&[mut] +let ref [mut] x = +match { .. ref [mut] x .. => { .. } .. } +.foo() // due to autoref + = ...; +``` + +# Drawbacks + +More syntax, another feature in the language. + +Interacts poorly with struct initialisers (changing the syntax for struct +literals has been [discussed and rejected](https://github.com/rust-lang/rfcs/pull/65) +and again in [discuss](http://internals.rust-lang.org/t/replace-point-x-3-y-5-with-point-x-3-y-5/198)). + +If we introduce named arguments in the future, then it would make it more +difficult to support the same syntax as field initialisers. 
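+To make that overlap concrete, here is a minimal sketch (hypothetical type and
+field names; the ascribed form appears only in a comment because it requires
+the `ascription` feature gate proposed above):
+
+```rust
+struct Config { timeout: u32 }
+
+fn main() {
+    let timeout = 30u32;
+    // In a struct literal, `timeout: timeout` already means
+    // "initialise the field `timeout` with the value `timeout`".
+    let c = Config { timeout: timeout };
+    // Under this RFC the same `expr: Type` shape is also an expression-level
+    // type ascription, e.g. (with the `ascription` feature gate):
+    //
+    //     let t = timeout: u32;
+    //
+    // which is the source of the syntactic tension noted above.
+    let _ = c;
+}
+```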
+ + +# Alternatives + +We could do nothing and force programmers to use temporary variables to specify +a type. However, this is less ergonomic and has problems with scopes/lifetimes. + +Rely on explicit coercions - the current plan [RFC 401](https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md) +is to allow explicit coercion to any valid type and to use a customisable lint +for trivial casts (that is, those given by subtyping, including the identity +case). If we allow trivial casts, then we could always use explicit coercions +instead of type ascription. However, we would then lose the distinction between +implicit coercions which are safe and explicit coercions, such as narrowing, +which require more programmer attention. This also does not help with patterns. + +We could use a different symbol or keyword instead of `:`, e.g., `is`. + + +# Unresolved questions + +Is the suggested precedence correct? + +Should we remove integer suffixes in favour of type ascription? + +Style guidelines - should we recommend spacing or parenthesis to make type +ascription syntax more easily recognisable? diff --git a/text/0809-box-and-in-for-stdlib.md b/text/0809-box-and-in-for-stdlib.md new file mode 100644 index 00000000000..5c379041bc6 --- /dev/null +++ b/text/0809-box-and-in-for-stdlib.md @@ -0,0 +1,752 @@ +- Feature Name: box_syntax, placement_in_syntax +- Start Date: 2015-02-04 +- RFC PR: [rust-lang/rfcs#809](https://github.com/rust-lang/rfcs/pull/809) +- Rust Issue: [rust-lang/rust#22181](https://github.com/rust-lang/rust/issues/22181) + +# Summary + + * Change placement-new syntax from: `box () ` instead + to: `in { }`. + + * Change `box ` to an overloaded operator that chooses its + implementation based on the expected type. + + * Use unstable traits in `core::ops` for both operators, so that + libstd can provide support for the overloaded operators; the + traits are unstable so that the language designers are free to + revise the underlying protocol in the future post 1.0. + + * Feature-gate the placement-`in` syntax via the feature name `placement_in_syntax`. + + * The overloaded `box ` will reuse the `box_syntax` feature name. + +(Note that `` here denotes the interior of a block expression; i.e.: +``` + ::= [ ';' | ] * [ ] +``` +This is the same sense in which the `block` nonterminal is used in the +reference manual.) + +# Motivation + +Goal 1: We want to support an operation analogous to C++'s placement +new, as discussed previously in [Placement Box RFC PR 470]. + +[Placement Box RFC PR 470]: https://github.com/rust-lang/rfcs/pull/470 + +Goal 2: We also would like to overload our `box` syntax so that more +types, such as `Rc` and `Arc` can gain the benefit of avoiding +intermediate copies (i.e. allowing expressions to install their result +value directly into the backing storage of the `Rc` or `Arc` +when it is created). + +However, during discussion of [Placement Box RFC PR 470], some things +became clear: + + * Many syntaxes using the `in` keyword are superior to `box () + ` for the operation analogous to placement-new. + + The proposed `in`-based syntax avoids ambiguities such as having + to write `box () ()` (or `box (alloc::HEAP) ()`) when + one wants to surround `` with parentheses. + It allows the parser to provide clearer error messages when + encountering `in ` (clearer compared to the previous + situation with `box `). + + * It would be premature for Rust to commit to any particular + protocol for supporting placement-`in`. 
A number of participants in + the discussion of [Placement Box RFC PR 470] were unhappy with the + baroque protocol, especially since it did not support DST and + potential future language changes would allow the protocol + proposed there to be significantly simplified. + +Therefore, this RFC proposes a middle ground for 1.0: Support the +desired syntax, but do not provide stable support for end-user +implementations of the operators. The only stable ways to use the +overloaded `box ` or `in { }` operators will be in +tandem with types provided by the stdlib, such as `Box`. + +# Detailed design + +* Add traits to `core::ops` for supporting the new operators. + This RFC does not commit to any particular set of traits, + since they are not currently meant to be implemented outside + of the stdlib. (However, a demonstration of one working set + of traits is given in [Appendix A].) + + Any protocol that we adopt for the operators needs to properly + handle panics; i.e., `box ` must properly cleanup any + intermediate state if `` panics during its evaluation, + and likewise for `in { }` + + (See [Placement Box RFC PR 470] or [Appendix A] for discussion on + ways to accomplish this.) + +* Change `box ` from built-in syntax (tightly integrated with + `Box`) into an overloaded-`box` operator that uses the expected + return type to decide what kind of value to create. For example, if + `Rc` is extended with an implementation of the appropriate + operator trait, then + + ```rust + let x: Rc<_> = box format!("Hello"); + ``` + + could be a legal way to create an `Rc` without having to + invoke the `Rc::new` function. This will be more efficient for + building instances of `Rc` when `T` is a large type. (It is also + arguably much cleaner syntax to read, regardless of the type `T`.) + + Note that this change will require end-user code to no longer assume + that `box ` always produces a `Box`; such code will need to + either add a type annotation e.g. saying `Box<_>`, or will need to + call `Box::new()` instead of using `box `. + +* Add support for parsing `in { }` as the basis for the + placement operator. + + Remove support for `box () ` from the parser. + + Make `in { }` an overloaded operator that uses + the `` to determine what placement code to run. + + Note: when `` is just an identifier, + ` { }` is not parsed as a struct literal. + We accomplish this via the same means that is used e.g. for `if` expressions: + we restrict `` to not include struct literals + (see [RFC 92]). + +[RFC 92]: https://github.com/rust-lang/rfcs/blob/master/text/0092-struct-grammar.md + +* The only stablized implementation for the `box ` operator + proposed by this RFC is `Box`. The question of which other types + should support integration with `box ` is a library design + issue and needs to go through the conventions and library + stabilization process. + + Similarly, this RFC does not propose *any* stablized implementation + for the `in { }` operator. (An obvious candidate for + `in { }` integration would be a `Vec::emplace_back` + method; but again, the choice of which such methods to add is a + library design issue, beyond the scope of this RFC.) + + (A sample implementation illustrating how to support the operators + on other types is given in [Appendix A].) + +* Feature-gate the two syntaxes under separate feature identifiers, so that we + have the option of removing the gate for one syntax without the other. + (I.e. 
we already have much experience with non-overloaded `box `, + but we have nearly no experience with placement-`in` as described here). + +# Drawbacks + +* End-users might be annoyed that they cannot add implementations of + the overloaded-`box` and placement-`in` operators themselves. But + such users who want to do such a thing will probably be using the + nightly release channel, which will not have the same stability + restrictions. + +* The currently-implemented desugaring does not infer that in an + expression like `box as Box`, the use of `box ` + should evaluate to some `Box<_>`. pnkfelix has found that this is + due to a weakness in compiler itself ([Rust PR 22012]). + + Likewise, the currently-implemented desugaring does not interact + well with the combination of type-inference and implicit coercions + to trait objects. That is, when `box ` is used in a context + like this: + ``` + fn foo(Box) { ... } + foo(box some_expr()); + ``` + the type inference system attempts to unify the type `Box` + with the return-type of `::protocol::Boxed::finalize(place)`. + This may also be due to weakness in the compiler, but that is not + immediately obvious. + + [Appendix B] has a complete code snippet (using a desugaring much like + the one found in the other appendix) that illustrates two cases of + interest where this weakness arises. + +[Rust PR 22012]: https://github.com/rust-lang/rust/pull/22012 + +# Alternatives + +* We could keep the `box () ` syntax. It is hard + to see what the advantage of that is, unless (1.) we can identify + many cases of types that benefit from supporting both + overloaded-`box` and placement-`in`, or unless (2.) we anticipate + some integration with `box` pattern syntax that would motivate using + the `box` keyword for placement. + +* We could use the `in () ` syntax. An earlier + version of this RFC used this alternative. It is easier to implement + on the current code base, but I do not know of any other benefits. + (Well, maybe parentheses are less "heavyweight" than curly-braces?) + +* A number of other syntaxes for placement have been proposed in the + past; see for example discussion on [RFC PR 405] as well as + [the previous placement RFC][RFC Surface Syntax Discussion]. + + The main constraints I want to meet are: + 1. Do not introduce ambiguity into the grammar for Rust + 2. Maintain left-to-right evaluation order (so the place should + appear to the left of the value expression in the text). + + But otherwise I am not particularly attached to any single + syntax. + + One particular alternative that might placate those who object + to placement-`in`'s `box`-free form would be: + `box (in ) `. + +[RFC PR 405]: https://github.com/rust-lang/rfcs/issues/405 + +[RFC Surface Syntax Discussion]: https://github.com/pnkfelix/rfcs/blob/fsk-placement-box-rfc/text/0000-placement-box.md#same-semantics-but-different-surface-syntax + +* Do nothing. I.e. do not even accept an unstable libstd-only protocol + for placement-`in` and overloaded-`box`. This would be okay, but + unfortunate, since in the past some users have identified + intermediate copies to be a source of inefficiency, and proper use + of `box ` and placement-`in` can help remove intermediate + copies. + +# Unresolved questions + +This RFC represents the current plan for `box`/`in`. However, in the +[RFC discussion][809] a number of questions arose, including possible +design alternatives that might render the `in` keyword unnecessary. 
+Before the work in this RFC can be unfeature-gated, these questions should +be satisfactorily resolved: + +* Can the type-inference and coercion system of the compiler be + enriched to the point where overloaded `box` and `in` are + seamlessly usable? Or are type-ascriptions unavoidable when + supporting overloading? + + In particular, I am assuming here that some amount of current + weakness cannot be blamed on any particular details of the + sample desugaring. + + (See [Appendix B] for example code showing weaknesses in + `rustc` of today.) +* Do we want to change the syntax for `in(place) expr` / `in place { expr }`? +* Do we need `in` at all, or can we replace it with some future possible feature such as `DerefSet` or `&out` etc? +* Do we want to improve the protocol in some way? + - Note that the protocol was specifically excluded from this RFC. + - Support for DST expressions such as `box [22, ..count]` (where `count` is a dynamic value)? + - Protocol making use of more advanced language features? + +# Appendices + +## Appendix A: sample operator traits +[Appendix A]: #appendix-a-sample-operator-traits + +The goal is to show that code like the following can be made to work +in Rust today via appropriate desugarings and trait definitions. + +```rust +fn main() { + use std::rc::Rc; + + let mut v = vec![1,2]; + in v.emplace_back() { 3 }; // has return type `()` + println!("v: {:?}", v); // prints [1,2,3] + + let b4: Box = box 4; + println!("b4: {}", b4); + + let b5: Rc = box 5; + println!("b5: {}", b5); + + let b6 = in HEAP { 6 }; // return type Box + println!("b6: {}", b6); +} +``` + +To demonstrate the above, this appendix provides code that runs today; +it demonstrates sample protocols for the proposed operators. +(The entire code-block below should work when e.g. cut-and-paste into +http::play.rust-lang.org ) + +```rust +#![feature(unsafe_destructor)] // (hopefully unnecessary soon with RFC PR 769) +#![feature(alloc)] + +// The easiest way to illustrate the desugaring is by implementing +// it with macros. So, we will use the macro `in_` for placement-`in` +// and the macro `box_` for overloaded-`box`; you should read +// `in_!( () )` as if it were `in { }` +// and +// `box_!( )` as if it were `box `. + +// The two macros have been designed to both 1. work with current Rust +// syntax (which in some cases meant avoiding certain associated-item +// syntax that currently causes the compiler to ICE) and 2. infer the +// appropriate code to run based only on either `` (for +// placement-`in`) or on the expected result type (for +// overloaded-`box`). + +macro_rules! in_ { + (($placer:expr) $value:expr) => { { + let p = $placer; + let mut place = ::protocol::Placer::make_place(p); + let raw_place = ::protocol::Place::pointer(&mut place); + let value = $value; + unsafe { + ::std::ptr::write(raw_place, value); + ::protocol::InPlace::finalize(place) + } + } } +} + +macro_rules! box_ { + ($value:expr) => { { + let mut place = ::protocol::BoxPlace::make_place(); + let raw_place = ::protocol::Place::pointer(&mut place); + let value = $value; + unsafe { + ::std::ptr::write(raw_place, value); + ::protocol::Boxed::finalize(place) + } + } } +} + +// Note that while both desugarings are very similar, there are some +// slight differences. 
In particular, the placement-`in` desugaring +// uses `InPlace::finalize(place)`, which is a `finalize` method that +// is overloaded based on the `place` argument (the type of which is +// derived from the `` input); on the other hand, the +// overloaded-`box` desugaring uses `Boxed::finalize(place)`, which is +// a `finalize` method that is overloaded based on the expected return +// type. Thus, the determination of which `finalize` method to call is +// derived from different sources in the two desugarings. + +// The above desugarings refer to traits in a `protocol` module; these +// are the traits that would be put into `std::ops`, and are given +// below. + +mod protocol { + +/// Both `in PLACE { BLOCK }` and `box EXPR` desugar into expressions +/// that allocate an intermediate "place" that holds uninitialized +/// state. The desugaring evaluates EXPR, and writes the result at +/// the address returned by the `pointer` method of this trait. +/// +/// A `Place` can be thought of as a special representation for a +/// hypothetical `&uninit` reference (which Rust cannot currently +/// express directly). That is, it represents a pointer to +/// uninitialized storage. +/// +/// The client is responsible for two steps: First, initializing the +/// payload (it can access its address via `pointer`). Second, +/// converting the agent to an instance of the owning pointer, via the +/// appropriate `finalize` method (see the `InPlace`. +/// +/// If evaluating EXPR fails, then the destructor for the +/// implementation of Place to clean up any intermediate state +/// (e.g. deallocate box storage, pop a stack, etc). +pub trait Place { + /// Returns the address where the input value will be written. + /// Note that the data at this address is generally uninitialized, + /// and thus one should use `ptr::write` for initializing it. + fn pointer(&mut self) -> *mut Data; +} + +/// Interface to implementations of `in PLACE { BLOCK }`. +/// +/// `in PLACE { BLOCK }` effectively desugars into: +/// +/// ``` +/// let p = PLACE; +/// let mut place = Placer::make_place(p); +/// let raw_place = Place::pointer(&mut place); +/// let value = { BLOCK }; +/// unsafe { +/// std::ptr::write(raw_place, value); +/// InPlace::finalize(place) +/// } +/// ``` +/// +/// The type of `in PLACE { BLOCK }` is derived from the type of `PLACE`; +/// if the type of `PLACE` is `P`, then the final type of the whole +/// expression is `P::Place::Owner` (see the `InPlace` and `Boxed` +/// traits). +/// +/// Values for types implementing this trait usually are transient +/// intermediate values (e.g. the return value of `Vec::emplace_back`) +/// or `Copy`, since the `make_place` method takes `self` by value. +pub trait Placer { + /// `Place` is the intermedate agent guarding the + /// uninitialized state for `Data`. + type Place: InPlace; + + /// Creates a fresh place from `self`. + fn make_place(self) -> Self::Place; +} + +/// Specialization of `Place` trait supporting `in PLACE { BLOCK }`. +pub trait InPlace: Place { + /// `Owner` is the type of the end value of `in PLACE { BLOCK }` + /// + /// Note that when `in PLACE { BLOCK }` is solely used for + /// side-effecting an existing data-structure, + /// e.g. `Vec::emplace_back`, then `Owner` need not carry any + /// information at all (e.g. it can be the unit type `()` in that + /// case). + type Owner; + + /// Converts self into the final value, shifting + /// deallocation/cleanup responsibilities (if any remain), over to + /// the returned instance of `Owner` and forgetting self. 
+ unsafe fn finalize(self) -> Self::Owner; +} + +/// Core trait for the `box EXPR` form. +/// +/// `box EXPR` effectively desugars into: +/// +/// ``` +/// let mut place = BoxPlace::make_place(); +/// let raw_place = Place::pointer(&mut place); +/// let value = $value; +/// unsafe { +/// ::std::ptr::write(raw_place, value); +/// Boxed::finalize(place) +/// } +/// ``` +/// +/// The type of `box EXPR` is supplied from its surrounding +/// context; in the above expansion, the result type `T` is used +/// to determine which implementation of `Boxed` to use, and that +/// `` in turn dictates determines which +/// implementation of `BoxPlace` to use, namely: +/// `<::Place as BoxPlace>`. +pub trait Boxed { + /// The kind of data that is stored in this kind of box. + type Data; /* (`Data` unused b/c cannot yet express below bound.) */ + type Place; /* should be bounded by BoxPlace */ + + /// Converts filled place into final owning value, shifting + /// deallocation/cleanup responsibilities (if any remain), over to + /// returned instance of `Self` and forgetting `filled`. + unsafe fn finalize(filled: Self::Place) -> Self; +} + +/// Specialization of `Place` trait supporting `box EXPR`. +pub trait BoxPlace : Place { + /// Creates a globally fresh place. + fn make_place() -> Self; +} + +} // end of `mod protocol` + +// Next, we need to see sample implementations of these traits. +// First, `Box` needs to support overloaded-`box`: (Note that this +// is not the desired end implementation; e.g. the `BoxPlace` +// representation here is less efficient than it could be. This is +// just meant to illustrate that an implementation *can* be made; +// i.e. that the overloading *works*.) +// +// Also, just for kicks, I am throwing in `in HEAP { }` support, +// though I do not think that needs to be part of the stable libstd. + +struct HEAP; + +mod impl_box_for_box { + use protocol as proto; + use std::mem; + use super::HEAP; + + struct BoxPlace { fake_box: Option> } + + fn make_place() -> BoxPlace { + let t: T = unsafe { mem::zeroed() }; + BoxPlace { fake_box: Some(Box::new(t)) } + } + + unsafe fn finalize(mut filled: BoxPlace) -> Box { + let mut ret = None; + mem::swap(&mut filled.fake_box, &mut ret); + ret.unwrap() + } + + impl<'a, T> proto::Placer for HEAP { + type Place = BoxPlace; + fn make_place(self) -> BoxPlace { make_place() } + } + + impl proto::Place for BoxPlace { + fn pointer(&mut self) -> *mut T { + match self.fake_box { + Some(ref mut b) => &mut **b as *mut T, + None => panic!("impossible"), + } + } + } + + impl proto::BoxPlace for BoxPlace { + fn make_place() -> BoxPlace { make_place() } + } + + impl proto::InPlace for BoxPlace { + type Owner = Box; + unsafe fn finalize(self) -> Box { finalize(self) } + } + + impl proto::Boxed for Box { + type Data = T; + type Place = BoxPlace; + unsafe fn finalize(filled: BoxPlace) -> Self { finalize(filled) } + } +} + +// Second, it might be nice if `Rc` supported overloaded-`box`. +// +// (Note again that this may not be the most efficient implementation; +// it is just meant to illustrate that an implementation *can* be +// made; i.e. that the overloading *works*.) 
+ +mod impl_box_for_rc { + use protocol as proto; + use std::mem; + use std::rc::{self, Rc}; + + struct RcPlace { fake_box: Option> } + + impl proto::Place for RcPlace { + fn pointer(&mut self) -> *mut T { + if let Some(ref mut b) = self.fake_box { + if let Some(r) = rc::get_mut(b) { + return r as *mut T + } + } + panic!("impossible"); + } + } + + impl proto::BoxPlace for RcPlace { + fn make_place() -> RcPlace { + unsafe { + let t: T = mem::zeroed(); + RcPlace { fake_box: Some(Rc::new(t)) } + } + } + } + + impl proto::Boxed for Rc { + type Data = T; + type Place = RcPlace; + unsafe fn finalize(mut filled: RcPlace) -> Self { + let mut ret = None; + mem::swap(&mut filled.fake_box, &mut ret); + ret.unwrap() + } + } +} + +// Third, we want something to demonstrate placement-`in`. Let us use +// `Vec::emplace_back` for that: + +mod impl_in_for_vec_emplace_back { + use protocol as proto; + + use std::mem; + + struct VecPlacer<'a, T:'a> { v: &'a mut Vec } + struct VecPlace<'a, T:'a> { v: &'a mut Vec } + + pub trait EmplaceBack { fn emplace_back(&mut self) -> VecPlacer; } + + impl EmplaceBack for Vec { + fn emplace_back(&mut self) -> VecPlacer { VecPlacer { v: self } } + } + + impl<'a, T> proto::Placer for VecPlacer<'a, T> { + type Place = VecPlace<'a, T>; + fn make_place(self) -> VecPlace<'a, T> { VecPlace { v: self.v } } + } + + impl<'a, T> proto::Place for VecPlace<'a, T> { + fn pointer(&mut self) -> *mut T { + unsafe { + let idx = self.v.len(); + self.v.push(mem::zeroed()); + &mut self.v[idx] + } + } + } + impl<'a, T> proto::InPlace for VecPlace<'a, T> { + type Owner = (); + unsafe fn finalize(self) -> () { + mem::forget(self); + } + } + + #[unsafe_destructor] + impl<'a, T> Drop for VecPlace<'a, T> { + fn drop(&mut self) { + unsafe { + mem::forget(self.v.pop()) + } + } + } +} + +// Okay, that's enough for us to actually demonstrate the syntax! +// Here's our `fn main`: + +fn main() { + use std::rc::Rc; + // get hacked-in `emplace_back` into scope + use impl_in_for_vec_emplace_back::EmplaceBack; + + let mut v = vec![1,2]; + in_!( (v.emplace_back()) 3 ); + println!("v: {:?}", v); + + let b4: Box = box_!( 4 ); + println!("b4: {}", b4); + + let b5: Rc = box_!( 5 ); + println!("b5: {}", b5); + + let b6 = in_!( (HEAP) 6 ); // return type Box + println!("b6: {}", b6); +} +``` + +## Appendix B: examples of interaction between desugaring, type-inference, and coercion +[Appendix B]: #appendix-b-examples-of-interaction-between-desugaring-type-inference-and-coercion + +The following code works with the current version of `box` syntax in Rust, but needs some sort +of type annotation in Rust as it stands today for the desugaring of `box` to work out. + +(The following code uses `cfg` attributes to make it easy to switch between slight variations +on the portions that expose the weakness.) + +```rust +#![feature(box_syntax)] + +// NOTE: Scroll down to "START HERE" + +fn main() { } + +macro_rules! box_ { + ($value:expr) => { { + let mut place = ::BoxPlace::make(); + let raw_place = ::Place::pointer(&mut place); + let value = $value; + unsafe { ::std::ptr::write(raw_place, value); ::Boxed::fin(place) } + } } +} + +// (Support traits and impls for examples below.) 
+ +pub trait BoxPlace : Place { fn make() -> Self; } +pub trait Place { fn pointer(&mut self) -> *mut Data; } +pub trait Boxed { type Place; fn fin(filled: Self::Place) -> Self; } + +struct BP { _fake_box: Option> } + +impl BoxPlace for BP { fn make() -> BP { make_pl() } } +impl Place for BP { fn pointer(&mut self) -> *mut T { pointer(self) } } +impl Boxed for Box { type Place = BP; fn fin(x: BP) -> Self { finaliz(x) } } + +fn make_pl() -> BP { loop { } } +fn finaliz(mut _filled: BP) -> Box { loop { } } +fn pointer(_p: &mut BP) -> *mut T { loop { } } + +// START HERE + +pub type BoxFn<'a> = Box; + +#[cfg(all(not(coerce_works1),not(coerce_works2),not(coerce_works3)))] +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { box_!( f ) } + +#[cfg(coerce_works1)] +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { box f } + +#[cfg(coerce_works2)] +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { let b: Box<_> = box_!( f ); b } + +#[cfg(coerce_works3)] // (This one assumes PR 22012 has landed) +pub fn coerce<'a, F>(f: F) -> BoxFn<'a> where F: Fn(), F: 'a { box_!( f ) as BoxFn } + + +trait Duh { fn duh() -> Self; } + +#[cfg(all(not(duh_works1),not(duh_works2)))] +impl Duh for Box<[T]> { fn duh() -> Box<[T]> { box_!( [] ) } } + +#[cfg(duh_works1)] +impl Duh for Box<[T]> { fn duh() -> Box<[T]> { box [] } } + +#[cfg(duh_works2)] +impl Duh for Box<[T]> { fn duh() -> Box<[T]> { let b: Box<[_; 0]> = box_!( [] ); b } } +``` + +You can pass `--cfg duh_worksN` and `--cfg coerce_worksM` for suitable +`N` and `M` to see them compile. Here is a transcript with those attempts, +including the cases where type-inference fails in the desugaring. + +``` +% rustc /tmp/foo6.rs --cfg duh_works1 --cfg coerce_works1 +% rustc /tmp/foo6.rs --cfg duh_works1 --cfg coerce_works2 +% rustc /tmp/foo6.rs --cfg duh_works2 --cfg coerce_works1 +% rustc /tmp/foo6.rs --cfg duh_works1 +/tmp/foo6.rs:10:25: 10:41 error: the trait `Place` is not implemented for the type `BP` [E0277] +/tmp/foo6.rs:10 let raw_place = ::Place::pointer(&mut place); + ^~~~~~~~~~~~~~~~ +/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_! +/tmp/foo6.rs:37:64: 37:76 note: expansion site +/tmp/foo6.rs:9:25: 9:41 error: the trait `core::marker::Sized` is not implemented for the type `core::ops::Fn()` [E0277] +/tmp/foo6.rs:9 let mut place = ::BoxPlace::make(); + ^~~~~~~~~~~~~~~~ +/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_! +/tmp/foo6.rs:37:64: 37:76 note: expansion site +error: aborting due to 2 previous errors +% rustc /tmp/foo6.rs --cfg coerce_works1 +/tmp/foo6.rs:10:25: 10:41 error: the trait `Place<[_; 0]>` is not implemented for the type `BP<[T]>` [E0277] +/tmp/foo6.rs:10 let raw_place = ::Place::pointer(&mut place); + ^~~~~~~~~~~~~~~~ +/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_! +/tmp/foo6.rs:52:51: 52:64 note: expansion site +/tmp/foo6.rs:9:25: 9:41 error: the trait `core::marker::Sized` is not implemented for the type `[T]` [E0277] +/tmp/foo6.rs:9 let mut place = ::BoxPlace::make(); + ^~~~~~~~~~~~~~~~ +/tmp/foo6.rs:7:1: 14:2 note: in expansion of box_! +/tmp/foo6.rs:52:51: 52:64 note: expansion site +error: aborting due to 2 previous errors +% +``` + +The point I want to get across is +this: It looks like both of these cases can be worked around via +explicit type ascription. Whether or not this is an acceptable cost +is a reasonable question. 
+ + * Note that type ascription is especially annoying for the `fn duh` case, + where one needs to keep the array-length encoded in the type consistent + with the length of the array generated by the expression. + This might motivate extending the use of wildcard `_` within type expressions + to include wildcard constants, for use in the array length, i.e.: `[T; _]`. + +The `fn coerce` example comes from uses of the `fn combine_structure` function in the +`libsyntax` crate. + +The `fn duh` example comes from the implementation of the `Default` +trait for `Box<[T]>`. + +Both examples are instances of coercion; the `fn coerce` example is +trying to express a coercion of a `Box` to a `Box` +(i.e. making a trait-object), and the `fn duh` example is trying to +express a coercion of a `Box<[T; k]>` (specifically `[T; 0]`) to a +`Box<[T]>`. Both are going from a pointer-to-sized to a +pointer-to-unsized. + +(Maybe there is a way to handle both of these cases in a generic +fashion; pnkfelix is not sufficiently familiar with how coercions +currently interact with type-inference in the first place.) + +[809]: https://github.com/rust-lang/rfcs/pull/809 diff --git a/text/0823-hash-simplification.md b/text/0823-hash-simplification.md new file mode 100644 index 00000000000..7d51aa03ac1 --- /dev/null +++ b/text/0823-hash-simplification.md @@ -0,0 +1,430 @@ +- Feature Name: hash +- Start Date: 2015-02-17 +- RFC PR: https://github.com/rust-lang/rfcs/pull/823 +- Rust Issue: https://github.com/rust-lang/rust/issues/22467 + +# Summary + +Pare back the `std::hash` module's API to improve ergonomics of usage and +definitions. While an alternative scheme more in line with what Java and C++ +have is considered, the current `std::hash` module will remain largely as-is +with modifications to its core traits. + +# Motivation + +There are a number of motivations for this RFC, and each will be explained in +term. + +## API ergonomics + +Today the API of the `std::hash` module is sometimes considered overly +complicated and it may not be pulling its weight. As a recap, the API looks +like: + +```rust +trait Hash { + fn hash(&self, state: &mut H); +} +trait Hasher { + type Output; + fn reset(&mut self); + fn finish(&self) -> Self::Output; +} +trait Writer { + fn write(&mut self, data: &[u8]); +} +``` + +The `Hash` trait is implemented by various types where the `H` type parameter +signifies the hashing algorithm that the `impl` block corresponds to. Each +`Hasher` is opaque when taken generically and is frequently paired with a bound +of `Writer` to allow feeding in arbitrary bytes. + +The purpose of not having a `Writer` supertrait on `Hasher` or on the `H` type +parameter is to allow hashing algorithms that are *not* byte-stream oriented +(e.g. Java-like algorithms). Unfortunately all primitive types in Rust are only +defined for `Hash where H: Writer + Hasher`, essentially forcing a +byte-stream oriented hashing algorithm for all hashing. + +Some examples of using this API are: + +```rust +use std::hash::{Hash, Hasher, Writer, SipHasher}; + +impl Hash for MyType { + fn hash(&self, s: &mut S) { + self.field1.hash(s); + // don't want to hash field2 + self.field3.hash(s); + } +} + +fn sip_hash>(t: &T) -> u64 { + let mut s = SipHasher::new_with_keys(0, 0); + t.hash(&mut s); + s.finish() +} +``` + +Forcing many `impl` blocks to require `Hasher + Writer` becomes onerous over +times and also requires at least 3 imports for a custom implementation of +`hash`. 
Taking a generically hashable `T` is also somewhat cumbersome, +especially if the hashing algorithm isn't known in advance. + +Overall the `std::hash` API is generic enough that its usage is somewhat verbose +and becomes tiresome over time to work with. This RFC strives to make this API +easier to work with. + +## Forcing byte-stream oriented hashing + +Much of the `std::hash` API today is oriented around hashing a stream of bytes +(blocks of `&[u8]`). This is not a hard requirement by the API (discussed +above), but in practice this is essentially what happens everywhere. This form +of hashing is not always the most efficient, although it is often one of the +more flexible forms of hashing. + +Other languages such as Java and C++ have a hashing API that looks more like: + +```rust +trait Hash { + fn hash(&self) -> usize; +} +``` + +This expression of hashing is not byte-oriented but is also much less generic +(an algorithm for hashing is predetermined by the type itself). This API is +encodable with today's traits as: + +```rust +struct Slot(u64); + +impl Hash for MyType { + fn hash(&self, slot: &mut Slot) { + *slot = Slot(self.precomputed_hash); + } +} + +impl Hasher for Slot { + type Output = u64; + fn reset(&mut self) { *self = Slot(0); } + fn finish(&self) -> u64 { self.0 } +} +``` + +This form of hashing (which is useful for performance sometimes) is difficult to +work with primarily because of the frequent bounds on `Writer` for hashing. + +## Non-applicability for well-known hashing algorithms + +One of the current aspirations for the `std::hash` module was to be appropriate +for hashing algorithms such as MD5, SHA\*, etc. The current API has proven +inadequate, however, for the primary reason of hashing being so generic. For +example it should in theory be possible to calculate the SHA1 hash of a byte +slice via: + +```rust +let data: &[u8] = ...; +let hash = std::hash::hash::<&[u8], Sha1>(data); +``` + +There are a number of pitfalls to this approach: + +* Due to slices being able to be hashed generically, each byte will be written + individually to the `Sha1` state, which is likely to not be very efficient. +* Due to slices being able to be hashed generically, the length of the slice is + first written to the `Sha1` state, which is likely not desired. + +The key observation is that the hash values produced in a Rust program are +**not** reproducible outside of Rust. For this reason, APIs for reproducible +hashes to be verified elsewhere will explicitly not be considered in the design +for `std::hash`. It is expected that an external crate may wish to provide a +trait for these hashing algorithms and it would not be bounded by +`std::hash::Hash`, but instead perhaps a "byte container" of some form. + +# Detailed design + +This RFC considers two possible designs as a replacement of today's `std::hash` +API. One is a "minor refactoring" of the current API while the +other is a much more radical change towards being conservative. This section +will propose the minor refactoring change and the other may be found in the +[Alternatives](#alternatives) section. + +## API + +The new API of `std::hash` would be: + +```rust +trait Hash { + fn hash(&self, h: &mut H); + + fn hash_slice(data: &[Self], h: &mut H) { + for piece in data { + data.hash(h); + } + } +} + +trait Hasher { + fn write(&mut self, data: &[u8]); + fn finish(&self) -> u64; + + fn write_u8(&mut self, i: u8) { ... } + fn write_i8(&mut self, i: i8) { ... } + fn write_u16(&mut self, i: u16) { ... 
} + fn write_i16(&mut self, i: i16) { ... } + fn write_u32(&mut self, i: u32) { ... } + fn write_i32(&mut self, i: i32) { ... } + fn write_u64(&mut self, i: u64) { ... } + fn write_i64(&mut self, i: i64) { ... } + fn write_usize(&mut self, i: usize) { ... } + fn write_isize(&mut self, i: isize) { ... } +} +``` + +This API is quite similar to today's API, but has a few tweaks: + +* The `Writer` trait has been removed by folding it directly into the `Hasher` + trait. As part of this movement the `Hasher` trait grew a number of + specialized `write_foo` methods which the primitives will call. This should + help regain some performance losses where forcing a byte-oriented stream is + a performance loss. + +* The `Hasher` trait no longer has a `reset` method. + +* The `Hash` trait's type parameter is on the *method*, not on the trait. This + implies that the trait is no longer object-safe, but it is much more ergonomic + to operate over generically. + +* The `Hash` trait now has a `hash_slice` method to slice a number of instances + of `Self` at once. This will allow optimization of the `Hash` implementation + of `&[u8]` to translate to a raw `write` as well as other various slices of + primitives. + +* The `Output` associated type was removed in favor of an explicit `u64` return + from `finish`. + +The purpose of this API is to continue to allow APIs to be generic over the +hashing algorithm used. This would allow `HashMap` continue to use a randomly +keyed SipHash as its default algorithm (e.g. continuing to provide DoS +protection, more information on this below). An example encoding of the +alternative API (proposed below) would look like: + +```rust +impl Hasher for u64 { + fn write(&mut self, data: &[u8]) { + for b in data.iter() { self.write_u8(*b); } + } + fn finish(&self) -> u64 { *self } + + fn write_u8(&mut self, i: u8) { *self = combine(*self, i); } + // and so on... +} +``` + +## `HashMap` and `HashState` + +For both this recommendation as well as the alternative below, this RFC proposes +removing the `HashState` trait and `Hasher` structure (as well as the +`hash_state` module) in favor of the following API: + +```rust +struct HashMap; + +impl HashMap { + fn new() -> HashMap { + HashMap::with_hasher(DefaultHasher::new()) + } +} + +impl u64> HashMap { + fn with_hasher(hasher: H) -> HashMap; +} + +impl Fn(&K) -> u64 for DefaultHasher { + fn call(&self, arg: &K) -> u64 { + let (k1, k2) = self.siphash_keys(); + let mut s = SipHasher::new_with_keys(k1, k2); + arg.hash(&mut s); + s.finish() + } +} +``` + +The precise details will be affected based on which design in this RFC is +chosen, but the general idea is to move from a custom trait to the standard `Fn` +trait for calculating hashes. + +# Drawbacks + +* This design is a departure from the precedent set by many other languages. In + doing so, however, it is arguably easier to implement `Hash` as it's more + obvious how to feed in incremental state. We also do not lock ourselves into a + particular hashing algorithm in case we need to alternate in the future. + +* Implementations of `Hash` cannot be specialized and are forced to operate + generically over the hashing algorithm provided. This may cause a loss of + performance in some cases. Note that this could be remedied by moving the type + parameter to the trait instead of the method, but this would lead to a loss in + ergonomics for generic consumers of `T: Hash`. 
+ +* Manual implementations of `Hash` are somewhat cumbersome still by requiring a + separate `Hasher` parameter which is not necessarily always desired. + +* The API of `Hasher` is approaching the realm of serialization/reflection and + it's unclear whether its API should grow over time to support more basic Rust + types. It would be unfortunate if the `Hasher` trait approached a full-blown + `Encoder` trait (as `rustc-serialize` has). + +# Alternatives + +As alluded to in the "Detailed design" section the primary alternative to this +RFC, which still improves ergonomics, is to remove the generic-ness over the +hashing algorithm. + +## API + +The new API of `std::hash` would be: + +```rust +trait Hash { + fn hash(&self) -> usize; +} + +fn combine(a: usize, b: usize) -> usize; +``` + +The `Writer`, `Hasher`, and `SipHasher` structures/traits would all be removed +from `std::hash`. This definition is more or less the Rust equivalent of the +Java/C++ hashing infrastructure. This API is a vast simplification of what +exists today and allows implementations of `Hash` as well as consumers of `Hash` +to quite ergonomically work with hash values as well as hashable objects. + +> **Note**: The choice of `usize` instead of `u64` reflects [C++'s +> choice][cpp-hash] here as well, but it is quite easy to use one instead of +> the other. + +## Hashing algorithm + +With this definition of `Hash`, each type must pre-ordain a particular hash +algorithm that it implements. Using an alternate algorithm would require a +separate newtype wrapper. + +Most implementations would still use `#[derive(Hash)]` which will leverage +`hash::combine` to combine the hash values of aggregate fields. Manual +implementations which only want to hash a select number of fields would look +like: + +```rust +impl Hash for MyType { + fn hash(&self) -> usize { + // ignore field2 + (&self.field1, &self.field3).hash() + } +} +``` + +A possible implementation of combine can be found [in the boost source +code][boost-combine]. + +[boost-combine]: https://github.com/boostorg/functional/blob/master/include/boost/functional/hash/hash.hpp#L209-L213 + +## `HashMap` and DoS protection + +Currently one of the features of the standard library's `HashMap` implementation +is that it by default provides DoS protection through two measures: + +1. A strong hashing algorithm, SipHash 2-4, is used which is fairly difficult to + find collisions with. +2. The SipHash algorithm is randomly seeded for each instance of `HashMap`. The + algorithm is seeded with a 128-bit key. + +These two measures ensure that each `HashMap` is randomly ordered, even if the +same keys are inserted in the same order. As a result, it is quite difficult to +mount a DoS attack against a `HashMap` as it is difficult to predict what +collisions will happen. + +The `Hash` trait proposed above, however, does not allow SipHash to be +implemented generally any more. For example `#[derive(Hash)]` will no longer +leverage SipHash. Additionally, there is no input of state into the `hash` +function, so there is no random state per-`HashMap` to generate different hashes +with. 
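+For reference, the `combine` function this alternative leans on (in place of
+SipHash) might look like the following minimal sketch, modeled on the boost
+`hash_combine` linked above; the exact constant and shifts are illustrative,
+not part of the proposal:
+
+```rust
+/// Illustrative only: one possible `hash::combine`, following the
+/// boost::hash_combine pattern referenced earlier.
+pub fn combine(seed: usize, value: usize) -> usize {
+    // 0x9e3779b9 is the 32-bit golden-ratio constant boost uses; a real
+    // implementation would presumably pick a width-appropriate constant.
+    seed ^ value
+        .wrapping_add(0x9e37_79b9)
+        .wrapping_add(seed << 6)
+        .wrapping_add(seed >> 2)
+}
+```
+
+Unlike a randomly keyed SipHash, such a function is fully deterministic unless
+a random seed is mixed in somewhere, which is exactly the concern explored
+below.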
+ +Denial of service attacks against hash maps are no new phenomenon, they are +[well](http://www.ocert.org/advisories/ocert-2011-003.html) +[known](http://lwn.net/Articles/474912/) +and have been reported in +[Python](http://bugs.python.org/issue13703), +[Ruby](https://www.ruby-lang.org/en/news/2011/12/28/denial-of-service-attack-was-found-for-rubys-hash-algorithm-cve-2011-4815/) +([other ruby](https://www.ruby-lang.org/en/news/2012/11/09/ruby19-hashdos-cve-2012-5371/)), +[Perl](http://blog.booking.com/hardening-perls-hash-function.html), +and many other languages/frameworks. Rust has taken a fairly proactive step from +the start by using a strong and randomly seeded algorithm since `HashMap`'s +inception. + +In general the standard library does not provide many security-related +guarantees beyond memory safety. For example the new `Read::read_to_end` +function passes a safe buffer of uninitialized data to implementations of +`read` using various techniques to prevent memory safety issues. A DoS attack +against a hash map is such a common and well known exploit, however, that this +RFC considers it critical to consider the design of `Hash` and its relationship +with `HashMap`. + +## Mitigation of DoS attacks + +Other languages have mitigated DoS attacks via a few measures: + +* [C++ specifies][cpp-hash] that the return value of `hash` is not guaranteed to + be stable across program executions, allowing for a global salt to be mixed + into hashes calculated. +* [Ruby has a global seed][ruby-seed] which is randomly initialized on startup + and is used when hashing blocks of memory (e.g. strings). +* PHP and Tomcat have added limits to the maximum amount of keys allowed from a + POST HTTP request (to limit the size of auto-generated maps). This strategy is + not necessarily applicable to the standard library. + +[cpp-hash]: http://en.cppreference.com/w/cpp/utility/hash +[ruby-seed]: https://github.com/ruby/ruby/blob/193ad64359b8ebcd77a2cba50a62d64311e26b22/random.c#L1248-L1251 + +It [has been claimed](http://bugs.python.org/issue13703#msg150558), however, +that a global seed may only mitigate some of the simplest attacks. The primary +downside is that a long-running process may leak the "global seed" through some +other form which could compromise maps in that specific process. + +One possible route to mitigating these attacks with the `Hash` trait above could +be: + +1. All primitives (integers, etc) are `combine`d with a global random seed which + is initialized on first use. +2. Strings will continue to use SipHash as the default algorithm and the + initialization keys will be randomly initialized on first use. + +Given the information available about other DoS mitigations in hash maps for +other languages, however, it is not clear that this will provide the same level +of DoS protection that is available today. For example [@DaGenix explains +well](https://github.com/rust-lang/rfcs/pull/823#issuecomment-74013800) that we +may not be able to provide any form of DoS protection guarantee at all. + +## Alternative Drawbacks + +* One of the primary drawbacks to the proposed `Hash` trait is that it is now + not possible to select an algorithm that a type should be hashed with. Instead + each type's definition of hashing can only be altered through the use of a + newtype wrapper. + +* Today most Rust types can be hashed using a byte-oriented algorithm, so any + number of these algorithms (e.g. SipHash, Fnv hashing) can be used. With this + new `Hash` definition they are not easily accessible. 
+ +* Due to the lack of input state to hashing, the `HashMap` type can no longer + randomly seed each individual instance but may at best have one global seed. + This consequently elevates the risk of a DoS attack on a `HashMap` instance. + +* The method of combining hashes together is not proven among other languages + and is not guaranteed to provide the guarantees we want. This departure from + the may have unknown consequences. + +# Unresolved questions + +* To what degree should `HashMap` attempt to prevent DoS attacks? Is it the + responsibility of the standard library to do so or should this be provided as + an external crate on crates.io? diff --git a/text/0832-from-elem-with-love.md b/text/0832-from-elem-with-love.md new file mode 100644 index 00000000000..4f53e2e150b --- /dev/null +++ b/text/0832-from-elem-with-love.md @@ -0,0 +1,128 @@ +- Feature Name: direct to stable, because it modifies a stable macro +- Start Date: 2015-02-11 +- RFC PR: https://github.com/rust-lang/rfcs/pull/832 +- Rust Issue: https://github.com/rust-lang/rust/issues/22414 + +# Summary + +Add back the functionality of `Vec::from_elem` by improving the `vec![x; n]` sugar to work with Clone `x` and runtime `n`. + +# Motivation + +High demand, mostly. There are currently a few ways to achieve the behaviour of `Vec::from_elem(elem, n)`: + +``` +// #1 +let vec = Vec::new(); +for i in range(0, n) { + vec.push(elem.clone()) +} +``` + +``` +// #2 +let vec = vec![elem; n] +``` + +``` +// #3 +let vec = Vec::new(); +vec.resize(elem, n); +``` + +``` +// #4 +let vec: Vec<_> = (0..n).map(|_| elem.clone()).collect() +``` + +``` +// #5 +let vec: Vec<_> = iter::repeat(elem).take(n).collect(); +``` + +None of these quite match the convenience, power, and performance of: + +``` +let vec = Vec::from_elem(elem, n) +``` + +* `#1` is verbose *and* slow, because each `push` requires a capacity check. +* `#2` only works for a Copy `elem` and const `n`. +* `#3` needs a temporary, but should be otherwise identical performance-wise. +* `#4` and `#5` are considered verbose and noisy. They also need to clone one more +time than other methods *strictly* need to. + +However the issues for `#2` are *entirely* artifical. It's simply a side-effect of +forwarding the impl to the identical array syntax. We can just make the code in the +`vec!` macro better. This naturally extends the compile-timey `[x; n]` array sugar +to the more runtimey semantics of Vec, without introducing "another way to do it". + +`vec![100; 10]` is also *slightly* less ambiguous than `from_elem(100, 10)`, +because the `[T; n]` syntax is part of the language that developers should be +familiar with, while `from_elem` is just a function with arbitrary argument order. + +`vec![x; n]` is also known to be 47% more sick-rad than `from_elem`, which was +of course deprecated to due its lack of sick-radness. + +# Detailed design + +Upgrade the current `vec!` macro to have the following definition: + +```rust +macro_rules! 
vec { + ($x:expr; $y:expr) => ( + unsafe { + use std::ptr; + use std::clone::Clone; + + let elem = $x; + let n: usize = $y; + let mut v = Vec::with_capacity(n); + let mut ptr = v.as_mut_ptr(); + for i in range(1, n) { + ptr::write(ptr, Clone::clone(&elem)); + ptr = ptr.offset(1); + v.set_len(i); + } + + // No needless clones + if n > 0 { + ptr::write(ptr, elem); + v.set_len(n); + } + + v + } + ); + ($($x:expr),*) => ( + <[_] as std::slice::SliceExt>::into_vec( + std::boxed::Box::new([$($x),*])) + ); + ($($x:expr,)*) => (vec![$($x),*]) +} +``` + +(note: only the `[x; n]` branch is changed) + +Which allows all of the following to work: + +``` +fn main() { + println!("{:?}", vec![1; 10]); + println!("{:?}", vec![Box::new(1); 10]); + let n = 10; + println!("{:?}", vec![1; n]); +} +``` + +# Drawbacks + +Less discoverable than from_elem. All the problems that macros have relative to static methods. + +# Alternatives + +Just un-delete from_elem as it was. + +# Unresolved questions + +No. diff --git a/text/0839-embrace-extend-extinguish.md b/text/0839-embrace-extend-extinguish.md new file mode 100644 index 00000000000..a23acffebf5 --- /dev/null +++ b/text/0839-embrace-extend-extinguish.md @@ -0,0 +1,119 @@ +- Feature Name: embrace-extend-extinguish +- Start Date: 2015-02-13 +- RFC PR: [rust-lang/rfcs#839](https://github.com/rust-lang/rfcs/pull/839) +- Rust Issue: [rust-lang/rust#25976](https://github.com/rust-lang/rust/issues/25976) + +# Summary + +Make all collections `impl<'a, T: Copy> Extend<&'a T>`. + +This enables both `vec.extend(&[1, 2, 3])`, and `vec.extend(&hash_set_of_ints)`. +This partially covers the usecase of the awkward `Vec::push_all` with +literally no ergonomic loss, while leveraging established APIs. + +# Motivation + +Vec::push_all is kinda random and specific. Partially motivated by performance concerns, +but largely just "nice" to not have to do something like +`vec.extend([1, 2, 3].iter().cloned())`. The performance argument falls flat +(we *must* make iterators fast, and trusted_len should get us there). The ergonomics +argument is salient, though. Working with Plain Old Data types in Rust is super annoying +because generic APIs and semantics are tailored for non-Copy types. + +Even with Extend upgraded to take IntoIterator, that won't work with &[Copy], +because a slice can't be moved out of. Collections would have to take `IntoIterator<&T>`, +and copy out of the reference. So, do exactly that. + +As a bonus, this is more expressive than `push_all`, because you can feed in *any* +collection by-reference to clone the data out of it, not just slices. + +# Detailed design + +* For sequences and sets: `impl<'a, T: Copy> Extend<&'a T>` +* For maps: `impl<'a, K: Copy, V: Copy> Extend<(&'a K, &'a V)>` + +e.g. + +```rust +use std::iter::IntoIterator; + +impl<'a, T: Copy> Extend<&'a T> for Vec { + fn extend>(&mut self, iter: I) { + self.extend(iter.into_iter().cloned()) + } +} + + +fn main() { + let mut foo = vec![1]; + foo.extend(&[1, 2, 3, 4]); + let bar = vec![1, 2, 3]; + foo.extend(&bar); + foo.extend(bar.iter()); + + println!("{:?}", foo); +} +``` + +# Drawbacks + +* Mo' generics, mo' magic. How you gonna discover it? + +* This creates a potentially confusing behaviour in a generic context. + +Consider the following code: + +```rust +fn feed<'a, X: Extend<&'a T>>(&'a self, buf: &mut X) { + buf.extend(self.data.iter()); +} +``` + +One would reasonably expect X to contain &T's, but with this +proposal it is possible that X now instead contains T's. 
It's not +clear that in "real" code that this would ever be a problem, though. +It may lead to novices accidentally by-passing ownership through +implicit copies. + +It also may make inference fail in some other cases, as Extend would +not always be sufficient to determine the type of a `vec![]`. + +* This design does not fully replace the push_all, as it takes `T: Clone`. + +# Alternatives + + +## The Cloneian Candidate +This proposal is artifically restricting itself to `Copy` rather than full +`Clone` as a concession to the general Rustic philosophy of Clones being +explicit. Since this proposal is largely motivated by simple shuffling of +primitives, this is sufficient. Also, because `Copy: Clone`, it would be +backwards compatible to upgrade to `Clone` in the future if demand is +high enough. + +## The New Method +It is theoretically plausible to add a new defaulted method to Extend called +`extend_cloned` that provides this functionality. This removes any concern of +accidental clones and makes inference totally work. However this design cannot +simultaneously support Sequences and Maps, as the signature for sequences would +mean Maps can only Copy through &(K, V), rather than (&K, &V). This would make +it impossible to copy-chain Maps through Extend. + +## Why not FromIterator? + +FromIterator could also be extended in the same manner, but this is less useful for +two reasons: + +* FromIterator is always called by calling `collect`, and IntoIterator doesn't really +"work" right in `self` position. +* Introduces ambiguities in some cases. What is `let foo: Vec<_> = [1, 2, 3].iter().collect()`? + +Of course, context might disambiguate in many cases, and +`let foo: Vec = [1, 2, 3].iter().collect()` might still be nicer than +`let foo: Vec<_> = [1, 2, 3].iter().cloned().collect()`. + + +# Unresolved questions + +None. + diff --git a/text/0840-no-panic-in-c-string.md b/text/0840-no-panic-in-c-string.md new file mode 100644 index 00000000000..4b665a04d4a --- /dev/null +++ b/text/0840-no-panic-in-c-string.md @@ -0,0 +1,97 @@ +- Feature Name: non_panicky_cstring +- Start Date: 2015-02-13 +- RFC PR: https://github.com/rust-lang/rfcs/pull/840 +- Rust Issue: https://github.com/rust-lang/rust/issues/22470 + +# Summary + +Remove panics from `CString::from_slice` and `CString::from_vec`, making +these functions return `Result` instead. + +# Motivation + +> As I shivered and brooded on the casting of that brain-blasting shadow, +> I knew that I had at last pried out one of earth’s supreme horrors—one of +> those nameless blights of outer voids whose faint daemon scratchings we +> sometimes hear on the farthest rim of space, yet from which our own finite +> vision has given us a merciful immunity. +> +> — H. P. Lovecraft, The Lurking Fear + +Currently the functions that produce `std::ffi::CString` out of Rust byte +strings panic when the input contains NUL bytes. As strings containing NULs +are not commonly seen in real-world usage, it is easy for developers to +overlook the potential panic unless they test for such atypical input. + +The panic is particularly sneaky when hidden behind an API using regular Rust +string types. Consider this example: + +```rust +fn set_text(text: &str) { + let c_text = CString::from_slice(text.as_bytes()); // panic lurks here + unsafe { ffi::set_text(c_text.as_ptr()) }; +} +``` + +This implementation effectively imposes a requirement on the input string to +contain no inner NUL bytes, which is generally permitted in pure Rust. 
+This restriction is not apparent in the signature of the function and needs to +be described in the documentation. Furthermore, the creator of the code may be +oblivious to the potential panic. + +The conventions on failure modes elsewhere in Rust libraries tend to limit +panics to outcomes of programmer errors. Functions validating external data +should return `Result` to allow graceful handling of the errors. + +# Detailed design + +The return types of `CString::from_slice` and `CString::from_vec` is changed +to `Result`: + +```rust +impl CString { + pub fn from_slice(s: &[u8]) -> Result { ... } + pub fn from_vec(v: Vec) -> Result { ... } +} +``` + +The error type `NulError` provides information on the position of the first +NUL byte found in the string. `IntoCStrError` wraps `NulError` and also +provides the `Vec` which has been moved into `CString::from_vec`. + +`std::error::FromError` implementations are provided to convert the error +types above to `std::io::Error` of the `InvalidInput` kind. This facilitates +use of the conversion functions in input-processing code. + +# Proof-of-concept implementation + +The proposed changes are implemented in a crates.io project +[c_string](https://github.com/mzabaluev/rust-c-str), where the analog of +`CString` is named `CStrBuf`. + +# Drawbacks + +The need to extract the data from a `Result` in the success case is annoying. +However, it may be viewed as a speed bump to make the developer aware of a +potential failure and to require an explicit choice on how to handle it. +Even the least graceful way, a call to `unwrap`, makes the potential panic +apparent in the code. + +# Alternatives + +Non-panicky functions can be added alongside the existing functions, e.g., +as `from_slice_failing`. Adding new functions complicates the API where little +reason for that exists; composition is preferred to adding function variants. +Longer function names, together with a less convenient return value, may deter +people from using the safer functions. + +The panicky functions could also be renamed to `unpack_slice` and `unpack_vec`, +respectively, to highlight their conceptual proximity to `unpack`. + +If the panicky behavior is preserved, plentiful possibilities for DoS attacks +and other unforeseen failures in the field may be introduced by code oblivious +to the input constraints. + +# Unresolved questions + +None. diff --git a/text/0873-type-macros.md b/text/0873-type-macros.md new file mode 100644 index 00000000000..bab40b17042 --- /dev/null +++ b/text/0873-type-macros.md @@ -0,0 +1,235 @@ +- Feature Name: macros_in_type_positions +- Start Date: 2015-02-16 +- RFC PR: [rust-lang/rfcs#873](https://github.com/rust-lang/rfcs/pull/873) +- Rust Issue: [rust-lang/rust#27245](https://github.com/rust-lang/rust/issues/27245) + +# Summary + +Allow macros in type positions + +# Motivation + +Macros are currently allowed in syntax fragments for expressions, +items, and patterns, but not for types. This RFC proposes to lift that +restriction. + +1. This would allow macros to be used more flexibly, avoiding the + need for more complex item-level macros or plugins in some + cases. For example, when creating trait implementations with + macros, it is sometimes useful to be able to define the + associated types using a nested type macro but this is + currently problematic. + +2. Enable more programming patterns, particularly with respect to + type level programming. Macros in type positions provide + convenient way to express recursion and choice. 
It is possible + to do the same thing purely through programming with associated + types but the resulting code can be cumbersome to read and write. + + +# Detailed design + +## Implementation + +The proposed feature has been prototyped at +[this branch](https://github.com/freebroccolo/rust/commits/feature/type_macros). The +implementation is straightforward and the impact of the changes are +limited in scope to the macro system. Type-checking and other phases +of compilation should be unaffected. + +The most significant change introduced by this feature is a +[`TyMac`](https://github.com/freebroccolo/rust/blob/f8f8dbb6d332c364ecf26b248ce5f872a7a67019/src/libsyntax/ast.rs#L1274-L1275) +case for the `Ty_` enum so that the parser can indicate a macro +invocation in a type position. In other words, `TyMac` is added to the +ast and handled analogously to `ExprMac`, `ItemMac`, and `PatMac`. + +## Example: Heterogeneous Lists + +Heterogeneous lists are one example where the ability to express +recursion via type macros is very useful. They can be used as an +alternative to or in combination with tuples. Their recursive +structure provide a means to abstract over arity and to manipulate +arbitrary products of types with operations like appending, taking +length, adding/removing items, computing permutations, etc. + +Heterogeneous lists can be defined like so: + +```rust +#[derive(Copy, Clone, Debug, Eq, Ord, PartialEq, PartialOrd)] +struct Nil; // empty HList +#[derive(Copy, Clone, Debug, Eq, Ord, PartialEq, PartialOrd)] +struct Cons(H, T); // cons cell of HList + +// trait to classify valid HLists +trait HList: MarkerTrait {} +impl HList for Nil {} +impl HList for Cons {} +``` + +However, writing HList terms in code is not very convenient: + +```rust +let xs = Cons("foo", Cons(false, Cons(vec![0u64], Nil))); +``` + +At the term-level, this is an easy fix using macros: + +```rust +// term-level macro for HLists +macro_rules! hlist { + {} => { Nil }; + {=> $($elem:tt),+ } => { hlist_pat!($($elem),+) }; + { $head:expr, $($tail:expr),* } => { Cons($head, hlist!($($tail),*)) }; + { $head:expr } => { Cons($head, Nil) }; +} + +// term-level HLists in patterns +macro_rules! hlist_pat { + {} => { Nil }; + { $head:pat, $($tail:tt),* } => { Cons($head, hlist_pat!($($tail),*)) }; + { $head:pat } => { Cons($head, Nil) }; +} + +let xs = hlist!["foo", false, vec![0u64]]; +``` + +Unfortunately, this solution is incomplete because we have only made +HList terms easier to write. HList types are still inconvenient: + +```rust +let xs: Cons<&str, Cons, Nil>>> = hlist!["foo", false, vec![0u64]]; +``` + +Allowing type macros as this RFC proposes would allows us to be +able to use Rust's macros to improve writing the HList type as +well. The complete example follows: + +```rust +// term-level macro for HLists +macro_rules! hlist { + {} => { Nil }; + {=> $($elem:tt),+ } => { hlist_pat!($($elem),+) }; + { $head:expr, $($tail:expr),* } => { Cons($head, hlist!($($tail),*)) }; + { $head:expr } => { Cons($head, Nil) }; +} + +// term-level HLists in patterns +macro_rules! hlist_pat { + {} => { Nil }; + { $head:pat, $($tail:tt),* } => { Cons($head, hlist_pat!($($tail),*)) }; + { $head:pat } => { Cons($head, Nil) }; +} + +// type-level macro for HLists +macro_rules! 
HList { + {} => { Nil }; + { $head:ty } => { Cons<$head, Nil> }; + { $head:ty, $($tail:ty),* } => { Cons<$head, HList!($($tail),*)> }; +} + +let xs: HList![&str, bool, Vec] = hlist!["foo", false, vec![0u64]]; +``` + +Operations on HLists can be defined by recursion, using traits with +associated type outputs at the type-level and implementation methods +at the term-level. + +The HList append operation is provided as an example. Type macros are +used to make writing append at the type level (see `Expr!`) more +convenient than specifying the associated type projection manually: + +```rust +use std::ops; + +// nil case for HList append +impl ops::Add for Nil { + type Output = Ys; + + fn add(self, rhs: Ys) -> Ys { + rhs + } +} + +// cons case for HList append +impl ops::Add for Cons where + Xs: ops::Add, +{ + type Output = Cons; + + fn add(self, rhs: Ys) -> Cons { + Cons(self.0, self.1 + rhs) + } +} + +// type macro Expr allows us to expand the + operator appropriately +macro_rules! Expr { + { ( $($LHS:tt)+ ) } => { Expr!($($LHS)+) }; + { HList ! [ $($LHS:tt)* ] + $($RHS:tt)+ } => { >::Output }; + { $LHS:tt + $($RHS:tt)+ } => { >::Output }; + { $LHS:ty } => { $LHS }; +} + +// test demonstrating term level `xs + ys` and type level `Expr!(Xs + Ys)` +#[test] +fn test_append() { + fn aux(xs: Xs, ys: Ys) -> Expr!(Xs + Ys) where + Xs: ops::Add + { + xs + ys + } + let xs: HList![&str, bool, Vec] = hlist!["foo", false, vec![]]; + let ys: HList![u64, [u8; 3], ()] = hlist![0, [0, 1, 2], ()]; + + // demonstrate recursive expansion of Expr! + let zs: Expr!((HList![&str] + HList![bool] + HList![Vec]) + + (HList![u64] + HList![[u8; 3], ()]) + + HList![]) + = aux(xs, ys); + assert_eq!(zs, hlist!["foo", false, vec![], 0, [0, 1, 2], ()]) +} +``` + +# Drawbacks + +There seem to be few drawbacks to implementing this feature as an +extension of the existing macro machinery. The change adds a small +amount of additional complexity to the +[parser](https://github.com/freebroccolo/rust/commit/a224739e92a3aa1febb67d6371988622bd141361) +and +[conversion](https://github.com/freebroccolo/rust/commit/9341232087991dee73713dc4521acdce11a799a2) +but the changes are minimal. + +As with all feature proposals, it is possible that designs for future +extensions to the macro system or type system might interfere with +this functionality but it seems unlikely unless they are significant, +breaking changes. + +# Alternatives + +There are no _direct_ alternatives. Extensions to the type system like +data kinds, singletons, and other forms of staged programming +(so-called CTFE) might alleviate the need for type macros in some +cases, however it is unlikely that they would provide a comprehensive +replacement, particularly where plugins are concerned. + +Not implementing this feature would mean not taking some reasonably +low-effort steps toward making certain programming patterns +easier. One potential consequence of this might be more pressure to +significantly extend the type system and other aspects of the language +to compensate. + +# Unresolved questions + +## Alternative syntax for macro invocations in types + +There is a question as to whether type macros should allow `<` and `>` +as delimiters for invocations, e.g. `Foo!`. This would raise a +number of additional complications and is probably not necessary to +consider for this RFC. If deemed desirable by the community, this +functionality should be proposed separately. 
+ +## Hygiene and type macros + +This RFC also does not address the topic of hygiene regarding macros +in types. It is not clear whether there are issues here or not but it +may be worth considering in further detail. diff --git a/text/0879-small-base-lexing.md b/text/0879-small-base-lexing.md new file mode 100644 index 00000000000..347047d603e --- /dev/null +++ b/text/0879-small-base-lexing.md @@ -0,0 +1,106 @@ +- Feature Name: stable, it only restricts the language +- Start Date: 2015-02-17 +- RFC PR: [rust-lang/rfcs#879](https://github.com/rust-lang/rfcs/pull/879) +- Rust Issue: [rust-lang/rust#23872](https://github.com/rust-lang/rust/pull/23872) + +# Summary + +Lex binary and octal literals as if they were decimal. + +# Motivation + +Lexing all digits (even ones not valid in the given base) allows for +improved error messages & future proofing (this is more conservative +than the current approach) and less confusion, with little downside. + +Currently, the lexer stops lexing binary and octal literals (`0b10` and +`0o12345670`) as soon as it sees an invalid digit (2-9 or 8-9 +respectively), and immediately starts lexing a new token, +e.g. `0b0123` is two tokens, `0b01` and `23`. Writing such a thing in +normal code gives a strange error message: + +```rust +:2:9: 2:11 error: expected one of `.`, `;`, `}`, or an operator, found `23` +:2 0b0123 + ^~ +``` + +However, it is valid to write such a thing in a macro (e.g. using the +`tt` non-terminal), and thus lexing the adjacent digits as two tokens +can lead to unexpected behaviour. + +```rust +macro_rules! expr { ($e: expr) => { $e } } + +macro_rules! add { + ($($token: tt)*) => { + 0 $(+ expr!($token))* + } +} +fn main() { + println!("{}", add!(0b0123)); +} +``` + +prints `24` (`add` expands to `0 + 0b01 + 23`). + +It would be nicer for both cases to print an error like: + +```rust +error: found invalid digit `2` in binary literal +0b0123 + ^ +``` + +(The non-macro case could be handled by detecting this pattern in the +lexer and special casing the message, but this doesn't not handle the +macro case.) + +Code that wants two tokens can opt in to it by `0b01 23`, for +example. This is easy to write, and expresses the intent more clearly +anyway. + +# Detailed design + +The grammar that the lexer uses becomes + +``` +(0b[0-9]+ | 0o[0-9]+ | [0-9]+ | 0x[0-9a-fA-F]+) suffix +``` + +instead of just `[01]` and `[0-7]` for the first two, respectively. + +However, it is always an error (in the lexer) to have invalid digits +in a numeric literal beginning with `0b` or `0o`. In particular, even +a macro invocation like + +```rust +macro_rules! ignore { ($($_t: tt)*) => { {} } } + +ignore!(0b0123) +``` + +is an error even though it doesn't use the tokens. + + +# Drawbacks + +This adds a slightly peculiar special case, that is somewhat unique to +Rust. On the other hand, most languages do not expose the lexical +grammar so directly, and so have more freedom in this respect. That +is, in many languages it is indistinguishable if `0b1234` is one or +two tokens: it is *always* an error either way. + + +# Alternatives + +Don't do it, obviously. + +Consider `0b123` to just be `0b1` with a suffix of `23`, and this is +an error or not depending if a suffix of `23` is valid. Handling this +uniformly would require `"foo"123` and `'a'123` also being lexed as a +single token. (Which may be a good idea anyway.) + +# Unresolved questions + +None. 
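To make the proposed rule concrete, here is an illustrative sketch (a hypothetical helper, not the actual rustc lexer) of the check it implies: scan the whole digit run, then reject any digit that is out of range for the base:

```rust
// Hypothetical helper mirroring the proposed behaviour: lex every digit,
// then report the first one that is invalid for the literal's base.
fn check_digits(base: u32, digits: &str) -> Result<(), String> {
    match digits.chars().find(|c| !c.is_digit(base)) {
        Some(bad) => Err(format!(
            "found invalid digit `{}` in base-{} literal",
            bad, base
        )),
        None => Ok(()),
    }
}

fn main() {
    assert!(check_digits(2, "0123").is_err()); // 0b0123 is a single, invalid token
    assert!(check_digits(2, "01").is_ok()); // 0b01 is fine
    assert!(check_digits(8, "12345670").is_ok()); // 0o12345670 is fine
}
```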
diff --git a/text/0888-compiler-fence-intrinsics.md b/text/0888-compiler-fence-intrinsics.md new file mode 100644 index 00000000000..9cb399c576f --- /dev/null +++ b/text/0888-compiler-fence-intrinsics.md @@ -0,0 +1,65 @@ +- Feature Name: compiler_fence_intrinsics +- Start Date: 2015-02-19 +- RFC PR: [rust-lang/rfcs#888](https://github.com/rust-lang/rfcs/pull/888) +- Rust Issue: [rust-lang/rust#24118](https://github.com/rust-lang/rust/issues/24118) + +# Summary + +Add intrinsics for single-threaded memory fences. + +# Motivation + +Rust currently supports memory barriers through a set of intrinsics, +`atomic_fence` and its variants, which generate machine instructions and are +suitable as cross-processor fences. However, there is currently no compiler +support for single-threaded fences which do not emit machine instructions. + +Certain use cases require that the compiler not reorder loads or stores across a +given barrier but do not require a corresponding hardware guarantee, such as +when a thread interacts with a signal handler which will run on the same thread. +By omitting a fence instruction, relatively costly machine operations can be +avoided. + +The C++ equivalent of this feature is `std::atomic_signal_fence`. + +# Detailed design + +Add four language intrinsics for single-threaded fences: + + * `atomic_compilerfence` + * `atomic_compilerfence_acq` + * `atomic_compilerfence_rel` + * `atomic_compilerfence_acqrel` + +These have the same semantics as the existing `atomic_fence` intrinsics but only +constrain memory reordering by the compiler, not by hardware. + +The existing fence intrinsics are exported in libstd with safe wrappers, but +this design does not export safe wrappers for the new intrinsics. The existing +fence functions will still perform correctly if used where a single-threaded +fence is called for, but with a slight reduction in efficiency. Not exposing +these new intrinsics through a safe wrapper reduces the possibility for +confusion on which fences are appropriate in a given situation, while still +providing the capability for users to opt in to a single-threaded fence when +appropriate. + +# Alternatives + + * Do nothing. The existing fence intrinsics support all use cases, but with a + negative impact on performance in some situations where a compiler-only fence + is appropriate. + + * Recommend inline assembly to get a similar effect, such as `asm!("" ::: + "memory" : "volatile")`. LLVM provides an IR item specifically for this case + (`fence singlethread`), so I believe taking advantage of that feature in LLVM is + most appropriate, since its semantics are more rigorously defined and less + likely to yield unexpected (but not necessarily wrong) behavior. + +# Unresolved questions + +These intrinsics may be better represented with a different name, such as +`atomic_signal_fence` or `atomic_singlethread_fence`. The existing +implementation of atomic intrinsics in the compiler precludes the use of +underscores in their names and I believe it is clearer to refer to this +construct as a "compiler fence" rather than a "signal fence" because not all use +cases necessarily involve signal handlers, hence the current choice of name. 
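To illustrate the intended usage, here is a sketch written against the safe `std::sync::atomic::compiler_fence` wrapper that exposes this functionality today; only compiler reordering is constrained, so no fence instruction is emitted:

```rust
use std::sync::atomic::{compiler_fence, AtomicBool, AtomicUsize, Ordering};

static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

// Stand-in for a signal handler that runs on the *same* thread: if it
// observes READY, it must also observe the earlier write to DATA.
fn pretend_signal_handler() {
    if READY.load(Ordering::Relaxed) {
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    }
}

fn main() {
    DATA.store(42, Ordering::Relaxed);
    // Forbid the compiler from reordering the READY store before the DATA
    // store; unlike `atomic::fence`, this emits no machine instruction.
    compiler_fence(Ordering::Release);
    READY.store(true, Ordering::Relaxed);
    pretend_signal_handler();
}
```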
diff --git a/text/0909-move-thread-local-to-std-thread.md b/text/0909-move-thread-local-to-std-thread.md new file mode 100644 index 00000000000..937c5dd608f --- /dev/null +++ b/text/0909-move-thread-local-to-std-thread.md @@ -0,0 +1,47 @@ +- Feature Name: N/A +- Start Date: 2015-02-25 +- RFC PR: https://github.com/rust-lang/rfcs/pull/909 +- Rust Issue: https://github.com/rust-lang/rust/issues/23547 + +# Summary + +Move the contents of `std::thread_local` into `std::thread`. Fully +remove `std::thread_local` from the standard library. + +# Motivation + +Thread locals are directly related to threading. Combining the modules +would reduce the number of top level modules, combine related concepts, +and make browsing the docs easier. It also would have the potential to +slightly reduce the number of `use` statementsl + +# Detailed design + +The contents of`std::thread_local` module would be moved into to +`std::thread::local`. `Key` would be renamed to `LocalKey`, and +`scoped` would also be flattened, providing `ScopedKey`, etc. This +way, all thread related code is combined in one module. + +It would also allow using it as such: + +```rust +use std::thread::{LocalKey, Thread}; +``` + +# Drawbacks + +It's pretty late in the 1.0 release cycle. This is a mostly bike +shedding level of a change. It may not be worth changing it at this +point and staying with two top level modules in `std`. Also, some users +may prefer to have more top level modules. + +# Alternatives + +An alternative (as the RFC originally proposed) would be to bring +`thread_local` in as a submodule, rather than flattening. This was +decided against in an effort to keep hierarchies flat, and because of +the slim contents on the `thread_local` module. + +# Unresolved questions + +The exact strategy for moving the contents into `std::thread` diff --git a/text/0911-const-fn.md b/text/0911-const-fn.md new file mode 100644 index 00000000000..388d6213c14 --- /dev/null +++ b/text/0911-const-fn.md @@ -0,0 +1,244 @@ +- Feature Name: const_fn +- Start Date: 2015-02-25 +- RFC PR: [rust-lang/rfcs#911](https://github.com/rust-lang/rfcs/pull/911) +- Rust Issue: [rust-lang/rust#24111](https://github.com/rust-lang/rust/issues/24111) + +# Summary + +Allow marking free functions and inherent methods as `const`, enabling them to be +called in constants contexts, with constant arguments. + +# Motivation + +As it is right now, `UnsafeCell` is a stabilization and safety hazard: the field +it is supposed to be wrapping is public. This is only done out of the necessity +to initialize static items containing atomics, mutexes, etc. - for example: + +```rust +#[lang="unsafe_cell"] +struct UnsafeCell { pub value: T } +struct AtomicUsize { v: UnsafeCell } +const ATOMIC_USIZE_INIT: AtomicUsize = AtomicUsize { + v: UnsafeCell { value: 0 } +}; +``` + +This approach is fragile and doesn't compose well - consider having to initialize +an `AtomicUsize` static with `usize::MAX` - you would need a `const` for each +possible value. + +Also, types like `AtomicPtr` or `Cell` have no way *at all* to initialize +them in constant contexts, leading to overuse of `UnsafeCell` or `static mut`, +disregarding type safety and proper abstractions. + +During implementation, the worst offender I've found was `std::thread_local`: +all the fields of `std::thread_local::imp::Key` are public, so they can be +filled in by a macro - and they're also marked "stable" (due to the lack of +stability hygiene in macros). 
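For comparison, a brief sketch of how the `UnsafeCell`/`AtomicUsize` example above could be written once `const fn` exists, with the fields kept private (the types here are simplified stand-ins for the real standard-library ones):

```rust
// Simplified stand-ins for the standard-library types, with private fields.
struct UnsafeCell<T> {
    value: T,
}

impl<T> UnsafeCell<T> {
    const fn new(value: T) -> UnsafeCell<T> {
        UnsafeCell { value: value }
    }
}

struct AtomicUsize {
    v: UnsafeCell<usize>,
}

impl AtomicUsize {
    const fn new(v: usize) -> AtomicUsize {
        AtomicUsize { v: UnsafeCell::new(v) }
    }
}

// No dedicated `const` item per initial value is needed any more.
static COUNTER: AtomicUsize = AtomicUsize::new(usize::MAX);

fn main() {
    let _ = &COUNTER;
}
```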
+ +A pre-RFC for the removal of the dangerous (and oftenly misued) `static mut` +received positive feedback, but only under the condition that abstractions +could be created and used in `const` and `static` items. + +Another concern is the ability to use certain intrinsics, like `size_of`, inside +constant expressions, including fixed-length array types. Unlike keyword-based +alternatives, `const fn` provides an extensible and composable building block +for such features. + +The design should be as simple as it can be, while keeping enough functionality +to solve the issues mentioned above. + +The intention of this RFC is to introduce a minimal change that +enables safe abstraction resembling the kind of code that one writes +outside of a constant. Compile-time pure constants (the existing +`const` items) with added parametrization over types and values +(arguments) should suffice. + +This RFC explicitly does not introduce a general CTFE mechanism. In +particular, conditional branching and virtual dispatch are still not +supported in constant expressions, which imposes a severe limitation +on what one can express. + +# Detailed design + +Functions and inherent methods can be marked as `const`: +```rust +const fn foo(x: T, y: U) -> Foo { + stmts; + expr +} +impl Foo { + const fn new(x: T) -> Foo { + stmts; + expr + } + + const fn transform(self, y: U) -> Foo { + stmts; + expr + } +} +``` + +Traits, trait implementations and their methods cannot be `const` - this +allows us to properly design a constness/CTFE system that interacts well +with traits - for more details, see *Alternatives*. + +Only simple by-value bindings are allowed in arguments, e.g. `x: T`. While +by-ref bindings and destructuring can be supported, they're not necessary +and they would only complicate the implementation. + +The body of the function is checked as if it were a block inside a `const`: +```rust +const FOO: Foo = { + // Currently, only item "statements" are allowed here. + stmts; + // The function's arguments and constant expressions can be freely combined. + expr +} +``` + +As the current `const` items are not formally specified (yet), there is a need +to expand on the rules for `const` values (pure compile-time constants), instead +of leaving them implicit: +* the set of currently implemented expressions is: primitive literals, ADTs +(tuples, arrays, structs, enum variants), unary/binary operations on primitives, +casts, field accesses/indexing, capture-less closures, references and blocks +(only item statements and a tail expression) +* no side-effects (assignments, non-`const` function calls, inline assembly) +* struct/enum values are not allowed if their type implements `Drop`, but +this is not transitive, allowing the (perfectly harmless) creation of, e.g. +`None::>` (as an aside, this rule could be used to allow `[x; N]` even +for non-`Copy` types of `x`, but that is out of the scope of this RFC) +* references are trully immutable, no value with interior mutability can be placed +behind a reference, and mutable references can only be created from zero-sized +values (e.g. 
`&mut || {}`) - this allows a reference to be represented just by +its value, with no guarantees for the actual address in memory +* raw pointers can only be created from an integer, a reference or another raw +pointer, and cannot be dereferenced or cast back to an integer, which means any +constant raw pointer can be represented by either a constant integer or reference +* as a result of not having any side-effects, loops would only affect termination, +which has no practical value, thus remaining unimplemented +* although more useful than loops, conditional control flow (`if`/`else` and +`match`) also remains unimplemented and only `match` would pose a challenge +* immutable `let` bindings in blocks have the same status and implementation +difficulty as `if`/`else` and they both suffer from a lack of demand (blocks +were originally introduced to `const`/`static` for scoping items used only in +the initializer of a global). + +For the purpose of rvalue promotion (to static memory), arguments are considered +potentially varying, because the function can still be called with non-constant +values at runtime. + +`const` functions and methods can be called from any constant expression: +```rust +// Standalone example. +struct Point { x: i32, y: i32 } + +impl Point { + const fn new(x: i32, y: i32) -> Point { + Point { x: x, y: y } + } + + const fn add(self, other: Point) -> Point { + Point::new(self.x + other.x, self.y + other.y) + } +} + +const ORIGIN: Point = Point::new(0, 0); + +const fn sum_test(xs: [Point; 3]) -> Point { + xs[0].add(xs[1]).add(xs[2]) +} + +const A: Point = Point::new(1, 0); +const B: Point = Point::new(0, 1); +const C: Point = A.add(B); +const D: Point = sum_test([A, B, C]); + +// Assuming the Foo::new methods used here are const. +static FLAG: AtomicBool = AtomicBool::new(true); +static COUNTDOWN: AtomicUsize = AtomicUsize::new(10); +#[thread_local] +static TLS_COUNTER: Cell = Cell::new(1); +``` + +Type parameters and their bounds are not restricted, though trait methods cannot +be called, as they are never `const` in this design. Accessing trait methods can +still be useful - for example, they can be turned into function pointers: +```rust +const fn arithmetic_ops() -> [fn(T, T) -> T; 4] { + [Add::add, Sub::sub, Mul::mul, Div::div] +} +``` + +`const` functions can also be unsafe, allowing construction of types that require +invariants to be maintained (e.g. `std::ptr::Unique` requires a non-null pointer) +```rust +struct OptionalInt(u32); +impl OptionalInt { + /// Value must be non-zero + const unsafe fn new(val: u32) -> OptionalInt { + OptionalInt(val) + } +} +``` + +# Drawbacks + +* A design that is not conservative enough risks creating backwards compatibility +hazards that might only be uncovered when a more extensive CTFE proposal is made, +after 1.0. + +# Alternatives + +* While not an alternative, but rather a potential extension, I want to point +out there is only way I could make `const fn`s work with traits (in an untested +design, that is): qualify trait implementations and bounds with `const`. 
+This is necessary for meaningful interactions with operator overloading traits: +```rust +const fn map_vec3 T>(xs: [T; 3], f: F) -> [T; 3] { + [f([xs[0]), f([xs[1]), f([xs[2])] +} + +const fn neg_vec3(xs: [T; 3]) -> [T; 3] { + map_vec3(xs, |x| -x) +} + +const impl Add for Point { + fn add(self, other: Point) -> Point { + Point { + x: self.x + other.x, + y: self.y + other.y + } + } +} +``` +Having `const` trait methods (where all implementations are `const`) seems +useful, but it would not allow the usecase above on its own. +Trait implementations with `const` methods (instead of the entire `impl` +being `const`) would allow direct calls, but it's not obvious how one could +write a function generic over a type which implements a trait and requiring +that a certain method of that trait is implemented as `const`. + +# Unresolved questions + +* Keep recursion or disallow it for now? The conservative choice of having no +recursive `const fn`s would not affect the usecases intended for this RFC. +If we do allow it, we probably need a recursion limit, and/or an evaluation +algorithm that can handle *at least* tail recursion. +Also, there is no way to actually write a recursive `const fn` at this moment, +because no control flow primitives are implemented for constants, but that +cannot be taken for granted, at least `if`/`else` should eventually work. + +# History + +- This RFC was accepted on 2015-04-06. The primary concerns raised in + the discussion concerned CTFE, and whether the `const fn` strategy + locks us into an undesirable plan there. + +# Updates since being accepted + +Since it was accepted, the RFC has been updated as follows: + +1. Allowed `const unsafe fn` diff --git a/text/0921-entry_v3.md b/text/0921-entry_v3.md new file mode 100644 index 00000000000..f7cdeeef245 --- /dev/null +++ b/text/0921-entry_v3.md @@ -0,0 +1,121 @@ +- Feature Name: entry_v3 +- Start Date: 2015-03-01 +- RFC PR: https://github.com/rust-lang/rfcs/pull/921 +- Rust Issue: https://github.com/rust-lang/rust/issues/23508 + +# Summary + +Replace `Entry::get` with `Entry::or_insert` and +`Entry::or_insert_with` for better ergonomics and clearer code. + +# Motivation + +Entry::get was introduced to reduce a lot of the boiler-plate involved in simple Entry usage. Two +incredibly common patterns in particular stand out: + +``` +match map.entry(key) => { + Entry::Vacant(entry) => { entry.insert(1); }, + Entry::Occupied(entry) => { *entry.get_mut() += 1; }, +} +``` + +``` +match map.entry(key) => { + Entry::Vacant(entry) => { entry.insert(vec![val]); }, + Entry::Occupied(entry) => { entry.get_mut().push(val); }, +} +``` + +This code is noisy, and is visibly fighting the Entry API a bit, such as having to suppress +the return value of insert. It requires the `Entry` enum to be imported into scope. It requires +the user to learn a whole new API. It also introduces a "many ways to do it" stylistic ambiguity: + +``` +match map.entry(key) => { + Entry::Vacant(entry) => entry.insert(vec![]), + Entry::Occupied(entry) => entry.into_mut(), +}.push(val); +``` + +Entry::get tries to address some of this by doing something similar to `Result::ok`. +It maps the Entry into a more familiar Result, while automatically converting the +Occupied case into an `&mut V`. Usage looks like: + + +``` +*map.entry(key).get().unwrap_or_else(|entry| entry.insert(0)) += 1; +``` + +``` +map.entry(key).get().unwrap_or_else(|entry| entry.insert(vec![])).push(val); +``` + +This is certainly *nicer*. 
No imports are needed, the Occupied case is handled, and we're closer +to a "only one way". However this is still fairly tedious and arcane. `get` provides little +meaning for what is done; `unwrap_or_else` is long and scary-sounding; and VacantEntry literally +*only* supports `insert`, so having to call it seems redundant. + +# Detailed design + +Replace `Entry::get` with the following two methods: + +``` + /// Ensures a value is in the entry by inserting the default if empty, and returns + /// a mutable reference to the value in the entry. + pub fn or_insert(self, default: V) -> &'a mut V { + match self { + Occupied(entry) => entry.into_mut(), + Vacant(entry) => entry.insert(default), + } + } + + /// Ensures a value is in the entry by inserting the result of the default function if empty, + /// and returns a mutable reference to the value in the entry. + pub fn or_insert_with V>(self, default: F) -> &'a mut V { + match self { + Occupied(entry) => entry.into_mut(), + Vacant(entry) => entry.insert(default()), + } + } +``` + +which allows the following: + + +``` +*map.entry(key).or_insert(0) += 1; +``` + +``` +// vec![] doesn't even allocate, and is only 3 ptrs big. +map.entry(key).or_insert(vec![]).push(val); +``` + +``` +let val = map.entry(key).or_insert_with(|| expensive(big, data)); +``` + +Look at all that ergonomics. *Look at it*. This pushes us more into the "one right way" +territory, since this is unambiguously clearer and easier than a full `match` or abusing Result. +Novices don't really need to learn the entry API at all with this. They can just learn the +`.entry(key).or_insert(value)` incantation to start, and work their way up to more complex +usage later. + +Oh hey look this entire RFC is already implemented with all of `rust-lang/rust`'s `entry` +usage audited and updated: https://github.com/rust-lang/rust/pull/22930 + +# Drawbacks + +Replaces the composability of just mapping to a Result with more ad hoc specialty methods. This +is hardly a drawback for the reasons stated in the RFC. Maybe someone was really leveraging +the Result-ness in an exotic way, but it was likely an abuse of the API. Regardless, the `get` +method is trivial to write as a consumer of the API. + +# Alternatives + +Settle for `Result` chumpsville or abandon this sugar altogether. Truly, fates worse than death. + +# Unresolved questions + +None. diff --git a/text/0940-hyphens-considered-harmful.md b/text/0940-hyphens-considered-harmful.md new file mode 100644 index 00000000000..7c5e4dae9b1 --- /dev/null +++ b/text/0940-hyphens-considered-harmful.md @@ -0,0 +1,109 @@ +- Feature Name: `hyphens_considered_harmful` +- Start Date: 2015-03-05 +- RFC PR: [rust-lang/rfcs#940](https://github.com/rust-lang/rfcs/pull/940) +- Rust Issue: [rust-lang/rust#23533](https://github.com/rust-lang/rust/issues/23533) + +# Summary + +Disallow hyphens in Rust crate names, but continue allowing them in Cargo packages. + +# Motivation + +This RFC aims to reconcile two conflicting points of view. + +First: hyphens in crate names are awkward to use, and inconsistent with the rest of the language. Anyone who uses such a crate must rename it on import: + +```rust +extern crate "rustc-serialize" as rustc_serialize; +``` + +An earlier version of this RFC aimed to solve this issue by removing hyphens entirely. + +However, there is a large amount of precedent for keeping `-` in package names. Systems as varied as GitHub, npm, RubyGems and Debian all have an established convention of using hyphens. 
Disallowing them would go against this precedent, causing friction with the wider community. + +Fortunately, Cargo presents us with a solution. It already separates the concepts of *package name* (used by Cargo and crates.io) and *crate name* (used by rustc and `extern crate`). We can disallow hyphens in the crate name only, while still accepting them in the outer package. This solves the usability problem, while keeping with the broader convention. + +# Detailed design + +## Disallow hyphens in crates (only) + +In **rustc**, enforce that all crate names are valid identifiers. + +In **Cargo**, continue allowing hyphens in package names. + +The difference will be in the crate name Cargo passes to the compiler. If the `Cargo.toml` does *not* specify an explicit crate name, then Cargo will use the package name but with all `-` replaced by `_`. + +For example, if I have a package named `apple-fritter`, Cargo will pass `--crate-name apple_fritter` to the compiler instead. + +Since most packages do not set their own crate names, this mapping will ensure that the majority of hyphenated packages continue to build unchanged. + +## Identify `-` and `_` on crates.io + +Right now, crates.io compares package names case-insensitively. This means, for example, you cannot upload a new package named `RUSTC-SERIALIZE` because `rustc-serialize` already exists. + +Under this proposal, we will extend this logic to identify `-` and `_` as well. + +## Remove the quotes from `extern crate` + +Change the syntax of `extern crate` so that the crate name is no longer in quotes (e.g. `extern crate photo_finish as photo;`). This is viable now that all crate names are valid identifiers. + +To ease the transition, keep the old `extern crate` syntax around, transparently mapping any hyphens to underscores. For example, `extern crate "silver-spoon" as spoon;` will be desugared to `extern crate silver_spoon as spoon;`. This syntax will be deprecated, and removed before 1.0. + +# Drawbacks + +## Inconsistency between packages and crates + +This proposal makes package and crate names inconsistent: the former will accept hyphens while the latter will not. + +However, this drawback may not be an issue in practice. As hinted in the motivation, most other platforms have different syntaxes for packages and crates/modules anyway. Since the package system is orthogonal to the language itself, there is no need for consistency between the two. + +## Inconsistency between `-` and `_` + +Quoth @P1start: + +> ... it's also annoying to have to choose between `-` and `_` when choosing a crate name, and to remember which of `-` and `_` a particular crate uses. + +I believe, like other naming issues, this problem can be addressed by conventions. + +# Alternatives + +## Do nothing + +As with any proposal, we can choose to do nothing. But given the reasons outlined above, the author believes it is important that we address the problem before the beta release. + +## Disallow hyphens in package names as well + +An earlier version of this RFC proposed to disallow hyphens in packages as well. The drawbacks of this idea are covered in the motivation. + +## Make `extern crate` match fuzzily + +Alternatively, we can have the compiler consider hyphens and underscores as equal while looking up a crate. In other words, the crate `flim-flam` would match both `extern crate flim_flam` and `extern crate "flim-flam" as flim_flam`. + +This involves much more magic than the original proposal, and it is not clear what advantages it has over it. 
+ +## Repurpose hyphens as namespace separators + +Alternatively, we can treat hyphens as path separators in Rust. + +For example, the crate `hoity-toity` could be imported as + +```rust +extern crate hoity::toity; +``` + +which is desugared to: + +```rust +mod hoity { + mod toity { + extern crate "hoity-toity" as krate; + pub use krate::*; + } +} +``` + +However, on prototyping this proposal, the author found it too complex and fraught with edge cases. For these reasons the author chose not to push this solution. + +# Unresolved questions + +None so far. diff --git a/text/0953-op-assign.md b/text/0953-op-assign.md new file mode 100644 index 00000000000..cf9d1397de9 --- /dev/null +++ b/text/0953-op-assign.md @@ -0,0 +1,89 @@ +- Feature Name: op_assign +- Start Date: 2015-03-08 +- RFC PR: [rust-lang/rfcs#953](https://github.com/rust-lang/rfcs/pull/953) +- Rust Issue: [rust-lang/rust#28235](https://github.com/rust-lang/rust/issues/28235) + +# Summary + +Add the family of `[Op]Assign` traits to allow overloading assignment +operations like `a += b`. + +# Motivation + +We already let users overload the binary operations, letting them overload the +assignment version is the next logical step. Plus, this sugar is important to +make mathematical libraries more palatable. + +# Detailed design + +Add the following **unstable** traits to libcore and reexported them in libstd: + +``` +// `+=` +#[lang = "add_assign"] +trait AddAssign { + fn add_assign(&mut self, Rhs); +} + +// the remaining traits have the same signature +// (lang items have been omitted for brevity) +trait BitAndAssign { .. } // `&=` +trait BitOrAssign { .. } // `|=` +trait BitXorAssign { .. } // `^=` +trait DivAssign { .. } // `/=` +trait MulAssign { .. } // `*=` +trait RemAssign { .. } // `%=` +trait ShlAssign { .. } // `<<=` +trait ShrAssign { .. } // `>>=` +trait SubAssign { .. } // `-=` +``` + +Implement these traits for the primitive numeric types *without* overloading, +i.e. only `impl AddAssign for i32 { .. }`. + +Add an `op_assign` feature gate. When the feature gate is enabled, the compiler +will consider these traits when typecheking `a += b`. Without the feature gate +the compiler will enforce that `a` and `b` must be primitives of the same +type/category as it does today. + +Once we feel comfortable with the implementation we'll remove the feature gate +and mark the traits as stable. This can be done after 1.0 as this change is +backwards compatible. + +## RHS: By value vs by ref + +Taking the RHS by value is more flexible. The implementations allowed with +a by value RHS are a superset of the implementations allowed with a by ref RHS. +An example where taking the RHS by value is necessary would be operator sugar +for extending a collection with an iterator [1]: `vec ++= iter` where +`vec: Vec` and `iter impls Iterator`. This can't be implemented with the +by ref version as the iterator couldn't be advanced in that case. + +[1] Where `++` is the "combine" operator that has been proposed [elsewhere]. +Note that this RFC doesn't propose adding that particular operator or adding +similar overloaded operations (`vec += iter`) to stdlib's collections, but it +leaves the door open to the possibility of adding them in the future (if +desired). + +[elsewhere]: https://github.com/rust-lang/rfcs/pull/203 + +# Drawbacks + +None that I can think of. + +# Alternatives + +Take the RHS by ref. 
This is less flexible than taking the RHS by value but, in +some instances, it can save writing `&rhs` when the RHS is owned and the +implementation demands a reference. However, this last point will be moot if we +implement auto-referencing for binary operators, as `lhs += rhs` would actually +call `add_assign(&mut lhs, &rhs)` if `Lhs impls AddAssign<&Rhs>`. + +# Unresolved questions + +Should we overload `ShlAssign` and `ShrAssign`, e.g. +`impl ShlAssign for i32`, since we have already overloaded the `Shl` and +`Shr` traits? + +Should we overload all the traits for references, e.g. +`impl<'a> AddAssign<&'a i32> for i32` to allow `x += &0;`? diff --git a/text/0968-closure-return-type-syntax.md b/text/0968-closure-return-type-syntax.md new file mode 100644 index 00000000000..86679f6cc54 --- /dev/null +++ b/text/0968-closure-return-type-syntax.md @@ -0,0 +1,52 @@ +- Feature Name: N/A +- Start Date: 2015-03-16 +- RFC PR: [rust-lang/rfcs#968](https://github.com/rust-lang/rfcs/pull/968) +- Rust Issue: [rust-lang/rust#23420](https://github.com/rust-lang/rust/issues/23420) + +# Summary + +Restrict closure return type syntax for future compatibility. + +# Motivation + +Today's closure return type syntax juxtaposes a type and an +expression. This is dangerous: if we choose to extend the type grammar +to be more acceptable, we can easily break existing code. + +# Detailed design + +The current closure syntax for annotating the return type is `|Args| +-> Type Expr`, where `Type` is the return type and `Expr` is the body +of the closure. This syntax is future hostile and relies on being able +to determine the end point of a type. If we extend the syntax for +types, we could cause parse errors in existing code. + +An example from history is that we extended the type grammar to +include things like `Fn(..)`. This would have caused the following, +previous, legal -- closure not to parse: `|| -> Foo (Foo)`. As a +simple fix, this RFC proposes that if a return type annotation is +supplied, the body must be enclosed in braces: `|| -> Foo { (Foo) }`. +Types are already juxtaposed with open braces in `fn` items, so this +should not be an additional danger for future evolution. + +# Drawbacks + +This design is minimally invasive but perhaps unfortunate in that it's +not obvious that braces would be required. But then, return type +annotations are very rarely used. + +# Alternatives + +I am not aware of any alternate designs. One possibility would be to +remove return type anotations altogether, perhaps relying on type +ascription or other annotations to force the inferencer to figure +things out, but they are useful in rare scenarios. In particular type +ascription would not be able to handle a higher-ranked signature like +`for<'a> &'a X -> &'a Y` without improving the type checker +implementation in other ways (in particular, we don't infer +generalization over lifetimes at present, unless we can figure it out +from the expected type or explicit annotations). + +# Unresolved questions + +None. 
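As a quick illustration of the restriction, assuming the rule proposed above:

```rust
fn main() {
    // With an explicit return type, the body must be a block:
    let ok = |x: i32| -> i32 { x + 1 };
    assert_eq!(ok(1), 2);

    // Without the annotation, a bare expression body is still allowed:
    let also_ok = |x: i32| x + 1;
    assert_eq!(also_ok(2), 3);

    // Under the restriction this form no longer parses:
    // let rejected = |x: i32| -> i32 x + 1;
}
```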
diff --git a/text/0979-align-splitn-with-other-languages.md b/text/0979-align-splitn-with-other-languages.md new file mode 100644 index 00000000000..2397d833203 --- /dev/null +++ b/text/0979-align-splitn-with-other-languages.md @@ -0,0 +1,153 @@ +- Feature Name: n/a +- Start Date: 2015-03-15 +- RFC PR: https://github.com/rust-lang/rfcs/pull/979 +- Rust Issue: https://github.com/rust-lang/rust/issues/23911 + +# Summary + +Make the `count` parameter of `SliceExt::splitn`, `StrExt::splitn` and +corresponding reverse variants mean the *maximum number of items +returned*, instead of the *maximum number of times to match the +separator*. + +# Motivation + +The majority of other languages (see examples below) treat the `count` +parameter as the maximum number of items to return. Rust already has +many things newcomers need to learn, making other things similar can +help adoption. + +# Detailed design + +Currently `splitn` uses the `count` parameter to decide how many times +the separator should be matched: + +```rust +let v: Vec<_> = "a,b,c".splitn(2, ',').collect(); +assert_eq!(v, ["a", "b", "c"]); +``` + +The simplest change we can make is to decrement the count in the +constructor functions. If the count becomes zero, we mark the returned +iterator as `finished`. See **Unresolved questions** for nicer +transition paths. + +## Example usage + +### Strings + +```rust +let input = "a,b,c"; +let v: Vec<_> = input.splitn(2, ',').collect(); +assert_eq!(v, ["a", "b,c"]); + +let v: Vec<_> = input.splitn(1, ',').collect(); +assert_eq!(v, ["a,b,c"]); + +let v: Vec<_> = input.splitn(0, ',').collect(); +assert_eq!(v, []); +``` + +### Slices + +```rust +let input = [1, 0, 2, 0, 3]; +let v: Vec<_> = input.splitn(2, |&x| x == 0).collect(); +assert_eq!(v, [[1], [2, 0, 3]]); + +let v: Vec<_> = input.splitn(1, |&x| x == 0).collect(); +assert_eq!(v, [[1, 0, 2, 0, 3]]); + +let v: Vec<_> = input.splitn(0, |&x| x == 0).collect(); +assert_eq!(v, []); +``` + +## Languages where `count` is the maximum number of items returned + +### C# ### + +```c# +"a,b,c".Split(new char[] {','}, 2) +// ["a", "b,c"] +``` + +### Clojure + +```clojure +(clojure.string/split "a,b,c" #"," 2) +;; ["a" "b,c"] +``` + +### Go + +```go +strings.SplitN("a,b,c", ",", 2) +// [a b,c] +``` + +### Java + +```java +"a,b,c".split(",", 2); +// ["a", "b,c"] +``` + +### Ruby + +```ruby +"a,b,c".split(',', 2) +# ["a", "b,c"] +``` + +### Perl + +```perl +split(",", "a,b,c", 2) +# ['a', 'b,c'] +``` + +## Languages where `count` is the maximum number of times the separator will be matched + +### Python + +```python +"a,b,c".split(',', 2) +# ['a', 'b', 'c'] +``` + +### Swift + +```swift +split("a,b,c", { $0 == "," }, maxSplit: 2) +// ["a", "b", "c"] +``` + +# Drawbacks + +Changing the *meaning* of the `count` parameter without changing the +*type* is sure to cause subtle issues. See **Unresolved questions**. + +The iterator can only return 2^64 values; previously we could return +2^64 + 1. This could also be considered an upside, as we can now +return an empty iterator. + +# Alternatives + +1. Keep the status quo. People migrating from many other languages +will continue to be surprised. + +2. Add a parallel set of functions that clearly indicate that `count` +is the maximum number of items that can be returned. + +# Unresolved questions + +Is there a nicer way to change the behavior of `count` such that users +of `splitn` get compile-time errors when migrating? + +1. Add a dummy parameter, and mark the methods unstable. 
Remove the +parameterand re-mark as stable near the end of the beta period. + +2. Move the methods from `SliceExt` and `StrExt` to a new trait that +needs to be manually imported. After the transition, move the methods +back and deprecate the trait. This would not break user code that +migrated to the new semantic. diff --git a/text/0980-read-exact.md b/text/0980-read-exact.md new file mode 100644 index 00000000000..f703b9c72e2 --- /dev/null +++ b/text/0980-read-exact.md @@ -0,0 +1,284 @@ +- Feature Name: read_exact +- Start Date: 2015-03-15 +- RFC PR: https://github.com/rust-lang/rfcs/pull/980 +- Rust Issue: https://github.com/rust-lang/rust/issues/27585 + +# Summary + +Rust's `Write` trait has the `write_all` method, which is a convenience +method that writes a whole buffer, failing with `ErrorKind::WriteZero` +if the buffer cannot be written in full. + +This RFC proposes adding its `Read` counterpart: a method (here called +`read_exact`) that reads a whole buffer, failing with an error (here +called `ErrorKind::UnexpectedEOF`) if the buffer cannot be read in full. + +# Motivation + +When dealing with serialization formats with fixed-length fields, +reading or writing less than the field's size is an error. For the +`Write` side, the `write_all` method does the job; for the `Read` side, +however, one has to call `read` in a loop until the buffer is completely +filled, or until a premature EOF is reached. + +This leads to a profusion of similar helper functions. For instance, the +`byteorder` crate has a `read_full` function, and the `postgres` crate +has a `read_all` function. However, their handling of the premature EOF +condition differs: the `byteorder` crate has its own `Error` enum, with +`UnexpectedEOF` and `Io` variants, while the `postgres` crate uses an +`io::Error` with an `io::ErrorKind::Other`. + +That can make it unnecessarily hard to mix uses of these helper +functions; for instance, if one wants to read a 20-byte tag (using one's +own helper function) followed by a big-endian integer, either the helper +function has to be written to use `byteorder::Error`, or the calling +code has to deal with two different ways to represent a premature EOF, +depending on which field encountered the EOF condition. + +Additionally, when reading from an in-memory buffer, looping is not +necessary; it can be replaced by a size comparison followed by a +`copy_memory` (similar to `write_all` for `&mut [u8]`). If this +non-looping implementation is `#[inline]`, and the buffer size is known +(for instance, it's a fixed-size buffer in the stack, or there was an +earlier check of the buffer size against a larger value), the compiler +could potentially turn a read from the buffer followed by an endianness +conversion into the native endianness (as can happen when using the +`byteorder` crate) into a single-instruction direct load from the buffer +into a register. + +# Detailed design + +First, a new variant `UnexpectedEOF` is added to the `io::ErrorKind` enum. 
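To make the shape of the change concrete, a minimal sketch (illustrative only; the real `std::io::ErrorKind` has many more variants, and only the proposed addition is spelled out here):

```rust
// Illustrative stand-in for std::io::ErrorKind, showing where the proposed
// variant would live. Only the variants relevant to this RFC are listed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    /// The operation was interrupted and can be retried.
    Interrupted,
    /// The whole buffer could not be filled before reaching end of file.
    UnexpectedEOF,
    // ... the remaining existing variants ...
}
```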
+ +The following method is added to the `Read` trait: + +``` rust +fn read_exact(&mut self, buf: &mut [u8]) -> Result<()>; +``` + +Aditionally, a default implementation of this method is provided: + +``` rust +fn read_exact(&mut self, mut buf: &mut [u8]) -> Result<()> { + while !buf.is_empty() { + match self.read(buf) { + Ok(0) => break, + Ok(n) => { let tmp = buf; buf = &mut tmp[n..]; } + Err(ref e) if e.kind() == ErrorKind::Interrupted => {} + Err(e) => return Err(e), + } + } + if !buf.is_empty() { + Err(Error::new(ErrorKind::UnexpectedEOF, "failed to fill whole buffer")) + } else { + Ok(()) + } +} +``` + +And an optimized implementation of this method for `&[u8]` is provided: + +```rust +#[inline] +fn read_exact(&mut self, buf: &mut [u8]) -> Result<()> { + if (buf.len() > self.len()) { + return Err(Error::new(ErrorKind::UnexpectedEOF, "failed to fill whole buffer")); + } + let (a, b) = self.split_at(buf.len()); + slice::bytes::copy_memory(a, buf); + *self = b; + Ok(()) +} +``` + +The detailed semantics of `read_exact` are as follows: `read_exact` +reads exactly the number of bytes needed to completely fill its `buf` +parameter. If that's not possible due to an "end of file" condition +(that is, the `read` method would return 0 even when passed a buffer +with at least one byte), it returns an `ErrorKind::UnexpectedEOF` error. + +On success, the read pointer is advanced by the number of bytes read, as +if the `read` method had been called repeatedly to fill the buffer. On +any failure (including an `ErrorKind::UnexpectedEOF`), the read pointer +might have been advanced by any number between zero and the number of +bytes requested (inclusive), and the contents of its `buf` parameter +should be treated as garbage (any part of it might or might not have +been overwritten by unspecified data). + +Even if the failure was an `ErrorKind::UnexpectedEOF`, the read pointer +might have been advanced by a number of bytes less than the number of +bytes which could be read before reaching an "end of file" condition. + +The `read_exact` method will never return an `ErrorKind::Interrupted` +error, similar to the `read_to_end` method. + +Similar to the `read` method, no guarantees are provided about the +contents of `buf` when this function is called; implementations cannot +rely on any property of the contents of `buf` being true. It is +recommended that implementations only write data to `buf` instead of +reading its contents. + +# About ErrorKind::Interrupted + +Whether or not `read_exact` can return an `ErrorKind::Interrupted` error +is orthogonal to its semantics. One could imagine an alternative design +where `read_exact` could return an `ErrorKind::Interrupted` error. + +The reason `read_exact` should deal with `ErrorKind::Interrupted` itself +is its non-idempotence. On failure, it might have already partially +advanced its read pointer an unknown number of bytes, which means it +can't be easily retried after an `ErrorKind::Interrupted` error. + +One could argue that it could return an `ErrorKind::Interrupted` error +if it's interrupted before the read pointer is advanced. But that +introduces a non-orthogonality in the design, where it might either +return or retry depending on whether it was interrupted at the beginning +or in the middle. Therefore, the cleanest semantics is to always retry. + +There's precedent for this choice in the `read_to_end` method. Users who +need finer control should use the `read` method directly. 
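As a usage sketch of the semantics described above (written against today's stable `std`, which adopted this API; the 4-byte length prefix and the helper name are invented for the example), note that the caller needs neither a read loop nor an `ErrorKind::Interrupted` retry:

```rust
use std::io::{self, Read};

/// Hypothetical helper: read a fixed 4-byte big-endian length prefix.
/// `read_exact` already loops over short reads and retries interruptions,
/// and a premature end of file surfaces as a single error.
fn read_len_prefix<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

fn main() -> io::Result<()> {
    // An in-memory reader (&[u8] implements Read) keeps the example
    // self-contained; the optimized slice implementation is the one
    // discussed in the detailed design.
    let mut data: &[u8] = &[0, 0, 1, 2, b'h', b'i'];
    assert_eq!(read_len_prefix(&mut data)?, 258);
    assert_eq!(data, &b"hi"[..]); // the read pointer advanced past the prefix
    Ok(())
}
```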
+ +# About the read pointer + +This RFC proposes a `read_exact` function where the read pointer +(conceptually, what would be returned by `Seek::seek` if the stream was +seekable) is unspecified on failure: it might not have advanced at all, +have advanced in full, or advanced partially. + +Two possible alternatives could be considered: never advance the read +pointer on failure, or always advance the read pointer to the "point of +error" (in the case of `ErrorKind::UnexpectedEOF`, to the end of the +stream). + +Never advancing the read pointer on failure would make it impossible to +have a default implementation (which calls `read` in a loop), unless the +stream was seekable. It would also impose extra costs (like creating a +temporary buffer) to allow "seeking back" for non-seekable streams. + +Always advancing the read pointer to the end on failure is possible; it +happens without any extra code in the default implementation. However, +it can introduce extra costs in optimized implementations. For instance, +the implementation given above for `&[u8]` would need a few more +instructions in the error case. Some implementations (for instance, +reading from a compressed stream) might have a larger extra cost. + +The utility of always advancing the read pointer to the end is +questionable; for non-seekable streams, there's not much that can be +done on an "end of file" condition, so most users would discard the +stream in both an "end of file" and an `ErrorKind::UnexpectedEOF` +situation. For seekable streams, it's easy to seek back, but most users +would treat an `ErrorKind::UnexpectedEOF` as a "corrupted file" and +discard the stream anyways. + +Users who need finer control should use the `read` method directly, or +when available use the `Seek` trait. + +# About the buffer contents + +This RFC proposes that the contents of the output buffer be undefined on +an error return. It might be untouched, partially overwritten, or +completely overwritten (even if less bytes could be read; for instance, +this method might in theory use it as a scratch space). + +Two possible alternatives could be considered: do not touch it on +failure, or overwrite it with valid data as much as possible. + +Never touching the output buffer on failure would make it much more +expensive for the default implementation (which calls `read` in a loop), +since it would have to read into a temporary buffer and copy to the +output buffer on success. Any implementation which cannot do an early +return for all failure cases would have similar extra costs. + +Overwriting as much as possible with valid data makes some sense; it +happens without any extra cost in the default implementation. However, +for optimized implementations this extra work is useless; since the +caller can't know how much is valid data and how much is garbage, it +can't make use of the valid data. + +Users who need finer control should use the `read` method directly. + +# Naming + +It's unfortunate that `write_all` used `WriteZero` for its `ErrorKind`; +were it named `UnexpectedEOF` (which is a much more intuitive name), the +same `ErrorKind` could be used for both functions. + +The initial proposal for this `read_exact` method called it `read_all`, +for symmetry with `write_all`. However, that name could also be +interpreted as "read as many bytes as you can that fit on this buffer, +and return what you could read" instead of "read enough bytes to fill +this buffer, and fail if you couldn't read them all". 
The previous +discussion led to `read_exact` for the later meaning, and `read_full` +for the former meaning. + +# Drawbacks + +If this method fails, the buffer contents are undefined; the +`read_exact' method might have partially overwritten it. If the caller +requires "all-or-nothing" semantics, it must clone the buffer. In most +use cases, this is not a problem; the caller will discard or overwrite +the buffer in case of failure. + +In the same way, if this method fails, there is no way to determine how +many bytes were read before it determined it couldn't completely fill +the buffer. + +Situations that require lower level control can still use `read` +directly. + +# Alternatives + +The first alternative is to do nothing. Every Rust user needing this +functionality continues to write their own read_full or read_exact +function, or have to track down an external crate just for one +straightforward and commonly used convenience method. Additionally, +unless everybody uses the same external crate, every reimplementation of +this method will have slightly different error handling, complicating +mixing users of multiple copies of this convenience method. + +The second alternative is to just add the `ErrorKind::UnexpectedEOF` or +similar. This would lead in the long run to everybody using the same +error handling for their version of this convenience method, simplifying +mixing their uses. However, it's questionable to add an `ErrorKind` +variant which is never used by the standard library. + +Another alternative is to return the number of bytes read in the error +case. That makes the buffer contents defined also in the error case, at +the cost of increasing the size of the frequently-used `io::Error` +struct, for a rarely used return value. My objections to this +alternative are: + +* If the caller has an use for the partially written buffer contents, + then it's treating the "buffer partially filled" case as an + alternative success case, not as a failure case. This is not a good + match for the semantics of an `Err` return. +* Determining that the buffer cannot be completely filled can in some + cases be much faster than doing a partial copy. Many callers are not + going to be interested in an incomplete read, meaning that all the + work of filling the buffer is wasted. +* As mentioned, it increases the size of a commonly used type in all + cases, even when the code has no mention of `read_exact`. + +The final alternative is `read_full`, which returns the number of bytes +read (`Result`) instead of failing. This means that every caller +has to check the return value against the size of the passed buffer, and +some are going to forget (or misimplement) the check. It also prevents +some optimizations (like the early return in case there will never be +enough data). There are, however, valid use cases for this alternative; +for instance, reading a file in fixed-size chunks, where the last chunk +(and only the last chunk) can be shorter. I believe this should be +discussed as a separate proposal; its pros and cons are distinct enough +from this proposal to merit its own arguments. + +I believe that the case for `read_full` is weaker than `read_exact`, for +the following reasons: + +* While `read_exact` needs an extra variant in `ErrorKind`, `read_full` + has no new error cases. This means that implementing it yourself is + easy, and multiple implementations have no drawbacks other than code + duplication. 
* While `read_exact` can be optimized with an early return in cases
  where the reader knows its total size (for instance, reading from a
  compressed file where the uncompressed size was given in a header),
  `read_full` always has to write to the output buffer, so there's not
  much to gain over a generic looping implementation calling `read`.
diff --git a/text/0982-dst-coercion.md b/text/0982-dst-coercion.md
new file mode 100644
index 00000000000..98d32fadeb5
--- /dev/null
+++ b/text/0982-dst-coercion.md
@@ -0,0 +1,188 @@
- Feature Name: dst_coercions
- Start Date: 2015-03-16
- RFC PR: [rust-lang/rfcs#982](https://github.com/rust-lang/rfcs/pull/982)
- Rust Issue: [rust-lang/rust#18598](https://github.com/rust-lang/rust/issues/18598)

# Summary

Custom coercions allow smart pointers to fully participate in the DST system.
In particular, they allow practical use of `Rc<T>` and `Arc<T>` where `T` is unsized.

This RFC subsumes part of [RFC 401 coercions](https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md).

# Motivation

DST is not really finished without this; in particular, there is a need for types
like reference-counted trait objects (`Rc<Trait>`), which are not currently well
supported (without coercions, it is pretty much impossible to create values of
such a type).

# Detailed design

There is an `Unsize` trait and lang item. This trait signals that a type can be
converted using the compiler's coercion machinery from a sized to an unsized
type. All implementations of this trait are implicit and compiler generated. It
is an error to implement this trait. If `&T` can be coerced to `&U` then there
will be an implementation of `Unsize<U>` for `T`. E.g., `[i32; 42]:
Unsize<[i32]>`. Note that the existence of an `Unsize` impl does not itself
signify that a coercion can take place; it represents an internal part of the
coercion mechanism (it corresponds with `coerce_inner` from RFC 401). The trait
is defined as:

```rust
#[lang="unsize"]
trait Unsize<T: ?Sized>: ::std::marker::PhantomFn<Self, T> {}
```

There are implementations for any fixed-size array to the corresponding unsized
array, for any type to any trait that that type implements, for structs and
tuples where the last field can be unsized, and for any pair of traits where
`Self` is a sub-trait of `T` (see RFC 401 for more details).

There is a `CoerceUnsized` trait which is implemented by smart pointer types to
opt in to DST coercions. It is defined as:

```rust
#[lang="coerce_unsized"]
trait CoerceUnsized<Target>: ::std::marker::PhantomFn<Self, Target> + Sized {}
```

Example implementations:

```rust
impl<T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<Rc<U>> for Rc<T> {}
impl<T: CoerceUnsized<U>, U: Zeroable> CoerceUnsized<NonZero<U>> for NonZero<T> {}

// For reference, the definitions of Rc and NonZero:
pub struct Rc<T: ?Sized> {
    _ptr: NonZero<*mut RcBox<T>>,
}
pub struct NonZero<T>(T);
```

Implementing `CoerceUnsized` indicates that the self type should be able to be
coerced to the `Target` type. E.g., the above implementation means that
`Rc<[i32; 42]>` can be coerced to `Rc<[i32]>`. There will be `CoerceUnsized` impls
for the various pointer kinds available in Rust which allow coercions; therefore
`CoerceUnsized`, when used as a bound, indicates coercible types.
E.g.,

```rust
fn foo<T: CoerceUnsized<U>, U>(x: T) -> U {
    x
}
```

Built-in pointer impls:

```rust
impl<'a, 'b: 'a, T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<&'a U> for &'b mut T {}
impl<'a, T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<&'a mut U> for &'a mut T {}
impl<'a, T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<*const U> for &'a mut T {}
impl<'a, T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<*mut U> for &'a mut T {}

impl<'a, 'b: 'a, T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<&'a U> for &'b T {}
impl<'b, T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<*const U> for &'b T {}

impl<T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<*const U> for *mut T {}
impl<T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<*mut U> for *mut T {}

impl<T: ?Sized+Unsize<U>, U: ?Sized> CoerceUnsized<*const U> for *const T {}
```

Note that there are some coercions which are not given by `CoerceUnsized`, e.g.,
from safe to unsafe function pointers, so it really is a `CoerceUnsized` trait,
not a general `Coerce` trait.


## Compiler checking

### On encountering an implementation of `CoerceUnsized` (type collection phase)

* If the impl is for a built-in pointer type, we check nothing; otherwise...
* The compiler checks that the `Self` type is a struct or tuple struct and that
the `Target` type is a simple substitution of type parameters from the `Self`
type (i.e., that `Self` is `Foo<Xs>`, `Target` is `Foo<Vs>`, and that there exist
`Vs` and `Xs` (where `Xs` are all type parameters) such that `Target = [Vs/Xs]Self`.
One day, with HKT, this could be a regular part of type checking; for now
it must be an ad hoc check). We might enforce that this substitution is of the
form `X/Y` where `X` and `Y` are both formal type parameters of the
implementation (I don't think this is necessary, but it makes checking coercions
easier and is satisfied for all smart pointers).
* The compiler checks each field in the `Self` type against the corresponding field
in the `Target` type. Assuming `Fs` is the type of a field in `Self` and `Ft` is
the type of the corresponding field in `Target`, then either `Ft <: Fs` or
`Fs: CoerceUnsized<Ft>` (note that this includes some built-in coercions; coercions
unrelated to unsizing are excluded, though these could probably be added later, if needed).
* There must be only one non-`PhantomData` field that is coerced.
* We record, for each impl, the index of the field in the `Self` type which is
coerced.

### On encountering a potential coercion (type checking phase)

* If we have an expression with type `E` where the type `F` is required during
type checking, and `E` is not a subtype of `F` nor coercible to it using the
built-in coercions, then we search for a bound of `E: CoerceUnsized<F>`. Note
that we may not at this stage find the actual impl, but finding the bound is
good enough for type checking.

* If we require a coercion in the receiver of a method call or field lookup, we
perform the same search that we currently do, except that where we currently
check for coercions, we check for built-in coercions and then for `CoerceUnsized`
bounds. We must also check for `Unsize` bounds for the case where the receiver
is auto-deref'ed, but not autoref'ed.


### On encountering an adjustment (translation phase)

* In trans (which is post-monomorphisation) we should always be able to find an
impl for any `CoerceUnsized` bound.
* If the impl is for a built-in pointer type, then we use the current coercion
code for the various pointer kinds (`Box` has different behaviour than `&` and
`*` pointers).
* Otherwise, we look up which field is coerced due to the opt-in coercion, move
the object being coerced, and coerce the field in question by recursing (the
built-in pointers are the base cases).


### Adjustment types

We add `AdjustCustom` to the `AutoAdjustment` enum as a placeholder for coercions
due to a `CoerceUnsized` bound. I don't think we need the `UnsizeKind` enum at
all now, since all checking is postponed until trans or relies on traits and impls.


# Drawbacks

Not as flexible as the previous proposal.

# Alternatives

The original [DST5 proposal](http://smallcultfollowing.com/babysteps/blog/2014/01/05/dst-take-5/)
contains a similar proposal with no opt-in trait, i.e., coercions are completely
automatic and arbitrarily deep. This is a little too magical and unpredictable.
It violates some 'soft abstraction boundaries' by interfering with the deep
structure of objects, sometimes even automatically (and implicitly) allocating.

[RFC 401](https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md)
proposed a scheme where users write their own coercions using intrinsics.
Although more flexible, this allows for implicit execution of arbitrary code.
If we need the increased flexibility, I believe we can add a manual option to
the `CoerceUnsized` trait backwards compatibly.

The proposed design could be tweaked: for example, we could change the
`CoerceUnsized` trait in many ways (we experimented with an associated type to
indicate the field type which is coerced, for example).

# Unresolved questions

It is unclear to what extent DST coercions should support multiple fields that
refer to the same type parameter. `PhantomData` should definitely be
supported as an "extra" field that's skipped, but can all zero-sized fields
be skipped? Are there cases where this would enable bypassing the abstractions
that make some API safe?

# Updates since being accepted

Since it was accepted, the RFC has been updated as follows:

1. `CoerceUnsized` was specified to ignore `PhantomData` fields.
diff --git a/text/1011-process.exit.md b/text/1011-process.exit.md
new file mode 100644
index 00000000000..e38e2bfdf90
--- /dev/null
+++ b/text/1011-process.exit.md
@@ -0,0 +1,90 @@
- Feature Name: exit
- Start Date: 2015-03-24
- RFC PR: https://github.com/rust-lang/rfcs/pull/1011
- Rust Issue: (leave this empty)

# Summary

Add a function to the `std::process` module to exit the process immediately with
a specified exit code.

# Motivation

Currently there is no stable method to exit a program in Rust with a nonzero
exit code without panicking. The current unstable method for doing so is by
using the `exit_status` feature with the `std::env::set_exit_status` function.

This function has not been stabilized as it diverges from the system APIs (there
is no equivalent) and it represents an odd piece of global state for a Rust
program to have. One example of odd behavior that may arise is that if a library
calls `env::set_exit_status`, then the process is not guaranteed to exit with
that status (e.g. if Rust was called from C).

The purpose of this RFC is to provide at least one method on the path to
stabilization which will provide a way to exit a process with an arbitrary
exit code.

# Detailed design

The following function will be added to the `std::process` module:

```rust
/// Terminates the current process with the specified exit code.
+/// +/// This function will never return and will immediately terminate the current +/// process. The exit code is passed through to the underlying OS and will be +/// available for consumption by another process. +/// +/// Note that because this function never returns, and that it terminates the +/// process, no destructors on the current stack or any other thread's stack +/// will be run. If a clean shutdown is needed it is recommended to only call +/// this function at a known point where there are no more destructors left +/// to run. +pub fn exit(code: i32) -> !; +``` + +Implementation-wise this will correspond to the [`exit` function][unix] on unix +and the [`ExitProcess` function][win] on windows. + +[unix]: http://pubs.opengroup.org/onlinepubs/000095399/functions/exit.html +[win]: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682658%28v=vs.85%29.aspx + +This function is also not marked `unsafe`, despite the risk of leaking +allocated resources (e.g. destructors may not be run). It is already possible +to safely create memory leaks in Rust, however, (with `Rc` + `RefCell`), so +this is not considered a strong enough threshold to mark the function as +`unsafe`. + +# Drawbacks + +* This API does not solve all use cases of exiting with a nonzero exit status. + It is sometimes more convenient to simply return a code from the `main` + function instead of having to call a separate function in the standard + library. + +# Alternatives + +* One alternative would be to stabilize `set_exit_status` as-is today. The + semantics of the function would be clearly documented to prevent against + surprises, but it would arguably not prevent all surprises from arising. Some + reasons for not pursuing this route, however, have been outlined in the + motivation. + +* The `main` function of binary programs could be altered to require an + `i32` return value. This would greatly lessen the need to stabilize this + function as-is today as it would be possible to exit with a nonzero code by + returning a nonzero value from `main`. This is a backwards-incompatible + change, however. + +* The `main` function of binary programs could optionally be typed as `fn() -> + i32` instead of just `fn()`. This would be a backwards-compatible change, but + does somewhat add complexity. It may strike some as odd to be able to define + the `main` function with two different signatures in Rust. Additionally, it's + likely that the `exit` functionality proposed will be desired regardless of + whether the main function can return a code or not. + +# Unresolved questions + +* To what degree should the documentation imply that `rt::at_exit` handlers are + run? Implementation-wise their execution is guaranteed, but we may not wish + for this to always be so. diff --git a/text/1014-stdout-existential-crisis.md b/text/1014-stdout-existential-crisis.md new file mode 100644 index 00000000000..8649b6c3508 --- /dev/null +++ b/text/1014-stdout-existential-crisis.md @@ -0,0 +1,39 @@ +- Feature Name: `stdout_existential_crisis` +- Start Date: 2015-03-25 +- RFC PR: [rust-lang/rfcs#1014](https://github.com/rust-lang/rfcs/pull/1014) +- Rust Issue: [rust-lang/rust#25977](https://github.com/rust-lang/rust/issues/25977) + +# Summary + +When calling `println!` it currently causes a panic if `stdout` does not exist. Change this to ignore this specific error and simply void the output. 
# Motivation

On Linux `stdout` almost always exists, so when people write games and turn off the terminal there is still an `stdout` that they write to. Then, when getting the code to run on Windows with the console disabled, suddenly `stdout` doesn't exist and `println!` panics. This behavior difference is frustrating to developers trying to move to Windows.

There is also precedent with C and C++. On both Linux and Windows, if `stdout` is closed or doesn't exist, neither platform will error when attempting to print to the console.

# Detailed design

When using any of the convenience macros that write to either `stdout` or `stderr`, such as `println!`, `print!`, `panic!`, and `assert!`, change the implementation to ignore the specific error of `stdout` or `stderr` not existing. The behavior of all other errors will be unaffected. This can be implemented by redirecting `stdout` and `stderr` to `std::io::sink` if the original handles do not exist.

Update the methods `std::io::stdin`, `std::io::stdout`, and `std::io::stderr` as follows:
* If `stdout` or `stderr` does not exist, return the equivalent of `std::io::sink`.
* If `stdin` does not exist, return the equivalent of `std::io::empty`.
* For the raw versions, return a `Result`, and if the respective handle does not exist, return an `Err`.

# Drawbacks

* Hides an error from the user which we may want to expose, and may lead to people missing panics occurring in threads.
* Some languages, such as Ruby and Python, do throw an exception when stdout is missing.

# Alternatives

* Make `println!`, `print!`, `panic!`, and `assert!` return errors that the user has to handle. This would lose a large part of the convenience of these macros.
* Continue with the status quo and panic if `stdout` or `stderr` doesn't exist.
* For `std::io::stdin`, `std::io::stdout`, and `std::io::stderr`, make them return a `Result`. This would be a breaking change to the signature, so if this is desired it should be done immediately before 1.0.
  * Alternatively, make the objects returned by these methods error upon attempting to write to/read from them if their respective handle doesn't exist.

# Unresolved questions

* Which is better? Breaking the signatures of those three methods in `std::io`, making them silently redirect to `empty`/`sink`, or erroring upon attempting to write to/read from the handle?
diff --git a/text/1023-rebalancing-coherence.md b/text/1023-rebalancing-coherence.md
new file mode 100644
index 00000000000..fc3ff424fe2
--- /dev/null
+++ b/text/1023-rebalancing-coherence.md
@@ -0,0 +1,293 @@
- Feature Name: `fundamental_attribute`
- Start Date: 2015-03-27
- RFC PR: [rust-lang/rfcs#1023](https://github.com/rust-lang/rfcs/pull/1023)
- Rust Issue: [rust-lang/rust#23086](https://github.com/rust-lang/rust/issues/23086)

## Summary

This RFC proposes the following rule changes:

1. Modify the orphan rules so that impls of remote traits require a
   local type that is either a struct/enum/trait defined in the
   current crate `LT = LocalTypeConstructor<...>` or a reference to a
   local type `LT = ... | &LT | &mut LT`.
2. Restrict negative reasoning so it too obeys the orphan rules.
3. Introduce an unstable `#[fundamental]` attribute that can be used
   to extend the above rules in select cases (details below).

## Motivation

The current orphan rules are oriented around allowing as many remote
traits as possible.
As so often happens, giving power to one party (in
this case, downstream crates) turns out to be taking power away from
another (in this case, upstream crates). The problem is that, due to
coherence, the ability to define impls is a zero-sum game: every impl
that is legal to add in a child crate is also an impl that a parent
crate cannot add without fear of breaking downstream crates. A
detailed look at these problems is
[presented here](https://gist.github.com/nikomatsakis/bbe6821b9e79dd3eb477);
this RFC doesn't go over the problems in detail, but will reproduce
some of the examples found in that document.

This RFC proposes a shift that attempts to strike a balance between
the needs of downstream and upstream crates. In particular, we wish to
preserve the ability of upstream crates to add impls to traits that
they define, while still allowing downstream crates to define the
sorts of impls they need.

While exploring the problem, we found that in practice remote impls
are almost always tied to a local type or a reference to a local
type. For example, here are some impls from the definition of `Vec<T>`:

```rust
// tied to Vec<T>
impl<T> Send for Vec<T>
    where T: Send

// tied to &Vec<T>
impl<'a,T> IntoIterator for &'a Vec<T>
```

On this basis, we propose that we limit remote impls to require that
they include a type either defined in the current crate or a reference
to a type defined in the current crate. This is more restrictive than
the current definition, which merely requires a local type to appear
*somewhere*. So, for example, under this definition `MyType` and
`&MyType` would be considered local, but `Box<MyType>`,
`Option<MyType>`, and `(MyType, i32)` would not.

Furthermore, we limit the use of *negative reasoning* to obey the
orphan rules. That is, just as a crate cannot define an impl `Type:
Trait` unless `Type` or `Trait` is local, it cannot rely on `Type:
!Trait` holding unless `Type` or `Trait` is local.

Together, these two changes cause very little code breakage while
retaining a lot of freedom to add impls in a backwards compatible
fashion. However, they are not quite sufficient to compile all the
most popular cargo crates (though they almost succeed). Therefore, we
propose a simple, unstable attribute `#[fundamental]` (described
below) that can be used to extend the system to accommodate some
additional patterns and types. This attribute is unstable because it
is not clear whether it will prove to be adequate or need to be
generalized; this part of the design can be considered somewhat
incomplete, and we expect to finalize it based on what we observe
after the 1.0 release.

### Practical effect

#### Effect on parent crates

When you first define a trait, you must also decide whether that trait
should have (a) a blanket impl for all `T` and (b) any blanket impls
over references. These blanket impls cannot be added later without a
major version bump, for fear of breaking downstream clients.

Here are some examples of the kinds of blanket impls that must be added
right away:

```rust
impl<T> Bar for T { }
impl<'a,T:Bar> Bar for &'a T { }
```

#### Effect on child crates

Under the base rules, child crates are limited to impls that use local
types or references to local types. They are also prevented from
relying on the fact that `Type: !Trait` unless either `Type` or
`Trait` is local. This turns out to have very little impact.
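To make the base rules concrete, here is a hypothetical sketch (`Remote` stands in for a trait from an upstream crate; it is defined in a module below only so the snippet is self-contained and compiles):

```rust
// Pretend `upstream` is another crate that owns the `Remote` trait.
mod upstream {
    pub trait Remote {}
}

struct Local;

// Allowed: the local type `Local` appears directly.
impl upstream::Remote for Local {}

// Allowed: a reference to a local type still counts as local.
impl<'a> upstream::Remote for &'a Local {}

// Not allowed under this RFC (if `Remote` really were remote):
// `Option<Local>` is a remote type constructor applied to a local type,
// which no longer counts as local.
// impl upstream::Remote for Option<Local> {}

fn main() {
    let _ = Local;
}
```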
+ +In compiling the libstd facade and librustc, exactly two impls were +found to be illegal, both of which followed the same pattern: + +```rust +struct LinkedListEntry<'a> { + data: i32, + next: Option<&'a LinkedListEntry> +} + +impl<'a> Iterator for Option<&'a LinkedListEntry> { + type Item = i32; + + fn next(&mut self) -> Option { + if let Some(ptr) = *self { + *self = Some(ptr.next); + Some(ptr.data) + } else { + None + } + } +} +``` + +The problem here is that `Option<&LinkedListEntry>` is no longer +considered a local type. A similar restriction would be that one +cannot define an impl over `Box`; but this was not +observed in practice. + +Both of these restrictions can be overcome by using a new type. For +example, the code above could be changed so that instead of writing +the impl for `Option<&LinkedListEntry>`, we define a type `LinkedList` +that wraps the option and implement on that: + +```rust +struct LinkedListEntry<'a> { + data: i32, + next: LinkedList<'a> +} + +struct LinkedList<'a> { + data: Option<&'a LinkedListEntry> +} + +impl<'a> Iterator for LinkedList<'a> { + type Item = i32; + + fn next(&mut self) -> Option { + if let Some(ptr) = self.data { + *self = Some(ptr.next); + Some(ptr.data) + } else { + None + } + } +} +``` + +#### Errors from cargo and the fundamental attribute + +We also applied our prototype to all the "Most Downloaded" cargo +crates as well as the `iron` crate. That exercise uncovered a few +patterns that the simple rules presented thus far can't handle. + +The first is that it is common to implement traits over boxed trait +objects. For example, the `error` crate defines an impl: + +- `impl FromError for Box` + +Here, `Error` is a local trait defined in `error`, but `FromError` is +the trait from `libstd`. This impl would be illegal because +`Box` is not considered local as `Box` is not local. + +The second is that it is common to use `FnMut` in blanket impls, +similar to how the `Pattern` trait in `libstd` works. The `regex` crate +in particular has the following impls: + +- `impl<'t> Replacer for &'t str` +- `impl Replacer for F where F: FnMut(&Captures) -> String` +- these are in conflict because this requires that `&str: !FnMut`, and + neither `&str` nor `FnMut` are local to `regex` + +Given that overloading over closures is likely to be a common request, +and that the `Fn` traits are well-known, core traits tied to the call +operator, it seems reasonable to say that implementing a `Fn` trait is +itself a breaking change. (This is not to suggest that there is +something *fundamental* about the `Fn` traits that distinguish them +from all other traits; just that if the goal is to have rules that +users can easily remember, saying that implememting a core operator +trait is a breaking change may be a reasonable rule, and it enables +useful patterns to boot -- patterns that are baked into the libstd +APIs.) + +To accommodate these cases (and future cases we will no doubt +encounter), this RFC proposes an unstable attribute +`#[fundamental]`. `#[fundamental]` can be applied to types and traits +with the following meaning: + +- A `#[fundamental]` type `Foo` is one where implementing a blanket + impl over `Foo` is a breaking change. As described, `&` and `&mut` are + fundamental. This attribute would be applied to `Box`, making `Box` + behave the same as `&` and `&mut` with respect to coherence. +- A `#[fundamental]` trait `Foo` is one where adding an impl of `Foo` + for an existing type is a breaking change. 
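For concreteness, here is a minimal, self-contained version of that closure-overloading pattern (the trait and method names are invented; the real `regex` API differs). It compiles precisely because the compiler is allowed to assume that `&str` will never implement `FnMut(&str) -> String`:

```rust
trait Expand {
    fn expand(&mut self, captured: &str) -> String;
}

// Overload 1: a fixed replacement string.
impl<'t> Expand for &'t str {
    fn expand(&mut self, _captured: &str) -> String {
        self.to_string()
    }
}

// Overload 2: any closure that computes the replacement from the capture.
// This blanket impl only coexists with the one above because the `Fn`
// traits are treated as off-limits for future `&str` impls.
impl<F> Expand for F
where
    F: FnMut(&str) -> String,
{
    fn expand(&mut self, captured: &str) -> String {
        self(captured)
    }
}

fn main() {
    let mut fixed = "static";
    let mut dynamic = |captured: &str| format!("<{}>", captured);
    assert_eq!(fixed.expand("x"), "static");
    assert_eq!(dynamic.expand("x"), "<x>");
}
```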
For now, the `Fn` traits + and `Sized` would be marked fundamental, though we may want to + extend this set to all operators or some other + more-easily-remembered set. + +The `#[fundamental]` attribute is intended to be a kind of "minimal +commitment" that still permits the most important impl patterns we see +in the wild. Because it is unstable, it can only be used within libstd +for now. We are eventually committed to finding some way to +accommodate the patterns above -- which could be as simple as +stabilizing `#[fundamental]` (or, indeed, reverting this RFC +altogether). It could also be a more general mechanism that lets users +specify more precisely what kind of impls are reserved for future +expansion and which are not. + +## Detailed Design + +### Proposed orphan rules + +Given an impl `impl Trait for T0`, either `Trait` +must be local to the current crate, or: + +1. At least one type must meet the `LT` pattern defined above. Let + `Ti` be the first such type. +2. No type parameters `P1...Pn` may appear in the type parameters that + precede `Ti` (that is, `Tj` where `j < i`). + +### Type locality and negative reasoning + +Currently the overlap check employs negative reasoning to segregate +blanket impls from other impls. For example, the following pair of +impls would be legal only if `MyType: !Copy` for all `U` (the +notation `Type: !Trait` is borrowed from [RFC 586][586]): + +```rust +impl Clone for T {..} +impl Clone for MyType {..} +``` + +[586]: https://github.com/rust-lang/rfcs/pull/586 + +This proposal places limits on negative reasoning based on the orphan +rules. Specifically, we cannot conclude that a proposition like `T0: +!Trait` holds unless `T0: Trait` meets the orphan +rules as defined in the previous section. + +In practice this means that, by default, you can only assume negative +things about traits and types defined in your current crate, since +those are under your direct control. This permits parent crates to add +any impls except for blanket impls over `T`, `&T`, or `&mut T`, as +discussed before. + +### Effect on ABI compatibility and semver + +We have not yet proposed a comprehensive semver RFC (it's +coming). However, this RFC has some effect on what that RFC would say. +As discussed above, it is a breaking change for to add a blanket impl +for a `#[fundamental]` type. It is also a breaking change to add an +impl of a `#[fundamental]` trait to an existing type. + +# Drawbacks + +The primary drawback is that downstream crates cannot write an impl +over types other than references, such as `Option`. This +can be overcome by defining wrapper structs (new types), but that can +be annoying. + +# Alternatives + +- **Status quo.** In the status quo, the balance of power is heavily + tilted towards child crates. Parent crates basically cannot add any + impl for an existing trait to an existing type without potentially + breaking child crates. + +- **Take a hard line.** We could forego the `#[fundamental]` attribute, but + it would force people to forego `Box` impls as well as the + useful closure-overloading pattern. This seems + unfortunate. Moreover, it seems likely we will encounter further + examples of "reasonable cases" that `#[fundamental]` can easily + accommodate. + +- **Specializations, negative impls, and contracts.** The gist + referenced earlier includes [a section][c] covering various + alternatives that I explored which came up short. 
These include + specialization, explicit negative impls, and explicit contracts + between the trait definer and the trait consumer. + +# Unresolved questions + +None. + +[c]: https://gist.github.com/nikomatsakis/bbe6821b9e79dd3eb477#file-c-md diff --git a/text/1030-prelude-additions.md b/text/1030-prelude-additions.md new file mode 100644 index 00000000000..dc05c6933c5 --- /dev/null +++ b/text/1030-prelude-additions.md @@ -0,0 +1,56 @@ +- Feature Name: NA +- Start Date: 2015-04-03 +- RFC PR: [rust-lang/rfcs#1030](https://github.com/rust-lang/rfcs/pull/1030) +- Rust Issue: [rust-lang/rust#24538](https://github.com/rust-lang/rust/issues/24538) + +# Summary + +Add `Default`, `IntoIterator` and `ToOwned` trait to the prelude. + +# Motivation + +Each trait has a distinct motivation: + +* For `Default`, the ergonomics have vastly improved now that you can + write `MyType::default()` (thanks to UFCS). Thanks to this + improvement, it now makes more sense to promote widespread use of + the trait. + +* For `IntoIterator`, promoting to the prelude will make it feasible + to deprecate the inherent `into_iter` methods and directly-exported + iterator types, in favor of the trait (which is currently redundant). + +* For `ToOwned`, promoting to the prelude would add a uniform, + idiomatic way to acquire an owned copy of data (including going from + `str` to `String`, for which `Clone` does not work). + +# Detailed design + +* Add `Default`, `IntoIterator` and `ToOwned` trait to the prelude. + +* Deprecate inherent `into_iter` methods. + +* Ultimately deprecate module-level `IntoIter` types (e.g. in `vec`); + this may want to wait until you can write `Vec::IntoIter` rather + than ` as IntoIterator>::IntoIter`. + +# Drawbacks + +The main downside is that prelude entries eat up some amount of +namespace (particularly, method namespace). However, these are all +important, core traits in `std`, meaning that the method names are +already quite unlikely to be used. + +Strictly speaking, a prelude addition is a breaking change, but as +above, this is highly unlikely to cause actual breakage. In any case, +it can be landed prior to 1.0. + +# Alternatives + +None. + +# Unresolved questions + +The exact timeline of deprecation for `IntoIter` types. + +Are there other traits or types that should be promoted before 1.0? diff --git a/text/1040-duration-reform.md b/text/1040-duration-reform.md new file mode 100644 index 00000000000..a8fe0bbe7c7 --- /dev/null +++ b/text/1040-duration-reform.md @@ -0,0 +1,169 @@ +- Feature Name: duration +- Start Date: 2015-03-24 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1040 +- Rust Issue: https://github.com/rust-lang/rust/issues/24874 + +# Summary + +This RFC suggests stabilizing a reduced-scope `Duration` type that is appropriate for interoperating with various system calls that require timeouts. It does not stabilize a large number of conversion methods in `Duration` that have subtle caveats, with the intent of revisiting those conversions more holistically in the future. + +# Motivation + +There are a number of different notions of "time", each of which has a different set of caveats, and each of which can be designed for optimal ergonomics for its domain. This proposal focuses on one particular one: an amount of time in high-precision units. + +Eventually, there are a number of concepts of time that deserve fleshed out APIs. 
Using the terminology from the popular Java time library [JodaTime][joda-time]: + +* `Duration`: an amount of time, described in terms of a high + precision unit. +* `Period`: an amount of time described in human terms ("5 minutes, + 27 seconds"), and which can only be resolved into a `Duration` + relative to a moment in time. +* `Instant`: a moment in time represented in terms of a `Duration` + since some epoch. + +[joda-time]: http://www.joda.org/joda-time/ + +Human complications such as leap seconds, days in a month, and leap years, and machine complications such as NTP adjustments make these concepts and their full APIs more complicated than they would at first appear. This proposal focuses on fleshing out a design for `Duration` that is sufficient for use as a timeout, leaving the other concepts of time to a future proposal. + +--- + +For the most part, the system APIs that this type is used to communicate with either use `timespec` (`u64` seconds plus `u32` nanos) or take a timeout in milliseconds (`u32` on Windows). + +> For example, [`GetQueuedCompletionStatus`][iocp-ms-example], one of +> the primary APIs in the Windows IOCP API, takes a `dwMilliseconds` +> parameter as a [`DWORD`][msdn-dword], which is a `u32`. Some Windows +> APIs use "ticks" or 100-nanosecond units. + +[iocp-ms-example]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa364986%28v=vs.85%29.aspx +[msdn-dword]: https://msdn.microsoft.com/en-us/library/cc230318.aspx + +In light of that, this proposal has two primary goals: + +* to define a type that can describe portable timeouts for cross- + platform APIs +* to describe what should happen if a large `Duration` is passed into + an API that does not accept timeouts that large + +In general, this proposal considers it acceptable to reduce the granularity of timeouts (eliminating nanosecond granularity if only milliseconds are supported) and to truncate very large timeouts. + +This proposal retains the two fields in the existing `Duration`: + +* a `u64` of seconds +* a `u32` of additional nanosecond precision + +Timeout APIs defined in terms of milliseconds will truncate `Duration`s that are more than `u32::MAX` in milliseconds, and will reduce the granularity of the nanosecond field. + +> A `u32` of milliseconds supports a timeout longer than 45 days. + +Future APIs to support a broader set of [Durations][joda-duration] APIs, a [Period][joda-period] and [Instant][joda-instant] type, as well as coercions between these types, would be useful, compatible follow-ups to this RFC. + +[joda-duration]: http://www.joda.org/joda-time/key_duration.html +[joda-period]: http://www.joda.org/joda-time/key_period.html +[joda-instant]: http://www.joda.org/joda-time/key_instant.html + +# Detailed design + +A `Duration` represents a period of time represented in terms of nanosecond granularity. It has `u64` seconds and an additional `u32` nanoseconds. There is no concept of a negative `Duration`. + +> A negative `Duration` has no meaning for many APIs that may wish +> to take a `Duration`, which means that all such APIs would need +> to decide what to do when confronted with a negative `Duration`. +> As a result, this proposal focuses on the predominant use-cases for +> `Duration`, where unsigned types remove a number of caveats and +> ambiguities. + +```rust +pub struct Duration { + secs: u64, + nanos: u32 // may not be more than 1 billion +} + +impl Duration { + /// create a Duration from a number of seconds and an + /// additional nanosecond precision. 
If nanos is one + /// billion or greater, it carries into secs. + pub fn new(secs: u64, nanos: u32) -> Timeout; + + /// create a Duration from a number of seconds + pub fn from_secs(secs: u64) -> Timeout; + + /// create a Duration from a number of milliseconds + pub fn from_millis(millis: u64) -> Timeout; + + /// the number of seconds represented by the Duration + pub fn secs(self) -> u64; + + /// the number of additional nanosecond precision + pub fn nanos(self) -> u32; +} +``` + +When `Duration` is used with a system API that expects `u32` milliseconds, the `Duration`'s precision is coarsened to milliseconds, and, and the number is truncated to `u32::MAX`. + +In general, this RFC assumes that timeout APIs permit spurious updates (see, for example, [pthread_cond_timedwait][pthread_cond_timedwait], "Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur"). + +[pthread_cond_timedwait]: http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html + +`Duration` implements: + +* `Add`, `Sub`, `Mul`, `Div` which follow the overflow and underflow + rules for `u64` when applied to the `secs` field (in particular, + `Sub` will panic if the result would be negative). Nanoseconds + must be less than 1 billion and great than or equal to 0, and carry + into the `secs` field. +* `Display`, which prints a number of seconds, milliseconds and + nanoseconds (if more than 0). For example, a `Duration` would be + represented as `"15 seconds, 306 milliseconds, and 13 nanoseconds"` +* `Debug`, `Ord` (and `PartialOrd`), `Eq` (and `PartialEq`), `Copy` + and `Clone`, which are derived. + +This proposal does not, at this time, include mechanisms for instantiating a `Duration` from `weeks`, `days`, `hours` or `minutes`, because there are caveats to each of those units. In particular, the existence of leap seconds means that it is only possible to properly understand them relative to a particular starting point. + +The Joda-Time library in Java explains the problem well [in their documentation][joda-period-confusion]: + +[joda-period-confusion]: http://www.joda.org/joda-time/key_period.html + +> A duration in Joda-Time represents a duration of time measured in milliseconds. The duration is often obtained from an interval. Durations are a very simple concept, and the implementation is also simple. They have no chronology or time zone, **and consist solely of the millisecond duration.** + +> A period in Joda-Time represents a period of time defined in terms of fields, for example, 3 years 5 months 2 days and 7 hours. This differs from a duration in that it is inexact in terms of milliseconds. **A period can only be resolved to an exact number of milliseconds by specifying the instant (including chronology and time zone) it is relative to**. + +In short, this is saying that people expect "23:50:00 + 10 minutes" to equal "00:00:00", but it's impossible to know for sure whether that's true unless you know the exact starting point so you can take leap seconds into consideration. + +In order to address this confusion, Joda-Time's Duration has methods like `standardDays`/`toStandardDays` and `standardHours`/`toStandardHours`, which are meant to indicate to the user that the number of milliseconds is based on the standard number of milliseconds in an hour, rather than the colloquial notion of an "hour". 
+ +An approach like this could work for Rust, but this RFC is intentionally limited in scope to areas without substantial tradeoffs in an attempt to allow a minimal solution to progress more quickly. + +This proposal does not include a method to get a number of milliseconds from a `Duration`, because the number of milliseconds could exceed `u64`, and we would have to decide whether to return an `Option`, panic, or wait for a standard bignum. In the interest of limiting this proposal to APIs with a straight-forward design, this proposal defers such a method. + +# Drawbacks + +The main drawback to this proposal is that it is significantly more minimal than the existing `Duration` API. However, this API is quite sufficient for timeouts, and without the caveats in the existing `Duration` API. + +# Alternatives + +We could stabilize the existing `Duration` API. However, it has a number of serious caveats: + +* The caveats described above about some of the units it supports. +* It supports converting a `Duration` into a number of microseconds or + nanoseconds. Because that cannot be done reliably, those methods + return `Option`s, and APIs that need to convert `Duration` into + nanoseconds have to re-surface the `Option` (unergonomic) or panic. +* More generally, it has a fairly large API surface area, and almost + every method has some caveat that would need to be explored in order + to stabilize it. + +--- + +We could also include a number of convenience APIs that convert from other units into `Duration`s. This proposal assumes that some of those conveniences will eventually be added. However, the design of each of those conveniences is ambiguous, so they are not included in this initial proposal. + +--- + +Finally, we could avoid any API for timeouts, and simply take milliseconds throughout the standard library. However, this has two drawbacks. + +First, it does not allow us to represent higher-precision timeouts on systems that could support them. + +Second, while this proposal does not yet include conveniences, it assumes that some conveniences should be added in the future once the design space is more fully explored. Starting with a simple type gives us space to grow into. + +# Unresolved questions + +* Should we implement all of the listed traits? Others? diff --git a/text/1044-io-fs-2.1.md b/text/1044-io-fs-2.1.md new file mode 100644 index 00000000000..d7c49ea226c --- /dev/null +++ b/text/1044-io-fs-2.1.md @@ -0,0 +1,558 @@ +- Feature Name: `fs2` +- Start Date: 2015-04-04 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1044 +- Rust Issue: https://github.com/rust-lang/rust/issues/24796 + +# Summary + +Expand the scope of the `std::fs` module by enhancing existing functionality, +exposing lower-level representations, and adding a few new functions. + +# Motivation + +The current `std::fs` module serves many of the basic needs of interacting with +a filesystem, but is missing a lot of useful functionality. 
For example, none of
+these operations are possible in stable Rust today:
+
+* Inspecting a file's modification/access times
+* Reading low-level information like that contained in `libc::stat`
+* Inspecting the unix permission bits on a file
+* Blanket setting the unix permission bits on a file
+* Leveraging `DirEntry` for the extra metadata it might contain
+* Reading the metadata of a symlink (not what it points at)
+* Resolving all symlinks in a path
+
+There is some more functionality listed in the [RFC issue][issue], but this RFC
+will not attempt to solve the entirety of that issue at this time. This RFC
+strives to expose APIs for much of the functionality listed above that is on the
+track to becoming `#[stable]` soon.
+
+[issue]: https://github.com/rust-lang/rfcs/issues/939
+
+## Non-goals of this RFC
+
+There are a few areas of the `std::fs` API surface which are **not** considered
+goals for this RFC. It will be left for future RFCs to add new APIs for these
+areas:
+
+* Enhancing `copy` to copy directories recursively or configuring how copying
+  happens.
+* Enhancing or stabilizing `walk` and its functionality.
+* Temporary files or directories
+
+# Detailed design
+
+First, a vision for lowering APIs in general will be presented, and then a
+number of specific APIs will each be proposed. Many of the proposed APIs are
+independent from one another and this RFC may not be implemented all-in-one-go
+but instead piecemeal over time, allowing the designs to evolve slightly in the
+meantime.
+
+## Lowering APIs
+
+### The vision for the `os` module
+
+One of the principles of [IO reform][io-reform-vision] was to:
+
+> Provide hooks for integrating with low-level and/or platform-specific APIs.
+
+The original RFC went into some amount of detail for how this would look, in
+particular by use of the `os` module. Part of the goal of this RFC is to flesh
+out that vision in more detail.
+
+Ultimately, the organization of `os` is planned to look something like the
+following:
+
+```
+os
+  unix        applicable to all cfg(unix) platforms; high- and low-level APIs
+    io          extensions to std::io
+    fs          extensions to std::fs
+    net         extensions to std::net
+    env         extensions to std::env
+    process     extensions to std::process
+    ...
+  linux       applicable to linux only
+    io, fs, net, env, process, ...
+  macos       ...
+  windows     ...
+```
+
+APIs whose behavior is platform-specific are provided only within the `std::os`
+hierarchy, making it easy to audit for usage of such APIs. Organizing the
+platform modules internally in the same way as `std` makes it easy to find
+relevant extensions when working with `std`.
+
+It is emphatically *not* the goal of the `std::os::*` modules to provide
+bindings to *all* system APIs for each platform; this work is left to external
+crates. The goals are rather to:
+
+1. Facilitate interop between abstract types like `File` that `std` provides and
+   the underlying system. This is done via "lowering": extension traits like
+   [`AsRawFd`][AsRawFd] allow you to extract low-level, platform-specific
+   representations out of `std` types like `File` and `TcpStream`.
+
+2. Provide high-level but platform-specific APIs that feel like those in the
+   rest of `std`. Just as with the rest of `std`, the goal here is not to
+   include all possible functionality, but rather the most commonly-used or
+   fundamental.
+
+Lowering makes it possible for external crates to provide APIs that work
+"seamlessly" with `std` abstractions.
For example, a crate for Linux might +provide an `epoll` facility that can work directly with `std::fs::File` and +`std::net::TcpStream` values, completely hiding the internal use of file +descriptors. Eventually, such a crate could even be merged into `std::os::unix`, +with minimal disruption -- there is little distinction between `std` and other +crates in this regard. + +Concretely, lowering has two ingredients: + +1. Introducing one or more "raw" types that are generally direct aliases for C + types (more on this in the next section). + +2. Providing an extension trait that makes it possible to extract a raw type + from a `std` type. In some cases, it's possible to go the other way around as + well. The conversion can be by reference or by value, where the latter is + used mainly to avoid the destructor associated with a `std` type (e.g. to + extract a file descriptor from a `File` and eliminate the `File` object, + without closing the file). + +While we do not seek to exhaustively bind types or APIs from the underlying +system, it *is* a goal to provide lowering operations for every high-level type +to a system-level data type, whenever applicable. This RFC proposes several such +lowerings that are currently missing from `std::fs`. + +[io-reform-vision]: https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md#vision-for-io +[AsRawFd]: http://static.rust-lang.org/doc/master/std/os/unix/io/trait.AsRawFd.html + +#### `std::os::platform::raw` + +Each of the primitives in the standard library will expose the ability to be +lowered into its component abstraction, facilitating the need to define these +abstractions and organize them in the platform-specific modules. This RFC +proposes the following guidelines for doing so: + +* Each platform will have a `raw` module inside of `std::os` which houses all of + its platform specific definitions. +* Only type definitions will be contained in `raw` modules, no function + bindings, methods, or trait implementations. +* Cross-platform types (e.g. those shared on all `unix` platforms) will be + located in the respective cross-platform module. Types which only differ in + the width of an integer type are considered to be cross-platform. +* Platform-specific types will exist only in the `raw` module for that platform. + A platform-specific type may have different field names, components, or just + not exist on other platforms. + +Differences in integer widths are not considered to be enough of a platform +difference to define in each separate platform's module, meaning that it will be +possible to write code that uses `os::unix` but doesn't compile on all Unix +platforms. It is believed that most consumers of these types will continue to +store the same type (e.g. not assume it's an `i32`) throughout the application +or immediately cast it to a known type. + +To reiterate, it is not planned for each `raw` module to provide *exhaustive* +bindings to each platform. Only those abstractions which the standard library is +lowering into will be defined in each `raw` module. + +### Lowering `Metadata` (all platforms) + +Currently the `Metadata` structure exposes very few pieces of information about +a file. Some of this is because the information is not available across all +platforms, but some of it is also because the standard library does not have the +appropriate abstraction to return at this time (e.g. time stamps). The raw +contents of `Metadata` (a `stat` on Unix), however, should be accessible via +lowering no matter what. 
+
+The following trait hierarchy and new structures will be added to the standard
+library.
+
+```rust
+mod os::windows::fs {
+    pub trait MetadataExt {
+        fn file_attributes(&self) -> u32;  // `dwFileAttributes` field
+        fn creation_time(&self) -> u64;    // `ftCreationTime` field
+        fn last_access_time(&self) -> u64; // `ftLastAccessTime` field
+        fn last_write_time(&self) -> u64;  // `ftLastWriteTime` field
+        fn file_size(&self) -> u64;        // `nFileSizeHigh`/`nFileSizeLow` fields
+    }
+    impl MetadataExt for fs::Metadata { ... }
+}
+
+mod os::unix::fs {
+    pub trait MetadataExt {
+        fn as_raw(&self) -> &Metadata;
+    }
+    impl MetadataExt for fs::Metadata { ... }
+
+    pub struct Metadata(raw::stat);
+    impl Metadata {
+        // Accessors for fields available in `raw::stat` for *all* unix platforms
+        fn dev(&self) -> raw::dev_t;         // st_dev field
+        fn ino(&self) -> raw::ino_t;         // st_ino field
+        fn mode(&self) -> raw::mode_t;       // st_mode field
+        fn nlink(&self) -> raw::nlink_t;     // st_nlink field
+        fn uid(&self) -> raw::uid_t;         // st_uid field
+        fn gid(&self) -> raw::gid_t;         // st_gid field
+        fn rdev(&self) -> raw::dev_t;        // st_rdev field
+        fn size(&self) -> raw::off_t;        // st_size field
+        fn blksize(&self) -> raw::blksize_t; // st_blksize field
+        fn blocks(&self) -> raw::blkcnt_t;   // st_blocks field
+        fn atime(&self) -> (i64, i32);       // st_atime field, (sec, nsec)
+        fn mtime(&self) -> (i64, i32);       // st_mtime field, (sec, nsec)
+        fn ctime(&self) -> (i64, i32);       // st_ctime field, (sec, nsec)
+    }
+}
+
+// st_flags, st_gen, st_lspare, st_birthtim, st_qspare
+mod os::{linux, macos, freebsd, ...}::fs {
+    pub mod raw {
+        pub type dev_t = ...;
+        pub type ino_t = ...;
+        // ...
+        pub struct stat {
+            // ... same public fields as libc::stat
+        }
+    }
+    pub trait MetadataExt {
+        fn as_raw_stat(&self) -> &raw::stat;
+    }
+    impl MetadataExt for os::unix::fs::Metadata { ... }
+    impl MetadataExt for fs::Metadata { ... }
+}
+```
+
+The goal of this hierarchy is to expose all of the information in the OS-level
+metadata in as cross-platform a manner as possible while adhering to the
+design principles of the standard library.
+
+The interesting part about working in a "cross platform" manner here is that the
+makeup of `libc::stat` on unix platforms can vary quite a bit between platforms.
+For example some platforms have a `st_birthtim` field while others do not.
+To enable as much ergonomic usage as possible, the `os::unix` module will expose
+the *intersection* of metadata available in `libc::stat` across all unix
+platforms. The information is still exposed in a raw fashion (in terms of the
+values returned), but methods are required as the raw structure is not exposed.
+The unix platforms then leverage the more fine-grained modules in `std::os`
+(e.g. `linux` and `macos`) to return the raw `libc::stat` structure. This will
+allow full access to the information in `libc::stat` on all platforms with clear
+opt-in to using platform-specific information.
+
+One of the major goals of the `os::unix::fs` design is to enable as much
+functionality as possible when programming against "unix in general" while still
+allowing applications to choose to only program against macos, for example.
+
+#### Fate of `Metadata::{accessed, modified}`
+
+At this time there is no suitable type in the standard library to represent the
+return type of these two functions. The type would either have to be some form
+of time stamp or moment in time, both of which are difficult abstractions to add
+lightly.
+
+Consequently, both of these functions will be **deprecated** in favor of
+requiring platform-specific code to access the modification/access time of
+files. This information is all available via the `MetadataExt` traits listed
+above.
+
+Eventually, once a `std` type for cross-platform timestamps is available, these
+methods will be re-instated as returning that type.
+
+### Lowering and setting `Permissions` (Unix)
+
+> **Note**: this section only describes behavior on unix.
+
+Currently there is no stable method of inspecting the permission bits on a file,
+and it is unclear whether the current unstable methods of doing so,
+`PermissionsExt::mode`, should be stabilized. The main question around this
+piece of functionality is whether to provide a higher level abstraction (e.g.
+similar to the `bitflags` crate) for the permission bits on unix.
+
+This RFC proposes considering the methods for stabilization as-is and not
+pursuing a higher level abstraction of the unix permission bits. To facilitate
+their inspection and manipulation, however, the following constants will be
+added:
+
+```rust
+mod os::unix::fs {
+    pub const USER_READ: raw::mode_t;
+    pub const USER_WRITE: raw::mode_t;
+    pub const USER_EXECUTE: raw::mode_t;
+    pub const USER_RWX: raw::mode_t;
+    pub const OTHER_READ: raw::mode_t;
+    pub const OTHER_WRITE: raw::mode_t;
+    pub const OTHER_EXECUTE: raw::mode_t;
+    pub const OTHER_RWX: raw::mode_t;
+    pub const GROUP_READ: raw::mode_t;
+    pub const GROUP_WRITE: raw::mode_t;
+    pub const GROUP_EXECUTE: raw::mode_t;
+    pub const GROUP_RWX: raw::mode_t;
+    pub const ALL_READ: raw::mode_t;
+    pub const ALL_WRITE: raw::mode_t;
+    pub const ALL_EXECUTE: raw::mode_t;
+    pub const ALL_RWX: raw::mode_t;
+    pub const SETUID: raw::mode_t;
+    pub const SETGID: raw::mode_t;
+    pub const STICKY_BIT: raw::mode_t;
+}
+```
+
+Finally, the `set_permissions` function of the `std::fs` module is also proposed
+to be marked `#[stable]` soon as a method of blanket setting permissions for a
+file.
+
+## Constructing `Permissions`
+
+Currently there is no method to construct an instance of `Permissions` on any
+platform. This RFC proposes adding the following APIs:
+
+```rust
+mod os::unix::fs {
+    pub trait PermissionsExt {
+        fn from_mode(mode: raw::mode_t) -> Self;
+    }
+    impl PermissionsExt for Permissions { ... }
+}
+```
+
+This RFC does not propose yet adding a cross-platform way to construct a
+`Permissions` structure due to the radical differences between how unix and
+windows handle permissions.
+
+## Creating directories with permissions
+
+Currently the standard library does not expose an API which allows setting the
+permission bits on unix or security attributes on Windows. This RFC proposes
+adding the following API to `std::fs`:
+
+```rust
+pub struct DirBuilder { ... }
+
+impl DirBuilder {
+    /// Creates a new set of options with default mode/security settings for all
+    /// platforms and also non-recursive.
+    pub fn new() -> Self;
+
+    /// Indicate that directories should be created recursively, creating
+    /// all parent directories if they do not exist with the same security and
+    /// permissions settings.
+    pub fn recursive(&mut self, recursive: bool) -> &mut Self;
+
+    /// Create the specified directory with the options configured in this
+    /// builder.
+    pub fn create<P: AsRef<Path>>(&self, path: P) -> io::Result<()>;
+}
+
+mod os::unix::fs {
+    pub trait DirBuilderExt {
+        fn mode(&mut self, mode: raw::mode_t) -> &mut Self;
+    }
+    impl DirBuilderExt for DirBuilder { ... }
+}
+
+mod os::windows::fs {
+    // once a `SECURITY_ATTRIBUTES` abstraction exists, this will be added
+    pub trait DirBuilderExt {
+        fn security_attributes(&mut self, ...) -> &mut Self;
+    }
+    impl DirBuilderExt for DirBuilder { ... }
+}
+```
+
+This sort of builder is also extendable to other flavors of functions in the
+future, such as [C++'s template parameter][cpp-dir-template]:
+
+[cpp-dir-template]: http://en.cppreference.com/w/cpp/experimental/fs/create_directory
+
+```rust
+/// Use the specified directory as a "template" for permissions and security
+/// settings of the new directories to be created.
+///
+/// On unix this will issue a `stat` of the specified directory and new
+/// directories will be created with the same permission bits. On Windows
+/// this will trigger the use of the `CreateDirectoryEx` function.
+pub fn template<P: AsRef<Path>>(&mut self, path: P) -> &mut Self;
+```
+
+At this time, however, it is not proposed to add this method to
+`DirBuilder`.
+
+## Adding `FileType`
+
+Currently there is no enumeration or newtype representing a list of "file types"
+on the local filesystem. This is partly because the need is not so high
+right now. Some situations, however, imply that it is more efficient to learn
+the file type at once instead of testing for each individual file type itself.
+
+For example some platforms' `DirEntry` type can know the `FileType` without an
+extra syscall. If code were to test a `DirEntry` separately for whether it's a
+file or a directory, it may issue more syscalls than if it instead
+learned the type once and then tested whether it was a file or a directory.
+
+The full set of file types, however, is not always known nor portable across
+platforms, so this RFC proposes the following hierarchy:
+
+```rust
+#[derive(Copy, Clone, PartialEq, Eq, Hash)]
+pub struct FileType(..);
+
+impl FileType {
+    pub fn is_dir(&self) -> bool;
+    pub fn is_file(&self) -> bool;
+    pub fn is_symlink(&self) -> bool;
+}
+```
+
+Extension traits can be added in the future for testing for other more flavorful
+kinds of files on various platforms (such as unix sockets on unix platforms).
+
+#### Dealing with `is_{file,dir}` and `file_type` methods
+
+Currently the `fs::Metadata` structure exposes stable `is_file` and `is_dir`
+accessors. The struct will also grow a `file_type` accessor for this newtype
+struct being added. It is proposed that `Metadata` will retain the
+`is_{file,dir}` convenience methods, but no other "file type testers" will be
+added.
+
+## Enhancing symlink support
+
+Currently the `std::fs` module provides a `soft_link` and `read_link` function,
+but there is no method of doing other symlink related tasks such as:
+
+* Testing whether a file is a symlink
+* Reading the metadata of a symlink, not what it points to
+
+The following APIs will be added to `std::fs`:
+
+```rust
+/// Returns the metadata of the file pointed to by `p`, and this function,
+/// unlike `metadata`, will **not** follow symlinks.
+pub fn symlink_metadata<P: AsRef<Path>>(p: P) -> io::Result<Metadata>;
+```
+
+## Binding `realpath`
+
+There's a [long-standing issue][realpath] that the unix function `realpath` is
+not bound, and this RFC proposes adding the following API to the `fs` module:
+
+[realpath]: https://github.com/rust-lang/rust/issues/11857
+
+```rust
+/// Canonicalizes the given file name to an absolute path with all `..`, `.`,
+/// and symlink components resolved.
+///
+/// On unix this function corresponds to the return value of the `realpath`
+/// function, and on Windows this corresponds to the `GetFullPathName` function.
+///
+/// Note that relative paths given to this function will use the current working
+/// directory as a base, and the current working directory is not managed in a
+/// thread-local fashion, so this function may need to be synchronized with
+/// other calls to `env::change_dir`.
+pub fn canonicalize<P: AsRef<Path>>(p: P) -> io::Result<PathBuf>;
+```
+
+## Tweaking `PathExt`
+
+Currently the `PathExt` trait is unstable, yet it is quite convenient! The main
+motivation for its `#[unstable]` tag is that it is unclear how much
+functionality should be on `PathExt` versus the `std::fs` module itself.
+Currently a small subset of functionality is offered, but it is unclear what the
+guiding principle for the contents of this trait is.
+
+This RFC proposes a few guiding principles for this trait:
+
+* Only read-only operations in `std::fs` will be exposed on `PathExt`. All
+  operations which require modifications to the filesystem will require calling
+  methods through `std::fs` itself.
+
+* Some inspection methods on `Metadata` will be exposed on `PathExt`, but only
+  those where it logically makes sense for `Path` to be the `self` receiver. For
+  example `PathExt::len` will not exist (size of the file), but
+  `PathExt::is_dir` will exist.
+
+Concretely, the `PathExt` trait will be expanded to:
+
+```rust
+pub trait PathExt {
+    fn exists(&self) -> bool;
+    fn is_dir(&self) -> bool;
+    fn is_file(&self) -> bool;
+    fn metadata(&self) -> io::Result<Metadata>;
+    fn symlink_metadata(&self) -> io::Result<Metadata>;
+    fn canonicalize(&self) -> io::Result<PathBuf>;
+    fn read_link(&self) -> io::Result<PathBuf>;
+    fn read_dir(&self) -> io::Result<ReadDir>;
+}
+
+impl PathExt for Path { ... }
+```
+
+## Expanding `DirEntry`
+
+Currently the `DirEntry` API is quite minimalistic, exposing very few of the
+underlying attributes. Platforms like Windows actually contain an entire
+`Metadata` inside of a `DirEntry`, enabling much more efficient walking of
+directories in some situations.
+
+The following APIs will be added to `DirEntry`:
+
+```rust
+impl DirEntry {
+    /// This function will return the filesystem metadata for this directory
+    /// entry. This is equivalent to calling `fs::symlink_metadata` on the
+    /// path returned.
+    ///
+    /// On Windows this function will always return `Ok` and will not issue a
+    /// system call, but on unix this will always issue a call to `stat` to
+    /// return metadata.
+    pub fn metadata(&self) -> io::Result<Metadata>;
+
+    /// Return what file type this `DirEntry` contains.
+    ///
+    /// On some platforms this may not require reading the metadata of the
+    /// underlying file from the filesystem, but on other platforms it may be
+    /// required to do so.
+    pub fn file_type(&self) -> io::Result<FileType>;
+
+    /// Returns the file name for this directory entry.
+    pub fn file_name(&self) -> OsString;
+}
+
+mod os::unix::fs {
+    pub trait DirEntryExt {
+        fn ino(&self) -> raw::ino_t; // read the d_ino field
+    }
+    impl DirEntryExt for fs::DirEntry { ... }
+}
+```
+
+# Drawbacks
+
+* This is quite a bit of surface area being added to the `std::fs` API, and it
+  may perhaps be best to scale it back and add it in a more incremental fashion
+  instead of all at once. Most of it, however, is fairly straightforward, so it
+  seems prudent to schedule many of these features for the 1.1 release.
+
+* Exposing raw information such as `libc::stat` or `WIN32_FILE_ATTRIBUTE_DATA`
+  possibly can hamstring altering the implementation in the future. At this
+  point, however, it seems unlikely that the exposed pieces of information will
+  be changing much.
+
+# Alternatives
+
+* Instead of exposing accessor methods in `MetadataExt` on Windows, the raw
+  `WIN32_FILE_ATTRIBUTE_DATA` could be returned. We may change, however, to
+  using `BY_HANDLE_FILE_INFORMATION` one day which would make the return value
+  from this function more difficult to implement.
+
+* A `std::os::MetadataExt` trait could be added to access truly common
+  information such as modification/access times across all platforms. The return
+  value would likely be a `u64` "something" and would be clearly documented as
+  being a lossy abstraction and also only having a platform-specific meaning.
+
+* The `PathExt` trait could perhaps be implemented on `DirEntry`, but it doesn't
+  necessarily seem appropriate for all the methods and using inherent methods
+  also seems more logical.
+
+# Unresolved questions
+
+* What is the ultimate role of crates like `liblibc`, and how do we draw the
+  line between them and `std::os` definitions?
diff --git a/text/1047-socket-timeouts.md b/text/1047-socket-timeouts.md
new file mode 100644
index 00000000000..81d72964462
--- /dev/null
+++ b/text/1047-socket-timeouts.md
@@ -0,0 +1,161 @@
+- Feature Name: `socket_timeouts`
+- Start Date: 2015-04-08
+- RFC PR: [rust-lang/rfcs#1047](https://github.com/rust-lang/rfcs/pull/1047)
+- Rust Issue: [rust-lang/rust#25619](https://github.com/rust-lang/rust/issues/25619)
+
+# Summary
+
+Add sockopt-style timeouts to `std::net` types.
+
+# Motivation
+
+Currently, operations on various socket types in `std::net` block
+indefinitely (i.e., until the connection is closed or data is
+transferred). But there are many contexts in which timing out a
+blocking call is important.
+
+The [goal of the current IO system][io-reform] is to gradually expose
+cross-platform, blocking APIs for IO, especially APIs that directly
+correspond to the underlying system APIs. Sockets are widely available
+with nearly identical system APIs across the platforms Rust targets,
+and this includes support for timeouts via [sockopts][sockopt].
+
+So timeouts are well-motivated and well-suited to `std::net`.
+
+# Detailed design
+
+The proposal is to *directly expose* the timeout functionality
+provided by [`setsockopt`][sockopt], in much the same way we currently
+expose functionality like `set_nodelay`:
+
+```rust
+impl TcpStream {
+    pub fn set_read_timeout(&self, dur: Option<Duration>) -> io::Result<()> { ... }
+    pub fn read_timeout(&self) -> io::Result<Option<Duration>>;
+
+    pub fn set_write_timeout(&self, dur: Option<Duration>) -> io::Result<()> { ... }
+    pub fn write_timeout(&self) -> io::Result<Option<Duration>>;
+}
+
+impl UdpSocket {
+    pub fn set_read_timeout(&self, dur: Option<Duration>) -> io::Result<()> { ... }
+    pub fn read_timeout(&self) -> io::Result<Option<Duration>>;
+
+    pub fn set_write_timeout(&self, dur: Option<Duration>) -> io::Result<()> { ... }
+    pub fn write_timeout(&self) -> io::Result<Option<Duration>>;
+}
+```
+
+The setter methods take an amount of time in the form of a `Duration`,
+which is [undergoing stabilization][duration-reform]. They are
+implemented via straightforward calls to `setsockopt`. The `Option` is
+used to signify no timeout (for both setting and
+getting). Consequently, `Some(Duration::new(0, 0))` is a possible
+argument; the setter methods will return an IO error of kind
+`InvalidInput` in this case. (See Alternatives for other approaches.)
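+
+For illustration, a small usage sketch of the proposed setters and getters (the address and the five-second value are arbitrary, and `Duration` is assumed to live at its eventual `std::time` path):
+
+```rust
+use std::net::TcpStream;
+use std::time::Duration;
+
+fn main() -> std::io::Result<()> {
+    let stream = TcpStream::connect("127.0.0.1:8080")?;
+
+    // A five-second read timeout; passing `None` clears it again.
+    stream.set_read_timeout(Some(Duration::new(5, 0)))?;
+    assert_eq!(stream.read_timeout()?, Some(Duration::new(5, 0)));
+
+    // Per this proposal, a zero-length timeout is rejected as `InvalidInput`.
+    assert!(stream.set_read_timeout(Some(Duration::new(0, 0))).is_err());
+    Ok(())
+}
+```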
+
+The corresponding socket options are `SO_RCVTIMEO` and `SO_SNDTIMEO`.
+
+# Drawbacks
+
+One potential downside to this design is that the timeouts are set
+through direct mutation of the socket state, which can lead to
+composition problems. For example, a socket could be passed to another
+function which needs to use it with a timeout, but setting the timeout
+clobbers any previous values. This lack of composability leads to
+defensive programming in the form of "callee save" resets of timeouts,
+for example. An alternative design is given below.
+
+The advantage of binding the mutating APIs directly is that we keep a
+close correspondence between the `std::net` types and their underlying
+system types, and a close correspondence between Rust APIs and system
+APIs. It's not clear that this kind of composability is important
+enough in practice to justify a departure from the traditional API.
+
+# Alternatives
+
+## Taking `Duration` directly
+
+Using an `Option<Duration>` introduces a certain amount of complexity
+-- it raises the issue of `Some(Duration::new(0, 0))`, and it's
+slightly more verbose to set a timeout.
+
+An alternative would be to take a `Duration` directly, and interpret a
+zero length duration as "no timeout" (which is somewhat traditional in
+C APIs). That would make the API somewhat more familiar, but less
+Rustic, and it becomes somewhat easier to pass in a zero value by
+accident (without thinking about this possibility).
+
+Note that both styles of API require code that does arithmetic on
+durations to check for zero in advance.
+
+Aside from fitting Rust idioms better, the main proposal also gives a
+somewhat stronger indication of a bug when things go wrong (rather
+than simply failing to time out, for example).
+
+## Combining with nonblocking support
+
+Another possibility would be to provide a single method that can
+choose between blocking indefinitely, blocking with a timeout, and
+nonblocking mode:
+
+```rust
+enum BlockingMode {
+    Nonblocking,
+    Blocking,
+    Timeout(Duration)
+}
+```
+
+This `enum` makes clear that it doesn't make sense to have both a
+timeout and put the socket in nonblocking mode. On the other hand, it
+would relinquish the one-to-one correspondence between Rust
+configuration APIs and underlying socket options.
+
+## Wrapping for compositionality
+
+A different approach would be to *wrap* socket types with a "timeout
+modifier", which would be responsible for setting and resetting the
+timeouts:
+
+```rust
+struct WithTimeout<T> {
+    timeout: Duration,
+    inner: T
+}
+
+impl<T> WithTimeout<T> {
+    /// Returns the wrapped object, resetting the timeout
+    pub fn into_inner(self) -> T { ... }
+}
+
+impl TcpStream {
+    /// Wraps the stream with a timeout
+    pub fn with_timeout(self, timeout: Duration) -> WithTimeout<TcpStream> { ... }
+}
+
+impl<T: Read> Read for WithTimeout<T> { ... }
+impl<T: Write> Write for WithTimeout<T> { ... }
+```
+
+A [previous RFC][deadlines] spelled this out in more detail.
+
+Unfortunately, such a "wrapping" API has problems of its own. It
+creates unfortunate type incompatibilities, since you cannot store a
+timeout-wrapped socket where a "normal" socket is expected. It is
+difficult to be "polymorphic" over timeouts.
+
+Ultimately, it's not clear that the extra complexities of the type
+distinction here are worth the better theoretical composability.
+
+# Unresolved questions
+
+Should we consider a preliminary version of this RFC that introduces
+methods like `set_read_timeout_ms`, similar to `wait_timeout_ms` on
+`Condvar`?
These methods have been introduced elsewhere to provide a +stable way to use timeouts prior to `Duration` being stabilized. + +[io-reform]: https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md +[sockopt]: http://pubs.opengroup.org/onlinepubs/009695399/functions/setsockopt.html +[duration-reform]: https://github.com/rust-lang/rfcs/pull/1040 +[deadlines]: https://github.com/rust-lang/rfcs/pull/577/ diff --git a/text/1048-rename-soft-link-to-symlink.md b/text/1048-rename-soft-link-to-symlink.md new file mode 100644 index 00000000000..a05d54c4c00 --- /dev/null +++ b/text/1048-rename-soft-link-to-symlink.md @@ -0,0 +1,175 @@ +- Feature Name: `rename_soft_link_to_symlink` +- Start Date: 2015-04-09 +- RFC PR: [rust-lang/rfcs#1048](https://github.com/rust-lang/rfcs/pull/1048) +- Rust Issue: [rust-lang/rust#24222](https://github.com/rust-lang/rust/pull/24222) + +# Summary + +Deprecate `std::fs::soft_link` in favor of platform-specific versions: +`std::os::unix::fs::symlink`, `std::os::windows::fs::symlink_file`, and +`std::os::windows::fs::symlink_dir`. + +# Motivation + +Windows Vista introduced the ability to create symbolic links, in order to +[provide compatibility with applications ported from Unix](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365680%28v=vs.85%29.aspx): + +> Symbolic links are designed to aid in migration and application +> compatibility with UNIX operating systems. Microsoft has implemented its +> symbolic links to function just like UNIX links. + +However, symbolic links on Windows behave differently enough than symbolic +links on Unix family operating systems that you can't, in general, assume that +code that works on one will work on the other. On Unix family operating +systems, a symbolic link may refer to either a directory or a file, and which +one is determined when it is resolved to an actual file. On Windows, you must +specify at the time of creation whether a symbolic link refers to a file or +directory. + +In addition, an arbitrary process on Windows is not allowed to create a +symlink; you need to have [particular privileges][1] in order to be able to do +so; while on Unix, ordinary users can create symlinks, and any additional +security policy (such as [Grsecurity][2]) generally restricts +whether applications follow symlinks, not whether a user can create them. + +[1]: https://technet.microsoft.com/en-us/library/cc766301%28WS.10%29.aspx +[2]: https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Linking_restrictions + +Thus, there needs to be a way to distinguish between the two operations on +Windows, but that distinction is meaningless on Unix, and any code that deals +with symlinks on Windows will need to depend on having appropriate privilege +or have some way of obtaining appropriate privilege, which is all quite +platform specific. + +These two facts mean that it is unlikely that arbitrary code dealing with +symbolic links will be portable between Windows and Unix. Rather than trying +to support both under one API, it would be better to provide platform specific +APIs, making it much more clear upon inspection where portability issues may +arise. + +In addition, the current name `soft_link` is fairly non-standard. At some +point in the split up version of rust-lang/rfcs#517, `std::fs::symlink` was +renamed to `sym_link` and then to `soft_link`. + +The new name is somewhat surprising and can be difficult to find. 
After a +poll of a number of different platforms and languages, every one appears to +contain `symlink`, `symbolic_link`, or some camel case variant of those for +their equivalent API. Every piece of formal documentation found, for +both Windows and various Unix like platforms, used "symbolic link" exclusively +in prose. + +Here are the names I found for this functionality on various platforms, +libraries, and languages: + +* [POSIX/Single Unix Specification](http://pubs.opengroup.org/onlinepubs/009695399/functions/symlink.html): `symlink` +* [Windows](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365680%28v=vs.85%29.aspx): `CreateSymbolicLink` +* [Objective-C/Swift](https://developer.apple.com/library/ios/documentation/Cocoa/Reference/Foundation/Classes/NSFileManager_Class/index.html#//apple_ref/occ/instm/NSFileManager/createSymbolicLinkAtPath:withDestinationPath:error:): `createSymbolicLinkAtPath:withDestinationPath:error:` +* [Java](https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html): `createSymbolicLink` +* [C++ (Boost/draft standard)](http://en.cppreference.com/w/cpp/experimental/fs): `create_symlink` +* [Ruby](http://ruby-doc.org/core-2.2.0/File.html): `symlink` +* [Python](https://docs.python.org/2/library/os.html#os.symlink): `symlink` +* [Perl](http://perldoc.perl.org/functions/symlink.html): `symlink` +* [PHP](https://php.net/manual/en/function.symlink.php): `symlink` +* [Delphi](http://docwiki.embarcadero.com/Libraries/XE7/en/System.SysUtils.FileCreateSymLink): `FileCreateSymLink` +* PowerShell has no official version, but several community cmdlets ([one example](http://stackoverflow.com/questions/894430/powershell-hard-and-soft-links/894651#894651), [another example](https://gallery.technet.microsoft.com/scriptcenter/New-SymLink-60d2531e)) are named `New-SymLink` + +The term "soft link", probably as a contrast with "hard link", is found +frequently in informal descriptions, but almost always in the form of a +parenthetical of an alternate phrase, such as "a symbolic link (or soft +link)". I could not find it used in any formal documentation or APIs outside +of Rust. + +The name `soft_link` was chosen to be shorter than `symbolic_link`, but +without using Unix specific jargon like `symlink`, to not give undue weight to +one platform over the other. However, based on the evidence above it doesn't +have any precedent as a formal name for the concept or API. + +Furthermore, even on Windows, the name for the [reparse point tag used][3] to +represent symbolic links is `IO_REPARSE_TAG_SYMLINK`. + +[3]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365511%28v=vs.85%29.aspx + +If you do a Google search for "[windows symbolic link](https://www.google.com/search?q=windows+symbolic+link&ie=utf-8&oe=utf-8)" or "[windows soft link](https://www.google.com/search?q=windows+soft+link&ie=utf-8&oe=utf-8)", +many of the documents you find start using "symlink" after introducing the +concept, so it seems to be a fairly common abbreviation for the full name even +among Windows developers and users. + +# Detailed design + +Move `std::fs::soft_link` to `std::os::unix::fs::symlink`, and create +`std::os::windows::fs::symlink_file` and `std::os::windows::fs::symlink_dir` +that call `CreateSymbolicLink` with the appropriate arguments. 
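+
+For illustration, usage under the proposed split might look like the following sketch (the paths are arbitrary, and `symlink_file` is shown for the Windows side):
+
+```rust
+#[cfg(unix)]
+fn make_link() -> std::io::Result<()> {
+    use std::os::unix::fs::symlink;
+    // On Unix a single function covers both files and directories.
+    symlink("target.txt", "link.txt")
+}
+
+#[cfg(windows)]
+fn make_link() -> std::io::Result<()> {
+    use std::os::windows::fs::symlink_file;
+    // On Windows the kind of link must be chosen at creation time
+    // (`symlink_dir` would be used for a directory instead).
+    symlink_file("target.txt", "link.txt")
+}
+```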
+ +Keep a deprecated compatibility wrapper `std::fs::soft_link` which wraps +`std::os::unix::fs::symlink` or `std::os::windows::fs::symlink_file`, +depending on the platform (as that is the current behavior of +`std::fs::soft_link`, to create a file symbolic link). + +# Drawbacks + +This deprecates a stable API during the 1.0.0 beta, leaving an extra wrapper +around. + +# Alternatives + +* Have a cross platform `symlink` and `symlink_dir`, that do the same thing on + Unix but differ on Windows. This has the drawback of invisible + compatibility hazards; code that works on Unix using `symlink` may fail + silently on Windows, as creating the wrong type of symlink may succeed but + it may not be interpreted properly once a destination file of the other type + is created. +* Have a cross platform `symlink` that detects the type of the destination + on Windows. This is not always possible as it's valid to create dangling + symbolic links. +* Have `symlink`, `symlink_dir`, and `symlink_file` all cross-platform, where + the first dispatches based on the destination file type, and the latter two + panic if called with the wrong destination file type. Again, this is not + always possible as it's valid to create dangling symbolic links. +* Rather than having two separate functions on Windows, you could have a + separate parameter on Windows to specify the type of link to create; + `symlink("a", "b", FILE_SYMLINK)` vs `symlink("a", "b", DIR_SYMLINK)`. + However, having a `symlink` that had different arity on Unix and Windows + would likely be confusing, and since there are only the two possible + choices, simply having two functions seems like a much simpler solution. + +Other choices for the naming convention would be: + +* The status quo, `soft_link` +* The original proposal from rust-lang/rfcs#517, `sym_link` +* The full name, `symbolic_link` + +The first choice is non-obvious, for people coming from either Windows or +Unix. It is a classic compromise, that makes everyone unhappy. + +`sym_link` is slightly more consistent with the complementary `hard_link` +function, and treating "sym link" as two separate words has some precedent in +two of the Windows-targetted APIs, Delphi and some of the PowerShell cmdlets +observed. However, I have not found any other snake case API that uses that, +and only a couple of Windows-specific APIs that use it in camel case; most +usage prefers the single word "symlink" to the two word "sym link" as the +abbreviation. + +The full name `symbolic_link`, is a bit long and cumbersome compared to most +of the rest of the API, but is explicit and is the term used in prose to +describe the concept everywhere, so shouldn't emphasize any one platform over +the other. However, unlike all other operations for creating a file or +directory (`open`, `create`, `create_dir`, etc), it is a noun, not a verb. +When used as a verb, it would be called "symbolically link", but that sounds +quite odd in the context of an API: `symbolically_link("a", "b")`. "symlink", +on the other hand, can act as either a noun or a verb. + +It would be possible to prefix any of the forms above that read as a noun with +`create_`, such as `create_symlink`, `create_sym_link`, +`create_symbolic_link`. 
This adds further to the verbosity, though it is
+consistent with `create_dir`; you would probably need to also rename
+`hard_link` to `create_hard_link` for consistency, and this seems like a lot
+of churn and extra verbosity for not much benefit, as `symlink` and
+`hard_link` already act as verbs on their own. If you picked this, then the
+Windows versions would need to be named `create_file_symlink` and
+`create_dir_symlink` (or the variations with `sym_link` or `symbolic_link`).
+
+# Unresolved questions
+
+If we deprecate `soft_link` now, early in the beta cycle, would it be
+acceptable to remove it rather than deprecate it before 1.0.0, thus avoiding a
+permanently stable but deprecated API right out the gate?
diff --git a/text/1054-str-words.md b/text/1054-str-words.md
new file mode 100644
index 00000000000..abfc3efee8d
--- /dev/null
+++ b/text/1054-str-words.md
@@ -0,0 +1,67 @@
+- Feature Name: str-words
+- Start Date: 2015-04-10
+- RFC PR: [rust-lang/rfcs#1054](https://github.com/rust-lang/rfcs/pull/1054)
+- Rust Issue: [rust-lang/rust#24543](https://github.com/rust-lang/rust/issues/24543)
+
+# Summary
+
+Rename or replace `str::words` to side-step the ambiguity of “a word”.
+
+
+# Motivation
+
+The [`str::words`](http://doc.rust-lang.org/std/primitive.str.html#method.words) method
+is currently marked `#[unstable(reason = "the precise algorithm to use is unclear")]`.
+Indeed, the concept of “a word” is not easy to define in the presence of punctuation
+or languages with various conventions, including not using spaces at all to separate words.
+
+[Issue #15628](https://github.com/rust-lang/rust/issues/15628) suggests
+changing the algorithm to be based on [the *Word Boundaries* section of
+*Unicode Standard Annex #29: Unicode Text Segmentation*](http://www.unicode.org/reports/tr29/#Word_Boundaries).
+
+While a Rust implementation of UAX#29 would be useful, it belongs on crates.io more than in `std`:
+
+* It carries significant complexity that may be surprising from something that looks as simple
+  as a parameter-less “words” method in the standard library.
+  Users may not be aware of how subtle defining “a word” can be.
+* It is not a definitive answer. The standard itself notes:
+
+  > It is not possible to provide a uniform set of rules that resolves all issues across languages
+  > or that handles all ambiguous situations within a given language.
+  > The goal for the specification presented in this annex is to provide a workable default;
+  > tailored implementations can be more sophisticated.
+
+  and gives many examples of such ambiguous situations.
+
+Therefore, `std` would be better off avoiding the question of defining word boundaries entirely.
+
+
+# Detailed design
+
+Rename the `words` method to `split_whitespace`, and keep the current behavior unchanged.
+(That is, return an iterator equivalent to `s.split(char::is_whitespace).filter(|s| !s.is_empty())`.)
+
+Rename the return type `std::str::Words` to `std::str::SplitWhitespace`.
+
+Optionally, keep a `words` wrapper method for a while, both `#[deprecated]` and `#[unstable]`,
+with an error message that suggests `split_whitespace` or the chosen alternative.
+
+
+# Drawbacks
+
+`split_whitespace` is very similar to the existing `str::split(&self, P)` method,
+and having a separate method seems like weak API design. (But see below.)
+
+
+# Alternatives
+
+* Replace `str::words` with `struct Whitespace;` with a custom `Pattern` implementation,
+  which can be used in `str::split`.
+  However this requires the `Whitespace` symbol to be imported separately.
+* Remove `str::words` entirely and tell users to use
+  `s.split(char::is_whitespace).filter(|s| !s.is_empty())` instead.
+
+
+# Unresolved questions
+
+Is there a better alternative?
diff --git a/text/1057-io-error-sync.md b/text/1057-io-error-sync.md
new file mode 100644
index 00000000000..8e173b5c029
--- /dev/null
+++ b/text/1057-io-error-sync.md
@@ -0,0 +1,74 @@
+- Feature Name: `io_error_sync`
+- Start Date: 2015-04-11
+- RFC PR: [rust-lang/rfcs#1057](https://github.com/rust-lang/rfcs/pull/1057)
+- Rust Issue: [rust-lang/rust#24133](https://github.com/rust-lang/rust/pull/24133)
+
+# Summary
+
+Add the `Sync` bound to `io::Error` by requiring that any wrapped custom errors
+also conform to `Sync` in addition to `error::Error + Send`.
+
+# Motivation
+
+Adding the `Sync` bound to `io::Error` has 3 primary benefits:
+
+* Values that contain `io::Error`s will be able to be `Sync`
+* Perhaps more importantly, `io::Error` will be able to be stored in an `Arc`
+* By using the above, a cloneable wrapper can be created that shares an
+  `io::Error` using an `Arc` in order to simulate the old behavior of being able
+  to clone an `io::Error`.
+
+# Detailed design
+
+The only thing keeping `io::Error` from being `Sync` today is the wrapped custom
+error type `Box<error::Error + Send>`. Changing this to
+`Box<error::Error + Send + Sync>` and adding the `Sync` bound to `io::Error::new()`
+is sufficient to make `io::Error` be `Sync`. In addition, the relevant
+`convert::From` impls that convert to `Box<error::Error + Send>` will be updated
+to convert to `Box<error::Error + Send + Sync>` instead.
+
+# Drawbacks
+
+The only downside to this change is that it means any types that conform to
+`error::Error` and are `Send` but not `Sync` will no longer be able to be
+wrapped in an `io::Error`. It's unclear if there are any types in the standard
+library that will be impacted by this. Looking through the [list of
+implementors][impls] for `error::Error`, here are all of the types that may be
+affected:
+
+* `io::IntoInnerError`: This type is only `Sync` if the underlying buffered
+  writer instance is `Sync`. I can't be sure, but I don't believe we have any
+  writers that are `Send` but not `Sync`. In addition, this type has a `From`
+  impl that converts it to `io::Error` even if the writer is not `Send`.
+* `sync::mpsc::SendError`: This type is only `Sync` if the wrapped value `T` is
+  `Sync`. This is of course also true for `Send`. I'm not sure if anyone is
+  relying on the ability to wrap a `SendError` in an `io::Error`.
+* `sync::mpsc::TrySendError`: Same situation as `SendError`.
+* `sync::PoisonError`: This type is already not compatible with `io::Error`
+  because it wraps mutex guards (such as `sync::MutexGuard`) which are not
+  `Send`.
+* `sync::TryLockError`: Same situation as `PoisonError`.
+
+So the only real question is about `sync::mpsc::SendError`. If anyone is relying
+on the ability to convert that into an `io::Error`, a `From` impl could be
+added that returns an `io::Error` that is indistinguishable from a wrapped
+`SendError`.
+
+[impls]: http://doc.rust-lang.org/nightly/std/error/trait.Error.html
+
+# Alternatives
+
+Don't do this. Not adding the `Sync` bound to `io::Error` means `io::Error`s
+cannot be stored in an `Arc` and types that contain an `io::Error` cannot be
+`Sync`.
+
+We should also consider whether we should go a step further and change
+`io::Error` to use `Arc` instead of `Box` internally. This would let us restore
+the `Clone` impl for `io::Error`.
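+
+As a concrete illustration of the cloneable wrapper mentioned in the motivation (the `SharedIoError` name and exact shape are hypothetical, not part of this proposal), the `Sync` bound is what makes the following sketch possible:
+
+```rust
+use std::fmt;
+use std::io;
+use std::sync::Arc;
+
+/// A cheaply cloneable handle to a single underlying `io::Error`.
+#[derive(Clone, Debug)]
+pub struct SharedIoError(Arc<io::Error>);
+
+impl From<io::Error> for SharedIoError {
+    fn from(err: io::Error) -> SharedIoError {
+        SharedIoError(Arc::new(err))
+    }
+}
+
+impl fmt::Display for SharedIoError {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "{}", self.0)
+    }
+}
+
+// `Arc<io::Error>` is only usable across threads because `io::Error` itself
+// is `Send + Sync` under this proposal.
+fn assert_send_sync<T: Send + Sync>() {}
+
+fn main() {
+    assert_send_sync::<SharedIoError>();
+}
+```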
+
+# Unresolved questions
+
+Should we add the `From` impl for `SendError`? There is no code in the rust
+project that relies on `SendError` being converted to `io::Error`, and I'm
+inclined to think it's unlikely for anyone to be relying on that, but I don't
+know if there are any third-party crates that will be affected.
diff --git a/text/1058-slice-tail-redesign.md b/text/1058-slice-tail-redesign.md
new file mode 100644
index 00000000000..194073f4391
--- /dev/null
+++ b/text/1058-slice-tail-redesign.md
@@ -0,0 +1,97 @@
+- Feature Name: `slice_tail_redesign`
+- Start Date: 2015-04-11
+- RFC PR: [rust-lang/rfcs#1058](https://github.com/rust-lang/rfcs/pull/1058)
+- Rust Issue: [rust-lang/rust#26906](https://github.com/rust-lang/rust/issues/26906)
+
+# Summary
+
+Replace `slice.tail()`, `slice.init()` with new methods `slice.split_first()`,
+`slice.split_last()`.
+
+# Motivation
+
+The `slice.tail()` and `slice.init()` methods are relics from an older version
+of the slice APIs that included a `head()` method. `slice` no longer has
+`head()`, instead it has `first()` which returns an `Option`, and `last()` also
+returns an `Option`. While it's generally accepted that indexing / slicing
+should panic on out-of-bounds access, `tail()`/`init()` are the only
+remaining methods that panic without taking an explicit index.
+
+A conservative change here would be to simply change `tail()`/`init()` to return
+`Option`, but I believe we can do better. These operations are actually
+specializations of `split_at()` and should be replaced with methods that return
+`Option<(&T,&[T])>`. This makes the common operation of processing the
+first/last element and the remainder of the list more ergonomic, with very low
+impact on code that only wants the remainder (such code only has to add `.1` to
+the expression). This has an even more significant effect on code that uses the
+mutable variants.
+
+# Detailed design
+
+The methods `tail()`, `init()`, `tail_mut()`, and `init_mut()` will be removed,
+and new methods will be added:
+
+```rust
+fn split_first(&self) -> Option<(&T, &[T])>;
+fn split_last(&self) -> Option<(&T, &[T])>;
+fn split_first_mut(&mut self) -> Option<(&mut T, &mut [T])>;
+fn split_last_mut(&mut self) -> Option<(&mut T, &mut [T])>;
+```
+
+Existing code using `tail()` or `init()` could be translated as follows:
+
+* `slice.tail()` becomes `&slice[1..]`
+* `slice.init()` becomes `&slice[..slice.len()-1]` or
+  `slice.split_last().unwrap().1`
+
+It is expected that a lot of code using `tail()` or `init()` is already either
+testing `len()` explicitly or using `first()` / `last()` and could be refactored
+to use `split_first()` / `split_last()` in a more ergonomic fashion. As an
+example, the following code from typeck:
+
+```rust
+if variant.fields.len() > 0 {
+    for field in variant.fields.init() {
+```
+
+can be rewritten as:
+
+```rust
+if let Some((_, init_fields)) = variant.fields.split_last() {
+    for field in init_fields {
+```
+
+And the following code from compiletest:
+
+```rust
+let argv0 = args[0].clone();
+let args_ = args.tail();
+```
+
+can be rewritten as:
+
+```rust
+let (argv0, args_) = args.split_first().unwrap();
+```
+
+(the `clone()` ended up being unnecessary).
+
+# Drawbacks
+
+The expression `slice.split_last().unwrap().1` is more cumbersome than
+`slice.init()`. However, this is primarily due to the need for `.unwrap()`
+rather than the need for `.1`, and would affect the more conservative solution
+(of making the return type `Option<&[T]>`) as well.
Furthermore, the more
+idiomatic translation is `&slice[..slice.len()-1]`, which can be used any time
+the slice is already stored in a local variable.
+
+# Alternatives
+
+Only change the return type to `Option` without adding the tuple. This is the
+more conservative change mentioned above. It still has the same drawback of
+requiring `.unwrap()` when translating existing code. And it's unclear what the
+function names should be (the current names are considered suboptimal).
+
+Just deprecate the current methods without adding replacements. This gets rid of
+the odd methods today, but it doesn't do anything to make it easier to safely
+perform these operations.
diff --git a/text/1066-safe-mem-forget.md b/text/1066-safe-mem-forget.md
new file mode 100644
index 00000000000..aefcbb20197
--- /dev/null
+++ b/text/1066-safe-mem-forget.md
@@ -0,0 +1,124 @@
+- Feature Name: N/A
+- Start Date: 2015-04-15
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1066
+- Rust Issue: https://github.com/rust-lang/rust/issues/25186
+
+# Summary
+
+Alter the signature of the `std::mem::forget` function to remove `unsafe`.
+Explicitly state that it is not considered unsafe behavior to not run
+destructors.
+
+# Motivation
+
+It was [recently discovered][scoped-bug] by @arielb1 that the `thread::scoped`
+API was unsound. To recap, this API previously allowed spawning a child thread
+sharing the parent's stack, returning an RAII guard which `join`'d the child
+thread when it fell out of scope. The join-on-drop behavior here is critical to
+the safety of the API to ensure that the parent does not pop the stack frames
+the child is referencing. Put another way, the safety of `thread::scoped` relied
+on the fact that the `Drop` implementation for `JoinGuard` was *always* run.
+
+[scoped-bug]: https://github.com/rust-lang/rust/issues/24292
+
+The [underlying issue][forget-bug] for this safety hole was that it is possible
+to write a version of `mem::forget` without using `unsafe` code (which drops a
+value without running its destructor). This is done by creating a cycle of `Rc`
+pointers, leaking the actual contents. It [has been pointed out][dtor-comment]
+that `Rc` is not the only vector of leaking contents today as there are
+[known][dtor-bug1] [bugs][dtor-bug2] where `panic!` may fail to run
+destructors. Furthermore, it has [also been pointed out][drain-bug] that not
+running destructors can affect the safety of APIs like `Vec::drain_range` in
+addition to `thread::scoped`.
+
+[forget-bug]: https://github.com/rust-lang/rust/issues/24456
+[dtor-comment]: https://github.com/rust-lang/rust/issues/24292#issuecomment-93505374
+[dtor-bug1]: https://github.com/rust-lang/rust/issues/14875
+[dtor-bug2]: https://github.com/rust-lang/rust/issues/16135
+[drain-bug]: https://github.com/rust-lang/rust/issues/24292#issuecomment-93513451
+
+It has never been a guarantee of Rust that destructors for a type will run, and
+this aspect was overlooked with the `thread::scoped` API which requires that its
+destructor be run! Reconciling these two desires has led to a good deal of
+discussion of possible mitigation strategies for various aspects of this
+problem. The strategy proposed in this RFC aims to fit uninvasively into the
+standard library to avoid large overhauls or destabilizations of APIs.
+
+# Detailed design
+
+Primarily, the `unsafe` annotation on the `mem::forget` function will be
+removed, allowing it to be called from safe Rust.
This transition will be made
+possible by stating that destructors **may not run** in all circumstances (from
+both the language and library level). The standard library and the primitives it
+provides will always attempt to run destructors, but will not provide a
+guarantee that destructors will be run.
+
+It is still likely to be a footgun to call `mem::forget` as memory leaks are
+almost always undesirable, but the purpose of the `unsafe` keyword in Rust is to
+indicate **memory unsafety** instead of being a general deterrent for "should be
+avoided" APIs. Given the premise that types must be written assuming that their
+destructor may not run, it is the fault of the type in question if `mem::forget`
+would trigger memory unsafety, hence allowing `mem::forget` to be a safe
+function.
+
+Note that this modification to `mem::forget` is a breaking change due to the
+signature of the function being altered, but it is expected that most code will
+not break in practice and this would be an acceptable change to cherry-pick into
+the 1.0 release.
+
+# Drawbacks
+
+It is clearly a very nice feature of Rust to be able to rely on the fact that a
+destructor for a type is always run (e.g. the `thread::scoped` API). Admitting
+that destructors may not be run can lead to difficult API decisions later on and
+even accidental unsafety. This route, however, is the least invasive for the
+standard library and does not require radically changing types like `Rc` or
+fast-tracking bug fixes to panicking destructors.
+
+# Alternatives
+
+The main alternative to this proposal is to provide the guarantee that a destructor
+for a type is always run and that it is memory unsafe to not do so. This would
+require a number of pieces to work together:
+
+* Panicking destructors not running other locals' destructors would [need to be
+  fixed][dtor-bug1]
+* Panics in the elements of containers would [need to be fixed][dtor-bug2] to
+  continue running other elements' destructors.
+* The `Rc` and `Arc` types would need to be reevaluated somehow. One option would
+  be to statically prevent cycles, and another option would be to disallow types
+  that are unsafe to leak from being placed in `Rc` and `Arc` (more details
+  below).
+* An audit would need to be performed to ensure that there are no other known
+  locations of leaks for types. There are likely more locations than
+  those listed here which would need to be addressed, and it's also likely that
+  there would continue to be locations where destructors were not run.
+
+There has been quite a bit of discussion specifically on the topic of `Rc` and
+`Arc` as they may be tricky cases to fix. Specifically, the compiler could
+perform some form of analysis to forbid *all* cycles or just those that
+would cause memory unsafety. Unfortunately, forbidding all cycles is likely to
+be too limiting for `Rc` to be useful. Forbidding only "bad" cycles, however, is
+a more plausible option.
+
+Another alternative, as proposed by @arielb1, would be [a `Leak` marker
+trait][leak] to indicate that a type is "safe to leak". Types like `Rc` would
+require that their contents are `Leak`, and the `JoinGuard` type would opt-out
+of it. This marker trait could work similarly to `Send` where all types are
+considered leakable by default, but types could opt-out of `Leak`. This
+approach, however, requires `Rc` and `Arc` to have a `Leak` bound on their type
+parameter which can often leak unfortunately into many generic contexts (e.g.
+trait objects).
Another option would be to treat `Leak` more similarly to
+`Sized` where all type parameters have a `Leak` bound by default. This change
+may also cause confusion, however, by being unnecessarily restrictive (e.g. all
+collections may want to take `T: ?Leak`).
+
+[leak]: https://github.com/rust-lang/rust/issues/24292#issuecomment-91646130
+
+Overall the changes necessary for this strategy are more invasive than admitting
+destructors may not run, so this alternative is not proposed in this RFC.
+
+# Unresolved questions
+
+Are there remaining APIs in the standard library which rely on destructors being
+run for memory safety?
diff --git a/text/1068-rust-governance.md b/text/1068-rust-governance.md
new file mode 100644
index 00000000000..16237eb791e
--- /dev/null
+++ b/text/1068-rust-governance.md
@@ -0,0 +1,728 @@
+- Feature Name: not applicable
+- Start Date: 2015-02-27
+- RFC PR: [rust-lang/rfcs#1068](https://github.com/rust-lang/rfcs/pull/1068)
+- Rust Issue: N/A
+
+# Summary
+
+This RFC proposes to expand, and make more explicit, Rust's governance
+structure. It seeks to supplement today's core team with several
+*subteams* that are more narrowly focused on specific areas of
+interest.
+
+*Thanks to Nick Cameron, Manish Goregaokar, Yehuda Katz, Niko Matsakis and Dave
+ Herman for many suggestions and discussions along the way.*
+
+# Motivation
+
+Rust's governance has evolved over time, perhaps most dramatically
+with the introduction of the RFC system -- which has itself been
+tweaked many times. RFCs have been a major boon for improving design
+quality and fostering deep, productive discussion. It's something we
+all take pride in.
+
+That said, as Rust has matured, a few growing pains have emerged.
+
+We'll start with a brief review of today's governance and process,
+then discuss what needs to be improved.
+
+## Background: today's governance structure
+
+Rust is governed by a
+[core team](https://github.com/rust-lang/rust-wiki-backup/blob/master/Note-core-team.md),
+which is ultimately responsible for all decision-making in the
+project. Specifically, the core team:
+
+* Sets the overall direction and vision for the project;
+* Sets the priorities and release schedule;
+* Makes final decisions on RFCs.
+
+The core team currently has 8 members, including some people working
+full-time on Rust, some volunteers, and some production users.
+
+Most technical decisions are made through the
+[RFC process](https://github.com/rust-lang/rfcs#what-the-process-is).
+RFCs are submitted for essentially all changes to the language,
+most changes to the standard library, and
+[a few other topics](https://github.com/rust-lang/rfcs#when-you-need-to-follow-this-process).
+RFCs are either closed immediately (if they are clearly not viable),
+or else assigned a *shepherd* who is responsible for keeping the
+discussion moving and ensuring all concerns are responded to.
+
+The final decision to accept or reject an RFC is made by the core
+team. In many cases this decision follows after many rounds of
+consensus-building among all stakeholders for the RFC. In the end,
+though, most decisions are about weighting various tradeoffs, and the
+job of the core team is to make the final decision about such
+weightings in light of the overall direction of the language.
+
+## What needs improvement
+
+At a high level, we need to improve:
+
+* Process scalability.
+* Stakeholder involvement.
+* Clarity/transparency.
+* Moderation processes.
+ +Below, each of these bullets is expanded into a more detailed analysis +of the problems. These are the problems this RFC is trying to +solve. The "Detailed Design" section then gives the actual proposal. + +### Scalability: RFC process + +In some ways, the RFC process is a victim of its own success: as the +volume and depth of RFCs has increased, it's harder for the entire +core team to stay educated and involved in every RFC. The +[shepherding process](https://github.com/rust-lang/rfcs#the-role-of-the-shepherd) +has helped make sure that RFCs don't fall through the cracks, but even +there it's been hard for the relatively small number of shepherds to +keep up (on top of the other work that they do). + +Part of the problem, of course, is due to the current push toward 1.0, +which has both increased RFC volume and takes up a great deal of +attention from the core team. But after 1.0 is released, the community +is likely to grow significantly, and feature requests will only +increase. + +Growing the core team over time has helped, but there's a practical +limit to the number of people who are jointly making decisions and +setting direction. + +A distinct problem in the other direction has also emerged recently: we've +slowly been requiring RFCs for increasingly minor changes. While it's important +that user-facing changes and commitments be vetted, the process has started to +feel heavyweight (especially for newcomers), so a recalibration may be in order. + +We need a way to scale up the RFC process that: + +* Ensures each RFC is thoroughly reviewed by several people with + interest and expertise in the area, but with different perspectives + and concerns. + +* Ensures each RFC continues moving through the pipeline at a + reasonable pace. + +* Ensures that accepted RFCs are well-aligned with the values, goals, + and direction of the project, and with other RFCs (past, present, + and future). + +* Ensures that simple, uncontentious changes can be made quickly, without undue + process burden. + +### Scalability: areas of focus + +In addition, there are increasingly areas of important work that are +only loosely connected with decisions in the core language or APIs: +tooling, documentation, infrastructure, for example. These areas all +need leadership, but it's not clear that they require the same degree +of global coordination that more "core" areas do. + +These areas are only going to increase in number and importance, so we +should remove obstacles holding them back. + +### Stakeholder involvement + +RFC shepherds are intended to reach out to "stakeholders" in an RFC, +to solicit their feedback. But that is different from the stakeholders +having a direct role in decision making. + +To the extent practical, we should include a diverse range of +perspectives in both design and decision-making, and especially +include people who are most directly affected by decisions: users. + +We have taken some steps in this direction by diversifying the core +team itself, but (1) members of the core team by definition need to +take a balanced, global view of things and (2) the core team should +not grow too large. So some other way of including more stakeholders +in decisions would be preferable. + +### Clarity and transparency + +Despite many steps toward increasing the clarity and openness of +Rust's processes, there is still room for improvement: + +* The priorities and values set by the core team are not always + clearly communicated today. 
This in turn can make the RFC process + seem opaque, since RFCs move along at different speeds (or are even + closed as postponed) according to these priorities. + + At a large scale, there should be more systematic communication + about high-level priorities. It should be clear whether a given RFC + topic would be considered in the near term, long term, or + never. Recent blog posts about the 1.0 release and stabilization + have made a big step in this direction. After 1.0, as part of the + regular release process, we'll want to find some regular cadence for + setting and communicating priorities. + + At a smaller scale, it is still the case that RFCs fall through the + cracks or have unclear statuses (see Scalability problems + above). Clearer, public tracking of the RFC pipeline would be a + significant improvement. + +* The decision-making process can still be opaque: it's not always + clear to an RFC author exactly when and how a decision on the RFC + will be made, and how best to work with the team for a favorable + decision. We strive to make core team meetings as *uninteresting* as + possible (that is, all interesting debate should happen in public + online communication), but there is still room for being more + explicit and public. + +### Community norms and the Code of Conduct + +Rust's design process and community norms are closely intertwined. The +RFC process is a joint exploration of design space and tradeoffs, and +requires consensus-building. The process -- and the Rust community -- +is at its best when all participants recognize that + +> ... people have differences of opinion and that every design or +> implementation choice carries a trade-off and numerous costs. There +> is seldom a right answer. + +This and other important values and norms are recorded in the +[project code of conduct (CoC)](http://www.rust-lang.org/conduct.html), +which also includes language about harassment and marginalized groups. + +Rust's community has long upheld a high standard of conduct, and has +earned a reputation for doing so. + +However, as the community grows, as people come and go, we must +continually work to maintain this standard. Usually, it suffices to +lead by example, or to gently explain the kind of mutual respect that +Rust's community practices. Sometimes, though, that's not enough, and +explicit moderation is needed. + +One problem that has emerged with the CoC is the lack of clarity about +the mechanics of moderation: + +* Who is responsible for moderation? +* What about conflicts of interest? Are decision-makers also moderators? +* How are moderation decisions reached? When are they unilateral? +* When does moderation begin, and how quickly should it occur? +* Does moderation take into account past history? +* What venues does moderation apply to? + +Answering these questions, and generally clarifying how the CoC is viewed and +enforced, is an important step toward scaling up the Rust community. + +# Detailed design + +The basic idea is to supplement the core team with several "subteams". Each +subteam is focused on a specific area, e.g., language design or libraries. Most +of the RFC review process will take place within the relevant subteam, scaling +up our ability to make decisions while involving a larger group of people in +that process. + +To ensure global coordination and a strong, coherent vision for the project as a +whole, **each subteam is led by a member of the core team**. 
+ +## Subteams + +**The primary roles of each subteam are**: + +* Shepherding RFCs for the subteam area. As always, that means (1) ensuring that + stakeholders are aware of the RFC, (2) working to tease out various design + tradeoffs and alternatives, and (3) helping build consensus. + +* Accepting or rejecting RFCs in the subteam area. + +* Setting policy on what changes in the subteam area require RFCs, and reviewing + direct PRs for changes that do not require an RFC. + +* Delegating *reviewer rights* for the subteam area. The ability to `r+` is not + limited to team members, and in fact earning `r+` rights is a good stepping + stone toward team membership. Each team should set reviewing policy, manage + reviewing rights, and ensure that reviews take place in a timely manner. + (Thanks to Nick Cameron for this suggestion.) + +Subteams make it possible to involve a larger, more diverse group in the +decision-making process. In particular, **they should involve a mix of**: + +* Rust project leadership, in the form of at least one core team member (the + leader of the subteam). + +* Area experts: people who have a lot of interest and expertise in the subteam + area, but who may be far less engaged with other areas of the project. + +* Stakeholders: people who are strongly affected by decisions in the + subteam area, but who may not be experts in the design or + implementation of that area. *It is crucial that some people heavily + using Rust for applications/libraries have a seat at the table, to + make sure we are actually addressing real-world needs.* + +Members should have demonstrated a good sense for design and dealing with +tradeoffs, an ability to work within a framework of consensus, and of course +sufficient knowledge about or experience with the subteam area. Leaders should +in addition have demonstrated exceptional communication, design, and people +skills. They must be able to work with a diverse group of people and help lead +it toward consensus and execution. + +Each subteam is led by a member of the core team. **The leader is responsible for**: + +* Setting up the subteam: + + * Deciding on the initial membership of the subteam (in consultation with + the core team). Once the subteam is up and running. + + * Working with subteam members to determine and publish subteam policies and + mechanics, including the way that subteam members join or leave the team + (which should be based on subteam consensus). + +* Communicating core team vision downward to the subteam. + +* Alerting the core team to subteam RFCs that need global, cross-cutting + attention, and to RFCs that have entered the "final comment period" (see below). + +* Ensuring that RFCs and PRs are progressing at a reasonable rate, re-assigning + shepherds/reviewers as needed. + +* Making final decisions in cases of contentious RFCs that are unable to reach + consensus otherwise (should be rare). + +The way that subteams communicate internally and externally is left to each +subteam to decide, but: + +* Technical discussion should take place as much as possible on public forums, + ideally on RFC/PR threads and tagged discuss posts. + +* Each subteam will have a dedicated + [discuss forum](http://internals.rust-lang.org/) tag. + +* Subteams should actively seek out discussion and input from stakeholders who + are not members of the team. + +* Subteams should have some kind of regular meeting or other way of making + decisions. 
The content of this meeting should be summarized with the rationale + for each decision -- and, as explained below, decisions should generally be + about weighting a set of already-known tradeoffs, not discussing or + discovering new rationale. + +* Subteams should regularly publish the status of RFCs, PRs, and other news + related to their area. Ideally, this would be done in part via a dashboard + like [the Homu queue](http://buildbot.rust-lang.org/homu/queue/rust) + +## Core team + +**The core team serves as leadership for the Rust project as a whole**. In + particular, it: + +* **Sets the overall direction and vision for the project.** That means setting + the core values that are used when making decisions about technical + tradeoffs. It means steering the project toward specific use cases where Rust + can have a major impact. It means leading the discussion, and writing RFCs + for, *major* initiatives in the project. + +* **Sets the priorities and release schedule.** Design bandwidth is limited, and + it's dangerous to try to grow the language too quickly; the core team makes + some difficult decisions about which areas to prioritize for new design, based + on the core values and target use cases. + +* **Focuses on broad, cross-cutting concerns.** The core team is specifically + designed to take a *global* view of the project, to make sure the pieces are + fitting together in a coherent way. + +* **Spins up or shuts down subteams.** Over time, we may want to expand the set + of subteams, and it may make sense to have temporary "strike teams" that focus + on a particular, limited task. + +* **Decides whether/when to ungate a feature.** While the subteams make + decisions on RFCs, the core team is responsible for pulling the trigger that + moves a feature from nightly to stable. This provides an extra check that + features have adequately addressed cross-cutting concerns, that the + implementation quality is high enough, and that language/library commitments + are reasonable. + +The core team should include both the subteam leaders, and, over time, a diverse +set of other stakeholders that are both actively involved in the Rust community, +and can speak to the needs of major Rust constituencies, to ensure that the +project is addressing real-world needs. + +## Decision-making + +### Consensus + +Rust has long used a form of [consensus decision-making][consensus]. In a +nutshell the premise is that a successful outcome is not where one side of a +debate has "won", but rather where concerns from *all* sides have been addressed +in some way. **This emphatically does not entail design by committee, nor +compromised design**. Rather, it's a recognition that + +> ... every design or implementation choice carries a trade-off and numerous +> costs. There is seldom a right answer. + +Breakthrough designs sometimes end up changing the playing field by eliminating +tradeoffs altogether, but more often difficult decisions have to be made. **The +key is to have a clear vision and set of values and priorities**, which is the +core team's responsibility to set and communicate, and the subteam's +responsibility to act upon. + +Whenever possible, we seek to reach consensus through discussion and design +revision. Concretely, the steps are: + +* Initial RFC proposed, with initial analysis of tradeoffs. +* Comments reveal additional drawbacks, problems, or tradeoffs. +* RFC revised to address comments, often by improving the design. 
+* Repeat above until "major objections" are fully addressed, or it's clear that + there is a fundamental choice to be made. + +Consensus is reached when most people are left with only "minor" objections, +i.e., while they might choose the tradeoffs slightly differently they do not +feel a strong need to *actively block* the RFC from progressing. + +One important question is: consensus among which people, exactly? Of course, the +broader the consensus, the better. But at the very least, **consensus within the +members of the subteam should be the norm for most decisions.** If the core team +has done its job of communicating the values and priorities, it should be +possible to fit the debate about the RFC into that framework and reach a fairly +clear outcome. + +[consensus]: http://en.wikipedia.org/wiki/Consensus_decision-making + +### Lack of consensus + +In some cases, though, consensus cannot be reached. These cases tend to split +into two very different camps: + +* "Trivial" reasons, e.g., there is not widespread agreement about naming, but + there is consensus about the substance. + +* "Deep" reasons, e.g., the design fundamentally improves one set of concerns at + the expense of another, and people on both sides feel strongly about it. + +In either case, an alternative form of decision-making is needed. + +* For the "trivial" case, usually either the RFC shepherd or subteam leader will + make an executive decision. + +* For the "deep" case, the subteam leader is empowered to make a final decision, + but should consult with the rest of the core team before doing so. + +### How and when RFC decisions are made, and the "final comment period" + +Each RFC has a shepherd drawn from the relevant subteam. The shepherd is +responsible for driving the consensus process -- working with both the RFC +author and the broader community to dig out problems, alternatives, and improved +design, always working to reach broader consensus. + +At some point, the RFC comments will reach a kind of "steady state", where no +new tradeoffs are being discovered, and either objections have been addressed, +or it's clear that the design has fundamental downsides that need to be weighed. + +At that point, the shepherd will announce that the RFC is in a "final comment +period" (which lasts for one week). This is a kind of "last call" for strong +objections to the RFC. **The announcement of the final comment period for an RFC +should be very visible**; it should be included in the subteam's periodic +communications. + +> Note that the final comment period is in part intended to help keep RFCs +> moving. Historically, RFCs sometimes stall out at a point where discussion has +> died down but a decision isn't needed urgently. In this proposed model, the +> RFC author could ask the shepherd to move to the final comment period (and +> hence toward a decision). + +After the final comment period, the subteam can make a decision on the RFC. The +role of the subteam at that point is *not* to reveal any new technical issues or +arguments; if these come up during discussion, they should be added as comments +to the RFC, and it should undergo another final comment period. + +Instead, the subteam decision is based on **weighing the already-revealed +tradeoffs against the project's priorities and values** (which the core team is +responsible for setting, globally). In the end, these decisions are about how to +weight tradeoffs. 
The decision should be communicated in these terms, pointing +out the tradeoffs that were raised and explaining how they were weighted, and +**never introducing new arguments**. + +## Keeping things lightweight + +In addition to the "final comment period" proposed above, this RFC proposes some +further adjustments to the RFC process to keep it lightweight. + +A key observation is that, thanks to the stability system and nightly/stable +distinction, **it's easy to experiment with features without commitment**. + +### Clarifying what needs an RFC + +Over time, we've been drifting toward requiring an RFC for essentially any +user-facing change, which sometimes means that very minor changes get stuck +awaiting an RFC decision. While subteams + final comment period should help keep +the pipeline flowing a bit better, it would also be good to allow "minor" +changes to go through without an RFC, provided there is sufficient review in +some other way. (And in the end, the core team ungates features, which ensures +at least a final review.) + +This RFC does not attempt to answer the question "What needs an RFC", because +that question will vary for each subteam. However, this RFC stipulates that each +subteam should set an explicit policy about: + +1. What requires an RFC for the subteam's area, and +2. What the non-RFC review process is. + +These guidelines should try to keep the process lightweight for minor changes. + +### Clarifying the "finality" of RFCs + +While RFCs are very important, they do not represent the final state of a +design. Often new issues or improvements arise during implementation, or after +gaining some experience with a feature. **The nightly/stable distinction exists +in part to allow for such design iteration.** + +Thus RFCs do not need to be "perfect" before acceptance. If consensus is reached +on major points, the minor details can be left to implementation and revision. + +Later, if an implementation differs from the RFC in *substantial* ways, the +subteam should be alerted, and may ask for an explicit amendment RFC. Otherwise, +the changes should just be explained in the commit/PR. + +## The teams + +With all of that out of the way, what subteams should we start with? This RFC +proposes the following initial set: + +* Language design +* Libraries +* Compiler +* Tooling and infrastructure +* Moderation + +In the long run, we will likely also want teams for documentation and for +community events, but these can be spun up once there is a more clear need (and +available resources). + +### Language design team + +Focuses on the *design* of language-level features; not all team members need to +have extensive implementation experience. + +Some example RFCs that fall into this area: + +* [Associated types and multidispatch](https://github.com/rust-lang/rfcs/pull/195) +* [DST coercions](https://github.com/rust-lang/rfcs/pull/982) +* [Trait-based exception handling](https://github.com/rust-lang/rfcs/pull/243) +* [Rebalancing coherence](https://github.com/rust-lang/rfcs/pull/1023) +* [Integer overflow](https://github.com/rust-lang/rfcs/pull/560) (this has high + overlap with the library subteam) +* [Sound generic drop](https://github.com/rust-lang/rfcs/pull/769) + +### Library team + +Oversees both `std` and, ultimately, other crates in the `rust-lang` github +organization. The focus up to this point has been the standard library, but we +will want "official" libraries that aren't quite `std` territory but are still +vital for Rust. 
(The precise plan here, as well as the long-term plan for `std`, +is one of the first important areas of debate for the subteam.) Also includes +API conventions. + +Some example RFCs that fall into this area: + +* [Collections reform](https://github.com/rust-lang/rfcs/pull/235) +* [IO reform](https://github.com/rust-lang/rfcs/pull/517/) +* [Debug improvements](https://github.com/rust-lang/rfcs/pull/640) +* [Simplifying std::hash](https://github.com/rust-lang/rfcs/pull/823) +* [Conventions for ownership variants](https://github.com/rust-lang/rfcs/pull/199) + +### Compiler team + +Focuses on compiler internals, including implementation of language +features. This broad category includes work in codegen, factoring of compiler +data structures, type inference, borrowck, and so on. + +There is a more limited set of example RFCs for this subteam, in part because we +haven't generally required RFCs for this kind of internals work, but here are two: + +* [Non-zeroing dynamic drops](https://github.com/rust-lang/rfcs/pull/320) (this + has high overlap with language design) +* [Incremental compilation](https://github.com/rust-lang/rfcs/pull/594) + +### Tooling and infrastructure team + +Even more broad is the "tooling" subteam, which at inception is planned to +encompass every "official" (rust-lang managed) non-`rustc` tool: + +* rustdoc +* rustfmt +* Cargo +* crates.io +* CI infrastructure +* Debugging tools +* Profiling tools +* Editor/IDE integration +* Refactoring tools + +It's not presently clear exactly what tools will end up under this umbrella, nor +which should be prioritized. + +### Moderation team + +Finally, the moderation team is responsible for dealing with CoC violations. + +One key difference from the other subteams is that the moderation team does not +have a leader. Its members are chosen directly by the core team, and should be +community members who have demonstrated the highest standard of discourse and +maturity. To limit conflicts of interest, **the moderation subteam should not +include any core team members**. However, the subteam is free to consult with +the core team as it deems appropriate. + +The moderation team will have a public email address that can be used to raise +complaints about CoC violations (forwards to all active moderators). + +#### Initial plan for moderation + +What follows is an initial proposal for the mechanics of moderation. The +moderation subteam may choose to revise this proposal by drafting an RFC, which +will be approved by the core team. + +Moderation begins whenever a moderator becomes aware of a CoC problem, either +through a complaint or by observing it directly. In general, the enforcement +steps are as follows: + +> **These steps are adapted from text written by Manish Goregaokar, who helped +articulate them from experience as a Stack Exchange moderator.** + +* Except for extreme cases (see below), try first to address the problem with a + light public comment on thread, aimed to de-escalate the situation. These + comments should strive for as much empathy as possible. Moderators should + emphasize that dissenting opinions are valued, and strive to ensure that the + technical points are heard even as they work to cool things down. + + When a discussion has just gotten a bit heated, the comment can just be a + reminder to be respectful and that there is rarely a clear "right" answer. In + cases that are more clearly over the line into personal attacks, it can + directly call out a problematic comment. 
+ +* If the problem persists on thread, or if a particular person repeatedly comes + close to or steps over the line of a CoC violation, moderators then email the + offender privately. The message should include relevant portions of the CoC + together with the offending comments. Again, the goal is to de-escalate, and + the email should be written in a dispassionate and empathetic way. However, + the message should also make clear that continued violations may result in a + ban. + +* If problems still persist, the moderators can ban the offender. Banning should + occur for progressively longer periods, for example starting at 1 day, then 1 + week, then permanent. The moderation subteam will determine the precise + guidelines here. + +In general, moderators can and should unilaterally take the first step, but +steps beyond that (particularly banning) should be done via consensus with the +other moderators. Permanent bans require core team approval. + +Some situations call for more immediate, drastic measures: deeply inappropriate +comments, harassment, or comments that make people feel unsafe. (See the +[code of conduct](http://www.rust-lang.org/conduct.html) for some more details +about this kind of comment). In these cases, an individual moderator is free to +take immediate, unilateral steps including redacting or removing comments, or +instituting a short-term ban until the subteam can convene to deal with the +situation. + +The moderation team is responsible for interpreting the CoC. Drastic measures +like bans should only be used in cases of clear, repeated violations. + +Moderators themselves are held to a very high standard of behavior, and should +strive for professional and impersonal interactions when dealing with a CoC +violation. They should always push to *de-escalate*. And they should recuse +themselves from moderation in threads where they are actively participating in +the technical debate or otherwise have a conflict of interest. Moderators who +fail to keep up this standard, or who abuse the moderation process, may be +removed by the core team. + +Subteam, and especially core team members are *also* held to a high standard of +behavior. Part of the reason to separate the moderation subteam is to ensure +that CoC violations by Rust's leadership be addressed through the same +independent body of moderators. + +Moderation covers all rust-lang venues, which currently include github +repos, IRC channels (#rust, #rust-internals, #rustc, #rust-libs), and +the two discourse forums. (The subreddit already has its own +moderation structure, and isn't directly associated with the rust-lang +organization.) + +# Drawbacks + +One possibility is that decentralized decisions may lead to a lack of coherence +in the overall design of Rust. However, the existence of the core team -- and +the fact that subteam leaders will thus remain in close communication on +cross-cutting concerns in particular -- serves to greatly mitigate that risk. + +As with any change to governance, there is risk that this RFC would harm +processes that are working well. In particular, bringing on a large number of +new people into official decision-making roles carries a risk of culture clash +or problems with consensus-building. + +By setting up this change as a relatively slow build-out from the current core +team, some of this risk is mitigated: it's not a radical restructuring, but +rather a refinement of the current process. 
In particular, today core team +members routinely seek input directly from other community members who would be +likely subteam members; in some ways, this RFC just makes that process more +official. + +For the moderation subteam, there is a significant shift toward strong +enforcement of the CoC, and with that a risk of *over*-application: the goal is +to make discourse safe and productive, not to introduce fear of violating the +CoC. The moderation guidelines, careful selection of moderators, and ability to +withdraw moderators mitigate this risk. + +# Alternatives + +There are numerous other forms of open-source governance out there, far more +than we can list or detail here. And in any case, this RFC is intended as an +expansion of Rust's existing governance to address a few scaling problems, +rather than a complete rethink. + +[Mozilla's module system][module], was a partial inspiration for this RFC. The +proposal here can be seen as an evolution of the module system where the subteam +leaders (module owners) are integrated into an explicit core team, providing for +tighter intercommunication and a more unified sense of vision and purpose. +Alternatively, the proposal is an evolution of the current core team structure +to include subteams. + +One seemingly minor, but actually important aspect is *naming*: + +* The name "subteam" (from [jQuery][jq]) felt like a better fit than "module" both +to avoid confusion (having two different kinds of modules associated with +Mozilla seems problematic) and because it emphasizes the more unified nature of +this setup. + +* The term "leader" was chosen to reflect that there is a vision for each subteam +(as part of the larger vision for Rust), which the leader is responsible for +moving the subteam toward. Notably, this is how "module owner" is actually +defined in Mozilla's module system: + + > A "module owner" is the person to whom leadership of a module's work has been + > delegated. + +* The term "team member" is just following standard parlance. It could be +replaced by something like "peer" (following the module system tradition), or +some other term that is less bland than "member". Ideally, the term would +highlight the significant stature of team membership: being part of the +decision-making group for a substantial area of the Rust project. + +[module]: https://wiki.mozilla.org/Modules +[jq]: https://jquery.org/team/ +[mom]: https://wiki.mozilla.org/Modules/Activities#Module_Ownership_System + +# Unresolved questions + +## Subteams + +This RFC purposefully leaves several subteam-level questions open: + +* What is the exact venue and cadence for subteam decision-making? +* Do subteams have dedicated IRC channels or other forums? (This RFC stipulates + only dedicated discourse tags.) +* How large is each subteam? +* What are the policies for when RFCs are required, or when PRs may be reviewed + directly? + +These questions are left to be address by subteams after their formation, in +part because good answers will likely require some iterations to discover. + +## Broader questions + +There are many other questions that this RFC doesn't seek to address, and this +is largely intentional. For one, it avoids trying to set out too much structure +in advance, making it easier to iterate on the mechanics of subteams. In +addition, there is a danger of *too much* policy and process, especially given +that this RFC is aimed to improve the scalability of decision-making. 
It should
+be clear that this RFC is not the last word on governance, and over time we will
+probably want to grow more explicit policies in other areas -- but a
+lightweight, iterative approach seems the best way to get there.
diff --git a/text/1096-remove-static-assert.md b/text/1096-remove-static-assert.md
new file mode 100644
index 00000000000..60cf8e81157
--- /dev/null
+++ b/text/1096-remove-static-assert.md
@@ -0,0 +1,73 @@
+- Feature Name: remove-static-assert
+- Start Date: 2015-04-28
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1096
+- Rust Issue: https://github.com/rust-lang/rust/pull/24910
+
+# Summary
+
+Remove the `static_assert` feature.
+
+# Motivation
+
+To recap, `static_assert` looks like this:
+
+```rust
+#![feature(static_assert)]
+
+#[static_assert]
+static assertion: bool = true;
+```
+
+If `assertion` is `false` instead, this fails to compile:
+
+```text
+error: static assertion failed
+static assertion: bool = false;
+       ^~~~~
+```
+
+If you don’t have the `feature` flag, you get another interesting error:
+
+```text
+error: `#[static_assert]` is an experimental feature, and has a poor API
+```
+
+Throughout its life, `static_assert` has been... weird. Graydon suggested it
+[in May of 2013][suggest], and it was
+[implemented](https://github.com/rust-lang/rust/pull/6670) shortly after.
+[Another issue][issue] was created to give it a ‘better interface’. Here’s why:
+
+> The biggest problem with it is you need a static variable with a name, that
+> goes through trans and ends up in the object file.
+
+In other words, `assertion` above ends up as a symbol in the final output. Not
+something you’d usually expect from some kind of static assertion.
+
+[suggest]: https://github.com/rust-lang/rust/issues/6568
+[issue]: https://github.com/rust-lang/rust/issues/6676
+
+So why not improve `static_assert`? With compile time function evaluation, the
+idea of a ‘static assertion’ doesn’t need to have language semantics. Either
+`const` functions or full-blown CTFE is a useful feature in its own right that
+we’ve said we want in Rust. In light of it being eventually added,
+`static_assert` doesn’t make sense any more.
+
+`static_assert` isn’t used by the compiler at all.
+
+# Detailed design
+
+Remove `static_assert`. [Implementation submitted here][here].
+
+[here]: https://github.com/rust-lang/rust/pull/24910
+
+# Drawbacks
+
+Why should we *not* do this?
+
+# Alternatives
+
+This feature is pretty binary: we either remove it, or we don’t. We could keep
+the feature, but build out some sort of alternate version that’s not as weird.
+
+# Unresolved questions
+
+None with the design, only “should we do this?”
diff --git a/text/1102-rename-connect-to-join.md b/text/1102-rename-connect-to-join.md
new file mode 100644
index 00000000000..35bae6a7d5f
--- /dev/null
+++ b/text/1102-rename-connect-to-join.md
@@ -0,0 +1,77 @@
+- Feature Name: `rename_connect_to_join`
+- Start Date: 2015-05-02
+- RFC PR: [rust-lang/rfcs#1102](https://github.com/rust-lang/rfcs/pull/1102)
+- Rust Issue: [rust-lang/rust#26900](https://github.com/rust-lang/rust/issues/26900)
+
+# Summary
+
+Rename `.connect()` to `.join()` in `SliceConcatExt`.
+
+# Motivation
+
+Rust has a string concatenation method named `.connect()` in `SliceConcatExt`.
+However, this does not align with the precedents in other languages. Most
+languages use `.join()` for that purpose, as shown below.
+
+This is probably because, in ancient Rust, `join` was a keyword used to join a
+task.
However, `join` retired as a keyword in 2011 with the commit +rust-lang/rust@d1857d3. While `.connect()` is technically correct, the name may +not be directly inferred by the users of the mainstream languages. There was [a +question] about this on reddit. + +[a question]: http://www.reddit.com/r/rust/comments/336rj3/whats_the_best_way_to_join_strings_with_a_space/ + +The languages that use the name of `join` are: + +- Python: [str.join](https://docs.python.org/3/library/stdtypes.html#str.join) +- Ruby: [Array.join](http://ruby-doc.org/core-2.2.0/Array.html#method-i-join) +- JavaScript: [Array.prototype.join](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/join) +- Go: [strings.Join](https://golang.org/pkg/strings/#Join) +- C#: [String.Join](https://msdn.microsoft.com/en-us/library/dd783876%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396) +- Java: [String.join](http://docs.oracle.com/javase/8/docs/api/java/lang/String.html#join-java.lang.CharSequence-java.lang.Iterable-) +- Perl: [join](http://perldoc.perl.org/functions/join.html) + +The languages not using `join` are as follows. Interestingly, they are +all functional-ish languages. + +- Haskell: [intercalate](http://hackage.haskell.org/package/text-1.2.0.4/docs/Data-Text.html#v:intercalate) +- OCaml: [String.concat](http://caml.inria.fr/pub/docs/manual-ocaml/libref/String.html#VALconcat) +- F#: [String.concat](https://msdn.microsoft.com/en-us/library/ee353761.aspx) + +Note that Rust also has `.concat()` in `SliceConcatExt`, which is a specialized +version of `.connect()` that uses an empty string as a separator. + +Another reason is that the term "join" already has similar usage in the standard +library. There are `std::path::Path::join` and `std::env::join_paths` which are +used to join the paths. + +# Detailed design + +While the `SliceConcatExt` trait is unstable, the `.connect()` method itself is +marked as stable. So we need to: + +1. Deprecate the `.connect()` method. +2. Add the `.join()` method. + +Or, if we are to achieve the [instability guarantee], we may remove the old +method entirely, as it's still pre-1.0. However, the author considers that this +may require even more consensus. + +[instability guarantee]: https://github.com/rust-lang/rust/issues/24928 + +# Drawbacks + +Having a deprecated method in a newborn language is not pretty. + +If we do remove the `.connect()` method, the language becomes pretty again, but +it breaks the stability guarantee at the same time. + +# Alternatives + +Keep the status quo. Improving searchability in the docs will help newcomers +find the appropriate method. + +# Unresolved questions + +Are there even more clever names for the method? How about `.homura()`, or +`.madoka()`? diff --git a/text/1105-api-evolution.md b/text/1105-api-evolution.md new file mode 100644 index 00000000000..b8b37f1d2dd --- /dev/null +++ b/text/1105-api-evolution.md @@ -0,0 +1,798 @@ +- Feature Name: not applicable +- Start Date: 2015-05-04 +- RFC PR: [rust-lang/rfcs#1105](https://github.com/rust-lang/rfcs/pull/1105) +- Rust Issue: N/A + +# Summary + +This RFC proposes a comprehensive set of guidelines for which changes to +*stable* APIs are considered breaking from a semver perspective, and which are +not. These guidelines are intended for both the standard library and for the +crates.io ecosystem. 
+ +This does *not* mean that the standard library should be completely free to make +non-semver-breaking changes; there are sometimes still risks of ecosystem pain +that need to be taken into account. Rather, this RFC makes explicit an initial +set of changes that absolutely *cannot* be made without a semver bump. + +Along the way, it also discusses some interactions with potential language +features that can help mitigate pain for non-breaking changes. + +The RFC covers only API issues; other issues related to language features, +lints, type inference, command line arguments, Cargo, and so on are considered +out of scope. + +# Motivation + +Both Rust and its library ecosystem have adopted [semver](http://semver.org/), a +technique for versioning platforms/libraries partly in terms of the effect on +the code that uses them. In a nutshell, the versioning scheme has three components:: + +1. **Major**: must be incremented for changes that break client code. +2. **Minor**: incremented for backwards-compatible feature additions. +3. **Patch**: incremented for backwards-compatible bug fixes. + +[Rust 1.0.0](http://blog.rust-lang.org/2015/02/13/Final-1.0-timeline.html) will +mark the beginning of our +[commitment to stability](http://blog.rust-lang.org/2014/10/30/Stability.html), +and from that point onward it will be important to be clear about what +constitutes a breaking change, in order for semver to play a meaningful role. As +we will see, this question is more subtle than one might think at first -- and +the simplest approach would make it effectively impossible to grow the standard +library. + +The goal of this RFC is to lay out a comprehensive policy for what *must* be +considered a breaking API change from the perspective of semver, along with some +guidance about non-semver-breaking changes. + +# Detailed design + +For clarity, in the rest of the RFC, we will use the following terms: + +* **Major change**: a change that requires a major semver bump. +* **Minor change**: a change that requires only a minor semver bump. +* **Breaking change**: a change that, *strictly speaking*, can cause downstream + code to fail to compile. + +What we will see is that in Rust today, almost any change is technically a +breaking change. For example, given the way that globs currently work, *adding +any public item* to a library can break its clients (more on that later). But +not all breaking changes are equal. + +So, this RFC proposes that **all major changes are breaking, but not all breaking +changes are major.** + +## Overview + +### Principles of the policy + +The basic design of the policy is that **the same code should be able to run +against different minor revisions**. Furthermore, minor changes should require +at most a few local *annotations* to the code you are developing, and in +principle no changes to your dependencies. + +In more detail: + +* Minor changes should require at most minor amounts of work upon upgrade. For + example, changes that may require occasional type annotations or use of UFCS + to disambiguate are not automatically "major" changes. (But in such cases, one + must evaluate how widespread these "minor" changes are). + +* In principle, it should be possible to produce a version of dependency code + that *will not break* when upgrading other dependencies, or Rust itself, to a + new minor revision. This goes hand-in-hand with the above bullet; as we will + see, it's possible to save a fully "elaborated" version of upstream code that + does not require any disambiguation. 
The "in principle" refers to the fact + that getting there may require some additional tooling or language support, + which this RFC outlines. + +That means that any breakage in a minor release must be very "shallow": it must +always be possible to locally fix the problem through some kind of +disambiguation *that could have been done in advance* (by using more explicit +forms) or other annotation (like disabling a lint). It means that minor changes +can never leave you in a state that requires breaking changes to your own code. + +**Although this general policy allows some (very limited) breakage in minor +releases, it is not a license to make these changes blindly**. The breakage that +this RFC permits, aside from being very simple to fix, is also unlikely to occur +often in practice. The RFC will discuss measures that should be employed in the +standard library to ensure that even these minor forms of breakage do not cause +widespread pain in the ecosystem. + +### Scope of the policy + +The policy laid out by this RFC applies to *stable*, *public* APIs in the +standard library. Eventually, stability attributes will be usable in external +libraries as well (this will require some design work), but for now public APIs +in external crates should be understood as de facto stable after the library +reaches 1.0.0 (per semver). + +## Policy by language feature + +Most of the policy is simplest to lay out with reference to specific language +features and the way that APIs using them can, and cannot, evolve in a minor +release. + +**Breaking changes are assumed to be major changes unless otherwise stated**. +The RFC covers many, but not all breaking changes that are major; it covers +*all* breaking changes that are considered minor. + +### Crates + +#### Major change: going from stable to nightly + +Changing a crate from working on stable Rust to *requiring* a nightly is +considered a breaking change. That includes using `#[feature]` directly, or +using a dependency that does so. Crate authors should consider using Cargo +["features"](http://doc.crates.io/manifest.html#the-[features]-section) for +their crate to make such use opt-in. + +#### Minor change: altering the use of Cargo features + +Cargo packages can provide +[opt-in features](http://doc.crates.io/manifest.html#the-[features]-section), +which enable `#[cfg]` options. When a common dependency is compiled, it is done +so with the *union* of all features opted into by any packages using the +dependency. That means that adding or removing a feature could technically break +other, unrelated code. + +However, such breakage always represents a bug: packages are supposed to support +any combination of features, and if another client of the package depends on a +given feature, that client should specify the opt-in themselves. + +### Modules + +#### Major change: renaming/moving/removing any public items. + +Although renaming an item might seem like a minor change, according to the +general policy design this is not a permitted form of breakage: it's not +possible to annotate code in advance to avoid the breakage, nor is it possible +to prevent the breakage from affecting dependencies. + +Of course, much of the effect of renaming/moving/removing can be achieved by +instead using deprecation and `pub use`, and the standard library should not be +afraid to do so! 
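+As a rough sketch (the module and type names here are hypothetical), the
+"rename via re-export" pattern looks like this:
+
+```rust
+// New canonical location of the item.
+pub mod net {
+    pub struct TcpStream;
+}
+
+// The old path keeps compiling for existing users; a deprecated alias (or a
+// plain `pub use`) preserves the old name while steering people toward the
+// new one through documentation and lints.
+pub mod old_io {
+    #[deprecated(note = "use `net::TcpStream` instead")]
+    pub type TcpStream = super::net::TcpStream;
+}
+```
+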
In the long run, we should consider hiding at least some old +deprecated items from the docs, and could even consider putting out a major +version solely as a kind of "garbage collection" for long-deprecated APIs. + +#### Minor change: adding new public items. + +Note that adding new public items is currently a breaking change, due to glob +imports. For example, the following snippet of code will break if the `foo` +module introduces a public item called `bar`: + +```rust +use foo::*; +fn bar() { ... } +``` + +The problem here is that glob imports currently do not allow any of their +imports to be shadowed by an explicitly-defined item. + +This is considered a minor change because under the principles of this RFC: the +glob imports could have been written as more explicit (expanded) `use` +statements. It is also plausible to do this expansion automatically for a +crate's dependencies, to prevent breakage in the first place. + +(This RFC also suggests permitting shadowing of a glob import by any explicit +item. This has been the intended semantics of globs, but has not been +implemented. The details are left to a future RFC, however.) + +### Structs + +See "[Signatures in type definitions](#signatures-in-type-definitions)" for some +general remarks about changes to the actual types in a `struct` definition. + +#### Major change: adding a private field when all current fields are public. + +This change has the effect of making external struct literals impossible to +write, which can break code irreparably. + +#### Major change: adding a public field when no private field exists. + +This change retains the ability to use struct literals, but it breaks existing +uses of such literals; it likewise breaks exhaustive matches against the struct. + +#### Minor change: adding or removing private fields when at least one already exists (before and after the change). + +No existing code could be relying on struct literals for the struct, nor on +exhaustively matching its contents, and client code will likewise be oblivious +to the addition of further private fields. + +For tuple structs, this is only a minor change if furthermore *all* fields are +currently private. (Tuple structs with mixtures of public and private fields are +bad practice in any case.) + +#### Minor change: going from a tuple struct with all private fields (with at least one field) to a normal struct, or vice versa. + +This is technically a breaking change: + +```rust +// in some other module: +pub struct Foo(SomeType); + +// in downstream code +let Foo(_) = foo; +``` + +Changing `Foo` to a normal struct can break code that matches on it -- but there +is never any real reason to match on it in that circumstance, since you cannot +extract any fields or learn anything of interest about the struct. + +### Enums + +See "[Signatures in type definitions](#signatures-in-type-definitions)" for some +general remarks about changes to the actual types in an `enum` definition. + +#### Major change: adding new variants. + +Exhaustiveness checking means that a `match` that explicitly checks all the +variants for an `enum` will break if a new variant is added. It is not currently +possible to defend against this breakage in advance. + +A [postponed RFC](https://github.com/rust-lang/rfcs/pull/757) discusses a +language feature that allows an enum to be marked as "extensible", which +modifies the way that exhaustiveness checking is done and would make it possible +to extend the enum without breakage. 
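+
+As a minimal illustration (the `Event` enum and its variants are invented for
+this example), a downstream `match` that was exhaustive against the old enum
+stops compiling as soon as a variant is added, unless it happened to include a
+catch-all arm in advance:
+
+```rust
+pub enum Event {
+    Open,
+    Close, // imagine this variant was added in a "minor" release
+}
+
+fn describe(e: &Event) -> &'static str {
+    match *e {
+        Event::Open => "open",
+        // A match written before `Close` existed had no arm for it; without
+        // a `_` arm it is no longer exhaustive, so the downstream crate
+        // fails to compile. Only a catch-all written in advance avoids this.
+        _ => "other",
+    }
+}
+```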
+
+#### Major change: adding new fields to a variant.
+
+If the enum is public, so is the full contents of all of its variants. As per
+the rules for structs, this means it is not allowed to add any new fields (which
+will automatically be public).
+
+If you wish to allow for this kind of extensibility, consider introducing a new,
+explicit struct for the variant up front.
+
+### Traits
+
+#### Major change: adding a non-defaulted item.
+
+Adding any item without a default will immediately break all existing trait
+implementations.
+
+It's possible that in the future we will allow some kind of
+"[sealing](#thoughts-on-possible-language-changes-unofficial)" to say that a
+trait can only be used as a bound, not to provide new implementations; such a
+trait *would* allow arbitrary items to be added.
+
+#### Major change: any non-trivial change to item signatures.
+
+Because traits have both implementors and consumers, any change to the signature
+of e.g. a method will affect at least one of the two parties. So, for example,
+abstracting a concrete method to use generics instead might work fine for
+clients of the trait, but would break existing implementors. (Note, as above,
+the potential for "sealed" traits to alter this dynamic.)
+
+#### Minor change: adding a defaulted item.
+
+Adding a defaulted item is technically a breaking change:
+
+```rust
+trait Trait1 {}
+trait Trait2 {
+    fn foo(&self);
+}
+
+fn use_both<T: Trait1 + Trait2>(t: &T) {
+    t.foo()
+}
+```
+
+If a `foo` method is added to `Trait1`, even with a default, it would cause a
+dispatch ambiguity in `use_both`, since the call to `foo` could be referring to
+either trait.
+
+(Note, however, that existing *implementations* of the trait are fine.)
+
+According to the basic principles of this RFC, such a change is minor: it is
+always possible to annotate the call `t.foo()` to be more explicit *in advance*
+using UFCS: `Trait2::foo(t)`. This kind of annotation could be done
+automatically for code in dependencies (see
+[Elaborated source](#elaborated-source)). And it would also be possible to
+mitigate this problem by allowing
+[method renaming on trait import](#trait-item-renaming).
+
+While the scenario of adding a defaulted method to a trait may seem somewhat
+obscure, the exact same hazards arise with *implementing existing traits* (see
+below), which is clearly vital to allow; we apply a similar policy to both.
+
+All that said, it is incumbent on library authors to ensure that such "minor"
+changes are in fact minor in practice: if a conflict like `t.foo()` is likely to
+arise at all often in downstream code, it would be advisable to explore a
+different choice of names. More guidelines for the standard library are given
+later on.
+
+There are a few circumstances in which adding a defaulted item is still a major
+change:
+
+* The new item would change the trait from object safe to non-object safe.
+* The trait has a defaulted associated type and the item being added is a
+  defaulted function/method. In this case, existing impls that override the
+  associated type will break, since the function/method default will not
+  apply. (See
+  [the associated item RFC](https://github.com/rust-lang/rfcs/blob/master/text/0195-associated-items.md#defaults)).
+* Adding a default to an existing associated type is likewise a major change if
+  the trait has defaulted methods, since it will invalidate use of those
+  defaults for the methods in existing trait impls.
+
+#### Minor change: adding a defaulted type parameter.
+ +As with "[Signatures in type definitions](#signatures-in-type-definitions)", +traits are permitted to add new type parameters as long as defaults are provided +(which is backwards compatible). + +### Trait implementations + +#### Major change: implementing any "fundamental" trait. + +A [recent RFC](https://github.com/rust-lang/rfcs/pull/1023) introduced the idea +of "fundamental" traits which are so basic that *not* implementing such a trait +right off the bat is considered a promise that you will *never* implement the +trait. The `Sized` and `Fn` traits are examples. + +The coherence rules take advantage of fundamental traits in such a way that +*adding a new implementation of a fundamental trait to an existing type can +cause downstream breakage*. Thus, such impls are considered major changes. + +#### Minor change: implementing any non-fundamental trait. + +Unfortunately, implementing any existing trait can cause breakage: + +```rust +// Crate A + pub trait Trait1 { + fn foo(&self); + } + + pub struct Foo; // does not implement Trait1 + +// Crate B + use crateA::Trait1; + + trait Trait2 { + fn foo(&self); + } + + impl Trait2 for crateA::Foo { .. } + + fn use_foo(f: &crateA::Foo) { + f.foo() + } +``` + +If crate A adds an implementation of `Trait1` for `Foo`, the call to `f.foo()` +in crate B will yield a dispatch ambiguity (much like the one we saw for +defaulted items). Thus *technically implementing any existing trait is a +breaking change!* Completely prohibiting such a change is clearly a non-starter. + +However, as before, this kind of breakage is considered "minor" by the +principles of this RFC (see "Adding a defaulted item" above). + +### Inherent implementations + +#### Minor change: adding any inherent items. + +Adding an inherent item cannot lead to dispatch ambiguity, because inherent +items trump any trait items with the same name. + +However, introducing an inherent item *can* lead to breakage if the signature of +the item does not match that of an in scope, implemented trait: + +```rust +// Crate A + pub struct Foo; + +// Crate B + trait Trait { + fn foo(&self); + } + + impl Trait for crateA::Foo { .. } + + fn use_foo(f: &crateA::Foo) { + f.foo() + } +``` + +If crate A adds a method: + +```rust +impl Foo { + fn foo(&self, x: u8) { ... } +} +``` + +then crate B would no longer compile, since dispatch would prefer the inherent +impl, which has the wrong type. + +Once more, this is considered a minor change, since UFCS can disambiguate (see +"Adding a defaulted item" above). + +It's worth noting, however, that if the signatures *did* happen to match then +the change would no longer cause a compilation error, but might silently change +runtime behavior. The case where the same method for the same type has +meaningfully different behavior is considered unlikely enough that the RFC is +willing to permit it to be labeled as a minor change -- and otherwise, inherent +methods could never be added after the fact. + +### Other items + +Most remaining items do not have any particularly unique items: + +* For type aliases, see "[Signatures in type definitions](#signatures-in-type-definitions)". +* For free functions, see "[Signatures in functions](#signatures-in-functions)". + +## Cross-cutting concerns + +### Behavioral changes + +This RFC is largely focused on API changes which may, in particular, cause +downstream code to stop compiling. 
But in some sense it is even more pernicious
+to make a change that allows downstream code to continue compiling, but causes
+its runtime behavior to break.
+
+This RFC does not attempt to provide a comprehensive policy on behavioral
+changes, which would be extremely difficult. In general, APIs are expected to
+provide explicit contracts for their behavior via documentation, and behavior
+that is not part of this contract is permitted to change in minor
+revisions. (Remember: this RFC is about setting a *minimum* bar for when major
+version bumps are required.)
+
+This policy will likely require some revision over time, to become more explicit
+and perhaps lay out some best practices.
+
+### Signatures in type definitions
+
+#### Major change: tightening bounds.
+
+Adding new constraints on existing type parameters is a breaking change, since
+existing uses of the type definition can break. So the following is a major
+change:
+
+```rust
+// MAJOR CHANGE
+
+// Before
+struct Foo<A> { .. }
+
+// After
+struct Foo<A: Clone> { .. }
+```
+
+#### Minor change: loosening bounds.
+
+Loosening bounds, on the other hand, cannot break code because when you
+reference `Foo<A>`, you *do not learn anything about the bounds on `A`*. (This
+is why you have to repeat any relevant bounds in `impl` blocks for `Foo<A>`, for
+example.) So the following is a minor change:
+
+```rust
+// MINOR CHANGE
+
+// Before
+struct Foo<A: Clone> { .. }
+
+// After
+struct Foo<A> { .. }
+```
+
+#### Minor change: adding defaulted type parameters.
+
+All existing references to a type/trait definition continue to compile and work
+correctly after a new defaulted type parameter is added. So the following is
+a minor change:
+
+```rust
+// MINOR CHANGE
+
+// Before
+struct Foo<A> { .. }
+
+// After
+struct Foo<A, B = u8> { .. }
+```
+
+#### Minor change: generalizing to generics.
+
+A struct or enum field can change from a concrete type to a generic type
+parameter, provided that the change results in an identical type for all
+existing use cases. For example, the following change is permitted:
+
+```rust
+// MINOR CHANGE
+
+// Before
+struct Foo(pub u8);
+
+// After
+struct Foo<T = u8>(pub T);
+```
+
+because existing uses of `Foo` are shorthand for `Foo<u8>`, which yields the
+identical field type. (Note: this is not actually true today, since
+[default type parameters](https://github.com/rust-lang/rfcs/pull/213) are not
+fully implemented. But this is the intended semantics.)
+
+On the other hand, the following is not permitted:
+
+```rust
+// MAJOR CHANGE
+
+// Before
+struct Foo<T = u8>(pub T, pub u8);
+
+// After
+struct Foo<T = u8>(pub T, pub T);
+```
+
+since there may be existing uses of `Foo` with a non-default type parameter
+which would break as a result of the change.
+
+It's also permitted to change from a generic type to a more-generic one in a
+minor revision:
+
+```rust
+// MINOR CHANGE
+
+// Before
+struct Foo<T = u8>(pub T, pub T);
+
+// After
+struct Foo<T = u8, U = u8>(pub T, pub U);
+```
+
+since, again, all existing uses of the type `Foo` will yield the same field
+types as before.
+
+### Signatures in functions
+
+All of the changes mentioned below are considered major changes in the context
+of trait methods, since they can break implementors.
+
+#### Major change: adding/removing arguments.
+
+At the moment, Rust does not provide defaulted arguments, so any change in arity
+is a breaking change.
+
+#### Minor change: introducing a new type parameter.
+
+Technically, adding a (non-defaulted) type parameter can break code:
+
+```rust
+// MINOR CHANGE (but causes breakage)
+
+// Before
+fn foo<T>(...) { ... }
{ ... } + +// After +fn foo(...) { ... } +``` + +will break any calls like `foo::`. However, such explicit calls are rare +enough (and can usually be written in other ways) that this breakage is +considered minor. (However, one should take into account how likely it is that +the function in question is being called with explicit type arguments). This +RFC also suggests adding a `...` notation to explicit parameter lists to keep +them open-ended (see suggested language changes). + +Such changes are an important ingredient of abstracting to use generics, as +described next. + +#### Minor change: generalizing to generics. + +The type of an argument to a function, or its return value, can be *generalized* +to use generics, including by introducing a new type parameter (as long as it +can be instantiated to the original type). For example, the following change is +allowed: + +```rust +// MINOR CHANGE + +// Before +fn foo(x: u8) -> u8; +fn bar>(t: T); + +// After +fn foo(x: T) -> T; +fn bar>(t: T); +``` + +because all existing uses are instantiations of the new signature. On the other +hand, the following isn't allowed in a minor revision: + +```rust +// MAJOR CHANGE + +// Before +fn foo(x: Vec); + +// After +fn foo>(x: T); +``` + +because the generics include a constraint not satisfied by the original type. + +Introducing generics in this way can potentially create type inference failures, +but these are considered acceptable per the principles of the RFC: they only +require local annotations that could have been inserted in advance. + +Perhaps somewhat surprisingly, generalization applies to trait objects as well, +given that every trait implements itself: + +```rust +// MINOR CHANGE + +// Before +fn foo(t: &Trait); + +// After +fn foo(t: &T); +``` + +(The use of `?Sized` is essential; otherwise you couldn't recover the original +signature). + +### Lints + +#### Minor change: introducing new lint warnings/errors + +Lints are considered advisory, and changes that cause downstream code to receive +additional lint warnings/errors are still considered "minor" changes. + +Making this work well in practice will likely require some infrastructure work +along the lines of +[this RFC issue](https://github.com/rust-lang/rfcs/issues/1029) + +## Mitigation for minor changes + +### The Crater tool + +@brson has been hard at work on a tool called "Crater" which can be used to +exercise changes on the entire crates.io ecosystem, looking for +regressions. This tool will be indispensable when weighing the costs of a minor +change that might cause some breakage -- we can actually gauge what the breakage +would look like in practice. + +While this would, of course, miss code not available publicly, the hope is that +code on crates.io is a broadly representative sample, good enough to turn up +problems. + +Any breaking, but minor change to the standard library must be evaluated through +Crater before being committed. + +### Nightlies + +One line of defense against a "minor" change causing significant breakage is the +nightly release channel: we can get feedback about breakage long before it makes +even into a beta release. And of course the beta cycle itself provides another +line of defense. + +### Elaborated source + +When compiling upstream dependencies, it is possible to generate an "elaborated" +version of the source code where all dispatch is resolved to explicit UFCS form, +all types are annotated, and all glob imports are replaced by explicit imports. 
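+
+For example, a method call that today goes through a glob import and trait
+dispatch might be recorded in a fully explicit form along these lines (the
+`upstream` module, `Named` trait, and the exact elaboration format are purely
+illustrative; this RFC does not prescribe any of them):
+
+```rust
+// Stand-in for an upstream crate (hypothetical names throughout).
+mod upstream {
+    pub struct Foo;
+
+    pub trait Named {
+        fn name(&self) -> String;
+    }
+
+    impl Named for Foo {
+        fn name(&self) -> String { "foo".to_string() }
+    }
+
+    pub mod prelude {
+        pub use super::Named;
+    }
+}
+
+// What the downstream author wrote:
+//
+//     use upstream::prelude::*;
+//     fn describe(f: &upstream::Foo) -> String { f.name() }
+//
+// One possible elaborated form: the glob is replaced by an explicit import
+// and the method call is pinned to a specific impl via UFCS.
+use upstream::prelude::Named;
+
+fn describe(f: &upstream::Foo) -> String {
+    <upstream::Foo as Named>::name(f)
+}
+
+fn main() {
+    assert_eq!(describe(&upstream::Foo), "foo");
+}
+```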
+ +This fully-elaborated form is almost entirely immune to breakage due to any of +the "minor changes" listed above. + +You could imagine Cargo storing this elaborated form for dependencies upon +compilation. That would in turn make it easy to update Rust, or some subset of +dependencies, without breaking any upstream code (even in minor ways). You would +be left only with very small, local changes to make to the code you own. + +While this RFC does not propose any such tooling change right now, the point is +mainly that there are a lot of options if minor changes turn out to cause +breakage more often than anticipated. + +### Trait item renaming + +One very useful mechanism would be the ability to import a trait while renaming +some of its items, e.g. `use some_mod::SomeTrait with {foo_method as bar}`. In +particular, when methods happen to conflict across traits defined in separate +crates, a user of the two traits could rename one of the methods out of the way. + +## Thoughts on possible language changes (unofficial) + +The following is just a quick sketch of some focused language changes that would +help our API evolution story. + +**Glob semantics** + +As already mentioned, the fact that glob imports currently allow *no* shadowing +is deeply problematic: in a technical sense, it means that the addition of *any* +public item can break downstream code arbitrarily. + +It would be much better for API evolution (and for ergonomics and intuition) if +explicitly-defined items trump glob imports. But this is left to a future RFC. + +**Globs with fine-grained control** + +Another useful tool for working with globs would be the ability to *exclude* +certain items from a glob import, e.g. something like: + +```rust +use some_module::{* without Foo}; +``` + +This is especially useful for the case where multiple modules being glob +imported happen to export items with the same name. + +Another possibility would be to not make it an error for two glob imports to +bring the same name into scope, but to generate the error only at the point that +the imported name was actually *used*. Then collisions could be resolved simply +by adding a single explicit, shadowing import. + +**Default type parameters** + +Some of the minor changes for moving to more generic code depends on an +interplay between defaulted type paramters and type inference, which has been +[accepted as an RFC](https://github.com/rust-lang/rfcs/pull/213) but not yet +implemented. + +**"Extensible" enums** + +There is already [an RFC](https://github.com/rust-lang/rfcs/pull/757) for an +`enum` annotation that would make it possible to add variants without ever +breaking downstream code. + +**Sealed traits** + +The ability to annotate a trait with some "sealed" marker, saying that no +external implementations are allowed, would be useful in certain cases where a +crate wishes to define a closed set of types that implements a particular +interface. Such an attribute would make it possible to evolve the interface +without a major version bump (since no downstream implementors can exist). + +**Defaulted parameters** + +Also known as "optional arguments" -- an +[oft-requested](https://github.com/rust-lang/rfcs/issues/323) feature. Allowing +arguments to a function to be optional makes it possible to add new arguments +after the fact without a major version bump. + +**Open-ended explicit type paramters** + +One hazard is that with today's explicit type parameter syntax, you must always +specify *all* type parameters: `foo::(x, y)`. 
That means that adding a new +type parameter to `foo` can break code, even if a default is provided. + +This could be easily addressed by adding a notation like `...` to leave +additional parameters unspecified: `foo::(x, y)`. + +# Drawbacks and Alternatives + +The main drawback to the approach laid out here is that it makes the stability +and semver guarantees a bit fuzzier: the promise is not that code will never +break, full stop, but rather that minor release breakage is of an extremely +limited form, for which there are a variety of mitigation strategies. This +approach tries to strike a middle ground between a very hard line for stability +(which, for Rust, would rule out many forms of extension) and willy-nilly +breakage: it's an explicit, but pragmatic policy. + +An alternative would be to take a harder line and find some other way to allow +API evolution. Supposing that we resolved the issues around glob imports, the +main problems with breakage have to do with adding new inherent methods or trait +implementations -- both of which are vital forms of evolution. It might be +possible, in the standard library case, to provide some kind of version-based +opt in to this evolution: a crate could opt in to breaking changes for a +particular version of Rust, which might in turn be provided only through some +`cfg`-like mechanism. + +Note that these strategies are not mutually exclusive. Rust's development +processes involved a very steady, strong stream of breakage, and while we need +to be very serious about stabilization, it is possible to take an iterative +approach. The changes considered "major" by this RFC already move the bar *very +significantly* from what was permitted pre-1.0. It may turn out that even the +minor forms of breakage permitted here are, in the long run, too much to +tolerate; at that point we could revise the policies here and explore some +opt-in scheme, for example. + +# Unresolved questions + +## Behavioral issues + +- Is it permitted to change a contract from "abort" to "panic"? What about from + "panic" to "return an `Err`"? + +- Should we try to lay out more specific guidance for behavioral changes at this + point? diff --git a/text/1119-result-expect.md b/text/1119-result-expect.md new file mode 100644 index 00000000000..59ddf9ed6a1 --- /dev/null +++ b/text/1119-result-expect.md @@ -0,0 +1,41 @@ +- Feature Name: `result_expect` +- Start Date: 2015-05-13 +- RFC PR: [rust-lang/rfcs#1119](https://github.com/rust-lang/rfcs/pull/1119) +- Rust Issue: [rust-lang/rust#25359](https://github.com/rust-lang/rust/pull/25359) + +# Summary + +Add an `expect` method to the Result type, bounded to `E: Debug` + +# Motivation + +While `Result::unwrap` exists, it does not allow annotating the panic message with the operation +attempted (e.g. what file was being opened). This is at odds to 'Option' which includes both +`unwrap` and `expect` (with the latter taking an arbitrary failure message). + +# Detailed design + +Add a new method to the same `impl` block as `Result::unwrap` that takes a `&str` message and +returns `T` if the `Result` was `Ok`. If the `Result` was `Err`, it panics with both the provided +message and the error value. + +The format of the error message is left undefined in the documentation, but will most likely be +the following + +``` +panic!("{}: {:?}", msg, e) +``` + +# Drawbacks + +- It involves adding a new method to a core rust type. 
+- The panic message format is less obvious than it is with `Option::expect` (where the panic message is the message passed) + +# Alternatives + +- We are perfectly free to not do this. +- A macro could be introduced to fill the same role (which would allow arbitrary formatting of the panic message). + +# Unresolved questions + +Are there any issues with the proposed format of the panic string? diff --git a/text/1122-language-semver.md b/text/1122-language-semver.md new file mode 100644 index 00000000000..ed0985d606d --- /dev/null +++ b/text/1122-language-semver.md @@ -0,0 +1,307 @@ +- Feature Name: N/A +- Start Date: 2015-05-07 +- RFC PR: [rust-lang/rfcs#1122](https://github.com/rust-lang/rfcs/pull/1122) +- Rust Issue: N/A + +# Summary + +This RFC has the goal of defining what sorts of breaking changes we +will permit for the Rust language itself, and giving guidelines for +how to go about making such changes. + +# Motivation + +With the release of 1.0, we need to establish clear policy on what +precisely constitutes a "minor" vs "major" change to the Rust language +itself (as opposed to libraries, which are covered by [RFC 1105]). +**This RFC proposes that minor releases may only contain breaking +changes that fix compiler bugs or other type-system +issues**. Primarily, this means soundness issues where "innocent" code +can cause undefined behavior (in the technical sense), but it also +covers cases like compiler bugs and tightening up the semantics of +"underspecified" parts of the language (more details below). + +However, simply landing all breaking changes immediately could be very +disruptive to the ecosystem. Therefore, **the RFC also proposes +specific measures to mitigate the impact of breaking changes**, and +some criteria when those measures might be appropriate. + +In rare cases, it may be deemed a good idea to make a breaking change +that is not a soundness problem or compiler bug, but rather correcting +a defect in design. Such cases should be rare. But if such a change is +deemed worthwhile, then the guidelines given here can still be used to +mitigate its impact. + +# Detailed design + +The detailed design is broken into two major sections: how to address +soundness changes, and how to address other, opt-in style changes. We +do not discuss non-breaking changes here, since obviously those are +safe. + +### Soundness changes + +When compiler or type-system bugs are encountered in the language +itself (as opposed to in a library), clearly they ought to be +fixed. However, it is important to fix them in such a way as to +minimize the impact on the ecosystem. + +The first step then is to evaluate the impact of the fix on the crates +found in the `crates.io` website (using e.g. the crater tool). If +impact is found to be "small" (which this RFC does not attempt to +precisely define), then the fix can simply be landed. As today, the +commit message of any breaking change should include the term +`[breaking-change]` along with a description of how to resolve the +problem, which helps those people who are affected to migrate their +code. A description of the problem should also appear in the relevant +subteam report. + +In cases where the impact seems larger, any effort to ease the +transition is sure to be welcome. The following are suggestions for +possible steps we could take (not all of which will be applicable to +all scenarios): + +1. 
Identify important crates (such as those with many dependants) + and work with the crate author to correct the code as quickly as + possible, ideally before the fix even lands. +2. Work hard to ensure that the error message identifies the problem + clearly and suggests the appropriate solution. + - If we develop a rustfix tool, in some cases we may be able to + extend that tool to perform the fix automatically. +3. Provide an annotation that allows for a scoped "opt out" of the + newer rules, as described below. While the change is still + breaking, this at least makes it easy for crates to update and get + back to compiling status quickly. +4. Begin with a deprecation or other warning before issuing a hard + error. In extreme cases, it might be nice to begin by issuing a + deprecation warning for the unsound behavior, and only make the + behavior a hard error after the deprecation has had time to + circulate. This gives people more time to update their crates. + However, this option may frequently not be available, because the + source of a compilation error is often hard to pin down with + precision. + +Some of the factors that should be taken into consideration when +deciding whether and how to minimize the impact of a fix: + +- How important is the change? + - Soundness holes that can be easily exploited or which impact + running code are obviously much more concerning than minor corner + cases. There is somewhat in tension with the other factors: if + there is, for example, a widely deployed vulnerability, fixing + that vulnerability is important, but it will also cause a larger + disruption. +- How many crates on `crates.io` are affected? + - This is a general proxy for the overall impact (since of course + there will always be private crates that are not part of + crates.io). +- Were particularly vital or widely used crates affected? + - This could indicate that the impact will be wider than the raw + number would suggest. +- Does the change silently change the result of running the program, + or simply cause additional compilation failures? + - The latter, while frustrating, are easier to diagnose. +- What changes are needed to get code compiling again? Are those + changes obvious from the error message? + - The more cryptic the error, the more frustrating it is when + compilation fails. + +#### What is a "compiler bug" or "soundness change"? + +In the absence of a formal spec, it is hard to define precisely what +constitutes a "compiler bug" or "soundness change" (see also the +section below on underspecified parts of the language). The obvious +cases are soundness violations in a rather strict sense: + +- Cases where the user is able to produce Undefined Behavior (UB) + purely from safe code. +- Cases where the user is able to produce UB using standard library + APIs or other unsafe code that "should work". + +However, there are other kinds of type-system inconsistencies that +might be worth fixing, even if they cannot lead directly to UB. Bugs +in the coherence system that permit uncontrolled overlap between impls +are one example. Another example might be inference failures that +cause code to compile which should not (because ambiguities +exist). Finally, there is a list below of areas of the language which +are generally considered underspecified. + +We expect that there will be cases that fall on a grey line between +bug and expected behavior, and discussion will be needed to determine +where it falls. 
The recent conflict between `Rc` and scoped threads is +an example of such a discusison: it was clear that both APIs could not +be legal, but not clear which one was at fault. The results of these +discussions will feed into the Rust spec as it is developed. + +#### Opting out + +In some cases, it may be useful to permit users to opt out of new type +rules. The intention is that this "opt out" is used as a temporary +crutch to make it easy to get the code up and running. Typically this +opt out will thus be removed in a later release. But in some cases, +particularly those cases where the severity of the problem is +relatively small, it could be an option to leave the "opt out" +mechanism in place permanently. In either case, use of the "opt out" +API would trigger the deprecation lint. + +Note that we should make every effort to ensure that crates which +employ this opt out can be used compatibly with crates that do not. + +#### Changes that alter dynamic semantics versus typing rules + +In some cases, fixing a bug may not cause crates to stop compiling, +but rather will cause them to silently start doing something different +than they were doing before. In cases like these, the same principle +of using mitigation measures to lessen the impact (and ease the +transition) applies, but the precise strategy to be used will have to +be worked out on a more case-by-case basis. This is particularly +relevant to the underspecified areas of the language described in the +next section. + +Our approach to handling [dynamic drop][RFC 320] is a good +example. Because we expect that moving to the complete non-zeroing +dynamic drop semantics will break code, we've made an intermediate +change that +[altered the compiler to fill with use a non-zero value](https://github.com/rust-lang/rust/pull/23535), +which helps to expose code that was implicitly relying on the current +behavior (much of which has since been restructured in a more +future-proof way). + +#### Underspecified language semantics + +There are a number of areas where the precise language semantics are +currently somewhat underspecified. Over time, we expect to be fully +defining the semantics of all of these areas. This may cause some +existing code -- and in particular existing unsafe code -- to break or +become invalid. Changes of this nature should be treated as soundness +changes, meaning that we should attempt to mitigate the impact and +ease the transition wherever possible. + +Known areas where change is expected include the following: + +- Destructors semantics: + - We plan to stop zeroing data and instead use marker flags on the stack, + as specified in [RFC 320]. This may affect destructors that rely on ovewriting + memory or using the `unsafe_no_drop_flag` attribute. + - Currently, panicing in a destructor can cause unintentional memory + leaks and other poor behavior (see [#14875], [#16135]). We are + likely to make panic in a destructor simply abort, but the precise + mechanism is not yet decided. + - Order of dtor execution within a data structure is somewhat + inconsistent (see [#744]). +- The legal aliasing rules between unsafe pointers is not fully settled (see [#19733]). +- The interplay of assoc types and lifetimes is not fully settled and can lead + to unsoundness in some cases (see [#23442]). +- The trait selection algorithm is expected to be improved and made more complete over time. + It is possible that this will affect existing code. +- [Overflow semantics][RFC 560]: in particular, we may have missed some cases. 
+- Memory allocation in unsafe code is currently unstable. We expect to + be defining safe interfaces as part of the work on supporting + tracing garbage collectors (see [#415]). +- The treatment of hygiene in macros is uneven (see [#22462], + [#24278]). In some cases, changes here may be backwards compatible, + or may be more appropriate only with explicit opt-in (or perhaps an + alternate macro system altogether, such as [this proposal][macro]). +- Lints will evolve over time (both the lints that are enabled and the + precise cases that lints catch). We expect to introduce a + [means to limit the effect of these changes on dependencies][#1029]. +- Stack overflow is currently detected via a segmented stack check + prologue and results in an abort. We expect to experiment with a + system based on guard pages in the future. +- We currently abort the process on OOM conditions (exceeding the heap space, overflowing + the stack). We may attempt to panic in such cases instead if possible. +- Some details of type inference may change. For example, we expect to + implement the fallback mechanism described in [RFC 213], and we may + wish to make minor changes to accommodate overloaded integer + literals. In some cases, type inferences changes may be better + handled via explicit opt-in. + +There are other kinds of changes that can be made in a minor version +that may break unsafe code but which are not considered breaking +changes, because the unsafe code is relying on things known to be +intentionally unspecified. One obvious example is the layout of data +structures, which is considered undefined unless they have a +`#[repr(C)]` attribute. + +Although it is not directly covered by this RFC, it's worth noting in +passing that some of the CLI flags to the compiler may change in the +future as well. The `-Z` flags are of course explicitly unstable, but +some of the `-C`, rustdoc, and linker-specific flags are expected to +evolve over time (see e.g. [#24451]). + +# Drawbacks + +The primary drawback is that making breaking changes are disruptive, +even when done with the best of intentions. The alternatives list some +ways that we could avoid breaking changes altogether, and the +downsides of each. + +## Notes on phasing + +# Alternatives + +**Rather than simply fixing soundness bugs, we could issue new major +releases, or use some sort of opt-in mechanism to fix them +conditionally.** This was initially considered as an option, but +eventually rejected for the following reasons: + +- Opting in to type system changes would cause deep splits between + minor versions; it would also create a high maintenance burden in + the compiler, since both older and newer versions would have to be + supported. +- It seems likely that all users of Rust will want to know that their + code is sound and would not want to be working with unsafe + constructs or bugs. +- We already have several mitigation measures, such as opt-out or + temporary deprecation, that can be used to ease the transition + around a soundness fix. Moreover, separating out new type rules so + that they can be "opted into" can be very difficult and would + complicate the compiler internally; it would also make it harder to + reason about the type system as a whole. + +# Unresolved questions + +**What precisely constitutes "small" impact?** This RFC does not +attempt to define when the impact of a patch is "small" or "not +small". We will have to develop guidelines over time based on +precedent. 
One of the big unknowns is how indicative the breakage we +observe on `crates.io` will be of the total breakage that will occur: +it is certainly possible that all crates on `crates.io` work fine, but +the change still breaks a large body of code we do not have access to. + +**What attribute should we use to "opt out" of soundness changes?** +The section on breaking changes indicated that it may sometimes be +appropriate to includ an "opt out" that people can use to temporarily +revert to older, unsound type rules, but did not specify precisely +what that opt-out should look like. Ideally, we would identify a +specific attribute in advance that will be used for such purposes. In +the past, we have simply created ad-hoc attributes (e.g., +`#[old_orphan_check]`), but because custom attributes are forbidden by +stable Rust, this has the unfortunate side-effect of meaning that code +which opts out of the newer rules cannot be compiled on older +compilers (even though it's using the older type system rules). If we +introduce an attribute in advance we will not have this problem. + +**Are there any other circumstances in which we might perform a +breaking change?** In particular, it may happen from time to time that +we wish to alter some detail of a stable component. If we believe that +this change will not affect anyone, such a change may be worth doing, +but we'll have to work out more precise guidelines. [RFC 1156] is an +example. + +[RFC 1105]: https://github.com/rust-lang/rfcs/pull/1105 +[RFC 320]: https://github.com/rust-lang/rfcs/pull/320 +[#744]: https://github.com/rust-lang/rfcs/issues/744 +[#14875]: https://github.com/rust-lang/rust/issues/14875 +[#16135]: https://github.com/rust-lang/rust/issues/16135 +[#19733]: https://github.com/rust-lang/rust/issues/19733 +[#23442]: https://github.com/rust-lang/rust/issues/23442 +[RFC 213]: https://github.com/rust-lang/rfcs/pull/213 +[#415]: https://github.com/rust-lang/rfcs/issues/415 +[#22462]: https://github.com/rust-lang/rust/issues/22462#issuecomment-81756673 +[#24278]: https://github.com/rust-lang/rust/issues/24278 +[#1029]: https://github.com/rust-lang/rfcs/issues/1029 +[RFC 560]: https://github.com/rust-lang/rfcs/pull/560 +[macro]: https://internals.rust-lang.org/t/pre-rfc-macro-improvements/2088 +[#24451]: https://github.com/rust-lang/rust/pull/24451 +[RFC 1156]: https://github.com/rust-lang/rfcs/pull/1156 diff --git a/text/1123-str-split-at.md b/text/1123-str-split-at.md new file mode 100644 index 00000000000..f57e08b3458 --- /dev/null +++ b/text/1123-str-split-at.md @@ -0,0 +1,102 @@ +- Feature Name: `str_split_at` +- Start Date: 2015-05-17 +- RFC PR: [rust-lang/rfcs#1123](https://github.com/rust-lang/rfcs/pull/1123) +- Rust Issue: [rust-lang/rust#25839](https://github.com/rust-lang/rust/pull/25839) + +# Summary + +Introduce the method `split_at(&self, mid: usize) -> (&str, &str)` on `str`, +to divide a slice into two, just like we can with `[T]`. + +# Motivation + +Adding `split_at` is a measure to provide a method from `[T]` in a version that +makes sense for `str`. + +Once used to `[T]`, users might even expect that `split_at` is present on str. + +It is a simple method with an obvious implementation, but it provides +convenience while working with string segmentation manually, which we already +have ample tools for (for example the method `find` that returns the first +matching byte offset). + +Using `split_at` can lead to less repeated bounds checks, since it is easy to +use cumulatively, splitting off a piece at a time. 
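+
+As a sketch of that cumulative style (a hypothetical helper, assuming the
+method as proposed here):
+
+```rust
+// Split a `key=value` line into its two halves, reusing the byte offset
+// returned by `find` so each piece is carved off exactly once.
+fn split_key_value(line: &str) -> Option<(&str, &str)> {
+    let eq = match line.find('=') {
+        Some(i) => i,
+        None => return None,
+    };
+    let (key, rest) = line.split_at(eq);
+    // `rest` begins with the ASCII '=' (one byte, hence a char boundary).
+    let (_, value) = rest.split_at(1);
+    Some((key.trim(), value.trim()))
+}
+
+fn main() {
+    assert_eq!(split_key_value("name = Löwe"), Some(("name", "Löwe")));
+    assert_eq!(split_key_value("no separator"), None);
+}
+```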
+ +This feature is requested in [rust-lang/rust#18063][freq] + +[freq]: https://github.com/rust-lang/rust/issues/18063 + +# Detailed design + +Introduce the method `split_at(&self, mid: usize) -> (&str, &str)` on `str`, to +divide a slice into two. + +`mid` will be a byte offset from the start of the string, and it must be on +a character boundary. Both `0` and `self.len()` are valid splitting points. + +`split_at` will be an inherent method on `str` where possible, and will be +available from libcore and the layers above it. + +The following is a working implementation, implemented as a trait just for +illustration and to be testable as a custom extension: + +```rust +trait SplitAt { + fn split_at(&self, mid: usize) -> (&Self, &Self); +} + +impl SplitAt for str { + /// Divide one string slice into two at an index. + /// + /// The index `mid` is a byte offset from the start of the string + /// that must be on a character boundary. + /// + /// Return slices `&self[..mid]` and `&self[mid..]`. + /// + /// # Panics + /// + /// Panics if `mid` is beyond the last character of the string, + /// or if it is not on a character boundary. + /// + /// # Examples + /// ``` + /// let s = "Löwe 老虎 Léopard"; + /// let first_space = s.find(' ').unwrap_or(s.len()); + /// let (a, b) = s.split_at(first_space); + /// + /// assert_eq!(a, "Löwe"); + /// assert_eq!(b, " 老虎 Léopard"); + /// ``` + fn split_at(&self, mid: usize) -> (&str, &str) { + (&self[..mid], &self[mid..]) + } +} +``` + +`split_at` will use a byte offset (a.k.a byte index) to be consistent with +slicing and the offset used by interrogator methods such as `find` or iterators +such as `char_indices`. Byte offsets are our standard lightweight position +indicators that we use to support efficient operations on string slices. + +Implementing `split_at_mut` is not relevant for `str` at this time. + +# Drawbacks + +* `split_at` panics on 1) index out of bounds 2) index not on character + boundary. +* Possible name confusion with other `str` methods like `.split()` +* According to our developing API evolution and semver guidelines this is a + breaking change but a (very) minor change. Adding methods is something we + expect to be able to. (See [RFC PR #1105][pr1105]). + +[pr1105]: https://github.com/rust-lang/rfcs/pull/1105 + +# Alternatives + +* Recommend other splitting methods, like the split iterators. +* Stick to writing `(&foo[..mid], &foo[mid..])` + +# Unresolved questions + +* *None* diff --git a/text/1131-likely-intrinsic.md b/text/1131-likely-intrinsic.md new file mode 100644 index 00000000000..b5b894cbd70 --- /dev/null +++ b/text/1131-likely-intrinsic.md @@ -0,0 +1,56 @@ +- Feature Name: expect_intrinsic +- Start Date: 2015-05-20 +- RFC PR: [rust-lang/rfcs#1131](https://github.com/rust-lang/rfcs/pull/1131) +- Rust Issue: [rust-lang/rust#26179](https://github.com/rust-lang/rust/issues/26179) + +# Summary + +Provide a pair of intrinsic functions for hinting the likelyhood of branches being taken. + +# Motivation + +Branch prediction can have significant effects on the running time of some code. Especially tight +inner loops which may be run millions of times. While in general programmers aren't able to +effectively provide hints to the compiler, there are cases where the likelyhood of some branch +being taken can be known. + +For example: in arbitrary-precision arithmetic, operations are often performed in a base that is +equal to `2^word_size`. 
The most basic division algorithm, "Schoolbook Division", has a step that +will be taken in `2/B` cases (where `B` is the base the numbers are in), given random input. On a +32-bit processor that is approximately one in two billion cases, for 64-bit it's one in 18 +quintillion cases. + +# Detailed design + +Implement a pair of intrinsics `likely` and `unlikely`, both with signature `fn(bool) -> bool` +which hint at the probability of the passed value being true or false. Specifically, `likely` hints +to the compiler that the passed value is likely to be true, and `unlikely` hints that it is likely +to be false. Both functions simply return the value they are passed. + +The primary reason for this design is that it reflects common usage of this general feature in many +C and C++ projects, most of which define simple `LIKELY` and `UNLIKELY` macros around the gcc +`__builtin_expect` intrinsic. It also provides the most flexibility, allowing branches on any +condition to be hinted at, even if the process that produced the branched-upon value is +complex. For why an equivalent to `__builtin_expect` is not being exposed, see the Alternatives +section. + +There are no observable changes in behaviour from use of these intrinsics. It is valid to implement +these intrinsics simply as the identity function. Though it is expected that the intrinsics provide +information to the optimizer, that information is not guaranteed to change the decisions the +optimiser makes. + +# Drawbacks + +The intrinsics cannot be used to hint at arms in `match` expressions. However, given that hints +would need to be variants, a simple intrinsic would not be sufficient for those purposes. + +# Alternatives + +Expose an `expect` intrinsic. This is what gcc/clang does with `__builtin_expect`. However there is +a restriction that the second argument be a constant value, a requirement that is not easily +expressible in Rust code. The split into `likely` and `unlikely` intrinsics reflects the strategy +we have used for similar restrictions like the ordering constraint of the atomic intrinsics. + +# Unresolved questions + +None. diff --git a/text/1135-raw-pointer-comparisons.md b/text/1135-raw-pointer-comparisons.md new file mode 100644 index 00000000000..abdca0cd814 --- /dev/null +++ b/text/1135-raw-pointer-comparisons.md @@ -0,0 +1,60 @@ +- Feature Name: raw-pointer-comparisons +- Start Date: 2015-05-27 +- RFC PR: [rust-lang/rfcs#1135](https://github.com/rust-lang/rfcs/pull/1135) +- Rust Issue: [rust-lang/rust#28235](https://github.com/rust-lang/rust/issues/28236) + +# Summary + +Allow equality, but not order, comparisons between fat raw pointers +of the same type. + +# Motivation + +Currently, fat raw pointers can't be compared via either PartialEq or +PartialOrd (currently this causes an ICE). It seems to me that a primitive +type like a fat raw pointer should implement equality in some way. + +However, there doesn't seem to be a sensible way to order raw fat pointers +unless we take vtable addresses into account, which is relatively weird. + +# Detailed design + +Implement PartialEq/Eq for fat raw pointers, defined as comparing both the +unsize-info and the address. This means that these are true: + +```Rust + &s as &fmt::Debug as *const _ == &s as &fmt::Debug as *const _ // of course + &s.first_field as &fmt::Debug as *const _ + != &s as &fmt::Debug as *const _ // these are *different* (one + // prints only the first field, + // the other prints all fields). 
+``` + +But +```Rust + &s.first_field as &fmt::Debug as *const _ as *const () == + &s as &fmt::Debug as *const _ as *const () // addresses are equal +``` + +# Drawbacks + +Order comparisons may be useful for putting fat raw pointers into +ordering-based data structures (e.g. BinaryTree). + +# Alternatives + +@nrc suggested to implement heterogeneous comparisons between all thin +raw pointers and all fat raw pointers. I don't like this because equality +between fat raw pointers of different traits is false most of the +time (unless one of the traits is a supertrait of the other and/or the +only difference is in free lifetimes), and anyway you can always compare +by casting both pointers to a common type. + +It is also possible to implement ordering too, either in unsize -> addr +lexicographic order or addr -> unsize lexicographic order. + +# Unresolved questions + +What form of ordering should be adopted, if any? + + diff --git a/text/1152-slice-string-symmetry.md b/text/1152-slice-string-symmetry.md new file mode 100644 index 00000000000..0a863c1e587 --- /dev/null +++ b/text/1152-slice-string-symmetry.md @@ -0,0 +1,69 @@ +- Feature Name: `slice_string_symmetry` +- Start Date: 2015-06-06 +- RFC PR: [rust-lang/rfcs#1152](https://github.com/rust-lang/rfcs/pull/1152) +- Rust Issue: [rust-lang/rust#26697](https://github.com/rust-lang/rust/issues/26697) + +# Summary + +Add some methods that already exist on slices to strings. Specifically, the +following methods should be added: + +- `str::into_string` +- `String::into_boxed_str` + +# Motivation + +Conceptually, strings and slices are similar types. Many methods are already +shared between the two types due to their similarity. However, not all methods +are shared between the types, even though many could be. This is a little +unexpected and inconsistent. Because of that, this RFC proposes to remedy this +by adding a few methods to strings to even out these two types’ available +methods. + +Specifically, it is currently very difficult to construct a `Box`, while it +is fairly simple to make a `Box<[T]>` by using `Vec::into_boxed_slice`. This RFC +proposes a means of creating a `Box` by converting a `String`. + +# Detailed design + +Add the following method to `str`, presumably as an inherent method: + +- `into_string(self: Box) -> String`: Returns `self` as a `String`. This is + equivalent to `[T]`’s `into_vec`. + +Add the following method to `String` as an inherent method: + +- `into_boxed_str(self) -> Box`: Returns `self` as a `Box`, + reallocating to cut off any excess capacity if needed. This is required to + provide a safe means of creating `Box`. This is equivalent to `Vec`’s + `into_boxed_slice`. + + +# Drawbacks + +None, yet. + +# Alternatives + +- The original version of this RFC had a few extra methods: + - `str::chunks(&self, n: usize) -> Chunks`: Returns an iterator that yields + the *characters* (not bytes) of the string in groups of `n` at a time. + Iterator element type: `&str`. + + - `str::windows(&self, n: usize) -> Windows`: Returns an iterator over all + contiguous windows of character length `n`. Iterator element type: `&str`. + + This and `str::chunks` aren’t really useful without proper treatment of + graphemes, so they were removed from the RFC. + + - `<[T]>::subslice_offset(&self, inner: &[T]) -> usize`: Returns the offset + (in elements) of an inner slice relative to an outer slice. Panics of + `inner` is not contained within `self`. 
+ + `str::subslice_offset` isn’t yet stable and its usefulness is dubious, so + this method was removed from the RFC. + + +# Unresolved questions + +None. diff --git a/text/1156-adjust-default-object-bounds.md b/text/1156-adjust-default-object-bounds.md new file mode 100644 index 00000000000..b600f095b5b --- /dev/null +++ b/text/1156-adjust-default-object-bounds.md @@ -0,0 +1,243 @@ +- Feature Name: N/A +- Start Date: 2015-06-4 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1156 +- Rust Issue: https://github.com/rust-lang/rust/issues/26438 + +# Summary + +Adjust the object default bound algorithm for cases like `&'x +Box` and `&'x Arc`. The existing algorithm would default +to `&'x Box`. The proposed change is to default to `&'x +Box`. + +Note: This is a **BREAKING CHANGE**. The change has +[been implemented][branch] and its impact has been evaluated. It was +[found][crater] to cause **no root regressions** on `crates.io`. +Nonetheless, to minimize impact, this RFC proposes phasing in the +change as follows: + +- In Rust 1.2, a warning will be issued for code which will break when the + defaults are changed. This warning can be disabled by using explicit + bounds. The warning will only be issued when explicit bounds would be required + in the future anyway. +- In Rust 1.3, the change will be made permanent. Any code that has + not been updated by that time will break. + +# Motivation + +When we instituted default object bounds, [RFC 599] specified that +`&'x Box` (and `&'x mut Box`) should expand to `&'x +Box` (and `&'x mut Box`). This is in contrast to a +`Box` type that appears outside of a reference (e.g., `Box`), +which defaults to using `'static` (`Box`). This +decision was made because it meant that a function written like so +would accept the broadest set of possible objects: + +```rust +fn foo(x: &Box) { +} +``` + +In particular, under the current defaults, `foo` can be supplied an +object which references borrowed data. Given that `foo` is taking the +argument by reference, it seemed like a good rule. Experience has +shown otherwise (see below for some of the problems encountered). + +This RFC proposes changing the default object bound rules so that the +default is drawn from the innermost type that encloses the trait +object. If there is no such type, the default is `'static`. The type +is a reference (e.g., `&'r Trait`), then the default is the lifetime +`'r` of that reference. Otherwise, the type must in practice be some +user-declared type, and the default is derived from the declaration: +if the type declares a lifetime bound, then this lifetime bound is +used, otherwise `'static` is used. This means that (e.g.) `&'r +Box` would default to `&'r Box`, and `&'r +Ref<'q, Trait>` (from `RefCell`) would default to `&'r Ref<'q, +Trait+'q>`. + +### Problems with the current default. + +**Same types, different expansions.** One problem is fairly +predictable: the current default means that identical types differ in +their interpretation based on where they appear. This is something we +have striven to avoid in general. So, as an example, this code +[will not type-check](http://is.gd/Yaak1l): + +```rust +trait Trait { } + +struct Foo { + field: Box +} + +fn do_something(f: &mut Foo, x: &mut Box) { + mem::swap(&mut f.field, &mut *x); +} +``` + +Even though `x` is a reference to a `Box` and the type of +`field` is a `Box`, the expansions differ. `x` expands to `&'x +mut Box` and the field expands to `Box`. 
In +general, we have tried to ensure that if the type is *typed precisely +the same* in a type definition and a fn definition, then those two +types are equal (note that fn definitions allow you to omit things +that cannot be omitted in types, so some types that you can enter in a +fn definition, like `&i32`, cannot appear in a type definition). + +Now, the same is of course true for the type `Trait` itself, which +appears identically in different contexts and is expanded in different +ways. This is not a problem here because the type `Trait` is unsized, +which means that it cannot be swapped or moved, and hence the main +sources of type mismatches are avoided. + +**Mental model.** In general the mental model of the newer rules seems +simpler: once you move a trait object into the heap (via `Box`, or +`Arc`), you must explicitly indicate whether it can contain borrowed +data or not. So long as you manipulate by reference, you don't have +to. In contrast, the current rules are more subtle, since objects in +the heap may still accept borrowed data, if you have a reference to +the box. + +**Poor interaction with the dropck rules.** When implementing the +newer dropck rules specified by [RFC 769], we found a +[rather subtle problem] that would arise with the current defaults. +The precise problem is spelled out in appendix below, but the TL;DR is +that if you wish to pass an array of boxed objects, the current +defaults can be actively harmful, and hence force you to specify +explicit lifetimes, whereas the newer defaults do something +reasonable. + +# Detailed design + +The rules for user-defined types from RFC 599 are altered as follows +(text that is not changed is italicized): + +- *If `SomeType` contains a single where-clause like `T:'a`, where + `T` is some type parameter on `SomeType` and `'a` is some + lifetime, then the type provided as value of `T` will have a + default object bound of `'a`. An example of this is + `std::cell::Ref`: a usage like `Ref<'x, X>` would change the + default for object types appearing in `X` to be `'a`.* +- If `SomeType` contains no where-clauses of the form `T:'a`, then + the "base default" is used. The base default depends on the overall context: + - in a fn body, the base default is a fresh inference variable. + - outside of a fn body, such in a fn signature, the base default + is `'static`. + Hence `Box` would typically be a default of `'static` for `X`, + regardless of whether it appears underneath an `&` or not. + (Note that in a fn body, the inference is strong enough to adopt `'static` + if that is the necessary bound, or a looser bound if that would be helpful.) +- *If `SomeType` contains multiple where-clauses of the form `T:'a`, + then the default is cleared and explicit lifetiem bounds are + required. There are no known examples of this in the standard + library as this situation arises rarely in practice.* + +# Timing and breaking change implications + +This is a breaking change, and hence it behooves us to evaluate the +impact and describe a procedure for making the change as painless as +possible. One nice propery of this change is that it only affects +*defaults*, which means that it is always possible to write code that +compiles both before and after the change by avoiding defaults in +those cases where the new and old compiler disagree. 
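+
+For example, the following sketch (using `Debug` as the trait object) compiles
+under both the old and the new defaults, because the object bounds are written
+out instead of being left to the default:
+
+```rust
+use std::fmt::Debug;
+
+// Relying on the default, `&'x Box<Debug>` means `&'x Box<Debug + 'x>` today
+// and would mean `&'x Box<Debug + 'static>` under this RFC. Spelling the
+// bound out removes the disagreement entirely:
+fn print_borrowed<'a>(x: &Box<Debug + 'a>) {
+    println!("{:?}", x);
+}
+
+fn print_owned(x: &Box<Debug + 'static>) {
+    println!("{:?}", x);
+}
+
+fn main() {
+    let b: Box<Debug + 'static> = Box::new(42u8);
+    print_borrowed(&b);
+    print_owned(&b);
+}
+```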
+ +The estimated impact of this change is very low, for two reasons: +- A recent test of crates.io found [no regressions][crater] caused by + this change (however, a [previous run] (from before Rust 1.0) found 8 + regressions). +- This feature was only recently stabilized as part of Rust 1.0 (and + was only added towards the end of the release cycle), so there + hasn't been time for a large body of dependent code to arise + outside of crates.io. + +Nonetheless, to minimize impact, this RFC proposes phasing in the +change as follows: + +- In Rust 1.2, a warning will be issued for code which will break when the + defaults are changed. This warning can be disabled by using explicit + bounds. The warning will only be issued when explicit bounds would be required + in the future anyway. + - Specifically, types that were written `&Box` where the + (boxed) trait object may contain references should now be written + `&Box` to disable the warning. +- In Rust 1.3, the change will be made permanent. Any code that has + not been updated by that time will break. + +# Drawbacks + +The primary drawback is that this is a breaking change, as discussed +in the previous section. + +# Alternatives + +Keep the current design, with its known drawbacks. + +# Unresolved questions + +None. + +# Appendix: Details of the dropck problem + +This appendix goes into detail about the sticky interaction with +dropck that was uncovered. The problem arises if you have a function +that wishes to take a mutable slice of objects, like so: + +```rust +fn do_it(x: &mut [Box]) { ... } +``` + +Here, `&mut [..]` is used because the objects are `FnMut` objects, and +hence require `&mut self` to call. This function in turn is expanded +to: + +```rust +fn do_it<'x>(x: &'x mut [Box]) { ... } +``` + +Now callers might try to invoke the function as so: + +```rust +do_it(&mut [Box::new(val1), Box::new(val2)]) +``` + +Unfortunately, this code fails to compile -- in fact, it cannot be +made to compile without changing the definition of `do_it`, due to a +sticky interaction between dropck and variance. The problem is that +dropck requires that all data in the box strictly outlives the +lifetime of the box's owner. This is to prevent cyclic +content. Therefore, the type of the objects must be `Box` +where `'R` is some region that strictly outlives the array itself (as +the array is the owner of the objects). However, the signature of +`do_it` demands that the reference to the array has the same lifetime +as the trait objects within (and because this is an `&mut` reference +and hence invariant, no approximation is permitted). This implies that +the array must live for at least the region `'R`. But we defined the +region `'R` to be some region that outlives the array, so we have a +quandry. + +The solution is to change the definition of `do_it` in one of two +ways: + +```rust +// Use explicit lifetimes to make it clear that the reference is not +// required to have the same lifetime as the objects themselves: +fn do_it1<'a,'b>(x: &'a mut [Box]) { ... } + +// Specifying 'static is easier, but then the closures cannot +// capture the stack: +fn do_it2(x: &'a mut [Box]) { ... } +``` + +Under the proposed RFC, `do_it2` would be the default. If one wanted +to use lifetimes, then one would have to use explicit lifetime +overrides as shown in `do_it1`. This is consistent with the mental +model of "once you box up an object, you must add annotations for it +to contain borrowed data". 
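+
+For illustration, a self-contained version of the second option compiles as
+written, since closures that own their captures satisfy the `'static` bound
+(a sketch only; the closure bodies are arbitrary):
+
+```rust
+fn do_it2(x: &mut [Box<FnMut() + 'static>]) {
+    for f in x.iter_mut() {
+        f();
+    }
+}
+
+fn main() {
+    let owned = String::from("hello");
+    do_it2(&mut [
+        Box::new(move || println!("{}", owned)), // owns its data: `'static`
+        Box::new(|| println!("world")),
+    ]);
+}
+```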
+ +[RFC 599]: 0599-default-object-bound.md +[RFC 769]: 0769-sound-generic-drop.md +[rather subtle problem]: https://github.com/rust-lang/rust/pull/25212#issuecomment-100244929 +[crater]: https://gist.github.com/brson/085d84d43c6a9a8d4dc3 +[branch]: https://github.com/nikomatsakis/rust/tree/better-object-defaults +[previous run]: https://gist.github.com/brson/80f9b80acef2e7ab37ee +[RFC 1122]: https://github.com/rust-lang/rfcs/pull/1122 diff --git a/text/1174-into-raw-fd-socket-handle-traits.md b/text/1174-into-raw-fd-socket-handle-traits.md new file mode 100644 index 00000000000..38ba3b720ad --- /dev/null +++ b/text/1174-into-raw-fd-socket-handle-traits.md @@ -0,0 +1,68 @@ +- Feature Name: into-raw-fd-socket-handle-traits +- Start Date: 2015-06-24 +- RFC PR: [rust-lang/rfcs#1174](https://github.com/rust-lang/rfcs/pull/1174) +- Rust Issue: [rust-lang/rust#27062](https://github.com/rust-lang/rust/issues/27062) + +# Summary + +Introduce and implement `IntoRaw{Fd, Socket, Handle}` traits to complement the +existing `AsRaw{Fd, Socket, Handle}` traits already in the standard library. + +# Motivation + +The `FromRaw{Fd, Socket, Handle}` traits each take ownership of the provided +handle, however, the `AsRaw{Fd, Socket, Handle}` traits do not give up +ownership. Thus, converting from one handle wrapper to another (for example +converting an open `fs::File` to a `process::Stdio`) requires the caller to +either manually `dup` the handle, or `mem::forget` the wrapper, which +is unergonomic and can be prone to mistakes. + +Traits such as `IntoRaw{Fd, Socket, Handle}` will allow for easily transferring +ownership of OS handles, and it will allow wrappers to perform any +cleanup/setup as they find necessary. + +# Detailed design + +The `IntoRaw{Fd, Socket, Handle}` traits will behave exactly like their +`AsRaw{Fd, Socket, Handle}` counterparts, except they will consume the wrapper +before transferring ownership of the handle. + +Note that these traits should **not** have a blanket implementation over `T: +AsRaw{Fd, Socket, Handle}`: these traits should be opt-in so that implementors +can decide if leaking through `mem::forget` is acceptable or another course of +action is required. + +```rust +// Unix +pub trait IntoRawFd { + fn into_raw_fd(self) -> RawFd; +} + +// Windows +pub trait IntoRawSocket { + fn into_raw_socket(self) -> RawSocket; +} + +// Windows +pub trait IntoRawHandle { + fn into_raw_handle(self) -> RawHandle; +} +``` + +# Drawbacks + +This adds three new traits and methods which would have to be maintained. + +# Alternatives + +Instead of defining three new traits we could instead use the +`std::convert::Into` trait over the different OS handles. However, this +approach will not offer a duality between methods such as +`as_raw_fd()`/`into_raw_fd()`, but will instead be `as_raw_fd()`/`into()`. + +Another possibility is defining both the newly proposed traits as well as the +`Into` trait over the OS handles letting the caller choose what they prefer. + +# Unresolved questions + +None at the moment. 
diff --git a/text/1183-swap-out-jemalloc.md b/text/1183-swap-out-jemalloc.md new file mode 100644 index 00000000000..83de6c58ac5 --- /dev/null +++ b/text/1183-swap-out-jemalloc.md @@ -0,0 +1,235 @@ +- Feature Name: `allocator` +- Start Date: 2015-06-27 +- RFC PR: [rust-lang/rfcs#1183](https://github.com/rust-lang/rfcs/pull/1183) +- Rust Issue: [rust-lang/rust#27389](https://github.com/rust-lang/rust/issues/27389) + +# Summary + +Add support to the compiler to override the default allocator, allowing a +different allocator to be used by default in Rust programs. Additionally, also +switch the default allocator for dynamic libraries and static libraries to using +the system malloc instead of jemalloc. + +# Motivation + +Note that this issue was [discussed quite a bit][babysteps] in the past, and +the meat of this RFC draws from Niko's post. + +[babysteps]: http://smallcultfollowing.com/babysteps/blog/2014/11/14/allocators-in-rust/ + +Currently all Rust programs by default use jemalloc for an allocator because it +is a fairly reasonable default as it is commonly much faster than the default +system allocator. This is not desirable, however, when embedding Rust code into +other runtimes. Using jemalloc implies that Rust will be using one allocator +while the host application (e.g. Ruby, Firefox, etc) will be using a separate +allocator. Having two allocators in one process generally hurts performance and +is not recommended, so the Rust toolchain needs to provide a method to configure +the allocator. + +In addition to using an entirely separate allocator altogether, some Rust +programs may want to simply instrument allocations or shim in additional +functionality (such as memory tracking statistics). This is currently quite +difficult to do, and would be accomodated with a custom allocation scheme. + +# Detailed design + +The high level design can be found [in this gist][gist], but this RFC intends to +expound on the idea to make it more concrete in terms of what the compiler +implementation will look like. A [sample implementaiton][impl] is available of +this section. + +[gist]: https://gist.github.com/alexcrichton/41c6aad500e56f49abda +[impl]: https://github.com/alexcrichton/rust/tree/less-jemalloc + +### High level design + +The design of this RFC from 10,000 feet (referred to below), which was +[previously outlined][gist] looks like: + +1. Define a set of symbols which correspond to the APIs specified in + `alloc::heap`. The `liballoc` library will call these symbols directly. + Note that this means that each of the symbols take information like the size + of allocations and such. +2. Create two shim libraries which implement these allocation-related functions. + Each shim is shipped with the compiler in the form of a static library. One + shim will redirect to the system allocator, the other shim will bundle a + jemalloc build along with Rust shims to redirect to jemalloc. +3. Intermediate artifacts (rlibs) do not resolve this dependency, they're just + left dangling. +4. When producing a "final artifact", rustc by default links in one of two + shims: + * If we're producing a staticlib or a dylib, link the system shim. + * If we're producing an exe and all dependencies are rlibs link the + jemalloc shim. + +The final link step will be optional, and one could link in any compliant +allocator at that time if so desired. 
+ +### New Attributes + +Two new **unstable** attributes will be added to the compiler: + +* `#![needs_allocator]` indicates that a library requires the "allocation + symbols" to link successfully. This attribute will be attached to `liballoc` + and no other library should need to be tagged as such. Additionally, most + crates don't need to worry about this attribute as they'll transitively link + to liballoc. +* `#![allocator]` indicates that a crate is an allocator crate. This is + currently also used for tagging FFI functions as an "allocation function" + to leverage more LLVM optimizations as well. + +All crates implementing the Rust allocation API must be tagged with +`#![allocator]` to get properly recognized and handled. + +### New Crates + +Two new **unstable** crates will be added to the standard distribution: + +* `alloc_system` is a crate that will be tagged with `#![allocator]` and will + redirect allocation requests to the system allocator. +* `alloc_jemalloc` is another allocator crate that will bundle a static copy of + jemalloc to redirect allocations to. + +Both crates will be available to link to manually, but they will not be +available in stable Rust to start out. + +### Allocation functions + +Each crate tagged `#![allocator]` is expected to provide the full suite of +allocation functions used by Rust, defined as: + +```rust +extern { + fn __rust_allocate(size: usize, align: usize) -> *mut u8; + fn __rust_deallocate(ptr: *mut u8, old_size: usize, align: usize); + fn __rust_reallocate(ptr: *mut u8, old_size: usize, size: usize, + align: usize) -> *mut u8; + fn __rust_reallocate_inplace(ptr: *mut u8, old_size: usize, size: usize, + align: usize) -> usize; + fn __rust_usable_size(size: usize, align: usize) -> usize; +} +``` + +The exact API of all these symbols is considered **unstable** (hence the +leading `__`). This otherwise currently maps to what `liballoc` expects today. +The compiler will not currently typecheck `#![allocator]` crates to ensure +these symbols are defined and have the correct signature. + +Also note that to define the above API in a Rust crate it would look something +like: + +```rust +#[no_mangle] +pub extern fn __rust_allocate(size: usize, align: usize) -> *mut u8 { + /* ... */ +} +``` + +### Limitations of `#![allocator]` + +Allocator crates (those tagged with `#![allocator]`) are not allowed to +transitively depend on a crate which is tagged with `#![needs_allocator]`. This +would introduce a circular dependency which is difficult to link and is highly +likely to otherwise just lead to infinite recursion. + +The compiler will also not immediately verify that crates tagged with +`#![allocator]` do indeed define an appropriate allocation API, and vice versa +if a crate defines an allocation API the compiler will not verify that it is +tagged with `#![allocator]`. This means that the only meaning `#![allocator]` +has to the compiler is to signal that the default allocator should not be +linked. + +### Default allocator specifications + +Target specifications will be extended with two keys: `lib_allocation_crate` +and `exe_allocation_crate`, describing the default allocator crate for these +two kinds of artifacts for each target. The compiler will by default have all +targets redirect to `alloc_system` for both scenarios, but `alloc_jemalloc` will +be used for binaries on OSX, Bitrig, DragonFly, FreeBSD, Linux, OpenBSD, and GNU +Windows. MSVC will notably **not** use jemalloc by default for binaries (we +don't currently build jemalloc on MSVC). 
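+
+Putting the pieces above together, a third-party allocator crate might look
+roughly like the following sketch. It forwards to the C allocator through the
+`libc` crate, assumes `malloc`'s default alignment is sufficient, and uses an
+assumed feature-gate name; the precise contract of these symbols remains
+unstable.
+
+```rust
+#![feature(allocator)] // feature-gate name assumed for this sketch
+#![allocator]
+#![no_std]
+
+extern crate libc;
+
+#[no_mangle]
+pub extern fn __rust_allocate(size: usize, _align: usize) -> *mut u8 {
+    // A real implementation must honor `align` (e.g. via `posix_memalign`);
+    // this sketch assumes `malloc`'s guaranteed alignment is enough.
+    unsafe { libc::malloc(size as libc::size_t) as *mut u8 }
+}
+
+#[no_mangle]
+pub extern fn __rust_deallocate(ptr: *mut u8, _old_size: usize, _align: usize) {
+    unsafe { libc::free(ptr as *mut libc::c_void) }
+}
+
+#[no_mangle]
+pub extern fn __rust_reallocate(ptr: *mut u8, _old_size: usize, size: usize,
+                                _align: usize) -> *mut u8 {
+    unsafe { libc::realloc(ptr as *mut libc::c_void, size as libc::size_t) as *mut u8 }
+}
+
+#[no_mangle]
+pub extern fn __rust_reallocate_inplace(_ptr: *mut u8, old_size: usize,
+                                        _size: usize, _align: usize) -> usize {
+    old_size // plain C `realloc` cannot promise in-place growth
+}
+
+#[no_mangle]
+pub extern fn __rust_usable_size(size: usize, _align: usize) -> usize {
+    size
+}
+```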
+ +### Injecting an allocator + +As described above, the compiler will inject an allocator if necessary into the +current compilation. The compiler, however, cannot blindly do so as it can +easily lead to link errors (or worse, two allocators), so it will have some +heuristics for only injecting an allocator when necessary. The steps taken by +the compiler for any particular compilation will be: + +* If no crate in the dependency graph is tagged with `#![needs_allocator]`, then + the compiler does not inject an allocator. +* If only an rlib is being produced, no allocator is injected. +* If any crate tagged with `#[allocator]` has been explicitly linked to (e.g. + via an `extern crate` statement directly or transitively) then no allocator is + injected. +* If two allocators have been linked to explicitly an error is generated. +* If only a binary is being produced, then the target's `exe_allocation_crate` + value is injected, otherwise the `lib_allocation_crate` is injected. + +The compiler will also record that the injected crate is injected, so later +compilations know that rlibs don't actually require the injected crate at +runtime (allowing it to be overridden). + +### Allocators in practice + +Most libraries written in Rust wouldn't interact with the scheme proposed in +this RFC at all as they wouldn't explicitly link with an allocator and generally +are compiled as rlibs. If a Rust dynamic library is used as a dependency, then +its original choice of allocator is propagated throughout the crate graph, but +this rarely happens (except for the compiler itself, which will continue to use +jemalloc). + +Authors of crates which are embedded into other runtimes will start using the +system allocator by default with no extra annotation needed. If they wish to +funnel Rust allocations to the same source as the host application's allocations +then a crate can be written and linked in. + +Finally, providers of allocators will simply provide a crate to do so, and then +applications and/or libraries can make explicit use of the allocator by +depending on it as usual. + +# Drawbacks + +A significant amount of API surface area is being added to the compiler and +standard distribution as part of this RFC, but it is possible for it to all +enter as `#[unstable]`, so we can take our time stabilizing it and perhaps only +stabilize a subset over time. + +The limitation of an allocator crate not being able to link to the standard +library (or libcollections) may be a somewhat significant hit to the ergonomics +of defining an allocator, but allocators are traditionally a very niche class of +library and end up defining their own data structures regardless. + +Libraries on crates.io may accidentally link to an allocator and not actually +use any specific API from it (other than the standard allocation symbols), +forcing transitive dependants to silently use that allocator. + +This RFC does not specify the ability to swap out the allocator via the command +line, which is certainly possible and sometimes more convenient than modifying +the source itself. + +It's possible to define an allocator API (e.g. define the symbols) but then +forget the `#![allocator]` annotation, causing the compiler to wind up linking +two allocators, which may cause link errors that are difficult to debug. 
+
+# Alternatives
+
+The compiler's knowledge about allocators could be simplified quite a bit to the
+point where a compiler flag is used to just turn injection on/off, and then it's
+the responsibility of the application to define the necessary symbols if the
+flag is turned off. The current implementation of this RFC, however, is not seen
+as overly invasive and the benefits of "everything's just a crate" seem worth
+it for the mild amount of complexity in the compiler.
+
+Many of the names (such as `alloc_system`) have a number of alternatives, and
+the naming of attributes and functions could perhaps follow a stronger
+convention.
+
+# Unresolved questions
+
+Does this enable jemalloc to be built without a prefix on Linux? This would
+enable us to direct LLVM allocations to jemalloc, which would be quite nice!
+
+Should BSD-like systems use Rust's jemalloc by default? Many of them have
+jemalloc as the system allocator and even the special APIs we use from jemalloc.
diff --git a/text/1184-stabilize-no_std.md b/text/1184-stabilize-no_std.md
new file mode 100644
index 00000000000..6f2cbbb896c
--- /dev/null
+++ b/text/1184-stabilize-no_std.md
@@ -0,0 +1,160 @@
+- Feature Name: N/A
+- Start Date: 2015-06-26
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1184
+- Rust Issue: https://github.com/rust-lang/rust/issues/27394
+
+# Summary
+
+Tweak the `#![no_std]` attribute, add a new `#![no_core]` attribute, and
+pave the way for stabilizing the libcore library.
+
+# Motivation
+
+Currently all stable Rust programs must link to the standard library (libstd),
+and it is impossible to opt out of this. The standard library is not appropriate
+for use cases such as kernels, embedded development, or various niche cases
+in userspace. For these applications Rust itself is appropriate, but the
+compiler does not provide a stable interface for compiling in this mode.
+
+The standard distribution provides a library, libcore, which is "the essence of
+Rust" as it provides many language features such as iterators, slice methods,
+string methods, etc. The defining feature of libcore is that it has 0
+dependencies, unlike the standard library which depends on many I/O APIs, for
+example. The purpose of this RFC is to provide a stable method to access
+libcore.
+
+Applications which do not want to use libstd still want to use libcore 99% of
+the time, but unfortunately the current `#![no_std]` attribute does not do a
+great job in facilitating this. When moving into the realm of not using the
+standard library, the compiler should make the use case as ergonomic as
+possible, so this RFC proposes different behavior than today's `#![no_std]`.
+
+Finally, the standard library defines a number of language items which must be
+defined when libstd is not used. These language items are:
+
+* `panic_fmt`
+* `eh_personality`
+* `stack_exhausted`
+
+To be able to usefully leverage `#![no_std]` in stable Rust these lang items
+must be available in a stable fashion.
+
+# Detailed Design
+
+This RFC proposes a number of changes:
+
+* Tweak the `#![no_std]` attribute slightly.
+* Introduce a `#![no_core]` attribute.
+* Pave the way to stabilize the `core` module.
+
+## `no_std`
+
+The `#![no_std]` attribute currently provides two pieces of functionality:
+
+* The compiler no longer injects `extern crate std` at the top of a crate.
+* The prelude (`use std::prelude::v1::*`) is no longer injected at the top of
+  every module.
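+
+In other words, a libcore-only crate today has to do this wiring by hand, along
+the lines of the following sketch (the feature gate names and prelude path are
+assumptions of this example, reflecting nightly compilers at the time, not part
+of the proposal):
+
+```rust
+#![no_std]
+#![feature(no_std, core)] // gates required today; names assumed for illustration
+
+extern crate core;        // must currently be written explicitly
+
+use core::prelude::v1::*; // and the core prelude imported per module
+```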
+ +This RFC proposes adding the following behavior to the `#![no_std]` attribute: + +* The compiler will inject `extern crate core` at the top of a crate. +* The libcore prelude will be injected at the top of every module. + +Most uses of `#![no_std]` already want behavior along these lines as they want +to use libcore, just not the standard library. + +## `no_core` + +A new attribute will be added to the compiler, `#![no_core]`, which serves two +purposes: + +* This attribute implies the `#![no_std]` attribute (no std prelude/crate + injection). +* This attribute will prevent core prelude/crate injection. + +Users of `#![no_std]` today who do *not* use libcore would migrate to moving +this attribute instead of `#![no_std]`. + +## Stabilization of libcore + +This RFC does not yet propose a stabilization path for the contents of libcore, +but it proposes readying to stabilize the name `core` for libcore, paving the +way for the rest of the library to be stabilized. The exact method of +stabilizing its contents will be determined with a future RFC or pull requests. + +## Stabilizing lang items + +As mentioned above, there are three separate lang items which are required by +the libcore library to link correctly. These items are: + +* `panic_fmt` +* `stack_exhausted` +* `eh_personality` + +This RFC does **not** attempt to stabilize these lang items for a number of +reasons: + +* The exact set of these lang items is somewhat nebulous and may change over + time. +* The signatures of each of these lang items can either be platform-specific or + it's just "too weird" to stabilize. +* These items are pretty obscure and it's not very widely known what they do or + how they should be implemented. + +Stabilization of these lang items (in any form) will be considered in a future +RFC. + +# Drawbacks + +The current distribution provides precisely one library, the standard library, +for general consumption of Rust programs. Adding a new one (libcore) is adding +more surface area to the distribution (in addition to adding a new `#![no_core]` +attribute). This surface area is greatly desired, however. + +When using `#![no_std]` the experience of Rust programs isn't always the best as +there are some pitfalls that can be run into easily. For example, macros and +plugins sometimes hardcode `::std` paths, but most ones in the standard +distribution have been updated to use `::core` in the case that `#![no_std]` is +present. Another example is that common utilities like vectors, pointers, and +owned strings are not available without liballoc, which will remain an unstable +library. This means that users of `#![no_std]` will have to reimplement all of +this functionality themselves. + +This RFC does not yet pave a way forward for using `#![no_std]` and producing an +executable because the `#[start]` item is required, but remains feature gated. +This RFC just enables creation of Rust static or dynamic libraries which don't +depend on the standard library in addition to Rust libraries (rlibs) which do +not depend on the standard library. + +In stabilizing the `#![no_std]` attribute it's likely that a whole ecosystem of +crates will arise which work with `#![no_std]`, but in theory all of these +crates should also interoperate with the rest of the ecosystem using `std`. +Unfortunately, however, there are known cases where this is not possible. For +example if a macro is exported from a `#![no_std]` crate which references items +from `core` it won't work by default with a `std` library. 
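+
+As a hypothetical illustration of this caveat:
+
+```rust
+// In a `#![no_std]` library crate: a macro that names `::core` paths directly.
+#[macro_export]
+macro_rules! checked_div {
+    ($a:expr, $b:expr) => {
+        if $b == 0 {
+            ::core::option::Option::None
+        } else {
+            ::core::option::Option::Some($a / $b)
+        }
+    }
+}
+
+// A downstream crate that only links `std` will fail to resolve `::core` when
+// expanding `checked_div!` unless it also writes `extern crate core`.
+```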
+ +# Alternatives + +Most of the strategies taken in this RFC have some minor variations on what can +happen: + +* The `#![no_std]` attribute could be stabilized as-is without adding a + `#![no_core]` attribute, requiring users to write `extern crate core` and + import the core prelude manually. The burden of adding `#![no_core]` to the + compiler, however, is seen as not-too-bad compared to the increase in + ergonomics of using `#![no_std]`. +* Another stable crate could be provided by the distribution which provides + definitions of these lang items which are all wired to abort. This has the + downside of selecting a name for this crate, however, and also inflating the + crates in our distribution again. + +# Unresolved Questions + +* How important/common are `#![no_std]` executables? Should this RFC attempt to + stabilize that as well? +* When a staticlib is emitted should the compiler *guarantee* that a + `#![no_std]` one will link by default? This precludes us from ever adding + future require language items for features like unwinding or stack exhaustion + by default. For example if a new security feature is added to LLVM and we'd + like to enable it by default, it may require that a symbol or two is defined + somewhere in the compilation. diff --git a/text/1191-hir.md b/text/1191-hir.md new file mode 100644 index 00000000000..1c3c2dd3f87 --- /dev/null +++ b/text/1191-hir.md @@ -0,0 +1,81 @@ +- Feature Name: N/A +- Start Date: 2015-07-06 +- RFC PR: [rust-lang/rfcs#1191](https://github.com/rust-lang/rfcs/pull/1191) +- Rust Issue: N/A + + +# Summary + +Add a high-level intermediate representation (HIR) to the compiler. This is +basically a new (and additional) AST more suited for use by the compiler. + +This is purely an implementation detail of the compiler. It has no effect on the +language. + +Note that adding a HIR does not preclude adding a MIR or LIR in the future. + + +# Motivation + +Currently the AST is used by libsyntax for syntactic operations, by the compiler +for pretty much everything, and in syntax extensions. I propose splitting the +AST into a libsyntax version that is specialised for syntactic operation and +will eventually be stabilised for use by syntax extensions and tools, and the +HIR which is entirely internal to the compiler. + +The benefit of this split is that each AST can be specialised to its task and we +can separate the interface to the compiler (the AST) from its implementation +(the HIR). Specific changes I see that could happen are more ids and spans in +the AST, the AST adhering more closely to the surface syntax, the HIR becoming +more abstract (e.g., combining structs and enums), and using resolved names in +the HIR (i.e., performing name resolution as part of the AST->HIR lowering). + +Not using the AST in the compiler means we can work to stabilise it for syntax +extensions and tools: it will become part of the interface to the compiler. + +I also envisage all syntactic expansion of language constructs (e.g., `for` +loops, `if let`) moving to the lowering step from AST to HIR, rather than being +AST manipulations. That should make both error messages and tool support better +for such constructs. It would be nice to move lifetime elision to the lowering +step too, in order to make the HIR as explicit as possible. + + +# Detailed design + +Initially, the HIR will be an (almost) identical copy of the AST and the +lowering step will simply be a copy operation. Since some constructs (macros, +`for` loops, etc.) 
are expanded away in libsyntax, these will not be part of the
+HIR. Tools such as the AST visitor will need to be duplicated.
+
+The compiler will be changed to use the HIR throughout (this should mostly be a
+matter of changing the imports). Incrementally, I expect to move expansion of
+language constructs to the lowering step. Further in the future, the HIR should
+get more abstract and compact, and the AST should get closer to the surface
+syntax.
+
+
+# Drawbacks
+
+Potentially slower compilations and higher memory use. However, this should be
+offset in the long run by making improvements to the compiler easier by having a
+more appropriate data structure.
+
+
+# Alternatives
+
+Leave things as they are.
+
+Skip the HIR and lower straight to a MIR later in compilation. This has
+advantages which adding a HIR does not have, however, it is a far more complex
+refactoring and also misses some benefits of the HIR, notably being able to
+stabilise the AST for tools and syntax extensions without locking in the
+compiler.
+
+
+# Unresolved questions
+
+How to deal with spans and source code. We could keep the AST around and
+reference back to it from the HIR. Or we could copy span information to the HIR
+(I plan on doing this initially). Possibly some other solution like keeping the
+span info in a side table (note that we need less span info in the compiler than
+we do in libsyntax, which is in turn less than tools want).
diff --git a/text/1192-inclusive-ranges.md b/text/1192-inclusive-ranges.md
new file mode 100644
index 00000000000..a6f619d47a1
--- /dev/null
+++ b/text/1192-inclusive-ranges.md
@@ -0,0 +1,116 @@
+- Feature Name: inclusive_range_syntax
+- Start Date: 2015-07-07
+- RFC PR: [rust-lang/rfcs#1192](https://github.com/rust-lang/rfcs/pull/1192)
+- Rust Issue: [rust-lang/rust#28237](https://github.com/rust-lang/rust/issues/28237)
+
+# Summary
+
+Allow a `x...y` expression to create an inclusive range.
+
+# Motivation
+
+There are several use-cases for inclusive ranges that semantically
+include both end-points. For example, iterating from `0_u8` up to and
+including some number `n` can be done via `for _ in 0..n + 1` at the
+moment, but this will fail if `n` is `255`. Furthermore, some iterable
+things only have a successor operation that is sometimes sensible,
+e.g., `'a'..'{'` is equivalent to the inclusive range `'a'...'z'`:
+there's absolutely no reason that `{` is after `z` other than a quirk
+of the representation.
+
+The `...` syntax mirrors the current `..` used for exclusive ranges:
+more dots means more elements.
+
+# Detailed design
+
+`std::ops` defines
+
+```rust
+pub enum RangeInclusive<T> {
+    Empty {
+        at: T,
+    },
+    NonEmpty {
+        start: T,
+        end: T,
+    }
+}
+
+pub struct RangeToInclusive<T> {
+    pub end: T,
+}
+```
+
+Writing `a...b` in an expression desugars to
+`std::ops::RangeInclusive::NonEmpty { start: a, end: b }`. Writing `...b` in an
+expression desugars to `std::ops::RangeToInclusive { end: b }`.
+
+`RangeInclusive` implements the standard traits (`Clone`, `Debug`
+etc.), and implements `Iterator`. The `Empty` variant is to allow the
+`Iterator` implementation to work without hacks (see Alternatives).
+
+The use of `...` in a pattern remains as testing for inclusion
+within that range, *not* a struct match.
+
+The author cannot foresee problems with breaking backward
+compatibility. In particular, one tokenisation of syntax like `1...`
+now would be `1. ..` i.e. a floating point number on the left,
+however, fortunately, it is actually tokenised like `1 ...`, and is
+hence an error with the current compiler.
+
+# Drawbacks
+
+There's a mismatch between pattern-`...` and expression-`...`, in that
+the former doesn't undergo the same desugaring as the
+latter. (Although they represent essentially the same thing
+semantically.)
+
+The `...` vs. `..` distinction is the exact inversion of Ruby's syntax.
+
+Having an extra field in a language-level desugaring, catering to one
+library use-case is a little non-"hygienic". It is especially strange
+that the field isn't consistent across the different `...`
+desugarings.
+
+# Alternatives
+
+An alternate syntax could be used, like
+`..=`. [There has been discussion][discuss], but there wasn't a clear
+winner.
+
+[discuss]: https://internals.rust-lang.org/t/vs-for-inclusive-ranges/1539
+
+This RFC proposes single-ended syntax with only an end, `...b`, but not
+with only a start (`a...`) or unconstrained `...`. This balance could be
+reevaluated for usefulness and conflicts with other proposed syntax.
+
+The `Empty` variant could be omitted, leaving two options:
+
+- `RangeInclusive` could be a struct including a `finished` field.
+- `a...b` only implements `IntoIterator`, not `Iterator`, by
+  converting to a different type that does have the field. However,
+  this means that `a...b` behaves differently to `a..b`, so
+  `(a...b).map(|x| ...)` doesn't work (the `..` version of that is
+  used reasonably often, in the author's experience)
+- `a...b` can implement `Iterator` for types that can be stepped
+  backwards: the only problematic cases are things like
+  `x...255u8` where the endpoint is the last value in the type's
+  range. A naive implementation that just steps `x` and compares
+  against the second value will never terminate: it will yield 254
+  (final state: `255...255`), 255 (final state: `0...255`), 0 (final
+  state: `1...255`). I.e. it will wrap around because it has no way to
+  detect whether 255 has been yielded or not. However, implementations
+  of `Iterator` can detect cases like that, and, after yielding `255`,
+  backwards-step the second piece of state to `255...254`.
+
+  This means that `a...b` can only implement `Iterator` for types that
+  can be stepped backwards, which isn't always guaranteed, e.g. types
+  might not have a unique predecessor (walking along a DAG).
+
+# Unresolved questions
+
+None so far.
+
+# Amendments
+
+* In rust-lang/rfcs#1320, this RFC was amended to change the `RangeInclusive`
+  type from a struct with a `finished` field to an enum.
diff --git a/text/1193-cap-lints.md b/text/1193-cap-lints.md
new file mode 100644
index 00000000000..efac4c0689d
--- /dev/null
+++ b/text/1193-cap-lints.md
@@ -0,0 +1,109 @@
+- Feature Name: N/A
+- Start Date: 2015-07-07
+- RFC PR: [rust-lang/rfcs#1193](https://github.com/rust-lang/rfcs/pull/1193)
+- Rust Issue: [rust-lang/rust#27259](https://github.com/rust-lang/rust/issues/27259)
+
+# Summary
+
+Add a new flag to the compiler, `--cap-lints`, which sets the maximum possible
+lint level for the entire crate (and cannot be overridden). Cargo will then pass
+`--cap-lints allow` to all upstream dependencies when compiling code.
+
+# Motivation
+
+> Note: this RFC represents issue [#1029][issue]
+
+Currently any modification to a lint in the compiler is strictly speaking a
+breaking change. All crates are free to place `#![deny(warnings)]` at the top of
+their crate, turning any new warnings into compilation errors. This means that
+if a future version of Rust starts to emit new warnings it may fail to compile
+some previously written code (a breaking change).
+
+We would very much like to be able to modify lints, however. For example
+[rust-lang/rust#26473][pr] updated the `missing_docs` lint to also look for
+missing documentation on `const` items. This ended up [breaking some
+crates][term-pr] in the ecosystem due to their usage of
+`#![deny(missing_docs)]`.
+
+[issue]: https://github.com/rust-lang/rfcs/issues/1029
+[pr]: https://github.com/rust-lang/rust/pull/26473
+[term-pr]: https://github.com/rust-lang/term/pull/34
+
+The mechanism proposed in this RFC is aimed at providing a method to compile
+upstream dependencies in a way such that they are resilient to changes in the
+behavior of the standard lints in the compiler. A new lint warning or error will
+never represent a memory safety issue (otherwise it'd be a real error) so it
+should be safe to ignore any new instances of a warning that didn't show up
+before.
+
+# Detailed design
+
+There are two primary changes proposed by this RFC, the first of which is a new
+flag to the compiler:
+
+```
+    --cap-lints LEVEL   Set the maximum lint level for this compilation, cannot
+                        be overridden by other flags or attributes.
+```
+
+For example when `--cap-lints allow` is passed, all instances of `#[warn]`,
+`#[deny]`, and `#[forbid]` are ignored. If, however, `--cap-lints warn` is
+passed, only `deny` and `forbid` directives are ignored.
+
+The acceptable values for `LEVEL` will be `allow`, `warn`, `deny`, or `forbid`.
+
+The second change proposed is to have Cargo pass `--cap-lints allow` to all
+upstream dependencies. Cargo currently passes `-A warnings` to all upstream
+dependencies (allow all warnings by default), so this would just be guaranteeing
+that no lints could be fired for upstream dependencies.
+
+With these two pieces combined together it is now possible to modify lints in
+the compiler in a backwards compatible fashion. Modifications to existing lints
+to emit new warnings will not get triggered, and new lints will also be entirely
+suppressed **only for upstream dependencies**.
+
+## Cargo Backwards Compatibility
+
+This flag would be the first non-1.0 flag that Cargo would be passing to the
+compiler. This means that Cargo can no longer drive a 1.0 compiler, but only a
+1.N+ compiler which has the `--cap-lints` flag. To handle this discrepancy Cargo
+will detect whether `--cap-lints` is a valid flag to the compiler.
+
+Cargo already runs `rustc -vV` to learn about the compiler (e.g. a "unique
+string" that's opaque to Cargo) and it will instead start passing
+`rustc -vV --cap-lints allow` to the compiler. This will allow Cargo to
+simultaneously detect whether the flag is valid and learn about the version
+string. If this command fails and `rustc -vV` succeeds then Cargo will fall back
+to the old behavior of passing `-A warnings`.
+
+# Drawbacks
+
+This RFC adds surface area to the command line of the compiler with a relatively
+obscure option `--cap-lints`. The option will almost never be passed by anything
+other than Cargo, so having it show up here is a little unfortunate.
+
+Some crates may inadvertently rely on memory safety through lints, or otherwise
+very much not want lints to be turned off. For example, if a modification to a
+lint generated new warnings that caused an upstream dependency to fail to
+compile, it could represent a serious bug indicating the dependency needs to be
+updated.
+This system would paper over this issue by forcing compilation to succeed. This
+use case seems relatively rare, however, and lints are also perhaps not the best
+method to ensure the safety of a crate.
+
+Cargo may one day grow configuration to *not* pass this flag by default (e.g. go
+back to passing `-Awarnings` by default), which is yet again more expansion of
+API surface area.
+
+# Alternatives
+
+* Modifications to lints or additions to lints could be considered
+  backwards-incompatible changes.
+* The meaning of the `-A` flag could be reinterpreted as "this cannot be
+  overridden".
+* A new "meta lint" could be introduced to represent the maximum cap, for
+  example `-A everything`. This is semantically different enough from `-A foo`
+  that it seems worth having a new flag.
+
+# Unresolved questions
+
+None yet.
diff --git a/text/1194-set-recovery.md b/text/1194-set-recovery.md
new file mode 100644
index 00000000000..8a2e0a7e1ca
--- /dev/null
+++ b/text/1194-set-recovery.md
@@ -0,0 +1,106 @@
+- Feature Name: `set_recovery`
+- Start Date: 2015-07-08
+- RFC PR: [rust-lang/rfcs#1194](https://github.com/rust-lang/rfcs/pull/1194)
+- Rust Issue: [rust-lang/rust#28050](https://github.com/rust-lang/rust/issues/28050)
+
+# Summary
+
+Add element-recovery methods to the set types in `std`.
+
+# Motivation
+
+Sets are sometimes used as a cache keyed on a certain property of a type, but programs may need to
+access the type's other properties for efficiency or functionality. The sets in `std` do not expose
+their elements (by reference or by value), making this use-case impossible.
+
+Consider the following example:
+
+```rust
+use std::collections::HashSet;
+use std::hash::{Hash, Hasher};
+
+// The `Widget` type has two fields that are inseparable.
+#[derive(PartialEq, Eq, Hash)]
+struct Widget {
+    foo: Foo,
+    bar: Bar,
+}
+
+#[derive(PartialEq, Eq, Hash)]
+struct Foo(&'static str);
+
+#[derive(PartialEq, Eq, Hash)]
+struct Bar(u32);
+
+// Widgets are normally considered equal if all their corresponding fields are equal, but we would
+// also like to maintain a set of widgets keyed only on their `bar` field. To this end, we create a
+// new type with custom `{PartialEq, Hash}` impls.
+struct MyWidget(Widget);
+
+impl PartialEq for MyWidget {
+    fn eq(&self, other: &Self) -> bool { self.0.bar == other.0.bar }
+}
+
+impl Eq for MyWidget {}
+
+impl Hash for MyWidget {
+    fn hash<H: Hasher>(&self, h: &mut H) { self.0.bar.hash(h); }
+}
+
+fn main() {
+    // In our program, users are allowed to interactively query the set of widgets according to
+    // their `bar` field, as well as insert, replace, and remove widgets.
+
+    let mut widgets = HashSet::new();
+
+    // Add some default widgets.
+    widgets.insert(MyWidget(Widget { foo: Foo("iron"), bar: Bar(1) }));
+    widgets.insert(MyWidget(Widget { foo: Foo("nickel"), bar: Bar(2) }));
+    widgets.insert(MyWidget(Widget { foo: Foo("copper"), bar: Bar(3) }));
+
+    // At this point, the user enters commands and receives output like:
+    //
+    // ```
+    // > get 1
+    // Some(iron)
+    // > get 4
+    // None
+    // > remove 2
+    // removed nickel
+    // > add 2 cobalt
+    // added cobalt
+    // > add 3 zinc
+    // replaced copper with zinc
+    // ```
+    //
+    // However, `HashSet` does not expose its elements via its `{contains, insert, remove}`
+    // methods, instead providing only a boolean indicator of the element's presence in the set,
+    // preventing us from implementing the desired functionality.
+} +``` + +# Detailed design + +Add the following element-recovery methods to `std::collections::{BTreeSet, HashSet}`: + +```rust +impl Set { + // Like `contains`, but returns a reference to the element if the set contains it. + fn get(&self, element: &Q) -> Option<&T>; + + // Like `remove`, but returns the element if the set contained it. + fn take(&mut self, element: &Q) -> Option; + + // Like `insert`, but replaces the element with the given one and returns the previous element + // if the set contained it. + fn replace(&mut self, element: T) -> Option; +} +``` + +# Drawbacks + +This complicates the collection APIs. + +# Alternatives + +Do nothing. diff --git a/text/1199-simd-infrastructure.md b/text/1199-simd-infrastructure.md new file mode 100644 index 00000000000..aa71c3b4665 --- /dev/null +++ b/text/1199-simd-infrastructure.md @@ -0,0 +1,423 @@ +- Feature Name: repr_simd, platform_intrinsics, cfg_target_feature +- Start Date: 2015-06-02 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1199 +- Rust Issue: https://github.com/rust-lang/rust/issues/27731 + +# Summary + +Lay the ground work for building powerful SIMD functionality. + +# Motivation + +SIMD (Single-Instruction Multiple-Data) is an important part of +performant modern applications. Most CPUs used for that sort of task +provide dedicated hardware and instructions for operating on multiple +values in a single instruction, and exposing this is an important part +of being a low-level language. + +This RFC lays the ground-work for building nice SIMD functionality, +but doesn't fill everything out. The goal here is to provide the raw +types and access to the raw instructions on each platform. + +(An earlier variant of this RFC was discussed as a +[pre-RFC](https://internals.rust-lang.org/t/pre-rfc-simd-groundwork/2343).) + +## Where does this code go? Aka. why not in `std`? + +This RFC is focused on building stable, powerful SIMD functionality in +external crates, not `std`. + +This makes it much easier to support functionality only "occasionally" +available with Rust's preexisting `cfg` system. There's no way for +`std` to conditionally provide an API based on the target features +used for the final artifact. Building `std` in every configuration is +certainly untenable. Hence, if it were to be in `std`, there would +need to be some highly delayed `cfg` system to support that sort of +conditional API exposure. + +With an external crate, we can leverage `cargo`'s existing build +infrastructure: compiling with some target features will rebuild with +those features enabled. + + +# Detailed design + +The design comes in three parts, all on the path to stabilisation: + +- types (`feature(repr_simd)`) +- operations (`feature(platform_intrinsics)`) +- platform detection (`feature(cfg_target_feature)`) + +The general idea is to avoid bad performance cliffs, so that an +intrinsic call in Rust maps to preferably one CPU instruction, or, if +not, the "optimal" sequence required to do the given operation +anyway. This means exposing a *lot* of platform specific details, +since platforms behave very differently: both across architecture +families (x86, x86-64, ARM, MIPS, ...), and even within a family +(x86-64's Skylake, Haswell, Nehalem, ...). + +There is definitely a common core of SIMD functionality shared across +many platforms, but this RFC doesn't try to extract that, it is just +building tools that can be wrapped into a more uniform API later. + + +## Types + +There is a new attribute: `repr(simd)`. 
+
+```rust
+#[repr(simd)]
+struct f32x4(f32, f32, f32, f32);
+
+#[repr(simd)]
+struct Simd2<T>(T, T);
+```
+
+The `simd` `repr` can be attached to a struct and will cause such a
+struct to be compiled to a SIMD vector. It can be generic, but it is
+required that any fully monomorphised instance of the type consist of
+only a single "primitive" type, repeated some number of times.
+
+The `repr(simd)` may not enforce that any trait bounds exist/do the
+right thing at the type checking level for generic `repr(simd)`
+types. As such, it will be possible to get the code-generator to error
+out (ala the old `transmute` size errors), however, this shouldn't
+cause problems in practice: libraries wrapping this functionality
+would layer type-safety on top (i.e. generic `repr(simd)` types would
+use some `unsafe` trait as a bound that is designed to only be
+implemented by types that will work).
+
+Adding `repr(simd)` to a type may increase its minimum/preferred
+alignment, based on platform behaviour. (E.g. x86 wants its 128-bit
+SSE vectors to be 128-bit aligned.)
+
+## Operations
+
+CPU vendors usually offer "standard" C headers for their CPU specific
+operations, such as [`arm_neon.h`][armneon] and [the `...mmintrin.h` headers for
+x86(-64)][x86].
+
+[armneon]: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
+[x86]: https://software.intel.com/sites/landingpage/IntrinsicsGuide
+
+All of these would be exposed as compiler intrinsics with names very
+similar to those that the vendor suggests (only difference would be
+some form of manual namespacing, e.g. prefixing with the CPU target),
+loadable via an `extern` block with an appropriate ABI. This subset of
+intrinsics would be on the path to stabilisation (that is, one can
+"import" them with `extern` in stable code), and would not be exported
+by `std`.
+
+Example:
+
+```rust
+extern "platform-intrinsic" {
+    fn x86_mm_abs_epi16(a: Simd8<i16>) -> Simd8<i16>;
+    // ...
+}
+```
+
+These all use entirely concrete types, and this is the core interface
+to these intrinsics: essentially it is just allowing code to exactly
+specify a CPU instruction to use. These intrinsics only actually work
+on a subset of the CPUs that Rust targets, and will result in compile
+time errors if they are called on platforms that do not support
+them. The signatures are typechecked, but in a "duck-typed" manner: it
+will just ensure that the types are SIMD vectors with the appropriate
+length and element type, it will not enforce a specific nominal type.
+
+NB. The structural typing is just for the declaration: if a SIMD intrinsic
+is declared to take a type `X`, it must always be called with `X`,
+even if other types are structurally equal to `X`. Also, within a
+signature, SIMD types that must be structurally equal must be nominally
+equal. I.e. if the `add_...` all refer to the same intrinsic to add a
+SIMD vector of bytes,
+
+```rust
+// (same length)
+struct A(u8, u8, ..., u8);
+struct B(u8, u8, ..., u8);
+
+extern "platform-intrinsic" {
+    fn add_aaa(x: A, y: A) -> A; // ok
+    fn add_bbb(x: B, y: B) -> B; // ok
+    fn add_aab(x: A, y: A) -> B; // error, expected B, found A
+    fn add_bab(x: B, y: A) -> B; // error, expected A, found B
+}
+
+fn double_a(x: A) -> A {
+    add_aaa(x, x)
+}
+fn double_b(x: B) -> B {
+    add_aaa(x, x) // error, expected A, found B
+}
+```
+
+There would additionally be a small set of cross-platform operations
+that are either generally efficiently supported everywhere or are
+extremely useful. These won't necessarily map to a single instruction,
+but will be shimmed as efficiently as possible.
+
+- shuffles and extracting/inserting elements
+- comparisons
+- arithmetic
+- conversions
+
+All of these intrinsics are imported via an `extern` directive similar
+to the process for pre-existing intrinsics like `transmute`, however,
+the SIMD operations are provided under a special ABI:
+`platform-intrinsic`. Use of this ABI (and hence the intrinsics) is
+initially feature-gated under the `platform_intrinsics` feature
+name. Why `platform-intrinsic` rather than say `simd-intrinsic`? There
+are non-SIMD platform-specific instructions that may be nice to expose
+(for example, Intel defines an `_addcarry_u32` intrinsic corresponding
+to the `ADC` instruction).
+
+### Shuffles & element operations
+
+One of the most powerful features of SIMD is the ability to rearrange
+data within vectors, giving super-linear speed-ups sometimes. As such,
+shuffles are exposed generally: intrinsics that represent arbitrary
+shuffles.
+
+This may violate the "one instruction per intrinsic" principle
+depending on the shuffle, but rearranging SIMD vectors is extremely
+useful, and providing a direct intrinsic lets the compiler (a) do the
+programmer's work in synthesising the optimal (short) sequence of
+instructions to get a given shuffle and (b) track data through
+shuffles without having to understand all the details of every
+platform specific intrinsic for shuffling.
+
+```rust
+extern "platform-intrinsic" {
+    fn simd_shuffle2<T, U>(v: T, w: T, idx: [i32; 2]) -> U;
+    fn simd_shuffle4<T, U>(v: T, w: T, idx: [i32; 4]) -> U;
+    fn simd_shuffle8<T, U>(v: T, w: T, idx: [i32; 8]) -> U;
+    fn simd_shuffle16<T, U>(v: T, w: T, idx: [i32; 16]) -> U;
+    // ...
+}
+```
+
+The raw definitions are only checked for validity at monomorphisation
+time, ensuring that `T` and `U` are SIMD vectors with the same element
+type, `U` has the appropriate length etc. Libraries can use traits to
+ensure that these will be enforced by the type checker too.
+
+This approach has similar type "safety"/code-generation errors to the
+vectors themselves.
+
+These operations are semantically:
+
+```rust
+// vector of double length
+let z = concat(v, w);
+
+return [z[idx[0]], z[idx[1]], z[idx[2]], ...]
+```
+
+The index array `idx` has to be a compile time constant. Out of bounds
+indices yield errors.
+
+Similarly, intrinsics for inserting/extracting elements into/out of
+vectors are provided, to allow modelling the SIMD vectors as actual
+CPU registers as much as possible:
+
+```rust
+extern "platform-intrinsic" {
+    fn simd_insert<T, Elem>(v: T, i0: u32, elem: Elem) -> T;
+    fn simd_extract<T, Elem>(v: T, i0: u32) -> Elem;
+}
+```
+
+The `i0` indices do not have to be constant. These are equivalent to
+`v[i0] = elem` and `v[i0]` respectively. They are type checked
+similarly to the shuffles.
+
+### Comparisons
+
+Comparisons are implemented via intrinsics. The raw signatures would
+look like:
+
+```rust
+extern "platform-intrinsic" {
+    fn simd_eq<T, U>(v: T, w: T) -> U;
+    fn simd_ne<T, U>(v: T, w: T) -> U;
+    fn simd_lt<T, U>(v: T, w: T) -> U;
+    fn simd_le<T, U>(v: T, w: T) -> U;
+    fn simd_gt<T, U>(v: T, w: T) -> U;
+    fn simd_ge<T, U>(v: T, w: T) -> U;
+}
+```
+
+These are type checked during code-generation similarly to the
+shuffles: ensuring that `T` and `U` have the same length, and that `U`
+is appropriately "boolean"-y. Libraries can use traits to ensure that
+these will be enforced by the type checker too.
+
+### Arithmetic
+
+Intrinsics will be provided for arithmetic operations like addition
+and multiplication.
+
+```rust
+extern "platform-intrinsic" {
+    fn simd_add<T>(x: T, y: T) -> T;
+    fn simd_mul<T>(x: T, y: T) -> T;
+    // ...
+}
+```
+
+These will have codegen time checks that the element type is correct:
+
+- `add`, `sub`, `mul`: any float or integer type
+- `div`: any float type
+- `and`, `or`, `xor`, `shl` (shift left), `shr` (shift right): any
+  integer type
+
+(The integer types are `i8`, ..., `i64`, `u8`, ..., `u64` and the
+float types are `f32` and `f64`.)
+
+### Why not inline asm?
+
+One alternative to providing intrinsics is to instead just use
+inline-asm to expose each CPU instruction. However, this approach has
+essentially only one benefit (avoiding defining the intrinsics), but
+several downsides, e.g.
+
+- assembly is generally a black-box to optimisers, inhibiting
+  optimisations, like algebraic simplification/transformation,
+- programmers would have to manually synthesise the right sequence of
+  operations to achieve a given shuffle, while having a generic
+  shuffle intrinsic lets the compiler do it (NB. the intention is that
+  the programmer will still have access to the platform specific
+  operations for when the compiler synthesis isn't quite right),
+- inline assembly is not currently stable in
+  Rust and there's not a strong push for it to be so in the immediate
+  future (although this could change).
+
+Benefits of manual assembly writing, like instruction scheduling and
+register allocation, don't apply to the (generally) one-instruction
+`asm!` blocks that replace the intrinsics (they need to be designed so
+that the compiler has full control over register allocation, or else
+the result will be strictly worse). Those possible advantages of hand
+written assembly over intrinsics only come into play when writing
+longer blocks of raw assembly, i.e. some inner loop might be faster
+when written as a single chunk of asm rather than as intrinsics.
+
+## Platform Detection
+
+The availability of efficient SIMD functionality is very fine-grained,
+and our current `cfg(target_arch = "...")` is not precise enough. This
+RFC proposes a `target_feature` `cfg`, that would be set to the
+features of the architecture that are known to be supported by the
+exact target, e.g.
+
+- a default x86-64 compilation would essentially only set
+  `target_feature = "sse"` and `target_feature = "sse2"`
+- compiling with `-C target-feature="+sse4.2"` would set
+  `target_feature = "sse4.2"`, `target_feature = "sse4.1"`, ...,
+  `target_feature = "sse"`.
+- compiling with `-C target-cpu=native` on a modern CPU might set
+  `target_feature = "avx2"`, `target_feature = "avx"`, ...
+
+The possible values of `target_feature` will be a selected whitelist,
+not necessarily just everything LLVM understands. (There are other
+non-SIMD features that might have `target_feature`s set too, such as
+`popcnt` and `rdrnd` on x86/x86-64.)
+
+With a `cfg_if!` macro that expands to the first `cfg` that is
+satisfied (ala [@alexcrichton's `cfg-if`][cfg-if]), code might look
+like:
+
+[cfg-if]: https://crates.io/crates/cfg-if
+
+```rust
+cfg_if!
{ + if #[cfg(target_feature = "avx")] { + fn foo() { /* use AVX things */ } + } else if #[cfg(target_feature = "sse4.1")] { + fn foo() { /* use SSE4.1 things */ } + } else if #[cfg(target_feature = "sse2")] { + fn foo() { /* use SSE2 things */ } + } else if #[cfg(target_feature = "neon")] { + fn foo() { /* use NEON things */ } + } else { + fn foo() { /* universal fallback */ } + } +} +``` + +# Extensions + +- scatter/gather operations allow (partially) operating on a SIMD + vector of pointers. This would require allowing + pointers(/references?) in `repr(simd)` types. +- allow (and ignore for everything but type checking) zero-sized types + in `repr(simd)` structs, to allow tagging them with markers +- the shuffle intrinsics could be made more relaxed in their type + checking (i.e. not require that they return their second type + parameter), to allow more type safety when combined with generic + simd types: + + #[repr(simd)] struct Simd2(T, T); + extern "platform-intrinsic" { + fn simd_shuffle2(x: T, y: T, idx: [u32; 2]) -> Simd2; + } + + This should be a backwards-compatible generalisation. + +# Alternatives + +- Intrinsics could instead by namespaced by ABI, `extern + "x86-intrinsic"`, `extern "arm-intrinsic"`. +- There could be more syntactic support for shuffles, either with true + syntax, or with a syntax extension. The latter might look like: + `shuffle![x, y, i0, i1, i2, i3, i4, ...]`. However, this requires + that shuffles are restricted to a single type only (i.e. `Simd4` + can be shuffled to `Simd4` but nothing else), or some sort of + type synthesis. The compiler has to somehow work out the return + value: + + ```rust + let x: Simd4 = ...; + let y: Simd4 = ...; + + // reverse all the elements. + let z = shuffle![x, y, 7, 6, 5, 4, 3, 2, 1, 0]; + ``` + + Presumably `z` should be `Simd8`, but it's not obvious how the + compiler can know this. The `repr(simd)` approach means there may be + more than one SIMD-vector type with the `Simd8` shape (or, in + fact, there may be zero). +- With type-level integers, there could be one shuffle intrinsic: + + fn simd_shuffle(x: T, y: T, idx: [u32; N]) -> U; + + NB. It is possible to add this as an additional intrinsic (possibly + deprecating the `simd_shuffleNNN` forms) later. +- Type-level values can be applied more generally: since the shuffle + indices have to be compile time constants, the shuffle could be + + fn simd_shuffle(x: T, y: T) -> U; + +- Instead of platform detection, there could be feature detection + (e.g. "platform supports something equivalent to x86's `DPPS`"), but + there probably aren't enough cross-platform commonalities for this + to be worth it. (Each "feature" would essentially be a platform + specific `cfg` anyway.) +- Check vector operators in debug mode just like the scalar versions. +- Make fixed length arrays `repr(simd)`-able (via just flattening), so + that, say, `#[repr(simd)] struct u32x4([u32; 4]);` and + `#[repr(simd)] struct f64x8([f64; 4], [f64; 4]);` etc works. This + will be most useful if/when we allow generic-lengths, `#[repr(simd)] + struct Simd([T; n]);` +- have 100% guaranteed type-safety for generic `#[repr(simd)]` types + and the generic intrinsics. This would probably require a relatively + complicated set of traits (with compiler integration). + +# Unresolved questions + +- Should integer vectors get division automatically? Most CPUs + don't support them for vectors. +- How should out-of-bounds shuffle and insert/extract indices be handled? 
diff --git a/text/1200-cargo-install.md b/text/1200-cargo-install.md new file mode 100644 index 00000000000..f51d8e2a4fd --- /dev/null +++ b/text/1200-cargo-install.md @@ -0,0 +1,263 @@ +- Feature Name: N/A +- Start Date: 2015-07-10 +- RFC PR: [rust-lang/rfcs#1200](https://github.com/rust-lang/rfcs/pull/1200) +- Rust Issue: N/A + +# Summary + +Add a new subcommand to Cargo, `install`, which will install `[[bin]]`-based +packages onto the local system in a Cargo-specific directory. + +# Motivation + +There has [almost always been a desire][cargo-37] to be able to install Cargo +packages locally, but it's been somewhat unclear over time what the precise +meaning of this is. Now that we have crates.io and lots of experience with +Cargo, however, the niche that `cargo install` would fill is much clearer. + +[cargo-37]: https://github.com/rust-lang/cargo/issues/37 + +Fundamentally, however, Cargo is a ubiquitous tool among the Rust community and +implementing `cargo install` would facilitate sharing Rust code among its +developers. Simple tasks like installing a new cargo subcommand, installing an +editor plugin, etc, would be just a `cargo install` away. Cargo can manage +dependencies and versions itself to make the process as seamless as possible. + +Put another way, enabling easily sharing code is one of Cargo's fundamental +design goals, and expanding into binaries is simply an extension of Cargo's core +functionality. + +# Detailed design + +The following new subcommand will be added to Cargo: + +``` +Install a crate onto the local system + +Installing new crates: + cargo install [options] + cargo install [options] [-p CRATE | --package CRATE] [--vers VERS] + cargo install [options] --git URL [--branch BRANCH | --tag TAG | --rev SHA] + cargo install [options] --path PATH + +Managing installed crates: + cargo install [options] --list + +Options: + -h, --help Print this message + -j N, --jobs N The number of jobs to run in parallel + --features FEATURES Space-separated list of features to activate + --no-default-features Do not build the `default` feature + --debug Build in debug mode instead of release mode + --bin NAME Only install the binary NAME + --example EXAMPLE Install the example EXAMPLE instead of binaries + -p, --package CRATE Install this crate from crates.io or select the + package in a repository/path to install. + -v, --verbose Use verbose output + --root Directory to install packages into + +This command manages Cargo's local set of install binary crates. Only packages +which have [[bin]] targets can be installed, and all binaries are installed into +`$HOME/.cargo/bin` by default (or `$CARGO_HOME/bin` if you change the home +directory). + +There are multiple methods of installing a new crate onto the system. The +`cargo install` command with no arguments will install the current crate (as +specifed by the current directory). Otherwise the `-p`, `--package`, `--git`, +and `--path` options all specify the source from which a crate is being +installed. The `-p` and `--package` options will download crates from crates.io. + +Crates from crates.io can optionally specify the version they wish to install +via the `--vers` flags, and similarly packages from git repositories can +optionally specify the branch, tag, or revision that should be installed. If a +crate has multiple binaries, the `--bin` argument can selectively install only +one of them, and if you'd rather install examples the `--example` argument can +be used as well. 
+ +The `--list` option will list all installed packages (and their versions). +``` + +## Installing Crates + +Cargo attempts to be as flexible as possible in terms of installing crates from +various locations and specifying what should be installed. All binaries will be +stored in a **cargo-local** directory, and more details on where exactly this is +located can be found below. + +Cargo will not attempt to install binaries or crates into system directories +(e.g. `/usr`) as that responsibility is intended for system package managers. + +To use installed crates one just needs to add the binary path to their `PATH` +environment variable. This will be recommended when `cargo install` is run if +`PATH` does not already look like it's configured. + +#### Crate Sources + +The `cargo install` command will be able to install crates from any source that +Cargo already understands. For example it will start off being able to install +from crates.io, git repositories, and local paths. Like with normal +dependencies, downloads from crates.io can specify a version, git repositories +can specify branches, tags, or revisions. + +#### Sources with multiple crates + +Sources like git repositories and paths can have multiple crates inside them, +and Cargo needs a way to figure out which one is being installed. If there is +more than one crate in a repo (or path), then Cargo will apply the following +heuristics to select a crate, in order: + +1. If the `-p` argument is specified, use that crate. +2. If only one crate has binaries, use that crate. +3. If only one crate has examples, use that crate. +4. Print an error suggesting the `-p` flag. + +#### Multiple binaries in a crate + +Once a crate has been selected, Cargo will by default build all binaries and +install them. This behavior can be modified with the `--bin` or `--example` +flags to configure what's installed on the local system. + +#### Building a Binary + +The `cargo install` command has some standard build options found on `cargo +build` and friends, but a key difference is that `--release` is the default for +installed binaries so a `--debug` flag is present to switch this back to +debug-mode. Otherwise the `--features` flag can be specified to activate various +features of the crate being installed. + +The `--target` option is omitted as `cargo install` is not intended for creating +cross-compiled binaries to ship to other platforms. + +#### Conflicting Crates + +Cargo will not namespace the installation directory for crates, so conflicts may +arise in terms of binary names. For example if crates A and B both provide a +binary called `foo` they cannot be both installed at once. Cargo will reject +these situations and recommend that a binary is selected via `--bin` or the +conflicting crate is uninstalled. + +#### Placing output artifacts + +The `cargo install` command can be customized where it puts its output artifacts +to install packages in a custom location. The root directory of the installation +will be determined in a hierarchical fashion, choosing the first of the +following that is specified: + +1. The `--root` argument on the command line. +2. The environment variable `CARGO_INSTALL_ROOT`. +3. The `install.root` configuration option. +4. The value of `$CARGO_HOME` (also determined in an independent and + hierarchical fashion). + +Once the root directory is found, Cargo will place all binaries in the +`$INSTALL_ROOT/bin` folder. 
Cargo will also reserve the right to retain some +metadata in this folder in order to keep track of what's installed and what +binaries belong to which package. + +## Managing Installations + +If Cargo gives access to installing packages, it should surely provide the +ability to manage what's installed! The first part of this is just discovering +what's installed, and this is provided via `cargo install --list`. + +## Removing Crates + +To remove an installed crate, another subcommand will be added to Cargo: + +``` +Remove a locally installed crate + +Usage: + cargo uninstall [options] SPEC + +Options: + -h, --help Print this message + --bin NAME Only uninstall the binary NAME + --example EXAMPLE Only uninstall the example EXAMPLE + -v, --verbose Use verbose output + +The argument SPEC is a package id specification (see `cargo help pkgid`) to +specify which crate should be uninstalled. By default all binaries are +uninstalled for a crate but the `--bin` and `--example` flags can be used to +only uninstall particular binaries. +``` + +Cargo won't remove the source for uninstalled crates, just the binaries that +were installed by Cargo itself. + +## Non-binary artifacts + +Cargo will not currently attempt to manage anything other than a binary artifact +of `cargo build`. For example the following items will not be available to +installed crates: + +* Dynamic native libraries built as part of `cargo build`. +* Native assets such as images not included in the binary itself. +* The source code is not guaranteed to exist, and the binary doesn't know where + the source code is. + +Additionally, Cargo will not immediately provide the ability to configure the +installation stage of a package. There is often a desire for a "pre-install +script" which runs various house-cleaning tasks. This is left as a future +extension to Cargo. + +# Drawbacks + +Beyond the standard "this is more surface area" and "this may want to +aggressively include more features initially" concerns there are no known +drawbacks at this time. + +# Alternatives + +### System Package Managers + +The primary alternative to putting effort behind `cargo install` is to instead +put effort behind system-specific package managers. For example the line between +a system package manager and `cargo install` is a little blurry, and the +"official" way to distribute a package should in theory be through a system +package manager. This also has the upside of benefiting those outside the Rust +community as you don't have to have Cargo installed to manage a program. This +approach is not without its downsides, however: + +* There are *many* system package managers, and it's unclear how much effort it + would be for Cargo to support building packages for all of them. +* Actually preparing a package for being packaged in a system package manager + can be quite onerous and is often associated with a high amount of overhead. +* Even once a system package is created, it must be added to an online + repository in one form or another which is often different for each + distribution. + +All in all, even if Cargo invested effort in facilitating creation of system +packages, **the threshold for distribution a Rust program is still too high**. +If everything went according to plan it's just unfortunately inherently complex +to only distribute packages through a system package manager because of the +various requirements and how diverse they are. 
The `cargo install` command +provides a cross-platform, easy-to-use, if Rust-specific interface to installing +binaries. + +It is expected that all major Rust projects will still invest effort into +distribution through standard package managers, and Cargo will certainly have +room to help out with this, but it doesn't obsolete the need for +`cargo install`. + +### Installing Libraries + +Another possibility for `cargo install` is to not only be able to install +binaries, but also libraries. The meaning of this however, is pretty nebulous +and it's not clear that it's worthwhile. For example all Cargo builds will not +have access to these libraries (as Cargo retains control over dependencies). It +may mean that normal invocations of `rustc` have access to these libraries (e.g. +for small one-off scripts), but it's not clear that this is worthwhile enough to +support installing libraries yet. + +Another possible interpretation of installing libraries is that a developer is +informing Cargo that the library should be available in a pre-compiled form. If +any compile ends up using the library, then it can use the precompiled form +instead of recompiling it. This job, however, seems best left to `cargo build` +as it will automatically handle when the compiler version changes, for example. +It may also be more appropriate to add the caching layer at the `cargo build` +layer instead of `cargo install`. + +# Unresolved questions + +None yet diff --git a/text/1201-naked-fns.md b/text/1201-naked-fns.md new file mode 100644 index 00000000000..870e66dd7a7 --- /dev/null +++ b/text/1201-naked-fns.md @@ -0,0 +1,218 @@ +- Feature Name: `naked_fns` +- Start Date: 2015-07-10 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1201 +- Rust Issue: https://github.com/rust-lang/rust/issues/32408 + +# Summary + +Add support for generating naked (prologue/epilogue-free) functions via a new +function attribute. + +# Motivation + +Some systems programming tasks require that the programmer have complete control +over function stack layout and interpretation, generally in cases where the +compiler lacks support for a specific use case. While these cases can be +addressed by building the requisite code with external tools and linking with +Rust, it is advantageous to allow the Rust compiler to drive the entire process, +particularly in that code may be generated via monomorphization or macro +expansion. + +When writing interrupt handlers for example, most systems require additional +state be saved beyond the usual ABI requirements. To avoid corrupting program +state, the interrupt handler must save the registers which might be modified +before handing control to compiler-generated code. Consider a contrived +interrupt handler for x86\_64: + +```rust +unsafe fn isr_nop() { + asm!("push %rax" + /* Additional pushes elided */ :::: "volatile"); + let n = 0u64; + asm!("pop %rax" + /* Additional pops elided */ :::: "volatile"); +} +``` + +The generated assembly for this function might resemble the following +(simplified for readability): + +```x86 +isr_nop: + sub $8, %rsp + push %rax + movq $0, 0(%rsp) + pop %rax + add $8, %rsp + retq +``` + +Here the programmer's need to save machine state conflicts with the compiler's +assumption that it has complete control over stack layout, with the result that +the saved value of `rax` is clobbered by the compiler. 
Given that details of +stack layout for any given function are not predictable (and may change with +compiler version or optimization settings), attempting to predict the stack +layout to sidestep this issue is infeasible. + +When interacting with FFIs that are not natively supported by the compiler, +a similar situation arises where the programmer knows the expected calling +convention and can implement a translation between the foreign ABI and one +supported by the compiler. + +Support for naked functions also allows programmers to write functions that +would otherwise be unsafe, such as the following snippet which returns the +address of its caller when called with the C ABI on x86. + +``` + mov 4(%ebp), %eax + ret +``` + +--- + +Because the compiler depends on a function prologue and epilogue to maintain +storage for local variable bindings, it is generally unsafe to write anything +but inline assembly inside a naked function. The [LLVM language +reference](http://llvm.org/docs/LangRef.html#function-attributes) describes this +feature as having "very system-specific consequences", which the programmer must +be aware of. + +# Detailed design + +Add a new function attribute to the language, `#[naked]`, indicating the +function should have prologue/epilogue emission disabled. + +Because the calling convention of a naked function is not guaranteed to match +any calling convention the compiler is compatible with, calls to naked functions +from within Rust code are forbidden unless the function is also declared with +a well-defined ABI. + +Defining a naked function with the default (Rust) ABI is an error, because the +Rust ABI is unspecified and the programmer can never write a function which is +guaranteed to be compatible. For example, The function declaration of `foo` in +the following code block is an error. + +```rust +#[naked] +unsafe fn foo() { } +``` + +The following variant is not an error because the C calling convention is +well-defined and it is thus possible for the programmer to write a conforming +function: + +```rust +#[naked] +extern "C" fn foo() { } +``` + +--- + +Because the compiler cannot verify the correctness of code written in a naked +function (since it may have an unknown calling convention), naked functions must +be declared `unsafe` or contain no non-`unsafe` statements in the body. The +function `error` in the following code block is a compile-time error, whereas +the functions `correct1` and `correct2` are permitted. + +``` +#[naked] +extern "C" fn error(x: &mut u8) { + *x += 1; +} + +#[naked] +unsafe extern "C" fn correct1(x: &mut u8) { + *x += 1; +} + +#[naked] +extern "C" fn correct2(x: &mut u8) { + unsafe { + *x += 1; + } +} +``` + +## Example + +The following example illustrates the possible use of a naked function for +implementation of an interrupt service routine on 32-bit x86. + +```rust +use std::intrinsics; +use std::sync::atomic::{self, AtomicUsize, Ordering}; + +#[naked] +#[cfg(target_arch="x86")] +unsafe extern "C" fn isr_3() { + asm!("pushad + call increment_breakpoint_count + popad + iretd" :::: "volatile"); + intrinsics::unreachable(); +} + +static bp_count: AtomicUsize = ATOMIC_USIZE_INIT; + +#[no_mangle] +pub fn increment_breakpoint_count() { + bp_count.fetch_add(1, Ordering::Relaxed); +} + +fn register_isr(vector: u8, handler: unsafe extern "C" fn() -> ()) { /* ... */ } + +fn main() { + register_isr(3, isr_3); + // ... 
+} +``` + +## Implementation Considerations + +The current support for `extern` functions in `rustc` generates a minimum of two +basic blocks for any function declared in Rust code with a non-default calling +convention: a trampoline which translates the declared calling convention to the +Rust convention, and a Rust ABI version of the function containing the actual +implementation. Calls to the function from Rust code call the Rust ABI version +directly. + +For naked functions, it is impossible for the compiler to generate a Rust ABI +version of the function because the implementation may depend on the calling +convention. In cases where calling a naked function from Rust is permitted, the +compiler must be able to use the target calling convention directly rather than +call the same function with the Rust convention. + +# Drawbacks + +The utility of this feature is extremely limited to most users, and it might be +misused if the implications of writing a naked function are not carefully +considered. + +# Alternatives + +Do nothing. The required functionality for the use case outlined can be +implemented outside Rust code and linked in as needed. Support for additional +calling conventions could be added to the compiler as needed, or emulated with +external libraries such as `libffi`. + +# Unresolved questions + +It is easy to quietly generate wrong code in naked functions, such as by causing +the compiler to allocate stack space for temporaries where none were +anticipated. There is currently no restriction on writing Rust statements inside +a naked function, while most compilers supporting similar features either +require or strongly recommend that authors write only inline assembly inside +naked functions to ensure no code is generated that assumes a particular stack +layout. It may be desirable to place further restrictions on what statements are +permitted in the body of a naked function, such as permitting only `asm!` +statements. + +The `unsafe` requirement on naked functions may not be desirable in all cases. +However, relaxing that requirement in the future would not be a breaking change. + +Because a naked function may use a calling convention unknown to the compiler, +it may be useful to add a "unknown" calling convention to the compiler which is +illegal to call directly. Absent this feature, functions implementing an unknown +ABI would need to be declared with a calling convention which is known to be +incorrect and depend on the programmer to avoid calling such a function +incorrectly since it cannot be prevented statically. diff --git a/text/1210-impl-specialization.md b/text/1210-impl-specialization.md new file mode 100644 index 00000000000..44039c2ee24 --- /dev/null +++ b/text/1210-impl-specialization.md @@ -0,0 +1,2016 @@ +- Feature Name: specialization +- Start Date: 2015-06-17 +- RFC PR: [rust-lang/rfcs#1210](https://github.com/rust-lang/rfcs/pull/1210) +- Rust Issue: [rust-lang/rust#31844](https://github.com/rust-lang/rust/issues/31844) + +# Summary + +This RFC proposes a design for *specialization*, which permits multiple `impl` +blocks to apply to the same type/trait, so long as one of the blocks is clearly +"more specific" than the other. The more specific `impl` block is used in a case +of overlap. The design proposed here also supports refining default trait +implementations based on specifics about the types involved. 
+
+Altogether, this relatively small extension to the trait system yields benefits
+for performance and code reuse, and it lays the groundwork for an "efficient
+inheritance" scheme that is largely based on the trait system (described in a
+forthcoming companion RFC).
+
+# Motivation
+
+Specialization brings benefits along several different axes:
+
+* **Performance**: specialization expands the scope of "zero cost abstraction",
+  because specialized impls can provide custom high-performance code for
+  particular, concrete cases of an abstraction.
+
+* **Reuse**: the design proposed here also supports refining default (but
+  incomplete) implementations of a trait, given details about the types
+  involved.
+
+* **Groundwork**: the design lays the groundwork for supporting
+  ["efficient inheritance"](https://internals.rust-lang.org/t/summary-of-efficient-inheritance-rfcs/494)
+  through the trait system.
+
+The following subsections dive into each of these motivations in more detail.
+
+## Performance
+
+The simplest and most longstanding motivation for specialization is
+performance.
+
+To take a very simple example, suppose we add a trait for overloading the `+=`
+operator:
+
+```rust
+trait AddAssign<Rhs=Self> {
+    fn add_assign(&mut self, Rhs);
+}
+```
+
+It's tempting to provide an impl for any type that you can both `Clone` and
+`Add`:
+
+```rust
+impl<R, T: Add<R> + Clone> AddAssign<R> for T {
+    fn add_assign(&mut self, rhs: R) {
+        let tmp = self.clone() + rhs;
+        *self = tmp;
+    }
+}
+```
+
+This impl is especially nice because it means that you frequently don't have to
+bound separately by `Add` and `AddAssign`; often `Add` is enough to give you
+both operators.
+
+However, in today's Rust, such an impl would rule out any more specialized
+implementation that, for example, avoids the call to `clone`. That means there's
+a tension between simple abstractions and code reuse on the one hand, and
+performance on the other. Specialization resolves this tension by allowing both
+the blanket impl, and more specific ones, to coexist, using the specialized ones
+whenever possible (and thereby guaranteeing maximal performance).
+
+More broadly, traits today can provide static dispatch in Rust, but they can
+still impose an abstraction tax. For example, consider the `Extend` trait:
+
+```rust
+pub trait Extend<A> {
+    fn extend<T>(&mut self, iterable: T) where T: IntoIterator<Item=A>;
+}
+```
+
+Collections that implement the trait are able to insert data from arbitrary
+iterators. Today, that means that the implementation can assume nothing about
+the argument `iterable` that it's given except that it can be transformed into
+an iterator. That means the code must work by repeatedly calling `next` and
+inserting elements one at a time.
+
+But in specific cases, like extending a vector with a slice, a much more
+efficient implementation is possible -- and the optimizer isn't always capable
+of producing it automatically. In such cases, specialization can be used to get
+the best of both worlds: retaining the abstraction of `extend` while providing
+custom code for specific cases.
+
+The design in this RFC relies on multiple, overlapping trait impls, so to take
+advantage for `Extend` we need to refactor a bit:
+
+```rust
+pub trait Extend<A, T: IntoIterator<Item=A>> {
+    fn extend(&mut self, iterable: T);
+}
+
+// The generic implementation
+impl<A, T> Extend<A, T> for Vec<A> where T: IntoIterator<Item=A> {
+    // the `default` qualifier allows this method to be specialized below
+    default fn extend(&mut self, iterable: T) {
+        ... // implementation using push (like today's extend)
+    }
+}
+
+// A specialized implementation for slices
+impl<'a, A> Extend<A, &'a [A]> for Vec<A> {
+    fn extend(&mut self, iterable: &'a [A]) {
+        ... // implementation using ptr::write (like push_all)
+    }
+}
+```
+
+Other kinds of specialization are possible, including using marker traits like:
+
+```rust
+unsafe trait TrustedSizeHint {}
+```
+
+that can allow the optimization to apply to a broader set of types than slices,
+but are still more specific than `T: IntoIterator`.
+
+## Reuse
+
+Today's default methods in traits are pretty limited: they can assume only the
+`where` clauses provided by the trait itself, and there is no way to provide
+conditional or refined defaults that rely on more specific type information.
+
+For example, consider a different design for overloading `+` and `+=`, such that
+they are always overloaded together:
+
+```rust
+trait Add<Rhs=Self> {
+    type Output;
+    fn add(self, rhs: Rhs) -> Self::Output;
+    fn add_assign(&mut self, Rhs);
+}
+```
+
+In this case, there's no natural way to provide a default implementation of
+`add_assign`, since we do not want to restrict the `Add` trait to `Clone` data.
+
+The specialization design in this RFC also allows for *default impls*,
+which can provide specialized defaults without actually providing a
+full trait implementation:
+
+```rust
+// the `default` qualifier here means (1) not all items are impled
+// and (2) those that are can be further specialized
+default impl<T: Clone, R> Add<R> for T {
+    fn add_assign(&mut self, rhs: R) {
+        let tmp = self.clone() + rhs;
+        *self = tmp;
+    }
+}
+```
+
+This default impl does *not* mean that `Add` is implemented for all `Clone`
+data, but just that when you do impl `Add` and `Self: Clone`, you can leave off
+`add_assign`:
+
+```rust
+#[derive(Copy, Clone)]
+struct Complex {
+    // ...
+}
+
+impl Add<Complex> for Complex {
+    type Output = Complex;
+    fn add(self, rhs: Complex) {
+        // ...
+    }
+    // no fn add_assign necessary
+}
+```
+
+A particularly nice case of refined defaults comes from trait hierarchies: you
+can sometimes use methods from subtraits to improve default supertrait
+methods. For example, consider the relationship between `size_hint` and
+`ExactSizeIterator`:
+
+```rust
+default impl<T> Iterator for T where T: ExactSizeIterator {
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        (self.len(), Some(self.len()))
+    }
+}
+```
+
+## Supporting efficient inheritance
+
+Finally, specialization can be seen as a form of inheritance, since methods
+defined within a blanket impl can be overridden in a fine-grained way by a more
+specialized impl. As we will see, this analogy is a useful guide to the design
+of specialization. But it is more than that: the specialization design proposed
+here is specifically tailored to support "efficient inheritance" schemes (like
+those discussed
+[here](https://internals.rust-lang.org/t/summary-of-efficient-inheritance-rfcs/494))
+without adding an entirely separate inheritance mechanism.
+
+The key insight supporting this design is that virtual method definitions in
+languages like C++ and Java actually encompass two distinct mechanisms: virtual
+dispatch (also known as "late binding") and implementation inheritance. These
+two mechanisms can be separated and addressed independently; this RFC
+encompasses an "implementation inheritance" mechanism distinct from virtual
+dispatch, and useful in a number of other circumstances.
But it can be combined +nicely with an orthogonal mechanism for virtual dispatch to give a complete +story for the "efficient inheritance" goal that many previous RFCs targeted. + +The author is preparing a companion RFC showing how this can be done with a +relatively small further extension to the language. But it should be said that +the design in *this* RFC is fully motivated independently of its companion RFC. + +# Detailed design + +There's a fair amount of material to cover, so we'll start with a basic overview +of the design in intuitive terms, and then look more formally at a specification. + +At the simplest level, specialization is about allowing overlap between impl +blocks, so long as there is always an unambiguous "winner" for any type falling +into the overlap. For example: + +```rust +impl Debug for T where T: Display { + fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { + ::fmt(self, f) + } +} + +impl Debug for String { + fn fmt(&self, f: &mut Formatter) -> Result { + try!(write!(f, "\"")); + for c in self.chars().flat_map(|c| c.escape_default()) { + try!(write!(f, "{}", c)); + } + write!(f, "\"") + } +} +``` + +The idea for this pair of impls is that you can rest assured that *any* type +implementing `Display` will also implement `Debug` via a reasonable default, but +go on to provide more specific `Debug` implementations when warranted. In +particular, the intuition is that a `Self` type of `String` is somehow "more +specific" or "more concrete" than `T where T: Display`. + +The bulk of the detailed design is aimed at making this intuition more +precise. But first, we need to explore some problems that arise when you +introduce specialization in any form. + +## Hazard: interactions with type checking + +Consider the following, somewhat odd example of overlapping impls: + +```rust +trait Example { + type Output; + fn generate(self) -> Self::Output; +} + +impl Example for T { + type Output = Box; + fn generate(self) -> Box { Box::new(self) } +} + +impl Example for bool { + type Output = bool; + fn generate(self) -> bool { self } +} +``` + +The key point to pay attention to here is the difference in associated types: +the blanket impl uses `Box`, while the impl for `bool` just uses `bool`. +If we write some code that uses the above impls, we can get into trouble: + +```rust +fn trouble(t: T) -> Box { + Example::generate(t) +} + +fn weaponize() -> bool { + let b: Box = trouble(true); + *b +} +``` + +What's going on? When type checking `trouble`, the compiler has a type `T` about +which it knows nothing, and sees an attempt to employ the `Example` trait via +`Example::generate(t)`. Because of the blanket impl, this use of `Example` is +allowed -- but furthermore, the associated type found in the blanket impl is now +directly usable, so that `::Output` is known within `trouble` to +be `Box`, allowing `trouble` to type check. But during *monomorphization*, +`weaponize` will actually produce a version of the code that returns a boolean +instead, and then attempt to dereference that boolean. In other words, things +look different to the typechecker than they do to codegen. Oops. + +So what went wrong? It should be fine for the compiler to assume that `T: +Example` for all `T`, given the blanket impl. But it's clearly problematic to +*also* assume that the associated types will be the ones given by that blanket +impl. Thus, the "obvious" solution is just to generate a type error in `trouble` +by preventing it from assuming `::Output` is `Box`. 
+ +Unfortunately, this solution doesn't work. For one thing, it would be a breaking +change, since the following code *does* compile today: + +```rust +trait Example { + type Output; + fn generate(self) -> Self::Output; +} + +impl Example for T { + type Output = Box; + fn generate(self) -> Box { Box::new(self) } +} + +fn trouble(t: T) -> Box { + Example::generate(t) +} +``` + +And there are definitely cases where this pattern is important. To pick just one +example, consider the following impl for the slice iterator: + +```rust +impl<'a, T> Iterator for Iter<'a, T> { + type Item = &'a T; + // ... +} +``` + +It's essential that downstream code be able to assume that ` as +Iterator>::Item` is just `&'a T`, no matter what `'a` and `T` happen to be. + +Furthermore, it doesn't work to say that the compiler can make this kind of +assumption *unless* specialization is being used, since we want to allow +downstream crates to add specialized impls. We need to know up front. + +Another possibility would be to simply disallow specialization of associated +types. But the trouble described above isn't limited to associated types. Every +function/method in a trait has an implicit associated type that implements the +closure types, and similar bad assumptions about blanket impls can crop up +there. It's not entirely clear whether they can be weaponized, however. (That +said, it may be reasonable to stabilize only specialization of functions/methods +to begin with, and wait for strong use cases of associated type specialization +to emerge before stabilizing that.) + +The solution proposed in this RFC is instead to treat specialization of items in +a trait as a per-item *opt in*, described in the next section. + +## The `default` keyword + +Many statically-typed languages that allow refinement of behavior in some +hierarchy also come with ways to signal whether or not this is allowed: + +- C++ requires the `virtual` keyword to permit a method to be overridden in + subclasses. Modern C++ also supports `final` and `override` qualifiers. + +- C# requires the `virtual` keyword at definition and `override` at point of + overriding an existing method. + +- Java makes things silently virtual, but supports `final` as an opt out. + +Why have these qualifiers? Overriding implementations is, in a way, "action at a +distance". It means that the code that's actually being run isn't obvious when +e.g. a class is defined; it can change in subclasses defined +elsewhere. Requiring qualifiers is a way of signaling that this non-local change +is happening, so that you know you need to look more globally to understand the +actual behavior of the class. + +While impl specialization does not directly involve virtual dispatch, it's +closely-related to inheritance, and it allows some amount of "action at a +distance" (modulo, as we'll see, coherence rules). We can thus borrow directly +from these previous designs. + +This RFC proposes a "final-by-default" semantics akin to C++ that is +backwards-compatible with today's Rust, which means that the following +overlapping impls are prohibited: + +```rust +impl Example for T { + type Output = Box; + fn generate(self) -> Box { Box::new(self) } +} + +impl Example for bool { + type Output = bool; + fn generate(self) -> bool { self } +} +``` + +The error in these impls is that the first impl is implicitly defining "final" +versions of its items, which are thus not allowed to be refined in further +specializations. 
+ +If you want to allow specialization of an item, you do so via the `default` +qualifier *within the impl block*: + +```rust +impl Example for T { + default type Output = Box; + default fn generate(self) -> Box { Box::new(self) } +} + +impl Example for bool { + type Output = bool; + fn generate(self) -> bool { self } +} +``` + +Thus, when you're trying to understand what code is going to be executed, if you +see an impl that applies to a type and the relevant item is *not* marked +`default`, you know that the definition you're looking at is the one that will +apply. If, on the other hand, the item is marked `default`, you need to scan for +other impls that could apply to your type. The coherence rules, described below, +help limit the scope of this search in practice. + +This design optimizes for fine-grained control over when specialization is +permitted. It's worth pausing for a moment and considering some alternatives and +questions about the design: + +- **Why mark `default` on impls rather than the trait?** There are a few reasons + to have `default` apply at the impl level. First of all, traits are + fundamentally *interfaces*, while `default` is really about + *implementations*. Second, as we'll see, it's useful to be able to "seal off" + a certain avenue of specialization while leaving others open; doing it at the + trait level is an all-or-nothing choice. + +- **Why mark `default` on items rather than the entire impl?** Again, this is + largely about granularity; it's useful to be able to pin down part of an impl + while leaving others open for specialization. Furthermore, while this RFC + doesn't propose to do it, we could easily add a shorthand later on in which + `default impl Trait for Type` is sugar for adding `default` to all items in + the impl. + +- **Won't `default` be confused with default methods?** Yes! But usefully so: as + we'll see, in this RFC's design today's default methods become sugar for + tomorrow's specialization. + +Finally, how does `default` help with the hazards described above? Easy: an +associated type from a blanket impl must be treated "opaquely" if it's marked +`default`. That is, if you write these impls: + +```rust +impl Example for T { + default type Output = Box; + default fn generate(self) -> Box { Box::new(self) } +} + +impl Example for bool { + type Output = bool; + fn generate(self) -> bool { self } +} +``` + +then the function `trouble` will fail to typecheck: + +```rust +fn trouble(t: T) -> Box { + Example::generate(t) +} +``` + +The error is that `::Output` no longer normalizes to `Box`, +because the applicable blanket impl marks the type as `default`. The fact that +`default` is an opt in makes this behavior backwards-compatible. + +The main drawbacks of this solution are: + +- **API evolution**. Adding `default` to an associated type *takes away* some + abilities, which makes it a breaking change to a public API. (In principle, + this is probably true for functions/methods as well, but the breakage there is + theoretical at most.) However, given the design constraints discussed so far, + this seems like an inevitable aspect of any simple, backwards-compatible + design. + +- **Verbosity**. It's possible that certain uses of the trait system will result + in typing `default` quite a bit. This RFC takes a conservative approach of + introducing the keyword at a fine-grained level, but leaving the door open to + adding shorthands (like writing `default impl ...`) in the future, if need be. 
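To make the opt-in concrete, here is a small, self-contained sketch of the intended behavior; the `Describe` trait and its output strings are invented for illustration, and the example assumes a compiler with the proposed `specialization` feature gate enabled:

```rust
#![feature(specialization)]

trait Describe {
    fn describe(&self) -> String;
}

// Blanket impl: marking the item `default` opts it in to specialization.
impl<T> Describe for T {
    default fn describe(&self) -> String {
        String::from("something")
    }
}

// Specialized impl: overrides the `default` item for the more specific type.
impl Describe for String {
    fn describe(&self) -> String {
        format!("the string {:?}", self)
    }
}

fn main() {
    // Uses the blanket impl.
    assert_eq!(42u32.describe(), "something");
    // Uses the specialized impl.
    assert_eq!(String::from("hi").describe(), "the string \"hi\"");
}
```

Had the blanket impl left off `default`, the second impl would be rejected, exactly as described above.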
+ +## Overlapping impls and specialization + +### What is overlap? + +Rust today does not allow any "overlap" between impls. Intuitively, this means +that you cannot write two trait impls that could apply to the same "input" +types. (An input type is either `Self` or a type parameter of the trait). For +overlap to occur, the input types must be able to "unify", which means that +there's some way of instantiating any type parameters involved so that the input +types are the same. Here are some examples: + +```rust +trait Foo {} + +// No overlap: String and Vec cannot unify. +impl Foo for String {} +impl Foo for Vec {} + +// No overlap: Vec and Vec cannot unify because u16 and u8 cannot unify. +impl Foo for Vec {} +impl Foo for Vec {} + +// Overlap: T can be instantiated to String. +impl Foo for T {} +impl Foo for String {} + +// Overlap: Vec and Vec can unify because T can be instantiated to u8. +impl Foo for Vec {} +impl Foo for Vec + +// No overlap: String and Vec cannot unify, no matter what T is. +impl Foo for String {} +impl Foo for Vec {} + +// Overlap: for any T that is Clone, both impls apply. +impl Foo for Vec where T: Clone {} +impl Foo for Vec {} + +// No overlap: implicitly, T: Sized, and since !Foo: Sized, you cannot instantiate T with it. +impl Foo for Box {} +impl Foo for Box {} + +trait Trait1 {} +trait Trait2 {} + +// Overlap: nothing prevents a T such that T: Trait1 + Trait2. +impl Foo for T {} +impl Foo for T {} + +trait Trait3 {} +trait Trait4: Trait3 {} + +// Overlap: any T: Trait4 is covered by both impls. +impl Foo for T {} +impl Foo for T {} + +trait Bar {} + +// No overlap: *all* input types must unify for overlap to happen. +impl Bar for u8 {} +impl Bar for u8 {} + +// No overlap: *all* input types must unify for overlap to happen. +impl Bar for T {} +impl Bar for T {} + +// No overlap: no way to instantiate T such that T == u8 and T == u16. +impl Bar for T {} +impl Bar for u8 {} + +// Overlap: instantiate U as T. +impl Bar for T {} +impl Bar for U {} + +// No overlap: no way to instantiate T such that T == &'a T. +impl Bar for T {} +impl<'a, T> Bar<&'a T> for T {} + +// Overlap: instantiate T = &'a U. +impl Bar for T {} +impl<'a, T, U> Bar for &'a U where U: Bar {} +``` + +### Permitting overlap + +The goal of specialization is to allow overlapping impls, but it's not as simple +as permitting *all* overlap. There has to be a way to decide which of two +overlapping impls to actually use for a given set of input types. The simpler +and more intuitive the rule for deciding, the easier it is to write and reason +about code -- and since dispatch is already quite complicated, simplicity here +is a high priority. On the other hand, the design should support as many of the +motivating use cases as possible. + +The basic intuition we've been using for specialization is the idea that one +impl is "more specific" than another it overlaps with. Before turning this +intuition into a rule, let's go through the previous examples of overlap and +decide which, if any, of the impls is intuitively more specific. **Note that since +we're leaving out the body of the impls, you won't see the `default` keyword +that would be required in practice for the less specialized impls.** + +```rust +trait Foo {} + +// Overlap: T can be instantiated to String. +impl Foo for T {} +impl Foo for String {} // String is more specific than T + +// Overlap: Vec and Vec can unify because T can be instantiated to u8. 
+impl Foo for Vec {} +impl Foo for Vec // Vec is more specific than Vec + +// Overlap: for any T that is Clone, both impls apply. +impl Foo for Vec // "Vec where T: Clone" is more specific than "Vec for any T" + where T: Clone {} +impl Foo for Vec {} + +trait Trait1 {} +trait Trait2 {} + +// Overlap: nothing prevents a T such that T: Trait1 + Trait2 +impl Foo for T {} // Neither is more specific; +impl Foo for T {} // there's no relationship between the traits here + +trait Trait3 {} +trait Trait4: Trait3 {} + +// Overlap: any T: Trait4 is covered by both impls. +impl Foo for T {} +impl Foo for T {} // T: Trait4 is more specific than T: Trait3 + +trait Bar {} + +// Overlap: instantiate U as T. +impl Bar for T {} // More specific since both input types are identical +impl Bar for U {} + +// Overlap: instantiate T = &'a U. +impl Bar for T {} // Neither is more specific +impl<'a, T, U> Bar for &'a U + where U: Bar {} +``` + +What are the patterns here? + +- Concrete types are more specific than type variables, e.g.: + - `String` is more specific than `T` + - `Vec` is more specific than `Vec` +- More constraints lead to more specific impls, e.g.: + - `T: Clone` is more specific than `T` + - `Bar for T` is more specific than `Bar for U` +- Unrelated constraints don't contribute, e.g.: + - Neither `T: Trait1` nor `T: Trait2` is more specific than the other. + +For many purposes, the above simple patterns are sufficient for working with +specialization. But to provide a spec, we need a more general, formal way of +deciding precedence; we'll give one next. + +### Defining the precedence rules + +An impl block `I` contains basically two pieces of information relevant to +specialization: + +- A set of type variables, like `T, U` in `impl Bar for U`. + - We'll call this `I.vars`. +- A set of where clauses, like `T: Clone` in `impl Foo for Vec`. + - We'll call this `I.wc`. + +We're going to define a *specialization relation* `<=` between impl blocks, so +that `I <= J` means that impl block `I` is "at least as specific as" impl block +`J`. (If you want to think of this in terms of "size", you can imagine that the +set of types `I` applies to is no bigger than those `J` applies to.) + +We'll say that `I < J` if `I <= J` and `!(J <= I)`. In this case, `I` is *more +specialized* than `J`. + +To ensure specialization is coherent, we will ensure that for any two impls `I` +and `J` that overlap, we have either `I < J` or `J < I`. That is, one must be +truly more specific than the other. Specialization chooses the "smallest" impl +in this order -- and the new overlap rule ensures there is a unique smallest +impl among those that apply to a given set of input types. + +More broadly, while `<=` is not a total order on *all* impls of a given trait, +it will be a total order on any set of impls that all mutually overlap, which is +all we need to determine which impl to use. + +One nice thing about this approach is that, if there is an overlap without there +being an intersecting impl, the compiler can tell the programmer *precisely +which impl needs to be written* to disambiguate the overlapping portion. + +We'll start with an abstract/high-level formulation, and then build up toward an +algorithm for deciding specialization by introducing a number of building +blocks. + +#### Abstract formulation + +Recall that the +[input types](https://github.com/aturon/rfcs/blob/associated-items/active/0000-associated-items.md) +of a trait are the `Self` type and all trait type parameters. 
So the following +impl has input types `bool`, `u8` and `String`: + +```rust +trait Baz { .. } +// impl I +impl Baz for String { .. } +``` + +If you think of these input types as a tuple, `(bool, u8, String`) you can think +of each trait impl `I` as determining a set `apply(I)` of input type tuples that +obeys `I`'s where clauses. The impl above is just the singleton set `apply(I) = { (bool, +u8, String) }`. Here's a more interesting case: + +```rust +// impl J +impl Baz for U where T: Clone { .. } +``` + +which gives the set `apply(J) = { (T, u8, U) | T: Clone }`. + +Two impls `I` and `J` overlap if `apply(I)` and `apply(J)` intersect. + +**We can now define the specialization order abstractly**: `I <= J` if +`apply(I)` is a subset of `apply(J)`. + +This is true of the two sets above: + +``` +apply(I) = { (bool, u8, String) } + is a strict subset of +apply(J) = { (T, u8, U) | T: Clone } +``` + +Here are a few more examples. + +**Via where clauses**: + +```rust +// impl I +// apply(I) = { T | T a type } +impl Foo for T {} + +// impl J +// apply(J) = { T | T: Clone } +impl Foo for T where T: Clone {} + +// J < I +``` + +**Via type structure**: + +```rust +// impl I +// apply(I) = { (T, U) | T, U types } +impl Bar for U {} + +// impl J +// apply(J) = { (T, T) | T a type } +impl Bar for T {} + +// J < I +``` + +The same reasoning can be applied to all of the examples we saw earlier, and the +reader is encouraged to do so. We'll look at one of the more subtle cases here: + +```rust +// impl I +// apply(I) = { (T, T) | T any type } +impl Bar for T {} + +// impl J +// apply(J) = { (T, &'a U) | U: Bar, 'a any lifetime } +impl<'a, T, U> Bar for &'a U where U: Bar {} +``` + +The claim is that `apply(I)` and `apply(J)` intersect, but neither contains the +other. Thus, these two impls are not permitted to coexist according to this +RFC's design. (We'll revisit this limitation toward the end of the RFC.) + +#### Algorithmic formulation + +The goal in the remainder of this section is to turn the above abstract +definition of `<=` into something closer to an algorithm, connected to existing +mechanisms in the Rust compiler. We'll start by reformulating `<=` in a way that +effectively "inlines" `apply`: + +`I <= J` if: + +- For any way of instantiating `I.vars`, there is some way of instantiating + `J.vars` such that the `Self` type and trait type parameters match up. + +- For this instantiation of `I.vars`, if you assume `I.wc` holds, you can prove + `J.wc`. + +It turns out that the compiler is already quite capable of answering these +questions, via "unification" and "skolemization", which we'll see next. + +##### Unification: solving equations on types + +Unification is the workhorse of type inference and many other mechanisms in the +Rust compiler. You can think of it as a way of solving equations on types that +contain variables. For example, consider the following situation: + +```rust +fn use_vec(v: Vec) { .. } + +fn caller() { + let v = vec![0u8, 1u8]; + use_vec(v); +} +``` + +The compiler ultimately needs to infer what type to use for the `T` in `use_vec` +within the call in `caller`, given that the actual argument has type +`Vec`. You can frame this as a unification problem: solve the equation +`Vec = Vec`. Easy enough: `T = u8`! + +Some equations can't be solved. For example, if we wrote instead: + +```rust +fn caller() { + let s = "hello"; + use_vec(s); +} +``` + +we would end up equating `Vec = &str`. There's no choice of `T` that makes +that equation work out. Type error! 
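As a self-contained version of this inference example (the body of `use_vec` is an arbitrary placeholder), the successful unification compiles while the commented-out call does not:

```rust
fn use_vec<T>(v: Vec<T>) {
    let _ = v;
}

fn main() {
    let v = vec![0u8, 1u8];
    use_vec(v); // solves Vec<T> = Vec<u8>, so T = u8

    // let s = "hello";
    // use_vec(s); // would require Vec<T> = &str: no solution, so a type error
}
```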
+ +Unification often involves solving a series of equations between types +simultaneously, but it's not like high school algebra; the equations involved +all have the limited form of `type1 = type2`. + +One immediate way in which unification is relevant to this RFC is in determining +when two impls "overlap": roughly speaking, they overlap if each pair of input +types can be unified simultaneously. For example: + +```rust +// No overlap: String and bool do not unify +impl Foo for String { .. } +impl Foo for bool { .. } + +// Overlap: String and T unify +impl Foo for String { .. } +impl Foo for T { .. } + +// Overlap: T = U, T = V is trivially solvable +impl Bar for T { .. } +impl Bar for V { .. } + +// No overlap: T = u8, T = bool not solvable +impl Bar for T { .. } +impl Bar for bool { .. } +``` + +Note the difference in how *concrete types* and *type variables* work for +unification. When `T`, `U` and `V` are variables, it's fine to say that `T = U`, +`T = V` is solvable: we can make the impls overlap by instantiating all three +variables with the same type. But asking for e.g. `String = bool` fails, because +these are concrete types, not variables. (The same happens in algebra; consider +that `2 = 3` cannot be solved, but `x = y` and `y = z` can be.) This +distinction may seem obvious, but we'll next see how to leverage it in a +somewhat subtle way. + +##### Skolemization: asking forall/there exists questions + +We've already rephrased `<=` to start with a "for all, there exists" problem: + +- For any way of instantiating `I.vars`, there is some way of instantiating + `J.vars` such that the `Self` type and trait type parameters match up. + +For example: + +```rust +// impl I +impl Bar for T {} + +// impl J +impl Bar for V {} +``` + +For any choice of `T`, it's possible to choose a `U` and `V` such that the two +impls match -- just choose `U = T` and `V = T`. But the opposite isn't possible: +if `U` and `V` are different (say, `String` and `bool`), then no choice of `T` +will make the two impls match up. + +This feels similar to a unification problem, and it turns out we can solve it +with unification using a scary-sounding trick known as "skolemization". + +Basically, to "skolemize" a type variable is to treat it *as if it were a +concrete type*. So if `U` and `V` are skolemized, then `U = V` is unsolvable, in +the same way that `String = bool` is unsolvable. That's perfect for capturing +the "for any instantiation of I.vars" part of what we want to formalize. + +With this tool in hand, we can further rephrase the "for all, there exists" part +of `<=` in the following way: + +- After skolemizing `I.vars`, it's possible to unify `I` and `J`. + +Note that a successful unification through skolemization gives you the same +answer as you'd get if you unified without skolemizing. + +##### The algorithmic version + +One outcome of running unification on two impls as above is that we can +understand both impl headers in terms of a single set of type variables. For +example: + +```rust +// Before unification: +impl Bar for T where T: Clone { .. } +impl Bar for Vec where V: Debug { .. } + +// After unification: +// T = Vec +// U = Vec +// V = W +impl Bar> for Vec where Vec: Clone { .. } +impl Bar> for Vec where W: Debug { .. } +``` + +By putting everything in terms of a single set of type params, it becomes +possible to do things like compare the `where` clauses, which is the last piece +we need for a final rephrasing of `<=` that we can implement directly. 
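Before stating the final rule, the skolemize-then-unify step can be illustrated with a toy model. Everything below (the `Ty` representation, the names, and simplifications such as skipping an occurs check) is invented purely for illustration; it is not how the compiler represents types:

```rust
use std::collections::HashMap;

// A toy type language: variables, skolemized ("frozen") variables, and
// concrete type constructors like Vec<_> or u8.
#[derive(Clone)]
enum Ty {
    Var(&'static str),
    Skolem(&'static str),
    Ctor(&'static str, Vec<Ty>),
}

// Follow substitutions until we hit something that isn't a bound variable.
fn resolve(t: &Ty, subst: &HashMap<&'static str, Ty>) -> Ty {
    if let Ty::Var(v) = t {
        if let Some(bound) = subst.get(v) {
            return resolve(bound, subst);
        }
    }
    t.clone()
}

// Variables unify with anything; skolems behave like concrete types and only
// unify with themselves.
fn unify(a: &Ty, b: &Ty, subst: &mut HashMap<&'static str, Ty>) -> bool {
    match (resolve(a, subst), resolve(b, subst)) {
        (Ty::Var(v), t) | (t, Ty::Var(v)) => {
            subst.insert(v, t);
            true
        }
        (Ty::Skolem(x), Ty::Skolem(y)) => x == y,
        (Ty::Ctor(n1, args1), Ty::Ctor(n2, args2)) => {
            if n1 != n2 || args1.len() != args2.len() {
                return false;
            }
            for (x, y) in args1.iter().zip(args2.iter()) {
                if !unify(x, y, subst) {
                    return false;
                }
            }
            true
        }
        _ => false,
    }
}

fn main() {
    // Compare I = `impl<T> Bar<T> for T` with J = `impl<T, U> Bar<U> for T`,
    // looking at the (Self, type parameter) input types of each impl.

    // I <= J: skolemize I's variable, leave J's variables free. Succeeds.
    let mut subst = HashMap::new();
    let i_skolemized = [Ty::Skolem("T"), Ty::Skolem("T")];
    let j_free = [Ty::Var("T_j"), Ty::Var("U_j")];
    let i_le_j = i_skolemized
        .iter()
        .zip(j_free.iter())
        .all(|(x, y)| unify(x, y, &mut subst));
    println!("I <= J: {}", i_le_j); // true

    // J <= I: skolemize J's variables, leave I's variable free. Fails, because
    // two distinct skolems cannot both equal the single variable T.
    let mut subst = HashMap::new();
    let j_skolemized = [Ty::Skolem("T_j"), Ty::Skolem("U_j")];
    let i_free = [Ty::Var("T"), Ty::Var("T")];
    let j_le_i = j_skolemized
        .iter()
        .zip(i_free.iter())
        .all(|(x, y)| unify(x, y, &mut subst));
    println!("J <= I: {}", j_le_i); // false

    // A concrete impl has no variables to skolemize at all, so
    // `impl Foo for Vec<u8>` unifies into `impl<T> Foo for Vec<T>` via T = u8.
    let mut subst = HashMap::new();
    let vec_u8 = Ty::Ctor("Vec", vec![Ty::Ctor("u8", vec![])]);
    let vec_t = Ty::Ctor("Vec", vec![Ty::Var("T")]);
    println!("Vec<u8> <= Vec<T>: {}", unify(&vec_u8, &vec_t, &mut subst)); // true
}
```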
+ +Putting it all together, we'll say `I <= J` if: + +- After skolemizing `I.vars`, it's possible to unify `I` and `J`. +- Under the resulting unification, `I.wc` implies `J.wc` + +Let's look at a couple more examples to see how this works: + +```rust +trait Trait1 {} +trait Trait2 {} + +// Overlap: nothing prevents a T such that T: Trait1 + Trait2 +impl Foo for T {} // Neither is more specific; +impl Foo for T {} // there's no relationship between the traits here +``` + +In comparing these two impls in either direction, we make it past unification +and must try to prove that one where clause implies another. But `T: Trait1` +does not imply `T: Trait2`, nor vice versa, so neither impl is more specific +than the other. Since the impls do overlap, an ambiguity error is reported. + +On the other hand: + +```rust +trait Trait3 {} +trait Trait4: Trait3 {} + +// Overlap: any T: Trait4 is covered by both impls. +impl Foo for T {} +impl Foo for T {} // T: Trait4 is more specific than T: Trait3 +``` + +Here, since `T: Trait4` implies `T: Trait3` but not vice versa, we get + +```rust +impl Foo for T < impl Foo for T +``` + +##### Key properties + +Remember that for each pair of impls `I`, `J`, the compiler will check that +exactly one of the following holds: + +- `I` and `J` do not overlap (a unification check), or else +- `I < J`, or else +- `J < I` + +Recall also that if there is an overlap without there being an intersecting +impl, the compiler can tell the programmer *precisely which impl needs to be +written* to disambiguate the overlapping portion. + +Since `I <= J` ultimately boils down to a subset relationship, we get a lot of +nice properties for free (e.g., transitivity: if `I <= J <= K` then `I <= K`). +Together with the compiler check above, we know that at monomorphization time, +after filtering to the impls that apply to some concrete input types, there will +always be a unique, smallest impl in specialization order. (In particular, if +multiple impls apply to concrete input types, those impls must overlap.) + +There are various implementation strategies that avoid having to recalculate the +ordering during monomorphization, but we won't delve into those details in this +RFC. + +### Implications for coherence + +The coherence rules ensure that there is never an ambiguity about which impl to +use when monomorphizing code. Today, the rules consist of the simple overlap +check described earlier, and the "orphan" check which limits the crates in which +impls are allowed to appear ("orphan" refers to an impl in a crate that defines +neither the trait nor the types it applies to). The orphan check is needed, in +particular, so that overlap cannot be created accidentally when linking crates +together. + +The design in this RFC heavily revises the overlap check, as described above, +but does not propose any changes to the orphan check (which is described in +[a blog post](http://smallcultfollowing.com/babysteps/blog/2015/01/14/little-orphan-impls/)). Basically, +the change to the overlap check does not appear to change the cases in which +orphan impls can cause trouble. And a moment's thought reveals why: if two +sibling crates are unaware of each other, there's no way that they could each +provide an impl overlapping with the other, yet be sure that one of those impls +is more specific than the other in the overlapping region. + +### Interaction with lifetimes + +A hard constraint in the design of the trait system is that *dispatch cannot +depend on lifetime information*. 
In particular, we both cannot, and should not +allow specialization based on lifetimes: + +- We can't, because when the compiler goes to actually generate code ("trans"), + lifetime information has been erased -- so we'd have no idea what + specializations would soundly apply. + +- We shouldn't, because lifetime inference is subtle and would often lead to + counterintuitive results. For example, you could easily fail to get `'static` + even if it applies, because inference is choosing the smallest lifetime that + matches the other constraints. + +To be more concrete, here are some scenarios which should not be allowed: + +```rust +// Not allowed: trans doesn't know if T: 'static: +trait Bad1 {} +impl Bad1 for T {} +impl Bad1 for T {} + +// Not allowed: trans doesn't know if two refs have equal lifetimes: +trait Bad2 {} +impl Bad2 for T {} +impl<'a, T, U> Bad2<&'b U> for &'a T {} +``` + +But simply *naming* a lifetime that must exist, without *constraining* it, is fine: + +```rust +// Allowed: specializes based on being *any* reference, regardless of lifetime +trait Good {} +impl Good for T {} +impl<'a, T> Good for &'a T {} +``` + +In addition, it's okay for lifetime constraints to show up as long as +they aren't part of specialization: + +```rust +// Allowed: *all* impls impose the 'static requirement; the dispatch is happening +// purely based on `Clone` +trait MustBeStatic {} +impl MustBeStatic for T {} +impl MustBeStatic for T {} +``` + +#### Going down the rabbit hole + +Unfortunately, we cannot easily rule out the undesirable lifetime-dependent +specializations, because they can be "hidden" behind innocent-looking trait +bounds that can even cross crates: + +```rust +//////////////////////////////////////////////////////////////////////////////// +// Crate marker +//////////////////////////////////////////////////////////////////////////////// + +trait Marker {} +impl Marker for u32 {} + +//////////////////////////////////////////////////////////////////////////////// +// Crate foo +//////////////////////////////////////////////////////////////////////////////// + +extern crate marker; + +trait Foo { + fn foo(&self); +} + +impl Foo for T { + default fn foo(&self) { + println!("Default impl"); + } +} + +impl Foo for T { + fn foo(&self) { + println!("Marker impl"); + } +} + +//////////////////////////////////////////////////////////////////////////////// +// Crate bar +//////////////////////////////////////////////////////////////////////////////// + +extern crate marker; + +pub struct Bar(T); +impl marker::Marker for Bar {} + +//////////////////////////////////////////////////////////////////////////////// +// Crate client +//////////////////////////////////////////////////////////////////////////////// + +extern crate foo; +extern crate bar; + +fn main() { + // prints: Marker impl + 0u32.foo(); + + // prints: ??? + // the relevant specialization depends on the 'static lifetime + bar::Bar("Activate the marker!").foo(); +} +``` + +The problem here is that all of the crates in isolation look perfectly innocent. +The code in `marker`, `bar` and `client` is accepted today. It's only when these +crates are plugged together that a problem arises -- you end up with a +specialization based on a `'static` lifetime. And the `client` crate may not +even be aware of the existence of the `marker` crate. + +If we make this kind of situation a hard error, we could easily end up with a +scenario in which plugging together otherwise-unrelated crates is *impossible*. 
+ +#### Proposal: ask forgiveness, rather than permission + +So what do we do? There seem to be essentially two avenues: + +1. Be maximally permissive in the impls you can write, and then just ignore + lifetime information in dispatch. We can generate a warning when this is + happening, though in cases like the above, it may be talking about traits + that the client is not even aware of. The assumption here is that these + "missed specializations" will be extremely rare, so better not to impose a + burden on everyone to rule them out. + +2. Try, somehow, to prevent you from writing impls that appear to dispatch based + on lifetimes. The most likely way of doing that is to somehow flag a trait as + "lifetime-dependent". If a trait is lifetime-dependent, it can have + lifetime-sensitive impls (like ones that apply only to `'static` data), but + it cannot be used when writing specialized impls of another trait. + +The downside of (2) is that it's an additional knob that all trait authors have to +think about. That approach is sketched in more detail in the Alternatives section. + +What this RFC proposes is to follow approach (1), at least during the initial +experimentation phase. That's the easiest way to gain experience with +specialization and see to what extent lifetime-dependent specializations +accidentally arise in practice. If they are indeed rare, it seems much better to +catch them via a lint then to force the entire world of traits to be explicitly +split in half. + +To begin with, this lint should be an error by default; we want to get +feedback as to how often this is happening before any +stabilization. + +##### What this means for the programmer + +Ultimately, the goal of the "just ignore lifetimes for specialization" approach +is to reduce the number of knobs in play. The programmer gets to use both +lifetime bounds and specialization freely. + +The problem, of course, is that when using the two together you can get +surprising dispatch results: + +```rust +trait Foo { + fn foo(&self); +} + +impl Foo for T { + default fn foo(&self) { + println!("Default impl"); + } +} + +impl Foo for &'static str { + fn foo(&self) { + println!("Static string slice: {}", self); + } +} + +fn main() { + // prints "Default impl", but generates a lint saying that + // a specialization was missed due to lifetime dependence. + "Hello, world!".foo(); +} +``` + +Specialization is refusing to consider the second impl because it imposes +lifetime constraints not present in the more general impl. We don't know whether +these constraints hold when we need to generate the code, and we don't want to +depend on them because of the subtleties of region inference. But we alert the +programmer that this is happening via a lint. + +Sidenote: for such simple intracrate cases, we could consider treating the impls +themselves more aggressively, catching that the `&'static str` impl will never +be used and refusing to compile it. + +In the more complicated multi-crate example we saw above, the line + +```rust +bar::Bar("Activate the marker!").foo(); +``` + +would likewise print `Default impl` and generate a warning. In this case, the +warning may be hard for the `client` crate author to understand, since the trait +relevant for specialization -- `marker::Marker` -- belongs to a crate that +hasn't even been imported in `client`. Nevertheless, this approach seems +friendlier than the alternative (discussed in Alternatives). 
+ +#### An algorithm for ignoring lifetimes in dispatch + +Although approach (1) may seem simple, there are some subtleties in handling +cases like the following: + +```rust +trait Foo { ... } +impl Foo for T { ... } +impl Foo for T { ... } +``` + +In this "ignore lifetimes for specialization" approach, we still want the above +specialization to work, because *all* impls in the specialization family impose +the same lifetime constraints. The dispatch here purely comes down to `T: Clone` +or not. That's in contrast to something like this: + +```rust +trait Foo { ... } +impl Foo for T { ... } +impl Foo for T { ... } +``` + +where the difference between the impls includes a nontrivial lifetime constraint +(the `'static` bound on `T`). The second impl should effectively be dead code: +we should never dispatch to it in favor of the first impl, because that depends +on lifetime information that we don't have available in trans (and don't want to +rely on in general, due to the way region inference works). We would instead +lint against it (probably error by default). + +So, how do we tell these two scenarios apart? + +- First, we evaluate the impls normally, winnowing to a list of +applicable impls. + +- Then, we attempt to determine specialization. For any pair of applicable impls + `Parent` and `Child` (where `Child` specializes `Parent`), we do the + following: + + - Introduce as assumptions all of the where clauses of `Parent` + + - Attempt to prove that `Child` definitely applies, using these assumptions. + **Crucially**, we do this test in a special mode: lifetime bounds are only + considered to hold if they (1) follow from general well-formedness or (2) are + directly assumed from `Parent`. That is, a constraint in `Child` that `T: + 'static` has to follow either from some basic type assumption (like the type + `&'static T`) or from a similar clause in `Parent`. + + - If the `Child` impl cannot be shown to hold under these more stringent + conditions, then we have discovered a lifetime-sensitive specialization, and + can trigger the lint. + + - Otherwise, the specialization is valid. + +Let's do this for the two examples above. + +**Example 1** + +```rust +trait Foo { ... } +impl Foo for T { ... } +impl Foo for T { ... } +``` + +Here, if we think both impls apply, we'll start by assuming that `T: 'static` +holds, and then we'll evaluate whether `T: 'static` and `T: Clone` hold. The +first evaluation succeeds trivially from our assumption. The second depends on +`T`, as you'd expect. + +**Example 2** + +```rust +trait Foo { ... } +impl Foo for T { ... } +impl Foo for T { ... } +``` + +Here, if we think both impls apply, we start with no assumption, and then +evaluate `T: 'static` and `T: Clone`. We'll fail to show the former, because +it's a lifetime-dependent predicate, and we don't have any assumption that +immediately yields it. + +This should scale to less obvious cases, e.g. using `T: Any` rather than `T: +'static` -- because when trying to prove `T: Any`, we'll find we need to prove +`T: 'static`, and then we'll end up using the same logic as above. It also works +for cases like the following: + +```rust +trait SometimesDep {} + +impl SometimesDep for i32 {} +impl SometimesDep for T {} + +trait Spec {} +impl Spec for T {} +impl Spec for T {} +``` + +Using `Spec` on `i32` will not trigger the lint, because the specialization is +justified without any lifetime constraints. 
+ +## Default impls + +An interesting consequence of specialization is that impls need not (and in fact +sometimes *cannot*) provide all of the items that a trait specifies. Of course, +this is already the case with defaulted items in a trait -- but as we'll see, +that mechanism can be seen as just a way of using specialization. + +Let's start with a simple example: + +```rust +trait MyTrait { + fn foo(&self); + fn bar(&self); +} + +impl MyTrait for T { + default fn foo(&self) { ... } + default fn bar(&self) { ... } +} + +impl MyTrait for String { + fn bar(&self) { ... } +} +``` + +Here, we're acknowledging that the blanket impl has already provided definitions +for both methods, so the impl for `String` can opt to just re-use the earlier +definition of `foo`. This is one reason for the choice of the keyword `default`. +Viewed this way, items defined in a specialized impl are optional overrides of +those in overlapping blanket impls. + +And, in fact, if we'd written the blanket impl differently, we could *force* the +`String` impl to leave off `foo`: + +```rust +impl MyTrait for T { + // now `foo` is "final" + fn foo(&self) { ... } + + default fn bar(&self) { ... } +} +``` + +Being able to leave off items that are covered by blanket impls means that +specialization is close to providing a finer-grained version of defaulted items +in traits -- one in which the defaults can become ever more refined as more is +known about the input types to the traits (as described in the Motivation +section). But to fully realize this goal, we need one other ingredient: the +ability for the *blanket* impl itself to leave off some items. We do this by +using the `default` keyword at the `impl` level: + +```rust +trait Add { + type Output; + fn add(self, rhs: Rhs) -> Self::Output; + fn add_assign(&mut self, Rhs); +} + +default impl Add for T { + fn add_assign(&mut self, rhs: R) { + let tmp = self.clone() + rhs; + *self = tmp; + } +} +``` + +A subsequent overlapping impl of `Add` where `Self: Clone` can choose to leave +off `add_assign`, "inheriting" it from the partial impl above. + +A key point here is that, as the keyword suggests, a `partial` impl may be +incomplete: from the above code, you *cannot* assume that `T: Add` for any +`T: Clone`, because no such complete impl has been provided. + +Defaulted items in traits are just sugar for a default blanket impl: + +```rust +trait Iterator { + type Item; + fn next(&mut self) -> Option; + + fn size_hint(&self) -> (usize, Option) { + (0, None) + } + // ... +} + +// desugars to: + +trait Iterator { + type Item; + fn next(&mut self) -> Option; + fn size_hint(&self) -> (usize, Option); + // ... +} + +default impl Iterator for T { + fn size_hint(&self) -> (usize, Option) { + (0, None) + } + // ... +} +``` + +Default impls are somewhat akin to abstract base classes in object-oriented +languages; they provide some, but not all, of the materials needed for a fully +concrete implementation, and thus enable code reuse but cannot be used concretely. + +Note that the semantics of `default impls` and defaulted items in +traits is that both are implicitly marked `default` -- that is, both +are considered specializable. This choice gives a coherent mental +model: when you choose *not* to employ a default, and instead provide +your own definition, you are in effect overriding/specializing that +code. (Put differently, you can think of default impls as abstract base classes). + +There are a few important details to nail down with the design. 
This RFC +proposes starting with the conservative approach of applying the general overlap +rule to default impls, same as with complete ones. That ensures that there is +always a clear definition to use when providing subsequent complete impls. It +would be possible, though, to relax this constraint and allow *arbitrary* +overlap between default impls, requiring then whenever a complete impl overlaps +with them, *for each item*, there is either a unique "most specific" default +impl that applies, or else the complete impl provides its own definition for +that item. Such a relaxed approach is much more flexible, probably easier to +work with, and can enable more code reuse -- but it's also more complicated, and +backwards-compatible to add on top of the proposed conservative approach. + +## Limitations + +One frequent motivation for specialization is broader "expressiveness", in +particular providing a larger set of trait implementations than is possible +today. + +For example, the standard library currently includes an `AsRef` trait +for "as-style" conversions: + +```rust +pub trait AsRef where T: ?Sized { + fn as_ref(&self) -> &T; +} +``` + +Currently, there is also a blanket implementation as follows: + +```rust +impl<'a, T: ?Sized, U: ?Sized> AsRef for &'a T where T: AsRef { + fn as_ref(&self) -> &U { + >::as_ref(*self) + } +} +``` + +which allows these conversions to "lift" over references, which is in turn +important for making a number of standard library APIs ergonomic. + +On the other hand, we'd also like to provide the following very simple +blanket implementation: + +```rust +impl<'a, T: ?Sized> AsRef for T { + fn as_ref(&self) -> &T { + self + } +} +``` + +The current coherence rules prevent having both impls, however, +because they can in principle overlap: + +```rust +AsRef<&'a T> for &'a T where T: AsRef<&'a T> +``` + +Another examples comes from the `Option` type, which currently provides two +methods for unwrapping while providing a default value for the `None` case: + +```rust +impl Option { + fn unwrap_or(self, def: T) -> T { ... } + fn unwrap_or_else(self, f: F) -> T where F: FnOnce() -> T { .. } +} +``` + +The `unwrap_or` method is more ergonomic but `unwrap_or_else` is more efficient +in the case that the default is expensive to compute. The original +[collections reform RFC](https://github.com/rust-lang/rfcs/pull/235) proposed a +`ByNeed` trait that was rendered unworkable after unboxed closures landed: + +```rust +trait ByNeed { + fn compute(self) -> T; +} + +impl ByNeed for T { + fn compute(self) -> T { + self + } +} + +impl ByNeed for F where F: FnOnce() -> T { + fn compute(self) -> T { + self() + } +} + +impl Option { + fn unwrap_or(self, def: U) where U: ByNeed { ... } + ... +} +``` + +The trait represents any value that can produce a `T` on demand. But the above +impls fail to compile in today's Rust, because they overlap: consider `ByNeed +for F` where `F: FnOnce() -> F`. + +There are also some trait hierarchies where a subtrait completely subsumes the +functionality of a supertrait. For example, consider `PartialOrd` and `Ord`: + +```rust +trait PartialOrd: PartialEq { + fn partial_cmp(&self, other: &Rhs) -> Option; +} + +trait Ord: Eq + PartialOrd { + fn cmp(&self, other: &Self) -> Ordering; +} +``` + +In cases like this, it's somewhat annoying to have to provide an impl for *both* +`Ord` and `PartialOrd`, since the latter can be trivially derived from the +former. 
+So you might want an impl like this:
+
+```rust
+impl<T> PartialOrd for T where T: Ord {
+    fn partial_cmp(&self, other: &T) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+```
+
+But this blanket impl would conflict with a number of others that work to "lift"
+`PartialOrd` and `Ord` impls over various type constructors like references and
+tuples, e.g.:
+
+```rust
+impl<'a, A: ?Sized> Ord for &'a A where A: Ord {
+    fn cmp(&self, other: & &'a A) -> Ordering { Ord::cmp(*self, *other) }
+}
+
+impl<'a, 'b, A: ?Sized, B: ?Sized> PartialOrd<&'b B> for &'a A where A: PartialOrd<B> {
+    fn partial_cmp(&self, other: &&'b B) -> Option<Ordering> {
+        PartialOrd::partial_cmp(*self, *other)
+    }
+}
+```
+
+The case where they overlap boils down to:
+
+```rust
+PartialOrd<&'a T> for &'a T where &'a T: Ord
+PartialOrd<&'a T> for &'a T where T: PartialOrd
+```
+
+and there is no implication between either of the where clauses.
+
+There are many other examples along these lines.
+
+Unfortunately, *none* of these examples are permitted by the revised overlap
+rule in this RFC, because in none of these cases is one of the impls fully a
+"subset" of the other; the overlap is always partial.
+
+It's a shame to not be able to address these cases, but the benefit is a
+specialization rule that is very intuitive and accepts only very clear-cut
+cases. The Alternatives section sketches some different rules that are less
+intuitive but do manage to handle cases like those above.
+
+If we allowed "relaxed" default impls as described above, one could at least use
+that mechanism to avoid having to give a definition directly in most cases. (So
+if you had `T: Ord` you could write `impl<T: Ord> PartialOrd for T {}`.)
+
+## Possible extensions
+
+It's worth briefly mentioning a couple of mechanisms that one could consider
+adding on top of specialization.
+
+### Inherent impls
+
+It has long been folklore that inherent impls can be thought of as special,
+anonymous traits that are:
+
+- Automatically in scope;
+- Given higher dispatch priority than normal traits.
+
+It is easiest to make this idea work out if you think of each inherent item as
+implicitly defining and implementing its own trait, so that you can account for
+examples like the following:
+
+```rust
+struct Foo<T> { .. }
+
+impl<T: Clone> Foo<T> {
+    fn foo(&self) { .. }
+}
+
+impl<T: PartialEq> Foo<T> {
+    fn bar(&self) { .. }
+}
+```
+
+In this example, the availability of each inherent item is dependent on a
+distinct `where` clause. A reasonable "desugaring" would be:
+
+```rust
+#[inherent] // an imaginary attribute turning on the "special" treatment of inherent impls
+trait Foo_foo {
+    fn foo(&self);
+}
+
+#[inherent]
+trait Foo_bar {
+    fn bar(&self);
+}
+
+impl<T: Clone> Foo_foo for Foo<T> {
+    fn foo(&self) { .. }
+}
+
+impl<T: PartialEq> Foo_bar for Foo<T> {
+    fn bar(&self) { .. }
+}
+```
+
+With this idea in mind, it is natural to expect specialization to work for
+inherent impls, e.g.:
+
+```rust
+impl<T, I> Vec<T> where I: IntoIterator<Item = T> {
+    default fn extend(iter: I) { .. }
+}
+
+impl<T> Vec<T> {
+    fn extend(slice: &[T]) { .. }
+}
+```
+
+We could permit such specialization at the inherent impl level. The
+semantics would be defined in terms of the folklore desugaring above.
+
+(Note: this example was chosen purposefully: it's possible to use specialization
+at the inherent impl level to avoid refactoring the `Extend` trait as described
+in the Motivation section.)
+
+There are more details about this idea in the appendix.
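+
+As a brief usage sketch of this possible extension (the `Vec` impls above are
+illustrative only, and we assume a `&mut self` receiver that the informal
+signatures elide), a call site would simply resolve to the applicable inherent
+impl:
+
+```rust
+let mut v: Vec<u8> = Vec::new();
+v.extend(0..10u8);          // would use the general `IntoIterator`-based impl
+v.extend(&[1, 2, 3][..]);   // would use the slice-based impl
+```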
+
+### Super
+
+Continuing the analogy between specialization and inheritance, one could imagine
+a mechanism like `super` to access and reuse less specialized implementations
+when defining more specialized ones. While there's not a strong need for this
+mechanism as part of this RFC, it's worth checking that the specialization
+approach is at least compatible with `super`.
+
+Fortunately, it is. If we take `super` to mean "the most specific impl that the
+current impl specializes", there is always a unique answer to that question,
+because all overlapping impls are totally ordered with respect to each other via
+specialization.
+
+### Extending HRTBs
+
+In the Motivation we mentioned the need to refactor the `Extend` trait to take
+advantage of specialization. It's possible to work around that need by using
+specialization on inherent impls (and having the trait impl defer to the
+inherent one), but of course that's a bit awkward.
+
+For reference, here's the refactoring:
+
+```rust
+// Current definition
+pub trait Extend<A> {
+    fn extend<T>(&mut self, iterable: T) where T: IntoIterator<Item = A>;
+}
+
+// Refactored definition
+pub trait Extend<A, T: IntoIterator<Item = A>> {
+    fn extend(&mut self, iterable: T);
+}
+```
+
+One problem with this kind of refactoring is that you *lose* the ability to say
+that a type `T` is extendable *by an arbitrary iterator*, because every use of
+the `Extend` trait has to say precisely what iterator is supported. But the
+whole point of this exercise is to have a blanket impl of `Extend` for any
+iterator that is then specialized later.
+
+This points to a longstanding limitation: the trait system makes it possible to
+ask for any number of specific impls to exist, but not to ask for a blanket impl
+to exist -- *except* in the limited case of lifetimes, where higher-ranked trait
+bounds allow you to do this:
+
+```rust
+trait Trait { .. }
+impl<'a> Trait for &'a MyType { .. }
+
+fn use_all<T>(t: T) where for<'a> &'a T: Trait { .. }
+```
+
+We could extend this mechanism to cover type parameters as well, so that you
+could write:
+
+```rust
+fn needs_extend_all<T>(t: T) where for<I: IntoIterator<Item = u8>> T: Extend<u8, I> { .. }
+```
+
+Such a mechanism is out of scope for this RFC.
+
+### Refining bounds on associated types
+
+The design with `default` makes specialization of associated types an
+all-or-nothing affair, but it would occasionally be useful to say that
+all further specializations will at least guarantee some additional
+trait bound on the associated type. This is particularly relevant for
+the "efficient inheritance" use case. Such a mechanism can likely be
+added, if needed, later on.
+
+# Drawbacks
+
+Many of the more minor tradeoffs have been discussed in detail throughout. We'll
+focus here on the big picture.
+
+As with many new language features, the most obvious drawback of this proposal
+is the increased complexity of the language -- especially given the existing
+complexity of the trait system. Partly for that reason, the RFC errs on the side
+of simplicity in the design wherever possible.
+
+One aspect of the design that mitigates its complexity somewhat is the fact that
+it is entirely opt in: you have to write `default` in an impl in order for
+specialization of that item to be possible. That means that all the ways we have
+of reasoning about existing code still hold good. When you do opt in to
+specialization, the "obviousness" of the specialization rule should mean that
+it's easy to tell at a glance which of two impls will be preferred.
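+
+For illustration, consider a pair of impls of this shape (the `Render` trait
+here is invented for this example and is not part of the proposal or of `std`):
+
+```rust
+trait Render {
+    fn render(&self) -> String;
+}
+
+// Blanket impl: opts in to specialization by marking its item `default`.
+impl<T: std::fmt::Debug> Render for T {
+    default fn render(&self) -> String {
+        format!("{:?}", self)
+    }
+}
+
+// Strictly more specific impl: it applies only to `String`, a subset of the
+// types covered by the blanket impl, so whenever both could apply it is the
+// one that wins.
+impl Render for String {
+    fn render(&self) -> String {
+        self.clone()
+    }
+}
+```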
+
+On the other hand, the simplicity of this design has its own drawbacks:
+
+- You have to lift out trait parameters to enable specialization, as
+  in the `Extend` example above. Of course, this lifting can be hidden
+  behind an additional trait, so that the end-user interface remains
+  idiomatic. The RFC mentions a few possible extensions for dealing with
+  this limitation -- either by employing inherent item specialization,
+  or by eventually generalizing HRTBs.
+
+- You can't use specialization to handle some of the more "exotic" cases of
+  overlap, as described in the Limitations section above. This is a deliberate
+  trade-off, favoring simple rules over maximal expressiveness.
+
+Finally, if we take it as a given that we want to support some form of
+"efficient inheritance" as at least a programming pattern in Rust, the ability
+to use specialization to do so, while also getting all of its benefits, is a net
+simplifier. The full story there, of course, depends on the forthcoming
+companion RFC.
+
+# Alternatives
+
+## Alternatives to specialization
+
+The main alternative to specialization in general is an approach based on
+negative bounds, such as the one outlined in an
+[earlier RFC](https://github.com/rust-lang/rfcs/pull/586). Negative bounds make
+it possible to handle many of the examples this proposal can't (the ones in the
+Limitations section). But negative bounds are also fundamentally *closed*: they
+make it possible to perform a certain amount of specialization up front when
+defining a trait, but don't easily support downstream crates further
+specializing the trait impls.
+
+## Alternative specialization designs
+
+### The "lattice" rule
+
+The rule proposed in this RFC essentially says that overlapping impls
+must form *chains*, in which each one is strictly more specific than
+the last.
+
+This approach can be generalized to *lattices*, in which partial
+overlap between impls is allowed, so long as there is an additional
+impl that covers precisely the area of overlap (the intersection).
+Such a generalization can support all of the examples mentioned in the
+Limitations section. Moving to the lattice rule is backwards compatible.
+
+Unfortunately, the lattice rule (or really, any generalization beyond
+the proposed chain rule) runs into a nasty problem with our lifetime
+strategy. Consider the following:
+
+```rust
+trait Foo {}
+impl<T, U> Foo for (T, U) where T: 'static {}
+impl<T, U> Foo for (T, U) where U: 'static {}
+impl<T, U> Foo for (T, U) where T: 'static, U: 'static {}
+```
+
+The problem is that if we allow this situation to get through typeck, then by
+the time we actually generate code in trans, *there is no possible
+impl to choose*. That is, we do not have enough information to
+specialize, but we also don't know which of the (overlapping)
+unspecialized impls actually applies. We can address this problem by
+making the "lifetime dependent specialization" lint issue a hard error
+for such intersection impls, but that means that certain compositions
+will simply not be allowed (and, as mentioned before, these
+compositions might involve traits, types, and impls that the
+programmer is not even aware of).
+
+The limitations that the lattice rule addresses are fairly secondary
+to the main goals of specialization (as laid out in the Motivation),
+and so, since the lattice rule can be added later, the RFC sticks with
+the simple chain rule for now.
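+
+For concreteness, here is a sketch of the shape of program the lattice rule
+would additionally accept (the `Message` trait and impls are illustrative only):
+
+```rust
+trait Message {
+    fn describe(&self) -> &'static str;
+}
+
+// These two blanket impls overlap only partially...
+impl<T: Clone> Message for T {
+    default fn describe(&self) -> &'static str { "clonable" }
+}
+
+impl<T: Default> Message for T {
+    default fn describe(&self) -> &'static str { "defaultable" }
+}
+
+// ...so the lattice rule would require a third impl covering exactly the
+// intersection, which is then the most specific impl wherever all three apply.
+impl<T: Clone + Default> Message for T {
+    fn describe(&self) -> &'static str { "clonable and defaultable" }
+}
+```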
+ +### Explicit ordering + +Another, perhaps more palatable alternative would be to take the specialization +rule proposed in this RFC, but have some other way of specifying precedence when +that rule can't resolve it -- perhaps by explicit priority numbering. That kind +of mechanism is usually noncompositional, but due to the orphan rule, it's a +least a crate-local concern. Like the alternative rule above, it could be added +backwards compatibly if needed, since it only enables new cases. + +### Singleton non-default wins + +@pnkfelix suggested the following rule, which allows overlap so long as there is +a unique non-default item. + +> For any given type-based lookup, either: +> +> 0. There are no results (error) +> +> 1. There is only one lookup result, in which case we're done (regardless of +> whether it is tagged as default or not), +> +> 2. There is a non-empty set of results with defaults, where exactly one +> result is non-default -- and then that non-default result is the answer, +> *or* +> +> 3. There is a non-empty set of results with defaults, where 0 or >1 results +> are non-default (and that is an error). + +This rule is arguably simpler than the one proposed in this RFC, and can +accommodate the examples we've presented throughout. It would also support some +of the cases this RFC cannot, because the default/non-default distinction can be +used to specify an ordering between impls when the subset ordering fails to do +so. For that reason, it is not forward-compatible with the main proposal in this +RFC. + +The downsides are: + +- Because actual dispatch occurs at monomorphization, errors are generated quite + late, and only at use sites, not impl sites. That moves traits much more in + the direction of C++ templates. + +- It's less scalable/compositional: this alternative design forces the + "specialization hierarchy" to be flat, in particular ruling out multiple + levels of increasingly-specialized blanket impls. + +## Alternative handling of lifetimes + +This RFC proposes a *laissez faire* approach to lifetimes: we let you +write whatever impls you like, then warn you if some of them are being +ignored because the specialization is based purely on lifetimes. + +The main alternative approach is to make a more "principled" +distinction between two kinds of traits: those that can be used as +constraints in specialization, and those whose impls can be lifetime +dependent. Concretely: + +```rust +#[lifetime_dependent] +trait Foo {} + +// Only allowed to use 'static here because of the lifetime_dependent attribute +impl Foo for &'static str {} + +trait Bar { fn bar(&self); } +impl Bar for T { + // Have to use `default` here to allow specialization + default fn bar(&self) {} +} + +// CANNOT write the following impl, because `Foo` is lifetime_dependent +// and Bar is not. +// +// NOTE: this is what I mean by *using* a trait in specialization; +// we are trying to say a specialization applies when T: Foo holds +impl Bar for T { + fn bar(&self) { ... } +} + +// CANNOT write the following impl, because `Bar` is not lifetime_dependent +impl Bar for &'static str { + fn bar(&self) { ... } +} +``` + +There are several downsides to this approach: + +* It forces trait authors to consider a rather subtle knob for every + trait they write, choosing between two forms of expressiveness and + dividing the world accordingly. The last thing the trait system + needs is another knob. 
+ +* Worse still, changing the knob in either direction is a breaking change: + + * If a trait gains a `lifetime_dependent` attribute, any impl of a + different trait that used it to specialize would become illegal. + + * If a trait loses its `lifetime_dependent` attribute, any impl of + that trait that was lifetime dependent would become illegal. + +* It hobbles specialization for some existing traits in `std`. + +For the last point, consider `From` (which is tied to `Into`). In +`std`, we have the following important "boxing" impl: + +```rust +impl<'a, E: Error + 'a> From for Box +``` + +This impl would necessitate `From` (and therefore, `Into`) being +marked `lifetime_dependent`. But these traits are very likely to be +used to describe specializations (e.g., an impl that applies when `T: +Into`). + +There does not seem to be any way to consider such impls as +lifetime-independent, either, because of examples like the following: + +```rust +// If we consider this innocent... +trait Tie {} +impl<'a, T: 'a> Tie for (T, &'a u8) + +// ... we get into trouble here +trait Foo {} +impl<'a, T> Foo for (T, &'a u8) +impl<'a, T> Foo for (T, &'a u8) where (T, &'a u8): Tie +``` + +All told, the proposed *laissez faire* seems a much better bet in +practice, but only experience with the feature can tell us for sure. + +# Unresolved questions + +All questions from the RFC discussion and prototype have been resolved. + +# Appendix + +## More details on inherent impls + +One tricky aspect for specializing inherent impls is that, since there is no +explicit trait definition, there is no general signature that each definition of +an inherent item must match. Thinking about `Vec` above, for example, notice +that the two signatures for `extend` look superficially different, although it's +clear that the first impl is the more general of the two. + +It's workable to use a very simple-minded conceptual desugaring: each item +desugars into a distinct trait, with type parameters for e.g. each argument and +the return type. All concrete type information then emerges from desugaring into +impl blocks. Thus, for example: + +``` +impl Vec where I: IntoIterator { + default fn extend(iter: I) { .. } +} + +impl Vec { + fn extend(slice: &[T]) { .. } +} + +// Desugars to: + +trait Vec_extend { + fn extend(Arg) -> Result; +} + +impl Vec_extend for Vec where I: IntoIterator { + default fn extend(iter: I) { .. } +} + +impl Vec_extend<&[T], ()> for Vec { + fn extend(slice: &[T]) { .. } +} +``` + +All items of a given name must desugar to the same trait, which means that the +number of arguments must be consistent across all impl blocks for a given `Self` +type. In addition, we'd require that *all of the impl blocks overlap* (meaning +that there is a single, most general impl). Without these constraints, we would +implicitly be permitting full-blown overloading on both arity and type +signatures. For the time being at least, we want to restrict overloading to +explicit uses of the trait system, as it is today. + +This "desugaring" semantics has the benefits of allowing inherent item +specialization, and also making it *actually* be the case that inherent impls +are really just implicit traits -- unifying the two forms of dispatch. Note that +this is a breaking change, since examples like the following are (surprisingly!) 
+allowed today: + +```rust +struct Foo(A, B); + +impl Foo { + fn foo(&self, _: u32) {} +} + +impl Foo { + fn foo(&self, _: bool) {} +} + +fn use_foo(f: Foo) { + f.foo(true) +} +``` + +As has been proposed +[elsewhere](https://internals.rust-lang.org/t/pre-rfc-adjust-default-object-bounds/2199/), +this "breaking change" could be made available through a feature flag that must +be used even after stabilization (to opt in to specialization of inherent +impls); the full details will depend on pending revisions to +[RFC 1122](https://github.com/rust-lang/rfcs/pull/1122). diff --git a/text/1211-mir.md b/text/1211-mir.md new file mode 100644 index 00000000000..e078cac469b --- /dev/null +++ b/text/1211-mir.md @@ -0,0 +1,823 @@ +- Feature Name: N/A +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#1211](https://github.com/rust-lang/rfcs/pull/1211) +- Rust Issue: [rust-lang/rust#27840](https://github.com/rust-lang/rust/issues/27840) + +# Summary + +Introduce a "mid-level IR" (MIR) into the compiler. The MIR desugars +most of Rust's surface representation, leaving a simpler form that is +well-suited to type-checking and translation. + +# Motivation + +The current compiler uses a single AST from the initial parse all the +way to the final generation of LLVM. While this has some advantages, +there are also a number of distinct downsides. + +1. The complexity of the compiler is increased because all passes must + be written against the full Rust language, rather than being able + to consider a reduced subset. The MIR proposed here is *radically* + simpler than the surface Rust syntax -- for example, it contains no + "match" statements, and converts both `ref` bindings and `&` + expresions into a single form. + + a. There are numerous examples of "desugaring" in Rust. In + principle, desugaring one language feature into another should + make the compiler *simpler*, but in our current implementation, + it tends to make things more complex, because every phase must + simulate the desugaring anew. The most prominent example are + closure expressions (`|| ...`), which desugar to a fresh struct + instance, but other examples abound: `for` loops, `if let` and + `while let`, `box` expressions, overloaded operators (which + desugar to method calls), method calls (which desugar to UFCS + notation). + + b. There are a number of features which are almost infeasible to + implement today but which should be much easier given a MIR + representation. Examples include box patterns and non-lexical + lifetimes. + +2. Reasoning about fine-grained control-flow in an AST is rather + difficult. The right tool for this job is a control-flow graph + (CFG). We currently construct a CFG that lives "on top" of the AST, + which allows the borrow checking code to be flow sensitive, but it + is awkward to work with. Worse, because this CFG is not used by + trans, it is not necessarily the case that the control-flow as seen + by the analyses corresponds to the code that will be generated. + The MIR is based on a CFG, resolving this situation. + +3. The reliability of safety analyses is reduced because the gap + between what is being analyzed (the AST) and what is being executed + (LLVM bitcode) is very wide. The MIR is very low-level and hence the + translation to LLVM should be straightforward. + +4. The reliability of safety proofs, when we have some, would be + reduced because the formal language we are modeling is so far from + the full compiler AST. 
The MIR is simple enough that it should be + possible to (eventually) make safety proofs based on the MIR + itself. + +5. Rust-specific optimizations, and optimizing trans output, are very + challenging. There are numerous cases where it would be nice to be + able to do optimizations *before* translating to LLVM bitcode, or + to take advantage of Rust-specific knowledge of which LLVM is + unaware. Currently, we are forced to do these optimizations as part + of lowering to bitcode, which can get quite complex. Having an + intermediate form improves the situation because: + + a. In some cases, we can do the optimizations in the MIR itself before translation. + + b. In other cases, we can do analyses on the MIR to easily determine when the optimization + would be safe. + + c. In all cases, whatever we can do on the MIR will be helpful for other + targets beyond LLVM (see next bullet). + +6. Migrating away from LLVM is nearly impossible. In the future, it + may be advantageous to provide a choice of backends beyond + LLVM. Currently though this is infeasible, since so much of the + semantics of Rust itself are embedded in the `trans` step which + converts to LLVM IR. Under the MIR design, those semantics are + instead described in the translation from AST to MIR, and the LLVM + step itself simply applies optimizations. + +Given the numerous benefits of a MIR, you may wonder why we have not +taken steps in this direction earlier. In fact, we have a number of +structures in the compiler that simulate the effect of a MIR: + +1. Adjustments. Every expression can have various adjustments, like + autoderefs and so forth. These are computed by the type-checker + and then read by later analyses. This is a form of MIR, but not a particularly + convenient one. +2. The CFG. The CFG tries to model the flow of execution as a graph + rather than a tree, to help analyses in dealing with complex + control-flow formed by things like loops, `break`, `continue`, etc. + This CFG is however inferior to the MIR in that it is only an + approximation of control-flow and does not include all the + information one would need to actually execute the program (for + example, for an `if` expression, the CFG would indicate that two + branches are possible, but would not contain enough information to + decide which branch to take). +3. `ExprUseVisitor`. The `ExprUseVisitor` is designed to work in + conjunction with the CFG. It walks the AST and highlights actions + of interest to later analyses, such as borrows or moves. For each + such action, the analysis gets a callback indicating the point in + the CFG where the action occurred along with what + happened. Overloaded operators, method calls, and so forth are + "desugared" into their more primitive operations. This is + effectively a kind of MIR, but it is not complete enough to do + translation, since it focuses purely on borrows, moves, and other + things of interest to the safety checker. + +Each of these things were added in order to try and cope with the +complexity of working directly on the AST. The CFG for example +consolidates knowledge about control-flow into one piece of code, +producing a data structure that can be easily interpreted. Similarly, +the `ExprUseVisitor` consolidates knowledge of how to walk and +interpret the current compiler representation. + +### Goals + +It is useful to think about what "knowledge" the MIR should +encapsulate. 
Here is a listing of the kinds of things that should be +explicit in the MIR and thus that downstream code won't have to +re-encode in the form of repeated logic: + +- **Precise ordering of control-flow.** The CFG makes this very explicit, + and the individual statements and nodes in the MIR are very small + and detailed and hence nothing "interesting" happens in the middle + of an individual node with respect to control-flow. +- **What needs to be dropped and when.** The set of data that needs to + be dropped and when is a fairly complex thing to calculate: you have + to know what's in scope, including temporary values and so forth. + In the MIR, all drops are explicit, including those that result from + panics and unwinding. +- **How matches are desugared.** Reasoning about matches has been a + traditional source of complexity. Matches combine traversing types + with borrows, moves, and all sorts of other things, depending on the + precise patterns in use. This is all vastly simplified and explicit + in MIR. + +One thing the current MIR does not make explicit as explicit as it +could is when something is *moved*. For by-value uses of a value, the +code must still consult the type of the value to decide if that is a +move or not. This could be made more explicit in the IR. + +### Which analyses are well-suited to the MIR? + +Some analyses are better suited to the AST than to a MIR. The +following is a list of work the compiler does that would benefit from +using a MIR: + +- **liveness checking**: this is used to issue warnings about unused assignments + and the like. The MIR is perfect for this sort of data-flow analysis. +- **borrow and move checking**: the borrow checker already uses a + combination of the CFG and `ExprUseVisitor` to try and achieve a + similarly low-level of detail. +- **translation to LLVM IR**: the MIR is much closer than the AST to + the desired end-product. + +Some other passes would probably work equally well on the MIR or an +AST, but they will likely find the MIR somewhat easier to work with +than the current AST simply because it is, well, simpler: + +- **rvalue checking**, which checks that things are `Sized` which need to be. +- **reachability** and **death checking**. + +These items are likely ill-suited to the MIR as designed: + +- **privacy checking**, since it relies on explicit knowledge of paths that is not + necessarily present in the MIR. +- **lint checking**, since it is often dependent on the sort of surface details + we are seeking to obscure. + +For some passes, the impact is not entirely clear. In particular, +**match exhaustiveness checking** could easily be subsumed by the MIR +construction process, which must do a similar analysis during the +lowering process. However, once the MIR is built, the match is +completely desugared into more primitive switches and so forth, so we +will need to leave some markers in order to know where to check for +exhaustiveness and to reconstruct counter examples. + +# Detailed design + +### What is *really* being proposed here? + +The rest of this section goes into detail on a particular MIR design. +However, the true purpose of this RFC is not to nail down every detail +of the MIR -- which are expected to evolve and change over time anyway +-- but rather to establish some high-level principles which drive the +rest of the design: + +1. We should indeed lower the representation from an AST to something + else that will drive later analyses, and this representation should + be based on a CFG, not a tree. +2. 
This representation should be explicitly minimal and not attempt to retain + the original syntactic structure, though it should be possible to recover enough + of it to make quality error messages. +3. This representation should encode drops, panics, and other + scope-dependent items explicitly. +4. This representation does not have to be well-typed Rust, though it + should be possible to type-check it using a tweaked variant on the + Rust type system. + +### Prototype + +The MIR design being described here [has been prototyped][proto-crate] +and can be viewed in the `nikomatsakis` repository on github. In +particular, [the `repr` module][repr] defines the MIR representation, +and [the `build` module][build] contains the code to create a MIR +representation from an AST-like form. + +For increased flexibility, as well as to make the code simpler, the +prototype is not coded directly against the compiler's AST, but rather +against an idealized representation defined by [the `HIR` trait][hir]. +Note that this HIR trait is entirely independent from the HIR discussed by +nrc in [RFC 1191][1191] -- you can think of it as an abstract trait +that any high-level Rust IR could implement, including our current +AST. Moreover, it's just an implementation detail and not part of the +MIR being proposed here per se. Still, if you want to read the code, +you have to understand its design. + +The `HIR` trait contains a number of opaque associated types for the +various aspects of the compiler. For example, the type `H::Expr` +represents an expression. In order to find out what kind of expression +it is, the `mirror` method is called, which converts an `H::Expr` into +an `Expr` mirror. This mirror then contains embedded `ExprRef` +nodes to refer to further subexpressions; these may either be mirrors +themselves, or else they may be additional `H::Expr` nodes. This +allows the tree that is exported to differ in small ways from the +actual tree within the compiler; the primary intention is to use this +to model "adjustments" like autoderef. The code to convert from our +current AST to the HIR is not yet complete, but it can be found in the +[`tcx` module][tcx]. + +Note that the HIR mirroring system is an experiment and not really +part of the MIR itself. It does however present an interesting option +for (eventually) stabilizing access to the compiler's internals. + +[proto-crate]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir +[repr]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/repr.rs +[build]: https://github.com/nikomatsakis/rust/tree/mir/src/librustc_mir/build +[hir]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/hir.rs +[1191]: https://github.com/rust-lang/rfcs/pull/1191 +[tcx]: https://github.com/nikomatsakis/rust/blob/mir/src/librustc_mir/tcx/mod.rs + +### Overview of the MIR + +The proposed MIR always describes the execution of a single fn. At +the highest level it consists of a series of declarations regarding +the stack storage that will be required and then a set of basic +blocks: + + MIR = fn({TYPE}) -> TYPE { + {let [mut] B: TYPE;} // user-declared bindings and their types + {let TEMP: TYPE;} // compiler-introduced temporary + {BASIC_BLOCK} // control-flow graph + }; + +The storage declarations are broken into two categories. User-declared +bindings have a 1-to-1 relationship with the variables specified in +the program. Temporaries are introduced by the compiler in various +cases. 
For example, borrowing an lvalue (e.g., `&foo()`) will +introduce a temporary to store the result of `foo()`. Similarly, +discarding a value `foo();` is translated to something like `let tmp = +foo(); drop(tmp);`). Temporaries are single-assignment, but because +they can be borrowed they may be mutated after this assignment and +hence they differ somewhat from variables in a pure SSA +representation. + +The proposed MIR takes the form of a graph where each node is a *basic +block*. A basic block is a standard compiler term for a continuous +sequence of instructions with a single entry point. All interesting +control-flow happens between basic blocks. Each basic block has an id +`BB` and consists of a sequence of statements and a terminator: + + BASIC_BLOCK = BB: {STATEMENT} TERMINATOR + +A `STATEMENT` can have one of three forms: + + STATEMENT = LVALUE "=" RVALUE // assign rvalue into lvalue + | Drop(DROP_KIND, LVALUE) // drop value if needed + DROP_KIND = SHALLOW // (see discussion below) + | DEEP + +The following sections dives into these various kinds of statements in +more detail. + +The `TERMINATOR` for a basic block describes how it connects to +subsequent blocks: + + TERMINATOR = GOTO(BB) // normal control-flow + | PANIC(BB) // initiate unwinding, branching to BB for cleanup + | IF(LVALUE, BB0, BB1) // test LVALUE and branch to BB0 if true, else BB1 + | SWITCH(LVALUE, BB...) // load discriminant from LVALUE (which must be an enum), + // and branch to BB... depending on which variant it is + | CALL(LVALUE0 = LVALUE1(LVALUE2...), BB0, BB1) + // call LVALUE1 with LVALUE2... as arguments. Write + // result into LVALUE0. Branch to BB0 if it returns + // normally, BB1 if it is unwinding. + | DIVERGE // return to caller, unwinding + | RETURN // return to caller normally + +Most of the terminators should be fairly obvious. The most interesting +part is the handling of unwinding. This aligns fairly close with how +LLVM works: there is one terminator, PANIC, that initiates unwinding. +It immediately branches to a handler (BB) which will perform cleanup +and (eventually) reach a block that has a DIVERGE terminator. DIVERGE +causes unwinding to continue up the stack. + +Because calls to other functions can always (or almost always) panic, +calls are themselves a kind of terminator. If we can determine that +some function we are calling cannot unwind, we can always modify the +IR to make the second basic block optional. (We could also add an +`RVALUE` to represent calls, but it's probably easiest to keep the +call as a terminator unless the memory savings of consolidating basic +blocks are found to be worthwhile.) + +It's worth pointing out that basic blocks are just a kind of +compile-time and memory-use optimization; there is no semantic +difference between a single block and two blocks joined by a GOTO +terminator. + +### Assignments, values, and rvalues + +The primary kind of statement is an assignent: + + LVALUE "=" RVALUE + +The semantics of this operation are to first evaluate the RVALUE and +then store it into the LVALUE (which must represent a memory location +of suitable type). + +An `LVALUE` represents a path to a memory location. This is the basic +"unit" analyzed by the borrow checker. It is always possible to +evaluate an `LVALUE` without triggering any side-effects (modulo +derefences of unsafe pointers, which naturally can trigger arbitrary +behavior if the pointer is not valid). 
+ + LVALUE = B // reference to a user-declared binding + | TEMP // a temporary introduced by the compiler + | ARG // a formal argument of the fn + | STATIC // a reference to a static or static mut + | RETURN // the return pointer of the fn + | LVALUE.f // project a field or tuple field, like x.f or x.0 + | *LVALUE // dereference a pointer + | LVALUE[LVALUE] // index into an array (see disc. below about bounds checks) + | (LVALUE as VARIANT) // downcast to a specific variant of an enum, + // see the section on desugaring matches below + +An `RVALUE` represents a computation that yields a result. This result +must be stored in memory somewhere to be accessible. The MIR does not +contain any kind of nested expressions: everything is flattened out, +going through lvalues as intermediaries. + + RVALUE = Use(LVALUE) // just read an lvalue + | [LVALUE; LVALUE] + | &'REGION LVALUE + | &'REGION mut LVALUE + | LVALUE as TYPE + | LVALUE LVALUE + | LVALUE + | Struct { f: LVALUE0, ... } // aggregates, see section below + | (LVALUE...LVALUE) + | [LVALUE...LVALUE] + | CONSTANT + | LEN(LVALUE) // load length from a slice, see section below + | BOX // malloc for builtin box, see section below + BINOP = + | - | * | / | ... // excluding && and || + UNOP = ! | - // note: no `*`, as that is part of LVALUE + +One thing worth pointing out is that the binary and unary operators +are only the *builtin* form, operating on scalar values. Overloaded +operators will be desugared to trait calls. Moreover, all method calls +are desugared into normal calls via UFCS form. + +### Constants + +Constants are a subset of rvalues that can be evaluated at compilation +time: + + CONSTANT = INT + | UINT + | FLOAT + | BOOL + | BYTES + | STATIC_STRING + | ITEM // reference to an item or constant etc + | > // projection + | CONSTANT(CONSTANT...) // + | CAST(CONSTANT, TY) // foo as bar + | Struct { (f: CONSTANT)... } // aggregates... + | (CONSTANT...) // + | [CONSTANT...] // + +### Aggregates and further lowering + +The set of rvalues includes "aggregate" expressions like `(x, y)` or +`Foo { f: x, g: y }`. This is a place where the MIR (somewhat) departs +from what will be generated compilation time, since (often) an +expression like `f = (x, y, z)` will wind up desugared into a series +of piecewise assignments like: + + f.0 = x; + f.1 = y; + f.2 = z; + +However, there are good reasons to include aggregates as first-class +rvalues. For one thing, if we break down each aggregate into the +specific assignments that would be used to construct the value, then +zero-sized types are *never* assigned, since there is no data to +actually move around at runtime. This means that the compiler couldn't +distinguish uninitialized variables from initialized ones. That is, +code like this: + +```rust +let x: (); // note: never initialized +use(x) +``` + +and this: + +```rust +let x: () = (); +use(x); +``` + +would desugar to the same MIR. That is a problem, particularly with +respect to destructors: imagine that instead of the type `()`, we used +a type like `struct Foo;` where `Foo` implements `Drop`. + +Another advantage is that building aggregates in a two-step way +assures the proper execution order when unwinding occurs before the +complete value is constructed. In particular, we want to drop the +intermediate results in the order that they appear in the source, not +in the order in which the fields are specified in the struct +definition. 
+ +A final reason to include aggregates is that, at runtime, the +representation of an aggregate may indeed fit within a single word, in +which case making a temporary and writing the fields piecemeal may in +fact not be the correct representation. + +In any case, after the move and correctness checking is done, it is +easy enough to remove these aggregate rvalues and replace them with +assignments. This could potentially be done during LLVM lowering, or +as a pre-pass that transforms MIR statements like: + + x = ...x; + y = ...y; + z = ...z; + f = (x, y, z) + +to: + + x = ...x; + y = ...y; + z = ...z; + f.0 = x; + f.1 = y; + f.2 = z; + +combined with another pass that removes temporaries that are only used +within a single assignment (and nowhere else): + + f.0 = ...x; + f.1 = ...y; + f.2 = ...z; + +Going further, once type-checking is done, it is plausible to do +further lowering within the MIR purely for optimization purposes. For +example, we could introduce intermediate references to cache the +results of common lvalue computations and so forth. This may well be +better left to LLVM (or at least to the lowering pass). + +### Bounds checking + +Because bounds checks are fallible, it's important to encode them in +the MIR whenever we do indexing. Otherwise the trans code would have +to figure out on its own how to do unwinding at that point. Because +the MIR doesn't "desugar" fat pointers, we include a special rvalue +`LEN` that extracts the length from an array value whose type matches +`[T]` or `[T;n]` (in the latter case, it yields a constant). Using +this, we desugar an array reference like `y = arr[x]` as follows: + + let len: usize; + let idx: usize; + let lt: bool; + + B0: { + len = len(arr); + idx = x; + lt = idx < len; + if lt { B1 } else { B2 } + } + + B1: { + y = arr[idx] + ... + } + + B2: { + + } + +The key point here is that we create a temporary (`idx`) capturing the +value that we bounds checked and we ensure that there is a comparison +against the length. + +### Overflow checking + +Similarly, since overflow checks can trigger a panic, they ought to be +exposed in the MIR as well. This is handled by having distinct binary +operators for "add with overflow" and so forth, analogous to the LLVM +intrinsics. These operators yield a tuple of (result, overflow), so +`result = left + right` might be translated like: + + let tmp: (u32, bool); + + B0: { + tmp = left + right; + if(tmp.1, B2, B1) + } + + B1: { + result = tmp.0 + ... + } + + B2: { + + } + +### Matches + +One of the goals of the MIR is to desugar matches into something much +more primitive, so that we are freed from reasoning about their +complexity. This is primarily achieved through a combination of SWITCH +terminators and downcasts. To get the idea, consider this simple match +statement: + +```rust +match foo() { + Some(ref v) => ...0, + None => ...1 +} +``` + +This would be converted into MIR as follows (leaving out the unwinding support): + + BB0 { + call(tmp = foo(), BB1, ...); + } + + BB1 { + switch(tmp, BB2, BB3) // two branches, corresponding to the Some and None variants resp. + } + + BB2 { + v = &(tmp as Option::Some).0; + ...0 + } + + BB3 { + ...1 + } + +There are some interesting cases that arise from matches that are +worth examining. + +**Vector patterns.** Currently, (unstable) Rust supports vector +patterns which permit borrows that would not otherwise be legal: + +```rust +let mut vec = [1, 2]; +match vec { + [ref mut p, ref mut q] => { ... 
} +} +``` + +If this code were written using `p = &mut vec[0], q = &mut vec[1]`, +the borrow checker would complain. This is because it does not attempt +to reason about indices being disjoint, even if they are constant +(this is a limitation we may wish to consider lifting at some point in +the future, however). + +To accommodate these, we plan to desugar such matches into lvalues +using the special "constant index" form. The borrow checker would be +able to reason that two constant indices are disjoint but it could +consider "variable indices" to be (potentially) overlapping with all +constant indices. This is a fairly straightforward thing to do (and in +fact the borrow checker already includes similar logic, since the +`ExprUseVisitor` encounters a similar dilemna trying to resolve +borrows). + +### Drops + +The `Drop(DROP_KIND, LVALUE)` instruction is intended to represent +"automatic" compiler-inserted drops. The semantics of a `Drop` is that +it drops "if needed". This means that the compiler can insert it +everywhere that a `Drop` would make sense (due to scoping), and assume +that instrumentation will be done as needed to prevent double +drops. Currently, this signaling is done by zeroing out memory at +runtime, but we are in the process of introducing stack flags for this +purpose: the MIR offers the opportunity to reify those flags if we +wanted, and rewrite drops to be more narrow (versus leaving that work +for LLVM). + +To illustrate how drop works, let's work through a simple +example. Imagine that we have a snippet of code like: + +```rust +{ + let x = Box::new(22); + send(x); +} +``` + +The compiler would generate a drop for `x` at the end of the block, +but the value `x` would also be moved as part of the call to `send`. +A later analysis could easily strip out this `Drop` since it is evident +that the value is always used on all paths that lead to `Drop`. + +### Shallow drops and Box + +The MIR includes the distinction between "shallow" and "deep" +drop. Deep drop is the normal thing, but shallow drop is used when +partially initializing boxes. This is tied to the `box` keyword. +For example, an assignment like the following: + + let x = box Foo::new(); + +would be translated to something like the following: + + let tmp: Box; + + B0: { + tmp = BOX; + f = Foo::new; // constant reference + call(*tmp, f, B1, B2); + } + + B1: { // successful return of the call + x = use(tmp); // move of tmp + ... + } + + B2: { // calling Foo::new() panic'd + drop(Shallow, tmp); + diverge; + } + +The interesting part here is the block B2, which indicates the case +that `Foo::new()` invoked unwinding. In that case, we have to free the +box that we allocated, but we only want to free the box itself, not +its contents (it is not yet initialized). + +Note that having this kind of builtin box code is a legacy thing. The +more generalized protocol that [RFC 809][809] specifies works in +more-or-less exactly the same way: when that is adopted uniformly, the +need for shallow drop and the Box rvalue will go away. + +### Phasing + +Ideally, the translation to MIR would be done during type checking, +but before "region checking". This is because we would like to +implement non-lexical lifetimes eventually, and doing that well would +requires access to a control-flow graph. Given that we do very limited +reasoning about regions at present, this should not be a problem. 
+ +### Representing scopes + +Lexical scopes in Rust play a large role in terms of when destructors +run and how the reasoning about lifetimes works. However, they are +completely erased by the graph format. For the most part, this is not +an issue, since drops are encoded explicitly into the control-flow +where needed. However, one place that we still need to reason about +scopes (at least in the short term) is in region checking, because +currently regions are encoded in terms of scopes, and we have to be +able to map that to a region in the graph. The MIR therefore includes +extra information mapping every scope to a SEME region (single-entry, +multiple-exit). If/when we move to non-lexical lifetimes, regions +would be defined in terms of the graph itself, and the need to retain +scoping information should go away. + +### Monomorphization + +Currently, we do monomorphization at LLVM translation time. If we ever +chose to do it at a MIR level, that would be fine, but one thing to be +careful of is that we may be able to elide `Drop` nodes based on the +specific types. + +### Unchecked assertions + +There are various bits of the MIR that are not trivially type-checked. +In general, these are properties which are assured in Rust by +construction in the high-level syntax, and thus we must be careful not +to do any transformation that would endanger them after the fact. + +- **Bounds-checking.** We introduce explicit bounds checks into the IR + that guard all indexing lvalues, but there is no explicit connection + between this check and the later accesses. +- **Downcasts to a specific variant.** We test variants with a SWITCH + opcode but there is no explicit connection between this test and + later downcasts. + +This need for unchecked operations results form trying to lower and +simplify the representation as much as possible, as well as trying to +represent all panics explicitly. We believe the tradeoff to be +worthwhile, particularly since: + +1. the existing analyses can continue to generally assume that these +properties hold (e.g., that all indices are in bounds and all +downcasts are safe); and, +2. it would be trivial to implement a static dataflow analysis +checking that bounds and downcasts only occur downstream of a relevant +check. + +# Drawbacks + +**Converting from AST to a MIR will take some compilation time.** +Expectations are that constructing the MIR will be quite fast, and +that follow-on code (such as trans and borowck) will execute faster, +because they will operate over a simpler and more compact +representation. However, this needs to be measured. + +**More effort is required to make quality error messages.** Because +the representation the compiler is working with is now quite different +from what the user typed, we have to put in extra effort to make sure +that we bridge this gap when reporting errors. We have some precedent +for dealing with this, however. For example, the `ExprUseVisitor` (and +`mem_categorization`) includes extra annotations and hints to tell the +borrow checker when a reference was introduced as part of a closure +versus being explicit in the source code. The current prototype +doesn't have much in this direction, but it should be relatively +straightforward to add. Hints like those, in addition to spans, should +be enough to bridge the error message gap. + +# Alternatives + +**Use SSA.** In the proposed MIR, temporaries are single-assignment +but can be borrowed, making them more analogous to allocas than SSA +values. 
This is helpful to analyses like the borrow checker, because +it means that the program operates directly on paths through memory, +versus having the stack modeled as allocas. The current model is also +helpful for generating debuginfo. + +SSA representation can be helpful for more sophisticated backend +optimizations. However, we tend to leave those optimizations to LLVM, +and hence it makes more sense to have the MIR be based on lvalues +instead. There are some cases where it might make sense to do analyses +on the MIR that would benefit from SSA, such as bounds check elision. +In those cases, we could either quickly identify those temporaries +that are not mutably borrowed (and which therefore act like SSA +variables); or, further lower into a LIR, (which would be an SSA +form); or else simply perform the analyses on the MIR using standard +techniques like def-use chains. (CSE and so forth are straightforward +both with and without SSA, honestly.) + +**Exclude unwinding.** Excluding unwinding from the MIR would allow us +to elide annoying details like bounds and overflow checking. These are +not particularly interesting to borrowck, so that is somewhat +appealing. But that would mean that consumers of MIR would have to +reconstruct the order of drops and so forth on unwinding paths, which +would require them reasoning about scopes and other rather complex +bits of information. Moreover, having all drops fully exposed in the +MIR is likely helpful for better handling of dynamic drop and also for +the rules collectively known as dropck, though all details there have +not been worked out. + +**Expand the set of operands.** The proposed MIR forces all rvalue operands +to be lvalues. This means that integer constants and other "simple" things +will wind up introducing a temporary. For example, translating `x = 2+2` +will generate code like: + + tmp0 = 2 + tmp1 = 2 + x = tmp0 + tmp1 + +A more common case will be calls to statically known functions like `x = foo(3)`, +which desugars to a temporary and a constant reference: + + tmp0 = foo; + tmp1 = 3 + x = tmp(tmp1) + +There is no particular *harm* in such constants: it would be very easy +to optimize them away when reducing to LLVM bitcode, and if we do not +do so, LLVM will do it. However, we could also expand the scope of +operands to include both lvalues and some simple rvalues like +constants. The main advantage of this is that it would reduce the +total number of statements and hence might help with memory +consumption. + +**Totally safe MIR.** This MIR includes operations whose safety is not +trivially type-checked (see the section on *unchecked assertions* +above). We might design a higher-level MIR where those properties held +by construction, or modify the MIR to thread "evidence" of some form +that makes it easier to check that the properties hold. The former +would make downstream code accommodate more complexity. The latter +remains an option in the future but doesn't seem to offer much +practical advantage. + +# Unresolved questions + +**What additional info is needed to provide for good error messages?** +Currently the implementation only has spans on statements, not lvalues +or rvalues. We'll have to experiment here. I expect we will probably +wind up placing "debug info" on all lvalues, which includes not only a +span but also a "translation" into terms the user understands. 
For +example, in a closure, a reference to an by-reference upvar `foo` will +be translated to something like `*self.foo`, and we would like that to +be displayed to the user as just `foo`. + +**What additional info is needed for debuginfo?** It may be that to +generate good debuginfo we want to include additional information +about control-flow or scoping. + +**Unsafe blocks.** Should we layer unsafe in the MIR so that effect +checking can be done on the CFG? It's not the most natural way to do +it, *but* it would make it fairly easy to support (e.g.) autoderef on +unsafe pointers, since all the implicit operations are made explicit +in the MIR. My hunch is that we can improve our HIR instead. diff --git a/text/1212-line-endings.md b/text/1212-line-endings.md new file mode 100644 index 00000000000..aaf327b0607 --- /dev/null +++ b/text/1212-line-endings.md @@ -0,0 +1,70 @@ +- Feature Name: `line_endings` +- Start Date: 2015-07-10 +- RFC PR: [rust-lang/rfcs#1212](https://github.com/rust-lang/rfcs/pull/1212) +- Rust Issue: [rust-lang/rust#28032](https://github.com/rust-lang/rust/issues/28032) + +# Summary + +Change all functions dealing with reading "lines" to treat both '\n' and '\r\n' +as a valid line-ending. + +# Motivation + +The current behavior of these functions is to treat only '\n' as line-ending. +This is surprising for programmers experienced in other languages. Many +languages open files in a "text-mode" per default, which means when they iterate +over the lines, they don't have to worry about the two kinds of line-endings. +Such programmers will be surprised to learn that they have to take care of such +details themselves in Rust. Some may not even have heard of the distinction +between two styles of line-endings. + +The current design also violates the "do what I mean" principle. Both '\r\n' and +'\n' are widely used as line-separators. By talking about the concept of +"lines", it is clear that the current file (or buffer, really) is considered to +be in text format. It is thus very reasonable to expect "lines" to apply to both +kinds of encoding lines in binary format. + +In particular, if the crate is developed on Linux or Mac, the programmer will +probably have most of his input encoded with only '\n' for the line-endings. He +may use the functions talking about "lines", and they will work all right. It is +only when someone runs this crate on input that contains '\r\n' that the bug +will be uncovered. The editor has personally run into this issue when reading +line-by-line from stdin, with the program suddenly failing on Windows. + +# Detailed design + +The following functions will have to be changed: `BufRead::lines` and +`str::lines`. They both should treat '\r\n' as marking the end of a line. This +can be implemented, for example, by first splitting at '\n' like now and then +removing a trailing '\r' right before returning data to the caller. + +Furthermore, `str::lines_any` (the only function currently dealing with both +kinds of line-endings) is deprecated, as it is then functionally equivalent with +`str::lines`. + +# Drawbacks + +This is a semantics-breaking change, changing the behavior of released, stable +API. However, as argued above, the new behavior is much less surprising than the +old one - so one could consider this fixing a bug in the original +implementation. There are alternatives available for the case that one really +wants to split at '\n' only, namely `BufRead::split` and `str::split`. 
However, +`BufRead:split` does not iterate over `String`, but rather over `Vec`, so +users have to insert an additional explicit call to `String::from_utf8`. + +# Alternatives + +There's the obvious alternative of not doing anything. This leaves a gap in the +features Rust provides to deal with text files, making it hard to treat both +kinds of line-endings uniformly. + +The second alternative is to add `BufRead::lines_any` which works similar to +`str::lines_any` in that it deals with both '\n' and '\r\n'. This provides all +the necessary functionality, but it still leaves people with the need to choose +one of the two functions - and potentially choosing the wrong one. In +particular, the functions with the shorter, nicer name (the existing ones) will +almost always *not* be the right choice. + +# Unresolved questions + +None I can think of. diff --git a/text/1214-projections-lifetimes-and-wf.md b/text/1214-projections-lifetimes-and-wf.md new file mode 100644 index 00000000000..f1da694a143 --- /dev/null +++ b/text/1214-projections-lifetimes-and-wf.md @@ -0,0 +1,1116 @@ +- Feature Name: N/A +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#1214](https://github.com/rust-lang/rfcs/pull/1214) +- Rust Issue: [rust-lang/rust#27579](https://github.com/rust-lang/rust/issues/27579) + +# Summary + +Type system changes to address the outlives relation with respect to +projections, and to better enforce that all types are well-formed +(meaning that they respect their declared bounds). The current +implementation can be both unsound ([#24622]), inconvenient +([#23442]), and surprising ([#21748], [#25692]). The changes are as follows: + +- Simplify the outlives relation to be syntactically based. +- Specify improved rules for the outlives relation and projections. +- Specify more specifically where WF bounds are enforced, covering + several cases missing from the implementation. + +The proposed changes here have been tested and found to cause only a +modest number of regressions (about two dozen root regressions were +previously found on crates.io; however, that run did not yet include +all the provisions from this RFC; updated numbers coming soon). In +order to minimize the impact on users, the plan is to first introduce +the changes in two stages: + +1. Initially, warnings will be issued for cases that violate the rules + specified in this RFC. These warnings are not lints and cannot be + silenced except by correcting the code such that it type-checks + under the new rules. +2. After one release cycle, those warnings will become errors. + +Note that although the changes do cause regressions, they also cause +some code (like that in [#23442]) which currently gets errors to +compile successfully. + +# Motivation + +### TL;DR + +This is a long detailed RFC that is attempting to specify in some +detail aspects of the type system that were underspecified or buggily +implemented before. This section just summarizes the effect on +existing Rust code in terms of changes that may be required. + +**Warnings first, errors later.** Although the changes described in +this RFC are necessary for soundness (and many of them are straight-up +bugfixes), there is some impact on existing code. Therefore the plan +is to first issue warnings for a release cycle and then transition to +hard errors, so as to ease the migration. 
+ +**Associated type projections and lifetimes work more smoothly.** The +current rules for relating associated type projections (like `T::Foo`) +and lifetimes are somewhat cumbersome. The newer rules are more +flexible, so that e.g. we can deduce that `T::Foo: 'a` if `T: 'a`, and +similarly that `T::Foo` is well-formed if `T` is well-formed. As a +bonus, the new rules are also sound. ;) + +**Simpler outlives relation.** The older definition for the outlives +relation `T: 'a` was rather subtle. The new rule basically says that +if all type/lifetime parameters appearing in the type `T` outlive +`'a`, then `T: 'a` (though there can also be other ways for us to +decide that `T: 'a` is valid, such as in-scope where clauses). So for +example `fn(&'x X): 'a` if `'x: 'a` and `X: 'a` (presuming that `X` is +a type parameter). The older rules were based on what kind of data was +actually *reachable*, and hence accepted this type (since no data of +`&'x X` is reachable from a function pointer). This change primarily +affects struct declarations, since they may now require additional +outlives bounds: + +```rust +// OK now, but after this RFC requires `X: 'a`: +struct Foo<'a, X> { + f: fn(&'a X) // (because of this field) +} +``` + +**More types are sanity checked.** Generally Rust requires that if you +have a type like `SomeStruct<T>`, then whatever where clauses are +declared on `SomeStruct` must hold for `T` (this is called being +"well-formed"). For example, if `SomeStruct` is declared like so: + +```rust +struct SomeStruct<T:Eq> { .. } +``` + +then this implies that `SomeStruct<f32>` is ill-formed, since `f32` +does not implement `Eq` (just `PartialEq`). However, the current compiler +doesn't check this in associated type definitions: + +```rust +impl Iterator for SomethingElse { + type Item = SomeStruct<f32>; // accepted now, not after this RFC +} +``` + +Similarly, WF checking was skipped for trait object types and fn +arguments. This means that `fn(SomeStruct<f32>)` would be considered +well-formed today, though attempting to call the function would be an +error. Under this RFC, that fn type is not well-formed (though +sometimes when there are higher-ranked regions, WF checking may still +be deferred until the point where the fn is called). + +There are a few other places where similar requirements were being +overlooked before but will now be enforced. For example, a number of +traits like the following were found in the wild: + +```rust +trait Foo { + // currently accepted, but should require that Self: Sized + fn method(&self, value: Option<Self>); +} +``` + +To be well-formed, an `Option<T>` type requires that `T: Sized`. In +this case, though, `T=Self`, and `Self` is not `Sized` by +default. Therefore, this trait should be declared `trait Foo: Sized` +to be legal. The compiler is currently *attempting* to enforce these +rules, but many cases were overlooked in practice. + +### Impact on crates.io + +This RFC has been largely implemented and tested against crates.io. A +[total of 43 (root) crates are affected][crater-all] by the +changes. Interestingly, **the vast majority of warnings/errors that +occur are not due to new rules introduced by this RFC**, but rather +due to older rules being more correctly enforced. + +Of the affected crates, **40 are receiving future compatibility +warnings and hence continue to build for the time being**.
In the +[remaining three cases][crater-errors], it was not possible to isolate +the effects of the new rules, and hence the compiler reports an error +rather than a future compatibility warning. + +What follows is a breakdown of the reason that crates on crates.io are +receiving errors or warnings. Each row in the table corresponds to one +of the explanations above. + +Problem | Future-compat. warnings | Errors | +----------------------------- | ----------------------- | ------ | +More types are sanity checked | 35 | 3 | +Simpler outlives relation | 5 | | + +As you can see, by far the largest source of problems is simply that +we are now sanity checking more types. This was always the intent, but +there were bugs in the compiler that led to it either skipping +checking altogether or only partially applying the rules. It is +interesting to drill down a bit further into the 38 warnings/errors +that resulted from more types being sanity checked in order to see +what kinds of mistakes are being caught: + +Case | Problem | Number | +---- | ----------------------------- | ------ | + 1 | `Self: Sized` required | 26 | + 2 | `Foo: Bar` required | 11 | + 3 | Not object safe | 1 | + +An example of each case follows: + +**Cases 1 and 2.** In the compiler today, types appearing in trait methods +are incompletely checked. This leads to a lot of traits with +insufficient bounds. By far the most common example was that the +`Self` parameter would appear in a context where it must be sized, +usually when it is embedded within another type (e.g., +`Option<Self>`). Here is an example: + +```rust +trait Test { + fn test(&self) -> Option<Self>; + // ~~~~~~~~~~~~ + // Incorrectly permitted before. +} +``` + +Because `Option<T>` requires that `T: Sized`, this trait should be +declared as follows: + +```rust +trait Test: Sized { + fn test(&self) -> Option<Self>; +} +``` + +**Case 2.** Case 2 is the same as case 1, except that the missing +bound is some trait other than `Sized`, or in some cases an outlives +bound like `T: 'a`. + +**Case 3.** The compiler currently permits non-object-safe traits to +be used as types, even if objects could never actually be created +([#21953]). + +### Projections and the outlives relation + +[RFC 192] introduced the outlives relation `T: 'a` and described the +rules that are used to decide when one type outlives a lifetime. In +particular, the RFC describes rules that govern how the compiler +determines what kind of borrowed data may be "hidden" by a generic +type. For example, given this function signature: + +```rust +fn foo<'a,I>(x: &'a I) + where I: Iterator +{ ... } +``` + +the compiler is able to use implied region bounds (described more +below) to automatically determine that: + +- all borrowed content in the type `I` outlives the function body; +- all borrowed content in the type `I` outlives the lifetime `'a`. + +When associated types were introduced in [RFC 195], some new rules +were required to decide when an "outlives relation" involving a +projection (e.g., `I::Item: 'a`) should hold. The initial rules were +[very conservative][#22246]. This led to the rules from [RFC 192] +being [adapted] to cover associated type projections like +`I::Item`. Unfortunately, these adapted rules are not ideal, and can +still lead to [annoying errors in some situations][#23442]. Finding a +better solution has been on the agenda for some time.
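As a concrete illustration (the names here are hypothetical, not taken from the issues above), the pattern that runs into this limitation looks roughly like the following; the explicit `I::Item: 'a` where-clause is exactly the kind of bound that the rules proposed below let the compiler deduce from `I: 'a`:

```rust
// Hypothetical sketch: a struct that holds a reference to an iterator
// and a reference to one of its items.
struct Cache<'a, I: Iterator + 'a>
    where I::Item: 'a          // the bound this RFC lets us deduce from `I: 'a`
{
    iter: &'a mut I,
    last: Option<&'a I::Item>, // `&'a I::Item` is only WF if `I::Item: 'a`
}
```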
+ +Simultaneously, we realized in [#24622] that the compiler had a bug +that caused it to erroneously assume that every projection like +`I::Item` outlived the current function body, just as it assumes that +type parameters like `I` outlive the current function body. **This bug +can lead to unsound behavior.** Unfortunately, simply implementing the +naive fix for #24622 exacerbates the shortcomings of the current rules +for projections, causing widespread compilation failures in all sorts +of reasonable and obviously correct code. + +**This RFC describes modifications to the type system that both +restore soundness and make working with associated types more +convenient in some situations.** The changes are largely but not +completely backwards compatible. + +### Well-formed types + +A type is considered *well-formed* (WF) if it meets some simple +correctness criteria. For builtin types like `&'a T` or `[T]`, these +criteria are built into the language. For user-defined types like a +struct or an enum, the criteria are declared in the form of where +clauses. In general, all types that appear in the source and elsewhere +should be well-formed. + +For example, consider this type, which combines a reference to a +hashmap and a vector of additional key/value pairs: + +```rust +struct DeltaMap<'a, K, V> where K: Hash + 'a, V: 'a { + base_map: &'a mut HashMap<K,V>, + additional_values: Vec<(K,V)> +} +``` + +Here, the WF criteria for `DeltaMap<'a,K,V>` are as follows: + +- `K: Hash`, because of the where-clause, +- `K: 'a`, because of the where-clause, +- `V: 'a`, because of the where-clause +- `K: Sized`, because of the implicit `Sized` bound +- `V: Sized`, because of the implicit `Sized` bound + +Let's look at those `K:'a` bounds a bit more closely. If you leave +them out, you will find that the structure definition above does +not type-check. This is due to the requirement that the types of all +fields in a structure definition must be well-formed. In this case, +the field `base_map` has the type `&'a mut HashMap<K,V>`, and this +type is only valid if `K: 'a` and `V: 'a` hold. Since we don't know +what `K` and `V` are, we have to surface this requirement in the form +of a where-clause, so that users of the struct know that they must +maintain this relationship in order for the struct to be internally +coherent. + +#### An aside: explicit WF requirements on types + +You might wonder why you have to write `K:Hash` and `K:'a` explicitly. +After all, they are obvious from the types of the fields. The reason +is that we want to make it possible to check whether a type like +`DeltaMap<'foo,T,U>` is well-formed *without* having to inspect the +types of the fields -- that is, in the current design, the only +information that we need to use to decide if `DeltaMap<'foo,T,U>` is +well-formed is the set of bounds and where-clauses. + +This has real consequences on usability. It would be possible for the +compiler to infer bounds like `K:Hash` or `K:'a`, but the origin of +the bound might be quite remote. For example, we might have a series +of types like: + +```rust +struct Wrap1<'a,K>(Wrap2<'a,K>); +struct Wrap2<'a,K>(Wrap3<'a,K>); +struct Wrap3<'a,K>(DeltaMap<'a,K,K>); +``` + +Now, for `Wrap1<'foo,T>` to be well-formed, `T:'foo` and `T:Hash` must +hold, but this is not obvious from the declaration of +`Wrap1`. Instead, you must trace deeply through its fields to find out +that this obligation exists.
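Under the current (explicit) design, each declaration therefore has to restate the bounds its field type requires. A sketch of what that looks like, assuming the `DeltaMap` definition above and `std::hash::Hash`:

```rust
use std::hash::Hash;

// Each wrapper repeats the bounds its field type ultimately requires,
// so a reader of `Wrap1` sees the obligations without chasing through
// `Wrap2` and `Wrap3`.
struct Wrap1<'a, K: Hash + 'a>(Wrap2<'a, K>);
struct Wrap2<'a, K: Hash + 'a>(Wrap3<'a, K>);
struct Wrap3<'a, K: Hash + 'a>(DeltaMap<'a, K, K>);
```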
+ +#### Implied lifetime bounds + +To help avoid undue annotation, Rust relies on implied lifetime bounds +in certain contexts. Currently, this is limited to fn bodies. The idea +is that for functions, we can make callers do some portion of the WF +validation, and let the callees just assume it has been done +already. (This is in contrast to the type definition, where we +required that the struct itself declares all of its requirements up +front in the form of where-clauses.) + +To see this in action, consider a function that uses a `DeltaMap`: + +```rust +fn foo<'a,K:Hash,V>(d: DeltaMap<'a,K,V>) { ... } +``` + +You'll notice that there are no `K:'a` or `V:'a` annotations required +here. This is due to *implied lifetime bounds*. Unlike structs, a +function's caller must examine not only the explicit bounds and +where-clauses, but *also* the argument and return types. When there +are generic type/lifetime parameters involved, the caller is in charge +of ensuring that those types are well-formed. (This is in contrast +with type definitions, where the type is in charge of figuring out its +own requirements and listing them in one place.) + +As the name "implied lifetime bounds" suggests, we currently limit +implied bounds to region relationships. That is, we will implicitly +derive a bound like `K:'a` or `V:'a`, but not `K:Hash` -- this must +still be written manually. It might be a good idea to change this, but +that would be the topic of a separate RFC. + +Currently, implied bound are limited to fn bodies. This RFC expands +the use of implied bounds to cover impl definitions as well, since +otherwise the annotation burden is quite painful. More on this in the +next section. + +*NB.* There is an additional problem concerning the interaction of +implied bounds and contravariance ([#25860]). To better separate the +issues, this will be addressed in a follow-up RFC that should appear +shortly. + +#### Missing WF checks + +Unfortunately, the compiler currently fails to enforce WF in several +important cases. For example, the +[following program](http://is.gd/6JXjyg) is accepted: + +```rust +struct MyType { t: T } + +trait ExampleTrait { + type Output; +} + +struct ExampleType; + +impl ExampleTrait for ExampleType { + type Output = MyType>; + // ~~~~~~~~~~~~~~~~ + // | + // Note that `Box` is not `Copy`! +} +``` + +However, if we simply naively add the requirement that associated +types must be well-formed, this results in a large annotation burden +(see e.g. [PR 25701](https://github.com/rust-lang/rust/pull/25701/)). +For example, in practice, many iterator implementation break due to +region relationships: + +```rust +impl<'a, T> IntoIterator for &'a LinkedList { + type Item = &'a T; + ... +} +``` + +The problem here is that for `&'a T` to be well-formed, `T: 'a` must +hold, but that is not specified in the where clauses. This RFC +proposes using implied bounds to address this concern -- specifically, +every impl is permitted to assume that all types which appear in the +impl header (trait reference) are well-formed, and in turn each "user" +of an impl will validate this requirement whenever they project out of +a trait reference (e.g., to do a method call, or normalize an +associated type). + +# Detailed design + +This section dives into detail on the proposed type rules. + +### A little type grammar + +We extend the type grammar from [RFC 192] with projections and slice +types: + + T = scalar (i32, u32, ...) 
// Boring stuff + | X // Type variable + | Id<P0..Pn> // Nominal type (struct, enum) + | &r T // Reference (mut doesn't matter here) + | O0..On+r // Object type + | [T] // Slice type + | for<r..> fn(T1..Tn) -> T0 // Function pointer + | <P0 as Trait<P1..Pn>>::Id // Projection + P = r // Region name + | T // Type + O = for<r..> TraitId<P1..Pn> // Object type fragment + r = 'x // Region name + +We'll use this to describe the rules in detail. + +A quick note on terminology: an "object type fragment" is part of an +object type: so if you have `Box<FnMut()+Send>`, `FnMut()` and `Send` +are object type fragments. Object type fragments are identical to full +trait references, except that they do not have a self type (no `P0`). + +### Syntactic definition of the outlives relation + +The outlives relation is defined in purely syntactic terms as follows. +These are inference rules written in a primitive ASCII notation. :) As +part of defining the outlives relation, we need to track the set of +lifetimes that are bound within the type we are looking at. Let's +call that set `R=<r0..rn>`. Initially, this set `R` is empty, but it +will grow as we traverse through types like fns or object fragments, +which can bind region names via `for<..>`. + +#### Simple outlives rules + +Here are the rules covering the simple cases, where no type parameters +or projections are involved: + + OutlivesScalar: + -------------------------------------------------- + R ⊢ scalar: 'a + + OutlivesNominalType: + ∀i. R ⊢ Pi: 'a + -------------------------------------------------- + R ⊢ Id<P0..Pn>: 'a + + OutlivesReference: + R ⊢ 'x: 'a + R ⊢ T: 'a + -------------------------------------------------- + R ⊢ &'x T: 'a + + OutlivesObject: + ∀i. R ⊢ Oi: 'a + R ⊢ 'x: 'a + -------------------------------------------------- + R ⊢ O0..On+'x: 'a + + OutlivesFunction: + ∀i. R,r.. ⊢ Ti: 'a + -------------------------------------------------- + R ⊢ for<r..> fn(T1..Tn) -> T0: 'a + + OutlivesFragment: + ∀i. R,r.. ⊢ Pi: 'a + -------------------------------------------------- + R ⊢ for<r..> TraitId<P1..Pn>: 'a + +#### Outlives for lifetimes + +The outlives relation for lifetimes depends on whether the lifetime in +question was bound within a type or not. In the usual case, we decide +the relationship between two lifetimes by consulting the environment, +or using the reflexive property. Lifetimes representing scopes within +the current fn have a relationship derived from the code itself, while +lifetime parameters have relationships defined by where-clauses and +implied bounds. + + OutlivesRegionEnv: + 'x ∉ R // not a bound region + ('x: 'a) in Env // derivable from where-clauses etc + -------------------------------------------------- + R ⊢ 'x: 'a + + OutlivesRegionReflexive: + -------------------------------------------------- + R ⊢ 'a: 'a + + OutlivesRegionTransitive: + R ⊢ 'a: 'c + R ⊢ 'c: 'b + -------------------------------------------------- + R ⊢ 'a: 'b + +For higher-ranked lifetimes, we simply ignore the relation, since the +lifetime is not yet known. This means for example that `for<'a> fn(&'a +i32): 'x` holds, even though we do not yet know what region `'a` is +(and in fact it may be instantiated many times with different values +on each call to the fn). + + OutlivesRegionBound: + 'x ∈ R // bound region + -------------------------------------------------- + R ⊢ 'x: 'a + +#### Outlives for type parameters + +For type parameters, the only way to draw "outlives" conclusions is to +find information in the environment (which is being threaded +implicitly here, since it is never modified).
In terms of a Rust +program, this means both explicit where-clauses and implied bounds +derived from the signature (discussed below). + + OutlivesTypeParameterEnv: + X: 'a in Env + -------------------------------------------------- + R ⊢ X: 'a + + +#### Outlives for projections + +Projections have the most possibilities. First, we may find +information in the in-scope where clauses, as with type parameters, +but we can also consult the trait definition to find bounds (consider +an associated type declared like `type Foo: 'static`). These rule only +apply if there are no higher-ranked lifetimes in the projection; for +simplicity's sake, we encode that by requiring an empty list of +higher-ranked lifetimes. (This is somewhat stricter than necessary, +but reflects the behavior of my prototype implementation.) + + OutlivesProjectionEnv: + >::Id: 'b in Env + <> ⊢ 'b: 'a + -------------------------------------------------- + <> ⊢ >::Id: 'a + + OutlivesProjectionTraitDef: + WC = [Xi => Pi] WhereClauses(Trait) + >::Id: 'b in WC + <> ⊢ 'b: 'a + -------------------------------------------------- + <> ⊢ >::Id: 'a + +All the rules covered so far already exist today. This last rule, +however, is not only new, it is the crucial insight of this RFC. It +states that if all the components in a projection's trait reference +outlive `'a`, then the projection must outlive `'a`: + + OutlivesProjectionComponents: + ∀i. R ⊢ Pi: 'a + -------------------------------------------------- + R ⊢ >::Id: 'a + +Given the importance of this rule, it's worth spending a bit of time +discussing it in more detail. The following explanation is fairly +informal. A more detailed look can be found in the appendix. + +Let's begin with a concrete example of an iterator type, like +`std::vec::Iter<'a,T>`. We are interested in the projection of +`Iterator::Item`: + + as Iterator>::Item + +or, in the more succint (but potentially ambiguous) form: + + Iter<'a,T>::Item + +Since I'm going to be talking a lot about this type, let's just call +it `` for now. We would like to determine whether `: 'x` holds. + +Now, the easy way to solve `: 'x` would be to normalize `` +by looking at the relevant impl: + +```rust +impl<'b,U> Iterator for Iter<'b,U> { + type Item = &'b U; + ... +} +``` + +From this impl, we can conclude that ` == &'a T`, and thus +reduce `: 'x` to `&'a T: 'x`, which in turn holds if `'a: 'x` +and `T: 'x` (from the rule `OutlivesReference`). + +But often we are in a situation where we can't normalize the +projection (for example, a projection like `I::Item` where we only +know that `I: Iterator`). What can we do then? The rule +`OutlivesProjectionComponents` says that if we can conclude that every +lifetime/type parameter `Pi` to the trait reference outlives `'x`, +then we know that a projection from those parameters outlives `'x`. In +our example, the trait reference is ` as Iterator>`, so +that means that if the type `Iter<'a,T>` outlives `'x`, then the +projection `` outlives `'x`. Now, you can see that this +trivially reduces to the same result as the normalization, since +`Iter<'a,T>: 'x` holds if `'a: 'x` and `T: 'x` (from the rule +`OutlivesNominalType`). + +OK, so we've seen that applying the rule +`OutlivesProjectionComponents` comes up with the same result as +normalizing (at least in this case), and that's a good sign. But what +is the basis of the rule? + +The basis of the rule comes from reasoning about the impl that we used +to do normalization. 
Let's consider that impl again, but this time +hide the actual type that was specified: + +```rust +impl<'b,U> Iterator for Iter<'b,U> { + type Item = /* */; + ... +} +``` + +So when we normalize ``, we obtain the result by applying some +substitution `Θ` to ``. This substitution is a mapping from the +lifetime/type parameters on the impl to some specific values, such +that ` == Θ as Iterator>::Item`. In this case, that +means `Θ` would be `['b => 'a, U => T]` (and of course `` would +be `&'b U`, but we're not supposed to rely on that). + +The key idea for the `OutlivesProjectionComponents` is that the only +way that `` can *fail* to outlive `'x` is if either: + +- it names some lifetime parameter `'p` where `'p: 'x` does not hold; or, +- it names some type parameter `X` where `X: 'x` does not hold. + +Now, the only way that `` can refer to a parameter `P` is if it +is brought in by the substitution `Θ`. So, if we can just show that +all the types/lifetimes that in the range of `Θ` outlive `'x`, then we +know that `Θ ` outlives `'x`. + +Put yet another way: imagine that you have an impl with *no +parameters*, like: + +```rust +impl Iterator for Foo { + type Item = /* */; +} +``` + +Clearly, whatever `` is, it can only refer to the lifetime +`'static`. So `::Item: 'static` holds. We know this +is true without ever knowing what `` is -- we just need to see +that the trait reference `` doesn't have any +lifetimes or type parameters in it, and hence the impl cannot refer to +any lifetime or type parameters. + +#### Implementation complications + +The current region inference code only permits constraints of the +form: + +``` +C = r0: r1 + | C AND C +``` + +This is convenient because a simple fixed-point iteration suffices to +find the minimal regions which satisfy the constraints. + +Unfortunately, this constraint model does not scale to the outlives +rules for projections. Consider a trait reference like `>::Item: 'Y`, where `'X` and `'Y` are both region variables +whose value is being inferred. At this point, there are several +inference rules which could potentially apply. Let us assume that +there is a where-clause in the environment like `>::Item: 'b`. In that case, *if* `'X == 'a` and `'b: 'Y`, +then we could employ the `OutlivesProjectionEnv` rule. This would +correspond to a constraint set like: + +``` +C = 'X:'a AND 'a:'X AND 'b:'Y +``` + +Otherwise, if `T: 'a` and `'X: 'Y`, then we could use the +`OutlivesProjectionComponents` rule, which would require a constraint +set like: + +``` +C = C1 AND 'X:'Y +``` + +where `C1` is the constraint set for `T:'a`. + +As you can see, these two rules yielded distinct constraint sets. +Ideally, we would combine them with an `OR` constraint, but no such +constraint is available. Adding such a constraint complicates how +inference works, since a fixed-point iteration is no longer +sufficient. + +This complication is unfortunate, but to a large extent already exists +with where-clauses and trait matching (see e.g. [#21974]). (Moreover, +it seems to be inherent to the concept of assocated types, since they +take several inputs (the parameters to the trait) which may or may not +be related to the actual type definition in question.) + +For the time being, the current implementation takes a pragmatic +approach based on heuristics. It first examines whether any region +bounds are declared in the trait and, if so, prefers to use +those. 
Otherwise, if there are region variables in the projection, +then it falls back to the `OutlivesProjectionComponents` rule. This is +always sufficient but may be stricter than necessary. If there are no +region variables in the projection, then it can simply run inference +to completion and check each of the other two rules in turn. (It is +still necessary to run inference because the bound may be a region +variable.) So far this approach has sufficed for all situations +encountered in practice. Eventually, we should extend the region +inferencer to a richer model that includes "OR" constraints. + +### The WF relation + +This section describes the "well-formed" relation. In +[previous RFCs][RFC 192], this was combined with the outlives +relation. We separate it here for reasons that shall become clear when +we discuss WF conditions on impls. + +The WF relation is really pretty simple: it just says that a type is +"self-consistent". Typically, this would include validating scoping +(i.e., that you don't refer to a type parameter `X` if you didn't +declare one), but we'll take those basic conditions for granted. + + WfScalar: + -------------------------------------------------- + R ⊢ scalar WF + + WfParameter: + -------------------------------------------------- + R ⊢ X WF // where X is a type parameter + + WfTuple: + ∀i. R ⊢ Ti WF + ∀i WF + + WfReference: + R ⊢ T WF // T must be WF + R ⊢ T: 'x // T must outlive 'x + -------------------------------------------------- + R ⊢ &'x T WF + + WfSlice: + R ⊢ T WF + R ⊢ T: Sized + -------------------------------------------------- + [T] WF + + WfProjection: + ∀i. R ⊢ Pi WF // all components well-formed + R ⊢ > // the projection itself is valid + -------------------------------------------------- + R ⊢ >::Id WF + +#### WF checking and higher-ranked types + +There are two places in Rust where types can introduce lifetime names +into scope: fns and trait objects. These have somewhat different rules +than the rest, simply because they modify the set `R` of bound +lifetime names. Let's start with the rule for fn types: + + WfFn: + ∀i. R, r.. ⊢ Ti WF + -------------------------------------------------- + R ⊢ for fn(T1..Tn) -> T0 WF + +Basically, this rule adds the bound lifetimes to the set `R` and then +checks whether the argument and return type are well-formed. We'll see +in the next section that means that any requirements on those types +which reference bound identifiers are just assumed to hold, but the +remainder are checked. For example, if we have a type `HashSet` +which requires that `K: Hash`, then `fn(HashSet)` would be +illegal since `NoHash: Hash` does not hold, but `for<'a> +fn(HashSet<&'a NoHash>)` *would* be legal, since `&'a NoHash: Hash` +involves a bound region `'a`. See the "Checking Conditions" section +for details. + +Note that `fn` types do not require that `T0..Tn` be `Sized`. This is +intentional. The limitation that only sized values can be passed as +argument (or returned) is enforced at the time when a fn is actually +called, as well as in actual fn definitions, but is not considered +fundamental to fn types thesmelves. There are several reasons for +this. For one thing, it's forwards compatible with passing DST by +value. For another, it means that non-defaulted trait methods to do +not have to show that their argument types are `Sized` (this will be +checked in the implementations, where more types are known). 
Since the +implicit `Self` type parameter is not `Sized` by default ([RFC 546]), +requiring that argument types be `Sized` in trait definitions proves +to be an annoying annotation burden. + +The object type rule is similar, though it includes an extra clause: + + WfObject: + rᵢ = union of implied region bounds from Oi + ∀i. rᵢ: r + ∀i. R ⊢ Oi WF + -------------------------------------------------- + R ⊢ O0..On+r WF + +The first two clauses here state that the explicit lifetime bound `r` +must be an approximation for the the implicit bounds `rᵢ` derived from +the trait definitions. That is, if you have a trait definition like + +```rust +trait Foo: 'static { ... } +``` + +and a trait object like `Foo+'x`, when we require that `'static: 'x` +(which is true, clearly, but in some cases the implicit bounds from +traits are not `'static` but rather some named lifetime). + +The next clause states that all object type fragments must be WF. An +object type fragment is WF if its components are WF: + + WfObjectFragment: + ∀i. R, r.. ⊢ Pi + TraitId is object safe + -------------------------------------------------- + R ⊢ for TraitId + +Note that we don't check the where clauses declared on the trait +itself. These are checked when the object is created. The reason not +to check them here is because the `Self` type is not known (this is an +object, after all), and hence we can't check them in general. (But see +*unresolved questions*.) + +#### WF checking a trait reference + +In some contexts, we want to check a trait reference, such as the ones +that appear in where clauses or type parameter bounds. The rules for +this are given here: + + WfTraitReference: + ∀i. R, r.. ⊢ Pi + C = WhereClauses(Id) // and the conditions declared on Id must hold... + R, r0...rn ⊢ [P0..Pn] C // ...after substituting parameters, of course + -------------------------------------------------- + R ⊢ for P0: TraitId + +The rules are fairly straightforward. The components must be well formed, +and any where-clauses declared on the trait itself much hold. + +#### Checking conditions + +In various rules above, we have rules that declare that a where-clause +must hold, which have the form `R ̣⊢ WhereClause`. Here, `R` represents +the set of bound regions. It may well be that `WhereClause` does not +use any of the regions in `R`. In that case, we can ignore the +bound-regions and simple check that `WhereClause` holds. But if +`WhereClause` *does* refer to regions in `R`, then we simply consider +`R ⊢ WhereClause` to hold. Those conditions will be checked later when +the bound lifetimes are instantiated (either through a call or a +projection). + +In practical terms, this means that if I have a type like: + +```rust +struct Iterator<'a, T:'a> { ... } +``` + +and a function type like `for<'a> fn(i: Iterator<'a, T>)` then this +type is considered well-formed without having to show that `T: 'a` +holds. In terms of the rules, this is because we would wind up with a +constraint like `'a ⊢ T: 'a`. + +However, if I have a type like + +```rust +struct Foo<'a, T:Eq> { .. } +``` + +and a function type like `for<'a> fn(f: Foo<'a, T>)`, I still must +show that `T: Eq` holds for that function to be well-formed. This is +because the condition which is geneated will be `'a ⊢ T: Eq`, but `'a` +is not referenced there. + +#### Implied bounds + +Implied bounds can be derived from the WF and outlives relations. The +implied bounds from a type `T` are given by expanding the requirements +that `T: WF`. 
Since we currently limit ourselves to implied region +bounds, we we are interesting in extracting requirements of the form: + +- `'a:'r`, where two regions must be related; +- `X:'r`, where a type parameter `X` outlives a region; or, +- `>::Id: 'r`, where a projection outlives a region. + +Some caution is required around projections when deriving implied +bounds. If we encounter a requirement that e.g. `X::Id: 'r`, we cannot +for example deduce that `X: 'r` must hold. This is because while `X: +'r` is *sufficient* for `X::Id: 'r` to hold, it is not *necessary* for +`X::Id: 'r` to hold. So we can only conclude that `X::Id: 'r` holds, +and not `X: 'r`. + +#### When should we check the WF relation and under what conditions? + +Currently the compiler performs WF checking in a somewhat haphazard +way: in some cases (such as impls), it omits checking WF, but in +others (such as fn bodies), it checks WF when it should not have +to. Partly that is due to the fact that the compiler currently +connects the WF and outlives relationship into one thing, rather than +separating them as described here. + +**Constants/statics.** The type of a constant or static can be checked +for WF in an empty environment. + +**Struct/enum declarations.** In a struct/enum declaration, we should +check that all field types are WF, given the bounds and where-clauses +from the struct declaration. Also check that where-clauses are well-formed. + +**Function items.** For function items, the environment consists of +all the where-clauses from the fn, as well as implied bounds derived +from the fn's argument types. These are then used to check that the +following are well-formed: + +- argument types; +- return type; +- where clauses; +- types of local variables. + +These WF requirements are imposed at each fn or associated fn +definition (as well as within trait items). + +**Trait impls.** In a trait impl, we assume that all types appearing +in the impl header are well-formed. This means that the initial +environment for an impl consists of the impl where-clauses and implied +bounds derived from its header. Example: Given an impl like +`impl<'a,T> SomeTrait for &'a T`, the environment would be `T: Sized` +(explicit where-clause) and `T: 'a` (implied bound derived from `&'a +T`). This environment is used as the starting point for checking the +items: + +- Where-clauses declared on the trait must be WF. +- Associated types must be WF in the trait environment. +- The types of associated constants must be WF in the trait environment. +- Associated fns are checked just like regular function items, but + with the additional implied bounds from the impl signature. + +**Inherent impls.** In an inherent impl, we can assume that the self +type is well-formed, but otherwise check the methods as if they were +normal functions. We must check that all items are well-formed, along with +the where clauses declared on the impl. + +**Trait declarations.** Trait declarations (and defaults) are checked +in the same fashion as impls, except that there are no implied bounds +from the impl header. We must check that all items are well-formed, +along with the where clauses declared on the trait. + +**Type aliases.** Type aliases are currently not checked for WF, since +they are considered transparent to type-checking. It's not clear that +this is the best policy, but it seems harmless, since the WF rules +will still be applied to the expanded version. See the *Unresolved +Questions* for some discussion on the alternatives here. 
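As a small, hypothetical illustration of that policy (the types here are not from the RFC): the alias itself passes unchecked, while its expansion is still subject to WF checking wherever the alias is used:

```rust
// `Checked` declares a bound; the alias does not repeat it.
struct Checked<T: Eq> { value: T }

type Alias<T> = Checked<T>;    // accepted: aliases are transparent to WF checking

// fn takes(x: Alias<f32>) {}  // rejected at the point of use once expanded,
//                             // because `f32: Eq` does not hold
```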
+ +Several points in the list above made use of *implied bounds* based on +assuming that various types were WF. We have to ensure that those +bounds are checked on the reciprocal side, as follows: + +**Fns being called.** Before calling a fn, we check that its argument +and return types are WF. This check takes place after all +higher-ranked lifetimes have been instantiated. Checking the argument +types ensures that the implied bounds due to argument types are +correct. Checking the return type ensures that the resulting type of +the call is WF. + +**Method calls, "UFCS" notation for fns and constants.** These are the +two ways to project a value out of a trait reference. A method call or +UFCS resolution will require that the trait reference is WF according +to the rules given above. + +**Normalizing associated type references.** Whenever a projection type +like `T::Foo` is normalized, we will require that the trait reference +is WF. + +# Drawbacks + +N/A + +# Alternatives + +I'm not aware of any appealing alternatives. + +# Unresolved questions + +**Best policy for type aliases.** The current policy is not to check +type aliases, since they are transparent to type-checking, and hence +their expansion can be checked instead. This is coherent, though +somewhat confusing in terms of the interaction with projections, since +we frequently cannot resolve projections without at least minimal +bounds (i.e., `type IteratorAndItem = (T::Item, +T)`). Still, full-checking of WF on type aliases seems to just mean +more annotation with little benefit. It might be nice to keep the +current policy and later, if/when we adopt a more full notion of +implied bounds, rationalize it by saying that the suitable bounds for +a type alias are implied by its expansion. + +**For trait object type fragments, should we check WF conditions when +we can?** For example, if you have: + +```rust +trait HashSet +``` + +should an object like `Box>` be illegal? It seems +like that would be inline with our "best effort" approach to bound +regions, so probably yes. + +[RFC 192]: https://github.com/rust-lang/rfcs/blob/master/text/0192-bounds-on-object-and-generic-types.md +[RFC 195]: https://github.com/rust-lang/rfcs/blob/master/text/0195-associated-items.md +[RFC 447]: https://github.com/rust-lang/rfcs/blob/master/text/0447-no-unused-impl-parameters.md +[#21748]: https://github.com/rust-lang/rust/issues/21748 +[#23442]: https://github.com/rust-lang/rust/issues/23442 +[#24622]: https://github.com/rust-lang/rust/issues/24622 +[#22436]: https://github.com/rust-lang/rust/pull/22436 +[#22246]: https://github.com/rust-lang/rust/issues/22246 +[#25860]: https://github.com/rust-lang/rust/issues/25860 +[#25692]: https://github.com/rust-lang/rust/issues/25692 +[adapted]: https://github.com/rust-lang/rust/issues/22246#issuecomment-74186523 +[#22077]: https://github.com/rust-lang/rust/issues/22077 +[#24461]: https://github.com/rust-lang/rust/pull/24461 +[#21974]: https://github.com/rust-lang/rust/issues/21974 +[RFC 546]: 0546-Self-not-sized-by-default.md + +# Appendix + +The informal explanation glossed over some details. This appendix +tries to be a bit more thorough with how it is that we can conclude +that a projection outlives `'a` if its inputs outlive `'a`. To start, +let's specify the projection `` as: + + >::Id + +where `P` can be a lifetime or type parameter as appropriate. 
+ +Then we know that there exists some impl of the form: + +```rust +impl Trait for Q0 { + type Id = T; +} +``` + +Here again, `X` can be a lifetime or type parameter name, and `Q` can +be any lifetime or type parameter. + +Let `Θ` be a suitable substitution `[Xi => Ri]` such that `∀i. Θ Qi == +Pi` (in other words, so that the impl applies to the projection). Then +the normalized form of `` is `Θ T`. Note that because trait +matching is invariant, the types must be exactly equal. + +[RFC 447] and [#24461] require that a parameter `Xi` can only appear +in `T` if it is *constrained* by the trait reference `>`. The full definition of *constrained* appears below, +but informally it means roughly that `Xi` appears in `Q0..Qn` +somewhere outside of a projection. Let's call the constrained set of +parameters `Constrained(Q0..Qn)`. + +Recall the rule `OutlivesProjectionComponents`: + + OutlivesProjectionComponents: + ∀i. R ⊢ Pi: 'a + -------------------------------------------------- + R ⊢ >::Id: 'a + +We aim to show that `∀i. R ⊢ Pi: 'a` implies `R ⊢ (Θ T): 'a`, which implies +that this rule is a sound approximation for normalization. The +argument follows from two lemmas ("proofs" for these lemmas are +sketched below): + +1. First, we show that if `R ⊢ Pi: 'a`, then every "subcomponent" `P'` + of `Pi` outlives `'a`. The idea here is that each variable `Xi` + from the impl will match against and extract some subcomponent `P'` + of `Pi`, and we wish to show that the subcomponent `P'` extracted + by `Xi` outlives `'a`. +2. Then we will show that the type `θ T` outlives `'a` if, for each of + the in-scope parameters `Xi`, `Θ Xi: 'a`. + +**Definition 1.** `Constrained(T)` defines the set of type/lifetime +parameters that are *constrained* by a type. This set is found just by +recursing over and extracting all subcomponents *except* for those +found in a projection. This is because a type like `X::Foo` does not +constrain what type `X` can take on, rather it uses `X` as an input to +compute a result: + + Constrained(scalar) = {} + Constrained(X) = {X} + Constrained(&'x T) = {'x} | Constrained(T) + Constrained(O0..On+'x) = Union(Constrained(Oi)) | {'x} + Constrained([T]) = Constrained(T), + Constrained(for<..> fn(T1..Tn) -> T0) = Union(Constrained(Ti)) + Constrained(>::Id) = {} // empty set + +**Definition 2.** `Constrained('a) = {'a}`. In other words, a lifetime +reference just constraints itself. + +**Lemma 1:** Given `R ⊢ P: 'a`, `P = [X => P'] Q`, and `X ∈ Constrained(Q)`, +then `R ⊢ P': 'a`. Proceed by induction and by cases over the form of `P`: + +1. If `P` is a scalar or parameter, there are no subcomponents, so `P'=P`. +2. For nominal types, references, objects, and function types, either + `P'=P` or `P'` is some subcomponent of `P`. The appropriate "outlives" + rules all require that all subcomponents outlive `'a`, and hence + the conclusion follows by induction. +3. If `P'` is a projection, that implies that `P'=P`. + - Otherwise, `Q` must be a projection, and in that case, `Constrained(Q)` would be + the empty set. + +**Lemma 2:** Given that `FV(T) ∈ X`, `∀i. Ri: 'a`, then `[X => R] T: +'a`. In other words, if all the type/lifetime parameters that appear +in a type outlive `'a`, then the type outlives `'a`. Follows by +inspection of the outlives rules. 
+ +# Edit History + +[RFC1592] - amend to require that tuple fields be sized + +[crater-errors]: https://gist.github.com/nikomatsakis/2f851e2accfa7ba2830d#root-regressions-sorted-by-rank +[crater-all]: https://gist.github.com/nikomatsakis/364fae49de18268680f2#root-regressions-sorted-by-rank +[#21953]: https://github.com/rust-lang/rust/issues/21953 +[RFC1592]: https://github.com/rust-lang/rfcs/pull/1592 \ No newline at end of file diff --git a/text/1216-bang-type.md b/text/1216-bang-type.md new file mode 100644 index 00000000000..35acefdfa51 --- /dev/null +++ b/text/1216-bang-type.md @@ -0,0 +1,415 @@ +- Feature Name: bang_type +- Start Date: 2015-07-19 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1216 +- Rust Issue: https://github.com/rust-lang/rust/issues/35121 + +# Summary + +Promote `!` to be a full-fledged type equivalent to an `enum` with no variants. + +# Motivation + +To understand the motivation for this it's necessary to understand the concept +of empty types. An empty type is a type with no inhabitants, ie. a type for +which there is nothing of that type. For example consider the type `enum Never +{}`. This type has no constructors and therefore can never be instantiated. It +is empty, in the sense that there are no values of type `Never`. Note that +`Never` is not equivalent to `()` or `struct Foo {}` each of which have exactly +one inhabitant. Empty types have some interesting properties that may be +unfamiliar to programmers who have not encountered them before. + + * They never exist at runtime. + Because there is no way to create one. + + * They have no logical machine-level representation. + One way to think about this is to consider the number of bits required to + store a value of a given type. A value of type `bool` can be in two + possible states (`true` and `false`). Therefore to specify which state a + `bool` is in we need `log2(2) ==> 1` bit of information. A value of type + `()` can only be in one possible state (`()`). Therefore to specify which + state a `()` is in we need `log2(1) ==> 0` bits of information. A value of + type `Never` has no possible states it can be in. Therefore to ask which of + these states it is in is a meaningless question and we have `log2(0) ==> + undefined` (or `-∞`). Having no representation is not problematic as safe + code never has reason nor ability to handle data of an empty type (as such + data can never exist). In practice, Rust currently treats empty types as + having size 0. + + * Code that handles them never executes. + Because there is no value that it could execute with. Therefore, having a + `Never` in scope is a static guarantee that a piece of code will never be + run. + + * They represent the return type of functions that don't return. + For a function that never returns, such as `exit`, the set of all values it + may return is the empty set. That is to say, the type of all values it may + return is the type of no inhabitants, ie. `Never` or anything isomorphic to + it. Similarly, they are the logical type for expressions that never return + to their caller such as `break`, `continue` and `return`. + + * They can be converted to any other type. + To specify a function `A -> B` we need to specify a return value in `B` for + every possible argument in `A`. 
For example, an expression that converts + `bool -> T` needs to specify a return value for both possible arguments + `true` and `false`: + + ```rust + let foo: &'static str = match x { + true => "some_value", + false => "some_other_value", + }; + ``` + + Likewise, an expression to convert `() -> T` needs to specify one value, + the value corresponding to `()`: + + ```rust + let foo: &'static str = match x { + () => "some_value", + }; + ``` + + And following this pattern, to convert `Never -> T` we need to specify a + `T` for every possible `Never`. Of which there are none: + + ```rust + let foo: &'static str = match x { + }; + ``` + + Reading this, it may be tempting to ask the question "what is the value of + `foo` then?". Remember that this depends on the value of `x`. As there are + no possible values of `x` it's a meaningless question and besides, the + fact that `x` has type `Never` gives us a static guarantee that the match + block will never be executed. + +Here's some example code that uses `Never`. This is legal rust code that you +can run today. + +```rust +use std::process::exit; + +// Our empty type +enum Never {} + +// A diverging function with an ordinary return type +fn wrap_exit() -> Never { + exit(0); +} + +// we can use a `Never` value to diverge without using unsafe code or calling +// any diverging intrinsics +fn diverge_from_never(n: Never) -> ! { + match n { + } +} + +fn main() { + let x: Never = wrap_exit(); + // `x` is in scope, everything below here is dead code. + + let y: String = match x { + // no match cases as `Never` has no variants + }; + + // we can still use `y` though + println!("Our string is: {}", y); + + // we can use `x` to diverge + diverge_from_never(x) +} +``` + +This RFC proposes that we allow `!` to be used directly, as a type, rather than +using `Never` (or equivalent) in its place. Under this RFC, the above code +could more simply be written. + +```rust +use std::process::exit; + +fn main() { + let x: ! = exit(0); + // `x` is in scope, everything below here is dead code. + + let y: String = match x { + // no match cases as `Never` has no variants + }; + + // we can still use `y` though + println!("Our string is: {}", y); + + // we can use `x` to diverge + x +} +``` + +So why do this? AFAICS there are 3 main reasons + + * **It removes one superfluous concept from the language and allows diverging + functions to be used in generic code.** + + Currently, Rust's functions can be divided into two kinds: those that + return a regular type and those that use the `-> !` syntax to mark + themselves as diverging. This division is unnecessary and means that + functions of the latter kind don't play well with generic code. + + For example: you want to use a diverging function where something expects a + `Fn() -> T` + + ```rust + fn foo() -> !; + fn call_a_fn T>(f: F) -> T; + + call_a_fn(foo) // ERROR! + ``` + + Or maybe you want to use a diverging function to implement a trait method + that returns an associated type: + + ```rust + trait Zog { + type Output + fn zog() -> Output; + }; + + impl Zog for T { + type Output = !; // ERROR! + fn zog() -> ! { panic!("aaah!") }; // ERROR! + } + ``` + + The workaround in these cases is to define a type like `Never` and use it + in place of `!`. You can then define functions `wrap_foo` and `unwrap_zog` + similar to the functions `wrap_exit` and `diverge_from_never` defined + earlier. It would be nice if this workaround wasn't necessary. 
+ + * **It creates a standard empty type for use throughout rust code.** + + Empty types are useful for more than just marking functions as diverging. + When used in an enum variant they prevent the variant from ever being + instantiated. One major use case for this is if a method needs to return a + `Result` to satisfy a trait but we know that the method will always + succeed. + + For example, here's a saner implementation of `FromStr` for `String` than + currently exists in `libstd`. + + ```rust + impl FromStr for String { + type Err = !; + + fn from_str(s: &str) -> Result { + Ok(String::from(s)) + } + } + ``` + + This result can then be safely unwrapped to a `String` without using + code-smelly things like `unreachable!()` which often mask bugs in code. + + ```rust + let r: Result = FromStr::from_str("hello"); + let s = match r { + Ok(s) => s, + Err(e) => match e {}, + } + ``` + + Empty types can also be used when someone needs a dummy type to implement a + trait. Because `!` can be converted to any other type it has a trivial + implementation of any trait whose only associated items are non-static + methods. The impl simply matches on self for every method. + + Example: + + ```rust + trait ToSocketAddr { + fn to_socket_addr(&self) -> IoResult; + fn to_socket_addr_all(&self) -> IoResult>; + } + + impl ToSocketAddr for ! { + fn to_socket_addr(&self) -> IoResult { + match self {} + } + + fn to_socket_addr_all(&self) -> IoResult> { + match self {} + } + } + ``` + + All possible implementations of this trait for `!` are equivalent. This is + because any two functions that take a `!` argument and return the same type + are equivalent - they return the same result for the same arguments and + have the same effects (because they are uncallable). + + Suppose someone wants to call `fn foo(arg: Option)` with + `None`. They need to choose a type for `T` so they can pass `None::` as + the argument. However there may be no sensible default type to use for `T` + or, worse, they may not have any types at their disposal that implement + `SomeTrait`. As the user in this case is only using `None`, a sensible + choice for `T` would be a type such that `Option` can ony be `None`, ie. + it would be nice to use `!`. If `!` has a trivial implementation of + `SomeTrait` then the choice of `T` is truly irrelevant as this means `foo` + doesn't use any associated types/lifetimes/constants or static methods of + `T` and is therefore unable to distinguish `None::` from `None::`. + With this RFC, the user could `impl SomeTrait for !` (if `SomeTrait`'s + author hasn't done so already) and call `foo(None::)`. + + Currently, `Never` can be used for all the above purposes. It's useful + enough that @reem has written a package for it + [here](https://github.com/reem/rust-void) where it is named `Void`. I've also + invented it independently for my own projects and probably other people + have as well. However `!` can be extended logically to cover all the above + use cases. Doing so would standardise the concept and prevent different + people reimplementing it under different names. + + * **Better dead code detection** + + Consider the following code: + + ``` + let t = std::thread::spawn(|| panic!("nope")); + t.join().unwrap(); + println!("hello"); + + ``` + Under this RFC: the closure body gets typed `!` instead of `()`, the `unwrap()` + gets typed `!`, and the `println!` will raise a dead code warning. There's no + way current rust can detect cases like that. 
+ + * **Because it's the correct thing to do.** + + The empty type is such a fundamental concept that - given that it already + exists in the form of empty enums - it warrants having a canonical form of + it built-into the language. For example, `return` and `break` expressions + should logically be typed `!` but currently seem to be typed `()`. (There + is some code in the compiler that assigns type `()` to diverging + expressions because it doesn't have a sensible type to assign to them). + This means we can write stuff like this: + + ```rust + match break { + () => ... // huh? Where did that `()` come from? + } + ``` + + But not this: + + ```rust + match break {} // whaddaya mean non-exhaustive patterns? + ``` + + This is just weird and should be fixed. + +I suspect the reason that `!` isn't already treated as a canonical empty type +is just most people's unfamilarity with empty types. To draw a parallel in +history: in C `void` is in essence a type like any other. However it can't be +used in all the normal positions where a type can be used. This breaks generic +code (eg. `T foo(); T val = foo()` where `T == void`) and forces one to use +workarounds such as defining `struct Void {}` and wrapping `void`-returning +functions. + +In the early days of programming having a type that contained no data probably +seemed pointless. After all, there's no point in having a `void` typed function +argument or a vector of `void`s. So `void` was treated as merely a special +syntax for denoting a function as returning no value resulting in a language +that was more broken and complicated than it needed to be. + +Fifty years later, Rust, building on decades of experience, decides to fix C's +shortsightedness and bring `void` into the type system in the form of the empty +tuple `()`. Rust also introduces coproduct types (in the form of enums), +allowing programmers to work with uninhabited types (such as `Never`). However +rust also introduces a special syntax for denoting a function as never +returning: `fn() -> !`. Here, `!` is in essence a type like any other. However +it can't be used in all the normal positions where a type can be used. This +breaks generic code (eg. `fn() -> T; let val: T = foo()` where `T == !`) and +forces one to use workarounds such as defining `enum Never {}` and wrapping +`!`-returning functions. + +To be clear, `!` has a meaning in any situation that any other type does. A `!` +function argument makes a function uncallable, a `Vec` is a vector that can +never contain an element, a `!` enum variant makes the variant guaranteed never +to occur and so forth. It might seem pointless to use a `!` function argument +or a `Vec` (just as it would be pointless to use a `()` function argument or +a `Vec<()>`), but that's no reason to disallow it. And generic code sometimes +requires it. + +Rust already has empty types in the form of empty enums. Any code that could be +written with this RFC's `!` can already be written by swapping out `!` with +`Never` (sans implicit casts, see below). So if this RFC could create any +issues for the language (such as making it unsound or complicating the +compiler) then these issues would already exist for `Never`. + +It's also worth noting that the `!` proposed here is *not* the bottom type that +used to exist in Rust in the very early days. Making `!` a subtype of all types +would greatly complicate things as it would require, for example, `Vec` be a +subtype of `Vec`. 
This `!` is simply an empty type (albeit one that can be +cast to any other type) + +# Detailed design + +Add a type `!` to Rust. `!` behaves like an empty enum except that it can be +implicitly cast to any other type. ie. the following code is acceptable: + +```rust +let r: Result = Ok(23); +let i = match r { + Ok(i) => i, + Err(e) => e, // e is cast to i32 +} +``` + +Implicit casting is necessary for backwards-compatibility so that code like the +following will continue to compile: + +```rust +let i: i32 = match some_bool { + true => 23, + false => panic!("aaah!"), // an expression of type `!`, gets cast to `i32` +} + +match break { + () => 23, // matching with a `()` forces the match argument to be cast to type `()` +} +``` +These casts can be implemented by having the compiler assign a fresh, diverging +type variable to any expression of type `!`. + +In the compiler, remove the distinction between diverging and converging +functions. Use the type system to do things like reachability analysis. + +Allow expressions of type `!` to be explicitly cast to any other type (eg. +`let x: u32 = break as u32;`) + +Add an implementation for `!` of any trait that it can trivially implement. Add +methods to `Result` and `Result` for safely extracting the inner +value. Name these methods along the lines of `unwrap_nopanic`, `safe_unwrap` or +something. + +# Drawbacks + +Someone would have to implement this. + +# Alternatives + + * Don't do this. + * Move @reem's `Void` type into `libcore`. This would create a standard empty + type and make it available for use in the standard libraries. If we were to + do this it might be an idea to rename `Void` to something else (`Never`, + `Empty` and `Mu` have all been suggested). Although `Void` has some + precedence in languages like Haskell and Idris the name is likely to trip + up people coming from a C/Java et al. background as `Void` is *not* `void` + but it can be easy to confuse the two. + +# Unresolved questions + +`!` has a unique impl of any trait whose only items are non-static methods. It +would be nice if there was a way a to automate the creation of these impls. +Should `!` automatically satisfy any such trait? This RFC is not blocked on +resolving this question if we are willing to accept backward-incompatibilities +in questionably-valid code which tries to call trait methods on diverging +expressions and relies on the trait being implemented for `()`. As such, the +issue has been given [it's own RFC](https://github.com/rust-lang/rfcs/pull/1637). + diff --git a/text/1219-use-group-as.md b/text/1219-use-group-as.md new file mode 100644 index 00000000000..15dd88f2ea6 --- /dev/null +++ b/text/1219-use-group-as.md @@ -0,0 +1,72 @@ +- Feature Name: use_group_as +- Start Date: 2015-02-15 +- RFC PR: [rust-lang/rfcs#1219](https://github.com/rust-lang/rfcs/pull/1219) +- Rust Issue: [rust-lang/rust#27578](https://github.com/rust-lang/rust/issues/27578) + +# Summary + +Allow renaming imports when importing a group of symbols from a module. + +```rust +use std::io::{ + Error as IoError, + Result as IoResult, + Read, + Write +} +``` + +# Motivation + +The current design requires the above example to be written like this: + +```rust +use std::io::Error as IoError; +use std::io::Result as IoResult; +use std::io::{Read, Write}; +``` + +It's unfortunate to duplicate `use std::io::` on the 3 lines, and the proposed +example feels logical, and something you reach for in this instance, without +knowing for sure if it worked. 
+ +# Detailed design + +The current grammar for use statements is something like: + +``` + use_decl : "pub" ? "use" [ path "as" ident + | path_glob ] ; + + path_glob : ident [ "::" [ path_glob + | '*' ] ] ? + | '{' path_item [ ',' path_item ] * '}' ; + + path_item : ident | "self" ; +``` + +This RFC proposes changing the grammar to something like: + +``` + use_decl : "pub" ? "use" [ path [ "as" ident ] ? + | path_glob ] ; + + path_glob : ident [ "::" [ path_glob + | '*' ] ] ? + | '{' path_item [ ',' path_item ] * '}' ; + + path_item : ident [ "as" ident] ? + | "self" [ "as" ident]; +``` + +The `"as" ident` part is optional in each location, and if omitted, it is expanded +to alias to the same name, e.g. `use foo::{bar}` expands to `use foo::{bar as bar}`. + +This includes being able to rename `self`, such as `use std::io::{self +as stdio, Result as IoResult};`. + +# Drawbacks + +# Alternatives + +# Unresolved Questions diff --git a/text/1228-placement-left-arrow.md b/text/1228-placement-left-arrow.md new file mode 100644 index 00000000000..ffeba08d696 --- /dev/null +++ b/text/1228-placement-left-arrow.md @@ -0,0 +1,212 @@ +- Feature Name: place_left_arrow_syntax +- Start Date: 2015-07-28 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1228 +- Rust Issue: https://github.com/rust-lang/rust/issues/27779 + +# Summary + +Rather than trying to find a clever syntax for placement-new that leverages +the `in` keyword, instead use the syntax `PLACE_EXPR <- VALUE_EXPR`. + +This takes advantage of the fact that `<-` was reserved as a token via +historical accident (that for once worked out in our favor). + +# Motivation + +One sentence: the syntax `a <- b` is short, can be parsed without +ambiguity, and is strongly connotated already with assignment. + +Further text (essentially historical background): + +There is much debate about what syntax to use for placement-new. +We started with `box (PLACE_EXPR) VALUE_EXPR`, then migrated towards +leveraging the `in` keyword instead of `box`, yielding `in (PLACE_EXPR) VALUE_EXPR`. + +A lot of people disliked the `in (PLACE_EXPR) VALUE_EXPR` syntax +(see discussion from [RFC 809]). + +[RFC 809]: https://github.com/rust-lang/rfcs/pull/809 + +In response to that discussion (and also due to personal preference) +I suggested the alternative syntax `in PLACE_EXPR { BLOCK_EXPR }`, +which is what landed when [RFC 809] was merged. + +However, it is worth noting that this alternative syntax actually +failed to address a number of objections (some of which also +applied to the original `in (PLACE_EXPR) VALUE_EXPR` syntax): + + * [kennytm](https://github.com/rust-lang/rfcs/pull/809#issuecomment-73071324) + + > While in (place) value is syntactically unambiguous, it looks + > completely unnatural as a statement alone, mainly because there + > are no verbs in the correct place, and also using in alone is + > usually associated with iteration (for x in y) and member + > testing (elem in set). + + * [petrochenkov](https://github.com/rust-lang/rfcs/pull/809#issuecomment-73142168) + + > As C++11 experience has shown, when it's available, it will + > become the default method of inserting elements in containers, + > since it's never performing worse than "normal insertion" and + > is often better. So it should really have as short and + > convenient syntax as possible. 
+ + * [p1start](https://github.com/rust-lang/rfcs/pull/809#issuecomment-73837430) + + > I’m not a fan of in { }, simply because the + > requirement of a block suggests that it’s some kind of control + > flow structure, or that all the statements inside will be + > somehow run ‘in’ the given (or perhaps, as @m13253 + > seems to have interpreted it, for all box expressions to go + > into the given place). It would be our first syntactical + > construct which is basically just an operator that has to + > have a block operand. + +I believe the `PLACE_EXPR <- VALUE_EXPR` syntax addresses all of the +above concerns. + +Thus cases like allocating into an arena (which needs to take as input the arena itself +and a value-expression, and returns a reference or handle for the allocated entry in the arena -- i.e. *cannot* return unit) +would look like: + +```rust +let ref_1 = arena <- value_expression; +let ref_2 = arena <- value_expression; +``` + +compare the above against the way this would look under [RFC 809]: + +```rust +let ref_1 = in arena { value_expression }; +let ref_2 = in arena { value_expression }; +``` + +# Detailed design + +Extend the parser to parse `EXPR <- EXPR`. The left arrow operator is +right-associative and has precedence higher than assignment and +binop-assignment, but lower than other binary operators. + +`EXPR <- EXPR` is parsed into an AST form that is desugared in much +the same way that `in EXPR { BLOCK }` or `box (EXPR) EXPR` are +desugared (see [PR 27215]). + +Thus the static and dynamic semantics of `PLACE_EXPR <- VALUE_EXPR` +are *equivalent* to `box (PLACE_EXPR) VALUE_EXPR`. Namely, it is +still an expression form that operates by: + 1. Evaluate the `PLACE_EXPR` to a place + 2. Evaluate `VALUE_EXPR` directly into the constructed place + 3. Return the finalized place value. + +(See protocol as documented in [RFC 809] for more details here.) + +[PR 27215]: https://github.com/rust-lang/rust/pull/27215 + +This parsing form can be separately feature-gated (this RFC was +written assuming that would be the procedure). However, since +placement-`in` landed very recently ([PR 27215]) and is still +feature-gated, we can also just fold this change in with +the pre-existing `placement_in_syntax` feature gate +(though that may be non-intuitive since the keyword `in` is +no longer part of the syntactic form). + +This feature has already been prototyped, see [place-left-syntax branch]. + +[place-left-syntax branch]: https://github.com/rust-lang/rust/compare/rust-lang:master...pnkfelix:place-left-syntax + +Then, (after sufficient snapshot and/or time passes) remove the following syntaxes: + + * `box (PLACE_EXPR) VALUE_EXPR` + * `in PLACE_EXPR { VALUE_BLOCK }` + +That is, `PLACE_EXPR <- VALUE_EXPR` will be the "one true way" to +express placement-new. + +(Note that support for `box VALUE_EXPR` will remain, and in fact, the +expression `(box ())` expression will become unambiguous and thus we +could make it legal. Because, you know, those boxes of unit have a +syntax that is really important to optimize.) + +Finally, it would may be good, as part of this process, to actually +amend the text [RFC 809] itself to use the `a <- b` syntax. +At least, it seems like many people use the RFC's as a reference source +even when they are later outdated. +(An easier option though may be to just add a forward reference to this +RFC from [RFC 809], if this RFC is accepted.) 
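
To make the three evaluation steps above concrete, here is a minimal sketch of
how `PLACE_EXPR <- VALUE_EXPR` could be modeled as a library call. The
`Placer`/`InPlace` traits and the `emplace` helper below are illustrative
stand-ins patterned after the placement protocol of [RFC 809]; they are
assumptions for exposition, not the exact trait names or signatures the
compiler desugars to.

```rust
use std::ptr;

// Illustrative stand-ins for the placement protocol (names are assumptions,
// not the exact RFC 809 API).
trait Placer<Data> {
    type Place: InPlace<Data>;
    fn make_place(self) -> Self::Place;
}

trait InPlace<Data> {
    type Owner;
    fn pointer(&mut self) -> *mut Data;
    unsafe fn finalize(self) -> Self::Owner;
}

// `place <- value` would then behave roughly like this helper function:
fn emplace<Data, P: Placer<Data>>(placer: P, value: Data) -> <P::Place as InPlace<Data>>::Owner {
    let mut place = placer.make_place();    // 1. evaluate PLACE_EXPR to a place
    unsafe {
        ptr::write(place.pointer(), value); // 2. evaluate VALUE_EXPR directly into that place
        place.finalize()                    // 3. return the finalized place value
    }
}
```

With such a helper, `let ref_1 = arena <- value_expression;` reads roughly as
`let ref_1 = emplace(arena, value_expression);`, with the compiler additionally
guaranteeing that the value expression is evaluated in place rather than moved
afterwards.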
+ +# Drawbacks + +The only drawback I am aware of is this [comment from nikomataskis](https://github.com/rust-lang/rfcs/pull/809#issuecomment-73903777) + +> the intent is less clear than with a devoted keyword. + +Note however that this was stated with regards to a hypothetical +overloading of the `=` operator (at least that is my understanding). + +I think the use of the `<-` operator can be considered sufficiently +"devoted" (i.e. separate) syntax to placate the above concern. + +# Alternatives + +See [different surface syntax] from the alternatives from [RFC 809]. + +[different surface syntax]: https://github.com/pnkfelix/rfcs/blob/fsk-placement-box-rfc/text/0000-placement-box.md#same-semantics-but-different-surface-syntax + +Also, if we want to try to make it clear that this is not *just* +an assignment, we could combine `in` and `<-`, yielding e.g.: + +```rust +let ref_1 = in arena <- value_expression; +let ref_2 = in arena <- value_expression; +``` + +## Precedence + +Finally, precedence of this operator may be defined to be anything from being +less than assignment/binop-assignment (set of right associative operators with +lowest precedence) to highest in the language. The most prominent choices are: + +1. Less than assignment: + + Assuming `()` never becomes a `Placer`, this resolves a pretty common + complaint that a statement such as `x = y <- z` is not clear or readable + by forcing the programmer to write `x = (y <- z)` for code to typecheck. + This, however introduces an inconsistency in parsing between `let x =` and + `x =`: `let x = (y <- z)` but `(x = z) <- y`. + +2. Same as assignment and binop-assignment: + + `x = y <- z = a <- b = c = d <- e <- f` parses as + `x = (y <- (z = (a <- (b = (c = (d <- (e <- f)))))))`. This is so far + the easiest option to implement in the compiler. + +3. More than assignment and binop-assignment, but less than any other operator: + + This is what this RFC currently proposes. This allows for various + expressions involving equality symbols and `<-` to be parsed reasonably and + consistently. For example `x = y <- z += a <- b <- c` would get parsed as `x + = ((y <- z) += (a <- (b <- c)))`. + +4. More than any operator: + + This is not a terribly interesting one, but still an option. Works well if + we want to force people enclose both sides of the operator into parentheses + most of the time. This option would get `x <- y <- z * a` parsed as `(x <- + (y <- z)) * a`. + +# Unresolved questions + +**What should the precedence of the `<-` operator be?** In particular, +it may make sense for it to have the same precedence of `=`, as argued +in [these][huon1] [comments][huon2]. The ultimate answer here will +probably depend on whether the result of `a <- b` is commonly composed +and how, so it was decided to hold off on a final decision until there +was more usage in the wild. + +[huon1]: https://github.com/rust-lang/rfcs/pull/1319#issuecomment-206627750 +[huon2]: https://github.com/rust-lang/rfcs/pull/1319#issuecomment-207090495 + +# Change log + +**2016.04.22.** Amended by [rust-lang/rfcs#1319](https://github.com/rust-lang/rfcs/pull/1319) +to adjust the precedence. 
diff --git a/text/1229-compile-time-asserts.md b/text/1229-compile-time-asserts.md new file mode 100644 index 00000000000..c15720e2d36 --- /dev/null +++ b/text/1229-compile-time-asserts.md @@ -0,0 +1,107 @@ +- Feature Name: compile_time_asserts +- Start Date: 2015-07-30 +- RFC PR: [rust-lang/rfcs#1229](https://github.com/rust-lang/rfcs/pull/1229) +- Rust Issue: [rust-lang/rust#28238](https://github.com/rust-lang/rust/issues/28238) + +# Summary + +If the constant evaluator encounters erronous code during the evaluation of +an expression that is not part of a true constant evaluation context a warning +must be emitted and the expression needs to be translated normally. + +# Definition of constant evaluation context + +There are exactly five places where an expression needs to be constant. + +- the initializer of a constant `const foo: ty = EXPR` or `static foo: ty = EXPR` +- the size of an array `[T; EXPR]` +- the length of a repeat expression `[VAL; LEN_EXPR]` +- C-Like enum variant discriminant values +- patterns + +In the future the body of `const fn` might also be interpreted as a constant +evaluation context. + +Any other expression might still be constant evaluated, but it could just +as well be compiled normally and executed at runtime. + +# Motivation + +Expressions are const-evaluated even when they are not in a const environment. + +For example + +```rust +fn blub(t: T) -> T { t } +let x = 5 << blub(42); +``` + +will not cause a compiler error currently, while `5 << 42` will. +If the constant evaluator gets smart enough, it will be able to const evaluate +the `blub` function. This would be a breaking change, since the code would not +compile anymore. (this occurred in https://github.com/rust-lang/rust/pull/26848). + +# Detailed design + +The PRs https://github.com/rust-lang/rust/pull/26848 and https://github.com/rust-lang/rust/pull/25570 will be setting a precedent +for warning about such situations (WIP, not pushed yet). + +When the constant evaluator fails while evaluating a normal expression, +a warning will be emitted and normal translation needs to be resumed. + +# Drawbacks + +None, if we don't do anything, the const evaluator cannot get much smarter. + +# Alternatives + +## allow breaking changes + +Let the compiler error on things that will unconditionally panic at runtime. + +## insert an unconditional panic instead of generating regular code + +GNAT (an Ada compiler) does this already: + +```ada +procedure Hello is + Var: Integer range 15 .. 20 := 21; +begin + null; +end Hello; +``` + +The anonymous subtype `Integer range 15 .. 20` only accepts values in `[15, 20]`. +This knowledge is used by GNAT to emit the following warning during compilation: + +``` +warning: value not in range of subtype of "Standard.Integer" defined at line 2 +warning: "Constraint_Error" will be raised at run time +``` + +I don't have a GNAT with `-emit-llvm` handy, but here's the asm with `-O0`: + +```asm +.cfi_startproc +pushq %rbp +.cfi_def_cfa_offset 16 +.cfi_offset 6, -16 +movq %rsp, %rbp +.cfi_def_cfa_register 6 +movl $2, %esi +movl $.LC0, %edi +movl $0, %eax +call __gnat_rcheck_CE_Range_Check +``` + + +# Unresolved questions + +## Const-eval the body of `const fn` that are never used in a constant environment + +Currently a `const fn` that is called in non-const code is treated just like a normal function. + +In case there is a statically known erroneous situation in the body of the function, +the compiler should raise an error, even if the function is never called. 
+ +The same applies to unused associated constants. diff --git a/text/1236-stabilize-catch-panic.md b/text/1236-stabilize-catch-panic.md new file mode 100644 index 00000000000..f943c613293 --- /dev/null +++ b/text/1236-stabilize-catch-panic.md @@ -0,0 +1,488 @@ +- Feature Name: `recover` +- Start Date: 2015-07-24 +- RFC PR: [rust-lang/rfcs#1236](https://github.com/rust-lang/rfcs/pull/1236) +- Rust Issue: [rust-lang/rust#27719](https://github.com/rust-lang/rust/issues/27719) + +# Summary + +Move `std::thread::catch_panic` to `std::panic::recover` after replacing the +`Send + 'static` bounds on the closure parameter with a new `PanicSafe` +marker trait. + +# Motivation + +In today's stable Rust it's not possible to catch a panic on the thread that +caused it. There are a number of situations, however, where this is +either required for correctness or necessary for building a useful abstraction: + +* It is currently defined as undefined behavior to have a Rust program panic + across an FFI boundary. For example if C calls into Rust and Rust panics, then + this is undefined behavior. Being able to catch a panic will allow writing + C APIs in Rust that do not risk aborting the process they are embedded into. + +* Abstractions like thread pools want to catch the panics of tasks being run + instead of having the thread torn down (and having to spawn a new thread). + +Stabilizing the `catch_panic` function would enable these two use cases, but +let's also take a look at the current signature of the function: + +```rust +fn catch_panic(f: F) -> thread::Result + where F: FnOnce() -> R + Send + 'static +``` + +This function will run the closure `f` and if it panics return `Err(Box)`. +If the closure doesn't panic it will return `Ok(val)` where `val` is the +returned value of the closure. The closure, however, is restricted to only close +over `Send` and `'static` data. These bounds can be overly restrictive, and due +to thread-local storage [they can be subverted][tls-subvert], making it unclear +what purpose they serve. This RFC proposes to remove the bounds as well. + +[tls-subvert]: https://github.com/rust-lang/rust/issues/25662 + +Historically Rust has purposefully avoided the foray into the situation of +catching panics, largely because of a problem typically referred to as +"exception safety". To further understand the motivation of stabilization and +relaxing the bounds, let's review what exception safety is and what it means for +Rust. + +# Background: What is exception safety? + +Languages with exceptions have the property that a function can "return" early +if an exception is thrown. While exceptions aren't too hard to reason about when +thrown explicitly, they can be problematic when they are thrown by code being +called -- especially when that code isn't known in advance. Code is **exception +safe** if it works correctly even when the functions it calls into throw +exceptions. + +The idea of throwing an exception causing bugs may sound a bit alien, so it's +helpful to drill down into exactly why this is the case. Bugs related to +exception safety are comprised of two critical components: + +1. An invariant of a data structure is broken. +2. This broken invariant is the later observed. + +Exceptional control flow often exacerbates this first component of breaking +invariants. For example many data structures have a number of invariants that +are dynamically upheld for correctness, and the type's routines can temporarily +break these invariants to be fixed up before the function returns. 
If, however, +an exception is thrown in this interim period the broken invariant could be +accidentally exposed. + +The second component, observing a broken invariant, can sometimes be difficult +in the face of exceptions, but languages often have constructs to enable these +sorts of witnesses. Two primary methods of doing so are something akin to +finally blocks (code run on a normal or exceptional return) or just catching the +exception. In both cases code which later runs that has access to the original +data structure will see the broken invariants. + +Now that we've got a better understanding of how an exception might cause a bug +(e.g. how code can be "exception unsafe"), let's take a look how we can make +code exception safe. To be exception safe, code needs to be prepared for an +exception to be thrown whenever an invariant it relies on is broken, for +example: + +* Code can be audited to ensure it only calls functions which are statically + known to not throw an exception. +* Local "cleanup" handlers can be placed on the stack to restore invariants + whenever a function returns, either normally or exceptionally. This can be + done through finally blocks in some languages or via destructors in others. +* Exceptions can be caught locally to perform cleanup before possibly re-raising + the exception. + +With all that in mind, we've now identified problems that can arise via +exceptions (an invariant is broken and then observed) as well as methods to +ensure that prevent this from happening. In languages like C++ this means that +we can be memory safe in the face of exceptions and in languages like Java we +can ensure that our logical invariants are upheld. Given this background let's +take a look at how any of this applies to Rust. + +# Background: What is exception safety in Rust? + +> Note: This section describes the current state of Rust today without this RFC +> implemented + +Up to now we've been talking about exceptions and exception safety, but from a +Rust perspective we can just replace this with panics and panic safety. Panics +in Rust are currently implemented essentially as a C++ exception under the hood. +As a result, **exception safety is something that needs to be handled in Rust +code today**. + +One of the primary examples where panics need to be handled in Rust is unsafe +code. Let's take a look at an example where this matters: + +```rust +pub fn push_ten_more(v: &mut Vec, t: T) { + unsafe { + v.reserve(10); + let len = v.len(); + v.set_len(len + 10); + for i in 0..10 { + ptr::write(v.as_mut_ptr().offset(len + i), t.clone()); + } + } +} +``` + +While this code may look correct, it's actually not memory safe. +`Vec` has an internal invariant that its first `len` elements are safe to drop +at any time. Our function above has temporarily broken this invariant with the +call to `set_len` (the next 10 elements are uninitialized). If the type `T`'s +`clone` method panics then this broken invariant will escape the function. The +broken `Vec` is then observed during its destructor, leading to the eventual +memory unsafety. + +It's important to keep in mind that panic safety in Rust is not solely limited +to memory safety. *Logical invariants* are often just as critical to keep +correct during execution and no `unsafe` code in Rust is needed to break a +logical invariant. 
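
As a small illustration (this example is not from the standard library; the
invariant and the helper function are hypothetical), purely safe code can break
a logical invariant across a panic:

```rust
use std::cell::RefCell;

// Hypothetical invariant: the two fields of the pair are always equal.
fn bump_both<F: Fn(u32) -> u32>(pair: &RefCell<(u32, u32)>, f: F) {
    let mut p = pair.borrow_mut();
    p.0 = f(p.0); // the invariant is temporarily broken here...
    p.1 = f(p.1); // ...and is never restored if this call panics
}
```

If `f` panics on the second call, the `RefCell` is left holding an unequal
pair, and any code that later reaches it (a destructor, or a caller that caught
the panic) observes the broken invariant without any `unsafe` code involved.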
In practice, however, these sorts of bugs are rarely observed +due to Rust's design: + +* Rust doesn't expose uninitialized memory +* Panics cannot be caught in a thread +* Across threads data is poisoned by default on panics +* Idiomatic Rust must opt in to extra sharing across boundaries (e.g. `RefCell`) +* Destructors are relatively rare and uninteresting in safe code + +These mitigations all address the *second* aspect of exception unsafety: +observation of broken invariants. With the tactics in place, it ends up being +the case that **safe Rust code can largely ignore exception safety +concerns**. That being said, it does not mean that safe Rust code can *always* +ignore exception safety issues. There are a number of methods to subvert the +mitigation strategies listed above: + +1. When poisoning data across threads, antidotes are available to access + poisoned data. Namely the [`PoisonError` type][pet] allows safe access to the + poisoned information. +2. Single-threaded types with interior mutability, such as `RefCell`, allow for + sharing data across stack frames such that a broken invariant could + eventually be observed. +3. Whenever a thread panics, the destructors for its stack variables will be run + as the thread unwinds. Destructors may have access to data which was also + accessible lower on the stack (such as through `RefCell` or `Rc`) which has a + broken invariant, and the destructor may then witness this. + +[pet]: http://doc.rust-lang.org/std/sync/struct.PoisonError.html + +But all of these "subversions" fall outside the realm of normal, idiomatic, safe +Rust code, and so they all serve as a "heads up" that panic safety might be an +issue. Thus, in practice, Rust programmers worry about exception safety far less +than in languages with full-blown exceptions. + +Despite these methods to subvert the mitigations placed by default in Rust, a +key part of exception safety in Rust is that **safe code can never lead to +memory unsafety**, regardless of whether it panics or not. Memory unsafety +triggered as part of a panic can always be traced back to an `unsafe` block. + +With all that background out of the way now, let's take a look at the guts of +this RFC. + +# Detailed design + +At its heart, the change this RFC is proposing is to move +`std::thread::catch_panic` to a new `std::panic` module and rename the function +to `recover`. Additionally, the `Send + 'static` bounds on the closure parameter +will be replaced with a new trait `PanicSafe`, modifying the signature to +be: + +```rust +fn recover R + PanicSafe, R>(f: F) -> thread::Result +``` + +Before analyzing this new signature, let's take a look at this new +`PanicSafe` trait. + +## A `PanicSafe` marker trait + +As discussed in the motivation section above, the current bounds of `Send + +'static` on the closure parameter are too restrictive for common use cases, but +they can serve as a "speed bump" (like poisoning on mutexes) to add to the +repertoire of mitigation strategies that Rust has by default for dealing with +panics. + +The purpose of this marker trait will be to identify patterns which do not need +to worry about exception safety and allow them by default. In situations where +exception safety *may* be concerned then an explicit annotation will be needed +to allow the usage. In other words, this marker trait will act similarly to a +"targeted `unsafe` block". + +For the implementation details, the following items will be added to the +`std::panic` module. 

```rust
pub trait PanicSafe {}
impl PanicSafe for .. {}

impl<'a, T> !PanicSafe for &'a mut T {}
impl<'a, T: NoUnsafeCell> PanicSafe for &'a T {}
impl<T> PanicSafe for Mutex<T> {}

pub trait NoUnsafeCell {}
impl NoUnsafeCell for .. {}
impl<T> !NoUnsafeCell for UnsafeCell<T> {}

pub struct AssertPanicSafe<T>(pub T);
impl<T> PanicSafe for AssertPanicSafe<T> {}

impl<T> Deref for AssertPanicSafe<T> {
    type Target = T;
    fn deref(&self) -> &T { &self.0 }
}
impl<T> DerefMut for AssertPanicSafe<T> {
    fn deref_mut(&mut self) -> &mut T { &mut self.0 }
}
```

Let's take a look at each of these items in detail:

* `impl PanicSafe for .. {}` - this makes `PanicSafe` a marker trait, implying
  that the trait is implemented for all types by default so long as the
  constituent parts implement the trait.
* `impl<'a, T> !PanicSafe for &'a mut T {}` - this indicates that exception
  safety needs to be handled when dealing with mutable references. Thinking
  about the `recover` function, this means that the data behind the reference
  could be modified inside the block, but once it exits the data may have been
  left in an invalid state.
* `impl<'a, T: NoUnsafeCell> PanicSafe for &'a T {}` - similarly to the above
  implementation for `&mut T`, the purpose here is to highlight points where
  data can be mutated across a `recover` boundary. If `&T` does not contain an
  `UnsafeCell`, then no mutation should be possible and it is safe to allow.
* `impl<T> PanicSafe for Mutex<T> {}` - as mutexes are poisoned by default,
  they are considered exception safe.
* `pub struct AssertPanicSafe<T>(pub T);` - this is the "opt out" structure of
  exception safety. Wrapping something in this type indicates an assertion that
  it is exception safe and shouldn't be warned about when crossing the `recover`
  boundary. Otherwise this type simply acts like a `T`.

### Example usage

The only consumer of the `PanicSafe` bound is the `recover` function on the
closure type parameter, and this ends up meaning that the *environment* needs to
be exception safe. In terms of error messages, this causes the compiler to emit
an error per closed-over variable to indicate whether or not it is exception
safe to share across the boundary.

It is also a critical design aspect that usage of `PanicSafe` or
`AssertPanicSafe` does not require `unsafe` code. As discussed above, panic
safety does not directly lead to memory safety problems in otherwise safe code.

In the normal usage of `recover`, neither `PanicSafe` nor `AssertPanicSafe`
should be necessary to mention. For example, when defining an FFI function:

```rust
#[no_mangle]
pub extern fn called_from_c(ptr: *const c_char, num: i32) -> i32 {
    let result = panic::recover(|| {
        let s = unsafe { CStr::from_ptr(ptr) };
        println!("{:?}: {}", s, num);
    });
    match result {
        Ok(..) => 0,
        Err(..) => 1,
    }
}
```

Additionally, if FFI functions instead use normal Rust types, `AssertPanicSafe`
still need not be mentioned at all:

```rust
#[no_mangle]
pub extern fn called_from_c(ptr: &i32) -> i32 {
    let result = panic::recover(|| {
        println!("{}", *ptr);
    });
    match result {
        Ok(..) => 0,
        Err(..)
=> 1, + } +} +``` + +If, however, types are coming in which are flagged as not exception safe then +the `AssertPanicSafe` wrapper can be used to leverage `recover`: + +```rust +fn foo(data: &RefCell) { + panic::recover(|| { + println!("{}", data.borrow()); //~ ERROR RefCell is not panic safe + }); +} +``` + +This can be fixed with a simple assertion that the usage here is indeed +exception safe: + +```rust +fn foo(data: &RefCell) { + let data = AssertPanicSafe(data); + panic::recover(|| { + println!("{}", data.borrow()); // ok + }); +} +``` + +### Future extensions + +In the future, this RFC proposes adding the following implementation of +`PanicSafe`: + +```rust +impl PanicSafe for T {} +``` + +This implementation block encodes the "exception safe" boundary of +`thread::spawn` but is unfortunately not allowed today due to coherence rules. +If available, however, it would possibly reduce the number of false positives +which require using `AssertPanicSafe`. + +### Global complexity + +Adding a new marker trait is a pretty hefty move for the standard library. The +current marker traits, `Send` and `Sync`, are well known and are ubiquitous +throughout the ecosystem and standard library. Due to the way that these +properties are derived, adding a new marker trait can lead to a multiplicative +increase in global complexity (as all types must consider the marker trait). + +With `PanicSafe`, however, it is expected that this is not the case. The +`recover` function is not intented to be used commonly outside of FFI or thread +pool-like abstractions. Within FFI the `PanicSafe` trait is typically not +mentioned due to most types being relatively simple. Thread pools, on the other +hand, will need to mention `AssertPanicSafe`, but will likely propagate panics +to avoid exposing `PanicSafe` as a bound. + +Overall, the expected idiomatic usage of `recover` should mean that `PanicSafe` +is rarely mentioned, if at all. It is intended that `AssertPanicSafe` is ideally +only necessary where it actually needs to be considered (which idiomatically +isn't too often) and even then it's lightweight to use. + +## Will Rust have exceptions? + +In a technical sense this RFC is not "adding exceptions to Rust" as they already +exist in the form of panics. What this RFC is adding, however, is a construct +via which to catch these exceptions within a thread, bringing the standard +library closer to the exception support in other languages. + +Catching a panic makes it easier to observe broken invariants of data structures +shared across the `catch_panic` boundary, which can possibly increase the +likelihood of exception safety issues arising. + +The risk of this step is that catching panics becomes an idiomatic way to deal +with error-handling, thereby making exception safety much more of a headache +than it is today (as it's more likely that a broken invariant is later +witnessed). The `catch_panic` function is intended to only be used +where it's absolutely necessary, e.g. for FFI boundaries, but how can it be +ensured that `catch_panic` isn't overused? + +There are two key reasons `catch_panic` likely won't become idiomatic: + +1. There are already strong and established conventions around error handling, + and in particular around the use of panic and `Result` with stabilized usage + of them in the standard library. There is little chance these conventions + would change overnight. + +2. 
There has long been a desire to treat every use of `panic!` as an abort + which is motivated by portability, compile time, binary size, and a number of + other factors. Assuming this step is taken, it would be extremely unwise for + a library to signal expected errors via panics and rely on consumers using + `catch_panic` to handle them. + +For reference, here's a summary of the conventions around `Result` and `panic`, +which still hold good after this RFC: + +### Result vs Panic + +There are two primary strategies for signaling that a function can fail in Rust +today: + +* `Results` represent errors/edge-cases that the author of the library knew + about, and expects the consumer of the library to handle. + +* `panic`s represent errors that the author of the library did not expect to + occur, such as a contract violation, and therefore does not expect the + consumer to handle in any particular way. + +Another way to put this division is that: + +* `Result`s represent errors that carry additional contextual information. This + information allows them to be handled by the caller of the function producing + the error, modified with additional contextual information, and eventually + converted into an error message fit for a top-level program. + +* `panic`s represent errors that carry no contextual information (except, + perhaps, debug information). Because they represented an unexpected error, + they cannot be easily handled by the caller of the function or presented to + the top-level program (except to say "something unexpected has gone wrong"). + +Some pros of `Result` are that it signals specific edge cases that you as a +consumer should think about handling and it allows the caller to decide +precisely how to handle the error. A con with `Result` is that defining errors +and writing down `Result` + `try!` is not always the most ergonomic. + +The pros and cons of `panic` are essentially the opposite of `Result`, being +easy to use (nothing to write down other than the panic) but difficult to +determine when a panic can happen or handle it in a custom fashion, even with +`catch_panic`. + +These divisions justify the use of `panic`s for things like out-of-bounds +indexing: such an error represents a programming mistake that (1) the author of +the library was not aware of, by definition, and (2) cannot be meaningfully +handled by the caller. + +In terms of heuristics for use, `panic`s should rarely if ever be used to report +routine errors for example through communication with the system or through IO. +If a Rust program shells out to `rustc`, and `rustc` is not found, it might be +tempting to use a panic because the error is unexpected and hard to recover +from. A user of the program, however, would benefit from intermediate code +adding contextual information about the in-progress operation, and the program +could report the error in terms a they can understand. While the error is +rare, **when it happens it is not a programmer error**. In short, panics are +roughly analogous to an opaque "an unexpected error has occurred" message. + +Stabilizing `catch_panic` does little to change the tradeoffs around `Result` +and `panic` that led to these conventions. + +# Drawbacks + +A drawback of this RFC is that it can water down Rust's error handling story. +With the addition of a "catch" construct for exceptions, it may be unclear to +library authors whether to use panics or `Result` for their error types. 
As we +discussed above, however, Rust's design around error handling has always had to +deal with these two strategies, and our conventions don't materially change by +stabilizing `catch_panic`. + +# Alternatives + +One alternative, which is somewhat more of an addition, is to have the standard +library entirely abandon all exception safety mitigation tactics. As explained +in the motivation section, exception safety will not lead to memory unsafety +unless paired with unsafe code, so it is perhaps within the realm of possibility +to remove the tactics of poisoning from mutexes and simply require that +consumers deal with exception safety 100% of the time. + +This alternative is often motivated by saying that there are enough methods to +subvert the default mitigation tactics that it's not worth trying to plug some +holes and not others. Upon closer inspection, however, the areas where safe code +needs to worry about exception safety are isolated to the single-threaded +situations. For example `RefCell`, destructors, and `catch_panic` all only +expose data possibly broken through a panic in a single thread. + +Once a thread boundary is crossed, the only current way to share data mutably is +via `Mutex` or `RwLock`, both of which are poisoned by default. This sort of +sharing is fundamental to threaded code, and poisoning by default allows safe +code to freely use many threads without having to consider exception safety +across threads (as poisoned data will tear down all connected threads). + +This property of multithreaded programming in Rust is seen as strong enough that +poisoning should not be removed by default, and in fact a new hypothetical +`thread::scoped` API (a rough counterpart of `catch_panic`) could also propagate +panics by default (like poisoning) with an ability to opt out (like +`PoisonError`). + +# Unresolved questions + +- Is it worth keeping the `'static` and `Send` bounds as a mitigation measure in + practice, even if they aren't enforceable in theory? That would require thread + pools to use unsafe code, but that could be acceptable. + +- Should `catch_panic` be stabilized within `std::thread` where it lives today, + or somewhere else? diff --git a/text/1238-nonparametric-dropck.md b/text/1238-nonparametric-dropck.md new file mode 100644 index 00000000000..a79ead95e2a --- /dev/null +++ b/text/1238-nonparametric-dropck.md @@ -0,0 +1,541 @@ +- Feature Name: dropck_parametricity +- Start Date: 2015-08-05 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1238/ +- Rust Issue: https://github.com/rust-lang/rust/issues/28498 + +# Summary + +Revise the Drop Check (`dropck`) part of Rust's static analyses in two +ways. In the context of this RFC, these revisions are respectively +named `cannot-assume-parametricity` and `unguarded-escape-hatch`. + + 1. `cannot-assume-parametricity` (CAP): Make `dropck` analysis stop + relying on parametricity of type-parameters. + + 2. `unguarded-escape-hatch` (UGEH): Add an attribute (with some name + starting with "unsafe") that a library designer can attach to a + `drop` implementation that will allow a destructor to side-step + the `dropck`'s constraints (unsafely). + +# Motivation + +## Background: Parametricity in `dropck` + +The Drop Check rule (`dropck`) for [Sound Generic Drop][] relies on a +reasoning process that needs to infer that the behavior of a +polymorphic function (e.g. `fn foo`) does not depend on the +concrete type instantiations of any of its *unbounded* type parameters +(e.g. 
`T` in `fn foo`), at least beyond the behavior of the +destructor (if any) for those type parameters. + +[Sound Generic Drop]: https://github.com/rust-lang/rfcs/blob/master/text/0769-sound-generic-drop.md + +This property is a (weakened) form of a property known in academic +circles as *Parametricity*. +(See e.g. [Reynolds, IFIP 1983][Rey83], [Wadler, FPCA 1989][Wad89].) + + * Parametricity, in this context, essentially says that the compiler + can reason about the body of `foo` (and the subroutines that `foo` + invokes) without having to think about the particular concrete + types that the type parameter `T` is instantiated with. + `foo` cannot do anything with a `t: T` except: + + 1. move `t` to some other owner expecting a `T` or, + + 2. drop `t`, running its destructor and freeing associated resources. + + * For example, this allows the compiler to deduce that even if `T` is + instantiated with a concrete type like `&Vec`, the body of + `foo` cannot actually read any `u32` data out of the vector. More + details about this are available on the [Sound Generic Drop][] RFC. + +## "Mistakes were made" + +The parametricity-based reasoning in the +[Drop Check analysis][Sound Generic Drop] (`dropck`) was clever, but +fragile and unproven. + + * Regarding its fragility, it has been shown to have + [bugs][parametricity-insufficient]; in particular, parametricity is + a necessary but *not* sufficient condition to justify the + inferences that `dropck` makes. + + * Regarding its unproven nature, `dropck` violated the heuristic in + Rust's design to not incorporate ideas unless those ideas had + already been proven effective elsewhere. + +[parametricity-insufficient]: https://github.com/rust-lang/rust/issues/26656 + +These issues might alone provide motivation for ratcheting back on +`dropck`'s rules in the short term, putting in a more conservative +rule in the stable release channel while allowing experimentation with +more-aggressive feature-gated rules in the development nightly release +channel. + +However, there is also a specific reason why we want to ratchet back +on the `dropck` analysis as soon as possible. + +## Impl specialization is inherently non-parametric + +The parametricity requirement in the Drop Check rule over-restricts +the design space for future language changes. + +In particular, the [impl specialization] RFC describes a language +change that will allow the invocation of a polymorphic function `f` to +end up in different sequences of code based solely on the concrete +type of `T`, *even* when `T` has no trait bounds within its +declaration in `f`. + +[impl specialization]: https://github.com/rust-lang/rfcs/pull/1210 + +# Detailed design + +Revise the Drop Check (`dropck`) part of Rust's static analyses in two +ways. In the context of this RFC, these revisions are respectively +named `cannot-assume-parametricity` (CAP) and `unguarded-escape-hatch` (UGEH). + +Though the revisions are given distinct names, they both fall under +the feature gate `dropck_parametricity`. (Note however that this +might be irrelevant to CAP; see [CAP stabilization details][]). + +## cannot-assume-parametricity + +The heart of CAP is this: make `dropck` analysis stop relying on +parametricity of type-parameters. 
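
As a minimal illustration of what this means in practice (the `Holder` type
below is hypothetical and not taken from this RFC's later examples), here is
the kind of program the revised rule conservatively rejects, even though `T`
carries no trait bounds:

```rust
struct Holder<T>(T);

// A destructor with an unbounded type parameter. Under CAP, dropck no longer
// assumes this `drop` ignores data borrowed inside `T`.
impl<T> Drop for Holder<T> {
    fn drop(&mut self) {}
}

fn main() {
    let (h, x);
    x = 5u32;
    // Rejected under CAP: `x` is not known to strictly outlive `h`, and the
    // destructor of `Holder<&u32>` is assumed to potentially read the borrow.
    h = Holder(&x);
}
```

Under the previous rule, the absence of trait bounds on `T` let `dropck` accept
this program; under CAP the borrow must strictly outlive `h`.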
+ +### Changes to the Drop-Check Rule + +The Drop-Check Rule (both in its original form and as revised here) +dicates when a lifetime `'a` must strictly outlive some value `v`, +where `v` owns data of type `D`; the rule gave two circumstances where +`'a` must strictly outlive the scope of `v`. + + * The first circumstance (`D` is directly instantiated at `'a`) + remains unchanged by this RFC. + + * The second circumstance (`D` has some type parameter with + trait-provided methods, i.e. that could be invoked within `Drop`) + is broadened by this RFC to simply say "`D` has some type + parameter." + +That is, under the changes of this RFC, whether the type parameter has +a trait-bound is irrelevant to the Drop-Check Rule. The reason is that +any type parameter, regardless of whether it has a trait bound or not, +may end up participating in [impl specialization], and thus could +expose an otherwise invisible reference `&'a AlreadyDroppedData`. + +`cannot-assume-parametricity` is a breaking change, since the language +will start assuming that a destructor for a data-type definition such +as `struct Parametri` may read from data held in its `C` parameter, +even though the `fn drop` formerly appeared to be parametric with +respect to `C`. This will cause `rustc` to reject code that it had +previously accepted (below are some examples that +[continue to work][examples-continue-to-work] and +some that [start being rejected][examples-start-reject]). + +### CAP stabilization details +[CAP stabilization details]: #cap-stabilization-details + +`cannot-assume-parametricity` will be incorporated into the beta +and stable Rust channels, to ensure that destructor code atop +stable channels in the wild stop relying on parametricity as soon +as possible. This will enable new language features such as +[impl specialization]. + + * It is not yet clear whether it is feasible to include a warning + cycle for CAP. + + * For now, this RFC is proposing to remove the parts of Drop-Check + that attempted to prove that the `impl Drop` was parametric with + respect to `T`. This would mean that there would be more warning + cycle; `dropck` would simply start rejecting more code. + There would be no way to opt back into the old `dropck` rules. + + * (However, during implementation of this change, we should + double-check whether a warning-cycle is in fact feasible.) + +## unguarded-escape-hatch + +The heart of `unguarded-escape-hatch` (UGEH) is this: Provide a new, +unsafe (and unstable) attribute-based escape hatch for use in the +standard library for cases where Drop Check is too strict. + +### Why we need an escape hatch + +The original motivation for the parametricity special-case in the +original Drop-Check rule was due to an observation that collection +types such as `TypedArena` or `Vec` were often used to +contain values that wanted to refer to each other. + +An example would be an element type like +`struct Concrete<'a>(u32, Cell>>);`, and then +instantiations of `TypedArena` or `Vec`. +This pattern has been used within `rustc`, for example, +to store elements of a linked structure within an arena. + +Without the parametricity special-case, the existence of a destructor +on `TypedArena` or `Vec` led the Drop-Check analysis to conclude +that those destructors might hypothetically read from the references +held within `T` -- forcing `dropck` to reject those destructors. 
+ +(Note that `Concrete` itself has no destructor; if it did, then +`dropck`, both as originally stated and under the changes of this RFC, +*would* force the `'a` parameter of any instance to strictly outlive +the instance value, thus ruling out cross-references in the same +`TypedArena` or `Vec`.) + +Of course, the whole point of this RFC is that using parametricity as +the escape hatch seems like it does not suffice. But we still need +*some* escape hatch. + +### The new escape hatch: an unsafe attribute + +This leads us to the second component of the RFC, `unguarded-escape-hatch` (UGEH): +Add an attribute (with a name starting with "unsafe") that a library +designer can attach to a `drop` implementation that will allow a +destructor to side-step the `dropck`'s constraints (unsafely). + +This RFC proposes the attribute name `unsafe_destructor_blind_to_params`. +This name was specifically chosen to be long and ugly; see +[UGEH stabilization details] for further discussion. + +Much like the `unsafe_destructor` attribute that we had in the past, +this attribute relies on the programmer to ensure that the destructor +cannot actually be used unsoundly. It states an (unproven) assumption +that the given implementation of `drop` (and all functions that this + `drop` may transitively call) will never read or modify a value of +any type parameter, apart from the trivial operations of either +dropping the value or moving the value from one location to another. + + * (In particular, it certainly must not dereference any `&`-reference + within such a value, though this RFC is adopts a somewhat stronger + requirement to encourage the attribute to only be used for the + limited case of parametric collection types, where one need not do + anything more than move or drop values.) + +The above assumption must hold regardless of what impact +[impl specialization][] has on the resolution of all function calls. + +### UGEH stabilization details +[UGEH stabilization details]: #ugeh-stabilization-details + +The proposed attribute is only a *short-term* patch to work-around a +bug exposed by the combination of two desirable features (namely +[impl specialization] and [`dropck`][Sound Generic Drop]). + +In particular, using the attribute in cases where control-flow in the +destructor can reach functions that may be specialized on a +type-parameter `T` may expose the system to use-after-free scenarios +or other unsound conditions. This may a non-trivial thing for the +programmer to prove. + + * Short term strategy: The working assumption of this RFC is that the + standard library developers will use the proposed attribute in + cases where the destructor *is* parametric with respect to all type + parameters, even though the compiler cannot currently prove this to + be the case. + + The new attribute will be restricted to non-stable channels, like + any other new feature under a feature-gate. + + * Long term strategy: This RFC does not make any formal guarantees + about the long-term strategy for including an escape hatch. In + particular, this RFC does *not* propose that we stabilize the + proposed attribute + + It may be possible for future language changes to allow us to + directly express the necessary parametricity properties. + See further discussion in the [continue supporting parametricity][] alternative. 
+ + The suggested attribute name (`unsafe_destructor_blind_to_params` + above) was deliberately selected to be long and ugly, in order to + discourage it from being stabilized in the future without at least + some significant discussion. (Likewise, the acronym "UGEH" was + chosen for its likely pronounciation "ugh", again a reminder that + we do not *want* to adopt this approach for the long term.) + + +## Examples of code changes under the RFC + +This section shows some code examples, starting with code that works +today and must continue to work tomorrow, then showing an example of +code that will start being rejected, and ending with an example of the +UGEH attribute. + +### Examples of code that must continue to work +[examples-continue-to-work]: #examples-of-code-that-must-continue-to-work + +Here is some code that works today and must continue to work in the future: + +```rust +use std::cell::Cell; + +struct Concrete<'a>(u32, Cell>>); + +fn main() { + let mut data = Vec::new(); + data.push(Concrete(0, Cell::new(None))); + data.push(Concrete(0, Cell::new(None))); + + data[0].1.set(Some(&data[1])); + data[1].1.set(Some(&data[0])); +} +``` + +In the above, we are building up a vector, pushing `Concrete` elements +onto it, and then later linking those concrete elements together via +optional references held in a cell in each concrete element. + +We can even wrap the vector in a struct that holds it. This also must +continue to work (and will do so under this RFC); such structural +composition is a common idiom in Rust code. + +```rust +use std::cell::Cell; + +struct Concrete<'a>(u32, Cell>>); + +struct Foo { data: Vec } + +fn main() { + let mut foo = Foo { data: Vec::new() }; + foo.data.push(Concrete(0, Cell::new(None))); + foo.data.push(Concrete(0, Cell::new(None))); + + foo.data[0].1.set(Some(&foo.data[1])); + foo.data[1].1.set(Some(&foo.data[0])); +} +``` + +### Examples of code that will start to be rejected +[examples-start-reject]: #examples-of-code-that-will-start-to-be-rejected + +The main change injected by this RFC is this: due to `cannot-assume-parametricity`, +an attempt to add a destructor to the `struct Foo` above will cause the +code above to be rejected, because we will assume that the destructor for `Foo` +may invoke methods on the concrete elements that dereferences their links. + +Thus, this code will be rejected: + +```rust +use std::cell::Cell; + +struct Concrete<'a>(u32, Cell>>); + +struct Foo { data: Vec } + +// This is the new `impl Drop` +impl Drop for Foo { + fn drop(&mut self) { } +} + +fn main() { + let mut foo = Foo { data: Vec::new() }; + foo.data.push(Concrete(0, Cell::new(None))); + foo.data.push(Concrete(0, Cell::new(None))); + + foo.data[0].1.set(Some(&foo.data[1])); + foo.data[1].1.set(Some(&foo.data[0])); +} +``` + +NOTE: Based on a preliminary crater run, it seems that mixing together +destructors with this sort of cyclic structure is sufficiently rare +that *no* crates on `crates.io` actually regressed under the new rule: +everything that compiled before the change continued to compile after +it. + +### Example of the unguarded-escape-hatch +[examples-escape-hatch]: #example-of-the-unguarded-escape-hatch + +If the developer of `Foo` has access to the feature-gated +escape-hatch, and is willing to assert that the destructor for `Foo` +does nothing with the links in the data, then the developer can work +around the above rejection of the code by adding the corresponding +attribute. 
+ +```rust +#![feature(dropck_parametricity)] +use std::cell::Cell; + +struct Concrete<'a>(u32, Cell>>); + +struct Foo { data: Vec } + +impl Drop for Foo { + #[unsafe_destructor_blind_to_params] // This is the UGEH attribute + fn drop(&mut self) { } +} + +fn main() { + let mut foo = Foo { data: Vec::new() }; + foo.data.push(Concrete(0, Cell::new(None))); + foo.data.push(Concrete(0, Cell::new(None))); + + foo.data[0].1.set(Some(&foo.data[1])); + foo.data[1].1.set(Some(&foo.data[0])); +} +``` + +# Drawbacks + +As should be clear by the tone of this RFC, the +`unguarded-escape-hatch` is clearly a hack. It is subtle and unsafe, +just as `unsafe_destructor` was (and for the most part, the whole +point of [Sound Generic Drop][] was to remove `unsafe_destructor` from +the language). + + * However, the expectation is that most clients will have no need to + ever use the `unguarded-escape-hatch`. + + * It may suffice to use the escape hatch solely within the collection + types of `libstd`. + + * Otherwise, if clients outside of `libstd` determine that they *do* + need to be able to write destructors that need to bypass `dropck` + safely, then we can (and *should*) investigate one of the + [sound alternatives][continue supporting parametricity], rather + than stabilize the unsafe hackish escape hatch.. + +# Alternatives +[alternatives]: #alternatives + +## CAP without UGEH + +One might consider adopting `cannot-assume-parametricity` without +`unguarded-escape-hatch`. However, unless some other sort of escape +hatch were added, this path would break much more code. + +## UGEH for lifetime parameters + +Since we're already being unsafe here, one might consider having +the `unsafe_destructor_blind_to_params` apply to lifetime parameters +as well as type parameters. + +However, given that the `unsafe_destructor_blind_to_params` attribute +is only intended as a short-term band-aid (see +[UGEH stabilization details][]) it seems better to just make it only as +broad as it needs to be (and no broader). + +## "Sort-of Guarded" Escape Hatch + +We could add the escape hatch but continue employing the current +dropck analysis to it. This would essentially mean that code would have +to apply the unsafe attribute to be considered for parametricity, but +if there were obvious problems (namely, if the type parameter had a trait bound) +then the attempt to opt into parametricity would be ignored and the +strict ordering restrictions on the lifetimes would be imposed. + +I only mention this because it occurred to me in passing; I do not +really think it has much of a benefit. It would potentially lead +someone to think that their code has been proven sound (since the +`dropck` would catch some mistakes in programmer reasoning) but the +pitfalls with respect to specialization would remain. + +## Continue Supporting Parametricity +[continue supporting parametricity]: #continue-supporting-parametricity +There may be ways to revise the language so that functions can declare +that they must be parametric with respect to their type parameters. +Here we sketch two potential ideas for how one might do this, mostly to +give a hint of why this is not a trivial change to the language. + +Neither design is likely to be adopted, at least as described here, +because both of them impose significant burdens on implementors of +parametric destructors, as we will see. 
+ +(Also, if we go down this path, we will need to fix other bugs in the +Drop Check rule, where, as previously noted, parametricity is a +[necessary but *insufficient* condition][parametricity-insufficient] for soundness.) + +### Parametricity via effect-system attributes + +One feature of the [impl specialization] RFC is that all functions that +can be specialized must be declared as such, via the `default` keyword. + +This leads us to one way that a function could declare that its body +must not be allows to call into specialized methods: an attribute like +`#[unspecialized]`. The `#[unspecialized]` attribute, when applied to +a function `fn foo()`, would mean two things: + + * `foo` is not allowed to call any functions that have the `default` keyword. + + * `foo` is only allowed to call functions that are also marked `#[unspecialized]` + +All `fn drop` methods would be required to be `#[unspecialized]`. + +It is the second bullet that makes this an ad-hoc effect system: it provides +a recursive property ensuring that during the extent of the call to `foo`, +we will never invoke a function marked as `default` (and therefore, I *think*, +will never even potentially invoke a method that has been specialized). + +It is also this second bullet that represents a signficant burden on +the destructor implementor. In particular, it immediately rules out +using any library routine unless that routine has been marked as +`#[unspecialized]`. The attribute is unlikely to be included on any +function unless the its developer is making a destructor that calls it +in tandem. + +### Parametricity via some `?`-bound + +Another approach starts from another angle: As described earlier, +parametricity in `dropck` is the requirement that `fn drop` cannot do +anything with a `t: T` (where `T` is some relevant type parameter) +except: + + 1. move `t` to some other owner expecting a `T` or, + + 2. drop `t`, running its destructor and freeing associated resources. + +So, perhaps it would be more natural to express this requirement +via a bound. + +We would start with the assumption that functions may be +non-parametric (and thus their implementations may be specialized to +specific types). + +But then if you want to declare a function as having a stronger +constraint on its behavior (and thus expanding its potential callers +to ones that require parametricity), you could add a bound `T: ?Special`. + +The Drop-check rule would treat `T: ?Special` type-parameters as parametric, +and other type-parameters as non-parametric. + +The marker trait `Special` would be an OIBIT that all sized types would get. + +Any expression in the context of a type-parameter binding of the form +`` would not be allowed to call any `default` method +where `T` could affect the specialization process. + +(The careful reader will probably notice the potential sleight-of-hand +here: is this really any different from the effect-system attributes +proposed earlier? Perhaps not, though it seems likely that the finer +grain parameter-specific treatment proposed here is more expressive, +at least in theory.) + +Like the previous proposal, this design represents a significant +burden on the destructor implementor: Again, the `T: ?Special` +attribute is unlikely to be included on any function unless the its +developer is making a destructor that calls it in tandem. + +# Unresolved questions + + * What name to use for the attribute? + Is `unsafe_destructor_blind_to_params` sufficiently long and ugly? ;) + + * What is the real long-term plan? 

 * Should we consider merging the discussion of alternatives
   into the [impl specialization] RFC?

# Bibliography

### Reynolds
[Rey83]: #reynolds
John C. Reynolds. "Types, abstraction and parametric polymorphism". IFIP 1983
http://www.cse.chalmers.se/edu/year/2010/course/DAT140_Types/Reynolds_typesabpara.pdf

### Wadler
[Wad89]: #wadler
Philip Wadler. "Theorems for free!". FPCA 1989
http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf

diff --git a/text/1240-repr-packed-unsafe-ref.md b/text/1240-repr-packed-unsafe-ref.md
new file mode 100644
index 00000000000..6ac5b341974
--- /dev/null
+++ b/text/1240-repr-packed-unsafe-ref.md
@@ -0,0 +1,438 @@
- Feature Name: NA
- Start Date: 2015-08-06
- RFC PR: https://github.com/rust-lang/rfcs/pull/1240
- Rust Issue: https://github.com/rust-lang/rust/issues/27060

# Summary

Taking a reference into a struct marked `repr(packed)` should become
`unsafe`, because it can lead to undefined behaviour. `repr(packed)`
structs need to be banned from storing `Drop` types for this reason.

# Motivation

Issue [#27060](https://github.com/rust-lang/rust/issues/27060) noticed
that it was possible to trigger undefined behaviour in safe code via
`repr(packed)`, by creating references `&T` which don't satisfy the
expected alignment requirements for `T`.

Concretely, the compiler assumes that any reference (or raw pointer,
in fact) will be aligned to at least `align_of::<T>()`, i.e. the
following snippet should run successfully:

```rust
let some_reference: &T = /* arbitrary code */;

let actual_address = some_reference as *const _ as usize;
let align = std::mem::align_of::<T>();

assert_eq!(actual_address % align, 0);
```

However, `repr(packed)` allows one to violate this, by creating values
of arbitrary types that are stored at "random" byte addresses, by
removing the padding normally inserted to maintain alignment in
`struct`s. E.g. suppose there's a struct `Foo` defined like
`#[repr(packed, C)] struct Foo { x: u8, y: u32 }`, and there's an
instance of `Foo` allocated at 0x1000, the `u32` will be placed at
`0x1001`, which isn't 4-byte aligned (the alignment of `u32`).

Issue #27060 has a snippet which crashes at runtime on at least two
x86-64 CPUs (the author's and the one playpen runs on) and almost
certainly most other platforms.

```rust
#![feature(simd, test)]

extern crate test;

// simd types require high alignment or the CPU faults
#[simd]
#[derive(Debug, Copy, Clone)]
struct f32x4(f32, f32, f32, f32);

#[repr(packed)]
#[derive(Copy, Clone)]
struct Unalign<T>(T);

struct Breakit {
    x: u8,
    y: Unalign<f32x4>
}

fn main() {
    let val = Breakit { x: 0, y: Unalign(f32x4(0.0, 0.0, 0.0, 0.0)) };

    test::black_box(&val);

    println!("before");

    let ok = val.y;
    test::black_box(ok.0);

    println!("middle");

    let bad = val.y.0;
    test::black_box(bad);

    println!("after");
}
```

On playpen, it prints:

```
before
middle
playpen: application terminated abnormally with signal 4 (Illegal instruction)
```

That is, the `bad` variable is causing the CPU to fault. The `let`
statement is (in pseudo-Rust) behaving like `let bad =
load_with_alignment(&val.y.0, align_of::<f32x4>());`, but the
alignment isn't satisfied. (The `ok` line is compiled to a `movupd`
instruction, while the `bad` is compiled to a `movapd`: `u` ==
unaligned, `a` == aligned.)

(NB. The use of SIMD types in the example is just to be able to
demonstrate the problem on x86.
That platform is generally fairly
relaxed about pointer alignments and so SIMD & its specialised `mov`
instructions are the easiest way to demonstrate the violated
assumptions at runtime. Other platforms may fault on other types.)

Being able to assume that accesses are aligned is useful for
performance, and almost all references will be correctly aligned
anyway (`repr(packed)` types and internal references into them are
quite rare).

The problems with unaligned accesses can be avoided by ensuring that
the accesses are actually aligned (e.g. via runtime checks, or other
external constraints the compiler cannot understand directly). For
example, consider the following:

```rust
#[repr(packed, C)]
struct Bar {
    x: u8,
    y: u16,
    z: u8,
    w: u32,
}
```

Taking a reference to some of those fields may cause undefined
behaviour, but not always. It is always correct to take
a reference to `x` or `z` since `u8` has alignment 1. If the struct
value itself is 4-byte aligned (which is not guaranteed), `w` will
also be 4-byte aligned since the `u8, u16, u8` take up 4 bytes, hence
it is correct to take a reference to `w` in this case (and only that
case). Similarly, it is only correct to take a reference to `y` if the
struct is at an odd address, so that the `u16` starts at an even one
(i.e. is 2-byte aligned).

# Detailed design

It is `unsafe` to take a reference to a field of a `repr(packed)`
struct. It is still possible, but it is up to the programmer to ensure
that the alignment requirements are satisfied. Referencing
(by-reference, or by-value) a subfield of a struct (including indexing
elements of a fixed-length array) stored inside a `repr(packed)`
struct counts as taking a reference to the `packed` field and hence is
unsafe.

It is still legal to manipulate the fields of a `packed` struct by
value, e.g. the following is correct (and not `unsafe`), no matter the
alignment of `bar`:

```rust
let mut bar: Bar = ...;

let x = bar.y;
bar.w = 10;
```

It is illegal to store a type `T` implementing `Drop` (including a
generic type) in a `repr(packed)` type, since the destructor of `T` is
passed a reference to that `T`. The crater run (see appendix) found no
crate that needs to use `repr(packed)` to store a `Drop` type (or a
generic type). The generic type rule is conservatively approximated by
disallowing generic `repr(packed)` structs altogether, but this can be
relaxed (see Alternatives).

Concretely, this RFC is proposing the introduction of the `// error`s
in the following code.

```rust
struct Baz {
    x: u8,
}

#[repr(packed)]
struct Qux<T> { // error: generic repr(packed) struct
    y: Baz,
    z: u8,
    w: String, // error: storing a Drop type in a repr(packed) struct
    t: [u8; 4],
}

let mut qux = Qux { ... };

// all ok:
let y_val = qux.y;
let z_val = qux.z;
let t_val = qux.t;
qux.y = Baz { ... };
qux.z = 10;
qux.t = [0, 1, 2, 3];

// new errors:

let y_ref = &qux.y; // error: taking a reference to a field of a repr(packed) struct is unsafe
let z_ref = &mut qux.z; // ditto
let y_ptr: *const _ = &qux.y; // ditto
let z_ptr: *mut _ = &mut qux.z; // ditto

let x_val = qux.y.x; // error: directly using a subfield of a field of a repr(packed) struct is unsafe
let x_ref = &qux.y.x; // ditto
qux.y.x = 10; // ditto

let t_val = qux.t[0]; // error: directly indexing an array in a field of a repr(packed) struct is unsafe
let t_ref = &qux.t[0]; // ditto
qux.t[0] = 10; // ditto
```

(NB.
the subfield and indexing cases can be resolved by first copying +the packed field's value onto the stack, and then accessing the +desired value.) + +## Staging + +This change will first land as warnings indicating that code will be +broken, with the warnings switched to the intended errors after one +release cycle. + +# Drawbacks + +This will cause some functionality to stop working in +possibly-surprising ways (NB. the drawback here is mainly the +"possibly-surprising", since the functionality is broken with general +`packed` types.). For example, `#[derive]` usually takes references to +the fields of structs, and so `#[derive(Clone)]` will generate +errors. However, this use of derive is incorrect in general (no +guarantee that the fields are aligned), and, one can easily replace it +by: + +```rust +#[derive(Copy)] +#[repr(packed)] +struct Foo { ... } + +impl Clone for Foo { fn clone(&self) -> Foo { *self } } +``` + +Similarly, `println!("{}", foo.bar)` will be an error despite there +not being a visible reference (`println!` takes one internally), +however, this can be resolved by, for instance, assigning to a +temporary. + +# Alternatives + +- A short-term solution would be to feature gate `repr(packed)` while + the kinks are worked out of it +- Taking an internal reference could be made flat-out illegal, and the + times when it is correct simulated by manual raw-pointer + manipulation. +- The rules could be made less conservative in several cases, however + the crater run didn't indicate any need for this: + - a generic `repr(packed)` struct can use the generic in ways that + avoids problems with `Drop`, e.g. if the generic is bounded by + `Copy`, or if the type is only used in ways that are `Copy` such + as behind a `*const T`. + - using a subfield of a field of a `repr(packed)` struct by-value + could be OK. + +# Unresolved questions + +None. + +# Appendix + +## Crater analysis + +Crater was run on 2015/07/23 with a patch that feature gated `repr(packed)`. + +High-level summary: + +- several unnecessary uses of `repr(packed)` (patches have been + submitted and merged to remove all of these) +- most necessary ones are to match the declaration of a struct in C +- many "necessary" uses can be replaced by byte arrays/arrays of smaller types +- 8 crates are currently on stable themselves (unsure about deps), 4 are already on nightly + - 1 of the 8, http2parse, is essentially only used by a nightly-only crate (tendril) + - 4 of the stable and 1 of the nightly crates don't need `repr(packed)` at all + +| | stable | needed | FFI only | +|------------|--------|--------|----------| +| image | ✓ | | | +| nix | ✓ | ✓ | ✓ | +| tendril | | ✓ | | +| assimp-sys | ✓ | ✓ | ✓ | +| stemmer | ✓ | | | +| x86 | ✓ | ✓ | ✓ | +| http2parse | ✓ | ✓ | | +| nl80211rs | ✓ | ✓ | ✓ | +| openal | ✓ | | | +| elfloader | | ✓ | ✓ | +| x11 | ✓ | | | +| kiss3d | ✓ | | | + +More detailed analysis inline with broken crates. (Don't miss `kiss3d` in the non-root section.) + +### Regression report c85ba3e9cb4620c6ec8273a34cce6707e91778cb vs. 7a265c6d1280932ba1b881f31f04b03b20c258e5 + +* From: c85ba3e9cb4620c6ec8273a34cce6707e91778cb +* To: 7a265c6d1280932ba1b881f31f04b03b20c258e5 + +#### Coverage + +* 2617 crates tested: 1404 working / 1151 broken / 40 regressed / 0 fixed / 22 unknown. 
+ +#### Regressions + +* There are 11 root regressions +* There are 40 regressions + +#### Root regressions, sorted by rank: + +* [image-0.3.11](https://crates.io/crates/image) + ([before](https://tools.taskcluster.net/task-inspector/#V6QBA9LfTT6mhFJ0Yo7nJg)) + ([after](https://tools.taskcluster.net/task-inspector/#QU9d4XEPSWOg7CIGFpATDg)) + - [use](https://github.com/PistonDevelopers/image/blob/8e64e0d78e465ddfa13cd6627dede5fd258386f6/src/tga/decoder.rs#L75) + seems entirely unnecessary (no raw bytewise operations on the + struct itself) + + On stable. +* [nix-0.3.9](https://crates.io/crates/nix) + ([before](https://tools.taskcluster.net/task-inspector/#X3HMXrq4S_GMNbeeAY8i6w)) + ([after](https://tools.taskcluster.net/task-inspector/#kz0vDaAhRRuKww2l-FvYpQ)) + - [use](https://github.com/carllerche/nix-rust/blob/5801318c0c4c6eeb3431144a89496830f55d6628/src/sys/epoll.rs#L98) + required to match + [C struct](https://github.com/torvalds/linux/blob/de182468d1bb726198abaab315820542425270b7/include/uapi/linux/eventpoll.h#L53-L62) + + On stable. +* [tendril-0.1.2](https://crates.io/crates/tendril) + ([before](https://tools.taskcluster.net/task-inspector/#zQH7ShADR5O9eQe1mg3e6A)) + ([after](https://tools.taskcluster.net/task-inspector/#zI-PoIZHTm-7Urq3CLsXeg)) + - [use 1](https://github.com/servo/tendril/blob/faf97ded26213e561f8ad2768113cc05b6424748/src/buf32.rs#L19) + not strictly necessary? + - [use 2](https://github.com/servo/tendril/blob/faf97ded26213e561f8ad2768113cc05b6424748/src/tendril.rs#L43) + required on 64-bit platforms to get size_of::<Header>() == 12 rather + than 16. + - [use 3](https://github.com/servo/tendril/blob/faf97ded26213e561f8ad2768113cc05b6424748/src/tendril.rs#L91), + as above, does some precise tricks with the layout for optimisation. + + Requires nightly. +* [assimp-sys-0.0.3](https://crates.io/crates/assimp-sys) ([before](https://tools.taskcluster.net/task-inspector/#rTrUh0VQR2uWXMQw14kRIA)) ([after](https://tools.taskcluster.net/task-inspector/#AR36o35FRV-mVInHKWFDrg)) + - [many uses](https://github.com/Eljay/assimp-sys/search?utf8=%E2%9C%93&q=packed), + required to match + [C structs](https://github.com/assimp/assimp/blob/f3d418a199cfb7864c826665016e11c65ddd7aa9/include/assimp/types.h#L227) + (one example). In author's words: + + > [11:36:15] <eljay> huon: well my assimp binding is basically abandoned for now if you are just worried about breaking things, and seems unlikely anyone is using it :P + + On stable. +* [stemmer-0.1.1](https://crates.io/crates/stemmer) ([before](https://tools.taskcluster.net/task-inspector/#0Affr5PrTnGoBukeRwuiKw)) ([after](https://tools.taskcluster.net/task-inspector/#8xGRmPxOQS2NHbvgXMvmWQ)) + - [use](https://github.com/lady-segfault/stemmer-rs/blob/4090dcf7a258df5031c10754c8de118e0ca93512/src/stemmer.rs#L7), completely unnecessary + + On stable. +* [x86-0.2.0](https://crates.io/crates/x86) ([before](https://tools.taskcluster.net/task-inspector/#__VYVs6QSYm4JF68fSXibw)) ([after](https://tools.taskcluster.net/task-inspector/#xj8paeiaR0OGkK1v2raHYg)) + - [several similar uses](https://github.com/gz/rust-x86/search?utf8=%E2%9C%93&q=packed), + specific layout necessary for raw interaction with CPU features + + Requires nightly. 
+* [http2parse-0.0.3](https://crates.io/crates/http2parse) ([before](https://tools.taskcluster.net/task-inspector/#CUr_5dfgQMywZmG_ER7ZGQ)) ([after](https://tools.taskcluster.net/task-inspector/#rQO3m_8iQQapN2l-PvGrRw)) + - [use](https://github.com/reem/rust-http2parse/blob/b363139ac2f81fa25db504a9256face9f8c799b6/src/payload.rs#L206), + used to get super-fast "parsing" of headers, by transmuting + `&[u8]` to `&[Setting]`. + + On stable, however: + + ```irc + [11:30:38] reem: why is https://github.com/reem/rust-http2parse/blob/b363139ac2f81fa25db504a9256face9f8c799b6/src/payload.rs#L208 packed? + [11:31:59] huon: I transmute from & [u8] to & [Setting] + [11:32:35] So repr packed gets me the layout I need + [11:32:47] With no padding between the u8 and u16 + [11:33:11] and between Settings + [11:33:17] ok + [11:33:22] (stop doing bad things :P ) + [11:34:00] (there's some problems with repr(packed) https://github.com/rust-lang/rust/issues/27060 and we may be feature gating it) + [11:35:02] reem: wait, aren't there endianness problems? + [11:36:16] Ah yes, looks like I forgot to finish the Setting interface + [11:36:27] The identifier and value methods take care of converting to types values + [11:36:39] The goal is just to avoid copying the whole buffer and requiring an allocation + [11:37:01] Right now the whole parser takes like 9 ns to parse a frame + [11:39:11] would you be sunk if repr(packed) was feature gated? + [11:40:17] or, is maybe something like `struct SettingsRaw { identifier: [u8; 2], value: [u8; 4] }` OK (possibly with conversion functions etc.)? + [11:40:46] Yea, I could get around it if I needed to + [11:40:58] Anyway the primary consumer is transfer and I'm running on nightly there + [11:41:05] So it doesn't matter too much + ``` + +* [nl80211rs-0.1.0](https://crates.io/crates/nl80211rs) ([before](https://tools.taskcluster.net/task-inspector/#rhEG57vQQHWiVCcS3kIWrA)) ([after](https://tools.taskcluster.net/task-inspector/#s97ED8oXQ4WN-Pbm3ZsFJQ)) + - [three similar uses](https://github.com/carrotsrc/nl80211rs/search?utf8=%E2%9C%93&q=packed) + to match + [C struct](http://lxr.free-electrons.com/source/include/uapi/linux/nl80211.h#L2288). + + On stable. +* [openal-0.2.1](https://crates.io/crates/openal) ([before](https://tools.taskcluster.net/task-inspector/#XUvl-638T82xgGwkrxpz5g)) ([after](https://tools.taskcluster.net/task-inspector/#Oc9wEFpbQM2Tja9sv0qt4g)) + - [several similar uses](https://github.com/meh/rust-openal/blob/9e35fd284f25da7fe90a8307de85a6ec6d392ea1/src/util.rs#L6), + probably unnecessary, just need the struct to behave like + `[f32; 3]`: pointers to it + [are passed](https://github.com/meh/rust-openal/blob/9e35fd284f25da7fe90a8307de85a6ec6d392ea1/src/listener/listener.rs#L204-L205) + to [functions expecting `*mut f32`](https://github.com/meh/rust-openal-sys/blob/master/src/al.rs#L146) pointers. + + On stable. +* [elfloader-0.0.1](https://crates.io/crates/elfloader) ([before](https://tools.taskcluster.net/task-inspector/#ssE4lk0xR3q1qYZBXK24aA)) ([after](https://tools.taskcluster.net/task-inspector/#SAH7AAVIToKkhf7QRK4C1g)) + - [two similar uses](https://github.com/gz/rust-elfloader/blob/d61db7c83d66ce65da92aed5e33a4baf35f4c1e7/src/elf.rs#L362), + required to match file headers/formats exactly. + + Requires nightly. 
+* [x11cap-0.1.0](https://crates.io/crates/x11cap) ([before](https://tools.taskcluster.net/task-inspector/#7wn8cjqXSOaZfpekKRY-yw)) ([after](https://tools.taskcluster.net/task-inspector/#bA6LwPreTMa8R_zYNt8Z3w)) + - [use](https://github.com/bryal/X11Cap/blob/d11b7170e6fa7c1ab370c69887b9ce71a542335d/src/lib.rs#L41) unnecessary. + + Requires nightly. + +#### Non-root regressions, sorted by rank: + +* [glium-0.8.0](https://crates.io/crates/glium) ([before](https://tools.taskcluster.net/task-inspector/#m5yEIEu-QEeM_2t4_11Opg)) ([after](https://tools.taskcluster.net/task-inspector/#Wztxoh9SQ-GqA4F3inaR9Q)) +* [mio-0.4.1](https://crates.io/crates/mio) ([before](https://tools.taskcluster.net/task-inspector/#RtT-HmwbTYuG0djpAkVLvA)) ([after](https://tools.taskcluster.net/task-inspector/#Lx1d3ukPSGyRIwIDt_w0gw)) +* [piston_window-0.11.0](https://crates.io/crates/piston_window) ([before](https://tools.taskcluster.net/task-inspector/#QE421inlRgShgoXKcUkEEA)) ([after](https://tools.taskcluster.net/task-inspector/#wIKQPW_7TjmrztHQ4Kk3hw)) +* [piston2d-gfx_graphics-0.4.0](https://crates.io/crates/piston2d-gfx_graphics) ([before](https://tools.taskcluster.net/task-inspector/#hIUDm8m6QrCdOpSF30aPjQ)) ([after](https://tools.taskcluster.net/task-inspector/#HOw14MCoQxGj7GjYIy-Lng)) +* [piston-gfx_texture-0.2.0](https://crates.io/crates/piston-gfx_texture) ([before](https://tools.taskcluster.net/task-inspector/#om-wlRW-Tm65MTlrpa8u7Q)) ([after](https://tools.taskcluster.net/task-inspector/#m9e9Vx58RA6KhCljujzzMQ)) +* [piston2d-glium_graphics-0.3.0](https://crates.io/crates/piston2d-glium_graphics) ([before](https://tools.taskcluster.net/task-inspector/#vHeYcL2gRT2aIz9JeksAfw)) ([after](https://tools.taskcluster.net/task-inspector/#yEKBSm1BQ_C0O-4GKhQgUQ)) +* [html5ever-0.2.0](https://crates.io/crates/html5ever) ([before](https://tools.taskcluster.net/task-inspector/#C0yCazihTWa4x2GxCUxasQ)) ([after](https://tools.taskcluster.net/task-inspector/#Vbl4HjqcQlq4-sJ2m1yBnQ)) +* [caribon-0.6.2](https://crates.io/crates/caribon) ([before](https://tools.taskcluster.net/task-inspector/#AJZzG5gLSY-WVMKc-MoV5w)) ([after](https://tools.taskcluster.net/task-inspector/#ornLa3ZaSC-Zbz7ICg33Tg)) +* [gj-0.0.2](https://crates.io/crates/gj) ([before](https://tools.taskcluster.net/task-inspector/#xhaiB76FQAKCEsmBkQtp1A)) ([after](https://tools.taskcluster.net/task-inspector/#rBJke3wpQqaq7wmEiQtLJA)) +* [glium_text-0.5.0](https://crates.io/crates/glium_text) ([before](https://tools.taskcluster.net/task-inspector/#IMdXVtTYSIaDrCRQ6SbLTA)) ([after](https://tools.taskcluster.net/task-inspector/#t322h_mzQGarVmsf5MHqKA)) +* [glyph_packer-0.0.0](https://crates.io/crates/glyph_packer) ([before](https://tools.taskcluster.net/task-inspector/#JmIVzau8RyOhnlTvdsRIHQ)) ([after](https://tools.taskcluster.net/task-inspector/#7k9GF09SQPya4ZrLuR6cJw)) +* [html5ever_dom_sink-0.2.0](https://crates.io/crates/html5ever_dom_sink) ([before](https://tools.taskcluster.net/task-inspector/#7GJmaAYKS9WNqnbCx5XMrw)) ([after](https://tools.taskcluster.net/task-inspector/#pHotnKLkTAqK4-LP-n2MUQ)) +* [identicon-0.1.0](https://crates.io/crates/identicon) ([before](https://tools.taskcluster.net/task-inspector/#15nnASVgStmrwqdCS1q8Rg)) ([after](https://tools.taskcluster.net/task-inspector/#WgJb_jEMQIebNgb_D2uq7Q)) +* [assimp-0.0.4](https://crates.io/crates/assimp) ([before](https://tools.taskcluster.net/task-inspector/#-i-FYpJ2Rz-bcmxGVmxoOQ)) ([after](https://tools.taskcluster.net/task-inspector/#HXR8V8NeRMyOxF0Nnhdl0w)) +* 
[jamkit-0.2.4](https://crates.io/crates/jamkit) ([before](https://tools.taskcluster.net/task-inspector/#mcpl8Z62Td-DFfoi9AqRnw)) ([after](https://tools.taskcluster.net/task-inspector/#XGOIXxqpRbCMy5bZ42GV5w)) +* [coap-0.1.0](https://crates.io/crates/coap) ([before](https://tools.taskcluster.net/task-inspector/#SI137HlpRsSuQrlhxlRHpQ)) ([after](https://tools.taskcluster.net/task-inspector/#dT3pt46pQtmy3CvIaC_71Q)) +* [kiss3d-0.1.2](https://crates.io/crates/kiss3d) ([before](https://tools.taskcluster.net/task-inspector/#2Bbro6uZQQCudv2ClalFTw)) ([after](https://tools.taskcluster.net/task-inspector/#9vRbugDKTDm94fjw6BcS6A)) + - [use](https://github.com/sebcrozet/kiss3d/blob/1c1d39d5f8a428609b2f7809c7237e8853ac24e9/src/text/glyph.rs#L7) seems to be unnecessary: semantically useless, just a space "optimisation", which actually makes no difference because the Vec field will be appropriately aligned always. + + On stable. +* [compass-sprite-0.0.3](https://crates.io/crates/compass-sprite) ([before](https://tools.taskcluster.net/task-inspector/#dTcfDsk1QYKWtK7EH5gnwg)) ([after](https://tools.taskcluster.net/task-inspector/#rElhdv9GS8-Zi14LSL-6Ng)) +* [dcpu16-gui-0.0.3](https://crates.io/crates/dcpu16-gui) ([before](https://tools.taskcluster.net/task-inspector/#mtbOQfFUTDiZcMUc65LD3w)) ([after](https://tools.taskcluster.net/task-inspector/#co31ZVgNQ1mYyDCnSwBxJg)) +* [piston3d-gfx_voxel-0.1.1](https://crates.io/crates/piston3d-gfx_voxel) ([before](https://tools.taskcluster.net/task-inspector/#2nZmq4zORIOdJ-ErCOCmww)) ([after](https://tools.taskcluster.net/task-inspector/#epzWs2zuSiWxfoWyMCv0Kw)) +* [dev-0.0.7](https://crates.io/crates/dev) ([before](https://tools.taskcluster.net/task-inspector/#5hSafPV2RlKlubg7WHniPw)) ([after](https://tools.taskcluster.net/task-inspector/#ITQ6zXYpSAC3_AtmMe4xRw)) +* [rustty-0.1.3](https://crates.io/crates/rustty) ([before](https://tools.taskcluster.net/task-inspector/#jlstxp6mSPqzQ1n3FgHSRA)) ([after](https://tools.taskcluster.net/task-inspector/#HgrQz6UVQ5yCkVX25Py-2w)) +* [skeletal_animation-0.1.1](https://crates.io/crates/skeletal_animation) ([before](https://tools.taskcluster.net/task-inspector/#nyMUzqs6RZKIZJ1v1xcglA)) ([after](https://tools.taskcluster.net/task-inspector/#10lM9Vh5SBa7YD3swbm6pw)) +* [slabmalloc-0.0.1](https://crates.io/crates/slabmalloc) ([before](https://tools.taskcluster.net/task-inspector/#li_vsJY8S9-OKEP_KIzEyQ)) ([after](https://tools.taskcluster.net/task-inspector/#1lcKVbKVQNqkKSfwEKIvkg)) +* [spidev-0.1.0](https://crates.io/crates/spidev) ([before](https://tools.taskcluster.net/task-inspector/#5YidcvWyQ0KSmX_9yHjL5A)) ([after](https://tools.taskcluster.net/task-inspector/#mmDafSdlSIS-xfDvyeIckQ)) +* [sysfs_gpio-0.3.2](https://crates.io/crates/sysfs_gpio) ([before](https://tools.taskcluster.net/task-inspector/#KEO87BJHSB-9wNHvTGgEiQ)) ([after](https://tools.taskcluster.net/task-inspector/#44Qnzq6CSBSrMti4utYEZQ)) +* [texture_packer-0.0.1](https://crates.io/crates/texture_packer) ([before](https://tools.taskcluster.net/task-inspector/#-yNhXPaFSBK59eEPRBChVw)) ([after](https://tools.taskcluster.net/task-inspector/#dY5YnW-uTRuCAxxh93_P1w)) +* [falcon-0.0.1](https://crates.io/crates/falcon) ([before](https://tools.taskcluster.net/task-inspector/#hsFGvgrWTL6yY5JVjm20Sw)) ([after](https://tools.taskcluster.net/task-inspector/#YMYfL2KkTH2fct8CD9nqUg)) +* [filetype-0.2.0](https://crates.io/crates/filetype) ([before](https://tools.taskcluster.net/task-inspector/#bCC3ps_gT6m05BNm5lEnFw)) 
([after](https://tools.taskcluster.net/task-inspector/#trGw9uPMTgiuxp-w821ZgA))

diff --git a/text/1241-no-wildcard-deps.md b/text/1241-no-wildcard-deps.md
new file mode 100644
index 00000000000..b0fc80cf984
--- /dev/null
+++ b/text/1241-no-wildcard-deps.md
@@ -0,0 +1,131 @@
- Feature Name: N/A
- Start Date: 2015-07-23
- RFC PR: [rust-lang/rfcs#1241](https://github.com/rust-lang/rfcs/pull/1241)
- Rust Issue: [rust-lang/rust#28628](https://github.com/rust-lang/rust/issues/28628)

# Summary

A Cargo crate's dependencies are associated with constraints that specify the
set of versions of the dependency with which the crate is compatible. These
constraints range from accepting exactly one version (`=1.2.3`), to
accepting a range of versions (`^1.2.3`, `~1.2.3`, `>= 1.2.3, < 3.0.0`), to
accepting any version at all (`*`). This RFC proposes to update crates.io to
reject publishes of crates that have compile or build dependencies with
a wildcard version constraint.

# Motivation

Version constraints are a delicate balancing act between stability and
flexibility. On one extreme, one can lock dependencies to an exact version.
From one perspective, this is great, since the dependencies a user will consume
will be the same that the developers tested against. However, on any nontrivial
project, one will inevitably run into conflicts where library A depends on
version `1.2.3` of library B, but library C depends on version `1.2.4`, at
which point, the only option is to force the version of library B to one of
them and hope everything works.

On the other hand, a wildcard (`*`) constraint will never conflict with
anything! There are other things to worry about here, though. A version
constraint is fundamentally an assertion from a library's author to its users
that the library will work with any version of a dependency that matches its
constraint. A wildcard constraint is claiming that the library will work with
any version of the dependency that has ever been released *or will ever be
released, forever*. This is a somewhat absurd guarantee to make - forever is a
long time!

Absurd guarantees on their own are not necessarily sufficient motivation to
make a change like this. The real motivation is the effect that these
guarantees have on consumers of libraries.

As an example, consider the [openssl](https://crates.io/crates/openssl) crate.
It is one of the most popular libraries on crates.io, with several hundred
downloads every day. 50% of the [libraries that depend on it](https://crates.io/crates/openssl/reverse_dependencies)
have a wildcard constraint on the version. None of them can build against every
version that has ever been released. Indeed, no libraries can, since many of
those releases came before Rust 1.0 was released. In addition, almost all of
them will fail to compile against version 0.7 of openssl when it is released.
When that happens, users of those libraries will be forced to manually override
Cargo's version selection every time it is recalculated. This is not a fun
time.

Bad version restrictions are also "viral". Even if a developer is careful to
pick dependencies that have reasonable version restrictions, there could be a
wildcard constraint hiding five transitive levels down. Manually searching the
entire dependency graph is an exercise in frustration that shouldn't be
necessary.

On the other hand, consider a library that has a version constraint of `^0.6`.

When openssl 0.7 releases, the library will either continue to work against
version 0.7, or it won't. In the first case, the author can simply extend the
constraint to `>= 0.6, < 0.8` and consumers can use it with version 0.6 or 0.7
without any trouble. If it does not work against version 0.7, consumers of the
library are fine! Their code will continue to work without any manual
intervention. The author can update the library to work with version 0.7 and
release a new version with a constraint of `^0.7` to support consumers that
want to use that newer release.

Making crates.io more picky than Cargo itself is not a new concept; it
currently [requires several items](https://github.com/rust-lang/crates.io/blob/8c85874b6b967e1f46ae2113719708dce0c16d32/src/krate.rs#L746-L759) in published crates that Cargo will not:

 * A valid license
 * A description
 * A list of authors

All of these requirements are in place to make it easier for developers to use
the libraries uploaded to crates.io - that's why crates are published, after
all! A restriction on wildcards is another step down that path.

Note that this restriction would only apply to normal compile dependencies and
build dependencies, but not to dev dependencies. Dev dependencies are only used
when testing a crate, so it doesn't matter to downstream consumers if they
break.

This RFC is not trying to prohibit *all* constraints that would run into the
issues described above. For example, the constraint `>= 0.0.0` is exactly
equivalent to `*`. This is for a couple of reasons:

* It's not totally clear how to precisely define "reasonable" constraints. For
example, one might want to forbid constraints that allow unreleased major
versions. However, some crates provide strong guarantees that any breaks will
be followed by one full major version of deprecation. If a library author is
sure that their crate doesn't use any deprecated functionality of that kind of
dependency, it's completely safe and reasonable to explicitly extend the
version constraint to include the next unreleased version.
* Cargo and crates.io are missing tools to deal with overly-restrictive
constraints. For example, it's not currently possible to force Cargo to allow
dependency resolution that violates version constraints. Without this kind of
support, it is somewhat risky to push too hard towards tight version
constraints.
* Wildcard constraints are popular, at least in part, because they are the
path of least resistance when writing a crate. Without wildcard constraints,
crate authors will be forced to figure out what kind of constraints make the
most sense in their use cases, which may very well be good enough.

# Detailed design

The prohibition on wildcard constraints will be rolled out in stages to make
sure that crate authors have lead time to figure out their versioning stories.

In the next stable Rust release (1.4), Cargo will issue warnings for all
wildcard constraints on build and compile dependencies when publishing, but
publishes with those constraints will still succeed. Alongside the next stable
release after that (1.5 on December 11th, 2015), crates.io will be updated to
reject publishes of crates with those kinds of dependency constraints. Note
that the check will happen on the crates.io side rather than on the Cargo side
since Cargo can publish to locations other than crates.io which may not worry
about these restrictions.

# Drawbacks

The barrier to entry when publishing a crate will be mildly higher.
+ +Tightening constraints has the potential to cause resolution breakage when no +breakage would occur otherwise. + +# Alternatives + +We could continue allowing these kinds of constraints, but complain in a +"sufficiently annoying" manner during publishes to discourage their use. + +This RFC originally proposed forbidding all constraints that had no upper +version bound but has since been pulled back to just `*` constraints. diff --git a/text/1242-rust-lang-crates.md b/text/1242-rust-lang-crates.md new file mode 100644 index 00000000000..dc24add8ffe --- /dev/null +++ b/text/1242-rust-lang-crates.md @@ -0,0 +1,231 @@ +- Feature Name: N/A +- Start Date: 2015-07-29 +- RFC PR: [rust-lang/rfcs#1242](https://github.com/rust-lang/rfcs/pull/1242) +- Rust Issue: N/A + +# Summary + +This RFC proposes a policy around the crates under the rust-lang github +organization that are not part of the Rust distribution (compiler or standard +library). At a high level, it proposes that these crates be: + +- Governed similarly to the standard library; +- Maintained at a similar level to the standard library, including platform support; +- Carefully curated for quality. + +# Motivation + +There are three main motivations behind this RFC. + +**Keeping `std` small**. There is a widespread desire to keep the standard + library reasonably small, and for good reason: the stability promises made in + `std` are tied to the versioning of Rust itself, as are updates to it, meaning + that the standard library has much less flexibility than other crates + enjoy. While we *do* plan to continue to grow `std`, and there are legitimate + reasons for APIs to live there, we still plan to take a minimalistic + approach. See + [this discussion](https://internals.rust-lang.org/t/what-should-go-into-the-standard-library/2158) + for more details. + +The desire to keep `std` small is in tension with the desire to provide +high-quality libraries *that belong to the whole Rust community* and cover a +wider range of functionality. The poster child here is the +[regex crate](https://github.com/rust-lang/regex), which provides vital +functionality but is not part of the standard library or basic Rust distribution +-- and which is, in principle, under the control of the whole Rust community. + +This RFC resolves the tension between a "batteries included" Rust and a small +`std` by treating `rust-lang` crates as, in some sense, "the rest of the +standard library". While this doesn't solve the entire problem of curating the +library ecosystem, it offers a big step for some of the most significant/core +functionality we want to commit to. + +**Staging `std`**. For cases where we do want to grow the standard library, we + of course want to heavily vet APIs before their stabilization. Historically + we've done so by landing the APIs directly in `std`, but marked unstable, + relegating their use to nightly Rust. But in many cases, new `std` APIs can + just as well begin their life as external crates, usable on stable Rust, and + ultimately stabilized wholesale. The recent + [`std::net` RFC](https://github.com/rust-lang/rfcs/pull/1158) is a good + example of this phenomenon. + +The main challenge to making this kind of "`std` staging" work is getting +sufficient visibility, central management, and community buy-in for the library +prior to stabilization. When there is widespread desire to extend `std` in a +certain way, this RFC proposes that the extension can start its life as an +external rust-lang crate (ideally usable by stable Rust). 
It also proposes an
eventual migration path into `std`.

**Cleanup**. During the stabilization of `std`, a fair amount of functionality
  was moved out into external crates hosted under the rust-lang github
  organization. The quality and future prospects of these crates vary widely,
  and we would like to begin to organize and clean them up.

# Detailed design

## The lifecycle of a rust-lang crate

First, two additional github organizations are proposed:

- rust-lang-nursery
- rust-lang-deprecated

New crates start their life in a `0.X` series that lives in the
rust-lang-nursery. Crates in this state do not represent a major commitment from
the Rust maintainers; rather, they signal a trial period. A crate enters the
nursery when (1) there is already a working body of code and (2) the library
subteam approves a petition for inclusion. The petition is informal (not an
RFC), and can take the form of a discuss post laying out the motivation and
perhaps some high-level design principles, and linking to the working code.

If the library team accepts a crate into the nursery, they are indicating an
*interest* in ultimately advertising the crate as "a core part of Rust", and in
maintaining the crate permanently. During the 0.X series in the nursery, the
original crate author maintains control of the crate, approving PRs and so on,
but the library subteam and broader community are expected to participate. As
we'll see below, nursery crates will be advertised (though not in the same way
as full rust-lang crates), increasing the chances that the crate is scrutinized
before being promoted to the next stage.

Eventually, a nursery crate will either fail (and move to rust-lang-deprecated)
or reach a point where a 1.0 release would be appropriate. The failure case can
be decided at any point by the library subteam.

If, on the other hand, a library reaches the 1.0 point, it is ready to be
promoted into rust-lang proper. To do so, an RFC must be written outlining the
motivation for the crate, the reasons that community ownership is important,
and delving into the API design and its rationale. These RFCs are
intended to follow similar lines to the pre-1.0 stabilization RFCs for the
standard library (such as
[collections](https://github.com/rust-lang/rfcs/pull/235) or
[Duration](https://github.com/rust-lang/rfcs/pull/1040)) -- which have been very
successful in improving API design prior to stabilization. Once a "1.0 RFC" is
approved by the libs team, the crate moves into the rust-lang organization, and
is henceforth governed by the whole Rust community. That means in particular
that significant changes (certainly those that would require a major version
bump, but other substantial PRs as well) are reviewed by the library subteam and
may require an RFC. On the other hand, the community has broadly agreed to
maintain the library in perpetuity (unless it is later deprecated). And again,
as we'll see below, the promoted crate is very visibly advertised as part of the
"core Rust" package.

Promotion to 1.0 requires first-class support on all first-tier platforms,
except for platform-specific libraries.

Crates in rust-lang may issue new major versions, just like any other crates,
though such changes should go through the RFC process.
While the library subteam +is responsible for major decisions about the library after 1.0, its original +author(s) will of course wield a great deal of influence, and their objections +will be given due weight in the consensus process. + +### Relation to `std` + +In many cases, the above description of the crate lifecycle is complete. But +some rust-lang crates are destined for std. Usually this will be clear up front. + +When a std-destined crate has reached sufficient maturity, the libs subteam can +call a "final comment period" for moving it into `std` proper. Assuming there +are no blocking objections, the code is moved into `std`, and the original repo +is left intact, with the following changes: + +- a minor version bump, +- *conditionally* replacing all definitions with `pub use` from `std` (which + will require the ability to `cfg` switch on feature/API availability -- a + highly-desired feature on its own). + +By re-routing the library to `std` when available we provide seamless +compatibility between users of the library externally and in `std`. In +particular, traits and types defined in the crate are compatible across either +way of importing them. + +### Deprecation + +At some point a library may become stale -- either because it failed to make it +out of the nursery, or else because it was supplanted by a superior library. The +libs subteam can deprecate nursery crates at any time, and can deprecate +rust-lang crates through an RFC. This is expected to be a rare occurrence. + +Deprecated crates move to rust-lang-deprecated and are subsequently minimally +maintained. Alternatively, if someone volunteers to maintain the crate, +ownership can be transferred externally. + +## Advertising + +Part of the reason for having rust-lang crates is to have a clear, short list of +libraries that are broadly useful, vetted and maintained. But where should this +list appear? + +This RFC doesn't specify the complete details, but proposes a basic direction: + +- The crates in rust-lang should appear in the sidebar in the core rustdocs + distributed with Rust, along side the standard library. (For nightly releases, + we should include the nursery crates as well.) + +- The crates should also be published on crates.io, and should somehow be +*badged*. But the design of a badging/curation system for crates.io is out of +scope for this RFC. + +## Plan for existing crates + +There are already a number of non-`std` crates in rust-lang. Below, we give the +full list along with recommended actions: + +### Transfer ownership + +Please volunteer if you're interested in taking one of these on! + +- rlibc +- semver +- threadpool + +### Move to rust-lang-nursery + +- bitflags +- getopts +- glob +- libc +- log +- rand (note, @huonw has a major revamp in the works) +- regex +- rustc-serialize (but will likely be replaced by serde or other approach eventually) +- tempdir (destined for `std` after reworking) +- uuid + +### Move to rust-lang-deprecated + +- fourcc: highly niche +- hexfloat: niche +- num: this is essentially a dumping ground from 1.0 stabilization; needs a complete re-think. +- term: API needs total overhaul +- time: needs total overhaul destined for std +- url: replaced by https://github.com/servo/rust-url + +# Drawbacks + +The drawbacks of this RFC are largely social: + +* Emphasizing rust-lang crates may alienate some in the Rust community, since it + means that certain libraries obtain a special "blessing". 
This is mitigated by
  the fact that these libraries also become owned by the community at large.

* On the other hand, requiring that ownership/governance be transferred to the
  library subteam may be a disincentive for library authors, since they lose
  unilateral control of their libraries. But this is an inherent aspect of the
  policy design, and the vastly increased visibility of libraries is likely a
  strong enough incentive to overcome this downside.

# Alternatives

The main alternative would be to not maintain other crates under the rust-lang
umbrella, and to offer some other means of curation (the latter of which is
needed in any case).

That would be a missed opportunity, however; Rust's governance and maintenance
model has been very successful so far, and given our minimalistic plans for the
standard library, it is very appealing to have *some* other way to apply the
full Rust community in taking care of additional crates.

# Unresolved questions

Part of the maintenance standard for Rust is the CI infrastructure, including
bors/homu. What level of CI should we provide for these crates, and how do we do it?

diff --git a/text/1252-open-options.md b/text/1252-open-options.md
new file mode 100644
index 00000000000..6854dc4f22c
--- /dev/null
+++ b/text/1252-open-options.md
@@ -0,0 +1,661 @@
- Feature Name: `expand_open_options`
- Start Date: 2015-08-04
- RFC PR: [rust-lang/rfcs#1252](https://github.com/rust-lang/rfcs/pull/1252)
- Rust Issue: [rust-lang/rust#30014](https://github.com/rust-lang/rust/issues/30014)

# Summary

Document and expand the open options.


# Motivation

The options that can be passed to the os when opening a file vary between
systems. And even if the options seem the same or similar, there may be
unexpected corner cases.

This RFC attempts to:
- describe the different corner cases and behaviour of various operating
  systems.
- describe the intended behaviour and interaction of Rusts options.
- remedy cross-platform inconsistencies.
- suggest extra options to expose more platform-specific options.


# Detailed design

## Access modes

### Read-only
Open a file for read-only.


### Write-only
Open a file for write-only.

If a file already exists, the contents of that file get overwritten, but it is
not truncated. Example:
```
// contents of file before: "aaaaaaaa"
file.write(b"bbbb")
// contents of file after: "bbbbaaaa"
```


### Read-write
This is the simple combination of read-only and write-only.


### Append-mode
Append-mode is similar to write-only, but all writes always happen at the end of
the file. This mode is especially useful if multiple processes or threads write
to a single file, like a log file. The operating system guarantees all writes
are atomic: no writes get mangled because another process writes at the same
time. No guarantees are made about the order writes end up in the file though.

Note: sadly append-mode is not atomic on NFS filesystems.

One maybe obvious note when using append-mode: make sure that all data that
belongs together is written to the file in one operation. This can be done
by concatenating strings before passing them to `write()`, or using a buffered
writer (with a more than adequately sized buffer) and calling `flush()` when the
message is complete.

_Implementation detail_: On Windows opening a file in append-mode has one flag
_less_, the right to change existing data is removed.
On Unix opening a file in
append-mode has one flag _extra_, which sets the status of the file descriptor to
append-mode. You could say that on Windows write is a superset of append, while
on Unix append is a superset of write.

Because of this append is treated as a separate access mode in Rust, and if
`.append(true)` is specified then `.write()` is ignored.


### Read-append
Writing to the file works exactly the same as in append-mode.

Reading is more difficult, and may involve a lot of seeking. When the file is
opened, the position for reading may be set at the end of the file, so you
should first seek to the beginning. Also after every write the position is set
to the end of the file. So before writing you should save the current position,
and restore it after the write.
```
try!(file.seek(SeekFrom::Start(0)));
try!(file.read(&mut buffer));
let pos = try!(file.seek(SeekFrom::Current(0)));
try!(file.write(b"foo"));
try!(file.seek(SeekFrom::Start(pos)));
try!(file.read(&mut buffer));
```

### No access mode set
Even if you don't have read or write permission to a file, it is possible to
open it on some systems by opening it with no access mode set (or the equivalent
thereof). This is true for Windows, Linux (with the flag `O_PATH`) and
GNU/Hurd.

What can be done with a file opened this way is system-specific and niche. Since
Linux version 2.6.39 all three operating systems support reading metadata such
as the file size and timestamps.

On practically all variants of Unix opening a file without specifying the access
mode falls back to opening the file read-only. This is because of the way the
access flags were traditionally defined: `O_RDONLY = 0`, `O_WRONLY = 1` and
`O_RDWR = 2`. When no flags are set, the access mode is `0`: read-only. But
code that relies on this is considered buggy and not portable.

What should Rust do when no access mode is specified? Fall back to read-only,
open with the most similar system-specific mode, or always fail to open? This
RFC proposes to always fail. This is the conservative choice, and can be changed
to open in a system-specific mode if a clear use case arises. Implementing a
fallback is not worth it: it is no great effort to set the access mode
explicitly.


### Windows-specific
`.access_mode(FILE_READ_DATA)`

On Windows you can detail whether you want to have read and/or write access to
the file's data, attributes and/or extended attributes. Managing permissions in
such detail has proven itself too difficult, and generally not worth it.

In Rust, `.read(true)` gives you read access to the data, attributes and
extended attributes. Similarly, `.write(true)` gives write access to those
three, and the right to append data beyond the current end of the file.

But if you want fine-grained control, `.access_mode()` gives it to you (see the
sketch after the list below).

`.access_mode()` overrides the access mode set with Rusts cross-platform
options. Reasons to do so:
- it is not possible to un-set the flags set by Rusts options;
- otherwise the cross-platform options have to be wrapped with `#[cfg(unix)]`,
  instead of only having to wrap the Windows-specific option.
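
As a small, hedged illustration, the call could look roughly like the sketch
below, assuming the renamed `.access_mode()` builder from this RFC is exposed
through the Windows `OpenOptionsExt` extension trait; the `FILE_READ_DATA`
value is copied from the Windows SDK and shown here purely for illustration.

```rust
// Sketch only: request a fine-grained Windows access mask instead of the
// broader rights that `.read(true)` would set.
#[cfg(windows)]
fn open_read_data_only(path: &std::path::Path) -> std::io::Result<std::fs::File> {
    use std::fs::OpenOptions;
    use std::os::windows::fs::OpenOptionsExt;

    // Right to read the file's data only, without the attribute and
    // extended-attribute rights (value taken from the Windows SDK).
    const FILE_READ_DATA: u32 = 0x0001;

    OpenOptions::new()
        .access_mode(FILE_READ_DATA)
        .open(path)
}
```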
+ +As a reference, this are the flags set by Rusts access modes: + +bit| flag | read | write | read-write | append | read-append | +--:|:----------------------|:-----:|:-----:|:----------:|:------:|:-----------:| + | **generic rights** | | | | | | +31 | GENERIC_READ | set | | set | | set | +30 | GENERIC_WRITE | | set | set | | | +29 | GENERIC_EXECUTE | | | | | | +28 | GENERIC_ALL | | | | | | + | **specific rights** | | | | | | + 0 | FILE_READ_DATA |implied| | implied | | implied | + 1 | FILE_WRITE_DATA | |implied| implied | | | + 2 | FILE_APPEND_DATA | |implied| implied | set | set | + 3 | FILE_READ_EA |implied| | implied | | implied | + 4 | FILE_WRITE_EA | |implied| implied | set | set | + 6 | FILE_EXECUTE | | | | | | + 7 | FILE_READ_ATTRIBUTES |implied| | implied | | implied | + 8 | FILE_WRITE_ATTRIBUTES | |implied| implied | set | set | + | **standard rights** | | | | | | +16 | DELETE | | | | | | +17 | READ_CONTROL |implied|implied| implied | set | set+implied | +18 | WRITE_DAC | | | | | | +19 | WRITE_OWNER | | | | | | +20 | SYNCHRONIZE |implied|implied| implied | set | set+implied | + +The implied flags can be specified explicitly with the constants +`FILE_GENERIC_READ` and `FILE_GENERIC_WRITE`. + + +## Creation modes + +creation mode | file exists | file does not exist | Unix | Windows | +:----------------------------|-------------|---------------------|:------------------|:------------------------------------------| +not set (open existing) | open | fail | | OPEN_EXISTING | +.create(true) | open | create | O_CREAT | OPEN_ALWAYS | +.truncate(true) | truncate | fail | O_TRUNC | TRUNCATE_EXISTING | +.create(true).truncate(true) | truncate | create | O_CREAT + O_TRUNC | CREATE_ALWAYS | +.create_new(true) | fail | create | O_CREAT + O_EXCL | CREATE_NEW + FILE_FLAG_OPEN_REPARSE_POINT | + + +### Not set (open existing) +Open an existing file. Fails if the file does not exist. + + +### Create +`.create(true)` + +Open an existing file, or create a new file if it does not already exists. + + +### Truncate +`.truncate(true)` + +Open an existing file, and truncate it to zero length. Fails if the file does +not exist. Attributes and permissions of the truncated file are preserved. + +Note when using the Windows-specific `.access_mode()`: truncating will only work +if the `GENERIC_WRITE` flag is set. Setting the equivalent individual flags is +not enough. + + +### Create and truncate +`.create(true).truncate(true)` + +Open an existing file and truncate it to zero length, or create a new file if it +does not already exists. + +Note when using the Windows-specific `.access_mode()`: Contrary to only +`.truncate(true)`, with `.create(true).truncate(true)` Windows _can_ truncate an +existing file without requiring any flags to be set. + +On Windows the attributes of an existing file can cause `.open()` to fail. If +the existing file has the attribute _hidden_ set, it is necessary to open with +`FILE_ATTRIBUTE_HIDDEN`. Similarly if the existing file has the attribute +_system_ set, it is necessary to open with `FILE_ATTRIBUTE_SYSTEM`. See +the Windows-specific `.attributes()` below on how to set these. + + +### Create_new +`.create_new(true)` + +Create a new file, and fail if it already exist. + +On Unix this options started its life as a security measure. If you first check +if a file does not exists with `exists()` and then call `open()`, some other +process may have created in the in mean time. `.create_new()` is an atomic +operation that will fail if a file already exist at the location. 
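
To make the race concrete, here is a minimal sketch (assuming the
`.create_new()` option proposed in this RFC): the first version checks and then
opens in two separate steps and can be raced by another process, while the
second lets the operating system perform the check atomically.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

// Racy: another process can create `path` between the `exists()` check and
// the `open()` call, and that file would then be silently reused.
fn create_new_file_racy(path: &Path) -> io::Result<File> {
    if path.exists() {
        return Err(io::Error::new(io::ErrorKind::AlreadyExists, "file exists"));
    }
    OpenOptions::new().write(true).create(true).open(path)
}

// Atomic: the open fails if anything already exists at `path`.
fn create_new_file(path: &Path) -> io::Result<File> {
    OpenOptions::new().write(true).create_new(true).open(path)
}
```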

`.create_new()` has a special rule on Unix for dealing with symlinks. If there
is a symlink at the final element of its path (e.g. the filename), open will
fail. This is to prevent a vulnerability where an unprivileged process could
trick a privileged process into following a symlink and overwriting a file the
unprivileged process has no access to.
See [Exploiting symlinks and tmpfiles](https://lwn.net/Articles/250468/).
On Windows this behaviour is imitated by specifying not only `CREATE_NEW` but
also `FILE_FLAG_OPEN_REPARSE_POINT`.

Simply put: nothing is allowed to exist at the target location, not even a
(dangling) symlink.

If `.create_new(true)` is set, `.create()` and `.truncate()` are ignored.


### Unix-specific: Mode
`.mode(0o666)`

On Unix the new file is created by default with permissions `0o666` minus the
system's `umask` (see [Wikipedia](https://en.wikipedia.org/wiki/Umask)). It is
possible to set another mode with this option.

If a file already exists, or `.create(true)` or `.create_new(true)` are not
specified, `.mode()` is ignored.

Rust currently does not expose a way to modify the umask.


### Windows-specific: Attributes
`.attributes(FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_HIDDEN | FILE_ATTRIBUTE_SYSTEM)`

Files on Windows can have several attributes, most commonly one or more of the
following four: readonly, hidden, system and archive. Most
[others](https://msdn.microsoft.com/en-us/library/windows/desktop/gg258117%28v=vs.85%29.aspx)
are properties set by the file system. Of the others only
`FILE_ATTRIBUTE_ENCRYPTED`, `FILE_ATTRIBUTE_TEMPORARY` and
`FILE_ATTRIBUTE_OFFLINE` can be set when creating a new file. All others are
silently ignored.

There is no need to set the archive attribute, as Windows sets it automatically
when the file is newly created or modified. This flag may then be used by backup
applications as an indication of which files have changed.

If a _new_ file is created because it does not yet exist and `.create(true)` or
`.create_new(true)` are specified, the new file is given the attributes declared
with `.attributes()`.

If an _existing_ file is opened with `.create(true).truncate(true)`, its
existing attributes are preserved and combined with the ones declared with
`.attributes()`.

In all other cases the attributes get ignored.


### Combination of access modes and creation modes

Some combinations of creation modes and access modes do not make sense.

For example: `.create(true)` when opening read-only. If the file does not
already exist, it is created and you start reading from an empty file. And it is
questionable whether you have permission to create a new file if you don't have
write access. A new file is created on all systems I have tested, but there is
no documentation that explicitly guarantees this behaviour.

The same is true for `.truncate(true)` with read and/or append mode. Should an
existing file be modified if you don't have write permission? On Unix it is
undefined
(see [some](http://www.monkey.org/openbsd/archive/tech/0009/msg00299.html)
[comments](http://www.monkey.org/openbsd/archive/tech/0009/msg00304.html) on the
OpenBSD mailing list). The behaviour on Windows is inconsistent and depends on
whether `.create(true)` is set.

To give guarantees about cross-platform (and sane) behaviour, Rust should allow
only the following combinations of access modes and creation modes:

creation mode            | read  | write | read-write | append | read-append |
:------------------------|:-----:|:-----:|:----------:|:------:|:-----------:|
not set (open existing)  |   X   |   X   |     X      |   X    |      X      |
create                   |       |   X   |     X      |   X    |      X      |
truncate                 |       |   X   |     X      |        |             |
create and truncate      |       |   X   |     X      |        |             |
create_new               |       |   X   |     X      |   X    |      X      |

It is possible to bypass these restrictions by using system-specific options (as
in this case you already have to take care of cross-platform support yourself).
On Unix this is done by setting the creation mode using `.custom_flags()` with
`O_CREAT`, `O_TRUNC` and/or `O_EXCL`. On Windows this can be done by manually
specifying `.access_mode()` (see above).


## Asynchronous IO
Out of scope.


## Other options

### Inheritance of file descriptors
Leaking file descriptors to child processes can cause problems and can be a
security vulnerability. See this report by
[Python](https://www.python.org/dev/peps/pep-0446/).

On Windows, child processes do not inherit file descriptors by default (but this
can be changed). On Unix they always inherit, unless the close-on-exec flag is
set.

The close-on-exec flag can be set atomically when opening the file, or later
with `fcntl`. The `O_CLOEXEC` flag is in the relatively new POSIX-2008 standard,
and all modern versions of Unix support it. The following table lists for which
operating systems we can rely on the flag to be supported.

os            | since version | oldest supported version
:-------------|:--------------|:------------------------
OS X          | 10.6          | 10.7?
Linux         | 2.6.23        | 2.6.32 (supported by Rust)
FreeBSD       | 8.3           | 8.4
OpenBSD       | 5.0           | 5.7
NetBSD        | 6.0           | 5.0
Dragonfly BSD | 3.2           | ? (3.2 is not updated since 2012-12-14)
Solaris       | 11            | 10

This means we can always set the flag `O_CLOEXEC`, and do an additional `fcntl`
if the os is NetBSD or Solaris.


### Custom flags
`.custom_flags()`

Windows and the various flavours of Unix support flags that are not
cross-platform, but that can be useful in some circumstances. On Unix they will
be passed as the variable _flags_ to `open`, on Windows as the
_dwFlagsAndAttributes_ parameter.

The cross-platform options of Rust can do magic: they can set any flag necessary
to ensure it works as expected. For example, `.append(true)` on Unix not only
sets the flag `O_APPEND`, but also automatically `O_WRONLY` or `O_RDWR`. This
special treatment is not available for the custom flags.

Custom flags can only set flags, not remove flags set by Rusts options.

For the custom flags on Unix, the bits that define the access mode are masked
out with `O_ACCMODE`, to ensure they do not interfere with the access mode set
by Rusts options.
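
A brief sketch of how this could look in practice, assuming the Unix
`.custom_flags()` builder proposed in this RFC and using the `libc` crate
(a hypothetical helper, not part of the proposal itself) for the flag constant:

```rust
// Sketch only: pass a platform-specific flag that the cross-platform
// options do not cover.
#[cfg(unix)]
fn open_no_follow(path: &std::path::Path) -> std::io::Result<std::fs::File> {
    use std::fs::OpenOptions;
    use std::os::unix::fs::OpenOptionsExt;

    OpenOptions::new()
        .read(true)
        // Refuse to open `path` if its final component is a symlink
        // (requires the `libc` crate for the constant).
        .custom_flags(libc::O_NOFOLLOW)
        .open(path)
}
```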
+ +[Windows](https://msdn.microsoft.com/en-us/library/windows/desktop/hh449426%28v=vs.85%29.aspx): + +bit| flag +--:|:-------------------------------- +31 | FILE_FLAG_WRITE_THROUGH +30 | FILE_FLAG_OVERLAPPED +29 | FILE_FLAG_NO_BUFFERING +28 | FILE_FLAG_RANDOM_ACCESS +27 | FILE_FLAG_SEQUENTIAL_SCAN +26 | FILE_FLAG_DELETE_ON_CLOSE +25 | FILE_FLAG_BACKUP_SEMANTICS +24 | FILE_FLAG_POSIX_SEMANTICS +23 | FILE_FLAG_SESSION_AWARE +21 | FILE_FLAG_OPEN_REPARSE_POINT +20 | FILE_FLAG_OPEN_NO_RECALL +19 | FILE_FLAG_FIRST_PIPE_INSTANCE +18 | FILE_FLAG_OPEN_REQUIRING_OPLOCK + + +Unix: + +| POSIX | Linux | OS X | FreeBSD | OpenBSD | NetBSD |Dragonfly BSD| Solaris | +|:------------|:------------|:------------|:------------|:------------|:------------|:------------|:------------| +| O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | O_TRUNC | +| O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT | O_CREAT | +| O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL | O_EXCL | +| O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND | O_APPEND | +| O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | O_CLOEXEC | +| O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | O_DIRECTORY | +| O_NOCTTY | O_NOCTTY | O_NOCTTY | O_NOCTTY | | O_NOCTTY | | O_NOCTTY | +| O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | O_NOFOLLOW | +| O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | O_NONBLOCK | +| O_SYNC | O_SYNC | O_SYNC | O_SYNC | O_SYNC | O_SYNC | O_FSYNC | O_SYNC | +| O_DSYNC | O_DSYNC | O_DSYNC | | | O_DSYNC | | O_DSYNC | +| O_RSYNC | | | | | O_RSYNC | | O_RSYNC | +| | O_DIRECT | | O_DIRECT | | O_DIRECT | O_DIRECT | | +| | O_ASYNC | | | | O_ASYNC | | | +| | O_NOATIME | | | | | | | +| | O_PATH | | | | | | | +| | O_TMPFILE | | | | | | | +| | | O_SHLOCK | O_SHLOCK | O_SHLOCK | O_SHLOCK | O_SHLOCK | | +| | | O_EXLOCK | O_EXLOCK | O_EXLOCK | O_EXLOCK | O_EXLOCK | | +| | | O_SYMLINK | | | | | | +| | | O_EVTONLY | | | | | | +| | | | | | O_NOSIGPIPE | | | +| | | | | | O_ALT_IO | | | +| | | | | | | | O_NOLINKS | +| | | | | | | | O_XATTR | +| [POSIX](http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html) | [Linux](http://man7.org/linux/man-pages/man2/open.2.html) | [OS X](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/open.2.html) | [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=open&sektion=2) | [OpenBSD](http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man2/open.2?query=open&sec=2) | [NetBSD](http://netbsd.gw.com/cgi-bin/man-cgi?open+2+NetBSD-current) | [Dragonfly BSD](http://leaf.dragonflybsd.org/cgi/web-man?command=open§ion=2) | [Solaris](http://docs.oracle.com/cd/E23824_01/html/821-1463/open-2.html) | + + +### Windows-specific flags and attributes +The following variables for CreateFile2 currently have no equivalent functions +in Rust to set them: +``` +DWORD dwSecurityQosFlags; +LPSECURITY_ATTRIBUTES lpSecurityAttributes; +HANDLE hTemplateFile; +``` + + +## Changes from current + +### Access mode +- Current: `.append(true)` requires `.write(true)` on Unix, but not on Windows. + New: ignore `.write()` if `.append(true)` is specified. +- Current: when `.append(true)` is set, it is not possible to modify file + attributes on Windows, but it is possible to change the file mode on Unix. 
+  New: allow file attributes to be modified on Windows in append-mode.
+- Current: On Windows `.read()` and `.write()` set individual bit flags instead
+  of generic flags. New: set generic flags, as recommended by Microsoft, e.g.
+  `GENERIC_WRITE` instead of `FILE_GENERIC_WRITE` and `GENERIC_READ` instead of
+  `FILE_GENERIC_READ`. Truncate is currently broken on Windows; this fixes it.
+- Current: when no access mode is set, this falls back to opening the file
+  read-only on Unix, and opening with no access permissions on Windows.
+  New: always fail to open if no access mode is set.
+- Rename the Windows-specific `.desired_access()` to `.access_mode()`.
+
+### Creation mode
+- Implement `.create_new()`.
+- Do not allow `.truncate(true)` if the access mode is read-only and/or append.
+- Do not allow `.create(true)` or `.create_new(true)` if the access mode is
+  read-only.
+- Remove the Windows-specific `.creation_disposition()`.
+  It has no use, because all its options can be set in a cross-platform way.
+- Split the Windows-specific `.flags_and_attributes()` into `.custom_flags()`
+  and `.attributes()`. This is a form of future-proofing, as the new Windows 8
+  `CreateFile2` also splits these parameters. This has the advantage of a clear
+  separation between file attributes, which are somewhat similar to Unix mode
+  bits, and the custom flags that modify the behaviour of the current file
+  handle.
+
+### Other options
+- Set the close-on-exec flag atomically on Unix if supported.
+- Implement `.custom_flags()` on Windows and Unix to pass custom flags to the
+  system.
+
+
+# Drawbacks
+This adds a thin layer on top of the raw operating system calls. In this
+[pull request](https://github.com/rust-lang/rust/pull/26772#issuecomment-126753342)
+the conclusion was: this seems like a good idea for a "high level" abstraction
+like OpenOptions.
+
+This adds extra options that many applications can do without (otherwise they
+would already have been implemented).
+
+This RFC is also in line with the vision for IO in the
+[IO-OS-redesign](https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md#vision-for-io):
+- [The APIs] should impose essentially zero cost over the underlying OS
+  services; the core APIs should map down to a single syscall unless more are
+  needed for cross-platform compatibility.
+- The APIs should largely feel like part of "Rust" rather than part of any
+  legacy, and they should enable truly portable code.
+- Coverage. The std APIs should over time strive for full coverage of non-niche,
+  cross-platform capabilities.
+
+
+# Alternatives
+The first version of this RFC contained a proposal for options that control
+caching and file locking. They are out of scope for now, but included here for
+reference.
+
+
+## Sharing / locking
+On Unix it is possible for multiple processes to read and write to the same file
+at the same time.
+
+When you open a file on Windows, the system by default denies other processes
+permission to read from, write to, or delete that file. By setting the sharing
+mode, it is possible to allow other processes read, write and/or delete access.
+For cross-platform consistency, Rust imitates Unix by setting all sharing flags.
+
+Unix has no equivalent to the kind of file locking that Windows has. It has two
+types of advisory locking, POSIX and BSD-style. Advisory means that any process
+that does not use locking itself can happily ignore the locking of another
+process.
+As if that is not bad enough, they both have
+[problems](http://0pointer.de/blog/projects/locking.html) that make them close
+to unusable for modern multi-threaded programs. Linux may in some very rare
+cases support mandatory file locking, but it is just as broken as advisory.
+
+
+### Windows-specific: Share mode
+`.share_mode(FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE)`
+
+It is possible to set the individual share permissions with `.share_mode()`.
+
+The current philosophy of this function is that others should have no rights,
+unless explicitly granted. I think a better fit for Rust would be to give all
+others all rights, unless explicitly denied, e.g.:
+`.share_mode(DENY_READ | DENY_WRITE | DENY_DELETE)`.
+
+
+## Controlling caching
+When dealing with file systems and hard disks, there are several kinds of
+caches. Giving hints about them, or controlling them, may improve performance
+or data consistency.
+1. *read-ahead (performance of reads and overwrites)*
+   Instead of requesting only the data necessary for a single `read()` call from
+   a storage device, an operating system may request more data than necessary to
+   have it already available for the next read.
+2. *os cache (performance of reads and overwrites)*
+   The os may keep the data of previous reads and writes in memory to increase
+   the performance of future reads and possibly writes.
+3. *os staging area (convenience/performance of reads and writes)*
+   The size and alignment of data reads and writes to a disk should correspond
+   to sectors on the storage device, usually 512 or 4096 bytes. The os makes
+   sure a regular `write()` or `read()` doesn't have to care about this. For
+   example, a small write (say 100 bytes) has to rewrite a whole sector. The os
+   often has the surrounding data in its cache and can efficiently combine it
+   to write the whole sector.
+4. *delayed writing (performance/correctness of writes)*
+   The os may delay writes to improve performance, for example by batching
+   consecutive writes, and scheduling them with reads to minimize seeking.
+5. *on-disk write cache (performance/correctness of writes)*
+   Most hard disks / storage devices have a small RAM cache. It can speed up
+   reads, and writes can return as soon as the data is written to the device's
+   cache.
+
+
+### Read-ahead hint
+```
+.read_ahead_hint(enum ReadAheadHint)
+
+enum ReadAheadHint {
+    Default,
+    Sequential,
+    Random,
+}
+```
+
+If you read a file sequentially, read-ahead is beneficial; for completely
+random access it can become a penalty.
+
+- `Default` uses the generally good heuristics of the operating system.
+- `Sequential` indicates sequential but not necessarily consecutive access.
+  With this the os may increase the amount of data that is read ahead.
+- `Random` indicates mainly random access. The os may disable its read-ahead
+  cache.
+
+This option is treated as a hint. It is ignored if the os does not support it,
+or if the behaviour of the application proves it was set wrongly.
+
+Open flags / system calls:
+- Windows: flags `FILE_FLAG_SEQUENTIAL_SCAN` and `FILE_FLAG_RANDOM_ACCESS`
+- Linux, FreeBSD, NetBSD: `posix_fadvise()` with the flags
+  `POSIX_FADV_SEQUENTIAL` and `POSIX_FADV_RANDOM`
+- OS X: `fcntl()` with `F_RDAHEAD 0` for random (there is no special mode
+  for sequential).
+
+
+### OS cache
+`used_once(true)`
+
+When reading many gigabytes of data, a process may push useful data from other
+processes out of the os cache. 
To keep the performance of the whole system up, a
+process could indicate to the os whether data is only needed once, or is not
+needed anymore. On Linux, FreeBSD and NetBSD this is possible with fcntl
+`POSIX_FADV_DONTNEED` after a read or write with sync (or before close). On
+FreeBSD and NetBSD it is also possible to specify this up-front with fcntl
+`POSIX_FADV_NOREUSE`, and on OS X with fcntl `F_NOCACHE`. Windows does not seem
+to provide an option for this.
+
+This option may negatively affect the performance of writes smaller than the
+sector size, as cached data may not be available to the os staging area.
+
+This control over the os cache is the main reason some applications use direct
+io, despite it being less convenient and disabling other useful caches.
+
+
+### Delayed writing and on-disk write cache
+`.sync_data(true)` and `.sync_all(true)`
+
+There can be two delays (by the os and by the disk cache) between when an
+application performs a write and when the data is written to persistent
+storage. They increase performance, but also increase the risk of data loss in
+case of a system crash or power outage.
+
+When dealing with critical data, it may be useful to control these caches to
+make the chance of data loss smaller. The application should normally do so by
+calling Rust's stand-alone functions `sync_data()` or `sync_all()` at meaningful
+points (e.g. when the file is in a consistent state, or a state it can recover
+from).
+
+However, `.sync_data()` and `.sync_all()` may also be given as an open option.
+This guarantees that a write does not return before the data is written to
+disk. These options improve reliability, as you can never accidentally forget a
+sync.
+
+Whether performance with these options is worse than with the stand-alone
+functions is hard to say. With these options the data may have to be
+synchronised more often. But the stand-alone functions often sync outstanding
+writes to all files, while the options possibly sync only the current file.
+
+The difference between `.sync_all(true)` and `.sync_data(true)` is that
+`.sync_data(true)` does not update the less critical metadata such as the last
+modified timestamp (although it will be written eventually).
+
+Open flags:
+- Windows: `FILE_FLAG_WRITE_THROUGH` for `.sync_all()`
+- Unix: `O_SYNC` for `.sync_all()` and `O_DSYNC` for `.sync_data()`
+
+If a system does not support syncing only data, this option will fall back to
+syncing both data and metadata. If `.sync_all(true)` is specified,
+`.sync_data()` is ignored.
+
+
+### Direct access / no caching
+Most operating systems offer a mode that reads data straight from disk into an
+application buffer, or that writes straight from a buffer to disk. This avoids
+the small cost of a memory copy. It has the side effect that the data is not
+available to the os to provide caching. Also, because this does not use the
+_os staging area_, all reads and writes have to take care of data sizes and
+alignment themselves.
+
+Overview:
+- _os staging area_: not used
+- _read-ahead_: not used
+- _os cache_: data may be used, but is not added
+- _delayed writing_: no delay
+- _on-disk write cache_: maybe
+
+Open flags / system calls:
+- Windows: flag `FILE_FLAG_NO_BUFFERING`
+- Linux, FreeBSD, NetBSD, Dragonfly BSD: flag `O_DIRECT`
+
+The other options offer more fine-grained control over caching, and usually
+offer better performance or correctness guarantees. This option is sometimes
+used by applications as a crude way to control (disable) the _os cache_.
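+
+If an application nevertheless wants direct access, a rough sketch of how it
+could be requested through the custom-flags escape hatch (Linux-only, and
+assuming the proposed `.custom_flags()` extension) is shown below:
+
+```rust
+extern crate libc;
+
+use std::fs::{File, OpenOptions};
+use std::io;
+use std::os::unix::fs::OpenOptionsExt;
+
+#[cfg(target_os = "linux")]
+fn open_direct(path: &str) -> io::Result<File> {
+    OpenOptions::new()
+        .read(true)
+        // Bypass the os cache; the caller is now responsible for issuing
+        // sector-sized and suitably aligned reads.
+        .custom_flags(libc::O_DIRECT)
+        .open(path)
+}
+```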
+
+Rust should not currently expose direct access as an open option, because it
+should be used with an abstraction / external crate that handles the data size
+and alignment requirements, if it should be used at all.
+
+
+# Unresolved questions
+None.
diff --git a/text/1257-drain-range-2.md b/text/1257-drain-range-2.md
new file mode 100644
index 00000000000..533f1f60e69
--- /dev/null
+++ b/text/1257-drain-range-2.md
@@ -0,0 +1,124 @@
+- Feature Name: drain-range
+- Start Date: 2015-08-14
+- RFC PR: [rust-lang/rfcs#1257](https://github.com/rust-lang/rfcs/pull/1257)
+- Rust Issue: [rust-lang/rust#27711](https://github.com/rust-lang/rust/issues/27711)
+
+# Summary
+
+Implement `.drain(range)` and `.drain()` respectively as appropriate on collections.
+
+# Motivation
+
+The `drain` methods and their draining iterators serve to mass remove elements
+from a collection, receiving them by value in an iterator, while the collection
+keeps its allocation intact (if applicable).
+
+The range-parameterized variants of drain are a generalization of `drain`, to
+affect just a subrange of the collection, for example removing just an index
+range from a vector.
+
+`drain` thus serves to consume all or some elements from a collection without
+consuming the collection itself. The ranged `drain` allows bulk removal of
+elements, more efficiently than any other safe API.
+
+# Detailed design
+
+- Implement `.drain(a..b)` where `a` and `b` are indices, for all
+  collections that are sequences.
+- Implement `.drain()` for other collections. This is just like `.drain(..)`
+  would be (drain the whole collection).
+- Ranged drain accepts all range types, currently .., a.., ..b, a..b,
+  and drain will accept inclusive end ranges ("closed ranges") when they are
+  implemented.
+- Drain removes every element in the range.
+- Drain returns an iterator that produces the removed items by value.
+- Drain removes the whole range, regardless of whether you iterate the draining
+  iterator or not.
+- Drain preserves the collection's capacity where it is possible.
+
+## Collections
+
+`Vec` and `String` already have ranged drain, so they are complete.
+
+`HashMap` and `HashSet` already have `.drain()`, so they are complete;
+their elements have no meaningful order.
+
+`BinaryHeap` already has `.drain()`, and just like its other iterators,
+it promises no particular order. So this collection is already complete.
+
+The following collections need updated implementations:
+
+`VecDeque` should implement `.drain(range)` for index ranges, just like `Vec`
+does.
+
+`LinkedList` should implement `.drain(range)` for index ranges. Just
+like the other sequences, this is an `O(n)` operation, and `LinkedList` already
+has other indexed methods (`.split_off()`).
+
+## `BTreeMap` and `BTreeSet`
+
+`BTreeMap` already has a ranged iterator, `.range(a, b)`, and `drain` for
+`BTreeMap` and `BTreeSet` should have arguments completely consistent with the
+range method. This will be addressed separately.
+
+## Stabilization
+
+The following can be stabilized as they are:
+
+- `HashMap::drain`
+- `HashSet::drain`
+- `BinaryHeap::drain`
+
+The following can be stabilized, but their argument's trait is not stable:
+
+- `Vec::drain`
+- `String::drain`
+
+The following will be heading towards stabilization after changes:
+
+- `VecDeque::drain`
+
+# Drawbacks
+
+- Collections disagree on whether they are drained with a range (`Vec`) or not
+  (`HashMap`).
+- No trait for the drain method.
+
+# Alternatives
+
+- Use a trait for the drain method and let all collections implement it. 
This + will force all collections to use a single parameter (a range) for the drain + method. + +- Provide `.splice(range, iterator)` for `Vec` instead of `.drain(range)`: + + ```rust + fn splice(&mut self, range: R, iter: I) -> Splice + where R: RangeArgument, I: IntoIterator + ``` + + if the method `.splice()` would both return an iterator of the replaced elements, + and consume an iterator (of arbitrary length) to replace the removed range, then + it includes drain's tasks. + +- RFC #574 proposed accepting either a single index (single key for maps) + or a range for ranged drain, so an alternative would be to do that. The + single index case is however out of place, and writing a range that spans + a single index is easy. + +- Use the name `.remove_range(a..b)` instead of `.drain(a..b)`. Since the method + has two simultaneous roles, removing a range and yielding a range as an iterator, + either role could guide the name. + This alternative name was not very popular with the rust developers I asked + (but they are already used to what `drain` means in rust context). + +- Provide `.drain()` without arguments and separate range drain into a separate + method name, implemented in addition to `drain` where applicable. + +- Do not support closed ranges in `drain`. + +- `BinaryHeap::drain` could drain the heap in sorted order. The primary proposal + is arbitrary order, to match preexisting `BinaryHeap` iterators. + +# Unresolved questions + +- Concrete shape of the `BTreeMap` API is not resolved here +- Will closed ranges be used for the `drain` API? diff --git a/text/1260-main-reexport.md b/text/1260-main-reexport.md new file mode 100644 index 00000000000..9a9d6b35f16 --- /dev/null +++ b/text/1260-main-reexport.md @@ -0,0 +1,57 @@ +- Feature Name: main_reexport +- Start Date: 2015-08-19 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1260 +- Rust Issue: https://github.com/rust-lang/rust/issues/28937 + +# Summary + +Allow a re-export of a function as entry point `main`. + +# Motivation + +Functions and re-exports of functions usually behave the same way, but they do +not for the program entry point `main`. This RFC aims to fix this inconsistency. + +The above mentioned inconsistency means that e.g. you currently cannot use a +library's exported function as your main function. + +Example: + + pub mod foo { + pub fn bar() { + println!("Hello world!"); + } + } + use foo::bar as main; + +Example 2: + + extern crate main_functions; + pub use main_functions::rmdir as main; + +See also https://github.com/rust-lang/rust/issues/27640 for the corresponding +issue discussion. + +The `#[main]` attribute can also be used to change the entry point of the +generated binary. This is largely irrelevant for this RFC as this RFC tries to +fix an inconsistency with re-exports and directly defined functions. +Nevertheless, it can be pointed out that the `#[main]` attribute does not cover +all the above-mentioned use cases. + +# Detailed design + +Use the symbol `main` at the top-level of a crate that is compiled as a program +(`--crate-type=bin`) – instead of explicitly only accepting directly-defined +functions, also allow (possibly non-`pub`) re-exports. + +# Drawbacks + +None. + +# Alternatives + +None. + +# Unresolved questions + +None. 
diff --git a/text/1268-allow-overlapping-impls-on-marker-traits.md b/text/1268-allow-overlapping-impls-on-marker-traits.md
new file mode 100644
index 00000000000..9ae0b4cb450
--- /dev/null
+++ b/text/1268-allow-overlapping-impls-on-marker-traits.md
@@ -0,0 +1,144 @@
+- Feature Name: Allow overlapping impls for marker traits
+- Start Date: 2015-09-02
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1268
+- Rust Issue: https://github.com/rust-lang/rust/issues/29864
+
+# Summary
+
+Preventing overlapping implementations of a trait makes complete sense in the
+context of determining method dispatch. There must not be ambiguity in what code
+will actually be run for a given type. However, for marker traits, there are no
+associated methods for which to indicate ambiguity. There is no harm in a type
+being marked as `Sync` for multiple reasons.
+
+# Motivation
+
+This is purely to improve the ergonomics of adding/implementing marker traits.
+While specialization will certainly make all cases not covered today possible,
+removing the restriction entirely will improve the ergonomics in several edge
+cases.
+
+Some examples include:
+
+- the coercible trait design presented in [RFC #91][91];
+- the `ExnSafe` trait proposed in [RFC #1236][1236].
+
+[91]: https://github.com/rust-lang/rfcs/pull/91
+[1236]: https://github.com/rust-lang/rfcs/pull/1236
+
+# Detailed design
+
+For the purpose of this RFC, the definition of a marker trait is a trait with no
+associated items. The design here is quite straightforward. The following code
+fails to compile today:
+
+```rust
+trait Marker<A> {}
+
+struct GenericThing<A, B> {
+    a: A,
+    b: B,
+}
+
+impl<A, B> Marker<GenericThing<A, B>> for A {}
+impl<A, B> Marker<GenericThing<A, B>> for B {}
+```
+
+The two impls are considered overlapping, as there is currently no way to prove
+that `A` and `B` are not the same type. However, in the case of marker traits,
+there is no actual reason that they couldn't be overlapping, as no code could
+actually change based on the `impl`.
+
+For a concrete use case, consider some setup like the following:
+
+```rust
+trait QuerySource {
+    fn select<C: Selectable<Self>>(&self, columns: C) -> SelectSource<Self, C> {
+        ...
+    }
+}
+
+trait Column<T> {}
+trait Table: QuerySource {}
+trait Selectable<T: Table>: Column<T> {}
+
+impl<T: Table, C: Column<T>> Selectable<T> for C {}
+```
+
+However, when the following is introduced:
+
+```rust
+struct JoinSource<Left, Right> {
+    left: Left,
+    right: Right,
+}
+
+impl<Left, Right> QuerySource for JoinSource<Left, Right> where
+    Left: Table + JoinTo<Right>,
+    Right: Table,
+{
+    ...
+}
+```
+
+it becomes impossible to satisfy the requirements of `select`. The following
+impl is disallowed today:
+
+```rust
+impl<Left, Right, C> Selectable<JoinSource<Left, Right>> for C where
+    Left: Table + JoinTo<Right>,
+    Right: Table,
+    C: Column<Left>,
+{}
+
+impl<Left, Right, C> Selectable<JoinSource<Left, Right>> for C where
+    Left: Table + JoinTo<Right>,
+    Right: Table,
+    C: Column<Right>,
+{}
+```
+
+Since `Left` and `Right` might be the same type, this causes an overlap.
+However, there is also no reason to forbid the overlap. There is no way to work
+around this today. Even if you write an impl that is more specific about the
+tables, that would be considered a non-crate-local blanket implementation. The
+only way to write it today is to specify each column individually.
+
+# Drawbacks
+
+With this change, adding any methods to an existing marker trait, even
+defaulted ones, would be a breaking change. Once specialization lands, this
+could probably be considered an acceptable breakage.
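+
+As an illustration of this drawback (not an example from the proposal itself):
+once overlapping impls of an item-less trait exist, later giving that trait
+even a defaulted method would reintroduce genuine ambiguity and break
+downstream code. A minimal sketch:
+
+```rust
+trait Marker {}
+
+// Allowed under this proposal, because `Marker` has no associated items.
+impl<T: Send> Marker for T {}
+impl<T: Sync> Marker for T {}
+
+// If `Marker` later gained a method, even one with a default body:
+//
+//     trait Marker { fn describe(&self) -> &'static str { "marker" } }
+//
+// the two impls above would overlap *and* carry code, so the compiler would
+// have to reject them again.
+```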
+
+# Alternatives
+
+If the lattice rule for specialization is eventually accepted, there does not
+appear to be a case that is impossible to write, albeit with some additional
+boilerplate, as you'll have to manually specify the empty impl for any overlap
+that might occur.
+
+# Unresolved questions
+
+**How can we implement this design?** Simply lifting the coherence
+restrictions is easy enough, but we will encounter some challenges
+when we come to test whether a given trait impl holds. For example,
+something like:
+
+```rust
+impl<T: Send> MarkerTrait for T { }
+impl<T: Sync> MarkerTrait for T { }
+```
+
+means that a type `Foo: MarkerTrait` can hold *either* by `Foo: Send`
+*or* by `Foo: Sync`. Today, we prefer to break down an obligation like
+`Foo: MarkerTrait` into component obligations (e.g., `Foo: Send`). Due
+to coherence, there is always one best way to do this (sort of ---
+where clauses complicate matters). That is, except for complications
+due to type inference, there is a best impl to choose. But under this
+proposal, there would not be. Experimentation is needed (similar
+concerns arise with the proposals around specialization, so it may be
+that progress on that front will answer the questions raised here).
+
+**Should we add some explicit way to indicate that this is a marker
+trait?** This would address the drawback that adding items is a
+backwards incompatible change.
diff --git a/text/1270-deprecation.md b/text/1270-deprecation.md
new file mode 100644
index 00000000000..ebd327adaaf
--- /dev/null
+++ b/text/1270-deprecation.md
@@ -0,0 +1,127 @@
+- Feature Name: Public Stability
+- Start Date: 2015-09-03
+- RFC PR: [rust-lang/rfcs#1270](https://github.com/rust-lang/rfcs/pull/1270)
+- Rust Issue: [rust-lang/rust#29935](https://github.com/rust-lang/rust/issues/29935)
+
+# Summary
+
+This RFC proposes to allow library authors to use a `#[deprecated]` attribute,
+with optional `since = "`*version*`"` and `note = "`*free text*`"` fields. The
+compiler can then warn on deprecated items, while `rustdoc` can document their
+deprecation accordingly.
+
+# Motivation
+
+Library authors want a way to evolve their APIs, which also involves
+deprecating items. To do this cleanly, they need to document their intentions
+and give their users enough time to react.
+
+Currently there is no support from the language for this oft-wanted feature
+(despite a similar feature existing for the sole purpose of evolving the Rust
+standard library). This RFC aims to rectify that, while giving a pleasant
+interface to use and maximizing the usefulness of the metadata introduced.
+
+# Detailed design
+
+Public API items (both plain `fn`s, methods, trait- and inherent
+`impl`ementations as well as `const` definitions, type definitions, struct
+fields and enum variants) can be given a `#[deprecated]` attribute. All
+possible fields are optional:
+
+* `since` is defined to contain the version of the crate at the time of
+deprecating the item, following the semver scheme. Rustc does not know about
+versions, thus the content of this field is not checked (but it will be by
+external lints, e.g. [rust-clippy](https://github.com/Manishearth/rust-clippy)).
+* `note` should contain a human-readable string outlining the reason for
+deprecating the item and/or what to use instead. While this field is not
+required, library authors are strongly advised to make use of it. 
The string is interpreted +as plain unformatted text (for now) so that rustdoc can include it in the item's +documentation without messing up the formatting. + +On use of a *deprecated* item, `rustc` will `warn` of the deprecation. Note +that during Cargo builds, warnings on dependencies get silenced. While this has +the upside of keeping things tidy, it has a downside when it comes to +deprecation: + +Let's say I have my `llogiq` crate that depends on `foobar` which uses a +deprecated item of `serde`. I will never get the warning about this unless I +try to build `foobar` directly. We may want to create a service like `crater` +to warn on use of deprecated items in library crates, however this is outside +the scope of this RFC. + +`rustdoc` will show deprecation on items, with a `[deprecated]` box that may +optionally show the version and note where available. + +The language reference will be extended to describe this feature as outlined +in this RFC. Authors shall be advised to leave their users enough time to react +before *removing* a deprecated item. + +The internally used feature can either be subsumed by this or possibly renamed +to avoid a name clash. + +# Intended Use + +Crate author Anna wants to evolve her crate's API. She has found that one +type, `Foo`, has a better implementation in the `rust-foo` crate. Also she has +written a `frob(Foo)` function to replace the earlier `Foo::frobnicate(self)` +method. + +So Anna first bumps the version of her crate (because deprecation is always +done on a version change) from `0.1.1` to `0.2.1`. She also adds the following +prefix to the `Foo` type: + +``` +extern crate rust_foo; + +#[deprecated(since = "0.2.1", + note="The rust_foo version is more advanced, and this crate's will likely be discontinued")] +struct Foo { .. } +``` + +Users of her crate will see the following once they `cargo update` and `build`: + +``` +src/foo_use.rs:27:5: 27:8 warning: Foo is marked deprecated as of version 0.2.1 +src/foo_use.rs:27:5: 27:8 note: The rust_foo version is more advanced, and this crate's will likely be discontinued +``` + +Rust-clippy will likely gain more sophisticated checks for deprecation: + +* `future_deprecation` will warn on items marked as deprecated, but with a +version lower than their crates', while `current_deprecation` will warn only on +those items marked as deprecated where the version is equal or lower to the +crates' one. +* `deprecation_syntax` will check that the `since` field really contains a +semver number and not some random string. + +Clippy users can then activate the clippy checks and deactivate the standard +deprecation checks. + +# Drawbacks + +* Once the feature is public, we can no longer change its design + +# Alternatives + +* Do nothing +* make the `since` field required and check that it's a single version +* require either `reason` or `use` be present +* `reason` could include markdown formatting +* rename the `reason` field to `note` to clarify its broader usage. (**done!**) +* add a `note` field and make `reason` a field with specific meaning, perhaps +even predefine a number of valid reason strings, as JEP277 currently does +* Add a `use` field containing a plain text of what to use instead +* Add a `use` field containing a path to some function, type, etc. to replace +the current feature. 
Currently with the rustc-private feature, people are +describing a replacement in the `reason` field, which is clearly not the +original intention of the field +* Optionally, `cargo` could offer a new dependency category: "doc-dependencies" +which are used to pull in other crates' documentations to link them (this is +obviously not only relevant to deprecation) + +# Unresolved questions + +* What other restrictions should we introduce now to avoid being bound to a +possibly flawed design? +* Can / Should the `std` library make use of the `#[deprecated]` extensions? +* Bikeshedding: Are the names good enough? diff --git a/text/1288-time-improvements.md b/text/1288-time-improvements.md new file mode 100644 index 00000000000..d480051c3dd --- /dev/null +++ b/text/1288-time-improvements.md @@ -0,0 +1,351 @@ +- Feature Name: `time_improvements` +- Start Date: 2015-09-20 +- RFC PR: [rust-lang/rfcs#1288](https://github.com/rust-lang/rfcs/pull/1288) +- Rust Issue: [rust-lang/rust#29866](https://github.com/rust-lang/rust/issues/29866) + +# Summary + +This RFC proposes several new types and associated APIs for working with times in Rust. +The primary new types are `Instant`, for working with time that is guaranteed to be +monotonic, and `SystemTime`, for working with times across processes on a single system +(usually internally represented as a number of seconds since an epoch). + +# Motivations + +The primary motivation of this RFC is to flesh out a larger set of APIs for +representing instants in time and durations of time. + +For various reasons that this RFC will explore, APIs related to time are fairly +error-prone and have a number of caveats that programmers do not expect. + +Rust APIs tend to expose more of these kinds of caveats through their APIs, in +order to help programmers become aware of and handle edge-cases. At the same +time, un-ergonomic APIs can work against that goal. + +This RFC attempts to balance the desire to expose common footguns and help +programmers handle edge-cases with a desire to avoid creating so many hoops to +jump through that the useful caveats get ignored. + +At a high level, this RFC covers two concepts related to time: + +* Instants, moments in time +* Durations, an amount of time between two instants + +We would like to be able to do some basic operations with these instants: + +* Compare two instants +* Add a time period to an instant +* Subtract a time period from an instant +* Compare an instant to "now" to discover time elapsed + +However, there are a number of problems that arise when trying to define these +types and operations. + +First of all, with the exception of moments in time created using system APIs that +guarantee monotonicity (because they were created within a single process, or +created during since the last boot), moments in time are not monotonic. +A simple example of this is that if a program creates two files sequentially, +it cannot assume that the creation time of the second file is later than the +creation time of the first file. + +This is because NTP (the network time protocol) can arbitrarily change the +system clock, and can even **rewind time**. This kind of time travel means that +the "system time-line" is not continuous and monotonic, which is something that +programmers very often forget when writing code involving machine times. + +This design attempts to help programmers avoid some of the most egregious and +unexpected consequences of this kind of "time travel". 
+ +--- + +Leap seconds, which cannot be predicted, mean that it is impossible +to reliably add a number of seconds to a particular moment in time represented +as a human date and time ("1 million seconds from 2015-09-20 at midnight"). + +They also mean that seemingly simple concepts, like "1 minute", have caveats +depending on exactly how they are used. Caveats related to leap seconds +create real-world bugs, because of how unusual leap seconds are, and how +unlikely programmers are to consider "12:00:60" as a valid time. + +Certain kinds of seemingly simple operations may not make sense in +all cases. For example, adding "1 year" to February 29, 2012 would produce +February 29, 2013, which is not a valid date. Adding "1 month" to August 31, +2015 would produce September 31, 2015, which is also not a valid date. + +Certain human descriptions of durations, like "1 month and 35 days" +do not make sense, and human descriptions like "1 month and 5 days" have +ambiguous meaning when used in operations (do you add 1 month first and then +5 days or vice versa). + + +For these reasons, this RFC does not attempt to define a human duration with +fields for years, days or months. Such a duration would be difficult to use +in operations without hard-to-remember ordering rules. + +For these reasons, this RFC does not propose APIs related to human concepts +dates and times. It is intentionally forwards-compatible with such +extensions. + +--- + +Finally, many APIs that **take** a `Duration` can only do something useful with +positive values. For example, a timeout API would not know how to wait a +negative amount of time before timing out. Even discounting the possibility of +coding mistakes, the problem of system clock time travel means that programmers +often produce negative durations that they did not expect, and APIs that +liberally accept negative durations only propagate the error further. + +As a result, this RFC makes a number of simplifying assumptions that can be +relaxed over time with additional types or through further RFCs: + +It provides convenience methods for constructing Durations from larger units +of time (minutes, hours, days), but gives them names like +`Duration.from_standard_hour`. A standard hour is always 3600 seconds, +regardless of leap seconds. + +It provides APIs that are expected to produce positive `Duration`s, and expects +that APIs like timeouts will accept positive `Durations` (which is currently +the case in Rust's standard library). These APIs help the programmer discover +the possibility of system clock time travel, and either handle the error explicitly, +or at least avoid propagating the problem into other APIs (by using `unwrap`). + +It separates monotonic time (`Instant`) from time derived from the system +clock (`SystemTime`), which must account for the possibility of time travel. +This allows methods related to monotonic time to be uncaveated, while working +with the system clock has more methods that return `Result`s. + +This RFC does not attempt to define a type for calendared DateTimes, nor does it +directly address time zones. + +# Proposal + +## Types + +```rust +pub struct Instant { + secs: u64, + nanos: u32 +} + +pub struct SystemTime { + secs: u64, + nanos: u32 +} + +pub struct Duration { + secs: u64, + nanos: u32 +} +``` + +### Instant + +`Instant` is the simplest of the types representing moments in time. It +represents an opaque (non-serializable!) timestamp that is guaranteed to +be monotonic when compared to another `Instant`. 
+ +> In this context, monotonic means that a timestamp created later in real-world +> time will always be not less than a timestamp created earlier in real-world +> time. + +The `Duration` type can be used in conjunction with `Instant`, and these +operations have none of the usual time-related caveats. + +* Add a `Duration` to a `Instant`, producing a new `Instant` +* compare two `Instant`s to each other +* subtract a `Instant` from a later `Instant`, producing a `Duration` +* ask for an amount of time elapsed since a `Instant`, producing a `Duration` + +Asking for an amount of time elapsed from a given `Instant` is a very common +operation that is guaranteed to produce a positive `Duration`. Asking for the +difference between an earlier and a later `Instant` also produces a positive +`Duration` when used correctly. + +This design does not assume that negative `Duration`s are never useful, but +rather that the most common uses of `Duration` do not have a meaningful +use for negative values. Rather than require each API that takes a `Duration` +to produce an `Err` (or `panic!`) when receiving a negative value, this design +optimizes for the broadly useful positive `Duration`. + +```rust +impl Instant { + /// Returns an instant corresponding to "now". + pub fn now() -> Instant; + + /// Panics if `earlier` is later than &self. + /// Because Instant is monotonic, the only time that `earlier` should be + /// a later time is a bug in your code. + pub fn duration_from_earlier(&self, earlier: Instant) -> Duration; + + /// Panics if self is later than the current time (can happen if a Instant + /// is produced synthetically) + pub fn elapsed(&self) -> Duration; +} + +impl Add for Instant { + type Output = Instant; +} + +impl Sub for Instant { + type Output = Instant; +} + +impl PartialEq for Instant; +impl Eq for Instant; +impl PartialOrd for Instant; +impl Ord for Instant; +``` + +For convenience, several new constructors are added to `Duration`. Because any +unit greater than seconds has caveats related to leap seconds, all of the +constructors take "standard" units. For example a "standard minute" is 60 +seconds, while a "standard hour" is 3600 seconds. + +The "standard" terminology comes from [JodaTime][joda-time-standard]. + +[joda-time-standard]: http://joda-time.sourceforge.net/apidocs/org/joda/time/Duration.html#standardDays(long) + +```rust +impl Duration { + /// a standard minute is 60 seconds + /// panics if the number of minutes is larger than u64 seconds + pub fn from_standard_minutes(minutes: u64) -> Duration; + + /// a standard hour is 60 standard minutes + /// panics if the number of hours is larger than u64 seconds + pub fn from_standard_hours(hours: u64) -> Duration; + + /// a standard day is 24 standard hours + /// panics if the number of days is larger than u64 seconds + pub fn from_standard_days(days: u64) -> Duration; +} +``` + +### SystemTime + +**This type should not be used for in-process timestamps, like those used in +benchmarks.** + +A `SystemTime` represents a time stored on the local machine derived from the +system clock (in UTC). For example, it is used to represent `mtime` on the file +system. + +The most important caveat of `SystemTime` is that it is **not monotonic**. This +means that you can save a file to the file system, then save another file to +the file system, **and the second file has an `mtime` earlier than the second**. 
+ +> **This means that an operation that happens after another operation in real +> time may have an earlier `SystemTime`!** + +In practice, most programmers do not think about this kind of "time travel" +with the system clock, leading to strange bugs once the mistaken assumption +propagates through the system. + +This design attempts to help the programmer catch the most egregious of these +kinds of mistakes (unexpected travel **back in time**) before the mistake +propagates. + +```rust +impl SystemTime { + /// Returns the system time corresponding to "now". + pub fn now() -> SystemTime; + + /// Returns an `Err` if `earlier` is later + pub fn duration_from_earlier(&self, earlier: SystemTime) -> Result; + + /// Returns an `Err` if &self is later than the current system time. + pub fn elapsed(&self) -> Result; +} + +impl Add for SystemTime { + type Output = SystemTime; +} + +impl Sub for SystemTime { + type Output = SystemTime; +} + +// An anchor which can be used to generate new SystemTime instances from a known +// Duration or convert a SystemTime to a Duration which can later then be used +// again to recreate the SystemTime. +// +// Defined to be "1970-01-01 00:00:00 UTC" on all systems. +const UNIX_EPOCH: SystemTime = ...; + +// Note that none of these operations actually imply that the underlying system +// operation that produced these SystemTimes happened at the same time +// (for Eq) or before/after (for Ord) than the other system operation. +impl PartialEq for SystemTime; +impl Eq for SystemTime; +impl PartialOrd for SystemTime; +impl Ord for SystemTime; + +impl SystemTimeError { + /// A SystemTimeError originates from attempting to subtract two SystemTime + /// instances, a and b. If a < b then an error is returned, and the duration + /// returned represents (b - a). + pub fn duration(&self) -> Duration; +} +``` + +The main difference from the design of `Instant` is that it is impossible to +know for sure that a `SystemTime` is in the past, even if the operation that +produced it happened in the past (in real time). + +--- + +##### Illustrative Example: + +If a program requests a `SystemTime` that represents the `mtime` of a given file, +then writes a new file and requests its `SystemTime`, it may expect the second +`SystemTime` to be after the first. + +Using `duration_from_earlier` will remind the programmer that "time travel" is +possible, and make it easy to handle that case. As always, the programmer can +use `.unwrap()` in the prototype stage to avoid having to handle the edge-case +yet, while retaining a reminder that the edge-case is possible. + +# Drawbacks + +This RFC defines two new types for describing times, and posits a third type +to complete the picture. At first glance, having three different APIs for +working with times may seem overly complex. + +However, there are significant differences between times that only go forward +and times that can go forward or backward. There are also significant differences +between times represented as a number since an epoch and time represented in +human terms. + +As a result, this RFC chose to make these differences explicit, allowing +ergonomic, uncaveated use of monotonic time, and a small speedbump when +working with times that can move both forward and backward. + +# Alternatives + +One alternative design would be to attempt to have a single unified time +type. The rationale for not doing so is explained under Drawbacks. 
+ +Another possible alternative is to allow free math between instants, +rather than providing operations for comparing later instants to earlier +ones. + +In practice, the vast majority of APIs **taking** a `Duration` expect +a positive-only `Duration`, and therefore code that subtracts a time +from another time will usually want a positive `Duration`. + +The problem is especially acute when working with `SystemTime`, where +it is possible for a question like: "how much time has elapsed since +I created this file" to return a negative Duration! + +This RFC attempts to catch mistakes related to negative `Duration`s at +the point where they are produced, rather than requiring all APIs that +**take** a `Duration` to guard against negative values. + +Because `Ord` is implemented on `SystemTime` and `Instant`, it is +possible to compare two arbitrary times to each other first, and then +use `duration_from_earlier` reliably to get a positive `Duration`. + +# Unresolved Questions + +This RFC leaves types related to human representations of dates and times +to a future proposal. diff --git a/text/1291-promote-libc.md b/text/1291-promote-libc.md new file mode 100644 index 00000000000..c719e69a02b --- /dev/null +++ b/text/1291-promote-libc.md @@ -0,0 +1,308 @@ +- Feature Name: N/A +- Start Date: 2015-09-21 +- RFC PR: [rust-lang/rfcs#1291](https://github.com/rust-lang/rfcs/pull/1291) +- Rust Issue: N/A + +# Summary + +Promote the `libc` crate from the nursery into the `rust-lang` organization +after applying changes such as: + +* Remove the internal organization of the crate in favor of just one flat + namespace at the top of the crate. +* Set up a large number of CI builders to verify FFI bindings across many + platforms in an automatic fashion. +* Define the scope of libc in terms of bindings it will provide for each + platform. + +# Motivation + +The current `libc` crate is a bit of a mess unfortunately, having long since +departed from its original organization and scope of definition. As more +platforms have been added over time as well as more APIs in general, the +internal as well as external facing organization has become a bit muddled. Some +specific concerns related to organization are: + +* There is a vast amount of duplication between platforms with some common + definitions. For example all BSD-like platforms end up defining a similar set + of networking struct constants with the same definitions, but duplicated in + many locations. +* Some subset of `libc` is reexported at the top level via globs, but not all of + `libc` is reexported in this fashion. +* When adding new APIs it's unclear what modules it should be placed into. It's + not always the case that the API being added conforms to one of the existing + standards that a module exist for and it's not always easy to consult the + standard itself to see if the API is in the standard. +* Adding a new platform to liblibc largely entails just copying a huge amount of + code from some previously similar platform and placing it at a new location in + the file. + +Additionally, on the technical and tooling side of things some concerns are: + +* None of the FFI bindings in this module are verified in terms of testing. + This means that they are both not automatically generated nor verified, and + it's highly likely that there are a good number of mistakes throughout. 
+* It's very difficult to explore the documentation for libc on different + platforms, but this is often one of the more important libraries to have + documentation for across all platforms. + +The purpose of this RFC is to largely propose a reorganization of the libc +crate, along with tweaks to some of the mundane details such as internal +organization, CI automation, how new additions are accepted, etc. These changes +should all help push `libc` to a more more robust position where it can be well +trusted across all platforms both now and into the future! + +# Detailed design + +All design can be previewed as part of an [in progress fork][libc] available on +GitHub. Additionally, all mentions of the `libc` crate in this RFC refer to the +external copy on crates.io, not the in-tree one in the `rust-lang/rust` +repository. No changes are being proposed (e.g. to stabilize) the in-tree copy. + +[libc]: https://github.com/alexcrichton/libc + +### What is this crate? + +The primary purpose of this crate is to provide all of the definitions +necessary to easily interoperate with C code (or "C-like" code) on each of the +platforms that Rust supports. This includes type definitions (e.g. `c_int`), +constants (e.g. `EINVAL`) as well as function headers (e.g. `malloc`). + +One question that typically comes up with this sort of purpose is whether the +crate is "cross platform" in the sense that it basically just works across the +platforms it supports. The `libc` crate, however, **is not intended to be cross +platform** but rather the opposite, an exact binding to the platform in +question. In essence, the `libc` crate is targeted as "replacement for +`#include` in Rust" for traditional system header files, but it makes no +effort to be help being portable by tweaking type definitions and signatures. + +### The Home of `libc` + +Currently this crate resides inside of the main `rust` repo of the `rust-lang` +organization, but this unfortunately somewhat hinders its development as it +takes awhile to land PRs and isn't quite as quick to release as external +repositories. As a result, this RFC proposes having the crate reside externally +in the `rust-lang` organization so additions can be made through PRs (tested +much more quickly). + +The main repository will have a submodule pointing at the external repository to +continue building libstd. + +### Public API + +The `libc` crate will hide all internal organization of the crate from users of +the crate. All items will be reexported at the top level as part of a flat +namespace. This brings with it a number of benefits: + +* The internal structure can evolve over time to better fit new platforms + while being backwards compatible. +* This design matches what one would expect from C, where there's only a flat + namespace available. +* Finding an API is quite easy as the answer is "it's always at the root". + +A downside of this approach, however, is that the public API of `libc` will be +platform-specific (e.g. the set of symbols it exposes is different across +platforms), which isn't seen very commonly throughout the rest of the Rust +ecosystem today. This can be mitigated, however, by clearly indicating that this +is a platform specific library in the sense that it matches what you'd get if +you were writing C code across multiple platforms. + +The API itself will include any number of definitions typically found in C +header files such as: + +* C types, e.g. typedefs, primitive types, structs, etc. +* C constants, e.g. 
`#define` directives +* C statics +* C functions (their headers) +* C macros (exported as `#[inline]` functions in Rust) + +As a technical detail, all `struct` types exposed in `libc` will be guaranteed +to implement the `Copy` and `Clone` traits. There will be an optional feature of +the library to implement `Debug` for all structs, but it will be turned off by +default. + +### Changes from today + +The [in progress][libc] implementation of this RFC has a number of API changes +and breakages from today's `libc` crate. Almost all of them are minor and +targeted at making bindings more correct in terms of faithfully representing the +underlying platforms. + +There is, however, one large notable change from today's crate. The `size_t`, +`ssize_t`, `ptrdiff_t`, `intptr_t`, and `uintptr_t` types are all defined in +terms of `isize` and `usize` instead of known sizes. Brought up by @briansmith +on [#28096][isizeusize] this helps decrease the number of casts necessary in +normal code and matches the existing definitions on all platforms that `libc` +supports today. In the future if a platform is added where these type +definitions are not correct then new ones will simply be available for that +target platform (and casts will be necessary if targeting it). + +[isizeusize]: https://github.com/rust-lang/rust/pull/28096 + +Note that part of this change depends upon removing the compiler's +lint-by-default about `isize` and `usize` being used in FFI definitions. This +lint is mostly a holdover from when the types were named `int` and `uint` and it +was easy to confuse them with C's `int` and `unsigned int` types. + +The final change to the `libc` crate will be to bump its version to 1.0.0, +signifying that breakage has happened (a bump from 0.1.x) as well as having a +future-stable interface until 2.0.0. + +### Scope of `libc` + +The name "libc" is a little nebulous as to what it means across platforms. It +is clear, however, that this library must have a well defined scope up to which +it can expand to ensure that it doesn't start pulling in dozens of runtime +dependencies to bind all the system APIs that are found. + +Unfortunately, however, this library also can't be "just libc" in the sense of +"just libc.so on Linux," for example, as this would omit common APIs like +pthreads and would also mean that pthreads would be included on platforms like +MUSL (where it is literally inside libc.a). Additionally, the purpose of libc +isn't to provide a cross platform API, so there isn't necessarily one true +definition in terms of sets of symbols that `libc` will export. + +In order to have a well defined scope while satisfying these constraints, this +RFC proposes that this crate will have a scope that is defined separately for +each platform that it targets. The proposals are: + +* Linux (and other unix-like platforms) - the libc, libm, librt, libdl, + libutil, and libpthread libraries. Additional platforms can include libraries + whose symbols are found in these libraries on Linux as well. +* OSX - the common library to link to on this platform is libSystem, but this + transitively brings in quite a few dependencies, so this crate will refine + what it depends upon from libSystem a little further, specifically: + libsystem\_c, libsystem\_m, libsystem\_pthread, libsystem\_malloc and libdyld. +* Windows - the VS CRT libraries. 
This library is currently intended to be + distinct from the `winapi` crate as well as bindings to common system DLLs + found on Windows, so the current scope of `libc` will be pared back to just + what the CRT contains. This notably means that a large amount of the current + contents will be removed on Windows. + +New platforms added to `libc` can decide the set of libraries `libc` will link +to and bind at that time. + +### Internal structure + +The primary change being made is that the crate will no longer be one large file +sprinkled with `#[cfg]` annotations. Instead, the crate will be split into a +tree of modules, and all modules will reexport the entire contents of their +children. Unlike most libraries, however, most modules in `libc` will be +hidden via `#[cfg]` at compile time. Each platform supported by `libc` will +correspond to a path from a leaf module to the root, picking up more +definitions, types, and constants as the tree is traversed upwards. + +This organization provides a simple method of deduplication between platforms. +For example `libc::unix` contains functions found across all unix platforms +whereas `libc::unix::bsd` is a refinement saying that the APIs within are common +to only BSD-like platforms (these may or may not be present on non-BSD platforms +as well). The benefits of this structure are: + +* For any particular platform, it's easy in the source to look up what its value + is (simply trace the path from the leaf to the root, aka the filesystem + structure, and the value can be found). +* When adding an API it's easy to know **where** the API should be added because + each node in the module hierarchy corresponds clearly to some subset of + platforms. +* Adding new platforms should be a relatively simple and confined operation. New + leaves of the hierarchy would be created and some definitions upwards may be + pushed to lower levels if APIs need to be changed or aren't present on the new + platform. It should be easy to audit, however, that a new platform doesn't + tamper with older ones. + +### Testing + +The current set of bindings in the `libc` crate suffer a drawback in that they +are not verified. This is often a pain point for new platforms where when +copying from an existing platform it's easy to forget to update a constant here +or there. This lack of testing leads to problems like a [wrong definition of +`ioctl`][ioctl] which in turn lead to [backwards compatibility +problems][backcompat] when the API is fixed. + +[ioctl]: https://github.com/rust-lang/rust/pull/26809 +[backcompat]: https://github.com/rust-lang/rust/pull/27762 + +In order to solve this problem altogether, the libc crate will be enhanced with +the ability to automatically test the FFI bindings it contains. As this crate +will begin to live in `rust-lang` instead of the `rust` repo itself, this means +it can leverage external CI systems like Travis CI and AppVeyor to perform these +tasks. + +The [current implementation][ctest] of the binding testing verifies attributes +such as type size/alignment, struct field offset, struct field types, constant +values, function definitions, etc. Over time it can be enhanced with more +metrics and properties to test. + +[ctest]: https://github.com/alexcrichton/ctest + +In theory adding a new platform to `libc` will be blocked until automation can +be set up to ensure that the bindings are correct, but it is unfortunately not +easy to add this form of automation for all platforms, so this will not be a +requirement (beyond "tier 1 platforms"). 
There is currently automation for the
+following targets, however, through Travis and AppVeyor:
+
+* `{i686,x86_64}-pc-windows-{msvc,gnu}`
+* `{i686,x86_64,mips,aarch64}-unknown-linux-gnu`
+* `x86_64-unknown-linux-musl`
+* `arm-unknown-linux-gnueabihf`
+* `arm-linux-androideabi`
+* `{i686,x86_64}-apple-{darwin,ios}`
+
+# Drawbacks
+
+### Loss of module organization
+
+The loss of an internal organization structure can be seen as a drawback of this
+design. While perhaps not precisely true today, the principle of the structure
+was that it is easy to constrain yourself to a particular C standard or subset
+of C and thereby, in theory, write "more portable programs by default" by only
+using the contents of the respective module. Unfortunately in practice this does
+not seem to be used much, and it's also not clear whether this can be expressed
+simply through headers in `libc`. For example many platforms will have slight
+tweaks to common structures, definitions, or types in terms of signedness or
+value, so even if you were restricted to a particular subset it's not clear that
+a program would automatically be more portable.
+
+That being said, it would still be useful to have these abstractions to *some
+degree*, but the flip side is that it's easy to build this sort of layer on top
+of `libc` as designed here externally on crates.io. For example `extern crate
+posix` could just depend on `libc` and reexport all the contents for the
+POSIX standard, perhaps with tweaked signatures here and there to work better
+across platforms.
+
+### Loss of Windows bindings
+
+By only exposing the CRT functions on Windows, the contents of `libc` will be
+quite trimmed down, which means that to access similar functions like `send` or
+`connect`, crates will be required to link to at least two libraries.
+
+This is also a bit of a maintenance burden on the standard library itself as it
+means that all the bindings it uses must move to `src/libstd/sys/windows/c.rs`
+in the immediate future.
+
+# Alternatives
+
+* Instead of *only* exporting a flat namespace the `libc` crate could optionally
+  also do what it does today with respect to reexporting modules corresponding
+  to various C standards. The downside to this, unfortunately, is that it's
+  unclear how much portability using these standards actually buys you.
+
+* The crate could be split up into multiple crates which represent an exact
+  correspondence to system libraries, but this has the downside that using
+  common functions available on both OSX and Linux would require at least two
+  `extern crate` directives and dependencies.
+
+# Unresolved questions
+
+* The only platforms without automation currently are the BSD-like platforms
+  (e.g. FreeBSD, OpenBSD, Bitrig, DragonFly, etc), but if it were possible to
+  set up automation for these then it would be plausible to actually require
+  automation for any new platform. Is it possible to do this?
+
+* What is the relation between `std::os::*::raw` and `libc`? Given that the
+  standard library will probably always depend on an in-tree copy of the `libc`
+  crate, should `libc` define its own types in this case, have the standard
+  library reexport them, and then have the out-of-tree `libc` reexport them
+  from the standard library?
+
+* Should Windows be supported to a greater degree in `libc`? Should this crate
+  and `winapi` have a closer relationship?
diff --git a/text/1298-incremental-compilation.md b/text/1298-incremental-compilation.md new file mode 100644 index 00000000000..fb8a9e1a860 --- /dev/null +++ b/text/1298-incremental-compilation.md @@ -0,0 +1,648 @@ +- Feature Name: incremental-compilation +- Start Date: 2015-08-04 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +Enable the compiler to cache incremental workproducts. + +# Motivation + +The goal of incremental compilation is, naturally, to improve build +times when making small edits. Any reader who has never felt the need +for such a feature is strongly encouraged to attempt hacking on the +compiler or servo sometime (naturally, all readers are so encouraged, +regardless of their opinion on the need for incremental compilation). + +## Basic usage + +The basic usage will be that one enables incremental compilation using +a compiler flag like `-C incremental-compilation=TMPDIR`. The `TMPDIR` +directory is intended to be an empty directory that the compiler can +use to store intermediate by-products; the compiler will automatically +"GC" this directory, deleting older files that are no longer relevant +and creating new ones. + +## High-level design + +The high-level idea is that we will track the following intermediate +workproducts for every function (and, indeed, for other kinds of items +as well, but functions are easiest to describe): + +- External signature + - For a function, this would include the types of its arguments, + where-clauses declared on the function, and so forth. +- MIR + - The MIR represents the type-checked statements in the body, in + simplified forms. It is described by [RFC #1211][1211]. As the MIR + is not fully implemented, this is a non-trivial dependency. We + could instead use the existing annotated HIR, however that would + require a larger effort in terms of porting and adapting data + structures to an incremental setting. Using the MIR simplifies + things in this respect. +- Object files + - This represents the final result of running LLVM. It may be that + the best strategy is to "cache" compiled code in the form of an + rlib that is progressively patched, or it may be easier to store + individual `.o` files that must be relinked (anyone who has worked + in a substantial C++ project can attest, however, that linking can + take a non-trivial amount of time). + +Of course, the key to any incremental design is to determine what must +be changed. This can be encoded in a *dependency graph*. This graph +connects the various bits of the HIR to the external products +(signatures, MIR, and object files). It is of the utmost importance +that this dependency graph is complete: if edges are missing, the +result will be obscure errors where changes are not fully propagated, +yielding inexplicable behavior at runtime. This RFC proposes an +automatic scheme based on encapsulation. + +### Interaction with lints and compiler plugins + +Although rustc does not yet support compiler plugins through a stable +interface, we have long planned to allow for custom lints, syntax +extensions, and other sorts of plugins. It would be nice therefore to +be able to accommodate such plugins in the design, so that their +inputs can be tracked and accounted for as well. 
+ +## Interaction with optimization + +It is important to clarify, though, that this design does not attempt +to enable full optimizing for incremental compilation; indeed the two +are somewhat at odds with one another, as full optimization may +perform inlining and inter-function analysis, which can cause small +edits in one function to affect the generated code of another. This +situation is further exacerbated by the fact that LLVM does not +provide any way to track these sorts of dependencies (e.g., one cannot +even determine what inlining took place, though @dotdash suggested a +clever trick of using llvm lifetime hints). Strategies for handling +this are discussed in the [Optimization section](#optimization) below. + +# Detailed design + +We begin with a high-level execution plan, followed by sections that +explore aspects of the plan in more detail. The high-level summary +includes links to each of the other sections. + +## High-level execution plan + +Regardless of whether it is invoked in incremental compilation mode or +not, the compiler will always parse and macro expand the entire crate, +resulting in a HIR tree. Once we have a complete HIR tree, and if we +are invoked in incremental compilation mode, the compiler will then +try to determine which parts of the crate have changed since the last +execution. For each item, we compute a [(mostly) stable id](#defid) +based primarily on the item's name and containing module. We then +compute a hash of its contents and compare that hash against the hash +that the item had in the compilation (if any). + +Once we know which items have changed, we consult a +[dependency graph](#depgraph) to tell us which artifacts are still +usable. These artifacts can take the form of serializing MIR graphs, +LLVM IR, compiled object code, and so forth. The dependency graph +tells us which bits of AST contributed to each artifact. It is +constructed by dynamically monitoring what the compiler accesses +during execution. + +Finally, we can begin execution. The compiler is currently structured +in a series of passes, each of which walks the entire AST. We do not +need to change this structure to enable incremental +compilation. Instead, we continue to do every pass as normal, but when +we come to an item for which we have a pre-existing artifact (for +example, if we are type-checking a fn that has not changed since the +last execution), we can simply skip over that fn instead. Similar +strategies can be used to enable lazy or parallel compilation at later +times. (Eventually, though, it might be nice to restructure the +compiler so that it operates in more of a demand driven style, rather +than a series of sweeping passes.) + +When we come to the final LLVM stages, we must +[separate the functions into distinct "codegen units"](#optimization) +for the purpose of LLVM code generation. This will build on the +existing "codegen-units" used for parallel code generation. LLVM may +perform inlining or interprocedural analysis within a unit, but not +across units, which limits the amount of reoptimization needed when +one of those functions changes. + +Finally, the RFC closes with a discussion of +[testing strategies](#testing) we can use to help avoid bugs due to +incremental compilation. + +### Staging + +One important question is how to stage the incremental compilation +work. That is, it'd be nice to start seeing some benefit as soon as +possible. One possible plan is as follows: + +1. Implement stable def-ids (in progress, nearly complete). +2. 
Implement the dependency graph and tracking system (started).
+3. Experiment with distinct modularization schemes to find the one which
+   gives the best fragmentation with minimal performance impact.
+   Or, at least, implement something finer-grained than today's codegen-units.
+4. Persist compiled object code only.
+5. Persist intermediate MIR and generated LLVM as well.
+
+The most notable staging point here is that we can begin by just
+saving object code, and then gradually add more artifacts that get
+saved. The effect of saving fewer things (such as only saving object
+code) will simply be to make incremental compilation somewhat less
+effective, since we will be forced to re-type-check and re-trans
+functions where we might have gotten away with only generating new
+object code. However, this is expected to be a second order effect
+overall, particularly since LLVM optimization time can be a very large
+portion of compilation.
+
+
+## Handling DefIds
+
+In order to correlate artifacts between compilations, we need some
+stable way to name items across compilations (and across crates). The
+compiler currently uses something called a `DefId` to identify each
+item. However, these ids today are based on a node-id, which is just
+an index into the HIR and hence will change whenever *anything*
+preceding it in the HIR changes. We need to make the `DefId` for an
+item independent of changes to other items.
+
+Conceptually, the idea is to change `DefId` into the pair of a crate
+and a path:
+
+```
+DEF_ID = (CRATE, PATH)
+CRATE = <crate-identifier>
+PATH = PATH_ELEM | PATH :: PATH_ELEM
+PATH_ELEM = (PATH_ELEM_DATA, <disambiguating-integer>)
+PATH_ELEM_DATA = Crate(ID)
+               | Mod(ID)
+               | Item(ID)
+               | TypeParameter(ID)
+               | LifetimeParameter(ID)
+               | Member(ID)
+               | Impl
+               | ...
+```
+
+However, rather than actually store the path in the compiler, we will
+instead intern the paths in the `CStore`, and the `DefId` will simply
+store an integer. So effectively the `node` field of `DefId`, which
+currently indexes into the HIR of the appropriate crate, becomes an
+index into the crate's list of paths.
+
+For the most part, these paths match up with users' intuitions. So a
+struct `Foo` declared in a module `bar` would just have a path like
+`bar::Foo`. However, the paths are also able to express things for
+which there is no syntax, such as an item declared within a function
+body.
+
+### Disambiguation
+
+For the most part, paths should naturally be unique. However, there
+are some cases where a single parent may have multiple children with
+the same path. One case would be erroneous programs, where there are
+(e.g.) two structs declared with the same name in the same
+module. Another is that some items, such as impls, do not have a name,
+and hence we cannot easily distinguish them. Finally, it is possible
+to declare multiple functions with the same name within function bodies:
+
+```rust
+fn foo() {
+    {
+        fn bar() { }
+    }
+
+    {
+        fn bar() { }
+    }
+}
+```
+
+All of these cases are handled by a simple *disambiguation* mechanism.
+The idea is that we will assign a path to each item as we traverse the
+HIR. If we find that a single parent has two children with the same
+name, such as two impls, then we simply assign them unique integers in
+the order that they appear in the program text.
For example, the
+following program would use the paths shown (I've elided the
+disambiguating integer except where it is relevant):
+
+```rust
+mod foo {                 // Path: ::foo
+    pub struct Type { }   // Path: ::foo::Type
+    impl Type {           // Path: ::foo::(<impl>, 0)
+        fn bar() {..}     // Path: ::foo::(<impl>, 0)::bar
+    }
+    impl Type { }         // Path: ::foo::(<impl>, 1)
+}
+```
+
+Note that the impls were arbitrarily assigned indices based on the order
+in which they appear. This does mean that reordering impls may cause
+spurious recompilations. We can try to mitigate this somewhat by making the
+path entry for an impl include some sort of hash for its header or its contents,
+but that will be something we can add later.
+
+*Implementation note:* Refactoring DefIds in this way is a large
+task. I've made several attempts at doing it, but my latest branch
+appears to be working out (it is not yet complete). As a side benefit,
+I've uncovered a few fishy cases where we were using the node id from
+external crates to index into the local crate's HIR map, which is
+certainly incorrect. --nmatsakis
+
+
+## Identifying and tracking dependencies
+
+### Core idea: a fine-grained dependency graph
+
+Naturally any form of incremental compilation requires a detailed
+understanding of how each work item is dependent on other work items.
+This is most readily visualized as a dependency graph; the
+finer-grained the nodes and edges in this graph, the better. For example,
+consider a function `foo` that calls a function `bar`:
+
+```rust
+fn foo() {
+    ...
+    bar();
+    ...
+}
+```
+
+Now imagine that the body (but not the external signature) of `bar`
+changes. Do we need to type-check `foo` again? Of course not: `foo`
+only cares about the signature of `bar`, not its body. For the
+compiler to understand this, though, we'll need to create distinct
+graph nodes for the signature and body of each function.
+
+(Note that our policy of making "external signatures" fully explicit
+is helpful here. If we supported, e.g., return type inference, then it
+would be harder to know whether a change to `bar` means `foo` must be
+recompiled.)
+
+### Categories of nodes
+
+This section gives a kind of "first draft" of the set of graph
+nodes/edges that we will use. It is expected that the full set of
+nodes/edges will evolve in the course of implementation (and of course
+over time as well). In particular, some parts of the graph as
+presented here are intentionally quite coarse and we envision that the
+graph will become gradually more fine-grained.
+
+The nodes fall into the following categories:
+
+- **HIR nodes.** Represent some portion of the input HIR. For example,
+  the body of a fn would be a HIR node. These are the inputs to the entire
+  compilation process.
+  - Examples:
+    - `SIG(X)` would represent the signature of some fn item
+      `X` that the user wrote (i.e., the names of the types,
+      where-clauses, etc)
+    - `BODY(X)` would be the body of some fn item `X`
+    - and so forth
+- **Metadata nodes.** These represent portions of the metadata from
+  another crate. Each piece of metadata will include a hash of its
+  contents. When we need information about an external item, we load
+  that info out of the metadata and add it into the IR nodes below;
+  this can be represented in the graph using edges. This means that
+  incremental compilation can also work across crates.
+- **IR nodes.** Represent some portion of the computed IR. For
+  example, the MIR representation of a fn body, or the `ty`
+  representation of a fn signature.
These also frequently correspond + to a single entry in one of the various compiler hashmaps. These are + the outputs (and intermediate steps) of the compilation process + - Examples: + - `ITEM_TYPE(X)` -- entry in the obscurely named `tcache` table + for `X` (what is returned by the rather-more-clearly-named + `lookup_item_type`) + - `PREDICATES(X)` -- entry in the `predicates` table + - `ADT(X)` -- ADT node for a struct (this may want to be more + fine-grained, particularly to cover the ivars) + - `MIR(X)` -- the MIR for the item `X` + - `LLVM(X)` -- the LLVM IR for the item `X` + - `OBJECT(X)` -- the object code generated by compiling some item + `X`; the precise way that this is saved will depend on whether + we use `.o` files that are linked together, or if we attempt to + amend the shared library in place. +- **Procedure nodes.** These represent various passes performed by the + compiler. For example, the act of type checking a fn body, or the + act of constructing MIR for a fn body. These are the "glue" nodes + that wind up reading the inputs and creating the outputs, and hence + which ultimately tie the graph together. + - Examples: + - `COLLECT(X)` -- the collect code executing on item `X` + - `WFCHECK(X)` -- the wfcheck code executing on item `X` + - `BORROWCK(X)` -- the borrowck code executing on item `X` + +To see how this all fits together, let's consider the graph for a +simple example: + +```rust +fn foo() { + bar(); +} + +fn bar() { +} +``` + +This might generate a graph like the following (the following sections +will describe how this graph is constructed). Note that this is not a +complete graph, it only shows the data needed to produce `MIR(foo)`. + +``` +BODY(foo) ----------------------------> TYPECK(foo) --> MIR(foo) + ^ ^ ^ ^ | +SIG(foo) ----> COLLECT(foo) | | | | | + | | | | | v + +--> ITEM_TYPE(foo) -----+ | | | LLVM(foo) + +--> PREDICATES(foo) ------+ | | | + | | | +SIG(bar) ----> COLLECT(bar) | | v + | | | OBJECT(foo) + +--> ITEM_TYPE(bar) ---------+ | + +--> PREDICATES(bar) ----------+ +``` + +As you can see, this graph indicates that if the signature of either +function changes, we will need to rebuild the MIR for `foo`. But there +is no path from the body of `bar` to the MIR for foo, so changes there +need not trigger a rebuild (we are assuming here that `bar` is not +inlined into `foo`; see the [section on optimizations](#optimization) +for more details on how to handle those sorts of dependencies). + +### Building the graph + +It is very important the dependency graph contain *all* edges. If any +edges are missing, it will mean that we will get inconsistent builds, +where something should have been rebuilt what was not. Hand-coding a +graph like this, therefore, is probably not the best choice -- we +might get it right at first, but it's easy to for such a setup to fall +out of sync as the code is edited. (For example, if a new table is +added, or a function starts reading data that it didn't before.) + +Another consideration is compiler plugins. At present, of course, we +don't have a stable API for such plugins, but eventually we'd like to +support a rich family of them, and they may want to participate in the +incremental compilation system as well. So we need to have an idea of +what data a plugin accesses and modifies, and for what purpose. 
+ +The basic strategy then is to build the graph dynamically with an API +that looks something like this: + +- `push_procedure(procedure_node)` +- `pop_procedure(procedure_node)` +- `read_from(data_node)` +- `write_to(data_node)` + +Here, the `procedure_node` arguments are one of the procedure labels +above (like `COLLECT(X)`), and the `data_node` arguments are either +HIR or IR nodes (e.g., `SIG(X)`, `MIR(X)`). + +The idea is that we maintain for each thread a stack of active +procedures. When `push_procedure` is called, a new entry is pushed +onto that stack, and when `pop_procedure` is called, an entry is +popped. When `read_from(D)` is called, we add an edge from `D` to the +top of the stack (it is an error if the stack is empty). Similarly, +`write_to(D)` adds an edge from the top of the stack to `D`. + +Naturally it is easy to misuse the above methods: one might forget to +push/pop a procedure at the right time, or fail to invoke +read/write. There are a number of refactorings we can do on the +compiler to make this scheme more robust. + +#### Procedures + +Most of the compiler passes operate an item at a time. Nonetheless, +they are largely encoded using the standard visitor, which walks all +HIR nodes. We can refactor most of them to instead use an outer +visitor, which walks items, and an inner visitor, which walks a +particular item. (Many passes, such as borrowck, already work this +way.) This outer visitor will be parameterized with the label for the +pass, and will automatically push/pop procedure nodes as appropriate. +This means that as long as you base your pass on the generic +framework, you don't really have to worry. + +In general, while I described the general case of a stack of procedure +nodes, it may be desirable to try and maintain the invariant that +there is only ever one procedure node on the stack at a +time. Otherwise, failing to push/pop a procedure at the right time +could result in edges being added to the wrong procedure. It is likely +possible to refactor things to maintain this invariant, but that has +to be determined as we go. + +#### IR nodes + +Adding edges to the IR nodes that represent the compiler's +intermediate byproducts can be done by leveraging privacy. The idea is +to enforce the use of accessors to the maps and so forth, rather than +allowing direct access. These accessors will call the `read_from` and +`write_to` methods as appropriate to add edges to/from the current +active procedure. + +#### HIR nodes + +HIR nodes are a bit trickier to encapsulate. After all, the HIR map +itself gives access to the root of the tree, which in turn gives +access to everything else -- and encapsulation is harder to enforce +here. + +Some experimentation will be required here, but the rough plan is to: + +1. Leveraging the HIR, move away from storing the HIR as one large tree, + and instead have a tree of items, with each item containing only its own + content. + - This way, giving access to the HIR node for an item doesn't implicitly + give access to all of its subitems. + - Ideally this would match precisely the HIR nodes we setup, which + means that e.g. a function would have a subtree corresponding to + its signature, and a separating subtree corresponding to its + body. + - We can still register the lexical nesting of items by linking "indirectly" + via a `DefId`. +2. Annotate the HIR map accessor methods so that they add appropriate + read/write edges. + +This will integrate with the "default visitor" described under +procedure nodes. 
This visitor can hand off just an opaque id for each +item, requiring the pass itself to go through the map to fetch the +actual HIR, thus triggering a read edge (we might also bake this +behavior into the visitor for convenience). + +### Persisting the graph + +Once we've built the graph, we have to persist it, along with some +associated information. The idea is that the compiler, when invoked, +will be supplied with a directory. It will store temporary files in +there. We could also consider extending the design to support use by +multiple simultaneous compiler invocations, which could mean +incremental compilation results even across branches, much like ccache +(but this may require tweaks to the GC strategy). + +Once we get to the point of persisting the graph, we don't need the +full details of the graph. The process nodes, in particular, can be +removed. They exist only to create links between the other nodes. To +remove them, we first compute the transitive reachability relationship +and then drop the process nodes out of the graph, leaving only the HIR +nodes (inputs) and IR nodes (output). (In fact, we only care about +the IR nodes that we intend to persist, which may be only a subset of +the IR nodes, so we can drop those that we do not plan to persist.) + +For each HIR node, we will hash the HIR and store that alongside the +node. This indicates precisely the state of the node at the time. +Note that we only need to hash the HIR itself; contextual information +(like `use` statements) that are needed to interpret the text will be +part of a separate HIR node, and there should be edges from that node +to the relevant compiler data structures (such as the name resolution +tables). + +For each IR node, we will serialize the relevant information from the +table and store it. The following data will need to be serialized: + +- Types, regions, and predicates +- ADT definitions +- MIR definitions +- Identifiers +- Spans + +This list was gathered primarily by spelunking through the compiler. +It is probably somewhat incomplete. The appendix below lists an +exhaustive exploration. + +### Reusing and garbage collecting artifacts + +The general procedure when the compiler starts up in incremental mode +will be to parse and macro expand the input, create the corresponding +set of HIR nodes, and compute their hashes. We can then load the +previous dependency graph and reconcile it against the current state: + +- If the dep graph contains a HIR node that is no longer present in the + source, that node is queued for deletion. +- If the same HIR node exists in both the dep graph and the input, but + the hash has changed, that node is queued for deletion. +- If there is a HIR node that exists only in the input, it is added + to the dep graph with no dependencies. + +We then delete the transitive closure of nodes queued for deletion +(that is, all the HIR nodes that have changed or been removed, and all +nodes reachable from those HIR nodes). As part of the deletion +process, we remove whatever on disk artifact that may have existed. + + +### Handling spans + +There are times when the precise span of an item is a significant part +of its metadata. For example, debuginfo needs to identify line numbers +and so forth. However, editing one fn will affect the line numbers for +all subsequent fns in the same file, and it'd be best if we can avoid +recompiling all of them. Our plan is to phase span support in incrementally: + +1. 
Initially, the AST hash will include the filename/line/column,
+   which does mean that later fns in the same file will have to be
+   recompiled (somewhat unnecessarily).
+2. Eventually, it would be better to encode spans by identifying a
+   particular AST node (relative to the root of the item). Since we
+   are hashing the structure of the AST, we know the AST from the
+   previous and current compilation will match, and thus we can
+   compute the current span by finding the corresponding AST node and
+   loading its span. This will require some refactoring and work, however.
+
+
+## Optimization and codegen units
+
+There is an inherent tension between incremental compilation and full
+optimization. Full optimization may perform inlining and
+inter-function analysis, which can cause small edits in one function
+to affect the generated code of another. This situation is further
+exacerbated by the fact that LLVM does not provide any means to track
+when one function was inlined into another, or when some sort of
+interprocedural analysis took place (to the best of our knowledge, at
+least).
+
+This RFC proposes a simple mechanism for permitting aggressive
+optimization, such as inlining, while also supporting reasonable
+incremental compilation. The idea is to create *codegen units* that
+compartmentalize closely related functions (for example, on a module
+boundary). This means that those compartmentalized functions may
+analyze one another, while treating functions from other compartments
+as opaque entities. As a result, when a function in compartment X
+changes, we know that functions from other compartments are unaffected
+and their object code can be reused. Moreover, while the other
+functions in compartment X must be re-optimized, we can still reuse
+the existing LLVM IR. (These are the same codegen units as we use for
+parallel codegen, but set up differently.)
+
+In terms of the dependency graph, we would create one IR node
+representing the codegen unit. This would have the object code as an
+associated artifact. We would also have edges from each component of
+the codegen unit. As today, generic or inlined functions would not
+belong to any codegen unit, but rather would be instantiated anew into
+each codegen unit in which they are (transitively) referenced.
+
+There is an analogy here with C++, which naturally faces the same
+problems. In that setting, templates and inlineable functions are
+often placed into header files. Editing those header files naturally
+triggers more recompilation. The compiler could employ a similar
+strategy by replicating things that look like good candidates for
+inlining into each module; call graphs and profiling information may
+be a good input for such heuristics.
+
+
+## Testing strategy
+
+If we are not careful, incremental compilation has the potential to
+produce an infinite stream of irreproducible bug reports, so it's
+worth considering how we can best test this code.
+
+### Regression tests
+
+The first and most obvious piece of infrastructure is something for
+reliable regression testing. The plan is simply to have a series of
+sources and patches. The source will have each patch applied in
+sequence, rebuilding (incrementally) at each point. We can then check
+(a) that we only rebuilt what we expected to rebuild and (b) that the
+result matches the result of a fresh build from scratch. This allows
+us to build up tests for specific scenarios or bug reports, but
+doesn't help with *finding* bugs in the first place.
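+
+A minimal sketch of such a test driver is given below. The
+`-C incremental-compilation` flag is the one proposed above; the
+`patches/` layout, the use of `git apply`, and the naive byte-for-byte
+comparison of the two builds are simplifications made purely for
+illustration:
+
+```rust
+use std::fs;
+use std::process::Command;
+
+fn run(cmd: &mut Command) {
+    assert!(cmd.status().expect("failed to spawn").success());
+}
+
+fn main() {
+    let mut patches: Vec<_> = fs::read_dir("patches").unwrap()
+        .map(|entry| entry.unwrap().path())
+        .collect();
+    patches.sort();
+
+    for patch in patches {
+        // Apply the next edit in the sequence to the source tree.
+        run(Command::new("git").arg("apply").arg(&patch));
+
+        // Rebuild incrementally, reusing the cache directory. A real
+        // harness would also record *what* was rebuilt and compare it
+        // against the set of items the test expected to be rebuilt.
+        run(Command::new("rustc")
+            .args(&["-C", "incremental-compilation=incr-cache",
+                    "src/main.rs", "-o", "incr"]));
+
+        // Build from scratch as the oracle and compare the results.
+        run(Command::new("rustc").args(&["src/main.rs", "-o", "fresh"]));
+        assert_eq!(fs::read("incr").unwrap(), fs::read("fresh").unwrap());
+    }
+}
+```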
+ +### Replaying crates.io versions and git history + +The next step is to search across crates.io for consecutive +releases. For a given package, we can checkout version `X.Y` and then +version `X.(Y+1)` and check that incrementally building from one to +the other is successful and that all tests still yield the same +results as before (pass or fail). + +A similar search can be performed across git history, where we +identify pairs of consecutive commits. This has the advantage of being +more fine-grained, but the disadvantage of being a MUCH larger search +space. + +### Fuzzing + +The problem with replaying crates.io versions and even git commits is +that they are probably much larger changes than the typical +recompile. Another option is to use fuzzing, making "innocuous" +changes that should trigger a recompile. Fuzzing is made easier here +because we have an oracle -- that is, we can check that the results of +recompiling incrementally match the results of compiling from scratch. +It's also not necessary that the edits are valid Rust code, though we +should test that too -- in particular, we want to test that the proper +errors are reported when code is invalid, as well. @nrc also +suggested a clever hybrid, where we use git commits as a source for +the fuzzer's edits, gradually building up the commit. + +# Drawbacks + +The primary drawback is that incremental compilation may introduce a +new vector for bugs. The design mitigates this concern by attempting +to make the construction of the dependency graph as automated as +possible. We also describe automated testing strategies. + +# Alternatives + +This design is an evolution from [RFC 594][]. + +# Unresolved questions + +None. + +[1211]: https://github.com/rust-lang/rfcs/pull/1211 +[RFC 594]: https://github.com/rust-lang/rfcs/pull/594 diff --git a/text/1300-intrinsic-semantics.md b/text/1300-intrinsic-semantics.md new file mode 100644 index 00000000000..a4e3ffbe551 --- /dev/null +++ b/text/1300-intrinsic-semantics.md @@ -0,0 +1,49 @@ +- Feature Name: intrinsic-semantics +- Start Date: 2015-09-29 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1300 +- Rust Issue: N/A + +# Summary + +Define the general semantics of intrinsic functions. This does not define the semantics of the +individual intrinsics, instead defines the semantics around intrinsic functions in general. + +# Motivation + +Intrinsics are currently poorly-specified in terms of how they function. This means they are a +cause of ICEs and general confusion. The poor specification of them also means discussion affecting +intrinsics gets mired in opinions about what intrinsics should be like and how they should act or +be implemented. + +# Detailed design + +Intrinsics are currently implemented by generating the code for the intrinsic at the call +site. This allows for intrinsics to be implemented much more efficiently in many cases. For +example, `transmute` is able to evaluate the input expression directly into the storage for the +result, removing a potential copy. This is the main idea of intrinsics, a way to generate code that +is otherwise inexpressible in Rust. + +Keeping this in-place behaviour is desirable, so this RFC proposes that intrinsics should only be +usable as functions when called. This is not a change from the current behaviour, as you already +cannot use intrinsics as function pointers. Using an intrinsic in any way other than directly +calling should be considered an error. + +Intrinsics should continue to be defined and declared the same way. 
The `rust-intrinsic` and
+`platform-intrinsic` ABIs indicate that the function is an intrinsic function.
+
+# Drawbacks
+
+* Fewer bikesheds to paint.
+* Doesn't allow intrinsics to be used as regular functions. (Note that this is not something we
+  have evidence to suggest is a desired property, as it is currently the case anyway)
+
+# Alternatives
+
+* Allow coercion to regular functions and generate wrappers. This is similar to how we handle named
+  tuple constructors. Doing this undermines the idea of intrinsics as a way of getting the compiler
+  to generate specific code at the call-site however.
+* Do nothing.
+
+# Unresolved questions
+
+None.
diff --git a/text/1307-osstring-methods.md b/text/1307-osstring-methods.md
new file mode 100644
index 00000000000..51d4ca1991d
--- /dev/null
+++ b/text/1307-osstring-methods.md
@@ -0,0 +1,77 @@
+- Feature Name: `osstring_simple_functions`
+- Start Date: 2015-10-04
+- RFC PR: [rust-lang/rfcs#1307](https://github.com/rust-lang/rfcs/pull/1307)
+- Rust Issue: [rust-lang/rust#29453](https://github.com/rust-lang/rust/issues/29453)
+
+# Summary
+
+Add some additional utility methods to OsString and OsStr.
+
+# Motivation
+
+OsString and OsStr are extremely bare at the moment; some utilities would make them
+easier to work with. The given set of utilities is taken from String, and doesn't add
+any additional restrictions to the implementation.
+
+I don't think any of the proposed methods are controversial.
+
+# Detailed design
+
+Add the following methods to OsString:
+
+```rust
+/// Creates a new `OsString` with the given capacity. The string will be able
+/// to hold exactly `capacity` bytes without reallocating. If `capacity` is 0,
+/// the string will not allocate.
+///
+/// See the main `OsString` documentation for information about encoding.
+fn with_capacity(capacity: usize) -> OsString;
+
+/// Truncates `self` to zero length.
+fn clear(&mut self);
+
+/// Returns the number of bytes this `OsString` can hold without reallocating.
+///
+/// See the `OsString` introduction for information about encoding.
+fn capacity(&self) -> usize;
+
+/// Reserves capacity for at least `additional` more bytes to be inserted in the
+/// given `OsString`. The collection may reserve more space to avoid frequent
+/// reallocations.
+fn reserve(&mut self, additional: usize);
+
+/// Reserves the minimum capacity for exactly `additional` more bytes to be
+/// inserted in the given `OsString`. Does nothing if the capacity is already
+/// sufficient.
+///
+/// Note that the allocator may give the collection more space than it
+/// requests. Therefore capacity cannot be relied upon to be precisely
+/// minimal. Prefer `reserve` if future insertions are expected.
+fn reserve_exact(&mut self, additional: usize);
+```
+
+Add the following methods to OsStr:
+
+```rust
+/// Checks whether `self` is empty.
+fn is_empty(&self) -> bool;
+
+/// Returns the number of bytes in this string.
+///
+/// See the `OsStr` introduction for information about encoding.
+fn len(&self) -> usize;
+```
+
+(A brief usage sketch of these methods appears in the appendix at the end of
+this RFC.)
+
+# Drawbacks
+
+The meaning of `len()` might be a bit confusing because it's the size of
+the internal representation on Windows, which isn't otherwise visible to the
+user.
+
+# Alternatives
+
+None.
+
+# Unresolved questions
+
+None.
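+
+# Appendix: Usage sketch
+
+For illustration only, here is a small example exercising the proposed
+methods (assuming they are stabilized with the signatures given above):
+
+```rust
+use std::ffi::{OsStr, OsString};
+
+fn main() {
+    // Pre-size a buffer, fill it, and query it through the new methods.
+    let mut buf = OsString::with_capacity(16);
+    assert!(buf.capacity() >= 16);
+    buf.push("hello");
+    buf.reserve(32);           // may over-allocate to avoid reallocations
+    assert_eq!(buf.len(), 5);  // byte length, via Deref<Target = OsStr>
+    buf.clear();
+    assert!(buf.is_empty());
+
+    assert_eq!(OsStr::new("hello").len(), 5);
+}
+```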
diff --git a/text/1317-ide.md b/text/1317-ide.md new file mode 100644 index 00000000000..579c4c1d96a --- /dev/null +++ b/text/1317-ide.md @@ -0,0 +1,302 @@ +- Feature Name: n/a +- Start Date: 2015-10-13 +- RFC PR: [rust-lang/rfcs#1317](https://github.com/rust-lang/rfcs/pull/1317) +- Rust Issue: [rust-lang/rust#31548](https://github.com/rust-lang/rust/issues/31548) + +# Summary + +This RFC describes the Rust Language Server (RLS). This is a program designed to +service IDEs and other tools. It offers a new access point to compilation and +APIs for getting information about a program. The RLS can be thought of as an +alternate compiler, but internally will use the existing compiler. + +Using the RLS offers very low latency compilation. This allows for an IDE to +present information based on compilation to the user as quickly as possible. + + +## Requirements + +To be concrete about the requirements for the RLS, it should enable the +following actions: + +* show compilation errors and warnings, updated as the user types, +* code completion as the user types, +* highlight all references to an item, +* find all references to an item, +* jump to definition. + +These requirements will be covered in more detail in later sections. + + +## History note + +This RFC started as a more wide-ranging RFC. Some of the details have been +scaled back to allow for more focused and incremental development. + +Parts of the RFC dealing with robust compilation have been removed - work here +is ongoing and mostly doesn't require an RFC. + +The RLS was earlier referred to as the oracle. + + +# Motivation + +Modern IDEs are large and complex pieces of software; creating a new one from +scratch for Rust would be impractical. Therefore we need to work with existing +IDEs (such as Eclipse, IntelliJ, and Visual Studio) to provide functionality. +These IDEs provide excellent editor and project management support out of the +box, but know nothing about the Rust language. This information must come from +the compiler. + +An important aspect of IDE support is that response times must be extremely +quick. Users expect some feedback as they type. Running normal compilation of an +entire project is far too slow. Furthermore, as the user is typing, the program +will not be a valid, complete Rust program. + +We expect that an IDE may have its own lexer and parser. This is necessary for +the IDE to quickly give parse errors as the user types. Editors are free to rely +on the compiler's parsing if they prefer (the compiler will do its own parsing +in any case). Further information (name resolution, type information, etc.) will +be provided by the RLS. + +## Requirements + +We stated some requirements in the summary, here we'll cover more detail and the +workflow between IDE and RLS. + +The RLS should be safe to use in the face of concurrent actions. For example, +multiple requests for compilation could occur, with later requests occurring +before earlier requests have finished. There could be multiple clients making +requests to the RLS, some of which may mutate its data. The RLS should provide +reliable and consistent responses. However, it is not expected that clients are +totally isolated, e.g., if client 1 updates the program, then client 2 requests +information about the program, client 2's response will reflect the changes made +by client 1, even if these are not otherwise known to client 2. + + +### Show compilation errors and warnings, updated as the user types + +The IDE will request compilation of the in-memory program. 
The RLS will compile +the program and asynchronously supply the IDE with errors and warnings. + +### Code completion as the user types + +The IDE will request compilation of the in-memory program and request code- +completion options for the cursor position. The RLS will compile the program. As +soon as it has enough information for code-completion it will return options to +the IDE. + +* The RLS should return code-completion options asynchronously to the IDE. + Alternatively, the RLS could block the IDE's request for options. +* The RLS should not filter the code-completion options. For example, if the + user types `foo.ba` where `foo` has available fields `bar` and `qux`, it + should return both these fields, not just `bar`. The IDE can perform it's own + filtering since it might want to perform spell checking, etc. Put another way, + the RLS is not a code completion tool, but supplies the low-level data that a + code completion tool uses to provide suggestions. + +### Highlight all references to an item + +The IDE requests all references in the same file based on a position in the +file. The RLS returns a list of spans. + +### Find all references to an item + +The IDE requests all references based on a position in the file. The RLS returns +a list of spans. + +### Jump to definition + +The IDE requests the definition of an item based on a position in a file. The RLS +returns a list of spans (a list is necessary since, for example, a dynamically +dispatched trait method could be defined in multiple places). + + +# Detailed design + +## Architecture + +The basic requirements for the architecture of the RLS are that it should be: + +* reusable by different clients (IDEs, tools, ...), +* fast (we must provide semantic information about a program as the user types), +* handle multi-crate programs, +* consistent (it should handle multiple, potentially mutating, concurrent requests). + +The RLS will be a long running daemon process. Communication between the RLS and +an IDE will be via IPC calls (tools (for example, Racer) will also be able to +use the RLS as an in-process library.). The RLS will include the compiler as a +library. + +The RLS has three main components - the compiler, a database, and a work queue. + +The RLS accepts two kinds of requests - compilation requests and queries. It +will also push data to registered programs (generally triggered by compilation +completing). Essentially, all communication with the RLS is asynchronous (when +used as an in-process library, the client will be able to use synchronous +function calls too). + +The work queue is used to sequentialise requests and ensure consistency of +responses. Both compilation requests and queries are stored in the queue. Some +compilation requests can cause earlier compilation requests to be canceled. +Queries blocked on the earlier compilation then become blocked on the new +request. + +In the future, we should move queries ahead of compilation requests where +possible. + +When compilation completes, the database is updated (see below for more +details). All queries are answered from the database. The database has data for +the whole project, not just one crate. This also means we don't need to keep the +compiler's data in memory. + + +## Compilation + +The RLS is somewhat parametric in its compilation model. Theoretically, it could +run a full compile on the requested crate, however this would be too slow in +practice. + +The general procedure is that the IDE (or other client) requests that the RLS +compile a crate. 
It is up to the IDE to interact with Cargo (or some other
+build system) in order to produce the correct build command and to ensure that
+any dependencies are built.
+
+Initially, the RLS will do a standard incremental compile on the specified
+crate. See [RFC PR 1298](https://github.com/rust-lang/rfcs/pull/1298) for more
+details on incremental compilation.
+
+The crate being compiled should include any modifications made in the client and
+not yet committed to a file (e.g., changes the IDE has in memory). The client
+should pass such changes to the RLS along with the compilation request.
+
+I see two ways to improve compilation times: lazy compilation and keeping the
+compiler in memory. We might also experiment with having the IDE specify which
+parts of the program have changed, rather than having the compiler compute this.
+
+### Lazy compilation
+
+With lazy compilation the IDE requests that a specific item is compiled, rather
+than the whole program. The compiler compiles this item, compiling other
+items only as necessary to compile the requested item.
+
+Lazy compilation should also be incremental - an item is only compiled if
+required *and* if it has changed.
+
+Obviously, we could miss some errors with pure lazy compilation. To address this
+the RLS schedules both a lazy and a full (but still incremental) compilation.
+The advantage of this approach is that many queries scheduled after compilation
+can be performed after the lazy compilation, but before the full compilation.
+
+### Keeping the compiler in memory
+
+There are still overheads with the incremental compilation approach. We must
+start up the compiler and initialise its data structures, we must parse the
+whole crate, and we must read the incremental compilation data and metadata
+from disk.
+
+If we can keep the compiler in memory, we avoid these costs.
+
+However, this would require some significant refactoring of the compiler. There
+is currently no way to invalidate data the compiler has already computed. It
+also becomes difficult to cancel compilation: if we receive two compile requests
+in rapid succession, we may wish to cancel the first compilation before it
+finishes, since it will be wasted work. This is currently easy - the compilation
+process is killed and all data released. However, if we want to keep the
+compiler in memory we must invalidate some data and ensure the compiler is in a
+consistent state.
+
+
+### Compilation output
+
+Once compilation is finished, the RLS's database must be updated. Errors and
+warnings produced by the compiler are stored in the database. Information from
+name resolution and type checking is stored in the database (exactly which
+information is stored will grow with time). The analysis information will be
+provided by the save-analysis API.
+
+The compiler will also provide data on which (old) code has been invalidated.
+Any information (including errors) in the database concerning this code is
+removed before the new data is inserted.
+
+
+### Multiple crates
+
+The RLS does not track dependencies, nor much crate information. However, it
+will be asked to compile many crates and it will keep track of which crate each
+piece of data belongs to. It will also keep track of which crates belong to a
+single program and will not share data between programs, even if the same crate
+is shared. This helps avoid versioning issues.
+
+
+## Versioning
+
+The RLS will be released using the same train model as Rust. A version of the
+RLS is pinned to a specific version of Rust.
If users want to operate with
+multiple versions, they will need multiple versions of the RLS (I hope we can
+extend multirust/rustup.rs to handle the RLS as well as Rust).
+
+
+# Drawbacks
+
+It's a lot of work. But better we do it once than each IDE doing it themselves,
+or having sub-standard IDE support.
+
+
+# Alternatives
+
+The big design choice here is using a database rather than the compiler's data
+structures. The primary motivation for this is the 'find all references'
+requirement. References could be in multiple crates, so we would need to reload
+incremental compilation data (which must include the serialised MIR, or
+something equivalent) for all crates, then search this data for matching
+identifiers. Assuming the serialisation format is not too complex, this should
+be possible in a reasonable amount of time. Since identifiers might be in
+function bodies, we can't rely on metadata.
+
+This is a reasonable alternative, and may be simpler than the database approach.
+However, it is not planned to output this data in the near future (the initial
+plan for incremental compilation is to not store information required to re-
+check function bodies). This approach might be too slow for very large projects;
+we might wish to do searches in the future that cannot be answered without doing
+the equivalent of a database join; and the database simplifies questions about
+concurrent accesses.
+
+We could only provide the RLS as a library, rather than providing an API via
+IPC. An IPC interface allows a single instance of the RLS to service multiple
+programs, is language-agnostic, and allows for easy asynchrony between
+the RLS and its clients. It also provides isolation - a panic in the RLS will
+not cause the IDE to crash, nor can a long-running operation delay the IDE. Most
+of these advantages could be captured using threads. However, the cost of
+implementing an IPC interface is fairly low and means less effort for clients,
+so it seems worthwhile to provide.
+
+Extending this idea, we could do less than the RLS - provide a high-level
+library API for the Rust compiler and let other projects do the rest. In
+particular, Racer does an excellent job at providing the information the RLS
+would provide without much information from the compiler. This is certainly less
+work for the compiler team and more flexible for clients. On the other hand, it
+means more work for clients and possible fragmentation. Duplicated effort means
+that different clients will not benefit from each other's innovations.
+
+The RLS could do more - actually perform some of the processing tasks usually
+done by IDEs (such as editing source code) or other tools (refactoring,
+reformatting, etc.).
+
+
+# Unresolved questions
+
+A problem is that Visual Studio uses UTF16 while Rust uses UTF8, and there is (I
+understand) no efficient way to convert between byte counts in these systems.
+I'm not sure how to address this. It might require the RLS to be able to operate
+in UTF16 mode. This is only a problem with byte offsets in spans, not with
+row/column data (the RLS will supply both). It may be possible for Visual Studio
+to just use the row/column data, or convert inefficiently to UTF16. I guess the
+question comes down to whether this conversion should be done in the RLS or the
+client. I think we should start assuming the client, and perhaps adjust course
+later.
+
+What kind of IPC protocol to use? HTTP is popular and simple to deal with. It's
+platform-independent and used in many similar pieces of software.
On the other
+hand it is heavyweight, requires pulling in large libraries, and requires
+some attention to security issues. Alternatives are some kind of custom
+protocol, or using a solution like Thrift. My preference is for HTTP, since it
+has been proven in similar situations.
diff --git a/text/1327-dropck-param-eyepatch.md b/text/1327-dropck-param-eyepatch.md
new file mode 100644
index 00000000000..15de93c8420
--- /dev/null
+++ b/text/1327-dropck-param-eyepatch.md
@@ -0,0 +1,600 @@
+- Feature Name: dropck_eyepatch, generic_param_attrs
+- Start Date: 2015-10-19
+- RFC PR: [rust-lang/rfcs#1327](https://github.com/rust-lang/rfcs/pull/1327)
+- Rust Issue: [rust-lang/rust#34761](https://github.com/rust-lang/rust/issues/34761)
+
+# Summary
+[summary]: #summary
+
+Refine the unguarded-escape-hatch from [RFC 1238][] (nonparametric
+dropck) so that instead of a single attribute side-stepping *all*
+dropck constraints for a type's destructor, we instead have a more
+focused system that specifies exactly which type and/or lifetime
+parameters the destructor is guaranteed not to access.
+
+Specifically, this RFC proposes adding the capability to attach
+attributes to the binding sites for generic parameters (i.e. lifetime
+and type parameters). Atop that capability, this RFC proposes adding a
+`#[may_dangle]` attribute that indicates that a given lifetime or type
+holds data that must not be accessed during the dynamic extent of that
+`drop` invocation.
+
+As a side-effect, enable adding attributes to the formal declarations
+of generic type and lifetime parameters.
+
+The proposal in this RFC is intended as a *temporary* solution (along
+the lines of `#[fundamental]`) and *will not* be stabilized
+as-is. Instead, we anticipate a more comprehensive approach to be
+proposed in a follow-up RFC.
+
+[RFC 1238]: https://github.com/rust-lang/rfcs/blob/master/text/1238-nonparametric-dropck.md
+[RFC 769]: https://github.com/rust-lang/rfcs/blob/master/text/0769-sound-generic-drop.md
+
+# Motivation
+[motivation]: #motivation
+
+The unguarded escape hatch (UGEH) from [RFC 1238] is a blunt
+instrument: when you use `unsafe_destructor_blind_to_params`, it is
+asserting that your destructor does not access borrowed data whose
+type includes *any* lifetime or type parameter of the type.
+
+For example, the current destructor for `RawVec` (in `liballoc/`)
+looks like this:
+
+```rust
+impl<T> Drop for RawVec<T> {
+    #[unsafe_destructor_blind_to_params]
+    /// Frees the memory owned by the RawVec *without* trying to Drop its contents.
+    fn drop(&mut self) {
+        [... free memory using global system allocator ...]
+    }
+}
+```
+
+The above is sound today, because the above destructor does not call
+any methods that can access borrowed data in the values of type `T`,
+and so we do not need to enforce the drop-ordering constraints imposed
+when you leave out the `unsafe_destructor_blind_to_params` attribute.
+
+While the above attribute suffices for many use cases today, it is not
+fine-grained enough for other cases of interest. In particular, it
+cannot express that the destructor will not access borrowed data
+behind a *subset* of the type parameters.
+
+Here are two concrete examples of where the need for this arises:
+
+## Example: `CheckedHashMap`
+
+The original Sound Generic Drop proposal ([RFC 769][])
+had an [appendix][RFC 769 CheckedHashMap] with an example of a
+`CheckedHashMap` type that called the hashcode method
+for all of the keys in the map in its destructor.
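+
+In compressed form, such a destructor looks roughly like the following
+sketch (the field layout and hashing helper here are simplified for
+illustration; the appendix of RFC 769 has the full version):
+
+```rust
+use std::collections::HashMap;
+use std::collections::hash_map::DefaultHasher;
+use std::hash::{Hash, Hasher};
+
+fn hash_of<K: Hash>(key: &K) -> u64 {
+    let mut h = DefaultHasher::new();
+    key.hash(&mut h);
+    h.finish()
+}
+
+/// A map that remembers the hash each key had when it was inserted.
+struct CheckedHashMap<K: Hash + Eq, V> {
+    data: HashMap<K, (u64, V)>,
+}
+
+impl<K: Hash + Eq, V> Drop for CheckedHashMap<K, V> {
+    fn drop(&mut self) {
+        // Re-hash every key and compare with the recorded value. This
+        // reads borrowed data reachable through `K`, but never touches
+        // the values of type `V`.
+        for (key, &(recorded, _)) in &self.data {
+            assert_eq!(recorded, hash_of(key));
+        }
+    }
+}
+```
+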
+This is clearly a type where we *cannot* claim that we do not access
+borrowed data potentially hidden behind `K`, so it would be unsound
+to use the blunt `unsafe_destructor_blind_to_params` attribute on this
+type.
+
+However, the values of the `V` parameter to `CheckedHashMap` are, in
+all likelihood, *not* accessed by the `CheckedHashMap` destructor. If
+that is the case, then it should be sound to instantiate `V` with a
+type that contains references to other parts of the map (e.g.,
+references to the keys or to other values in the map). However, we
+cannot express this today: There is no way to say that the
+`CheckedHashMap` will not access borrowed data that is behind *just*
+`V`.
+
+[RFC 769 CheckedHashMap]: https://github.com/rust-lang/rfcs/blob/master/text/0769-sound-generic-drop.md#appendix-a-why-and-when-would-drop-read-from-borrowed-data
+
+## Example: `Vec`
+
+The Rust developers have been talking for [a long time][RFC Issue 538]
+about adding an `Allocator` trait that would allow users to override
+the allocator used for the backing storage of collection types like
+`Vec` and `HashMap`.
+
+For example, we would like to generalize the `RawVec` given above as
+follows:
+
+```rust
+#[unsafe_no_drop_flag]
+pub struct RawVec<T, A: Allocator> {
+    ptr: Unique<T>,
+    cap: usize,
+    alloc: A,
+}
+
+impl<T, A: Allocator> Drop for RawVec<T, A> {
+    #[should_we_put_ugeh_attribute_here_or_not(???)]
+    /// Frees the memory owned by the RawVec *without* trying to Drop its contents.
+    fn drop(&mut self) {
+        [... free memory using self.alloc ...]
+    }
+}
+```
+
+However, we *cannot* soundly add an allocator parameter to a
+collection that today uses the `unsafe_destructor_blind_to_params`
+UGEH attribute in the destructor that deallocates, because that blunt
+instrument would allow someone to write this:
+
+```rust
+// (`ArenaAllocator`, when dropped, automatically frees its allocated blocks)
+
+// (Usual pattern for assigning same extent to `v` and `a`.)
+let (v, a): (Vec<Stuff, &ArenaAllocator>, ArenaAllocator);
+
+a = ArenaAllocator::new();
+v = Vec::with_allocator(&a);
+
+... v.push(stuff) ...
+
+// at end of scope, `a` may be dropped before `v`, invalidating
+// soundness of subsequent invocation of destructor for `v` (because
+// that would try to free buffer of `v` via `v.buf.alloc` (== `&a`)).
+```
+
+The only way today to disallow the above unsound code would be to
+remove `unsafe_destructor_blind_to_params` from `RawVec`/ `Vec`, which
+would break other code (for example, code using `Vec` as the backing
+storage for [cyclic graph structures][dropck_legal_cycles.rs]).
+
+[RFC Issue 538]: https://github.com/rust-lang/rfcs/issues/538
+
+[dropck_legal_cycles.rs]: https://github.com/rust-lang/rust/blob/098a7a07ee6d11cf6d2b9d18918f26be95ee2f66/src/test/run-pass/dropck_legal_cycles.rs
+
+# Detailed design
+[detailed design]: #detailed-design
+
+First off: The proposal in this RFC is intended as a *temporary*
+solution (along the lines of `#[fundamental]`) and *will not* be
+stabilized as-is. Instead, we anticipate a more comprehensive approach
+to be proposed in a follow-up RFC.
+
+Having said that, here is the proposed short-term solution:
+
+ 1. Add the ability to attach attributes to syntax that binds formal
+    lifetime or type parameters. For the purposes of this RFC, the only
+    place in the syntax that requires such attributes is `impl`
+    blocks, as in `impl Drop for Type { ... }`
+
+ 2. Add a new fine-grained attribute, `may_dangle`, which is attached
+    to the binding sites for lifetime or type parameters on a `Drop`
+    implementation.
+ This RFC will sometimes call this attribute the "eyepatch", + since it does + not make dropck totally blind; just blind on one "side". + + 3. Add a new requirement that any `Drop` implementation that uses the + `#[may_dangle]` attribute must be declared as an `unsafe impl`. + This reflects the fact that such `Drop` implementations have + an additional constraint on their behavior (namely that they cannot + access certain kinds of data) that will not be verified by the + compiler and thus must be verified by the programmer. + + 4. Remove `unsafe_destructor_blind_to_params`, since all uses of it + should be expressible via `#[may_dangle]`. + +## Attributes on lifetime or type parameters + +This is a simple extension to the syntax. + +It is guarded by the feature gate `generic_param_attrs`. + +Constructions like the following will now become legal. + +Example of eyepatch attribute on a single type parameter: +```rust +unsafe impl<'a, #[may_dangle] X, Y> Drop for Foo<'a, X, Y> { + ... +} +``` + +Example of eyepatch attribute on a lifetime parameter: +```rust +unsafe impl<#[may_dangle] 'a, X, Y> Drop for Bar<'a, X, Y> { + ... +} +``` + +Example of eyepatch attribute on multiple parameters: +```rust +unsafe impl<#[may_dangle] 'a, X, #[may_dangle] Y> Drop for Baz<'a, X, Y> { + ... +} +``` + +These attributes are only written next to the formal binding +sites for the generic parameters. The *usage* sites, points +which refer back to the parameters, continue to disallow the use +of attributes. + +So while this is legal syntax: + +```rust +unsafe impl<'a, #[may_dangle] X, Y> Drop for Foo<'a, X, Y> { + ... +} +``` + +the follow would be illegal syntax (at least for now): + +```rust +unsafe impl<'a, X, Y> Drop for Foo<'a, #[may_dangle] X, Y> { + ... +} +``` + + +## The "eyepatch" attribute + +Add a new attribute, `#[may_dangle]` (the "eyepatch"). + +It is guarded by the feature gate `dropck_eyepatch`. + +The eyepatch is similar to `unsafe_destructor_blind_to_params`: it is +part of the `Drop` implementation, and it is meant +to assert that a destructor is guaranteed not to access certain kinds +of data accessible via `self`. + +The main difference is that the eyepatch is applied to a single +generic parameter: `#[may_dangle] ARG`. +This specifies exactly *what* +the destructor is blind to (i.e., what will dropck treat as +inaccessible from the destructor for this type). + +There are two things one can supply as the `ARG` for a given eyepatch: +one of the type parameters for the type, +or one of the lifetime parameters +for the type. + +When used on a type, e.g. `#[may_dangle] T`, the programmer is +asserting the only uses of values of that type will be to move or drop +them. Thus, no fields will be accessed nor methods called on values of +such a type (apart from any access performed by the destructor for the +type when the values are dropped). This ensures that no dangling +references (such as when `T` is instantiated with `&'a u32`) are ever +accessed in the scenario where `'a` has the same lifetime as the value +being currently destroyed (and thus the precise order of destruction +between the two is unknown to the compiler). + +When used on a lifetime, e.g. `#[may_dangle] 'a`, the programmer is +asserting that no data behind a reference of lifetime `'a` will be +accessed by the destructor. Thus, no fields will be accessed nor +methods called on values of type `&'a Struct`, ensuring that again no +dangling references are ever accessed by the destructor. 
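To make the programmer's obligation concrete, the following sketch (made-up types, using the two feature gates named in this RFC) contrasts a destructor that may soundly carry the eyepatch with one that must not:

```rust
#![feature(dropck_eyepatch, generic_param_attrs)]

use std::fmt::Debug;

// Sound use of the eyepatch: the destructor never reads its `T`;
// the value is only dropped.
struct Silent<T>(T);

unsafe impl<#[may_dangle] T> Drop for Silent<T> {
    fn drop(&mut self) {
        // nothing here but the implicit drop of `self.0`
    }
}

// NOT eligible for the eyepatch: this destructor *reads* the `T` value
// (via `Debug`), so asserting `#[may_dangle] T` here would be unsound.
struct Chatty<T: Debug>(T);

impl<T: Debug> Drop for Chatty<T> {
    fn drop(&mut self) {
        println!("dropping {:?}", self.0);
    }
}

fn main() {
    let _ = Silent(1u32);
    let _ = Chatty("hello");
}
```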
+ +## Require `unsafe` on Drop implementations using the eyepatch + +The final detail is to add an additional check to the compiler +to ensure that any use of `#[may_dangle]` on a `Drop` implementation +imposes a requirement that that implementation block use +`unsafe impl`.[2](#footnote1) + +This reflects the fact that use of `#[may_dangle]` is a +programmer-provided assertion about the behavior of the `Drop` +implementation that must be valided manually by the programmer. +It is analogous to other uses of `unsafe impl` (apart from the +fact that the `Drop` trait itself is not an `unsafe trait`). + +### Examples adapted from the Rustonomicon + +[nomicon dropck]: https://doc.rust-lang.org/nightly/nomicon/dropck.html + +So, adapting some examples from the Rustonomicon +[Drop Check][nomicon dropck] chapter, we would be able to write +the following. + +Example of eyepatch on a lifetime parameter:: + +```rust +struct InspectorA<'a>(&'a u8, &'static str); + +unsafe impl<#[may_dangle] 'a> Drop for InspectorA<'a> { + fn drop(&mut self) { + println!("InspectorA(_, {}) knows when *not* to inspect.", self.1); + } +} +``` + +Example of eyepatch on a type parameter: + +```rust +use std::fmt; + +struct InspectorB(T, &'static str); + +unsafe impl<#[may_dangle] T: fmt::Display> Drop for InspectorB { + fn drop(&mut self) { + println!("InspectorB(_, {}) knows when *not* to inspect.", self.1); + } +} +``` + +Both of the above two examples are much the same as if we had used the +old `unsafe_destructor_blind_to_params` UGEH attribute. + +### Example: RawVec + +To generalize `RawVec` from the [motivation](#motivation) with an +`Allocator` correctly (that is, soundly and without breaking existing +code), we would now write: + +```rust +unsafe impl<#[may_dangle]T, A:Allocator> Drop for RawVec { + /// Frees the memory owned by the RawVec *without* trying to Drop its contents. + fn drop(&mut self) { + [... free memory using self.alloc ...] + } +} +``` + +The use of `#[may_dangle] T` here asserts that even +though the destructor may access borrowed data through `A` (and thus +dropck must impose drop-ordering constraints for lifetimes occurring +in the type of `A`), the developer is guaranteeing that no access to +borrowed data will occur via the type `T`. + +The latter is not expressible today even with +`unsafe_destructor_blind_to_params`; there is no way to say that a +type will not access `T` in its destructor while also ensuring the +proper drop-ordering relationship between `RawVec` and `A`. + +### Example; Multiple Lifetimes + +Example: The above `InspectorA` carried a `&'static str` that was +always safe to access from the destructor. + +If we wanted to generalize this type a bit, we might write: + +```rust +struct InspectorC<'a,'b,'c>(&'a str, &'b str, &'c str); + +unsafe impl<#[may_dangle] 'a, 'b, #[may_dangle] 'c> Drop for InspectorC<'a,'b,'c> { + fn drop(&mut self) { + println!("InspectorA(_, {}, _) knows when *not* to inspect.", self.1); + } +} +``` + +This type, like `InspectorA`, is careful to only access the `&str` +that it holds in its destructor; but now the borrowed string slice +does not have `'static` lifetime, so we must make sure that we do not +claim that we are blind to its lifetime (`'b`). + +(This example also illustrates that one can attach multiple instances +of the eyepatch attribute to a destructor, each with a distinct input +for its `ARG`.) 
+ +Given the definition above, this code will compile and run properly: + +```rust +fn this_will_work() { + let b; // ensure that `b` strictly outlives `i`. + let (i,a,c); + a = format!("a"); + b = format!("b"); + c = format!("c"); + i = InspectorC(a, b, c); +} +``` + +while this code will be rejected by the compiler: + +```rust +fn this_will_not_work() { + let (a,c); + let (i,b); // OOPS: `b` not guaranteed to survive for `i`'s destructor. + a = format!("a"); + b = format!("b"); + c = format!("c"); + i = InspectorC(a, b, c); +} +``` + +## Semantics + +How does this work, you might ask? + +The idea is actually simple: the dropck rule stays mostly the same, +except for a small twist. + +The Drop-Check rule at this point essentially says: + +> if the type of `v` owns data of type `D`, where +> +> (1.) the `impl Drop for D` is either type-parametric, or lifetime-parametric over `'a`, and +> (2.) the structure of `D` can reach a reference of type `&'a _`, +> +> then `'a` must strictly outlive the scope of `v` + +The main change we want to make is to the second condition. +Instead of just saying "the structure of `D` can reach a reference of type `&'a _`", +we want first to replace eyepatched lifetimes and types within `D` with `'static` and `()`, +respectively. Call this revised type `patched(D)`. + +Then the new condition is: + +> (2.) the structure of patched(D) can reach a reference of type `&'a _`, + +*Everything* else is the same. + +In particular, the patching substitution is *only* applied with +respect to a particular destructor. Just because `Vec` is blind to `T` +does not mean that we will ignore the actual type instantiated at `T` +in terms of drop-ordering constraints. + +For example, in `Vec>`, even though `Vec` +itself is blind to the whole type `InspectorC<'a, 'name, 'c>` when we +are considering the `impl Drop for Vec`, we *still* honor the +constraint that `'name` must strictly outlive the `Vec` (because we +continue to consider all `D` that is data owned by a value `v`, +including when `D` == `InspectorC<'a,'name,'c>`). + +## Prototype +[prototype]: #prototype + +pnkfelix has implemented a proof-of-concept +[implementation][pnkfelix prototype] of the `#[may_dangle]` attribute. +It uses the substitution machinery we already have in the compiler +to express the semantics above. + +## Limitations of prototype (not part of design) + +Here we note a few limitations of the current prototype. These +limitations are *not* being proposed as part of the specification of +the feature. + +2. The compiler does not yet enforce (or even +allow) the use of `unsafe impl` for `Drop` implementations that use +the `#[may_dangle]` attribute. + +Fixing the above limitations should just be a matter of engineering, +not a fundamental hurdle to overcome in the feature's design in the +context of the language. + +[pnkfelix prototype]: https://github.com/pnkfelix/rust/commits/dropck-eyepatch + +# Drawbacks +[drawbacks]: #drawbacks + +## Ugliness + +This attribute, like the original `unsafe_destructor_blind_to_params` +UGEH attribute, is ugly. + +## Unchecked assertions boo + +It would be nicer if to actually change the language in a way where we +could check the assertions being made by the programmer, rather than +trusting them. (pnkfelix has some thoughts on this, which are mostly +reflected in what he wrote in the [RFC 1238 alternatives][].) 
+ +[RFC 1238 alternatives]: https://github.com/rust-lang/rfcs/blob/master/text/1238-nonparametric-dropck.md#continue-supporting-parametricity + +# Alternatives +[alternatives]: #alternatives + +Note: The alternatives section for this RFC is particularly +note-worthy because the ideas here may serve as the basis for a more +comprehensive long-term approach. + +## Make dropck "see again" via (focused) where-clauses + +The idea is that we keep the UGEH attribute, blunt hammer that it is. +You first opt out of the dropck ordering constraints via that, and +then you add back in ordering constraints via `where` clauses. + +(The ordering constraints in question would normally be *implied* by +the dropck analysis; the point is that UGEH is opting out of that +analysis, and so we are now adding them back in.) + +Here is the allocator example expressed in this fashion: + +```rust +impl Drop for RawVec { + #[unsafe_destructor_blind_to_params] + /// Frees the memory owned by the RawVec *without* trying to Drop its contents. + fn drop<'s>(&'s mut self) where A: 's { + // ~~~~~~~~~~~ + // | + // | + // This constraint (that `A` outlives `'s`), and other conditions + // relating `'s` and `Self` are normally implied by Rust's type + // system, but `unsafe_destructor_blind_to_params` opts out of + // enforcing them. This `where`-clause is opting back into *just* + // the `A:'s` again. + // + // Note we are *still* opting out of `T: 's` via + // `unsafe_destructor_blind_to_params`, and thus our overall + // goal (of not breaking code that relies on `T` not having to + // survive the destructor call) is accomplished. + + [... free memory using self.alloc ...] + } +} +``` + +This approach, if we can make it work, seems fine to me. It certainly +avoids a number of problems that the eyepatch attribute has. + +Advantages of fn-drop-with-where-clauses: + + * Since the eyepatch attribute is to be limited to type and lifetime + parameters, this approach is more expressive, + since it would allow one to put type-projections into the + constraints. + +Drawbacks of fn-drop-with-where-clauses: + + * Its not 100% clear what our implementation strategy will be for it, + while the eyepatch attribute does have a [prototype]. + + I actually do not give this drawback much weight; resolving this + may be merely a matter of just trying to do it: e.g., build up the + set of where-clauses when we make the ADT's representatin, and + then have `dropck` insert instantiate and insert them as needed. + + * It might have the wrong ergonomics for developers: It seems bad to + have the blunt hammer introduce all sorts of potential + unsoundness, and rely on the developer to keep the set of + `where`-clauses on the `fn drop` up to date. + + This would be a pretty bad drawback, *if* the language and + compiler were to stagnate. But my intention/goal is to eventually + put in a [sound compiler analysis][wait-for-proper-parametricity]. + In other words, in the future, I will be more concerned about the + ergonomics of the code that uses the sound analysis. I will not be + concerned about "gotcha's" associated with the UGEH escape hatch. + +(The most important thing I want to convey is that I believe that both +the eyepatch attribute and fn-drop-with-where-clauses are capable of +resolving the real issues that I face today, and I would be happy for +either proposal to be accepted.) 
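For comparison, applying this alternative to the earlier `InspectorC` example might look roughly like the following (hypothetical syntax: `where` clauses on `fn drop` are not accepted by any compiler today):

```rust
struct InspectorC<'a, 'b, 'c>(&'a str, &'b str, &'c str);

impl<'a, 'b, 'c> Drop for InspectorC<'a, 'b, 'c> {
    #[unsafe_destructor_blind_to_params]
    fn drop<'s>(&'s mut self) where 'b: 's {
        // Opt back into requiring that `'b` outlive the destructor call,
        // while remaining blind to `'a` and `'c`.
        println!("InspectorC(_, {}, _) knows when *not* to inspect.", self.1);
    }
}
```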
+ +## Wait for proper parametricity +[wait-for-proper-parametricity]: #wait-for-proper-parametricity + +As alluded to in the [drawbacks][], in principle we could provide +similar expressiveness to that offered by the eyepatch (which is +acting as a fine-grained escape hatch from dropck) by instead offering +some language extension where the compiler would actually analyze the +code based on programmer annotations indicating which types and +lifetimes are not used by a function. + +In my opinion I am of two minds on this (but they are both in favor +this RFC rather than waiting for a sound compiler analysis): + + 1. We will always need an escape hatch. The programmer will always need + a way to assert something that she knows to be true, even if the compiler + cannot prove it. (A simple example: Calling a third-party API that has not + yet added the necessary annotations.) + + This RFC is proposing that we keep an escape hatch, but we make it more + expressive. + + 2. If we eventually *do* have a sound compiler analysis, I see the + compiler changes and library annotations suggested by this RFC as + being in line with what that compiler analysis would end up using + anyway. In other words: Assume we *did* add some way for the programmer + to write that `T` is parametric (e.g. `T: ?Special` in the [RFC 1238 alternatives]). + Even then, we would still need the compiler changes suggested by this RFC, + and at that point hopefully the task would be for the programmer to mechanically + replace occurrences of `#[may_dangle] T` with `T: ?Special` + (and then see if the library builds). + + In other words, I see the form suggested by this RFC as being a step *towards* + a proper analysis, in the sense that it is getting programmers used to thinking + about the individual parameters and their relationship with the container, rather + than just reasoning about the container on its own without any consideration + of each type/lifetime parameter. + +## Do nothing + +If we do nothing, then we cannot add `Vec` soundly. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Is the definition of the drop-check rule sound with this `patched(D)` +variant? (We have not proven any previous variation of the rule +sound; I think it would be an interesting student project though.) diff --git a/text/1328-global-panic-handler.md b/text/1328-global-panic-handler.md new file mode 100644 index 00000000000..299a4254a6a --- /dev/null +++ b/text/1328-global-panic-handler.md @@ -0,0 +1,183 @@ +- Feature Name: `panic_handler` +- Start Date: 2015-10-08 +- RFC PR: [rust-lang/rfcs#1328](https://github.com/rust-lang/rfcs/pull/1328) +- Rust Issue: [rust-lang/rust#30449](https://github.com/rust-lang/rust/issues/30449) + +# Summary + +When a thread panics in Rust, the unwinding runtime currently prints a message +to standard error containing the panic argument as well as the filename and +line number corresponding to the location from which the panic originated. +This RFC proposes a mechanism to allow user code to replace this logic with +custom handlers that will run before unwinding begins. + +# Motivation + +The default behavior is not always ideal for all programs: + +* Programs with command line interfaces do not want their output polluted by + random panic messages. +* Programs using a logging framework may want panic messages to be routed into + that system so that they can be processed like other events. 
+* Programs with graphical user interfaces may not have standard error attached + at all and want to be notified of thread panics to potentially display an + internal error dialog to the user. + +The standard library [previously +supported](https://doc.rust-lang.org/1.3.0/std/rt/unwind/fn.register.html) (in +unstable code) the registration of a set of panic handlers. This API had +several issues: + +* The system supported a fixed but unspecified number of handlers, and a + handler could never be unregistered once added. +* The callbacks were raw function pointers rather than closures. +* Handlers would be invoked on nested panics, which would result in a stack + overflow if a handler itself panicked. +* The callbacks were specified to take the panic message, file name and line + number directly. This would prevent us from adding more functionality in + the future, such as access to backtrace information. In addition, the + presence of file names and line numbers for all panics causes some amount of + binary bloat and we may want to add some avenue to allow for the omission of + those values in the future. + +# Detailed design + +A new module, `std::panic`, will be created with a panic handling API: + +```rust +/// Unregisters the current panic handler, returning it. +/// +/// If no custom handler is registered, the default handler will be returned. +/// +/// # Panics +/// +/// Panics if called from a panicking thread. Note that this will be a nested +/// panic and therefore abort the process. +pub fn take_handler() -> Box { ... } + +/// Registers a custom panic handler, replacing any that was previously +/// registered. +/// +/// # Panics +/// +/// Panics if called from a panicking thread. Note that this will be a nested +/// panic and therefore abort the process. +pub fn set_handler(handler: F) where F: Fn(&PanicInfo) + 'static + Sync + Send { ... } + +/// A struct providing information about a panic. +pub struct PanicInfo { ... } + +impl PanicInfo { + /// Returns the payload associated with the panic. + /// + /// This will commonly, but not always, be a `&'static str` or `String`. + pub fn payload(&self) -> &Any + Send { ... } + + /// Returns information about the location from which the panic originated, + /// if available. + pub fn location(&self) -> Option { ... } +} + +/// A struct containing information about the location of a panic. +pub struct Location<'a> { ... } + +impl<'a> Location<'a> { + /// Returns the name of the source file from which the panic originated. + pub fn file(&self) -> &str { ... } + + /// Returns the line number from which the panic originated. + pub fn line(&self) -> u32 { ... } +} +``` + +When a panic occurs, but before unwinding begins, the runtime will call the +registered panic handler. After the handler returns, the runtime will then +unwind the thread. If a thread panics while panicking (a "double panic"), the +panic handler will *not* be invoked and the process will abort. Note that the +thread is considered to be panicking while the panic handler is running, so a +panic originating from the panic handler will result in a double panic. 
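For example, a command-line application could keep its output clean by routing panics through the API proposed above (a sketch against the `set_handler`/`PanicInfo` signatures shown; this is not today's stable `std` surface):

```rust
use std::io::Write;
use std::panic;

fn install_quiet_panic_reporting() {
    panic::set_handler(|info| {
        // The payload is commonly (but not always) a `&'static str` or `String`.
        let msg = info.payload().downcast_ref::<&str>().map(|s| s.to_string())
            .or_else(|| info.payload().downcast_ref::<String>().cloned())
            .unwrap_or_else(|| String::from("<non-string panic payload>"));
        let mut err = std::io::stderr();
        match info.location() {
            Some(loc) => {
                let _ = writeln!(err, "internal error at {}:{}: {}",
                                 loc.file(), loc.line(), msg);
            }
            None => {
                let _ = writeln!(err, "internal error: {}", msg);
            }
        }
    });
}
```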
+ +The `take_handler` method exists to allow for handlers to "chain" by closing +over the previous handler and calling into it: + +```rust +let old_handler = panic::take_handler(); +panic::set_handler(move |info| { + println!("uh oh!"); + old_handler(info); +}); +``` + +This is obviously a racy operation, but as a single global resource, the global +panic handler should only be adjusted by applications rather than libraries, +most likely early in the startup process. + +The implementation of `set_handler` and `take_handler` will have to be +carefully synchronized to ensure that a handler is not replaced while executing +in another thread. This can be accomplished in a manner similar to [that used +by the `log` +crate](https://github.com/rust-lang-nursery/log/blob/aa8618c840dd88b27c487c9fc9571d89751583f3/src/lib.rs). +`take_handler` and `set_handler` will wait until no other threads are currently +running the panic handler, at which point they will atomically swap the handler +out as appropriate. + +Note that `location` will always return `Some` in the current implementation. +It returns an `Option` to hedge against possible future changes to the panic +system that would allow a crate to be compiled with location metadata removed +to minimize binary size. + +## Prior Art + +C++ has a +[`std::set_terminate`](http://www.cplusplus.com/reference/exception/set_terminate/) +function which registers a handler for uncaught exceptions, returning the old +one. The handler takes no arguments. + +Python passes uncaught exceptions to the global handler +[`sys.excepthook`](https://docs.python.org/2/library/sys.html#sys.excepthook) +which can be set by user code. + +In Java, uncaught exceptions [can be +handled](http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#setUncaughtExceptionHandler(java.lang.Thread.UncaughtExceptionHandler)) +by handlers registered on an individual `Thread`, by the `Thread`'s, +`ThreadGroup`, and by a handler registered globally. The handlers are provided +with the `Throwable` that triggered the handler. + +# Drawbacks + +The more infrastructure we add to interact with panics, the more attractive it +becomes to use them as a more normal part of control flow. + +# Alternatives + +Panic handlers could be run after a panicking thread has unwound rather than +before. This is perhaps a more intuitive arrangement, and allows `catch_panic` +to prevent panic handlers from running. However, running handlers before +unwinding allows them access to more context, for example, the ability to take +a stack trace. + +`PanicInfo::location` could be split into `PanicInfo::file` and +`PanicInfo::line` to cut down on the API size, though that would require +handlers to deal with weird cases like a line number but no file being +available. + +[RFC 1100](https://github.com/rust-lang/rfcs/pull/1100) proposed an API based +around thread-local handlers. While there are reasonable use cases for the +registration of custom handlers on a per-thread basis, most of the common uses +for custom handlers want to have a single set of behavior cover all threads in +the process. Being forced to remember to register a handler in every thread +spawned in a program is tedious and error prone, and not even possible in many +cases for threads spawned in libraries the author has no control over. + +While out of scope for this RFC, a future extension could add thread-local +handlers on top of the global one proposed here in a straightforward manner. 
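Purely as a hypothetical sketch (not part of this proposal), such a layer could dispatch from the single global handler to an optional per-thread one:

```rust
use std::cell::RefCell;
use std::panic::{self, PanicInfo};

thread_local! {
    // Hypothetical per-thread slot layered on top of the global handler.
    static LOCAL_HANDLER: RefCell<Option<Box<dyn Fn(&PanicInfo)>>> = RefCell::new(None);
}

fn install_layered_dispatch() {
    panic::set_handler(|info| {
        LOCAL_HANDLER.with(|slot| match *slot.borrow() {
            Some(ref local) => local(info),  // thread-specific behavior
            None => default_report(info),    // fall back to process-wide behavior
        });
    });
}

fn default_report(info: &PanicInfo) {
    let _ = info; // e.g. write to stderr as in the earlier sketch
}
```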
+ +The implementation could be simplified by altering the API to store, and +`take_logger` to return, an `Arc` or +a bare function pointer. This seems like a somewhat weirder API, however, and +the implementation proposed above should not end up complex enough to justify +the change. + +# Unresolved questions + +None at the moment. diff --git a/text/1331-grammar-is-canonical.md b/text/1331-grammar-is-canonical.md new file mode 100644 index 00000000000..e88f690d3ae --- /dev/null +++ b/text/1331-grammar-is-canonical.md @@ -0,0 +1,97 @@ +- Feature Name: grammar +- Start Date: 2015-10-21 +- RFC PR: [rust-lang/rfcs#1331](https://github.com/rust-lang/rfcs/pull/1331) +- Rust Issue: [rust-lang/rust#30942](https://github.com/rust-lang/rust/issues/30942) + +# Summary +[summary]: #summary +[src/grammar]: https://github.com/rust-lang/rust/tree/master/src/grammar + +Grammar of the Rust language should not be rustc implementation-defined. We have a formal grammar +at [src/grammar] which is to be used as the canonical and formal representation of the Rust +language. + +# Motivation +[motivation]: #motivation +[#1228]: https://github.com/rust-lang/rfcs/blob/master/text/1228-placement-left-arrow.md +[#1219]: https://github.com/rust-lang/rfcs/blob/master/text/1219-use-group-as.md +[#1192]: https://github.com/rust-lang/rfcs/blob/master/text/1192-inclusive-ranges.md + +In many RFCs proposing syntactic changes ([#1228], [#1219] and [#1192] being some of more recently +merged RFCs) the changes are described rather informally and are hard to both implement and +discuss which also leads to discussions containing a lot of guess-work. + +Making [src/grammar] to be the canonical grammar and demanding for description of syntactic changes +to be presented in terms of changes to the formal grammar should greatly simplify both the +discussion and implementation of the RFCs. Using a formal grammar also allows us to discover and +rule out existence of various issues with the grammar changes (e.g. grammar ambiguities) during +design phase rather than implementation phase or, even worse, after the stabilisation. + +# Detailed design +[design]: #detailed-design +[A-grammar]: https://github.com/rust-lang/rust/issues?utf8=✓&q=is:issue+is:open+label:A-grammar + +Sadly, the [grammar][src/grammar] in question is [not quite equivalent][A-grammar] to the +implementation in rustc yet. We cannot possibly hope to catch all the quirks in the rustc parser +implementation, therefore something else needs to be done. + +This RFC proposes following approach to making [src/grammar] the canonical Rust language grammar: + +1. Fix the already known discrepancies between implementation and [src/grammar]; +2. Make [src/grammar] a [semi-canonical grammar]; +3. After a period of time transition [src/grammar] to a [fully-canonical grammar]. + +## Semi-canonical grammar +[semi-canonical grammar]: #semi-canonical-grammar + +Once all known discrepancies between the [src/grammar] and rustc parser implementation are +resolved, [src/grammar] enters the state of being semi-canonical grammar of the Rust language. + +Semi-canonical means that all new development involving syntax changes are made and discussed in +terms of changes to the [src/grammar] and [src/grammar] is in general regarded to as the canonical +grammar except when new discrepancies are discovered. These discrepancies must be swiftly resolved, +but resolution will depend on what kind of discrepancy it is: + +1. 
For syntax changes/additions introduced after [src/grammar] gained the semi-canonical state, the + [src/grammar] is canonical; +2. For syntax that was present before [src/grammar] gained the semi-canonical state, in most cases + the implementation is canonical. + +This process is sure to become ambiguous over time as syntax is increasingly adjusted (it is harder +to “blame” syntax changes compared to syntax additions), therefore the resolution process of +discrepancies will also depend more on a decision from the Rust team. + +## Fully-canonical grammar +[fully-canonical grammar]: #fully-canonical-grammar + +After some time passes, [src/grammar] will transition to the state of fully canonical grammar. +After [src/grammar] transitions into this state, for any discovered discrepancies the +rustc parser implementation must be adjusted to match the [src/grammar], unless decided otherwise +by the RFC process. + +## RFC process changes for syntactic changes and additions + +Once the [src/grammar] enters semi-canonical state, all RFCs must describe syntax additions and +changes in terms of the formal [src/grammar]. Discussion about these changes are also expected (but +not necessarily will) to become more formal and easier to follow. + +# Drawbacks +[drawbacks]: #drawbacks + +This RFC introduces a period of ambiguity during which neither implementation nor [src/grammar] are +truly canonical representation of the Rust language. This will be less of an issue over time as +discrepancies are resolved, but its an issue nevertheless. + +# Alternatives +[alternatives]: #alternatives + +One alternative would be to immediately make [src/grammar] a fully-canonical grammar of the Rust +language at some arbitrary point in the future. + +Another alternative is to simply forget idea of having a formal grammar be the canonical grammar of +the Rust language. + +# Unresolved questions +[unresolved]: #unresolved-questions + +How much time should pass between [src/grammar] becoming semi-canonical and fully-canonical? diff --git a/text/1358-repr-align.md b/text/1358-repr-align.md new file mode 100644 index 00000000000..d3c2d8f0004 --- /dev/null +++ b/text/1358-repr-align.md @@ -0,0 +1,149 @@ +- Feature Name: `repr_align` +- Start Date: 2015-11-09 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1358 +- Rust Issue: https://github.com/rust-lang/rust/issues/33626 + +# Summary +[summary]: #summary + +Extend the existing `#[repr]` attribute on structs with an `align = "N"` option +to specify a custom alignment for `struct` types. + +# Motivation +[motivation]: #motivation + +The alignment of a type is normally not worried about as the compiler will "do +the right thing" of picking an appropriate alignment for general use cases. +There are situations, however, where a nonstandard alignment may be desired when +operating with foreign systems. For example these sorts of situations tend to +necessitate or be much easier with a custom alignment: + +* Hardware can often have obscure requirements such as "this structure is + aligned to 32 bytes" when it in fact is only composed of 4-byte values. While + this can typically be manually calculated and managed, it's often also useful + to express this as a property of a type to get the compiler to do a little + extra work instead. +* C compilers like gcc and clang offer the ability to specify a custom alignment + for structures, and Rust can much more easily interoperate with these types if + Rust can also mirror the request for a custom alignment (e.g. 
passing a + structure to C correctly is much easier). +* Custom alignment can often be used for various tricks here and there and is + often convenient as "let's play around with an implementation" tool. For + example this can be used to statically allocate page tables in a kernel + or create an at-least cache-line-sized structure easily for concurrent + programming. + +Currently these sort of situations are possible in Rust but aren't necessarily +the most ergonomic as programmers must manually manage alignment. The purpose of +this RFC is to provide a lightweight annotation to alter the compiler-inferred +alignment of a structure to enable these situations much more easily. + +# Detailed design +[design]: #detailed-design + +The `#[repr]` attribute on `struct`s will be extended to include a form such as: + +```rust +#[repr(align = "16")] +struct MoreAligned(i32); +``` + +This structure will still have an alignment of 16 (as returned by +`mem::align_of`), and in this case the size will also be 16. + +Syntactically, the `repr` meta list will be extended to accept a meta item +name/value pair with the name "align" and the value as a string which can be +parsed as a `u64`. The restrictions on where this attribute can be placed along +with the accepted values are: + +* Custom alignment can only be specified on `struct` declarations for now. + Specifying a different alignment on perhaps `enum` or `type` definitions + should be a backwards-compatible extension. +* Alignment values must be a power of two. + +Multiple `#[repr(align = "..")]` directives are accepted on a struct +declaration, and the actual alignment of the structure will be the maximum of +all `align` directives and the natural alignment of the struct itself. + +Semantically, it will be guaranteed (modulo `unsafe` code) that custom alignment +will always be respected. If a pointer to a non-aligned structure exists and is +used then it is considered unsafe behavior. Local variables, objects in arrays, +statics, etc, will all respect the custom alignment specified for a type. + +For now, it will be illegal for any `#[repr(packed)]` struct to transitively +contain a struct with `#[repr(align)]`. Specifically, both attributes cannot be +applied on the same struct, and a `#[repr(packed)]` struct cannot transitively +contain another struct with `#[repr(align)]`. The flip side, including a +`#[repr(packed)]` structure inside of a `#[repr(align)]` one will be allowed. +The behavior of MSVC and gcc differ in how these properties interact, and for +now we'll just yield an error while we get experience with the two attributes. + +Some examples of `#[repr(align)]` are: + +```rust +// Raising alignment +#[repr(align = "16")] +struct Align16(i32); + +assert_eq!(mem::align_of::(), 16); +assert_eq!(mem::size_of::(), 16); + +// Lowering has no effect +#[repr(align = "1")] +struct Align1(i32); + +assert_eq!(mem::align_of::(), 4); +assert_eq!(mem::size_of::(), 4); + +// Multiple attributes take the max +#[repr(align = "8", align = "4")] +#[repr(align = "16")] +struct AlignMany(i32); + +assert_eq!(mem::align_of::(), 16); +assert_eq!(mem::size_of::(), 16); + +// Raising alignment may not alter size. +#[repr(align = "8")] +struct Align8Many { + a: i32, + b: i32, + c: i32, + d: u8, +} + +assert_eq!(mem::align_of::(), 8); +assert_eq!(mem::size_of::(), 16); +``` + +# Drawbacks +[drawbacks]: #drawbacks + +Specifying a custom alignment isn't always necessarily easy to do so via a +literal integer value. 
It may require usage of `#[cfg_attr]` in some situations +and may otherwise be much more convenient to name a different type instead. +Working with a raw integer, however, should provide the building block for +building up other abstractions and should be maximally flexible. It also +provides a relatively straightforward implementation and understanding of the +attribute at hand. + +This also currently does not allow for specifying the custom alignment of a +struct field (as C compilers also allow doing) without the usage of a newtype +structure. Currently `#[repr]` is not recognized here, but it would be a +backwards compatible extension to start reading it on struct fields. + +# Alternatives +[alternatives]: #alternatives + +Instead of using the `#[repr]` attribute as the "house" for the custom +alignment, there could instead be a new `#[align = "..."]` attribute. This is +perhaps more extensible to alignment in other locations such as a local variable +(with attributes on expressions), a struct field (where `#[repr]` is more of an +"outer attribute"), or enum variants perhaps. + +# Unresolved questions +[unresolved]: #unresolved-questions + +* It is likely best to simply match the semantics of C/C++ in the regard of + custom alignment, but is it ensured that this RFC is the same as the behavior + of standard C compilers? diff --git a/text/1359-process-ext-unix.md b/text/1359-process-ext-unix.md new file mode 100644 index 00000000000..ece03ce6a75 --- /dev/null +++ b/text/1359-process-ext-unix.md @@ -0,0 +1,126 @@ +- Feature Name: `process_exec` +- Start Date: 2015-11-09 +- RFC PR: [rust-lang/rfcs#1359](https://github.com/rust-lang/rfcs/pull/1359) +- Rust Issue: [rust-lang/rust#31398](https://github.com/rust-lang/rust/issues/31398) + +# Summary +[summary]: #summary + +Add two methods to the `std::os::unix::process::CommandExt` trait to provide +more control over how processes are spawned on Unix, specifically: + +```rust +fn exec(&mut self) -> io::Error; +fn before_exec(&mut self, f: F) -> &mut Self + where F: FnOnce() -> io::Result<()> + Send + Sync + 'static; +``` + +# Motivation +[motivation]: #motivation + +Although the standard library's implementation of spawning processes on Unix is +relatively complex, it unfortunately doesn't provide the same flexibility as +calling `fork` and `exec` manually. For example, these sorts of use cases are +not possible with the `Command` API: + +* The `exec` function cannot be called without `fork`. It's often useful on Unix + in doing this to avoid spawning processes or improve debuggability if the + pre-`exec` code was some form of shim. +* Execute other flavorful functions between the fork/exec if necessary. For + example some proposed extensions to the standard library are [dealing with the + controlling tty][tty] or dealing with [session leaders][session]. In theory + any sort of arbitrary code can be run between these two syscalls, and it may + not always be the case the standard library can provide a suitable + abstraction. + +[tty]: https://github.com/rust-lang/rust/pull/28982 +[session]: https://github.com/rust-lang/rust/pull/26470 + +Note that neither of these pieces of functionality are possible on Windows as +there is no equivalent of the `fork` or `exec` syscalls in the standard APIs, so +these are specifically proposed as methods on the Unix extension trait. 
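As a rough sketch of the intended usage (assuming the `libc` crate for the raw `setsid` call; `before_exec` and `exec` are the methods proposed below):

```rust
extern crate libc;

use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

/// Spawn `program` as the leader of a new session, using the proposed
/// `before_exec` hook to run code in the child between `fork` and `exec`.
fn spawn_detached(program: &str) -> io::Result<Child> {
    let mut cmd = Command::new(program);
    cmd.before_exec(|| {
        if unsafe { libc::setsid() } == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    });
    cmd.spawn()
}

/// Replace the current process image instead of spawning a child; this only
/// returns if the exec (or earlier setup) failed.
fn exec_shim(program: &str) -> io::Error {
    Command::new(program).exec()
}
```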
+ +# Detailed design +[design]: #detailed-design + +The following two methods will be added to the +`std::os::unix::process::CommandExt` trait: + +```rust +/// Performs all the required setup by this `Command`, followed by calling the +/// `execvp` syscall. +/// +/// On success this function will not return, and otherwise it will return an +/// error indicating why the exec (or another part of the setup of the +/// `Command`) failed. +/// +/// Note that the process may be in a "broken state" if this function returns in +/// error. For example the working directory, environment variables, signal +/// handling settings, various user/group information, or aspects of stdio +/// file descriptors may have changed. If a "transactional spawn" is required to +/// gracefully handle errors it is recommended to use the cross-platform `spawn` +/// instead. +fn exec(&mut self) -> io::Error; + +/// Schedules a closure to be run just before the `exec` function is invoked. +/// +/// This closure will be run in the context of the child process after the +/// `fork` and other aspects such as the stdio file descriptors and working +/// directory have successfully been changed. Note that this is often a very +/// constrained environment where normal operations like `malloc` or acquiring a +/// mutex are not guaranteed to work (due to other threads perhaps still running +/// when the `fork` was run). +/// +/// The closure is allowed to return an I/O error whose OS error code will be +/// communicated back to the parent and returned as an error from when the spawn +/// was requested. +/// +/// Multiple closures can be registered and they will be called in order of +/// their registration. If a closure returns `Err` then no further closures will +/// be called and the spawn operation will immediately return with a failure. +fn before_exec(&mut self, f: F) -> &mut Self + where F: FnOnce() -> io::Result<()> + Send + Sync + 'static; +``` + +The `exec` function is relatively straightforward as basically the entire spawn +operation minus the `fork`. The stdio handles will be inherited by default if +not otherwise configured. Note that a configuration of `piped` will likely just +end up with a broken half of a pipe on one of the file descriptors. + +The `before_exec` function has extra-restrictive bounds to preserve the same +qualities that the `Command` type has (notably `Send`, `Sync`, and `'static`). +This also happens after all other configuration has happened to ensure that +libraries can take advantage of the other operations on `Command` without having +to reimplement them manually in some circumstances. + +# Drawbacks +[drawbacks]: #drawbacks + +This change is possible to be a breaking change to `Command` as it will no +longer implement all marker traits by default (due to it containing closure +trait objects). While the common marker traits are handled here, it's possible +that there are some traits in the wild in use which this could break. + +Much of the functionality which may initially get funneled through `before_exec` +may actually be best implemented as functions in the standard library itself. +It's likely that many operations are well known across unixes and aren't niche +enough to stay outside the standard library. + +# Alternatives +[alternatives]: #alternatives + +Instead of souping up `Command` the type could instead provide accessors to all +of the configuration that it contains. 
This would enable this sort of +functionality to be built on crates.io first instead of requiring it to be built +into the standard library to start out with. Note that this may want to end up +in the standard library regardless, however. + +# Unresolved questions +[unresolved]: #unresolved-questions + +* Is it appropriate to run callbacks just before the `exec`? Should they instead + be run before any standard configuration like stdio has run? +* Is it possible to provide "transactional semantics" to the `exec` function + such that it is safe to recover from? Perhaps it's worthwhile to provide + partial transactional semantics in the form of "this can be recovered from so + long as all stdio is inherited". diff --git a/text/1361-cargo-cfg-dependencies.md b/text/1361-cargo-cfg-dependencies.md new file mode 100644 index 00000000000..c4eed93edb6 --- /dev/null +++ b/text/1361-cargo-cfg-dependencies.md @@ -0,0 +1,158 @@ +- Feature Name: N/A +- Start Date: 2015-11-10 +- RFC PR: [rust-lang/rfcs#1361](https://github.com/rust-lang/rfcs/pull/1361) +- Rust Issue: N/A + +# Summary +[summary]: #summary + +Improve the target-specific dependency experience in Cargo by leveraging the +same `#[cfg]` syntax that Rust has. + +# Motivation +[motivation]: #motivation + +Currently in Cargo it's [relatively painful][issue] to list target-specific +dependencies. This can only be done by listing out the entire target string as +opposed to using the more-convenient `#[cfg]` annotations that Rust source code +has access to. Consequently a Windows-specific dependency ends up having to be +defined for four triples: `{i686,x86_64}-pc-windows-{gnu,msvc}`, and this is +unfortunately not forwards compatible as well! + +[issue]: https://github.com/rust-lang/cargo/issues/1007 + +As a result most crates end up unconditionally depending on target-specific +dependencies and rely on the crates themselves to have the relevant `#[cfg]` to +only be compiled for the right platforms. This experience leads to excessive +downloads, excessive compilations, and overall "unclean methods" to have a +platform specific dependency. + +This RFC proposes leveraging the same familiar syntax used in Rust itself to +define these dependencies. + +# Detailed design +[design]: #detailed-design + +The target-specific dependency syntax in Cargo will be expanded to include +not only full target strings but also `#[cfg]` expressions: + +```toml +[target."cfg(windows)".dependencies] +winapi = "0.2" + +[target."cfg(unix)".dependencies] +unix-socket = "0.4" + +[target.'cfg(target_os = "macos")'.dependencies] +core-foundation = "0.2" +``` + +Specifically, the "target" listed here is considered special if it starts with +the string "cfg(" and ends with ")". If this is not true then Cargo will +continue to treat it as an opaque string and pass it to the compiler via +`--target` (Cargo's current behavior). + +Cargo will implement its own parser of this syntax inside the `cfg` expression, +it will not rely on the compiler itself. The grammar, however, will be the same +as the compiler for now: + +``` +cfg := "cfg(" meta-item * ")" +meta-item := ident | + ident "=" string | + ident "(" meta-item * ")" +``` + +Like Rust, Cargo will implement the `any`, `all`, and `not` operators for the +`ident(list)` syntax. The last missing piece is simply understand what `ident` +and `ident = "string"` values are defined for a particular target. 
To learn this +information Cargo will query the compiler via a new command line flag: + +``` +$ rustc --print cfg +unix +target_os="apple" +target_pointer_width="64" +... + +$ rustc --print cfg --target i686-pc-windows-msvc +windows +target_os="windows" +target_pointer_width="32" +... +``` + +The `--print cfg` command line flag will print out all built-in `#[cfg]` +directives defined by the compiler onto standard output. Each cfg will be +printed on its own line to allow external parsing. Cargo will use this to call +the compiler once (or twice if an explicit target is requested) when resolution +starts, and it will use these key/value pairs to execute the `cfg` queries in +the dependency graph being constructed. + +# Drawbacks +[drawbacks]: #drawbacks + +This is not a forwards-compatible extension to Cargo, so this will break +compatibility with older Cargo versions. If a crate is published with a Cargo +that supports this `cfg` syntax, it will not be buildable by a Cargo that does +not understand the `cfg` syntax. The registry itself is prepared to handle this +sort of situation as the "target" string is just opaque, however. + +This can be perhaps mitigated via a number of strategies: + +1. Have crates.io reject the `cfg` syntax until the implementation has landed on + stable Cargo for at least one full cycle. Applications, path dependencies, + and git dependencies would still be able to use this syntax, but crates.io + wouldn't be able to leverage it immediately. +2. Crates on crates.io wishing for compatibility could simply hold off on using + this syntax until this implementation has landed in stable Cargo for at least + a full cycle. This would mean that everyone could use it immediately but "big + crates" would be advised to hold off for compatibility for awhile. +3. Have crates.io rewrite dependencies as they're published. If you publish a + crate with a `cfg(windows)` dependency then crates.io could expand this to + all known triples which match `cfg(windows)` when storing the metadata + internally. This would mean that crates using `cfg` syntax would continue to + be compatible with older versions of Cargo so long as they were only used as + a crates.io dependency. + +For ease of implementation this RFC would recommend strategy (1) to help ease +this into the ecosystem without too much pain in terms of compatibility or +implementation. + +# Alternatives +[alternatives]: #alternatives + +Instead of using Rust's `#[cfg]` syntax, Cargo could support other options such +as patterns over the target string. For example it could accept something along +the lines of: + +```toml +[target."*-pc-windows-*".dependencies] +winapi = "0.2" + +[target."*-apple-*".dependencies] +core-foundation = "0.2" +``` + +While certainly more flexible than today's implementation, it unfortunately is +relatively error prone and doesn't cover all the use cases one may want: + +* Matching against a string isn't necessarily guaranteed to be robust moving + forward into the future. +* This doesn't support negation and other operators, e.g. `all(unix, not(osx))`. +* This doesn't support meta-families like `cfg(unix)`. + +Another possible alternative would be to have Cargo supply pre-defined families +such as `windows` and `unix` as well as the above pattern matching, but this +eventually just moves into the territory of what `#[cfg]` already provides but +may not always quite get there. 
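To make the evaluation model concrete, the following sketch (not Cargo's actual implementation) shows how a parsed `cfg` expression could be checked against the flags and key/value pairs reported by `rustc --print cfg`:

```rust
use std::collections::{HashMap, HashSet};

// A parsed `cfg(...)` expression, mirroring the grammar given above.
enum Cfg {
    Flag(String),            // `unix`, `windows`, ...
    KeyPair(String, String), // `target_os = "macos"`, ...
    Any(Vec<Cfg>),
    All(Vec<Cfg>),
    Not(Box<Cfg>),
}

struct TargetInfo {
    flags: HashSet<String>,
    pairs: HashMap<String, Vec<String>>,
}

impl TargetInfo {
    // Build from the line-oriented output of `rustc --print cfg [--target ...]`.
    fn from_rustc_output(output: &str) -> TargetInfo {
        let mut info = TargetInfo { flags: HashSet::new(), pairs: HashMap::new() };
        for line in output.lines() {
            match line.find('=') {
                Some(i) => {
                    let key = line[..i].to_string();
                    let value = line[i + 1..].trim_matches('"').to_string();
                    info.pairs.entry(key).or_insert_with(Vec::new).push(value);
                }
                None => {
                    info.flags.insert(line.to_string());
                }
            }
        }
        info
    }

    fn matches(&self, cfg: &Cfg) -> bool {
        match *cfg {
            Cfg::Flag(ref name) => self.flags.contains(name),
            Cfg::KeyPair(ref k, ref v) => {
                self.pairs.get(k).map_or(false, |vs| vs.iter().any(|x| x == v))
            }
            Cfg::Any(ref cs) => cs.iter().any(|c| self.matches(c)),
            Cfg::All(ref cs) => cs.iter().all(|c| self.matches(c)),
            Cfg::Not(ref c) => !self.matches(c),
        }
    }
}
```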
+ +# Unresolved questions +[unresolved]: #unresolved-questions + +* This is not the only change that's known to Cargo which is known to not be + forwards-compatible, so it may be best to lump them all together into one + Cargo release instead of releasing them over time, but should this be blocked + on those ideas? (note they have not been formed into an RFC yet) + + diff --git a/text/1398-kinds-of-allocators.md b/text/1398-kinds-of-allocators.md new file mode 100644 index 00000000000..720e6fdcde4 --- /dev/null +++ b/text/1398-kinds-of-allocators.md @@ -0,0 +1,2198 @@ +- Feature Name: allocator_api +- Start Date: 2015-12-01 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1398 +- Rust Issue: https://github.com/rust-lang/rust/issues/32838 + +# Summary +[summary]: #summary + +Add a standard allocator interface and support for user-defined +allocators, with the following goals: + + 1. Allow libraries (in libstd and elsewhere) to be generic with + respect to the particular allocator, to support distinct, + stateful, per-container allocators. + + 2. Require clients to supply metadata (such as block size and + alignment) at the allocation and deallocation sites, to ensure + hot-paths are as efficient as possible. + + 3. Provide high-level abstraction over the layout of an object in + memory. + +Regarding GC: We plan to allow future allocators to integrate +themselves with a standardized reflective GC interface, but leave +specification of such integration for a later RFC. (The design +describes a way to add such a feature in the future while ensuring +that clients do not accidentally opt-in and risk unsound behavior.) + +# Motivation +[motivation]: #motivation + +As noted in [RFC PR 39][] (and reiterated in [RFC PR 244][]), modern general purpose allocators are good, +but due to the design tradeoffs they must make, cannot be optimal in +all contexts. (It is worthwhile to also read discussion of this claim +in papers such as +[Reconsidering Custom Malloc](#reconsidering-custom-memory-allocation).) + +Therefore, the standard library should allow clients to plug in their +own allocator for managing memory. + +## Allocators are used in C++ system programming + +The typical reasons given for use of custom allocators in C++ are among the +following: + + 1. Speed: A custom allocator can be tailored to the particular + memory usage profiles of one client. This can yield advantages + such as: + + * A bump-pointer based allocator, when available, is faster + than calling `malloc`. + + * Adding memory padding can reduce/eliminate false sharing of + cache lines. + + 2. Stability: By segregating different sub-allocators and imposing + hard memory limits upon them, one has a better chance of handling + out-of-memory conditions. + + If everything comes from a single global heap, it becomes much + harder to handle out-of-memory conditions because by the time the + handler runs, it is almost certainly going to be unable to + allocate any memory for its own work. + + 3. Instrumentation and debugging: One can swap in a custom + allocator that collects data such as number of allocations, + or time for requests to be serviced. + +## Allocators should feel "rustic" + +In addition, for Rust we want an allocator API design that leverages +the core type machinery and language idioms (e.g. 
using `Result` to +propagate dynamic error conditions), and provides +premade functions for common patterns for allocator clients (such as +allocating either single instances of a type, or arrays of some types +of dynamically-determined length). + +## Garbage Collection integration + +Finally, we want our allocator design to allow for a garbage +collection (GC) interface to be added in the future. + +At the very least, we do not want to accidentally *disallow* GC by +choosing an allocator API that is fundamentally incompatible with it. + +(However, this RFC does not actually propose a concrete solution for +how to integrate allocators with GC.) + +# Detailed design +[design]: #detailed-design + +## The `Allocator` trait at a glance + +The source code for the `Allocator` trait prototype is provided in an +[appendix][Source for Allocator]. But since that section is long, here +we summarize the high-level points of the `Allocator` API. + +(See also the [walk thru][] section, which actually links to +individual sections of code.) + + * Basic implementation of the trait requires just two methods + (`alloc` and `dealloc`). You can get an initial implemention off + the ground with relatively little effort. + + * All methods that can fail to satisfy a request return a `Result` + (rather than building in an assumption that they panic or abort). + + * Furthermore, allocator implementations are discouraged from + directly panicking or aborting on out-of-memory (OOM) during + calls to allocation methods; instead, + clients that do wish to report that OOM occurred via a particular + allocator can do so via the `Allocator::oom()` method. + + * OOM is not the only type of error that may occur in general; + allocators can inject more specific error types to indicate + why an allocation failed. + + * The metadata for any allocation is captured in a `Layout` + abstraction. This type carries (at minimum) the size and alignment + requirements for a memory request. + + * The `Layout` type provides a large family of functional construction + methods for building up the description of how memory is laid out. + + * Any sized type `T` can be mapped to its `Layout`, via `Layout::new::()`, + + * Heterogenous structure; e.g. `layout1.extend(layout2)`, + + * Homogenous array types: `layout.repeat(n)` (for `n: usize`), + + * There are packed and unpacked variants for the latter two methods. + + * Helper `Allocator` methods like `fn alloc_one` and `fn + alloc_array` allow client code to interact with an allocator + without ever directly constructing a `Layout`. + + * Once an `Allocator` implementor has the `fn alloc` and `fn dealloc` + methods working, it can provide overrides of the other methods, + providing hooks that take advantage of specific details of how your + allocator is working underneath the hood. + + * In particular, the interface provides a few ways to let clients + potentially reuse excess memory associated with a block + + * `fn realloc` is a common pattern (where the client hopes that + the method will reuse the original memory when satisfying the + `realloc` request). + + * `fn alloc_excess` and `fn usable_size` provide an alternative + pattern, where your allocator tells the client about the excess + memory provided to satisfy a request, and the client can directly + expand into that excess memory, without doing round-trip requests + through the allocator itself. 
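Before diving into the semantics, here is a small client-side sketch of goals 1 and 2 in action: the caller hands the allocator a `Layout` at both the allocation and the deallocation site. (The `Allocator`, `Layout`, `AllocErr`, and `Address` names are the ones proposed in this RFC's appendix, not items in today's `std`; the `alloc_one` convenience method mentioned above would essentially wrap this pattern up.)

```rust
use std::ptr;

/// Allocate space for a single `T`, move `value` into it, and hand back the
/// raw pointer. The `Layout` captures `T`'s size and alignment.
fn store_one<T, A: Allocator>(a: &mut A, value: T) -> Result<*mut T, AllocErr> {
    let layout = Layout::new::<T>();
    let addr: Address = unsafe { a.alloc(layout)? };
    let p = addr as *mut T;
    unsafe { ptr::write(p, value); }
    Ok(p)
}

/// Drop the value and give the block back, resupplying the same `Layout`
/// that was used to allocate it.
unsafe fn retire_one<T, A: Allocator>(a: &mut A, p: *mut T) {
    ptr::drop_in_place(p);
    a.dealloc(p as Address, Layout::new::<T>());
}
```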
+ +## Semantics of allocators and their memory blocks +[semantics of allocators]: #semantics-of-allocators-and-their-memory-blocks + +In general, an allocator provide access to a memory pool that owns +some amount of backing storage. The pool carves off chunks of that +storage and hands it out, via the allocator, as individual blocks of +memory to service client requests. (A "client" here is usually some +container library, like `Vec` or `HashMap`, that has been suitably +parameterized so that it has an `A:Allocator` type parameter.) + +So, an interaction between a program, a collection library, and an +allocator might look like this: + + +If you cannot see the SVG linked here, try the [ASCII art version][ascii-art] appendix. +Also, if you have suggestions for changes to the SVG, feel free to write them as a comment +in that appendix; (but be sure to be clear that you are pointing out a suggestion for the SVG). + + +In general, an allocator might be the backing memory pool itself; or +an allocator might merely be a *handle* that references the memory +pool. In the former case, when the allocator goes out of scope or is +otherwise dropped, the memory pool is dropped as well; in the latter +case, dropping the allocator has no effect on the memory pool. + + * One allocator that acts as a handle is the global heap allocator, + whose associated pool is the low-level `#[allocator]` crate. + + * Another allocator that acts as a handle is a `&'a Pool`, where + `Pool` is some structure implementing a sharable backing store. + The big [example][] section shows an instance of this. + + * An allocator that is its own memory pool would be a type + analogous to `Pool` that implements the `Allocator` interface + directly, rather than via `&'a Pool`. + + * A case in the middle of the two extremes might be something like an + allocator of the form `Rc>`. This reflects *shared* + ownership between a collection of allocators handles: dropping one + handle will not drop the pool as long as at least one other handle + remains, but dropping the last handle will drop the pool itself. + + FIXME: `RefCell` is not going to work with the allocator API + envisaged here; see [comment from gankro][]. We will need to + address this (perhaps just by pointing out that it is illegal and + suggesting a standard pattern to work around it) before this RFC + can be accepted. + +[comment from gankro]: https://github.com/rust-lang/rfcs/pull/1398#issuecomment-162681096 + +A client that is generic over all possible `A:Allocator` instances +cannot know which of the above cases it falls in. This has consequences +in terms of the restrictions that must be met by client code +interfacing with an allocator, which we discuss in a +later [section on lifetimes][lifetimes]. + + +## Example Usage +[example]: #example-usage + +Lets jump into a demo. Here is a (super-dumb) bump-allocator that uses +the `Allocator` trait. + +### Implementing the `Allocator` trait + +First, the bump-allocator definition itself: each such allocator will +have its own name (for error reports from OOM), start and limit +pointers (`ptr` and `end`, respectively) to the backing storage it is +allocating into, as well as the byte alignment (`align`) of that +storage, and an `avail: AtomicPtr` for the cursor tracking how +much we have allocated from the backing storage. +(The `avail` field is an atomic because eventually we want to try +sharing this demo allocator across scoped threads.) 
+ +```rust +#[derive(Debug)] +pub struct DumbBumpPool { + name: &'static str, + ptr: *mut u8, + end: *mut u8, + avail: AtomicPtr, + align: usize, +} +``` + +The initial implementation is pretty straight forward: just immediately +allocate the whole pool's backing storage. + +(If we wanted to be really clever we might layer this type on top of +*another* allocator. +For this demo I want to try to minimize cleverness, so we will use +`heap::allocate` to grab the backing storage instead of taking an +`Allocator` of our own.) + + +```rust +impl DumbBumpPool { + pub fn new(name: &'static str, + size_in_bytes: usize, + start_align: usize) -> DumbBumpPool { + unsafe { + let ptr = heap::allocate(size_in_bytes, start_align); + if ptr.is_null() { panic!("allocation failed."); } + let end = ptr.offset(size_in_bytes as isize); + DumbBumpPool { + name: name, + ptr: ptr, end: end, avail: AtomicPtr::new(ptr), + align: start_align + } + } + } +} +``` + +Since clients are not allowed to have blocks that outlive their +associated allocator (see the [lifetimes][] section), +it is sound for us to always drop the backing storage for an allocator +when the allocator itself is dropped +(regardless of what sequence of `alloc`/`dealloc` interactions occured +with the allocator's clients). + +```rust +impl Drop for DumbBumpPool { + fn drop(&mut self) { + unsafe { + let size = self.end as usize - self.ptr as usize; + heap::deallocate(self.ptr, size, self.align); + } + } +} +``` + +Here are some other design choices of note: + + * Our Bump Allocator is going to use a most simple-minded deallocation + policy: calls to `fn dealloc` are no-ops. Instead, every request takes + up fresh space in the backing storage, until the pool is exhausted. + (This was one reason I use the word "Dumb" in its name.) + + * Since we want to be able to share the bump-allocator amongst multiple + (lifetime-scoped) threads, we will implement the `Allocator` interface + as a *handle* pointing to the pool; in this case, a simple reference. + + * Since the whole point of this particular bump-allocator is to + shared across threads (otherwise there would be no need to use + `AtomicPtr` for the `avail` field), we will want to implement the + (unsafe) `Sync` trait on it (doing this signals that it is safe to + send `&DumbBumpPool` to other threads). + +Here is that `impl Sync`. + +```rust +/// Note of course that this impl implies we must review all other +/// code for DumbBumpPool even more carefully. +unsafe impl Sync for DumbBumpPool { } +``` + +Here is the demo implementation of `Allocator` for the type. + +```rust +unsafe impl<'a> Allocator for &'a DumbBumpPool { + unsafe fn alloc(&mut self, layout: alloc::Layout) -> Result { + let align = layout.align(); + let size = layout.size(); + + let mut curr_addr = self.avail.load(Ordering::Relaxed); + loop { + let curr = curr_addr as usize; + let (sum, oflo) = curr.overflowing_add(align - 1); + let curr_aligned = sum & !(align - 1); + let remaining = (self.end as usize) - curr_aligned; + if oflo || remaining < size { + return Err(AllocErr::Exhausted { request: layout.clone() }); + } + + let curr_aligned = curr_aligned as *mut u8; + let new_curr = curr_aligned.offset(size as isize); + + let attempt = self.avail.compare_and_swap(curr_addr, new_curr, Ordering::Relaxed); + // If the allocation attempt hits interference ... + if curr_addr != attempt { + curr_addr = attempt; + continue; // .. 
then try again + } else { + println!("alloc finis ok: 0x{:x} size: {}", curr_aligned as usize, size); + return Ok(curr_aligned); + } + } + } + + unsafe fn dealloc(&mut self, _ptr: Address, _layout: alloc::Layout) { + // this bump-allocator just no-op's on dealloc + } + + fn oom(&mut self, err: AllocErr) -> ! { + let remaining = self.end as usize - self.avail.load(Ordering::Relaxed) as usize; + panic!("exhausted memory in {} on request {:?} with avail: {}; self: {:?}", + self.name, err, remaining, self); + } + +} +``` + +(Niko Matsakis has pointed out that this particular allocator might +avoid interference errors by using fetch-and-add rather than +compare-and-swap. The devil's in the details as to how one might +accomplish that while still properly adjusting for alignment; in any +case, the overall point still holds in cases outside of this specific +demo.) + +And that is it; we are done with our allocator implementation. + +### Using an `A:Allocator` from the client side + +We assume that `Vec` has been extended with a `new_in` method that +takes an allocator argument that it uses to satisfy its allocation +requests. + +```rust +fn demo_alloc(a1:A1, a2: A2, print_state: F) { + let mut v1 = Vec::new_in(a1); + let mut v2 = Vec::new_in(a2); + println!("demo_alloc, v1; {:?} v2: {:?}", v1, v2); + for i in 0..10 { + v1.push(i as u64 * 1000); + v2.push(i as u8); + v2.push(i as u8); + } + println!("demo_alloc, v1; {:?} v2: {:?}", v1, v2); + print_state(); + for i in 10..100 { + v1.push(i as u64 * 1000); + v2.push(i as u8); + v2.push(i as u8); + } + println!("demo_alloc, v1.len: {} v2.len: {}", v1.len(), v2.len()); + print_state(); + for i in 100..1000 { + v1.push(i as u64 * 1000); + v2.push(i as u8); + v2.push(i as u8); + } + println!("demo_alloc, v1.len: {} v2.len: {}", v1.len(), v2.len()); + print_state(); +} + +fn main() { + use std::thread::catch_panic; + + if let Err(panicked) = catch_panic(|| { + let alloc = DumbBumpPool::new("demo-bump", 4096, 1); + demo_alloc(&alloc, &alloc, || println!("alloc: {:?}", alloc)); + }) { + match panicked.downcast_ref::() { + Some(msg) => { + println!("DumbBumpPool panicked: {}", msg); + } + None => { + println!("DumbBumpPool panicked"); + } + } + } + + // // The below will be (rightly) rejected by compiler when + // // all pieces are properly in place: It is not valid to + // // have the vector outlive the borrowed allocator it is + // // referencing. + // + // let v = { + // let alloc = DumbBumpPool::new("demo2", 4096, 1); + // let mut v = Vec::new_in(&alloc); + // for i in 1..4 { v.push(i); } + // v + // }; + + let alloc = DumbBumpPool::new("demo-bump", 4096, 1); + for i in 0..100 { + let r = ::std::thread::scoped(|| { + let v = Vec::new_in(&alloc); + for j in 0..10 { + v.push(j); + } + }); + } + + println!("got here"); +} +``` + +And that's all to the demo, folks. + +### What about standard library containers? + +The intention of this RFC is that the Rust standard library will be +extended with parameteric allocator support: `Vec`, `HashMap`, etc +should all eventually be extended with the ability to use an +alternative allocator for their backing storage. + +However, this RFC does not prescribe when or how this should happen. + +Under the design of this RFC, Allocators parameters are specified via +a *generic type parameter* on the container type. This strongly +implies that `Vec` and `HashMap` will need to be extended +with an allocator type parameter, i.e.: `Vec` and +`HashMap`. 
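+
+For concreteness, the demo's `Vec::new_in` calls assume a container
+shaped roughly like the following sketch. (This is hypothetical: the
+`AllocVec` name, its fields, and its constructor are illustrative only,
+and nothing about the eventual standard-library design is prescribed by
+this RFC.)
+
+```rust
+use std::ptr;
+
+/// Hypothetical sketch of a vector type carrying an allocator
+/// type parameter.
+pub struct AllocVec<T, A: Allocator> {
+    ptr: *mut T,   // start of the backing storage (null while cap == 0)
+    cap: usize,    // capacity, in elements
+    len: usize,    // number of initialized elements
+    alloc: A,      // allocator handle used to grow and free the storage
+}
+
+impl<T, A: Allocator> AllocVec<T, A> {
+    /// Analogue of the `new_in` constructor used in the demo: the
+    /// vector starts empty and remembers which allocator to draw from.
+    pub fn new_in(alloc: A) -> AllocVec<T, A> {
+        AllocVec { ptr: ptr::null_mut(), cap: 0, len: 0, alloc: alloc }
+    }
+}
+```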
+
+There are two reasons why such extension is left to later work, after
+this RFC.
+
+#### Default type parameter fallback
+
+On its own, such a change would be backwards incompatible (i.e. a huge
+breaking change), and also would simply be just plain inconvenient for
+typical use cases. Therefore, the newly added type parameters will
+almost certainly require a *default type*: `Vec<T, A = DefaultAllocator>` and
+`HashMap<K, V, A = DefaultAllocator>`.
+
+Default type parameters themselves, in the context of type definitions,
+are a stable part of the Rust language.
+
+However, the exact semantics of how default type parameters interact
+with inference is still being worked out (in part *because* allocators
+are a motivating use case), as one can see by reading the following:
+
+* RFC 213, "Finalize defaulted type parameters": https://github.com/rust-lang/rfcs/blob/master/text/0213-defaulted-type-params.md
+
+  * Tracking Issue for RFC 213: Default Type Parameter Fallback: https://github.com/rust-lang/rust/issues/27336
+
+* Feature gate defaulted type parameters appearing outside of types: https://github.com/rust-lang/rust/pull/30724
+
+#### Fully general container integration needs Dropck Eyepatch
+
+The previous problem was largely one of programmer
+ergonomics. However, there is also a subtle soundness issue that
+arises due to a current implementation artifact.
+
+Standard library types like `Vec<T>` and `HashMap<K, V>` allow
+instantiating the generic parameters `T`, `K`, `V` with types holding
+lifetimes that do not strictly outlive that of the container itself.
+(I will refer to such instantiations of `Vec` and `HashMap` as
+"same-lifetime instances" as a shorthand in this discussion.)
+
+Same-lifetime instance support is currently implemented for `Vec` and
+`HashMap` via an unstable attribute that is too
+coarse-grained. Therefore, we cannot soundly add the allocator
+parameter to `Vec` and `HashMap` while also continuing to allow
+same-lifetime instances without first addressing this overly coarse
+attribute. I have an open RFC to address this, the "Dropck Eyepatch"
+RFC; that RFC explains in more detail why this problem arises, using
+allocators as a specific motivating use case.
+
+ * Concrete code illustrating this exact example (part of Dropck Eyepatch RFC):
+   https://github.com/pnkfelix/rfcs/blob/dropck-eyepatch/text/0000-dropck-param-eyepatch.md#example-vect-aallocatordefaultallocator
+
+ * Nonparametric dropck RFC: https://github.com/rust-lang/rfcs/blob/master/text/1238-nonparametric-dropck.md
+
+#### Standard library containers conclusion
+
+Rather than wait for the above issues to be resolved, this RFC
+proposes that we at least stabilize the `Allocator` trait interface;
+then we will at least have a starting point upon which to prototype
+standard library integration.
+
+## Allocators and lifetimes
+[lifetimes]: #allocators-and-lifetimes
+
+As mentioned above, allocators provide access to a memory pool. An
+allocator can *be* the pool (in the sense that the allocator owns the
+backing storage that represents the memory blocks it hands out), or an
+allocator can just be a handle that points at the pool.
+
+Some pools have indefinite extent. An example of this is the global
+heap allocator, requesting memory directly from the low-level
+`#[allocator]` crate. Clients of an allocator with such a pool need
+not think about how long the allocator lives; instead, they can just
+freely allocate blocks, use them at will, and deallocate them at
+arbitrary points in the future.
Memory blocks that come from such a +pool will leak if it is not explicitly deallocated. + +Other pools have limited extent: they are created, they build up +infrastructure to manage their blocks of memory, and at some point, +such pools are torn down. Memory blocks from such a pool may or may +not be returned to the operating system during that tearing down. + +There is an immediate question for clients of an allocator with the +latter kind of pool (i.e. one of limited extent): whether it should +attempt to spend time deallocating such blocks, and if so, at what +time to do so? + +Again, note: + + * generic clients (i.e. that accept any `A:Allocator`) *cannot know* + what kind of pool they have, or how it relates to the allocator it + is given, + + * dropping the client's allocator may or may not imply the dropping + of the pool itself! + +That is, code written to a specific `Allocator` implementation may be +able to make assumptions about the relationship between the memory +blocks and the allocator(s), but the generic code we expect the +standard library to provide cannot make such assumptions. + +To satisfy the above scenarios in a sane, consistent, general fashion, +the `Allocator` trait assumes/requires all of the following conditions. +(Note: this list of conditions uses the phrases "should", "must", and "must not" +in a formal manner, in the style of [IETF RFC 2119][].) + +[IETF RFC 2119]: https://www.ietf.org/rfc/rfc2119.txta + + 1. (for allocator impls and clients): in the absence of other + information (e.g. specific allocator implementations), all blocks + from a given pool have lifetime equivalent to the lifetime of the + pool. + + This implies if a client is going to read from, write to, or + otherwise manipulate a memory block, the client *must* do so before + its associated pool is torn down. + + (It also implies the converse: if a client can prove that the pool + for an allocator is still alive, then it can continue to work + with a memory block from that allocator even after the allocator + is dropped.) + + 2. (for allocator impls): an allocator *must not* outlive its + associated pool. + + All clients can assume this in their code. + + (This constraint provides generic clients the preconditions they + need to satisfy the first condition. In particular, even though + clients do not generally know what kind of pool is associated with + its allocator, it can conservatively assume that all blocks will + live at least as long as the allocator itself.) + + 3. (for allocator impls and clients): all clients of an allocator + *should* eventually call the `dealloc` method on every block they + want freed (otherwise, memory may leak). + + However, allocator implementations *must* remain sound even if + this condition is not met: If `dealloc` is not invoked for all + blocks and this condition is somehow detected, then an allocator + can panic (or otherwise signal failure), but that sole violation + must not cause undefined behavior. + + (This constraint is to encourage generic client authors to write + code that will not leak memory when instantiated with allocators + of indefinite extent, such as the global heap allocator.) + + 4. (for allocator impls): moving an allocator value *must not* + invalidate its outstanding memory blocks. + + All clients can assume this in their code. + + So if a client allocates a block from an allocator (call it `a1`) + and then `a1` moves to a new place (e.g. 
via`let a2 = a1;`), then + it remains sound for the client to deallocate that block via + `a2`. + + Note that this implies that it is not sound to implement an + allocator that embeds its own pool structurally inline. + + E.g. this is *not* a legal allocator: + ```rust + struct MegaEmbedded { pool: [u8; 1024*1024], cursor: usize, ... } + impl Allocator for MegaEmbedded { ... } // INVALID IMPL + ``` + The latter impl is simply unreasonable (at least if one is + intending to satisfy requests by returning pointers into + `self.bytes`). + + Note that an allocator that owns its pool *indirectly* + (i.e. does not have the pool's state embedded in the allocator) is fine: + ```rust + struct MegaIndirect { pool: *mut [u8; 1024*1024], cursor: usize, ... } + impl Allocator for MegaIndirect { ... } // OKAY + ``` + + (I originally claimed that `impl Allocator for &mut MegaEmbedded` + would also be a legal example of an allocator that is an indirect handle + to an unembedded pool, but others pointed out that handing out the + addresses pointing into that embedded pool could end up violating our + aliasing rules for `&mut`. I obviously did not expect that outcome; I + would be curious to see what the actual design space is here.) + + 5. (for allocator impls and clients) if an allocator is cloneable, the + client *can assume* that all clones + are interchangably compatible in terms of their memory blocks: if + allocator `a2` is a clone of `a1`, then one can allocate a block + from `a1` and return it to `a2`, or vice versa, or use `a2.realloc` + on the block, et cetera. + + This essentially means that any cloneable + allocator *must* be a handle indirectly referencing a pool of some + sort. (Though do remember that such handles can collectively share + ownership of their pool, such as illustrated in the + `Rc>` example given earlier.) + + (Note: one might be tempted to further conclude that this also + implies that allocators implementing `Copy` must have pools of + indefinite extent. While this seems reasonable for Rust as it + stands today, I am slightly worried whether it would continue to + hold e.g. in a future version of Rust with something like + `Gc: Copy`, where the `GcPool` and its blocks is reclaimed + (via finalization) sometime after being determined to be globally + unreachable. Then again, perhaps it would be better to simply say + "we will not support that use case for the allocator API", so that + clients would be able to employ the reasoning outlined in the + outset of this paragraph.) + + +## A walk through the Allocator trait +[walk thru]: #a-walk-through-the-allocator-trait + +### Role-Based Type Aliases + +Allocation code often needs to deal with values that boil down to a +`usize` in the end. But there are distinct roles (e.g. "size", +"alignment") that such values play, and I decided those roles would be +worth hard-coding into the method signatures. + + * Therefore, I made [type aliases][] for `Size`, `Capacity`, `Alignment`, and `Address`. + +### Basic implementation + +An instance of an allocator has many methods, but an implementor of +the trait need only provide two method bodies: [alloc and dealloc][]. + +(This is only *somewhat* analogous to the `Iterator` trait in Rust. It +is currently very uncommon to override any methods of `Iterator` except +for `fn next`. However, I expect it will be much more common for +`Allocator` to override at least some of the other methods, like `fn +realloc`.) 
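+
+To make the "just two methods" point concrete, here is a sketch of a
+minimal implementation that simply forwards to the same unstable
+`heap::allocate`/`heap::deallocate` entry points used by the
+bump-allocator demo. (The `HeapForwarder` name is illustrative only,
+and the sketch assumes non-zero-sized layouts.)
+
+```rust
+struct HeapForwarder;
+
+unsafe impl Allocator for HeapForwarder {
+    unsafe fn alloc(&mut self, layout: Layout) -> Result<Address, AllocErr> {
+        // Forward the request; report failure via `AllocErr` rather
+        // than panicking, as the trait documentation encourages.
+        let ptr = heap::allocate(layout.size(), layout.align());
+        if ptr.is_null() {
+            Err(AllocErr::Exhausted { request: layout })
+        } else {
+            Ok(ptr)
+        }
+    }
+
+    unsafe fn dealloc(&mut self, ptr: Address, layout: Layout) {
+        heap::deallocate(ptr, layout.size(), layout.align());
+    }
+}
+```
+
+All the other `Allocator` methods then fall back to their default
+implementations, most of which are defined in terms of these two.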
+ +The `alloc` method returns an `Address` when it succeeds, and +`dealloc` takes such an address as its input. But the client must also +provide metadata for the allocated block like its size and alignment. +This is encapsulated in the `Layout` argument to `alloc` and `dealloc`. + +### Memory layouts + +A `Layout` just carries the metadata necessary for satisfying an +allocation request. Its (current, private) representation is just a +size and alignment. + +The more interesting thing about `Layout` is the +family of public methods associated with it for building new layouts via +composition; these are shown in the [layout api][]. + +### Reallocation Methods + +Of course, real-world allocation often needs more than just +`alloc`/`dealloc`: in particular, one often wants to avoid extra +copying if the existing block of memory can be conceptually expanded +in place to meet new allocation needs. In other words, we want +`realloc`, plus alternatives to it (`alloc_excess`) that allow clients to avoid +round-tripping through the allocator API. + +For this, the [memory reuse][] family of methods is appropriate. + +### Type-based Helper Methods + +Some readers might skim over the `Layout` API and immediately say "yuck, +all I wanted to do was allocate some nodes for a tree-structure and +let my clients choose how the backing memory is chosen! Why do I have +to wrestle with this `Layout` business?" + +I agree with the sentiment; that's why the `Allocator` trait provides +a family of methods capturing [common usage patterns][], +for example, `a.alloc_one::()` will return a `Unique` (or error). + +## Unchecked variants + +Almost all of the methods above return `Result`, and guarantee some +amount of input validation. (This is largely because I observed code +duplication doing such validation on the client side; or worse, such +validation accidentally missing.) + +However, some clients will want to bypass such checks (and do it +without risking undefined behavior, namely by ensuring the method preconditions +hold via local invariants in their container type). + +For these clients, the `Allocator` trait provides +["unchecked" variants][unchecked variants] of nearly all of its +methods; so `a.alloc_unchecked(layout)` will return an `Option
<Address>`
+(where `None` corresponds to allocation failure).
+
+The idea here is that `Allocator` implementors are encouraged
+to streamline the implementations of such methods by assuming that all
+of the preconditions hold.
+
+ * However, to ease initial `impl Allocator` development for a given
+   type, all of the unchecked methods have default implementations
+   that call out to their checked counterparts.
+
+ * (In other words, "unchecked" is in some sense a privilege being
+   offered to impls; but there is no guarantee that an arbitrary impl
+   takes advantage of the privilege.)
+
+## Object-oriented Allocators
+
+Finally, we get to object-oriented programming.
+
+In general, we expect allocator-parametric code to opt *not* to use
+trait objects to generalize over allocators, but instead to use
+generic types and instantiate those types with specific concrete
+allocators.
+
+Nonetheless, it *is* an option to write `Box<Allocator>` or `&Allocator`.
+
+ * (The allocator methods that are not object-safe, like
+   `fn alloc_one<T>(&mut self)`, have a clause `where Self: Sized` to
+   ensure that their presence does not cause the `Allocator` trait as
+   a whole to become non-object-safe.)
+
+
+## Why this API
+[Why this API]: #why-this-api
+
+Here are some quick points about how this API was selected:
+
+### Why not just `free(ptr)` for deallocation?
+
+As noted in [RFC PR 39][] (and reiterated in [RFC PR 244][]), the basic `malloc` interface
+{`malloc(size) -> ptr`, `free(ptr)`, `realloc(ptr, size) -> ptr`} is
+lacking in a number of ways: `malloc` lacks the ability to request a
+particular alignment, and `realloc` lacks the ability to express a
+copy-free "reuse the input, or do nothing at all" request. Another
+problem with the `malloc` interface is that it burdens the allocator
+with tracking the sizes of allocated data and re-extracting the
+allocated size from the `ptr` in `free` and `realloc` calls (the
+latter can be very cheap, but there is still no reason to pay that
+cost in a language like Rust where the relevant size is often already
+immediately available as a compile-time constant).
+
+Therefore, in the name of (potential best-case) speed, we want to
+require client code to provide the metadata like size and alignment
+to both the allocation and deallocation call sites.
+
+### Why not just `alloc`/`dealloc` (or `alloc`/`dealloc`/`realloc`)?
+
+* The `alloc_one`/`dealloc_one` and `alloc_array`/`dealloc_array`
+  methods capture a very common pattern for allocation of memory blocks where
+  a simple value or array type is being allocated.
+
+* The `alloc_array_unchecked` and `dealloc_array_unchecked` methods likewise
+  capture a common pattern, but are "less safe" in that they put more
+  of an onus on the caller to validate the input parameters before
+  calling the methods.
+
+* The `alloc_excess` and `realloc_excess` methods provide a way for
+  callers who can make use of excess memory to avoid unnecessary calls
+  to `realloc`.
+
+### Why the `Layout` abstraction?
+
+While we do want to require clients to hand the allocator the size and
+alignment, we have found that the code to compute such things follows
+regular patterns. It makes more sense to factor those patterns out
+into a common abstraction; this is what `Layout` provides: a high-level
+API for describing the memory layout of a composite structure by
+composing the layout of its subparts.
+
+### Why return `Result` rather than a raw pointer?
+ +My hypothesis is that the standard allocator API should embrace +`Result` as the standard way for describing local error conditions in +Rust. + + * A previous version of this RFC attempted to ensure that the use of + the `Result` type could avoid any additional overhead over a raw + pointer return value, by using a `NonZero` address type and a + zero-sized error type attached to the trait via an associated + `Error` type. But during the RFC process we decided that this + was not necessary. + +### Why return `Result` rather than directly `oom` on failure + +Again, my hypothesis is that the standard allocator API should embrace +`Result` as the standard way for describing local error conditions in +Rust. + +I want to leave it up to the clients to decide if they can respond to +out-of-memory (OOM) conditions on allocation failure. + +However, since I also suspect that some programs would benefit from +contextual information about *which* allocator is reporting memory +exhaustion, I have made `oom` a method of the `Allocator` trait, so +that allocator clients have the option of calling that on error. + +### Why is `usable_size` ever needed? Why not call `layout.size()` directly, as is done in the default implementation? + +`layout.size()` returns the minimum required size that the client needs. +In a block-based allocator, this may be less than the *actual* size +that the allocator would ever provide to satisfy that kind of +request. Therefore, `usable_size` provides a way for clients to +observe what the minimum actual size of an allocated block for +that`layout` would be, for a given allocator. + +(Note that the documentation does say that in general it is better for +clients to use `alloc_excess` and `realloc_excess` instead, if they +can, as a way to directly observe the *actual* amount of slop provided +by the particular allocator.) + +### Why is `Allocator` an `unsafe trait`? + +It just seems like a good idea given how much of the standard library +is going to assume that allocators are implemented according to their +specification. + +(I had thought that `unsafe fn` for the methods would suffice, but +that is putting the burden of proof (of soundness) in the *wrong* +direction...) + +## The GC integration strategy +[gc integration]: #the-gc-integration-strategy + +One of the main reasons that [RFC PR 39] was not merged as written +was because it did not account for garbage collection (GC). + +In particular, assuming that we eventually add support for GC in some +form, then any value that holds a reference to an object on the GC'ed +heap will need some linkage to the GC. In particular, if the *only* +such reference (i.e. the one with sole ownership) is held in a block +managed by a user-defined allocator, then we need to ensure that all +such references are found when the GC does its work. + +The Rust project has control over the `libstd` provided allocators, so +the team can adapt them as necessary to fit the needs of whatever GC +designs come around. But the same is not true for user-defined +allocators: we want to ensure that adding support for them does not +inadvertantly kill any chance for adding GC later. + +### The inspiration for Layout + +Some aspects of the design of this RFC were selected in the hopes that +it would make such integration easier. In particular, the introduction +of the relatively high-level `Kind` abstraction was developed, in +part, as a way that a GC-aware allocator would build up a tracing +method associated with a layout. 
+ +Then I realized that the `Kind` abstraction may be valuable on its +own, without GC: It encapsulates important patterns when working with +representing data as memory records. + +(Later we decided to rename `Kind` to `Layout`, in part to avoid +confusion with the use of the word "kind" in the context of +higher-kinded types (HKT).) + +So, this RFC offers the `Layout` abstraction without promising that it +solves the GC problem. (It might, or it might not; we don't know yet.) + +### Forwards-compatibility + +So what *is* the solution for forwards-compatibility? + +It is this: Rather than trying to build GC support into the +`Allocator` trait itself, we instead assume that when GC support +comes, it may come with a new trait (call it `GcAwareAllocator`). + + * (Perhaps we will instead use an attribute; the point is, whatever + option we choose can be incorporated into the meta-data for a + crate.) + +Allocators that are are GC-compatible have to explicitly declare +themselves as such, by implementing `GcAwareAllocator`, which will +then impose new conditions on the methods of `Allocator`, for example +ensuring e.g. that allocated blocks of memory can be scanned +(i.e. "parsed") by the GC (if that in fact ends up being necessary). + +This way, we can deploy an `Allocator` trait API today that does not +provide the necessary reflective hooks that a GC would need to access. + +Crates that define their own `Allocator` implementations without also +claiming them to be GC-compatible will be forbidden from linking with +crates that require GC support. (In other words, when GC support +comes, we assume that the linking component of the Rust compiler will +be extended to check such compatibility requirements.) + +# Drawbacks +[drawbacks]: #drawbacks + +The API may be over-engineered. + +The core set of methods (the ones without `unchecked`) return +`Result` and potentially impose unwanted input validation overhead. + + * The `_unchecked` variants are intended as the response to that, + for clients who take care to validate the many preconditions + themselves in order to minimize the allocation code paths. + +# Alternatives +[alternatives]: #alternatives + +## Just adopt [RFC PR 39][] with this RFC's GC strategy + +The GC-compatibility strategy described here (in [gc integration][]) +might work with a large number of alternative designs, such as that +from [RFC PR 39][]. + +While that is true, it seems like it would be a little short-sighted. +In particular, I have neither proven *nor* disproven the value of +`Layout` system described here with respect to GC integration. + +As far as I know, it is the closest thing we have to a workable system +for allowing client code of allocators to accurately describe the +layout of values they are planning to allocate, which is the main +ingredient I believe to be necessary for the kind of dynamic +reflection that a GC will require of a user-defined allocator. + +## Make `Layout` an associated type of `Allocator` trait + +I explored making an `AllocLayout` bound and then having + +```rust +pub unsafe trait Allocator { + /// Describes the sort of records that this allocator can + /// construct. + type Layout: AllocLayout; + + ... +} +``` + +Such a design might indeed be workable. (I found it awkward, which is +why I abandoned it.) + +But the question is: What benefit does it bring? 
+
+The main one I could imagine is that it might allow us to introduce a
+division, at the type-system level, between two kinds of allocators:
+those that are integrated with the GC (i.e., have an associated
+`Allocator::Layout` that ensures that all allocated blocks are scannable
+by a GC) and allocators that are *not* integrated with the GC (i.e.,
+have an associated `Allocator::Layout` that makes no guarantees about
+how the allocated blocks could be scanned).
+
+However, no such design has proven itself to be "obviously feasible to
+implement," and therefore it would be unreasonable to make the `Layout`
+an associated type of the `Allocator` trait without having at least a
+few motivating examples that *are* clearly feasible and useful.
+
+## Variations on the `Layout` API
+
+ * Should `Layout` offer a `fn resize(&self, new_size: usize) -> Layout` constructor method?
+   (Such a method would rule out deriving GC tracers from layouts; but we could
+   maybe provide it as an `unsafe` method.)
+
+ * Should `Layout` ensure an invariant that its associated size is
+   always a multiple of its alignment?
+
+   * Doing this would allow simplifying a small part of the API,
+     namely the distinct `Layout::repeat` (returns both a layout and an
+     offset) versus `Layout::array` (where the offset is derivable from
+     the input `T`).
+
+   * Such a constraint would have precedent; in particular, the
+     `aligned_alloc` function of C11 requires the given size
+     be a multiple of the alignment.
+
+   * On the other hand, both the system and jemalloc allocators seem
+     to support more flexible allocation patterns. Imposing the above
+     invariant implies a certain loss of expressiveness over what we
+     already provide today.
+
+ * Should `Layout` ensure an invariant that its associated size is always positive?
+
+   * Pro: Removes something that allocators would need to check about
+     input layouts (the backing memory allocators will tend to require
+     that the input sizes are positive).
+
+   * Con: Requiring positive size means that zero-sized types do not have an associated
+     `Layout`. That's not the end of the world, but it does make the `Layout` API slightly
+     less convenient (e.g. one cannot use `extend` with a zero-sized layout to
+     forcibly inject padding, because zero-sized layouts do not exist).
+
+ * Should `Layout::align_to` add padding to the associated size? (Probably not; this would
+   make it impossible to express certain kinds of patterns.)
+
+ * Should the `Layout` methods that might "fail" return `Result` instead of `Option`?
+
+## Variations on the `Allocator` API
+
+ * Should the allocator methods take `&self` or `self` rather than `&mut self`?
+
+   As noted in the RFC comments, nearly every trait goes through a bit
+   of an identity crisis in terms of deciding what kind of `self` parameter is
+   appropriate.
+
+   The justification for `&mut self` is this:
+
+   * It does not restrict allocator implementors from making sharable allocators:
+     to do so, just do `impl<'a> Allocator for &'a MySharedAlloc`, as illustrated
+     in the `DumbBumpPool` example.
+
+   * `&mut self` is better than `&self` for simple allocators that are *not* sharable.
+     `&mut self` ensures that the allocation methods have exclusive
+     access to the underlying allocator state, without resorting to a
+     lock. (Another way of looking at it: It moves the onus of using a
+     lock outward, to the allocator clients.)
+ + * One might think that the points made + above apply equally well to `self` (i.e., if you want to implement an allocator + that wants to take itself via a `&mut`-reference when the methods take `self`, + then do `impl<'a> Allocator for &'a mut MyUniqueAlloc`). + + However, the problem with `self` is that if you want to use an + allocator for *more than one* allocation, you will need to call + `clone()` (or make the allocator parameter implement + `Copy`). This means in practice all allocators will need to + support `Clone` (and thus support sharing in general, as + discussed in the [Allocators and lifetimes][lifetimes] section). + + (Remember, I'm thinking about allocator-parametric code like + `Vec`, which does not know if the `A` is a + `&mut`-reference. In that context, therefore one cannot assume + that reborrowing machinery is available to the client code.) + + Put more simply, requiring that allocators implement `Clone` means + that it will *not* be pratical to do + `impl<'a> Allocator for &'a mut MyUniqueAlloc`. + + By using `&mut self` for the allocation methods, we can encode + the expected use case of an *unshared* allocator that is used + repeatedly in a linear fashion (e.g. vector that needs to + reallocate its backing storage). + + * Should the types representing allocated storage have lifetimes attached? + (E.g. `fn alloc<'a>(&mut self, layout: &alloc::Layout) -> Address<'a>`.) + + I think Gankro [put it best](https://github.com/rust-lang/rfcs/pull/1398#issuecomment-164003160): + + > This is a low-level unsafe interface, and the expected usecases make it + > both quite easy to avoid misuse, and impossible to use lifetimes + > (you want a struct to store the allocator and the allocated elements). + > Any time we've tried to shove more lifetimes into these kinds of + > interfaces have just been an annoying nuisance necessitating + > copy-lifetime/transmute nonsense. + + * Should `Allocator::alloc` be safe instead of `unsafe fn`? + + * Clearly `fn dealloc` and `fn realloc` need to be `unsafe`, since + feeding in improper inputs could cause unsound behavior. But is + there any analogous input to `fn alloc` that could cause + unsoundness (assuming that the `Layout` struct enforces invariants + like "the associated size is non-zero")? + + * (I left it as `unsafe fn alloc` just to keep the API uniform with + `dealloc` and `realloc`.) + + * Should `Allocator::realloc` not require that `new_layout.align()` + evenly divide `layout.align()`? In particular, it is not too + expensive to check if the two layouts are not compatible, and fall + back on `alloc`/`dealloc` in that case. + + * Should `Allocator` not provide unchecked variants on `fn alloc`, + `fn realloc`, et cetera? (To me it seems having them does no harm, + apart from potentially misleading clients who do not read the + documentation about what scenarios yield undefined behavior. + + * Another option here would be to provide a `trait + UncheckedAllocator: Allocator` that carries the unchecked + methods, so that clients who require such micro-optimized paths + can ensure that their clients actually pass them an + implementation that has the checks omitted. + + * On the flip-side of the previous bullet, should `Allocator` provide + `fn alloc_one_unchecked` and `fn dealloc_one_unchecked` ? + I think the only check that such variants would elide would be that + `T` is not zero-sized; I'm not sure that's worth it. + (But the resulting uniformity of the whole API might shift the + balance to "worth it".) 
+
+ * Should the precondition of allocation methods be loosened to
+   accept zero-sized types?
+
+   Right now, there is a requirement that the allocation requests
+   denote non-zero sized types (this requirement is encoded in two
+   ways: for `Layout`-consuming methods like `alloc`, it is enforced
+   via the invariant that the `Size` is a `NonZero`, and this is
+   enforced by checks in the `Layout` construction code; for the
+   convenience methods like `alloc_one`, they will return `Err` if the
+   allocation request is zero-sized).
+
+   The main motivation for this restriction is that some underlying system
+   allocators, like `jemalloc`, explicitly disallow zero-sized
+   inputs. Therefore, to remove all unnecessary control-flow branches
+   between the client and the underlying allocator, the `Allocator`
+   trait is bubbling that restriction up and imposing it onto the
+   clients, who will presumably enforce this invariant via
+   container-specific means.
+
+   But: pre-existing container types (like `Vec<T>`) already
+   *allow* zero-sized `T`. Therefore, there is an unfortunate mismatch
+   between the ideal API those containers would prefer for their
+   allocators and the actual service that this `Allocator` trait is
+   providing.
+
+   So: Should we lift this precondition of the allocation methods, and allow
+   zero-sized requests (which might be handled by a global sentinel value, or
+   by an allocator-specific sentinel value, or via some other means -- this
+   would have to be specified as part of the Allocator API)?
+
+   (As a middle ground, we could lift the precondition solely for the convenience
+   methods like `fn alloc_one` and `fn alloc_array`; that way, the most low-level
+   methods like `fn alloc` would continue to minimize the overhead they add
+   over the underlying system allocator, while the convenience methods would truly
+   be convenient.)
+
+ * Should `oom` be a free-function rather than a method on `Allocator`?
+   (The reason I want it on `Allocator` is so that it can provide feedback
+   about the allocator's state at the time of the OOM. Zoxc has argued
+   on the RFC thread that some forms of static analysis, to prove `oom` is
+   never invoked, would prefer it to be a free function.)
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+ * Since we cannot do `RefCell<Pool>` (see FIXME above), what is
+   our standard recommendation for what to do instead?
+
+ * Should `Layout` be an associated type of `Allocator`? (See the
+   [alternatives][] section for discussion.)
+   (In fact, most of the "Variations" sections above correspond to
+   potentially unresolved questions.)
+
+ * Are the type definitions for `Size`, `Capacity`, `Alignment`, and
+   `Address` an abuse of the `NonZero` type? (Or do we just need some
+   constructor for `NonZero` that asserts that the input is non-zero?)
+
+ * Do we need `Allocator::max_size` and `Allocator::max_align`?
+
+ * Should the default impl of `Allocator::max_align` return `None`, or is
+   there a more suitable default? (Perhaps e.g. `PLATFORM_PAGE_SIZE`?)
+
+   The previous allocator documentation provided by Daniel Micay
+   suggests that we should specify that the behavior is unspecified if the
+   allocation is too large; but if that is the case, then we should
+   definitely provide some way to *observe* that threshold.
+
+   From what I can tell, we cannot currently assume that all
+   low-level allocators will behave well for large alignments.
+ See https://github.com/rust-lang/rust/issues/30170 + + * Should `Allocator::oom` also take a `std::fmt::Arguments<'a>` parameter + so that clients can feed in context-specific information that is not + part of the original input `Layout` argument? (I have not done this + mainly because I do not want to introduce a dependency on `libstd`.) + +# Change History + +* Changed `fn usable_size` to return `(l, m)` rathern than just `m`. + +* Removed `fn is_transient` from `trait AllocError`, and removed discussion + of transient errors from the API. + +* Made `fn dealloc` method infallible (i.e. removed its `Result` return type). + +* Alpha-renamed `alloc::Kind` type to `alloc::Layout`, and made it non-`Copy`. + +* Revised `fn oom` method to take the `Self::Error` as an input (so that the + allocator can, indirectly, feed itself information about what went wrong). + +* Removed associated `Error` type from `Allocator` trait; all methods now use `AllocErr` + for error type. Removed `AllocError` trait and `MemoryExhausted` error. + +* Removed `fn max_size` and `fn max_align` methods; we can put them back later if + someone demonstrates a need for them. + +* Added `fn realloc_in_place`. + +* Removed uses of `NonZero`. Made `Layout` able to represent zero-sized layouts. + A given `Allocator` may or may not support zero-sized layouts. + +# Appendices + +## Bibliography +[Bibliography]: #bibliography + +### RFC Pull Request #39: Allocator trait +[RFC PR 39]: https://github.com/rust-lang/rfcs/pull/39/files + +Daniel Micay, 2014. RFC: Allocator trait. https://github.com/thestinger/rfcs/blob/ad4cdc2662cc3d29c3ee40ae5abbef599c336c66/active/0000-allocator-trait.md + +### RFC Pull Request #244: Allocator RFC, take II +[RFC PR 244]: https://github.com/rust-lang/rfcs/pull/244 + +Felix Klock, 2014, Allocator RFC, take II, https://github.com/pnkfelix/rfcs/blob/d3c6068e823f495ee241caa05d4782b16e5ef5d8/active/0000-allocator.md + +### Dynamic Storage Allocation: A Survey and Critical Review +Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles, 1995. [Dynamic Storage Allocation: A Survey and Critical Review](https://parasol.tamu.edu/~rwerger/Courses/689/spring2002/day-3-ParMemAlloc/papers/wilson95dynamic.pdf) ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps . Slightly modified version appears in Proceedings of 1995 International Workshop on Memory Management (IWMM '95), Kinross, Scotland, UK, September 27--29, 1995 Springer Verlag LNCS + +### Reconsidering custom memory allocation +[ReCustomMalloc]: http://dl.acm.org/citation.cfm?id=582421 + +Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. 2002. [Reconsidering custom memory allocation][ReCustomMalloc]. In Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA '02). + +### The memory fragmentation problem: solved? +[MemFragSolvedP]: http://dl.acm.org/citation.cfm?id=286864 + +Mark S. Johnstone and Paul R. Wilson. 1998. [The memory fragmentation problem: solved?][MemFragSolvedP]. In Proceedings of the 1st international symposium on Memory management (ISMM '98). + +### EASTL: Electronic Arts Standard Template Library +[EASTL]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html + +Paul Pedriana. 2007. [EASTL] -- Electronic Arts Standard Template Library. Document number: N2271=07-0131 + +### Towards a Better Allocator Model +[Halpern proposal]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1850.pdf + +Pablo Halpern. 2005. 
[Towards a Better Allocator Model][Halpern proposal]. Document number: N1850=05-0110 + +### Various allocators + +[jemalloc], [tcmalloc], [Hoard] + +[jemalloc]: http://www.canonware.com/jemalloc/ + +[tcmalloc]: http://goog-perftools.sourceforge.net/doc/tcmalloc.html + +[Hoard]: http://www.hoard.org/ + +[tracing garbage collector]: http://en.wikipedia.org/wiki/Tracing_garbage_collection + +[malloc/free]: http://en.wikipedia.org/wiki/C_dynamic_memory_allocation + +## ASCII art version of Allocator message sequence chart +[ascii-art]: #ascii-art-version-of-allocator-message-sequence-chart + +This is an ASCII art version of the SVG message sequence chart +from the [semantics of allocators] section. + +``` +Program Vec Allocator + || + || + +--------------- create allocator -------------------> ** (an allocator is born) + *| <------------ return allocator A ---------------------+ + || | + || | + +- create vec w/ &mut A -> ** (a vec is born) | + *| <------return vec V ------+ | + || | | + *------- push W_1 -------> *| | + | || | + | || | + | +--- allocate W array ---> *| + | | || + | | || + | | +---- (request system memory if necessary) + | | *| <-- ... + | | || + | *| <--- return *W block -----+ + | || | + | || | + *| <------- (return) -------+| | + || | | + +------- push W_2 -------->+| | + | || | + *| <------- (return) -------+| | + || | | + +------- push W_3 -------->+| | + | || | + *| <------- (return) -------+| | + || | | + +------- push W_4 -------->+| | + | || | + *| <------- (return) -------+| | + || | | + +------- push W_5 -------->+| | + | || | + | +---- realloc W array ---> *| + | | || + | | || + | | +---- (request system memory if necessary) + | | *| <-- ... + | | || + | *| <--- return *W block -----+ + *| <------- (return) -------+| | + || | | + || | | + . . . + . . . + . . . + || | | + || | | + || (end of Vec scope) | | + || | | + +------ drop Vec --------> *| | + | || (Vec destructor) | + | || | + | +---- dealloc W array --> *| + | | || + | | +---- (potentially return system memory) + | | *| <-- ... + | | || + | *| <------- (return) --------+ + *| <------- (return) --------+ | + || | + || | + || | + || (end of Allocator scope) | + || | + +------------------ drop Allocator ------------------> *| + | || + | |+---- (return any remaining associated memory) + | *| <-- ... + | || + *| <------------------ (return) -------------------------+ + || + || + . + . + . +``` + + +## Transcribed Source for Allocator trait API +[Source for Allocator]: #transcribed-source-for-allocator-trait-api + +Here is the whole source file for my prototype allocator API, +sub-divided roughly accordingly to functionality. + +(We start with the usual boilerplate...) + +```rust +// Copyright 2015 The Rust Project Developers. See the COPYRIGHT +// file at the top-level directory of this distribution and at +// http://rust-lang.org/COPYRIGHT. +// +// Licensed under the Apache License, Version 2.0 or the MIT license +// , at your +// option. This file may not be copied, modified, or distributed +// except according to those terms. 
+ +#![unstable(feature = "allocator_api", + reason = "the precise API and guarantees it provides may be tweaked \ + slightly, especially to possibly take into account the \ + types being stored to make room for a future \ + tracing garbage collector", + issue = "27700")] + +use core::cmp; +use core::mem; +use core::nonzero::NonZero; +use core::ptr::{self, Unique}; + +``` + +### Type Aliases +[type aliases]: #type-aliases + +```rust +pub type Size = usize; +pub type Capacity = usize; +pub type Alignment = usize; + +pub type Address = *mut u8; + +/// Represents the combination of a starting address and +/// a total capacity of the returned block. +pub struct Excess(Address, Capacity); + +fn size_align() -> (usize, usize) { + (mem::size_of::(), mem::align_of::()) +} + +``` + +### Layout API +[layout api]: #layout-api + +```rust +/// Category for a memory record. +/// +/// An instance of `Layout` describes a particular layout of memory. +/// You build a `Layout` up as an input to give to an allocator. +/// +/// All layouts have an associated non-negative size and positive alignment. +#[derive(Clone, Debug, PartialEq, Eq)] +pub struct Layout { + // size of the requested block of memory, measured in bytes. + size: Size, + // alignment of the requested block of memory, measured in bytes. + // we ensure that this is always a power-of-two, because API's + ///like `posix_memalign` require it and it is a reasonable + // constraint to impose on Layout constructors. + // + // (However, we do not analogously require `align >= sizeof(void*)`, + // even though that is *also* a requirement of `posix_memalign`.) + align: Alignment, +} + + +// FIXME: audit default implementations for overflow errors, +// (potentially switching to overflowing_add and +// overflowing_mul as necessary). + +impl Layout { + // (private constructor) + fn from_size_align(size: usize, align: usize) -> Layout { + assert!(align.is_power_of_two()); + assert!(align > 0); + Layout { size: size, align: align } + } + + /// The minimum size in bytes for a memory block of this layout. + pub fn size(&self) -> usize { self.size } + + /// The minimum byte alignment for a memory block of this layout. + pub fn align(&self) -> usize { self.align } + + /// Constructs a `Layout` suitable for holding a value of type `T`. + pub fn new() -> Self { + let (size, align) = size_align::(); + Layout::from_size_align(size, align) + } + + /// Produces layout describing a record that could be used to + /// allocate backing structure for `T` (which could be a trait + /// or other unsized type like a slice). + pub fn for_value(t: &T) -> Self { + let (size, align) = (mem::size_of_val(t), mem::align_of_val(t)); + Layout::from_size_align(size, align) + } + + /// Creates a layout describing the record that can hold a value + /// of the same layout as `self`, but that also is aligned to + /// alignment `align` (measured in bytes). + /// + /// If `self` already meets the prescribed alignment, then returns + /// `self`. + /// + /// Note that this method does not add any padding to the overall + /// size, regardless of whether the returned layout has a different + /// alignment. In other words, if `K` has size 16, `K.align_to(32)` + /// will *still* have size 16. + pub fn align_to(&self, align: Alignment) -> Self { + if align > self.align { + let pow2_align = align.checked_next_power_of_two().unwrap(); + debug_assert!(pow2_align > 0); // (this follows from self.align > 0...) 
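+            // (Rounding `align` up to a power of two here -- a no-op when it
+            // already is one -- preserves the `Layout` invariant that `align`
+            // is always a power of two; the size is deliberately left
+            // unchanged, as the doc comment above notes.)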
+ Layout { align: pow2_align, + ..*self } + } else { + self.clone() + } + } + + /// Returns the amount of padding we must insert after `self` + /// to ensure that the following address will satisfy `align` + /// (measured in bytes). + /// + /// Behavior undefined if `align` is not a power-of-two. + /// + /// Note that in practice, this is only useable if `align <= + /// self.align` otherwise, the amount of inserted padding would + /// need to depend on the particular starting address for the + /// whole record, because `self.align` would not provide + /// sufficient constraint. + pub fn padding_needed_for(&self, align: Alignment) -> usize { + debug_assert!(align <= self.align()); + let len = self.size(); + let len_rounded_up = (len + align - 1) & !(align - 1); + return len_rounded_up - len; + } + + /// Creates a layout describing the record for `n` instances of + /// `self`, with a suitable amount of padding between each to + /// ensure that each instance is given its requested size and + /// alignment. On success, returns `(k, offs)` where `k` is the + /// layout of the array and `offs` is the distance between the start + /// of each element in the array. + /// + /// On arithmetic overflow, returns `None`. + pub fn repeat(&self, n: usize) -> Option<(Self, usize)> { + let padded_size = match self.size.checked_add(self.padding_needed_for(self.align)) { + None => return None, + Some(padded_size) => padded_size, + }; + let alloc_size = match padded_size.checked_mul(n) { + None => return None, + Some(alloc_size) => alloc_size, + }; + Some((Layout::from_size_align(alloc_size, self.align), padded_size)) + } + + /// Creates a layout describing the record for `self` followed by + /// `next`, including any necessary padding to ensure that `next` + /// will be properly aligned. Note that the result layout will + /// satisfy the alignment properties of both `self` and `next`. + /// + /// Returns `Some((k, offset))`, where `k` is layout of the concatenated + /// record and `offset` is the relative location, in bytes, of the + /// start of the `next` embedded witnin the concatenated record + /// (assuming that the record itself starts at offset 0). + /// + /// On arithmetic overflow, returns `None`. + pub fn extend(&self, next: Self) -> Option<(Self, usize)> { + let new_align = cmp::max(self.align, next.align); + let realigned = Layout { align: new_align, ..*self }; + let pad = realigned.padding_needed_for(new_align); + let offset = self.size() + pad; + let new_size = offset + next.size(); + Some((Layout::from_size_align(new_size, new_align), offset)) + } + + /// Creates a layout describing the record for `n` instances of + /// `self`, with no padding between each instance. + /// + /// On arithmetic overflow, returns `None`. + pub fn repeat_packed(&self, n: usize) -> Option { + let scaled = match self.size().checked_mul(n) { + None => return None, + Some(scaled) => scaled, + }; + let size = { assert!(scaled > 0); scaled }; + Some(Layout { size: size, align: self.align }) + } + + /// Creates a layout describing the record for `self` followed by + /// `next` with no additional padding between the two. Since no + /// padding is inserted, the alignment of `next` is irrelevant, + /// and is not incoporated *at all* into the resulting layout. + /// + /// Returns `(k, offset)`, where `k` is layout of the concatenated + /// record and `offset` is the relative location, in bytes, of the + /// start of the `next` embedded witnin the concatenated record + /// (assuming that the record itself starts at offset 0). 
+ /// + /// (The `offset` is always the same as `self.size()`; we use this + /// signature out of convenience in matching the signature of + /// `fn extend`.) + /// + /// On arithmetic overflow, returns `None`. + pub fn extend_packed(&self, next: Self) -> Option<(Self, usize)> { + let new_size = match self.size().checked_add(next.size()) { + None => return None, + Some(new_size) => new_size, + }; + Some((Layout { size: new_size, ..*self }, self.size())) + } + + // Below family of methods *assume* inputs are pre- or + // post-validated in some manner. (The implementations here + ///do indirectly validate, but that is not part of their + /// specification.) + // + // Since invalid inputs could yield ill-formed layouts, these + // methods are `unsafe`. + + /// Creates layout describing the record for a single instance of `T`. + pub unsafe fn new_unchecked() -> Self { + let (size, align) = size_align::(); + Layout::from_size_align(size, align) + } + + + /// Creates a layout describing the record for `self` followed by + /// `next`, including any necessary padding to ensure that `next` + /// will be properly aligned. Note that the result layout will + /// satisfy the alignment properties of both `self` and `next`. + /// + /// Returns `(k, offset)`, where `k` is layout of the concatenated + /// record and `offset` is the relative location, in bytes, of the + /// start of the `next` embedded witnin the concatenated record + /// (assuming that the record itself starts at offset 0). + /// + /// Requires no arithmetic overflow from inputs. + pub unsafe fn extend_unchecked(&self, next: Self) -> (Self, usize) { + self.extend(next).unwrap() + } + + /// Creates a layout describing the record for `n` instances of + /// `self`, with a suitable amount of padding between each. + /// + /// Requires non-zero `n` and no arithmetic overflow from inputs. + /// (See also the `fn array` checked variant.) + pub unsafe fn repeat_unchecked(&self, n: usize) -> (Self, usize) { + self.repeat(n).unwrap() + } + + /// Creates a layout describing the record for `n` instances of + /// `self`, with no padding between each instance. + /// + /// Requires non-zero `n` and no arithmetic overflow from inputs. + /// (See also the `fn array_packed` checked variant.) + pub unsafe fn repeat_packed_unchecked(&self, n: usize) -> Self { + self.repeat_packed(n).unwrap() + } + + /// Creates a layout describing the record for `self` followed by + /// `next` with no additional padding between the two. Since no + /// padding is inserted, the alignment of `next` is irrelevant, + /// and is not incoporated *at all* into the resulting layout. + /// + /// Returns `(k, offset)`, where `k` is layout of the concatenated + /// record and `offset` is the relative location, in bytes, of the + /// start of the `next` embedded witnin the concatenated record + /// (assuming that the record itself starts at offset 0). + /// + /// (The `offset` is always the same as `self.size()`; we use this + /// signature out of convenience in matching the signature of + /// `fn extend`.) + /// + /// Requires no arithmetic overflow from inputs. + /// (See also the `fn extend_packed` checked variant.) + pub unsafe fn extend_packed_unchecked(&self, next: Self) -> (Self, usize) { + self.extend_packed(next).unwrap() + } + + /// Creates a layout describing the record for a `[T; n]`. + /// + /// On zero `n`, zero-sized `T`, or arithmetic overflow, returns `None`. 
+ pub fn array(n: usize) -> Option { + Layout::new::() + .repeat(n) + .map(|(k, offs)| { + debug_assert!(offs == mem::size_of::()); + k + }) + } + + /// Creates a layout describing the record for a `[T; n]`. + /// + /// Requires nonzero `n`, nonzero-sized `T`, and no arithmetic + /// overflow; otherwise behavior undefined. + pub fn array_unchecked(n: usize) -> Self { + Layout::array::(n).unwrap() + } + +} + +``` + +### AllocErr API +[error api]: #allocerr-api + +```rust +/// The `AllocErr` error specifies whether an allocation failure is +/// specifically due to resource exhaustion or if it is due to +/// something wrong when combining the given input arguments with this +/// allocator. +#[derive(Clone, PartialEq, Eq, Debug)] +pub enum AllocErr { + /// Error due to hitting some resource limit or otherwise running + /// out of memory. This condition strongly implies that *some* + /// series of deallocations would allow a subsequent reissuing of + /// the original allocation request to succeed. + Exhausted { request: Layout }, + + /// Error due to allocator being fundamentally incapable of + /// satisfying the original request. This condition implies that + /// such an allocation request will never succeed on the given + /// allocator, regardless of environment, memory pressure, or + /// other contextual condtions. + /// + /// For example, an allocator that does not support zero-sized + /// blocks can return this error variant. + Unsupported { details: &'static str }, +} + +impl AllocErr { + pub fn invalid_input(details: &'static str) -> Self { + AllocErr::Unsupported { details: details } + } + pub fn is_memory_exhausted(&self) -> bool { + if let AllocErr::Exhausted { .. } = *self { true } else { false } + } + pub fn is_request_unsupported(&self) -> bool { + if let AllocErr::Unsupported { .. } = *self { true } else { false } + } +} + +/// The `CannotReallocInPlace` error is used when `fn realloc_in_place` +/// was unable to reuse the given memory block for a requested layout. +#[derive(Clone, PartialEq, Eq, Debug)] +pub struct CannotReallocInPlace; + +``` + +### Allocator trait header +[trait header]: #allocator-trait-header + +```rust +/// An implementation of `Allocator` can allocate, reallocate, and +/// deallocate arbitrary blocks of data described via `Layout`. +/// +/// Some of the methods require that a layout *fit* a memory block. +/// What it means for a layout to "fit" a memory block means is that +/// the following two conditions must hold: +/// +/// 1. The block's starting address must be aligned to `layout.align()`. +/// +/// 2. The block's size must fall in the range `[use_min, use_max]`, where: +/// +/// * `use_min` is `self.usable_size(layout).0`, and +/// +/// * `use_max` is the capacity that was (or would have been) +/// returned when (if) the block was allocated via a call to +/// `alloc_excess` or `realloc_excess`. +/// +/// Note that: +/// +/// * the size of the layout most recently used to allocate the block +/// is guaranteed to be in the range `[use_min, use_max]`, and +/// +/// * a lower-bound on `use_max` can be safely approximated by a call to +/// `usable_size`. +/// +pub unsafe trait Allocator { + +``` + +### Allocator core alloc and dealloc +[alloc and dealloc]: #allocator-core-alloc-and-dealloc + +```rust + /// Returns a pointer suitable for holding data described by + /// `layout`, meeting its size and alignment guarantees. + /// + /// The returned block of storage may or may not have its contents + /// initialized. 
(Extension subtraits might restrict this + /// behavior, e.g. to ensure initialization.) + /// + /// Returning `Err` indicates that either memory is exhausted or `layout` does + /// not meet allocator's size or alignment constraints. + /// + /// Implementations are encouraged to return `Err` on memory + /// exhaustion rather than panicking or aborting, but this is + /// not a strict requirement. (Specifically: it is *legal* to use + /// this trait to wrap an underlying native allocation library + /// that aborts on memory exhaustion.) + unsafe fn alloc(&mut self, layout: Layout) -> Result; + + /// Deallocate the memory referenced by `ptr`. + /// + /// `ptr` must have previously been provided via this allocator, + /// and `layout` must *fit* the provided block (see above); + /// otherwise yields undefined behavior. + unsafe fn dealloc(&mut self, ptr: Address, layout: Layout); + + /// Allocator-specific method for signalling an out-of-memory + /// condition. + /// + /// Implementations of the `oom` method are discouraged from + /// infinitely regressing in nested calls to `oom`. In + /// practice this means implementors should eschew allocating, + /// especially from `self` (directly or indirectly). + /// + /// Implementions of this trait's allocation methods are discouraged + /// from panicking (or aborting) in the event of memory exhaustion; + /// instead they should return an appropriate error from the + /// invoked method, and let the client decide whether to invoke + /// this `oom` method. + fn oom(&mut self, _: AllocErr) -> ! { + unsafe { ::core::intrinsics::abort() } + } +``` + +### Allocator-specific quantities and limits +[quantites and limits]: #allocator-specific-quantities-and-limits + +```rust + // == ALLOCATOR-SPECIFIC QUANTITIES AND LIMITS == + // usable_size + + /// Returns bounds on the guaranteed usable size of a successful + /// allocation created with the specified `layout`. + /// + /// In particular, for a given layout `k`, if `usable_size(k)` returns + /// `(l, m)`, then one can use a block of layout `k` as if it has any + /// size in the range `[l, m]` (inclusive). + /// + /// (All implementors of `fn usable_size` must ensure that + /// `l <= k.size() <= m`) + /// + /// Both the lower- and upper-bounds (`l` and `m` respectively) are + /// provided: An allocator based on size classes could misbehave + /// if one attempts to deallocate a block without providing a + /// correct value for its size (i.e., one within the range `[l, m]`). + /// + /// Clients who wish to make use of excess capacity are encouraged + /// to use the `alloc_excess` and `realloc_excess` instead, as + /// this method is constrained to conservatively report a value + /// less than or equal to the minimum capacity for *all possible* + /// calls to those methods. + /// + /// However, for clients that do not wish to track the capacity + /// returned by `alloc_excess` locally, this method is likely to + /// produce useful results. + unsafe fn usable_size(&self, layout: &Layout) -> (Capacity, Capacity) { + (layout.size(), layout.size()) + } + +``` + +### Allocator methods for memory reuse +[memory reuse]: #allocator-methods-for-memory-reuse + +```rust + // == METHODS FOR MEMORY REUSE == + // realloc. alloc_excess, realloc_excess + + /// Returns a pointer suitable for holding data described by + /// `new_layout`, meeting its size and alignment guarantees. To + /// accomplish this, this may extend or shrink the allocation + /// referenced by `ptr` to fit `new_layout`. 
+ /// + /// * `ptr` must have previously been provided via this allocator. + /// + /// * `layout` must *fit* the `ptr` (see above). (The `new_layout` + /// argument need not fit it.) + /// + /// Behavior undefined if either of latter two constraints are unmet. + /// + /// In addition, `new_layout` should not impose a different alignment + /// constraint than `layout`. (In other words, `new_layout.align()` + /// should equal `layout.align()`.) + /// However, behavior is well-defined (though underspecified) when + /// this constraint is violated; further discussion below. + /// + /// If this returns `Ok`, then ownership of the memory block + /// referenced by `ptr` has been transferred to this + /// allocator. The memory may or may not have been freed, and + /// should be considered unusable (unless of course it was + /// transferred back to the caller again via the return value of + /// this method). + /// + /// Returns `Err` only if `new_layout` does not meet the allocator's + /// size and alignment constraints of the allocator or the + /// alignment of `layout`, or if reallocation otherwise fails. (Note + /// that did not say "if and only if" -- in particular, an + /// implementation of this method *can* return `Ok` if + /// `new_layout.align() != old_layout.align()`; or it can return `Err` + /// in that scenario, depending on whether this allocator + /// can dynamically adjust the alignment constraint for the block.) + /// + /// If this method returns `Err`, then ownership of the memory + /// block has not been transferred to this allocator, and the + /// contents of the memory block are unaltered. + unsafe fn realloc(&mut self, + ptr: Address, + layout: Layout, + new_layout: Layout) -> Result { + let (min, max) = self.usable_size(&layout); + let s = new_layout.size(); + // All Layout alignments are powers of two, so a comparison + // suffices here (rather than resorting to a `%` operation). + if min <= s && s <= max && new_layout.align() <= layout.align() { + return Ok(ptr); + } else { + let new_size = new_layout.size(); + let old_size = layout.size(); + let result = self.alloc(new_layout); + if let Ok(new_ptr) = result { + ptr::copy(ptr as *const u8, new_ptr, cmp::min(old_size, new_size)); + self.dealloc(ptr, layout); + } + result + } + } + + /// Behaves like `fn alloc`, but also returns the whole size of + /// the returned block. For some `layout` inputs, like arrays, this + /// may include extra storage usable for additional data. + unsafe fn alloc_excess(&mut self, layout: Layout) -> Result { + let usable_size = self.usable_size(&layout); + self.alloc(layout).map(|p| Excess(p, usable_size.1)) + } + + /// Behaves like `fn realloc`, but also returns the whole size of + /// the returned block. For some `layout` inputs, like arrays, this + /// may include extra storage usable for additional data. + unsafe fn realloc_excess(&mut self, + ptr: Address, + layout: Layout, + new_layout: Layout) -> Result { + let usable_size = self.usable_size(&new_layout); + self.realloc(ptr, layout, new_layout) + .map(|p| Excess(p, usable_size.1)) + } + + /// Attempts to extend the allocation referenced by `ptr` to fit `new_layout`. + /// + /// * `ptr` must have previously been provided via this allocator. + /// + /// * `layout` must *fit* the `ptr` (see above). (The `new_layout` + /// argument need not fit it.) + /// + /// Behavior undefined if either of latter two constraints are unmet. 
+    ///
+    /// If this returns `Ok`, then the allocator has asserted that the
+    /// memory block referenced by `ptr` now fits `new_layout`, and thus can
+    /// be used to carry data of that layout. (The allocator is allowed to
+    /// expend effort to accomplish this, such as extending the memory block to
+    /// include successor blocks, or virtual memory tricks.)
+    ///
+    /// If this returns `Err`, then the allocator has made no assertion
+    /// about whether the memory block referenced by `ptr` can or cannot
+    /// fit `new_layout`.
+    ///
+    /// In either case, ownership of the memory block referenced by `ptr`
+    /// has not been transferred, and the contents of the memory block
+    /// are unaltered.
+    unsafe fn realloc_in_place(&mut self,
+                               ptr: Address,
+                               layout: Layout,
+                               new_layout: Layout) -> Result<(), CannotReallocInPlace> {
+        let (_, _, _) = (ptr, layout, new_layout);
+        Err(CannotReallocInPlace)
+    }
+```
+
+### Allocator convenience methods for common usage patterns
+[common usage patterns]: #allocator-convenience-methods-for-common-usage-patterns
+
+```rust
+    // == COMMON USAGE PATTERNS ==
+    // alloc_one, dealloc_one, alloc_array, realloc_array, dealloc_array
+
+    /// Allocates a block suitable for holding an instance of `T`.
+    ///
+    /// Captures a common usage pattern for allocators.
+    ///
+    /// The returned block is suitable for passing to the
+    /// `alloc`/`realloc` methods of this allocator.
+    ///
+    /// May return `Err` for zero-sized `T`.
+    unsafe fn alloc_one<T>(&mut self) -> Result<Unique<T>, AllocErr>
+        where Self: Sized {
+        let k = Layout::new::<T>();
+        if k.size() > 0 {
+            self.alloc(k).map(|p| Unique::new(*p as *mut T))
+        } else {
+            Err(AllocErr::invalid_input("zero-sized type invalid for alloc_one"))
+        }
+    }
+
+    /// Deallocates a block suitable for holding an instance of `T`.
+    ///
+    /// The given block must have been produced by this allocator,
+    /// and must be suitable for storing a `T` (in terms of alignment
+    /// as well as minimum and maximum size); otherwise yields
+    /// undefined behavior.
+    ///
+    /// Captures a common usage pattern for allocators.
+    unsafe fn dealloc_one<T>(&mut self, mut ptr: Unique<T>)
+        where Self: Sized {
+        let raw_ptr = ptr.get_mut() as *mut T as *mut u8;
+        self.dealloc(raw_ptr, Layout::new::<T>());
+    }
+
+    /// Allocates a block suitable for holding `n` instances of `T`.
+    ///
+    /// Captures a common usage pattern for allocators.
+    ///
+    /// The returned block is suitable for passing to the
+    /// `alloc`/`realloc` methods of this allocator.
+    ///
+    /// May return `Err` for zero-sized `T` or `n == 0`.
+    ///
+    /// Always returns `Err` on arithmetic overflow.
+    unsafe fn alloc_array<T>(&mut self, n: usize) -> Result<Unique<T>, AllocErr>
+        where Self: Sized {
+        match Layout::array::<T>(n) {
+            Some(ref layout) if layout.size() > 0 => {
+                self.alloc(layout.clone())
+                    .map(|p| Unique::new(p as *mut T))
+            }
+            _ => Err(AllocErr::invalid_input("invalid layout for alloc_array")),
+        }
+    }
+
+    /// Reallocates a block previously suitable for holding `n_old`
+    /// instances of `T`, returning a block suitable for holding
+    /// `n_new` instances of `T`.
+    ///
+    /// Captures a common usage pattern for allocators.
+    ///
+    /// The returned block is suitable for passing to the
+    /// `alloc`/`realloc` methods of this allocator.
+    ///
+    /// May return `Err` for zero-sized `T` or `n == 0`.
+    ///
+    /// Always returns `Err` on arithmetic overflow.
+    unsafe fn realloc_array<T>(&mut self,
+                               ptr: Unique<T>,
+                               n_old: usize,
+                               n_new: usize) -> Result<Unique<T>, AllocErr>
+        where Self: Sized {
+        match (Layout::array::<T>(n_old), Layout::array::<T>(n_new), *ptr) {
+            (Some(ref k_old), Some(ref k_new), ptr) if k_old.size() > 0 && k_new.size() > 0 => {
+                self.realloc(ptr as *mut u8, k_old.clone(), k_new.clone())
+                    .map(|p| Unique::new(p as *mut T))
+            }
+            _ => {
+                Err(AllocErr::invalid_input("invalid layout for realloc_array"))
+            }
+        }
+    }
+
+    /// Deallocates a block suitable for holding `n` instances of `T`.
+    ///
+    /// Captures a common usage pattern for allocators.
+    unsafe fn dealloc_array<T>(&mut self, ptr: Unique<T>, n: usize) -> Result<(), AllocErr>
+        where Self: Sized {
+        let raw_ptr = *ptr as *mut u8;
+        match Layout::array::<T>(n) {
+            Some(ref k) if k.size() > 0 => {
+                Ok(self.dealloc(raw_ptr, k.clone()))
+            }
+            _ => {
+                Err(AllocErr::invalid_input("invalid layout for dealloc_array"))
+            }
+        }
+    }
+
+```
+
+### Allocator unchecked method variants
+[unchecked variants]: #allocator-unchecked-method-variants
+
+```rust
+    // UNCHECKED METHOD VARIANTS
+
+    /// Returns a pointer suitable for holding data described by
+    /// `layout`, meeting its size and alignment guarantees.
+    ///
+    /// The returned block of storage may or may not have its contents
+    /// initialized. (Extension subtraits might restrict this
+    /// behavior, e.g. to ensure initialization.)
+    ///
+    /// Returns `None` if the request is unsatisfied.
+    ///
+    /// Behavior undefined if input does not meet size or alignment
+    /// constraints of this allocator.
+    unsafe fn alloc_unchecked(&mut self, layout: Layout) -> Option<Address> {
+        // (default implementation carries checks, but impl's are free to omit them.)
+        self.alloc(layout).ok()
+    }
+
+    /// Returns a pointer suitable for holding data described by
+    /// `new_layout`, meeting its size and alignment guarantees. To
+    /// accomplish this, may extend or shrink the allocation
+    /// referenced by `ptr` to fit `new_layout`.
+    ///
+    /// (In other words, ownership of the memory block associated with
+    /// `ptr` is first transferred back to this allocator, but the
+    /// same block may or may not be transferred back as the result of
+    /// this call.)
+    ///
+    /// * `ptr` must have previously been provided via this allocator.
+    ///
+    /// * `layout` must *fit* the `ptr` (see above). (The `new_layout`
+    ///   argument need not fit it.)
+    ///
+    /// * `new_layout` must meet the allocator's size and alignment
+    ///   constraints. In addition, `new_layout.align()` must equal
+    ///   `layout.align()`. (Note that this is a stronger constraint
+    ///   than that imposed by `fn realloc`.)
+    ///
+    /// Behavior undefined if any of latter three constraints are unmet.
+    ///
+    /// If this returns `Some`, then the memory block referenced by
+    /// `ptr` may have been freed and should be considered unusable.
+    ///
+    /// Returns `None` if reallocation fails; in this scenario, the
+    /// original memory block referenced by `ptr` is unaltered.
+    unsafe fn realloc_unchecked(&mut self,
+                                ptr: Address,
+                                layout: Layout,
+                                new_layout: Layout) -> Option<Address>
+    {
+        // (default implementation carries checks, but impl's are free to omit them.)
+        self.realloc(ptr, layout, new_layout).ok()
+    }
+
+    /// Behaves like `fn alloc_unchecked`, but also returns the whole
+    /// size of the returned block.
+    unsafe fn alloc_excess_unchecked(&mut self, layout: Layout) -> Option<Excess> {
+        self.alloc_excess(layout).ok()
+    }
+
+    /// Behaves like `fn realloc_unchecked`, but also returns the
+    /// whole size of the returned block.
+    unsafe fn realloc_excess_unchecked(&mut self,
+                                       ptr: Address,
+                                       layout: Layout,
+                                       new_layout: Layout) -> Option<Excess> {
+        self.realloc_excess(ptr, layout, new_layout).ok()
+    }
+
+    /// Allocates a block suitable for holding `n` instances of `T`.
+    ///
+    /// Captures a common usage pattern for allocators.
+    ///
+    /// Requires inputs are non-zero and do not cause arithmetic
+    /// overflow, and `T` is not zero sized; otherwise yields
+    /// undefined behavior.
+    unsafe fn alloc_array_unchecked<T>(&mut self, n: usize) -> Option<Unique<T>>
+        where Self: Sized {
+        let layout = Layout::array_unchecked::<T>(n);
+        self.alloc_unchecked(layout).map(|p| Unique::new(*p as *mut T))
+    }
+
+    /// Reallocates a block suitable for holding `n_old` instances of `T`,
+    /// returning a block suitable for holding `n_new` instances of `T`.
+    ///
+    /// Captures a common usage pattern for allocators.
+    ///
+    /// Requires inputs are non-zero and do not cause arithmetic
+    /// overflow, and `T` is not zero sized; otherwise yields
+    /// undefined behavior.
+    unsafe fn realloc_array_unchecked<T>(&mut self,
+                                         ptr: Unique<T>,
+                                         n_old: usize,
+                                         n_new: usize) -> Option<Unique<T>>
+        where Self: Sized {
+        let (k_old, k_new, ptr) = (Layout::array_unchecked::<T>(n_old),
+                                   Layout::array_unchecked::<T>(n_new),
+                                   *ptr);
+        self.realloc_unchecked(ptr as *mut u8, k_old, k_new)
+            .map(|p| Unique::new(*p as *mut T))
+    }
+
+    /// Deallocates a block suitable for holding `n` instances of `T`.
+    ///
+    /// Captures a common usage pattern for allocators.
+    ///
+    /// Requires inputs are non-zero and do not cause arithmetic
+    /// overflow, and `T` is not zero sized; otherwise yields
+    /// undefined behavior.
+    unsafe fn dealloc_array_unchecked<T>(&mut self, ptr: Unique<T>, n: usize)
+        where Self: Sized {
+        let layout = Layout::array_unchecked::<T>(n);
+        self.dealloc(*ptr as *mut u8, layout);
+    }
+}
+```
diff --git a/text/1399-repr-pack.md b/text/1399-repr-pack.md
new file mode 100644
index 00000000000..c165ace4bce
--- /dev/null
+++ b/text/1399-repr-pack.md
@@ -0,0 +1,97 @@
+- Feature Name: `repr_packed`
+- Start Date: 2015-12-06
+- RFC PR: [rust-lang/rfcs#1399](https://github.com/rust-lang/rfcs/pull/1399)
+- Rust Issue: [rust-lang/rust#33158](https://github.com/rust-lang/rust/issues/33158)
+
+# Summary
+[summary]: #summary
+
+Extend the existing `#[repr]` attribute on structs with a `packed = "N"` option to
+specify a custom packing for `struct` types.
+
+# Motivation
+[motivation]: #motivation
+
+Many C/C++ compilers allow a packing to be specified for structs which
+effectively lowers the alignment for a struct and its fields (for example with
+MSVC there is `#pragma pack(N)`). Such packing is used extensively in certain
+C/C++ libraries (such as the Windows API, which uses it pervasively, making
+writing Rust libraries such as `winapi` challenging).
+
+At the moment the only way to work around the lack of a proper
+`#[repr(packed = "N")]` attribute is to use `#[repr(packed)]` and then manually
+fill in padding, which is a burdensome task.
Even then that isn't quite right +because the overall alignment of the struct would end up as 1 even though it +needs to be N (or the default if that is smaller than N), so this fills in a gap +which is impossible to do in Rust at the moment. + +# Detailed design +[design]: #detailed-design + +The `#[repr]` attribute on `struct`s will be extended to include a form such as: + +```rust +#[repr(packed = "2")] +struct LessAligned(i16, i32); +``` + +This structure will have an alignment of 2 and a size of 6, as well as the +second field having an offset of 2 instead of 4 from the base of the struct. +This is in contrast to without the attribute where the structure would have an +alignment of 4 and a size of 8, and the second field would have an offset of 4 +from the base of the struct. + +Syntactically, the `repr` meta list will be extended to accept a meta item +name/value pair with the name "packed" and the value as a string which can be +parsed as a `u64`. The restrictions on where this attribute can be placed along +with the accepted values are: + +* Custom packing can only be specified on `struct` declarations for now. + Specifying a different packing on perhaps `enum` or `type` definitions should + be a backwards-compatible extension. +* Packing values must be a power of two. + +By specifying this attribute, the alignment of the struct would be the smaller +of the specified packing and the default alignment of the struct. The alignments +of each struct field for the purpose of positioning fields would also be the +smaller of the specified packing and the alignment of the type of that field. If +the specified packing is greater than or equal to the default alignment of the +struct, then the alignment and layout of the struct should be unaffected. + +When combined with `#[repr(C)]` the size alignment and layout of the struct +should match the equivalent struct in C. + +`#[repr(packed)]` and `#[repr(packed = "1")]` should have identical behavior. + +Because this lowers the effective alignment of fields in the same way that +`#[repr(packed)]` does (which caused [issue #27060][gh27060]), while accessing a +field should be safe, borrowing a field should be unsafe. + +Specifying `#[repr(packed)]` and `#[repr(packed = "N")]` where N is not 1 should +result in an error. + +Specifying `#[repr(packed = "A")]` and `#[repr(align = "B")]` should still pack +together fields with the packing specified, but then increase the overall +alignment to the alignment specified. Depends on [RFC #1358][rfc1358] landing. + +# Drawbacks +[drawbacks]: #drawbacks + +# Alternatives +[alternatives]: #alternatives + +* The alternative is not doing this and forcing people to continue using + `#[repr(packed)]` with manual padding, although such structs would always have + an alignment of 1 which is often wrong. +* Alternatively a new attribute could be used such as `#[pack]`. + +# Unresolved questions +[unresolved]: #unresolved-questions + +* The behavior specified here should match the behavior of MSVC at least. Does + it match the behavior of other C/C++ compilers as well? +* Should it still be safe to borrow fields whose alignment is less than or equal + to the specified packing or should all field borrows be unsafe? 
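+
+As a concrete illustration of the borrow question above (the struct, fields,
+and function below are purely for exposition and are not part of the proposal):
+
+```rust
+#[repr(packed = "2")]
+struct P {
+    a: u8,  // offset 0
+    b: u32, // offset 2: positioned using min(packing, align_of::<u32>()) = 2
+}
+
+fn read(p: &P) -> u32 {
+    let value = p.b;  // a field *access* copies the value out and stays safe
+    // let r = &p.b;  // a field *borrow* would yield a `&u32` that is only
+    //                // 2-aligned, which is why borrows are the unsafe case
+    value
+}
+```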
+ +[gh27060]: https://github.com/rust-lang/rust/issues/27060 +[rfc1358]: https://github.com/rust-lang/rfcs/pull/1358 diff --git a/text/1414-rvalue_static_promotion.md b/text/1414-rvalue_static_promotion.md new file mode 100644 index 00000000000..74c2957da90 --- /dev/null +++ b/text/1414-rvalue_static_promotion.md @@ -0,0 +1,198 @@ +- Feature Name: rvalue_static_promotion +- Start Date: 2015-12-18 +- RFC PR: [#1414](https://github.com/rust-lang/rfcs/pull/1414) +- Rust Issue: [#38865](https://github.com/rust-lang/rust/issues/38865) + +# Summary +[summary]: #summary + +Promote constexpr rvalues to values in static memory instead of +stack slots, and expose those in the language by being able to directly create +`'static` references to them. This would allow code like +`let x: &'static u32 = &42` to work. + +# Motivation +[motivation]: #motivation + +Right now, when dealing with constant values, you have to explicitly define +`const` or `static` items to create references with `'static` lifetime, +which can be unnecessarily verbose if those items never get exposed +in the actual API: + +```rust +fn return_x_or_a_default(x: Option<&u32>) -> &u32 { + if let Some(x) = x { + x + } else { + static DEFAULT_X: u32 = 42; + &DEFAULT_X + } +} +fn return_binop() -> &'static Fn(u32, u32) -> u32 { + const STATIC_TRAIT_OBJECT: &'static Fn(u32, u32) -> u32 + = &|x, y| x + y; + STATIC_TRAIT_OBJECT +} +``` + +This workaround also has the limitation of not being able to refer to +type parameters of a containing generic functions, eg you can't do this: + +```rust +fn generic() -> &'static Option { + const X: &'static Option = &None::; + X +} +``` + +However, the compiler already special cases a small subset of rvalue +const expressions to have static lifetime - namely the empty array expression: + +```rust +let x: &'static [u8] = &[]; +``` + +And though they don't have to be seen as such, string literals could be regarded +as the same kind of special sugar: + +```rust +let b: &'static [u8; 4] = b"test"; +// could be seen as `= &[116, 101, 115, 116]` + +let s: &'static str = "foo"; +// could be seen as `= &str([102, 111, 111])` +// given `struct str([u8]);` and the ability to construct compound +// DST structs directly +``` + +With the proposed change, those special cases would instead become +part of a general language feature usable for custom code. + +# Detailed design +[design]: #detailed-design + +Inside a function body's block: + +- If a shared reference to a constexpr rvalue is taken. (`&`) +- And the constexpr does not contain a `UnsafeCell { ... }` constructor. +- And the constexpr does not contain a const fn call returning a type containing a `UnsafeCell`. +- Then instead of translating the value into a stack slot, translate + it into a static memory location and give the resulting reference a + `'static` lifetime. + +The `UnsafeCell` restrictions are there to ensure that the promoted value is +truly immutable behind the reference. 
+
+Examples:
+
+```rust
+// OK:
+let a: &'static u32 = &32;
+let b: &'static Option<Vec<u32>> = &None;
+let c: &'static Fn() -> u32 = &|| 42;
+
+let h: &'static u32 = &(32 + 64);
+
+fn generic<T>() -> &'static Option<T> {
+    &None::<T>
+}
+
+// BAD:
+let f: &'static Option<UnsafeCell<u32>> = &Some(UnsafeCell { data: 32 });
+let g: &'static Cell<u32> = &Cell::new(); // assuming const fn new()
+```
+
+These rules above should be consistent with the existing rvalue promotions in `const`
+initializer expressions:
+
+```rust
+// If this compiles:
+const X: &'static T = &<constexpr foo>;
+
+// Then this should compile as well:
+let x: &'static T = &<constexpr foo>;
+```
+
+## Implementation
+
+The necessary changes in the compiler have already been implemented as
+part of codegen optimizations (emitting references-to or memcopies-from values
+in static memory instead of embedding them in the code).
+
+All that is left to do is "throw the switch" for the new lifetime semantics
+by removing these lines:
+https://github.com/rust-lang/rust/blob/29ea4eef9fa6e36f40bc1f31eb1e56bf5941ee72/src/librustc/middle/mem_categorization.rs#L801-L807
+
+(And of course fixing any fallout/bitrot that might have happened, adding tests, etc.)
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+One more feature with seemingly ad-hoc rules to complicate the language...
+
+# Alternatives, Extensions
+[alternatives]: #alternatives
+
+It would be possible to extend support to `&'static mut` references,
+as long as there is the additional constraint that the
+referenced type is zero sized.
+
+This again has precedent in the array reference constructor:
+
+```rust
+// valid code today
+let y: &'static mut [u8] = &mut [];
+```
+
+The rules would be similar:
+
+- If a mutable reference to a constexpr rvalue is taken. (`&mut <constexpr>`)
+- And the constexpr does not contain a `UnsafeCell { ... }` constructor.
+- And the constexpr does not contain a const fn call returning a type containing a `UnsafeCell`.
+- _And the type of the rvalue is zero-sized._
+- Then instead of translating the value into a stack slot, translate
+  it into a static memory location and give the resulting reference a
+  `'static` lifetime.
+
+The zero-sized restriction is there because
+aliasing mutable references are only safe for zero sized types
+(since you never dereference the pointer for them).
+
+Example:
+
+```rust
+fn return_fn_mut_or_default(&mut self) -> &FnMut(u32, u32) -> u32 {
+    self.operator.unwrap_or(&mut |x, y| x * y)
+    // ^ would be okay, since it would be translated like this:
+    // const STATIC_TRAIT_OBJECT: &'static mut FnMut(u32, u32) -> u32
+    //     = &mut |x, y| x * y;
+    // self.operator.unwrap_or(STATIC_TRAIT_OBJECT)
+}
+
+let d: &'static mut () = &mut ();
+let e: &'static mut Fn() -> u32 = &mut || 42;
+```
+
+There are two ways this could be taken further with zero-sized types:
+
+1. Remove the `UnsafeCell` restriction if the type of the rvalue is zero-sized.
+2. The above, but also remove the __constexpr__ restriction, applying to any zero-sized rvalue instead.
+
+Both cases would work because one can't cause memory unsafety with a reference
+to a zero sized value, and they would allow more safe code to compile.
+
+However, they might complicate reasoning about the rules further,
+especially with the last one also being possibly confusing with regard to
+side-effects.
+
+Not doing this means:
+
+- Relying on `static` and `const` items to create `'static` references, which won't work in generics.
+- Empty-array expressions would remain special cased.
+- It would also not be possible to safely create `&'static mut` references to zero-sized +types, though that part could also be achieved by allowing mutable references to +zero-sized types in constants. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None, beyond "Should we do alternative 1 instead?". diff --git a/text/1415-trim-std-os.md b/text/1415-trim-std-os.md new file mode 100644 index 00000000000..5a2e8d5cd58 --- /dev/null +++ b/text/1415-trim-std-os.md @@ -0,0 +1,146 @@ +- Feature Name: N/A +- Start Date: 2015-12-18 +- RFC PR: [rust-lang/rfcs#1415](https://github.com/rust-lang/rfcs/pull/1415) +- Rust Issue: [rust-lang/rust#31549](https://github.com/rust-lang/rust/issues/31549) + +# Summary +[summary]: #summary + +Deprecate type aliases and structs in `std::os::$platform::raw` in favor of +trait-based accessors which return Rust types rather than the equivalent C type +aliases. + +# Motivation +[motivation]: #motivation + +[RFC 517][io-reform] set forth a vision for the `raw` modules in the standard +library to perform lowering operations on various Rust types to their platform +equivalents. For example the `fs::Metadata` structure can be lowered to the +underlying `sys::stat` structure. The rationale for this was to enable building +abstractions externally from the standard library by exposing all of the +underlying data that is obtained from the OS. + +[io-reform]: https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md + +This strategy, however, runs into a few problems: + +* For some libc structures, such as `stat`, there's not actually one canonical + definition. For example on 32-bit Linux the definition of `stat` will change + depending on whether [LFS][lfs] is enabled (via the `-D_FILE_OFFSET_BITS` + macro). This means that if std is advertises these `raw` types as being "FFI + compatible with libc", it's not actually correct in all circumstances! +* Intricately exporting raw underlying interfaces (such as [`&stat` from + `&fs::Metadata`][std-as-stat]) makes it difficult to change the + implementation over time. Today the 32-bit Linux standard library [doesn't + use LFS functions][std-no-lfs], so files over 4GB cannot be opened. Changing + this, however, would [involve changing the `stat` + structure][libc-stat-change] and may be difficult to do. +* Trait extensions in the `raw` module attempt to return the `libc` aliased type + on all platforms, for example [`DirEntryExt::ino`][std-ino] returns a type of + `ino_t`. The `ino_t` type is billed as being FFI compatible with the libc + `ino_t` type, but not all platforms store the `d_ino` field in `dirent` with + the `ino_t` type. For example on Android the [definition of + `ino_t`][android-ino_t] is `u32` but the [actual stored value is + `u64`][android-d_ino]. This means that on Android we're actually silently + truncating the return value! 
+ +[lfs]: http://users.suse.com/~aj/linux_lfs.html +[std-as-stat]: https://github.com/rust-lang/rust/blob/29ea4eef9fa6e36f40bc1f31eb1e56bf5941ee72/src/libstd/sys/unix/fs.rs#L81-L92 +[std-no-lfs]: https://github.com/rust-lang/rust/issues/30050 +[std-ino]: https://github.com/rust-lang/rust/blob/29ea4eef9fa6e36f40bc1f31eb1e56bf5941ee72/src/libstd/sys/unix/fs.rs#L192-L197 +[libc-stat-change]: https://github.com/rust-lang-nursery/libc/blob/2c7e08c959e599ca221581b1670a9ecbbeac2dcb/src/unix/notbsd/linux/other/b32/mod.rs#L28-L71 +[android-d_ino]: https://github.com/rust-lang-nursery/libc/blob/2c7e08c959e599ca221581b1670a9ecbbeac2dcb/src/unix/notbsd/android/mod.rs#L50 +[android-ino_t]: https://github.com/rust-lang-nursery/libc/blob/2c7e08c959e599ca221581b1670a9ecbbeac2dcb/src/unix/notbsd/android/mod.rs#L11 + +Over time it's basically turned out that exporting the somewhat-messy details of +libc has gotten a little messy in the standard library as well. Exporting this +functionality (e.g. being able to access all of the fields), is quite useful +however! This RFC proposes tweaking the design of the extensions in +`std::os::*::raw` to allow the same level of information exposure that happens +today but also cut some of the tie from libc to std to give us more freedom to +change these implementation details and work around weird platforms. + +# Detailed design +[design]: #detailed-design + +First, the types and type aliases in `std::os::*::raw` will all be +deprecated. For example `stat`, `ino_t`, `dev_t`, `mode_t`, etc, will all be +deprecated (in favor of their definitions in the `libc` crate). Note that the C +integer types, `c_int` and friends, will not be deprecated. + +Next, all existing extension traits will cease to return platform specific type +aliases (such as the `DirEntryExt::ino` function). Instead they will return +`u64` across the board unless it's 100% known for sure that fewer bits will +suffice. This will improve consistency across platforms as well as avoid +truncation problems such as those Android is experiencing. Furthermore this +frees std from dealing with any odd FFI compatibility issues, punting that to +the libc crate itself it the values are handed back into C. + +The `std::os::*::fs::MetadataExt` will have its `as_raw_stat` method deprecated, +and it will instead grow functions to access all the associated fields of the +underlying `stat` structure. This means that there will now be a +trait-per-platform to expose all this information. Also note that all the +methods will likely return `u64` in accordance with the above modification. + +With these modifications to what `std::os::*::raw` includes and how it's +defined, it should be easy to tweak existing implementations and ensure values +are transmitted in a lossless fashion. The changes, however, are both breaking +changes and don't immediately enable fixing bugs like using LFS on Linux: + +* Code such as `let a: ino_t = entry.ino()` would break as the `ino()` function + will return `u64`, but the definition of `ino_t` may not be `u64` for all + platforms. +* The `stat` structure itself on 32-bit Linux still uses 32-bit fields (e.g. it + doesn't mirror `stat64` in libc). + +To help with these issues, more extensive modifications can be made to the +platform specific modules. All type aliases can be switched over to `u64` and +the `stat` structure could simply be redefined to `stat64` on Linux (minus +keeping the same name). This would, however, explicitly mean that +**std::os::raw is no longer FFI compatible with C**. 
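+
+As a purely illustrative sketch of the shape such a platform extension trait
+could take after this change (the method names below are examples, not a
+complete or final listing):
+
+```rust
+// Accessors return plain `u64` instead of libc type aliases, so none of the
+// deprecated platform-specific aliases need to stay FFI compatible with C.
+pub trait MetadataExt {
+    fn st_dev(&self) -> u64;
+    fn st_ino(&self) -> u64;
+    fn st_mode(&self) -> u64;
+    fn st_size(&self) -> u64;
+    // ... one accessor per remaining field of the underlying `stat` record
+}
+```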
+ +This breakage can be clearly indicated in the deprecation messages, however. +Additionally, this fits within std's [breaking changes policy][api-evolution] as +a local `as` cast should be all that's needed to patch code that breaks to +straddle versions of Rust. + +[api-evolution]: https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md + +# Drawbacks +[drawbacks]: #drawbacks + +As mentioned above, this RFC is strictly-speaking a breaking change. It is +expected that not much code will break, but currently there is no data +supporting this. + +Returning `u64` across the board could be confusing in some circumstances as it +may wildly differ both in terms of signedness as well as size from the +underlying C type. Converting it back to the appropriate type runs the risk of +being onerous, but accessing these raw fields in theory happens quite rarely as +std should primarily be exporting cross-platform accessors for the various +fields here and there. + +# Alternatives +[alternatives]: #alternatives + +* The documentation of the raw modules in std could be modified to indicate that + the types contained within are intentionally not FFI compatible, and the same + structure could be preserved today with the types all being rewritten to what + they would be anyway if this RFC were implemented. For example `ino_t` on + Android would change to `u64` and `stat` on 32-bit Linux would change to + `stat64`. In doing this, however, it's not clear why we'd keep around all the + C namings and structure. + +* Instead of breaking existing functionality, new accessors and types could be + added to acquire the "lossless" version of a type. For example we could add a + `ino64` function on `DirEntryExt` which returns a `u64`, and for `stat` we + could add `as_raw_stat64`. This would, however, force `Metadata` to store two + different `stat` structures, and the breakage in practice this will cause may + be small enough to not warrant these great lengths. + +# Unresolved questions +[unresolved]: #unresolved-questions + +* Is the policy of almost always returning `u64` too strict? Should types like + `mode_t` be allowed as `i32` explicitly? Should the sign at least attempt to + always be preserved? diff --git a/text/1419-slice-copy.md b/text/1419-slice-copy.md new file mode 100644 index 00000000000..4e3abedc5e5 --- /dev/null +++ b/text/1419-slice-copy.md @@ -0,0 +1,60 @@ +- Feature Name: slice\_copy\_from +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#1419](https://github.com/rust-lang/rfcs/pull/1419) +- Rust Issue: [rust-lang/rust#31755](https://github.com/rust-lang/rust/issues/31755) + +# Summary +[summary]: #summary + +Safe `memcpy` from one slice to another of the same type and length. + +# Motivation +[motivation]: #motivation + +Currently, the only way to quickly copy from one non-`u8` slice to another is to +use a loop, or unsafe methods like `std::ptr::copy_nonoverlapping`. This allows +us to guarantee a `memcpy` for `Copy` types, and is safe. + +# Detailed design +[design]: #detailed-design + +Add one method to Primitive Type `slice`. + +```rust +impl [T] where T: Copy { + pub fn copy_from_slice(&mut self, src: &[T]); +} +``` + +`copy_from_slice` asserts that `src.len() == self.len()`, then `memcpy`s the +members into `self` from `src`. Calling `copy_from_slice` is semantically +equivalent to a `memcpy`. `self` shall have exactly the same members as `src` +after a call to `copy_from_slice`. 
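+
+A small usage sketch of the proposed method (the buffers below are arbitrary
+examples, not part of the proposal):
+
+```rust
+fn main() {
+    let src = [1u8, 2, 3, 4];
+    let mut dst = [0u8; 4];
+
+    // The lengths match, so this performs the memcpy described above.
+    dst.copy_from_slice(&src);
+    assert_eq!(dst, src);
+
+    // A length mismatch would panic, per the `src.len() == self.len()` assertion:
+    // let mut short = [0u8; 3];
+    // short.copy_from_slice(&src);
+}
+```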
+ +# Drawbacks +[drawbacks]: #drawbacks + +One new method on `slice`. + +# Alternatives +[alternatives]: #alternatives + +`copy_from_slice` could be called `copy_to`, and have the order of the arguments +switched around. This would follow `ptr::copy_nonoverlapping` ordering, and not +`dst = src` or `.clone_from_slice()` ordering. + +`copy_from_slice` could panic only if `dst.len() < src.len()`. This would be the +same as what came before, but we would also lose the guarantee that an +uninitialized slice would be fully initialized. + +`copy_from_slice` could be a free function, as it was in the original draft of +this document. However, there was overwhelming support for it as a method. + +`copy_from_slice` could be not merged, and `clone_from_slice` could be +specialized to `memcpy` in cases of `T: Copy`. I think it's good to have a +specific function to do this, however, which asserts that `T: Copy`. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None, as far as I can tell. diff --git a/text/1422-pub-restricted.md b/text/1422-pub-restricted.md new file mode 100644 index 00000000000..85f973d6286 --- /dev/null +++ b/text/1422-pub-restricted.md @@ -0,0 +1,984 @@ +- Feature Name: pub_restricted +- Start Date: 2015-12-18 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1422 +- Rust Issue: https://github.com/rust-lang/rust/issues/32409 + +# Summary +[summary]: #summary + +Expand the current `pub`/non-`pub` categorization of items with the +ability to say "make this item visible *solely* to a (named) module +tree." + +The current `crate` is one such tree, and would be expressed via: +`pub(crate) item`. Other trees can be denoted via a path employed in a +`use` statement, e.g. `pub(a::b) item`, or `pub(super) item`. + +# Motivation +[motivation]: #motivation + +Right now, if you have a definition for an item `X` that you want to +use in many places in a module tree, you can either +(1.) define `X` at the root of the tree as a non-`pub` item, or +(2.) you can define `X` as a `pub` item in some submodule +(and import into the root of the module tree via `use`). + +But: Sometimes neither of these options is really what you want. + +There are scenarios where developers would like an item to be visible +to a particular module subtree (or a whole crate in its entirety), but +it is not possible to move the item's (non-pub) definition to the root +of that subtree (which would be the usual way to expose an item to a +subtree without making it pub). + +If the definition of `X` itself needs access to other private items +within a submodule of the tree, then `X` *cannot* be put at the root +of the module tree. Illustration: + +```rust +// Intent: `a` exports `I`, `bar`, and `foo`, but nothing else. +pub mod a { + pub const I: i32 = 3; + + // `semisecret` will be used "many" places within `a`, but + // is not meant to be exposed outside of `a`. + fn semisecret(x: i32) -> i32 { use self::b::c::J; x + J } + + pub fn bar(z: i32) -> i32 { semisecret(I) * z } + pub fn foo(y: i32) -> i32 { semisecret(I) + y } + + mod b { + mod c { + const J: i32 = 4; // J is meant to be hidden from the outside world. + } + } +} +``` + +(Note: the `pub mod a` is meant to be at the root of some crate.) + +The latter code fails to compile, due to the privacy violation where +the body of `fn semisecret` attempts to access `a::b::c::J`, which +is not visible in the context of `a`. 
+ +A standard way to deal with this today is to use the second approach +described above (labelled "(2.)"): move `fn semisecret` down into the place where it can +access `J`, marking `fn semisecret` as `pub` so that it can still be +accessed within the items of `a`, and then re-exporting `semisecret` +as necessary up the module tree. + +```rust +// Intent: `a` exports `I`, `bar`, and `foo`, but nothing else. +pub mod a { + pub const I: i32 = 3; + + // `semisecret` will be used "many" places within `a`, but + // is not meant to be exposed outside of `a`. + // (If we put `pub use` here, then *anyone* could access it.) + use self::b::semisecret; + + pub fn bar(z: i32) -> i32 { semisecret(I) * z } + pub fn foo(y: i32) -> i32 { semisecret(I) + y } + + mod b { + pub use self::c::semisecret; + mod c { + const J: i32 = 4; // J is meant to be hidden from the outside world. + pub fn semisecret(x: i32) -> i32 { x + J } + } + } +} +``` + +This works, but there is a serious issue with it: One cannot easily +tell exactly how "public" `fn semisecret` is. In particular, +understanding who can access `semisecret` requires reasoning about +(1.) all of the `pub use`'s (aka re-exports) of `semisecret`, and +(2.) the `pub`-ness of every module in a path leading to `fn +semisecret` or one of its re-exports. + +This RFC seeks to remedy the above problem via two main changes. + + 1. Give the user a way to explicitly restrict the intended scope + of where a `pub`-licized item can be used. + + 2. Modify the privacy rules so that `pub`-restricted items cannot be + used nor re-exported outside of their respective restricted areas. + +## Impact + +This difficulty in reasoning about the "publicness" of a name is not +just a problem for users; it also complicates efforts within the +compiler to verify that a surface API for a type does not itself use +or expose any private names. + +[There][18241] are [a][28325] number [of][28450] bugs [filed][28514] against +[privacy][29668] checking; some are simply +implementation issues, but the comment threads in the issues make it +clear that in some cases, different people have very different mental +models about how privacy interacts with aliases (e.g. `type` +declarations) and re-exports. + +In theory, we can add the changes of this RFC without breaking any old +code. (That is, in principle the only affected code is that for item +definitions that use `pub(restriction)`. This limited addition would +still provide value to users in their reasoning about the visibility +of such items.) + +In practice, I expect that as part of the implementation of this RFC, +we will probably fix pre-existing bugs in the parts of privacy +checking verifying that surface API's do not use or expose private +names. + +Important: No such fixes to such pre-existing bugs are being +concretely proposed by this RFC; I am merely musing that by adding a +more expressive privacy system, we will open the door to fix bugs +whose exploits, under the old system, were the only way to express +certain patterns of interest to developers. 
+ + +[18241]: https://github.com/rust-lang/rust/issues/18241 + + +[28325]: https://github.com/rust-lang/rust/issues/28325 + + +[28450]: https://github.com/rust-lang/rust/issues/28450 + + +[28514]: https://github.com/rust-lang/rust/issues/28514 + + +[29668]: https://github.com/rust-lang/rust/issues/29668 + + +[RFC 136]: https://github.com/rust-lang/rfcs/blob/master/text/0136-no-privates-in-public.md + + +[RFC amendment 200]: https://github.com/rust-lang/rfcs/pull/200 + + +# Detailed design +[design]: #detailed-design + +The main problem identified in the [motivation][] section is this: + +From an module-internal definition like +```rust +pub mod a { [...] mod b { [...] pub fn semisecret(x: i32) -> i32 { x + J } [...] } } +``` +one cannot readily tell exactly how "public" the `fn semisecret` is meant to be. + +As already stated, this RFC seeks to remedy the above problem via two +main changes. + + 1. Give the user a way to explicitly restrict the intended scope + of where a `pub`-licized item can be used. + + 2. Modify the privacy rules so that `pub`-restricted items cannot be + used nor re-exported outside of their respective restricted areas. + +## Syntax + +The new feature is to restrict the scope by adding the module subtree +(which acts as the restricted area) in parentheses after the `pub` +keyword, like so: + +```rust +pub(a::b::c) item; +``` + +The path in the restriction is resolved just like a `use` statement: it +is resolved absolutely, from the crate root. + +Just like `use` statements, one can also write relative paths, by +starting them with `self` or a sequence of `super`'s. + +```rust +pub(super::super) item; +// or +pub(self) item; // (semantically equiv to no `pub`; see below) +``` + +In addition to the forms analogous to `use`, there is one new form: + +```rust +pub(crate) item; +``` + +In other words, the grammar is changed like so: + +old: +``` +VISIBILITY ::= | `pub` +``` + +new: +``` +VISIBILITY ::= | `pub` | `pub` `(` USE_PATH `)` | `pub` `(` `crate` `)` +``` + +One can use these `pub(restriction)` forms anywhere that one can +currently use `pub`. In particular, one can use them on item +definitions, methods in an impl, the fields of a struct +definition, and on `pub use` re-exports. + +## Semantics + +The meaning of `pub(restriction)` is as follows: The definition of +every item, method, field, or name (e.g. a re-export) is associated +with a restriction. + +A restriction is either: the universe of all crates (aka +"unrestricted"), the current crate, or an absolute path to a module +sub-hierarchy in the current crate. A restricted thing cannot be +directly "used" in source code outside of its restricted area. (The +term "used" here is meant to cover both direct reference in the +source, and also implicit reference as the inferred type of an +expression or pattern.) + + * `pub` written with no explicit restriction means that there is no + restriction, or in other words, the restriction is the universe of + all crates. + + * `pub(crate)` means that the restriction is the current crate. + + * `pub()` means that the restriction is the module + sub-hierarchy denoted by ``, resolved in the context of the + occurrence of the `pub` modifier. (This is to ensure that `super` + and `self` make sense in such paths.) + +As noted above, the definition means that `pub(self) item` is the same +as if one had written just `item`. 
+ + * The main reason to support this level of generality (which is + otherwise just "redundant syntax") is macros: one can write a macro + that expands to `pub($arg) item`, and a macro client can pass in + `self` as the `$arg` to get the effect of a non-pub definition. + +NOTE: even if the restriction of an item or name indicates that it is +accessible in some context, it may still be impossible to reference +it. In particular, we will still keep our existing rules regarding +`pub` items defined in non-`pub` modules; such items would have no +restriction, but still may be inaccessible if they are not re-exported in +some manner. + +## Revised Example +[revised]: #revised-example + +In the running example, one could instead write: + +```rust +// Intent: `a` exports `I`, `bar`, and `foo`, but nothing else. +pub mod a { + pub const I: i32 = 3; + + // `semisecret` will be used "many" places within `a`, but + // is not meant to be exposed outside of `a`. + // (`pub use` would be *rejected*; see Note 1 below) + use self::b::semisecret; + + pub fn bar(z: i32) -> i32 { semisecret(I) * z } + pub fn foo(y: i32) -> i32 { semisecret(I) + y } + + mod b { + pub(a) use self::c::semisecret; + mod c { + const J: i32 = 4; // J is meant to be hidden from the outside world. + + // `pub(a)` means "usable within hierarchy of `mod a`, but not + // elsewhere." + pub(a) fn semisecret(x: i32) -> i32 { x + J } + } + } +} +``` + +Note 1: The compiler would reject the variation of the above written +as: + +```rust +pub mod a { [...] pub use self::b::semisecret; [...] } +``` + +because `pub(a) fn semisecret` says that it cannot be used outside of +`a`, and therefore it be incorrect (or at least useless) to reexport +`semisecret` outside of `a`. + +Note 2: The most direct interpretation of the rules here leads me to +conclude that `b`'s re-export of `semisecret` needs to be restricted +to `a` as well. However, it may be possible to loosen things so that +the re-export could just stay as `pub` with no extra restriction; see +discussion of "IRS:PUNPM" in Unresolved Questions. + +This richer notion of privacy does offer us some other ways to +re-write the running example; instead of defining `fn semisecret` +within `c` so that it can access `J`, we might instead expose `J` to +`mod b` and then put `fn semisecret`, like so: + +```rust +pub mod a { + [...] + mod b { + use self::c::J; + pub(a) fn semisecret(x: i32) -> i32 { x + J } + mod c { + pub(b) const J: i32 = 4; + } + } +} +``` + +(This RFC takes no position on which of the above two structures is +"better"; a toy example like this does not provide enough context to +judge.) + +## Restrictions +[restrictions]: #restrictions + +Lets discuss what the restrictions actually mean. + +Some basic definitions: An item is just as it is declared in the Rust +reference manual: a component of a crate, located at a fixed path +(potentially at the "outermost" anonymous module) within the module +tree of the crate. + +Every item can be thought of as having some hidden implementation +component(s) along with an exposed surface API. + +So, for example, in `pub fn foo(x: Input) -> Output { Body }`, the +surface of `foo` includes `Input` and `Output`, while the `Body` is +hidden. + +The pre-existing privacy rules (both prior to and after this RFC) try +to enforce two things: (1.) when a item references a path, all of the +names on that path need to be visible (in terms of privacy) in the +referencing context and, (2.) private items should not be exposed in +the surface of public API's. 
+ + * I am using the term "surface" rather than "signature" deliberately, + since I think the term "signature" is too broad to be used to + accurately describe the current semantics of rustc. See my recent + [Surface blog post][] for further discussion. + +[Surface blog post]: http://blog.pnkfx.org/blog/2015/12/19/signatures-and-surfaces-thoughts-on-privacy-versus-dependency/ + +This RFC is expanding the scope of (2.) above, so that the rules are now: + + 1. when a item references a path (in its implementation or in its + signature), all of the names on that path must be visible in the + referencing context. + + 2. items *restricted* to an area R should not be exposed in the + surface API of names or items that can themselves be exported + beyond R. (Privacy is now a special case of this more general + notion.) + + For convenience, it is legal to declare a field (or inherent + method) with a strictly larger area of restriction than its + `self`. See discussion in the [examples][parts-more-public-than-whole]. + +In principle, validating (1.) can be done via the pre-existing privacy +code. (However, it may make sense to do it by mapping each name to its +associated restriction; I don't think that will change the outcome, +but it might make the checking code simpler. But I am not an expert on +the current state of the privacy checking code.) + +Validating (2.) requires traversing the surface API for each item and +comparing the restriction for every reference to the restriction of +the item itself. + +## Trait methods + +Currently, trait associated item syntax carries no `pub` modifier. + +A question arises when trying to apply the terminology of this RFC: +are trait associated items implicitly `pub`, in the sense that they +are unrestricted? + +The simple answer is: No, associated items are not implicitly `pub`; +at least, not in general. (They are not in general implicitly `pub` +today either, as discussed in [RFC 136][when public (RFC 136)].) +(If they were implictly `pub`, things would be difficult; further +discussion in attached [appendix][associated items digression].) + +[when public (RFC 136)]: https://github.com/rust-lang/rfcs/blob/master/text/0136-no-privates-in-public.md#when-is-an-item-public + +However, since this RFC is introducing multiple kinds of `pub`, we +should address the topic of what *is* the `pub`-ness of associated +items. + + * When analyzing a trait definition, then associated items should be + considered to inherit the `pub`-ness, if any, of their defining + trait. + + We want to make sure that this code continues to work: + + ```rust + mod a { + struct S(String); + trait Trait { + fn make_s(&self) -> S; // referencing `S` is ok, b/c `Trait` is not `pub` + } + } + ``` + + And under this RFC, we now allow this as well: + + ```rust + mod a { + struct S(String); + mod b { + pub(a) trait Trait { + fn mk_s(&self) -> ::a::S; + // referencing `::a::S` is ok, b/c `Trait` is restricted to `::a` + } + } + use self::b::Trait; + } + ``` + + Note that in stable Rust today, it is an error to declare the latter trait + within `mod b` as non-`pub` (since the `use self::b::Trait` would be + referencing a private item), + *and* in the Rust nightly channel it is a warning to declare it + as `pub trait Trait { ... }`. + + The point of this RFC is to give users a sensible way to declare + such traits within `b`, without allowing them to be exposed outside + of `a`. 
+ + * When analyzing an `impl Trait for Type`, there may be distinct + restrictions assigned to the `Trait` and the `Type`. However, + since both the `Trait` and the `Type` must be visible in the + context of the module where the `impl` occurs, there should + be a subtree relationship between the two restrictions; in other + words, one restriction should be less than (or equal to) the other. + + So just use the minimum of the two restrictions when analyzing + the right-hand sides of the associated items in the impl. + + Note: I am largely adopting this rule in an attempt to be + consistent with [RFC 136][when public (RFC 136)]. I invite + discussion of whether this rule actually makes sense as phrased + here. + +## More examples! +[examples]: #more-examples + +These examples meant to explore the syntax a bit. They are *not* meant +to provide motivation for the feature (i.e. I am not claiming that the +feature is making this code cleaner or easier to reason about). + +### Impl item example +[impl item example]: #impl-item-example + +```rust +pub struct S(i32); + +mod a { + pub fn call_foo(s: &super::S) { s.foo(); } + + mod b { + fn some_method_private_to_b() { + println!("inside some_method_private_to_b"); + } + + impl super::super::S { + pub(a) fn foo(&self) { + some_method_private_to_b(); + println!("only callable within `a`: {}", self.0); + } + } + } +} + +fn rejected(s: &S) { + s.foo(); //~ ERROR: `S::foo` not visible outside of module `a` +} +``` + +(You may be wondering: "Could we move that `impl S` out to the +top-level, out of `mod a`?" Well ... see discussion in the +[unresolved questions][def-outside-restriction].) + +### Restricting fields example +[restricting fields example]: #restricting-fields-example + +```rust +mod a { + #[derive(Default)] + struct Priv(i32); + + pub mod b { + use a::Priv as Priv_a; + + #[derive(Default)] + pub struct F { + pub x: i32, + y: Priv_a, + pub(a) z: Priv_a, + } + + #[derive(Default)] + pub struct G(pub i32, Priv_a, pub(a) Priv_a); + + // ... accesses to F.{x,y,z} ... + // ... accesses to G.{0,1,2} ... + } + // ... accesses to F.{x,z} ... + // ... accesses to G.{0,2} ... +} + +mod k { + use a::b::{F, G}; + // ... accesses to F and F.x ... + // ... accesses to G and G.0 ... +} +``` + + +### Fields and inherent methods more public than self +[parts-more-public-than-whole]: #fields-and-inherent-methods-more-public-than-self + +In Rust today, one can write + +```rust +mod a { struct X { pub y: i32, } } +``` + +This RFC was crafted to say that fields and inherent methods +can have an associated restriction that is larger than the restriction +of its `self`. This was both to keep from breaking the above +code, and also because it would be annoying to be forced to write: + +```rust +mod a { struct X { pub(a) y: i32, } } +``` + +(This RFC is not an attempt to resolve things like +[Rust Issue 30079][30079]; the decision of how to handle that issue +can be dealt with orthogonally, in my opinion.) 
+ +[30079]: https://github.com/rust-lang/rust/issues/30079 + + +So, under this RFC, the following is legal: + +```rust +mod a { + pub use self::b::stuff_with_x; + mod b { + struct X { pub y: i32, pub(a) z: i32 } + mod c { + impl super::X { + pub(c) fn only_in_c(&mut self) { self.y += 1; } + + pub fn callanywhere(&mut self) { + self.only_in_c(); + println!("X.y is now: {}", self.y); + } + } + } + pub fn stuff_with_x() { + let mut x = X { y: 10, z: 20}; + x.callanywhere(); + } + } +} +``` + +In particular: + + * It is okay that the fields `y` and `z` and the inherent method + `fn callanywhere` are more publicly visible than `X`. + + (Just because we declare something `pub` does not mean it will + actually be *possible* to reach it from arbitrary contexts. Whether + or not such access is possible will depend on many things, including + but not limited to the restriction attached and also future decisions + about issues like [issue 30079][30079].) + + * We are allowed to restrict an inherent method, `fn only_in_c`, to + a subtree of the module tree where `X` is itself visible. + +### Re-exports + +Here is an example of a `pub use` re-export using the new +feature, including both correct and invalid uses of the extended form. + +```rust +mod a { + mod b { + pub(a) struct X { pub y: i32, pub(a) z: i32 } // restricted to `mod a` tree + mod c { + pub mod d { + pub(super) use a::b::X as P; // ok: a::b::c is submodule of `a` + } + + fn swap_ok(x: d::P) -> d::P { // ok: `P` accessible here + X { z: x.y, y: x.z } + } + } + + fn swap_bad(x: c::d::P) -> c::d::P { //~ ERROR: `c::d::P` not visible outside `a::b::c` + X { z: x.y, y: x.z } + } + + mod bad { + pub use super::X; //~ ERROR: `X` cannot be reexported outside of `a` + } + } + + fn swap_ok2(x: X) -> X { // ok: `X` accessible from `mod a`. + X { z: x.y, y: x.z } + } +} +``` + +### Crate restricted visibility + +This is a concrete illusration of how one might use the `pub(crate) item` form, +(which is perhaps quite similar to Java's default "package visibility"). + +Crate `c1`: + +```rust +pub mod a { + struct Priv(i32); + + pub(crate) struct R { pub y: i32, z: Priv } // ok: field allowed to be more public + pub struct S { pub y: i32, z: Priv } + + pub fn to_r_bad(s: S) -> R { ... } //~ ERROR: `R` restricted solely to this crate + + pub(crate) fn to_r(s: S) -> R { R { y: s.y, z: s.z } } // ok: restricted to crate +} + +use a::{R, S}; // ok: `a::R` and `a::S` are both visible + +pub use a::R as ReexportAttempt; //~ ERROR: `a::R` restricted solely to this crate +``` + +Crate `c2`: + +```rust +extern crate c1; + +use c1::a::S; // ok: `S` is unrestricted + +use c1::a::R; //~ ERROR: `c1::a::R` not visible outside of its crate +``` + +## Precedent + +When I started on this I was not sure if this form of delimited access +to a particular module subtree had a precedent; the closest thing I +could think of was C++ `friend` modifiers (but `friend` is far more +ad-hoc and free-form than what is being proposed here). + +### Scala + +It has since been pointed out to me that Scala has scoped access +modifiers `protected[Y]` and `private[Y]`, which specify that access +is provided upto `Y` (where `Y` can be a package, class or singleton +object). + +The feature proposed by this RFC appears to be similar in intent to +Scala's scoped access modifiers. 
+
+Having said that, I will admit that I am not clear on what
+distinction, if any, Scala draws between `protected[Y]` and
+`private[Y]` when `Y` is a package, which is the main analogy for our
+purposes, or if they just allow both forms as synonyms for
+convenience.
+
+(I can imagine a hypothetical distinction in Scala when `Y` is a
+class, but my skimming online has not provided insight as to what the
+actual distinction is.)
+
+Even if there is some distinction drawn between the two forms in
+Scala, I suspect Rust does not need an analogous distinction in its
+`pub(restricted)`.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Obviously,
+`pub(restriction) item` complicates the surface syntax of the language.
+
+ * However, my counter-argument to this drawback is that this feature
+   in fact *simplifies* the developer's mental model. It is easier to
+   directly encode the expected visibility of an item via
+   `pub(restriction)` than to figure out the right concoction via a
+   mix of nested `mod` and `pub use` statements. And likewise, it is
+   easier to read it too.
+
+Developers may misuse this form and make it hard to access the tasty
+innards of other modules.
+
+ * This is true, but I claim it is irrelevant.
+
+   The effect of this change is solely on the visibility of items
+   *within* a crate. No rules for inter-crate access change.
+
+   From the perspective of cross-crate development, this RFC changes
+   nothing, except that it may lead some crate authors to make some
+   things no longer universally `pub` that they were forced to make
+   visible before due to earlier limitations. I claim that in such
+   cases, those crate authors probably always intended for such items
+   to be non-`pub`, but language limitations were forcing their hand.
+
+   As for intra-crate access: My expectation is that an individual
+   crate will be made by a team of developers who can work out what
+   mutual visibility they want and how it should evolve over time.
+   This feature may affect their work flow to some degree, but they
+   can choose to either use it or not, based on their own internal
+   policies.
+
+
+# Alternatives
+[alternatives]: #alternatives
+
+## Do not extend the language!
+
+ * Change privacy rules and make privacy analysis "smarter"
+   (e.g. global reachability analysis)
+
+   The main problem with this approach is that we tried it, and it
+   did not work well: The implementation was buggy, and the user-visible
+   error messages were hard to understand.
+
+   See the discussion from when the team considered the [public items amendment][].
+
+[public items amendment]: https://github.com/rust-lang/meeting-minutes/blob/master/weekly-meetings/2014-09-16.md#rfc-public-items
+
+ * "Fix" the mental model of privacy (if necessary) without extending
+   the language.
+
+   This alternative is basically saying: "Our existing system is fine; all
+   of the problems with it are due to bugs in the implementation."
+
+   I am sympathetic to this response. However, I think it doesn't
+   quite hold up. Some users want to be able to define items that are
+   exposed outside of their module but still restrict the scope of
+   where they can be referenced, as discussed in the [motivation][]
+   section, and I do not think the current model can be "fixed" to
+   support that use case, at least not without adding some sort of
+   global reachability analysis as discussed in the previous bullet.
+ +In addition, these two alternatives do not address the main point +being made in the [motivation][] section: one cannot tell exactly how +"public" a `pub` item is, without working backwards through the module +tree for all of its re-exports. + +## Curb your ambitions! + + * Instead of adding support for restricting to arbitrary module + subtrees, narrow the feature to just `pub(crate) item`, so that one + chooses either "module private" (by adding no modifier), or + "universally visible" (by adding `pub`), or "visible to just the + current crate" (by adding `pub(crate)`). + + This would be somewhat analogous to Java's relatively coarse + grained privacy rules, where one can choose `public`, `private`, + `protected`, or the unnamed "package" visibility. + + I am all for keeping the implementation simple. However, the reason + that we should support arbitrary module subtrees is that doing so + will enable certain refactorings. Namely, if I decide I want to + inline the definition for one or more crates `A1`, `A2`, ... into + client crate `C` (i.e. replacing `extern crate A1;` with an + suitably defined `mod A1 { ... }`, but I do not want to worry about + whether doing so will risk future changes violating abstraction + boundaries that were previously being enforced via `pub(crate)`, + then I believe allowing `pub(path)` will allow a mechanical tool to + do the inline refactoring, rewriting each `pub(crate)` as `pub(A1)` + as necessary. + +## Be more ambitious! + +This feature could be extended in various ways. + +For example: + + * As mentioned on the RFC comment thread, + we could allow multiple paths in the restriction-specification: + `pub(path1, path2, path3)`. + + This, for better or worse, would start + to look a lot like `friend` declarations from C++. + + * Also as mentioned on the RFC comment thread, the + `pub(restricted)` form does not have any variant where the + restrction-specification denotes the whole universe. + In other words, there's no current way to get the same effect + as `pub item` via `pub(restricted) item`; you cannot say + `pub(universe) item` (even though I do so in a tongue-in-cheek + manner elsewhere in this RFC). + + Some future syntaxes to support this have been proposed in the + RFC comment thread, such as `pub(::)`. But this RFC is leaving the + actual choice to add such an extension (and what syntax to use + for it) up to a later amendment in the future. + +# Unresolved questions +[unresolved]: #unresolved-questions + +## Can definition site fall outside restriction? +[def-outside-restriction]: #can-definition-site-fall-outside-restriction + +For example, is it illegal to do the following: + +```rust +mod a { + mod child { } + mod b { pub(super::child) const J: i32 = 3; } +} +``` + +Or does it just mean that `J`, despite being defined in `mod b`, is +itself not accessible in `mod b`? + +pnkfelix is personally inclined to make this sort of thing illegal, +mainly because he finds it totally unintuitive, but is interested in +hearing counter-arguments. + +## Implicit Restriction Satisfaction (IRS:PUNPM) + +If a re-export occurs within a non-`pub` module, can we treat it as +implicitly satisfying a restriction to `super` imposed by the item it +is re-exporting? + +In particular, the [revised example][revised] included: + +```rust +// Intent: `a` exports `I` and `foo`, but nothing else. +pub mod a { + [...] 
+    mod b {
+        pub(a) use self::c::semisecret;
+        mod c { pub(a) fn semisecret(x: i32) -> i32 { x + J } }
+    }
+}
+```
+
+However, since `b` is non-`pub`, its `pub` items and re-exports are
+solely accessible via the subhierarchy of its module parent (i.e.,
+`mod a`), as long as no entity attempts to re-export them to a broader
+scope.
+
+In other words, in some sense `mod b { pub use item; }` *could*
+implicitly satisfy a restriction to `super` imposed by `item` (if we
+chose to allow it).
+
+Note: If it were `pub mod b` or `pub(restrict) mod b`, then the above
+reasoning would not hold. Therefore, this discussion is limited to
+re-exports from non-`pub` modules.
+
+If we do not allow such implicit restriction satisfaction
+for `pub use` re-exports from non-`pub` modules (IRS:PUNPM), then:
+
+```rust
+pub mod a {
+    [...]
+    mod b {
+        pub use self::c::semisecret;
+        mod c { pub(a) fn semisecret(x: i32) -> i32 { x + J } }
+    }
+}
+```
+
+would be rejected, and one would be expected to write either:
+
+```rust
+        pub(super) use self::c::semisecret;
+```
+
+or
+
+```rust
+        pub(a) use self::c::semisecret;
+```
+
+
+(Side note: I am *not* saying that under IRS:PUNPM, the two forms `pub
+use item` and `pub(super) use item` would be considered synonymous,
+even in the context of a non-pub module like `mod b`. In particular,
+`pub(super) use item` may be imposing a new restriction on the
+re-exported name that was not part of its original definition.)
+
+## Interaction with Globs
+
+Glob re-exports currently only re-export `pub` (as in `pub(universe)`) items.
+
+What should glob re-exports do with respect to `pub(restricted)`?
+
+Here is an illustrative example pointed out by petrochenkov in the
+comment thread:
+
+```rust
+mod m {
+    /*priv*/ pub(m) struct S1;
+    pub(super) struct S2;
+    pub(foo::bar) struct S3;
+    pub struct S4;
+
+    mod n {
+        // What is reexported here?
+        // Just `S4`?
+        // Anything in `m` visible to `n` (which is not consistent with
+        // the current treatment of `pub` by globs)?
+        pub use m::*;
+    }
+}
+
+// What is reexported here?
+pub use m::*;
+pub(baz::qux) use m::*;
+```
+
+This remains an unresolved question, but my personal inclination, at
+least for the initial implementation, is to make globs only import
+purely `pub` items; no non-`pub`, and no `pub(restricted)`.
+
+After we get more experience with `pub(restricted)` (and perhaps make
+other changes that may come in future RFCs), we will be in a better
+position to evaluate what to do here.
+
+
+# Appendices
+
+## Associated Items Digression
+[associated items digression]: #associated-items-digression
+
+If associated items were implicitly `pub`, in the sense that they are
+unrestricted, then that would conflict with the rules imposed by this
+RFC, in the sense that the surface API of a non-`pub` trait is
+composed of its associated items, and so if all associated items were
+implicitly `pub` and unrestricted, then this code would be rejected:
+
+```rust
+mod a {
+    struct S(String);
+    trait Trait {
+        fn mk_s(&self) -> S; // is this implicitly `pub` and unrestricted?
+    }
+    impl Trait for ()  { fn mk_s(&self) -> S { S(format!("():()")) } }
+    impl Trait for i32 { fn mk_s(&self) -> S { S(format!("{}:i32", self)) } }
+    pub fn foo(x:i32) -> String { format!("silly{}{}", ().mk_s().0, x.mk_s().0) }
+}
+```
+
+If associated items were implicitly `pub` and unrestricted, then the
+above code would be rejected under a direct interpretation of the rules
+of this RFC (because `fn mk_s` is implicitly unrestricted, but the
+surface of `fn mk_s` references `S`, a non-`pub` item). This would
+be backwards-incompatible (and just darn inconvenient too).
+
+So, to be clear, this RFC is *not* suggesting that associated items be
+implicitly `pub` and unrestricted.
diff --git a/text/1432-replace-slice.md b/text/1432-replace-slice.md
new file mode 100644
index 00000000000..2dbe69feee2
--- /dev/null
+++ b/text/1432-replace-slice.md
@@ -0,0 +1,180 @@
+- Feature Name: splice
+- Start Date: 2015-12-28
+- RFC PR: [rust-lang/rfcs#1432](https://github.com/rust-lang/rfcs/pull/1432)
+- Rust Issue: [rust-lang/rust#32310](https://github.com/rust-lang/rust/issues/32310)
+
+# Summary
+[summary]: #summary
+
+Add a `splice` method to `Vec<T>` and `String` that removes a range of elements,
+and replaces it in place with a given sequence of values.
+The new sequence does not necessarily have the same length as the range it replaces.
+In the `Vec` case, this method returns an iterator of the elements being moved out, like `drain`.
+
+
+# Motivation
+[motivation]: #motivation
+
+An implementation of this operation is either slow or dangerous.
+
+The slow way uses `Vec::drain`, and then `Vec::insert` repeatedly.
+The latter part takes quadratic time:
+potentially many elements after the replaced range are moved by one offset
+potentially many times, once for each new element.
+
+The dangerous way, detailed below, takes linear time
+but involves unsafely moving generic values with `std::ptr::copy`.
+This is non-trivial `unsafe` code, where a bug could lead to double-dropping elements
+or exposing uninitialized elements.
+(Or for `String`, breaking the UTF-8 invariant.)
+It therefore benefits from having a shared, carefully-reviewed implementation
+rather than leaving it to every potential user to do it themselves.
+
+While it could be an external crate on crates.io,
+this operation is general-purpose enough that I think it belongs in the standard library,
+similar to `Vec::drain`.
+
+# Detailed design
+[design]: #detailed-design
+
+An example implementation is below.
+
+The proposal is to have inherent methods instead of extension traits.
+(Traits are used to make this testable outside of `std`
+and to make a point in Unresolved Questions below.)
+
+```rust
+#![feature(collections, collections_range, str_char)]
+
+extern crate collections;
+
+use collections::range::RangeArgument;
+use std::ops::Range;
+use std::ptr;
+
+trait VecSplice<T> {
+    fn splice<'a, R, I>(&'a mut self, range: R, iterable: I) -> Splice<'a, T, I>
+        where R: RangeArgument<usize>, I: IntoIterator<Item = T>, T: 'a;
+}
+
+impl<T> VecSplice<T> for Vec<T> {
+    fn splice<'a, R, I>(&'a mut self, range: R, iterable: I) -> Splice<'a, T, I>
+        where R: RangeArgument<usize>, I: IntoIterator<Item = T>, T: 'a
+    {
+        unimplemented!() // FIXME: Fill in when exact semantics are decided.
+    }
+}
+
+struct Splice<'a, T: 'a, I: IntoIterator<Item = T>> {
+    vec: &'a mut Vec<T>,
+    range: Range<usize>,
+    iter: I::IntoIter,
+    // FIXME: Fill in when exact semantics are decided.
+}
+
+impl<'a, T: 'a, I: IntoIterator<Item = T>> Iterator for Splice<'a, T, I> {
+    type Item = I::Item;
+    fn next(&mut self) -> Option<Self::Item> {
+        unimplemented!() // FIXME: Fill in when exact semantics are decided.
+    }
+}
+
+impl<'a, T: 'a, I: IntoIterator<Item = T>> Drop for Splice<'a, T, I> {
+    fn drop(&mut self) {
+        unimplemented!() // FIXME: Fill in when exact semantics are decided.
+    }
+}
+
+trait StringSplice {
+    fn splice<R>(&mut self, range: R, s: &str) where R: RangeArgument<usize>;
+}
+
+impl StringSplice for String {
+    fn splice<R>(&mut self, range: R, s: &str) where R: RangeArgument<usize> {
+        if let Some(&start) = range.start() {
+            assert!(self.is_char_boundary(start));
+        }
+        if let Some(&end) = range.end() {
+            assert!(self.is_char_boundary(end));
+        }
+        unsafe {
+            self.as_mut_vec()
+        }.splice(range, s.bytes())
+    }
+}
+
+#[test]
+fn it_works() {
+    let mut v = vec![1, 2, 3, 4, 5];
+    v.splice(2..4, [10, 11, 12].iter().cloned());
+    assert_eq!(v, &[1, 2, 10, 11, 12, 5]);
+    v.splice(1..3, Some(20));
+    assert_eq!(v, &[1, 20, 11, 12, 5]);
+    let mut s = "Hello, world!".to_owned();
+    s.splice(7.., "世界!");
+    assert_eq!(s, "Hello, 世界!");
+}
+
+#[test]
+#[should_panic]
+fn char_boundary() {
+    let mut s = "Hello, 世界!".to_owned();
+    s.splice(..8, "")
+}
+```
+
+The elements of the vector after the range will first be moved by an offset of
+the lower bound of `Iterator::size_hint` minus the length of the range.
+Then, depending on the real length of the iterator:
+
+* If it's the same as the lower bound, we're done.
+* If it's lower than the lower bound (which was then incorrect), the elements will be moved once more.
+* If it's higher, the extra iterator items will be collected into a temporary `Vec`
+  in order to know exactly how many there are, and the elements after will be moved once more.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Same as for any addition to `std`:
+not every program needs it, and standard library growth has a maintenance cost.
+
+# Alternatives
+[alternatives]: #alternatives
+
+* Status quo: leave it to everyone who wants this to do it the slow way or the dangerous way.
+* Publish a crate on crates.io.
+  Individual crates tend to be not very discoverable,
+  so this situation would not be so different from the status quo.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+* Should the input iterator be consumed incrementally at each `Splice::next` call,
+  or only in `Splice::drop`?
+
+* It would be nice to be able to `Vec::splice` with a slice
+  without writing `.iter().cloned()` explicitly.
+  This is possible with the same trick as for the `Extend` trait
+  ([RFC 839](https://github.com/rust-lang/rfcs/blob/master/text/0839-embrace-extend-extinguish.md)):
+  accept iterators of `&T` as well as iterators of `T`:
+
+  ```rust
+  impl<'a, T: 'a> VecSplice<&'a T> for Vec<T> where T: Copy {
+      fn splice<R, I>(&mut self, range: R, iterable: I)
+          where R: RangeArgument<usize>, I: IntoIterator<Item = &'a T>
+      {
+          self.splice(range, iterable.into_iter().cloned())
+      }
+  }
+  ```
+
+  However, this trick cannot be used with an inherent method instead of a trait.
+  (By the way, what was the motivation for `Extend` being a trait rather than inherent methods,
+  before RFC 839?)
+
+* If coherence rules and backward-compatibility allow it,
+  this functionality could be added to `Vec::insert` and `String::insert`
+  by overloading them / making them more generic.
+  This would probably require implementing `RangeArgument` for `usize`
+  representing an empty range,
+  though a range of length 1 would maybe make more sense for `Vec::drain`
+  (another user of `RangeArgument`).
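+
+  As a purely illustrative sketch of that last point (not part of the
+  proposal), and reusing the `start()`/`end()` shape that the example
+  implementation above already assumes for `RangeArgument`, a bare index
+  `i` could stand for the empty range `i..i`:
+
+  ```rust
+  impl RangeArgument<usize> for usize {
+      fn start(&self) -> Option<&usize> { Some(self) } // lower bound: `i`
+      fn end(&self) -> Option<&usize> { Some(self) }   // upper bound: `i`, i.e. an empty range
+  }
+  ```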
diff --git a/text/1434-contains-method-for-ranges.md b/text/1434-contains-method-for-ranges.md
new file mode 100644
index 00000000000..2c4a2d39b91
--- /dev/null
+++ b/text/1434-contains-method-for-ranges.md
@@ -0,0 +1,77 @@
+- Feature Name: `contains_method`
+- Start Date: 2015-12-28
+- RFC PR: [rust-lang/rfcs#1434](https://github.com/rust-lang/rfcs/pull/1434)
+- Rust Issue: [rust-lang/rust#32311](https://github.com/rust-lang/rust/issues/32311)
+
+# Summary
+[summary]: #summary
+
+Implement a method, `contains()`, for `Range`, `RangeFrom`, and `RangeTo`, checking if a number is in the range.
+
+Note that the alternatives are just as important as the main proposal.
+
+# Motivation
+[motivation]: #motivation
+
+The motivation behind this is simple: To be able to write simpler and more expressive code. This RFC provides the convenience of a "syntactic sugar" without actually adding new syntax to the language.
+
+# Detailed design
+[design]: #detailed-design
+
+Implement a method, `contains()`, for `Range`, `RangeFrom`, and `RangeTo`. This method will check if a number is bound by the range. It will yield a boolean based on the condition defined by the range.
+
+The implementation is as follows (placed in libcore, and reexported by libstd):
+
+```rust
+use core::ops::{Range, RangeTo, RangeFrom};
+
+impl<Idx> Range<Idx> where Idx: PartialOrd<Idx> {
+    fn contains(&self, item: Idx) -> bool {
+        self.start <= item && self.end > item
+    }
+}
+
+impl<Idx> RangeTo<Idx> where Idx: PartialOrd<Idx> {
+    fn contains(&self, item: Idx) -> bool {
+        self.end > item
+    }
+}
+
+impl<Idx> RangeFrom<Idx> where Idx: PartialOrd<Idx> {
+    fn contains(&self, item: Idx) -> bool {
+        self.start <= item
+    }
+}
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Lack of generics (see Alternatives).
+
+# Alternatives
+[alternatives]: #alternatives
+
+## Add a `Contains` trait
+
+This trait provides the method `.contains()` and implements it for all the Range types.
+
+## Add a `.contains<I: PartialEq<Self::Item>>(i: I)` iterator method
+
+This method returns a boolean, telling if the iterator contains the item given as parameter. Using method specialization, this can achieve the same performance as the method suggested in this RFC.
+
+This is more flexible, and provides better performance (due to specialization) than just passing a closure comparing the items to an `any()` method.
+
+## Make `.any()` generic over a new trait
+
+Call this trait `ItemPattern<Item>`. This trait is implemented for `Item` and `FnMut(Item) -> bool`. This is, in a sense, similar to `std::str::pattern::Pattern`.
+
+Then make `.any()` generic over this trait (`T: ItemPattern<Self::Item>`) to allow `any()` to take `Self::Item`, searching through the iterator for this particular value.
+
+This will not achieve the same performance as the other proposals.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None.
diff --git a/text/1440-drop-types-in-const.md b/text/1440-drop-types-in-const.md
new file mode 100644
index 00000000000..4455d580b38
--- /dev/null
+++ b/text/1440-drop-types-in-const.md
@@ -0,0 +1,70 @@
+- Feature Name: `drop_types_in_const`
+- Start Date: 2016-01-01
+- RFC PR: [rust-lang/rfcs#1440](https://github.com/rust-lang/rfcs/pull/1440)
+- Rust Issue: [rust-lang/rust#33156](https://github.com/rust-lang/rust/issues/33156)
+
+# Summary
+[summary]: #summary
+
+Allow types with destructors to be used in `static` items and in `const` functions, as long as the destructor never needs to run in const context.
+
+# Motivation
+[motivation]: #motivation
+
+Some of the collection types do not allocate any memory when constructed empty (most notably `Vec`).
+With the change to make leaking safe, the restriction on `static` items with destructors
+is no longer required to be a hard error (as it is safe and accepted that these destructors may never run).
+
+Allowing types with destructors to be directly used in `const` functions and stored in `static`s will remove the need to have
+runtime-initialisation for global variables.
+
+# Detailed design
+[design]: #detailed-design
+
+- Lift the restriction on types with destructors being used in statics.
+  - `static`s containing Drop-types will not run the destructor upon program/thread exit.
+  - (Optionally, add a lint that warns about the possibility of a resource leak.)
+- Allow instantiating structures with destructors in constant expressions.
+- Continue to prevent `const` items from holding types with destructors.
+- Allow `const fn` to return types with destructors.
+- Disallow constant expressions which would result in the destructor being called (if the code were run at runtime).
+
+## Examples
+Assuming that `RwLock` and `Vec` have `const fn new` methods, the following example is possible and avoids runtime validity checks.
+
+```rust
+/// Logging output handler
+trait LogHandler: Send + Sync {
+    // ...
+}
+/// List of registered logging handlers
+static S_LOGGERS: RwLock<Vec<Box<LogHandler>>> = RwLock::new( Vec::new() );
+```
+
+Disallowed code:
+```rust
+static VAL: usize = (Vec::<u8>::new(), 0).1; // The `Vec` would be dropped
+const EMPTY_BYTE_VEC: Vec<u8> = Vec::new(); // `const` items can't have destructors
+
+const fn sample(_v: Vec<u8>) -> usize {
+    0 // Discards the input vector, dropping it
+}
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Destructors do not run on `static` items (by design), so this can lead to unexpected behavior when a type's destructor has effects outside the program (e.g. a RAII temporary folder handle, which deletes the folder on drop). However, this can already happen using the `lazy_static` crate.
+
+# Alternatives
+[alternatives]: #alternatives
+
+- Runtime initialisation of a raw pointer can be used instead (as the `lazy_static` crate currently does on stable)
+- On nightly, a bug related to `static` and `UnsafeCell<Option<T>>` can be used to remove the dynamic allocation.
+  - Both of these alternatives require runtime initialisation, and incur a checking overhead on subsequent accesses.
+- Leaking of objects could be addressed by using C++-style `.dtors` support
+  - This is undesirable, as it introduces confusion around destructor execution order.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+- TBD
diff --git a/text/1443-extended-compare-and-swap.md b/text/1443-extended-compare-and-swap.md
new file mode 100644
index 00000000000..25a41cdc8be
--- /dev/null
+++ b/text/1443-extended-compare-and-swap.md
@@ -0,0 +1,115 @@
+- Feature Name: `extended_compare_and_swap`
+- Start Date: 2016-1-5
+- RFC PR: [rust-lang/rfcs#1443](https://github.com/rust-lang/rfcs/pull/1443)
+- Rust Issue: [rust-lang/rust#31767](https://github.com/rust-lang/rust/issues/31767)
+
+# Summary
+[summary]: #summary
+
+Rust currently provides a `compare_and_swap` method on atomic types, but this method only exposes a subset of the functionality of the C++11 equivalents [`compare_exchange_strong` and `compare_exchange_weak`](http://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange):
+
+- `compare_and_swap` maps to the C++11 `compare_exchange_strong`, but there is no Rust equivalent for `compare_exchange_weak`.
The latter is allowed to fail spuriously even when the comparison succeeds, which allows the compiler to generate better assembly code when the compare and swap is used in a loop. + +- `compare_and_swap` only has a single memory ordering parameter, whereas the C++11 versions have two: the first describes the memory ordering when the operation succeeds while the second one describes the memory ordering on failure. + +# Motivation +[motivation]: #motivation + +While all of these variants are identical on x86, they can allow more efficient code to be generated on architectures such as ARM: + +- On ARM, the strong variant of compare and swap is compiled into an `LDREX` / `STREX` loop which restarts the compare and swap when a spurious failure is detected. This is unnecessary for many lock-free algorithms since the compare and swap is usually already inside a loop and a spurious failure is often caused by another thread modifying the atomic concurrently, which will probably cause the compare and swap to fail anyways. + +- When Rust lowers `compare_and_swap` to LLVM, it uses the same memory ordering type for success and failure, which on ARM adds extra memory barrier instructions to the failure path. Most lock-free algorithms which make use of compare and swap in a loop only need relaxed ordering on failure since the operation is going to be restarted anyways. + +# Detailed design +[design]: #detailed-design + +Since `compare_and_swap` is stable, we can't simply add a second memory ordering parameter to it. This RFC proposes deprecating the `compare_and_swap` function and replacing it with `compare_exchange` and `compare_exchange_weak`, which match the names of the equivalent C++11 functions (with the `_strong` suffix removed). + +## `compare_exchange` + +A new method is instead added to atomic types: + +```rust +fn compare_exchange(&self, current: T, new: T, success: Ordering, failure: Ordering) -> T; +``` + +The restrictions on the failure ordering are the same as C++11: only `SeqCst`, `Acquire` and `Relaxed` are allowed and it must be equal or weaker than the success ordering. Passing an invalid memory ordering will result in a panic, although this can often be optimized away since the ordering is usually statically known. + +The documentation for the original `compare_and_swap` is updated to say that it is equivalent to `compare_exchange` with the following mapping for memory orders: + +Original | Success | Failure +-------- | ------- | ------- +Relaxed | Relaxed | Relaxed +Acquire | Acquire | Acquire +Release | Release | Relaxed +AcqRel | AcqRel | Acquire +SeqCst | SeqCst | SeqCst + +## `compare_exchange_weak` + +A new method is instead added to atomic types: + +```rust +fn compare_exchange_weak(&self, current: T, new: T, success: Ordering, failure: Ordering) -> (T, bool); +``` + +`compare_exchange` does not need to return a success flag because it can be inferred by checking if the returned value is equal to the expected one. This is not possible for `compare_exchange_weak` because it is allowed to fail spuriously, which means that it could fail to perform the swap even though the returned value is equal to the expected one. + +A lock free algorithm using a loop would use the returned bool to determine whether to break out of the loop, and if not, use the returned value for the next iteration of the loop. 
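+
+As a concrete (non-normative) sketch of that pattern, written against the
+`compare_exchange_weak` signature proposed above (returning the previous
+value together with a success flag) rather than against any existing
+library API:
+
+```rust
+use std::sync::atomic::{AtomicUsize, Ordering};
+
+// Illustrative only: atomically doubles the counter and returns the value
+// it replaced.
+fn fetch_double(counter: &AtomicUsize) -> usize {
+    let mut current = counter.load(Ordering::Relaxed);
+    loop {
+        let new = current * 2;
+        // `Relaxed` on failure is sufficient: a failed attempt only feeds
+        // the freshly observed value into the next iteration.
+        let (previous, ok) =
+            counter.compare_exchange_weak(current, new, Ordering::SeqCst, Ordering::Relaxed);
+        if ok {
+            return previous; // the swap took effect
+        }
+        current = previous; // spurious failure or a real conflict: retry
+    }
+}
+```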
+
+## Intrinsics
+
+These are the existing intrinsics used to implement `compare_and_swap`:
+
+```rust
+    pub fn atomic_cxchg<T>(dst: *mut T, old: T, src: T) -> T;
+    pub fn atomic_cxchg_acq<T>(dst: *mut T, old: T, src: T) -> T;
+    pub fn atomic_cxchg_rel<T>(dst: *mut T, old: T, src: T) -> T;
+    pub fn atomic_cxchg_acqrel<T>(dst: *mut T, old: T, src: T) -> T;
+    pub fn atomic_cxchg_relaxed<T>(dst: *mut T, old: T, src: T) -> T;
+```
+
+The following intrinsics need to be added to support relaxed memory orderings on failure:
+
+```rust
+    pub fn atomic_cxchg_acqrel_failrelaxed<T>(dst: *mut T, old: T, src: T) -> T;
+    pub fn atomic_cxchg_failacq<T>(dst: *mut T, old: T, src: T) -> T;
+    pub fn atomic_cxchg_failrelaxed<T>(dst: *mut T, old: T, src: T) -> T;
+    pub fn atomic_cxchg_acq_failrelaxed<T>(dst: *mut T, old: T, src: T) -> T;
+```
+
+The following intrinsics need to be added to support `compare_exchange_weak`:
+
+```rust
+    pub fn atomic_cxchg_weak<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_acq<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_rel<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_acqrel<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_relaxed<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_acqrel_failrelaxed<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_failacq<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_failrelaxed<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+    pub fn atomic_cxchg_weak_acq_failrelaxed<T>(dst: *mut T, old: T, src: T) -> (T, bool);
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Ideally support for failure memory ordering would be added by simply adding an extra parameter to the existing `compare_and_swap` function. However this is not possible because `compare_and_swap` is stable.
+
+This RFC proposes deprecating a stable function, which may not be desirable.
+
+# Alternatives
+[alternatives]: #alternatives
+
+One alternative for supporting failure orderings is to add new enum variants to `Ordering` instead of adding new methods with two ordering parameters. The following variants would need to be added: `AcquireFailRelaxed`, `AcqRelFailRelaxed`, `SeqCstFailRelaxed`, `SeqCstFailAcquire`. The downside is that the names are quite ugly and are only valid for `compare_and_swap`, not other atomic operations. It is also a breaking change to a stable enum.
+
+Another alternative is to not deprecate `compare_and_swap` and instead add `compare_and_swap_explicit`, `compare_and_swap_weak` and `compare_and_swap_weak_explicit`. However the distinction between the explicit and non-explicit versions isn't very clear and can lead to some confusion.
+
+Not doing anything is also a possible option, but this will cause Rust to generate worse code for some lock-free algorithms.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None
diff --git a/text/1444-union.md b/text/1444-union.md
new file mode 100644
index 00000000000..75dfe8e8066
--- /dev/null
+++ b/text/1444-union.md
@@ -0,0 +1,426 @@
+- Feature Name: `union`
+- Start Date: 2015-12-29
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1444
+- Rust Issue: https://github.com/rust-lang/rust/issues/32836
+
+# Summary
+[summary]: #summary
+
+Provide native support for C-compatible unions, defined via a new "contextual
+keyword" `union`, without breaking any existing code that uses `union` as an
+identifier.
+
+# Motivation
+[motivation]: #motivation
+
+Many FFI interfaces include unions.
Rust does not currently have any native +representation for unions, so users of these FFI interfaces must define +multiple structs and transmute between them via `std::mem::transmute`. The +resulting FFI code must carefully understand platform-specific size and +alignment requirements for structure fields. Such code has little in common +with how a C client would invoke the same interfaces. + +Introducing native syntax for unions makes many FFI interfaces much simpler and +less error-prone to write, simplifying the creation of bindings to native +libraries, and enriching the Rust/Cargo ecosystem. + +A native union mechanism would also simplify Rust implementations of +space-efficient or cache-efficient structures relying on value representation, +such as machine-word-sized unions using the least-significant bits of aligned +pointers to distinguish cases. + +The syntax proposed here recognizes `union` as though it were a keyword when +used to introduce a union declaration, *without* breaking any existing code +that uses `union` as an identifier. Experiments by Niko Matsakis demonstrate +that recognizing `union` in this manner works unambiguously with zero conflicts +in the Rust grammar. + +To preserve memory safety, accesses to union fields may only occur in unsafe +code. Commonly, code using unions will provide safe wrappers around unsafe +union field accesses. + +# Detailed design +[design]: #detailed-design + +## Declaring a union type + +A union declaration uses the same field declaration syntax as a struct +declaration, except with `union` in place of `struct`. + +```rust +union MyUnion { + f1: u32, + f2: f32, +} +``` + +By default, a union uses an unspecified binary layout. A union declared with +the `#[repr(C)]` attribute will have the same layout as an equivalent C union. + +A union must have at least one field; an empty union declaration produces a +syntax error. + +## Contextual keyword + +Rust normally prevents the use of a keyword as an identifier; for instance, a +declaration `fn struct() {}` will produce an error "expected identifier, found +keyword `struct`". However, to avoid breaking existing declarations that use +`union` as an identifier, Rust will only recognize `union` as a keyword when +used to introduce a union declaration. A declaration `fn union() {}` will not +produce such an error. + +## Instantiating a union + +A union instantiation uses the same syntax as a struct instantiation, except +that it must specify exactly one field: + +```rust +let u = MyUnion { f1: 1 }; +``` + +Specifying multiple fields in a union instantiation results in a compiler +error. + +Safe code may instantiate a union, as no unsafe behavior can occur until +accessing a field of the union. Code that wishes to maintain invariants about +the union fields should make the union fields private and provide public +functions that maintain the invariants. + +## Reading fields + +Unsafe code may read from union fields, using the same dotted syntax as a +struct: + +```rust +fn f(u: MyUnion) -> f32 { + unsafe { u.f2 } +} +``` + +## Writing fields + +Unsafe code may write to fields in a mutable union, using the same syntax as a +struct: + +```rust +fn f(u: &mut MyUnion) { + unsafe { + u.f1 = 2; + } +} +``` + +If a union contains multiple fields of different sizes, assigning to a field +smaller than the entire union must not change the memory of the union outside +that field. 
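+
+As a minimal illustration of this rule (the union and field names here are
+made up for the example, and the byte-level check assumes `#[repr(C)]` layout
+on a little-endian target): writing the one-byte field leaves the other three
+bytes of the union's storage untouched.
+
+```rust
+#[repr(C)]
+union U {
+    b: u8,
+    w: u32,
+}
+
+fn main() {
+    let mut u = U { w: 0xDDCC_BBAA };
+    unsafe {
+        u.b = 0xFF; // writes only the single byte occupied by `b`
+        if cfg!(target_endian = "little") {
+            // Only the low byte of `w` has changed.
+            assert_eq!(u.w, 0xDDCC_BBFF);
+        }
+    }
+}
+```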
+ +Union fields will normally not implement `Drop`, and by default, declaring a +union with a field type that implements `Drop` will produce a lint warning. +Assigning to a field with a type that implements `Drop` will call `drop()` on +the previous value of that field. This matches the behavior of `struct` fields +that implement `Drop`. To avoid this, such as if interpreting the union's +value via that field and dropping it would produce incorrect behavior, Rust +code can assign to the entire union instead of the field. A union does not +implicitly implement `Drop` even if its field types do. + +The lint warning produced when declaring a union field of a type that +implements `Drop` should document this caveat in its explanatory text. + +## Pattern matching + +Unsafe code may pattern match on union fields, using the same syntax as a +struct, without the requirement to mention every field of the union in a match +or use `..`: + +```rust +fn f(u: MyUnion) { + unsafe { + match u { + MyUnion { f1: 10 } => { println!("ten"); } + MyUnion { f2 } => { println!("{}", f2); } + } + } +} +``` + +Matching a specific value from a union field makes a refutable pattern; naming +a union field without matching a specific value makes an irrefutable pattern. +Both require unsafe code. + +Pattern matching may match a union as a field of a larger structure. In +particular, when using a Rust union to implement a C tagged union via FFI, this +allows matching on the tag and the corresponding field simultaneously: + +```rust +#[repr(u32)] +enum Tag { I, F } + +#[repr(C)] +union U { + i: i32, + f: f32, +} + +#[repr(C)] +struct Value { + tag: Tag, + u: U, +} + +fn is_zero(v: Value) -> bool { + unsafe { + match v { + Value { tag: I, u: U { i: 0 } } => true, + Value { tag: F, u: U { f: 0.0 } } => true, + _ => false, + } + } +} +``` + +Note that a pattern match on a union field that has a smaller size than the +entire union must not make any assumptions about the value of the union's +memory outside that field. For example, if a union contains a `u8` and a +`u32`, matching on the `u8` may not perform a `u32`-sized comparison over the +entire union. + +## Borrowing union fields + +Unsafe code may borrow a reference to a field of a union; doing so borrows the +entire union, such that any borrow conflicting with a borrow of the union +(including a borrow of another union field or a borrow of a structure +containing the union) will produce an error. + +```rust +union U { + f1: u32, + f2: f32, +} + +#[test] +fn test() { + let mut u = U { f1: 1 }; + unsafe { + let b1 = &mut u.f1; + // let b2 = &mut u.f2; // This would produce an error + *b1 = 5; + } + assert_eq!(unsafe { u.f1 }, 5); +} +``` + +Simultaneous borrows of multiple fields of a struct contained within a union do +not conflict: + +```rust +struct S { + x: u32, + y: u32, +} + +union U { + s: S, + both: u64, +} + +#[test] +fn test() { + let mut u = U { s: S { x: 1, y: 2 } }; + unsafe { + let bx = &mut u.s.x; + // let bboth = &mut u.both; // This would fail + let by = &mut u.s.y; + *bx = 5; + *by = 10; + } + assert_eq!(unsafe { u.s.x }, 5); + assert_eq!(unsafe { u.s.y }, 10); +} +``` + +## Union and field visibility + +The `pub` keyword works on the union and on its fields, as with a struct. The +union and its fields default to private. Using a private field in a union +instantiation, field access, or pattern match produces an error. + +## Uninitialized unions + +The compiler should consider a union uninitialized if declared without an +initializer. 
However, providing a field during instantiation, or assigning to +a field, should cause the compiler to treat the entire union as initialized. + +## Unions and traits + +A union may have trait implementations, using the same `impl` syntax as a +struct. + +The compiler should provide a lint if a union field has a type that implements +the `Drop` trait. The explanation for that lint should include an explanation +of the caveat documented in the section "Writing fields". The compiler should +allow disabling that lint with `#[allow(union_field_drop)]`, for code that +intentionally stores a type with Drop in a union. The compiler must never +implicitly generate a Drop implementation for the union itself, though Rust +code may explicitly implement Drop for a union type. + +## Generic unions + +A union may have a generic type, with one or more type parameters or lifetime +parameters. As with a generic enum, the types within the union must make use +of all the parameters; however, not all fields within the union must use all +parameters. + +Type inference works on generic union types. In some cases, the compiler may +not have enough information to infer the parameters of a generic type, and may +require explicitly specifying them. + +## Unions and undefined behavior + +Rust code must not use unions to invoke [undefined +behavior](https://doc.rust-lang.org/nightly/reference.html#behavior-considered-undefined). +In particular, Rust code must not use unions to break the pointer aliasing +rules with raw pointers, or access a field containing a primitive type with an +invalid value. + +In addition, since a union declared without `#[repr(C)]` uses an unspecified +binary layout, code reading fields of such a union or pattern-matching such a +union must not read from a field other than the one written to. This includes +pattern-matching a specific value in a union field. + +## Union size and alignment + +A union declared with `#[repr(C)]` must have the same size and alignment as an +equivalent C union declaration for the target platform. Typically, a union +would have the maximum size of any of its fields, and the maximum alignment of +any of its fields. Note that those maximums may come from different fields; +for instance: + +```rust +#[repr(C)] +union U { + f1: u16, + f2: [u8; 4], +} + +#[test] +fn test() { + assert_eq!(std::mem::size_of(), 4); + assert_eq!(std::mem::align_of(), 2); +} +``` + +# Drawbacks +[drawbacks]: #drawbacks + +Adding a new type of data structure would increase the complexity of the +language and the compiler implementation, albeit marginally. However, this +change seems likely to provide a net reduction in the quantity and complexity +of unsafe code. + +# Alternatives +[alternatives]: #alternatives + +Proposals for unions in Rust have a substantial history, with many variants and +alternatives prior to the syntax proposed here with a `union` pseudo-keyword. +Thanks to many people in the Rust community for helping to refine this RFC. + +The most obvious path to introducing unions in Rust would introduce `union` as +a new keyword. However, any introduction of a new keyword will necessarily +break some code that previously compiled, such as code using the keyword as an +identifier. Making `union` a keyword in the standard way would break the +substantial volume of existing Rust code using `union` for other purposes, +including [multiple functions in the standard +library](https://doc.rust-lang.org/std/?search=union). 
The approach proposed +here, recognizing `union` to introduce a union declaration without prohibiting +`union` as an identifier, provides the most natural declaration syntax and +avoids breaking any existing code. + +Proposals for unions in Rust have extensively explored possible variations on +declaration syntax, including longer keywords (`untagged_union`), built-in +syntax macros (`union!`), compound keywords (`unsafe union`), pragmas +(`#[repr(union)] struct`), and combinations of existing keywords (`unsafe +enum`). + +In the absence of a new keyword, since unions represent unsafe, untagged sum +types, and enum represents safe, tagged sum types, Rust could base unions on +enum instead. The [unsafe enum](https://github.com/rust-lang/rfcs/pull/724) +proposal took this approach, introducing unsafe, untagged enums, identified +with `unsafe enum`; further discussion around that proposal led to the +suggestion of extending it with struct-like field access syntax. Such a +proposal would similarly eliminate explicit use of `std::mem::transmute`, and +avoid the need to handle platform-specific size and alignment requirements for +fields. + +The standard pattern-matching syntax of enums would make field accesses +significantly more verbose than struct-like syntax, and in particular would +typically require more code inside unsafe blocks. Adding struct-like field +access syntax would avoid that; however, pairing an enum-like definition with +struct-like usage seems confusing for developers. A declaration using `enum` +leads users to expect enum-like syntax; a new construct distinct from both +`enum` and `struct` avoids leading users to expect any particular syntax or +semantics. Furthermore, developers used to C unions will expect struct-like +field access for unions. + +Since this proposal uses struct-like syntax for declaration, initialization, +pattern matching, and field access, the original version of this RFC used a +pragma modifying the `struct` keyword: `#[repr(union)] struct`. However, while +the proposed unions match struct syntax, they do not share the semantics of +struct; most notably, unions represent a sum type, while structs represent a +product type. The new construct `union` avoids the semantics attached to +existing keywords. + +In the absence of any native support for unions, developers of existing Rust +code have resorted to either complex platform-specific transmute code, or +complex union-definition macros. In the latter case, such macros make field +accesses and pattern matching look more cumbersome and less structure-like, and +still require detailed platform-specific knowledge of structure layout and +field sizes. The implementation and use of such macros provides strong +motivation to seek a better solution, and indeed existing writers and users of +such macros have specifically requested native syntax in Rust. + +Finally, to call more attention to reads and writes of union fields, field +access could use a new access operator, rather than the same `.` operator used +for struct fields. This would make union fields more obvious at the time of +access, rather than making them look syntactically identical to struct fields +despite the semantic difference in storage representation. However, this does +not seem worth the additional syntactic complexity and divergence from other +languages. Union field accesses already require unsafe blocks, which calls +attention to them. Calls to unsafe functions use the same syntax as calls to +safe functions. 
+ +Much discussion in the [tracking issue for +unions](https://github.com/rust-lang/rust/issues/32836) debated whether +assigning to a union field that implements Drop should drop the previous value +of the field. This produces potentially surprising behavior if that field +doesn't currently contain a valid value of that type. However, that behavior +maintains consistency with assignments to struct fields and mutable variables, +which writers of unsafe code must already take into account; the alternative +would add an additional special case for writers of unsafe code. This does +provide further motivation for the lint for union fields implementing Drop; +code that explicitly overrides that lint will need to take this into account. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Can the borrow checker support the rule that "simultaneous borrows of multiple +fields of a struct contained within a union do not conflict"? If not, omitting +that rule would only marginally increase the verbosity of such code, by +requiring an explicit borrow of the entire struct first. + +Can a pattern match match multiple fields of a union at once? For rationale, +consider a union using the low bits of an aligned pointer as a tag; a pattern +match may match the tag using one field and a value identified by that tag +using another field. However, if this complicates the implementation, omitting +it would not significantly complicate code using unions. + +C APIs using unions often also make use of anonymous unions and anonymous +structs. For instance, a union may contain anonymous structs to define +non-overlapping fields, and a struct may contain an anonymous union to define +overlapping fields. This RFC does not define anonymous unions or structs, but +a subsequent RFC may wish to do so. + +# Edit History + +- This RFC was amended in https://github.com/rust-lang/rfcs/pull/1663/ + to clarify the behavior when an individual field whose type + implements `Drop`. diff --git a/text/1445-restrict-constants-in-patterns.md b/text/1445-restrict-constants-in-patterns.md new file mode 100644 index 00000000000..74eedb4520b --- /dev/null +++ b/text/1445-restrict-constants-in-patterns.md @@ -0,0 +1,621 @@ +- Feature Name: `structural_match` +- Start Date: 2015-02-06 +- RFC PR: [rust-lang/rfcs#1445](https://github.com/rust-lang/rfcs/pull/1445) +- Rust Issue: [rust-lang/rust#31434](https://github.com/rust-lang/rust/issues/31434) + +# Summary +[summary]: #summary + +The current compiler implements a more expansive semantics for pattern +matching than was originally intended. This RFC introduces several +mechanisms to reign in these semantics without actually breaking +(much, if any) extant code: + +- Introduce a feature-gated attribute `#[structural_match]` which can + be applied to a struct or enum `T` to indicate that constants of + type `T` can be used within patterns. +- Have `#[derive(Eq)]` automatically apply this attribute to + the struct or enum that it decorates. **Automatically inserted attributes + do not require use of feature-gate.** +- When expanding constants of struct or enum type into equivalent + patterns, require that the struct or enum type is decorated with + `#[structural_match]`. Constants of builtin types are always + expanded. 
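+
+As an illustrative sketch of the intended effect (the type and constant
+names below are invented for this example):
+
+```rust
+#[derive(PartialEq, Eq)] // `derive(Eq)` implies `#[structural_match]`
+struct Point { x: i32, y: i32 }
+
+const ORIGIN: Point = Point { x: 0, y: 0 };
+
+struct Opaque(i32); // no `derive(Eq)`, hence no `#[structural_match]`
+const ZERO: Opaque = Opaque(0);
+
+fn classify(p: Point) -> &'static str {
+    match p {
+        ORIGIN => "origin", // accepted: `Point` derives `Eq`
+        _ => "elsewhere",
+    }
+}
+
+// fn reject(o: Opaque) -> bool {
+//     match o { ZERO => true, _ => false } // rejected under this RFC
+// }
+```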
+ +The practical effect of these changes will be to prevent the use of +constants in patterns unless the type of those constants is either a +built-in type (like `i32` or `&str`) or a user-defined constant for +which `Eq` is **derived** (not merely *implemented*). + +To be clear, this `#[structural_match]` attribute is **never intended +to be stabilized**. Rather, the intention of this change is to +restrict constant patterns to those cases that everyone can agree on +for now. We can then have further discussion to settle the best +semantics in the long term. + +Because the compiler currently accepts arbitrary constant patterns, +this is technically a backwards incompatible change. However, the +design of the RFC means that existing code that uses constant patterns +will generally "just work". The justification for this change is that +it is clarifying +["underspecified language semantics" clause, as described in RFC 1122][ls]. +A [recent crater run][crater] with a prototype implementation found 6 +regressions. + +[crater]: https://gist.github.com/nikomatsakis/e714e4a824527e0ce5c9 + +**Note:** this was also discussed on an [internals thread]. Major +points from that thread are summarized either inline or in +alternatives. + +[ls]: https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md#underspecified-language-semantics +[crater run]: https://gist.github.com/nikomatsakis/26096ec2a2df3c1fb224 +[internals thread]: https://internals.rust-lang.org/t/how-to-handle-pattern-matching-on-constants/2846) + +# Motivation +[motivation]: #motivation + +The compiler currently permits any kind of constant to be used within +a pattern. However, the *meaning* of such a pattern is somewhat +controversial: the current semantics implemented by the compiler were +[adopted in July of 2014](https://github.com/rust-lang/rust/pull/15650) +and were never widely discussed nor did they go through the RFC +process. Moreover, the discussion at the time was focused primarily on +implementation concerns, and overlooked the potential semantic +hazards. + +### Semantic vs structural equality + +Consider a program like this one, which references a constant value +from within a pattern: + +```rust +struct SomeType { + a: u32, + b: u32, +} + +const SOME_CONSTANT: SomeType = SomeType { a: 22+22, b: 44+44 }; + +fn test(v: SomeType) { + match v { + SOME_CONSTANT => println!("Yes"), + _ => println!("No"), + } +} +``` + +The question at hand is what do we expect this match to do, precisely? +There are two main possibilities: semantic and structural equality. + +**Semantic equality.** Semantic equality states that a pattern +`SOME_CONSTANT` matches a value `v` if `v == SOME_CONSTANT`. In other +words, the `match` statement above would be exactly equivalent to an +`if`: + +```rust +if v == SOME_CONSTANT { + println!("Yes") +} else { + println!("No"); +} +``` + +Under semantic equality, the program above would not compile, because +`SomeType` does not implement the `PartialEq` trait. + +**Structural equality.** Under structural equality, `v` matches the +pattern `SOME_CONSTANT` if all of its fields are (structurally) equal. +Primitive types like `u32` are structurally equal if they represent +the same value (but see below for discussion about floating point +types like `f32` and `f64`). 
This means that the `match` statement +above would be roughly equivalent to the following `if` (modulo +privacy): + +```rust +if v.a == SOME_CONSTANT.a && v.b == SOME_CONSTANT.b { + println!("Yes") +} else { + println!("No"); +} +``` + +Structural equality basically says "two things are structurally equal +if their fields are structurally equal". It is sort of equality you +would get if everyone used `#[derive(PartialEq)]` on all types. Note +that the equality defined by structural equality is completely +distinct from the `==` operator, which is tied to the `PartialEq` +traits. That is, two values that are *semantically unequal* could be +*structurally equal* (an example where this might occur is the +floating point value `NaN`). + +**Current semantics.** The compiler's current semantics are basically +structural equality, though in the case of floating point numbers they +are arguably closer to semantic equality (details below). In +particular, when a constant appears in a pattern, the compiler first +evaluates that constant to a specific value. So we would reduce the +expression: + +```rust +const SOME_CONSTANT: SomeType = SomeType { a: 22+22, b: 44+44 }; +``` + +to the value `SomeType { a: 44, b: 88 }`. We then expand the pattern +`SOME_CONSTANT` as though you had typed this value in place (well, +almost as though, read on for some complications around privacy). +Thus the match statement above is equivalent to: + +```rust +match v { + SomeType { a: 44, b: 88 } => println!(Yes), + _ => println!("No"), +} +``` + +### Disadvantages of the current approach + +Given that the compiler already has a defined semantics, it is +reasonable to ask why we might want to change it. There +are two main disadvantages: + +1. **No abstraction boundary.** The current approach does not permit + types to define what equality means for themselves (at least not if + they can be constructed in a constant). +2. **Scaling to associated constants.** The current approach does not + permit associated constants or generic integers to be used in a + match statement. + +#### Disadvantage: Weakened abstraction bounary + +The single biggest concern with structural equality is that it +introduces two distinct notions of equality: the `==` operator, based +on the `PartialEq` trait, and pattern matching, based on a builtin +structural recursion. This will cause problems for user-defined types +that rely on `PartialEq` to define equality. Put another way, **it is +no longer possible for user-defined types to completely define what +equality means for themselves** (at least not if they can be +constructed in a constant). Furthermore, because the builtin +structural recursion does not consider privacy, `match` statements can +now be used to **observe private fields**. + +**Example: Normalized durations.** Consider a simple duration type: + +```rust +#[derive(Copy, Clone)] +pub struct Duration { + pub seconds: u32, + pub minutes: u32, +} +``` + +Let's say that this `Duration` type wishes to represent a span of +time, but it also wishes to preserve whether that time was expressed +in seconds or minutes. In other words, 60 seconds and 1 minute are +equal values, but we don't want to normalize 60 seconds into 1 minute; +perhaps because it comes from user input and we wish to keep things +just as the user chose to express it. 
+ +We might implement `PartialEq` like so (actually the `PartialEq` trait +is slightly different, but you get the idea): + +```rust +impl PartialEq for Duration { + fn eq(&self, other: &Duration) -> bool { + let s1 = (self.seconds as u64) + (self.minutes as u64 * 60); + let s2 = (other.seconds as u64) + (other.minutes as u64 * 60); + s1 == s2 + } +} +``` + +Now imagine I have some constants: + +```rust +const TWENTY_TWO_SECONDS: Duration = Duration { seconds: 22, minutes: 0 }; +const ONE_MINUTE: Duration = Duration { seconds: 0, minutes: 1 }; +``` + +And I write a match statement using those constants: + +```rust +fn detect_some_case_or_other(d: Duration) { + match d { + TWENTY_TWO_SECONDS => /* do something */, + ONE_MINUTE => /* do something else */, + _ => /* do something else again */, + } +} +``` + +Now this code is, in all probability, buggy. Probably I meant to use +the notion of equality that `Duration` defined, where seconds and +minutes are normalized. But that is not the behavior I will see -- +instead I will use a pure structural match. What's worse, this means +the code will probably work in my local tests, since I like to say +"one minute", but it will break when I demo it for my customer, since +she prefers to write "60 seconds". + +**Example: Floating point numbers.** Another example is floating point +numbers. Consider the case of `0.0` and `-0.0`: these two values are +distinct, but they typically behave the same; so much so that they +compare equal (that is, `0.0 == -0.0` is `true`). So it is likely +that code such as: + +```rust +match some_computation() { + 0.0 => ..., + x => ..., +} +``` + +did not intend to discriminate between zero and negative zero. In +fact, in the compiler today, match *will* compare 0.0 and -0.0 as +equal. We simply do not extend that courtesy to user-defined types. + +**Example: observing private fields.** The current constant expansion +code does not consider privacy. In other words, constants are expanded +into equivalent patterns, but those patterns may not have been +something the user could have typed because of privacy rules. Consider +a module like: + +```rust +mod foo { + pub struct Foo { b: bool } + pub const V1: Foo = Foo { b: true }; + pub const V2: Foo = Foo { b: false }; +} +``` + +Note that there is an abstraction boundary here: b is a private +field. But now if I wrote code from another module that matches on a +value of type Foo, that abstraction boundary is pierced: + +```rust +fn bar(f: x::Foo) { + // rustc knows this is exhaustive because if expanded `V1` into + // equivalent patterns; patterns you could not write by hand! + match f { + x::V1 => { /* moreover, now we know that f.b is true */ } + x::V2 => { /* and here we know it is false */ } + } +} +``` + +Note that, because `Foo` does not implement `PartialEq`, just having +access to `V1` would not otherwise allow us to observe the value of +`f.b`. (And even if `Foo` *did* implement `PartialEq`, that +implementation might not read `f.b`, so we still would not be able to +observe its value.) + +**More examples.** There are numerous possible examples here. For +example, strings that compare using case-insensitive comparisons, but +retain the original case for reference, such as those used in +file-systems. Views that extract a subportion of a larger value (and +hence which should only compare that subportion). And so forth. 
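+
+A hedged sketch of the case-insensitive example (the `Ident` type is invented
+for illustration): two values can be semantically equal without being
+structurally equal, so a constant used as a pattern would distinguish values
+that `==` treats as the same.
+
+```rust
+pub struct Ident(pub &'static str);
+
+impl PartialEq for Ident {
+    // Semantic equality: case-insensitive comparison.
+    fn eq(&self, other: &Ident) -> bool {
+        self.0.eq_ignore_ascii_case(other.0)
+    }
+}
+
+pub const KEYWORD_FOR: Ident = Ident("for");
+
+pub fn is_for(id: &Ident) -> bool {
+    // `==` treats Ident("FOR") and Ident("for") as equal, but a structural
+    // constant pattern `KEYWORD_FOR` would only match the exact bytes "for".
+    *id == KEYWORD_FOR
+}
+```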
+ +#### Disadvantage: Scaling to associated constants and generic integers + +Rewriting constants into patterns requires that we can **fully +evaluate** the constant at the time of exhaustiveness checking. For +associated constants and type-level integers, that is not possible -- +we have to wait until monomorphization time. Consider: + +```rust +trait SomeTrait { + const A: bool; + const B: bool; +} + +fn foo(x: bool) { + match x { + T::A => println!("A"), + T::B => println!("B"), + } +} + +impl SomeTrait for i32 { + const A: bool = true; + const B: bool = true; +} + +impl SomeTrait for u32 { + const A: bool = true; + const B: bool = false; +} +``` + +Is this match exhaustive? Does it contain dead code? The answer will +depend on whether `T=i32` or `T=u32`, of course. + +### Advantages of the current approach + +However, structural equality also has a number of advantages: + +**Better optimization.** One of the biggest "pros" is that it can +potentially enable nice optimization. For example, given constants like the following: + +```rust +struct Value { x: u32 } +const V1: Value = Value { x: 0 }; +const V2: Value = Value { x: 1 }; +const V3: Value = Value { x: 2 }; +const V4: Value = Value { x: 3 }; +const V5: Value = Value { x: 4 }; +``` + +and a match pattern like the following: + +```rust +match v { + V1 => ..., + ..., + V5 => ..., +} +``` + +then, because pattern matching is always a process of structurally +extracting values, we can compile this to code that reads the field +`x` (which is a `u32`) and does an appropriate switch on that +value. Semantic equality would potentially force a more conservative +compilation strategy. + +**Better exhautiveness and dead-code checking.** Similarly, we can do +more thorough exhaustiveness and dead-code checking. So for example if +I have a struct like: + +```rust +struct Value { field: bool } +const TRUE: Value { field: true }; +const FALSE: Value { field: false }; +``` + +and a match pattern like: + +```rust +match v { TRUE => .., FALSE => .. } +``` + +then we can prove that this match is exhaustive. Similarly, we can prove +that the following match contains dead-code: + +```rust +const A: Value { field: true }; +match v { + TRUE => ..., + A => ..., +} +``` + +Again, some of the alternatives might not allow this. (But note the +cons, which also raise the question of exhaustiveness checking.) + +**Nullary variants and constants are (more) equivalent.** Currently, +there is a sort of equivalence between enum variants and constants, at +least with respect to pattern matching. Consider a C-like enum: + +```rust +enum Modes { + Happy = 22, + Shiny = 44, + People = 66, + Holding = 88, + Hands = 110, +} + +const C: Modes = Modes::Happy; +``` + +Now if I match against `Modes::Happy`, that is matching against an +enum variant, and under *all* the proposals I will discuss below, it +will check the actual variant of the value being matched (regardless +of whether `Modes` implements `PartialEq`, which it does not here). On +the other hand, if matching against `C` were to require a `PartialEq` +impl, then it would be illegal. Therefore matching against an *enum +variant* is distinct from matching against a *constant*. + +# Detailed design +[design]: #detailed-design + +The goal of this RFC is not to decide between semantic and structural +equality. Rather, the goal is to restrict pattern matching to that subset +of types where the two variants behave roughly the same. 
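+
+A brief sketch of that distinction (the enum here is a stripped-down stand-in
+for `Modes` above; the derive is only needed for the constant pattern, not the
+variant pattern):
+
+```rust
+#[derive(PartialEq, Eq, Clone, Copy)]
+pub enum Modes { Happy, Shiny, People }
+
+pub const C: Modes = Modes::Happy;
+
+pub fn check(m: Modes) -> (bool, bool) {
+    // Variant pattern: checks the discriminant, no `PartialEq`/`Eq` involved.
+    let by_variant = matches!(m, Modes::Happy);
+    // Constant pattern: the form this RFC restricts; it is accepted here only
+    // because `Modes` derives `Eq`.
+    let by_constant = matches!(m, C);
+    (by_variant, by_constant)
+}
+```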
+ +### The structural match attribute + +We will introduce an attribute `#[structural_match]` which can be +applied to struct and enum types. Explicit use of this attribute will +(naturally) be feature-gated. When converting a constant value into a +pattern, if the constant is of struct or enum type, we will check +whether this attribute is present on the struct -- if so, we will +convert the value as we do today. If not, we will report an error that +the struct/enum value cannot be used in a pattern. + +### Behavior of `#[derive(Eq)]` + +When deriving the `Eq` trait, we will add the `#[structural_match]` to +the type in question. Attributes added in this way will be **exempt from +the feature gate**. + +## Exhaustiveness and dead-code checking + +We will treat user-defined structs "opaquely" for the purpose of +exhaustiveness and dead-code checking. This is required to allow for +semantic equality semantics in the future, since in that case we +cannot rely on `Eq` to be correctly implemented (e.g., it could always +return `false`, no matter values are supplied to it, even though it's +not supposed to). The impact of this change has not been evaluated but +is expected to be **very** small, since in practice it is rather +challenging to successfully make an exhaustive match using +user-defined constants, unless they are something trivial like +newtype'd booleans (and, in that case, you can update the code to use +a more extended pattern). + +Similarly, dead code detection should treat constants in a +conservative fashion. that is, we can recognize that if there are two +arms using the same constant, the second one is dead code, even though +it may be that neither will matches (e.g., `match foo { C => _, C => _ +}`). We will make no assumptions about two distinct constants, even if +we can concretely evaluate them to the same value. + +One **unresolved question** (described below) is what behavior to +adopt for constants that involve no user-defined types. There, the +definition of `Eq` is purely under our control, and we know that it +matches structural equality, so we can retain our current aggressive +analysis if desired. + +### Phasing + +We will not make this change instantaneously. Rather, for at least one +release cycle, users who are pattern matching on struct types that +lack `#[structural_match]` will be warned about imminent breakage. + +# Drawbacks +[drawbacks]: #drawbacks + +This is a breaking change, which means some people might have to +change their code. However, that is considered extremely unlikely, +because such users would have to be pattern matching on constants that +are not comparable for equality (this is likely a bug in any case). + +# Alternatives +[alternatives]: #alternatives + + **Limit matching to builtin types.** An earlier version of this RFC +limited matching to builtin types like integers (and tuples of +integers). This RFC is a generalization of that which also +accommodates struct types that derive `Eq`. + +**Embrace current semantics (structural equality).** Naturally we +could opt to keep the semantics as they are. The advantages and +disadvantages are discussed above. + +**Embrace semantic equality.** We could opt to just go straight +towards "semantic equality". However, it seems better to reset the +semantics to a base point that everyone can agree on, and then extend +from that base point. 
Moreover, adopting semantic equality straight +out would be a riskier breaking change, as it could silently change +the semantics of existing programs (whereas the current proposal only +causes compilation to fail, never changes what an existing program +will do). + +# Discussion thread summary + +This section summarizes various points that were raised in the +[internals thread] which are related to patterns but didn't seem to +fit elsewhere. + +**Overloaded patterns.** Some languages, notably Scala, permit +overloading of patterns. This is related to "semantic equality" in +that it involves executing custom, user-provided code at compilation +time. + +**Pattern synonyms.** Haskell offers a feature called "pattern +synonyms" and +[it was argued](https://internals.rust-lang.org/t/how-to-handle-pattern-matching-on-constants/2846/39?u=nikomatsakis) +that the current treatment of patterns can be viewed as a similar +feature. This may be true, but constants-in-patterns are lacking a +number of important features from pattern synonyms, such as bindings, +as +[discussed in this response](https://internals.rust-lang.org/t/how-to-handle-pattern-matching-on-constants/2846/48?u=nikomatsakis). +The author feels that pattern synonyms might be a useful feature, but +it would be better to design them as a first-class feature, not adapt +constants for that purpose. + +# Unresolved questions +[unresolved]: #unresolved-questions + +**What about exhaustiveness etc on builtin types?** Even if we ignore +user-defined types, there are complications around exhaustiveness +checking for constants of any kind related to associated constants and +other possible future extensions. For example, the following code +[fails to compile](http://is.gd/PJjNKl) because it contains dead-code: + +```rust +const X: u64 = 0; +const Y: u64 = 0; +fn bar(foo: u64) { + match foo { + X => { } + Y => { } + _ => { } + } +} +``` + +However, we would be unable to perform such an analysis in a more +generic context, such as with an associated constant: + +```rust +trait Trait { + const X: u64; + const Y: u64; +} + +fn bar(foo: u64) { + match foo { + T::X => { } + T::Y => { } + _ => { } + } +} +``` + +Here, although it may well be that `T::X == T::Y`, we can't know for +sure. So, for consistency, we may wish to treat all constants opaquely +regardless of whether we are in a generic context or not. (However, it +also seems reasonable to make a "best effort" attempt at +exhaustiveness and dead pattern checking, erring on the conservative +side in those cases where constants cannot be fully evaluated.) + +A different argument in favor of treating all constants opaquely is +that the current behavior can leak details that perhaps were intended +to be hidden. For example, imagine that I define a fn `hash` that, +given a previous hash and a value, produces a new hash. 
Because I am +lazy and prototyping my system, I decide for now to just ignore the +new value and pass the old hash through: + +```rust +const fn add_to_hash(prev_hash: u64, _value: u64) -> u64 { + prev_hash +} +``` + +Now I have some consumers of my library and they define a few constants: + +```rust +const HASH_OF_ZERO: add_to_hash(0, 0); +const HASH_OF_ONE: add_to_hash(0, 1); +``` + +And at some point they write a match statement: + +```rust +fn process_hash(h: u64) { + match h { + HASH_OF_ZERO => /* do something */, + HASH_OF_ONE => /* do something else */, + _ => /* do something else again */, +} +``` + +As before, what you get when you [compile this](http://is.gd/u5WtCo) +is a dead-code error, because the compiler can see that `HASH_OF_ZERO` +and `HASH_OF_ONE` are the same value. + +Part of the solution here might be making "unreachable patterns" a +warning and not an error. The author feels this would be a good idea +regardless (though not necessarily as part of this RFC). However, +that's not a complete solution, since -- at least for `bool` constants +-- the same issues arise if you consider exhaustiveness checking. + +On the other hand, it feels very silly for the compiler not to +understand that `match some_bool { true => ..., false => ... }` is +exhaustive. Furthermore, there are other ways for the values of +constants to "leak out", such as when part of a type like +`[u8; SOME_CONSTANT]` (a point made by both [arielb1][arielb1ac] and +[glaebhoerl][gac] on the [internals thread]). Therefore, the proper +way to address this question is perhaps to consider an explicit form +of "abstract constant". + +[arielb1ac]: https://internals.rust-lang.org/t/how-to-handle-pattern-matching-on-constants/2846/9?u=nikomatsakis +[gac]: https://internals.rust-lang.org/t/how-to-handle-pattern-matching-on-constants/2846/32?u=nikomatsakis diff --git a/text/1461-net2-mutators.md b/text/1461-net2-mutators.md new file mode 100644 index 00000000000..cccd2423b32 --- /dev/null +++ b/text/1461-net2-mutators.md @@ -0,0 +1,126 @@ +- Feature Name: `net2_mutators` +- Start Date: 2016-01-12 +- RFC PR: [rust-lang/rfcs#1461](https://github.com/rust-lang/rfcs/pull/1461) +- Rust Issue: [rust-lang/rust#31766](https://github.com/rust-lang/rust/issues/31766) + +# Summary +[summary]: #summary + +[RFC 1158](https://github.com/rust-lang/rfcs/pull/1158) proposed the addition +of more functionality for the `TcpStream`, `TcpListener` and `UdpSocket` types, +but was declined so that those APIs could be built up out of tree in the [net2 +crate](https://crates.io/crates/net2/). This RFC proposes pulling portions of +net2's APIs into the standard library. + +# Motivation +[motivation]: #motivation + +The functionality provided by the standard library's wrappers around standard +networking types is fairly limited, and there is a large set of well supported, +standard functionality that is not currently implemented in `std::net` but has +existed in net2 for some time. + +All of the methods to be added map directly to equivalent system calls. + +This does not cover the entirety of net2's APIs. In particular, this RFC does +not propose to touch the builder types. 
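+
+To make the "equivalent system calls" point concrete, here is a rough sketch,
+assuming the `libc` crate, of roughly what a method like
+`TcpStream::set_nodelay` wraps on Unix (the helper function name is invented
+for illustration):
+
+```rust
+use std::net::TcpStream;
+use std::os::unix::io::AsRawFd;
+
+// A hand-rolled stand-in for the proposed `TcpStream::set_nodelay`: a thin
+// wrapper over `setsockopt(TCP_NODELAY)`, assuming the `libc` crate.
+pub fn set_nodelay_raw(stream: &TcpStream, nodelay: bool) -> std::io::Result<()> {
+    let flag: libc::c_int = nodelay as libc::c_int;
+    let ret = unsafe {
+        libc::setsockopt(
+            stream.as_raw_fd(),
+            libc::IPPROTO_TCP,
+            libc::TCP_NODELAY,
+            &flag as *const libc::c_int as *const libc::c_void,
+            std::mem::size_of::<libc::c_int>() as libc::socklen_t,
+        )
+    };
+    if ret == 0 {
+        Ok(())
+    } else {
+        Err(std::io::Error::last_os_error())
+    }
+}
+```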
+
+# Detailed design
+[design]: #detailed-design
+
+The following methods will be added:
+
+```rust
+impl TcpStream {
+    fn set_nodelay(&self, nodelay: bool) -> io::Result<()>;
+    fn nodelay(&self) -> io::Result<bool>;
+
+    fn set_ttl(&self, ttl: u32) -> io::Result<()>;
+    fn ttl(&self) -> io::Result<u32>;
+
+    fn set_only_v6(&self, only_v6: bool) -> io::Result<()>;
+    fn only_v6(&self) -> io::Result<bool>;
+
+    fn take_error(&self) -> io::Result<Option<io::Error>>;
+
+    fn set_nonblocking(&self, nonblocking: bool) -> io::Result<()>;
+}
+
+impl TcpListener {
+    fn set_ttl(&self, ttl: u32) -> io::Result<()>;
+    fn ttl(&self) -> io::Result<u32>;
+
+    fn set_only_v6(&self, only_v6: bool) -> io::Result<()>;
+    fn only_v6(&self) -> io::Result<bool>;
+
+    fn take_error(&self) -> io::Result<Option<io::Error>>;
+
+    fn set_nonblocking(&self, nonblocking: bool) -> io::Result<()>;
+}
+
+impl UdpSocket {
+    fn set_broadcast(&self, broadcast: bool) -> io::Result<()>;
+    fn broadcast(&self) -> io::Result<bool>;
+
+    fn set_multicast_loop_v4(&self, multicast_loop_v4: bool) -> io::Result<()>;
+    fn multicast_loop_v4(&self) -> io::Result<bool>;
+
+    fn set_multicast_ttl_v4(&self, multicast_ttl_v4: u32) -> io::Result<()>;
+    fn multicast_ttl_v4(&self) -> io::Result<u32>;
+
+    fn set_multicast_loop_v6(&self, multicast_loop_v6: bool) -> io::Result<()>;
+    fn multicast_loop_v6(&self) -> io::Result<bool>;
+
+    fn set_ttl(&self, ttl: u32) -> io::Result<()>;
+    fn ttl(&self) -> io::Result<u32>;
+
+    fn set_only_v6(&self, only_v6: bool) -> io::Result<()>;
+    fn only_v6(&self) -> io::Result<bool>;
+
+    fn join_multicast_v4(&self, multiaddr: &Ipv4Addr, interface: &Ipv4Addr) -> io::Result<()>;
+    fn join_multicast_v6(&self, multiaddr: &Ipv6Addr, interface: u32) -> io::Result<()>;
+
+    fn leave_multicast_v4(&self, multiaddr: &Ipv4Addr, interface: &Ipv4Addr) -> io::Result<()>;
+    fn leave_multicast_v6(&self, multiaddr: &Ipv6Addr, interface: u32) -> io::Result<()>;
+
+    fn connect<A: ToSocketAddrs>(&self, addr: A) -> Result<()>;
+    fn send(&self, buf: &[u8]) -> Result<usize>;
+    fn recv(&self, buf: &mut [u8]) -> Result<usize>;
+
+    fn take_error(&self) -> io::Result<Option<io::Error>>;
+
+    fn set_nonblocking(&self, nonblocking: bool) -> io::Result<()>;
+}
+```
+
+The traditional approach would be to add these as unstable, inherent methods.
+However, since inherent methods take precedence over trait methods, this would
+cause all code using the extension traits in net2 to start reporting stability
+errors. Instead, we have two options:
+
+1. Add this functionality as *stable* inherent methods. The rationale here would
+   be that time in a nursery crate acts as a de facto stabilization period.
+2. Add this functionality via *unstable* extension traits. When/if we decide to
+   stabilize, we would deprecate the trait and add stable inherent methods.
+   Extension traits are a bit more annoying to work with, but this would give
+   us a formal stabilization period.
+
+Option 2 seems like the safer approach unless people feel comfortable with these
+APIs.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+This is a fairly significant increase in the surface areas of these APIs, and
+most users will never touch some of the more obscure functionality that these
+provide.
+
+# Alternatives
+[alternatives]: #alternatives
+
+We can leave some or all of this functionality in net2.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+The stabilization path (see above).
diff --git a/text/1467-volatile.md b/text/1467-volatile.md new file mode 100644 index 00000000000..f3e7f6b628c --- /dev/null +++ b/text/1467-volatile.md @@ -0,0 +1,34 @@ +- Feature Name: volatile +- Start Date: 2016-01-18 +- RFC PR: [rust-lang/rfcs#1467](https://github.com/rust-lang/rfcs/pull/1467) +- Rust Issue: [rust-lang/rust#31756](https://github.com/rust-lang/rust/issues/31756) + +# Summary +[summary]: #summary + +Stabilize the `volatile_load` and `volatile_store` intrinsics as `ptr::read_volatile` and `ptr::write_volatile`. + +# Motivation +[motivation]: #motivation + +This is necessary to allow volatile access to memory-mapping I/O in stable code. Currently this is only possible using unstable intrinsics, or by abusing a bug in the `load` and `store` functions on atomic types which gives them volatile semantics ([rust-lang/rust#30962](https://github.com/rust-lang/rust/pull/30962)). + +# Detailed design +[design]: #detailed-design + +`ptr::read_volatile` and `ptr::write_volatile` will work the same way as `ptr::read` and `ptr::write` respectively, except that the memory access will be done with volatile semantics. The semantics of a volatile access are already pretty well defined by the C standard and by LLVM. In documentation we can refer to http://llvm.org/docs/LangRef.html#volatile-memory-accesses. + +# Drawbacks +[drawbacks]: #drawbacks + +None. + +# Alternatives +[alternatives]: #alternatives + +We could also stabilize the `volatile_set_memory`, `volatile_copy_memory` and `volatile_copy_nonoverlapping_memory` intrinsics as `ptr::write_bytes_volatile`, `ptr::copy_volatile` and `ptr::copy_nonoverlapping_volatile`, but these are not as widely used and are not available in C. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None. diff --git a/text/1479-unix-socket.md b/text/1479-unix-socket.md new file mode 100644 index 00000000000..96e26df72ee --- /dev/null +++ b/text/1479-unix-socket.md @@ -0,0 +1,480 @@ +- Feature Name: `unix_socket` +- Start Date: 2016-01-25 +- RFC PR: [rust-lang/rfcs#1479](https://github.com/rust-lang/rfcs/pull/1479) +- Rust Issue: [rust-lang/rust#32312](https://github.com/rust-lang/rust/issues/32312) + +# Summary +[summary]: #summary + +[Unix domain sockets](https://en.wikipedia.org/wiki/Unix_domain_socket) provide +a commonly used form of IPC on Unix-derived systems. This RFC proposes move the +[unix_socket](https://crates.io/crates/unix_socket/) nursery crate into the +`std::os::unix` module. + +# Motivation +[motivation]: #motivation + +Unix sockets are a common form of IPC on unixy systems. Databases like +PostgreSQL and Redis allow connections via Unix sockets, and Servo uses them to +communicate with subprocesses. Even though Unix sockets are not present on +Windows, their use is sufficiently widespread to warrant inclusion in the +platform-specific sections of the standard library. + +# Detailed design +[design]: #detailed-design + +Unix sockets can be configured with the `SOCK_STREAM`, `SOCK_DGRAM`, and +`SOCK_SEQPACKET` types. `SOCK_STREAM` creates a connection-oriented socket that +behaves like a TCP socket, `SOCK_DGRAM` creates a packet-oriented socket that +behaves like a UDP socket, and `SOCK_SEQPACKET` provides something of a hybrid +between the other two - a connection-oriented, reliable, ordered stream of +delimited packets. `SOCK_SEQPACKET` support has not yet been implemented in the +unix_socket crate, so only the first two socket types will initially be +supported in the standard library. 
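+
+For orientation, here is a brief usage sketch of the stream half of the API
+proposed below, assuming it lands as described (the socket path and buffer
+sizes are illustrative):
+
+```rust
+use std::io::{Read, Write};
+use std::os::unix::net::{UnixListener, UnixStream};
+
+// Accept a single connection and echo back whatever was received.
+pub fn serve_once() -> std::io::Result<()> {
+    let listener = UnixListener::bind("/tmp/example.sock")?;
+    let (mut stream, _addr) = listener.accept()?;
+    let mut buf = [0u8; 64];
+    let n = stream.read(&mut buf)?;
+    stream.write_all(&buf[..n])?;
+    Ok(())
+}
+
+// Connect to the same path and exchange a small message.
+pub fn client() -> std::io::Result<()> {
+    let mut stream = UnixStream::connect("/tmp/example.sock")?;
+    stream.write_all(b"hello")?;
+    let mut reply = [0u8; 64];
+    let n = stream.read(&mut reply)?;
+    assert_eq!(&reply[..n], b"hello");
+    Ok(())
+}
+```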
+ +While a TCP or UDP socket would be identified by a IP address and port number, +Unix sockets are typically identified by a filesystem path. For example, a +Postgres server will listen on a Unix socket located at +`/run/postgresql/.s.PGSQL.5432` in some configurations. However, the +`socketpair` function can make a pair of *unnamed* connected Unix sockets not +associated with a filesystem path. In addition, Linux provides a separate +*abstract* namespace not associated with the filesystem, indicated by a leading +null byte in the address. In the initial implementation, the abstract namespace +will not be supported - the various socket constructors will check for and +reject addresses with interior null bytes. + +A `std::os::unix::net` module will be created with the following contents: + +The `UnixStream` type mirrors `TcpStream`: +```rust +pub struct UnixStream { + ... +} + +impl UnixStream { + /// Connects to the socket named by `path`. + /// + /// `path` may not contain any null bytes. + pub fn connect>(path: P) -> io::Result { + ... + } + + /// Creates an unnamed pair of connected sockets. + /// + /// Returns two `UnixStream`s which are connected to each other. + pub fn pair() -> io::Result<(UnixStream, UnixStream)> { + ... + } + + /// Creates a new independently owned handle to the underlying socket. + /// + /// The returned `UnixStream` is a reference to the same stream that this + /// object references. Both handles will read and write the same stream of + /// data, and options set on one stream will be propogated to the other + /// stream. + pub fn try_clone(&self) -> io::Result { + ... + } + + /// Returns the socket address of the local half of this connection. + pub fn local_addr(&self) -> io::Result { + ... + } + + /// Returns the socket address of the remote half of this connection. + pub fn peer_addr(&self) -> io::Result { + ... + } + + /// Sets the read timeout for the socket. + /// + /// If the provided value is `None`, then `read` calls will block + /// indefinitely. It is an error to pass the zero `Duration` to this + /// method. + pub fn set_read_timeout(&self, timeout: Option) -> io::Result<()> { + ... + } + + /// Sets the write timeout for the socket. + /// + /// If the provided value is `None`, then `write` calls will block + /// indefinitely. It is an error to pass the zero `Duration` to this + /// method. + pub fn set_write_timeout(&self, timeout: Option) -> io::Result<()> { + ... + } + + /// Returns the read timeout of this socket. + pub fn read_timeout(&self) -> io::Result> { + ... + } + + /// Returns the write timeout of this socket. + pub fn write_timeout(&self) -> io::Result> { + ... + } + + /// Moves the socket into or out of nonblocking mode. + pub fn set_nonblocking(&self, nonblocking: bool) -> io::Result<()> { + ... + } + + /// Returns the value of the `SO_ERROR` option. + pub fn take_error(&self) -> io::Result> { + ... + } + + /// Shuts down the read, write, or both halves of this connection. + /// + /// This function will cause all pending and future I/O calls on the + /// specified portions to immediately return with an appropriate value + /// (see the documentation of `Shutdown`). + pub fn shutdown(&self, how: Shutdown) -> io::Result<()> { + ... + } +} + +impl Read for UnixStream { + ... +} + +impl<'a> Read for &'a UnixStream { + ... +} + +impl Write for UnixStream { + ... +} + +impl<'a> Write for UnixStream { + ... +} + +impl FromRawFd for UnixStream { + ... +} + +impl AsRawFd for UnixStream { + ... +} + +impl IntoRawFd for UnixStream { + ... 
+} +``` + +Differences from `TcpStream`: +* `connect` takes an `AsRef` rather than a `ToSocketAddrs`. +* The `pair` method creates a pair of connected, unnamed sockets, as this is + commonly used for IPC. +* The `SocketAddr` returned by the `local_addr` and `peer_addr` methods is + different. +* The `set_nonblocking` and `take_error` methods are not currently present on + `TcpStream` but are provided in the `net2` crate and are being proposed for + addition to the standard library in a separate RFC. + +As noted above, a Unix socket can either be unnamed, be associated with a path +on the filesystem, or (on Linux) be associated with an ID in the abstract +namespace. The `SocketAddr` struct is fairly simple: + +```rust +pub struct SocketAddr { + ... +} + +impl SocketAddr { + /// Returns true if the address is unnamed. + pub fn is_unnamed(&self) -> bool { + ... + } + + /// Returns the contents of this address if it corresponds to a filesystem path. + pub fn as_pathname(&self) -> Option<&Path> { + ... + } +} +``` + +The `UnixListener` type mirrors the `TcpListener` type: +```rust +pub struct UnixListener { + ... +} + +impl UnixListener { + /// Creates a new `UnixListener` bound to the specified socket. + /// + /// `path` may not contain any null bytes. + pub fn bind>(path: P) -> io::Result { + ... + } + + /// Accepts a new incoming connection to this listener. + /// + /// This function will block the calling thread until a new Unix connection + /// is established. When established, the corersponding `UnixStream` and + /// the remote peer's address will be returned. + pub fn accept(&self) -> io::Result<(UnixStream, SocketAddr)> { + ... + } + + /// Creates a new independently owned handle to the underlying socket. + /// + /// The returned `UnixListener` is a reference to the same socket that this + /// object references. Both handles can be used to accept incoming + /// connections and options set on one listener will affect the other. + pub fn try_clone(&self) -> io::Result { + ... + } + + /// Returns the local socket address of this listener. + pub fn local_addr(&self) -> io::Result { + ... + } + + /// Moves the socket into or out of nonblocking mode. + pub fn set_nonblocking(&self, nonblocking: bool) -> io::Result<()> { + ... + } + + /// Returns the value of the `SO_ERROR` option. + pub fn take_error(&self) -> io::Result> { + ... + } + + /// Returns an iterator over incoming connections. + /// + /// The iterator will never return `None` and will also not yield the + /// peer's `SocketAddr` structure. + pub fn incoming<'a>(&'a self) -> Incoming<'a> { + ... + } +} + +impl FromRawFd for UnixListener { + ... +} + +impl AsRawFd for UnixListener { + ... +} + +impl IntoRawFd for UnixListener { + ... +} +``` + +Differences from `TcpListener`: +* `bind` takes an `AsRef` rather than a `ToSocketAddrs`. +* The `SocketAddr` type is different. +* The `set_nonblocking` and `take_error` methods are not currently present on + `TcpListener` but are provided in the `net2` crate and are being proposed for + addition to the standard library in a separate RFC. + +Finally, the `UnixDatagram` type mirrors the `UpdSocket` type: +```rust +pub struct UnixDatagram { + ... +} + +impl UnixDatagram { + /// Creates a Unix datagram socket bound to the given path. + /// + /// `path` may not contain any null bytes. + pub fn bind>(path: P) -> io::Result { + ... + } + + /// Creates a Unix Datagram socket which is not bound to any address. + pub fn unbound() -> io::Result { + ... 
+ } + + /// Create an unnamed pair of connected sockets. + /// + /// Returns two `UnixDatagrams`s which are connected to each other. + pub fn pair() -> io::Result<(UnixDatagram, UnixDatagram)> { + ... + } + + /// Creates a new independently owned handle to the underlying socket. + /// + /// The returned `UnixDatagram` is a reference to the same stream that this + /// object references. Both handles will read and write the same stream of + /// data, and options set on one stream will be propogated to the other + /// stream. + pub fn try_clone(&self) -> io::Result { + ... + } + + /// Connects the socket to the specified address. + /// + /// The `send` method may be used to send data to the specified address. + /// `recv` and `recv_from` will only receive data from that address. + /// + /// `path` may not contain any null bytes. + pub fn connect>(&self, path: P) -> io::Result<()> { + ... + } + + /// Returns the address of this socket. + pub fn local_addr(&self) -> io::Result { + ... + } + + /// Returns the address of this socket's peer. + /// + /// The `connect` method will connect the socket to a peer. + pub fn peer_addr(&self) -> io::Result { + ... + } + + /// Receives data from the socket. + /// + /// On success, returns the number of bytes read and the address from + /// whence the data came. + pub fn recv_from(&self, buf: &mut [u8]) -> io::Result<(usize, SocketAddr)> { + ... + } + + /// Receives data from the socket. + /// + /// On success, returns the number of bytes read. + pub fn recv(&self, buf: &mut [u8]) -> io::Result { + ... + } + + /// Sends data on the socket to the specified address. + /// + /// On success, returns the number of bytes written. + /// + /// `path` may not contain any null bytes. + pub fn send_to>(&self, buf: &[u8], path: P) -> io::Result { + ... + } + + /// Sends data on the socket to the socket's peer. + /// + /// The peer address may be set by the `connect` method, and this method + /// will return an error if the socket has not already been connected. + /// + /// On success, returns the number of bytes written. + pub fn send(&self, buf: &[u8]) -> io::Result { + ... + } + + /// Sets the read timeout for the socket. + /// + /// If the provided value is `None`, then `recv` and `recv_from` calls will + /// block indefinitely. It is an error to pass the zero `Duration` to this + /// method. + pub fn set_read_timeout(&self, timeout: Option) -> io::Result<()> { + ... + } + + /// Sets the write timeout for the socket. + /// + /// If the provided value is `None`, then `send` and `send_to` calls will + /// block indefinitely. It is an error to pass the zero `Duration` to this + /// method. + pub fn set_write_timeout(&self, timeout: Option) -> io::Result<()> { + ... + } + + /// Returns the read timeout of this socket. + pub fn read_timeout(&self) -> io::Result> { + ... + } + + /// Returns the write timeout of this socket. + pub fn write_timeout(&self) -> io::Result> { + ... + } + + /// Moves the socket into or out of nonblocking mode. + pub fn set_nonblocking(&self, nonblocking: bool) -> io::Result<()> { + ... + } + + /// Returns the value of the `SO_ERROR` option. + pub fn take_error(&self) -> io::Result> { + ... + } + + /// Shut down the read, write, or both halves of this connection. + /// + /// This function will cause all pending and future I/O calls on the + /// specified portions to immediately return with an appropriate value + /// (see the documentation of `Shutdown`). + pub fn shutdown(&self, how: Shutdown) -> io::Result<()> { + ... 
+ } +} + +impl FromRawFd for UnixDatagram { + ... +} + +impl AsRawFd for UnixDatagram { + ... +} + +impl IntoRawFd for UnixDatagram { + ... +} +``` + +Differences from `UdpSocket`: +* `bind` takes an `AsRef` rather than a `ToSocketAddrs`. +* The `unbound` method creates an unbound socket, as a Unix socket does not need + to be bound to send messages. +* The `pair` method creates a pair of connected, unnamed sockets, as this is + commonly used for IPC. +* The `SocketAddr` returned by the `local_addr` and `peer_addr` methods is + different. +* The `connect`, `send`, `recv`, `set_nonblocking`, and `take_error` methods are + not currently present on `UdpSocket` but are provided in the `net2` crate and + are being proposed for addition to the standard library in a separate RFC. + +## Functionality not present + +Some functionality is notably absent from this proposal: + +* Linux's abstract namespace is not supported. Functionality may be added in + the future via extension traits in `std::os::linux::net`. +* No support for `SOCK_SEQPACKET` sockets is proposed, as it has not yet been + implemented. Since it is connection oriented, there will be a socket type + `UnixSeqPacket` and a listener type `UnixSeqListener`. The naming of the + listener is a bit unfortunate, but use of `SOCK_SEQPACKET` is rare compared + to `SOCK_STREAM` so naming priority can go to that version. +* Unix sockets support file descriptor and credential transfer, but these will + not initially be supported as the `sendmsg`/`recvmsg` interface is complex + and bindings will need some time to prototype. + +These features can bake in the `rust-lang-nursery/unix-socket` as they're +developed. + +# Drawbacks +[drawbacks]: #drawbacks + +While there is precedent for platform specific components in the standard +library, this will be the by far the largest platform specific addition. + +# Alternatives +[alternatives]: #alternatives + +Unix socket support could be left out of tree. + +The naming convention of `UnixStream` and `UnixDatagram` doesn't perfectly +mirror `TcpStream` and `UdpSocket`, but `UnixStream` and `UnixSocket` seems way +too confusing. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Is `std::os::unix::net` the right name for this module? It's not strictly +"networking" as all communication is local to one machine. `std::os::unix::unix` +is more accurate but weirdly repetitive and the extension trait module +`std::os::linux::unix` is even weirder. `std::os::unix::socket` is an option, +but seems like too general of a name for specifically `AF_UNIX` sockets as +opposed to *all* sockets. diff --git a/text/1492-dotdot-in-patterns.md b/text/1492-dotdot-in-patterns.md new file mode 100644 index 00000000000..c7861672222 --- /dev/null +++ b/text/1492-dotdot-in-patterns.md @@ -0,0 +1,128 @@ +- Feature Name: dotdot_in_patterns +- Start Date: 2016-02-06 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1492 +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Permit the `..` pattern fragment in more contexts. + +# Motivation +[motivation]: #motivation + +The pattern fragment `..` can be used in some patterns to denote several elements in list contexts. +However, it doesn't always compiles when used in such contexts. +One can expect the ability to match tuple variants like `V(u8, u8, u8)` with patterns like +`V(x, ..)` or `V(.., z)`, but the compiler rejects such patterns currently despite accepting +very similar `V(..)`. 
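+
+A short sketch of the difference (the enum `E` is invented for illustration);
+the second function uses the form this RFC would newly accept:
+
+```rust
+pub enum E { V(u8, u8, u8) }
+
+// Already accepted today: `..` covering the whole field list.
+pub fn ignore_all(e: &E) -> bool {
+    matches!(e, E::V(..))
+}
+
+// Proposed: `..` in any position of a tuple variant, e.g. `V(x, ..)`,
+// `V(.., z)`, or both ends at once as below.
+pub fn first_and_last(e: &E) -> (u8, u8) {
+    match *e {
+        E::V(first, .., last) => (first, last),
+    }
+}
+```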
+ +This RFC is intended to "complete" the feature and make it work in all possible list contexts, +making the language a bit more convenient and consistent. + +# Detailed design +[design]: #detailed-design + +Let's list all the patterns currently existing in the language, that contain lists of subpatterns: + +``` +// Struct patterns. +S { field1, field2, ..., fieldN } + +// Tuple struct patterns. +S(field1, field2, ..., fieldN) + +// Tuple patterns. +(field1, field2, ..., fieldN) + +// Slice patterns. +[elem1, elem2, ..., elemN] +``` +In all the patterns above, except for struct patterns, field/element positions are significant. + +Now list all the contexts that currently permit the `..` pattern fragment: +``` +// Struct patterns, the last position. +S { subpat1, subpat2, .. } + +// Tuple struct patterns, the last and the only position, no extra subpatterns allowed. +S(..) + +// Slice patterns, the last position. +[subpat1, subpat2, ..] +// Slice patterns, the first position. +[.., subpatN-1, subpatN] +// Slice patterns, any other position. +[subpat1, .., subpatN] +// Slice patterns, any of the above with a subslice binding. +// (The binding is not actually a binding, but one more pattern bound to the sublist, but this is +// not important for our discussion.) +[subpat1, binding.., subpatN] +``` +Something is obviously missing, let's fill in the missing parts. + +``` +// Struct patterns, the last position. +S { subpat1, subpat2, .. } +// **NOT PROPOSED**: Struct patterns, any position. +// Since named struct fields are not positional, there's essentially no sense in placing the `..` +// anywhere except for one conventionally chosen position (the last one) or in sublist bindings, +// so we don't propose extensions to struct patterns. +S { subpat1, .., subpatN } +// **NOT PROPOSED**: Struct patterns with bindings +S { subpat1, binding.., subpatN } + +// Tuple struct patterns, the last and the only position, no extra subpatterns allowed. +S(..) +// **NEW**: Tuple struct patterns, any position. +S(subpat1, subpat2, ..) +S(.., subpatN-1, subpatN) +S(subpat1, .., subpatN) +// **NOT PROPOSED**: Struct patterns with bindings +S(subpat1, binding.., subpatN) + +// **NEW**: Tuple patterns, any position. +(subpat1, subpat2, ..) +(.., subpatN-1, subpatN) +(subpat1, .., subpatN) +// **NOT PROPOSED**: Tuple patterns with bindings +(subpat1, binding.., subpatN) +``` + +Slice patterns are not covered in this RFC, but here is the syntax for reference: + +``` +// Slice patterns, the last position. +[subpat1, subpat2, ..] +// Slice patterns, the first position. +[.., subpatN-1, subpatN] +// Slice patterns, any other position. +[subpat1, .., subpatN] +// Slice patterns, any of the above with a subslice binding. +// By ref bindings are allowed, slices and subslices always have compatible layouts. +[subpat1, binding.., subpatN] +``` + +Trailing comma is not allowed after `..` in the last position by analogy with existing slice and +struct patterns. + +This RFC is not critically important and can be rolled out in parts, for example, bare `..` first, +`..` with a sublist binding eventually. + +# Drawbacks +[drawbacks]: #drawbacks + +None. + +# Alternatives +[alternatives]: #alternatives + +Do not permit sublist bindings in tuples and tuple structs at all. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Sublist binding syntax conflicts with possible exclusive range patterns +`begin .. end`/`begin..`/`..end`. 
This problem already exists for slice patterns and has to be +solved independently from extensions to `..`. +This RFC simply selects the same syntax that slice patterns already have. diff --git a/text/1498-ipv6addr-octets.md b/text/1498-ipv6addr-octets.md new file mode 100644 index 00000000000..5e77166aa6c --- /dev/null +++ b/text/1498-ipv6addr-octets.md @@ -0,0 +1,79 @@ +- Feature Name: `ipaddr_octet_arrays` +- Start Date: 2016-02-12 +- RFC PR: [rust-lang/rfcs#1498](https://github.com/rust-lang/rfcs/pull/1498) +- Rust Issue: [rust-lang/rust#32313](https://github.com/rust-lang/rust/issues/32313) + +# Summary +[summary]: #summary + +Add constructor and conversion functions for `std::net::Ipv6Addr` and +`std::net::Ipv4Addr` that are oriented around arrays of octets. + +# Motivation +[motivation]: #motivation + +Currently, the interface for `std::net::Ipv6Addr` is oriented around 16-bit +"segments". The constructor takes eight 16-bit integers as arguments, +and the sole getter function, `segments`, returns an array of eight +16-bit integers. This interface is unnatural when doing low-level network +programming, where IPv6 addresses are treated as a sequence of 16 octets. +For example, building and parsing IPv6 packets requires doing +bitwise arithmetic with careful attention to byte order in order to convert +between the on-wire format of 16 octets and the eight segments format used +by `std::net::Ipv6Addr`. + +# Detailed design +[design]: #detailed-design + +The following method would be added to `impl std::net::Ipv6Addr`: + +``` +pub fn octets(&self) -> [u8; 16] { + self.inner.s6_addr +} +``` + +The following `From` trait would be implemented: + +``` +impl From<[u8; 16]> for Ipv6Addr { + fn from(octets: [u8; 16]) -> Ipv6Addr { + let mut addr: c::in6_addr = unsafe { std::mem::zeroed() }; + addr.s6_addr = octets; + Ipv6Addr { inner: addr } + } +} +``` + +For consistency, the following `From` trait would be +implemented for `Ipv4Addr`: + +``` +impl From<[u8; 4]> for Ipv4Addr { + fn from(octets: [u8; 4]) -> Ipv4Addr { + Ipv4Addr::new(octets[0], octets[1], octets[2], octets[3]) + } +} +``` + +Note: `Ipv4Addr` already has an `octets` method that returns a `[u8; 4]`. + +# Drawbacks +[drawbacks]: #drawbacks + +It adds additional functions to the API, which increases cognitive load +and maintenance burden. That said, the functions are conceptually very simple +and their implementations short. + +# Alternatives +[alternatives]: #alternatives + +Do nothing. The downside is that developers will need to resort to +bitwise arithmetic, which is awkward and error-prone (particularly with +respect to byte ordering) to convert between `Ipv6Addr` and the on-wire +representation of IPv6 addresses. Or they will use their alternative +implementations of `Ipv6Addr`, fragmenting the ecosystem. + +# Unresolved questions +[unresolved]: #unresolved-questions + diff --git a/text/1504-int128.md b/text/1504-int128.md new file mode 100644 index 00000000000..4b43883bb39 --- /dev/null +++ b/text/1504-int128.md @@ -0,0 +1,107 @@ +- Feature Name: int128 +- Start Date: 21-02-2016 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1504 +- Rust Issue: https://github.com/rust-lang/rust/issues/35118 + +# Summary +[summary]: #summary + +This RFC adds the `i128` and `u128` primitive types to Rust. + +# Motivation +[motivation]: #motivation + +Some algorithms need to work with very large numbers that don't fit in 64 bits, such as certain cryptographic algorithms. 
One possibility would be to use a BigNum library, but these use heap allocation and tend to have high overhead. LLVM has support for very efficient 128-bit integers, which are exposed by Clang in C as the `__int128` type. + +# Detailed design +[design]: #detailed-design + +## Compiler support + +The first step for implementing this feature is to add support for the `i128`/`u128` primitive types to the compiler. This will requires changes to many parts of the compiler, from libsyntax to trans. + +The compiler will need to be bootstrapped from an older compiler which does not support `i128`/`u128`, but rustc will want to use these types internally for things like literal parsing and constant propagation. This can be solved by using a "software" implementation of these types, similar to the one in the [extprim](https://github.com/kennytm/extprim) crate. Once stage1 is built, stage2 can be compiled using the native LLVM `i128`/`u128` types. + +## Runtime library support + +The LLVM code generator supports 128-bit integers on all architectures, however it will lower some operations to runtime library calls. This similar to how we currently handle `u64` and `i64` on 32-bit platforms: "complex" operations such as multiplication or division are lowered by LLVM backends into calls to functions in the `compiler-rt` runtime library. + +Here is a rough breakdown of which operations are handled natively instead of through a library call: +- Add/Sub/Neg: native, including checked overflow variants +- Compare (eq/ne/gt/ge/lt/le): native +- Bitwise and/or/xor/not: native +- Shift left/right: native on most architectures (some use libcalls instead) +- Bit counting, parity, leading/trailing ones/zeroes: native +- Byte swapping: native +- Mul/Div/Mod: libcall (including checked overflow multiplication) +- Conversion to/from f32/f64: libcall + +The `compiler-rt` library that comes with LLVM only implements runtime library functions for 128-bit integers on 64-bit platforms (`#ifdef __LP64__`). We will need to provide our own implementations of the relevant functions to allow `i128`/`u128` to be available on all architectures. Note that this can only be done with a compiler that already supports `i128`/`u128` to match the calling convention that LLVM is expecting. + +Here is the list of functions that need to be implemented: + +```rust +fn __ashlti3(a: i128, b: i32) -> i128; +fn __ashrti3(a: i128, b: i32) -> i128; +fn __divti3(a: i128, b: i128) -> i128; +fn __fixdfti(a: f64) -> i128; +fn __fixsfti(a: f32) -> i128; +fn __fixunsdfti(a: f64) -> u128; +fn __fixunssfti(a: f32) -> u128; +fn __floattidf(a: i128) -> f64; +fn __floattisf(a: i128) -> f32; +fn __floatuntidf(a: u128) -> f64; +fn __floatuntisf(a: u128) -> f32; +fn __lshrti3(a: i128, b: i32) -> i128; +fn __modti3(a: i128, b: i128) -> i128; +fn __muloti4(a: i128, b: i128, overflow: &mut i32) -> i128; +fn __multi3(a: i128, b: i128) -> i128; +fn __udivti3(a: u128, b: u128) -> u128; +fn __umodti3(a: u128, b: u128) -> u128; +``` + +Implementations of these functions will be written in Rust and will be included in libcore. Note that it is not possible to write these functions in C or use the existing implementations in `compiler-rt` since the `__int128` type is not available in C on 32-bit platforms. + +## Modifications to libcore + +Several changes need to be done to libcore: +- `src/libcore/num/i128.rs`: Define `MIN` and `MAX`. +- `src/libcore/num/u128.rs`: Define `MIN` and `MAX`. 
+- `src/libcore/num/mod.rs`: Implement inherent methods, `Zero`, `One`, `From` and `FromStr` for `u128` and `i128`. +- `src/libcore/num/wrapping.rs`: Implement methods for `Wrapping` and `Wrapping`. +- `src/libcore/fmt/num.rs`: Implement `Binary`, `Octal`, `LowerHex`, `UpperHex`, `Debug` and `Display` for `u128` and `i128`. +- `src/libcore/cmp.rs`: Implement `Eq`, `PartialEq`, `Ord` and `PartialOrd` for `u128` and `i128`. +- `src/libcore/nonzero.rs`: Implement `Zeroable` for `u128` and `i128`. +- `src/libcore/iter.rs`: Implement `Step` for `u128` and `i128`. +- `src/libcore/clone.rs`: Implement `Clone` for `u128` and `i128`. +- `src/libcore/default.rs`: Implement `Default` for `u128` and `i128`. +- `src/libcore/hash/mod.rs`: Implement `Hash` for `u128` and `i128` and add `write_i128` and `write_u128` to `Hasher`. +- `src/libcore/lib.rs`: Add the `u128` and `i128` modules. + +## Modifications to libstd + +A few minor changes are required in libstd: +- `src/libstd/lib.rs`: Re-export `core::{i128, u128}`. +- `src/libstd/primitive_docs.rs`: Add documentation for `i128` and `u128`. + +## Modifications to other crates + +A few external crates will need to be updated to support the new types: +- `rustc-serialize`: Add the ability to serialize `i128` and `u128`. +- `serde`: Add the ability to serialize `i128` and `u128`. +- `rand`: Add the ability to generate random `i128`s and `u128`s. + +# Drawbacks +[drawbacks]: #drawbacks + +One possible issue is that a `u128` can hold a very large number that doesn't fit in a `f32`. We need to make sure this doesn't lead to any `undef`s from LLVM. See [this comment](https://github.com/rust-lang/rust/issues/10185#issuecomment-110955148), and [this example code](https://gist.github.com/Amanieu/f87da5f0599b343c5500). + +# Alternatives +[alternatives]: #alternatives + +There have been several attempts to create `u128`/`i128` wrappers based on two `u64` values, but these can't match the performance of LLVM's native 128-bit integers. For example LLVM is able to lower a 128-bit add into just 2 instructions on 64-bit platforms and 4 instructions on 32-bit platforms. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None diff --git a/text/1506-adt-kinds.md b/text/1506-adt-kinds.md new file mode 100644 index 00000000000..8c033ade760 --- /dev/null +++ b/text/1506-adt-kinds.md @@ -0,0 +1,181 @@ +- Feature Name: clarified_adt_kinds +- Start Date: 2016-02-07 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1506 +- Rust Issue: https://github.com/rust-lang/rust/issues/35626 + +# Summary +[summary]: #summary + +Provide a simple model describing three kinds of structs and variants and their relationships. +Provide a way to match on structs/variants in patterns regardless of their kind (`S{..}`). +Permit tuple structs and tuple variants with zero fields (`TS()`). + +# Motivation +[motivation]: #motivation + +There's some mental model lying under the current implementation of ADTs, but it is not written +out explicitly and not implemented completely consistently. +Writing this model out helps to identify its missing parts. +Some of this missing parts turn out to be practically useful. +This RFC can also serve as a piece of documentation. + +# Detailed design +[design]: #detailed-design + +The text below mostly talks about structures, but almost everything is equally applicable to +variants. + +## Braced structs + +Braced structs are declared with braces (unsurprisingly). 
+ +``` +struct S { + field1: Type1, + field2: Type2, + field3: Type3, +} +``` + +Braced structs are the basic struct kind, other kinds are built on top of them. +Braced structs have 0 or more user-named fields and are defined only in type namespace. + +Braced structs can be used in struct expressions `S{field1: expr, field2: expr}`, including +functional record update (FRU) `S{field1: expr, ..s}`/`S{..s}` and with struct patterns +`S{field1: pat, field2: pat}`/`S{field1: pat, ..}`/`S{..}`. +In all cases the path `S` of the expression or pattern is looked up in the type namespace (so these +expressions/patterns can be used with type aliases). +Fields of a braced struct can be accessed with dot syntax `s.field1`. + +Note: struct *variants* are currently defined in the value namespace in addition to type namespace, + there are no particular reasons for this and this is probably temporary. + +## Unit structs + +Unit structs are defined without any fields or brackets. + +``` +struct US; +``` + +Unit structs can be thought of as a single declaration for two things: a basic struct + +``` +struct US {} +``` + +and a constant with the same nameNote 1 + +``` +const US: US = US{}; +``` + +Unit structs have 0 fields and are defined in both type (the type `US`) and value (the +constant `US`) namespaces. + +As a basic struct, a unit struct can participate in struct expressions `US{}`, including FRU +`US{..s}` and in struct patterns `US{}`/`US{..}`. In both cases the path `US` of the expression +or pattern is looked up in the type namespace (so these expressions/patterns can be used with type +aliases). +Fields of a unit struct could also be accessed with dot syntax, but it doesn't have any fields. + +As a constant, a unit struct can participate in unit struct expressions `US` and unit struct +patterns `US`, both of these are looked up in the value namespace in which the constant `US` is +defined (so these expressions/patterns cannot be used with type aliases). + +Note 1: the constant is not exactly a `const` item, there are subtle differences (e.g. with regards +to `match` exhaustiveness), but it's a close approximation. +Note 2: the constant is pretty weirdly namespaced in case of unit *variants*, constants can't be +defined in "enum modules" manually. + +## Tuple structs + +Tuple structs are declared with parentheses. +``` +struct TS(Type0, Type1, Type2); +``` + +Tuple structs can be thought of as a single declaration for two things: a basic struct + +``` +struct TS { + 0: Type0, + 1: Type1, + 2: Type2, +} +``` + +and a constructor function with the same nameNote 2 + +``` +fn TS(arg0: Type0, arg1: Type1, arg2: Type2) -> TS { + TS{0: arg0, 1: arg1, 2: arg2} +} +``` + +Tuple structs have 0 or more automatically-named fields and are defined in both type (the type `TS`) +and the value (the constructor function `TS`) namespaces. + +As a basic struct, a tuple struct can participate in struct expressions `TS{0: expr, 1: expr}`, +including FRU `TS{0: expr, ..ts}`/`TS{..ts}` and in struct patterns +`TS{0: pat, 1: pat}`/`TS{0: pat, ..}`/`TS{..}`. +In both cases the path `TS` of the expression or pattern is looked up in the type namespace (so +these expressions/patterns can be used with type aliases). +Fields of a tuple struct can be accessed with dot syntax `ts.0`. 
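+
+A hedged sketch of those braced forms applied to a tuple struct (the type and
+values are illustrative), as they would be written under this proposal:
+
+```rust
+pub struct TS(pub u8, pub bool);
+
+// Braced expression form: positional fields addressed by the names 0, 1, ...
+pub fn build() -> TS {
+    TS { 0: 7, 1: true }
+}
+
+// Braced pattern form, including the `..` "ignore the rest" shorthand.
+pub fn first(ts: &TS) -> u8 {
+    match ts {
+        TS { 0: n, .. } => *n,
+    }
+}
+```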
+ +As a constructor, a tuple struct can participate in tuple struct expressions `TS(expr, expr)` and +tuple struct patterns `TS(pat, pat)`/`TS(..)`, both of these are looked up in the value namespace +in which the constructor `TS` is defined (so these expressions/patterns cannot be used with type +aliases). Tuple struct expressions `TS(expr, expr)` are usual +function calls, but the compiler reserves the right to make observable improvements to them based +on the additional knowledge, that `TS` is a constructor. + +Note 1: the automatically assigned field names are quite interesting, they are not identifiers +lexically (they are integer literals), so such fields can't be defined manually. +Note 2: the constructor function is not exactly a `fn` item, there are subtle differences (e.g. with +regards to privacy checks), but it's a close approximation. + +## Summary of the changes. + +Everything related to braced structs and unit structs is already implemented. + +New: Permit tuple structs and tuple variants with 0 fields. This restriction is artificial and can +be lifted trivially. Macro writers dealing with tuple structs/variants will be happy to get rid of +this one special case. + +New: Permit using tuple structs and tuple variants in braced struct patterns and expressions not +requiring naming their fields - `TS{..ts}`/`TS{}`/`TS{..}`. This doesn't require much effort to +implement as well. +This also means that `S{..}` patterns can be used to match structures and variants of any kind. +The desire to have such "match everything" patterns is sometimes expressed given +that number of fields in structures and variants can change from zero to non-zero and back during +development. +An extra benefit is ability to match/construct tuple structs using their type aliases. + +New: Permit using tuple structs and tuple variants in braced struct patterns and expressions +requiring naming their fields - `TS{0: expr}`/`TS{0: pat}`/etc. +While this change is important for consistency, there's not much motivation for it in hand-written +code besides shortening patterns like `ItemFn(_, _, unsafety, _, _, _)` into something like +`ItemFn{2: unsafety, ..}` and ability to match/construct tuple structs using their type aliases. +However, automatic code generators (e.g. syntax extensions) can get more benefits from the +ability to generate uniform code for all structure kinds. +`#[derive]` for example, currently has separate code paths for generating expressions and patterns +for braces structs (`ExprStruct`/`PatKind::Struct`), tuple structs +(`ExprCall`/`PatKind::TupleStruct`) and unit structs (`ExprPath`/`PatKind::Path`). With proposed +changes `#[derive]` could simplify its logic and always generate braced forms for expressions and +patterns. + +# Drawbacks +[drawbacks]: #drawbacks + +None. + +# Alternatives +[alternatives]: #alternatives + +None. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None. diff --git a/text/1510-cdylib.md b/text/1510-cdylib.md new file mode 100644 index 00000000000..7961262594c --- /dev/null +++ b/text/1510-cdylib.md @@ -0,0 +1,101 @@ +- Feature Name: N/A +- Start Date: 2016-02-23 +- RFC PR: [rust-lang/rfcs#1510](https://github.com/rust-lang/rfcs/pull/1510) +- Rust Issue: [rust-lang/rust#33132](https://github.com/rust-lang/rust/issues/33132) + +# Summary +[summary]: #summary + +Add a new crate type accepted by the compiler, called `cdylib`, which +corresponds to exporting a C interface from a Rust dynamic library. 
+ +# Motivation +[motivation]: #motivation + +Currently the compiler supports two modes of generating dynamic libraries: + +1. One form of dynamic library is intended for reuse with further compilations. + This kind of library exposes all Rust symbols, links to the standard library + dynamically, etc. I'll refer to this mode as **rdylib** as it's a Rust + dynamic library talking to Rust. +2. Another form of dynamic library is intended for embedding a Rust application + into another. Currently the only difference from the previous kind of dynamic + library is that it favors linking statically to other Rust libraries + (bundling them inside). I'll refer to this as a **cdylib** as it's a Rust + dynamic library exporting a C API. + +Each of these flavors of dynamic libraries has a distinct use case. For examples +rdylibs are used by the compiler itself to implement plugins, and cdylibs are +used whenever Rust needs to be dynamically loaded from another language or +application. + +Unfortunately the balance of features is tilted a little bit too much towards +the smallest use case, rdylibs. In practice because Rust is statically linked by +default and has an unstable ABI, rdylibs are used quite rarely. There are a +number of requirements they impose, however, which aren't necessary for +cdylibs: + +* Metadata is included in all dynamic libraries. If you're just loading Rust + into somewhere else, however, you have no need for the metadata! +* *Reachable* symbols are exposed from dynamic libraries, but if you're loading + Rust into somewhere else then, like executables, only *public* non-Rust-ABI + functions need to be exported. This can lead to unnecessarily large Rust + dynamic libraries in terms of object size as well as missed optimization + opportunities from knowing that a function is otherwise private. +* We can't run LTO for dylibs because those are intended for end products, not + intermediate ones like (1) is. + +The purpose of this RFC is to solve these drawbacks with a new crate-type to +represent the more rarely used form of dynamic library (rdylibs). + +# Detailed design +[design]: #detailed-design + +A new crate type will be accepted by the compiler, `cdylib`, which can be passed +as either `--crate-type cdylib` on the command line or via `#![crate_type = +"cdylib"]` in crate attributes. This crate type will conceptually correspond to +the cdylib use case described above, and today's `dylib` crate-type will +continue to correspond to the rdylib use case above. Note that the literal +output artifacts of these two crate types (files, file names, etc) will be the +same. + +The two formats will differ in the parts listed in the motivation above, +specifically: + +* **Metadata** - rdylibs will have a section of the library with metadata, + whereas cdylibs will not. +* **Symbol visibility** - rdylibs will expose all symbols as rlibs do, cdylibs + will expose symbols as executables do. This means that `pub fn foo() {}` will + not be an exported symbol, but `#[no_mangle] pub extern fn foo() {}` will be + an exported symbol. Note that the compiler will also be at liberty to pass + extra flags to the linker to actively hide exported Rust symbols from linked + libraries. +* **LTO** - this will disallowed for rdylibs, but enabled for cdylibs. +* **Linkage** - rdylibs will link dynamically to one another by default, for + example the standard library will be linked dynamically by default. On the + other hand, cdylibs will link all Rust dependencies statically by default. 
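+
+To make the distinction concrete, a minimal sketch of a cdylib-style crate might look like the
+following (the crate contents and names here are illustrative, not part of this RFC):
+
+```rust
+// Built with `--crate-type cdylib` (or via the attribute below).
+#![crate_type = "cdylib"]
+
+// A plain `pub fn` with the Rust ABI: not an exported symbol of the
+// resulting dynamic library under this crate type.
+pub fn helper(x: u32) -> u32 {
+    x.wrapping_mul(3)
+}
+
+// A `#[no_mangle] pub extern` function: exported for use by the
+// embedding application or foreign language.
+#[no_mangle]
+pub extern "C" fn triple(x: u32) -> u32 {
+    helper(x)
+}
+```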
+ +# Drawbacks +[drawbacks]: #drawbacks + +Rust's ephemeral and ill-defined "linkage model" is... well... ill defined and +ephemeral. This RFC is an extension of this model, but it's difficult to reason +about extending that which is not well defined. As a result there could be +unforseen interactions between this output format and where it's used. + +# Alternatives +[alternatives]: #alternatives + +* Originally this RFC proposed adding a new crate type, `rdylib`, instead of + adding a new crate type, `cdylib`. The existing `dylib` output type would be + reinterpreted as a cdylib use-case. This is unfortunately, however, a breaking + change and requires a somewhat complicated transition plan in Cargo for + plugins. In the end it didn't seem worth it for the benefit of "cdylib is + probably what you want". + +# Unresolved questions +[unresolved]: #unresolved-questions + +* Should the existing `dylib` format be considered unstable? (should it require + a nightly compiler?). The use case for a Rust dynamic library is so limited, + and so volatile, we may want to just gate access to it by default. diff --git a/text/1513-less-unwinding.md b/text/1513-less-unwinding.md new file mode 100644 index 00000000000..a46c736e077 --- /dev/null +++ b/text/1513-less-unwinding.md @@ -0,0 +1,273 @@ +- Feature Name: `panic_runtime` +- Start Date: 2016-02-25 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1513 +- Rust Issue: https://github.com/rust-lang/rust/issues/32837 + +# Summary +[summary]: #summary + +Stabilize implementing panics as aborts. + +* Stabilize the `-Z no-landing-pads` flag under the name `-C panic=strategy` +* Implement a number of unstable features akin to custom allocators to swap out + implementations of panic just before a final product is generated. +* Add a `[profile.dev]` option to Cargo to configure how panics are implemented. + +# Motivation +[motivation]: #motivation + +Panics in Rust have long since been implemented with the intention of being +caught at particular boundaries (for example the thread boundary). This is quite +useful for isolating failures in Rust code, for example: + +* Servers can avoid taking down the entire process but can instead just take + down one request. +* Embedded Rust libraries can avoid taking down the entire process and can + instead gracefully inform the caller that an internal logic error occurred. +* Rust applications can isolate failure from various components. The classical + example of this is Servo can display a "red X" for an image which fails to + decode instead of aborting the entire browser or killing an entire page. + +While these are examples where a recoverable panic is useful, there are many +applications where recovering panics is undesirable or doesn't lead to anything +productive: + +* Rust applications which use `Result` for error handling typically use `panic!` + to indicate a fatal error, in which case the process *should* be taken down. +* Many applications simply can't recover from an internal assertion failure, so + there's no need trying to recover it. +* To implement a recoverable panic, the compiler and standard library use a + method called stack unwinding. The compiler must generate code to support this + unwinding, however, and this takes time in codegen and optimizers. +* Low-level applications typically don't use unwinding at all as there's no + stack unwinder (e.g. kernels). 
+ +> **Note**: as an idea of the compile-time and object-size savings from +> disabling the extra codegen, compiling Cargo as a library is 11% faster (16s +> from 18s) and 13% smaller (15MB to 13MB). Sizable gains! + +Overall, the ability to recover panics is something that needs to be decided at +the application level rather than at the language level. Currently the compiler +does not support the ability to translate panics to process aborts in a stable +fashion, and the purpose of this RFC is to add such a venue. + +With such an important codegen option, however, as whether or not exceptions can +be caught, it's easy to get into a situation where libraries of mixed +compilation modes are linked together, causing odd or unknown errors. This RFC +proposes a situation similar to the design of custom allocators to alleviate +this situation. + +# Detailed design +[design]: #detailed-design + +The major goal of this RFC is to develop a work flow around managing crates +which wish to disable unwinding. This intends to set forth a complete vision for +how these crates interact with the ecosystem at large. Much of this design will +be similar to the [custom allocator RFC][custom-allocators]. + +[custom-allocators]: https://github.com/rust-lang/rfcs/blob/master/text/1183-swap-out-jemalloc.md + +### High level design + +This section serves as a high-level tour through the design proposed in this +RFC. The linked sections provide more complete explanation as to what each step +entails. + +* The compiler will have a [new stable flag](#new-compiler-flags), `-C panic` + which will configure how unwinding-related code is generated. +* [Two new unstable attributes](#panic-attributes) will be added to the + compiler, `#![needs_panic_runtime]` and `#![panic_runtime]`. The standard + library will need a runtime and will be lazily linked to a crate which has + `#![panic_runtime]`. +* [Two unstable crates](#panic-crates) tagged with `#![panic_runtime]` will be + distributed as the runtime implementation of panicking, `panic_abort` and + `panic_unwind` crates. The former will translate all panics to process + aborts, whereas the latter will be implemented as unwinding is today, via the + system stack unwinder. +* [Cargo will gain](#cargo-changes) a new `panic` option in the `[profile.foo]` + sections to indicate how that profile should compile panic support. + +### New Compiler Flags + +The first component to this design is to have a **stable** flag to the compiler +which configures how panic-related code is generated. This will be +stabilized in the form: + +``` +$ rustc -C help + +Available codegen options: + + ... + -C panic=val -- strategy to compile in for panic related code + ... +``` + +There will currently be two supported strategies: + +* `unwind` - this is what the compiler implements by default today via the + `invoke` LLVM instruction. +* `abort` - this will implement that `-Z no-landing-pads` does today, which is + to disable the `invoke` instruction and use `call` instead everywhere. + +This codegen option will default to `unwind` if not specified (what happens +today), and the value will be encoded into the crate metadata. This option is +planned with extensibility in mind to future panic strategies if we ever +implement some (return-based unwinding is at least one other possible option). 
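+
+As a rough illustration of the difference between the two strategies (the example itself is not
+part of this RFC, and uses the standard library's `catch_unwind` API):
+
+```rust
+use std::panic;
+
+fn main() {
+    // With `-C panic=unwind` the panic propagates as an unwind and is
+    // caught here, so the program prints `caught: true` and keeps going.
+    // With `-C panic=abort` the process aborts before `catch_unwind`
+    // can return.
+    let result = panic::catch_unwind(|| {
+        panic!("boom");
+    });
+    println!("caught: {}", result.is_err());
+}
+```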
+ +### Panic Attributes + +Very similarly to [custom allocators][allocator-attributes], two new +**unstable** crate attributes will be added to the compiler: + +[allocator-attributes]: https://github.com/rust-lang/rfcs/blob/master/text/1183-swap-out-jemalloc.md#new-attributes + +* `#![needs_panic_runtime]` - indicates that this crate requires a "panic + runtime" to link correctly. This will be attached to the standard library and + is not intended to be attached to any other crate. +* `#![panic_runtime]` - indicates that this crate is a runtime implementation of + panics. + +As with allocators, there are a number of limitations imposed by these +attributes by the compiler: + +* Any crate DAG can only contain at most one instance of `#![panic_runtime]`. +* Implicit dependency edges are drawn from crates tagged with + `#![needs_panic_runtime]` to those tagged with `#![panic_runtime]`. Loops as + usual are forbidden (e.g. a panic runtime can't depend on libstd). +* Complete artifacts which include a crate tagged with `#![needs_panic_runtime]` + must include a panic runtime. This includes executables, dylibs, and + staticlibs. If no panic runtime is explicitly linked, then the compiler will + select an appropriate runtime to inject. +* Finally, the compiler will ensure that panic runtimes and compilation modes + are not mismatched. For a final product (outputs that aren't rlibs) the + `-C panic` mode of the panic runtime must match the final product itself. If + the panic mode is `abort`, then no other validation is performed, but + otherwise all crates in the DAG must have the same value of `-C panic`. + +The purpose of these limitations is to solve a number of problems that arise +when switching panic strategies. For example with aborting panic crates won't +have to link to runtime support of unwinding, or rustc will disallow mixing +panic strategies by accident. + +The actual API of panic runtimes will not be detailed in this RFC. These new +attributes will be unstable, and consequently the API itself will also be +unstable. It suffices to say, however, that like custom allocators a panic +runtime will implement some public `extern` symbols known to the crates that +need a panic runtime, and that's how they'll communicate/link up. + +### Panic Crates + +Two new **unstable** crates will be added to the distribution for each target: + +* `panic_unwind` - this is an extraction of the current implementation of + panicking from the standard library. It will use the same mechanism of stack + unwinding as is implemented on all current platforms. +* `panic_abort` - this is a new implementation of panicking which will simply + translate unwinding to process aborts. There will be no runtime support + required by this crate. + +The compiler will assume that these crates are distributed for each platform +where the standard library is also distributed (e.g. a crate that has +`#![needs_panic_runtime]`). + +### Compiler defaults + +The compiler will ship with a few defaults which affect how panic runtimes are +selected in Rust programs. Specifically: + +* The `-C panic` option will default to **unwind** as it does today. +* The libtest crate will explicitly link to `panic_unwind`. The test runner that + libtest implements relies on equating panics with failure and cannot work if + panics are translated to aborts. +* If no panic runtime is explicitly selected, the compiler will employ the + following logic to decide what panic runtime to inject: + + 1. 
If any crate in the DAG is compiled with `-C panic=abort`, then `panic_abort` + will be injected. + 2. If all crates in the DAG are compiled with `-C panic=unwind`, then + `panic_unwind` is injected. + +### Cargo changes + +In order to export this new feature to Cargo projects, a new option will be +added to the `[profile]` section of manifests: + +```toml +[profile.dev] +panic = 'unwind' +``` + +This will cause Cargo to pass `-C panic=unwind` to all `rustc` invocations for +a crate graph. Cargo will have special knowledge, however, that for `cargo +test` it cannot pass `-C panic=abort`. + +# Drawbacks +[drawbacks]: #drawbacks + +* The implementation of custom allocators was no small feat in the compiler, and + much of this RFC is essentially the same thing. Similar infrastructure can + likely be leveraged to alleviate the implementation complexity, but this is + undeniably a large change to the compiler for albeit a relatively minor + option. The counter point to this, however, is that disabling unwinding in a + principled fashion provides far higher quality error messages, prevents + erroneous situations, and provides an immediate benefit for many Rust users + today. + +* The binary distribution of the standard library will not change from what it + is today. In other words, the standard library (and dependency crates like + libcore) will be compiled with `-C panic=unwind`. This introduces the + opportunity for extra code bloat or missed optimizations in applications that + end up disabling unwinding in the long run. Distribution, however, is *far* + easier because there's only one copy of the standard library and we don't have + to rely on any other form of infrastructure. + +* This represents a proliferation of the `#![needs_foo]` and `#![foo]` style + system that allocators have begun. This may be indicative of a deeper + underlying requirement here of the standard library or perhaps showing how the + strategy in the standard library needs to change. If the standard library were + a crates.io crate it would arguably support these options via Cargo features, + but without that option is this the best way to be implementing these switches + for the standard library? + +# Alternatives +[alternatives]: #alternatives + +* Currently this RFC allows mixing multiple panic runtimes in a crate graph so + long as the actual runtime is compiled with `-C panic=abort`. This is + primarily done to immediately reap benefit from `-C panic=abort` even though + the standard library we distribute will still have unwinding support compiled + in (compiled with `-C panic=unwind`). In the not-too-distant future however, + we will likely be poised to distribute multiple binary copies of the standard + library compiled with different profiles. We may be able to tighten this + restriction on behalf of the compiler, requiring that all crates in a DAG have + the same `-C panic` compilation mode, but there would unfortunately be no + immediate benefit to implementing the RFC from users of our precompiled + nightlies. + + This alternative, additionally, can also be viewed as a drawback. It's unclear + what a future libstd distribution mechanism would look like and how this RFC + might interact with it. Stabilizing disabling unwinding via a compiler switch + or a Cargo profile option may not end up meshing well with the strategy we + pursue with shipping multiple standard libraries. 
+
+* Instead of the panic runtime support in this RFC, we could instead just ship
+  two different copies of the standard library where one simply translates
+  panics to abort instead of unwinding. This is unfortunately very difficult
+  for Cargo or the compiler to track, however, to ensure that the codegen
+  option of how panics are translated is propagated throughout the rest of
+  the crate graph. Additionally it may be easy to mix up crates of different
+  panic strategies.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+* One possible implementation of unwinding is via return-based flags. Much of
+  this RFC is designed with the intention of supporting arbitrary unwinding
+  implementations, but it's unclear whether it's too heavily biased towards
+  panic being either unwinding or aborting.
+
+* The current implementation of Cargo would mean that a naive implementation of
+  the profile option would cause recompiles between `cargo build` and `cargo
+  test` for projects that specify `panic = 'abort'`. Is this acceptable? Should
+  Cargo cache both copies of the crate?
diff --git a/text/1521-copy-clone-semantics.md b/text/1521-copy-clone-semantics.md
new file mode 100644
index 00000000000..6a79314d156
--- /dev/null
+++ b/text/1521-copy-clone-semantics.md
@@ -0,0 +1,66 @@
+- Feature Name: N/A
+- Start Date: 01 March, 2016
+- RFC PR: [rust-lang/rfcs#1521](https://github.com/rust-lang/rfcs/pull/1521)
+- Rust Issue: [rust-lang/rust#33416](https://github.com/rust-lang/rust/issues/33416)
+
+# Summary
+[summary]: #summary
+
+With specialization on the way, we need to talk about the semantics of
+`<T as Clone>::clone() where T: Copy`.
+
+It's generally been an unspoken rule of Rust that a `clone` of a `Copy` type is
+equivalent to a `memcpy` of that type; however, that fact is not documented
+anywhere. This fact should be in the documentation for the `Clone` trait, just
+like the fact that `T: Eq` should implement `a == b == c == a` rules.
+
+# Motivation
+[motivation]: #motivation
+
+Currently, `Vec::clone()` is implemented by creating a new `Vec`, and then
+cloning all of the elements from one into the other. This is slow in debug mode,
+and may not always be optimized (although it often will be). Specialization
+would allow us to simply `memcpy` the values from the old `Vec` to the new
+`Vec` in the case of `T: Copy`. However, if we don't specify this, we will not
+be able to, and we will be stuck looping over every value.
+
+It's always been the intention that `Clone::clone == ptr::read for T: Copy`; see
+[issue #23790][issue-copy]: "It really makes sense for `Clone` to be a
+supertrait of `Copy` -- `Copy` is a refinement of `Clone` where `memcpy`
+suffices, basically." This idea was also implicit in accepting
+[rfc #0839][rfc-extend] where "[B]ecause Copy: Clone, it would be backwards
+compatible to upgrade to Clone in the future if demand is high enough."
+
+# Detailed design
+[design]: #detailed-design
+
+Specify that `<T as Clone>::clone(t)` shall be equivalent to `ptr::read(t)`
+where `T: Copy, t: &T`. An implementation that does not uphold this *shall not*
+result in undefined behavior; `Clone` is not an `unsafe trait`.
+
+Also add something like the following sentence to the documentation for the
+`Clone` trait:
+
+"If `T: Copy`, `x: T`, and `y: &T`, then `let x = y.clone();` is equivalent to
+`let x = *y;`. Manual implementations must be careful to uphold this."
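+
+As an illustrative sketch of what this documented rule means for a manual implementation
+(the `Meters` type here is hypothetical):
+
+```rust
+#[derive(Copy)]
+struct Meters(u32);
+
+// For a `Copy` type, a hand-written `clone` must be equivalent to a
+// `memcpy` of the value; returning `*self` upholds the documented rule.
+impl Clone for Meters {
+    fn clone(&self) -> Meters {
+        *self
+    }
+}
+```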
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+This is a breaking change, technically, although it breaks code that was
+malformed in the first place.
+
+# Alternatives
+[alternatives]: #alternatives
+
+The alternative is that, for each type and function we would like to specialize
+in this way, we document this separately. This is how we started off with
+`clone_from_slice`.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+What the exact wording should be.
+
+[issue-copy]: https://github.com/rust-lang/rust/issues/23790
+[rfc-extend]: https://github.com/rust-lang/rfcs/blob/master/text/0839-embrace-extend-extinguish.md
diff --git a/text/1522-conservative-impl-trait.md b/text/1522-conservative-impl-trait.md
new file mode 100644
index 00000000000..20c23428fef
--- /dev/null
+++ b/text/1522-conservative-impl-trait.md
@@ -0,0 +1,545 @@
+- Feature Name: conservative_impl_trait
+- Start Date: 2016-01-31
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1522
+- Rust Issue: https://github.com/rust-lang/rust/issues/34511
+
+# Summary
+[summary]: #summary
+
+Add a conservative form of abstract return types, also known as `impl
+Trait`, that will be compatible with most possible future extensions
+by initially being restricted to:
+
+- Only free-standing or inherent functions.
+- Only return type position of a function.
+
+Abstract return types allow a function to hide a concrete return
+type behind a trait interface similar to trait objects, while
+still generating the same statically dispatched code as with concrete types.
+
+With the placeholder syntax used in discussions so far,
+abstract return types would be used roughly like this:
+
+```rust
+fn foo(n: u32) -> impl Iterator<Item=u32> {
+    (0..n).map(|x| x * 100)
+}
+// ^ behaves as if it had return type Map<Range<u32>, Closure>
+// where Closure = type of the |x| x * 100 closure.
+
+for x in foo(10) {
+    // x = 0, 100, 200, ...
+}
+
+```
+
+# Background
+
+There has been much discussion around the `impl Trait` feature already, with
+different proposals extending the core idea into different directions:
+
+- The [original proposal](https://github.com/rust-lang/rfcs/pull/105).
+- A [blog post](http://aturon.github.io/blog/2015/09/28/impl-trait/) reviving
+  the proposal and further exploring the design space.
+- A [more recent proposal](https://github.com/rust-lang/rfcs/pull/1305) with a
+  substantially more ambitious scope.
+
+This RFC is an attempt to make progress on the feature by proposing a minimal
+subset that should be forwards-compatible with a whole range of extensions that
+have been discussed (and will be reviewed in this RFC). However, even this small
+step requires resolving some of the core questions raised in
+[the blog post](http://aturon.github.io/blog/2015/09/28/impl-trait/).
+
+This RFC is closest in spirit to the
+[original RFC](https://github.com/rust-lang/rfcs/pull/105), and we'll repeat
+its motivation and some other parts of its text below.
+
+# Motivation
+[motivation]: #motivation
+
+> Why are we doing this? What use cases does it support? What is the expected outcome?
+
+In today's Rust, you can write a function signature like
+
+````rust
+fn consume_iter_static<I: Iterator<Item=u8>>(iter: I)
+fn consume_iter_dynamic(iter: Box<Iterator<Item=u8>>)
+````
+
+In both cases, the function does not depend on the exact type of the argument.
+The type is held "abstract", and is assumed only to satisfy a trait bound.
+
+* In the `_static` version using generics, each use of the function is
+  specialized to a concrete, statically-known type, giving static dispatch, inline
+  layout, and other performance wins.
+
+* In the `_dynamic` version using trait objects, the concrete argument type is
+  only known at runtime using a vtable.
+
+On the other hand, while you can write
+
+````rust
+fn produce_iter_dynamic() -> Box<Iterator<Item=u8>>
+````
+
+you _cannot_ write something like
+
+````rust
+fn produce_iter_static() -> Iterator<Item=u8>
+````
+
+That is, in today's Rust, abstract return types can only be written using trait
+objects, which can be a significant performance penalty. This RFC proposes
+"unboxed abstract types" as a way of achieving signatures like
+`produce_iter_static`. Like generics, unboxed abstract types guarantee static
+dispatch and inline data layout.
+
+Here are some problems that unboxed abstract types solve or mitigate:
+
+* _Returning unboxed closures_. Closure syntax generates an anonymous type
+  implementing a closure trait. Without unboxed abstract types, there is no way
+  to use this syntax while returning the resulting closure unboxed, because there
+  is no way to write the name of the generated type.
+
+* _Leaky APIs_. Functions can easily leak implementation details in their return
+  type, when the API should really only promise a trait bound. For example, a
+  function returning `Rev>` is revealing exactly how the iterator
+  is constructed, when the function should only promise that it returns _some_
+  type implementing `Iterator`. Using newtypes/structs with private fields
+  helps, but is extra work. Unboxed abstract types make it as easy to promise only
+  a trait bound as it is to return a concrete type.
+
+* _Complex types_. Use of iterators in particular can lead to huge types:
+
+  ````rust
+  Chain>>>, SkipWhile<'a, u16, Map<'a, &u16, u16, slice::Items>>>
+  ````
+
+  Even when using newtypes to hide the details, the type still has to be written
+  out, which can be very painful. Unboxed abstract types only require writing the
+  trait bound.
+
+* _Documentation_. In today's Rust, reading the documentation for the `Iterator`
+  trait is needlessly difficult. Many of the methods return new iterators, but
+  currently each one returns a different type (`Chain`, `Zip`, `Map`, `Filter`,
+  etc), and it requires drilling down into each of these types to determine what
+  kind of iterator they produce.
+
+In short, unboxed abstract types make it easy for a function signature to
+promise nothing more than a trait bound, and do not generally require the
+function's author to write down the concrete type implementing the bound.
+
+# Detailed design
+[design]: #detailed-design
+
+As explained at the start of the RFC, the focus here is a relatively narrow
+introduction of abstract types limited to the return type of inherent methods
+and free functions. While we still need to resolve some of the core questions
+about what an "abstract type" means even in these cases, we avoid some of the
+complexities that come along with allowing the feature in other locations or
+with other extensions.
+
+## Syntax
+
+Let's start with the bikeshed: The proposed syntax is `impl Trait` in return type
+position, composing like trait objects to forms like `impl Foo+Send+'a`.
+
+It can be explained as "a type that implements `Trait`",
+and has been used in that form in most earlier discussions and proposals.
+
+Initial versions of this RFC proposed `@Trait` for brevity reasons,
+since the feature is supposed to be used commonly once implemented,
+but due to strong negative reactions by the community this has been
+changed back to the current form.
+
+There are other possibilities, like `abstract Trait` or `~Trait`, with
+good reasons for or against them, but since the concrete choice of syntax
+is not a blocker for the implementation of this RFC, it is intended for
+a possible follow-up RFC to address syntax changes if needed.
+
+## Semantics
+
+The core semantics of the feature is described below.
+
+Note that the sections after this one go into more detail on some of the design
+decisions, and that **it is likely for many of the mentioned limitations to be
+lifted at some point in the future**. For clarity, we'll separately categorize the *core
+semantics* of the feature (aspects that would stay unchanged with future extensions)
+and the *initial limitations* (which are likely to be lifted later).
+
+**Core semantics**:
+
+- If a function returns `impl Trait`, its body can return values of any type that
+  implements `Trait`, but all return values need to be of the same type.
+
+- As far as the type system and the compiler are concerned, the return type
+  outside of the function would not be an entirely "new" type, nor would it be a
+  simple type alias. Rather, its semantics would be very similar to that of
+  _generic type parameters_ inside a function, with small differences caused by
+  being an _output_ rather than an _input_ of the function.
+
+  - The type would be known to implement the specified traits.
+  - The type would not be known to implement any other trait, with
+    the exception of OIBITS (aka "auto traits") and default traits like `Sized`.
+  - The type would not be considered equal to the actual underlying type.
+  - The type would not be allowed to appear as the Self type for an `impl` block.
+
+- Because OIBITS like `Send` and `Sync` will leak through an abstract return
+  type, there will be some additional complexity in the compiler due to some
+  non-local type checking becoming necessary.
+
+- The return type has an identity based on all generic parameters the
+  function body is parameterized by, and by the location of the function
+  in the module system. This means type equality behaves like this:
+
+  ```rust
+  fn foo<T: Trait>(t: T) -> impl Trait {
+      t
+  }
+
+  fn bar() -> impl Trait {
+      123
+  }
+
+  fn equal_type<T>(a: T, b: T) {}
+
+  equal_type(bar(), bar());                      // OK
+  equal_type(foo::<i32>(0), foo::<i32>(0));      // OK
+  equal_type(bar(), foo::<i32>(0));              // ERROR, `impl Trait {bar}` is not the same type as `impl Trait {foo<i32>}`
+  equal_type(foo::<bool>(false), foo::<i32>(0)); // ERROR, `impl Trait {foo<bool>}` is not the same type as `impl Trait {foo<i32>}`
+  ```
+
+- The code generation passes of the compiler would not draw a distinction
+  between the abstract return type and the underlying type, just like they don't
+  for generic parameters. This means:
+  - The same trait code would be instantiated, for example, `-> impl Any`
+    would return the type id of the underlying type.
+  - Specialization would specialize based on the underlying type.
+
+**Initial limitations**:
+
+- `impl Trait` may only be written within the return type of a freestanding or
+  inherent-impl function, not in trait definitions or any non-return type position. They may also not appear
+  in the return type of closure traits or function pointers,
+  unless these are themselves part of a legal return type.
+ + - Eventually, we will want to allow the feature to be used within traits, and + like in argument position as well (as an ergonomic improvement over today's generics). + - Using `impl Trait` multiple times in the same return type would be valid, + like for example in `-> (impl Foo, impl Bar)`. + +- The type produced when a function returns `impl Trait` would be effectively + unnameable, just like closures and function items. + + - We will almost certainly want to lift this limitation in the long run, so + that abstract return types can be placed into structs and so on. There are a + few ways we could do so, all related to getting at the "output type" of a + function given all of its generic arguments. + +- The function body cannot see through its own return type, so code like this + would be forbidden just like on the outside: + + ```rust + fn sum_to(n: u32) -> impl Display { + if n == 0 { + 0 + } else { + n + sum_to(n - 1) + } + } + ``` + + - It's unclear whether we'll want to lift this limitation, but it should be possible to do so. + +## Rationale + +### Why this semantics for the return type? + +There has been a lot of discussion about what the semantics of the return type +should be, with the theoretical extremes being "full return type inference" and +"fully abstract type that behaves like a autogenerated newtype wrapper". (This +was in fact the main focus of the +[blog post](http://aturon.github.io/blog/2015/09/28/impl-trait/) on `impl +Trait`.) + +The design as chosen in this RFC lies somewhat in between those two, since it +allows OIBITs to leak through, and allows specialization to "see" the full type +being returned. That is, `impl Trait` does not attempt to be a "tightly sealed" +abstraction boundary. The rationale for this design is a mixture of pragmatics +and principles. + +#### Specialization transparency + +**Principles for specialization transparency**: + +The [specialization RFC](https://github.com/rust-lang/rfcs/pull/1210) has given +us a basic principle for how to understand bounds in function generics: they +represent a *minimum* contract between the caller and the callee, in that the +caller must meet at least those bounds, and the callee must be prepared to work +with any type that meets at least those bounds. However, with specialization, +the callee may choose different behavior when additional bounds hold. + +This RFC abides by a similar interpretation for return types: the signature +represents the minimum bound that the callee must satisfy, and the caller must +be prepared to work with any type that meets at least that bound. Again, with +specialization, the caller may dispatch on additional type information beyond +those bounds. + +In other words, to the extent that returning `impl Trait` is intended to be +symmetric with taking a generic `T: Trait`, transparency with respect to +specialization maintains that symmetry. + +**Pragmatics for specialization transparency**: + +The practical reason we want `impl Trait` to be transparent to specialization is the +same as the reason we want specialization in the first place: to be able to +break through abstractions with more efficient special-case code. + +This is particularly important for one of the primary intended usecases: +returning `impl Iterator`. We are very likely to employ specialization for various +iterator types, and making the underlying return type invisible to +specialization would lose out on those efficiency wins. + +#### OIBIT transparency + +OIBITs leak through an abstract return type. 
This might be considered controversial, since
+it effectively opens a channel where the result of function-local type inference affects
+item-level API, but has been deemed worth it for the following reasons:
+
+- Ergonomics: Trait objects already have the issue of explicitly needing to
+  declare `Send`/`Sync`-ability, and not extending this problem to abstract
+  return types is desirable. In practice, most uses of this feature would have
+  to add explicit bounds for OIBITS if they wanted to be maximally usable.
+
+- Low real change, since the situation already somewhat exists on structs with private fields:
+  - In both cases, a change to the private implementation might change whether an OIBIT is
+    implemented or not.
+  - In both cases, the existence of OIBIT impls is not visible without documentation tools.
+  - In both cases, you can only assert the existence of OIBIT impls
+    by adding explicit trait bounds either to the API or to the crate's test suite.
+
+In fact, a large part of the point of OIBITs in the first place was to cut
+across abstraction barriers and provide information about a type without the
+type's author having to explicitly opt in.
+
+This means, however, that it has to be considered a silent breaking change to
+change a function with an abstract return type in a way that removes OIBIT impls,
+which might be a problem. (As noted above, this is already the case for `struct`
+definitions.)
+
+But since the number of used OIBITs is relatively small, deducing the return type
+in a function body and reasoning about whether such a breakage will occur has
+been deemed as a manageable amount of work.
+
+#### Wherefore type abstraction?
+
+In the [most recent RFC](https://github.com/rust-lang/rfcs/pull/1305) related to
+this feature, a more "tightly sealed" abstraction mechanism was
+proposed. However, part of the discussion on specialization centered on
+precisely the issue of what type abstraction provides and how to achieve it. A
+particularly salient point there is that, in Rust, *privacy* is already our
+primary mechanism for hiding
+(["privacy is the new parametricity"](https://github.com/rust-lang/rfcs/pull/1210#issuecomment-181992044)). In
+practice, that means that if you want opacity against specialization, you should
+use something like a newtype.
+
+### Anonymity
+
+An abstract return type cannot be named in this proposal, which means that it
+cannot be placed into `structs` and so on. This is not a fundamental limitation
+in any sense; the limitation is there both to keep this RFC simple, and because
+the precise way we might want to allow naming of such types is still a bit
+unclear. Some possibilities include a `typeof` operator, or explicit named
+abstract types.
+
+### Limitation to only return type position
+
+There have been various proposed additional places where abstract types
+might be usable. For example, `fn x(y: impl Trait)` as shorthand for
+`fn x<T: Trait>(y: T)`.
+
+Since the exact semantics and user experience for these locations are yet
+unclear (`impl Trait` would effectively behave completely differently before and after
+the `->`), this has also been excluded from this proposal.
+
+### Type transparency in recursive functions
+
+Functions with abstract return types cannot see through their own return type,
+making code like this not compile:
+
+```rust
+fn sum_to(n: u32) -> impl Display {
+    if n == 0 {
+        0
+    } else {
+        n + sum_to(n - 1)
+    }
+}
+```
+
+This limitation exists because it is not clear how much a function body
+can and should know about different instantiations of itself.
+
+It would be safe to allow recursive calls if the set of generic parameters
+is identical, and it might even be safe if the generic parameters are different,
+since you would still be inside the private body of the function, just
+differently instantiated.
+
+But variance caused by lifetime parameters and the interaction with
+specialization makes it uncertain whether this would be sound.
+
+In any case, it can be initially worked around by defining a local helper function like this:
+
+```rust
+fn sum_to(n: u32) -> impl Display {
+    fn sum_to_(n: u32) -> u32 {
+        if n == 0 {
+            0
+        } else {
+            n + sum_to_(n - 1)
+        }
+    }
+    sum_to_(n)
+}
+```
+
+### Not legal in function pointers/closure traits
+
+Because `impl Trait` defines a type tied to the concrete function body,
+it does not make much sense to talk about it separately in a function signature,
+so the syntax is forbidden there.
+
+### Compatibility with conditional trait bounds
+
+One valid critique for the existing `impl Trait` proposal is that it does not
+cover more complex scenarios, where the return type would implement
+one or more traits depending on whether a type parameter does so with another.
+
+For example, an iterator adapter might want to implement `Iterator` and
+`DoubleEndedIterator`, depending on whether the adapted one does:
+
+```rust
+fn skip_one<I: Iterator>(i: I) -> SkipOne<I> { ... }
+struct SkipOne<I: Iterator> { ... }
+impl<I: Iterator> Iterator for SkipOne<I> { ... }
+impl<I: DoubleEndedIterator> DoubleEndedIterator for SkipOne<I> { ... }
+```
+
+Using just `-> impl Iterator`, this would not be possible to reproduce.
+
+Since there have been no proposals so far that would address this in a way
+that would conflict with the fixed-trait-set case, this RFC punts on that issue as well.
+
+### Limitation to free/inherent functions
+
+One important use case of abstract return types is to use them in trait methods.
+
+However, there is an issue with this, namely that in combination with generic
+trait methods, they are effectively equivalent to higher kinded types.
+This is an issue because Rust's HKT story is not yet figured out, so
+any "accidental implementation" might cause unintended fallout.
+
+HKT allows you to be generic over a type constructor, aka a
+"thing with type parameters", and then instantiate them at some later point to
+get the actual type.
+For example, given an HK type `T` that takes one type as parameter, you could
+write code that uses `T<u32>` or `T<bool>` without caring about
+whether `T = Vec`, `T = Box`, etc.
+
+Now if we look at abstract return types, we have a similar situation:
+
+```rust
+trait Foo {
+    fn bar<U>() -> impl Baz
+}
+```
+
+Given a `T: Foo`, we could instantiate `T::bar::<u32>` or `T::bar::<bool>`,
+and could get arbitrarily different return types of `bar` instantiated
+with a `u32` or `bool`,
+just like `T<u32>` and `T<bool>` might give us `Vec<u32>` or `Box<bool>`
+in the example above.
+
+The problem does not exist with trait method return types today because
+they are concrete:
+
+```rust
+trait Foo {
+    fn bar<U>() -> X
+}
+```
+
+Given the above code, there is no way for `bar` to choose a return type `X`
+that could fundamentally differ between instantiations of `Self`
+while still being instantiable with an arbitrary `U`.
+
+At most you could return an associated type, but then you'd lose the generics
+from `bar`
+
+```rust
+trait Foo {
+    type X;
+    fn bar<U>() -> Self::X // No way to apply U
+}
+```
+
+So, in conclusion, since Rust's HKT story is not yet fleshed out,
+and the compatibility of the current compiler with it is unknown,
+it is not yet possible to reach a concrete solution here.
+
+In addition to that, there are also different proposals as to whether
+an abstract return type is its own thing or sugar for an associated type,
+how it interacts with other associated items and so on,
+so forbidding them in traits seems like the best initial course of action.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+> Why should we *not* do this?
+
+## Drawbacks due to the proposal's minimalism
+
+As has been elaborated on above, there are various ways this feature could be
+extended and combined with the language, so implementing it might cause issues
+down the road if limitations or incompatibilities become apparent. However,
+variations of this RFC's proposal have been under discussion for quite a long
+time at this point, and this proposal is carefully designed to be
+future-compatible with them, while resolving the core issue around transparency.
+
+A drawback of limiting the feature to return type position (and not arguments)
+is that it creates a somewhat inconsistent mental model: it forces you to
+understand the feature in a highly special-cased way, rather than as a general
+way to talk about unknown-but-bounded types in function signatures. This could
+be particularly bewildering to newcomers, who must choose between `T: Trait`,
+`Box<Trait>`, and `impl Trait`, with the latter only usable in one place.
+
+## Drawbacks due to partial transparency
+
+The fact that specialization and OIBITs can "see through" `impl Trait` may be
+surprising, to the extent that one wants to see `impl Trait` as an abstraction
+mechanism. However, as the RFC argued in the rationale section, this design is
+probably the most consistent with our existing post-specialization abstraction
+mechanisms, and leads to the relatively simple story that *privacy* is the way to
+achieve hiding in Rust.
+
+# Alternatives
+[alternatives]: #alternatives
+
+> What other designs have been considered? What is the impact of not doing this?
+
+See the links in the motivation section for detailed analysis that we won't
+repeat here.
+
+But basically, without this feature certain things remain hard or impossible to do
+in Rust, like returning an efficiently usable type parameterized by
+types private to a function body, for example an iterator adapter containing a closure.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+> What parts of the design are still to be determined?
+
+The precise implementation details for OIBIT transparency are a bit unclear: in
+general, it means that type checking may need to proceed in a particular order,
+since you cannot get the full type information from the signature alone (you
+have to typecheck the function body to determine which OIBITs apply).
diff --git a/text/1525-cargo-workspace.md b/text/1525-cargo-workspace.md new file mode 100644 index 00000000000..2022fc825b7 --- /dev/null +++ b/text/1525-cargo-workspace.md @@ -0,0 +1,350 @@ +- Feature Name: N/A +- Start Date: 2015-09-15 +- RFC PR: [rust-lang/rfcs#1525](https://github.com/rust-lang/rfcs/pull/1525) +- Rust Issue: [rust-lang/cargo#2122](https://github.com/rust-lang/cargo/issues/2122) + +# Summary + +Improve Cargo's story around multi-crate single-repo project management by +introducing the concept of workspaces. All packages in a workspace will share +`Cargo.lock` and an output directory for artifacts. + +# Motivation + +A common method to organize a multi-crate project is to have one +repository which contains all of the crates. Each crate has a corresponding +subdirectory along with a `Cargo.toml` describing how to build it. There are a +number of downsides to this approach, however: + +* Each sub-crate will have its own `Cargo.lock`, so it's difficult to ensure + that the entire project is using the same version of all dependencies. This is + desired as the main crate (often a binary) is often the one that has the + `Cargo.lock` "which counts", but it needs to be kept in sync with all + dependencies. + +* When building or testing sub-crates, all dependencies will be recompiled as + the target directory will be changing as you move around the source tree. This + can be overridden with `build.target-dir` or `CARGO_TARGET_DIR`, but this + isn't always convenient to set. + +Solving these two problems should help ease the development of large Rust +projects by ensuring that all dependencies remain in sync and builds by default +use already-built artifacts if available. + +# Detailed design + +Cargo will grow the concept of a **workspace** for managing repositories of +multiple crates. Workspaces will then have the properties: + +* A workspace can contain multiple local crates: one 'root crate', and any + number of 'member crate'. +* The root crate of a workspace has a `Cargo.toml` file containing `[workspace]` + key, which we call it as 'root `Cargo.toml`'. +* Whenever any crate in the workspace is compiled, output will be placed in the + `target` directory next to the root `Cargo.toml`. +* One `Cargo.lock` file for the entire workspace will reside next to the root + `Cargo.toml` and encompass the dependencies (and dev-dependencies) for all + crates in the workspace. + +With workspaces, Cargo can now solve the problems set forth in the motivation +section. Next, however, workspaces need to be defined. In the spirit of much of +the rest of Cargo's configuration today this will largely be automatic for +conventional project layouts but will have explicit controls for configuration. + +### New manifest keys + +First, let's look at the new manifest keys which will be added to `Cargo.toml`: + +```toml +[workspace] +members = ["relative/path/to/child1", "../child2"] + +# or ... + +[package] +workspace = "../foo" +``` + +The root `Cargo.toml` of a workspace, indicated by the presence of `[workspace]`, +is responsible for defining the entire workspace (listing all members). +This example here means that two extra crates will be members of the workspace +(which also includes the root). + +The `package.workspace` key is used to point at a workspace's root crate. For +example this Cargo.toml indicates that the Cargo.toml in `../foo` is the root +Cargo.toml of root crate, that this package is a member of. + +These keys are mutually exclusive when applied in `Cargo.toml`. 
A crate may +*either* specify `package.workspace` or specify `[workspace]`. That is, a +crate cannot both be a root crate in a workspace (contain `[workspace]`) and +also be a member crate of another workspace (contain `package.workspace`). + +### "Virtual" `Cargo.toml` + +A good number of projects do not necessarily have a "root `Cargo.toml`" which is +an appropriate root for a workspace. To accommodate these projects and allow for +the output of a workspace to be configured regardless of where crates are +located, Cargo will now allow for "virtual manifest" files. These manifests will +currently **only** contains the `[workspace]` table and will notably be lacking +a `[project]` or `[package]` top level key. + +Cargo will for the time being disallow many commands against a virtual manifest, +for example `cargo build` will be rejected. Arguments that take a package, +however, such as `cargo test -p foo` will be allowed. Workspaces can eventually +get extended with `--all` flags so in a workspace root you could execute +`cargo build --all` to compile all crates. + +### Validating a workspace + +A workspace is valid if these two properties hold: + +1. A workspace has only one root crate (that with `[workspace]` in + `Cargo.toml`). +2. All workspace crates defined in `workspace.members` point back to the + workspace root with `package.workspace`. + +While the restriction of one-root-per workspace may make sense, the restriction +of crates pointing back to the root may not. If, however, this restriction were +not in place then the set of crates in a workspace may differ depending on +which crate it was viewed from. For example if workspace root A includes B then +it will think B is in A's workspace. If, however, B does not point back to A, +then B would not think that A was in its workspace. This would in turn cause the +set of crates in each workspace to be different, further causing `Cargo.lock` to +get out of sync if it were allowed. By ensuring that all crates have edges to +each other in a workspace Cargo can prevent this situation and guarantee robust +builds no matter where they're executed in the workspace. + +To alleviate misconfiguration Cargo will emit an error if the two properties +above do not hold for any crate attempting to be part of a workspace. For +example, if the `package.workspace` key is specified, but the crate is not a +workspace root or doesn't point back to the original crate an error is emitted. + +### Implicit relations + +The combination of the `package.workspace` key and `[workspace]` table is enough +to specify any workspace in Cargo. Having to annotate all crates with a +`package.workspace` parent or a `workspace.members` list can get quite tedious, +however! To alleviate this configuration burden Cargo will allow these keys to +be implicitly defined in some situations. + +The `package.workspace` can be omitted if it would only contain `../` (or some +repetition of it). That is, if the root of a workspace is hierarchically the +first `Cargo.toml` with `[workspace]` above a crate in the filesystem, then that +crate can omit the `package.workspace` key. + +Next, a crate which specifies `[workspace]` **without a `members` key** will +transitively crawl `path` dependencies to fill in this key. This way all `path` +dependencies (and recursively their own `path` dependencies) will inherently +become the default value for `workspace.members`. 
+ +Note that these implicit relations will be subject to the same validations +mentioned above for all of the explicit configuration as well. + +### Workspaces in practice + +Many Rust projects today already have `Cargo.toml` at the root of a repository, +and with the small addition of `[workspace]` in the root `Cargo.toml`, a +workspace will be ready for all crates in that repository. For example: + +* An FFI crate with a sub-crate for FFI bindings + + ``` + Cargo.toml + src/ + foo-sys/ + Cargo.toml + src/ + ``` + +* A crate with multiple in-tree dependencies + + ``` + Cargo.toml + src/ + dep1/ + Cargo.toml + src/ + dep2/ + Cargo.toml + src/ + ``` + +Some examples of layouts that will require extra configuration, along with the +configuration necessary, are: + +* Trees without any root crate + + ``` + crate1/ + Cargo.toml + src/ + crate2/ + Cargo.toml + src/ + crate3/ + Cargo.toml + src/ + ``` + + these crates can all join the same workspace via a `Cargo.toml` file at the + root looking like: + + ```toml + [workspace] + members = ["crate1", "crate2", "crate3"] + ``` + +* Trees with multiple workspaces + + ``` + ws1/ + crate1/ + Cargo.toml + src/ + crate2/ + Cargo.toml + src/ + ws2/ + Cargo.toml + src/ + crate3/ + Cargo.toml + src/ + ``` + + The two workspaces here can be configured by placing the following in the + manifests: + + ```toml + # ws1/Cargo.toml + [workspace] + members = ["crate1", "crate2"] + ``` + + ```toml + # ws2/Cargo.toml + [workspace] + ``` + +* Trees with non-hierarchical workspaces + + ``` + root/ + Cargo.toml + src/ + crates/ + crate1/ + Cargo.toml + src/ + crate2/ + Cargo.toml + src/ + ``` + + The workspace here can be configured by placing the following in the + manifests: + + ```toml + # root/Cargo.toml + # + # Note that `members` aren't necessary if these are otherwise path + # dependencies. + [workspace] + members = ["../crates/crate1", "../crates/crate2"] + ``` + + ```toml + # crates/crate1/Cargo.toml + [package] + workspace = "../../root" + ``` + + ```toml + # crates/crate2/Cargo.toml + [package] + workspace = "../../root" + ``` + +Projects like the compiler will likely need exhaustively explicit configuration. +The `rust` repo conceptually has two workspaces, the standard library and the +compiler, and these would need to be manually configured with +`workspace.members` and `package.workspace` keys amongst all crates. + +### Lockfile and override interactions + +One of the main features of a workspace is that only one `Cargo.lock` is +generated for the entire workspace. This lock file can be affected, however, +with both [`[replace]` overrides][replace] as well as `paths` overrides. + +[replace]: https://github.com/rust-lang/cargo/pull/2385 + +Primarily, the `Cargo.lock` generate will not simply be the concatenation of the +lock files from each project. Instead the entire workspace will be resolved +together all at once, minimizing versions of crates used and sharing +dependencies as much as possible. For example one `path` dependency will always +have the same set of dependencies no matter which crate is being compiled. + +When interacting with overrides, workspaces will be modified to only allow +`[replace]` to exist in the workspace root. This Cargo.toml will affect lock +file generation, but no other workspace members will be allowed to have a +`[replace]` directive (with an informative error message being produced). 
+ +Finally, the `paths` overrides will be applied as usual, and they'll continue to +be applied relative to whatever crate is being compiled (not the workspace +root). These are intended for much more local testing, so no restriction of +"must be in the root" should be necessary. + +Note that this change to the lockfile format is technically incompatible with +older versions of Cargo.lock, but the entire workspaces feature is also +incompatible with older versions of Cargo. This will require projects that wish +to work with workspaces and multiple versions of Cargo to check in multiple +`Cargo.lock` files, but if projects avoid workspaces then Cargo will remain +forwards and backwards compatible. + +### Future Extensions + +Once Cargo understands a workspace of crates, we could easily extend various +subcommands with a `--all` flag to perform tasks such as: + +* Test all crates within a workspace (run all unit tests, doc tests, etc) +* Build all binaries for a set of crates within a workspace +* Publish all crates in a workspace if necessary to crates.io + +Furthermore, workspaces could start to deduplicate metadata among crates like +version numbers, URL information, authorship, etc. + +This support isn't proposed to be added in this RFC specifically, but simply to +show that workspaces can be used to solve other existing issues in Cargo. + +# Drawbacks + +* As proposed there is no method to disable implicit actions taken by Cargo. + It's unclear what the use case for this is, but it could in theory arise. + +* No crate will implicitly benefit from workspaces after this is implemented. + Existing crates must opt-in with a `[workspace]` key somewhere at least. + +# Alternatives + +* The `workspace.members` key could support globs to define a number of + directories at once. For example one could imagine: + + ```toml + [workspace] + members = ["crates/*"] + ``` + + as an ergonomic method of slurping up all sub-folders in the `crates` folder + as crates. + +* Cargo could attempt to perform more inference of workspace members by simply + walking the entire directory tree starting at `Cargo.toml`. All children found + could implicitly be members of the workspace. Walking entire trees, + unfortunately, isn't always efficient to do and it would be unfortunate to + have to unconditionally do this. + +# Unresolved questions + +* Does this approach scale well to repositories with a large number of crates? + For example does the winapi-rs repository experience a slowdown on standard + `cargo build` as a result? diff --git a/text/1535-stable-overflow-checks.md b/text/1535-stable-overflow-checks.md new file mode 100644 index 00000000000..eb66764c103 --- /dev/null +++ b/text/1535-stable-overflow-checks.md @@ -0,0 +1,64 @@ +- Feature Name: N/A +- Start Date: 2016-03-09 +- RFC PR: [rust-lang/rfcs#1535](https://github.com/rust-lang/rfcs/pull/1535) +- Rust Issue: [rust-lang/rust#33134](https://github.com/rust-lang/rust/issues/33134) + +# Summary +[summary]: #summary + +Stabilize the `-C overflow-checks` command line argument. + +# Motivation +[motivation]: #motivation + +This is an easy way to turn on overflow checks in release builds +without otherwise turning on debug assertions, via the `-C +debug-assertions` flag. In stable Rust today you can't get one without +the other. + +Users can use the `-C overflow-checks` flag from their Cargo +config to turn on overflow checks for an entire application. 
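+
+For instance, one plausible way to do that (a sketch that assumes Cargo's
+`build.rustflags` configuration key and the stabilized flag name proposed
+below) is a `.cargo/config` entry such as:
+
+```toml
+# .cargo/config (sketch): pass the flag to every rustc invocation, so release
+# builds get overflow checks without also enabling debug assertions.
+[build]
+rustflags = ["-C", "overflow-checks=on"]
+```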
+
+This flag, which accepts values of 'yes'/'no', 'on'/'off', is being
+renamed from `force-overflow-checks` because the `force` doesn't add
+anything that the 'yes'/'no' values do not already express.
+
+# Detailed design
+[design]: #detailed-design
+
+This is a stabilization RFC. The only steps will be to move
+`force-overflow-checks` from `-Z` to `-C`, renaming it to
+`overflow-checks`, and making it stable.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+It's another rather ad-hoc flag for modifying code generation.
+
+Like other such flags, this applies to the entire code unit,
+regardless of monomorphizations. This means that code generation for a
+single function can be different based on which code unit it's
+instantiated in.
+
+# Alternatives
+[alternatives]: #alternatives
+
+The flag could instead be tied to crates such that any time code from
+that crate is inlined/monomorphized it turns on overflow checks.
+
+We might also want a design that provides per-function control over
+overflow checks.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+Cargo might also add a profile option like
+
+```toml
+[profile.dev]
+overflow-checks = true
+```
+
+This may also be accomplished by Cargo's pending support for passing
+arbitrary flags to rustc.
+
diff --git a/text/1542-try-from.md b/text/1542-try-from.md
new file mode 100644
index 00000000000..affee80057b
--- /dev/null
+++ b/text/1542-try-from.md
@@ -0,0 +1,165 @@
+- Feature Name: `try_from`
+- Start Date: 2016-03-10
+- RFC PR: [rust-lang/rfcs#1542](https://github.com/rust-lang/rfcs/pull/1542)
+- Rust Issue: [rust-lang/rust#33417](https://github.com/rust-lang/rust/issues/33417)
+
+# Summary
+[summary]: #summary
+
+The standard library provides the `From` and `Into` traits as standard ways to
+convert between types. However, these traits only support *infallible*
+conversions. This RFC proposes the addition of `TryFrom` and `TryInto` traits
+to support these use cases in a standard way.
+
+# Motivation
+[motivation]: #motivation
+
+Fallible conversions are fairly common, and a collection of ad-hoc traits has
+arisen to support them, both [within the standard library][from-str] and [in
+third party crates][into-connect-params]. A standardized set of traits
+following the pattern set by `From` and `Into` will ease these APIs by
+providing a standardized interface as we expand the set of fallible
+conversions.
+
+One specific avenue of expansion that has been frequently requested is fallible
+integer conversion traits. Conversions between integer types may currently be
+performed with the `as` operator, which will silently truncate the value if it
+is out of bounds of the target type. Code which needs to down-cast values must
+manually check that the cast will succeed, which is both tedious and error
+prone. A fallible conversion trait reduces code like this:
+
+```rust
+let value: isize = ...;
+
+let value: u32 = if value < 0 || value > u32::max_value() as isize {
+    return Err(BogusCast);
+} else {
+    value as u32
+};
+```
+
+to simply:
+
+```rust
+let value: isize = ...;
+let value: u32 = try!(value.try_into());
+```
+
+# Detailed design
+[design]: #detailed-design
+
+Two traits will be added to the `core::convert` module:
+
+```rust
+pub trait TryFrom<T>: Sized {
+    type Err;
+
+    fn try_from(t: T) -> Result<Self, Self::Err>;
+}
+
+pub trait TryInto<T>: Sized {
+    type Err;
+
+    fn try_into(self) -> Result<T, Self::Err>;
+}
+```
+
+In a fashion similar to `From` and `Into`, a blanket implementation of `TryInto`
+is provided for all `TryFrom` implementations:
+
+```rust
+impl<T, U> TryInto<U> for T where U: TryFrom<T> {
+    type Err = U::Err;
+
+    fn try_into(self) -> Result<U, U::Err> {
+        U::try_from(self)
+    }
+}
+```
+
+In addition, implementations of `TryFrom` will be provided to convert between
+*all combinations* of integer types:
+
+```rust
+#[derive(Debug)]
+pub struct TryFromIntError(());
+
+impl fmt::Display for TryFromIntError {
+    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+        fmt.write_str(self.description())
+    }
+}
+
+impl Error for TryFromIntError {
+    fn description(&self) -> &str {
+        "out of range integral type conversion attempted"
+    }
+}
+
+impl TryFrom<usize> for u8 {
+    type Err = TryFromIntError;
+
+    fn try_from(t: usize) -> Result<u8, TryFromIntError> {
+        // ...
+    }
+}
+
+// ...
+```
+
+This notably includes implementations that are actually infallible, including
+implementations between a type and itself. A common use case for these kinds
+of conversions is when interacting with a C API and converting, for example,
+from a `u64` to a `libc::c_long`. `c_long` may be `u32` on some platforms but
+`u64` on others, so having an `impl TryFrom<u64> for u64` ensures that
+conversions using these traits will compile on all architectures. Similarly, a
+conversion from `usize` to `u32` may or may not be fallible depending on the
+target architecture.
+
+The standard library provides a reflexive implementation of the `From` trait
+for all types: `impl<T> From<T> for T`. We could similarly provide a "lifting"
+implementation of `TryFrom`:
+
+```rust
+impl<T, U: From<T>> TryFrom<T> for U {
+    type Err = Void;
+
+    fn try_from(t: T) -> Result<U, Void> {
+        Ok(U::from(t))
+    }
+}
+```
+
+However, this implementation would directly conflict with our goal of having
+uniform `TryFrom` implementations between all combinations of integer types. In
+addition, it's not clear what value such an implementation would actually
+provide, so this RFC does *not* propose its addition.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+It is unclear if existing fallible conversion traits can backwards-compatibly
+be subsumed into `TryFrom` and `TryInto`, which may result in an awkward mix of
+ad-hoc traits in addition to `TryFrom` and `TryInto`.
+
+# Alternatives
+[alternatives]: #alternatives
+
+We could avoid general traits and continue making distinct conversion traits
+for each use case.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+Are `TryFrom` and `TryInto` the right names? There is some precedent for the
+`try_` prefix: `TcpStream::try_clone`, `Mutex::try_lock`, etc.
+
+What should be done about `FromStr`, `ToSocketAddrs`, and other ad-hoc fallible
+conversion traits? An upgrade path may exist in the future with specialization,
+but it is probably too early to say definitively.
+
+Should `TryFrom` and `TryInto` be added to the prelude? This would be the first
+prelude addition since the 1.0 release.
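+
+For illustration, under the design proposed above a user-defined type could opt
+into fallible conversion as follows; the `Meters` type and its error type are
+purely illustrative and not part of the proposal:
+
+```rust
+// Sketch only: assumes `TryFrom` lands in `core::convert` with the signature
+// shown above and is re-exported from `std::convert`.
+use std::convert::TryFrom;
+
+struct Meters(u32);
+
+impl TryFrom<i64> for Meters {
+    type Err = &'static str;
+
+    fn try_from(raw: i64) -> Result<Meters, &'static str> {
+        if raw < 0 || raw > u32::max_value() as i64 {
+            Err("value out of range for Meters")
+        } else {
+            Ok(Meters(raw as u32))
+        }
+    }
+}
+```
+
+With the blanket implementation above, `Meters::try_from(raw)` and
+`raw.try_into()` would then be interchangeable.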
+
+[from-str]: https://doc.rust-lang.org/1.7.0/std/str/trait.FromStr.html
+[into-connect-params]: http://sfackler.github.io/rust-postgres/doc/v0.11.4/postgres/trait.IntoConnectParams.html
diff --git a/text/1543-integer_atomics.md b/text/1543-integer_atomics.md
new file mode 100644
index 00000000000..d1d0dce3f64
--- /dev/null
+++ b/text/1543-integer_atomics.md
@@ -0,0 +1,105 @@
+- Feature Name: `integer_atomics`
+- Start Date: 2016-03-14
+- RFC PR: [rust-lang/rfcs#1543](https://github.com/rust-lang/rfcs/pull/1543)
+- Rust Issue: [rust-lang/rust#32976](https://github.com/rust-lang/rust/issues/32976)
+
+# Summary
+[summary]: #summary
+
+This RFC basically changes `core::sync::atomic` to look like this:
+
+```rust
+#[cfg(target_has_atomic = "8")]
+struct AtomicBool {}
+#[cfg(target_has_atomic = "8")]
+struct AtomicI8 {}
+#[cfg(target_has_atomic = "8")]
+struct AtomicU8 {}
+#[cfg(target_has_atomic = "16")]
+struct AtomicI16 {}
+#[cfg(target_has_atomic = "16")]
+struct AtomicU16 {}
+#[cfg(target_has_atomic = "32")]
+struct AtomicI32 {}
+#[cfg(target_has_atomic = "32")]
+struct AtomicU32 {}
+#[cfg(target_has_atomic = "64")]
+struct AtomicI64 {}
+#[cfg(target_has_atomic = "64")]
+struct AtomicU64 {}
+#[cfg(target_has_atomic = "128")]
+struct AtomicI128 {}
+#[cfg(target_has_atomic = "128")]
+struct AtomicU128 {}
+#[cfg(target_has_atomic = "ptr")]
+struct AtomicIsize {}
+#[cfg(target_has_atomic = "ptr")]
+struct AtomicUsize {}
+#[cfg(target_has_atomic = "ptr")]
+struct AtomicPtr<T> {}
+```
+
+# Motivation
+[motivation]: #motivation
+
+Many lock-free algorithms require a two-value `compare_exchange`, which is effectively twice the size of a `usize`. This would be implemented by atomically swapping a struct containing two members.
+
+Another use case is to support Linux's futex API. This API is based on atomic `i32` variables, which currently aren't available on x86_64 because `AtomicIsize` is 64-bit.
+
+# Detailed design
+[design]: #detailed-design
+
+## New atomic types
+
+The `AtomicI8`, `AtomicI16`, `AtomicI32`, `AtomicI64` and `AtomicI128` types are added along with their matching `AtomicU*` type. These have the same API as the existing `AtomicIsize` and `AtomicUsize` types. Note that support for 128-bit atomics is dependent on the [i128/u128 RFC](https://github.com/rust-lang/rfcs/pull/1504) being accepted.
+
+## Target support
+
+One problem is that it is hard for a user to determine if a certain type `T` can be placed inside an `Atomic<T>`. After a quick survey of the LLVM and Clang code, architectures can be classified into 3 categories:
+
+- The architecture does not support any form of atomics (mainly microcontroller architectures).
+- The architecture supports all atomic operations for integers from i8 to iN (where N is the architecture word/pointer size).
+- The architecture supports all atomic operations for integers from i8 to i(N*2).
+
+A new target cfg is added: `target_has_atomic`. It will have multiple values, one for each atomic size supported by the target. For example:
+
+```rust
+#[cfg(target_has_atomic = "128")]
+static ATOMIC: AtomicU128 = AtomicU128::new(mem::transmute((0u64, 0u64)));
+#[cfg(not(target_has_atomic = "128"))]
+static ATOMIC: Mutex<(u64, u64)> = Mutex::new((0, 0));
+
+#[cfg(target_has_atomic = "64")]
+static COUNTER: AtomicU64 = AtomicU64::new(0);
+#[cfg(not(target_has_atomic = "64"))]
+static COUNTER: AtomicU32 = AtomicU32::new(0);
+```
+
+Note that it is not necessary for an architecture to natively support atomic operations for all sizes (`i8`, `i16`, etc) as long as it is able to perform a `compare_exchange` operation with a larger size. All smaller operations can be emulated using that. For example a byte atomic can be emulated by using a `compare_exchange` loop that only modifies a single byte of the value. This is actually how LLVM implements byte-level atomics on MIPS, which only supports word-sized atomics natively. Note that the out-of-bounds read is fine here because atomics are aligned and will never cross a page boundary. Since this transformation is performed transparently by LLVM, we do not need to do any extra work to support this.
+
+## Changes to `AtomicPtr`, `AtomicIsize` and `AtomicUsize`
+
+These types will have a `#[cfg(target_has_atomic = "ptr")]` bound added to them. Although these types are stable, this isn't a breaking change because all targets currently supported by Rust will have this type available. This would only affect custom targets, which currently fail to link due to missing compiler-rt symbols anyway.
+
+## Changes to `AtomicBool`
+
+This type will be changed to use an `AtomicU8` internally instead of an `AtomicUsize`, which will allow it to be safely transmuted to a `bool`. This will make it more consistent with the other atomic types that have the same layout as their underlying type. (For example futex code will assume that a `&AtomicI32` can be passed as a `&i32` to the system call.)
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Having certain atomic types get enabled/disabled based on the target isn't very nice, but it's unavoidable because support for atomic operations is very architecture-specific.
+
+This approach doesn't directly support atomic operations on user-defined structs, but this can be emulated using transmutes.
+
+# Alternatives
+[alternatives]: #alternatives
+
+One alternative that was discussed in a [previous RFC](https://github.com/rust-lang/rfcs/pull/1505) was to add a generic `Atomic<T>` type. However the consensus was that having unsupported atomic types either fail at monomorphization time or fall back to lock-based implementations was undesirable.
+
+Several other designs have been suggested [here](https://internals.rust-lang.org/t/pre-rfc-extended-atomic-types/3068).
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None
diff --git a/text/1548-global-asm.md b/text/1548-global-asm.md
new file mode 100644
index 00000000000..81c651c0f74
--- /dev/null
+++ b/text/1548-global-asm.md
@@ -0,0 +1,61 @@
+- Feature Name: global_asm
+- Start Date: 2016-03-18
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1548
+- Rust Issue: https://github.com/rust-lang/rust/issues/35119
+
+# Summary
+[summary]: #summary
+
+This RFC exposes LLVM's support for [module-level inline assembly](http://llvm.org/docs/LangRef.html#module-level-inline-assembly) by adding a `global_asm!` macro. The syntax is very simple: it just takes a string literal containing the assembly code.
+
+Example:
+```rust
+global_asm!(r#"
+.globl my_asm_func
+my_asm_func:
+    ret
+"#);
+
+extern {
+    fn my_asm_func();
+}
+```
+
+# Motivation
+[motivation]: #motivation
+
+There are two main use cases for this feature. The first is that it allows functions to be written completely in assembly, which mostly eliminates the need for a `naked` attribute. This is mainly useful for functions that use a custom calling convention, such as interrupt handlers.
+
+Another important use case is that it allows external assembly files to be used in a Rust module without needing hacks in the build system:
+
+```rust
+global_asm!(include_str!("my_asm_file.s"));
+```
+
+Assembly files can also be preprocessed or generated by `build.rs` (for example using the C preprocessor), which will produce output files in the Cargo output directory:
+
+```rust
+global_asm!(include_str!(concat!(env!("OUT_DIR"), "/preprocessed_asm.s")));
+```
+
+# Detailed design
+[design]: #detailed-design
+
+See description above, not much to add. The macro will map directly to LLVM's `module asm`.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+Like `asm!`, this feature depends on LLVM's integrated assembler.
+
+# Alternatives
+[alternatives]: #alternatives
+
+The current way of including external assembly is to compile the assembly files using gcc in `build.rs` and link them into the Rust program as a static library.
+
+An alternative for functions written entirely in assembly is to add a [`#[naked]` function attribute](https://github.com/rust-lang/rfcs/pull/1201).
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None
diff --git a/text/1552-contains-method-for-various-collections.md b/text/1552-contains-method-for-various-collections.md
new file mode 100644
index 00000000000..07dab257fe7
--- /dev/null
+++ b/text/1552-contains-method-for-various-collections.md
@@ -0,0 +1,95 @@
+- Feature Name: `contains_method_for_various_collections`
+- Start Date: 2016-03-16
+- RFC PR: [rust-lang/rfcs#1552](https://github.com/rust-lang/rfcs/pull/1552)
+- Rust Issue: [rust-lang/rust#32630](https://github.com/rust-lang/rust/issues/32630)
+
+# Summary
+[summary]: #summary
+
+Add a `contains` method to `VecDeque` and `LinkedList` that checks if the
+collection contains a given item.
+
+# Motivation
+[motivation]: #motivation
+
+A `contains` method exists for the slice type `[T]` and for `Vec` through
+`Deref`, but there is no easy way to check if a `VecDeque` or `LinkedList`
+contains a specific item. Currently, the shortest way to do it is something
+like:
+
+```rust
+vec_deque.iter().any(|e| e == item)
+```
+
+While this is not insanely verbose, a `contains` method has the following
+advantages:
+
+- the name `contains` expresses the programmer's intent...
+- ... and thus is more idiomatic
+- it's as short as it can get
+- programmers that are used to calling `contains` on a `Vec` are confused by
+  the non-existence of the method for `VecDeque` or `LinkedList`
+
+# Detailed design
+[design]: #detailed-design
+
+Add the following method to `std::collections::VecDeque`:
+
+```rust
+impl<T> VecDeque<T> {
+    /// Returns `true` if the `VecDeque` contains an element equal to the
+    /// given value.
+    pub fn contains(&self, x: &T) -> bool
+        where T: PartialEq
+    {
+        // implementation with a result equivalent to the result
+        // of `self.iter().any(|e| e == x)`
+    }
+}
+```
+
+Add the following method to `std::collections::LinkedList`:
+
+```rust
+impl<T> LinkedList<T> {
+    /// Returns `true` if the `LinkedList` contains an element equal to the
+    /// given value.
+ pub fn contains(&self, x: &T) -> bool + where T: PartialEq + { + // implementation with a result equivalent to the result + // of `self.iter().any(|e| e == x)` + } +} +``` + +The new methods should probably be marked as unstable initially and be +stabilized later. + +# Drawbacks +[drawbacks]: #drawbacks + +Obviously more methods increase the complexity of the standard library, but in +case of this RFC the increase is rather tiny. + +While `VecDeque::contains` should be (nearly) as fast as `[T]::contains`, +`LinkedList::contains` will probably be much slower due to the cache +inefficient nature of a linked list. Offering a method that is short to +write and convenient to use could lead to excessive use of said method +without knowing about the problems mentioned above. + +# Alternatives +[alternatives]: #alternatives + +There are a few alternatives: + +- add `VecDeque::contains` only and do not add `LinkedList::contains` +- do nothing, because -- technically -- the same functionality is offered + through iterators +- also add `BinaryHeap::contains`, since it could be convenient for some use + cases, too + +# Unresolved questions +[unresolved]: #unresolved-questions + +None so far. diff --git a/text/1558-closure-to-fn-coercion.md b/text/1558-closure-to-fn-coercion.md new file mode 100644 index 00000000000..5e6a9bb5f58 --- /dev/null +++ b/text/1558-closure-to-fn-coercion.md @@ -0,0 +1,222 @@ +- Feature Name: closure_to_fn_coercion +- Start Date: 2016-03-25 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +A closure that does not move, borrow, or otherwise access (capture) local +variables should be coercable to a function pointer (`fn`). + +# Motivation +[motivation]: #motivation + +Currently in Rust, it is impossible to bind anything but a pre-defined function +as a function pointer. When dealing with closures, one must either rely upon +Rust's type-inference capabilities, or use the `Fn` trait to abstract for any +closure with a certain type signature. + +It is not possible to define a function while at the same time binding it to a +function pointer. + +This is, admittedly, a convenience-motivated feature, but in certain situations +the inability to bind code this way creates a significant amount of boilerplate. +For example, when attempting to create an array of small, simple, but unique functions, +it would be necessary to pre-define each and every function beforehand: + +```rust +fn inc_0(var: &mut u32) {} +fn inc_1(var: &mut u32) { *var += 1; } +fn inc_2(var: &mut u32) { *var += 2; } +fn inc_3(var: &mut u32) { *var += 3; } + +const foo: [fn(&mut u32); 4] = [ + inc_0, + inc_1, + inc_2, + inc_3, +]; +``` + +This is a trivial example, and one that might not seem too consequential, but the +code doubles with every new item added to the array. With a large amount of elements, +the duplication begins to seem unwarranted. + +A solution, of course, is to use an array of `Fn` instead of `fn`: + +```rust +const foo: [&'static Fn(&mut u32); 4] = [ + &|var: &mut u32| {}, + &|var: &mut u32| *var += 1, + &|var: &mut u32| *var += 2, + &|var: &mut u32| *var += 3, +]; +``` + +And this seems to fix the problem. Unfortunately, however, because we use +a reference to the `Fn` trait, an extra layer of indirection is added when +attempting to run `foo[n](&mut bar)`. + +Rust must use dynamic dispatch in this situation; a closure with captures is nothing +but a struct containing references to captured variables. 
The code associated with a +closure must be able to access those references stored in the struct. + +In situations where this function pointer array is particularly hot code, +any optimizations would be appreciated. More generally, it is always preferable +to avoid unnecessary indirection. And, of course, it is impossible to use this syntax +when dealing with FFI. + +Aside from code-size nits, anonymous functions are legitimately useful for programmers. +In the case of callback-heavy code, for example, it can be impractical to define functions +out-of-line, with the requirement of producing confusing (and unnecessary) names for each. +In the very first example given, `inc_X` names were used for the out-of-line functions, but +more complicated behavior might not be so easily representable. + +Finally, this sort of automatic coercion is simply intuitive to the programmer. +In the `&Fn` example, no variables are captured by the closures, so the theory is +that nothing stops the compiler from treating them as anonymous functions. + +# Detailed design +[design]: #detailed-design + +In C++, non-capturing lambdas (the C++ equivalent of closures) "decay" into function pointers +when they do not need to capture any variables. This is used, for example, to pass a lambda +into a C function: + +```cpp +void foo(void (*foobar)(void)) { + // impl +} +void bar() { + foo([]() { /* do something */ }); +} +``` + +With this proposal, rust users would be able to do the same: + +```rust +fn foo(foobar: fn()) { + // impl +} +fn bar() { + foo(|| { /* do something */ }); +} +``` + +Using the examples within ["Motivation"](#motivation), the code array would +be simplified to no performance detriment: + +```rust +const foo: [fn(&mut u32); 4] = [ + |var: &mut u32| {}, + |var: &mut u32| *var += 1, + |var: &mut u32| *var += 2, + |var: &mut u32| *var += 3, +]; +``` + +Because there does not exist any item in the language that directly produces +a `fn` type, even `fn` items must go through the process of reification. To +perform the coercion, then, rustc must additionally allow the reification of +unsized closures to `fn` types. The implementation of this is simplified by the +fact that closures' capture information is recorded on the type-level. + +*Note:* once explicitly assigned to an `Fn` trait, the closure can no longer be +coerced into `fn`, even if it has no captures. + +```rust +let a: &Fn(u32) -> u32 = |foo: u32| { foo + 1 }; +let b: fn(u32) -> u32 = *a; // Can't re-coerce +``` + +# Drawbacks +[drawbacks]: #drawbacks + +This proposal could potentially allow Rust users to accidentally constrain their APIs. +In the case of a crate, a user returning `fn` instead of `Fn` may find +that their code compiles at first, but breaks when the user later needs to capture variables: + +```rust +// The specific syntax is more convenient to use +fn func_specific(&self) -> (fn() -> u32) { + || return 0 +} + +fn func_general<'a>(&'a self) -> impl Fn() -> u32 { + move || return self.field +} +``` + +In the above example, the API author could start off with the specific version of the function, +and by circumstance later need to capture a variable. The required change from `fn` to `Fn` could +be a breaking change. + +We do expect crate authors to measure their API's flexibility in other areas, however, as when +determining whether to take `&self` or `&mut self`. 
Taking a similar situation to the above: + +```rust +fn func_specific<'a>(&'a self) -> impl Fn() -> u32 { + move || return self.field +} + +fn func_general<'a>(&'a mut self) -> impl FnMut() -> u32 { + move || { self.field += 1; return self.field; } +} +``` + +This aspect is probably outweighed by convenience, simplicity, and the potential for optimization +that comes with the proposed changes. + +# Alternatives +[alternatives]: #alternatives + +## Function literal syntax + +With this alternative, Rust users would be able to directly bind a function +to a variable, without needing to give the function a name. + +```rust +let foo = fn() { /* do something */ }; +foo(); +``` + +```rust +const foo: [fn(&mut u32); 4] = [ + fn(var: &mut u32) {}, + fn(var: &mut u32) { *var += 1 }, + fn(var: &mut u32) { *var += 2 }, + fn(var: &mut u32) { *var += 3 }, +]; +``` + +This isn't ideal, however, because it would require giving new semantics +to `fn` syntax. Additionally, such syntax would either require explicit return types, +or additional reasoning about the literal's return type. + +```rust +fn(x: bool) { !x } +``` + +The above function literal, at first glance, appears to return `()`. This could be +potentially misleading, especially in situations where the literal is bound to a +variable with `let`. + +As with all new syntax, this alternative would carry with it a discovery barrier. +Closure coercion may be preferred due to its intuitiveness. + +## Aggressive optimization + +This is possibly unrealistic, but an alternative would be to continue encouraging +the use of closures with the `Fn` trait, but use static analysis to determine +when the used closure is "trivial" and does not need indirection. + +Of course, this would probably significantly complicate the optimization process, and +would have the detriment of not being easily verifiable by the programmer without +checking the disassembly of their program. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Should we generalize this behavior in the future, so that any zero-sized type that +implements `Fn` can be converted into a `fn` pointer? diff --git a/text/1559-attributes-with-literals.md b/text/1559-attributes-with-literals.md new file mode 100644 index 00000000000..e9044f87c78 --- /dev/null +++ b/text/1559-attributes-with-literals.md @@ -0,0 +1,165 @@ +- Feature Name: attributes_with_literals +- Start Date: 2016-03-28 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1559 +- Rust Issue: https://github.com/rust-lang/rust/issues/34981 + +# Summary +[summary]: #summary + +This RFC proposes accepting literals in attributes by defining the grammar of attributes as: + +```ebnf +attr : '#' '!'? '[' meta_item ']' ; + +meta_item : IDENT ( '=' LIT | '(' meta_item_inner? ')' )? ; + +meta_item_inner : (meta_item | LIT) (',' meta_item_inner)? ; +``` + +Note that `LIT` is a valid Rust literal and `IDENT` is a valid Rust identifier. The following +attributes, among others, would be accepted by this grammar: + +```rust +#[attr] +#[attr(true)] +#[attr(ident)] +#[attr(ident, 100, true, "true", ident = 100, ident = "hello", ident(100))] +#[attr(100)] +#[attr(enabled = true)] +#[enabled(true)] +#[attr("hello")] +#[repr(C, align = 4)] +#[repr(C, align(4))] +``` + +# Motivation +[motivation]: #motivation + +At present, literals are only accepted as the value of a key-value pair in attributes. What's more, +only _string_ literals are accepted. This means that literals can only appear in forms of +`#[attr(name = "value")]` or `#[attr = "value"]`. 
This forces non-string literal values to be awkwardly stringified. For example, while it is clear
+that something like alignment should be an integer value, the following are disallowed:
+`#[align(4)]`, `#[align = 4]`. Instead, we must use something akin to `#[align = "4"]`. Even
+`#[align("4")]` and `#[name("name")]` are disallowed, forcing key-value pairs or identifiers to be
+used instead: `#[align(size = "4")]` or `#[name(name)]`.
+
+In short, the current design forces users to use values of a single type, and thus occasionally the
+_wrong_ type, in attributes.
+
+### Cleaner Attributes
+
+Implementation of this RFC can clean up the following attributes in the standard library:
+
+* `#![recursion_limit = "64"]` **=>** `#![recursion_limit = 64]` or `#![recursion_limit(64)]`
+* `#[cfg(all(unix, target_pointer_width = "32"))]` **=>** `#[cfg(all(unix, target_pointer_width = 32))]`
+
+If `align` were to be added as an attribute, the following are now valid options for its syntax:
+
+* `#[repr(align(4))]`
+* `#[repr(align = 4)]`
+* `#[align = 4]`
+* `#[align(4)]`
+
+### Syntax Extensions
+
+As syntax extensions mature and become more widely used, being able to use literals in a variety of
+positions becomes more important.
+
+# Detailed design
+[design]: #detailed-design
+
+To clarify, _literals_ are:
+
+ * **Strings:** `"foo"`, `r##"foo"##`
+ * **Byte Strings:** `b"foo"`
+ * **Byte Characters:** `b'f'`
+ * **Characters:** `'a'`
+ * **Integers:** `1`, `1{i,u}{8,16,32,64,size}`
+ * **Floats:** `1.0`, `1.0f{32,64}`
+ * **Booleans:** `true`, `false`
+
+They are defined in the [manual] and by implementation in the [AST].
+
+ [manual]: https://doc.rust-lang.org/reference.html#literals
+ [AST]: http://manishearth.github.io/rust-internals-docs/syntax/ast/enum.LitKind.html
+
+Implementation of this RFC requires the following changes:
+
+1. The `MetaItemKind` structure would need to allow literals as top-level entities:
+
+   ```rust
+   pub enum MetaItemKind {
+       Word(InternedString),
+       List(InternedString, Vec<P<MetaItem>>),
+       NameValue(InternedString, Lit),
+       Literal(Lit),
+   }
+   ```
+
+2. `libsyntax` (`libsyntax/parse/attr.rs`) would need to be modified to allow literals as values in
+   k/v pairs and as top-level entities of a list.
+
+3. Crate metadata encoding/decoding would need to encode and decode literals in attributes.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+This RFC requires a change to the AST and is likely to break syntax extensions using attributes in
+the wild.
+
+# Alternatives
+[alternatives]: #alternatives
+
+### Token trees
+
+An alternative is to allow any tokens inside of an attribute. That is, the grammar could be:
+
+```ebnf
+attr : '#' '!'? '[' TOKEN+ ']' ;
+```
+
+where `TOKEN` is any valid Rust token. The drawback to this approach is that attributes lose any
+sense of structure. This results in more difficult and verbose attribute parsing, although this
+could be ameliorated through libraries. Further, this would require almost all of the existing
+attribute parsing code to change.
+
+The advantage, of course, is that it allows any syntax and is rather future proof. It is also more
+in line with `macro!`s.
+
+### Allow only unsuffixed literals
+
+This RFC proposes allowing _any_ valid Rust literals in attributes. Instead, the use of literals
+could be restricted to only those that are unsuffixed. 
That is, only the following literals could be +allowed: + + * **Strings:** `"foo"` + * **Characters:** `'a'` + * **Integers:** `1` + * **Floats:** `1.0` + * **Booleans:** `true`, `false` + +This cleans up the appearance of attributes will still increasing flexibility. + +### Allow literals only as values in k/v pairs + +Instead of allowing literals in top-level positions, i.e. `#[attr(4)]`, only allow them as values in +key value pairs: `#[attr = 4]` or `#[attr(ident = 4)]`. This has the nice advantage that it was the +initial idea for attributes, and so the AST types already reflect this. As such, no changes would +have to be made to existing code. The drawback, of course, is the lack of flexibility. `#[repr(C, +align(4))]` would no longer be valid. + +### Do nothing + +Of course, the current design could be kept. Although it seems that the initial intention was for a +form of literals to be allowed. Unfortunately, this idea was [scrapped due to release pressure] and +never revisited. Even [the reference] alludes to allowing all literals as values in k/v pairs. + + [scrapped due to release pressure]: https://github.com/rust-lang/rust/issues/623 + [the reference]: https://doc.rust-lang.org/reference.html#attributes + +# Unresolved questions +[unresolved]: #unresolved-questions + +None that I can think of. diff --git a/text/1560-name-resolution.md b/text/1560-name-resolution.md new file mode 100644 index 00000000000..1afe57b90e3 --- /dev/null +++ b/text/1560-name-resolution.md @@ -0,0 +1,658 @@ +- Feature Name: item_like_imports +- Start Date: 2016-02-09 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1560 +- Rust Issue: https://github.com/rust-lang/rust/issues/35120 + +# Summary +[summary]: #summary + +Some internal and language-level changes to name resolution. + +Internally, name resolution will be split into two parts - import resolution and +name lookup. Import resolution is moved forward in time to happen in the same +phase as parsing and macro expansion. Name lookup remains where name resolution +currently takes place (that may change in the future, but is outside the scope +of this RFC). However, name lookup can be done earlier if required (importantly +it can be done during macro expansion to allow using the module system for +macros, also outside the scope of this RFC). Import resolution will use a new +algorithm. + +The observable effects of this RFC (i.e., language changes) are some increased +flexibility in the name resolution rules, especially around globs and shadowing. + +There is an implementation of the language changes in +[PR #32213](https://github.com/rust-lang/rust/pull/32213). + +# Motivation +[motivation]: #motivation + +Naming and importing macros currently works very differently to naming and +importing any other item. It would be impossible to use the same rules, +since macro expansion happens before name resolution in the compilation process. +Implementing this RFC means that macro expansion and name resolution can happen +in the same phase, thus allowing macros to use the Rust module system properly. + +At the same time, we should be able to accept more Rust programs by tweaking the +current rules around imports and name shadowing. This should make programming +using imports easier. + + +## Some issues in Rust's name resolution + +Whilst name resolution is sometimes considered a simple part of the compiler, +there are some details in Rust which make it tricky to properly specify and +implement. 
Some of these may seem obvious, but the distinctions will be +important later. + +* Imported vs declared names - a name can be imported (e.g., `use foo;`) or + declared (e.g., `fn foo ...`). +* Single vs glob imports - a name can be explicitly (e.g., `use a::foo;`) or + implicitly imported (e.g., `use a::*;` where `foo` is declared in `a`). +* Public vs private names - the visibility of names is somewhat tied up with + name resolution, for example in current Rust `use a::*;` only imports the + public names from `a`. +* Lexical scoping - a name can be inherited from a surrounding scope, rather + than being declared in the current one, e.g., `let foo = ...; { foo(); }`. +* There are different kinds of scopes - at the item level, names are not + inherited from outer modules into inner modules. Items may also be declared + inside functions and blocks within functions, with different rules from modules. + At the expression level, blocks (`{...}`) give explicit scope, however, from + the point of view of macro hygiene and region inference, each `let` statement + starts a new implicit scope. +* Explicitly declared vs macro generated names - a name can be declared + explicitly in the source text, or could be declared as the result of expanding + a macro. +* Rust has multiple namespaces - types, values, and macros exist in separate + namespaces (some items produce names in multiple namespaces). Imports + refer (implictly) to one or more names in different namespaces. + + Note that all top-level (i.e., not parameters, etc.) path segments in a path + other than the last must be in the type namespace, e.g., in `a::b::c`, `a` and + `b` are assumed to be in the type namespace, and `c` may be in any namespace. +* Rust has an implicit prelude - the prelude defines a set of names which are + always (unless explicitly opted-out) nameable. The prelude includes macros. + Names in the prelude can be shadowed by any other names. + + +# Detailed design +[design]: #detailed-design + +## Guiding principles + +We would like the following principles to hold. There may be edge cases where +they do not, but we would like these to be as small as possible (and prefer they +don't exist at all). + +#### Avoid 'time-travel' ambiguities, or different results of resolution if names +are resolved in different orders. + +Due to macro expansion, it is possible for a name to be resolved and then to +become ambiguous, or (with rules formulated in a certain way) for a name to be +resolved, then to be amiguous, then to be resolvable again (possibly to +different bindings). + +Furthermore, there is some flexibility in the order in which macros can be +expanded. How a name resolves should be consistent under any ordering. + +The strongest form of this principle, I believe, is that at any stage of +macro expansion, and under any ordering of expansions, if a name resolves to a +binding then it should always (i.e., at any other stage of any other expansion +series) resolve to that binding, and if resolving a name produces an error +(n.b., distinct from not being able to resolve), it should always produce an +error. + + +#### Avoid errors due to the resolver being stuck. + +Errors with concrete causes and explanations are easier for the user to +understand and to correct. If an error is caused by name resolution getting +stuck, rather than by a concrete problem, this is hard to explain or correct. 
+ +For example, if we support a rule that means that a certain glob can't be +expanded before a macro is, but the macro can only be named via that glob +import, then there is an obvious resolution that can't be reached due to our +ordering constraints. + + +#### The order of declarations of items should be irrelevant. + +I.e., names should be able to be used before they are declared. Note that this +clearly does not hold for declarations of variables in statements inside +function bodies. + + +#### Macros should be manually expandable. + +Compiling a program should have the same result before and after expanding a +macro 'by hand', so long as hygiene is accounted for. + + +#### Glob imports should be manually expandable. + +A programmer should be able to replace a glob import with a list import that +imports any names imported by the glob and used in the current scope, without +changing name resolution behaviour. + + +#### Visibility should not affect name resolution. + +Clearly, visibility affects whether a name can be used or not. However, it +should not affect the mechanics of name resolution. I.e., changing a name from +public to private (or vice versa), should not cause more or fewer name +resolution errors (it may of course cause more or fewer accessibility errors). + + +## Changes to name resolution rules + +### Multiple unused imports + +A name may be imported multiple times, it is only a name resolution error if +that name is used. E.g., + +``` +mod foo { + pub struct Qux; +} + +mod bar { + pub struct Qux; +} + +mod baz { + use foo::*; + use bar::*; // Ok, no name conflict. +} +``` + +In this example, adding a use of `Qux` in `baz` would cause a name resolution +error. + +### Multiple imports of the same binding + +A name may be imported multiple times and used if both names bind to the same +item. E.g., + +``` +mod foo { + pub struct Qux; +} + +mod bar { + pub use foo::Qux; +} + +mod baz { + use foo::*; + use bar::*; + + fn f(q: Qux) {} +} +``` + +### non-public imports + +Currently `use` and `pub use` items are treated differently. Non-public imports +will be treated in the same way as public imports, so they may be referenced +from modules which have access to them. E.g., + +``` +mod foo { + pub struct Qux; +} + +mod bar { + use foo::Qux; + + mod baz { + use bar::Qux; // Ok + } +} +``` + + +### Glob imports of accessible but not public names + +Glob imports will import all accessible names, not just public ones. E.g., + +``` +struct Qux; + +mod foo { + use super::*; + + fn f(q: Qux) {} // Ok +} +``` + +This change is backwards incompatible. However, the second rule above should +address most cases, e.g., + +``` +struct Qux; + +mod foo { + use super::*; + use super::Qux; // Legal due to the second rule above. + + fn f(q: Qux) {} // Ok +} +``` + +The below rule (though more controversial) should make this change entirely +backwards compatible. + +Note that in combination with the above rule, this means non-public imports are +imported by globs where they are private but accessible. + + +### Explicit names may shadow implicit names + +Here, an implicit name means a name imported via a glob or inherited from an +outer scope (as opposed to being declared or imported directly in an inner scope). + +An explicit name may shadow an implicit name without causing a name +resolution error. E.g., + +``` +mod foo { + pub struct Qux; +} + +mod bar { + pub struct Qux; +} + +mod baz { + use foo::*; + + struct Qux; // Shadows foo::Qux. 
+} + +mod boz { + use foo::*; + use bar::Qux; // Shadows foo::Qux; note, ordering is not important. +} +``` + +or + +``` +fn main() { + struct Foo; // 1. + { + struct Foo; // 2. + + let x = Foo; // Ok and refers to declaration 2. + } +} +``` + +Note that shadowing is namespace specific. I believe this is consistent with our +general approach to name spaces. E.g., + +``` +mod foo { + pub struct Qux; +} + +mod bar { + pub trait Qux; +} + +mod boz { + use foo::*; + use bar::Qux; // Shadows only in the type name space. + + fn f(x: &Qux) { // bound to bar::Qux. + let _ = Qux; // bound to foo::Qux. + } +} +``` + +Caveat: an explicit name which is defined by the expansion of a macro does **not** +shadow implicit names. Example: + +``` +macro_rules! foo { + () => { + fn foo() {} + } +} + +mod a { + fn foo() {} +} + +mod b { + use a::*; + + foo!(); // Expands to `fn foo() {}`, this `foo` does not shadow the `foo` + // imported from `a` and therefore there is a duplicate name error. +} +``` + +The rationale for this caveat is so that during import resolution, if we have a +glob import (or other implicit name) we can be sure that any imported names will +not be shadowed, either the name will continue to be valid, or there will be an +error. Without this caveat, a name could be valid, and then after further +expansion, become shadowed by a higher priority name. + +An error is reported if there is an ambiguity between names due to the lack of +shadowing, e.g., (this example assumes modularised macros), + +``` +macro_rules! foo { + () => { + macro! bar { ... } + } +} + +mod a { + macro! bar { ... } +} + +mod b { + use a::*; + + foo!(); // Expands to `macro! bar { ... }`. + + bar!(); // ERROR: bar is ambiguous. +} +``` + +Note on the caveat: there will only be an error emitted if an ambiguous name is +used directly or indirectly in a macro use. I.e., is the name of a macro that is +used, or is the name of a module that is used to name a macro either in a macro +use or in an import. + +Alternatives: we could emit an error even if the ambiguous name is not used, or +as a compromise between these two, we could emit an error if the name is in the +type or macro namespace (a name in the value namespace can never cause problems). + +This change is discussed in [issue 31337](https://github.com/rust-lang/rust/issues/31337) +and on this RFC PR's comment thread. + + +### Re-exports, namespaces, and visibility. + +(This is something of a clarification point, rather than explicitly new behaviour. +See also discussion on [issue 31783](https://github.com/rust-lang/rust/issues/31783)). + +An import (`use`) or re-export (`pub use`) imports a name in all available +namespaces. E.g., `use a::foo;` will import `foo` in the type and value +namespaces if it is declared in those namespaces in `a`. + +For a name to be re-exported, it must be public, e.g, `pub use a::foo;` requires +that `foo` is declared publicly in `a`. This is complicated by namespaces. The +following behaviour should be followed for a re-export of `foo`: + +* `foo` is private in all namespaces in which it is declared - emit an error. +* `foo` is public in all namespaces in which it is declared - `foo` is + re-exported in all namespaces. +* `foo` is mixed public/private - `foo` is re-exported in the namespaces in which + it is declared publicly and imported but not re-exported in namespaces in which + it is declared privately. + +For a glob re-export, there is an error if there are no public items in any +namespace. 
Otherwise private names are imported and public names are re-exported +on a per-namespace basis (i.e., following the above rules). + +## Changes to the implementation + +Note: below I talk about "the binding table", this is sort of hand-waving. I'm +envisaging a sets-of-scopes system where there is effectively a single, global +binding table. However, the details of that are beyond the scope of this RFC. +One can imagine "the binding table" means one binding table per scope, as in the +current system. + +Currently, parsing and macro expansion happen in the same phase. With this +proposal, we add import resolution to that mix too. Binding tables as well as +the AST will be produced by libsyntax. Name lookup will continue to be done +where name resolution currently takes place. + +To resolve imports, the algorithm proceeds as follows: we start by parsing as +much of the program as we can; like today we don't parse macros. When we find +items which bind a name, we add the name to the binding table. When we find an +import which can't be resolved, we add it to a work list. When we find a glob +import, we have to record a 'back link', so that when a public name is added for +the supplying module, we can add it for the importing module. + +We then loop over the work list and try to lookup names. If a name has exactly +one best binding then we use it (and record the binding on a list of resolved +names). If there are zero then we put it back on the work list. If there is more +than one binding, then we record an ambiguity error. When we reach a fixed +point, i.e., the work list no longer changes, then we are done. If the work list +is empty, then expansion/import resolution succeeded, otherwise there are names +not found, or ambiguous names, and we failed. + +As we are looking up names, we record the resolutions in the binding table. If +the name we are looking up is for a glob import, we add bindings for every +accessible name currently known. + +To expand a macro use, we try to resolve the macro's name. If that fails, we put +it on the work list. Otherwise, we expand that macro by parsing the arguments, +pattern matching, and doing hygienic expansion. We then parse the generated code +in the same way as we parsed the original program. We add new names to the +binding table, and expand any new macro uses. + +If we add names for a module which has back links, we must follow them and add +these names to the importing module (if they are accessible). + +In pseudo-code: + +``` +// Assumes parsing is already done, but the two things could be done in the same +// pass. +fn parse_expand_and_resolve() { + loop until fixed point { + process_names() + loop until fixed point { + process_work_list() + } + expand_macros() + } + + for item in work_list { + report_error() + } else { + success!() + } +} + +fn process_names() { + // 'module' includes `mod`s, top level of the crate, function bodies + for each unseen item in any module { + if item is a definition { + // struct, trait, type, local variable def, etc. 
+ bindings.insert(item.name, module, item) + populate_back_links(module, item) + } else { + try_to_resolve_import(module, item) + } + record_macro_uses() + } +} + +fn try_to_resolve_import(module, item) { + if item is an explicit use { + // item is use a::b::c as d; + match try_to_resolve(item) { + Ok(r) => { + add(bindings.insert(d, module, r, Priority::Explicit)) + populate_back_links(module, item) + } + Err() => work_list.push(module, item) + } + } else if item is a glob { + // use a::b::*; + match try_to_resolve(a::b) { + Ok(n) => + for binding in n { + bindings.insert_if_no_higher_priority_binding(binding.name, module, binding, Priority::Glob) + populate_back_links(module, binding) + } + add_back_link(n to module) + work_list.remove() + Err(_) => work_list.push(module, item) + } + } +} + +fn process_work_list() { + for each (module, item) in work_list { + work_list.remove() + try_to_resolve_import(module, item) + } +} +``` + +Note that this pseudo-code elides some details: that names are imported into +distinct namespaces (the type and value namespaces, and with changes to macro +naming, also the macro namespace), and that we must record whether a name is due +to macro expansion or not to abide by the caveat to the 'explicit names shadow +glob names' rule. + +If Rust had a single namespace (or had some other properties), we would not have +to distinguish between failed and unresolved imports. However, it does and we +must. This is not clear from the pseudo-code because it elides namespaces, but +consider the following small example: + +``` +use a::foo; // foo exists in the value namespace of a. +use b::*; // foo exists in the type namespace of b. +``` + +Can we resolve a use of `foo` in type position to the import from `b`? That +depends on whether `foo` exists in the type namespace in `a`. If we can prove +that it does not (i.e., resolution fails) then we can use the glob import. If we +cannot (i.e., the name is unresolved but we can't prove it will not resolve +later), then it is not safe to use the glob import because it may be shadowed by +the explicit import. (Note, since `foo` exists in at least the value namespace +in `a`, there will be no error due to a bad import). + +In order to keep macro expansion comprehensible to programmers, we must enforce +that all macro uses resolve to the same binding at the end of resolution as they +do when they were resolved. + +We rely on a monotonicity property in macro expansion - once an item exists in a +certain place, it will always exist in that place. It will never disappear and +never change. Note that for the purposes of this property, I do not consider +code annotated with a macro to exist until it has been fully expanded. + +A consequence of this is that if the compiler resolves a name, then does some +expansion and resolves it again, the first resolution will still be valid. +However, another resolution may appear, so the resolution of a name may change +as we expand. It can also change from a good resolution to an ambiguity. It is +also possible to change from good to ambiguous to good again. There is even an +edge case where we go from good to ambiguous to the same good resolution (but +via a different route). + +If import resolution succeeds, then we check our record of name resolutions. We +re-resolve and check we get the same result. We can also check for un-used +macros at this point. 
+ +Note that the rules in the previous section have been carefully formulated to +ensure that this check is sufficient to prevent temporal ambiguities. There are +many slight variations for which this check would not be enough. + +### Privacy + +In order to resolve imports (and in the future for macro privacy), we must be +able to decide if names are accessible. This requires doing privacy checking as +required during parsing/expansion/import resolution. We can keep the current +algorithm, but check accessibility on demand, rather than as a separate pass. + +During macro expansion, once a name is resolvable, then we can safely perform +privacy checking, because parsing and macro expansion will never remove items, +nor change the module structure of an item once it has been expanded. + +### Metadata + +When a crate is packed into metadata, we must also include the binding table. We +must include private entries due to macros that the crate might export. We don't +need data for function bodies. For functions which are serialised for +inlining/monomorphisation, we should include local data (although it's probably +better to serialise the HIR or MIR, then the local bindings are unnecessary). + + +# Drawbacks +[drawbacks]: #drawbacks + +It's a lot of work and name resolution is complex, therefore there is scope for +introducing bugs. + +The macro changes are not backwards compatible, which means having a macro +system 2.0. If users are reluctant to use that, we will have two macro systems +forever. + +# Alternatives +[alternatives]: #alternatives + +## Naming rules + +We could take a subset of the shadowing changes (or none at all), whilst still +changing the implementation of name resolution. In particular, we might want to +discard the explicit/glob shadowing rule change, or only allow items, not +imported names to shadow. + +We could also consider different shadowing rules around namespacing. In the +'globs and explicit names' rule change, we could consider an explicit name to +shadow both name spaces and emit a custom error. The example becomes: + + +``` +mod foo { + pub struct Qux; +} + +mod bar { + pub trait Qux; +} + +mod boz { + use foo::*; + use bar::Qux; // Shadows both name spaces. + + fn f(x: &Qux) { // bound to bar::Qux. + let _ = Qux; // ERROR, unresolved name Qux; the compiler would emit a + // note about shadowing and namespaces. + } +} +``` + +## Import resolution algorithm + +Rather than lookup names for imports during the fixpoint iteration, one could +save links between imports and definitions. When lookup is required (for macros, +or later in the compiler), these links are followed to find a name, rather than +having the name being immediately available. + + +# Unresolved questions +[unresolved]: #unresolved-questions + +## Name lookup + +The name resolution phase would be replaced by a cut-down name lookup phase, +where the binding tables generated during expansion are used to lookup names in +the AST. + +We could go further, two appealing possibilities are merging name lookup with +the lowering from AST to HIR, so the HIR is a name-resolved data structure. Or, +name lookup could be done lazily (probably with some caching) so no tables +binding names to definitions are kept. I prefer the first option, but this is +not really in scope for this RFC. + +## `pub(restricted)` + +Where this RFC touches on the privacy system there are some edge cases involving +the `pub(path)` form of restricted visibility. 
I expect the precise solutions +will be settled during implementation and this RFC should be amended to reflect +those choices. + + +# References + +* [Niko's prototype](https://github.com/nikomatsakis/rust-name-resolution-algorithm) +* [Blog post](http://ncameron.org/blog/name-resolution/), includes details about + how the name resolution algorithm interacts with sets of scopes hygiene. diff --git a/text/1561-macro-naming.md b/text/1561-macro-naming.md new file mode 100644 index 00000000000..2d7a4caf6ee --- /dev/null +++ b/text/1561-macro-naming.md @@ -0,0 +1,187 @@ +- Feature Name: N/A (part of other unstable features) +- Start Date: 2016-02-11 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1561 +- Rust Issue: https://github.com/rust-lang/rust/issues/35896 + +# Summary +[summary]: #summary + +Naming and modularisation for macros. + +This RFC proposes making macros a first-class citizen in the Rust module system. +Both macros by example (`macro_rules` macros) and procedural macros (aka syntax +extensions) would use the same naming and modularisation scheme as other items +in Rust. + +For procedural macros, this RFC could be implemented immediately or as part of a +larger effort to reform procedural macros. For macros by example, this would be +part of a macros 2.0 feature, the rest of which will be described in a separate +RFC. This RFC depends on the changes to name resolution described in +[RFC 1560](https://github.com/rust-lang/rfcs/pull/1560). + +# Motivation +[motivation]: #motivation + +Currently, procedural macros are not modularised at all (beyond the crate +level). Macros by example have a [custom modularisation +scheme](https://github.com/rust-lang/rfcs/blob/master/text/0453-macro-reform.md) +which involves modules to some extent, but relies on source ordering and +attributes which are not used for other items. Macros cannot be imported or +named using the usual syntax. It is confusing that macros use their own system +for modularisation. It would be far nicer if they were a more regular feature of +Rust in this respect. + + +# Detailed design +[design]: #detailed-design + +## Defining macros + +This RFC does not propose changes to macro definitions. It is envisaged that +definitions of procedural macros will change, see [this blog post](http://ncameron.org/blog/macro-plans-syntax/) +for some rough ideas. I'm assuming that procedural macros will be defined in +some function-like way and that these functions will be defined in modules in +their own crate (to start with). + +Ordering of macro definitions in the source text will no longer be significant. +A macro may be used before it is defined, as long as it can be named. That is, +macros follow the same rules regarding ordering as other items. E.g., this will +work: + +``` +foo!(); + +macro! foo { ... } +``` + +(Note, I'm using a hypothetical `macro!` defintion which I will define in a future +RFC. The reader can assume it works much like `macro_rules!`, but with the new +naming scheme). + +Macro expansion order is also not defined by source order. E.g., in `foo!(); bar!();`, +`bar` may be expanded before `foo`. Ordering is only guaranteed as far as it is +necessary. E.g., if `bar` is only defined by expanding `foo`, then `foo` must be +expanded before `bar`. + +## Function-like macro uses + +A function-like macro use (c.f., attribute-like macro use) is a macro use which +uses `foo!(...)` or `foo! ident (...)` syntax (where `()` may also be `[]` or `{}`). + +Macros may be named by using a `::`-separated path. 
Naming follows the same +rules as other items in Rust. + +If a macro `baz` (by example or procedural) is defined in a module `bar` which +is nested in `foo`, then it may be used anywhere in the crate using an +absolute path: `::foo::bar::baz!(...)`. It can be used via relative paths in the +usual way, e.g., inside `foo` as `bar::baz!()`. + +Macros declared inside a function body can only be used inside that function +body. + +For procedural macros, the path must point to the function defining the macro. + +The grammar for macros is changed, anywhere we currently parser `name "!"`, we +now parse `path "!"`. I don't think this introduces any issues. + +Name lookup follows the same name resolution rules as other items. See [RFC +1560](https://github.com/rust-lang/rfcs/pull/1560) for details on how name +resolution could be adapted to support this. + +## Attribute-like macro uses + +Attribute macros may also be named using a `::`-separated path. Other than +appearing in an attribute, these also follow the usual Rust naming rules. + +E.g., `#[::foo::bar::baz(...)]` and `#[bar::baz(...)]` are uses of absolute and +relative paths, respectively. + + +## Importing macros + +Importing macros is done using `use` in the same way as other items. An `!` is +not necessary in an import item. Macros are imported into their own namespace +and do not shadow or overlap items with the same name in the type or value +namespaces. + +E.g., `use foo::bar::baz;` imports the macro `baz` from the module `::foo::bar`. +Macro imports may be used in import lists (with other macro imports and with +non-macro imports). + +Where a glob import (`use ...::*;`) imports names from a module including macro +definitions, the names of those macros are also imported. E.g., `use +foo::bar::*;` would import `baz` along with any other items in `foo::bar`. + +Where macros are defined in a separate crate, these are imported in the same way +as other items by an `extern crate` item. + +No `#[macro_use]` or `#[macro_export]` annotations are required. + + +## Shadowing + +Macro names follow the same shadowing rules as other names. For example, an +explicitly declared macro would shadow a glob-imported macro with the same name. +Note that since macros are in a different namespace from types and values, a +macro cannot shadow a type or value or vice versa. + + +# Drawbacks +[drawbacks]: #drawbacks + +If the new macro system is not well adopted by users, we could be left with two +very different schemes for naming macros depending on whether a macro is defined +by example or procedurally. That would be inconsistent and annoying. However, I +hope we can make the new macro system appealing enough and close enough to the +existing system that migration is both desirable and easy. + + +# Alternatives +[alternatives]: #alternatives + +We could adopt the proposed scheme for procedural macros only and keep the +existing scheme for macros by example. + +We could adapt the current macros by example scheme to procedural macros. + +We could require the `!` in macro imports to distinguish them from other names. +I don't think this is necessary or helpful. + +We could continue to require `macro_export` annotations on top of this scheme. +However, I prefer moving to a scheme using the same privacy system as the rest +of Rust, see below. 
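+
+For comparison, the following sketch shows roughly what the last alternative
+would preserve from today's system, next to the path-based scheme proposed
+above (the crate `util` and the macro `baz` are made up for illustration):
+
+```
+// Today, in the defining crate: visibility is controlled by an attribute.
+#[macro_export]
+macro_rules! baz {
+    () => {};
+}
+
+// Today, in a using crate: all exported macros are pulled in at once.
+#[macro_use]
+extern crate util;
+
+// Under this RFC: an ordinary import and use, with visibility intended to be
+// governed by the normal privacy rules (see the unresolved question below).
+// use util::baz;
+// baz!();
+```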
+ + +# Unresolved questions +[unresolved]: #unresolved-questions + +## Privacy for macros + +I would like that macros follow the same rules for privacy as other Rust items, +i.e., they are private by default and may be marked as `pub` to make them +public. This is not as straightforward as it sounds as it requires parsing `pub +macro! foo` as a macro definition, etc. I leave this for a separate RFC. + +## Scoped attributes + +It would be nice for tools to use scoped attributes as well as procedural +macros, e.g., `#[rustfmt::skip]` or `#[rust::new_attribute]`. I believe this +should be straightforward syntactically, but there are open questions around +when attributes are ignored or seen by tools and the compiler. Again, I leave it +for a future RFC. + +## Inline procedural macros + +Some day, I hope that procedural macros may be defined in the same crate in +which they are used. I leave the details of this for later, however, I don't +think this affects the design of naming - it should all Just Work. + +## Applying to existing macros + +This RFC is framed in terms of a new macro system. There are various ways that +some parts of it could be applied to existing macros (`macro_rules!`) to +backwards compatibly make existing macros usable under the new naming system. + +I want to leave this question unanswered for now. Until we get some experience +implementing this feature it is unclear how much this is possible. Once we know +that we can try to decide how much of that is also desirable. diff --git a/text/1566-proc-macros.md b/text/1566-proc-macros.md new file mode 100644 index 00000000000..f1942e5d2be --- /dev/null +++ b/text/1566-proc-macros.md @@ -0,0 +1,483 @@ +- Feature Name: procedural_macros +- Start Date: 2016-02-15 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1566 +- Rust Issue: https://github.com/rust-lang/rust/issues/38356 + +# Summary +[summary]: #summary + +This RFC proposes an evolution of Rust's procedural macro system (aka syntax +extensions, aka compiler plugins). This RFC specifies syntax for the definition +of procedural macros, a high-level view of their implementation in the compiler, +and outlines how they interact with the compilation process. + +This RFC specifies the architecture of the procedural macro system. It relies on +[RFC 1561](https://github.com/rust-lang/rfcs/pull/1561) which specifies the +naming and modularisation of macros. It leaves many of the details for further +RFCs, in particular the details of the APIs available to macro authors +(tentatively called `libproc_macro`, formerly `libmacro`). See this +[blog post](http://ncameron.org/blog/libmacro/) for some ideas of how that might +look. + +[RFC 1681](https://github.com/rust-lang/rfcs/pull/1681) specified a mechanism +for custom derive using 'macros 1.1'. That RFC is essentially a subset of this +one. Changes and differences are noted throughout the text. + +At the highest level, macros are defined by implementing functions marked with +a `#[proc_macro]` attribute. Macros operate on a list of tokens provided by the +compiler and return a list of tokens that the macro use is replaced by. We +provide low-level facilities for operating on these tokens. Higher level +facilities (e.g., for parsing tokens to an AST) should exist as library crates. 
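+
+As a minimal sketch of the shape this takes (anticipating the signatures given
+in the detailed design below), a do-nothing function-like macro might look as
+follows; the name `noop` is made up, and the `TokenStream` type is assumed to
+come from the yet-to-be-specified `libproc_macro` crate:
+
+```
+#[proc_macro]
+pub fn noop(input: TokenStream) -> TokenStream {
+    // `input` holds the tokens between the delimiters of a use `noop!(...)`;
+    // whatever is returned replaces the macro use. Returning the input
+    // unchanged makes this an identity macro.
+    input
+}
+```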
+ + +# Motivation +[motivation]: #motivation + +Procedural macros have long been a part of Rust and have been used for diverse +and interesting purposes, for example [compile-time regexes](https://github.com/rust-lang-nursery/regex), +[serialisation](https://github.com/serde-rs/serde), and +[design by contract](https://github.com/nrc/libhoare). They allow the ultimate +flexibility in syntactic abstraction, and offer possibilities for efficiently +using Rust in novel ways. + +Procedural macros are currently unstable and are awkward to define. We would +like to remedy this by implementing a new, simpler system for procedural macros, +and for this new system to be on the usual path to stabilisation. + +One major problem with the current system is that since it is based on ASTs, if +we change the Rust language (even in a backwards compatible way) we can easily +break procedural macros. Therefore, offering the usual backwards compatibility +guarantees to procedural macros, would inhibit our ability to evolve the +language. By switching to a token-based (rather than AST- based) system, we hope +to avoid this problem. + +# Detailed design +[design]: #detailed-design + +There are two kinds of procedural macro: function-like and attribute-like. These +two kinds exist today, and other than naming (see +[RFC 1561](https://github.com/rust-lang/rfcs/pull/1561)) the syntax for using +these macros remains unchanged. If the macro is called `foo`, then a function- +like macro is used with syntax `foo!(...)`, and an attribute-like macro with +`#[foo(...)] ...`. Macros may be used in the same places as `macro_rules` macros +and this remains unchanged. + +There is also a third kind, custom derive, which are specified in [RFC +1681](https://github.com/rust-lang/rfcs/pull/1681). This RFC extends the +facilities open to custom derive macros beyond the string-based system of RFC +1681. + +To define a procedural macro, the programmer must write a function with a +specific signature and attribute. Where `foo` is the name of a function-like +macro: + +``` +#[proc_macro] +pub fn foo(TokenStream) -> TokenStream; +``` + +The first argument is the tokens between the delimiters in the macro use. +For example in `foo!(a, b, c)`, the first argument would be `[Ident(a), Comma, +Ident(b), Comma, Ident(c)]`. + +The value returned replaces the macro use. + +Attribute-like: + +``` +#[proc_macro_attribute] +pub fn foo(Option, TokenStream) -> TokenStream; +``` + +The first argument is a list of the tokens between the delimiters in the macro +use. Examples: + +* `#[foo]` => `None` +* `#[foo()]` => `Some([])` +* `#[foo(a, b, c)]` => `Some([Ident(a), Comma, Ident(b), Comma, Ident(c)])` + +The second argument is the tokens for the AST node the attribute is placed on. +Note that in order to compute the tokens to pass here, the compiler must be able +to parse the code the attribute is applied to. However, the AST for the node +passed to the macro is discarded, it is not passed to the macro nor used by the +compiler (in practice, this might not be 100% true due to optimisiations). If +the macro wants an AST, it must parse the tokens itself. + +The attribute and the AST node it is applied to are both replaced by the +returned tokens. In most cases, the tokens returned by a procedural macro will +be parsed by the compiler. It is the procedural macro's responsibility to ensure +that the tokens parse without error. In some cases, the tokens will be consumed +by another macro without parsing, in which case they do not need to parse. 
The +distinction is not statically enforced. It could be, but I don't think the +overhead would be justified. + +Custom derive: + +``` +#[proc_macro_derive] +pub fn foo(TokenStream) -> TokenStream; +``` + +Similar to attribute-like macros, the item a custom derive applies to must +parse. Custom derives may on be applied to the items that a built-in derive may +be applied to (structs and enums). + +Currently, macros implementing custom derive only have the option of converting +the `TokenStream` to a string and converting a result string back to a +`TokenStream`. This option will remain, but macro authors will also be able to +operate directly on the `TokenStream` (which should be preferred, since it +allows for hygiene and span support). + +Procedural macros which take an identifier before the argument list (e.g, `foo! +bar(...)`) will not be supported (at least initially). + +My feeling is that this macro form is not used enough to justify its existence. +From a design perspective, it encourages uses of macros for language extension, +rather than syntactic abstraction. I feel that such macros are at higher risk of +making programs incomprehensible and of fragmenting the ecosystem). + +Behind the scenes, these functions implement traits for each macro kind. We may +in the future allow implementing these traits directly, rather than just +implementing the above functions. By adding methods to these traits, we can +allow macro implementations to pass data to the compiler, for example, +specifying hygiene information or allowing for fast re-compilation. + +## `proc-macro` crates + +[Macros 1.1](https://github.com/rust-lang/rfcs/pull/1681) added a new crate +type: proc-macro. This both allows procedural macros to be declared within the +crate, and dictates how the crate is compiled. Procedural macros must use +this crate type. + +We introduce a special configuration option: `#[cfg(proc_macro)]`. Items with +this configuration are not macros themselves but are compiled only for macro +uses. + +If a crate is a `proc-macro` crate, then the `proc_macro` cfg variable is true +for the whole crate. Initially it will be false for all other crates. This has +the effect of partitioning crates into macro- defining and non-macro defining +crates. In the future, I hope we can relax these restrictions so that macro and +non-macro code can live in the same crate. + +Importing macros for use means using `extern crate` to make the crate available +and then using `use` imports or paths to name macros, just like other items. +Again, see [RFC 1561](https://github.com/rust-lang/rfcs/pull/1561) for more +details. + +When a `proc-macro` crate is `extern crate`ed, it's items (even public ones) are +not available to the importing crate; only macros declared in that crate. There +should be a lint to warn about public items which will not be visible due to +`proc_macro`. The crate is used by the compiler at compile-time, rather than +linked with the importing crate at runtime. + +[Macros 1.1](https://github.com/rust-lang/rfcs/pull/1681) required `#[macro_use]` +on `extern crate` which imports procedural macros. This will not be required +and should be deprecated. + + +## Writing procedural macros + +Procedural macro authors should not use the compiler crates (libsyntax, etc.). +Using these will remain unstable. 
We will make available a new crate, +libproc_macro, which will follow the usual path to stabilisation, will be part +of the Rust distribution, and will be required to be used by procedural macros +(because, at the least, it defines the types used in the required signatures). + +The details of libproc_macro will be specified in a future RFC. In the meantime, +this [blog post](http://ncameron.org/blog/libmacro/) gives an idea of what it +might contain. + +The philosophy here is that libproc_macro will contain low-level tools for +constructing macros, dealing with tokens, hygiene, pattern matching, quasi- +quoting, interactions with the compiler, etc. For higher level abstractions +(such as parsing and an AST), macros should use external libraries (there are no +restrictions on `#[cfg(proc_macro)]` crates using other crates). + +A `MacroContext` is an object placed in thread-local storage when a macro is +expanded. It contains data about how the macro is being used and defined. It is +expected that for most uses, macro authors will not use the `MacroContext` +directly, but it will be used by library functions. It will be more fully +defined in the upcoming RFC proposing libproc_macro. + +Rust macros are hygienic by default. Hygiene is a large and complex subject, but +to summarise: effectively, naming takes place in the context of the macro +definition, not the expanded macro. + +Procedural macros often want to bend the rules around macro hygiene, for example +to make items or variables more widely nameable than they would be by default. +Procedural macros will be able to take part in the application of the hygiene +algorithm via libproc_macro. Again, full details must wait for the libproc_macro +RFC and a sketch is available in this [blog post](http://ncameron.org/blog/libmacro/). + + +## Tokens + +Procedural macros will primarily operate on tokens. There are two main benefits +to this principle: flexibility and future proofing. By operating on tokens, code +passed to procedural macros does not need to satisfy the Rust parser, only the +lexer. Stabilising an interface based on tokens means we need only commit to +not changing the rules around those tokens, not the whole grammar. I.e., it +allows us to change the Rust grammar without breaking procedural macros. + +In order to make the token-based interface even more flexible and future-proof, +I propose a simpler token abstraction than is currently used in the compiler. +The proposed system may be used directly in the compiler or may be an interface +wrapper over a more efficient representation. + +Since macro expansion will not operate purely on tokens, we must keep hygiene +information on tokens, rather than on `Ident` AST nodes (we might be able to +optimise by not keeping such info for all tokens, but that is an implementation +detail). We will also keep span information for each token, since that is where +a record of macro expansion is maintained (and it will make life easier for +tools. Again, we might optimise internally). + +A token is a single lexical element, for example, a numeric literal, a word +(which could be an identifier or keyword), a string literal, or a comment. + +A token stream is a sequence of tokens, e.g., `a b c;` is a stream of four +tokens - `['a', 'b', 'c', ';'']`. + +A token tree is a tree structure where each leaf node is a token and each +interior node is a token stream. I.e., a token stream which can contain nested +token streams. 
A token tree can be delimited, e.g., `a (b c);` will give +`TT(None, ['a', TT(Some('()'), ['b', 'c'], ';'']))`. An undelimited token tree +is useful for grouping tokens due to expansion, without representation in the +source code. That could be used for unsafety hygiene, or to affect precedence +and parsing without affecting scoping. They also replace the interpolated AST +tokens currently in the compiler. + +In code: + +``` +// We might optimise this representation +pub struct TokenStream(Vec); + +// A borrowed TokenStream +pub struct TokenSlice<'a>(&'a [TokenTree]); + +// A token or token tree. +pub struct TokenTree { + pub kind: TokenKind, + pub span: Span, + pub hygiene: HygieneObject, +} + +pub enum TokenKind { + Sequence(Delimiter, TokenStream), + + // The content of the comment can be found from the span. + Comment(CommentKind), + + // `text` is the string contents, not including delimiters. It would be nice + // to avoid an allocation in the common case that the string is in the + // source code. We might be able to use `&'codemap str` or something. + // `raw_markers` is for the count of `#`s if the string is a raw string. If + // the string is not raw, then it will be `None`. + String { text: Symbol, raw_markers: Option, kind: StringKind }, + + // char literal, span includes the `'` delimiters. + Char(char), + + // These tokens are treated specially since they are used for macro + // expansion or delimiting items. + Exclamation, // `!` + Dollar, // `$` + // Not actually sure if we need this or if semicolons can be treated like + // other punctuation. + Semicolon, // `;` + Eof, // Do we need this? + + // Word is defined by Unicode Standard Annex 31 - + // [Unicode Identifier and Pattern Syntax](http://unicode.org/reports/tr31/) + Word(Symbol), + Punctuation(char), +} + +pub enum Delimiter { + None, + // { } + Brace, + // ( ) + Parenthesis, + // [ ] + Bracket, +} + +pub enum CommentKind { + Regular, + InnerDoc, + OuterDoc, +} + +pub enum StringKind { + Regular, + Byte, +} + +// A Symbol is a possibly-interned string. +pub struct Symbol { ... } +``` + +Note that although tokens exclude whitespace, by examining the spans of tokens, +a procedural macro can get the string representation of a `TokenStream` and thus +has access to whitespace information. + +### Open question: `Punctuation(char)` and multi-char operators. + +Rust has many compound operators, e.g., `<<`. It's not clear how best to deal +with them. If the source code contains "`+ =`", it would be nice to distinguish +this in the token stream from "`+=`". On the other hand, if we represent `<<` as +a single token, then the macro may need to split them into `<`, `<` in generic +position. + +I had hoped to represent each character as a separate token. However, to make +pattern matching backwards compatible, we would need to combine some tokens. In +fact, if we want to be completely backwards compatible, we probably need to keep +the same set of compound operators as are defined at the moment. + +Some solutions: + +* `Punctuation(char)` with special rules for pattern matching tokens, +* `Punctuation([char])` with a facility for macros to split tokens. Tokenising + could match the maximum number of punctuation characters, or use the rules for + the current token set. The former would have issues with pattern matching. The + latter is a bit hacky, there would be backwards compatibility issues if we + wanted to add new compound operators in the future. + +## Staging + +1. 
Implement [RFC 1561](https://github.com/rust-lang/rfcs/pull/1561). +2. Implement `#[proc_macro]` and `#[cfg(proc_macro)]` and the function approach to + defining macros. However, pass the existing data structures to the macros, + rather than tokens and `MacroContext`. +3. Implement libproc_macro and make this available to macros. At this stage both old + and new macros are available (functions with different signatures). This will + require an RFC and considerable refactoring of the compiler. +4. Implement some high-level macro facilities in external crates on top of + libproc_macro. It is hoped that much of this work will be community-led. +5. After some time to allow conversion, deprecate the old-style macros. Later, + remove old macros completely. + + +# Drawbacks +[drawbacks]: #drawbacks + +Procedural macros are a somewhat unpleasant corner of Rust at the moment. It is +hard to argue that some kind of reform is unnecessary. One could find fault with +this proposed reform in particular (see below for some alternatives). Some +drawbacks that come to mind: + +* providing such a low-level API risks never seeing good high-level libraries; +* the design is complex and thus will take some time to implement and stabilise, + meanwhile unstable procedural macros are a major pain point in current Rust; +* dealing with tokens and hygiene may discourage macro authors due to complexity, + hopefully that is addressed by library crates. + +The actual concept of procedural macros also have drawbacks: executing arbitrary +code in the compiler makes it vulnerable to crashes and possibly security issues, +macros can introduce hard to debug errors, macros can make a program hard to +comprehend, it risks creating de facto dialects of Rust and thus fragmentation +of the ecosystem, etc. + +# Alternatives +[alternatives]: #alternatives + +We could keep the existing system or remove procedural macros from Rust. + +We could have an AST-based (rather than token-based) system. This has major +backwards compatibility issues. + +We could allow pluging in at later stages of compilation, giving macros access +to type information, etc. This would allow some really interesting tools. +However, it has some large downsides - it complicates the whole compilation +process (not just the macro system), it pollutes the whole compiler with macro +knowledge, rather than containing it in the frontend, it complicates the design +of the interface between the compiler and macro, and (I believe) the use cases +are better addressed by compiler plug-ins or tools based on the compiler (the +latter can be written today, the former require more work on an interface to the +compiler to be practical). + +We could use the `macro` keyword rather than the `fn` keyword to declare a +macro. We would then not require a `#[proc_macro]` attribute. + +We could use `#[macro]` instead of `#[proc_macro]` (and similarly for the other +attributes). This would require making `macro` a contextual keyword. + +We could have a dedicated syntax for procedural macros, similar to the +`macro_rules` syntax for macros by example. Since a procedural macro is really +just a Rust function, I believe using a function is better. I have also not been +able to come up with (or seen suggestions for) a good alternative syntax. It +seems reasonable to expect to write Rust macros in Rust (although there is +nothing stopping a macro author from using FFI and some other language to write +part or all of a macro). 
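+
+To make that comparison concrete, the declaration styles discussed above would
+look roughly as follows; only the first form is actually proposed, and the
+exact syntax of the keyword-based variants is a sketch, not part of this RFC:
+
+```
+// As proposed: a plain function marked with an attribute.
+#[proc_macro]
+pub fn foo(input: TokenStream) -> TokenStream { input }
+
+// Alternative: `#[macro]` instead of `#[proc_macro]` (requires making `macro`
+// a contextual keyword).
+// #[macro]
+// pub fn foo(input: TokenStream) -> TokenStream { input }
+
+// Alternative: a `macro` keyword in place of `fn`, with no attribute.
+// pub macro foo(input: TokenStream) -> TokenStream { input }
+```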
+ +For attribute-like macros on items, it would be nice if we could skip parsing +the annotated item until after macro expansion. That would allow for more +flexible macros, since the input would not be constrained to Rust syntax. However, +this would require identifying items from tokens, rather than from the AST, which +would require additional rules on token trees and may not be possible. + + +# Unresolved questions +[unresolved]: #unresolved-questions + +### Linking model + +Currently, procedural macros are dynamically linked with the compiler. This +prevents the compiler being statically linked, which is sometimes desirable. An +alternative architecture would have procedural macros compiled as independent +programs and have them communicate with the compiler via IPC. + +This would have the advantage of allowing static linking for the compiler and +would prevent procedural macros from crashing the main compiler process. +However, designing a good IPC interface is complicated because there is a lot of +data that might be exchanged between the compiler and the macro. + +I think we could first design the syntax, interfaces, etc. and later evolve into +a process-separated model (if desired). However, if this is considered an +essential feature of macro reform, then we might want to consider the interfaces +more thoroughly with this in mind. + +A step in this direction might be to run the macro in its own thread, but in the +compiler's process. + +### Interactions with constant evaluation + +Both procedural macros and constant evaluation are mechanisms for running Rust +code at compile time. Currently, and under the proposed design, they are +considered completely separate features. There might be some benefit in letting +them interact. + + +### Inline procedural macros + +It would nice to allow procedural macros to be defined in the crate in which +they are used, as well as in separate crates (mentioned above). This complicates +things since it breaks the invariant that a crate is designed to be used at +either compile-time or runtime. I leave it for the future. + + +### Specification of the macro definition function signatures + +As proposed, the signatures of functions used as macro definitions are hard- +wired into the compiler. It would be more flexible to allow them to be specified +by a lang-item. I'm not sure how beneficial this would be, since a change to the +signature would require changing much of the procedural macro system. I propose +leaving them hard-wired, unless there is a good use case for the more flexible +approach. + + +### Specifying delimiters + +Under this RFC, a function-like macro use may use either parentheses, braces, or +square brackets. The choice of delimiter does not affect the semantics of the +macro (the rules requiring braces or a semi-colon for macro uses in item position +still apply). + +Which delimiter was used should be available to the macro implementation via the +`MacroContext`. I believe this is maximally flexible - the macro implementation +can throw an error if it doesn't like the delimiters used. + +We might want to allow the compiler to restrict the delimiters. Alternatively, +we might want to hide the information about the delimiter from the macro author, +so as not to allow errors regarding delimiter choice to affect the user. 
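+
+For reference, the three delimiter forms this question concerns are shown
+below (`foo` is a placeholder name); they are interchangeable apart from the
+brace/semicolon rules for uses in item position mentioned above:
+
+```
+foo!(a, b, c);
+foo![a, b, c];
+foo! { a, b, c }
+```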
diff --git a/text/1567-long-error-codes-explanation-normalization.md b/text/1567-long-error-codes-explanation-normalization.md new file mode 100644 index 00000000000..9e02eed52b4 --- /dev/null +++ b/text/1567-long-error-codes-explanation-normalization.md @@ -0,0 +1,142 @@ + +Start Date: 2016-01-04 + +- RFC PR: [rust-lang/rfcs#1567](https://github.com/rust-lang/rfcs/pull/1567) +- Rust Issue: N/A + +# Summary + +Rust has extend error messages that explain each error in more detail. We've been writing lots of them, which is good, but they're written in different styles, which is bad. This RFC intends to fix this inconsistency by providing a template for these long-form explanations to follow. + +# Motivation + +Long error codes explanations are a very important part of Rust. Having an explanation of what failed helps to understand the error and is appreciated by Rust developers of all skill levels. Providing an unified template is needed in order to help people who would want to write ones as well as people who read them. + +# Detailed design + +Here is what I propose: + +## Error description + +Provide a more detailed error message. For example: + +```rust +extern crate a; +extern crate b as a; +``` + +We get the `E0259` error code which says "an extern crate named `a` has already been imported in this module" and the error explanation says: "The name chosen for an external crate conflicts with another external crate that has been imported into the current module.". + +## Minimal example + +Provide an erroneous code example which directly follows `Error description`. The erroneous example will be helpful for the `How to fix the problem`. Making it as simple as possible is really important in order to help readers to understand what the error is about. A comment should be added with the error on the same line where the errors occur. Example: + +```rust +type X = u32; // error: type parameters are not allowed on this type +``` + +If the error comments is too long to fit 80 columns, split it up like this, so the next line start at the same column of the previous line: + +```rust +type X = u32<'static>; // error: lifetime parameters are not allowed on + // this type +``` + +And if the sample code is too long to write an effective comment, place your comment on the line before the sample code: + +```rust +// error: lifetime parameters are not allowed on this type +fn super_long_function_name_and_thats_problematic() {} +``` + +Of course, it the comment is too long, the split rules still applies. + +## Error explanation + +Provide a full explanation about "__why__ you get the error" and some leads on __how__ to fix it. If needed, use additional code snippets to improve your explanations. + +## How to fix the problem + +This part will show how to fix the error that we saw previously in the `Minimal example`, with comments explaining how it was fixed. + +## Additional information + +Some details which might be useful for the users, let's take back `E0109` example. At the end, the supplementary explanation is the following: "Note that type parameters for enum-variant constructors go after the variant, not after the enum (`Option::None::`, not `Option::::None`).". It provides more information, not directly linked to the error, but it might help user to avoid doing another error. 
+ +## Template + +In summary, the template looks like this: + +```rust +E000: r##" +[Error description] + +Example of erroneous code: + +\```compile_fail +[Minimal example] +\``` + +[Error explanation] + +\``` +[How to fix the problem] +\``` + +[Optional Additional information] +``` + +Now let's take a full example: + +> E0409: r##" +> An "or" pattern was used where the variable bindings are not consistently bound +> across patterns. +> +> Example of erroneous code: +> +> ```compile_fail +> let x = (0, 2); +> match x { +> (0, ref y) | (y, 0) => { /* use y */} // error: variable `y` is bound with +> // different mode in pattern #2 +> // than in pattern #1 +> _ => () +> } +> ``` +> +> Here, `y` is bound by-value in one case and by-reference in the other. +> +> To fix this error, just use the same mode in both cases. +> Generally using `ref` or `ref mut` where not already used will fix this: +> +> ```ignore +> let x = (0, 2); +> match x { +> (0, ref y) | (ref y, 0) => { /* use y */} +> _ => () +> } +> ``` +> +> Alternatively, split the pattern: +> +> ``` +> let x = (0, 2); +> match x { +> (y, 0) => { /* use y */ } +> (0, ref y) => { /* use y */} +> _ => () +> } +> ``` +> "##, + +# Drawbacks + +This will make contributing slighty more complex, as there are rules to follow, whereas right now there are none. + +# Alternatives + +Not having error codes explanations following a common template. + +# Unresolved questions + +None. diff --git a/text/1574-more-api-documentation-conventions.md b/text/1574-more-api-documentation-conventions.md new file mode 100644 index 00000000000..178ac66bdc2 --- /dev/null +++ b/text/1574-more-api-documentation-conventions.md @@ -0,0 +1,662 @@ +- Feature Name: More API Documentation Conventions +- Start Date: 2016-03-31 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +[RFC 505] introduced certain conventions around documenting Rust projects. This +RFC augments that one, and a full text of the older one combined with these +modfications is provided below. + +[RFC 505]: https://github.com/rust-lang/rfcs/blob/master/text/0505-api-comment-conventions.md + +# Motivation +[motivation]: #motivation + +Documentation is an extremely important part of any project. It’s important +that we have consistency in our documentation. + +For the most part, the RFC proposes guidelines that are already followed today, +but it tries to motivate and clarify them. + +# Detailed design +[design]: #detailed-design + +### English +[english]: #english + +This section applies to `rustc` and the standard library. + +### Using Markdown +[using-markdown]: #using-markdown + +The updated list of common headings is: + +* Examples +* Panics +* Errors +* Safety +* Aborts +* Undefined Behavior + +RFC 505 suggests that one should always use the `rust` formatting directive: + + ```rust + println!("Hello, world!"); + ``` + + ```ruby + puts "Hello" + ``` + +But, in API documentation, feel free to rely on the default being ‘rust’: + + /// For example: + /// + /// ``` + /// let x = 5; + /// ``` + +Other places do not know how to highlight this anyway, so it's not important to +be explicit. + +RFC 505 suggests that references and citation should be linked ‘reference +style.’ This is still recommended, but prefer to leave off the second `[]`: + +``` +[Rust website] + +[Rust website]: http://www.rust-lang.org +``` + +to + +``` +[Rust website][website] + +[website]: http://www.rust-lang.org +``` + +But, if the text is very long, it is okay to use this form. 
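+
+For example, when the link text is too long to serve comfortably as the
+reference label itself:
+
+```
+This link [is very long and links to the Rust website][website].
+
+[website]: http://www.rust-lang.org
+```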
+ +### Examples in API docs +[examples-in-api-docs]: #examples-in-api-docs + +Everything should have examples. Here is an example of how to do examples: + +``` +/// # Examples +/// +/// ``` +/// use op; +/// +/// let s = "foo"; +/// let answer = op::compare(s, "bar"); +/// ``` +/// +/// Passing a closure to compare with, rather than a string: +/// +/// ``` +/// use op; +/// +/// let s = "foo"; +/// let answer = op::compare(s, |a| a.chars().is_whitespace().all()); +/// ``` +``` + +### Referring to types +[referring-to-types]: #referring-to-types + +When talking about a type, use its full name. In other words, if the type is generic, +say `Option`, not `Option`. An exception to this is bounds. Write `Cow<'a, B>` +rather than `Cow<'a, B> where B: 'a + ToOwned + ?Sized`. + +Another possibility is to write in lower case using a more generic term. In other words, +‘string’ can refer to a `String` or an `&str`, and ‘an option’ can be ‘an `Option`’. + +### Link all the things +[link-all-the-things]: #link-all-the-things + +A major drawback of Markdown is that it cannot automatically link types in API documentation. +Do this yourself with the reference-style syntax, for ease of reading: + +``` +/// The [`String`] passed in lorum ipsum... +/// +/// [`String`]: ../string/struct.String.html +``` + +### Module-level vs type-level docs +[module-level-vs-type-level-docs]: #module-level-vs-type-level-docs + +There has often been a tension between module-level and type-level +documentation. For example, in today's standard library, the various +`*Cell` docs say, in the pages for each type, to "refer to the module-level +documentation for more details." + +Instead, module-level documentation should show a high-level summary of +everything in the module, and each type should document itself fully. It is +okay if there is some small amount of duplication here. Module-level +documentation should be broad and not go into a lot of detail. That is left +to the type's documentation. + +## Example +[example]: #example + +Below is a full crate, with documentation following these rules. I am loosely basing +this off of my [ref_slice] crate, because it’s small, but I’m not claiming the code +is good here. It’s about the docs, not the code. + +[ref_slice]: https://crates.io/crates/ref_slice + +In lib.rs: + +```rust +//! Turning references into slices +//! +//! This crate contains several utility functions for taking various kinds +//! of references and producing slices out of them. In this case, only full +//! slices, not ranges for sub-slices. +//! +//! # Layout +//! +//! At the top level, we have functions for working with references, `&T`. +//! There are two submodules for dealing with other types: `option`, for +//! &[`Option`], and `mut`, for `&mut T`. +//! +//! [`Option`]: http://doc.rust-lang.org/std/option/enum.Option.html + +pub mod option; + +/// Converts a reference to `T` into a slice of length 1. +/// +/// This will not copy the data, only create the new slice. +/// +/// # Panics +/// +/// In this case, the code won’t panic, but if it did, the circumstances +/// in which it would would be included here. +/// +/// # Examples +/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::ref_slice; +/// +/// let x = &5; +/// +/// let slice = ref_slice(x); +/// +/// assert_eq!(&[5], slice); +/// ``` +/// +/// A more compelx example. In this case, it’s the same example, because this +/// is a pretty trivial function, but use your imagination. 
+/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::ref_slice; +/// +/// let x = &5; +/// +/// let slice = ref_slice(x); +/// +/// assert_eq!(&[5], slice); +/// ``` +pub fn ref_slice(s: &T) -> &[T] { + unimplemented!() +} + +/// Functions that operate on mutable references. +/// +/// This submodule mirrors the parent module, but instead of dealing with `&T`, +/// they’re for `&mut T`. +mod mut { + /// Converts a reference to `&mut T` into a mutable slice of length 1. + /// + /// This will not copy the data, only create the new slice. + /// + /// # Safety + /// + /// In this case, the code doesn’t need to be marked as unsafe, but if it + /// did, the invariants you’re expected to uphold would be documented here. + /// + /// # Examples + /// + /// ``` + /// extern crate ref_slice; + /// use ref_slice::mut; + /// + /// let x = &mut 5; + /// + /// let slice = mut::ref_slice(x); + /// + /// assert_eq!(&mut [5], slice); + /// ``` + pub fn ref_slice(s: &mut T) -> &mut [T] { + unimplemented!() + } +} +``` + +in `option.rs`: + +```rust +//! Functions that operate on references to [`Option`]s. +//! +//! This submodule mirrors the parent module, but instead of dealing with `&T`, +//! they’re for `&`[`Option`]. +//! +//! [`Option`]: http://doc.rust-lang.org/std/option/enum.Option.html + +/// Converts a reference to `Option` into a slice of length 0 or 1. +/// +/// [`Option`]: http://doc.rust-lang.org/std/option/enum.Option.html +/// +/// This will not copy the data, only create the new slice. +/// +/// # Examples +/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::option; +/// +/// let x = &Some(5); +/// +/// let slice = option::ref_slice(x); +/// +/// assert_eq!(&[5], slice); +/// ``` +/// +/// `None` will result in an empty slice: +/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::option; +/// +/// let x: &Option = &None; +/// +/// let slice = option::ref_slice(x); +/// +/// assert_eq!(&[], slice); +/// ``` +pub fn ref_slice(opt: &Option) -> &[T] { + unimplemented!() +} +``` + +# Drawbacks +[drawbacks]: #drawbacks + +It’s possible that RFC 505 went far enough, and something this detailed is inappropriate. + +# Alternatives +[alternatives]: #alternatives + +We could stick with the more minimal conventions of the previous RFC. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None. + +# Appendix A: Full conventions text + +Below is a combination of RFC 505 + this RFC’s modifications, for convenience. + +### Summary sentence +[summary-sentence]: #summary-sentence + +In API documentation, the first line should be a single-line short sentence +providing a summary of the code. This line is used as a summary description +throughout Rustdoc’s output, so it’s a good idea to keep it short. + +The summary line should be written in third person singular present indicative +form. Basically, this means write ‘Returns’ instead of ‘Return’. + +### English +[english]: #english + +This section applies to `rustc` and the standard library. + +All documentation for the standard library is standardized on American English, +with regards to spelling, grammar, and punctuation conventions. Language +changes over time, so this doesn’t mean that there is always a correct answer +to every grammar question, but there is often some kind of formal consensus. + +### Use line comments +[use-line-comments]: #use-line-comments + +Avoid block comments. Use line comments instead: + +```rust +// Wait for the main task to return, and set the process error code +// appropriately. 
+``` + +Instead of: + +```rust +/* + * Wait for the main task to return, and set the process error code + * appropriately. + */ +``` + +Only use inner doc comments `//!` to write crate and module-level documentation, +nothing else. When using `mod` blocks, prefer `///` outside of the block: + +```rust +/// This module contains tests +mod test { + // ... +} +``` + +over + +```rust +mod test { + //! This module contains tests + + // ... +} +``` + +### Using Markdown +[using-markdown]: #using-markdown + +Within doc comments, use Markdown to format your documentation. + +Use top level headings (`#`) to indicate sections within your comment. Common headings: + +* Examples +* Panics +* Errors +* Safety +* Aborts +* Undefined Behavior + +An example: + +```rust +/// # Examples +``` + +Even if you only include one example, use the plural form: ‘Examples’ rather +than ‘Example’. Future tooling is easier this way. + +Use backticks (`) to denote a code fragment within a sentence. + +Use triple backticks (```) to write longer examples, like this: + + This code does something cool. + + ```rust + let x = foo(); + + x.bar(); + ``` + +When appropriate, make use of Rustdoc’s modifiers. Annotate triple backtick blocks with +the appropriate formatting directive. + + ```rust + println!("Hello, world!"); + ``` + + ```ruby + puts "Hello" + ``` + +In API documentation, feel free to rely on the default being ‘rust’: + + /// For example: + /// + /// ``` + /// let x = 5; + /// ``` + +In long-form documentation, always be explicit: + + For example: + + ```rust + let x = 5; + ``` + +This will highlight syntax in places that do not default to ‘rust’, like GitHub. + +Rustdoc is able to test all Rust examples embedded inside of documentation, so +it’s important to mark what is not Rust so your tests don’t fail. + +References and citation should be linked ‘reference style.’ Prefer + +``` +[Rust website] + +[Rust website]: http://www.rust-lang.org +``` + +to + +``` +[Rust website](http://www.rust-lang.org) +``` + +If the text is very long, feel free to use the shortened form: + +``` +This link [is very long and links to the Rust website][website]. + +[website]: http://www.rust-lang.org +``` + +### Examples in API docs +[examples-in-api-docs]: #examples-in-api-docs + +Everything should have examples. Here is an example of how to do examples: + +``` +/// # Examples +/// +/// ``` +/// use op; +/// +/// let s = "foo"; +/// let answer = op::compare(s, "bar"); +/// ``` +/// +/// Passing a closure to compare with, rather than a string: +/// +/// ``` +/// use op; +/// +/// let s = "foo"; +/// let answer = op::compare(s, |a| a.chars().is_whitespace().all()); +/// ``` +``` + +### Referring to types +[referring-to-types]: #referring-to-types + +When talking about a type, use its full name. In other words, if the type is generic, +say `Option`, not `Option`. An exception to this is bounds. Write `Cow<'a, B>` +rather than `Cow<'a, B> where B: 'a + ToOwned + ?Sized`. + +Another possibility is to write in lower case using a more generic term. In other words, +‘string’ can refer to a `String` or an `&str`, and ‘an option’ can be ‘an `Option`’. + +### Link all the things +[link-all-the-things]: #link-all-the-things + +A major drawback of Markdown is that it cannot automatically link types in API documentation. +Do this yourself with the reference-style syntax, for ease of reading: + +``` +/// The [`String`] passed in lorum ipsum... 
+/// +/// [`String`]: ../string/struct.String.html +``` + +### Module-level vs type-level docs +[module-level-vs-type-level-docs]: #module-level-vs-type-level-docs + +There has often been a tension between module-level and type-level +documentation. For example, in today's standard library, the various +`*Cell` docs say, in the pages for each type, to "refer to the module-level +documentation for more details." + +Instead, module-level documentation should show a high-level summary of +everything in the module, and each type should document itself fully. It is +okay if there is some small amount of duplication here. Module-level +documentation should be broad, and not go into a lot of detail, which is left +to the type's documentation. + +## Example +[example]: #example + +Below is a full crate, with documentation following these rules. I am loosely basing +this off of my [ref_slice] crate, because it’s small, but I’m not claiming the code +is good here. It’s about the docs, not the code. + +[ref_slice]: https://crates.io/crates/ref_slice + +In lib.rs: + +```rust +//! Turning references into slices +//! +//! This crate contains several utility functions for taking various kinds +//! of references and producing slices out of them. In this case, only full +//! slices, not ranges for sub-slices. +//! +//! # Layout +//! +//! At the top level, we have functions for working with references, `&T`. +//! There are two submodules for dealing with other types: `option`, for +//! &[`Option`], and `mut`, for `&mut T`. +//! +//! [`Option`]: http://doc.rust-lang.org/std/option/enum.Option.html + +pub mod option; + +/// Converts a reference to `T` into a slice of length 1. +/// +/// This will not copy the data, only create the new slice. +/// +/// # Panics +/// +/// In this case, the code won’t panic, but if it did, the circumstances +/// in which it would would be included here. +/// +/// # Examples +/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::ref_slice; +/// +/// let x = &5; +/// +/// let slice = ref_slice(x); +/// +/// assert_eq!(&[5], slice); +/// ``` +/// +/// A more compelx example. In this case, it’s the same example, because this +/// is a pretty trivial function, but use your imagination. +/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::ref_slice; +/// +/// let x = &5; +/// +/// let slice = ref_slice(x); +/// +/// assert_eq!(&[5], slice); +/// ``` +pub fn ref_slice(s: &T) -> &[T] { + unimplemented!() +} + +/// Functions that operate on mutable references. +/// +/// This submodule mirrors the parent module, but instead of dealing with `&T`, +/// they’re for `&mut T`. +mod mut { + /// Converts a reference to `&mut T` into a mutable slice of length 1. + /// + /// This will not copy the data, only create the new slice. + /// + /// # Safety + /// + /// In this case, the code doesn’t need to be marked as unsafe, but if it + /// did, the invariants you’re expected to uphold would be documented here. + /// + /// # Examples + /// + /// ``` + /// extern crate ref_slice; + /// use ref_slice::mut; + /// + /// let x = &mut 5; + /// + /// let slice = mut::ref_slice(x); + /// + /// assert_eq!(&mut [5], slice); + /// ``` + pub fn ref_slice(s: &mut T) -> &mut [T] { + unimplemented!() + } +} +``` + +in `option.rs`: + +```rust +//! Functions that operate on references to [`Option`]s. +//! +//! This submodule mirrors the parent module, but instead of dealing with `&T`, +//! they’re for `&`[`Option`]. +//! +//! 
[`Option`]: http://doc.rust-lang.org/std/option/enum.Option.html + +/// Converts a reference to `Option` into a slice of length 0 or 1. +/// +/// [`Option`]: http://doc.rust-lang.org/std/option/enum.Option.html +/// +/// This will not copy the data, only create the new slice. +/// +/// # Examples +/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::option; +/// +/// let x = &Some(5); +/// +/// let slice = option::ref_slice(x); +/// +/// assert_eq!(&[5], slice); +/// ``` +/// +/// `None` will result in an empty slice: +/// +/// ``` +/// extern crate ref_slice; +/// use ref_slice::option; +/// +/// let x: &Option = &None; +/// +/// let slice = option::ref_slice(x); +/// +/// assert_eq!(&[], slice); +/// ``` +pub fn ref_slice(opt: &Option) -> &[T] { + unimplemented!() +} +``` + diff --git a/text/1576-macros-literal-matcher.md b/text/1576-macros-literal-matcher.md new file mode 100644 index 00000000000..7626e2150ba --- /dev/null +++ b/text/1576-macros-literal-matcher.md @@ -0,0 +1,40 @@ +- Feature Name: macros-literal-match +- Start Date: 2016-04-08 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1576 +- Rust Issue: https://github.com/rust-lang/rust/issues/35625 + +# Summary + +Add a `literal` fragment specifier for `macro_rules!` patterns that matches literal constants: + +```rust +macro_rules! foo { + ($l:literal) => ( /* ... */ ); +}; +``` + +# Motivation + +There are a lot of macros out there that take literal constants as arguments (often string constants). For now, most use the `expr` fragment specifier, which is fine since literal constants are a subset of expressions. But it has the following issues: +* It restricts the syntax of those macros. A limited set of FOLLOW tokens is allowed after an `expr` specifier. For example `$e:expr : $t:ty` is not allowed whereas `$l:literal : $t:ty` should be. There is no reason to arbitrarily restrict the syntax of those macros where they will only be actually used with literal constants. A workaround for that is to use the `tt` matcher. +* It does not allow for proper error reporting where the macro actually *needs* the parameter to be a literal constant. With this RFC, bad usage of such macros will give a proper syntax error message whereas with `epxr` it would probably give a syntax or typing error inside the generated code, which is hard to understand. +* It's not consistent. There is no reason to allow expressions, types, etc. but not literals. + +# Design + +Add a `literal` (or `lit`, or `constant`) matcher in macro patterns that matches all single-tokens literal constants (those that are currently represented by `token::Literal`). +Matching input against this matcher would call the `parse_lit` method from `libsyntax::parse::Parser`. The FOLLOW set of this matcher should be the same as `ident` since it matches a single token. + +# Drawbacks + +This includes only single-token literal constants and not compound literals, for example struct literals `Foo { x: some_literal, y: some_literal }` or arrays `[some_literal ; N]`, where `some_literal` can itself be a compound literal. See in alternatives why this is disallowed. + +# Alternatives + +* Allow compound literals too. In theory there is no reason to exclude them since they do not require any computation. In practice though, allowing them requires using the expression parser but limiting it to allow only other compound literals and not arbitrary expressions to occur inside a compound literal (for example inside struct fields). 
This would probably require much more work to implement and also mitigates the first motivation since it will probably restrict a lot the FOLLOW set of such fragments. +* Adding fragment specifiers for each constant type: `$s:str` which expects a literal string, `$i:integer` which expects a literal integer, etc. With this design, we could allow something like `$s:struct` for compound literals which still requires a lot of work to implement but has the advantage of not ‶polluting″ the FOLLOW sets of other specifiers such as `str`. It provides also better ‶static″ (pre-expansion) checking of the arguments of a macro and thus better error reporting. Types are also good for documentation. The main drawback here if of course that we could not allow any possible type since we cannot interleave parsing and type checking, so we would have to define a list of accepted types, for example `str`, `integer`, `bool`, `struct` and `array` (without specifying the complete type of the structs and arrays). This would be a bit inconsistent since those types indeed refer more to syntactic categories in this context than to true Rust types. It would be frustrating and confusing since it can give the impression that macros do type-checking of their arguments, when of course they don't. +* Don't do this. Continue to use `expr` or `tt` to refer to literal constants. + +# Unresolved + +The keyword of the matcher can be `literal`, `lit`, `constant`, or something else. diff --git a/text/1581-fused-iterator.md b/text/1581-fused-iterator.md new file mode 100644 index 00000000000..e95a314418d --- /dev/null +++ b/text/1581-fused-iterator.md @@ -0,0 +1,284 @@ +- Feature Name: fused +- Start Date: 2016-04-15 +- RFC PR: [rust-lang/rfcs#1581](https://github.com/rust-lang/rfcs/pull/1581) +- Rust Issue: [rust-lang/rust#35602](https://github.com/rust-lang/rust/issues/35602) + +# Summary +[summary]: #summary + +Add a marker trait `FusedIterator` to `std::iter` and implement it on `Fuse` and +applicable iterators and adapters. By implementing `FusedIterator`, an iterator +promises to behave as if `Iterator::fuse()` had been called on it (i.e. return +`None` forever after returning `None` once). Then, specialize `Fuse` to be a +no-op if `I` implements `FusedIterator`. + +# Motivation +[motivation]: #motivation + +Iterators are allowed to return whatever they want after returning `None` once. +However, assuming that an iterator continues to return `None` can make +implementing some algorithms/adapters easier. Therefore, `Fuse` and +`Iterator::fuse` exist. Unfortunately, the `Fuse` iterator adapter introduces a +noticeable overhead. Furthermore, many iterators (most if not all iterators in +std) already act as if they were fused (this is considered to be the "polite" +behavior). Therefore, it would be nice to be able to pay the `Fuse` overhead +only when necessary. + +Microbenchmarks: + +```text +test fuse ... bench: 200 ns/iter (+/- 13) +test fuse_fuse ... bench: 250 ns/iter (+/- 10) +test myfuse ... bench: 48 ns/iter (+/- 4) +test myfuse_myfuse ... bench: 48 ns/iter (+/- 3) +test range ... 
bench: 48 ns/iter (+/- 2) +``` + +```rust +#![feature(test, specialization)] +extern crate test; + +use std::ops::Range; + +#[derive(Clone, Debug)] +#[must_use = "iterator adaptors are lazy and do nothing unless consumed"] +pub struct Fuse { + iter: I, + done: bool +} + +pub trait FusedIterator: Iterator {} + +trait IterExt: Iterator + Sized { + fn myfuse(self) -> Fuse { + Fuse { + iter: self, + done: false, + } + } +} + +impl FusedIterator for Fuse where Fuse: Iterator {} +impl FusedIterator for Range where Range: Iterator {} + +impl IterExt for T {} + +impl Iterator for Fuse where I: Iterator { + type Item = ::Item; + + #[inline] + default fn next(&mut self) -> Option<::Item> { + if self.done { + None + } else { + let next = self.iter.next(); + self.done = next.is_none(); + next + } + } +} + +impl Iterator for Fuse where I: FusedIterator { + #[inline] + fn next(&mut self) -> Option<::Item> { + self.iter.next() + } +} + +impl ExactSizeIterator for Fuse where I: ExactSizeIterator {} + +#[bench] +fn myfuse(b: &mut test::Bencher) { + b.iter(|| { + for i in (0..100).myfuse() { + test::black_box(i); + } + }) +} + +#[bench] +fn myfuse_myfuse(b: &mut test::Bencher) { + b.iter(|| { + for i in (0..100).myfuse().myfuse() { + test::black_box(i); + } + }); +} + + +#[bench] +fn fuse(b: &mut test::Bencher) { + b.iter(|| { + for i in (0..100).fuse() { + test::black_box(i); + } + }) +} + +#[bench] +fn fuse_fuse(b: &mut test::Bencher) { + b.iter(|| { + for i in (0..100).fuse().fuse() { + test::black_box(i); + } + }); +} + +#[bench] +fn range(b: &mut test::Bencher) { + b.iter(|| { + for i in (0..100) { + test::black_box(i); + } + }) +} +``` + +# Detailed Design +[design]: #detailed-design + +``` +trait FusedIterator: Iterator {} + +impl FusedIterator for Fuse {} + +impl FusedIterator for Range {} +// ...and for most std/core iterators... + + +// Existing implementation of Fuse repeated for convenience +pub struct Fuse { + iterator: I, + done: bool, +} + +impl Iterator for Fuse where I: Iterator { + type Item = I::Item; + + #[inline] + fn next(&mut self) -> Self::Item { + if self.done { + None + } else { + let next = self.iterator.next(); + self.done = next.is_none(); + next + } + } +} + +// Then, specialize Fuse... +impl Iterator for Fuse where I: FusedIterator { + type Item = I::Item; + + #[inline] + fn next(&mut self) -> Self::Item { + // Ignore the done flag and pass through. + // Note: this means that the done flag should *never* be exposed to the + // user. + self.iterator.next() + } +} + +``` + +# Drawbacks +[drawbacks]: #drawbacks + +1. Yet another special iterator trait. +2. There is a useless done flag on no-op `Fuse` adapters. +3. Fuse isn't used very often anyways. However, I would argue that it should be + used more often and people are just playing fast and loose. I'm hoping that + making `Fuse` free when unneeded will encourage people to use it when they should. +4. This trait locks implementors into following the `FusedIterator` spec; + removing the `FusedIterator` implementation would be a breaking change. This + precludes future optimizations that take advantage of the fact that the + behavior of an `Iterator` is undefined after it returns `None` the first + time. + + +# Alternatives + +## Do Nothing + +Just pay the overhead on the rare occasions when fused is actually used. 
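To make the cost of this alternative concrete, here is a small, hypothetical adapter (not taken from the RFC; the `Pairs` type and `main` are illustrative only) that needs the "`None` forever" guarantee and therefore has to wrap its input in `Fuse` today, paying for the extra flag and branch even when the inner iterator is already well behaved:

```rust
use std::iter::Fuse;

/// Yields items two at a time. It may call `next()` again after the inner
/// iterator has already returned `None`, so it relies on `Fuse` for the
/// "`None` forever after the first `None`" guarantee.
struct Pairs<I: Iterator> {
    inner: Fuse<I>,
}

impl<I: Iterator> Iterator for Pairs<I> {
    type Item = (I::Item, Option<I::Item>);

    fn next(&mut self) -> Option<Self::Item> {
        let first = self.inner.next()?;
        Some((first, self.inner.next()))
    }
}

fn main() {
    let pairs: Vec<_> = Pairs { inner: (0..5).fuse() }.collect();
    assert_eq!(pairs, vec![(0, Some(1)), (2, Some(3)), (4, None)]);
}
```

With the proposed specialization, the wrapping `Fuse` becomes a pass-through for iterators that implement `FusedIterator` (such as `Range`), so this defensive style would no longer cost anything.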
+ +## IntoFused + +Use an associated type (and set it to `Self` for iterators that already provide +the fused guarantee) and an `IntoFused` trait: + +```rust +#![feature(specialization)] +use std::iter::Fuse; + +trait FusedIterator: Iterator {} + +trait IntoFused: Iterator + Sized { + type Fused: Iterator; + fn into_fused(self) -> Self::Fused; +} + +impl IntoFused for T where T: Iterator { + default type Fused = Fuse; + default fn into_fused(self) -> Self::Fused { + // Currently complains about a mismatched type but I think that's a + // specialization bug. + self.fuse() + } +} + +impl IntoFused for T where T: FusedIterator { + type Fused = Self; + + fn into_fused(self) -> Self::Fused { + self + } +} +``` + +For now, this doesn't actually compile because rust believes that the associated +type `Fused` could be specialized independent of the `into_fuse` function. + +While this method gets rid of memory overhead of a no-op `Fuse` wrapper, it adds +complexity, needs to be implemented as a separate trait (because adding +associated types is a breaking change), and can't be used to optimize the +iterators returned from `Iterator::fuse` (users would *have* to call +`IntoFused::into_fused`). + +## Associated Type + +If we add the ability to condition associated types on `Self: Sized`, I believe +we can add them without it being a breaking change (associated types only need +to be fully specified on DSTs). If so (after fixing the bug in specialization +noted above), we could do the following: + +```rust +trait Iterator { + type Item; + type Fuse: Iterator where Self: Sized = Fuse; + fn fuse(self) -> Self::Fuse where Self: Sized { + Fuse { + done: false, + iter: self, + } + } + // ... +} +``` + +However, changing an iterator to take advantage of this would be a breaking +change. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Should this trait be unsafe? I can't think of any way generic unsafe code could +end up relying on the guarantees of `FusedIterator`. + +~~Also, it's possible to implement the specialized `Fuse` struct without a useless +`done` bool. Unfortunately, it's *very* messy. IMO, this is not worth it for now +and can always be fixed in the future as it doesn't change the `FusedIterator` +trait.~~ Resolved: It's not possible to remove the `done` bool without making +`Fuse` invariant. + diff --git a/text/1584-macros.md b/text/1584-macros.md new file mode 100644 index 00000000000..a7e72a23477 --- /dev/null +++ b/text/1584-macros.md @@ -0,0 +1,148 @@ +- Feature Name: macro_2_0 +- Start Date: 2016-04-17 +- RFC PR: [1584](https://github.com/rust-lang/rfcs/pull/1584) +- Rust Issue: [39412](https://github.com/rust-lang/rust/issues/39412) + +# Summary +[summary]: #summary + +Declarative macros 2.0. A replacement for `macro_rules!`. This is mostly a +placeholder RFC since many of the issues affecting the new macro system are +(or will be) addressed in other RFCs. This RFC may be expanded at a later date. + +Currently in this RFC: + +* That we should have a new declarative macro system, +* a new keyword for declaring macros (`macro`). + +In other RFCs: + +* Naming and modularisation (#1561). + +To come in separate RFCs: + +* more detailed syntax proposal, +* hygiene improvements, +* more ... + +Note this RFC does not involve procedural macros (aka syntax extensions). 
+ + +# Motivation +[motivation]: #motivation + +There are several changes to the declarative macro system which are desirable but +not backwards compatible (See [RFC 1561](https://github.com/rust-lang/rfcs/pull/1561) +for some changes to macro naming and modularisation, I would also like to +propose improvements to hygiene in macros, and some improved syntax). + +In order to maintain Rust's backwards compatibility guarantees, we cannot change +the existing system (`macro_rules!`) to accommodate these changes. I therefore +propose a new declarative macro system to live alongside `macro_rules!`. + +Example (possible) improvements: + +```rust +// Naming (RFC 1561) + +fn main() { + a::foo!(...); +} + +mod a { + // Macro privacy (TBA) + pub macro foo { ... } +} +``` + +```rust +// Relative paths (part of hygiene reform, TBA) + +mod a { + pub macro foo { ... bar() ... } + fn bar() { ... } +} + +fn main() { + a::foo!(...); // Expansion calls a::bar +} +``` + +```rust +// Syntax (TBA) + +macro foo($a: ident) => { + return $a + 1; +} +``` + +I believe it is extremely important that moving to the new macro system is as +straightforward as possible for both macro users and authors. This must be the +case so that users make the transition to the new system and we are not left +with two systems forever. + +A goal of this design is that for macro users, there is no difference in using +the two systems other than how macros are named. For macro authors, most macros +that work in the old system should work in the new system with minimal changes. +Macros which will need some adjustment are those that exploit holes in the +current hygiene system. + + +# Detailed design +[design]: #detailed-design + +There will be a new system of declarative macros using similar syntax and +semantics to the current `macro_rules!` system. + +A declarative macro is declared using the `macro` keyword. For example, where a +macro `foo` is declared today as `macro_rules! foo { ... }`, it will be declared +using `macro foo { ... }`. I leave the syntax of the macro body for later +specification. + +## Nomenclature + +Throughout this RFC, I use 'declarative macro' to refer to a macro declared +using declarative (and domain specific) syntax (such as the current +`macro_rules!` syntax). The 'declarative macros' name is in opposition to +'procedural macros', which are declared as Rust programs. The specific +declarative syntax using pattern matching and templating is often referred to as +'macros by example'. + +'Pattern macro' has been suggested as an alternative for 'declarative macro'. + +# Drawbacks +[drawbacks]: #drawbacks + +There is a risk that `macro_rules!` is good enough for most users and there is +low adoption of the new system. Possibly worse would be that there is high +adoption but little migration from the old system, leading to us having to +support two systems forever. + + +# Alternatives +[alternatives]: #alternatives + +Make backwards incompatible changes to `macro_rules!`. This is probably a +non-starter due to our stability guarantees. We might be able to make something +work if this was considered desirable. + +Limit ourselves to backwards compatible changes to `macro_rules!`. I don't think +this is worthwhile. It's not clear we can make meaningful improvements without +breaking backwards compatibility. + +Use `macro!` instead of `macro` (proposed in an earlier version of this RFC). + +Don't use a keyword - either make `macro` not a keyword or use a different word +for declarative macros. 
+ +Live with the existing system. + + +# Unresolved questions +[unresolved]: #unresolved-questions + +What to do with `macro_rules`? We will need to maintain it at least until `macro` +is stable. Hopefully, we can then deprecate it (some time will be required to +migrate users to the new system). Eventually, I hope we can remove `macro_rules!`. +That will take a long time, and would require a 2.0 version of Rust to strictly +adhere to our stability guarantees. diff --git a/text/1589-rustc-bug-fix-procedure.md b/text/1589-rustc-bug-fix-procedure.md new file mode 100644 index 00000000000..9ba3ce8898f --- /dev/null +++ b/text/1589-rustc-bug-fix-procedure.md @@ -0,0 +1,284 @@ +- Feature Name: N/A +- Start Date: 2016-04-22 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1589 +- Rust Issue: N/A + +# Summary +[summary]: #summary + +Defines a best practices procedure for making bug fixes or soundness +corrections in the compiler that can cause existing code to stop +compiling. + +# Motivation +[motivation]: #motivation + +From time to time, we encounter the need to make a bug fix, soundness +correction, or other change in the compiler which will cause existing +code to stop compiling. When this happens, it is important that we +handle the change in a way that gives users of Rust a smooth +transition. What we want to avoid is that existing programs suddenly +stop compiling with opaque error messages: we would prefer to have a +gradual period of warnings, with clear guidance as to what the problem +is, how to fix it, and why the change was made. This RFC describes the +procedure that we have been developing for handling breaking changes +that aims to achieve that kind of smooth transition. + +One of the key points of this policy is that (a) warnings should be +issued initially rather than hard errors if at all possible and (b) +every change that causes existing code to stop compiling will have an +associated tracking issue. This issue provides a point to collect +feedback on the results of that change. Sometimes changes have +unexpectedly large consequences or there may be a way to avoid the +change that was not considered. In those cases, we may decide to +change course and roll back the change, or find another solution (if +warnings are being used, this is particularly easy to do). + +### What qualifies as a bug fix? + +Note that this RFC does not try to define when a breaking change is +permitted. That is already covered under [RFC 1122][]. This document +assumes that the change being made is in accordance with those +policies. Here is a summary of the conditions from RFC 1122: + +- **Soundness changes:** Fixes to holes uncovered in the type system. +- **Compiler bugs:** Places where the compiler is not implementing the + specified semantics found in an RFC or lang-team decision. +- **Underspecified language semantics:** Clarifications to grey areas + where the compiler behaves inconsistently and no formal behavior had + been previously decided. + +Please see [the RFC][RFC 1122] for full details! + +# Detailed design +[design]: #detailed-design + +The procedure for making a breaking change is as follows (each of +these steps is described in more detail below): + +0. Do a **crater run** to assess the impact of the change. +1. Make a **special tracking issue** dedicated to the change. +2. Do not report an error right away. Instead, **issue + forwards-compatibility lint warnings**. + - Sometimes this is not straightforward. 
See the text below for + suggestions on different techniques we have employed in the past. + - For cases where warnings are infeasible: + - Report errors, but make every effort to give a targeted error + message that directs users to the tracking issue + - Submit PRs to all known affected crates that fix the issue + - or, at minimum, alert the owners of those crates to the problem + and direct them to the tracking issue +3. Once the change has been in the wild for at least one cycle, we can + **stabilize the change**, converting those warnings into errors. + +Finally, for changes to libsyntax that will affect plugins, the +general policy is to batch these changes. That is discussed below in +more detail. + +### Tracking issue + +Every breaking change should be accompanied by a **dedicated tracking +issue** for that change. The main text of this issue should describe +the change being made, with a focus on what users must do to fix their +code. The issue should be approachable and practical; it may make +sense to direct users to an RFC or some other issue for the full +details. The issue also serves as a place where users can comment with +questions or other concerns. + +A template for these breaking-change tracking issues can be found +below. An example of how such an issue should look can be +[found here][breaking-change-issue]. + +The issue should be tagged with (at least) `B-unstable` and +`T-compiler`. + +### Tracking issue template + +What follows is a template for tracking issues. + +--------------------------------------------------------------------------- + +This is the **summary issue** for the `YOUR_LINT_NAME_HERE` +future-compatibility warning and other related errors. The goal of +this page is describe why this change was made and how you can fix +code that is affected by it. It also provides a place to ask questions +or register a complaint if you feel the change should not be made. For +more information on the policy around future-compatibility warnings, +see our [breaking change policy guidelines][guidelines]. + +[guidelines]: LINK_TO_THIS_RFC + +#### What is the warning for? + +*Describe the conditions that trigger the warning and how they can be +fixed. Also explain why the change was made.** + +#### When will this warning become a hard error? + +At the beginning of each 6-week release cycle, the Rust compiler team +will review the set of outstanding future compatibility warnings and +nominate some of them for **Final Comment Period**. Toward the end of +the cycle, we will review any comments and make a final determination +whether to convert the warning into a hard error or remove it +entirely. + +--------------------------------------------------------------------------- + +### Issuing future compatibility warnings + +The best way to handle a breaking change is to begin by issuing +future-compatibility warnings. These are a special category of lint +warning. Adding a new future-compatibility warning can be done as +follows. + +```rust +// 1. Define the lint in `src/librustc/lint/builtin.rs`: +declare_lint! { + pub YOUR_ERROR_HERE, + Warn, + "illegal use of foo bar baz" +} + +// 2. Add to the list of HardwiredLints in the same file: +impl LintPass for HardwiredLints { + fn get_lints(&self) -> LintArray { + lint_array!( + .., + YOUR_ERROR_HERE + ) + } +} + +// 3. 
Register the lint in `src/librustc_lint/lib.rs`: +store.register_future_incompatible(sess, vec![ + ..., + FutureIncompatibleInfo { + id: LintId::of(YOUR_ERROR_HERE), + reference: "issue #1234", // your tracking issue here! + }, +]); + +// 4. Report the lint: +tcx.sess.add_lint( + lint::builtin::YOUR_ERROR_HERE, + path_id, + binding.span, + format!("some helper message here")); +``` + +#### Helpful techniques + +It can often be challenging to filter out new warnings from older, +pre-existing errors. One technique that has been used in the past is +to run the older code unchanged and collect the errors it would have +reported. You can then issue warnings for any errors you would give +which do not appear in that original set. Another option is to abort +compilation after the original code completes if errors are reported: +then you know that your new code will only execute when there were no +errors before. + +#### Crater and crates.io + +We should always do a crater run to assess impact. It is polite and +considerate to at least notify the authors of affected crates the +breaking change. If we can submit PRs to fix the problem, so much the +better. + +#### Is it ever acceptable to go directly to issuing errors? + +Changes that are believed to have negligible impact can go directly to +issuing an error. One rule of thumb would be to check against +`crates.io`: if fewer than 10 **total** affected projects are found +(**not** root errors), we can move straight to an error. In such +cases, we should still make the "breaking change" page as before, and +we should ensure that the error directs users to this page. In other +words, everything should be the same except that users are getting an +error, and not a warning. Moreover, we should submit PRs to the +affected projects (ideally before the PR implementing the change lands +in rustc). + +If the impact is not believed to be negligible (e.g., more than 10 +crates are affected), then warnings are required (unless the compiler +team agrees to grant a special exemption in some particular case). If +implementing warnings is not feasible, then we should make an +aggressive strategy of migrating crates before we land the change so +as to lower the number of affected crates. Here are some techniques +for approaching this scenario: + +1. Issue warnings for subparts of the problem, and reserve the new + errors for the smallest set of cases you can. +2. Try to give a very precise error message that suggests how to fix + the problem and directs users to the tracking issue. +3. It may also make sense to layer the fix: + - First, add warnings where possible and let those land before proceeding + to issue errors. + - Work with authors of affected crates to ensure that corrected + versions are available *before* the fix lands, so that downstream + users can use them. + + +### Stabilization + +After a change is made, we will **stabilize** the change using the same +process that we use for unstable features: + +- After a new release is made, we will go through the outstanding tracking + issues corresponding to breaking changes and nominate some of them for + **final comment period** (FCP). +- The FCP for such issues lasts for one cycle. In the final week or two of the cycle, + we will review comments and make a final determination: + - Convert to error: the change should be made into a hard error. + - Revert: we should remove the warning and continue to allow the older code to compile. + - Defer: can't decide yet, wait longer, or try other strategies. 
+ +Ideally, breaking changes should have landed on the **stable branch** +of the compiler before they are finalized. + +### Batching breaking changes to libsyntax + +Due to the lack of stable plugins, making changes to libsyntax can +currently be quite disruptive to the ecosystem that relies on plugins. +In an effort to ease this pain, we generally try to batch up such +changes so that they occur all at once, rather than occuring in a +piecemeal fashion. In practice, this means that you should add: + + cc #31645 @Manishearth + +to the PR and avoid directly merging it. In the future we may develop +a more polished procedure here, but the hope is that this is a +relatively temporary state of affairs. + +# Drawbacks +[drawbacks]: #drawbacks + +Following this policy can require substantial effort and slows the +time it takes for a change to become final. However, this is far +outweighed by the benefits of avoiding sharp disruptions in the +ecosystem. + +# Alternatives +[alternatives]: #alternatives + +There are obviously many points that we could tweak in this policy: + +- Eliminate the tracking issue. +- Change the stabilization schedule. + +Two other obvious (and rather extreme) alternatives are not having a +policy and not making any sort of breaking change at all: + +- Not having a policy at all (as is the case today) encourages + inconsistent treatment of issues. +- Not making any sorts of breaking changes would mean that Rust simply + has to stop evolving, or else would issue new major versions quite + frequently, causing undue disruption. + +# Unresolved questions +[unresolved]: #unresolved-questions + +N/A + + + +[RFC 1122]: https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md +[breaking-change-issue]: https://gist.github.com/nikomatsakis/631ec8b4af9a18b5d062d9d9b7d3d967 diff --git a/text/1590-macro-lifetimes.md b/text/1590-macro-lifetimes.md new file mode 100644 index 00000000000..38b92d51477 --- /dev/null +++ b/text/1590-macro-lifetimes.md @@ -0,0 +1,54 @@ +- Feature Name: Allow `lifetime` specifiers to be passed to macros +- Start Date: 2016-04-22 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1590 +- Rust Issue: https://github.com/rust-lang/rust/issues/34303 + +# Summary +[summary]: #summary + +Add a `lifetime` specifier for `macro_rules!` patterns, that matches any valid +lifetime. + +# Motivation +[motivation]: #motivation + +Certain classes of macros are completely impossible without the ability to pass +lifetimes. Specifically, anything that wants to implement a trait from inside of +a macro is going to need to deal with lifetimes eventually. They're also +commonly needed for any macros that need to deal with types in a more granular +way than just `ty`. + +Since a lifetime is a single token, the only way to match against a lifetime is +by capturing it as `tt`. Something like `'$lifetime:ident` would fail to +compile. This is extremely limiting, as it becomes difficult to sanitize input, +and `tt` is extremely difficult to use in a sequence without using awkward +separators. + +# Detailed design +[design]: #detailed-design + +This RFC proposes adding `lifetime` as an additional specifier to +`macro_rules!` (alternatively: `life` or `lt`). As it is a single token, it is +able to be followed by any other specifier. Since a lifetime acts very much +like an identifier, and can appear in almost as many places, it can be handled +almost identically. 
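To illustrate the kind of macro this enables, here is a small sketch using the proposed `lifetime` specifier (the macro, the `Borrowed` type, and `from_str` are illustrative names, not part of the RFC):

```rust
// A macro that defines a borrowing wrapper type, with the lifetime supplied
// by the caller and captured via the proposed `lifetime` fragment specifier.
macro_rules! borrowing_wrapper {
    ($lt:lifetime, $name:ident) => {
        struct $name<$lt> {
            inner: &$lt str,
        }

        impl<$lt> $name<$lt> {
            fn from_str(s: &$lt str) -> Self {
                $name { inner: s }
            }
        }
    };
}

borrowing_wrapper!('a, Borrowed);

fn main() {
    let b = Borrowed::from_str("hello");
    assert_eq!(b.inner, "hello");
}
```

Capturing the same lifetime as `tt` would also work, but a dedicated specifier lets the macro reject non-lifetime input up front and is far easier to use inside repetitions.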
+ +A preliminary implementation can be found at +https://github.com/rust-lang/rust/pull/33135 + +# Drawbacks +[drawbacks]: #drawbacks + +None + +# Alternatives +[alternatives]: #alternatives + +A more general specifier, such as a "type parameter list", which would roughly +map to `ast::Generics` would cover most of the cases that matching lifetimes +individually would cover. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None diff --git a/text/1607-style-rfcs.md b/text/1607-style-rfcs.md new file mode 100644 index 00000000000..162a580ce09 --- /dev/null +++ b/text/1607-style-rfcs.md @@ -0,0 +1,324 @@ +- Feature Name: N/A +- Start Date: 2016-04-21 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1607 +- Rust Issue: N/A + + +# Summary +[summary]: #summary + +This RFC proposes a process for deciding detailed guidelines for code +formatting, and default settings for Rustfmt. The outcome of the process should +be an approved formatting style defined by a style guide and enforced by +Rustfmt. + +This RFC proposes creating a new repository under the [rust-lang](https://github.com/rust-lang) +organisation called fmt-rfcs. It will be operated in a similar manner to the +[RFCs repository](https://github.com/rust-lang/rfcs), but restricted to +formatting issues. A new [sub-team](https://github.com/rust-lang/rfcs/blob/master/text/1068-rust-governance.md#subteams) +will be created to deal with those RFCs. Both the team and repository are +expected to be temporary. Once the style guide is complete, the team can be +disbanded and the repository frozen. + + +# Motivation +[motivation]: #motivation + +There is a need to decide on detailed guidelines for the format of Rust code. A +uniform, language-wide formatting style makes comprehending new code-bases +easier and forestalls bikeshedding arguments in teams of Rust users. The utility +of such guidelines has been proven by Go, amongst other languages. + +The [Rustfmt](https://github.com/rust-lang-nursery/rustfmt) tool is +[reaching maturity](https://users.rust-lang.org/t/please-help-test-rustfmt/5386) +and currently enforces a somewhat arbitrary, lightly discussed style, with many +configurable options. + +If Rustfmt is to become a widely accepted tool, there needs to be a process for +the Rust community to decide on the default style, and how configurable that +style should be. + +These discussions should happen in the open and be highly visible. It is +important that the Rust community has significant input to the process. The RFC +repository would be an ideal place to have this discussion because it exists to +satisfy these goals, and is tried and tested. However, the discussion is likely +to be a high-bandwidth one (code style is a contentious and often subjective +topic, and syntactic RFCs tend to be the highest traffic ones). Therefore, +having the discussion on the RFCs repository could easily overwhelm it and make +it less useful for other important discussions. + +There currently exists a [style guide](https://github.com/rust-lang/rust/tree/master/src/doc/style) +as part of the Rust documentation. This is far more wide-reaching than just +formatting style, but also not detailed enough to specify Rustfmt. This was +originally developed in its [own repository](https://github.com/rust-lang/rust-guidelines), +but is now part of the main Rust repository. That seems like a poor venue for +discussion of these guidelines due to visibility. 
+ + +# Detailed design +[design]: #detailed-design + +## Process + +The process for style RFCs will mostly follow the [process for other RFCs](https://github.com/rust-lang/rfcs). +Anyone may submit an RFC. An overview of the process is: + +* If there is no single, obvious style, then open a GitHub issue on the + fmt-rfcs repo for initial discussion. This initial discussion should identify + which Rustfmt options are required to enforce the guideline. +* Implement the style in rustfmt (behind an option if it is not the current + default). In exceptional circumstances (such as where the implementation would + require very deep changes to rustfmt), this step may be skipped. +* Write an RFC formalising the formatting convention and referencing the + implementation, submit as a PR to fmt-rfcs. The RFC should include the default + values for options to enforce the guideline and which non-default options + should be kept. +* The RFC PR will be triaged by the style team and either assigned to a team + member for [shepherding](https://github.com/rust-lang/rfcs#the-role-of-the-shepherd), + or closed. +* When discussion has reached a fixed point, the RFC PR will be put into a final + comment period (FCP). +* After FCP, the RFC will either be accepted and merged or closed. +* Implementation in Rustfmt can then be finished (including any changes due to + discussion of the RFC), and defaults are set. + + +### Scope of the process + +This process is specifically limited to formatting style guidelines which can be +enforced by Rustfmt with its current architecture. Guidelines that cannot be +enforced by Rustfmt without a large amount of work are out of scope, even if +they only pertain to formatting. + +Note whether Rustfmt should be configurable at all, and if so how configurable +is a decision that should be dealt with using the formatting RFC process. That +will be a rather exceptional RFC. + +### Size of RFCs + +RFCs should be self-contained and coherent, whilst being as small as possible to +keep discussion focused. For example, an RFC on 'arithmetic and logic +expressions' is about the right size; 'expressions' would be too big, and +'addition' would be too small. + + +### When is a guideline ready for RFC? + +The purpose of the style RFC process is to foster an open discussion about style +guidelines. Therefore, RFC PRs should be made early rather than late. It is +expected that there may be more discussion and changes to style RFCs than is +typical for Rust RFCs. However, at submission, RFC PRs should be completely +developed and explained to the level where they can be used as a specification. + +A guideline should usually be implemented in Rustfmt **before** an RFC PR is +submitted. The RFC should be used to select an option to be the default +behaviour, rather than to identify a range of options. An RFC can propose a +combination of options (rather than a single one) as default behaviour. An RFC +may propose some reorganisation of options. + +Usually a style should be widely used in the community before it is submitted as +an RFC. Where multiple styles are used, they should be covered as alternatives +in the RFC, rather than being submitted as multiple RFCs. In some cases, a style +may be proposed without wide use (we don't want to discourage innovation), +however, it should have been used in *some* real code, rather than just being +sketched out. + + +### Triage + +RFC PRs are triaged by the style team. 
An RFC may be closed during triage (with +feedback for the author) if the style team think it is not specified in enough +detail, has too narrow or broad scope, or is not appropriate in some way (e.g., +applies to more than just formatting). Otherwise, the PR will be assigned a +shepherd as for other RFCs. + + +### FCP + +FCP will last for two weeks (assuming the team decide to meet every two weeks) +and will be announced in the style team sub-team report. + + +### Decision and post-decision process + +The style team will make the ultimate decision on accepting or closing a style +RFC PR. Decisions should be by consensus. Most discussion should take place on +the PR comment thread, a decision should ideally be made when consensus is +reached on the thread. Any additional discussion amongst the style team will be +summarised on the thread. + +If an RFC PR is accepted, it will be merged. An issue for implementation will be +filed in the appropriate place (usually the Rustfmt repository) referencing the +RFC. If the style guide needs to be updated, then an issue for that should be +filed on the Rust repository. + +The author of an RFC is not required to implement the guideline. If you are +interested in working on the implementation for an 'active' RFC, but cannot +determine if someone else is already working on it, feel free to ask (e.g. by +leaving a comment on the associated issue). + + +## The fmt-rfcs repository + +The form of the fmt-rfcs repository will follow the rfcs repository. Accepted +RFCs will live in a `text` directory, the `README.md` will include information +taken from this RFC, there will be an RFC template in the root of the +repository. Issues on the repository can be used for placeholders for future +RFCs and for preliminary discussion. + +The RFC format will be illustrated by the RFC template. It will have the +following sections: + +* summary +* details +* implementation +* rationale +* alternatives +* unresolved questions + +The 'details' section should contain examples of both what should and shouldn't +be done, cover simple and complex cases, and the interaction with other style +guidelines. + +The 'implementation' section should specify how options must be set to enforce +the guideline, and what further changes (including additional options) are +required. It should specify any renaming, reorganisation, or removal of options. + +The 'rationale' section should motivate the choices behind the RFC. It should +reference existing code bases which use the proposed style. 'Alternatives' +should cover alternative possible guidelines, if appropriate. + +Guidelines may include more than one acceptable rule, but should offer +guidance for when to use each rule (which should be formal enough to be used by +a tool). + +For example: + +> A struct literal must be formatted either on a single line (with +spaces after the opening brace and before the closing brace, and with fields +separated by commas and spaces), or on multiple lines (with one field per line +and newlines after the opening brace and before the closing brace). The former +approach should be used for short struct literals, the latter for longer struct +literals. For tools, the first approach should be used when the width of the +fields (excluding commas and braces) is 16 characters. E.g., + +> ```rust +let x = Foo { a: 42, b: 34 }; +let y = Foo { + a: 42, + b: 34, + c: 1000 +}; +``` + +(Note this is just an example, not a proposed guideline). 
+ +The repository in embryonic form lives at [nrc/fmt-rfcs](https://github.com/nrc/fmt-rfcs). +It illustrates what [issues](https://github.com/nrc/fmt-rfcs/issues/1) and +[PRs](https://github.com/nrc/fmt-rfcs/pull/2) might look like, as well as +including the RFC template. Note that typically there should be more discussion +on an issue before submitting an RFC PR. + +The repository should be updated as this RFC develops, and moved to the rust-lang +GitHub organisation if this RFC is accepted. + + +## The style team + +The style [sub-team](https://github.com/rust-lang/rfcs/blob/master/text/1068-rust-governance.md#subteams) +will be responsible for handling style RFCs and making decisions related to +code style and formatting. + +Per the [governance RFC](https://github.com/rust-lang/rfcs/blob/master/text/1068-rust-governance.md), +the core team would pick a leader who would then pick the rest of the team. I +propose that the team should include members representative of the following +areas: + +* Rustfmt, +* the language, tools, and libraries sub-teams (since each has a stake in code style), +* large Rust projects. + +Because activity such as this hasn't been done before in the RUst community, it +is hard to identify suitable candidates for the team ahead of time. The team +will probably start small and consist of core members of the Rust community. I +expect that once the process gets underway the team can be rapidly expanded with +community members who are active in the fmt-rfcs repository (i.e., submitting +and constructively commenting on RFCs). + +There will be a dedicated irc channel for discussion on formatting issues: +`#rust-style`. + + +## Style guide + +The [existing style guide](https://github.com/rust-lang/rust/tree/master/src/doc/style) +will be split into two guides: one dealing with API design and similar issues +which will be managed by the libs team, and one dealing with formatting issues +which will be managed by the style team. Note that the formatting part of the +guide may include guidelines which are not enforced by Rustfmt. Those are outside +the scope of the process defined in this RFC, but still belong in that part of +the style guide. + +When RFCs are accepted the style guide may need to be updated. Towards the end +of the process, the style team should audit and edit the guide to ensure it is a +coherent document. + + +## Material goals + +Hopefully, the style guideline process will have limited duration, one year +seems reasonable. After that time, style guidelines for new syntax could be +included with regular RFCs, or the fmt-rfcs repository could be maintained in a +less active fashion. + +At the end of the process, the fmt-rfcs repository should be a fairly complete +guide for formatting Rust code, and useful as a specification for Rustfmt and +tools with similar goals, such as IDEs. In particular, there should be a +decision made on how configurable Rustfmt should be, and an agreed set of +default options. The formatting style guide in the Rust repository should be a +more human-friendly source of formatting guidelines, and should be in sync with +the fmt-rfcs repo. + + +# Drawbacks +[drawbacks]: #drawbacks + +This RFC introduces more process and bureaucracy, and requires more meetings for +some core Rust contributors. Precious time and energy will need to be devoted to +discussions. + + +# Alternatives +[alternatives]: #alternatives + +Benevolent dictator - a single person dictates style rules which will be +followed without question by the community. 
This seems to work for Go, I suspect +it will not work for Rust. + +Parliamentary 'democracy' - the community 'elects' a style team (via the usual +RFC consensus process, rather than actual voting). The style team decides on +style issues without an open process. This would be more efficient, but doesn't +fit very well with the open ethos of the Rust community. + +Use the RFCs repo, rather than a new repo. This would have the benefit that +style RFCs would get more visibility, and it is one less place to keep track of +for Rust community members. However, it risks overwhelming the RFC repo with +style debate. + +Use issues on Rustfmt. I feel that the discussions would not have enough +visibility in this fashion, but perhaps that can be addressed by wide and +regular announcement. + +Use a book format for the style repo, rather than a collection of RFCs. This +would make it easier to see how the 'final product' style guide would look. +However, I expect there will be many issues that are important to be aware of +while discussing an RFC, that are not important to include in a final guide. + +Have an existing team handle the process, rather than create a new style team. +Saves on a little bureaucracy. Candidate teams would be language and tools. +However, the language team has very little free bandwidth, and the tools team is +probably not broad enough to effectively handle the style decisions. + + +# Unresolved questions +[unresolved]: #unresolved-questions diff --git a/text/1618-ergonomic-format-args.md b/text/1618-ergonomic-format-args.md new file mode 100644 index 00000000000..7178da14b96 --- /dev/null +++ b/text/1618-ergonomic-format-args.md @@ -0,0 +1,146 @@ +- Feature Name: (not applicable) +- Start Date: 2016-05-17 +- RFC PR: [rust-lang/rfcs#1618](https://github.com/rust-lang/rfcs/pull/1618) +- Rust Issue: [rust-lang/rust#33642](https://github.com/rust-lang/rust/pull/33642) + +# Summary +[summary]: #summary + +Removes the one-type-only restriction on `format_args!` arguments. +Expressions like `format_args!("{0:x} {0:o}", foo)` now work as intended, +where each argument is still evaluated only once, in order of appearance +(i.e. left-to-right). + +# Motivation +[motivation]: #motivation + +The `format_args!` macro and its friends historically only allowed a single +type per argument, such that trivial format strings like `"{0:?} == {0:x}"` or +`"rgb({r}, {g}, {b}) is #{r:02x}{g:02x}{b:02x}"` are illegal. This is +massively inconvenient and counter-intuitive, especially considering the +formatting syntax is borrowed from Python where such things are perfectly +valid. + +Upon closer investigation, the restriction is in fact an artificial +implementation detail. For mapping format placeholders to macro arguments the +`format_args!` implementation did not bother to record type information for +all the placeholders sequentially, but rather chose to remember only one type +per argument. Also the formatting logic has not received significant attention +since after its conception, but the uses have greatly expanded over the years, +so the mechanism as a whole certainly needs more love. + +# Detailed design +[design]: #detailed-design + +Formatting is done during both compile-time (expansion-time to be pedantic) +and runtime in Rust. As we are concerned with format string parsing, not +outputting, this RFC only touches the compile-time side of the existing +formatting mechanism which is `libsyntax_ext` and `libfmt_macros`. 
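As a quick illustration of what the lifted restriction allows (a usage sketch; outputs shown in comments), each argument below is evaluated once yet formatted several ways:

```rust
fn main() {
    let val = 255u8;
    // Prints: 255 = 0xff = 0b11111111
    println!("{0} = {0:#x} = {0:#b}", val);

    // Named arguments can likewise be repeated with different format specs.
    // Prints: rgb(102, 204, 255) is #66ccff
    println!("rgb({r}, {g}, {b}) is #{r:02x}{g:02x}{b:02x}", r = 102, g = 204, b = 255);
}
```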
+ +Before continuing with the details, it is worth noting that the core flow of +current Rust formatting is *mapping arguments to placeholders to format specs*. +For clarity, we distinguish among *placeholders*, *macro arguments* and +*argument objects*. They are all *italicized* to provide a visual hint for the +distinction. + +To implement the proposed design, the following changes in behavior are made: + +* implicit references are resolved during parsing of the format string; +* named *macro arguments* are resolved into positional ones; +* placeholder types are remembered and de-duplicated for each *macro argument*; +* the *argument objects* are emitted with the information gathered in the steps above. + +As most of the details are best described in the code itself, we only +illustrate some of the high-level changes below. + +## Implicit reference resolution + +Currently two forms of implicit references exist: `ArgumentNext` and +`CountIsNextParam`. Both take a positional *macro argument* and advance the +same internal pointer, but the format spec is parsed before the argument +position, as shown in format strings like `"{foo:.*} {} {:.*}"`, which is in +every way equivalent to `"{foo:.0$} {1} {3:.2$}"`. + +As the rule is already known at compile-time, and does not require the +whole format string to be known beforehand, the resolution can happen just +inside the parser after a *placeholder* is successfully parsed. As a natural +consequence, both forms can be removed from the rest of the compiler, +simplifying later work. + +## Named argument resolution + +Named arguments in format macros are not seen elsewhere in Rust and are best +treated as the syntactic sugar they are. Just after successfully parsing the +*macro arguments*, we immediately rewrite every name to its respective +position in the argument list, which again simplifies the process. + +## Processing and expansion + +We only have absolute positional references to *macro arguments* at this point, +and it's straightforward to remember all unique *placeholders* encountered for +each. The unique *placeholders* are emitted into *argument objects* in order, +preserving evaluation order, with no other difference in behavior. + +# Drawbacks +[drawbacks]: #drawbacks + +Due to the added data structures and processing, time and memory costs of +compilation may increase slightly. However, this is mere speculation without +actual profiling and benchmarks. Also, the ergonomic benefits alone justify +the additional costs. + +# Alternatives +[alternatives]: #alternatives + +## Do nothing + +One can always write a little more code to simulate the proposed behavior, +and this is what people have most likely been doing under today's constraints.
+As in: + +```rust +fn main() { + let r = 0x66; + let g = 0xcc; + let b = 0xff; + + // rgb(102, 204, 255) == #66ccff + // println!("rgb({r}, {g}, {b}) == #{r:02x}{g:02x}{b:02x}", r=r, g=g, b=b); + println!("rgb({}, {}, {}) == #{:02x}{:02x}{:02x}", r, g, b, r, g, b); +} +``` + +Or slightly more verbose when side effects are in play: + +```rust +fn do_something(i: &mut usize) -> usize { + let result = *i; + *i += 1; + result +} + +fn main() { + let mut i = 0x1234usize; + + // 0b1001000110100 0o11064 0x1234 + // 0x1235 + // println!("{0:#b} {0:#o} {0:#x}", do_something(&mut i)); + // println!("{:#x}", i); + + // need to consider side effects, hence a temp var + { + let r = do_something(&mut i); + println!("{:#b} {:#o} {:#x}", r, r, r); + println!("{:#x}", i); + } +} +``` + +While the effects are the same and nothing requires modification, the +ergonomics is simply bad and the code becomes unnecessarily convoluted. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None. diff --git a/text/1620-regex-1.0.md b/text/1620-regex-1.0.md new file mode 100644 index 00000000000..a793d1da593 --- /dev/null +++ b/text/1620-regex-1.0.md @@ -0,0 +1,971 @@ +- Feature Name: regex-1.0 +- Start Date: 2016-05-11 +- RFC PR: [rust-lang/rfcs#1620](https://github.com/rust-lang/rfcs/pull/1620) +- Rust Issue: N/A + +# Table of contents + +* [Summary][summary] +* [Motivation][motivation] +* [Detailed design][design] + * [Syntax][syntax] + * [Evolution][evolution] + * [Concrete syntax][concrete-syntax] + * [Expansion concerns][expansion-concerns] + * [Core API][core-api] + * [RegexBuilder][regexbuilder] + * [Replacer][replacer] + * [quote][quote] + * [RegexSet][regexset] + * [The `bytes` submodule][the-bytes-submodule] +* [Drawbacks][drawbacks] + * [Guaranteed linear time matching][guaranteed-linear-time-matching] + * [Allocation][allocation] + * [Synchronization is implicit][synchronization-is-implicit] + * [The implementation is complex][the-implementation-is-complex] +* [Alternatives][alternatives] + * [Big picture][big-picture] + * [`bytes::Regex`][bytesregex] + * [A regex trait][a-regex-trait] + * [Reuse some types][reuse-some-types] +* [Unresolved questions][unresolved] + * [`regex-syntax`][regex-syntax] + * [`regex-capi`][regex-capi] + * [`regex_macros`][regex_macros] + * [Dependencies][dependencies] + * [Exposing more internals][exposing-more-internals] +* [Breaking changes][breaking-changes] + +# Summary +[summary]: #summary + +This RFC proposes a 1.0 API for the `regex` crate and therefore a move out of +the `rust-lang-nursery` organization and into the `rust-lang` organization. +Since the API of `regex` has largely remained unchanged since its inception +[2 years ago](https://github.com/rust-lang/rfcs/blob/master/text/0042-regexps.md), +significant emphasis is placed on retaining the existing API. Some minor +breaking changes are proposed. + +# Motivation +[motivation]: #motivation + +Regular expressions are a widely used tool and most popular programming +languages either have an implementation of regexes in their standard library, +or there exists at least one widely used third party implementation. It +therefore seems reasonable for Rust to do something similar. + +The `regex` crate specifically serves many use cases, most of which are somehow +related to searching strings for patterns. Describing regular expressions in +detail is beyond the scope of this RFC, but briefly, these core use cases are +supported in the main API: + +1. Testing whether a pattern matches some text. +2. 
Finding the location of a match of a pattern in some text. +3. Finding the location of a match of a pattern---and locations of all its + capturing groups---in some text. +4. Iterating over successive non-overlapping matches of (2) and (3). + +The expected outcome is that the `regex` crate should be the preferred default +choice for matching regular expressions when writing Rust code. This is already +true today; this RFC formalizes it. + +# Detailed design +[design]: #detailed-design + +## Syntax +[syntax]: #syntax + +### Evolution +[evolution]: #evolution + +The public API of a `regex` library *includes* the syntax of a regular +expression. A change in the semantics of the syntax can cause otherwise working +programs to break, yet, we'd still like the option to expand the syntax if +necessary. Thus, this RFC proposes: + +1. Any change that causes a previously invalid regex pattern to become valid is + *not* a breaking change. For example, the escape sequence `\y` is not a + valid pattern, but could become one in a future release without a major + version bump. +2. Any change that causes a previously valid regex pattern to become invalid + *is* a breaking change. +3. Any change that causes a valid regex pattern to change its matching + semantics *is* a breaking change. (For example, changing `\b` from "word + boundary assertion" to "backspace character.") + +Bug fixes and Unicode upgrades are exceptions to both (2) and (3). + +Another interesting exception to (2) is that compiling a regex can fail if the +entire compiled object would exceed some pre-defined user configurable size. +In particular, future changes to the compiler could cause certain instructions +to use more memory, or indeed, the representation of the compiled regex could +change completely. This could cause a regex that fit under the size limit to +no longer fit, and therefore fail to compile. These cases are expected to be +extremely rare in practice. Notably, the default size limit is `10MB`. + +### Concrete syntax +[concrete-syntax]: #concrete-syntax + +The syntax is exhaustively documented in the current public API documentation: +http://doc.rust-lang.org/regex/regex/index.html#syntax + +To my knowledge, the evolution as proposed in this RFC has been followed since +`regex` was created. The syntax has largely remained unchanged with few +additions. + +### Expansion concerns +[expansion-concerns]: #expansion-concerns + +There are a few possible avenues for expansion, and we take measures to make +sure they are possible with respect to API evolution. + +* Escape sequences are often blessed with special semantics. For example, `\d` + is a Unicode character class that matches any digit and `\b` is a word + boundary assertion. We may one day like to add more escape sequences with + special semantics. For this reason, any unrecognized escape sequence makes a + pattern invalid. +* If we wanted to expand the syntax with various look-around operators, then it + would be possible since most common syntax is considered an invalid pattern + today. In particular, all of the [syntactic forms listed + here](http://www.regular-expressions.info/refadv.html) are invalid patterns + in `regex`. +* Character class sets are another potentially useful feature that may be worth + adding. Currently, [various forms of set + notation](http://www.regular-expressions.info/refcharclass.html) are treated + as valid patterns, but this RFC proposes making them invalid patterns before + `1.0`. 
+* Additional named Unicode classes or codepoints may be desirable to add. + Today, any pattern of the form `\p{NAME}` where `NAME` is unrecognized is + considered invalid, which leaves room for expansion. +* If all else fails, we can introduce new flags that enable new features that + conflict with stable syntax. This is possible because using an unrecognized + flag results in an invalid pattern. + +## Core API +[core-api]: #core-api + +The core API of the `regex` crate is the `Regex` type: + +```rust +pub struct Regex(_); +``` + +It has one primary constructor: + +```rust +impl Regex { + /// Creates a new regular expression. If the pattern is invalid or otherwise + /// fails to compile, this returns an error. + pub fn new(pattern: &str) -> Result; +} +``` + +And five core search methods. All searching completes in worst case linear time +with respect to the search text (the size of the regex is taken as a constant). + +```rust +impl Regex { + /// Returns true if and only if the text matches this regex. + pub fn is_match(&self, text: &str) -> bool; + + /// Returns the leftmost-first match of this regex in the text given. If no + /// match exists, then None is returned. + /// + /// The leftmost-first match is defined as the first match that is found + /// by a backtracking search. + pub fn find<'t>(&self, text: &'t str) -> Option>; + + /// Returns an iterator of successive non-overlapping matches of this regex + /// in the text given. + pub fn find_iter<'r, 't>(&'r self, text: &'t str) -> Matches<'r, 't>; + + /// Returns the leftmost-first match of this regex in the text given with + /// locations for all capturing groups that participated in the match. + pub fn captures(&self, text: &str) -> Option; + + /// Returns an iterator of successive non-overlapping matches with capturing + /// group information in the text given. + pub fn captures_iter<'r, 't>(&'r self, text: &'t str) -> CaptureMatches<'r, 't>; +} +``` + +(N.B. The `captures` method can technically replace all uses of `find` and +`is_match`, but is potentially slower. Namely, the API reflects a performance +trade off: the more you ask for, the harder the regex engine has to work.) + +There is one additional, but idiosyncratic, search method: + +```rust +impl Regex { + /// Returns the end location of a match if one exists in text. + /// + /// This may return a location preceding the end of a proper leftmost-first + /// match. In particular, it may return the location at which a match is + /// determined to exist. For example, matching `a+` against `aaaaa` will + /// return `1` while the end of the leftmost-first match is actually `5`. + /// + /// This has the same performance characteristics as `is_match`. + pub fn shortest_match(&self, text: &str) -> Option; +} +``` + +And two methods for splitting: + +```rust +impl Regex { + /// Returns an iterator of substrings of `text` delimited by a match of + /// this regular expression. Each element yielded by the iterator corresponds + /// to text that *isn't* matched by this regex. + pub fn split<'r, 't>(&'r self, text: &'t str) -> Split<'r, 't>; + + /// Returns an iterator of at most `limit` substrings of `text` delimited by + /// a match of this regular expression. Each element yielded by the iterator + /// corresponds to text that *isn't* matched by this regex. The remainder of + /// `text` that is not split will be the last element yielded by the + /// iterator. 
+ pub fn splitn<'r, 't>(&'r self, text: &'t str, limit: usize) -> SplitN<'r, 't>; +} +``` + +And three methods for replacement. Replacement is discussed in more detail in a +subsequent section. + +```rust +impl Regex { + /// Replaces matches of this regex in `text` with `rep`. If no matches were + /// found, then the given string is returned unchanged, otherwise a new + /// string is allocated. + /// + /// `replace` replaces the first match only. `replace_all` replaces all + /// matches. `replacen` replaces at most `limit` matches. + fn replace<'t, R: Replacer>(&self, text: &'t str, rep: R) -> Cow<'t, str>; + fn replace_all<'t, R: Replacer>(&self, text: &'t str, rep: R) -> Cow<'t, str>; + fn replacen<'t, R: Replacer>(&self, text: &'t str, limit: usize, rep: R) -> Cow<'t, str>; +} +``` + +And lastly, three simple accessors: + +```rust +impl Regex { + /// Returns the original pattern string. + pub fn as_str(&self) -> &str; + + /// Returns an iterator over all capturing group in the pattern in the order + /// they were defined (by position of the leftmost parenthesis). The name of + /// the group is yielded if it has a name, otherwise None is yielded. + pub fn capture_names(&self) -> CaptureNames; + + /// Returns the total number of capturing groups in the pattern. This + /// includes the implicit capturing group corresponding to the entire + /// pattern. + pub fn captures_len(&self) -> usize; +} +``` + +Finally, `Regex` impls the `Send`, `Sync`, `Display`, `Debug`, `Clone` and +`FromStr` traits from the standard library. + +## Error + +The `Error` enum is an *extensible* enum, similar to `std::io::Error`, +corresponding to the different ways that regex compilation can fail. In +particular, this means that adding a new variant to this enum is not a breaking +change. (Removing or changing an existing variant is still a breaking change.) + +```rust +pub enum Error { + /// A syntax error. + Syntax(SyntaxError), + /// The compiled program exceeded the set size limit. + /// The argument is the size limit imposed. + CompiledTooBig(usize), + /// Hints that destructuring should not be exhaustive. + /// + /// This enum may grow additional variants, so this makes sure clients + /// don't count on exhaustive matching. (Otherwise, adding a new variant + /// could break existing code.) + #[doc(hidden)] + __Nonexhaustive, +} +``` + +Note that the `Syntax` variant could contain the `Error` type from the +`regex-syntax` crate, but this couples `regex-syntax` to the public API +of `regex`. We sidestep this hazard by defining a newtype in `regex` that +internally wraps `regex_syntax::Error`. This also enables us to selectively +expose more information in the future. + +## RegexBuilder +[regexbuilder]: #regexbuilder + +In most cases, the construction of a regex is done with `Regex::new`. There are +however some options one might want to tweak. This can be done with a +`RegexBuilder`: + +```rust +impl RegexBuilder { + /// Creates a new builder from the given pattern. + pub fn new(pattern: &str) -> RegexBuilder; + + /// Compiles the pattern and all set options. If successful, a Regex is + /// returned. Otherwise, if compilation failed, an Error is returned. + /// + /// N.B. `RegexBuilder::new("...").compile()` is equivalent to + /// `Regex::new("...")`. + pub fn build(&self) -> Result; + + /// Set the case insensitive flag (i). + pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexBuilder; + + /// Set the multi line flag (m). 
+    pub fn multi_line(&mut self, yes: bool) -> &mut RegexBuilder;
+
+    /// Set the dot-matches-any-character flag (s).
+    pub fn dot_matches_new_line(&mut self, yes: bool) -> &mut RegexBuilder;
+
+    /// Set the swap-greedy flag (U).
+    pub fn swap_greed(&mut self, yes: bool) -> &mut RegexBuilder;
+
+    /// Set the ignore whitespace flag (x).
+    pub fn ignore_whitespace(&mut self, yes: bool) -> &mut RegexBuilder;
+
+    /// Set the Unicode flag (u).
+    pub fn unicode(&mut self, yes: bool) -> &mut RegexBuilder;
+
+    /// Set the approximate size limit (in bytes) of the compiled regular
+    /// expression.
+    ///
+    /// If compiling a pattern would approximately exceed this size, then
+    /// compilation will fail.
+    pub fn size_limit(&mut self, limit: usize) -> &mut RegexBuilder;
+
+    /// Set the approximate size limit (in bytes) of the cache used by the DFA.
+    ///
+    /// This is a per thread limit. Once the DFA fills the cache, it will be
+    /// wiped and refilled again. If the cache is wiped too frequently, the
+    /// DFA will quit and fall back to another matching engine.
+    pub fn dfa_size_limit(&mut self, limit: usize) -> &mut RegexBuilder;
+}
+```
+
+## Captures
+[captures]: #captures
+
+A `Captures` value stores the locations of all matching capturing groups for
+a single match. It provides convenient access to those locations indexed by
+either number, or, if available, name.
+
+The first capturing group (index `0`) is always unnamed and always corresponds
+to the entire match. Other capturing groups correspond to groups in the
+pattern. Capturing groups are indexed by the position of their leftmost
+parenthesis in the pattern.
+
+Note that `Captures` is a type constructor with a single parameter: the
+lifetime of the text searched by the corresponding regex. In particular, the
+lifetime of `Captures` is not tied to the lifetime of a `Regex`.
+
+```rust
+impl<'t> Captures<'t> {
+    /// Returns the match associated with the capture group at index `i`. If
+    /// `i` does not correspond to a capture group, or if the capture group
+    /// did not participate in the match, then `None` is returned.
+    pub fn get(&self, i: usize) -> Option<Match<'t>>;
+
+    /// Returns the match for the capture group named `name`. If `name` isn't a
+    /// valid capture group or didn't match anything, then `None` is returned.
+    pub fn name(&self, name: &str) -> Option<Match<'t>>;
+
+    /// Returns the number of captured groups. This is always at least 1, since
+    /// the first unnamed capturing group corresponding to the entire match
+    /// always exists.
+    pub fn len(&self) -> usize;
+
+    /// Expands all instances of $name in the text given to the value of the
+    /// corresponding named capture group. The expanded string is written to
+    /// dst.
+    ///
+    /// The name in $name may be an integer corresponding to the index of a
+    /// capture group or it can be the name of a capture group. If the name
+    /// isn't a valid capture group, then it is replaced with an empty string.
+    ///
+    /// The longest possible name is used. e.g., $1a looks up the capture group
+    /// named 1a and not the capture group at index 1. To exert more precise
+    /// control over the name, use braces, e.g., ${1}a.
+    ///
+    /// To write a literal $, use $$.
+    pub fn expand(&self, replacement: &str, dst: &mut String);
+}
+```
+
+The `Captures` type impls `Debug`, `Index<usize>` (for numbered capture groups)
+and `Index<&str>` (for named capture groups).
A downside of the `Index` impls is
+that the return value is bounded to the lifetime of `Captures` instead of the
+lifetime of the actual text searched because of how the `Index` trait is
+defined. Callers can work around that limitation if necessary by using an
+explicit method such as `get` or `name`.
+
+## Replacer
+[replacer]: #replacer
+
+The `Replacer` trait is a helper trait to make the various `replace` methods on
+`Regex` more ergonomic. In particular, it makes it possible to use either a
+standard string as a replacement, or a closure with more explicit access to a
+`Captures` value.
+
+```rust
+pub trait Replacer {
+    /// Appends text to dst to replace the current match.
+    ///
+    /// The current match is represented by caps, which is guaranteed to have a
+    /// match at capture group 0.
+    ///
+    /// For example, a no-op replacement would be
+    /// dst.push_str(caps.get(0).unwrap().as_str()).
+    fn replace_append(&mut self, caps: &Captures, dst: &mut String);
+
+    /// Return a fixed unchanging replacement string.
+    ///
+    /// When doing replacements, if access to Captures is not needed, then
+    /// it can be beneficial from a performance perspective to avoid finding
+    /// sub-captures. In general, this is called once for every call to replacen.
+    fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, str>> {
+        None
+    }
+}
+```
+
+Along with this trait, there is also a helper type, `NoExpand`, that implements
+`Replacer` like so:
+
+```rust
+pub struct NoExpand<'t>(pub &'t str);
+
+impl<'t> Replacer for NoExpand<'t> {
+    fn replace_append(&mut self, _: &Captures, dst: &mut String) {
+        dst.push_str(self.0);
+    }
+
+    fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, str>> {
+        Some(Cow::Borrowed(self.0))
+    }
+}
+```
+
+This permits callers to use `NoExpand` with the `replace` methods to guarantee
+that the replacement string is never searched for `$group` replacement syntax.
+
+We also provide two more implementations of the `Replacer` trait: `&str` and
+`FnMut(&Captures) -> String`.
+
+## quote
+[quote]: #quote
+
+There is one free function in `regex`:
+
+```rust
+/// Escapes all regular expression meta characters in `text`.
+///
+/// The string returned may be safely used as a literal in a regex.
+pub fn quote(text: &str) -> String;
+```
+
+## RegexSet
+[regexset]: #regexset
+
+A `RegexSet` represents the union of zero or more regular expressions. It is a
+specialized machine that can match multiple regular expressions simultaneously.
+Conceptually, it is similar to joining multiple regexes as alternates, e.g.,
+`re1|re2|...|reN`, with one crucial difference: in a `RegexSet`, multiple
+expressions can match. This means that each pattern can be reasoned about
+independently. A `RegexSet` is ideal for building simpler lexers or an HTTP
+router.
+
+Because of their specialized nature, they can only report which regexes match.
+They do not report match locations. In theory, this could be added in the
+future, but is difficult.
+
+```rust
+pub struct RegexSet(_);
+
+impl RegexSet {
+    /// Constructs a new RegexSet from the given sequence of patterns.
+    ///
+    /// The order of the patterns given is used to assign increasing integer
+    /// ids starting from 0. Namely, matches are reported in terms of these ids.
+    pub fn new<I, S>(patterns: I) -> Result<RegexSet, Error>
+        where S: AsRef<str>, I: IntoIterator<Item=S>;
+
+    /// Returns the total number of regexes in this set.
+    pub fn len(&self) -> usize;
+
+    /// Returns true if and only if one or more regexes in this set match
+    /// somewhere in the given text.
+ pub fn is_match(&self, text: &str) -> bool; + + /// Returns the set of regular expressions that match somewhere in the given + /// text. + pub fn matches(&self, text: &str) -> SetMatches; +} +``` + +`RegexSet` impls the `Debug` and `Clone` traits. + +The `SetMatches` type is queryable and implements `IntoIterator`. + +```rust +pub struct SetMatches(_); + +impl SetMatches { + /// Returns true if this set contains 1 or more matches. + pub fn matched_any(&self) -> bool; + + /// Returns true if and only if the regex identified by the given id is in + /// this set of matches. + /// + /// This panics if the id given is >= the number of regexes in the set that + /// these matches came from. + pub fn matched(&self, id: usize) -> bool; + + /// Returns the total number of regexes in the set that created these + /// matches. + pub fn len(&self) -> usize; + + /// Returns an iterator over the ids in the set that correspond to a match. + pub fn iter(&self) -> SetMatchesIter; +} +``` + +`SetMatches` impls the `Debug` and `Clone` traits. + +Note that a builder is not proposed for `RegexSet` in this RFC; however, it is +likely one will be added at some point in a backwards compatible way. + +## The `bytes` submodule +[the-bytes-submodule]: #the-bytes-submodule + +All of the above APIs have thus far been explicitly for searching `text` where +`text` has type `&str`. While this author believes that suits most use cases, +it should also be possible to search a regex on *arbitrary* bytes, i.e., +`&[u8]`. One particular use case is quickly searching a file via a memory map. +If regexes could only search `&str`, then one would have to verify it was UTF-8 +first, which could be costly. Moreover, if the file isn't valid UTF-8, then you +either can't search it, or you have to allocate a new string and lossily copy +the contents. Neither case is particularly ideal. It would instead be nice to +just search the `&[u8]` directly. + +This RFC including a `bytes` submodule in the crate. The API of this submodule +is a clone of the API described so far, except with `&str` replaced by `&[u8]` +for the search text (patterns are still `&str`). The clone includes `Regex` +itself, along with all supporting types and traits such as `Captures`, +`Replacer`, `FindIter`, `RegexSet`, `RegexBuilder` and so on. (This RFC +describes some alternative designs in a subsequent section.) + +Since the API is a clone of what has been seen so far, it is not written out +again. Instead, we'll discuss the key differences. + +Again, the first difference is that a `bytes::Regex` can search `&[u8]` +while a `Regex` can search `&str`. + +The second difference is that a `bytes::Regex` can completely disable Unicode +support and explicitly match arbitrary bytes. The details: + +1. The `u` flag can be disabled even when disabling it might cause the regex to +match invalid UTF-8. When the `u` flag is disabled, the regex is said to be in +"ASCII compatible" mode. +2. In ASCII compatible mode, neither Unicode codepoints nor Unicode character +classes are allowed. +3. In ASCII compatible mode, Perl character classes (`\w`, `\d` and `\s`) +revert to their typical ASCII definition. `\w` maps to `[[:word:]]`, `\d` maps +to `[[:digit:]]` and `\s` maps to `[[:space:]]`. +4. In ASCII compatible mode, word boundaries use the ASCII compatible `\w` to +determine whether a byte is a word byte or not. +5. Hexadecimal notation can be used to specify arbitrary bytes instead of +Unicode codepoints. 
For example, in ASCII compatible mode, `\xFF` matches the +literal byte `\xFF`, while in Unicode mode, `\xFF` is a Unicode codepoint that +matches its UTF-8 encoding of `\xC3\xBF`. Similarly for octal notation. +6. `.` matches any byte except for `\n` instead of any Unicode codepoint. When +the `s` flag is enabled, `.` matches any byte. + +An interesting property of the above is that while the Unicode flag is enabled, +a `bytes::Regex` is *guaranteed* to match only valid UTF-8 in a `&[u8]`. Like +`Regex`, the Unicode flag is enabled by default. + +N.B. The Unicode flag can also be selectively disabled in a `Regex`, but not in +a way that permits matching invalid UTF-8. + +# Drawbacks +[drawbacks]: #drawbacks + +## Guaranteed linear time matching +[guaranteed-linear-time-matching]: #guaranteed-linear-time-matching + +A significant contract in the API of the `regex` crate is that all searching +has worst case `O(n)` complexity, where `n ~ length(text)`. (The size of the +regular expression is taken as a constant.) This contract imposes significant +restrictions on both the implementation and the set of features exposed in the +pattern language. A full analysis is beyond the scope of this RFC, but here are +the highlights: + +1. Unbounded backtracking can't be used to implement matching. Backtracking can + be quite fast in practice (indeed, the current implementation uses bounded + backtracking in some cases), but has worst case exponential time. +2. Permitting backreferences in the pattern language can cause matching to + become NP-complete, which (probably) can't be solved in linear time. +3. Arbitrary look around is probably difficult to fit into a linear time + guarantee *in practice*. + +The benefit to the linear time guarantee is just that: no matter what, all +searching completes in linear time with respect to the search text. This is a +valuable guarantee to make, because it means that one can execute arbitrary +regular expressions over arbitrary input and be absolutely sure that it will +finish in some "reasonable" time. + +Of course, in practice, constants that are omitted from complexity analysis +*actually matter*. For this reason, the `regex` crate takes a number of steps +to keep constants low. For example, by placing a limit on the size of the +regular expression or choosing an appropriate matching engine when another +might result in higher constant factors. + +This particular drawback segregates Rust's regular expression library from most +other regular expression libraries that programmers may be familiar with. +Languages such as Java, Python, Perl, Ruby, PHP and C++ support more flavorful +regexes by default. Go is the only language this author knows of whose standard +regex implementation guarantees linear time matching. Of course, RE2 +is also worth mentioning, which is a C++ regex library that guarantees linear +time matching. There are other implementations of regexes that guarantee linear +time matching (TRE, for example), but none of them are particularly popular. + +It is also worth noting that since Rust's FFI is zero cost, one can bind to +existing regex implementations that provide more features (bindings for both +PCRE1 and Oniguruma exist today). + +## Allocation +[allocation]: #allocation + +The `regex` API assumes that the implementation can dynamically allocate +memory. Indeed, the current implementation takes advantage of this. 
A `regex` +library that has no requirement on dynamic memory allocation would look +significantly different than the one that exists today. Dynamic memory +allocation is utilized pervasively in the parser, compiler and even during +search. + +The benefit of permitting dynamic memory allocation is that it makes the +implementation *and* API simpler. This does make use of the `regex` crate in +environments that don't have dynamic memory allocation impossible. + +This author isn't aware of any `regex` library that can work without dynamic +memory allocation. + +With that said, `regex` may want to grow custom allocator support when the +corresponding traits stabilize. + +## Synchronization is implicit +[synchronization-is-implicit]: #synchronization-is-implicit + +Every `Regex` value can be safely used from multiple threads simultaneously. +Since a `Regex` has interior mutable state, this implies that it must do some +kind of synchronization in order to be safe. + +There are some reasons why we might want to do synchronization +automatically: + +1. `Regex` exposes an *immutable API*. That is, from looking at its set of + methods, none of them borrow the `Regex` mutably (or otherwise claim to + mutate the `Regex`). This author claims that since there is no *observable + mutation* of a `Regex`, it *not* being thread safe would violate the + principle of least surprise. +2. Often, a `Regex` should be compiled once and reused repeatedly in multiple + searches. To facilitate this, `lazy_static!` can be used to guarantee that + compilation happens exactly once. `lazy_static!` requires its types to be + `Sync`. A user of `Regex` could work around this by wrapping a `Regex` in a + `Mutex`, but this would make misuse too easy. For example, locking a `Regex` + in one thread would prevent simultaneous searching in another thread. + +Synchronization has overhead, although it is extremely small (and dwarfed +by general matching overhead). The author has *ad hoc* benchmarked the +`regex` implementation with GNU Grep, and per match overhead is comparable in +single threaded use. It is this author's opinion, that it is good enough. If +synchronization overhead across multiple threads is too much, callers may elect +to clone the `Regex` so that each thread gets its own copy. Cloning a `Regex` +is no more expensive than what would be done internally automatically, but it +does eliminate contention. + +An alternative is to increase the API surface and have types that are +synchronized by default and types that aren't synchronized. This was discussed +at length in +[this +thread](https://users.rust-lang.org/t/help-me-reduce-overhead-of-regex-matching/5220/1). +My conclusion from this thread is that we either expand the surface of the API, +or we break the current API or we keep implicit synchronization as-is. In this +author's opinion, neither expanding the API or breaking the API is worth +avoiding negligible synchronization overhead. + +## The implementation is complex +[the-implementation-is-complex]: #the-implementation-is-complex + +Regular expression engines have a lot of moving parts and it often requires +quite a bit of context on how the whole library is organized in order to make +significant contributions. Therefore, moving `regex` into `rust-lang` is a +*maintenance hazard*. This author has tried to mitigate this hazard somewhat by +doing the following: + +1. Offering to mentor contributions. 
Significant contributions have thus far + fizzled, but minor contributions---even to complex code like the DFA---have + been successful. +2. Documenting not just the API, but the *internals*. The DFA is, for example, + heavily documented. +3. Wrote a `HACKING.md` guide that gives a sweeping overview of the design. +4. Significant test and benchmark suites. + +With that said, there is still a lot more that could be done to mitigate the +maintenance hazard. In this author's opinion, the interaction between the three +parts of the implementation (parsing, compilation, searching) is not documented +clearly enough. + +# Alternatives +[alternatives]: #alternatives + +## Big picture +[big-picture]: #big-picture + +The most important alternative is to decide *not* to bless a particular +implementation of regular expressions. We might want to go this route for any +number of reasons (see: Drawbacks). However, the `regex` crate is already +widely used, which provides at least some evidence that some set of programmers +find it good enough for general purpose regex searching. + +The impact of not moving `regex` into `rust-lang` is, plainly, that Rust won't +have an "officially blessed" regex implementation. Many programmers may +appreciate the complexity of a regex implementation, and therefore might insist +that one be officially maintained. However, to be honest, it isn't quite clear +what would happen in practice. This author is speculating. + +## `bytes::Regex` +[bytesregex]: #bytesregex + +This RFC proposes stabilizing the `bytes` sub-module of the `regex` crate in +its entirety. The `bytes` sub-module is a near clone of the API at the crate +level with one important difference: it searches `&[u8]` instead of `&str`. +This design was motivated by a similar split in `std`, but there are +alternatives. + +### A regex trait +[a-regex-trait]: #a-regex-trait + +One alternative is designing a trait that looks something like this: + +```rust +trait Regex { + type Text: ?Sized; + + fn is_match(&self, text: &Self::Text) -> bool; + fn find(&self, text: &Self::Text) -> Option; + fn find_iter<'r, 't>(&'r self, text: &'t Self::Text) -> Matches<'r, 't, Self::Text>; + // and so on +} +``` + +However, there are a couple problems with this approach. First and foremost, +the use cases of such a trait aren't exactly clear. It does make writing +generic code that searches either a `&str` or a `&[u8]` possible, but the +semantics of searching `&str` (always valid UTF-8) or `&[u8]` are quite a bit +different with respect to the original `Regex`. Secondly, the trait isn't +obviously implementable by others. For example, some of the methods return +iterator types such as `Matches` that are typically implemented with a +lower level API that isn't exposed. This suggests that a straight-forward +traitification of the current API probably isn't appropriate, and perhaps, +a better trait needs to be more fundamental to regex searching. + +Perhaps the strongest reason to not adopt this design for regex `1.0` is that +we don't have any experience with it and there hasn't been any demand for it. +In particular, it could be prototyped in another crate. + +### Reuse some types +[reuse-some-types]: #reuse-some-types + +In the current proposal, the `bytes` submodule completely duplicates the +top-level API, including all iterator types, `Captures` and even the `Replacer` +trait. We could parameterize many of those types over the type of the text +searched. 
For example, the proposed `Replacer` trait looks like this:
+
+```rust
+trait Replacer {
+    fn replace_append(&mut self, caps: &Captures, dst: &mut String);
+
+    fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, str>> {
+        None
+    }
+}
+```
+
+We might add an associated type like so:
+
+```rust
+trait Replacer {
+    type Text: ToOwned + ?Sized;
+
+    fn replace_append(
+        &mut self,
+        caps: &Captures<Self::Text>,
+        dst: &mut <Self::Text as ToOwned>::Owned,
+    );
+
+    fn no_expansion<'r>(&'r mut self) -> Option<Cow<'r, Self::Text>> {
+        None
+    }
+}
+```
+
+But parameterizing the `Captures` type is a little bit tricky. Namely, methods
+like `get` want to slice the text at match offsets, but this can't be done
+safely in generic code without introducing another public trait.
+
+The final death knell in this idea is that these two implementations cannot
+co-exist:
+
+```rust
+impl<F> Replacer for F where F: FnMut(&Captures) -> String {
+    type Text = str;
+
+    fn replace_append(&mut self, caps: &Captures, dst: &mut String) {
+        dst.push_str(&(*self)(caps));
+    }
+}
+
+impl<F> Replacer for F where F: FnMut(&Captures) -> Vec<u8> {
+    type Text = [u8];
+
+    fn replace_append(&mut self, caps: &Captures, dst: &mut Vec<u8>) {
+        dst.extend(&(*self)(caps));
+    }
+}
+```
+
+Perhaps there is a path through this using yet more types or more traits, but
+without a really strong motivating reason to find it, I'm not convinced it's
+worth it. Duplicating all of the types is unfortunate, but it's *simple*.
+
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+The `regex` repository has more than just the `regex` crate.
+
+## `regex-syntax`
+[regex-syntax]: #regex-syntax
+
+This crate exposes a regular expression parser and abstract syntax that is
+completely divorced from compilation or searching. It is not part of `regex`
+proper since it may experience more frequent breaking changes and is far less
+frequently used. It is not clear whether this crate will ever see `1.0`, and if
+it does, what criteria would be used to judge it suitable for `1.0`.
+Nevertheless, it is a useful public API, but it is not part of this RFC.
+
+## `regex-capi`
+[regex-capi]: #regex-capi
+
+Recently, `regex-capi` was built to provide a C API to this regex library. It
+has been used to build [cgo bindings to this library for
+Go](https://github.com/BurntSushi/rure-go). Given its young age, it is not part
+of this proposal but will be maintained as a pre-1.0 crate in the same
+repository.
+
+## `regex_macros`
+[regex_macros]: #regex_macros
+
+The `regex!` compiler plugin is a macro that can compile regular expressions
+when your Rust program compiles. Stated differently, `regex!("...")` is
+transformed into Rust code that executes a search of the given pattern
+directly. It was written two years ago and largely hasn't changed since. When
+it was first written, it had two major benefits:
+
+1. If there was a syntax error in your regex, your Rust program would not
+   compile.
+2. It was faster.
+
+Today, (1) can be simulated in practice with the use of a Clippy lint and (2)
+is no longer true. In fact, `regex!` is at least one order of magnitude slower
+than the standard `Regex` implementation.
+
+The future of `regex_macros` is not clear. In one sense, since it is a
+compiler plugin, there hasn't been much interest in developing it further since
+its audience is necessarily limited. In another sense, it's not entirely clear
+what its implementation path is. It would take considerable work for it to beat
+the current `Regex` implementation (if it's even possible). More discussion on
+this is out of scope.
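+
+For illustration, the following is a minimal sketch of the pattern that has
+largely replaced `regex!` in practice: compile the pattern once at first use
+with the `lazy_static` crate and reuse the compiled `Regex` everywhere. (This
+example is illustrative only and is not part of the proposed API.)
+
+```rust
+#[macro_use]
+extern crate lazy_static;
+extern crate regex;
+
+use regex::Regex;
+
+lazy_static! {
+    // Compiled exactly once, on first use, and shareable across threads
+    // because `Regex` is `Send` and `Sync`.
+    static ref WORD: Regex = Regex::new(r"\w+").unwrap();
+}
+
+fn count_words(text: &str) -> usize {
+    WORD.find_iter(text).count()
+}
+
+fn main() {
+    assert_eq!(count_words("stability without stagnation"), 3);
+}
+```
+
+An invalid pattern is still caught early (at first use, or at compile time via
+a Clippy lint), which covers most of what `regex!` originally provided.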
+ +## Dependencies +[dependencies]: #dependencies + +As of now, `regex` has several dependencies: + +* `aho-corasick` +* `memchr` +* `thread_local` +* `regex-syntax` +* `utf8-ranges` + +All of them except for `thread_local` were written by this author, and were +primarily motivated for use in the `regex` crate. They were split out because +they seem generally useful. + +There may be other things in `regex` (today or in the future) that may also be +helpful to others outside the strict context of `regex`. Is it beneficial to +split such things out and create a longer list of dependencies? Or should we +keep `regex` as tight as possible? + +## Exposing more internals +[exposing-more-internals]: #exposing-more-internals + +It is conceivable that others might find interest in the regex compiler or more +lower level access to the matching engines. We could do something similar to +`regex-syntax` and expose some internals in a separate crate. However, there +isn't a pressing desire to do this at the moment, and would probably require a +good deal of work. + +# Breaking changes +[breaking-changes]: #breaking-changes + +This section of the RFC lists all breaking changes between `regex 0.1` and the +API proposed in this RFC. + +* `find` and `find_iter` now return values of type `Match` instead of + `(usize, usize)`. The `Match` type has `start` and `end` methods which can + be used to recover the original offsets, as well as an `as_str` method to + get the matched text. +* The `Captures` type no longer has any iterators defined. Instead, callers + should use the `Regex::capture_names` method. +* `bytes::Regex` enables the Unicode flag by default. Previously, it disabled + it by default. The flag can be disabled in the pattern with `(?-u)`. +* The definition of the `Replacer` trait was completely re-worked. Namely, its + API inverts control of allocation so that the caller must provide a `String` + to write to. Previous implementors will need to examine the new API. Moving + to the new API should be straight-forward. +* The `is_empty` method on `Captures` was removed since it always returns + `false` (because every `Captures` has at least one capture group + corresponding to the entire match). +* The `PartialEq` and `Eq` impls on `Regex` were removed. If you need this + functionality, add a newtype around `Regex` and write the corresponding + `PartialEq` and `Eq` impls. +* The lifetime parameters for the `iter` and `iter_named` methods on + `Captures` were fixed. The corresponding iterator types, `SubCaptures` and + `SubCapturesNamed`, grew an additional lifetime parameter. +* The constructor, `Regex::with_size_limit`, was removed. It can be replaced + with use of `RegexBuilder`. +* The `is_match` free function was removed. Instead, compile a `Regex` + explicitly and call the `is_match` method. +* Many iterator types were renamed. (e.g., `RegexSplits` to `SplitsIter`.) +* Replacements now return a `Cow` instead of a `String`. Namely, the + subject text doesn't need to be copied if there are no replacements. Callers + may need to add `into_owned()` calls to convert the `Cow` to a proper + `String`. +* The `Error` type no longer has the `InvalidSet` variant, since the error is + no longer possible. Its `Syntax` variant was also modified to wrap a `String` + instead of a `regex_syntax::Error`. If you need access to specific parse + error information, use the `regex-syntax` crate directly. 
+* To allow future growth, some character classes may no longer compile to make
+  room for possibly adding class set notation in the future.
+* Various iterator types have been renamed.
+* The `RegexBuilder` type now takes an `&mut self` on most methods instead of
+  `self`. Additionally, the final build step now uses `build()` instead of
+  `compile()`.
diff --git a/text/1623-static.md b/text/1623-static.md
new file mode 100644
index 00000000000..b07b8aed773
--- /dev/null
+++ b/text/1623-static.md
@@ -0,0 +1,118 @@
+- Feature Name: static_lifetime_in_statics
+- Start Date: 2016-05-20
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1623
+- Rust Issue: https://github.com/rust-lang/rust/issues/35897
+
+# Summary
+[summary]: #summary
+
+Let's default lifetimes in static and const declarations to `'static`.
+
+# Motivation
+[motivation]: #motivation
+
+Currently, having references in `static` and `const` declarations is cumbersome
+due to having to explicitly write `&'static ..`. Also the long lifetime name
+causes substantial rightwards drift, which makes it hard to format the code
+to be visually appealing.
+
+For example, having a `'static` default for lifetimes would turn this:
+```rust
+static my_awesome_tables: &'static [&'static HashMap<Cow<'static, str>, u32>] = ..
+```
+into this:
+```rust
+static my_awesome_table: &[&HashMap<Cow<str>, u32>] = ..
+```
+
+The type declaration still causes some rightwards drift, but at least all the
+contained information is useful. There is one exception to the rule: lifetime
+elision for function signatures will work as it does now (see example below).
+
+# Detailed design
+[design]: #detailed-design
+
+The same default that RFC #599 sets up for trait objects is to be used for
+statics and const declarations. In those declarations, the compiler will assume
+`'static` for any reference lifetime that is not explicitly given, including
+reference lifetimes obtained via generic substitution.
+
+Note that this RFC does not forbid writing the lifetimes; it only sets a
+default when none is given. Thus the change will not cause any breakage and is
+therefore backwards-compatible. It's also very unlikely that implementing this
+RFC will restrict our design space for `static` and `const` definitions down
+the road.
+
+The `'static` default does *not* override lifetime elision in function
+signatures, but works alongside it:
+
+```rust
+static foo: fn(&u32) -> &u32 = ...; // for<'a> fn(&'a u32) -> &'a u32
+static bar: &Fn(&u32) -> &u32 = ...; // &'static for<'a> Fn(&'a u32) -> &'a u32
+```
+
+With generics, it will work as anywhere else, also differentiating between
+function lifetimes and reference lifetimes. Notably, writing out the lifetime
+is still possible.
+
+```rust
+trait SomeObject<'a> { .. }
+static foo: &SomeObject = ...; // &'static SomeObject<'static>
+static bar: &for<'a> SomeObject<'a> = ...; // &'static for<'a> SomeObject<'a>
+static baz: &'static [u8] = ...;
+
+struct SomeStruct<'a, 'b> {
+    foo: &'a Foo,
+    bar: &'a Bar,
+    f: for<'b> Fn(&'b Foo) -> &'b Bar
+}
+
+static blub: &SomeStruct = ...; // &'static SomeStruct<'static, 'b> for any 'b
+```
+
+It will still be an error to omit lifetimes in function types *not* eligible
+for elision, e.g.
+
+```rust
+static blobb: FnMut(&Foo, &Bar) -> &Baz = ...; //~ ERROR: missing lifetimes for
+                                               //^ &Foo, &Bar, &Baz
+```
+
+This ensures that the really hairy cases that need the full type documented
+aren't unduly abbreviated.
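+
+To make that concrete, the following is a sketch of how a signature with
+multiple borrowed inputs can still be written by spelling the lifetimes out
+with an explicit `for<'a>` binder. (The `Foo` and `Baz` types, `pick_impl` and
+`PICK` are illustrative names, not part of this RFC.)
+
+```rust
+struct Foo(u32);
+struct Baz(u32);
+
+fn pick_impl<'a>(foo: &'a Foo, baz: &'a Baz) -> &'a u32 {
+    if foo.0 >= baz.0 { &foo.0 } else { &baz.0 }
+}
+
+// Elision cannot decide which input the output borrows from, so the
+// lifetime is written out explicitly.
+static PICK: for<'a> fn(&'a Foo, &'a Baz) -> &'a u32 = pick_impl;
+
+fn main() {
+    let (foo, baz) = (Foo(1), Baz(2));
+    assert_eq!(*PICK(&foo, &baz), 2);
+}
+```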
+ +It should also be noted that since statics and constants have no `self` type, +elision will only work with distinct input lifetimes or one input+output +lifetime. + +# Drawbacks +[drawbacks]: #drawbacks + +There are no known drawbacks to this change. + +# Alternatives +[alternatives]: #alternatives + +* Leave everything as it is. Everyone using static references is annoyed by +having to add `'static` without any value to readability. People will resort to +writing macros if they have many resources. +* Write the aforementioned macro. This is inferior in terms of UX. Depending on +the implementation it may or may not be possible to default lifetimes in +generics. +* Make all non-elided lifetimes `'static`. This has the drawback of creating +hard-to-spot errors (that would also probably occur in the wrong place) and +confusing users. +* Make all non-declared lifetimes `'static`. This would not be backwards +compatible due to interference with lifetime elision. +* Infer types for statics. The absence of types makes it harder to reason about +the code, so even if type inference for statics was to be implemented, +defaulting lifetimes would have the benefit of pulling the cost-benefit +relation in the direction of more explicit code. Thus it is advisable to +implement this change even with the possibility of implementing type inference +later. + +# Unresolved questions +[unresolved]: #unresolved-questions + +* Are there third party Rust-code handling programs that need to be updated to +deal with this change? diff --git a/text/1624-loop-break-value.md b/text/1624-loop-break-value.md new file mode 100644 index 00000000000..87d491cb8a7 --- /dev/null +++ b/text/1624-loop-break-value.md @@ -0,0 +1,311 @@ +- Feature Name: loop_break_value +- Start Date: 2016-05-20 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1624 +- Rust Issue: https://github.com/rust-lang/rust/issues/37339 + +# Summary +[summary]: #summary + +(This is a result of discussion of +[issue #961](https://github.com/rust-lang/rfcs/issues/961) and related to RFCs +[352](https://github.com/rust-lang/rfcs/pull/352) and +[955](https://github.com/rust-lang/rfcs/pull/955).) + +Let a `loop { ... }` expression return a value via `break my_value;`. + +# Motivation +[motivation]: #motivation + +> Rust is an expression-oriented language. Currently loop constructs don't +> provide any useful value as expressions, they are run only for their +> side-effects. But there clearly is a "natural-looking", practical case, +> described in [this thread](https://github.com/rust-lang/rfcs/issues/961) +> and [this] RFC, where the loop expressions could have +> meaningful values. I feel that not allowing that case runs against the +> expression-oriented conciseness of Rust. +> [comment by golddranks](https://github.com/rust-lang/rfcs/issues/961#issuecomment-220820787) + +Some examples which can be much more concisely written with this RFC: + +```rust +// without loop-break-value: +let x = { + let temp_bar; + loop { + ... + if ... { + temp_bar = bar; + break; + } + } + foo(temp_bar) +}; + +// with loop-break-value: +let x = foo(loop { + ... + if ... 
{ break bar; } + }); + +// without loop-break-value: +let computation = { + let result; + loop { + if let Some(r) = self.do_something() { + result = r; + break; + } + } + result.do_computation() +}; +self.use(computation); + +// with loop-break-value: +let computation = loop { + if let Some(r) = self.do_something() { + break r; + } + }.do_computation(); +self.use(computation); +``` + +# Detailed design +[design]: #detailed-design + +This proposal does two things: let `break` take a value, and let `loop` have a +result type other than `()`. + +### Break Syntax + +Four forms of `break` will be supported: + +1. `break;` +2. `break 'label;` +3. `break EXPR;` +4. `break 'label EXPR;` + +where `'label` is the name of a loop and `EXPR` is an expression. `break` and `break 'label` become +equivalent to `break ()` and `break 'label ()` respectively. + +### Result type of loop + +Currently the result type of a 'loop' without 'break' is `!` (never returns), +which may be coerced to any type. The result type of a 'loop' with a 'break' +is `()`. This is important since a loop may appear as the last expression of +a function: + +```rust +fn f() { + loop { + do_something(); + // never breaks + } +} +fn g() -> () { + loop { + do_something(); + if Q() { break; } + } +} +fn h() -> ! { + loop { + do_something(); + // this loop must diverge for the function to typecheck + } +} +``` + +This proposal allows 'loop' expression to be of any type `T`, following the same typing and +inference rules that are applicable to other expressions in the language. Type of `EXPR` in every +`break EXPR` and `break 'label EXPR` must be coercible to the type of the loop the `EXPR` appears +in. + + + +It is an error if these types do not agree or if the compiler's type deduction rules do not yield a +concrete type. + +Examples of errors: + +```rust +// error: loop type must be () and must be i32 +let a: i32 = loop { break; }; +// error: loop type must be i32 and must be &str +let b: i32 = loop { break "I am not an integer."; }; +// error: loop type must be Option<_> and must be &str +let c = loop { + if Q() { + break "answer"; + } else { + break None; + } +}; +fn z() -> ! { + // function does not return + // error: loop may break (same behaviour as before) + loop { + if Q() { break; } + } +} +``` + +Example showing the equivalence of `break;` and `break ();`: + +```rust +fn y() -> () { + loop { + if coin_flip() { + break; + } else { + break (); + } + } +} +``` + +Coercion examples: + +```rust +// ! coerces to any type +loop {}: (); +loop {}: u32; +loop { + break (loop {}: !); +}: u32; +loop { + // ... + break 42; + // ... + break panic!(); +}: u32; + +// break EXPRs are not of the same type, but both coerce to `&[u8]`. +let x = [0; 32]; +let y = [0; 48]; +loop { + // ... + break &x; + // ... + break &y; +}: &[u8]; +``` + + +### Result value + +A loop only yields a value if broken via some form of `break ...;` statement, +in which case it yields the value resulting from the evaulation of the +statement's expression (`EXPR` above), or `()` if there is no `EXPR` +expression. + +Examples: + +```rust +assert_eq!(loop { break; }, ()); +assert_eq!(loop { break 5; }, 5); +let x = 'a loop { + 'b loop { + break 'a 1; + } + break 'a 2; +}; +assert_eq!(x, 1); +``` + +# Drawbacks +[drawbacks]: #drawbacks + +The proposal changes the syntax of `break` statements, requiring updates to +parsers and possibly syntax highlighters. + +# Alternatives +[alternatives]: #alternatives + +No alternatives to the design have been suggested. 
It has been suggested that
+the feature itself is unnecessary, and indeed much Rust code already exists
+without it; however, the pattern solves some cases which are difficult to
+handle otherwise and allows more flexibility in code layout.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+### Extension to for, while, while let
+
+A frequently discussed issue is extension of this concept to allow `for`,
+`while` and `while let` expressions to return values in a similar way. There is
+however a complication: these expressions may also terminate "naturally" (not
+via break), and no consensus has been reached on how the result value should
+be determined in this case, or even the result type.
+
+There are three options:
+
+1. Do not adjust `for`, `while` or `while let` at this time
+2. Adjust these control structures to return an `Option`, returning `None`
+   in the default case
+3. Specify the default return value via some extra syntax
+
+#### Via `Option`
+
+Unfortunately, option (2) is not possible to implement cleanly without breaking
+a lot of existing code: many functions use one of these control structures in
+tail position, where the current "value" of the expression, `()`, is implicitly
+used:
+
+```rust
+// function returns `()`
+fn print_my_values(v: &Vec<i32>) {
+    for x in v {
+        println!("Value: {}", x);
+    }
+    // loop exits with `()` which is implicitly "returned" from the function
+}
+```
+
+Two variations of option (2) are possible:
+
+* Only adjust the control structures where they contain a `break EXPR;` or
+  `break 'label EXPR;` statement. This may work but would necessitate that
+  `break;` and `break ();` mean different things.
+* As a special case, make `break ();` return `()` instead of `Some(())`,
+  while for other values `break x;` returns `Some(x)`.
+
+#### Via extra syntax for the default value
+
+Several syntaxes have been proposed for how a control structure's default value
+is set. For example:
+
+```rust
+fn first<T>(list: Iterator<T>) -> Option<T> {
+    for x in list {
+        break Some(x);
+    } else default {
+        None
+    }
+}
+```
+
+or:
+
+```rust
+let x = for thing in things default "nope" {
+    if thing.valid() { break "found it!"; }
+}
+```
+
+There are two things to bear in mind when considering new syntax:
+
+* It is undesirable to add a new keyword to the list of Rust's keywords
+* It is strongly desirable that unbounded lookahead is not required while
+  parsing Rust code
+
+For more discussion on this topic, see [issue #961](https://github.com/rust-lang/rfcs/issues/961).
diff --git a/text/1636-document_all_features.md b/text/1636-document_all_features.md
new file mode 100644
index 00000000000..75336200056
--- /dev/null
+++ b/text/1636-document_all_features.md
@@ -0,0 +1,275 @@
+- Feature Name: document_all_features
+- Start Date: 2016-06-03
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1636
+- Rust Issue: N/A
+
+
+# Summary
+
+One of the major goals of Rust's development process is *stability without stagnation*. That means we add features regularly. However, it can be difficult to *use* those features if they are not publicly documented anywhere. Therefore, this RFC proposes requiring that all new language features and public standard library items must be documented before landing on the stable release branch (item documentation for the standard library; in the language reference for language features).
+ + +## Outline + +- Summary + - Outline +- Motivation + - The Current Situation + - Precedent +- Detailed design + - New RFC section: “How do we teach this?” + - New requirement to document changes before stabilizing + - Language features + - Reference + - The state of the reference + - _The Rust Programming Language_ + - Standard library +- How do we teach this? +- Drawbacks +- Alternatives +- Unresolved questions + + +# Motivation + +At present, new language features are often documented *only* in the RFCs which propose them and the associated announcement blog posts. Moreover, as features change, the existing official language documentation (the Rust Book, Rust by Example, and the language reference) can increasingly grow outdated. + +Although the Rust Book and Rust by Example are kept relatively up to date, [the reference is not][home-to-reference]: + +> While Rust does not have a specification, the reference tries to describe its working in detail. *It tends to be out of date.* (emphasis mine) + +Importantly, though, this warning only appears on the [main site][home-to-reference], not in the reference itself. If someone searches for e.g. that `deprecated` attribute and *does* find the discussion of the deprecated attribute, they will have no reason to believe that the reference is wrong. + +[home-to-reference]: https://www.rust-lang.org/documentation.html + +For example, the change in Rust 1.9 to allow users to use the `#[deprecated]` attribute for their own libraries was, at the time of writing this RFC, *nowhere* reflected in official documentation. (Many other examples could be supplied; this one was chosen for its relative simplicity and recency.) The Book's [discussion of attributes][book-attributes] linked to the [reference list of attributes][ref-attributes], but as of the time of writing the reference [still specifies][ref-compiler-attributes] that `deprecated` was a compiler-only feature. The two places where users might have become aware of the change are [the Rust 1.9 release blog post][1.9-blog] and the [RFC itself][RFC-1270]. Neither (yet) ranked highly in search; users were likely to be misled. + +[book-attributes]: https://doc.rust-lang.org/book/attributes.html +[ref-attributes]: https://doc.rust-lang.org/reference.html#attributes +[ref-compiler-attributes]: https://doc.rust-lang.org/reference.html#compiler-features +[1.9-blog]: http://blog.rust-lang.org/2016/05/26/Rust-1.9.html#deprecation-warnings +[RFC-1270]: https://github.com/rust-lang/rfcs/blob/master/text/1270-deprecation.md + +Changing this to require all language features to be documented before stabilization would mean Rust users can use the language documentation with high confidence that it will provide exhaustive coverage of all stable Rust features. + +Although the standard library is in excellent shape regarding documentation, including it in this policy will help guarantee that it remains so going forward. + +## The Current Situation + +Today, the canonical source of information about new language features is the RFCs which define them. The Rust Reference is substantially out of date, and not all new features have made their way into _The Rust Programming Language_. + +There are several serious problems with the _status quo_ of using RFCs as ad hoc documentation: + +1. Many users of Rust may simply not know that these RFCs exist. The number of users who do not know (or especially care) about the RFC process or its history will only increase as Rust becomes more popular. + +2. 
In many cases, especially in more complicated language features, some important elements of the decision, details of implementation, and expected behavior are fleshed out either in the pull-request discussion for the RFC, or in the implementation issues which follow them. + +3. The RFCs themselves, and even more so the associated pull request discussions, are often dense with programming language theory. This is as it should be in context, but it means that the relevant information may be inaccessible to Rust users without prior PLT background, or without the patience to wade through it. + +4. Similarly, information about the final decisions on language features is often buried deep at the end of long and winding threads (especially for a complicated feature like `impl` specialization). + +5. Information on how the features will be used is often closely coupled to information on how the features will be implemented, both in the RFCs and in the discussion threads. Again, this is as it should be, but it makes it difficult (at best!) for ordinary Rust users to read. + +In short, RFCs are a poor source of information about language features for the ordinary Rust user. Rust users should not need to be troubled with details of how the language is implemented works simply to learn how pieces of it work. Nor should they need to dig through tens (much less hundreds) of comments to determine what the final form of the feature is. + +However, there is currently no other documentation at all for many newer features. This is a significant barrier to adoption of the language, and equally of adoption of new features which will improve the ergonomics of the language. + +## Precedent + +This exact idea has been adopted by the Ember community after their somewhat bumpy transitions at the end of their 1.x cycle and leading into their 2.x transition. As one commenter there [put it][@davidgoli]: + +> The fact that 1.13 was released without updated guides is really discouraging to me as an Ember adopter. It may be much faster, the features may be much cooler, but to me, they don't exist unless I can learn how to use them from documentation. Documentation IS feature work. ([@davidgoli]) + +[@davidgoli]: https://github.com/emberjs/rfcs/pull/56#issuecomment-114635962 + +The Ember core team agreed, and embraced the principle outlined in [this comment][@guarav0]: + +> No version shall be released until guides and versioned API documentation is ready. This will allow newcomers the ability to understand the latest release. ([@guarav0]) + +[@guarav0]: https://github.com/emberjs/rfcs/pull/56#issuecomment-114339423 + +One of the main reasons not to adopt this approach, that it might block features from landing as soon as they otherwise might, was [addressed][@eccegordo] in that discussion as well: + +> Now if this documentation effort holds up the releases people are going to grumble. But so be it. The challenge will be to effectively parcel out the effort and relieve the core team to do what they do best. No single person should be a gate. But lack of good documentation should gate releases. That way a lot of eyes are forced to focus on the problem. We can't get the great new toys unless everybody can enjoy the toys. ([@eccegordo]) + +[@eccegordo]: https://github.com/emberjs/rfcs/pull/56#issuecomment-114389963 + +The basic decision has led to a substantial improvement in the currency of the documentation (which is now updated the same day as a new version is released). 
Moreover, it has spurred ongoing development of better tooling around documentation to manage these releases. Finally, at least in the RFC author's estimation, it has also led to a substantial increase in the overall quality of that documentation, possibly as a consequence of increasing the community involvement in the documentation process (including the formation of a documentation subteam). + + +# Detailed design + +The basic process of developing new language features will remain largely the same as today. The required changes are two additions: + +- a new section in the RFC, "How do we teach this?" modeled on Ember's updated RFC process + +- a new requirement that the changes themselves be properly documented before being merged to stable + + +## New RFC section: "How do we teach this?" + +Following the example of Ember.js, we must add a new section to the RFC, just after **Detailed design**, titled **How do we teach this?** The section should explain what changes need to be made to documentation, and if the feature substantially changes what would be considered the "best" way to solve a problem or is a fairly mainstream issue, discuss how it might be incorporated into _The Rust Programming Language_ and/or _Rust by Example_. + +Here is the Ember RFC section, with appropriate substitutions and modifications: + +> # How We Teach This +> What names and terminology work best for these concepts and why? How is this idea best presented? As a continuation of existing Rust patterns, or as a wholly new one? +> +> Would the acceptance of this proposal change how Rust is taught to new users at any level? What additions or changes to the Rust Reference, _The Rust Programing Language_, and/or _Rust by Example_ does it entail? +> +> How should this feature be introduced and taught to existing Rust users? + +For a great example of this in practice, see the (currently open) [Ember RFC: Module Unification], which includes several sections discussing conventions, tooling, concepts, and impacts on testing. + +[Ember RFC: Module Unification]: https://github.com/dgeb/rfcs/blob/module-unification/text/0000-module-unification.md#how-we-teach-this + +## New requirement to document changes before stabilizing + +[require-documentation-before-stabilization]: #new-requirement-to-document-changes-before-stabilizing + +Prior to stabilizing a feature, the features will now be documented as follows: + +- Language features: + - must be documented in the Rust Reference. + - should be documented in _The Rust Programming Language_. + - may be documented in _Rust by Example_. +- Standard library additions must include documentation in `std` API docs. +- Both language features and standard library changes must include: + - a single line for the changelog + - a longer summary for the long-form release announcement. + +Stabilization of a feature must not proceed until the requirements outlined in the **How We Teach This** section of the originating RFC have been fulfilled. + +### Language features + +We will document *all* language features in the Rust Reference, as well as updating _The Rust Programming Language_ and _Rust by Example_ as appropriate. (Not all features or changes will require updates to the books.) + +#### Reference + +[reference]: #reference + +This will necessarily be a manual process, involving updates to the `reference.md` file. (It may at some point be sensible to break up the Reference file for easier maintenance; that is left aside as orthogonal to this discussion.) 
+ +Feature documentation does not need to be written by the feature author. In fact, this is one of the areas where the community may be most able to support the language/compiler developers even if not themselves programming language theorists or compiler hackers. This may free up the compiler developers' time. It will also help communicate the features in a way that is accessible to ordinary Rust users. + +New features do not need to be documented to be merged into `master`/nightly + +Instead, the documentation process should immediately precede the move to stabilize. Once the *feature* has been deemed ready for stabilization, either the author or a community volunteer should write the *reference material* for the feature, to be incorporated into the Rust Reference. + +The reference material need not be especially long, but it should be long enough for ordinary users to learn how to use the language feature *without reading the RFCs*. + +Discussion of stabilizing a feature in a given release will now include the status of the reference material. + +##### The current state of the reference + +[refstate]: #the-current-state-of-the-reference + +Since the reference is fairly out of date, we should create a "strike team" to update it. This can proceed in parallel with the documentation of new features. + +Updating the reference should proceed stepwise: + +1. Begin by adding an appendix in the reference with links to all accepted RFCs which have been implemented but are not yet referenced in the documentation. +2. As the reference material is written for each of those RFC features, remove it from that appendix. + +The current presentation of the reference is also in need of improvement: a single web page with *all* of this content is difficult to navigate, or to update. Therefore, the strike team may also take this opportunity to reorganize the reference and update its presentation. + +#### _The Rust Programming Language_ + +[trpl]: #the-rust-programming-language + +Most new language features should be added to _The Rust Programming Language_. However, since the book is planned to go to print, the main text of the book is expected to be fixed between major revisions. As such, new features should be documented in an online appendix to the book, which may be titled e.g. "Newest Features." + +The published version of the book should note that changes and languages features made available after the book went to print will be documented in that online appendix. + +### Standard library + +In the case of the standard library, this could conceivably be managed by setting the `#[forbid(missing_docs)]` attribute on the library roots. In lieu of that, manual code review and general discipline should continue to serve. However, if automated tools *can* be employed here, they should. + +# How do we teach this? + +Since this RFC promotes including this section, it includes it itself. (RFCs, unlike Rust `struct` or `enum` types, may be freely self-referential. No boxing required.) + +To be most effective, this will involve some changes both at a process and core-team level, and at a community level. + +1. The RFC template must be updated to include the new section for teaching. +2. The RFC process in the [RFCs README] must be updated, specifically by including "fail to include a plan for documenting the feature" in the list of possible problems in "Submit a pull request step" in [What the process is]. +3. 
Make documentation and teachability of new features *equally* high priority with the features themselves, and communicate this clearly in discussion of the features. (Much of the community is already very good about including this in considerations of language design; this simply makes this an explicit goal of discussions around RFCs.) + +[RFCs README]: https://github.com/rust-lang/rfcs/blob/master/README.md +[What the process is]: https://github.com/rust-lang/rfcs/blob/master/README.md#what-the-process-is + +This is also an opportunity to allow/enable community members with less experience to contribute more actively to _The Rust Programming Language_, _Rust by Example_, and the Rust Reference. + +1. We should write issues for feature documentation, and may flag them as approachable entry points for new users. + +2. We may use the more complicated language reference issues as points for mentoring developers interested in contributing to the compiler. Helping document a complex language feature may be a useful on-ramp for working on the compiler itself. + +At a "messaging" level, we should continue to emphasize that *documentation is just as valuable as code*. For example (and there are many other similar opportunities): in addition to highlighting new language features in the release notes for each version, we might highlight any part of the documentation which saw substantial improvement in the release. + + +# Drawbacks + +1. The largest drawback at present is that the language reference is *already* quite out of date. It may take substantial work to get it up to date so that new changes can be landed appropriately. (Arguably, however, this should be done regardless, since the language reference is an important part of the language ecosystem.) + +2. Another potential issue is that some sections of the reference are particularly thorny and must be handled with considerable care (e.g. lifetimes). Although in general it would not be necessary for the author of the new language feature to write all the documentation, considerable extra care and oversight would need to be in place for these sections. + +3. This may delay landing features on stable. However, all the points raised in **Precedent** on this apply, especially: + + > We can't get the great new toys unless everybody can enjoy the toys. ([@eccegordo]) + + For Rust to attain its goal of *stability without stagnation*, its documentation must also be stable and not stagnant. + +4. If the forthcoming docs team is unable to provide significant support, and perhaps equally if the rest of the community does not also increase involvement, this will simply not work. No individual can manage all of these docs alone. + + +# Alternatives + +- **Just add the "How do we teach this?" section.** + + Of all the alternatives, this is the easiest (and probably the best). It does not substantially change the state with regard to the documentation, and even having the section in the RFC does not mean that it will end up added to the docs, as evidence by the [`#[deprecated]` RFC][RFC 1270], which included as part of its text: + + > The language reference will be extended to describe this feature as outlined in this RFC. Authors shall be advised to leave their users enough time to react before removing a deprecated item. + + This is not a small downside by any stretch—but adding the section to the RFC will still have all the secondary benefits noted above, and it probably at least somewhat increases the likelihood that new features do get documented. 
+ +- **Embrace the documentation, but do not include "How do we teach this?" section in new RFCs.** + + This still gives us most of the benefits (and was in fact the original form of the proposal), and does not place a new burden on RFC authors to make sure that knowing how to *teach* something is part of any new language or standard library feature. + + On the other hand, thinking about the impact on teaching should further improve consideration of the general ergonomics of a proposed feature. If something cannot be *taught* well, it's likely the design needs further refinement. + +- **No change; leave RFCs as canonical documentation.** + + This approach can take (at least) two forms: + + + 1. We can leave things as they are, where the RFC and surrounding discussion form the primary point of documentation for newer-than-1.0 language features. As part of that, we could just link more prominently to the RFC repository and describe the process from the documentation pages. + 2. We could automatically render the text of the RFCs into part of the documentation used on the site (via submodules and the existing tooling around Markdown documents used for Rust documentation). + + However, for all the reasons highlighted above in **Motivation: The Current Situation**, RFCs and their associated threads are *not* a good canonical source of information on language features. + +- **Add a rule for the standard library but not for language features.** + + This would basically just turn the _status quo_ into an official policy. It has all the same drawbacks as no change at all, but with the possible benefit of enabling automated checks on standard library documentation. + +- **Add a rule for language features but not for the standard library.** + + The standard library is in much better shape, in no small part because of the ease of writing inline documentation for new modules. Adding a formal rule may not be necessary if good habits are already in place. + + On the other hand, having a formal policy would not seem to *hurt* anything here; it would simply formalize what is already happening (and perhaps, via linting attributes, make it easy to spot when it has failed). + +- **Eliminate the reference entirely.** + + Since the reference is already substantially out of date, it might make sense to stop presenting it publicly at all, at least until such a time as it has been completely reworked and updated. + + The main upside to this is the reality that an outdated and inaccurate reference may be worse than no reference at all, as it may mislead espiecally new Rust users. + + The main downside, of course, is that this would leave very large swaths of the language basically without *any* documentation, and even more of it only documented in RFCs than is the case today. + + +[RFC 1270]: https://github.com/rust-lang/rfcs/pull/1270 + +# Unresolved questions + +- How do we clearly distinguish between features on nightly, beta, and stable Rust—in the reference especially, but also in the book? +- For the standard library, once it migrates to a crates structure, should it simply include the `#[forbid(missing_docs)]` attribute on all crates to set this as a build error? 
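+
+As a concrete illustration of the lint referenced in that last question, a
+crate root that opts in would look like the following sketch (illustrative
+only, not a change proposed for any particular crate):
+
+```rust
+// lib.rs: turn the allow-by-default `missing_docs` lint into a hard error.
+#![forbid(missing_docs)]
+
+//! Crate-level documentation is also required by the lint.
+
+/// Every public item now needs a doc comment like this one, or the build
+/// fails with a "missing documentation" error.
+pub fn documented() {}
+```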
diff --git a/text/1640-duration-checked-sub.md b/text/1640-duration-checked-sub.md
new file mode 100644
index 00000000000..362ff2b1382
--- /dev/null
+++ b/text/1640-duration-checked-sub.md
@@ -0,0 +1,102 @@
+- Feature Name: `duration_checked`
+- Start Date: 2016-06-04
+- RFC PR: [rust-lang/rfcs#1640](https://github.com/rust-lang/rfcs/pull/1640)
+- Rust Issue: [rust-lang/rust#35774](https://github.com/rust-lang/rust/issues/35774)
+
+# Summary
+[summary]: #summary
+
+This RFC adds the `checked_*` methods already known from primitives like
+`usize` to `Duration`.
+
+# Motivation
+[motivation]: #motivation
+
+Generally this helps when subtracting `Duration`s, which happens quite often.
+
+One abstract example would be executing a specific piece of code repeatedly
+after a constant amount of time.
+
+Specific examples would be a network service or a rendering process emitting a
+constant number of frames per second.
+
+Example code would be as follows:
+
+```rust
+use std::time::{Duration, Instant};
+
+// This function is called repeatedly
+fn render() {
+    // 10ms delay results in 100 frames per second
+    let wait_time = Duration::from_millis(10);
+
+    // `Instant` for elapsed time
+    let start = Instant::now();
+
+    // execute code here
+    render_and_output_frame();
+
+    // there are no negative `Duration`s, so the subtraction returns `None` and
+    // we skip sleeping if the elapsed time already exceeds `wait_time`
+    if let Some(remaining) = wait_time.checked_sub(start.elapsed()) {
+        std::thread::sleep(remaining);
+    }
+}
+```
+
+Of course it is also useful to avoid `panic!()`s when adding `Duration`s.
+
+# Detailed design
+[design]: #detailed-design
+
+The detailed design would be exactly as the current `sub()` method, just
+returning an `Option<Duration>` and passing possible `None` values from the
+underlying primitive types:
+
+```rust
+impl Duration {
+    fn checked_sub(self, rhs: Duration) -> Option<Duration> {
+        if let Some(mut secs) = self.secs.checked_sub(rhs.secs) {
+            let nanos = if self.nanos >= rhs.nanos {
+                self.nanos - rhs.nanos
+            } else if let Some(sub_secs) = secs.checked_sub(1) {
+                secs = sub_secs;
+                self.nanos + NANOS_PER_SEC - rhs.nanos
+            } else {
+                return None;
+            };
+            debug_assert!(nanos < NANOS_PER_SEC);
+            Some(Duration { secs: secs, nanos: nanos })
+        } else {
+            None
+        }
+    }
+}
+```
+
+The same applies to all other added methods, namely:
+
+- `checked_add()`
+- `checked_sub()`
+- `checked_mul()`
+- `checked_div()`
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+`None`.
+
+# Alternatives
+[alternatives]: #alternatives
+
+The alternatives are simply not doing this and forcing the programmer to code
+the check on their behalf.
+This is not what you want.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+`None`.
+
diff --git a/text/1643-memory-model-strike-team.md b/text/1643-memory-model-strike-team.md
new file mode 100644
index 00000000000..a8c5b1e9103
--- /dev/null
+++ b/text/1643-memory-model-strike-team.md
@@ -0,0 +1,313 @@
+- Feature Name: N/A
+- Start Date: 2016-06-07
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1643
+- Rust Issue: N/A
+
+# Summary
+[summary]: #summary
+
+Incorporate a strike team dedicated to preparing rules and guidelines
+for writing unsafe code in Rust (commonly referred to as Rust's
+"memory model"), in cooperation with the lang team. The discussion
+will generally proceed in phases, starting with establishing
+high-level principles and gradually getting down to the nitty-gritty
+details (though some back and forth is expected). The strike team will
+produce various intermediate documents that will be submitted as
+normal RFCs.
+ +# Motivation +[motivation]: #motivation + +Rust's safe type system offers very strong aliasing information that +promises to be a rich source of compiler optimization. For example, +in safe code, the compiler can infer that if a function takes two +`&mut T` parameters, those two parameters must reference disjoint +areas of memory (this allows optimizations similar to C99's `restrict` +keyword, except that it is both automatic and fully enforced). The +compiler also knows that given a shared reference type `&T`, the +referent is immutable, except for data contained in an `UnsafeCell`. + +Unfortunately, there is a fly in the ointment. Unsafe code can easily +be made to violate these sorts of rules. For example, using unsafe +code, it is trivial to create two `&mut` references that both refer to +the same memory (and which are simultaneously usable). In that case, +if the unsafe code were to (say) return those two points to safe code, +that would undermine Rust's safety guarantees -- hence it's clear that +this code would be "incorrect". + +But things become more subtle when we just consider what happens +*within* the abstraction. For example, is unsafe code allowed to use +two overlapping `&mut` references internally, without returning it to +the wild? Is it all right to overlap with `*mut`? And so forth. + +It is the contention of this RFC that a complete guidelines for unsafe +code are far too big a topic to be fruitfully addressed in a single +RFC. Therefore, this RFC proposes the formation of a dedicated +**strike team** (that is, a temporary, single-purpose team) that will +work on hammering out the details over time. Precise membership of +this team is not part of this RFC, but will be determined by the lang +team as well as the strike team itself. + +The unsafe guidelines work will proceed in rough stages, described +below. An initial goal is to produce a **high-level summary detailing +the general approach of the guidelines.** Ideally, this summary should +be sufficient to help guide unsafe authors in best practices that are +most likely to be forwards compatible. Further work will then expand +on the model to produce a more **detailed set of rules**, which may in +turn require revisiting the high-level summary if contradictions are +uncovered. + +This new "unsafe code" strike team is intended to work in +collaboration with the existing lang team. Ultimately, whatever rules +are crafted must be adopted with the **general consensus of both the +strike team and the lang team**. It is expected that lang team members +will be more involved in the early discussions that govern the overall +direction and less involved in the fine details. + +#### History and recent discussions + +The history of optimizing C can be instructive. All code in C is +effectively unsafe, and so in order to perform optimizations, +compilers have come to lean heavily on the notion of "undefined +behavior" as well as various ad-hoc rules about what programs ought +not to do (see e.g. [these][cl1] [three][cl2] [posts][cl3] entitled +"What Every C Programmer Should Know About Undefined Behavior", by +Chris Lattner). This can cause some very surprising behavior (see e.g. +["What Every Compiler Author Should Know About Programmers"][cap] or +[this blog post by John Regehr][jr], which is quite humorous). Note that +Rust has a big advantage over C here, in that only the authors of +unsafe code should need to worry about these rules. 
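+
+As a concrete illustration of the overlapping-`&mut` question raised above,
+consider the following sketch (for illustration only; whether it is legal is
+precisely the kind of question the guidelines must answer):
+
+```rust
+// Two simultaneously live `&mut i32` pointing at the same local variable.
+// Nothing escapes to safe callers, but is the function itself well-defined?
+fn overlapping_mut_borrows() {
+    let mut x = 0i32;
+    let raw = &mut x as *mut i32;
+    unsafe {
+        let a = &mut *raw;
+        let b = &mut *raw; // `a` and `b` now overlap
+        *b += 1;
+        *a += 1; // whether this use is allowed is what a "memory model" decides
+    }
+}
+```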
+ +[cl1]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html +[cl2]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html +[cl3]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html +[cap]: http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf +[jr]: http://blog.regehr.org/archives/761 + +In terms of Rust itself, there has been a large amount of discussion +over the years. Here is a (non-comprehensive) set of relevant links, +with a strong bias towards recent discussion: + +- [RFC Issue #1447](https://github.com/rust-lang/rfcs/issues/1447) provides + a general set of links as well as some discussion. +- [RFC #1578](https://github.com/rust-lang/rfcs/pull/1578) is an initial + proposal for a Rust memory model by ubsan. +- The + [Tootsie Pop](http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-tootsie-pop-model-for-unsafe-code/) + blog post by nmatsakis proposed an alternative approach, building on + [background about unsafe abstractions](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/) + described in an earlir post. There is also a lot of valuable + discussion in + [the corresponding internals thread](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/). + +#### Other factors + +Another factor that must be considered is the interaction with weak +memory models. Most of the links above focus purely on sequential +code: Rust has more-or-less adopted the C++ memory model for governing +interactions across threads. But there may well be subtle cases that +arise we delve deeper. For more on the C++ memory model, see +[Hans Boehm's excellent webpage](http://www.hboehm.info/c++mm/). + +# Detailed design +[design]: #detailed-design + +## Scope + +Here are some of the issues that should be resolved as part of these +unsafe code guidelines. The following list is not intended as +comprehensive (suggestions for additions welcome): + +- Legal aliasing rules and patterns of memory accesses + - e.g., which of the patterns listed in [rust-lang/rust#19733](https://github.com/rust-lang/rust/issues/19733) + are legal? + - can unsafe code create (but not use) overlapping `&mut`? + - under what conditions is it legal to dereference a `*mut T`? + - when can an `&mut T` legally alias an `*mut T`? +- Struct layout guarantees +- Interactions around zero-sized types + - e.g., what pointer values can legally be considered a `Box`? +- Allocator dependencies + +One specific area that we can hopefully "outsource" is detailed rules +regarding the interaction of different threads. Rust exposes atomics +that roughly correspond to C++11 atomics, and the intention is that we +can layer our rules for sequential execution atop those rules for +parallel execution. + +## Termination conditions + +The unsafe code guidelines team is intended as a temporary strike team +with the goal of producing the documents described below. Once the RFC +for those documents have been approved, responsibility for maintaining +the documents falls to the lang team. + +## Time frame + +Working out a a set of rules for unsafe code is a detailed process and +is expected to take months (or longer, depending on the level of +detail we ultimately aim for). However, the intention is to publish +preliminary documents as RFCs as we go, so hopefully we can be +providing ever more specific guidance for unsafe code authors. 
+ +Note that even once an initial set of guidelines is adopted, problems +or inconsistencies may be found. If that happens, the guidelines will +be adjusted as needed to correct the problem, naturally with an eye +towards backwards compatibility. In other words, the unsafe +guidelines, like the rules for Rust language itself, should be +considered a "living document". + +As a note of caution, experience from other languages such as Java or +C++ suggests that the work on memory models can take years. Moreover, +even once a memory model is adopted, it can be unclear whether +[common compiler optimizations are actually permitted](http://www.di.ens.fr/~zappa/readings/c11comp.pdf) +under the model. The hope is that by focusing on sequential and +Rust-specific issues we can sidestep some of these quandries. + +## Intermediate documents + +Because hammering out the finer points of the memory model is expected +to possibly take some time, it is important to produce intermediate +agreements. This section describes some of the documents that may be +useful. These also serve as a rough guideline to the overall "phases" +of discussion that are expected, though in practice discussion will +likely go back and forth: + +- **Key examples and optimizations**: highlighting code examples that + ought to work, or optimizations we should be able to do, as well as + some that will not work, or those whose outcome is in doubt. +- **High-level design**: describe the rules at a high-level. This + would likely be the document that unsafe code authors would read to + know if their code is correct in the majority of scenarios. Think of + this as the "user's guide". +- **Detailed rules**: More comprehensive rules. Think of this as the + "reference manual". + +Note that both the "high-level design" and "detailed rules", once +considered complete, will be submitted as RFCs and undergo the usual +final comment period. + +### Key examples and optimizations + +Probably a good first step is to agree on some key examples and +overall principles. Examples would fall into several categories: + +- Unsafe code that we feel **must** be considered **legal** by any model +- Unsafe code that we feel **must** be considered **illegal** by any model +- Unsafe code that we feel **may or may not** be considered legal +- Optimizations that we **must** be able to perform +- Optimizations that we **should not** expect to be able to perform +- Optimizations that it would be nice to have, but which may be sacrificed + if needed + +Having such guiding examples naturally helps to steer the effort, but +it also helps to provide guidance for unsafe code authors in the +meantime. These examples illustrate patterns that one can adopt with +reasonable confidence. + +Deciding about these examples should also help in enumerating the +guiding principles we would like to adhere to. The design of a memory +model ultimately requires balancing several competing factors and it +may be useful to state our expectations up front on how these will be +weighed: + +- **Optimization.** The stricter the rules, the more we can optimize. + - on the other hand, rules that are overly strict may prevent people + from writing unsafe code that they would like to write, ultimately + leading to slower exeution. +- **Comprehensibility.** It is important to strive for rules that end + users can readily understand. If learning the rules requires diving + into academic papers or using Coq, it's a non-starter. 
+- **Effect on existing code.** No matter what model we adopt, existing + unsafe code may or may not comply. If we then proceed to optimize, + this could cause running code to stop working. While + [RFC 1122](https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md) + explicitly specified that the rules for unsafe code may change, we + will have to decide where to draw the line in terms of how much to + weight backwards compatibility. + +It is expected that the lang team will be **highly involved** in this discussion. + +It is also expected that we will gather examples in the following ways: + +- survey existing unsafe code; +- solicit suggestions of patterns from the Rust-using public: + - scenarios where they would like an official judgement; + - interesting questions involving the standard library. + +### High-level design + +The next document to produce is to settle on a high-level +design. There have already been several approaches floated. This phase +should build on the examples from before, in that proposals can be +weighed against their effect on the examples and optimizations. + +There will likely also be some feedback between this phase and the +previosu: as new proposals are considered, that may generate new +examples that were not relevant previously. + +Note that even once a high-level design is adopted, it will be +considered "tentative" and "unstable" until the detailed rules have +been worked out to a reasonable level of confidence. + +Once a high-level design is adopted, it may also be used by the +compiler team to inform which optimizations are legal or illegal. +However, if changes are later made, the compiler will naturally have +to be adjusted to match. + +It is expected that the lang team will be **highly involved** in this discussion. + +### Detailed rules + +Once we've settled on a high-level path -- and, no doubt, while in the +process of doing so as well -- we can begin to enumerate more detailed +rules. It is also expected that working out the rules may uncover +contradictions or other problems that require revisiting the +high-level design. + +### Lints and other checkers + +Ideally, the team will also consider whether automated checking for +conformance is possible. It is not a responsibility of this strike +team to produce such automated checking, but automated checking is +naturally a big plus! + +## Repository + +In general, the memory model discussion will be centered on a specific +repository (perhaps +, but perhaps moved +to the rust-lang organization). This allows for multi-faced +discussion: for example, we can open issues on particular questions, +as well as storing the various proposals and litmus tests in their own +directories. We'll work out and document the procedures and +conventions here as we go. + +# Drawbacks +[drawbacks]: #drawbacks + +The main drawback is that this discussion will require time and energy +which could be spent elsewhere. The justification for spending time on +developing the memory model instead is that it is crucial to enable +the compiler to perform aggressive optimizations. Until now, we've +limited ourselves by and large to conservative optimizations (though +we do supply some LLVM aliasing hints that can be affected by unsafe +code). As the transition to MIR comes to fruition, it is clear that we +will be in a place to perform more aggressive optimization, and hence +the need for rules and guidelines is becoming more acute. 
We can +continue to adopt a conservative course, but this risks growing an +ever larger body of code dependent on the compiler not performing +aggressive optimization, which may close those doors forever. + +# Alternatives +[alternatives]: #alternatives + +- Adopt a memory model in one fell swoop: + - considered too complicated +- Defer adopting a memory model for longer: + - considered too risky + +# Unresolved questions +[unresolved]: #unresolved-questions + +None. diff --git a/text/1644-default-and-expanded-rustc-errors.md b/text/1644-default-and-expanded-rustc-errors.md new file mode 100644 index 00000000000..5bf1816baac --- /dev/null +++ b/text/1644-default-and-expanded-rustc-errors.md @@ -0,0 +1,391 @@ +- Feature Name: `default_and_expanded_errors_for_rustc` +- Start Date: 2016-06-07 +- RFC PR: [rust-lang/rfcs#1644](https://github.com/rust-lang/rfcs/pull/1644) +- Rust Issue: [rust-lang/rust#34826](https://github.com/rust-lang/rust/issues/34826) + [rust-lang/rust#34827](https://github.com/rust-lang/rust/issues/34827) + +# Summary +This RFC proposes an update to error reporting in rustc. Its focus is to change the format of Rust +error messages and improve --explain capabilities to focus on the user's code. The end goal is for +errors and explain text to be more readable, more friendly to new users, while still helping Rust +coders fix bugs as quickly as possible. We expect to follow this RFC with a supplemental RFC that +provides a writing style guide for error messages and explain text with a focus on readability and +education. + +# Motivation + +## Default error format + +Rust offers a unique value proposition in the landscape of languages in part by codifying concepts +like ownership and borrowing. Because these concepts are unique to Rust, it's critical that the +learning curve be as smooth as possible. And one of the most important tools for lowering the +learning curve is providing excellent errors that serve to make the concepts less intimidating, +and to help 'tell the story' about what those concepts mean in the context of the programmer's code. 
+ +[as text] +``` +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:29:22: 29:30 error: cannot borrow `foo.bar1` as mutable more than once at a time [E0499] +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:29 let _bar2 = &mut foo.bar1; + ^~~~~~~~ +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:29:22: 29:30 help: run `rustc --explain E0499` to see a detailed explanation +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:28:21: 28:29 note: previous borrow of `foo.bar1` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `foo.bar1` until the borrow ends +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:28 let bar1 = &mut foo.bar1; + ^~~~~~~~ +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:31:2: 31:2 note: previous borrow ends here +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:26 fn borrow_same_field_twice_mut_mut() { +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:27 let mut foo = make_foo(); +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:28 let bar1 = &mut foo.bar1; +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:29 let _bar2 = &mut foo.bar1; +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:30 *bar1; +src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:31 } + ^ +``` + +[as image] +![Image of new error flow](http://www.jonathanturner.org/images/old_errors_3.png) + +*Example of a borrow check error in the current compiler* + +Though a lot of time has been spent on the current error messages, they have a couple flaws which +make them difficult to use. Specifically, the current error format: + +* Repeats the file position on the left-hand side. This offers no additional information, but +instead makes the error harder to read. +* Prints messages about lines often out of order. This makes it difficult for the developer to +glance at the error and recognize why the error is occuring +* Lacks a clear visual break between errors. As more errors occur it becomes more difficult to tell +them apart. +* Uses technical terminology that is difficult for new users who may be unfamiliar with compiler +terminology or terminology specific to Rust. + +This RFC details a redesign of errors to focus more on the source the programmer wrote. This format +addresses the above concerns by eliminating clutter, following a more natural order for help +messages, and pointing the user to both "what" the error is and "why" the error is occurring by +using color-coded labels. Below you can see the same error again, this time using the proposed +format: + +[as text] +``` +error[E0499]: cannot borrow `foo.bar1` as mutable more than once at a time + --> src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:29:22 + | +28 | let bar1 = &mut foo.bar1; + | -------- first mutable borrow occurs here +29 | let _bar2 = &mut foo.bar1; + | ^^^^^^^^ second mutable borrow occurs here +30 | *bar1; +31 | } + | - first borrow ends here +``` + +[as image] + + + +*Example of the same borrow check error in the proposed format* + +## Expanded error format (revised --explain) + +Languages like Elm have shown how effective an educational tool error messages can be if the +explanations like our --explain text are mixed with the user's code. As mentioned earlier, it's +crucial for Rust to be easy-to-use, especially since it introduces a fair number of concepts that +may be unfamiliar to the user. 
Even experienced users may need to use --explain text from time to +time when they encounter unfamiliar messages. + +While we have --explain text today, it uses generic examples that require the user to mentally +translate the given example into what works for their specific situation. + +``` +You tried to move out of a value which was borrowed. Erroneous code example: + +use std::cell::RefCell; + +struct TheDarkKnight; + +impl TheDarkKnight { + fn nothing_is_true(self) {} +} +... +``` + +*Example of the current --explain (showing E0507)* + +To help users, this RFC proposes a new `--explain errors`. This new mode is more textual error +reporting mode that gives additional explanation to help better understand compiler messages. The +end result is a richer, on-demand error reporting style. + +``` +error: cannot move out of borrowed content + --> /Users/jturner/Source/errors/borrowck-move-out-of-vec-tail.rs:30:17 + +I’m trying to track the ownership of the contents of `tail`, which is borrowed, through this match +statement: + +29 | match tail { + +In this match, you use an expression of the form [...]. When you do this, it’s like you are opening +up the `tail` value and taking out its contents. Because `tail` is borrowed, you can’t safely move +the contents. + +30 | [Foo { string: aa }, + | ^^ cannot move out of borrowed content + +You can avoid moving the contents out by working with each part using a reference rather than a +move. A naive fix might look this: + +30 | [Foo { string: ref aa }, + +``` + +# Detailed design + +The RFC is separated into two parts: the format of error messages and the format of expanded error +messages (using `--explain errors`). + +## Format of error messages + +The proposal is a lighter error format focused on the code the user wrote. Messages that help +understand why an error occurred appear as labels on the source. The goals of this new format are +to: + +* Create something that's visually easy to parse +* Remove noise/unnecessary information +* Present information in a way that works well for new developers, post-onboarding, and experienced +developers without special configuration +* Draw inspiration from Elm as well as Dybuk and other systems that have already improved on the +kind of errors that Rust has. + +In order to accomplish this, the proposed design needs to satisfy a number of constraints to make +the result maximally flexible across various terminals: + +* Multiple errors beside each other should be clearly separate and not muddled together. +* Each error message should draw the eye to where the error occurs with sufficient context to +understand why the error occurs. +* Each error should have a "header" section that is visually distinct from the code section. +* Code should visually stand out from text and other error messages. This allows the developer to +immediately recognize their code. +* Error messages should be just as readable when not using colors (eg for users of black-and-white +terminals, color-impaired readers, weird color schemes that we can't predict, or just people that +turn colors off) +* Be careful using “ascii art” and avoid unicode. Instead look for ways to show the information +concisely that will work across the broadest number of terminals. We expect IDEs to possibly allow +for a more graphical error in the future. +* Where possible, use labels on the source itself rather than sentence "notes" at the end. 
+* Keep filename:line easy to spot for people who use editors that let them click on errors + +### Header + +``` +error[E0499]: cannot borrow `foo.bar1` as mutable more than once at a time + --> src/test/compile-fail/borrowck/borrowck-borrow-from-owned-ptr.rs:29:22 +``` + +The header still serves the original purpose of knowing: a) if it's a warning or error, b) the text +of the warning/error, and c) the location of this warning/error. We keep the error code, now a part +of the error indicator, as a way to help improve search results. + +### Line number column + +``` + | +28 | + | +29 | + | +30 | +31 | + | +``` + +The line number column lets you know where the error is occurring in the file. Because we only show +lines that are of interest for the given error/warning, we elide lines if they are not annotated as +part of the message (we currently use the heuristic to elide after one un-annotated line). + +Inspired by Dybuk and Elm, the line numbers are separated with a 'wall', a separator formed from +pipe('|') characters, to clearly distinguish what is a line number from what is source at a glance. + +As the wall also forms a way to visually separate distinct errors, we propose extending this concept +to also support span-less notes and hints. For example: + +``` +92 | config.target_dir(&pkg) + | ^^^^ expected `core::workspace::Workspace`, found `core::package::Package` + = note: expected type `&core::workspace::Workspace<'_>` + = note: found type `&core::package::Package` +``` +### Source area + +``` + let bar1 = &mut foo.bar1; + -------- first mutable borrow occurs here + let _bar2 = &mut foo.bar1; + ^^^^^^^^ second mutable borrow occurs here + *bar1; + } + - first borrow ends here +``` + +The source area shows the related source code for the error/warning. The source is laid out in the +order it appears in the source file, giving the user a way to map the message against the source +they wrote. + +Key parts of the code are labeled with messages to help the user understand the message. + +The primary label is the label associated with the main warning/error. It explains the **what** of +the compiler message. By reading it, the user can begin to understand what the root cause of the +error or warning is. This label is colored to match the level of the message (yellow for warning, +red for error) and uses the ^^^ underline. + +Secondary labels help to understand the error and use blue text and --- underline. These labels +explain the **why** of the compiler message. You can see one such example in the above message +where the secondary labels explain that there is already another borrow going on. In another +example, we see another way that primary and secondary work together to tell the whole story for +why the error occurred. + +Taken together, primary and secondary labels create a 'flow' to the message. Flow in the message +lets the user glance at the colored labels and quickly form an educated guess as to how to correctly +update their code. + +Note: We'll talk more about additional style guidance for wording to help create flow in the +subsequent style RFC. + +## Expanded error messages + +Currently, --explain text focuses on the error code. You invoke the compiler with --explain + and receive a verbose description of what causes errors of that number. The resulting +message can be helpful, but it uses generic sample code which makes it feel less connected to the +user's code. + +We propose adding a new `--explain errors`. 
By passing this to the compiler (or to cargo), the +compiler will switch to an expanded error form which incorporates the same source and label +information the user saw in the default message with more explanation text. + +``` +error: cannot move out of borrowed content + --> /Users/jturner/Source/errors/borrowck-move-out-of-vec-tail.rs:30:17 + +I’m trying to track the ownership of the contents of `tail`, which is borrowed, through this match +statement: + +29 | match tail { + +In this match, you use an expression of the form [...]. When you do this, it’s like you are opening +up the `tail` value and taking out its contents. Because `tail` is borrowed, you can’t safely move +the contents. + +30 | [Foo { string: aa }, + | ^^ cannot move out of borrowed content + +You can avoid moving the contents out by working with each part using a reference rather than a +move. A naive fix might look this: + +30 | [Foo { string: ref aa }, +``` + +*Example of an expanded error message* + +The expanded error message effectively becomes a template. The text of the template is the +educational text that is explaining the message more more detail. The template is then populated +using the source lines, labels, and spans from the same compiler message that's printed in the +default mode. This lets the message writer call out each label or span as appropriate in the +expanded text. + +It's possible to also add additional labels that aren't necessarily shown in the default error mode +but would be available in the expanded error format. This gives the explain text writer maximal +flexibility without impacting the readability of the default message. I'm currently prototyping an +implementation of how this templating could work in practice. + +## Tying it together + +Lastly, we propose that the final error message: + +``` +error: aborting due to 2 previous errors +``` + +Be changed to notify users of this ability: + +``` +note: compile failed due to 2 errors. You can compile again with `--explain errors` for more information +``` + +# Drawbacks + +Changes in the error format can impact integration with other tools. For example, IDEs that use a +simple regex to detect the error would need to be updated to support the new format. This takes +time and community coordination. + +While the new error format has a lot of benefits, it's possible that some errors will feel +"shoehorned" into it and, even after careful selection of secondary labels, may still not read as +well as the original format. + +There is a fair amount of work involved to update the errors and explain text to the proposed +format. + +# Alternatives + +Rather than using the proposed error format format, we could only provide the verbose --explain +style that is proposed in this RFC. Respected programmers like +[John Carmack](https://twitter.com/ID_AA_Carmack/status/735197548034412546) have praised the Elm +error format. + +``` +Detected errors in 1 module. + +-- TYPE MISMATCH --------------------------------------------------------------- +The right argument of (+) is causing a type mismatch. + +25| model + "1" + ^^^ +(+) is expecting the right argument to be a: + + number + +But the right argument is: + + String + +Hint: To append strings in Elm, you need to use the (++) operator, not (+). + + +Hint: I always figure out the type of the left argument first and if it is acceptable on its own, I +assume it is "correct" in subsequent checks. So the problem may actually be in how the left and +right arguments interact. 
+``` + +*Example of an Elm error* + +In developing this RFC, we experimented with both styles. The Elm error format is great as an +educational tool, and we wanted to leverage its style in Rust. For day-to-day work, though, we +favor an error format that puts heavy emphasis on quickly guiding the user to what the error is and +why it occurred, with an easy way to get the richer explanations (using --explain) when the user +wants them. + +# Stabilization + +Currently, this new rust error format is available on nightly using the +```export RUST_NEW_ERROR_FORMAT=true``` environment variable. Ultimately, this should become the +default. In order to get there, we need to ensure that the new error format is indeed an +improvement over the existing format in practice. + +We also have not yet implemented the extended error format. This format will also be gated by its +own flag while we explore and stabilize it. Because of the relative difference in maturity here, +the default error message will be behind a flag for a cycle before it becomes default. The extended +error format will be implemented and a follow-up RFC will be posted describing its design. This will +start its stabilization period, after which time it too will be enabled. + +How do we measure the readability of error messages? This RFC details an educated guess as to what +would improve the current state but shows no ways to measure success. + +Likewise, while some of us have been dogfooding these errors, we don't know what long-term use feels +like. For example, after a time does the use of color feel excessive? We can always update the +errors as we go, but it'd be helpful to catch it early if possible. + +# Unresolved questions + +There are a few unresolved questions: +* Editors that rely on pattern-matching the compiler output will need to be updated. It's an open +question how best to transition to using the new errors. There is on-going discussion of +standardizing the JSON output, which could also be used. +* Can additional error notes be shown without the "rainbow problem" where too many colors and too +much boldness cause errors to become less readable? diff --git a/text/1647-allow-self-in-where-clauses.md b/text/1647-allow-self-in-where-clauses.md new file mode 100644 index 00000000000..da90f8ba6e0 --- /dev/null +++ b/text/1647-allow-self-in-where-clauses.md @@ -0,0 +1,87 @@ +- Feature Name: `allow_self_in_where_clauses` +- Start Date: 2016-06-13 +- RFC PR: [#1647](https://github.com/rust-lang/rfcs/pull/1647) +- Rust Issue: [#38864](https://github.com/rust-lang/rust/issues/38864) + +# Summary +[summary]: #summary + +This RFC proposes allowing the `Self` type to be used in every position in trait +implementations, including where clauses and other parameters to the trait being +implemented. + +# Motivation +[motivation]: #motivation + +`Self` is a useful tool to have to reduce churn when the type changes for +various reasons. One would expect to be able to write + +```rust +impl SomeTrait for MySuperLongType where + Self: SomeOtherTrait, +``` + +but this will fail to compile today, forcing you to repeat the type, and adding +one more place that has to change if the type ever changes. + +By this same logic, we would also like to be able to reference associated types +from the traits being implemented. 
When dealing with generic code, patterns like
+this often emerge:
+
+```rust
+trait MyTrait {
+    type MyType: SomeBound;
+}
+
+impl MyTrait for SomeStruct where
+    SomeOtherStruct: SomeBound,
+{
+    type MyType = SomeOtherStruct;
+}
+```
+
+The only reason the associated type is repeated at all is to restate the bound
+on the associated type. It would be nice to reduce some of that duplication.
+
+# Detailed design
+[design]: #detailed-design
+
+Instead of blocking `Self` from being used in the "header" of a trait impl,
+it will be understood to be a reference to the implementation type. For example,
+all of these would be valid:
+
+```rust
+impl SomeTrait for SomeType where Self: SomeOtherTrait { }
+
+impl SomeTrait for SomeType { }
+
+impl SomeTrait for SomeType where SomeOtherType: SomeTrait { }
+
+impl SomeTrait for SomeType where Self::AssocType: SomeOtherTrait {
+    type AssocType = SomeOtherType;
+}
+```
+
+If the `Self` type is parameterized by `Self`, an error that the type definition
+is recursive is thrown, rather than `Self` not being recognized.
+
+```rust
+// The error here is because this would be Vec<Vec<Self>>, Vec<Vec<Vec<Self>>>, ...
+impl SomeTrait for Vec<Self> { }
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+`Self` is always less explicit than the alternative.
+
+# Alternatives
+[alternatives]: #alternatives
+
+Not implementing this is an alternative, as is accepting `Self` only in where
+clauses and not in other positions in the impl header.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None
diff --git a/text/1649-atomic-access.md b/text/1649-atomic-access.md
new file mode 100644
index 00000000000..e946ca0c9ec
--- /dev/null
+++ b/text/1649-atomic-access.md
@@ -0,0 +1,61 @@
+- Feature Name: atomic_access
+- Start Date: 2016-06-15
+- RFC PR: [rust-lang/rfcs#1649](https://github.com/rust-lang/rfcs/pull/1649)
+- Rust Issue: [rust-lang/rust#35603](https://github.com/rust-lang/rust/issues/35603)
+
+# Summary
+[summary]: #summary
+
+This RFC adds the following methods to atomic types:
+
+```rust
+impl AtomicT {
+    fn get_mut(&mut self) -> &mut T;
+    fn into_inner(self) -> T;
+}
+```
+
+It also specifies that the layout of an `AtomicT` type is always the same as the underlying `T` type. So, for example, `AtomicI32` is guaranteed to be transmutable to and from `i32`.
+
+# Motivation
+[motivation]: #motivation
+
+## `get_mut` and `into_inner`
+
+These methods are useful for accessing the value inside an atomic object directly when there are no other threads accessing it. This is guaranteed by the mutable reference and the move, since it means there can be no other live references to the atomic.
+
+A normal load/store is different from a `load(Relaxed)` or `store(Relaxed)` because it has much weaker synchronization guarantees, which means that the compiler can produce more efficient code. In particular, LLVM currently treats all atomic operations (even relaxed ones) as volatile operations, which means that it does not perform any optimizations on them. For example, it will not eliminate a `load(Relaxed)` even if the result of the load is not used anywhere.
+
+`get_mut` in particular is expected to be useful in `Drop` implementations where you have a `&mut self` and need to read the value of an atomic. `into_inner` somewhat overlaps in functionality with `get_mut`, but it is included to allow extracting the value without requiring the atomic object to be mutable. These methods mirror `Mutex::get_mut` and `Mutex::into_inner`.
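+
+As a usage sketch (the counter and thread setup here are illustrative only),
+once all other owners are gone the proposed methods allow reading the final
+value without any atomic instructions:
+
+```rust
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::Arc;
+use std::thread;
+
+fn main() {
+    let counter = Arc::new(AtomicUsize::new(0));
+
+    let handles: Vec<_> = (0..4).map(|_| {
+        let counter = counter.clone();
+        thread::spawn(move || { counter.fetch_add(1, Ordering::Relaxed); })
+    }).collect();
+    for handle in handles {
+        handle.join().unwrap();
+    }
+
+    // All threads have finished, so we are the sole owner again: `get_mut`
+    // gives plain (non-atomic) access through the mutable reference...
+    let mut counter = Arc::try_unwrap(counter).unwrap();
+    assert_eq!(*counter.get_mut(), 4);
+
+    // ...and `into_inner` consumes the atomic to extract the value.
+    assert_eq!(counter.into_inner(), 4);
+}
+```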
+
+## Atomic type layout
+
+The layout guarantee is mainly intended to be used for FFI, where a variable of a non-atomic type needs to be modified atomically. The most common example of this is the Linux `futex` system call which takes an `int*` parameter pointing to an integer that is atomically modified by both userspace and the kernel.
+
+Rust code invoking the `futex` system call so far has simply passed the address of the atomic object directly to the system call. However, this makes the assumption that the atomic type has the same layout as the underlying integer type, which is not currently guaranteed by the documentation.
+
+This also allows the reverse operation by casting a pointer: it allows Rust code to atomically modify a value that was not declared as an atomic type. This is useful when dealing with FFI structs that are shared with a thread managed by a C library. Another example would be to atomically modify a value in a memory-mapped file that is shared with another process.
+
+# Detailed design
+[design]: #detailed-design
+
+The actual implementations of these functions are mostly trivial since they are based on `UnsafeCell::get`.
+
+The existing implementations of atomic types already have the same layout as the underlying types (even `AtomicBool` and `bool`), so no change is needed here apart from the documentation.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+The functionality of `into_inner` somewhat overlaps with `get_mut`.
+
+We lose the ability to change the layout of atomic types, but this shouldn't be necessary since these types map directly to hardware primitives.
+
+# Alternatives
+[alternatives]: #alternatives
+
+The functionality of `get_mut` and `into_inner` can be implemented using `load(Relaxed)`, however the latter can result in worse code because it is poorly handled by the optimizer.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None
diff --git a/text/1651-movecell.md b/text/1651-movecell.md
new file mode 100644
index 00000000000..ec0bc3360d2
--- /dev/null
+++ b/text/1651-movecell.md
@@ -0,0 +1,62 @@
+- Feature Name: move_cell
+- Start Date: 2016-06-15
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1651
+- Rust Issue: https://github.com/rust-lang/rust/issues/39264
+
+# Summary
+[summary]: #summary
+
+Extend `Cell` to work with non-`Copy` types.
+
+# Motivation
+[motivation]: #motivation
+
+It allows safe interior mutability of non-`Copy` types without the overhead of `RefCell`'s reference counting.
+
+The key idea of `Cell` is to provide a primitive building block to safely support interior mutability. This must be done while maintaining Rust's aliasing requirements for mutable references. Unlike `RefCell`, which enforces this at runtime through reference counting, `Cell` does this statically by disallowing any reference (mutable or immutable) to the data contained in the cell.
+
+While the current implementation only supports `Copy` types, this restriction isn't actually necessary to maintain Rust's aliasing invariants. The only affected API is the `get` function which, by design, is only usable with `Copy` types.
+
+# Detailed design
+[design]: #detailed-design
+
+```rust
+impl<T> Cell<T> {
+    fn set(&self, val: T);
+    fn replace(&self, val: T) -> T;
+    fn into_inner(self) -> T;
+}
+
+impl<T: Copy> Cell<T> {
+    fn get(&self) -> T;
+}
+
+impl<T: Default> Cell<T> {
+    fn take(&self) -> T;
+}
+```
+
+The `get` method is kept but is only available for `T: Copy`.
+
+The `set` method is available for all `T`.
It will need to be implemented by calling `replace` and dropping the returned value. Dropping the old value in-place is unsound since the `Drop` impl will hold a mutable reference to the cell contents. + +The `into_inner` and `replace` methods are added, which allow the value in a cell to be read even if `T` is not `Copy`. The `get` method can't be used since the cell must always contain a valid value. + +Finally, a `take` method is added which is equivalent to `self.replace(Default::default())`. + +# Drawbacks +[drawbacks]: #drawbacks + +It makes the `Cell` type more complicated. + +`Cell` will only be able to derive traits like `Eq` and `Ord` for types that are `Copy`, since there is no way to non-destructively read the contents of a non-`Copy` `Cell`. + +# Alternatives +[alternatives]: #alternatives + +The alternative is to use the `MoveCell` type from crates.io which provides the same functionality. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None diff --git a/text/1653-assert_ne.md b/text/1653-assert_ne.md new file mode 100644 index 00000000000..78fd4d29aad --- /dev/null +++ b/text/1653-assert_ne.md @@ -0,0 +1,66 @@ +- Feature Name: Assert Not Equals Macro (`assert_ne`) +- Start Date: (2016-06-17) +- RFC PR: [rust-lang/rfcs#1653](https://github.com/rust-lang/rfcs/pull/1653) +- Rust Issue: [rust-lang/rust#35073](https://github.com/rust-lang/rust/issues/35073) + +# Summary +[summary]: #summary + +`assert_ne` is a macro that takes 2 arguments and panics if they are equal. It +works and is implemented identically to `assert_eq` and serves as its complement. +This proposal also includes a `debug_asset_ne`, matching `debug_assert_eq`. + +# Motivation +[motivation]: #motivation + +This feature, among other reasons, makes testing more readable and consistent as +it complements `asset_eq`. It gives the same style panic message as `assert_eq`, +which eliminates the need to write it yourself. + +# Detailed design +[design]: #detailed-design + +This feature has exactly the same design and implementation as `assert_eq`. + +Here is the definition: + +```rust +macro_rules! assert_ne { + ($left:expr , $right:expr) => ({ + match (&$left, &$right) { + (left_val, right_val) => { + if *left_val == *right_val { + panic!("assertion failed: `(left != right)` \ + (left: `{:?}`, right: `{:?}`)", left_val, right_val) + } + } + } + }) +} +``` + +This is complemented by a `debug_assert_ne` (similar to `debug_assert_eq`): + +```rust +macro_rules! debug_assert_ne { + ($($arg:tt)*) => (if cfg!(debug_assertions) { assert_ne!($($arg)*); }) +} +``` + +# Drawbacks +[drawbacks]: #drawbacks + +Any addition to the standard library will need to be maintained forever, so it is +worth weighing the maintenance cost of this over the value add. Given that it is so +similar to `assert_eq`, I believe the weight of this drawback is low. + +# Alternatives +[alternatives]: #alternatives + +Alternatively, users implement this feature themselves, or use the crate `assert_ne` +that I published. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None at this moment. 
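+
+A brief usage sketch of the proposed macros (an illustrative test with
+arbitrary values), showing the intended symmetry with `assert_eq!`:
+
+```rust
+#[test]
+fn push_changes_the_vector() {
+    let before = vec![1, 2, 3];
+    let mut after = before.clone();
+    after.push(4);
+
+    // Panics with `assertion failed: `(left != right)` ...`, printing both
+    // values, if the two sides compare equal -- mirroring `assert_eq!`.
+    assert_ne!(before, after);
+
+    // Only checked when `debug_assertions` are enabled, like `debug_assert_eq!`.
+    debug_assert_ne!(before, after);
+}
+```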
diff --git a/text/1660-try-borrow.md b/text/1660-try-borrow.md
new file mode 100644
index 00000000000..57a5566efa7
--- /dev/null
+++ b/text/1660-try-borrow.md
@@ -0,0 +1,70 @@
+- Feature Name: `try_borrow`
+- Start Date: 2016-06-27
+- RFC PR: [rust-lang/rfcs#1660](https://github.com/rust-lang/rfcs/pull/1660)
+- Rust Issue: [rust-lang/rust#35070](https://github.com/rust-lang/rust/issues/35070)
+
+# Summary
+[summary]: #summary
+
+Introduce non-panicking borrow methods on `RefCell`.
+
+# Motivation
+[motivation]: #motivation
+
+Whenever something is built from user input, for example a graph in which nodes
+are `RefCell` values, it is essential to avoid panicking on bad input. The
+only way to avoid panics on cyclic input in this case is a way to
+conditionally borrow the cell contents.
+
+# Detailed design
+[design]: #detailed-design
+
+```rust
+/// Returned when `RefCell::try_borrow` fails.
+pub struct BorrowError { _inner: () }
+
+/// Returned when `RefCell::try_borrow_mut` fails.
+pub struct BorrowMutError { _inner: () }
+
+impl<T> RefCell<T> {
+    /// Tries to immutably borrow the value. This returns `Err(_)` if the cell
+    /// was already borrowed mutably.
+    pub fn try_borrow(&self) -> Result<Ref<T>, BorrowError> { ... }
+
+    /// Tries to mutably borrow the value. This returns `Err(_)` if the cell
+    /// was already borrowed.
+    pub fn try_borrow_mut(&self) -> Result<RefMut<T>, BorrowMutError> { ... }
+}
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+This departs from the fallible/infallible convention where we avoid providing
+both panicking and non-panicking methods for the same operation.
+
+# Alternatives
+[alternatives]: #alternatives
+
+The alternative is to provide a `borrow_state` method returning the state
+of the borrow flag of the cell, i.e.:
+
+```rust
+pub enum BorrowState {
+    Reading,
+    Writing,
+    Unused,
+}
+
+impl<T> RefCell<T> {
+    pub fn borrow_state(&self) -> BorrowState { ... }
+}
+```
+
+See [the Rust tracking issue](https://github.com/rust-lang/rust/issues/27733)
+for this feature.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+There are no unresolved questions.
diff --git a/text/1665-windows-subsystem.md b/text/1665-windows-subsystem.md
new file mode 100644
index 00000000000..9100dfb3df5
--- /dev/null
+++ b/text/1665-windows-subsystem.md
@@ -0,0 +1,161 @@
+- Feature Name: Windows Subsystem
+- Start Date: 2016-07-03
+- RFC PR: [rust-lang/rfcs#1665](https://github.com/rust-lang/rfcs/pull/1665)
+- Rust Issue: [rust-lang/rust#37499](https://github.com/rust-lang/rust/issues/37499)
+
+# Summary
+[summary]: #summary
+
+Rust programs compiled for Windows will always allocate a console window on
+startup. This behavior is controlled via the `SUBSYSTEM` parameter passed to the
+linker, and so *can* be overridden with specific compiler flags. However, doing
+so will bypass the Rust-specific initialization code in `libstd`, as when using
+the MSVC toolchain, the entry point must be named `WinMain`.
+
+This RFC proposes supporting this case explicitly, allowing `libstd` to
+continue to be initialized correctly.
+
+# Motivation
+[motivation]: #motivation
+
+The `WINDOWS` subsystem is commonly used on Windows: desktop applications
+typically do not want to flash up a console window on startup.
+
+Currently, using the `WINDOWS` subsystem from Rust is undocumented, and the
+process is non-trivial when targeting the MSVC toolchain.
There are a couple of +approaches, each with their own downsides: + +## Define a WinMain symbol + +A new symbol `pub extern "system" WinMain(...)` with specific argument +and return types must be declared, which will become the new entry point for +the program. + +This is unsafe, and will skip the initialization code in `libstd`. + +The GNU toolchain will accept either entry point. + +## Override the entry point via linker options + +This uses the same method as will be described in this RFC. However, it will +result in build scripts also being compiled for the `WINDOWS` subsystem, which +can cause additional console windows to pop up during compilation, making the +system unusable while a build is in progress. + +# Detailed design +[design]: #detailed-design + +When an executable is linked while compiling for a Windows target, it will be +linked for a specific *subsystem*. The subsystem determines how the operating +system will run the executable, and will affect the execution environment of +the program. + +In practice, only two subsystems are very commonly used: `CONSOLE` and +`WINDOWS`, and from a user's perspective, they determine whether a console will +be automatically created when the program is started. + +## New crate attribute + +This RFC proposes two changes to solve this problem. The first is adding a +top-level crate attribute to allow specifying which subsystem to use: + +`#![windows_subsystem = "windows"]` + +Initially, the set of possible values will be `{windows, console}`, but may be +extended in future if desired. + +The use of this attribute in a non-executable crate will result in a compiler +warning. If compiling for a non-Windows target, the attribute will be silently +ignored. + +## Additional linker argument + +For the GNU toolchain, this will be sufficient. However, for the MSVC toolchain, +the linker will be expecting a `WinMain` symbol, which will not exist. + +There is some complexity to the way in which a different entry point is expected +when using the `WINDOWS` subsystem. Firstly, the C-runtime library exports two +symbols designed to be used as an entry point: +``` +mainCRTStartup +WinMainCRTStartup +``` + +`LINK.exe` will use the subsystem to determine which of these symbols to use +as the default entry point if not overridden. + +Each one performs some unspecified initialization of the CRT, before calling out +to a symbol defined within the program (`main` or `WinMain` respectively). + +The second part of the solution is to pass an additional linker option when +targeting the MSVC toolchain: +`/ENTRY:mainCRTStartup` + +This will override the entry point to always be `mainCRTStartup`. For +console-subsystem programs this will have no effect, since it was already the +default, but for `WINDOWS` subsystem programs, it will eliminate the need for +a `WinMain` symbol to be defined. + +This command line option will always be passed to the linker, regardless of the +presence or absence of the `windows_subsystem` crate attribute, except when +the user specifies their own entry point in the linker arguments. This will +require `rustc` to perform some basic parsing of the linker options. + +# Drawbacks +[drawbacks]: #drawbacks + +- A new platform-specific crate attribute. +- The difficulty of manually calling the Rust initialization code is potentially + a more general problem, and this only solves a specific (if common) case. 
+- The subsystem must be specified earlier than is strictly required: when
+  compiling C/C++ code only the linker, not the compiler, needs to actually be
+  aware of the subsystem.
+- It is assumed that the initialization performed by the two CRT entry points
+  is identical. This seems to currently be the case, and is unlikely to change
+  as this technique appears to be used fairly widely.
+
+# Alternatives
+[alternatives]: #alternatives
+
+- Only emit one of either `WinMain` or `main` from `rustc` based on a new
+  command line option.
+
+  This command line option would only be applicable when compiling an
+  executable, and only for Windows platforms. No other supported platforms
+  require a different entry point or additional linker arguments for programs
+  designed to run with a graphical user interface.
+
+  `rustc` will react to this command line option by changing the exported
+  name of the entry point to `WinMain`, and passing additional arguments to
+  the linker to configure the correct subsystem. A mismatch here would result
+  in linker errors.
+
+  A similar option would need to be added to `Cargo.toml` to make usage as
+  simple as possible.
+
+  There's some bike-shedding which can be done on the exact command line
+  interface, but one possible option is shown below.
+
+  Rustc usage:
+  `rustc foo.rs --crate-subsystem windows`
+
+  Cargo.toml
+  ```toml
+  [package]
+  # ...
+
+  [[bin]]
+  name = "foo"
+  path = "src/foo.rs"
+  subsystem = "windows"
+  ```
+
+  The `crate-subsystem` command line option would exist on all platforms,
+  but would be ignored when compiling for a non-Windows target, so as to
+  support cross-compiling. If not compiling a binary crate, specifying the
+  option is an error regardless of the target.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None
diff --git a/text/1679-panic-safe-slicing.md b/text/1679-panic-safe-slicing.md
new file mode 100644
index 00000000000..11cbd2c4f1b
--- /dev/null
+++ b/text/1679-panic-safe-slicing.md
@@ -0,0 +1,122 @@
+- Feature Name: `panic_safe_slicing`
+- Start Date: 2015-10-16
+- RFC PR: [rust-lang/rfcs#1679](https://github.com/rust-lang/rfcs/pull/1679)
+- Rust Issue: [rust-lang/rust#35729](https://github.com/rust-lang/rust/issues/35729)
+
+# Summary
+
+Add "panic-safe" or "total" alternatives to the existing panicking indexing syntax.
+
+# Motivation
+
+`SliceExt::get` and `SliceExt::get_mut` can be thought of as non-panicking versions of the simple
+indexing syntax, `a[idx]`, and `SliceExt::get_unchecked` and `SliceExt::get_unchecked_mut` can
+be thought of as unsafe versions with bounds checks elided. However, there is no such equivalent for
+`a[start..end]`, `a[start..]`, or `a[..end]`. This RFC proposes such methods to fill the gap.
+
+# Detailed design
+
+The `get`, `get_mut`, `get_unchecked`, and `get_unchecked_mut` will be made generic over `usize`
+as well as ranges of `usize` like slice's `Index` implementation currently is. This will allow e.g.
+`a.get(start..end)` which will behave analogously to `a[start..end]`.
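+
+As an illustrative sketch of the call-site behavior this is intended to enable
+(a hypothetical usage example, not normative):
+
+```rust
+fn main() {
+    let a = [1, 2, 3, 4, 5];
+
+    // An in-bounds range yields the same subslice as `&a[1..3]`, wrapped in `Some`.
+    assert_eq!(a.get(1..3), Some(&[2, 3][..]));
+
+    // An out-of-bounds range yields `None` instead of panicking.
+    assert_eq!(a.get(3..7), None);
+
+    // Plain `usize` indexing continues to behave as it does today.
+    assert_eq!(a.get(2), Some(&3));
+}
+```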
+
+Because methods cannot be overloaded in an ad-hoc manner in the same way that traits may be
+implemented, we introduce a `SliceIndex` trait which is implemented by types which can index into a
+slice:
+```rust
+pub trait SliceIndex<T> {
+    type Output: ?Sized;
+
+    fn get(self, slice: &[T]) -> Option<&Self::Output>;
+    fn get_mut(self, slice: &mut [T]) -> Option<&mut Self::Output>;
+    unsafe fn get_unchecked(self, slice: &[T]) -> &Self::Output;
+    unsafe fn get_mut_unchecked(self, slice: &mut [T]) -> &mut Self::Output;
+    fn index(self, slice: &[T]) -> &Self::Output;
+    fn index_mut(self, slice: &mut [T]) -> &mut Self::Output;
+}
+
+impl<T> SliceIndex<T> for usize {
+    type Output = T;
+    // ...
+}
+
+impl<T, R> SliceIndex<T> for R
+    where R: RangeArgument<usize>
+{
+    type Output = [T];
+    // ...
+}
+```
+
+And then alter the `Index`, `IndexMut`, `get`, `get_mut`, `get_unchecked`, and `get_mut_unchecked`
+implementations to be generic over `SliceIndex`:
+```rust
+impl<T> [T] {
+    pub fn get<I>(&self, idx: I) -> Option<&I::Output>
+        where I: SliceIndex<T>
+    {
+        idx.get(self)
+    }
+
+    pub fn get_mut<I>(&mut self, idx: I) -> Option<&mut I::Output>
+        where I: SliceIndex<T>
+    {
+        idx.get_mut(self)
+    }
+
+    pub unsafe fn get_unchecked<I>(&self, idx: I) -> &I::Output
+        where I: SliceIndex<T>
+    {
+        idx.get_unchecked(self)
+    }
+
+    pub unsafe fn get_mut_unchecked<I>(&mut self, idx: I) -> &mut I::Output
+        where I: SliceIndex<T>
+    {
+        idx.get_mut_unchecked(self)
+    }
+}
+
+impl<T, I> Index<I> for [T]
+    where I: SliceIndex<T>
+{
+    type Output = I::Output;
+
+    fn index(&self, idx: I) -> &I::Output {
+        idx.index(self)
+    }
+}
+
+impl<T, I> IndexMut<I> for [T]
+    where I: SliceIndex<T>
+{
+    fn index_mut(&mut self, idx: I) -> &mut I::Output {
+        idx.index_mut(self)
+    }
+}
+```
+
+# Drawbacks
+
+- The `SliceIndex` trait is unfortunate - it's tuned for exactly the set of methods it's used by.
+  It only exists because inherent methods cannot be overloaded the same way that trait
+  implementations can be. It would most likely remain unstable indefinitely.
+- Documentation may suffer. Rustdoc output currently explicitly shows each of the ways you can
+  index a slice, while there will simply be a single generic implementation with this change. This
+  may not be that bad, though. The doc block currently seems to provide the most valuable
+  information to newcomers rather than the trait bound, and that will still be present with this
+  change.
+
+# Alternatives
+
+- Stay as is.
+- A previous version of this RFC introduced new `get_slice` etc methods rather than overloading
+  `get` etc. This avoids the utility trait but is somewhat less ergonomic.
+- Instead of one trait amalgamating all of the required methods, we could have one trait per
+  method. This would open a more reasonable door to stabilizing those traits, but adds quite a lot
+  more surface area. Replacing an unstable `SliceIndex` trait with a collection would be
+  backwards compatible.
+
+# Unresolved questions
+
+None
diff --git a/text/1681-macros-1.1.md b/text/1681-macros-1.1.md
new file mode 100644
index 00000000000..0a0aa483f9b
--- /dev/null
+++ b/text/1681-macros-1.1.md
@@ -0,0 +1,585 @@
+- Feature Name: `rustc_macros`
+- Start Date: 2016-07-14
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1681
+- Rust Issue: https://github.com/rust-lang/rust/issues/35900
+
+# Summary
+[summary]: #summary
+
+Extract a very small sliver of today's procedural macro system in the compiler,
+just enough to get basic features like custom derive working, to have an
+eventually stable API.
Ensure that these features will not pose a maintenance +burden on the compiler but also don't try to provide enough features for the +"perfect macro system" at the same time. Overall, this should be considered an +incremental step towards an official "macros 2.0". + +# Motivation +[motivation]: #motivation + +Some large projects in the ecosystem today, such as [serde] and [diesel], +effectively require the nightly channel of the Rust compiler. Although most +projects have an alternative to work on stable Rust, this tends to be far less +ergonomic and comes with its own set of downsides, and empirically it has not +been enough to push the nightly users to stable as well. + +[serde]: https://github.com/serde-rs/serde +[diesel]: http://diesel.rs/ + +These large projects, however, are often the face of Rust to external users. +Common knowledge is that fast serialization is done using serde, but to others +this just sounds like "fast Rust needs nightly". Over time this persistent +thought process creates a culture of "well to be serious you require nightly" +and a general feeling that Rust is not "production ready". + +The good news, however, is that this class of projects which require nightly +Rust almost all require nightly for the reason of procedural macros. Even +better, the full functionality of procedural macros is rarely needed, only +custom derive! Even better, custom derive typically doesn't *require* the features +one would expect from a full-on macro system, such as hygiene and modularity, +that normal procedural macros typically do. The purpose of this RFC, as a +result, is to provide these crates a method of working on stable Rust with the +desired ergonomics one would have on nightly otherwise. + +Unfortunately today's procedural macros are not without their architectural +shortcomings as well. For example they're defined and imported with arcane +syntax and don't participate in hygiene very well. To address these issues, +there are a number of RFCs to develop a "macros 2.0" story: + +* [Changes to name resolution](https://github.com/rust-lang/rfcs/pull/1560) +* [Macro naming and modularisation](https://github.com/rust-lang/rfcs/pull/1561) +* [Procedural macros](https://github.com/rust-lang/rfcs/pull/1566) +* [Macros by example 2.0](https://github.com/rust-lang/rfcs/pull/1584) + +Many of these designs, however, will require a significant amount of work to not +only implement but also a significant amount of work to stabilize. The current +understanding is that these improvements are on the time scale of years, whereas +the problem of nightly Rust is today! + +As a result, it is an explicit non-goal of this RFC to architecturally improve +on the current procedural macro system. The drawbacks of today's procedural +macros will be the same as those proposed in this RFC. The major goal here is +to simply minimize the exposed surface area between procedural macros and the +compiler to ensure that the interface is well defined and can be stably +implemented in future versions of the compiler as well. + +Put another way, we currently have macros 1.0 unstable today, we're shooting +for macros 2.0 stable in the far future, but this RFC is striking a middle +ground at macros 1.1 today! + +# Detailed design +[design]: #detailed-design + +First, before looking how we're going to expose procedural macros, let's +take a detailed look at how they work today. 
+ +### Today's procedural macros + +A procedural macro today is loaded into a crate with the `#![plugin(foo)]` +annotation at the crate root. This in turn looks for a crate named `foo` [via +the same crate loading mechanisms][loader] as `extern crate`, except [with the +restriction][host-restriction] that the target triple of the crate must be the +same as the target the compiler was compiled for. In other words, if you're on +x86 compiling to ARM, macros must also be compiled for x86. + +[loader]: https://github.com/rust-lang/rust/blob/78d49bfac2bbcd48de522199212a1209f498e834/src/librustc_metadata/creader.rs#L480 +[host-restriction]: https://github.com/rust-lang/rust/blob/78d49bfac2bbcd48de522199212a1209f498e834/src/librustc_metadata/creader.rs#L494 + +Once a crate is found, it's required to be a dynamic library as well, and once +that's all verified the compiler [opens it up with `dlopen`][dlopen] (or the +equivalent therein). After loading, the compiler will [look for a special +symbol][symbol] in the dynamic library, and then call it with a macro context. + +[dlopen]: https://github.com/rust-lang/rust/blob/78d49bfac2bbcd48de522199212a1209f498e834/src/librustc_plugin/load.rs#L124 +[symbol]: https://github.com/rust-lang/rust/blob/78d49bfac2bbcd48de522199212a1209f498e834/src/librustc_plugin/load.rs#L136-L139 + +So as we've seen macros are compiled as normal crates into dynamic libraries. +One function in the crate is tagged with `#[plugin_registrar]` which gets wired +up to this "special symbol" the compiler wants. When the function is called with +a macro context, it uses the passed in [plugin registry][registry] to register +custom macros, attributes, etc. + +[registry]: https://github.com/rust-lang/rust/blob/78d49bfac2bbcd48de522199212a1209f498e834/src/librustc_plugin/registry.rs#L30-L69 + +After a macro is registered, the compiler will then continue the normal process +of expanding a crate. Whenever the compiler encounters this macro it will call +this registration with essentially and AST and morally gets back a different +AST to splice in or replace. + +### Today's drawbacks + +This expansion process suffers from many of the downsides mentioned in the +motivation section, such as a lack of hygiene, a lack of modularity, and the +inability to import macros as you would normally other functionality in the +module system. + +Additionally, though, it's essentially impossible to ever *stabilize* because +the interface to the compiler is... the compiler! We clearly want to make +changes to the compiler over time, so this isn't acceptable. To have a stable +interface we'll need to cut down this surface area *dramatically* to a curated +set of known-stable APIs. + +Somewhat more subtly, the technical ABI of procedural macros is also exposed +quite thinly today as well. The implementation detail of dynamic libraries, and +especially that both the compiler and the macro dynamically link to libraries +like libsyntax, cannot be changed. This precludes, for example, a completely +statically linked compiler (e.g. compiled for `x86_64-unknown-linux-musl`). +Another goal of this RFC will also be to hide as many of these technical +details as possible, allowing the compiler to flexibly change how it interfaces +to macros. 
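+
+For concreteness, a rough sketch of what such a plugin crate looks like today
+(these are unstable, nightly-only, compiler-internal interfaces, so treat the
+exact names and signatures as approximate rather than a stable API):
+
+```rust
+#![feature(plugin_registrar, rustc_private)]
+#![crate_type = "dylib"]
+
+extern crate rustc_plugin;
+
+use rustc_plugin::Registry;
+
+// The compiler dlopens this crate, finds the registrar symbol, and calls it
+// with a `Registry` through which expanders are registered.
+#[plugin_registrar]
+pub fn plugin_registrar(reg: &mut Registry) {
+    // e.g. reg.register_macro("my_macro", expand_my_macro);
+}
+```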
+
+## Macros 1.1
+
+Ok, with the background knowledge of what procedural macros are today, let's
+take a look at how we can solve the major problems blocking its stabilization:
+
+* Sharing an API of the entire compiler
+* Frozen interface between the compiler and macros
+
+### `librustc_macro`
+
+Proposed in [RFC 1566](https://github.com/rust-lang/rfcs/pull/1566) and
+described in [this blog post](http://ncameron.org/blog/libmacro/), the
+distribution will now ship with a new `librustc_macro` crate available for macro
+authors. The intention here is that the gory details of how macros *actually*
+talk to the compiler are entirely contained within this one crate. The stable
+interface to the compiler is then entirely defined in this crate, and we can
+make it as small or large as we want. Additionally, like the standard library,
+it can contain unstable APIs to test out new pieces of functionality over time.
+
+The initial implementation of `librustc_macro` is proposed to be *incredibly*
+bare bones:
+
+```rust
+#![crate_name = "macro"]
+
+pub struct TokenStream {
+    // ...
+}
+
+#[derive(Debug)]
+pub struct LexError {
+    // ...
+}
+
+impl FromStr for TokenStream {
+    type Err = LexError;
+
+    fn from_str(s: &str) -> Result<TokenStream, LexError> {
+        // ...
+    }
+}
+
+impl fmt::Display for TokenStream {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        // ...
+    }
+}
+```
+
+That is, there will only be a handful of exposed types and `TokenStream` can
+only be converted to and from a `String`. Eventually the `TokenStream` type will
+more closely resemble token streams [in the compiler
+itself][compiler-tokenstream], and more fine-grained manipulations will be
+available as well.
+
+[compiler-tokenstream]: https://github.com/rust-lang/rust/blob/master/src/libsyntax/tokenstream.rs#L323-L338
+
+### Defining a macro
+
+A new crate type will be added to the compiler, `rustc-macro` (described below),
+indicating a crate that's compiled as a procedural macro. There will not be a
+"registrar" function in this crate type (like there is today), but rather a
+number of functions which act as token stream transformers to implement macro
+functionality.
+
+A macro crate might look like:
+
+```rust
+#![crate_type = "rustc-macro"]
+#![crate_name = "double"]
+
+extern crate rustc_macro;
+
+use rustc_macro::TokenStream;
+
+#[rustc_macro_derive(Double)]
+pub fn double(input: TokenStream) -> TokenStream {
+    let source = input.to_string();
+
+    // Parse `source` for struct/enum declaration, and then build up some new
+    // source code representing a number of items in the implementation of
+    // the `Double` trait for the struct/enum in question.
+    let source = derive_double(&source);
+
+    // Parse this back to a token stream and return it
+    source.parse().unwrap()
+}
+```
+
+This new `rustc_macro_derive` attribute will be allowed inside of a
+`rustc-macro` crate but disallowed in other crate types. It defines a new
+`#[derive]` mode which can be used in a crate. The input here is the entire
+struct that `#[derive]` was attached to, attributes and all. The output is
+**expected to include the `struct`/`enum` itself** as well as any number of
+items to be contextually "placed next to" the initial declaration.
+
+Again, though, there is no hygiene. More specifically, the
+`TokenStream::from_str` method will use the same expansion context as the derive
+attribute itself, not the point of definition of the derive function.
All span +information for the `TokenStream` structures returned by `from_source` will +point to the original `#[derive]` annotation. This means that error messages +related to struct definitions will get *worse* if they have a custom derive +attribute placed on them, because the entire struct's span will get folded into +the `#[derive]` annotation. Eventually, though, more span information will be +stable on the `TokenStream` type, so this is just a temporary limitation. + +The `rustc_macro_derive` attribute requires the signature (similar to [macros +2.0][mac20sig]): + +[mac20sig]: http://ncameron.org/blog/libmacro/#tokenisingandquasiquoting + +```rust +fn(TokenStream) -> TokenStream +``` + +If a macro cannot process the input token stream, it is expected to panic for +now, although eventually it will call methods in `rustc_macro` to provide more +structured errors. The compiler will wrap up the panic message and display it +to the user appropriately. Eventually, however, `librustc_macro` will provide +more interesting methods of signaling errors to users. + +Customization of user-defined `#[derive]` modes can still be done through custom +attributes, although it will be required for `rustc_macro_derive` +implementations to remove these attributes when handing them back to the +compiler. The compiler will still gate unknown attributes by default. + +### `rustc-macro` crates + +Like the rlib and dylib crate types, the `rustc-macro` crate +type is intended to be an intermediate product. What it *actually* produces is +not specified, but if a `-L` path is provided to it then the compiler will +recognize the output artifacts as a macro and it can be loaded for a program. + +Initially if a crate is compiled with the `rustc-macro` crate type (and possibly +others) it will forbid exporting any items in the crate other than those +functions tagged `#[rustc_macro_derive]` and those functions must also be placed +at the crate root. Finally, the compiler will automatically set the +`cfg(rustc_macro)` annotation whenever any crate type of a compilation is the +`rustc-macro` crate type. + +While these properties may seem a bit odd, they're intended to allow a number of +forwards-compatible extensions to be implemented in macros 2.0: + +* Macros eventually want to be imported from crates (e.g. `use foo::bar!`) and + limiting where `#[derive]` can be defined reduces the surface area for + possible conflict. +* Macro crates eventually want to be compiled to be available both at runtime + and at compile time. That is, an `extern crate foo` annotation may load + *both* a `rustc-macro` crate and a crate to link against, if they are + available. Limiting the public exports for now to only custom-derive + annotations should allow for maximal flexibility here. + +### Using a procedural macro + +Using a procedural macro will be very similar to today's `extern crate` system, +such as: + +```rust +#[macro_use] +extern crate double; + +#[derive(Double)] +pub struct Foo; + +fn main() { + // ... +} +``` + +That is, the `extern crate` directive will now also be enhanced to look for +crates compiled as `rustc-macro` in addition to those compiled as `dylib` and +`rlib`. Today this will be temporarily limited to finding *either* a +`rustc-macro` crate or an rlib/dylib pair compiled for the target, but this +restriction may be lifted in the future. + +The custom derive annotations loaded from `rustc-macro` crates today will all be +placed into the same global namespace. 
Any conflicts (shadowing) will cause the
+compiler to generate an error, and they must be resolved by loading only one or
+the other of the `rustc-macro` crates (eventually this will be solved with a
+more principled `use` system in macros 2.0).
+
+### Initial implementation details
+
+This section lays out what the initial implementation details of macros 1.1
+will look like, but none of this will be specified as a stable interface to the
+compiler. These exact details are subject to change over time as the
+requirements of the compiler change, and even amongst platforms these details
+may be subtly different.
+
+The compiler will essentially consider `rustc-macro` crates as `--crate-type
+dylib -C prefer-dynamic`. That is, they will be compiled the same way they are
+today. This namely means that these macros will dynamically link to the same
+standard library as the compiler itself, therefore sharing resources like a
+global allocator, etc.
+
+The `librustc_macro` crate will be compiled as an rlib and a static copy of it
+will be included in each macro. This crate will provide a symbol known by the
+compiler that can be dynamically loaded. The compiler will `dlopen` a macro
+crate in the same way it does today, find this symbol in `librustc_macro`, and
+call it.
+
+The `rustc_macro_derive` attribute will be encoded into the crate's metadata,
+and the compiler will discover all these functions, load their function
+pointers, and pass them to the `librustc_macro` entry point as well. This
+provides the opportunity to register all the various expansion mechanisms with
+the compiler.
+
+The actual underlying representation of `TokenStream` will be basically the same
+as it is in the compiler today. (The details on this are intentionally a little
+light; there shouldn't be much need to go into *too* much detail.)
+
+### Initial Cargo integration
+
+Like plugins today, Cargo needs to understand which crates are `rustc-macro`
+crates and which aren't. Cargo additionally needs to understand this to sequence
+compilations correctly and ensure that `rustc-macro` crates are compiled for the
+host platform. To this end, Cargo will understand a new attribute in the `[lib]`
+section:
+
+```toml
+[lib]
+rustc-macro = true
+```
+
+This annotation indicates that the crate being compiled should be compiled as a
+`rustc-macro` crate type for the host platform in the current compilation.
+
+Eventually Cargo may also grow support to understand that a `rustc-macro` crate
+should be compiled twice, once for the host and once for the target, but this is
+intended to be a backwards-compatible extension to Cargo.
+
+## Pieces to stabilize
+
+Eventually this RFC is intended to be considered for stabilization (after it's
+implemented and proven out on nightly, of course). The summary of pieces that
+would become stable is:
+
+* The `rustc_macro` crate, and a small set of APIs within (skeleton above)
+* The `rustc-macro` crate type, in addition to its current limitations
+* The `#[rustc_macro_derive]` attribute
+* The signature of the `#[rustc_macro_derive]` functions
+* Semantically being able to load macro crates compiled as `rustc-macro` into
+  the compiler, requiring that the crate was compiled by the exact compiler.
+* The semantic behavior of loading custom derive annotations, in that they're
+  just all added to the same global namespace with errors on conflicts.
+  Additionally, definitions end up having no hygiene for now.
+* The `rustc-macro = true` attribute in Cargo + +### Macros 1.1 in practice + +Alright, that's a lot to take in! Let's take a look at what this is all going to +look like in practice, focusing on a case study of `#[derive(Serialize)]` for +serde. + +First off, serde will provide a crate, let's call it `serde_macros`. The +`Cargo.toml` will look like: + +```toml +[package] +name = "serde-macros" +# ... + +[lib] +rustc-macro = true + +[dependencies] +syntex_syntax = "0.38.0" +``` + +The contents will look similar to + +```rust +extern crate rustc_macro; +extern crate syntex_syntax; + +use rustc_macro::TokenStream; + +#[rustc_macro_derive(Serialize)] +pub fn derive_serialize(input: TokenStream) -> TokenStream { + let input = input.to_string(); + + // use syntex_syntax from crates.io to parse `input` into an AST + + // use this AST to generate an impl of the `Serialize` trait for the type in + // question + + // convert that impl to a string + + // parse back into a token stream + return impl_source.parse().unwrap() +} +``` + +Next, crates will depend on this such as: + +```toml +[dependencies] +serde = "0.9" +serde-macros = "0.9" +``` + +And finally use it as such: + +```rust +extern crate serde; +#[macro_use] +extern crate serde_macros; + +#[derive(Serialize)] +pub struct Foo { + a: usize, + #[serde(rename = "foo")] + b: String, +} +``` + +# Drawbacks +[drawbacks]: #drawbacks + +* This is not an interface that would be considered for stabilization in a void, + there are a number of known drawbacks to the current macro system in terms of + how it architecturally fits into the compiler. Additionally, there's work + underway to solve all these problems with macros 2.0. + + As mentioned before, however, the stable version of macros 2.0 is currently + quite far off, and the desire for features like custom derive are very real + today. The rationale behind this RFC is that the downsides are an acceptable + tradeoff from moving a significant portion of the nightly ecosystem onto stable + Rust. + +* This implementation is likely to be less performant than procedural macros + are today. Round tripping through strings isn't always a speedy operation, + especially for larger expansions. Strings, however, are a very small + implementation detail that's easy to see stabilized until the end of time. + Additionally, it's planned to extend the `TokenStream` API in the future to + allow more fine-grained transformations without having to round trip through + strings. + +* Users will still have an inferior experience to today's nightly macros + specifically with respect to compile times. The `syntex_syntax` crate takes + quite a few seconds to compile, and this would be required by any crate which + uses serde. To offset this, though, the `syntex_syntax` could be *massively* + stripped down as all it needs to do is parse struct declarations mostly. There + are likely many other various optimizations to compile time that can be + applied to ensure that it compiles quickly. + +* Plugin authors will need to be quite careful about the code which they + generate as working with strings loses much of the expressiveness of macros in + Rust today. For example: + + ```rust + macro_rules! foo { + ($x:expr) => { + #[derive(Serialize)] + enum Foo { Bar = $x, Baz = $x * 2 } + } + } + foo!(1 + 1); + ``` + + Plugin authors would have to ensure that this is not naively interpreted as + `Baz = 1 + 1 * 2` as this will cause incorrect results. 
The compiler will also + need to be careful to parenthesize token streams like this when it generates + a stringified source. + +* By having separte library and macro crate support today (e.g. `serde` and + `serde_macros`) it's possible for there to be version skew between the two, + making it tough to ensure that the two versions you're using are compatible + with one another. This would be solved if `serde` itself could define or + reexport the macros, but unfortunately that would require a likely much larger + step towards "macros 2.0" to solve and would greatly increase the size of this + RFC. + +* Converting to a string and back loses span information, which can + lead to degraded error messages. For example, currently we can make + an effort to use the span of a given field when deriving code that + is caused by that field, but that kind of precision will not be + possible until a richer interface is available. + +# Alternatives +[alternatives]: #alternatives + +* Wait for macros 2.0, but this likely comes with the high cost of postponing a + stable custom-derive experience on the time scale of years. + +* Don't add `rustc_macro` as a new crate, but rather specify that + `#[rustc_macro_derive]` has a stable-ABI friendly signature. This does not + account, however, for the eventual planned introduction of the `rustc_macro` + crate and is significantly harder to write. The marginal benefit of being + slightly more flexible about how it's run likely isn't worth it. + +* The syntax for defining a macro may be different in the macros 2.0 world (e.g. + `pub macro foo` vs an attribute), that is it probably won't involve a function + attribute like `#[rustc_macro_derive]`. This interim system could possibly use + this syntax as well, but it's unclear whether we have a concrete enough idea + in mind to implement today. + +* The `TokenStream` state likely has some sort of backing store behind it like a + string interner, and in the APIs above it's likely that this state is passed + around in thread-local-storage to avoid threading through a parameter like + `&mut Context` everywhere. An alternative would be to explicitly pass this + parameter, but it might hinder trait implementations like `fmt::Display` and + `FromStr`. Additionally, threading an extra parameter could perhaps become + unwieldy over time. + +* In addition to allowing definition of custom-derive forms, definition of + custom procedural macros could also be allowed. They are similarly + transformers from token streams to token streams, so the interface in this RFC + would perhaps be appropriate. This addition, however, adds more surface area + to this RFC and the macro 1.1 system which may not be necessary in the long + run. It's currently understood that *only* custom derive is needed to move + crates like serde and diesel onto stable Rust. + +* Instead of having a global namespace of `#[derive]` modes which `rustc-macro` + crates append to, we could at least require something along the lines of + `#[derive(serde_macros::Deserialize)]`. This is unfortunately, however, still + disconnected from what name resolution will actually be eventually and also + deviates from what you actually may want, `#[derive(serde::Deserialize)]`, for + example. + +# Unresolved questions +[unresolved]: #unresolved-questions + +* Is the interface between macros and the compiler actually general enough to + be implemented differently one day? 
+
+* The intention of macros 1.1 is to be *as close as possible* to macros 2.0 in
+  spirit and implementation, just without stabilizing vast quantities of
+  features. In that sense, it is the intention that given a stable macros 1.1,
+  we can layer on features backwards-compatibly to get to macros 2.0. Right now,
+  though, the delta between what this RFC proposes and where we'd like to be is
+  very small, and can we get it down to actually zero?
+
+* Eventually macro crates will want to be loaded both at compile time and
+  runtime, and this means that Cargo will need to understand how to compile
+  these crates twice, once as `rustc-macro` and once as an rlib. Does Cargo have
+  enough information to do this? Are the extensions needed here
+  backwards-compatible?
+
+* What sort of guarantees will be provided about the runtime environment for
+  plugins? Are they sandboxed? Are they run in the same process?
+
+* Should the name of this library be `rustc_macros`? The `rustc_` prefix
+  normally means "private". Other alternatives are `macro` (make it a contextual
+  keyword), `macros`, `proc_macro`.
+
+* Should a `Context` or similar style argument be threaded through the APIs?
+  Right now they sort of implicitly require one to be threaded through
+  thread-local-storage.
+
+* Should the APIs here be namespaced, perhaps with a `_1_1` suffix?
+
+* To what extent can we preserve span information through heuristics?
+  Should we adopt a slightly different API, for example one based on
+  concatenation, to allow preserving spans?
+
diff --git a/text/1682-field-init-shorthand.md b/text/1682-field-init-shorthand.md
new file mode 100644
index 00000000000..f0d79f80374
--- /dev/null
+++ b/text/1682-field-init-shorthand.md
@@ -0,0 +1,215 @@
+- Feature Name: field-init-shorthand
+- Start Date: 2016-07-18
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1682
+- Rust Issue: https://github.com/rust-lang/rust/issues/37340
+
+# Summary
+[summary]: #summary
+
+When initializing a data structure (struct, enum, union) with named fields,
+allow writing `fieldname` as a shorthand for `fieldname: fieldname`. This
+allows a compact syntax for initialization, with less duplication.
+
+Example usage:
+
+    struct SomeStruct { field1: ComplexType, field2: AnotherType }
+
+    impl SomeStruct {
+        fn new() -> Self {
+            let field1 = {
+                // Various initialization code
+            };
+            let field2 = {
+                // More initialization code
+            };
+            SomeStruct { field1, field2 }
+        }
+    }
+
+# Motivation
+[motivation]: #motivation
+
+When writing initialization code for a data structure, the names of the
+structure fields often become the most straightforward names to use for their
+initial values as well. At the end of such an initialization function, then,
+the initializer will contain many patterns of repeated field names as field
+values: `field: field, field2: field2, field3: field3`.
+
+Such repetition of the field names makes it less ergonomic to separately
+declare and initialize individual fields, and makes it tempting to instead
+embed complex code directly in the initializer to avoid repetition.
+
+Rust already allows
+[similar syntax for destructuring in pattern matches](https://doc.rust-lang.org/book/patterns.html#destructuring):
+a pattern match can use `SomeStruct { field1, field2 } => ...` to match
+`field1` and `field2` into values with the same names. This RFC introduces
+symmetrical syntax for initializers.
+
+A family of related structures will often use the same field name for a
+semantically-similar value.
Combining this new syntax with the existing +pattern-matching syntax allows simple movement of data between fields with a +pattern match: `Struct1 { field1, .. } => Struct2 { field1 }`. + +The proposed syntax also improves structure initializers in closures, such as +might appear in a chain of iterator adapters: `|field1, field2| SomeStruct { +field1, field2 }`. + +This RFC takes inspiration from the Haskell +[NamedFieldPuns extension](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#record-puns), +and from ES6 +[shorthand property names](http://www.ecma-international.org/ecma-262/6.0/#sec-object-initializer). + +# Detailed design +[design]: #detailed-design + +## Grammar + +In the initializer for a `struct` with named fields, a `union` with named +fields, or an enum variant with named fields, accept an identifier `field` as a +shorthand for `field: field`. + +With reference to the grammar in `parser-lalr.y`, this proposal would +expand the `field_init` +[rule](https://github.com/rust-lang/rust/blob/master/src/grammar/parser-lalr.y#L1663-L1665) +to the following: + + field_init + : ident + | ident ':' expr + ; + +## Interpretation + +The shorthand initializer `field` always behaves in every possible way like the +longhand initializer `field: field`. This RFC introduces no new behavior or +semantics, only a purely syntactic shorthand. The rest of this section only +provides further examples to explicitly clarify that this new syntax remains +entirely orthogonal to other initializer behavior and semantics. + +## Examples + +If the struct `SomeStruct` has fields `field1` and `field2`, the initializer +`SomeStruct { field1, field2 }` behaves in every way like the initializer +`SomeStruct { field1: field1, field2: field2 }`. + +An initializer may contain any combination of shorthand and full field +initializers: + + let a = SomeStruct { field1, field2: expression, field3 }; + let b = SomeStruct { field1: field1, field2: expression, field3: field3 }; + assert_eq!(a, b); + +An initializer may use shorthand field initializers together with +[update syntax](https://doc.rust-lang.org/book/structs.html#update-syntax): + + let a = SomeStruct { field1, .. someStructInstance }; + let b = SomeStruct { field1: field1, .. someStructInstance }; + assert_eq!(a, b); + +## Compilation errors + +This shorthand initializer syntax does not introduce any new compiler errors +that cannot also occur with the longhand initializer syntax `field: field`. +Existing compiler errors that can occur with the longhand initializer syntax +`field: field` also apply to the shorthand initializer syntax `field`: + +- As with the longhand initializer `field: field`, if the structure has no + field with the specified name `field`, the shorthand initializer `field` + results in a compiler error for attempting to initialize a non-existent + field. + +- As with the longhand initializer `field: field`, repeating a field name + within the same initializer results in a compiler error + ([E0062](https://doc.rust-lang.org/error-index.html#E0062)); this occurs with + any combination of shorthand initializers or full `field: expression` + initializers. + +- As with the longhand initializer `field: field`, if the name `field` does not + resolve, the shorthand initializer `field` results in a compiler error for an + unresolved name ([E0425](https://doc.rust-lang.org/error-index.html#E0425)). 
+ +- As with the longhand initializer `field: field`, if the name `field` resolves + to a value with type incompatible with the field `field` in the structure, + the shorthand initializer `field` results in a compiler error for mismatched + types ([E0308](https://doc.rust-lang.org/error-index.html#E0308)). + +# Drawbacks +[drawbacks]: #drawbacks + +This new syntax could significantly improve readability given clear and local +field-punning variables, but could also be abused to decrease readability if +used with more distant variables. + +As with many syntactic changes, a macro could implement this instead. See the +Alternatives section for discussion of this. + +The shorthand initializer syntax looks similar to positional initialization of +a structure without field names; reinforcing this, the initializer will +commonly list the fields in the same order that the struct declares them. +However, the shorthand initializer syntax differs from the positional +initializer syntax (such as for a tuple struct) in that the positional syntax +uses parentheses instead of braces: `SomeStruct(x, y)` is unambiguously a +positional initializer, while `SomeStruct { x, y }` is unambiguously a +shorthand initializer for the named fields `x` and `y`. + +# Alternatives +[alternatives]: #alternatives + +## Wildcards + +In addition to this syntax, initializers could support omitting the field names +entirely, with syntax like `SomeStruct { .. }`, which would implicitly +initialize omitted fields from identically named variables. However, that would +introduce far too much magic into initializers, and the context-dependence +seems likely to result in less readable, less obvious code. + +## Macros + +A macro wrapped around the initializer could implement this syntax, without +changing the language; for instance, `pun! { SomeStruct { field1, field2 } }` +could expand to `SomeStruct { field1: field1, field2: field2 }`. However, this +change exists to make structure construction shorter and more expressive; +having to use a macro would negate some of the benefit of doing so, +particularly in places where brevity improves readability, such as in a closure +in the middle of a larger expression. There is also precedent for +language-level support. Pattern matching already allows using field names as +the _destination_ for the field values via destructuring. This change adds a +symmetrical mechanism for construction which uses existing names as _sources_. + +## Sigils + +To minimize confusing shorthand expressions with the construction of +tuple-like structs, we might elect to prefix expanded field names with +sigils. + +For example, if the sigil were `:`, the existing syntax `S { x: x }` +would be expressed as `S { :x }`. This is used in +[MoonScript](http://moonscript.org/reference/#the-language/table-literals). + +This particular choice of sigil may be confusing, due to the +already-overloaded use of `:` for fields and type ascription. Additionally, +in languages such as Ruby and Elixir, `:x` denotes a symbol or atom, which +may be confusing for newcomers. + +Other sigils could be used instead, but even then we are then increasing +the amount of new syntax being introduced. This both increases language +complexity and reduces the gained compactness, worsening the +cost/benefit ratio of adding a shorthand. Any use of a sigil also breaks +the symmetry between binding pattern matching and the proposed +shorthand. 
+
+## Keyword-prefixed
+
+Similarly to sigils, we could use a keyword like Nix uses
+[inherit](http://nixos.org/nix/manual/#idm46912467627696). Some forms we could
+decide upon (using `use` as the keyword of choice here, but it could be
+something else) could look like the following.
+
+* `S { use x, y, z: 10}`
+* `S { use (x, y), z: 10 }`
+* `S { use {x, y}, z: 10 }`
+* `S { use x, use y, z: 10}`
+
+This has the same drawbacks as sigils except that it won't be confused for
+symbols in other languages or adding more sigils. It also has the benefit
+of being something that can be searched for in documentation.
diff --git a/text/1683-docs-team.md b/text/1683-docs-team.md
new file mode 100644
index 00000000000..72c9a0b7256
--- /dev/null
+++ b/text/1683-docs-team.md
@@ -0,0 +1,105 @@
+- Feature Name: N/A
+- Start Date: 2016-07-21
+- RFC PR: https://github.com/rust-lang/rfcs/pull/1683
+- Rust Issue: N/A
+
+# Summary
+[summary]: #summary
+
+Create a team responsible for documentation for the Rust project.
+
+# Motivation
+[motivation]: #motivation
+
+[RFC 1068] introduced a federated governance model for the Rust project. Several initial subteams were set up. There was a note
+after the [original subteam list] saying this:
+
+[RFC 1068]: https://github.com/rust-lang/rfcs/blob/master/text/1068-rust-governance.md
+[original subteam list]: https://github.com/rust-lang/rfcs/blob/master/text/1068-rust-governance.md#the-teams
+
+> In the long run, we will likely also want teams for documentation and for community events, but these can be spun up once there is a more clear need (and available resources).
+
+Now is the time for a documentation subteam.
+
+## Why documentation was left out
+
+Documentation was left out of the original list because it wasn't clear that there would be anyone but me on it. Furthermore,
+one of the original reasons for the subteams was to decide who gets counted amongst consensus for RFCs, but it was unclear
+how many documentation-related RFCs there would even be.
+
+## Chicken, meet egg
+
+However, RFCs are not only what subteams do. To quote the RFC:
+
+> * Shepherding RFCs for the subteam area. As always, that means (1) ensuring
+> that stakeholders are aware of the RFC, (2) working to tease out various
+> design tradeoffs and alternatives, and (3) helping build consensus.
+> * Accepting or rejecting RFCs in the subteam area.
+> * Setting policy on what changes in the subteam area require RFCs, and reviewing direct PRs for changes that do not require an RFC.
+> * Delegating reviewer rights for the subteam area. The ability to r+ is not limited to team members, and in fact earning r+ rights is a good stepping stone toward team membership. Each team should set reviewing policy, manage reviewing rights, and ensure that reviews take place in a timely manner. (Thanks to Nick Cameron for this suggestion.)
+
+The first two are about RFCs themselves, but the second two are more pertinent to documentation. In particular,
+deciding who gets `r+` rights is important. A lack of clarity in this area has been unfortunate, and has led to a
+chicken and egg situation: without a documentation team, it's unclear how to be more involved in working on Rust's
+documentation, but without people to be on the team, there's no reason to form a team. For this reason, I think
+a small initial team will break this logjam, and provide room for new contributors to grow.
+ +# Detailed design +[design]: #detailed-design + +The Rust documentation team will be responsible for all of the things listed above. Specifically, they will pertain +to these areas of the Rust project: + +* The standard library documentation +* The book and other long-form docs +* Cargo's documentation +* The Error Index + +Furthermore, the documentation team will be available to help with ecosystem documentation, in a few ways. Firstly, +in an advisory capacity: helping people who want better documentation for their crates to understand how to accomplish +that goal. Furthermore, monitoring the overall ecosystem documentation, and identifying places where we could contribute +and make a large impact for all Rustaceans. If the Rust project itself has wonderful docs, but the ecosystem has terrible +docs, then people will still be frustrated with Rust's documentation situation, especially given our anti-batteries-included +attitude. To be clear, this does not mean _owning_ the ecosystem docs, but rather working to contribute in more ways +than just the Rust project itself. + +We will coordinate in the `#rust-docs` IRC room, and have regular meetings, as the team sees fit. Regular meetings will be +important to coordinate broader goals; and participation will be important for team members. We hold meetings weekly. + +## Membership + +* @steveklabnik, team lead +* @GuillaumeGomez +* @jonathandturner +* @peschkaj + +It's important to have a path towards attaining team membership; there are some other people who have already been doing +docs work that aren't on this list. These guidelines are not hard and fast, however, anyone wanting to eventually be a +member of the team should pursue these goals: + +* Contributing documentation patches to Rust itself +* Attending doc team meetings, which are open to all +* generally being available on [IRC][^IRC] to collaborate with others + +I am not quantifying this exactly because it's not about reaching some specific number; adding someone to the team should +make sense if someone is doing all of these things. + +[^IRC]: The #rust-docs channel on irc.mozilla.org + +# Drawbacks +[drawbacks]: #drawbacks + +This is Yet Another Team. Do we have too many teams? I don't think so, but someone might. + +# Alternatives +[alternatives]: #alternatives + +The main alternative is not having a team. This is the status quo, so the situation is well-understood. + +It's possible that docs come under the purvew of "tools", and so maybe the docs team would be an expansion +of the tools team, rather than its own new team. Or some other subteam. + +# Unresolved questions +[unresolved]: #unresolved-questions + +None. diff --git a/text/1696-discriminant.md b/text/1696-discriminant.md new file mode 100644 index 00000000000..e0d888fc73b --- /dev/null +++ b/text/1696-discriminant.md @@ -0,0 +1,118 @@ +- Feature Name: discriminant +- Start Date: 2016-08-01 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1696 +- Rust Issue: [#24263](https://github.com/rust-lang/rust/pull/24263), [#34785](https://github.com/rust-lang/rust/pull/34785) + +# Summary +[summary]: #summary + +Add a function that extracts the discriminant from an enum variant as a comparable, hashable, printable, but (for now) opaque and unorderable type. 
+
+# Motivation
+[motivation]: #motivation
+
+When using an ADT enum that contains data in some of the variants, it is sometimes desirable to know the variant but ignore the data, in order to compare two values by variant or store variants in a hash map when the data is either unhashable or unimportant.
+
+The motivation for this is mostly identical to [RFC 639](https://github.com/rust-lang/rfcs/blob/master/text/0639-discriminant-intrinsic.md#motivation).
+
+# Detailed design
+[design]: #detailed-design
+
+The proposed design has been implemented at [#34785](https://github.com/rust-lang/rust/pull/34785) (after some back-and-forth). That implementation is copied at the end of this section for reference.
+
+A struct `Discriminant<T>` and a free function `fn discriminant<T>(v: &T) -> Discriminant<T>` are added to `std::mem` (for lack of a better home, and noting that `std::mem` already contains similar parametricity escape hatches such as `size_of`). For now, the `Discriminant` struct is simply a newtype over `u64`, because that's what the `discriminant_value` intrinsic returns, and a `PhantomData` to allow it to be generic over `T`.
+
+Making `Discriminant` generic provides several benefits:
+
+- `discriminant(&EnumA::Variant) == discriminant(&EnumB::Variant)` is statically prevented.
+- In the future, we can implement different behavior for different kinds of enums. For example, if we add a way to distinguish C-like enums at the type level, then we can add a method like `Discriminant::into_inner` for only those enums. Or enums with certain kinds of discriminants could become orderable.
+
+The function no longer requires a `Reflect` bound on its argument even though discriminant extraction is a partial violation of parametricity, in that a generic function with no bounds on its type parameters can nonetheless find out some information about the input types, or perform a "partial equality" comparison. This is debatable (see [this comment](https://github.com/rust-lang/rfcs/pull/639#issuecomment-86441840), [this comment](https://github.com/rust-lang/rfcs/pull/1696#issuecomment-236669066) and open question #2), especially in light of specialization. The situation is comparable to `TypeId::of` (which requires the bound) and `mem::size_of_val` (which does not). Note that including a bound is the conservative decision, because it can be backwards-compatibly removed.
+
+```rust
+/// Returns a value uniquely identifying the enum variant in `v`.
+///
+/// If `T` is not an enum, calling this function will not result in undefined behavior, but the
+/// return value is unspecified.
+///
+/// # Stability
+///
+/// Discriminants can change if enum variants are reordered, if a new variant is added
+/// in the middle, or (in the case of a C-like enum) if explicitly set discriminants are changed.
+/// Therefore, relying on the discriminants of enums outside of your crate may be a poor decision.
+/// However, discriminants of an identical enum should not change between minor versions of the
+/// same compiler.
+///
+/// # Examples
+///
+/// This can be used to compare enums that carry data, while disregarding
+/// the actual data:
+///
+/// ```
+/// #![feature(discriminant_value)]
+/// use std::mem;
+///
+/// enum Foo { A(&'static str), B(i32), C(i32) }
+///
+/// assert!(mem::discriminant(&Foo::A("bar")) == mem::discriminant(&Foo::A("baz")));
+/// assert!(mem::discriminant(&Foo::B(1)) == mem::discriminant(&Foo::B(2)));
+/// assert!(mem::discriminant(&Foo::B(3)) != mem::discriminant(&Foo::C(3)));
+/// ```
+pub fn discriminant<T>(v: &T) -> Discriminant<T> {
+    unsafe {
+        Discriminant(intrinsics::discriminant_value(v), PhantomData)
+    }
+}
+
+/// Opaque type representing the discriminant of an enum.
+///
+/// See the `discriminant` function in this module for more information.
+pub struct Discriminant<T>(u64, PhantomData<*const T>);
+
+impl<T> Copy for Discriminant<T> {}
+
+impl<T> clone::Clone for Discriminant<T> {
+    fn clone(&self) -> Self {
+        *self
+    }
+}
+
+impl<T> cmp::PartialEq for Discriminant<T> {
+    fn eq(&self, rhs: &Self) -> bool {
+        self.0 == rhs.0
+    }
+}
+
+impl<T> cmp::Eq for Discriminant<T> {}
+
+impl<T> hash::Hash for Discriminant<T> {
+    fn hash<H: hash::Hasher>(&self, state: &mut H) {
+        self.0.hash(state);
+    }
+}
+
+impl<T> fmt::Debug for Discriminant<T> {
+    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
+        self.0.fmt(fmt)
+    }
+}
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+1. Anytime we reveal more details about the memory representation of a `repr(rust)` type, we add back-compat guarantees. The author is of the opinion that the proposed `Discriminant` newtype still hides enough to mitigate this drawback. (But see open question #1.)
+2. Adding another function and type to core implies an additional maintenance burden, especially when more enum layout optimizations come around (however, there is hardly any burden on top of that associated with the extant `discriminant_value` intrinsic).
+
+# Alternatives
+[alternatives]: #alternatives
+
+1. Do nothing: there is no stable way to extract the discriminant from an enum variant. Users who need such a feature will need to write (or generate) big match statements and hope they optimize well (this has been servo's approach).
+2. Directly stabilize the `discriminant_value` intrinsic, or a wrapper that doesn't use an opaque newtype. This more drastically precludes future enum representation optimizations, and won't be able to take advantage of future type system improvements that would let `discriminant` return a type dependent on the enum.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+1. Can the return value of `discriminant(&x)` be considered stable between subsequent compilations of the same code? How about if the enum in question is changed by modifying a variant's name? By adding a variant?
+2. Is the `T: Reflect` bound necessary?
+3. Can `Discriminant` implement `PartialOrd`?
diff --git a/text/1717-dllimport.md b/text/1717-dllimport.md
new file mode 100644
index 00000000000..3153f493980
--- /dev/null
+++ b/text/1717-dllimport.md
@@ -0,0 +1,148 @@
+- Feature Name: dllimport
+- Start Date: 2016-08-13
+- RFC PR: [rust-lang/rfcs#1717](https://github.com/rust-lang/rfcs/pull/1717)
+- Rust Issue: [rust-lang/rust#37403](https://github.com/rust-lang/rust/issues/37403)
+
+# Summary
+[summary]: #summary
+
+Make the compiler aware of the association between library names adorning `extern` blocks
+and symbols defined within the block. Add attributes and command line switches that leverage
+this association.
+
+# Motivation
+[motivation]: #motivation
+
+Most of the time a linkage directive is only needed to inform the linker about
+what native libraries need to be linked into a program. On some platforms,
+however, the compiler needs more detailed knowledge about what's being linked
+from where in order to ensure that symbols are wired up correctly.
+
+On Windows, when a symbol is imported from a dynamic library, the code that accesses
+this symbol must be generated differently than for symbols imported from a static library.
+
+Currently the compiler is not aware of associations between the libraries and symbols
+imported from them, so it cannot alter code generation based on library kind.
+
+# Detailed design
+[design]: #detailed-design
+
+### Library <-> symbol association
+
+The compiler shall assume that symbols defined within an `extern` block
+are imported from the library mentioned in the `#[link]` attribute adorning the block.
+
+### Changes to code generation
+
+On platforms other than Windows the above association will have no effect.
+On Windows, however, `#[link(..., kind="dylib")]` shall be presumed to mean linking to a DLL,
+whereas `#[link(..., kind="static")]` shall mean static linking. In the former case, all symbols
+associated with that library will be marked with the LLVM [dllimport][1] storage class.
+
+[1]: http://llvm.org/docs/LangRef.html#dll-storage-classes
+
+### Library name and kind variance
+
+Many native libraries are linked on the command line via `-l`, which is passed
+in through Cargo build scripts instead of being written in the source code
+itself. As a recap, a native library may change names across platforms or
+distributions, or it may be linked dynamically in some situations and
+statically in others, which is why build scripts are leveraged to make these
+dynamic decisions. In order to support this kind of dynamism, the following
+modifications are proposed:
+
+- Extend the syntax of the `-l` flag to `-l [KIND=]lib[:NEWNAME]`. The `NEWNAME`
+  part may be used to override the name of a library specified in the source.
+- Add a new meaning to the `KIND` part: if "lib" is already specified in the source,
+  this will override its kind with KIND. Note that this override is possible only
+  for libraries defined in the current crate.
+
+Example:
+
+```rust
+// mylib.rs
+#[link(name="foo", kind="dylib")]
+extern {
+    // dllimport applied
+}
+
+#[link(name="bar", kind="static")]
+extern {
+    // dllimport not applied
+}
+
+#[link(name="baz")]
+extern {
+    // kind defaults to "dylib", dllimport applied
+}
+```
+
+```sh
+rustc mylib.rs -l static=foo # change foo's kind to "static", dllimport will not be applied
+rustc mylib.rs -l foo:newfoo # link newfoo instead of foo, keeping foo's kind as "dylib"
+rustc mylib.rs -l dylib=bar  # change bar's kind to "dylib", dllimport will be applied
+```
+
+### Unbundled static libs (optional)
+
+It has been pointed out that sometimes one may wish to link to a static system library
+(i.e. one that is always available to the linker) without bundling it into .lib's and .rlib's.
+For this use case we'll introduce another library "kind", "static-nobundle".
+Such libraries would be treated in the same way as "static", except they will not be bundled into
+the target .lib/.rlib.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+For libraries to work robustly on MSVC, the correct `#[link]` annotation will
+be required.
Most cases will "just work" on MSVC due to the compiler strongly favoring static linkage, but any symbols imported from a dynamic library or exported as a Rust dynamic library will need to be tagged appropriately to ensure that they work in all situations. Worse still, the `#[link]` annotations on an `extern` block are not required on any other platform to work correctly, meaning that it will be common for these attributes to be left off by accident.
+
+
+# Alternatives
+[alternatives]: #alternatives
+
+- Instead of enhancing `#[link]`, a `#[linked_from = "foo"]` annotation could be added. This has the drawback of not being able to handle native libraries whose name is unpredictable across platforms in an easy fashion, however. Additionally, it adds an extra attribute to the compiler that wasn't known previously.
+
+- Support a `#[dllimport]` attribute on extern blocks (or individual symbols, or both). This has the following drawbacks, however:
+  - This attribute would duplicate the information already provided by `#[link(kind="...")]`.
+  - It is not always known whether `#[dllimport]` is needed. Whether a native library is linked dynamically or statically is often not known in advance (e.g. that's what a build script decides), so `dllimport` would need to be guarded by `cfg_attr`.
+
+- When linking native libraries, the compiler could attempt to locate each library on the filesystem and probe the contents for what symbol names are exported from the native library. This list could then be cross-referenced with all symbols declared in the program locally to understand which symbols are coming from a dylib and which are being linked statically. Some downsides of this approach may include:
+
+  - It's unclear whether this will be a performant operation and not cause undue runtime overhead during compiles.
+
+  - On Windows, linking to a DLL involves linking to its "import library", so it may be difficult to know whether a symbol truly comes from a DLL or not.
+
+  - Locating libraries on the system may be difficult, as the system linker often has search paths baked in that the compiler does not know about.
+
+- As was already mentioned, a "kind" override can affect codegen of the current crate only. Overloading the `-l` flag for this purpose may be confusing to developers. A new codegen flag might be a better fit, for example `-C libkind=KIND=LIB`.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+- Should we allow dropping a library specified in the source from linking via `-l lib:` (i.e. "rename to empty")?
diff --git a/text/1721-crt-static.md b/text/1721-crt-static.md
new file mode 100644
index 00000000000..2ccd66208a4
--- /dev/null
+++ b/text/1721-crt-static.md
@@ -0,0 +1,371 @@
+- Feature Name: `crt_link`
+- Start Date: 2016-08-18
+- RFC PR: [rust-lang/rfcs#1721](https://github.com/rust-lang/rfcs/pull/1721)
+- Rust Issue: [rust-lang/rust#37406](https://github.com/rust-lang/rust/issues/37406)
+
+# Summary
+[summary]: #summary
+
+Enable the compiler to select whether a target dynamically or statically links to a platform's standard C runtime through the introduction of three orthogonal and otherwise general-purpose features, one of which will likely never become stable and can be considered an implementation detail of std. These features do not require the compiler or language to have intrinsic knowledge of the existence of C runtimes.
+
+The end result is that rustc will be able to reuse its existing standard library binaries for the MSVC and musl targets to build code that links either statically or dynamically to libc.
+
+The design herein additionally paves the way for improved support for dllimport/dllexport, and CPU-specific features, particularly when combined with a [std-aware cargo].
+
+[std-aware cargo]: https://github.com/rust-lang/rfcs/pull/1133
+
+# Motivation
+[motivation]: #motivation
+
+Today all targets of rustc hard-code how they link to the native C runtime. For example the `x86_64-unknown-linux-gnu` target links to glibc dynamically, `x86_64-unknown-linux-musl` links statically to musl, and `x86_64-pc-windows-msvc` links dynamically to MSVCRT. There are many use cases, however, where these decisions are not suitable. For example, binaries on Alpine Linux want to link dynamically to musl, and creating portable binaries on Windows is most easily done by linking statically to MSVCRT.
+
+Today rustc has no mechanism for accomplishing this besides defining an entirely new target specification and distributing a build of the standard library for it. Because target specifications must be described by a target triple, and target triples have preexisting conventions into which such a scheme does not fit, we have resisted doing so.
+
+# Detailed design
+[design]: #detailed-design
+
+This RFC introduces three separate features to the compiler and Cargo. When combined they will enable the compiler to change whether the C standard library is linked dynamically or statically. In isolation each feature is a natural extension of existing features, and each should be useful on its own.
+
+A key insight is that, for practical purposes, the object code _for the standard library_ does not need to change based on how the C runtime is being linked; though it is true that on Windows, it is _generally_ important to properly manage the use of dllimport/dllexport attributes based on the linkage type, and C code does need to be compiled with specific options based on the linkage type. So it is technically possible to produce Rust executables and dynamic libraries that link to libc either statically or dynamically from a single std binary by correctly manipulating the arguments to the linker.
+
+A second insight is that there are multiple existing, unserved use cases for configuring features of the hardware architecture, underlying platform, or runtime [1], which require the entire 'world', possibly including std, to be compiled a certain way. C runtime linkage is another example of this requirement.
+
+[1]: https://internals.rust-lang.org/t/pre-rfc-a-vision-for-platform-architecture-configuration-specific-apis/3502
+
+From these observations we can design a cross-platform solution spanning both Cargo and the compiler by which Rust programs may link to either a dynamic or static C library, using only a single std binary. As future work this RFC discusses how the proposed scheme can be extended to rebuild std specifically for a particular C-linkage scenario, which may have minor advantages on Windows due to issues around dllimport and dllexport; and how this scheme naturally extends to recompiling std in the presence of modified CPU features.
+
+This RFC does *not* propose unifying how the C runtime is linked across platforms (e.g. always dynamically or always statically) but instead leaves that decision to each target, and to future work.
+ +In summary the new mechanics are: + +- Specifying C runtime linkage via `-C target-feature=+crt-static` or `-C + target-feature=-crt-static`. This extends `-C target-feature` to mean not just + "CPU feature" ala LLVM, but "feature of the Rust target". Several existing + properties of this flag, the ability to add, with `+`, _or remove_, with `-`, + the feature, as well as the automatic lowering to `cfg` values, are crucial to + later aspects of the design. This target feature will be added to targets via + a small extension to the compiler's target specification. +- Lowering `cfg` values to Cargo build script environment variables. This will + enable build scripts to understand all enabled features of a target (like + `crt-static` above) to, for example, compile C code correctly on MSVC. +- Lazy link attributes. This feature is only required by std's own copy of the + libc crate, and only because std is distributed in binary form and it may yet + be a long time before Cargo itself can rebuild std. + +### Specifying dynamic/static C runtime linkage + +A new `target-feature` flag will now be supported by the compiler for relevant +targets: `crt-static`. This can be enabled and disabled in the compiler via: + +``` +rustc -C target-feature=+crt-static ... +rustc -C target-feature=-crt-static ... +``` + +Currently all `target-feature` flags are passed through straight to LLVM, but +this proposes extending the meaning of `target-feature` to Rust-target-specific +features as well. Target specifications will be able to indicate what custom +target-features can be defined, and most existing targets will define a new +`crt-static` feature which is turned off by default (except for musl). + +The default of `crt-static` will be different depending on the target. For +example `x86_64-unknown-linux-musl` will have it on by default, whereas +`arm-unknown-linux-musleabi` will have it turned off by default. + +### Lowering `cfg` values to Cargo build script environment variables + +Cargo will begin to forward `cfg` values from the compiler into build +scripts. Currently the compiler supports `--print cfg` as a flag to print out +internal cfg directives, which Cargo uses to implement platform-specific +dependencies. + +When Cargo runs a build script it already sets a [number of environment +variables][cargo-build-env], and it will now set a family of `CARGO_CFG_*` +environment variables as well. For each key printed out from `rustc --print +cfg`, Cargo will set an environment variable for the build script to learn +about. + +[cargo-build-env]: http://doc.crates.io/environment-variables.html#environment-variables-cargo-sets-for-build-scripts + +For example, locally `rustc --print cfg` prints: + +``` +target_os="linux" +target_family="unix" +target_arch="x86_64" +target_endian="little" +target_pointer_width="64" +target_env="gnu" +unix +debug_assertions +``` + +And with this Cargo would set the following environment variables for build +script invocations for this target. 
+ +``` +export CARGO_CFG_TARGET_OS=linux +export CARGO_CFG_TARGET_FAMILY=unix +export CARGO_CFG_TARGET_ARCH=x86_64 +export CARGO_CFG_TARGET_ENDIAN=little +export CARGO_CFG_TARGET_POINTER_WIDTH=64 +export CARGO_CFG_TARGET_ENV=gnu +export CARGO_CFG_UNIX +export CARGO_CFG_DEBUG_ASSERTIONS +``` + +As mentioned in the previous section, the linkage of the C standard library will +be specified as a target feature, which is lowered to a `cfg` value, thus giving +build scripts the ability to modify compilation options based on C standard +library linkage. One important complication here is that `cfg` values in Rust +may be defined multiple times, and this is the case with target features. When a +`cfg` value is defined multiple times, Cargo will create a single environment +variable with a comma-separated list of values. + +So for a target with the following features enabled + +``` +target_feature="sse" +target_feature="crt-static" +``` + +Cargo would convert it to the following environment variable: + +``` +export CARGO_CFG_TARGET_FEATURE=sse,crt-static +``` + +Through this method build scripts will be able to learn how the C standard +library is being linked. This is crucially important for the MSVC target where +code needs to be compiled differently depending on how the C library is linked. + +This feature ends up having the added benefit of informing build scripts about +selected CPU features as well. For example once the `target_feature` `#[cfg]` +is stabilized build scripts will know whether SSE/AVX/etc are enabled features +for the C code they might be compiling. + +After this change, the gcc-rs crate will be modified to check for the +`CARGO_CFG_TARGET_FEATURE` directive, and parse it into a list of enabled +features. If the `crt-static` feature is not enabled it will compile C code on +the MSVC target with `/MD`, indicating dynamic linkage. Otherwise if the value +is `static` it will compile code with `/MT`, indicating static linkage. Because +today the MSVC targets use dynamic linkage and gcc-rs compiles C code with `/MD`, +gcc-rs will remain forward and backwards compatible with existing and future +Rust MSVC toolchains until such time as the the decision is made to change the +MSVC toolchain to `+crt-static` by default. + +### Lazy link attributes + +The final feature that will be added to the compiler is the ability to "lazily" +interpret the linkage requirements of a native library depending on values of +`cfg` at compile time of downstream crates, not of the crate with the `#[link]` +directives. This feature is never intended to be stabilized, and is instead +targeted at being an unstable implementation detail of the `libc` crate linked +to `std` (but _not_ the stable `libc` crate deployed to crates.io). + +Specifically, the `#[link]` attribute will be extended with a new argument +that it accepts, `cfg(..)`, such as: + +```rust +#[link(name = "foo", cfg(bar))] +``` + +This `cfg` indicates to the compiler that the `#[link]` annotation only applies +if the `bar` directive is matched. This interpretation is done not during +compilation of the crate in which the `#[link]` directive appears, but during +compilation of the crate in which linking is finally performed. The compiler +will then use this knowledge in two ways: + +* When `dllimport` or `dllexport` needs to be applied, it will evaluate the + final compilation unit's `#[cfg]` directives and see if upstream `#[link]` + directives apply or not. 
+ +* When deciding what native libraries should be linked, the compiler will + evaluate whether they should be linked or not depending on the final + compilation's `#[cfg]` directives and the upstream `#[link]` directives. + +### Customizing linkage to the C runtime + +With the above features, the following changes will be made to select the +linkage of the C runtime at compile time for downstream crates. + +First, the `libc` crate will be modified to contain blocks along the lines of: + +```rust +cfg_if! { + if #[cfg(target_env = "musl")] { + #[link(name = "c", cfg(target_feature = "crt-static"), kind = "static")] + #[link(name = "c", cfg(not(target_feature = "crt-static")))] + extern {} + } else if #[cfg(target_env = "msvc")] { + #[link(name = "msvcrt", cfg(not(target_feature = "crt-static")))] + #[link(name = "libcmt", cfg(target_feature = "crt-static"))] + extern {} + } else { + // ... + } +} +``` + +This informs the compiler that, for the musl target, if the CRT is statically +linked then the library named `c` is included statically in libc.rlib. If the +CRT is linked dynamically, however, then the library named `c` will be linked +dynamically. Similarly for MSVC, a static CRT implies linking to `libcmt` and a +dynamic CRT implies linking to `msvcrt` (as we do today). + +Finally, an example of compiling for MSVC and linking statically to the C +runtime would look like: + +``` +RUSTFLAGS='-C target-feature=+crt-static' cargo build --target x86_64-pc-windows-msvc +``` + +and similarly, compiling for musl but linking dynamically to the C runtime would +look like: + +``` +RUSTFLAGS='-C target-feature=-crt-static' cargo build --target x86_64-unknown-linux-musl +``` + +### Future work + +The features proposed here are intended to be the absolute bare bones of support +needed to configure how the C runtime is linked. A primary drawback, however, is +that it's somewhat cumbersome to select the non-default linkage of the CRT. +Similarly, however, it's cumbersome to select target CPU features which are not +the default, and these two situations are very similar. Eventually it's intended +that there's an ergonomic method for informing the compiler and Cargo of all +"compilation codegen options" over the usage of `RUSTFLAGS` today. + +Furthermore, it would have arguably been a "more correct" choice for Rust to by +default statically link to the CRT on MSVC rather than dynamically. While this +would be a breaking change today due to how C components are compiled, if this +RFC is implemented it should not be a breaking change to switch the defaults in +the future, after a reasonable transition period. + +The support in this RFC implies that the exact artifacts that we're shipping +will be usable for both dynamically and statically linking the CRT. +Unfortunately, however, on MSVC code is compiled differently if it's linking to +a dynamic library or not. The standard library uses very little of the MSVCRT, +so this won't be a problem in practice for now, but runs the risk of binding our +hands in the future. It's intended, though, that Cargo [will eventually support +custom-compiling the standard library][std-aware cargo]. The `crt-static` +feature would simply be another input to this logic, so Cargo would +custom-compile the standard library if it differed from the upstream artifacts, +solving this problem. 
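+
+As a closing illustration of the mechanics described in this section, here is a minimal, purely hypothetical sketch of a `build.rs` that consumes the `CARGO_CFG_TARGET_FEATURE` variable proposed above to choose the MSVC CRT flavor; real crates would delegate the flag handling to a library like gcc-rs:
+
+```rust
+// build.rs (illustrative sketch only, not part of the proposal's normative text)
+use std::env;
+
+fn main() {
+    // Cargo lowers `rustc --print cfg` output into CARGO_CFG_* variables,
+    // joining multi-valued cfgs with commas, e.g. "sse,crt-static".
+    let features = env::var("CARGO_CFG_TARGET_FEATURE").unwrap_or_default();
+    let crt_static = features.split(',').any(|f| f == "crt-static");
+
+    // On MSVC, C code must be built with /MT for a static CRT and /MD for a
+    // dynamic CRT; other platforms can ignore this choice.
+    let crt_flag = if crt_static { "/MT" } else { "/MD" };
+    println!("cargo:warning=would compile C code with {}", crt_flag);
+}
+```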
+
+### References
+
+- [Issue about MSVCRT static linking](https://github.com/rust-lang/libc/issues/290)
+- [Issue about musl dynamic linking](https://github.com/rust-lang/rust/issues/34987)
+- [Discussion on issues around global codegen configuration](https://internals.rust-lang.org/t/pre-rfc-a-vision-for-platform-architecture-configuration-specific-apis/3502)
+- [std-aware Cargo RFC](https://github.com/rust-lang/rfcs/pull/1133). A proposal to teach Cargo to build the standard library. Rebuilding of std will likely in the future be influenced by `-C target-feature`.
+- [Cargo's documentation on build-script environment variables](http://doc.crates.io/environment-variables.html#environment-variables-cargo-sets-for-build-scripts)
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+* Working with `RUSTFLAGS` can be cumbersome, but as explained above it's planned that there will eventually be a much more ergonomic configuration method for other codegen options like `target-cpu`, which would also encompass the linkage of the CRT.
+
+* Adding a feature which is intended never to be stabilized (`#[link(.., cfg(..))]`) is somewhat unfortunate, but it allows sidestepping some of the thornier questions about how this works. The stable *semantics* will be that for some targets the `-C target-feature=+crt-static` flag affects the linkage of the CRT, which seems like a worthy goal regardless.
+
+* The lazy semantics of `#[link(cfg(..))]` are not so obvious from the name (no other `cfg` attribute is treated this way). But this seems a minor issue, since the feature serves one implementation-specific purpose and isn't intended for stabilization.
+
+# Alternatives
+[alternatives]: #alternatives
+
+* One alternative is to add entirely new targets, for example `x86_64-pc-windows-msvc-static`. Unfortunately we don't have a great naming convention for this, and it also isn't extensible to other codegen options like `target-cpu`. Additionally, adding a new target is a pretty heavyweight solution, as we'd have to start distributing new artifacts and such.
+
+* Another possibility would be to start storing metadata in the "target name" along the lines of `x86_64-pc-windows-msvc+static`. This is a pretty big design space, though, which may not play well with Cargo and build scripts, so for now it's preferred to avoid this rabbit hole of design if possible.
+
+* Finally, the compiler could simply have an environment variable which indicates the CRT linkage. This would then be read by the compiler and by build scripts, and the compiler would have its own back channel for changing the linkage of the C library along the lines of `#[link(.., cfg(..))]` above.
+
+* Another approach has [been proposed recently][rfc-1684] that has rustc define an environment variable to specify the C runtime kind.
+
+[rfc-1684]: https://github.com/rust-lang/rfcs/pull/1684
+
+* Instead of extending the semantics of `-C target-feature` beyond "CPU features", we could instead add a new flag for the purpose, e.g. `-C custom-feature`.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+* What happens during the `cfg` to environment variable conversion for values that contain commas? It's an unusual corner case, and build scripts should not depend on such values, but it needs to be handled sanely.
+
+* Is it really true that lazy linking is only needed by std's libc? What about in a world where we distribute more precompiled binaries than just std?
+
diff --git a/text/1725-unaligned-access.md b/text/1725-unaligned-access.md
new file mode 100644
index 00000000000..6424f0c61c6
--- /dev/null
+++ b/text/1725-unaligned-access.md
@@ -0,0 +1,63 @@
+- Feature Name: `unaligned_access`
+- Start Date: 2016-08-22
+- RFC PR: [rust-lang/rfcs#1725](https://github.com/rust-lang/rfcs/pull/1725)
+- Rust Issue: [rust-lang/rust#37955](https://github.com/rust-lang/rust/issues/37955)
+
+# Summary
+[summary]: #summary
+
+Add two functions, `ptr::read_unaligned` and `ptr::write_unaligned`, which allow reading from and writing to an unaligned pointer. All other functions that access memory (`ptr::{read,write}`, `ptr::copy{_nonoverlapping}`, etc.) require that a pointer be suitably aligned for its type.
+
+# Motivation
+[motivation]: #motivation
+
+One major use case is to make working with packed structs easier:
+
+```rust
+#[repr(packed)]
+struct Packed(u8, u16, u8);
+
+let mut a = Packed(0, 1, 0);
+unsafe {
+    let b = ptr::read_unaligned(&a.1);
+    ptr::write_unaligned(&mut a.1, b + 1);
+}
+```
+
+Other use cases generally involve parsing file formats or network protocols that use unaligned values.
+
+# Detailed design
+[design]: #detailed-design
+
+The implementations of these functions are simple wrappers around `ptr::copy_nonoverlapping`. The pointers are cast to `u8` to ensure that LLVM does not make any assumptions about the alignment.
+
+```rust
+pub unsafe fn read_unaligned<T>(p: *const T) -> T {
+    let mut r = mem::uninitialized();
+    ptr::copy_nonoverlapping(p as *const u8,
+                             &mut r as *mut _ as *mut u8,
+                             mem::size_of::<T>());
+    r
+}
+
+pub unsafe fn write_unaligned<T>(p: *mut T, v: T) {
+    ptr::copy_nonoverlapping(&v as *const _ as *const u8,
+                             p as *mut u8,
+                             mem::size_of::<T>());
+}
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+These functions aren't *strictly* necessary, since they are just convenience wrappers around `ptr::copy_nonoverlapping`.
+
+# Alternatives
+[alternatives]: #alternatives
+
+We could simply not add these; however, figuring out how to do unaligned access properly is extremely unintuitive: you need to cast the pointer to `*mut u8` and then call `ptr::copy_nonoverlapping`.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None
diff --git a/text/1728-north-star.md b/text/1728-north-star.md
new file mode 100644
index 00000000000..8b815c11e7b
--- /dev/null
+++ b/text/1728-north-star.md
@@ -0,0 +1,476 @@
+- Feature Name: north_star
+- Start Date: 2016-08-07
+- RFC PR: #1728
+- Rust Issue: N/A
+
+# Summary
+[summary]: #summary
+
+A refinement of the Rust planning and reporting process, to establish a shared vision of the project among contributors, to make clear the roadmap toward that vision, and to celebrate our achievements.
+
+Rust's roadmap will be established in year-long cycles, where we identify up front - together, as a project - the most critical problems facing the language and its ecosystem, along with the story we want to be able to tell the world about Rust. Work toward solving those problems, our short-term goals, will be decided by the individual teams, as they see fit, and regularly re-triaged. For the purposes of reporting the project roadmap, goals will be assigned to release cycle milestones.
+
+At the end of the year we will deliver a public facing retrospective, describing the goals we achieved and how to use the new features in detail. It will celebrate the year's progress toward our goals, as well as the achievements of the wider community.
It will evaluate our performance and anticipate its impact +on the coming year. + +The primary outcome for these changes to the process are that we will have a +consistent way to: + +- Decide our project-wide goals through consensus. +- Advertise our goals as a published roadmap. +- Celebrate our achievements with an informative publicity-bomb. + +# Motivation +[motivation]: #motivation + +Rust is a massive project and ecosystem, developed by a massive team of +mostly-independent contributors. What we've achieved together already is +mind-blowing: we've created a uniquely powerful platform that solves problems +that the computing world had nearly given up on, and jumpstarted a new era in +systems programming. Now that Rust is out in the world, proving itself to be a +stable foundation for building the next generation of computing systems, the +possibilities open to us are nearly endless. + +And that's a big problem. + +In the run-up to the release of Rust 1.0 we had a clear, singular goal: get Rust +done and deliver it to the world. We established the discrete steps necessary +to get there, and although it was a tense period where the entire future of the +project was on the line, we were united in a single mission. As The Rust Project +Developers we were pumped up, and our user base - along with the wider +programming world - were excited to see what we would deliver. + +But 1.0 is a unique event, and since then our efforts have become more diffuse +even as the scope of our ambitions widen. This shift is inevitable: **our success +post-1.0 depends on making improvements in increasingly broad and complex ways**. +The downside, of course, is that a less singular focus can make it much harder +to rally our efforts, to communicate a clear story - and ultimately, to ship. + +Since 1.0, we've attempted to lay out some major goals, both through the +[internals forum] and the [blog]. We've done pretty well in actually achieving +these goals, and in some cases - particularly [MIR] - the community has really +come together to produce amazing, focused results. But in general, there are +several problems with the status quo: + +[internals forum]: https://internals.rust-lang.org/t/priorities-after-1-0/1901 +[blog]: https://blog.rust-lang.org/2015/08/14/Next-year.html +[MIR]: https://blog.rust-lang.org/2016/04/19/MIR.html + +- We have not systematically tracked or communicated our progression through the + completion of these goals, making it difficult for even the most immersed + community members to know where things stand, and making it difficult for + *anyone* to know how or where to get involved. A symptom is that questions + like "When is MIR landing?" or "What are the blockers for `?` stabilizing" + become extremely frequently-asked. **We should provide an at-a-glance view + what Rust's current strategic priorities are and how they are progressing.** + +- We are overwhelmed by an avalanche of promising ideas, with major RFCs + demanding attention (and languishing in the queue for months) while subteams + focus on their strategic goals. This state of affairs produces needless + friction and loss of momentum. **We should agree on and disseminate our + priorities, so we can all be pulling in roughly the same direction**. + +- We do not have any single point of release, like 1.0, that gathers together a + large body of community work into a single, polished product. 
Instead, we have + a rapid release process, which results in a [remarkably stable and reliable + product][s] but can paradoxically reduce pressure to ship new features in a + timely fashion. **We should find a balance, retaining rapid release but + establishing some focal point around which to rally the community, polish a + product, and establish a clear public narrative**. + +[s]: http://blog.rust-lang.org/2014/10/30/Stability.html + +All told, there's a lot of room to do better in establishing, communicating, and +driving the vision for Rust. + +This RFC proposes changes to the way The Rust Project plans its work, +communicates and monitors its progress, directs contributors to focus on the +strategic priorities of the project, and finally, delivers the results of its +effort to the world. + +The changes proposed here are intended to work with the particular strengths of +our project - community development, collaboration, distributed teams, loose +management structure, constant change and uncertainty. It should introduce +minimal additional burden on Rust team members, who are already heavily +overtasked. The proposal does not attempt to solve all problems of project +management in Rust, nor to fit the Rust process into any particular project +management structure. Let's make a few incremental improvements that will have +the greatest impact, and that we can accomplish without disruptive changes to +the way we work today. + +# Detailed design +[design]: #detailed-design + +Rust's roadmap will be established in year-long cycles, where we identify up +front the most critical problems facing the project, formulated as _problem +statements_. Work toward solving those problems, _goals_, will be planned as +part of the release cycles by individual teams. For the purposes of reporting +the project roadmap, goals will be assigned to _release cycle milestones_, which +represent the primary work performed each release cycle. Along the way, teams +will be expected to maintain _tracking issues_ that communicate progress toward +the project's goals. + +At the end of the year we will deliver a public facing retrospective, which is +intended as a 'rallying point'. Its primary purposes are to create anticipation +of a major event in the Rust world, to motivate (rally) contributors behind the +goals we've established to get there, and generate a big PR-bomb where we can +brag to the world about what we've done. It can be thought of as a 'state of the +union'. This is where we tell Rust's story, describe the new best practices +enabled by the new features we've delivered, celebrate those contributors who +helped achieve our goals, honestly evaluate our performance, and look forward to +the year to come. + +## Summary of terminology + +Key terminology used in this RFC: + +- _problem statement_ - A description of a major issue facing Rust, possibly + spanning multiple teams and disciplines. We decide these together, every year, + so that everybody understands the direction the project is taking. These are + used as the broad basis for decision making throughout the year, and are + captured in the yearly "north star RFC", and tagged `R-problem-statement` + on the issue tracker. + +- _goal_ - These are set by individual teams quarterly, in service of solving + the problems identified by the project. They have estimated deadlines, and + those that result in stable features have estimated release numbers. Goals may + be subdivided into further discrete tasks on the issue tracker. They are + tagged `R-goal`. 
+ +- _retrospective_ - At the end of the year we deliver a retrospective report. It + presents the result of work toward each of our goals in a way that serves to + reinforce the year's narrative. These are written for public consumption, + showing off new features, surfacing interesting technical details, and + celebrating those who contribute to achieving the project's goals and + resolving it's problems. + +- _release cycle milestone_ - All goals have estimates for completion, placed on + milestones that correspond to the 6 week release cycle. These milestones are + timed to corrspond to a release cycle, but don't represent a specific + release. That is, work toward the current nightly, the current beta, or even + that doesn't directly impact a specific release, all goes into the release + cycle milestone corresponding to the time period in which the work is + completed. + +## Problem statements and the north star RFC + +The full planning cycle spans one year. At the beginning of the cycle we +identify areas of Rust that need the most improvement, and at the end of the +cycle is a 'rallying point' where we deliver to the world the results of our +efforts. We choose year-long cycles because a year is enough time to accomplish +relatively large goals; and because having the rallying point occur at the same +time every year makes it easy to know when to anticipate big news from the +project. Being calendar-based avoids the temptation to slip or produce +feature-based releases, instead providing a fixed point of accountability for +shipping. + +This planning effort is _problem-oriented_. Focusing on "how" may seem like an +obvious thing to do, but in practice it's very easy to become enamored of +particular technical ideas and lose sight of the larger context. By codifying a +top-level focus on motivation, we ensure we are focusing on the right problems +and keeping an open mind on how to solve them. Consensus on the problem space +then frames the debate on solutions, helping to avoid surprises and hurt +feelings, and establishing a strong causal record for explaining decisions in +the future. + +At the beginning of the cycle we spend no more than one month deciding on a +small set of _problem statements_ for the project, for the year. The number +needs to be small enough to present to the community managably, while also +sufficiently motivating the primary work of all the teams for the year. 8-10 is +a reasonable guideline. This planning takes place via the RFC process and is +open to the entire community. The result of the process is the yearly 'north +star RFC'. + +The problem statements established here determine the strategic direction of the +project. They identify critical areas where the project is lacking and represent +a public commitment to fixing them. They should be informed in part by inputs +like [the survey] and [production user outreach], as well as an open discussion +process. And while the end-product is problem-focused, the discussion is likely +to touch on possible solutions as well. We shouldn't blindly commit to solving a +problem without some sense for the plausibility of a solution in terms of both +design and resources. + +[the survey]: https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html +[production user outreach]: https://internals.rust-lang.org/t/production-user-research-summary/2530 + +Problem statements consist of a single sentence summarizing the problem, and one +or more paragraphs describing it (and its importance!) in detail. 
Examples of +good problem statements might be: + +- The Rust compiler is too slow for a tight edit-compile-test cycle +- Rust lacks world-class IDE support +- The Rust story for asynchronous I/O is very primitive +- Rust compiler errors are difficult to understand +- Rust plugins have no clear path to stabilization +- Rust doesn't integrate well with garbage collectors +- Rust's trait system doesn't fully support zero-cost abstractions +- The Rust community is insufficiently diverse +- Rust needs more training materials +- Rust's CI infrastructure is unstable +- It's too hard to obtain Rust for the platforms people want to target + +During the actual process each of these would be accompanied by a paragraph or +more of justification. + +We strictly limit the planning phase to one month in order to keep the +discussion focused and to avoid unrestrained bikeshedding. The activities +specified here are not the focus of the project and we need to get through them +efficiently and get on with the actual work. + +The core team is responsible for initiating the process, either on the internals +forum or directly on the RFC repository, and the core team is responsible for +merging the final RFC, thus it will be their responsibility to ensure that the +discussion drives to a reasonable conclusion in time for the deadline. + +Once the year's problem statements are decided, a metabug is created for each on +the rust-lang/rust issue tracker and tagged `R-problem-statement`. In the OP of +each metabug the teams are responsible for maintaining a list of their goals, +linking to tracking issues. + +Like other RFCs, the north star RFC is not immutable, and if new motivations +arise during the year, it may be amended, even to the extent of adding +additional problem statements; though it is not appropriate for the project +to continually rehash the RFC. + +## Goal setting and tracking progress + +During the regular 6-week release cycles is where the solutions take shape and +are carried out. Each cycle teams are expected to set concrete _goals_ that work +toward solving the project's stated problems; and to review and revise their +previous goals. The exact forum and mechanism for doing this evaluation and +goal-setting is left to the individual teams, and to future experimentation, +but the end result is that each release cycle each team will document their +goals and progress in a standard format. + +A goal describes a task that contributes to solving the year's problems. It may +or may not involve a concrete deliverable, and it may be in turn subdivided into +further goals. Not all the work items done by teams in a quarter should be +considered a goal. Goals only need to be granular enough to demonstrate +consistent progress toward solving the project's problems. Work that contributes +toward quarterly goals should still be tracked as sub-tasks of those goals, but +only needs to be filed on the issue tracker and not reported directly as goals +on the roadmap. + +For each goal the teams will create an issue on the issue tracker tagged with +`R-goal`. Each goal must be described in a single sentence summary with an +end-result or deliverable that is as crisply stated as possible. Goals with +sub-goals and sub-tasks must list them in the OP in a standard format. 
+ +During each cycle all `R-goal` and `R-unstable` issues assigned to each team +must be triaged and updated for the following information: + +- The set of sub-goals and sub-tasks and their status +- The release cycle milestone + +Goals that will be likely completed in this cycle or the next should be assigned +to the appropriate milestone. Some goals may be expected to be completed in +the distant future, and these do not need to be assigned a milestone. + +The release cycle milestone corresponds to a six week period of time and +contains the work done during that time. It does not correspend to a specific +release, nor do the goals assigned to it need to result in a stable feature +landing in any specific release. + +Release cycle milestones serve multiple purposes, not just tracking of the goals +defined in this RFC: `R-goal` tracking, tracking of stabilization of +`R-unstable` and `R-RFC-approved` features, tracking of critical bug fixes. + +Though the release cycle milestones are time-oriented and are not strictly tied +to a single upcoming release, from the set of assigned `R-unstable` issues one +can derive the new features landing in upcoming releases. + +During the last week of every release cycle each team will write a brief +report summarizing their goal progress for the cycle. Some project member +will compile all the team reports and post them to internals.rust-lang.org. +In addition to providing visibility into progress, these will be sources +to draw from for the subsequent release announcements. + +## The retrospective (rallying point) + +The retrospective is an opportunity to showcase the best of Rust and its +community to the world. + +It is a report covering all the Rust activity of the past year. It is written +for a broad audience: contributors, users and non-users alike. It reviews each +of the problems we tackled this year and the goals we achieved toward solving +them, and it highlights important work in the broader community and +ecosystem. For both these things the retrospective provides technical detail, as +though it were primary documentation; this is where we show our best side to the +world. It explains new features in depth, with clear prose and plentiful +examples, and it connects them all thematically, as a demonstration of how to +write cutting-edge Rust code. + +While we are always lavish with our praise of contributors, the retrospective is +the best opportunity to celebrate specific individuals and their contributions +toward the strategic interests of the project, as defined way back at the +beginning of the year. + +Finally, the retrospective is an opportunity to evaluate our performance. Did we +make progress toward solving the problems we set out to solve? Did we outright +solve any of them? Where did we fail to meet our goals and how might we do +better next year? + +Since the retrospective must be a high-quality document, and cover a lot of +material, it is expected to require significant planning, editing and revision. +The details of how this will work are to be determined. + +## Presenting the roadmap + +As a result of this process the Rust roadmap for the year is encoded in three +main ways, that evolve over the year: + +- The north-star RFC, which contains the problem statements collected in one + place +- The R-problem-statement issues, which contain the individual problem + statements, each linking to supporting goals +- The R-goal issues, which contain a hierarchy of work items, tagged with + metadata indicating their statuses. 
+ +Alone, these provide the *raw data* for a roadmap. A user could run a +GitHub query for all `R-problem-statement` issues, and by digging through them +get a reasonably accurate picture of the roadmap. + +However, for the process to be a success, we need to present the roadmap in a +way that is prominent, succinct, and layered with progressive detail. There is a +lot of opportunity for design here; an early prototype of one possible view is +available [here]. + +[here]: https://brson.github.io/rust-z + +Again, the details are to be determined. + +## Calendar + +The timing of the events specified by this RFC is precisely specified in order +to set clear expectations and accountability, and to avoid process slippage. The +activities specified here are not the focus of the project and we need to get +through them efficiently and get on with the actual work. + +The north star RFC development happens during the month of September, starting +September 1 and ending by October 1. This means that an RFC must be ready for +FCP by the last week of September. We choose September for two reasons: it is +the final month of a calendar quarter, allowing the beginning of the years work +to commence at the beginning of calendar Q4; we choose Q4 because it is the +traditional conference season and allows us opportunities to talk publicly about +both our previous years progress as well as next years ambitions. By contrast, +starting with Q1 of the calendar year is problematic due to the holiday season. + +Following from the September planning month, the quarterly planning cycles take +place for exactly one week at the beginning of the calendar quarter; likewise, +the planning for each subsequent quarter at the beginning of the calendar +quarter; and the development of the yearly retrospective approximately for the +month of August. + +The survey and other forms of outreach and data gathering should be timed to fit +well into the overall calendar. + +## References + +- [Refining RFCs part 1: Roadmap] + (https://internals.rust-lang.org/t/refining-rfcs-part-1-roadmap/3656), + the internals.rust-lang.org thread that spawned this RFC. +- [Post-1.0 priorities thread on internals.rust-lang.org] + (https://internals.rust-lang.org/t/priorities-after-1-0/1901). +- [Post-1.0 blog post on project direction] + (https://blog.rust-lang.org/2015/08/14/Next-year.html). +- [Blog post on MIR] + (https://blog.rust-lang.org/2016/04/19/MIR.html), + a large success in strategic community collaboration. +- ["Stability without stagnation"] + (http://blog.rust-lang.org/2014/10/30/Stability.html), + outlining Rust's philosophy on rapid iteration while maintaining strong + stability guarantees. +- [The 2016 state of Rust survey] + (https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html), + which indicates promising directions for future work. +- [Production user outreach thread on internals.rust-lang.org] + (https://internals.rust-lang.org/t/production-user-research-summary/2530), + another strong indicator of Rust's needs. +- [rust-z] + (https://brson.github.io/rust-z), + a prototype tool to organize the roadmap. + +# Drawbacks +[drawbacks]: #drawbacks + +The yearly north star RFC could be an unpleasant bikeshed, because it +simultaneously raises the stakes of discussion while moving away from concrete +proposals. That said, the *problem* orientation should help facilitate +discussion, and in any case it's vital to be explicit about our values and +prioritization. 
+ +While part of the aim of this proposal is to increase the effectiveness of our +team, it also imposes some amount of additional work on everyone. Hopefully the +benefits will outweigh the costs. + +The end-of-year retrospective will require significant effort. It's not clear +who will be motivated to do it, and at the level of quality it demands. This is +the piece of the proposal that will probably need the most follow-up work. + +# Alternatives +[alternatives]: #alternatives + +Instead of imposing further process structure on teams we might attempt to +derive a roadmap solely from the data they are currently producing. + +To serve the purposes of a 'rallying point', a high-profile deliverable, we +might release a software product instead of the retrospective. A larger-scope +product than the existing rustc+cargo pair could accomplish this, i.e. +[The Rust Platform](http://aturon.github.io/blog/2016/07/27/rust-platform/) idea. + +Another rallying point could be a long-term support release. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Are 1 year cycles long enough? + +Are 1 year cycles too long? What happens if important problems come up +mid-cycle? + +Does the yearly report serve the purpose of building anticipation, motivation, +and creating a compelling PR-bomb? + +Is a consistent time-frame for the big cycle really the right thing? One of the +problems we have right now is that our release cycles are so predictable they +are almost boring. It could be more exciting to not know exactly when the cycle +is going to end, to experience the tension of struggling to cross the finish +line. + +How can we account for work that is not part of the planning process +described here? + +How do we address problems that are outside the scope of the standard library +and compiler itself? (See +[The Rust Platform](http://aturon.github.io/blog/2016/07/27/rust-platform/) for +an alternative aimed at this goal.) + +How do we motivate the improvement of rust-lang crates and other libraries? Are +they part of the planning process? The retrospective? + +'Problem statement' is not inspiring terminology. We don't want to our roadmap +to be front-loaded with 'problems'. Likewise, 'goal' and 'retrospective' could +be more colorful. + +Can we call the yearly RFC the 'north star RFC'? Too many concepts? + +What about tracking work that is not part of R-problem-statement and R-goal? I +originally wanted to track all features in a roadmap, but this does not account +for anything that has not been explicitly identified as supporting the +roadmap. As formulated this proposal does not provide an easy way to find the +status of arbitrary features in the RFC pipeline. + +How do we present the roadmap? Communicating what the project is working on and +toward is one of the _primary goals_ of this RFC and the solution it proposes is +minimal - read the R-problem-statement issues. diff --git a/text/1774-roadmap-2017.md b/text/1774-roadmap-2017.md new file mode 100644 index 00000000000..ae260c2090f --- /dev/null +++ b/text/1774-roadmap-2017.md @@ -0,0 +1,594 @@ +- Feature Name: N/A +- Start Date: 2016-10-04 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1774 +- Rust Issue: N/A + +# Summary +[summary]: #summary + +This RFC proposes the *2017 Rust Roadmap*, in accordance with [RFC 1728](https://github.com/rust-lang/rfcs/pull/1728). The goal of the roadmap is to lay out a vision for where the Rust project should be in a year's time. 
**This year's focus is improving Rust's *productivity*, while retaining its emphasis on fast, reliable code**. At a high level, by the end of 2017: + +* Rust should have a lower learning curve +* Rust should have a pleasant edit-compile-debug cycle +* Rust should provide a solid, but basic IDE experience +* Rust should provide easy access to high quality crates +* Rust should be well-equipped for writing robust, high-scale servers +* Rust should have 1.0-level crates for essential tasks +* Rust should integrate easily into large build systems +* Rust's community should provide mentoring at all levels + +In addition, we should make significant strides in *exploring* two areas where +we're not quite ready to set out specific goals: + +* Integration with other languages, running the gamut from C to JavaScript +* Usage in resource-constrained environments + +The proposal is based on the [2016 survey], systematic outreach, direct conversations with individual Rust users, and an extensive [internals thread]. Thanks to everyone who helped with this effort! + +[2016 survey]: https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html +[internals thread]: https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/ + +# Motivation +[motivation]: #motivation + +There's no end of possible improvements to Rust—so what do we use to guide our +thinking? + +The core team has tended to view our strategy not in terms of particular features or +aesthetic goals, but instead in terms of **making Rust successful while staying +true to its core values**. This basic sentiment underlies much of the proposed +roadmap, so let's unpack it a bit. + +## Making Rust successful + +### The measure of success + +What does it mean for Rust to be successful? There are a lot of good answers to +this question, a lot of different things that draw people to use or contribute +to Rust. But regardless of our *personal* values, there's at least one clear +measure for Rust's broad success: **people should be using Rust in +production and reaping clear benefits from doing so**. + +- Production use matters for the obvious reason: it grows the set of + stakeholders with potential to invest in the language and ecosystem. To + deliver on that potential, Rust needs to be part of the backbone of some major + products. + +- Production use measures our *design* success; it's the ultimate reality + check. Rust takes a unique stance on a number of tradeoffs, which we believe + to position it well for writing fast and reliable software. The real test of + those beliefs is people using Rust to build large, production systems, on + which they're betting time and money. + +- The *kind* of production use matters. For Rust to truly be a success, there + should be clear-cut reasons people are employing it rather than another + language. Rust needs to provide crisp, standout benefits to the organizations + using it. + +The idea here is *not* about "taking over the world" with Rust; it's not about +market share for the sake of market share. But if Rust is truly delivering a +valuable new way of programming, we should be seeing that benefit in "the real +world", in production uses that are significant enough to help sustain Rust's +development. + +That's not to say we should expect to see this usage *immediately*; there's a +long pipeline for technology adoption, so the effects of our work can take a +while to appear. The framing here is about our long-term aims. 
We should be +making investments in Rust today that will position it well for this kind of +success in the future. + +### The obstacles to success + +At this point, we have a fair amount of data about how Rust is reaching its +audience, through the [2016 survey], informal conversations, and explicit +outreach to (pre-)production shops (writeup coming soon). The data from the +survey is generally corroborated by these other venues, so let's focus on that. + +[2016 survey]: https://blog.rust-lang.org/2016/06/30/State-of-Rust-Survey-2016.html + +We asked both current and potential users what most stands in the way of their +using Rust, and got some pretty clear answers: + +- 1 in 4: learning curve +- 1 in 7: lack of libraries +- 1 in 9: general “maturity” concerns +- 1 in 19: lack of IDEs (1 in 4 non-users) +- 1 in 20: compiler performance + +None of these obstacles is directly about the core language or `std`; people are +generally happy with what the language offers today. Instead, the connecting +theme is *productivity*—how quickly can I start writing real code? bring up a +team? prototype and iterate? debug my code? And so on. + +In other words, our primary challenge isn't making Rust "better" in the +abstract; it's making people *productive* with Rust. The need is most pronounced +in the early stages of Rust learning, where we risk losing a large pool of +interested people if we can't get them over the hump. Evidence from the survey +and elsewhere suggests that once people do get over the initial learning curve, +they tend to stick around. + +So how do we pull it off? + +### Core values + +Part of what makes Rust so exciting is that it attempts to eliminate some +seemingly fundamental tradeoffs. The central such tradeoff is between safety +and speed. Rust strives for + +- uncompromising reliability +- uncompromising performance + +and delivers on this goal largely thanks to its fundamental concept of +ownership. + +But there's a problem: at first glance, "productivity" and "learnability" may +seem at odds with Rust's core goals. It's common to hear the refrain that +"fighting with the borrow checker" is a rite of passage for Rustaceans. Or that +removing papercuts would mean glossing over safety holes or performance cliffs. + +To be sure, there are tradeoffs here. But as above, if there's one thing the +Rust community knows how to do, it's bending the curve around tradeoffs—memory +safety without garbage collection, concurrency without data races, and all the +rest. We have many examples in the language where we've managed to make a +feature pleasant to use, while also providing maximum performance and +safety—closures are a particularly good example, but there are +[others](https://internals.rust-lang.org/t/roadmap-2017-productivity-learning-curve-and-expressiveness/4097). + +And of course, beyond the core language, "productivity" also depends a lot on +tooling and the ecosystem. Cargo is one example where Rust's tooling provides a +huge productivity boost, and we've been working hard on other aspects of +tooling, like the +[compiler's error messages](https://blog.rust-lang.org/2016/08/10/Shape-of-errors-to-come.html), +that likewise have a big impact on productivity. There's so much more we can be +doing in this space. + +In short, **productivity should be a core value of Rust**. By the end of 2017, +let's try to earn the slogan: + +- Rust: fast, reliable, productive—pick three. 
+ +# Detailed design +[design]: #detailed-design + +## Overall strategy + +In the abstract, reaching the kind of adoption we need means bringing +people along a series of distinct steps: + +- Public perception of Rust +- First contact +- Early play, toy projects +- Public projects +- Personal investment +- Professional investment + +We need to (1) provide "drivers", i.e. strong motivation to continue through the +stages and (2) avoid "blockers" that prevent people from progressing. + +At the moment, our most immediate adoption obstacles are mostly about blockers, +rather than a lack of drivers: there are people who see potential value in Rust, +but worry about issues like productivity, tooling, and maturity standing in the +way of use at scale. The roadmap proposes a set of goals largely angled at +reducing these blockers. + +However, for Rust to make sense to use in a significant way in production, it +also needs to have a "complete story" for one or more domains of use. The goals +call out a specific domain where we are already seeing promising production use, +and where we have a relatively clear path toward a more complete story. + +Almost all of the goals focus squarely on "productivity" of one kind or another. + +## Goals + +Now to the meat of the roadmap: the goals. Each is phrased in terms of a +*qualitative vision*, trying to carve out what the *experience* of Rust should +be in one year's time. The details mention some possible avenues toward a +solution, but this shouldn't be taken as prescriptive. + +These goals are partly informed from the [internals thread] about the +roadmap. That thread also posed a number of possible additional goals. Of +course, part of the work of the roadmap is to allocate our limited resources, +which fundamentally means not including some possible goals. Some of the most +promising suggestions that didn't make it into the roadmap proposal itself are +included in the Alternatives section. + +### Rust should have a lower learning curve + +Rust offers a unique value proposition in part because it offers a unique +feature: its ownership model. Because the concept is not (yet!) a widespread one +in other languages, it is something most people have to learn from scratch +before hitting their stride with Rust. And that often comes on top of other +aspects of Rust that may be less familiar. A common refrain is "the first couple +of weeks are tough, but it's oh so worth it." How many people are bouncing off +of Rust in those first couple of weeks? How many team leads are reluctant to +introduce Rust because of the training needed? (1 in 4 survey respondents +mentioned the learning curve.) + +Here are some strategies we might take to lower the learning curve: + +- **Improved docs**. While the existing Rust book has been successful, we've + learned a lot about teaching Rust, and there's a + [rewrite](http://words.steveklabnik.com/whats-new-with-the-rust-programming-language) + in the works. The effort is laser-focused on the key areas that trip people up + today (ownership, modules, strings, errors). + +- **Gathering cookbooks, examples, and patterns**. One way to quickly get + productive in a language is to work from a large set of examples and + known-good patterns that can guide your early work. As a community, we could + push crates to include more substantial example code snippets, and organize + efforts around design patterns and cookbooks. 
(See + [the commentary on the RFC thread](https://github.com/rust-lang/rfcs/pull/1774#issuecomment-269359228) + for much more detail.) + +- **Improved errors**. We've already made some + [big strides](https://blog.rust-lang.org/2016/08/10/Shape-of-errors-to-come.html) + here, particularly for ownership-related errors, but there's surely more room + for improvement. + +- **Improved language features**. There are a couple of ways that the language + design itself can be oriented toward learnability. First, we can introduce new + features with an explicit eye toward + [how they will be taught](https://github.com/rust-lang/rfcs/pull/1636). Second, + we can improve existing features to make them easier to understand and use -- + with non-lexical lifetimes being a major example. There's already been + [some discussion on internals](https://internals.rust-lang.org/t/roadmap-2017-productivity-learning-curve-and-expressiveness/4097/). + +- **IDEs and other tooling**. IDEs provide a good opportunity for deeper + teaching. An IDE can visualize errors, for example *showing* you the lifetime + of a borrow. They can also provide deeper inspection of what's going on with + things like method dispatch, type inference, and so on. + +### Rust should have a pleasant edit-compile-debug cycle + +The edit-compile-debug cycle in Rust takes too long, and it's one of the +complaints we hear most often from production users. We've laid down a good +foundation with [MIR][] (now turned on by default) and [incremental compilation][] +(which recently hit alpha). But we need to continue pushing hard to actually +deliver the improvements. And to fully address the problem, **the improvement +needs to apply to large Rust projects, not just small or mid-sized benchmarks**. + +To get this done, we're also going to need further improvements to the +performance monitoring infrastructure, including more benchmarks. Note, though, +that the goal is stated *qualitatively*, and we need to be careful with what we +measure to ensure we don't lose sight of that goal. + +The most obvious routes are direct improvements like incremental +compilation, but since the focus here is primarily on development (including +debugging), another promising avenue is more usable debug builds. Production +users often say "debug binaries are too slow to run, but release binaries are +too slow to build". There may be a lot of room in the middle. + +Depending on how far we want to take IDE support (see below), pushing +incremental compilation up through the earliest stages of the compiler may also +be important. + +[MIR]: https://blog.rust-lang.org/2016/04/19/MIR.html +[incremental compilation]: https://blog.rust-lang.org/2016/09/08/incremental.html + +### Rust should provide a solid, but basic IDE experience + +For many people—even whole organizations—IDEs are an essential part of the +programming workflow. In the survey, 1 in 4 respondents mentioned requiring IDE +support before using Rust seriously. Tools like [Racer] and the [IntelliJ] Rust +plugin have made great progress this year, but [compiler integration] is in its +infancy, which limits the kinds of tools that general IDE plugins can provide. + +The problem statement here says "solid, but basic" rather than "world-class" IDE +support to set realistic expectations for what we can get done this year.
Of +course, the precise contours will need to be driven by implementation work, but +we can enumerate some basic constraints for such an IDE here: + +- It should be **reliable**: it shouldn't crash, destroy work, or give inaccurate + results in situations that demand precision (like refactorings). +- It should be **responsive**: the interface should never hang waiting on the + compiler or other computation. In places where waiting is required, the + interface should update as smoothly as possible, while providing + responsiveness throughout. +- It should provide **basic functionality**. At a minimum, that's: syntax + highlighting, basic code navigation (e.g. go-to-definition), code completion, + build support (with Cargo integration), error integration, and code + formatting. + +Note that while some of this functionality is available in existing IDE/plugin +efforts, a key part of this initiative is to (1) lay the foundation for plugins +based on compiler integration (2) pull together existing tools into a single +service that can integrate with multiple IDEs. + +[Racer]: https://github.com/phildawes/racer +[IntelliJ]: https://intellij-rust.github.io/ +[compiler integration]: https://internals.rust-lang.org/t/introducing-rust-language-server-source-release/4209/ + +### Rust should provide easy access to high quality crates + +Another major message from the survey and elsewhere is that Rust's ecosystem, +while growing, is still immature (1 in 9 survey respondents mentioned +this). Maturity is not something we can rush. But there are steps we can take +across the ecosystem to help improve the quality and discoverability of crates, +both of which will help increase the overall sense of maturity. + +Some avenues for quality improvement: + +- Provide stable, extensible test/bench frameworks. +- Provide more push-button CI setup, e.g. have `cargo new` set up Travis/Appveyor. +- Restart the [API guidelines](http://aturon.github.io/) project. +- Use badges on crates.io to signal various quality metrics. +- Perform API reviews on important crates. + +Some avenues for discoverability improvement: + +- Adding categories to crates.io, making it possible to browse lists like + "crates for parsing". +- More sophisticated ranking and/or curation. + +A number of ideas along these lines were discussed in the [Rust Platform thread]. + +[Rust Platform thread]: https://internals.rust-lang.org/t/proposal-the-rust-platform/3745 + +### Rust should be well-equipped for writing robust, high-scale servers + +The biggest area we've seen with interest in production Rust so far is the +server, particularly in cases where high-scale performance, control, and/or +reliability are paramount. At the moment, our ecosystem in this space is +nascent, and production users are having to build a lot from scratch. + +Of the specific domains we might target for having a more complete story, Rust +on the server is the place with the clearest direction and momentum. In a year's +time, it's within reach to drastically improve Rust's server ecosystem and the +overall experience of writing server code. The relevant pieces here include +foundations for async IO, language improvements for async code ergonomics, +shared infrastructure for writing services (including abstractions for +implementing protocols and middleware), and endless interfaces to existing +services/protocols. + +There are two reasons to focus on the robust, high-scale case. 
Most importantly, +it's the place where Rust has the clearest value proposition relative to other +languages, and hence the place where we're likeliest to achieve significant, +quality production usage (as discussed earlier in the RFC). More generally, the +overall server space is *huge*, so choosing a particular niche provides +essential focus for our efforts. + +### Rust should have 1.0-level crates for essential tasks + +Rust has taken a decidedly lean approach to its standard library, preferring for +much of the typical "batteries included" functionality to live externally in the +crates.io ecosystem. While there are a lot of benefits to that approach, it's +important that we do in fact provide the batteries somewhere: we need 1.0-level +functionality for essential tasks. To pick just one example, the `rand` crate +has suffered from a lack of vision and has effectively stalled before reaching +1.0 maturity, despite its central importance for a non-trivial part of the +ecosystem. + +There are two basic strategies we might take to close these gaps. + +The first is to identify a broad set of "essential tasks" by, for example, +finding the commonalities between large "batteries included" standard libraries, +and focus community efforts on bolstering crates in these areas. With sustained +and systematic effort, we can probably help push a number of these crates to 1.0 +maturity this year. + +A second strategy is to focus specifically on tasks that play to Rust's +strengths. For example, Rust's potential for [fearless concurrency] across a +range of paradigms is one of the most unique and exciting aspects of the +language. But we aren't fully delivering on this potential, due to the +immaturity of libraries in the space. The response to work in this space, like +the recent [futures library announcement], suggests that there is a lot of +pent-up demand and excitement, and that this kind of work can open a lot of +doors for Rust. So concurrency/asynchrony/parallelism is one segment of the +ecosystem that likely deserves particular focus (and feeds into the high-scale +server goal as well); there are likely others. + +[fearless concurrency]: http://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.html +[futures library announcement]: http://aturon.github.io/blog/2016/08/11/futures/ + +### Rust should integrate easily into large build systems + +When working with larger organizations interested in using Rust, one of the +first hurdles we tend to run into is fitting into an existing build +system. We've been exploring a number of different approaches, each of which +ends up using Cargo (and sometimes `rustc`) in different ways, with different +stories about how to incorporate crates from the broader crates.io ecosystem. +Part of the issue seems to be a perceived overlap between functionality in Cargo +(and its notion of compilation unit) and in ambient build systems, but we have +yet to truly get to the bottom of the issues—and it may be that the problem is +one of communication, rather than of some technical gap. + +By the end of 2017, this kind of integration should be *easy*: as a community, +we should have a strong understanding of best practices, and potentially build +tooling in support of those practices. And of course, we want to approach this +goal with Rust's values in mind, ensuring that first-class access to the +crates.io ecosystem is a cornerstone of our eventual story. 
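To make this concrete, one integration direction that is already possible today is to have the ambient build system query Cargo for the crate graph rather than re-encoding that information in its own rules. The sketch below is purely illustrative and is not something this roadmap prescribes: a hypothetical external build tool shells out to `cargo metadata`, whose JSON output describes packages, targets, and the dependency graph.

```rust
use std::process::Command;

fn main() {
    // Ask Cargo to describe the workspace; `--format-version 1` pins the JSON
    // schema so that external tools can parse it reliably.
    let output = Command::new("cargo")
        .args(&["metadata", "--format-version", "1"])
        .output()
        .expect("failed to run `cargo metadata`");

    // stdout holds JSON describing packages, targets, and the resolve graph;
    // a build system would feed this into its own rule generator.
    println!("{}", String::from_utf8_lossy(&output.stdout));
}
```

Whether this kind of approach, richer export of Cargo's build information, or something else entirely is the right answer is exactly the open question this goal is meant to settle.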
+ +### Rust's community should provide mentoring at all levels + +The Rust community is awesome, in large part because of how welcoming it is. But +we could do a lot more to help grow people into roles in the project, including +pulling together important work items at all levels of expertise to direct people +to, providing mentoring, and having a clearer on-ramp to the various official +Rust teams. Outreach and mentoring are also among the best avenues for +increasing diversity in the project, which, as the survey demonstrates, has a +lot of room for improvement. + +While there's work here for *all* the teams, the community team in particular +will continue to focus on early-stage outreach, while other teams will focus on +leadership onboarding. + +## Areas of exploration + +The goals above represent the steps we think are most essential to Rust's +success in 2017, and where we are in a position to lay out a fairly concrete vision. + +Beyond those goals, however, there are a number of areas with strong potential +for Rust that are in a more exploratory phase, with subcommunities already +exploring the frontiers. Some of these areas are important enough that we want +to call them out explicitly, and will expect ongoing progress over the course of +the year. In particular, the subteams are expected to proactively help organize +and/or carry out explorations in these areas, and by the end of the year we +expect to have greater clarity around Rust's story for these areas, putting us +in a position to give more concrete goals in subsequent roadmaps. + +Here are the two proposed Areas of Exploration. + +### Integration with other languages + +Other languages here include "low-level" cases like C/C++, and "high-level" +cases like JavaScript, Ruby, Python, Java and C#. Rust adoption often depends on +being able to start using it *incrementally*, and language integration is often +a key to doing so -- an intuition substantiated by data from the survey and +commercial outreach. + +Rust's core support for interfacing with C is fairly strong, but wrapping a C +library still involves tedious work mirroring declarations and writing C shims +or other glue code (a small sketch of this hand-written mirroring appears after the list below). Moreover, many projects that are ripe for Rust integration +are currently using C++, and interfacing with those effectively requires +maintaining an alternative C wrapper for the C++ APIs. This is a problem both +for Rust code that wants to employ existing libraries and for those who want to +integrate Rust into existing C/C++ codebases. + +For interfacing with "high-level" languages, there is the additional barrier of +working with a runtime system, which often involves integration with a garbage +collector and object system. There are ongoing projects on these fronts, but +it's early days and there are still a lot of open questions. + +Some potential avenues of exploration include: + +- Continuing work on bindgen, with focus on seamless C and eventually C++ + support. This may involve some FFI-related language extensions (like richer + `repr`). +- Other routes for C/C++ integration. +- Continued expansion of existing projects like + [Helix](https://github.com/rustbridge/helix) and + [Neon](https://github.com/dherman/neon), which may require some language + enhancements. +- Continued work on [GC integration hooks](http://manishearth.github.io/blog/2016/08/18/gc-support-in-rust-api-design/). +- Investigation of object system integrations, including DOM and + [GObject](https://internals.rust-lang.org/t/rust-and-gnome-meeting-notes/4339).
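As a concrete illustration of the hand-written mirroring described above (and of what tools like bindgen aim to generate automatically), here is a minimal sketch that declares and calls a single C function by hand. The function chosen, C's `strlen`, is just a stand-in for a real library API; a real wrapper crate repeats this kind of declaration for every function, type, and constant it exposes.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

extern "C" {
    // A hand-written mirror of the C declaration `size_t strlen(const char *s);`.
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    let msg = CString::new("hello").unwrap();
    // Calling into C is `unsafe`; a wrapper crate would hide this behind a
    // safe, idiomatic API.
    let len = unsafe { strlen(msg.as_ptr()) };
    println!("strlen = {}", len);
}
```

Automating exactly this kind of boilerplate, and extending it to C++, is what the bindgen work above is about.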
+ +### Usage in resource-constrained environments + +Rust is a natural fit for programming resource-constrained devices, and +there are some [ongoing efforts](https://github.com/rust-embedded/) to better +organize work in this area, as well as a +[thread](https://internals.rust-lang.org/t/roadmap-2017-needs-of-no-std-embedded-developers/4096) +on the current significant problems in the domain. Embedded devices likewise +came up repeatedly in the internals thread. It's also a potentially huge +market. At the moment, though, it's far from clear what it will take to achieve +significant production use in the embedded space. It would behoove us to try to +get a clearer picture of this space in 2017. + +Some potential avenues of exploration include: + +- Continuing work on [rustup](https://github.com/rust-lang-nursery/rustup.rs/), + [xargo](https://github.com/japaric/xargo) and similar tools for easing + embedded development. +- Land ["std-aware Cargo"](https://github.com/rust-lang/rfcs/pull/1133), making + it easier to experiment with ports of the standard library to new platforms. +- Work on + [scenarios](https://internals.rust-lang.org/t/fleshing-out-libstd-scenarios/4206) + or other techniques for cutting down `std` in various ways, depending on + platform capabilities. +- Develop a story for failable allocation in `std` (i.e., without aborting when + out of memory). + +## Non-goals + +Finally, it's important that the roadmap "have teeth": we should be focusing on +the goals, and avoid getting distracted by other improvements that, whatever +their appeal, could sap bandwidth and our ability to ship what we believe is +most important in 2017. + +To that end, it's worth making some explicit *non*-goals, to set expectations +and short-circuit discussions: + +- No major new language features, except in service of one of the goals. Cases + that have a very strong impact on the "areas of support" may be considered + case-by-case. + +- No major expansions to `std`, except in service of one of the goals. Cases + that have a very strong impact on the "areas of support" may be considered + case-by-case. + +- No Rust 2.0. In particular, no changes to the language or `std` that could be + perceived as "major breaking changes". We need to be doing everything we can + to foster maturity in Rust, both in reality and in perception, and ongoing + stability is an important part of that story. + +# Drawbacks and alternatives +[drawbacks]: #drawbacks + +It's a bit difficult to enumerate the full design space here, given how much +there is we could potentially be doing. Instead, we'll take a look at some +alternative high-level strategies, and some additional goals from the internals +thread. + +## Overall strategy + +At a high level, though, the biggest alternatives (and potential for drawbacks) +are probably at the strategic level. This roadmap proposal takes the approach of +(1) focusing on reducing clear blockers to Rust adoption, particularly connected +with productivity and (2) choosing one particular "driver" for adoption to +invest in, namely high-scale servers. The balance between blocker/driver focus +could be shifted—it might be the case that by providing more incentive to use +Rust in a particular domain, people are willing to overlook some of its +shortcomings. + +Another possible blind spot is the conservative take on language expansion, +particularly when it comes to productivity. For example, we could put much +greater emphasis on "metaprogramming", and try to complete Plugins 2.0 +in 2017. 
That kind of investment *could* pay dividends, since libraries can do +amazing things with plugins that could draw people to Rust. But, as above, the +overall strategy of reducing blockers assumes that what's most needed isn't more +flashy examples of Rust's power, but rather more bread-and-butter work on +reducing friction, improving tooling, and just making Rust easier to use across +the board. + +The roadmap is informed by the survey, systematic outreach, numerous direct +conversations, and general strategic thinking. But there could certainly be +blind spots and biases. It's worth double-checking our inputs. + +## Other ideas from the internals thread + +Finally, there were several strong contenders for additional goals from the internals +thread that we might consider. To be clear, these are not currently part of the +proposed goals, but we may want to consider elevating them: + +- A goal explicitly for + [systematic expansion of commercial use](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/68); + this proposal takes that as a kind of overarching idea for all of the goals. + +- A goal for Rust infrastructure, which came + [up](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/9) + [several](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/68) + [times](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/5). + While this goal seems quite worthwhile in terms of paying dividends across the + project, in terms of our current subteam makeup it's hard to see how to + allocate resources toward this goal without dropping other important goals. We + might consider forming a dedicated infrastructure team, or somehow organizing + and growing our bandwidth in this area. + +- A goal for progress in areas like + [scientific computing](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/52), + [HPC](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/48). + +After an exhaustive look at the thread, the remaining proposals are in one way +or another covered somewhere in the discussion above. + +# Unresolved questions +[unresolved]: #unresolved-questions + +The main unresolved question is how to break the given goals into more +deliverable pieces of work, but that's a process that will happen after the +overall roadmap is approved. + +Are there other "areas of support" we should consider? Should any of these areas +be elevated to a top-level goal (which would likely involve cutting back on some +other goal)? + +Should we consider some loose way of organizing "special interest groups" to +focus on some of the priorities not part of the official goal set, but where +greater coordination would be helpful? This was suggested +[multiple](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/70) +[times](https://internals.rust-lang.org/t/setting-our-vision-for-the-2017-cycle/3958/135). + +Finally, there were several strong contenders for additional goals from the +internals thread that we might consider, which are listed at the end of the +goals section. 
diff --git a/text/1828-rust-bookshelf.md b/text/1828-rust-bookshelf.md new file mode 100644 index 00000000000..1222116a7c8 --- /dev/null +++ b/text/1828-rust-bookshelf.md @@ -0,0 +1,120 @@ +- Feature Name: N/A +- Start Date: 2016-12-25 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1828 +- Rust Issue: https://github.com/rust-lang/rust/issues/39588 + +# Summary +[summary]: #summary + +Create a "Rust Bookshelf" of learning resources for Rust. + +* Pull the book out of tree into `rust-lang/book`, which currently holds the + second edition. +* Pull the nomicon and the reference out of tree and convert them to mdBook. +* Pull the cargo docs out of tree and convert them to mdBook. +* Create a new "Nightly Book" in-tree. +* Provide a path forward for more long-form documentation to be maintained by + the project. + +This is largely about how doc.rust-lang.org is organized; today, it points to +the book, the reference, the nomicon, the error index, and the standard library +docs. This suggests unifying the first three into one thing. + +# Motivation +[motivation]: #motivation + +There are a few independent motivations for this RFC. + +* Separate repos for separate projects. +* Consistency between long-form docs. +* A clear place for unstable documentation, which is now needed for + stabilization. +* Better promoting good resources like the 'nomicon, which may not be as well + known as "the book" is. + +These will be discussed further in the detailed design. + +# Detailed design +[design]: #detailed-design + +Several new repositories will be made, one for each of: + +* The Rustonomicon ("the 'nomicon") +* The Cargo Book +* The Rust Reference Manual + +These would live under the `rust-lang` organization. + +They will all use mdBook to build. They will have their existing text re-worked +into the format; at first a simple conversion, then more major improvements. +Their current text will be removed from the main tree. + +The first edition of the book lives in-tree, but the second edition lives in +`rust-lang/book`. We'll remove the existing text from the tree and move it +into `rust-lang/book`. + +A new book will be created from the "Nightly Rust" section of the book. It will +be called "The Nightly Book," and will contain unstable documentation for both +rustc and Cargo, as well as material that will end up in the reference. This +came up when [trying to document RFC +1623](https://github.com/rust-lang/rust/pull/37928). We don't have a unified +way of handling unstable documentation. This will give it a place to develop, +and part of the stabilization process will be moving documentation from this +book into the other parts of the documentation. + +The nightly book will be organized around `#![feature]`s, so that you can look +up the documentation for each feature, as well as seeing which features +currently exist. + +The nightly book is in-tree so that it runs more often, as part of people's +normal test suite. This doesn't mean that the book won't run on every commit; +just that the out-of-tree books will run mostly in CI, whereas the nightly +book will run when developers do `x.py check`. This is similar to how, today, +Travis runs a subset of the tests, but buildbot runs all of them. + +The landing page on doc.rust-lang.org will show off the full bookshelf, to let +people find the documentation they need. It will also link to their respective +repositories.
+ +Finally, this creates a path for more books in the future: "the FFI Book" would +be one example of this kind of thing. The docs team will +develop criteria for accepting a book as part of the official project. + +# How We Teach This +[how-we-teach-this]: #how-we-teach-this + +The landing page on doc.rust-lang.org will show off the full bookshelf, to let +people find the documentation they need. It will also link to their respective +repositories. + +# Drawbacks +[drawbacks]: #drawbacks + +A ton of smaller repos can make it harder to find what goes where. + +Removing work from `rust-lang/rust` means people aren't credited in release +notes any more. I will be opening a separate RFC to address this issue; it's +also an issue even without this RFC being accepted. + +Operations are harder, but they have to change to support this use-case for +other reasons, so this does not add any extra burden. + +# Alternatives +[alternatives]: #alternatives + +Do nothing. + +Do only one part of this, instead of the whole thing. + +Move all of the "bookshelf" into one repository, rather than individual ones. +This would require a lot more label-wrangling, but might be easier. + +# Unresolved questions +[unresolved]: #unresolved-questions + +How should the first and second editions of the book live in the same +repository? + +What criteria should we use to accept new books? + +Should we adopt "learning Rust with too many Linked Lists"?

, ignore)]`. + +# Drawbacks + +While the implementation of this change in the compiler will be +straightforward, the effects on downstream code will be significant, especially +in the standard library. + +# Alternatives + +`all` and `any` could be renamed to `and` and `or`, though I feel that the +proposed names read better with the function-like syntax and are consistent +with `Iterator::all` and `Iterator::any`. + +Issue [#2119](https://github.com/rust-lang/rust/issues/2119) proposed the +addition of `||` and `&&` operators and parantheses to the attribute syntax +to result in something like `#[cfg(a || (b && c)]`. I don't favor this proposal +since it would result in a major change to the attribute syntax for relatively +little readability gain. + +# Unresolved questions + +How long should multiple `#[cfg(...)]` attributes on a single item be +forbidden? It should probably be at least until after 0.12 releases. + +Should we permanently keep the behavior of treating `#[cfg(a, b)]` as +`#[cfg(all(a, b))]`? It is the common case, and adding this interpretation +can reduce the noise level a bit. On the other hand, it may be a bit confusing +to read as it's not immediately clear if it will be processed as `and(..)` or +`all(..)`. diff --git a/text/0195-associated-items.md b/text/0195-associated-items.md new file mode 100644 index 00000000000..4540b3a3904 --- /dev/null +++ b/text/0195-associated-items.md @@ -0,0 +1,1444 @@ +- Start Date: 2014-08-04 +- RFC PR #: [rust-lang/rfcs#195](https://github.com/rust-lang/rfcs/pull/195) +- Rust Issue #: [rust-lang/rust#17307](https://github.com/rust-lang/rust/issues/17307) + +# Summary + +This RFC extends traits with *associated items*, which make generic programming +more convenient, scalable, and powerful. In particular, traits will consist of a +set of methods, together with: + +* Associated functions (already present as "static" functions) +* Associated consts +* Associated types +* Associated lifetimes + +These additions make it much easier to group together a set of related types, +functions, and constants into a single package. + +This RFC also provides a mechanism for *multidispatch* traits, where the `impl` +is selected based on multiple types. The connection to associated items will +become clear in the detailed text below. + +*Note: This RFC was originally accepted before RFC 246 introduced the +distinction between const and static items. The text has been updated to clarify +that associated consts will be added rather than statics, and to provide a +summary of restrictions on the initial implementation of associated +consts. Other than that modification, the proposal has not been changed to +reflect newer Rust features or syntax.* + +# Motivation + +A typical example where associated items are helpful is data structures like +graphs, which involve at least three types: nodes, edges, and the graph itself. + +In today's Rust, to capture graphs as a generic trait, you have to take the +additional types associated with a graph as _parameters_: + +```rust +trait Graph { + fn has_edge(&self, &N, &N) -> bool; + ... +} +``` + +The fact that the node and edge types are parameters is confusing, since any +concrete graph type is associated with a *unique* node and edge type. It is also +inconvenient, because code working with generic graphs is likewise forced to +parameterize, even when not all of the types are relevant: + +```rust +fn distance>(graph: &G, start: &N, end: &N) -> uint { ... 
} +``` + +With associated types, the graph trait can instead make clear that the node and +edge types are determined by any `impl`: + +```rust +trait Graph { + type N; + type E; + fn has_edge(&self, &N, &N) -> bool; +} +``` + +and clients can abstract over them all at once, referring to them through the +graph type: + +```rust +fn distance(graph: &G, start: &G::N, end: &G::N) -> uint { ... } +``` + +The following subsections expand on the above benefits of associated items, as +well as some others. + +## Associated types: engineering benefits for generics + +As the graph example above illustrates, associated _types_ do not increase the +expressiveness of traits _per se_, because you can always use extra type +parameters to a trait instead. However, associated types provide several +engineering benefits: + +* **Readability and scalability** + + Associated types make it possible to abstract over a whole family of types at + once, without having to separately name each of them. This improves the + readability of generic code (like the `distance` function above). It also + makes generics more "scalable": traits can incorporate additional associated + types without imposing an extra burden on clients that don't care about those + types. + + In today's Rust, by contrast, adding additional generic parameters to a + trait often feels like a very "heavyweight" move. + +* **Ease of refactoring/evolution** + + Because users of a trait do not have to separately parameterize over its + associated types, new associated types can be added without breaking all + existing client code. + + In today's Rust, by contrast, associated types can only be added by adding + more type parameters to a trait, which breaks all code mentioning the trait. + +## Clearer trait matching + +Type parameters to traits can either be "inputs" or "outputs": + +* **Inputs**. An "input" type parameter is used to _determine_ which `impl` to + use. + +* **Outputs**. An "output" type parameter is uniquely determined _by_ the + `impl`, but plays no role in selecting the `impl`. + +Input and output types play an important role for type inference and trait +coherence rules, which is described in more detail later on. + +In the vast majority of current libraries, the only input type is the `Self` +type implementing the trait, and all other trait type parameters are outputs. +For example, the trait `Iterator` takes a type parameter `A` for the elements +being iterated over, but this type is always determined by the concrete `Self` +type (e.g. `Items`) implementing the trait: `A` is typically an output. + +Additional input type parameters are useful for cases like binary operators, +where you may want the `impl` to depend on the types of *both* +arguments. For example, you might want a trait + +```rust +trait Add { + fn add(&self, rhs: &Rhs) -> Sum; +} +``` + +to view the `Self` and `Rhs` types as inputs, and the `Sum` type as an output +(since it is uniquely determined by the argument types). This would allow +`impl`s to vary depending on the `Rhs` type, even though the `Self` type is the same: + +```rust +impl Add for int { ... } +impl Add for int { ... } +``` + +Today's Rust does not make a clear distinction between input and output type +parameters to traits. 
If you attempted to provide the two `impl`s above, you +would receive an error like: + +``` +error: conflicting implementations for trait `Add` +``` + +This RFC clarifies trait matching by: + +* Treating all trait type parameters as *input* types, and +* Providing associated types, which are *output* types. + +In this design, the `Add` trait would be written and implemented as follows: + +```rust +// Self and Rhs are *inputs* +trait Add { + type Sum; // Sum is an *output* + fn add(&self, &Rhs) -> Sum; +} + +impl Add for int { + type Sum = int; + fn add(&self, rhs: &int) -> int { ... } +} + +impl Add for int { + type Sum = Complex; + fn add(&self, rhs: &Complex) -> Complex { ... } +} +``` + +With this approach, a trait declaration like `trait Add { ... }` is really +defining a *family* of traits, one for each choice of `Rhs`. One can then +provide a distinct `impl` for every member of this family. + +## Expressiveness + +Associated types, lifetimes, and functions can already be expressed in today's +Rust, though it is unwieldy to do so (as argued above). + +But associated _consts_ cannot be expressed using today's traits. + +For example, today's Rust includes a variety of numeric traits, including +`Float`, which must currently expose constants as static functions: + +```rust +trait Float { + fn nan() -> Self; + fn infinity() -> Self; + fn neg_infinity() -> Self; + fn neg_zero() -> Self; + fn pi() -> Self; + fn two_pi() -> Self; + ... +} +``` + +Because these functions cannot be used in constant expressions, the modules for +float types _also_ export a separate set of constants as consts, not using +traits. + +Associated constants would allow the consts to live directly on the traits: + +```rust +trait Float { + const NAN: Self; + const INFINITY: Self; + const NEG_INFINITY: Self; + const NEG_ZERO: Self; + const PI: Self; + const TWO_PI: Self; + ... +} +``` + +## Why now? + +The above motivations aside, it may not be obvious why adding associated types +*now* (i.e., pre-1.0) is important. There are essentially two reasons. + +First, the design presented here is *not* backwards compatible, because it +re-interprets trait type parameters as inputs for the purposes of trait +matching. The input/output distinction has several ramifications on coherence +rules, type inference, and resolution, which are all described later on in the +RFC. + +Of course, it might be possible to give a somewhat less ideal design where +associated types can be added later on without changing the interpretation of +existing trait type parameters. For example, type parameters could be explicitly +marked as inputs, and otherwise assumed to be outputs. That would be +unfortunate, since associated types would *also* be outputs -- leaving the +language with two ways of specifying output types for traits. + +But the second reason is for the library stabilization process: + +* Since most existing uses of trait type parameters are intended as outputs, + they should really be associated types instead. Making promises about these APIs + as they currently stand risks locking the libraries into a design that will seem + obsolete as soon as associated items are added. Again, this risk could probably + be mitigated with a different, backwards-compatible associated item design, but + at the cost of cruft in the language itself. + +* The binary operator traits (e.g. `Add`) should be multidispatch. It does not + seem possible to stabilize them *now* in a way that will support moving to + multidispatch later. 
+ +* There are some thorny problems in the current libraries, such as the `_equiv` + methods accumulating in `HashMap`, that can be solved using associated + items. (See "Defaults" below for more on this specific example.) Additional + examples include traits for error propagation and for conversion (to be + covered in future RFCs). Adding these traits would improve the quality and + consistency of our 1.0 library APIs. + +# Detailed design + +## Trait headers + +Trait headers are written according to the following grammar: + +``` +TRAIT_HEADER = + 'trait' IDENT [ '<' INPUT_PARAMS '>' ] [ ':' BOUNDS ] [ WHERE_CLAUSE ] + +INPUT_PARAMS = INPUT_TY { ',' INPUT_TY }* [ ',' ] +INPUT_PARAM = IDENT [ ':' BOUNDS ] + +BOUNDS = BOUND { '+' BOUND }* [ '+' ] +BOUND = IDENT [ '<' ARGS '>' ] + +ARGS = INPUT_ARGS + | OUTPUT_CONSTRAINTS + | INPUT_ARGS ',' OUTPUT_CONSTRAINTS + +INPUT_ARGS = TYPE { ',' TYPE }* + +OUTPUT_CONSTRAINTS = OUTPUT_CONSTRAINT { ',' OUTPUT_CONSTRAINT }* +OUTPUT_CONSTRAINT = IDENT '=' TYPE +``` + +**NOTE**: The grammar for `WHERE_CLAUSE` and `BOUND` is explained in detail in + the subsection "Constraining associated types" below. + +All type parameters to a trait are considered inputs, and can be used to select +an `impl`; conceptually, each distinct instantiation of the types yields a +distinct trait. More details are given in the section "The input/output type +distinction" below. + +## Trait bodies: defining associated items + +Trait bodies are expanded to include three new kinds of items: consts, types, +and lifetimes: + +``` +TRAIT = TRAIT_HEADER '{' TRAIT_ITEM* '}' +TRAIT_ITEM = + ... + | 'const' IDENT ':' TYPE [ '=' CONST_EXP ] ';' + | 'type' IDENT [ ':' BOUNDS ] [ WHERE_CLAUSE ] [ '=' TYPE ] ';' + | 'lifetime' LIFETIME_IDENT ';' +``` + +Traits already support associated functions, which had previously been called +"static" functions. + +The `BOUNDS` and `WHERE_CLAUSE` on associated types are *obligations* for the +implementor of the trait, and *assumptions* for users of the trait: + +```rust +trait Graph { + type N: Show + Hash; + type E: Show + Hash; + ... +} + +impl Graph for MyGraph { + // Both MyNode and MyEdge must implement Show and Hash + type N = MyNode; + type E = MyEdge; + ... +} + +fn print_nodes(g: &G) { + // here, can assume G::N implements Show + ... +} +``` + +### Namespacing/shadowing for associated types + +Associated types may have the same name as existing types in scope, *except* for +type parameters to the trait: + +```rust +struct Foo { ... } + +trait Bar { + type Foo; // this is allowed + fn into_foo(self) -> Foo; // this refers to the trait's Foo + + type Input; // this is NOT allowed +} +``` + +By not allowing name clashes between input and output types, +keep open the possibility of later allowing syntax like: + +```rust +Bar +``` + +where both input and output parameters are constrained by name. And anyway, +there is no use for clashing input/output names. + +In the case of a name clash like `Foo` above, if the trait needs to refer to the +outer `Foo` for some reason, it can always do so by using a `type` alias +external to the trait. + +### Defaults + +Notice that associated consts and types both permit defaults, just as trait +methods and functions can provide defaults. + +Defaults are useful both as a code reuse mechanism, and as a way to expand the +items included in a trait without breaking all existing implementors of the +trait. 
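For illustration, here is a small example in the syntax proposed by this RFC (the `Buffer` trait and its items are invented for this sketch, not taken from the RFC): an associated const with a default is used by a default method, and an implementor may either accept the default or override it. Note that the stricter override rule described below applies only to default associated *types*.

```rust
trait Buffer {
    // Associated const with a default; implementors may accept or override it.
    const CAPACITY: uint = 64;

    // Default method that uses the associated const, which is in scope per the
    // scoping rules described later in this RFC.
    fn capacity(&self) -> uint { CAPACITY }
}

struct Small;
impl Buffer for Small {}         // accepts both defaults

struct Large;
impl Buffer for Large {
    const CAPACITY: uint = 4096; // overrides the const, keeps the default method
}
```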
+ +Defaults for associated types, however, present an interesting question: can +default methods assume the default type? In other words, is the following +allowed? + +```rust +trait ContainerKey : Clone + Hash + Eq { + type Query: Hash = Self; + fn compare(&self, other: &Query) -> bool { self == other } + fn query_to_key(q: &Query) -> Self { q.clone() }; +} + +impl ContainerKey for String { + type Query = str; + fn compare(&self, other: &str) -> bool { + self.as_slice() == other + } + fn query_to_key(q: &str) -> String { + q.into_string() + } +} + +impl HashMap where K: ContainerKey { + fn find(&self, q: &K::Query) -> &V { ... } +} +``` + +In this example, the `ContainerKey` trait is used to associate a "`Query`" type +(for lookups) with an owned key type. This resolves the thorny "equiv" problem +in `HashMap`, where the hash map keys are `String`s but you want to index the +hash map with `&str` values rather than `&String` values, i.e. you want the +following to work: + +```rust +// H: HashMap +H.find("some literal") +``` + +rather than having to write + +```rust +H.find(&"some literal".to_string())` +``` + +The current solution involves duplicating the API surface with `_equiv` methods +that use the somewhat subtle `Equiv` trait, but the associated type approach +makes it easy to provide a simple, single API that covers the same use cases. + +The defaults for `ContainerKey` just assume that the owned key and lookup key +types are the same, but the default methods have to assume the default +associated types in order to work. + +For this to work, it must *not* be possible for an implementor of `ContainerKey` +to override the default `Query` type while leaving the default methods in place, +since those methods may no longer typecheck. + +We deal with this in a very simple way: + +* If a trait implementor overrides any default associated types, they must also + override *all* default functions and methods. + +* Otherwise, a trait implementor can selectively override individual default + methods/functions, as they can today. + +## Trait implementations + +Trait `impl` syntax is much the same as before, except that const, type, and +lifetime items are allowed: + +``` +IMPL_ITEM = + ... + | 'const' IDENT ':' TYPE '=' CONST_EXP ';' + | 'type' IDENT' '=' 'TYPE' ';' + | 'lifetime' LIFETIME_IDENT '=' LIFETIME_REFERENCE ';' +``` + +Any `type` implementation must satisfy all bounds and where clauses in the +corresponding trait item. + +## Referencing associated items + +Associated items are referenced through paths. The expression path grammar was +updated as part of [UFCS](https://github.com/rust-lang/rfcs/pull/132), but to +accommodate associated types and lifetimes we need to update the type path +grammar as well. 
+ +The full grammar is as follows: + +``` +EXP_PATH + = EXP_ID_SEGMENT { '::' EXP_ID_SEGMENT }* + | TYPE_SEGMENT { '::' EXP_ID_SEGMENT }+ + | IMPL_SEGMENT { '::' EXP_ID_SEGMENT }+ +EXP_ID_SEGMENT = ID [ '::' '<' TYPE { ',' TYPE }* '>' ] + +TY_PATH + = TY_ID_SEGMENT { '::' TY_ID_SEGMENT }* + | TYPE_SEGMENT { '::' TY_ID_SEGMENT }* + | IMPL_SEGMENT { '::' TY_ID_SEGMENT }+ + +TYPE_SEGMENT = '<' TYPE '>' +IMPL_SEGMENT = '<' TYPE 'as' TRAIT_REFERENCE '>' +TRAIT_REFERENCE = ID [ '<' TYPE { ',' TYPE * '>' ] +``` + +Here are some example paths, along with what they might be referencing + +```rust +// Expression paths /////////////////////////////////////////////////////////////// + +a::b::c // reference to a function `c` in module `a::b` +a:: // the function `a` instantiated with type arguments `T1`, `T2` +Vec::::new // reference to the function `new` associated with `Vec` + as SomeTrait>::some_fn + // reference to the function `some_fn` associated with `SomeTrait`, + // as implemented by `Vec` +T::size_of // the function `size_of` associated with the type or trait `T` +::size_of // the function `size_of` associated with `T` _viewed as a type_ +::size_of + // the function `size_of` associated with `T`'s impl of `SizeOf` + +// Type paths ///////////////////////////////////////////////////////////////////// + +a::b::C // reference to a type `C` in module `a::b` +A // type A instantiated with type arguments `T1`, `T2` +Vec::Iter // reference to the type `Iter` associated with `Vec + as SomeTrait>::SomeType + // reference to the type `SomeType` associated with `SomeTrait`, + // as implemented by `Vec` +``` + +### Ways to reference items + +Next, we'll go into more detail on the meaning of each kind of path. + +For the sake of discussion, we'll suppose we've defined a trait like the +following: + +```rust +trait Container { + type E; + fn empty() -> Self; + fn insert(&mut self, E); + fn contains(&self, &E) -> bool where E: PartialEq; + ... +} + +impl Container for Vec { + type E = T; + fn empty() -> Vec { Vec::new() } + ... +} +``` + +#### Via an `ID_SEGMENT` prefix + +##### When the prefix resolves to a type + +The most common way to get at an associated item is through a type parameter +with a trait bound: + +```rust +fn pick(c: &C) -> Option<&C::E> { ... } + +fn mk_with_two() -> C where C: Container, C::E = uint { + let mut cont = C::empty(); // reference to associated function + cont.insert(0); + cont.insert(1); + cont +} +``` + +For these references to be valid, the type parameter must be known to implement +the relevant trait: + +```rust +// Knowledge via bounds +fn pick(c: &C) -> Option<&C::E> { ... } + +// ... or equivalently, where clause +fn pick(c: &C) -> Option<&C::E> where C: Container { ... } + +// Knowledge via ambient constraints +struct TwoContainers(C1, C2); +impl TwoContainers { + fn pick_one(&self) -> Option<&C1::E> { ... } + fn pick_other(&self) -> Option<&C2::E> { ... } +} +``` + +Note that `Vec::E` and `Vec::::empty` are also valid type and function +references, respectively. + +For cases like `C::E` or `Vec::E`, the path begins with an `ID_SEGMENT` +prefix that itself resolves to a _type_: both `C` and `Vec` are types. In +general, a path `PREFIX::REST_OF_PATH` where `PREFIX` resolves to a type is +equivalent to using a `TYPE_SEGMENT` prefix `::REST_OF_PATH`. So, for +example, following are all equivalent: + +```rust +fn pick(c: &C) -> Option<&C::E> { ... } +fn pick(c: &C) -> Option<&::E> { ... } +fn pick(c: &C) -> Option<&<::E>> { ... 
} +``` + +The behavior of `TYPE_SEGMENT` prefixes is described in the next subsection. + +##### When the prefix resolves to a trait + +However, it is possible for an `ID_SEGMENT` prefix to resolve to a *trait*, +rather than a type. In this case, the behavior of an `ID_SEGMENT` varies from +that of a `TYPE_SEGMENT` in the following way: + +```rust +// a reference Container::insert is roughly equivalent to: +fn trait_insert(c: &C, e: C::E); + +// a reference ::insert is roughly equivalent to: +fn object_insert(c: &Container, e: E); +``` + +That is, if `PREFIX` is an `ID_SEGMENT` that +resolves to a trait `Trait`: + +* A path `PREFIX::REST` resolves to the item/path `REST` defined within + `Trait`, while treating the type implementing the trait as a type parameter. + +* A path `::REST` treats `PREFIX` as a (DST-style) *type*, and is + hence usable only with trait objects. See the + [UFCS RFC](https://github.com/rust-lang/rfcs/pull/132) for more detail. + +Note that a path like `Container::E`, while grammatically valid, will fail to +resolve since there is no way to tell which `impl` to use. A path like +`Container::empty`, however, resolves to a function roughly equivalent to: + +```rust +fn trait_empty() -> C; +``` + +#### Via a `TYPE_SEGMENT` prefix + +> The following text is *slightly changed* from the +> [UFCS RFC](https://github.com/rust-lang/rfcs/pull/132). + +When a path begins with a `TYPE_SEGMENT`, it is a type-relative path. If this is +the complete path (e.g., ``), then the path resolves to the specified +type. If the path continues (e.g., `::size_of`) then the next segment is +resolved using the following procedure. The procedure is intended to mimic +method lookup, and hence any changes to method lookup may also change the +details of this lookup. + +Given a path `::m::...`: + +1. Search for members of inherent impls defined on `T` (if any) with + the name `m`. If any are found, the path resolves to that item. + +2. Otherwise, let `IN_SCOPE_TRAITS` be the set of traits that are in + scope and which contain a member named `m`: + - Let `IMPLEMENTED_TRAITS` be those traits from `IN_SCOPE_TRAITS` + for which an implementation exists that (may) apply to `T`. + - There can be ambiguity in the case that `T` contains type inference + variables. + - If `IMPLEMENTED_TRAITS` is not a singleton set, report an ambiguity + error. Otherwise, let `TRAIT` be the member of `IMPLEMENTED_TRAITS`. + - If `TRAIT` is ambiguously implemented for `T`, report an + ambiguity error and request further type information. + - Otherwise, rewrite the path to `::m::...` and + continue. + +#### Via a `IMPL_SEGMENT` prefix + +> The following text is *somewhat different* from the +> [UFCS RFC](https://github.com/rust-lang/rfcs/pull/132). + +When a path begins with an `IMPL_SEGMENT`, it is a reference to an item defined +from a trait. Note that such paths must always have a follow-on member `m` (that +is, `` is not a complete path, but `::m` is). + +To resolve the path, first search for an applicable implementation of `Trait` +for `T`. If no implementation can be found -- or the result is ambiguous -- then +report an error. Note that when `T` is a type parameter, a bound `T: Trait` +guarantees that there is such an implementation, but does not count for +ambiguity purposes. + +Otherwise, resolve the path to the member of the trait with the substitution +`Self => T` and continue. 
+ +This apparently straightforward algorithm has some subtle consequences, as +illustrated by the following example: + +```rust +trait Foo { + type T; + fn as_T(&self) -> &T; +} + +// A blanket impl for any Show type T +impl Foo for T { + type T = T; + fn as_T(&self) -> &T { self } +} + +fn bounded(u: U) where U::T: Show { + // Here, we just constrain the associated type directly + println!("{}", u.as_T()) +} + +fn blanket(u: U) { + // the blanket impl applies to U, so we know that `U: Foo` and + // ::T = U (and, of course, U: Show) + println!("{}", u.as_T()) +} + +fn not_allowed(u: U) { + // this will not compile, since ::T is not known to + // implement Show + println!("{}", u.as_T()) +} +``` + +This example includes three generic functions that make use of an associated +type; the first two will typecheck, while the third will not. + +* The first case, `bounded`, places a `Show` constraint directly on the + otherwise-abstract associated type `U::T`. Hence, it is allowed to assume that + `U::T: Show`, even though it does not know the concrete implementation of + `Foo` for `U`. + +* The second case, `blanket`, places a `Show` constraint on the type `U`, which + means that the blanket `impl` of `Foo` applies even though we do not know the + *concrete* type that `U` will be. That fact means, moreover, that we can + compute exactly what the associated type `U::T` will be, and know that it will + satisfy `Show. Coherence guarantees that that the blanket `impl` is the only + one that could apply to `U`. (See the section "Impl specialization" under + "Unresolved questions" for a deeper discussion of this point.) + +* The third case assumes only that `U: Foo`, and therefore nothing is known + about the associated type `U::T`. In particular, the function cannot assume + that `U::T: Show`. + +The resolution rules also interact with instantiation of type parameters in an +intuitive way. For example: + +```rust +trait Graph { + type N; + type E; + ... +} + +impl Graph for MyGraph { + type N = MyNode; + type E = MyEdge; + ... +} + +fn pick_node(t: &G) -> &G::N { + // the type G::N is abstract here + ... +} + +let G = MyGraph::new(); +... +pick_node(G) // has type: ::N = MyNode +``` + +Assuming there are no blanket implementations of `Graph`, the `pick_node` +function knows nothing about the associated type `G::N`. However, a *client* of +`pick_node` that instantiates it with a particular concrete graph type will also +know the concrete type of the value returned from the function -- here, `MyNode`. + +## Scoping of `trait` and `impl` items + +Associated types are frequently referred to in the signatures of a trait's +methods and associated functions, and it is natural and convenient to refer to +them directly. + +In other words, writing this: + +```rust +trait Graph { + type N; + type E; + fn has_edge(&self, &N, &N) -> bool; + ... +} +``` + +is more appealing than writing this: + +```rust +trait Graph { + type N; + type E; + fn has_edge(&self, &Self::N, &Self::N) -> bool; + ... +} +``` + +This RFC proposes to treat both `trait` and `impl` bodies (both +inherent and for traits) the same way we treat `mod` bodies: *all* +items being defined are in scope. 
In particular, methods are in scope +as UFCS-style functions: + +```rust +trait Foo { + type AssocType; + lifetime 'assoc_lifetime; + const ASSOC_CONST: uint; + fn assoc_fn() -> Self; + + // Note: 'assoc_lifetime and AssocType in scope: + fn method(&self, Self) -> &'assoc_lifetime AssocType; + + fn default_method(&self) -> uint { + // method in scope UFCS-style, assoc_fn in scope + let _ = method(self, assoc_fn()); + ASSOC_CONST // in scope + } +} + +// Same scoping rules for impls, including inherent impls: +struct Bar; +impl Bar { + fn foo(&self) { ... } + fn bar(&self) { + foo(self); // foo in scope UFCS-style + ... + } +} +``` + +Items from super traits are *not* in scope, however. See +[the discussion on super traits below](#super-traits) for more detail. + +These scope rules provide good ergonomics for associated types in +particular, and a consistent scope model for language constructs that +can contain items (like traits, impls, and modules). In the long run, +we should also explore imports for trait items, i.e. `use +Trait::some_method`, but that is out of scope for this RFC. + +Note that, according to this proposal, associated types/lifetimes are *not* in +scope for the optional `where` clause on the trait header. For example: + +```rust +trait Foo + // type parameters in scope, but associated types are not: + where Bar: Encodable { + + type Output; + ... +} +``` + +This setup seems more intuitive than allowing the trait header to refer directly +to items defined within the trait body. + +It's also worth noting that *trait-level* `where` clauses are never needed for +constraining associated types anyway, because associated types also have `where` +clauses. Thus, the above example could (and should) instead be written as +follows: + +```rust +trait Foo { + type Output where Bar: Encodable; + ... +} +``` + +## Constraining associated types + +Associated types are not treated as parameters to a trait, but in some cases a +function will want to constrain associated types in some way. For example, as +explained in the Motivation section, the `Iterator` trait should treat the +element type as an output: + +```rust +trait Iterator { + type A; + fn next(&mut self) -> Option; + ... +} +``` + +For code that works with iterators generically, there is no need to constrain +this type: + +```rust +fn collect_into_vec(iter: I) -> Vec { ... } +``` + +But other code may have requirements for the element type: + +* That it implements some traits (bounds). +* That it unifies with a particular type. + +These requirements can be imposed via `where` clauses: + +```rust +fn print_iter(iter: I) where I: Iterator, I::A: Show { ... } +fn sum_uints(iter: I) where I: Iterator, I::A = uint { ... } +``` + +In addition, there is a shorthand for equality constraints: + +```rust +fn sum_uints>(iter: I) { ... } +``` + +In general, a trait like: + +```rust +trait Foo { + type Output1; + type Output2; + lifetime 'a; + const C: bool; + ... +} +``` + +can be written in a bound like: + +``` +T: Foo +T: Foo +T: Foo +T: Foo +T: Foo>(t: T) // this is valid +fn consume_obj(t: Box>) // this is NOT valid + +// but this IS valid: +fn consume_obj(t: Box; // what is the lifetime here? + fn iter<'a>(&'a self) -> I; // and how to connect it to self? +} +``` + +The problem is that, when implementing this trait, the return type `I` of `iter` +must generally depend on the *lifetime* of self. For example, the corresponding +method in `Vec` looks like the following: + +```rust +impl Vec { + fn iter(&'a self) -> Items<'a, T> { ... 
} +} +``` + +This means that, given a `Vec`, there isn't a *single* type `Items` for +iteration -- rather, there is a *family* of types, one for each input lifetime. +In other words, the associated type `I` in the `Iterable` needs to be +"higher-kinded": not just a single type, but rather a family: + +```rust +trait Iterable { + type A; + type I<'a>: Iterator<&'a A>; + fn iter<'a>(&self) -> I<'a>; +} +``` + +In this case, `I` is parameterized by a lifetime, but in other cases (like +`map`) an associated type needs to be parameterized by a type. + +In general, such higher-kinded types (HKTs) are a much-requested feature for +Rust, and they would extend the reach of associated types. But the design and +implementation of higher-kinded types is, by itself, a significant investment. +The point of view of this RFC is that associated items bring the most important +changes needed to stabilize our existing traits (and add a few key others), +while HKTs will allow us to define important traits in the future but are not +necessary for 1.0. + +### Encoding higher-kinded types + +That said, it's worth pointing out that variants of higher-kinded types can be +encoded in the system being proposed here. + +For example, the `Iterable` example above can be written in the following +somewhat contorted style: + +```rust +trait IterableOwned { + type A; + type I: Iterator; + fn iter_owned(self) -> I; +} + +trait Iterable { + fn iter<'a>(&'a self) -> <&'a Self>::I where &'a Self: IterableOwned { + IterableOwned::iter_owned(self) + } +} +``` + +The idea here is to define a trait that takes, as input type/lifetimes +parameters, the parameters to any HKTs. In this case, the trait is implemented +on the type `&'a Self`, which includes the lifetime parameter. + +We can in fact generalize this technique to encode arbitrary HKTs: + +```rust +// The kind * -> * +trait TypeToType { + type Output; +} +type Apply where Name: TypeToType = Name::Output; + +struct Vec_; +struct DList_; + +impl TypeToType for Vec_ { + type Output = Vec; +} + +impl TypeToType for DList_ { + type Output = DList; +} + +trait Mappable +{ + type E; + type HKT where Apply = Self; + + fn map(self, f: E -> F) -> Apply; +} +``` + +While the above demonstrates the versatility of associated types and `where` +clauses, it is probably too much of a hack to be viable for use in `libstd`. + +### Associated consts in generic code + +If the value of an associated const depends on a type parameter (including +`Self`), it cannot be used in a constant expression. This restriction will +almost certainly be lifted in the future, but this raises questions outside the +scope of this RFC. + +# Staging + +Associated lifetimes are probably not necessary for the 1.0 timeframe. While we +currently have a few traits that are parameterized by lifetimes, most of these +can go away once DST lands. + +On the other hand, associated lifetimes are probably trivial to implement once +associated types have been implemented. + +# Other interactions + +## Interaction with implied bounds + +As part of the +[implied bounds](http://smallcultfollowing.com/babysteps/blog/2014/07/06/implied-bounds/) +idea, it may be desirable for this: + +```rust +fn pick_node(g: &G) -> &::N +``` + +to be sugar for this: + +```rust +fn pick_node(g: &G) -> &::N +``` + +But this feature can easily be added later, as part of a general implied bounds RFC. 
+ +## Future-proofing: specialization of `impl`s + +In the future, we may wish to relax the "overlapping instances" rule so that one +can provide "blanket" trait implementations and then "specialize" them for +particular types. For example: + +```rust +trait Sliceable { + type Slice; + // note: not using &self here to avoid need for HKT + fn as_slice(self) -> Slice; +} + +impl<'a, T> Sliceable for &'a T { + type Slice = &'a T; + fn as_slice(self) -> &'a T { self } +} + +impl<'a, T> Sliceable for &'a Vec { + type Slice = &'a [T]; + fn as_slice(self) -> &'a [T] { self.as_slice() } +} +``` + +But then there's a difficult question: + +``` +fn dice(a: &A) -> &A::Slice where &A: Slicable { + a // is this allowed? +} +``` + +Here, the blanket and specialized implementations provide incompatible +associated types. When working with the trait generically, what can we assume +about the associated type? If we assume it is the blanket one, the type may +change during monomorphization (when specialization takes effect)! + +The RFC *does* allow generic code to "see" associated types provided by blanket +implementations, so this is a potential problem. + +Our suggested strategy is the following. If at some later point we wish to add +specialization, traits would have to *opt in* explicitly. For such traits, we +would *not* allow generic code to "see" associated types for blanket +implementations; instead, output types would only be visible when all input +types were concretely known. This approach is backwards-compatible with the RFC, +and is probably a good idea in any case. + +# Alternatives + +## Multidispatch through tuple types + +This RFC clarifies trait matching by making trait type parameters inputs to +matching, and associated types outputs. + +A more radical alternative would be to *remove type parameters from traits*, and +instead support multiple input types through a separate multidispatch mechanism. + +In this design, the `Add` trait would be written and implemented as follows: + +```rust +// Lhs and Rhs are *inputs* +trait Add for (Lhs, Rhs) { + type Sum; // Sum is an *output* + fn add(&Lhs, &Rhs) -> Sum; +} + +impl Add for (int, int) { + type Sum = int; + fn add(left: &int, right: &int) -> int { ... } +} + +impl Add for (int, Complex) { + type Sum = Complex; + fn add(left: &int, right: &Complex) -> Complex { ... } +} +``` + +The `for` syntax in the trait definition is used for multidispatch traits, here +saying that `impl`s must be for pairs of types which are bound to `Lhs` and +`Rhs` respectively. The `add` function can then be invoked in UFCS style by +writing + +```rust +Add::add(some_int, some_complex) +``` + +*Advantages of the tuple approach*: + +- It does not force a distinction between `Self` and other input types, which in + some cases (including binary operators like `Add`) can be artificial. + +- Makes it possible to specify input types without specifying the trait: + `<(A, B)>::Sum` rather than `>::Sum`. + +*Disadvantages of the tuple approach*: + +- It's more painful when you *do* want a method rather than a function. + +- Requires `where` clauses when used in bounds: `where (A, B): Trait` rather + than `A: Trait`. + +- It gives two ways to write single dispatch: either without `for`, or using + `for` with a single-element tuple. + +- There's a somewhat jarring distinction between single/multiple dispatch + traits, making the latter feel "bolted on". + +- The tuple syntax is unusual in acting as a binder of its types, as opposed to + the `Trait` syntax. 
+ +- Relatedly, the generics syntax for traits is immediately understandable (a + family of traits) based on other uses of generics in the language, while the + tuple notation stands alone. + +- Less clear story for trait objects (although the fact that `Self` is the only + erased input type in this RFC may seem somewhat arbitrary). + +On balance, the generics-based approach seems like a better fit for the language +design, especially in its interaction with methods and the object system. + +## A backwards-compatible version + +Yet another alternative would be to allow trait type parameters to be either +inputs or outputs, marking the inputs with a keyword `in`: + +```rust +trait Add { + fn add(&Lhs, &Rhs) -> Sum; +} +``` + +This would provide a way of adding multidispatch now, and then adding associated +items later on without breakage. If, in addition, output types had to come after +all input types, it might even be possible to migrate output type parameters +like `Sum` above into associated types later. + +This is perhaps a reasonable fallback, but it seems better to introduce a clean +design with both multidispatch and associated items together. + +# Unresolved questions + +## Super traits + +This RFC largely ignores super traits. + +Currently, the implementation of super traits treats them identically to a +`where` clause that bounds `Self`, and this RFC does not propose to change +that. However, a follow-up RFC should clarify that this is the intended +semantics for super traits. + +Note that this treatment of super traits is, in particular, consistent with the +proposed scoping rules, which do not bring items from super traits into scope in +the body of a subtrait; they must be accessed via `Self::item_name`. + +## Equality constraints in `where` clauses + +This RFC allows equality constraints on types for associated types, but does not +propose a similar feature for `where` clauses. That will be the subject of a +follow-up RFC. + +## Multiple trait object bounds for the same trait + +The design here makes it possible to write bounds or trait objects that mention +the same trait, multiple times, with different inputs: + +```rust +fn mulit_add + Add>(t: T) -> T { ... } +fn mulit_add_obj(t: Box + Add>) -> Box + Add> { ... } +``` + +This seems like a potentially useful feature, and should be unproblematic for +bounds, but may have implications for vtables that make it problematic for trait +objects. Whether or not such trait combinations are allowed will likely depend +on implementation concerns, which are not yet clear. + +## Generic associated consts in match patterns + +It seems desirable to allow constants that depend on type parameters in match +patterns, but it's not clear how to do so while still checking exhaustiveness +and reachability of the match arms. Most likely this requires new forms of +where clause, to constrain associated constant values. + +For now, we simply defer the question. + +## Generic associated consts in array sizes + +It would be useful to be able to use trait-associated constants in generic code. + +```rust +// Shouldn't this be OK? +const ALIAS_N: usize = ::N; +let x: [u8; ::N] = [0u8; ALIAS_N]; +// Or... +let x: [u8; T::N + 1] = [0u8; T::N + 1]; +``` + +However, this causes some problems. What should we do with the following case in +type checking, where we need to prove that a generic is valid for any `T`? + +```rust +let x: [u8; T::N + T::N] = [0u8; 2 * T::N]; +``` + +We would like to handle at least some obvious cases (e.g. 
proving that +`T::N == T::N`), but without trying to prove arbitrary statements about +arithmetic. The question of how to do this is deferred. diff --git a/text/0198-slice-notation.md b/text/0198-slice-notation.md new file mode 100644 index 00000000000..87c5273824a --- /dev/null +++ b/text/0198-slice-notation.md @@ -0,0 +1,227 @@ +- Start Date: 2014-09-11 +- RFC PR #: [rust-lang/rfcs#198](https://github.com/rust-lang/rfcs/pull/198) +- Rust Issue #: [rust-lang/rust#17177](https://github.com/rust-lang/rust/issues/17177) + +# Summary + +This RFC adds *overloaded slice notation*: + +- `foo[]` for `foo.as_slice()` +- `foo[n..m]` for `foo.slice(n, m)` +- `foo[n..]` for `foo.slice_from(n)` +- `foo[..m]` for `foo.slice_to(m)` +- `mut` variants of all the above + +via two new traits, `Slice` and `SliceMut`. + +It also changes the notation for range `match` patterns to `...`, to +signify that they are inclusive whereas `..` in slices are exclusive. + +# Motivation + +There are two primary motivations for introducing this feature. + +### Ergonomics + +Slicing operations, especially `as_slice`, are a very common and basic thing to +do with vectors, and potentially many other kinds of containers. We already +have notation for indexing via the `Index` trait, and this RFC is essentially a +continuation of that effort. + +The `as_slice` operator is particularly important. Since we've moved away from +auto-slicing in coercions, explicit `as_slice` calls have become extremely +common, and are one of the +[leading ergonomic/first impression](https://github.com/rust-lang/rust/issues/14983) +problems with the language. There are a few other approaches to address this +particular problem, but these alternatives have downsides that are discussed +below (see "Alternatives"). + +### Error handling conventions + +We are gradually moving toward a Python-like world where notation like `foo[n]` +calls `fail!` when `n` is out of bounds, while corresponding methods like `get` +return `Option` values rather than failing. By providing similar notation for +slicing, we open the door to following the same convention throughout +vector-like APIs. + +# Detailed design + +The design is a straightforward continuation of the `Index` trait design. We +introduce two new traits, for immutable and mutable slicing: + +```rust +trait Slice { + fn as_slice<'a>(&'a self) -> &'a S; + fn slice_from(&'a self, from: Idx) -> &'a S; + fn slice_to(&'a self, to: Idx) -> &'a S; + fn slice(&'a self, from: Idx, to: Idx) -> &'a S; +} + +trait SliceMut { + fn as_mut_slice<'a>(&'a mut self) -> &'a mut S; + fn slice_from_mut(&'a mut self, from: Idx) -> &'a mut S; + fn slice_to_mut(&'a mut self, to: Idx) -> &'a mut S; + fn slice_mut(&'a mut self, from: Idx, to: Idx) -> &'a mut S; +} +``` + +(Note, the mutable names here are part of likely changes to naming conventions +that will be described in a separate RFC). + +These traits will be used when interpreting the following notation: + +*Immutable slicing* + +- `foo[]` for `foo.as_slice()` +- `foo[n..m]` for `foo.slice(n, m)` +- `foo[n..]` for `foo.slice_from(n)` +- `foo[..m]` for `foo.slice_to(m)` + +*Mutable slicing* + +- `foo[mut]` for `foo.as_mut_slice()` +- `foo[mut n..m]` for `foo.slice_mut(n, m)` +- `foo[mut n..]` for `foo.slice_from_mut(n)` +- `foo[mut ..m]` for `foo.slice_to_mut(m)` + +Like `Index`, uses of this notation will auto-deref just as if they were method +invocations. So if `T` implements `Slice`, and `s: Smaht`, then +`s[]` compiles and has type `&[U]`. 
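+
+For concreteness, here is a small usage sketch (not taken from the RFC itself) showing how
+each form of the proposed notation lines up with the trait methods above, using a `Vec<u8>`:
+
+```rust
+// Proposed notation on the left, the method call it stands for on the right.
+let v = vec!(0u8, 1, 2, 3, 4);
+
+let all  = v[];       // v.as_slice()
+let mid  = v[1..3];   // v.slice(1, 3)
+let tail = v[2..];    // v.slice_from(2)
+let head = v[..3];    // v.slice_to(3)
+
+let mut w = vec!(5u8, 6, 7);
+let wm = w[mut 1..];  // w.slice_from_mut(1)
+```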
+ +Note that slicing is "exclusive" (so `[n..m]` is the interval `n <= x +< m`), while `..` in `match` patterns is "inclusive". To avoid +confusion, we propose to change the `match` notation to `...` to +reflect the distinction. The reason to change the notation, rather +than the interpretation, is that the exclusive (respectively +inclusive) interpretation is the right default for slicing +(respectively matching). + +## Rationale for the notation + +The choice of square brackets for slicing is straightforward: it matches our +indexing notation, and slicing and indexing are closely related. + +Some other languages (like Python and Go -- and Fortran) use `:` rather than +`..` in slice notation. The choice of `..` here is influenced by its use +elsewhere in Rust, for example for fixed-length array types `[T, ..n]`. The `..` +for slicing has precedent in Perl and D. + +See [Wikipedia](http://en.wikipedia.org/wiki/Array_slicing) for more on the +history of slice notation in programming languages. + +### The `mut` qualifier + +It may be surprising that `mut` is used as a qualifier in the proposed +slice notation, but not for the indexing notation. The reason is that +indexing includes an implicit dereference. If `v: Vec` then +`v[n]` has type `Foo`, not `&Foo` or `&mut Foo`. So if you want to get +a mutable reference via indexing, you write `&mut v[n]`. More +generally, this allows us to do resolution/typechecking prior to +resolving the mutability. + +This treatment of `Index` matches the C tradition, and allows us to +write things like `v[0] = foo` instead of `*v[0] = foo`. + +On the other hand, this approach is problematic for slicing, since in +general it would yield an unsized type (under DST) -- and of course, +slicing is meant to give you a fat pointer indicating the size of the +slice, which we don't want to immediately deref. But the consequence +is that we need to know the mutability of the slice up front, when we +take it, since it determines the type of the expression. + +# Drawbacks + +The main drawback is the increase in complexity of the language syntax. This +seems minor, especially since the notation here is essentially "finishing" what +was started with the `Index` trait. + +## Limitations in the design + +Like the `Index` trait, this forces the result to be a reference via +`&`, which may rule out some generalizations of slicing. + +One way of solving this problem is for the slice methods to take +`self` (by value) rather than `&self`, and in turn to implement the +trait on `&T` rather than `T`. Whether this approach is viable in the +long run will depend on the final rules for method resolution and +auto-ref. + +In general, the trait system works best when traits can be applied to +types `T` rather than borrowed types `&T`. Ultimately, if Rust gains +higher-kinded types (HKT), we could change the slice type `S` in the +trait to be higher-kinded, so that it is a *family* of types indexed +by lifetime. Then we could replace the `&'a S` in the return value +with `S<'a>`. It should be possible to transition from the current +`Index` and `Slice` trait designs to an HKT version in the future +without breaking backwards compatibility by using blanket +implementations of the new traits (say, `IndexHKT`) for types that +implement the old ones. + +# Alternatives + +For improving the ergonomics of `as_slice`, there are two main alternatives. + +## Coercions: auto-slicing + +One possibility would be re-introducing some kind of coercion that automatically +slices. 
+We used to have a coercion from (in today's terms) `Vec` to +`&[T]`. Since we no longer coerce owned to borrowed values, we'd probably want a +coercion `&Vec` to `&[T]` now: + +```rust +fn use_slice(t: &[u8]) { ... } + +let v = vec!(0u8, 1, 2); +use_slice(&v) // automatically coerce here +use_slice(v.as_slice()) // equivalent +``` + +Unfortunately, adding such a coercion requires choosing between the following: + +* Tie the coercion to `Vec` and `String`. This would reintroduce special + treatment of these otherwise purely library types, and would mean that other + library types that support slicing would not benefit (defeating some of the + purpose of DST). + +* Make the coercion extensible, via a trait. This is opening pandora's box, + however: the mechanism could likely be (ab)used to run arbitrary code during + coercion, so that any invocation `foo(a, b, c)` might involve running code to + pre-process each of the arguments. While we may eventually want such + user-extensible coercions, it is a *big* step to take with a lot of potential + downside when reasoning about code, so we should pursue more conservative + solutions first. + +## Deref + +Another possibility would be to make `String` implement `Deref` and +`Vec` implement `Deref<[T]>`, once DST lands. Doing so would allow explicit +coercions like: + +```rust +fn use_slice(t: &[u8]) { ... } + +let v = vec!(0u8, 1, 2); +use_slice(&*v) // take advantage of deref +use_slice(v.as_slice()) // equivalent +``` + +There are at least two downsides to doing so, however: + +* It is not clear how the method resolution rules will ultimately interact with + `Deref`. In particular, a leading proposal is that for a smart pointer `s: Smaht` + when you invoke `s.m(...)` only *inherent* methods `m` are considered for + `Smaht`; *trait* methods are only considered for the maximally-derefed + value `*s`. + + With such a resolution strategy, implementing `Deref` for `Vec` would make it + impossible to use trait methods on the `Vec` type except through UFCS, + severely limiting the ability of programmers to usefully implement new traits + for `Vec`. + +* The idea of `Vec` as a smart pointer around a slice, and the use of `&*v` as + above, is somewhat counterintuitive, especially for such a basic type. + +Ultimately, notation for slicing seems desireable on its own merits anyway, and +if it can eliminate the need to implement `Deref` for `Vec` and `String`, all +the better. diff --git a/text/0199-ownership-variants.md b/text/0199-ownership-variants.md new file mode 100644 index 00000000000..22f548b38fb --- /dev/null +++ b/text/0199-ownership-variants.md @@ -0,0 +1,146 @@ +- Start Date: 2014-08-28 +- RFC PR #: [rust-lang/rfcs#199](https://github.com/rust-lang/rfcs/pull/199) +- Rust Issue #: [rust-lang/rust#16810](https://github.com/rust-lang/rust/issues/16810) + +# Summary + +This is a *conventions RFC* for settling naming conventions when there +are by value, by reference, and by mutable reference variants of an +operation. + +# Motivation + +Currently the libraries are not terribly consistent about how to +signal mut variants of functions; sometimes it is by a `mut_` prefix, +sometimes a `_mut` suffix, and occasionally with `_mut_` appearing in +the middle. These inconsistencies make APIs difficult to remember. + +While there are arguments in favor of each of the positions, we stand +to gain a lot by standardizing, and to some degree we just need to +make a choice. 
+ +# Detailed design + +Functions often come in multiple variants: immutably borrowed, mutably +borrowed, and owned. + +The canonical example is iterator methods: + +- `iter` works with immutably borrowed data +- `mut_iter` works with mutably borrowed data +- `move_iter` works with owned data + +For iterators, the "default" (unmarked) variant is immutably borrowed. +In other cases, the default is owned. + +The proposed rules depend on which variant is the default, but use +*suffixes* to mark variants in all cases. + +## The rules + +### Immutably borrowed by default + +If `foo` uses/produces an immutable borrow by default, use: + +* The `_mut` suffix (e.g. `foo_mut`) for the mutably borrowed variant. +* The `_move` suffix (e.g. `foo_move`) for the owned variant. + +However, in the case of iterators, the moving variant can also be +understood as an `into` conversion, `into_iter`, and `for x in v.into_iter()` +reads arguably better than `for x in v.iter_move()`, so the convention is +`into_iter`. + +**NOTE**: This convention covers only the *method* names for + iterators, not the names of the iterator types. That will be the + subject of a follow up RFC. + +### Owned by default + +If `foo` uses/produces owned data by default, use: + +* The `_ref` suffix (e.g. `foo_ref`) for the immutably borrowed variant. +* The `_mut` suffix (e.g. `foo_mut`) for the mutably borrowed variant. + +### Exceptions + +For mutably borrowed variants, if the `mut` qualifier is part of a +type name (e.g. `as_mut_slice`), it should appear as it would appear +in the type. + +### References to type names + +Some places in the current libraries, we say things like `as_ref` and +`as_mut`, and others we say `get_ref` and `get_mut_ref`. + +Proposal: generally standardize on `mut` as a shortening of `mut_ref`. + + +## The rationale + +### Why suffixes? + +Using a suffix makes it easier to visually group variants together, +especially when sorted alphabetically. It puts the emphasis on the +functionality, rather than the qualifier. + +### Why `move`? + +Historically, Rust has used `move` as a way to signal ownership +transfer and to connect to C++ terminology. The main disadvantage is +that it does not emphasize ownership, which is our current narrative. +On the other hand, in Rust all data is owned, so using `_owned` as a +qualifier is a bit strange. + +The `Copy` trait poses a problem for any terminology about ownership +transfer. The proposed mental model is that with `Copy` data you are +"moving a copy". + +See Alternatives for more discussion. + +### Why `mut` rather then `mut_ref`? + +It's shorter, and pairs like `as_ref` and `as_mut` have a pleasant harmony +that doesn't place emphasis on one kind of reference over the other. + +# Alternatives + +## Prefix or mixed qualifiers + +Using prefixes for variants is another possibility, but there seems to +be little upside. + +It's possible to rationalize our current mix of prefixes and suffixes +via +[grammatical distinctions](https://github.com/rust-lang/rust/issues/13660#issuecomment-43576378), +but this seems overly subtle and complex, and requires a strong +command of English grammar to work well. + +## No suffix exception + +The rules here make an exception when `mut` is part of a type name, as +in `as_mut_slice`, but we could instead *always* place the qualifier +as a suffix: `as_slice_mut`. This would make APIs more consistent in +some ways, less in others: conversion functions would no longer +consistently use a transcription of their type name. 
+ +This is perhaps not so bad, though, because as it is we often +abbreviate type names. In any case, we need a convention (separate +RFC) for how to refer to type names in methods. + +## `owned` instead of `move` + +The overall narrative about Rust has been evolving to focus on +*ownership* as the essential concept, with borrowing giving various +lesser forms of ownership, so `_owned` would be a reasonable +alternative to `_move`. + +On the other hand, the `ref` variants do not say "borrowed", so in +some sense this choice is inconsistent. In addition, the terminology +is less familiar to those coming from C++. + +## `val` instead of `owned` + +Another option would be `val` or `value` instead of `owned`. This +suggestion plays into the "by reference" and "by value" distinction, +and so is even more congruent with `ref` than `move` is. On the other +hand, it's less clear/evocative than either `move` or `owned`. diff --git a/text/0201-error-chaining.md b/text/0201-error-chaining.md new file mode 100644 index 00000000000..a6ee66b1ce2 --- /dev/null +++ b/text/0201-error-chaining.md @@ -0,0 +1,354 @@ +- Start Date: (fill me in with today's date, 2014-07-17) +- RFC PR #: [rust-lang/rfcs#201](https://github.com/rust-lang/rfcs/pull/201) +- Rust Issue #: [rust-lang/rust#17747](https://github.com/rust-lang/rust/issues/17747) + +# Summary + +This RFC improves interoperation between APIs with different error +types. It proposes to: + +* Increase the flexibility of the `try!` macro for clients of multiple + libraries with disparate error types. + +* Standardize on basic functionality that any error type should have + by introducing an `Error` trait. + +* Support easy error chaining when crossing abstraction boundaries. + +The proposed changes are all library changes; no language changes are +needed -- except that this proposal depends on +[multidispatch](https://github.com/rust-lang/rfcs/pull/195) happening. + +# Motivation + +Typically, a module (or crate) will define a custom error type encompassing the +possible error outcomes for the operations it provides, along with a custom +`Result` instance baking in this type. For example, we have `io::IoError` and +`io::IoResult = Result`, and similarly for other libraries. +Together with the `try!` macro, the story for interacting with errors for a +single library is reasonably good. + +However, we lack infrastructure when consuming or building on errors from +multiple APIs, or abstracting over errors. + +## Consuming multiple error types + +Our current infrastructure for error handling does not cope well with +mixed notions of error. + +Abstractly, as described by +[this issue](https://github.com/rust-lang/rust/issues/14419), we +cannot do the following: + +``` +fn func() -> Result { + try!(may_return_error_type_A()); + try!(may_return_error_type_B()); +} +``` + +Concretely, imagine a CLI application that interacts both with files +and HTTP servers, using `std::io` and an imaginary `http` crate: + +``` +fn download() -> Result<(), CLIError> { + let contents = try!(http::get(some_url)); + let file = try!(File::create(some_path)); + try!(file.write_str(contents)); + Ok(()) +} +``` + +The `download` function can encounter both `io` and `http` errors, and +wants to report them both under the common notion of `CLIError`. But +the `try!` macro only works for a single error type at a time. 
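+
+For comparison, here is roughly what `download` has to look like today; the
+`CLIError::from_http` and `CLIError::from_io` helpers are hypothetical and appear
+only to show the per-call conversion boilerplate:
+
+```
+fn download() -> Result<(), CLIError> {
+    let contents = try!(http::get(some_url).map_err(|e| CLIError::from_http(e)));
+    let file = try!(File::create(some_path).map_err(|e| CLIError::from_io(e)));
+    try!(file.write_str(contents).map_err(|e| CLIError::from_io(e)));
+    Ok(())
+}
+```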
+ +There are roughly two scenarios where multiple library error types +need to be coalesced into a common type, each with different needs: +application error reporting, and library error reporting + +### Application error reporting: presenting errors to a user + +An application is generally the "last stop" for error handling: it's +the point at which remaining errors are presented to the user in some +form, when they cannot be handled programmatically. + +As such, the data needed for application-level errors is usually +related to human interaction. For a CLI application, a short text +description and longer verbose description are usually all that's +needed. For GUI applications, richer data is sometimes required, but +usually not a full `enum` describing the full range of errors. + +Concretely, then, for something like the `download` function above, +for a CLI application, one might want `CLIError` to roughly be: + +```rust +struct CLIError<'a> { + description: &'a str, + detail: Option, + ... // possibly more fields here; see detailed design +} +``` + +Ideally, one could use the `try!` macro as in the `download` example +to coalesce a variety of error types into this single, simple +`struct`. + +### Library error reporting: abstraction boundaries + +When one library builds on others, it needs to translate from their +error types to its own. For example, a web server framework may build +on a library for accessing a SQL database, and needs some way to +"lift" SQL errors to its own notion of error. + +In general, a library may not want to reveal the upstream libraries it +relies on -- these are implementation details which may change over +time. Thus, it is critical that the error type of upstream libraries +not leak, and "lifting" an error from one library to another is a way +of imposing an abstraction boundaries. + +In some cases, the right way to lift a given error will depend on the +operation and context. In other cases, though, there will be a general +way to embed one kind of error in another (usually via a +["cause chain"](http://docs.oracle.com/javase/tutorial/essential/exceptions/chained.html)). Both +scenarios should be supported by Rust's error handling infrastructure. + +## Abstracting over errors + +Finally, libraries sometimes need to work with errors in a generic +way. For example, the `serialize::Encoder` type takes is generic over +an arbitrary error type `E`. At the moment, such types are completely +arbitrary: there is no `Error` trait giving common functionality +expected of all errors. Consequently, error-generic code cannot +meaningfully interact with errors. + +(See [this issue](https://github.com/rust-lang/rust/issues/15036) for +a concrete case where a bound would be useful; note, however, that the +design below does not cover this use-case, as explained in +Alternatives.) + +Languages that provide exceptions often have standard exception +classes or interfaces that guarantee some basic functionality, +including short and detailed descriptions and "causes". We should +begin developing similar functionality in `libstd` to ensure that we +have an agreed-upon baseline error API. + +# Detailed design + +We can address all of the problems laid out in the Motivation section +by adding some simple library code to `libstd`, so this RFC will +actually give a full implementation. + +**Note**, however, that this implementation relies on the +[multidispatch](https://github.com/rust-lang/rfcs/pull/195) proposal +currently under consideration. 
+
+The proposal consists of two pieces: a standardized `Error` trait and
+extensions to the `try!` macro.
+
+## The `Error` trait
+
+The standard `Error` trait follows the widespread pattern found
+in `Exception` base classes in many languages:
+
+```rust
+pub trait Error: Send + Any {
+    fn description(&self) -> &str;
+
+    fn detail(&self) -> Option<&str> { None }
+    fn cause(&self) -> Option<&Error> { None }
+}
+```
+
+Every concrete error type should provide at least a description. By
+making this a slice-returning method, it is possible to define
+lightweight `enum` error types and then implement this method as
+returning static string slices depending on the variant.
+
+The `cause` method allows for cause-chaining when an error crosses
+abstraction boundaries. The cause is recorded as a trait object
+implementing `Error`, which makes it possible to read off a kind of
+abstract backtrace (often more immediately helpful than a full
+backtrace).
+
+The `Any` bound is needed to allow *downcasting* of errors. This RFC
+stipulates that it must be possible to downcast errors in the style of
+the `Any` trait, but leaves unspecified the exact implementation
+strategy. (If trait object upcasting were available, one could simply
+upcast to `Any`; otherwise, we will likely need to duplicate the
+`downcast` APIs as blanket `impl`s on `Error` objects.)
+
+It's worth comparing the `Error` trait to the most widespread error
+type in `libstd`, `IoError`:
+
+```rust
+pub struct IoError {
+    pub kind: IoErrorKind,
+    pub desc: &'static str,
+    pub detail: Option<String>,
+}
+```
+
+Code that returns or asks for an `IoError` explicitly will be able to
+access the `kind` field and thus react differently to different kinds
+of errors. But code that works with a generic `Error` (e.g.,
+application code) sees only the human-consumable parts of the error.
+In particular, application code will often employ `Box<Error>` as the
+error type when reporting errors to the user. The `try!` macro
+support, explained below, makes doing so ergonomic.
+
+## An extended `try!` macro
+
+The other piece of the proposal is a way for `try!` to automatically
+convert between different types of errors.
+
+The idea is to introduce a trait `FromError<E>` that says how to
+convert from some lower-level error type `E` to `Self`. The `try!`
+macro then passes the error it is given through this conversion before
+returning:
+
+```rust
+// E here is an "input" for dispatch, so conversions from multiple error
+// types can be provided
+pub trait FromError<E> {
+    fn from_err(err: E) -> Self;
+}
+
+impl<E> FromError<E> for E {
+    fn from_err(err: E) -> E {
+        err
+    }
+}
+
+impl<E: Error> FromError<E> for Box<Error> {
+    fn from_err(err: E) -> Box<Error> {
+        box err as Box<Error>
+    }
+}
+
+macro_rules! try (
+    ($expr:expr) => ({
+        use error;
+        match $expr {
+            Ok(val) => val,
+            Err(err) => return Err(error::FromError::from_err(err))
+        }
+    })
+)
+```
+
+This code depends on
+[multidispatch](https://github.com/rust-lang/rfcs/pull/195), because
+the conversion depends on both the source and target error types. (In
+today's Rust, the two implementations of `FromError` given above would
+be considered overlapping.)
+
+Given the blanket `impl` of `FromError<E>` for `E`, all existing uses
+of `try!` would continue to work as-is.
+
+With this infrastructure in place, application code can generally use
+`Box<Error>` as its error type, and `try!` will take care of the rest:
+
+```
+fn download() -> Result<(), Box<Error>> {
+    let contents = try!(http::get(some_url));
+    let file = try!(File::create(some_path));
+    try!(file.write_str(contents));
+    Ok(())
+}
+```
+
+Library code that defines its own error type can define custom
+`FromError` implementations for lifting lower-level errors (where the
+lifting should also perform cause chaining) -- at least when the
+lifting is uniform across the library. The effect is that the mapping
+from one error type into another only has to be written once, rather
+than at every use of `try!`:
+
+```
+impl FromError<ErrorA> for MyError { ... }
+impl FromError<ErrorB> for MyError { ... }
+
+fn my_lib_func() -> Result<T, MyError> {
+    try!(may_return_error_type_A());
+    try!(may_return_error_type_B());
+}
+```
+
+# Drawbacks
+
+The main drawback is that the `try!` macro is a bit more complicated.
+
+# Unresolved questions
+
+## Conventions
+
+This RFC does not define any particular conventions around cause
+chaining or concrete error types. It will likely take some time and
+experience using the proposed infrastructure before we can settle
+these conventions.
+
+## Extensions
+
+The functionality in the `Error` trait is quite minimal, and should
+probably grow over time. Some additional functionality might include:
+
+### Features on the `Error` trait
+
+* **Generic creation of `Error`s.** It might be useful for the `Error`
+  trait to expose an associated constructor. See
+  [this issue](https://github.com/rust-lang/rust/issues/15036) for an
+  example where this functionality would be useful.
+
+* **Mutation of `Error`s**. The `Error` trait could be expanded to
+  provide setters as well as getters.
+
+The main reason not to include the above two features is so that
+`Error` can be used with extremely minimal data structures,
+e.g. simple `enum`s. For such data structures, it's possible to
+produce fixed descriptions, but not mutate descriptions or other error
+properties. Allowing generic creation of any `Error`-bounded type
+would also require these `enum`s to include something like a
+`GenericError` variant, which is unfortunate. So for now, the design
+sticks to the least common denominator.
+
+### Concrete error types
+
+On the other hand, for code that doesn't care about the footprint of
+its error types, it may be useful to provide something like the
+following generic error type:
+
+```rust
+pub struct WrappedError<E> {
+    pub kind: E,
+    pub description: String,
+    pub detail: Option<String>,
+    pub cause: Option<Box<Error>>
+}
+
+impl<E: Show> WrappedError<E> {
+    pub fn new(err: E) -> WrappedError<E> {
+        WrappedError {
+            kind: err,
+            description: err.to_string(),
+            detail: None,
+            cause: None
+        }
+    }
+}
+
+impl<E: Send + Any> Error for WrappedError<E> {
+    fn description(&self) -> &str {
+        self.description.as_slice()
+    }
+    fn detail(&self) -> Option<&str> {
+        self.detail.as_ref().map(|s| s.as_slice())
+    }
+    fn cause(&self) -> Option<&Error> {
+        self.cause.as_ref().map(|c| &**c)
+    }
+}
+```
+
+This type can easily be added later, so again this RFC sticks to the
+minimal functionality for now.
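+
+As a final illustration of the cause-chaining pattern described above, a library might
+lift an `IoError` into its own error type roughly as follows (all of the names here are
+hypothetical and are not part of the proposal):
+
+```rust
+struct DbError {
+    description: String,
+    cause: Option<Box<Error>>,
+}
+
+impl Error for DbError {
+    fn description(&self) -> &str { self.description.as_slice() }
+    fn cause(&self) -> Option<&Error> { self.cause.as_ref().map(|c| &**c) }
+}
+
+impl FromError<IoError> for DbError {
+    fn from_err(err: IoError) -> DbError {
+        DbError {
+            description: "I/O failure in the database layer".to_string(),
+            cause: Some(box err as Box<Error>),
+        }
+    }
+}
+```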
diff --git a/text/0202-subslice-syntax-change.md b/text/0202-subslice-syntax-change.md new file mode 100644 index 00000000000..8ec42ec1c63 --- /dev/null +++ b/text/0202-subslice-syntax-change.md @@ -0,0 +1,60 @@ +- Start Date: 2014-08-15 +- RFC PR: https://github.com/rust-lang/rfcs/pull/202 +- Rust Issue: https://github.com/rust-lang/rust/issues/16967 + +# Summary + +Change syntax of subslices matching from `..xs` to `xs..` +to be more consistent with the rest of the language +and allow future backwards compatible improvements. + +Small example: + +```rust +match slice { + [xs.., _] => xs, + [] => fail!() +} +``` + +This is basically heavily stripped version of [RFC 101](https://github.com/rust-lang/rfcs/pull/101). + +# Motivation + +In Rust, symbol after `..` token usually describes number of things, +as in `[T, ..N]` type or in `[e, ..N]` expression. +But in following pattern: `[_, ..xs]`, `xs` doesn't describe any number, +but the whole subslice. + +I propose to move dots to the right for several reasons (including one mentioned above): + +1. Looks more natural (but that might be subjective). +2. Consistent with the rest of the language. +3. C++ uses `args...` in variadic templates. +4. It allows extending slice pattern matching as described in [RFC 101](https://github.com/rust-lang/rfcs/pull/101). + +# Detailed design + +Slice matching grammar would change to (assuming trailing commas; +grammar syntax as in Rust manual): + + slice_pattern : "[" [[pattern | subslice_pattern] ","]* "]" ; + subslice_pattern : ["mut"? ident]? ".." ["@" slice_pattern]? ; + +To compare, currently it looks like: + + slice_pattern : "[" [[pattern | subslice_pattern] ","]* "]" ; + subslice_pattern : ".." ["mut"? ident ["@" slice_pattern]?]? ; + +# Drawbacks + +Backward incompatible. + +# Alternatives + +Don't do it at all. + +# Unresolved questions + +Whether subslice matching combined with `@` should be written as `xs.. @[1, 2]` +or maybe in another way: `xs @[1, 2]..`. diff --git a/text/0212-restore-int-fallback.md b/text/0212-restore-int-fallback.md new file mode 100644 index 00000000000..874c86dc875 --- /dev/null +++ b/text/0212-restore-int-fallback.md @@ -0,0 +1,225 @@ +- Start Date: 2014-09-03 +- RFC PR: https://github.com/rust-lang/rfcs/pull/212 +- Rust Issue: https://github.com/rust-lang/rust/issues/16968 + +# Summary + +Restore the integer inference fallback that was removed. Integer +literals whose type is unconstrained will default to `i32`, unlike the +previous fallback to `int`. +Floating point literals will default to `f64`. + +# Motivation + +## History lesson + +Rust has had a long history with integer and floating-point +literals. Initial versions of Rust required *all* literals to be +explicitly annotated with a suffix (if no suffix is provided, then +`int` or `float` was used; note that the `float` type has since been +removed). This meant that, for example, if one wanted to count up all +the numbers in a list, one would write `0u` and `1u` so as to employ +unsigned integers: + + let mut count = 0u; // let `count` be an unsigned integer + while cond() { + ... + count += 1u; // `1u` must be used as well + } + +This was particularly troublesome with arrays of integer literals, +which could be quite hard to read: + + let byte_array = [0u8, 33u8, 50u8, ...]; + +It also meant that code which was very consciously using 32-bit or +64-bit numbers was hard to read. 
+ +Therefore, we introduced integer inference: unlabeled integer literals +are not given any particular integral type rather a fresh "integral +type variable" (floating point literals work in an analogous way). The +idea is that the vast majority of literals will eventually interact +with an actual typed variable at some point, and hence we can infer +what type they ought to have. For those cases where the type cannot be +automatically selected, we decided to fallback to our older behavior, +and have integer/float literals be typed as `int`/`float` (this is also what Haskell +does). Some time later, we did [various measurements][m] and found +that in real world code this fallback was rarely used. Therefore, we +decided that to remove the fallback. + +## Experience with lack of fallback + +Unfortunately, when doing the measurements that led us to decide to +remove the `int` fallback, we neglected to consider coding "in the +small" (specifically, we did not include tests in the +measurements). It turns out that when writing small programs, which +includes not only "hello world" sort of things but also tests, the +lack of integer inference fallback is quite annoying. This is +particularly troublesome since small program are often people's first +exposure to Rust. The problems most commonly occur when integers are +"consumed" by printing them out to the screen or by asserting +equality, both of which are very common in small programs and testing. + +There are at least three common scenarios where fallback would be +beneficial: + +**Accumulator loops.** Here a counter is initialized to `0` and then +incremented by `1`. Eventually it is printed or compared against +a known value. + +``` +let mut c = 0; +loop { + ...; + c += 1; +} +println!("{}", c); // Does not constrain type of `c` +assert_eq(c, 22); +``` + +**Calls to range with constant arguments.** Here a call to range like +`range(0, 10)` is used to execute something 10 times. It is important +that the actual counter is either unused or only used in a print out +or comparison against another literal: + +``` +for _ in range(0, 10) { +} +``` + +**Large constants.** In small tests it is convenient to make dummy +test data. This frequently takes the form of a vector or map of ints. + +``` +let mut m = HashMap::new(); +m.insert(1, 2); +m.insert(3, 4); +assert_eq(m.find(&3).map(|&i| i).unwrap(), 4); +``` + +## Lack of bugs + +To our knowledge, there has not been a single bug exposed by removing +the fallback to the `int` type. Moreover, such bugs seem to be +extremely unlikely. + +The primary reason for this is that, in production code, the `i32` +fallback is very rarely used. In a sense, the same [measurements][m] +that were used to justify removing the `int` fallback also justify +keeping it. As the measurements showed, the vast, vast majority of +integer literals wind up with a constrained type, unless they are only +used to print out and do assertions with. Specifically, any integer +that is passed as a parameter, returned from a function, or stored in +a struct or array, must wind up with a specific type. + +## Rationale for the choice of defaulting to `i32` + +In contrast to the first revision of the RFC, the fallback type +suggested is `i32`. This is justified by a case analysis which showed +that there does not exist a compelling reason for having a signed +pointer-sized integer type as the default. 
+ +There are reasons *for* using `i32` instead: It's familiar to programmers +from the C programming language (where the default int type is 32-bit in +the major calling conventions), it's faster than 64-bit integers in +arithmetic today, and is superior in memory usage while still providing +a reasonable range of possible values. + +To expand on the perfomance argument: `i32` obviously uses half of the +memory of `i64` meaning half the memory bandwidth used, half as much +cache consumption and twice as much vectorization – additionally +arithmetic (like multiplication and division) is faster on some of the +modern CPUs. + +## Case analysis + +This is an analysis of cases where `int` inference might be thought of +as useful: + +**Indexing into an array with unconstrained integer literal:** + +``` +let array = [0u8, 1, 2, 3]; +let index = 3; +array[index] +``` + +In this case, `index` is already automatically inferred to be a `uint`. + +**Using a default integer for tests, tutorials, etc.:** Examples of this +include "The Guide", the Rust API docs and the Rust standard library +unit tests. This is better served by a smaller, faster and platform +independent type as default. + +**Using an integer for an upper bound or for simply printing it:** This +is also served very well by `i32`. + +**Counting of loop iterations:** This is a part where `int` is as badly +suited as `i32`, so at least the move to `i32` doesn't create new +hazards (note that the number of elements of a vector might not +necessarily fit into an `int`). + +In addition to all the points above, having a platform-independent type +obviously results in less differences between the platforms in which the +programmer "doesn't care" about the integer type they are using. + +## Future-proofing for overloaded literals + +It is possible that, in the future, we will wish to allow vector and +strings literals to be overloaded so that they can be resolved to +user-defined types. In that case, for backwards compatibility, it will +be necessary for those literals to have some sort of fallback type. +(This is a relatively weak consideration.) + +# Detailed design + +Integral literals are currently type-checked by creating a special +class of type variable. These variables are subject to unification as +normal, but can only unify with integral types. This RFC proposes +that, at the end of type inference, when all constraints are known, we +will identify all integral type variables that have not yet been bound +to anything and bind them to `i32`. Similarly, floating point literals +will fallback to `f64`. + +For those who wish to be very careful about which integral types they +employ, a new lint (`unconstrained_literal`) will be added which +defaults to `allow`. This lint is triggered whenever the type of an +integer or floating point literal is unconstrained. + +# Downsides + +Although there seems to be little motivation for `int` to be the +default, there might be use cases where `int` is a more correct fallback +than `i32`. + +Additionally, it might seem weird to some that `i32` is a default, when +`int` looks like the default from other languages. The name of `int` +however is not in the scope of this RFC. + + +# Alternatives + +- **No fallback.** Status quo. + +- **Fallback to something else.** We could potentially fallback to + `int` like the original RFC suggested or some other integral type + rather than `i32`. 
+ +- **Fallback in a more narrow range of cases.** We could attempt to + identify integers that are "only printed" or "only compared". There + is no concrete proposal in this direction and it seems to lead to an + overly complicated design. + +- **Default type parameters influencing inference.** There is a + separate, follow-up proposal being prepared that uses default type + parameters to influence inference. This would allow some examples, + like `range(0, 10)` to work even without integral fallback, because + the `range` function itself could specify a fallback type. However, + this does not help with many other examples. + +# History + +2014-11-07: Changed the suggested fallback from `int` to `i32`, add +rationale. + +[m]: https://gist.github.com/nikomatsakis/11179747 diff --git a/text/0213-defaulted-type-params.md b/text/0213-defaulted-type-params.md new file mode 100644 index 00000000000..5425cc46ffc --- /dev/null +++ b/text/0213-defaulted-type-params.md @@ -0,0 +1,571 @@ +- Start Date: 2015-02-04 +- RFC PR: https://github.com/rust-lang/rfcs/pull/213 +- Rust Issue: https://github.com/rust-lang/rust/issues/21939 + +# Summary + +Rust currently includes feature-gated support for type parameters that +specify a default value. This feature is not well-specified. The aim +of this RFC is to fully specify the behavior of defaulted type +parameters: + +1. Type parameters in any position can specify a default. +2. Within fn bodies, defaulted type parameters are used to drive inference. +3. Outside of fn bodies, defaulted type parameters supply fixed + defaults. +4. `_` can be used to omit the values of type parameters and apply a + suitable default: + - In a fn body, any type parameter can be omitted in this way, and + a suitable type variable will be used. + - Outside of a fn body, only defaulted type parameters can be + omitted, and the specified default is then used. + +Points 2 and 4 extend the current behavior of type parameter defaults, +aiming to address some shortcomings of the current implementation. + +This RFC would remove the feature gate on defaulted type parameters. + +# Motivation + +## Why defaulted type parameters + +Defaulted type parameters are very useful in two main scenarios: + +1. Extended a type without breaking existing clients. +2. Allowing customization in ways that many or most users do not care + about. + +Often, these two scenarios occur at the same time. A classic +historical example is the `HashMap` type from Rust's standard +library. This type now supports the ability to specify custom +hashers. For most clients, this is not particularly important and this +initial versions of the `HashMap` type were not customizable in this +regard. But there are some cases where having the ability to use a +custom hasher can make a huge difference. Having the ability to +specify defaults for type parameters allowed the `HashMap` type to add +a new type parameter `H` representing the hasher type without breaking +any existing clients and also without forcing all clients to specify +what hasher to use. + +However, customization occurs in places other than types. Consider the +function `range()`. In early versions of Rust, there was a distinct +range function for each integral type (e.g. `uint::range`, +`int::range`, etc). 
These functions were eventually consolidated into +a single `range()` function that is defined generically over all +"enumerable" types: + + trait Enumerable : Add + PartialOrd + Clone + One; + pub fn range(start: A, stop: A) -> Range { + Range{state: start, stop: stop, one: One::one()} + } + +This version is often more convenient to use, particularly in a +generic context. + +However, the generic version does have the downside that when the +bounds of the range are integral, inference sometimes lacks enough +information to select a proper type: + + // ERROR -- Type argument unconstrained, what integral type did you want? + for x in range(0, 10) { ... } + +Thus users are forced to write: + + for x in range(0u, 10u) { ... } + +This RFC describes how to integrate default type parameters with +inference such that the type parameter on `range` can specify a +default (`uint`, for example): + + pub fn range(start: A, stop: A) -> Range { + Range{state: start, stop: stop, one: One::one()} + } + +Using this definition, a call like `range(0, 10)` is perfectly legal. +If it turns out that the type argument is not other constraint, `uint` +will be used instead. + +## Extending types without breaking clients. + +Without defaults, once a library is released to "the wild", it is not +possible to add type parameters to a type without breaking all +existing clients. However, it frequently happens that one wants to +take an existing type and make it more flexible that it used to be. +This often entails adding a new type parameter so that some type which +was hard-coded before can now be customized. Defaults provide a means +to do this while having older clients transparently fallback to the +older behavior. + +*Historical example:* Extending HashMap to support various hash + algorithms. + +# Detailed Design + +## Remove feature gate + +This RFC would remove the feature gate on defaulted type parameters. + +## Type parameters with defaults + +Defaults can be placed on any type parameter, whether it is declared +on a type definition (`struct`, `enum`), type alias (`type`), trait +definition (`trait`), trait implementation (`impl`), or a function or +method (`fn`). + +Once a given type parameter declares a default value, all subsequent +type parameters in the list must declare default values as well: + + // OK. All defaulted type parameters come at the end. + fn foo() { .. } + + // ERROR. B has a default, but C does not. + fn foo() { .. } + +The default value of a type parameter `X` may refer to other type +parameters declared on the same item. However, it may only refer to +type parameters declared *before* `X` in the list of type parameters: + + // OK. Default value of `B` refers to `A`, which is not defaulted. + fn foo() { .. } + + // OK. Default value of `C` refers to `B`, which comes before + // `C` in the list of parameters. + fn foo() { .. } + + // ERROR. Default value of `B` refers to `C`, which comes AFTER + // `B` in the list of parameters. + fn foo() { .. } + +## Instantiating defaults + +This section specifies how to interpret a reference to a generic +type. Rather than writing out a rather tedious (and hard to +understand) description of the algorithm, the rules are instead +specified by a series of examples. The high-level idea of the rules is +as follows: + +- Users must always provide *some* value for non-defaulted type parameters. + Defaulted type parameters may be omitted. 
+- The `_` notation can always be used to *explicitly omit* the value + of a type parameter: + - Inside a fn body, any type parameter may be omitted. Inference is used. + - Outside a fn body, only defaulted type parameters may be + omitted. The default value is used. + - *Motivation:* This is consistent with Rust tradition, which + generally requires explicit types or a mechanical defaulting + process outside of `fn` bodies. + +### References to generic types + +We begin with examples of references to the generic type `Foo`: + + struct Foo { ... } + +`Foo` defines four type parameters, the final two of which are +defaulted. First, let us consider what happens outside of a fn +body. It is mandatory to supply explicit values for all non-defaulted +type parameters: + + // ERROR: 2 parameters required, 0 provided. + fn f(_: &Foo) { ... } + +Defaulted type parameters are filled in based on the defaults given: + + // Legal: Equivalent to `Foo` + fn f(_: &Foo) { ... } + +Naturally it is legal to specify explicit values for the defaulted +type parameters if desired: + + // Legal: Equivalent to `Foo` + fn f(_: &Foo) { ... } + +It is also legal to provide just one of the defaulted type parameters +and not the other: + + // Legal: Equivalent to `Foo` + fn f(_: &Foo) { ... } + +If the user wishes to supply the value of the type parameter `D` +explicitly, but not `C`, then `_` can be used to request the default: + + // Legal: Equivalent to `Foo` + fn f(_: &Foo) { ... } + +Note that, outside of a fn body, `_` can *only* be used with +defaulted type parameters: + + // ERROR: outside of a fn body, `_` cannot be + // used for a non-defaulted type parameter + fn f(_: &Foo) { ... } + +Inside a fn body, the rules are much the same, except that `_` is +legal everywhere. Every reference to `_` creates a fresh type +variable `$n`. If the type parameter whose value is omitted has an +associate default, that default is used as the *fallback* for `$n` +(see the section "Type variables with fallbacks" for more +information). Here are some examples: + + fn f() { + // Error: `Foo` requires at least 2 type parameters, 0 supplied. + let x: Foo = ...; + + // All of these 4 examples are OK and equivalent. Each + // results in a type `Foo<$0,$1,$2,$3>` and `$0`-`$4` are type + // variables. `$2` has a fallback of `DefaultHasher` and `$3` + // has a fallback of `$2`. + let x: Foo<_,_> = ...; + let x: Foo<_,_,_> = ...; + let x: Foo<_,_,_,_> = ...; + + // Results in a type `Foo` where `$0` + // has a fallback of `DefaultHasher`. + let x: Foo = ...; + } + +### References to generic traits + +The rules for traits are the same as the rules for types. Consider a +trait `Foo`: + + trait Foo { ... } + +References to this trait can omit values for `C` and `D` in precisely +the same way as was shown for types: + + // All equivalent to Foo: + fn foo>() { ... } + fn foo>() { ... } + fn foo>() { ... } + + // Equivalent to Foo: + fn foo>() { ... } + +### References to generic functions + +The rules for referencing generic functions are the same as for types, +except that it is legal to omit values for all type parameters if +desired. In that case, the behavior is the same as it would be if `_` +were used as the value for every type parameter. Note that functions +can only be referenced from within a fn body. + +### References to generic impls + +Users never explicitly "reference" an impl. Rather, the trait matching +system implicitly instantaites impls as part of trait matching. 
This +implies that all type parameters are always instantiated with type +variables. These type variables are assigned fallbacks according to +the defaults given. + +## Type variables with fallbacks + +We extend the inference system so that when a type variable is +created, it can optionally have a *fallback value*, which is another +type. + +In the type checker, whenever we create a fresh type variable to +represent a type parameter with an associated default, we will use +that default as the fallback value for this type variable. + +Example: + +``` +fn foo(a: A, b: B) { ... } + +fn bar() { + // Here, the values of the type parameters are given explicitly. + let f: fn(uint, uint) = foo::; + + // Here the value of the first type parameter is given explicitly, + // but not the second. Because the second specifies a default, this + // is permitted. The type checker will create a fresh variable `$0` + // and attempt to infer the value of this defaulted type parameter. + let g: fn(uint, $0) = foo::; + + // Here, the values of the type parameters are not given explicitly, + // and hence the type checker will create fresh variables + // `$1` and `$2` for both of them. + let h: fn($1, $2) = foo; +} +``` + +In this snippet, there are three references to the generic function +`foo`, each of which specifies progressively fewer types. As a result, +the type checker winds up creating three type variables, which are +referred to in the example as `$0`, `$1`, and `$2` (not that this `$` +notation is just for explanatory purposes and is not actual Rust +syntax). + +The fallback values of `$0`, `$1`, and `$2` are as follows: + +- `$0` was created to represent the type parameter `B` defined on + `foo`. This means that `$0` will have a fallback value of `uint`, + since the type variable `A` was specified to be `uint` in the + expression that created `$0`. +- `$1` was created to represent the type parameter `A`, which + has no default. Therefore `$1` has no fallback. +- `$2` was created to represent the type parameter `B`. It will + have the fallback value of `$1`, which was the value of `A` + within the expression where `$2` was created. + +## Trait resolution, fallbacking, and inference + +Prior to this RFC, type-checking a function body proceeds roughly as +follows: + +1. The function body is analyzed. This results in an accumulated set of + type variables, constraints, and trait obligations. +2. Those trait obligations are then resolved until a fixed point + is reached. +3. If any trait obligations remain unresolved, an error is reported. +4. If any type variables were never bound to a concrete value, an error + is reported. + +To accommodate fallback, the new procedure is somewhat different: + +1. The function body is analyzed. This results in an accumulated set of + type variables, constraints, and trait obligations. +2. Execute in a loop: + 1. Run trait resolution until a fixed point is reached. + 2. Create a (initially empty) set `UB` of unbound type and + integral/float variables. This set represents the set of + variables for which fallbacks should be applied. + 3. Add all unbound integral and float variables to the set `UB` + 4. For each type variable `X`: + - If `X` has no fallback defined, skip. + - If `X` is not bound, add `X` to `UB` + - If `X` is bound to an unbound integral variable `I`, add `X` to + `UB` and remove `I` from `UB` (if present). + - If `X` is bound to an unbound float variable `F`, add `X` to + `UB` and remove `F` from `UB` (if present). + 5. 
If `UB` is the empty set, break out of the loop. + 6. For each member of `UB`: + - If the member is an integral type variable `I`, set `I` to `int`. + - If the member is a float variable `F`, set `I` to `f64`. + - Otherwise, the member must be a variable `X` with a defined fallback. + Set `X` to its fallback. + - Note that this "set" operations can fail, which indicates + conflicting defaults. A suitable error message should be + given. +3. If any type parameters still have no value assigned to them, report an error. +4. If any trait obligations could not be resolved, report an error. + +There are some subtle points to this algorithm: + +**When defaults are to be applied, we first gather up the set of +variables that have applicable defaults (step 2.2) and then later +unconditionally apply those defaults (step 2.4).** In particular, we +do not loop over each type variable, check whether it is unbound, and +apply the default only if it is unbound. The reason for this is that +it can happen that there are contradictory defaults and we want to +ensure that this results in an error: + + fn foo() -> F { } + fn bar(b: B) { } + fn baz() { + // Here, F is instantiated with $0=uint + let x: $0 = foo(); + + // Here, B is instantiated with $1=uint, and constraint $0 <: $1 is added. + bar(x); + } + +In this example, two type variables are created. `$0` is the value of +`F` in the call to `foo()` and `$1` is the value of `B` in the call to +`bar()`. The fact that `x`, which has type `$0`, is passed as an +argument to `bar()` will add the constraint that `$0 <: $1`, but at no +point are any concrete types given. Therefore, once type checking is +complete, we will apply defaults. Using the algorithm given above, we +will determine that both `$0` and `$1` are unbound and have suitable +defaults. We will then unify `$0` with `uint`. This will succeed and, +because `$0 <: $1`, cause `$1` to be unified with `uint`. Next, we +will try to unify `$1` with its default, `int`. This will lead to an +error. If we combined the checking of whether `$1` was unbound with +the unification with the default, we would have first unified `$0` and +then decided that `$1` did not require unification. + +**In the general case, a loop is required to continue resolving traits +and applying defaults in sequence.** Resolving traits can lead to +unifications, so it is clear that we must resolve all traits that we +can before we apply any defaults. However, it is also true that adding +defaults can create new trait obligations that must be resolved. + +Here is an example where processing trait obligations creates +defaults, and processing defaults created trait obligations: + + trait Foo { } + trait Bar { } + + impl Foo for Vec { } // Impl 1 + impl Bar for uint { } // Impl 2 + + fn takes_foo(f: F) { } + + fn main() { + let x = Vec::new(); // x: Vec<$0> + takes_foo(x); // adds oblig Vec<$0> : Foo + } + +When we finish type checking `main`, we are left with a variable `$0` +and a trait obligation `Vec<$0> : Foo`. Processing the trait +obligation selects the impl 1 as the way to fulfill this trait +obligation. This results in: + +1. a new type variable `$1`, which represents the parameter `T` on the impl. + `$1` has a default, `uint`. +2. the constraint that `$0=$1`. +3. a new trait obligation `$1 : Bar`. + +We cannot process the new trait obligation yet because the type +variable `$1` is still unbound. (We know that it is equated with `$0`, +but we do not have any concrete types yet, just variables.) 
After +trait resolution reaches a fixed point, defaults are applied. `$1` is +equated with `uint` which in turn propagates to `$0`. At this point, +there is still an outstanding trait obligation `uint : Bar`. This +trait obligation can be resolved to impl 2. + +The previous example consisted of "1.5" iterations of the loop. That +is, although trait resolution runs twice, defaults are only needed one +time: + +1. Trait resolution executed to resolve `Vec<$0> : Foo`. +2. Defaults were applied to unify `$1 = $0 = uint`. +3. Trait resolution executed to resolve `uint : Bar` +4. No more defaults to apply, done. + +The next example does 2 full iterations of the loop. + + trait Foo { } + trait Bar { } + trait Baz { } + + impl=Vec> Foo for Vec { } // Impl 1 + impl Bar for Vec { } // Impl 2 + + fn takes_foo(f: F) { } + + fn main() { + let x = Vec::new(); // x: Vec<$0> + takes_foo(x); // adds oblig Vec<$0> : Foo + } + +Here the process is as follows: + +1. Trait resolution executed to resolve `Vec<$0> : Foo`. The result is + two fresh variables, `$1` (for `U`) and `$2=Vec<$1>` (for `$T`), the + constraint that `$0=$2`, and the obligation `$2 : Bar<$1>`. +2. Defaults are applied to unify `$2 = $0 = Vec<$1>`. +3. Trait resolution executed to resolve `$2 : Bar<$1>`. The result + is a fresh variable `$3=uint` (for `$V`) and the constraint + that `$1=$3`. +4. Defaults are applied to unify `$3 = $1 = uint`. + +It should be clear that one can create examples in this vein so as to +require any number of loops. + +**Interaction with integer/float literal fallback.** This RFC gives +defaulted type parameters precedence over integer/float literal +fallback. This seems preferable because such types can be more +specific. Below are some examples. See also the *alternatives* +section. + +``` +// Here the type of the integer literal 22 is inferred +// to `int` using literal fallback. +fn foo(t: T) { ... } +foo(22) +``` + +``` +// Here the type of the integer literal 22 is inferred +// to `uint` because the default on `T` overrides the +// standard integer literal fallback. +fn foo(t: T) { ... } +foo(22) +``` + +``` +// Here the type of the integer literal 22 is inferred +// to `char`, leading to an error. This can be resolved +// by using an explicit suffix like `22i`. +fn foo(t: T) { ... } +foo(22) +``` + +**Termination.** Any time that there is a loop, one must inquire after +termination. In principle, the loop above could execute indefinitely. +This is because trait resolution is not guaranteed to terminate -- +basically there might be a cycle between impls such that we continue +creating new type variables and new obligations forever. The trait +matching system already defends against this with a recursion counter. +That same recursion counter is sufficient to guarantee termination +even when the default mechanism is added to the mix. This is because +the default mechanism can never itself create new trait obligations: +it can only cause previous ambiguous trait obligations to now be +matchable (because unbound variables become bound). But the actual +need to iteration through the loop is still caused by trait matching +generating recursive obligations, which have an associated depth +limit. + +## Compatibility analysis + +One of the major design goals of defaulted type parameters is to +permit new parameters to be added to existing types or methods in a +backwards compatible way. This remains possible under the current +design. 
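+
+To make this concrete, here is a small sketch of the kind of
+extension that defaults permit. The type and parameter names below
+are invented for this example; they are not drawn from the standard
+library.
+
+```
+struct DefaultHasher;
+
+// Suppose the type originally shipped with only two parameters:
+//
+//     struct MyMap<K,V> { keys: Vec<K>, values: Vec<V> }
+//
+// Adding a third, defaulted parameter preserves existing clients:
+struct MyMap<K,V,H=DefaultHasher> {
+    keys: Vec<K>,
+    values: Vec<V>,
+    hasher: H,
+}
+
+// Old code that still writes `MyMap<String, uint>` continues to
+// compile; the omitted parameter `H` is filled in with the default.
+fn takes_map(_: &MyMap<String, uint>) { }
+```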
+ +Note though that adding a default to an *existing* type parameter can +lead to type errors in clients. This can occur if clients were already +relying on an inference fallback from some other source and there is +now an ambiguity. Naturally clients can always fix this error by +specifying the value of the type parameter in question manually. + +# Downsides and alternatives + +## Avoid inference + +Rather than adding the notion of *fallbacks* to type variables, +defaults could be mechanically added, even within fn bodies, as they +are today. But this is disappointing because it means that examples +like `range(0,10)`, where defaults could inform inference, still +require explicit annotation. Without the notion of fallbacks, it is +also difficult to say what defaulted type parameters in methods or +impls should mean. + +## More advanced interaction between integer literal inference + +There were some other proposals to have a more advanced interaction +between custom fallbacks and literal inference. For example, it is +possible to imagine that we allow literal inference to take precedence +over type default fallbacks, unless the fallback is itself integral. +The problem is that this is both complicated and possibly not forwards +compatible if we opt to allow a more general notion of literal +inference in the future (in other words, if integer literals may be +mapped to more than just the built-in integral types). Furthermore, +these rules would create strictly fewer errors, and hence can be added +in the future if desired. + +## Notation + +Allowing `_` notation outside of fn body means that it's meaning +changes somewhat depending on context. However, this is consistent +with the meaning of omitted lifetimes, which also change in the same +way (mechanical default outside of fn body, inference within). + +An alternative design is to use the `K=V` notation proposed in the +associated items RFC for specify the values of default type +parameters. However, this is somewhat odd, because default type +parameters appear in a positional list, and thus it is suprising that +values for the non-defaulted parameters are given positionally, but +values for the defaulted type parameters are given with labels. + +Another alternative would to simply prohibit users from specifying the +value of a defaulted type parameter unless values are given for all +previous defaulted typed parameters. But this is clearly annoying in +those cases where defaulted type parameters represent distinct axes of +customization. + +# Hat Tip + +eddyb introduced defaulted type parameters and also opened the first +pull request that used them to inform inference. diff --git a/text/0214-while-let.md b/text/0214-while-let.md new file mode 100644 index 00000000000..9385e8280d2 --- /dev/null +++ b/text/0214-while-let.md @@ -0,0 +1,84 @@ +- Start Date: 2014-08-27 +- RFC PR: https://github.com/rust-lang/rfcs/pull/214 +- Rust Issue: https://github.com/rust-lang/rust/issues/17687 + +# Summary + +Introduce a new `while let PAT = EXPR { BODY }` construct. This allows for using a refutable pattern +match (with optional variable binding) as the condition of a loop. + +# Motivation + +Just as `if let` was inspired by Swift, it turns out Swift supports `while let` as well. This was +not discovered until much too late to include it in the `if let` RFC. It turns out that this sort of +looping is actually useful on occasion. 
For example, the desugaring `for` loop is actually a variant +on this; if `while let` existed it could have been implemented to map `for PAT in EXPR { BODY }` to + +```rust +// the match here is so `for` can accept an rvalue for the iterator, +// and was used in the "real" desugaring version. +match &mut EXPR { + i => { + while let Some(PAT) = i.next() { + BODY + } + } +} +``` + +(note that the non-desugared form of `for` is no longer equivalent). + +More generally, this construct can be used any time looping + pattern-matching is desired. + +This also makes the language a bit more consistent; right now, any condition that can be used with +`if` can be used with `while`. The new `if let` adds a form of `if` that doesn't map to `while`. +Supporting `while let` restores the equivalence of these two control-flow constructs. + +# Detailed design + +`while let` operates similarly to `if let`, in that it desugars to existing syntax. Specifically, +the syntax + +```rust +['ident:] while let PAT = EXPR { + BODY +} +``` + +desugars to + +```rust +['ident:] loop { + match EXPR { + PAT => BODY, + _ => break + } +} +``` + +Just as with `if let`, an irrefutable pattern given to `while let` is considered an error. This is +largely an artifact of the fact that the desugared `match` ends up with an unreachable pattern, +and is not actually a goal of this syntax. The error may be suppressed in the future, which would be +a backwards-compatible change. + +Just as with `if let`, `while let` will be introduced under a feature gate (named `while_let`). + +# Drawbacks + +Yet another addition to the grammar. Unlike `if let`, it's not obvious how useful this syntax will +be. + +# Alternatives + +As with `if let`, this could plausibly be done with a macro, but it would be ugly and produce bad +error messages. + +`while let` could be extended to support alternative patterns, just as match arms do. This is not +part of the main proposal for the same reason it was left out of `if let`, which is that a) it looks +weird, and b) it's a bit of an odd coupling with the `let` keyword as alternatives like this aren't +going to be introducing variable bindings. However, it would make `while let` more general and able +to replace more instances of `loop { match { ... } }` than is possible with the main design. + +# Unresolved questions + +None. diff --git a/text/0216-collection-views.md b/text/0216-collection-views.md new file mode 100644 index 00000000000..b2c242f815b --- /dev/null +++ b/text/0216-collection-views.md @@ -0,0 +1,208 @@ +- Start Date: 2014-08-28 +- RFC PR: (https://github.com/rust-lang/rfcs/pull/216) +- Rust Issue: (https://github.com/rust-lang/rust/issues/17320) + +# Summary + +Add additional iterator-like Entry objects to collections. +Entries provide a composable mechanism for in-place observation and mutation of a +single element in the collection, without having to "re-find" the element multiple times. +This deprecates several "internal mutation" methods like hashmap's `find_or_insert_with`. + +# Motivation + +As we approach 1.0, we'd like to normalize the standard APIs to be consistent, composable, +and simple. However, this currently stands in opposition to manipulating the collections in +an *efficient* manner. For instance, if one wishes to build an accumulating map on top of one +of the concrete maps, they need to distinguish between the case when the element they're inserting +is *already* in the map, and when it's *not*. 
One way to do this is the following: + +``` +if map.contains_key(&key) { + *map.find_mut(&key).unwrap() += 1; +} else { + map.insert(key, 1); +} +``` + +However, searches for `key` *twice* on every operation. +The second search can be squeezed out the `update` re-do by matching on the result +of `find_mut`, but the `insert` case will always require a re-search. + +To solve this problem, Rust currently has an ad-hoc mix of "internal mutation" methods which +take multiple values or closures for the collection to use contextually. Hashmap in particular +has the following methods: + +``` +fn find_or_insert<'a>(&'a mut self, k: K, v: V) -> &'a mut V +fn find_or_insert_with<'a>(&'a mut self, k: K, f: |&K| -> V) -> &'a mut V +fn insert_or_update_with<'a>(&'a mut self, k: K, v: V, f: |&K, &mut V|) -> &'a mut V +fn find_with_or_insert_with<'a, A>(&'a mut self, k: K, a: A, found: |&K, &mut V, A|, not_found: |&K, A| -> V) -> &'a mut V +``` + +Not only are these methods fairly complex to use, but they're over-engineered and +combinatorially explosive. They all seem to return a mutable reference to the region +accessed "just in case", and `find_with_or_insert_with` takes a magic argument `a` to +try to work around the fact that the *two* closures it requires can't both close over +the same value (even though only one will ever be called). `find_with_or_insert_with` +is also actually performing the role of `insert_with_or_update_with`, +suggesting that these aren't well understood. + +Rust has been in this position before: internal iteration. Internal iteration was (author's note: I'm told) +confusing and complicated. However the solution was simple: external iteration. You get +all the benefits of internal iteration, but with a much simpler interface, and greater +composability. Thus, this RFC proposes the same solution to the internal mutation problem. + +# Detailed design + +A fully tested "proof of concept" draft of this design has been implemented on top of hashmap, +as it seems to be the worst offender, while still being easy to work with. It sits as a pull request +[here](https://github.com/rust-lang/rust/pull/17378). + +All the internal mutation methods are replaced with a single method on a collection: `entry`. +The signature of `entry` will depend on the specific collection, but generally it will be similar to +the signature for searching in that structure. `entry` will in turn return an `Entry` object, which +captures the *state* of a completed search, and allows mutation of the area. + +For convenience, we will use the hashmap draft as an example. + +``` +/// Get an Entry for where the given key would be inserted in the map +pub fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V>; + +/// A view into a single occupied location in a HashMap +pub struct OccupiedEntry<'a, K, V>{ ... } + +/// A view into a single empty location in a HashMap +pub struct VacantEntry<'a, K, V>{ ... 
} + +/// A view into a single location in a HashMap +pub enum Entry<'a, K, V> { + /// An occupied Entry + Occupied(OccupiedEntry<'a, K, V>), + /// A vacant Entry + Vacant(VacantEntry<'a, K, V>), +} +``` + +Of course, the real meat of the API is in the Entry's interface (impl details removed): + +``` +impl<'a, K, V> OccupiedEntry<'a, K, V> { + /// Gets a reference to the value of this Entry + pub fn get(&self) -> &V; + + /// Gets a mutable reference to the value of this Entry + pub fn get_mut(&mut self) -> &mut V; + + /// Converts the entry into a mutable reference to its value + pub fn into_mut(self) -> &'a mut V; + + /// Sets the value stored in this Entry + pub fn set(&mut self, value: V) -> V; + + /// Takes the value stored in this Entry + pub fn take(self) -> V; +} + +impl<'a, K, V> VacantEntry<'a, K, V> { + /// Set the value stored in this Entry, and returns a reference to it + pub fn set(self, value: V) -> &'a mut V; +} +``` + +There are definitely some strange things here, so let's discuss the reasoning! + +First, `entry` takes a `key` by value, because this is the observed behaviour of the internal mutation +methods. Further, taking the `key` up-front allows implementations to avoid *validating* provided keys if +they require an owned `key` later for insertion. This key is effectively a *guarantor* of the entry. + +Taking the key by-value might change once collections reform lands, and Borrow and ToOwned are available. +For now, it's an acceptable solution, because in particular, the primary use case of this functionality +is when you're *not sure* if you need to insert, in which case you should be prepared to insert. +Otherwise, `find_mut` is likely sufficient. + +The result is actually an enum, that will either be Occupied or Vacant. These two variants correspond +to concrete types for when the key matched something in the map, and when the key didn't, repsectively. + +If there isn't a match, the user has exactly one option: insert a value using `set`, which will also insert +the guarantor, and destroy the Entry. This is to avoid the costs of maintaining the structure, which +otherwise isn't particularly interesting anymore. + +If there is a match, a more robust set of options is provided. `get` and `get_mut` provide access to the +value found in the location. `set` behaves as the vacant variant, but without destroying the entry. +It also yields the old value. `take` simply removes the found value, and destroys the entry for similar reasons as `set`. + +Let's look at how we one now writes `insert_or_update`: + +There are two options. We can either do the following: + +``` +// cleaner, and more flexible if logic is more complex +let val = match map.entry(key) { + Vacant(entry) => entry.set(0), + Occupied(entry) => entry.into_mut(), +}; +*val += 1; +``` + +or + +``` +// closer to the original, and more compact +match map.entry(key) { + Vacant(entry) => { entry.set(1); }, + Occupied(mut entry) => { *entry.get_mut() += 1; }, +} +``` + +Either way, one can now write something equivalent to the "intuitive" inefficient code, but it is now as efficient as the complex +`insert_or_update` methods. In fact, this matches so closely to the inefficient manipulation +that users could reasonable ignore Entries *until performance becomes an issue*, at which point +it's an almost trivial migration. Closures also aren't needed to dance around the fact that one may +want to avoid generating some values unless they have to, because that falls naturally out of +normal control flow. 
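+
+As a further illustration of that last point, a value that is expensive to produce is only
+computed when the entry is actually vacant. The function names below are illustrative only and
+are not part of this proposal:
+
+```
+// Assumes `HashMap` and the draft `Entry` variants above are in scope.
+fn expensive_default() -> uint {
+    // Stands in for any computation we only want to run when the key is missing.
+    42
+}
+
+fn get_or_insert_default(map: &mut HashMap<String, uint>, key: String) -> uint {
+    match map.entry(key) {
+        // Only the vacant arm pays for computing the default value.
+        Vacant(entry) => *entry.set(expensive_default()),
+        Occupied(entry) => *entry.get(),
+    }
+}
+```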
+ +If you look at the actual patch that does this, you'll see that Entry itself is exceptionally +simple to implement. Most of the logic is trivial. The biggest amount of work was just +capturing the search state correctly, and even that was mostly a cut-and-paste job. + +With Entries, the gate is also opened for... *adaptors*! +Really want `insert_or_update` back? That can be written on top of this generically with ease. +However, such discussion is out-of-scope for this RFC. Adaptors can +be tackled in a back-compat manner after this has landed, and usage is observed. Also, this +proposal does not provide any generic trait for Entries, preferring concrete implementations for +the time-being. + +# Drawbacks + +* More structs, and more methods in the short-term + +* More collection manipulation "modes" for the user to think about + +* `insert_or_update_with` is kind of convenient for avoiding the kind of boiler-plate +found in the examples + +# Alternatives + +* Just put our foot down, say "no efficient complex manipulations", and drop +all the internal mutation stuff without a replacement. + +* Try to build out saner/standard internal manipulation methods. + +* Try to make this functionality a subset of [Cursors](http://discuss.rust-lang.org/t/pseudo-rfc-cursors-reversible-iterators/386/7), +which would be effectively a bi-directional mut_iter +where the returned references borrow the cursor preventing aliasing/safety issues, +so that mutation can be performed at the location of the cursor. +However, preventing invalidation would be more expensive, and it's not clear that +cursor semantics would make sense on e.g. a HashMap, as you can't insert *any* key +in *any* location. + +* This RFC originally [proposed a design without enums that was substantially more complex] +(https://github.com/Gankro/rust/commit/6d6804a6d16b13d07934f0a217a3562384e55612). +However it had some interesting ideas about Key manipulation, so we mention it here for +historical purposes. + +# Unresolved questions + +Naming bikesheds! diff --git a/text/0218-empty-struct-with-braces.md b/text/0218-empty-struct-with-braces.md new file mode 100644 index 00000000000..e378801ab4e --- /dev/null +++ b/text/0218-empty-struct-with-braces.md @@ -0,0 +1,382 @@ +- Start Date: (fill me in with today's date, 2014-08-28) +- RFC PR: [rust-lang/rfcs#218](https://github.com/rust-lang/rfcs/pull/218/files) +- Rust Issue: [rust-lang/rust#218](https://github.com/rust-lang/rust/issues/24266) + +# Summary + +When a struct type `S` has no fields (a so-called "empty struct"), +allow it to be defined via either `struct S;` or `struct S {}`. +When defined via `struct S;`, allow instances of it to be constructed +and pattern-matched via either `S` or `S {}`. +When defined via `struct S {}`, require instances to be constructed +and pattern-matched solely via `S {}`. + +# Motivation + +Today, when writing code, one must treat an empty struct as a +special case, distinct from structs that include fields. +That is, one must write code like this: +```rust +struct S2 { x1: int, x2: int } +struct S0; // kind of different from the above. + +let s2 = S2 { x1: 1, x2: 2 }; +let s0 = S0; // kind of different from the above. 
+ +match (s2, s0) { + (S2 { x1: y1, x2: y2 }, + S0) // you can see my pattern here + => { println!("Hello from S2({}, {}) and S0", y1, y2); } +} +``` + +While this yields code that is relatively free of extraneous +curly-braces, this special case handling of empty structs presents +problems for two cases of interest: automatic code generators +(including, but not limited to, Rust macros) and conditionalized code +(i.e. code with `cfg` attributes; see the [CFG problem] appendix). +The heart of the code-generator argument is: Why force all +to-be-written code-generators and macros with special-case handling of +the empty struct case (in terms of whether or not to include the +surrounding braces), especially since that special case is likely to +be forgotten (yielding a latent bug in the code generator). + +The special case handling of empty structs is also a problem for +programmers who actively add and remove fields from structs during +development; such changes cause a struct to switch from being empty +and non-empty, and the associated revisions of changing removing and +adding curly braces is aggravating (both in effort revising the code, +and also in extra noise introduced into commit histories). + +This RFC proposes an approach similar to the one we used circa February +2013, when both `S0` and `S0 { }` were accepted syntaxes for an empty +struct. The parsing ambiguity that motivated removing support for +`S0 { }` is no longer present (see the [Ancient History] appendix). +Supporting empty braces in the syntax for empty structs is easy to do +in the language now. + +# Detailed design + +There are two kinds of empty structs: Braced empty structs and +flexible empty structs. Flexible empty structs are a slight +generalization of the structs that we have today. + +Flexible empty structs are defined via the syntax `struct S;` (as today). + +Braced empty structs are defined via the syntax `struct S { }` ("new"). + +Both braced and flexible empty structs can be constructed via the +expression syntax `S { }` ("new"). Flexible empty structs, as today, +can also be constructed via the expression syntax `S`. + +Both braced and flexible empty structs can be pattern-matched via the +pattern syntax `S { }` ("new"). Flexible empty structs, as today, +can also be pattern-matched via the pattern syntax `S`. + +Braced empty struct definitions solely affect the type namespace, +just like normal non-empty structs. +Flexible empty structs affect both the type and value namespaces. + +As a matter of style, using braceless syntax is preferred for +constructing and pattern-matching flexible empty structs. For +example, pretty-printer tools are encouraged to emit braceless forms +if they know that the corresponding struct is a flexible empty struct. +(Note that pretty printers that handle incomplete fragments may not +have such information available.) + +There is no ambiguity introduced by this change, because we have +already introduced a restriction to the Rust grammar to force the use +of parentheses to disambiguate struct literals in such contexts. (See +[Rust RFC 25]). + +The expectation is that when migrating code from a flexible empty +struct to a non-empty struct, it can start by first migrating to a +braced empty struct (and then have a tool indicate all of the +locations where braces need to be added); after that step has been +completed, one can then take the next step of adding the actual field. + +# Drawbacks + +Some people like "There is only one way to do it." 
But, there is +precendent in Rust for violating "one way to do it" in favor of +syntactic convenience or regularity; see +the [Precedent for flexible syntax in Rust] appendix. +Also, see the [Always Require Braces] alternative below. + +I have attempted to summarize the previous discussion from [RFC PR +147] in the [Recent History] appendix; some of the points there +include drawbacks to this approach and to the [Always Require Braces] +alternative. + +# Alternatives + +## Always Require Braces + +Alternative 1: "Always Require Braces". Specifically, require empty +curly braces on empty structs. People who like the current syntax of +curly-brace free structs can encode them this way: `enum S0 { S0 }` +This would address all of the same issues outlined above. (Also, the +author (pnkfelix) would be happy to take this tack.) + +The main reason not to take this tack is that some people may like +writing empty structs without braces, but do not want to switch to the +unary enum version described in the previous paragraph. +See "I wouldn't want to force noisier syntax ..." +in the [Recent History] appendix. + +## Status quo + +Alternative 2: Status quo. Macros and code-generators in general will +need to handle empty structs as a special case. We may continue +hitting bugs like [CFG parse bug]. Some users will be annoyed but +most will probably cope. + +## Synonymous in all contexts + +Alternative 3: An earlier version of this RFC proposed having `struct +S;` be entirely synonymous with `struct S { }`, and the expression +`S { }` be synonymous with `S`. + +This was deemed problematic, since it would mean that `S { }` would +put an entry into both the type and value namespaces, while +`S { x: int }` would only put an entry into the type namespace. +Thus the current draft of the RFC proposes the "flexible" versus +"braced" distinction for empty structs. + +## Never synonymous + +Alternative 4: Treat `struct S;` as requiring `S` at the expression +and pattern sites, and `struct S { }` as requiring `S { }` at the +expression and pattern sites. + +This in some ways follows a principle of least surprise, but it also +is really hard to justify having both syntaxes available for empty +structs with no flexibility about how they are used. (Note again that +one would have the option of choosing between +`enum S { S }`, `struct S;`, or `struct S { }`, each with their own +idiosyncrasies about whether you have to write `S` or `S { }`.) +I would rather adopt "Always Require Braces" than "Never Synonymous" + +## Empty Tuple Structs + +One might say "why are you including support for curly braces, but not +parentheses?" Or in other words, "what about empty tuple structs?" + +The code-generation argument could be applied to tuple-structs as +well, to claim that we should allow the syntax `S0()`. I am less +inclined to add a special case for that; I think tuple-structs are +less frequently used (especially with many fields); they are largely +for ad-hoc data such as newtype wrappers, not for code generators. + +Note that we should not attempt to generalize this RFC as proposed to +include tuple structs, i.e. so that given `struct S0 {}`, the +expressions `T0`, `T0 {}`, and `T0()` would be synonymous. 
The reason +is that given a tuple struct `struct T2(int, int)`, the identifier +`T2` is *already* bound to a constructor function: + +```rust +fn main() { + #[deriving(Show)] + struct T2(int, int); + + fn foo(f: |int, int| -> S) { + println!("Hello from {} and {}", f(2,3), f(4,5)); + } + foo(T2); +} +``` + +So if we were to attempt to generalize the leniency of this RFC to +tuple structs, we would be in the unfortunate situation given `struct +T0();` of trying to treat `T0` simultaneously as an instance of the +struct and as a constructor function. So, the handling of empty +structs proposed by this RFC does not generalize to tuple structs. + +(Note that if we adopt alternative 1, [Always Require Braces], then +the issue of how tuple structs are handled is totally orthogonal -- we +could add support for `struct T0()` as a distinct type from `struct S0 +{}`, if we so wished, or leave it aside.) + +# Unresolved questions + +None + +# Appendices + +## The CFG problem + +A program like this works today: + +```rust +fn main() { + #[deriving(Show)] + struct Svaries { + x: int, + y: int, + + #[cfg(zed)] + z: int, + } + + let s = match () { + #[cfg(zed)] _ => Svaries { x: 3, y: 4, z: 5 }, + #[cfg(not(zed))] _ => Svaries { x: 3, y: 4 }, + }; + println!("Hello from {}", s) +} +``` + +Observe what happens when one modifies the above just a bit: +```rust + struct Svaries { + #[cfg(eks)] + x: int, + #[cfg(why)] + y: int, + + #[cfg(zed)] + z: int, + } +``` + +Now, certain `cfg` settings yield an empty struct, even though it +is surrounded by braces. Today this leads to a [CFG parse bug] +when one attempts to actually construct such a struct. + +If we want to support situations like this properly, we will probably +need to further extend the `cfg` attribute so that it can be placed +before individual fields in a struct constructor, like this: + +```rust +// You cannot do this today, +// but maybe in the future (after a different RFC) +let s = Svaries { + #[cfg(eks)] x: 3, + #[cfg(why)] y: 4, + #[cfg(zed)] z: 5, +}; +``` + +Supporting such a syntax consistently in the future should start today +with allowing empty braces as legal code. (Strictly speaking, it is +not *necessary* that we add support for empty braces at the parsing +level to support this feature at the semantic level. But supporting +empty-braces in the syntax still seems like the most consistent path +to me.) + +## Ancient History + +A parsing ambiguity was the original motivation for disallowing the +syntax `S {}` in favor of `S` for constructing an instance of +an empty struct. The ambiguity and various options for dealing with it +were well documented on the [rust-dev thread]. +Both syntaxes were simultaneously supported at the time. + +In particular, at the time that mailing list thread was created, the +code match `match x {} ...` would be parsed as `match (x {}) ...`, not +as `(match x {}) ...` (see [Rust PR 5137]); likewise, `if x {}` would +be parsed as an if-expression whose test component is the struct +literal `x {}`. Thus, at the time of [Rust PR 5137], if the input to +a `match` or `if` was an identifier expression, one had to put +parentheses around the identifier to force it to be interpreted as +input to the `match`/`if`, and not as a struct constructor. + +Of the options for resolving this discussed on the mailing list +thread, the one selected (removing `S {}` construction expressions) +was chosen as the most expedient option. 
+ +At that time, the option of "Place a parser restriction on those +contexts where `{` terminates the expression and say that struct +literals cannot appear there unless they are in parentheses." was +explicitly not chosen, in favor of continuing to use the +disambiguation rule in use at the time, namely that the presence of a +label (e.g. `S { a_label: ... }`) was *the* way to distinguish a +struct constructor from an identifier followed by a control block, and +thus, "there must be one label." + +Naturally, if the construction syntax were to be disallowed, it made +sense to also remove the `struct S {}` declaration syntax. + +Things have changed since the time of that mailing list thread; +namely, we have now adopted the aforementioned parser restriction +[Rust RFC 25]. (The text of RFC 25 does not explicitly address +`match`, but we have effectively expanded it to include a curly-brace +delimited block of match-arms in the definition of "block".) Today, +one uses parentheses around struct literals in some contexts (such as +`for e in (S {x: 3}) { ... }` or `match (S {x: 3}) { ... }` + +Note that there was never an ambiguity for uses of `struct S0 { }` in item +position. The issue was solely about expression position prior to the +adoption of [Rust RFC 25]. + +## Precedent for flexible syntax in Rust + +There is precendent in Rust for violating "one way to do it" in favor +of syntactic convenience or regularity. + +For example, one can often include an optional trailing comma, for +example in: `let x : &[int] = [3, 2, 1, ];`. + +One can also include redundant curly braces or parentheses, for +example in: +```rust +println!("hi: {}", { if { x.len() > 2 } { ("whoa") } else { ("there") } }); +``` + +One can even mix the two together when delimiting match arms: +```rust + let z: int = match x { + [3, 2] => { 3 } + [3, 2, 1] => 2, + _ => { 1 }, + }; +``` + +We do have lints for some style violations (though none catch the +cases above), but lints are different from fundamental language +restrictions. + +## Recent history + +There was a previous [RFC PR][RFC PR 147] that was effectively the +same in spirit to this one. It was closed because it was not +sufficient well fleshed out for further consideration by the core +team. However, to save people the effort of reviewing the comments on +that PR (and hopefully stave off potential bikeshedding on this PR), I +here summarize the various viewpoints put forward on the comment +thread there, and note for each one, whether that viewpoint would be +addressed by this RFC (accept both syntaxes), by [Always Require Braces], +or by [Status Quo]. + +Note that this list of comments is *just* meant to summarize the list +of views; it does not attempt to reflect the number of commenters who +agreed or disagreed with a particular point. (But since the RFC process +is not a democracy, the number of commenters should not matter anyway.) + +* "+1" ==> Favors: This RFC (or potentially [Always Require Braces]; I think the content of [RFC PR 147] shifted over time, so it is hard to interpret the "+1" comments now). +* "I find `let s = S0;` jarring, think its an enum initially." ==> Favors: Always Require Braces +* "Frequently start out with an empty struct and add fields as I need them." 
==> Favors: This RFC or Always Require Braces +* "Foo{} suggests is constructing something that it's not; all uses of the value `Foo` are indistinguishable from each other" ==> Favors: Status Quo +* "I find it strange anyone would prefer `let x = Foo{};` over `let x = Foo;`" ==> Favors Status Quo; strongly opposes Always Require Braces. +* "I agree that 'instantiation-should-follow-declation', that is, structs declared `;, (), {}` should only be instantiated [via] `;, (), { }` respectively" ==> Opposes leniency of this RFC in that it allows expression to use include or omit `{}` on an empty struct, regardless of declaration form, and vice-versa. +* "The code generation argument is reasonable, but I wouldn't want to force noisier syntax on all 'normal' code just to make macros work better." ==> Favors: This RFC + +[Always Require Braces]: #always-require-braces +[Status Quo]: #status-quo +[Ancient History]: #ancient-history +[Recent History]: #recent-history +[CFG problem]: #the-cfg-problem +[Empty Tuple Structs]: #empty-tuple-structs +[Precedent for flexible syntax in Rust]: #precedent-for-flexible-syntax-in-rust + +[rust-dev thread]: https://mail.mozilla.org/pipermail/rust-dev/2013-February/003282.html + +[Rust Issue 5167]: https://github.com/rust-lang/rust/issues/5167 + +[Rust RFC 25]: https://github.com/rust-lang/rfcs/blob/master/complete/0025-struct-grammar.md + +[CFG parse bug]: https://github.com/rust-lang/rust/issues/16819 + +[Rust PR 5137]: https://github.com/rust-lang/rust/pull/5137 + +[RFC PR 147]: https://github.com/rust-lang/rfcs/pull/147 diff --git a/text/0221-panic.md b/text/0221-panic.md new file mode 100644 index 00000000000..ba1bd39bbe7 --- /dev/null +++ b/text/0221-panic.md @@ -0,0 +1,67 @@ +- Start Date: 2014-09-23 +- RFC PR #: [rust-lang/rfcs#221](https://github.com/rust-lang/rfcs/pull/221) +- Rust Issue #: [rust-lang/rust#17489](https://github.com/rust-lang/rust/issues/17489) + +# Summary + +Rename "task failure" to "task panic", and `fail!` to `panic!`. + +# Motivation + +The current terminology of "task failure" often causes problems when +writing or speaking about code. You often want to talk about the +possibility of an operation that returns a `Result` "failing", but +cannot because of the ambiguity with task failure. Instead, you have +to speak of "the failing case" or "when the operation does not +succeed" or other circumlocutions. + +Likewise, we use a "Failure" header in rustdoc to describe when +operations may fail the task, but it would often be helpful to +separate out a section describing the "Err-producing" case. + +We have been steadily moving away from task failure and toward +`Result` as an error-handling mechanism, so we should optimize our +terminology accordingly: `Result`-producing functions should be easy +to describe. + +# Detailed design + +Not much more to say here than is in the summary: rename "task +failure" to "task panic" in documentation, and `fail!` to `panic!` in +code. + +The choice of `panic` emerged from a +[discuss thread](http://discuss.rust-lang.org/t/renaming-task-failure/310/20) +and +[workweek discussion](https://github.com/rust-lang/meeting-minutes/blob/master/workweek-2014-08-18/error-handling.md). +It has precedent in a language setting in Go, and of course goes back +to Kernel panics. + +With this choice, we can use "failure" to refer to an operation that +produces `Err` or `None`, "panic" for unwinding at the task level, and +"abort" for aborting the entire process. 
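+
+The change to code itself is mechanical. For example, a hypothetical
+call site changes only in the macro it names:
+
+```rust
+fn check_connected(connected: bool) {
+    if !connected {
+        // Previously written as: fail!("connection lost");
+        panic!("connection lost");
+    }
+}
+```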
+ +The connotations of panic seem fairly accurate: the process is not +immediately ending, but it is rapidly fleeing from some problematic +circumstance (by killing off tasks) until a recovery point. + +# Drawbacks + +The term "panic" is a bit informal, which some consider a drawback. + +Making this change is likely to be a lot of work. + +# Alternatives + +Other choices include: + +- `throw!` or `unwind!`. These options reasonably describe the current + behavior of task failure, but "throw" suggests general exception + handling, and both place the emphasis on the mechanism rather than + the policy. We also are considering eventually adding a flag that + allows `fail!` to abort the process, which would make these terms misleading. + +- `abort!`. Ambiguous with process abort. + +- `die!`. A reasonable choice, but it's not immediately obvious what + is being killed. diff --git a/text/0230-remove-runtime.md b/text/0230-remove-runtime.md new file mode 100644 index 00000000000..d852d475e43 --- /dev/null +++ b/text/0230-remove-runtime.md @@ -0,0 +1,295 @@ +- Start Date: 2014-09-16 +- RFC PR: https://github.com/rust-lang/rfcs/pull/230 +- Rust Issue: https://github.com/rust-lang/rust/issues/17325 + +# Summary + +This RFC proposes to remove the *runtime system* that is currently part of the +standard library, which currently allows the standard library to support both +native and green threading. In particular: + +* The `libgreen` crate and associated support will be moved out of tree, into a + separate Cargo package. + +* The `librustrt` (the runtime) crate will be removed entirely. + +* The `std::io` implementation will be directly welded to native threads and + system calls. + +* The `std::io` module will remain completely cross-platform, though *separate* + platform-specific modules may be added at a later time. + +# Motivation + +## Background: thread/task models and I/O + +Many languages/libraries offer some notion of "task" as a unit of concurrent +execution, possibly distinct from native OS threads. The characteristics of +tasks vary along several important dimensions: + +* *1:1 vs M:N*. The most fundamental question is whether a "task" always + corresponds to an OS-level thread (the 1:1 model), or whether there is some + userspace scheduler that maps tasks onto worker threads (the M:N model). Some + kernels -- notably, Windows -- support a 1:1 model where the scheduling is + performed in userspace, which combines some of the advantages of the two + models. + + In the M:N model, there are various choices about whether and when blocked + tasks can migrate between worker threads. One basic downside of the model, + however, is that if a task takes a page fault, the entire worker thread is + essentially blocked until the fault is serviced. Choosing the optimal number + of worker threads is difficult, and some frameworks attempt to do so + dynamically, which has costs of its own. + +* *Stack management*. In the 1:1 model, tasks are threads and therefore must be + equipped with their own stacks. In M:N models, tasks may or may not need their + own stack, but there are important tradeoffs: + + * Techniques like *segmented stacks* allow stack size to grow over time, + meaning that tasks can be equipped with their own stack but still be + lightweight. Unfortunately, segmented stacks come with + [a significant performance and complexity cost](https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html). 
+ + * On the other hand, if tasks are not equipped with their own stack, they + either cannot be migrated between underlying worker threads (the case for + frameworks like Java's + [fork/join](http://gee.cs.oswego.edu/dl/papers/fj.pdf)), or else must be + implemented using *continuation-passing style (CPS)*, where each blocking + operation takes a closure representing the work left to do. (CPS essentially + moves the needed parts of the stack into the continuation closure.) The + upside is that such tasks can be extremely lightweight -- essentially just + the size of a closure. + +* *Blocking and I/O support*. In the 1:1 model, a task can block freely without + any risk for other tasks, since each task is an OS thread. In the M:N model, + however, blocking in the OS sense means blocking the worker thread. (The same + applies to long-running loops or page faults.) + + M:N models can deal with blocking in a couple of ways. The approach taken in + Java's [fork/join](http://gee.cs.oswego.edu/dl/papers/fj.pdf) framework, for + example, is to dynamically spin up/down worker threads. Alternatively, special + task-aware blocking operations (including I/O) can be provided, which are + mapped under the hood to nonblocking operations, allowing the worker thread to + continue. Unfortunately, this latter approach helps only with explicit + blocking; it does nothing for loops, page faults and the like. + +### Where Rust is now + +Rust has gradually migrated from a "green" threading model toward a native +threading model: + +* In Rust's green threading, tasks are scheduled M:N and are equipped with their + own stack. Initially, Rust used segmented stacks to allow growth over time, + but that + [was removed](https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html) + in favor of pre-allocated stacks, which means Rust's green threads are not + "lightweight". The treatment of blocking is described below. + +* In Rust's native threading model, tasks are 1:1 with OS threads. + +Initially, Rust supported only the green threading model. Later, native +threading was added and ultimately became the default. + +In today's Rust, there is a single I/O API -- `std::io` -- that provides +blocking operations only and works with both threading models. +Rust is somewhat unusual in allowing programs to mix native and green threading, +and furthermore allowing *some* degree of interoperation between the two. This +feat is achieved through the runtime system -- `librustrt` -- which exposes: + +* The `Runtime` trait, which abstracts over the scheduler (via methods like + `deschedule` and `spawn_sibling`) as well as the entire I/O API (via + `local_io`). + +* The `rtio` module, which provides a number of traits that define the standard I/O + abstraction. + +* The `Task` struct, which includes a `Runtime` trait object as the dynamic entry point + into the runtime. + +In this setup, `libstd` works directly against the runtime interface. When +invoking an I/O or scheduling operation, it first finds the current `Task`, and +then extracts the `Runtime` trait object to actually perform the operation. + +On native tasks, blocking operations simply block. On green tasks, blocking +operations are routed through the green scheduler and/or underlying event loop +and nonblocking I/O. + +The actual scheduler and I/O implementations -- `libgreen` and `libnative` -- +then live as crates "above" `libstd`. 
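+
+Schematically, the indirection described above has roughly the following shape.
+The method signatures here are simplified and partly invented for illustration;
+they are not the exact `librustrt` definitions:
+
+```rust
+trait IoFactory {
+    // Constructors for files, TCP streams, timers, and so on.
+}
+
+trait Runtime {
+    // Scheduling hooks, implemented by both libgreen and libnative.
+    fn deschedule(&mut self);
+    fn spawn_sibling(&mut self, f: proc());
+    // Every `std::io` operation asks the current task for its I/O factory.
+    fn local_io<'a>(&'a mut self) -> &'a mut IoFactory;
+}
+
+struct Task {
+    // `libstd` fetches the current `Task` and dispatches scheduling and I/O
+    // operations dynamically through this trait object.
+    runtime: Box<Runtime + Send>,
+}
+```
+
+Because both `libgreen` and `libnative` implement these interfaces, the two
+threading models can coexist behind the single `std::io` API, at the cost of
+the dynamic dispatch discussed below.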
+ +## The problems + +While the situation described above may sound good in principle, there are +several problems in practice. + +**Forced co-evolution.** With today's design, the green and native + threading models must provide the same I/O API at all times. But + there is functionality that is only appropriate or efficient in one + of the threading models. + + For example, the lightest-weight M:N task models are essentially just + collections of closures, and do not provide any special I/O support. This + style of lightweight tasks is used in Servo, but also shows up in + [java.util.concurrent's exectors](http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html) + and [Haskell's par monad](https://hackage.haskell.org/package/monad-par), + among many others. These lighter weight models do not fit into the current + runtime system. + + On the other hand, green threading systems designed explicitly to support I/O + may also want to provide low-level access to the underlying event loop -- an + API surface that doesn't make sense for the native threading model. + + Under the native model we want to provide direct non-blocking and/or + asynchronous I/O support -- as a systems language, Rust should be able to work + directly with what the OS provides without imposing global abstraction + costs. These APIs may involve some platform-specific abstractions (`epoll`, + `kqueue`, IOCP) for maximal performance. But integrating them cleanly with a + green threading model may be difficult or impossible -- and at the very least, + makes it difficult to add them quickly and seamlessly to the current I/O + system. + + In short, the current design couples threading and I/O models together, and + thus forces the green and native models to supply a common I/O interface -- + despite the fact that they are pulling in different directions. + +**Overhead.** The current Rust model allows runtime mixtures of the green and + native models. The implementation achieves this flexibility by using trait + objects to model the entire I/O API. Unfortunately, this flexibility has + several downsides: + +- *Binary sizes*. A significant overhead caused by the trait object design is that + the entire I/O system is included in any binary that statically links to + `libstd`. See + [this comment](https://github.com/rust-lang/rust/issues/10740#issuecomment-31475987) + for more details. + +- *Task-local storage*. The current implementation of task-local storage is + designed to work seamlessly across native and green threads, and its performs + substantially suffers as a result. While it is feasible to provide a more + efficient form of "hybrid" TLS that works across models, doing so is *far* + more difficult than simply using native thread-local storage. + +- *Allocation and dynamic dispatch*. With the current design, any invocation of + I/O involves at least dynamic dispatch, and in many cases allocation, due to + the use of trait objects. However, in most cases these costs are trivial when + compared to the cost of actually doing the I/O (or even simply making a + syscall), so they are not strong arguments against the current design. + +**Problematic I/O interactions.** As the + [documentation for libgreen](http://doc.rust-lang.org/green/#considerations-when-using-libgreen) + explains, only some I/O and synchronization methods work seamlessly across + native and green tasks. 
For example, any invocation of native code that calls + blocking I/O has the potential to block the worker thread running the green + scheduler. In particular, `std::io` objects created on a native task cannot + safely be used within a green task. Thus, even though `std::io` presents a + unified I/O API for green and native tasks, it is not fully interoperable. + +**Embedding Rust.** When embedding Rust code into other contexts -- whether + calling from C code or embedding in high-level languages -- there is a fair + amount of setup needed to provide the "runtime" infrastructure that `libstd` + relies on. If `libstd` was instead bound to the native threading and I/O + system, the embedding setup would be much simpler. + +**Maintenance burden.** Finally, `libstd` is made somewhat more complex by + providing such a flexible threading model. As this RFC will explain, moving to + a strictly native threading model will allow substantial simplification and + reorganization of the structure of Rust's libraries. + +# Detailed design + +To mitigate the above problems, this RFC proposes to tie `std::io` directly to +the native threading model, while moving `libgreen` and its supporting +infrastructure into an external Cargo package with its own I/O API. + +## The near-term plan +### `std::io` and native threading + +The plan is to entirely remove `librustrt`, including all of the traits. +The abstraction layers will then become: + +- Highest level: `libstd`, providing cross-platform, high-level I/O and + scheduling abstractions. The crate will depend on `libnative` (the opposite + of today's situation). + +- Mid-level: `libnative`, providing a cross-platform Rust interface for I/O and + scheduling. The API will be relatively low-level, compared to `libstd`. The + crate will depend on `libsys`. + +- Low-level: `libsys` (renamed from `liblibc`), providing platform-specific Rust + bindings to system C APIs. + +In this scheme, the actual API of `libstd` will not change significantly. But +its implementation will invoke functions in `libnative` directly, rather than +going through a trait object. + +A goal of this work is to minimize the complexity of embedding Rust code in +other contexts. It is not yet clear what the final embedding API will look like. + +### Green threading + +Despite tying `libstd` to native threading, however, `libgreen` will still be +supported -- at least initially. The infrastructure in `libgreen` and friends will +move into its own Cargo package. + +Initially, the green threading package will support essentially the same +interface it does today; there are no immediate plans to change its API, since +the focus will be on first improving the native threading API. Note, however, +that the I/O API will be exposed separately within `libgreen`, as opposed to the +current exposure through `std::io`. + +## The long-term plan + +Ultimately, a large motivation for the proposed refactoring is to allow the APIs +for native I/O to grow. + +In particular, over time we should expose more of the underlying system +capabilities under the native threading model. Whenever possible, these +capabilities should be provided at the `libstd` level -- the highest level of +cross-platform abstraction. However, an important goal is also to provide +nonblocking and/or asynchronous I/O, for which system APIs differ greatly. It +may be necessary to provide additional, platform-specific crates to expose this +functionality. 
Ideally, these crates would interoperate smoothly with `libstd`, +so that for example a `libposix` crate would allow using an `poll` operation +directly against a `std::io::fs::File` value, for example. + +We also wish to expose "lowering" operations in `libstd` -- APIs that allow +you to get at the file descriptor underlying a `std::io::fs::File`, for example. + +On the other hand, we very much want to explore and support truly lightweight +M:N task models (that do not require per-task stacks) -- supporting efficient +data parallelism with work stealing for CPU-bound computations. These +lightweight models will not provide any special support for I/O. But they may +benefit from a notion of "task-local storage" and interfacing with the task +scheduler when explicitly synchronizing between tasks (via channels, for +example). + +All of the above long-term plans will require substantial new design and +implementation work, and the specifics are out of scope for this RFC. The main +point, though, is that the refactoring proposed by this RFC will make it much +more plausible to carry out such work. + +Finally, a guiding principle for the above work is *uncompromising support* for +native system APIs, in terms of both functionality and performance. For example, +it must be possible to use thread-local storage without significant overhead, +which is very much not the case today. Any abstractions to support M:N threading +models -- including the now-external `libgreen` package -- must respect this +constraint. + +# Drawbacks + +The main drawback of this proposal is that green I/O will be provided by a +forked interface of `std::io`. This change makes green threading +"second class", and means there's more to learn when using both models +together. + +This setup also somewhat increases the risk of invoking native blocking I/O on a +green thread -- though of course that risk is very much present today. One way +of mitigating this risk in general is the Java executor approach, where the +native "worker" threads that are executing the green thread scheduler are +monitored for blocking, and new worker threads are spun up as needed. + +# Unresolved questions + +There are may unresolved questions about the exact details of the refactoring, +but these are considered implementation details since the `libstd` interface +itself will not substantially change as part of this RFC. diff --git a/text/0231-upvar-capture-inference.md b/text/0231-upvar-capture-inference.md new file mode 100644 index 00000000000..1326f2e0008 --- /dev/null +++ b/text/0231-upvar-capture-inference.md @@ -0,0 +1,55 @@ +- Start Date: 2014-09-09 +- RFC PR: [rust-lang/rfcs#231](https://github.com/rust-lang/rfcs/pull/231) +- Rust Issue: [rust-lang/rust#16640](https://github.com/rust-lang/rust/issues/16640) + +# Summary + +The `||` unboxed closure form should be split into two forms—`||` for nonescaping closures and `move ||` for escaping closures—and the capture clauses and self type specifiers should be removed. + +# Motivation + +Having to specify `ref` and the capture mode for each unboxed closure is inconvenient (see Rust PR rust-lang/rust#16610). It would be more convenient for the programmer if the type of the closure and the modes of the upvars could be inferred. This also eliminates the "line-noise" syntaxes like `|&:|`, which are arguably unsightly. + +Not all knobs can be removed, however—the programmer must manually specify whether each closure is escaping or nonescaping. 
+To see this, observe that no sensible default for the closure `|| (*x).clone()` exists: if the function is nonescaping, it's a closure that returns a copy of `x` every time but does not move `x` into it; if the function is escaping, it's a closure that returns a copy of `x` and takes ownership of `x`.
+
+Therefore, we need two forms: one for *nonescaping* closures and one for *escaping* closures. Nonescaping closures are the commonest, so they get the `||` syntax that we have today, and a new `move ||` syntax will be introduced for escaping closures.
+
+# Detailed design
+
+For unboxed closures specified with `||`, the capture modes of the free variables are determined as follows:
+
+1. Any variable which is closed over and borrowed mutably is by-reference and mutably borrowed.
+
+2. Any variable of a type that does not implement `Copy` which is moved within the closure is captured by value.
+
+3. Any other variable which is closed over is by-reference and immutably borrowed.
+
+The trait that the unboxed closure implements is `FnOnce` if any variables were moved *out* of the closure; otherwise `FnMut` if there are any variables that are closed over and mutably borrowed; otherwise `Fn`.
+
+The `ref` prefix for unboxed closures is removed, since it is now essentially implied.
+
+We introduce a new grammar production, `move ||`. The value returned by a `move ||` implements `FnOnce`, `FnMut`, or `Fn`, as determined above; thus, for example, `move |x: int, y| x + y` produces an unboxed closure that implements the `Fn(int, int) -> int` trait (and thus the `FnOnce(int, int) -> int` trait by inheritance). Free variables referenced by a `move ||` closure are always captured by value.
+
+In the trait reference grammar, we will change the `|&:|` sugar to `Fn()`, the `|&mut:|` sugar to `FnMut()`, and the `|:|` sugar to `FnOnce()`. Thus what was before written `fn foo<F: |&: int| -> int>()` will be `fn foo<F: Fn(int) -> int>()`.
+
+It is important to note that the trait reference syntax and closure construction syntax are purposefully distinct. This is because either the `||` form or the `move ||` form can construct any of `FnOnce`, `FnMut`, or `Fn` closures.
+
+# Drawbacks
+
+1. Having two syntaxes for closures could be seen as unfortunate.
+
+2. `move` becomes a keyword.
+
+# Alternatives
+
+1. Keep the status quo: `|:|`/`|&mut:|`/`|&:|` are the only ways to create unboxed closures, and `ref` must be used to get by-reference upvars.
+
+2. Use some syntax other than `move ||` for escaping closures.
+
+3. Keep the `|:|`/`|&mut:|`/`|&:|` syntax only for trait reference sugar.
+
+4. Use `fn()` syntax for trait reference sugar.
+
+# Unresolved questions
+
+There may be unforeseen complications in doing the inference.
diff --git a/text/0234-variants-namespace.md b/text/0234-variants-namespace.md
new file mode 100644
index 00000000000..3a43617823b
--- /dev/null
+++ b/text/0234-variants-namespace.md
@@ -0,0 +1,82 @@
+- Start Date: 2014-09-16
+- RFC PR #: https://github.com/rust-lang/rfcs/pull/234
+- Rust Issue #: https://github.com/rust-lang/rust/issues/17323
+
+# Summary
+
+Make enum variants part of both the type and value namespaces.
+
+# Motivation
+
+We might, post-1.0, want to allow using enum variants as types. This would be
+backwards incompatible, because if a module already has a type with the same name
+as the variant in scope, then there will be a name clash.
+
+# Detailed design
+
+Enum variants would always be part of both the type and value namespaces.
+Variants would not, however, be usable as types - we might want to allow this +later, but it is out of scope for this RFC. + +## Data + +Occurrences of name clashes in the Rust repo: + +* `Key` in `rustrt::local_data` + +* `InAddr` in `native::io::net` + +* `Ast` in `regex::parse` + +* `Class` in `regex::parse` + +* `Native` in `regex::re` + +* `Dynamic` in `regex::re` + +* `Zero` in `num::bigint` + +* `String` in `term::terminfo::parm` + +* `String` in `serialize::json` + +* `List` in `serialize::json` + +* `Object` in `serialize::json` + +* `Argument` in `fmt_macros` + +* `Metadata` in `rustc_llvm` + +* `ObjectFile` in `rustc_llvm` + +* 'ItemDecorator' in `syntax::ext::base` + +* 'ItemModifier' in `syntax::ext::base` + +* `FunctionDebugContext` in `rustc::middle::trans::debuginfo` + +* `AutoDerefRef` in `rustc::middle::ty` + +* `MethodParam` in `rustc::middle::typeck` + +* `MethodObject` in `rustc::middle::typeck` + +That's a total of 20 in the compiler and libraries. + + +# Drawbacks + +Prevents the common-ish idiom of having a struct with the same name as a variant +and then having a value of that struct be the variant's data. + +# Alternatives + +Don't do it. That would prevent us making changes to the typed-ness of enums in +the future. If we accept this RFC, but at some point we decide we never want to +do anything with enum variants and types, we could always roll back this change +backwards compatibly. + +# Unresolved questions + +N/A diff --git a/text/0235-collections-conventions.md b/text/0235-collections-conventions.md new file mode 100644 index 00000000000..e922c617860 --- /dev/null +++ b/text/0235-collections-conventions.md @@ -0,0 +1,1774 @@ +- Start Date: 2014-10-29 +- RFC PR #: [rust-lang/rfcs#235](https://github.com/rust-lang/rfcs/pull/235) +- Rust Issue #: [rust-lang/rust#18424](https://github.com/rust-lang/rust/issues/18424) + +# Summary + +This is a combined *conventions* and *library stabilization* RFC. The goal is to +establish a set of naming and signature conventions for `std::collections`. + +The major components of the RFC include: + +* Removing most of the traits in `collections`. + +* A general proposal for solving the "equiv" problem, as well as improving + `MaybeOwned`. + +* Patterns for overloading on by-need values and predicates. + +* Initial, forwards-compatible steps toward `Iterable`. + +* A coherent set of API conventions across the full variety of collections. + +*A big thank-you to @Gankro, who helped collect API information and worked + through an initial pass of some of the proposals here.* + +# Motivation + +This RFC aims to improve the design of the `std::collections` module in +preparation for API stabilization. There are a number of problems that need to +be addressed, as spelled out in the subsections below. + +## Collection traits + +The `collections` module defines several traits: + +* Collection +* Mutable +* MutableSeq +* Deque +* Map, MutableMap +* Set, MutableSet + +There are several problems with the current trait design: + +* Most important: the traits do not provide iterator methods like `iter`. It is + not possible to do so in a clean way without higher-kinded types, as the RFC + explains in more detail below. + +* The split between mutable and immutable traits is not well-motivated by + any of the existing collections. + +* The methods defined in these traits are somewhat anemic compared to the suite + of methods provided on the concrete collections that implement them. 
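+
+For reference, the traits in question look roughly like the following sketch
+(paraphrased and heavily abbreviated from today's `std::collections`; it is
+illustrative only, not part of this RFC's proposal):
+
+```rust
+pub trait Collection {
+    fn len(&self) -> uint;
+    fn is_empty(&self) -> bool { self.len() == 0 }
+}
+
+pub trait Mutable: Collection {
+    fn clear(&mut self);
+}
+
+pub trait Map<K, V>: Collection {
+    fn find<'a>(&'a self, key: &K) -> Option<&'a V>;
+    fn contains_key(&self, key: &K) -> bool { self.find(key).is_some() }
+}
+
+pub trait MutableMap<K, V>: Map<K, V> + Mutable {
+    fn swap(&mut self, k: K, v: V) -> Option<V>;
+    fn pop(&mut self, k: &K) -> Option<V>;
+    fn find_mut<'a>(&'a mut self, key: &K) -> Option<&'a mut V>;
+}
+```
+
+Note what is missing: there is no `iter`, and nothing close to the full method
+surface of `HashMap` or `TreeMap`.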
+
+## Divergent APIs
+
+Despite the current collection traits, the APIs of various concrete collections
+have diverged; there is not a globally coherent design, and there are many
+inconsistencies.
+
+One problem in particular is the lack of clear guiding principles for the API
+design. This RFC proposes a few along the way.
+
+## Providing slice APIs on `Vec` and `String`
+
+The `String` and `Vec` types each provide a limited subset of the methods
+provided on string and vector slices, but there is not a clear reason to limit
+the API in this way. Today, one has to write things like
+`some_str.as_slice().contains(...)`, which is not ergonomic or intuitive.
+
+## The `Equiv` problem
+
+There is a more subtle problem related to slices. It's common to use a `HashMap`
+with owned `String` keys, but then the natural API for things like lookup is not
+very usable:
+
+```rust
+fn find(&self, k: &K) -> Option<&V>
+```
+
+The problem is that, since `K` will be `String`, the `find` function requests a
+`&String` value -- whereas one typically wants to work with the more flexible
+`&str` slices. In particular, using `find` with a literal string requires
+something like:
+
+```rust
+map.find(&"some literal".to_string())
+```
+
+which is unergonomic and requires an extra allocation just to get a borrow that,
+in some sense, was already available.
+
+The current `HashMap` API works around this problem by providing an *additional*
+set of methods that uses a generic notion of "equivalence" of values that have
+different types:
+
+```rust
+pub trait Equiv<Sized? T> {
+    fn equiv(&self, other: &T) -> bool;
+}
+
+impl Equiv<str> for String {
+    fn equiv(&self, other: &str) -> bool {
+        self.as_slice() == other
+    }
+}
+
+fn find_equiv<Sized? Q: Hash<S> + Equiv<K>>(&self, k: &Q) -> Option<&V>
+```
+
+There are a few downsides to this approach:
+
+* It requires a duplicated `_equiv` variant of each method taking a reference to
+  the key. (This downside could likely be mitigated using
+  [multidispatch](https://github.com/rust-lang/rfcs/pull/195).)
+
+* Its correctness depends on equivalent values producing the same hash, which is
+  not checked.
+
+* `String`-keyed hash maps are very common, so newcomers are likely to run
+  headlong into the problem. First, `find` will fail to work in the expected
+  way. But the signature of `find_equiv` is more difficult to understand than
+  `find`, and it's not immediately obvious that it solves the problem.
+
+* It is the right API for `HashMap`, but not helpful for e.g. `TreeMap`, which
+  would want an analog for `Ord`.
+
+The `TreeMap` API currently deals with this problem in an entirely different
+way:
+
+```rust
+/// Returns the value for which f(key) returns Equal.
+/// f is invoked with current key and guides tree navigation.
+/// That means f should be aware of natural ordering of the tree.
+fn find_with(&self, f: |&K| -> Ordering) -> Option<&V>
+```
+
+Besides being less convenient -- you cannot write `map.find_with("some literal")` --
+this function navigates the tree according to an ordering that may have no
+relationship to the actual ordering of the tree.
+
+## `MaybeOwned`
+
+Sometimes a function does not know in advance whether it will need or produce an
+owned copy of some data, or whether a borrow suffices. A typical example is the
+`from_utf8_lossy` function:
+
+```rust
+fn from_utf8_lossy<'a>(v: &'a [u8]) -> MaybeOwned<'a>
+```
+
+This function will return a string slice if the input was correctly utf8 encoded
+-- without any allocation.
But if the input has invalid utf8 characters, the +function allocates a new `String` and inserts utf8 "replacement characters" +instead. Hence, the return type is an `enum`: + +```rust +pub enum MaybeOwned<'a> { + Slice(&'a str), + Owned(String), +} +``` + +This interface makes it possible to allocate only when necessary, but the +`MaybeOwned` type (and connected machinery) are somewhat ad hoc -- and +specialized to `String`/`str`. It would be somewhat more palatable if there were +a single "maybe owned" abstraction usable across a wide range of types. + +## `Iterable` + +A frequently-requested feature for the `collections` module is an `Iterable` +trait for "values that can be iterated over". There are two main motivations: + +* *Abstraction*. Today, you can write a function that takes a single `Iterator`, + but you cannot write a function that takes a container and then iterates over + it multiple times (perhaps with differing mutability levels). An `Iterable` + trait could allow that. + +* *Ergonomics*. You'd be able to write + + ```rust + for v in some_vec { ... } + ``` + + rather than + + ```rust + for v in some_vec.iter() { ... } + ``` + + and `consume_iter(some_vec)` rather than `consume_iter(some_vec.iter())`. + +# Detailed design + +## The collections today + +The concrete collections currently available in `std` fall into roughly three categories: + +* Sequences + * Vec + * String + * Slices + * Bitv + * DList + * RingBuf + * PriorityQueue + +* Sets + * HashSet + * TreeSet + * TrieSet + * EnumSet + * BitvSet + +* Maps + * HashMap + * TreeMap + * TrieMap + * LruCache + * SmallIntMap + +The primary goal of this RFC is to establish clean and consistent APIs that +apply across each group of collections. + +Before diving into the details, there is one high-level changes that should be +made to these collections. The `PriorityQueue` collection should be renamed to +`BinaryHeap`, following the convention that concrete collections are named according +to their implementation strategy, not the abstract semantics they implement. We +may eventually want `PriorityQueue` to be a *trait* that's implemented by +multiple concrete collections. + +The `LruCache` could be renamed for a similar reason (it uses a `HashMap` in its +implementation), However, the implementation is actually generic with respect to +this underlying map, and so in the long run (with HKT and other language +changes) `LruCache` should probably add a type parameter for the underlying map, +defaulted to `HashMap`. + +## Design principles + +* *Centering on `Iterator`s*. The `Iterator` trait is a strength of Rust's + collections library. Because so many APIs can produce iterators, adding an API + that consumes one is very powerful -- and conversely as well. Moreover, + iterators are highly efficient, since you can chain several layers of + modification without having to materialize intermediate results. Thus, + whenever possible, collection APIs should strive to work with iterators. + + In particular, some existing convenience methods avoid iterators for either + performance or ergonomic reasons. We should instead improve the ergonomics and + performance of iterators, so that these extra convenience methods are not + necessary and so that *all* collections can benefit. + +* *Minimizing method variants*. One problem with some of the current collection + APIs is the proliferation of method variants. For example, `HashMap` include + *seven* methods that begin with the name `find`! 
While each method has a + motivation, the API as a whole can be bewildering, especially to newcomers. + + When possible, we should leverage the trait system, or find other + abstractions, to reduce the need for method variants while retaining their + ergonomics and power. + +* *Conservatism*. It is easier to add APIs than to take them away. This RFC + takes a fairly conservative stance on what should be included in the + collections APIs. In general, APIs should be very clearly motivated by a wide + variety of use cases, either for expressiveness, performance, or ergonomics. + +## Removing the traits + +This RFC proposes a somewhat radical step for the collections traits: rather +than reform them, we should eliminate them altogether -- *for now*. + +Unlike inherent methods, which can easily be added and deprecated over time, a +trait is "forever": there are very few backwards-compatible modifications to +traits. Thus, for something as fundamental as collections, it is prudent to take +our time to get the traits right. + +### Lack of iterator methods + +In particular, there is one way in which the current traits are clearly *wrong*: +they do not provide standard methods like `iter`, despite these being +fundamental to working with collections in Rust. Sadly, this gap is due to +inexpressiveness in the language, which makes directly defining iterator methods +in a trait impossible: + +```rust +trait Iter { + type A; + type I: Iterator<&'a A>; // what is the lifetime here? + fn iter<'a>(&'a self) -> I; // and how to connect it to self? +} +``` + +The problem is that, when implementing this trait, the return type `I` of `iter` +should depend on the *lifetime* of self. For example, the corresponding +method in `Vec` looks like the following: + +```rust +impl Vec { + fn iter(&'a self) -> Items<'a, T> { ... } +} +``` + +This means that, given a `Vec`, there isn't a *single* type `Items` for +iteration -- rather, there is a *family* of types, one for each input lifetime. +In other words, the associated type `I` in the `Iter` needs to be +"higher-kinded": not just a single type, but rather a family: + +```rust +trait Iter { + type A; + type I<'a>: Iterator<&'a A>; + fn iter<'a>(&self) -> I<'a>; +} +``` + +In this case, `I` is parameterized by a lifetime, but in other cases (like +`map`) an associated type needs to be parameterized by a type. + +In general, such higher-kinded types (HKTs) are a much-requested feature for +Rust. But the design and implementation of higher-kinded types is, by itself, a +significant investment. + +HKT would also allow for parameterization over smart pointer types, which has +many potential use cases in the context of collections. + +Thus, the goal in this RFC is to do the best we can without HKT *for now*, +while allowing a graceful migration if or when HKT is added. + +### Persistent/immutable collections + +Another problem with the current collection traits is the split between +immutable and mutable versions. In the long run, we will probably want to +provide *persistent* collections (which allow non-destructive "updates" that +create new collections that share most of their data with the old ones). + +However, persistent collection APIs have not been thoroughly explored in Rust; +it would be hasty to standardize on a set of traits until we have more +experience. + +### Downsides of removal + +There are two main downsides to removing the traits without a replacement: + +1. It becomes impossible to write code using generics over a "kind" of + collection (like `Map`). 
+
+2. It becomes more difficult to ensure that the collections share a common API.
+
+For point (1), first, if the APIs are sufficiently consistent it should be
+possible to transition code from e.g. a `TreeMap` to a `HashMap` by changing
+very few lines of code. Second, generic programming is currently quite limited,
+given the inability to iterate. Finally, generic programming over collections is
+a large design space (with much precedent in C++, for example), and we should
+take our time and gain more experience with a variety of concrete collections
+before settling on a design.
+
+For point (2), first, the current traits have failed to keep the APIs in line,
+as we will see below. Second, this RFC is the antidote: we establish a clear set
+of conventions and APIs for concrete collections up front, and stabilize on
+those, which should make it easy to add traits later on.
+
+### Why not leave the traits as "experimental"?
+
+An alternative to removal would be to leave the traits intact, but marked as
+experimental, with the intent to radically change them later.
+
+Such a strategy doesn't buy much relative to removal (given the arguments
+above), but risks the traits becoming "de facto" stable if people begin using
+them en masse.
+
+## Solving the `_equiv` and `MaybeOwned` problems
+
+The basic problem that leads to `_equiv` methods is that:
+
+* `&String` and `&str` are not the same type.
+* The `&str` type is more flexible and hence more widely used.
+* Code written for a generic type `T` that takes a reference `&T` will therefore
+  not be suitable when `T` is instantiated with `String`.
+
+A similar story plays out for `&Vec<T>` and `&[T]`, and with DST and custom
+slice types the same problem will arise elsewhere.
+
+### The `Borrow` trait
+
+This RFC proposes to use a *trait*, `Borrow`, to connect borrowed and owned data
+in a generic fashion:
+
+```rust
+/// A trait for borrowing.
+trait Borrow<Sized? B> {
+    /// Immutably borrow from an owned value.
+    fn borrow(&self) -> &B;
+
+    /// Mutably borrow from an owned value.
+    fn borrow_mut(&mut self) -> &mut B;
+}
+
+// The Sized bound means that this impl does not overlap with the impls below.
+impl<T: Sized> Borrow<T> for T {
+    fn borrow(a: &T) -> &T {
+        a
+    }
+    fn borrow_mut(a: &mut T) -> &mut T {
+        a
+    }
+}
+
+impl Borrow<str> for String {
+    fn borrow(s: &String) -> &str {
+        s.as_slice()
+    }
+    fn borrow_mut(s: &mut String) -> &mut str {
+        s.as_mut_slice()
+    }
+}
+
+impl<T> Borrow<[T]> for Vec<T> {
+    fn borrow(s: &Vec<T>) -> &[T] {
+        s.as_slice()
+    }
+    fn borrow_mut(s: &mut Vec<T>) -> &mut [T] {
+        s.as_mut_slice()
+    }
+}
+```
+
+*(Note: thanks to @epdtry for [suggesting this variation](https://github.com/rust-lang/rfcs/pull/235#issuecomment-55337168)! The original proposal
+ is listed [in the Alternatives](#variants-of-borrow).)*
+
+A primary goal of the design is allowing a *blanket* `impl` for non-sliceable
+types (the first `impl` above). This blanket `impl` ensures that all new sized,
+cloneable types are automatically borrowable; new `impl`s are required only for
+new *unsized* types, which are rare. The `Sized` bound on the initial impl means
+that we can freely add impls for unsized types (like `str` and `[T]`) without
+running afoul of coherence.
+
+Because of the blanket `impl`, the `Borrow` trait can largely be ignored except
+when it is actually used -- which we describe next.
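+
+As a small illustration (the function below is hypothetical, not part of the
+proposed API), the impls above let the same owned value be borrowed at either
+"level", with the type annotation selecting the impl:
+
+```rust
+// Sketch only: with the blanket impl we get String: Borrow<String>, and with
+// the impl for `str` we get String: Borrow<str>.
+fn demo(s: String) {
+    let whole: &String = s.borrow(); // via impl<T: Sized> Borrow<T> for T
+    let slice: &str = s.borrow();    // via impl Borrow<str> for String
+}
+```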
+
+### Using `Borrow` to replace `_equiv` methods
+
+With the `Borrow` trait in place, we can eliminate the `_equiv` method variants
+by asking map keys to implement `Borrow`:
+
+```rust
+impl<K, V> HashMap<K, V> where K: Hash + Eq {
+    fn find<Sized? Q>(&self, k: &Q) -> Option<&V> where K: Borrow<Q>, Q: Hash + Eq { ... }
+    fn contains_key<Sized? Q>(&self, k: &Q) -> bool where K: Borrow<Q>, Q: Hash + Eq { ... }
+    fn insert(&mut self, k: K, v: V) -> Option<V> { ... }
+
+    ...
+}
+```
+
+The benefits of this approach over `_equiv` are:
+
+* The `Borrow` trait captures the borrowing relationship between an owned data
+  structure and both references to it and slices from it -- once and for all.
+  This means that it can be used *anywhere* we need to program generically over
+  "borrowed" data. In particular, the single trait works for both `HashMap` and
+  `TreeMap`, and should work for other kinds of data structures as well. It also
+  helps generalize `MaybeOwned`, for similar reasons (see below).
+
+  A *very important* consequence is that the map methods using `Borrow` can
+  potentially be put into a common `Map` trait that's implemented by `HashMap`,
+  `TreeMap`, and others. While we do not propose to do so now, we definitely
+  want to do so later on.
+
+* When using a `HashMap<String, V>`, all of the basic methods like `find`,
+  `contains_key` and `insert` "just work", without forcing you to think about
+  `&String` vs `&str`.
+
+* We don't need separate `_equiv` variants of methods. (However, this could
+  probably be addressed with
+  [multidispatch](https://github.com/rust-lang/rfcs/pull/195) by providing a
+  blanket `Equiv` implementation.)
+
+On the other hand, this approach retains some of the downsides of `_equiv`:
+
+* The signature for methods like `find` and `contains_key` is more complex than
+  their current signatures. There are two counterpoints. First, over time the
+  `Borrow` trait is likely to become a well-known concept, so the signature will
+  not appear completely alien. Second, what is perhaps more important than the
+  signature is that, when using `find` on a `HashMap<String, V>`, various method
+  arguments *just work* as expected.
+
+* The API does not guarantee "coherence": the `Hash` and `Eq` (or `Ord`, for
+  `TreeMap`) implementations for the owned and borrowed keys might differ,
+  breaking key invariants of the data structure. This is already the case with
+  `_equiv`.
+
+The [Alternatives section](#variants-of-borrow) includes a variant of `Borrow`
+that doesn't suffer from these downsides, but has some downsides of its own.
+
+### Clone-on-write (`Cow`) pointers
+
+A side-benefit of the `Borrow` trait is that we can give a more general version
+of `MaybeOwned` as a "clone-on-write" smart pointer:
+
+```rust
+/// A generalization of Clone.
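+/// Where `Clone` recreates an owned value from `&Self`, `FromBorrow` creates
+/// an owned value from a borrowed view of it (e.g. a `String` from a `&str`);
+/// this is what lets `Cow` allocate an owned copy only on demand.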
+trait FromBorrow: Borrow { + fn from_borrow(b: &B) -> Self; +} + +/// A clone-on-write smart pointer +pub enum Cow<'a, T, B> where T: FromBorrow { + Shared(&'a B), + Owned(T) +} + +impl<'a, T, B> Cow<'a, T, B> where T: FromBorrow { + pub fn new(shared: &'a B) -> Cow<'a, T, B> { + Shared(shared) + } + + pub fn new_owned(owned: T) -> Cow<'static, T, B> { + Owned(owned) + } + + pub fn is_owned(&self) -> bool { + match *self { + Owned(_) => true, + Shared(_) => false + } + } + + pub fn to_owned_mut(&mut self) -> &mut T { + match *self { + Shared(shared) => { + *self = Owned(FromBorrow::from_borrow(shared)); + self.to_owned_mut() + } + Owned(ref mut owned) => owned + } + } + + pub fn into_owned(self) -> T { + match self { + Shared(shared) => FromBorrow::from_borrow(shared), + Owned(owned) => owned + } + } +} + +impl<'a, T, B> Deref for Cow<'a, T, B> where T: FromBorrow { + fn deref(&self) -> &B { + match *self { + Shared(shared) => shared, + Owned(ref owned) => owned.borrow() + } + } +} + +impl<'a, T, B> DerefMut for Cow<'a, T, B> where T: FromBorrow { + fn deref_mut(&mut self) -> &mut B { + self.to_owned_mut().borrow_mut() + } +} +``` + +The type `Cow<'a, String, str>` is roughly equivalent to today's `MaybeOwned<'a>` +(and `Cow<'a, Vec, [T]>` to `MaybeOwnedVector<'a, T>`). + +By implementing `Deref` and `DerefMut`, the `Cow` type acts as a smart pointer +-- but in particular, the `mut` variant actually *clones* if the pointed-to +value is not currently owned. Hence "clone on write". + +One slight gotcha with the design is that `&mut str` is not very useful, while +`&mut String` is (since it allows extending the string, for example). On the +other hand, `Deref` and `DerefMut` must deref to the *same* underlying type, and +for `Deref` to not require cloning, it must yield a `&str` value. + +Thus, the `Cow` pointer offers a separate `to_owned_mut` method that yields a +mutable reference to the *owned* version of the type. + +Note that, by not using `into_owned`, the `Cow` pointer itself may be owned by +some other data structure (perhaps as part of a collection) and will internally +track whether an owned copy is available. + +Altogether, this RFC proposes to introduce `Borrow` and `Cow` as above, and to +deprecate `MaybeOwned` and `MaybeOwnedVector`. The API changes for the +collections are discussed [below](#the-apis). + +## `IntoIterator` (and `Iterable`) + +As discussed in [earlier](#iterable), some form of an `Iterable` trait is +desirable for both expressiveness and ergonomics. Unfortunately, a full +treatment of `Iterable` requires HKT for similar reasons to +[the collection traits](#lack-of-iterator-methods). However, it's possible to +get some of the way there in a forwards-compatible fashion. + +In particular, the following two traits work fine (with +[associated items](https://github.com/rust-lang/rfcs/pull/195)): + +```rust +trait Iterator { + type A; + fn next(&mut self) -> Option; + ... +} + +trait IntoIterator { + type A; + type I: Iterator; + + fn into_iter(self) -> I; +} +``` + +Because `IntoIterator` consumes `self`, lifetimes are not an issue. + +It's tempting to also define a trait like: + +```rust +trait Iterable<'a> { + type A; + type I: Iterator<&'a A>; + + fn iter(&'a self) -> I; +} +``` + +(along the lines of those proposed by +[an earlier RFC](https://github.com/rust-lang/rfcs/pull/17)). + +The problem with `Iterable` as defined above is that it's locked to a particular +lifetime up front. 
But in many cases, the needed lifetime is not even nameable +in advance: + +```rust +fn iter_through_rc(c: Rc) where I: Iterable { + // the lifetime of the borrow is established here, + // so cannot even be named in the function signature + for x in c.iter() { + // ... + } +} +``` + +To make this kind of example work, you'd need to be able to say something like: + +```rust +where <'a> I: Iterable<'a> +``` + +that is, that `I` implements `Iterable` for *every* lifetime `'a`. While such a +feature is feasible to add to `where` clauses, the HKT solution is undoubtedly +cleaner. + +Fortunately, we can have our cake and eat it too. This RFC proposes the +`IntoIterator` trait above, together with the following blanket `impl`: + +```rust +impl IntoIterator for I { + type A = I::A; + type I = I; + fn into_iter(self) -> I { + self + } +} +``` + +which means that taking `IntoIterator` is strictly more flexible than taking +`Iterator`. Note that in other languages (like Java), iterators are *not* +iterable because the latter implies an unlimited number of iterations. But +because `IntoIterator` consumes `self`, it yields only a single iteration, so +all is good. + +For individual collections, one can then implement `IntoIterator` on both the +collection and borrows of it: + +```rust +impl IntoIterator for Vec { + type A = T; + type I = MoveItems; + fn into_iter(self) -> MoveItems { ... } +} + +impl<'a, T> IntoIterator for &'a Vec { + type A = &'a T; + type I = Items<'a, T>; + fn into_iter(self) -> Items<'a, T> { ... } +} + +impl<'a, T> IntoIterator for &'a mut Vec { + type A = &'a mut T; + type I = ItemsMut<'a, T>; + fn into_iter(self) -> ItemsMut<'a, T> { ... } +} +``` + +If/when HKT is added later on, we can add an `Iterable` trait and a blanket +`impl` like the following: + +```rust +// the HKT version +trait Iterable { + type A; + type I<'a>: Iterator<&'a A>; + fn iter<'a>(&'a self) -> I<'a>; +} + +impl<'a, C: Iterable> IntoIterator for &'a C { + type A = &'a C::A; + type I = C::I<'a>; + fn into_iter(self) -> I { + self.iter() + } +} +``` + +This gives a clean migration path: once `Vec` implements `Iterable`, it can drop +the `IntoIterator` `impl`s for borrowed vectors, since they will be covered by +the blanket implementation. No code should break. + +Likewise, if we add a feature like the "universal" `where` clause mentioned +above, it can be used to deal with embedded lifetimes as in the +`iter_through_rc` example; and if the HKT version of `Iterable` is later added, +thanks to the suggested blanket `impl` for `IntoIterator` that `where` clause +could be changed to use `Iterable` instead, again without breakage. + +### Benefits of `IntoIterator` + +What do we gain by incorporating `IntoIterator` today? + +This RFC proposes that `for` loops should use `IntoIterator` rather than +`Iterator`. With the blanket `impl` of `IntoIterator` for any `Iterator`, this +is not a breaking change. However, given the `IntoIterator` `impl`s for `Vec` +above, we would be able to write: + +```rust +let v: Vec = ... + +for x in &v { ... } // iterate over &Foo +for x in &mut v { ... } // iterate over &mut Foo +for x in v { ... } // iterate over Foo +``` + +Similarly, methods that currently take slices or iterators can be changed to +take `IntoIterator` instead, immediately becoming more general and more +ergonomic. + +In general, `IntoIterator` will allow us to move toward more `Iterator`-centric +APIs today, in a way that's compatible with HKT tomorrow. 
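+
+As a sketch of the kind of API this enables (the helper below is hypothetical,
+not something this RFC proposes to add), a single `IntoIterator` bound lets one
+function accept a collection, a borrow of one, or a bare iterator:
+
+```rust
+// Sketch only: counts the items a value yields. Given the impls above, it can
+// be called as count_items(v), count_items(&v), count_items(&mut v), or
+// count_items(v.iter()).
+fn count_items<I>(src: I) -> uint where I: IntoIterator {
+    let mut n = 0u;
+    for _ in src { // `for` loops take IntoIterator under this RFC
+        n += 1;
+    }
+    n
+}
+```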
+ +### Additional methods + +Another typical desire for an `Iterable` trait is to offer defaulted versions of +methods that basically re-export iterator methods on containers (see +[the earlier RFC](https://github.com/rust-lang/rfcs/pull/17)). Usually these +methods would go through a reference iterator (i.e. the `iter` method) rather +than a moving iterator. + +It is possible to add such methods using the design proposed above, but there +are some drawbacks. For example, should `Vec::map` produce an iterator, or a new +vector? It would be possible to do the latter generically, but only with +HKT. (See +[this discussion](https://github.com/rust-lang/rfcs/pull/17#issuecomment-43817453).) + +This RFC only proposes to add the following method via `IntoIterator`, as a +convenience for a common pattern: + +```rust +trait IterCloned { + type A; + type I: Iterator; + fn iter_cloned(self) -> I; +} + +impl<'a, T, I: IntoIterator> IterCloned for I where I::A = &'a T { + type A = T; + type I = ClonedItems; + fn into_iter(self) -> I { ... } +} +``` + +(The `iter_cloned` method will help reduce the number of method variants in +general for collections, as we will see below). + +We will leave to later RFCs the incorporation of additional methods. Notice, in +particular, that such methods can wait until we introduce an `Iterable` trait +via HKT without breaking backwards compatibility. + +## Minimizing variants: `ByNeed` and `Predicate` traits + +There are several kinds of methods that, in their most general form take +closures, but for which convenience variants taking simpler data are common: + +* *Taking values by need*. For example, consider the `unwrap_or` and + `unwrap_or_else` methods in `Option`: + + ```rust + fn unwrap_or(self, def: T) -> T + fn unwrap_or_else(self, f: || -> T) -> T + ``` + + The `unwrap_or_else` method is the most general: it invokes the closure to + compute a default value *only when `self` is `None`*. When the default value + is expensive to compute, this by-need approach helps. But often the default + value is cheap, and closures are somewhat annoying to write, so `unwrap_or` + provides a convenience wrapper. + +* *Taking predicates*. For example, a method like `contains` often shows up + (inconsistently!) in two variants: + + ```rust + fn contains(&self, elem: &T) -> bool; // where T: PartialEq + fn contains_fn(&self, pred: |&T| -> bool) -> bool; + ``` + + Again, the `contains_fn` version is the more general, but it's convenient to + provide a specialized variant when the element type can be compared for + equality, to avoid writing explicit closures. + +As it turns out, with +[multidispatch](https://github.com/rust-lang/rfcs/pull/195)) it is possible to +use a *trait* to express these variants through overloading: + +```rust +trait ByNeed { + fn compute(self) -> T; +} + +impl ByNeed for T { + fn compute(self) -> T { + self + } +} + +// Due to multidispatch, this impl does NOT overlap with the above one +impl ByNeed for || -> T { + fn compute(self) -> T { + self() + } +} + +impl Option { + fn unwrap_or(self, def: U) where U: ByNeed { ... } + ... +} +``` + +```rust +trait Predicate { + fn check(&self, &T) -> bool; +} + +impl Predicate for &T { + fn check(&self, t: &T) -> bool { + *self == t + } +} + +impl Predicate for |&T| -> bool { + fn check(&self, t: &T) -> bool { + (*self)(t) + } +} + +impl Vec { + fn contains