Enum with associated values #30

Closed
bernardnormier opened this issue Feb 3, 2021 · 21 comments

@bernardnormier
Member

This is a proposal to add associated values to Slice enums, similar to Swift and Rust.
See:
https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html
https://doc.rust-lang.org/rust-by-example/custom_types/enum.html

The syntax of Slice enums with associated values is similar to Rust/Swift:

enum WebEvent
{
    PageLoad,
    PageUnload,
    KeyPress(byte key),
    Paste(string str),
    Click(long x, long y)
}

Syntactically, it's similar to an operation with parameters and no return value.

A Slice enum where any enumerator has one or more associated values:

  • cannot be unchecked
  • cannot have an explicit value for any of its enumerators (like = 5)
  • cannot have an underlying integer type (like enum WebEvent : byte)

In C#, WebEvent and its enumerators would map to a small generated record hierarchy:

public record WebEvent
{
    public sealed record PageLoad() : WebEvent;
    public sealed record PageUnload() : WebEvent;
    public sealed record KeyPress(byte Key) : WebEvent;
    public sealed record Paste(string Str) : WebEvent;
    public sealed record Click(long X, long Y) : WebEvent;
}

Usage: https://dotnetfiddle.net/LH5C6q

This also works well with switch statements and switch expressions, since we can match on the type (like WebEvent.KeyPress).
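
As a rough illustration of the same shape outside C#, here is a minimal Python sketch: one base type, one small immutable class per enumerator, and dispatch on the concrete type. It is not a proposed Python mapping; the `describe` helper is hypothetical, and the `Paste` field is named `text` here only to avoid shadowing Python's built-in `str`.

```python
from dataclasses import dataclass


class WebEvent:
    """Base type; each enumerator becomes one subclass."""


@dataclass(frozen=True)
class PageLoad(WebEvent):
    pass


@dataclass(frozen=True)
class PageUnload(WebEvent):
    pass


@dataclass(frozen=True)
class KeyPress(WebEvent):
    key: int


@dataclass(frozen=True)
class Paste(WebEvent):
    text: str


@dataclass(frozen=True)
class Click(WebEvent):
    x: int
    y: int


def describe(event: WebEvent) -> str:
    # Dispatch on the concrete type, much like matching on WebEvent.KeyPress.
    if isinstance(event, KeyPress):
        return f"key {event.key}"
    if isinstance(event, Paste):
        return f"pasted {event.text!r}"
    if isinstance(event, Click):
        return f"click at ({event.x}, {event.y})"
    if isinstance(event, (PageLoad, PageUnload)):
        return type(event).__name__
    raise TypeError(f"unknown WebEvent: {event!r}")
```

Enumerators without associated values still get their own (empty) type, which is what makes type-based matching uniform across all cases.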

@bernardnormier bernardnormier transferred this issue from icerpc/icerpc-csharp Oct 14, 2021
@pepone
Member

pepone commented Jun 22, 2022

How will we encode these enumerators? I assume we encode each one as a struct with a type ID, and encode enumerators without associated values as an empty struct of the given type.

@InsertCreativityHere
Member

I had thought it'd be something like:

enum MyEnum: byte
{
    Foo,
    Bar(String),
}

for Foo we'd encode [0]
for Bar we'd encode [1][String]
In general we'd encode [value as enum backing type][data stored in enumerator, in order]

We really don't even need type-IDs here, since we know exactly what type will be passed in.
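
A minimal Python sketch of that layout, for illustration only. The helper names are hypothetical, and the string encoding (a one-byte length prefix followed by UTF-8 bytes) is an assumption made to keep the example short; it is not taken from the Slice encoding.

```python
def encode_string(value: str) -> bytes:
    # Assumption: one-byte length prefix followed by the UTF-8 bytes.
    data = value.encode("utf-8")
    if len(data) > 0xFF:
        raise ValueError("string too long for this sketch")
    return bytes([len(data)]) + data


def encode_my_enum(discriminant: int, fields: bytes = b"") -> bytes:
    """Encode as [discriminant as one byte][associated values, in order]."""
    if not 0 <= discriminant <= 0xFF:
        raise ValueError("discriminant does not fit in the byte backing type")
    return bytes([discriminant]) + fields


foo = encode_my_enum(0)                       # Foo
bar = encode_my_enum(1, encode_string("hi"))  # Bar("hi")
```

No type ID appears anywhere: the discriminant alone tells the decoder which fields follow.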

@InsertCreativityHere
Member

One question I see is whether we should allow tags here...

enum MyEnum
{
    Foo(tag(1) String?),
}

I think allowing optionals is totally fine, but tags, I don't know. I think it's fine to not support them, making enumerators effectively like compact structs.

@pepone
Member

pepone commented Jun 22, 2022

I think allowing optionals here is more flexible. One option would be to allow compact as a modifier for enums, as we already do for structs.

compact enum ... {} // Enumerators encoded as index + compact struct
enum ... {} // Enumerators encoded as index + struct

@bernardnormier
Member Author

I lean towards having only compact enums (and hence no compact qualifier) because "regular" enums without associated values should (obviously) be compact.

If we allowed tags in associated values, then enum Letter { A, B, C } is non-compact by default because you can always add a tagged associated value to each of A, B or C:

enum Letter { A, B(foo: tag(1) string?), C }

@InsertCreativityHere
Member

I agree, supporting tags here seems overkill.

Note that there's nothing wrong with enumerators holding non-compact structs, so they can still have flexibility in the underlying data:

struct Foo
{
    data: tag(1) int32,
}

enum Bar
{
    Nothing,
    F(Foo),
}

@pepone
Member

pepone commented Jun 23, 2022

I feel that tags are important because they allow applications to evolve. It's true that you can use a struct with tags, but you have to anticipate the need. I think defaulting to evolvability, and letting the user opt in to compact to squeeze out that extra bit of performance, would be best. Supporting both doesn't seem to add much complexity.

@bernardnormier
Member Author

bernardnormier commented Mar 4, 2023

The first thing we need in terms of extensibility for enums with associated values is to support unchecked enums with associated values. This way, if you don't like an existing associated value, you can introduce a new discriminant with the desired value.

Then we can piggy-back on this extensibility to add tags to the enum's associated values.

Proposal

  1. An enum with an underlying type is always encoded as its underlying type. It can't have an associated value.

  2. An enum with no underlying type:

  • always has an associated value (which can be empty)
  • does not support explicit values for its discriminants

  3. A checked enum with no underlying type can be marked compact. It means the associated value is logically a compact struct. For enumerators without an associated value, nothing is encoded beyond the discriminant.

For example:

compact enum Location {
    Unknown
    Anonymous
    Known(coordinates: Coordinates)
}

Here, Unknown is encoded as 0 (as a varint32), and Known is encoded as 2 (varint32) followed by the coordinates. The associated value is encoded like a compact struct holding the Coordinates.

Without compact, Unknown is encoded as 0 followed by the tag end marker (0xfc), and Known is encoded as 2 (varint32) followed by an anonymous struct that holds the coordinates. This allows associated values to be extended in a wire-compatible manner:

enum Location {
     Unknown(tag(1) name: string?) // wire compatible
     Anonymous
     Known(coordinates: Coordinates)
}

  4. An unchecked enum with no underlying type can also be compact or non-compact:

unchecked enum Location {
    Unknown
    Anonymous
} 

compact = tags not allowed + smaller encoding size (1 byte less: not much)
non-compact = tags allowed

Every enumerator is encoded as: [discriminant][boxed value]. The boxed value is a size followed by an anonymous struct (compact or non-compact depending on the enum qualifier).

For example, Unknown is encoded as 0 (discriminant) followed by 1 (size) followed by the empty struct's tag end marker (since the enum is non-compact).

Then, you could extend it with:

unchecked enum Location {
    Unknown
    Anonymous
    Known
} 

and later:

unchecked enum Location {
    Unknown(tag(1) message: string?)
    Anonymous
    Known
    KnownWithCoordinates(coordinates: Coordinates)
} 

During decoding:

  • if the Slice engine knows the KnownWithCoordinates enumerator, it will decode it properly
  • if the Slice engine does not know the KnownWithCoordinates enumerator, it will decode the discriminant but won't be able to decode the associated value. It can skip this value since it's in a box. It would nevertheless be useful to somehow give the raw sequence<uint8> to the application and support "value preservation". That's a language mapping concern.
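
The boxed layout and the skip-on-unknown behavior described above can be sketched in Python as follows. This is a simplified illustration, not the Slice engine: the discriminant and the size are each written as a single byte here (the proposal uses variable-length integers), the function and class names are hypothetical, and the associated value's fields are treated as opaque bytes followed by the 0xFC tag end marker quoted in the proposal.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

TAG_END_MARKER = 0xFC  # value quoted in the proposal for non-compact structs


def encode_enumerator(discriminant: int, fields: bytes = b"") -> bytes:
    """Encode as [discriminant][size][boxed value], one byte each for the first two."""
    boxed = fields + bytes([TAG_END_MARKER])
    if not 0 <= discriminant <= 0xFF or len(boxed) > 0xFF:
        raise ValueError("value out of range for this one-byte sketch")
    return bytes([discriminant, len(boxed)]) + boxed


@dataclass(frozen=True)
class Decoded:
    discriminant: int
    name: Optional[str]  # None when the enumerator is unknown to this decoder
    raw_value: bytes     # preserved bytes of the boxed associated value


def decode_enumerator(
    buffer: bytes, known: Dict[int, str], offset: int = 0
) -> Tuple[Decoded, int]:
    """Decode one enumerator and return it with the offset of the next one."""
    discriminant = buffer[offset]
    size = buffer[offset + 1]
    start = offset + 2
    end = start + size
    if end > len(buffer):
        raise ValueError("truncated boxed value")
    # The size lets the decoder skip a value it cannot interpret,
    # while still handing the raw bytes to the application.
    return Decoded(discriminant, known.get(discriminant), buffer[start:end]), end


OLD_DECODER = {0: "Unknown", 1: "Anonymous", 2: "Known"}
stream = encode_enumerator(3, b"\x2a\x2b") + encode_enumerator(1)
first, next_offset = decode_enumerator(stream, OLD_DECODER)
second, _ = decode_enumerator(stream, OLD_DECODER, next_offset)
```

Here `first.name` is `None` because discriminant 3 is unknown to the old decoder, yet decoding continues correctly with `second`, and `first.raw_value` keeps the undecoded bytes for "value preservation".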

@InsertCreativityHere
Member

Tags are so powerful with structs because they're 'extensible by default'. If you just type struct Foo {...}, you can use tags. Users have to opt out of this extensibility by adding the compact modifier (if they know they don't need the flexibility and care about performance).

Having enums be 'un extensible by default' makes the feature partially useless in practice. Few users will have the foresight (or even knowledge) to preemptively mark their enums unchecked just in case they need the extensibility in the future.

More likely they'll only write enum Foo {...}, and by the time they realize "wow, I really wish this variant took an extra bool", there's already nothing they can do about it.


So, if we go with this approach, I think we'd have to switch the behavior of enums with variants:
They should be "unchecked"/"non-compact" by default, and users can opt into disabling this for performance by applying the checked or compact keyword.

This conclusion is a little unfortunate though, because while "unchecked by default" is ideal for enums with variants, "checked by default" makes more sense for 'normal' enumerators...

@bernardnormier
Member Author

So, if we go with this approach

I updated the proposal. It's now extensible by default as far as the associated values are concerned.

@bernardnormier
Member Author

Having enums be 'un extensible by default'

checked is not quite the same as un-extensible. For example, it's fine to add enumerators to a checked enum as long as you don't send this enumerator to an application that doesn't know about it. You could imagine a new server and new client that know and use this new enumerator, while an old client doesn't and never sees it.
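
A small Python sketch of that scenario, assuming a checked decoder simply rejects any discriminant it does not know. The names and the discriminant tables are hypothetical.

```python
from typing import Dict


class UnknownEnumerator(ValueError):
    """Raised by a checked decoder for a discriminant it does not know."""


def decode_checked(discriminant: int, known: Dict[int, str]) -> str:
    try:
        return known[discriminant]
    except KeyError:
        raise UnknownEnumerator(f"unknown discriminant {discriminant}") from None


OLD_CLIENT = {0: "Unknown", 1: "Anonymous"}
NEW_CLIENT = {0: "Unknown", 1: "Anonymous", 2: "Known"}  # enumerator added later

# New server to new client: the added enumerator decodes fine.
new_result = decode_checked(2, NEW_CLIENT)

# The old client keeps working as long as it is only sent enumerators it knows.
old_result = decode_checked(1, OLD_CLIENT)
```

Only sending discriminant 2 to `OLD_CLIENT` raises `UnknownEnumerator`; adding the enumerator to the definition does not by itself break anything.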

@InsertCreativityHere
Member

From this proposal, am I correct that you could have a compact unchecked enum {...}?
An enum where you can add new enumerators to it, but the associated values are forbidden from having tags.

Also, I assume that all the parts about associated values are Slice2 specific, correct? Just to make sure!

@InsertCreativityHere
Member

The more I think about this, the more it feels like we're talking about 2 completely separate types.
Just looking at Slice2 for a minute, since Slice1 is set in stone.

'normal' enums:

  • must declare an underlying type
  • can be set to explicit values (A = 6)
  • cannot use associated values (A(b: bool))
  • always encoded directly as their underlying type

enums with associated values:

  • cannot declare an underlying type
  • cannot be set to explicit values (A = 6)
  • can use associated values (A(b: bool))
  • always encoded as either a struct, or a parameter list

Should we maybe just make them two separate types? enum and union or something.
They support/require different modifiers, different syntax, express different behavior, and use different encodings.

enum can be checked or unchecked, can optionally have an underlying type, and contains a list of enumerators that are either implicitly or explicitly set to an integer.
union can use unchecked and compact, can never have an underlying type (it'd just be a syntax error), and contains a list of inline structs.

I think splitting them would make some of the terminology less confusing.
Right now I find the terminology quite verbose, and the terms easy to confuse when speaking colloquially.

It also makes it easier to describe things to users:
enum is supported with both Slice1 and Slice2, union is Slice2 only.
vs
enums with associated values are only supported by Slice2 while 'normal'(?) enums are supported by both Slice1 and Slice2.

@bernardnormier
Member Author

This analysis is correct, and my proposal does include two distinct syntaxes that use the same enum keyword:

  • an enum with an underlying type is a "basic" enum. The enumerator of such a basic enum can't have an associated value; it can however have an explicit numeric value in the underlying type's range.
  • an enum without an underlying type is an enum with associated values. Each enumerator is a discriminant plus an associated value (which can be empty).

Swift does the same, with a single keyword (enum). See https://docs.swift.org/swift-book/documentation/the-swift-programming-language/enumerations/

AFAIK all programming languages with enum-with-associated-values (aka tagged union, discriminated union...) call this construct 'enum'; it would be confusing to pick a different name in Slice.

@InsertCreativityHere
Member

InsertCreativityHere commented Mar 8, 2023

I'll admit it isn't very compelling, but C++ does have different names for these constructs.
They call regular enums enums, and enums with associated values variants.
variant is just a type, not a new keyword or anything, but still, it's a different name.
They describe it as a "type safe union" but it's really just a discriminated union, like enums with associated values are.

@InsertCreativityHere
Member

InsertCreativityHere commented Mar 8, 2023

AFAIK all programming languages with enum-with-associated-values (aka tagged union, discriminated union...) call this construct 'enum'; it would be confusing to pick a different name in Slice.

It's interesting that we arrived at opposite conclusions from the same information!

To me, it would be confusing to use the same name because for developers from C#, Java, TypeScript, Kotlin, C++, etc.,
enum only means "basic enum". Re-using this term for a different concept/data structure might catch them off guard.
Or fly under their radar: "I already know what structs, enums, and interfaces are, I'll just skim these parts of the docs..."

Many languages use enum to only refer to what we're calling basic enums.
The number of languages that use enum to mean both kinds is fairly small (mostly just because associated values are rare).

Either way, not a pressing discussion to have.

@bernardnormier
Member Author

For enum with associated values in C#, see also: https://github.com/domn1995/dunet

@InsertCreativityHere
Member

InsertCreativityHere commented Oct 11, 2023

In languages like C# that don't have native support for these,
enums will have 2 very-different mappings (let's run with C#):

  • 'normal' enums get mapped to enum
  • 'associated value' enums get mapped to nested record classes

This proposal says the deciding factor will be "is there an underlying type?"

  • yes -> it's a normal enum
  • no -> it's an associated value enum

This produces a bad mapping for the most common kind of enum:
those with neither associated values nor underlying types.

enum MyEnum {
    Foo,
    Bar,
}

These are the cleanest looking enums, and the one that users will initially reach for.

But they will get treated as enums with associated values, and mapped 'poorly' in many languages:
A C# developer will find a bunch of empty record classes nested together, instead of an enum.
A C++ developer will find a variant of monostates, instead of an enum class.
etc.
All of these will be surprising to users, and harder to use.

In my opinion, deciding which mapping to use should be based on "are there associated values?", and not "is there an underlying type?". This criterion results in more natural mappings more often.
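
To make the difference between the two criteria concrete, here is a tiny Python sketch. `EnumDef` and both functions are hypothetical stand-ins for what a code generator would decide; the returned strings just name the C# mappings mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumDef:
    has_underlying_type: bool
    has_associated_values: bool


def mapping_by_underlying_type(definition: EnumDef) -> str:
    # Criterion in the current proposal: is there an underlying type?
    return "native enum" if definition.has_underlying_type else "nested record classes"


def mapping_by_associated_values(definition: EnumDef) -> str:
    # Alternative criterion: are there associated values?
    return "nested record classes" if definition.has_associated_values else "native enum"


# enum MyEnum { Foo, Bar }: no underlying type, no associated values.
my_enum = EnumDef(has_underlying_type=False, has_associated_values=False)
```

The two rules only disagree for definitions like `my_enum`, which is the plain-looking case discussed here.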

It's worth remembering that most languages do not support associated values. C# doesn't, Kotlin doesn't, TypeScript doesn't, Python doesn't, and Go doesn't even have enums, let alone ones with associated values.
Of course, we will find ways to emulate associated values, but the slightly-hacky emulated mapping as the 'default' is suboptimal.

Ensuring that most languages have natural, expected mappings out-of-the-box is more important than optimizing the syntax of enums for the 2 languages (Rust and Swift) that can properly use this new feature IMO.


P.S. I just feel like normal enums should be the default in my gut.

@InsertCreativityHere
Member

InsertCreativityHere commented Oct 11, 2023

A concern I anticipate is: "future extensibility".
Making 'normal' enums the default means that adding an associated value is a breaking change.

// For example, going from `A` to `B` would be a breaking change.
enum A { Foo, Bar }
enum B { Foo, Bar(b: bool) }

You are correct, but "future extensibility" is unsolvable here. With the current proposal, this is already a breaking change:

enum A { Foo, Bar }
enum B { Foo, Bar = 5 }

No matter which choice we make, we will be losing one form of extensibility.
Either the ability to add associated values, or the ability to add 'raw values'.

At best, we can try to argue about which form of extensibility is more important,
or guess which kind of enum will see more use. But I think they're nominally equivalent.

@bernardnormier
Member Author

With the current proposal, this is already a breaking change:

That's not correct. The current proposal doesn't allow you to assign a value to an enumerator when the enumeration has no underlying type.

@InsertCreativityHere
Member

Parser support for enums with associated values was added in #664.
