Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Set inside Variant #94399

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

radiantgurl
Copy link
Contributor

@radiantgurl radiantgurl commented Jul 15, 2024

closes godotengine/godot-proposals#867

  • Core
  • Documentation
  • Editor
    • Icon
  • GDScript
    • Implementation
    • Tests
      • Non-typed test
      • Typed test

TODO:

  • Modify the VariantParser to no longer use ERR_PRINTER_ON_FIRE to signal from the dictionary parser that it's actually a set.
  • Modify for it to support sets with one element.
  • Make the unit tests stop failing even if the output is correct.
  • Implement typed sets
  • Fix issues that make the typed sets unit test fail.

TODO AFTER MERGE:

  • Mono
  • Make sets exportable.
  • Add set to godot-cpp

@AThousandShips
Copy link
Member

Note that the opposite of "disjoint" is "non-disjoint" or better "overlapping", if we're going to use set theory terms I think we should use established ones ("joint" is not one), but I'd say the more accessible and generally useful phrasing would be "non-overlapping" and "overlapping" for what are now called "disjoint" and "joint"

@radiantgurl
Copy link
Contributor Author

Using placeholder icon for now.

@fire
Copy link
Member

fire commented Jul 16, 2024

ERROR: @GlobalScope.xml: Unrecognized opening tag "[Set]" in method "is_same" description. Strange error in the cicd.

@AThousandShips
Copy link
Member

That's because the doc isn't generated so it cannot be looked up, the docfile has to be generated first

@radiantgurl radiantgurl force-pushed the variant-set branch 3 times, most recently from 102906d to 08882f8 Compare July 20, 2024 12:23
@radiantgurl radiantgurl reopened this Jul 20, 2024
@radiantgurl
Copy link
Contributor Author

Woops

@radiantgurl radiantgurl force-pushed the variant-set branch 4 times, most recently from 8f763c8 to 6714305 Compare July 21, 2024 19:02
@radiantgurl
Copy link
Contributor Author

image
Doesnt pass tests, i have no idea why. The file output matches with the test.

@radiantgurl
Copy link
Contributor Author

Since the file output matches the unit tests expected, i won't let it stop me.

@radiantgurl radiantgurl marked this pull request as ready for review July 21, 2024 19:49
@radiantgurl radiantgurl requested review from a team as code owners July 21, 2024 19:49
@fire
Copy link
Member

fire commented Jul 22, 2024

I personally would like set even though it bloats variant by another type. Not sure how much of a review I can do.

Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial feedback

core/variant/set.h Outdated Show resolved Hide resolved
core/variant/set.cpp Outdated Show resolved Hide resolved
core/variant/set.cpp Outdated Show resolved Hide resolved
core/variant/variant_setget.cpp Outdated Show resolved Hide resolved
core/variant/set.h Outdated Show resolved Hide resolved
modules/gdscript/gdscript_analyzer.cpp Outdated Show resolved Hide resolved
core/variant/set.cpp Outdated Show resolved Hide resolved
modules/gdscript/gdscript_parser.h Outdated Show resolved Hide resolved
@radiantgurl radiantgurl force-pushed the variant-set branch 3 times, most recently from 28cf3fd to e44a479 Compare July 22, 2024 09:19
Copy link
Member

@HolonProduction HolonProduction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm not convinced of the GDScript syntax {A, B, C}. It adds unnecessary complexity when parsing. Also I don't think it is very readable. I would prefer to constructs sets like packed arrays:
Set().

modules/gdscript/gdscript_parser.cpp Outdated Show resolved Hide resolved
@@ -2975,6 +2975,13 @@ GDScriptParser::ExpressionNode *GDScriptParser::parse_dictionary(ExpressionNode
case GDScriptTokenizer::Token::EQUAL:
dictionary->style = DictionaryNode::LUA_TABLE;
break;
case GDScriptTokenizer::Token::COMMA:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that parse_dictionary should call to parse_set. Better use another method parse_dictioanry_or_set that delegates to both of the methods.

modules/gdscript/gdscript_parser.cpp Outdated Show resolved Hide resolved
modules/gdscript/gdscript_analyzer.cpp Outdated Show resolved Hide resolved
@radiantgurl
Copy link
Contributor Author

Overall I'm not convinced of the GDScript syntax {A, B, C}. It adds unnecessary complexity when parsing. Also I don't think it is very readable. I would prefer to constructs sets like packed arrays: Set().

It would be a lot straight forward to use the mathematical and python syntax as its easier to understand/comprehend and even use.

@Calinou
Copy link
Member

Calinou commented Jul 22, 2024

I'd say the {A, B, C} syntax can be confusing, since it can be mistaken for a dictionary at a glance. Most other programming languages also don't have a shorthand for their Set type. I would prefer an explicit Set() syntax given this type won't be used very often (way less often than Array and Dictionary).

@HolonProduction
Copy link
Member

Could you elaborate why you think it would be better to understand/comprehend? Not everyone might be familiar with mathematical notations. And while not everyone might be familiar with the term "Set" as well, it at least gives a good search query to start (and a target for symbol lookup to get to the documentation).

Also while the notation is well established in math, I don't think it is in programming, or is it? I personally wouldn't feel certain that I'm looking at a set if I see it in a language that I didn't know.

It might be a tradeoff for the usage if you have to write it out all the time. But given it's only three letters I don't think it's that important.

If it turns out to be a real problem we could implement vararg methods for common set operations, but given that there is no overloading this might get confusing.

@Mickeon
Copy link
Contributor

Mickeon commented Jul 22, 2024

I myself brought up concerns about the name "Set" itself. I'm fully aware this is how it is most commonly known as, but I wonder if it's worth evaluating another name for this type.

Main reason is because "set" is used in Godot exclusively to refer to... a setter, setting a property.
Second reason, and I know the following is extremely nitpicky, but a lot of class reference will have to be modified to avoid any possible ambiguity, as using "set" to refer to a group of elements in an array is reasonably common.

@HolonProduction
Copy link
Member

I mean I can kind of understand that sentiment, but I can't really think of any another name that wouldn't lead to the same effect in the other direction (avoiding calling our NotSets set). Also I don't really think that has been a point of confusion in other languages. E.g. Kotlin has both Set type and a set keyword.

I mean if there is a concrete idea for an alternative name we can evaluate it, but I can't think of one.

@Bromeon
Copy link
Contributor

Bromeon commented Jul 23, 2024

Agree with Mickeon, it's particularly bad because Godot uses a ton of "set" identifiers all over the place, including set() as a method on every single class. Simply calling it HashSet would already solve most issues, and it makes also clear that it's O(1) and not a tree-based implementation. This is further clarified for people coming from C++ and being used to std::set.

E.g. Kotlin has both Set type and a set keyword.

But set doesn't usually appear as part of method names, when it refers to "setting a value". In fact, Kotlin is a good example because it entirely replaced setter methods with properties. So the confusion is naturally lower.

@Bromeon
Copy link
Contributor

Bromeon commented Jul 23, 2024

Some thoughts regarding this feature in general:

  1. Is HashSet conceptually identical to a Dictionary with null (unit) values, except maybe faster? Or are there notable differences?

  2. How would HashSet interoperate with other data structures like Array or Dictionary?

    • How would conversions look?
  3. Is there a strategy for dealing with mutations of the key type?

    • Mostly a problem for objects with custom hashes.
    • Mutation should likely be disallowed, but maybe there could be Debug checks or so, also useful in combination with iteration (but not sure how hard that would be...)
  4. Are there cases where the Godot API itself benefits from this type? Or is it mostly a user convenience?

  5. How do users and Godot API developers choose between Array and HashSet?

    • Array is often the faster choice even if you need a set, unless you have a huge number of elements. Linear search on small numbers is very efficient, and you benefit from contiguous storage, especially for iteration.
    • Until now, existing APIs use Array even when elements are unique and order is not essential. These can't change for compatibility reasons, even after HashSet is added, so we have potential inconsistencies.
    • We need to keep in mind that we had this exact situation with String and StringName -- adding StringName led to many inconsistent APIs, and the implicit conversions caused performance problems in some cases (simply passing strings could lead to expensive conversions in every call). There needs to be a strategy for this.
  6. Are there plans for making this typed in the future, i.e. HashSet[T]?

@AThousandShips
Copy link
Member

I think this should be a RefCounted type like PackedDataContainer instead of a Variant type, I don't think the general demand or need for this type is great enough to justify the major overhead and changes to core code. Especially since this does not use typing, if Set[T] was a thing then this would be far more reasonable as a Variant type, but then that would involve even more code and complexity which I don't think is necessary

The main reasons to consider this as a Variant type would be, IMO:

  1. Support for initializers
  2. Typing support
  3. Cleaner serializing support
  4. Tighter integration with extensions and C#

But 1 has been questioned, and I agree that it both isn't very necessary to have it, and that it would be hard to do so in a way that doesn't both create confusion and make parsing more complicated

For 2 there's no implementation for it, and doesn't seem to be any intention to add it

I also don't necessarily see the need for 3, I'd consider sets here a matter of processing and not data storage, and with a good set of methods you can easily just input an Array of data into the set as part of some de-serialization method if you desire

So for me only 4 stands out as a real benefit here, but I don't think a high performance and tight coupling is a critical part of this, it would potentially be if this was a highly needed feature, but I don't think that there will be a large number of users using this in performance critical areas

tl;dr;
This should, IMO, be a RefCounted type with a HashSet backing, and hopefully future additions of generics to GDScript can allow us to use typing with them if desired in the future

@HolonProduction
Copy link
Member

Should we maybe move over to the proposal for further discussions like this?

@AThousandShips
Copy link
Member

I'd say most of this is implementation related which belongs here, though the broader discussion of Variant vs RefCounted could be served better over on the proposal

@AlyceOsbourne
Copy link

I mean, the whole point is all of the set operations.
Yes, its similar to a dict with null values, but half the size because we don't waste a tonne of data, Everything from spatial hashing, to data sieves, to simple deduplication, there are many reasons to have sets in a game engine.
Every time Sets are bought up in chat, amongst the general users, there is always a display of bafflement to why many of the recent features have been added BEFORE sets, given they are such a fundamental coding tool.
Within the first week of using Godot I ended up writing a set implementation, because I was having to manually apply set operations, and was genuinely surprised that godot is the only lang I have used that doesn't come with a set.

In terms of type conversion and coercion. It should be able to swap too and from an array.
For serialization, again, basically an array, with a flag to mark as set.

On the syntax points here, Python has made use of the {} set syntax since forever, and I mean, godot has borrowed a tonne of its syntax from python anyway.

It seems to me that the argument isn't so much against the inclusion of sets, and more for that fact the type system needs fixing. Its clear that it hasn't been written with the future in mind, and is rigidly designed in such a way that it doesn't support reasonable additions.

@radiantgurl
Copy link
Contributor Author

radiantgurl commented Jul 23, 2024

In terms of type conversion and coercion. It should be able to swap too and from an array. For serialization, again, basically an array, with a flag to mark as set.

Serializing will not work if Set is RefCounted, if you try to serialize with allow_objects = false it will serialize it as an ObjectID

On the syntax points here, Python has made use of the {} set syntax since forever, and I mean, godot has borrowed a tonne of its syntax from python anyway.

This is the reason why i wanted to add it using that syntax.

@dalexeev
Copy link
Member

I mean, the whole point is all of the set operations.

We could add corresponding methods to Array and/or Dictionary. For Array, we could also take repetition into account, see godotengine/godot-proposals#1756 (comment):

[2, 2, 3].merge_with([2, 5]) # [2, 2, 3, 5]
[2, 2, 3].merge_with([2, 2, 5]) # [2, 2, 3, 5]
[2, 2, 3, 5].intersect_with([2]) # [2]
[2, 2, 3, 5].intersect_with([2, 2]) # [2, 2]

Serializing will not work if Set is RefCounted, if you try to serialize with allow_objects = false it will serialize it as an ObjectID

We could solve this on the serialization side by adding special handling for the Set class. We already support a special syntax for resources in text serialization:

} else if (id == "Resource" || id == "SubResource" || id == "ExtResource") {

@HolonProduction
Copy link
Member

HolonProduction commented Jul 23, 2024

Serializing will not work if Set is RefCounted, if you try to serialize with allow_objects = false it will serialize it as an ObjectID

That's a good point. I didn't think about that. (Edit: see dalexeev's comment ☝️)

On the syntax points here, Python has made use of the {} set syntax since forever, and I mean, godot has borrowed a tonne of its syntax from python anyway.

This is the reason why i wanted to add it using that syntax.

I don't think that really matters. GDScript has the same syntax in a lot of places, but it also borrows from other languages like javascript or lua. In the end we need to evaluate new syntax with GDScript in mind and not python.

In terms of type conversion and coercion. It should be able to swap too and from an array.

Not sure about that. Rust for example can do all its coercion at compile time, that's why it's basically free for cases were it isn't needed. GDScript being gradual typed can't rely on that. So all code or at least all unsafe code would get slower through this even when not using sets.

@AThousandShips
Copy link
Member

AThousandShips commented Jul 23, 2024

For array we could also add a method like the STL unique, erasing sequential identical elements, to make Array more convenient for set operations

Edit (to not clutter):
Also, for serialization (beyond convenience) there's no real benefit of doing an integrated serialization instead of having construction or insertion from an array, as performance wise there's no benefit of reloading from storage on construction or some special way, as we create a HashSet which can't be constructed faster as such, i.e. using a method that allows you to add an Array full of entries in one go would be as performant as any construction

The only necessity of adding a method to add a whole bunch of entries at once would be to allow using reserving etc. (it'd be different if we used a tree based structure, there storing and reloading would be something that could be made significantly more performant)

@HolonProduction
Copy link
Member

We could solve this on the serialization side by adding special handling for the Set class. We already support a special syntax for resources in text serialization:

} else if (id == "Resource" || id == "SubResource" || id == "ExtResource") {

I guess we don't have a mechanic for modules or gdextensions to add such handlers right?

@radiantgurl
Copy link
Contributor Author

In terms of type conversion and coercion. It should be able to swap too and from an array.

Not sure about that. Rust for example can do all its coercion at compile time, that's why it's basically free for cases were it isn't needed. GDScript being gradual typed can't rely on that. So all code or at least all unsafe code would get slower through this even when not using sets.

We use a switch case, it doesn't get slower.

@radiantgurl
Copy link
Contributor Author

radiantgurl commented Aug 1, 2024

I've decided to go forward with implementing typed sets as well, as we cannot make RefCounted currently take container types.

@adamscott
Copy link
Member

Hi there @RadiantUwU !

I added the discussion label because this feature warrants a serious discussion internally about the necessity sets inside Variant. I don't want to prejudge, neither predict the outcome of the PR, but there's a lot of chances that it could be rejected on the basis that Variant is already stuffed to the brim. Adding types to Variant affects heavily performance, as it is used practically everywhere in the engine. So you'll understand why we don't want to "bloat" Variant.

While some of us can offer you reviews and such, it doesn't necessarily mean that this PR would be merged even if everything is alright and up to standards.

You put a lot of work on this PR and we are grateful for it. We'll take time to review it and decide whether or not it is right to merge. We just want you to know the odds of the outcome in advance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement a Set primitive to complement Dictionary and Array
10 participants