sort handles Incomparable types by default #5742
Comments
Wanted behavior:
Do we specify what happens for Vectors like Implicitly, I think the idea is to group by type? So if that's the solution, how do we determine the relative order between types?
I'd suggest a new parameter to
The behavior is clear when The only idea I have is to put the bigger one first and the smaller one second. The size of a group can be either the size of the set or the size with duplicates. Each may have some nice parameters.
I don't think we can do it in general - unless you want to do it by comparing their fully qualified names. We can treat a few fixed types specially and hardcode their order.
You mean the warning should be attached to the vector, not to the incomparable value, right? Attaching it to the incomparable value seems not useful, as it would be hard for the user to spot that there is some warning attached, whereas with the warning on the vector it would/should be visible straight away.
I think sorting of a vector with mixed types makes sense as long as the types are "primitive", i.e., numbers, text, nothing - and for them, we can determine that, e.g., numbers come before text or vice versa. If there is a mixture of other types, trying to sort it should IMHO fail anyway. Users should provide their own comparator functions for such a non-standard collection of values.
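To make the "primitive types only" idea concrete, here is a minimal sketch in Python (not Enso). The ranking (numbers before text, text before nothing) and the helper name `sort_primitives` are assumptions made only for illustration; the thread does not settle on an order, and NaN is deliberately left out of this sketch.

```python
def sort_primitives(values):
    """Sort a mix of numbers, text and None; reject every other type."""

    def key(value):
        # Hypothetical fixed ranking between types: numbers < text < nothing.
        if value is None:
            return (2, 0)
        if isinstance(value, (int, float)):
            return (0, value)
        if isinstance(value, str):
            return (1, value)
        # Anything else has no default order; the caller must supply a comparator.
        raise TypeError(f"no default order for {type(value).__name__}")

    # Tuples compare by rank first, so values of different types are never
    # compared with each other directly.
    return sorted(values, key=key)


mixed = ["b", 2, None, 1.5, "a"]
print(sort_primitives(mixed))
```

A vector containing any other kind of value raises `TypeError` here, which mirrors the suggestion that such collections should fail unless the user provides a comparator.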
According to my previous note, the only incomparable types that we would sort by default would be just |
Another interesting topic from a discussion with @radeusgd:
Clearly such sets can only have "partial order". If you have type Set with a comparator based on set's elements, then all objects of type Set will be grouped together (as they share the same comparator) and they shall end up in some topological order. Of course that would require |
With the above in mind, we also need to be aware that the user may define an ordering which may not necessarily be symmetric, reflexive or transitive. In these cases, we cannot really do much - probably should keep the order of elements as-is, maybe report some warning. But ideally we shouldn't for example enter an infinite loop if we get an example where |
@Akirathan, such attitude leads towards the "Rust approach":
The goal of Enso
You are right @radeusgd, a warning must be attached whenever the
Right, we shall know what to do when the ordering constraints decay. Here is a classification of the various cases and a proposal for how to handle each situation.

Linear Ordering
If all values are subject to linear order we just sort them. They all must have the same

Default Comparator
Many of the builtin Enso types have the same "default comparator". That is beneficial for performance and allows the engine to perform various optimizations. However, from a logical perspective, different kinds of builtins are supposed to be treated as having a "different comparator". It is up to the engine

Different Comparators
Values that don't have the same

Partial Order
The current design of comparators tries hard to avoid "partial ordering" (inside of a single I don't like

Topological Sort
Objects with partial ordering can still be sorted by topological sort. We will optimistically speculate on linear order of all values, but as soon as we find out it is not there, we resort to topological sorting and attach warnings to notify the user. Topological sort may produce different results, but every result it produces respects all the specified

Violated Transitivity or Anti-Symmetry
Custom Even such a situation has a reasonable solution (and I implemented it once in the past) - just detect each cycle and collapse the cycle into a single element. Perform a topological sort on the simplified graph (it succeeds as all cycles have been eliminated) and then expand the collapsed elements back. The elements that violated anti-symmetry are in random order, but all the other elements are properly sorted. Typical Enso users shall welcome such a result, especially with a helpful warning describing which elements couldn't be sorted because they formed a cycle.

Violating Reflexivity
It is not possible in Enso for

Violating Stability
What if two subsequent calls to

Benefits of Trying to Sort Hard
There are reasonable data structures that cannot have total order, just partial one. @radeusgd mentioned:
Having a flexible |
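The cycle-collapsing approach from the "Violated Transitivity or Anti-Symmetry" case above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `sort_with_cycles` and the `less(a, b)` predicate are hypothetical, every pair is compared (quadratic cost), and Tarjan's strongly connected components algorithm is used as one possible way to "detect each cycle and collapse it"; the write-up does not say which algorithm the original implementation used.

```python
def sort_with_cycles(items, less):
    """Order items so that a precedes b whenever less(a, b) holds,
    except inside cycles, which are kept together and reported."""
    n = len(items)
    # Edge i -> j means items[i] must come before items[j].
    succ = [[j for j in range(n) if j != i and less(items[i], items[j])]
            for i in range(n)]

    index, low = {}, {}          # Tarjan bookkeeping per node
    stack, on_stack = [], set()
    components = []              # SCCs, emitted in reverse topological order

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the collapsed cycle.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in range(n):
        if v not in index:
            visit(v)

    ordered, cycles = [], []
    # Reversing Tarjan's output yields a topological order of the
    # collapsed graph; each component is then expanded back in place.
    for component in reversed(components):
        members = [items[i] for i in component]
        ordered.extend(members)
        if len(component) > 1:
            cycles.append(members)   # material for a warning
    return ordered, cycles


# 1 < 2 < 3 < 1 forms a cycle; 0 precedes it and 4 follows it.
pairs = {(1, 2), (2, 3), (3, 1), (0, 1), (3, 4)}
ordered, cycles = sort_with_cycles([4, 3, 2, 1, 0], lambda a, b: (a, b) in pairs)
print(ordered, cycles)
```

In the usage example the elements 3, 2 and 1 are reported as one cycle and keep their input order relative to each other, while 0 is placed before them and 4 after them. The recursive `visit` is fine for a sketch but would need an explicit stack for large inputs.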
Thanks @JaroslavTulach for this amazing writeup! The new sorting design looks really really great IMO |
Pavel Marek reports a new STANDUP for today (2023-03-14): Progress: Analyzing the impact of changes. Thinking about how to introduce an optimized, default version of Vector.sort / Array.sort without changing the signature of Vector.sort. Thinking about generalization of "sorting incomparable values". It should be finished by 2023-03-21. |
Pavel Marek reports a new STANDUP for today (2023-03-15): Progress: Developing a new, optimized sort node for the most common case when all the elements of a vector have the default comparator, and |
After discussion with Pavel I've modified the specification to include:
Pavel Marek reports a new STANDUP for yesterday (2023-03-16): Progress: Implemented the optimized Vector.sort_builtin for the most common case - when the vector contains only primitive values with the default comparator. It should be finished by 2023-03-21. |
Pavel Marek reports a new STANDUP for today (2023-03-17): Progress: Published a PR, basic tests for the primitive values work, even for vectors with elements with different comparators. It should be finished by 2023-03-21. |
Pavel Marek reports a new STANDUP for today (2023-03-20): Progress: Dealing with sorting of values with different comparators. The groups of sorted values with different comparators will be merged together by the order of FQN of comparators. Writing more tests. It should be finished by 2023-03-21. |
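The merging strategy mentioned in this standup (sort each comparator group on its own, then concatenate the groups in the order of the comparators' fully qualified names) can be sketched in Python. The helper `comparator_fqn` is a hypothetical stand-in that uses the Python type's qualified name; in Enso the name would come from the value's comparator, and integers and decimals would likely share one default comparator rather than form two groups as they do here.

```python
def comparator_fqn(value):
    # Hypothetical stand-in for "fully qualified name of the comparator".
    cls = type(value)
    return f"{cls.__module__}.{cls.__qualname__}"


def sort_by_comparator_groups(values):
    """Sort each comparator group independently, then merge the groups
    in the lexicographic order of their comparator names."""
    groups = {}
    for value in values:
        groups.setdefault(comparator_fqn(value), []).append(value)

    result = []
    for fqn in sorted(groups):
        result.extend(sorted(groups[fqn]))

    warnings = []
    if len(groups) > 1:
        # More than one comparator was involved, so the overall order
        # is only a convention; report it instead of failing.
        warnings.append(f"values with different comparators: {sorted(groups)}")
    return result, warnings


result, warnings = sort_by_comparator_groups([3, "b", 1, "a", 2.5])
print(result)
print(warnings)
```

Ordering groups by name is arbitrary but deterministic, which is the property the standup note relies on.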
Pavel Marek reports a new 🔴 DELAY for today (2023-03-21): Summary: There is a 3-day delay in the implementation of the "sort handles Incomparable types by default" (#5742) task. Delay Cause: Blowout of complex Enso code handling, trying to merge all the sort functionality into a single builtin node. Need more time for that.
Pavel Marek reports a new STANDUP for today (2023-03-21): Progress: Bumped into a complex failure case when given a vector of custom types with custom comparators, that either return Nothing or fail with Incomparable_Values - difficult to handle that in Enso - blowout of Enso code. Trying to merge the sort functionality into a single builtin. For that, it is theoretically needed to only precompute some values, and everything should be handled by the builtin. Also the performance should be better. It should be finished by 2023-03-24. |
Pavel Marek reports a new STANDUP for today (2023-03-22): Progress: Consolidating all the sorting functionality into a single builtin node. So far, have not bumped into major issues. It should be finished by 2023-03-24. |
Pavel Marek reports a new STANDUP for yesterday (2023-03-23): Progress: Discussed other performance issue with Dmitry. Bumped into an issue with warnings not being stripped from |
Pavel Marek reports a new STANDUP for today (2023-03-24): Progress: Gave up the issue with proper warnings handling, and created a separate issue for that - #6070. Got |
Pavel Marek reports a new 🔴 DELAY for today (2023-03-24): Summary: There is a 7-day delay in the implementation of the "sort handles Incomparable types by default" (#5742) task. Got slowed down a bit by the issue with warnings on the vector not being stripped away, but rather duplicated. Moreover, the sorting issue turned out to be more complex than anticipated. Right now, I at least have all my created tests passing, but I still need to deal with other tests, and probably also run benchmarks. Delay Cause: I expect that I will need roughly another week to finish this task.
Pavel Marek reports a new STANDUP for today (2023-03-27): Progress: Continuing with fixing some corner cases. Handle incorrect |
Pavel Marek reports a new STANDUP for today (2023-03-28): Progress: Struggling with the BC 7, will probably finish it quite late and postpone it for yet another week. Implementing direct call of |
Pavel Marek reports a new 🔴 DELAY for yesterday (2023-04-03): Summary: There is a 7-day delay in the implementation of the "sort handles Incomparable types by default" (#5742) task. Delay Cause: I had a longer vacation than I expected, catching up...
Pavel Marek reports a new STANDUP for yesterday (2023-04-03): Progress: Fixing some tests, catching up after vacation. It should be finished by 2023-04-07. |
Pavel Marek reports a new STANDUP for today (2023-04-05): Progress: Fixing more tests, fixing how |
Pavel Marek reports a new STANDUP for today (2023-04-06): Progress: Removing Array.sort_builtin - it is now replaced with |
Pavel Marek reports a new STANDUP for today (2023-04-07): Progress: Found out that the OutOfMemoryError is caused by too frequent reassignment of a single warning attached to the sorted vector. Every InvokeMethodNode creates a new Warning object with a new reassignment location. This is a potential bug in how we process warnings. Next week, I will try to reproduce this and create a separate issue for that. It should be finished by 2023-04-07. |
What could be related to this issue is that our warnings get duplicated easily. I was advocating to do something about it a while ago, but I think after all no issue was created, maybe it is time. Here is a related discussion: https://discord.com/channels/401396655599124480/951879392743796757
Pavel Marek reports a new 🔴 DELAY for yesterday (2023-04-12): Summary: There is a 10-day delay in the implementation of the "sort handles Incomparable types by default" (#5742) task. All tests seem to be passing now, except for two that can be fixed immediately. Need some time for review, and possibly benchmark comparison. I expect this to be the very last delay for this issue. Need to merge it ASAP. Delay Cause: Problems with OOM, warnings, need to introduce a new parameter, some problems after merge with develop - builtin methods cannot be called as static methods, only instance methods.
Pavel Marek reports a new STANDUP for yesterday (2023-04-12): Progress: Dealing with new |
Pavel Marek reports a new STANDUP for today (2023-04-13): Progress: Almost all tests seem to be passing now. Bumped into an issue with builtin methods not being callable as static methods. Some last cleanups before final reviews. It should be finished by 2023-04-17.
Pavel Marek reports a new STANDUP for today (2023-04-14): Progress: Integrated all review suggestions. Fixed all the tests. Scheduled benchmarks. Ran some benchmarks locally that suggest that
Pavel Marek reports a new STANDUP for today (2023-04-15): Progress: Compared benchmarks, seems very decent, but had to revert one commit that caused some performance issues. Integrated all the other reviews. Going to merge ASAP, hopefully by tomorrow. It should be finished by 2023-04-17.
Problem Definition
The following snippet should not fail with an Incomparable_Values error:
The problem is that Nothing and Number.nan are incomparable, i.e., they return Nothing from their compare method, whereas integers and decimals are comparable.

User Wanted Behavior
Nothing: Nothing ends up together at one end. No warning.
NaN: NaN ends up next to floats above Nothing. Attach a warning.
Comparator.from) found: Parcel into groups, sort independently and attach a warning. Bigger parcel first, smaller then.

Warn, Fail or Nothing
Introduce a new parameter to sort: (on_incomparable = Warn) : Fail | Warn | Accept.
Re. "...ends up at one end..." - (optionally, in @JaroslavTulach's opinion) introduce a new argument in the sort method that specifies how these incomparable values should be handled - whether they should be sorted at the very beginning of the vector, or at the very end.

Technical Specification
The comment below describes the desired behavior. A few notes:
Nothing has a special treatment - and ends up at one end
compare: Any -> Any -> (Ordering | Nothing)
NaN compares to Nothing with everything - it'd be great to not treat it as a special case, but as a regular one - if possible
Nothing - leave that at its position, sort only the other elements(!?).

Related
Comparators were introduced in #4067, see Ordering.enso for docs.
Related:
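As an illustration of the wanted behavior for Nothing and NaN together with the proposed on_incomparable parameter, here is a rough Python sketch. It is not the Enso implementation: the function name `sort_with_incomparables`, the string values "fail", "warn" and "accept" (standing in for Fail | Warn | Accept), and the choice to put NaN after the numbers and None (standing in for Nothing) at the very end are all assumptions made for the example.

```python
import math


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def sort_with_incomparables(values, on_incomparable="warn"):
    """Sort numbers; keep NaN after them and None at the very end."""
    numbers = [v for v in values if v is not None and not _is_nan(v)]
    nans = [v for v in values if _is_nan(v)]
    nothings = [v for v in values if v is None]

    warnings = []
    if nans:
        if on_incomparable == "fail":
            raise ValueError("incomparable values encountered: NaN")
        if on_incomparable == "warn":
            warnings.append(f"{len(nans)} NaN value(s) could not be compared")
        # "accept": keep the result and stay silent.

    # None is grouped at one end without any warning, as requested above.
    return sorted(numbers) + nans + nothings, warnings


result, warnings = sort_with_incomparables([3, None, float("nan"), 1.5, 2])
print(result)
print(warnings)
```

Whether the incomparable values go to the beginning or the end could be a further argument, as suggested above; the sketch hardcodes the end to stay short.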