-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std.sort: add pdqsort and heapsort #15412
Conversation
235681e
to
0a96d78
Compare
6387526
to
e5eff13
Compare
Do you have some benchmarks of this Zig implementation vs block sort? |
87c8eaf
to
d9b1415
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this, @alichraghi.
Can you justify why you changed the fundamental comparison function from lessThan()bool
to order()Order
? I picked lessThan()
because it requires only one comparison every time, while order()
requires 1-3, even if the callsite only cares about whether the objects are already ordered. order()
can also be written in terms of lessThan()
. lessThan()
is a simpler operation.
Let us make sure that when this lands, #8289 is solved for good. At the very least there should be an assert for doing it wrong that leads directly to the solution, or the API should make this mistake impossible.
It also looks like this introduces recursion? So now user input can potentially cause stack overflow in the sort function. I see that it uses log(N) stack memory though, which is nice. Can we guarantee some kind of upper bound on stack memory usage, which will later be used to annotate the recursion (#1639)?
Can you please take this opportunity to fix
/// TODO currently this just calls `insertionSortContext`. The block sort implementation
/// in this file needs to be adapted to use the sort context.
pub fn sortContext(len: usize, context: anytype) void {
return insertionSortContext(len, context);
}
Regarding namespaces, let's also try to avoid std.sort.sort()
. How about std.mem.sort
for the default sort, and std.sort
can be used for things like std.sort.quick
, std.sort.heap
, std.sort.insertion
, and things like this. I'm serious about not having redundant names in fully qualified namespaces.
Also, the default sorting algorithm should be stable.
faf0265
to
fc0f23c
Compare
EDIT: turns out pdq can be easily adapted into sortContext. pdq requires equality comparision (discussion) so i went with gantt
title Sorting (ascending) 10000000 usize
dateFormat x
axisFormat %S s
section random
pdq_order 2.336: 0,2337
pdq_lessThan 2.270: 0,2270
section sorted
pdq_order 0.068: 0,68
pdq_lessThan 0.026: 0,26
section reverse
pdq_order 0.509: 0,509
pdq_lessThan 0.390: 0,390
section ascending saw
pdq_order 1.692: 0,1692
pdq_lessThan 1.578: 0,1578
section descending saw
pdq_order 1.694: 0,1694
pdq_lessThan 1.549: 0,1549
the other way is accepting pub fn pdqSort(
comptime T: type,
items: []T,
context: anytype,
comptime cmp: fn (context: @TypeOf(context), lhs: T, rhs: T) bool,
) void { can be expressed into pub fn pdqSort(
comptime T: type,
items: []T,
context: anytype,
) void { but the compiler crashed and after hours i wasn't able to figure out why :(
Creating values for all types to just check this looks very cursed. i will look at the algorithm later
it's impossible to be adapted into blockSort because of
done |
c2e5978
to
db8676e
Compare
all algorithms except block respects the context variant and an upper bound is now guaranteed on stack memory usage in pdqsort. |
Nice work. The other issues on my wishlist can be tackled separately. |
Closes #15388
pdqsort (Pattern-defeating quicksort):
A novel sorting algorithm that combines the fast average case of randomized quicksort
with the fast worst case of heapsort, while achieving linear time on inputs with certain patterns.
This implementation is based on https://github.com/zhangyunhao116/pdqsort
which later replaced Go's original sorting algorithm
also:
cc @matklad, @voroskoi