complete btree rewrite #17334

Gankra · 2014-09-17T04:39:44Z

Replaces BTree with BTreeMap and BTreeSet, which are completely new implementations.
BTreeMap's internal Node representation is particularly inefficient at the moment to
make this first implementation easy to reason about and fairly safe. Both collections
are also currently missing some of the tooling specific to sorted collections, which
is planned as future work pending reform of these APIs. General implementation issues
are discussed with TODOs internally

[breaking-change]

Still waiting on compilation/test/bench stuff locally, but the edit-distance on any errors should be very small at this point. This is ready to be reviewed.

Gankra · 2014-09-17T04:40:16Z

CC @pcwalton @huonw @aturon

Gankra · 2014-09-17T04:42:59Z

I also tossed in quick-n-dirty implementation of my Entry API RFC since that landed while I was working on this. I'm considering some minor revisions to that API with respect to function signatures, so that might get changed. Broad strokes should stay the same, though.

Gankra · 2014-09-17T16:17:12Z

Last commit builds/tests perfectly!

Bench results:

test btree::map::bench::find_rand_100                      ... bench:        70 ns/iter (+/- 5)
test btree::map::bench::find_rand_10_000                   ... bench:       151 ns/iter (+/- 6)
test btree::map::bench::find_seq_100                       ... bench:        82 ns/iter (+/- 1)
test btree::map::bench::find_seq_10_000                    ... bench:       103 ns/iter (+/- 1)
test btree::map::bench::insert_rand_100                    ... bench:       183 ns/iter (+/- 2)
test btree::map::bench::insert_rand_10_000                 ... bench:       435 ns/iter (+/- 13)
test btree::map::bench::insert_seq_100                     ... bench:       331 ns/iter (+/- 10)
test btree::map::bench::insert_seq_10_000                  ... bench:       391 ns/iter (+/- 14)

test treemap::bench::find_rand_100                         ... bench:        74 ns/iter (+/- 4)
test treemap::bench::find_rand_10_000                      ... bench:       164 ns/iter (+/- 7)
test treemap::bench::find_seq_100                          ... bench:        76 ns/iter (+/- 2)
test treemap::bench::find_seq_10_000                       ... bench:       115 ns/iter (+/- 1)
test treemap::bench::insert_rand_100                       ... bench:       109 ns/iter (+/- 3)
test treemap::bench::insert_rand_10_000                    ... bench:       991 ns/iter (+/- 19)
test treemap::bench::insert_seq_100                        ... bench:       491 ns/iter (+/- 21)
test treemap::bench::insert_seq_10_000                     ... bench:       829 ns/iter (+/- 24)

Better than TreeMap in almost every category, with the inefficient representation! \o/

gereeter · 2014-09-18T01:34:22Z

src/libcollections/btree/node.rs

+/// Takes a Vec, and splits the last `amount` elements into a new one
+fn split<T>(left: &mut Vec<T>, amount: uint) -> Vec<T> {
+    // FIXME(Gankro): gross gross gross gross
+    // Vec should probably have a method for this


Have you looked at the code produced for this method? Because of the manual copy and double reverse, I wouldn't be surprised if this slowed down inserts measurably.

Well, it's a FIXME for a reason. I'm not sure there's anything I can really do, short of adding a method to Vec, no?

@gereeter I found a working solution using unsafe code. Yay!

Deprecates the `find_or_*` family of "internal mutation" methods on `HashMap` in favour of the "external mutation" Entry API as part of RFC 60. Part of rust-lang#17320, although this still needs to be done on `TreeMap` and `BTree`. Work on `TreeMap` is TBD, and `BTree`'s work is part of the complete rewrite in rust-lang#17334. The implemented API deviates from the API described in the RFC in two key places: * `VacantEntry.set` yields a mutable reference to the inserted element to avoid code duplication where complex logic needs to be done *regardless* of whether the entry was vacant or not. * `OccupiedEntry.into_mut` was added so that it is possible to return a reference into the map beyond the lifetime of the Entry itself, providing functional parity to `VacantEntry.set`. [breaking-change]

gereeter · 2014-09-19T00:43:59Z

src/libcollections/btree/map.rs

+    pub fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V> {
+        // same basic logic of `swap` and `pop`
+        let mut visit_stack = Vec::with_capacity(self.depth);
+        let mut found = false;


Is there any reason for using an explicit found flag instead of just returning where you currently break?

Yes, a node (and therefore self) is mutably borrowed inside the loop. You have to break out to then borrow self into the returned structs. Although maybe there's a better way I don't see?

huonw · 2014-09-21T00:28:30Z

src/libcollections/btree/node.rs

+    // (ptr, cap, size). This could also have cache benefits for very small nodes, as keys
+    // could bleed into edges and vals.
+    //
+    // However doing this would require *a lot* of code to reimplement all


(Nitpick: not that much, since these are presumably fixed sized and don't need all the dynamic modification/helper functions.)

huonw · 2014-09-25T10:01:23Z

src/libcollections/btree/map.rs

+            let mut stack = self.stack;
+
+            // Remove the key-value pair from the leaf, check if the node is underfull, and then
+            // promptly forget the leaf and ptr to avoid ownership issues


What is the 'leaf and ptr' talking about here exactly?

Gankra · 2014-09-25T12:20:22Z

@huonw I believe I've addressed your latest batch of issues (in "more fixups"). Rebased, and am currently building locally to verify everything is a-okay.

Gankra · 2014-09-25T21:53:03Z

@huonw Ah, I didn't notice one of your comment batches until now. Addressed.

aturon · 2014-09-25T21:59:50Z

As far as adding Deref/DerefMut, we'd probably want to remove get and get_mut (though this makes it slightly harder to figure out what's going on with the API at a glance).

I believe that @pcwalton is currently working on, or has submitted a PR for, fixing up DerefMut.

I suggest we transition this API toward deref after that lands. Thoughts?

huonw · 2014-09-25T22:01:02Z

Fine by me.

Replaces BTree with BTreeMap and BTreeSet, which are completely new implementations. BTreeMap's internal Node representation is particularly inefficient at the moment to make this first implementation easy to reason about and fairly safe. Both collections are also currently missing some of the tooling specific to sorted collections, which is planned as future work pending reform of these APIs. General implementation issues are discussed with TODOs internally Perf results on x86_64 Linux: test treemap::bench::find_rand_100 ... bench: 76 ns/iter (+/- 4) test treemap::bench::find_rand_10_000 ... bench: 163 ns/iter (+/- 6) test treemap::bench::find_seq_100 ... bench: 77 ns/iter (+/- 3) test treemap::bench::find_seq_10_000 ... bench: 115 ns/iter (+/- 1) test treemap::bench::insert_rand_100 ... bench: 111 ns/iter (+/- 1) test treemap::bench::insert_rand_10_000 ... bench: 996 ns/iter (+/- 18) test treemap::bench::insert_seq_100 ... bench: 486 ns/iter (+/- 20) test treemap::bench::insert_seq_10_000 ... bench: 800 ns/iter (+/- 15) test btree::map::bench::find_rand_100 ... bench: 74 ns/iter (+/- 4) test btree::map::bench::find_rand_10_000 ... bench: 153 ns/iter (+/- 5) test btree::map::bench::find_seq_100 ... bench: 82 ns/iter (+/- 1) test btree::map::bench::find_seq_10_000 ... bench: 108 ns/iter (+/- 0) test btree::map::bench::insert_rand_100 ... bench: 220 ns/iter (+/- 1) test btree::map::bench::insert_rand_10_000 ... bench: 620 ns/iter (+/- 16) test btree::map::bench::insert_seq_100 ... bench: 411 ns/iter (+/- 12) test btree::map::bench::insert_seq_10_000 ... bench: 534 ns/iter (+/- 14) BTreeMap still has a lot of room for optimization, but it's already beating out TreeMap on most access patterns. [breaking-change]

Gankra · 2014-09-27T14:27:00Z

Eugh, didn't know that stuff wasn't picked up by make check-collections. Fixed.

Replaces BTree with BTreeMap and BTreeSet, which are completely new implementations. BTreeMap's internal Node representation is particularly inefficient at the moment to make this first implementation easy to reason about and fairly safe. Both collections are also currently missing some of the tooling specific to sorted collections, which is planned as future work pending reform of these APIs. General implementation issues are discussed with TODOs internally [breaking-change] Still waiting on compilation/test/bench stuff locally, but the edit-distance on any errors should be very small at this point. This is ready to be reviewed.

Gankra force-pushed the btree-vec branch from 86fb503 to 0823a16 Compare September 17, 2014 21:07

gereeter reviewed Sep 18, 2014
View reviewed changes

Gankra mentioned this pull request Sep 18, 2014

implement entry API for HashMap #17378

Merged

gereeter reviewed Sep 19, 2014
View reviewed changes

huonw reviewed Sep 21, 2014
View reviewed changes

huonw reviewed Sep 25, 2014
View reviewed changes

Gankra force-pushed the btree-vec branch from 45b7aee to dbca4a0 Compare September 25, 2014 12:18

Gankra force-pushed the btree-vec branch from dbca4a0 to 0b4ba2d Compare September 25, 2014 15:03

vks mentioned this pull request Sep 25, 2014

Pick over the standard libraries for types that don't (but can) implement common traits #15294

Closed

Gankra force-pushed the btree-vec branch 3 times, most recently from b576db6 to e1957ae Compare September 27, 2014 04:44

Gankra force-pushed the btree-vec branch from e1957ae to b6edc59 Compare September 27, 2014 14:26

bors closed this Sep 27, 2014

huonw mentioned this pull request Sep 28, 2014

RFC: Collections reform rust-lang/rfcs#235

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

complete btree rewrite #17334

complete btree rewrite #17334

Gankra commented Sep 17, 2014

Gankra commented Sep 17, 2014

Gankra commented Sep 17, 2014

Gankra commented Sep 17, 2014

gereeter Sep 18, 2014

Gankra Sep 18, 2014

Gankra Sep 18, 2014

gereeter Sep 19, 2014

Gankra Sep 19, 2014

huonw Sep 21, 2014

huonw Sep 25, 2014

Gankra commented Sep 25, 2014

Gankra commented Sep 25, 2014

aturon commented Sep 25, 2014

huonw commented Sep 25, 2014

Gankra commented Sep 27, 2014

complete btree rewrite #17334

complete btree rewrite #17334

Conversation

Gankra commented Sep 17, 2014

Gankra commented Sep 17, 2014

Gankra commented Sep 17, 2014

Gankra commented Sep 17, 2014

gereeter Sep 18, 2014

Choose a reason for hiding this comment

Gankra Sep 18, 2014

Choose a reason for hiding this comment

Gankra Sep 18, 2014

Choose a reason for hiding this comment

gereeter Sep 19, 2014

Choose a reason for hiding this comment

Gankra Sep 19, 2014

Choose a reason for hiding this comment

huonw Sep 21, 2014

Choose a reason for hiding this comment

huonw Sep 25, 2014

Choose a reason for hiding this comment

Gankra commented Sep 25, 2014

Gankra commented Sep 25, 2014

aturon commented Sep 25, 2014

huonw commented Sep 25, 2014

Gankra commented Sep 27, 2014