Refactor eachregion to be O(n log n) not O(n^2) #73

tecosaur · 2024-08-02T15:16:54Z

Since we removed the ordering restriction on annotations to improve the semantics of annotation modification, each `annotations(str)` call became `O(n)`. That is fine for a one-off, but call it in a loop, as `eachregion` does, and the total cost becomes `O(n m)`. That's pretty underwhelming.

We can improve this to `O(n log n)` by pre-sorting the list of annotations and working with the sorted list instead. A bit more complexity is needed to do this while preserving the semantics, but it can be worth it for long strings. With a 100,000 char string with 20,000 annotations, print time goes from ~0.4s to 0.015s on my machine.
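To illustrate the general idea (not the code in this PR), here is a minimal Python sketch of a pre-sort-then-sweep approach. Everything in it is an assumption made for illustration: the `each_region` name, the `(start, stop, label)` tuple layout, and the 0-based inclusive bounds are all hypothetical. Boundary events are sorted once, then a single pass emits each region along with the annotations covering it, listed in their original application order.

```python
def each_region(length, annotations):
    """Return (start, stop, labels) for each maximal run of positions
    covered by the same set of annotations.

    Hypothetical layout: `annotations` is a list of (start, stop, label)
    tuples with inclusive 0-based bounds inside [0, length), given in
    application order (which need not be sorted by position).
    """
    # Each annotation turns on at `start` and off at `stop + 1`.
    events = []
    for idx, (start, stop, _label) in enumerate(annotations):
        events.append((start, 1, idx))     # 1 = annotation opens
        events.append((stop + 1, 0, idx))  # 0 = annotation closes
    events.sort()  # the one O(n log n) step, replacing repeated O(n) scans

    regions = []
    active = set()  # indices of annotations covering the current position
    pos = 0

    def emit(start, stop):
        # Sorting by index restores the original application order,
        # which matters when later annotations override earlier ones.
        labels = [annotations[i][2] for i in sorted(active)]
        regions.append((start, stop, labels))

    i = 0
    while i < len(events):
        point = events[i][0]
        if pos < point and pos < length:
            emit(pos, min(point, length) - 1)
            pos = point
        # Apply every event at this boundary before moving on, so that
        # no empty region is produced between coinciding boundaries.
        while i < len(events) and events[i][0] == point:
            _, opens, idx = events[i]
            if opens:
                active.add(idx)
            else:
                active.discard(idx)
            i += 1
    if pos < length:
        emit(pos, length - 1)  # trailing region after the last boundary
    return regions


regions = each_region(10, [(0, 4, "bold"), (3, 7, "red")])
for region in regions:
    print(region)
```

Listing the active annotations for each region still costs time proportional to how many overlap there, so the `O(n log n)` bound in this sketch describes the boundary handling rather than the total output size.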

This improvement was prompted by #72.

tecosaur · 2024-08-02T15:22:27Z

Looks like I've got an off-by-one error to fix, but other than that this seems solid.


tecosaur · 2024-08-04T08:59:51Z

This has ended up being a ~40x performance improvement to eachregion 🙂.

tecosaur force-pushed the presort-in-eachregion branch 2 times, most recently from 1044edf to fb40c04 Compare August 3, 2024 18:07

tecosaur force-pushed the presort-in-eachregion branch from fb40c04 to c417262 Compare August 3, 2024 18:11

Explicitly test eachregion

fc686f3

While this is implicitly tested later on, I think it's nice to test for it explicitly. If nothing else, should any potentially buggy modifications be made in the future it will make it easier to pin down the root misbehaviour.

tecosaur merged commit fc686f3 into main Aug 4, 2024
5 checks passed

tecosaur deleted the presort-in-eachregion branch August 4, 2024 08:59
