Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor eachregion to be O(n log n) not O(n^2) #73

Merged
merged 2 commits into from
Aug 4, 2024

Conversation

tecosaur
Copy link
Collaborator

@tecosaur tecosaur commented Aug 2, 2024

Since we removed the ordering restriction on annotations to improve the semantics of annotation modification, each annotations(str) call became O(n) which is fine for a once off, but use it in a loop as eachregion does and now it's O(n m). That's pretty underwhelming.

We can improve this to O(n log n) by pre-sorting the list of annotations, and working with it instead. A bit more complexity is needed to do this while preserving the semantics, but it can be worth it for long strings. With a 100,000 char string with 20,000 annotations, print time goes from ~0.4s to 0.015s on my machine.


This improvement has been prompted by #72.

@tecosaur
Copy link
Collaborator Author

tecosaur commented Aug 2, 2024

Looks like I've got an off-by-one error to fix, but other than that this seems solid.

@tecosaur tecosaur force-pushed the presort-in-eachregion branch 2 times, most recently from 1044edf to fb40c04 Compare August 3, 2024 18:07
Since we removed the ordering restriction on annotations to improve the
semantics of annotation modification, each `annotations(str)` call
became `O(n)` which is fine for a once off, but use it in a loop as
`eachregion` does and now it's `O(n m)`. That's pretty underwhelming.

We can improve this to `O(n log n)` by pre-sorting the list of
annotations, and working with it instead. A bit more complexity is
needed to do this while preserving the semantics, but it can be worth it
for long strings. With a 100,000 char string with 20,000 annotations,
print time goes from ~0.4s to ~0.01s on my machine.
@tecosaur tecosaur force-pushed the presort-in-eachregion branch from fb40c04 to c417262 Compare August 3, 2024 18:11
While this is implicitly tested later on, I think it's nice to test for
it explicitly. If nothing else, should any potentially buggy
modifications be made in the future it will make it easier to pin down
the root misbehaviour.
@tecosaur tecosaur merged commit fc686f3 into main Aug 4, 2024
5 checks passed
@tecosaur tecosaur deleted the presort-in-eachregion branch August 4, 2024 08:59
@tecosaur
Copy link
Collaborator Author

tecosaur commented Aug 4, 2024

This has ended up being a ~40x performance improvement to eachregion 🙂.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant