Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split version ranges into their own crate #262

Merged
merged 9 commits into from
Oct 29, 2024
Merged

Conversation

konstin
Copy link
Member

@konstin konstin commented Oct 12, 2024

In our implementation of PEP 440 and PEP 508 (Python standards for version selectors and requirements and their markers, which again use versions), we want to use PubGrub's advanced Range type for its fast operations, but we don't want to depend on the whole pubgrub crate in crates that don't do resolution. Hence, I'm splitting out the version ranges into their own crate.

My plan is to merge this PR, publish the crate as 0.1.0, adopt it in our use case (pep440-rs, pep508-rs, uv, downstream users of those crates), and if no blockers show up, publish a version-range[s] 1.0.0 soon.

Range is renamed to Ranges for clarity. To avoid breaking users instantly, we report Ranges as Range, too.

This PR also switches to SmallVec<[Interval<V>; 1]> for better performance over the custom smallvec or other size.

$ taskset -c 0-1 hyperfine --warmup 1 --runs 3 "./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"

Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.774 s ±  0.007 s    [User: 3.970 s, System: 0.337 s]
  Range (min … max):    3.769 s …  3.782 s    3 runs
 
Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.823 s ±  0.004 s    [User: 4.008 s, System: 0.345 s]
  Range (min … max):    3.821 s …  3.828 s    3 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
    1.01 ± 0.00 times faster than ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

The proptest feature grew out of necessity for the pubgrub tests, but i think it's a nice feature actually.

Another wishlist feature is to have a different Display for uv, but that didn't fit in here yet.

@konstin konstin requested review from mpizenberg and Eh2406 October 12, 2024 21:02
@konstin konstin force-pushed the konsti/split-out-ranges branch from 96384b2 to 8fd898b Compare October 12, 2024 23:01
@mpizenberg
Copy link
Member

Splitting ranges sounds like a good idea to me. Also agree that the Range name is not ideal. Ranges is slightly better. Segments has the same vibe but I prefer Ranges over segments.

@Eh2406
Copy link
Member

Eh2406 commented Oct 14, 2024

This is a really good idea.
If adopting this new crate is not a braking change, we could even release a version of 0.2 that uses it solving #258. Also the resolvo folks expressed interest in sharing this dependency: Https://github.com/mamba-org/resolvo/issues/2 .

Copy link
Member

@Eh2406 Eh2406 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good, after we bike shed naming.

[dependencies]
proptest = { version = "1.5.0", optional = true }
serde = { version = "1.0.210", features = ["derive"], optional = true }
smallvec = { version = "1.13.2" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes us to using the smallvec crate, instead of the one we maintain. This might be reasonable decision, it hasn't had a CVE in a while, but we should call it out and make sure were in agreement.

Copy link
Member

@mpizenberg mpizenberg Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually minimizing unsafe usage is something I valued! It would be unfortunate to regress on that aspect. I think I’d even prefer that we published our smallvec impl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer removing our custom implementation and depending on smallvec, one of the most used packages on crates.io. I expect most pubgrub users to already depend on smallvec, uv and cargo do through multiple packages.

While I see the point about unsafe and publishing a safe competitor would be nice, smallvec gives us flexibility and definitive performance in a well vetted library.

Copy link
Member

@mpizenberg mpizenberg Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s not like it’s a huge thing. It’s purpose built 200 lines. And as I recall last time, I think it was even better performing that the smallvec crate for our very specific use case. It’s not about publishing a safe competitor. I only suggested publishing it to be able to reuse it in both the range crate and in the pubgrub crate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my benchmark, smallvec is about 50ms faster than our custom solution. In this intentionally pubgrub-heavy benchmark, I also tried smallvec with size 4 for comparison (uv-branch-4).

$ taskset -c 0-1 hyperfine --warmup 1 --runs 40 "./uv-branch pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"  "./uv-branch-4 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"
Benchmark 1: ./uv-branch pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
Time (mean ± σ):      3.842 s ±  0.006 s    [User: 4.016 s, System: 0.369 s]
Range (min … max):    3.833 s …  3.854 s    40 runs

Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
Time (mean ± σ):      3.900 s ±  0.007 s    [User: 4.079 s, System: 0.365 s]
Range (min … max):    3.889 s …  3.916 s    40 runs

Benchmark 3: ./uv-branch-4 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
Time (mean ± σ):      3.868 s ±  0.010 s    [User: 4.046 s, System: 0.372 s]
Range (min … max):    3.852 s …  3.900 s    40 runs

Summary
./uv-branch pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
1.01 ± 0.00 times faster than ./uv-branch-4 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
1.02 ± 0.00 times faster than ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

Reducing the smallvec to a single free segment with SmallVec<[Interval<V>; 1]> improves performance even further.

$ taskset -c 0-1 hyperfine --warmup 1 --runs 3 "./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"

Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.774 s ±  0.007 s    [User: 3.970 s, System: 0.337 s]
  Range (min … max):    3.769 s …  3.782 s    3 runs
 
Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.823 s ±  0.004 s    [User: 4.008 s, System: 0.345 s]
  Range (min … max):    3.821 s …  3.828 s    3 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
    1.01 ± 0.00 times faster than ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

Smaller benchmarks are too noisy unfortunately.

$ taskset -c 0-1 hyperfine --warmup 1 --runs 200 "./uv-branch-1 pip compile scripts/requirements/jupyter.in --offline" "./uv-main pip compile scripts/requirements/jupyter.in --offline"
Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/jupyter.in --offline
  Time (mean ± σ):      21.3 ms ±   0.5 ms    [User: 14.9 ms, System: 15.9 ms]
  Range (min … max):    20.2 ms …  23.8 ms    200 runs
 
Benchmark 2: ./uv-main pip compile scripts/requirements/jupyter.in --offline
  Time (mean ± σ):      21.2 ms ±   0.4 ms    [User: 14.7 ms, System: 16.0 ms]
  Range (min … max):    20.1 ms …  22.5 ms    200 runs
 
Summary
  ./uv-main pip compile scripts/requirements/jupyter.in --offline ran
    1.00 ± 0.03 times faster than ./uv-branch-1 pip compile scripts/requirements/jupyter.in --offline
$ taskset -c 0-1 hyperfine --warmup 1 --runs 30 "./uv-branch-1 pip compile scripts/requirements/airflow.in --offline" "./uv-main pip compile scripts/requirements/airflow.in --offline"
Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/airflow.in --offline
  Time (mean ± σ):     263.1 ms ±   4.9 ms    [User: 269.6 ms, System: 130.9 ms]
  Range (min … max):   256.9 ms … 273.3 ms    30 runs
 
Benchmark 2: ./uv-main pip compile scripts/requirements/airflow.in --offline
  Time (mean ± σ):     266.2 ms ±   6.2 ms    [User: 272.5 ms, System: 132.3 ms]
  Range (min … max):   256.7 ms … 285.6 ms    30 runs
 
Summary
  ./uv-branch-1 pip compile scripts/requirements/airflow.in --offline ran
    1.01 ± 0.03 times faster than ./uv-main pip compile scripts/requirements/airflow.in --offline

Treating smallvec as a worse vec with SmallVec<[Interval<V>; 0]> regresses:

$ taskset -c 0-1 hyperfine --warmup 1 --runs 3 "./uv-branch-0 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"
Benchmark 1: ./uv-branch-0 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.870 s ±  0.001 s    [User: 4.085 s, System: 0.393 s]
  Range (min … max):    3.868 s …  3.871 s    3 runs
 
Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.826 s ±  0.010 s    [User: 4.006 s, System: 0.349 s]
  Range (min … max):    3.819 s …  3.837 s    3 runs
 
Summary
  ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
    1.01 ± 0.00 times faster than ./uv-branch-0 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

The problem seems to be that the ranges type is too large:

Base types:

  • uv's Version: 8 bytes (Arc)
  • Bound<Version>: 16 bytes
  • Interval<Version>: 32 bytes

Ranges:

  • Range<Version> main: 64 bytes
  • Range<Version> smallvec 2 entries: 72 bytes
  • Range<Version> smallvec 1 entry: 40 bytes
  • Range<Version> smallvec 0 entries: 24 bytes (the size of a regular Vec)

For comparison smallvec::SmallVec<[u32; 2]>> is 24 bytes (the size of a regular Vec).

Version is an Arc<VersionInner>, so it has only one niche (see e.g. https://stackoverflow.com/questions/76429517/how-does-niche-optimization-for-an-enum-work-in-rust and https://www.0xatticus.com/posts/understanding_rust_niche/), while bound needs a niche for Unbounded, but also a bit for included vs. excluded. Speculating a bit, if we use 1 bit for unbounded, 1 bit for included vs. excluded, 64 bits per Arc, then an Interval<Version> would need 132 bits in 17 bytes or two intervals would need 264 bits in 33 bytes, each plus padding (afaik usually to the next 8 bytes).

Practically, we can win 8 bytes by manually inlining (Bound<T>, Bound<T>):

pub enum BoundBound<T> {
    IncludedIncluded(T, T),
    IncludedExcluded(T, T),
    IncludedUnbounded(T),
    ExcludedIncluded(T, T),
    ExcludedExcluded(T, T),
    ExcludedUnbounded(T),
    UnboundedIncluded(T),
    UnboundedExcluded(T),
    UnboundedUnbounded,
}
  • main SmallVec<Interval<Version>>: 64 bytes
  • main SmallVec<BoundBound<Version>>: 48 bytes
  • smallvec SmallVec<[Interval<V>; 2]>: 72 bytes
  • smallvec SmallVec<[BoundBound<V>; 2]>: 56 bytes

I propose switching to smallvec::SmallVec<[Interval<V>; 1]> for a performance improvement. There are other potential performance wins by optimizing Interval<T> or changing the smallvec size (or switching to a vec) in other places, namely unit_propagation_buffer, merged_dependencies and dated_derivations.

Copy link
Member

@mpizenberg mpizenberg Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this very thoughtful and complete analysis! My personal position is that I prefer not having unsafe for less than 4% or 5% improvement (1.3% here according to your analysis). But if that is the preference of the majority, so be it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, there are different tiers to unsafe:

  • FFI unsafe: We have some unsafe in uv to interact with system APIs written in C, or in pyo3 to speak to the Python C API.
  • Unsafe in high level projects
  • Unsafe in std and foundational libraries

There is std and a number of projects such as serde, bytemuck and smallvec that provide foundational abstractions that can only be implemented in a generic, performant and ergonomic way using unsafe, such as Vec or SmallVec. Most of these would be part of std in a language such as Python or implemented as a C extension (such as numpy), but Rust allows them to live outside std.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The few percent in performance difference does not justify using the unsafe one. But, The only way to have Ranges as its own crate is for it to depend on some smallvec crate. Given that I'd rather depend on someone else's then publish and maintain our own.

> This feature requires Rust 1.49.

> When the union feature is enabled smallvec will track its state (inline or spilled) without the use of an enum tag, reducing the size of the smallvec by one machine word. This means that there is potentially no space overhead compared to Vec. Note that smallvec can still be larger than Vec if the inline buffer is larger than two machine words.
@@ -0,0 +1,7 @@
# This file is automatically @generated by Cargo.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this files actually needed. I think was accidentally created before version-ranges was added to the workspace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

/// Profiling in <https://github.com/pubgrub-rs/pubgrub/pull/262#discussion_r1804276278> showed
/// that a single stack entry is the most efficient. This is most likely due to `Interval<V>`
/// being large.
segments: SmallVec<[Interval<V>; 1]>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this 1 ends up in practice to depend on the size of V, then we could make this a const generic on Ranges. Definitely premature optimization at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smallvec 2 uses const generics 👀

@Eh2406
Copy link
Member

Eh2406 commented Oct 22, 2024

Looks like we have a plan! Thanks for the work here!

@konstin
Copy link
Member Author

konstin commented Oct 24, 2024

I've checked that this change doesn't break uv

@Eh2406 Eh2406 added this pull request to the merge queue Oct 29, 2024
Merged via the queue into dev with commit 8c37699 Oct 29, 2024
4 checks passed
@Eh2406 Eh2406 deleted the konsti/split-out-ranges branch October 29, 2024 15:27
konstin added a commit to astral-sh/uv that referenced this pull request Oct 29, 2024
See pubgrub-rs/pubgrub#262 for prior discussion.

In most parts of uv, we don't want to depend on pubgrub, but we want to use its powerful version ranges arithmetic. pubgrub-rs/pubgrub#262 split out the ranges into a separate, small crate (that can be published separately to crates.io).

With this first in a series of changes, we only depend on pubgrub in the uv-resolver crate.

Note that the name is now `Ranges`, since the type formerly known as `Range` was a misnormer, it contains multiple ranges.
konstin added a commit to astral-sh/uv that referenced this pull request Oct 29, 2024
See pubgrub-rs/pubgrub#262 for prior discussion.

In most parts of uv, we don't want to depend on pubgrub, but we want to use its powerful version ranges arithmetic. pubgrub-rs/pubgrub#262 split out the ranges into a separate, small crate (that can be published separately to crates.io).

With this first in a series of changes, we only depend on pubgrub in the uv-resolver crate.

Note that the name is now `Ranges`, since the type formerly known as `Range` was a misnormer, it contains multiple ranges.
konstin added a commit to astral-sh/uv that referenced this pull request Oct 29, 2024
Enabled by #8667 and pubgrub-rs/pubgrub#262, we can remove the uv-pubgrub crate and move the conversion between pep440 specifiers and version ranges directly into pep440.

In a next step, we can remove the `VersionRangesSpecifier` intermediary and perform the conversion directly.
@konstin
Copy link
Member Author

konstin commented Oct 30, 2024

I've released version-ranges to crates.io and gave to the team publish permissions. The crates is now used in the PEP 440 and PEP 508 crates (https://crates.io/crates/version-ranges/reverse_dependencies) and did a neat trim in uv. I'll write an announcement post later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants