Split version ranges into their own crate #262

konstin · 2024-10-12T21:02:26Z

In our implementation of PEP 440 and PEP 508 (Python standards for version selectors and requirements and their markers, which again use versions), we want to use PubGrub's advanced Range type for its fast operations, but we don't want to depend on the whole pubgrub crate in crates that don't do resolution. Hence, I'm splitting out the version ranges into their own crate.

My plan is to merge this PR, publish the crate as 0.1.0, adopt it in our use case (pep440-rs, pep508-rs, uv, downstream users of those crates), and if no blockers show up, publish a version-range[s] 1.0.0 soon.

Range is renamed to Ranges for clarity. To avoid breaking users immediately, Ranges is also re-exported under the old name Range.
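One way to keep the old name working is a type alias next to the re-export. The sketch below is only an illustration: the `version_ranges` module, its `segments` field layout and the `segment_count` helper are hypothetical stand-ins, and the PR may wire up the alias differently.

```rust
/// Hypothetical stand-in for the split-out crate; the real `Ranges`
/// stores its intervals differently.
pub mod version_ranges {
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Ranges<V> {
        pub segments: Vec<(V, V)>,
    }
}

// New name, re-exported from the split-out crate.
pub use version_ranges::Ranges;

/// Old name kept as an alias so existing `Range<V>` users keep compiling.
pub type Range<V> = Ranges<V>;

/// Code written against the old name accepts values built with the new one,
/// because an alias is the same type, not a wrapper.
pub fn segment_count<V>(range: &Range<V>) -> usize {
    range.segments.len()
}
```

Because the alias is the same type, no conversion is needed at the boundary between old and new code.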

This PR also switches to SmallVec<[Interval<V>; 1]>, which performs better than both the custom small vector and other inline sizes.

$ taskset -c 0-1 hyperfine --warmup 1 --runs 3 "./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"

Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.774 s ±  0.007 s    [User: 3.970 s, System: 0.337 s]
  Range (min … max):    3.769 s …  3.782 s    3 runs
 
Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.823 s ±  0.004 s    [User: 4.008 s, System: 0.345 s]
  Range (min … max):    3.821 s …  3.828 s    3 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
    1.01 ± 0.00 times faster than ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

The proptest feature grew out of necessity for the pubgrub tests, but I think it's actually a nice feature.

Another wishlist feature is to have a different Display for uv, but that didn't fit in here yet.

mpizenberg · 2024-10-13T12:21:23Z

Splitting ranges sounds like a good idea to me. Also agree that the Range name is not ideal. Ranges is slightly better. Segments has the same vibe but I prefer Ranges over segments.

Eh2406 · 2024-10-14T16:04:19Z

This is a really good idea.
If adopting this new crate is not a breaking change, we could even release a 0.2 version that uses it, solving #258. Also, the resolvo folks expressed interest in sharing this dependency: https://github.com/mamba-org/resolvo/issues/2.

Eh2406

This all looks good, once we have bikeshedded the naming.

Eh2406 · 2024-10-15T19:23:47Z

version-ranges/Cargo.toml

+[dependencies]
+proptest = { version = "1.5.0", optional = true }
+serde = { version = "1.0.210", features = ["derive"], optional = true }
+smallvec = { version = "1.13.2" }


This changes us to using the smallvec crate instead of the one we maintain. This might be a reasonable decision, as it hasn't had a CVE in a while, but we should call it out and make sure we're in agreement.

Ah, actually minimizing unsafe usage is something I valued! It would be unfortunate to regress on that aspect. I think I’d even prefer that we published our smallvec impl.

I'd prefer removing our custom implementation and depending on smallvec, one of the most used packages on crates.io. I expect most pubgrub users to already depend on smallvec; uv and cargo both do, through multiple packages.

While I see the point about unsafe and publishing a safe competitor would be nice, smallvec gives us flexibility and definitive performance in a well vetted library.

It’s not like it’s a huge thing. It’s a purpose-built 200 lines. And as I recall from last time, I think it even performed better than the smallvec crate for our very specific use case. It’s not about publishing a safe competitor. I only suggested publishing it to be able to reuse it in both the range crate and the pubgrub crate.

For my benchmark, smallvec is about 50ms faster than our custom solution. In this intentionally pubgrub-heavy benchmark, I also tried smallvec with size 4 for comparison (uv-branch-4).

$ taskset -c 0-1 hyperfine --warmup 1 --runs 40 "./uv-branch pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-branch-4 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"

Benchmark 1: ./uv-branch pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.842 s ±  0.006 s    [User: 4.016 s, System: 0.369 s]
  Range (min … max):    3.833 s …  3.854 s    40 runs

Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.900 s ±  0.007 s    [User: 4.079 s, System: 0.365 s]
  Range (min … max):    3.889 s …  3.916 s    40 runs

Benchmark 3: ./uv-branch-4 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.868 s ±  0.010 s    [User: 4.046 s, System: 0.372 s]
  Range (min … max):    3.852 s …  3.900 s    40 runs

Summary
  ./uv-branch pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
    1.01 ± 0.00 times faster than ./uv-branch-4 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
    1.02 ± 0.00 times faster than ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

Reducing the smallvec to a single free segment with SmallVec<[Interval<V>; 1]> improves performance even further.

$ taskset -c 0-1 hyperfine --warmup 1 --runs 3 "./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"

Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.774 s ±  0.007 s    [User: 3.970 s, System: 0.337 s]
  Range (min … max):    3.769 s …  3.782 s    3 runs

Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.823 s ±  0.004 s    [User: 4.008 s, System: 0.345 s]
  Range (min … max):    3.821 s …  3.828 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ./uv-branch-1 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
    1.01 ± 0.00 times faster than ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

Smaller benchmarks are too noisy unfortunately.

$ taskset -c 0-1 hyperfine --warmup 1 --runs 200 "./uv-branch-1 pip compile scripts/requirements/jupyter.in --offline" "./uv-main pip compile scripts/requirements/jupyter.in --offline"

Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/jupyter.in --offline
  Time (mean ± σ):      21.3 ms ±   0.5 ms    [User: 14.9 ms, System: 15.9 ms]
  Range (min … max):    20.2 ms …  23.8 ms    200 runs

Benchmark 2: ./uv-main pip compile scripts/requirements/jupyter.in --offline
  Time (mean ± σ):      21.2 ms ±   0.4 ms    [User: 14.7 ms, System: 16.0 ms]
  Range (min … max):    20.1 ms …  22.5 ms    200 runs

Summary
  ./uv-main pip compile scripts/requirements/jupyter.in --offline ran
    1.00 ± 0.03 times faster than ./uv-branch-1 pip compile scripts/requirements/jupyter.in --offline

$ taskset -c 0-1 hyperfine --warmup 1 --runs 30 "./uv-branch-1 pip compile scripts/requirements/airflow.in --offline" "./uv-main pip compile scripts/requirements/airflow.in --offline"

Benchmark 1: ./uv-branch-1 pip compile scripts/requirements/airflow.in --offline
  Time (mean ± σ):     263.1 ms ±   4.9 ms    [User: 269.6 ms, System: 130.9 ms]
  Range (min … max):   256.9 ms … 273.3 ms    30 runs

Benchmark 2: ./uv-main pip compile scripts/requirements/airflow.in --offline
  Time (mean ± σ):     266.2 ms ±   6.2 ms    [User: 272.5 ms, System: 132.3 ms]
  Range (min … max):   256.7 ms … 285.6 ms    30 runs

Summary
  ./uv-branch-1 pip compile scripts/requirements/airflow.in --offline ran
    1.01 ± 0.03 times faster than ./uv-main pip compile scripts/requirements/airflow.in --offline

Treating smallvec as a worse vec with SmallVec<[Interval<V>; 0]> regresses:

$ taskset -c 0-1 hyperfine --warmup 1 --runs 3 "./uv-branch-0 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline" "./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline"

Benchmark 1: ./uv-branch-0 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.870 s ±  0.001 s    [User: 4.085 s, System: 0.393 s]
  Range (min … max):    3.868 s …  3.871 s    3 runs

Benchmark 2: ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline
  Time (mean ± σ):      3.826 s ±  0.010 s    [User: 4.006 s, System: 0.349 s]
  Range (min … max):    3.819 s …  3.837 s    3 runs

Summary
  ./uv-main pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline ran
    1.01 ± 0.00 times faster than ./uv-branch-0 pip compile scripts/requirements/bio_embeddings.in --exclude-newer 2024-03-01 --offline

The problem seems to be that the ranges type is too large:

Base types:

uv's Version: 8 bytes (Arc)

Bound<Version>: 16 bytes

Interval<Version>: 32 bytes

Ranges:

Range<Version> main: 64 bytes

Range<Version> smallvec 2 entries: 72 bytes

Range<Version> smallvec 1 entry: 40 bytes

Range<Version> smallvec 0 entries: 24 bytes (the size of a regular Vec)

For comparison, smallvec::SmallVec<[u32; 2]> is 24 bytes (the size of a regular Vec).
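The base-type sizes above can be reproduced with std alone. This is a minimal sketch under assumptions: `Arc<u64>` stands in for uv's `Version` (described in this thread as an `Arc` around an inner struct), `Interval<V>` is taken to be the `(Bound<V>, Bound<V>)` tuple mentioned below, and `base_sizes` is a hypothetical helper name. Exact numbers depend on the target and compiler.

```rust
use std::mem::size_of;
use std::ops::Bound;
use std::sync::Arc;

/// Stand-in for uv's `Version`: a single thin pointer.
type Version = Arc<u64>;

/// An interval as a pair of bounds.
type Interval<V> = (Bound<V>, Bound<V>);

/// Returns the sizes in bytes of `Version`, `Bound<Version>`,
/// `Interval<Version>` and `Vec<Interval<Version>>`, in that order.
fn base_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<Version>(),
        size_of::<Bound<Version>>(),
        size_of::<Interval<Version>>(),
        size_of::<Vec<Interval<Version>>>(),
    )
}
```

On a typical 64-bit target this is expected to line up with the 8, 16, 32 and 24 bytes listed above, though layout is not guaranteed by the language.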

Version is an Arc<VersionInner>, so it has only one niche (see e.g. https://stackoverflow.com/questions/76429517/how-does-niche-optimization-for-an-enum-work-in-rust and https://www.0xatticus.com/posts/understanding_rust_niche/), while Bound needs a niche for Unbounded and also a bit for included vs. excluded. Speculating a bit: if we use 1 bit for unbounded, 1 bit for included vs. excluded and 64 bits per Arc, then an Interval<Version> would need 132 bits (17 bytes) and two intervals would need 264 bits (33 bytes), each plus padding (afaik usually to the next multiple of 8 bytes).
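The speculative bit counting can be written out as arithmetic. The helper names below are hypothetical, and the model (1 + 1 + 64 bits per bound, padding to a multiple of 8 bytes) is only the back-of-the-envelope estimate from this comment, not how rustc actually lays out the types.

```rust
/// Bits for `bounds` bounds, assuming 1 bit for unbounded, 1 bit for
/// included vs. excluded and 64 bits for the Arc pointer.
const fn bits_for_bounds(bounds: usize) -> usize {
    bounds * (1 + 1 + 64)
}

/// Smallest number of whole bytes that holds `bits` bits.
const fn bytes_for_bits(bits: usize) -> usize {
    (bits + 7) / 8
}

/// Rounds `bytes` up to the next multiple of 8.
const fn pad_to_8(bytes: usize) -> usize {
    (bytes + 7) / 8 * 8
}
```

With two bounds per interval, one interval gives 132 bits, 17 bytes, and 24 bytes after padding; two intervals give 264 bits, 33 bytes, and 40 bytes after padding.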

Practically, we can win 8 bytes by manually inlining (Bound<T>, Bound<T>):

pub enum BoundBound<T> {
    IncludedIncluded(T, T),
    IncludedExcluded(T, T),
    IncludedUnbounded(T),
    ExcludedIncluded(T, T),
    ExcludedExcluded(T, T),
    ExcludedUnbounded(T),
    UnboundedIncluded(T),
    UnboundedExcluded(T),
    UnboundedUnbounded,
}

main SmallVec<Interval<Version>>: 64 bytes

main SmallVec<BoundBound<Version>>: 48 bytes

smallvec SmallVec<[Interval<V>; 2]>: 72 bytes

smallvec SmallVec<[BoundBound<V>; 2]>: 56 bytes
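To see where the saved bytes come from, the inlined enum can be compared against the tuple of two bounds. The sketch below assumes `Arc<u64>` as a stand-in for `Version`; the `From` impl and the `interval_sizes` helper are hypothetical additions for illustration, showing that the flattened form loses no information.

```rust
use std::mem::size_of;
use std::ops::Bound;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BoundBound<T> {
    IncludedIncluded(T, T),
    IncludedExcluded(T, T),
    IncludedUnbounded(T),
    ExcludedIncluded(T, T),
    ExcludedExcluded(T, T),
    ExcludedUnbounded(T),
    UnboundedIncluded(T),
    UnboundedExcluded(T),
    UnboundedUnbounded,
}

impl<T> From<(Bound<T>, Bound<T>)> for BoundBound<T> {
    /// Every pair of bounds maps to exactly one variant.
    fn from(pair: (Bound<T>, Bound<T>)) -> Self {
        use Bound::{Excluded, Included, Unbounded};
        match pair {
            (Included(a), Included(b)) => Self::IncludedIncluded(a, b),
            (Included(a), Excluded(b)) => Self::IncludedExcluded(a, b),
            (Included(a), Unbounded) => Self::IncludedUnbounded(a),
            (Excluded(a), Included(b)) => Self::ExcludedIncluded(a, b),
            (Excluded(a), Excluded(b)) => Self::ExcludedExcluded(a, b),
            (Excluded(a), Unbounded) => Self::ExcludedUnbounded(a),
            (Unbounded, Included(b)) => Self::UnboundedIncluded(b),
            (Unbounded, Excluded(b)) => Self::UnboundedExcluded(b),
            (Unbounded, Unbounded) => Self::UnboundedUnbounded,
        }
    }
}

/// Sizes in bytes of the tuple form and the flattened form, in that order.
fn interval_sizes() -> (usize, usize) {
    (
        size_of::<(Bound<Arc<u64>>, Bound<Arc<u64>>)>(),
        size_of::<BoundBound<Arc<u64>>>(),
    )
}
```

The tuple carries one discriminant per bound, each padded to pointer alignment, while the flattened enum carries a single discriminant for the whole interval.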

I propose switching to smallvec::SmallVec<[Interval<V>; 1]> for a performance improvement. There are other potential performance wins by optimizing Interval<T> or changing the smallvec size (or switching to a vec) in other places, namely unit_propagation_buffer, merged_dependencies and dated_derivations.

Thanks a lot for this very thoughtful and complete analysis! My personal position is that I prefer not having unsafe for less than 4% or 5% improvement (1.3% here according to your analysis). But if that is the preference of the majority, so be it.
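The 1.3% figure can be recovered from the mean times reported above for the bio_embeddings benchmark. The helper below is a hypothetical illustration of that calculation and ignores the measurement uncertainty.

```rust
/// Relative speedup of `candidate_secs` over `baseline_secs`, in percent.
fn percent_faster(baseline_secs: f64, candidate_secs: f64) -> f64 {
    (baseline_secs / candidate_secs - 1.0) * 100.0
}
```

For example, `percent_faster(3.823, 3.774)` (uv-main vs. uv-branch-1) comes out at roughly 1.3, and `percent_faster(3.900, 3.842)` (uv-main vs. uv-branch in the 40-run benchmark) at roughly 1.5.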

For me, there are different tiers to unsafe:

FFI unsafe: We have some unsafe in uv to interact with system APIs written in C, or in pyo3 to speak to the Python C API.

Unsafe in high level projects

Unsafe in std and foundational libraries

There is std and a number of projects such as serde, bytemuck and smallvec that provide foundational abstractions that can only be implemented in a generic, performant and ergonomic way using unsafe, such as Vec or SmallVec. Most of these would be part of std in a language such as Python or implemented as a C extension (such as numpy), but Rust allows them to live outside std.

The few percent in performance difference does not justify using the unsafe one. But the only way to have Ranges as its own crate is for it to depend on some smallvec crate. Given that, I'd rather depend on someone else's than publish and maintain our own.

> This feature requires Rust 1.49.
>
> When the union feature is enabled smallvec will track its state (inline or spilled) without the use of an enum tag, reducing the size of the smallvec by one machine word. This means that there is potentially no space overhead compared to Vec. Note that smallvec can still be larger than Vec if the inline buffer is larger than two machine words.

Eh2406 · 2024-10-22T17:55:15Z

version-ranges/Cargo.lock

@@ -0,0 +1,7 @@
+# This file is automatically @generated by Cargo.


I don't think this file is actually needed. I think it was accidentally created before version-ranges was added to the workspace.

Eh2406 · 2024-10-22T18:30:22Z

version-ranges/src/lib.rs

+    /// Profiling in <https://github.com/pubgrub-rs/pubgrub/pull/262#discussion_r1804276278> showed
+    /// that a single stack entry is the most efficient. This is most likely due to `Interval<V>`
+    /// being large.
+    segments: SmallVec<[Interval<V>; 1]>,


If the best value for this 1 ends up depending on the size of V in practice, we could make it a const generic on Ranges. Definitely premature optimization at this point.

smallvec 2 uses const generics 👀
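As a rough sketch of the const-generic idea, the inline capacity could become a parameter with a default of 1. Everything here is hypothetical: the struct below is not the crate's `Ranges` and is not a working small vector, it only models how the footprint of the type would vary with the inline capacity `N` and the element type `V`.

```rust
use std::mem::size_of;
use std::ops::Bound;

type Interval<V> = (Bound<V>, Bound<V>);

/// Hypothetical layout model: `N` inline slots plus a heap fallback.
/// The default keeps `Ranges<V>` meaning "one inline interval".
pub struct Ranges<V, const N: usize = 1> {
    inline: [Option<Interval<V>>; N],
    spilled: Vec<Interval<V>>,
}

/// Size in bytes of the model for a given element type and inline capacity.
fn ranges_size<V, const N: usize>() -> usize {
    size_of::<Ranges<V, N>>()
}
```

Callers that care about a large `V` could then pick a capacity explicitly, e.g. `Ranges<Version, 2>`, while everyone else keeps writing `Ranges<Version>`.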

Eh2406 · 2024-10-22T18:31:10Z

Looks like we have a plan! Thanks for the work here!

konstin · 2024-10-24T10:28:10Z

I've checked that this change doesn't break uv

See pubgrub-rs/pubgrub#262 for prior discussion. In most parts of uv, we don't want to depend on pubgrub, but we want to use its powerful version ranges arithmetic. pubgrub-rs/pubgrub#262 split out the ranges into a separate, small crate (that can be published separately to crates.io). With this first in a series of changes, we only depend on pubgrub in the uv-resolver crate. Note that the name is now `Ranges`: the type formerly known as `Range` was a misnomer, since it contains multiple ranges.
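To illustrate why the plural name fits, here is a minimal sketch of a value holding several disjoint intervals. It is a simplified model built on std only; the struct, the `contains` method and the `one_to_two_or_from_three` helper are hypothetical and not the version-ranges implementation.

```rust
use std::ops::Bound;

type Interval<V> = (Bound<V>, Bound<V>);

/// Simplified model: a set of versions stored as a list of intervals.
pub struct Ranges<V> {
    segments: Vec<Interval<V>>,
}

impl<V: Ord> Ranges<V> {
    /// True if `v` falls inside any of the intervals.
    pub fn contains(&self, v: &V) -> bool {
        self.segments.iter().any(|(lower, upper)| {
            let above = match lower {
                Bound::Included(l) => v >= l,
                Bound::Excluded(l) => v > l,
                Bound::Unbounded => true,
            };
            let below = match upper {
                Bound::Included(u) => v <= u,
                Bound::Excluded(u) => v < u,
                Bound::Unbounded => true,
            };
            above && below
        })
    }
}

/// Models ">=1, <2 or >=3" with plain integers as versions.
pub fn one_to_two_or_from_three() -> Ranges<u32> {
    Ranges {
        segments: vec![
            (Bound::Included(1), Bound::Excluded(2)),
            (Bound::Included(3), Bound::Unbounded),
        ],
    }
}
```

A single value here covers two separate ranges, which a type named `Range` would not suggest.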

Enabled by #8667 and pubgrub-rs/pubgrub#262, we can remove the uv-pubgrub crate and move the conversion between pep440 specifiers and version ranges directly into pep440. In a next step, we can remove the `VersionRangesSpecifier` intermediary and perform the conversion directly.

konstin · 2024-10-30T10:13:14Z

I've released version-ranges to crates.io and gave the team publish permissions. The crate is now used in the PEP 440 and PEP 508 crates (https://crates.io/crates/version-ranges/reverse_dependencies) and enabled a neat trim in uv. I'll write an announcement post later.

konstin requested review from mpizenberg and Eh2406 October 12, 2024 21:02

Split version ranges into their own crate

8fd898b

konstin force-pushed the konsti/split-out-ranges branch from 96384b2 to 8fd898b Compare October 12, 2024 23:01

polish

3ebe63f

Eh2406 approved these changes Oct 14, 2024

konstin added 2 commits October 14, 2024 18:54

Rename to Ranges

f46896e

Doc

d4fb873

Eh2406 reviewed Oct 15, 2024

konstin added 5 commits October 16, 2024 10:34

Use SmallVec<[Interval<V>; 1]> for better performance

9024566

Merge dev

50fc892

Move comments.

3053979

Add alias to avoid breaking change

aaa57d8

Eh2406 reviewed Oct 22, 2024

Eh2406 added this pull request to the merge queue Oct 29, 2024

Merged via the queue into dev with commit 8c37699 Oct 29, 2024
4 checks passed

Eh2406 deleted the konsti/split-out-ranges branch October 29, 2024 15:27

konstin mentioned this pull request Oct 29, 2024

Start using the version ranges crate astral-sh/uv#8667

Merged

konstin mentioned this pull request Oct 29, 2024

Merge uv-pubgrub into uv-pep440 astral-sh/uv#8669

Merged

konstin mentioned this pull request Oct 30, 2024

Rename v0.3 Range into BoundedRange #123

Closed

konstin added the Ranges label Nov 14, 2024

konstin commented Oct 12, 2024 (edited)

mpizenberg commented Oct 13, 2024

Eh2406 commented Oct 14, 2024

Eh2406 left a comment

Eh2406 Oct 15, 2024

mpizenberg Oct 15, 2024 (edited)

konstin Oct 16, 2024

mpizenberg Oct 16, 2024 (edited)

konstin Oct 17, 2024

mpizenberg Oct 17, 2024 (edited)

konstin Oct 17, 2024

Eh2406 Oct 22, 2024

Eh2406 Oct 22, 2024

konstin Oct 22, 2024

Eh2406 Oct 22, 2024

konstin Oct 22, 2024

Eh2406 commented Oct 22, 2024

konstin commented Oct 24, 2024

konstin commented Oct 30, 2024
