
Heap-backed fixed-width integers #58

Closed
tarcieri opened this issue Jan 16, 2022 · 11 comments · Fixed by #221

@tarcieri
Member

tarcieri commented Jan 16, 2022

Somewhat in the spirit of #29, I have been wondering if it would make sense to support a heap-backed integer type (cc @elichi)

The idea would be that integers are still fixed-width and would otherwise work the same as the const generic counterparts, but the width could be chosen at runtime/initialization time, rather than at compile time. So they're not really "arbitrary precision" in that they don't grow/shrink, they're still fixed-width, just fixed at a width determined at runtime.
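A minimal sketch of the idea (hypothetical `BoxedUint` type and `Limb` alias for illustration only, not the crate's actual API): the width is chosen once at construction and never changes afterward.

```rust
type Limb = u64; // hypothetical stand-in for the crate's Limb type

/// Fixed-width integer whose width is chosen at runtime, at construction
/// time, and is then fixed for the lifetime of the value.
struct BoxedUint {
    /// Least-significant limb first; the length never changes after creation.
    limbs: Box<[Limb]>,
}

impl BoxedUint {
    /// Create a zero value `limbs` limbs wide.
    fn zero_with_limbs(limbs: usize) -> Self {
        Self {
            limbs: vec![0; limbs].into_boxed_slice(),
        }
    }

    /// Total width in bits, fixed at construction.
    fn bits(&self) -> usize {
        self.limbs.len() * Limb::BITS as usize
    }
}
```

For example, `BoxedUint::zero_with_limbs(32)` would be a 2048-bit zero on a 64-bit target: fixed-width, just not fixed at compile time.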

One place where this would be applicable is the rsa crate. While it'd be nice for microcontroller users to be able to instantiate RSA with e.g. 2048-bit or 4096-bit keys using only the stack, other use cases like OpenPGP implementations may need to support arbitrary key sizes running upwards of 16384-bit. For these use cases where key sizes can potentially vary wildly, it'd be nice to choose the integer size at runtime. The Integer and other traits in this crate would permit accepting either. (cc @dignifiedquire)

Internally this would involve implementing core algorithms that operate over slices instead of fixed-size arrays. This could have other benefits: it could reduce the monomorphization penalty for different const generic sizes by having them pass a slice to a function which can operate over many sizes.
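A rough sketch of the slice-based layering (hypothetical function names; the crate's real internals differ): one carrying-addition core operates over slices, and a const-generic wrapper delegates to it, so both stack- and heap-backed types share the algorithm.

```rust
type Limb = u64; // hypothetical stand-in for the crate's Limb type

/// Core limb-wise addition over slices: computes `a + b`, writing the sum
/// into `out` and returning the final carry. Works for any width, so both
/// stack- and heap-backed integer types can call it.
fn adc_slice(a: &[Limb], b: &[Limb], out: &mut [Limb]) -> Limb {
    let mut carry = 0u64;
    for i in 0..a.len() {
        let (sum, c1) = a[i].overflowing_add(b[i]);
        let (sum, c2) = sum.overflowing_add(carry);
        out[i] = sum;
        carry = (c1 as u64) + (c2 as u64);
    }
    carry
}

/// Const-generic wrapper: monomorphized per size, but the actual work
/// happens in the shared slice routine above, limiting code duplication.
fn adc_array<const LIMBS: usize>(
    a: &[Limb; LIMBS],
    b: &[Limb; LIMBS],
) -> ([Limb; LIMBS], Limb) {
    let mut out = [0; LIMBS];
    let carry = adc_slice(a, b, &mut out);
    (out, carry)
}
```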

There are some potential drawbacks: we don't presently unroll loops, but in some cases doing so would generate better code, and slice-based implementations could negatively impact that. I think the solution there would be to duplicate some of that loop-unrolled code between the stack- and heap-backed implementations, so the stack-allocated version always unrolls its loops ahead-of-time, and the heap-backed version loops over a slice.

@tarcieri
Member Author

Another downside I forgot to mention:

The current approach of defining all operations on a newtype for [Limb; LIMBS] allows the compiler to optimize away bounds checks in certain cases because it knows a priori that accesses on literal indexes or indexes whose range can be deduced at compile time will always be in-range.

Implementing operations in terms of &[Limb] could preclude these optimizations, although using #[inline(always)] may be sufficient to ensure they're always applied if possible.

@dignifiedquire
Member

One trick I have seen is delegating from the dynamic versions to the const versions as soon as the size is known.
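That trick could look roughly like this (a hypothetical sketch, not the crate's code): the dynamic entry point matches on the runtime length once, then calls the monomorphized const-generic routine for common widths, keeping a generic slice loop only as a fallback.

```rust
type Limb = u64; // hypothetical stand-in for the crate's Limb type

/// Const-generic core: fully monomorphized, so loops are unrollable.
fn negate<const LIMBS: usize>(a: &mut [Limb; LIMBS]) {
    for limb in a.iter_mut() {
        *limb = !*limb;
    }
}

/// Dynamic entry point: dispatch on the runtime size once, then delegate
/// to the const-generic version for the common widths.
fn negate_dyn(a: &mut [Limb]) {
    match a.len() {
        32 => negate::<32>(a.try_into().expect("length checked")),
        64 => negate::<64>(a.try_into().expect("length checked")),
        _ => {
            // Fallback for uncommon widths: plain slice loop.
            for limb in a.iter_mut() {
                *limb = !*limb;
            }
        }
    }
}
```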

@newpavlov
Member

I don't think we should allow selection of width at runtime. I think rsa would benefit from being generic over U16384 and Box<U16384>, but the width should be explicit in type signatures. Applications which want to support arbitrary key sizes would instantiate several algorithms (e.g. Rsa<U2048>, Rsa<U4096>, Rsa<Box<U8192>>, Rsa<Box<U16384>>) and would switch between them depending on provided key width.
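A sketch of that suggestion (the `Rsa` type here is a hypothetical stand-in for the real `rsa` crate's type, which would be parameterized by crypto-bigint integers rather than a bare const): the application instantiates each monomorphization explicitly and switches on the provided key width via an enum.

```rust
/// Hypothetical stand-in for Rsa<U2048> etc. from the `rsa` crate.
struct Rsa<const BITS: usize>;

impl<const BITS: usize> Rsa<BITS> {
    fn key_bits(&self) -> usize {
        BITS
    }
}

/// An application supporting several key sizes instantiates each
/// monomorphization it needs, boxing the large ones.
enum AnyRsa {
    Rsa2048(Rsa<2048>),
    Rsa4096(Rsa<4096>),
    Rsa8192(Box<Rsa<8192>>),
}

impl AnyRsa {
    /// Select the instantiation matching the provided key width; widths
    /// are explicit in the type signatures, never chosen at runtime.
    fn for_key_bits(bits: usize) -> Option<Self> {
        match bits {
            2048 => Some(AnyRsa::Rsa2048(Rsa)),
            4096 => Some(AnyRsa::Rsa4096(Rsa)),
            8192 => Some(AnyRsa::Rsa8192(Box::new(Rsa))),
            _ => None,
        }
    }

    fn key_bits(&self) -> usize {
        match self {
            AnyRsa::Rsa2048(r) => r.key_bits(),
            AnyRsa::Rsa4096(r) => r.key_bits(),
            AnyRsa::Rsa8192(r) => r.key_bits(),
        }
    }
}
```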

@tarcieri
Member Author

@newpavlov U16384 would make things unacceptably slow for typical RSA key sizes.

Something like Rsa<U2048>, Rsa<U4096>, Rsa<Box<U8192>>, Rsa<Box<U16384>> would monomorphize the code 4 times.

@newpavlov
Member

newpavlov commented Oct 13, 2022

U16384 would make things unacceptably slow for typical RSA key sizes.

For typical sizes application would use Rsa<U2048> or Rsa<U4096>.

would monomorphize the code 4 times

Yes, as intended. The compiler would be able to perform various optimizations that depend on knowing the width at compile time, such as unrolling and using SIMD instructions. Binary size would suffer somewhat, but I don't think that's the main concern for applications which want to support RSA keys up to 16384 bits. Application authors could trade binary size against potential runtime overhead by selecting the number of instantiated algorithms.

@burdges

burdges commented Oct 13, 2022

I'd expect the compiler optimizes away bounds checks if you use, say, T: Borrow<[Limb]> with T = [Limb; LIMBS], and somehow use [Limb]: ToOwned, but then &[Limb] works too, so you should try replacing the LIMBS parameter with a T parameter.

As a rule, a Borrow-like type parameter should generally be preferred over type level sizes. It's maybe problematic that ToOwned::Owned: Borrow<Self> but not BorrowMut<Self>, not quite sure.

It's possible RSA2048 winds up faster as [Limb] under heavy CPU cache loads anyway, but hopefully LLVM knows when to treat [Limb; LIMBS] as [Limb].
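To illustrate the suggestion (hypothetical function, standard-library `Borrow` only): one generic function bounded on `T: Borrow<[Limb]>` accepts both fixed arrays and boxed slices, and when `T = [Limb; LIMBS]` the length is a compile-time constant after monomorphization.

```rust
use std::borrow::Borrow;

type Limb = u64; // hypothetical stand-in for the crate's Limb type

/// One implementation serving both backing stores. When `T` is
/// `[Limb; LIMBS]`, the slice length is statically known after
/// monomorphization (arrays implement Borrow<[T]>); when `T` is
/// `Box<[Limb]>`, the length is known only at runtime.
fn count_nonzero<T: Borrow<[Limb]>>(value: &T) -> usize {
    value.borrow().iter().filter(|&&limb| limb != 0).count()
}
```

Usage: `count_nonzero(&[1u64, 0, 2])` and `count_nonzero(&boxed_slice)` both compile against the same definition.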

@newpavlov
Member

newpavlov commented Oct 13, 2022

As a rule, a Borrow-like type parameter should generally be preferred over type level sizes.

Your reasoning being? The only advantage I see is smaller binary size in cases with arbitrarily sized keys, but performance-wise your rule gives the compiler fewer opportunities to optimize. The most basic one: knowing the size at compile time allows the compiler to eliminate code for processing tails (because it can see that the size is a multiple of 8/16/32/etc.) and, for some functions, even to completely unroll loops, thus eliminating any branching. LLVM is mostly smart enough to decide whether it's worth unrolling loops or processing data in chunks.

UPD: I misread your post; I thought you wanted us to allow RSA to be generic over &[Limb]. Having T be equal to [Limb; LIMBS] provides the compiler all the necessary information for the described optimizations.

@tarcieri
Member Author

@burdges using this as e.g. a type-level parameter of impl blocks requires a blanket impl, e.g.

impl<U: Borrow<[Limb]>, const LIMBS: usize> Mul<U> for UInt<LIMBS> { ... }

...which means any trait that accepts it as a type-level parameter can only impl a given trait once, which is rather restrictive.
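A self-contained sketch of that restriction (hypothetical `UInt` with an elided multiply body, just to show the impl's shape): once the blanket impl exists, coherence rejects any further `Mul` impl for the same type, because the compiler cannot prove the second right-hand-side type will never implement `Borrow<[Limb]>`.

```rust
use std::borrow::Borrow;
use std::ops::Mul;

type Limb = u64; // hypothetical stand-in for the crate's Limb type

struct UInt<const LIMBS: usize>([Limb; LIMBS]);

// Blanket impl: covers [Limb; N], &[Limb], Box<[Limb]>, ... in one go.
impl<U: Borrow<[Limb]>, const LIMBS: usize> Mul<U> for UInt<LIMBS> {
    type Output = UInt<LIMBS>;
    fn mul(self, _rhs: U) -> UInt<LIMBS> {
        // (actual multiplication elided; only the impl's shape matters here)
        self
    }
}

// Coherence now rejects any further, potentially overlapping `Mul` impl:
//
//     impl<const LIMBS: usize> Mul<&UInt<LIMBS>> for UInt<LIMBS> { ... }
//
// fails with E0119, since the compiler cannot rule out `&UInt<LIMBS>`
// implementing `Borrow<[Limb]>` -- hence "only one impl per trait".
```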

@burdges

burdges commented Oct 13, 2022

Is there not a similar issue if U is generic over LIMBS?

@tarcieri
Member Author

In that case, it's a concrete type (UInt) that's generic over LIMBS, not a generic type parameter as part of a blanket impl.

@tarcieri
Member Author

I pushed up an initial take of this in #221
