Optimize Bytes::copy_from_slice #365

Open
stepancheg wants to merge 2 commits into master

Conversation

stepancheg (Contributor):

Create a new `SharedInline` `Bytes` representation, which is:

```
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // data: [u8; cap],
}
```

The advantage of this representation is that we do not need an extra
allocation when cloning `Bytes`, which makes such cloning much faster.

The drawback is slightly lower performance of object destruction,
due to the atomic decrement.
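
For context, here is a minimal, self-contained sketch of the idea, not the code in this PR; the free functions `from_slice`, `clone_raw`, and `drop_raw` are hypothetical. It shows how a header plus trailing bytes can live in a single allocation, with clone reduced to a refcount bump:

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::{mem, ptr, slice};

#[repr(C)]
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // `cap` bytes of data follow the header in the same allocation.
}

// Layout of the header plus `cap` trailing bytes.
fn layout_for(cap: usize) -> Layout {
    Layout::new::<SharedInline>()
        .extend(Layout::array::<u8>(cap).unwrap())
        .unwrap()
        .0
        .pad_to_align()
}

// Copy `src` into a single allocation holding the header and the data.
unsafe fn from_slice(src: &[u8]) -> *mut SharedInline {
    let ptr = alloc(layout_for(src.len())) as *mut SharedInline;
    assert!(!ptr.is_null());
    ptr::write(
        ptr,
        SharedInline { ref_cnt: AtomicUsize::new(1), cap: src.len() },
    );
    let data = (ptr as *mut u8).add(mem::size_of::<SharedInline>());
    ptr::copy_nonoverlapping(src.as_ptr(), data, src.len());
    ptr
}

unsafe fn data<'a>(this: *const SharedInline) -> &'a [u8] {
    let data = (this as *const u8).add(mem::size_of::<SharedInline>());
    slice::from_raw_parts(data, (*this).cap)
}

// Cloning is only a refcount bump: no new allocation.
unsafe fn clone_raw(this: *mut SharedInline) {
    (*this).ref_cnt.fetch_add(1, Ordering::Relaxed);
}

// Dropping decrements the refcount; the last owner frees the allocation.
unsafe fn drop_raw(this: *mut SharedInline) {
    if (*this).ref_cnt.fetch_sub(1, Ordering::Release) == 1 {
        fence(Ordering::Acquire);
        dealloc(this as *mut u8, layout_for((*this).cap));
    }
}

fn main() {
    unsafe {
        let p = from_slice(b"abcdef");
        clone_raw(p); // cheap "clone"
        assert_eq!(data(p), b"abcdef");
        drop_raw(p);
        drop_raw(p); // last drop frees the single allocation
    }
}
```

The actual change wires this representation into `Bytes` itself; the sketch only shows the single-allocation layout and the refcounting.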

The bench:

```
#[bench]
fn copy_from_slice_and_clone(b: &mut Bencher) {
    b.iter(|| {
        Bytes::copy_from_slice(b"abcdef").clone()
    });
}
```

becomes about twice as fast (210ns/iter before vs 110ns/iter after).

While the non-shared bench:

```
#[bench]
fn copy_from_slice(b: &mut Bencher) {
    b.iter(|| {
        Bytes::copy_from_slice(b"abcdef")
    });
}
```

becomes a little slower (85ns/iter before vs 90ns/iter after).
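
At the API level, the shared case corresponds to something like the following (assuming the `bytes` crate as a dependency; the assertion only checks value equality):

```rust
use bytes::Bytes;

fn main() {
    let a = Bytes::copy_from_slice(b"abcdef");
    // With the SharedInline representation, this clone is only an atomic
    // refcount increment on the single allocation made above.
    let b = a.clone();
    assert_eq!(a, b);
}
```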

seanmonstar (Member) left a comment:

I didn't understand at first, but now I get it:

- When built with an existing `Vec`, there are 2 allocations: the `Shared`, which is a skinny `Arc`, and then the `Vec` inside it.
- When we'd need to allocate a `Vec` ourselves, we could just allocate once and smash the `Arc` and `Vec` into 1, as in the analogy sketched below.
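
As a rough standard-library analogy (not code from this PR), the difference is the one between `Arc<Vec<u8>>` and `Arc<[u8]>`:

```rust
use std::sync::Arc;

fn main() {
    // Two allocations: the Arc control block plus the Vec's heap buffer.
    let two: Arc<Vec<u8>> = Arc::new(b"abcdef".to_vec());

    // One allocation: Arc<[u8]> keeps the refcounts and the bytes together,
    // which is roughly what SharedInline does. Note that Arc<[u8]> itself is
    // a fat pointer, which is why it cannot be used here (see the DST
    // question below).
    let one: Arc<[u8]> = Arc::from(&b"abcdef"[..]);

    assert_eq!(&two[..], &one[..]);
}
```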

```
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // data: [u8; cap]
}
```
seanmonstar (Member):

Can this just be an actual DST? Admittedly I haven't tried to use that part of Rust much...

stepancheg (Contributor, Author):

@seanmonstar it can't, because `*mut SharedInline` must be one word in size (to fit into an `AtomicPtr`), while a pointer to a DST is two words.
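
A quick size check illustrates the point (the `Thin` and `WithDst` types below are made up for the demonstration):

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicPtr;

#[allow(dead_code)]
struct Thin {
    ref_cnt: usize,
    cap: usize,
}

#[allow(dead_code)]
struct WithDst {
    ref_cnt: usize,
    data: [u8], // unsized tail makes pointers to this type fat
}

fn main() {
    // A pointer to a sized struct is one word, so it fits in an AtomicPtr.
    assert_eq!(size_of::<*mut Thin>(), size_of::<usize>());
    assert_eq!(size_of::<AtomicPtr<Thin>>(), size_of::<usize>());

    // A pointer to a DST carries its length as well, so it is two words and
    // cannot be stored in an AtomicPtr.
    assert_eq!(size_of::<*mut WithDst>(), 2 * size_of::<usize>());
}
```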
