Optimize Bytes::copy_from_slice #365

Open
stepancheg wants to merge 2 commits into master

Conversation

stepancheg (Contributor):

Create a new `SharedInline` `Bytes` representation, which is:

```
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // data: [u8; cap],
}
```

The advantage of this representation is that we do not need an extra
allocation when cloning `Bytes`, which makes such cloning much faster.

The drawback is slightly lower performance of object destruction,
due to the atomic decrement.
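
For context, here is a minimal, self-contained sketch of the idea, not the code in this PR; the free functions `from_slice`, `clone_raw`, and `drop_raw` are hypothetical. It shows how a header plus trailing bytes can live in a single allocation, with clone reduced to a refcount bump:

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::{mem, ptr, slice};

#[repr(C)]
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // `cap` bytes of data follow the header in the same allocation.
}

// Layout of the header plus `cap` trailing bytes.
fn layout_for(cap: usize) -> Layout {
    Layout::new::<SharedInline>()
        .extend(Layout::array::<u8>(cap).unwrap())
        .unwrap()
        .0
        .pad_to_align()
}

// Copy `src` into a single allocation holding the header and the data.
unsafe fn from_slice(src: &[u8]) -> *mut SharedInline {
    let ptr = alloc(layout_for(src.len())) as *mut SharedInline;
    assert!(!ptr.is_null());
    ptr::write(
        ptr,
        SharedInline { ref_cnt: AtomicUsize::new(1), cap: src.len() },
    );
    let data = (ptr as *mut u8).add(mem::size_of::<SharedInline>());
    ptr::copy_nonoverlapping(src.as_ptr(), data, src.len());
    ptr
}

unsafe fn data<'a>(this: *const SharedInline) -> &'a [u8] {
    let data = (this as *const u8).add(mem::size_of::<SharedInline>());
    slice::from_raw_parts(data, (*this).cap)
}

// Cloning is only a refcount bump: no new allocation.
unsafe fn clone_raw(this: *mut SharedInline) {
    (*this).ref_cnt.fetch_add(1, Ordering::Relaxed);
}

// Dropping decrements the refcount; the last owner frees the allocation.
unsafe fn drop_raw(this: *mut SharedInline) {
    if (*this).ref_cnt.fetch_sub(1, Ordering::Release) == 1 {
        fence(Ordering::Acquire);
        dealloc(this as *mut u8, layout_for((*this).cap));
    }
}

fn main() {
    unsafe {
        let p = from_slice(b"abcdef");
        clone_raw(p); // cheap "clone"
        assert_eq!(data(p), b"abcdef");
        drop_raw(p);
        drop_raw(p); // last drop frees the single allocation
    }
}
```

The actual change wires this representation into `Bytes` itself; the sketch only shows the single-allocation layout and the refcounting.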

The bench:

```
#[bench]
fn copy_from_slice_and_clone(b: &mut Bencher) {
    b.iter(|| {
        Bytes::copy_from_slice(b"abcdef").clone()
    });
}
```

becomes about twice as fast (210ns/iter before vs 110ns/iter after).

While the non-shared bench:

```
#[bench]
fn copy_from_slice(b: &mut Bencher) {
    b.iter(|| {
        Bytes::copy_from_slice(b"abcdef")
    });
}
```

becomes a little slower (85ns/iter before vs 90ns/iter after).
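
At the API level, the shared case corresponds to something like the following (assuming the `bytes` crate as a dependency; the assertion only checks value equality):

```rust
use bytes::Bytes;

fn main() {
    let a = Bytes::copy_from_slice(b"abcdef");
    // With the SharedInline representation, this clone is only an atomic
    // refcount increment on the single allocation made above.
    let b = a.clone();
    assert_eq!(a, b);
}
```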

seanmonstar (Member) left a comment:

I didn't understand at first, but now I get it:

- When built with an existing `Vec`, there are 2 allocations: the `Shared`, which is a skinny `Arc`, and then the `Vec` inside it.
- When we'd need to allocate a `Vec` ourselves, we could just allocate once and smash the `Arc` and `Vec` into 1, as in the analogy sketched below.
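
As a rough standard-library analogy (not code from this PR), the difference is the one between `Arc<Vec<u8>>` and `Arc<[u8]>`:

```rust
use std::sync::Arc;

fn main() {
    // Two allocations: the Arc control block plus the Vec's heap buffer.
    let two: Arc<Vec<u8>> = Arc::new(b"abcdef".to_vec());

    // One allocation: Arc<[u8]> keeps the refcounts and the bytes together,
    // which is roughly what SharedInline does. Note that Arc<[u8]> itself is
    // a fat pointer, which is why it cannot be used here (see the DST
    // question below).
    let one: Arc<[u8]> = Arc::from(&b"abcdef"[..]);

    assert_eq!(&two[..], &one[..]);
}
```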

```
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // data: [u8; cap]
}
```
seanmonstar (Member):

Can this just be an actual DST? Admittedly I haven't tried to use that part of Rust much...

stepancheg (Contributor, Author):

@seanmonstar it can't, because `*mut SharedInline` must be one word in size (to fit into an `AtomicPtr`), while a pointer to a DST is two words.
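
A quick size check illustrates the point (the `Thin` and `WithDst` types below are made up for the demonstration):

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicPtr;

#[allow(dead_code)]
struct Thin {
    ref_cnt: usize,
    cap: usize,
}

#[allow(dead_code)]
struct WithDst {
    ref_cnt: usize,
    data: [u8], // unsized tail makes pointers to this type fat
}

fn main() {
    // A pointer to a sized struct is one word, so it fits in an AtomicPtr.
    assert_eq!(size_of::<*mut Thin>(), size_of::<usize>());
    assert_eq!(size_of::<AtomicPtr<Thin>>(), size_of::<usize>());

    // A pointer to a DST carries its length as well, so it is two words and
    // cannot be stored in an AtomicPtr.
    assert_eq!(size_of::<*mut WithDst>(), 2 * size_of::<usize>());
}
```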
