
Benchmarks and performance #25

Open
sharksforarms opened this issue May 28, 2020 · 5 comments
Labels
optimization Optimization

Comments

@sharksforarms
Owner

Using criterion

+bonus if compared to other crates like nom for reading
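Criterion would be the tool for the real benchmarks; as a rough std-only illustration of the kind of read path being measured, here is a hand-rolled big-endian u32 parse timed with `std::time::Instant` (the function name and the parse itself are illustrative, not deku's API — the actual benchmarks would call `from_bytes` via criterion):

```rust
use std::time::Instant;

// Stand-in for the "read" being benchmarked: parse a big-endian u32
// from the front of a byte slice and return the rest.
fn read_u32_be(input: &[u8]) -> Option<(u32, &[u8])> {
    let prefix = input.get(..4)?;
    let value = u32::from_be_bytes(prefix.try_into().ok()?);
    Some((value, &input[4..]))
}

fn main() {
    let data = [0x00, 0x00, 0x00, 0x05];
    // Crude wall-clock loop; criterion adds warm-up, statistical
    // analysis, and outlier detection on top of this basic idea.
    let start = Instant::now();
    let mut acc = 0u64;
    for _ in 0..1_000_000 {
        let (v, _) = read_u32_be(&data).unwrap();
        acc = acc.wrapping_add(v as u64);
    }
    println!("1M reads in {:?} (acc = {})", start.elapsed(), acc);
}
```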

@sharksforarms
Owner Author

sharksforarms commented Jun 1, 2020

Added some simple benchmarks to catch regressions and measure optimisations on reading

@sharksforarms sharksforarms changed the title Add benchmarks Benchmarks and performance Jun 15, 2020
@OliverGavin

I was playing with the magic attribute and benchmarks, and I realized it is very slow, so I thought I would share:

#![feature(test)]

use deku::prelude::*;

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(endian = "big", magic = b"\x00\x00\x00\x05")]
struct WithMagicV5 {
    value: u32
}

#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
#[deku(endian = "big")]
struct WithoutMagicV5 {
    #[deku(assert = "*version == 5")]
    version: u32,  // users ideally should not be able to set this as it is always 5
    value: u32
}

#[cfg_attr(test, macro_use)]
extern crate hex_literal;

#[cfg(test)]
mod magic_benches {
    use super::*;

    extern crate test;
    use test::Bencher;
    
    #[bench]
    fn bench_with_magic(b: &mut Bencher) {
        let data = hex!("00 00 00 05 00 00 00 01");
        b.iter(|| {
            let (_, _) = WithMagicV5::from_bytes((&data, 0)).unwrap();
        });
    }
    
    #[bench]
    fn bench_without_magic(b: &mut Bencher) {
        let data = hex!("00 00 00 05 00 00 00 01");
        b.iter(|| {
            let (_, _) = WithoutMagicV5::from_bytes((&data, 0)).unwrap();
        });
    }
}

And the results:

running 2 tests
test magic_benches::bench_with_magic        ... bench:          48 ns/iter (+/- 4)
test magic_benches::bench_without_magic     ... bench:          21 ns/iter (+/- 6)

@sharksforarms
Owner Author

sharksforarms commented Apr 17, 2021

> I was playing with the magic attribute and benchmarks and I realized it is very slow so I thought I would share:

Nice one!

There are some optimizations which could be done. This is a great example.

For example, magic attribute handling could convert the byte string to 0x00000005u32 (at compile time?) and do a single u32 read and equality check. Currently, it would do 4 u8 reads and 4 equality checks (iterating the byte string).

To add flexibility, magic could also accept a number or a bytestring!
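The single-word comparison suggested above can be sketched in plain std-only Rust (function names and the check itself are illustrative, not deku's actual generated code):

```rust
// The 4-byte magic prefix from the example above.
const MAGIC: [u8; 4] = [0x00, 0x00, 0x00, 0x05];

// Byte-by-byte: four u8 reads and four equality checks, as the
// current magic handling effectively does.
fn magic_matches_bytewise(input: &[u8]) -> bool {
    input.len() >= 4 && input[..4].iter().zip(MAGIC.iter()).all(|(a, b)| a == b)
}

// Single-word: one u32 read and one equality check. The constant
// 0x0000_0005 can be derived from the byte string at compile time.
fn magic_matches_u32(input: &[u8]) -> bool {
    match input.get(..4) {
        Some(prefix) => u32::from_be_bytes(prefix.try_into().unwrap()) == 0x0000_0005,
        None => false,
    }
}

fn main() {
    let data = [0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x01];
    assert!(magic_matches_bytewise(&data));
    assert!(magic_matches_u32(&data));
    assert!(!magic_matches_u32(&[0xff; 8]));
    println!("both checks agree");
}
```

Both checks accept the same inputs; the second just trades four comparisons for one word-sized one.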

@OliverGavin

Oh yeah, actually the first problem I faced with magic was that I couldn't pass a number (and it wasn't clear from the docs, which only showed a byte-string example).

I'd be happy to give this a shot if you could give me a rough idea where to look? Seems relatively easy.

@sharksforarms
Owner Author

sharksforarms commented Apr 20, 2021

@OliverGavin Yes! That would be awesome. Happy to show you where to start.

I'd take a look at how the id attribute works, this attribute accepts both b"bytes" and tokens: https://github.com/sharksforarms/deku/blob/master/deku-derive/src/lib.rs#L17

The magic attribute is declared here: https://github.com/sharksforarms/deku/blob/master/deku-derive/src/lib.rs#L121 and, as you can see, it only accepts a LitByteStr. This would involve changing the type to an enum that accepts either b"" or other desired variants (such as LitInt).

I'll create an issue for this!

Edit: #204
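The shape of that enum change can be sketched in std-only Rust (the real derive would use syn's LitByteStr and LitInt; the type and method names here are hypothetical and just illustrate the idea of normalizing both spellings to the same byte sequence):

```rust
// Hypothetical magic value accepting either a byte string or an
// integer literal, standing in for a syn-based enum in the derive.
enum MagicValue {
    Bytes(Vec<u8>),
    Int(u32),
}

impl MagicValue {
    // Normalize both variants to the big-endian bytes that get
    // matched against the input during parsing.
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            MagicValue::Bytes(b) => b.clone(),
            MagicValue::Int(n) => n.to_be_bytes().to_vec(),
        }
    }
}

fn main() {
    let from_bytes = MagicValue::Bytes(b"\x00\x00\x00\x05".to_vec());
    let from_int = MagicValue::Int(5);
    assert_eq!(from_bytes.to_bytes(), from_int.to_bytes());
    println!("both spellings normalize to the same magic");
}
```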

wcampbell0x2a added a commit to wcampbell0x2a/deku that referenced this issue Sep 28, 2022
Remove Size in favor of a BitSize and ByteSize. This allows the
Deku{Read/Write}(Endian, BitSize) and Deku{Read/Write}(Endian, ByteSize)
impls to be created and allows the ByteSize impls to be faster. Most of
the assumptions I made (and the perf that is gained) come from the removal
of this branch for bytes:

let value = if pad == 0
                      && bit_slice.len() == max_type_bits
                      && bit_slice.as_raw_slice().len() * 8 == max_type_bits
                  {

See sharksforarms#44

I added some benchmarks in order to get a good idea of what perf this
allows. The benchmarks look pretty good, but I'm on my old thinkpad.
These results are vs the benchmarks run from master.

deku_read_vec           time:   [744.39 ns 751.13 ns 759.58 ns]
                        change: [-23.681% -21.358% -19.142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

See sharksforarms#25
sharksforarms pushed a commit to wcampbell0x2a/deku that referenced this issue Oct 6, 2022
sharksforarms pushed a commit that referenced this issue Oct 6, 2022