
Shared counters optimization #101

Closed
wants to merge 24 commits

Conversation

jimvdl
Contributor

@jimvdl jimvdl commented Nov 27, 2021

Hello, I've made an attempt at implementing the shared counters optimization.

Overview:
Once `count` experiences contention, the failed compare-exchange triggers the allocation of `counter_cells`. It allocates a `Vec` with 2 cells, initializing one of them to `n`. Any future contention on `count` randomly selects a counter cell and increments its count. If this also fails (and the number of counter cells doesn't exceed the number of CPUs), a new counter-cell array is allocated with `n << 1` cells; all of the old values are copied over and the new counters are initialized to 0. If `counter_cells` is busy for whatever reason, it falls back to simply attempting to increment `count` again. When the hash map is dropped, the `counter_cells` array is dropped along with it.
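For illustration, here is a hedged, minimal sketch of the scheme described above. The names (`StripedCounter`, `thread_index`) are made up for the sketch, and the PR's resizing and `cells_busy` logic are omitted:

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Hypothetical simplified version of the scheme: a base counter plus
// a fixed set of cells that absorb contention. The PR's resizing and
// `cells_busy` guard are omitted.
pub struct StripedCounter {
    base: AtomicIsize,
    cells: Vec<AtomicIsize>,
}

impl StripedCounter {
    pub fn new(n_cells: usize) -> Self {
        StripedCounter {
            base: AtomicIsize::new(0),
            cells: (0..n_cells.max(1)).map(|_| AtomicIsize::new(0)).collect(),
        }
    }

    pub fn add(&self, n: isize) {
        // Fast path: an uncontended compare-exchange on `base`.
        let cur = self.base.load(Ordering::Relaxed);
        if self
            .base
            .compare_exchange(cur, cur + n, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        // Slow path: fall back to a cell. A real implementation would
        // use a randomized per-thread probe; hashing the thread id
        // here is purely illustrative.
        let idx = thread_index() % self.cells.len();
        self.cells[idx].fetch_add(n, Ordering::Relaxed);
    }

    // The total is the base plus every cell. Note this is not an
    // atomic snapshot, so it can lag behind concurrent updates.
    pub fn sum(&self) -> isize {
        let base = self.base.load(Ordering::Relaxed);
        base + self.cells.iter().map(|c| c.load(Ordering::Relaxed)).sum::<isize>()
    }
}

fn thread_index() -> usize {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    let mut h = DefaultHasher::new();
    std::thread::current().id().hash(&mut h);
    h.finish() as usize
}
```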

Fixes: #11

This is my first open-source contribution so any advice/feedback is greatly appreciated.



`HashMap` now has a `counter_cells` pointer that potentially holds
counter cells to relieve `count` of contention. Under the guard of
`cells_busy`, it resizes when one of the counter cells experiences
contention. All of the counter cells are dropped when the hash map
gets deallocated.
@jimvdl
Contributor Author

jimvdl commented Nov 29, 2021

Hi @jonhoo, would you mind giving this a review? Thanks!

@jonhoo
Owner

jonhoo commented Dec 9, 2021

This is on my radar, I just haven't had the spare time to look at it yet! Thanks for giving it a shot. Once I have some time I'll give it a review :)

One thing off the top of my head is that I'd like to see this implemented as a separate type in its own module rather than all inlined into the implementation. Will make it much easier to review, and possibly reusable in and of itself!

@jimvdl
Contributor Author

jimvdl commented Dec 10, 2021

I had some questions, I'm sure you'll get to them naturally when you're reviewing but wanted to list them out:

  • The struct name is currently LongAdder, I'm not sure if that is a fitting name since we are not adding longs like in Java. (maybe IsizeAdder, but I'll leave that up to you).
  • A bunch of the tests already verify that the length matches up, but since the counter is not an atomic snapshot, some tests might occasionally fail because the sum is still being calculated while the length is asserted.
  • Should I directly mirror LongAdder and implement functions like increment(), decrement() etc?
  • The flurry style linting beta clippy test warns about unsoundness on K & V on BinEntry because they do not implement Send and Sync, should I PR a fix for these before this PR gets reviewed?

Take all the time you need, I'm eager to learn something and potentially get this merged so we'll take a look at it whenever you're ready!

@jonhoo
Owner

jonhoo commented Dec 16, 2021

I had some questions, I'm sure you'll get to them naturally when you're reviewing but wanted to list them out:

I didn't get to as many things as I wanted today, so this remains on my backlog, but figured I'd at least answer your questions!

* The struct name is currently `LongAdder`, I'm not sure if that is a fitting name since we are not adding longs like in Java. (maybe IsizeAdder, but I'll leave that up to you).

Oof, Isize just looks really weird. How about ConcurrentCounter? That's exactly what it is. I suppose we could throw "signed" in there, but I don't think that's super necessary.

* A bunch of the tests already verify that the length matches up but since the counter is not an atomic snapshot some tests might occasionally fail due to the sum being calculated while asserting the length.

We definitely do not want spurious tests, but it sounds like those tests should be racy already if they're seeing a race when using this new concurrent counter... Can you point to a specific test that fails?

* Should I directly mirror LongAdder and implement functions like increment(), decrement() etc?

I wonder — could you just implement the AddAssign trait instead? I think your add only takes a Guard at the moment because it uses Atomic, but maybe it'd be an idea to just allocate a Vec with num_cpu counters from the very beginning and keep a separate AtomicUsize that just tracks the highest index in the Vec we should be using? That means no resizing, which means no Atomic, which means no Guard. Want to give that a shot?
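For illustration, a rough sketch of that fixed-allocation variant: all cells are allocated up front and a separate `AtomicUsize` only ever grows, so there is no resizing and hence no `Atomic` or `Guard`. All names here are hypothetical, and `max_cells` would presumably come from `num_cpus`:

```rust
use std::sync::atomic::{AtomicIsize, AtomicUsize, Ordering};

// Hypothetical sketch: a fixed Vec of cells plus a high-water mark.
pub struct ConcurrentCounter {
    cells: Vec<AtomicIsize>,
    in_use: AtomicUsize, // number of cells currently eligible for use
}

impl ConcurrentCounter {
    pub fn new(max_cells: usize) -> Self {
        ConcurrentCounter {
            cells: (0..max_cells.max(1)).map(|_| AtomicIsize::new(0)).collect(),
            in_use: AtomicUsize::new(1),
        }
    }

    pub fn add(&self, n: isize) {
        let limit = self.in_use.load(Ordering::Relaxed);
        let cell = &self.cells[probe() % limit];
        let cur = cell.load(Ordering::Relaxed);
        if cell
            .compare_exchange(cur, cur + n, Ordering::AcqRel, Ordering::Relaxed)
            .is_err()
        {
            // Contention: widen the active range (it never shrinks and
            // never reallocates), then fall back to an unconditional add.
            let _ = self.in_use.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |u| {
                if u < self.cells.len() { Some(u + 1) } else { None }
            });
            cell.fetch_add(n, Ordering::Relaxed);
        }
    }

    // Not an atomic snapshot; may lag behind concurrent adds.
    pub fn sum(&self) -> isize {
        self.cells.iter().map(|c| c.load(Ordering::Relaxed)).sum()
    }
}

// Illustrative probe derived from the thread id; a real version
// might use a randomized or rotating per-thread value.
fn probe() -> usize {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    let mut h = DefaultHasher::new();
    std::thread::current().id().hash(&mut h);
    h.finish() as usize
}
```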

* The flurry style linting beta clippy test warns about unsoundness on `K` & `V` on `BinEntry` because they do not implement `Send` and `Sync`, should I PR a fix for these before this PR gets reviewed?

Oh, interesting. Yeah, definitely try for a separate PR that adds those!

Take all the time you need, I'm eager to learn something and potentially get this merged so we'll take a look at it whenever you're ready!

❤️

@jimvdl
Contributor Author

jimvdl commented Dec 17, 2021

We _definitely_ do not want spurious tests, but it sounds like those tests should be racy already if they're seeing a race when using this new concurrent counter... Can you point to a specific test that fails?

Every test that asserts the map's length might be racy. Before the counter optimization the length of the map would always be correctly synced. Now, however, when high contention occurs the length might be a bit behind the actual number of items in the map. If a test asserts the length while the counter is still being updated, the length might be 1 behind the actual length (if that makes sense). I did try to reproduce tests that might be spurious but couldn't find any specific ones as of yet.

I wonder — could you just implement the `AddAssign` trait instead? I think your `add` only takes a `Guard` at the moment because it uses `Atomic`, but maybe it'd be an idea to just allocate a `Vec` with `num_cpu` counters from the very beginning and keep a _separate_ `AtomicUsize` that just tracks the highest index in the `Vec` we should be using? That means no resizing, which means no `Atomic`, which means no `Guard`. Want to give that a shot?

I tried to implement the AddAssign trait but got slightly stuck due to the trait requiring &mut self. add_count only has &self and something like RefCell probably wouldn't work here due to it not being Sync. Neither would Arc<Mutex<ConcurrentCounter>> because that would defeat the purpose of making the counter concurrent. If we somehow can get around the exclusive borrow requirement AddAssign should work. Any tips/thoughts on how I might go about this?

Oh, interesting. Yeah, definitely try for a separate PR that adds those!

Will do. I'll also include #98 since it's a similar problem.

Owner

@jonhoo jonhoo left a comment


I was finally able to give this a look, sorry it took so so so long!

@ibraheemdev Any chance you have some spare cycles to try benchmarking this to see if it meaningfully improves multi-core performance? Or better yet, help @jimvdl do the benchmarks themselves!

@jonhoo
Owner

jonhoo commented Mar 7, 2022

Oh, and you'll probably want to merge your changes with master, since some things have changed there in the intervening time.

As for AddAssign, I think you could probably implement it for &Counter, but it's not terribly important, and may honestly just end up proving unexpected and unergonomic in the end 😅
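A minimal sketch of that idea, using a hypothetical stand-in `Counter` type rather than the PR's actual one: implementing the trait for the *reference* type sidesteps the `&mut self` requirement, since `+=` then only needs a mutable binding to the reference while the counter itself stays behind a shared borrow.

```rust
use std::ops::AddAssign;
use std::sync::atomic::{AtomicIsize, Ordering};

// Hypothetical stand-in for the PR's counter type.
pub struct Counter(AtomicIsize);

impl Counter {
    pub fn new() -> Self {
        Counter(AtomicIsize::new(0))
    }
    pub fn get(&self) -> isize {
        self.0.load(Ordering::Relaxed)
    }
}

// The impl target is `&Counter`, not `Counter`, so `add_assign`
// receives `&mut &Counter` and only a shared borrow of the counter
// is ever required.
impl AddAssign<isize> for &Counter {
    fn add_assign(&mut self, n: isize) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
}
```

Usage then looks like `let mut c = &counter; c += 5;`, which is exactly the slightly odd shape that makes the ergonomics questionable.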

@ibraheemdev
Collaborator

Any chance you have some spare cycles to try benchmarking this to see if it meaningfully improves multi-core performance?

I may have some time next weekend. It also occurs to me that this data structure may be generally useful as a crate.

@jimvdl
Contributor Author

jimvdl commented Mar 8, 2022

As for AddAssign, I think you could probably implement it for &Counter and may honestly just end up proving unexpected and unergonomic

I think your hunch might be correct; it would mean you could increment the counter like this:

let mut c = &self.counter;
c += 5;

Which does feel a bit unergonomic, but I'm still struggling with some of Rust's basics so maybe there is a nicer way.

@codecov

codecov bot commented Mar 8, 2022

Codecov Report

Merging #101 (1bd51f5) into master (85ac469) will decrease coverage by 0.26%.
The diff coverage is 78.57%.

Impacted Files Coverage Δ
src/counter.rs 75.00% <75.00%> (ø)
src/map.rs 81.00% <85.71%> (+0.41%) ⬆️
src/node.rs 77.22% <100.00%> (-1.31%) ⬇️

Owner

@jonhoo jonhoo left a comment


Yeah, it's not super pretty. I think it's okay just leaving that out then — it's not like calling .add is that onerous!

@jimvdl
Contributor Author

jimvdl commented Mar 10, 2022

I'm also not sure how to accurately test the map's length, because it might not always be in sync with the actual number of items. It will sync up eventually once there are no more concurrent updates, but asserting the length in the meantime still causes some tests to occasionally fail. See the Java docs.

I glanced at their tests to see if they have a special way of asserting the length but couldn't find anything; maybe I missed something? What do you guys think?

- Made the resize hint check more concise.
- Moved the TODO about the CounterCell implementation to the counter module.
- Reverted the counter declaration back to the original line.
@jonhoo
Owner

jonhoo commented Mar 11, 2022

I'm also not sure how to accurately test the map's length

Hmm, how about something like

loop {
  let n = map.len();
  assert!(n <= EXPECTED, "n <= {}", EXPECTED);
  if n == EXPECTED { break; }
  std::thread::yield_now();
}

Maybe even with a max limit for the number of iterations? Probably worth sticking that in some kind of helper function for the tests too.
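That suggestion, with a max iteration limit, might look roughly like this as a test helper (names are illustrative, and the map's length is assumed to be reachable through a closure):

```rust
// Hypothetical test helper: spin until the reported length reaches
// the expected value, with an iteration cap so a genuinely wrong
// count still fails instead of hanging forever.
fn assert_len_eventually(len: impl Fn() -> usize, expected: usize) {
    const MAX_SPINS: usize = 10_000;
    for _ in 0..MAX_SPINS {
        let n = len();
        // The length may lag, but it should never overshoot.
        assert!(n <= expected, "n = {} > expected {}", n, expected);
        if n == expected {
            return;
        }
        std::thread::yield_now();
    }
    panic!("length never reached {} within {} spins", expected, MAX_SPINS);
}
```

A test would then call `assert_len_eventually(|| map.len(), EXPECTED)` after the concurrent inserts finish.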

@jimvdl
Contributor Author

jimvdl commented Mar 13, 2022

Sorry that I keep bothering you with this but I'm seriously confused. I've been re-running the tests over and over to try and catch a test that fails due to the length being off by 1 (like I saw way before) but can't find one that fails now... Before it would fail pretty often but ever since we changed the cells the problem seems to have gone away??

Since I can't reproduce the problem anymore at all we probably won't need that helper function.

Also, when I was testing I tried a sample size of 1 million parallel inserts and got this distribution: ConcurrentCounter { base: 990579, cells: [2384, 2308, 2360, 2369] }. On the one hand, summing base plus each cell value (adding += cv on every iteration of the loop) works pretty well; on the other hand, it seems you need a ridiculous number of parallel inserts to make use of the counter cells. We can still add some benchmarks to test this a little better, although it seems consistent.

@jonhoo
Owner

jonhoo commented Mar 19, 2022

Huh, how weird. Maybe the old implementation had a bug somehow? It's a good sign it's not happening frequently now though!

That distribution looks pretty good. I think benchmarks will definitely be helpful here though, and hopefully we'll see an impact at higher thread counts.

@jimvdl
Contributor Author

jimvdl commented Mar 25, 2022

I'll give adding benchmarks for this a try soon, when I have a little more time. I did open a PR for the benchmarks, as they currently do not compile; once that's resolved I'll get back to this PR.

@jimvdl
Contributor Author

jimvdl commented Apr 5, 2022

I've never really implemented benchmarks before but looking at the existing benchmarks I figured a good place to start would be comparing just a single AtomicIsize and the ConcurrentCounter.

I'm just going to include some results (I don't know if this is useful, lmk):

AtomicIsize at 1 thread:

  Lower bound Estimate Upper bound
Slope 136.70 us 136.89 us 137.11 us
Throughput 238.99 Melem/s 239.38 Melem/s 239.71 Melem/s
R² 0.9918508 0.9921105 0.9917720
Mean 137.41 us 137.87 us 138.41 us
Std. Dev. 1.6814 us 2.5705 us 3.4837 us
Median 136.69 us 136.91 us 137.26 us
MAD 573.91 ns 863.95 ns 1.3428 us

ConcurrentCounter at 1 thread:

  Lower bound Estimate Upper bound
Slope 277.78 us 278.57 us 279.64 us
Throughput 117.18 Melem/s 117.63 Melem/s 117.96 Melem/s
R² 0.9637630 0.9648086 0.9628616
Mean 277.87 us 278.61 us 279.61 us
Std. Dev. 1.7306 us 4.4850 us 6.9734 us
Median 277.45 us 277.67 us 277.84 us
MAD 740.74 ns 1.1073 us 1.3220 us

AtomicIsize at 8 threads:

  Lower bound Estimate Upper bound
Slope 528.59 us 536.34 us 544.36 us
Throughput 60.196 Melem/s 61.096 Melem/s 61.992 Melem/s
R² 0.6309893 0.6458310 0.6299712
Mean 531.82 us 537.97 us 544.15 us
Std. Dev. 29.949 us 31.549 us 32.805 us
Median 512.77 us 520.99 us 565.18 us
MAD 12.804 us 26.802 us 46.733 us

ConcurrentCounter at 8 threads:

  Lower bound Estimate Upper bound
Slope 1.0086 ms 1.0208 ms 1.0313 ms
Throughput 31.775 Melem/s 32.101 Melem/s 32.490 Melem/s
R² 0.7435502 0.7552155 0.7466318
Mean 1.0148 ms 1.0246 ms 1.0327 ms
Std. Dev. 25.988 us 45.767 us 67.823 us
Median 1.0210 ms 1.0288 ms 1.0363 ms
MAD 23.335 us 31.836 us 39.822 us

I didn't really know the best way to expose the private counter module, so for now I copied and pasted it in; that is already a point of improvement.

I hope this was at least a step in the right direction, let me know what I can improve.

@jonhoo
Owner

jonhoo commented Apr 10, 2022

Hmm, that's interesting. If I'm reading your data correctly, it seems like ConcurrentCounter is slower under contention, not faster as we'd expect 🤔 How many (real) cores do you have on the computer you ran this on?

@jimvdl
Contributor Author

jimvdl commented Apr 10, 2022

I ran this on an Intel Core i7-7700K, which has 4 cores and 8 threads. These benchmarks don't even take the actual insert operation into account, which makes me suspect that the impact of having just one AtomicIsize would be even less noticeable in practice.

@jonhoo
Owner

jonhoo commented Apr 10, 2022

Yeah, it's tricky, because where I think this would matter is when you have, say, 16, or 32 real cores, which is harder to test for. I wonder what results you'd get on your box if you ran with 4 threads though — hyperthreads tend to only really add noise for these kinds of measurements.

@jimvdl
Contributor Author

jimvdl commented Apr 10, 2022

I will run the benchmarks again using only 4 cores to see if that makes a difference; I'll get back to you on that one.

I might be able to test this on a CPU with 10 cores and see if that makes a difference. Although, let's say for a moment that having 16 or 32 cores makes a substantial difference: would the ConcurrentCounter still be useful knowing that with fewer than 16 cores you would get a performance decrease?

@jimvdl
Contributor Author

jimvdl commented Apr 14, 2022

I've tried to disable hyperthreading, but apparently my CPU doesn't have an option to disable it (as far as I can find). I did try another route: in Task Manager you can set the affinity of a running process, which I set to 4 cores instead of the 8 it had before. I'm not sure if this gave me accurate results, but here they are:

AtomicIsize at 1 thread:

  Lower bound Estimate Upper bound
Slope 129.12 us 129.23 us 129.36 us
Throughput 253.31 Melem/s 253.56 Melem/s 253.78 Melem/s
R² 0.9981254 0.9982359 0.9981086
Mean 129.14 us 129.26 us 129.40 us
Std. Dev. 379.87 ns 682.82 ns 953.36 ns
Median 128.98 us 129.03 us 129.13 us
MAD 207.45 ns 279.07 ns 362.17 ns

ConcurrentCounter at 1 thread:

  Lower bound Estimate Upper bound
Slope 231.46 us 231.60 us 231.77 us
Throughput 141.38 Melem/s 141.49 Melem/s 141.57 Melem/s
R² 0.9991705 0.9992210 0.9991435
Mean 231.51 us 231.66 us 231.85 us
Std. Dev. 409.77 ns 891.91 ns 1.2775 us
Median 231.42 us 231.47 us 231.53 us
MAD 212.07 ns 293.64 ns 343.09 ns

AtomicIsize at 4 threads:

  Lower bound Estimate Upper bound
Slope 455.77 us 477.68 us 498.68 us
Throughput 65.709 Melem/s 68.598 Melem/s 71.896 Melem/s
R² 0.2564080 0.2696591 0.2574293
Mean 449.16 us 464.91 us 480.76 us
Std. Dev. 76.443 us 80.814 us 84.013 us
Median 407.94 us 413.40 us 528.11 us
MAD 55.867 us 67.139 us 128.89 us

ConcurrentCounter at 4 threads:

  Lower bound Estimate Upper bound
Slope 742.61 us 756.28 us 769.56 us
Throughput 42.580 Melem/s 43.328 Melem/s 44.125 Melem/s
R² 0.5678183 0.5859128 0.5687718
Mean 745.44 us 753.67 us 761.84 us
Std. Dev. 35.724 us 42.078 us 48.248 us
Median 746.76 us 751.40 us 759.00 us
MAD 24.176 us 41.755 us 54.862 us

If you compare these to the 4 thread versions of the previous benchmark with hyperthreading enabled you can see a slight difference:

AtomicIsize with 4 threads: (note: hyperthreading enabled)

  Lower bound Estimate Upper bound
Slope 485.85 us 496.67 us 506.66 us
Throughput 64.674 Melem/s 65.975 Melem/s 67.444 Melem/s
R² 0.5423258 0.5583073 0.5446181
Mean 481.40 us 490.08 us 498.65 us
Std. Dev. 40.887 us 44.328 us 46.723 us
Median 470.72 us 520.55 us 523.34 us
MAD 11.299 us 16.970 us 67.557 us

ConcurrentCounter with 4 threads: (note: hyperthreading enabled)

  Lower bound Estimate Upper bound
Slope 1.0255 ms 1.0359 ms 1.0465 ms
Throughput 31.313 Melem/s 31.632 Melem/s 31.954 Melem/s
R² 0.8311427 0.8397268 0.8309221
Mean 1.0211 ms 1.0318 ms 1.0422 ms
Std. Dev. 44.870 us 54.107 us 63.172 us
Median 1.0214 ms 1.0305 ms 1.0466 ms
MAD 37.691 us 49.023 us 59.846 us

@jonhoo
Owner

jonhoo commented Apr 16, 2022

Hmm, that still looks like it's slower and less scalable, which is surprising. I'd be super curious to see on a higher-core-count machine, and ideally as a plot with error bars!

@jimvdl
Contributor Author

jimvdl commented Apr 24, 2022

I tried to get my hands on that 10 core machine but sadly couldn't use it. Anything else I can try/do?

@jonhoo
Owner

jonhoo commented Apr 24, 2022

@ibraheemdev Any chance you have a box with more cores available? I might be able to spin something up, but my schedule is pretty packed for a while 😞

@ibraheemdev
Collaborator

My machine has 8 cores; I can run the benchmarks, but I'm not sure we'll see any benefit at a low-to-medium core count.

@JackThomson2
Collaborator

Hey, I was interested in this PR and had a play around; I found that using the nightly-only #[thread_local] attribute gave some promising results:

jack_counter/1          time:   [158.47 us 159.09 us 160.13 us]
                        thrpt:  [204.63 Melem/s 205.97 Melem/s 206.77 Melem/s]
jack_counter/4          time:   [519.68 us 526.90 us 533.70 us]
                        thrpt:  [61.398 Melem/s 62.190 Melem/s 63.054 Melem/s]
jack_counter/8          time:   [421.63 us 427.01 us 432.60 us]
                        thrpt:  [75.746 Melem/s 76.738 Melem/s 77.718 Melem/s]

Compared to the AtomicIsize

atomic_counter/1        time:   [156.52 us 156.65 us 156.83 us]
                        thrpt:  [208.95 Melem/s 209.18 Melem/s 209.35 Melem/s]
atomic_counter/4        time:   [575.31 us 576.61 us 577.85 us]
                        thrpt:  [56.707 Melem/s 56.828 Melem/s 56.958 Melem/s]
atomic_counter/8        time:   [592.14 us 592.63 us 593.18 us]
                        thrpt:  [55.241 Melem/s 55.293 Melem/s 55.338 Melem/s]

This was using the benchmarking setup from this PR and I have the code here if you want to have a look: https://github.com/JackThomson2/fast-counter/blob/master/src/lib.rs
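For readers following along, a rough sketch of the thread-local idea on stable Rust (the linked crate uses the nightly `#[thread_local]` attribute; the stable `thread_local!` macro is the analogue used here, and all names are illustrative): each thread caches its own cell index, so repeated increments from one thread keep hitting the same cell without contending with other threads.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicIsize, AtomicUsize, Ordering};

// Global allocator of per-thread cell indices (illustrative).
static NEXT_INDEX: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Each thread lazily claims an index the first time it adds.
    static MY_INDEX: Cell<Option<usize>> = Cell::new(None);
}

pub struct ThreadLocalCounter {
    cells: Vec<AtomicIsize>,
}

impl ThreadLocalCounter {
    pub fn new(n_cells: usize) -> Self {
        ThreadLocalCounter {
            cells: (0..n_cells.max(1)).map(|_| AtomicIsize::new(0)).collect(),
        }
    }

    pub fn add(&self, n: isize) {
        let idx = MY_INDEX.with(|i| match i.get() {
            Some(idx) => idx,
            None => {
                let idx = NEXT_INDEX.fetch_add(1, Ordering::Relaxed);
                i.set(Some(idx));
                idx
            }
        });
        // Threads wrap around if there are more threads than cells.
        self.cells[idx % self.cells.len()].fetch_add(n, Ordering::Relaxed);
    }

    // Not an atomic snapshot; may lag behind concurrent adds.
    pub fn sum(&self) -> isize {
        self.cells.iter().map(|c| c.load(Ordering::Relaxed)).sum()
    }
}
```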

@jimvdl
Contributor Author

jimvdl commented Jun 29, 2022

Those results are promising indeed! I like the solution you went with; definitely worth looking into more. I'm wondering why the thread-local approach is more performant compared to my solution. If you eventually publish this as a crate, it could be used in Flurry as a dependency instead.

@JackThomson2
Collaborator

I'll have a look at getting this published and at comparing the performance to the non-nightly thread_local! macro.

I originally had a look at optimising your approach; the few findings that helped speed it up were:

  • Ensuring the number of CPUs was rounded up with next_power_of_two(), eliminating the divide in let c = &self.cells[index as usize % self.cells.len()];
  • Adding another base = self.base.load(Ordering::SeqCst); at the end of the loop (line 48), which made it less likely to fail; I assume that under higher contention this value will have been updated by the time we come back around.
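The first point can be sketched like this (hypothetical helper; assumes the index comes from a hash or probe value):

```rust
// If the cell count is a power of two, the `%` in
// `index % self.cells.len()` can be replaced with a bitwise AND,
// avoiding an integer division on the hot path.
fn cell_index(probe: usize, n_cells: usize) -> usize {
    debug_assert!(n_cells.is_power_of_two());
    probe & (n_cells - 1) // equivalent to probe % n_cells here
}
```

The cell count would then be sized with something like `num_cpus::get().next_power_of_two()` (the `num_cpus` crate being an assumption here).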

I think the reason it's not as fast is that you just moved the contention to the next cell. I ran an experiment using wyrand as a pseudo-random generator to pick a random cell, and this also helped slightly.

@JackThomson2
Collaborator

JackThomson2 commented Jul 26, 2022

Here are the results for 2-16 cores with the different approaches:

atomic_counter/2        time:   [282.18 us 285.66 us 289.18 us]
                        thrpt:  [113.31 Melem/s 114.71 Melem/s 116.12 Melem/s]

atomic_counter/4        time:   [324.25 us 326.41 us 328.51 us]
                        thrpt:  [99.749 Melem/s 100.39 Melem/s 101.06 Melem/s]

atomic_counter/8        time:   [345.57 us 346.09 us 346.61 us]
                        thrpt:  [94.539 Melem/s 94.681 Melem/s 94.824 Melem/s]

atomic_counter/16       time:   [414.53 us 415.65 us 416.83 us]
                        thrpt:  [78.612 Melem/s 78.836 Melem/s 79.048 Melem/s]


==============================================
==============================================


fast_counter/2          time:   [370.83 us 377.43 us 383.15 us]
                        thrpt:  [85.522 Melem/s 86.818 Melem/s 88.364 Melem/s]

fast_counter/4          time:   [338.49 us 345.35 us 351.70 us]
                        thrpt:  [93.171 Melem/s 94.882 Melem/s 96.807 Melem/s]

fast_counter/8          time:   [249.25 us 254.46 us 259.47 us]
                        thrpt:  [126.29 Melem/s 128.78 Melem/s 131.47 Melem/s]

fast_counter/16         time:   [163.34 us 169.76 us 176.39 us]
                        thrpt:  [185.77 Melem/s 193.03 Melem/s 200.61 Melem/s]


==============================================
==============================================


fast_counter thread local macro/2
                        time:   [388.31 us 392.67 us 396.95 us]
                        thrpt:  [82.549 Melem/s 83.449 Melem/s 84.387 Melem/s]

fast_counter thread local macro/4
                        time:   [364.32 us 369.14 us 373.44 us]
                        thrpt:  [87.746 Melem/s 88.769 Melem/s 89.943 Melem/s]

fast_counter thread local macro/8
                        time:   [254.32 us 259.57 us 265.15 us]
                        thrpt:  [123.58 Melem/s 126.24 Melem/s 128.84 Melem/s]

fast_counter thread local macro/16
                        time:   [172.06 us 175.66 us 179.73 us]
                        thrpt:  [182.32 Melem/s 186.54 Melem/s 190.44 Melem/s]

I will look at adding inline attributes to the methods; I don't think these are being inlined at the moment. When I manually copied them into the test file, the 2- and 4-core results were much closer.

@jimvdl
Contributor Author

jimvdl commented Aug 3, 2022

All the results look way better than a single atomic counter. It has some additional overhead on 2-core CPUs, but honestly I don't think that would be an issue since everyone has at least 4 cores anyway.

Do you want to open a PR for this instead? If you do, lmk and I'll close this one.

@JackThomson2
Collaborator

Even better news: my suspicion around the inlining was correct. When I added #[inline] we're much closer on 2 cores and faster on 4 cores!

atomic_counter/2        time:   [290.27 us 293.65 us 297.26 us]
                        thrpt:  [110.23 Melem/s 111.59 Melem/s 112.89 Melem/s]

atomic_counter/4        time:   [320.62 us 323.01 us 325.27 us]
                        thrpt:  [100.74 Melem/s 101.45 Melem/s 102.20 Melem/s]

atomic_counter/8        time:   [343.33 us 344.14 us 344.98 us]
                        thrpt:  [94.985 Melem/s 95.217 Melem/s 95.442 Melem/s]

atomic_counter/16       time:   [410.49 us 411.71 us 412.99 us]
                        thrpt:  [79.344 Melem/s 79.590 Melem/s 79.827 Melem/s]

------------------------------------------------------------------------------

fast_counter_nightly/2  time:   [314.05 us 315.63 us 317.16 us]
                        thrpt:  [103.32 Melem/s 103.82 Melem/s 104.34 Melem/s]

fast_counter_nightly/4  time:   [292.82 us 294.93 us 296.72 us]
                        thrpt:  [110.44 Melem/s 111.10 Melem/s 111.91 Melem/s]

fast_counter_nightly/8  time:   [209.61 us 215.30 us 221.28 us]
                        thrpt:  [148.08 Melem/s 152.20 Melem/s 156.33 Melem/s]

fast_counter_nightly/16 time:   [157.28 us 160.06 us 163.00 us]
                        thrpt:  [201.04 Melem/s 204.72 Melem/s 208.34 Melem/s]

------------------------------------------------------------------------------


fast_counter_stable/2   time:   [400.89 us 407.77 us 413.33 us]
                        thrpt:  [79.277 Melem/s 80.360 Melem/s 81.739 Melem/s]

fast_counter_stable/4   time:   [369.10 us 372.90 us 376.90 us]
                        thrpt:  [86.942 Melem/s 87.873 Melem/s 88.778 Melem/s]

fast_counter_stable/8   time:   [247.36 us 253.10 us 258.51 us]
                        thrpt:  [126.76 Melem/s 129.47 Melem/s 132.47 Melem/s]

fast_counter_stable/16  time:   [162.17 us 166.01 us 170.13 us]
                        thrpt:  [192.60 Melem/s 197.39 Melem/s 202.06 Melem/s]

I'll see if I can get time to make the PR to change to this!

@jimvdl
Contributor Author

jimvdl commented Aug 15, 2022

Closing in favour of #109.

@jimvdl jimvdl closed this Aug 15, 2022
@jonhoo
Owner

jonhoo commented Aug 20, 2022

Great work folks, thanks for picking this up and driving it on your own!

Successfully merging this pull request may close these issues.

Implement the sharded counters optimization
4 participants