opt: farmer cache flatten piece_caches #2925
Conversation
This is just an internal structural change that our existing test cases can cover.
Force-pushed d2d0551 to 3ca4a28
I like that it became simpler and even faster for lookups since we no longer need to iterate over caches (though there should not have been many of them anyway).
However, there was a reason it was implemented the way it was; let me explain.
Offsets were stored in flat data structures so that we did not need to store the cache index in every single entry unnecessarily.
For example, for 1T worth of pieces we'll store ~1M 2-byte cache index values, which means we use 2M of RAM just on those numbers, while previously it was not using any memory at all.
In fact, right now for 1T of cache it'll likely use 4M of RAM due to the memory alignment of the FarmerCacheOffset data structure: another field is u32, which causes the whole data structure to be aligned to 4 bytes even though 2 out of 8 bytes will not be used. This can be mitigated by forcing the alignment of the data structure to 2 bytes.
Another performance concern is that free offsets are no longer distributed across different caches, which means worse cache read performance, potential farming issues, and a slower worst-case piece cache re-sync when one of the existing farms is removed from the farmer and the cache wasn't filled fully.
I think we can find a creative way of fixing the second issue, but the first issue with offsets is kind of inherent.
I'm not sure how big of a deal it is in practice, it might still be worth doing, but it was not arbitrary that it was all stored the way it was.
At the same time, 1T of cache for a local farmer by default means ~100T of space pledged, so it may not be a horrible thing to have +2M of memory usage on top of already quite high memory usage.
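As a rough illustration of the alignment point above (field names are assumptions, not the actual FarmerCacheOffset definition), a u16 index next to a u32 offset pads the struct to 8 bytes unless its alignment is forced down to 2:

```rust
// Hypothetical layout mirroring the discussion above, not the real type.
struct PaddedOffset {
    cache_index: u16,  // 2 bytes
    piece_offset: u32, // 4 bytes, forces 4-byte alignment -> 2 bytes of padding
}

// Forcing 2-byte alignment removes the padding at the cost of unaligned loads.
#[repr(C, packed(2))]
struct PackedOffset {
    cache_index: u16,
    piece_offset: u32,
}

fn main() {
    assert_eq!(std::mem::size_of::<PaddedOffset>(), 8);
    assert_eq!(std::mem::size_of::<PackedOffset>(), 6);
}
```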
@@ -176,38 +190,38 @@ where
    // TODO: Consider implementing optional re-sync of the piece instead of just forgetting
    WorkerCommand::ForgetKey { key } => {
        let mut caches = self.piece_caches.write().await;
        let Some(offset) = caches.stored_pieces.remove(&key) else {
            // key not exist.
NIT: please use this formatting going forward: `// Key not exist` instead of `// key not exist.`, to be consistent with existing comments.
Ok(None) => {
    warn!(
        %cache_index,
        cache_offset = %offset.piece_offset,
I'd extract `piece_offset` into a variable like `cache_index` and call it `piece_offset` here as well. It is harder to debug when the same thing is called `piece_offset` in one place and `cache_offset` in another.
stored_pieces.push(state.stored_pieces);
state.free_offsets.clear();
free_offsets.push(state.free_offsets);
const MAX_CACHES_NUM: usize = u16::MAX as usize;
I'd create a type alias `type CacheIndex = u16` and use it both here (you'll be able to move this constant out, BTW) and in the `FarmerCacheOffset` data structure, so that a reviewer can more easily see that those things are in fact related. For farms we even have a generic that is `u8` for local farms and `u16` for the cluster setup, because the type size here impacts RAM usage.
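A minimal sketch of what that alias could look like (the surrounding items are simplified stand-ins, not the real definitions):

```rust
// Single place that decides how wide a cache index is.
type CacheIndex = u16;

// The constant can now be derived from the alias instead of hard-coding u16.
const MAX_CACHES_NUM: usize = CacheIndex::MAX as usize;

// The offset type visibly shares the same index width.
#[derive(Debug, Clone, Copy)]
struct FarmerCacheOffset {
    cache_index: CacheIndex,
    piece_offset: u32,
}
```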
let mut backends = Vec::new();
#[allow(clippy::mutable_key_type)]
let mut stored_pieces = HashMap::new();
let mut free_offsets = VecDeque::new();
Please do not use `::new()` if you can avoid it; it results in bad performance and bad memory usage. See how the previous code carefully preallocated all data structures to the correct size beforehand; in fact, it even reused previous memory allocations rather than making fresh ones. I think it would be good to preserve that.
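A small sketch of the pre-allocation idea, under the assumption that the total capacity across all backends is known up front (the key and offset types here are placeholders):

```rust
use std::collections::{HashMap, VecDeque};

// Placeholder element types standing in for the real record key and offset.
type Key = [u8; 32];
type Offset = (u16, u32);

fn preallocate(total_capacity: usize) -> (HashMap<Key, Offset>, VecDeque<Offset>) {
    // Reserving once up front avoids repeated re-allocation (and re-hashing for
    // the map) while the combined cache state is being filled.
    let stored_pieces = HashMap::with_capacity(total_capacity);
    let free_offsets = VecDeque::with_capacity(total_capacity);
    (stored_pieces, free_offsets)
}
```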
return;
// Build cache state of all backends
for (index, new_cache) in new_piece_caches.into_iter().enumerate() {
I understand that now there is a single data structure containing everything, but it would be really nice for initialization performance to remain high.
You have removed `run_future_in_dedicated_thread`, meaning all the caches will now be processed sequentially instead of concurrently, which will massively slow down cache initialization performance, especially for large farmers.
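A rough, illustrative sketch of keeping the per-backend scan concurrent; the real code used run_future_in_dedicated_thread, which additionally moves the work onto dedicated threads, so this is only a simplified approximation:

```rust
use futures::future;

// Each backend's scan is its own future; `join_all` drives them concurrently
// instead of awaiting one backend after another. The scan body is a stub here.
async fn scan_backends(max_num_elements_per_backend: Vec<u32>) -> Vec<Vec<u32>> {
    let scans = max_num_elements_per_backend
        .into_iter()
        .enumerate()
        .map(|(cache_index, max_num_elements)| async move {
            // Stand-in for reading the backend's contents at `cache_index`.
            let _ = cache_index;
            (0..max_num_elements).collect::<Vec<u32>>()
        });
    future::join_all(scans).await
}
```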
if let Some(capacity_used) =
    piece_caches_capacity_used.get_mut(usize::from(offset.cache_index))
{
    *capacity_used += 1;
}
I'd write it like this due to the pre-allocation above:
piece_caches_capacity_used[usize::from(offset.cache_index)] += 1;
Though this is not correct logic: it should only be increased if the key is not a duplicate (which is exactly what `piece_indices_to_store.remove(key).is_none()` is for, below).
let Some(offset) = cache.free_offsets.pop_front() else {
    return false;
};
let Some(offset) = caches.free_offsets.pop_front() else {
This will work, but it loses the property where pieces were distributed across multiple caches for performance reasons. Unless offsets interleave this will likely result in higher read load on one disk and lower on another, which may negatively impact farming performance.
Let me think about it, I should somehow make the cache load as balanced as possible, and I think we can do that.
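One possible shape of that balancing, sketched very roughly and not taken from the project's code: interleave the free offsets of the backends round-robin so that consecutive writes, and therefore later reads, spread across disks.

```rust
use std::collections::VecDeque;

// `(cache_index, piece_offset)` pairs stand in for the real offset type.
fn interleave_free_offsets(
    mut per_backend: Vec<VecDeque<(u16, u32)>>,
) -> VecDeque<(u16, u32)> {
    let mut combined = VecDeque::new();
    loop {
        let mut took_any = false;
        for offsets in &mut per_backend {
            // Take one offset from each backend per round.
            if let Some(offset) = offsets.pop_front() {
                combined.push_back(offset);
                took_any = true;
            }
        }
        if !took_any {
            return combined;
        }
    }
}
```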
);
break;
The fact that one backend has errors shouldn't prevent pieces from being stored in other backends. See how the previous code iterated over all caches to try to find one that is operational. We still need that loop here, but the issue is that we can't iterate over different backends anymore, so it will be less performant either way (we will likely need to cache problematic backend indices to avoid hitting them over and over again).
You're right. It could be a disaster here.
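A rough sketch of the "remember problematic backends" idea from the comment above (the names and the store callback are placeholders):

```rust
use std::collections::HashSet;

// Try backends in order, skipping indices that already failed, and remember
// new failures so later pieces don't hit the same broken backend repeatedly.
fn store_with_fallback(
    backend_count: usize,
    problematic: &mut HashSet<usize>,
    mut try_store: impl FnMut(usize) -> bool,
) -> Option<usize> {
    for cache_index in 0..backend_count {
        if problematic.contains(&cache_index) {
            continue;
        }
        if try_store(cache_index) {
            return Some(cache_index);
        }
        problematic.insert(cache_index);
    }
    None
}
```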
);
cache.stored_pieces.insert(record_key, offset);
}
// for (cache_index, cache) in caches.iter_mut().enumerate() {
NIT: Don't commit commented-out code, please
};
let cache_index = usize::from(offset.cache_index);
let piece_offset = offset.piece_offset;
let Some(backend) = caches.backends.get(cache_index).cloned() else {
Does it need to be cloned though?
The backend is wrapped in an Arc. I clone it here in order to avoid lending out an immutable borrow.
Why do you need to avoid it? BTW, you can leave multiple comments as a "Review" if you switch to the "Files changed" tab of the PR instead; it greatly decreases the number of notifications for repository maintainers.
Oh yeah, there's no need to avoid it, it's a mistake.
> BTW, you can leave multiple comments as a "Review" if you switch to the "Files changed" tab of the PR instead; it greatly decreases the number of notifications for repository maintainers.
Okay, I'll take care of that. (Also, I'm very sorry, but I submitted the commit with the wrong author; can I just override that?)
Sure, please force-push with name change and no other changes so I don't need to re-review it again
I thought about it some more and I think we can keep free offsets separate for balancing purposes, but combine stored pieces. This will also reduce memory usage for large farmers in the meantime since most of the offsets will be free for them and will use a more efficient layout in memory.
@nazar-pc I actually had the crazy idea of stitching together all the stored and free cache entries to get a fixed-length, contiguous array (0..piece_cache_capacity_total). This can be compactly stored in a bitmap. By storing an additional Vec<max_num_elements>, the backend can be deduced from the index in the bitmap (because of the contiguous splicing). Also, 0/1 can be used to mark stored or free. 1T cache -> 2^20 pieces -> 2^20 bits -> 2^17 bytes -> 2^7 KiB (128 KiB).
But let me fix the problem mentioned above first.
Stored pieces need to be in a hashmap due to lookups, we can't replace the hashmap with a vector and still get efficient lookups. As for knowing free offsets, we can potentially compact it to a single number per cache backend by simply storing how many offsets we've occupied, because we occupy offsets sequentially. The only issue there is that we'll not be able to store offsets that became free due to read errors, we'll need to use a separate data structure for that, which can be tiny and only used in this emergency situation. Let's probably not do that as part of this PR though, one step at a time so reviewing is manageable.
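A minimal sketch of that compact bookkeeping (all names are assumptions): per backend, a single counter covers the sequentially occupied range, plus a normally empty list for offsets freed early by read errors.

```rust
// Per-backend free-offset state as described above; offsets are plain u32 here.
struct BackendFreeOffsets {
    used_capacity: u32,
    total_capacity: u32,
    // Only populated in the emergency situation (offsets freed by read errors).
    dangling_free_offsets: Vec<u32>,
}

impl BackendFreeOffsets {
    fn next_free(&mut self) -> Option<u32> {
        // Prefer re-using an offset that became free due to an error.
        if let Some(offset) = self.dangling_free_offsets.pop() {
            return Some(offset);
        }
        // Otherwise hand out the next never-used offset, if any remain.
        if self.used_capacity < self.total_capacity {
            let offset = self.used_capacity;
            self.used_capacity += 1;
            return Some(offset);
        }
        None
    }
}
```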
Yeah, I agree. I'll clean up the code for review and subsequent submission.
Force-pushed 3ca4a28 to 392c91a
I think it looks mostly good, but I still have two requests here.
I'd like to see yet another PR where you do the refactoring separately from features
For example, in the last commit you did code refactoring, some renaming, and new features all at once.
There is a lot of noise that is hard to review, and the new code, even though it looks like the old code, is subtly different, which makes the diff unnecessarily larger and harder to review.
For example, despite the similar structure, the new code uses `backend` instead of `new_cache` and `cache_index` instead of `index`; it also extracted `new_cache.max_num_elements()` into a `max_num_elements` variable.
While I agree with all of those changes, they are useless noise when I just want to see what logic you have changed (if any). Currently, both the diff of the commit and the final diff are annoyingly hard to read.
See for example how #2912 moved things around without actually changing anything in terms of how code works such that subsequent PRs are easier to read.
Or #2920 that was built on top with a few commits that intentionally change a few things logically so you don't need to analyze changes in all 15 files at once.
Or see how #2626 has a few things renamed, but renamings are isolated to separate commits, so they don't really need to be reviewed.
If you create a separate PR that renames things into what they will be named in the upcoming PR, it will decrease the diff and time to review significantly.
Right now my brain simply wants to give up reading because things have changed where they didn't need to, at least not at the same time with other changes.
I can ignore whitespace changes in diff, but I can't ignore renamings unfortunately.
Since we will want to refactor the way free offsets work, we don't need to change them in this PR
Free offsets will likely be replaced with a number/pair of numbers for each cache rather than a flat vector of actual offsets.
As such, we don't need to change that part of the logic in this PR and accept a temporary degradation of cache performance; we can delay this to a separate PR that we already know we'll want to do.
@@ -86,7 +86,7 @@ impl KnownCaches {
pub(super) async fn maintain_caches(
    cache_group: &str,
    nats_client: &NatsClient,
    farmer_cache: FarmerCache,
    farmer_cache: FarmerCache<u16>,
Please use a type alias just like we do for `FarmIndex`. A simple `type CacheIndex = u16` will do. It is annoying to have `u16` sprinkled in multiple places without a direct link between them.
where
    FarmIndex: Hash + Eq + Copy + fmt::Debug + Send + Sync + 'static,
    usize: From<FarmIndex>,
    CacheIndex: Hash + Eq + Copy + fmt::Debug + Default + Send + Sync + 'static,
Why is `Default` required?
I'm very sorry for the disaster my bad habits have caused you. I'll take care of this, and keep my commits as small as possible.
Force-pushed 3d00acd to 175ab9c
Force-pushed 175ab9c to 5a43d33
I squashed your commits into one and added one more commit before it to simplify the squashed diff (you can see the force-push diff in https://github.com/subspace/subspace/compare/175ab9cd19b48fdce7864942fe8507a8d1ada307..5a43d331935b1ddc4a5eb4fc7f7b539ac25e328f). I also pushed one more commit on top that fixes locking (the piece cache read lock was held unnecessarily before taking the plot cache lock) and reordered code closer to what it was before, so that in the final diff the code looks very similar and only the actually changed part related to the piece cache is shown as changed. I will squash that last commit in; I just wanted to show what I have changed and why. Now I think we're close to merging this, but free offsets need to be refactored first or else we'll have a regression; here is how I think that should be done (and it can be pushed into this PR):
No force pushes with rebase here, please, it makes reviewing much harder than it needs to be.
Force-pushed 9bd96bf to 231ab78
Do I understand correctly that you just rebased it on …?
I pushed it from the wrong branch, and after I found out I reset it back. @nazar-pc Only the last commit makes sense; the previous ones were just tiny refactorings and renames.
Almost there; just the accounting for used capacity/dangling free offsets is incorrect.
@@ -367,6 +431,7 @@ where
    let offset = FarmerCacheOffset::new(cache_index, piece_offset);
    match maybe_piece_index {
        Some(piece_index) => {
            *used_capacity = piece_offset.0 + 1;
This naive logic is actually not correct. See, you get `maybe_piece_index`, and there is nothing in the API that guarantees that used piece offsets do not have unused offsets in between them. In fact, this is exactly what dangling offsets are supposed to represent: free offsets dangling inside of otherwise used capacity.
So used capacity is the whole range of used indices even if there are "holes" in it, while this only increases the index for non-dangling offsets, which will result in an inconsistent data structure.
Yes, so what's actually being done here is to record the last_used_offset, as well as all free_offsets for the current cache_backend (including dangling_free_offsets; see line 439, just below). Later on, we'll unify and compress them.
Hm... I see. This does work correctly, though it will potentially allocate a large `cache_stored_pieces` that would otherwise be empty or near-empty.
I will calculate `dangling_free_offsets` inline. But I think the handling of the dangling offsets is correct.
@@ -414,7 +479,8 @@ where
    let backend = cache.backend;
    let free_offsets = cache.cache_free_offsets;
    stored_pieces.extend(cache.cache_stored_pieces.into_iter());
    dangling_free_offsets.extend(free_offsets.into_iter());
    dangling_free_offsets
        .extend(backend.dangling_free_offsets(free_offsets).into_iter());
@nazar-pc Here we extract all free_offsets that are before last_used_offset (they are dangling_offsets)
I may massage this a bit before merging, but it looks good otherwise, thanks a lot!
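A tiny sketch of the relationship being discussed, with placeholder types: used capacity spans the whole occupied range, and any free offset below it is dangling, while everything at or above it belongs to the untouched tail.

```rust
// Split free offsets into (dangling, untouched-tail) relative to used capacity.
fn split_free_offsets(used_capacity: u32, free_offsets: Vec<u32>) -> (Vec<u32>, Vec<u32>) {
    free_offsets
        .into_iter()
        .partition(|&offset| offset < used_capacity)
}
```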
Force-pushed c3d136e to 9cc17ea
Looks good to me
Let's go, thank you!
Thank you very much for your contributions @tediou5 through this PR and several other PRs you've submitted throughout July! As a token of our appreciation, we would love to reward you with some USDC as part of our Contribution Contest. Please fill out this form and we will send the reward your way. Sorry about the delay in sending you USDC for your contributions over the past month; we're going to sum them up! Thanks again, and looking forward to your future contributions!
This is the first step of #1769, which is to collect all the `stored_pieces` and `free_offsets` of all the caches together, so that we don't need to resort to `UniqueRecordBinaryHeap` to manage the synchronisation of the individual caches.

Code contributor checklist: