Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve content propagation in sync #1480

Merged
merged 22 commits into from
Sep 22, 2023
Merged

Conversation

Frando
Copy link
Member

@Frando Frando commented Sep 14, 2023

Description

This improves content propagation for document sync by implementing the following features.

  • feat: allow to send gossip to neighbors only
  • feat: provider and candidate roles for downloader
  • feat: transfer content status during set reconciliation
  • feat: inform neighbors about finished content downloads

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

@b5 b5 added this to the v0.6.0 milestone Sep 14, 2023
@Frando Frando force-pushed the sync-content-propagation branch 2 times, most recently from fe9dc20 to a6773be Compare September 14, 2023 13:20
@Frando
Copy link
Member Author

Frando commented Sep 14, 2023

sync_full_basic times out. I think it's due to how I changed the downloader, which is incorrect: I changed the delay for request scheduling to 0 in the case of peers marked as Provider. However when I'm rereading the code now, I think I'm noticing that this delay currently is intended to cover also the time it takes to establish the connection. So it fails, because the connection is not established in zero time. @divagant-martian maybe you can have a look at the commit that changes the downloader, maybe you have a good idea on how to wire this correctly without having to change too much of the downloader logic.

@divagant-martian
Copy link
Contributor

divagant-martian commented Sep 14, 2023

upon receiving the request you can only start it right away if the provider is connected. what we need to do is, when a provider has capacity (which includes ofc when it first connects) to get the next hash if provides eagerly, so changes there are a bit more involved. My plan was to store this in the ProviderMap as a HashMap<PublicKey, VecDeque<Hash>> everytime a hash is tried for this peer the Hash is moved to the back of the queue, so that we try different hashes for a single provider. A peer from the map would be removed when all it's provided hashes are removed, and hashes are removed when downloads are done.

In this way we combine needing to have delays on retries and eager downloads from known providers. I know you would like to consider delays per (hash, peer) but I would wait until we see if this is necessary since that's more overhead than what's potentially necessary to effectively get a download.

Also, I think it's a reasonable option to limit the PR up to where we define the peer priority and handle eager downloads in another one since what's in here is a good improvement per se already. This way you are not blocked by the changes of the downloader in the meantime

@Frando
Copy link
Member Author

Frando commented Sep 18, 2023

I added two commits:

  • make the downloader start downloading hashes added with PeerRole:::Provider immediately. Implements @divagant-martian's suggestion from the previous commit.
  • Expose the distance (in hops) that a gossip message travelled. Use this to treat sync gossip messages with a distance of 0 (i.e. received from author directly) to use PeerRole::Provider (i.e. assume that the author has the content available).

With these two changes, tests should now pass! And all artificial delays for content downloads in the happy case are gone now.

I think this is ready for review now. Noted some other things we might do to improve sync, but those should go into new PRs.

@Frando Frando force-pushed the sync-content-propagation branch 2 times, most recently from 32eff14 to 6f5ad3b Compare September 18, 2023 08:49
@Frando Frando force-pushed the sync-content-propagation branch from 6f5ad3b to dfba9bb Compare September 18, 2023 08:55
@Frando Frando changed the title (WIP) feat: Improve content propagation in sync feat: Improve content propagation in sync Sep 18, 2023
@Frando Frando mentioned this pull request Sep 18, 2023
29 tasks
Copy link
Contributor

@divagant-martian divagant-martian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just starting here

@Frando Frando force-pushed the sync-content-propagation branch 2 times, most recently from 49e2d40 to 6f67690 Compare September 20, 2023 13:53
Copy link
Contributor

@divagant-martian divagant-martian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I can't vouch more for the sync related code but gossip and downloads look generally good. Comments are minor

@Frando Frando force-pushed the sync-content-propagation branch from 6f67690 to d6ac046 Compare September 21, 2023 11:25
@Frando
Copy link
Member Author

Frando commented Sep 21, 2023

Thanks for the review. Addressed the comments, let's see what CI says :)

Copy link
Contributor

@divagant-martian divagant-martian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@divagant-martian divagant-martian added this pull request to the merge queue Sep 21, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Sep 21, 2023
@divagant-martian divagant-martian added this pull request to the merge queue Sep 21, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 21, 2023
@Frando Frando enabled auto-merge September 21, 2023 23:26
@Frando Frando added this pull request to the merge queue Sep 21, 2023
Merged via the queue into main with commit 49bde4f Sep 22, 2023
github-merge-queue bot pushed a commit that referenced this pull request Oct 6, 2023
…ases (#1584)

## Description

We weren't removing the hash of a request that was scheduled, became
ready but couldn't be executed. This caused getting the next hash for a
provider to pop a hash no longer necessary.

I added an invariant check for the provider map and it seems to be we
missed a couple removals in #1480 so this fixes removal in cancellation
as well

## Notes & open questions
one small change, seemingly unrelated, is that the `debug_assert` now
uses the debug version of the hash instead of display. This is because
throughout the file we log the kind in debug, which in turn uses the
debug impl for the hash. This difference made it a bit harder to follow
the logs because the printed hash didn't match those in the rest of the
logs. The change makes this consistent

## Change checklist

- [x] Self-review.
- [ ] Documentation updates if relevant.
- [x] Tests if relevant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants