Support for responding to a LocalStateQuery demand for a snapshot of big ledger peers #1067

Merged 4 commits on Oct 23, 2024

Contributor

@crocodile-dentist crocodile-dentist commented Apr 18, 2024

This change equips the LocalStateQuery mini-protocol app to respond to a client request (e.g. from cardano-cli) for a snapshot of big ledger peers, taken at the current tip. A new query tag, GetBigLedgerPeerSnapshot, is added to BlockQuery. The snapshot can be loaded later by a node, and the network layer can use it to help sync a node from a blank or stale state. The diffusion layer is expected to make heavy use of these relays when bootstrapping from scratch in Genesis consensus mode.
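
To illustrate how a query tag can fix its own result type, here is a minimal, self-contained sketch. It is not the actual ouroboros-consensus code: `Query`, `LedgerState`, `PeerSnapshot`, `GetEpochNo` and `answerQuery` are simplified stand-ins invented for illustration, and only the tag name `GetBigLedgerPeerSnapshot` comes from this PR.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical snapshot payload: accumulated relative stake paired with
-- the relay addresses of one pool. The real type is richer than this.
newtype PeerSnapshot = PeerSnapshot [(Rational, [String])]
  deriving (Eq, Show)

-- Hypothetical, heavily simplified ledger state.
data LedgerState = LedgerState
  { lsEpoch    :: Int
  , lsBigPeers :: [(Rational, [String])]
  }

-- Each constructor is a query tag; its index is the type of the answer.
data Query result where
  GetEpochNo               :: Query Int
  GetBigLedgerPeerSnapshot :: Query PeerSnapshot

-- Answering a query is a pure function of the acquired ledger state.
-- Matching on the tag tells the type checker which result is expected.
answerQuery :: LedgerState -> Query result -> result
answerQuery st q = case q of
  GetEpochNo               -> lsEpoch st
  GetBigLedgerPeerSnapshot -> PeerSnapshot (lsBigPeers st)
```

With this shape, adding a tag means adding one constructor and one case branch; the compiler flags any handler that does not cover the new tag when incomplete-pattern warnings are enabled.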

@crocodile-dentist crocodile-dentist self-assigned this Apr 18, 2024
@crocodile-dentist crocodile-dentist requested a review from a team as a code owner April 18, 2024 07:06
@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch from 3f419fa to 5ef2b15 Compare April 18, 2024 08:20
Member

@amesgen amesgen left a comment

Looks good, only minor comments.

FTR: This builds on top of IntersectMBO/ouroboros-network#4850

@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch 2 times, most recently from fc9cfa7 to 7704377 Compare April 18, 2024 13:06
@@ -434,6 +447,11 @@ instance (ShelleyCompatible proto era, ProtoCrypto proto ~ crypto)
SL.queryCommitteeMembersState coldCreds hotCreds statuses st
GetFilteredVoteDelegatees stakeCreds ->
getFilteredVoteDelegatees st stakeCreds
GetPeerSnapshot ->
Contributor

@nfrisby nfrisby Apr 18, 2024

This seems relatively inexpensive, but do you know roughly whether there's a risk of a CPU and/or allocation spike when handling this query, for example?

Contributor Author

I had not analyzed this - as far as calculating the big ledger peers from the whole, it is just some folds and sorting, with strictness applied where appropriate, so it should be well behaved for reasonable input. I'm not sure about the workload of fetching raw peers from the ledger. Should I take a closer overall look?

Contributor

@nfrisby nfrisby Apr 22, 2024

> I'm not sure about the workload of fetching raw peers from the ledger. Should I take a closer overall look?

Perhaps other members of the Networking Team (e.g. Marcin? Armando?) have already worried about that being "too expensive", since "getting ledger peers" is pretty primary to the Diffusion Layer?


We don't need to pin it down exactly. But it'd be regrettable if some SPO's system was performing poorly because they didn't know that they shouldn't be sending this query too often. So we should be able to document the "rough cost" or some such if that sort of thing is a risk.

Member

The costs described in #1067 (comment) seem completely fine and in line with (or below) what other queries require.

@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch 3 times, most recently from 1a5a8bd to be3cf42 Compare April 24, 2024 10:08
@crocodile-dentist
Contributor Author

crocodile-dentist commented May 31, 2024

I've looked into the impact of this query on CPU and memory use on a fully synced node without any peers connected, and the results are as follows. Measurements were taken every second for a duration of 5 minutes on an [email protected] GHz with 32 GB of RAM.

5-minute baseline plot when idling:
[plot: idle]

5 minutes of snapshot requests, where the node is queried at random intervals of 0 to 1 seconds:
[plot: snapshot]

There isn't any discernible impact on memory use: no spikes or significant/unusual growth could be determined, and any memory use triggered by the query is negligible. Some minor CPU use in excess of the idling load is visible on the second plot, but it appears to be within reasonable limits, so nothing noteworthy can be concluded from it.

The query itself executes in around 20 ms on the node:
50-100 µs to acquire the ledger state in an atomic operation
<3 ms to calculate stake pool relays with getPeers (time complexity is roughly O(n log n); strictly there is a quadratic term from removing duplicate relays for each stake pool, but I assume the number of relays per pool is small and can be treated as a constant to simplify things)
<15 ms to calculate which are the big ledger peers (time complexity is O(n log n), most likely with bigger constant factors since there are multiple passes over the stake pools, e.g. one pass to calculate the total stake, another to re-compute the relative stake of each pool, and another to sort)

The resulting snapshot file size is ~50 kB.
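
The passes in the cost breakdown above can be sketched roughly as follows. This is a simplified illustration rather than the code in this PR: `Pool`, `dedupRelays` and `bigLedgerPeers` are hypothetical names, relays are plain strings, and selecting pools until a cumulative stake quota is reached is an assumption about how "big" is decided.

```haskell
import Data.List (nub, sortOn)
import Data.Ord (Down (..))
import Data.Ratio ((%))

type Relay = String

-- Hypothetical, simplified stake pool record.
data Pool = Pool
  { poolStake  :: Integer  -- absolute stake
  , poolRelays :: [Relay]
  } deriving (Eq, Show)

-- Remove duplicate relays within one pool. `nub` is quadratic in the
-- number of relays, which is the "quadratic term" mentioned above; it
-- stays cheap as long as each pool registers only a handful of relays.
dedupRelays :: Pool -> Pool
dedupRelays p = p { poolRelays = nub (poolRelays p) }

-- Pick the largest pools until their accumulated relative stake reaches
-- the given quota (an assumed selection rule). Each selected pool is
-- paired with the stake accumulated up to and including it.
bigLedgerPeers :: Rational -> [Pool] -> [(Rational, Pool)]
bigLedgerPeers quota pools
  | total <= 0 = []
  | otherwise  = go 0 sorted
  where
    deduped  = map dedupRelays pools
    -- Pass 1: total stake.
    total    = sum (map poolStake deduped)
    -- Pass 2: relative stake of each pool.
    relative = [ (poolStake p % total, p) | p <- deduped ]
    -- Pass 3: sort by relative stake, largest first (O(n log n)).
    sorted   = sortOn (Down . fst) relative

    go _ [] = []
    go acc ((rel, p) : rest)
      | acc >= quota = []
      | otherwise    = (acc + rel, p) : go (acc + rel) rest
```

Each pass is linear except the sort, so the whole computation stays at O(n log n) in the number of pools, in line with the estimate above.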

@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch from db1ddcf to 7de012a Compare June 7, 2024 10:42
@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch from 7de012a to dad791e Compare June 15, 2024 13:44
@jasagredo
Contributor

For the fs-sim error I see in CI, update to the CHaP index-state at 2024-06-06T15:28:08Z

@crocodile-dentist
Contributor Author

> For the fs-sim error I see in CI, update to the CHaP index-state at 2024-06-06T15:28:08Z

Yes, that's what I figured

@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch 2 times, most recently from 1f04896 to 8a82356 Compare June 18, 2024 09:26
@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch 2 times, most recently from f764134 to 6693ea2 Compare August 8, 2024 12:23
@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch 5 times, most recently from c88f5cf to 6b1f13e Compare October 3, 2024 15:58
@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch 3 times, most recently from 9b4bac8 to c130c9f Compare October 21, 2024 13:23
Member

@amesgen amesgen left a comment

Approving the code that is stacked on top of #1223 👍 (last two commits)

Looks great, only two trivial comments

@amesgen
Member

amesgen commented Oct 21, 2024

Ah, and the new golden files have to be checked in (that's why CI is red ATM). You can generate them locally via `cabal build all` and then `cabal test all --test-options='-p Golden'`.

This change enables retrieval of big ledger peers. This data can be
saved and loaded later, and used by the network layer
to facilitate syncing up a node from a fresh or stale state.
@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch 3 times, most recently from 945e64a to 9dede29 Compare October 23, 2024 08:27
@crocodile-dentist crocodile-dentist force-pushed the mwojtowicz/ledger-query-peer-snapshot branch from 9dede29 to 28fdf38 Compare October 23, 2024 09:01
@crocodile-dentist crocodile-dentist added this pull request to the merge queue Oct 23, 2024
Merged via the queue into main with commit 43e04df Oct 23, 2024
18 checks passed
@crocodile-dentist crocodile-dentist deleted the mwojtowicz/ledger-query-peer-snapshot branch October 23, 2024 09:53