-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Superblock validation - Part II #1542
Superblock validation - Part II #1542
Conversation
53e5d57
to
16d7970
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to deadlock cs_mapManifest
and cs_mapParts
. At least one place in main.cpp locks cs_mapManifest
to call AlreadyHave()
, which can then lock cs_mapParts
. The scraper threads lock cs_mapManifest
for various functions, the CSplitBlob
destructor locks cs_mapParts
, and the superblock validation will lock both.
Going to try to trace the locks/threads more closely...pretty hard to reason about at this point...we may need to think of a better synchronization strategy for these two.
if (const auto SBProjectIter = NewFormatSuperblock.m_projects.Try(iWhitelistProject.m_name)) | ||
{ | ||
// Pull the convergence hint (reduced content hash) for the project object. | ||
uint32_t nReducedSBProjectObjectContentHash = SBProjectIter->m_convergence_hint; | ||
|
||
for (const auto& iter : mProjectObjectsBinnedbyContent) | ||
if (iter.first <= NewFormatSuperblock.m_timestamp && nReducedProjectObjectContentHash == nReducedSBProjectObjectContentHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a node's time is behind, this could result in a false validation failure. A scraper could publish a manifest with a time ahead of the perceived time that a node received the superblock. As in...
- Scraper detects new etag, pulls the stats, and publishes a new manifest
- 10 seconds later, a node 20 seconds behind stakes a superblock with that manifest
- Another node immediately receives the manifest and the superblock...
manifest time: t - 10,
superblock time: t - 20,
where t is roughly the node's current time. - Validation fails because the manifest was not considered when reconstructing the convergence
The superblock timestamp will be set to the timestamp of its containing block when received by a node. I don't think we need to check the timestamp here because the content hash should adequately filter out ineligible candidates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I put this in there to try and conform the validation to the existing implementation of the formation of a convergence from the source node, while trying to deal with the exact opposite case that you describe above, which is that the convergence that is hinted by the incoming superblock may not be the latest by project convergence that can be formed on the receiving node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have to check every combination of the matching project parts (breadth-and-most-recent-first) to also protect against temporary network partitions or delays during the superblock window. Not a pretty algorithm...
There will only be one combination almost all of the time, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. And it is 99.9% likely the head of the list. But it is the 0.1% I am worried about. I do not want to get into an 2^n (n the number of projects) algorithm here for a 0.1% corner case. I have an idea. I will have to draw it out on a picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rare
To say the least 🙂
And this only matters in a fallback-to-project-level convergence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the node has only been running for a short enough while that it is not cached...
But we will do the final algo anyway, because I like to be thorough. I don't think we should delay the interim release over this. Let's merge this tracking PR as is and continue our work on this on integrated_scraper_2.
On the cs_mapManifest and cs_mapParts, unless the critical sections are locked and unlocked in different order we will not deadlock. We could probably solve the problem by moving to a single critical section to protect both entities, but I am loath to do that, because I am concerned about the loss of parallelism... |
Not that it absolutely proves anything, but I think that if we were to see a deadlock, we would have seen it already on the mainnet scraper node I am running with these changes, which has 250+connections that exercise the net code + doing the scraper work for two consecutive days. |
I agree—let’s keep the parallelism if we can. I did not find any lock scenarios that will result in a deadlock either. Running this also now 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No locking issues observed. @jamescowens and I talked about the by-project manifest-to-superblock timestamp comparison and agreed to refactor it in a follow-up PR. The code is not yet active.
Yep. We have some work to do on it. We know what we have to do... |
Added - Add testnet desktop launcher action for Linux #1516 (@caraka) - Shuffle vSideStakeAlloc if necessary to support sidestaking to more than 6 destinations #1532 (@jamescowens) - New Superblock format preparations for Fern #1526, #1542 (@jamescowens, @cyrossignol) - Multisigtools - Consolidate multisig unspent #1529 (@iFoggz) - Scanforunspent #1547 (@iFoggz) - consolidatemsunspent and scanforunspent bug fix #1561 (@iFoggz) - New banning misbehavior handling and Peers Tab on Debug Console #1537 (@jamescowens) - Reimplement getunconfirmedbalance rpc #1548 (@jamescowens) - Add CLI switch to display binary version #1553 (@cyrossignol) Changed - Select smallest coins for contracts #1519 (@iFoggz) - Move some functionality from miner to SelectCoinsForStaking + Respect the coin reserve setting + Randomize UTXO order #1525 (@iFoggz) - For voting - if url does not contain http then add it #1531 (@iFoggz) - Backport newer serialization facilities from Bitcoin #1535 (@cyrossignol) - Refactor ThreadSocketHandler2() Inactivity checks #1538 (@iFoggz) - Update outdated checkpoints #1539 (@barton2526) - Change needed to build Gridcoin for OSX using homebrew #1540 (@Git-Jiro) - Optimize scraper traffic for expiring manifests #1542 (@jamescowens) - Move legacy neural vote warnings to debug log level #1560 (@cyrossignol) - Change banlist save interval to 5 minutes #1564 (@jamescowens) - Change default rpcconsole.ui window size to better support new Peers tab #1566 (@jamescowens) Removed - Remove deprecated RSA weight and legacy kernel #1507 (@cyrossignol) Fixed - Clean up compiler warnings #1521 (@cyrossignol) - Handle missing external CPID in client_state.xml #1530 (@cyrossignol) - Support boost 1.70+ #1533 (@iFoggz) - Fix diagnostics failed to make connection to NTP server #1545 (@Git-Jiro) - Install manpages in correct system location #1546 (@Git-Jiro) - Fix ability to show help and version without a config file #1553 (@cyrossignol) - Refactor QT UI variable names to be more consistent, Fix Difficulty default #1563 (@barton2526) - Fix two regressions in previous UI refactor #1565 (@barton2526) - Fix "Owed" amount in output of "magnitude" RPC method #1569 (@cyrossignol)
Currently this has the new formulation for the per project uncached SB validation (ValidateSuperblock function)