Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benph/bootstrap thread migration #3554

Merged
merged 101 commits into from
Feb 27, 2023
Merged

Conversation

Ben-PH
Copy link
Contributor

@Ben-PH Ben-PH commented Feb 10, 2023

  • document all added functions
  • try in sandbox /simulation/labnet
  • unit tests on the added/changed features
    • make tests compile
    • make tests pass
  • add logs allowing easy debugging in case the changes caused problems
  • if the API has changed, update the API specification

@Ben-PH Ben-PH linked an issue Feb 10, 2023 that may be closed by this pull request
@Ben-PH Ben-PH self-assigned this Feb 10, 2023
@Ben-PH Ben-PH added bootstrap Issues related to the bootstrap refactoring labels Feb 10, 2023
@Ben-PH
Copy link
Contributor Author

Ben-PH commented Feb 14, 2023

This implementation keeps async around, but uses tokio::spawn to run the the handling of a bootstrap connection, instead of pushing it onto a FuturesUnordered and keeping it on the same thread. Remov ing the async completely is non-trivial:

  • Bandwidth throttling uses contructs that are coupled to an async context.
  • Interconnected parts with the client code.
  • use of timeouts, e.g. limiting the connection time so clients don't DoS the server.

These can, of course, be overcome in the (•_•) ( •_•)>⌐■-■ (⌐■_■) future.

Instead of checking the length of the (now removed) FuturesUnordered list, we track the number of active threads through an Arc<()>, and looking at its strong-count. Running a bootstrap requires passing in a clone, and the runner will drop the clone on completion.

It still uses the select macro with these branches:

  • controller message: break if the message is a Stop log active bootstrap count if the message is a Complete. The complete could probably be discarded, with logging done in the drop-handler of the counter?
  • cache updater: As before, runs the black/white list updater on the primary thread. Will look into moving this into dedicated thread...
  • listener: the main change here is that the once the server decides to provide the actual bootstrapping services to a client, that is done on a separate thread using tokio::spawn instead of pushing onto a FuturesUnordered

@Ben-PH
Copy link
Contributor Author

Ben-PH commented Feb 14, 2023

Issues with implementation at this point:

  • OBO errors need to be tested for the thread-count. Either find a more ergonomic/idiomatic way to keep the count correct automagically, or unit test.
  • I'm not 100% sure that the listen-branch is cancel proof/safe. If there's an .await at a pathalogical moment, that could introduce some weird behavior.
  • Things are still async: related to previous point, but addressed in earlier comment
  • We don't have performance regression testing
  • ADD TO ME

massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
@AurelienFT
Copy link
Contributor

AurelienFT commented Feb 14, 2023

Instead of checking the length of the (now removed) FuturesUnordered list, we track the number of active threads through an Arc<()>, and looking at its strong-count. Running a bootstrap requires passing in a clone, and the runner will drop the clone on completion.

It's not better to have a vector with the handle on the thread/tasks ?

@Ben-PH
Copy link
Contributor Author

Ben-PH commented Feb 15, 2023

That would require some sort of mechanism to pop the join handle from the collection automagically.

The reason I went with the Arc is because there is a tight coupling between the strong count of the arc, and the number of active threads. I originally went with a plain old u32, which worked nicely, but keeping the value in line with the number of active threads (active threads == active bootstrap sessions) would make the code less readable/maintainable. Although this approach still requires manual book-keeping, it is constrained to "only needing to pass in the a clone of a specific variable, and don't forget to account for the root copy". The built-in mechanisms handle the rest.

I'm sure there's a better way, such as a mechanism that will automagically push/pop join handles from a collection in O(1) based on their ownership. Thread-pooling comes to mind. We might end up going there on the next iteration of this task, but this seemed appropriate to begin with.

@Ben-PH
Copy link
Contributor Author

Ben-PH commented Feb 15, 2023

The Arc has a (minor) added benefit of freeing up a bootstrap slot as soon as it completes/fails:

    let res = tokio::time::timeout(
        config.bootstrap_timeout.into(),
        manage_bootstrap(
            &config,
            &mut server,
            data_execution,
            version,
            consensus_command_sender,
            network_command_sender,
        ),
    )
    .await;
    // This drop allows the server to accept new connections before having to complete the error notifications
    drop(arc_counter);
    let _ = controller_tx.send(BSInternalMessage::Complete).await;
    match res {

the match res includes some branches inside a tokio::timeout context that sends errors to the client. In the cases of an error with the bootstrap, or a timeout of the bootstrap, we do not have to wait until the message sending is completed for that slot to be available for the next client.

It's a small and perhaps rarely needed optimisation, but it's something that works for now.

The next design iterations should consider applying such optimisations automagically. My initial thinking, is to couple the count with the manage_bootstrap scope inside here: https://github.com/massalabs/massa/pull/3554/files#diff-2f2a0d279d4c015c632b27bdc7bd2fb62cf753fa8811dead19aadc3133666156R431

...instead of inside the run_bootstrap scope from which the manage_bootstrap scope is created.

This would all be made redundant, though, if we were to create a dedicated thread pool for the bootstrap server, spawn the threads within that pool, and count the spawned threads. That should work in a bubble, but I don't think we want to go down that road yet...

@Ben-PH Ben-PH requested a review from AurelienFT February 15, 2023 11:53
@Ben-PH Ben-PH marked this pull request as ready for review February 15, 2023 11:53
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
Copy link
Contributor Author

@Ben-PH Ben-PH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self review

massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
massa-bootstrap/src/server.rs Show resolved Hide resolved
massa-bootstrap/src/server.rs Show resolved Hide resolved
massa-bootstrap/src/server.rs Outdated Show resolved Hide resolved
@Ben-PH Ben-PH requested a review from modship February 24, 2023 18:04
@Ben-PH Ben-PH merged commit 691e343 into testnet_20 Feb 27, 2023
@AurelienFT AurelienFT mentioned this pull request Feb 27, 2023
bors bot added a commit that referenced this pull request Mar 1, 2023
3489: Testnet 20 r=AurelienFT a=AurelienFT

Merged in this testnet:

- #3475 
- #3549
- #3562 
- #3462 
- #3492 
- #3502 
- #3495 
- #3556 
- #3511 
- #3498 
- #3566 
- #3557 
- #3576 
- #3579 
- #3507 
- #3585 
- #3587 
- #3580 
- #3590 
- #3549 
- #3455 
- #3601 
- #3602 
- #3606 
- #3588 
- #3603 
- #3554

Co-authored-by: AurelienFT <[email protected]>
Co-authored-by: Ben PHL <[email protected]>
Co-authored-by: Modship <[email protected]>
Co-authored-by: Ben <[email protected]>
Co-authored-by: Eitu33 <[email protected]>
Co-authored-by: Sydhds <[email protected]>
Co-authored-by: modship <[email protected]>
Co-authored-by: Moncef AOUDIA <[email protected]>
Co-authored-by: Moncef AOUDIA <[email protected]>
@Ben-PH Ben-PH mentioned this pull request Mar 20, 2023
28 tasks
@AurelienFT AurelienFT deleted the benph/bootstrap-thread-migration branch May 29, 2023 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bootstrap Issues related to the bootstrap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove async from bootstrap
3 participants