Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a CLI for hivemind.DHT #465

Merged
merged 13 commits into from
Jun 5, 2022
Merged

Implement a CLI for hivemind.DHT #465

merged 13 commits into from
Jun 5, 2022

Conversation

mryab
Copy link
Member

@mryab mryab commented Apr 20, 2022

This PR creates a command-line script for creating a DHT instance which becomes available directly after the installation of hivemind. Importantly, this script does not create an auxiliary peer or allow to monitor/store task-specific metrics — the goal is simply to have a quick way to create a DHT instance to establish connectivity; later on, more specialized peers with user code may join the network.

Since the peer needs to be exposed to the network, I've moved the log_visible_maddrs function to utils.networking, further simplifying the ALBERT example folder. Also, I've fixed a couple of typos and reduced the high-level interaction surface of hivemind.utils(.networking)

In the future, we may add an interactive version of hivemind-dht similar to a database or key-value storage CLI client; e.g., with an ability to request or store specific keys and list the available nodes. This may greatly simplify debugging "lost" keys or DHT connectivity issues; for the time being, I've added a simpler version that monitors the DHT state without additional network load.

@mryab mryab requested review from justheuristic and borzunov April 20, 2022 23:22
Comment on lines 7 to 14
from hivemind.utils.networking import (
Endpoint,
choose_ip_address,
get_free_port,
get_port,
log_visible_maddrs,
replace_port,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a reminder, the goal here is to incrementally move from wildcard imports to explicit enumeration of all exposed functions and classes. All public objects remain available for direct imports, but the higher we go in the import hierarchy, the less we want to expose (mostly the key functionality at the top level of hivemind, which also helps on autocompletion)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to not import here anything at all, since usage of most of these functions are discouraged.

In fact, most of them are legacy that is (1) quite hacky (e.g., get_free_port is subject to the race condition that was a reason of lots of flapping tests in the past) and (2) work with pure IP:port addresses (i.e., should not be used after the server would be converted to libp2p).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept two imports: the one that is used externally (log_visible_maddrs) and the one that is used in a couple of tests (get_free_port), since I want to restrict the scope of this PR. In follow-up PRs, we can actually move get_free_port to test_utils, since it is not used in any of hivemind-internal code

initial_peers_str = " ".join(str(addr) for addr in selected_maddrs)

_logger.info(
f"Running a DHT instance. To connect other peers to this one, use "
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an "over the Internet" clause, but as we know, this works perfectly fine for local networks as well

@mryab mryab force-pushed the cli_dht branch 2 times, most recently from 9b1d663 to f9c2f9c Compare April 20, 2022 23:35
@mryab mryab changed the base branch from master to bump-black April 20, 2022 23:41
@mryab
Copy link
Member Author

mryab commented Apr 20, 2022

image

Example output of three `hivemind-dht` instances (two have a refresh period of 5 seconds, one has a default period of 30 seconds, but is launched with `GOLOG_LOGLEVEL=INFO HIVEMIND_LOGLEVEL=DEBUG`) and one DHT launched from the Python REPL for the purpose of adding custom keys

@codecov
Copy link

codecov bot commented Apr 20, 2022

Codecov Report

Merging #465 (4b0a518) into master (724cdfe) will decrease coverage by 0.13%.
The diff coverage is 55.81%.

@@            Coverage Diff             @@
##           master     #465      +/-   ##
==========================================
- Coverage   83.03%   82.90%   -0.14%     
==========================================
  Files          82       83       +1     
  Lines        8136     8177      +41     
==========================================
+ Hits         6756     6779      +23     
- Misses       1380     1398      +18     
Impacted Files Coverage Δ
hivemind/dht/dht.py 91.25% <ø> (+1.25%) ⬆️
hivemind/dht/node.py 91.44% <ø> (ø)
hivemind/utils/networking.py 43.24% <26.66%> (-13.28%) ⬇️
hivemind/hivemind_cli/run_dht.py 70.37% <70.37%> (ø)
hivemind/utils/__init__.py 100.00% <100.00%> (ø)
hivemind/averaging/averager.py 88.36% <0.00%> (-0.48%) ⬇️
hivemind/averaging/matchmaking.py 84.22% <0.00%> (ø)
hivemind/utils/asyncio.py 100.00% <0.00%> (+0.84%) ⬆️

@mryab
Copy link
Member Author

mryab commented Apr 20, 2022

The launch command and the first output lines for the first node are

~/hivemind$ hivemind-dht --refresh_period 5
Apr 21 02:09:37.060 [INFO] Running a DHT instance. To connect other peers to this one, use --initial_peers /ip4/172.27.77.70/tcp/43331/p2p/QmNMLCmjYUFgQ1TTkEv8P6kxQ3HYJHVhhWL6BH3nMLD3kW
Apr 21 02:09:37.060 [INFO] Full list of visible multiaddresses: /ip4/172.27.77.70/tcp/43331/p2p/QmNMLCmjYUFgQ1TTkEv8P6kxQ3HYJHVhhWL6BH3nMLD3kW /ip4/127.0.0.1/tcp/43331/p2p/QmNMLCmjYUFgQ1TTkEv8P6kxQ3HYJHVhhWL6BH3nMLD3kW
Apr 21 02:09:37.061 [INFO] 1 DHT nodes (including this one) are in the local routing table
Apr 21 02:09:37.061 [INFO] Local storage contains 0 keys

Copy link
Member

@borzunov borzunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I've left some discussion points below.

Comment on lines 7 to 14
from hivemind.utils.networking import (
Endpoint,
choose_ip_address,
get_free_port,
get_port,
log_visible_maddrs,
replace_port,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to not import here anything at all, since usage of most of these functions are discouraged.

In fact, most of them are legacy that is (1) quite hacky (e.g., get_free_port is subject to the race condition that was a reason of lots of flapping tests in the past) and (2) work with pure IP:port addresses (i.e., should not be used after the server would be converted to libp2p).

logger.debug(f"Local storage contents: {node.protocol.storage}")


def main():
Copy link
Member

@borzunov borzunov Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that launches a DHT with this script (and maybe tries to connect to it)?

This would make impossible to forget about this script if someone changes smth in the DHT/P2P interface. It would also save us from the drop in codecov coverage :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests/test_cli_scripts.py with a basic test for hivemind-dht, we can also test hivemind-server similarly in the future


def main():
parser = ArgumentParser()
parser.add_argument(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of other P2P/DHT params are not covered here, and we use some of them while testing libp2p connectivity features, etc.

What's your take on this? Should we add them gradually (when deemed necessary) or write code that automatically fetches their names, help lines, and defaults using reflexion?

In case of the former, I'm worried that the help lines & defaults would quickly become not synchronized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that for the first PR that introduces hivemind-dht, we don't need to support all possible scenarios in which the DHT can be used. I deliberately kept it to a minimum to use the core functionality and took my inspiration from the ALBERT example; if you believe that some of the essential arguments are missing, I'd be happy to add them.

For the desynchronization of DHT docstrings, I'd not worry too much about it at first: I'm not aware of any imminent PRs that change the purpose of the given DHT arguments, and if somebody does change its API, I believe that at least one of us (me, you or @justheuristic) will have a look at the corresponding PR. Though it doesn't guarantee that we will notice a desync, the issue will be easily fixable, and adding extra abstraction at this point seems to me to be an overkill

@borzunov
Copy link
Member

borzunov commented Apr 21, 2022

In the future, we may add an interactive version of hivemind-dht similar to a database or key-value storage CLI client

One more nice-to-have thing for the future: make this script show a really simple web interface showing the DHT contents. We don't need to use any frameworks for that (we could write it in the same way as python -m http.server works).

Such an interface would be great for educational purposes: if a student starts to work with hivemind, they would be able to explore DHT (and make sense out of it) before they learn how all the asyncio & libp2p stuff necessary to work with it in the code.

Base automatically changed from bump-black to master May 3, 2022 21:38
@borzunov borzunov added the dht label May 23, 2022
@mryab mryab requested review from justheuristic and removed request for borzunov June 4, 2022 18:57
@mryab mryab requested a review from borzunov June 4, 2022 18:57
Copy link
Member

@borzunov borzunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I have only one issue left.

@mryab mryab merged commit c49802a into master Jun 5, 2022
@mryab mryab deleted the cli_dht branch June 5, 2022 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants