Implement a CLI for hivemind.DHT #465

mryab · 2022-04-20T23:22:16Z

This PR adds a command-line script for creating a DHT instance; the script becomes available as soon as hivemind is installed. Importantly, it does not create an auxiliary peer or allow monitoring/storing task-specific metrics: the goal is simply to have a quick way to create a DHT instance to establish connectivity; later on, more specialized peers with user code may join the network.

Since the peer needs to be exposed to the network, I've moved the log_visible_maddrs function to utils.networking, which further simplifies the ALBERT example folder. I've also fixed a couple of typos and reduced the high-level interaction surface of hivemind.utils(.networking).

In the future, we may add an interactive version of hivemind-dht similar to a database or key-value storage CLI client; e.g., with an ability to request or store specific keys and list the available nodes. This may greatly simplify debugging "lost" keys or DHT connectivity issues; for the time being, I've added a simpler version that monitors the DHT state without additional network load.

mryab · 2022-04-20T23:25:32Z

hivemind/utils/__init__.py

+from hivemind.utils.networking import (
+    Endpoint,
+    choose_ip_address,
+    get_free_port,
+    get_port,
+    log_visible_maddrs,
+    replace_port,
+)


Just as a reminder, the goal here is to incrementally move from wildcard imports to explicit enumeration of all exposed functions and classes. All public objects remain available for direct imports, but the higher we go in the import hierarchy, the less we want to expose (mostly the key functionality at the top level of hivemind, which also helps with autocompletion).

I'd suggest not importing anything here at all, since the use of most of these functions is discouraged.

In fact, most of them are legacy code that (1) is quite hacky (e.g., get_free_port is subject to a race condition that caused many flaky tests in the past) and (2) works with plain IP:port addresses (i.e., should not be used once the server is converted to libp2p).

Kept two imports: the one that is used externally (log_visible_maddrs) and the one that is used in a couple of tests (get_free_port), since I want to restrict the scope of this PR. In follow-up PRs, we can move get_free_port to test_utils, since it is not used in any hivemind-internal code.
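To illustrate the race condition mentioned in this thread: helpers of this kind typically ask the OS for a free port, close the socket, and return the port number, which leaves a window in which another process can claim the same port. The sketch below is a generic stdlib illustration, not hivemind's actual implementation; the names `get_free_port_racy` and `bind_to_free_port` are made up for this example.

```python
import socket


def get_free_port_racy() -> int:
    """Ask the OS for an unused port and release it right away.

    The socket is closed before the port number is returned, so another
    process may bind to the same port before the caller does.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[1]


def bind_to_free_port() -> socket.socket:
    """Return a socket that stays bound, so the port cannot be taken meanwhile."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    return sock
```

Handing over the bound socket (or letting the service itself bind to port 0) avoids the gap between picking a port and using it.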

mryab · 2022-04-20T23:26:33Z

hivemind/utils/networking.py

+        initial_peers_str = " ".join(str(addr) for addr in selected_maddrs)
+
+    _logger.info(
+        f"Running a DHT instance. To connect other peers to this one, use "


There was an "over the Internet" clause, but as we know, this works perfectly fine for local networks as well

mryab · 2022-04-20T23:44:27Z

Example output of three `hivemind-dht` instances (two have a refresh period of 5 seconds, one has a default period of 30 seconds, but is launched with `GOLOG_LOGLEVEL=INFO HIVEMIND_LOGLEVEL=DEBUG`) and one DHT launched from the Python REPL for the purpose of adding custom keys

codecov · 2022-04-20T23:46:43Z

Codecov Report

Merging #465 (4b0a518) into master (724cdfe) will decrease coverage by 0.13%.
The diff coverage is 55.81%.

@@            Coverage Diff             @@
##           master     #465      +/-   ##
==========================================
- Coverage   83.03%   82.90%   -0.14%     
==========================================
  Files          82       83       +1     
  Lines        8136     8177      +41     
==========================================
+ Hits         6756     6779      +23     
- Misses       1380     1398      +18

| Impacted Files | Coverage Δ | |
|---|---|---|
| hivemind/dht/dht.py | `91.25% <ø> (+1.25%)` | ⬆️ |
| hivemind/dht/node.py | `91.44% <ø> (ø)` | |
| hivemind/utils/networking.py | `43.24% <26.66%> (-13.28%)` | ⬇️ |
| hivemind/hivemind_cli/run_dht.py | `70.37% <70.37%> (ø)` | |
| hivemind/utils/__init__.py | `100.00% <100.00%> (ø)` | |
| hivemind/averaging/averager.py | `88.36% <0.00%> (-0.48%)` | ⬇️ |
| hivemind/averaging/matchmaking.py | `84.22% <0.00%> (ø)` | |
| hivemind/utils/asyncio.py | `100.00% <0.00%> (+0.84%)` | ⬆️ |

mryab · 2022-04-20T23:51:58Z

The launch command and the first output lines for the first node are

~/hivemind$ hivemind-dht --refresh_period 5
Apr 21 02:09:37.060 [INFO] Running a DHT instance. To connect other peers to this one, use --initial_peers /ip4/172.27.77.70/tcp/43331/p2p/QmNMLCmjYUFgQ1TTkEv8P6kxQ3HYJHVhhWL6BH3nMLD3kW
Apr 21 02:09:37.060 [INFO] Full list of visible multiaddresses: /ip4/172.27.77.70/tcp/43331/p2p/QmNMLCmjYUFgQ1TTkEv8P6kxQ3HYJHVhhWL6BH3nMLD3kW /ip4/127.0.0.1/tcp/43331/p2p/QmNMLCmjYUFgQ1TTkEv8P6kxQ3HYJHVhhWL6BH3nMLD3kW
Apr 21 02:09:37.061 [INFO] 1 DHT nodes (including this one) are in the local routing table
Apr 21 02:09:37.061 [INFO] Local storage contains 0 keys
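
If another script needs the addresses from this output, they can be pulled out of the log lines with a regular expression. This is only a sketch that assumes the exact message wording shown above (which may change between versions); `parse_initial_peers` and `parse_node_count` are hypothetical helper names, not part of hivemind.

```python
import re
from typing import List, Optional

# Both patterns assume the log wording from the example output above.
INITIAL_PEERS_RE = re.compile(r"use --initial_peers (?P<maddrs>.+)$")
NODE_COUNT_RE = re.compile(r"(?P<count>\d+) DHT nodes \(including this one\)")


def parse_initial_peers(line: str) -> List[str]:
    """Return the multiaddresses that follow `--initial_peers`, or an empty list."""
    match = INITIAL_PEERS_RE.search(line.strip())
    return match.group("maddrs").split() if match else []


def parse_node_count(line: str) -> Optional[int]:
    """Return the routing table size reported in a log line, if present."""
    match = NODE_COUNT_RE.search(line)
    return int(match.group("count")) if match else None
```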

examples/albert/README.md

borzunov

Thanks for the PR! I've left some discussion points below.

hivemind/utils/networking.py

borzunov · 2022-04-21T22:54:35Z

borzunov · 2022-04-21T22:57:25Z

hivemind/hivemind_cli/run_dht.py

+    logger.debug(f"Local storage contents: {node.protocol.storage}")
+
+
+def main():


Can we add a test that launches a DHT with this script (and maybe tries to connect to it)?

This would make it impossible to forget about this script if someone changes something in the DHT/P2P interface. It would also save us from the drop in codecov coverage :)

Added tests/test_cli_scripts.py with a basic test for hivemind-dht; we can also test hivemind-server similarly in the future.
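
For readers curious what a test of this kind might look like, here is a minimal sketch: launch a command as a subprocess, scan its output for an expected log line, and shut it down. It is not the code from tests/test_cli_scripts.py. The helper name `wait_for_log_line` is hypothetical, and `STAND_IN_CMD` is a stand-in child process that only prints a similar-looking line, so the sketch runs without hivemind installed; a real test would presumably pass something like `["hivemind-dht", "--refresh_period", "5"]` instead.

```python
import re
import subprocess
import sys
from typing import Optional, Sequence


def wait_for_log_line(cmd: Sequence[str], pattern: str, max_lines: int = 50) -> Optional[re.Match]:
    """Start `cmd`, read up to `max_lines` of its output, and return the first regex match."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    try:
        for _ in range(max_lines):
            line = proc.stdout.readline()
            if not line:  # the process exited before printing the expected line
                return None
            match = re.search(pattern, line)
            if match:
                return match
        return None
    finally:
        # Always stop the child process, whether or not the line was found
        proc.terminate()
        proc.wait(timeout=10)
        proc.stdout.close()


# Stand-in for the real CLI: a child Python process that prints one fake log line
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "print('[INFO] Running a DHT instance. To connect other peers to this one, "
    "use --initial_peers /ip4/127.0.0.1/tcp/1234/p2p/QmExamplePeerID')",
]
PATTERN = r"use --initial_peers (.+)$"
```

A follow-up step could pass the captured address to a second peer and check that it connects.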

borzunov · 2022-04-21T23:03:40Z

hivemind/hivemind_cli/run_dht.py

+
+def main():
+    parser = ArgumentParser()
+    parser.add_argument(


Lots of other P2P/DHT params are not covered here, and we use some of them while testing libp2p connectivity features, etc.

What's your take on this? Should we add them gradually (when deemed necessary) or write code that automatically fetches their names, help lines, and defaults using reflection?

In the former case, I'm worried that the help lines & defaults would quickly get out of sync.

I guess that for the first PR that introduces hivemind-dht, we don't need to support all possible scenarios in which the DHT can be used. I deliberately kept it to a minimum to use the core functionality and took my inspiration from the ALBERT example; if you believe that some of the essential arguments are missing, I'd be happy to add them.

As for the desynchronization of DHT docstrings, I wouldn't worry too much about it at first: I'm not aware of any imminent PRs that change the purpose of the given DHT arguments, and if somebody does change the API, I believe that at least one of us (me, you or @justheuristic) will have a look at the corresponding PR. Though this doesn't guarantee that we will notice a desync, the issue will be easy to fix, and adding extra abstraction at this point seems like overkill to me.
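
For reference, the reflection-based alternative discussed in this thread could look roughly like the sketch below, which derives argparse flags and defaults from a function signature via `inspect`. Everything here is illustrative: `run_node` is a hypothetical stand-in (not a hivemind function), and the sketch covers only names and defaults, not help lines extracted from docstrings.

```python
import inspect
from argparse import ArgumentParser


def run_node(refresh_period: float = 30.0, verbose: bool = False, label: str = "node") -> None:
    """Hypothetical stand-in for a constructor whose keyword arguments we want to expose."""


def parse_bool(value: str) -> bool:
    """argparse's type=bool treats any non-empty string as True, so parse explicitly."""
    return value.lower() in ("1", "true", "yes")


def add_arguments_from_signature(parser: ArgumentParser, func) -> None:
    """Add one --flag per keyword argument of `func`, reusing its default value."""
    for name, param in inspect.signature(func).parameters.items():
        default = param.default
        if default is inspect.Parameter.empty:
            continue  # parameters without defaults would need manual handling
        if isinstance(default, bool):
            arg_type = parse_bool
        elif default is None:
            arg_type = str  # no type information available, fall back to strings
        else:
            arg_type = type(default)
        parser.add_argument(f"--{name}", type=arg_type, default=default, help="(default: %(default)s)")
```

With this approach the defaults cannot drift from the underlying signature, at the cost of an extra layer that the thread above considers unnecessary for a first version.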

borzunov · 2022-04-21T23:09:38Z

> In the future, we may add an interactive version of hivemind-dht similar to a database or key-value storage CLI client

One more nice-to-have thing for the future: make this script show a really simple web interface showing the DHT contents. We don't need to use any frameworks for that (we could write it in the same way as python -m http.server works).

Such an interface would be great for educational purposes: if a student starts to work with hivemind, they would be able to explore the DHT (and make sense of it) before they learn all the asyncio & libp2p stuff necessary to work with it in code.
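
A framework-free version of this idea can be sketched with the standard library's `http.server`. The example below serves the contents of a plain dictionary as JSON; the dictionary is only a stand-in for real DHT storage, and `make_handler`, `serve_in_background`, and `demo` are hypothetical names, not part of hivemind.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict


def make_handler(get_contents: Callable[[], Dict[str, str]]):
    """Build a request handler that dumps whatever `get_contents()` returns as JSON."""

    class ContentsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(get_contents(), indent=2).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the console quiet

    return ContentsHandler


def serve_in_background(get_contents, host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the server in a daemon thread; port 0 lets the OS choose a free port."""
    server = HTTPServer((host, port), make_handler(get_contents))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo() -> dict:
    """Serve a stand-in dictionary and fetch it back over HTTP."""
    storage = {"example_key": "example_value"}  # stand-in for real DHT contents
    server = serve_in_background(lambda: storage)
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        data = json.loads(conn.getresponse().read().decode("utf-8"))
        conn.close()
        return data
    finally:
        server.shutdown()
        server.server_close()
```

Passing a callable rather than the dictionary itself means each request sees the current contents, which is what a live view of DHT storage would need.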

borzunov

Great! I have only one issue left.

hivemind/hivemind_cli/run_dht.py

mryab requested review from justheuristic and borzunov April 20, 2022 23:22

mryab force-pushed the cli_dht branch 2 times, most recently from 9b1d663 to f9c2f9c Compare April 20, 2022 23:35

mryab changed the base branch from master to bump-black April 20, 2022 23:41

yhn112 reviewed Apr 21, 2022

examples/albert/README.md (outdated, resolved)

justheuristic approved these changes Apr 21, 2022

borzunov requested changes Apr 21, 2022

Base automatically changed from bump-black to master May 3, 2022 21:38

mryab force-pushed the cli_dht branch from 31c13f8 to 098642a Compare May 3, 2022 22:04

borzunov added the dht label May 23, 2022

mryab and others added 6 commits June 4, 2022 20:16

Implement a CLI for hivemind.DHT (2cae16a)
Fix log message in README (b4710df)
Update examples/albert/README.md (b6efc19), co-authored by Michael Diskin
Post-rebase changes (660cde9)
_logger -> logger (for now) (9d464ef)
Remove top-level imports in hivemind.utils (5ae8809)

mryab force-pushed the cli_dht branch from 098642a to 5ae8809 Compare June 4, 2022 17:29

mryab added 4 commits June 4, 2022 20:32

Add get_free port to imports from utils.networking (7aaf726)
Change initial_peers_str to initial_peers (d29b67d)
Add a basic test for hivemind-dht (a54f939)
Run black and isort (25f9534)

mryab requested review from justheuristic and removed request for borzunov June 4, 2022 18:57

mryab requested a review from borzunov June 4, 2022 18:57

borzunov approved these changes Jun 5, 2022

hivemind/hivemind_cli/run_dht.py (outdated, resolved)

mryab and others added 3 commits June 5, 2022 15:09

Update help for the identity argument (7cdda0b), co-authored by Alexander Borzunov
Fix formatting for identity_path (419de8f)
Reword the help message (mainly to trigger CI) (4b0a518)

mryab merged commit c49802a into master Jun 5, 2022

mryab deleted the cli_dht branch June 5, 2022 19:03
