-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a CLI for hivemind.DHT #465
Conversation
hivemind/utils/__init__.py
Outdated
from hivemind.utils.networking import ( | ||
Endpoint, | ||
choose_ip_address, | ||
get_free_port, | ||
get_port, | ||
log_visible_maddrs, | ||
replace_port, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a reminder, the goal here is to incrementally move from wildcard imports to explicit enumeration of all exposed functions and classes. All public objects remain available for direct imports, but the higher we go in the import hierarchy, the less we want to expose (mostly the key functionality at the top level of hivemind, which also helps on autocompletion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to not import here anything at all, since usage of most of these functions are discouraged.
In fact, most of them are legacy that is (1) quite hacky (e.g., get_free_port
is subject to the race condition that was a reason of lots of flapping tests in the past) and (2) work with pure IP:port
addresses (i.e., should not be used after the server would be converted to libp2p).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kept two imports: the one that is used externally (log_visible_maddrs
) and the one that is used in a couple of tests (get_free_port
), since I want to restrict the scope of this PR. In follow-up PRs, we can actually move get_free_port
to test_utils
, since it is not used in any of hivemind-internal code
initial_peers_str = " ".join(str(addr) for addr in selected_maddrs) | ||
|
||
_logger.info( | ||
f"Running a DHT instance. To connect other peers to this one, use " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an "over the Internet" clause, but as we know, this works perfectly fine for local networks as well
9b1d663
to
f9c2f9c
Compare
Codecov Report
@@ Coverage Diff @@
## master #465 +/- ##
==========================================
- Coverage 83.03% 82.90% -0.14%
==========================================
Files 82 83 +1
Lines 8136 8177 +41
==========================================
+ Hits 6756 6779 +23
- Misses 1380 1398 +18
|
The launch command and the first output lines for the first node are
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I've left some discussion points below.
hivemind/utils/__init__.py
Outdated
from hivemind.utils.networking import ( | ||
Endpoint, | ||
choose_ip_address, | ||
get_free_port, | ||
get_port, | ||
log_visible_maddrs, | ||
replace_port, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to not import here anything at all, since usage of most of these functions are discouraged.
In fact, most of them are legacy that is (1) quite hacky (e.g., get_free_port
is subject to the race condition that was a reason of lots of flapping tests in the past) and (2) work with pure IP:port
addresses (i.e., should not be used after the server would be converted to libp2p).
logger.debug(f"Local storage contents: {node.protocol.storage}") | ||
|
||
|
||
def main(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test that launches a DHT with this script (and maybe tries to connect to it)?
This would make impossible to forget about this script if someone changes smth in the DHT/P2P interface. It would also save us from the drop in codecov coverage :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests/test_cli_scripts.py
with a basic test for hivemind-dht
, we can also test hivemind-server
similarly in the future
|
||
def main(): | ||
parser = ArgumentParser() | ||
parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of other P2P
/DHT
params are not covered here, and we use some of them while testing libp2p connectivity features, etc.
What's your take on this? Should we add them gradually (when deemed necessary) or write code that automatically fetches their names, help lines, and defaults using reflexion?
In case of the former, I'm worried that the help lines & defaults would quickly become not synchronized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that for the first PR that introduces hivemind-dht
, we don't need to support all possible scenarios in which the DHT can be used. I deliberately kept it to a minimum to use the core functionality and took my inspiration from the ALBERT example; if you believe that some of the essential arguments are missing, I'd be happy to add them.
For the desynchronization of DHT docstrings, I'd not worry too much about it at first: I'm not aware of any imminent PRs that change the purpose of the given DHT arguments, and if somebody does change its API, I believe that at least one of us (me, you or @justheuristic) will have a look at the corresponding PR. Though it doesn't guarantee that we will notice a desync, the issue will be easily fixable, and adding extra abstraction at this point seems to me to be an overkill
One more nice-to-have thing for the future: make this script show a really simple web interface showing the DHT contents. We don't need to use any frameworks for that (we could write it in the same way as Such an interface would be great for educational purposes: if a student starts to work with hivemind, they would be able to explore DHT (and make sense out of it) before they learn how all the asyncio & libp2p stuff necessary to work with it in the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I have only one issue left.
Co-authored-by: Alexander Borzunov <[email protected]>
This PR creates a command-line script for creating a DHT instance which becomes available directly after the installation of hivemind. Importantly, this script does not create an auxiliary peer or allow to monitor/store task-specific metrics — the goal is simply to have a quick way to create a DHT instance to establish connectivity; later on, more specialized peers with user code may join the network.
Since the peer needs to be exposed to the network, I've moved the
log_visible_maddrs
function toutils.networking
, further simplifying the ALBERT example folder. Also, I've fixed a couple of typos and reduced the high-level interaction surface of hivemind.utils(.networking)In the future, we may add an interactive version of
hivemind-dht
similar to a database or key-value storage CLI client; e.g., with an ability to request or store specific keys and list the available nodes. This may greatly simplify debugging "lost" keys or DHT connectivity issues; for the time being, I've added a simpler version that monitors the DHT state without additional network load.