
Expose node information to allow us building proper tooling to visualize our network. #1099

Closed
skylenet opened this issue Jan 9, 2019 · 15 comments

@skylenet
Contributor

skylenet commented Jan 9, 2019

Create an admin API that exposes the following information about individual swarm nodes:

  • peers
  • known nodes
  • chunk IDs that are in the local datastore

The API could be in JSON RPC or HTTP.

This would make it possible to create tooling that iterates over all our nodes (in private deployments) and gathers this information. Something similar has already been mentioned in the roundtable, e.g. a "chunk explorer".

@holisticode
Contributor

#159
#158

Also note that "chunk IDs that are in the local datastore" has security implications and should be optional.

The "chunk explorer" task indeed aims at the same goals.

@nonsense
Contributor

@holisticode "chunk IDs that are in the local datastore" should NOT have any security implications, because anyone is free to do whatever they want with their own node, or am I missing something?

@holisticode
Contributor

@nonsense my understanding is that you could be tracking who is storing what. What do you think @nagydani, is that something we need to consider?

@nonsense
Contributor

@holisticode it is my node, so I am free to do whatever I want with it. People who care about privacy should be uploading/syncing encrypted content. Even if we don't implement this, it is trivial to run an iterator over LevelDB and print out all chunks, and therefore review all non-encrypted chunks, so we have to assume that users would do that (or have already done it).
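
To illustrate, a minimal sketch of that point, assuming the goleveldb library and a hypothetical path to the node's chunk database (the real store's key layout is more involved):

package main

import (
	"fmt"
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	// Hypothetical path to a node's local chunk database.
	db, err := leveldb.OpenFile("/path/to/swarm/chunks", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Dump every key; chunk hashes are visible to whoever runs the node.
	it := db.NewIterator(nil, nil)
	defer it.Release()
	for it.Next() {
		fmt.Printf("%x\n", it.Key())
	}
	if err := it.Error(); err != nil {
		log.Fatal(err)
	}
}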

@holisticode
Contributor

@nonsense Yes, but this also exposes it from the outside, via an API: "What chunks do you have?"

@nonsense
Contributor

@holisticode yeah, we discussed that this will be an admin endpoint or something like that - one that you should not expose unless you know what you are doing. geth already has such APIs.

@acud
Member

acud commented Jan 17, 2019

Like @holisticode, I think this should be implemented over JSON-RPC, not HTTP.
I'm also not sure we should expose this over HTTP (although it would probably simplify the tooling for reading the data from the nodes in a cluster/deployment).

Regarding the points @skylenet mentioned:

peers

This should already be available through JSON-RPC: do a geth attach path/to/ipc and invoke admin.peers. That list, though, I think comes from the p2p layer and not from the kademlia, so it is not a really stable number. I can expose the kademlia peers instead.
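
For reference, a hedged sketch of doing the same programmatically with go-ethereum's rpc client (the IPC path is a placeholder; admin_peers returns []*p2p.PeerInfo):

package main

import (
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/p2p"
	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	// Placeholder IPC path; use the node's actual ipc endpoint.
	client, err := rpc.Dial("/path/to/bzzd.ipc")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	var peers []*p2p.PeerInfo
	if err := client.Call(&peers, "admin_peers"); err != nil {
		log.Fatal(err)
	}
	for _, p := range peers {
		fmt.Println(p.Enode)
	}
}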

known nodes

I will expose this as the kademlia known peers in the address book.

chunk IDs that are in the local datastore

I'm assuming there's no other way to do this apart from running a full database scan, checking for each key whether it is a chunk hash index key and, if it is, assuming it is a chunk in the database.
I believe the problem with this is quite obvious: it would bring each queried node to a halt (I believe it would just prevent any interaction with the database). With a database of a few thousand chunks it would be fine, but with a million, for example, we'd be seeing longer locks. I've already noted this to @zelig and @janos (with the new shed we would have far more indexes, which in turn means we'd have to iterate over all those entries too). If we isolate the usage to test clusters I guess this is fine, but we probably shouldn't use this in production.

I really think we should maintain several database connections when using leveldb as a backend (one for each index, basically, with a separate data folder for each of them).

@janos
Member

janos commented Jan 17, 2019

@justelad Yes, iterating is needed on every request to get all chunks. But iteration does not need to touch all indexes (for either the current or the new localstore); we only need to iterate over the retrieval index. Also, iteration does not lock the database: every iterator takes a snapshot on which it iterates, allowing other operations to commit changes to the log. I have measured that it takes around 0.7s to iterate over 1,000,000 keys on my laptop with a pretty good SSD. It is still an expensive operation, with 100% CPU utilization.

But I doubt that we would need to provide all chunk keys in one response. Paging is a must for lists like this. A response with a few million items, even if each is just 32 bytes, is not efficient for either the server or the client.
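
A sketch of that paging idea, assuming a plain goleveldb handle (retrieval-index key filtering is omitted; listChunkPage and its cursor scheme are illustrative, not an existing API):

package inspect

import "github.com/syndtr/goleveldb/leveldb"

// listChunkPage returns up to limit keys starting at the cursor `start`
// (nil means the first key), plus the cursor for the following page
// (nil when iteration is exhausted). Iterators read from an implicit
// snapshot, so writers are not blocked while a page is being built.
func listChunkPage(db *leveldb.DB, start []byte, limit int) (keys [][]byte, next []byte, err error) {
	it := db.NewIterator(nil, nil)
	defer it.Release()

	ok := it.Seek(start) // position at the first key >= start
	for ok && len(keys) < limit {
		keys = append(keys, append([]byte{}, it.Key()...)) // copy: the iterator reuses its buffer
		ok = it.Next()
	}
	if ok {
		// The key we stopped on becomes the next page's cursor.
		next = append([]byte{}, it.Key()...)
	}
	return keys, next, it.Error()
}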

@nonsense
Contributor

nonsense commented Jan 22, 2019

@skylenet for peers, we already have admin.peers. We could do something like:

# 1. enable apis for a given deployment
    extraFlags:
      - --ws
      - --wsorigins=localhost
      - --wsapi=admin,debug,pss

# 2. port forward to websocket interface on a given node
kubectl port-forward -n tony pods/swarm-private-40 8546:8546

# 3. get peers and extract only enodes
echo '{"jsonrpc":"2.0","method":"admin_peers","id":1}' | websocat ws://localhost:8546 --origin localhost | jq ".[]" | tail -n+3 | jq ".[] | .enode"

We could get the node ID with:

echo '{"jsonrpc":"2.0","method":"admin_nodeInfo","id":1}' | websocat ws://localhost:8546 --origin localhost | jq ".[]" | tail -n+3 | jq ".id"

This should be mostly enough to construct a network snapshot, once we query all nodes from a given deployment. cc @gluk256
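
A hedged sketch of that snapshot tooling in Go; the endpoint list and the "localhost" origin (matching --wsorigins above) are assumptions:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/p2p"
	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	// One entry per node in the deployment, e.g. via kubectl port-forward.
	endpoints := []string{"ws://localhost:8546"}

	for _, ep := range endpoints {
		// The origin must match the node's --wsorigins setting.
		client, err := rpc.DialWebsocket(context.Background(), ep, "localhost")
		if err != nil {
			log.Fatal(err)
		}
		var info p2p.NodeInfo
		if err := client.Call(&info, "admin_nodeInfo"); err != nil {
			log.Fatal(err)
		}
		var peers []*p2p.PeerInfo
		if err := client.Call(&peers, "admin_peers"); err != nil {
			log.Fatal(err)
		}
		// Print an adjacency list: node ID, then the enode of each peer.
		fmt.Printf("%s -> %d peers\n", info.ID, len(peers))
		for _, p := range peers {
			fmt.Printf("  %s\n", p.Enode)
		}
		client.Close()
	}
}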

@nonsense
Contributor

@justelad getting the peers from admin.peers should be just fine in private deployments, after the network has been established. In a public network you might see a peer there that has not yet gone through the bzz handshake, I guess, but the chances of this happening should be low.

@nonsense
Contributor

As a next step, I suggest we add an API to the stream protocol, called listChunks for example, that runs an iterator over the local store and prints the chunk IDs. I'm not worried about load: as @janos says, it is pretty fast, and we will use this only for debugging, when the node is not syncing or storing chunks (this is visible in the counters for handled messages in Grafana).

Adding an API to hive for known peers would also probably be helpful.

@holisticode holisticode self-assigned this Jan 29, 2019
@holisticode
Contributor

After a conversation at the standup on 29/01/2019, the decision has been taken to implement a HasChunk method that can be called via the API.

A complete listChunks method will be implemented only on the GlobalStore.
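
Not the actual implementation, but a sketch of the shape such a method could take, with ChunkStore as a hypothetical stand-in for the local store interface:

package inspect

import "context"

// ChunkStore is a hypothetical read-only view of the node's local store.
type ChunkStore interface {
	Has(ctx context.Context, addr []byte) bool
}

// Inspector would be registered under an admin-only JSON-RPC namespace.
type Inspector struct {
	store ChunkStore
}

// HasChunk reports whether the given chunk address is in the local store.
func (i *Inspector) HasChunk(ctx context.Context, addr []byte) bool {
	return i.store.Has(ctx, addr)
}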

@holisticode
Contributor

A way to query known nodes is via the GetPeerSubscriptions() endpoint, which was implemented in ethereum/go-ethereum#18972.

@holisticode
Contributor

Closed by ethereum/go-ethereum#18972

Two possible solutions for known nodes:

  • admin.peers
  • GetPeerSubscriptions

@nonsense
Contributor

nonsense commented Feb 7, 2019

admin.peers shows actual peers, not all known nodes.
Known nodes are collected as part of the hive protocol. As soon as we have a use case for them, we should expose them as well, in a more structured format (currently they are visible only in the bzz.hive string).
