Decentralised NAT Signalling #365
Comments
There are a few things that come to mind that need to be addressed for this.
This will need a bit of discussion to refine the idea some more, but I think we have a good basis here. For a start, I think some refactoring work needs to be done for hole-punching and related messaging.
A diagram here would help us understand what the problem is.
Also, the difference between closest node connections vs. closest node entries in the NodeGraph needs to be clarified.
To maintain network connectivity we need to maintain a number of open connections for each node. This raises some questions. The main one: is there any point in distinguishing whether entries in the NodeGraph are active connections or not? What can we do with the host/port information without an active connection to that node? If this information has no use when we're not connected to the node, then shouldn't we only store information about nodes we're actively connected to?
The other approach we considered was to use Kademlia to gossip/broadcast a hole-punch relay message to the target we are trying to connect to: we would pick N nodes out of the nodes we were connected to that are the closest to the target node, and ask them to relay a hole-punch message. If each node receiving this message relayed it to the N of its connected nodes that were closest to the target node, then we would eventually get the message to the target node. This approach has some disadvantages.
NAT punching and discovery (dealing with bidirectional punch relay message)
It may not be useful to keep storing the previously seen node information.
There is a Node.js library for negotiating port mappings with the gateway.
Persistent connections that are needed to maintain "NAT reachability" will need to be maintained. Note that "NAT reachability" is the ability for a node to be reachable behind a NAT even while idle.
Assigning @emmacasolin to this issue, to be resolved after #357 is merged. It's a good continuation of that PR. Please review this issue's details, and we can review it on Monday next sprint.
Something like Dijkstra's algorithm might be useful here for forming increasingly closer connections to the target node; however, we would need to modify the definition of "closeness" to better meet our needs. Rather than defining the "closest" node (to the target node) using solely the Kademlia metric, we could incorporate a node's held proxy connections when deciding which node to use as a signaller. The "closest" node can be defined as the node that we have an open connection to that has an open connection to the closest (Kademlia definition) node to the target node. If we also maintain that every node must have open connections to at least its k closest nodes then, provided k >= 2, we are guaranteed to be able to find the "shortest" path to the target node, since the next connection we make will always get us closer to the target. The connections made during this process could potentially be cached to speed up connection formation in the future.

Take for example this graph of nodes, where k = 2. The lines are active connections, and some nodes have more than k active connections. If all nodes have exactly k active proxy connections, then using the closest Kademlia-wise node to the target that we're connected to will take the same number of connections as taking active connections into account. If any nodes have more than k active proxy connections, however, then using only the Kademlia definition can be either the same or slower.

In this example, Node 10 wants to connect to Node 2. If it chooses the next closest node to the target using the Kademlia definition of closest, then it will choose 11 as a signalling node, followed by 0, then 1, then finally the target node 2. In this situation three new proxy connections would need to be established. If it were instead to choose the signalling node as the node with an active connection to the next closest node to the target (Kademlia definition), then it would choose 9 as a signalling node, followed by 1 and then the target node 2, needing only two new proxy connections to be established.
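The modified closeness above can be sketched as follows: pick as signaller the connected node whose own reported connections get XOR-closest to the target. All names here (`pickSignaller`, the node ids) are hypothetical illustrations with simplified numbers, not Polykey APIs or the exact figure described above.

```typescript
type NodeId = bigint;

// Standard Kademlia XOR distance metric.
function xorDistance(a: NodeId, b: NodeId): bigint {
  return a ^ b;
}

// `connections` maps each node we are connected to, to the set of nodes
// that it is actively connected to (as reported to us).
function pickSignaller(
  target: NodeId,
  connections: Map<NodeId, Set<NodeId>>,
): NodeId | undefined {
  let best: NodeId | undefined;
  let bestDist: bigint | undefined;
  for (const [peer, peerConns] of connections) {
    for (const second of peerConns) {
      const d = xorDistance(second, target);
      if (bestDist === undefined || d < bestDist) {
        bestDist = d;
        best = peer;
      }
    }
  }
  return best;
}

// Illustrative: node 9 reaches node 3 (distance 1 from target 2), which
// beats anything node 11 can reach directly (node 0 is distance 2 away).
const conns: Map<NodeId, Set<NodeId>> = new Map([
  [11n, new Set([0n])],
  [9n, new Set([3n])],
]);
const signaller = pickSignaller(2n, conns);
// signaller === 9n
```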
Looks good. This was generally the idea I had about discovery, but re-reading the spec I see that it's pretty vague on this. If I recall my thinking correctly, we would've asked all of our connected nodes for their X closest nodes to the target. Out of that larger set we would select the closest nodes and make connections to them. I think that's in line with your modified closeness definition above. Keep in mind that in terms of optimisation, I think it's more important to minimise the number of alternate routes we explore than the number of hops we take to the target.
It's a good idea to do a simulation of this in a simple script. You'll have to start off with some cases:
If we're only maintaining active connections to the k closest nodes, isn't it possible that sub-groups of close nodes could become "detached" from the rest of the network? Only maintaining connections to the closest nodes seems like it would encourage connections to clump together, meaning that clumps of nodes may only be connected to each other and be unreachable to nodes outside of the "clump".
Yeah, so we may need to also maintain connections to far-off nodes to balance it out.
It bears looking into, but we do want strong 'local' connectivity. I was thinking about maintaining some 'regional' connections spread across all buckets to act as shortcuts during discovery. I don't think we can end up with disconnected groups if we just rely on the closest nodes, but that is something that should be checked.
Imagine k is 20: if every node in the 20 closest nodes connects to every other node among those 20, you have a cluster of 20 close nodes with no outside connections.
Actually, I don't know if that's the case. @emmacasolin can you use #158 (comment) (script: #326 (comment)) and work out a visualisation/simulation? If the minimum number of connections is 1, then it should be possible to have disconnected pairs. But if the minimum number of connections is greater than 1, I imagine there's less likelihood of that being the case, even if all nodes have the same complete node graph. But I'm not sure if this is true; can you verify it? Also remember that connections are directional: we have a directed graph, because connections are client to server. Also, once another node has made a directional connection to you, you should be able to make the reverse directional connection without hole punching. Active proxy connections may be divided between outgoing and incoming; these are separate.
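One way to check the fragmentation concern above is a plain reachability search over the directed connection graph. A minimal sketch, with made-up node ids and edges forming two "clumps" that only connect internally:

```typescript
type Edges = Map<number, number[]>;

// BFS over directed edges from a start node, returning every reachable node.
function reachableFrom(start: number, edges: Edges): Set<number> {
  const seen = new Set<number>([start]);
  const queue = [start];
  while (queue.length > 0) {
    const n = queue.shift()!;
    for (const m of edges.get(n) ?? []) {
      if (!seen.has(m)) {
        seen.add(m);
        queue.push(m);
      }
    }
  }
  return seen;
}

// Two clumps, {0,1} and {2,3}, with no edges between them.
const clumped: Edges = new Map([
  [0, [1]], [1, [0]],
  [2, [3]], [3, [2]],
]);
const reach = reachableFrom(0, clumped);
// reach = {0, 1}: nodes 2 and 3 are unreachable, i.e. the network is fragmented.
```

A real simulation would generate the edge sets from a connection policy (closest-k, random-k, etc.) and run this check from every node.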
You could use graphology, ngraph or Cytoscape. https://github.com/anvaka/ngraph. Related is #367.
For future reference, most "visual" aspects of the JS ecosystem are outside of Node.js conventions. You're now in the territory of frontend JS development, which requires different context. Here are some resources:
Some Node.js packages purport to provide terminal rendering. These are often quite niche, and more advanced visualisations often don't work or are just toys. Simpler visualisations are possible though with https://github.com/vadimdemedes/ink and https://github.com/kroitor/asciichart. Some Node.js packages support desktop GUIs like imgui and other graphics libraries. These tend to be OS-specific, though some are cross-platform. These are more advanced of course, and are intended to create general GUI programs. Recommendations:
So we are going to need this for the Beta CLI launch. I'm exploring more of this idea here: https://chat.openai.com/share/de1b9d93-755b-4fb5-91e8-1288fa7c8300. I need someone to expand this conversation to include what we do with the MDNS-discovered nodes, since they are not part of the node graph atm. It seems a multi-hop signalling system is needed to find the closest nodes to connect from. However, I want this to be as efficient as possible: can the operation to hole punch the closest nodes be done in a similar way to how we find the node information? We want to avoid amplification here. @amydevs, comments regarding MDNS required. @tegefaulkes, any new things coming from the nodes domain that has been refactored recently?
Also, #599 is relevant to this discussion too, since the mainnet/testnet page will need to acquire information from the seed node cluster, and if nodes are connecting to only m/n of the seed cluster then this may impact the reconciliation of all of the information.
Information from MDNS is not relevant to this process. Address information within a single machine or a LAN is not useful for punching in from an external network. Besides that, what has changed in the nodes domain since writing this up?
I need to make the operation generic from the perspective of A trying to connect to B. It's important that it's efficient even if internally it's doing multi-hop hole punching.
Depends on what you mean by that. This spec describes a method of finding a common node to signal with. But as for doing something like sending a signalling message across multiple hops, I'm very resistant to doing this. Not that it's not an option. But how do we send a message like that very quickly across multiple hops? How do we discover the connection chain between two nodes remotely without the amplification problem?
Multi-hop means attempting a bridge at each step to help facilitate a connection.
Ok, I just re-read everything in this issue. This is actually a very complex problem that has a lot of different kinds of solutions with various trade-offs. I'm going to try to summarise the situation right now.

The root of this problem is the existence of NAT. Even with IPv6, NAT is likely to continue being a problem. This is because even without a NAT gateway, there may be firewalls that simply default-deny unknown packets. And this is likely to be true on mobile networks regardless of any technology evolution (ignoring the fact that hole punching itself would not work here, but solving this problem is necessary for efficient decentralized relaying too).

Currently we have a fixed set of seed nodes that is deployed into AWS. Let's take testnet.polykey.com as an example. Details on this are being discussed in #599. Currently all PK nodes connect to the seed nodes upon launching. This ensures that all nodes have a port-mapping established. The network stack has been completely replaced with QUIC. Furthermore, we now only have a single QUIC socket per node.

Suppose our seed cluster has 2 seed nodes. That means right now all nodes launched will have 2 QUIC connections, but 1 QUIC socket, and thus 1 port mapping in their NAT (assuming NAT exists for them).

During a recent testnet 6 test, we were able to connect 2 separate nodes, both behind hole-punched NATs, between Sydney and Austin. This was done with the help of the seed nodes facilitating the signaling operation. Atm there's a bug with both nodes being able to hole punch each other #605, but there's on-going progress on that from #609 (@tegefaulkes confirm please).

This works because all nodes are connected to the same seed nodes. Which means when node A wants to connect to node B, it can basically ask any or all of the seed nodes S1 and S2 to forward on the signaling message. In relation to #365 (comment), this means there's a strict quorum; well, it's completely overlapping basically. Now why does an overlap matter? The reason is simple.
The seed nodes all have different IPs. In a prior testnet integration #403 we realised that we are not going down the route of a "distributed and load balanced" seed node cluster architecture, as this goes against the principle of decentralisation. Therefore, because the seed nodes have separate IPs, during port mapping at the NAT layer, if node B only connected to S2 while node A connected to S1, then node B's NAT would not accept packets from S1, as its IP and port are not part of the port-mapping structure in B's NAT. Basically they would be dropped as unknown packets.

From a NAT port mapping perspective, it is important to remember that these mappings are temporary, and may disappear after 20 seconds of zero packet activity. Primarily we can assume that they map a particular local IP and local port inside the LAN to an external IP and external port. Because we use the same QUIC socket for a given node, the local IP and local port are always the same, but the external IP and external port will vary for every external node it is connecting to.

So node A and node B must share a common seed node; in this issue's parlance, that is called the "signaling node". This is simply due to the need for the common signaling node's IP and port to be in place in both A's and B's NAT port mappings, so that both A and B accept packets from it. This is the core problem of this issue. Now, decentralized signaling means achieving 2 things:
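The filtering behaviour described above can be modelled in a few lines: a mapping only accepts inbound packets from remote endpoints it has previously sent traffic to. This is a toy sketch for intuition (roughly address-and-port-dependent filtering), not a faithful NAT implementation, and the endpoint strings are made up.

```typescript
type Endpoint = string; // "host:port"

class NatMapping {
  private allowed = new Set<Endpoint>();

  // Outbound traffic installs the remote endpoint into the mapping.
  sendTo(remote: Endpoint): void {
    this.allowed.add(remote);
  }

  // Inbound packets from unknown endpoints are dropped.
  acceptsFrom(remote: Endpoint): boolean {
    return this.allowed.has(remote);
  }
}

// Node B only ever contacted seed S2, so packets from S1 are dropped:
const natB = new NatMapping();
natB.sendTo('s2.example:1314');
// natB.acceptsFrom('s2.example:1314') === true
// natB.acceptsFrom('s1.example:1314') === false
```

This is why A and B need a *common* signaling node: its endpoint must already be in both NATs' allow-sets for the signal to get through.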
These 2 are sort of interrelated, enabling the seed node architecture to scale. By scalability we mean: for every new seed node we deploy, do we get superlinear, linear or sublinear returns to overall network capacity (where capacity is measured by multi-constrained limits of how big a practical realistic network can be)? The current architecture does not scale, because every seed node is potentially doing 100% of the work of all signaling operations and 100% of all connections (which is to say, if there were 1000 nodes launched, there would be 1000 connections to every single seed node).

The ICE protocol in each node is asking all the seed nodes concurrently to perform the signaling operation. Although technically it could also just ask 1 out of all of them, with the assumption that all seed nodes are connected. @tegefaulkes can you confirm where in the source code this is decided upon? Choosing only 1 out of all seed nodes to do this reduces the workload somewhat; however, all nodes still have to connect to the common set of seed nodes in its entirety.

As per #365 (comment), it is possible to work around this a bit by introducing strict or sloppy quorums, thus enabling at least one node to be the common seed node. However, this introduces the problem of knowing which node is the actual common seed node to perform the signaling role. And such a solution would only work in the context of the blessed seed node cluster.

As an aside, #599 is also discussing a change in the DNS record architecture to allow for a public dashboard of the network. This actually fits into our designs for private network bootstrapping, and also means there will need to be a change from hardcoding trusted seed node IDs to instead relying on a root certificate for every node, which would require #154. This enables dynamic seed node autoscaling.
Now, earlier in this issue, we imagined that getting every node to connect to their closest nodes is a decentralized co-ordination algorithm: it means we can expect every node to have connected to their closest nodes, thus placing a NAT port mapping for their closest nodes, and therefore being able to re-use the Kademlia closeness to "find" the potential signaling node to use to connect to a particular target node. However, we later realised that this kind of decentralized co-ordination would not form good connectivity across the entire network graph (#365 (comment) and #372). So even if we were always actively connected to our closest nodes (where active means regular packet activity, and therefore the NAT port mapping is maintained), if node A was not in the same cluster as node B, then none of the potential signaling nodes would be reachable. This problem is transitive: if the potential signaling nodes are not reachable, then none of the potential signaling nodes of the signaling nodes will be reachable, and so on.

Empirical simulation shows that we could have good connectivity with random node connections, but there's discussion about how the reason this works is primarily due to having connections to the farthest bucket. #365 (comment) @tegefaulkes, can you explain what the difference is between spreading random active connections across the buckets, versus just random active connections across the entire graph? My impression is that they are roughly the same. You say:
This conflicts with the empirical summary from the simulation saying that random connections work fine too. Is your objection due to the fact that setting up random connections doesn't work with Kademlia atm, since we always try to find the closest 20 nodes first? Also remember that a node record is only added to the node graph upon a successful connection, meaning that we would likely connect to up to 20 nodes to fulfill the bucket size, and this is likely to be spread through the first few buckets. Then are you saying that we add additional methods into our node graph to also find the farthest 20 nodes too? I'm just confused as to whether your comments came after the results showing random connections maintained connectivity with the least complexity, and are therefore really arguing that while random works fine, structurally we need to fit into Kademlia's way of finding nodes to connect to anyway, thus we now need to find closest nodes, farthest nodes... or just "give me 20 random nodes across the entire graph"? This is in reference to #365 (comment) since you posted that on the same date as #372 (comment). ...MORE COMMENTARY COMING
Replying to parts of the above comment...

"During a recent testnet 6 test, we were able to connect 2 separate nodes both with hole-punched NAT between Sydney and Austin. This was done with the help of the seed nodes facilitating the signaling operation. Atm, there's a bug with both nodes being able to hole punch each other #605, but there's on-going progress on that from #609 (@tegefaulkes confirm please)."

Yes, I'm nearly done with #609. #605 is planned, but I think addressing the memory problems with #598 is going to be done first.

"The ICE protocol in each node is asking all the seed nodes concurrently to perform the signaling operation. Although technically it could also just ask 1 out of all of them, with the assumption that all seed nodes are connected. @tegefaulkes can you confirm where in the source code this is decided upon?"

Hole punching is initiated with the relevant nodes-domain method.

"@tegefaulkes, can you explain what the difference is between spreading random active connections across the buckets, versus just random active connections across the entire graph. My impression is that they are roughly the same."

For connectivity we want a random spread of connections across the network. For searching to work with Kademlia, the 'closer' two nodes are, the more likely they should have a connection. But if you only held connections to your 'closest' nodes then the network would be fragmented into clusters. So we need a mix of random connections and close ones.
Continuing from the previous commentary, some discussions with @tegefaulkes indicate that:
Now, global connectivity means we need to at least have random connections, but we must also have connections to the farthest bucket. This implies that random selection by itself shouldn't be enough; we should have policy-directed randomness. In order to do this, the NM and NG must provide a way to also find the farthest bucket. So instead of only finding the closest nodes, we also want to find the farthest nodes and a random spread of other nodes to maintain active connections to.

From the NG's perspective, finding the farthest nodes or selecting random nodes in the node graph will be easy. The NG buckets are just prefix indexed, and we can always select nodes from the farthest bucket. The seed nodes will receive connections from all over the Node ID space, so they are likely to have a fairly uniform split because they always receive contact first. For the NCM to get the farthest bucket, it's just a matter of calling: nodeGraph.getBucket(255);

As a side note, I noticed that the NG already has some related query methods; the issue is that, using those, I'm not sure how that would work in relation to random selections across the network. If we always connect closest and farthest, then why do we need random selection in between? The first step, at the very least, is to be able to query the NG in more sophisticated ways. Subsequently we need to expose this information over the network. Then I am still confused as to whether Kademlia will even help us find the intermediate signaling node necessary. Since closest won't help, how would we know which nodes that are closest or farthest away can act as a signaling node? Maybe another selection strategy is required? If closest causes disconnected clustering, why not always pick the farthest, and use that as our signaling system? According to #372 (comment), the bottom-K strategy did work, but wasn't as efficient as random-K at maintaining connectivity with the minimum number of active connections.
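The bucket queries discussed above can be sketched with the usual prefix indexing: the bucket index of another node is the position of the highest set bit of the XOR distance (so bucket 255 is the farthest for 256-bit ids). The function names and small ids below are illustrative, not the actual NodeGraph API.

```typescript
// Position of the highest set bit of the XOR distance; -1 means same id.
function bucketIndex(own: bigint, other: bigint): number {
  let d = own ^ other;
  let i = -1;
  while (d > 0n) {
    d >>= 1n;
    i++;
  }
  return i;
}

// Group known node ids into buckets relative to our own id, so we can
// then pick from the closest bucket, the farthest bucket, or a random one.
function buckets(own: bigint, ids: bigint[]): Map<number, bigint[]> {
  const out = new Map<number, bigint[]>();
  for (const id of ids) {
    if (id === own) continue;
    const i = bucketIndex(own, id);
    out.set(i, [...(out.get(i) ?? []), id]);
  }
  return out;
}

// With 4-bit ids and own id 0b1000:
const own = 0b1000n;
const bs = buckets(own, [0b1001n, 0b1010n, 0b0000n, 0b0111n]);
// 0b1001 -> bucket 0, 0b1010 -> bucket 1,
// 0b0000 and 0b0111 -> bucket 3 (the farthest bucket here)
```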
I think there may be a trade-off between the probability of finding a sufficient signaling chain to reach a target node and the number of active connections to be maintained for full connectivity. If everything was connected to everything, then any path would lead me to the target. But if there are fewer active connections, then there are fewer paths to my target, and thus a better way of identifying which path to take becomes important.
Some additional notes. The Kademlia XOR metric is subject to the triangle inequality. https://stackoverflow.com/a/25756549 This means the distance from A to B is less than or equal to the distance from A to C plus the distance from C to B. So if C is close to B, and B is far away from A, then C would also have to be far away from A, because the sum of the two distances must be greater than or equal to the distance from A to B. In other words: if B is the target node, C is a node close to B, and B is far away from A, then C is also far away from A, potentially even farther.
This triangle inequality property is quite important for giving a geometric intuition of how this thing will work. If B is the target, then we have to search backwards. What we are searching for first is C: the node that has an active connection to B. And if we are preferring active connections to be selected from close nodes, then C would be relatively far from A as well. Upon finding C, we could attempt a connection to it first, because we have to in order to initiate the signal phases. Then, of course, in order to connect to C, we have to find C', the intermediate node that we can use to help us connect to C. Now there's a problem here: none of this means that C is any closer to A, the originating node, because the distance from A to C could be equal to the distance from A to B. This means that as we keep trying to find C', and C'', each time we may just be finding another node that is just as far away, and potentially farther away, in XOR distance from A. Eventually we would exhaust everything that is close to B, C, C', C''... etc. (assuming that C is always the closest node to B). In fact, if we ask Kademlia for the closest nodes to B, it would likely give us the list C, C', C''... etc. We're not actually getting "closer" to a node that is close to us.
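The triangle inequality claimed above holds for XOR because XOR is addition without carries, so `(a ^ b) <= (a ^ c) + (c ^ b)` for all ids. A quick brute-force check over small 4-bit ids:

```typescript
// Verify the XOR metric's triangle inequality numerically:
// d(a, b) <= d(a, c) + d(c, b) for all a, b, c in [0, 16).
function xor(a: number, b: number): number {
  return a ^ b;
}

let holds = true;
for (let a = 0; a < 16; a++) {
  for (let b = 0; b < 16; b++) {
    for (let c = 0; c < 16; c++) {
      if (xor(a, b) > xor(a, c) + xor(c, b)) holds = false;
    }
  }
}
// holds === true: the inequality never fails on this range.
```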
Ok, on another note, the procedure right now is roughly:
The modification to ICE is to not simply always contact all seed nodes, but instead to do this more intelligently. We could use provenance here. One idea is that whoever knows about this Node ID and was able to tell you about the IP + port can only have done so if they had previously connected to that Node ID, because records do not go into the node graph until a successful connection has been made. Therefore one can observe that if a node does not even have the relevant record in its node graph, then it is impossible for it to be a valid signalling node in the context of hole punching. Conversely, if a node has the relevant record, it could be a valid signalling node if it still has an active connection to that node. The reason it could and not will is because:
So the 2 reasons are really just 1 reason. That 1 reason becomes more lopsided over time: the longer a node is running in the network, the more likely there will be nodes that had former connections to it. (By how much though? The refresh-bucket system seems like it wouldn't necessarily always pick a node ID that exists.) As a side note, the termination condition of the find-node mechanism is not verified. It appears that we need a few conditions to ensure that we do not end up contacting the entire network. Right now it appears the refresh-bucket mechanism would result in every node in the network slowly flooding the entire network by trying to connect to everybody. This needs to be prevented.
Provenance may be relevant here: if the NodeGraph kept track of who told it about this NodeId information prior to connecting to it, this may help trace back to those who would have a persistent connection to the target node. On the other hand, this may just always lead back to our seed nodes, and that isn't particularly great either.
To summarise the situation right now: we have to modify ICE where it selects the node to do the signalling. At that specific point in time we have a few choices to figure out:
Ok, so here's an idea. Let's make nodes maintain connections to close nodes, but also have random connections to far-away nodes. The first step is to resolve the Node ID; that means doing a find-node operation, which leads us to close nodes that are likely to have active connections to the target node. The second step, simultaneous with the first, is to ask our own active connections (some of which are close and some far) what their active connections are. Basically this lets us understand who is connected to us and who they are connected to, as well as who is connected to the target node. This search process will eventually lead to a bridge point: a node that is in the set intersection, and is the missing link that can bridge the 2 networks together. There is a high probability that all nodes are only separated by 6 degrees.
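The bridge-point search above can be sketched as a search over our side of the graph that stops as soon as it hits a node believed (from the find-node side) to hold a connection to the target. All structures and ids here are hypothetical.

```typescript
// Expand outward through our active connections; return the first node
// that is also believed to be connected to the target (the bridge).
function findBridge(
  ourSide: Map<number, number[]>, // node -> its reported active connections
  targetSide: Set<number>,        // nodes believed connected to the target
  start: number,
): number | undefined {
  const seen = new Set<number>([start]);
  const queue = [start];
  while (queue.length > 0) {
    const n = queue.shift()!;
    if (targetSide.has(n)) return n; // intersection found: this node bridges
    for (const m of ourSide.get(n) ?? []) {
      if (!seen.has(m)) {
        seen.add(m);
        queue.push(m);
      }
    }
  }
  return undefined; // the two networks never intersected
}

const ourSide: Map<number, number[]> = new Map([
  [1, [2, 3]], [2, [4]], [3, []], [4, []],
]);
const bridge = findBridge(ourSide, new Set([4, 9]), 1);
// bridge === 4: node 4 is reachable from us and connected to the target.
```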
Actually, the 6 degrees of separation isn't proven. We still need to work out the probabilistic longest chain.
So at 6 random connections, the small-world model says that the number of intermediate hops is on the order of log(N) / log(6) for a network of N nodes.
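For a quick sense of scale, the standard small-world estimate of expected path length, log(N) / log(k), can be evaluated directly (the network size is just an example):

```typescript
// Expected hop count in a random network of n nodes with k random
// connections per node, per the small-world path-length estimate.
function expectedHops(n: number, k: number): number {
  return Math.log(n) / Math.log(k);
}

const hops = expectedHops(1_000_000, 6);
// Roughly 7.7 hops for a million-node network with 6 random connections each.
```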
Going to try this implementation and see how it goes:
No matter what, just connecting to nodes to do the find-node operation (which is intended to just resolve the node ID to the IP + port) requires connecting to other nodes anyway. So ICE has more options to try. If you have 6 active connections, you can ask those connections what their active connections are, and basically whether they can help you connect. You can then make connections to those, and ask them what their active connections are. This is somewhat close to a flooding operation, where you are flooding through the active connection lists. Actually, is this a maze traversal problem? Kind of like the micromouse challenge? It seems the find-node operation could be used along with a flood fill of active connections.
@tegefaulkes, write up a summary comment about what has been done in #618 and how we may proceed to improve it. Also, we now begin empirical tests. We need to create some EC2 machines and set up the NAT firewalls for all these machines. We can take code from the NAT simulations in PK CLI (which has been disabled) and re-use it to set up the nftables/iptables across these EC2 machines. We can do it at scale with Pulumi, and start stress testing the network. Observability can be improved with more events/metrics exposed to the dashboards #599.
Tomorrow morning, make sure to get PK CLI set up with the new version of PK and deploy the new version to the testnet.
Requirements of this design: NAT punching.
Additional context
Specification
We need to be able to hole punch a connection to the target node in a decentralised way. This can happen in stages.
Holding connections
To ensure that a connection path is available, we need an environment where a chain of connections to the target is likely to exist. To do this, each node needs to hold active connections to other nodes. The distribution of these connections needs to ensure that the overall network is strongly connected. Bonus points for minimising the number of hops.
Using the Kademlia structure we can maintain active connections to the N closest nodes, but we can also hold connections to other nodes across the network. The closest nodes should maintain strong local connectivity in the network, while the extra connections will form shortcuts between parts of the network that are not closely related. Note I mean close in the Kademlia sense of closeness.
Discovery
The discovery process is very similar to the Kademlia FIND_NODE process. The difference here is that we only search over the active Proxy connections. Using this process we should be able to find a chain of connections that reaches our target node. Generally, the idea is that we are walking the graph of active connections to find a chain of connections that leads to our target. Naively this would be a pretty expensive and exhaustive search, but by utilising the Kademlia structure we can ensure that we only walk closer to our target at each step of the search.
Forming direct connections
Given a chain of connections A>B>C>D, we should be able to form a direct connection A>D. We can form incrementally closer connections by using each step of the chain as a signaller to punch a connection to the next part of the chain. I will refer to this increasingly closer connection as a Feeler. With the chain A>B>C>D, we can start by requesting B to signal a hole punch between A and C. If the hole punch is successful then we now have a connection chain of A>C>D. We can then repeat the step, using C as the new signaller to form a connection between A and D, thus completing the desired direct connection.
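The feeler loop above can be sketched as follows. `signalPunch` here is a hypothetical stand-in for the real signalling RPC; it asks the given signaller to coordinate a hole punch between us and the next hop.

```typescript
type PunchFn = (signaller: string, from: string, to: string) => boolean;

// Walk the chain A>B>C>D: at each step, ask the current hop to signal a
// hole punch to the next one, shortening the chain until we hold a
// direct connection to the target.
function formDirectConnection(
  chain: string[],        // e.g. ['A', 'B', 'C', 'D'], chain[0] is us
  signalPunch: PunchFn,
): string | undefined {
  const self = chain[0];
  // chain[1] is already directly connected; punch through the rest.
  for (let i = 1; i < chain.length - 1; i++) {
    const ok = signalPunch(chain[i], self, chain[i + 1]);
    if (!ok) return undefined; // punch failed: need an alternative chain
  }
  return chain[chain.length - 1]; // now directly connected to the target
}

// With every punch succeeding, A ends up directly connected to D:
const result = formDirectConnection(['A', 'B', 'C', 'D'], () => true);
// result === 'D'
```

If any punch fails, the function gives up, matching the note below that an alternative chain must then be found.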
At each step, the 'signalling' node is directly connected to both nodes and is thus aware of the 'NAT-ified' ports and addresses; it can easily coordinate a hole punch between the two nodes. So long as we can find a connection chain, we can form a connection between two nodes through NATs without a centralised signaller. If we fail to form a more direct connection then an alternative chain will need to be found; given a graph that is sufficiently connected, there should be multiple routes to the target. If for some reason a direct connection can't be formed, this process should ideally connect us with a node that IS holding an active connection to the target node and can act as a relay for it. It's possible that not all nodes will want to act as a data relay, so we may need a way to negotiate with the target node to connect to a relay node that will relay data.
Full process
We can combine the discovery and connection-forming to create increasingly closer connections to our target. The process will look something like this.
Eventually this should form a direct connection to our target. At any point in time we can concurrently have alpha Feelers walking closer to the target during this process. If we had existing discovery knowledge about the network, such as known connection chains that lead to the target, that could greatly speed up the process: if we knew of an existing connection chain that leads to the target, we could just do the connection-forming process without doing any discovery. Likewise, if we had a partial chain that we know leads closer to our target, we could use that as a starting point and reduce how much we need to search.
Additional Resources
Sub-Issues & Sub-PRs created:
MDNS integration and NodeGraph structure expansion #537