Hidden Services Specifications for anonymous seeding
The design described here is mostly based on the ideas and excellent work of the people behind the Tor Project. Tor hidden services are the leading solution for anonymous web hosting, but they are too slow for YouTube-like video streaming. Tor also depends on a number of 'trusted' central directory servers. Our approach uses the UDP protocol and does not rely on central directory servers.
In the original Tor design, central directory servers called HSDirs are used for retrieving information about a hidden service, such as the service key and the public keys of the introduction points. In a peer-to-peer environment no such central server exists; the protocol works in a fully decentralized network of BitTorrent peers. Our solution for retrieving the essential information needed to connect to a hidden seeding service is to store all of it in our built-in DHT. See the figure below for the information that is stored in the DHT. Fields marked "VL" have a variable length and use their first 2 bytes to indicate the length.
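A minimal sketch of how a "VL" field could be written and read, assuming the 2-byte length prefix is an unsigned big-endian integer. The byte order and the helper names `encode_vl` and `decode_vl` are assumptions for illustration; they are not taken from the specification.

```python
import struct
from typing import Tuple


def encode_vl(data: bytes) -> bytes:
    """Prefix data with a 2-byte length (assumed unsigned, big-endian)."""
    if len(data) > 0xFFFF:
        raise ValueError("a VL field can hold at most 65535 bytes")
    return struct.pack(">H", len(data)) + data


def decode_vl(buffer: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one VL field at offset; return (data, offset just past the field)."""
    (length,) = struct.unpack_from(">H", buffer, offset)
    start = offset + 2
    end = start + length
    if end > len(buffer):
        raise ValueError("truncated VL field")
    return buffer[start:end], end


# Two VL fields back to back, read in order (placeholder contents).
encoded = encode_vl(b"service-key") + encode_vl(b"intro-point-key")
first, pos = decode_vl(encoded)
second, pos = decode_vl(encoded, pos)
```

Returning the next offset from `decode_vl` lets a parser walk through several consecutive VL fields without slicing the buffer by hand.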
Both the seeder and the downloader use circuits when accessing the DHT, but in the current implementation the introduction point itself knows which info hash is shared and what the rendezvous point will be. This is a known weakness of the protocol, to be addressed in future work once opportunistic encryption in a web of trust is a reality.
The introduction points and rendezvous point for downloading over hidden seeding services should always be connectable, to allow a downloader to build a circuit and connect to the introduction point of the seeder, and to allow the seeder to build a circuit to the rendezvous point of the downloader. The current approach in the tunnel community is to require every exit node in the network to be connectable. In this case, there is no doubt about the connectability of the introduction point, as the introduction point is in fact an exit node of a circuit initiated by the seeder. This also solves the connectability problem for the rendezvous point, as the rendezvous point is in fact an exit node of a circuit initiated by the downloader. The connectability problem is not quite the same for the two kinds of point, however: the introduction point always needs to be connectable for strangers, but an unconnectable rendezvous point can be punctured. To do so, the hop that needs to connect to the rendezvous point propagates its identity back to the seeder; this identity is relayed via the circuit to the introduction point to the downloader, and then propagated to the rendezvous point, which in turn sends an IPv8 puncture to the last hop. This only works for the rendezvous point because a circuit already exists at that stage. The introduction point will always need to be connectable from the outside world.
Message cells always start with a circuit_id, which identifies the circuit a particular message belongs to. The remainder of the message consists of an encrypted payload. At each hop, this payload is encrypted or decrypted, depending on the hop's position within the circuit. The following sections describe a scenario where Bob owns some files and wants to share them peer-to-peer over the BitTorrent network. Alice is interested in downloading one of these files.
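The cell layout described above (a circuit_id followed by an encrypted payload) can be sketched as follows. The width and byte order of the circuit_id are not stated in this text, so the 4-byte unsigned big-endian encoding below is an assumption, as are the helper names.

```python
import struct
from typing import Tuple

# Assumption: the circuit_id is a 4-byte unsigned big-endian integer.
CIRCUIT_ID_FORMAT = ">I"
CIRCUIT_ID_SIZE = struct.calcsize(CIRCUIT_ID_FORMAT)


def pack_cell(circuit_id: int, encrypted_payload: bytes) -> bytes:
    """Build a cell: circuit_id first, then the (already encrypted) payload."""
    return struct.pack(CIRCUIT_ID_FORMAT, circuit_id) + encrypted_payload


def unpack_cell(cell: bytes) -> Tuple[int, bytes]:
    """Split a cell into its circuit_id and the still-encrypted payload."""
    if len(cell) < CIRCUIT_ID_SIZE:
        raise ValueError("cell is too short to contain a circuit_id")
    (circuit_id,) = struct.unpack_from(CIRCUIT_ID_FORMAT, cell)
    return circuit_id, cell[CIRCUIT_ID_SIZE:]


cell = pack_cell(42, b"opaque-encrypted-bytes")
circuit_id, payload = unpack_cell(cell)
```

A relay only needs the circuit_id to decide where the cell goes next; the payload stays opaque to it apart from the single layer of encryption it adds or removes.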
In preparation for seeding files over the BitTorrent network, Bob builds up at least one anonymous circuit to let another node serve as an introduction point for his seeding services.
The original info hash of the torrent is prepended with the string "tribler anonymous download", and the SHA1 hash is calculated over the result, yielding a modified info hash that is used for finding hidden services. For each file Bob is seeding, a unique key pair is generated and stored in the session.
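The derivation of the modified info hash can be sketched as below. The text does not say how the info hash is encoded before hashing, so this sketch assumes the prefix is concatenated directly with the raw 20-byte info hash; an implementation might instead use, for example, a hex-encoded info hash. The function name is hypothetical.

```python
import hashlib

PREFIX = b"tribler anonymous download"


def modified_info_hash(info_hash: bytes) -> bytes:
    """SHA1 over the prefix followed by the info hash.

    Assumption: the raw info hash bytes are appended to the prefix
    without any separator or re-encoding.
    """
    return hashlib.sha1(PREFIX + info_hash).digest()


original = bytes(20)  # placeholder info hash, not a real torrent
lookup_hash = modified_info_hash(original)
```

Because both Bob and Alice can compute this value from the original info hash, it can serve as a shared lookup key without exposing the original info hash in the lookup itself.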
By sending an establish-intro message with the modified info hash of the torrent file to the last hop of a circuit, Bob turns this last hop into an introduction point for the modified info hash. The payload byte format of the establish-intro message is shown below.
After receiving an establish-intro message, the introduction point responds with an intro-established acknowledgment message to the seeder. The introduction point will also announce the torrent's info hash to the IPv8 DHT. This way, the DHT can be queried to return introduction points for a given info hash. The intro-established message does not have any additional payload. The payload is shown in the figure below.
When Alice knows the info hash of the torrent file seeded by Bob, she can calculate the modified info hash by prepending "tribler anonymous download" and calculating the SHA1 hash of the resulting string. By querying the DHT for the modified info hash, she can find Bob's introduction point in the list of introduction points. A direct DHT lookup by Alice would reveal her interest in the torrent file and compromise her privacy. To prevent this leakage, Alice asks for peers over any circuit from the pool. She sends a peers-request message with the modified info hash over this circuit. The byte format of the payload is shown in the figure below.
The last hop receiving the peers-request cell will do a peer lookup for the info hash, and the returned peers are packed into a peers-response cell. Peer lookup is currently done through the IPv8 DHT or using the PexCommunity, which is a special IPv8 overlay that allows introduction points to exchange contact information. The byte format of the peers-response payload is shown below.
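The announce and lookup steps above can be illustrated with an in-memory stand-in for the DHT. `ToyDHT`, `handle_peers_request`, and the dictionary-shaped response are hypothetical and exist only for this sketch; they do not reflect the IPv8 DHT API or the real byte format.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Address = Tuple[str, int]


class ToyDHT:
    """In-memory stand-in for the DHT (illustration only)."""

    def __init__(self) -> None:
        self._store: Dict[bytes, Set[Address]] = defaultdict(set)

    def announce(self, info_hash: bytes, address: Address) -> None:
        # Called on behalf of an introduction point after establish-intro.
        self._store[info_hash].add(address)

    def lookup(self, info_hash: bytes) -> List[Address]:
        # .get avoids creating an empty entry for unknown info hashes.
        return sorted(self._store.get(info_hash, set()))


def handle_peers_request(dht: ToyDHT, info_hash: bytes) -> dict:
    """What the last hop does: look up peers and pack them into a response."""
    return {"type": "peers-response", "info_hash": info_hash,
            "peers": dht.lookup(info_hash)}


dht = ToyDHT()
modified_hash = b"\xaa" * 20                   # placeholder modified info hash
dht.announce(modified_hash, ("192.0.2.10", 8090))  # documentation-range address
response = handle_peers_request(dht, modified_hash)
```

The lookup is performed by the last hop of Alice's circuit, so the DHT only ever sees that hop's address and not Alice's.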
When Alice receives the peers-response message, she will likely find the introduction point chosen by Bob in the list of peers.
Alice sends a create-e2e message over a circuit whose exit socket delivers it to Bob's introduction point. The byte format of the payload is shown below.
Note that the circuit-id is omitted in this message. This is by design: the create-e2e message is exited as an IPv8 message over the exit socket of an existing circuit, with the introduction point as its final destination. Therefore no circuit is involved, as the introduction point does not receive the message over a circuit.
If the introduction point recognizes the info hash received in the create-e2e message, the message is relayed onto the introduction circuit leading to Bob.
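The bookkeeping an introduction point needs for the steps above can be sketched as a small registry that maps each modified info hash to the circuit it arrived on. The class and method names are hypothetical, and the message names are returned as plain strings purely for illustration.

```python
from typing import Dict, Optional


class IntroductionPoint:
    """Toy model of the introduction point's state (illustration only)."""

    def __init__(self) -> None:
        # modified info hash -> circuit_id of the introduction circuit to the seeder
        self._intro_circuits: Dict[bytes, int] = {}

    def on_establish_intro(self, circuit_id: int, info_hash: bytes) -> str:
        """Register the seeder's circuit and acknowledge it."""
        self._intro_circuits[info_hash] = circuit_id
        return "intro-established"

    def on_create_e2e(self, info_hash: bytes) -> Optional[int]:
        """Return the circuit to relay create-e2e onto, or None if unknown."""
        return self._intro_circuits.get(info_hash)


ip = IntroductionPoint()
ack = ip.on_establish_intro(circuit_id=7, info_hash=b"\xaa" * 20)
relay_circuit = ip.on_create_e2e(b"\xaa" * 20)   # known hash: relay to circuit 7
unknown = ip.on_create_e2e(b"\xbb" * 20)         # unknown hash: nothing to relay to
```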
Bob establishes a rendezvous circuit to a rendezvous point (RP). This circuit is required to have a connectable hop at its end, as the rendezvous point must accept inbound connections from the downloader's circuit. After building the circuit, an establish-rendezvous message is sent over this circuit to the last hop, with a randomly chosen single-use rendezvous-cookie as payload. The rendezvous point then waits for an inbound connection with a valid cookie in order to link the end-to-end circuit. The byte format of the payload is shown below.
After receiving the establish-rendezvous message, the node is marked as a rendezvous point and associates the circuit on which the message arrived with the received rendezvous-cookie. It then replies with a rendezvous-established message to Bob. The byte format of the payload of this message is shown below.
When Bob receives the rendezvous-established message, he can acknowledge the create-e2e message with a created-e2e message. This message contains all the information Alice needs to build a circuit ending in the rendezvous point chosen by Bob. It is sent over the circuit ending in Bob's introduction point. The introduction point looks up the corresponding circuit identifier in the payload and relays the message back through the exit socket of the exit circuit initiated by Alice. The byte format of the payload for this message is shown below.
When Alice receives the created-e2e message, she builds a new circuit ending at Bob's rendezvous point and sends a link-e2e message along this circuit. The byte format of the payload is shown below.
When the rendezvous point receives a link-e2e message and the rendezvous-cookie provided in the payload is the same as the cookie that the seeder communicated earlier, the two circuits are combined into one circuit, with inbound and outbound data relayed between the circuits instead of through the exit sockets. The linking of the circuits is acknowledged by a linked-e2e cell sent back to Alice, without additional payload. The byte format of the payload is shown below.
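The cookie handling at the rendezvous point can be sketched as follows. The class, its method names, and the 20-byte cookie length are assumptions for illustration; the text only states that the cookie is randomly chosen and single-use.

```python
import os
from typing import Dict, Optional


class RendezvousPoint:
    """Toy model of cookie matching and circuit linking (illustration only)."""

    def __init__(self) -> None:
        self._waiting: Dict[bytes, int] = {}  # cookie -> circuit waiting to be linked
        self._links: Dict[int, int] = {}      # circuit_id -> linked circuit_id

    def on_establish_rendezvous(self, circuit_id: int, cookie: bytes) -> str:
        """Remember which circuit registered this cookie and acknowledge it."""
        self._waiting[cookie] = circuit_id
        return "rendezvous-established"

    def on_link_e2e(self, circuit_id: int, cookie: bytes) -> Optional[str]:
        """Link the two circuits if the cookie matches; cookies are single-use."""
        waiting_circuit = self._waiting.pop(cookie, None)
        if waiting_circuit is None:
            return None  # unknown or already used cookie: nothing is linked
        self._links[circuit_id] = waiting_circuit
        self._links[waiting_circuit] = circuit_id
        return "linked-e2e"

    def relay_target(self, circuit_id: int) -> Optional[int]:
        """Circuit that data arriving on circuit_id is relayed into, if linked."""
        return self._links.get(circuit_id)


cookie = os.urandom(20)  # assumption: 20 random bytes
rp = RendezvousPoint()
established = rp.on_establish_rendezvous(circuit_id=1, cookie=cookie)
rejected = rp.on_link_e2e(circuit_id=2, cookie=b"\x00" * 19)  # wrong cookie
linked = rp.on_link_e2e(circuit_id=2, cookie=cookie)
replayed = rp.on_link_e2e(circuit_id=3, cookie=cookie)        # cookie already used
```

Removing the cookie from the waiting table on first use is what makes it single-use in this sketch: a replayed link-e2e with the same cookie finds nothing to link to.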
When Alice receives a linked-e2e message, the handshake is complete. Alice initiates the download via a libtorrent session that gets its data over the circuit, via the rendezvous point, from Bob. This circuit is end-to-end encrypted between Alice and Bob, making it impossible for outsiders to see what data is transferred over the circuit. Moreover, although Bob and Alice communicate with each other, neither knows the other's real identity.
To set up a hidden seeding service, tunnels from different entities have to be created in parallel with each other. The table below explains which messages are transferred for setting up a connection between a seeder and downloader.
| Message | Seeder | Downloader | IP | RP | Exit |
|---|---|---|---|---|---|
| establish-intro | from | | to | | |
| intro-established | to | | from | | |
| establish-rendezvous | from | | | to | |
| rendezvous-established | to | | | from | |
| create-e2e | to | from | relay | | |
| created-e2e | from | to | relay | | |
| link-e2e | | from | | to | |
| linked-e2e | | to | | from | |
| peers-request | | from | | | to |
| peers-response | | to | | | from |
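The message flow summarized in the table above can also be written down as data, which makes it easy to check which messages a given role takes part in. The placement of sender, receiver, and relay for each message follows the prose walkthrough in this document; the `Flow` type and the helper function are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Flow(NamedTuple):
    message: str
    sender: str
    receiver: str
    relay: Optional[str] = None


FLOWS = [
    Flow("establish-intro", "seeder", "IP"),
    Flow("intro-established", "IP", "seeder"),
    Flow("establish-rendezvous", "seeder", "RP"),
    Flow("rendezvous-established", "RP", "seeder"),
    Flow("create-e2e", "downloader", "seeder", relay="IP"),
    Flow("created-e2e", "seeder", "downloader", relay="IP"),
    Flow("link-e2e", "downloader", "RP"),
    Flow("linked-e2e", "RP", "downloader"),
    Flow("peers-request", "downloader", "exit"),
    Flow("peers-response", "exit", "downloader"),
]


def messages_seen_by(role: str) -> List[str]:
    """Names of all messages that a role sends, receives, or relays."""
    return [f.message for f in FLOWS if role in (f.sender, f.receiver, f.relay)]


rp_messages = messages_seen_by("RP")
exit_messages = messages_seen_by("exit")
```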
The figure below provides a schematic overview of the hidden services message flow. Note that all messages are tunneled, except for the DHT/PEX announce (which takes place within the DHT/PEX overlay). The peers-request and peers-response messages are tunneled through a separate tunnel, while all other messages are either tunneled through a circuit that ends in the introduction point, or through a circuit that ends in the rendezvous point.