⚠️ UPDATE 12/12/18

Due to several cease and desist notices coming from the FBI, we’ve decided to take TheOnionBay down.

Thanks to all of our users (Alice and Bob) for these weeks of file sharing.

Installation/development setup

Install Python 3 and pipenv on your machine, then run the following from the root of the project:

    $ pipenv install
    $ pipenv shell

This will install all Python dependencies and activate the project’s virtual environment.

Usage

(Main) Option 1 - Online

You can find TheOnionBay online at the following links:

Tracker

http://theonionbay.club/

It will display the list of files available in the network and its neighbours.

Nodes

http://node1.theonionbay.club/

http://node2.theonionbay.club/

http://node3.theonionbay.club/

http://node4.theonionbay.club/

http://node5.theonionbay.club/

http://node6.theonionbay.club/

Each node displays traffic information in the network, to help visualise what is going on.

Clients

http://alice.theonionbay.club/

http://bob.theonionbay.club/

These two preconfigured clients each have a list of predefined files that they can share with each other. Adding a third client is a trivial task.

All machines are synchronized with the GitHub repository.

Option 2 - LAN

You can also run TheOnionBay on a range of local machines.

Firstly, run the tracker:

    $ python3 tracker.py

Then, you will need to run at least three nodes with:

    $ python3 node.py <ip>

where <ip> is the public IP of the current machine.

And finally, one or more clients.

    $ python3 client.py <path-to-list-of-files>

where <path-to-list-of-files> is the path to a *.json file describing the files the client can share. Examples can be found in the client folder.

Keep in mind this option is LAN-oriented, so the processes must run on different machines.

Onion Routing

Terms and definitions

In the next sections we will refer to fields with names such as UpIP, DownCID, and so on. In these names, Up and Down stand for upstream and downstream, referring to the connection towards the tracker and towards the client, respectively.

Creating the tunnel

A peer A selects three nodes X, Y, Z among those in the node pool and starts by building a circuit. For that, A will send a message that will create the whole circuit at once. Broadly speaking, the message is sent to X, but it contains a payload that is forwarded to Y, and subsequently another payload is forwarded to Z.

The message is sent as an HTTP POST request, and is as follows:

{
    "CID": "35ce8f756b5748248597dd756c75a9c5",
    "aes_key": "<encrypted AES key for X>",
    "payload": "<encrypted data>"
}

where CID stands for CircuitID and is a randomly generated UUIDv4 (it is globally unique). The field aes_key is the session key (16 bytes, for example 62E45FA2AA90DA900007FE59C88FDAEC) that is shared between A and X, encrypted with RSA, so that only X can decrypt it (once encrypted with RSA, the key is 128 bytes long). The field payload is encrypted with the AES key, and contains the data for extending the tunnel.

Instead of a key, we could very well send key material (a seed) so that the node generates the key on its own with a key derivation function. In addition to this simplified Needham-Schroeder key exchange, the codebase also contains a Diffie-Hellman key agreement implementation, should it be needed.

The decrypted payload should be interpreted as a JSON object, and has the following structure:

{
    "to": "<IP of Y>",
    "aes_key": "<encrypted AES key for Y>",
    "relay": "<encrypted data>"
}

The field to contains the IP address of the next node, to which X will forward the message. In order to do that, X will first generate a new CID for the communication between X and Y; let’s say this new CID is e9ca363dc386415d9c13127e0ca0b673. The fields aes_key and relay are destined for Y and are encrypted so that only Y can read them.

Now, X has five pieces of data about its connection to A and the next hop, Y:

<DownIP, DownCID, SessKey, UpIP, UpCID>

Where DownIP is the IP address of A, DownCID is the CID between A and X (35ce8f756b5748248597dd756c75a9c5), SessKey is the AES symmetric key between A and X (62E45FA2AA90DA900007FE59C88FDAEC), UpIP is the IP of Y, and UpCID is the CID between X and Y (e9ca363dc386415d9c13127e0ca0b673).

X will put all this data in its relay table (which is implemented with two dictionaries, indexed by DownCID and UpCID respectively) so that later on, when it receives a message whose CID matches a DownCID or an UpCID, it knows where to relay the message and which session key to use to encrypt or decrypt the payload.
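As an illustration, the two-dictionary relay table could be sketched as follows. This is a minimal sketch rather than the project's actual code: the names RelayEntry and RelayTable and the two IP addresses are hypothetical, and it assumes CIDs never collide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RelayEntry:
    down_ip: str
    down_cid: str
    sess_key: bytes
    up_ip: str
    up_cid: str


class RelayTable:
    """Two dictionaries pointing at the same entries, one per lookup direction."""

    def __init__(self) -> None:
        self.by_down_cid: Dict[str, RelayEntry] = {}
        self.by_up_cid: Dict[str, RelayEntry] = {}

    def add(self, entry: RelayEntry) -> None:
        self.by_down_cid[entry.down_cid] = entry
        self.by_up_cid[entry.up_cid] = entry

    def lookup(self, cid: str) -> Optional[RelayEntry]:
        # Assumes a given CID appears in at most one of the two dictionaries.
        return self.by_down_cid.get(cid) or self.by_up_cid.get(cid)


# The entry X would store in the example above (IP addresses are made up).
table = RelayTable()
table.add(RelayEntry(
    down_ip="10.0.0.1",  # hypothetical IP of A
    down_cid="35ce8f756b5748248597dd756c75a9c5",
    sess_key=bytes.fromhex("62E45FA2AA90DA900007FE59C88FDAEC"),
    up_ip="10.0.0.3",  # hypothetical IP of Y
    up_cid="e9ca363dc386415d9c13127e0ca0b673",
))
```

Looking up either CID returns the same entry, which gives the node both the direction to relay in and the session key to apply.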

X can now forward the tunnel creation message to Y. It will create a new message for Y, such as

{
    "CID": "e9ca363dc386415d9c13127e0ca0b673",
    "aes_key": "<encrypted AES key for Y>",
    "payload": "<encrypted data>"
}

where aes_key and payload are just copied from the message shown above.

When Y receives this message, the procedure is exactly the same as what X did, since the message Y receives contains the same fields as the one X received from A (note that the AES key that Y receives is shared between A and Y). Now, Y forwards the tunnel creation message to Z in the same way. The only difference for Z is that there is no longer an “aes_key” field for the tracker, since the messages are plaintext between the exit node Z and the tracker.
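To make the nesting concrete, here is a rough sketch of how a client might assemble the tunnel creation message from the inside out. It is not the project's implementation: fake_encrypt and fake_decrypt are placeholders that only tag and base64-encode the data (no secrecy at all) and stand in for the real RSA and AES calls, the IP addresses are invented, and the exact shape of the innermost layer read by Z is an assumption based on the description above.

```python
import base64
import json
import os
import uuid


def fake_encrypt(key_label: str, data: bytes) -> str:
    """Placeholder for RSA/AES encryption: only tags and base64-encodes."""
    return key_label + ":" + base64.b64encode(data).decode("ascii")


def fake_decrypt(key_label: str, blob: str) -> bytes:
    """Inverse of fake_encrypt; fails if the label (the 'key') does not match."""
    label, _, body = blob.partition(":")
    if label != key_label:
        raise ValueError("wrong key")
    return base64.b64decode(body)


def build_create_message(hops, tracker_ip, first_message):
    """Build the message sent to the first hop, wrapping layers inside-out.

    hops is a non-empty list of (name, ip) pairs in path order, e.g. X, Y, Z.
    """
    session_keys = {name: os.urandom(16) for name, _ in hops}
    # Innermost layer, read by the exit node: no aes_key for the tracker.
    inner = {"to": tracker_ip, "relay": first_message}
    for index in range(len(hops) - 1, -1, -1):
        name, ip = hops[index]
        payload = fake_encrypt("aes-" + name, json.dumps(inner).encode())
        aes_key = fake_encrypt("rsa-" + name, session_keys[name])
        if index > 0:
            # What the previous hop sees once it decrypts its own payload.
            inner = {"to": ip, "aes_key": aes_key, "relay": payload}
    message = {"CID": uuid.uuid4().hex, "aes_key": aes_key, "payload": payload}
    return message, session_keys


hops = [("X", "10.0.0.2"), ("Y", "10.0.0.3"), ("Z", "10.0.0.4")]
message, keys = build_create_message(
    hops, "10.0.0.9", {"type": "ls", "files": ["titanic"]})
```

Each node only ever learns the address of the next hop: decrypting the payload with its own session key reveals a to field plus an opaque relay blob for the following node.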

As you may notice, we are modelling the stateful, multiplexed, TLS-secured TCP connections (the actual implementation of the node network in Tor) with the data structure in each node that relates inbound IPs, CIDs and SessKeys to outbound IPs and CIDs.

Connecting the tracker

The final payload that Z forwards to the tracker is actually a list describing which files A can share on the torrent network. The tracker responds by giving the list of all files made available by the other users of the network.

Message exchange once the tunnel is established

The tracker always gets messages in plaintext. This is how Tor works, and it is also the simplest approach: there is no encryption after the exit node.

When a client sends a message to the tracker, the payload is encrypted by the client with the three symmetric keys of the three nodes X, Y and Z. Each node decrypts the payload and relays it to the next one (thus it is plaintext between Z and the tracker). When the tracker responds, the message is also plaintext between the tracker and Z. Z encrypts the payload with its symmetric key and sends it to Y. Y encrypts it with its symmetric key, and so on. When the client receives a message, it always decrypts the payload with its three symmetric keys.
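The round trip described above can be sketched as follows. This is only an illustration: toy_cipher is a hypothetical XOR keystream that stands in for AES and is not secure, and the three keys are made-up values. Because XOR commutes, this stand-in does not enforce the order of the layers the way a real cipher may.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream; the same call encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical session keys shared between the client and X, Y, Z.
K_X, K_Y, K_Z = b"key-of-X", b"key-of-Y", b"key-of-Z"


def client_wrap(payload: bytes) -> bytes:
    """Client side: add the layer for Z first, so X's layer ends up outermost."""
    for key in (K_Z, K_Y, K_X):
        payload = toy_cipher(key, payload)
    return payload


def client_unwrap(payload: bytes) -> bytes:
    """Client side: remove the layers added by X, Y and Z on the way back."""
    for key in (K_X, K_Y, K_Z):
        payload = toy_cipher(key, payload)
    return payload


# Upstream: each node removes one layer; Z ends up with the plaintext.
request = b'{"type": "request", "file": "borat"}'
upstream = client_wrap(request)
for node_key in (K_X, K_Y, K_Z):
    upstream = toy_cipher(node_key, upstream)

# Downstream: Z, then Y, then X each add one layer; the client removes all.
response = b'{"type": "ls", "files": ["titanic"]}'
downstream = response
for node_key in (K_Z, K_Y, K_X):
    downstream = toy_cipher(node_key, downstream)
received = client_unwrap(downstream)
```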

Data structures in a node

A node contains four tables in memory to handle message and file forwarding: two relay tables, the Upstream File Sharing Table and the Downstream File Sharing Table.

Relay tables

The relay tables are filled as described in the previous section, either when a node first receives a message with an unknown CID, or the first time a message is relayed for a given CID. Their fields are

(key) DownCID | DownIP | SessKey | UpIP | UpCID

and

(key) UpCID | UpIP | SessKey | DownIP | DownCID

Where DownIP is the IP address of the previous node downstream, DownCID is the CID of the connection to the previous node downstream, SessKey is a symmetric key shared with the client all the way downstream, UpIP is the IP of the next node upstream, and UpCID is the CID of the connection to the next node.

Upstream File Sharing Table

This table is filled when a node is responsible for transmitting a file coming from its client to another client through a bridge. Each file sharing operation has a unique ID, named a File Sharing ID, or FSID for short. The File Sharing Table is as follows

BridgeCID | BridgeIP | FSID

This tells the node where to forward a file message that arrives with a given FSID.

Downstream File Sharing Table

This table is filled when a node is going to receive a file coming from another node through a bridge. The DownCID is one of the downstream connections that the node already has, and BridgeCID is the CID that the node at the other side of the bridge is using to connect to this node (CID9 in the image below). To get the correct downstream IP and SessKey to forward a file message, the Relay Table should be looked up with DownCID.

DownCID | BridgeCID

File Sharing Protocol

Messages sent by the client

The client can send three types of messages. These JSON structures are carried in the payload transmitted through the nodes.

The ls message describes which files a client is ready to share, and is as follows:

{
    "type": "ls",
    "files": ["titanic", "privateryan", "shawshank"]
}

As described before, this message is the first one transmitted by the client to the tracker.

A client can request a file from the tracker with a request message:

{
    "type": "request",
    "file": "borat"
}

A client can send the content of a file with a file message, after the tracker has asked it to do so.

{
    "type": "file",
    "file": "<name of the file>",
    "data": "<content of the file>",
    "FSID": "<FSID>"
}

The FSID field is a number uniquely identifying the file sharing process, and is explained below. It is also important to include the name of the file, so that the target client knows what file it received.
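For reference, the three client messages could be built with small helpers like the ones below. This is a sketch under assumptions: the helper names are hypothetical, and the base64 encoding of the file content is an assumption made so that binary data fits in JSON (the text above does not say how data is encoded).

```python
import base64
import json


def ls_message(files):
    """Announce the files this client is ready to share."""
    return {"type": "ls", "files": list(files)}


def request_message(name):
    """Ask the tracker for a file."""
    return {"type": "request", "file": name}


def file_message(name, content: bytes, fsid):
    """Send a file's content, labelled with the FSID given by the tracker."""
    # Assumption: binary content is base64-encoded to make it JSON-safe.
    data = base64.b64encode(content).decode("ascii")
    return {"type": "file", "file": name, "data": data, "FSID": fsid}


# Serialized form of each message, ready to be encrypted and sent.
announce = json.dumps(ls_message(["titanic", "privateryan", "shawshank"]))
ask = json.dumps(request_message("borat"))
share = json.dumps(file_message("borat", b"raw bytes of the file", "FSID1"))
```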

Messages sent by the tracker

A tracker can send three types of messages. Since they are a bit more complicated than the messages sent by clients, we will instead describe the file sharing process chronologically, from the point of view of the tracker.

A note on control messages:

Nodes can receive control messages from the tracker at any point. How do they differentiate between messages intended for them and messages that have to go back to the client? Control messages are sent to a URL ending in /control, for example http://node1.theonionbay.club/control.

Assume the following setting:

Two clients, C1 and C2, are connected to the tracker, and we gave a name to the CircuitID of each link between nodes. Suppose C1 sends the tracker a request for a file named Dikkenek.avi that C2 can provide. We will describe how the bridge between Z2 and Z1 is established; how the actual file is transmitted is described later. The procedure is as follows:

1- The tracker assigns a File Sharing ID (FSID) to this file sharing operation. This is a number, taken from a counter incremented at each file request. Let’s name this number FSID1.

2- The tracker creates a new CircuitID that will be used in the link to bridge Z2 to Z1. Let’s name this new CID CID9.

3- The tracker sends a control message to Z2, instructing it to add a new entry to its Upstream File Sharing Table to properly redirect the file to Z1. The message being sent is

{
    "type": "make_bridge",
    "bridge_CID": "<CID9>",
    "to": "<IP of Z1>",
    "FSID": "<FSID1>"
}

Note that this is the only message where the field "CID" is not used. In this case, the FSID is sufficient to tell Z2 which messages should be redirected to the bridge, independently of its link to the tracker.

Upon receiving this message, Z2 creates an entry in its Upstream File Sharing Table:

BridgeCID | BridgeIP | FSID
CID9 | IP of Z1 | FSID1

So whenever a file message labelled with FSID1 arrives at Z2, it knows where to forward the message, and which CID to put in the message.

The situation now looks like this:

4- The tracker sends a control message to Z1, instructing it to add a new entry to its Downstream File Sharing Table. This will allow the file coming from Z2 to be properly redirected to Y1 and ultimately to C1. This control message is

{
    "CID": "<CID4>",
    "type": "receive_bridge",
    "bridge_CID": "<CID9>"
}

In order to find the downstream link to which file messages from the bridge should be redirected, Z1 looks up the CID it shares with the tracker (CID4) in the UpCID column of its Relay table. The matching DownCID and DownIP (which are CID3 and the IP of Y1 in our example) indicate where to forward file messages.

Z1 can fill the Downstream File Sharing Table with that information:

DownCID | BridgeCID
<CID3> | <CID9>

We do not store the DownIP or the SessKey here, as this would be redundant with the data in the Relay table.
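Steps 3 and 4 can be sketched from the node's side as a single control handler. This is a hedged illustration, not the project's code: the Node class, the choice of dictionary keys (FSID for the upstream table, BridgeCID for the downstream table) and the IP addresses are assumptions, and the relay table is reduced to the fields needed here.

```python
from typing import Dict, Tuple


class Node:
    def __init__(self, relay_by_up_cid: Dict[str, Tuple[str, str]]) -> None:
        # Trimmed relay table: UpCID -> (DownCID, DownIP).
        self.relay_by_up_cid = dict(relay_by_up_cid)
        # Upstream File Sharing Table: FSID -> (BridgeCID, BridgeIP).
        self.upstream_fs: Dict[str, Tuple[str, str]] = {}
        # Downstream File Sharing Table: BridgeCID -> DownCID.
        self.downstream_fs: Dict[str, str] = {}

    def handle_control(self, message: dict) -> None:
        """Process a message received on the /control URL."""
        if message["type"] == "make_bridge":
            self.upstream_fs[message["FSID"]] = (
                message["bridge_CID"], message["to"])
        elif message["type"] == "receive_bridge":
            # Find the downstream link that matches the tracker-side CID.
            down_cid, _down_ip = self.relay_by_up_cid[message["CID"]]
            self.downstream_fs[message["bridge_CID"]] = down_cid
        else:
            raise ValueError("unknown control message: " + message["type"])


# Z2 has no relay entry relevant here; Z1 relays CID4 (tracker) to CID3 (Y1).
z2 = Node({})
z1 = Node({"CID4": ("CID3", "10.0.1.2")})  # hypothetical IP of Y1

z2.handle_control({"type": "make_bridge", "bridge_CID": "CID9",
                   "to": "10.0.1.3", "FSID": "FSID1"})  # hypothetical IP of Z1
z1.handle_control({"CID": "CID4", "type": "receive_bridge",
                   "bridge_CID": "CID9"})
```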

1- Now that the bridge is established between Z2 and Z1, the tracker sends a request message to C2, instructing it to share its file named Dikkenek.avi, using the File Sharing ID FSID1. The payload of the message is

{
    "type": "request",
    "file": "Dikkenek.avi",
    "FSID": "<FSID1>"
}

2- By the time C2 receives this request, the message has been encrypted three times by the three nodes (Z2, Y2 and X2). C2 decrypts the message, fetches the file and sends the file sharing message into its tunnel through X2. This message is encrypted three times. The payload is as described in the previous section:

{
    "type": "file",
    "file": "Dikkenek.avi",
    "data": "<content of the file>",
    "FSID": "<FSID1>"
}

3- When the message arrives at Z2, Z2 removes the last layer of encryption and sees that this is a file sharing message (the payload is now plaintext). Therefore, Z2 does a lookup in its Upstream File Sharing Table using the FSID contained in the message. This tells Z2 that the file should be transmitted to Z1 with CID CID9.

4- Z2 forwards the file message to Z1 (through the bridge):

{
    "CID": "<CID9>",
    "payload": {
        "type": "file",
        "file": "Dikkenek.avi",
        "data": "<content of the file>"
    }
}

At this point, the FSID is no longer needed, so it is not transmitted any more.

5- Upon receiving the message from Z2, Z1 does a lookup in the Downstream File Sharing Table and finds CID9 in the BridgeCID column. The matching DownCID is CID3. Z1 can therefore look in the Relay Table to find the IP and SessKey associated with CID3, and finds the IP of Y1 and the symmetric key K_Z1. Z1 therefore has all the information it needs to encrypt the message and send it back to C1 through Y1.

6- The message arrives back at C1, encrypted three times. C1 decrypts the three layers and finds the file message:

{
    "type": "file",
    "file": "Dikkenek.avi",
    "data": "<content of the file>"
}
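The two table lookups in steps 3 to 5 can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layouts, the IP addresses and the shape of the message Z1 sends to Y1 are assumptions, and encrypt is a placeholder callable standing in for the AES encryption with SessKey.

```python
import json


def z2_forward(payload: dict, upstream_fs: dict):
    """Z2: route a decrypted file payload to the bridge, dropping the FSID."""
    bridge_cid, bridge_ip = upstream_fs[payload["FSID"]]
    stripped = {k: v for k, v in payload.items() if k != "FSID"}
    return bridge_ip, {"CID": bridge_cid, "payload": stripped}


def z1_forward(bridge_message: dict, downstream_fs: dict,
               relay_by_down_cid: dict, encrypt):
    """Z1: map BridgeCID to DownCID, then DownCID to (DownIP, SessKey)."""
    down_cid = downstream_fs[bridge_message["CID"]]
    down_ip, sess_key = relay_by_down_cid[down_cid]
    body = json.dumps(bridge_message["payload"]).encode()
    return down_ip, {"CID": down_cid, "payload": encrypt(sess_key, body)}


def placeholder_encrypt(key: bytes, data: bytes) -> bytes:
    # Stand-in for AES with SessKey: reverses the bytes, provides no secrecy.
    return data[::-1]


# Tables as filled during bridge creation (IP addresses are made up).
z2_upstream_fs = {"FSID1": ("CID9", "10.0.1.3")}   # FSID -> (BridgeCID, IP of Z1)
z1_downstream_fs = {"CID9": "CID3"}                # BridgeCID -> DownCID
z1_relay = {"CID3": ("10.0.1.2", b"K_Z1")}         # DownCID -> (IP of Y1, SessKey)

file_payload = {"type": "file", "file": "Dikkenek.avi",
                "data": "<content of the file>", "FSID": "FSID1"}

bridge_ip, bridge_message = z2_forward(file_payload, z2_upstream_fs)
next_ip, relayed = z1_forward(bridge_message, z1_downstream_fs,
                              z1_relay, placeholder_encrypt)
```

From there, Y1 and X1 each add their own layer in the usual downstream direction before the message reaches C1.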

Et voilà.