Dollynator


|jenkins_build|

A self-replicating autonomous Tribler exit-node.

Dollynator (formerly PlebNet) is an Internet-deployed Darwinian reinforcement learning system based on self-replication. Also referred to as a botnet for good, it consists of many generations of autonomous entities living on VPS instances with VPN installed, running Tribler_ exit-nodes, and routing torrent traffic in our Tor-like network.

While providing privacy and anonymity for regular Tribler users, it earns reputation in the form of MB tokens stored on Trustchain, which are in turn sold for Bitcoin on a fully decentralized Tribler marketplace. Once the bot earns enough Bitcoin, it buys a new VPS instance using Cloudomate_ and finally self-replicates.

The name Dollynator pays tribute to Dolly the sheep (the first cloned mammal) and the artificial intelligence of Terminator. It might also remotely resemble Skynet, a self-aware network that went out of control.

Bootstrapping

The first running node needs to be installed manually. One option is to buy a VPS using Cloudomate and then install Dollynator on it from a local machine using the ``plebnet/clone/create-child.sh`` script.

::

    Usage: ./create-child.sh [options]
        -h    --help          Shows this help message
        -i    --ip            Ip address of the server to run install on
        -p    --password      Root password of the server
        -t    --testnet       Install agent in testnet mode (default 0)
        -e    --exitnode      Run as exitnode for tribler
        -conf --config        (optional) VPN configuration file (.ovpn)
                              Requires the destination config name.
                              Example: -conf source_config.ovpn dest_config.ovpn
        -cred --credentials   (optional) VPN credentials file (.conf)
                              Requires the destination credentials name.
                              Example -cred source_credentials.conf dest_credentials.conf
        -b    --branch        (optional) Branch of code to install from (default master)

Example:

.. code-block:: console

    ./create-child.sh -i <ip> -p <password> -e -b develop

For development purposes, it is also useful to know `how to run the system locally`_.

.. _how to run the system locally: INSTALL.rst

Lifecycle

The life of a bot starts by executing the ``plebnet setup`` command, which prepares the initial configuration, starts an IRC bot, and creates a cron job running the ``plebnet check`` command every 5 minutes.

The whole lifecycle is then managed by the ``check`` command. First, it ensures Tribler is running. Then it selects a candidate VPS provider and a specific server configuration for the next generation, and calculates the price. One of the predefined market strategies is used to convert the obtained MB tokens to Bitcoin. Once enough resources are earned, it purchases the selected VPS and VPN options using Cloudomate.

Finally, it connects to the purchased server over SSH, downloads the latest source code from GitHub, installs the required dependencies, sets up the VPN, and runs ``plebnet setup`` to bring the child to life. At that moment, the parent selects a new candidate VPS and continues to maximize its offspring until its own VPS contract expires.
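The ``check`` cycle described above can be sketched as a single function that is called periodically. This is a minimal, hypothetical model: ``NodeState``, ``check`` and the satoshi-based bookkeeping are illustrative assumptions rather than the actual PlebNet code, and the market strategy is reduced to selling every MB token at a fixed rate.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List, Optional


    @dataclass
    class NodeState:
        # Hypothetical bookkeeping for one bot; amounts are integers to avoid float drift.
        tribler_running: bool = False
        mb_tokens: int = 0
        balance_sat: int = 0                       # Bitcoin balance in satoshi
        candidate_price_sat: Optional[int] = None  # price of the selected next VPS
        children: List[str] = field(default_factory=list)


    def check(node: NodeState, earned_mb: int, sat_per_mb: int, offer_price_sat: int) -> bool:
        """One run of the (hypothetical) periodic check; returns True if a child was bought."""
        # 1. Make sure Tribler is running.
        if not node.tribler_running:
            node.tribler_running = True
        node.mb_tokens += earned_mb

        # 2. Select a candidate VPS for the next generation if none is selected yet.
        if node.candidate_price_sat is None:
            node.candidate_price_sat = offer_price_sat

        # 3. Assumed market strategy: sell all MB tokens at a fixed rate.
        node.balance_sat += node.mb_tokens * sat_per_mb
        node.mb_tokens = 0

        # 4. Buy the VPS and replicate once the balance covers the price.
        if node.balance_sat >= node.candidate_price_sat:
            node.balance_sat -= node.candidate_price_sat
            node.children.append(f"child-{len(node.children) + 1}")
            node.candidate_price_sat = None  # the parent moves on to a new candidate
            return True
        return False


    node = NodeState()
    check(node, earned_mb=100, sat_per_mb=100, offer_price_sat=15000)  # False: 10000 < 15000
    check(node, earned_mb=100, sat_per_mb=100, offer_price_sat=15000)  # True: 20000 >= 15000

Resetting the candidate after a purchase mirrors the parent selecting a new VPS and continuing to replicate on later runs.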

Gossiping

Information is shared across the network through gossiping.

What is gossiping

Gossiping, or epidemic, protocols have been around for decades and have been shown to have many desirable properties for data dissemination: fast convergence, load sharing, and robustness and resilience to failures. Although many variants of the gossiping protocol exist, both traditional and newer protocols adhere to the same basic gossiping framework.

Each node of the system maintains a partial view of the environment. Interactions between peers are periodic, pairwise exchanges of data, organised as follows: every node selects a partner to gossip with among its acquaintances in the network and selects the information to exchange. The partner performs the same steps, resulting in a bidirectional exchange between the two nodes.
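The basic framework above (partial views, random partner selection, bidirectional exchange) can be sketched as a push-pull round. This is an illustrative model, not Dollynator's gossiping code: the ``GossipNode`` class, the versioned key/value data and the view size of 3 are hypothetical choices.

.. code-block:: python

    import random


    class GossipNode:
        """Hypothetical node with a partial view and versioned key/value data."""

        def __init__(self, name, view, view_size=3):
            self.name = name
            self.view = set(view)   # names of known peers (partial view)
            self.data = {}          # key -> (version, value)
            self.view_size = view_size

        def select_partner(self, rng):
            return rng.choice(sorted(self.view)) if self.view else None

        def merge(self, peer_view, peer_data, rng):
            # Keep the newest version of every item.
            for key, (version, value) in peer_data.items():
                if key not in self.data or self.data[key][0] < version:
                    self.data[key] = (version, value)
            # Learn about new peers, then trim the view back to its maximum size.
            self.view |= peer_view - {self.name}
            while len(self.view) > self.view_size:
                self.view.discard(rng.choice(sorted(self.view)))


    def gossip_round(nodes, rng):
        for node in nodes.values():
            partner_name = node.select_partner(rng)
            if partner_name is None:
                continue
            partner = nodes[partner_name]
            # Snapshot both sides first so the exchange is symmetric.
            mine = (node.view | {node.name}, dict(node.data))
            theirs = (partner.view | {partner.name}, dict(partner.data))
            node.merge(*theirs, rng)
            partner.merge(*mine, rng)


    # Eight nodes in a ring; only n0 knows the item initially.
    rng = random.Random(1)
    names = [f"n{i}" for i in range(8)]
    nodes = {n: GossipNode(n, [names[(i + 1) % 8]]) for i, n in enumerate(names)}
    nodes["n0"].data["price"] = (1, 42)
    for _ in range(30):
        gossip_round(nodes, rng)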

Direct communication between nodes

Communication between nodes is carried out over sockets: each node maintains a list of contacts containing the information needed to reach a number of nodes in the botnet through the Berkeley socket API. Each node keeps its list up to date and dependable by exchanging information about the network with the rest of the nodes.
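One way to exchange contact lists over the Berkeley socket API is a length-prefixed JSON message. The sketch below is an assumption for illustration: the wire format, the contact fields (``id``, ``host``, ``port``, ``last_seen``) and the function names are hypothetical, and ``socket.socketpair()`` stands in for a real TCP connection between two nodes.

.. code-block:: python

    import json
    import socket
    import struct


    def send_message(sock, obj):
        # Length-prefixed JSON: 4-byte big-endian size, then the UTF-8 payload.
        payload = json.dumps(obj).encode("utf-8")
        sock.sendall(struct.pack("!I", len(payload)) + payload)


    def recv_exact(sock, size):
        buf = b""
        while len(buf) < size:
            chunk = sock.recv(size - len(buf))
            if not chunk:
                raise ConnectionError("socket closed before the message was complete")
            buf += chunk
        return buf


    def recv_message(sock):
        (size,) = struct.unpack("!I", recv_exact(sock, 4))
        return json.loads(recv_exact(sock, size).decode("utf-8"))


    def merge_contacts(own, received):
        """Keep, for every node id, the contact entry seen most recently."""
        merged = {c["id"]: c for c in own}
        for contact in received:
            known = merged.get(contact["id"])
            if known is None or known["last_seen"] < contact["last_seen"]:
                merged[contact["id"]] = contact
        return sorted(merged.values(), key=lambda c: c["id"])


    a, b = socket.socketpair()
    send_message(a, [{"id": "n2", "host": "203.0.113.7", "port": 8005, "last_seen": 1700000100}])
    received = recv_message(b)
    own = [{"id": "n2", "host": "203.0.113.7", "port": 8005, "last_seen": 1700000000}]
    contacts = merge_contacts(own, received)
    a.close()
    b.close()

The length prefix matters because TCP delivers a byte stream, so a single ``recv`` call may return only part of a message.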

Secure messaging

Secure communication is guaranteed by the use of both RSA (asymmetric) and Advanced Encryption Standard (AES, symmetric) cryptographic algorithms. RSA is used to safely share the symmetric keys for AES encryption and to sign messages across the network.
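The hybrid scheme described above (RSA wraps a fresh symmetric key and signs the message, and the symmetric key encrypts the payload) can be sketched with the standard library only. This is a toy under loudly stated assumptions: textbook RSA without padding on small, publicly known Mersenne-prime factors, a single key pair standing in for both the sender's signing key and the receiver's encryption key, and a SHA-256 keystream used in place of AES. It shows the flow of keys and signatures, not Dollynator's code, and must not protect real traffic.

.. code-block:: python

    import hashlib
    import secrets

    # TOY textbook RSA: two known Mersenne primes, insecure, illustration only.
    P = 2**127 - 1
    Q = 2**89 - 1
    N = P * Q
    E = 65537
    D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+


    def rsa_encrypt(m: int) -> int:
        return pow(m, E, N)


    def rsa_decrypt(c: int) -> int:
        return pow(c, D, N)


    def _digest(message: bytes) -> int:
        return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


    def rsa_sign(message: bytes) -> int:
        return pow(_digest(message), D, N)


    def rsa_verify(message: bytes, signature: int) -> bool:
        return pow(signature, E, N) == _digest(message)


    def keystream_xor(key: bytes, data: bytes) -> bytes:
        # NOT AES: a SHA-256 counter-mode keystream standing in for the symmetric cipher.
        out = bytearray()
        for i in range(0, len(data), 32):
            block = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
            out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
        return bytes(out)


    def seal(message: bytes):
        session_key = secrets.token_bytes(16)                  # fresh symmetric key
        wrapped_key = rsa_encrypt(int.from_bytes(session_key, "big"))
        ciphertext = keystream_xor(session_key, message)
        signature = rsa_sign(message)
        return wrapped_key, ciphertext, signature


    def open_sealed(wrapped_key: int, ciphertext: bytes, signature: int) -> bytes:
        session_key = rsa_decrypt(wrapped_key).to_bytes(16, "big")
        message = keystream_xor(session_key, ciphertext)
        if not rsa_verify(message, signature):
            raise ValueError("signature check failed")
        return message


    plaintext = open_sealed(*seal(b"hello from n0"))

In a real deployment the sender signs with its own private key and wraps the session key with the receiver's public key, and padded schemes such as OAEP and PSS replace textbook RSA.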

Reinforcement Learning

The choice of the next VPS to buy is dictated by a modification of the QD-Learning algorithm, a technique that scales Q-Learning to distributed networks.

.. TODO: what can we learn about providers? VPS option can be out of stock/Cloudomate broken/provider IP subnet blocked/find most efficient configurations

What is Q-Learning?

Q-Learning is a reinforcement learning technique whose aim is to learn how to act in an environment. The decision process is based on a data structure called the Q-Table, which encodes the rewards given by the environment when specific actions are performed in different states.

In a regular Q-Learning scenario, the values in Q-Table are updated as follows:

.. math::

   Q_{new}(s_t, a_t) \leftarrow (1 - \mathit{lr}) \cdot Q(s_t, a_t) + \mathit{lr} \cdot \left( \mathit{reward} + \mathit{discount} \cdot \max_{a} Q(s_{t+1}, a) \right)

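  • discount is the discount factor (how important gains of future steps are)

  • lr is the learning rate

  • reward is the reward received for performing action a(t) in state s(t)

  • s(t) is the current state

  • s(t+1) is the subsequent state

  • a(t) is the action performed in s(t), leading to s(t+1); the maximum ranges over all actions a available in s(t+1)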
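A minimal Python sketch of the update rule above, assuming a Q-Table stored as a dictionary keyed by ``(state, action)`` pairs. The function name and the default ``lr`` and ``discount`` values are illustrative, not taken from Dollynator.

.. code-block:: python

    from collections import defaultdict


    def q_update(q, state, action, reward, next_state, actions, lr=0.5, discount=0.9):
        """Apply one Q-Learning update to the cell (state, action) and return its new value."""
        # Best value reachable from the subsequent state.
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] = (1 - lr) * q[(state, action)] + lr * (reward + discount * best_next)
        return q[(state, action)]


    q = defaultdict(float)  # unseen cells start at 0
    q_update(q, "s0", "a1", reward=1.0, next_state="s1", actions=["a0", "a1"])

With an empty table the first update yields ``0.5 * 0 + 0.5 * (1.0 + 0.9 * 0) = 0.5``.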

What is QD-Learning?

QD-Learning scales the knowledge provided by Q-Learning techniques to a distributed network of agents. Its goal is to exploit the experience of single agents, each of which explores the environment and updates its own Q-Table, while at every iteration of the algorithm every node collaborates with the others by merging its Q-Table with those of its gossiping neighbours. The QD-Learning algorithm proposed by Soummya Kar, José M. F. Moura and H. Vincent Poor in their paper__ performs two types of updates on a node's Q-Table whenever the agent completes an action:

  • it updates the Q-Table cells concerning the completed action by merging them with the corresponding cells of the Q-Tables received from other peers
  • it first updates its environment and then its Q-Table, based on its own experience gained over time

The two steps of the QD-Learning update are weighted by the time-dependent factors beta and alpha respectively, whose relative weights shift over time to ensure that every agent eventually converges to a single optimal Q-Table. More specifically, at the beginning the update favours the individual exploration of agents over information coming from remote Q-Tables (alpha >> beta), but as updates progress, remote information eventually becomes the only factor affecting the Q-Tables.
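The two updates can be combined into a single cell update in the consensus + innovations form used by Kar et al.: a consensus term weighted by beta pulls the local value toward the neighbours' values, and an innovation term weighted by alpha applies the usual Q-Learning correction. This is a sketch under assumptions (rewards are maximized rather than costs minimized, Q-Tables are plain dictionaries, and ``qd_update`` is a hypothetical name), not the Dollynator implementation.

.. code-block:: python

    def qd_update(q_local, neighbour_tables, state, action, reward, next_state,
                  actions, alpha, beta, discount=0.9):
        """Update one cell of the local Q-Table from remote tables and local experience."""
        key = (state, action)
        own = q_local.get(key, 0.0)
        # Consensus: sum of disagreements with each neighbour's value for the same cell.
        consensus = sum(own - table.get(key, 0.0) for table in neighbour_tables)
        # Innovation: the temporal-difference error from the agent's own experience.
        best_next = max(q_local.get((next_state, a), 0.0) for a in actions)
        innovation = reward + discount * best_next - own
        q_local[key] = own - beta * consensus + alpha * innovation
        return q_local[key]


    local = {("s", "a"): 1.0}
    remote = [{("s", "a"): 3.0}]
    qd_update(local, remote, "s", "a", reward=0.0, next_state="t", actions=["a"],
              alpha=0.1, beta=0.2)

Here the neighbour's higher value raises the cell by ``0.2 * 2 = 0.4`` and the innovation lowers it by ``0.1``, giving 1.3.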

.. _Paper: https://doi.org/10.1109/TSP.2013.2241057
.. __: Paper_

Reinforcement Mappings

We map reinforcement learning terms onto Dollynator as follows:

  • states and actions - VPS offers

  • environment - transition matrix between states and actions, which determines the reinforcement obtained by choosing a given transition. Initially all 0s.

  • current_state - current VPS option

Initial values

Initial values for Q-Table are, just as for the environment, set all to 0.
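A minimal sketch of these data structures, assuming that every VPS offer is identified by ``provider:option`` (the same provider name plus option name that is later passed to the child) and that both tables are nested dictionaries indexed by state and then by action. ``init_tables`` and the offer names are hypothetical.

.. code-block:: python

    def init_tables(providers_offers):
        """Build the state list, a zero environment and a zero Q-Table."""
        states = [f"{provider}:{option}"
                  for provider, options in providers_offers.items()
                  for option in options]
        # States and actions are both VPS offers, so both tables are square.
        environment = {s: {a: 0.0 for a in states} for s in states}
        q_table = {s: {a: 0.0 for a in states} for s in states}
        return states, environment, q_table


    states, environment, q_table = init_tables({"provA": ["small", "large"], "provB": ["vps1"]})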

How does it work in Dollynator?

In Dollynator, we use our own variation of QD-Learning. As we do not know the environment and the reinforcement for each state in advance, we learn them on the go.

The main difference from the QD-Learning proposed in the literature is that we avoid reaching a forced convergence. Over time, the relevance of a node's individual experience in the update function is not annihilated and overwhelmed by the weight of remote information: instead, alpha has a lower bound of 0.2 (20% weight in the update formula) and beta is capped at a maximum of 0.8 (80% weight).
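A minimal sketch of how the floor and the cap can shape the two weights. Only the 0.2 floor for alpha and the 0.8 cap for beta come from the description above; the decay schedule ``1 / (1 + decay * t)``, the ``decay`` value and the function name are hypothetical.

.. code-block:: python

    def dollynator_weights(t, decay=0.1):
        """Return (alpha, beta) after t updates under an assumed decay schedule."""
        # alpha starts at 1.0 and shrinks as updates accumulate, but never below 0.2.
        alpha = max(0.2, 1.0 / (1.0 + decay * t))
        # beta takes the remaining weight, never exceeding 0.8.
        beta = min(0.8, 1.0 - alpha)
        return alpha, beta


    dollynator_weights(0)        # (1.0, 0.0): only local experience at first
    dollynator_weights(10**6)    # alpha stays at the 0.2 floor, beta near the 0.8 cap

Because the two weights sum to 1 in this sketch, the cap on beta follows from the floor on alpha; it is spelled out to match the description.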

The environment is updated on every replication attempt:

  • when a node manages to buy a new option and replicate, the environment is updated positively (the whole column corresponding to the successfully bought state)

  • when a node fails to buy an option, the environment is updated negatively (the whole column corresponding to the chosen, failed state)

  • regardless of the outcome of the buying attempt, the column corresponding to the agent's current state is entirely updated based on how efficient that state has proven to be. The efficiency value is based on how many MB tokens the node has earned over a period of time and the money invested in the VPS where it resides (all of which is normalized according to heuristics on previous reports and current direct experience).
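The three column updates above can be sketched on the nested-dictionary environment, assuming a fixed reward step of ``±1.0`` for a purchase attempt and that the current state's column is simply overwritten with an already normalized efficiency value. Both choices and the function name are hypothetical; the actual magnitudes and normalization heuristics are not described here.

.. code-block:: python

    def update_environment(env, current_state, chosen_state, bought, efficiency, step=1.0):
        """Apply the column updates after one replication attempt (hypothetical values)."""
        delta = step if bought else -step
        for row in env.values():
            # Reward or penalize the whole column of the option we tried to buy.
            row[chosen_state] += delta
        for row in env.values():
            # Always refresh the column of the state this node currently lives in.
            row[current_state] = efficiency
        return env


    env = {s: {a: 0.0 for a in ["A", "B"]} for s in ["A", "B"]}
    update_environment(env, current_state="A", chosen_state="B", bought=True, efficiency=0.3)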

After updating the environment values, the Q-Table is recalculated to find, for each state, the action maximizing our possible gains.
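The recalculation can be sketched as repeated sweeps of the Q-Learning target over the whole table, assuming that choosing action ``a`` moves the next generation into state ``a`` (states and actions are both VPS offers). The sweep count, the discount and the function names are hypothetical.

.. code-block:: python

    def recalculate_q(env, discount=0.5, sweeps=50):
        """Recompute every Q-Table cell from the environment by repeated sweeps."""
        states = list(env)
        q = {s: {a: 0.0 for a in states} for s in states}
        for _ in range(sweeps):
            for s in states:
                for a in states:
                    # Buying offer a places the child in state a.
                    q[s][a] = env[s][a] + discount * max(q[a].values())
        return q


    def best_action(q, state):
        return max(q[state], key=q[state].get)


    env = {"A": {"A": 0.0, "B": 1.0}, "B": {"A": 0.0, "B": 1.0}}
    q = recalculate_q(env)
    best_action(q, "A")  # "B"

With ``discount < 1`` the sweeps converge; in this example every row settles at ``Q(·, A) = 1`` and ``Q(·, B) = 2``.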

What is passed to the child?

  • state (provider name + option name), corresponding to the newly bought VPS service

  • name (a unique id)

  • tree of replications (a path to the root node)

  • providers_offers (all VPS offers for all providers)

  • current Q-Table
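The items above can be bundled into one serializable payload. This sketch assumes a JSON encoding, a ``provider:option`` state string and a tree stored as the list of node names from the root down to the child; ``ChildConfig`` and ``make_child_config`` are hypothetical names, not Dollynator's.

.. code-block:: python

    import json
    from dataclasses import asdict, dataclass
    from typing import Dict, List


    @dataclass
    class ChildConfig:
        state: str                               # "provider:option" of the newly bought VPS
        name: str                                # unique id of the child
        tree: List[str]                          # path from the root node down to this child
        providers_offers: Dict[str, List[str]]   # all VPS offers for all providers
        q_table: Dict[str, Dict[str, float]]     # parent's current Q-Table


    def make_child_config(parent_tree, bought_state, child_name, providers_offers, q_table):
        return ChildConfig(
            state=bought_state,
            name=child_name,
            tree=parent_tree + [child_name],
            providers_offers=providers_offers,
            q_table=q_table,
        )


    config = make_child_config(["root", "n1"], "provA:small", "n2",
                               {"provA": ["small"]}, {"provA:small": {"provA:small": 0.0}})
    payload = json.dumps(asdict(config))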

Final remarks about reinforcement learning

To choose an option from the Q-Table, we use an exponential distribution with lambda decreasing towards 1. As lambda changes with the number of replications, this process is similar to **simulated annealing**.
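One possible way to use an exponential distribution for this choice is to rank the options by Q-value and draw the rank from ``Exp(lambda)``, rounded down. This mapping and the function name are assumptions for illustration; how lambda depends on the number of replications is not specified here.

.. code-block:: python

    import random


    def choose_option(q_row, lam, rng=random):
        """Pick an option ranked by Q-value; the rank is drawn from Exp(lam), floored."""
        ranked = sorted(q_row, key=q_row.get, reverse=True)  # best option first
        rank = int(rng.expovariate(lam))                     # P(rank == 0) = 1 - exp(-lam)
        return ranked[min(rank, len(ranked) - 1)]            # clamp to the last option


    rng = random.Random(0)
    row = {"provA:small": 0.1, "provA:large": 0.9, "provB:vps1": 0.5}
    choose_option(row, lam=1.0, rng=rng)

A larger lambda concentrates the choices on the best-ranked option; with ``lam = 1`` the top option is picked about 63% of the time and lower-ranked options still get explored.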
