<img src="docs/figures/icon.svg#gh-light-mode-only" media="(prefers-color-scheme: light)" width=20><img src="docs/figures/icon.svg#gh-dark-mode-only" media="(prefers-color-scheme: dark)" width=20> TwinGraph
TwinGraph provides a Python-based high-throughput container orchestration framework for simulation, predictive modeling and optimization workflows. It supports asynchronous multi-host computing, dynamic directed-acyclic-graph (DAG) pipelines, and recording of custom workflow artifacts and attributes within a graph database, for repeatability and auditability.
TwinGraph is used by adding decorators to Python functions to record attributes associated with these functions, such as inputs/outputs, source code and compute platform, in a TinkerGraph or Amazon Neptune database. Optionally, it also acts as a graph orchestrator, using Celery in the backend to run the decorated functions asynchronously on a chosen compute target (AWS Batch, AWS Lambda, Amazon EKS) and container orchestrator (Kubernetes, Docker Compose).
TwinGraph can be used to run a few concurrent or linked compute tasks, or scale up to hundreds of thousands of containerized compute tasks in a number of different compute nodes/hosts; communication of information between tasks is handled through message queues in an event-driven workflow chain. An example architectural flow of information is shown in Figure 1:
<center> <img src="docs/figures/OverallPicture.png" width=780></center> <p align="center">Figure 1: Overall Information Flow.</p>

A number of key capabilities of TwinGraph are outlined in Figure 2. Instead of a domain-specific language (DSL) for specifying pipelines, TwinGraph uses algorithms defined in native Python (loops, conditionals & recursion) to define the control flow, i.e. the dependency of the graph structure on intermediate outputs and runtime stochasticity.
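To illustrate this native-Python control flow, the sketch below builds a dynamic task graph with an ordinary loop and conditional. The `run_task` function and the `edges` dictionary are hypothetical stand-ins for illustration only, not part of the TwinGraph API:

```python
# Illustrative sketch: a dynamic DAG defined by plain Python control flow
# (loops and conditionals), in the spirit of TwinGraph pipelines.
# `run_task` and `edges` are stand-ins, not TwinGraph API.

def run_task(name, value):
    # Stand-in for a compute task; its output drives the graph structure.
    return value * 2

def build_dynamic_graph(n_stages):
    edges = {}  # task name -> list of downstream task names
    prev = "source"
    value = 1
    for i in range(n_stages):      # the loop determines graph depth
        task = f"stage_{i}"
        edges.setdefault(prev, []).append(task)
        value = run_task(task, value)
        if value > 4:              # a conditional adds a branch at runtime
            edges.setdefault(task, []).append(f"stage_{i}_audit")
        prev = task
    return edges

graph = build_dynamic_graph(3)
```

Because the graph is ordinary Python data produced by ordinary Python control flow, its shape can depend on intermediate results, which is exactly what a static DSL cannot express.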
<center> <img src="docs/figures/Challenges.png" width=780></center> <p align="center">Figure 2: Challenges in Designing Orchestrators.</p>

The examples highlighted in the next section provide an overview of how to use TwinGraph.
Supported Operating Systems:
- Linux
  - Amazon Linux 2, CentOS 7, RHEL, Fedora, Ubuntu, etc.
- Other operating systems
  - Please use the Docker Compose installation option
API Documentation
TwinGraph exposes two API calls, component and pipeline; information about these can be found here.
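A minimal sketch of how the two decorators compose is shown below. To keep the snippet self-contained it uses no-op stand-in decorators; in real code you would import `component` and `pipeline` from `twingraph`, whose actual signatures and recording behavior may differ from this sketch:

```python
from collections import namedtuple

# No-op stand-ins so this sketch runs on its own; in real code:
# `from twingraph import component, pipeline` (signatures may differ).
def component(**kwargs):
    def wrap(f):
        return f
    return wrap

pipeline = component

@component()
def add(a, b):
    # Components conventionally return named outputs so downstream
    # tasks (and the graph trace) can refer to them by name.
    Outputs = namedtuple("Outputs", ["result"])
    return Outputs(result=a + b)

@pipeline()
def demo_pipeline():
    first = add(1, 2)
    second = add(first.result, 3)
    return second.result

total = demo_pipeline()  # → 6
```

The real decorators additionally record each call's inputs, outputs, source code and compute platform into the graph database, and can dispatch the calls to remote compute backends.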
Examples
There is a series of examples located in the examples/orchestration_demos folder. These are designed to show how TwinGraph can be used, through the two exposed Python decorator APIs 'component' and 'pipeline', to orchestrate complex workflows on heterogeneous compute environments. A summary with links to the individual READMEs is given below. Before running the AWS service-specific examples (Demo 1-), please ensure the AWS CLI is configured, and run update_credentials.py. To stop running Celery tasks, use stop_and_delete.py.
| Example | Description |
| ------- | ----------- |
| Demo 1 | Demonstrate graph tracing capability with local compute |
| Demo 2 | Building a Docker container & graph tracing with containerized (Docker) compute |
| Demo 3 | Automatically including git history in traced attributes |
| Demo 4 | Using Celery to dispatch tasks and perform asynchronous computing |
| Demo 5 | Using Celery to dispatch and run containerized tasks on Kubernetes (MiniK8s, EKS) |
| Demo 6 | Using Celery to dispatch and run containerized tasks on AWS Batch |
| Demo 7 | Using Celery to dispatch tasks to AWS Lambda |
| Demo 8 | Using Celery to dispatch multiple tasks to different platforms/environments |
| Demo 9 | Using Amazon Neptune instead of TinkerGraph for graph tracing |
| Demo 10 | Running multiple pipelines together |
(Linux) Installation Option 1: Quick Installation
Prerequisites Step (install Make, Git, Docker and Python):
- RHEL/Centos/Fedora/AL2:
sudo yum install make git docker python
- Ubuntu/Debian:
sudo apt-get install -y make git python3 python-is-python3 apt-transport-https curl gnupg-agent ca-certificates software-properties-common unzip python3-pip
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
sudo apt install -y docker-ce docker-ce-cli containerd.io
- Clone the repository:
git clone https://github.com/aws-samples/twingraph.git
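As an optional sanity check before proceeding (not part of the repository), a small POSIX shell loop can report any prerequisite that is missing from the PATH; the tool list below assumes the Ubuntu/Debian package names:

```shell
# Report any prerequisite missing from PATH (illustrative check only,
# not part of the TwinGraph repository).
for tool in make git docker python3; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

If any tool is reported missing, revisit the prerequisites step above before running the installation.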
Installation Step:
Navigate to the TwinGraph folder, where the Makefile is located, and key in:
make install
make docker_containers_poetry
Note that this step might report missing packages on some distributions; please remedy this by installing any missing base packages as indicated during the installation.
The make install and make docker_containers_poetry commands only need to be run once, to set up the environment for message passing, graph recording and visualization.
After installing and running TwinGraph for the first time, if the Docker socket is restarted or the containers are no longer running, you can use the provided convenience script to restart the containers (this will also prune stopped containers):
sh start_containers.sh
Note: If the containers are already running, this script will throw an error when trying to start them; for most use cases you only need one running container per image.
Additionally, when using TwinGraph, ensure that the Poetry-managed environment, which has TwinGraph installed, is enabled:
poetry shell
Alternatively, install the package into your own existing environment with the appropriate version; to do so, build it first:
poetry build
Once it completes, activate your existing Python virtual environment (conda/venv) and install TwinGraph:
(existing-python-env) pip install dist/twingraph-*.whl
For users of Kubernetes and Amazon EKS, please also install kubectl and eksctl separately, and optionally Kubernetes dashboard for monitoring and debugging.
Uninstallation Step:
Once again, inside the TwinGraph folder with Makefile, key in:
make uninstall
- The AWS CLI package and the poetry virtual environment have to be removed manually:
poetry env remove <PATH-TO-PYTHON-EXE>
(Mac & Windows) Installation Option 2: Running in Docker
- Install Docker with the Compose tool, and ensure it is running:

docker info

- Clone/download TwinGraph, and within the folder containing docker-compose.yaml, run Docker Compose in detached mode:

docker compose up -d

  - Note: When running for the first time, compose might show an error on the first line because the image does not exist yet; this is remedied automatically by the script building/pulling the container images.

- Open an interactive shell to run the code, or develop within the container using the VS Code remote development extension (next point):

docker exec -it twingraph-twingraph-1 bash
Installing VS Code with the Remote Containers extension is advised for ease of use, together with the Docker extension, which lets you attach a new session to running containers.
The interactive shell enables you to run and execute code, but your files will not persist once the containers are cleaned up. To persist them, add a local folder (outside the container) to the docker-compose.yaml file after line 16 as follows:
14 volumes:
...
17 - /LOCAL-FOLDER:/home/twingraph-user/DOCKER-FOLDER
It is then possible to work on local files outside of the container and use the container only to deploy them with TwinGraph.
- Once completed, you can bring down all the containers (to do this, open a shell outside the container, in the same folder where TwinGraph was downloaded or cloned and where docker-compose.yaml resides):

docker compose down

- Note: When using Compose, the TwinGraph API decorator variables need to point to the Redis and Gremlin servers.
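To illustrate that last note, the sketch below shows what pointing the decorators at the Compose services might look like. The parameter names (celery_host, graph_config, graph_endpoint) and the service names (redis, tinkergraph-server) are assumptions for illustration, not the confirmed TwinGraph API; a no-op stand-in decorator keeps the sketch self-contained:

```python
# Illustrative only: parameter and service names below are assumptions,
# not the confirmed TwinGraph API. A no-op stand-in keeps this runnable;
# in real code you would `from twingraph import pipeline`.
def pipeline(**config):
    def wrap(f):
        f.config = config  # expose the passed config for inspection
        return f
    return wrap

# Inside Compose, services reach each other by service name, not localhost.
@pipeline(
    celery_host="redis://redis:6379/0",  # assumed Redis broker URL
    graph_config={"graph_endpoint": "ws://tinkergraph-server:8182"},  # assumed
)
def traced_pipeline():
    pass

endpoint = traced_pipeline.config["graph_config"]["graph_endpoint"]
```

The key point is only that, under Compose, the broker and graph-database addresses must name the Compose services rather than localhost; consult the API documentation for the actual decorator argument names.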
