pytm: A Pythonic framework for threat modeling

Introduction

Traditional threat modeling too often comes late to the party, or not at all. In addition, manually creating data flow diagrams and reports can be extremely time-consuming. The goal of pytm is to shift threat modeling to the left, making it more automated and developer-centric.

Features

Based on your input and definition of the architectural design, pytm can automatically generate the following items:

Data Flow Diagram (DFD)
Sequence Diagram
Relevant threats to your system

Requirements

Linux/macOS
Python 3.x
Graphviz package
Java (OpenJDK 10 or 11)
plantuml.jar

Getting Started

tm.py is an example model. You can run it to generate the report and the diagram image files that the report references:

mkdir -p tm
./tm.py --report docs/basic_template.md | pandoc -f markdown -t html > tm/report.html
./tm.py --dfd | dot -Tpng -o tm/dfd.png
./tm.py --seq | java -Djava.awt.headless=true -jar $PLANTUML_PATH -tpng -pipe > tm/seq.png

There's also an example Makefile that wraps all of these steps into targets that can easily be shared across multiple models. If you have GNU make installed (available by default on Linux distros but not on macOS), simply run:

make MODEL=the_name_of_your_model_minus_.py

You should either have plantuml.jar in the same directory as your model, or set PLANTUML_PATH.
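
The lookup rule above can be sketched in Python as follows. This is a minimal illustration and not part of pytm: the helper name find_plantuml is hypothetical, and it assumes that PLANTUML_PATH, when set, takes precedence over a jar sitting next to the model.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_plantuml(model_path: str, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the plantuml.jar to use, or None if none can be found.

    Hypothetical helper: assumes PLANTUML_PATH wins over a local jar.
    """
    env = os.environ if env is None else env
    configured = env.get("PLANTUML_PATH")
    if configured:
        return Path(configured)
    # Fall back to a plantuml.jar in the same directory as the model.
    candidate = Path(model_path).resolve().parent / "plantuml.jar"
    return candidate if candidate.is_file() else None
```

A wrapper script could call find_plantuml("tm.py") before invoking java and print a helpful message when it returns None.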

To avoid installing all the dependencies, like pandoc or Java, the script can be run inside a container:

# do this only once
export USE_DOCKER=true
make image

# call this after every change in your model
make

Getting Started - Devbox Variant

To simplify the use of pytm, host dependencies can be completely isolated using Devbox. This usually has lower overhead and is more convenient than the OCI container approach.

Install Devbox on Linux/MacOS: curl -fsSL https://get.jetify.com/devbox | bash
Install Devbox on Windows/WSL
Update to latest version of devbox: devbox version update
Set your GitHub access token in the ~/.config/nix/nix.conf file: access-tokens = github.com=YOUR_TOKEN_HERE
Create a new, isolated shell environment that includes all the tools and packages specified in the project's devbox.json file: devbox shell
Check which Python executable runs when you type python in your terminal: which python. The output should point to .devbox/nix/profile/default/bin/python
Test by running the following command, which should generate a DFD as a PNG file called sample.png: ./tm.py --dfd | dot -Tpng -o sample.png
Exit the Devbox shell environment: exit

Usage

All available arguments:

usage: tm.py [-h] [--sqldump SQLDUMP] [--debug] [--dfd] [--report REPORT]
             [--exclude EXCLUDE] [--seq] [--list] [--describe DESCRIBE]
             [--list-elements] [--json JSON] [--levels LEVELS [LEVELS ...]]
             [--stale_days STALE_DAYS]

optional arguments:
  -h, --help            show this help message and exit
  --sqldump SQLDUMP     dumps all threat model elements and findings into the
                        named sqlite file (erased if exists)
  --debug               print debug messages
  --dfd                 output DFD
  --report REPORT       output report using the named template file (sample
                        template file is under docs/template.md)
  --exclude EXCLUDE     specify threat IDs to be ignored
  --seq                 output sequential diagram
  --list                list all available threats
  --colormap            color the risk in the diagram
  --describe DESCRIBE   describe the properties available for a given element
  --list-elements       list all elements which can be part of a threat model
  --json JSON           output a JSON file
  --levels LEVELS [LEVELS ...]
                        Select levels to be drawn in the threat model (int
                        separated by comma).
  --stale_days STALE_DAYS
                        checks if the delta between the TM script and the code
                        described by it is bigger than the specified value in
                        days

The stale_days argument tries to determine how far apart in days the model script (which you are writing) is from the code that implements the system being modeled. Ideally, they should be pretty close in most cases of an actively developed system. You can run this periodically to measure the pulse of your project and the 'freshness' of your threat model.
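
As a rough illustration of the idea, the sketch below compares the modification times of a model script and of a source file it describes (for example, a path assigned to an element's sourceCode attribute). It assumes file modification time is the freshness signal; pytm's own check may work differently, and the function names days_apart and is_stale are hypothetical.

```python
import os

SECONDS_PER_DAY = 86400


def days_apart(model_path: str, source_path: str) -> float:
    """Absolute difference between two files' modification times, in days."""
    model_mtime = os.path.getmtime(model_path)
    source_mtime = os.path.getmtime(source_path)
    return abs(model_mtime - source_mtime) / SECONDS_PER_DAY


def is_stale(model_path: str, source_path: str, stale_days: float) -> bool:
    """True when the model and the code drifted apart by more than stale_days."""
    return days_apart(model_path, source_path) > stale_days
```

Running a check like this on a schedule, such as in CI, gives an early warning when the code moves on while the model stays untouched.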

Currently available elements are: TM, Element, Server, ExternalEntity, Datastore, Actor, Process, SetOfProcesses, Dataflow, Boundary and Lambda.

The available properties of an element can be listed by using --describe followed by the name of an element:


(pytm) ➜  pytm git:(master) ✗ ./tm.py --describe Element
Element class attributes:
  OS
  definesConnectionTimeout        default: False
  description
  handlesResources                default: False
  implementsAuthenticationScheme  default: False
  implementsNonce                 default: False
  inBoundary
  inScope                         Is the element in scope of the threat model, default: True
  isAdmin                         default: False
  isHardened                      default: False
  name                            required
  onAWS                           default: False

The --colormap argument, used together with --dfd, outputs a color-coded DFD in which elements are painted red, yellow or green depending on their risk level (as identified by running the rules).

Usage - Devbox Variant

devbox shell
# use pytm as usual
exit

Creating a Threat Model

The following is a sample tm.py file that describes a simple application where a User logs into the application and posts comments on the app. The app server stores those comments into the database. There is an AWS Lambda that periodically cleans the Database.


#!/usr/bin/env python3

from pytm.pytm import TM, Server, Datastore, Dataflow, Boundary, Actor, Lambda, Data, Classification

tm = TM("my test tm")
tm.description = "another test tm"
tm.isOrdered = True

User_Web = Boundary("User/Web")
Web_DB = Boundary("Web/DB")

user = Actor("User")
user.inBoundary = User_Web

web = Server("Web Server")
web.OS = "CloudOS"
web.isHardened = True
web.sourceCode = "server/web.cc"

db = Datastore("SQL Database (*)")
db.OS = "CentOS"
db.isHardened = False
db.inBoundary = Web_DB
db.isSql = True
db.inScope = False
db.sourceCode = "model/schema.sql"

comments = Data(
    name="Comments",
    description="Comments in HTML or Markdown",
    classification=Classification.PUBLIC,
    isPII=False,
    isCredentials=False,
    # credentialsLife=Lifetime.LONG,
    isStored=True,
    isSourceEncryptedAtRest=False,
    isDestEncryptedAtRest=True
)

results = Data(
    name="results",
    description="Results of insert op",
    classification=Classification.SENSITIVE,
    isPII=False,
    isCredentials=False,
    # credentialsLife=Lifetime.LONG,
    isStored=True,
    isSourceEncryptedAtRest=False,
    isDestEncryptedAtRest=True
)

my_lambda = Lambda("cleanDBevery6hours")
my_lambda.hasAccessControl = True
my_lambda.inBoundary = Web_DB

my_lambda_to_db = Dataflow(my_lambda, db, "(&lambda;)Periodically cleans DB")
my_lambda_to_db.protocol = "SQL"
my_lambda_to_db.dstPort = 3306

user_to_web = Dataflow(user, web, "User enters comments (*)")
user_to_web.protocol = "HTTP"
user_to_web.dstPort = 80
user_to_web.data = comments

web_to_user = Dataflow(web, user, "Comments saved (*)")
web_to_user.protocol = "HTTP"

web_to_db = Dataflow(web, db, "Insert query with comments")
web_to_db.protocol = "MySQL"
web_to_db.dstPort = 3306

db_to_web = Dataflow(db, web, "Comments contents")
db_to_web.protocol = "MySQL"
db_to_web.data = results

tm.process()

You also have the option of using pytmGPT to create your models from prose!

Generating Diagrams

Diagrams are output as Dot and PlantUML.

When the --dfd argument is passed to the above tm.py file, it writes Dot output to stdout, which is fed to Graphviz's dot to generate the Data Flow Diagram:


tm.py --dfd | dot -Tpng -o sample.png

Generates this diagram:

Adding a levels attribute to an element (for example, ".levels = [1,2]") controls whether the element is rendered for a given "--levels" argument (for example, "--levels 1 2"). A Dataflow associated with the element is rendered only if both of its endpoints are in the same DFD level.
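
The selection rule can be sketched in plain Python as follows. This is an illustration only, not pytm's implementation: Node, Flow, visible_nodes and visible_flows are hypothetical names, and the default level of 0 for elements without an explicit levels attribute is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Assumption: elements without explicit levels belong to level 0.
    levels: set = field(default_factory=lambda: {0})


@dataclass
class Flow:
    source: Node
    sink: Node
    name: str


def visible_nodes(nodes, requested):
    """Keep nodes that belong to at least one requested level."""
    requested = set(requested)
    return [n for n in nodes if n.levels & requested]


def visible_flows(flows, requested):
    """Keep flows whose two endpoints share at least one requested level."""
    requested = set(requested)
    return [f for f in flows if f.source.levels & f.sink.levels & requested]
```

With a user at level 0, a web server at levels 0 and 1, and a database at level 1, requesting level 1 keeps the web server, the database, and the flow between them, while the user and its flow to the web server drop out.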

The following command generates a Sequence diagram.


tm.py --seq | java -Djava.awt.headless=true -jar plantuml.jar -tpng -pipe > seq.png

Generates this diagram:

Creating a Report

The diagrams and findings can be included in the template to create a final report:


tm.py --report docs/basic_template.md | pandoc -f markdown -t html > report.html

The templating format used in the report template is very simple:


# Threat Model Sample
***

## System Description

{tm.description}

## Dataflow Diagram

![Level 0 DFD](dfd.png)

## Dataflows

Name|From|To |Data|Protocol|Port
----|----|---|----|--------|----
{dataflows:repeat:{{item.name}}|{{item.source.name}}|{{item.sink.name}}|{{item.data}}|{{item.protocol}}|{{item.dstPort}}
}

## Findings

{f
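
To show how a construct such as {dataflows:repeat:...} in the template above can be expanded, here is a minimal sketch built on Python's string.Formatter. It is an illustration under stated assumptions rather than pytm's actual formatter: the class name RepeatFormatter and the sample flows list are hypothetical, and only the repeat directive is handled.

```python
import string
from types import SimpleNamespace


class RepeatFormatter(string.Formatter):
    """Formatter that understands a 'repeat:<template>' format spec."""

    def format_field(self, value, format_spec):
        if format_spec.startswith("repeat:"):
            # Everything after "repeat:" is a sub-template applied to each item.
            template = format_spec.partition(":")[2]
            return "".join(self.format(template, item=item) for item in value)
        return super().format_field(value, format_spec)


# Stand-ins for dataflow objects; a real model would supply its own.
flows = [
    SimpleNamespace(name="User enters comments", protocol="HTTP", dstPort=80),
    SimpleNamespace(name="Insert query with comments", protocol="MySQL", dstPort=3306),
]

# Doubled braces survive the outer pass and become {item.name} for the inner pass.
row_template = "{flows:repeat:{{item.name}}|{{item.protocol}}|{{item.dstPort}}\n}"
table = RepeatFormatter().format(row_template, flows=flows)
print(table, end="")
```

The doubled braces matter: the outer formatting pass turns {{item.name}} into {item.name}, and the inner pass then resolves it once per item, producing one table row per dataflow.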
