Tstl
Template Scripting Testing Language tool: automated test generation for Python
TSTL: the Template Scripting Testing Language
TSTL is a domain-specific language (DSL) and set of tools to support automated generation of tests for software. This implementation targets Python. You define (in Python) a set of components used to build up a test, and any properties you want to hold for the tested system, and TSTL generates tests for your system. TSTL supports test replay, test reduction, and code coverage analysis, and includes push-button support for some sophisticated test-generation methods. In other words, TSTL is a property-based testing tool.
What is property-based testing? It is testing that relies not on developers specifying results for specific inputs or call sequences, but on a more general specification of behavior, combined with automatic generation of many tests to check that the general specification holds. For more on property-based testing see:
- https://fsharpforfunandprofit.com/posts/property-based-testing/
- https://hypothesis.works/articles/what-is-property-based-testing/
- https://github.com/trailofbits/deepstate (a tool mixing symbolic analysis and fuzzing with property-based testing, for C and C++, with design somewhat informed by TSTL)
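To make the idea concrete, here is a minimal sketch in plain Python (not TSTL) of a property-based check. The function my_sort is a hypothetical stand-in for code under test, and check_sort_properties is an illustrative name; instead of asserting one expected output, the check states two general properties and tries them on many random inputs.

```python
import random
from collections import Counter


def my_sort(items):
    """Hypothetical function under test (a stand-in for real code)."""
    return sorted(items)


def check_sort_properties(trials=1000, seed=0):
    """Check general properties of my_sort on many random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        result = my_sort(data)
        # Property 1: the output is in non-decreasing order.
        assert all(a <= b for a, b in zip(result, result[1:]))
        # Property 2: the output is a permutation of the input.
        assert Counter(result) == Counter(data)
    return trials


if __name__ == "__main__":
    print("properties held on", check_sort_properties(), "random inputs")
```

No specific input or expected result appears anywhere; only the properties do.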
TSTL has been used to find and fix real faults in real code, including ESRI's ArcPy (http://desktop.arcgis.com/en/arcmap/latest/analyze/arcpy/what-is-arcpy-.htm), sortedcontainers (https://github.com/grantjenks/sorted_containers), gmpy2 (https://github.com/aleaxit/gmpy), sympy (http://www.sympy.org/en/index.html), pyfakefs (https://github.com/jmcgeheeiv/pyfakefs), Python itself (https://bugs.python.org/issue27870), the Solidity compiler (https://github.com/ethereum/solidity), a Solidity static analysis tool (https://github.com/crytic/slither), the Vyper compiler (e.g. https://github.com/ethereum/vyper/issues/1658), and even OS X.
Installation
The easiest way to get a recent tstl is with pip; pip install tstl should work fine. If you want something even more recent, you can install from source:
git clone https://github.com/agroce/tstl.git
cd tstl
python setup.py install
For code coverage, you will also need to install Ned Batchelder's coverage.py tool; pip install coverage is all that is needed.
TSTL in a Nutshell
To get an idea of how TSTL operates, let's try a toy example. We will use TSTL to solve a simple "puzzle": is it possible to generate the integer value 510 with a few lines of Python code, starting from 0 and using only a small set of operations (add 4, subtract 3, multiply by 3, and produce a power of two)?
- Create a file called nutshell.tstl with the following content:
@import math
# A line beginning with an @ is just python code.
pool: <int> 5
# A pool is a set of values we'll produce and use in testing.
# We need some integers, and we'll let TSTL produce up to 5 of them.
# The name is a variable name, basically, but often will be like a
# type name, showing how the value is used.
<int> := 0
<int> += 4
<int> -= 3
<int> *= 3
{OverflowError} <int> := int(math.pow(2,<int>))
# These are actions, basically single lines of Python code.
# The big changes from normal Python are:
# 1. := is like Python assignment with =, but also tells TSTL this
# assignment _initializes_ a value.
# 2. <int> is a placeholder meaning _any_ int value in the pool.
# 3. {OverflowError} means that we want to ignore if this line of
# Python produces an uncaught OverflowError exception.
# A test in TSTL is a sequence of actions. So, given the above, one
# test would be:
#
# int3 = 0
# int4 = 0
# int3 *= 3
# int4 += 4
# int3 = 0
# int2 = int(math.pow(2,int4))
# int2 -= 3
# As you can see, the actions can appear in any order, but every
# pool variable is first initialized by some := assignment.
# Similarly, TSTL may use pool variables in an arbitrary order;
# thus we never see int0 or int1 used, here, by chance.
# The size of the int pool determines how many different ints can
# appear in such a test. You can think of it as TSTL's "working
# memory." If you have a pool size of 1, and an action like
# foo(<int>,<int>) you'll always call foo with the same value for both
# parameters -- like foo(int0,int0). You should always have a pool
# size at least as large as the number of times you use a pool in a
# single action. More is often better, to give TSTL more ability to
# bring back in earlier computed values.
property: <int> != 510
# property: expresses an invariant of what we are testing. If the
# boolean expression evaluates to False, the test has failed.
As in normal Python, # indicates a comment. Comment lines are below
the TSTL code being described.
- Type tstl nutshell.tstl.
- Type tstl_rt --normalize --output nutshell.test.
This should, in a few seconds, find a way to violate the property
(produce the value 510), find a maximally-simple version of that
"failing test", and produce a file nutshell.test that contains the
test. If we had omitted the {OverflowError} TSTL would either have
found a way to produce 510, or (less likely) would have found a way to
produce an overflow in the pow call: either would be considered a failure.
- Type tstl_replay nutshell.test --verbose.
This will replay the test you just created.
- Comment out (using # as usual in Python code) the line <int> -= 3. Now try running tstl_rt.
The core idea of TSTL is to define a set of possible steps in a test, plus properties describing what can be considered a test failure, and let TSTL find out if there exists a sequence of actions that will produce a test failure. The actions may be function or method calls, or steps that assemble input data (for example, building up a string to pass to a parser), or, really, anything you can do with Python.
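As a rough illustration of this idea in plain Python, the sketch below runs random action sequences for the 510 puzzle and stops at the first one that violates the property. It is not how TSTL is implemented: it tracks a single integer rather than a pool of values, and the names ACTIONS, random_test, search, and replay are made up for this example.

```python
import math
import random

# Each action is (description, function from the old value to the new value).
ACTIONS = [
    ("x = 0", lambda x: 0),
    ("x += 4", lambda x: x + 4),
    ("x -= 3", lambda x: x - 3),
    ("x *= 3", lambda x: x * 3),
    ("x = int(math.pow(2, x))", lambda x: int(math.pow(2, x))),
]


def holds(x):
    """The property under test: the value is never 510."""
    return x != 510


def random_test(rng, length):
    """Run one random action sequence; return the failing steps or None."""
    x = 0
    steps = []
    for _ in range(length):
        name, act = rng.choice(ACTIONS)
        try:
            x = act(x)
        except OverflowError:
            continue  # plays the role of {OverflowError}: skip this action
        steps.append(name)
        if not holds(x):
            return steps
    return None


def search(seed=0, max_tests=200_000, length=30):
    """Generate random tests until one violates the property."""
    rng = random.Random(seed)
    for _ in range(max_tests):
        failing = random_test(rng, length)
        if failing is not None:
            return failing
    return None


def replay(steps):
    """Re-run a recorded sequence of steps and return the final value."""
    by_name = dict(ACTIONS)
    x = 0
    for name in steps:
        x = by_name[name](x)
    return x


if __name__ == "__main__":
    failing = search()
    print("\n".join(failing) if failing else "no failing test found")
```

The sequence this finds is usually longer than necessary, which is why reducing a failing test is a separate step.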
Using TSTL
TSTL installs a few standard tools: the TSTL compiler itself, tstl; a random test generator, tstl_rt; a tool for producing standalone tests, tstl_standalone; a tool for replaying TSTL test files, tstl_replay; a tool for delta-debugging and normalization of TSTL tests, tstl_reduce; and a tool for running a set of tests as a regression, tstl_regress.
You can do most of what you'll need with just the commands tstl, tstl_rt, tstl_replay, and tstl_reduce.
- tstl <filename.tstl> compiles a .tstl file into an sut.py interface for testing
- tstl_rt runs random testing on the sut.py in the current directory, and dumps any discovered faults into .test files
- tstl_replay <filename.test> runs a saved TSTL test, and tells you if it passes or fails; with --verbose it provides a fairly detailed trace of the test execution
- tstl_reduce <filename.test> <newfilename.tstl> takes <filename.test>, runs reduction and normalization on it to produce a shorter, easier to understand test, and saves the output as <newfilename.tstl>.
All of these tools offer a large number of configuration options; --help will produce a list of supported options for all TSTL tools.
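To give a feel for what test reduction does, here is a minimal sketch in plain Python. It is a simple one-step-at-a-time greedy reducer, not the delta-debugging and normalization algorithms that tstl_reduce uses, and the step format, run, and still_fails are hypothetical names invented for the example.

```python
def reduce_test(steps, still_fails):
    """Greedily drop single steps while the shortened test still fails."""
    steps = list(steps)
    changed = True
    while changed:
        changed = False
        for i in range(len(steps)):
            candidate = steps[:i] + steps[i + 1:]
            if still_fails(candidate):
                steps = candidate
                changed = True
                break  # restart the scan on the shorter test
    return steps


def run(steps):
    """Execute a toy test: each step adds to or multiplies a counter."""
    x = 0
    for op, n in steps:
        if op == "add":
            x += n
        elif op == "mul":
            x *= n
    return x


def still_fails(steps):
    """Hypothetical failure condition: the counter ends at exactly 12."""
    return run(steps) == 12


if __name__ == "__main__":
    failing = [("add", 4), ("mul", 1), ("add", 0), ("mul", 3), ("add", 0)]
    print(reduce_test(failing, still_fails))  # [('add', 4), ('mul', 3)]
```

Every step left in the reduced test is needed: removing any one of them makes the failure disappear.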
Extended Example
The easiest way to understand TSTL may be to examine examples/AVL/avlnew.tstl (https://github.com/agroce/tstl/blob/master/examples/AVL/avlnew.tstl), which is a simple example file in the latest language format.
avlnew.tstl creates a pretty full-featured tester for an AVL tree class. You can
write something very quick and fairly effective with just a few lines
of code, however:
@import avl
pool: <int> 3
pool: <avl> 2
property: <avl>.check_balanced()
<int> := <[1..20]>
<avl> := avl.AVLTree()
<avl>.insert(<int>)
<avl>.delete(<int>)
<avl>.find(<int>)
<avl>.display()
This says that there are two kinds of "things" involved in our
AVL tree implementation testing: int and avl. We define, in
Python, how to create these things, and what we can do with
these things, and then TSTL produces sequences of actions, that is
tests, that match our definition. TSTL also checks that all AVL trees, at all times, are
properly balanced. If we wanted, as in avlnew.tstl, we could also
make sure that our AVL tree "acts like" a set --- when we insert
something, we can find that thing, and when we delete something, we
can no longer find it.
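The "acts like a set" check can be sketched in plain Python as a comparison against a reference model. The avl module is not assumed to be available here, so the class SortedKeys below is a hypothetical stand-in with the same insert, delete, and find method names; check_against_set is likewise an illustrative name, not part of TSTL.

```python
import bisect
import random


class SortedKeys:
    """Hypothetical stand-in for the AVL tree: a sorted list of unique keys."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def delete(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]

    def find(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key


def check_against_set(steps=2000, seed=0):
    """Apply random operations to both objects and compare after each one."""
    rng = random.Random(seed)
    sut, model = SortedKeys(), set()
    for _ in range(steps):
        key = rng.randint(1, 20)
        op = rng.choice(["insert", "delete", "find"])
        if op == "insert":
            sut.insert(key)
            model.add(key)
        elif op == "delete":
            sut.delete(key)
            model.discard(key)
        # After every operation, membership must agree with the model set.
        assert sut.find(key) == (key in model)
    return sorted(model)


if __name__ == "__main__":
    print(check_against_set())
```

The built-in set serves as the specification, so no expected results need to be written out by hand.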
Note that we start with "raw Python" to import the avl module, the SUT. While TSTL supports using from, aliases, and wildcards in imports, you should always import the module(s) under test with a simple import. This allows TSTL to identify the code to be tested and automatically provide coverage, static analysis-aided testing methods, and proper module management. Utility code in the standard library, on the other hand, can be imported any way you wish.
If we test this (or avlnew.tstl) for 30 seconds, something like this will appear:
~/tstl/examples/AVL$ tstl_rt --timeout 30
Random testing using config=Config(swarmSwitch=None, verbose=False, fastQuickAnalysis=False, failedLogging=None, maxtests=-1, greedyStutter=False, exploit=None, seed=None, generalize=False, localize=False, uncaught=False, speed='FAST', internal=False, normalize=False, highLowSwarm=None, replayable=False, essentials=False, quickTests=False, coverfile='coverage.out', uniqueValuesAnalysis=False, swarm=False, ignoreprops=False, total=False, swarmLength=None, noreassign=False, profile=False, full=False, multiple=False, relax=False, swarmP=0.5, stutter=None, running=False, compareFails=False, nocover=False, swarmProbs=None, gendepth=None, quickAnalysis=False, exploitCeiling=0.1, logging=None, html=None, keep=False, depth=100, throughput=False, timeout=30, output=None, markov=None, startExploit=0)
12 [2:0]
-- < 2 [1:0]
---- < 1 [0:0] L
---- > 5 [0:0] L
-- > 13 [1:-1]
---- > 14 [0:0] L
set([1, 2, 5, 12, 13, 14])
...
11 [2:0]
-- < 5 [1:0]
---- < 1 [0:0] L
---- > 9 [0:0] L
-- > 14 [1:-1]
---- > 18 [0:0] L
set([1, 5, 9, 11, 14, 18
