Repiet: The Brutalizing Piet Recompiler

Repiet is a compiler for the Piet graphical language, written in Python. The name "repiet" is meant to convey that it can recompile Piet programs into Piet. Additionally, there are Python and C++ backends (more to come, and please contribute!).

To compile a Piet program, we lex it, parse it, and optionally perform some optimizations. The result of parsing and subsequent passes are parse graphs; our intermediate representation. The compiler (repiet.compiler.compiler) collaborates with backends to grok this representation, and emit a program in a target language.

We describe repiet as a "brutalizing recompiler" because it will take your beautiful, hand-crafted Piet program and reconstruct it in a brutalist style. Who knows why you'd want that. The images referenced in this README are a somewhat randomly-generated Piet program -- tweaked a little bit so it does something and also terminates. Spoiler: it takes a number from the user, emits the corresponding ascii character, and terminates, but spends most of its time NOPping.

First Pass: Lexing

The lexer (repiet.lexer.Lexer) computes lexemes, or cardinally-connected sets of same-colored pixels, and identifies whitespace and blocking features (by default, these are all pixels which are neither white nor coding colors; this may be controlled through opinions described in repiet.util). As the lexemes are computed, we locate eight corners of each; corresponding to the eight (dp, cc) states. As whitespace is located, we identify the 4 pixels reached by sliding in each of four dp directions.

The role of the lexer is to eliminate pixel-based computations from subsequent passes.

Second Pass: Parsing

The parser begins in the upper-left corner of the image, and computes a parse graph whose nodes correspond to (lexeme, dp, cc) states. For each node, we compute the operation (if any) resulting from the (dp, cc)-ward transition from the lexeme, as well if the next visited lexeme (if any). If the operation is switch or pointer, there are two or four possible next-visited lexemes respectively.

The role of the parser is to eliminate the dp and cc from subsequent passes. Thus, interpreters and compilers of the intermediate representations need only grok parse graph, handle input, output, and other stack-based operations.

Intermediate Representation

A parse graph is a rooted digraph containing Nodes (repiet.util.Node) which consist of a name, zero or more operations, and a list of outgoing neighbor's names. To compile this representation into another language, one need only implement the trampoline pattern.

A node may have zero, one, two, or four children. These respectively correspond to halting, jumping, and the switch or pointer operations. Hence, a node's operation list is constrained such that only the final operation may be switch or pointer. To implement these two operations, a compiled program pops a value from the stack, takes that value modulo 2 or 4, respectively, and uses the result as the index to the list of children. If the stack is empty, the value is taken to be zero.

The backends (repiet.backends) are quite rudimentary, and only grok this very simple IR. Thus, the only optimizations available to us are those which perform surgery on parse graphs.

Optimizing Passes

We implement a Tracer (repiet.tracer.Tracer) which collects a parse graph into sequences of non-branching operations. The result is a new parse graph, typically with fewer nodes. Depending on the backend chosen, this may be a slight optimization.

Additionally, we implement a rudimentary Static Evaluator (repiet.optimizer.StaticEvaluator), which maintains compile-time stack while tracing through instructions. Presently, the static evaluator stops whenever the program (a) takes input from the user, (b) tries to pop from an empty stack, or (c) attempts to roll beyond the depth of the stack. Further optimizations may be possible: the values coming off of an empty stack could be treated as symbols, for example. Currently the backends are incapable of representing symbolic variables, so such operations would require quite a bit of work.

The static evaluator is a work in progress, and it looks reasonable to interpose a stack-depth analyzer between the tracer and static evaluator: the language spec recommends skipping instructions that pop from an empty stack, which is especially visible in this page's working example. The existing static evaluator does not drop those instructions, but they're apparently rare in hand-crafted or assembled Piet programs.

Installing and Using `repiet`

Repiet is a python package, with a standard setup.py. To get the very latest, fetch the git repo and install from there.

git clone https://github.com/boothby/repiet.git
cd repiet
python setup.py install

Otherwise, just run pip install repiet and you're off to the races. The primary interface to repiet is the module's __main__, but you can also import it repiet from python and poke around the module structure. The __main__ can be used either as an executable python module,

python -m repiet ...

or depending on how it's been installed, directly from the command line

repiet ...

But is it Faster Than C?

Yes! Well, let's back up a minute for the folks missing context. Recently there have been a spate of blog posts of folks claiming that their pet language is "faster than C" by implementing a feature-incomplete version of the wc utility and running a single benchmark to demonstrate superiority.

After cloning the repiet repo, point your command line at that directory. We've placed a wc utility into the assets directory, and we're going to time it versus the wc supplied by my operating system (you don't need to know any details about my system, of course).

$ cd assets
$ repiet wc.png -o wc.c --backend c --codel_size 10 -O 2
$ gcc wc.c -o wc -O3

Okay, now our wc utility is built; let's grab some data to test.

$ wget https://github.com/dwyl/english-words/raw/master/words_dictionary.json

Now, we're off to the races! Don't worry, I didn't cherrypick these benchmarks at all!

$ time cat words_dictionary.json | wc -w
740204

real	0m0.145s
user	0m0.136s
sys	0m0.036s

Wow that's fast. Now let's our hand-optimized Piet hotrod!

$ time cat words_dictionary.json | ./wc
740204
real	0m0.131s
user	0m0.140s
sys	0m0.028s

Holy wowzers, that's extremely peppy! Good thing I only ran those tests once, and didn't show a distribution of runtimes!

So there you go, Piet is faster than C. Tell your grandma.

Notes on Style and Quality

This project was written purely for the amusement of the author. Hence, the code in this project is idiomatic and [golfed](h

Repiet

Install / Use

README