
Terrorvm

Lightweight, fast Virtual Machine for dynamic, object-oriented languages.



A lightweight Virtual Machine for dynamic, object-oriented languages. It aims to be fast, as simple as possible, easily optimizable, with LLVM support, and easily targetable for language designers and implementors. That's why its interface (instruction set and bytecode format) is extensively documented and an example compiler is provided under the compiler folder.

Before anything, I want to give special thanks to my awesome mentors [Jeremy Tregunna][jtregunna], [Brian Ford][brixen], [Dirkjan Bussink][dbussink] and [Evan Phoenix][evanphx]. Without their teachings and patience I would never have started this in the first place.

Disclaimer

I'd love to discuss literally anything about my design and implementation choices in TerrorVM. Feel free to ping me [on twitter][twitter], drop me [an email][email], or, if you are in Berlin, let's just grab some beers together :) After all, I have no idea what I'm doing.

Object model

In TerrorVM, everything is an object, and every object may have a prototype. The basic value types that the VM provides are:

  • Number: Double-precision floating point numbers.
  • String: Immutable strings.
  • Vector: Dynamically sized vectors that may contain any type.
  • Map: Hashmaps (for now only strings are supported as keys).
  • Closure: A first-class function.
  • True: True boolean.
  • False: False boolean.
  • Nil: Represents nothingness. It is falsy just like false.
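One way a VM like this could represent these values in C is a tagged union. This is a minimal sketch, not TerrorVM's actual source: the names `ValueType`, `Value` and `value_is_truthy` are hypothetical, and treating every value except nil and false as truthy is an assumption based on the Ruby-like semantics described above.

```c
#include <stdbool.h>

/* Hypothetical tags mirroring the basic types listed above. */
typedef enum {
    TYPE_NUMBER,
    TYPE_STRING,
    TYPE_VECTOR,
    TYPE_MAP,
    TYPE_CLOSURE,
    TYPE_TRUE,
    TYPE_FALSE,
    TYPE_NIL
} ValueType;

typedef struct {
    ValueType type;
    union {
        double number;      /* TYPE_NUMBER: double-precision float */
        const char *string; /* TYPE_STRING: immutable, never written through */
        void *heap;         /* TYPE_VECTOR, TYPE_MAP, TYPE_CLOSURE: opaque here */
    } as;
} Value;

/* Assumption (Ruby-like): only nil and false are falsy, so 0 is truthy. */
static bool value_is_truthy(Value v) {
    return v.type != TYPE_NIL && v.type != TYPE_FALSE;
}
```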

These basic types are objects themselves (of type Object). Each one is the prototype for all objects of its kind, and provides all the functionality those objects will need. That functionality is defined in the preludes ([alpha][alpha] and [beta][beta]), which I'll explain a bit further ahead.

Objects are simply collections of slots that may contain any kind of object.
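A minimal sketch of such an object in C, assuming a fixed-size slot array and lookups that fall back to the prototype chain when a slot is missing. The names (`Object`, `object_set`, `object_get`, `MAX_SLOTS`) are hypothetical; a hashmap would scale better than the linear scan used here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16 /* hypothetical fixed capacity to keep the sketch short */

typedef struct Object Object;

typedef struct {
    const char *name; /* not copied: the caller keeps the string alive */
    Object *value;
} Slot;

struct Object {
    Object *prototype; /* may be NULL: not every object has a prototype */
    Slot slots[MAX_SLOTS];
    size_t slot_count;
};

/* Set (or overwrite) a slot directly on obj. Returns 0 on success, -1 if full. */
static int object_set(Object *obj, const char *name, Object *value) {
    for (size_t i = 0; i < obj->slot_count; i++) {
        if (strcmp(obj->slots[i].name, name) == 0) {
            obj->slots[i].value = value;
            return 0;
        }
    }
    if (obj->slot_count == MAX_SLOTS) return -1;
    obj->slots[obj->slot_count].name = name;
    obj->slots[obj->slot_count].value = value;
    obj->slot_count++;
    return 0;
}

/* Look the slot up on obj itself, then walk the prototype chain. NULL if absent. */
static Object *object_get(const Object *obj, const char *name) {
    for (; obj != NULL; obj = obj->prototype) {
        for (size_t i = 0; i < obj->slot_count; i++) {
            if (strcmp(obj->slots[i].name, name) == 0) return obj->slots[i].value;
        }
    }
    return NULL;
}
```

An object's own slot shadows a slot of the same name on its prototype, so setting a slot on one object never affects the others that share its prototype.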

I'm considering adding Traits, although I'll wait until I see a need for it. In the simplicity of Terror lies its power.

The VM runtime object

TerrorVM exposes as much of itself as possible at runtime, to make it flexible and easy to target. For example, the toplevel object VM exposes two subobjects, types and primitives. VM.types is a map of all the VM types, like this:

{
  :object => Object,
  :vector => Vector,
  :number => Number,
  ...
}

VM.primitives contains all the native functions that the VM exposes (such as puts, print, clone, arithmetic primitives, etc.):

{
  :clone => #<Closure ...>,
  :puts => #<Closure ...>,
  ...
}

Preludes

TerrorVM tries to implement as much as possible in its own code rather than in C, which makes it a good candidate for a multi-language VM: any language can be implemented on top of it. You can find the high-level [alpha][alpha] and [beta][beta] preludes and their respective compiled versions ([alpha][alpha_native] and [beta][beta_native]).

The alpha prelude wires up the VM primitives to the real objects at runtime, so that your code can use them conveniently. This is our current alpha prelude in high-level Ruby (it is compiled to TVM bytecode, which the VM runs during the bootstrap phase):

VM.types[:object].clone = VM.primitives[:clone]
VM.types[:object].print = VM.primitives[:print]
VM.types[:object].puts  = VM.primitives[:puts]

VM.types[:number][:+] = VM.primitives[:'number_+']
VM.types[:number][:-] = VM.primitives[:'number_-']
VM.types[:number][:/] = VM.primitives[:'number_/']
VM.types[:number][:*] = VM.primitives[:'number_*']

VM.types[:vector][:[]] = VM.primitives[:'vector_[]']
VM.types[:vector][:to_map] = VM.primitives[:vector_to_map]

Beautiful, isn't it? :)
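The wiring above assumes the VM keeps a table mapping names such as number_+ to native functions. Here is a minimal sketch of such a table in C. The signature (plain doubles in and out) is a simplification, since real primitives would operate on VM objects, and all names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical native signature: a binary operation on numbers. */
typedef double (*NumberPrimitive)(double self, double arg);

static double number_add(double self, double arg) { return self + arg; }
static double number_sub(double self, double arg) { return self - arg; }
static double number_mul(double self, double arg) { return self * arg; }
static double number_div(double self, double arg) { return self / arg; }

typedef struct {
    const char *name;
    NumberPrimitive fn;
} PrimitiveEntry;

/* Names match the keys the alpha prelude looks up, e.g. :'number_+'. */
static const PrimitiveEntry primitives[] = {
    { "number_+", number_add },
    { "number_-", number_sub },
    { "number_*", number_mul },
    { "number_/", number_div },
};

/* Linear search by name; NULL when the primitive is not in this table. */
static NumberPrimitive primitive_lookup(const char *name) {
    for (size_t i = 0; i < sizeof primitives / sizeof primitives[0]; i++) {
        if (strcmp(primitives[i].name, name) == 0) return primitives[i].fn;
    }
    return NULL;
}
```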

In later stages, such as [beta][beta], we define other high-level functions like Vector#map.

If you change any kernel files, such as the preludes, you'll have to recompile them to the native TVM format:

$ make kernel

And if you add more high-level examples (in Ruby) under the compiler/examples folder, you must recompile them as well:

$ make examples

Debugging your TerrorVM programs

TerrorVM ships with a debugger for your programs. It can set breakpoints at specific lines and step through either high-level lines of code or low-level bytecode instructions.

To use the debugger, pass the -d flag to terror, followed by the file to run:

$ bin/terror -d examples/functions.tvm

Here's an example session:

/Users/txus/Code/terrorvm/compiler/examples/functions.rb:1
1    > a = 123
2      foo = 123
3      self.fn = -> foo {
4        # foo is shadowed because it's a local argument

> n
DEBUG src/terror/vm.c:82: PUSH 0
DEBUG src/terror/vm.c:284: SETLOCAL 0

/Users/txus/Code/terrorvm/compiler/examples/functions.rb:2
1      a = 123
2    > foo = 123
3      self.fn = -> foo {
4        # foo is shadowed because it's a local argument
5        a + foo

>

The debugger always shows you the high-level source (in this case compiler/examples/functions.rb), so you know where you are at every point.

The commands for the debugger are:

  • h: show help
  • s: step to the next bytecode instruction
  • n: step to the next line of code
  • c: continue execution
  • d: show the stack
  • l: show locals
  • t: show backtrace
  • b: set a breakpoint at a line. Example: b 30
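A sketch of how a debugger prompt could turn these one-letter commands into actions. It is not TerrorVM's debugger code: `DebugCommand` and `parse_command` are hypothetical, and treating an empty line as invalid is an assumption.

```c
#include <stdio.h>

/* Hypothetical representation of the commands listed above. */
typedef enum {
    CMD_HELP, CMD_STEP, CMD_NEXT, CMD_CONTINUE,
    CMD_STACK, CMD_LOCALS, CMD_BACKTRACE, CMD_BREAK,
    CMD_INVALID
} DebugCommand;

/* Parse one input line. For "b <line>", the line number is stored in *line. */
static DebugCommand parse_command(const char *input, int *line) {
    char c;
    if (sscanf(input, " %c", &c) != 1) return CMD_INVALID; /* empty input */
    switch (c) {
    case 'h': return CMD_HELP;
    case 's': return CMD_STEP;
    case 'n': return CMD_NEXT;
    case 'c': return CMD_CONTINUE;
    case 'd': return CMD_STACK;
    case 'l': return CMD_LOCALS;
    case 't': return CMD_BACKTRACE;
    case 'b':
        /* "b 30" -> breakpoint at line 30; a missing or non-positive number is rejected */
        return (sscanf(input, " b %d", line) == 1 && *line > 0) ? CMD_BREAK : CMD_INVALID;
    default:
        return CMD_INVALID;
    }
}
```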

Implementing your own dynamic language running on TerrorVM

TerrorVM is designed to run dynamic languages. You can easily implement a compiler of your own that compiles your favorite dynamic language down to TVM bytecode.

I've written a demo compiler in Ruby under the compiler/ folder, just to show how easy it is to write your own. This demo compiler compiles a subset of Ruby down to TerrorVM bytecode, so you can easily peek at the source code or just copy and modify it.

You can write your compiler in whatever language you prefer, of course.
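Whatever language the compiler is written in, its final step is writing blocks in the text format described under "Bytecode format" below. Here is a minimal C sketch of that step. It assumes number literals may be written with %g and omits instruction labels, which the format treats as optional; `Literal`, `append` and `emit_block` are hypothetical names.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical literal representation: a number or a string. */
typedef struct {
    int is_string;
    double number;
    const char *string; /* must not contain a newline: one literal per line */
} Literal;

/* Append formatted text at buf + *used. Returns -1 if it does not fit. */
static int append(char *buf, size_t size, size_t *used, const char *fmt, ...) {
    if (*used >= size) return -1;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *used) return -1; /* error or truncated */
    *used += (size_t)n;
    return 0;
}

/* Write one block: its name, the ":<literals>:<code lines>" counts, one
   literal per line (strings prefixed with "), then one code word per line. */
static int emit_block(char *buf, size_t size, const char *name,
                      const Literal *lits, size_t nlits,
                      const int *code, size_t ncode) {
    size_t used = 0;
    if (append(buf, size, &used, "%s\n:%zu:%zu\n", name, nlits, ncode) != 0) return -1;
    for (size_t i = 0; i < nlits; i++) {
        int rc = lits[i].is_string
                     ? append(buf, size, &used, "\"%s\n", lits[i].string)
                     : append(buf, size, &used, "%g\n", lits[i].number);
        if (rc != 0) return -1;
    }
    for (size_t i = 0; i < ncode; i++) {
        if (append(buf, size, &used, "%d\n", code[i]) != 0) return -1;
    }
    return 0;
}
```

Fed the literals "hello world" and "puts" plus the words 16, 17, 0, 128, 1, 1, 20, 144, this reproduces the hello-world block from the bytecode section, minus the optional labels.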

Garbage collection

The algorithm of choice for TerrorVM was [Baker's treadmill][treadmill], an incremental, real-time, non-moving GC algorithm, implemented in [libtreadmill][libtreadmill].

Unfortunately I couldn't make it work, so for now I'm using a simple mark-and-sweep collector implemented in [libsweeper][libsweeper], a separate library included as a Git submodule.
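For readers unfamiliar with the technique, here is a minimal mark-and-sweep collector in C: mark everything reachable from the roots, then free everything left unmarked. It is a generic sketch, not libsweeper's API; `GCObject`, `Heap`, `gc_alloc`, `gc_collect` and the fixed `MAX_REFS` fan-out are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_REFS 4 /* hypothetical: each object references at most 4 others */

typedef struct GCObject {
    bool marked;
    struct GCObject *refs[MAX_REFS]; /* outgoing references, NULL = empty */
    struct GCObject *next;           /* intrusive list of every allocated object */
} GCObject;

typedef struct {
    GCObject *objects; /* head of the all-objects list */
    size_t count;
} Heap;

static GCObject *gc_alloc(Heap *heap) {
    GCObject *obj = calloc(1, sizeof *obj); /* unmarked, no references */
    if (obj == NULL) return NULL;
    obj->next = heap->objects;
    heap->objects = obj;
    heap->count++;
    return obj;
}

/* Mark phase: recursively mark everything reachable from obj (cycles stop
   at already-marked objects). */
static void gc_mark(GCObject *obj) {
    if (obj == NULL || obj->marked) return;
    obj->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++) gc_mark(obj->refs[i]);
}

/* Sweep phase: free unmarked objects and clear the mark on survivors. */
static void gc_sweep(Heap *heap) {
    GCObject **link = &heap->objects;
    while (*link != NULL) {
        GCObject *obj = *link;
        if (obj->marked) {
            obj->marked = false;
            link = &obj->next;
        } else {
            *link = obj->next;
            free(obj);
            heap->count--;
        }
    }
}

static void gc_collect(Heap *heap, GCObject **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++) gc_mark(roots[i]);
    gc_sweep(heap);
}
```

Unlike Baker's treadmill, this collector stops the program for a whole mark-and-sweep cycle, which is the trade-off the simpler algorithm makes.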

Concurrency

This is a really important topic these days, not to be overlooked. Concurrency support is not in place yet, but the plan is for TerrorVM to feature forking, threads and coroutines (I might change my mind as I learn more).

Bytecode format

The bytecode format might change to become more compact, but here is what it looks like for now. A file must contain a main block, and may contain other blocks (the functions defined in it). This is what a block looks like (if you're curious, it's just a hello world):

_0_main
:2:8
"hello world
"puts
16 PUSHSELF
17 PUSH
0
128 SEND
1
1
20 PUSHNIL
144 RET

As you can see, _0_main defines the entry point of the file. Then the mysterious numbers :2:8 mean that this block has two literals and eight lines of instructions. There are actually only 5 instructions, but their operands count as lines as well, so the total is 8.

Right after these counts come the literals, one per line. There are two kinds of literals: numbers and strings. Numbers are written as-is, while strings are prefixed with a double quote (") and have no closing quote.

Finally come the eight lines of numbers: the instructions and their operands. The labels beside the instructions (such as PUSHSELF) are entirely optional. The VM doesn't read them, but they help when debugging a bytecode file by hand.

After that there may be more functions. If our hello world defined an empty closure, right after 144 RET we'd have:

_4_block_153
:0:2
20 PUSHNIL
144 RET

That's it! :)
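The format is simple enough to read with a few lines of C. This sketch parses the :2:8 counts line, literal lines and code lines (skipping the optional labels), then counts instructions using the operand counts visible in the two examples above. Only those opcodes are covered, the function names are hypothetical, and the reading of SEND's two operands in the comment is a guess.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int is_string;
    double number;      /* used when is_string == 0 */
    const char *string; /* used when is_string == 1; points past the leading " */
} Literal;

/* The counts line, e.g. ":2:8" = two literals, eight code lines. */
static int parse_counts(const char *line, int *nlits, int *nlines) {
    return sscanf(line, ":%d:%d", nlits, nlines) == 2 ? 0 : -1;
}

/* A line starting with " is a string literal (the rest of the line).
   Anything else is parsed as a number literal. */
static int parse_literal(const char *line, Literal *out) {
    if (line[0] == '"') {
        out->is_string = 1;
        out->string = line + 1;
        return 0;
    }
    char *end;
    out->is_string = 0;
    out->number = strtod(line, &end);
    return (end != line && *end == '\0') ? 0 : -1;
}

/* A code line is an integer optionally followed by a label ("16 PUSHSELF").
   The label is skipped, just as the VM ignores it. */
static int parse_code_line(const char *line, long *word) {
    char *end;
    *word = strtol(line, &end, 10);
    return end != line ? 0 : -1;
}

/* Operand counts for the opcodes in the examples above only. */
static int operand_count(long opcode) {
    switch (opcode) {
    case 16:  return 0;  /* PUSHSELF */
    case 17:  return 1;  /* PUSH: index of a literal */
    case 128: return 2;  /* SEND: guess = message-name literal, argument count */
    case 20:  return 0;  /* PUSHNIL */
    case 144: return 0;  /* RET */
    default:  return -1; /* not covered by this sketch */
    }
}

/* Count instructions in a block's code words; -1 on an unknown opcode or
   when an instruction's operands run past the end of the block. */
static int count_instructions(const long *code, size_t nwords) {
    int count = 0;
    size_t pc = 0;
    while (pc < nwords) {
        int arity = operand_count(code[pc]);
        if (arity < 0 || pc + 1 + (size_t)arity > nwords) return -1;
        pc += 1 + (size_t)arity;
        count++;
    }
    return count;
}
```

On the hello-world block, the eight code words decode to five instructions, matching the count explained above.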

Examples (high-level Ruby code and its Terror compiled counterpart)

Instruction set

  • NOOP: no operation -- does nothing.

Values

  • PUSHSELF: pushes the current self to the stack.
  • PUSHLOBBY: pushes the Lobby (toplevel object) to the stack.
  • PUSH A: pushes the literal at index A to the stack.
  • PUSHTRUE: pushes the true object to the stack.
  • PUSHFALSE: pushes the false object to the stack.
  • PUSHNIL: pushes the nil object to the stack.

Local variables

  • PUSHLOCAL A: pushes the local at index A to the stack.
  • SETLOCAL A: sets the local variable at index A to the current top of the stack. Does not consume any stack.
  • PUSHLOCALDEPTH A, B: pushes the local at index B from an enclosing scope (at depth A) to the stack.
  • SETLOCALDEPTH A, B: sets the local variable at index B in an enclosing scope (at depth A) to the current top of the stack.
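A sketch of the local-variable instructions over a chain of scopes, in C. Values are plain doubles, `Scope`, `Stack` and the fixed sizes are hypothetical, PUSHLOCAL and SETLOCAL are treated as the depth-0 cases, and it assumes SETLOCALDEPTH, like SETLOCAL, leaves the value on the stack.

```c
#include <stddef.h>

#define STACK_MAX 32
#define LOCALS_MAX 8

typedef struct Scope {
    double locals[LOCALS_MAX];
    struct Scope *parent; /* enclosing scope, NULL at the top level */
} Scope;

typedef struct {
    double stack[STACK_MAX];
    size_t sp; /* number of values on the stack */
} Stack;

/* Walk `depth` scopes outward; depth 0 is the current scope. */
static Scope *scope_at(Scope *scope, int depth) {
    if (depth < 0) return NULL;
    while (depth > 0 && scope != NULL) {
        scope = scope->parent;
        depth--;
    }
    return scope;
}

/* PUSHLOCALDEPTH A, B: push local B of the scope at depth A. */
static int push_local_depth(Stack *s, Scope *scope, int depth, int index) {
    Scope *target = scope_at(scope, depth);
    if (target == NULL || index < 0 || index >= LOCALS_MAX || s->sp == STACK_MAX) return -1;
    s->stack[s->sp++] = target->locals[index];
    return 0;
}

/* SETLOCALDEPTH A, B: copy the top of the stack into local B of the scope at
   depth A, without popping it. */
static int set_local_depth(Stack *s, Scope *scope, int depth, int index) {
    Scope *target = scope_at(scope, depth);
    if (target == NULL || index < 0 || index >= LOCALS_MAX || s->sp == 0) return -1;
    target->locals[index] = s->stack[s->sp - 1];
    return 0;
}
```

Because setting a local does not pop, a compiler can chain an assignment into a larger expression and emit an explicit pop only where the value is not needed afterwards.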
