hobbes

a language, embedded compiler, and runtime for efficient dynamic expression evaluation, data storage and analysis

|section            |description                                                   |
|-------------------|:------------------------------------------------------------:|
|Building           |how to build and install hobbes                               |
|Embedding          |use hobbes in C++ programs                                    |
|Evaluation         |evaluate basic hobbes expressions                             |
|Storage            |record data for out-of-band analysis                          |
|Networking         |interact with remote hobbes processes                         |
|Comprehensions     |transform/sort/join/filter/group sequences for data analysis  |
|Pattern Matching   |efficiently classify and destructure data                     |
|Parsing            |parse text with LALR(1) grammars                              |
|Type Classes       |overloading and compile-time calculation                      |
|Unqualifier Modules|user-defined "compiler plugins" for custom constraint handling|

Note on Hobbes Usage

Hobbes is built for high-performance integration with C/C++ applications. While Hobbes is a strongly typed language that offers compile-time checks, it has neither a sandboxed runtime environment nor runtime safety features. By design, Hobbes gives direct access to memory and does not check array bounds. Additionally, Hobbes supports compiling and executing native code remotely over a network (RPC). This feature is meant for use within your trusted internal network only. If you choose to use such functionality, you need to be aware of these design choices and understand their security implications.

Building <a name="building"></a>

To build hobbes, you will need LLVM 3.3 or later, cmake 3.4 or later, GNU gcc 4.8 or later, and Linux kernel 2.5 or later.

With LLVM, cmake, and g++ installed, after downloading this code you should be able to build and install hobbes just by running:

$ cmake .
$ make
$ make install

Depending on where you've installed LLVM, this might not work until you give cmake the path to LLVM's cmake modules. In particular, you want the directory that contains LLVM's LLVMConfig.cmake file. If you set the environment variable LLVM_DIR to that directory, then the previous steps should complete successfully.

The build process will produce a static library, libhobbes.a, which can be linked into a C++ executable (if you want to use hobbes in a .so instead, the build produces a different static library to use, libhobbes-pic.a).

In addition, the build process will produce two utility programs, hi and hog. The hi program is a basic interactive interpreter for hobbes expressions and the hog program will efficiently record structured data produced by applications into data files (these files can be loaded and queried at the same time by hi). The source code for these programs can be informative as well, because they demonstrate many different aspects of the hobbes API.

Embedding <a name="embedding"></a>

Let's consider how to embed hobbes in a simple C++ program. This code implements a very basic shell, similar to hi:

#include <iostream>
#include <stdexcept>
#include <string>
#include <hobbes/hobbes.H>

int main() {
  hobbes::cc c;

  while (std::cin) {
    std::cout << "> " << std::flush;

    std::string line;
    if (!std::getline(std::cin, line)) break;  // stop at end of input
    if (line == ":q") break;

    try {
      c.compileFn<void()>("print(" + line + ")")();
    } catch (std::exception& ex) {
      std::cout << "*** " << ex.what();
    }

    std::cout << std::endl;
    hobbes::resetMemoryPool();
  }

  return 0;
}

First, to compile any expression we need to construct a hobbes::cc object. Then, in the context of an exception handler, we can compile functions out of this hobbes::cc object with the compileFn method, giving it the type we expect back (void() in this case) and also a string that we expect can compile to a value of that type. If any stage of compilation fails (parse error, type mismatch, etc) then an exception with details about the failure will be raised. Finally, we call hobbes::resetMemoryPool() to release any memory that might have been dynamically allocated by our compiled expression (that is, memory allocated by compiled expressions but not the compiled functions themselves -- those are released only when hobbes::cc is destroyed or when hobbes::cc::releaseMachineCode is used to destroy them).

When a compiled function decides to allocate memory, that allocation happens out of a "memory region". A memory region is a dynamically-growable sequence of bytes where allocations come from, and it is deallocated in bulk when hobbes::resetMemoryPool() is called. This makes allocation and deallocation very efficient, but requires some thought to establish "logical transaction" boundaries. The active memory region is thread-local, so you can use the same function pointer in several threads concurrently without worrying about synchronization issues.
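
To make the region idea concrete, here is a minimal sketch of a bump allocator with a thread-local active region and a bulk reset. It is not the hobbes implementation, which this document does not show. The names Region, activeRegion, and resetActiveRegion are hypothetical, and a real region would likely keep its pages for reuse instead of freeing them on reset.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Illustrative only: Region is a hypothetical stand-in for a memory region.
class Region {
public:
  explicit Region(std::size_t pageSize = 4096) : pageSize_(pageSize) {}

  // Hand out `size` bytes by bumping an offset within the current page.
  void* allocate(std::size_t size) {
    // Round up so every returned pointer is suitably aligned for any type.
    constexpr std::size_t a = alignof(std::max_align_t);
    size = (size + a - 1) / a * a;

    // Grow the region with a fresh page when the current one is full.
    if (pages_.empty() || used_ + size > pages_.back().capacity) {
      std::size_t cap = size > pageSize_ ? size : pageSize_;
      pages_.push_back(Page{std::make_unique<unsigned char[]>(cap), cap});
      used_ = 0;
    }

    void* p = pages_.back().bytes.get() + used_;
    used_ += size;
    return p;
  }

  // Release every allocation at once; there is no per-object free.
  void reset() {
    pages_.clear();
    used_ = 0;
  }

  std::size_t pageCount() const { return pages_.size(); }

private:
  struct Page {
    std::unique_ptr<unsigned char[]> bytes;
    std::size_t capacity;
  };

  std::size_t pageSize_;
  std::size_t used_ = 0;  // bytes consumed in the last page
  std::vector<Page> pages_;
};

// Each thread sees its own region, so allocation needs no locking.
inline Region& activeRegion() {
  thread_local Region region;
  return region;
}

// Plays the role that hobbes::resetMemoryPool() plays in the shell above.
inline void resetActiveRegion() { activeRegion().reset(); }
```

In this sketch a "logical transaction" is whatever runs between two calls to resetActiveRegion(). Any pointer obtained from the region must not be used after the reset that ends its transaction.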

Finally, if we put the above program in a file called "test.cpp" then we can build it like this:

$ g++ -pthread -std=c++17 -I <path-to-hobbes-headers> -I <path-to-llvm-headers> test.cpp -o test -L <path-to-hobbes-libs> -lhobbes -ldl -lrt -ltinfo -lz -L <path-to-llvm-libs> `llvm-config --libs x86asmparser x86codegen x86 mcjit passes`

The explicit path statements may not be necessary depending on where and how LLVM and hobbes have been installed on your system. Invoking the llvm-config program inline is typical among LLVM users, because it avoids listing several libraries explicitly.

If everything worked correctly, we should now have a simple shell where we can evaluate hobbes expressions.

Evaluation <a name="evaluation"></a>

Now that we have a working shell, we can experiment with expressions to get a sense of how the hobbes language works. To start with, we have some typical constant values:

> 'c'
'c'
> 42
42
> 3.14159
3.14159

These are the supported primitive types and constants:

|name  |example|description                                         |
|------|-------|:--------------------------------------------------:|
|unit  |()     |like 'void' in C, a trivial type with just one value|
|bool  |false  |either true or false                                |
|char  |'c'    |a single character of text                          |
|byte  |0Xff   |a single byte (0-255)                               |
|short |42S    |a 2-byte number                                     |
|int   |42     |a 4-byte number                                     |
|long  |42L    |an 8-byte number                                    |
|float |42.0f  |a 4-byte floating point value                       |
|double|42.0   |an 8-byte floating point value                      |
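
When embedding, it can help to keep these sizes in mind on the C++ side. The sketch below pairs each sized primitive with a C++ type of the same width. The pairing and the alias names are assumptions made for illustration; they are not taken from the hobbes API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical aliases whose widths match the table of primitives.
namespace prim {
using Byte   = std::uint8_t;  // a single byte (0-255)
using Short  = std::int16_t;  // a 2-byte number
using Int    = std::int32_t;  // a 4-byte number
using Long   = std::int64_t;  // an 8-byte number
using Float  = float;         // a 4-byte floating point value
using Double = double;        // an 8-byte floating point value
}  // namespace prim

// Compile-time checks that the widths really are the ones listed.
static_assert(sizeof(prim::Byte) == 1, "byte is 1 byte");
static_assert(sizeof(prim::Short) == 2, "short is 2 bytes");
static_assert(sizeof(prim::Int) == 4, "int is 4 bytes");
static_assert(sizeof(prim::Long) == 8, "long is 8 bytes");
static_assert(sizeof(prim::Float) == 4, "float is 4 bytes");
static_assert(sizeof(prim::Double) == 8, "double is 8 bytes");
static_assert(std::numeric_limits<prim::Byte>::max() == 255, "byte tops out at 255");
```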

These primitives can be combined into arrays (a string literal is an array of char, and a 0x literal is an array of bytes):

> [1, 2, 3]
[1, 2, 3]
> "foobar"
"foobar"
> 0xdeadbeef
0xdeadbeef

They can also be combined with records/tuples:

> {name="Jimmy", age=45, job="programmer"}
{name="Jimmy", age=45, job="programmer"}
> ("Jimmy", 45, "programmer")
("Jimmy", 45, "programmer")

Or with arrays of records/tuples, where they will print conveniently as a table:

> [{name="Jimmy", age=45, job="programmer"}, {name="Chicken", age=40, job="programmer"}]
   name age        job
------- --- ----------
  Jimmy  45 programmer
Chicken  40 programmer
> [("Jimmy", 45, "programmer"),("Chicken", 40, "programmer")]
  Jimmy 45 programmer
Chicken 40 programmer

We can combine types with variants or sums (the "nameless" form of variants, as tuples are to records). Because "naked" variant introduction is incomplete by definition, we may sometimes need to introduce explicit type annotations:

> |food="pizza"|::|vehicle:int,food:[char]|
|food="pizza"|
> |1="pizza"|::(int+[char])
|1="pizza"|

Also, variants can be combined with isorecursive types to make linked lists. These have a convenient form when printed, and built-in functions to avoid the recursive and variant type boilerplate:

> roll(|1=("chicken", roll(|1=("hot dog", roll(|1=("pizza", roll(|0=()|))|))|))|) :: ^x.(()+([char]*x))
"chicken":"hot dog":"pizza":[]
> cons("chicken", cons("hot dog", cons("pizza", nil())))
"chicken":"hot dog":"pizza":[]

These types and type constructors together make up the "algebraic data types" common in the ML family of programming languages (SML, OCaml, Haskell, ...). The syntax and names for these types also come from ML and from common academic programming language textbooks such as TaPL and PFPL.
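
For readers more at home in C++ than in ML, the variant and list examples above have a loose C++17 analogue. This is only an analogy under our own naming: Vehicle, Food, Item, Cell, List, and show are hypothetical, and hobbes does not represent these values this way internally.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// A two-case variant, loosely like |vehicle:int,food:[char]|.
struct Vehicle { int id; };
struct Food { std::string name; };
using Item = std::variant<Vehicle, Food>;

// A recursive list, loosely like ^x.(()+([char]*x)): either empty (a null
// pointer here) or a string paired with the rest of the list.
struct Cell;
using List = std::shared_ptr<const Cell>;
struct Cell {
  std::string head;
  List tail;
};

// Helpers named after the hobbes built-ins used above.
inline List nil() { return nullptr; }
inline List cons(std::string head, List tail) {
  return std::make_shared<Cell>(Cell{std::move(head), std::move(tail)});
}

// Render a list in the same form the shell prints it.
inline std::string show(const List& xs) {
  std::string out;
  for (const Cell* c = xs.get(); c != nullptr; c = c->tail.get()) {
    out += '"' + c->head + "\":";
  }
  return out + "[]";
}
```

Here `Item{Food{"pizza"}}` holds the case at index 1, much as `|1="pizza"|` does, and `show(cons("chicken", cons("hot dog", cons("pizza", nil()))))` produces the same text as the list printed above.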

As in Haskell, hobbes supports a form of qualified types with type classes and even user-defined constraint resolvers (which we will see in more detail later). Among many other uses, this allows both mixed-type arithmetic and type inference to coexist:

> 0X01+2+3.0+4L+5S
15
> (\x y z u v.x+y+z+u+v)(0X01,2,3S,4L,5.0)
15
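
As a rough point of comparison, C++ can express the same mixed-type sum, though by a different mechanism: template argument deduction plus the built-in arithmetic conversions, not type classes. The helper sum5 below is hypothetical and exists only for this comparison.

```cpp
#include <cstdint>
#include <type_traits>

// Each argument type is deduced separately, and the built-in arithmetic
// conversions decide the type of every intermediate sum.
template <typename A, typename B, typename C, typename D, typename E>
constexpr auto sum5(A a, B b, C c, D d, E e) {
  return a + b + c + d + e;
}

// 1-byte + int + 2-byte + long + double: the result is a double.
static_assert(std::is_same<
                  decltype(sum5(std::uint8_t{1}, 2, std::int16_t{3}, 4L, 5.0)),
                  double>::value,
              "the mixed sum is a double");

// The total matches the hobbes examples.
static_assert(sum5(std::uint8_t{1}, 2, std::int16_t{3}, 4L, 5.0) == 15.0,
              "the mixed sum is 15");
```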

We can also use the hi program (a slightly more complex interpreter than the example earlier, distributed with hobbes) to evaluate expressions like this and also inspect their types. For example, the types for the primitive expressions we considered earlier can be queried:

$ hi
hi : an interactive shell for hobbes
      type ':h' for help on commands

> :t ()
()
> :t false
bool
> :t 'c'
char
> :t 0Xff
byte
> :t 42S
sho
