lamtez: a typed lambda-calculus compiling to Tezos Michelson VM

Lamtez is Domain-Specific Language for smart contracts, and its compiler to Michelson, Tezos' virtual machine for smart contracts execution.

It is released under the MIT license, with no guaranty of any kind (among other, there's no proof that it compiles to sound nor correct Michelson code). I'm interested in feedbacks anyway, if you experiment with it.

Building

The compiler is written in OCaml; the build process relies on ocamlbuild, and it depends on the parser generator menhir. If opam is installed on your system, opam install ocamlbuild menhir should be enough to get a building environment. If you compiled Tezos from sources, Lamtez requirements are a subset of Tezos'.

Examples

If you would rather look at source code examples first, before reading the fine manual, you can go and check EXAMPLES.md. It reimplements contract exmaples from [http://michelson-lang.com] and the Michelson specification.

Cheat sheet

For a quick summary of the language syntax and main features, a PDF two-pages summary is available as https://github.com/fab13n/lamtez/raw/master/CHEAT-SHEET.pdf.

General considerations on the language

The language is strongly influenced by ML dialects OCaml and Haskell: mostly functional, relying heavily on sum-types and product-types, statically typed with type inference. It also composes with some limitations in Michelson. Most of the limitations in Michelson language follow from the following assumptions:

Most useful smart contracts will be rather simple, commpared to typical programs written be profesionnal developers in general-purpose languages;
A very high level of confidence in contracts correctness will be required. Given how finding and exploiting bugs in smart contracts can be turned into tokens and actual money, every contract handling significant amounts of money will be thoroughly reviewed and attacked by black-het hackers, several of them smarter than the contract's author and white-hat reviewers.
Tezos being self-amending, it's better to start with a very limited language and progressively add empirically needed features, rather than start with many bells and whistles: some of them might prove less secure than anticipated, or not as useful as it first seemed, but each of them making formal proofs more tedious or complicated to produce.

Michelson closely ressembles a simply-typed lambda calculus, without native recursivity nor proper closures; drastic limitations also exist on the ability for a contract to call other contracts.

Compared to hand-written Michelson, Lamtez contracts offer higher-level features:

Sum and product types of arbitrary size and with arbitrary field/case names. In Michelson, sums with more than two alternatives and products with more than two fields have to be encoded with nested alternatives and pairs, thus quickly becoming hard to read for humans.
lexically scoped variables: keeping track of what is at which level of the stack is cumbersome when writing a program, and it makes reading someone else's contracts dreadful. Being able to name things rather than shuffling them on a stack greatly improves contract readability.
Limited support for closures: a function can use variables declared outside of it. It is done by either compiling functions as a pair of a LAMBDA with a tuple capturing its environment from the stack, by beta-reducing the closures when possible, or by reimplementing MAP and REDUCE operations with LOOP when the function needs its environment. Some cases remain, though, where Lamtez is not able to compile a closure.
type inference: as most ML dialects, lamtez uses a variant of the W algorithm to guess types with very little user annotations. However, contrary to ML, it expects the final contract not to be polymorphic, as Michelson doesn't support it. So you might get some "Add more type annotations" errors. Annotating function parameters is sufficient, although often not necessary, to ensure a monomorphic contract type. (inner polymorphic functions are technically possible, but NOT YET IMPLEMENTED)
storage management: instead of manually managing the storage, variables prefixed with @ are declared and stored in it, and can be updated in the contract through a simple and readable syntax (under some conditions). An option is provided to declare and extract an initial storage value from the contract source file.
TRANSFER_TOKEN, the act of calling another contract, can only be done under very strict conditions in Michelson (the stack must be empty). This restriction is partially lifted in Lamtez: it will dedicate a field in the storage to save the stack before calling, and restore it after it returns.

(note: currently, the stack is saved rather naively: a big sum type is reserved in the storage, one per invocation of TRANSFER_TOKEN, and the whole stack is saved in it. First the stack could be pruned before saving, as some slots won't be used after the contract call anymore, and two invocations with the same stack types above them should share their slot)

Lamtez is mostly functional. It has two kinds of side effects: storage updates and calls to other contracts. Normally, functional programming languages will only let you touch side effects with a ten foot pole. They typically call that pole a "monad"; wielding a monad towards unfamiliar developers scares them about as much as doing so with an actual ten foot pole.

Lamtez doesn't explicitly supports monads, but it enforces some (coarser) constraints about where side effects are accepted. Namely, these operations can't happen in an inner function (we couldn't keep track of them without monads then), nor in places where evaluation order isn't intuitively obvious: not in tuples/products, not in function arguments.

Whitespaces are not significant except to separate tokens, indentation is not significant, comments start with a # sign and run until the end of the current line.

Lamtez is not proved correct in any way; the proper way to ensure any correctness property is by working directly on the generated Michelson code. It is annotated and commented, among other with the stack's state after each instruction, in order to help analysis.

Anatomy of a contract

A contract is composed of:

a series of type declarations: it creates type structures, with arbitrary sizes, arbitrary labelled fields and alternative cases.
a series of stored variables: those variables, starting with a @ symbol, are kept between contract calls in the blockchain's storage. They can optionally be given an initial value; if all stored variables in the contract have an initial value, then one can extract a data initialization value from the contract, which can be directly passed to the tezos client. (In the future it will be possible to compile storage values from separate files; but the contract's type declarations are needed to properly encode sum and product values).
a function, taking a parameter and returning a result. This function can also, under controlled conditions, perform two kinds of effects: calling other contracts, and updating the content of stored variables.

Expression Syntax

Functions and function applications

In languages inspired by lambda calculus, functions are introduced by a $\lambda$ sign; In Lamtez as in ML we represent it with the fun keyword; we tolerate also the backslash \\, visually more similar, used by Haskell, more concise, but which might make the code look more cryptic at first sight.

The complete syntax for a function is fun x: body. Parameters can be annotated with types, through the :: infix operator: fun x :: nat: body. The result type can also be specified, although it can normally be inferred by the compiler: fun x :: nat :: unit: body.

Lamtez supports multi argument functions, but contrary to ML-inspired languages, they are encoded through tuples rather than through currying (nested functions returning functions, a standard idiom in λ-calcullus inspired languages). Currying is mainly useful for partial applications, and their usefulness is not demonstrated for Tezos smart contracts. They can always be encoded through eta-expansion if needed. Finally, currying leads to cumbersome code at best, in a target language that doesn't support closures.

Functions are syntactically applied the ML/λ-calculus way, by putting the arguments after the function, without parentheses nor separating commas, i.e. what's written f(x, y) in C-inspired syntaxes is written f x y, and equivalent to f(x, y), the application of function f to the pair (x, y). Parentheses are still needed for nested function applications, e.g. what would be written f(g(x), g(y)) in C would give f (g x) (g y) in Lamtez, as in most ML dialects.

Lamtez performs type inference, i.e. if there are enough hints in the code, it will correctly guess the types which the user omitted to write. However, unlike most other ML-family languages, polymorphism is not allowed, because the underlying Michelson VM doesn't support it (TODO: inner lambdas could be polymorphic, as long as it doesn't show on the outside type. Not sure hwo useful that would be in practice). So if the type inference algorithm determines that a function's best type is, say, ∀a: list a -> nat, the compiler will emit an error and demand more type annotations rather than accepting

Examples:

fun x: x
fun x y: x + y
fun x :: nat, y ::nat: x - y
fun x :: nat, y: x * y
fun parameter: ((), ma

Lamtez

Install / Use

README