Introduction

TML (Tau Meta-Language) is a variant of Datalog. It is intended to serve as a translator between formal languages (and more uses, see under the Philosophy section). The main difference between TML and common Datalog implementations is that TML works under the Partial Fixed-Point (PFP) semantics, unlike common implementations that follow the Well-Founded Semantics (WFS) or stratified Datalog. By that TML (like with WFS) imposes no syntactic restrictions on negation, however unlike WFS or stratified Datalog it is PSPACE complete rather than P complete. TML's implementation heavily relies on BDDs (Binary Decision Diagrams) in its internals. This gives it extraordinary performance in time and space terms, and allowing negation to be feasible even over large universes. In fact negated bodies, as below, do not consume more time or space than positive bodies by any means, thanks to the BDD mechanism.

Universe

The size of the universe (or domain of discourse) of a TML program is:

The number of distinct non-relation symbols in the program, plus
The maximum nonnegative integer that appears in the program (where the length of input strings counts as appearing in the program), plus
256 if at least one character symbol is used in the program (or at least one string appears in the program).

Fixed Points

TML follows the PFP semantics in the following sense. On each step, all rules are executed once and only once, causing a set of insertions and deletions of terms. If the same term is inserted and deleted at the same step, the program halts and evaluates to unsat. Otherwise it continues to the next step performing again a single evaluation of every rule. This process must eventually halt in either of the following forms:

The database obtained from the current step is equal to the database resulted from the previous step. This is a Fixed-Point. When this happens, the resulted database is considered as the final result.
Or, the database obtained in one step equals to the database state in some previous, not immediate predecessor, state. In this case the program will loop forever if we wouldn't detect it and halt the program, and is therefore evaluated to unsat, as no fixed point exists.

Note that only one of the two options can happen because the arity and the universe size are fixed. Ultimately, for universe size n and maximum arity k, they will occur in no more than 2^n^k steps.

Facts and Rules

A TML program consists of initial terms (facts) and rules that instruct the interpreter which terms to derive or delete at each step, as described in the previous section.

Facts

Facts take the form of

a(b c).
a(b(c)).
a(1 2 3).
rel('t' 1 2).
b(?x).
r.

Each term begins with a relation symbol (which is not considered as part of the universe). It is then possibly followed by parenthesis (or any balanced sequence of them) containing symbols. A term like b(b) is understood as containing two different symbols, one b stands for a relation symbol, and the second one for a universe element. Symbols may either be alphanumeric literals (not beginning with a digit), or a nonzero integer, or a character. Additionally a term may contain variables, prefixed with ?. A fact that contain variables, like b(?x), is interpreted where ?x goes over the whole universe. So the program

a(1).
b(?x).

is equivalent to

a(1).
b(0).
b(1).

Rules

Rules are terms separated by commas and one update operator :-. For example:

e(?x ?y) :- e(?x ?z), e(?z ?y).

what is left to the update operator (which may be several terms separated by commas) is called the head of the rule, while the rhs is called the body of the rule. This latter example therefore instructs to add to the next-step database all facts of the form e(?x ?y) such that e(?x ?z) and e(?x, ?y) appear in the current-step database, for some value of ?z. So for example the following program

e(1 2).
e(2 1).
e(?x ?y) :- e(?x ?z), e(?z ?y).

will result with:

e(1 2).
e(2 1).
e(1 1).
e(2 2).

Note that the order of facts and rules does not matter. Also note that all facts and rules must end with a dot.

Negation and Deletion

Bodies may be negated using the negation symbol ~, for example:

e(?x ?y) :- e(?x ?z), e(?z ?y), ~e(?x ?x).

The variable ?x will bind to all values such that e(?x ?x) does not appear in the current-step database.

Heads may also contain the negation symbol, in which case it is interpreted as deletion. For example the rule:

~e(?x ?x) :- e(?x ?x).

Will make the next-step database to not include all terms of the form e(?x ?x) included on the current step database.

For performance reasons it is advised to better not have variables appearing in negated bodies that do not occur in any positive head (in the same rule), or variables that appear in positive heads and do not appear in bodies (also in the same rule). However this is only a performance advice. TML should work correctly either way, where variables (implicitly) range over the whole universe.

Fact deletion

It is also possible to use negated facts. These are deleted right after non-negated facts are added and right before rules are being executed.

a(2). b(?x).
~b(1). ~a(?x).
a_copy(?x) :- a(?x).
b_copy(?x) :- b(?x).

will result with

b(2).
b(0).
b_copy(2).
b_copy(0).

Sequencing

It is possible to sequence programs one after the other using curly brackets. For example the program

{
  e(1 2).
  e(2 3).
  e(3 1).
  e(?x ?y) :- e(?x ?z), e(?z ?y).
}
{
  ~e(?x ?x) :- e(?x ?x).
}

will result with

e(1 2).
e(1 3).
e(2 1).
e(2 3).
e(3 1).
e(3 2).

More generally, the output of one program is considered the input of the other. It is possible to filter the output before passing it to the next program as in the section "Queries".

Nested programs

It is possible to nest programs (actually whole sequences). Any nested program (or sequence of nested programs) is evaluated after the parent program reaches its fixed point.

For example this nested program:

{
    x1 :- x.
    {   y.
        { x2 :- x. }
        { y2 :- y. }
        { z2 :- z. }
    }
    y1 :- y.
    {   y.
        x2 :- x.
        { z.
          x3 :- x.
          y3 :- y.
          z3 :- z.
        }
        y2 :- y.
        z2 :- z.
    }
    z1 :- z.
    x.
}

is equivalent to this sequence:

{
    x.
    x1 :- x.
    y1 :- y.
    z1 :- z.
}
{ y. }
{ x2 :- x. }
{ y2 :- y. }
{ z2 :- z. }
{
    y.
    x2 :- x.
    y2 :- y.
    z2 :- z.
}
{
    z.
    x3 :- x.
    y3 :- y.
    z3 :- z.
}

Strings

It is possible to input strings to the database. The line

@string mystr "abc".

will add following facts to the database:

c(2).
b(1).
a(0).
mystr(2 'c').
mystr(1 'b').
mystr(0 'a').
mystr(printable 2 0).
mystr(printable 1 0).
mystr(printable 0 0).
mystr(alnum 2 0).
mystr(alnum 1 0).
mystr(alnum 0 0).
mystr(alpha 2 0).
mystr(alpha 1 0).
mystr(alpha 0 0).

More generally, @string relname "str" will use the relation symbol relname to declare two relations with that name:

2-ary relation mystr(?pos ?char) with a character and its position in the string
3-ary relation `mystr(?class ?pos 0) with a character classification (digit, alpha, alnum, printable, space). Third argument is always 0 to distinguish from 2-ary relation of the same name.

@string directive also declares unary relation for each unique character of the string and stores in it all character positions.

It is also possible to output a string to stdout by using it as a relation symbol:

@stdout str1.

In addition a string can refer to command line arguments:

@string str $1.

or to be taken from stdin:

@string str stdin.

or from a file:

@string str <filename>.

or from a file provided as a command line argument:

@string str <$1>.

Finally it is possible to refer to the length of the string by the symbol len:str.

Queries

TML features three kinds of queries: filtering, proving, and tree extraction. Filtering and tree extraction replace the resulted database with their result, namely deleting everything unrelated to them. Their result is then outputed or passed to the next sequenced program.

Filtering is done by:

! e(1 ?x).

which will leave on the database only the results that match the term e(1 ?x). Tree extraction is done by supplying the root (which may possibly contain variables) after !!:

!! T((?x ?y)).

Proof extraction is done by adding a single directive specifying a relation name:

@trace relname.

which will construct a forest with relation symbol relname that contains all possible witnesses to all derived facts, in a fashion that was described above: if we have a rule

e(?x ?y) :- e(?x ?z), e(?y ?z).
@trace P.

then the trace tree will have the form

P((e(?x ?y)) (e(?x ?z)) (e(?y ?z))).

Grammars

It is possible to supply a context free grammar as a syntactic shortcut for definite clause grammars. For example Dyck's language may be written as:

start => null.
start => '(' start ')' start.

and will be converted to the rules:

start(?v1 ?v1) :- str(((?v1)) (?v2) ((?v3))).
start(?v3 ?v3) :- str(((?v1)) (?v2) ((?v3))).
start(?v1 ?v5) :- str(((?v1)) ('(') ((?v2))), start(?v2 ?v3),
    str(((?v3)) (')') ((?v4))), start(?v4 ?v5).

where str is some string defined in the program. Grammars are allowed in programs that contain only one string. If multiple strings require parsing it is possible to define them in sequenced programs.

Extracting the parse forest can be done by extracting a proof of the start symbol:

!! parseFor

TML

Install / Use

README