Tandem
MIRROR of https://codeberg.org/catseye/Tandem : A rewriting language where the rewrite rules form a Kleene algebra
Install / Use
/learn @catseye/TandemREADME
Tandem
Try it online @ catseye.tc | Wiki entry @ esolangs.org | See also: Squishy2K ∘ Arboretuum ∘ Tamsin ∘ Vinegar ∘ Cabra
Tandem is an experimental rewriting language where the rewrite rules form a Kleene algebra.
The object being rewritten by a Tandem program is a collection of labelled stacks -- a finite mapping from strings to strings. The strings are always rewritten at the left edge, so they are effectively stacks.
A Tandem program consists of a single rewrite rule along with zero or more pragmas. The rewrite rule is applied to an initial state to possibly obtain a final state. This rule is applied only once. However, in Tandem, a rule is a composite object which may contain subrules that get applied many times.
Rewrite rules
A rewrite rule is defined inductively as follows:
- 0 is a rewrite rule.
- 1 is a rewrite rule.
- If l is a character and s and t are strings, then:
- ls → t is a rewrite rule.
- ls… → t is a rewrite rule.
- ls… → t… is a rewrite rule.
- If Ri and Rj are rewrite rules, then Ri | Rj is a rewrite rule.
- If Ri and Rj are rewrite rules, then Ri & Rj is a rewrite rule.
- If R is a rewrite rule, then R* is a rewrite rule.
- Nothing else is a rewrite rule.
A rewrite rule may either match and change the state, match and make no change to the state (e.g. replacing X with X), or fail to match (in which case the state does not change).
In the following, we say M(R,S) for the set of labelled stacks that would be matched by R in some state S. These are the redexes of R in S. If M(R,S) is the empty set then R fails to match on S.
In general, a rule can match multiple redexes, i.e., M(R,S) can contain more than one element. However, if a rule matches more than one redex during program execution, an unrecoverable error to the effect of “multiple rewrite choices encountered” occurs. This determinism is so that the algebraic rules work out nicely. If some implementations wish to experiment with other ways of resolving the non-determinism when this happens, they may do so by defining implementation-specific pragma.
0 and 1
0 is an artificial rewrite rule that always fails to match. 1 is an artificial rewrite rule that always succeeds in matching, and changes nothing. These are the identity elements for disjunction and conjunction respectively. More will be said about them later.
Individual rewrite rules
In an individual rewrite rule, l identifies the stack which will be rewritten. In this section “the stack” means “the stack labelled with l”.
There are three variations on individual rewrite rules.
Variation 1: s is not followed by the special signifier …: If the stack equals s exactly, then the stack will become t exactly.
Variation 2: s is followed by the special signifier … but t is not: If the stack begins with s, then the stack will become t exactly.
Variation 3: s is followed by the special signifier … and so is t: If the stack begins with s, then that s will be removed from the stack and t will be prepended on the stack in its place.
Examples follow. If the stack labelled R contains only name, replace it with mud:
Rname → mud
If the stack labelled Q starts with m, replace the whole thing with mud:
Qm… → mud
Strip a leading 0 from the stack labelled A if one is present:
A0… → …
If the top element of A is 0 and the element under it is 1, swap them:
A01… → 10…
Disjunction
Ri | Rj means that either Ri can be applied or Rj can be applied. If both Ri would fail to match and Rj would fail to match, then Ri | Rj fails to match.
M(Ri | Rj, S) = M(Ri, S) ∪ M(Rj, S).
Ri | Rj can introduce nondeterminism, if both Ri and Rj match; see note on nondeterminism above.
Examples follow. Rewrite a to b, or b to a, on the stack labelled A, whichever (if either) is applicable:
Aa → b | Ab → a
Conjunction
Ri & Rj means to first apply Ri to obtain a modified state, then apply Rj to the modified state. If Ri does not match, or Rj does not match (on the modified state) then Ri & Rj does not match, and no changes are made to the state.
M(Ri & Rj, S) = M(Rj, S') where S' is the modified state (assuming M(Ri, S) is not empty; otherwise M(Ri & Rj, S) is empty).
Examples follow. Rewrite a to b on the stack labelled X, and b to a on the stack labelled Y, assuming both of these rewrites are possible. (If only one is possible, nothing happens):
Xa → b & Yb → a
Never matches, because even if the first rewrite suceeded, the second one could not possibly succeed:
Xa → b & Xc → d
(Note that this definition of & is different from the original (pre-June 25th) definition; the previous definition was incompatible with the definition of a Kleene algebra, and apparently basically flawed as an component in this algebraic structure to boot.)
Asteration
R* means: repeatedly apply R while M(R, S) is nonempty for the current state S. R* is always considered to succeed in matching, even if R was not matched even once.
Examples follow. Strip all leading 0s from the stack labelled A:
A0…→…*
Syntax
& has a higher precedence than |. * is postfix and has the highest precedence of all. Enclose expressions in parentheses ( and ) to get around the precedence rules.
For ease of typing, -> is allowed as a synonym for →, and ... is allowed as a synonym for ….
A stack may be labelled with any string. The most common choice is a single uppercase Latin alphabetic character, in which case no special syntax is needed. For other names, the label must be enclosed in double-quotes.
Ab→c
"A"b→c
"29glerph"b→c
The strings to be rewritten may also be quoted, and this is how one can get spaces and special forms such as -> and ... onto a stack.
Ab→" c"
"->""->"->"->"
Spaces between the label, the strings, the ..., and the -> are otherwise ignored. The following are equivalent:
Ab…→c…
A b … → c …
Inside a quoted string, three escape sequences are recognized:
\": a literal double-quote\\: a literal backslash\{xxxx}: a Unicode character given by its Unicode code point xxxx given in hexadecimal
As a matter of syntactic sugar, the format of the … forms of the individual rules may be reversed by preceding the rule with a % symbol. This is convenient for when you want to think of the stack as having its top element on the right edge of the string, rather than the left. (It is not stored differently in any way, it is merely a sugar on the notation).
%C…a → …b
%C…foo → foo
Initial State
At the beginning of a Tandem program with rewrite rule R, it can be assumed that, under normal conditions, for each label mentioned in the program, there exists a labelled stack in the collection with that label and it is initialized with an empty string. This leads to the following idiom to initialize stacks with initial values:
S→0 &
T→"$" &
K→1111 &
...rest of program goes here...
Pragmas
A pragma is a special string enclosed in braces ({ and }) which may influence how the Tandem program in question is to be interpreted.
{B:...}, {S:...}, {C:...}
See “Input and output” below.
{!...}
Comment. Any character except } may appear between the ! and the closing }. Comments are ignored by the implementation.
Input and Output
No implementation of Tandem is required to implement I/O, however implementations are encouraged to provide, at the least, some facility for the user to inspect the state of the collection of stacks after the rewriting has been applied. Even better if they are given a way to specify the initial contents of one or more stacks.
That said, if an implementation wants to implement I/O, there are several I/O disciplines that Tandem might define. A Tandem program can indicate it expects a particular I/O discipline via a pragma.
Batch I/O
The pragma {B:i,o} where i and o are stack labels indicates a program expects batch I/O in the following manner.
At the beginning of the program, standard input to the program is loaded onto the stack labelled i. The top of this stack will be the first character read from the input. An empty stack indicates end of input.
At the conclusion of the program, the contents of the stack o are written to the standard output. Because it is useful to accumulate output as the program runs, the top of this stack will be the last character sent to the output. It may also be convenient to use the % sugar to make accumuluating output look more natural.
Stream I/O
The pragma {S:...} indicates the program expects stream I/O. INCOMPLETE.
Console I/O
The pragma {C:...} indicates the program expects reactive console I/O. INCOMPLETE.
Example programs
Rudimentary examples
Hello, world!, using batch I/O.
{B:I,O}%O… → "Hello, world!"
Cat program, using batch I/O. Note that this only supports inputs consisting of 0's and 1's, so a better title might be “binary cat program”.
{B:I,O}
Q→0 &
(
Q0→0 & I0…→… & %O…→…0 |
Q0→0 & I1…→… & %O…→…1 |
Q0→1 & I→
)*
Reverse cat program, using batch I/O. Unlike the above this will accept arbitrary characters in the input. On the other hand, it is rather cheaty. To make it a valid program, there
