
J2

J2 is a minimalist concatenative programming language that makes up for its simplicity with its ability to bind natively to C libraries' ABI *and types*, *without glue*.

(This doc is a work in progress…)

NOTE: The “cpp” branch contains a “dumb” port of this project from C to C++, which I did for several reasons:

  • Obviously, it greases the skids towards expanding its runtime reflection capabilities into C++.
  • I wanted to eliminate the dependency on the GCC-only “nested functions” extension, using “standard” C++ lambdas instead…
  • ...so it can be compiled with Clang/LLVM…
  • ...so I can play with KLEE.

The Elevator Pitch...

J2 is a system that combines a minimalist programming language (“Edict”, for “executable dictionary”) with a C reflection/FFI system. It allows you to easily import shared libraries and use the datatypes, variables, and functions within them (at least the ones in global scope) directly, without needing to write any “glue” code.

For example, if I have a file “inc.c” containing:

typedef struct mystruct { int i; float f; } mystruct;
extern mystruct increment(mystruct x) { x.i+=1; x.f+=1; return x; }

...and I make a shared library out of it:

gcc -shared -fPIC -g inc.c -o inc.so

...then I can import that shared library into the edict interpreter and get right to using it:

e> loadlib([inc.so]) @mylib
Finished curating module
e> mylib<mystruct! <2@i 4@f> @x increment(x)>/ stack!
VMRES_STACK
 <null>
        structure_type "struct mystruct" 
    member "i" 
    base_type "int" 0x3 (0x7f361c0121a0)
    member "f" 
    base_type "float" 5 (0x7f361c0121a4)

It looks simple, but there’s a lot going on under the covers here...

Language Overview

“Edict” is a minimalist programming language that makes up for its simplicity by having the built-in ability to understand and dynamically bind with C libraries, providing "native" access to C types, variables, and functions, of arbitrary complexity, without writing wrappers or glue code. Alternatively, you can look at it as a reflection library for C programs that allows them to expose dynamic access to their own internals at runtime.

The language is built upon a foundation of three elements:

  • Data Structure: All data (the entire state of the system) resides in a "Listree" structure, a recursively-defined hierarchical dictionary.
  • Reflection: C type, function, and variable definitions/declarations from C libraries are imported and curated via libraries’ DWARF (debugging) information.
  • Interpreter(s): A simple bytecode VM (with accompanying interpreters/jit-compilers) provides a unified method of accessing and exploiting instructions and data.

These three elements assemble themselves at runtime into a multi-purpose programmable environment. The system bootstraps itself by:

  • Importing its own library, so it can reflect upon itself
  • Compiling a kernel for the VM; by default, it compiles an Edict REPL (itself defined in Edict), using C building blocks imported from its own library
  • Running the kernel in the VM…
  • ...Which in turn (by default) evaluates an Edict file and/or interactive session.

The evolution of Edict has been influenced by Forth, Lisp, Joy, Factor, Tcl, Mathematica, and others.

The Language

Key Points:

  • Edict is a “stack” language: Work is performed by popping data from the top of the data stack, operating on it, and pushing results back onto the stack.
  • Edict has a dictionary which contains… well, everything, including the data stack.
  • “Code” is “data” (but not in the same way as Lisp).
  • Edict code is compiled into a simple core bytecode, which a VM executes.
  • Other kinds of structured/hierarchical data can be compiled into this bytecode.
  • This VM is “stackless”; it doesn’t recurse, and the VM's inner bytecode dispatch loop is nearly condition-free:
    • while (call=vm_dispatch(call)); // VM's inner loop
  • A VM context maintains its state within a single root “Listree” object.
  • Within a VM context, there are several stacks of items and frames, for:
    • Bytecodes
    • Data (Edict is a point-free postfix language)
    • A hierarchical-dictionary
    • Prefix-style “function call” operators
    • Exceptions

Listree (“List Tree”)

(NOTE: I found this "ZigZag" structure that is very similar to (and predates) my Listree: https://www.nongnu.org/gzz/gi/gi.html)

The “Listree” is the core data structure of Edict and the VM which implements it. An instance of a Listree consists simply of a Listree Value, which contains:

  • Zero or one buffer of “arbitrary” data, and
  • Zero or more labeled references to other Listree Values*.

The ability of a Listree Value to contain (labeled) references to other Listree Values is what makes the Listree a “hierarchical” or “recursive” data structure. The Listree is at the core of both Edict and the system which implements it.

*It’s not explicit in this simple explanation (and that’s on purpose), but each label actually refers to a list of references to other Listree Values.

As mentioned, the VM maintains all of its state within a Listree Value, including “Data Stack” and “Dictionary” sub-Listrees, among several others.

In Edict, there are two types of data values, but values of both types are stored using Listree Values which exist either on the VM’s Data Stack or within its Dictionary.

Literals

The simplest type of value is a "literal". Literals are just text, delineated by square brackets:

[This is a literal]

If you come from a LISP background, you might think that the interpreter breaks literals down into s-expressions, but this is not the case. Everything between the brackets is represented “literally” in a Listree Value’s “data buffer”.

The Edict interpreter keeps track of nested square brackets, so:

[This is a [nested] literal]

is interpreted as a single literal with the value "This is a [nested] literal".

A "\" character appearing in the definition of a literal "escapes" the next character, allowing the interpreter to create literals containing (for instance) unbalanced square brackets:

[This is a literal containing an unbalanced \[ bracket]

When the interpreter encounters a literal, it simply places the literal value (or rather, a reference to the value, an important distinction) on the data stack.

OTOH, the interpreter does a little more processing on anything that’s not a literal. (There are other kinds of values that are not literals, which I’ll discuss in a later section…)

The (Hierarchical) Dictionary

The Edict programmer can assign names to values on the stack, and subsequently refer to those values by those names. Assignment looks simply like this:

@mylabel

Assignment of a name to a value actually does several things:

  • It adds a “Listree Item” to the top frame* of the dictionary with that name, if it doesn’t already exist, and
  • adds to it a reference to the value, and
  • removes the stack’s reference to that value.

When the interpreter sees a reference to a label, that reference is replaced by a reference to the value associated with that label… In other words:

  • You can name things
  • You can then reference those things by name
  • (It’s not rocket science)

Evaluation

Edict is different from other languages in one important way. Many languages are “homoiconic”, i.e. code and data are represented using the same underlying structures. (LISP is a traditional example of a homoiconic language.) Edict is neither homoiconic nor non-homoiconic: It doesn’t have “functions” at all. It simply has an evaluation operator which can be applied to values.

A basic “function-like” thing in Edict might look like:

[1@x]

(Assign the label “x” to the value “1”.)

Note that it is just a literal.

To invoke it, the evaluation operator is used:

[1@x]!

The outcome of this is exactly the same as if the interpreter had just directly read the following:

1@x

All the evaluation operator does is feed the contents of the value on top of the stack to the interpreter.

Now recall that labels can be assigned to values, that invoking a label pushes its value onto the stack, and that the evaluation operator feeds the contents of the top of the stack back into the interpreter:

[1@x]@f
f!

This little sequence does the following:

  • Pushes the literal value “[1@x]” to the stack
  • Applies the label “f” to the TOS value (removing it from the stack in the process)
  • Recalls the value labeled “f” to the stack
  • Evaluates it, which:
    • Pushes the literal value “1” to the stack
    • Assigns the label “x” to the TOS value (removing it in the process)

Library Import/Reflection

Edict can import C library types and global variable/function information via the debugging section (DWARF) of a library, if it’s available. The DWARF information is processed and stored in the dictionary, and the VM can “understand” this information and present it within the Edict interpreter using the same simple “native” syntax that operates on literal values.

  • Instances of a type can be allocated (and placed on the stack, of course) by simply “evaluating” the name of the type
    • int! @x
  • Global variables can be “referenced” by their name (and their value will be placed on the stack) just like any other labeled data
    • (assuming “int MyCGlobalInt=0;”)
    • MyCGlobalInt @y
  • Global functions can be “evaluated” just like a literal value, using the “!” operator; arguments will be pulled from the stack, and return values will be pushed to the stack.
    • (Assuming “int Multiply(int x, int y) { return x*y; }”)
    • x y Multiply!

(WIP BELOW…)

  • REF: Reference
  • -REF: Reference (tail)
  • @: Assignment to TOS