Stoneknifeforth
a tiny self-hosted Forth implementation
Install / Use
/learn @kragen/StoneknifeforthREADME
StoneKnifeForth
This is StoneKnifeForth, a very simple language inspired by Forth. It is not expected to be useful; instead, its purpose is to show how simple a compiler can be. The compiler is a bit under two pages of code when the comments are removed.
This package includes a “metacircular compiler” which is written in StoneKnifeForth and compiles StoneKnifeForth to an x86 Linux ELF executable.
There is also a StoneKnifeForth interpreter written in Python (tested with Python 2.4). It seems to be about 100× slower than code emitted by the compiler.
(All of the measurements below may be a bit out of date.)
On my 700MHz laptop, measuring wall-clock time:
- compiling the compiler, using the compiler running in the interpreter, takes about 10 seconds;
- compiling the compiler, using the compiler compiled with the compiler, takes about 0.1 seconds;
- compiling a version of the compiler from which comments and extra whitespace have been “trimmed”, using the compiler compiled with the compiler, takes about 0.02 seconds.
So this is a programming language implementation that can recompile
itself from source twice per 24fps movie frame. The entire “trimmed”
source code is 1902 bytes, which is less than half the size of the
nearest comparable project that I’m aware of, otccelf, which is 4748
bytes.
As demonstrated by the interpreter written in Python, the programming language itself is essentially machine-independent, with very few x86 quirks:
- items on the stack are 32 bits in size;
- arithmetic is 32-bit;
- stack items stored in memory not only take up 4 bytes, but they are little-endian.
(It would be fairly easy to make a tiny “compiler” if the source language were, say, x86 machine code.)
The output executable is 4063 bytes, containing about 1400
instructions. valgrind reports that it takes 1,813,395 instructions
to compile itself. (So you would think that it could compile itself
in 2.6 ms. The long runtimes are a result of reading its input one
byte at at time.)
Why? To Know What I’m Doing
A year and a half ago, I wrote a metacircular Bicicleta-language interpreter, and I said:
Alan Kay frequently expresses enthusiasm over the metacircular Lisp interpreter in the Lisp 1.5 Programmer’s Manual. For example, in http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=273&page=4 he writes:
Yes, that was the big revelation to me when I was in graduate school — when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were “Maxwell’s Equations of Software!” This is the whole world of programming in a few lines that I can put my hand over. I realized that anytime I want to know what I’m doing, I can just write down the kernel of this thing in a half page and it’s not going to lose any power. In fact, it’s going to gain power by being able to reenter itself much more readily than most systems done the other way can possibly do.
But if you try to implement a Lisp interpreter in a low-level language by translating that metacircular interpreter into it (as I did later that year) you run into a problem. The metacircular interpreter glosses over a large number of things that turn out to be nontrivial to implement — indeed, about half of the code is devoted to things outside of the scope of the metacircular interpreter. Here’s a list of issues that the Lisp 1.5 metacircular interpreter neglects, some semantic and some merely implementational:
- memory management
- argument evaluation order
- laziness vs. strictness
- other aspects of control flow
- representation and comparison of atoms
- representation of pairs
- parsing
- type checking and type testing (atom)
- recursive function call and return
- tail-calls
- lexical vs. dynamic scoping
In John C. Reynolds’s paper, “Definitional Interpreters Revisited”, Higher-Order and Symbolic Computation, 11, 355–361 (1998), he says:
In the fourth and third paragraphs before the end of Section 5, I should have emphasized the fact that a metacircular interpreter is not really a definition, since it is trivial when the defining language is understood, and otherwise it is ambiguous. In particular, Interpreters I and II say nothing about order of application, while Interpreters I and III say little about higher-order functions. Jim Morris put the matter more strongly:
The activity of defining features in terms of themselves is highly suspect, especially when they are as subtle as functional objects. It is a fad that should be debunked, in my opinion. A real significance of [a self-defined] interpreter . . . is that it displays a simple universal function for the language in question.
On the other hand, I clearly remember that John McCarthy’s definition of LISP [1DI], which is a definitional interpreter in the style of II, was a great help when I first learned that language. But it was not the sole support of my understanding.
(For what it’s worth, it may not even the case that self-defined interpreters are necessarily Turing-complete; it might be possible to write a non-Turing-complete metacircular interpreter for a non-Turing-complete language, such as David Turner’s Total Functional Programming systems.)
A metacircular compiler forces you to confront this extra complexity. Moreover, metacircular compilers are self-sustaining in a way that interpreters aren’t; once you have the compiler running, you are free to add features to the language it supports, then take advantage of those features in the compiler.
So this is a “stone knife” programming tool: bootstrapped out of almost nothing as quickly as possible.
Why? To Develop a Compiler Incrementally
When I wrote Ur-Scheme, my thought was to see if I could figure out how to develop a compiler incrementally, starting by building a small working metacircular compiler in less than a day, and adding features from there. I pretty much failed; it took me two and a half weeks to get it to compile itself successfully.
Part of the problem is that a minimal subset of R5RS Scheme powerful
enough to write a compiler in — without making the compiler even
larger due to writing it in a low-level language — is still a
relatively large language. Ur-Scheme doesn’t have much arithmetic,
but it does have integers, dynamic typing, closures, characters,
strings, lists, recursion, booleans, variadic functions, let to
introduce local variables, character and string literals, a sort of
crude macro system, five different conditional forms (if, cond, case,
and, or), quotation, tail-call optimization, function argument count
verification, bounds checking, symbols, buffered input to keep it from
taking multiple seconds to compile itself, and a library of functions
for processing lists, strings, and characters. And each of those
things was added because it was necessary to get the compiler to be
able to compile itself. The end result was that the compiler is 90
kilobytes of source code, about 1600 lines if you leave out the
comments.
Now, maybe you can write 1600 lines of working Scheme in a day, but I
sure as hell can’t. It’s still not a very large compiler, as
compilers go, but it’s a lot bigger than otccelf. So I hypothesized
that maybe a simpler language, without a requirement for compatibility
with something else, would enable me to get a compiler bootstrapped
more easily.
So StoneKnifeForth was born. It’s inspired by Forth, so it inherits most of Forth’s traditional simplifications:
- no expressions;
- no statements;
- no types, dynamic or static;
- no floating-point (although Ur-Scheme doesn’t have floating-point either);
- no scoping, lexical or dynamic;
- no dynamic memory allocation;
And it added a few of its own:
- no names of more than one byte;
- no user-defined IMMEDIATE words (the Forth equivalent of Lisp macros);
- no interpretation state, so no compile-time evaluation at all;
- no interactive REPL;
- no
doloops; - no
elseonifstatements; - no recursion;
- no access to the return stack;
- no access to the filesystem, just stdin and stdout;
- no multithreading.
Surprisingly, the language that results is still almost bearable to write a compiler in, although it definitely has the flavor of an assembler.
Unfortunately, I still totally failed to get it done in a day. It was
15 days from when I first started scribbling about it in my notebook
until it was able to compile itself successfully, although git only
shows active development happening on six of those days (including
some help from my friend Aristotle). So that’s an improvement, but
not as much of an improvement as I would like. At that point, it was
13k of source, 114 non-comment lines of code, which is definitely a
lot smaller than Ur-Scheme’s 90k and 1600 lines. (Although there are
another 181 lines of Python for the bootstrap interpreter.)
It’s possible to imagine writing and debugging 114 lines of code in a day, or even 300 lines. It’s still maybe a bit optimistic to think I could do that in a day, so maybe I need to find a way to increase incrementality further.
My theory was that once I had a working compiler, I could add features to the language incrementally and test them as I went. So far I haven’t gotten to that part.
Why? Wirth envy
[Michael Franz writes][3]:
In order to find the optimal cost/benefit ratio, Wirth used a highly intuitive metric, the origin of which is unknown to me but that may very well be Wirth’s own invention. He used th
Related Skills
node-connect
349.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
109.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
349.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
349.0kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
