Phorth
A small forth-like language that targets the CPython VM.
``phorth``
==========
``phorth`` is a bootstrapped forth-like language where instead of writing our
primitive words in some assembler, we have chosen to write them in a mix of
CPython bytecode and C++\*. By using a superset of CPython bytecode, we may
interpret our ``phorth`` programs with an unmodified version of CPython using
the same interpreter loop that handles normal Python code objects.
\* C++ is used where the VM is too restrictive, mainly to handle dynamic jumps,
for which CPython has no opcode.
Purpose
-------
``phorth`` exists because I wanted to attempt to use the CPython virtual machine
for a language that was not Python. Forth seemed like a decent candidate because
of the stack-based nature of the VM; however, the CPython machine's model is
radically different, which led to a lot of fun hacks later.
Model
-----
``phorth`` is designed to run inside of a single self-modifying CPython code
object. The CPython data stack is not global; instead, a new data stack is
created for each code object that is being executed. This means that there is no
simple way to write words like ``nip`` or ``over`` as Python functions because
they just do stack manipulation. To get around this problem, I decided to make a
``phorth`` 'context' a single code object. This means that ``phorth`` gets a
single data stack to be managed by the interpreter. I can also take advantage of
all of the CPython instructions that work on the stack. This means that ``nip``
can be defined in terms of CPython instructions as ``ROT_TWO``, ``POP_TOP``.
Some words can even be implemented as single CPython instructions, for example:
``drop`` is just ``POP_TOP``!
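
The stack effects described above can be sketched in plain Python. This is only
a model, assuming a list stands in for the CPython data stack with the top of
the stack at the end of the list; the helper names ``rot_two`` and ``pop_top``
are illustrative and are not taken from the ``phorth`` sources.

.. code-block:: python

    def rot_two(stack):
        """Model of ROT_TWO: swap the two topmost items."""
        stack[-1], stack[-2] = stack[-2], stack[-1]


    def pop_top(stack):
        """Model of POP_TOP: discard the top item."""
        stack.pop()


    def nip(stack):
        """( a b -- b ): ROT_TWO followed by POP_TOP."""
        rot_two(stack)
        pop_top(stack)


    def drop(stack):
        """( a -- ): a single POP_TOP."""
        pop_top(stack)


    stack = [1, 2, 3]
    nip(stack)    # stack is now [1, 3]
    drop(stack)   # stack is now [1]

In the real context no helper functions are involved: the instructions act
directly on the one data stack that belongs to the context's code object.
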
Because I am not using CPython's control stack, the ``phorth`` control (return)
stack is implemented as a list stored as a local variable of the frame. The
local is accessed through two functions, ``push_return_addr`` and
``pop_return_addr``, which may only be called from a ``phorth`` context. These
functions inspect the calling stack frame and manipulate the values as needed.
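
A rough Python sketch of the frame-inspection idea follows. It assumes the
control stack lives in a local named ``cstack`` (a hypothetical name chosen for
this example) and uses ``sys._getframe`` to reach the caller's frame; the real
functions may well be implemented differently.

.. code-block:: python

    import sys


    def _caller_cstack():
        # Frame 0 is this helper, frame 1 is push/pop, frame 2 is their caller.
        frame_locals = sys._getframe(2).f_locals
        if "cstack" not in frame_locals:
            raise RuntimeError("must be called from a context with a cstack")
        return frame_locals["cstack"]


    def push_return_addr(addr):
        """Push an address onto the caller's control stack."""
        _caller_cstack().append(addr)


    def pop_return_addr():
        """Pop and return the top address of the caller's control stack."""
        return _caller_cstack().pop()


    def context():
        cstack = []               # hypothetical name for the control stack local
        push_return_addr(10)
        push_return_addr(20)
        top = pop_return_addr()
        return top, cstack


    result = context()            # (20, [10])

This works because the list is mutated in place: the helpers never rebind the
local, they only reach the same list object through the caller's frame.
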
Hacks
-----
Some terrible things needed to happen to make this work. The whole project is hacks that are threaded together to do something cute, but below I will list some of my favorites.
Mutation of Code Objects In Place During Runtime
Probably the most terrible of all of the hacks is that I needed to be able to
mutate the code object as it was running. This is because I treat the
``co_code``, which is a ``bytes`` object, as a large mutable memory segment
which acts as the ``phorth`` context's addressable memory space. In Python there
is no way to mutate ``bytes``, which is good because they are supposed to be
immutable; (un)fortunately there are no such restrictions in the CPython C API.
This allows us to define new words at runtime which is critical for a
Forth. Forth is very deeply tied to the repl and interactive experience so we
needed some way to define words on the fly.
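
To give a feel for what mutating ``bytes`` through the C API looks like, here is
a small sketch that reaches the same API from Python via ``ctypes`` rather than
C++. It is CPython-only and deliberately unsafe; the ``poke`` helper is a
hypothetical name, and it is applied to a freshly built ``bytes`` object, not to
a live ``co_code``.

.. code-block:: python

    import ctypes

    # PyBytes_AsString returns a pointer to the object's internal buffer.
    PyBytes_AsString = ctypes.pythonapi.PyBytes_AsString
    PyBytes_AsString.argtypes = [ctypes.py_object]
    PyBytes_AsString.restype = ctypes.c_void_p


    def poke(memory, addr, value):
        """Overwrite one byte of a bytes object in place (hypothetical helper)."""
        if not 0 <= addr < len(memory):
            raise IndexError(addr)
        base = PyBytes_AsString(memory)
        ctypes.memset(base + addr, value, 1)


    # Build a fresh object so no shared or cached bytes value is modified.
    memory = bytes(bytearray(8))
    poke(memory, 3, 0x2A)         # memory[3] is now 42

Anything that relies on ``bytes`` being immutable (hashing, constant sharing)
can misbehave after a write like this, which is part of why it counts as a hack.
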
Computed Jumps in the CPython VM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CPython has no need for a computed jump instruction so the VM does not have
one. This is totally reasonable for Python because, obviously, Python can be
compiled in such a way that a computed goto is not needed. In ``phorth``, I
don't have as much static information so I needed to be able to jump to an
address in the code object dynamically. To implement this the ``phorth`` context
is actually a generator. This means that the code object has the
``CO_GENERATOR`` flag set and uses ``YIELD_VALUE`` instructions to pause control
flow. The ``phorth`` context yields ``int``\* objects which are one less than
the address to jump to. The reason it is one less than the address to jump to is
that it really works by setting ``context.gi_frame.f_lasti`` (in C++ again, this
is not mutable in Python) to the value yielded. This means that when execution
resumes, it will resume at the new location. I decided to yield the ``lasti``
instead of the actual target because in most places the location is computed as
an offset from the current instruction pointer meaning I could subtract one from
the offset there and save another subtraction in the runner. I didn't want to
have all of my jump targets start with a ``POP_TOP`` so using the normal
generator ``next`` code would not work. The runner reimplements most of the
``next`` function with special handling to manage the ``lasti`` assignment and
reenter the code without pushing any values onto the stack.
\* There is a special case of yielding ``None`` which means resume execution on
the next instruction. For control flow this is a ``NOP``; however, yielding
causes the state of the ``gi_frame`` object to get synced with the internal
state of ``PyEval_EvalFrameEx``. This is needed to access the ``f_stacktop``
inside some of the primitive ``phorth`` words defined in C++.
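
The yield-and-resume protocol can be imitated with a toy interpreter. The sketch
below is not the ``phorth`` runner: it assumes a made-up instruction set
(``push``, ``emit``, ``yield``, ``halt``) with one instruction per address, and
only shows how assigning ``lasti`` redirects where execution resumes.

.. code-block:: python

    def run(program):
        """Toy runner; ``lasti`` is the index of the last executed instruction."""
        stack, output = [], []
        lasti = -1
        while True:
            lasti += 1                      # resume at the instruction after lasti
            op, arg = program[lasti]
            if op == "push":
                stack.append(arg)
            elif op == "emit":
                output.append(stack.pop())
            elif op == "yield":
                target = stack.pop()
                if target is not None:      # None: just carry on with the next one
                    lasti = target          # yielded value is (jump address - 1)
            elif op == "halt":
                return output


    program = [
        ("push", 4),          # 0: want to jump to address 5, so yield 5 - 1
        ("yield", None),      # 1
        ("push", "skipped"),  # 2: never executed
        ("emit", None),       # 3: never executed
        ("halt", None),       # 4
        ("push", "jumped"),   # 5
        ("emit", None),       # 6
        ("push", None),       # 7: yielding None is a no-op for control flow
        ("yield", None),      # 8
        ("push", 3),          # 9: jump to address 4 (halt)
        ("yield", None),      # 10
    ]

    result = run(program)     # ["jumped"]

Because the runner adds one before fetching, a jump site that already computes
an offset can fold the subtraction into that computation, as described above.
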
Direct Threaded Code in CPython Bytecode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Direct threaded code is a model where functions are laid out as a list of
addresses of other functions, starting with a call to some machinery that starts
executing the thread. All functions need to end in some ``next`` function that
will jump to the next function in the thread.
There is a little more complexity to the computed jump machinery described
above. The context may yield a negative number which says, "dereference the
absolute value yielded and jump there, also decrement the value by 2 and push it
onto the control stack". This instruction is pretty complicated but it is
designed to make the direct threaded code model simpler. The ``docol`` procedure
starts the threading by yielding the negated address of the next instruction.
This will be seen by the runner, which will dereference the value, jumping to
the function whose address appeared after the ``docol``. It will also push
``addr - 2``, or really the next word's address, onto the control stack. Each
word defined in ``phorth`` also ends in some ``exit`` procedure which pops the
top value off the control stack and throws it away, because it points to the
address after the function ends. Afterwards, it yields the top value of the
control stack like normal. This is basically using the control stack as a stack
of instruction pointers for each thread.
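
For readers new to threaded code, the general shape can be modelled in a few
lines of Python. This is a conventional textbook-style toy, not the yield-based
protocol described above: memory cells hold either a Python callable or an
integer address, and the class name ``ToyThread`` and its methods are invented
for this example.

.. code-block:: python

    class ToyThread:
        def __init__(self):
            self.data = []      # data stack
            self.rstack = []    # control (return) stack of saved thread positions
            self.ip = None      # index of the next cell in the running thread
            self.w = None       # address of the word currently being entered
            self.memory = [self.exit]   # address 0 holds the exit routine

        def primitive(self, fn):
            """Store a primitive's code in one cell and return its address."""
            self.memory.append(lambda: fn(self.data))
            return len(self.memory) - 1

        def colon(self, *addrs):
            """Lay out docol, the addresses of the body words, then exit."""
            start = len(self.memory)
            self.memory.extend([self.docol, *addrs, 0])
            return start

        def docol(self):
            self.rstack.append(self.ip)   # where the calling thread resumes
            self.ip = self.w + 1          # the thread starts right after docol

        def exit(self):
            self.ip = self.rstack.pop()   # return to the calling thread

        def run(self, addr):
            self.ip, self.w = None, addr
            self.memory[addr]()
            while self.ip is not None:    # this loop plays the role of "next"
                self.w = self.memory[self.ip]
                self.ip += 1
                self.memory[self.w]()


    vm = ToyThread()
    DUP = vm.primitive(lambda s: s.append(s[-1]))
    MUL = vm.primitive(lambda s: s.append(s.pop() * s.pop()))
    SQUARE = vm.colon(DUP, MUL)
    FOURTH = vm.colon(SQUARE, SQUARE)

    vm.data.append(3)
    vm.run(FOURTH)    # vm.data is now [81]

Here a central loop does the work of ``next``; in ``phorth`` that role is split
between the words themselves, which yield, and the runner, which performs the
jump.
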
One side effect of this is that the ``co_code`` is really a superset of the
CPython bytecode because there are a lot of bytes that are not valid
instructions. This means that ``dis`` of the ``phorth`` context will often fail
once some words are defined.
Defined Words
-------------
Out of the box ``phorth`` comes with many words defined. The names are mostly taken
from forth with many omitted and some added. This list is nowhere near the list
of words required to be a compliant forth, but ``phorth`` is not aiming for
that. Like Python, words that start with ``_`` are pseudo private, or meant for
debugging. This includes ``_dis`` which prints the output of ``dis`` on the
``phorth`` context and ``_cstack`` which prints the control (return) stack.
Words starting with ``py::`` are meant to help interface with the CPython
virtual machine. For example, ``py::getattr`` pops a string and an object from
the stack and calls ``getattr``.
.. code-block::
> words
[<Word '!': addr=412, immediate=False>,
<Word '&': addr=585, immediate=False>,
<Word "'": addr=327, immediate=False>,
<Word '(': addr=813, immediate=True>,
<Word '*': addr=637, immediate=False>,
<Word '+': addr=541, immediate=False>,
<Word ',': addr=218, immediate=False>,
<Word '-': addr=559, immediate=False>,
<Word '-rot': addr=1133, immediate=False>,
<Word '.': addr=509, immediate=False>,
<Word '.s': addr=458, immediate=False>,
<Word '/': addr=623, immediate=False>,
<Word '/mod': addr=472, immediate=False>,
<Word '0<': addr=927, immediate=False>,
<Word '0=': addr=945, immediate=False>,
<Word '0>': addr=963, immediate=False>,
<Word '0branch': addr=448, immediate=True>,
<Word '1+': addr=981, immediate=False>,
<Word '1-': addr=999, immediate=False>,
<Word '2*': addr=1017, immediate=False>,
<Word '2+': addr=1035, immediate=False>,
<Word '2-': addr=1053, immediate=False>,
<Word '2/': addr=1071, immediate=False>,
<Word '2drop': addr=1089, immediate=False>,
<Word '2dup': addr=551, immediate=False>,
<Word ':': addr=641, immediate=False>,
<Word ';': addr=775, immediate=True>,
<Word '<': addr=535, immediate=False>,
<Word '<<': addr=563, immediate=False>,
<Word '<=': addr=545, immediate=False>,
<Word '<>': addr=529, immediate=False>,
<Word '=': addr=519, immediate=False>,
<Word '>': addr=627, immediate=False>,
<Word '>=': addr=605, immediate=False>,
<Word '>>': addr=615, immediate=False>,
<Word '>cfa': addr=188, immediate=False>,
<Word '?': addr=1105, immediate=False>,
<Word '@': addr=392, immediate=False>,
<Word '[': addr=309, immediate=False>,
<Word ']': addr=318, immediate=True>,
<Word '^': addr=595, immediate=False>,
<Word '_cstack': addr=599, immediate=False>,
<Word '_dis': addr=261, immediate=False>,
<Word 'b!': addr=423, immediate=False>,
<Word 'b,': addr=231, immediate=False>,
<Word 'b@': addr=402, immediate=False>,
<Word 'branch': addr=440, immediate=True>,
<Word 'bye': addr=482, immediate=False>,
<Word 'create': addr=296, immediate=False>,
<Word 'drop': addr=493, immediate=False>,
<Word 'dup': addr=505, immediate=False>,
<Word 'exit': addr=765, immediate=False>,
<Word 'false': addr=579, immediate=False>,
<Word 'find': addr=244, immediate=False>,
<Word 'here': addr=573, immediate=False>,
<Word 'immediate': addr=801, immediate=False>,
<Word 'latest': addr=513, immediate=False>,
<Word 'matmul': addr=555, immediate=False>,
<Word 'mod': addr=633, immediate=False>,
<Word 'nip': addr=488, immediate=False>,
<Word 'none': addr=567, immediate=False>,
<Word 'noop': addr=1121, im
