SkillAgentSearch skills...

Third

Third, a small Forth compiler for 8086 DOS

Install / Use

/learn @benhoyt/Third
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Third, a small Forth compiler for 8086 DOS

Historical note: I wrote this in 1998 (when I was 16), so I'm just putting it up here as a backup and for "historical interest". It's still a pretty good show-case for how simple a self-hosting Forth compiler can be.

I wrote it more or less all myself (though parts of the assembler were pretty much copied, I think :-), but it's loosely based on eForth and the Forth compilers my dad wrote.

It still works on current 32-bit Windows systems, but 64-bit Windows systems don't run 16-bit DOS programs anymore, so for those you'll need to use DOSBox or something similar.

All the documentation that follows is original.

Quick start

To jump straight into using Third, execute the batch file MBIG MBER MQUICK in that order. Then you can run Big to use Third, or Ber to use Third with Berwyn Hoyt's extensions. Type BYE once you're in Third to exit back to DOS. To load a Forth file (source code) type "INCLUDE filename.f".

General information

Third was designed and coded by Benjamin Hoyt, with complements to the following people:

Charles Moore           inventing Forth
Bruce Hoyt              coding FE (Forth Editor)
Berwyn Hoyt             bug reports and help to make Third clean
Bill Muench             eForth
Wonyong Koh             hForth
Tom Zimmer              F-PC
Anton Ertl              gForth
ANS committee           creating the ANS Standard

Third tries hard, most of the time, to be a reasonable 16 bit Forth compiler for DOS PCs. It was originally coded to be a very small Forth as a learning tool for others, but this idea vanished when Berwyn and I decided to use it for a cross compiler for the Motorola 68HC12 CPU. It is now a somewhat stable ANS Forth system with various features.

Excepting ENVIRONMENT? Third is a full CORE system just with the bare Kernel.COM. With Big.F loaded (to make Big.COM) Third is a ANS system with the following wordsets and their extensions implemented:

CORE DOUBLE EXCEPTION FACILITY FILE TOOLS SEARCH STRING

Missing are BLOCK FLOATING LOCALS MEMORY. Even though the BLOCK wordset is supposed to be implemented if the FILE wordset is, I can't see much use for it at the moment. When I need it, I'll code it. FLOATING will probably never come into the picture. I've never needed or liked LOCALS (to say the least). MEMORY words seem a bit pointless for a small 16 bit Forth with only a 64k address space.

Third also has a clean metacompiler (thanks to Bill Muench's eForth metacompiler), an 8086 assembler, a nice DOS shell routine, a simple decompiler, a fairly nice Quick reference help, and reasonable docs.

Third dictionary structure

Third's dictionary is a structure of names and other information chained together in a linked list. Each "record" in the linked structure contains information about a Forth "word" (don't confuse with the CPU-word type meaning). Each record in the linked list has a pointer to the previously defined word, whose link points to the word defined before that, whose link points to ... and so it goes till the link contains a zero (end of linked list).

A wordlist is a list of words chained together in the above way.

Each word in Third's dictionary is composed of three items: the link to the previously defined word, a count and flags byte, and a name for use by the Forth dictionary searcher. Directly following the name string in memory is the machine code for that word.

The link is simply a 16 bit "cell" (a cell is the basic unit of information in any Forth system) which contains a pointer, or zero.

The count and flags byte is divided into two sections. The five low bits is the count: the number of characters in the following Forth name, which can be a maximum of 31, for obvious reasons. The three high bits in this byte are "flags", of which only the highest bit (bit 7 of the byte) is presently used, for the IMMEDIATE flag.

The Forth virtual machine

This virtual machine is sort of like a very fast emulated CPU. Its instructions are the CODEd words. NEXT is the instruction processor. SI points to the current "instruction", and acts like a CPU instruction pointer. Each instruction is a pointer to 80x86 code to execute this instruction, so to execute the instruction we simply load a 16 bit word from SI and jump to this address.

Like all Forths, Third has two stacks: the data stack and the return stack. It keeps function call return addresses (and at times, various other stuff) on the return stack. The return stack is addressed using BP as a stack pointer. The data stack, referenced by SP using PUSH and POP, holds all function parameters and return values. Yes, with this lovely setup you can more than one return value (unlike some languages we know of :-).

The Third virtual machine uses 80x86 CPU registers as follows:

registers       use
-------------------
AX              scratch, used for LODSW in NEXT
BX              scratch, used for referencing memory and the like
CX              scratch, used for counting and loops
DX              scratch, used for division and multiplication
DI              scratch, used for string instructions
SI              the virtual machine instruction pointer
BP              the virtual machine return stack pointer
SP              the virtual machine data stack pointer
CS              80x86 code segment register (never changes)
DS              data segment, must stay the same as CS
SS              stack segment, must stay the same as CS
ES              scratch segment register
flags           direction flag is expected to always be clear (CLD)

Memory usage

Third uses one full 80x86 64k segment which is divides up as follows:

base    limit   description
---------------------------
0000    0100    256 byte DOS program segment prefix (PSP)
0100    HERE    Forth dictionary: code, name, and data spaces
HERE    FDF0    free space (used for PAD buffers and MALLOC)
FDF0    FE00    8 element search order (CONTEXT) stack
FE00    FF00    128 cell data stack (SP stack)
FF00    FFFE    127 cell return stack (BP stack)
FFFE    10000   two bytes unused memory :-)

Stack diagrams

Stack diagrams or pictures are expressed with the incoming parameters on the left of the "--" and outgoing return values on the right. The item on the top of the stack (abbreviated TOS) is always the left most item in the stack picture.

A '|' (vertical bar) character in a stack picture means "or this" and is used when a word has varying parameters or return values. A ".." (two dots) sequence means "many items could be here".

Possible double quotes before the "--" tells what (if any) input buffer parsing the word does. Parsed text abbreviations are given in the following table:

abbreviation    description
---------------------------
<char>          a delimiting character marking end of string being parsed
<chars>         zero or more occurences of the character char
<space>         a delimiting space character (or any whitespace character)
<spaces>        zero or more consecutive spaces (or any whitespace)
<quote>         a delimiting double quote '"'
<paren>         a delimiting right parenthesis ')'
<eol>           an implied delimiter marking the end of a line
ccc             a parsed sequence of arbitrary chars excluding any delimiter
name            a token bounded by space, same as ccc<space> or ccc<eol>

Some examples of stack diagrams follow:

word            stack picture
-----------------------------
MY-WORD         stack-before "parsing" -- stack-after
ROT             x1 x2 x3 -- x2 x3 x1
?DUP            x -- 0 | x x
PARSE           char "ccc<char>" -- c-addr u
RESTORE-INPUT   xn .. x1 n -- flag

The meaning of the various stack picture data types are as follows:

symbol          data type                               # of cells on stack
---------------------------------------------------------------------------
flag            boolean flag                            1
true            true flag                               1
false           false flag                              1
xt              execution token                         1
addr            address                                 1
a-addr          cell aligned address                    1
c-addr          character aligned address               1
ofs             offset into segment (Third specific)    1
seg             segment pointer (Third specific)        1
ior             I/O result                              1
fam             file access method                      1
fileid          file identifier                         1
wid             wordlist identifier                     1
char            character                               1
n               signed single-cell integer              1
+n              non-negative single-cell integer        1
u               unsigned single-cell integer            1
n|u             signed or unsigned single-cell integer  1
x               unspecified cell                        1
d               signed double-cell integer              2
+d              non-negative double-cell integer        2
ud              unsigned double-cell integer            2
d|ud            signed or unsigned double-cell integer  2
xd              unspecified cell pair                   2
colon-sys       colon definition compilation            implementation defined
do-sys          DO...LOOP structures                    impl
View on GitHub
GitHub Stars84
CategoryDevelopment
Updated1mo ago
Forks7

Languages

Forth

Security Score

100/100

Audited on Mar 4, 2026

No findings