Third
Third, a small Forth compiler for 8086 DOS
Install / Use
/learn @benhoyt/ThirdREADME
Third, a small Forth compiler for 8086 DOS
Historical note: I wrote this in 1998 (when I was 16), so I'm just putting it up here as a backup and for "historical interest". It's still a pretty good show-case for how simple a self-hosting Forth compiler can be.
I wrote it more or less all myself (though parts of the assembler were pretty much copied, I think :-), but it's loosely based on eForth and the Forth compilers my dad wrote.
It still works on current 32-bit Windows systems, but 64-bit Windows systems don't run 16-bit DOS programs anymore, so for those you'll need to use DOSBox or something similar.
All the documentation that follows is original.
Quick start
To jump straight into using Third, execute the batch file MBIG MBER MQUICK in that order. Then you can run Big to use Third, or Ber to use Third with Berwyn Hoyt's extensions. Type BYE once you're in Third to exit back to DOS. To load a Forth file (source code) type "INCLUDE filename.f".
General information
Third was designed and coded by Benjamin Hoyt, with complements to the following people:
Charles Moore inventing Forth
Bruce Hoyt coding FE (Forth Editor)
Berwyn Hoyt bug reports and help to make Third clean
Bill Muench eForth
Wonyong Koh hForth
Tom Zimmer F-PC
Anton Ertl gForth
ANS committee creating the ANS Standard
Third tries hard, most of the time, to be a reasonable 16 bit Forth compiler for DOS PCs. It was originally coded to be a very small Forth as a learning tool for others, but this idea vanished when Berwyn and I decided to use it for a cross compiler for the Motorola 68HC12 CPU. It is now a somewhat stable ANS Forth system with various features.
Excepting ENVIRONMENT? Third is a full CORE system just with the bare Kernel.COM. With Big.F loaded (to make Big.COM) Third is a ANS system with the following wordsets and their extensions implemented:
CORE DOUBLE EXCEPTION FACILITY FILE TOOLS SEARCH STRING
Missing are BLOCK FLOATING LOCALS MEMORY. Even though the BLOCK wordset is supposed to be implemented if the FILE wordset is, I can't see much use for it at the moment. When I need it, I'll code it. FLOATING will probably never come into the picture. I've never needed or liked LOCALS (to say the least). MEMORY words seem a bit pointless for a small 16 bit Forth with only a 64k address space.
Third also has a clean metacompiler (thanks to Bill Muench's eForth metacompiler), an 8086 assembler, a nice DOS shell routine, a simple decompiler, a fairly nice Quick reference help, and reasonable docs.
Third dictionary structure
Third's dictionary is a structure of names and other information chained together in a linked list. Each "record" in the linked structure contains information about a Forth "word" (don't confuse with the CPU-word type meaning). Each record in the linked list has a pointer to the previously defined word, whose link points to the word defined before that, whose link points to ... and so it goes till the link contains a zero (end of linked list).
A wordlist is a list of words chained together in the above way.
Each word in Third's dictionary is composed of three items: the link to the previously defined word, a count and flags byte, and a name for use by the Forth dictionary searcher. Directly following the name string in memory is the machine code for that word.
The link is simply a 16 bit "cell" (a cell is the basic unit of information in any Forth system) which contains a pointer, or zero.
The count and flags byte is divided into two sections. The five low bits is the count: the number of characters in the following Forth name, which can be a maximum of 31, for obvious reasons. The three high bits in this byte are "flags", of which only the highest bit (bit 7 of the byte) is presently used, for the IMMEDIATE flag.
The Forth virtual machine
This virtual machine is sort of like a very fast emulated CPU. Its instructions are the CODEd words. NEXT is the instruction processor. SI points to the current "instruction", and acts like a CPU instruction pointer. Each instruction is a pointer to 80x86 code to execute this instruction, so to execute the instruction we simply load a 16 bit word from SI and jump to this address.
Like all Forths, Third has two stacks: the data stack and the return stack. It keeps function call return addresses (and at times, various other stuff) on the return stack. The return stack is addressed using BP as a stack pointer. The data stack, referenced by SP using PUSH and POP, holds all function parameters and return values. Yes, with this lovely setup you can more than one return value (unlike some languages we know of :-).
The Third virtual machine uses 80x86 CPU registers as follows:
registers use
-------------------
AX scratch, used for LODSW in NEXT
BX scratch, used for referencing memory and the like
CX scratch, used for counting and loops
DX scratch, used for division and multiplication
DI scratch, used for string instructions
SI the virtual machine instruction pointer
BP the virtual machine return stack pointer
SP the virtual machine data stack pointer
CS 80x86 code segment register (never changes)
DS data segment, must stay the same as CS
SS stack segment, must stay the same as CS
ES scratch segment register
flags direction flag is expected to always be clear (CLD)
Memory usage
Third uses one full 80x86 64k segment which is divides up as follows:
base limit description
---------------------------
0000 0100 256 byte DOS program segment prefix (PSP)
0100 HERE Forth dictionary: code, name, and data spaces
HERE FDF0 free space (used for PAD buffers and MALLOC)
FDF0 FE00 8 element search order (CONTEXT) stack
FE00 FF00 128 cell data stack (SP stack)
FF00 FFFE 127 cell return stack (BP stack)
FFFE 10000 two bytes unused memory :-)
Stack diagrams
Stack diagrams or pictures are expressed with the incoming parameters on the left of the "--" and outgoing return values on the right. The item on the top of the stack (abbreviated TOS) is always the left most item in the stack picture.
A '|' (vertical bar) character in a stack picture means "or this" and is used when a word has varying parameters or return values. A ".." (two dots) sequence means "many items could be here".
Possible double quotes before the "--" tells what (if any) input buffer parsing the word does. Parsed text abbreviations are given in the following table:
abbreviation description
---------------------------
<char> a delimiting character marking end of string being parsed
<chars> zero or more occurences of the character char
<space> a delimiting space character (or any whitespace character)
<spaces> zero or more consecutive spaces (or any whitespace)
<quote> a delimiting double quote '"'
<paren> a delimiting right parenthesis ')'
<eol> an implied delimiter marking the end of a line
ccc a parsed sequence of arbitrary chars excluding any delimiter
name a token bounded by space, same as ccc<space> or ccc<eol>
Some examples of stack diagrams follow:
word stack picture
-----------------------------
MY-WORD stack-before "parsing" -- stack-after
ROT x1 x2 x3 -- x2 x3 x1
?DUP x -- 0 | x x
PARSE char "ccc<char>" -- c-addr u
RESTORE-INPUT xn .. x1 n -- flag
The meaning of the various stack picture data types are as follows:
symbol data type # of cells on stack
---------------------------------------------------------------------------
flag boolean flag 1
true true flag 1
false false flag 1
xt execution token 1
addr address 1
a-addr cell aligned address 1
c-addr character aligned address 1
ofs offset into segment (Third specific) 1
seg segment pointer (Third specific) 1
ior I/O result 1
fam file access method 1
fileid file identifier 1
wid wordlist identifier 1
char character 1
n signed single-cell integer 1
+n non-negative single-cell integer 1
u unsigned single-cell integer 1
n|u signed or unsigned single-cell integer 1
x unspecified cell 1
d signed double-cell integer 2
+d non-negative double-cell integer 2
ud unsigned double-cell integer 2
d|ud signed or unsigned double-cell integer 2
xd unspecified cell pair 2
colon-sys colon definition compilation implementation defined
do-sys DO...LOOP structures impl
