Fcc
Forth interpreter and compiler - a standard, portable, optimized Forth
Forth Compiler
FCC is an interpreter and compiler for ANS Forth. It has a portable C implementation, and a bootstrapping Forth library that runs on top of it.
Goals
- Fast - ideally competitive with Gforth.
- On x86_64, beats Gforth on most benchmarks.
- On M1 Macs (ARM64), trails Gforth by 4-6x.
- Portable to many platforms both rich and embedded.
- It should be portable to most things with a C compiler, and translating the C to assembly by hand is tractable. Most of the heavy lifting is in Forth.
- The library is (barring bugs) flexible regarding the sizes of cells, chars and address units. It does assume cells are big enough for a pointer, so it probably won't work on 8-bit machines.
- Standard - follow the ANS Forth standard.
- Currently the Forth 2012 version; see below for compliance details.
- Interoperable with C libraries, similar to Gforth's technique.
- One of my projects is a Gameboy emulator; that requires SDL.
- No progress on this front so far.
- Unencumbered by the GPL. Gforth being licensed under the GPL and the
bootstrapping nature of Forth conspire to make it tricky, and sometimes
impossible, to release a Forth binary built with Gforth without including its
full source.
- For some of my projects, that is an unacceptable restriction.
- FCC is released under the Apache License 2.0, and therefore does not limit your ability to release FCC, or binaries built with it.
Current state
The portable C version is fairly complete. (Standards compliance details below.)
It has been tested on Linux on arm32 and x86_64, built using gcc. Your
mileage may vary on other platforms, processors or compilers.
DCPU
One of my side projects is writing a Forth OS for the
DCPU-16,
the fake 16-bit CPU invented by Notch for the now-defunct game 0x10c. A
partial attempt to assemble this Forth for it is in dcpu/.
Standards Compliance
This section details the compliance of FCC with the Forth 2012 standard.
FCC is a Forth-2012 System.
All standard CORE words are implemented.
Core Extensions
Providing .(, .R, 0<>, 0>, 2>R, 2R>, 2R@, :NONAME, <>, ?DO,
ACTION-OF, AGAIN, BUFFER:, C", CASE, COMPILE,, DEFER, DEFER!,
DEFER@, ENDCASE, ENDOF, ERASE, FALSE, HEX, HOLDS, IS, MARKER,
NIP, OF, PAD, PARSE, PARSE-NAME, PICK, REFILL, RESTORE-INPUT,
ROLL, SAVE-INPUT, SOURCE-ID, TO, TRUE, TUCK, U.R, U>, UNUSED,
VALUE, WITHIN, \ from the Core Extensions word set.
(Everything but S\" and [COMPILE].)
Tools
Providing .S, ?, DUMP, and WORDS from the Tools word set.
(Everything but SEE.)
Providing AHEAD, SYNONYM, N>R, NR>, [DEFINED], and [UNDEFINED] from
the Tools Extensions word set.
Facility
Providing +FIELD, BEGIN-STRUCTURE, CFIELD:, END-STRUCTURE, and FIELD:
from the Facility Extensions word set.
Implementation-defined Options
As required in Section 4.1.1, and appearing in the same order.
- Cell-aligned addresses are aligned to the pointer size of the host machine (eg. 32 bits on 32-bit machines, 64 bits on 64-bit machines).
- EMIT will try to print whatever you give it; the result depends on the output device, generally the terminal.
- ACCEPT uses GNU readline to support editing.
- The character set for EMIT and KEY is standard ASCII.
- All addresses are character-aligned, since characters and address units are the same size (bytes, 8 bits).
- Spaces (0x20, ' ') and tabs (0x09, '\t') are treated as spaces.
- The control-flow stack is the data stack during compilation of a definition.
- Digits larger than 35 are not converted and will be treated as the end of the number being parsed.
- ACCEPT echoes the entered text. It does not put the newline character into the input, but it does display a newline.
- ABORT is equivalent to QUIT: it clears the stacks, sets the input source to the user input device, and continues interpreting.
- Uses the GNU readline library for reading input, so line endings are abstracted. Whatever your terminal accepts, essentially.
- Counted strings have a maximum length of 255 characters.
- Parsed strings have a maximum length of 255 characters.
- Definition names have a maximum length of 255 characters.
- No limit on the length of ENVIRONMENT? queries.
- File names can be given on the command line; they will be loaded in order. Then the input device will remain the user's terminal.
- The only supported output device is the user's terminal. (It can be redirected to a file or other process, if the shell supports that.)
- Dictionary definitions take this form:
- link pointer (points to the previous definition, forming a linked list)
- metadata cell. The name length is in the low byte. 0x100 indicates hidden, 0x200 indicates an "immediate" word.
- pointer to the name
- code field
- An address unit is that of the host machine, generally an 8-bit byte.
- Number representation is that of the host machine, generally two's complement.
- Ranges of numeric types, where m is the number of bits in a pointer/cell on the host machine:
- n: -(2^(m-1)) to 2^(m-1) - 1
- +n: 0 to 2^(m-1) - 1
- u: 0 to 2^m - 1
- d: -(2^(2m-1)) to 2^(2m-1) - 1
- +d: 0 to 2^(2m-1) - 1
- ud: 0 to 2^(2m) - 1
- Writing to any part of data space is permitted. (Though changing eg. the dictionary's link pointers might result in ambiguous conditions.)
- WORD uses HERE as its buffer; therefore it is large (and very transient).
- A cell is the same size as a pointer on the host machine. Generally, an address unit is a byte, so a cell is 4 units on a 32-bit machine and 8 on a 64-bit machine.
- A character is a single address unit (generally a byte).
- The keyboard input buffer is dynamically allocated by readline, and is limited only by system RAM. (The parse buffer will truncate the input to 256 characters, however.)
- The pictured numeric string area is HERE, therefore very large (and very transient).
- PAD returns an area of 1024 address units (bytes, usually).
- FCC is case-insensitive.
- The prompt looks like " ok\n> ".
- Uses the host system's division routines. (Usually libc, which is symmetric.)
- STATE is either 0 (interpreting) or 1 (compiling).
- Arithmetic overflow is that of the host system. Generally, wrapping around and not throwing exceptions.
- After a DOES>, the current definition is hidden.
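To make the dictionary layout from the list above concrete, here is a minimal C sketch of a header and a lookup that skips hidden words and compares names case-insensitively. The names struct header, ucell, find_word and is_immediate, and the use of a function pointer for the code field, are assumptions made for this example; FCC's actual source may be organized differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unsigned cell type: as wide as a pointer on the host. */
typedef uintptr_t ucell;

#define NAME_LEN_MASK  ((ucell)0xFF)   /* name length lives in the low byte */
#define FLAG_HIDDEN    ((ucell)0x100)
#define FLAG_IMMEDIATE ((ucell)0x200)

/* Field order follows the README; the field types are assumptions. */
struct header {
    struct header *link;   /* previous definition, NULL at the end */
    ucell metadata;        /* flags | name length */
    const char *name;      /* not NUL-terminated; length is in metadata */
    void (*code)(void);    /* stand-in for the code field */
};

/* Case-insensitive comparison of two names of the same length. */
int names_match(const char *a, const char *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Walk the linked list from the newest definition, skipping hidden words. */
struct header *find_word(struct header *latest, const char *name, size_t len) {
    for (struct header *h = latest; h != NULL; h = h->link) {
        if (h->metadata & FLAG_HIDDEN) continue;
        if ((h->metadata & NAME_LEN_MASK) != len) continue;
        if (names_match(h->name, name, len)) return h;
    }
    return NULL;
}

int is_immediate(const struct header *h) {
    assert(h != NULL);
    return (h->metadata & FLAG_IMMEDIATE) != 0;
}
```

Because the length shares a cell with the flags, a length of 256 or more would spill into the 0x100 and 0x200 bits, which is why over-long names can look hidden or immediate.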
Ambiguous Conditions
General conditions, in the same order as Section 4.1.2.
- A name is neither a valid definition name nor a valid number during text
interpretation (3.4)
- The error message *** Unrecognized word: xyz is written to the standard error stream.
- A definition name exceeded the maximum length allowed (3.3.1.2)
- When a definition name is too long (more than 255 bytes) it might be considered immediate or hidden wrongly. Results will be unpredictable.
- TODO: This error case could be checked and reported nicely.
- Addressing a region not listed in 3.3.3 Data space
- Addressing outside the data space might work normally, might segfault (or similar) or might do something else (bus errors, maybe).
- In other words, memory accesses are native system memory accesses, and trigger system errors.
- Argument type incompatible with the specified input parameter (3.1)
- Types are not checked, and most types (eg. flags) are cell-sized integers.
- Passing wrong types therefore might result in odd behavior, but not in a checked type error.
- Attempting to obtain the execution token of a definition with undefined
interpretation semantics
- Asking for the xt of a word without interpretation semantics generally will return an xt, but executing it will be an ambiguous condition (probably a segfault).
- Dividing by zero
- Dividing by zero will cause a system exception and exit FCC.
- TODO: Catch and handle that more gracefully, probably by QUITting.
- Insufficient data-stack space or return-stack space (stack overflow)
- Stack overflows might cause a segfault (or similar) but might also overwrite other memory.
- Insufficient space for loop-control parameters
- Loop-control parameters are on the return stack, so see above.
- Insufficient space in the dictionary
- The dictionary headers and data space are the same block. The portable C version malloc()s 4 megabytes by default; overflowing it will probably cause a segfault or similar.
- TODO: Make ALLOT check this condition and allocate more space when possible.
- Interpreting a word with undefined interpretation semantics
- Interpreting a word with undefined interpretation semantics (like IF or WHILE) will usually consume and/or add junk on the stack, and may read or write memory unpredictably, and therefore may cause a segfault.
- Modifying the contents of the input buffer or a string literal (3.3.3.4,
3.3.3.5)
- Modifying the input buffer is not formally supported, but it will work as one would expect: the edited text is what gets parsed. Likewise, editing string literals should work sanely, though it's not formally supported.
- Overflow of a pictured numeric output string
- Pictured numeric output uses data space after HERE, so overflow is unlikely (and described above).
- Parsed string overflow
- Parsed strings are dynamically allocated by readline, so overflow there is usually impossible (other than exhausting the system RAM). A maximum of
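The division-by-zero TODO above could be approached with a guard in front of the host division. The following is only a sketch under assumptions: the cell typedef and the checked_div_mod helper are hypothetical and do not come from FCC's source. It relies on C99's `/` and `%`, which truncate toward zero (symmetric division), and it also rejects the one quotient that overflows a signed cell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signed cell type: as wide as a pointer on the host. */
typedef intptr_t cell;

/* Symmetric (truncating) division using the host's `/` and `%`.
 * Returns false instead of trapping when the divisor is zero, or when
 * the quotient cannot be represented (most negative cell divided by -1).
 * A caller could react to false by reporting an error and QUITting. */
bool checked_div_mod(cell n, cell d, cell *quot, cell *rem) {
    assert(quot != NULL && rem != NULL);
    if (d == 0) return false;
    if (n == INTPTR_MIN && d == -1) return false;
    *quot = n / d;   /* truncates toward zero */
    *rem  = n % d;   /* takes the sign of the dividend */
    return true;
}
```

With this helper, dividing -7 by 2 gives a quotient of -3 and a remainder of -1, while a zero divisor is reported to the caller rather than raising a system exception.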
