SkillAgentSearch skills...

Fcc

Forth interpreter and compiler - a standard, portable, optimized Forth

Install / Use

/learn @bshepherdson/Fcc
About this skill

Quality Score

0/100

Supported Platforms

Zed

Tags

README

Forth Compiler

FCC is an interpreter and compiler for ANS Forth. It has a portable C implementation, and a bootstrapping Forth library that runs on top of it.

Goals

  • Fast - ideally competitive with Gforth.
    • On x86_64, beats Gforth on most benchmarks.
    • On M1 Macs (ARM64), trails Gforth by 4-6x.
  • Portable to many platforms both rich and embedded.
    • It should be portable to most things with a C compiler, and translating the C to assembly by hand is tractable. Most of the heavy lifting is in Forth.
    • The library is (barring bugs) flexible regarding the sizes of cells, chars and address units. It does assume cells are big enough for a pointer, so it probably won't work on 8-bit machines.
  • Standard - follow the ANS Forth standard.
    • Currently the Forth 2012 version, see below for compliance details.
  • Interoperable with C libraries, similar to Gforth's technique.
    • One of my projects is a Gameboy emulator; that requires SDL.
    • No progress on this front so far.
  • Unencumbered by the GPL. Gforth being licensed under the GPL and the bootstrapping nature of Forth conspire to make it tricky, and sometimes impossible, to release a Forth binary built with Gforth without including its full source.
    • For some of my projects, that is an unacceptable restriction.
    • FCC is released under the Apache license 2.0, and therefore does not limit your ability to release FCC, or binaries built with it.

Current state

The portable C version is fairly complete. (Standards compliance details below.)

It has been tested in Linux on arm32 and x86_64, built using gcc. Your mileage may vary on other platforms, processors or compilers.

DCPU

One of my side projects is writing a Forth OS for the DCPU-16, the fake 16-bit CPU invented by Notch for the now-defunct game 0x10c. A partial attempt to assemble this Forth for it is in dcpu/.

Standards Compliance

This section details the compliance of FCC with the Forth 2012 standard

FCC is a Forth-2012 System.

All standard CORE words are implemented.

Core Extensions

Providing .(, .R, 0<>, 0>, 2>R, 2R>, 2R@, :NONAME, <>, ?DO, ACTION-OF, AGAIN, BUFFER:, C", CASE, COMPILE,, DEFER, DEFER!, DEFER@, ENDCASE, ENDOF, ERASE, FALSE, HEX, HOLDS, IS, MARKER, NIP, OF, PAD, PARSE, PARSE-NAME, PICK, REFILL, RESTORE-INPUT, ROLL, SAVE-INPUT, SOURCE-ID, TO, TRUE, TUCK, U.R, U>, UNUSED, VALUE, WITHIN, \ from the Core Extensions word set.

(Everything but S\" and [COMPILE].)

Tools

Providing .S, ?, DUMP, and WORDS from the Tools word set.

(Everything but SEE.)

Providing AHEAD, SYNONYM, N>R, NR>, [DEFINED], and [UNDEFINED] from the Tools Extensions word set.

Facility

Providing +FIELD, BEGIN-STRUCTURE, CFIELD:, END-STRUCTURE, and FIELD: from the Facility Extensions word set.

Implementation-defined Options

As required in Section 4.1.1, and appearing in the same order.

  • Cell-aligned addresses are aligned to the pointer size of the host machine (eg. 32 bits on 32-bit machines, 64 bits on 64-bit machines).
  • EMIT will try to print whatever you give it; the result depends on the output device, generally the terminal.
  • ACCEPT uses GNU readline to support editing.
  • The character set for EMIT and KEY is standard ASCII.
  • All addresses are character-aligned, since characters and address units are the same size (bytes, 8 bits).
  • Spaces (0x20, ' ') and tabs (0x09, '\t') are treated as spaces.
  • The control-flow stack is the data stack during compilation of a definition.
  • Digits larger than 35 are not converted and will be treated as the end of the number being parsed.
  • ACCEPT echoes the entered text. It does not put the newline character into the input, but it does display a newline.
  • ABORT is equivalent to QUIT: it clears the stacks, sets the input source to the user input device, and continues interpreting.
  • Uses the GNU readline library for reading input, so line endings are abstracted. Whatever your terminal accepts, essentially.
  • Counted strings have a maximum length of 255 characters.
  • Parsed strings have a maximum length of 255 characters.
  • Definition names have a maximum length of 255 characters.
  • No limit on the length of ENVIRONMENT? queries.
  • File names can be given on the command line; they will be loaded in order. Then the input device will remain the user's terminal.
  • The only supported output device is the user's terminal. (It can be redirected to a file or other process, if the shell supports that.)
  • Dictionary definitions take this form:
    • link pointer (points to the previous definition, forming a linked list)
    • metadata cell. The name is in the low byte. 0x100 indicates hidden, 0x200 indicates an "immediate" word.
    • pointer to the name
    • code field
  • An address unit is that of the host machine, generally an 8-bit byte.
  • Number representation is that of the host machine, generally 2s complement.
  • Ranges of numeric types, where m is the number of bits in a pointer/cell on the host machine:
    • n: -(2^(m-1)) to 2^(m-1) - 1
    • +n: 0 to 2^(m-1) - 1
    • u: 0 to 2^m - 1
    • d: -(2^(2m-1)) to 2^(2m-1) - 1
    • +d: 0 to 2^(2m-1) - 1
    • ud: 0 to 2^(2m) - 1
  • Writing to any part of data space is permitted. (Though changing eg. the dictionary's link pointers might result in ambiguous conditions.)
  • WORD uses HERE as its buffer; therefore it is large (and very transient).
  • A cell is the same size as a pointer on the host machine. Generally, an address unit is a byte, so a cell is 4 units on a 32-bit machine and 8 on a 64-bit machine.
  • A character is a single address unit (generally a byte).
  • The keyboard input buffer is dynamically allocated by readline, and is limited only by system RAM. (The parse buffer will truncate the input to 256 characters, however.)
  • The pictured numeric string area is HERE, therefore very large (and very transient).
  • PAD returns an area of 1024 address units (bytes, usually).
  • FCC is case-insensitive.
  • The prompt looks like " ok\n> ".
  • Uses the host system's division routines. (Usually libc, which is symmetric.)
  • STATE is either 0 (interpreting) or 1 (compiling).
  • Arithmetic overflow is that of the host system. Generally, wrapping around and not throwing exceptions.
  • After a DOES>, the current definition is hidden.

Ambiguous Conditions

General conditions, in the same order as Section 4.1.2.

  • A name is neither a valid definition name nor a valid number during text interpretation (3.4)
    • The error message *** Unrecognized word: xyz is written to the standard error stream.
  • A definition name exceeded the maximum length allowed (3.3.1.2)
    • When a definition name is too long (more than 255 bytes) it might be considered immediate or hidden wrongly. Results will be unpredictable.
    • TODO: This error case could be checked and reported nicely.
  • Addressing a region not listed in 3.3.3 Data space
    • Addressing outside the data space might work normally, might segfault (or similar) or might do something else (bus errors, maybe).
    • In other words, memory accesses are native system memory accesses, and trigger system errors.
  • Argument type incompatible with the specified input parameter (3.1)
    • Types are not checked, and most types (eg. flags) are cell-sized integers.
    • Passing wrong types therefore might result in odd behavior, but not in a checked type error.
  • Attempting to obtain the execution token of a definition with undefined interpretation semantics
    • Asking for the xt of a word without interpretation semantics generally will return an xt, but executing it will be an ambiguous condition (probably a segfault).
  • Dividing by zero
    • Dividing by zero will cause a system exception and exit FCC.
    • TODO: Catch and handle that more gracefully, probably by QUITting.
  • Insufficient data-stack space or return-stack space (stack overflow)
    • Stack overflows might cause a segfault (or similar) but might also overwrite other memory.
  • Insufficient space for loop-control parameters
    • Loop-control parameters are on the return stack, so see above.
  • Insufficient space in the dictionary
    • The dictionary headers and data space are the same block. The portable C version malloc()s 4 megabytes by default; overflowing it will probably cause a segfault or similar.
    • TODO: Make ALLOT check this condition and allocate more space when possible.
  • Interpreting a word with undefined interpretation semantics
    • Interpreting a word with undefined interpretation semantics (like IF or WHILE) will usually consume and/or add junk on the stack, and may read or write memory unpredictably, and therefore may cause a segfault.
  • Modifying the contents of the input buffer or a string literal (3.3.3.4, 3.3.3.5)
    • Modifying the input buffer is not formally supported, but it will work as one would expect: the edited text is what gets parsed. Likewise, editing string literals should work sanely, though it's not formally supported.
  • Overflow of a pictured numeric output string
    • Pictured numeric output uses data space after HERE, so overflow is unlikely (and described above).
  • Parsed string overflow
    • Parsed strings are dynamically allocated by readline, so overflow there is usually impossible (other than exhausting the system RAM). A maximum of

Related Skills

View on GitHub
GitHub Stars34
CategoryDevelopment
Updated10h ago
Forks3

Languages

Forth

Security Score

90/100

Audited on Apr 5, 2026

No findings