Corth
A self-hosted stack based language like Forth
Corth is a stack based programming language I designed, based on Forth and on Porth, the language from the Porth programming language series on the Tsoding Daily channel. The language is quite similar to Porth, but there are several differences. The compiler was first written in Python and was then rewritten in Corth itself, so the language is now self-hosted.
The repo consists of the compiler source code in Corth, an already compiled and assembled compiler, the standard library and examples.
Requirements
- Right now, the compiler can only produce executables in ELF64 format.
- The compiler uses NASM, which can be installed with a package manager or downloaded from its repo.
How to use the compiler?
To compile a Corth program to an ELF64 executable, cd to the directory that contains the corth executable and run:
./corth compile <file-name> -i ./std/
This will create an executable called output which can be directly run.
The compiler can be bootstrapped using the bootstrap subcommand:
./corth bootstrap ./compiler/ --std ./std/
This will compile the compiler source code and place it at ./corth.
Quick start to the Corth language
- Corth is a concatenative (stack based) language. Every operation pops its arguments from the end of the stack and pushes its return values to the end of the stack.
- Programs are first compiled into NASM assembly, then into object files and executables.
- Indentation, spaces and newline characters are ignored by the compiler unless they are inside a string.
- Names can contain any character except whitespace (space, tab or newline), but cannot start with a decimal digit, a dollar sign, or a single or double quote.
- The language is made up of WORDS instead of statements. Because of this, the lexer does not create an AST and only produces a sequence of tokens. These tokens are then processed in order.
- EVERYTHING IN THIS LANGUAGE CAN BE CHANGED AT ANY TIME. USE WITH CAUTION.
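The naming rule above can be modeled in a few lines. The sketch below is written in Python rather than Corth, and is_valid_name is a hypothetical helper for illustration only. It is not taken from the compiler, and it does not account for reserved keywords.

```python
# Hypothetical model of the naming rule; not taken from the Corth compiler.
WHITESPACE = " \t\n"
FORBIDDEN_FIRST = "0123456789$'\""


def is_valid_name(token: str) -> bool:
    """Return True if the token satisfies the naming rule described above."""
    if not token:
        return False
    # A name cannot contain a space, tab or newline.
    if any(ch in WHITESPACE for ch in token):
        return False
    # A name cannot start with a decimal digit, '$', or a quote character.
    return token[0] not in FORBIDDEN_FIRST


for candidate in ["arithmetic-average", "_x_", "2dup", "$tmp", "'a", "say hi"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```

Under this model, names such as arithmetic-average or _x_ are accepted, while 2dup, $tmp and 'a are rejected because of their first character.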
First program:
// From ./examples/hello_world.corth
include "linux_x86/stdio.corth"
proc main
int int -> int
in let argc argv in
"Hello, World!\n" puts
end 0 end
- This is a simple program that prints 'Hello, World!' when run.
- Compile and run this program with
./corth compile ./examples/hello_world.corth -i ./std/ && ./output
- The include keyword is used to include the stdio library, which contains some basic I/O procedures and macros for reading from and writing to files and the standard streams.
- The let keyword is used to 'name' values. In this example, it is used to name the parameter values.
- The proc keyword is used to define a procedure. main is where the program starts.
- Writing anything in double quotes causes it to be interpreted as a string. A string is just a pointer to a character array and an integer that represents its length.
- puts is defined in the stdio library; it prints the string that is passed to it to the standard output.
- For more examples, you can check the ./examples/ directory.
- For more information about these concepts, keep scrolling.
Numbers:
// In this document, the end of the stack where items are pushed and popped is shown on the right side.
// This will hopefully help in understanding the concept of the stack and the 'let' keyword.
34 // Stack = { 34 }
0b101001 // Stack = { 34, 41 }
0o205126 // Stack = { 34, 41, 68182 }
0x5729da // Stack = { 34, 41, 68182, 5712346 }
'a' // Stack = { 34, 41, 68182, 5712346, 97 }
'\n' // Stack = { 34, 41, 68182, 5712346, 97, 10 }
- Numbers push an integer onto the stack. An integer stack item stores an 8 byte value.
- The lexer allows binary, octal and hexadecimal integers as well. The prefixes are 0b, 0o and 0x respectively.
- Putting a character between single quotes ('') causes it to be interpreted as an integer, which pushes its ASCII value onto the stack.
- Escape sequences such as \n are supported in characters.
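The literal forms above can be checked with a small model. This is a Python sketch, not the Corth lexer: parse_literal is a hypothetical helper, and only the \n escape is handled because it is the only one named here.

```python
# Hypothetical model of Corth integer literals; not the compiler's lexer.
PREFIX_BASES = {"0b": 2, "0o": 8, "0x": 16}
ESCAPES = {"n": "\n"}  # assumption: only \n is confirmed by the text above


def parse_literal(token: str) -> int:
    """Return the integer that a numeric or character literal would push."""
    if len(token) >= 3 and token[0] == "'" and token[-1] == "'":
        body = token[1:-1]
        if body[0] == "\\":
            if len(body) != 2 or body[1] not in ESCAPES:
                raise ValueError(f"unsupported escape in {token}")
            body = ESCAPES[body[1]]
        if len(body) != 1:
            raise ValueError(f"not a single character: {token}")
        # For ASCII characters, ord() gives the ASCII value.
        return ord(body)
    base = PREFIX_BASES.get(token[:2], 10)
    digits = token[2:] if base != 10 else token
    return int(digits, base)


stack = [parse_literal(t) for t in ["34", "0b101001", "0o205126", "0x5729da", "'a'", "'\\n'"]]
print(stack)  # [34, 41, 68182, 5712346, 97, 10]
```

The printed list matches the stack comments in the example above.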
Strings:
"This is a string" // Stack = { 0x648a15, 16 }
"This also is a string.
In Corth, multi-line strings are supported." // Stack = { 0x648a15, 16, 0x648a26, 71 }
- Strings push two items: a pointer to the start of the string, and the length of the string as an integer.
- Escape sequences are also supported in strings.
- It is not recommended to modify the contents of strings while the program runs, although it is possible.
Arithmetic operators:
34 35 + // Stack = { 69 }
69 27 - // Stack = { 69, 42 }
68 inc // Stack = { 69, 42, 69 }
43 dec // Stack = { 69, 42, 69, 42 }
3 23 * // Stack = { 69, 42, 69, 42, 69 }
85 2 / // Stack = { 69, 42, 69, 42, 69, 42 }
- + sums the last two items and pushes the result back, - subtracts signed or unsigned integers.
- * multiplies and / divides signed integers. Right now, the compiler does not keep track of integer signedness, so every signed and unsigned operation can be used on any integer type. The unsigned versions of * and / are u* and u/.
- inc and dec are macros defined as 1 + and 1 - in core/arithmetic.corth. They can be used to increment or decrement a number by one.
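Because every stack item is an 8 byte value with no recorded signedness, the same bit pattern can give different results under / and u/. The Python sketch below models this. It assumes two's complement values and a signed division that truncates toward zero, as the x86-64 idiv instruction does. Those details and all helper names are assumptions for illustration, not taken from the compiler.

```python
# Hypothetical model of signed and unsigned division on 8 byte values.
BITS = 64
MASK = (1 << BITS) - 1


def as_signed(value: int) -> int:
    """Interpret the low 64 bits of value as a two's complement integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def signed_div(a: int, b: int) -> int:
    """Model of '/': signed division, truncating toward zero (assumption)."""
    sa, sb = as_signed(a), as_signed(b)
    quotient = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        quotient = -quotient
    return quotient & MASK


def unsigned_div(a: int, b: int) -> int:
    """Model of 'u/': both operands are treated as unsigned."""
    return (a & MASK) // (b & MASK)


minus_six = -6 & MASK  # the 64-bit pattern that represents -6
print(as_signed(signed_div(minus_six, 2)))  # -3
print(unsigned_div(minus_six, 2))           # 9223372036854775805
```

For small non-negative operands both versions agree, which is why the example above gives 42 for 85 2 / either way.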
Include:
include "str.corth"
- When the compiler sees an include keyword, it starts to compile the file whose path is given after the include keyword.
- Right now, the compiler does not allow including directories. A todo error will be given if tried.
- The ./std/ directory contains some useful libraries like core/stack.corth or str.corth.
Stack operations:
1 2 3 // Stack = { 1, 2, 3 }
swp // Stack = { 1, 3, 2 }
drop // Stack = { 1, 3 }
dup // Stack = { 1, 3, 3 }
rot // Stack = { 3, 3, 1 }
arot // Stack = { 1, 3, 3 }
- ./std/core/stack.corth contains several macros for stack manipulation.
- drop removes the last item from the stack.
- dup duplicates the last item in the stack.
- swp swaps the places of the last two items.
- rot rotates the last 3 items by moving the oldest of the three to the last position.
- arot does the exact opposite of what rot does.
- These operations are macros defined using let, with names starting with an underscore (_). For hard to manage stack operations, it is recommended to use let instead of these macros.
- There are some others, but it is recommended to check ./std/core/stack.corth for more information since they are a bit more complex.
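The effect of these five operations can be modeled with a Python list whose last element is the top of the stack. This is only an illustrative sketch. The real operations are Corth macros in ./std/core/stack.corth, and the Python functions below merely reuse their names.

```python
# Hypothetical Python model of the stack macros; the list end is the stack top.
def swp(stack):
    """a b -> b a"""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def drop(stack):
    """a ->"""
    stack.pop()


def dup(stack):
    """a -> a a"""
    stack.append(stack[-1])


def rot(stack):
    """a b c -> b c a"""
    stack.append(stack.pop(-3))


def arot(stack):
    """a b c -> c a b"""
    stack.insert(-2, stack.pop())


stack = [1, 2, 3]
for operation in (swp, drop, dup, rot, arot):
    operation(stack)
    print(operation.__name__, stack)
```

Running the sketch reproduces the stack states in the comments of the example above, ending with 1, 3, 3.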
I/O:
"Hello, world!\n" puts // Prints 'Hello, world!' to the standard output.
34 35 + eputu " is a nice number.\n" eputs // Prints '69 is a nice number.' to the standard error.
- ./std/linux_x86/stdio.corth contains useful procedures for I/O operations like reading from and writing to streams.
Procedures:
proc arithmetic-average // The name of the procedure is 'arithmetic-average'.
// Procedure takes two integers as arguments, and returns a single integer.
// The leftmost type is the oldest item in the stack.
int int -> int
in
// This is where the code is located.
+ 2 /
end
// This procedure will be run.
proc main
// Right now, only 'int int -> int' argument layout is allowed for the main procedure.
int int -> int
in let argc argv in
"The arithmetic average of 53 and 31 is " puts 53 31 arithmetic-average putu ".\n" puts
end 0 end // Program exits with exit code 0.
- proc defines a procedure, which can be called anywhere in the code.
- return can be used to return early from a procedure, but the stack must match the procedure's output layout.
- Because of the way the stack works, procedures can return more than one value, unlike in most other languages.
- The program starts from the main procedure; if it is not defined, the compiler will report an error.
- If the code does not require recursion and is simple, it might be better to use a macro, depending on the exact requirements.
Macros:
macro sayHi
// Name of the macro is 'sayHi'. This means when the compiler sees 'sayHi' anywhere in the code, it will replace it with the words below.
// Takes a string (the name) and prints a welcome message.
"Hi, " puts puts "!\n" puts
endmacro
proc main
int int -> int
in let argc argv in
"Josh"
sayHi // This will be converted to this:
// "Hi, " puts puts "!\n" puts
end 0 end
- The macro keyword is used to define macros and endmacro is used to end the definition.
- Macros expand at compile time, which allows simplifying and shortening code without losing functionality.
- Macros are only compiled after expanding, so any compile time error that would be caused by a macro is not detected before expansion.
- Using let inside a macro is usually a bad idea, although some library macros are designed that way (like dup, swp or rot). If the code requires let, either change the macro to a procedure, or give the let variable a name that starts and ends with an underscore (_).
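Macro expansion as described above amounts to replacing a name with its token sequence before anything else is compiled. The Python sketch below models that on a list of tokens. The expand function and the token representation are assumptions for illustration, not the compiler's implementation, and a macro that refers to itself would recurse without end in this model.

```python
# Hypothetical model of compile-time macro expansion on a token list.
def expand(tokens, macros):
    """Replace every macro name in tokens with the macro's own tokens."""
    expanded = []
    for token in tokens:
        if token in macros:
            # Macro bodies may use other macros, so expand them as well.
            expanded.extend(expand(macros[token], macros))
        else:
            expanded.append(token)
    return expanded


macros = {"sayHi": ['"Hi, "', "puts", "puts", '"!\\n"', "puts"]}
program = ['"Josh"', "sayHi"]
print(expand(program, macros))
```

After expansion the program no longer contains sayHi; it contains the string "Josh" followed by the five tokens of the macro body, which is what the compiler then processes.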
Name scopes:
- Two global or two local variables cannot have the same name, but if a local and a global variable have the same name, the local one will be reachable until it is removed from the scope.
- If a name is defined globally (for example with memory), it can be reached from anywhere, meaning from any procedure in or out of the same file.
- If a name is defined locally in a procedure, it can only be reached within the scope of the statement it is defined in. For example:
// 'x' is undefined here.
69 let x in
// 'x' is defined here.
end
// 'x' is undefined here.
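The lookup rules above can be summarized with a small model: locals are checked first, then globals, and duplicates within the same kind are rejected. The Python class below is a sketch under those stated rules. NameTable and its methods are hypothetical and do not mirror the compiler's data structures.

```python
# Hypothetical model of Corth name resolution; not the compiler's own code.
class NameTable:
    def __init__(self):
        self.global_names = {}
        self.local_names = {}

    def define_global(self, name, value):
        if name in self.global_names:
            raise NameError(f"global '{name}' is already defined")
        self.global_names[name] = value

    def define_local(self, name, value):
        # Models 'let name in': the name becomes reachable.
        if name in self.local_names:
            raise NameError(f"local '{name}' is already defined")
        self.local_names[name] = value

    def remove_local(self, name):
        # Models the matching 'end': the name leaves the scope.
        del self.local_names[name]

    def lookup(self, name):
        # A local name hides a global one with the same name.
        if name in self.local_names:
            return self.local_names[name]
        if name in self.global_names:
            return self.global_names[name]
        raise NameError(f"'{name}' is undefined here")


table = NameTable()
table.define_global("x", 1)
table.define_local("x", 69)
print(table.lookup("x"))  # 69, the local hides the global
table.remove_local("x")
print(table.lookup("x"))  # 1, the global is reachable again
```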
Comments:
// This is a line comment, it can also come after code
34 35 + putu // Just like that
/*
This is a block comment, aka a multi-line comment.
Block comments can span several lines.
*/
- Comments do not have a
