ParseKit
Parse() is a Delphi toolkit for compiler construction. Through a single API, developers define a custom programming language's syntax and code generation in one configuration object, and the toolkit handles the path from source text to native binary.
Install / Use
/learn @tinyBigGAMES/ParseKit

What is Parse()?
Parse() is a compiler construction toolkit. You define the language. It handles the rest.
Parse() is a Delphi library that gives you a complete, configurable compilation pipeline in a single fluent API. You describe your language (its keywords, operators, grammar rules, and code generation) through one configuration object. Parse() runs that description through a full source-to-binary toolchain: tokenizer, Pratt parser, semantic engine, C++23 emitter, and Zig-based native compiler.
// This is your entire language definition.
// One config object. One Compile() call. One native binary.
// Lexer surface: one keyword, two delimiters, one string style.
LParse.Config()
  .AddKeyword('print', 'keyword.print')
  .AddOperator('(', 'delimiter.lparen')
  .AddOperator(')', 'delimiter.rparen')
  .AddStringStyle('"', '"', PARSE_KIND_STRING, True);
// Grammar: print ( <expression> ) becomes a stmt.print node with one child.
LParse.Config().RegisterStatement('keyword.print', 'stmt.print',
  function(AParser: TParseParserBase): TParseASTNodeBase
  var
    LNode: TParseASTNode;
  begin
    LNode := AParser.CreateNode();
    AParser.Consume();
    AParser.Expect('delimiter.lparen');
    LNode.AddChild(TParseASTNode(AParser.ParseExpression(0)));
    AParser.Expect('delimiter.rparen');
    Result := LNode;
  end);
// Codegen: emit one C++ statement for each stmt.print node.
LParse.Config().RegisterEmitter('stmt.print',
  procedure(ANode: TParseASTNodeBase; AGen: TParseIRBase)
  begin
    AGen.Stmt('std::cout << ' +
      LParse.Config().ExprToString(ANode.GetChild(0)) + ' << std::endl;');
  end);
// Compile the source file through the whole pipeline.
LParse.SetSourceFile('hello.mylang');
LParse.Compile();
The target is always C++23, compiled to a native binary via Zig/Clang. You never write C++ and you never configure a build system. Parse() generates, compiles, and optionally executes the result in a single call.
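The statement handler above calls ParseExpression(0), which in a Pratt parser is conventionally the minimum binding power the expression must exceed. The following is a minimal, self-contained C++ sketch of that technique under stated assumptions, not ParseKit's implementation: the names PrattDemo, power, and evaluate are hypothetical, the binding-power values are arbitrary, and the sketch evaluates integer arithmetic directly instead of building an AST.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy Pratt parser; it evaluates while parsing to stay short.
struct PrattDemo {
    std::string src;
    std::size_t pos = 0;

    explicit PrattDemo(std::string text) : src(std::move(text)) {}

    // Skip whitespace and return the next character ('\0' at end of input).
    char peek() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
        return pos < src.size() ? src[pos] : '\0';
    }

    // Binding power of an infix operator; 0 means "not an infix operator".
    static int power(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*':           return 20;
            case '^':           return 30;  // treated as right-associative below
            default:            return 0;
        }
    }

    // Prefix position: an integer literal or a parenthesised expression.
    long parse_prefix() {
        char c = peek();
        if (c == '(') {
            ++pos;
            long value = parse_expression(0);
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos;
            return value;
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) throw std::runtime_error("expected number");
        long value = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
            value = value * 10 + (src[pos++] - '0');
        return value;
    }

    // Keep consuming infix operators while they bind tighter than min_power.
    long parse_expression(int min_power) {
        long left = parse_prefix();
        for (;;) {
            char op = peek();
            int bp = power(op);
            if (bp <= min_power) break;
            ++pos;
            // Left-associative: the right side must bind strictly tighter (bp).
            // Right-associative: allow the same operator again (bp - 1).
            long right = parse_expression(op == '^' ? bp - 1 : bp);
            switch (op) {
                case '+': left += right; break;
                case '-': left -= right; break;
                case '*': left *= right; break;
                default: {  // '^': exponents below 1 yield 1 in this toy
                    long base = left;
                    left = 1;
                    for (long i = 0; i < right; ++i) left *= base;
                }
            }
        }
        return left;
    }
};

// Parse and evaluate a whole input string.
long evaluate(const std::string& text) {
    PrattDemo parser(text);
    long value = parser.parse_expression(0);
    if (parser.peek() != '\0') throw std::runtime_error("unexpected trailing input");
    return value;
}
```

Calling evaluate("1 + 2 * 3") groups the multiplication first because * has the higher binding power, and evaluate("2 ^ 3 ^ 2") groups to the right because the recursive call lowers the minimum by one.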
🎯 Who is Parse() For?
Parse() is for developers who want to build a programming language without spending six months on infrastructure. If any of the following describes you, Parse() is worth a look:
- Language designers: You have a syntax idea and want to see it run as a native binary this week, not next year. Parse() handles every stage from source text to executable. You focus on the language design.
- Domain-specific language authors: Build a DSL for configuration, scripting, query, or automation that compiles to native code rather than being interpreted. No external runtime, no JVM, no Python dependency.
- Compiler students and researchers: Learn real compiler construction techniques (Pratt parsing, scope analysis, AST enrichment, code generation) through working implementations, not textbook pseudocode.
- Tool builders: Build code generators, transpilers, or custom build pipelines around a language you define. Parse() fits naturally into developer tooling and build-time workflows.
- Language porters: Bring an existing language syntax to native compilation by writing the grammar and emit handlers. The hard parts (parsing theory, optimization, linking) are all handled.
✨ Key Features
- 🔧 One config, one language: A single TParse object drives every stage of the pipeline. Define keywords, operators, grammar rules, and emitters in one place. The lexer, parser, semantic engine, and codegen all read from the same config.
- ⚡ Pratt parser built in: Top-down operator precedence parsing ships ready to use. Register prefix, infix-left, infix-right, and statement handlers. Binding powers control precedence. No grammar files, no parser generators.
- 🌳 Generic AST with attribute store: Every AST node carries a string-keyed attribute dictionary. Parse stages communicate through attributes. The semantic engine writes PARSE_ATTR_* values that the codegen reads. No coupling between stages.
- 🔬 Semantic engine with scope trees: Built-in symbol table, scope push/pop, symbol declaration and lookup, type compatibility checking, and implicit coercion annotation. Register only the handlers you need; unregistered node kinds are walked transparently.
- 🎯 C++23 fluent emitter: A structured IR builder generates well-formed C++23 text. Functions, parameters, variables, control flow, and expressions all go through a fluent API. No string-formatting C++ by hand.
- 🏗️ Dual-file output: Generates .h (forward declarations, type aliases, constants) and .cpp (implementations) automatically. Language authors direct output to header or source per statement.
- 🚀 Zig as the build backend: The generated C++ is compiled to a native binary by Zig/Clang with no external toolchain setup required. Target Windows x64 or Linux x64. Build modes: exe, lib, dll. Optimize levels: debug, release-safe, release-fast, release-small.
- 🔄 Type inference surface: Built-in literal-type mapping and call-site scanning for dynamically-typed languages. Infer variable types from initialisers without explicit type annotations.
- 🗂️ TOML config persistence: Language configurations can be serialized to and loaded from TOML files. Snapshot a language definition and reload it without recompiling the Delphi host.
- 📡 LSP-ready enriched AST: After semantic analysis the AST is self-sufficient. Every node carries resolved type, symbol, scope, and storage attributes. An LSP layer can query FindNodeAt, GetSymbolsInScopeAt, and FindSymbol without re-running any stage.
- 🧩 ExprToString with overrides: Built-in recursive expression-to-C++ conversion handles all standard expression kinds. Register per-kind overrides for language-specific rendering (e.g. Pascal single-quoted strings to C++ double-quoted).
- 🏷️ Name mangling and TypeToIR: Register a name mangling function to transform source identifiers to valid C++ names. Register a TypeToIR function to map source type kinds to C++ type strings.
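The attribute-store idea from the list above can be sketched in a few lines of C++. This is an illustration of the pattern only, under stated assumptions: Node, make_node, annotate, type_of, declare, the node kind strings, and the key "attr.type" are all hypothetical stand-ins and are not ParseKit's types or its PARSE_ATTR_* constants. One pass writes a type attribute onto each node, and a later pass reads only that attribute, so neither pass needs to know how the other works.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Generic AST node: a kind, optional source text, and a string-keyed attribute store.
struct Node {
    std::string kind;
    std::string text;
    std::map<std::string, std::string> attrs;
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical attribute key, standing in for a PARSE_ATTR_* style constant.
const std::string kAttrType = "attr.type";

std::unique_ptr<Node> make_node(const std::string& kind, const std::string& text = "") {
    auto node = std::make_unique<Node>();
    node->kind = kind;
    node->text = text;
    return node;
}

// Read the resolved type, falling back to "auto" when no pass has set one.
std::string type_of(const Node& node) {
    auto it = node.attrs.find(kAttrType);
    return it != node.attrs.end() ? it->second : "auto";
}

// "Semantic" pass: annotate children first, then derive this node's type.
void annotate(Node& node) {
    for (auto& child : node.children) annotate(*child);
    if (node.kind == "literal.int") {
        node.attrs[kAttrType] = "int64_t";
    } else if (node.kind == "literal.string") {
        node.attrs[kAttrType] = "std::string";
    } else if (node.kind == "expr.add" && node.children.size() == 2) {
        // Simplification for the sketch: the sum takes the left operand's type.
        node.attrs[kAttrType] = type_of(*node.children[0]);
    }
}

// "Codegen" pass: builds a declaration from the attribute alone.
std::string declare(const Node& init, const std::string& name) {
    return type_of(init) + " " + name + ";";
}
```

Before annotate runs, declare falls back to auto; afterwards the same call produces a concrete type, which is the decoupling the feature list describes.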
🌐 Four Languages, One Toolkit
Parse() ships with four showcase language implementations. Each proves a different point about what the toolkit can express. All four compile the same program (declare variables, define functions, loop, branch, output) to native binaries using identical pipeline machinery.
Pascal
Classic structured Pascal. Case-insensitive keywords, := assignment, begin/end blocks, typed variables, typed procedures and functions, writeln with multiple arguments. The Result return convention. Full C++23 forward declarations to .h.
program HelloWorld;
var
  greeting: string;
  count: integer;
procedure PrintBanner(msg: string);
begin
  writeln('--- ', msg, ' ---');
end;
function Add(a: integer; b: integer): integer;
begin
  Result := a + b;
end;
begin
  greeting := 'Hello, World!';
  count := 5;
  PrintBanner(greeting);
  writeln('5 + 3 = ', Add(5, 3));
  for i := 1 to count do
    writeln(' Step ', i);
end.
Lua
Dynamically typed. No type annotations anywhere. Literal-based type inference at declaration sites. Call-site pre-scan for parameter type inference. local/global scope. function/end, if/then/else/end, for i = start, limit do/end, while/do/end. .. string concatenation. print() variadic output.
local greeting = "Hello, World!"
local count = 5
function PrintBanner(msg)
  print("--- " .. msg .. " ---")
end
function Add(a, b)
  return a + b
end
PrintBanner(greeting)
print("5 + 3 = " .. Add(5, 3))
for i = 1, count do
  print(" Step " .. i)
end
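Literal-based inference and the call-site pre-scan can be illustrated with a small C++ sketch. The function names infer_literal_type and infer_param_types, the chosen C++ type names, and the "auto" fallback are assumptions for this example; the source does not show how Parse() implements either step.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Map the text of a literal to a C++ type name; "auto" means "could not infer".
std::string infer_literal_type(const std::string& literal) {
    if (literal.empty()) return "auto";
    if (literal == "true" || literal == "false") return "bool";
    if (literal.size() >= 2 && literal.front() == '"' && literal.back() == '"')
        return "std::string";

    bool seen_digit = false;
    bool seen_dot = false;
    std::size_t i = (literal[0] == '-') ? 1 : 0;  // optional leading minus
    for (; i < literal.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(literal[i]);
        if (std::isdigit(c)) {
            seen_digit = true;
        } else if (c == '.' && !seen_dot) {
            seen_dot = true;
        } else {
            return "auto";  // identifiers and anything else are left unresolved
        }
    }
    if (!seen_digit) return "auto";
    return seen_dot ? "double" : "int64_t";
}

// Call-site pre-scan: infer each parameter's type from the arguments of one call.
std::vector<std::string> infer_param_types(const std::vector<std::string>& call_args) {
    std::vector<std::string> types;
    types.reserve(call_args.size());
    for (const auto& arg : call_args) types.push_back(infer_literal_type(arg));
    return types;
}
```

For the call Add(5, 3) in the Lua program, infer_param_types({"5", "3"}) yields two integer parameter types, which is the information an untyped function definition lacks on its own.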
BASIC
Implicit program body, no header keyword required. Dim/As typed variable declarations. Sub/End Sub and Function/End Function. The = operator does dual duty: assignment at statement level, equality in expressions. & string concatenation. If/Then/Else/End If. For/To/Next. While/Wend. Print variadic output.
Dim greeting As String
Dim count As Integer
Sub PrintBanner(msg As String)
  Print "--- " & msg & " ---"
End Sub
Function Add(a As Integer, b As Integer) As Integer
  Add = a + b
End Function
greeting = "Hello, World!"
count = 5
PrintBanner(greeting)
Print "5 + 3 = " & Add(5, 3)
For i = 1 To count
  Print " Step " & i
Next i
Scheme
S-expression syntax. A single ( handler drives all parsing, with no infix operators registered, no block keywords, and no statement terminator. (define var expr), (define (f args) body), (set! x expr), (if cond then else), (begin expr...), (display expr), (newline). Kebab-case identifier names are mangled to snake_case for C++. #t/#f boolean literals.
(define greeting "Hello, World!")
(define count 5)
(define (print-banner msg)
  (display "--- ")
  (display msg)
  (display " ---")
  (newline))
(define (add a b)
  (+ a b))
(print-banner greeting)
(display "5 + 3 = ")
(display (add 5 3))
(newline)
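The kebab-case to snake_case mangling mentioned for Scheme is simple enough to sketch. The function name mangle_identifier is hypothetical, and replacing every other non-alphanumeric character with an underscore is an assumption of this example; the source only states that hyphens become underscores.

```cpp
#include <cctype>
#include <string>

// Turn a Scheme-style identifier into a valid C++ identifier.
std::string mangle_identifier(const std::string& name) {
    std::string out;
    out.reserve(name.size());
    for (char ch : name) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c) || ch == '_') {
            out += ch;   // already legal in a C++ identifier
        } else {
            out += '_';  // '-' and, by assumption, any other symbol
        }
    }
    return out;
}
```

With this rule, print-banner from the program above becomes print_banner, while add passes through unchanged.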
🔄 The Pipeline
Every language built on Parse() follows the same path:
Source Text
     │
     ▼
┌─────────┐   token stream  ┌─────────┐     AST      ┌───────────┐
│  Lexer  │ ──────────────► │ Parser  │ ───────────► │ Semantics │
└─────────┘                 └─────────┘              └───────────┘
                                                           │
                                              enriched AST (PARSE_ATTR_*)
                                                           │
                                                           ▼
                                                     ┌───────────┐
                                                     │  CodeGen  │ ──► .h + .cpp
                                                     └───────────┘         │
                                                                           ▼
                                                                      ┌──────────┐
