Rosettes
⌾⌾⌾ Rosettes — ReDoS-safe syntax highlighter for Python 3.14+ with free-threading.
⌾⌾⌾ Rosettes
A Python syntax highlighter and Pygments alternative for secure code highlighting and existing CSS themes.
```python
from rosettes import highlight

html = highlight("def hello(): print('world')", "python")
```
What is Rosettes?
Rosettes is a syntax highlighter for Python 3.14t. Hand-written state machines, O(n) guaranteed, zero ReDoS risk. Safe for untrusted input in web apps and APIs.
Why people pick it:
- O(n) guaranteed — Hand-written state machines, no regex backtracking
- Zero ReDoS — No exploitable patterns, safe for untrusted input
- Free-threading native — All lexer state is local variables, keyword tables are frozenset, tokens are immutable. Highlight from any number of threads with zero contention.
- Pygments compatible — Drop-in CSS class compatibility for existing themes
- 55 languages — Python, JavaScript, Rust, Go, and 51 more
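
To make the backtracking risk concrete, here is a minimal sketch that uses only the standard library. It does not use Rosettes; the pattern and the function names `regex_all_a` and `scan_all_a` are illustrative choices. When the match fails, the nested-quantifier regex retries many ways of splitting the run of `a` characters, while the hand-written scan looks at each character once.

```python
import re
import time

# A classic catastrophic-backtracking pattern: nested quantifiers.
EVIL = re.compile(r"(a+)+\Z")


def regex_all_a(text: str) -> bool:
    """Return True if text is one or more 'a' characters (regex version)."""
    return EVIL.match(text) is not None


def scan_all_a(text: str) -> bool:
    """Same check as a single left-to-right pass: O(n), no backtracking."""
    if not text:
        return False
    for ch in text:
        if ch != "a":
            return False
    return True


if __name__ == "__main__":
    # Kept short on purpose: each extra 'a' roughly doubles the regex time.
    hostile = "a" * 18 + "b"
    for check in (regex_all_a, scan_all_a):
        start = time.perf_counter()
        result = check(hostile)
        elapsed = time.perf_counter() - start
        print(f"{check.__name__}: {result} in {elapsed:.4f}s")
```

Both functions return the same answer; only the amount of work on a near-miss input differs.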
Use Rosettes For
- HTML code highlighting — Highlight source code for docs, blogs, and web apps
- Pygments migration paths — Keep existing CSS themes with Pygments-compatible classes
- Security-sensitive rendering — Highlight untrusted input without regex backtracking risk
- Parallel highlighting — Process many code blocks across threads with highlight_many()
- Python-native docs stacks — Use with Bengal, Patitas, or custom site generators
Installation
```bash
pip install rosettes
```
Requires Python 3.14+
Quick Start
| Function | Description |
|----------|-------------|
| highlight(code, lang) | Generate HTML with syntax highlighting |
| tokenize(code, lang) | Get raw tokens for custom processing |
| highlight_many(items) | Parallel highlighting for multiple blocks |
| list_languages() | List all 55 supported languages |
Features
| Feature | Description | Docs |
|---------|-------------|------|
| Choosing Rosettes | When it fits, migration from Pygments, and tradeoffs | Choosing Rosettes → |
| Basic Highlighting | highlight() and tokenize() functions | Highlighting → |
| Parallel Processing | highlight_many() for multi-core systems | Parallel → |
| Line Highlighting | Highlight specific lines, add line numbers | Lines → |
| CSS Styling | Semantic or Pygments-compatible classes | Styling → |
| Custom Formatters | Build terminal, LaTeX, or custom output | Extending → |
📚 Full documentation: lbliii.github.io/rosettes
Usage
<details>
<summary><strong>Basic Highlighting</strong> — Generate HTML from code</summary>

```python
from rosettes import highlight

# Basic highlighting
html = highlight("def foo(): pass", "python")
# <div class="rosettes" data-language="python">...</div>

# Sample multi-line input for the options below
code = "def add(a, b):\n    total = a + b\n    print(total)\n    return total\n"

# With line numbers
html = highlight(code, "python", show_linenos=True)

# Highlight specific lines
html = highlight(code, "python", hl_lines={2, 3, 4})
```

</details>
<details>
<summary><strong>Parallel Processing</strong> — Speed up multiple blocks</summary>

For 8+ code blocks, use highlight_many() for parallel processing:

```python
from rosettes import highlight_many

blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
]

# Highlight in parallel
results = highlight_many(blocks)
```

On Python 3.14t with free-threading, this provides a 1.5-2x speedup for 50+ blocks.
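
The general pattern of highlighting many blocks on a thread pool can be sketched with the standard library alone. This is not how highlight_many() is implemented; `fake_highlight` and `highlight_all` are hypothetical stand-ins so the example runs without Rosettes installed.

```python
from concurrent.futures import ThreadPoolExecutor
from html import escape


def fake_highlight(code: str, lang: str) -> str:
    """Stand-in for a real highlighter: escapes the code and tags the language."""
    return f'<pre data-language="{escape(lang)}">{escape(code)}</pre>'


def highlight_all(blocks: list[tuple[str, str]], max_workers: int = 4) -> list[str]:
    """Highlight (code, lang) pairs on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, not completion order.
        return list(pool.map(lambda item: fake_highlight(*item), blocks))


blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
]
results = highlight_all(blocks)
```

Because the worker keeps no shared mutable state, the threads never need a lock, which is the property the free-threaded build can exploit.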
</details>
<details>
<summary><strong>Tokenization</strong> — Raw tokens for custom processing</summary>

```python
from rosettes import tokenize

tokens = tokenize("x = 42", "python")
for token in tokens:
    print(f"{token.type.name}: {token.value!r}")
# NAME: 'x'
# WHITESPACE: ' '
# OPERATOR: '='
# WHITESPACE: ' '
# NUMBER_INTEGER: '42'
```

</details>
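
As one idea for "custom processing", the sketch below turns a token stream into ANSI-coloured terminal text. The `Token` and `TokenType` classes here are hypothetical stand-ins that mirror only the `type.name` and `value` attributes shown above, so the example runs without Rosettes; the colour choices are arbitrary.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    NAME = auto()
    WHITESPACE = auto()
    OPERATOR = auto()
    NUMBER_INTEGER = auto()


@dataclass(frozen=True)
class Token:
    type: TokenType
    value: str


# ANSI colour codes per token type; anything unlisted is printed as-is.
ANSI = {
    TokenType.NAME: "\033[36m",            # cyan
    TokenType.OPERATOR: "\033[35m",        # magenta
    TokenType.NUMBER_INTEGER: "\033[33m",  # yellow
}
RESET = "\033[0m"


def format_ansi(tokens) -> str:
    """Concatenate token values, wrapping coloured types in ANSI escapes."""
    parts = []
    for token in tokens:
        colour = ANSI.get(token.type)
        parts.append(f"{colour}{token.value}{RESET}" if colour else token.value)
    return "".join(parts)


tokens = [
    Token(TokenType.NAME, "x"),
    Token(TokenType.WHITESPACE, " "),
    Token(TokenType.OPERATOR, "="),
    Token(TokenType.WHITESPACE, " "),
    Token(TokenType.NUMBER_INTEGER, "42"),
]
print(format_ansi(tokens))
```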
<details>
<summary><strong>CSS Class Styles</strong> — Semantic or Pygments</summary>

Semantic (default) — Readable, self-documenting:

```python
html = highlight(code, "python")
# <span class="syntax-keyword">def</span>
# <span class="syntax-function">hello</span>
```

```css
.syntax-keyword { color: #ff79c6; }
.syntax-function { color: #50fa7b; }
.syntax-string { color: #f1fa8c; }
```

Pygments-compatible — Use existing themes:

```python
html = highlight(code, "python", css_class_style="pygments")
# <span class="k">def</span>
# <span class="nf">hello</span>
```

</details>
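
The relationship between the two class styles can be pictured as a lookup from a semantic role to a Pygments short class. The sketch below is an assumption about the idea, not Rosettes' internals: the helper `span` and the `PYGMENTS_CLASSES` table are hypothetical, and the table lists only the pairs shown above plus `s`, the standard Pygments class for strings.

```python
from html import escape

# Semantic role -> Pygments short class (illustrative subset).
PYGMENTS_CLASSES = {"keyword": "k", "function": "nf", "string": "s"}


def span(role: str, text: str, css_class_style: str = "semantic") -> str:
    """Render one token as a <span> using the requested class naming style."""
    if css_class_style == "pygments":
        css = PYGMENTS_CLASSES[role]
    else:
        css = f"syntax-{role}"
    return f'<span class="{css}">{escape(text)}</span>'


print(span("keyword", "def"))
print(span("function", "hello", css_class_style="pygments"))
```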
Supported Languages
<details>
<summary><strong>55 languages</strong> with full syntax support</summary>

| Category | Languages |
|----------|-----------|
| Core | Python, JavaScript, TypeScript, JSON, YAML, TOML, Bash, HTML, CSS, Diff |
| Systems | C, C++, Rust, Go, Zig |
| JVM | Java, Kotlin, Scala, Groovy, Clojure |
| Apple | Swift |
| Scripting | Ruby, Perl, PHP, Lua, R, PowerShell |
| Functional | Haskell, Elixir |
| Data/Query | SQL, CSV, GraphQL |
| Markup | Markdown, XML |
| Config | INI, Nginx, Dockerfile, Makefile, HCL |
| Schema | Protobuf |
| Modern | Dart, Julia, Nim, Gleam, V |
| AI/ML | Mojo, Triton, CUDA, Stan |
| Other | PKL, CUE, Tree, Kida, Jinja, Plaintext |

</details>

Architecture
<details>
<summary><strong>State Machine Lexers</strong> — O(n) guaranteed</summary>

Every lexer is a hand-written finite state machine:

```text
┌─────────────────────────────────────────────────────────────┐
│                     State Machine Lexer                     │
│                                                             │
│  ┌─────────┐   char    ┌─────────┐   char    ┌─────────┐    │
│  │ INITIAL │ ────────► │ STRING  │ ────────► │ ESCAPE  │    │
│  │ STATE   │           │ STATE   │           │ STATE   │    │
│  └─────────┘           └─────────┘           └─────────┘    │
│       │                     │                     │         │
│       │ emit                │ emit                │ emit    │
│       ▼                     ▼                     ▼         │
│    [Token]               [Token]               [Token]      │
└─────────────────────────────────────────────────────────────┘
```

Key properties:
- Single character lookahead (O(n) guaranteed)
- No backtracking (no ReDoS possible)
- Immutable state (thread-safe)
- Local variables only (no shared mutable state)
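
A toy lexer with the three states from the diagram shows how these properties fit together. This is a minimal sketch, not Rosettes' code: the function `lex`, the `State` enum, and the two token kinds `TEXT` and `STRING` are made up for illustration. Each character is visited exactly once, and all state lives in local variables.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    STRING = auto()
    ESCAPE = auto()


def lex(source: str) -> list[tuple[str, str]]:
    """Split source into ("TEXT" | "STRING", value) pairs in one pass."""
    tokens: list[tuple[str, str]] = []
    state = State.INITIAL
    start = 0  # index where the current token began

    for i, ch in enumerate(source):
        if state is State.INITIAL:
            if ch == '"':
                if i > start:
                    tokens.append(("TEXT", source[start:i]))
                start = i
                state = State.STRING
        elif state is State.STRING:
            if ch == "\\":
                state = State.ESCAPE
            elif ch == '"':
                tokens.append(("STRING", source[start:i + 1]))
                start = i + 1
                state = State.INITIAL
        else:  # State.ESCAPE: consume one character, then resume the string
            state = State.STRING

    if start < len(source):
        # Flush the tail; an unterminated string is still emitted as STRING.
        kind = "TEXT" if state is State.INITIAL else "STRING"
        tokens.append((kind, source[start:]))
    return tokens


print(lex('x = "a\\"b" + y'))
```

The loop never moves backwards through the input, so the running time grows linearly with its length regardless of content.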
All public APIs are thread-safe:
- Lexers use only local variables during tokenization
- Formatter state is immutable
- Registry uses functools.cache for memoization
- Module declares itself safe for free-threading (PEP 703)

</details>
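
The combination of a memoized registry, immutable keyword tables, and local-only lexing state can be sketched as follows. The names `ToyLexer`, `get_lexer`, and `PYTHON_KEYWORDS` are hypothetical and the keyword list is deliberately tiny; this is an assumption about the pattern, not the library's registry.

```python
from functools import cache
from threading import Thread

# Immutable keyword table: safe to read from any thread.
PYTHON_KEYWORDS = frozenset({"def", "return", "if", "else", "for", "while"})


class ToyLexer:
    """Holds only immutable configuration; per-call state stays in locals."""

    def __init__(self, name: str, keywords: frozenset[str]) -> None:
        self.name = name
        self.keywords = keywords

    def count_keywords(self, source: str) -> int:
        count = 0  # local variable, never shared between calls
        for word in source.split():
            if word in self.keywords:
                count += 1
        return count


@cache
def get_lexer(name: str) -> ToyLexer:
    """Build each lexer once; later lookups return the cached instance."""
    if name != "python":
        raise LookupError(f"unknown language: {name}")
    return ToyLexer(name, PYTHON_KEYWORDS)


results: list[int] = [0] * 4


def worker(slot: int) -> None:
    # Each thread writes only to its own slot, so no lock is needed.
    results[slot] = get_lexer("python").count_keywords("def f ( ) : return 1")


threads = [Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Note that functools.cache keeps its internal table consistent under concurrent calls, but the wrapped function may run more than once if several threads miss the cache at the same moment, so it should be cheap and free of side effects.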
Performance
On a 10,000-line Python file:
- Tokenize — ~12ms
- Highlight — ~18ms
- Parallel highlighting — Run python benchmarks/benchmark_parallel.py to see scaling on your machine. Example with 100 code blocks on an 8-core machine:

| Threads | Time | Speedup |
|---------|------|---------|
| 1 | 0.04s | 1.00x |
| 2 | 0.02s | 1.61x |
| 4 | 0.02s | 2.53x |
| 8 | 0.02s | 2.10x |
Documentation
| Section | Description |
|---------|-------------|
| Get Started | Installation and quickstart |
| Highlighting | Core highlighting APIs |
| Styling | CSS classes and themes |
| Reference | Complete API documentation |
| About | Architecture and design |
Development
```bash
git clone https://github.com/lbliii/rosettes.git
cd rosettes
uv sync --group dev
pytest
```

Run parallel benchmark (free-threading scaling demo):

```bash
python benchmarks/benchmark_parallel.py
```
The Bengal Ecosystem
A structured reactive stack — every layer written in pure Python for 3.14t free-threading.
| | | | |
|--:|---|---|---|
| ᓚᘏᗢ | Bengal | Static site generator | Docs |
| ∿∿ | Purr | Content runtime | — |
| ⌁⌁ | Chirp | Web framework | Docs |
| =^..^= | Pounce | ASGI server | Docs |
| )彡 | Kida | Template engine | Docs |
| ฅᨐฅ | Patitas | Markdown parser | Docs |
| ⌾⌾⌾ | Rosettes | Syntax highlighter ← You are here | Docs |
Python-native. Free-threading ready. No npm required.
License
MIT License — see LICENSE for details.
