Lexbor
Lexbor is an open-source HTML renderer library, currently in development. https://lexbor.com
Crafting a Browser Engine with Simplicity and Flexibility.
Description
Lexbor is still in development, but the existing modules are already production-ready.
It provides a set of fast, standards-compliant tools (modules) for working with modern web technologies: HTML parsing, CSS processing, URL handling, and more. These modules form the foundation of a browser engine that is still being built.
Features
- High Performance — one of the fastest HTML parsers available
- Standards Compliant — rigid adherence to WHATWG (HTML, DOM, URL, Encoding) and W3C (CSS) specifications
- Modular Architecture — use only what you need (e.g., just the CSS parser or Encoding module) to keep your application lightweight
- Zero Dependencies — written in pure C99, making it easy to build and embed in any project without dependency hell
- Production Ready — heavily tested on over 200 million web pages to ensure stability and correctness
Available Modules
| Module | Status | Description |
|--------|--------|-------------|
| DOM | ✅ Ready | DOM tree manipulation |
| HTML | ✅ Ready | Full HTML parser |
| CSS | ✅ Ready | CSS parsing, CSSOM, Selectors |
| URL | ✅ Ready | URL parsing |
| Encoding | ✅ Ready | 40+ encodings support |
| Unicode | ✅ Ready | Normalization, IDNA |
| Punycode | ✅ Ready | IDN encode/decode |
| Layout | 🚧 In progress | — |
| Fonts | 🚧 In progress | — |
| and more | 🚧 In progress | — |
Documentation: https://lexbor.com/modules/.
Who Uses Lexbor?
- PHP — DOM/HTML extension (since PHP 8.4), URL extension (since PHP 8.5)
- SerpApi — uses Lexbor in production for HTML parsing at scale
- Selectolax — popular Python library for fast web scraping
- Nokolexbor — high-performance Nokogiri alternative for Ruby
More bindings are available for Elixir, Crystal, D, Julia, and Erlang.
HTML Module
- Full conformance with the HTML5 specification.
- Manipulation of elements and attributes: add, change, delete, and more.
- Supports fragment parsing (for innerHTML).
- Supports parsing by chunks.
- Passes all tree construction tests.
- Tested on 200+ million HTML pages with ASAN.
- Two ways to parse HTML: by Document, by Parser.
- Supports determining encoding by byte stream.
- Parsing CSS Styles in tag attributes and in the <style> tag.
Documentation: https://lexbor.com/modules/html/.
CSS Module
- Full conformance with the CSS Syntax module.
- Supports:
  - [x] Selectors.
  - [x] StyleSheet Tree (aka CSSOM).
  - [x] and so on.
Documentation: https://lexbor.com/modules/css/.
Selectors Module
- Search for HTML elements using CSS selectors.
Documentation: https://lexbor.com/modules/selectors/.
Encoding Module
- Full conformance with the Encoding specification.
- Supports 40 encodings for encode/decode.
- Supports single and buffering encode/decode.
Documentation: https://lexbor.com/modules/encoding/.
URL Module
- Conformance with the URL specification.
- Supports Unicode ToASCII.
Documentation: https://lexbor.com/modules/url/.
Punycode Module
- Conformance with the Punycode specification.
- Supports encode/decode.
Unicode Module
- Unicode Standard Annex #15.
  - Support Unicode normalization forms: D (NFD), C (NFC), KD (NFKD), KC (NFKC).
  - Support chunks (stream).
- Unicode Technical Standard #46.
  - Support Unicode IDNA Processing.
  - Support Unicode ToASCII.
  - Support Unicode ToUnicode.
Documentation: https://lexbor.com/modules/unicode/.
Build and Installation
Binary packages
Binaries are available for:
- CentOS 6, 7, 8
- Debian 8, 9, 10, 11
- Fedora 28, 29, 30, 31, 32, 33, 34, 36, 37
- RHEL 7, 8
- Ubuntu 14.04, 16.04, 18.04, 18.10, 19.04, 19.10, 20.04, 20.10, 21.04, 22.04
Currently for x86_64 architecture.
If you need another architecture, please write to support@lexbor.com.
vcpkg
For vcpkg users, there is a lexbor port that can be installed with vcpkg install lexbor or by adding it to the dependencies section of your vcpkg.json file.
macOS
Homebrew
To install lexbor on macOS from Homebrew:
brew install lexbor
MacPorts
To install lexbor on macOS from MacPorts:
sudo port install lexbor
Source code
To build and install the Lexbor library from source code, use CMake (an open-source, cross-platform build system).
cmake . -DLEXBOR_BUILD_TESTS=ON -DLEXBOR_BUILD_EXAMPLES=ON
make
make test
For more information, please see the documentation.
Amalgamation
Lexbor can be built as a single-header/single-source amalgamation for easy integration into your project without managing multiple files or dependencies.
The amalgamation combines all selected modules and their dependencies into one .h file, making it simple to drop into any C/C++ project.
Generate Amalgamation
Use the single.pl script to generate an amalgamated version:
# Generate amalgamation with all modules
perl single.pl --all > lexbor_single.h
# Generate amalgamation for specific modules
perl single.pl html css > lexbor_html_css_single.h
# Generate with exported symbols (for dynamic linking)
perl single.pl --with-export-symbols html > lexbor_html_single.h
# Use a different port (default: posix)
perl single.pl --port=windows_nt html > lexbor_html_single.h
Once generated, simply include the amalgamated file in your project:
#include "lexbor_single.h"
int
main(void)
{
/* Your code using Lexbor. */
return 0;
}
Compile without any additional dependencies:
gcc -o myapp myapp.c
Documentation: https://lexbor.com/amalgamation/.
Single or separately
Single
- liblexbor — this is a single library that includes all modules.
Separately
- liblexbor-{module name} — libraries for each module.
If you only need an HTML parser, use liblexbor-html.
Separate modules may depend on each other.
For example, dependencies for liblexbor-html: liblexbor-core, liblexbor-dom, liblexbor-tag, liblexbor-ns.
The liblexbor-html library already references all of its required dependencies, so you only need to link against it when building: gcc program.c -llexbor-html.
External Bindings and Wrappers
- Elixir binding for the HTML module (since 2.0 version)
- Erlang Fast HTML5 Parser with CSS selectors and DOM manipulation (since 2.6.0 version)
- Crystal Fast HTML5 Parser with CSS selectors for Crystal language
- Python binding for modest and lexbor engines.
- D Fast HTML5 Parser with CSS selectors for D programming language
- Ring Fast HTML5 Parser with CSS selectors and DOM manipulation for the Ring programming language.
- Ruby Fast HTML5 Parser with both CSS selectors and XPath support.
- PHP's DOM extension uses Lexbor's HTML living standard parser and CSS selector support, starting from PHP 8.4.
- Julia binding for the HTML module.
You can create a binding or wrapper for lexbor and add a link to it here!
Documentation
Available on lexbor.com in the Documentation section.
Roadmap
Please see the roadmap on lexbor.com.
Getting Help
- E-mail support@lexbor.com
Our Sponsors
<img src="images/neural-logo.png" alt="goneural.ai" width="320"> [<img src="images/SerpApi-logo.png" alt="serpapi.com" width="320">](https://serpapi.com/?ut
