Utf8.lua
pure-lua 5.3 regex library
Install / Use
/learn @Stepets/Utf8.luaREADME
utf8.lua
pure-lua 5.3 regex library for Lua 5.3, Lua 5.1, LuaJIT
This library provides simple way to add UTF-8 support into your application.
Example:
local utf8 = require('.utf8'):init()
for k,v in pairs(utf8) do
string[k] = v
end
local str = "пыщпыщ ололоо я водитель нло"
print(str:find("(.л.+)н"))
-- 8 26 ололоо я водитель
print(str:gsub("ло+", "보라"))
-- пыщпыщ о보라보라 я водитель н보라 3
print(str:match("^п[лопыщ ]*я"))
-- пыщпыщ ололоо я
Usage:
This library can be used as drop-in replacement for vanilla string library. It exports all vanilla functions under raw sub-object.
local utf8 = require('.utf8'):init()
local str = "пыщпыщ ололоо я водитель нло"
utf8.gsub(str, "ло+", "보라")
-- пыщпыщ о보라보라 я водитель н보라 3
utf8.raw.gsub(str, "ло+", "보라")
-- пыщпыщ о보라보라о я водитель н보라 3
It also provides all functions from Lua 5.3 UTF-8 module except utf8.len (s [, i [, j]]). If you need to validate your strings use utf8.validate(str, byte_pos) or iterate over with utf8.validator.
Please note that library assumes regexes are valid UTF-8 strings, if you need to manipulate individual bytes use vanilla functions under utf8.raw.
Installation:
Download repository to your project folder. (no rockspecs yet)
Examples assume library placed under utf8 subfolder not utf8.lua.
As of Lua 5.3 default utf8 module has precedence over user-provided. In this case you can specify full module path (.utf8).
Configuration:
Library is highly modular. You can provide your implementation for almost any function used. Library already has several back-ends:
- Runtime character class processing using hardcoded codepoint ranges or using native functions through
ffi. - Basic functions for working with UTF-8 characters have specializations for
ffi-enabled runtime and for tarantool.
Probably most interesting customizations are utf8.config.loadstring and utf8.config.cache if you want to precompile your regexes.
local utf8 = require('.utf8')
utf8.config = {
cache = my_smart_cache,
}
utf8:init()
For lower and upper functions to work in environments where ffi cannot be used, you can specify substitution tables (data example)
local utf8 = require('.utf8')
utf8.config = {
conversion = {
uc_lc = utf8_uc_lc,
lc_uc = utf8_lc_uc
},
}
utf8:init()
Customization is done before initialization. If you want, you can change configuration after init, it might work for everything but modules. All of them should be reloaded.
Documentation:
Issue reporting:
Please provide example script that causes error together with environment description and debug output. Debug output can be obtained like:
local utf8 = require('.utf8')
utf8.config = {
debug = utf8:require("util").debug
}
utf8:init()
-- your code
Default logger used is io.write and can be changed by specifying logger = my_logger in configuration
Related Skills
node-connect
342.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
84.7kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
342.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
84.7kCommit, push, and open a PR
