GraphemeSplitter
A C# implementation of the Unicode grapheme cluster breaking algorithm
Notes
- This library uses the Unicode 10.0 version of the grapheme boundary algorithm.
- In .NET 5.0, StringInfo.GetTextElementEnumerator enumerates graphemes correctly using the Unicode 13.0 algorithm.
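For comparison, here is a minimal sketch of the built-in API mentioned above. It uses only System.Globalization and needs .NET 5.0 or later because it uses top-level statements. The helper name SplitTextElements is hypothetical. Results for emoji ZWJ sequences depend on the runtime's Unicode version, so earlier runtimes may split the family emoji differently.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: collects the text elements reported by the built-in
// StringInfo API (grapheme clusters on .NET 5.0 and later).
static List<string> SplitTextElements(string s)
{
    var elements = new List<string>();
    TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
    while (e.MoveNext())
    {
        elements.Add(e.GetTextElement());
    }
    return elements;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT forms one text element, "x" another.
Console.WriteLine(string.Join(", ", SplitTextElements("e\u0301x")));

// A family emoji built with U+200D ZERO WIDTH JOINER.
string family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466";
// Expected: 1 on .NET 5.0 and later.
Console.WriteLine(new StringInfo(family).LengthInTextElements);
```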
NuGet package
https://www.nuget.org/packages/GraphemeSplitter/
Install-Package GraphemeSplitter
Sample
using GraphemeSplitter;
using static System.Console;
using static System.String;

public partial class Program
{
    static string Split(string s) => Join(", ", s.GetGraphemes());

    static void Main()
    {
        WriteLine(Split("👨‍👨‍👧‍👦👩‍👩‍👧‍👦👨‍👨‍👧‍👦")); // 👨‍👨‍👧‍👦, 👩‍👩‍👧‍👦, 👨‍👨‍👧‍👦
    }
}

Implementation
This library implements the grapheme cluster boundary rules of http://unicode.org/reports/tr29/ (UAX #29).
Examples:
| type | text | split result |
| --- | --- | --- |
| diacritical marks | à̡̠́ḅ̢̂̃c̣̤̃̄d̥̦̅̆ | "à̡̠́", "ḅ̢̂̃", "c̣̤̃̄", "d̥̦̅̆" |
| variation selector | 葛葛󠄀葛󠄁 | "葛", "葛󠄀", "葛󠄁" |
| Hangul syllables | 안녕하세요 | "안", "녕", "하", "세", "요" |
| family emoji | 👨‍👨‍👧‍👦👩‍👩‍👧‍👦👨‍👨‍👧‍👦 | "👨‍👨‍👧‍👦", "👩‍👩‍👧‍👦", "👨‍👨‍👧‍👦" |
| emoji skin tone | 👩🏻👱🏼👧🏽👦🏾 | "👩🏻", "👱🏼", "👧🏽", "👦🏾" |
For simplicity, however, it relaxes the GB10, GB12, and GB13 rules.
Original rules:
- GB10 … (E_Base | EBG) Extend* × E_Modifier
- GB12 … sot (RI RI)* RI × RI
- GB13 … [^RI] (RI RI)* RI × RI
Implemented rules:
- GB10 … (E_Base | EBG) × Extend
- GB10 … (E_Base | EBG | Extend) × E_Modifier
- GB12/GB13 … RI × RI
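The following sketch makes the GB10 relaxation concrete. It decides the boundary before an E_Modifier under both the original rule and the implemented rule. This is not the library's code. The property lookup Prop is a hypothetical stand-in that knows only a few code points: U+0300 as Extend, U+1F466 to U+1F469 as E_Base, and U+1F3FB to U+1F3FF as E_Modifier. EBG and all other rules except GB9 and GB999 are omitted. GB9 (× Extend) is kept because it decides the first boundary in these examples. The sketch requires .NET 5.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, heavily reduced property lookup (not the full Unicode 10.0 data).
static string Prop(int cp) => cp switch
{
    0x0300 => "Extend",                          // COMBINING GRAVE ACCENT
    >= 0x1F466 and <= 0x1F469 => "E_Base",       // BOY, GIRL, MAN, WOMAN
    >= 0x1F3FB and <= 0x1F3FF => "E_Modifier",   // skin tone modifiers
    _ => "Other",
};

// Returns "×" (no break) or "÷" (break) between cps[i - 1] and cps[i].
static string Mark(IReadOnlyList<int> cps, int i, bool simplified)
{
    string prev = Prop(cps[i - 1]), next = Prop(cps[i]);
    if (next == "Extend") return "×";            // GB9: × Extend
    if (next != "E_Modifier") return "÷";        // GB999 (other rules omitted)
    if (simplified)
        // Implemented: (E_Base | Extend) × E_Modifier (EBG omitted here).
        return (prev is "E_Base" or "Extend") ? "×" : "÷";
    // Original GB10: skip back over Extend*, then require an E_Base.
    int j = i - 1;
    while (j >= 0 && Prop(cps[j]) == "Extend") j--;
    return (j >= 0 && Prop(cps[j]) == "E_Base") ? "×" : "÷";
}

// Space-separated marks for every boundary position inside s.
static string Marks(string s, bool simplified)
{
    var cps = s.EnumerateRunes().Select(r => r.Value).ToList();
    if (cps.Count < 2) return "";
    return string.Join(" ", Enumerable.Range(1, cps.Count - 1).Select(i => Mark(cps, i, simplified)));
}

string aGraveTone = "a\u0300\U0001F3FB";         // U+61, U+300, U+1F3FB
// Prints: original: × ÷, implemented: × ×
Console.WriteLine($"original: {Marks(aGraveTone, false)}, implemented: {Marks(aGraveTone, true)}");
```

For a real E_Base such as U+1F469 WOMAN followed by U+1F3FB, both variants return ×. The two variants disagree only when an Extend follows something other than an E_Base.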
Differences:
| sequence | original | implemented |
| --- | --- | --- |
| à🏻 (U+61, U+300, U+1F3FB) | × ÷ | × × |
| 🇯🇵🇺🇸 (U+1F1EF, U+1F1F5, U+1F1FA, U+1F1F8) | × ÷ × | × × × |
(Here ÷ marks a boundary and × marks no boundary.)
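The regional indicator row can be reproduced with the minimal sketch below. It reports only the marks between adjacent regional indicators and ignores every other rule. The helper names IsRI and RiMarks are hypothetical and not part of the library. The sketch requires .NET 5.0 or later. Under the original GB12/GB13, a pair is kept together only when an odd number of RIs precede the break point within the current RI run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Regional indicator symbols occupy U+1F1E6..U+1F1FF.
static bool IsRI(int cp) => cp >= 0x1F1E6 && cp <= 0x1F1FF;

// Marks between each pair of adjacent regional indicators in s.
// Original GB12/GB13: no break only if an odd number of RIs precede the break point.
// Simplified rule: RI × RI always.
static string RiMarks(string s, bool simplified)
{
    var cps = s.EnumerateRunes().Select(r => r.Value).ToList();
    var marks = new List<string>();
    int run = 0; // length of the RI run ending at cps[i]
    for (int i = 0; i + 1 < cps.Count; i++)
    {
        run = IsRI(cps[i]) ? run + 1 : 0;
        if (!IsRI(cps[i]) || !IsRI(cps[i + 1])) continue; // report adjacent RI pairs only
        marks.Add((simplified || run % 2 == 1) ? "×" : "÷");
    }
    return string.Join(" ", marks);
}

string flags = "\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1F8"; // 🇯🇵🇺🇸
// Prints: original: × ÷ ×, implemented: × × ×
Console.WriteLine($"original: {RiMarks(flags, false)}, implemented: {RiMarks(flags, true)}");
```

The practical effect is that, with the simplified rule, consecutive flags such as 🇯🇵🇺🇸 stay in a single grapheme instead of splitting into one grapheme per flag.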
Acknowledgements
This library is influenced by
- https://github.com/devongovett/grapheme-breaker
- https://github.com/orling/grapheme-splitter
- https://github.com/unicode-rs/unicode-segmentation