Base32768
Binary-to-text encoding highly optimised for UTF-16
base32768
Base32768 is a binary encoding optimised for UTF-16-encoded text. This JavaScript module, base32768, is the first implementation of this encoding.
The table below compares Base32768 with other binary-to-text encodings. Efficiency is the ratio of input bits to output bits in the given text encoding, averaged over long inputs. Higher is better.
<table> <thead> <tr> <th colspan="2" rowspan="2">Encoding</th> <th colspan="3">Efficiency</th> <th rowspan="2">Bytes per Tweet *</th> </tr> <tr> <th>UTF‑8</th> <th>UTF‑16</th> <th>UTF‑32</th> </tr> </thead> <tbody> <tr> <td rowspan="5">ASCII‑constrained</td> <td>Unary / <a href="https://github.com/ferno/base1">Base1</a></td> <td style="text-align: right;">0%</td> <td style="text-align: right;">0%</td> <td style="text-align: right;">0%</td> <td style="text-align: right;">1</td> </tr> <tr> <td>Binary</td> <td style="text-align: right;">13%</td> <td style="text-align: right;">6%</td> <td style="text-align: right;">3%</td> <td style="text-align: right;">35</td> </tr> <tr> <td>Hexadecimal</td> <td style="text-align: right;">50%</td> <td style="text-align: right;">25%</td> <td style="text-align: right;">13%</td> <td style="text-align: right;">140</td> </tr> <tr> <td>Base64</td> <td style="text-align: right;"><strong>75%</strong></td> <td style="text-align: right;">38%</td> <td style="text-align: right;">19%</td> <td style="text-align: right;">210</td> </tr> <tr> <td>Base85 †</td> <td style="text-align: right;">80%</td> <td style="text-align: right;">40%</td> <td style="text-align: right;">20%</td> <td style="text-align: right;">224</td> </tr> <tr> <td rowspan="4">BMP‑constrained</td> <td><a href="https://github.com/ferno/hexagram-encode">HexagramEncode</a></td> <td style="text-align: right;">25%</td> <td style="text-align: right;">38%</td> <td style="text-align: right;">19%</td> <td style="text-align: right;">105</td> </tr> <tr> <td><a href="https://github.com/ferno/braille-encode">BrailleEncode</a></td> <td style="text-align: right;">33%</td> <td style="text-align: right;">50%</td> <td style="text-align: right;">25%</td> <td style="text-align: right;">140</td> </tr> <tr> <td><a href="https://github.com/qntm/base2048">Base2048</a></td> <td style="text-align: right;">56%</td> <td style="text-align: right;">69%</td> <td style="text-align: right;">34%</td> <td 
style="text-align: right;"><strong>385</strong></td> </tr> <tr> <td><a href="https://github.com/ferno/base32768">Base32768</a></td> <td style="text-align: right;">63%</td> <td style="text-align: right;"><strong>94%</strong></td> <td style="text-align: right;">47%</td> <td style="text-align: right;">263</td> </tr> <tr> <td rowspan="3">Full Unicode</td> <td><a href="https://github.com/keith-turner/ecoji">Ecoji</a></td> <td style="text-align: right;">31%</td> <td style="text-align: right;">31%</td> <td style="text-align: right;">31%</td> <td style="text-align: right;">175</td> </tr> <tr> <td><a href="https://github.com/ferno/base65536">Base65536</a></td> <td style="text-align: right;">56%</td> <td style="text-align: right;">64%</td> <td style="text-align: right;"><strong>50%</strong></td> <td style="text-align: right;">280</td> </tr> <tr> <td><a href="https://github.com/ferno/base131072">Base131072</a> ‡</td> <td style="text-align: right;">53%+</td> <td style="text-align: right;">53%+</td> <td style="text-align: right;">53%</td> <td style="text-align: right;">297</td> </tr> </tbody> </table> * New-style "long" Tweets, up to 280 Unicode characters, give or take Twitter's complex "weighting" calculation.<br/> † Base85 is listed for completeness, but all variants use characters which are considered hazardous for general use in text: escape characters, brackets, punctuation, etc.<br/> ‡ Base131072 is a work in progress, not yet ready for general use.<br/>
Base32768 uses only "safe" Unicode code points: no unassigned code points, no whitespace, no control characters, etc.
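As a rough illustration of what "safe" means in practice, the sketch below runs a few checks on the sample output string from the Usage section. The helper `checkSafeText` is hypothetical and not part of base32768; it only spot-checks one string and does not prove anything about the full repertoire.

```javascript
// Sample output copied from the usage example in this README.
const sample = '媒腻㐤┖ꈳ埳'

// Hypothetical helper (not part of base32768): runs a few "safety" checks on a string.
function checkSafeText (str) {
  const forms = ['NFC', 'NFD', 'NFKC', 'NFKD']
  return {
    // Is the string unchanged by all four Unicode normalization forms?
    normalizationStable: forms.every(form => str.normalize(form) === str),
    // Is the string free of whitespace and control characters?
    noWhitespaceOrControl: !/[\s\p{Cc}]/u.test(str),
    // Does every code point fit in a single UTF-16 code unit (i.e. lie in the BMP)?
    bmpOnly: [...str].length === str.length
  }
}

console.log(checkSafeText(sample))
```

All three checks should report `true` for this sample if the string has been copied intact.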
Installation
npm install base32768
Usage
import { encode, decode } from 'base32768'
const uint8Array = new Uint8Array([104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100])
const str = encode(uint8Array)
console.log(str)
// 6 code points, '媒腻㐤┖ꈳ埳'
const uint8Array2 = decode(str)
console.log(uint8Array2)
// [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
API
base32768.encode(uint8Array)
Encodes a Uint8Array and returns a Base32768 String. Note that every Node.js Buffer is a Uint8Array.
The string is suitable for passing safely through almost any "Unicode-clean" text-handling API. This string contains no special characters and is immune to Unicode normalization. Give or take some padding characters, the output string has 1 character per 15 bits of input.
All characters are chosen from the Basic Multilingual Plane. This means that when encoded as UTF-16, all characters occupy 16 bits. Thus, there are 16 bits of output UTF-16 text per 15 bits of input, an efficiency of 93.75%.
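The arithmetic above can be sketched as follows. This is a back-of-the-envelope calculation, not a call into the library: `predictedLength` and `utf16Efficiency` are hypothetical helpers, and the sketch assumes that any final partial group of bits is padded out to exactly one character.

```javascript
const BITS_PER_CHAR = 15 // payload bits carried by each Base32768 character

// Hypothetical helper: predicted number of output characters for `byteLength`
// input bytes, assuming a final partial group occupies exactly one character.
function predictedLength (byteLength) {
  return Math.ceil((byteLength * 8) / BITS_PER_CHAR)
}

// Input bits divided by output bits when every character takes 16 bits
// (UTF-16 with BMP-only characters).
function utf16Efficiency (byteLength) {
  return (byteLength * 8) / (predictedLength(byteLength) * 16)
}

console.log(predictedLength(11)) // 6, matching the usage example
console.log(utf16Efficiency(11)) // 88 / 96, roughly 0.917 for this short input
console.log(utf16Efficiency(15000)) // 0.9375
```

Short inputs fall slightly below 93.75% because of the padding in the last character; the figure is reached when the input length in bits is a multiple of 15.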
base32768.decode(str)
Decodes a Base32768 String and returns a Uint8Array containing the original binary data. Note that a Uint8Array can be converted to a Node.js Buffer like so:
const buffer = Buffer.from(uint8Array.buffer, uint8Array.byteOffset, uint8Array.byteLength)
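A short Node.js sketch of why the conversion passes `byteOffset` and `byteLength`: a `Uint8Array` may be a view onto only part of a larger `ArrayBuffer`. Whether `decode` ever returns such a view is not stated here, so the array below is constructed by hand purely for illustration.

```javascript
// A 4-byte view starting at offset 2 of an 8-byte ArrayBuffer.
const backing = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7])
const view = backing.subarray(2, 6)

// Passing only the ArrayBuffer wraps all 8 bytes, not just the view.
const whole = Buffer.from(view.buffer)

// Passing byteOffset and byteLength wraps exactly the bytes of the view.
const exact = Buffer.from(view.buffer, view.byteOffset, view.byteLength)

console.log(whole.length) // 8
console.log(exact.length) // 4
console.log([...exact]) // [ 2, 3, 4, 5 ]

// No copy is made: the Buffer and the Uint8Array share the same memory.
view[0] = 99
console.log(exact[0]) // 99
```

Because the memory is shared, later writes to either object are visible through the other; copy the data first if an independent `Buffer` is needed.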
License
MIT
