# @exodus/bytes
`Uint8Array` conversion to and from base64, base32, base58, hex, utf8, utf16, bech32, and wif,
plus a `TextEncoder` / `TextDecoder` polyfill.
See documentation.
## Strict
Performs proper input validation and ensures no garbage-in, garbage-out.
Tested in CI with `@exodus/test`.
## Fast
- 10-20x faster than the `Buffer` polyfill
- 2-10x faster than `iconv-lite`

The above is for the JS fallback; it's up to 100x when a native implementation is available,
e.g. `utf8fromString` on Hermes / React Native or `fromHex` in Chrome.

Also:

- 3-8x faster than `bs58`
- 10-30x faster than `@scure/base` (or >100x on Node.js <25)
- Faster in `utf8toString` / `utf8fromString` than `Buffer` or `TextDecoder` / `TextEncoder` on Node.js

See Performance for more info.
## TextEncoder / TextDecoder polyfill

```js
import { TextDecoder, TextEncoder } from '@exodus/bytes/encoding.js'
import { TextDecoderStream, TextEncoderStream } from '@exodus/bytes/encoding.js' // requires Streams support
```
Less than half the bundle size of `text-encoding`, `whatwg-encoding`, or `iconv-lite` (gzipped or not).
Also much faster than all of those.
> [!TIP]
> See also the lite version to get this down to 8 KiB gzipped.
Spec compliant, passing WPT and covered with extra tests.
Moreover, tests for this library uncovered bugs in all major implementations,
including all three major browser engines getting UTF-8 wrong.
See the WPT pull request.
It works correctly even in environments with broken native implementations (currently, that's all of them).
Runs (and passes WPT) on Node.js built without ICU.
> [!NOTE]
> Faster than the native implementation on Node.js.
The JS multi-byte version is as fast as the native implementation in Node.js and browsers, but (unlike them) returns correct results.
For encodings where the native version is known to be fast and correct, it is used automatically.
Some single-byte encodings are faster than native in all three major browser engines.
See analysis table for more info.
### Caveat: TextDecoder / TextEncoder APIs are lossy by default per spec

These are provided only as a compatibility layer; prefer the hardened APIs in new code.

- `TextDecoder` can (and should) be used with the `{ fatal: true }` option for all purposes demanding correctness / lossless transforms.
- `TextEncoder` does not support a fatal mode per spec; it always performs replacement.
  That makes it unsuitable for hashing, cryptography, or consensus applications:
  there would be non-equal strings with equal signatures and hashes, a collision caused by the lossy transform of a JS string to bytes.
  Such strings also survive e.g. `JSON.stringify` / `JSON.parse` or being sent over the network.

Use strict APIs in new applications; see `utf8fromString` / `utf16fromString` below.
Those throw on non-well-formed strings by default.
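To make the collision concrete, here is a minimal sketch using the native `TextEncoder` (variable names are illustrative): two different strings, each containing an unpaired surrogate, encode to identical bytes.

```js
const a = '\uD800' // lone high surrogate
const b = '\uDFFF' // lone low surrogate

// Per spec, TextEncoder silently replaces both with U+FFFD (0xEF 0xBF 0xBD)
const enc = new TextEncoder()
const bytesA = enc.encode(a)
const bytesB = enc.encode(b)

// a !== b, yet bytesA and bytesB are byte-for-byte equal,
// so any hash or signature computed over them collides as well
```

Any downstream hash, signature, or consensus check over these bytes cannot tell `a` and `b` apart.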
## Lite version
Alternate exports exist that can help reduce bundle size, see comparison:
| import | size |
| - | - |
| `@exodus/bytes/encoding-browser.js` | <sub></sub> |
| `@exodus/bytes/encoding-lite.js` | <sub></sub> |
| `@exodus/bytes/encoding.js` | <sub></sub> |
| `text-encoding` | <sub></sub> |
| `iconv-lite` | <sub></sub> |
| `whatwg-encoding` | <sub></sub> |
Libraries are advised to use the single-purpose hardened `@exodus/bytes/utf8.js` / `@exodus/bytes/utf16.js` APIs for Unicode.
Applications (including React Native apps) are advised to load either `@exodus/bytes/encoding-lite.js` or `@exodus/bytes/encoding.js`
(depending on whether legacy multi-byte support is needed) and use that as a global polyfill.
### @exodus/bytes/encoding-lite.js
Use this if you don't need support for legacy multi-byte encodings.
It reduces the bundle size ~12x while still keeping utf-8, utf-16le, utf-16be, and all single-byte encodings specified by the spec;
the only thing missing is legacy multi-byte encoding support.
This can be useful, for example, as a global `TextDecoder` polyfill in React Native, if you are sure you don't need legacy multi-byte encodings.
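A minimal sketch of such a global polyfill setup, assuming the package is installed (place this in the app's entry file, before any code that uses the encoding APIs):

```js
// Entry file: install the lite polyfill globally.
// Assigning unconditionally also replaces buggy native implementations.
import { TextDecoder, TextEncoder } from '@exodus/bytes/encoding-lite.js'

globalThis.TextDecoder = TextDecoder
globalThis.TextEncoder = TextEncoder
```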
### @exodus/bytes/encoding-browser.js
Resolves to a tiny import in browser bundles, preferring native TextDecoder / TextEncoder.
For non-browsers (Node.js, React Native), loads a full implementation.
> [!NOTE]
> This is not the default behavior for `@exodus/bytes/encoding.js` because all major browser implementations have bugs, which `@exodus/bytes/encoding.js` fixes. Only use this if you are ok with that.
## API
### @exodus/bytes/utf8.js

UTF-8 encoding/decoding
```js
import { utf8fromString, utf8toString } from '@exodus/bytes/utf8.js'
// loose
import { utf8fromStringLoose, utf8toStringLoose } from '@exodus/bytes/utf8.js'
```
These methods by design encode/decode BOM (codepoint U+FEFF, Byte Order Mark) as-is.
If you need BOM handling or detection, use `@exodus/bytes/encoding.js`.
#### utf8fromString(string, format = 'uint8')
Encode a string to UTF-8 bytes (strict mode)
Throws on invalid Unicode (unpaired surrogates)
This is similar to the following snippet (but works on all engines):
```js
// Strict encode, requiring Unicode codepoints to be valid
if (typeof string !== 'string' || !string.isWellFormed()) throw new TypeError()
return new TextEncoder().encode(string)
```
#### utf8fromStringLoose(string, format = 'uint8')
Encode a string to UTF-8 bytes (loose mode)
Replaces invalid Unicode (unpaired surrogates) with replacement codepoints U+FFFD
per WHATWG Encoding specification.
Such replacement is a non-injective function: it is irreversible and causes collisions.
Prefer using strict throwing methods for cryptography applications.
This is similar to the following snippet (but works on all engines):
```js
// Loose encode, replacing invalid Unicode codepoints with U+FFFD
return new TextEncoder().encode(string)
```