EntropyString for JavaScript

Efficiently generate cryptographically strong random strings of specified entropy from various character sets.

<a name="TOC"></a>TOC

Installation
Usage
Overview
Real Need
More Examples
Character Sets
Custom Characters
Efficiency
Custom Bytes
No Crypto
Browser Version
Entropy Bits
Why You Don't Need UUIDs
Upgrading
Take Away

<a name="Installation"></a>Installation

Yarn

  yarn add entropy-string

NPM

  npm install entropy-string


<a name="Usage"></a>Usage

Generate strings as an efficient replacement for version 4 UUIDs

const { Entropy } = require('entropy-string')

const entropy = new Entropy()
const string = entropy.string()

GtTr2h4PT2mjffm2GrDN2rhpqp

See the UUID section for a discussion of why the above is more efficient than using the string representation of version 4 UUIDs.

Generate a potential of 1 million random strings with 1 in a billion chance of repeat

const { Entropy } = require('entropy-string')

const entropy = new Entropy({ total: 1e6, risk: 1e9 })
const string = entropy.string()

pbbnBD4MQ3rbRN

See Real Need for a description of what the total and risk parameters represent.

Hexadecimal strings

EntropyString uses predefined charset32 characters by default (see Character Sets). To get a random hexadecimal string:

const { Entropy, charset16 } = require('entropy-string')

const entropy = new Entropy({ total: 1e6, risk: 1e9, charset: charset16 })
const string = entropy.string()

878114ac513a538e22

Custom characters

Custom characters may also be specified. Using uppercase hexadecimal characters:

const { Entropy } = require('entropy-string')

const entropy = new Entropy({ total: 1e6, risk: 1e9, charset: '0123456789ABCDEF' })
const string = entropy.string()

16E26779479356B516

Convenience functions

Convenience functions smallID, mediumID, largeID, sessionID and token provide random strings for various predefined bits of entropy. For example, a small id represents a potential of 30 strings with a 1 in a million chance of repeat:

const { Entropy } = require('entropy-string')

const entropy = new Entropy()
const string = entropy.smallID()

DpTQqg

Or, to generate an OWASP session ID:

const { Entropy } = require('entropy-string')

const entropy = new Entropy()
const string = entropy.sessionID()

nqqBt2P669nmjPQRqh4NtmTPn9

Or perhaps you need a 256-bit token using RFC 4648 file system and URL safe characters:

const { Entropy, charset64 } = require('entropy-string')

const entropy = new Entropy({ charset: charset64 })
const string = entropy.token()

t-Z8b9FLvpc-roln2BZnGYLZAX_pn5U7uO_cbfldsIt
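The lengths of the sample session ID and token above follow from simple arithmetic: each character drawn uniformly from a set of N characters carries log2(N) bits. The sketch below is illustrative only. The helper `charsNeeded` is hypothetical and not part of entropy-string, and it assumes that the session ID carries 128 bits and that charset32 and charset64 contain 32 and 64 characters respectively.

```javascript
// Hypothetical helper (not part of entropy-string): number of characters
// needed to carry `bits` of entropy when each character is drawn uniformly
// from a set of `charsetSize` characters.
function charsNeeded(bits, charsetSize) {
  const bitsPerChar = Math.log2(charsetSize)
  return Math.ceil(bits / bitsPerChar)
}

// Assumed: a 128-bit session ID using 32 characters (5 bits per character)
console.log(charsNeeded(128, 32)) // 26

// A 256-bit token using 64 characters (6 bits per character)
console.log(charsNeeded(256, 64)) // 43
```

Both results match the lengths of the sample strings shown above.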

Examples

To run any of the examples in the examples directory:

yarn examples
node examples/dist/tldr_1.js


<a name="Overview"></a>Overview

EntropyString provides easy creation of randomly generated strings of specific entropy using various character sets. Such strings are needed as unique identifiers when, for example, you generate random IDs and don't want the overkill of a UUID.

A key concern when generating such strings is that they be unique. Guaranteed uniqueness, however, requires either deterministic generation (e.g., a counter) that is not random, or that each newly created random string be compared against all existing strings. When randomness is required, the overhead of storing and comparing strings is often too onerous and a different tack is chosen.
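To make that overhead concrete, here is a minimal sketch of the compare-against-all-existing-strings approach. It uses Node's built-in crypto module rather than EntropyString, and `makeUniqueIdGenerator` is a hypothetical helper introduced only for illustration.

```javascript
const crypto = require('crypto')

// Hypothetical helper: keeps every issued ID in memory so that a repeat
// can be detected and retried. This is the storage and comparison overhead
// that guaranteed uniqueness of random strings requires.
function makeUniqueIdGenerator(byteLength) {
  const seen = new Set()
  return function nextId() {
    let id
    do {
      // Each random byte becomes two hexadecimal characters
      id = crypto.randomBytes(byteLength).toString('hex')
    } while (seen.has(id))
    seen.add(id)
    return id
  }
}

const nextId = makeUniqueIdGenerator(6)
const ids = Array.from({ length: 5 }, nextId)
console.log(ids)
```

The set of seen IDs grows with every string issued and, in a distributed system, would have to be shared between all generators.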

A common strategy is to replace the guarantee of uniqueness with a weaker but often sufficient one of probabilistic uniqueness. Specifically, rather than being absolutely sure of uniqueness, we settle for a statement such as "there is less than a 1 in a billion chance that two of my strings are the same". We use an implicit version of this very strategy every time we use a hash set, where the keys are formed from taking the hash of some value. We assume there will be no hash collision using our values, but we do not have any true guarantee of uniqueness per se.

Fortunately, a probabilistic uniqueness strategy requires much less overhead than guaranteed uniqueness. But it does require that we have some way of quantifying what we mean by "there is less than a 1 in a billion chance that 1 million strings of this form will have a repeat".

Understanding probabilistic uniqueness of random strings requires an understanding of entropy and of estimating the probability of a collision (i.e., the probability that two strings in a set of randomly generated strings might be the same). The blog post Hash Collision Probabilities provides an excellent overview of deriving an expression for calculating the probability of a collision in some number of hashes using a perfect hash with an N-bit output. This is sufficient for understanding the probability of collision given a hash with a fixed output of N bits, but it does not tell us how to quantify what we mean by "there is less than a 1 in a billion chance that 1 million strings of this form will have a repeat". The Entropy Bits section below describes how EntropyString provides this measure.
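As a rough illustration of the fixed-output case, the sketch below uses the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(N+1)) for n values drawn uniformly from 2^N possibilities. The function name `collisionProbability` is hypothetical, and this is an approximation under the assumption of uniform, independent values, not EntropyString's own calculation.

```javascript
// Hypothetical helper: approximate probability that at least two of `count`
// uniformly random `bits`-bit values are equal (birthday approximation).
function collisionProbability(count, bits) {
  const pairs = (count * (count - 1)) / 2
  // -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x
  return -Math.expm1(-pairs / Math.pow(2, bits))
}

// 1 million values: is the chance of a repeat below 1 in a billion?
console.log(collisionProbability(1e6, 68) < 1e-9) // false
console.log(collisionProbability(1e6, 69) < 1e-9) // true
```

Under this approximation, 68 bits fall just short of the stated goal for 1 million values and 69 bits meet it.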

We'll begin investigating EntropyString by considering the Real Need when generating random strings.


<a name="RealNeed"></a>Real Need

Let's start by reflecting on the common statement: I need random strings 16 characters long.

Okay. There are libraries available that address that exact need. But first, there are some questions that arise from the need as stated, such as:

1. What characters do you want to use?
2. How many of these strings do you need?
3. Why do you need these strings?

The available libraries often let you specify the characters to use. So we can assume for now that question 1 is answered with:

Hexadecimal will do fine.

As for question 2, the developer might respond:

I need 10,000 of these things.

Ah, now we're getting somewhere. The answer to question 3 might lead to a further qualification:

I need to generate 10,000 random, unique IDs.

And the cat's out of the bag. We're getting at the real need, and it's not the same as the original statement. The developer needs uniqueness across a total of some number of strings. The length of the string is a by-product of the uniqueness, not the goal, and should not be the primary specification for the random string.

As noted in the Overview, guaranteeing uniqueness is difficult, so we'll replace that declaration with one of probabilistic uniqueness by asking a fourth question:

<ol start=4> <li>What risk of a repeat are you willing to accept?</li> </ol>

Probabilistic uniqueness contains risk. That's the price we pay for giving up on the stronger declaration of guaranteed uniqueness. But the developer can quantify an appropriate risk for a particular scenario with a statement like:

I guess I can live with a 1 in a million chance of a repeat.

So now we've finally gotten to the developer's real need:

I need 10,000 random hexadecimal IDs with less than 1 in a million chance of any repeats.

Not only is this statement more specific, there is no mention of string length. The developer needs probabilistic uniqueness, and strings are to be used to capture randomness for this purpose. As such, the length of the string is simply a by-product of the encoding used to represent the required uniqueness as a string.

How do you address this need using a library designed to generate strings of specified length? Well, you don't, because that library was designed to answer the originally stated need, not the real need we've uncovered. We need a library that deals with probabilistic uniqueness across some total number of strings. And that's exactly what EntropyString does.

Let's use EntropyString to help this developer generate 5 hexadecimal IDs from a pool of a potential 10,000 IDs with a 1 in a million chance of a repeat:

const { Entropy, charset16 } = require('entropy-string')
const entropy = new Entropy({ total: 10000, risk: 1000000, charset: charset16 })
const strings = Array(5).fill('').map(e => entropy.string())

["85e442fa0e83", "a74dc126af1e", "368cd13b1f6e", "81bf94e1278d", "fe7dec099ac9"]

Examining the above code, the total and risk parameters specify how much entropy is needed to satisfy the probabilistic uniqueness of generating a potential total of 10,000 strings with a 1 in a million risk of repeat. The charset parameter specifies the characters to use. Finally, the strings themselves are generated using entropy.string().

Looking at the IDs, we can see each is 12 characters long. It seems the developer didn't really need 16 characters after all.
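The length of 12 can be checked by hand. The sketch below assumes the birthday approximation, in which the chance of a repeat among `total` strings of N bits is about total × (total - 1) / 2^(N+1). Setting that to 1 / risk gives N = log2(total) + log2(total - 1) + log2(risk) - 1. The helpers `bitsNeeded` and `stringLength` are hypothetical and are not EntropyString's API; the library's exact calculation may differ.

```javascript
// Hypothetical helper: approximate bits of entropy needed so that `total`
// strings have about a 1 in `risk` chance of containing a repeat.
function bitsNeeded(total, risk) {
  return Math.log2(total) + Math.log2(total - 1) + Math.log2(risk) - 1
}

// Hypothetical helper: characters needed to carry `bits` of entropy when
// each character comes from a set of `charsetSize` characters.
function stringLength(bits, charsetSize) {
  return Math.ceil(bits / Math.log2(charsetSize))
}

const bits = bitsNeeded(10000, 1000000)
console.log(bits.toFixed(1)) // 45.5
console.log(stringLength(bits, 16)) // 12
```

At 4 bits per hexadecimal character, roughly 45.5 bits round up to 12 characters, which agrees with the IDs generated above.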
