Copycat
Generate deterministic fake values: The same input will always generate the same fake-output.
Install / Use
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
// => 'Raleigh.McGlynn56687@wholewick.info'
copycat.email('bar')
// => 'Amir_Kris69246@raw-lout.name'
copycat.email('foo')
// => 'Raleigh.McGlynn56687@wholewick.info'
Motivation
The problem
Many of the use cases we aim to solve with snaplet involve anonymizing sensitive information. In practice, this means replacing each bit of sensitive data with something else that resembles the original value, yet does not allow the original value to be inferred.
To do this, we initially turned to faker for replacing the sensitive data with fake data. This approach took us quite far. However, we struggled with getting the replacement data to be deterministic: we found we did not have enough control over how results are generated to be able to easily ensure that for each value of the original data we wanted to replace, we'd always get the same replacement value out.
Faker allows one to seed a pseudo-random number generator (PRNG), such that the same sequence of values will be generated every time. While this means the sequence is deterministic, the problem was we did not have enough control over where the next value in the sequence was going to be used. Changes to the contents or structure in the original data we're replacing and changes to how we are using faker both had an effect on the way we used this sequence, which in turn had an effect on the resulting replacement value for any particular value in the original data. In other words, we had determinism, but not in a way that is useful for our purpose.
The solution
What we really needed was not the same sequence of generated values every time, but the same mapping to generated values every time.
This is exactly what we designed Copycat to do. For each method provided by Copycat, a given input value will always map to the same output value.
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
// => 'Raleigh.McGlynn56687@wholewick.info'
copycat.email('bar')
// => 'Amir_Kris69246@raw-lout.name'
copycat.email('foo')
// => 'Raleigh.McGlynn56687@wholewick.info'
Copycat works statelessly: for the same input, the same value will be returned regardless of the environment, process, call ordering, or any other external factors.
Under the hood, Copycat hashes the input values (using SipHash), with the intention of making it computationally infeasible for the input values to be inferred from the output values.
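The difference between a deterministic sequence and a deterministic mapping can be sketched in a few lines. This is only an illustration, not Copycat's implementation: `hash32` is a toy FNV-1a hash standing in for SipHash, and `NAMES`, `makeSequence` and `fakeName` are hypothetical names made up for the example.

```typescript
// Toy 32-bit FNV-1a hash; a stand-in for illustration, not the SipHash Copycat uses.
function hash32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const NAMES = ['Ada', 'Grace', 'Linus', 'Margaret', 'Dennis'];

// Sequence-based: a seeded PRNG (a simple LCG here) hands out values in call order.
function makeSequence(seed: number): () => string {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return NAMES[state % NAMES.length];
  };
}

// Mapping-based: the value depends only on the input, never on call order.
function fakeName(input: string): string {
  return NAMES[hash32(input) % NAMES.length];
}

// With a sequence, what 'foo' receives depends on how many values were drawn before it.
const next = makeSequence(42);
console.log('foo processed first:', next());
console.log('foo processed second:', next());

// With a mapping, 'foo' receives the same value wherever and whenever it is processed.
const fooBefore = fakeName('foo');
fakeName('bar');
const fooAfter = fakeName('foo');
console.log(fooBefore === fooAfter); // true
```

Both approaches are deterministic, but only the second one ties the output to the input rather than to the position in a stream of calls.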
Alternative approaches
It is still technically possible to make use of faker or similar libraries that offer a deterministic PRNG, with some modification. That said, these solutions come with practical limitations that we decided made them less viable for us:
- It is possible to simply seed the PRNG for every identifier, and then use it to generate only a single value. This seems to be a misuse of these libraries though: there is an up-front cost to seeding these PRNGs that can be expensive if done for each and every value to be generated. Here are benchmarks that point to this up-front cost.
- You can generate a sequence of N values, hash identifiers to some integer smaller than N, then simply use that as an index to lookup a value in the sequence. This can even be done lazily. Still, you're now limiting the uniqueness of the values to N. The larger N is, the larger the cost of keeping these sequences in memory, or the more computationally expensive it is if you do not hold onto the sequences in memory. The smaller N is, the less unique your generated values are.
Note though that for either of these approaches, hashing might also still be needed to make it infeasible for the inputs to be inferred from the outputs.
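To make the second alternative concrete, here is a rough sketch of the lookup approach. It is a simplification under stated assumptions: `hash32` is a toy FNV-1a hash, and `makeLookup` and its `generate` callback are hypothetical stand-ins for whatever produces the i-th value of the pre-generated sequence.

```typescript
// Toy 32-bit FNV-1a hash, used only to turn an identifier into an integer.
function hash32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Lazily fills a cache of at most `n` generated values and indexes into it by hash.
// `generate` is a hypothetical callback that produces the value at a given index.
function makeLookup(n: number, generate: (index: number) => string): (identifier: string) => string {
  const cache = new Map<number, string>();
  return (identifier: string): string => {
    const index = hash32(identifier) % n;
    let value = cache.get(index);
    if (value === undefined) {
      value = generate(index);
      cache.set(index, value);
    }
    return value;
  };
}

const lookup = makeLookup(1000, (i) => `user${i}@example.com`);
console.log(lookup('alice') === lookup('alice')); // true

// The uniqueness limit is easy to see with a tiny N: every identifier collides.
const tiny = makeLookup(1, (i) => `user${i}@example.com`);
console.log(tiny('alice') === tiny('bob')); // true
```

However large the set of identifiers, this lookup can never return more than N distinct values, which is the trade-off described above.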
API Reference
Overview
<a name="input"></a>All Copycat functions take in an input value as their first parameter:
import { copycat } from '@snaplet/copycat'
copycat.email('foo')
// => 'Raleigh.McGlynn56687@wholewick.info'
The given input can be any JSON-serializable value. For any two calls to the same function, if the input given in each call serializes down to the same value, the same output will be returned.
Note that unlike JSON.stringify(), object property ordering is not considered.
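One way to get this order-insensitive behaviour is to sort object keys before serializing. The sketch below is an assumption about how such a serialization could look, not a description of Copycat's internals; `Json` and `canonicalize` are hypothetical names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes a JSON value with object keys sorted, so property order does not matter.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is kept as-is.
    return `[${value.map((item) => canonicalize(item)).join(',')}]`;
  }
  const keys = Object.keys(value).sort();
  return `{${keys.map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`).join(',')}}`;
}

console.log(JSON.stringify({ a: 1, b: 2 }) === JSON.stringify({ b: 2, a: 1 })); // false
console.log(canonicalize({ a: 1, b: 2 }) === canonicalize({ b: 2, a: 1 })); // true
```

Hashing the canonical string rather than the raw `JSON.stringify()` output means `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` are treated as the same input.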
Working with PII (Personal Identifiable Information)
<a name="pii"></a>If you're using sensitive information as input to Copycat, the fact that Copycat makes use of SipHash means it is difficult for the original input value to be inferred from the output value - it is computationally infeasible.
// It is difficult to infer 'Some sensitive input' from 'Rhianna Ebert'
copycat.fullName('Some sensitive input')
// -> 'Rhianna Ebert'
That said, there is still something we need to watch out for: with enough guessing, the input values can still be figured out from the output values.
Let's say we replaced all the first names in some table of data. Included in this data was the name 'Susan', which was replaced with 'Therese':
copycat.firstName('Susan') // -> 'Therese'
While the attacker is able to see the name 'Therese', it is difficult for them to look at Copycat's code and figure out 'Susan' from 'Therese'. But the attacker knows they're dealing with first names, and they have access to the Copycat library. What they can do is feed a list of first names into Copycat until they find a matching name.
Let's say they input the name 'John'. The result is 'April', which does not match 'Therese', so they move on. They next try 'Sarah', which maps to 'Florencio': again no match, so they move on. They next try 'Susan', which maps to the name they see, 'Therese'. This means they have a match, and now know that the original name was 'Susan':
copycat.firstName('John') // -> 'April', no match
copycat.firstName('Sarah') // -> 'Florencio', no match
copycat.firstName('Susan') // -> 'Therese', match!
To prevent this, you'll need to give copycat a key to use when hashing the values:
// store this somewhere safe
const key = copycat.generateHashKey('g9u*rT#!72R$zl5e')
copycat.fullName('foo')
// => 'Mohamed Weissnat'
copycat.setHashKey(key)
copycat.fullName('foo')
// => 'Bertha Sauer'
The idea is that while Copycat's code is publicly known, the key is not. This means that even though attackers have access to Copycat's code, they are not able to figure out which inputs map to which outputs, since they do not have access to the key.
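The guessing attack and the effect of a key can be sketched without Copycat itself. Everything here is a toy under loud assumptions: `hash32` is a non-cryptographic FNV-1a hash, prefixing a key to it is not a secure keyed hash (Copycat uses SipHash for that), and `pseudonym`, `guessInputs` and `secretKey` are hypothetical names.

```typescript
// Toy 32-bit FNV-1a hash. It is NOT cryptographic and NOT the SipHash Copycat uses.
function hash32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical stand-in for a Copycat method: maps an input to a pseudonym.
// Prefixing a key to a toy hash is for illustration only; it is not secure.
function pseudonym(input: string, key: string = ''): string {
  return 'user-' + hash32(key + '\u0000' + input).toString(16);
}

// The guessing attack: run candidates through the mapping and keep those
// whose output equals the observed value.
function guessInputs(
  observed: string,
  candidates: string[],
  mapping: (candidate: string) => string,
): string[] {
  return candidates.filter((candidate) => mapping(candidate) === observed);
}

const candidates = ['John', 'Sarah', 'Susan'];

// Without a key, the attacker can recompute the mapping, so 'Susan' is among the matches.
const unkeyedMatches = guessInputs(pseudonym('Susan'), candidates, (c) => pseudonym(c));

// With a secret key, the attacker can only guess using a mapping that lacks that key.
const secretKey = 'kept-out-of-source-control'; // hypothetical key
const keyedMatches = guessInputs(pseudonym('Susan', secretKey), candidates, (c) => pseudonym(c));

console.log(unkeyedMatches, keyedMatches);
```

The attacker's procedure is identical in both cases. What changes is that, without the key, the mapping they run their guesses through is no longer the one that produced the data.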
faker
A re-export of faker from @faker-js/faker. We do not alter faker in any way, and do not seed it.
fictional
A re-export of fictional, a library used under the hood by copycat for mapping inputs to primitive values.
copycat.scramble(input[, options])
Takes in an input value, and returns a value of the same type and length, but with each character/digit replaced with a different character/digit.
For strings, the replacement characters will be in the same character range:
- By default, spaces are preserved (see preserve option below)
- Lower case ASCII characters are replaced with lower case ASCII letters
- Upper case ASCII characters are replaced with upper case ASCII letters
- Digits are replaced with digits
- Any other ASCII character in the code point range 32 to 126 (0x20 - 0x7e) is replaced with either an alphanumeric character, or _, -, or +
- Any other character is replaced with a Latin-1 character in the range of (0x20 - 0x7e, or 0xa0 - 0xff)
copycat.scramble('Zakary Hessel')
// => 'Vqjmtp Rkbqyl'
If a number is given, each digit will be replaced, and the floating point (if relevant) will be preserved:
copycat.scramble(782364.902374)
// => 239724.505138
If an object or array is given, the values inside the object or array will be recursively scrambled:
copycat.scramble({
a: [
{
b: 23,
c: 'foo',
},
],
})
// => { a: [ { b: 10, c: 'mem' } ] }
If a date is given, each segment in the date will be scrambled:
copycat.scramble(new Date('2022-10-25T19:08:39.374Z'))
// => {}
If a boolean or null value is given, the value will simply be returned.
If a value of any other type is given, an error will be thrown.
options
- preserve: An array of characters that should remain the same if present in the given input string
copycat.scramble('foo@bar.org', { preserve: ['@', '.'] })
// => 'nzx@vib.elt'
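The string rules above can be approximated with a small class-preserving substitution. This is a simplified sketch, not Copycat's `scramble`: `hash32` is a toy FNV-1a hash, `scrambleString` is a hypothetical name, all characters outside letters and digits share one replacement pool (the Latin-1 rule is left out), and a character may occasionally map to itself.

```typescript
// Toy 32-bit FNV-1a hash; a stand-in for illustration, not the SipHash Copycat uses.
function hash32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const LOWER = 'abcdefghijklmnopqrstuvwxyz';
const UPPER = LOWER.toUpperCase();
const DIGITS = '0123456789';

// Replaces each character with one from the same class; preserved characters are kept.
function scrambleString(input: string, preserve: string[] = [' ']): string {
  let out = '';
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const pool =
      preserve.indexOf(ch) !== -1 ? ch
      : LOWER.indexOf(ch) !== -1 ? LOWER
      : UPPER.indexOf(ch) !== -1 ? UPPER
      : DIGITS.indexOf(ch) !== -1 ? DIGITS
      : LOWER + UPPER + DIGITS + '_-+';
    // Hash the whole input together with the position so each slot gets its own pick.
    out += pool[hash32(`${input}:${i}`) % pool.length];
  }
  return out;
}

console.log(scrambleString('Zakary Hessel'));
console.log(scrambleString('foo@bar.org', ['@', '.']));
```

The output keeps the length and the shape of the input (case, digits, preserved characters), which is what makes scrambled values look plausible in place of the originals.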
copycat.oneOf(input, values[, options])
Takes in an input value and an array of values, and returns an item in values that corresponds to that input:
copycat.oneOf('foo', ['red', 'green', 'blue'])
// => 'green'
copycat.oneOfString(input, values[, options])
Like oneOf(), takes in an input value and an array of values, and returns an item in values that corresponds to that input. However, values needs to be an array of string values, and only values within the character limit set by the limit option will be picked.
copycat.oneOfString('foo', ['short', 'loooooooong'], { limit: 6 })
// => 'short'
options
- limit: If the values are strings, the picked value will be constrained to be less than limit's amount of characters
- fallback: string | (input) => string: When limit is specified but no values match the given limit, fallback is called with the given input value.
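How limit and fallback interact can be sketched as a filter followed by a hash-based pick. This is an illustration under assumptions, not Copycat's code: `hash32` is a toy FNV-1a hash, `pickString` and `PickOptions` are hypothetical names, the limit is treated as inclusive, and throwing when nothing fits and no fallback is given is a choice made for the example.

```typescript
// Toy 32-bit FNV-1a hash; a stand-in for illustration, not the SipHash Copycat uses.
function hash32(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

interface PickOptions {
  limit?: number;
  fallback?: string | ((input: string) => string);
}

function pickString(input: string, values: string[], options: PickOptions = {}): string {
  const { limit, fallback } = options;
  // Assumption: a value fits when its length is at most `limit`.
  const fitting = limit === undefined ? values : values.filter((value) => value.length <= limit);
  if (fitting.length === 0) {
    // Assumption: without a fallback there is nothing sensible to return.
    if (fallback === undefined) throw new Error('no value fits within the limit');
    return typeof fallback === 'function' ? fallback(input) : fallback;
  }
  // The same input always selects the same index among the fitting values.
  return fitting[hash32(input) % fitting.length];
}

console.log(pickString('foo', ['short', 'loooooooong'], { limit: 6 })); // 'short'
console.log(pickString('foo', ['loooooooong'], { limit: 3, fallback: (input) => input.slice(0, 3) })); // 'foo'
```

Filtering before picking keeps the result deterministic for a given input, values list and limit, while the fallback covers the case where the limit rules out every value.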

