
DataLossPrevention

Data Loss Prevention (DLP) Sample Data Files



You’ve been there too: setting up a data loss prevention (DLP) solution can be a damn long project if you need to support multiple languages and don’t have adequate data sources.

This repository consolidates Data Loss/Leak Prevention insights and sample files (e.g., datasets) that I have collected and used over the years. Your quality assurance library does not have to be unique; everyone strives for consistency.

Fork this repository, and improve your library. Even better, send me an update :laughing:.

A DLP solution is a set of enterprise processes, tools, and techniques that monitor sensitive information and prevent data exfiltration.

What problem does it solve and why is it useful?

I wasn't happy with the provided bundle of mock files to test my DLP policies and demonstrate compliance. They were either too simple or not localized for my use case.

Friends don’t let friends test the effectiveness of a DLP solution with production data. You need realistic test data[^1] in several formats, such as CSV, JSON, SQL, TXT, and Excel, to make sure your DLP policies are working correctly, especially after a significant change.
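As a rough illustration of what "realistic test data in several formats" can mean, here is a minimal Python sketch that builds a few synthetic records and serializes them as CSV and JSON. It is not part of this repository: the function names, the placeholder names list, and the choice of a 16-digit number starting with `4` are all assumptions made for the example. The card numbers are random digits with a Luhn check digit appended, since many DLP credit card detectors validate that checksum; they do not belong to real accounts.

```python
import csv
import io
import json
import random


def luhn_check_digit(partial: str) -> int:
    """Return the digit that makes partial + digit pass the Luhn checksum."""
    total = 0
    # Walk right to left. The rightmost digit of `partial` is doubled because
    # the check digit will be appended to its right.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """Check a full number (check digit included) against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fake_card_number(rng: random.Random) -> str:
    # Hypothetical format: 16 digits starting with "4". Purely synthetic.
    partial = "4" + "".join(str(rng.randint(0, 9)) for _ in range(14))
    return partial + str(luhn_check_digit(partial))


def make_records(count: int, seed: int = 0) -> list:
    """Build `count` synthetic records; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    names = ["Alice Tremblay", "Bob Martin", "Chloé Roy"]  # placeholder names
    return [
        {"name": rng.choice(names), "card_number": fake_card_number(rng)}
        for _ in range(count)
    ]


def to_csv(records: list) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "card_number"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records: list) -> str:
    # ensure_ascii=False keeps accented characters readable for localized tests
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Calling `to_csv(make_records(100))` or `to_json(make_records(100))` and saving the result gives two files with the same content in different formats, which is handy for checking that a policy fires regardless of the container. The same records could be written to SQL, TXT, or Excel with the appropriate writer.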

dataLossPrevention by Benoît H. Dicaire is released under the Unlicense. For more information, please refer to unlicense.org.

[^1]: Refer to the sensitive information type entity definitions provided by Microsoft for more information about the required structure.

Fake sensitive information generators

| Name | Cybersecurity | Finance | Legal | Personal | Technology |
| :-- | :--: | :--: | :--: | :--: | :--: |
| DLP Test | X | X | X | X | X |
| Fake Person Generator | X | X | X | X | X |
| Fake Generator | X | X | X | X | X |
| GenerateData.com[^2] | X | X | X | X | X |
| Get Fake Data | X | X | X | X | X |
| Get Bored Human | X | X | X | X | X |
| Mockaroo | X | X | X | X | X |
| Mock Turtle | X | X | X | X | X |
| Venkom | X | X | X | X | X |

[^2]: Source code is available on GitHub at benkeen/generatedata.

You can also search GitHub for library code and C tools related to data-generator, fake-data, mock-data, mock-data-generator, and test data.
