Piicloak
Enterprise-grade PII detection and anonymization REST API built on Presidio
PIICloak
<div align="center">Enterprise-grade PII detection and anonymization API
Fast · Accurate · GDPR/CCPA Ready · 31 Entity Types
Quick Start · Documentation · Use Cases · API Reference
</div>
🎯 What is PIICloak?
PIICloak is a production-ready REST API service for detecting and anonymizing Personally Identifiable Information (PII) in text and documents. Built on Microsoft's Presidio with custom recognizers optimized for:
- 🏢 Salesforce data (Account/Contact/Case IDs)
- ⚖️ Legal documents (Case numbers, contracts)
- 💰 Financial data (Bank accounts, tax IDs)
- 🏥 Healthcare (Medical records, HIPAA compliance)
- 💻 Technical data (API keys, IP addresses)
Why PIICloak?
| Feature | PIICloak | Alternatives |
|---------|----------|--------------|
| Entity Types | 31 (including custom business entities) | 10-15 standard types |
| Organization Detection | ✅ NER-based (not limited to a fixed list of company names) | ❌ Pattern-only |
| Salesforce Support | ✅ Native (Account/Contact/Case/Lead IDs) | ❌ Not included |
| Legal Document Support | ✅ Case numbers, contracts, dockets | ❌ Not included |
| API Keys Detection | ✅ OpenAI, AWS, GitHub, Stripe, generic | ⚠️ Limited |
| SDK | ✅ Python SDK included | ❌ API only |
| One-Line Install | ✅ pip install piicloak | ⚠️ Complex setup |
| Docker Ready | ✅ Production-grade image | ⚠️ Basic |
| Metrics | ✅ Prometheus built-in | ❌ None |
| Auth | ✅ Optional API key | ❌ None |
🚀 Quick Start
30-Second Setup
# Install
pip install piicloak
# Run
python -m piicloak
Server starts on http://localhost:8000 🎉
Instant Test
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{"text": "Email john@acme.com, SSN 123-45-6789"}'
Response:
{
"anonymized": "Email <EMAIL_ADDRESS>, SSN <US_SSN>",
"entities_found": [
{"type": "EMAIL_ADDRESS", "text": "john@acme.com", "score": 1.0},
{"type": "US_SSN", "text": "123-45-6789", "score": 0.85}
]
}
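The response can also be consumed from Python with the standard library alone. The sketch below assumes only the response shape shown above (`anonymized` plus `entities_found` entries carrying `type`, `text`, and `score`). The helper name `summarize_entities` and its `min_score` parameter are illustrative and not part of PIICloak.

```python
import json
from collections import Counter

# Sample body copied from the Instant Test response above.
SAMPLE_RESPONSE = """
{
  "anonymized": "Email <EMAIL_ADDRESS>, SSN <US_SSN>",
  "entities_found": [
    {"type": "EMAIL_ADDRESS", "text": "john@acme.com", "score": 1.0},
    {"type": "US_SSN", "text": "123-45-6789", "score": 0.85}
  ]
}
"""


def summarize_entities(response_body: str, min_score: float = 0.0) -> Counter:
    """Count detected entities per type, skipping those scored below min_score."""
    payload = json.loads(response_body)
    return Counter(
        entity["type"]
        for entity in payload.get("entities_found", [])
        if entity["score"] >= min_score
    )


# Keep only high-confidence detections (hypothetical cut-off of 0.9).
summary = summarize_entities(SAMPLE_RESPONSE, min_score=0.9)
print(dict(summary))
```

Filtering on the client like this is a convenience for reporting; it does not change what the server anonymized.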
Docker
docker run -p 8000:8000 dimanjet/piicloak
Python SDK
from piicloak import PIICloak
cloak = PIICloak()
result = cloak.anonymize("Contact John Smith at john@acme.com")
print(result.anonymized) # "Contact <PERSON> at <EMAIL_ADDRESS>"
✨ Features
Supported Entity Types (31)
| Entity Type | Description | Example |
|-------------|-------------|---------|
| 👤 PERSONALLY IDENTIFIABLE INFORMATION |||
| PERSON | Names of individuals (NER-based) | "John Smith", "Jane Doe" |
| EMAIL_ADDRESS | Email addresses | "john@example.com" |
| PHONE_NUMBER | Phone numbers (multiple formats) | "+1-555-123-4567", "(555) 123-4567" |
| US_SSN | US Social Security Numbers | "123-45-6789" |
| US_PASSPORT | US Passport numbers | "123456789" |
| US_DRIVER_LICENSE | US Driver's License numbers | "D1234567" |
| ADDRESS | Physical addresses (NER + patterns) | "123 Main St, New York, NY 10001" |
| 💳 FINANCIAL INFORMATION |||
| CREDIT_CARD | Credit card numbers (all major brands) | "4532-1234-5678-9010" |
| IBAN_CODE | International Bank Account Numbers | "GB82 WEST 1234 5698 7654 32" |
| US_BANK_NUMBER | US bank account numbers | "123456789012" |
| BANK_ACCOUNT | Generic bank account patterns | "ACC-123456789" |
| TAX_ID | Tax IDs (EIN/TIN) | "12-3456789" |
| CRYPTO | Cryptocurrency addresses | "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa" |
| 🏢 ORGANIZATIONAL DATA |||
| ORGANIZATION | Company names (NER-based) | "Acme Corp", "Tech Industries Inc" |
| DOMAIN | Internet domains | "example.com", "company.io" |
| SALESFORCE_ID | Salesforce record IDs (Account/Contact/Case/Lead; 15 or 18 characters) | "0015000000AbcDE", "5005000000XyzAB" |
| ACCOUNT_ID | Generic account identifiers | "ACC-123456", "A-987654" |
| ⚖️ LEGAL DOCUMENTS |||
| CASE_NUMBER | Court case numbers (Federal/State) | "1:24-cv-12345", "CR-2024-001234" |
| CONTRACT_NUMBER | Contract and agreement numbers | "CONT-2024-001", "AGR-123456" |
| 💻 TECHNICAL & SECURITY |||
| USERNAME | Usernames and login IDs | "john_smith123", "@johndoe", "admin" |
| API_KEY | API keys (OpenAI, AWS, GitHub, Stripe, generic) | "sk-1234567890abcdef...", "ghp_abc..." |
| IP_ADDRESS | IPv4 and IPv6 addresses | "192.168.1.1", "2001:0db8::1" |
| URL | Web URLs | "https://example.com/page" |
| 🏥 HEALTHCARE & OTHER |||
| MEDICAL_LICENSE | Medical license numbers | "MD-123456" |
| UK_NHS | UK NHS numbers | "123 456 7890" |
| NRP | Nationality, religious, or political group (Presidio's NRP entity, NER-based) | "American", "Catholic" |
| LOCATION | Geographic locations (NER-based) | "New York", "San Francisco" |
| DATE_TIME | Dates and timestamps | "2024-01-20", "January 20th, 2024" |
Total: 31 entity types covering personal, financial, organizational, legal, technical, and healthcare data. The table above lists 28 of them; GET /entities returns the list supported by a running server.
Anonymization Modes
# Replace with entity type (default)
{"mode": "replace"} → "Contact <PERSON> at <EMAIL_ADDRESS>"
# Mask with asterisks
{"mode": "mask"} → "Contact ******** at ****************"
# Redact (remove completely)
{"mode": "redact"} → "Contact at "
# Hash (SHA-256, hex digest)
{"mode": "hash"} → "Contact a1b2c3d4... at e5f6a7b8..."
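To make the four modes concrete, here is a minimal standalone sketch of what each one does to a detected span. It is not PIICloak's implementation: the `apply_mode` helper and the hard-coded spans are hypothetical, and it assumes that masking uses one asterisk per character and that hashing emits the full SHA-256 hex digest.

```python
import hashlib
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type); end is exclusive


def apply_mode(text: str, spans: List[Span], mode: str = "replace") -> str:
    """Rewrite each detected span according to the chosen anonymization mode."""
    out = text
    # Work right to left so earlier offsets stay valid after each substitution.
    for start, end, entity_type in sorted(spans, reverse=True):
        original = out[start:end]
        if mode == "replace":
            substitute = f"<{entity_type}>"
        elif mode == "mask":
            substitute = "*" * len(original)  # assumption: one asterisk per character
        elif mode == "redact":
            substitute = ""
        elif mode == "hash":
            substitute = hashlib.sha256(original.encode("utf-8")).hexdigest()
        else:
            raise ValueError(f"unknown mode: {mode}")
        out = out[:start] + substitute + out[end:]
    return out


text = "Contact John Smith at john@acme.com"
# Spans a detector might report for the text above (hand-written here).
spans = [(8, 18, "PERSON"), (22, 35, "EMAIL_ADDRESS")]
for mode in ("replace", "mask", "redact", "hash"):
    print(mode, "->", apply_mode(text, spans, mode))
```

Note that `hash` is deterministic, so the same input value always maps to the same digest. That keeps records joinable after anonymization, but it also means low-entropy values such as phone numbers can be recovered by brute force unless a salt is added.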
💼 Use Cases
Salesforce Data Protection
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "Account: 0015000000AbcDE, Contact: Jane Doe (jane@company.com), Case: 5005000000XyzAB"
}'
Output:
Account: <SALESFORCE_ID>, Contact: <PERSON> (<EMAIL_ADDRESS>), Case: <SALESFORCE_ID>
Legal Documents
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "Case No. 1:24-cv-12345 - Plaintiff John Doe (SSN: 123-45-6789) vs. Acme Corp (EIN: 12-3456789)"
}'
Output:
Case No. <CASE_NUMBER> - Plaintiff <PERSON> (SSN: <US_SSN>) vs. <ORGANIZATION> (EIN: <TAX_ID>)
API Keys & Secrets
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "OpenAI key: sk-1234567890abcdefghijklmnopqrstuv, GitHub: ghp_abcdefghijklmnopqrstuvwxyz1234567890"
}'
Output:
OpenAI key: <API_KEY>, GitHub: <API_KEY>
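Pattern-based detection of this kind can be sketched with plain regular expressions. The patterns below are assumptions built from commonly seen key prefixes (`sk-`, `ghp_`, `AKIA`) and approximate lengths; they are not PIICloak's actual recognizers, and `redact_api_keys` is a hypothetical helper name.

```python
import re

# Assumed key shapes; real providers may change formats over time.
API_KEY_PATTERNS = [
    r"\bsk-[A-Za-z0-9]{20,}\b",   # OpenAI-style secret key
    r"\bghp_[A-Za-z0-9]{36}\b",   # GitHub personal access token
    r"\bAKIA[0-9A-Z]{16}\b",      # AWS access key ID
]
API_KEY_RE = re.compile("|".join(API_KEY_PATTERNS))


def redact_api_keys(text: str, placeholder: str = "<API_KEY>") -> str:
    """Replace every substring that looks like an API key with a placeholder."""
    return API_KEY_RE.sub(placeholder, text)


sample = (
    "OpenAI key: sk-1234567890abcdefghijklmnopqrstuv, "
    "GitHub: ghp_abcdefghijklmnopqrstuvwxyz1234567890"
)
print(redact_api_keys(sample))
```

Prefix-anchored patterns keep false positives low, at the cost of missing keys from providers that are not listed. A generic fallback would need an entropy or context check to stay precise.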
.docx Files
curl -X POST http://localhost:8000/anonymize/docx \
-F "document=@contract.docx" \
-F "mode=replace"
📖 Documentation
Installation
# Basic installation
pip install piicloak
# Download NLP model (required)
python -m spacy download en_core_web_lg
# Or install everything at once
pip install piicloak && python -m spacy download en_core_web_lg
Configuration
All settings use the PIICLOAK_ prefix and have sensible defaults:
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| PIICLOAK_HOST | 0.0.0.0 | Server host |
| PIICLOAK_PORT | 8000 | Server port |
| PIICLOAK_DEBUG | false | Debug mode |
| PIICLOAK_WORKERS | 4 | Gunicorn workers |
| PIICLOAK_LOG_LEVEL | INFO | Logging level |
| PIICLOAK_SPACY_MODEL | en_core_web_lg | spaCy model |
| PIICLOAK_SCORE_THRESHOLD | 0.4 | Min confidence score (0-1) |
| PIICLOAK_DEFAULT_MODE | replace | Default anonymization mode |
| PIICLOAK_CORS_ORIGINS | * | CORS allowed origins |
| PIICLOAK_API_KEY | "" | Optional API key (empty = no auth) |
| PIICLOAK_RATE_LIMIT | 100/minute | Rate limiting |
| PIICLOAK_ENABLE_METRICS | true | Prometheus metrics |
Example:
export PIICLOAK_PORT=9000
export PIICLOAK_API_KEY=your-secret-key
python -m piicloak
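The prefix-plus-defaults scheme in the table can be illustrated with a small loader. This is a sketch of the general technique, not PIICloak's own configuration code: the `Settings` class, `load_settings` function, and the accepted boolean spellings are assumptions, and only a subset of the variables is covered.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the table above.
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    workers: int = 4
    score_threshold: float = 0.4
    default_mode: str = "replace"
    api_key: str = ""  # empty string means no authentication


def _as_bool(raw: str) -> bool:
    # Assumed truthy spellings; the real service may accept others.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ, prefix: str = "PIICLOAK_") -> Settings:
    """Build Settings from prefixed environment variables, falling back to defaults."""
    defaults = Settings()

    def read(name: str, cast: Callable[[str], T]) -> T:
        raw = env.get(prefix + name.upper())
        return getattr(defaults, name) if raw is None else cast(raw)

    settings = Settings(
        host=read("host", str),
        port=read("port", int),
        debug=read("debug", _as_bool),
        workers=read("workers", int),
        score_threshold=read("score_threshold", float),
        default_mode=read("default_mode", str),
        api_key=read("api_key", str),
    )
    if not 0.0 <= settings.score_threshold <= 1.0:
        raise ValueError("PIICLOAK_SCORE_THRESHOLD must be between 0 and 1")
    return settings


# Same overrides as the shell example above, passed as a plain mapping.
settings = load_settings({"PIICLOAK_PORT": "9000", "PIICLOAK_API_KEY": "your-secret-key"})
print(settings.port, settings.workers, bool(settings.api_key))
```

Passing the environment as a mapping keeps the loader easy to test without mutating `os.environ`.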
🔌 API Reference
Endpoints
POST /anonymize - Anonymize Text
Request:
{
"text": "Contact John at john@acme.com",
"entities": ["PERSON", "EMAIL_ADDRESS"],
"mode": "replace",
"language": "en",
"score_threshold": 0.4
}
Only text is required; entities, mode, language, and score_threshold are optional.
Response:
{
"original": "Contact John at john@acme.com",
"anonymized": "Contact <PERSON> at <EMAIL_ADDRESS>",
"entities_found": [...]
}
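For callers who prefer not to use the SDK, the endpoint can be reached with the standard library. The sketch below assumes only the request fields shown above and a server on `http://localhost:8000`; `build_anonymize_request` and `anonymize` are hypothetical helper names. It sends no authentication header, because the header PIICloak expects for `PIICLOAK_API_KEY` is not specified here.

```python
import json
import urllib.request
from typing import Optional, Sequence


def build_anonymize_request(
    text: str,
    *,
    entities: Optional[Sequence[str]] = None,
    mode: Optional[str] = None,
    language: Optional[str] = None,
    score_threshold: Optional[float] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a POST /anonymize request, omitting optional fields left unset."""
    optional = {
        "entities": list(entities) if entities is not None else None,
        "mode": mode,
        "language": language,
        "score_threshold": score_threshold,
    }
    payload = {"text": text}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/anonymize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def anonymize(text: str, **options) -> dict:
    """Send the request and return the decoded JSON response (needs a running server)."""
    request = build_anonymize_request(text, **options)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


request = build_anonymize_request("Contact John at john@acme.com", mode="mask")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect or log before any text leaves the process.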
POST /analyze - Detect PII Only
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"text": "Contact john@example.com"}'
GET /entities - List Supported Entities
curl http://localhost:8000/entities
GET /metrics - Prometheus Metrics
curl http://localhost:8000/metrics
GET /health - Health Check
curl http://localhost:8000/health
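A health endpoint is typically polled before traffic is sent, for example in a deployment script. The sketch below assumes that `/health` answers with HTTP 200 once the service is ready, which is not stated above. `wait_until_healthy` is a hypothetical helper, and its `opener` and `sleep` parameters exist only so the loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 1.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it returns 200 or attempts run out."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not reachable yet, or it answered with an error status
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Typical use would be `if not wait_until_healthy(): raise SystemExit("PIICloak did not become healthy")` right after starting the container.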
🐳 Deployment
Docker
# Build
docker build -t piicloak .
# Run
docker run -p 8000:8000 piicloak
# With environment variables
docker run -p 8000:8000 \
-e PIICLOAK_API_KEY=your-key \
-e PIICLOAK_WORKERS=8 \
piicloak
Docker Compose
docker-compose up -d
Production (Gunicorn)
pip install gunicorn
gunicorn -c gunicorn.conf.py "piicloak.app:create_application()"
Kubernetes
See docs/DEPLOYMENT.md for Kubernetes deployment guide.
🛠️ Development
Setup
