Datamimic
Model-driven synthetic test data for CI/CD and analytics: deterministic, privacy-preserving, and domain-aware. Includes Python APIs, XML pipelines, and MCP/IDE integration to orchestrate realistic datasets for finance, healthcare, and other regulated environments.
DATAMIMIC: Deterministic Synthetic Test Data That Makes Sense
Generate realistic, interconnected, and reproducible test data for finance, healthcare, and beyond.
Faker gives you random data. DATAMIMIC gives you consistent, explainable datasets that respect business logic and domain constraints.
- Patient medical histories that match age and demographics
- Bank transactions that obey balance constraints
- Insurance policies aligned with real risk profiles
Why DATAMIMIC?
Typical data generators produce isolated random values. That's fine for unit tests, but meaningless for system, analytics, or compliance testing.
# Faker: broken relationships
from faker import Faker
fake = Faker()
patient_name = fake.name()
patient_age = fake.random_int(1, 99)
conditions = [fake.word()]
# "25-year-old with Alzheimer's" → nonsense data
# DATAMIMIC: contextual realism
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(f"{patient.full_name}, {patient.age}, {patient.conditions}")
# "Shirley Thompson, 72, ['Diabetes', 'Hypertension']"
Quickstart (Community Edition)
Install and run:
pip install datamimic-ce
Deterministic Generation
DATAMIMIC produces the same data for the same request, across machines and CI runs. Seeds, clocks, and UUIDv5 namespaces enforce reproducibility.
from datamimic_ce.domains.facade import generate_domain
request = {
    "domain": "person",
    "version": "v1",
    "count": 1,
    "seed": "docs-demo",  # identical seed → identical output
    "locale": "en_US",
    "clock": "2025-01-01T00:00:00Z",  # fixed clock = stable time context
}
response = generate_domain(request)
print(response["items"][0]["id"])
# Same input → same output
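The three reproducibility ingredients named above (seeds, clocks, UUIDv5 namespaces) can be illustrated without touching DATAMIMIC internals. The sketch below is hypothetical: `generate_person`, `generate`, `DEMO_NAMESPACE`, and the name lists are invented for illustration and are not DATAMIMIC APIs. It relies only on Python's `random.Random` (string seeds are hashed deterministically), a fixed ISO clock instead of `datetime.now()`, and `uuid.uuid5`.

```python
import random
import uuid
from datetime import datetime

# Hypothetical namespace for illustration only; DATAMIMIC's real namespace is not shown here.
DEMO_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/datamimic-demo")
FIRST_NAMES = ["Ada", "Grace", "Linus", "Shirley", "Alan"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Thompson", "Turing"]


def generate_person(seed: str, clock: str, index: int) -> dict:
    """Derive one record purely from (seed, clock, index): no global RNG, no wall clock."""
    rng = random.Random(f"{seed}:{index}")  # str seeds are hashed deterministically
    now = datetime.fromisoformat(clock.replace("Z", "+00:00"))  # frozen clock
    birth_year = now.year - rng.randint(18, 90)  # ages are relative to the frozen clock
    name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    # UUIDv5 hashes namespace + name, so identical inputs always yield the identical ID.
    person_id = uuid.uuid5(DEMO_NAMESPACE, f"{seed}:{clock}:{index}")
    return {"id": str(person_id), "name": name, "birth_year": birth_year}


def generate(seed: str, clock: str, count: int) -> list:
    return [generate_person(seed, clock, i) for i in range(count)]


run_a = generate("docs-demo", "2025-01-01T00:00:00Z", 3)
run_b = generate("docs-demo", "2025-01-01T00:00:00Z", 3)
print(run_a == run_b)  # True: same request, same output
```

Because nothing reads global state or the system clock, the same request reproduces on any machine or CI runner.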
Determinism Contract
- Inputs: {seed, clock, uuidv5-namespace, request body}
- Guarantees: byte-identical payloads and a stable determinism_proof.content_hash
- Scope: all CE domains (see docs for domain-specific caveats)
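A stable content hash only works if the payload is serialized canonically. As a minimal sketch, assuming SHA-256 over sorted-key, compact JSON (DATAMIMIC's actual hashing scheme is not specified here), the hypothetical `content_hash` helper below yields the same digest regardless of dict key order:

```python
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    # Sorted keys and fixed separators make the bytes independent of dict insertion order.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def content_hash(payload: dict) -> str:
    # Hypothetical helper: hex SHA-256 of the canonical serialization.
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


a = {"items": [{"id": "1", "name": "Ada"}], "seed": "docs-demo"}
b = {"seed": "docs-demo", "items": [{"name": "Ada", "id": "1"}]}  # same data, different key order
print(content_hash(a) == content_hash(b))  # True
```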
MCP (Model Context Protocol)
Run DATAMIMIC as an MCP server so Claude, Cursor, and other agents can call deterministic data tools.
Install
pip install "datamimic-ce[mcp]"
# Development (editable install from a repo checkout)
pip install -e ".[mcp]"
Run (SSE transport)
export DATAMIMIC_MCP_HOST=127.0.0.1
export DATAMIMIC_MCP_PORT=8765
# Optional auth; clients must send the same token via Authorization: Bearer or X-API-Key
export DATAMIMIC_MCP_API_KEY=changeme
datamimic-mcp
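The optional-auth rule (clients send the same token via `Authorization: Bearer` or `X-API-Key`) could be checked as in the sketch below. `extract_token` and `is_authorized` are hypothetical helpers, not part of `datamimic_ce`; they only mirror the documented behavior, and the constant-time comparison is a design choice of this sketch.

```python
import hmac
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Accept either 'Authorization: Bearer <token>' or 'X-API-Key: <token>'."""
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[len("bearer "):].strip()
    return lowered.get("x-api-key")


def is_authorized(headers: Mapping[str, str], expected: Optional[str]) -> bool:
    if not expected:  # no key configured: auth disabled
        return True
    token = extract_token(headers)
    # compare_digest avoids leaking the key through timing differences
    return token is not None and hmac.compare_digest(token.encode(), expected.encode())


print(is_authorized({"Authorization": "Bearer changeme"}, "changeme"))  # True
```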
In-proc example (determinism proof)
import json
import anyio
from fastmcp.client import Client
from datamimic_ce.mcp.models import GenerateArgs
from datamimic_ce.mcp.server import create_server
async def main():
    args = GenerateArgs(domain="person", locale="en_US", seed=42, count=2)
    payload = args.model_dump(mode="python")
    async with Client(create_server()) as c:
        # Call the same tool twice with identical arguments
        a = await c.call_tool("generate", {"args": payload})
        b = await c.call_tool("generate", {"args": payload})
        hash_a = json.loads(a[0].text)["determinism_proof"]["content_hash"]
        hash_b = json.loads(b[0].text)["determinism_proof"]["content_hash"]
        print(hash_a == hash_b)  # True
anyio.run(main)
Config keys
- DATAMIMIC_MCP_HOST (default 127.0.0.1)
- DATAMIMIC_MCP_PORT (default 8765)
- DATAMIMIC_MCP_API_KEY (unset = no auth)
- Requests over the cap (count > 10_000) are rejected with 422.
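A minimal sketch of how these keys and the count cap might be resolved, assuming the defaults listed above. `McpConfig`, `load_config`, and `check_count` are hypothetical names, and treating an empty `DATAMIMIC_MCP_API_KEY` like an unset one is an assumption of this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

MAX_COUNT = 10_000  # documented cap; larger requests are rejected with 422


@dataclass(frozen=True)
class McpConfig:
    host: str
    port: int
    api_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> McpConfig:
    return McpConfig(
        host=env.get("DATAMIMIC_MCP_HOST", "127.0.0.1"),
        port=int(env.get("DATAMIMIC_MCP_PORT", "8765")),
        api_key=env.get("DATAMIMIC_MCP_API_KEY") or None,  # assumption: empty behaves like unset
    )


def check_count(count: int) -> int:
    """Hypothetical validator: HTTP-style status, 200 if accepted, 422 if over the cap."""
    return 422 if count > MAX_COUNT else 200


print(load_config({}))  # McpConfig(host='127.0.0.1', port=8765, api_key=None)
```

Passing the environment as a mapping keeps the loader testable without mutating `os.environ`.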
Full guide (IDE configs for Claude/Cursor, transports, errors): docs/mcp_quickstart.md
Domains & Examples
Healthcare
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(patient.full_name, patient.conditions)
- Demographically realistic patients
- Doctor specialties match conditions
- Hospital capacities and types
- Longitudinal medical records
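To illustrate what "demographically realistic" and "specialties match conditions" mean in practice, here is a hypothetical sketch that samples conditions per age band and derives treating specialties from them. The prevalence numbers and the `PREVALENCE`/`SPECIALTY` tables are invented for illustration; they are neither clinical data nor DATAMIMIC's datasets.

```python
import random

# Hypothetical per-age-band probabilities; illustrative numbers only, not clinical data.
PREVALENCE = {
    (0, 39): {"Asthma": 0.08, "Hypertension": 0.05},
    (40, 64): {"Hypertension": 0.30, "Diabetes": 0.12, "Asthma": 0.07},
    (65, 120): {"Hypertension": 0.55, "Diabetes": 0.25, "Alzheimer's": 0.10},
}
# Hypothetical mapping from condition to treating specialty.
SPECIALTY = {"Asthma": "Pulmonology", "Hypertension": "Cardiology",
             "Diabetes": "Endocrinology", "Alzheimer's": "Neurology"}


def conditions_for_age(age: int, rng: random.Random) -> list:
    for (low, high), table in PREVALENCE.items():
        if low <= age <= high:
            # Each condition is drawn independently with its age-band probability.
            return [name for name, p in table.items() if rng.random() < p]
    raise ValueError(f"age out of range: {age}")


def generate_patient(rng: random.Random) -> dict:
    age = rng.randint(1, 99)
    conditions = conditions_for_age(age, rng)
    specialties = sorted({SPECIALTY[c] for c in conditions})  # specialties follow conditions
    return {"age": age, "conditions": conditions, "specialties": specialties}


rng = random.Random(1337)
patients = [generate_patient(rng) for _ in range(1000)]
```

Because conditions are conditioned on age, a "25-year-old with Alzheimer's" cannot occur in this sketch.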
Finance
from datamimic_ce.domains.finance.services import BankAccountService
account = BankAccountService().generate()
print(account.account_number, account.balance)
- Balances respect transaction histories
- Card/IBAN formats per locale
- Distributions tuned for fraud/reconciliation tests
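"Balances respect transaction histories" can be enforced by construction: cap every withdrawal at the current balance. The sketch below is hypothetical (`generate_history` and `running_balances` are not DATAMIMIC APIs, and the 50/50 deposit/withdrawal split is arbitrary) and uses `Decimal` so amounts stay exact to the cent.

```python
import random
from decimal import Decimal
from itertools import accumulate


def generate_history(opening: Decimal, n: int, rng: random.Random) -> tuple:
    """Generate n transactions whose running balance never drops below zero."""
    balance = opening
    txs = []
    for _ in range(n):
        max_cents = int(balance * 100)
        if max_cents > 0 and rng.random() < 0.5:
            # Withdrawal capped at the current balance, so the constraint always holds.
            amount = -Decimal(rng.randint(1, max_cents)) / 100
        else:
            amount = Decimal(rng.randint(1, 50_000)) / 100  # deposit of up to 500.00
        balance += amount
        txs.append(amount)
    return txs, balance


def running_balances(opening: Decimal, txs: list) -> list:
    return list(accumulate(txs, initial=opening))


opening = Decimal("100.00")
txs, closing = generate_history(opening, 500, random.Random(7))
```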
Demographics
PersonService with locale packs (DE / US / VN), versioned and auditable
Deterministic by Design
- Frozen clocks + canonical hashing → reproducible IDs
- Seeded RNG → identical outputs across runs
- Schema validation (XSD/JSONSchema) → structural integrity
- Provenance hashing → audit-ready lineage
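Provenance hashing for audit-ready lineage can be pictured as a hash chain, where each step's hash covers its own content plus its parent's hash. This is a minimal sketch under that assumption; `record_hash` and `build_lineage` are hypothetical, and DATAMIMIC's actual provenance format is not described here.

```python
import hashlib
import json


def record_hash(record: dict, parent_hash: str) -> str:
    """Hash a record together with its parent's hash, forming a tamper-evident chain."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((parent_hash + body).encode("utf-8")).hexdigest()


def build_lineage(steps: list) -> list:
    hashes, parent = [], "0" * 64  # fixed genesis value
    for step in steps:
        parent = record_hash(step, parent)
        hashes.append(parent)
    return hashes


steps = [
    {"step": "seed", "value": "docs-demo"},
    {"step": "generate", "domain": "person", "count": 1},
    {"step": "export", "target": "CSV"},
]
lineage = build_lineage(steps)
```

Changing any step changes its hash and every hash after it, which is what makes the lineage tamper-evident.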
See the Developer Guide.
XML / Python Parity
Python:
from random import Random
from datamimic_ce.domains.common.models.demographic_config import DemographicConfig
from datamimic_ce.domains.healthcare.services import PatientService
cfg = DemographicConfig(age_min=70, age_max=75)
svc = PatientService(dataset="US", demographic_config=cfg, rng=Random(1337))
print(svc.generate().to_dict())
Equivalent XML:
<setup>
<generate name="seeded_seniors" count="3" target="CSV">
<variable name="patient" entity="Patient" dataset="US" ageMin="70" ageMax="75" rngSeed="1337" />
<key name="full_name" script="patient.full_name" />
<key name="age" script="patient.age" />
<array name="conditions" script="patient.conditions" />
</generate>
</setup>
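Both forms above pin the RNG seed (`Random(1337)` in Python, `rngSeed="1337"` in XML), which is what makes repeated runs produce identical rows. The stdlib-only sketch below mimics the XML job's shape (3 rows, ages 70 to 75, an array column written to CSV). `seeded_seniors` and `to_csv` are hypothetical stand-ins, not the `Patient` entity; inclusive age bounds and the `|` separator for the array column are assumptions.

```python
import csv
import io
import random


def seeded_seniors(count: int, age_min: int, age_max: int, seed: int) -> list:
    rng = random.Random(seed)  # same seed → same sequence of draws
    conditions_pool = ["Diabetes", "Hypertension", "Arthritis", "COPD"]
    rows = []
    for i in range(count):
        rows.append({
            "full_name": f"Patient {i + 1}",  # placeholder names for the sketch
            "age": rng.randint(age_min, age_max),  # inclusive bounds (assumed ageMin/ageMax semantics)
            "conditions": sorted(rng.sample(conditions_pool, k=rng.randint(0, 2))),
        })
    return rows


def to_csv(rows: list) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["full_name", "age", "conditions"])
    writer.writeheader()
    for row in rows:
        # Flatten the array column into one cell, since CSV has no native lists.
        writer.writerow({**row, "conditions": "|".join(row["conditions"])})
    return buf.getvalue()


rows = seeded_seniors(3, 70, 75, 1337)
print(to_csv(rows))
```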
CLI
# Run instant healthcare demo
datamimic demo create healthcare-example
datamimic run ./healthcare-example/datamimic.xml
# Verify version
datamimic version
Quality gates (repo):
make typecheck # mypy --strict
make lint # pylint (≥ 9.0 score target)
make coverage # target ≥ 90%
Architecture Snapshot
- Core pipeline: Determinism kit • Domain services • Schema validators
- Governance layer: Group tables • Linkage audits • Provenance hashing
- Execution layer: CLI • API • XML runners • MCP server
CE vs EE
| Feature | Community (CE) | Enterprise (EE) |
| ------------------------------------- | -------------- | --------------- |
| Deterministic domain generation | ✅ | ✅ |
| XML + Python pipelines | ✅ | ✅ |
| Healthcare & Finance domains | ✅ | ✅ |
| Multi-user collaboration | ❌ | ✅ |
| Governance & lineage dashboards | ❌ | ✅ |
| ML engines (Mostly AI, Synthcity, …) | ❌ | ✅ |
| RBAC & audit logging (HIPAA/GDPR/PCI) | ❌ | ✅ |
| EDIFACT / SWIFT adapters | ❌ | ✅ |
Compare editions • Book a strategy call
Documentation & Community
Get Started
pip install datamimic-ce
Generate data that makes sense, deterministically. ⭐ Star us on GitHub if DATAMIMIC improves your testing workflow.