SyntheticData
High-performance synthetic enterprise data generator. Produces 100+ interconnected financial tables — GL journal entries, document flows, subledgers, banking/KYC/AML, process mining (OCEL 2.0), graph exports (PyTorch Geometric, Neo4j), and 20+ process chains — with Benford's Law compliance, ACFE-aligned fraud labels, and formal privacy guarantees.
DataSynth v2.0.0
Synthetic enterprise data generation for ML training, audit analytics, and system testing.
DataSynth generates statistically realistic, fully interconnected enterprise financial data. It produces coherent General Ledger journal entries, document flows, subledger records, banking transactions, process mining event logs, and graph exports across 20+ enterprise process families.
Generated data respects accounting identities (debits = credits, Assets = Liabilities + Equity), follows empirical distributions (Benford's Law, log-normal mixtures), and maintains referential integrity across 100+ output tables.
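As a minimal sketch of what these identities mean in practice, the check below verifies that each journal entry balances and that the accounting equation holds on a tiny hand-written ledger. The tuple layout, account type names, and function names are assumptions made for illustration; they are not DataSynth's actual output schema.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical journal lines: (entry_id, account_type, debit, credit).
# This layout is an assumption, not DataSynth's actual schema.
LINES = [
    ("JE-1", "asset", Decimal("1000.00"), Decimal("0.00")),      # cash received
    ("JE-1", "equity", Decimal("0.00"), Decimal("1000.00")),     # owner contribution
    ("JE-2", "asset", Decimal("250.00"), Decimal("0.00")),       # inventory purchased
    ("JE-2", "liability", Decimal("0.00"), Decimal("250.00")),   # bought on credit
]

def unbalanced_entries(lines):
    """Return IDs of entries whose total debits differ from total credits."""
    net = defaultdict(Decimal)
    for entry_id, _, debit, credit in lines:
        net[entry_id] += debit - credit
    return [entry_id for entry_id, diff in net.items() if diff != 0]

def balance_sheet_totals(lines):
    """Aggregate balances by account type using normal-balance conventions."""
    totals = defaultdict(Decimal)
    for _, account_type, debit, credit in lines:
        # Assets carry debit balances; liabilities and equity carry credit balances.
        sign = 1 if account_type == "asset" else -1
        totals[account_type] += sign * (debit - credit)
    return totals

totals = balance_sheet_totals(LINES)
print("unbalanced entries:", unbalanced_entries(LINES))
print("A = L + E holds:", totals["asset"] == totals["liability"] + totals["equity"])
```

Using `Decimal` rather than floats keeps the equality checks exact, which matters when a validation rule is "difference must be zero".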
Table of Contents
- Quick Start
- Key Capabilities
- Architecture
- Installation
- Configuration
- Output Structure
- Python SDK
- Server & Deployment
- Desktop UI
- Privacy-Preserving Fingerprinting
- Use Cases
- Performance
- Documentation
- License
Quick Start
# Build from source
git clone https://github.com/mivertowski/SyntheticData.git
cd SyntheticData
cargo build --release
# Demo mode -- generates a complete dataset with defaults
./target/release/datasynth-data generate --demo --output ./demo-output
# Full audit simulation (113+ output files)
./target/release/datasynth-data generate --demo --preset audit-group --output ./audit-output
# Or configure for your use case
./target/release/datasynth-data init --industry manufacturing --complexity medium -o config.yaml
./target/release/datasynth-data validate --config config.yaml
./target/release/datasynth-data generate --config config.yaml --output ./output
Group Audit Simulation
The audit-group preset generates a complete enterprise group audit dataset following ISA, IFRS, US GAAP, and local regulations:
./target/release/datasynth-data generate --demo --preset audit-group --output ./audit-output
# Export in SAP, French (FEC), or German (GoBD) audit formats
./target/release/datasynth-data generate --config config.yaml --output ./output --export-format sap --export-format fec
This produces 113+ interconnected files:
| Category | Content |
|----------|---------|
| Financial Statements | Standalone + consolidated BS/IS/CF with elimination schedules |
| Audit Lifecycle | Engagement, risk assessment, procedures, sampling, findings, opinion |
| ISA 600 Group Audit | Component auditors, materiality allocation, scope, instructions, reports |
| Risk Assessment | Combined Risk Assessment (CRA) per account area and assertion |
| Audit Methodology | Materiality (ISA 320), sampling (ISA 530), analytical procedures (ISA 520) |
| Accounting Standards | Deferred tax, ECL, provisions, pensions, stock comp, business combinations |
| SOX Compliance | Section 302 certifications, Section 404 ICFR assessments |
| Graph Export | 78+ entity types, 39+ edge types for ML training and AI agent interaction |
The outputs are causally linked: the CRA drives sampling, sampling correlates with misstatement rates, misstatements drive findings, and findings drive the audit opinion.
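The toy model below illustrates that chain under heavy simplification. It is not how DataSynth implements it: the sample sizes, the materiality threshold, the two-outcome opinion, and all function names are assumptions chosen to keep the example short, and findings are collapsed into the list of misstatements.

```python
# Toy model of the chain CRA -> sample size -> misstatements -> opinion.
# Every name, sample size, and threshold here is an illustrative assumption.
SAMPLE_SIZE_BY_CRA = {"low": 10, "medium": 25, "high": 60}

def select_sample(population, cra_level):
    """Take the first n items; a real engine would sample randomly."""
    return population[:SAMPLE_SIZE_BY_CRA[cra_level]]

def find_misstatements(sample):
    """sample: list of (book_amount, audited_amount) pairs."""
    return [book - audited for book, audited in sample if book != audited]

def derive_opinion(misstatements, materiality):
    """Simplified: ignores pervasiveness, so only two outcomes are modeled."""
    total = sum(abs(m) for m in misstatements)
    return "unmodified" if total < materiality else "qualified"

# 100 items; the 59th item is overstated by 80.
population = [(100, 100)] * 58 + [(500, 420)] + [(100, 100)] * 41

def run_audit(cra_level, materiality=50):
    sample = select_sample(population, cra_level)
    return derive_opinion(find_misstatements(sample), materiality)

print(run_audit("low"), run_audit("high"))
```

With a low CRA the sample stops before the overstated item, so nothing is found; with a high CRA the larger sample reaches it and the opinion changes. That dependency between risk level and outcome is the kind of correlation the generated data is described as preserving.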
Key Capabilities
Statistical Foundations
- Distribution engine -- Log-normal mixtures, Gaussian mixtures, Pareto, Weibull, Beta, and zero-inflated distributions with configurable components
- Copula correlations -- Cross-field dependency modeling via Gaussian, Clayton, Gumbel, Frank, and Student-t copulas
- Benford's Law -- First and second-digit compliance with configurable deviation for anomaly injection
- Temporal patterns -- Month-end/quarter-end/year-end volume spikes, intraday segments, business day calendars (15 regions), processing lags, and fiscal calendar support
- Regime changes -- Economic cycles, acquisition effects, and structural breaks in time series
- Industry profiles -- Pre-configured distributions for Retail, Manufacturing, Financial Services, Healthcare, and Technology
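To make the first two ideas concrete, here is a small standard-library sketch that draws amounts from a log-normal mixture and compares their first-digit frequencies with Benford's Law. The mixture weights and parameters are arbitrary illustrative values, and the function names are hypothetical; none of this reflects DataSynth's internal distribution engine.

```python
import math
import random
from collections import Counter

def benford_expected():
    """First-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def sample_amounts(n, components, seed=42):
    """Draw n amounts from a log-normal mixture.

    components: list of (weight, mu, sigma) tuples (illustrative values only).
    """
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    amounts = []
    for _ in range(n):
        # Pick a mixture component by weight, then sample from it.
        _, mu, sigma = rng.choices(components, weights=weights)[0]
        amounts.append(rng.lognormvariate(mu, sigma))
    return amounts

def first_digit_freq(amounts):
    """Observed first-significant-digit frequencies for positive amounts."""
    # Scientific notation always puts the leading significant digit first.
    counts = Counter(int(f"{a:e}"[0]) for a in amounts)
    return {d: counts[d] / len(amounts) for d in range(1, 10)}

amounts = sample_amounts(20_000, [(0.7, 4.0, 1.2), (0.3, 7.0, 1.5)])
observed = first_digit_freq(amounts)
expected = benford_expected()
max_dev = max(abs(observed[d] - expected[d]) for d in range(1, 10))
print(f"max absolute deviation from Benford: {max_dev:.4f}")
```

Log-normal data that spans several orders of magnitude tends to follow Benford's Law closely, which is why the two features pair naturally. Narrowing `sigma` would increase the deviation, which is one simple way to think about anomaly injection.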
Enterprise Process Simulation
Every process chain generates its own master data, documents, and journal entries -- all cross-referenced:
| Process Family | Scope |
|----------------|-------|
| General Ledger | Journal entries, chart of accounts (small/medium/large), ACDOCA event logs |
| Procure-to-Pay | Purchase requisitions, POs, goods receipts, vendor invoices, payments, three-way match |
| Order-to-Cash | Sales orders, deliveries, customer invoices, receipts, dunning |
| Source-to-Contract | Spend analysis, sourcing projects, supplier qualification, RFx, bids, contracts, scorecards |
| Hire-to-Retire | Payroll runs, tax/deduction calculations, time & attendance, expense reports, benefit enrollment |
| Manufacturing | Production orders, BOM explosion, routing operations, WIP costing, quality inspections, cycle counts |
| Financial Reporting | Balance sheet, income statement, cash flow, changes in equity, KPIs, budget variance |
| Tax Accounting | Multi-jurisdiction tax (Federal/State/Local), VAT/GST returns, ASC 740/IAS 12 provisions, FIN 48 uncertain positions, withholding |
| Treasury | Cash positioning, probability-weighted forecasts, cash pooling, hedging (ASC 815/IFRS 9), debt covenants, netting |
| Project Accounting | WBS hierarchies, cost lines, percentage-of-completion revenue, earned value (SPI/CPI/EAC), change orders |
| ESG / Sustainability | GHG Scope 1/2/3 emissions, energy/water/waste, workforce diversity, safety metrics, GRI/SASB/TCFD disclosures |
| Intercompany | IC matching, transfer pricing, consolidation eliminations, currency translation |
| Subledgers | AR, AP, Fixed Assets, Inventory -- each with GL reconciliation |
| Period Close | Monthly close engine, depreciation runs, accruals, year-end closing entries |
| Banking / KYC / AML | Customer personas, KYC profiles, AML typologies (structuring, layering, mule, funnel) |
| Sales | Quote-to-order pipeline with win rate modeling and pricing negotiation |
| Bank Reconciliation | Statement matching, outstanding checks, deposits in transit |
| Audit | ISA lifecycle: engagements, workpapers, evidence, risk assessments, findings, opinions (ISA 700), KAMs (ISA 701), SOX 302/404 |
| Group Audit (ISA 600) | Component auditors, materiality allocation, scope assignment, component instructions/reports, consolidation |
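As an illustration of how cross-referenced documents can be checked against one another, the sketch below implements a basic three-way match from the Procure-to-Pay row: purchase order, goods receipt, and vendor invoice are compared on quantity and price. The class names, fields, and the 2% price tolerance are hypothetical and do not describe DataSynth's document model.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical document shapes; field names are assumptions for illustration.
@dataclass
class PurchaseOrderLine:
    quantity: int
    unit_price: Decimal

@dataclass
class GoodsReceipt:
    quantity: int

@dataclass
class VendorInvoice:
    quantity: int
    unit_price: Decimal

def three_way_match(po, gr, inv, price_tolerance=Decimal("0.02")):
    """Return a list of mismatch reasons; an empty list means the match passed."""
    issues = []
    if gr.quantity > po.quantity:
        issues.append("received quantity exceeds ordered quantity")
    if inv.quantity > gr.quantity:
        issues.append("invoiced quantity exceeds received quantity")
    # Price may differ from the PO by at most the relative tolerance.
    allowed = po.unit_price * price_tolerance
    if abs(inv.unit_price - po.unit_price) > allowed:
        issues.append("invoice price outside tolerance of PO price")
    return issues

po = PurchaseOrderLine(quantity=100, unit_price=Decimal("12.50"))
gr = GoodsReceipt(quantity=100)
clean = VendorInvoice(quantity=100, unit_price=Decimal("12.60"))
suspicious = VendorInvoice(quantity=120, unit_price=Decimal("14.00"))

print(three_way_match(po, gr, clean))
print(three_way_match(po, gr, suspicious))
```

Returning a list of reasons instead of a boolean makes the result usable as a label, for example when a failed match is meant to mark an injected anomaly.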
Accounting, Audit & Compliance Standards
- Accounting frameworks -- US GAAP, IFRS, French GAAP (PCG), German GAAP (HGB/SKR04), and dual reporting
- Revenue recognition -- ASC 606 / IFRS 15 with contract generation, performance obligations, and SSP allocation
- Leases -- ASC 842 / IFRS 16 with ROU assets, lease liabilities, and classification
- Fair value -- ASC 820 / IFRS 13 Level 1/2/3 hierarchy
- Impairment -- ASC 360 / IAS 36 testing with fair value estimation
- Audit standards -- ISA (34 standards), PCAOB (19+ standards) with procedure mapping
- SOX compliance -- Section 302/404 assessments with deficiency classification and material weakness detection
- COSO 2013 -- 5 components, 17 principles, maturity levels, entity-level and transaction-level controls
- Compliance regulations -- 45+ built-in standards registry, jurisdiction profiles (10 countries), regulatory filings, audit procedures, and compliance findings with full deficiency classification
- Cross-domain compliance graph -- Standards linked to GL account types and business processes; full traversal paths (Company -> Jurisdiction -> Standard -> Account -> JournalEntry)
- Localized exports -- FEC (French) and GoBD (German) audit file formats
- Enterprise Group Audit (ISA 600) -- Component auditor assignment, group materiality allocation, scope assignment (full/specific/analytical), component instructions and reports
- Audit Opinion (ISA 700/705/706/701) -- Opinion derived from findings severity and going concern, Key Audit Matters, PCAOB ICFR opinion
- Audit Methodology -- Combined Risk Assessment (ISA 315), materiality calculations (ISA 320), sampling methodology (ISA 530), SCOTS classification, unusual item detection, analytical relationships (ISA 520)
- Deferred Tax (IAS 12 / ASC 740) -- Temporary differences, ETR reconciliation, rollforward schedules, valuation allowances
- Business Combinations (IFRS 3 / ASC 805) -- Purchase price allocation, fair value step-ups, goodwill, contingent consideration
- Segment Reporting (IFRS 8 / ASC 280) -- Operating segments with reconciliation to consolidated totals
- Expected Credit Loss (IFRS 9 / ASC 326) -- Provision matrix by aging bucket, forward-looking scenarios, ECL movements
- Pensions (IAS 19 / ASC 715) -- DBO rollforward, plan assets, pension expense, OCI remeasurements
- Provisions (IAS 37 / ASC 450) -- Framework-aware recognition thresholds, provision movements
- Stock Compensation (ASC 718 / IFRS 2) -- Grants, vesting schedules, expense recognition
- Functional Currency (IAS 21) -- Per-entity functional currency, CTA as OCI
- Consolidated Financial Statements -- Standalone + consolidated with elimination schedules
- Going Concern (ISA 570) -- Financial indicator derivation, management mitigation plans
- Subsequent Events (ISA 560 / IAS 10) -- Adjusting and non-adjusting events
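The Expected Credit Loss item above mentions a provision matrix by aging bucket with forward-looking scenarios. A minimal sketch of that calculation follows, assuming the common simplified approach of multiplying each bucket's exposure by a loss rate and then probability-weighting scenario multipliers. The bucket names, loss rates, and scenario weights are invented for illustration and are not taken from DataSynth.

```python
from decimal import Decimal

# Illustrative loss rates per aging bucket (assumed values). Real rates come
# from historical loss experience adjusted for current conditions.
LOSS_RATES = {
    "current": Decimal("0.005"),
    "1-30": Decimal("0.02"),
    "31-60": Decimal("0.05"),
    "61-90": Decimal("0.15"),
    "90+": Decimal("0.40"),
}

# Forward-looking scenarios as (probability, loss-rate multiplier); assumed values.
SCENARIOS = [
    (Decimal("0.6"), Decimal("1.0")),   # base case
    (Decimal("0.3"), Decimal("1.5")),   # downside
    (Decimal("0.1"), Decimal("0.8")),   # upside
]

def base_ecl(exposure_by_bucket, loss_rates=LOSS_RATES):
    """Sum of exposure times loss rate over all aging buckets."""
    return sum(
        (amount * loss_rates[bucket] for bucket, amount in exposure_by_bucket.items()),
        Decimal("0"),
    )

def weighted_ecl(exposure_by_bucket, scenarios=SCENARIOS):
    """Probability-weighted ECL across forward-looking scenarios."""
    base = base_ecl(exposure_by_bucket)
    return sum((prob * mult * base for prob, mult in scenarios), Decimal("0"))

receivables = {
    "current": Decimal("100000"),
    "1-30": Decimal("20000"),
    "31-60": Decimal("10000"),
    "61-90": Decimal("4000"),
    "90+": Decimal("2000"),
}

print("base ECL:", base_ecl(receivables))
print("scenario-weighted ECL:", weighted_ecl(receivables))
```

The difference between the provision at two reporting dates would then be the ECL movement for the period.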
YAML-Driven Audit FSM Engine
The datasynth-audit-fsm crate provides a methodology-agnostic state machine engine that loads audit methodology blueprints from YAML and generates even
