Tardygrada
Trust infrastructure for AI agents. Know who produced a value, when, and that it hasn't been tampered with. Zero dependencies. Pure C.
Your agent says it checked three sources. Did it?
Your document says "completed on time" on page 2 and "delayed 3 months" on page 7. Did anyone notice?
Your scoring pipeline passed through 5 agents. Can you prove the scores weren't changed along the way?
```bash
git clone https://github.com/fabio-rovai/tardygrada && cd tardygrada && make
tardy run "Paris is in France"                 # VERIFIED (80%)
tardy verify-doc report.md                     # 2 contradictions found
tardy daemon start && tardy run "check this"   # persistent, remembers everything
```
What it does
Catches lazy agents
Your agent claims it queried the knowledge base, consulted sources, and cross-checked. Tardygrada records every operation independently — like a dashcam. If the agent faked it, you'll know.
| Laziness type | What it means | Caught? |
|---|---|:-:|
| NoWork | Did nothing, produced output anyway | Yes |
| ShallowWork | Skimmed instead of analyzing | Yes |
| FakeProof | Fabricated evidence of work | Yes |
| CopiedWork | Copied another agent's answer | Yes |
| CircularVerification | "Verified" itself in a circle | Yes |
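The dashcam idea boils down to comparing what an agent claims it did with what an independent recorder actually logged. Below is a minimal sketch of that comparison, not Tardygrada's implementation. The function `check_work`, the operation names, and the rules that map an empty log to NoWork and an unlogged claim to FakeProof are all assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical verdicts named after two rows of the table above. */
typedef enum { WORK_OK, WORK_NO_WORK, WORK_FAKE_PROOF } work_verdict;

/* Compare the operations an agent claims with those an independent
   recorder observed. Assumed rules (illustrative only):
   - nothing recorded at all            -> NoWork
   - a claimed operation never recorded -> FakeProof */
static work_verdict check_work(const char *const *claimed, size_t n_claimed,
                               const char *const *recorded, size_t n_recorded) {
    if (n_recorded == 0)
        return WORK_NO_WORK;
    for (size_t i = 0; i < n_claimed; i++) {
        int seen = 0;
        for (size_t j = 0; j < n_recorded && !seen; j++)
            seen = strcmp(claimed[i], recorded[j]) == 0;
        if (!seen)
            return WORK_FAKE_PROOF; /* claimed, but the recorder never saw it */
    }
    return WORK_OK;
}
```

Take an agent that claims `kb_query`, `fetch_source`, and `cross_check` while the recorder logged only `kb_query`. This sketch returns `WORK_FAKE_PROOF` for it.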
Catches contradicting claims
"The project was completed on time." and "The project was delayed by 3 months." — both sound fine alone. Together, they're a contradiction. Existing tools check claims one by one and miss this.
Tardygrada checks them together. Three layers:
- Logical contradictions (direct opposites, impossible combinations)
- Numeric contradictions (the math doesn't add up)
- Domain contradictions (the science doesn't work)
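The numeric layer catches claims whose math doesn't add up. The simplest case is a set of parts that should sum to a stated total. The sketch below assumes the numbers have already been extracted from the document. The function name and the tolerance parameter are hypothetical and do not come from Tardygrada's API.

```c
#include <stddef.h>

/* Hypothetical numeric check: do the stated parts add up to the stated total?
   Returns 1 (contradiction) when |sum(parts) - total| exceeds tol, else 0. */
static int parts_contradict_total(const double *parts, size_t n,
                                  double total, double tol) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += parts[i];
    double diff = sum - total;
    if (diff < 0)
        diff = -diff;
    return diff > tol;
}
```

Suppose a report lists $1,200, $800, and $400 of costs and then gives a total of $2,500. With these parts, this check flags the total as contradictory.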
```bash
tardy verify-doc paper.md
# [CONFLICT] Lines 42 vs 89:
#   "We used no external APIs"
#   "API costs totalled $2,400"
#   → claims no APIs but reports API costs
```
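The API example above is a cross-claim rule: a positive cost for something implies it was used, so a claim that it was *not* used conflicts with it. The sketch below encodes just that one rule. It assumes claims have already been decomposed into triples with a polarity flag. The `claim` struct, the `uses`/`cost_of` predicate names, and `find_conflict` are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical decomposed claim: subject-predicate-object plus polarity. */
typedef struct {
    const char *subject, *predicate, *object;
    int negated;    /* 1 for statements like "we used no external APIs" */
    double amount;  /* numeric payload, e.g. a cost; 0 if none */
    int line;       /* source line, for reporting */
} claim;

/* Assumed rule: "S cost_of O" with amount > 0 implies "S uses O",
   so it conflicts with a negated "S uses O". */
static int violates_cost_rule(const claim *neg, const claim *cost) {
    return neg->negated && strcmp(neg->predicate, "uses") == 0 &&
           !cost->negated && strcmp(cost->predicate, "cost_of") == 0 &&
           cost->amount > 0 &&
           strcmp(neg->subject, cost->subject) == 0 &&
           strcmp(neg->object, cost->object) == 0;
}

/* Check every ordered pair of claims; on the first conflict, store the two
   indices in *a and *b and return 1. Return 0 if no pair conflicts. */
static int find_conflict(const claim *cs, size_t n, size_t *a, size_t *b) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i != j && violates_cost_rule(&cs[i], &cs[j])) {
                *a = i;
                *b = j;
                return 1;
            }
    return 0;
}
```

Feed it the two claims from lines 42 and 89, and `find_conflict` reports that pair. No single claim is wrong on its own; the problem only shows up when they are checked together.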
Catches tampered data
A score of 8.5 stored in a Python dict — any agent can silently change it to 9.5. In Tardygrada, values are locked by the operating system. Tampering requires breaking SHA-256 or forging an ed25519 signature.
Get started
Just the CLI:
```bash
make                            # builds in < 3 seconds
tardy run "your claim here"     # verify anything
tardy verify-doc your-file.md   # scan for contradictions
```
Persistent mode (remembers between runs):
```bash
tardy daemon start    # start background service
tardy run "claim"     # uses persistent knowledge base
tardy daemon status   # see what it knows
tardy daemon stop     # clean shutdown
```
Inside Claude Code (MCP server):
```json
{
  "mcpServers": {
    "tardygrada": {
      "command": "tardygrada",
      "args": ["mcp-bridge"]
    }
  }
}
```
Then just ask: "verify this document for contradictions"
Inside Claude Code (session monitor):
/targyactivate
Activates Tardygrada as a contradiction monitor for the entire session. Every claim you and Claude make is recorded in the palace memory and checked against session history. If either side contradicts itself, Tardygrada flags it. Say "targy off" to deactivate.
Inside Qwen Code (MCP server):
Qwen Code uses newline-delimited JSON-RPC instead of Content-Length framing. Use the included adapter:
```json
{
  "mcpServers": {
    "tardygrada": {
      "command": "/bin/bash",
      "args": ["path/to/tardygrada/hooks/targy-mcp-wrapper.sh"]
    }
  }
}
```
This gives Qwen Code access to verify_claim, verify_document, spawn_agent, read_agent, and daemon_status as native MCP tools. The wrapper starts the daemon automatically if it isn't running.
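The two framings differ only in how message boundaries are marked. Still, a client expecting one framing cannot read the other. The sketch below converts a single message in each direction. It assumes a NUL-terminated buffer holding exactly one message whose only header is `Content-Length`. The function names are hypothetical, and this is not how the included `targy-mcp-wrapper.sh` is implemented.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: Content-Length framed message -> one newline-delimited line.
   Assumes msg is NUL-terminated, holds one complete message, and carries
   no header other than "Content-Length". Returns bytes written, or -1. */
static int framed_to_line(const char *msg, char *out, size_t out_cap) {
    static const char prefix[] = "Content-Length: ";
    if (strncmp(msg, prefix, sizeof prefix - 1) != 0)
        return -1;
    char *end;
    unsigned long n = strtoul(msg + sizeof prefix - 1, &end, 10);
    if (strncmp(end, "\r\n\r\n", 4) != 0)
        return -1;
    const char *body = end + 4;
    if (strlen(body) < n || n + 2 > out_cap)
        return -1;
    /* Raw CR/LF can only appear in valid JSON as insignificant whitespace
       (inside strings they must be escaped), so turning them into spaces
       keeps the message on one line without changing its meaning. */
    for (unsigned long i = 0; i < n; i++)
        out[i] = (body[i] == '\n' || body[i] == '\r') ? ' ' : body[i];
    out[n] = '\n';
    out[n + 1] = '\0';
    return (int)(n + 1);
}

/* Hypothetical: one newline-delimited line -> Content-Length framed message.
   The body ends at the first newline; the length is counted in bytes. */
static int line_to_framed(const char *line, char *out, size_t out_cap) {
    size_t n = strcspn(line, "\n");
    int w = snprintf(out, out_cap, "Content-Length: %zu\r\n\r\n%.*s",
                     n, (int)n, line);
    return (w < 0 || (size_t)w >= out_cap) ? -1 : w;
}
```

A real adapter also has to cope with partial reads and with extra headers such as `Content-Type`; this sketch deliberately leaves both out.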
Convert your existing agents:
```bash
tardy terraform /path/to/crewai       # 153K lines → 53 instructions
tardy terraform /path/to/llamaindex   # 237K lines → 15 instructions
```
How well does it work?
Laziness detection
| Traces | Precision | Recall | F1 |
|---|:-:|:-:|:-:|
| Clear cases (60 traces) | 1.00 | 1.00 | 1.00 |
| + Adversarial (100 total) | 1.00 | 0.85 | 0.92 |
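In the adversarial row, precision stays at 1.00, so the drop in F1 comes entirely from recall. F1 is the harmonic mean of precision $P$ and recall $R$:

```math
F_1 = \frac{2PR}{P + R} = \frac{2 \times 1.00 \times 0.85}{1.00 + 0.85} = \frac{1.70}{1.85} \approx 0.92
```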
100 traces total, with zero false positives. Smart copiers who change 10-15% of the text slip through because their similarity falls below the detection threshold; this is a known limitation. No existing tool does any of this.
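A simple similarity metric shows how a small rewrite can slip under a fixed threshold. The sketch below uses Jaccard similarity over word sets with a 0.9 cut-off. Both the metric and the threshold are assumptions for illustration; the README does not say which similarity measure Tardygrada uses.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 256
#define MAX_WORD_LEN 32
#define COPY_THRESHOLD 0.9 /* hypothetical cut-off */

/* Collect the unique space-separated words of text into set (at most cap
   words; longer words are truncated). Returns the number of words stored. */
static size_t word_set(const char *text, char set[][MAX_WORD_LEN], size_t cap) {
    size_t n = 0;
    while (*text != '\0') {
        while (*text == ' ')
            text++;
        if (*text == '\0')
            break;
        size_t len = strcspn(text, " ");
        size_t k = len < MAX_WORD_LEN - 1 ? len : MAX_WORD_LEN - 1;
        char w[MAX_WORD_LEN];
        memcpy(w, text, k);
        w[k] = '\0';
        text += len;
        int dup = 0;
        for (size_t i = 0; i < n && !dup; i++)
            dup = strcmp(set[i], w) == 0;
        if (!dup && n < cap)
            memcpy(set[n++], w, k + 1);
    }
    return n;
}

/* Jaccard similarity (size of intersection / size of union) of the two
   texts' word sets. Uses static buffers, so it is not thread-safe. */
static double jaccard(const char *a, const char *b) {
    static char sa[MAX_WORDS][MAX_WORD_LEN], sb[MAX_WORDS][MAX_WORD_LEN];
    size_t na = word_set(a, sa, MAX_WORDS);
    size_t nb = word_set(b, sb, MAX_WORDS);
    size_t inter = 0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (strcmp(sa[i], sb[j]) == 0) {
                inter++;
                break;
            }
    size_t uni = na + nb - inter;
    return uni == 0 ? 1.0 : (double)inter / (double)uni;
}

/* Hypothetical CopiedWork test: flag when similarity reaches the threshold. */
static int looks_copied(const char *a, const char *b) {
    return jaccard(a, b) >= COPY_THRESHOLD;
}
```

Under these assumptions, changing one word in a ten-word answer (10% of the text) drops the similarity to 9/11, about 0.82. That is below the 0.9 cut-off, so the copy goes unflagged, which is one way the limitation above can arise.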
Contradiction and hallucination detection
| Dataset | What it is | Tardygrada | Best alternative |
|---|---|:-:|:-:|
| Clear contradictions (125) | Designed compositional contradictions | 95% | SelfCheck: 59% |
| + Borderline cases (225 total) | Soft/ambiguous contradictions | 69% | SelfCheck: 38% |
| AgentHallu (693 trajectories) | Real agent hallucinations, 7 frameworks | F1: 0.58 | DeepSeek-V3.1: 0.52 |
| ContraDoc (891 docs) | Real documents, human-annotated | F1: 0.58 | SelfCheck: 0.16 |
| HaluEval (500 responses) | Individual factual errors | F1: 0.03 | SelfCheck: 0.32 |
Detection runs in two modes: deterministic (all benchmarks use this) or LLM-enhanced for broader coverage. Typical speeds: 5.7ms/trajectory (AgentHallu), 7.5ms/document (ContraDoc), 0.015ms/case (synthetic).
On ContraDoc (891 real documents), F1 is 0.58. It was 0.16 before a bug fix: the benchmark had been running the SelfCheck baseline instead of proper triple checking. Fixing it raised recall from 9.1% to 64.8%.
On AgentHallu (693 real agent trajectories), F1 is 0.58, beating DeepSeek-V3.1 (0.52). GPT-5 scores 0.70 but requires API calls for every trajectory.
On HaluEval (individual factual errors), F1 is 0.03. This is expected: our pipeline catches contradictions *between* claims, not individual factual mistakes. SelfCheck does better here (0.32) because its looser heuristics happen to catch some of those errors.
What runs where: contradiction detection (`verify-doc`, all benchmarks) uses the internal decomposition, consistency, and numeric layers, with no external calls. Claim grounding (`tardy run "claim"`) optionally connects to open-ontologies for OWL reasoning, or uses the built-in Datalog engine. Different features, different paths.

<details>
<summary>AgentHallu per-category recall</summary>

| Category | Recall |
|---|:-:|
| Reasoning | 68% |
| Planning | 66% |
| Retrieval | 59% |
| Human-Interaction | 53% |
| Tool-Use | 21% |

</details>
<details>
<summary>Detailed breakdown (clear cases)</summary>

| Difficulty | Detection |
|---|:-:|
| Easy (direct opposites) | 100% |
| Medium (logical) | 100% |
| Hard (math/physics) | 96% |
| Subtle (domain knowledge) | 92% |
| Very subtle (statistical) | 88% |

</details>

Scaling

| Agents | Time |
|-------:|-----:|
| 5 | 0.6 ms |
| 500 | 21 ms |
| 5,000 | 97 ms |
Under the hood
<details>
<summary><b>How verification works</b></summary>

```mermaid
graph LR
    subgraph Pipeline["Verification Pipeline"]
        direction LR
        C["Claim"] --> D["Decompose"]
        D --> G["Ground"]
        G --> CON["Consistency"]
        CON --> P["Probabilistic"]
        P --> PR["Protocol"]
        PR --> F["Certification"]
        F --> CR["Cross-Rep"]
        CR --> W["Work Verify"]
        W --> V{"VERIFIED /<br>CONFLICT /<br>UNVERIFIABLE"}
    end
    style Pipeline fill:transparent
```
Claims are decomposed into triples, grounded against a knowledge base, checked for consistency, scored probabilistically, and verified for work integrity. Eight layers, all deterministic.
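A toy triple store is enough to sketch the grounding step and its three possible outcomes. The sketch assumes claims are already decomposed into subject-predicate-object triples and that some predicates allow only one value per subject. The `ground` function, the predicate names, and the single-valued rule are all hypothetical. Unlike `tardy run`, this sketch reports no confidence score.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *s, *p, *o; } triple;
typedef enum { VERIFIED, CONFLICT, UNVERIFIABLE } verdict;

/* Hypothetical: predicates that allow only one object per subject. */
static int single_valued(const char *pred) {
    return strcmp(pred, "capital_of") == 0 || strcmp(pred, "born_in") == 0;
}

/* Ground one claim against a knowledge base of triples:
   exact match -> VERIFIED; same subject and single-valued predicate but a
   different object -> CONFLICT; anything else -> UNVERIFIABLE. */
static verdict ground(const triple *claim, const triple *kb, size_t n) {
    int other_object = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kb[i].s, claim->s) != 0 || strcmp(kb[i].p, claim->p) != 0)
            continue;
        if (strcmp(kb[i].o, claim->o) == 0)
            return VERIFIED;
        other_object = 1;
    }
    return (other_object && single_valued(claim->p)) ? CONFLICT : UNVERIFIABLE;
}
```

Suppose the knowledge base holds (Paris, capital_of, France). Then (Paris, capital_of, Germany) comes back as CONFLICT, while (Paris, twinned_with, Rome) is UNVERIFIABLE, because nothing in the base speaks to it.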
</details>
<details>
<summary><b>How tamper protection works</b></summary>

```mermaid
graph LR
    subgraph Trust["Protection Levels"]
        direction LR
        MUT["Mutable"] --> DEF["Default<br>(OS-locked)"]
        DEF --> VER["Verified<br>(+ SHA-256)"]
        VER --> HARD["Hardened<br>(+ replicas)"]
        HARD --> SOV["Sovereign<br>(+ ed25519 + BFT)"]
    end
    style Trust fill:transparent
```
Values are protected at the operating system level. The OS kernel enforces read-only memory. SHA-256 hashes detect any change. Ed25519 signatures prove authorship. BFT (Byzantine fault-tolerant) consensus means an attacker would have to corrupt multiple independent replicas.
</details>
<details>
<summary><b>How the daemon works</b></summary>

```mermaid
graph TB
    subgraph visible["What you see"]
        USER["You"] --> CLI["tardy run / verify-doc"]
    end
    subgraph hidden["What happens"]
        CLI --> DAEMON["Persistent daemon"]
        DAEMON --> AGENTS["Living agents"]
        DAEMON --> KB["Growing knowledge base"]
        DAEMON --> VERIFY["Verification pipeline"]
    end
    style visible fill:transparent
    style hidden fill:transparent
```
The daemon keeps agents alive between commands. The knowledge base grows as verified claims accumulate. Sovereign agents persist to disk on shutdown and reload on restart.
</details>
<details>
<summary><b>Architecture</b></summary>

```mermaid
graph TB
    subgraph Tardygrada["Tardygrada"]
        CLI_CMD["CLI"] --> DAEMON_S["Daemon"]
        DAEMON_S --> VM["VM Core"]
        VM --> VERIFY_S["Verification"]
        VM --> ONTO["Knowledge Base"]
        VM --> CRYPTO_S["Cryptography"]
        VERIFY_S --> DECOMP_S["Decompose"]
        VERIFY_S --> NUMERIC_S["Numeric Check"]
        VERIFY_S --> DOMAIN_S["Domain Check"]
        VERIFY_S --> WORK_S["Work Verify"]
    end
    subgraph External["Optional integrations"]
        BITF["brain-in-the-fish<br>(multi-agent debate)"]
        OO["open-ontologies<br>(OWL reasoning)"]
    end
    VM -- "coordinate" --> BITF
    VM -- "grounded_in" --> OO
    style Tardygrada fill:transparent
    style External fill:transparent
```
</details>
<details>
<summary><b>The language (for power users)</b></summary>
agent MedicalAdvisor @sovereign @semantics(truth.min_confidence: 0.99) {
invar
