
Tardygrada

Trust infrastructure for AI agents. Know who produced a value, when, and that it hasn't been tampered with. Zero dependencies. Pure C.


CI License: MIT

<p align="center"> <img src="tardygrada-logo.png" alt="Tardygrada" width="200"> </p> <h3 align="center">Catch lazy agents, contradicting claims, and tampered data</h3>

Your agent says it checked three sources. Did it?

Your document says "completed on time" on page 2 and "delayed 3 months" on page 7. Did anyone notice?

Your scoring pipeline passed through 5 agents. Can you prove the scores weren't changed along the way?

git clone https://github.com/fabio-rovai/tardygrada && cd tardygrada && make

tardy run "Paris is in France"                    # VERIFIED (80%)
tardy verify-doc report.md                        # 2 contradictions found
tardy daemon start && tardy run "check this"      # persistent, remembers everything

What it does

Catches lazy agents

Your agent claims it queried the knowledge base, consulted sources, and cross-checked. Tardygrada records every operation independently — like a dashcam. If the agent faked it, you'll know.

| Laziness type | What it means | Caught? |
|---|---|:-:|
| Did nothing, produced output anyway | NoWork | Yes |
| Skimmed instead of analyzing | ShallowWork | Yes |
| Fabricated evidence of work | FakeProof | Yes |
| Copied another agent's answer | CopiedWork | Yes |
| "Verified" itself in a circle | CircularVerification | Yes |

Catches contradicting claims

"The project was completed on time." and "The project was delayed by 3 months." — both sound fine alone. Together, they're a contradiction. Existing tools check claims one by one and miss this.

Tardygrada checks them together. Three layers:

  • Logical contradictions (direct opposites, impossible combinations)
  • Numeric contradictions (the math doesn't add up)
  • Domain contradictions (the science doesn't work)

tardy verify-doc paper.md
# [CONFLICT] Lines 42 vs 89:
#   "We used no external APIs"
#   "API costs totalled $2,400"
#   → claims no APIs but reports API costs

Catches tampered data

A score of 8.5 stored in a Python dict — any agent can silently change it to 9.5. In Tardygrada, values are locked by the operating system. Tampering requires breaking SHA-256 or forging an ed25519 signature.


Get started

Just the CLI:

make                                    # builds in < 3 seconds
tardy run "your claim here"             # verify anything
tardy verify-doc your-file.md           # scan for contradictions

Persistent mode (remembers between runs):

tardy daemon start                      # start background service
tardy run "claim"                       # uses persistent knowledge base
tardy daemon status                     # see what it knows
tardy daemon stop                       # clean shutdown

Inside Claude Code (MCP server):

{
  "mcpServers": {
    "tardygrada": {
      "command": "tardygrada",
      "args": ["mcp-bridge"]
    }
  }
}

Then just ask: "verify this document for contradictions"

Inside Claude Code (session monitor):

/targyactivate

Activates Tardygrada as a contradiction monitor for the entire session. Every claim you and Claude make is recorded in the palace memory and checked against session history. If either side contradicts itself, Tardygrada flags it. Say targy off to deactivate.

Inside Qwen Code (MCP server):

Qwen Code uses newline-delimited JSON-RPC instead of Content-Length framing. Use the included adapter:

{
  "mcpServers": {
    "tardygrada": {
      "command": "/bin/bash",
      "args": ["path/to/tardygrada/hooks/targy-mcp-wrapper.sh"]
    }
  }
}

This gives Qwen Code access to verify_claim, verify_document, spawn_agent, read_agent, and daemon_status as native MCP tools. The wrapper starts the daemon automatically if it isn't running.

Convert your existing agents:

tardy terraform /path/to/crewai         # 153K lines → 53 instructions
tardy terraform /path/to/llamaindex     # 237K lines → 15 instructions

How well does it work?

Laziness detection

| | Precision | Recall | F1 |
|---|:-:|:-:|:-:|
| Clear cases (60 traces) | 1.00 | 1.00 | 1.00 |
| + Adversarial (100 total) | 1.00 | 0.85 | 0.92 |

100 traces total. Zero false positives. Smart copiers who change 10-15% of the text slip through (similarity below threshold) — a known limitation. No existing tool does any of this.

Contradiction and hallucination detection

| Dataset | What it is | Tardygrada | Best alternative |
|---|---|:-:|:-:|
| Clear contradictions (125) | Designed compositional | 95% | SelfCheck: 59% |
| + Borderline cases (225 total) | Soft/ambiguous contradictions | 69% | SelfCheck: 38% |
| AgentHallu (693 trajectories) | Real agent hallucinations, 7 frameworks | F1: 0.58 | DeepSeek-V3.1: 0.52 |
| ContraDoc (891 docs) | Real documents, human-annotated | F1: 0.58 | SelfCheck: 0.16 |
| HaluEval (500 responses) | Individual factual errors | F1: 0.03 | SelfCheck: 0.32 |

Detection runs in two modes: deterministic (all benchmarks use this) or LLM-enhanced for broader coverage. Typical speeds: 5.7ms/trajectory (AgentHallu), 7.5ms/document (ContraDoc), 0.015ms/case (synthetic).

On ContraDoc (891 real documents) — F1 0.58, up from 0.16 after fixing a harness bug that accidentally ran the SelfCheck baseline instead of the proper triple-checking pipeline. Recall jumped from 9.1% to 64.8%.

On AgentHallu (693 real agent trajectories) — F1 0.58, beats DeepSeek-V3.1 (0.52). GPT-5 gets 0.70 but costs per-trajectory API calls.

HaluEval (individual factual errors) — F1 0.03. Expected: our pipeline catches contradictions between claims, not individual factual mistakes. SelfCheck does better here (0.32) because its loose heuristics accidentally catch some errors.

What runs where: Contradiction detection (verify-doc, all benchmarks) uses the internal decomposition + consistency + numeric layers — no external calls. Claim grounding (tardy run "claim") optionally connects to open-ontologies for OWL reasoning, or uses the built-in Datalog engine. Different features, different paths.

<details> <summary>AgentHallu per-category recall</summary>

| Category | Recall |
|---|:-:|
| Reasoning | 68% |
| Planning | 66% |
| Retrieval | 59% |
| Human-Interaction | 53% |
| Tool-Use | 21% |

</details> <details> <summary>Detailed breakdown (clear cases)</summary>

| Difficulty | Detection |
|---|:-:|
| Easy (direct opposites) | 100% |
| Medium (logical) | 100% |
| Hard (math/physics) | 96% |
| Subtle (domain knowledge) | 92% |
| Very subtle (statistical) | 88% |

</details>

Scaling

| Agents | Time |
|-------:|-----:|
| 5 | 0.6 ms |
| 500 | 21 ms |
| 5,000 | 97 ms |


Under the hood

<details> <summary><b>How verification works</b></summary>
graph LR
    subgraph Pipeline["Verification Pipeline"]
        direction LR
        C["Claim"] --> D["Decompose"]
        D --> G["Ground"]
        G --> CON["Consistency"]
        CON --> P["Probabilistic"]
        P --> PR["Protocol"]
        PR --> F["Certification"]
        F --> CR["Cross-Rep"]
        CR --> W["Work Verify"]
        W --> V{"VERIFIED /<br>CONFLICT /<br>UNVERIFIABLE"}
    end

    style Pipeline fill:transparent

Claims are decomposed into triples, grounded against a knowledge base, checked for consistency, scored probabilistically, and verified for work integrity. Eight layers, all deterministic.

</details> <details> <summary><b>How tamper protection works</b></summary>
graph LR
    subgraph Trust["Protection Levels"]
        direction LR
        MUT["Mutable"] --> DEF["Default<br>(OS-locked)"]
        DEF --> VER["Verified<br>(+ SHA-256)"]
        VER --> HARD["Hardened<br>(+ replicas)"]
        HARD --> SOV["Sovereign<br>(+ ed25519 + BFT)"]
    end

    style Trust fill:transparent

Values are protected at the operating system level. The OS kernel enforces read-only memory. SHA-256 hashes detect any change. Ed25519 signatures prove authorship. BFT consensus requires corrupting multiple independent replicas.

</details> <details> <summary><b>How the daemon works</b></summary>
graph TB
    subgraph visible["What you see"]
        USER["You"] --> CLI["tardy run / verify-doc"]
    end

    subgraph hidden["What happens"]
        CLI --> DAEMON["Persistent daemon"]
        DAEMON --> AGENTS["Living agents"]
        DAEMON --> KB["Growing knowledge base"]
        DAEMON --> VERIFY["Verification pipeline"]
    end

    style visible fill:transparent
    style hidden fill:transparent

The daemon keeps agents alive between commands. The knowledge base grows as verified claims accumulate. Sovereign agents persist to disk on shutdown and reload on restart.

</details> <details> <summary><b>Architecture</b></summary>
graph TB
    subgraph Tardygrada["Tardygrada"]
        CLI_CMD["CLI"] --> DAEMON_S["Daemon"]
        DAEMON_S --> VM["VM Core"]
        VM --> VERIFY_S["Verification"]
        VM --> ONTO["Knowledge Base"]
        VM --> CRYPTO_S["Cryptography"]
        VERIFY_S --> DECOMP_S["Decompose"]
        VERIFY_S --> NUMERIC_S["Numeric Check"]
        VERIFY_S --> DOMAIN_S["Domain Check"]
        VERIFY_S --> WORK_S["Work Verify"]
    end

    subgraph External["Optional integrations"]
        BITF["brain-in-the-fish<br>(multi-agent debate)"]
        OO["open-ontologies<br>(OWL reasoning)"]
    end

    VM -- "coordinate" --> BITF
    VM -- "grounded_in" --> OO

    style Tardygrada fill:transparent
    style External fill:transparent
</details> <details> <summary><b>The language (for power users)</b></summary>
agent MedicalAdvisor @sovereign @semantics(truth.min_confidence: 0.99) {
    invar

</details>
