# Temper

Temper is a plugin for Claude Code that closes the quality gap in AI-generated code.
**Install / Use:** `/learn @galando/Temper`
**Category:** Development & Engineering
Your AI writes fast. Temper makes it last.
Intent-driven development with behavioral testing, security analysis, and quality gates for AI-generated code
<img src="docs/temper.png" alt="Temper Dashboard" width="100%">

Website | Getting Started | Releases
## The Problem
AI writes code fast. But "fast" without "right" creates bugs, technical debt, and features that miss the point.
### "Why not just tell Claude to be careful?"
You can. And it helps. But AI-generated code has structural failure patterns that "be careful" doesn't address. These aren't sloppiness — they're limitations of how LLMs generate code:
- Missing behaviors — AI builds the happy path, skips edge cases. Rate limiting? Error recovery? Never implemented.
- Wrong problem solved — Feature works perfectly, but nobody asked for it. All tests pass, wrong thing built.
- Over-engineering — AI creates factories, strategies, and abstractions for something used exactly once.
- Hallucinated APIs — AI calls methods that don't exist. It's confident they do.
- Missing wiring — New code never registered in routing, DI, or config. The code itself is correct; the integration is missing.
These map to three unanswered questions:
| Question | What Goes Wrong Without It |
|----------|----------------------------|
| Did we solve the problem? | Feature works but nobody uses it. Wrong problem solved. |
| Does it do the right things? | Happy path works, edge cases ship broken. |
| Does the code work? | Tests pass, but they test implementation details, not behaviors. |
Most AI tools answer only the third. Temper answers all three.
## IDD + BDD + TDD: Three Layers, One File
Temper combines three development methodologies in a single artifact called intent.md. Each layer answers a different question and is enforced at a different stage of the pipeline:
```
intent.md
|
+-- Intent Section (IDD)        WHY are we building this?
|     Problem statement
|     Success criteria (each with a Validate: type)
|     Constraints
|
+-- Scenarios Section (BDD)     WHAT should it do?
|     Gherkin Given/When/Then
|     Derived BEFORE architecture
|     Every planned file traces to a scenario
|
+-- /temper:build (TDD)         HOW do we build it?
      Tests written from scenarios
      RED -> GREEN -> REFACTOR
```
### IDD: Intent-Driven Development

**Question:** Did we solve the problem?
**When:** Defined during /temper:plan, validated during /temper:review
IDD captures the why behind a feature. Not "add a password reset endpoint" but "users should be able to reset their password without contacting support, completing the flow in under 2 minutes."
The Intent section of intent.md contains:
- Problem — What problem are we solving? For whom?
- Success Criteria — Measurable outcomes, each with a `Validate:` type that tells review how to check it
- Constraints — Technical or business limitations
- Target Users — Who benefits
#### Validate Types
Each success criterion gets a validation type. This is what makes IDD mechanical instead of subjective:
| Type | What It Means | How Review Checks It | Example |
|------|--------------|---------------------|---------|
| scenario | Criterion is satisfied when a linked BDD scenario's test passes | Finds the test, runs it, checks PASS | "Users can reset password" -> linked to scenario "Successful password reset" |
| code | Criterion is satisfied when specific code exists | Greps the codebase for the pattern | "POST /api/reset endpoint exists" -> greps for route definition |
| metric | Can't be verified before deployment | Flags for post-deploy monitoring | "Support tickets decrease 30%" -> requires production data |
| manual | Requires human judgment | Flags for human review, non-blocking | "Reset flow feels intuitive" -> UX review needed |
Why this matters: Without validate types, "intent validation" means the AI reads your success criteria and subjectively judges "yeah, this looks met." With validate types, most criteria are mechanically verified — a test passes or it doesn't, code exists or it doesn't. Only metric and manual require judgment.
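As a sketch, an Intent section using these validate types might look like this (the exact intent.md syntax, criteria, and constraint shown here are invented for illustration):

```markdown
## Intent

**Problem:** Users cannot reset their password without contacting support.

**Success Criteria:**
- Users can reset password without support
  Validate: scenario -> "Successful password reset"
- Reset endpoint exists at POST /api/reset
  Validate: code -> route definition for POST /api/reset
- Support ticket volume decreases 30%
  Validate: metric -> post-deploy monitoring
- Reset flow feels intuitive
  Validate: manual -> UX review

**Constraints:** Reset tokens expire after 15 minutes.

**Target Users:** End users locked out of their accounts.
```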
#### Intent Validation in Review
When /temper:review runs, it produces:
```
Intent Validation (IDD): 4/5 (3 mechanical, 1 deferred, 1 manual)

Problem: Users unable to reset passwords without support

[x] Users can reset password without support
    validate: scenario -> test_successful_reset PASS
[x] Reset endpoint exists at POST /api/reset
    validate: code -> route found in AuthController.ts:23
[x] Rate limiting prevents abuse
    validate: scenario -> test_rate_limiting PASS
[ ] Support ticket volume decreases 30%
    validate: metric -> post-deploy monitoring required
[ ] Reset flow completes in under 2 minutes
    validate: manual -> requires human review

Confidence: 3/5 mechanically verified
```
The higher the ratio of scenario/code criteria, the more confidence you have that the feature actually solves the stated problem.
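How this tally could be computed is sketched below; the types, function names, and criteria are illustrative, not Temper's actual internals:

```typescript
// Hypothetical sketch of the "mechanically verified" tally.
type ValidateType = "scenario" | "code" | "metric" | "manual";

interface Criterion {
  text: string;
  validate: ValidateType;
  passed?: boolean; // set by review for scenario/code criteria
}

// A criterion counts as mechanically verified when its validate type
// can be checked directly: a test ran and passed (scenario), or the
// expected code was found (code). metric/manual need human judgment.
function mechanicallyVerified(criteria: Criterion[]): number {
  return criteria.filter(
    (c) => (c.validate === "scenario" || c.validate === "code") && c.passed
  ).length;
}

const criteria: Criterion[] = [
  { text: "Users can reset password without support", validate: "scenario", passed: true },
  { text: "Reset endpoint exists at POST /api/reset", validate: "code", passed: true },
  { text: "Rate limiting prevents abuse", validate: "scenario", passed: true },
  { text: "Support ticket volume decreases 30%", validate: "metric" },
  { text: "Reset flow completes in under 2 minutes", validate: "manual" },
];

console.log(`Confidence: ${mechanicallyVerified(criteria)}/${criteria.length} mechanically verified`);
```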
### BDD: Behavior-Driven Development

**Question:** Does it do the right things?
**When:** Scenarios derived during /temper:plan (before architecture), enforced during /temper:build
BDD in Temper isn't an afterthought — scenarios are derived before the architecture exists. This is the key design decision. The flow is:
1. Blast radius analysis -> identifies affected files and risk areas
2. Scenario derivation -> behaviors from requirements + blast radius
3. Architecture from scenarios -> file list justified by scenarios
Not the other way around. This prevents the AI from planning 15 files and then writing scenarios that justify them. Instead, scenarios define what the system must do, and the file list follows.
#### Where Scenarios Come From
Scenarios aren't invented — they're derived from concrete sources:
| Source | Becomes |
|--------|---------|
| Feature description | Happy path scenarios |
| Acceptance criteria (Jira/GitHub issue) | Validation scenarios |
| Blast radius: risk areas | Edge case and error scenarios |
| Blast radius: affected consumers | Regression guard scenarios ("existing X still works") |
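As an illustration of this derivation, a feature description plus a blast-radius risk area might yield scenarios like these (Gherkin sketch; the feature and details are invented):

```gherkin
Feature: Password reset

  # Happy path, derived from the feature description
  Scenario: Successful password reset
    Given a registered user with a valid reset token
    When they submit a new password
    Then the password is updated and the token is invalidated

  # Edge case, derived from blast radius risk area "token expiry"
  Scenario: Expired token rejected
    Given a reset token older than 15 minutes
    When the user submits a new password
    Then the request is rejected with an explanatory error
```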
#### File-to-Scenario Traceability
Every file in the plan must justify its existence:
```
Scenario-traced files:
  src/services/PasswordResetService.ts -> Scenario: "Successful password reset"
  src/middleware/RateLimiter.ts        -> Scenario: "Rate limiting enforced"

Infrastructure files (no scenario needed, but must state dependency):
  db/migrations/001_add_reset_tokens.sql -> Required by PasswordResetService
  config/email.ts                        -> Required by PasswordResetService
```
If the AI plans a file that no scenario needs and isn't infrastructure — that file shouldn't exist. This is how Temper prevents over-engineering structurally, not by hoping the AI "keeps it simple."
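The same rule can be expressed as a simple check. The structures and names below are hypothetical, not Temper's actual plan format:

```typescript
// Sketch: every planned file must trace to a scenario or declare an
// infrastructure dependency. Structures are illustrative.
interface PlannedFile {
  path: string;
  scenario?: string;   // behavior that justifies the file
  requiredBy?: string; // infrastructure dependency, if no scenario
}

// Files with neither justification are flagged as over-engineering.
function unjustifiedFiles(plan: PlannedFile[]): string[] {
  return plan.filter((f) => !f.scenario && !f.requiredBy).map((f) => f.path);
}

const plan: PlannedFile[] = [
  { path: "src/services/PasswordResetService.ts", scenario: "Successful password reset" },
  { path: "db/migrations/001_add_reset_tokens.sql", requiredBy: "PasswordResetService" },
  { path: "src/factories/ResetStrategyFactory.ts" }, // nothing needs this
];

console.log(unjustifiedFiles(plan)); // the factory is flagged
```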
#### Scenario Coverage Gate
After all tasks complete, /temper:build runs the scenario coverage gate:
```
Scenario Coverage: 5/5
[x] Successful password reset -> test_successful_reset (PASS)
[x] Expired token rejected    -> test_expired_token (PASS)
[x] Rate limiting enforced    -> test_rate_limiting (PASS)
[x] Invalid email format      -> test_invalid_email (PASS)
[x] Non-existent user handled -> test_nonexistent_user (PASS)
```
If any scenario lacks a passing test, the build cannot proceed: build writes the missing test, runs it, and implements the behavior until the test passes. This is how the rate-limiting example works — the scenario existed in intent.md, no test covered it, so build caught the gap.
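The gate's core logic can be sketched roughly as follows (hypothetical names; Temper's real implementation may differ):

```typescript
// Rough sketch of a scenario coverage gate. Names are illustrative.
interface TestResult {
  scenario: string; // the scenario this test is mapped to
  passed: boolean;
}

// Returns the scenarios that have no passing test; the build may
// proceed only when this list is empty.
function uncoveredScenarios(scenarios: string[], results: TestResult[]): string[] {
  return scenarios.filter(
    (s) => !results.some((r) => r.scenario === s && r.passed)
  );
}

const scenarios = ["Successful password reset", "Rate limiting enforced"];
const results: TestResult[] = [
  { scenario: "Successful password reset", passed: true },
  // No test covers "Rate limiting enforced" -> gate blocks the build.
];

const missing = uncoveredScenarios(scenarios, results);
console.log(missing.length === 0 ? "Gate: PASS" : `Gate: BLOCKED, missing: ${missing.join(", ")}`);
```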
### TDD: Test-Driven Development

**Question:** Does the code work?
**When:** During /temper:build, per scenario
TDD in Temper is scenario-driven. Instead of the AI deciding what to test, tests are derived from BDD scenarios:
| BDD Scenario | Becomes TDD |
|-------------|-------------|
| Given (preconditions) | Test setup |
| When (action) | Method/endpoint call |
| Then (expected outcome) | Assertions |
| Scenario name | Test name |
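The mapping above can be sketched for one scenario. The service and its API below are invented for illustration, not Temper's output:

```typescript
// Hypothetical service under test; invented for illustration.
class PasswordResetService {
  private tokens = new Map<string, number>(); // token -> issued-at (ms)

  issueToken(token: string, issuedAt: number): void {
    this.tokens.set(token, issuedAt);
  }

  // Tokens older than 15 minutes are rejected.
  reset(token: string, now: number): boolean {
    const issuedAt = this.tokens.get(token);
    if (issuedAt === undefined) return false;
    return now - issuedAt <= 15 * 60 * 1000;
  }
}

// Scenario: "Expired token rejected" -> test name from scenario name
function test_expired_token(): void {
  // Given: a reset token older than 15 minutes
  const service = new PasswordResetService();
  service.issueToken("abc", 0);
  // When: the user submits a new password after 16 minutes
  const accepted = service.reset("abc", 16 * 60 * 1000);
  // Then: the request is rejected
  if (accepted) throw new Error("expired token should be rejected");
}

test_expired_token();
```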
The cycle per scenario:
- RED — Write test mapped to scenario name. Run it. Must fail (proves the test actually tests something).
- GREEN — Write minimal code to make the test pass. Nothing more.
- REFACTOR — Clean up only if safe and obvious. All tests must still pass.
#### How TDD and BDD Work Together
When both intent.md and the TDD pack are active:
- intent.md drives WHAT to test — scenarios define the test cases
- TDD pack drives HOW to test — RED-GREEN-REFACTOR discipline, naming conventions, test structure
When only TDD pack is active (no intent.md — trivial/simple features):
- TDD pack drives both what and how — freestyle test-first development
When neither is active:
- No enforced test-first — implement, then test
This priority chain means intent.md and the TDD pack aren't competing methodologies: each layer answers its own question, and together they cover all three.
