Sifaka

AI text improvement through research-backed critique with complete observability

Status: Alpha software (v0.3.0). Functional but early-stage. Best suited for evaluation, experimentation, and development.

Why Sifaka?

The Problem: AI-generated text often needs refinement. How do you know if AI output is good enough? How can you systematically improve it without manual review of every output?

What Sifaka Provides:

Research-Backed Improvement: Implements peer-reviewed critique techniques (Reflexion, Constitutional AI, Self-Refine, etc.)
Complete Observability: Full audit trail showing exactly how text improved
Iterative Refinement: Automatic multi-round critique and improvement cycles
Provider-Agnostic: Works with OpenAI, Anthropic, Google, and Groq

Use Case Example: Generate product descriptions for e-commerce. Sifaka:

Critiques initial draft for clarity, persuasiveness, SEO
Iteratively refines through multiple improvement cycles
Validates against your criteria (length, required keywords, tone)
Provides complete transparency into every improvement step

Installation

# Clone the repository
git clone https://github.com/sifaka-ai/sifaka
cd sifaka

# Install with uv (recommended)
uv pip install -e .

# Or with standard pip
pip install -e .

Setup

Configure your LLM provider API keys using environment variables or a .env file:

# OpenAI (default provider)
export OPENAI_API_KEY=sk-...

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Or Google
export GOOGLE_API_KEY=...

# Or Groq
export GROQ_API_KEY=...
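
Before running anything, it can help to confirm which keys the process can actually see. The sketch below is not part of Sifaka: `detect_providers` is a hypothetical helper that only inspects the four variable names listed above and makes no API calls.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the setup instructions above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def detect_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the provider names whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = detect_providers()
    print("Configured providers:", ", ".join(found) or "none")
```

Passing a plain dictionary instead of the real environment makes the helper easy to test.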

Quick Start

from sifaka import improve
import asyncio

async def main():
    result = await improve("Write about renewable energy benefits")
    print(result.final_text)
    print(f"\nImprovement score: {result.improvement_score:.2f}")
    print(f"Iterations: {result.iteration}")

asyncio.run(main())

Synchronous API

from sifaka import improve_sync

result = improve_sync("Write about renewable energy benefits")
print(result.final_text)

Core Features

1. Research-Backed Critics

Sifaka implements peer-reviewed critique techniques from academic research:

| Critic | Best For | Research Paper |
|--------|----------|----------------|
| SELF_REFINE | General improvement | Self-Refine (2023) |
| REFLEXION | Learning from mistakes | Reflexion (2023) |
| CONSTITUTIONAL | Safety & ethics | Constitutional AI (2022) |
| SELF_CONSISTENCY | Balanced perspectives | Self-Consistency (2022) |
| SELF_RAG | Fact-checking | Self-RAG (2023) |
| META_REWARDING | Self-evaluation | Meta-Rewarding (2024) |
| N_CRITICS | Multiple perspectives | N-Critics (2023) |
| STYLE | Tone & style | Custom implementation |

2. Complete Observability

result = await improve("Your text")

# Access complete audit trail
for iteration in result.trace:
    print(f"Iteration {iteration.number}")
    print(f"  Critique: {iteration.critique}")
    print(f"  Improvement: {iteration.improvement}")
    print(f"  Time: {iteration.processing_time:.2f}s")
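
For logging or dashboards, a trace is often reduced to a few numbers. The following is a rough sketch with stand-in types, not Sifaka's own classes: `IterationRecord` is a hypothetical dataclass that copies the four field names used in the loop above, and `summarize_trace` is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class IterationRecord:
    """Stand-in for one trace entry; field names mirror the loop above."""
    number: int
    critique: str
    improvement: str
    processing_time: float


def summarize_trace(trace: list[IterationRecord]) -> dict:
    """Aggregate a trace into a few numbers suitable for logging."""
    total = sum(item.processing_time for item in trace)
    slowest = max(trace, key=lambda item: item.processing_time, default=None)
    return {
        "iterations": len(trace),
        "total_seconds": round(total, 2),
        "slowest_iteration": slowest.number if slowest else None,
    }


if __name__ == "__main__":
    demo = [
        IterationRecord(1, "Too vague", "Added specifics", 1.5),
        IterationRecord(2, "Weak ending", "Rewrote conclusion", 2.25),
    ]
    print(summarize_trace(demo))
```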

3. Provider-Agnostic Design

# OpenAI
result = await improve(text, provider="openai", model="gpt-4o-mini")

# Anthropic
result = await improve(text, provider="anthropic", model="claude-3-5-sonnet")

# Google
result = await improve(text, provider="google", model="gemini-1.5-flash")

# Groq (fast inference)
result = await improve(text, provider="groq", model="llama3-8b-8192")

4. Validation & Quality Control

from sifaka.validators import LengthValidator, ContentValidator

result = await improve(
    "Write a product description",
    validators=[
        LengthValidator(min_length=100, max_length=200),
        ContentValidator(required_terms=["features", "benefits"])
    ]
)
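
To make the two checks concrete, here is a minimal standalone sketch of what a length check and a required-terms check might do. It is not Sifaka's implementation: `CheckResult`, `check_length`, and `check_required_terms` are hypothetical names, and counting length in characters and matching terms case-insensitively are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool
    message: str


def check_length(text: str, min_length: int, max_length: int) -> CheckResult:
    """Pass when the character count falls within [min_length, max_length]."""
    n = len(text)
    ok = min_length <= n <= max_length
    return CheckResult(ok, f"length {n} (allowed {min_length}-{max_length})")


def check_required_terms(text: str, required_terms: list[str]) -> CheckResult:
    """Pass when every required term appears, ignoring case."""
    lowered = text.lower()
    missing = [term for term in required_terms if term.lower() not in lowered]
    message = f"missing terms: {missing}" if missing else "all terms present"
    return CheckResult(not missing, message)


if __name__ == "__main__":
    draft = "Key features and benefits of the product."
    print(check_length(draft, 100, 200))
    print(check_required_terms(draft, ["features", "benefits"]))
```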

Usage Examples

Example 1: Basic Improvement

from sifaka import improve

result = await improve("AI is important for business.")
print(result.final_text)
# Example output (wording will vary between runs): "Artificial intelligence transforms business operations by automating..."

Example 2: Using Specific Critics

from sifaka import improve
from sifaka.core.types import CriticType

# Single critic
result = await improve(
    "Explain quantum computing",
    critics=[CriticType.REFLEXION]
)

# Multiple critics
result = await improve(
    "Explain quantum computing",
    critics=[CriticType.REFLEXION, CriticType.SELF_REFINE]
)

Example 3: Style Transformation

from sifaka.critics.style import StyleCritic

result = await improve(
    "We offer comprehensive solutions for your needs.",
    critics=[StyleCritic(
        style_description="Casual and friendly",
        style_examples=["Hey there!", "No worries!"]
    )]
)

Example 4: Fact-Checking with SELF_RAG

from sifaka import improve
from sifaka.core.types import CriticType

result = await improve(
    "The Great Wall of China is visible from space.",
    critics=[CriticType.SELF_RAG]
)
# Critiques factual accuracy and suggests corrections

Example 5: Safety & Ethics Check

from sifaka import improve
from sifaka.core.types import CriticType

result = await improve(
    "Guide on pest control methods",
    critics=[CriticType.CONSTITUTIONAL]
)
# Evaluates against safety principles

Example 6: Multiple Perspectives

from sifaka import improve
from sifaka.core.types import CriticType

result = await improve(
    "Product launch announcement",
    critics=[CriticType.N_CRITICS]
)
# Gets feedback from several perspectives: technical expert, general audience, editor, and skeptic

Example 7: Iteration Control

# More iterations for higher quality
result = await improve(
    "Draft email to client",
    max_iterations=5  # Default is 3
)

# Force improvements even if validation passes
result = await improve(
    "Good text that passes validation",
    force_improvements=True
)
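
How these two settings might interact can be sketched with a plain loop. This is an illustration under stated assumptions, not Sifaka's actual control flow: `refine_loop` and its callback parameters are hypothetical, and the sketch assumes that `force_improvements=True` means every one of the `max_iterations` rounds runs even when validation already passes.

```python
from typing import Callable


def refine_loop(
    text: str,
    critique: Callable[[str], str],
    revise: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
    max_iterations: int = 3,
    force_improvements: bool = False,
) -> tuple[str, int]:
    """Critique and revise until the text is valid or the budget runs out.

    Returns the final text and the number of revision rounds performed.
    """
    rounds = 0
    while rounds < max_iterations:
        # Stop early once validation passes, unless improvements are forced.
        if is_valid(text) and not force_improvements:
            break
        feedback = critique(text)
        text = revise(text, feedback)
        rounds += 1
    return text, rounds


if __name__ == "__main__":
    # Toy stand-ins for LLM calls: the "critic" always asks for more detail
    # and the "reviser" appends a sentence.
    final, rounds = refine_loop(
        "Solar is cheap.",
        critique=lambda t: "add detail",
        revise=lambda t, fb: t + " More detail.",
        is_valid=lambda t: len(t) >= 40,
    )
    print(rounds, final)
```

Raising `max_iterations` only raises the ceiling; the loop still exits as soon as the validity check passes.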

Example 8: Configuration

from sifaka import Config

config = Config(
    model="gpt-4",
    temperature=0.7,
    max_iterations=5,
    timeout_seconds=120
)

result = await improve("Your text", config=config)

Example 9: Storage Backends

from sifaka.storage.file import FileStorage
from sifaka.storage.redis import RedisStorage

# File storage
result = await improve(
    "Your text",
    storage=FileStorage("./results")
)

# Redis storage
result = await improve(
    "Your text",
    storage=RedisStorage("redis://localhost:6379")
)

Example 10: Error Handling

from sifaka import improve
from sifaka.core.exceptions import ValidationError, CriticError

text = "Your text"

try:
    result = await improve(text)
except ValidationError as e:
    print(f"Validation failed: {e}")
except CriticError as e:
    print(f"Critic error: {e}")

Example 11: Batch Processing

import asyncio

from sifaka import improve

texts = ["Text 1", "Text 2", "Text 3"]
# Build one coroutine per input; asyncio.gather runs them concurrently
tasks = [improve(text) for text in texts]
results = await asyncio.gather(*tasks)
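
For larger batches, unbounded `gather` can run into provider rate limits, and by default one failure propagates out of the `await` while the other calls keep running in the background. The sketch below shows one common way to cap concurrency and collect per-item errors using only the standard library. `gather_bounded` is a hypothetical helper, and `fake_improve` is a stand-in coroutine so the example runs without API keys.

```python
import asyncio
from typing import Awaitable, Callable


async def gather_bounded(
    worker: Callable[[str], Awaitable[str]],
    items: list[str],
    limit: int = 2,
) -> list:
    """Run worker over items with at most `limit` calls in flight.

    Results keep input order; a failed item yields its exception object
    instead of raising it to the caller.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: str):
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(i) for i in items), return_exceptions=True)


async def fake_improve(text: str) -> str:
    """Stand-in for an LLM-backed call; swap in the real coroutine."""
    await asyncio.sleep(0.01)
    if not text:
        raise ValueError("empty input")
    return text.upper()


if __name__ == "__main__":
    print(asyncio.run(gather_bounded(fake_improve, ["Text 1", "", "Text 3"])))
```

Callers can then separate successes from failures with `isinstance(item, Exception)`.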

Example 12: Custom Validators

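from sifaka import improve
from sifaka.validators import BaseValidator
# Assumed import path for ValidationResult; check your installed version
from sifaka.core.models import ValidationResult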

class CustomValidator(BaseValidator):
    async def validate(self, text: str) -> ValidationResult:
        # Your custom validation logic
        passed = "important_keyword" in text.lower()
        return ValidationResult(
            validator="custom",
            passed=passed,
            message="Must contain 'important_keyword'"
        )

result = await improve(text, validators=[CustomValidator()])

Example 13: Combining Critics for Comprehensive Review

from sifaka import improve
from sifaka.core.types import CriticType

# Technical accuracy + readability
result = await improve(
    "Technical documentation",
    critics=[CriticType.REFLEXION, CriticType.STYLE]
)

# Safety + factual accuracy
result = await improve(
    "Health advice article",
    critics=[CriticType.CONSTITUTIONAL, CriticType.SELF_RAG]
)

# Comprehensive review
result = await improve(
    "Important business document",
    critics=[
        CriticType.SELF_REFINE,
        CriticType.N_CRITICS,
        CriticType.META_REWARDING
    ]
)

Configuration

Environment Variables

# LLM Provider Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
GROQ_API_KEY=...

# Optional: Default settings
SIFAKA_DEFAULT_MODEL=gpt-4o-mini
SIFAKA_MAX_ITERATIONS=3
SIFAKA_TEMPERATURE=0.7
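
Environment variables are always strings, so numeric settings need converting before use. The sketch below shows one way such values could be read and typed. It is an assumption about usage, not a description of how Sifaka itself loads them: `load_defaults` is a hypothetical helper, and the fallbacks simply reuse the example values shown above.

```python
import os
from typing import Mapping, Optional


def load_defaults(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the optional SIFAKA_* variables and convert them to typed values."""
    env = os.environ if env is None else env
    return {
        "model": env.get("SIFAKA_DEFAULT_MODEL", "gpt-4o-mini"),
        "max_iterations": int(env.get("SIFAKA_MAX_ITERATIONS", "3")),
        "temperature": float(env.get("SIFAKA_TEMPERATURE", "0.7")),
    }


if __name__ == "__main__":
    print(load_defaults())
```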

Config Object

from sifaka import Config

config = Config(
    # Model settings
    model="gpt-4",              # LLM model to use
    temperature=0.7,            # Creativity (0.0-2.0)
    max_tokens=1000,            # Max response length

    # Critic settings
    critic_temperature=0.3,     # Lower = more consistent
    critic_context_window=3,    # Previous critiques to consider

    # Behavior settings
    max_iterations=3,           # Max improvement cycles
    force_improvements=False,   # Improve even if valid
    timeout_seconds=300,        # Overall timeout
)

Architecture Overview

┌─────────────────────────────────────────────┐
│           Sifaka Improvement Loop           │
└─────────────────────────────────────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │   1. Generate/Modify     │
        │      (LLM Provider)      │
        └──────────────────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │   2. Critique            │
        │   (Critics: Reflexion,   │
        │    Constitutional, etc)  │
        └───────────
