Sifaka
Sifaka is an open-source framework that adds reflection and reliability to large language model (LLM) applications.
AI text improvement through research-backed critique with complete observability
Status: Alpha software (v0.3.0). Functional but early-stage. Best suited for evaluation, experimentation, and development.
Why Sifaka?
The Problem: AI-generated text often needs refinement. How do you know if AI output is good enough? How can you systematically improve it without manual review of every output?
What Sifaka Provides:
- Research-Backed Improvement: Implements peer-reviewed critique techniques (Reflexion, Constitutional AI, Self-Refine, etc.)
- Complete Observability: Full audit trail showing exactly how text improved
- Iterative Refinement: Automatic multi-round critique and improvement cycles
- Provider-Agnostic: Works with OpenAI, Anthropic, Google, Groq
Use Case Example: Generate product descriptions for e-commerce. Sifaka:
- Critiques initial draft for clarity, persuasiveness, SEO
- Iteratively refines through multiple improvement cycles
- Validates against your criteria (length, required keywords, tone)
- Provides complete transparency into every improvement step
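The critique-and-refine cycle described above can be pictured with a small, offline sketch. This is not Sifaka's implementation: `refine`, `Step`, `LoopResult`, `keyword_critic`, and `append_reviser` are hypothetical names, and the two stub functions stand in for what would be LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One round of the loop: the critique given and the text it produced."""
    number: int
    critique: str
    text: str


@dataclass
class LoopResult:
    final_text: str
    trace: List[Step] = field(default_factory=list)


def refine(
    draft: str,
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_iterations: int = 3,
) -> LoopResult:
    """Critique and revise until the critic has no feedback or the budget runs out."""
    result = LoopResult(final_text=draft)
    for number in range(1, max_iterations + 1):
        feedback = critique(result.final_text)
        if feedback is None:  # critic is satisfied, stop early
            break
        result.final_text = revise(result.final_text, feedback)
        result.trace.append(Step(number, feedback, result.final_text))
    return result


# Hypothetical stand-ins for LLM calls, so the sketch runs without a provider.
def keyword_critic(text: str) -> Optional[str]:
    return None if "waterproof" in text else "Mention that the product is waterproof."


def append_reviser(text: str, feedback: str) -> str:
    return text + " It is fully waterproof."


outcome = refine("A sturdy hiking boot.", keyword_critic, append_reviser)
print(outcome.final_text)
print(len(outcome.trace))
```

Recording every round in `trace` is what makes the process auditable: each entry pairs the feedback with the text it led to.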
Installation
# Clone the repository
git clone https://github.com/sifaka-ai/sifaka
cd sifaka
# Install with uv (recommended)
uv pip install -e .
# Or with standard pip
pip install -e .
Setup
Configure your LLM provider API keys using environment variables or a .env file:
# OpenAI (default provider)
export OPENAI_API_KEY=sk-...
# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Or Google
export GOOGLE_API_KEY=...
# Or Groq
export GROQ_API_KEY=...
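Before running anything, it can help to confirm which keys are actually visible to your Python process. The helper below is a minimal sketch that uses only the standard library; `configured_providers` and `PROVIDER_KEYS` are hypothetical names, not part of Sifaka.

```python
import os
from typing import List, Mapping

# Environment variable names as listed in the setup section above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose API key variable is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) or "none")
```

Passing the environment as a parameter keeps the check easy to test with a plain dictionary.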
Quick Start
from sifaka import improve
import asyncio
async def main():
    result = await improve("Write about renewable energy benefits")
    print(result.final_text)
    print(f"\nImprovement score: {result.improvement_score:.2f}")
    print(f"Iterations: {result.iteration}")
asyncio.run(main())
Synchronous API
from sifaka import improve_sync
result = improve_sync("Write about renewable energy benefits")
print(result.final_text)
Core Features
1. Research-Backed Critics
Sifaka implements peer-reviewed critique techniques from academic research:
| Critic | Best For | Research Paper |
|--------|----------|----------------|
| SELF_REFINE | General improvement | Self-Refine (2023) |
| REFLEXION | Learning from mistakes | Reflexion (2023) |
| CONSTITUTIONAL | Safety & ethics | Constitutional AI (2022) |
| SELF_CONSISTENCY | Balanced perspectives | Self-Consistency (2022) |
| SELF_RAG | Fact-checking | Self-RAG (2023) |
| META_REWARDING | Self-evaluation | Meta-Rewarding (2024) |
| N_CRITICS | Multiple perspectives | N-Critics (2023) |
| STYLE | Tone & style | Custom implementation |
2. Complete Observability
result = await improve("Your text")
# Access complete audit trail
for iteration in result.trace:
    print(f"Iteration {iteration.number}")
    print(f"  Critique: {iteration.critique}")
    print(f"  Improvement: {iteration.improvement}")
    print(f"  Time: {iteration.processing_time:.2f}s")
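For logging or dashboards, a trace like this can be reduced to a few summary numbers. The sketch below assumes only the four attributes read in the loop above; `TraceEntry` is a hypothetical stand-in for Sifaka's own trace objects, and `summarize` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TraceEntry:
    """Stand-in with the same four attributes the loop above reads."""
    number: int
    critique: str
    improvement: str
    processing_time: float


def summarize(trace: Iterable[TraceEntry]) -> dict:
    """Collapse a trace into a few numbers suitable for logging."""
    entries = list(trace)
    total = sum(e.processing_time for e in entries)
    slowest = max(entries, key=lambda e: e.processing_time, default=None)
    return {
        "iterations": len(entries),
        "total_seconds": round(total, 2),
        "slowest_iteration": slowest.number if slowest else None,
    }


sample = [
    TraceEntry(1, "Too vague", "Added concrete figures", 1.25),
    TraceEntry(2, "Weak ending", "Rewrote the conclusion", 2.5),
]
print(summarize(sample))
```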
3. Provider-Agnostic Design
# OpenAI
result = await improve(text, provider="openai", model="gpt-4o-mini")
# Anthropic
result = await improve(text, provider="anthropic", model="claude-3-5-sonnet")
# Google
result = await improve(text, provider="google", model="gemini-1.5-flash")
# Groq (fast inference)
result = await improve(text, provider="groq", model="llama3-8b-8192")
4. Validation & Quality Control
from sifaka.validators import LengthValidator, ContentValidator
result = await improve(
    "Write a product description",
    validators=[
        LengthValidator(min_length=100, max_length=200),
        ContentValidator(required_terms=["features", "benefits"]),
    ],
)
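To make the two checks concrete, here is a plain-Python sketch of what a length check and a required-terms check might compute. It is not the library's code. Two assumptions are built in: length is counted in characters, and term matching is case-insensitive. `check_length` and `missing_terms` are hypothetical helpers.

```python
from typing import List, Sequence


def check_length(text: str, min_length: int, max_length: int) -> bool:
    """Assumption: length is measured in characters."""
    return min_length <= len(text) <= max_length


def missing_terms(text: str, required_terms: Sequence[str]) -> List[str]:
    """Return required terms that do not appear (case-insensitive)."""
    lowered = text.lower()
    return [term for term in required_terms if term.lower() not in lowered]


draft = "Key features include a 10-hour battery. Benefits: lighter bags and fewer chargers."
print(check_length(draft, 50, 200))
print(missing_terms(draft, ["features", "benefits", "warranty"]))
```

Returning the list of missing terms, not just a boolean, gives the next improvement round something specific to act on.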
Usage Examples
Example 1: Basic Improvement
from sifaka import improve
result = await improve("AI is important for business.")
print(result.final_text)
# Output: "Artificial intelligence transforms business operations by automating..."
Example 2: Using Specific Critics
from sifaka import improve
from sifaka.core.types import CriticType
# Single critic
result = await improve(
    "Explain quantum computing",
    critics=[CriticType.REFLEXION],
)
# Multiple critics
result = await improve(
    "Explain quantum computing",
    critics=[CriticType.REFLEXION, CriticType.SELF_REFINE],
)
Example 3: Style Transformation
from sifaka import improve
from sifaka.critics.style import StyleCritic
result = await improve(
    "We offer comprehensive solutions for your needs.",
    critics=[
        StyleCritic(
            style_description="Casual and friendly",
            style_examples=["Hey there!", "No worries!"],
        )
    ],
)
Example 4: Fact-Checking with SELF_RAG
result = await improve(
    "The Great Wall of China is visible from space.",
    critics=[CriticType.SELF_RAG],
)
# Critiques factual accuracy and suggests corrections
Example 5: Safety & Ethics Check
result = await improve(
    "Guide on pest control methods",
    critics=[CriticType.CONSTITUTIONAL],
)
# Evaluates against safety principles
Example 6: Multiple Perspectives
result = await improve(
    "Product launch announcement",
    critics=[CriticType.N_CRITICS],
)
# Gets feedback from technical expert, general audience, editor, skeptic perspectives
Example 7: Iteration Control
# More iterations for higher quality
result = await improve(
    "Draft email to client",
    max_iterations=5,  # Default is 3
)
# Force improvements even if validation passes
result = await improve(
    "Good text that passes validation",
    force_improvements=True,
)
Example 8: Configuration
from sifaka import Config
config = Config(
    model="gpt-4",
    temperature=0.7,
    max_iterations=5,
    timeout_seconds=120,
)
result = await improve("Your text", config=config)
Example 9: Storage Backends
from sifaka import improve
from sifaka.storage.file import FileStorage
from sifaka.storage.redis import RedisStorage
# File storage
result = await improve(
    "Your text",
    storage=FileStorage("./results"),
)
# Redis storage
result = await improve(
    "Your text",
    storage=RedisStorage("redis://localhost:6379"),
)
Example 10: Error Handling
from sifaka import improve
from sifaka.core.exceptions import ValidationError, CriticError
text = "Your text"
try:
    result = await improve(text)
except ValidationError as e:
    print(f"Validation failed: {e}")
except CriticError as e:
    print(f"Critic error: {e}")
Example 11: Batch Processing
import asyncio
from sifaka import improve
texts = ["Text 1", "Text 2", "Text 3"]
# Start all improvements concurrently; results come back in input order
tasks = [improve(text) for text in texts]
results = await asyncio.gather(*tasks)
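With larger batches, launching every request at once can run into provider rate limits. One common pattern is to cap the number of calls in flight with `asyncio.Semaphore`. The sketch below is a generic illustration: `gather_limited` is a hypothetical helper, and `fake_improve` stands in for an LLM-backed call so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_limited(
    texts: List[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int = 2,
) -> List[str]:
    """Run worker over texts with at most `limit` calls in flight; order is preserved."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(text: str) -> str:
        async with semaphore:
            return await worker(text)

    return await asyncio.gather(*(guarded(t) for t in texts))


async def fake_improve(text: str) -> str:
    """Hypothetical stand-in for an LLM-backed call."""
    await asyncio.sleep(0.01)
    return text.upper()


results = asyncio.run(gather_limited(["Text 1", "Text 2", "Text 3"], fake_improve))
print(results)
```

`asyncio.gather` returns results in the order the awaitables were passed, so the output lines up with the input list even when calls finish out of order.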
Example 12: Custom Validators
from sifaka import improve
from sifaka.validators import BaseValidator
# Note: ValidationResult must also be imported; its module path is not shown here.
class CustomValidator(BaseValidator):
    async def validate(self, text: str) -> ValidationResult:
        # Your custom validation logic
        passed = "important_keyword" in text.lower()
        return ValidationResult(
            validator="custom",
            passed=passed,
            message="Must contain 'important_keyword'",
        )
result = await improve(text, validators=[CustomValidator()])
Example 13: Combining Critics for Comprehensive Review
# Technical accuracy + readability
result = await improve(
    "Technical documentation",
    critics=[CriticType.REFLEXION, CriticType.STYLE],
)
# Safety + factual accuracy
result = await improve(
    "Health advice article",
    critics=[CriticType.CONSTITUTIONAL, CriticType.SELF_RAG],
)
# Comprehensive review
result = await improve(
    "Important business document",
    critics=[
        CriticType.SELF_REFINE,
        CriticType.N_CRITICS,
        CriticType.META_REWARDING,
    ],
)
Configuration
Environment Variables
# LLM Provider Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
GROQ_API_KEY=...
# Optional: Default settings
SIFAKA_DEFAULT_MODEL=gpt-4o-mini
SIFAKA_MAX_ITERATIONS=3
SIFAKA_TEMPERATURE=0.7
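If you want the same defaults available in your own scripts, they can be read with the standard library. This is a sketch under assumptions: `load_defaults` is a hypothetical helper, and the fallback values simply mirror the example listing above.

```python
import os
from typing import Mapping


def load_defaults(env: Mapping[str, str] = os.environ) -> dict:
    """Read the optional SIFAKA_* settings, falling back to the values shown above."""
    return {
        "model": env.get("SIFAKA_DEFAULT_MODEL", "gpt-4o-mini"),
        "max_iterations": int(env.get("SIFAKA_MAX_ITERATIONS", "3")),
        "temperature": float(env.get("SIFAKA_TEMPERATURE", "0.7")),
    }


print(load_defaults({"SIFAKA_MAX_ITERATIONS": "5"}))
```

Converting to `int` and `float` at load time surfaces a malformed value as a `ValueError` immediately, before any LLM call is made.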
Config Object
from sifaka import Config
config = Config(
    # Model settings
    model="gpt-4",              # LLM model to use
    temperature=0.7,            # Creativity (0.0-2.0)
    max_tokens=1000,            # Max response length
    # Critic settings
    critic_temperature=0.3,     # Lower = more consistent
    critic_context_window=3,    # Previous critiques to consider
    # Behavior settings
    max_iterations=3,           # Max improvement cycles
    force_improvements=False,   # Improve even if valid
    timeout_seconds=300,        # Overall timeout
)
Architecture Overview
┌─────────────────────────────────────────────┐
│ Sifaka Improvement Loop │
└─────────────────────────────────────────────┘
│
▼
┌──────────────────────────┐
│ 1. Generate/Modify │
│ (LLM Provider) │
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ 2. Critique │
│ (Critics: Reflexion, │
│ Constitutional, etc) │
└───────────
