Sifaka

Sifaka is an open-source framework that adds reflection and reliability to large language model (LLM) applications.

AI text improvement through research-backed critique with complete observability

Status: Alpha software (v0.3.0). Functional but early-stage. Best suited for evaluation, experimentation, and development.


Why Sifaka?

The Problem: AI-generated text often needs refinement. How do you know if AI output is good enough? How can you systematically improve it without manual review of every output?

What Sifaka Provides:

  • Research-Backed Improvement: Implements peer-reviewed critique techniques (Reflexion, Constitutional AI, Self-Refine, etc.)
  • Complete Observability: Full audit trail showing exactly how text improved
  • Iterative Refinement: Automatic multi-round critique and improvement cycles
  • Provider-Agnostic: Works with OpenAI, Anthropic, Google, Groq

Use Case Example: generating product descriptions for e-commerce. Sifaka:

  1. Critiques the initial draft for clarity, persuasiveness, and SEO
  2. Refines it iteratively through multiple improvement cycles
  3. Validates it against your criteria (length, required keywords, tone)
  4. Provides complete transparency into every improvement step
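
The four steps above amount to a bounded critique-and-revise loop. The following is a minimal, self-contained sketch of that control flow, not Sifaka's implementation: the names refine, Step, LoopResult, and the three stub functions are hypothetical, and the stubs only stand in for LLM calls and validators so the loop can run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One recorded round of the loop (hypothetical structure)."""
    number: int
    critique: str
    text: str


@dataclass
class LoopResult:
    final_text: str
    trace: List[Step] = field(default_factory=list)


def refine(
    draft: str,
    critique: Callable[[str], str],
    revise: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
    max_iterations: int = 3,
) -> LoopResult:
    """Critique and revise until the text validates or the budget runs out."""
    result = LoopResult(final_text=draft)
    for number in range(1, max_iterations + 1):
        if is_valid(result.final_text):
            break  # criteria met, stop early
        feedback = critique(result.final_text)
        result.final_text = revise(result.final_text, feedback)
        # Record every round so the improvement path can be inspected later.
        result.trace.append(Step(number, feedback, result.final_text))
    return result


# Stubs: a real system would call an LLM in critique and revise.
def stub_critique(text: str) -> str:
    return "too short"


def stub_revise(text: str, feedback: str) -> str:
    return text + " More detail."


def stub_is_valid(text: str) -> bool:
    return len(text) >= 40


outcome = refine("Solar panels save money.", stub_critique, stub_revise, stub_is_valid)
print(outcome.final_text)
print(f"Rounds recorded: {len(outcome.trace)}")
```

Keeping the trace as a plain list of per-round records is what makes the audit trail cheap to produce: nothing is reconstructed after the fact.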

Installation

# Clone the repository
git clone https://github.com/sifaka-ai/sifaka
cd sifaka

# Install with uv (recommended)
uv pip install -e .

# Or with standard pip
pip install -e .

Setup

Configure your LLM provider API keys using environment variables or a .env file:

# OpenAI (default provider)
export OPENAI_API_KEY=sk-...

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...

# Or Google
export GOOGLE_API_KEY=...

# Or Groq
export GROQ_API_KEY=...
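
Before running anything, it can help to confirm which keys the current shell actually exports. The sketch below uses only the standard library and the variable names listed above; the helper name configured_providers is hypothetical and is not part of Sifaka.

```python
import os
from typing import Dict, List, Mapping

# Environment variable names taken from the setup instructions above.
PROVIDER_KEYS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose API key variable is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) or "none")
```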

Quick Start

from sifaka import improve
import asyncio

async def main():
    result = await improve("Write about renewable energy benefits")
    print(result.final_text)
    print(f"\nImprovement score: {result.improvement_score:.2f}")
    print(f"Iterations: {result.iteration}")

asyncio.run(main())

Synchronous API

from sifaka import improve_sync

result = improve_sync("Write about renewable energy benefits")
print(result.final_text)

Core Features

1. Research-Backed Critics

Sifaka implements peer-reviewed critique techniques from academic research:

| Critic | Best For | Research Paper |
|--------|----------|----------------|
| SELF_REFINE | General improvement | Self-Refine (2023) |
| REFLEXION | Learning from mistakes | Reflexion (2023) |
| CONSTITUTIONAL | Safety & ethics | Constitutional AI (2022) |
| SELF_CONSISTENCY | Balanced perspectives | Self-Consistency (2022) |
| SELF_RAG | Fact-checking | Self-RAG (2023) |
| META_REWARDING | Self-evaluation | Meta-Rewarding (2024) |
| N_CRITICS | Multiple perspectives | N-Critics (2023) |
| STYLE | Tone & style | Custom implementation |

2. Complete Observability

result = await improve("Your text")

# Access complete audit trail
for iteration in result.trace:
    print(f"Iteration {iteration.number}")
    print(f"  Critique: {iteration.critique}")
    print(f"  Improvement: {iteration.improvement}")
    print(f"  Time: {iteration.processing_time:.2f}s")
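
A trace like this is easy to condense into a short report. The sketch below is an illustration only: IterationRecord is a hypothetical stand-in whose field names mirror the attributes used in the loop above, and summarize is not a Sifaka function.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IterationRecord:
    """Stand-in for one trace entry; field names mirror the loop above."""
    number: int
    critique: str
    improvement: str
    processing_time: float


def summarize(trace: Iterable[IterationRecord]) -> str:
    """Build a one-line-per-iteration report plus the total time."""
    records: List[IterationRecord] = list(trace)
    lines = [
        f"{r.number}: {r.processing_time:.2f}s, critique of {len(r.critique)} chars"
        for r in records
    ]
    total = sum(r.processing_time for r in records)
    lines.append(f"total: {total:.2f}s over {len(records)} iterations")
    return "\n".join(lines)


sample = [
    IterationRecord(1, "Too vague", "Added figures", 1.25),
    IterationRecord(2, "Weak ending", "Rewrote conclusion", 0.75),
]
print(summarize(sample))
```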

3. Provider-Agnostic Design

# OpenAI
result = await improve(text, provider="openai", model="gpt-4o-mini")

# Anthropic
result = await improve(text, provider="anthropic", model="claude-3-5-sonnet")

# Google
result = await improve(text, provider="google", model="gemini-1.5-flash")

# Groq (fast inference)
result = await improve(text, provider="groq", model="llama3-8b-8192")

4. Validation & Quality Control

from sifaka.validators import LengthValidator, ContentValidator

result = await improve(
    "Write a product description",
    validators=[
        LengthValidator(min_length=100, max_length=200),
        ContentValidator(required_terms=["features", "benefits"])
    ]
)
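
To make the two checks concrete, here is a plain-Python sketch of what a length bound and a required-terms rule could test. It is an assumption-laden illustration, not Sifaka's validator code: length_ok and missing_terms are hypothetical names, and the README does not say whether length is measured in characters or words, so this sketch assumes characters.

```python
from typing import List, Sequence


def length_ok(text: str, min_length: int, max_length: int) -> bool:
    """True when the character count is within the inclusive bounds."""
    return min_length <= len(text) <= max_length


def missing_terms(text: str, required_terms: Sequence[str]) -> List[str]:
    """Return required terms that do not appear (case-insensitive substring match)."""
    lowered = text.lower()
    return [term for term in required_terms if term.lower() not in lowered]


draft = "Key Features: waterproof shell and a two-year warranty."
print(length_ok(draft, 10, 200))
print(missing_terms(draft, ["features", "benefits"]))
```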

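Usage Examples

The snippets below call await at the top level for brevity. Run them inside an async function, as shown in Quick Start.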

Example 1: Basic Improvement

from sifaka import improve

result = await improve("AI is important for business.")
print(result.final_text)
# Output: "Artificial intelligence transforms business operations by automating..."

Example 2: Using Specific Critics

from sifaka import improve
from sifaka.core.types import CriticType

# Single critic
result = await improve(
    "Explain quantum computing",
    critics=[CriticType.REFLEXION]
)

# Multiple critics
result = await improve(
    "Explain quantum computing",
    critics=[CriticType.REFLEXION, CriticType.SELF_REFINE]
)

Example 3: Style Transformation

from sifaka.critics.style import StyleCritic

result = await improve(
    "We offer comprehensive solutions for your needs.",
    critics=[StyleCritic(
        style_description="Casual and friendly",
        style_examples=["Hey there!", "No worries!"]
    )]
)

Example 4: Fact-Checking with SELF_RAG

from sifaka import improve
from sifaka.core.types import CriticType

result = await improve(
    "The Great Wall of China is visible from space.",
    critics=[CriticType.SELF_RAG]
)
# Critiques factual accuracy and suggests corrections

Example 5: Safety & Ethics Check

from sifaka import improve
from sifaka.core.types import CriticType

result = await improve(
    "Guide on pest control methods",
    critics=[CriticType.CONSTITUTIONAL]
)
# Evaluates against safety principles

Example 6: Multiple Perspectives

from sifaka import improve
from sifaka.core.types import CriticType

result = await improve(
    "Product launch announcement",
    critics=[CriticType.N_CRITICS]
)
# Gets feedback from technical expert, general audience, editor, and skeptic perspectives

Example 7: Iteration Control

# More iterations for higher quality
result = await improve(
    "Draft email to client",
    max_iterations=5  # Default is 3
)

# Force improvements even if validation passes
result = await improve(
    "Good text that passes validation",
    force_improvements=True
)

Example 8: Configuration

from sifaka import Config

config = Config(
    model="gpt-4",
    temperature=0.7,
    max_iterations=5,
    timeout_seconds=120
)

result = await improve("Your text", config=config)

Example 9: Storage Backends

from sifaka.storage.file import FileStorage
from sifaka.storage.redis import RedisStorage

# File storage
result = await improve(
    "Your text",
    storage=FileStorage("./results")
)

# Redis storage
result = await improve(
    "Your text",
    storage=RedisStorage("redis://localhost:6379")
)

Example 10: Error Handling

from sifaka.core.exceptions import ValidationError, CriticError

try:
    result = await improve(text)
except ValidationError as e:
    print(f"Validation failed: {e}")
except CriticError as e:
    print(f"Critic error: {e}")

Example 11: Batch Processing

import asyncio
from sifaka import improve

texts = ["Text 1", "Text 2", "Text 3"]
# Start all improvement calls concurrently and wait for every result
tasks = [improve(text) for text in texts]
results = await asyncio.gather(*tasks)
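
With larger batches, launching every call at once can run into provider rate limits. One common way to cap concurrency is an asyncio.Semaphore, sketched below. The helper gather_limited and the stand-in coroutine fake_improve are hypothetical; in practice the worker would be the library's improve call.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_limited(
    texts: List[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int = 2,
) -> List[str]:
    """Run worker over texts with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(text: str) -> str:
        async with semaphore:
            return await worker(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(t) for t in texts))


async def fake_improve(text: str) -> str:
    """Hypothetical stand-in for an LLM-backed call."""
    await asyncio.sleep(0.01)
    return text.upper()


results = asyncio.run(gather_limited(["Text 1", "Text 2", "Text 3"], fake_improve))
print(results)
```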

Example 12: Custom Validators

from sifaka.validators import BaseValidator
# ValidationResult must be imported as well; its module path is not shown in this README.

class CustomValidator(BaseValidator):
    async def validate(self, text: str) -> ValidationResult:
        # Your custom validation logic: pass only if the keyword is present
        passed = "important_keyword" in text.lower()
        return ValidationResult(
            validator="custom",
            passed=passed,
            message="Must contain 'important_keyword'"
        )

result = await improve(text, validators=[CustomValidator()])

Example 13: Combining Critics for Comprehensive Review

from sifaka import improve
from sifaka.core.types import CriticType

# Technical accuracy + readability
result = await improve(
    "Technical documentation",
    critics=[CriticType.REFLEXION, CriticType.STYLE]
)

# Safety + factual accuracy
result = await improve(
    "Health advice article",
    critics=[CriticType.CONSTITUTIONAL, CriticType.SELF_RAG]
)

# Comprehensive review
result = await improve(
    "Important business document",
    critics=[
        CriticType.SELF_REFINE,
        CriticType.N_CRITICS,
        CriticType.META_REWARDING
    ]
)

Configuration

Environment Variables

# LLM Provider Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
GROQ_API_KEY=...

# Optional: Default settings
SIFAKA_DEFAULT_MODEL=gpt-4o-mini
SIFAKA_MAX_ITERATIONS=3
SIFAKA_TEMPERATURE=0.7
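
Environment variables arrive as strings, so numeric settings need explicit conversion. The sketch below shows one way such variables could be read into typed values. It is not Sifaka's loader: Defaults and load_defaults are hypothetical names, and the fallback values simply reuse the example values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Defaults:
    model: str
    max_iterations: int
    temperature: float


def load_defaults(env: Mapping[str, str] = os.environ) -> Defaults:
    """Read the optional SIFAKA_* variables, falling back to the example values."""
    return Defaults(
        model=env.get("SIFAKA_DEFAULT_MODEL", "gpt-4o-mini"),
        # int() and float() raise ValueError on malformed input, which surfaces typos early
        max_iterations=int(env.get("SIFAKA_MAX_ITERATIONS", "3")),
        temperature=float(env.get("SIFAKA_TEMPERATURE", "0.7")),
    )


print(load_defaults({"SIFAKA_MAX_ITERATIONS": "5"}))
```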

Config Object

from sifaka import Config

config = Config(
    # Model settings
    model="gpt-4",              # LLM model to use
    temperature=0.7,            # Creativity (0.0-2.0)
    max_tokens=1000,            # Max response length

    # Critic settings
    critic_temperature=0.3,     # Lower = more consistent
    critic_context_window=3,    # Previous critiques to consider

    # Behavior settings
    max_iterations=3,           # Max improvement cycles
    force_improvements=False,   # Improve even if valid
    timeout_seconds=300,        # Overall timeout
)

Architecture Overview

┌─────────────────────────────────────────────┐
│           Sifaka Improvement Loop           │
└─────────────────────────────────────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │   1. Generate/Modify     │
        │      (LLM Provider)      │
        └──────────────────────────┘
                      │
                      ▼
        ┌──────────────────────────┐
        │   2. Critique            │
        │   (Critics: Reflexion,   │
        │    Constitutional, etc)  │
        └───────────