21 skills found
promptfoo / Promptfoo: Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.
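The "simple declarative configs" mentioned above are YAML files. A minimal sketch is shown below; the model IDs and test values are illustrative assumptions, not taken from this listing:

```yaml
# promptfooconfig.yaml -- minimal sketch; model IDs are assumptions
prompts:
  - "Summarize in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-haiku-20241022
tests:
  - vars:
      text: "Promptfoo runs the same prompt against multiple models."
    assert:
      - type: contains
        value: "Promptfoo"
```

Running `npx promptfoo@latest eval` against such a file compares outputs across the listed providers, and `promptfoo view` opens the results in a browser.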
AgentEvalHQ / AgentEval: AgentEval is the comprehensive .NET toolkit for AI agent evaluation—tool usage validation, RAG quality metrics, stochastic evaluation, and model comparison—built first for Microsoft Agent Framework (MAF) and Microsoft.Extensions.AI. What RAGAS, Promptfoo, and DeepEval do for Python, AgentEval does for .NET.
promptfoo / Promptfoo Action: The GitHub Action for Promptfoo. Test your prompts, agents, and RAGs. AI red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
rsfl / Splunk MCP Llm Siemulator: A Docker lab integrating Splunk SIEM with Ollama LLM via MCP for AI security operations. Features Promptfoo OWASP evaluation, TA-ollama and TA-mcp-jsonrpc add-ons, dual bind-mount log ingestion, and real-time HEC streaming across six indexes for MITRE ATLAS TTP detection.
perzeuss / Dify Promptfoo: Evaluate Dify assistants with promptfoo!
rsfl / Splunk MCP Llm Siemulator Linux: Linux version of the Splunk MCP LLM SIEMulator. A Docker lab integrating Splunk SIEM with Ollama LLM via Model Context Protocol for AI-powered security operations. Features Promptfoo evaluation, OpenWebUI chat interface, and Splunk UF and raw HEC logging for real-time event ingestion and LLM-assisted incident response testing.
eon01 / LLMPromptEngineeringForDevelopersFiles: This repository contains the code snippets used in "LLM Prompt Engineering For Developers".
kpavlov / Koog Spring Boot Assistant: Kotlin + Spring Boot + Koog + Promptfoo example.
TomasHer / Prompting Blueprints: Your guide to the Agentic AI evolution. **Prompting Blueprints** offers a curated collection of concepts and tactics for building autonomous AI workflows. Master tool-specific playbooks, backed by structured prompt packs and rigorous evaluations for the latest AI models.
openclay-ai / Openclay: Runtime-secured AI tooling framework for production-grade LLM applications, protecting against prompt injection, jailbreaks, and adversarial attacks.
promptfoo / Mini Foo: Mini promptfoo used for interviews.
yukinagae / Genkitx Promptfoo: Community plugin for Genkit to use Promptfoo.
GenAIGator / AI RedTeaming With PromptFoo: A collection of AI red teaming tests built with Promptfoo to simulate adversarial prompts, detect prompt injection and jailbreak vulnerabilities, and evaluate the security of LLM applications and agents.
promptfoo / MCP Agent Provider: A promptfoo custom provider to test MCP servers with our evil MCP server.
christhesoul / Minitest Promptfoo: A simple Ruby wrapper for testing your LLM prompts with Promptfoo.
syamsasi99 / Prompt Evaluator: prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and ensuring reliability across models such as GPT, Claude, and Gemini.
promptfoo / Redscan Lite: Promptfoo interview exercise.
docker / Docker Model Runner And MCP With Promptfoo: Examples of how to use Docker Model Runner, Docker MCP Toolkit, and Promptfoo together to evaluate models, agents, and MCP servers.
stephenc222 / Example Promptfoo: Example project demonstrating Promptfoo.
blanky0230 / Promptfoo Lab: Notes, thoughts, and practical examples about a single tool that might prove useful in navigating newly announced seas.