650 skills found · Page 1 of 22
mlflow / Mlflow · The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
google / Adk Python · An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
comet-ml / Opik · Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
raga-ai-hub / RagaAI Catalyst · Python SDK for AI agent observability, monitoring, and evaluation. Includes agent, LLM, and tool tracing, multi-agent system debugging, a self-hosted dashboard, and advanced analytics with timeline and execution-graph views.
trycua / Cua · Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
dataelement / Bisheng · BISHENG is an open LLM DevOps platform for next-generation enterprise AI applications. Its comprehensive feature set includes GenAI workflows, RAG, agents, unified model management, evaluation, SFT, dataset management, enterprise-level system management, observability, and more.
google / Adk Go · An open-source, code-first Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
tensortrade-org / Tensortrade · An open-source reinforcement learning framework for training, evaluating, and deploying robust trading agents.
GoogleCloudPlatform / Agent Starter Pack · Ship AI agents to Google Cloud in minutes, not months. Production-ready templates with built-in CI/CD, evaluation, and observability.
coze-dev / Coze Loop · Next-generation AI agent optimization platform: CozeLoop addresses challenges in AI agent development by providing full-lifecycle management, from development and debugging to evaluation and monitoring.
Giskard-AI / Giskard Oss · 🐢 Open-source evaluation & testing library for LLM agents
Kiln-AI / Kiln · Build, evaluate, and optimize AI systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
evalstate / Fast Agent · Code, build, and evaluate agents, with excellent model and Skills/MCP/ACP support
Tencent / AI Infra Guard · A full-stack AI red-teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP Scan, AI Infra Scan, and LLM jailbreak evaluation.
THUDM / AgentBench · A comprehensive benchmark to evaluate LLMs as agents (ICLR '24)
truera / Trulens · Evaluation and tracking for LLM experiments and AI agents
langwatch / Langwatch · The platform for LLM evaluations and AI agent testing
openlit / Openlit · Open source platform for AI engineering: OpenTelemetry-native LLM observability, GPU monitoring, guardrails, evaluations, prompt management, vault, playground. 🚀💻 Integrates with 50+ LLM providers, vector DBs, agent frameworks, and GPUs.
openai / Neural Mmo · Code for the paper "Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents"
google / Adk Java · An open-source, code-first Java toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.