650 skills found · Page 1 of 22
mlflow / Mlflow · The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
google / Adk Python · An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
comet-ml / Opik · Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
raga-ai-hub / RagaAI Catalyst · Python SDK for AI agent observability, monitoring, and evaluation. Includes agent, LLM, and tool tracing, multi-agent system debugging, a self-hosted dashboard, and advanced analytics with timeline and execution-graph views.
trycua / Cua · Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
dataelement / Bisheng · BISHENG is an open LLM DevOps platform for next-generation enterprise AI applications. Its comprehensive feature set includes GenAI workflows, RAG, agents, unified model management, evaluation, SFT, dataset management, enterprise-level system management, observability, and more.
google / Adk Go · An open-source, code-first Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
tensortrade-org / Tensortrade · An open-source reinforcement learning framework for training, evaluating, and deploying robust trading agents.
GoogleCloudPlatform / Agent Starter Pack · Ship AI agents to Google Cloud in minutes, not months. Production-ready templates with built-in CI/CD, evaluation, and observability.
coze-dev / Coze Loop · Next-generation AI agent optimization platform: CozeLoop addresses challenges in AI agent development by providing full-lifecycle management, from development and debugging to evaluation and monitoring.
Giskard-AI / Giskard Oss · 🐢 Open-source evaluation & testing library for LLM agents
Kiln-AI / Kiln · Build, evaluate, and optimize AI systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
evalstate / Fast Agent · Code, build, and evaluate agents, with excellent model and Skills/MCP/ACP support
Tencent / AI Infra Guard · A full-stack AI red-teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP Scan, AI Infra Scan, and LLM jailbreak evaluation.
THUDM / AgentBench · A comprehensive benchmark to evaluate LLMs as agents (ICLR '24)
truera / Trulens · Evaluation and tracking for LLM experiments and AI agents
langwatch / Langwatch · The platform for LLM evaluations and AI agent testing
openlit / Openlit · Open source platform for AI engineering: OpenTelemetry-native LLM observability, GPU monitoring, guardrails, evaluations, prompt management, vault, playground. 🚀💻 Integrates with 50+ LLM providers, vector DBs, agent frameworks, and GPUs.
openai / Neural Mmo · Code for the paper "Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents"
google / Adk Java · An open-source, code-first Java toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.