agent-evaluation
Use this when you need to EVALUATE OR IMPROVE or OPTIMIZE an existing LLM agent's output quality - including improving tool selection accuracy, answer quality, reducing costs, or fixing issues where the agent gives wrong/incomplete responses. Evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. IMPORTANT - Always also load the instrumenting-with-mlflow-tracing skill before starting any work. Covers end-to-end evaluation workflow or individual components (tracing setup, dataset creation, scorer definition, evaluation execution).
Install / Use
# Copy GEMINI.md from https://github.com/Paldom/databricks-apps-fastapi-starter/blob/main/.gemini/skills/agent-evaluation/SKILL.mdGemini Rules
Gemini CLI config
Quality Score
Category
AutomationSupported Platforms
Tags
Related Skills
apple-reminders
337.3kManage Apple Reminders via remindctl CLI (list, add, edit, complete, delete). Supports lists, date filters, and JSON/plain output.
canvas
337.3kCanvas Skill Display HTML content on connected OpenClaw nodes (Mac app, iOS, Android). Overview The canvas tool lets you present web content on any connected node's canvas view. Great for: -
gh-issues
337.3kFetch GitHub issues, spawn sub-agents to implement fixes and open PRs, then monitor and address PR review comments. Usage: /gh-issues [owner/repo] [--label bug] [--limit 5] [--milestone v1.0] [--assignee @me] [--fork user/repo] [--watch] [--interval 5] [--reviews-only] [--cron] [--dry-run] [--model glm-5] [--notify-channel -1002381931352]
imsg
337.3kiMessage/SMS CLI for listing chats, history, and sending messages via Messages.app.
Security Score
Audited on Mar 26, 2026
