agent-evaluation

Use this skill when you need to evaluate, improve, or optimize the output quality of an existing LLM agent: for example, improving tool selection accuracy or answer quality, reducing costs, or fixing cases where the agent gives wrong or incomplete responses. The skill evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. IMPORTANT: always load the instrumenting-with-mlflow-tracing skill as well before starting any work. The skill covers the end-to-end evaluation workflow as well as its individual components (tracing setup, dataset creation, scorer definition, evaluation execution).
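To make the dataset, scorer, and evaluation-execution components concrete, here is a minimal, library-agnostic sketch in plain Python. It deliberately does not call the MLflow API, and it leaves out tracing. Every name in it (`Example`, `AgentOutput`, `toy_agent`, the two scorers, `evaluate`) is a hypothetical stand-in chosen for illustration, not something defined by this skill or by MLflow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """One row of a hypothetical evaluation dataset."""
    question: str
    expected_tool: str
    expected_answer: str


@dataclass
class AgentOutput:
    """What the (hypothetical) agent returns for one question."""
    tool: str
    answer: str


def toy_agent(question: str) -> AgentOutput:
    """Stand-in for the agent under evaluation (assumption, not a real agent)."""
    if "weather" in question.lower():
        return AgentOutput(tool="weather_api", answer="Sunny")
    return AgentOutput(tool="search", answer="I don't know")


def tool_selection_scorer(example: Example, output: AgentOutput) -> float:
    """Score 1.0 when the agent picked the expected tool, else 0.0."""
    return 1.0 if output.tool == example.expected_tool else 0.0


def answer_match_scorer(example: Example, output: AgentOutput) -> float:
    """Score 1.0 when the expected answer appears in the agent's answer."""
    hit = example.expected_answer.lower() in output.answer.lower()
    return 1.0 if hit else 0.0


Scorer = Callable[[Example, AgentOutput], float]


def evaluate(agent: Callable[[str], AgentOutput],
             dataset: List[Example],
             scorers: Dict[str, Scorer]) -> Dict[str, float]:
    """Run the agent on every example and return the mean score per scorer."""
    totals = {name: 0.0 for name in scorers}
    for example in dataset:
        output = agent(example.question)
        for name, scorer in scorers.items():
            totals[name] += scorer(example, output)
    return {name: total / len(dataset) for name, total in totals.items()}


DATASET = [
    Example("What is the weather in Paris?", "weather_api", "sunny"),
    Example("Who wrote Dune?", "search", "Frank Herbert"),
]
SCORERS = {
    "tool_selection": tool_selection_scorer,
    "answer_match": answer_match_scorer,
}
RESULTS = evaluate(toy_agent, DATASET, SCORERS)
```

Separating scorers from the evaluation loop means tool selection accuracy and answer quality can be tracked as independent metrics, so a change to the agent that improves one while regressing the other is visible in the results.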

Install / Use

# Copy GEMINI.md from https://github.com/Paldom/databricks-apps-fastapi-starter/blob/main/.gemini/skills/agent-evaluation/SKILL.md

About this skill

Gemini Rules

Gemini CLI config

Quality Score

33/100

Paldom

GitHub Stars: 0

Category: Automation

Updated: 2h ago

Forks: 0

Paldom/agent-evaluation

Security Score

80/100

Audited on Mar 26, 2026

1 medium, 1 low