Results for "gpt-evaluation"

85 skills found · Page 1 of 3

oumi-ai / Oumi

9.2k

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

universal

dpo, evaluation, fine-tuning, +9

Updated 36m ago

LianjiaTech / BELLE

8.3k

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)

universal

bloom, chinese-nlp, gpt-evaluation, +7

Updated 5h ago

open-compass / Opencompass

6.8k

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

claude code, claude desktop

benchmark, chatgpt, evaluation, +5

Updated 50m ago

Azure-Samples / Contoso Chat

757

This sample covers the full end-to-end process of creating a RAG application with Prompty and Azure AI Foundry. It includes GPT-4 LLM application code, evaluations, deployment automation with the AZD CLI, GitHub Actions for evaluation and deployment, and intent mapping for multiple LLM tasks.

vscode copilot

ai-azd-templates, azd-templates, azure-ai-foundry, +10

Updated 3d ago

THU-KEG / EvaluationPapers4ChatGPT

455

Resource, Evaluation and Detection Papers for ChatGPT

universal

chatgpt, detection, evaluation, +2

Updated 20d ago

nlpyang / Geval

414

Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

universal

Updated 12d ago

GPT-Fathom / GPT Fathom

343

GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well as OpenAI's earlier models on 20+ curated benchmarks under aligned settings.

universal

Updated 17d ago

prometheus-eval / Prometheus

313

[ICLR 2024 & NeurIPS 2023 WS] An evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.

zed

Updated 1d ago

PicoTrex / GPT ImgEval

305

GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities

universal

Updated 1mo ago

3DTopia / GPTEval3D

284

[CVPR 2024] Implementation for "GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation"

universal

Updated 3mo ago

VietnamAIHub / Vietnamese LLMs

279

The project includes: 1. Building a Vietnamese instruction dataset (high-quality, large, and diverse). 2. LLM training, fine-tuning, evaluation, and testing on open-source language models: Bloomz, T5, UL2, LLaMA (1 & 2), OpenLLaMA, GPT-J, Pythia, etc. 3. Applications and a user interface (UI).

universal

Updated 3d ago

wxjiao / Is ChatGPT A Good Translator

248

A preliminary evaluation of ChatGPT/GPT-4 for machine translation.

universal

chatgpt, gpt-4, multilingual, +4

Updated 8mo ago

AnchoringAI / Anchoring AI

155

An open-source no-code tool for teams to collaborate on building, evaluating, and hosting applications that leverage GPT and other large language models. You can easily build and share LLM-powered apps, manage your budget, and run batch jobs.

universal

ai, anchoring, apache, +11

Updated 1mo ago

FreedomIntelligence / Evaluation Of ChatGPT On Information Extraction

134

An evaluation of ChatGPT on information extraction tasks, including Named Entity Recognition (NER), Relation Extraction (RE), Event Extraction (EE), and Aspect-based Sentiment Analysis (ABSA).

universal

chatgpt, error-types, evaluation, +10

Updated 5mo ago

SCUT-DLVCLab / GPT 4V OCR

126

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

universal

Updated 2mo ago

allenai / CommonGen Eval

Evaluating LLMs with CommonGen-Lite

universal

chatgpt, evaluation, gpt-evaluation, +4

Updated 1mo ago

Re-Align / Just Eval

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

universal

evaluation, gpt4, llm, +3

Updated 3mo ago

prometheus-eval / Prometheus Vision

[ACL 2024 Findings & ICLR 2024 WS] An evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus-Vision is a good alternative to human evaluation and GPT-4V evaluation.

zed

Updated 28d ago

BladeTransformerLLC / OvercookedGPT

An OpenAI Gym environment to evaluate the ability of LLMs (e.g., GPT-4, Claude) in long-horizon reasoning and task planning in dynamic multi-agent settings.

claude code, claude desktop

Updated 2mo ago

EPFL-VILAB / Fm Vision Evals

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks, ICLR 2026

universal

Updated 1mo ago