oumi-ai / Oumi - Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
LianjiaTech / BELLE - BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)
open-compass / Opencompass - OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Azure-Samples / Contoso Chat - This sample covers the full end-to-end process of creating a RAG application with Prompty and Azure AI Foundry. It includes GPT-4 LLM application code, evaluations, deployment automation with the AZD CLI, GitHub Actions for evaluation and deployment, and intent mapping for multiple LLM tasks.
THU-KEG / EvaluationPapers4ChatGPT - Resource, Evaluation and Detection Papers for ChatGPT
nlpyang / Geval - Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
GPT-Fathom / GPT Fathom - GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well as OpenAI's earlier models on 20+ curated benchmarks under aligned settings.
prometheus-eval / Prometheus - [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.
PicoTrex / GPT ImgEval - GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities
3DTopia / GPTEval3D - [CVPR 2024] Implementation for "GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation"
VietnamAIHub / Vietnamese LLMs - The project includes: 1. Building a Vietnamese instruction dataset (high-quality, large, and diverse). 2. LLM training, fine-tuning, evaluation, and testing on open-source language models: Bloomz, T5, UL2, LLaMA (1 & 2), OpenLLaMA, GPT-J, Pythia, etc. 3. Applications and user interface (UI).
wxjiao / Is ChatGPT A Good Translator - A preliminary evaluation of ChatGPT/GPT-4 for machine translation.
AnchoringAI / Anchoring AI - An open-source no-code tool for teams to collaborate on building, evaluating, and hosting applications leveraging GPT and other large language models. You can easily build and share LLM-powered apps, manage your budget, and run batch jobs.
FreedomIntelligence / Evaluation Of ChatGPT On Information Extraction - An evaluation of ChatGPT on information extraction tasks, including Named Entity Recognition (NER), Relation Extraction (RE), Event Extraction (EE), and Aspect-based Sentiment Analysis (ABSA).
SCUT-DLVCLab / GPT 4V OCR - Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
allenai / CommonGen Eval - Evaluating LLMs with CommonGen-Lite
Re-Align / Just Eval - A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
prometheus-eval / Prometheus Vision - [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus-Vision is a good alternative to human evaluation and GPT-4V evaluation.
BladeTransformerLLC / OvercookedGPT - An OpenAI gym environment to evaluate the ability of LLMs (e.g., GPT-4, Claude) in long-horizon reasoning and task planning in dynamic multi-agent settings.
EPFL-VILAB / Fm Vision Evals - How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks, ICLR 2026