1,160 skills found · Page 3 of 39
evalplus / Evalplus - Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
huggingface / Aisheets - Build, enrich, and transform datasets using AI models with no code
mbzuai-oryx / Video ChatGPT - [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
OpenGenerativeAI / Llm Colosseum - Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
mattpocock / Evalite - Evaluate your LLM-powered apps with TypeScript
Barca0412 / Introduction To Quantitative Finance - A curated set of introductory materials: 1. an open-source tutorial on multi-factor equity quant frameworks; 2. a collection of classic academic and industry resources; 3. work on AI + finance, including LLMs, agents, benchmarks (evaluation), etc.
cyberark / FuzzyAI - A powerful tool for automated LLM fuzzing, designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs.
yueliu1999 / Awesome Jailbreak On LLMs - A collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, code, datasets, evaluations, and analyses.
Scale3-Labs / Langtrace - Langtrace 🔍 is an open-source, OpenTelemetry-based, end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector DBs, and more. Integrate using TypeScript or Python. 🚀💻📊
microsoft / Prompty - Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers.
thu-coai / Safety Prompts - Chinese safety prompts for evaluating and improving the safety of LLMs.
cvs-health / Uqlm - UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.
rlancemartin / Auto Evaluator - Evaluation tool for LLM QA chains
EmbeddedLLM / JamAIBase - The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real time. Work together seamlessly to build and iterate on AI applications.
prometheus-eval / Prometheus Eval - Evaluate your LLM's response with Prometheus and GPT-4 💯
JudgmentLabs / Judgeval - The open-source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
langchain-ai / Openevals - Ready-made evaluators for your LLM apps
JackHopkins / Factorio Learning Environment - A non-saturating, open-ended environment for evaluating LLMs in Factorio
vllm-project / Guidellm - Evaluate and enhance your LLM deployments for real-world inference needs
dezoito / Ollama Grid Search - A multi-platform desktop application to evaluate and compare LLMs, written in Rust and React.