10 skills found
zzyfight / Genai Compliance Bench: GenAI compliance benchmark is an evaluation benchmark for generative AI in regulated industries.
ai-dynamo / Aiperf: AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
alopatenko / LLMEvaluation: A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods.
aichipdesign / Chipgptv: Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation (ICCAD 2024)
cysecbench / Dataset: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models
Generative-Engine-Marketing / GEM Bench: First complete benchmark for Generative Engine Marketing (GEM), an emerging field that focuses on monetizing generative AI by seamlessly integrating advertisements into Large Language Model (LLM) responses. Our work addresses the core problem of ad-injected response (AIR) generation and provides a framework for its evaluation.
pvlbzn / Latai: LatAI – A latency benchmarking tool for evaluating multiple generative AI providers and models 🌎.
toxy4ny / Kidnapp AI Benchmark: Kidnapp-AI-Benchmark is a modular, extensible framework designed to systematically test and evaluate privacy leakage, data extraction, and adversarial vulnerabilities in large language models (LLMs) and other generative AI systems. Built for red teamers, penetration testers, and AI security researchers.
ml-energy / Benchmark: A time & energy benchmark suite for generative AI
Faruman / Comparison FinancialDataGeneration: This repository contains the online appendix for the paper "Generative AI for Banks: Benchmarks and Algorithms for Synthetic Financial Transaction Data". It provides not only the Python code for all experiments conducted but also background information for the literature review.