Results for "llm-testing"


575 skills found · Page 1 of 20

block / Goose

33.6k

An open-source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM

claude code, cursor

mcp

Updated 20m ago

raga-ai-hub / RagaAI Catalyst

16.1k

Python SDK for agent AI observability, monitoring, and evaluation. Features include agent, LLM, and tool tracing, debugging of multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution graph views.

universal

agentic-ai, agentic-ai-development, agentneo, +9

Updated 1h ago

alibaba / MNN

14.7k

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.

universal

arm, convolution, deep-learning, +8

Updated 5m ago

microsoft / Promptflow

11.1k

Build high-quality LLM apps, from prototyping and testing to production deployment and monitoring.

universal

ai, ai-application-development, ai-applications, +5

Updated 19h ago

evidentlyai / Evidently

7.3k

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

universal

data-drift, data-quality, data-science, +11

Updated 51m ago

Giskard-AI / Giskard Oss

5.2k

🐢 Open-Source Evaluation & Testing library for LLM Agents

universal

agent-evaluation, ai-red-team, ai-security, +14

Updated 12h ago

LearningCircuit / Local Deep Research

4.2k

Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.

claude code, claude desktop

academia, anthropic, arxiv, +17

Updated 3h ago

langwatch / Langwatch

3.2k

The platform for LLM evaluations and AI agent testing

universal

ai, analytics, datasets, +10

Updated 5m ago

hegelai / Prompttools

3.0k

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

universal

deep-learning, developer-tools, embeddings, +6

Updated 11h ago

ianarawjo / ChainForge

3.0k

An open-source visual programming environment for battle-testing prompts to LLMs.

universal

ai, evaluation, large-language-models, +3

Updated 2d ago

Forethought-Technologies / AutoChain

1.9k

AutoChain: Build lightweight, extensible, and testable LLM Agents

universal

Updated 1d ago

Pythagora-io / Pythagora

1.8k

Generate automated tests for your Node.js app via LLMs without developers having to write a single line of code.

universal

api-testing, api-testing-framework, automated-testing, +6

Updated 15h ago

BlackSnufkin / LitterBox

1.3k

A secure sandbox environment for malware developers and red teamers to test payloads against detection mechanisms before deployment. Integrates with LLM agents via MCP for enhanced analysis capabilities.

claude code, cursor

ai, docker-compose, malware-analysis, +6

Updated 1d ago

JoasASantos / NeuroSploit

974

NeuroSploit is an advanced, AI-powered penetration testing framework that uses large language models (LLMs) to automate and augment various aspects of offensive security operations.

universal

ai-agents, cybersecurity, framework, +3

Updated 2h ago

pixegami / Rag Tutorial V2

937

An Improved Langchain RAG Tutorial (v2) with local LLMs, database updates, and testing.

universal

Updated 7d ago

georgian-io / LLM Finetuning Toolkit

870

Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.

universal

ablation-study, classification, falcon, +15

Updated 4d ago

codefuse-ai / Test Agent

672

Agent that empowers software testing with LLMs; industrial-first in China

universal

developer-tools, software-quality-tool, software-testing, +2

Updated 19h ago

qixucen / Atom

659

[NeurIPS 2025] Atom of Thoughts for Markov LLM Test-Time Scaling

universal

Updated 3d ago

devoxx / DevoxxGenieIDEAPlugin

635

DevoxxGenie is a plugin for IntelliJ IDEA that uses local LLMs (Ollama, LMStudio, GPT4All, Jan, and Llama.cpp) and cloud-based LLMs to help review, test, and explain your project code. The latest version also supports Spec Driven Development with CLI Runners.

claude code, claude desktop, +2

anthropic, azure-ai, chatgpt, +17

Updated 6h ago

PacificAI / Langtest

554

Deliver safe & effective language models

universal

ai-safety, ai-testing, artificial-intelligence, +16

Updated 2h ago