575 skills found · Page 1 of 20
block / Goose · an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
raga-ai-hub / RagaAI Catalyst · Python SDK for an Agent AI Observability, Monitoring and Evaluation Framework. Includes features like agent, LLM, and tool tracing, debugging of multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution graph views
alibaba / MNN · MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
microsoft / Promptflow · Build high-quality LLM apps - from prototyping and testing to production deployment and monitoring.
evidentlyai / Evidently · Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Giskard-AI / Giskard Oss · 🐢 Open-Source Evaluation & Testing library for LLM Agents
LearningCircuit / Local Deep Research · Local Deep Research achieves ~95% on the SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.
langwatch / Langwatch · The platform for LLM evaluations and AI agent testing
hegelai / Prompttools · Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
ianarawjo / ChainForge · An open-source visual programming environment for battle-testing prompts to LLMs.
Forethought-Technologies / AutoChain · AutoChain: Build lightweight, extensible, and testable LLM Agents
Pythagora-io / Pythagora · Generate automated tests for your Node.js app via LLMs without developers having to write a single line of code.
BlackSnufkin / LitterBox · A secure sandbox environment for malware developers and red teamers to test payloads against detection mechanisms before deployment. Integrates with LLM agents via MCP for enhanced analysis capabilities.
JoasASantos / NeuroSploit · NeuroSploit is an advanced, AI-powered penetration testing framework designed to automate and augment various aspects of offensive security operations, leveraging the capabilities of large language models (LLMs).
pixegami / Rag Tutorial V2 · An Improved Langchain RAG Tutorial (v2) with local LLMs, database updates, and testing.
georgian-io / LLM Finetuning Toolkit · Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.
codefuse-ai / Test Agent · Agent that empowers software testing with LLMs; industrial-first in China
qixucen / Atom · [NeurIPS 2025] Atom of Thoughts for Markov LLM Test-Time Scaling
devoxx / DevoxxGenieIDEAPlugin · DevoxxGenie is a plugin for IntelliJ IDEA that uses local LLMs (Ollama, LMStudio, GPT4All, Jan and Llama.cpp) and cloud-based LLMs to help review, test, and explain your project code. Latest version now also supports Spec Driven Development with CLI Runners.
PacificAI / Langtest · Deliver safe & effective language models