
AgentGuard

Declarative guardrails and safety controls for .NET AI agents

NuGet Build License: MIT

What NeMo Guardrails and Guardrails AI do for Python, AgentGuard does for .NET - with the fluent APIs, composable rules, and type safety that .NET developers expect.


Why AgentGuard?

Every AI agent needs the same safety guardrails: PII detection, prompt injection blocking, topic enforcement, token limits, output validation. AgentGuard provides all of this as composable, testable, declarative rules.

The core engine is framework-agnostic - use it standalone, with Microsoft Agent Framework, Semantic Kernel, or any other .NET AI stack. Framework-specific adapters (like AgentGuard.AgentFramework for MAF) wire guardrails into the host's middleware pipeline.

```csharp
// Get started with sensible defaults - fully offline, no configuration needed
using AgentGuard.Onnx;

var policy = new GuardrailPolicyBuilder()
    .UseDefaults()    // normalization + regex + Defender ML + PII + secrets + tool guardrails
    .Build();

var pipeline = new GuardrailPipeline(policy, logger);
var result = await pipeline.RunAsync(new GuardrailContext { Text = userInput, Phase = GuardrailPhase.Input });

if (result.IsBlocked)
    Console.WriteLine($"Blocked: {result.BlockingResult!.Reason}");
```
```csharp
// Or pick and choose individual rules
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()              // decode base64/hex/unicode evasion tricks
    .GuardRetrieval()              // filter poisoned RAG chunks
    .BlockPromptInjection()        // regex-based injection detection
    .RedactPII(PiiCategory.Email | PiiCategory.Phone | PiiCategory.SSN)
    .DetectSecrets()               // block API keys, tokens, connection strings
    .EnforceTopicBoundary("customer-support", "billing", "returns")
    .LimitInputTokens(4000)
    .GuardToolCalls()              // inspect tool call arguments for injection
    .GuardToolResults()            // detect indirect injection in tool results
    .Build();
```
```csharp
// Plug into Microsoft Agent Framework
using AgentGuard.AgentFramework;
using AgentGuard.Onnx;

var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g.UseDefaults())
    .Build();
```

Features

Regex-based rules (fast, zero-cost, offline)

  • Input normalization - decodes evasion encodings (base64, hex, reversed text, Unicode homoglyphs) before downstream rules evaluate the text, catching attacks hidden via encoding tricks
  • Prompt injection detection - blocks jailbreak attempts, system prompt extraction, role-play attacks, end sequence injection, variable expansion, framing attacks, and rule addition with configurable sensitivity levels (Low/Medium/High). Patterns based on the Arcanum Prompt Injection Taxonomy
  • PII redaction - detects and redacts emails, phone numbers, SSNs, credit cards, IP addresses, dates of birth, and custom patterns on input and output
  • Topic boundary enforcement - keyword-based topic matching with pluggable ITopicSimilarityProvider for embedding-based similarity. EmbeddingSimilarityProvider in AgentGuard.Local uses any IEmbeddingGenerator<string, Embedding<float>> for cosine similarity with automatic topic embedding caching
  • Token limits - enforces input/output token budgets using Microsoft.ML.Tokenizers (cl100k_base) with configurable overflow strategies (Reject/Truncate/Warn)
  • Secrets detection - detects API keys (AWS, GitHub, Azure, Slack), JWT tokens, private keys, connection strings, bearer tokens. Block or redact actions with custom patterns and optional Shannon entropy-based detection
  • Content safety - severity-based filtering via pluggable IContentSafetyClassifier (Azure AI Content Safety adapter included). Detects harmful content (hate, violence, self-harm, sexual) - a complementary layer to prompt injection detection, not a substitute for it
  • Azure Prompt Shields - dedicated prompt injection detection via Azure AI Content Safety's Prompt Shield API (text:shieldPrompt). Detects user prompt attacks (jailbreaks, role-play, encoding attacks) and document attacks (indirect injection in grounded content). F1 50.3% (85.9% precision, 35.6% recall) on adversarial benchmarks — complements local Defender (F1 ~97%) with a cloud-based signal. Order 14. Install via AgentGuard.Azure package
  • Azure Protected Material detection - detects copyrighted text (lyrics, articles, recipes) and code from GitHub repositories in LLM-generated output via text:detectProtectedMaterial and text:detectProtectedMaterialForCode. Code detection returns license info and source URLs. No C# SDK exists for these APIs — AgentGuard provides the only .NET client. Output phase (order 76), supports Block/Warn actions. Install via AgentGuard.Azure package
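
Several of the rules above (PII redaction, secrets detection, Protected Material) run on agent output as well as input. A minimal sketch of an output-phase check, reusing the pipeline API from the quick-start snippet; whether redacted text is surfaced on the result object, and under what property name, is an assumption not covered by this README:

```csharp
var outputResult = await pipeline.RunAsync(new GuardrailContext
{
    Text = agentResponse,            // the LLM's draft reply
    Phase = GuardrailPhase.Output    // the quick-start snippet shows GuardrailPhase.Input
});

if (outputResult.IsBlocked)
    Console.WriteLine($"Blocked: {outputResult.BlockingResult!.Reason}");
```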

RAG & Agentic guardrails (zero-cost, offline)

  • Retrieval guardrails - filters retrieved chunks before they reach the LLM context. Detects prompt injection, secrets, and PII in knowledge base content. Supports relevance score filtering, max chunk limits, remove/sanitize actions, and custom filters. Integrates with MAF via RetrievalGuardrailContextProvider
  • Tool call guardrails - inspects agent tool call arguments for SQL injection, code injection, path traversal, command injection, SSRF, template injection, and XSS. Per-tool and per-argument allowlists for tools that legitimately accept code/SQL. Automatically extracted from MAF agent responses
  • Tool result guardrails - detects indirect prompt injection hidden in tool call results (emails, documents, API responses). Three-tier risk-based detection with tool-specific risk profiles (email=high, docs=medium, calculator=low). Supports block or sanitize actions, Unicode control character stripping, and custom patterns. Inspired by StackOneHQ/defender
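
The `GuardRetrieval`/`GuardToolCalls`/`GuardToolResults` builder methods appear in the snippet above; how the per-tool and per-argument allowlists are expressed is not shown, so the options lambda and `AllowArgument` helper below are purely illustrative assumptions:

```csharp
var policy = new GuardrailPolicyBuilder()
    .GuardRetrieval()                 // filter poisoned RAG chunks before they reach context
    .GuardToolCalls(options =>
    {
        // hypothetical shape: exempt an argument that legitimately accepts SQL
        options.AllowArgument(tool: "run_query", argument: "sql");
    })
    .GuardToolResults()               // indirect-injection scan on tool outputs
    .Build();
```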

ONNX ML-based rules (fast, accurate, offline)

  • StackOne Defender prompt injection detection - uses the StackOne Defender fine-tuned MiniLM-L6-v2 ONNX model (~22 MB, bundled in NuGet) for ML-based classification. F1 ~0.97 on adversarial benchmarks, ~8 ms inference, fully offline. No download required - the model is bundled with AgentGuard.Onnx. Order 11 (default). Also supports optional DeBERTa v3 model (order 12, separate download via ./eng/download-onnx-model.sh)
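
Because the Defender model ships inside the NuGet package, enabling the ML tier is a one-line addition on top of the regex tier; both builder calls below appear verbatim in the multi-tier example later in this README:

```csharp
using AgentGuard.Onnx;

// Defender (~22 MB MiniLM ONNX) is bundled - no model download step required
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()               // regex tier (order 10)
    .BlockPromptInjectionWithDefender()   // ML tier (order 11), ~8 ms inference, offline
    .Build();
```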

Remote ML classifier (SOTA models via HTTP)

  • Remote prompt injection detection - calls external model servers (Ollama, vLLM, HuggingFace TGI, custom FastAPI endpoints) for ML-based classification. Designed for SOTA models like Sentinel-v2 (Qwen3-0.6B, F1 ~0.957, 32K context). Lightweight - no native ML dependencies, just HttpClient. Pluggable IRemoteClassifier abstraction. Order 13, fails open by default. Install via AgentGuard.RemoteClassifier package.
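
Only the `BlockPromptInjectionWithRemoteClassifier` method name comes from this README; the options lambda and property names below are assumptions about how an Ollama/vLLM-style endpoint might be wired in:

```csharp
using AgentGuard.RemoteClassifier;

var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjectionWithRemoteClassifier(o =>
    {
        // hypothetical option shape - e.g. a local Ollama server
        o.Endpoint = new Uri("http://localhost:11434");
        o.FailOpen = true;    // the README says this rule fails open by default
    })
    .Build();
```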

LLM-based rules (accurate, pluggable via IChatClient)

For teams that need higher accuracy than regex, AgentGuard provides LLM-as-judge guardrail rules that work with any IChatClient (Azure OpenAI, Ollama, local models, etc.):

  • LLM prompt injection detection - catches sophisticated attacks that regex misses: narrative smuggling, meta-prompting, cognitive overload, multi-chain attacks, and more. Prompt templates cover all 12 attack technique families and 20 evasion methods from the Arcanum PI Taxonomy. Returns structured threat classification metadata (technique, intent, evasion, confidence) for operational telemetry
  • LLM PII detection & redaction - catches unstructured PII like full names, physical addresses, and contextual identifiers that regex can't find. Supports block or redact modes
  • LLM topic boundary enforcement - semantic topic classification that understands intent, not just keywords
  • LLM output policy enforcement - checks if agent responses violate custom policy constraints (e.g. "never recommend competitors", "always include a disclaimer"). Configurable policy description with block or warn modes
  • LLM groundedness checking - detects hallucinated facts and claims not supported by the conversation context. Uses conversation history for grounding evaluation
  • LLM copyright detection - catches verbatim reproduction of copyrighted material (song lyrics, book passages, articles, restrictively-licensed code). Kill switch for copyright violations
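
The LLM-as-judge rules plug in through `Microsoft.Extensions.AI`'s `IChatClient`. The builder method names in this sketch are hypothetical (the multi-tier snippet below is truncated before the LLM tier); only `IChatClient` and the rule descriptions come from the text:

```csharp
using Microsoft.Extensions.AI;

IChatClient judge = /* Azure OpenAI, Ollama, or any other IChatClient */ null!;

var policy = new GuardrailPolicyBuilder()
    // hypothetical method names - the actual builder API may differ
    .BlockPromptInjectionWithLlm(judge)   // narrative smuggling, meta-prompting, etc.
    .CheckGroundednessWithLlm(judge)      // flags claims unsupported by conversation history
    .Build();
```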
```csharp
using AgentGuard.Onnx;
using AgentGuard.Azure.PromptShield;

// Six-tier prompt injection detection: Regex → Defender → DeBERTa → Remote ML → Prompt Shield → LLM
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()                       // tier 1: fast regex (order 10)
    .BlockPromptInjectionWithDefender()           // tier 2: Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier(
```
