
AgentGuard

Declarative guardrails and safety controls for .NET AI agents

NuGet Build License: MIT

What NeMo Guardrails and Guardrails AI do for Python, AgentGuard does for .NET - with the fluent APIs, composable rules, and type safety that .NET developers expect.


Why AgentGuard?

Every AI agent needs the same safety guardrails: PII detection, prompt injection blocking, topic enforcement, token limits, output validation. AgentGuard provides all of this as composable, testable, declarative rules.

The core engine is framework-agnostic - use it standalone, with Microsoft Agent Framework, Semantic Kernel, or any other .NET AI stack. Framework-specific adapters (like AgentGuard.AgentFramework for MAF) wire guardrails into the host's middleware pipeline.

```csharp
// Get started with sensible defaults - fully offline, no configuration needed
using AgentGuard.Onnx;

var policy = new GuardrailPolicyBuilder()
    .UseDefaults()    // normalization + regex + Defender ML + PII + secrets + tool guardrails
    .Build();

var pipeline = new GuardrailPipeline(policy, logger);
var result = await pipeline.RunAsync(new GuardrailContext { Text = userInput, Phase = GuardrailPhase.Input });

if (result.IsBlocked)
    Console.WriteLine($"Blocked: {result.BlockingResult!.Reason}");
```
```csharp
// Or pick and choose individual rules
var policy = new GuardrailPolicyBuilder()
    .NormalizeInput()              // decode base64/hex/unicode evasion tricks
    .GuardRetrieval()              // filter poisoned RAG chunks
    .BlockPromptInjection()        // regex-based injection detection
    .RedactPII(PiiCategory.Email | PiiCategory.Phone | PiiCategory.SSN)
    .DetectSecrets()               // block API keys, tokens, connection strings
    .EnforceTopicBoundary("customer-support", "billing", "returns")
    .LimitInputTokens(4000)
    .GuardToolCalls()              // inspect tool call arguments for injection
    .GuardToolResults()            // detect indirect injection in tool results
    .Build();
```
```csharp
// Plug into Microsoft Agent Framework
using AgentGuard.AgentFramework;
using AgentGuard.Onnx;

var guardedAgent = agent
    .AsBuilder()
    .UseAgentGuard(g => g.UseDefaults())
    .Build();
```

Features

Regex-based rules (fast, zero-cost, offline)

  • Input normalization - decodes evasion encodings (base64, hex, reversed text, Unicode homoglyphs) before downstream rules evaluate the text, catching attacks hidden via encoding tricks
  • Prompt injection detection - blocks jailbreak attempts, system prompt extraction, role-play attacks, end sequence injection, variable expansion, framing attacks, and rule addition with configurable sensitivity levels (Low/Medium/High). Patterns based on the Arcanum Prompt Injection Taxonomy
  • PII redaction - detects and redacts emails, phone numbers, SSNs, credit cards, IP addresses, dates of birth, and custom patterns on input and output
  • Topic boundary enforcement - keyword-based topic matching with pluggable ITopicSimilarityProvider for embedding-based similarity. EmbeddingSimilarityProvider in AgentGuard.Local uses any IEmbeddingGenerator<string, Embedding<float>> for cosine similarity with automatic topic embedding caching
  • Token limits - enforces input/output token budgets using Microsoft.ML.Tokenizers (cl100k_base) with configurable overflow strategies (Reject/Truncate/Warn)
  • Secrets detection - detects API keys (AWS, GitHub, Azure, Slack), JWT tokens, private keys, connection strings, bearer tokens. Block or redact actions with custom patterns and optional Shannon entropy-based detection
  • Content safety - severity-based filtering via pluggable IContentSafetyClassifier (Azure AI Content Safety adapter included). Detects harmful content (hate, violence, self-harm, sexual) - a complementary layer to prompt injection detection, not a substitute for it
  • Azure Prompt Shields - dedicated prompt injection detection via Azure AI Content Safety's Prompt Shield API (text:shieldPrompt). Detects user prompt attacks (jailbreaks, role-play, encoding attacks) and document attacks (indirect injection in grounded content). F1 50.3% (85.9% precision, 35.6% recall) on adversarial benchmarks — complements local Defender (F1 ~97%) with a cloud-based signal. Order 14. Install via AgentGuard.Azure package
  • Azure Protected Material detection - detects copyrighted text (lyrics, articles, recipes) and code from GitHub repositories in LLM-generated output via text:detectProtectedMaterial and text:detectProtectedMaterialForCode. Code detection returns license info and source URLs. No C# SDK exists for these APIs — AgentGuard provides the only .NET client. Output phase (order 76), supports Block/Warn actions. Install via AgentGuard.Azure package
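
Several of the rules above (PII redaction, secrets detection, Protected Material) run on agent output as well as input. A minimal sketch of an output-phase check, reusing the pipeline API from the quick-start snippet; whether redacted text is surfaced on the result object, and under what property name, is an assumption not covered by this README:

```csharp
var outputResult = await pipeline.RunAsync(new GuardrailContext
{
    Text = agentResponse,            // the LLM's draft reply
    Phase = GuardrailPhase.Output    // the quick-start snippet shows GuardrailPhase.Input
});

if (outputResult.IsBlocked)
    Console.WriteLine($"Blocked: {outputResult.BlockingResult!.Reason}");
```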

RAG & Agentic guardrails (zero-cost, offline)

  • Retrieval guardrails - filters retrieved chunks before they reach the LLM context. Detects prompt injection, secrets, and PII in knowledge base content. Supports relevance score filtering, max chunk limits, remove/sanitize actions, and custom filters. Integrates with MAF via RetrievalGuardrailContextProvider
  • Tool call guardrails - inspects agent tool call arguments for SQL injection, code injection, path traversal, command injection, SSRF, template injection, and XSS. Per-tool and per-argument allowlists for tools that legitimately accept code/SQL. Automatically extracted from MAF agent responses
  • Tool result guardrails - detects indirect prompt injection hidden in tool call results (emails, documents, API responses). Three-tier risk-based detection with tool-specific risk profiles (email=high, docs=medium, calculator=low). Supports block or sanitize actions, Unicode control character stripping, and custom patterns. Inspired by StackOneHQ/defender
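
The `GuardRetrieval`/`GuardToolCalls`/`GuardToolResults` builder methods appear in the snippet above; how the per-tool and per-argument allowlists are expressed is not shown, so the options lambda and `AllowArgument` helper below are purely illustrative assumptions:

```csharp
var policy = new GuardrailPolicyBuilder()
    .GuardRetrieval()                 // filter poisoned RAG chunks before they reach context
    .GuardToolCalls(options =>
    {
        // hypothetical shape: exempt an argument that legitimately accepts SQL
        options.AllowArgument(tool: "run_query", argument: "sql");
    })
    .GuardToolResults()               // indirect-injection scan on tool outputs
    .Build();
```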

ONNX ML-based rules (fast, accurate, offline)

  • StackOne Defender prompt injection detection - uses the StackOne Defender fine-tuned MiniLM-L6-v2 ONNX model (~22 MB, bundled in NuGet) for ML-based classification. F1 ~0.97 on adversarial benchmarks, ~8 ms inference, fully offline. No download required - the model is bundled with AgentGuard.Onnx. Order 11 (default). Also supports optional DeBERTa v3 model (order 12, separate download via ./eng/download-onnx-model.sh)
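
Because the Defender model ships inside the NuGet package, enabling the ML tier is a one-line addition on top of the regex tier; both builder calls below appear verbatim in the multi-tier example later in this README:

```csharp
using AgentGuard.Onnx;

// Defender (~22 MB MiniLM ONNX) is bundled - no model download step required
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()               // regex tier (order 10)
    .BlockPromptInjectionWithDefender()   // ML tier (order 11), ~8 ms inference, offline
    .Build();
```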

Remote ML classifier (SOTA models via HTTP)

  • Remote prompt injection detection - calls external model servers (Ollama, vLLM, HuggingFace TGI, custom FastAPI endpoints) for ML-based classification. Designed for SOTA models like Sentinel-v2 (Qwen3-0.6B, F1 ~0.957, 32K context). Lightweight - no native ML dependencies, just HttpClient. Pluggable IRemoteClassifier abstraction. Order 13, fails open by default. Install via AgentGuard.RemoteClassifier package.
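
Only the `BlockPromptInjectionWithRemoteClassifier` method name comes from this README; the options lambda and property names below are assumptions about how an Ollama/vLLM-style endpoint might be wired in:

```csharp
using AgentGuard.RemoteClassifier;

var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjectionWithRemoteClassifier(o =>
    {
        // hypothetical option shape - e.g. a local Ollama server
        o.Endpoint = new Uri("http://localhost:11434");
        o.FailOpen = true;    // the README says this rule fails open by default
    })
    .Build();
```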

LLM-based rules (accurate, pluggable via IChatClient)

For teams that need higher accuracy than regex, AgentGuard provides LLM-as-judge guardrail rules that work with any IChatClient (Azure OpenAI, Ollama, local models, etc.):

  • LLM prompt injection detection - catches sophisticated attacks that regex misses: narrative smuggling, meta-prompting, cognitive overload, multi-chain attacks, and more. Prompt templates cover all 12 attack technique families and 20 evasion methods from the Arcanum PI Taxonomy. Returns structured threat classification metadata (technique, intent, evasion, confidence) for operational telemetry
  • LLM PII detection & redaction - catches unstructured PII like full names, physical addresses, and contextual identifiers that regex can't find. Supports block or redact modes
  • LLM topic boundary enforcement - semantic topic classification that understands intent, not just keywords
  • LLM output policy enforcement - checks if agent responses violate custom policy constraints (e.g. "never recommend competitors", "always include a disclaimer"). Configurable policy description with block or warn modes
  • LLM groundedness checking - detects hallucinated facts and claims not supported by the conversation context. Uses conversation history for grounding evaluation
  • LLM copyright detection - catches verbatim reproduction of copyrighted material (song lyrics, book passages, articles, restrictively-licensed code). Kill switch for copyright violations
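
The LLM-as-judge rules plug in through `Microsoft.Extensions.AI`'s `IChatClient`. The builder method names in this sketch are hypothetical (the multi-tier snippet below is truncated before the LLM tier); only `IChatClient` and the rule descriptions come from the text:

```csharp
using Microsoft.Extensions.AI;

IChatClient judge = /* Azure OpenAI, Ollama, or any other IChatClient */ null!;

var policy = new GuardrailPolicyBuilder()
    // hypothetical method names - the actual builder API may differ
    .BlockPromptInjectionWithLlm(judge)   // narrative smuggling, meta-prompting, etc.
    .CheckGroundednessWithLlm(judge)      // flags claims unsupported by conversation history
    .Build();
```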
```csharp
using AgentGuard.Onnx;
using AgentGuard.Azure.PromptShield;

// Six-tier prompt injection detection: Regex → Defender → DeBERTa → Remote ML → Prompt Shield → LLM
var policy = new GuardrailPolicyBuilder()
    .BlockPromptInjection()                       // tier 1: fast regex (order 10)
    .BlockPromptInjectionWithDefender()           // tier 2: Defender ML (order 11, bundled)
    .BlockPromptInjectionWithRemoteClassifier(
```
