SharpAI
SharpAI is an embeddable platform for embeddings, completions, and model management, built on llama.cpp via LlamaSharp, with a built-in Ollama-compatible web server.
Transform your .NET applications into AI powerhouses - embed models directly or deploy as an Ollama-compatible and OpenAI-compatible API server. No cloud dependencies, no limits, just local embeddings and inference.
<p align="center"> <img src="https://img.shields.io/badge/.NET-5C2D91?style=for-the-badge&logo=.net&logoColor=white" /> <img src="https://img.shields.io/badge/C%23-239120?style=for-the-badge&logo=c-sharp&logoColor=white" /> <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" /> </p> <p align="center"> <a href="https://www.nuget.org/packages/SharpAI/"> <img src="https://img.shields.io/nuget/v/SharpAI.svg?style=flat" alt="NuGet Version"> </a> <a href="https://www.nuget.org/packages/SharpAI"> <img src="https://img.shields.io/nuget/dt/SharpAI.svg" alt="NuGet Downloads"> </a> </p> <p align="center"> <strong>A .NET library for local AI model inference with Ollama-compatible and OpenAI-compatible REST APIs</strong> </p> <p align="center"> Embeddings • Completions • Chat • Built on LlamaSharp • GGUF Models Only </p>

📁 Monorepo Structure
SharpAI is organized as a monorepo containing the core library, server, dashboard, and client SDKs:
```
SharpAI/
├── src/                  # Core .NET library and server
│   ├── SharpAI/          # Core library (NuGet: SharpAI)
│   ├── SharpAI.Server/   # REST API server
│   └── Test.*/           # Test projects
├── dashboard/            # Next.js 14 web interface
├── sdk/
│   ├── csharp/           # C# SDK (NuGet: SharpAI.Sdk)
│   ├── python/           # Python SDK (coming soon)
│   └── js/               # TypeScript/JavaScript SDK (npm: @sharpai/sdk)
├── docker/               # Docker assets
└── README.md
```
Sub-Projects
| Project | Description | Documentation |
|---------|-------------|---------------|
| SharpAI | Core .NET library for local AI inference | This README |
| SharpAI.Server | Ollama- and OpenAI-compatible REST API server | This README |
| Dashboard | Next.js web interface for managing models | dashboard/README.md |
| C# SDK | SDK for .NET applications to connect to the SharpAI server | sdk/csharp/README.md |
| TypeScript SDK | SDK for Node.js/browser applications | sdk/js/README.md |
| Python SDK | SDK for Python applications | sdk/python/README.md |
🚀 Features
- Ollama- and OpenAI-Compatible REST API Server - Exposes endpoints compatible with the Ollama and OpenAI APIs
- Model Management - Download and manage GGUF models from HuggingFace using Ollama APIs
- Multiple Inference Types:
- Text embeddings generation
- Text completions
- Chat completions
- Prompt Engineering Tools - Built-in helpers for formatting prompts for different model types
- GPU Acceleration - Automatic CUDA detection when available
- Streaming Support - Real-time token streaming for completions
- SQLite Model Registry - Tracks model metadata and file information
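Because the server speaks the Ollama and OpenAI wire formats, any .NET client can talk to it with plain HTTP. The sketch below assumes the server is listening on `localhost:11434` (Ollama's conventional port) and that the routes follow the public Ollama and OpenAI API conventions; the host, port, routes, and model name are assumptions for illustration, not documented SharpAI defaults.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ApiSketch
{
    static async Task Main()
    {
        // Assumed endpoint; adjust to wherever SharpAI.Server is actually listening
        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

        // Ollama-style completion request (route per Ollama's public API)
        var ollamaBody = new StringContent(
            "{\"model\":\"QuantFactory/Qwen2.5-3B-GGUF\",\"prompt\":\"Once upon a time\",\"stream\":false}",
            Encoding.UTF8, "application/json");
        var ollamaResponse = await http.PostAsync("/api/generate", ollamaBody);
        Console.WriteLine(await ollamaResponse.Content.ReadAsStringAsync());

        // OpenAI-style chat completion request (route per OpenAI's public API)
        var openAiBody = new StringContent(
            "{\"model\":\"QuantFactory/Qwen2.5-3B-GGUF\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello!\"}]}",
            Encoding.UTF8, "application/json");
        var openAiResponse = await http.PostAsync("/v1/chat/completions", openAiBody);
        Console.WriteLine(await openAiResponse.Content.ReadAsStringAsync());
    }
}
```

The same requests work from existing Ollama or OpenAI client libraries pointed at the SharpAI server's base URL.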
📋 Table of Contents
- Installation
- Core Components
- Model Management
- Generating Embeddings
- Text Completions
- Chat Completions
- Prompt Formatting
- API Server
- Requirements
- Version History
- License
- Acknowledgments
📦 Installation
Install SharpAI via NuGet:

```
dotnet add package SharpAI
```

Or via Package Manager Console:

```
Install-Package SharpAI
```
📖 Core Components
AIDriver
The main entry point that provides access to all functionality:
```csharp
using SharpAI;
using SyslogLogging;

// Initialize the AI driver
var ai = new AIDriver(
    logging: new LoggingModule(),
    databaseFilename: "./sharpai.db",
    huggingFaceApiKey: "hf_xxxxxxxxxxxx",
    modelDirectory: "./models/"
);

// Download a model from HuggingFace (GGUF format only)
await ai.Models.Add(
    name: "QuantFactory/Qwen2.5-3B-GGUF",
    quantizationPriority: null,
    progressCallback: (url, bytesDownloaded, percentComplete) =>
    {
        Console.WriteLine($"Progress: {percentComplete:P0}");
    });

// Generate a completion
string response = await ai.Completion.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "Once upon a time",
    maxTokens: 512,
    temperature: 0.7f
);
```
The AIDriver provides access to APIs via:
- `ai.Models` - Model management operations
- `ai.Embeddings` - Embedding generation
- `ai.Completion` - Text completion generation
- `ai.Chat` - Chat completion generation
ModelDriver
Manages model downloads and lifecycle:
```csharp
// List all downloaded models
List<ModelFile> models = ai.Models.All();

// Get a specific model
ModelFile model = ai.Models.GetByName("QuantFactory/Qwen2.5-3B-GGUF");

// Download a new model from HuggingFace (GGUF format only)
ModelFile downloaded = await ai.Models.Add(
    name: "leliuga/all-MiniLM-L6-v2-GGUF",
    quantizationPriority: null,
    progressCallback: null);

// Delete a model
ai.Models.Delete("QuantFactory/Qwen2.5-3B-GGUF");

// Get the filesystem path for a model
string modelPath = ai.Models.GetFilename("QuantFactory/Qwen2.5-3B-GGUF");
```
🗄️ Model Management
SharpAI automatically handles downloading GGUF files from HuggingFace; only GGUF-format models are supported. When a model is added, SharpAI:
- Queries available GGUF files for a model
- Selects an appropriate quantization based on file naming conventions
- Downloads and stores models with metadata
- Tracks model information in a local SQLite model registry
Model metadata includes:
- Model name and GUID
- File size and hashes (MD5, SHA1, SHA256)
- Quantization type
- Source URL
- Creation timestamps
🔢 Generating Embeddings
Generate vector embeddings for text:
```csharp
// Single text embedding
float[] embedding = await ai.Embeddings.Generate(
    model: "leliuga/all-MiniLM-L6-v2-GGUF",
    input: "This is a sample text"
);

// Multiple text embeddings
string[] texts = { "First text", "Second text", "Third text" };
float[][] embeddings = await ai.Embeddings.Generate(
    model: "leliuga/all-MiniLM-L6-v2-GGUF",
    inputs: texts
);
```
📝 Text Completions
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
Generate text continuations:
```csharp
// Non-streaming completion
string completion = await ai.Completion.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "The meaning of life is",
    maxTokens: 512,
    temperature: 0.7f
);

// Streaming completion
await foreach (string token in ai.Completion.GenerateCompletionStreaming(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "Write a poem about",
    maxTokens: 512,
    temperature: 0.8f))
{
    Console.Write(token);
}
```
💬 Chat Completions
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
Generate conversational responses:
```csharp
// Non-streaming chat
string response = await ai.Chat.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: chatFormattedPrompt, // Prompt should be formatted for chat
    maxTokens: 512,
    temperature: 0.7f
);

// Streaming chat
await foreach (string token in ai.Chat.GenerateCompletionStreaming(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: chatFormattedPrompt,
    maxTokens: 512,
    temperature: 0.7f))
{
    Console.Write(token);
}
```
🛠️ Prompt Formatting
SharpAI includes prompt builders to format conversations for different model types:
Chat Message Formatting
```csharp
using SharpAI.Prompts;

var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "What is the capital of France?" },
    new ChatMessage { Role = "assistant", Content = "The capital of France is Paris." },
    new ChatMessage { Role = "user", Content = "What is its population?" }
};

// Format for different model types
string chatMLPrompt = PromptBuilder.Build(ChatFormat.ChatML, messages);
/* Output:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
<|im_start|>user
What is its population?<|im_end|>
<|im_start|>assistant
*/

string llama2Prompt = PromptBuilder.Build(ChatFormat.Llama2, messages);
/* Output:
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>
What is the capital of France? [/INST] The capital of France is Paris. </s><s>[INST] What is its population? [/INST]
*/

string simplePrompt = PromptBuilder.Build(ChatFormat.Simple, messages);
/* Output:
system: You are a helpful assistant.
user: What is the capital of France?
assistant: The capital of France is Paris.
user: What is its population?
assistant:
*/
```
Supported chat formats:
- `Simple` - Basic `role: content` format (generic models, base models)
- `ChatML` - OpenAI ChatML format (GPT models and models fine-tuned with ChatML, including Qwen)
- `Llama2` - Llama 2 instruction format (Llama-2-Chat models)
- `Llama3` - Llama 3 format (Llama-3-Instruct models)
- `Alpaca` - Alpaca instruction format
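Putting the pieces together: a prompt built with `PromptBuilder` feeds directly into the chat API shown earlier. The sketch below composes only APIs from the examples above; the Qwen model is used with `ChatFormat.ChatML` because Qwen models are ChatML-trained, and you should pick the format that matches your model.

```csharp
using SharpAI;
using SharpAI.Prompts;
using SyslogLogging;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ChatFlow
{
    static async Task Main()
    {
        var ai = new AIDriver(
            logging: new LoggingModule(),
            databaseFilename: "./sharpai.db",
            huggingFaceApiKey: "hf_xxxxxxxxxxxx",
            modelDirectory: "./models/"
        );

        var messages = new List<ChatMessage>
        {
            new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
            new ChatMessage { Role = "user", Content = "What is the capital of France?" }
        };

        // Build a ChatML prompt, then run a chat completion against it
        string prompt = PromptBuilder.Build(ChatFormat.ChatML, messages);

        string response = await ai.Chat.GenerateCompletion(
            model: "QuantFactory/Qwen2.5-3B-GGUF",
            prompt: prompt,
            maxTokens: 512,
            temperature: 0.7f);

        Console.WriteLine(response);
    }
}
```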
