SharpAI
SharpAI is an embeddable platform for embeddings, completions, and model management, built on llama.cpp via LlamaSharp, with a built-in Ollama-compatible web server.
Transform your .NET applications into AI powerhouses - embed models directly or deploy as an Ollama-compatible and OpenAI-compatible API server. No cloud dependencies, no limits, just local embeddings and inference.
<p align="center"> <img src="https://img.shields.io/badge/.NET-5C2D91?style=for-the-badge&logo=.net&logoColor=white" /> <img src="https://img.shields.io/badge/C%23-239120?style=for-the-badge&logo=c-sharp&logoColor=white" /> <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" /> </p> <p align="center"> <a href="https://www.nuget.org/packages/SharpAI/"> <img src="https://img.shields.io/nuget/v/SharpAI.svg?style=flat" alt="NuGet Version"> </a> <a href="https://www.nuget.org/packages/SharpAI"> <img src="https://img.shields.io/nuget/dt/SharpAI.svg" alt="NuGet Downloads"> </a> </p> <p align="center"> <strong>A .NET library for local AI model inference with Ollama-compatible and OpenAI-compatible REST APIs</strong> </p> <p align="center"> Embeddings • Completions • Chat • Built on LlamaSharp • GGUF Models Only </p>

📁 Monorepo Structure
SharpAI is organized as a monorepo containing the core library, server, dashboard, and client SDKs:
```
SharpAI/
├── src/                  # Core .NET library and server
│   ├── SharpAI/          # Core library (NuGet: SharpAI)
│   ├── SharpAI.Server/   # REST API server
│   └── Test.*/           # Test projects
├── dashboard/            # Next.js 14 web interface
├── sdk/
│   ├── csharp/           # C# SDK (NuGet: SharpAI.Sdk)
│   ├── python/           # Python SDK (coming soon)
│   └── js/               # TypeScript/JavaScript SDK (npm: @sharpai/sdk)
├── docker/               # Docker assets
└── README.md
```
Sub-Projects
| Project | Description | Documentation |
|---------|-------------|---------------|
| SharpAI | Core .NET library for local AI inference | This README |
| SharpAI.Server | Ollama- and OpenAI-compatible REST API server | This README |
| Dashboard | Next.js web interface for managing models | dashboard/README.md |
| C# SDK | SDK for .NET applications to connect to the SharpAI server | sdk/csharp/README.md |
| TypeScript SDK | SDK for Node.js/browser applications | sdk/js/README.md |
| Python SDK | SDK for Python applications | sdk/python/README.md |
🚀 Features
- Ollama- and OpenAI-Compatible REST API Server - Exposes endpoints compatible with the Ollama and OpenAI APIs
- Model Management - Download and manage GGUF models from HuggingFace using Ollama APIs
- Multiple Inference Types:
- Text embeddings generation
- Text completions
- Chat completions
- Prompt Engineering Tools - Built-in helpers for formatting prompts for different model types
- GPU Acceleration - Automatic CUDA detection when available
- Streaming Support - Real-time token streaming for completions
- SQLite Model Registry - Tracks model metadata and file information
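Because the server speaks the Ollama and OpenAI wire formats, any .NET client can talk to it with plain HTTP. The sketch below assumes the server is listening on `localhost:11434` (Ollama's conventional port) and that the routes follow the public Ollama and OpenAI API conventions; the host, port, routes, and model name are assumptions for illustration, not documented SharpAI defaults.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ApiSketch
{
    static async Task Main()
    {
        // Assumed endpoint; adjust to wherever SharpAI.Server is actually listening
        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

        // Ollama-style completion request (route per Ollama's public API)
        var ollamaBody = new StringContent(
            "{\"model\":\"QuantFactory/Qwen2.5-3B-GGUF\",\"prompt\":\"Once upon a time\",\"stream\":false}",
            Encoding.UTF8, "application/json");
        var ollamaResponse = await http.PostAsync("/api/generate", ollamaBody);
        Console.WriteLine(await ollamaResponse.Content.ReadAsStringAsync());

        // OpenAI-style chat completion request (route per OpenAI's public API)
        var openAiBody = new StringContent(
            "{\"model\":\"QuantFactory/Qwen2.5-3B-GGUF\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello!\"}]}",
            Encoding.UTF8, "application/json");
        var openAiResponse = await http.PostAsync("/v1/chat/completions", openAiBody);
        Console.WriteLine(await openAiResponse.Content.ReadAsStringAsync());
    }
}
```

The same requests work from existing Ollama or OpenAI client libraries pointed at the SharpAI server's base URL.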
📋 Table of Contents
- Installation
- Core Components
- Model Management
- Generating Embeddings
- Text Completions
- Chat Completions
- Prompt Formatting
- API Server
- Requirements
- Version History
- License
- Acknowledgments
📦 Installation
Install SharpAI via NuGet:

```
dotnet add package SharpAI
```

Or via Package Manager Console:

```
Install-Package SharpAI
```
📖 Core Components
AIDriver
The main entry point that provides access to all functionality:
```csharp
using SharpAI;
using SyslogLogging;

// Initialize the AI driver
var ai = new AIDriver(
    logging: new LoggingModule(),
    databaseFilename: "./sharpai.db",
    huggingFaceApiKey: "hf_xxxxxxxxxxxx",
    modelDirectory: "./models/"
);

// Download a model from HuggingFace (GGUF format only)
await ai.Models.Add(
    name: "QuantFactory/Qwen2.5-3B-GGUF",
    quantizationPriority: null,
    progressCallback: (url, bytesDownloaded, percentComplete) =>
    {
        Console.WriteLine($"Progress: {percentComplete:P0}");
    });

// Generate a completion
string response = await ai.Completion.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "Once upon a time",
    maxTokens: 512,
    temperature: 0.7f
);
```
The AIDriver provides access to APIs via:
- `ai.Models` - Model management operations
- `ai.Embeddings` - Embedding generation
- `ai.Completion` - Text completion generation
- `ai.Chat` - Chat completion generation
ModelDriver
Manages model downloads and lifecycle:
```csharp
// List all downloaded models
List<ModelFile> models = ai.Models.All();

// Get a specific model
ModelFile model = ai.Models.GetByName("QuantFactory/Qwen2.5-3B-GGUF");

// Download a new model from HuggingFace (GGUF format only)
ModelFile downloaded = await ai.Models.Add(
    name: "leliuga/all-MiniLM-L6-v2-GGUF",
    quantizationPriority: null,
    progressCallback: null);

// Delete a model
ai.Models.Delete("QuantFactory/Qwen2.5-3B-GGUF");

// Get the filesystem path for a model
string modelPath = ai.Models.GetFilename("QuantFactory/Qwen2.5-3B-GGUF");
```
🗄️ Model Management
SharpAI automatically handles downloading GGUF files from HuggingFace; only GGUF-format models are supported. When a model is added, SharpAI:
- Queries available GGUF files for a model
- Selects an appropriate quantization based on file naming conventions
- Downloads and stores models with metadata
- Tracks model information in a local SQLite model registry
Model metadata includes:
- Model name and GUID
- File size and hashes (MD5, SHA1, SHA256)
- Quantization type
- Source URL
- Creation timestamps
🔢 Generating Embeddings
Generate vector embeddings for text:
```csharp
// Single text embedding
float[] embedding = await ai.Embeddings.Generate(
    model: "leliuga/all-MiniLM-L6-v2-GGUF",
    input: "This is a sample text"
);

// Multiple text embeddings
string[] texts = { "First text", "Second text", "Third text" };
float[][] embeddings = await ai.Embeddings.Generate(
    model: "leliuga/all-MiniLM-L6-v2-GGUF",
    inputs: texts
);
```
📝 Text Completions
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
Generate text continuations:
```csharp
// Non-streaming completion
string completion = await ai.Completion.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "The meaning of life is",
    maxTokens: 512,
    temperature: 0.7f
);

// Streaming completion
await foreach (string token in ai.Completion.GenerateCompletionStreaming(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: "Write a poem about",
    maxTokens: 512,
    temperature: 0.8f))
{
    Console.Write(token);
}
```
💬 Chat Completions
Note: for best results, structure your prompt in a manner appropriate for the model you are using. See the prompt formatting section below.
Generate conversational responses:
```csharp
// Non-streaming chat
string response = await ai.Chat.GenerateCompletion(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: chatFormattedPrompt, // Prompt should be formatted for chat
    maxTokens: 512,
    temperature: 0.7f
);

// Streaming chat
await foreach (string token in ai.Chat.GenerateCompletionStreaming(
    model: "QuantFactory/Qwen2.5-3B-GGUF",
    prompt: chatFormattedPrompt,
    maxTokens: 512,
    temperature: 0.7f))
{
    Console.Write(token);
}
```
🛠️ Prompt Formatting
SharpAI includes prompt builders to format conversations for different model types:
Chat Message Formatting
```csharp
using SharpAI.Prompts;

var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "What is the capital of France?" },
    new ChatMessage { Role = "assistant", Content = "The capital of France is Paris." },
    new ChatMessage { Role = "user", Content = "What is its population?" }
};

// Format for different model types
string chatMLPrompt = PromptBuilder.Build(ChatFormat.ChatML, messages);
/* Output:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
<|im_start|>user
What is its population?<|im_end|>
<|im_start|>assistant
*/

string llama2Prompt = PromptBuilder.Build(ChatFormat.Llama2, messages);
/* Output:
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>
What is the capital of France? [/INST] The capital of France is Paris. </s><s>[INST] What is its population? [/INST]
*/

string simplePrompt = PromptBuilder.Build(ChatFormat.Simple, messages);
/* Output:
system: You are a helpful assistant.
user: What is the capital of France?
assistant: The capital of France is Paris.
user: What is its population?
assistant:
*/
```
Supported chat formats:
- `Simple` - Basic `role: content` format (generic models, base models)
- `ChatML` - OpenAI ChatML format (GPT models and models fine-tuned with ChatML, including Qwen)
- `Llama2` - Llama 2 instruction format (Llama-2-Chat models)
- `Llama3` - Llama 3 format (Llama-3-Instruct models)
- `Alpaca` - Alpaca instruction format
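Putting the pieces together: a prompt built with `PromptBuilder` feeds directly into the chat API shown earlier. The sketch below composes only APIs from the examples above; the Qwen model is used with `ChatFormat.ChatML` because Qwen models are ChatML-trained, and you should pick the format that matches your model.

```csharp
using SharpAI;
using SharpAI.Prompts;
using SyslogLogging;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ChatFlow
{
    static async Task Main()
    {
        var ai = new AIDriver(
            logging: new LoggingModule(),
            databaseFilename: "./sharpai.db",
            huggingFaceApiKey: "hf_xxxxxxxxxxxx",
            modelDirectory: "./models/"
        );

        var messages = new List<ChatMessage>
        {
            new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
            new ChatMessage { Role = "user", Content = "What is the capital of France?" }
        };

        // Build a ChatML prompt, then run a chat completion against it
        string prompt = PromptBuilder.Build(ChatFormat.ChatML, messages);

        string response = await ai.Chat.GenerateCompletion(
            model: "QuantFactory/Qwen2.5-3B-GGUF",
            prompt: prompt,
            maxTokens: 512,
            temperature: 0.7f);

        Console.WriteLine(response);
    }
}
```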
