# Sigil – Local LLM Runner with Web UI

Sigil is a local-first application for running Hugging Face transformer models directly on your machine. Built with a FastAPI backend and a React/Vite web interface, Sigil provides a streamlined environment for loading, interacting with, and customizing transformer models through a modern UI and flexible API.
Designed for developers and researchers, Sigil offers a modular and transparent alternative to hosted inference platforms. Whether you're experimenting with small models, adjusting generation settings, or building a custom interface, Sigil gives you the tools to work with local models efficiently, no external dependencies required.
Use Sigil as a starting point for your own local AI workflows, or extend it to suit the needs of your own projects.
## 📑 Table of Contents
- [Features](#features)
- [Interface Walkthrough](#interface-walkthrough)
- [Prerequisites](#prerequisites)
- [Setup](#setup)
- [Running the Development Environment](#running-the-development-environment)
## Features
Sigil is built for developers who want full control over local inference workflows. Key features include:
### Model Support
- Load Hugging Face-compatible transformer models directly from your local file system
- Supports `.safetensors`, `.bin`, and associated tokenizer/config files
- Modular model loader (`load_model_internal`) designed for extensibility
### Backend Architecture
- FastAPI-based REST API with modular routers (`chat`, `settings`, `models`, and more)
- Endpoints for chat, model loading (by path or name), VRAM status, runtime settings, theme listing, and model listing
- Model configuration and inference settings stored in application state for easy access and live updates
- Full backend logging to `backend_api.log` for transparency and debugging
### Prompt Handling
- Dual-mode support: `"instruction"` and `"chat"` style prompting
- Automatically formats input using Hugging Face's `apply_chat_template()` when appropriate
- Optional system prompt overrides per request
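For intuition, here is a rough hand-rolled approximation of what a ChatML-style chat template renders. The real template is model-specific and ships with the tokenizer, so treat this purely as a sketch of the idea, not what Sigil or `apply_chat_template()` literally emits:

```python
# Illustrative approximation of a ChatML-style chat template.
# Real templates vary per model and are applied via the tokenizer's
# apply_chat_template(); the tokens below are examples, not Sigil's.
def format_chat(messages, system_prompt=None):
    """Render a list of {"role", "content"} dicts into one prompt string."""
    parts = []
    if system_prompt:
        parts.append(f"<|system|>\n{system_prompt}")
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    parts.append("<|assistant|>\n")  # cue the model to begin its reply
    return "\n".join(parts)
```

The optional `system_prompt` argument mirrors the per-request system prompt override described above.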
### Runtime Configuration
- Update generation parameters on the fly using `/settings/update`: `temperature`, `top_p`, `max_new_tokens`, `system_prompt` (optional)
- Designed for prompt engineers and iterative experimentation
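As a sketch of how a client might prepare an update for this endpoint: the field names mirror the list above, but the default values and clamping ranges are illustrative assumptions, not Sigil's documented limits.

```python
# Hypothetical helper (not part of Sigil) that builds a JSON payload
# for the settings-update endpoint. Clamping ranges are assumptions.
def build_settings_payload(temperature=0.7, top_p=0.9,
                           max_new_tokens=256, system_prompt=None):
    payload = {
        "temperature": min(max(temperature, 0.0), 2.0),  # clamp to [0, 2]
        "top_p": min(max(top_p, 0.0), 1.0),              # clamp to [0, 1]
        "max_new_tokens": max(1, int(max_new_tokens)),   # at least 1 token
    }
    if system_prompt is not None:  # optional per the README
        payload["system_prompt"] = system_prompt
    return payload
```

Clamping client-side keeps obviously invalid values (e.g. a negative `top_p`) from ever reaching the backend during rapid experimentation.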
### GPU Awareness
- Detects CUDA-compatible devices and uses them automatically when available
- Exposes VRAM usage via the `/api/v1/vram` API endpoint
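For illustration, one way such an endpoint could gather VRAM numbers is by parsing `nvidia-smi` query output. This is an assumption about one possible implementation, not necessarily how Sigil's endpoint works:

```python
# Sketch only — Sigil's actual /api/v1/vram implementation may differ.
# Parses the CSV output of:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
def parse_vram_csv(output: str):
    """Return a list of {"used_mb", "total_mb"} dicts, one per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        used, total = (int(x.strip()) for x in line.split(","))
        gpus.append({"used_mb": used, "total_mb": total})
    return gpus
```

In practice a PyTorch backend might instead query `torch.cuda.mem_get_info()`, which avoids shelling out to `nvidia-smi` entirely.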
### Frontend Interface
- Built with React and Vite for a fast, responsive user experience
- Live chat interface for real-time interaction with models
- Accessible at `http://localhost:5173` during development
- Dynamic theme loading supported via the `/themes` endpoint
### Development Environment
- `start_dev.sh` handles coordinated startup of backend and frontend
- Supports local-first workflows with no external dependencies
## Interface Walkthrough
Sigil's frontend is designed for clarity, responsiveness, and developer-first workflows. Once the application is running, here's what the interface provides:
### Model Loading
After startup, the frontend will prompt you to select a model. Options may include loading from a local path or choosing a pre-defined model name discovered by the backend. Upon submission:
- The backend loads the model via the `/api/v1/model/load` or `/api/v1/model/load/{model_name}` endpoint
- Real-time status updates appear in `backend/backend_api.log`
- On success, the chat interface becomes available
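A hypothetical client-side sketch of calling the path-based load endpoint follows. The request body schema and the backend port are assumptions — consult the backend routers for the authoritative shapes:

```python
import json
import urllib.request

# Assumed default Uvicorn port; adjust to match your backend config.
BASE_URL = "http://localhost:8000"

def build_load_request(model_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/model/load.

    The {"path": ...} body shape is an illustrative assumption.
    """
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/model/load",
        data=json.dumps({"path": model_path}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it is one call away:
#   response = urllib.request.urlopen(build_load_request("/models/TinyLlama"))
```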

### Visual Aids for the Interface
Here are some screenshots to help familiarize you with the UI and the core workflows:
- Model Loading View
- Code Interaction View
- Thinking / Processing View
### Chat Interface
After loading a model, the frontend presents a clean interface to interact with it:
- Type prompts and receive model responses
- Backend routes input through either instruction or chat prompt templates
- Frontend displays complete output once inference is completed
- Chats are Saved: Your conversations are automatically saved locally, so you can close the application and resume your chat later.
- Settings Persist: Sampling parameters (like temperature, top_p, etc.) are saved with each chat and will be restored when you reopen it.
- Multiple Chats: Sigil supports multiple chat instances, allowing you to work with different models or conversations simultaneously.
### Themes
Sigil supports both light and dark variants of each theme. The theme system is designed to be easily extensible:
- Each theme consists of CSS variables defining colors and UI properties
- Themes automatically support both light and dark modes
- Custom themes can be added by creating new CSS files in the `frontend/public/themes` directory
- Update: An AI Assistant plugin for automated theme creation is currently in development.
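As a sketch, a new theme file dropped into `frontend/public/themes` might define its light and dark values via CSS custom properties. The variable names and the `data-theme` toggle below are illustrative assumptions — mirror the names used by the themes that ship with Sigil:

```css
/* frontend/public/themes/ocean.css — illustrative example; variable
   names are assumptions, copy them from an existing Sigil theme. */
:root {
  --bg-color: #f4f9fb;
  --text-color: #10222b;
  --accent-color: #1e7fa3;
}

/* Dark variant, toggled here via a data attribute on the root element */
:root[data-theme="dark"] {
  --bg-color: #0b1e26;
  --text-color: #d8ecf3;
  --accent-color: #4fb6da;
}
```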
### Developer Tools and Feedback
- All requests are routed through the FastAPI backend
- Adjust generation settings (e.g. `temperature`, `top_p`, `max_new_tokens`) via the `/api/v1/settings/update` endpoint
- VRAM status can be accessed via the `/api/v1/vram` endpoint if GPU is enabled
- Available models can be listed via the `/models` endpoint
- Available themes can be listed via the `/themes` endpoint
- Logs provide visibility into model status and runtime behavior
This interface is ideal for local experimentation, debugging, and integrating lightweight LLMs into your workflow without external dependencies.
## Prerequisites
- Python 3.11+
- `pip` (Python package installer)
- Node.js and `npm` (for the frontend development server and building)
- A compatible Hugging Face transformer model downloaded locally (e.g., TinyLlama). The model directory should contain files like `*.safetensors` or `pytorch_model.bin`, `config.json`, `tokenizer.json`, etc.
## Setup
1. Clone the repository:

   ```bash
   git clone https://github.com/Thrasher-Intelligence/sigil
   cd sigil
   ```
2. Create and activate a Python virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows use: venv\Scripts\activate
   ```
3. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Note: Installing `torch` can sometimes be complex. If you need a specific version (e.g., for CUDA), refer to the official PyTorch installation guide: https://pytorch.org/get-started/locally/
4. Install Frontend dependencies:

   ```bash
   cd frontend
   npm install
   cd ..
   ```

   (Note: Frontend dependencies are managed by `package.json` in the root directory.)
5. (Optional but Recommended) Download or Place Models:
   - Download models (like TinyLlama) as described above.
   - Place the downloaded model directories inside the `backend/models/` directory. This allows loading them by name (e.g., `TinyLlama-1.1B-Chat-v1.0`).
   - Alternatively, you can still load models from any local path using the UI.
## Running the Development Environment
1. Ensure your Python virtual environment is activated.

2. Run the appropriate development startup script for your platform:

   For macOS/Linux:

   ```bash
   ./scripts/start_dev.sh
   ```

   For Windows (PowerShell):

   ```powershell
   .\scripts\start_dev.ps1
   ```

   For Windows (Command Prompt):

   ```bat
   scripts\start_dev.bat
   ```
3. Wait for Startup:
   - The script first starts the backend API server (Uvicorn). You'll see logs in `backend_api.log`. It will indicate if CUDA is detected.
   - Then, it starts the frontend Vite development server (usually accessible at `http://localhost:5173`).
4. Load Model & Chat:
   - Open your web browser to the frontend URL (e.g., `http://localhost:5173`).
   - The web UI should provide an interface to specify the path to your model directory or select a model name listed from `backend/models/`.
   - Once the model path/name is submitted, the frontend will call the appropriate backend API endpoint (`/api/v1/model/load` or `/api/v1/model/load/{model_name}`) to load the model. Watch the backend logs (`backend/backend_api.log`) for progress.
   - After the model is loaded successfully, you can use the chat interface.
5. Stopping:
   - Press `Ctrl+C` in the terminal where `start_dev.sh` is running. The script will handle shutting down both the backend and frontend servers.