# Sigil – Local LLM Runner with Web UI

Sigil is a local-first application for running Hugging Face transformer models directly on your machine. Built with a FastAPI backend and a React/Vite web interface, Sigil provides a streamlined environment for loading, interacting with, and customizing transformer models through a modern UI and flexible API.
Designed for developers and researchers, Sigil offers a modular and transparent alternative to hosted inference platforms. Whether you're experimenting with small models, adjusting generation settings, or building a custom interface, Sigil gives you the tools to work with local models efficiently, no external dependencies required.
Use Sigil as a starting point for your own local AI workflows, or extend it to suit the needs of your own projects.
## 📑 Table of Contents
- [Features](#features)
- [Interface Walkthrough](#interface-walkthrough)
- [Prerequisites](#prerequisites)
- [Setup](#setup)
- [Running the Development Environment](#running-the-development-environment)
## Features
Sigil is built for developers who want full control over local inference workflows. Key features include:
### Model Support
- Load Hugging Face-compatible transformer models directly from your local file system
- Supports `.safetensors`, `.bin`, and associated tokenizer/config files
- Modular model loader (`load_model_internal`) designed for extensibility
### Backend Architecture
- FastAPI-based REST API with modular routers (`chat`, `settings`, `models`, and more)
- Endpoints for chat, model loading (by path or name), VRAM status, runtime settings, theme listing, and model listing
- Model configuration and inference settings stored in application state for easy access and live updates
- Full backend logging to `backend_api.log` for transparency and debugging
### Prompt Handling
- Dual-mode support: `"instruction"` and `"chat"` style prompting
- Automatically formats input using Hugging Face's `apply_chat_template()` when appropriate
- Optional system prompt overrides per request
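For intuition, here is a rough hand-rolled approximation of what a ChatML-style chat template renders. The real template is model-specific and ships with the tokenizer, so treat this purely as a sketch of the idea, not what Sigil or `apply_chat_template()` literally emits:

```python
# Illustrative approximation of a ChatML-style chat template.
# Real templates vary per model and are applied via the tokenizer's
# apply_chat_template(); the tokens below are examples, not Sigil's.
def format_chat(messages, system_prompt=None):
    """Render a list of {"role", "content"} dicts into one prompt string."""
    parts = []
    if system_prompt:
        parts.append(f"<|system|>\n{system_prompt}")
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    parts.append("<|assistant|>\n")  # cue the model to begin its reply
    return "\n".join(parts)
```

The optional `system_prompt` argument mirrors the per-request system prompt override described above.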
### Runtime Configuration
- Update generation parameters on the fly using `/settings/update`: `temperature`, `top_p`, `max_new_tokens`, `system_prompt` (optional)
- Designed for prompt engineers and iterative experimentation
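As a sketch of how a client might prepare an update for this endpoint: the field names mirror the list above, but the default values and clamping ranges are illustrative assumptions, not Sigil's documented limits.

```python
# Hypothetical helper (not part of Sigil) that builds a JSON payload
# for the settings-update endpoint. Clamping ranges are assumptions.
def build_settings_payload(temperature=0.7, top_p=0.9,
                           max_new_tokens=256, system_prompt=None):
    payload = {
        "temperature": min(max(temperature, 0.0), 2.0),  # clamp to [0, 2]
        "top_p": min(max(top_p, 0.0), 1.0),              # clamp to [0, 1]
        "max_new_tokens": max(1, int(max_new_tokens)),   # at least 1 token
    }
    if system_prompt is not None:  # optional per the README
        payload["system_prompt"] = system_prompt
    return payload
```

Clamping client-side keeps obviously invalid values (e.g. a negative `top_p`) from ever reaching the backend during rapid experimentation.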
### GPU Awareness
- Detects CUDA-compatible devices and uses them automatically when available
- Exposes VRAM usage via the `/api/v1/vram` API endpoint
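For illustration, one way such an endpoint could gather VRAM numbers is by parsing `nvidia-smi` query output. This is an assumption about one possible implementation, not necessarily how Sigil's endpoint works:

```python
# Sketch only — Sigil's actual /api/v1/vram implementation may differ.
# Parses the CSV output of:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
def parse_vram_csv(output: str):
    """Return a list of {"used_mb", "total_mb"} dicts, one per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        used, total = (int(x.strip()) for x in line.split(","))
        gpus.append({"used_mb": used, "total_mb": total})
    return gpus
```

In practice a PyTorch backend might instead query `torch.cuda.mem_get_info()`, which avoids shelling out to `nvidia-smi` entirely.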
### Frontend Interface
- Built with React and Vite for a fast, responsive user experience
- Live chat interface for real-time interaction with models
- Accessible at `http://localhost:5173` during development
- Dynamic theme loading supported via the `/themes` endpoint
### Development Environment
- `start_dev.sh` handles coordinated startup of backend and frontend
- Supports local-first workflows with no external dependencies
## Interface Walkthrough
Sigil's frontend is designed for clarity, responsiveness, and developer-first workflows. Once the application is running, here's what the interface provides:
### Model Loading
After startup, the frontend will prompt you to select a model. Options may include loading from a local path or choosing a pre-defined model name discovered by the backend. Upon submission:
- The backend loads the model via the `/api/v1/model/load` or `/api/v1/model/load/{model_name}` endpoint
- Real-time status updates appear in `backend/backend_api.log`
- On success, the chat interface becomes available
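A hypothetical client-side sketch of calling the path-based load endpoint follows. The request body schema and the backend port are assumptions — consult the backend routers for the authoritative shapes:

```python
import json
import urllib.request

# Assumed default Uvicorn port; adjust to match your backend config.
BASE_URL = "http://localhost:8000"

def build_load_request(model_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/model/load.

    The {"path": ...} body shape is an illustrative assumption.
    """
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/model/load",
        data=json.dumps({"path": model_path}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it is one call away:
#   response = urllib.request.urlopen(build_load_request("/models/TinyLlama"))
```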

### Visual Aids for the Interface
Here are some screenshots to help familiarize you with the UI and the core workflows:
- Model Loading View
- Code Interaction View
- Thinking / Processing View
### Chat Interface
After loading a model, the frontend presents a clean interface to interact with it:
- Type prompts and receive model responses
- Backend routes input through either instruction or chat prompt templates
- Frontend displays complete output once inference is completed
- Chats are Saved: Your conversations are automatically saved locally, so you can close the application and resume your chat later.
- Settings Persist: Sampling parameters (like temperature, top_p, etc.) are saved with each chat and will be restored when you reopen it.
- Multiple Chats: Sigil supports multiple chat instances, allowing you to work with different models or conversations simultaneously.
### Themes
Sigil supports both light and dark variants of each theme. The theme system is designed to be easily extensible:
- Each theme consists of CSS variables defining colors and UI properties
- Themes automatically support both light and dark modes
- Custom themes can be added by creating new CSS files in the `frontend/public/themes` directory
- Update: An AI Assistant plugin for automated theme creation is currently in development.
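As a sketch, a new theme file dropped into `frontend/public/themes` might define its light and dark values via CSS custom properties. The variable names and the `data-theme` toggle below are illustrative assumptions — mirror the names used by the themes that ship with Sigil:

```css
/* frontend/public/themes/ocean.css — illustrative example; variable
   names are assumptions, copy them from an existing Sigil theme. */
:root {
  --bg-color: #f4f9fb;
  --text-color: #10222b;
  --accent-color: #1e7fa3;
}

/* Dark variant, toggled here via a data attribute on the root element */
:root[data-theme="dark"] {
  --bg-color: #0b1e26;
  --text-color: #d8ecf3;
  --accent-color: #4fb6da;
}
```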
### Developer Tools and Feedback
- All requests are routed through the FastAPI backend
- Adjust generation settings (e.g. `temperature`, `top_p`, `max_new_tokens`) via the `/api/v1/settings/update` endpoint
- VRAM status can be accessed via the `/api/v1/vram` endpoint if GPU is enabled
- Available models can be listed via the `/models` endpoint
- Available themes can be listed via the `/themes` endpoint
- Logs provide visibility into model status and runtime behavior
This interface is ideal for local experimentation, debugging, and integrating lightweight LLMs into your workflow without external dependencies.
## Prerequisites
- Python 3.11+
- `pip` (Python package installer)
- Node.js and `npm` (for the frontend development server and building)
- A compatible Hugging Face transformer model downloaded locally (e.g., TinyLlama). The model directory should contain files like `*.safetensors` or `pytorch_model.bin`, `config.json`, `tokenizer.json`, etc.
## Setup
1. Clone the repository:

   ```bash
   git clone https://github.com/Thrasher-Intelligence/sigil
   cd sigil
   ```
2. Create and activate a Python virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows use: venv\Scripts\activate
   ```
3. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Note: Installing `torch` can sometimes be complex. If you need a specific version (e.g., for CUDA), refer to the official PyTorch installation guide: https://pytorch.org/get-started/locally/
4. Install Frontend dependencies:

   ```bash
   cd frontend
   npm install
   cd ..
   ```

   (Note: Frontend dependencies are managed by `package.json` in the root directory.)
5. (Optional but Recommended) Download or Place Models:
   - Download models (like TinyLlama) as described above.
   - Place the downloaded model directories inside the `backend/models/` directory. This allows loading them by name (e.g., `TinyLlama-1.1B-Chat-v1.0`).
   - Alternatively, you can still load models from any local path using the UI.
## Running the Development Environment
1. Ensure your Python virtual environment is activated.

2. Run the appropriate development startup script for your platform:

   For macOS/Linux:

   ```bash
   ./scripts/start_dev.sh
   ```

   For Windows (PowerShell):

   ```powershell
   .\scripts\start_dev.ps1
   ```

   For Windows (Command Prompt):

   ```bat
   scripts\start_dev.bat
   ```
3. Wait for Startup:
   - The script first starts the backend API server (Uvicorn). You'll see logs in `backend_api.log`. It will indicate if CUDA is detected.
   - Then, it starts the frontend Vite development server (usually accessible at `http://localhost:5173`).
4. Load Model & Chat:
   - Open your web browser to the frontend URL (e.g., `http://localhost:5173`).
   - The web UI should provide an interface to specify the path to your model directory or select a model name listed from `backend/models/`.
   - Once the model path/name is submitted, the frontend will call the appropriate backend API endpoint (`/api/v1/model/load` or `/api/v1/model/load/{model_name}`) to load the model. Watch the backend logs (`backend/backend_api.log`) for progress.
   - After the model is loaded successfully, you can use the chat interface.
5. Stopping:
   - Press `Ctrl+C` in the terminal where `start_dev.sh` is running. The script will handle shutting down both the backend and frontend servers.