ToolNeuron
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscriptions, open-source.
Offline AI assistant for Android. Run LLMs, generate images, search documents — all on-device. No cloud. No subscriptions. No data leaves your phone.
<p align="left"> <a href="https://play.google.com/store/apps/details?id=com.dark.tool_neuron"> <img src="https://play.google.com/intl/en_us/badges/static/images/badges/en_badge_web_generic.png" alt="Get it on Google Play" height="80"/> </a> </p>

Download APK · Discord · Report Issue
What It Does
- Text generation — Load any GGUF model (Llama, Mistral, Gemma, Phi, Qwen, etc.) and chat with it locally
- Image generation — Stable Diffusion 1.5 on-device, with inpainting support
- Image tools — Upscale and segment images locally (depth, style transfer, inpainting coming soon)
- RAG — Inject PDFs, Word docs, Excel, EPUB into conversations with semantic search
- Plugins — Web search, file manager, calculator, notepad, date/time, system info, dev utils — all callable by the LLM
- AI memory — The AI remembers facts about you across conversations, with deduplication and a forgetting curve
- Text-to-speech — 10 voices, 5 languages, on-device synthesis
- Encrypted storage — AES-256-GCM with hardware-backed keys for all chat data
- System backup — Export everything as an encrypted `.tnbackup` file
Requirements
| | Minimum | Recommended |
|---|---------|-------------|
| Android | 10 (API 29) | 12+ |
| RAM | 6 GB | 8–12 GB |
| Storage | 4 GB free | 10 GB free |
| CPU | ARM64 or x86_64 | Snapdragon 8 Gen 1+ |
Getting Started
1. Install
Google Play or GitHub Releases.
2. Get a model
From the in-app Model Store (recommended):
- Open the drawer menu → Model Store
- Add a HuggingFace repository (e.g. `bartowski/Phi-3.5-mini-instruct-GGUF`)
- Pick a quantization and download
Or manually:
- Download a `.gguf` file from HuggingFace
- Use the model picker in ToolNeuron to load it
3. Chat
Select your model, wait for it to load, start typing. Responses stream in real-time.
Recommended models for getting started
| Use case | Model | Size |
|----------|-------|------|
| Quick test | Qwen3.5 0.8B Q4_K_M | ~600 MB |
| General use | Qwen3.5 4B Q4_K_M | ~2.8 GB |
| Power users | Qwen3.5 9B Q4_K_M | ~5.5 GB |
Pick Q4_K_M for a good balance between quality and size. Use Q6_K if your device has the RAM for it.
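As a rough sizing rule, a GGUF file weighs about parameter count × bits-per-weight ÷ 8. A hedged sketch — the bits-per-weight figures below are ballpark averages for llama.cpp K-quants, not exact values, and runtime memory also needs room for the KV cache:

```python
# Rough GGUF file-size estimator: parameters × bits-per-weight / 8.
# The bits-per-weight values are approximate averages for llama.cpp
# K-quants, not exact per-model numbers.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}

def gguf_size_gb(params_billions: float, quant: str) -> float:
    """Approximate model file size in GB for a given quantization."""
    bpw = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bpw / 8 / 1e9

# A 4B model at Q4_K_M comes out around 2.4 GB on disk; the KV cache
# adds more on top at runtime, growing with context length.
print(f"{gguf_size_gb(4, 'Q4_K_M'):.1f} GB")
```

This is why the table's sizes roughly track parameter count: halving the bits per weight halves the download.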
Features
Text Generation
- Any GGUF model works — load via file picker (no storage permissions needed, uses SAF)
- Configurable parameters: temperature, top-k, top-p, min-p, repeat penalty, context length
- Function calling with grammar-constrained JSON output
- Thinking mode for models that support it
- Per-model configs saved to database
Image Generation
- Stable Diffusion 1.5 (censored and uncensored variants)
- Text-to-image and inpainting
- Configurable steps, CFG scale, seed, negative prompts, schedulers
Image Tools
| Tool | Status |
|------|--------|
| Upscaling | Ready |
| Segmentation (MobileSAM) | Ready |
| Depth estimation | Model pending |
| Style transfer | Model pending |
| LaMa inpainting | Model pending |
RAG (Document Intelligence)
Create knowledge bases from:
- Files — PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, TXT
- Text — Paste any text content
- Chat history — Convert past conversations into searchable knowledge
- Neuron Packets — Import encrypted `.neuron` RAG files
The RAG pipeline uses hybrid retrieval: FTS4 BM25 + vector search + Reciprocal Rank Fusion + Maximal Marginal Relevance. Results are injected into the conversation context automatically.
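The Reciprocal Rank Fusion step is simple at its core: each retriever contributes 1/(k + rank) per document, and the sums decide the merged order. A minimal sketch — the k = 60 default and chunk names are illustrative, not ToolNeuron's internals:

```python
# Reciprocal Rank Fusion: merge ranked lists from BM25 and vector
# search by summing 1/(k + rank) per document. k = 60 is the value
# from the original RRF paper and a common default.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["chunk_a", "chunk_b", "chunk_c"]
vector_hits = ["chunk_c", "chunk_a", "chunk_d"]
print(rrf([bm25_hits, vector_hits]))  # chunk_a wins: ranked highly by both
```

MMR then re-ranks the fused list to trade relevance against redundancy before the chunks are injected into context.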
Encrypted RAGs support admin passwords and read-only user access.
Plugin System
7 built-in plugins the LLM can call during conversations:
| Plugin | What it does |
|--------|-------------|
| Web Search | Search the web and scrape content |
| File Manager | List, read, create files |
| Calculator | Math expressions and unit conversion |
| Notepad | Save and retrieve notes |
| Date & Time | Current time, timezone conversion, date math |
| System Info | RAM, battery, storage, device details |
| Dev Utils | Hash, encode, format, text transforms |
AI Memory
Inspired by Mem0. After conversations, the LLM extracts facts about you and stores them for future context. Deduplication via Jaccard similarity, with a forgetting curve so stale memories decay. You can view, edit, and delete memories from the Memory screen.
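A minimal sketch of the two mechanisms named above — word-level Jaccard deduplication and an exponential forgetting curve. The 0.8 similarity threshold and 30-day half-life are illustrative assumptions, not the app's actual tuning:

```python
import math

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two memory strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def is_duplicate(new_fact: str, stored: list[str], threshold: float = 0.8) -> bool:
    """Reject a new fact if it is near-identical to one already stored."""
    return any(jaccard(new_fact, old) >= threshold for old in stored)

def retention(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential forgetting curve: a memory's weight halves each half-life."""
    return math.exp(-math.log(2) * age_days / half_life_days)

assert is_duplicate("user likes green tea", ["User likes green tea"])
print(round(retention(30.0), 2))  # 0.5 after one half-life
```

Low-retention memories can then be pruned or down-weighted when context is assembled, which is what keeps stale facts from crowding out fresh ones.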
Text-to-Speech
On-device TTS via Supertonic (ONNX Runtime). 10 voices (5 female, 5 male), 5 languages (EN, KR, ES, PT, FR). Adjustable speed and quality. Auto-speak option reads responses aloud.
Hardware Tuning
Auto-detects CPU topology (P-cores, E-cores) and recommends thread count, context size, and cache settings. Three modes: Performance, Balanced, Power Saver.
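The recommendation logic can be sketched as a simple heuristic over the detected big-core count. The exact rules here are assumptions for illustration, not ToolNeuron's implementation — the general idea is that llama.cpp scales poorly past the performance-core count, and efficiency cores often slow generation down:

```python
def recommend_threads(perf_cores: int, mode: str) -> int:
    """Pick an inference thread count from the detected performance cores.
    Illustrative heuristic only; real tuning also weighs thermals and RAM."""
    if mode == "performance":
        return max(1, perf_cores)        # use every big core
    if mode == "balanced":
        return max(1, perf_cores - 1)    # leave one big core for the UI
    return max(1, perf_cores // 2)       # power saver: half the big cores

print(recommend_threads(perf_cores=4, mode="balanced"))  # 3
```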
System Backup
Export everything to an encrypted .tnbackup file (PBKDF2 + AES-256-GCM):
- Chat history, AI memories, personas, knowledge graphs
- Model configs and app settings
- RAG files and AI models (optional, can be large)
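The key-derivation step of such a scheme can be sketched with Python's standard library: PBKDF2-HMAC stretches the backup password into a 256-bit key suitable for AES-256-GCM. The iteration count and hash choice here are illustrative, not necessarily ToolNeuron's exact parameters:

```python
import hashlib
import os

def derive_backup_key(password: str, salt: bytes, iterations: int = 210_000) -> bytes:
    """Stretch a user password into a 256-bit key with PBKDF2-HMAC-SHA256.
    The random salt is stored in the backup header; the iteration count
    is an illustrative value, not ToolNeuron's actual setting."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=32)

salt = os.urandom(16)                      # fresh salt per backup
key = derive_backup_key("correct horse battery staple", salt)
print(len(key) * 8)  # 256 bits, ready to feed into AES-256-GCM
```

The derived key never needs to be stored: re-entering the password plus the saved salt reproduces it at restore time.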
Privacy
- Zero data collection. No telemetry, no analytics, no crash reporting.
- Everything stays on-device. Conversations, generated images, documents, TTS audio — none of it leaves your phone.
- Encrypted storage. AES-256-GCM with Android KeyStore. On supported devices, keys live in the Trusted Execution Environment.
- No storage permissions. Models load through Android's file picker (SAF). The app can't access arbitrary files.
- Open source. Read the code yourself.
Building from Source
Prerequisites
- Android Studio Meerkat (2025.1.1)+
- JDK 17
- Android SDK 36+, NDK 26.x
Build
```bash
git clone https://github.com/Siddhesh2377/ToolNeuron.git
cd ToolNeuron

# Debug
./gradlew assembleDebug
./gradlew installDebug

# Release
./gradlew assembleRelease
```

APKs land in `app/build/outputs/apk/`.
If you hit NDK issues, make sure NDK 26.x is installed via the SDK Manager. For memory issues during the build, bump the Gradle heap in `gradle.properties`:

```properties
org.gradle.jvmargs=-Xmx4096m
```
Architecture
| Layer | Technology |
|-------|-----------|
| Language | Kotlin, C++ (JNI) |
| UI | Jetpack Compose |
| Text inference | llama.cpp |
| Image inference | LocalDream (SD 1.5) |
| TTS | Supertonic (ONNX Runtime) |
| Database | Room + UMS (custom binary format) |
| Encryption | AES-256-GCM, Android KeyStore |
| DI | Dagger Hilt |
| Async | Kotlin Coroutines + Flow |
Modules
| Module | Purpose |
|--------|---------|
| app | Main Android application |
| ums | Unified Memory System — binary record storage with JNI |
| neuron-packet | Encrypted RAG packet format with access control |
| memory-vault | Legacy encrypted storage (read-only, used for migration) |
| system_encryptor | Native encryption primitives |
| file_ops | Native file operations |
Contributing
See CONTRIBUTORS.md for the project ecosystem and related repos.
How to contribute
- Fork the repo
- Create a feature branch: `git checkout -b feature/your-feature`
- Make focused commits with clear messages
- Test on a real device — emulators don't reflect real performance
- Open a PR with a description of what you changed and how you tested it
Priority areas
- Bug fixes and stability
- Device compatibility testing (especially mid-range phones)
- Performance improvements
- Documentation and translations
- New plugins
What not to do
- Don't submit untested code
- Don't add cloud dependencies or telemetry
- Don't break offline functionality
- Don't add broad storage permissions
Security
If you find a security vulnerability:
- Do not open a public GitHub issue
- Email siddheshsonar2377@gmail.com
- Include reproduction steps
- Allow reasonable time for a fix before disclosure
Acknowledgments
- llama.cpp — LLM inference engine
- LocalDream — Stable Diffusion on Android
- ONNX Runtime — TTS inference
- Mem0 — AI memory architecture inspiration
- SillyTavern — Character card format reference
- Apache POI, PDFBox-Android, EpubLib — Document parsing
- Jetpack Compose, Room, Hilt, OkHttp, Coil 3, Jsoup
License
Apache License 2.0 — use it, modify it, distribute it. Attribution appreciated.
Built by Siddhesh Sonar