ToolNeuron
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscriptions, open-source.
Offline AI assistant for Android. Run LLMs, generate images, search documents — all on-device. No cloud. No subscriptions. No data leaves your phone.
<p align="left"> <a href="https://play.google.com/store/apps/details?id=com.dark.tool_neuron"> <img src="https://play.google.com/intl/en_us/badges/static/images/badges/en_badge_web_generic.png" alt="Get it on Google Play" height="80"/> </a> </p>

Download APK · Discord · Report Issue
What It Does
- Text generation — Load any GGUF model (Llama, Mistral, Gemma, Phi, Qwen, etc.) and chat with it locally
- Image generation — Stable Diffusion 1.5 on-device, with inpainting support
- Image tools — Upscale and segment images locally (depth, style transfer, inpainting coming soon)
- RAG — Inject PDFs, Word docs, Excel, EPUB into conversations with semantic search
- Plugins — Web search, file manager, calculator, notepad, date/time, system info, dev utils — all callable by the LLM
- AI memory — The AI remembers facts about you across conversations, with deduplication and a forgetting curve
- Text-to-speech — 10 voices, 5 languages, on-device synthesis
- Encrypted storage — AES-256-GCM with hardware-backed keys for all chat data
- System backup — Export everything as an encrypted `.tnbackup` file
Requirements
| | Minimum | Recommended |
|---|---------|-------------|
| Android | 10 (API 29) | 12+ |
| RAM | 6 GB | 8–12 GB |
| Storage | 4 GB free | 10 GB free |
| CPU | ARM64 or x86_64 | Snapdragon 8 Gen 1+ |
Getting Started
1. Install
Google Play or GitHub Releases.
2. Get a model
From the in-app Model Store (recommended):
- Open the drawer menu → Model Store
- Add a HuggingFace repository (e.g. `bartowski/Phi-3.5-mini-instruct-GGUF`)
- Pick a quantization and download
Or manually:
- Download a `.gguf` file from HuggingFace
- Use the model picker in ToolNeuron to load it
3. Chat
Select your model, wait for it to load, start typing. Responses stream in real-time.
Recommended models for getting started
| Use case | Model | Size |
|----------|-------|------|
| Quick test | Qwen3.5 0.8B Q4_K_M | ~600 MB |
| General use | Qwen3.5 4B Q4_K_M | ~2.8 GB |
| Power users | Qwen3.5 9B Q4_K_M | ~5.5 GB |
Pick Q4_K_M for a good balance between quality and size. Use Q6_K if your device has the RAM for it.
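As a rough sizing rule, a GGUF file weighs about parameter count × bits-per-weight ÷ 8. A hedged sketch — the bits-per-weight figures below are ballpark averages for llama.cpp K-quants, not exact values, and runtime memory also needs room for the KV cache:

```python
# Rough GGUF file-size estimator: parameters × bits-per-weight / 8.
# The bits-per-weight values are approximate averages for llama.cpp
# K-quants, not exact per-model numbers.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}

def gguf_size_gb(params_billions: float, quant: str) -> float:
    """Approximate model file size in GB for a given quantization."""
    bpw = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bpw / 8 / 1e9

# A 4B model at Q4_K_M comes out around 2.4 GB on disk; the KV cache
# adds more on top at runtime, growing with context length.
print(f"{gguf_size_gb(4, 'Q4_K_M'):.1f} GB")
```

This is why the table's sizes roughly track parameter count: halving the bits per weight halves the download.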
Features
Text Generation
- Any GGUF model works — load via file picker (no storage permissions needed, uses SAF)
- Configurable parameters: temperature, top-k, top-p, min-p, repeat penalty, context length
- Function calling with grammar-constrained JSON output
- Thinking mode for models that support it
- Per-model configs saved to database
Image Generation
- Stable Diffusion 1.5 (censored and uncensored variants)
- Text-to-image and inpainting
- Configurable steps, CFG scale, seed, negative prompts, schedulers
Image Tools
| Tool | Status |
|------|--------|
| Upscaling | Ready |
| Segmentation (MobileSAM) | Ready |
| Depth estimation | Model pending |
| Style transfer | Model pending |
| LaMa inpainting | Model pending |
RAG (Document Intelligence)
Create knowledge bases from:
- Files — PDF, Word (.doc/.docx), Excel (.xls/.xlsx), EPUB, TXT
- Text — Paste any text content
- Chat history — Convert past conversations into searchable knowledge
- Neuron Packets — Import encrypted `.neuron` RAG files
The RAG pipeline uses hybrid retrieval: FTS4 BM25 + vector search + Reciprocal Rank Fusion + Maximal Marginal Relevance. Results are injected into the conversation context automatically.
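The Reciprocal Rank Fusion step is simple at its core: each retriever contributes 1/(k + rank) per document, and the sums decide the merged order. A minimal sketch — the k = 60 default and chunk names are illustrative, not ToolNeuron's internals:

```python
# Reciprocal Rank Fusion: merge ranked lists from BM25 and vector
# search by summing 1/(k + rank) per document. k = 60 is the value
# from the original RRF paper and a common default.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["chunk_a", "chunk_b", "chunk_c"]
vector_hits = ["chunk_c", "chunk_a", "chunk_d"]
print(rrf([bm25_hits, vector_hits]))  # chunk_a wins: ranked highly by both
```

MMR then re-ranks the fused list to trade relevance against redundancy before the chunks are injected into context.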
Encrypted RAGs support admin passwords and read-only user access.
Plugin System
7 built-in plugins the LLM can call during conversations:
| Plugin | What it does |
|--------|-------------|
| Web Search | Search the web and scrape content |
| File Manager | List, read, create files |
| Calculator | Math expressions and unit conversion |
| Notepad | Save and retrieve notes |
| Date & Time | Current time, timezone conversion, date math |
| System Info | RAM, battery, storage, device details |
| Dev Utils | Hash, encode, format, text transforms |
AI Memory
Inspired by Mem0. After conversations, the LLM extracts facts about you and stores them for future context. Deduplication via Jaccard similarity, with a forgetting curve so stale memories decay. You can view, edit, and delete memories from the Memory screen.
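A minimal sketch of the two mechanisms named above — word-level Jaccard deduplication and an exponential forgetting curve. The 0.8 similarity threshold and 30-day half-life are illustrative assumptions, not the app's actual tuning:

```python
import math

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two memory strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def is_duplicate(new_fact: str, stored: list[str], threshold: float = 0.8) -> bool:
    """Reject a new fact if it is near-identical to one already stored."""
    return any(jaccard(new_fact, old) >= threshold for old in stored)

def retention(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential forgetting curve: a memory's weight halves each half-life."""
    return math.exp(-math.log(2) * age_days / half_life_days)

assert is_duplicate("user likes green tea", ["User likes green tea"])
print(round(retention(30.0), 2))  # 0.5 after one half-life
```

Low-retention memories can then be pruned or down-weighted when context is assembled, which is what keeps stale facts from crowding out fresh ones.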
Text-to-Speech
On-device TTS via Supertonic (ONNX Runtime). 10 voices (5 female, 5 male), 5 languages (EN, KR, ES, PT, FR). Adjustable speed and quality. Auto-speak option reads responses aloud.
Hardware Tuning
Auto-detects CPU topology (P-cores, E-cores) and recommends thread count, context size, and cache settings. Three modes: Performance, Balanced, Power Saver.
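The recommendation logic can be sketched as a simple heuristic over the detected big-core count. The exact rules here are assumptions for illustration, not ToolNeuron's implementation — the general idea is that llama.cpp scales poorly past the performance-core count, and efficiency cores often slow generation down:

```python
def recommend_threads(perf_cores: int, mode: str) -> int:
    """Pick an inference thread count from the detected performance cores.
    Illustrative heuristic only; real tuning also weighs thermals and RAM."""
    if mode == "performance":
        return max(1, perf_cores)        # use every big core
    if mode == "balanced":
        return max(1, perf_cores - 1)    # leave one big core for the UI
    return max(1, perf_cores // 2)       # power saver: half the big cores

print(recommend_threads(perf_cores=4, mode="balanced"))  # 3
```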
System Backup
Export everything to an encrypted .tnbackup file (PBKDF2 + AES-256-GCM):
- Chat history, AI memories, personas, knowledge graphs
- Model configs and app settings
- RAG files and AI models (optional, can be large)
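The key-derivation step of such a scheme can be sketched with Python's standard library: PBKDF2-HMAC stretches the backup password into a 256-bit key suitable for AES-256-GCM. The iteration count and hash choice here are illustrative, not necessarily ToolNeuron's exact parameters:

```python
import hashlib
import os

def derive_backup_key(password: str, salt: bytes, iterations: int = 210_000) -> bytes:
    """Stretch a user password into a 256-bit key with PBKDF2-HMAC-SHA256.
    The random salt is stored in the backup header; the iteration count
    is an illustrative value, not ToolNeuron's actual setting."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=32)

salt = os.urandom(16)                      # fresh salt per backup
key = derive_backup_key("correct horse battery staple", salt)
print(len(key) * 8)  # 256 bits, ready to feed into AES-256-GCM
```

The derived key never needs to be stored: re-entering the password plus the saved salt reproduces it at restore time.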
Privacy
- Zero data collection. No telemetry, no analytics, no crash reporting.
- Everything stays on-device. Conversations, generated images, documents, TTS audio — none of it leaves your phone.
- Encrypted storage. AES-256-GCM with Android KeyStore. On supported devices, keys live in the Trusted Execution Environment.
- No storage permissions. Models load through Android's file picker (SAF). The app can't access arbitrary files.
- Open source. Read the code yourself.
Building from Source
Prerequisites
- Android Studio Meerkat (2025.1.1)+
- JDK 17
- Android SDK 36+, NDK 26.x
Build
```bash
git clone https://github.com/Siddhesh2377/ToolNeuron.git
cd ToolNeuron

# Debug
./gradlew assembleDebug
./gradlew installDebug

# Release
./gradlew assembleRelease
```

APKs land in `app/build/outputs/apk/`.
If you hit NDK issues, make sure NDK 26.x is installed via the SDK Manager. For memory issues during the build, bump the Gradle heap in `gradle.properties`:

```properties
org.gradle.jvmargs=-Xmx4096m
```
Architecture
| Layer | Technology |
|-------|-----------|
| Language | Kotlin, C++ (JNI) |
| UI | Jetpack Compose |
| Text inference | llama.cpp |
| Image inference | LocalDream (SD 1.5) |
| TTS | Supertonic (ONNX Runtime) |
| Database | Room + UMS (custom binary format) |
| Encryption | AES-256-GCM, Android KeyStore |
| DI | Dagger Hilt |
| Async | Kotlin Coroutines + Flow |
Modules
| Module | Purpose |
|--------|---------|
| app | Main Android application |
| ums | Unified Memory System — binary record storage with JNI |
| neuron-packet | Encrypted RAG packet format with access control |
| memory-vault | Legacy encrypted storage (read-only, used for migration) |
| system_encryptor | Native encryption primitives |
| file_ops | Native file operations |
Contributing
See CONTRIBUTORS.md for the project ecosystem and related repos.
How to contribute
- Fork the repo
- Create a feature branch: `git checkout -b feature/your-feature`
- Make focused commits with clear messages
- Test on a real device — emulators don't reflect real performance
- Open a PR with a description of what you changed and how you tested it
Priority areas
- Bug fixes and stability
- Device compatibility testing (especially mid-range phones)
- Performance improvements
- Documentation and translations
- New plugins
What not to do
- Don't submit untested code
- Don't add cloud dependencies or telemetry
- Don't break offline functionality
- Don't add broad storage permissions
Security
If you find a security vulnerability:
- Do not open a public GitHub issue
- Email siddheshsonar2377@gmail.com
- Include reproduction steps
- Allow reasonable time for a fix before disclosure
Acknowledgments
- llama.cpp — LLM inference engine
- LocalDream — Stable Diffusion on Android
- ONNX Runtime — TTS inference
- Mem0 — AI memory architecture inspiration
- SillyTavern — Character card format reference
- Apache POI, PDFBox-Android, EpubLib — Document parsing
- Jetpack Compose, Room, Hilt, OkHttp, Coil 3, Jsoup
License
Apache License 2.0 — use it, modify it, distribute it. Attribution appreciated.
Built by Siddhesh Sonar