TinyMobileLLM

TinyMobileLLM is a research-style project that benchmarks tiny language models (0.5B–2B parameters) on both PC and mobile hardware, fully offline.
The purpose is to understand:

  • how fast tiny LLMs run on real smartphones
  • how quantization affects speed & memory
  • which architectures (Transformer vs Recurrent) perform better
  • how multi-threading scales on mobile CPUs
  • whether tiny LLMs are usable for real offline apps

All tests use llama.cpp with GGUF models.


Project Structure

tinyMobileLLM/
│
├── README.md
├── LICENSE
├── .gitignore
│
├── models/                # GGUF models (NOT committed)
├── llama.cpp/             # Windows or Termux build
│
├── docs/
│   ├── 01_overview.md
│   ├── 02_pc_setup.md
│   ├── 03_model_inventory.md
│   ├── 04_benchmark_methodology.md
│   ├── 05_results_summary.md
│   ├── 06_future_work.md
│   ├── experiments_pc/
│   └── experiments_mobile/
│
├── benchmarks/
│   ├── pc_logs/
│   └── mobile_logs/
│
├── scripts/
│   ├── pc_benchmark.ps1
│   └── termux_benchmark.sh
│
└── media/
    ├── screenshots/
    └── recordings/


Requirements

PC

  • Windows 10
  • i5-12400F
  • 16GB DDR4
  • llama.cpp b7109

Mobile

  • Snapdragon 855
  • 6GB RAM
  • Termux
  • Android 12

Download Required Models (GGUF)

You must download the same models used in our benchmarks.

Qwen2.5 Models (0.5B & 1.5B)

https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF
https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/tree/main

Gemma e2B Q3_K_M

https://huggingface.co/gleidsonnunes/gemma-3n-E2B-it-Q3_K_M-GGUF/tree/main

RecurrentGemma 2B Q2_K

https://huggingface.co/archaeus06/RLPR-Gemma2-2B-it-Q2_K-GGUF/tree/main

Place them inside:

tinyMobileLLM/models/<model-family>/

(Full structure shown in docs/03_model_inventory.md.)
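If you have the huggingface_hub CLI installed, the downloads can be scripted. A minimal sketch (the filename below is assumed from the repo listings above; adjust for the other models):

```shell
# Sketch: fetch one GGUF file into the expected models/ layout.
# Assumes the `huggingface-cli` tool from the huggingface_hub package;
# the download is skipped gracefully when the tool is not installed.
mkdir -p models/qwen2.5

if command -v huggingface-cli >/dev/null 2>&1; then
    huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct-GGUF \
        qwen2.5-0.5b-instruct-q5_k_m.gguf \
        --local-dir models/qwen2.5
fi
```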

Quickstart

PC Inference

.\llama-cli.exe -m "models/qwen2.5/qwen2.5-0.5b-instruct-q5_k_m.gguf" -p "Hello" -n 200

Mobile Inference

./llama-cli -m "/data/.../qwen2.5-0.5b-instruct-q5_k_m.gguf" -p "Hello" -n 100
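For reproducing the decode-speed numbers below, llama.cpp also ships a dedicated `llama-bench` tool. A hedged sketch (flag names taken from recent llama.cpp builds; the model path is an assumption, use your own):

```shell
# Sketch: benchmark decode speed with llama-bench.
# -p = prompt tokens, -n = generated tokens, -t = CPU threads.
CMD="llama-bench -m models/qwen2.5/qwen2.5-0.5b-instruct-q5_k_m.gguf -p 64 -n 128 -t 4"
echo "$CMD" > bench_cmd.txt    # keep the exact invocation alongside the logs

# Run only if the binary is on PATH (build llama.cpp first).
if command -v llama-bench >/dev/null 2>&1; then
    $CMD
fi
```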

Summary Tables

PC Decode Speed (tokens/s)

| Model             | Quant  | TPS   | Memory  |
|-------------------|--------|-------|---------|
| Qwen0.5B          | Q5_K_M | 80.58 | 852 MB  |
| Qwen1.5B          | Q3_K_M | 39.79 | 1290 MB |
| Qwen1.5B          | Q4_K_M | 33.85 | 1474 MB |
| Qwen1.5B          | Q5_K_M | 33.44 | 1635 MB |
| Gemma e2B         | Q3_K_M | 22.29 | 2770 MB |
| RecurrentGemma 2B | Q2_K   | 26.00 | 2087 MB |


Mobile Decode Speed (Thread = 1)

| Model             | Quant  | TPS   | Memory  |
|-------------------|--------|-------|---------|
| Qwen0.5B          | Q5_K_M | 16.25 | 852 MB  |
| Qwen1.5B          | Q3_K_M | 7.60  | 1290 MB |
| Qwen1.5B          | Q4_K_M | 6.29  | 1474 MB |
| Qwen1.5B          | Q5_K_M | 5.98  | 1635 MB |
| RecurrentGemma 2B | Q2_K   | 5.10  | 2087 MB |
| Gemma e2B         | Q3_K_M | 3.65  | 2770 MB |


Mobile Multi-Thread Scaling (t1 → t4)

| Model             | t1 TPS | t4 TPS | Scaling     |
|-------------------|--------|--------|-------------|
| Qwen0.5B Q5       | 16.25  | 15.45  | ↓ none      |
| Qwen1.5B Q3       | 7.60   | 13.81  | ↑ good      |
| Qwen1.5B Q5       | 5.98   | 11.11  | ↑ good      |
| RecurrentGemma 2B | 5.10   | 8.88   | ↑ very good |
| Gemma e2B Q3      | 3.65   | N/A    | —           |
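The t1 → t4 comparison can be reproduced with a simple sweep. A sketch (model path is an assumption; on-device you would run this inside Termux next to your llama.cpp build):

```shell
# Sketch: run the same prompt at 1, 2, and 4 threads and collect the
# timing output per thread count into sweep.log.
MODEL="models/qwen2.5/qwen2.5-1.5b-instruct-q3_k_m.gguf"
: > sweep.log

for t in 1 2 4; do
    echo "threads=$t" >> sweep.log
    # Skip gracefully when llama-cli is not on PATH.
    if command -v llama-cli >/dev/null 2>&1; then
        llama-cli -m "$MODEL" -p "Hello" -n 100 -t "$t" >> sweep.log 2>&1
    fi
done
```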


Recommended Tiny Models for Mobile

| Rank | Model                  | Why                          |
|------|------------------------|------------------------------|
| #1   | Qwen1.5B Q3_K_M        | Best speed/quality balance   |
| #2   | RecurrentGemma 2B Q2_K | Best large model for phones  |
| #3   | Qwen0.5B Q5_K_M        | Extremely fast & lightweight |


Experiment Documentation

  • All PC experiments → docs/experiments_pc/
  • All Mobile experiments → docs/experiments_mobile/
  • Raw logs → benchmarks/{pc_logs,mobile_logs}

Each experiment includes:

  • commands
  • raw logs
  • extracted metrics
  • sample output
  • interpretation
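Extracting the TPS figures from raw logs can be scripted. A sketch assuming the `llama_print_timings: eval time = ...` line format (it varies between llama.cpp builds, so adjust the pattern to match your logs):

```shell
# Sketch: pull the decode tokens/s value out of a llama.cpp run log.
extract_tps() {
    # Find the eval-time timing line and print the number that precedes
    # "tokens per second".
    grep 'eval time' "$1" |
        sed 's/.* \([0-9.][0-9.]*\) tokens per second.*/\1/'
}
```

For example, `extract_tps benchmarks/mobile_logs/some_run.log` (filename hypothetical) would print a single number like the TPS values tabulated above.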

Future Work

  • more models (Phi-2, MiniCPM, RWKV)
  • more devices (Snapdragon 8 Gen 1/2)
  • thermal profiling
  • quality scoring
  • automated benchmark scripts

A YouTube video walkthrough of the project is available (presented in Hindi).

🤝 Contributions

PRs are welcome — especially additional mobile devices and models.
