
TinyChat

Llama 2-based chat for ultra-low-memory hardware with a 60 MB footprint

Install / Use

/learn @starhopp3r/TinyChat
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

TinyChat


TinyChat15M is a 15-million-parameter conversational language model built on the Meta Llama 2 architecture. Designed to operate on devices with as little as 60 MB of free memory, TinyChat15M has been successfully deployed on the Sipeed LicheeRV Nano W, a compact RISC-V development board equipped with just 256 MB of DDR3 memory. Inspired by Dr. Andrej Karpathy's llama2.c project, TinyChat15M demonstrates that small conversational language models can be both effective and resource-efficient, making advanced AI capabilities more accessible and sustainable. You can find a detailed blog post on this project here.

Usage

First, navigate to the folder where you keep your projects, and then clone this repository into that folder:

git clone https://github.com/starhopp3r/TinyChat.git

Next, navigate to the llama2.c folder:

cd TinyChat/llama2.c

Now, download the TinyChat15M model from Hugging Face:

wget https://huggingface.co/starhopp3r/TinyChat/resolve/main/TinyChat15M.bin

Next, compile the C code:

make run

Now, to run the TinyChat15M assistant, use the following command:

./run TinyChat15M.bin -t 1.0 -p 0.9 -n 2048 -m chat

Note that the temperature (-t flag) and top-p value (-p flag) can be set to any number between 0 and 1. For optimal results, it's recommended to sample with -t 1.0 and -p 0.9, meaning a temperature of 1.0 (default) and top-p sampling at 0.9 (default). Intuitively, top-p sampling prevents tokens with extremely low probabilities from being selected, reducing the chances of getting "unlucky" during sampling and decreasing the likelihood of generating off-topic content.

Generally, to control the diversity of samples, you can adjust either the temperature (i.e., vary -t between 0 and 1 while keeping top-p off with -p 0) or the top-p value (i.e., vary -p between 0 and 1 while keeping the temperature at 1), but it's advisable not to modify both simultaneously. Detailed explanations of LLM sampling strategies can be found here, here and here.

View on GitHub
GitHub Stars: 13
Category: Development
Updated: 15d ago
Forks: 1

Languages

C

Security Score

90/100

Audited on Mar 19, 2026

No findings