PicoLM
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
The Perfect Match: PicoLM + PicoClaw
<div align="center"> <img src="picolm.jpg" alt="PicoLM — Run a 1-billion parameter LLM on a $10 board" width="640"> <br><br> </div>PicoLM was built as the local brain for PicoClaw — an ultra-lightweight AI assistant in Go that runs on $10 hardware. Together, they form a fully offline AI agent — no cloud, no API keys, no internet, no monthly bills.
<table align="center"> <tr align="center"> <td><b>The Hardware</b></td> <td><b>The Architecture</b></td> </tr> <tr> <td align="center"><img src="https://raw.githubusercontent.com/sipeed/picoclaw/main/assets/licheervnano.png" alt="$9.90 LicheeRV Nano" width="360"></td> <td align="center"><img src="https://raw.githubusercontent.com/sipeed/picoclaw/main/assets/arch.jpg" alt="PicoClaw architecture — PicoLM sits in the LLM box" width="420"></td> </tr> <tr> <td align="center"><em>$9.90 — that's the entire server</em></td> <td align="center"><em>PicoLM powers the LLM box in PicoClaw's agent loop</em></td> </tr> </table>Every other LLM provider needs the internet. PicoLM doesn't.
Why they're a perfect fit
| | Cloud Provider (OpenAI, etc.) | PicoLM (Local) |
|---|---|---|
| Cost | Pay per token, forever | Free forever |
| Privacy | Your data sent to servers | Everything stays on-device |
| Internet | Required for every request | Not needed at all |
| Latency | Network round-trip + inference | Inference only |
| Hardware | Needs a $599 Mac Mini | Runs on a $10 board |
| Binary | N/A | ~80KB single file |
| RAM | N/A | 45 MB total |
How it works
PicoClaw's agent loop spawns PicoLM as a subprocess. Messages come in from Telegram, Discord, or CLI — PicoClaw formats them into a chat template, pipes the prompt to picolm via stdin, and reads the response from stdout. When tools are needed, --json grammar mode guarantees valid JSON even from a 1B model.
```text
Telegram / Discord / CLI
          │
          ▼
    ┌──────────┐    stdin: prompt     ┌───────────┐
    │ PicoClaw │ ──────────────────►  │  picolm   │
    │   (Go)   │ ◄──────────────────  │    (C)    │
    └──────────┘   stdout: response   │  + model  │
          │                           └───────────┘
          ▼                             45 MB RAM
    User gets reply                    No internet
```
Quick setup
```sh
# 1. Build PicoLM
cd picolm && make native    # or: make pi (Raspberry Pi)

# 2. Download model (one-time, 638 MB)
make model

# 3. Build PicoClaw
cd ../picoclaw && make deps && make build
```

Step 4 — configure `~/.picoclaw/config.json`:

```json
{
  "agents": {
    "defaults": {
      "provider": "picolm",
      "model": "picolm-local"
    }
  },
  "providers": {
    "picolm": {
      "binary": "~/.picolm/bin/picolm",
      "model": "~/.picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
      "max_tokens": 256,
      "threads": 4,
      "template": "chatml"
    }
  }
}
```

Step 5 — chat, fully offline:

```sh
picoclaw agent -m "What is photosynthesis?"
```
Or install everything in one line
```sh
curl -sSL https://raw.githubusercontent.com/RightNow-AI/picolm/main/install.sh | bash
```
Performance on real hardware
| Device | Price | Generation Speed | RAM Used |
|--------|-------|------------------|----------|
| Pi 5 (4-core) | $60 | ~10 tok/s | 45 MB |
| Pi 4 (4-core) | $35 | ~8 tok/s | 45 MB |
| Pi 3B+ | $25 | ~4 tok/s | 45 MB |
| Pi Zero 2W | $15 | ~2 tok/s | 45 MB |
| LicheeRV Nano | $10 | ~1 tok/s | 45 MB |
JSON tool calling
PicoClaw automatically activates --json grammar mode when it needs structured output. This guarantees syntactically valid JSON even from a 1B parameter model — essential for reliable tool calling on tiny hardware:
```sh
picoclaw agent -m "Search for weather in Tokyo"
# → PicoLM generates: {"tool_calls": [{"function": {"name": "web_search", "arguments": "{\"query\": \"weather Tokyo\"}"}}]}
```
For the full PicoClaw documentation, see the PicoClaw README.
What is PicoLM?
PicoLM is a minimal, from-scratch LLM inference engine written in ~2,500 lines of C11. It runs TinyLlama 1.1B (and other LLaMA-architecture models in GGUF format) on hardware that most inference frameworks won't even consider:
- Raspberry Pi Zero 2W ($15, 512MB RAM, ARM Cortex-A53)
- Sipeed LicheeRV ($12, 512MB RAM, RISC-V)
- Raspberry Pi 3/4/5 (1-8GB RAM, ARM NEON SIMD)
- Any Linux/Windows/macOS x86-64 machine
The model file (638MB) stays on disk. PicoLM memory-maps it and streams one layer at a time through RAM. Total runtime memory: ~45MB including the FP16 KV cache.
```text
              ┌──────────────────────────────────────────┐
 What goes    │            45 MB Runtime RAM             │
 in RAM       │ ┌─────────┐ ┌──────────┐ ┌───────────┐  │
              │ │ Buffers │ │ FP16 KV  │ │ Tokenizer │  │
              │ │ 1.2 MB  │ │  Cache   │ │  4.5 MB   │  │
              │ │         │ │  ~40 MB  │ │           │  │
              │ └─────────┘ └──────────┘ └───────────┘  │
              └──────────────────────────────────────────┘

              ┌──────────────────────────────────────────┐
 What stays   │         638 MB Model on Disk             │
 on disk      │     (mmap — OS pages in layers           │
 (via mmap)   │        as needed, ~1 at a time)          │
              └──────────────────────────────────────────┘
```
Features
| Feature | Description |
|---------|-------------|
| GGUF Native | Reads GGUF v2/v3 files directly — no conversion needed |
| K-Quant Support | Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q8_0, Q4_0, F16, F32 |
| mmap Layer Streaming | Model weights stay on disk; OS pages in one layer at a time |
| FP16 KV Cache | Halves KV cache memory (44MB vs 88MB for 2048 context) |
| Flash Attention | Online softmax — no O(seq_len) attention buffer needed |
| Pre-computed RoPE | cos/sin lookup tables eliminate transcendentals from hot loop |
| SIMD Acceleration | ARM NEON (Pi 3/4/5) and x86 SSE2 (Intel/AMD) auto-detected |
| Fused Dot Products | Dequantize + dot-product in one pass — no intermediate buffer |
| Multi-threaded matmul | Parallel matrix-vector multiply across CPU cores |
| Grammar-Constrained JSON | --json flag forces valid JSON output (for tool calling) |
| KV Cache Persistence | --cache saves/loads prompt state — skip prefill on re-runs |
| BPE Tokenizer | Score-based byte-pair encoding, loaded from GGUF metadata |
| Top-p Sampling | Temperature + nucleus sampling with configurable seed |
| Pipe-friendly | Reads prompts from stdin: echo "Hello" \| ./picolm model.gguf |
| Zero Dependencies | Only libc, libm, libpthread. No external libraries. |
| Cross-platform | Linux, Windows (MSVC), macOS. ARM, x86-64, RISC-V. |
Quick Start
One-liner install (Raspberry Pi / Linux)
```sh
curl -sSL https://raw.githubusercontent.com/RightNow-AI/picolm/main/install.sh | bash
```
This will:
- Detect your platform (ARM64, ARMv7, x86-64)
- Install build dependencies (`gcc`, `make`, `curl`)
- Build PicoLM with optimal SIMD flags for your CPU
- Download TinyLlama 1.1B Q4_K_M (638 MB)
- Run a quick test
- Generate a PicoClaw config
- Add `picolm` to your PATH
Build from source
```sh
git clone https://github.com/rightnow-ai/picolm.git
cd picolm/picolm

# Auto-detect CPU (enables SSE2/AVX on x86, NEON on ARM)
make native

# Download a model
make model

# Run it
./picolm /opt/picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  -p "The meaning of life is" -n 100
```
Build on Windows (MSVC)
```bat
cd picolm
build.bat
picolm.exe model.gguf -p "Hello world" -n 50
```
Platform-specific builds
```sh
make native    # x86/ARM auto-detect (recommended for local machine)
make pi        # Raspberry Pi 3/4/5 (64-bit ARM + NEON SIMD)
make pi-arm32  # Pi Zero / Pi 1 (32-bit ARM)
make cross-pi  # Cross-compile for Pi from x86 (static binary)
make riscv     # RISC-V (Sipeed LicheeRV, etc.)
make static    # Static binary for single-file deployment
make debug     # Debug build with symbols, no optimization
```
Usage
```text
PicoLM — ultra-lightweight LLM inference engine

Usage: picolm <model.gguf> [options]

Generation options:
  -p <prompt>   Input prompt (or pipe via stdin)
  -n <int>      Max tokens to generate (default: 256)
  -t <float>    Temperature (default: 0.8, 0=greedy)
  -k <float>    Top-p / nucleus sampling (default: 0.9)
  -s <int>      RNG seed (default: 42)
  -c <int>      Context length override
  -j <int>      Number of threads (default: 4)

Advanced options:
  --json           Grammar-constrained JSON output mode
  --cache <file>   KV cache file (saves/loads prompt state)
```
Examples
Basic generation:
```sh
./picolm model.gguf -p "Once upon a time" -n 200
```
**Greedy decoding** (deterministic, temperature=0):
