Functionary
Chat language model that can use tools and interpret the results
<a href="https://meetkai.com/"> <img align="right" width="256" height="256" src="https://github.com/meetkai/functionary/assets/3749407/c7a1972d-6ad7-40dc-8000-dceabe6baabd"> </a>

Functionary is a language model that can interpret and execute functions/plugins.
The model determines when to execute functions, whether in parallel or serially, and can understand their outputs. It only triggers functions as needed. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls.
Documentation and more examples: functionary.meetkai.com
<details>
<summary>Changelog: (click to expand)</summary>

- [2024/12/24] We release meetkai/functionary-v4r-small-preview - our first version of Functionary that can generate the reasoning steps first before using the tools
- [2024/10/21] New server powered by SGLang!
- [2024/08/21] We release meetkai/functionary-small-v3.2 and meetkai/functionary-medium-v3.2
- [2024/08/11] Our newest model (meetkai/functionary-medium-v3.1) is ranked 2nd on the Berkeley Function-Calling Leaderboard
- [2024/08/08] We release a 128k-context length 70B model, meetkai/functionary-medium-v3.1, based on meta-llama/Meta-Llama-3.1-70B-Instruct
- [2024/08/07] We release 2 128k-context length models that are based on meta-llama/Meta-Llama-3.1-8B-Instruct:
- meetkai/functionary-small-v3.1: using Meta's original prompt template as described in: User-defined Custom tool calling
- meetkai/functionary-small-v3.2: using our own prompt template. This model is better than meetkai/functionary-small-v3.1
- [2024/06/14] We release meetkai/functionary-medium-v3.0 (based on meta-llama/Meta-Llama-3-70B-Instruct) with better capability for function calling
- [2024/05/17] We release meetkai/functionary-small-v2.5 with better capability for function calling and code interpreter compared with functionary-small-v2.4
- [2024/05/06] Streaming support for functionary v2 to v2.4 models is released in llama-cpp-python!
- [2024/05/03] Added support for serverless vLLM deployment on Modal.com
- [2024/04/02] We release meetkai/functionary-small-v2.4 and meetkai/functionary-medium-v2.4! The first Functionary models with code-interpreter ability (enabled by passing {"type": "code_interpreter"} in tools)!

</details>
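As a sketch, the code-interpreter tool from the v2.4 release is enabled by adding one entry to the tools list, alongside any regular JSON-Schema function definitions (the weather function here mirrors the example in the usage section of this README):

```python
# Sketch: a tools list that enables the built-in code interpreter
# alongside a regular JSON-Schema function definition.
tools = [
    {"type": "code_interpreter"},  # enables code-interpreter ability (v2.4+)
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    },
]
```

This list is passed as the `tools` argument of a chat-completions request, exactly as in the OpenAI Compatible Usage section.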
Getting Started
Functionary can be deployed using either our vLLM or SGLang servers. Choose either one depending on your preferences.
Installation
vLLM
pip install -e .[vllm]
SGLang
pip install -e .[sglang] --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
Running the server
Small Model
vLLM
python3 server_vllm.py --model "meetkai/functionary-v4r-small-preview" --host 0.0.0.0 --port 8000 --max-model-len 8192
SGLang
python3 server_sglang.py --model-path "meetkai/functionary-v4r-small-preview" --host 0.0.0.0 --port 8000 --context-length 8192
Medium Model
Our medium models require 4xA6000 or 2xA100 80GB GPUs to run, and you need to set --tensor-parallel-size (vLLM) or --tp (SGLang) accordingly.
vLLM
# vLLM requires this to be set first: https://github.com/vllm-project/vllm/issues/6152
export VLLM_WORKER_MULTIPROC_METHOD=spawn
python server_vllm.py --model "meetkai/functionary-medium-v3.1" --host 0.0.0.0 --port 8000 --max-model-len 8192 --tensor-parallel-size 2
SGLang
python server_sglang.py --model-path "meetkai/functionary-medium-v3.1" --host 0.0.0.0 --port 8000 --context-length 8192 --tp 2
LoRA Support (Currently Only in vLLM)
Similar to LoRA in vLLM, our server supports serving LoRA adapters both at startup and dynamically.
To serve a LoRA adapter at startup, run the server with the --lora-modules argument:
python server_vllm.py --model {BASE_MODEL} --enable-lora --lora-modules {name}={path} {name}={path} --host 0.0.0.0 --port 8000
To serve a LoRA adapter dynamically, use the /v1/load_lora_adapter endpoint:
python server_vllm.py --model {BASE_MODEL} --enable-lora --host 0.0.0.0 --port 8000
# Load a LoRA adapter dynamically
curl -X POST http://localhost:8000/v1/load_lora_adapter \
-H "Content-Type: application/json" \
-d '{
"lora_name": "my_lora",
"lora_path": "/path/to/my_lora_adapter"
}'
# Example chat request to lora adapter
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my_lora",
"messages": [...],
"tools": [...],
"tool_choice": "auto"
}'
# Unload a LoRA adapter dynamically
curl -X POST http://localhost:8000/v1/unload_lora_adapter \
-H "Content-Type: application/json" \
-d '{
"lora_name": "my_lora"
}'
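For scripting, the same load → chat → unload flow can be sketched in Python with the standard library. This is a sketch assuming the endpoint paths and payload fields shown in the curl examples above; building the Request objects does not contact the server, so pass each one to urllib.request.urlopen once a server is running:

```python
# Sketch: build the HTTP requests for the dynamic LoRA flow shown above.
import json
import urllib.request

BASE = "http://localhost:8000/v1"

def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request for one of the server's endpoints."""
    return urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

load_req = post_json(
    "/load_lora_adapter",
    {"lora_name": "my_lora", "lora_path": "/path/to/my_lora_adapter"},
)
chat_req = post_json(
    "/chat/completions",
    {
        "model": "my_lora",  # the lora_name doubles as the model name
        "messages": [{"role": "user", "content": "What is the weather for Istanbul?"}],
        "tool_choice": "auto",
    },
)
unload_req = post_json("/unload_lora_adapter", {"lora_name": "my_lora"})
```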
Text-Generation-Inference (TGI)
We also provide a service that performs inference on Functionary models using Text-Generation-Inference (TGI). Follow these steps to get started:
- Install Docker following their installation instructions.
- Install the Docker SDK for Python:
  pip install docker
- Start up the Functionary TGI server
At start-up, the Functionary TGI server tries to connect to an existing TGI endpoint. If you already have one running, you can run the following:
python3 server_tgi.py --model <REMOTE_MODEL_ID_OR_LOCAL_MODEL_PATH> --endpoint <TGI_SERVICE_ENDPOINT>
If the TGI endpoint does not exist, the Functionary TGI server will start a new TGI endpoint container with the address provided in the endpoint CLI argument via the installed Docker Python SDK. Run the following commands for remote and local models respectively:
python3 server_tgi.py --model <REMOTE_MODEL_ID> --remote_model_save_folder <PATH_TO_SAVE_AND_CACHE_REMOTE_MODEL> --endpoint <TGI_SERVICE_ENDPOINT>
python3 server_tgi.py --model <LOCAL_MODEL_PATH> --endpoint <TGI_SERVICE_ENDPOINT>
- Make either OpenAI-compatible or raw HTTP requests to the Functionary TGI server.
Docker
If you're having trouble with dependencies, and you have nvidia-container-toolkit, you can start your environment like this:
cd <ROOT>
# vLLM
sudo docker build -t functionary-vllm -f dockerfiles/Dockerfile.vllm .
sudo docker run --runtime nvidia --gpus all -p 8000:8000 functionary-vllm
# SGLang
sudo docker build -t functionary-sglang -f dockerfiles/Dockerfile.sgl .
sudo docker run --runtime nvidia --gpus all -p 8000:8000 functionary-sglang
OpenAI Compatible Usage
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")
client.chat.completions.create(
model="meetkai/functionary-v4r-small-preview",
messages=[{"role": "user",
"content": "What is the weather for Istanbul?"}
],
tools=[{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
}],
tool_choice="auto"
)
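When the model decides to call get_current_weather, the response carries tool calls in the standard OpenAI tool-calling format. A minimal sketch of executing such a call and packaging the result as a "tool" message for the follow-up request (get_current_weather here is a stub for illustration):

```python
# Sketch: dispatch a returned tool call and format its result as a
# "tool" message, following the OpenAI tool-calling message format.
import json

def get_current_weather(location: str) -> dict:
    # Stub implementation for illustration only.
    return {"location": location, "temperature_c": 22}

AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and build the message to send back."""
    fn = AVAILABLE_FUNCTIONS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    result = fn(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# A tool call as it might appear in response.choices[0].message.tool_calls
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "Istanbul"}',
    },
}
tool_message = run_tool_call(example_call)
```

The tool_message is then appended to messages (after the assistant message containing the tool call) and the conversation is sent back to the model so it can interpret the result.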
Raw Usage:
<details>
<summary>Details (click to expand)</summary>

import requests
data = {
'model': 'meetkai/functionary-v4r-small-preview', # model name here is the value of argument "--model" in deploying: server_vllm.py or server.py
'messages': [
{
"role": "user",
"content": "What is the weather for Istanbul?"
}
],
'tools':[ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
  ],
  'tool_choice': "auto"
}

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json=data,
    headers={
        "Content-Type": "application/json",
        # Any placeholder key works, as in the OpenAI usage above ("functionary")
        "Authorization": "Bearer xxxx",
    },
)
print(response.json())

</details>