Functionary
Chat language model that can use tools and interpret the results
<a href="https://meetkai.com/"> <img align="right" width="256" height="256" src="https://github.com/meetkai/functionary/assets/3749407/c7a1972d-6ad7-40dc-8000-dceabe6baabd"> </a>

Functionary is a language model that can interpret and execute functions/plugins.
The model determines when to execute functions, whether in parallel or serially, and can understand their outputs. It only triggers functions as needed. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls.
Documentation and more examples: functionary.meetkai.com
<details>
<summary>Changelog: (click to expand)</summary>

- [2024/12/24] We release meetkai/functionary-v4r-small-preview - our first version of Functionary that can generate the reasoning steps first before using the tools
- [2024/10/21] New server powered by SGLang!
- [2024/08/21] We release meetkai/functionary-small-v3.2 and meetkai/functionary-medium-v3.2
- [2024/08/11] Our newest model (meetkai/functionary-medium-v3.1) is ranked 2nd on the Berkeley Function-Calling Leaderboard
- [2024/08/08] We release a 128k-context length 70B model, meetkai/functionary-medium-v3.1, based on meta-llama/Meta-Llama-3.1-70B-Instruct
- [2024/08/07] We release 2 128k-context length models that are based on meta-llama/Meta-Llama-3.1-8B-Instruct:
- meetkai/functionary-small-v3.1: using Meta's original prompt template as described in: User-defined Custom tool calling
- meetkai/functionary-small-v3.2: using our own prompt template. This model is better than meetkai/functionary-small-v3.1
- [2024/06/14] We release meetkai/functionary-medium-v3.0 (based on meta-llama/Meta-Llama-3-70B-Instruct) with better capability for function calling
- [2024/05/17] We release meetkai/functionary-small-v2.5 with better capability for function calling and code interpreter compared with functionary-small-v2.4
- [2024/05/06] Streaming support for functionary v2 to v2.4 models is released in llama-cpp-python!
- [2024/05/03] Added support for serverless vLLM deployment on Modal.com
- [2024/04/02] We release meetkai/functionary-small-v2.4 and meetkai/functionary-medium-v2.4! The first Functionary models with code-interpreter ability (enabled by passing {"type": "code_interpreter"} in tools)!

</details>
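As a sketch, the code-interpreter tool from the v2.4 release is enabled by adding one entry to the tools list, alongside any regular JSON-Schema function definitions (the weather function here mirrors the example in the usage section of this README):

```python
# Sketch: a tools list that enables the built-in code interpreter
# alongside a regular JSON-Schema function definition.
tools = [
    {"type": "code_interpreter"},  # enables code-interpreter ability (v2.4+)
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    },
]
```

This list is passed as the `tools` argument of a chat-completions request, exactly as in the OpenAI Compatible Usage section.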
Getting Started
Functionary can be deployed using either our vLLM or SGLang servers. Choose either one depending on your preferences.
Installation
vLLM
pip install -e .[vllm]
SGLang
pip install -e .[sglang] --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
Running the server
Small Model
vLLM
python3 server_vllm.py --model "meetkai/functionary-v4r-small-preview" --host 0.0.0.0 --port 8000 --max-model-len 8192
SGLang
python3 server_sglang.py --model-path "meetkai/functionary-v4r-small-preview" --host 0.0.0.0 --port 8000 --context-length 8192
Medium Model
Our medium models require 4xA6000 or 2xA100 80GB GPUs to run, and you need to set --tensor-parallel-size (vLLM) or --tp (SGLang) accordingly.
vLLM
# vLLM requires this to be set first: https://github.com/vllm-project/vllm/issues/6152
export VLLM_WORKER_MULTIPROC_METHOD=spawn
python server_vllm.py --model "meetkai/functionary-medium-v3.1" --host 0.0.0.0 --port 8000 --max-model-len 8192 --tensor-parallel-size 2
SGLang
python server_sglang.py --model-path "meetkai/functionary-medium-v3.1" --host 0.0.0.0 --port 8000 --context-length 8192 --tp 2
LoRA Support (Currently Only in vLLM)
Similar to LoRA in vLLM, our server supports serving LoRA adapters both at startup and dynamically.
To serve a LoRA adapter at startup, run the server with the --lora-modules argument:
python server_vllm.py --model {BASE_MODEL} --enable-lora --lora-modules {name}={path} {name}={path} --host 0.0.0.0 --port 8000
To serve a LoRA adapter dynamically, use the /v1/load_lora_adapter endpoint:
python server_vllm.py --model {BASE_MODEL} --enable-lora --host 0.0.0.0 --port 8000
# Load a LoRA adapter dynamically
curl -X POST http://localhost:8000/v1/load_lora_adapter \
-H "Content-Type: application/json" \
-d '{
"lora_name": "my_lora",
"lora_path": "/path/to/my_lora_adapter"
}'
# Example chat request to lora adapter
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my_lora",
"messages": [...],
"tools": [...],
"tool_choice": "auto"
}'
# Unload a LoRA adapter dynamically
curl -X POST http://localhost:8000/v1/unload_lora_adapter \
-H "Content-Type: application/json" \
-d '{
"lora_name": "my_lora"
}'
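For scripting, the same load → chat → unload flow can be sketched in Python with the standard library. This is a sketch assuming the endpoint paths and payload fields shown in the curl examples above; building the Request objects does not contact the server, so pass each one to urllib.request.urlopen once a server is running:

```python
# Sketch: build the HTTP requests for the dynamic LoRA flow shown above.
import json
import urllib.request

BASE = "http://localhost:8000/v1"

def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request for one of the server's endpoints."""
    return urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

load_req = post_json(
    "/load_lora_adapter",
    {"lora_name": "my_lora", "lora_path": "/path/to/my_lora_adapter"},
)
chat_req = post_json(
    "/chat/completions",
    {
        "model": "my_lora",  # the lora_name doubles as the model name
        "messages": [{"role": "user", "content": "What is the weather for Istanbul?"}],
        "tool_choice": "auto",
    },
)
unload_req = post_json("/unload_lora_adapter", {"lora_name": "my_lora"})
```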
Text-Generation-Inference (TGI)
We also provide a service that performs inference on Functionary models using Text-Generation-Inference (TGI). Follow these steps to get started:
- Install Docker following their installation instructions.
- Install the Docker SDK for Python:
  pip install docker
- Start up the Functionary TGI server
At start-up, the Functionary TGI server tries to connect to an existing TGI endpoint. If you already have one running, you can run the following:
python3 server_tgi.py --model <REMOTE_MODEL_ID_OR_LOCAL_MODEL_PATH> --endpoint <TGI_SERVICE_ENDPOINT>
If the TGI endpoint does not exist, the Functionary TGI server will start a new TGI endpoint container with the address provided in the endpoint CLI argument via the installed Docker Python SDK. Run the following commands for remote and local models respectively:
python3 server_tgi.py --model <REMOTE_MODEL_ID> --remote_model_save_folder <PATH_TO_SAVE_AND_CACHE_REMOTE_MODEL> --endpoint <TGI_SERVICE_ENDPOINT>
python3 server_tgi.py --model <LOCAL_MODEL_PATH> --endpoint <TGI_SERVICE_ENDPOINT>
- Make either OpenAI-compatible or raw HTTP requests to the Functionary TGI server.
Docker
If you're having trouble with dependencies, and you have nvidia-container-toolkit, you can start your environment like this:
cd <ROOT>
# vLLM
sudo docker build -t functionary-vllm -f dockerfiles/Dockerfile.vllm .
sudo docker run --runtime nvidia --gpus all -p 8000:8000 functionary-vllm
# SGLang
sudo docker build -t functionary-sglang -f dockerfiles/Dockerfile.sgl .
sudo docker run --runtime nvidia --gpus all -p 8000:8000 functionary-sglang
OpenAI Compatible Usage
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")
client.chat.completions.create(
model="meetkai/functionary-v4r-small-preview",
messages=[{"role": "user",
"content": "What is the weather for Istanbul?"}
],
tools=[{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
}],
tool_choice="auto"
)
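When the model decides to call get_current_weather, the response carries tool calls in the standard OpenAI tool-calling format. A minimal sketch of executing such a call and packaging the result as a "tool" message for the follow-up request (get_current_weather here is a stub for illustration):

```python
# Sketch: dispatch a returned tool call and format its result as a
# "tool" message, following the OpenAI tool-calling message format.
import json

def get_current_weather(location: str) -> dict:
    # Stub implementation for illustration only.
    return {"location": location, "temperature_c": 22}

AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and build the message to send back."""
    fn = AVAILABLE_FUNCTIONS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    result = fn(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# A tool call as it might appear in response.choices[0].message.tool_calls
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "Istanbul"}',
    },
}
tool_message = run_tool_call(example_call)
```

The tool_message is then appended to messages (after the assistant message containing the tool call) and the conversation is sent back to the model so it can interpret the result.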
Raw Usage:
<details>
<summary>Details (click to expand)</summary>

import requests
data = {
'model': 'meetkai/functionary-v4r-small-preview', # model name here is the value of argument "--model" in deploying: server_vllm.py or server.py
'messages': [
{
"role": "user",
"content": "What is the weather for Istanbul?"
}
],
'tools':[ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
  ],
  'tool_choice': "auto"
}

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json=data,
    headers={
        "Content-Type": "application/json",
        # Any placeholder key works, as in the OpenAI usage above ("functionary")
        "Authorization": "Bearer xxxx",
    },
)
print(response.json())

</details>