<a name="readme-top"></a>
<div align="center"> <img src="https://raw.githubusercontent.com/octotools/octotools/refs/heads/main/assets/octotools.svg" alt="OctoTools Logo" width="100"> </div>

# OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
<!--- BADGES: START --->
<!-- [](https://discord.gg/kgUXdZHgNG) -->
<!--- BADGES: END --->

## News
- 2025-07-22: 📄 Added support for Ollama LLM. Now you can use any Ollama-supported models. Thanks @yeahdongcn for your contribution!
- 2025-07-03: 📄 Added support for LiteLLM LLM. Now you can use any LiteLLM-supported models. Thanks @axion66 for your contribution!
- 2025-06-25: 📄 Added backend support for Azure OpenAI. Thanks @sufiyan-ahmed for your contribution!
- 2025-05-21: 📄 Added support for vLLM LLM. Now you can use any vLLM-supported models and your local checkpoint models. Check out the example notebook for more details.
- 2025-05-19: 📄 A great re-implementation of the OctoTools framework is available here! Thank you, Maciek Tokarski, for your contribution!
- 2025-05-03: 🏆 Excited to announce that OctoTools won the Best Paper Award at the NAACL 2025 - KnowledgeNLP Workshop! Check out our oral presentation slides here.
- 2025-05-01: 📚 A comprehensive tutorial on OctoTools is now available here. Special thanks to @fikird for creating this detailed guide!
- 2025-04-19: 📦 Released Python package on PyPI at pypi.org/project/octotoolkit! Check out the installation guide for more details.
- 2025-04-17: 🚀 Support for a broader range of LLM engines is available now! See the full list of supported LLM engines here.
- 2025-03-08: 📺 Thrilled to have OctoTools featured in a tutorial by Discover AI on YouTube! Watch the engaging video here.
- 2025-02-16: 📄 Our paper is now available as a preprint on arXiv! Read it here!
## TODO
Stay tuned, we're working on the following:
- [X] Add support for Anthropic LLM
- [X] Add support for Together AI LLM
- [X] Add support for DeepSeek LLM
- [X] Add support for Gemini LLM
- [X] Add support for Grok LLM
- [X] Release Python package on PyPI
- [X] Add support for vLLM LLM
- [X] Add support for Azure OpenAI
- [X] Add support for LiteLLM (to support API models)
- [X] Add support for Ollama (to support local models)
TBD: We're excited to collaborate with the community to expand OctoTools to more tools, domains, and beyond! Join our Slack or reach out to Pan Lu to get started!
## Get Started

### Step-by-step Tutorial

A detailed explanation and tutorial on OctoTools is available here.

### YouTube Tutorial

Excited to have a tutorial video for OctoTools by Discover AI on YouTube!
<div align="center"> <a href="https://www.youtube.com/watch?v=4828sGfx7dk"> <img src="https://img.youtube.com/vi/4828sGfx7dk/maxresdefault.jpg" alt="OctoTools Tutorial" width="100%"> </a> </div>

## Introduction
We introduce OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage.
(1) Tool cards define tool-usage metadata and encapsulate heterogeneous tools, enabling training-free integration of new tools without framework refinement. (2) The planner governs both high-level and low-level planning to address the global objective and refine actions step by step. (3) The executor instantiates tool calls by generating executable commands and saves structured results in the context. The final answer is summarized from the full trajectory in the context. Furthermore, a task-specific toolset optimization algorithm learns a beneficial subset of tools for downstream tasks.
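The tool-card / planner / executor loop described above can be sketched in plain Python. This is a conceptual illustration only, not the actual octotools API: all names here (`ToolCard`, `plan_step`, `execute`, `solve`) and the fixed two-step plan are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolCard:
    name: str
    description: str           # tool-usage metadata the planner can read
    run: Callable[[str], str]  # the wrapped heterogeneous tool

@dataclass
class Context:
    steps: list = field(default_factory=list)  # structured results so far

def plan_step(query: str, context: Context) -> str:
    """Low-level planning: pick the next tool (here, a fixed order)."""
    order = ["calculator", "summarizer"]
    return order[len(context.steps)]

def execute(tool: ToolCard, query: str, context: Context) -> None:
    """Executor: instantiate the tool call and save the structured result."""
    result = tool.run(query)
    context.steps.append({"tool": tool.name, "result": result})

def solve(query: str, tools: dict, max_steps: int = 2) -> str:
    context = Context()
    for _ in range(max_steps):
        name = plan_step(query, context)
        execute(tools[name], query, context)
    # The final answer is summarized from the full trajectory in the context;
    # here we simply take the last step's result.
    return context.steps[-1]["result"]

tools = {
    "calculator": ToolCard("calculator", "evaluates arithmetic",
                           lambda q: str(eval(q))),
    "summarizer": ToolCard("summarizer", "summarizes the trajectory",
                           lambda q: f"answer to {q!r}"),
}

print(solve("2 + 3", tools))  # → answer to '2 + 3'
```

The point of the sketch is the separation of concerns: tool cards carry metadata and wrap arbitrary tools, the planner only chooses names, and the executor is the only component that runs anything and writes to the context.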

We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Given the same set of tools, OctoTools also outperforms AutoGen, GPT-Functions, and LangChain by up to 10.6%.
<p align="center"> <img src="https://raw.githubusercontent.com/octotools/octotools/refs/heads/main/assets/result/main_scores_bar_chart.png" width="50%"> </p>

## Supported LLM Engines
We support a broad range of LLM engines, including GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, and more.
| Model Family | Engines (Multi-modal) | Engines (Text-Only) | Official Model List |
|--------------|-------------------|--------------------| -------------------- |
| OpenAI | gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1, o3, o1-pro, o4-mini | gpt-3.5-turbo, gpt-4, o1-mini, o3-mini | OpenAI Models |
| Azure OpenAI | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1, o3, o1-pro, o4-mini | gpt-3.5-turbo, gpt-4, o1-mini, o3-mini | Azure OpenAI Models |
| Anthropic | claude-3-haiku-20240307, claude-3-sonnet-20240229, claude-3-opus-20240229, claude-3-5-sonnet-20240620, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-7-sonnet-20250219 | | Anthropic Models |
| TogetherAI | Most multi-modal models, including meta-llama/Llama-4-Scout-17B-16E-Instruct, Qwen/QwQ-32B, Qwen/Qwen2-VL-72B-Instruct | Most text-only models, including meta-llama/Llama-3-70b-chat-hf, Qwen/Qwen2-72B-Instruct | TogetherAI Models |
| DeepSeek | | deepseek-chat, deepseek-reasoner | DeepSeek Models |
| Gemini | gemini-1.5-pro, gemini-1.5-flash-8b, gemini-1.5-flash, gemini-2.0-flash-lite, gemini-2.0-flash, gemini-2.5-pro-preview-03-25 | | Gemini Models |
| Grok | grok-2-vision-1212, grok-2-vision, grok-2-vision-latest | grok-3-mini-fast-beta, grok-3-mini-fast, grok-3-mini-fast-latest, grok-3-mini-beta, grok-3-mini, grok-3-mini-latest, grok-3-fast-beta, grok-3-fast, grok-3-fast-latest, grok-3-beta, grok-3, grok-3-latest | Grok Models |
| vLLM | Various vLLM-supported models, for example, Qwen2.5-VL-3B-Instruct and Qwen2.5-VL-72B-Instruct. You can also use local checkpoint models for customization and local inference. (Example-1, Example-2)| Various vLLM-supported models, for example, Qwen2.5-1.5B-Instruct. You can also use local checkpoint models for customization and local inference. | vLLM Models |
| LiteLLM | Any model supported by LiteLLM | Any model supported by LiteLLM | LiteLLM Models |
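Engine selection in frameworks like this typically keys off the model-name string. A minimal, hypothetical dispatcher (not the actual octotools implementation) sketching how the names in the table above might map to provider families:

```python
def engine_family(model: str) -> str:
    """Map a model-name string to a provider family (illustrative only)."""
    prefixes = {
        "gpt-": "openai",
        "o1": "openai", "o3": "openai", "o4-": "openai",
        "claude-": "anthropic",
        "deepseek-": "deepseek",
        "gemini-": "gemini",
        "grok-": "grok",
    }
    for prefix, family in prefixes.items():
        if model.startswith(prefix):
            return family
    # Anything unrecognized falls back to a generic router such as LiteLLM.
    return "litellm"

print(engine_family("claude-3-5-sonnet-20241022"))  # → anthropic
```

A prefix table like this is why a generic router backend is useful: it catches the long tail of models (e.g. TogetherAI or vLLM checkpoints) without a dedicated branch per provider.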
