Gamal
Research tool leveraging an LLM for answers
Gamal is a simple, zero-dependency tool designed to quickly provide answers to questions. It finds relevant web pages and uses an LLM to summarize the content, delivering concise answers. Gamal is accessible via the terminal (as a CLI tool), through its minimalist web interface, or as a Telegram bot.
Gamal utilizes SearXNG for web searches and requires an LLM to generate responses based on the search results. By default, Gamal integrates with OpenRouter as its LLM service, which requires an API key set in the LLM_API_KEY environment variable. Continue reading for detailed instructions on configuring Gamal to use either a local LLM (llama.cpp, Jan, or Ollama) or another managed LLM service (more than half a dozen options, including OpenAI, Fireworks, and Groq).
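For the default OpenRouter setup, only the API key needs to be set before launching Gamal (the value below is a placeholder):
export LLM_API_KEY="yourownapikey"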
To execute Gamal as a CLI tool, run it with Node.js (version >= 18) or Bun:
./gamal.js
For instant answers, pipe the questions directly into Gamal:
echo "List 5 Indonesia's best travel destinations" | ./gamal.js
Gamal also includes a minimalist front-end web interface. To launch it, specify the environment variable GAMAL_HTTP_PORT, for example:
GAMAL_HTTP_PORT=5000 ./gamal.js
Then, open a web browser and go to localhost:5000.
Gamal is capable of functioning as a Telegram bot. Obtain a token (refer to Telegram documentation for details) and set it as the environment variable GAMAL_TELEGRAM_TOKEN before launching Gamal. Note that conversation history in Telegram chats is stored in memory and not persisted to disk.
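For example, assuming a bot token has already been obtained from BotFather (the value below is a placeholder):
export GAMAL_TELEGRAM_TOKEN="your-telegram-bot-token"
./gamal.js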
Multi-language Support
Gamal can converse in many languages besides English. It always tries to respond in the same language as the question. You can freely switch languages between questions, as shown in the following example:
>> Which planet in our solar system is the biggest?
Jupiter is the largest planet in our solar system [1].
[1] https://science.nasa.gov/jupiter/
>> ¿Y el más caliente?
Venus es el planeta más caliente, con hasta 475°C [1].
[1] https://www.redastronomy.com/sistema-solar/el-planeta-venus/
Gamal's continuous integration workflows include evaluation tests in English, Spanish, German, French, Italian, and Indonesian.
Conversational Interface
With the integration of third-party tools, Gamal can engage in conversations using voice (both input and output) rather than just text.
For automatic speech recognition (ASR), also known as speech-to-text (STT), Gamal leverages the streaming tool from whisper.cpp. Ensure that whisper-cpp-stream, or the custom executable specified in the WHISPER_STREAM environment variable, is available in your system's path. Whisper requires a GGML model, which can be downloaded from Hugging Face. The base model (60 MB) is generally a good balance between accuracy and speed for most modern computers. Set the WHISPER_MODEL environment variable to the full path of the downloaded model.
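For example, assuming the base model has already been downloaded (the path below is illustrative; adjust it to the actual location):
export WHISPER_MODEL=/path/to/ggml-base.bin
./gamal.js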
To enable Gamal to respond with voice instead of just text, set the TTS_API_BASE_URL and TTS_VOICE environment variables to point to any TTS API compatible with the OpenAI Speech API. For example, a good local TTS with Kokoro-82M can be set up with either Kokoro-FastAPI (with pre-downloaded voice weights) or Speaches (ensure that the chosen voice weights are downloaded), and then:
export TTS_API_BASE_URL=http://127.0.0.1:8880/v1
export TTS_VOICE="af_bella"
Gamal will detect this TTS API service and use it to generate the corresponding audio. Note that the synthesized audio will be played back through the speaker or other audio output using the play utility from the SOX (Sound eXchange) project. Ensure that SOX is installed and available in your system's PATH.
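To quickly verify that the play utility is available on the PATH, check its version:
play --version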
Using Other LLM Services
Gamal is designed to be used with OpenRouter by default, but it can also be configured to work with other LLM services by adjusting some environment variables. The correct API key and a suitable model are required.
Compatible LLM services include Deep Infra, Fireworks, Gemini, Groq, Hyperbolic, Lepton, Novita, OpenAI, and Together.
Refer to the relevant section below for configuration details. The examples use Llama-3.1 8B, though any comparable model (roughly 7B to 9B parameters) should also work, such as Mistral 7B, Qwen-2 7B, or Gemma-2 9B.
<details><summary>Deep Infra</summary>
export LLM_API_BASE_URL=https://api.deepinfra.com/v1/openai
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
</details>
<details><summary>Fireworks</summary>
export LLM_API_BASE_URL=https://api.fireworks.ai/inference/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="accounts/fireworks/models/llama-v3p1-8b-instruct"
</details>
<details><summary>Google Gemini</summary>
export LLM_API_BASE_URL=https://generativelanguage.googleapis.com/v1beta
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="gemini-1.5-flash-8b"
export LLM_JSON_SCHEMA=1
</details>
<details><summary>Groq</summary>
export LLM_API_BASE_URL=https://api.groq.com/openai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="llama-3.1-8b-instant"
</details>
<details><summary>Hyperbolic</summary>
export LLM_API_BASE_URL=https://api.hyperbolic.xyz/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
</details>
<details><summary>Lepton</summary>
export LLM_API_BASE_URL=https://llama3-1-8b.lepton.run/api/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="llama3-1-8b"
</details>
<details><summary>Novita</summary>
export LLM_API_BASE_URL=https://api.novita.ai/v3/openai
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/llama-3.1-8b-instruct"
</details>
<details><summary>OpenAI</summary>
export LLM_API_BASE_URL=https://api.openai.com/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="gpt-4o-mini"
</details>
<details><summary>Together</summary>
export LLM_API_BASE_URL=https://api.together.xyz/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
</details>
Using Local LLM Servers
Gamal is compatible with local LLM inference tools such as llama.cpp, Jan, and Ollama. Refer to the relevant section for configuration details.
The example provided uses Llama-3.1 8B.
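As a sketch, assuming a local Ollama server exposing its OpenAI-compatible endpoint on the default port, with the llama3.1:8b model already pulled:
export LLM_API_BASE_URL=http://127.0.0.1:11434/v1
export LLM_CHAT_MODEL="llama3.1:8b"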
