
LLMChat

A full-stack web UI implementation for large language models such as ChatGPT or LLaMA.

Install / Use

/learn @c0sogi/LLMChat

README

LLMChat 🎉

👋 Welcome to the LLMChat repository: a full-stack chat application with an API server built on Python FastAPI and a beautiful frontend powered by Flutter. 💬 The project is designed to deliver a seamless chat experience with ChatGPT and other LLMs, 🔝 on a modern infrastructure that can be easily extended when GPT-4's multimodal and plugin features become available. 🚀 Enjoy your stay!

Demo


Enjoy the beautiful UI and rich set of customizable widgets provided by Flutter.

  • It supports both mobile and PC environments.
  • Markdown is also supported, so you can use it to format your messages.

Web Browsing

  • DuckDuckGo

    You can use the DuckDuckGo search engine to find relevant information on the web. Just activate the 'Browse' toggle button!

    Watch the demo video for full-browsing: https://www.youtube.com/watch?v=mj_CVrWrS08

Browse Web


Vector Embedding

  • Embed Any Text

    With the /embed command, you can store text indefinitely in your own private vector database and query it later, anytime. If you use the /share command, the text is stored in a public vector database that everyone can share. Enabling the Query toggle button or using the /query command helps the AI generate contextualized answers by searching for text similarities in the public and private databases. This addresses one of the biggest limitations of language models: memory.

  • Upload Your PDF File

    You can embed a PDF file by clicking Embed Document at the bottom left. Within a few seconds, the text content of the PDF is converted to vectors and stored in the Redis cache.
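The embedding-and-query flow above can be sketched in principle with a toy, self-contained example. The real project uses Redis and Langchain with learned embeddings; the bag-of-words "embedding" below is purely illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # Real deployments use a learned embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def query(store: list[str], question: str) -> str:
    # Return the stored text most similar to the question, which is
    # conceptually what the Query toggle / /query command does.
    q = embed(question)
    return max(store, key=lambda doc: cosine(embed(doc), q))

store = ["the cat sat on the mat", "llamas are large animals"]
print(query(store, "where did the cat sit"))  # → "the cat sat on the mat"
```

The retrieved text is then prepended to the prompt so the model can answer with context it would otherwise have no memory of.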

Upload Your PDF File


  • Change your chat model

    You can change your chat model via the dropdown menu. You can define whatever model you want to use in LLMModels, located in app/models/llms.py.

    Change your chat model


  • Change your chat title

    You can change your chat title by clicking the title of the chat. This will be stored until you change or delete it!

    Change your chat title


🦙 Local LLMs

llama api

The local Llama LLMs are assumed to run only in the local environment and use the http://localhost:8002/v1/completions endpoint. The server continuously checks the status of the llama API server by connecting to http://localhost:8002/health once a second to see if a 200 OK response is returned; if not, it automatically spawns a separate process to start the API server.
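The health-check loop described above can be sketched with the standard library alone. This is a simplified illustration of the polling pattern, not the project's actual implementation (which also spawns the llama API server process when the check fails):

```python
import time
import urllib.request
import urllib.error

def is_healthy(url: str, timeout: float = 1.0) -> bool:
    # True only if the endpoint answers with 200 OK within the timeout.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_healthy(url: str, interval: float = 1.0) -> None:
    # Poll once per second, as described above. The real server would
    # launch the llama API process here instead of just waiting.
    while not is_healthy(url):
        time.sleep(interval)
```

In the real project the unhealthy branch triggers process creation, so the check doubles as a supervisor for the local inference server.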

Llama.cpp

The main goal of llama.cpp is to run the LLaMA model using GGML 4-bit quantization in a plain C/C++ implementation without dependencies. Download a GGML bin file from Hugging Face, put it in the llama_models/ggml folder, and define an LLMModel in app/models/llms.py. There are a few examples, so you can easily define your own model. Refer to the llama.cpp repository for more information: https://github.com/ggerganov/llama.cpp

Exllama

A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs. It uses PyTorch and SentencePiece to run the model. It is assumed to run only in the local environment, and at least one NVIDIA CUDA GPU is required. Download the tokenizer, config, and GPTQ files from Hugging Face, put them in the llama_models/gptq/YOUR_MODEL_FOLDER folder, and define an LLMModel in app/models/llms.py. There are a few examples, so you can easily define your own model. Refer to the exllama repository for more detailed information: https://github.com/turboderp/exllama
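The actual model definitions live in app/models/llms.py; the dataclass below is only a hypothetical sketch of what a local model entry could look like. Every field name here is illustrative, not the project's real API:

```python
from dataclasses import dataclass

@dataclass
class LocalLLMSketch:
    # Hypothetical fields — consult app/models/llms.py for the real class.
    name: str               # name shown in the model dropdown
    model_path: str         # e.g. a GGML bin under llama_models/ggml,
                            # or a GPTQ folder under llama_models/gptq
    max_total_tokens: int   # context window budget
    backend: str            # "llama.cpp" or "exllama"

my_model = LocalLLMSketch(
    name="my-ggml-model",
    model_path="llama_models/ggml/my-model.bin",
    max_total_tokens=2048,
    backend="llama.cpp",
)
```

The point is simply that each local model is a declarative entry pairing a weights file with its runtime backend; the existing examples in llms.py are the authoritative template to copy.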


Key Features

  • FastAPI - High-performance web framework for building APIs with Python.
  • Flutter - Webapp frontend with beautiful UI and rich set of customizable widgets.
  • ChatGPT - Seamless integration with the OpenAI API for text generation and message management.
  • LLAMA - Support for local LLMs via LlamaCpp and Exllama.
  • WebSocket Connection - Real-time, two-way communication with ChatGPT and other LLM models from the Flutter frontend webapp.
  • Vectorstore - Store and retrieve vector embeddings for similarity search using Redis and Langchain, helping the AI generate more relevant responses.
  • Auto summarization - Summarize the conversation with Langchain's summarize chain and store it in the database, saving a lot of tokens.
  • Web Browsing - Browse the web with the DuckDuckGo search engine to find relevant information.
  • Concurrency - Asynchronous programming with async/await syntax for concurrency and parallelism.
  • Security - Token validation and authentication to keep the API secure.
  • Database - Manage database connections and execute MySQL queries. Easily perform Create, Read, Update, and Delete actions with sqlalchemy.ext.asyncio.
  • Cache - Manage cache connections and execute Redis queries with aioredis. Easily perform Create, Read, Update, and Delete actions.

Getting Started / Installation

To set up the project on your local machine, follow these simple steps. Before you begin, ensure you have docker and docker-compose installed. If you want to run the server without Docker, you also have to install Python 3.11; even then, you still need Docker to run the DB servers.

1. Clone the repository

To recursively clone the submodules to use Exllama or llama.cpp models, use the following command:

git clone --recurse-submodules https://github.com/c0sogi/llmchat.git

If you only want to use the core (OpenAI) features, use the following command:

git clone https://github.com/c0sogi/llmchat.git

2. Change to the project directory

cd LLMChat

3. Create .env file

Set up a .env file, referring to the .env-sample file. Enter the database information to create, your OpenAI API key, and other necessary configurations. Optional values are not required; just leave them as they are.
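A fragment like the following gives the general shape of the file. The variable names below are illustrative only; the authoritative names are in .env-sample:

```shell
# Illustrative .env fragment — copy the real keys from .env-sample.
OPENAI_API_KEY="sk-..."          # your OpenAI API key
MYSQL_DATABASE="llmchat"         # database to create
MYSQL_ROOT_PASSWORD="change-me"  # pick a strong password
```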

4. To run the server

Execute the following. It may take a few minutes to start the server for the first time:

docker-compose -f docker-compose-local.yaml up

5. To stop the server

docker-compose -f docker-compose-local.yaml down

6. Enjoy it

Now you can access the server at http://localhost:8000/docs and the database at db:3306 or cache:6379. You can also access the app at http://localhost:8000/chat.

  • To run the server without Docker, you also have to install Python 3.11; even then, you still need Docker for the DB servers. Stop the API server already running under Docker with docker-compose -f docker-compose-local.yaml down api, and don't forget to keep the other DB servers running on Docker! Then, run the following command:

    python -m main
    

    Your server should now be up and running at http://localhost:8001 in this case.

License

This project is licensed under the MIT License, which allows for free use, modification, and distribution, as long as the original copyright and license notice are included in any copy or substantial portion of the software.

Why FastAPI?

🚀 FastAPI is a modern web framework for building APIs with Python. 💪 It is high-performance, easy to learn, fast to code, and ready for production. 👍 One of its main features is support for concurrency and async/await syntax. 🤝 This means you can write code that handles multiple tasks at the same time without blocking, which matters especially for I/O-bound operations such as network requests, database queries, and file operations.
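The concurrency claim above is easy to see in a tiny stdlib-only sketch: two simulated I/O waits run at the same time, so the total elapsed time is roughly one wait, not two.

```python
import asyncio
import time

async def fake_io(name: str, delay: float) -> str:
    # Stand-in for an I/O-bound task (network request, DB query, ...).
    await asyncio.sleep(delay)
    return name

async def main() -> list[str]:
    # Both "requests" wait concurrently: total time is ~0.2s, not ~0.4s.
    start = time.perf_counter()
    results = await asyncio.gather(fake_io("a", 0.2), fake_io("b", 0.2))
    elapsed = time.perf_counter() - start
    assert elapsed < 0.35  # would be ~0.4s if run sequentially
    return results

print(asyncio.run(main()))  # → ['a', 'b']
```

FastAPI route handlers declared with `async def` get this behavior for free: while one request awaits the database, the event loop serves other requests.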

Why Flutter?

📱 Flutter is an open-source UI toolkit developed by Google for building native user interfaces for mobile, web, and desktop platforms from a single codebase. 👨‍💻 It uses Dart, a modern object-oriented programming language, and provides a rich set of customizable widgets that can adapt to any design.

WebSocket Connection

You can access ChatGPT or LlamaCpp through WebSocket connection using two modules: app/routers/websocket and app/utils/chat/chat_stream_manager. These modules facilitate the communication between the Flutter client and the Chat model through a WebSocket. With the WebSocket, you can establish a real-time, two-way communication channel to interact with the LLM.

Usage

To start a conversation, connect to the WebSocket route /ws/chat/{api_key} with a valid API key registered in the database. Note that this API key is not your OpenAI API key; it only lets your server validate the user. Once connected, you can send messages and commands to interact with the LLM model, and the WebSocket sends back chat responses in real time. The connection is established via the Flutter app, which can be accessed at the /chat endpoint.

websocket.py

websocket.py is responsible for setting up a WebSocket connection and handling user authentication. It defines the WebSocket route /chat/{api_key} that accepts a WebSocket and an API key as parameters.

When a client connects to the WebSocket, it first checks the API key to authenticate the user. If the API key is valid, the begin_chat() function is called from the stream_manager.py module to start the conversation.

In case of an unregistered API key or an unexpected error, an appropriate message is sent to the client and the connection is closed.

@router.websocket("/chat/{api_key}")
async def ws_chat(websocket: WebSocket, api_key: str):
    ...

stream_manager.py

stream_manager.py is responsible for managing the conversation and handling user messages. It defines the `begin_chat()` function, which starts the conversation once the client has been authenticated.

View on GitHub
GitHub Stars: 289
Category: Development
Updated: 5d ago
Forks: 50

Languages

Python

Security Score

100/100

Audited on Mar 26, 2026

No findings