WeClone
🚀 One-stop solution for creating your AI twin from chat history 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life. 从聊天记录创造数字分身的一站式解决方案
Install / Use
/learn @xming521/WeCloneREADME
<a href="https://qm.qq.com/cgi-bin/qm/qr?k=wNdgbOVT6oFOJ2wlMLsolUXErW9ESLpk&jump_from=webapi&authKey=z/reOp6YLyvR4Tl2k2nYMsLoMC3w9/99ucgKMX0oRGlxDV/WbYnvq2QxODoIkfxn" target="_blank" style="text-decoration: none;">
<img src="https://img.shields.io/badge/QQ群-708067078-12B7F5?style=for-the-badge&logo=qq&logoColor=white" alt="WeClone①" title="WeClone①">
</a>
<a href="https://hellogithub.com/repository/12ab209b56cb4cfd885c8cfd4cfdd53e" target="_blank"><img src="https://abroad.hellogithub.com/v1/widgets/recommend.svg?rid=12ab209b56cb4cfd885c8cfd4cfdd53e&claim_uid=RThlPDoGrFvdMY5" alt="Featured|HelloGitHub" style="width: 150px; height: 28px;" /></a> <a href="https://trendshift.io/repositories/13759" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13759" alt="xming521%2FWeClone | Trendshift" style="width: 220px; height: 50px;" /></a> <a href="https://deepwiki.com/xming521/WeClone"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki" style="width: 134px; height: 23px;margin-bottom: 3px;"></a>
</div> <p align="center"> <a href="https://github.com/xming521/WeClone/blob/master/README_zh.md" target="_blank">简体中文</a>| English</a>| <a href="https://www.weclone.love/" target="_blank"> Project Homepage </a> | <a href="https://docs.weclone.love/docs/introduce/what-is-weclone.html" target="_blank"> Documentation </a> </p>[!IMPORTANT]
Telegram is now supported as a data source !
✨Core Features
- 💫 Complete end-to-end solution for creating digital avatars, including chat data export, preprocessing, model training, and deployment
- 💬 Fine-tune LLM using chat history with support for image modal data, infusing it with that authentic "flavor"
- 🔗 Integrate with Telegram, WhatsApp (coming soon) to create your own digital avatar
- 🛡️ Privacy information filtering with localized fine-tuning and deployment for secure and controllable data
📋Features & Notes
Data Source Platform Support
| Platform | Text | Images | Voice | Video | Animated Emojis/Stickers | Links (Sharing) | Quote | Forward | Location | Files | |----------|------|--------|-------|-------|-----------------|-----------------|-------|---------|----------|-------| | Telegram | ✅ | ✅ | ❌ | ❌ | ⚠️Convert to Emoji | ❌ | ❌ | ✅ | ✅ | ❌ | | WhatsApp | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | | Discord | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | | Slack | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 |
Deployment Platform Support
| Platform | Deployment Support | |----------|--------------------| | Telegram | ✅ | | WhatsApp | 🚧 | | Discord | ✅ | | Slack | ✅ |
[!IMPORTANT]
- WeClone is still in rapid iteration phase, current performance does not represent final results.
- LLM fine-tuning effectiveness largely depends on model size, quantity and quality of chat data. Theoretically, larger models with more data yield better results.
- The performance of the 7B model is average, while models with 14B or more parameters tend to deliver better results.
- Windows environment has not been rigorously tested. You can use WSL as the runtime environment.
Recent Updates
[25/07/10] Data source added Telegram
[25/06/05] Support for image modal data fine-tuning
Online Fine-Tuning
- Big Model Lab (Lab4AI) (with 50 CNY voucher): https://www.lab4ai.cn/project/detail?utm_source=weclone1&id=ab83d14684fa45d197f67eddb3d8316c&type=project
Hardware Requirements
The project uses Qwen2.5-VL-7B-Instruct model by default with LoRA method for SFT stage fine-tuning. You can also use other models and methods supported by LLaMA Factory.
Estimated VRAM requirements:
| Method | Precision | 7B | 14B | 30B | 70B | xB |
| ------------------------------- | --------- | ----- | ----- | ----- | ------ | ------- |
| Full (bf16 or fp16) | 32 | 120GB | 240GB | 600GB | 1200GB | 18xGB |
| Full (pure_bf16) | 16 | 60GB | 120GB | 300GB | 600GB | 8xGB |
| Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 2xGB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | xGB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | x/2GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | x/4GB |
Environment Setup
-
CUDA installation (skip if already installed, requires version 12.6 or above)
-
It is recommended to use uv to install dependencies, which is a very fast Python environment manager. After installing uv, you can use the following commands to create a new Python environment and install dependencies.
git clone https://github.com/xming521/WeClone.git && cd WeClone
uv venv .venv --python=3.12
source .venv/bin/activate # windows .venv\Scripts\activate
uv pip install --group main -e .
- Copy the configuration file template and rename it to
settings.jsonc, and make subsequent configuration changes in this file:
cp examples/tg.template.jsonc settings.jsonc
[!NOTE] Training and inference related configurations are unified in the file
settings.jsonc
- Use the following command to test whether the CUDA environment is correctly configured and can be recognized by PyTorch (not needed for Mac):
python -c "import torch; print('CUDA Available:', torch.cuda.is_available());"
- (Optional) Install FlashAttention to accelerate training and inference:
uv pip install flash-attn --no-build-isolation.
Model Download
It is recommended to use Hugging Face to download models, or use the following command:
git lfs install
git clone https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct models/Qwen2.5-VL-7B-Instruct
Data Preparation
Please use Telegram Desktop to export chat records. Click the top right corner in the chat interface, then click "Export chat history". Select Photos for message types and JSON for format. You can export multiple contacts (group chat records are not recommended), then place the exported ChatExport_* in the ./dataset/telegram directory, meaning put different people's chat record folders together in ./dataset/telegram.
Data Preprocessing
- First, modify the
language,platform, andinclude_typein the configuration file according to your needs. - If you use telegram, you need to modify the
telegram_args.my_idin the configuration file to your own telegram user ID. - By default, the project uses Microsoft Presidio to remove
phone numbers, email addresses, credit card numbers, IP addresses, geographic location names, international bank account numbers, cryptocurrency wallet addresses, age information, and generic ID numbersfrom the data, but it cannot guarantee 100% identification. - Therefore, a blocklist
blocked_wordsis provided insettings.jsonc, allowing users to manually add words or phrases they want to filter (the entire sentence containing blocked words will be removed by default).
[!IMPORTANT] 🚨 Please be sure to protect personal privacy and do not leak personal information!
- Execute the following command to process the data. You can modify the
make_dataset_argsin settings.jsonc according to your own chat style.
weclone-cli make-dataset
More Parameter Details: Data Preprocessing
Configure Parameters and Fine-tune Model
- (Optional) Modify
model_name_or_path,template,lora_targetinsettings.jsoncto select other locally downloaded models. - Modify
per_device_train_batch_sizeandgradient_accumulation_stepsto adjust VRAM usage. - You can modify parameters like
num_train_epochs,lora_rank,lora_dropoutintrain_sft_argsbased on your dataset's quantity and quality.
Single GPU Training
weclone-cli train-sft
Multi-GPU Training
Uncomment the deepspeed line in settings.jsonc and use the following command for multi-GPU training:
uv pip install "deepspeed<=0.16.9"
deepspeed --num_gpus=number_of_gpus weclone/train/train_sft.py
Simple Inference with Browser Demo
Test suitable temperature and top_p values, then modify infer_args in settings.jsonc for subsequent inference use.
weclone-cli webchat-demo
Inference Using API
weclone-cli server
Test with Common Chat Questions
Does not include questions asking for personal information, only daily conversation. Test results are in test_result-my.txt.
weclone-cli server
weclone-cli test-model
🖼️ Results Showcase
[!TIP] **We're looking for interesting examples of nati
Related Skills
openhue
328.7kControl Philips Hue lights and scenes via the OpenHue CLI.
sag
328.7kElevenLabs text-to-speech with mac-style say UX.
weather
328.7kGet current weather and forecasts via wttr.in or Open-Meteo
tweakcc
1.4kCustomize Claude Code's system prompts, create custom toolsets, input pattern highlighters, themes/thinking verbs/spinners, customize input box & user message styling, support AGENTS.md, unlock private/unreleased features, and much more. Supports both native/npm installs on all platforms.
