Jarvis Voice Assistant

English Documentation

Jarvis is a Windows-based local voice assistant that combines real-time voice activation, speech recognition, local language model processing, and voice synthesis. Speech recognition and language model processing run entirely on your machine for privacy. Voice synthesis is the exception: Edge TTS requires an internet connection.

Quick Start

To get started with Jarvis:

```
# Clone the repository
git clone https://github.com/jmtitan/Jarvis.git
cd Jarvis

# Set up conda environment and install dependencies
conda create -n jarvis python=3.8
conda activate jarvis
pip install -r requirements.txt

# Make sure the Whisper model files are present in the models directory
# (download them if they are missing; see "2. Installing Whisper" for details)
```

Key Features

- Real-time voice activation with WebRTC VAD
- Offline speech recognition powered by Whisper.cpp
- Local large language model processing via Ollama
- Multiple voice synthesis options with Edge TTS
- System tray integration for easy access
- Global hotkey support for quick controls
- Customizable voice, speech rate, and volume
- Settings UI for easy configuration
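
The voice activation feature above can be illustrated with a minimal sketch. It assumes the third-party `webrtcvad` Python package, 16 kHz 16-bit mono PCM input, and 30 ms frames; the function names are hypothetical and are not taken from the Jarvis source code.

```python
SAMPLE_RATE = 16000  # Hz; webrtcvad accepts 8000, 16000, 32000 or 48000
FRAME_MS = 30        # webrtcvad accepts frames of 10, 20 or 30 ms


def frame_size_bytes(sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Number of bytes in one frame of 16-bit mono PCM."""
    return sample_rate * frame_ms // 1000 * 2


def split_frames(pcm, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Split raw PCM bytes into whole frames, dropping a trailing partial frame."""
    size = frame_size_bytes(sample_rate, frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm) - size + 1, size)]


def speech_ratio(pcm, aggressiveness=2):
    """Fraction of frames that webrtcvad classifies as speech."""
    import webrtcvad  # third-party (assumed): pip install webrtcvad

    vad = webrtcvad.Vad(aggressiveness)  # 0 (least) to 3 (most aggressive)
    frames = split_frames(pcm)
    if not frames:
        return 0.0
    voiced = sum(vad.is_speech(frame, SAMPLE_RATE) for frame in frames)
    return voiced / len(frames)
```

A caller could start transcription once `speech_ratio` for a short audio buffer exceeds a chosen threshold; the threshold itself is a tuning decision and is not specified by this README.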

System Requirements

- Windows 10/11
- Python 3.8+
- CPU with at least 6 cores
- 16GB RAM
- Recommended: NVIDIA GPU (8GB+ VRAM)

Detailed Installation Guide

1. Setting up Python Environment

Install Python 3.8 or newer from python.org

Create and activate conda environment:

```
conda create -n jarvis python=3.8
conda activate jarvis
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Note: If PyAudio fails to install, you can try installing a prebuilt wheel with pipwin instead:
```
pip install pipwin
pipwin install pyaudio
```

2. Installing Whisper

The project uses Whisper.cpp for speech recognition, which requires model files and binaries.

Whisper Model Files

The project includes the tiny.en.bin and base.en.bin models in the models directory.
If you need to download them manually:
- Visit ggerganov/whisper.cpp/releases
- Download either ggml-tiny.en.bin or ggml-base.en.bin
- Rename the file to tiny.en.bin or base.en.bin, respectively
- Place it in the models directory
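
To confirm that the model files ended up in the right place, a small check like the following may help. It is a sketch that assumes it runs from the repository root; the file names come from the steps above, while `find_models` is a hypothetical helper and not part of Jarvis.

```python
from pathlib import Path

# File names taken from the steps above; the directory is relative to the repository root.
EXPECTED_MODELS = ("tiny.en.bin", "base.en.bin")


def find_models(models_dir="models"):
    """Return the expected model files that exist in models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_MODELS if (root / name).is_file()]


if __name__ == "__main__":
    found = find_models()
    if found:
        print("Found models:", ", ".join(found))
    else:
        print("No Whisper model found in ./models; download one as described above.")
```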

Whisper.cpp Binaries

The project includes the necessary binaries in the whisper_bin directory.
If you need to install them manually:
- Visit ggerganov/whisper.cpp/releases
- Download the Windows release zip file
- Extract the contents
- Copy whisper.dll, main.exe and related files to the whisper_bin directory

3. Installing Ollama (for LLM support)

Install Ollama:
```
winget install Ollama
```
Alternatively, download the installer from Ollama's official website.
After installation, pull a compatible model:
```
ollama pull llama2
```
(You can replace llama2 with another model of your choice, such as mistral or phi)
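
To verify that Ollama is running and that a model has been pulled, you can query its local HTTP API. The sketch below assumes Ollama's default address (`http://localhost:11434`) and its `/api/tags` endpoint; the helper names are hypothetical and not part of Jarvis.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed unchanged)


def model_names(tags_payload):
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=3):
    """Return the pulled model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError):
        return None
```

A result of `None` suggests the Ollama service is not running, while an empty list suggests that no model has been pulled yet.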

4. Configure the Application

Review and modify config.yaml according to your preferences:
- Adjust audio settings
- Change voice settings
- Configure hotkeys
- Set LLM parameters

Usage Instructions

Start the assistant:
- Double-click start_jarvis.bat in the root directory
- Or run bat/jarvis_window.bat directly

The system initializes and then appears in the system tray.

System tray options:
- Status: Shows the current assistant status
- Settings: Opens the settings window
- Exit: Closes the application

Default global hotkeys:
- Ctrl + F1: Toggle listening mode
- Ctrl + F2: Switch between available voices
- Ctrl + F3: Adjust speech rate

Using the assistant:
- The system monitors your microphone for speech
- When speech is detected, it is transcribed using Whisper
- The transcription is sent to the local LLM for processing
- The LLM's response is synthesized into speech and played back
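
The step that hands the transcription to the local LLM can be sketched as follows. This is an illustration only, assuming Ollama's default address and its `/api/generate` endpoint with streaming disabled; `build_generate_request` and `ask_llm` are hypothetical names, and Jarvis itself may call Ollama differently.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama2", base_url="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llm(transcript, **kwargs):
    """Send transcribed text to the local model and return its reply text."""
    request = build_generate_request(transcript, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["response"]
```

For example, `ask_llm("What time zone is Tokyo in?", model="mistral")` would return the model's answer as a string, which could then be passed to the speech synthesis step.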

Troubleshooting

- No audio input detected: Check your microphone settings and ensure the correct device is selected
- Speech recognition issues: Try using a larger model like base.en.bin for better accuracy
- LLM not responding: Ensure Ollama is running and you've pulled a model
- TTS not working: Check your internet connection, as Edge TTS requires connectivity
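
When no audio input is detected, it can help to list the recording devices that PyAudio sees. The sketch below assumes PyAudio is installed (see the installation steps); the helper names are hypothetical and not part of Jarvis.

```python
def input_devices(device_infos):
    """Keep only devices that can record (at least one input channel)."""
    return [
        (d["index"], d["name"])
        for d in device_infos
        if d.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Query PyAudio for every audio device and return the recordable ones."""
    import pyaudio  # third-party; installed with the project dependencies

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
    return input_devices(infos)
```

If your microphone is missing from the result of `list_input_devices()`, the problem most likely lies in the Windows sound settings or drivers and not in Jarvis.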

Future Plans

1. More Human-like Conversation Experience

- Allow users to interrupt Jarvis while it is speaking
- Jarvis processes only the most recent user audio as the conversation topic
- More natural conversation flow and tone
- Context awareness and emotional understanding

2. Memory Functionality

- Users can imprint their own thought patterns on Jarvis
- Long-term memory storage and retrieval
- Personalized conversation style adaptation
- User preference learning and memory

3. MCP (Model Context Protocol)

- Allow Jarvis to operate desktop files and basic software
- Implement email viewing/replying functionality
- Schedule planning and management
- Basic agent capabilities, such as:
  - File management and organization
  - Application control
  - Calendar and reminder management
  - Email processing and response
  - Simple automation tasks

License

MIT License
