# JARVIS

<p align="center"> <img src="media/cqb_conv.png" alt="JARVIS helping me choose a firearm" width="100%"/> </p>

Your own personal voice assistant: Voice to Text to LLM to Speech, displayed in a web interface.
## How it works
- :microphone: The user speaks into the microphone
- :keyboard: Voice is converted to text using <a href="https://deepgram.com/" target="_blank">Deepgram</a>
- :robot: Text is sent to <a href="https://openai.com/" target="_blank">OpenAI</a>'s GPT-3 API to generate a response
- :loudspeaker: Response is converted to speech using <a href="https://elevenlabs.io/" target="_blank">ElevenLabs</a>
- :loud_sound: Speech is played using <a href="https://www.pygame.org/wiki/GettingStarted" target="_blank">Pygame</a>
- :computer: Conversation is displayed in a webpage using <a href="https://github.com/Avaiga/taipy" target="_blank">Taipy</a>
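The pipeline above can be sketched with direct REST calls to the three services. This is a minimal sketch under stated assumptions, not the repo's actual code: the request payloads are simplified, and `extract_transcript`, `transcribe`, `generate_reply`, and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request

def extract_transcript(deepgram_response: dict) -> str:
    """Pull the transcript out of Deepgram's pre-recorded transcription JSON."""
    return deepgram_response["results"]["channels"][0]["alternatives"][0]["transcript"]

def transcribe(audio_wav: bytes, api_key: str) -> str:
    """Voice to text: POST raw audio to Deepgram's /v1/listen endpoint."""
    req = urllib.request.Request(
        "https://api.deepgram.com/v1/listen",
        data=audio_wav,
        headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_transcript(json.load(resp))

def generate_reply(prompt: str, api_key: str) -> str:
    """Text to LLM: ask OpenAI's chat completions endpoint for a response."""
    payload = {"model": "gpt-3.5-turbo",
               "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def synthesize(text: str, api_key: str, voice_id: str) -> bytes:
    """LLM to speech: ElevenLabs returns audio bytes, which Pygame can then play."""
    req = urllib.request.Request(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        data=json.dumps({"text": text}).encode(),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```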
## Video Demo

<p align="center"> <a href="https://youtu.be/aIg4-eL9ATc" target="_blank"> <img src="media/git_thumb.png" alt="Youtube Devlog" width="50%"/> </a> </p>

## Requirements
Python 3.8 - 3.11
Make sure you have the following API keys:
- <a href="https://developers.deepgram.com/docs/authenticating" target="_blank">Deepgram</a>
- <a href="https://platform.openai.com/account/api-keys" target="_blank">OpenAI</a>
- <a href="https://elevenlabs.io/docs/api-reference/text-to-speech" target="_blank">ElevenLabs</a>
## How to install
- Clone the repository:

```bash
git clone https://github.com/AlexandreSajus/JARVIS.git
```

- Install the requirements:

```bash
pip install -r requirements.txt
```

- Create a `.env` file in the root directory and add the following variables:

```
DEEPGRAM_API_KEY=XXX...XXX
OPENAI_API_KEY=sk-XXX...XXX
ELEVENLABS_API_KEY=XXX...XXX
```
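The keys in `.env` need to be loaded into the environment at startup. The repo may rely on a package such as `python-dotenv` for this; `load_env` and `missing_keys` below are hypothetical stdlib-only equivalents that show what the loading amounts to.

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    os.environ.update(values)
    return values

# The three keys the assistant needs; fail fast if any is absent.
REQUIRED = ("DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")

def missing_keys(required=REQUIRED) -> list:
    """Return the names of required keys that are not set in the environment."""
    return [k for k in required if not os.environ.get(k)]
```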
## How to use
- Run `display.py` to start the web interface:

```bash
python display.py
```

- In another terminal, run `main.py` to start the voice assistant:

```bash
python main.py
```

- Once ready, both the web interface and the terminal will show `Listening...`. You can now speak into the microphone.
- Once you stop speaking, it will show `Stopped listening` and start processing your request.
- Once the response is ready, it will show `Speaking...`. The response will be played and displayed in the web interface.
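The listen/process/speak cycle above can be sketched as a single conversation turn. This is a hypothetical structure, not the repo's actual code: the `record`, `transcribe`, `respond`, and `speak` callables stand in for the Deepgram, OpenAI, ElevenLabs, and Pygame steps.

```python
def run_turn(record, transcribe, respond, speak, log=print):
    """One conversation turn: listen, transcribe, generate a reply, speak it.

    The four callables are injected so the real service clients can be
    swapped in; log receives the status messages shown in the terminal.
    """
    log("Listening...")
    audio = record()          # capture microphone input until silence
    log("Stopped listening")
    text = transcribe(audio)  # speech to text
    reply = respond(text)     # text to LLM response
    log("Speaking...")
    speak(reply)              # text to speech, then playback
    return text, reply
```

Injecting the steps as arguments also makes the loop easy to exercise with stubs instead of live API calls.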
Here is an example:

```
Listening...
Done listening
Finished transcribing in 1.21 seconds.
Finished generating response in 0.72 seconds.
Finished generating audio in 1.85 seconds.
Speaking...
--- USER: good morning jarvis
--- JARVIS: Good morning, Alex! How can I assist you today?
Listening...
...
```
<p align="center">
<img src="media/good_morning.png" alt="Saying good morning" width="80%"/>
</p>