Voice2Text
This project demonstrates how to use the Gemini API to build your own virtual assistant.
Author: @JD277
Virtual Assistant with Gemini API
Objective
This project demonstrates how to use the Gemini API to build your own virtual assistant. The assistant responds to a wake word, processes user input, and provides intelligent answers. This project is designed as a hands-on tutorial for developers looking to learn how to integrate the Gemini API into their own projects, with a focus on voice interaction.
Features
- Wake Word Activation: The assistant listens for the wake word "Hey ADA" before processing any input.
- Voice Recording: Once activated, the assistant records the user's voice, sends the audio input to the Gemini API, and receives a text response.
- Custom Responses: Based on the user's prompts, the assistant provides intelligent responses in both text and voice formats.
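The wake-word check can be illustrated on transcribed text. This is a minimal sketch that assumes speech has already been converted to text. `normalize` and `extract_prompt` are hypothetical helper names, not functions from app1.py. Dedicated wake-word engines usually operate on the audio stream itself.

```python
import re
from typing import Optional

# Wake phrase taken from this README. Matching operates on text, so this
# assumes the audio has already been transcribed (e.g. by speech_recognition).
WAKE_WORD = "Hey ADA"


def normalize(text: str) -> str:
    """Lowercase, turn punctuation (except apostrophes) into spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(cleaned.split())


def extract_prompt(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the words spoken after the wake word, or None if it was not said.

    An empty string means the wake word was heard on its own, so the caller
    should record a second utterance to get the actual question.
    """
    text = normalize(transcript)
    wake = normalize(wake_word)
    # Word boundaries stop near-misses such as "hey adam" from triggering.
    match = re.search(rf"\b{re.escape(wake)}\b", text)
    if match is None:
        return None
    return text[match.end():].strip()
```

For example, `extract_prompt("Hey, ADA! What's the weather?")` returns `"what's the weather"`, while `extract_prompt("hey adam, hello")` returns `None`.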
Requirements
- Python 3.x
- Gemini API access
- Libraries: pyaudio, speech_recognition
How It Works
- Wake Word Detection: The virtual assistant continuously listens for the phrase "Hey ADA". Once it hears the phrase, the assistant starts recording the user's voice.
- Processing: The recorded voice is sent to the Gemini API, which processes the user's query.
- Response: The assistant receives a text response from the API and converts it into speech for the user.
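The detect, process, respond cycle above can be sketched with the three stages passed in as plain functions, so the control flow runs without a microphone, an API key, or a speech engine. This is a minimal sketch, not the code in app1.py; `assistant_step` and `make_real_stages` are hypothetical names. The optional wiring swaps in a different technique from the one described: it transcribes locally with speech_recognition's `recognize_google` (Google Web Speech API) and sends text, not audio, to Gemini through the `google-generativeai` package. The model name is left for the caller to supply, and speech output is stubbed with `print()` because the README does not name a text-to-speech library.

```python
import os
from typing import Callable, Optional, Tuple

WAKE_WORD = "hey ada"


def assistant_step(
    listen: Callable[[], str],
    ask_gemini: Callable[[str], str],
    speak: Callable[[str], None],
    wake_word: str = WAKE_WORD,
) -> Optional[str]:
    """Run one detect -> process -> respond cycle.

    Returns the reply that was spoken, or None if the wake word was not heard.
    """
    heard = listen().lower()
    # Simple substring check; a real detector would be stricter.
    if wake_word not in heard:
        return None  # Keep waiting; nothing is sent to the API.
    # Anything said after the wake word is treated as the query.
    query = heard.split(wake_word, 1)[1].strip(" ,.!?")
    if not query:
        # Wake word alone: use the user's next utterance as the query.
        query = listen()
    reply = ask_gemini(query)
    speak(reply)
    return reply


def make_real_stages(
    model_name: str,
) -> Tuple[Callable[[], str], Callable[[str], str], Callable[[str], None]]:
    """Assumed wiring to speech_recognition and google-generativeai.

    The imports live here so assistant_step can run without these packages.
    """
    import speech_recognition as sr
    import google.generativeai as genai

    recognizer = sr.Recognizer()

    def listen() -> str:
        with sr.Microphone() as source:  # sr.Microphone requires pyaudio
            recognizer.adjust_for_ambient_noise(source)
            audio = recognizer.listen(source)
        try:
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return ""  # Unintelligible speech is treated as silence.

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)

    def ask_gemini(query: str) -> str:
        return model.generate_content(query).text

    return listen, ask_gemini, print
```

With real hardware, you would build the stages once (`stages = make_real_stages(...)` with a Gemini model name available to your key) and call `assistant_step(*stages)` in a loop, so the client is not reconfigured on every cycle.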
Installation
Clone the repository:
git clone https://github.com/your-username/your-repo-name.git
Install the required dependencies:
pip install -r requirements.txt
Set your Gemini API key as an environment variable (use export so that the Python process can see it):
export GEMINI_API_KEY='your-api-key'
Run the application:
python app1.py
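A missing or unexported key often only surfaces when the app first calls the API. Here is a minimal sketch of reading the key with Python's `os.environ` and failing early with a readable message. It assumes the app reads the `GEMINI_API_KEY` variable named above, and `get_gemini_api_key` is a hypothetical helper name.

```python
import os


def get_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if it is unset."""
    # Strip stray whitespace that often sneaks in when pasting keys.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in the shell that runs app1.py, "
            f"e.g. export {env_var}='your-api-key'"
        )
    return key
```

If the app uses the `google-generativeai` package, the result could be passed as `genai.configure(api_key=get_gemini_api_key())`.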
Files
- app1.py: Main application file that integrates wake word detection, voice recording, and communication with the Gemini API.
- grabar_audio2.py: Handles the audio recording and processing.
- on_start.py: Initializes the application and sets up necessary configurations.
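The recording step handled by grabar_audio2.py can be sketched with pyaudio for capture and the standard-library `wave` module for saving. This is a minimal sketch, not the contents of that file. The function names `record` and `save_wav` are hypothetical, and the 16 kHz, mono, 16-bit PCM format is an assumption.

```python
import wave
from typing import BinaryIO, List, Union

RATE = 16000      # samples per second (assumed)
CHANNELS = 1      # mono
SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM
CHUNK = 1024      # frames read per buffer


def record(seconds: float) -> List[bytes]:
    """Capture raw 16-bit PCM audio from the default input device."""
    import pyaudio  # imported here so save_wav works without pyaudio installed

    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)
    try:
        n_reads = int(RATE / CHUNK * seconds)
        return [stream.read(CHUNK) for _ in range(n_reads)]
    finally:
        # Always release the audio device, even if a read fails.
        stream.stop_stream()
        stream.close()
        p.terminate()


def save_wav(frames: List[bytes], dest: Union[str, BinaryIO]) -> None:
    """Write captured frames to a WAV file path or binary file object."""
    with wave.open(dest, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))
```

Typical use would be `save_wav(record(5), "input.wav")`, after which the WAV file can be transcribed or uploaded.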
Contribution
Feel free to contribute by opening issues or submitting pull requests.
