Voice2Text

Virtual Assistant with Gemini API

Objective

This project demonstrates how to use the Gemini API to build your own virtual assistant. The assistant responds to a wake word, processes user input, and provides intelligent answers. This project is designed as a hands-on tutorial for developers looking to learn how to integrate the Gemini API into their own projects, with a focus on voice interaction.

Features

Wake Word Activation: The assistant listens for the wake word "Hey ADA" before processing any input.
Voice Recording: Once activated, the assistant records the user's voice, sends the audio input to the Gemini API, and receives a text response.
Custom Responses: Based on user prompts, the assistant provides intelligent responses in both text and voice formats.
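
As an illustration of the wake word feature, here is a minimal sketch of how a transcribed utterance could be checked for "Hey ADA". It assumes the audio has already been turned into text by some speech-to-text step. The function name extract_command and the matching rules (case-insensitive, punctuation ignored, whole words only) are assumptions for illustration and are not taken from app1.py.

```python
import re

WAKE_WORD = "hey ada"  # phrase from the README; the matching rules below are assumptions


def _normalize(text: str) -> str:
    """Lowercase, replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def extract_command(transcript: str, wake_word: str = WAKE_WORD):
    """Return the normalized text spoken after the wake word, or None if it is absent.

    An empty string means the wake word was heard with nothing after it,
    so the caller would typically start recording a fresh utterance.
    """
    words = _normalize(transcript).split()
    wake = _normalize(wake_word).split()
    # Compare whole words so that "they adapt" does not trigger on the
    # substring "hey ada".
    for i in range(len(words) - len(wake) + 1):
        if words[i:i + len(wake)] == wake:
            return " ".join(words[i + len(wake):])
    return None
```

Matching on whole words rather than on a raw substring avoids false activations from phrases that merely contain the same letters.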

Requirements

Python 3.x
Gemini API Access
Libraries: pyaudio, speech_recognition
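
Before running the assistant, it can help to confirm that the requirements above are met. The sketch below uses only the standard library. The helper names missing_modules and python_is_supported are hypothetical and are not part of this repository.

```python
import importlib.util
import sys

# Import names of the libraries listed in the requirements.
REQUIRED_MODULES = ("pyaudio", "speech_recognition")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """The README asks for Python 3.x, so only the major version is checked."""
    return version_info[0] == 3
```

A startup script could call missing_modules() and print the result to tell the user which packages still need to be installed.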

How It Works

Wake Word Detection: The virtual assistant continuously listens for the phrase "Hey ADA". Once heard, the assistant starts recording the user's voice.
Processing: The recorded voice is sent to the Gemini API, where the user's query is processed.
Response: The assistant receives a text response from the API, which is then converted to a voice response for the user.
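
The three stages above can be sketched as a single loop. This is a simplified outline under stated assumptions, not the code in app1.py: the callables record_audio, ask_model, and speak are hypothetical stand-ins for the microphone capture, the Gemini API request, and the text-to-speech step, and the wake word check is a plain case-insensitive substring test on already transcribed text.

```python
from typing import Callable, Iterable, List


def run_assistant(
    transcripts: Iterable[str],
    record_audio: Callable[[], bytes],
    ask_model: Callable[[bytes], str],
    speak: Callable[[str], None],
    wake_word: str = "hey ada",
) -> List[str]:
    """Drive the wake word -> record -> query -> respond cycle.

    `transcripts` yields what the assistant hears while idle. All three
    callables are hypothetical placeholders supplied by the caller.
    """
    answers = []
    for heard in transcripts:
        if wake_word not in heard.lower():
            continue  # no wake word yet: keep listening
        audio = record_audio()  # stage 1: record the user's voice
        if not audio:
            continue  # nothing was captured: go back to listening
        answer = ask_model(audio)  # stage 2: send the recording to the API
        speak(answer)  # stage 3: voice the text response
        answers.append(answer)
    return answers
```

Passing the three stages in as functions keeps the loop testable without a microphone or network access. For example, calling it with transcripts=["background noise", "Hey ADA"] and simple lambda stand-ins triggers exactly one recording and one response.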

Installation

Clone the repository:

git clone https://github.com/your-username/your-repo-name.git

Install the required dependencies:

pip install -r requirements.txt

Set up your Gemini API key as an environment variable (shown here for a Linux or macOS shell):

export GEMINI_API_KEY='your-api-key'

Run the application:

python app1.py
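
The GEMINI_API_KEY variable set during installation has to be read by the application at startup. The README does not show how app1.py does this, so the following is only a sketch of one common approach. The function name load_api_key and its error message are assumptions.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "GEMINI_API_KEY",
) -> str:
    """Read the API key from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; define it before running the app")
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error returned later by the API.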

Files

app1.py: Main application file that integrates wake word detection, voice recording, and communication with the Gemini API.
grabar_audio2.py: Handles the audio recording and processing.
on_start.py: Initializes the application and sets up necessary configurations.

Contribution

Feel free to contribute by opening issues or submitting pull requests.
