Pipecatemma
AI Home Care Voice Assistant - Emma, your compassionate voice companion for elderly care using Pipecat and OpenAI Realtime API
Install / Use
/learn @ArkMaster123/PipecatemmaREADME
🎙️ Emma AI Voice Assistant
A cutting-edge speech-to-speech AI assistant powered by OpenAI's Realtime API with stunning real-time shader visualizations that respond to voice activity.
✨ Features
🗣️ Real-Time Voice Interaction
- Speech-to-Speech: Direct voice conversation with Emma AI
- Low Latency: WebRTC connection for minimal delay
- Voice Activity Detection: Automatic speech detection with server VAD
- Natural Conversations: Handles interruptions and maintains context
🎨 Stunning Visual Experience
- Real-Time Shader Animations: Beautiful WebGL shaders that respond to voice
- Voice-Reactive Visuals: Animations change based on:
- Your voice volume and frequency
- Emma's response activity
- Connection status
- Smooth Transitions: Fluid animations with exponential smoothing
- Dynamic Intensity: Visual feedback adapts to conversation flow
🔧 Advanced Audio Processing
- Frequency Analysis: Real-time FFT analysis of audio streams
- Voice Detection: Smart algorithms to detect human speech patterns
- Dual Audio Monitoring: Analyzes both input (your voice) and output (Emma's voice)
- Audio Quality Controls: Echo cancellation, noise suppression, auto-gain
🛡️ Enterprise-Grade Security
- Ephemeral Tokens: Secure session management with auto-expiring keys
- Server-Side API Keys: Your OpenAI keys never leave the server
- Robust Error Handling: Comprehensive error recovery and user feedback
🚀 Getting Started
Prerequisites
- Node.js 18+
- OpenAI API key with Realtime API access
- Modern web browser with WebRTC support
Installation
-
Clone the repository
git clone <repository-url> cd emma-voice-assistant -
Install dependencies
npm install -
Set up environment variables
cp .env.local.example .env.localAdd your OpenAI API key to
.env.local:OPENAI_API_KEY=your_openai_api_key_here -
Run the development server
npm run dev -
Open your browser Navigate to
http://localhost:3000/emma-advanced
🎮 How to Use
🔌 Connect to Emma
- Click the "Connect" button
- Allow microphone permissions when prompted
- Wait for the connection status to show "Connected"
🎤 Start Talking
- Emma automatically listens when connected
- The shader box will animate based on your voice
- Green pulsing ring indicates you're speaking
- Purple pulsing ring shows Emma is responding
🎛️ Controls
- 🎤 Mic Button: Toggle microphone on/off
- 🔊 Volume Button: Mute/unmute Emma's responses
- 📞 Disconnect Button: End the session
📝 View Transcript
- Real-time conversation transcript appears below
- Shows both your words and Emma's responses
- Timestamps for each interaction
🏗️ Architecture
🔄 Connection Flow
Frontend → Backend API → OpenAI Realtime API
↓ ↓ ↓
WebRTC ← Ephemeral Token ← Session Creation
🎵 Audio Processing Pipeline
Microphone → Web Audio API → Frequency Analysis → Shader Animation
↓
WebRTC Stream → OpenAI Realtime API
↓
Speaker ← Audio Element ← WebRTC Response ← Emma's Voice
↓
Shader Animation ← Output Analysis ← Audio Monitoring
📁 Project Structure
src/
├── app/
│ ├── api/emma/realtime/ # Backend API endpoints
│ │ ├── session/ # Session management
│ │ ├── connect/ # WebRTC connection
│ │ └── disconnect/ # Session cleanup
│ └── emma-advanced/ # Main voice interface
├── components/
│ ├── ui/ # UI components
│ └── ShaderBackground.tsx # WebGL shader renderer
├── types/ # TypeScript definitions
└── hooks/ # Custom React hooks
🔧 Technical Details
🌐 WebRTC Implementation
- Direct peer connection to OpenAI's Realtime API
- Automatic ICE candidate handling
- Data channels for event communication
- Media stream management for audio I/O
🎨 Shader System
- WebGL-based real-time rendering
- Responds to audio frequency data
- Multiple animation states and transitions
- Optimized for 60fps performance
📊 Audio Analysis
- FFT Size: 256 bins for frequency analysis
- Smoothing: 0.8 time constant for stable visuals
- Voice Detection: Frequency range analysis (85Hz-255Hz)
- Real-time Processing: 60fps analysis loop
🔐 Security Features
- Ephemeral API tokens (1-minute expiration)
- Server-side API key management
- Input validation and sanitization
- Comprehensive error handling
🎯 API Endpoints
POST /api/emma/realtime/session
Creates a new voice session with ephemeral token
- Input: Voice settings (voice, instructions, temperature)
- Output: Session ID, expiration, client secret
POST /api/emma/realtime/connect
Establishes WebRTC connection (fallback endpoint)
- Input: Session ID, SDP offer
- Output: SDP answer
POST /api/emma/realtime/disconnect
Cleanly terminates voice session
- Input: Session ID
- Output: Success confirmation
🛠️ Configuration
🎵 Voice Settings
{
voice: 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer',
instructions: string,
temperature: 0.0 - 2.0
}
🎨 Shader Parameters
{
audioLevel: 0.0 - 1.0, // Overall volume
voiceActivity: 0.0 - 1.0, // Voice-specific activity
botSpeaking: boolean, // Emma response state
intensity: 0.3 - 1.5, // Animation intensity
speed: 0.3 - 2.0, // Animation speed
complexity: 0.0 - 1.0 // Visual complexity
}
🐛 Troubleshooting
Common Issues
"Invalid or missing API key"
- Ensure your OpenAI API key is set in
.env.local - Verify your key has Realtime API access
"Microphone permission denied"
- Allow microphone access in your browser
- Check browser security settings
"WebRTC connection failed"
- Ensure stable internet connection
- Try refreshing the page
- Check browser WebRTC support
"Session creation failed"
- Verify OpenAI API key validity
- Check OpenAI service status
- Review server console logs
🔮 Future Enhancements
- [ ] Multiple voice profiles
- [ ] Conversation history persistence
- [ ] Custom shader themes
- [ ] Mobile app support
- [ ] Multi-language support
- [ ] Voice training capabilities
🤝 Contributing
We welcome contributions! Please see our contributing guidelines for details on:
- Code style and standards
- Pull request process
- Issue reporting
- Feature requests
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- OpenAI for the incredible Realtime API
- WebRTC community for real-time communication standards
- Three.js community for WebGL inspiration
- React and Next.js teams for the amazing frameworks
Made with 💖 by Vibe Coder
Experience the future of voice AI interaction with Emma - where technology meets artistry in perfect harmony.
Related Skills
node-connect
325.9kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
80.3kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
325.9kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
80.3kCommit, push, and open a PR
