Tankwork
Desktop agent framework for creating AI agents that can see and control your computer through voice and text commands
Install / Use
/learn @AgentTankOS/TankworkREADME
TankWork
Overview
TankWork is an open-source desktop agent framework that enables AI to perceive and control your computer through computer vision and system-level interactions. Agents can:
- Control your computer directly through voice or text commands
- Process real-time screen content using computer vision and expert skill routing
- Interact through natural language voice commands and text input
- Provide continuous audio-visual feedback and action logging
- Switch seamlessly between assistant and computer control modes
Built for developers and researchers working on autonomous desktop agents, TankWork combines advanced computer vision, voice processing, and system control to create AI agents that can truly understand, analyze, and interact with computer interfaces.
Key Features
- 🎯 Direct Computer Control - Voice and text command execution
- 🔍 Computer Vision Analysis - Real-time screen processing
- 🗣️ Voice Interaction - Natural language with ElevenLabs
- 🤖 Customizable Agents - Configurable personalities and skills
- 📊 Real-time Feedback - Audio-visual updates and logging
System Requirements
- Recommended Platform: macOS with Apple Silicon (M1, M2, M3, M4) for optimal computer-use capabilities
- Python Version: 3.12 or higher
- Windows Support: Coming soon
- Display Settings: Computer-use is more accurate with a clean desktop
Quick Installation
1. Prerequisites
- Install Anaconda here (recommended for dependency management)
- Terminal/Command Prompt access
2. Clone Repository
# Clone repository
git clone https://github.com/AgentTankOS/tankwork.git
cd tankwork
3. Install Dependencies
# Install required packages
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
4. Configure Environment
Create a .env file in the project root:
# Copy example environment file
cp .env.example .env
Add your API keys and settings to .env:
# Required API Keys
GEMINI_API_KEY=your_api_key
OPENAI_API_KEY=your_api_key
ELEVENLABS_API_KEY=your_api_key
ANTHROPIC_API_KEY=your_api_key
# Voice Settings
ELEVENLABS_MODEL=eleven_flash_v2_5
# Computer Use Settings
COMPUTER_USE_IMPLEMENTATION=tank
COMPUTER_USE_MODEL=claude-3-5-sonnet-20241022
COMPUTER_USE_MODEL_PROVIDER=anthropic
# Narrative Processor
NARRATIVE_LOGGER_NAME=ComputerUse.Tank
NARRATIVE_MODEL=gpt-4o
NARRATIVE_TEMPERATURE=0.6
NARRATIVE_MAX_TOKENS=250
# Logging
LOG_LEVEL=INFO
5. Launch Application
python main.py
Features
Computer Use Mode
- Command-based computer control through text input or voice commands
- Advanced voice intent recognition for natural command interpretation
- Executes direct computer operations based on user commands
- Real-time voice narration of command execution
- Live action logging with visual status updates
- Continuous feedback through both audio and text channels

Assistant Mode
- Trigger via "Select Region" or "Full Screen" buttons, or voice commands
- Features voice intent determination system
- Real-time screen/vision analysis with expert skill routing
- Default Skill: Ticker Analysis
- Provides intelligent observation and advice based on screen content
- Live voice narration of analysis results
- Dynamic text logging of observations and insights

Voice Command System
- Voice intent determination for both Assistant and Computer Use modes
- Natural language processing for command interpretation
- Seamless switching between modes using voice commands
- Voice-activated ticker analysis and computer control
- Real-time audio feedback and confirmation
Example Commands:
-
Assistant Mode (triggers automatic screenshot + skill like Ticker Analysis):
- "What do you think about this token?"
- "Should I buy this token?"
- "Is this a good entry point?"
-
Computer Use Mode (triggers direct actions):
- "Go to Amazon"
- "Open my email"
- "Search for flights to Paris"
Real-Time Feedback System
- Live voice narration of all agent actions and analyses
- Dynamic text action logging with visual feedback
- Continuous status updates and command confirmation
- Immersive audio-visual user experience
Agent Configuration
Pre-configured Agents
TankWork comes with four pre-configured agents, each with distinct personalities and specializations. You can add new agents and customize all agents.
1. Gennifer
- Role: Lead Crypto Analyst
- Voice ID: 21m00Tcm4TlvDq8ikWAM
- Theme Color: #ff4a4a
- Specialization: Fundamental crypto metrics, community analysis
- Analysis Style: Focuses on sustainable growth patterns and risk management
- Tone: Clear, educational, encouraging
2. Twain
- Role: Narrative Specialist
- Voice ID: g5CIjZEefAph4nQFvHAz
- Theme Color: #33B261
- Specialization: Content creation and storytelling
- Analysis Style: Evaluates narrative structure and engagement
- Tone: Engaging, story-focused, balanced
3. Cody
- Role: Technical Web3 Architect
- Voice ID: cjVigY5qzO86Huf0OWal
- Theme Color: #4a90ff
- Specialization: Blockchain development and architecture
- Analysis Style: Technical implementation and security analysis
- Tone: Technical but approachable, systematic
4. Art
- Role: Creative AI Specialist
- Voice ID: bIHbv24MWmeRgasZH58o
- Theme Color: #F7D620
- Specialization: Digital art and design innovation
- Analysis Style: Aesthetic quality and creative innovation
- Tone: Imaginative and expressive
Agent Customization
New agents can be added and all agents can be fully customized through the configuration system:
AVATAR_CONFIG = {
"agent_id": {
"name": str,
"image_path": str, # Path to static avatar image
"video_path": str, # Path to avatar video animation
"voice_id": str, # ElevenLabs voice ID
"accent_color": str, # Hex color code for UI theming
"prompts": {
"personality": str, # Core personality traits
"analysis": str, # Analysis approach and focus
"narrative": str # Communication style and tone
},
"skills": List[str] # Available skill sets
}
}
Customizable Elements
-
Visual Identity
- Static avatar image
- Animated video avatar
- UI accent color scheme
-
Voice Configuration
- ElevenLabs voice ID selection
- Voice model parameters
-
Behavioral Settings
- Personality prompt templates
- Analysis frameworks
- Narrative style guidelines
-
Skill Configuration
- Assignable skill sets
- Analysis parameters
- Specialization focus
Contributing
Contributions are welcome! Please read our Contributing Guidelines for details on how to submit pull requests, report issues, and contribute to the project.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Related Skills
node-connect
352.9kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
111.5kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
352.9kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
352.9kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
