Dexter, A Voice-Controlled Assistant

Dexter is a voice-controlled assistant, akin to Google Home, Siri and Alexa.

Dexter's your right hand!

Table of Contents

Quick start
Prerequisites
Configuration
Running
Components
- Inputs
- Outputs
- Services
- Notifiers
Notes
Related work
Project status
Press
Bugs

Quick start

If you want to get up and running quickly:

Do a git clone of the repo: git clone https://github.com/iamsrp/dexter
Install the prerequisites: bash dexter/requirements
Make sure you have a microphone and speaker
Try running dexter.py with the appropriate config file for your distro (e.g. env -u DISPLAY TERM=dumb ./dexter.py -c pi_config)
Possibly wait for a little bit while some of the input and output components download their models for the first time

Running bash ./requirements will install most of what Dexter needs to function. Running env ALL=true bash ./requirements will install everything. Be warned that the requirements script installs a lot of stuff; having a (cheap) dedicated machine for running Dexter is probably wise.

The example_config file has a decent overview of the various services, and it's recommended that you crib from that. More documentation on the different components can be found in their modules.

Prerequisites

Linux:
- Raspberry Pi OS, on a Raspberry Pi
- Ubuntu x86_64
Python 3.
Around 1GB to 2GB of free disk space (if you want to use Whisper, Coqui or Vosk with a good model).
Most of what is listed in the requirements file. What you actually need will depend on what components you add.

Some of the components need extra packages installed to make them work (e.g. Spotify needs various magic); this is generally documented in the module's PyDoc.

Speech Recognition Notes

The most accurate Speech-to-Text (STT) engine is OpenAI's Whisper, but it's also the heaviest and requires mildly decent hardware in order to run at a reasonable speed. You can also run an STT engine on a remote host (e.g. using the whisper_server.py script) and use the RemoteInput class to offload the work to it.

If you want to use the Vosk STT engine then the Usage examples section on its install page should be enough to tell you how to install it. The various models are on its models page, though you will need the 64-bit version of Raspberry Pi OS if you want to load in the full model, since it needs about 6GB to instantiate. (See the Vosk install page for info on getting the 64-bit whl file.) Per the Vosk developers, you can remove the rescore and rnnlm folders from the models to make the full model run if memory is limited.

For Coqui you'll need the trained models and scorer from their site. For more information on setting up Coqui read their project page and documentation.

When it comes to audio input, make sure that you have a decent microphone with no noise; try listening to some arecord output to make sure it sounds clear. You can also provide a wav_dir argument for all of the audio input components, like dexter.input.openai_whisper.WhisperInput.

Architecture Specifics

Dexter has been developed using a 4GB Raspberry Pi 4 and an x86_64 Ubuntu machine. It's also been tested on various other hardware.

If you're running Dexter on a Raspberry Pi or the StarFive VisionFive2 then make sure that ALSA is working by testing aplay and arecord, tweaking volume and recording levels with alsamixer. If it is not working then you may well get strange errors from pyaudio. You might also want a /home/pi/.asoundrc file; see the dot.asoundrc.* examples in the top-level directory.

You can see the different hardware devices in alsamixer, via F6. You might also need to set the Audio Output in the System settings in raspi-config to your preference. On the StarFive board the built-in audio-out doesn't currently seem to work (for me) and using a USB audio adapter seems wonky too.

For input, Coqui supports TensorFlow Lite and it does a pretty decent job of recognition in near realtime. Vosk is a more recent speech-to-text engine which also seems to work well.

OpenAI's Whisper requires PyTorch to run and, on the Pi, this is only available on the 64-bit OS. However, it's not very fast and takes about 4x realtime to decode audio on a Pi 4; using a remote server for it is recommended.

For the StarFive VisionFive2 there's only really early Ubuntu support and most third-party libraries don't have RISC-V versions as yet. PocketSphinx seems to work as an STT engine but is very slow (10x realtime). So you probably want to run Whisper on a remote server just like you would for the Pi.

The Pi Zero 2 will also run Dexter but has, of course, just 512MB of RAM. So, once again, offloading the STT engine to a server is recommended. Apart from that it looks to be fine.

The Pi also has some really great, and cheap, HATs which can be used for user feedback on status (see below). The current code supports a couple of these but adding support for new ones is pretty easy, should you be feeling keen.

Hardware

When using a Raspberry Pi 4 to drive Dexter I've found the following work for me:

Samson Go Mic Portable USB Condenser Microphone
One of these is useful to tell you what it's thinking (and the Mini HATs have buttons, which can be handy):
Any old speaker!

Configuration

The configuration for Dexter is done via a JSON5 file; sorry, but it was an easy way to do it. Unlike vanilla JSON, JSON5 supports comments so the example configuration files have some associated annotation.

The file is expected to have three main dict entries: key_phrases, notifiers and components. The key_phrases are a list of strings which Dexter will listen for, in order to spot a command. For example, with the key phrase "Dexter" you might say "Dexter, what's the time?"

The notifiers are ways in which Dexter lets the user know what it's currently doing (e.g. actively listening, speaking back to you, etc.).

The components section should be a dict with the following entries: inputs, outputs and services; each of these is a list of component definitions. Each component definition is a [string,dict] pair. The string is the fully-qualified name of the component class; the dict holds the keyword arguments (kwargs; variable name & value pairs) which will be used to instantiate the class. The values in the kwargs may contain environment variables enclosed by dollar-sign-denoted curly braces, of the form ${VARIABLE_NAME}.
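As a rough illustration of how a [string,dict] component definition could be turned into an object, here is a minimal sketch. The helper names expand_env and instantiate are hypothetical and are not taken from Dexter's code; the real loader may well differ, for instance in how it treats unset variables. A standard library class stands in for a Dexter component.

```python
import importlib
import os
import re

# Matches ${VARIABLE_NAME}. The helpers below are hypothetical illustrations,
# not Dexter's actual loader.
_ENV_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} in strings, using env (default: os.environ).

    Assumption: unknown variables are left untouched.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        return _ENV_VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    return value


def instantiate(definition, env=None):
    """Build an object from a [fully_qualified_class_name, kwargs] pair."""
    class_name, kwargs = definition
    module_name, _, attr = class_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr)
    return cls(**expand_env(kwargs, env))


# types.SimpleNamespace stands in for a real component class here.
component = instantiate(
    ["types.SimpleNamespace", {"model": "${MODEL_DIR}/small"}],
    env={"MODEL_DIR": "/opt/models"},
)
print(component.model)
```

Passing env explicitly, as above, keeps the sketch easy to try out without touching the real environment.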

The test_config file is a simple example, the platform-specific ones are more fleshed out, and example_config is fully annotated.

Running

You can run the client like this:

cd the_checkout_directory
nohup ./dexter.py -c test_config > dexter.log 2>&1 &

(If that crashes because the DISPLAY isn't accessible (thanks pygame) then add env -u DISPLAY TERM=dumb at the start. The TERM=dumb is needed since more recent versions of pygame seem to do nasty things with curses which totally borks the terminal. Hence you need to set the terminal to a dumb one, or pipe the output to a file like in the above, or both. Yuck.)

You can then stop it with a CTRL-c or by sending it a SIGINT.

Technical Details

Overview

The system has the following main parts:

Notifiers: These communicate state to the outside world
Components: These are the active parts of the system
- Inputs: Get requests in
- Outputs: Communicate responses back out
- Services: Perform requested tasks

The system has an event loop which listens for requests using the inputs and, when one comes in, it sends the request to each of the services.

Each service will determine whether it thinks it can handle the request and, if so, creates a Handler instance to do so. The service then hands this handler back to the system. It also includes a belief value denoting how sure it was that the request was for it, and whether any handling should be exclusive.

The system then sorts the returned handlers according to belief and invokes the first one. If that handler was exclusive then it stops, otherwise it invokes the next, and so on.
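The sort-and-invoke step described above can be sketched as follows. This is only an illustration under assumptions: the Handler dataclass, its field names, and the dispatch function are made up for the example and do not mirror Dexter's actual classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Handler:
    """Hypothetical stand-in for a handler returned by a service."""
    name: str
    belief: float               # how sure the service was that the request was for it
    exclusive: bool             # if True, no later handlers should run
    action: Callable[[], str]   # the work to perform


def dispatch(handlers: List[Handler]) -> List[str]:
    """Invoke handlers in descending belief order, stopping after an exclusive one."""
    results = []
    for handler in sorted(handlers, key=lambda h: h.belief, reverse=True):
        results.append(handler.action())
        if handler.exclusive:
            break
    return results


handlers = [
    Handler("echo", 0.2, False, lambda: "echo"),
    Handler("clock", 0.9, False, lambda: "clock"),
    Handler("music", 0.5, True, lambda: "music"),
]
print(dispatch(handlers))
```

Here the clock handler runs first, then the exclusive music handler, and the echo handler is never invoked.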

Services can also register timer events with the event loop. This is handy for, say, setting alarms to ring at certain times. When active, services can also inform the system of their state (e.g. whether they're handling input, processing a request, performing an action, or outputting a response). The notifiers can use these status updates to inform the user of what's going on.
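To give a feel for timer events, here is a small sketch using Python's standard sched module as a stand-in for the event loop. This is an assumption made purely for illustration: Dexter's event loop has its own timer mechanism, and the run_timers function and its labels are hypothetical.

```python
import sched
import time


def run_timers(timers):
    """Fire each (delay_seconds, label) timer and return labels in firing order."""
    fired = []
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    for delay, label in timers:
        # Each event appends its label to the list when it comes due.
        scheduler.enter(delay, 1, fired.append, argument=(label,))
    scheduler.run()  # blocks until every scheduled event has fired
    return fired


print(run_timers([(0.02, "tea is ready"), (0.01, "alarm")]))
```

A real service would hand a callback to the event loop rather than block on a scheduler, so treat this only as a picture of "do something at a later time".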

And that's pretty much it. Mostly, if you want to add a service then it's probably easiest to take an existing one (e.g. the EchoService) and use it as a template. Yes, it's cargo cult programming but, at the end of the day, if it works then...

Notifiers

The Notifiers are how Dexter tells the user what it's doing. For example, if it has started listening or if it's querying an outside service, then it uses the notifiers to say so.

There are these types right now:

A simple logging notifier, which writes to the console.
Ones for the [Pimoroni Unicorn HAT HD](https://shop.pimoroni.com/pr
