MSCognitiveSpeechForVoiceAttack

Enables use of the Microsoft Cognitive text-to-speech service with Voice Attack


Microsoft Cognitive Speech plug-in for Voice Attack

Enables more advanced text-to-speech capabilities in Voice Attack profiles. Currently, Voice Attack only supports the voices that ship with Windows, which aren't the greatest. Microsoft offers better ones online, especially the more human-sounding "neural" voices.

Try out examples of the Cognitive Service voices to see what you think.

If you enjoy the plug-in, feel free to buy me coffee :)

Features

Use any of the voices included in Cognitive Services (including the more human-sounding "neural" voices)
Native support for international languages (provided by Microsoft Cognitive Services)
(Optional) Add a "Radio" effect so spoken text sounds like it's coming over a radio (ideal for plane/combat type simulations)
(Optional) Ability to use SSML for fine control of how the speech is synthesized
(Optional) Ability to cache synthesized audio outputs locally to avoid additional service calls
Automatic cleanup of cached files (configurable expiration period)
Ability to override config options by using Voice Attack variables (e.g. MSCognitiveTextToSpeech.DefaultVoiceName)

Dependencies

Azure Speech service subscription (free tier will work fine, especially with caching turned on)
Voice Attack (free or paid)

Who is this for?

Mainly gamers and flight sim players who want better speech capabilities for their Voice Attack commands. I use it in Microsoft Flight Simulator 2020 to simulate having a co-pilot, but you can use it for anything where you want a command to say something.

Inspiration and Kudos

Other people and work that inspired and helped me accomplish this project:

Virtual Voyager : Immersive Aviation for MSFS 2020
Mark Heath and contributors to NAudio
MRacko for MSFS Mobile Companion App

Setup

YouTube

This video goes over the install and setup steps below: https://www.youtube.com/watch?v=DvJ8FcthEO8

Setup

Create a free Azure user account if you don't have one
Set up a Speech Service subscription in your account (free tier)
Get the Subscription Key and Region for your Speech subscription
Download the latest release of the plug-in (zip file)
Open/extract the zip file and copy the MSCognitiveTextToSpeech folder to your Voice Attack "apps" folder (e.g. Voice Attack/apps/MSCognitiveTextToSpeech/)
Edit the settings in the MSCognitiveTextToSpeech.dll.config file
- Modify AzureSubscriptionKey and AzureRegion settings
- Review and modify any of the other settings if you don't like the defaults
In Voice Attack, go to Options (wrench icon) > General and ensure "Enable Plug-in Support" is enabled
Run a test:
- Add a command that calls the plug-in (under Other > Advanced > Execute an External Plugin Function)
- Set the "Context" field to the text you want to be spoken (it also supports tokens)
- Ensure the "Wait for the plug-in function to finish before continuing" is enabled
- Save the command then execute it. You should hear the text spoken out loud in the voice you selected if it's working. If not, see the Troubleshooting section below.

Note: the plug-in is pre-configured to save (cache) the generated audio files in the Voice Attack/Sounds/MSCognitiveTextToSpeech folder. You can disable this or change the location in the "Voice Attack/apps/MSCognitiveTextToSpeech/MSCognitiveTextToSpeech.dll.config" file.

FAQ

Is this free? Are there any costs?

The plug-in is completely free to use.
Voice Attack has a free and a paid option. The paid version is typically only $10 USD.
Cognitive Services supports 5 million characters for standard voices and 0.5 million characters for neural voices in the free tier. This should be plenty for most people, especially with the way the plug-in caches speech for phrases it's already processed. [More details can be found here](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/)

Are the free allowances enough?

A typical gaming day for me uses about 200 phrases. If we assume each phrase is 100 characters (most are much shorter), that's 20,000 characters per gaming day. If we gamed 20 days in the month, that would be 400,000 characters total, which is still under the 0.5 million free characters for neural voices. With caching enabled, the number of service calls will be even lower.
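The estimate above can be reproduced with a few lines of Python. This is just the arithmetic from the paragraph; the constant and function names are made up for illustration, and the free-tier limits are the ones quoted in this FAQ.

```python
# Figures taken from the FAQ text above; names are illustrative only.
PHRASES_PER_DAY = 200
CHARS_PER_PHRASE = 100          # deliberately pessimistic, most phrases are shorter
GAMING_DAYS_PER_MONTH = 20
FREE_NEURAL_CHARS = 500_000     # 0.5 million characters (neural voices)
FREE_STANDARD_CHARS = 5_000_000 # 5 million characters (standard voices)


def monthly_characters(phrases_per_day, chars_per_phrase, days):
    """Worst-case characters sent to the service when nothing is cached."""
    return phrases_per_day * chars_per_phrase * days


usage = monthly_characters(PHRASES_PER_DAY, CHARS_PER_PHRASE, GAMING_DAYS_PER_MONTH)
fits_neural = usage <= FREE_NEURAL_CHARS
fits_standard = usage <= FREE_STANDARD_CHARS
print(usage, fits_neural, fits_standard)
```

Because cached phrases are not sent again, the real number is likely lower than this worst case.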

Are you tracking usage or anything else with the plug-in?

No. Anything you run through the plug-in is only seen by Cognitive Services (for the text-to-speech operation), and that is within the bounds of your own account.

Are you making money from this?

Nope, nada, but if you enjoy it, feel free to buy me coffee :)

What is the caching based on?

It's based on the SSML XML message sent to the speech service. When you change what's in the Context field for the call to the plug-in, OR the Voice Name or Voice Language fields in the config, a new WAV file will get saved/cached.
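One way to picture this is a cache keyed on a hash of the SSML text. The sketch below is only an illustration in Python: the SHA-256 file naming, the `cache_file_name` helper, and the sample voice names are assumptions, not the plug-in's actual scheme.

```python
import hashlib


def cache_file_name(ssml: str) -> str:
    """Derive a file name from the full SSML message (hypothetical scheme)."""
    digest = hashlib.sha256(ssml.encode("utf-8")).hexdigest()
    return digest + ".wav"


# Same text, different voice: the SSML differs, so the cache entry differs too.
ssml_a = '<speak version="1.0" xml:lang="en-US"><voice name="en-US-JennyNeural">Gear down</voice></speak>'
ssml_b = '<speak version="1.0" xml:lang="en-US"><voice name="en-US-GuyNeural">Gear down</voice></speak>'

name_a = cache_file_name(ssml_a)
name_a_again = cache_file_name(ssml_a)
name_b = cache_file_name(ssml_b)
print(name_a == name_a_again, name_a == name_b)
```

Keying on the whole message means any change to the text, voice, or language naturally produces a new file, while repeated phrases reuse the existing one.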

How is the cache managed?

Each time Voice Attack starts, the plug-in loads and deletes any files in the cache folder that haven't been accessed in the last 30 days (default). You can change the number of days in the config file.
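A cleanup pass like this could look roughly as follows. This is a Python sketch under assumptions: the `purge_stale_files` name, the `*.wav` filter, and the use of the file's last-access timestamp are illustrative guesses, not the plug-in's code.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def purge_stale_files(cache_dir, max_age_days=30, now=None):
    """Delete cached .wav files whose last access is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in Path(cache_dir).glob("*.wav"):
        if path.stat().st_atime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Small demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp) / "fresh.wav"
    stale = Path(tmp) / "stale.wav"
    fresh.write_bytes(b"")
    stale.write_bytes(b"")
    # Pretend the stale file was last touched 40 days ago.
    forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
    os.utime(stale, (forty_days_ago, forty_days_ago))

    removed = purge_stale_files(tmp, max_age_days=30)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```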

What is SSML? How or why should I use it?

It's completely optional and really only needed if you want finer control over how the speech sounds. For example, some voices support emotion, and you can add pauses, inflections, improve pronunciation, etc.

Microsoft documentation on the SSML syntax and capabilities
Microsoft's online Speech generation tools: Simple Editor | Advanced Editor

Note: the plug-in generates the <speak> and <voice> tags. Anything you put in the "Context" field on the call to the plug-in will be placed inside the <voice> tag.
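To make the wrapping concrete, here is a rough Python sketch of how Context text could end up inside the outer tags (the SSML root element is `<speak>`). The `wrap_in_ssml` helper and the sample voice name and language are assumptions for illustration; the exact attributes the plug-in emits may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def wrap_in_ssml(context, voice_name, language):
    """Place the Context text (plain text or SSML markup) inside <voice>."""
    # The context is inserted as-is so that SSML tags in it keep working.
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(language)}>'
        f"<voice name={quoteattr(voice_name)}>{context}</voice>"
        "</speak>"
    )


# A Context value that uses an SSML pause between two phrases.
context = 'Flaps set.<break time="500ms"/>Ready for takeoff.'
ssml = wrap_in_ssml(context, "en-US-JennyNeural", "en-US")

# Parse the result to confirm it is well-formed XML.
root = ET.fromstring(ssml)
voice = root.find(f"{{{SSML_NS}}}voice")
print(ssml)
```

In practice this means the Context field only needs the inner content, such as plain text or tags like `<break>`, and never its own `<speak>` or `<voice>` wrapper.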

Can I use Voice Attack Tokens in the Context field?

Absolutely. Just like in other places, Voice Attack will process the tokens before the text gets to the plug-in. The resulting text is what gets sent off for speech generation. For example, "[Hi;Hello;Greetings] my friend" would randomly become one of 3 variants: "Hi my friend", "Hello my friend", or "Greetings my friend".
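The expansion happens inside Voice Attack, not in the plug-in, but the idea can be mimicked in a few lines of Python. This sketch assumes semicolon-separated alternatives inside square brackets and is not Voice Attack's implementation; `all_variants` and `pick_variant` are hypothetical helpers.

```python
import random
import re

# A bracketed section with no nested brackets, e.g. "[Hi;Hello;Greetings]".
_SECTION = re.compile(r"\[([^\[\]]*)\]")


def all_variants(text):
    """List every phrase the bracketed alternatives can expand to."""
    match = _SECTION.search(text)
    if match is None:
        return [text]
    options = [option.strip() for option in match.group(1).split(";")]
    variants = []
    for option in options:
        expanded = text[: match.start()] + option + text[match.end():]
        variants.extend(all_variants(expanded))
    return variants


def pick_variant(text, rng=random):
    """Pick one expansion at random, roughly what a single command run does."""
    return rng.choice(all_variants(text))


variants = all_variants("[Hi;Hello;Greetings] my friend")
chosen = pick_variant("[Hi;Hello;Greetings] my friend")
print(variants)
```

Each distinct expansion is a different text, so with caching enabled each variant would get its own cached audio file the first time it is spoken.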

How can I change settings using variables (override the config file)?

Each of the settings available in the config file can also be set using a Voice Attack variable. The variables are the combination of the prefix MSCognitiveTextToSpeech. and the KeyName from the config file. Examples: MSCognitiveTextToSpeech.DefaultVoiceName, MSCognitiveTextToSpeech.AddRadioEffect. If the config and variable use different settings, the variable takes precedence.

Note: the variables need to be the correct variable type (e.g. settings that are true/false need to use "Set a Boolean type", anything that is a number should be an Int, and text should be a Text type).
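The precedence rule can be sketched in Python as a lookup that prefers the prefixed variable over the config value. Everything here is illustrative: `effective_settings` is a made-up helper, `CacheExpirationDays` is a hypothetical key name, and raising an error on a type mismatch is an assumption about behaviour, not something the plug-in is documented to do.

```python
PREFIX = "MSCognitiveTextToSpeech."


def effective_settings(config, variables):
    """Merge config defaults with Voice Attack variable overrides."""
    settings = dict(config)
    for key, default in config.items():
        override = variables.get(PREFIX + key)
        if override is None:
            continue  # no variable set, keep the config value
        # Boolean, Int and Text must match the type of the config setting.
        if type(override) is not type(default):
            raise TypeError(f"{PREFIX + key} must be {type(default).__name__}")
        settings[key] = override
    return settings


config = {
    "DefaultVoiceName": "en-US-JennyNeural",
    "AddRadioEffect": False,
    "CacheExpirationDays": 30,  # hypothetical key name
}
variables = {"MSCognitiveTextToSpeech.AddRadioEffect": True}

result = effective_settings(config, variables)
print(result)
```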

Troubleshooting

If the following doesn't help, post an issue or question on the Issues page.

The plug-in isn't showing up in Voice Attack after being installed

Make sure you're using the version of the plug-in that matches your Voice Attack install: use the x86 plug-in if you're using 32-bit Voice Attack (most people), or the x64 plug-in if the Options window title in Voice Attack says "64-bit".
