uLipSync

uLipSync is an asset for lip-syncing in Unity. It has the following features:

- Utilizes Job System and Burst Compiler to run faster on any OS without using native plugins.
- Can be calibrated to create a per-character profile.
- Both runtime analysis and pre-bake processing are available.
- Pre-bake processing can be integrated with Timeline.
- Pre-bake data can be converted to AnimationClip.

Features

- LipSync
- Profile
- Real-time Analysis
- Mic Input
- Pre-Bake
- Timeline
- AnimationClip
- Texture Change
- VRM Support
- WebGL Support

Install

Unity Package
- Download the latest .unitypackage from the Releases page.
- Import Unity.Burst and Unity.Mathematics from Package Manager.
Git URL (UPM)
- Add https://github.com/hecomi/uLipSync.git#upm to Package Manager.
Scoped Registry (UPM)
- Add a scoped registry to your project.
  - URL: https://registry.npmjs.com
  - Scope: com.hecomi
- Install uLipSync in Package Manager.
  - If you can't find uLipSync in the list, please try entering com.hecomi.ulipsync directly.

How to Use

Mechanism

When an AudioSource plays a sound, the audio buffer is passed to the OnAudioFilterRead() method of any component attached to the same GameObject. This callback is normally used to modify the buffer and apply effects such as reverb, but because it also exposes the waveform being played, we can analyze that waveform to calculate Mel-Frequency Cepstral Coefficients (MFCC), which represent the characteristics of the human vocal tract. In other words, if the calculation works well, the waveform of an "a" sound yields parameters characteristic of "a", and the waveform of an "e" sound yields parameters characteristic of "e" (in addition to vowels, consonants such as "s" can also be analyzed). By comparing these parameters with the pre-registered parameters for each of the "aiueo" phonemes, we can calculate the similarity between each phoneme and the current sound, and use it to adjust the blend shapes of the SkinnedMeshRenderer for accurate lip-syncing. If you feed microphone input into the AudioSource, you can also lip-sync to your own voice as you speak.
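The comparison step can be sketched in a few lines. This is a language-neutral illustration in Python, not uLipSync's actual code: it assumes cosine similarity as the measure (the text above does not say which measure is used), and the function names and three-element vectors are hypothetical stand-ins for real MFCC data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def phoneme_weights(current, registered):
    """Score the current vector against every registered phoneme.

    Negative similarities are clamped to 0 and the scores are normalized
    so that they sum to 1 (assumed here so they can act as blend weights).
    """
    scores = {name: max(cosine_similarity(current, vec), 0.0)
              for name, vec in registered.items()}
    total = sum(scores.values())
    if total == 0.0:
        return {name: 0.0 for name in scores}
    return {name: score / total for name, score in scores.items()}


# Hypothetical pre-registered parameters (real MFCC vectors are longer).
REGISTERED = {
    "A": [1.0, 0.2, 0.0],
    "I": [0.1, 1.0, 0.3],
    "U": [0.0, 0.3, 1.0],
}

# Hypothetical parameters computed from the sound currently playing.
weights = phoneme_weights([0.9, 0.3, 0.1], REGISTERED)
best = max(weights, key=weights.get)
print(best, weights)
```

The phoneme with the largest weight is the closest match, and the full set of weights is the kind of information that could drive one blend shape per phoneme.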

The component that performs this analysis is uLipSync, the data that holds the phoneme parameters is Profile, and the component that moves the blend shapes is uLipSyncBlendShape. There is also a uLipSyncMicrophone component that plays the audio from the microphone. Here's an illustration of what it looks like.

Setup

Let's set up using Unity-chan. The sample scene is Samples / 01. Play AudioClip / 01-1. Play Audio Clip. If you installed this from UPM, please import Samples / 00. Common sample (which contains Unity's assets).

After placing Unity-chan, add an AudioSource component to the GameObject that will play the sound, and assign an AudioClip of Unity-chan's voice to it.

First, add a uLipSync component to the same GameObject. For now, select uLipSync-Profile-UnityChan from the list and assign it to the Profile slot of the component (if you assign something different, such as Male, it will not lip sync properly).

Next, set up the blend shapes that will move in response to the analysis results. Add uLipSyncBlendShape to the root of Unity-chan's SkinnedMeshRenderer. Select the target blend shape, MTH_DEF, then go to Blend Shapes > Phoneme - BlendShape Table and add 7 items, A, I, U, E, O, N, and -, by pushing the + button ("-" is for noise). Then select the blend shape corresponding to each phoneme, as shown in the following image.

Finally, to connect the two, in the uLipSync component, go to Parameters > On Lip Sync Updated (LipSyncInfo) and press + to add an event, then drag and drop the game object (or component) with the uLipSyncBlendShape component where it says None (Object). Find uLipSyncBlendShape in the pull-down list and select OnLipSyncUpdate in it.

Now when you run the game, Unity-chan will move her mouth as she speaks.

Adjust lipsync

The range of the volume to be recognized and the response speed of the mouth can be set in the Parameters of the uLipSyncBlendShape component.

Volume Min/Max (Log10)
- Set the minimum and maximum volume to be recognized; the mouth is closed at the minimum and most open at the maximum. The values are log10 of the volume, so 0.1 is -1 and 0.01 is -2.
Smoothness
- The response speed of the mouth.

As for the volume, you can see the information about the current, maximum, and minimum volume in the Runtime Information of the uLipSync component, so try to set it based on this information.
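To make the two parameters concrete, here is a minimal Python sketch of how a log10 volume range and a smoothness value could be applied. It is an assumption for illustration only, not uLipSyncBlendShape's actual implementation: the function names are hypothetical, and simple exponential smoothing stands in for whatever smoothing the component really uses.

```python
import math


def normalized_volume(raw_volume, min_log10, max_log10):
    """Map a linear volume onto 0..1 using log10 thresholds.

    0 means mouth closed (at or below the minimum) and
    1 means most open (at or above the maximum).
    """
    if raw_volume <= 0.0:
        return 0.0  # log10 is undefined for silence; treat it as closed
    t = (math.log10(raw_volume) - min_log10) / (max_log10 - min_log10)
    return min(max(t, 0.0), 1.0)


def smooth(previous, target, smoothness):
    """Exponential smoothing (assumed): 0 follows the target at once,
    values closer to 1 make the mouth respond more slowly."""
    return previous * smoothness + target * (1.0 - smoothness)


# With Volume Min = -2.5 and Volume Max = -1.5, a raw volume of 0.01
# (log10 = -2) lands halfway between closed and fully open.
openness = normalized_volume(0.01, -2.5, -1.5)
mouth = smooth(0.0, openness, 0.8)
print(openness, mouth)
```

Reading the current, minimum, and maximum volume from Runtime Information and choosing thresholds just inside that range keeps the normalized value from sitting at 0 or 1 most of the time.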

AudioSource Position

In some cases, you may want to attach the AudioSource at the mouth position and uLipSync to a different GameObject. This takes a little extra setup: add a component called uLipSyncAudioSource to the same GameObject as the AudioSource, and assign it to Parameters > Audio Source Proxy in the uLipSync component. The sample scene is Samples / 03. AudioSource Proxy.

Microphone

If you want to use a microphone as an input, add uLipSyncMicrophone to the same GameObject as uLipSync. This component will generate an AudioSource with the microphone input as a clip. The sample scene is Samples / 02-1. Mic Input.

Select the input device from Device. If Is Auto Start is checked, input starts automatically. To start and stop microphone input at runtime, press the Stop Mic / Start Mic button in the UI as shown below.

If you want to control it from a script, use uLipSync.MicUtil.GetDeviceList() to identify the microphone to be used, and pass its MicDevice.index to the index of this component. Then call StartRecord() to start it or StopRecord() to stop it.

Note that the microphone input is played back in Unity slightly later than your own speech. If you want to broadcast the voice captured by other software instead, set Parameters > Output Sound Gain to 0 in the uLipSync component. Do not set the AudioSource volume to 0 for this purpose: the data passed to OnAudioFilterRead() would then be silent and could not be analyzed.

In the uLipSync component, go to Profile > Profile, select a profile from the list (Male for a male voice, Female for a female voice, etc.), and run the scene. Because the default profiles are not personalized, their accuracy may be poor. Next, we will see how to create calibration data that matches your own voice.

Calibration

So far we have used the sample Profile data, but in this section, let's see how to create data adjusted for other voices (voice actors' data or your own voice).

Create Profile

Clicking the Profile > Profile > Create button in the uLipSync component will create the data in the root of the Assets directory and set it to the component. You can also create it from the Project window by right-clicking > uLipSync > Profile.

Next, register the phonemes you want to recognize in Profile > MFCC > MFCCs. A, I, U, E, and O are usually enough, but it is recommended to also add a phoneme for breath ("-" or another suitable character) so that breath input is not picked up as one of the other phonemes. You can use alphabetic characters, hiragana, katakana, etc., as long as the names you register match those in uLipSyncBlendShape.

Next, we will calibrate each of the phonemes we have created.

Calibration using Mic Input

The first way is to use a microphone. uLipSyncMicrophone should be added to the object. Calibration will be done at runtime, so start the game to analyze the input. Press and hold the Calib button to the right of each phoneme while speaking the sound of each phoneme into the microphone, such
