Talking Head (3D)

[!IMPORTANT] Netflix acquires gaming avatar maker Ready Player Me. Following the acquisition, Ready Player Me will be winding down its services on January 31, 2026. This includes both of its online avatar creation tools, Ready Player Me and PlayerZero.

Demo Videos

All the demo videos are real-time screen captures from a Chrome browser running the TalkingHead test web app without any post-processing.

Video | Description
---|---
<img src="images/dynamicbones.jpg" width="200"/><br><img src="images/dynamicbones2.jpg" width="200"/> | Having a good hair day! – A two-part introduction to the TalkingHead's dynamic bones feature 🦴🦴 and built-in physics engine. Using custom models with rigged hair and two different hairstyles. See Appendix E for more details.
<img src="images/screenshot4.jpg" width="200"/> | I chat with Jenny and Harri. The close-up view allows you to evaluate the accuracy of lip-sync in both English and Finnish. Using GPT-3.5 and Microsoft text-to-speech.
<img src="images/screenshot5.jpg" width="200"/> | A short demo of how AI can control the avatar's movements. Using OpenAI's function calling and Google TTS with the TalkingHead's built-in viseme generation.
<img src="images/screenshot6.jpg" width="200"/> | Michael lip-syncs to two MP3 audio tracks using OpenAI's Whisper and TalkingHead's speakAudio method. He kicks things off with some casual talk, but then goes all out by trying to tackle an old Meat Loaf classic. 🤘 Keep rockin', Michael! 🎤😂
<img src="images/screenshot3.jpg" width="200"/> $$\color{transparent}{\rule{200px}{0px}}$$ | Julia and I showcase some of the features of the TalkingHead class and the test app including the settings, some poses and animations.

Use Case Examples

Some featured videos, apps, and projects using the TalkingHead class:

Video/App | Use Case
---|---
<img src="images/dialoglab.jpg" width="200"/> | Human-AI group conversations. Researchers from UVA, Google, Northeastern, Google DeepMind, and Google Research developed DialogLab, a toolkit to author, simulate and test human-AI group conversations. 🤖🤖🤖
<img src="images/openai.jpg" width="200"/> | Low-latency AI speech over WebRTC. Speech-to-speech in realtime over WebRTC using OpenAI Realtime API. Learn more about the audio-driven lip-sync module at HeadAudio.<br>Note: Realtime speech-to-speech usage is much more expensive than standard AI text tokens, so please check OpenAI pricing for gpt-realtime-mini before use.
<img src="images/olivia.jpg" width="200"/> | Video conferencing. A video conferencing solution with real-time transcription, contextual AI responses, and voice lip-sync. The app and demo, featuring Olivia, by namnm 👍
<img src="images/edgespeaker.png" width="200"/> | Fully in-browser AI you can talk to. Uses TalkingHead, HeadTTS (with Kokoro), whisper-web, and WebLLM (with Llama 3.2). No APIs, no accounts. For best performance and WebGPU support, use a desktop version of Chrome or Edge: 👉 EdgeSpeaker.com
<img src="images/geminicompetition.jpg" width="200"/> | Recycling Advisor 3D. Snap a photo and get local recycling advice from a talking avatar. My entry for the Gemini API Developer Competition 2024.
<img src="images/evertrail.jpg" width="200"/> | Live Twitch adventure. Evertrail is an infinite, real-time generated world where all of your choices shape the outcome. Video clip and the app by JPhilipp 👏👏<br>NEWS: Featured at the AI Film Awards during the 2025 Cannes Film Festival!
<img src="images/cliquevm.jpg" width="200"/> | Quantum physics using a blackboard. David introduces us to the CHSH game and explores the mystery of quantum entanglement. For more information about the research project, see CliqueVM.
<img src="images/datingprofile.jpg" width="200"/> $$\color{transparent}{\rule{200px}{0px}}$$ | Interactive Dating Profiles. ❤️ Researchers from the MIT Media Lab and Harvard used the TalkingHead class and data-driven AI to create digital twins that potential dating partners could interact with. Their paper (Baradari et al., 2025) was presented at CHI 2025 in Japan.

More projects, sites and research using TalkingHead:

Link | Description
---|---
Cancer Clinical Trial Participation | Researchers at the University of Florida explored how multiple virtual agents can help overcome barriers to joining cancer clinical trials.
TalkMateAI | Real-time Voice-Controlled 3D Avatar with Multimodal AI.
Riverts | A platform for building, running, and analyzing interactive user-avatar conversations.
Interactive Avatar | Interactive avatars as a service - an easy way to add an AI-driven avatar on your website.
Alter egos alter engagement | Researchers at the University of Florida used TalkingHead to explore how embodied AI chatbots can support mental well-being.

Introduction

Talking Head (3D) is a browser JavaScript class featuring a 3D avatar that can speak and lip-sync in real-time. It also knows a set of emojis and can convert them into facial expressions.

The class supports full-body 3D avatars (GLB) and Mixamo animations (FBX). The avatar must have a Mixamo-compatible rig and ARKit and Oculus viseme blend shapes. See Appendix A for details on creating your own avatar.
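A quick pre-flight check can catch a missing viseme blend shape before the avatar is loaded. The sketch below assumes the 15 Oculus visemes are exposed as blend shapes named with a `viseme_` prefix (the convention used by Ready Player Me-style avatars). `missingVisemes` is a hypothetical helper, not part of the TalkingHead class, and it does not check the ARKit blend shapes.

```javascript
// The 15 Oculus visemes. The "viseme_" prefix is an assumption about
// how the avatar's blend shapes are named; adjust it for your model.
const OCULUS_VISEMES = [
  "sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
  "nn", "RR", "aa", "E", "I", "O", "U"
];

/**
 * Return the viseme blend shape names that are absent from the avatar.
 * @param {string[]} blendShapeNames Names found on the avatar's meshes
 * @param {string} [prefix] Naming prefix (assumed, see above)
 * @returns {string[]} Missing names, in the order of OCULUS_VISEMES
 */
function missingVisemes(blendShapeNames, prefix = "viseme_") {
  const present = new Set(blendShapeNames);
  return OCULUS_VISEMES
    .map((v) => prefix + v)
    .filter((name) => !present.has(name));
}

// Example: an avatar that lacks the "TH" and "RR" visemes.
const names = OCULUS_VISEMES
  .filter((v) => v !== "TH" && v !== "RR")
  .map((v) => "viseme_" + v);
console.log(missingVisemes(names)); // → [ 'viseme_TH', 'viseme_RR' ]
```

With three.js, the input names could be collected from each mesh of the loaded GLB, for example with `Object.keys(mesh.morphTargetDictionary)`.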

By default, the class uses Google Cloud TTS for text-to-speech and has built-in lip-sync support for English, German, French, Finnish, and Lithuanian. Support for further languages can be added by creating new lip-sync language modules.

It is also possible to integrate the TalkingHead class with any external TTS service that can provide word-level timestamps, such as the ElevenLabs WebSocket API. Note that a lip-sync language module is not required if your TTS engine can output viseme IDs or blend shape data directly. For example, by using the Microsoft Azure Speech SDK, you can extend TalkingHead's lip-sync support to 100+ languages.
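When an external TTS service returns word-level timestamps, they usually need to be reshaped before they can drive the avatar. The sketch below assumes a hypothetical input format (`{ word, startMs, endMs }` per word) and produces parallel `words`, `wtimes`, and `wdurations` arrays in milliseconds. That output shape is assumed here to match what the class's `speakAudio` method accepts; verify it against the method's documentation before relying on it. `toWordTimings` is a hypothetical helper.

```javascript
/**
 * Convert word-level timestamps into parallel arrays.
 * The input format ({ word, startMs, endMs }) is hypothetical; map your
 * TTS service's response into it first.
 */
function toWordTimings(items) {
  const words = [];
  const wtimes = [];      // start time of each word, in milliseconds
  const wdurations = [];  // duration of each word, in milliseconds
  for (const { word, startMs, endMs } of items) {
    // Skip empty words and entries whose end precedes their start.
    if (!word || endMs < startMs) continue;
    words.push(word);
    wtimes.push(startMs);
    wdurations.push(endMs - startMs);
  }
  return { words, wtimes, wdurations };
}

const timings = toWordTimings([
  { word: "Hello", startMs: 0, endMs: 420 },
  { word: "world", startMs: 480, endMs: 900 }
]);
console.log(timings);
```

The resulting object would then be combined with the decoded audio data before being handed to the avatar.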

The class uses ThreeJS / WebGL for 3D rendering.

Add-on Modules

Modules compatible with the Talking Head (3D) project:

Module | Description
--- | ---
<img src="images/headtts.jpg" width="200"/> | HeadTTS is a free and open-source English TTS with Kokoro neural voices, viseme IDs, and accurate phoneme-level timestamps. It can run entirely in a browser using WebGPU.
<img src="images/headaudio.jpg" width="200"/> | HeadAudio is an audio worklet node/processor for audio-driven, real-time viseme detection and lip-sync. No text transcription or timestamps needed. Fast and fully in-browser.
<img src="images/motionengine.jpg" width="200"/> $$\color{transparent}{\rule{200px}{0px}}$$ | MotionEngine extends the TalkingHead's built-in animation system with a new set of advanced gestures/expressions, a semantic control layer tailored for LLM-driven execution, and LLM-based tooling for authoring and composing new motions.

TalkingHead class

You can download the TalkingHead modules from releases (without dependencies). Alternatively, you can install them from NPM, or import all the needed modules from a CDN:

```html
<script type="importmap">
{ "imports":
  {
    "three": "https://cdn.jsdelivr.net/npm/three@0.180.0/build/three.module.js/+esm",
    "three/addons/": "https://cdn.jsdelivr.net/npm/three@0.180.0/examples/jsm/",
    "talkinghead": "https://cdn.jsdelivr.net/gh/met4citizen/TalkingHead@1.7/modules/talkinghead.mjs"
  }
}
</script>
```

[!TIP] FOR HOBBYISTS: If you're just looking to experiment on your personal laptop without dealing with proxies, JSON Web Tokens, or Single Sign-On, take a look at the minimal code example. Simply download the file, add your Google TTS API key, and you'll have a basic web app template with a talking head.

If you want to use the built-in Google TTS and lip-sync with Single Sign-On (SSO), give the class your TTS proxy endpoint and a function that returns the JSON Web Token needed to use that proxy. Refer to Appendix B.
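The token function can be as simple as a cached request. The sketch below is one possible shape, not the class's own implementation: `makeJwtGetter`, `fetchToken`, the `{ token, expiresAt }` response format, and the 60-second refresh margin are all hypothetical, and it assumes the class accepts an asynchronous token function. The token request is injected so that the caching logic can be exercised without a server.

```javascript
/**
 * Wrap a token request in a cache so the token is reused until it is
 * close to expiring. `fetchToken` is a hypothetical async function that
 * resolves to { token, expiresAt }, with expiresAt in epoch milliseconds.
 */
function makeJwtGetter(fetchToken, { marginMs = 60000, now = Date.now } = {}) {
  let cached = null;
  return async function jwtGet() {
    // Request a new token on first use or when the cached one is
    // within marginMs of its expiry time.
    if (!cached || now() >= cached.expiresAt - marginMs) {
      cached = await fetchToken();
    }
    return cached.token;
  };
}

// Stand-in for a real request to your own SSO endpoint.
let issued = 0;
const fakeFetch = async () => ({
  token: "token-" + (++issued),
  expiresAt: Date.now() + 10 * 60 * 1000
});

const jwtGet = makeJwtGetter(fakeFetch);
jwtGet()
  .then(() => jwtGet())
  .then((token) => console.log(token, issued)); // → token-1 1
```

The second call reuses the cached token, so only one request is made. Note that two calls made at the same instant would each trigger a request; sharing a single in-flight promise would avoid that if it matters for your proxy.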
