Lip Sync with Blender and Magpie

Introduction

Can *you* use Blender to make groovy character animations? The answer is YES! I was able to create my first lip sync using a nifty little shareware program called Magpie (to decompose the audio track) and the magic of Blender's Relative Vertex Key system. Here is the final animation:
faceX08.mpg

and the Blender project may be downloaded here:
faceX08_blend.zip

This article is an overview of the steps I went through to create this lip sync.
A frame from the animation


Before we do the First Things First ...

Before we start the tutorial I would like to take a moment to discuss Relative Vertex Keys (RVK's). I will briefly go over the steps for creating the RVK's for the character -- this is a big topic and it is worthy of its own tutorial, so for all of the gory details I will direct you here:
Jonathan DuBois' RVK Tutorial

RVK's work like this: we create a base key, then we create several "relative" keys. These relative keys are calculated by Blender as "offsets" or "differences" from the base key. With this kind of procedure we can quickly build up a continuum of complex facial expressions: if we have a base key, one relative key that makes the character frown, and another relative key that makes the character stick his tongue out, we can mix these keys together to get a whole range of expressions where the character is in various states of frowning while sticking his tongue out. Blender calculates the final facial expression by starting with the base key, then adding a proportion of the "relative offset" from the first relative key (the proportion to be used is read from that RVK's IPO curve), then adding a proportion of the second RVK, and so on, for as many RVK's as you have created. This makes RVK's the tool of choice for lipsyncing, as we can easily blend between mouth shapes (or "phonemes") to make the character look like he is really speaking. We can even make it look like he is speaking while frowning with his tongue sticking out!
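If you prefer to see the arithmetic spelled out, here is a minimal sketch in plain Python (this is not Blender's API -- the names are made up purely for illustration) of the blend just described:

    # A minimal sketch of how relative vertex keys are blended.
    def blend_rvks(base, relative_keys, weights):
        """Return the final vertex positions for one frame.

        base          -- list of (x, y, z) positions of the base key
        relative_keys -- one list of (x, y, z) positions per relative key
        weights       -- IPO value of each key at this frame (0.0 = off, 1.0 = full)
        """
        final = [list(v) for v in base]              # start from the base key
        for key, w in zip(relative_keys, weights):
            for i, (kv, bv) in enumerate(zip(key, base)):
                for axis in range(3):
                    # add this key's offset from the base, scaled by its IPO value
                    final[i][axis] += w * (kv[axis] - bv[axis])
        return final

    # Example: 50% frown (one key) plus 100% tongue-out (another key) on one frame:
    # final = blend_rvks(base, [frown_key, tongue_key], [0.5, 1.0])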
Creating the base key
Let's quickly go over the steps to create a base key. Select the mesh and press 'i' to bring up the insert key menu. Select "Mesh" from this menu. Press the F7 key to go to the "Effects Window" and select "Relative Keys". Press Shift-F6 to go to the IPO window and select the 'Mesh Key' button (the one with the picture of a square with vertices at the corners). You will see a horizontal orange line and a red curve. The orange horizontal line represents our base key, and whenever we want to edit the base mesh we must select this orange line before going into edit mode (the line turns yellow when selected). The red IPO curve in the window is not needed, so we will get rid of it: select the red curve and press 'x' to delete it.
Creating the 1st Relative key
Now let's create the 1st RVK. Select the orange base key, move your cursor into the 3D window and press TAB to go into mesh edit mode. Press 'i' and select "Mesh" to create the first RVK. In the IPO window a blue horizontal line will appear above the orange base key line -- this blue line represents the 1st RVK (you may need to zoom out in the IPO window to see it). Whenever we want to edit this RVK we select this blue line before going into edit mode (the line turns light blue when selected). We can now edit the mesh (for example, we can make the character pucker up into an 'OO' mouth shape).
Morphing between facial expressions is done using IPO curves for the RVK's -- one curve per relative key. To create an IPO curve for the 1st RVK we select "Key 1" from the right side of the IPO window and click in the IPO window with the left mouse button while holding the Ctrl key. A new IPO curve appears. We can shape this IPO curve just like any other IPO curve: TAB puts us into edit mode and we can create new nodes on the curve by ctrl-left-clicking or by duplicating old nodes with the 'd' key. When the IPO curve is at 0.0 for a given frame we do not see the influence of the RVK. When the IPO is at 1.0 we see the full effect of the RVK, mixed with any other RVK's that are influencing the mesh at that frame. You can sometimes obtain interesting results by making the IPO go negative or greater than 1.0.
When the IPO for "Key 1" is at 0.0 we see the base key, and when it is at 1.0 we see "Key 1"
The other relative vertex keys and their IPO curves are created in an analogous fashion. Thus concludes this brief and incomplete introduction to RVK's -- now let's get lipsyncing!


OK, ... First Things First

Relative Vertex Keys are needed for speech and facial expressions
The first job (which is beyond the scope of this tutorial) is to create a model. The mesh of the model should have several relative vertex keys built in that represent the basic phonemes of speech. The model I use has 11 such keys, but it is possible to do decent animations with fewer. An excellent introduction to using phonemes in computer animation can be found on this page:
Michael Comet's Lipsync tute
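For a feel of what such a phoneme set looks like, here is one commonly used grouping of mouth shapes (roughly the classic Preston Blair set that many lipsync resources build on). This is not the exact set my model uses -- it is only here to illustrate how many letters can share a single mouth shape:

    # One commonly used grouping of mouth shapes (illustration only):
    phoneme_groups = [
        "A I",
        "E",
        "O",
        "U",
        "C D G K N R S TH Y Z",
        "F V",
        "L",
        "M B P",
        "W Q",
        "rest (mouth closed)",
    ]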

You should also create several other keys which model other facial movements and expressions -- my model has 16 additional RVK's that control his eyes, eyebrows, moustache and hair. It would also be useful to have an armature for your model -- certainly neck and skull bones would greatly increase the possibilities for your character. My model has an armature on layer 4. At the risk of sounding like a T.V. cooking show chef, I will say "Here's one I prepared earlier":
Lipsync_start_blend.zip

Next we need a soundclip. The dulcet tones of Blender's creator Ton Roosendaal should do the trick!
TON.wav

We also have to download a copy of the Magpie shareware program. It can be obtained from the Magpie web site:
http://www.thirdwishsoftware.com

Have you been pronouncing Blender correctly? Find out today -- straight from the horse's mouth!


Isolating a Word

Locating the word "Blender"
O.K., let's start plotting out where our phonemes will go! Start Magpie. Select the menu "File->Open..." and select the file TON.wav. We will be mapping our favorite word "Blender" (the second occurrence). The first trick is to find it! This is done by highlighting parts of the wav and pressing the "Play Selection" button (the "Play" button between two red vertical bars). After locating and highlighting the word (it should lie roughly between 00:00:04:17 and 00:00:05:03, as indicated in the bottom right corner of the window), press the magnifying glass button to zoom into this part of the waveform.
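As a rough sanity check, you can convert those timecodes into frame numbers. The little sketch below assumes the clip is displayed at 30 frames per second (adjust if your project uses another rate):

    # Convert an "HH:MM:SS:FF" timecode to an absolute frame number.
    def timecode_to_frame(tc, fps=30):
        h, m, s, f = (int(p) for p in tc.split(":"))
        return ((h * 60 + m) * 60 + s) * fps + f

    print(timecode_to_frame("00:00:04:17"))   # 137
    print(timecode_to_frame("00:00:05:03"))   # 153 -- so "Blender" sits around frames 137-153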
At this point we should choose a mouth set to look at from the menu "Mouth->Select Mouth Set". The one I like the best is the default one as we don't have to look at a pair of dead eyes while the mouth is moving ("Billy" gives me nightmares!).
We want to see the mouth only!


Isolating Phonemes

O.K., let's find the first phoneme (this should "B" fun!). The important tool in this part of the exercise is the "Play From Selection" button (the "Play" button preceded by a red vertical bar). Pick frame 142 and press this button. This plays frame 142 and all the frames after it in our zoomed sound clip. What is Ton saying? I hear "lender". This tells me that the "B" phoneme is earlier than this frame. Now try frame 141 and press the button. I hear "Blender"! This tells me that I will give this frame the "B" phoneme. Double-click the "B" on the side bar to assign a "B" phoneme to frame 141.
B! Oh B! Where could you be?
Continue this exercise with the rest of the phonemes in this word. Pay attention to the animated cartoon mouth -- if something doesn't look quite right, try moving a phoneme around by a frame or two. The frames that I get for the phonemes are as follows: "B" on frame 141; "L" on frame 143; "E" on frame 144; "N" on frame 146; "D" on frame 147; "U" on frame 148 (the second "E" sounds like a "U"); and "R" on frame 150. If you don't get these values, don't fret: there is no one true method for mapping out the phonemes!
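Written out as plain data, the exposure sheet for this word looks like this:

    # The exposure sheet for "Blender" (your frame numbers may differ by a frame or two):
    blender_phonemes = [
        (141, "B"),
        (143, "L"),
        (144, "E"),
        (146, "N"),
        (147, "D"),
        (148, "U"),   # the second "E" sounds like a "U"
        (150, "R"),
    ]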


Map out the Other Words

If you have time, map out the rest of the words in the clip. It is highly recommended that you also go through the "Getting Started" section of the help file that comes with Magpie. Here is a copy of my Magpie file for this sound clip:
TON.mps



The Lipsync Python Script

At this point I should point out the existence of the Python script "Lipsync" by Chris Clawson and Liubomir Kovatchev, which will do a lot of the following work for you:
Lipsync Script at Meloware

While the Lipsync script is a wonderful time saver, I feel that everyone should take the long road at least once to get a full understanding of how RVK's and their IPO's work. If you are in a hurry, however, and just want to get your lipsync done quickly, then Lipsync is definitely the way to go.
The Lipsync python script from Meloware is a great time saver!


Getting the Exposure Sheet into Blender

Having mapped out our sound clip, our next task is to use this information to construct some mesh IPO's for our relative vertex keys. We have a couple of choices. We could save it in the Magpie format (an MPS file) and then alternate between the Magpie window and the Blender window as we construct our IPO's. No thanks! Blender has a built-in text editor, so wouldn't it be nice to have the Magpie data as text right inside our file? (The answer is yes!) We do this using the following steps: A) Press the copy-to-clipboard button; B) Launch Notepad and select the menu "Edit->Paste"; C) Save the file as Magpie.txt; D) Open your model in Blender, bring up a text window (shift-F11), select "Open New" from the menu and locate the file Magpie.txt in the file browser.
From Magpie to Notepad to Blender
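If you would rather not read the numbers off by eye later on, a few lines of Python can turn the saved Magpie.txt into (frame, phoneme) pairs. This little sketch assumes each useful line starts with a frame number followed by a phoneme (for example "141 B"); check your own Magpie output and adjust the parsing if its columns are laid out differently:

    # Turn the exported exposure sheet into (frame, phoneme) pairs.
    def read_exposure_sheet(path="Magpie.txt"):
        sheet = []
        for line in open(path):
            parts = line.split()
            if len(parts) >= 2 and parts[0].isdigit():
                sheet.append((int(parts[0]), parts[1]))
        return sheet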
I actually like to make one variation on this technique: I like to start my animation on frame 1001, so when I have pasted the text output into Notepad I add one thousand to all of the frame numbers. I do this because I like to reserve frames 1-999 for viewing and modelling my RVK's -- if you look at my file you'll find that on frame 1 you'll see the base mesh, on frame 11 Key 1, on frame 21 Key 2, and so on. I will adopt this convention for the rest of the tutorial, so for example the "B" phoneme will now land on frame 1141 instead of frame 141.
I like to view my keys on frames 11-271
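If you like this convention too, a few lines of Python can add the offset for you instead of editing every number by hand (same assumption as above: each useful line starts with the frame number):

    # Write a copy of the exposure sheet with every frame number shifted by 1000.
    def offset_exposure_sheet(src="Magpie.txt", dst="Magpie1000.txt", offset=1000):
        out = open(dst, "w")
        for line in open(src):
            parts = line.split()
            if parts and parts[0].isdigit():
                parts[0] = str(int(parts[0]) + offset)
                out.write(" ".join(parts) + "\n")
            else:
                out.write(line)
        out.close()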


Modelling the IPO's

Now to model the mesh IPO's. We have seven phonemes which, luckily, are represented by only four relative vertex keys: Key 8 gives us the "B"; Key 6 gives us the "L", "N" and "D" mouth shapes; Key 4 gives us the "E" and "U" sounds; and Key 5 gives us the "R".
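Written out as data, the mapping for this word is simply:

    # Phoneme-to-key mapping for "Blender"
    # (the key numbers come from my model -- substitute your own):
    phoneme_to_key = {
        "B": 8,
        "L": 6, "N": 6, "D": 6,
        "E": 4, "U": 4,
        "R": 5,
    }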
B!
Let's shape the IPO for the first phoneme (this should "B" cool -- yeah, yeah, I know: it wasn't funny the first time!). Select the Key 8 IPO and enter edit mode. We will allow the "B" phoneme two frames to form, so shift-left-click the value 0.0 on frame 1139. We want the phoneme to be fully present on frame 1141, so shift-left-click the value 1.0 on frame 1141. We want the "B" to be absent when the "L" phoneme appears on frame 1143, so shift-left-click 0.0 on frame 1143. Finally (this is optional), I like the handles on my IPO's to be of vector type, so I select all vertices and press 'v'.


Modelling more IPO's

Groovy, let's do the "L". Select the IPO for Key 6 and enter edit mode. The "B" phoneme appears on frame 1141, so we set the "L" IPO to zero on frame 1141. The "L" phoneme is fully present on frame 1143, so we shift-left-click the value 1.0 on that frame. Finally, the "E" phoneme starts on frame 1144, so we want the "L" phoneme to be "Blended away" by that point: shift-left-click 0.0 on frame 1144. Continue this method until you have mapped out all the phonemes on your exposure sheet.
Blender!
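The pattern is always the same: a key ramps up from the previous phoneme's frame, peaks at 1.0 on its own frame, and is "Blended away" by the next phoneme's frame. The sketch below only computes where those points would go, using the frame list and mapping from above -- in Blender you still place them by hand with ctrl-left-click in the IPO window:

    # Compute the 0 / 1 / 0 IPO points for each relative vertex key.
    def ipo_points(sheet, phoneme_to_key, lead_in=2):
        points = {}   # key number -> list of (frame, value)
        for i, (frame, phoneme) in enumerate(sheet):
            key = phoneme_to_key[phoneme]
            pts = points.setdefault(key, [])
            # ramp up from the previous phoneme's frame (unless it uses the same key)
            if i == 0:
                pts.append((frame - lead_in, 0.0))
            elif phoneme_to_key[sheet[i - 1][1]] != key:
                pts.append((sheet[i - 1][0], 0.0))
            pts.append((frame, 1.0))          # fully present on its own frame
            # "Blended away" by the next phoneme (unless it uses the same key)
            if i + 1 < len(sheet) and phoneme_to_key[sheet[i + 1][1]] != key:
                pts.append((sheet[i + 1][0], 0.0))
        return points

    # With the frames shifted by 1000, Key 8 ("B") gets (1139, 0.0), (1141, 1.0), (1143, 0.0).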


Putting it All Together

The last task is to sync up the audio with the video. The software I used to do mine was Ulead Video Studio, a commercial product that came free with my video card. There is a free product called DDClip Free that will also do this for you. I like the price and the interface of DDClip better than Ulead's, but DDClip is very limited in terms of output options, and in particular it will not compress your output file. You can use DDClip with the Tsunami MPEG encoder (TMPGEnc) to get good results with excellent compression for free! DDClip Free and TMPGEnc can be downloaded from these places:
DDClip Free
TMPGEnc
Syncing up the video and audio in Ulead Video Studio
It often looks more natural if you delay the start of your audio by a few frames. When I sync up the audio and the video perfectly on my animation, it looks like the dubbing on a Godzilla movie! This is a rule in character animation: the sound of a phoneme should be anticipated by the motion. For example, look at the "B" phoneme. The sound of the "B" doesn't occur when the mouth is closed -- it happens when the mouth pops open!


Jeepers! Something's Missing!

If you play the movie as it is right now, you'll say to yourself "Jeepers! Something's missing!" It is reminiscent of that creepy "Billy" mouth set that comes with Magpie. The eyes seem dead. The mouth is moving but there's no feeling (however, he's still a better actor than Jean-Claude Van Damme!).
Core dump!
I have adopted a few loose "rules" when animating this character. He is a cartoon, so his expressions should be highly exaggerated, and these rules may not be appropriate for other characters you might want to animate. I use them as a guideline, as a means to provide consistency of character -- but I won't lose any sleep when I choose to ignore them. Here they are:


Some Tips

  • Blink on "percussive" consonants such as T, B, P or D. Unless you are trying to make your character look sleepy, the blink should be very quick (I use one or two frames to close the eyes and two frames to open them again) -- see the sketch after this list. Relative Vertex Keys #12 and #13 control my model's eyelids.
  • Raise and tilt the eyebrows on some vowels. On my model, this is done using Relative Vertex Keys #21-#24.
  • Raise the moustache on "E" vowels or other phonemes that give the top lip a flat shape. Lower the moustache on "O"s and other phonemes that give the mouth a round shape. Relative Vertex Keys #26 and #27 take care of this.
"T & A" Blender style! (Hehe)
  • Occasionally introduce some kind of asymmetry into the animation. Perhaps make only one eyebrow raise briefly (this can make your character look thoughtful). Perhaps make one eyelid droop slightly. Occasionally rock the head to and fro as he is talking, in time with the syllables.
  • Give your character some kind of quirk: mine spits every time he pronounces a "P" or "B" phoneme (these phonemes involve popping air between closed lips). The spit is parented to the head bone and jumps to layer 1 from invisible layer 20 and back again. These spittings are also anticipated: before he spits, his neck leans backwards, then snaps forward quickly as the spit comes out. His hair also quickly stands on end sometimes when he says the accented syllable of an important word.
Why not say it *and* spray it?!?
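To make the first tip concrete, here is a tiny sketch of the blink timing I use -- a short 0 / 1 / 0 spike on the eyelid keys (#12 and #13 on my model) around the consonant's frame. The exact offsets are a matter of taste:

    # Keyframe points for a quick blink on a percussive consonant.
    def blink_points(frame, close=2, reopen=2):
        return [(frame - close, 0.0),    # eyes open
                (frame, 1.0),            # eyes shut on the consonant
                (frame + reopen, 0.0)]   # open again

    # e.g. a blink on the "B" of "Blender": blink_points(1141)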


Some Final Words ...

I hope this overview of how I made my lip sync proves to be helpful to someone, somewhere. The time to do this lipsync (not including the time taken to create the model, rig the armature, and hack together the Relative Vertex Keys) was about 16 hours. Only about 2 hours were spent using the Magpie program and about 4 hours shaping the IPO's for the phonemes. The rest of the time was spent tweaking the facial expressions, modelling spit, and plotting out the key frames for the head and neck. The time needed to create the model and add the skeleton was the real time consumer and was an ongoing process that spanned several months. Luckily, this model has a fairly low polygon count, so modelling the Relative Vertex Keys was fairly painless. Creating the RVK's should always be the last part of the model construction process because you must be 100% satisfied with your base mesh before you can start working on them -- I had to throw out my RVK's a couple of times because I realized I needed to add some vertices to my mesh! That said, although creating a usable model can be time consuming and frustrating, it is also highly rewarding. I hope more people decide to use Blender to make cool character animations! Happy Blendering!
Subliminal message: Send Hos all of your money!