Xbox Engineering Blog

Xbox LIVE Avatar Technology

Posted April 7, 2010 by Dan Kroymann (SDE) – Trango

Introduced nearly two years ago with the New Xbox Experience, the Xbox LIVE Avatars have quickly become a signature piece of the Xbox story. We’ve all seen them chatting with each other in the friends list, waving at you from the dashboard, wielding light sabers, driving remote-control Warthogs, or even rocking out as the front man for your band in Guitar Hero 5. Yet have you ever stopped to wonder what goes into making all of that possible? As more and more games incorporate avatars, both as playable characters and through unlockable clothing and props, now seems like a great time to take a peek inside the avatar system and the team that brought them to the Xbox. In this post, Dan Kroymann, a senior developer at Xbox and one of the original members of the avatar team, will dive deep into the technology behind the Xbox LIVE Avatars.

Avatars at a High Level

In the beginning, when we first started designing the avatar system, a couple of key technical goals helped shape the design decisions.  First and foremost, the avatar system had to be extremely easy for game developers to adopt and incorporate into their games, but at the same time it had to be powerful and flexible enough to not hinder their creativity.  The next goal was to design a system with backward/forward compatibility considered from the very start.  It was absolutely imperative that we create a system that would allow us to introduce new features over time and have as many of those features as possible “just work” with games that had already shipped.  A classic example of this forward compatibility can be seen in the avatar marketplace, which was released a year after the Xbox LIVE Avatars first debuted.  Without requiring any changes, existing avatar games were automatically able to handle avatars wearing new downloadable clothing, even though there was no such thing as downloadable clothing when those games were originally created.

With these guiding design principles in mind, the avatar system is broken down into the following high level components:

  • Avatar Metadata: A super compact description of your avatar, including everything you are wearing, your height, weight, hair color, skin color, facial features, etc. This tiny object weighs in at less than 1KB, and is what the Xbox uses to describe your avatar. When a game wants to display a user's avatar, it asks Xbox LIVE for the user's avatar metadata, and then passes that metadata object into the avatar system to load the full avatar.
  • Asset Loading System: Given an avatar's metadata, this system handles locating and loading all of the components that make up the avatar, returning the complete set of art assets (e.g., skeleton, 3D meshes, textures, animations) that the game will need to render the avatar. As a key piece of the backward compatibility story, this code runs in the Xbox system software, which means that we have the ability to update it without needing to patch any of the games that have already shipped. This design is what enabled us to add the avatar marketplace and awardable items via a system update last year.
  • Animation and Rendering: To help developers quickly incorporate avatars into their games, and to keep the art style consistent across games, the final component provided by the Xbox team is a default animation and rendering system. The renderer is designed to be a simple "drop in" component that can be plugged into an existing game architecture and rendering pipeline with minimal effort. This lets developers get up and running quickly rather than spending weeks hooking avatars into their game - with just a few lines of code a developer can load an avatar and have it rendering on screen (a minimal sketch of this flow follows the list). The avatar automatically stands there breathing and fidgeting as it randomly cycles through built-in idle animations, and with another line or two of code, the developer can trigger sequences of animations like waving, clapping, or punting poor little Keflings around their miniature towns.
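
To make the "few lines of code" claim concrete, here is a minimal sketch of that flow.  Every name in it is a hypothetical stand-in; the actual XDK avatar API is available only to licensed developers and differs in its details.

    // A minimal sketch of the "few lines of code" flow described above.
    // All names here are hypothetical stand-ins; the real XDK avatar API
    // differs in its details.

    #include <cstdint>

    struct AvatarMetadata { uint8_t blob[1024]; }; // compact description, < 1KB
    struct AvatarRenderer;                         // opaque "drop in" renderer

    // Assumed system-provided helpers:
    AvatarMetadata GetAvatarMetadata(uint32_t userIndex);          // ask Xbox LIVE
    AvatarRenderer* CreateAvatarRenderer(const AvatarMetadata& m); // load assets
    void UpdateAvatar(AvatarRenderer* r, float elapsedSeconds);    // idle/fidget
    void DrawAvatar(const AvatarRenderer* r);                      // render

    AvatarRenderer* LoadAvatarForUser(uint32_t userIndex)
    {
        AvatarMetadata meta = GetAvatarMetadata(userIndex); // fetched from Xbox LIVE
        return CreateAvatarRenderer(meta);                  // system loads all assets
    }

    void GameFrame(AvatarRenderer* avatar, float dt)
    {
        UpdateAvatar(avatar, dt); // built-in idle animations cycle automatically
        DrawAvatar(avatar);       // draws with the default rendering pipeline
    }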

So What Exactly Goes Into Rendering an Avatar?

The Skeleton

It all starts with the skeleton.  Every avatar has a skeleton, as does each of the props like the light saber and the RC Warthog.  In the avatar system, when we talk about a skeleton, what we are really referring to is a series of joints that are connected to each other in a hierarchical structure.  In other words, the hip joint is connected to the knee joint, which is connected to the ankle joint, and if you rotate the hip joint, the knee joint and ankle joint are moved accordingly.  An Xbox LIVE Avatar is made up of 71 joints, half of which are found in the hands alone!  It takes a lot of joints to create expressive hands and fingers capable of making fists, giving a thumbs-up gesture, or twirling drumsticks.  Every frame, as the animation system plays through an animation, the joints in the skeleton are rotated and moved into the exact pose needed for that frame.  I’ll get into the specifics of how the animation system does this later, but for now, let’s just worry about what happens once the skeleton has been set up for a single frame and we’re ready to render the avatar in that pose.
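
To make the hierarchy concrete, here is a compact sketch of how such a skeleton can be posed.  The Matrix4 type and function names are illustrative assumptions, not the avatar system's actual code: each joint stores the index of its parent plus a transform relative to that parent, and a single pass accumulates them into world transforms.

    #include <vector>

    struct Matrix4 { float m[16]; };               // row-major 4x4 transform

    Matrix4 Multiply(const Matrix4& a, const Matrix4& b)
    {
        Matrix4 r = {};
        for (int row = 0; row < 4; ++row)
            for (int col = 0; col < 4; ++col)
                for (int k = 0; k < 4; ++k)
                    r.m[row * 4 + col] += a.m[row * 4 + k] * b.m[k * 4 + col];
        return r;
    }

    struct Joint {
        int     parentIndex;    // -1 for the root joint
        Matrix4 localTransform; // rotation/translation relative to the parent
    };

    // Joints are ordered so every parent precedes its children, so one
    // forward pass suffices: rotating the hip automatically carries the
    // knee and ankle along with it, exactly as described above.
    void ComputeWorldTransforms(const std::vector<Joint>& joints,
                                std::vector<Matrix4>& world)
    {
        world.resize(joints.size());
        for (size_t i = 0; i < joints.size(); ++i)
            world[i] = (joints[i].parentIndex < 0)
                ? joints[i].localTransform
                : Multiply(world[joints[i].parentIndex], joints[i].localTransform);
    }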

Skinning the Avatar

No, I’m not talking about filleting your poor avatar in some crazy homage to Silence of the Lambs.  In computer graphics, skinning refers to the process of fitting a complex 3D model to an underlying skeleton.  All of the 3D meshes that make up the avatar (the body, head, hair, clothing, accessories, etc.) are authored in a static pose with the avatar standing straight and the arms out in a “T” position.  The vertices on the meshes are then each bound to as many as four joints on the skeleton.  This means that as those skeleton joints rotate and move, the mesh vertices move along with the joints.  The default avatar renderer uses the GPU to do all of this work, which means that the mesh data never has to be touched or modified by the CPU.  Every frame, the meshes are passed to the GPU in the static “T” pose (otherwise known as the bind pose, so named because it is the pose in which the mesh vertices are bound to the skeleton).  The GPU then takes care of transforming every single vertex from its default position into the appropriate position given the current skeleton layout.
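
For the curious, the blend itself looks roughly like the following.  This sketch shows the math on the CPU for readability; as described above, the default renderer performs this work per vertex on the GPU, and all the types here are my own simplified stand-ins.

    #include <vector>

    struct Vec3    { float x, y, z; };
    struct Matrix4 { float m[16]; };               // row-major 4x4 transform

    Vec3 TransformPoint(const Matrix4& t, const Vec3& p)
    {
        return { t.m[0] * p.x + t.m[1] * p.y + t.m[2]  * p.z + t.m[3],
                 t.m[4] * p.x + t.m[5] * p.y + t.m[6]  * p.z + t.m[7],
                 t.m[8] * p.x + t.m[9] * p.y + t.m[10] * p.z + t.m[11] };
    }

    struct SkinnedVertex {
        Vec3  bindPosition;  // authored in the static "T" bind pose
        int   jointIndex[4]; // up to four influencing joints
        float weight[4];     // influence weights; unused slots are 0, sum is 1
    };

    // skinMatrices[j] = worldTransform[j] * inverseBindPose[j], so a vertex
    // fully weighted to a motionless joint stays exactly where it was authored.
    Vec3 SkinVertex(const SkinnedVertex& v, const std::vector<Matrix4>& skinMatrices)
    {
        Vec3 out = { 0.0f, 0.0f, 0.0f };
        for (int i = 0; i < 4; ++i) {
            Vec3 p = TransformPoint(skinMatrices[v.jointIndex[i]], v.bindPosition);
            out.x += v.weight[i] * p.x;
            out.y += v.weight[i] * p.y;
            out.z += v.weight[i] * p.z;
        }
        return out;
    }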


Facial Animation

With the mesh geometry manipulated into the correct pose for the current frame of animation, the next step is to animate the facial features.  Unlike the avatar’s arms and legs, which are 3D shapes that can be animated by moving the underlying skeleton, the avatar’s eyes, eyebrows, and mouth are simply textures that are painted onto the head of the avatar.  Think of the head itself as a blank canvas onto which we paste cutout drawings for each of the facial features.  To animate these features over time, we select which texture to use for a particular feature from a collection of different textures (e.g., smiling mouth, frowning mouth, angry mouth, open mouth).  Each frame of animation therefore consists of a specific skeletal pose, along with references to the desired textures for each of the animated facial features.
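As a rough illustration (this is not the actual runtime or on-disc format), a single frame of animation can be pictured as a structure like this:

    #include <vector>

    struct JointPose {
        float rotation[4];    // quaternion
        float translation[3];
    };

    enum FacialFeature { kEyes = 0, kEyebrows, kMouth, kFeatureCount };

    struct AnimationFrame {
        std::vector<JointPose> joints;    // one pose per skeleton joint (71)
        int textureIndex[kFeatureCount];  // e.g. textureIndex[kMouth] might
                                          // select "smiling" or "open mouth"
    };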


Lighting

The Xbox LIVE Avatars have a very distinct look designed to be fun and approachable, yet detailed enough to be a usable replacement for a typical game character.  The lighting applied to the avatar is an important component in achieving this look.  In the default renderer, several light sources and effects are combined to create a warm, soft-edged avatar that avoids the dull plastic look of traditional computer graphics, while simultaneously steering clear of the hyper-realism sought by many of today’s cutting-edge games.  The three primary lighting components are described below, individually and then in combination.


We start with a basic ambient light, which illuminates the avatar in a flat, shadowless light.  By itself, this light does nothing to highlight the details of the avatar.  There is no sense of depth, and overlapping features, such as the fingers or the chin and neck, blend together indistinguishably.

The next step is to apply a “rim light” effect that gives the impression of a light source originating behind the avatar, which helps soften the edges of the avatar and gives it a warmer feel.  This effect is achieved by applying an exponentially brighter light on the surfaces of the avatar that face perpendicular to the point of view from which the avatar is being rendered.  In other words, the closer you get to the edge, the brighter the light gets.  As with most effects in computer graphics, this is a cheap approximation of a real-world phenomenon that is much easier to compute in real time than its 100% physically accurate counterpart.  Astute observers will have noticed some flaws in the way the rim light effect highlights surfaces that are perpendicular to the view point but are not actually on the outer edge of the avatar itself (the nose, for example).  These are the tradeoffs we make in order to have a renderer that looks the way we want it to, while simultaneously requiring the minimum resources necessary to render in real time.
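
In code, that rim term is the classic fresnel-style falloff.  The sketch below is illustrative; the exponent and constants used by the shipping shader are not public.

    #include <algorithm>
    #include <cmath>

    // Rim-light term: brightness ramps up as the surface normal turns
    // perpendicular to the view direction, i.e. toward the silhouette.
    // The 'power' value shaping the falloff is illustrative only.
    float RimLight(float nDotV, // dot(surfaceNormal, viewDir), both unit length
                   float power) // falloff sharpness, e.g. 3.0f
    {
        float edge = 1.0f - std::min(1.0f, std::max(0.0f, nDotV));
        return std::pow(edge, power); // brighter the closer we get to the edge
    }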

The final major component of the lighting system is a directional light source that is typically above and to the right of the avatar.  This final touch brightens the avatar and adds a bit more depth by allowing the avatar to cast shadows onto itself, which leads us to the next topic: Self-Shadowing.

Self-Shadowing

In order to give the avatar a solid, believably real appearance, the primary directional light source is used for “self-shadowing.”  As the name implies, self-shadowing refers to an object casting shadows on itself (as opposed to an object casting shadows onto its surrounding environment).  The default renderer achieves this self-shadowing through the use of a shadow map generated in real time.  Each frame, once the avatar has been positioned in the desired pose, the avatar is rendered from the perspective of the directional light.  During this render pass, rather than record the color of each pixel that is rendered, we instead record the distance from the light to that pixel, otherwise known as the “depth” of that pixel.  The result is a grayscale picture that we refer to as the “shadow map,” which records, for each pixel in the image, how far the light ray had to travel before it hit something.  We can then use this image when rendering the avatar to determine if a particular spot on the avatar should be illuminated by the directional light, or should be shadowed.

To explain how this works, let’s go back to the step where we generated the shadow map.  Take one of the light rays that intersected a point on the surface of the avatar and imagine that it extends on through the avatar.  Anything that it hits from there onward will be shadowed by the original surface point that the ray first hit.  Additionally, we know that the distance from the light to any of the additional intersection points is going to be greater than the distance to the original surface point.  To put it another way, for any given light ray there are some number of points on the surface of the avatar that intersect that ray, and of those points, only the one closest to the light should be illuminated.  Now back to the main render pass.  For each pixel drawn as we render the avatar, we calculate the path of the ray that would travel from the light to that pixel.  This ray corresponds to a particular point in the shadow map, so to determine if this pixel should be illuminated, all we have to do is compare its distance from the light to the distance recorded for that ray in the shadow map.  If the pixel’s distance is greater than the shadow map value, then clearly there is something else on this same ray that is closer to the light, and thus is casting a shadow on this pixel.
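
Condensed into code, the comparison looks like this.  The structures are simplified stand-ins, and the bias term is the customary fudge factor (not mentioned above) that keeps a surface from incorrectly shadowing itself due to limited depth precision.

    #include <vector>

    // Pass one records, per shadow-map texel, the depth of the surface
    // nearest the light; the main pass compares each pixel's light-space
    // depth against that record.
    struct ShadowMap {
        int width, height;
        std::vector<float> depth; // distance from the light, per texel

        float At(int x, int y) const { return depth[y * width + x]; }
    };

    bool IsShadowed(const ShadowMap& map, int texelX, int texelY,
                    float pixelDepthInLightSpace, float bias = 0.001f)
    {
        // If the map recorded something nearer the light along this ray,
        // that nearer surface is casting a shadow onto this pixel.
        return pixelDepthInLightSpace > map.At(texelX, texelY) + bias;
    }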

How Are Avatar Assets Created, and How Does a Game Load Them?

Art Packages and the Avatar Rig

So far I’ve covered how a game animates and renders an avatar, but that leaves out an important piece of the puzzle: where do the various resources (meshes, textures, etc.) that a game needs to render an avatar come from?  Well, it all starts with an artist working in either Maya or 3D Studio Max.  Using a character rig created by Rare, artists hand-craft the 3D meshes that represent a particular piece of clothing and design the textures that will be overlaid onto those meshes.  These meshes are then rigged to the appropriate joints in the avatar skeleton.  For each point on the mesh, weights are carefully assigned to as many as four nearby joints so that as those joints move and rotate, the mesh twists and deforms in a believable way.

Each type of clothing has a specific region that it has to stay within in order to ensure that there won’t be weird intersections between two pieces of clothing, or between the clothing and the surrounding game environment.  If a hat were ridiculously tall, for instance, that could pose problems for games that built their environments around the more typical avatar sizes.  To prevent these sorts of problems from making it into the game, a special set of assets known as the “maximum bounding mesh” (or as I call it: the marshmallow puff man) is used both to show artists the limits that they need to stay within and to allow game developers to plan their games around the largest possible assets.

Each type of clothing also has a very specific memory budget for its 3D geometry and texture data.  This ensures that the code running on the Xbox can predetermine how much memory it needs to set aside to load an avatar, without needing to know in advance what the various possible pieces of clothing are.  All of these limits are then enforced in the tools that compile the art assets and prepare them for use on the Xbox.

Asset Compression

On the Xbox, as with any game console, resources such as disc space, memory, and network bandwidth are all in short supply, which forces the game engine to carefully balance its resource usage.  In order to minimize the network bandwidth needed to download avatar assets and the disc space needed to store them, the Xbox LIVE Avatar system employs a collection of sophisticated compression algorithms to shrink the assets down as small as possible.  Texture data is stored in a compressed format known as DXT.  One of the benefits of this compressed texture format is that the Xbox GPU can read it directly without needing to decompress it, which means that not only is the compressed file size smaller, but the runtime memory usage is smaller as well, since the texture never needs to be decompressed in memory.
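
To put numbers on that, DXT encodes each 4x4 block of texels at a fixed size, so the savings are easy to compute.  The texture dimensions below are illustrative, not the avatar system's actual budgets.

    #include <cstdio>

    // DXT1 stores each 4x4 block in 8 bytes (DXT5 uses 16), versus 4 bytes
    // per texel for uncompressed 32-bit RGBA.
    int main()
    {
        const int w = 256, h = 256;                  // illustrative texture size
        const int rgbaBytes = w * h * 4;             // 262,144 bytes uncompressed
        const int dxt1Bytes = (w / 4) * (h / 4) * 8; //  32,768 bytes (8:1)
        std::printf("RGBA: %d bytes, DXT1: %d bytes\n", rgbaBytes, dxt1Bytes);
        return 0;
    }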

Spherical Quantization Compression

3D geometry is compressed using a spherical quantization algorithm that can be tweaked to find the right balance between compression size and degradation of the original mesh.  Here’s how it works: we start by taking a bunch of spheres and packing them together into a hexagonal lattice.  Then we overlay the original 3D mesh onto this same space.  Each vertex in the mesh is then snapped from its original position in space to the center of the nearest sphere.  In doing so, we are able to record the position of the vertex as a simple number that indicates which sphere it was assigned to, rather than needing to record the precise X/Y/Z coordinates of its position in space.  By making the spheres larger, we are able to shrink the file size at the cost of potentially moving vertices further away from their original positions.  Conversely, by making the spheres smaller and smaller, the vertices stay closer to their original positions, but it takes a larger number to record which sphere each vertex was assigned to, thus increasing the file size.  Following these two targeted compression steps (DXT texture compression and spherical quantization of 3D geometry), we apply an additional general-purpose LZX compression pass to the entire file, which further shrinks the file size without losing any quality.
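
The sketch below illustrates the quantization tradeoff using a plain cubic grid for simplicity; the real system packs spheres into a hexagonal lattice, which is a tighter packing, but the size-versus-drift tradeoff works the same way.

    #include <cmath>
    #include <cstdint>

    // Snap each vertex to the center of the nearest cell and store only the
    // cell index: bigger cells mean fewer indices and smaller files, at the
    // cost of vertices drifting further from their original positions.
    struct GridQuantizer {
        float origin[3];
        float cellSize;       // analogous to the sphere diameter
        int   cellsPerAxis;

        uint32_t Quantize(const float v[3]) const
        {
            uint32_t idx[3];
            for (int a = 0; a < 3; ++a) {
                float t  = (v[a] - origin[a]) / cellSize;
                int cell = (int)std::floor(t + 0.5f); // nearest cell center
                if (cell < 0) cell = 0;
                if (cell >= cellsPerAxis) cell = cellsPerAxis - 1;
                idx[a] = (uint32_t)cell;
            }
            // One number identifying the cell, instead of three full floats.
            return (idx[2] * cellsPerAxis + idx[1]) * cellsPerAxis + idx[0];
        }

        void Dequantize(uint32_t code, float out[3]) const
        {
            for (int a = 0; a < 3; ++a) {
                out[a] = origin[a] + (code % cellsPerAxis) * cellSize;
                code /= cellsPerAxis;
            }
        }
    };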

On Demand Animation Decompression

Animation data uses a compression scheme similar to the geometry compression, with an additional feature that keeps the runtime memory usage as low as possible.  During compression, the animation is sliced up into a series of keyframes uniformly separated in time over the duration of the animation.  Each keyframe contains the position/rotation of each joint in the skeleton, as well as the indices of the various animated facial texture layers.  The joint data is compressed using the same spherical quantization algorithm used for 3D geometry; however, unlike the mesh vertices, it is not decompressed when loading the avatar (or in this case, when loading the animation).  At runtime, in order to calculate the precise pose of the skeleton at any moment, the animation system decompresses the keyframes that immediately precede and follow the requested time and then smoothly blends between the poses described in those two keyframes.  Joint positions are linearly interpolated, while joint rotations are blended using spherical linear interpolation.  Using this keyframe method enables smooth and accurate playback no matter what the game’s frame rate is or at what speed the animation is being played.  By decompressing keyframes on demand, we minimize the animation system’s memory usage; however, this comes at the cost of additional CPU processing each frame.
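
A sketch of that runtime blend follows.  The structures and the fixed keyframe interval are my simplifications; the spherical linear interpolation (slerp) routine, though, is the standard formulation.

    #include <cmath>

    struct Quat { float x, y, z, w; };

    // Standard slerp between two unit quaternions.
    Quat Slerp(Quat a, Quat b, float t)
    {
        float cosTheta = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
        if (cosTheta < 0.0f) {              // take the shorter arc
            cosTheta = -cosTheta;
            b.x = -b.x; b.y = -b.y; b.z = -b.z; b.w = -b.w;
        }
        if (cosTheta > 0.9995f) {           // nearly identical: plain lerp
            return { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
                     a.z + t * (b.z - a.z), a.w + t * (b.w - a.w) };
            // (renormalize the result in production code)
        }
        float theta = std::acos(cosTheta);
        float s     = std::sin(theta);
        float wa    = std::sin((1.0f - t) * theta) / s;
        float wb    = std::sin(t * theta) / s;
        return { wa * a.x + wb * b.x, wa * a.y + wb * b.y,
                 wa * a.z + wb * b.z, wa * a.w + wb * b.w };
    }

    // Only the two keyframes bracketing the requested time need to be
    // decompressed, keeping memory low at the cost of per-frame CPU work.
    float BlendFactor(float time, float keyframeInterval, int* frameA, int* frameB)
    {
        *frameA = (int)(time / keyframeInterval);
        *frameB = *frameA + 1;
        return (time - *frameA * keyframeInterval) / keyframeInterval; // 0..1
    }

Because the blend factor is computed from the requested time rather than from a frame counter, this same code produces smooth results at any frame rate or playback speed, exactly as described above.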

Asset Loader

When a game wants to display a user’s avatar, it asks Xbox LIVE for the user’s avatar metadata (the super compact structure that describes everything about an avatar), and then passes that metadata object to the asset loading system.  The asset loading system then parses the metadata, locates and loads all of the assets needed for the avatar, and finally decompresses the assets into the raw textures, vertex buffers, etc. for the game to use.  When the Xbox LIVE Avatars were first introduced as part of the New Xbox Experience, all of the assets available for use on an avatar were stored on the Xbox within a file known as the “Avatar Asset Pack”.  Every Xbox had the same package of assets available to it, whether it was online or offline, which made asset loading a fairly simple process.

The following summer, we introduced the Avatar Marketplace and avatar awards, which meant that assets would now be coming from sources other than the Avatar Asset Pack.  The original stock assets are still loaded from the Avatar Asset Pack, while the newer assets are loaded from one of three possible locations (a lookup sketch follows the list):

  1. Download from Xbox LIVE: When playing online against other users, their avatars may be wearing assets that you don’t have available on your Xbox.  In this case, the required assets are downloaded on the fly from Xbox LIVE.  As an optimization, these assets are temporarily cached so that if two avatars are both wearing the same item, it only has to be downloaded once.
  2. Locally Installed Copy: When loading an asset for a remote player’s avatar, if someone on your Xbox has already purchased or been awarded the asset, then the locally installed copy of the asset is used, which speeds things up by avoiding the unnecessary download from Xbox LIVE.
  3. Your User Profile: Whenever you put a purchased/awarded asset onto your avatar, a copy of that item is written into your profile.  This allows you to take your profile on a memory unit over to your friend’s house, and without requiring a connection to Xbox LIVE, your avatar can be loaded even if your friend doesn’t already have everything your avatar is wearing.
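
Internally, that lookup can be pictured as a simple chain of fallbacks.  The sketch below is hypothetical: the function names are stand-ins, and the exact precedence among these sources is my simplification, not a documented contract.  The key point is that the game never sees any of it.

    struct AssetId   { unsigned id; }; // identifies one clothing item or prop
    struct AssetData { /* compressed meshes, textures, animations */ };

    // Assumed loaders, one per source described above:
    bool LoadFromAssetPack(const AssetId& id, AssetData* out);    // stock NXE assets
    bool LoadFromLocalInstall(const AssetId& id, AssetData* out); // purchased/awarded here
    bool LoadFromProfile(const AssetId& id, AssetData* out);      // travels with the wearer
    bool DownloadFromLive(const AssetId& id, AssetData* out);     // fetched, then cached

    bool ResolveAsset(const AssetId& id, AssetData* out)
    {
        return LoadFromAssetPack(id, out)      // original stock items
            || LoadFromLocalInstall(id, out)   // skip an unnecessary download
            || LoadFromProfile(id, out)        // works offline at a friend's house
            || DownloadFromLive(id, out);      // last resort: pull from Xbox LIVE
    }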

As I mentioned at the beginning of this post, the asset loading logic is a key pillar supporting the forward/backward compatibility of the Xbox LIVE Avatar system.  By designing the asset loading system with this principle in mind, we were able to add the new downloadable asset feature a year after avatars were first introduced without needing to make any changes to the existing games.  The game’s code doesn’t know, and in fact doesn’t need to know, where the assets are coming from or how they are loaded and decompressed.  The game simply says to the system: “Can you please give me the textures, meshes, etc. that I need to render this avatar?” and we take care of the rest.

When Things Go Wrong…

As much as I have loved the technical challenges behind designing the Xbox LIVE Avatar system, the unexpected issues that I get to deal with are what make my job a bit more fun than your average software developer’s job.  As I routinely say to my friends and coworkers here at Xbox: “My bugs are cooler than your bugs.”  So, let me leave you with a collage of some of the wackier bugs that I’ve dealt with while working on the Xbox LIVE Avatar system.

©2010 Microsoft Corporation. All Rights Reserved