Updated: 4/8/2003 for spelling, headers, and added links to “Scenegraphs Today” section
Updated: 9/13/2005 updated bio, added link to scenegraphs
Updated: 6/30/2007 moved to wordpress to allow comments — old URLs should forward here, but please update your links.
Updated: 6/16/2012 for new OpenSG features
To help understand where scenegraphs came from, it’s useful to take a quick look at the evolution of graphics languages like OpenGL and DirectX. Early on, real-time graphics existed on special image generation (IG) hardware that contained entire visual databases in closed proprietary form. Modellers created their databases and loaded them onto the hardware IG. Programmers were generally limited to modifying elements of these databases, like the position and rotation of a helicopter or setting the time-of-day.
SGI introduced a more open and programmable option for image generation hardware and, along with it, graphical languages that allowed more direct programmability of the image pipeline. OpenGL (from SGI’s original “GL”) consists of a stream of primitive drawing commands (draw polygon, line, point, etc.), state settings (set color, texture, etc.), and matrix manipulations (push/pop to model-view or perspective matrix, etc.). But it contains very little information that allows the system to self-optimize and improve performance.
This was fine for drawing all sorts of scenes. But polygons that are out of view do consume resources – the hardware doesn’t even know they’re out of view until very late in the rendering pipeline. Unnecessary state changes, extra texture loads, and other common graphics procedures are best avoided if they don’t contribute to the final image.
Culling is the process of removing everything from a scene that will not contribute to the final image, including things that are behind the observer, off-screen, or, in more advanced systems, hidden behind other objects (i.e., occluded). Generalized frustum culling works by comparing each object’s spatial boundaries with a viewing frustum – a truncated pyramid that represents the visible volume of space. OpenGL does this implicitly when you send it polygons – by default, it transforms and clips all polygons to the edges of the viewing volume (most hardware uses a combination of gross clipping and 2D scissoring, but that’s a bit too detailed for this section of the article).
Rather than do the heavy work at the OpenGL and polygon level, scenegraph architects realized they could better perform culling at higher level abstractions for greater efficiency. If we can remove the invisible objects first, then we make the hardware do less work and generally improve performance and the all-important frame-rate.
The way it works is fairly straightforward. Any object that is entirely within the viewing frustum is sent on down to the hardware. For objects that are part in/part out, we usually don’t bother checking individual polygons on the CPU, but we might break a very complex object into several simpler ones so some of them may be culled in or out individually. Of course, any object that is entirely outside the culling volume is rejected early on.
To efficiently perform this calculation, it’s beneficial to organize the objects into a hierarchy or tree, propagating any shared information towards the root of the tree. There are many kinds of trees we could use. But let’s keep it to a simple one-parent, N-child hierarchy, which is a special case of a directed acyclic graph, or DAG.
Such a basic scenegraph will have a root node, with one or more children. Each child node can in turn contain zero or more children, some of which will be the graphical objects we want to draw. The other nodes are there for structural purposes and can get quite complex, as we’ll see later on.
For example, if a building was composed of rooms, a group node of the scenegraph (call it “Building”) might contain several nodes (called “room-0”, “room-1”, and so on). The bounding box of the “Building” node would be defined such that it contains the bounding boxes of all of the rooms. So if the building node was determined to be invisible, then there would be no need to check the child nodes since they would also be invisible.
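The early-out described above can be sketched in a few lines. This is a minimal illustration, not any particular scenegraph’s API: the `Node` class, the bounding spheres (used here instead of boxes for brevity), the two-plane “frustum,” and all node names are assumptions made for the example.

```python
from dataclasses import dataclass, field

OUTSIDE, INTERSECTS, INSIDE = range(3)


@dataclass
class Node:
    name: str
    center: tuple   # bounding-sphere center (x, y, z)
    radius: float   # sphere radius; a parent's sphere encloses its children
    children: list = field(default_factory=list)


def classify(node, planes):
    """Classify a bounding sphere against a list of frustum planes.

    Each plane is (nx, ny, nz, d) with a unit normal pointing into the
    visible volume, so a point p is on the visible side when n.p + d >= 0.
    """
    result = INSIDE
    x, y, z = node.center
    for nx, ny, nz, d in planes:
        dist = nx * x + ny * y + nz * z + d
        if dist < -node.radius:
            return OUTSIDE          # entirely behind one plane
        if dist < node.radius:
            result = INTERSECTS     # straddles this plane
    return result


def collect_leaves(node, visible):
    if not node.children:
        visible.append(node.name)
    for child in node.children:
        collect_leaves(child, visible)


def cull(node, planes, visible, tested):
    tested.append(node.name)
    state = classify(node, planes)
    if state == OUTSIDE:
        return                              # whole subtree rejected
    if state == INSIDE or not node.children:
        collect_leaves(node, visible)       # no further tests needed
        return
    for child in node.children:             # partially visible: refine
        cull(child, planes, visible, tested)


# A toy view volume: the slab -10 <= x <= 10 (only two planes, for brevity).
planes = [(1, 0, 0, 10), (-1, 0, 0, 10)]
world = Node("world", (50, 0, 0), 100, [
    Node("near", (0, 0, 0), 5, [Node("near-room-0", (-2, 0, 0), 2),
                                Node("near-room-1", (2, 0, 0), 2)]),
    Node("edge", (10, 0, 0), 6, [Node("edge-room-0", (7, 0, 0), 2),
                                 Node("edge-room-1", (14, 0, 0), 2)]),
    Node("far", (100, 0, 0), 20, [Node("far-room-0", (95, 0, 0), 5)]),
])
visible, tested = [], []
cull(world, planes, visible, tested)
```

In this toy scene the rooms of the “far” building are never tested at all, and the rooms of the fully visible “near” building are accepted without individual tests; only the building that straddles the edge of the view is refined room by room.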
Another benefit of hierarchy was in ease of manipulation. Given a car containing doors and wheels, it was much easier to move the “car” node and have the child nodes (doors and wheels) follow automatically. Without a hierarchy, one would probably have to move each of these sub-objects synchronously each time the car moved. Of course, that could be solved with some clever back-pointers among dependent matrices, but that’s exactly what scenegraphs are doing in a more formal fashion.
So for example, consider a tank. It might have the following hierarchical representation:
By splitting the object into “nodes” and representing the connectivity between these nodes, we can better manipulate the final polygons of the tank.
We can animate pieces separately. We can rotate the turret, fire the gun, and open the hatch. We can animate the left and right tread to simulate turning.
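To make the parent/child behavior concrete, here is a small sketch of inherited transforms. It works in 2D to stay short, and the node layout (tank, turret, gun, two treads), the offsets, and the `TransformNode` class are all hypothetical; they loosely follow the parts named above rather than reproducing any specific model.

```python
import math


class TransformNode:
    def __init__(self, name, x=0.0, y=0.0, angle_deg=0.0):
        self.name = name
        self.x, self.y, self.angle_deg = x, y, angle_deg
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def local_matrix(self):
        # 2D rotation followed by translation, as a 3x3 homogeneous matrix.
        a = math.radians(self.angle_deg)
        c, s = math.cos(a), math.sin(a)
        return [[c, -s, self.x], [s, c, self.y], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def world_origins(node, parent_matrix=IDENTITY, out=None):
    """Depth-first traversal that accumulates parent transforms."""
    out = {} if out is None else out
    world = matmul(parent_matrix, node.local_matrix())
    out[node.name] = (world[0][2], world[1][2])
    for child in node.children:
        world_origins(child, world, out)
    return out


tank = TransformNode("tank", x=10.0, y=0.0)
turret = tank.add(TransformNode("turret", x=0.0, y=1.0, angle_deg=90.0))
gun = turret.add(TransformNode("gun", x=2.0, y=0.0))
tank.add(TransformNode("left-tread", x=-1.0, y=0.0))
tank.add(TransformNode("right-tread", x=1.0, y=0.0))

before = world_origins(tank)
tank.x, tank.y = 20.0, 5.0      # move only the parent node
after = world_origins(tank)
```

Only the tank node is modified between the two traversals, yet the turret, gun, and treads all end up displaced by the same amount. Rotating the turret would likewise carry the gun along without touching the treads.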
Rendering Advantages — State Sorting
Scenegraphs showed clear benefits for improving rendering performance and making more optimal use of the available hardware resources. By keeping a “retained” model of the virtual world, scenegraphs could make additional optimizations, such as processing culling and drawing in parallel and, most importantly, state sorting.
State sorting is a concept whereby all of the objects being rendered are sorted by similarities in state (texture map, lighting values, transparency, and so on). Since changing state is often an expensive operation due to hardware implementations, this is usually a big performance win, even on the newest hardware. A good example of this is turning lighting on and off — imagine a generic SIMD hardware architecture, executing the same code over four parallel geometry processors. There may be one version of the code for “lit” objects and one version for “unlit.” Changing from lit to unlit state can cause all four processors to flush and reload. But if we can try to turn lighting on or off only once per frame instead of once per object, we can improve performance.
For an even stronger example, imagine we were drawing 100 cars, each containing some polygons in metal (state 1), rubber (state 2) and glass (state 3). It might be beneficial to draw all of the metal objects first, then the rubber ones, and then the glass. We can have 3 state changes, or we can have 300. And at least some state sorting is already required if we’re depth sorting the windows for correct blending results.
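The 3-versus-300 arithmetic is easy to check with a toy draw list. The sketch below only counts how often the active state would change; the tuple layout and the `count_state_changes` helper are made up for the illustration and say nothing about how costly each change is on real hardware.

```python
def count_state_changes(draw_list):
    """Count how many times the active state differs from the previous draw."""
    changes = 0
    current = None
    for state, _obj in draw_list:
        if state != current:
            changes += 1
            current = state
    return changes


# Traversal order: each car draws its metal, rubber, and glass in turn.
cars = [(material, f"car{i}-{material}")
        for i in range(100)
        for material in ("metal", "rubber", "glass")]
unsorted_changes = count_state_changes(cars)

# State-sorted order: all metal, then all rubber, then all glass.
state_sorted = sorted(cars, key=lambda item: item[0])
sorted_changes = count_state_changes(state_sorted)
```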
However, early state sorting was hampered by the fact that if two objects had very different transformations (for example two windshields on two cars in different locations), it was costly to sort these objects by state alone because changing the viewing matrices was also a fairly expensive operation. Today, however, it is usually much cheaper to sort by state first, though exactly which state is the most expensive (and therefore the most important sort key) varies from platform to platform. We might even want our engine to be able to vary how it state sorts depending on the hardware. As we’ll see later in the article, this is where scenegraphs can excel.
Early scenegraphs employed the concept of state encapsulation to facilitate state sorting. This meant each object in the scenegraph would point to a separate state structure: a set of material colors, texture, lighting, transparency, and so on. The scenegraph could then compare these state objects for similarities or just sort by the pointers. Even so, when switching from one state set to another, the system tried to only change the relevant differences and not blindly apply all state parameters, some of which, like texture loads and binds, could be very expensive time-wise.
In these systems, state sharing was achieved by having two graphical objects point to the same state set. This had other advantages, such as being able to quickly switch from “visible light” states to “infrared” states using simple pointer swaps.
In this example, many of the nodes (rectangles) in the Tank hierarchy are assigned states (ovals). When the tank is drawn, we can sort the objects by state and try to minimize the number of state changes. For example, we can draw the left and right tread at the same time and only set the “rubber” state once. Since depth-first traversal would visit these in that order anyway, we haven’t gained much. But we’d want to draw the base and turret at the same time too; so state encapsulation sorting can provide the needed information to make this possible.
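The state-set mechanics described in this section (shared state objects, applying only the differences, and swapping pointers) might look roughly like the following. Everything here is illustrative: `StateSet`, `Renderer`, `Geometry`, and the texture file names are hypothetical, and the sketch assumes every state set defines the same keys.

```python
class StateSet:
    """A bundle of render state that several objects can share."""
    def __init__(self, **settings):
        self.settings = settings


class Renderer:
    """Stand-in for the graphics layer; records each state call it issues."""
    def __init__(self):
        self.current = {}
        self.calls = []

    def apply(self, state_set):
        # Only touch settings that differ from what is already active.
        for key, value in state_set.settings.items():
            if key not in self.current or self.current[key] != value:
                self.current[key] = value
                self.calls.append((key, value))


class Geometry:
    def __init__(self, name, state):
        self.name = name
        self.state = state      # pointer to a (possibly shared) StateSet


# Hypothetical texture names, for illustration only.
visible_light = StateSet(texture="camo.rgb", lighting=True, blend=False)
infrared = StateSet(texture="camo_ir.rgb", lighting=False, blend=False)

base = Geometry("base", visible_light)
turret = Geometry("turret", visible_light)   # shares the same state set

renderer = Renderer()
for obj in (base, turret):
    renderer.apply(obj.state)
calls_visible = len(renderer.calls)

for obj in (base, turret):
    obj.state = infrared                     # pointer swap, no data copied
    renderer.apply(obj.state)
calls_total = len(renderer.calls)
```

Drawing the base sets three values; the turret, sharing the same state set, sets none. Switching to the infrared set changes only the texture and lighting, since the blend setting is already correct.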
Early scenegraphs were primarily transform graphs, representing object hierarchies in terms of inherited parent/child transformation relationships. For example, a car node might have four wheel-nodes that would be specified relative to axle and steering nodes (their center of rotation), which would in turn be specified relative to the car. Or, perhaps, a building might contain walls, floors, windows, and interior rooms, which might contain desks and chairs and so on.
Dynamic Coordinate Systems
Dynamic Coordinate Systems (DCS) were added for things like our tank, where we wanted the tank to be able to move around from frame to frame and the turret to rotate independently. DCS nodes were originally more expensive, mainly because there was extra bookkeeping information that could not be pre-computed, but instead needed to be re-computed when the object moved, or at worst each frame.
What bookkeeping? Take culling, for example. It often uses bounding boxes or spheres to contain all of a node’s children and their bounding boxes, recursively. If the node’s bounding volume is invisible, all of the children are therefore invisible. When a child moves, the bounding box needs to be re-computed. So we might write the logic as: re-compute the bounding box only when a child moves. But what happens when all of the children move? Do we re-compute the bounding box each time or wait till they’re all done moving? In that case, it might be better to re-compute the bounding box once per frame, or better yet, store a flag that says if any of the children changed that frame and then re-compute the box at most once per frame. This sort of tradeoff is the kind of thing scenegraphs excel at, where immediate mode rendering does little to help.
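The “flag it now, re-compute at most once per frame” option can be sketched with a dirty flag. The example uses one-dimensional extents to stay short, and the `BoundedGroup` class and its method names are assumptions for the illustration.

```python
class BoundedGroup:
    """Group whose bounds are recomputed lazily, at most once per query."""

    def __init__(self):
        self.children = []      # child extents as [lo, hi] along one axis
        self.dirty = False
        self.bounds = None
        self.recomputes = 0

    def add_child(self, lo, hi):
        self.children.append([lo, hi])
        self.dirty = True
        return len(self.children) - 1

    def move_child(self, index, dx):
        self.children[index][0] += dx
        self.children[index][1] += dx
        self.dirty = True       # cheap: just remember that something moved

    def get_bounds(self):
        if self.dirty:
            self.bounds = (min(c[0] for c in self.children),
                           max(c[1] for c in self.children))
            self.recomputes += 1
            self.dirty = False
        return self.bounds


group = BoundedGroup()
for lo in (0.0, 5.0, 10.0):
    group.add_child(lo, lo + 2.0)
first = group.get_bounds()

for i in range(3):                  # every child moves during the frame
    group.move_child(i, 1.0)
second = group.get_bounds()         # recomputed once, not three times
third = group.get_bounds()          # nothing moved: cached value is reused
```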
Static Coordinate Systems
In the case of buildings, since they don’t move, we could use static coordinate systems (called SCS in Performer). These were simple matrix transformations without a lot of overhead. The main difference is that SCS nodes could pre-compute important information, like bounding boxes and collision information. More importantly, in an MP (multi-process) system, SCS nodes are guaranteed to remain the same from process to process, whereas DCS nodes need to be buffered so that changes in one process don’t have immediate effects in another.
Aside: for a quick example of the sort of MP problems that arise, consider two cubes that are being manipulated in one process and drawn in another. If the first process modifies both cubes before either is drawn, things are happy. If the first process moves the cubes after they’re drawn, things are okay, but you won’t see the change until the next rendered frame, by which time something else might have happened. But if the first process modifies one cube and then both are drawn before it can modify the other, you can see strange artifacts that make the cubes appear to oscillate with respect to each other. Worse still, in a true MP system, the first process can be in the middle of updating one cube while the other is drawn, causing unpredictable results.
We may not be used to using multi-threading or multi-processing on Wintel boxes, but it’s becoming more and more important, even on single CPU machines. With hyper-threading, AGP bottlenecks, and consoles that contain many independent processors, synchronizing a dedicated “draw” process with a main application, possibly running at a different frame-rate, is going to be a challenge more and more people will be familiar with.
Adding Groups, LOD, and other useful nodes
In addition to coordinate system nodes and basic graphical objects, scenegraphs added other types of nodes to take advantage of the “retained mode” and frame-to-frame coherence optimizations. Most of these node types derive from the basic group node, which acts as a simple container for any number of children, spatially proximate or not, and imposes no restrictions on them.
Level of Detail nodes use computations about how far an object is from the observer to “dial in” the amount of detail shown or switch between two or more child nodes which represent an object at various fidelities. The basic idea is that a far-away object can be rendered at lower fidelity (fewer polygons, smaller textures, etc.). Many schemes have been invented to deal with object switching or fading between LOD states, and the state of the art lies in various so-called continuous level of detail schemes.
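The simplest switching form of LOD reduces to a range lookup. In this sketch the `LODNode` class, the switch distances, and the child names are invented for illustration; fading and continuous LOD schemes are not covered.

```python
import bisect


class LODNode:
    def __init__(self, ranges, children):
        # ranges[i] is the greatest distance at which children[i] is used;
        # ranges must be sorted in increasing order.
        assert len(ranges) == len(children)
        self.ranges = list(ranges)
        self.children = list(children)

    def select(self, distance):
        """Return the child to draw, or None when beyond the last range."""
        i = bisect.bisect_left(self.ranges, distance)
        return self.children[i] if i < len(self.children) else None


tank_lod = LODNode([50.0, 200.0, 1000.0], ["high", "medium", "low"])
close_up = tank_lod.select(10.0)
mid_range = tank_lod.select(120.0)
too_far = tank_lod.select(5000.0)
```

Returning `None` past the last range treats the object as too small to matter, which is one common choice; another is to keep drawing the coarsest child.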
Switch nodes are a form of group node that sets the active child node (zero or one out of N children) based on some key value (e.g., 0 to n-1). Sequence nodes are a form of switch where the key value cycles based on time. Animations can be made with sequence nodes – each frame of animation is stored as a unique child object and the parent sequence node controls the active frame. A DCS-Sequence is useful for motion-captured joint animation, for example, where an array of transformations is applied in the same way a sequence node iterates through the list of children (it used to require having N SCS nodes under a Sequence, which was wasteful). DCS-Sequences can, for example, be efficiently compressed and stored and take very little CPU time to play back (though their interactivity leaves something to be desired).
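As a rough sketch of the switch and sequence idea, assuming hypothetical `SwitchNode` and `SequenceNode` classes and a fixed duration per child:

```python
class SwitchNode:
    def __init__(self, children, active=None):
        self.children = list(children)
        self.active = active        # index of the active child, or None

    def active_child(self):
        return None if self.active is None else self.children[self.active]


class SequenceNode(SwitchNode):
    def __init__(self, children, frame_time):
        super().__init__(children, active=0)
        self.frame_time = frame_time    # seconds each child stays active

    def update(self, t):
        # The key value is derived from time and wraps around the children.
        self.active = int(t / self.frame_time) % len(self.children)


door = SwitchNode(["door-closed", "door-open"])     # nothing shown yet
door.active = 1

walk = SequenceNode(["pose0", "pose1", "pose2", "pose3"], frame_time=0.5)
walk.update(1.25)
pose_a = walk.active_child()
walk.update(2.25)                                   # wraps back to the start
pose_b = walk.active_child()
```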
SGI’s Performer was an early example of a scenegraph that was primarily a multi-process transformation graph. Performer had state objects which did not exist in the hierarchy per se, but were referenced by graphical objects. Performer made many advances in the use of MP programming techniques to optimize performance on SGI’s multiprocessor systems. Performer did a great job of state sorting, though an early design decision limited state sorting to only under individual DCS nodes – in other words, objects could not be grouped for similar-state rendering if they had different DCS nodes above them. Performer also made extensive use of traversal masks and per-node callbacks for special effects.
Adding State Nodes to the Tree
Later scenegraphs added the notion of state as an actual node type. This had some advantages, especially in terms of being able to aggregate common state. For example, if there were 100 brick objects, we could insert a “brick” material node as parent to those 100 objects and the scenegraph render process would implicitly render these together. In fact, one of the principal benefits of state nodes is that explicit state sorting is given to the scenegraph modeler. For skilled modelers, this provides more control and more potential for optimization than automatic state sorting. But in the general case, it probably is not a win.
Why? An illustrative example takes 100 tank objects, each with three states (say tread, metal, and camo). But since we want the tanks to each be independently movable, they would be grouped with each tank having its own parent DCS node, plus some more DCS nodes for the turret and tread wheels if desired. Below that top DCS, we’d see the three state nodes and below those, the individual geometry (shared or instanced). This means, in practical terms, that we’d have 100 tread, metal, and camo nodes and that we’d change state at least 300 times during the rendering of the scene. A better scheme might group the graphical objects by the three common states, but that would require each geometry object to have its own DCS and we’d run the risk of a turret forgetting to drive on when the base of the tank does.
Paradigm’s VisKit is a good example of this approach. It also added other useful node types like “cameras” (representing the observer in the scenegraph, rather than implicitly at 0,0,0 in modelview space). But in other ways, VisKit was very similar to early versions of Performer (not surprisingly, since its designer was the person who had managed the early Performer team at SGI).
Adding Action or Event Nodes
Many scenegraphs had the notion of per-node callbacks that the programmer could specify. In Performer, each node could have multiple callbacks, depending on the context. In Cull processing, any cull callbacks (if present) would be invoked to affect the culling result. In Draw processing, any draw callbacks would similarly be invoked for drawing special effects. Since these processes worked in a hierarchical depth-first traversal fashion, pre- and post- traversal callbacks were often provided to let things be done before and/or after traversal of child nodes. Application-side callbacks were also provided to do computation or automation on a node once each frame (e.g., for conditional logic, for animation, to move a DCS, collect statistics, and so on).
However, the main drawbacks of such automatic actions per node are twofold. First, they are very difficult to schedule efficiently, since the application does not know in advance which nodes will be visible or how much time any given callback might consume. They can take an arbitrary amount of time to execute, and generally block further processing of culling or drawing (blocking on draw can cause “bubbles” or stalls in hardware queues). They are also somewhat scattered in terms of cache coherence and branch prediction—similar operations are almost never performed in repeated series. In Performer apps, for example, callbacks were sometimes found to cause CPU bottlenecks and non-deterministic behaviors.
The second drawback of callbacks is more complicated. Since app-side callbacks need to be invoked before the culling or drawing traversals begin (since the app can change the positions of objects, moving them in and out of view, for example), the app traversal generally visits every object in the scenegraph, even those that are way off screen. This can be very costly and ultimately defeats the advantages that culling gives over a brute-force immediate mode implementation.
A better system might do some culling first and then do per-node processing based on how close an object was to being in view. Far away objects usually need limited processing, usually just to determine when they will enter the view. And the app process may move an object. So there’s still a cyclic dependency between this optimization and culling which needs to be addressed.
Inventor existed at SGI at roughly the same time as Performer, with a very different approach. The goal there was usability over performance. The result was a very elaborate and highly re-usable set of scenegraph nodes, but at the cost of performance. So much so that Inventor was relegated to academic projects and rapid prototyping and, to my knowledge, was not used for any serious (i.e., high performance) real-time efforts. Many people tried to mix Performer and Inventor to get the best of both worlds, but this was almost always a dead end.
Adding Event Nodes
Event nodes were a later addition to systems like Inventor and its descendant, VRML. The idea behind a scenegraph event system is fairly clever in theory. If the camera or observer is an object in the scenegraph, we can test to see when this object collides with one or more invisible “trigger” volumes also in the scenegraph. A trigger or sensor object could be linked to an effector or action object that would animate a node, for example. Events could be mouse or keyboard based too, so if you click on a 3D button, something else happens in the virtual world.
In this way, one could write an entire user-interactive program in a scenegraph. Doors could be opened, lights turned on by flicking virtual switches, and so on. All data driven.
Virtual Reality Modeling/Markup Language was the extension of Inventor, drafted after many competing forces finally came together (led by SGI at the time). It was very similar to Inventor in form and function and suffered from many of the same performance disabilities. But the main benefit was that it was highly self-contained and simple to transport across network connections. It also added concepts for extensibility and portability that Inventor largely lacked (being SGI-specific) and is now being further revised in something called Web3D or X3D or VRML200x.
Body and Facial Animation
X3D and MPEG-4 add special node types for Body and Facial Animations, since for humans, there are some clever ways to extract differences from a standard (implicit) model for better compression. We can encode phonetic visual expressions (visemes) as well as joint animations for elbows and wrists using many fewer bits than if we were coding these things generically.
GeoSpatial problems (like drawing the entire earth) require some special nodes to deal with the inherent hardware precision limitations of graphics hardware, namely single precision floating point. True geospatial information requires more than 23 bits of mantissa to properly represent and scenegraphs are generally done using 32-bit floats, so we add some new GeoNode types to various scenegraph schemes. GeoVrml is one such approach, driven largely by the folks at SRI. Keyhole used its own approach for EarthViewer.
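A quick calculation shows how coarse single precision becomes at planetary scale. The sketch below uses the WGS 84 equatorial radius and Python’s `struct` module to round values to 32-bit floats. The helper names are made up, and the local-origin remedy at the end is a common general technique shown as an assumption, not a description of how GeoVrml or any other system named here works.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_spacing(x):
    """Gap between adjacent 32-bit floats near x (24 significant bits)."""
    _, exponent = math.frexp(x)
    return 2.0 ** (exponent - 24)


EARTH_RADIUS_M = 6378137.0   # WGS 84 equatorial radius, in meters

# Near the Earth's surface, adjacent float32 values are half a meter apart.
spacing = float32_spacing(EARTH_RADIUS_M)

# Two points 10 cm apart collapse or jump by whole half-meter steps.
a = to_float32(EARTH_RADIUS_M + 0.2)
b = to_float32(EARTH_RADIUS_M + 0.3)

# Assumed remedy: subtract a nearby origin in double precision first,
# then hand the small local offset to the 32-bit pipeline.
origin = EARTH_RADIUS_M
local = to_float32((EARTH_RADIUS_M + 0.2) - origin)
```

With coordinates measured in meters from the Earth’s center, anything finer than about half a meter is lost, which is why vertices jitter or snap when the raw values are pushed through a single-precision pipeline.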
“Potentially Visible Set” (PVS) is a broad term for a sort of generalized culling technique. In basic culling, we take the entire scene and recursively find which objects fall on or within some bounding volume, usually a frustum (a truncated pyramid, approximating the viewing volume). In generalized culling, we might have pre-computed lists of objects that are spatially grouped (like “group” nodes, only they need not be hierarchically associated) and probably visible at the same time. Other techniques might make use of shadow or blocker objects that rule out certain regions of space.
The “Cell and Portal” approach, for example, usually groups the world into rooms or cells, with each cell having a list of objects in it and a list of portals, doors, or other connections to the adjacent (or even distant but connected) cells. When a portal is deemed visible, the culling routine looks at the portal’s connected cell and checks all of its portals, and so on and so on recursively, each time adding (ORing) the overall set of visible objects and each time, reducing (ANDing) the frustum to the portal (door) we can see through. In simpler implementations, objects within a single cell are considered visible whenever their cell is culled in. Often traditional spatial culling is used to further narrow the visible set.
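The recursion described above can be sketched with a drastically simplified “frustum”: a one-dimensional screen-space interval. The `Cell` and `Portal` classes, the cell names, and the assumption that each portal’s screen extent has already been computed for the current camera are all made up for the example; a real implementation would clip a 3D frustum against portal polygons.

```python
from dataclasses import dataclass, field


@dataclass
class Portal:
    window: tuple   # (lo, hi) screen-space extent of the opening (assumed precomputed)
    target: str     # name of the cell on the other side


@dataclass
class Cell:
    objects: list
    portals: list = field(default_factory=list)


def intersect(a, b):
    """AND two intervals; None means nothing can be seen through both."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def visible_objects(cells, start, view):
    visible = set()

    def visit(name, window, path):
        cell = cells[name]
        visible.update(cell.objects)            # OR into the visible set
        for portal in cell.portals:
            if portal.target in path:
                continue                        # don't look back the way we came
            narrowed = intersect(window, portal.window)
            if narrowed is not None:
                visit(portal.target, narrowed, path | {portal.target})

    visit(start, view, {start})
    return visible


cells = {
    "hall": Cell(["coat-rack"], [Portal((-0.5, 0.0), "kitchen"),
                                 Portal((1.5, 2.0), "garage")]),
    "kitchen": Cell(["stove"], [Portal((-0.5, 0.0), "hall"),
                                Portal((-0.2, 0.3), "pantry"),
                                Portal((0.4, 0.6), "cellar")]),
    "pantry": Cell(["shelves"]),
    "garage": Cell(["car"]),
    "cellar": Cell(["barrels"]),
}
seen = visible_objects(cells, "hall", (-1.0, 1.0))
```

Here the garage door is off screen, so the garage is never visited. The cellar door is on screen in principle, but it does not overlap the narrowed window through the kitchen door, so the cellar is rejected too. As in the simpler implementations mentioned above, every object in a visited cell is treated as visible.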
What’s most interesting about Cells and Portals is that it can also generalize the notion of rendering to framebuffers and destinations and make use of standins or impostors. A doorway can be a portal to another room, or it can be implemented as a textured polygon, pre-rendered from an image of that room from the correct perspective. If it’s done right, there’s no way to tell the difference. Mirrors are implemented in much the same way. A mirror can be rendered by inverting the view matrix and projecting the camera through the mirror, then drawing normally into the framebuffer. Or it can be rendered by projecting the camera through the mirror, rendering the scene to a texture, and applying the texture to the mirror as a painting.
The downside of PVS techniques is that they’re usually added to scenegraphs as an afterthought and not built in from the ground up. NetImmerse is/was a game engine that made extensive use of Cells and Portals.
Inventor is easy to use. It provides a rich set of node types which make it easy to get something up and running quickly. And it adds some nice 3D GUI types too, which make producing a finished application that much quicker.
However, Inventor is a poor performer. It suffers from some critical design flaws, such as virtualizing all interfaces, even to atomic data members, which doesn’t help performance any (even COM objects try not to virtualize member getters and setters). But the biggest flaw is in the execution model: the active nodes in the scenegraph consume CPU time while the scene is being rendered. And since all nodes must be visited, view frustum culling is not common, even at the rendering stage. So richly immersive scenes will be slow unless the programmer makes the effort to optimize them.
[note: some people who use Inventor have written in to say many of my criticisms have been addressed more recently.]
VRML suffers from many of the same performance limitations as Inventor. It’s nice to be able to specify what are essentially dataflow programs right in the scenegraph by hooking sensors to effectors using routes or linkages and place active clickable objects in the world with a few lines of text. But VRML suffers from a severe namespace problem, where declared objects can be ambiguously or incompletely defined (via dangling external references) and so on.
Just looking at the dataflow problem gives some sense of how buggy a VRML system can be. If a scenegraph finds an effector node first and then finds a sensor node that drives the effector, what is the proper way to process this? Do we normally process the effector node first, then the sensor, thereby potentially computing the effector again this frame (risking an infinite loop or at least a performance hit to fix the problem)? Or do we wait till next frame where it may be too late? Or, perhaps, do we sort the entire scenegraph to make sure all sensors come before their down-wind effectors (if that is even possible given the cyclic possibilities)? This could bring up problems with state and transform dependencies and make objects go haywire.
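The “sort all sensors before their effectors, if that is even possible” option amounts to a topological sort of the routes, which fails exactly when the routes form a cycle. The sketch below uses Python’s standard `graphlib` module (Python 3.9 or later); the `evaluation_order` helper and the node names are hypothetical, and it ignores the state and transform dependencies mentioned above.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(routes):
    """Order nodes so every source is processed before its destinations.

    routes is a list of (source, destination) pairs, e.g. sensor -> effector.
    Returns None when the routes contain a cycle and no such order exists.
    """
    sorter = TopologicalSorter()
    for source, destination in routes:
        sorter.add(destination, source)     # destination depends on source
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


chain = [("door-sensor", "door-opener"), ("door-opener", "hall-light")]
order = evaluation_order(chain)

# A feedback loop: the light also drives the sensor that triggers it.
loop = chain + [("hall-light", "door-sensor")]
no_order = evaluation_order(loop)
```

For the acyclic chain a valid per-frame order exists. For the feedback loop there is none, so the system has to fall back on one of the other choices above, such as deferring part of the loop to the next frame.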
Given global DEF/USE semantics, can we have two objects using the same global name or is this an error? It could be accidental. If so, did we mean to use the first one or the last one? If we try to use the hierarchy to segment the namespace (as is done in Java, for example), what happens when we subtly reorganize the scenegraph because two objects that had been attached now can move independently (for example, a car riding on a moving flatbed train now drives off at the station)? What if we want to reorganize the scenegraph for better state-based performance on different target hardware configurations? We could easily break our nice scenegraph-based program in the process.
Given that we want to hook some nodes up to other nodes to enable event processing, we’d also like a guaranteed consistent way of naming objects that doesn’t change after spatial or state sorting or doesn’t even change if parts of the scene are currently loaded or not (early scenegraphs were entirely memory resident). We want a logical or semantic naming scheme, like in namespaces. We want handles that persist and reflect structures that may not even be local.
Scenegraphs today are quite sophisticated and quite readily available, even free and open sourced. They’re generally well suited for cross-platform game development. But current scenegraphs do have some important weaknesses. One is an overloading of the tree concept with all sorts of bells and whistles that slow things down. Another is that without structural changes, coordinating changes in distributed systems is difficult. Very few of the current crop of scenegraphs were designed with MMOGs in mind.
The heart of the problem is an overloading of what was once a nice, straightforward performance improvement over immediate mode OpenGL. We moved to hierarchies so we could cull and draw more efficiently. Then we added in all this extra stuff, like hanging ornaments on a Christmas tree, except that some of the ornaments are nice juicy steaks and some are whole live cows. They simply don’t belong.
Put another way, the original transform-graph concept sought to organize the visual database spatially to take advantage of grouping proximate or linked objects. We propagated shared spatial information up the tree, where we could make earlier traversal decisions and save time in true log-n tree fashion.
But we have more than one way of organizing our visual database. Culling and PVS techniques want to have spatially organized databases for optimum performance.
If the scenegraph is instead organized largely by state, then we might need to cull each 3-state tank (in the tank example) three times, once for each articulated part, instead of being able to cull out each tank once and only once. But if we want to get the best hardware performance, we really do want to sort the visible set by the most expensive state changes first. Moreover, since states don’t change that often, we don’t want to re-sort the scene every frame. But if we start with a spatial view each time and sort only the visible objects, that seems to be what we’re stuck with (as was the case with Performer, believe it or not). If we re-sort the whole scenegraph for state optimization (only once, hopefully), we lose the nice spatial coherence we count on for fast culling.
By executing actions at each node during a depth-first traversal, we are most likely invoking bits of code in an arbitrary (almost random) order. This runs counter to the advanced scheduling many compilers try to do to take advantage of CPU branch prediction and pipelining, instruction pre-fetch and high-speed local caching, to name a few. Instruction and data cache misses can affect performance by up to 10x on many systems. So doesn’t it make sense that, if we have 100 physics nodes and 100 inverse kinematics animation nodes, we try to process those nodes together, just like we tried to do for state (especially for systems with special vectorizing or SIMD capabilities)? This gives yet another competing organizational approach to how to optimize the scenegraph.
Put all of these together and it’s easy to see that the current evolution of scenegraphs has taken a wrong turn somewhere. And it will require a change in approach to move past the roadblock.
Granted, it is probably impossible to find a single perfect organization for a scenegraph that simultaneously optimizes for spatial, state, semantic, and CPU considerations. Some people try to hand-design theirs to straddle the fence and make the best of what they have. But a better idea is to remove one of the fundamental constraints: that there need be a single scenegraph organization for a given visual database.
It is entirely possible that we can have a single set of objects, call it an object soup, but have two, three, four, or more hierarchies linking these objects into independent and complementary organizations. It’s been on the wish list of a number of scenegraph designers for years, though it’s never been a requirement before distributed databases came along.
But how to implement this is another matter. The solution, it seems, lies in the separation of concepts of scenegraph “nodes” from the “objects” they represent. By making shared objects live in a soup, we minimize the amount of waste and miscoordination we might see with four or more simultaneous object hierarchies. This way the “node” part of an object is just a few bytes – just enough to point to the object in the soup and to the parent/child/sibling relationships in this particular view. All of the real “meat” is kept once in the object, which ideally contains back pointers to each node in each graph, limited to a small number like four.
Is this rocket science? Not really. Relational databases have separated indices from data since the dawn of time. And scenegraphs are just one way of indexing into big visual databases. Once scenegraph designers come to grips with that, the rest is downhill.
The second problem is how to correlate among multiple database views (i.e., sets of indices). Since lightweight nodes in two views point back to the same object, it’s easy to see how, given a node in one database view, we could find the corresponding node in another view: just follow the back pointers. This lets us cull using the optimized spatial view and render using the hardware-optimized state view.
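Here is a small Python sketch of that hop between views. Everything in it is an assumption made for illustration: the `Obj` and `Node` classes, the flat list standing in for a real spatial hierarchy, and a 1D range test standing in for a proper frustum test. The point is only the flow: cull by walking the spatial view, flag the matching state-view nodes through the back pointers, then draw in state order.

```python
class Obj:
    """Shared object; 'nodes' holds the back pointers, one per view."""

    def __init__(self, name, x, texture_id):
        self.name, self.x, self.texture_id = name, x, texture_id
        self.nodes = {}


class Node:
    """Lightweight per-view node that registers itself with its object."""

    def __init__(self, view, obj):
        self.view, self.obj = view, obj
        self.visible = False
        obj.nodes[view] = self


def corresponding(node, view):
    """Hop from a node in one view to the same object's node in another."""
    return node.obj.nodes[view]


objects = [
    Obj("tree", 5.0, texture_id=2),
    Obj("rock", 50.0, texture_id=1),
    Obj("hut", 8.0, texture_id=1),
    Obj("bush", 2.0, texture_id=2),
]

spatial_nodes = [Node("spatial", o) for o in objects]
# The state view is sorted once, up front, by texture id.
state_nodes = sorted((Node("state", o) for o in objects),
                     key=lambda n: n.obj.texture_id)


def cull(x_min, x_max):
    """Walk the spatial view and flag visibility on the state-view nodes."""
    for s in spatial_nodes:
        corresponding(s, "state").visible = x_min <= s.obj.x <= x_max


def draw_order():
    """Walk the state view in its pre-sorted order, skipping culled nodes."""
    return [n.obj.name for n in state_nodes if n.visible]


cull(0.0, 10.0)
order = draw_order()  # the rock is culled; the rest keep their state order
```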
The heart of an efficient distributed database implementation, then, is using the spatial view to limit what happens in the other views (rendering, culling, physics, animation, and so on) and distributing changes in the spatial view among disparate systems. The state, semantic, and application views do not generally change, except for visibility and priority per time interval, so the real meat of the task is in synchronizing the spatial views.
The semantic or logical view of a visual database is just a convenient way of accessing objects in the object soup. Think of it as the Google (albeit local, not web-wide) of visual databases. The organization is arbitrary and entirely up to the developer. A developer might use the semantic view as a large dictionary of objects, organized by object type, subtype and so on. Or a game may divide objects up by their role in game play. But the main idea is that the leaves of this tree are the actual objects in the world.
What’s important is that the logical/semantic structure is well known (published) for all concurrent developers to use. It is a rendezvous point, as well as a convenience.
But it can be used for more elaborate schemes as well. For example, if the semantic view is organized into “vehicles” and then “cars” under that, we could perform some operation on all of the game universe’s cars at once (perhaps, proximity tracking).
And there is no reason why objects could not be located under more than one branch of the semantic tree. There could be a branch called “physical objects” as well as the “vehicle/car” branch. One could set the physics computation to process everything under the “physical objects” branch automatically.
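A minimal Python sketch of such a semantic view follows. The `Branch` and `SemanticView` classes and the slash-separated path syntax are assumptions for this example, and plain strings stand in for real objects from the soup.

```python
class Branch:
    """One branch of the semantic tree: sub-branches plus leaf objects."""

    def __init__(self):
        self.children = {}
        self.objects = []


class SemanticView:
    """Hypothetical semantic index addressed by paths like 'vehicles/cars'."""

    def __init__(self):
        self.root = Branch()

    def _branch(self, path, create=False):
        branch = self.root
        for part in path.split("/"):
            if create:
                branch = branch.children.setdefault(part, Branch())
            else:
                branch = branch.children[part]
        return branch

    def add(self, path, obj):
        """File an object under a branch; it may be filed under several."""
        self._branch(path, create=True).objects.append(obj)

    def find(self, path):
        """Return every object at or below a branch, without duplicates."""
        found, stack = [], [self._branch(path)]
        while stack:
            branch = stack.pop()
            for obj in branch.objects:
                if obj not in found:
                    found.append(obj)
            stack.extend(branch.children.values())
        return found


view = SemanticView()
view.add("vehicles/cars", "sedan")
view.add("vehicles/cars", "taxi")
view.add("vehicles/trucks", "dump truck")
view.add("physical objects", "sedan")  # same object, second branch
view.add("physical objects", "crate")

cars = view.find("vehicles/cars")         # e.g., for proximity tracking
physical = view.find("physical objects")  # e.g., for the physics pass
vehicles = view.find("vehicles")
```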
As discussed earlier, the State View is intended to be a platform-specific state sort and state aggregation view. For a platform on which texture fetching is very expensive, we might see textureIDs as the most significant branches in the tree, thereby minimizing the number of textureID changes. On another platform, lighting mode might be more expensive to change. The State View can generally be computed on the client at load-time and does not change much. Which objects are on or off does, but their fundamental draw order does not.
One exception to that rule is depth-sorted objects, like transparent polygons. Here, we might have a branch of the state view that is somewhat dynamic without slowing down the rest of the system.
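The sketch below shows one way this could look in Python. The draw records, the `significance` list, and both helper functions are hypothetical. Opaque draws are sorted once by state, with the most expensive state as the most significant key, while transparent draws sit in their own branch that gets re-sorted far to near.

```python
def state_changes(draws, keys):
    """Count changes of the given state keys between consecutive draws."""
    return sum(prev[k] != cur[k]
               for prev, cur in zip(draws, draws[1:])
               for k in keys)


def build_state_order(draws, significance):
    """Opaque draws sorted by state; transparent draws last, far to near."""
    opaque = [d for d in draws if not d["transparent"]]
    transparent = [d for d in draws if d["transparent"]]
    # Most expensive state comes first in 'significance'.
    opaque.sort(key=lambda d: tuple(d[k] for k in significance))
    # This is the dynamic branch: it depends on the current viewpoint.
    transparent.sort(key=lambda d: d["depth"], reverse=True)
    return opaque + transparent


draws = [
    {"name": "a", "texture": 1, "lighting": 0, "transparent": False, "depth": 5},
    {"name": "b", "texture": 2, "lighting": 0, "transparent": False, "depth": 3},
    {"name": "c", "texture": 1, "lighting": 1, "transparent": False, "depth": 9},
    {"name": "d", "texture": 2, "lighting": 1, "transparent": False, "depth": 1},
    {"name": "glass", "texture": 3, "lighting": 0, "transparent": True, "depth": 2},
    {"name": "smoke", "texture": 3, "lighting": 0, "transparent": True, "depth": 7},
]

# Platform where texture changes are the expensive ones.
texture_first = build_state_order(draws, ["texture", "lighting"])
# Platform where lighting mode changes are the expensive ones.
lighting_first = build_state_order(draws, ["lighting", "texture"])

unsorted_texture_changes = state_changes(draws[:4], ["texture"])
sorted_texture_changes = state_changes(texture_first[:4], ["texture"])
```

Only the transparent branch needs re-sorting as the viewpoint moves; the opaque ordering can be computed at load time and left alone.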
One of the latest buzzwords in modern computer graphics is Shading Languages. The main idea is that complex images can be constructed by mathematically combining (adding, subtracting, multiplying, etc.) many simpler images, often through a small assembly language program instead of using actual framebuffer operations. For example, a nice 3D bump mapped brick texture (where the bump mapping provides nice light and shadow cues to make the brick seem more 3D) might be described as a combination of a flat red texture, two or three rendering stages of bump mapping (rendering light and shadows), a light map for global shadows, and perhaps a specular highlight map if the object has little glass or metal bits.
Shaders can be expressed as programs, algorithms, or as a “shading tree,” where the constituent sub-shaders are broken down in hierarchical fashion, like we see for spatial transformations. This shading tree might be explicit, if the underlying scenegraph supports such advanced concepts, or it might be implicit, as an abstract representation of (for purposes of understanding) some pre-compiled code.
It’s important to realize that the shading tree we see could easily vary from hardware platform to hardware platform, depending on the graphics capabilities and other factors. For example, some hardware supports advanced bump mapping in a single operation – so the state tree node in that case would be one node. Other hardware might not support bump mapping at all, but we can still achieve bump mapping effects by making multiple simpler rendering passes (one for the texture, one for light areas and one for dark areas). So the shading tree in that case might have a parent node with three children, representing the three passes.
(note: here, boxes are states and circles are rendering objects)
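As a toy illustration of how the same shader could expand into different trees, here is a Python sketch. The `ShadeNode` class and the `hardware_bump_mapping` flag are invented for the example; a real system would query the graphics API for capabilities.

```python
class ShadeNode:
    """Hypothetical shading-tree node; leaves are individual render passes."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def passes(self):
        """Flatten the tree into the list of passes that get rendered."""
        if not self.children:
            return [self.name]
        return [p for child in self.children for p in child.passes()]


def bump_mapped_brick(hardware_bump_mapping):
    """Build the shading tree for one shader on a given platform."""
    if hardware_bump_mapping:
        # The hardware does it in one operation, so the tree is one node.
        return ShadeNode("bump-mapped brick")
    # Otherwise fall back to three simpler passes under a parent node.
    return ShadeNode("bump-mapped brick", [
        ShadeNode("base texture"),
        ShadeNode("light areas"),
        ShadeNode("dark areas"),
    ])


single_pass = bump_mapped_brick(hardware_bump_mapping=True).passes()
multi_pass = bump_mapped_brick(hardware_bump_mapping=False).passes()
```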
Shading trees will also vary from software API to software API. But since we’ve separated out the notion of our Spatial View (see below) from the State View, this affords us a good place to handle the interface with whichever underlying graphics API we might want to use (such as OpenGL or DirectX).
The presence of an Application View is not a strict requirement. In fact, it is the least useful view out of the bunch, mainly because compilers are so much better at scheduling code on CPUs. And a big problem with data-driven programs is that they can be very hard to debug, and their pseudo race conditions are hard to stamp out. But, on the other hand, they’re very nice for rapid prototyping and platform-neutral abstraction. They’re also quite useful for giving game players the ability to dynamically change game behaviors (e.g., mod programming or simple tunability).
The Spatial View, on the other hand, is the most important view from a distributed database point of view (and with all of the MMOG pushes out there, who isn’t building a distributed database these days?). By organizing the world into spaces and sub-spaces, we can efficiently decide how to route messages, prioritize computations, and cull the database to minimize rendering time and network traffic.
The subdivision of the world into hierarchical spaces is not arbitrary, but there are a number of valid schemes for doing so. What is important is that the subdivision scheme be fairly well tuned to the culling procedure, and that no node has too many or too few children (i.e., the tree is neither too tall nor too wide). In other words, the same rules apply as for well-balanced trees in general.
The choice of whether spaces are static or dynamic is also open. For quad tree schemes, the subdivision is relatively static. If an object moves, it might cause new quad cells to be created or destroyed, but no quad cells ever move. For spheres or bounding box trees, the bounding volumes will likely move as the objects they contain move. Rules for stretching volumes and forcing children in or out of them are also flexible and fairly easy to implement as iterative solutions (with local, not global optimization). In this scheme, bounding volumes can overlap, but they need not do so.
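One possible stretching rule, sketched in Python, is shown below. It is an assumption on my part rather than a standard algorithm: when a leaf moves, each ancestor sphere grows just enough to enclose its child again, and the walk stops as soon as an ancestor already fits. The `Sphere` class and `stretch_ancestors` function are made up for the example, and volumes only grow here (shrinking is left out).

```python
import math


class Sphere:
    """Hypothetical bounding sphere in a spatial hierarchy."""

    def __init__(self, center, radius, parent=None):
        self.center = center
        self.radius = radius
        self.parent = parent

    def contains(self, other):
        """True if 'other' lies entirely inside this sphere."""
        return math.dist(self.center, other.center) + other.radius <= self.radius


def stretch_ancestors(leaf):
    """Grow ancestors of a moved leaf until each one encloses its child.

    This is a local fix: it stops at the first ancestor that already fits,
    and it never re-centers or shrinks anything.
    """
    child, parent = leaf, leaf.parent
    while parent is not None and not parent.contains(child):
        parent.radius = math.dist(parent.center, child.center) + child.radius
        child, parent = parent, parent.parent


world = Sphere((0.0, 0.0, 0.0), 100.0)
squad = Sphere((10.0, 0.0, 0.0), 5.0, parent=world)
tank = Sphere((12.0, 0.0, 0.0), 1.0, parent=squad)

tank.center = (20.0, 0.0, 0.0)  # the tank drives out of its squad's sphere
stretch_ancestors(tank)         # the squad sphere stretches; the world does not
```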
It’s not even a problem for an object to be contained in multiple bounding volumes as long as it’s not culled in or out more than once per frame. I’ve played with systems with “floating” spaces that group objects for lighting purposes (e.g., all objects that are affected by a light are in one space). Grouping objects in formation is another useful extension. A group of tanks or fighters can be dynamically gathered by their proximity and culled as a group, even if there isn’t a single “parent” node in the traditional scenegraph sense.
I’ve covered the basics of scenegraphs, where they came from, where they are, and where I think they’re going, at least from one point of view. Much of this work is related to on-going development of a so-called “multi-view” scenegraph. The ultimate goal of this work is to come up with a simple, light-weight system for optimizing rendering across many platforms. Look for future articles on my progress with this work.
This document intentionally doesn’t directly address whether you should or shouldn’t use a scenegraph in your 3D app. I trust that given the full facts you’ll know best what you need. But for those people who dismiss scenegraphs out of hand, I hope this article does at least shed some light on the likelihood that you are using a scenegraph in one way or another, whether you call it “portals,” “bones,” “linked matrices,” or anything else. Because when it comes down to it, this is all just common sense and experience put to work.