
OpenGL

Direct 3D Analysis

An OpenGL Perspective


1 Introduction

This report outlines some of the most notable deficiencies of the design and current implementation of Direct 3D Immediate Mode. It is not intended to be a comprehensive comparison of Direct 3D and OpenGL, though it does contrast the two APIs (Application Programming Interfaces) in a number of areas. Nor does it discuss Direct 3D Retained Mode; that is a higher-level API that will be examined in a companion report.

In the following pages, "Direct 3D" refers to the Direct 3D Immediate Mode API as defined in the DirectX 3 public release of September, 1996. "Direct 3D" is also used to refer to the DirectX 3 implementation of that API. "OpenGL" refers to the OpenGL 1.1 specification as defined by the OpenGL Architecture Review Board.

Note that references of the form "[chapter part page]" refer to the specified chapter, part, and page number of the DirectX 2 documentation, which was the initial document used in this analysis. For example, [5C17] refers to Chapter 5, part C, page 17 (the Direct 3D Immediate Mode Reference). As far as Direct 3D is concerned, very little changed between the DirectX 2 and DirectX 3 public releases.

DirectX, Direct 3D, DirectDraw, Windows, Windows 95, Windows NT, and Microsoft are trademarks or registered trademarks of Microsoft Corporation.

OpenGL is a registered trademark and Cosmo is a trademark of Silicon Graphics, Inc.


2 Layering and Relationship to Other APIs

2.1 Why is This Important?

Layering is a technique for structuring software. In general it is used to partition a large piece of software so that separate organizations can implement components of the software independently. Layering is accomplished by defining and conforming to ``interfaces'' between the software components. It is distinguished from some other software structuring techniques by the presence of a hierarchy: components at one level interact only with components at their own level and the next levels above and below. In some cases, the very lowest layer is defined by hardware rather than another software layer.

In this discussion, layering is important primarily because of its performance implications. If there are too many layers, performance can be degraded by passing commands and data from one layer to another. This is especially acute for simple operations, since the cost of passing data through an interface may be large compared to the cost of performing the final operations. For this reason, performance-sensitive applications tend to use the lowest appropriate layer.

Layering can be effective when each layer performs a significant amount of work (that is, adds a significant amount of value) before invoking the next. Examples of effective layering are Direct 3D Retained Mode layered on Direct 3D Immediate Mode, and Cosmo 3D layered on OpenGL.

2.2 Layering in Direct 3D

The Direct 3D architecture defines two layers. The Direct 3D API constitutes one layer. Below it lies another layer containing two components: the HAL (Hardware Abstraction Layer) and HEL (Hardware Emulation Layer). Each piece of hardware that supports Direct 3D has its own implementation of the HAL. The HEL is intended to ``fill in the gaps'' by providing functionality that is not always available in HALs.

Contrast this with OpenGL, which is intended to be the lowest software layer above the hardware. In many ways, OpenGL is more analogous to the Direct 3D HAL than to Direct 3D itself. (But unlike the Direct 3D HAL, OpenGL implementations must be complete; they aren't allowed to omit significant functionality.)

Layering in Direct 3D raises performance concerns because the amount of work performed by the Direct 3D layer is small. This means applications can incur a significant amount of overhead when issuing simple commands that must flow from layer to layer. Direct 3D users are warned to minimize this overhead; see for example [5A27].
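
The concern can be made concrete with a back-of-the-envelope cost model. The sketch below assumes, purely for illustration, that each trip through the layers costs a fixed overhead and that each triangle costs a fixed amount of useful work. The function name and the numbers are hypothetical; they are not measurements of any Direct 3D implementation.

```python
def overhead_fraction(triangles_per_call, per_call_overhead, per_triangle_work):
    """Fraction of total time spent crossing layers rather than drawing."""
    useful = triangles_per_call * per_triangle_work
    return per_call_overhead / (per_call_overhead + useful)

# Hypothetical costs in arbitrary time units: 50 per call, 1 per triangle.
for batch in (1, 10, 100, 1000):
    print(batch, round(overhead_fraction(batch, 50.0, 1.0), 3))
```

Under these assumptions the overhead dominates for small batches and fades only when many triangles are submitted per call, which is why the layering cost matters most for simple commands.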

2.3 High Level vs. Low Level

Direct 3D is called a low-level API [5A5]. In the context of 3D graphics libraries, this usually means that it occupies a layer in the software hierarchy that is close to the hardware. However, Direct 3D is not the lowest such layer; as noted above, the HAL is lower.

Like the HAL, OpenGL is designed to be the lowest level above the hardware. It is fully standardized and efficient on machines with and without graphics acceleration hardware. Unlike the HAL, OpenGL is intended to be used directly by applications. Good-quality implementations of OpenGL thus have the potential to enjoy less layering overhead than Direct 3D implementations.

Because of its large feature set, OpenGL is often called a high-end API. ``High-end'' is sometimes confused with ``high-level,'' but as noted above, OpenGL is actually lower-level than the Direct 3D Immediate Mode API.


3 Code Portability

3.1 Why is This Important?

Code portability is a measure of the ability to run application code without change on a variety of different machines. If an API offers good code portability, then it reduces software development time by minimizing the number of special cases (machine dependencies) that must be considered. It also reduces the time and expense of software testing, by minimizing the number of machines and machine configurations that must be tested. These issues are particularly important in the PC market, because there are so many hardware vendors and so many ways of combining their products to build a system.

3.2 The Problem of Subsetting

Direct 3D does not guarantee that every machine will support the same set of functionality. A ``capability query'' mechanism (invoked by the GetCaps method) is available to list the features supported by the machine on which the application is running.

If one machine offers one subset of the Direct 3D functionality, and another machine offers a different subset, then any application that is to run on both machines must contain special code for each of them. Since the code for one machine is not executed on the other, to verify that the application is correct, it must be tested on both machines. In these ways, subsetting detracts from code portability.

The capability query mechanism in Direct 3D is fine-grained and extensive, raising the possibility that tens to hundreds of subsets (one per vendor or per graphics accelerator) may arise. If this occurs, the problem of creating portable code may become intractable. This possibility is not far-fetched. Consider that accelerators are allowed to require that all triangles must be sorted directly by the application [5C51], a constraint that would force major structural changes for most applications. See [5C50-52], [5C59-60], [5C66-73], and [5C81] for extensive lists of all the Direct 3D features that might or might not be supported on a given machine.
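
To see how quickly the number of subsets can grow, consider the following sketch. It treats each capability as an independent on/off bit, which is an upper-bound assumption: real accelerators will not populate every combination. The capability names are illustrative and are not taken from the Direct 3D headers.

```python
from itertools import product

def configurations(capability_names):
    """Enumerate every on/off combination of independent capability bits."""
    return [dict(zip(capability_names, bits))
            for bits in product((False, True), repeat=len(capability_names))]

caps = ["mipmaps", "blending", "texture_clamp", "z_compare_modes"]  # illustrative names
print(len(configurations(caps)))   # 2 ** 4 combinations
```

Each additional independent capability doubles the number of configurations an application might have to handle and test.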

In theory, the HEL (Hardware Emulation Layer) could be used to mitigate this problem, by providing the functionality that various HALs lack (albeit at much lower performance). However, the current implementation of the HEL is also lacking interesting portions of the functionality specified by the Direct 3D API. As a result, many applications will be exposed directly to the limitations of the individual HALs. Here are some examples of Direct 3D functionality not supported in the DirectX 2 SDK (Software Developer's Kit):

  • Mipmaps (for high-quality texturing and special lighting effects)
  • Culling front-facing triangles (as opposed to back-facing; used for generating cutaway views)
  • Disabling culling altogether (used for rendering fully transparent objects)
  • Patterned line and triangle drawing (used for CAD applications and inexpensive transparency)
  • Blending (for translucency, image dissolves, and smoke/fire effects)
  • Masking writes to the color buffer and depth (Z) buffer (for compositing 3D sprites with complex backgrounds)
  • Depth-buffer comparison functions other than less-or-equal (for multipass algorithms that render lighting and reflections)
  • Alpha comparison functions other than less-or-equal
  • Texture coordinate clamping (as opposed to wrapping around; used for decals on surfaces)
  • Raster operations (e.g. exclusive-OR for reversible drawing, used when manipulating complex objects interactively)

OpenGL addresses the subsetting issue simply, by requiring that the entire set of core functions be supported on all implementations. This set is defined by the OpenGL specification, and conformance is enforced by the OpenGL Conformance Test Suite. Applications can use any of these functions confidently. (This is not to say that all OpenGL implementations are identical; vendors may provide innovative non-portable functionality by using the OpenGL extension mechanism. However, OpenGL guarantees availability of more critical fundamental features than does Direct 3D.)

    It should also be noted that Direct 3D's capability query mechanism is not robust enough to describe the behavior of real machines. Therefore applications that depend on the mechanism will encounter situations that it can't handle. A common case is a graphics accelerator with some limited resource that is shared by two or more features. For example, an accelerator might be able to support Z buffering or texture mapping, but not both at the same time, because it lacks the required amount of memory. The capability query mechanism can report that Z buffering is supported or is not supported, but can't report that it's supported only if texturing isn't being used.
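
The shared-resource case can be illustrated with a small model. Everything in it is hypothetical: the `Accelerator` class, its fields, and the memory figures are invented for this sketch and do not describe a real board or the Direct 3D capability structures. The point is only that a per-feature answer cannot express a constraint on a combination of features.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Accelerator:
    memory_kb: int        # total on-board memory (hypothetical figure)
    z_buffer_kb: int      # memory a Z buffer would consume
    texture_kb: int       # memory the texture set would consume
    frame_kb: int         # memory the color buffer consumes

    def independent_caps(self):
        """What a per-feature query can say: each feature judged in isolation."""
        free = self.memory_kb - self.frame_kb
        return {"zbuffer": self.z_buffer_kb <= free,
                "texture": self.texture_kb <= free}

    def can_combine(self, zbuffer, texture):
        """What the application actually needs to know."""
        need = self.frame_kb
        need += self.z_buffer_kb if zbuffer else 0
        need += self.texture_kb if texture else 0
        return need <= self.memory_kb

board = Accelerator(memory_kb=2048, z_buffer_kb=600, texture_kb=1024, frame_kb=600)
print(board.independent_caps())                       # both features reported as supported
print(board.can_combine(zbuffer=True, texture=True))  # yet the combination does not fit
```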

    3.3 Window System Dependence

    Direct 3D is dependent on other window-system-specific APIs to provide essential 3D graphics services (for example, DirectDraw provides drawing surfaces and texture maps). This limits the portability of Direct 3D code. OpenGL is more careful about maintaining its independence, and allows window-system-dependent code to be isolated from rendering and texture-management code. This is one reason OpenGL is supported on Windows 95, Windows NT, the X Window System, OS/2, and the Macintosh OS, while to date Direct 3D is supported only on Windows 95.

    3.4 The Problem of Conformance

    At present, Direct 3D lacks a published specification and a set of conformance tests for enforcing the specification. Thus hardware vendors have no straightforward way to determine whether their Direct 3D implementations are consistent with other Direct 3D implementations, and inconsistencies have already appeared in practice. In contrast, OpenGL offers a complete specification and the necessary conformance tests.


    4 Data Portability

    4.1 Why is This Important?

    Data portability is a measure of the ability to use the same models, texture maps, and other scene information on a variety of different machines. This arises in traditional software distribution schemes (boxed disk or CD-ROM), but is becoming much more important in networked environments such as the Web, where the type of a client or server machine is not known in advance, and a single database on a server must handle many clients.

    For most entertainment, simulation, and product design applications, building models and textures is more expensive and time consuming than writing the application code. An API with good data portability leverages that investment effectively. An API with poor data portability forces developers to incur additional (sometimes significant) costs to customize models and textures.

    4.2 How the Execute Buffer Affects Object Modelling

    In Direct 3D, models of 3D objects are built from meshes of triangles. The vertices of the triangles, along with information about how the vertices are connected and instructions for drawing the triangles, are placed in a data structure known as an ``execute buffer.'' Once filled, this buffer is passed to Direct 3D for transformation, lighting, and drawing.

    It is important to use large execute buffers as much as possible, because they permit better sharing of vertex information and because layering overhead (see 2.2) becomes significant for small execute buffers.

    The maximum size of an execute buffer is machine-dependent [5A10]. Furthermore, there is no guaranteed minimum size. The ideal size is system-dependent; what works on one system will be too large to fit the execute buffer of another, or too small to be efficient on a third. The amount of variation is significant: software-only systems have no practical limit on the size of an execute buffer, but very high-performance systems (using small amounts of fast on-chip memory) are not able to process more than a few tens of triangles at a time.

    Therefore no single model for a 3D object will work well with Direct 3D on all systems. Since data portability is mandatory in many applications, the implication is that Direct 3D applications must include code to restructure every model for the execute buffer size that's supported by the machine on which the application is running.
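
The kind of restructuring code this implies might look like the sketch below. It assumes that the machine-dependent limit can be expressed as a maximum number of distinct vertices per buffer, and it uses a simple greedy packing; `split_mesh` is a hypothetical helper, not part of Direct 3D, and a production tool would also try to minimize the vertices duplicated across buffers.

```python
def split_mesh(triangles, max_vertices):
    """Greedily pack indexed triangles into chunks of at most max_vertices
    distinct vertices. Returns a list of (vertex_ids, local_triangles)."""
    if max_vertices < 3:
        raise ValueError("a chunk must hold at least one triangle")
    chunks = []
    remap, local = {}, []            # global index -> local index, re-indexed triangles
    for tri in triangles:
        new = {v for v in tri if v not in remap}
        if len(remap) + len(new) > max_vertices:
            chunks.append((list(remap), local))   # current chunk is full: flush it
            remap, local = {}, []
        for v in tri:
            remap.setdefault(v, len(remap))       # assign the next free local index
        local.append(tuple(remap[v] for v in tri))
    if local:
        chunks.append((list(remap), local))
    return chunks

mesh = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (3, 5, 4)]   # 4 triangles sharing 6 vertices
for vertex_ids, tris in split_mesh(mesh, max_vertices=4):
    print(vertex_ids, tris)
```

With a limit of four vertices the six-vertex mesh is split into two buffers holding eight vertices in total, so the shared vertices on the boundary are stored (and transformed) twice.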

    OpenGL solves this problem by avoiding the generalized mesh structure that Direct 3D uses. OpenGL supports triangle strips and fans that can be specified fully a vertex at a time. These primitives need only enough high-speed storage for three vertices at a time, so they are particularly well-suited for high-speed hardware implementations. Vertices may be grouped together into arrays to allow several strips or fans to be specified at once. (This has efficiency advantages on software-only implementations, but no disadvantage on hardware-accelerated implementations.) There is no limit to the size of a strip or fan. Once a modeller has expressed a 3D object as a set of strips and fans, there is never a need to restructure it.
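
The small storage requirement follows from the way strips and fans are decomposed. The sketch below is a software model of that decomposition, not OpenGL code: it consumes vertices one at a time and keeps only two earlier vertices while forming each triangle. The function names are chosen for this illustration.

```python
def strip_triangles(vertices):
    """Yield triangles from a triangle strip, holding only two prior vertices."""
    a = b = None
    for i, v in enumerate(vertices):
        if i >= 2:
            # Alternate the order so every triangle keeps the same winding.
            yield (a, b, v) if i % 2 == 0 else (b, a, v)
        a, b = b, v

def fan_triangles(vertices):
    """Yield triangles from a triangle fan: a fixed hub plus the previous vertex."""
    it = iter(vertices)
    hub = next(it, None)
    prev = next(it, None)
    for v in it:
        yield (hub, prev, v)
        prev = v

print(list(strip_triangles(range(5))))   # [(0, 1, 2), (2, 1, 3), (2, 3, 4)]
print(list(fan_triangles(range(4))))     # [(0, 1, 2), (0, 2, 3)]
```

Because both generators work on a stream, the length of the strip or fan has no effect on the amount of state they hold.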

    4.3 Colors

    RGB colors in Direct 3D are "hardwired" to have 8 bits per color component and 32 bits for a complete group of color components. (There are several places in the API where this dependency is exposed; see the D3DCOLORVALUE structure and the RGB_GETBLUE macro, for example.) Multimedia applications like film/video editing and film output applications like presentation graphics often need to deal with 10 or 12 bits per color component to avoid artifacts. Images from such applications will have to be converted to be used in Direct 3D, and output images from Direct 3D might not be suitable for them.
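
The loss involved in such a conversion is easy to quantify. The following sketch rescales a 10-bit component to 8 bits with round-to-nearest; `requantize` is a hypothetical helper written for this illustration, and other rounding rules are possible.

```python
def requantize(value, from_bits, to_bits):
    """Rescale an unsigned color component between bit depths, rounding to nearest."""
    src_max = (1 << from_bits) - 1
    dst_max = (1 << to_bits) - 1
    return (value * dst_max + src_max // 2) // src_max

levels_10 = range(1 << 10)
distinct_8 = {requantize(v, 10, 8) for v in levels_10}
print(len(levels_10), "->", len(distinct_8))   # 1024 input levels collapse to 256
```

On average four distinct 10-bit levels map to each 8-bit level, which is the source of the banding artifacts that higher-precision pipelines are meant to avoid.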

    OpenGL's color representation scheme accommodates a much wider variety of color resolutions. It is in daily use with 12-bit-per-component RGB colors, and has been used in monochrome applications that required 16-bit precision. It also handles very compact representations, such as 3 bits each of red and green along with 2 bits of blue (for a total of 8 bits). In most cases images need not be converted before being used, though OpenGL has the ability to convert color representations easily.

    According to Microsoft's documentation, RGB textures in Direct 3D must have 8, 24, or 32 bits per texel; ramp-mode textures must have 8 bits per texel [5A69]. It's common for flight simulation textures to be 16 bits deep (RGBA 5/5/5/1 or RGB 5/6/5), so Direct 3D applications would be forced to convert these textures before using them. (Interestingly, 5/5/5/1 and 5/6/5 are common display formats on PCs.)
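
A conversion of this kind might be sketched as follows. The bit layout (red in the top five bits, then six bits of green, then five of blue) is an assumption of this example, as is the use of bit replication to fill the low-order bits; `rgb565_to_rgb888` is not a Direct 3D or OpenGL function.

```python
def rgb565_to_rgb888(texel):
    """Expand a 16-bit 5/6/5 texel to three 8-bit components by bit replication."""
    r5 = (texel >> 11) & 0x1F
    g6 = (texel >> 5) & 0x3F
    b5 = texel & 0x1F
    return ((r5 << 3) | (r5 >> 2),     # copy the top bits into the vacated low bits
            (g6 << 2) | (g6 >> 4),
            (b5 << 3) | (b5 >> 2))

print(rgb565_to_rgb888(0xFFFF))   # (255, 255, 255)
print(rgb565_to_rgb888(0xF800))   # (255, 0, 0)
```

The work is trivial per texel, but it must be applied to every texel of every texture, and the converted textures occupy at least one and a half times the memory of the originals.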

    OpenGL offers a texture storage mechanism with a wide variety of texel formats, and automatic conversion if a requested format is not available.


    5 Graphics Acceleration Issues

    5.1 Why is This Important?

    Two important characteristics of a 3D graphics API are scalability and cost-effectiveness.

    Scalability refers to the range of performance available to users of the API. A highly-scalable API will support implementations on low-end PCs with software-only rendering all the way up to high-end visualization and simulation systems.

    Cost-effectiveness refers to the performance available to the user of an API at a given price point. A cost-effective API will make the most efficient use of hardware resources available, yielding the best performance at a given price.

    Possibly the most significant factor in graphics accelerator performance is bandwidth: the amount of data that can be accessed (or moved) per second. Each piece of memory in a system has a limited capacity for providing or receiving data. That capacity must be used efficiently (without waste) if the accelerator is to be cost-effective, and it must be easily increasable if the accelerator is to be scalable.

    5.2 The DirectDraw Pixel Addressability Problem

    One way to improve bandwidth for graphics is to provide a separate video memory, distinct from the main memory used by the CPU. This allows each device to access its own memory without interference from the other. Video memory can be upgraded separately from main memory, so this helps improve scalability as well.

    Direct 3D uses DirectDraw to provide drawing surfaces and texture maps. A DirectDraw surface may reside in video memory or in main memory. To access a DirectDraw surface, an application ``locks'' the surface and receives a pointer to it. The pixels in the surface may then be modified by ordinary CPU instructions acting on the pointer.

    The ability to modify a portion of a drawing surface or a texture map is important. For example, it could be used to update a texture map with an image taken from a video camera. (Textures typically must have dimensions that are powers of 2, for efficiency reasons; none of the common video formats have such dimensions, so video images must be mounted in a ``frame'' of texels with power-of-2 dimensions.) It could also be used to update a portion of a texture map representing the terrain currently visible in a flight simulator.
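
As a worked example of the power-of-2 constraint, the sketch below computes the smallest texture that can frame a given image. The 640 by 480 image size is simply an example, and the helper names are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("dimension must be positive")
    return 1 << (n - 1).bit_length()

def texture_frame(width, height):
    """Power-of-two texture size needed to hold a width x height image."""
    return next_power_of_two(width), next_power_of_two(height)

print(texture_frame(640, 480))   # (1024, 512)
```

Only the 640 by 480 region of that 1024 by 512 texture changes from frame to frame, which is why an efficient way to update a portion of a texture matters.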

    The problem with the DirectDraw model is that it once again couples video memory to the CPU, which is precisely the situation the separate video memory was designed to avoid. When a surface is locked, memory accesses from the CPU to the surface contend with other graphics accesses to video memory.

    Most high-performance graphics systems completely decouple video memory from the CPU in order to maximize bandwidth available for graphics. The DirectDraw architecture will not be efficient on such systems. Furthermore, since DirectDraw encourages the use of this model, applications will come to depend on it. A growing base of applications that require direct access to video memory will make it more difficult to build high-performance graphics accelerators for PCs. And such applications would not scale up in performance as faster graphics accelerators become available, because they would be limited by the rate at which the CPU can access video memory.

    OpenGL is careful to ensure that main memory and video memory can remain separate, for maximum bandwidth and improved scalability. It does this by providing methods to update portions of video memory efficiently, and never requiring that an application running on the CPU have unconstrained access to video memory in order to implement an essential function. Thus OpenGL can be supported efficiently on machines with or without unified memories.

    5.3 The Direct 3D Execute Buffer Addressability Problem

    Similar issues exist for the Direct 3D execute buffer. Direct 3D presumes that some accelerators will store execute buffers in video memory, and offers a locking mechanism much like that for DirectDraw surfaces. However, CPU accesses to an execute buffer in video memory will contend with accesses from the graphics accelerator, and the overall architecture makes it more difficult to build a scalable high-performance accelerator.

    The Direct 3D execute buffer mechanism is also subject to ``race conditions'' as both the CPU and the graphics accelerator attempt to update the buffer. It is interesting to note that the application must be prepared to handle such situations [5C28]. OpenGL handles its analogous case (replacing display lists) transparently.

    5.4 Pipelining

    A cost-effective graphics accelerator uses hardware resources effectively. The 3D graphics API can affect this process in several ways. One of these is pipelining.

    The best graphics performance is attained when the CPU and the graphics accelerator are working in parallel: the CPU isn't waiting for the graphics accelerator to finish a previous task, and the graphics accelerator isn't waiting for the CPU to deliver a new task.

    Consider a game program. It operates in a cycle: check input devices; figure out how to move the viewpoint, characters, etc.; set up information to be drawn; call the 3D library to perform the drawing; wait for the image to be displayed and the next cycle to start. Such a cycle is commonly called a "frame." Suppose we represent the portions of a frame with letters: "I" for checking input devices, "C" for computing their effect on the game, "S" for setting up new information to be drawn, and "D" for drawing. We can represent the time the graphics accelerator is drawing with "G". Blanks will indicate idle time.

    Recall that in Direct 3D the normal mode of operation is to create a large execute buffer, then process it. A frame might look like this:

    |IIIIICCCCCSSSSSSSSSSDDDDDDDDDD     | <- CPU
    |                         GGGGGGGGGG| <- Graphics accelerator
    

    During the "S" periods, the game is setting up the large execute buffer. During the first few "D" periods, Direct 3D is processing vertices in the execute buffer. During the latter "D" periods Direct 3D is streaming triangles out to the accelerator, which begins rasterizing them immediately.

    Now suppose that instead of batching up large numbers of commands in an execute buffer, the game trickled them out a few at a time. This might be the natural approach for OpenGL, which can deliver triangles to the accelerator a vertex at a time, when that's convenient for the application:

    |IIIIICCCCCSDSDSDSDSDSDSDSDSDSD     | <- CPU
    |            G G G G G G G G G G    | <- Graphics accelerator
    

    Note that although the game performed exactly the same amount of work, by giving the graphics accelerator something to draw early, more of the work was accomplished in parallel and there is idle time left in the frame (more blank spaces at the right end of the diagram). That idle time could be put to use, improving the game play or the quality of the visuals.
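
The effect can be reproduced with a toy timing model. The sketch below uses the durations read off the diagrams (5 units each of I and C, 10 units each of S, D, and G) and divides the S, D, and G work evenly into chunks. It assumes the accelerator can start a chunk only after the CPU has finished submitting it, which is slightly more pessimistic for the batched case than the first diagram, where drawing overlaps the second half of D. The function and parameter names are invented for this illustration.

```python
def frame_end(input_t, compute_t, setup_t, submit_t, draw_t, chunks):
    """Return the time at which the accelerator finishes the frame."""
    cpu = input_t + compute_t          # I and C phases run once per frame
    gpu_free = 0.0
    for _ in range(chunks):
        cpu += setup_t / chunks        # S: build this chunk
        cpu += submit_t / chunks       # D: hand it to the 3D library
        start = max(cpu, gpu_free)     # accelerator waits for data or for its own backlog
        gpu_free = start + draw_t / chunks
    return gpu_free

print(frame_end(5, 5, 10, 10, 10, chunks=1))    # one large batch
print(frame_end(5, 5, 10, 10, 10, chunks=10))   # ten small batches
```

In this model the single large batch completes at time 40 and the ten small batches at time 31, the same completion time as the second diagram. The total work is identical in both cases; only the overlap differs.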

    The key observation to be made here is that as graphics accelerators become more capable, Direct 3D's emphasis on large execute buffers will become less and less appropriate. As developers start to use smaller execute buffers in order to improve pipelining, the layering overhead of Direct 3D and the HAL will begin to be more of a concern.

    5.5 Flow Control

    A related issue is flow control. A pipelined system like the one described above must have buffer memory at various stages in order to smooth out the delays incurred when one stage requires more processing time than normal. (For example, when a large triangle is rasterized, extra time is required to fill all the pixels. Some buffering is needed ahead of the rasterizer, so that it can queue up the next few triangles while it's working on the large one.) This buffering allows the graphics accelerator to accommodate occasional unusual conditions without suffering from a bottleneck.

    The amount of memory required for this buffering is related to the size and ``burstiness'' of the graphics commands passed from the application to the 3D library. Large sets of commands delivered in a single burst (as would be the case for a Direct 3D execute buffer) tend to require large amounts of buffer memory, thus increasing the cost of the graphics accelerator. If the accelerator vendor chooses to reduce costs by skimping on the buffer memory, then the performance of the accelerator becomes more erratic.

    OpenGL was designed with these considerations in mind. For example, OpenGL's graphics primitives can all be processed a vertex at a time, as the CPU delivers each vertex. This maximizes parallel processing and can reduce the amount of flow-control buffering required in the typical case.
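
The relationship between burstiness and buffer size can be sketched with a simple queue simulation. The model assumes a consumer that drains one command per tick and a producer that delivers the same average rate in bursts of a chosen size; both the model and the name `peak_queue` are illustrative and do not describe any particular accelerator.

```python
def peak_queue(burst, total, drain_per_tick=1):
    """Largest number of commands waiting at once when `total` commands
    arrive in bursts of `burst` every `burst` ticks."""
    queued = peak = delivered = 0
    tick = 0
    while delivered < total or queued > 0:
        if delivered < total and tick % burst == 0:
            n = min(burst, total - delivered)
            queued += n
            delivered += n
        peak = max(peak, queued)
        queued -= min(queued, drain_per_tick)   # consumer retires work each tick
        tick += 1
    return peak

for burst in (1, 10, 100):
    print(burst, peak_queue(burst, total=100))
```

Under these assumptions the buffer must be as deep as the burst is large, even though the average command rate never changes.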

    5.6 Fast Memory Requirements for Graphics Accelerators

    In the world of integrated circuit design, "smaller" often means "faster" as well as "cheaper." The small memory that houses a CPU's registers is fast because references to it can be decoded quickly, and because it is close to the functional units that use its contents. On-chip cache memory is larger, but slower; off-chip cache is larger and slower still; main memory is quite large, but so slow that the performance of many programs is now dominated by the time it takes to fetch data from it.

    The situation is the same in the graphics accelerator. The accelerator needs very fast access to the data for each vertex, so it's advisable to keep the vertex information in a few registers or at worst in a small, fast, on-chip memory.

    This is another area in which Direct 3D's use of large execute buffers could be a problem. Triangles in the execute buffer are specified by indices into a potentially large list of vertices. If large vertex lists are allowed, then they may be too large for storage in a fast on-chip memory. If the vertex lists are restricted to a size that will fit conveniently on-chip, then the implementation may face data portability problems (see 4.2).

    As mentioned above, this was another explicit consideration in the design of OpenGL. The graphics primitives in OpenGL require very small amounts of state information, and so can be fit into small, fast, on-chip memories.

    This discussion concentrated on systems with graphics accelerators. As CPU speeds increase relative to main memory speeds, the same issues will arise even for machines without graphics accelerators. It will be interesting to see how the performance of Direct 3D applications changes on future CPUs.

    5.7 Interpretation Overhead

    In order to build a very fast Direct 3D machine, one must develop hardware that is capable of interpreting execute buffers. This presents a few design problems. The need for relatively large memories has been discussed above. Other concerns include the following.

    Because triangles are specified by indices into a vertex list, an indirection is required to fetch the data for each vertex. This may be expensive in some implementations. The design decision in OpenGL was to ensure that this indirection can be performed in the CPU if the application chooses to use it, and to avoid the cost otherwise.

    Triangles in the execute buffer include a flag word as well as vertex indices. The flag word must be examined and interpreted for each triangle, which costs time. The flag word also requires additional memory. OpenGL, by contrast, uses primitives with implicit connectivity so that no flag word is required.
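
A rough count shows what explicit connectivity costs. The sketch below assumes 16-bit vertex indices and a 16-bit flag word per triangle; these sizes are assumptions made for the arithmetic, not figures quoted from the Direct 3D documentation, and the function names are hypothetical.

```python
def indexed_connectivity_bytes(triangles, index_bytes=2, flag_bytes=2):
    """Bytes of connectivity data for an indexed triangle list:
    three indices plus one flag word per triangle (sizes are assumed)."""
    return triangles * (3 * index_bytes + flag_bytes)

def strip_vertices_sent(triangles):
    """Vertices needed to draw the same triangles as one strip;
    connectivity is implied by vertex order, so no extra bytes are sent."""
    return triangles + 2 if triangles > 0 else 0

print(indexed_connectivity_bytes(100))   # 800 bytes under the assumed sizes
print(strip_vertices_sent(100))          # 102 vertices, no index or flag data
```

The indexed form also requires three index lookups per triangle, whereas a strip delivers one new vertex per triangle after the first.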

    Execute buffers can include forward branch operations [5C49]. These could be useful in software-only renderers. However, in systems with graphics accelerators, the entire execute buffer might have to be transferred to the accelerator before the buffer is interpreted; this could easily eliminate the advantage of the branch operation in systems where bandwidth is a limited resource. (Consider a large execute buffer in which the first operation is a test and branch to the end. If the branch is taken, nearly the entire cost of transferring the execute buffer to the accelerator would have been wasted.) Arguably, decisions of this kind are best left to a higher-level library like Direct 3D Retained Mode or Cosmo 3D.


    6 Functionality

    6.1 Why is This Important?

    An API that provides a rich set of features can reduce application development time and offer a wider range of rendering effects.

    Concerns about the memory required to support a large feature set can be mitigated by careful structuring of the implementation of the library.

    6.2 Some OpenGL Features That Direct 3D Lacks

    A detailed feature comparison is beyond the scope of this report. Here is a sampling of features to clarify the nature of the issue.

    6.2.1 Culling

    Culling is the process of eliminating triangles that have a particular orientation. For example, one might cull all the triangles on the side of a sphere facing away from the viewer, since those triangles normally will not be visible.

    Direct 3D software renderers have only one culling mode [5C98] and can't change it: backfacing triangles are always culled. This makes it difficult to render ``cutaway'' views in which front-facing triangles are culled (or made transparent and rendered in a second pass).
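
For readers unfamiliar with the mechanics, a selectable culling mode can be modelled as below. The sketch works on screen-space triangles, treats counter-clockwise winding as front-facing (with the y axis pointing up), and discards zero-area triangles; these conventions and the mode names are choices made for this example rather than a transcription of either API.

```python
def signed_area2(a, b, c):
    """Twice the signed area of a screen-space triangle; positive if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def is_culled(tri, mode):
    """mode is 'none', 'back', or 'front'; counter-clockwise counts as front-facing."""
    if mode not in ("none", "back", "front"):
        raise ValueError("unknown cull mode: %r" % (mode,))
    if mode == "none":
        return False
    area = signed_area2(*tri)
    if area == 0:
        return True                      # degenerate triangle: dropped in this sketch
    front = area > 0
    return front if mode == "front" else not front

ccw = ((0, 0), (1, 0), (0, 1))
cw = ((0, 0), (0, 1), (1, 0))
print(is_culled(ccw, "back"), is_culled(cw, "back"))     # False True
print(is_culled(ccw, "front"), is_culled(cw, "front"))   # True False
```

A cutaway view needs the `front` behavior, and a transparent object needs `none`; a renderer fixed to `back` can provide neither.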

    6.2.2 Stencilling

    OpenGL offers a stencil buffer that Direct 3D lacks. The stencil buffer contains a mask or matte that can be updated whenever a pixel is drawn, and can be used to control when pixels are drawn. Some applications include:

  • Drawing clean edges on polygons (see Herrell, Baldwin, and Wilcox, "High Quality Polygon Edging," IEEE Computer Graphics and Applications, July 1995).
  • Determining the depth complexity (``overdraw'') of a scene for modelling and performance tuning. ([5A29] claims that determining this exactly is difficult for Direct 3D; the stencil buffer makes it relatively easy for OpenGL applications).
  • High-level culling of obscured objects in complex scenes. By performing a rough-and-ready sort of objects into front-to-back order, setting the stencil buffer in areas covered by a drawn object, and then examining the contents of the stencil buffer, one can determine whether to draw objects that are further away from the eye. Particularly useful in rendering scenes inside buildings.
  • Cross-dissolves of images. One can draw an image, clear the stencil buffer, then progressively set more bits in the stencil buffer and draw a second image only where the stencil bits are set. This allows the first image to dissolve into the second according to any desired pattern.
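
The depth-complexity application in the list above can be modelled in a few lines. The sketch is a software stand-in for a stencil buffer configured to increment on every pixel write; axis-aligned rectangles stand in for drawn objects, and the function name is hypothetical. A real stencil buffer has a limited number of bits per pixel, which this model ignores.

```python
def depth_complexity(width, height, rects):
    """Count how many times each pixel is written, as a stencil increment would."""
    stencil = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:             # half-open rectangles stand in for drawn objects
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                stencil[y][x] += 1           # models "increment stencil on every write"
    return stencil

counts = depth_complexity(4, 3, [(0, 0, 3, 2), (1, 1, 4, 3)])
for row in counts:
    print(row)
```

After the scene has been drawn once, the buffer holds the exact overdraw count for every pixel; here the two overlapping rectangles produce a count of 2 where they intersect.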

    6.2.3 Texture Memory Management

    This report has already discussed some reasons for separating graphics memory and main memory. Assuming one wishes to support a texture map memory distinct from main memory, it is necessary to manage the textures in that memory. OpenGL offers services for determining whether a particular texture will fit in texture memory, and for deciding which textures will reside in texture memory at any given time.

    6.2.4 3D Textures

    3D textures are usually considered a feature for high-end applications like medical volume rendering. However, they are also valuable in games for smoke, fire, and fog effects. To support them, Direct 3D and DirectDraw would have to be extended in nontrivial ways.

    6.2.5 Integration of 3D and Imaging Operations

    OpenGL offers a large set of image-processing operations. These are not available in Direct 3D, though many of them are available in DirectDraw.

    In some cases it's important to have image operations well-integrated with 3D graphics. For example, labelling pieces of a model with icons or text requires the ability to attach images to points in 3D space. Applications with large numbers of such labels need to position them without having to transform each attachment point and query whether it is actually visible. This is relatively inefficient in Direct 3D.

    6.2.6 Accumulation Buffer

    Direct 3D has no feature analogous to the accumulation buffer of OpenGL. The accumulation buffer can be used for making high-quality antialiased scenes, for special effects like camera depth-of-field and motion blur, and for other operations on sequences of images (for example, converting a 60Hz video sequence to 50Hz without gaps or stuttering).
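
The antialiasing use can be illustrated with a one-dimensional model. The sketch below "renders" a hard edge several times with slightly different sub-pixel offsets and averages the results, which is the operation an accumulation buffer performs. The jitter offsets and function names are chosen for this example.

```python
def render_edge(width, edge_x, offset):
    """One-row 'render': a pixel is lit if its jittered sample lies left of edge_x."""
    return [1.0 if (x + 0.5 + offset) < edge_x else 0.0 for x in range(width)]

def accumulate(width, edge_x, offsets):
    """Average one rendering per jitter offset, as an accumulation buffer would."""
    accum = [0.0] * width
    for off in offsets:
        image = render_edge(width, edge_x, off)
        accum = [a + p / len(offsets) for a, p in zip(accum, image)]
    return accum

print(accumulate(4, 1.5, [-0.375, -0.125, 0.125, 0.375]))   # [1.0, 0.5, 0.0, 0.0]
```

The pixel that straddles the edge receives an intermediate value proportional to its coverage instead of being fully on or fully off. Motion blur and depth-of-field follow the same pattern, with the jitter applied to time or to the camera aperture position.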

    6.2.7 Support for Curves and Surfaces

    OpenGL offers support for rational parametric polynomial surfaces, including NURBS. These are an essential part of some industrial design and character animation applications. They can also be used for more efficient rendering, in cases where smoothly-curved objects can be represented more compactly as a surface than as a collection of triangles.

    6.2.8 Efficient Execution of Repeated Command Sequences

    3D graphics often involves the repeated execution of sequences of rendering commands. Support for re-execution of command sequences is convenient for the programmer, but it can also accelerate rendering.

    Sequences of Direct 3D commands can be placed into execute buffers, but an execute buffer cannot invoke the contents of another execute buffer. Higher-level APIs can store and resubmit execute buffers, but Direct 3D doesn't provide a means for storing and executing such sequences in the graphics accelerator.

    In contrast, OpenGL supports a display list mechanism that encapsulates arbitrary sequences of OpenGL commands. This includes the ability to execute other display lists from within a display list. Common uses for this include character animation and articulating mechanical structures for CAD/CAM.
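
    The nesting behaviour can be modelled in a few lines of C. This is a conceptual sketch, not OpenGL code: `Command`, `DisplayList`, and `execute_list` are hypothetical names, and "drawing" is reduced to counting draw commands. The point is only that one stored list may invoke another, as `glCallList` does when it is recorded inside a display list.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CMD_DRAW, CMD_CALL_LIST } CmdType;

struct DisplayList;

/* One stored command: either a draw, or a call to another list. */
typedef struct {
    CmdType type;
    const struct DisplayList *callee;  /* used only by CMD_CALL_LIST */
} Command;

/* A stored command sequence (hypothetical stand-in for a display list). */
typedef struct DisplayList {
    const Command *cmds;
    size_t count;
} DisplayList;

/* Execute a list and return the number of draw commands issued.
 * Nested lists are executed recursively at the point of the call. */
size_t execute_list(const DisplayList *list)
{
    size_t draws = 0;
    size_t i;

    assert(list != NULL);
    for (i = 0; i < list->count; i++) {
        const Command *c = &list->cmds[i];
        if (c->type == CMD_DRAW)
            draws++;
        else
            draws += execute_list(c->callee);
    }
    return draws;
}
```

    An articulated model maps naturally onto this structure: a "car" list can issue its own body geometry and then call a shared "wheel" list four times, so the wheel geometry is stored once.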

    6.3 Extensibility

    Computer graphics technology is evolving rapidly. In order to innovate, graphics hardware vendors must have a means to provide new functionality to their customers. In the case of OpenGL, there is a well-defined extension mechanism for this purpose. Because OpenGL is the lowest-level 3D graphics API in the system, and the extension mechanism minimizes syntactic and semantic conflicts between vendors, any vendor can extend OpenGL easily and unilaterally. This mechanism has been used widely. One benefit is that extensions to OpenGL 1.0 were validated by customers, and the most useful were folded into the OpenGL 1.1 specification.

    Extending Direct 3D is more difficult. Direct 3D is layered on the HAL and HEL. These interfaces are controlled exclusively by Microsoft. Therefore hardware vendors cannot expose new Direct 3D functionality without Microsoft's consent and participation. At best, this slows the rate at which new features can be provided to customers.


    7 Ease of Use

    7.1 Why is This Important?

    Since end users never see the source code of an application, why is it important for a 3D graphics API to be easy to use?

    The most obvious answer is that an easy-to-use API reduces program development time, thus bringing products to market earlier. An equally important answer is that it can make the application more robust and reliable, by reducing chances for programmers to err or by making errors easier to detect.

    7.2 Problems Caused by the Execute Buffer

    Most of the obvious ease-of-use problems in Direct 3D are associated with the execute buffer. Entries in the buffer are typically set with macro invocations to avoid excessively complicated code. For example, to enable smooth shading for a set of triangles, one might use this code taken from [5A82]:

    #define STATE_DATA(type, arg, ptr) \
    ((LPD3DSTATE) ptr)->drstRenderStateType = (D3DRENDERSTATETYPE) type;\
    ((LPD3DSTATE) ptr)->dwArg[0] = arg; \
    ptr = (void *)(((LPD3DSTATE) ptr) + 1)
    
    STATE_DATA(D3DRENDERSTATE_SHADEMODE, D3DSHADE_GOURAUD, lpBuffer);
    

    whereas in OpenGL one would use

    glShadeModel(GL_SMOOTH);
    

    One significant side-effect of the Direct 3D macro-oriented coding style is that less compile-time type-checking is possible. Another is that the execute buffer is more vulnerable to stray pointer references or off-by-one errors in index computations.

    An example may help clarify this point. Here is a portion of a working Direct 3D program that draws an unlit flat-shaded cube with each side a different color:

    /* (materials defined earlier) */
    

    LPDIRECT3DEXECUTEBUFFER lpD3DExBuf;
    D3DEXECUTEBUFFERDESC debDesc;
    D3DEXECUTEDATA d3dExData;
    int NumVertices = 8;
    int NumTri = 12;

    /* calculate the size of the buffer */
    size = sizeof(D3DLVERTEX) * NumVertices;
    size += sizeof(D3DSTATUS) * 1;
    size += sizeof(D3DPROCESSVERTICES) * 6;
    size += sizeof(D3DINSTRUCTION) * 15;
    size += sizeof(D3DSTATE) * 6;
    size += sizeof(D3DTRIANGLE) * NumTri;

    /* Create an execute buffer */
    memset(&debDesc, 0, sizeof(D3DEXECUTEBUFFERDESC));
    debDesc.dwSize = sizeof(D3DEXECUTEBUFFERDESC);
    debDesc.dwFlags = D3DDEB_BUFSIZE;
    debDesc.dwBufferSize = size;
    if (lpDev->lpVtbl->CreateExecuteBuffer(lpDev, &debDesc, &lpD3DExBuf, NULL) != D3D_OK)
        return FALSE;
    if (lpD3DExBuf->lpVtbl->Lock(lpD3DExBuf, &debDesc) != D3D_OK)
        return FALSE;
    lpBufStart = debDesc.lpData;
    memset(lpBufStart, 0, size);
    lpPointer = lpBufStart;

    /* Insert instructions to transform and apply colors to vertices */
    /* Set up vertex list */
    {
        D3DCOLOR diff = RGB_MAKE(255,255,255);
        D3DCOLOR spec = RGB_MAKE(0,0,0);
        LPD3DLVERTEX v = lpPointer;
        v[0].dvX = D3DVAL(-1.0); v[0].dvY = D3DVAL(1.0); v[0].dvZ = D3DVAL(1.0);
        v[0].dcColor = diff; v[0].dcSpecular = spec;
        v[1].dvX = D3DVAL(-1.0); v[1].dvY = D3DVAL(-1.0); v[1].dvZ = D3DVAL(1.0);
        v[1].dcColor = diff; v[1].dcSpecular = spec;
        v[2].dvX = D3DVAL(1.0); v[2].dvY = D3DVAL(-1.0); v[2].dvZ = D3DVAL(1.0);
        v[2].dcColor = diff; v[2].dcSpecular = spec;
        v[3].dvX = D3DVAL(1.0); v[3].dvY = D3DVAL(-1.0); v[3].dvZ = D3DVAL(-1.0);
        v[3].dcColor = diff; v[3].dcSpecular = spec;
        v[4].dvX = D3DVAL(-1.0); v[4].dvY = D3DVAL(1.0); v[4].dvZ = D3DVAL(-1.0);
        v[4].dcColor = diff; v[4].dcSpecular = spec;
        v[5].dvX = D3DVAL(-1.0); v[5].dvY = D3DVAL(-1.0); v[5].dvZ = D3DVAL(-1.0);
        v[5].dcColor = diff; v[5].dcSpecular = spec;
        v[6].dvX = D3DVAL(1.0); v[6].dvY = D3DVAL(1.0); v[6].dvZ = D3DVAL(1.0);
        v[6].dcColor = diff; v[6].dcSpecular = spec;
        v[7].dvX = D3DVAL(1.0); v[7].dvY = D3DVAL(1.0); v[7].dvZ = D3DVAL(-1.0);
        v[7].dcColor = diff; v[7].dcSpecular = spec;
        lpPointer = (void *) &v[8];
    }

    lpInsStart = lpPointer;
    OP_SET_STATUS(D3DSETSTATUS_ALL, D3DSTATUS_DEFAULT, 2048, 2048, 0, 0, lpPointer);
    for (i=0;i<6;i++) {
        int n = (i<5) ? 1 : 3;
        OP_STATE_LIGHT(1, lpPointer);
        STATE_DATA(D3DLIGHTSTATE_MATERIAL, D3DMaterialHandle[i], lpPointer);
        OP_PROCESS_VERTICES(1, lpPointer);
        PROCESSVERTICES_DATA(D3DPROCESSVERTICES_TRANSFORM, i, n, lpPointer);
    }
    if (QWORD_ALIGNED(lpPointer)) {
        OP_NOP(lpPointer);
    }

    /* Insert triangle list */
    OP_TRIANGLE_LIST(NumTri, lpPointer);
    {
        LPD3DTRIANGLE tri = lpPointer;
        WORD flags = D3DTRIFLAG_EDGEENABLETRIANGLE;
        tri[0].v1 = 0; tri[0].v2 = 6; tri[0].v3 = 7; tri[0].wFlags = flags;
        tri[1].v1 = 0; tri[1].v2 = 7; tri[1].v3 = 4; tri[1].wFlags = flags;
        tri[2].v1 = 1; tri[2].v2 = 2; tri[2].v3 = 6; tri[2].wFlags = flags;
        tri[3].v1 = 1; tri[3].v2 = 6; tri[3].v3 = 0; tri[3].wFlags = flags;
        tri[4].v1 = 2; tri[4].v2 = 3; tri[4].v3 = 7; tri[4].wFlags = flags;
        tri[5].v1 = 2; tri[5].v2 = 7; tri[5].v3 = 6; tri[5].wFlags = flags;
        tri[6].v1 = 3; tri[6].v2 = 5; tri[6].v3 = 4; tri[6].wFlags = flags;
        tri[7].v1 = 3; tri[7].v2 = 4; tri[7].v3 = 7; tri[7].wFlags = flags;
        tri[8].v1 = 4; tri[8].v2 = 5; tri[8].v3 = 1; tri[8].wFlags = flags;
        tri[9].v1 = 4; tri[9].v2 = 1; tri[9].v3 = 0; tri[9].wFlags = flags;
        tri[10].v1 = 5; tri[10].v2 = 3; tri[10].v3 = 2; tri[10].wFlags = flags;
        tri[11].v1 = 5; tri[11].v2 = 2; tri[11].v3 = 1; tri[11].wFlags = flags;
        lpPointer = (void *) &tri[12];
    }

    OP_EXIT(lpPointer);

    /* Setup the execute data describing the buffer */
    lpD3DExBuf->lpVtbl->Unlock(lpD3DExBuf);
    memset(&d3dExData, 0, sizeof(D3DEXECUTEDATA));
    d3dExData.dwSize = sizeof(D3DEXECUTEDATA);
    d3dExData.dwVertexCount = NumVertices;
    d3dExData.dwInstructionOffset = (ULONG)((char*)lpInsStart - (char*)lpBufStart);
    d3dExData.dwInstructionLength = (ULONG)((char*)lpPointer - (char*)lpInsStart);
    lpD3DExBuf->lpVtbl->SetExecuteData(lpD3DExBuf, &d3dExData);

    /* Draw the cube (i.e. execute the instruction buffer) */
    if (lpDev->lpVtbl->BeginScene(lpDev) != D3D_OK)
        return FALSE;
    if (lpDev->lpVtbl->Execute(lpDev, lpD3DExBuf, lpView, D3DEXECUTE_CLIPPED) != D3D_OK)
        return FALSE;
    if (lpDev->lpVtbl->EndScene(lpDev) != D3D_OK)
        return FALSE;

    And here is the same operation in OpenGL:

            glShadeModel( GL_FLAT );
            /* X/Z faces */
            glBegin( GL_QUAD_STRIP );
                glColor3f( 1.0, 0.0, 0.0 );
                glVertex3f(  1.0,  1.0,  1.0 );
                glVertex3f(  1.0, -1.0,  1.0 );
                glVertex3f(  1.0,  1.0, -1.0 );
                glVertex3f(  1.0, -1.0, -1.0 );
                glColor3f( 0.0, 1.0, 0.0 );
                glVertex3f( -1.0,  1.0, -1.0 );
                glVertex3f( -1.0, -1.0, -1.0 );
                glColor3f( 0.0, 0.0, 1.0 );
                glVertex3f( -1.0,  1.0,  1.0 );
                glVertex3f( -1.0, -1.0,  1.0 );
                glColor3f( 1.0, 1.0, 0.0 );
                glVertex3f(  1.0,  1.0,  1.0 );
                glVertex3f(  1.0, -1.0,  1.0 );
            glEnd();
            /* Y faces */
            glBegin( GL_QUADS );
                glColor3f( 1.0, 0.0, 1.0 );
                glVertex3f(  1.0,  1.0,  1.0 );
                glVertex3f(  1.0,  1.0, -1.0 );
                glVertex3f( -1.0,  1.0, -1.0 );
                glVertex3f( -1.0,  1.0,  1.0 );
                glColor3f( 0.0, 1.0, 1.0 );
                glVertex3f(  1.0, -1.0,  1.0 );
                glVertex3f(  1.0, -1.0, -1.0 );
                glVertex3f( -1.0, -1.0, -1.0 );
                glVertex3f( -1.0, -1.0,  1.0 );
            glEnd();
            if (glGetError())
                return FALSE;
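
    The vulnerability of hand-packed buffers noted earlier in this section comes from advancing a raw pointer with no record of the buffer's capacity. As a point of comparison only, the following sketch shows a writer that carries its capacity and refuses a record that would not fit. `CmdBuffer` and `cmdbuf_append` are hypothetical names and are not part of Direct 3D or OpenGL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command buffer with an explicit capacity and write cursor. */
typedef struct {
    unsigned char *data;
    size_t capacity;   /* total bytes available in data */
    size_t used;       /* bytes written so far */
} CmdBuffer;

/* Append one record of `size` bytes.  Returns 1 on success, or 0 (leaving
 * the buffer untouched) if the record would overrun the buffer. */
int cmdbuf_append(CmdBuffer *buf, const void *record, size_t size)
{
    assert(buf != NULL && record != NULL);
    assert(buf->used <= buf->capacity);

    if (size > buf->capacity - buf->used)
        return 0;                       /* would overflow: reject */

    memcpy(buf->data + buf->used, record, size);
    buf->used += size;
    return 1;
}
```

    A miscounted size computation then shows up as a failed append at the offending call, rather than as silent corruption discovered later.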
    

    7.3 Debugging

    To make finding and fixing bugs easier, programming interfaces should report run-time errors at or near the code that causes the problem.

    It has already been noted that inserting commands into an execute buffer is error prone. Unfortunately, it is only when the execute buffer is executed that errors are reported. In addition, the only possible indications of a malformed execute buffer are the DDERR_INVALIDOBJECT and DDERR_INVALIDPARAMS return values from the execute method [5C19]. These error values provide little information to assist debugging.

    A validate method is provided that can be used to debug execute buffer errors [5C30]. The method invokes a callback function to return the offset into the execute buffer at which an error was first detected. Only the offset, and not the nature of the error, is returned. Even if more useful debugging information were returned, the fact remains that the execute buffer's corruption is likely to have occurred long before the error is detected. Even with the validate method, there is no easy way to work back from the error to the point in the application at which the execute buffer was corrupted.

    In contrast, OpenGL posts run-time errors as commands are executed, and meaningful error codes help indicate the nature of the problem. By querying the OpenGL error state with the glGetError command, it is possible to isolate the problem to the erroneous OpenGL command. While OpenGL run-time errors can be caused only by improper OpenGL commands, Direct 3D errors can arise from improper commands or corruption of the execute buffer caused by unrelated code.
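
    The error-reporting behaviour described here can be modelled in plain C. This is a simplified sketch rather than OpenGL itself: `Context`, `set_shade_model`, `get_error`, and the `ERR_*` and `SHADE_*` constants are hypothetical stand-ins for the GL context, `glShadeModel`, `glGetError`, and the corresponding GL enumerants. It assumes a single error flag that records the first error, ignores the offending command, and is cleared when queried.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ERR_NONE = 0, ERR_INVALID_ENUM } ErrorCode;

enum { SHADE_FLAT = 1, SHADE_SMOOTH = 2 };

/* Hypothetical rendering context holding state and one error flag. */
typedef struct {
    ErrorCode error;
    int shade_model;
} Context;

/* Record an error only if none is pending, so the first error is kept. */
void record_error(Context *ctx, ErrorCode e)
{
    if (ctx->error == ERR_NONE)
        ctx->error = e;
}

/* Model of a state-setting command: a bad argument posts an error and
 * leaves the current state unchanged. */
void set_shade_model(Context *ctx, int mode)
{
    assert(ctx != NULL);
    if (mode != SHADE_FLAT && mode != SHADE_SMOOTH) {
        record_error(ctx, ERR_INVALID_ENUM);
        return;                 /* offending command is ignored */
    }
    ctx->shade_model = mode;
}

/* Model of the error query: return the pending error and clear the flag. */
ErrorCode get_error(Context *ctx)
{
    ErrorCode e;
    assert(ctx != NULL);
    e = ctx->error;
    ctx->error = ERR_NONE;
    return e;
}
```

    Querying after each command during debugging narrows a failure to the single call that posted it, which is the property the execute buffer model lacks.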


    8 Conclusions

    There are a number of shortcomings in the design and current implementation of Direct 3D Immediate Mode. In general, these affect performance, code and data portability, functionality, and ease of use. In the long run, the consequences include higher development cost, more difficult and expensive object modelling, and less effective graphics acceleration.

    For most applications, OpenGL is a better alternative than Direct 3D. Besides the technical considerations described above, OpenGL is the only graphics API available on all of the following platforms: Windows NT, Windows 95, MacOS, BeOS, OS/2 and most versions of Unix. As far as performance is concerned, Microsoft recently released an optimized OpenGL implementation for Windows 95 and Windows NT. This release features a new OpenGL hardware abstraction layer, called the mini-client driver (or MCD), that makes implementing the API on commodity PC graphics hardware very straightforward. In addition, Silicon Graphics will continue to push PC performance with the availability of Cosmo OpenGL.

    Given all of these factors, developers should weigh the deficiencies of Direct 3D carefully before deciding to use it rather than OpenGL.



    We welcome feedback and comments at webmaster@www.sgi.com.

    COPYRIGHT © 1994, 1995 Silicon Graphics, Inc. All Rights Reserved.