Vladimir Vukicevic is an employee of the Mozilla Corporation, currently working on Gecko 1.9, the next version of the rendering engine used in the Firefox browser. He focuses mainly on back-end rendering, specifically the move to "Cairo" across all platforms, but he works in other areas as well. He’s been involved full time with the Mozilla Project for just over 2 years now.
Questions and Answers via email, asked in September, 2006.
AB: Vlad, what’s your vision of "Web3D?" How do you differentiate it from the common "2D web" experience and existing "networked 3D" applications?
VV: I think the 3D web is somewhat of a middle ground between the two, blending 3D with 2D content. It should also retain one of the main characteristics of early HTML content, which is that you should be able to copy-and-paste from sites that you visit to bring the same content to your own sites.
"Web3D" won’t be about meshes and normal maps and fragment shaders, but it will be about enhancing the current web experience with 3D content, whether that’s for data visualization, for aesthetics, or for novelty factor. Taking advantage of 3D features in UI is also an interesting area, and I hope that putting 3D capabilities alongside HTML will allow for easier experimentation in that area.
AB: Common personal computers have been powerful enough for decent 3D graphics for 5, maybe 10 years. Yet, for most participants on the world wide web, 3D is not a typical daily feature, despite some very successful specialized uses of networked 3D, like Second Life, Google Earth, and World of Warcraft. Why do you think that is?
VV: The biggest reason I’d say is that 3D is far less approachable than 2D. It’s very easy to come up with ways to put together 2D elements in some meaningful way, namely photos/images, text, and shapes. Applications such as SketchUp and Google Earth are changing that for 3D, but it’s still much more difficult to create a "3D scene" than it is to create a 2D composition.
There’s also no shortage of 2D source materials, with cameras being pretty much omnipresent. Beyond that, few compelling use cases for mixing 3D and 2D have been discovered so far; the wildly successful 3D apps on desktops today focus heavily on the 3D side of the experience, typically in gaming and, to a lesser degree, in architectural and other walkthroughs where visualization of space is the most important element.
Some of the early research in 3D desktops (Looking Glass in particular) is seeing some of those ideas applied in the Linux community’s Xgl project and Microsoft’s Vista, so there will probably be much more experimentation in this space soon.
AB: When we talk about 3D desktops, we’re generally talking about manipulating (bending, stretching, flipping) application windows as 2D objects (more like paper, or a soft LCD panel) inside the full 3D arena. Even if an application uses 3D graphics, it’s still generally treated as "flat" source material, with the 3Dness lost in compositing. What sort of changes (OS, Mozilla, or API standards) would be needed to treat applications as 3D citizens of a truly 3D desktop? So you could, as you say, copy-and-paste from one 3D app to another or to the desktop?
VV: With Vista and with Xgl, I don’t think there are any major technical hurdles towards full 3D "applications" on the desktop. Copy and paste is a different problem, if you’re talking about copying 3D data; the biggest issue is the lack of a useful interchange format. It’s fairly easy for 2D image data, because there are some well-established formats that can carry around most of the useful data in an image (e.g., png, jpg, tiff for more complex stuff). Nothing as ubiquitous exists for 3D yet, though there may be hope with COLLADA.
AB: Okay. What are the biggest changes in Mozilla, both current and upcoming, that will support "Web3D?"
VV: It’s difficult to predict what will need to happen to Mozilla to fully support a 3D web, but we’re taking the first few steps on what we hope is the right path. We’re about to release an extension that adds an OpenGL-based 3D rendering context to the canvas element. This will allow for experimentation with 3D mixed with HTML content, though at a very low level.
The next step is already underway as well, which is to convert our graphics layer to a unified rendering architecture built on top of Cairo, a cross-platform 2D graphics library. One of the Cairo backends is OpenGL, and we will have the possibility of rendering the entire browser application through OpenGL, taking advantage of the acceleration that can provide. (We hope — a large portion of the browser’s performance profile is dominated by text measurement and text rendering, and OpenGL can’t help us much there.)
After this, it’s not all that clear what the next steps should be; it will take a while to see how people take advantage of either of these features, and then see where we need to build capabilities into the platform to support the usage of 3D and the mixing of 3D content with traditional 2D HTML content. I’m not sure where current 3D XML languages such as X3D fit into this; X3D has many of the same problems integrating with HTML that SVG does.
AB: Many of the sub-systems added after HTML was defined are not finely inter-mixable, though they may exist on the same resulting page. Can you talk about the typical problems with allowing plain HTML to exist inside a Canvas or SVG block, say, mapped to an arbitrary polygon surface, or alternately, allowing a 2D/3D object to fly across the main web page?
VV: SVG already has the notion of a <foreignObject>, which allows for placing non-SVG content within SVG, transformed by the current SVG transform. HTML can live within a foreignObject, and it behaves as if it were its own small page area, for rendering purposes. Because much of HTML layout is dependent on width, as long as a width can be given for the area where HTML is to be rendered (whether that’s in canvas, SVG, or some 3D surface), it’s possible to render HTML to a flat surface and then map/transform it to the destination. Event handling becomes tricky, but it’s doable.
So, there really isn’t much difficulty in placing HTML chunks within the context of a richer graphical environment, as long as the HTML object is the leaf node. Things become difficult if you want to mix HTML and another language without any "adapter" blocks — e.g. if you had a <web3d:sphere> object that rendered a 3D sphere, what would <html:em><web3d:sphere></html:em> do? I think the right way is to define some set of adapters between the different markup spaces, and not allow for arbitrary mixing.
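The "adapter blocks" idea can be sketched as a simple validity check over a mixed-namespace element tree. Everything here (the element names, the adapter names, the tree representation) is invented for illustration; it is not an actual Mozilla or W3C API.

```javascript
// A toy illustration of the "adapter blocks" idea: arbitrary mixing of
// html:* and web3d:* elements is disallowed, unless the namespace
// crossing happens through a designated adapter element. All element
// and adapter names below are hypothetical.

function ns(tag) { return tag.split(":")[0]; }

// Adapters that legally bridge one markup space into another,
// e.g. something foreignObject-like for embedding 3D inside HTML.
const ADAPTERS = new Set(["html:object3d", "web3d:htmlsurface"]);

// Returns true if every namespace crossing goes through an adapter.
function validMixing(el) {
  for (const child of el.children || []) {
    const crosses = ns(child.tag) !== ns(el.tag);
    if (crosses && !ADAPTERS.has(el.tag)) return false;
    if (!validMixing(child)) return false;
  }
  return true;
}

// <html:em><web3d:sphere/></html:em> -- rejected, no adapter between them.
const bad = { tag: "html:em", children: [{ tag: "web3d:sphere" }] };
// <html:object3d><web3d:sphere/></html:object3d> -- crossing via an adapter.
const ok  = { tag: "html:object3d", children: [{ tag: "web3d:sphere" }] };
console.log(validMixing(bad), validMixing(ok)); // false true
```

Under this rule, the ambiguous `<html:em><web3d:sphere>` case never arises: the validator rejects it before any renderer has to decide what emphasizing a sphere would mean.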
AB: Cascading Style Sheets (CSS) can be used with plain HTML, SVG, and possibly, Canvas to extend the power and reuse of common elements. How does CSS apply differently to 3D objects? Is it limited to styles like line-thickness or texture? Or will we be able to define new 3D styles, like 3D primitives in Second Life, or combinations thereof?
VV: Canvas actually doesn’t use CSS; all styling of anything rendered to a canvas element is done entirely through script. There is currently no CSS language for 3D content (at least, as far as I’m aware), but I think it would be a good fit. CSS was intended to apply presentation attributes (colors, font sizes/styles, etc.) to declarative markup (the actual data content that’s in the HTML), and doing that cleanly is difficult. In practice, it’s pretty hard to do everything with CSS; some of the presentation leaks out into the actual markup.
Things become even muddier when you consider CSS for what’s basically a presentation language, like SVG — is the fact that the triangle path is red part of the content (that is, an essential part of the data the SVG document is intended to convey), or is it part of the presentation (it can be changed or done away with without affecting the meaning of the document)?
The same problem applies to 3D, though I think that there is a pretty good fit for a CSS language there: CSS could define the rendering state, and the markup would define the relevant objects/meshes and their position relative to each other. The 3D markup would become purely a geometry/position graph, with style/state information provided by a separate mechanism. This would allow for some pretty powerful results using a relatively simple mechanism; for example, you could easily set every sphere in your scene to red or a certain texture without having to make them all descendants of some common "red material" node or having to manually apply the same material to every sphere node in the graph.
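The split described here (a pure geometry graph, with state resolved from a separate, cascading rule set) can be sketched in a few lines. The node structure and rule syntax below are invented for illustration; no such CSS-for-3D language actually exists.

```javascript
// Hypothetical sketch of a scene graph that carries only geometry and
// structure, with rendering state supplied by CSS-like rules that
// match on node type. Names and rule format are assumptions.

// A minimal node: a shape type plus children; no material lives here.
function node(type, children) {
  return { type: type, children: children || [] };
}

// The "stylesheet": later matching rules override earlier ones,
// mimicking the CSS cascade.
const rules = [
  { selector: "sphere", style: { color: "red" } },
  { selector: "cube",   style: { color: "gray", texture: "stone.png" } },
];

// Resolve the effective style for one node by merging matching rules.
function styleFor(n) {
  const out = {};
  for (const r of rules) {
    if (r.selector === n.type) Object.assign(out, r.style);
  }
  return out;
}

// Walk the graph, pairing each node with its resolved state. Every
// sphere comes out red without a shared "red material" ancestor and
// without touching any sphere node by hand.
function flatten(n, acc) {
  acc = acc || [];
  acc.push({ type: n.type, style: styleFor(n) });
  for (const c of n.children) flatten(c, acc);
  return acc;
}

const scene = node("group", [node("sphere"), node("cube"), node("sphere")]);
console.log(flatten(scene));
```

The key property is the one from the interview: restyling the whole scene means editing `rules`, never the graph.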
[Note: for a longer discussion of scenegraphs and rendering state, you might enjoy my older Scenegraphs article. Just substitute CSS_3D for "state graph."]
AB: One of the challenges with using OpenGL is security, due in part to very large [unchecked] arrays that could result in buffer overruns. But other challenges include issues like real-time streaming of large datasets from multiple network sources, handling multiple resolutions for each object, and sharing limited resources, like texture memory. What’s your overall approach to managing such complexity inside the Canvas tag, both for performance and ease of use?
VV: While implementing the OpenGL context for Canvas, the solution I came up with for the security issues was to sacrifice performance for at least some security; any arrays that you pass in are explicitly wrapped with their length and element size, and I explicitly check any array-using methods; e.g. if you’re drawing triangles, I make sure that there are at least 3*N vertices in the vertex array and so on. If drawing from an index buffer, I make sure that each array is at least as long as the highest index. If you enable the vertex array, I make sure that there’s something bound to it, etc. This comes at a price, but it’s a better solution than directly exposing OpenGL (which is normally not a security-critical component) to web content.
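The kind of validation described above can be illustrated with a small bounds check for indexed triangle drawing. This is a sketch of the general technique, not the actual Mozilla canvas code; the function name and array layout are assumptions.

```javascript
// Illustrative sketch of validating draw parameters before handing
// arrays to GL: every index must reference a vertex that actually
// exists, so a hostile script can't cause reads past the buffer.

// vertices: flat array of floats, 3 components (x, y, z) per vertex.
// indices: element indices into that vertex array, 3 per triangle.
function validateDrawElements(vertices, indices) {
  if (vertices.length % 3 !== 0) return false; // incomplete vertex data
  const vertexCount = vertices.length / 3;
  if (indices.length % 3 !== 0) return false;  // incomplete triangle
  // Each index must be an integer inside the vertex array.
  for (const i of indices) {
    if (!Number.isInteger(i) || i < 0 || i >= vertexCount) return false;
  }
  return true;
}

// A quad as two triangles over 4 vertices: passes.
const verts = [0, 0, 0,  1, 0, 0,  1, 1, 0,  0, 1, 0];
console.log(validateDrawElements(verts, [0, 1, 2,  0, 2, 3])); // true
// Index 4 points past the last vertex: rejected before GL sees it.
console.log(validateDrawElements(verts, [0, 1, 4]));           // false
```

The per-element loop is exactly the "sacrifice performance for security" trade-off mentioned: the check is O(n) over the index buffer on every draw call.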
As for the rest of the issues, I actually don’t manage them at all — I’m punting on all the complex mesh/dataset and resource management issues, and letting people working with the canvas OpenGL bits implement these in script. As solutions show up, I plan on evaluating where the bottlenecks are and improving performance or providing native code versions of common operations.