Adam Frisby


Ideas for Scene Graph Optimisation


Author's note: I use terms such as 'disadvantage' when referring to Second Life's building tools in comparison with professional tools in the hands of professional artists; naturally, user-generated content tends towards less efficient building techniques. This is not a slight on the content creators themselves – it is just that the tools create a lot more work for people writing renderers and worrying about efficiency.

Second note: like my previous post, a great deal of this is speculation. I plan on confirming or denying a good number of my suspicions through the Xenki viewer's design, but at this point this should be read as ramblings on the author's blog rather than as any authoritative statement.

As a follow-on from my previous post, I have some more ideas I'd like to try putting into practice, aimed directly at rendering Second Life(tm)-style scenes faster in Xenki. As far as I can tell, the mainline SL client gets there through a combination of utter brute force (equivalent to sending an entire dam through a garden hose every minute – it's pretty impressive) and lots and lots and lots of caching.

This is not going to play well with WPF at all (I can see that much already): first, we don't have access to low-level hardware, and second, I don't want to debug a thousand graphics glitches against every nuance of every piece of hardware. Thanks, but no thanks – I'd rather let Microsoft worry about that part.

So, if brute force is out of the question, what options exist for making things render faster?

First is the obvious one – let’s cache better.

One thing that has been lamented previously is that Second Life has dynamic content, ergo we cannot cache the scene. I suspect this isn't the whole story: while it is true that every object in the scene can potentially move (through scripts or avatar building) at any moment, we can evaluate objects on probability and discount whole swathes of the scene as unlikely to move.

Objects

Objects can be split fairly easily into "likely to move" and "unlikely to move." Likely-to-move objects were either recently created, are marked temporary or physical, or contain scripts. While it is true that the others could still move, the probability is significantly lower, and therefore we can more readily cache them. If one of them does move, we will need to rebuild that cache (without the object that moved), but for now that is an acceptable cost.
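As a very rough sketch of that split (in C#, since Xenki is a .NET project) – note that the Prim type and its fields here are placeholders I've invented for illustration, not actual viewer types:

```csharp
using System;

// Hypothetical object record – field names are assumptions for
// illustration, not real viewer types.
class Prim
{
    public DateTime CreatedAt;
    public bool IsTemporary;
    public bool IsPhysical;
    public bool HasScripts;
}

enum Mobility { LikelyToMove, UnlikelyToMove }

static class MobilityClassifier
{
    // Mirrors the criteria above: recently created, temporary,
    // physical, or scripted objects are treated as likely to move.
    public static Mobility Classify(Prim prim, DateTime now)
    {
        bool recent = (now - prim.CreatedAt) < TimeSpan.FromMinutes(5);
        if (recent || prim.IsTemporary || prim.IsPhysical || prim.HasScripts)
            return Mobility.LikelyToMove;   // render individually
        return Mobility.UnlikelyToMove;     // candidate for the static cache
    }
}
```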

This cache could take the form of rendering the entire 'static' portion of the scene into a single massive vertex buffer, and then rendering the dynamic elements individually (or in smaller caches). This is very similar to how modern games work – although there you have the advantage of being able to build a BSP tree in the editor. I am uncertain whether we can generate a BSP fast enough to make this dynamic cache feasible, but it is an interesting idea nonetheless (insert additional concerns about wide open spaces and BSP trees here).
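In WPF terms, "one massive vertex buffer" roughly translates to folding all the static geometry into a single MeshGeometry3D and freezing it. A minimal sketch, assuming all inputs share one material (texture/material handling is deliberately ignored here):

```csharp
using System.Collections.Generic;
using System.Windows;
using System.Windows.Media.Media3D;

static class StaticSceneBaker
{
    // Concatenate many meshes into one, re-basing triangle indices as
    // we go. Freeze() lets WPF cache the result aggressively and makes
    // it safe to share across threads.
    public static MeshGeometry3D MergeStaticMeshes(IEnumerable<MeshGeometry3D> meshes)
    {
        var merged = new MeshGeometry3D();
        foreach (MeshGeometry3D mesh in meshes)
        {
            int offset = merged.Positions.Count;
            foreach (Point3D p in mesh.Positions)
                merged.Positions.Add(p);
            foreach (Point uv in mesh.TextureCoordinates)
                merged.TextureCoordinates.Add(uv);
            foreach (int index in mesh.TriangleIndices)
                merged.TriangleIndices.Add(offset + index);
        }
        merged.Freeze();
        return merged;
    }
}
```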

A potential downside is that LOD will need to change for this to be effective – rather than having LOD calculated on the fly as your camera moves, we will need to fix it when the cache is built, then only update it periodically as the cache refreshes. In this case, LOD may become a function of the size of the object in absolute terms rather than relative to screen space.
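The LOD selection could then collapse to something like this – the thresholds below are numbers I've invented purely for illustration:

```csharp
static class Lod
{
    // LOD from absolute object size, so the choice stays valid for the
    // lifetime of the cache instead of changing with the camera.
    public static int LevelFor(double boundingRadiusMetres)
    {
        if (boundingRadiusMetres > 8.0) return 0;   // full detail
        if (boundingRadiusMetres > 2.0) return 1;
        if (boundingRadiusMetres > 0.5) return 2;
        return 3;                                   // lowest detail
    }
}
```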

Maintaining this cache on an idle processor

One of the great things about processors lately is the abundance of cores being added; chances are there is a piece of hardware sitting in this machine without much to do. We can leverage that by doing the cache building and maintenance on a separate thread running on another core. Because the cache is not a prerequisite to rendering, we can optimise it in the background and then use it once it is available.
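A minimal sketch of that arrangement, reusing the MergeStaticMeshes helper sketched earlier: the worker thread builds and freezes the new cache, then publishes it with a single reference swap, so the render thread never blocks waiting on it.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Windows.Media.Media3D;

class SceneCache
{
    // The render thread reads this; frozen Freezables are safe to
    // share across threads in WPF.
    volatile MeshGeometry3D current;

    public MeshGeometry3D Current { get { return current; } }

    public void RebuildInBackground(List<MeshGeometry3D> staticMeshes)
    {
        var worker = new Thread(() =>
        {
            // Build (and Freeze) off the render thread, then publish
            // via an atomic reference swap; the old cache is GC'd.
            current = StaticSceneBaker.MergeStaticMeshes(staticMeshes);
        });
        worker.IsBackground = true;
        worker.Start();
    }
}
```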

Handling Textures Better

Second Life has the disadvantage of not using professionally created textures on every surface – this means it is possible for a microscopic object you cannot really see to have a massive 1024×1024 texture attached to it, increasing both bandwidth usage and the amount of texture memory consumed in displaying your scene.

One idea for fixing this is to measure the surface area each texture is applied to, then use that area to approximate the resolution at which we should render the texture (converting that 1024×1024 texture down to a 32×32 one if it is only used once, on that object).
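A sketch of that approximation – the texel density target here is an arbitrary number chosen for illustration, not a measured figure:

```csharp
using System;

static class TextureSizing
{
    // Map the total surface area a texture covers to a power-of-two
    // resolution, clamped between 32 and 1024. ~25 texels per metre
    // is an invented density target.
    public static int TargetSize(double surfaceAreaSquareMetres)
    {
        double idealEdge = Math.Sqrt(surfaceAreaSquareMetres) * 25.0;
        int size = 32;
        while (size < idealEdge && size < 1024)
            size *= 2;
        return size;   // e.g. a texture covering 0.1 m² comes back as 32
    }
}
```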

Doing this, in combination with careful management of the amount of texture memory available (downsampling until memory budget and applicability meet), may get around at least part of the "huge texture memory consumption" problem.
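The memory-budget side could be as simple as repeatedly halving whichever resident texture is currently largest until the working set fits – a crude sketch, assuming square textures and uncompressed 32-bit texels:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TextureBudget
{
    public static void FitToBudget(List<int> textureEdgeSizes, long budgetBytes)
    {
        // Square textures, 4 bytes per texel (no compression assumed).
        Func<int, long> bytes = edge => (long)edge * edge * 4;

        while (textureEdgeSizes.Sum(bytes) > budgetBytes)
        {
            int largest = textureEdgeSizes.IndexOf(textureEdgeSizes.Max());
            if (textureEdgeSizes[largest] <= 32)
                break;                        // floor reached; give up
            textureEdgeSizes[largest] /= 2;   // downsample one mip level
        }
    }
}
```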

Written by Adam Frisby

August 6th, 2008 at 4:43 pm

Posted in Xenki


Procedural Generation of Prims considered harmful?


Yep.

I said it – one of the things that has been touted as so fantastic about SL's rendering performance is prims: the speed at which you can push them to the graphics card, the amount of caching in vertex buffers that can be done, and so on.

I'm about to argue that this doesn't actually seem to matter that much: prims lose out in a lot of cases for some very interesting but difficult-to-fix reasons, and the performance workarounds are going to be complex, irritating, and make me wish I were dealing with my precious meshes.

I should note here that the performance of the XBAP application on my crummy laptop graphics card is still relatively solid – and I'm brute-forcing nearly every operation at this point.

Reason Number Uno: Fill rate, “invisible” triangles.

Prims waste a lot of triangles in areas we cannot see. Occlusion culling of whole objects works well here, but it doesn't help when the waste is a few thousand triangles that are part of an object yet inseparable from it. This is due more to construction techniques than anything we can fix at the renderer level, but it nonetheless has a major impact on performance.

Possible Solutions

I'm currently experimenting with CSG (Constructive Solid Geometry – boolean operations) as a method of reducing the number of hidden triangles pushed to the screen. This gets complicated once transparent surfaces are involved, but if we exclude transparent primitives from the algorithm we may get a reasonable reduction in triangles pushed to the screen, at the expense of increasing the number of vertex buffers used (prims do have vertex caching on their side).
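To be clear about the shape of the idea – and this is not working CSG; Mesh, Tessellate, and CsgUnion below are all hypothetical placeholders – transparent prims bypass the algorithm entirely, and only the opaque set is unioned:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Mesh { /* triangle data – placeholder type */ }

class Prim
{
    public bool HasTransparency;
    public Mesh Tessellate() { /* placeholder tessellation */ return new Mesh(); }
}

static class HiddenTriangleReducer
{
    public static List<Mesh> Reduce(List<Prim> prims)
    {
        var output = new List<Mesh>();
        var opaque = new List<Prim>();

        foreach (Prim prim in prims)
        {
            if (prim.HasTransparency)
                output.Add(prim.Tessellate());  // leave transparent prims alone
            else
                opaque.Add(prim);
        }

        // Union the opaque solids so triangles buried inside the
        // combined volume never reach a vertex buffer.
        output.Add(CsgUnion(opaque.Select(p => p.Tessellate())));
        return output;
    }

    static Mesh CsgUnion(IEnumerable<Mesh> meshes)
    {
        // Placeholder for the boolean-union step under investigation.
        throw new NotImplementedException();
    }
}
```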

This is something I plan to experiment with, and I am looking at ways to do CSG in C# without having to dig out research papers.

Reason Number Duo: Really Inefficient Texturing

This is a more annoying issue – namely that as we draw the triangles for a procedural surface, we have to switch textures multiple times to render a single primitive (assuming it isn't the same texture on all sides). On a spherical or curved surface this isn't much of a problem: we push a few thousand triangles, flip, push a few thousand more. Fine.

On boxes: push 2 triangles. Flip. Push 2 triangles. Flip. Now, of course, it is better not to flip at all – and, as some people will point out, flipping once per few thousand triangles is still far more efficient than flipping once per two. The problem here is how primitives differ from mesh-based models.

Traditionally in mesh-based modelling, you generate a single texture with a UV map for the entire object. By wrapping and contorting it, you can render the entire object in one pass, which means we don't need to stop, do a new texture lookup, and repeat nearly as many times. It still happens occasionally, but the count is much, much lower.

If your scene (as in a modern game) only has 50 uniquely textured objects visible at once (look closely and you will find it is probably not much higher than that), this is fine. It works well – if we stage our render pipeline appropriately, we might even be able to group these into a single pass each.

SL? You're lucky if your scene has fewer than 100 textures visible. I've seen regions where this number is many times higher, potentially in the thousands – and, as I pointed out earlier, we're flipping textures midway through rendering single object collections, which may well erode the performance gains we got from being able to cache those collections in the first place.

Yeuck.

Some possible solutions here

There are a couple of potential solutions to this, but I think the easiest is to leave it to ATI/NVIDIA/Intel – pipelining similar textures is something I expect their drivers to do. If it does become a problem, I have some ideas for grouping similarly textured faces from different primitive groups into single vertex collections, sketched below.
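The grouping idea itself is straightforward – collect faces keyed by texture ID so each texture is bound once per frame rather than once per face. Face here is an illustrative type I've made up, not a viewer API:

```csharp
using System;
using System.Collections.Generic;

class Face
{
    public Guid TextureId;   // placeholder: SL textures are keyed by UUID
    // ... vertex data would live here ...
}

static class TextureBatcher
{
    // One bucket per unique texture; render each bucket with a single
    // texture bind instead of flipping per face.
    public static Dictionary<Guid, List<Face>> GroupByTexture(IEnumerable<Face> faces)
    {
        var batches = new Dictionary<Guid, List<Face>>();
        foreach (Face face in faces)
        {
            List<Face> batch;
            if (!batches.TryGetValue(face.TextureId, out batch))
            {
                batch = new List<Face>();
                batches.Add(face.TextureId, batch);
            }
            batch.Add(face);
        }
        return batches;
    }
}
```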

Written by Adam Frisby

August 6th, 2008 at 3:30 pm

 
