## Tuesday, March 12, 2019

I am no expert in making BVHs fast so just use this blog post as a starting point.   But there are some things you can try if you want to speed things up.   All of them involve tradeoffs so test them rather than assume they are faster!

1. Store the inverse ray direction with the ray.   The most popular ray-bounding box hit test assumes this (used in both the intel and nvidia ray tracers as far as I know):

Note that if ray direction were passed in you would have 6 divides rather than 6 adds.

2.  Do an early out in your ray traversal.   This is a trick used in many BVHs but not the one I have in the ray tracing minibook series.   Martin Lambers suggested this version to me which is not only faster, it is cleaner code.

3. Build using the surface area heuristic (SAH).   This is a greedy algorithm that minimizes the sum of the areas of the bounding boxes in the level being built.    I based mine on the pseudocode in this old paper I did with Ingo Wald and Solomon Boulos.    I used simple arrays for the sets and the quicksort from wikipedia for the sort.

Jacco Bikker said...

- Traverse the nodes front-to-back. That way, you can cull nodes of which the entry point is beyond the currently nearest intersection. Expect a very significant performance increase.
- Do not bother sorting nodes for shadow rays and do an early out.
- Have dedicated shadow ray code; they just need to store a boolean, not an intersection distance + u + v.
- Store your intersection result as a set of four floats using a single float4 write. Modern GPUs will do a single mem op for this, which is much faster than e.g. writing 2 or 3 separate floats. Same for reading back that data later.
- Improve SAH construction speed with some binning. Yields nearly identical trees at a fraction of the cost.
- Consider spatial splits. A lot of extra work, but you can expect another 20% efficiency improvement for ray queries.
- Use a top-level BVH over BVhs built per scene graph node. This lets you construct static geometry BVHs offline, while dynamic objects move for free if the motion can be expressed with a matrix, or with a low quality BVH if it can't.

There's probably more, but this should be a good start.

- Jacco.

Jacco Bikker said...

Also interesting: Ganestam et al.'s Bonsai construction scheme is very fast and can be augmented with a very effective spatial split extension. It's a weird one, since it does the spatial splitting as a preprocess, but it yields nearly perfect trees (compared to straight Bonsai).

Carmelo J Fernández-Agüera Tortosa said...

Watch out for NaNs in the slab test!
instead of "vec3 tmin = min(t0,t1), tmax=max(t0,t1)" do "vec3 tmin = min(t0,t1), tmax=max(t1,t0)". Switching the order of the operands makes sure you don't propagate unnecessary NaNs.

Stefan Zellmann said...

Good advice on NVIDIA: make sure that nodes and prims are loaded from memory using 128bit LD instructions. Proper alignment and data layout is key to this. Can mean an extra 10% saved.

Also: recursion hurts here, but that’s a different topic.