This week I had the pleasure of presenting the experiments I’ve been doing for the past six months on GPU-driven rendering at the Digital Dragons conference in Poland. The event was well organised, with lots of interesting talks, and I finally managed to meet many awesome graphics people I previously only knew via Twitter.
I have uploaded the presentation slides in PDF and PPTX formats with speaker notes, in case anyone is interested, along with the modified source code I used for the experiments (an executable is included; to compile the code yourself you will need to download NvAPI).
The main difference between this and the previous version is that this time I pushed the number of instances to 20K (up from 2K) to get some meaningful profiling metrics. This required a change in the way I performed the scan for stream compaction to support more thread groups, as I describe in the presentation. This version also focuses on reducing the memory bandwidth requirements by splitting the instance data into separate streams, using 4×3 matrices for transformations and packing data as much as possible.
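To illustrate the scan-based stream compaction mentioned above, here is a minimal CPU-side sketch in Python of the hierarchical approach that scales to many thread groups: an exclusive scan within each group, a second scan over the per-group totals, then a scatter. This is an illustration of the general technique, not the actual HLSL from the source code; the function and variable names (`compact`, `GROUP_SIZE`, etc.) are my own.

```python
GROUP_SIZE = 4  # tiny for illustration; a real compute shader would use e.g. 256

def compact(visible_flags, instance_ids):
    """Keep instance_ids[i] where visible_flags[i] == 1, via a two-level scan."""
    # Pass 1: exclusive scan within each group, recording per-group totals.
    groups = [visible_flags[i:i + GROUP_SIZE]
              for i in range(0, len(visible_flags), GROUP_SIZE)]
    local_scans, group_sums = [], []
    for g in groups:
        total, scan = 0, []
        for v in g:
            scan.append(total)  # offset of this element within its group
            total += v
        local_scans.append(scan)
        group_sums.append(total)
    # Pass 2: exclusive scan of the group totals gives each group's base offset.
    base, group_offsets = 0, []
    for s in group_sums:
        group_offsets.append(base)
        base += s
    # Pass 3: scatter surviving instances into the compacted output buffer.
    out = [None] * base
    for gi, g in enumerate(groups):
        for li, v in enumerate(g):
            if v:
                out[group_offsets[gi] + local_scans[gi][li]] = \
                    instance_ids[gi * GROUP_SIZE + li]
    return out

# e.g. compact([1, 0, 1, 1, 0, 0, 1, 0, 1], list(range(9))) → [0, 2, 3, 6, 8]
```

On the GPU, pass 2 is what previously limited the instance count: with a single-group scan there is nowhere to combine the per-group totals, so adding that second scan level is what allows the compaction to span many thread groups.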
These changes dropped the full occlusion pass cost down to 0.25ms (for 20K instances) on a GTX970 and to about a millisecond on a laptop with an HD4000 GPU. Compared to the previous versions, the revised code can process and cull 10 times more instances on the HD4000.
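As a rough back-of-the-envelope check on why the 4×3 matrices help, the arithmetic below works out the transform bandwidth saved per culling pass at 20K instances. The numbers are simple assumptions (32-bit floats, one transform read per instance), not measurements from the post.

```python
FLOAT = 4                 # bytes per 32-bit float
mat4x4 = 16 * FLOAT       # 64 bytes per full transform
mat4x3 = 12 * FLOAT       # 48 bytes: the constant (0, 0, 0, 1) row is implicit
instances = 20_000

saved = instances * (mat4x4 - mat4x3)
print(saved)  # 320000 bytes (~313 KiB) less transform data read per pass
```

Splitting the instance data into separate streams compounds this: a pass that only needs, say, bounding spheres no longer drags the full per-instance struct through the cache.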
It is unfortunate that Intel does not support a MultiDraw*Indirect API extension, as performance profiling showed that issuing a large number of DrawIndexed*Indirect calls hurts performance on the HD4000.
I am looking forward to an even bigger Digital Dragons conference next year! We need more events like these in Europe.