This is a lighter overview of what D3D12's mesh and amplification shaders are, which may or may not be useful. "Amplification shader" is the D3D12 term for what Nvidia calls a "task shader".
A mesh shader is a compute-like shader that is glued to a fragment shader. This completely lops off the entire rendering pipeline prior to rasterization. The mesh shader is compute-like because you specify how many threads are in a work group, you just dispatch a bunch of work groups to run it, and you get to use things like group shared memory and so on. Mesh shaders are special because they output triangles, which are then rasterized and fed to a normal pixel shader. There's also an optional stage before the mesh shader called a task shader, which I'll explain later.
When rendering with mesh shaders, you still get early z/stencil culling before the pixel shader (in the circumstances in which you would normally get early z culling), and you also get the normal ROP stuff you'd get from an ordinary raster pipeline.
What is this used for? Well, for one, you can do just about everything (and more!) with mesh shaders that the ill-fated tessellation and geometry shaders could do. There's also this "meshlet" rendering technique, where you represent your models as short triangle strips, and you use the mesh shader to aggressively cull meshlets before rasterizing them. These are big, scary techniques that are very case-specific, but I think there's a lot of potential for novel uses that don't involve mimicking the entire vertex pipeline.
In this post I call mesh shaders and task shaders "compute-like", because they are very similar to compute shaders by design. However, in Nvidia's current hardware, they are a type of graphics shader, and thus subject to different limitations and performance characteristics.
I mean, yes - but only with Nvidia's special vendor extension that adds it.
The new Turing architecture introduced a bunch of new and exciting features, like ray tracing! And, on top of that, they even made OpenGL vendor extensions for some of these! But not for ray tracing >:(.
Also none of the GPU debuggers I've tried support OpenGL mesh shaders, including Nsight, so beware!
For one, you need to have hardware that supports the GL_NV_mesh_shader extension.
To compile a shader program that consists of one mesh shader and one pixel shader, you'll use the enums GL_MESH_SHADER_NV and GL_MESH_SHADER_BIT_NV (instead of GL_VERTEX_SHADER and GL_VERTEX_SHADER_BIT). And then you'll need a mesh shader and a pixel shader. We'll get to that in a second.
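As a sketch, that's ordinary shader compilation with the new enum swapped in (error checking omitted; `meshSrc` and `fragSrc` are stand-ins for your GLSL source strings, and a context with GL_NV_mesh_shader is assumed to be current):

```c
/* Sketch: assumes meshSrc and fragSrc hold the shader source. */
GLuint mesh = glCreateShader(GL_MESH_SHADER_NV);
glShaderSource(mesh, 1, &meshSrc, NULL);
glCompileShader(mesh);

GLuint frag = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(frag, 1, &fragSrc, NULL);
glCompileShader(frag);

GLuint program = glCreateProgram();
glAttachShader(program, mesh);
glAttachShader(program, frag);
glLinkProgram(program);
```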
To draw this shader program, you'll use one of these guys:
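These are the draw entry points that GL_NV_mesh_shader adds (signatures per the extension spec):

```c
void glDrawMeshTasksNV(GLuint first, GLuint count);
void glDrawMeshTasksIndirectNV(GLintptr indirect);
void glMultiDrawMeshTasksIndirectNV(GLintptr indirect, GLsizei drawcount, GLsizei stride);
void glMultiDrawMeshTasksIndirectCountNV(GLintptr indirect, GLintptr drawcount,
                                         GLsizei maxdrawcount, GLsizei stride);
```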
This mesh shader just draws a full screen quad.
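Something along these lines (a sketch using the built-ins from the GL_NV_mesh_shader spec; the exact shader may have differed):

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 4, max_primitives = 2) out;

void main() {
    // Four corners of a full screen quad in clip space.
    gl_MeshVerticesNV[0].gl_Position = vec4(-1.0, -1.0, 0.0, 1.0);
    gl_MeshVerticesNV[1].gl_Position = vec4( 1.0, -1.0, 0.0, 1.0);
    gl_MeshVerticesNV[2].gl_Position = vec4(-1.0,  1.0, 0.0, 1.0);
    gl_MeshVerticesNV[3].gl_Position = vec4( 1.0,  1.0, 0.0, 1.0);

    // Two triangles covering the quad.
    gl_PrimitiveIndicesNV[0] = 0;
    gl_PrimitiveIndicesNV[1] = 1;
    gl_PrimitiveIndicesNV[2] = 2;
    gl_PrimitiveIndicesNV[3] = 2;
    gl_PrimitiveIndicesNV[4] = 1;
    gl_PrimitiveIndicesNV[5] = 3;

    gl_PrimitiveCountNV = 2;
}
```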
There isn't really much to say here.
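To kick it off, the draw call is presumably just:

```c
/* first = 0, launch one work group */
glDrawMeshTasksNV(0, 1);
```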
This one just draws one work group w/ the starting index being 0.
I want to note that normally you'll have a higher value of "local_size_x", like "32". Setting it to 1, like my mesh shader above does, launches one (1) thread per work group on your GPU, and since we're only launching one work group, it is essentially a scalar process. Which may or may not be what you want! I really doubt this is the best way to draw a full screen quad, however.
Task shaders are another kind of compute-like shader, one that launches mesh shaders. I haven't had a chance to play around with them yet, but they're not really any more complicated to use than mesh shaders are.
One really cool thing you can do here is just turn off rasterization and use a shader program that is just a task shader and a mesh shader. Why? Well, what you have in that case is something like a compute shader that can launch a variable number of work groups of a different compute-like shader. You can also do that with indirect dispatching, BUT, task shaders can also pass arbitrary output to the mesh shaders they launch without requiring you to dump it into a UAV (or something) first, so that is neat!
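Again with the caveat that I'm going off the spec rather than experience, the payload passing looks something like this on the task shader side (the block name "Task" and its contents are made up for this sketch):

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 1) in;

// Payload handed to every mesh shader work group we launch.
taskNV out Task {
    vec4 color;
} OUT;

void main() {
    OUT.color = vec4(1.0, 0.0, 1.0, 1.0);
    gl_TaskCountNV = 4; // launch four mesh shader work groups
}
```

The launched mesh shaders then read it back through a matching `taskNV in Task { vec4 color; } IN;` block.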
If you just want to play with a simple example, the things I'd look at next are interpolants and using gl_LocalInvocationID.x to figure out what thread you're in.
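For both of those at once, here's a sketch of a mesh shader that uses one thread per vertex and writes a per-vertex interpolant (the block and variable names are made up; the fragment shader would declare a matching non-arrayed `in` block at the same location):

```glsl
#version 450
#extension GL_NV_mesh_shader : require

layout(local_size_x = 3) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

// Per-vertex interpolant, indexed by vertex just like gl_MeshVerticesNV.
layout(location = 0) out Interpolants { vec3 color; } v_out[];

const vec2 positions[3] = vec2[](vec2(-1.0, -1.0), vec2(3.0, -1.0), vec2(-1.0, 3.0));
const vec3 colors[3]    = vec3[](vec3(1, 0, 0), vec3(0, 1, 0), vec3(0, 0, 1));

void main() {
    uint i = gl_LocalInvocationID.x; // which thread of the work group we are
    gl_MeshVerticesNV[i].gl_Position = vec4(positions[i], 0.0, 1.0);
    v_out[i].color = colors[i];
    gl_PrimitiveIndicesNV[i] = i;
    if (i == 0) gl_PrimitiveCountNV = 1; // one big screen-covering triangle
}
```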