As a noob in rendering pipelines and computer graphics in general, something that was very puzzling to me, and that I’d just accepted as-is without truly grasping, was that in a 3D rendering pipeline the clipping of transformed vertices takes place in yet another space, oddly named “clip-space”, before perspective division is done. Why the hell?!
As you may know, GPUs invoke a user-defined program called the vertex shader for each vertex of a given primitive, so that the primitive is transformed from whatever space it was defined or modeled in into clip-space. The result of this invocation is a 4-element vector in homogeneous coordinates that is used during the clip test: only those primitives (more like their vertices, but anyways) that survive the clip test are passed down the pipeline, and the rest are clipped, i.e. removed from the pipeline. The reason GPUs do clipping is not only to save compute and bandwidth by discarding primitives that would otherwise be wasted work because they fall outside the viewport, but also to guarantee that rasterizers, poor guys having to deal with lots of corner cases already, work within a well-defined range around the eye/camera/viewpoint and never have to deal with arbitrarily positioned vertices (think of numerical precision). That is the clipping operation in a nutshell; however, it might not be immediately obvious to you, as it was not to me, why we invent yet another confusing space only to apply clipping.
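To make the clip test concrete, here is a minimal Python sketch of the per-vertex inside/outside check on those 4-element homogeneous vectors. I’m assuming the OpenGL-style clip volume, −w ≤ x, y, z ≤ w (Direct3D and Vulkan use 0 ≤ z ≤ w for depth instead); real HW tests whole primitives, not lone vertices, but the per-vertex predicate looks like this:

```python
def inside_clip_volume(v):
    """Return True if a clip-space vertex (x, y, z, w) lies inside the
    canonical view volume -w <= x, y, z <= w (OpenGL convention).

    Note there is no division here: the test works directly on the
    homogeneous coordinates, and a vertex with w < 0 can never pass,
    since -w <= x <= w is unsatisfiable when -w > w.
    """
    x, y, z, w = v
    return -w <= x <= w and -w <= y <= w and -w <= z <= w
```

Notice how a behind-the-camera vertex (negative w) is rejected for free, which is exactly the property the rest of this post is about.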
If you think about it, there are 3 candidate locations in the whole 3D pipeline where we could handle the clipping:
- Before perspective/orthographic/what-have-you projection using the planes of view frustum
- In 4D clip-space before perspective division
- In 3D space after perspective division, i.e. working in NDC
The first doesn’t seem so suitable to me, as the calculations would be tied to the way the camera (Is it perspective? Is it orthographic?) is modeled in a 3D application, how about you?
The second, hmm let’s set it aside for a second.
The third looks like a good candidate: we are already done with projecting vertices and dividing by w to give the good old feeling of three-dimensional perspective, and we can work within the view-volume cuboid, handling all degenerate cases like NaNs/Infs nice and easy. Let’s see what happens to the vertices behind the eye/camera/viewpoint:
What is going on in this picture is this: the point behind the origin, Q2, with a negative w value, when simply transformed via the vertex shader to clip-space first and then perspective-divided, projects to a point in front of the viewer as if it were visible, which is wrong. What we want is to detect such points with negative w values before applying the perspective division, because unless we do something to preserve the fact that this point had w < 0 and was behind the camera, we lose this information: after perspective division, it will have been projected to a point in front of the eye.
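To put numbers on it, here’s a tiny Python sketch (just the arithmetic, not how any GPU actually does it), assuming the usual perspective setup where clip-space w is positive in front of the camera and negative behind it:

```python
def perspective_divide(v):
    """Naively divide x, y, z by w, ignoring the sign of w."""
    x, y, z, w = v
    return (x / w, y / w, z / w)

# A point in front of the camera (w > 0)...
front = (1.0, 0.0, 2.0, 2.0)
# ...and a point behind it (w < 0). After the naive divide they land
# on the very same NDC point, so the "was behind the eye" information
# is irrecoverably gone.
behind = (-1.0, 0.0, -2.0, -2.0)

assert perspective_divide(front) == perspective_divide(behind)
```

Dividing by a negative w flips the signs of x, y, and z, which is exactly the mirroring you see in the picture: the invisible point lands on top of a perfectly visible one.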
Besides the fact that applying clipping in 4D clip-space greatly simplifies the math of the inside/outside tests against the clip planes, it also lets us clip primitives behind the camera rather easily. And if you think that vertices ending up behind the camera is a rare case in a 3D application, think again!
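Why does 4D make the math simple? Each clip plane corresponds to a signed distance function that is *linear* in the homogeneous coordinates, so finding where an edge crosses a plane is plain linear interpolation, with no division by w anywhere. A hypothetical sketch for a single plane (the left one, x = −w; the other five planes work the same way with a different distance function):

```python
def clip_edge_left_plane(v0, v1):
    """Clip the segment v0-v1 against the left plane x = -w in clip space.

    d(v) = x + w is linear in the homogeneous coordinates, so the
    crossing point is found by simple linear interpolation over all
    four components. This stays well-defined even when one endpoint
    has w <= 0, i.e. lies behind the camera.
    """
    d0 = v0[0] + v0[3]
    d1 = v1[0] + v1[3]
    if d0 >= 0 and d1 >= 0:        # both endpoints inside
        return v0, v1
    if d0 < 0 and d1 < 0:          # both endpoints outside
        return None
    t = d0 / (d0 - d1)             # parameter of the crossing point
    hit = tuple(a + t * (b - a) for a, b in zip(v0, v1))
    return (hit, v1) if d0 < 0 else (v0, hit)
```

Clipping a full primitive against the whole clip volume is then just this step repeated per plane (Sutherland–Hodgman style); the point here is only that each step is cheap, branch-light, and safe for negative w.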
Now, even though what’s actually done in GPU HW can be waaaay different from what’s outlined here in terms of actual computations or implementations, it’s not that far from the truth and from what you’d see in the wild.
Please note that for the sake of clarity I deliberately over-simplified a lot (what happens to partially visible primitives? what about guard-band clipping? does the HW really check intersection against each half-plane? Really?!); maybe these will get to be an excuse for me to dust off my writing.