Saturday, 6 April 2019

Reverse engineering the rendering of The Witcher 3, part 13b - witcher senses (outline map)

Welcome,

This is the second part of demystifying the Witcher Senses effect from The Witcher 3: Wild Hunt.

Once again, here is the example scene we are working on:


In the first post I showed how the "intensity map" is generated.
We have one full-resolution R11G11B10_FLOAT texture which can look like this:


The green channel represents "traces" and the red one represents interesting objects Geralt can interact with.

Having this, we can move on to the next stage, which I call the "outline map".

This is a slightly unusual 512x512 R16G16_FLOAT texture. What's important is that it's implemented in ping-pong fashion: the outline map from the previous frame is an input (along with the intensity map) for generating the new outline map in the current frame.

You can probably implement ping-pong buffers in many ways, but my personal favourite is as follows (pseudocode):
 // Declarations  
 Texture2D m_texOutlineMap[2];  
 uint m_outlineIndex = 0;  
   
 // Rendering  
 void Render()  
 {  
   pDevCon->SetInputTexture( m_texOutlineMap[m_outlineIndex] );  
   pDevCon->SetOutputTexture( m_texOutlineMap[!m_outlineIndex] );  
   ...  
   pDevCon->Draw(...);  
   
   // after draw  
   m_outlineIndex = !m_outlineIndex;  
 }  

Such an approach, where the input is always [m_outlineIndex] and the output is always [!m_outlineIndex], allows for nice flexibility when applying post-effects in general.

Let's take a look at the pixel shader:
 ps_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb3[1], immediateIndexed  
    dcl_sampler s0, mode_default  
    dcl_sampler s1, mode_default  
    dcl_resource_texture2d (float,float,float,float) t0  
    dcl_resource_texture2d (float,float,float,float) t1  
    dcl_input_ps linear v2.xy  
    dcl_output o0.xyzw  
    dcl_temps 4  
   0: add r0.xyzw, v2.xyxy, v2.xyxy  
   1: round_ni r1.xy, r0.zwzz  
   2: frc r0.xyzw, r0.xyzw  
   3: add r1.zw, r1.xxxy, l(0.000000, 0.000000, -1.000000, -1.000000)  
   4: dp2 r1.z, r1.zwzz, r1.zwzz  
   5: add r1.z, -r1.z, l(1.000000)  
   6: max r2.w, r1.z, l(0)  
   7: dp2 r1.z, r1.xyxx, r1.xyxx  
   8: add r3.xyzw, r1.xyxy, l(-1.000000, -0.000000, -0.000000, -1.000000)  
   9: add r1.x, -r1.z, l(1.000000)  
  10: max r2.x, r1.x, l(0)  
  11: dp2 r1.x, r3.xyxx, r3.xyxx  
  12: dp2 r1.y, r3.zwzz, r3.zwzz  
  13: add r1.xy, -r1.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  14: max r2.yz, r1.xxyx, l(0, 0, 0, 0)  
  15: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r0.zwzz, t1.xyzw, s1  
  16: dp4 r1.x, r1.xyzw, r2.xyzw  
  17: add r2.xyzw, r0.zwzw, l(0.003906, 0.000000, -0.003906, 0.000000)  
  18: add r0.xyzw, r0.xyzw, l(0.000000, 0.003906, 0.000000, -0.003906)  
  19: sample_indexable(texture2d)(float,float,float,float) r1.yz, r2.xyxx, t1.zxyw, s1  
  20: sample_indexable(texture2d)(float,float,float,float) r2.xy, r2.zwzz, t1.xyzw, s1  
  21: add r1.yz, r1.yyzy, -r2.xxyx  
  22: sample_indexable(texture2d)(float,float,float,float) r0.xy, r0.xyxx, t1.xyzw, s1  
  23: sample_indexable(texture2d)(float,float,float,float) r0.zw, r0.zwzz, t1.zwxy, s1  
  24: add r0.xy, -r0.zwzz, r0.xyxx  
  25: max r0.xy, abs(r0.xyxx), abs(r1.yzyy)  
  26: min r0.xy, r0.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  27: mul r0.xy, r0.xyxx, r1.xxxx  
  28: sample_indexable(texture2d)(float,float,float,float) r0.zw, v2.xyxx, t0.zwxy, s0  
  29: mad r0.w, r1.x, l(0.150000), r0.w  
  30: mad r0.x, r0.x, l(0.350000), r0.w  
  31: mad r0.x, r0.y, l(0.350000), r0.x  
  32: mul r0.yw, cb3[0].zzzw, l(0.000000, 300.000000, 0.000000, 300.000000)  
  33: mad r0.yw, v2.xxxy, l(0.000000, 150.000000, 0.000000, 150.000000), r0.yyyw  
  34: ftoi r0.yw, r0.yyyw  
  35: bfrev r0.w, r0.w  
  36: iadd r0.y, r0.w, r0.y  
  37: ishr r0.w, r0.y, l(13)  
  38: xor r0.y, r0.y, r0.w  
  39: imul null, r0.w, r0.y, r0.y  
  40: imad r0.w, r0.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
  41: imad r0.y, r0.y, r0.w, l(146956042240.000000)  
  42: and r0.y, r0.y, l(0x7fffffff)  
  43: itof r0.y, r0.y  
  44: mad r0.y, r0.y, l(0.000000001), l(0.650000)  
  45: add_sat r1.xyzw, v2.xyxy, l(0.001953, 0.000000, -0.001953, 0.000000)  
  46: sample_indexable(texture2d)(float,float,float,float) r0.w, r1.xyxx, t0.yzwx, s0  
  47: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.zwzz, t0.xyzw, s0  
  48: add r0.w, r0.w, r1.x  
  49: add_sat r1.xyzw, v2.xyxy, l(0.000000, 0.001953, 0.000000, -0.001953)  
  50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t0.xyzw, s0  
  51: sample_indexable(texture2d)(float,float,float,float) r1.y, r1.zwzz, t0.yxzw, s0  
  52: add r0.w, r0.w, r1.x  
  53: add r0.w, r1.y, r0.w  
  54: mad r0.w, r0.w, l(0.250000), -r0.z  
  55: mul r0.w, r0.y, r0.w  
  56: mul r0.y, r0.y, r0.z  
  57: mad r0.x, r0.w, l(0.900000), r0.x  
  58: mad r0.y, r0.y, l(-0.240000), r0.x  
  59: add r0.x, r0.y, r0.z  
  60: mov_sat r0.z, cb3[0].x  
  61: log r0.z, r0.z  
  62: mul r0.z, r0.z, l(100.000000)  
  63: exp r0.z, r0.z  
  64: mad r0.z, r0.z, l(0.160000), l(0.700000)  
  65: mul o0.xy, r0.zzzz, r0.xyxx  
  66: mov o0.zw, l(0, 0, 0, 0)  
  67: ret  


As you can see, the output of the outline map is divided into four equal squares, and this is the first thing we need to look at:
   0: add r0.xyzw, v2.xyxy, v2.xyxy  
   1: round_ni r1.xy, r0.zwzz  
   2: frc r0.xyzw, r0.xyzw  
   3: add r1.zw, r1.xxxy, l(0.000000, 0.000000, -1.000000, -1.000000)  
   4: dp2 r1.z, r1.zwzz, r1.zwzz  
   5: add r1.z, -r1.z, l(1.000000)  
   6: max r2.w, r1.z, l(0)  
   7: dp2 r1.z, r1.xyxx, r1.xyxx  
   8: add r3.xyzw, r1.xyxy, l(-1.000000, -0.000000, -0.000000, -1.000000)  
   9: add r1.x, -r1.z, l(1.000000)  
  10: max r2.x, r1.x, l(0)  
  11: dp2 r1.x, r3.xyxx, r3.xyxx  
  12: dp2 r1.y, r3.zwzz, r3.zwzz  
  13: add r1.xy, -r1.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)  
  14: max r2.yz, r1.xxyx, l(0, 0, 0, 0)  

We start by calculating floor( TextureUV * 2.0 ), which gives:

To determine individual squares, a small function is used:
 float getParams(float2 uv)  
 {  
      float d = dot(uv, uv);  
      d = 1.0 - d;  
      d = max( d, 0.0 );  
   
      return d;  
 }  

Note that this function returns 1.0 when the input is float2(0.0, 0.0).
We have this case in the upper left corner. To get the same situation for the upper right corner, we have to subtract float2(1, 0) from the floored texcoords; for the green square we subtract float2(0, 1) and for the yellow one - float2(1.0, 1.0).

So:
   float2 flooredTextureUV = floor( 2.0 * TextureUV );  
   ...
     
   float2 uv1 = flooredTextureUV;  
   float2 uv2 = flooredTextureUV + float2(-1.0, -0.0);   
   float2 uv3 = flooredTextureUV + float2( -0.0, -1.0);  
   float2 uv4 = flooredTextureUV + float2(-1.0, -1.0);  
   
   float4 mask;  
   mask.x = getParams( uv1 );  
   mask.y = getParams( uv2 );  
   mask.z = getParams( uv3 );  
   mask.w = getParams( uv4 );  

Each of the mask components is equal to one or zero and is responsible for one square within the texture. For instance, mask.r and mask.w:
mask.r

mask.w
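If you want to convince yourself that exactly one component of the mask is "on" per square, here is a minimal CPU-side sketch in C++. It is only my illustration of the HLSL above, not code from the game: the function name quadrantMask and the use of std::array are my own choices, and I assume texture coordinates in the [0, 1) range.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Same idea as getParams in the HLSL above:
// 1.0 only when (u, v) is exactly (0, 0), 0.0 for any other integer offset.
float getParams(float u, float v)
{
    return std::max(1.0f - (u * u + v * v), 0.0f);
}

// Hypothetical helper: returns the four selectors in the order
// { upper left, upper right, lower left, lower right }.
// Assumes (u, v) lies in [0, 1) x [0, 1).
std::array<float, 4> quadrantMask(float u, float v)
{
    assert(u >= 0.0f && u < 1.0f && v >= 0.0f && v < 1.0f);

    const float fu = std::floor(2.0f * u); // 0 in the left half, 1 in the right half
    const float fv = std::floor(2.0f * v); // 0 in the upper half, 1 in the lower half

    return {
        getParams(fu,        fv),
        getParams(fu - 1.0f, fv),
        getParams(fu,        fv - 1.0f),
        getParams(fu - 1.0f, fv - 1.0f),
    };
}
```

For example, quadrantMask(0.75f, 0.25f) (a point in the upper right square) gives { 0, 1, 0, 0 }.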

Once we have obtained the mask, let's move on. Line 15 samples the intensity map. Please note that the intensity texture is R11G11B10_FLOAT, while we sample all rgba components. In this scenario, .a is set implicitly to 1.0f.

Texcoords used for this operation can be calculated as frac( TextureUV * 2.0 ). So the result of this operation looks, for example, like this:

Do you see the similarity?

The next step is really smart - a 4-component dot product (dp4) is performed:
  16: dp4 r1.x, r1.xyzw, r2.xyzw   

This way, in the upper left square we have only the red channel (therefore, only interesting objects), in the upper right - only the green channel (only traces), in the lower left - only the blue channel, and in the lower right - everything (because the .w component of intensity was implicitly set to 1.0, so the filter there is always 1.0). Brilliant idea. The result of the dot product looks this way:
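In code terms, the channel selection can be sketched on the CPU like this. Again, this is just my C++ illustration: the Texel struct and the fetchIntensity helper are hypothetical names, and the mask order { upper left, upper right, lower left, lower right } is assumed to match mask.xyzw from the HLSL above.

```cpp
#include <array>
#include <cassert>

// A texel as the shader sees it after sampling.
struct Texel { float r, g, b, a; };

// Hypothetical stand-in for sampling the R11G11B10_FLOAT intensity map:
// the format stores no alpha, so the fetch yields an implicit a = 1.0.
Texel fetchIntensity(float r, float g, float b)
{
    return Texel{r, g, b, 1.0f};
}

// CPU equivalent of "dp4 r1.x, r1.xyzw, r2.xyzw":
// a one-hot mask picks exactly one component of the texel.
float masterFilter(const Texel& t, const std::array<float, 4>& mask)
{
    return t.r * mask[0] + t.g * mask[1] + t.b * mask[2] + t.a * mask[3];
}
```

With fetchIntensity(0.8f, 0.3f, 0.0f), the mask { 1, 0, 0, 0 } returns the red value 0.8, { 0, 1, 0, 0 } returns the green value 0.3, and { 0, 0, 0, 1 } returns the implicit 1.0.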


Let's call the result of this dot product masterFilter. Having it, we are ready to determine the outlines of objects. This is not as hard as one might expect. The algorithm is quite similar to the one used for sharpening - we have to obtain the maximum absolute difference of values.

Here's what happens: we sample four texels near the currently processed one (important: the texel size in this case is 1.0/256.0!) and calculate the maximum absolute differences for both the red and green channels:
   float fTexel = 1.0 / 256;  
     
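   // In the assembly (lines 17-18) the offsets are added to the same  
   // coordinates that line 15 samples with, i.e. frac( TextureUV * 2.0 )  
   float2 uvFrac = frac( TextureUV * 2.0 );  
   float2 sampling1 = uvFrac + float2( fTexel, 0 );  
   float2 sampling2 = uvFrac + float2( -fTexel, 0 );  
   float2 sampling3 = uvFrac + float2( 0, fTexel );  
   float2 sampling4 = uvFrac + float2( 0, -fTexel );  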
     
   float2 intensity_x0 = texIntensityMap.Sample( sampler1, sampling1 ).xy;  
   float2 intensity_x1 = texIntensityMap.Sample( sampler1, sampling2 ).xy;  
   float2 intensity_diff_x = intensity_x0 - intensity_x1;  
     
   float2 intensity_y0 = texIntensityMap.Sample( sampler1, sampling3 ).xy;  
   float2 intensity_y1 = texIntensityMap.Sample( sampler1, sampling4 ).xy;  
   float2 intensity_diff_y = intensity_y0 - intensity_y1;  
     
   float2 maxAbsDifference = max( abs(intensity_diff_x), abs(intensity_diff_y) );  
   maxAbsDifference = saturate(maxAbsDifference);  
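To see why this produces outlines, here is a tiny CPU-side sketch in C++ working on a single-channel image. It is an illustration under my own assumptions, not the game's code: the Image struct is hypothetical, I step by whole texels instead of UV offsets, and I use clamp-to-edge addressing because I don't know the actual sampler state.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical single-channel image with clamp-to-edge addressing.
struct Image
{
    int width;
    int height;
    std::vector<float> data; // row-major, width * height values

    float at(int x, int y) const
    {
        assert(width > 0 && height > 0);
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return data[y * width + x];
    }
};

// Largest absolute central difference in x or y, saturated to [0, 1].
// Flat regions give 0, any step in intensity gives a non-zero value.
float maxAbsDifference(const Image& img, int x, int y)
{
    const float dx = img.at(x + 1, y) - img.at(x - 1, y);
    const float dy = img.at(x, y + 1) - img.at(x, y - 1);
    const float d  = std::max(std::fabs(dx), std::fabs(dy));
    return std::min(d, 1.0f);
}
```

On an image which is 0.0 on the left and 1.0 on the right, the function returns 0.0 inside both flat areas and 1.0 for the texels on either side of the step, so only the border survives.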

Now - if we multiply the filter by maxAbsDifference...

So simple and so effective.

Once we have the outlines, we sample the outline map from the previous frame.
Then, to get the "ghosting" effect, we combine some of the parameters calculated in the current pass with values from the outline map.

Say "hi" to our old friend - integer noise. It's present here as well. The animation parameters ( cb3[0].zw ) come from the constant buffer and they change with time.
   float2 outlines = masterFilter * maxAbsDifference;  
     
   // Sample outline map  
   float2 outlineMap = texOutlineMap.Sample( samplerLinearWrap, uv ).xy;  
     
   // I guess it's related to ghosting   
   float paramOutline = masterFilter*0.15 + outlineMap.y;  
   paramOutline += 0.35 * outlines.r;  
   paramOutline += 0.35 * outlines.g;  
     
   // input for integer noise  
   float2 noiseWeights = cb3_v0.zw;
   float2 noiseInputs = 150.0*uv + 300.0*noiseWeights;  
   int2 iNoiseInputs = (int2) noiseInputs;  
     
   float noise0 = clamp( integerNoise( iNoiseInputs.x + reversebits(iNoiseInputs.y) ), -1, 1 ) + 0.65; // r0.y  
     

Side note: if you would like to implement Witcher Senses on your own, I suggest clamping the integer noise to the [-1;1] range (as its website says). There is no clamp in the original TW3 shader, but without clamping I had awful artifacts and the whole outline map was unstable.

Then we sample the outline map the same way as the intensity map before (this time the texel size is 1.0/512.0) and calculate the average value of the .x component:
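For reference, here is how I read assembly lines 35-44 when written as plain C++. Treat it as a sketch with assumptions rather than the definitive function. The disassembler prints two integer constants as floats, so I assume they are 19990303 and 1376312589 from the well-known integer noise formula (the second one matches the printed 146956042240.0 bit for bit), and I assume the scale printed as 0.000000001 is 1/1073741824. The names reverseBits, integerNoiseBits and noise0 are mine. This version follows the assembly, so it contains no clamp.

```cpp
#include <cassert>
#include <cstdint>

// bfrev (line 35): reverse the order of the 32 bits.
std::uint32_t reverseBits(std::uint32_t v)
{
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i)
    {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

// Lines 37-42. Unsigned math is used so that overflow wraps around
// like it does on the GPU; the sign fill emulates the arithmetic "ishr".
std::uint32_t integerNoiseBits(std::uint32_t n)
{
    const std::uint32_t signFill = (n & 0x80000000u) ? 0xFFF80000u : 0u;
    n ^= (n >> 13) | signFill;

    // 60493 == 0xec4d; the two other constants are assumptions (see above).
    const std::uint32_t nn = n * (n * n * 60493u + 19990303u) + 1376312589u;
    return nn & 0x7fffffffu;
}

// Lines 36, 43-44: combine both integer inputs, then scale and offset.
// With the assumed scale the result stays roughly within [0.65, 2.65].
float noise0(std::int32_t ix, std::int32_t iy)
{
    const std::uint32_t seed =
        static_cast<std::uint32_t>(ix) + reverseBits(static_cast<std::uint32_t>(iy));
    return static_cast<float>(integerNoiseBits(seed)) * (1.0f / 1073741824.0f) + 0.65f;
}
```

The same inputs always give the same output, so the "animation" comes only from the cb3[0].zw values changing over time.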

  // sampling of outline map  
   fTexel = 1.0 / 512.0;  
     
   sampling1 = saturate( uv + float2( fTexel, 0 ) );  
   sampling2 = saturate( uv + float2( -fTexel, 0 ) );  
   sampling3 = saturate( uv + float2( 0, fTexel ) );  
   sampling4 = saturate( uv + float2( 0, -fTexel ) );  
     
   float outline_x0 = texOutlineMap.Sample( sampler0, sampling1 ).x;  
   float outline_x1 = texOutlineMap.Sample( sampler0, sampling2 ).x;  
   float outline_y0 = texOutlineMap.Sample( sampler0, sampling3 ).x;  
   float outline_y1 = texOutlineMap.Sample( sampler0, sampling4 ).x;  
   float averageOutline = (outline_x0+outline_x1+outline_y0+outline_y1) / 4.0;  

Then, following the assembly, the difference between the average and the value at that particular pixel is computed and perturbed with integer noise:
   // perturb with noise  
   float frameOutlineDifference = averageOutline - outlineMap.x;  
   frameOutlineDifference *= noise0;  

The next step is to perturb the value from the "old" outline map with noise - this is the main line which gives the blocky look to the output texture.

There are some more calculations later and, at the very end, "damping" is calculated.
   // the main place which gives the blocky look of the texture  
   float newNoise = outlineMap.x * noise0;  
     
   float newOutline = frameOutlineDifference * 0.9 + paramOutline;  
   newOutline -= 0.24*newNoise;  
     
   // 59: add r0.x, r0.y, r0.z  
   float2 finalOutline = float2( outlineMap.x + newOutline, newOutline);  
     
   // * calculate damping  
   float dampingParam = saturate( cb3_v0.x );  
   dampingParam = pow( dampingParam, 100 );    
     
   float damping = 0.7 + 0.16*dampingParam;  
   
   
   // * final multiplication  
   float2 finalColor = finalOutline * damping;  
   return float4(finalColor, 0, 0);
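To get a feeling for the damping term, here is a small C++ sketch of just that part. It mirrors assembly lines 60-64 as I understand them; the function names are mine and I don't know what cb3[0].x actually represents in the game, so the input here is simply "some value from the constant buffer".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// HLSL-style saturate: clamp to [0, 1].
float saturate(float x)
{
    return std::min(std::max(x, 0.0f), 1.0f);
}

// damping = 0.7 + 0.16 * saturate(x)^100
// The huge exponent means the second term only matters when x is almost 1.
float damping(float cb3_x)
{
    const float result = 0.7f + 0.16f * std::pow(saturate(cb3_x), 100.0f);
    assert(result >= 0.7f && result <= 0.87f); // multiplier never reaches 1.0
    return result;
}
```

For x = 0.9 the power term is already about 0.00003, so the multiplier is practically 0.7; only at x = 1 does it reach about 0.86. Either way it stays below 1.0, so both output channels are scaled down every frame.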


Here is a small video which shows outline map in action:



If you are interested in the complete pixel shader, it's here. It's compatible with RenderDoc.
What's interesting (and, to be honest, slightly frustrating) is that although its assembly is the same as the original shader from The Witcher 3, the final look of the outline map in RenderDoc changes!

On a side note - in the last pass (link below) you will see that only the .r channel of the outline map is used. So why do we need the .g channel then? I guess it's some sort of ping-pong buffer within the texture - please note that .r contains the .g channel + some new value.

We have arrived at the end of the second part. Go to the last one here.


I hope you enjoyed it.
Thanks for reading!
