Sunday, 17 March 2019

Reverse engineering the rendering of The Witcher 3, part 12 - stupid sky tricks

Welcome,

This part of the series will be slightly different from the previous ones. Today I'd like to show you some aspects of the sky shaders from The Witcher 3.

Why some "stupid tricks" instead of the full shader? Well, there are a few reasons. First of all, the sky shader in The Witcher 3 is quite a complex beast. The pixel shader from the 2015 version has 267 lines of assembly, while the one from the "Blood & Wine" DLC has 385.
Moreover, they take quite a lot of inputs, which doesn't really help when trying to reverse engineer complete (and readable!) HLSL code.

Therefore, I decided to show you some tricks from these shaders only. If I find anything new, this post will be updated.

The differences between the 2015 version of the game and the B&W (2016) add-on are quite notable. These include, for instance, a different calculation of stars and their blinking, and a different approach to rendering the Sun. The Blood & Wine shader also calculates the Milky Way during the night.

I'll start with some basics and switch to stupid tricks later.

Basics

Like most modern video games, The Witcher 3 uses a skydome to represent the sky. Take a look at the hemisphere used for this in The Witcher 3 (2015). On a side note, the bounding box of this mesh ranges from [0,0,0] to [1,1,1] (Z is the up axis) and the mesh has smoothly distributed UVs. We'll use them later.


The idea behind a skydome is similar to a skybox (the mesh being used is the only difference). In the vertex shader we translate the skydome with respect to the observer (usually by the camera position), which gives the illusion that the sky is really far away - we'll never get there.

If you have been following the series for a while, you know that The Witcher 3 uses reversed depth: the far plane is represented by 0.0f and the near plane by 1.0f. To make sure that the skydome output lands entirely on the far plane, we set MinDepth to the same value as MaxDepth in the viewport parameters:


To learn how the MinDepth and MaxDepth fields are used during the viewport transform, see the viewport documentation on docs.microsoft.com.
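To make the viewport trick concrete, here is a tiny CPU-side sketch in Python (not shader code). It uses the Direct3D viewport depth mapping, depth = MinDepth + z * (MaxDepth - MinDepth). The value 0.0 for both fields is my assumption, based on the far plane being 0.0 with reversed depth.

```python
def viewport_depth(z_ndc, min_depth, max_depth):
    """Map NDC depth (0..1 in Direct3D) to the depth range of a viewport."""
    return min_depth + z_ndc * (max_depth - min_depth)

# Regular viewport: depth passes through unchanged.
regular = [viewport_depth(z, 0.0, 1.0) for z in (0.0, 0.25, 1.0)]

# Skydome viewport (assumed values): MinDepth == MaxDepth == 0.0, so every
# fragment lands on the reversed-depth far plane, whatever its NDC depth is.
skydome = [viewport_depth(z, 0.0, 0.0) for z in (0.0, 0.25, 1.0)]

print(regular, skydome)
```

With both fields equal, the (MaxDepth - MinDepth) term is zero, so the vertex shader does not need any special handling of the clip-space Z.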

Vertex Shader

Let's start with vertex shader. In The Witcher 3 (2015) assembly of VS is as follows:
 vs_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer cb1[4], immediateIndexed  
    dcl_constantbuffer cb2[6], immediateIndexed  
    dcl_input v0.xyz  
    dcl_input v1.xy  
    dcl_output o0.xy  
    dcl_output o1.xyz  
    dcl_output_siv o2.xyzw, position  
    dcl_temps 2  
   0: mov o0.xy, v1.xyxx  
   1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx  
   2: mov r0.w, l(1.000000)  
   3: dp4 o1.x, r0.xyzw, cb2[0].xyzw  
   4: dp4 o1.y, r0.xyzw, cb2[1].xyzw  
   5: dp4 o1.z, r0.xyzw, cb2[2].xyzw  
   6: mul r1.xyzw, cb1[0].yyyy, cb2[1].xyzw  
   7: mad r1.xyzw, cb2[0].xyzw, cb1[0].xxxx, r1.xyzw  
   8: mad r1.xyzw, cb2[2].xyzw, cb1[0].zzzz, r1.xyzw  
   9: mad r1.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw  
  10: dp4 o2.x, r0.xyzw, r1.xyzw  
  11: mul r1.xyzw, cb1[1].yyyy, cb2[1].xyzw  
  12: mad r1.xyzw, cb2[0].xyzw, cb1[1].xxxx, r1.xyzw  
  13: mad r1.xyzw, cb2[2].xyzw, cb1[1].zzzz, r1.xyzw  
  14: mad r1.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw  
  15: dp4 o2.y, r0.xyzw, r1.xyzw  
  16: mul r1.xyzw, cb1[2].yyyy, cb2[1].xyzw  
  17: mad r1.xyzw, cb2[0].xyzw, cb1[2].xxxx, r1.xyzw  
  18: mad r1.xyzw, cb2[2].xyzw, cb1[2].zzzz, r1.xyzw  
  19: mad r1.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw  
  20: dp4 o2.z, r0.xyzw, r1.xyzw  
  21: mul r1.xyzw, cb1[3].yyyy, cb2[1].xyzw  
  22: mad r1.xyzw, cb2[0].xyzw, cb1[3].xxxx, r1.xyzw  
  23: mad r1.xyzw, cb2[2].xyzw, cb1[3].zzzz, r1.xyzw  
  24: mad r1.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw  
  25: dp4 o2.w, r0.xyzw, r1.xyzw  
  26: ret  

In this scenario the VS outputs only texcoords and the world-space position. In Blood & Wine it also outputs a normalized normal vector. I'll stay with the 2015 version as it's simpler.

Take a look at constant buffer marked as cb2:


Here we have the world matrix (uniform scaling by 100 and translation by the camera position). Nothing fancy. cb2_v4 and cb2_v5 are scale/bias factors that transform vertex positions from the [0, 1] range to the [-1, 1] range. Along the Z axis (up), however, these coefficients 'squeeze' the mesh.


We have already seen a similar VS in previous parts of the series. The general algorithm is: pass the texcoords through, calculate Position with the scale/bias factors, calculate PositionW in world space, then multiply the matWorld and matViewProj matrices together and use their product to transform Position into the final clip-space SV_Position.

So, the HLSL for this vertex shader would be something like this:
 struct InputStruct {  
      float3 param0 : POSITION;  
      float2 param1 : TEXCOORD;  
      float3 param2 : NORMAL;  
      float4 param3 : TANGENT;  
 };  
   
 struct OutputStruct {  
      float2 param0 : TEXCOORD0;  
      float3 param1 : TEXCOORD1;  
      float4 param2 : SV_Position;  
 };  
   
 OutputStruct EditedShaderVS(in InputStruct IN)  
 {  
      OutputStruct OUT = (OutputStruct)0;  
        
      // Simple texcoords passing  
      OUT.param0 = IN.param1;  
        
        
      // * Manually construct world and viewProj matrices from float4s:  
      row_major matrix matWorld = matrix(cb2_v0, cb2_v1, cb2_v2, float4(0,0,0,1) );  
      matrix matViewProj = matrix(cb1_v0, cb1_v1, cb1_v2, cb1_v3);  
   
      // * Some optional fun with worldMatrix  
      // a) Scale  
      //matWorld._11 = matWorld._22 = matWorld._33 = 0.225f;  
   
      // b) Translate  
      // X Y Z  
      //matWorld._14 = 520.0997;  
      //matWorld._24 = 74.4226;  
      //matWorld._34 = 113.9;  
   
      // Local space - note the scale+bias here!  
      //float3 meshScale = float3(2.0, 2.0, 2.0);  
      //float3 meshBias = float3(-1.0, -1.0, -0.4);  
      float3 meshScale = cb2_v4.xyz;  
      float3 meshBias = cb2_v5.xyz;  
   
      float3 Position = IN.param0 * meshScale + meshBias;  
        
      // World space  
      float4 PositionW = mul(float4(Position, 1.0), transpose(matWorld) );  
      OUT.param1 = PositionW.xyz;  
   
      // Clip space - original approach from The Witcher 3  
      matrix matWorldViewProj = mul(matViewProj, matWorld);  
      OUT.param2 = mul( float4(Position, 1.0), transpose(matWorldViewProj) );  
        
      return OUT;  
 }  

Comparison of my shader (left) and the original one (right):

The great thing about RenderDoc is that it lets you inject your own shader in place of the original one, and your changes affect the pipeline until the very end of the frame. As you can see in the HLSL code, I left some options to change the scaling and translation of the final geometry. You can play with them and achieve some funny results:

Hail to the skydome!

Optimizing the vertex shader

Do you see a problem with the original vertex shader? The per-vertex matrix-matrix multiplication is completely redundant! The product of matWorld and matViewProj is the same for every vertex, yet it is recomputed for each one. I found this in at least a few vertex shaders (for instance, in distant rain shafts). We could optimize it by multiplying PositionW by matViewProj directly!

So, we can replace HLSL code:
      // Clip space - original approach from The Witcher 3  
      matrix matWorldViewProj = mul(matViewProj, matWorld);  
      OUT.param2 = mul( float4(Position, 1.0), transpose(matWorldViewProj) );  

with this one:
      // Clip space - optimized version  
      OUT.param2 = mul( matViewProj, PositionW );  

An optimized version produces the following assembly:
    vs_5_0  
    dcl_globalFlags refactoringAllowed  
    dcl_constantbuffer CB1[4], immediateIndexed  
    dcl_constantbuffer CB2[6], immediateIndexed  
    dcl_input v0.xyz  
    dcl_input v1.xy  
    dcl_output o0.xy  
    dcl_output o1.xyz  
    dcl_output_siv o2.xyzw, position  
    dcl_temps 2  
   0: mov o0.xy, v1.xyxx  
   1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx  
   2: mov r0.w, l(1.000000)  
   3: dp4 r1.x, r0.xyzw, cb2[0].xyzw  
   4: dp4 r1.y, r0.xyzw, cb2[1].xyzw  
   5: dp4 r1.z, r0.xyzw, cb2[2].xyzw  
   6: mov o1.xyz, r1.xyzx  
   7: mov r1.w, l(1.000000)  
   8: dp4 o2.x, cb1[0].xyzw, r1.xyzw  
   9: dp4 o2.y, cb1[1].xyzw, r1.xyzw  
  10: dp4 o2.z, cb1[2].xyzw, r1.xyzw  
  11: dp4 o2.w, cb1[3].xyzw, r1.xyzw  
  12: ret

As you can see, we reduced the number of instructions from 26 to 12 - that's quite a change. I don't know how widespread this problem is in the game but c'mon CD Projekt Red, maybe a patch or something? :)

I'm not kidding here. You can inject my optimized shader instead of original one in RenderDoc and see for yourself that this optimization changes nothing in terms of visuals. Honestly, I don't know why CD Projekt Red decided to do per-vertex matrix-matrix multiplication...
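If you don't have RenderDoc at hand, the equivalence can also be checked on the CPU. The Python sketch below relies only on associativity of matrix multiplication: (ViewProj * World) * p equals ViewProj * (World * p). The world matrix reuses the scale and translation values mentioned in the HLSL comments; the view-projection matrix is an arbitrary stand-in, not data from the game.

```python
def mat_mul(a, b):
    """4x4 matrix product a * b (nested lists, row by row)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """4x4 matrix times a column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# Hypothetical world matrix: uniform scale by 100 plus a translation.
mat_world = [
    [100.0, 0.0, 0.0, 520.0997],
    [0.0, 100.0, 0.0, 74.4226],
    [0.0, 0.0, 100.0, 113.9],
    [0.0, 0.0, 0.0, 1.0],
]
# Arbitrary stand-in for the view-projection matrix.
mat_view_proj = [
    [0.9, 0.1, 0.0, -3.0],
    [-0.2, 0.3, 1.1, 5.0],
    [0.0, 0.0, 0.001, 0.2],
    [0.4, -0.8, 0.1, 7.0],
]
position = [0.25, -0.5, 0.6, 1.0]

# Original approach: build the combined matrix first (done for every vertex).
clip_original = mat_vec(mat_mul(mat_view_proj, mat_world), position)

# Optimized approach: reuse the world-space position the shader needs anyway.
position_w = mat_vec(mat_world, position)
clip_optimized = mat_vec(mat_view_proj, position_w)

max_error = max(abs(a - b) for a, b in zip(clip_original, clip_optimized))
print(max_error)
```

The two results can differ in the last floating-point bits because the operations are ordered differently, which is far below anything visible on screen.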

The Sun

In The Witcher 3 (2015), atmospheric scattering and the Sun are rendered in two separate draw calls:

The Witcher 3 (2015) - before

The Witcher 3 (2015) - with sky

The Witcher 3 (2015) - with sky + the Sun
Rendering the Sun in 2015 version is pretty similar to the Moon in terms of geometry and blend/depth states.


On the other hand, in Blood & Wine sky with the Sun is rendered in one pass:

The Witcher 3: Blood & Wine (2016) - before sky

The Witcher 3: Blood & Wine (2016) - with sky and the Sun

No matter how you want to render the Sun, at some point you will need the (normalized) direction of sunlight. The most intuitive way to obtain this vector is to use spherical coordinates. Basically, you need only two values representing two angles (in radians!): phi and theta. With r = 1 the radius drops out, so for a z-up Cartesian coordinate system (theta is measured from the up axis) we can write HLSL code like this:
 float3 vSunDir;  
 vSunDir.x = sin(fTheta)*cos(fPhi);  
 vSunDir.y = sin(fTheta)*sin(fPhi);  
 vSunDir.z = cos(fTheta);  
 vSunDir = normalize(vSunDir);  

Normally you calculate sunlight direction in your application, then pass it to constant buffer for further use.
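On the application side this could look like the Python sketch below. The function name and the sample angles are made up for illustration. The sketch assumes theta is measured from the +Z (up) axis and shows that the result already has unit length, so no extra normalization is needed.

```python
import math

def sun_direction(theta, phi):
    """Unit vector from spherical angles in radians; theta is measured from +Z."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def length(v):
    return math.sqrt(sum(c * c for c in v))

zenith = sun_direction(0.0, 0.0)             # Sun straight overhead
horizon = sun_direction(math.pi / 2, 0.0)    # Sun on the horizon, along +X
afternoon = sun_direction(math.radians(60), math.radians(200))

print(zenith, horizon, afternoon, length(afternoon))
```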

Once we have sunlight direction we can dive into assembly of pixel shader from Blood & Wine....
  ...   
  100: add r1.xyw, -r0.xyxz, cb12[0].xyxz  
  101: dp3 r2.x, r1.xywx, r1.xywx  
  102: rsq r2.x, r2.x  
  103: mul r1.xyw, r1.xyxw, r2.xxxx  
  104: mov_sat r2.xy, cb12[205].yxyy  
  105: dp3 r2.z, -r1.xywx, -r1.xywx  
  106: rsq r2.z, r2.z  
  107: mul r1.xyw, -r1.xyxw, r2.zzzz  
  ...  

Okay. To start, cb12[0].xyz is the camera position, while r0.xyz stores the vertex position (an output from the vertex shader). Therefore, line 100 calculates the worldToCamera vector and lines 101-103 normalize it. But take a look at lines 105-107. We could write them as normalize(-worldToCamera), which means we calculate the normalized cameraToWorld vector. (The second normalization is redundant, since negating a unit vector does not change its length.)

  120: dp3_sat r1.x, cb12[203].yzwy, r1.xywx  

Then we calculate the dot product of the cameraToWorld and sunDirection vectors! Remember that they have to be normalized. We also saturate the whole expression to clamp it to the [0, 1] range.

Cool! We have this dot product in r1.x. Let's find the next use of it...
  152: log r1.x, r1.x  
  153: mul r1.x, r1.x, cb12[203].x  
  154: exp r1.x, r1.x  
  155: mul r1.x, r2.y, r1.x  


The "log, mul, exp" triple is, simply speaking, exponentiation. As you can see, we raise our cosine (the dot product of normalized vectors) to some power. You may ask, why? This way we can produce a gradient which mimics our Sun. (Line 155 affects the opacity of this gradient, so you can, for instance, set it to zero to hide the Sun completely.) See some examples:

exponent = 54

exponent = 2400
Having this gradient, we use it to interpolate between skyColor and sunColor! To make sure there are no artifacts (the log of a negative cosine would be NaN), we had to saturate in line 120.

Please note that this trick can also be used to mimic a corona around the Moon (with lower values of the exponent). For this you will need a moonDirection vector, which can be easily calculated with spherical coordinates.

Final HLSL can look similar to the following snippet:
 float3 vCamToWorld = normalize( PosW - CameraPos );  
   
 float cosTheta = saturate( dot(vSunDir, vCamToWorld) );  
 float sunGradient = pow( cosTheta, sunExponent );  
   
 float3 color = lerp( skyColor, sunColor, sunGradient );  
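To get a feeling for how the exponent shapes the gradient, here is a small CPU-side sketch in Python. It evaluates the same formula for a view ray at a given angle from the Sun and writes pow() the way the shader does, as exp2(y * log2(x)). The colors and sample angles are hypothetical.

```python
import math

def saturate(x):
    return max(0.0, min(1.0, x))

def lerp(a, b, t):
    return a + (b - a) * t

def sun_gradient(angle_deg, exponent):
    """Gradient value for a view ray that is angle_deg away from the Sun."""
    cos_theta = saturate(math.cos(math.radians(angle_deg)))
    if cos_theta == 0.0:
        return 0.0  # facing away from the Sun
    # pow(x, y) in "log, mul, exp" form: exp2(y * log2(x))
    return 2.0 ** (exponent * math.log2(cos_theta))

for exponent in (54, 2400):
    row = [round(sun_gradient(a, exponent), 4) for a in (0, 2, 5, 10, 30)]
    print(exponent, row)

# Blend hypothetical sky and Sun colors 5 degrees away from the Sun.
sky_color = (0.2, 0.4, 0.8)
sun_color = (1.0, 0.9, 0.6)
t = sun_gradient(5, 54)
color = tuple(lerp(s, u, t) for s, u in zip(sky_color, sun_color))
print(color)
```

The higher the exponent, the faster the gradient falls off with the angle, which is why 2400 gives a small, sharp disc and 54 a wide glow.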

Moving stars

If you made a timelapse of a clear night sky in The Witcher 3, you would notice that the stars are not static - they slowly move across the sky over time! I noticed this quite by accident and wanted to see how it was done.

Let's start with the fact that stars in The Witcher 3 are stored in a 1024x1024x6 cubemap. If you think about it, it's a very handy solution, as a direction vector is all you need to sample a cubemap.

Consider the following piece of assembly:
  159: add r1.xyz, -v1.xyzx, cb1[8].xyzx  
  160: dp3 r0.w, r1.xyzx, r1.xyzx  
  161: rsq r0.w, r0.w  
  162: mul r1.xyz, r0.wwww, r1.xyzx  
  163: mul r2.xyz, cb12[204].zwyz, l(0.000000, 0.000000, 1.000000, 0.000000)  
  164: mad r2.xyz, cb12[204].yzwy, l(0.000000, 1.000000, 0.000000, 0.000000), -r2.xyzx  
  165: mul r4.xyz, r2.xyzx, cb12[204].zwyz  
  166: mad r4.xyz, r2.zxyz, cb12[204].wyzw, -r4.xyzx  
  167: dp3 r4.x, r1.xyzx, r4.xyzx  
  168: dp2 r4.y, r1.xyxx, r2.yzyy  
  169: dp3 r4.z, r1.xyzx, cb12[204].yzwy  
  170: dp3 r0.w, r4.xyzx, r4.xyzx  
  171: rsq r0.w, r0.w  
  172: mul r2.xyz, r0.wwww, r4.xyzx  
  173: sample_indexable(texturecube)(float,float,float,float) r4.xyz, r2.xyzx, t0.xyzw, s0  

To calculate the final sampling vector (line 173), we start by calculating the normalized worldToCamera vector (lines 159-162).

Then we calculate two cross products (163-164, 165-166) with moonDirection and later perform three dot products to get the final sampling vector. HLSL:

 float3 vWorldToCamera = normalize( g_CameraPos.xyz - Input.PositionW.xyz );  
 float3 vMoonDirection = cb12_v204.yzw;  
   
 float3 vStarsSamplingDir = cross( vMoonDirection, float3(0, 0, 1) );  
 float3 vStarsSamplingDir2 = cross( vStarsSamplingDir, vMoonDirection );  
   
 float dirX = dot( vWorldToCamera, vStarsSamplingDir2 );  
 float dirY = dot( vWorldToCamera, vStarsSamplingDir );  
 float dirZ = dot( vWorldToCamera, vMoonDirection);  
 float3 dirXYZ = normalize( float3(dirX, dirY, dirZ) );  
   
 float3 starsColor = texNightStars.Sample( samplerAnisoWrap, dirXYZ ).rgb;  

Note to self: This is really well thought out and I definitely have to investigate it in more detail.
Note to readers: If you know more about this operation, let me know!
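One possible reading (an interpretation, not something confirmed by the shader's authors): the two cross products build an orthogonal frame around the Moon direction, and the three dot products express the view vector in that frame. Because the frame follows the Moon, the same view ray hits a different cubemap texel as the Moon moves. The Python sketch below mirrors the HLSL above; the Moon and view directions are made-up values.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)

def stars_sampling_dir(view_dir, moon_dir):
    """Express view_dir in a frame whose Z axis is the Moon direction."""
    axis_y = cross(moon_dir, (0.0, 0.0, 1.0))
    axis_x = cross(axis_y, moon_dir)
    return normalize((dot(view_dir, axis_x),
                      dot(view_dir, axis_y),
                      dot(view_dir, moon_dir)))

view = normalize((0.3, -0.5, 0.8))
moon_early = normalize((1.0, 0.2, 0.6))   # hypothetical Moon directions
moon_late = normalize((0.7, 0.5, 0.9))

# Same view ray, different Moon position: a different cubemap direction.
dir_early = stars_sampling_dir(view, moon_early)
dir_late = stars_sampling_dir(view, moon_late)

# Looking straight at the Moon always samples the cubemap's +Z direction.
at_moon = stars_sampling_dir(moon_early, moon_early)
print(dir_early, dir_late, at_moon)
```

Two caveats about this sketch: the X and Y axes are not unit length (both shrink as the Moon approaches the zenith), so the frame is orthogonal but not orthonormal, and it degenerates when the Moon direction is exactly (0, 0, 1).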

Blinking stars

Another nice trick I wanted to investigate in more detail is the blinking of stars. If you walk around, say, the outskirts of Novigrad under a clear sky, you can notice that the stars are blinking.

I was curious how this was implemented. The difference between the 2015 version and Blood & Wine turns out to be quite big. For simplicity I'll stay with the 2015 version.

So we start just after sampling starsColor from the previous section:
  174: mul r0.w, v0.x, l(100.000000)  
  175: round_ni r1.w, r0.w  
  176: mad r2.w, v0.y, l(50.000000), cb0[0].x  
  177: round_ni r4.w, r2.w  
  178: bfrev r4.w, r4.w  
  179: iadd r5.x, r1.w, r4.w  
  180: ishr r5.y, r5.x, l(13)  
  181: xor r5.x, r5.x, r5.y  
  182: imul null, r5.y, r5.x, r5.x  
  183: imad r5.y, r5.y, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
  184: imad r5.x, r5.x, r5.y, l(146956042240.000000)  
  185: and r5.x, r5.x, l(0x7fffffff)  
  186: itof r5.x, r5.x  
  187: mad r5.y, v0.x, l(100.000000), l(-1.000000)  
  188: round_ni r5.y, r5.y  
  189: iadd r4.w, r4.w, r5.y  
  190: ishr r5.z, r4.w, l(13)  
  191: xor r4.w, r4.w, r5.z  
  192: imul null, r5.z, r4.w, r4.w  
  193: imad r5.z, r5.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
  194: imad r4.w, r4.w, r5.z, l(146956042240.000000)  
  195: and r4.w, r4.w, l(0x7fffffff)  
  196: itof r4.w, r4.w  
  197: add r5.z, r2.w, l(-1.000000)  
  198: round_ni r5.z, r5.z  
  199: bfrev r5.z, r5.z  
  200: iadd r1.w, r1.w, r5.z  
  201: ishr r5.w, r1.w, l(13)  
  202: xor r1.w, r1.w, r5.w  
  203: imul null, r5.w, r1.w, r1.w  
  204: imad r5.w, r5.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
  205: imad r1.w, r1.w, r5.w, l(146956042240.000000)  
  206: and r1.w, r1.w, l(0x7fffffff)  
  207: itof r1.w, r1.w  
  208: mul r1.w, r1.w, l(0.000000001)  
  209: iadd r5.y, r5.z, r5.y  
  210: ishr r5.z, r5.y, l(13)  
  211: xor r5.y, r5.y, r5.z  
  212: imul null, r5.z, r5.y, r5.y  
  213: imad r5.z, r5.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)  
  214: imad r5.y, r5.y, r5.z, l(146956042240.000000)  
  215: and r5.y, r5.y, l(0x7fffffff)  
  216: itof r5.y, r5.y  
  217: frc r0.w, r0.w  
  218: add r0.w, -r0.w, l(1.000000)  
  219: mul r5.z, r0.w, r0.w  
  220: mul r0.w, r0.w, r5.z  
  221: mul r5.xz, r5.xxzx, l(0.000000001, 0.000000, 3.000000, 0.000000)  
  222: mad r0.w, r0.w, l(-2.000000), r5.z  
  223: frc r2.w, r2.w  
  224: add r2.w, -r2.w, l(1.000000)  
  225: mul r5.z, r2.w, r2.w  
  226: mul r2.w, r2.w, r5.z  
  227: mul r5.z, r5.z, l(3.000000)  
  228: mad r2.w, r2.w, l(-2.000000), r5.z  
  229: mad r4.w, r4.w, l(0.000000001), -r5.x  
  230: mad r4.w, r0.w, r4.w, r5.x  
  231: mad r5.x, r5.y, l(0.000000001), -r1.w  
  232: mad r0.w, r0.w, r5.x, r1.w  
  233: add r0.w, -r4.w, r0.w  
  234: mad r0.w, r2.w, r0.w, r4.w  
  235: mad r2.xyz, r0.wwww, l(0.000500, 0.000500, 0.000500, 0.000000), r2.xyzx  
  236: sample_indexable(texturecube)(float,float,float,float) r2.xyz, r2.xyzx, t0.xyzw, s0  
  237: log r4.xyz, r4.xyzx  
  238: mul r4.xyz, r4.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)  
  239: exp r4.xyz, r4.xyzx  
  240: log r2.xyz, r2.xyzx  
  241: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)  
  242: exp r2.xyz, r2.xyzx  
  243: mul r2.xyz, r2.xyzx, r4.xyzx  

Huh. Let's take a look at the very end of this quite big piece of assembly.

Once we sampled starsColor in line 173 we calculate some offset value. This offset is used to perturb first sampling direction (r2.xyz, line 235), then we sample stars cubemap again, perform gamma correction on these two values (237-242) and multiply them together (243).

Simple, isn't it? Well, not really. Think about this offset for a while. It must be different across whole skydome - stars blinking the same way would look very unrealistic.

To make sure that the offset is as diverse as possible, we take advantage of the UVs wrapped across the skydome (v0.xy) and the elapsed time from the constant buffer (cb0[0].x).

If you are unfamiliar with this intimidating ishr/xor/and thing, take a look at the post about the lightning effect to learn more about integer noise.

As you can see, integer noise is called here four times, but it's used differently than in the lightning effect. To make the results even more random, the input integer for the noise is a sum (iadd) of two values, one of which has its bits reversed (reversebits intrinsic; bfrev instruction).

Okay, take it easy. Let's start from the beginning.
We have 4 "iterations" of integer noise. I analyzed the assembly and calculation of all 4 iterations looks like this:
 // * Inputs - UV and elapsed time in seconds  
 float2 starsUV;  
 starsUV.x = 100.0 * Input.TextureUV.x;       
 starsUV.y = 50.0  * Input.TextureUV.y + g_fTime;  
             
 // * Iteration 1  
 int iStars1_A = reversebits( asint( floor(starsUV.y) ) );  
 int iStars1_B = asint( floor(starsUV.x) );            
   
 float fStarsNoise1 = integerNoise( iStars1_A + iStars1_B );  
             
   
 // * Iteration 2  
 int iStars2_A = reversebits( asint( floor(starsUV.y) ) );  
 int iStars2_B = asint( floor( starsUV.x - 1.0 ) );       
   
 float fStarsNoise2 = integerNoise( iStars2_A + iStars2_B );  
        
   
 // * Iteration 3  
 int iStars3_A = reversebits( asint( floor( starsUV.y - 1.0 ) ) );  
 int iStars3_B = asint( floor(starsUV.x) );  
   
 float fStarsNoise3 = integerNoise( iStars3_A + iStars3_B );  
             
   
 // * Iteration 4  
 int iStars4_A = reversebits( asint( floor( starsUV.y - 1.0 ) ) );  
 int iStars4_B = asint( floor( starsUV.x - 1.0 ) );  
   
 float fStarsNoise4 = integerNoise( iStars4_A + iStars4_B );  

The final outputs of all these 4 iterations are (follow itof instructions to find them):

Iteration 1 - r5.x,
Iteration 2 - r4.w,
Iteration 3 - r1.w,
Iteration 4 - r5.y
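For readers who want to experiment outside the shader, here is a CPU-side Python sketch of one such iteration. integerNoise itself is not listed in this post, so the function below is reconstructed from lines 180-186 and should be treated as an assumption. The two odd float literals in the imad instructions look like integer constants printed as floats: reinterpreting the bits of 146956042240.0 gives 1376312589, 0x0000ec4d is 60493, and the tiny literal is assumed to be 19990303. The exact scale factor is not recoverable from the listing (it prints as 0.000000001), so 1e-9 is used as a stand-in.

```python
import math
import struct

def asint(x):
    """Reinterpret float32 bits as a signed 32-bit integer (HLSL asint)."""
    return struct.unpack("<i", struct.pack("<f", float(x)))[0]

def to_int32(n):
    """Wrap a Python integer to signed 32-bit, as GPU integer math does."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

def reversebits(n):
    """Reverse the order of the 32 bits (bfrev instruction)."""
    return to_int32(int(format(n & 0xFFFFFFFF, "032b")[::-1], 2))

NOISE_SCALE = 1e-9  # rounded value from the listing; exact constant unknown

def integer_noise(n):
    """Integer hash following lines 180-186, then scaled to a float."""
    n = to_int32(n)
    n = to_int32(n ^ (n >> 13))                   # ishr + xor
    inner = to_int32(n * n * 0xEC4D + 19990303)   # imul + imad (assumed constant)
    n = to_int32(n * inner + 1376312589)          # imad
    return (n & 0x7FFFFFFF) * NOISE_SCALE         # and + itof + mul

def stars_noise(u, v, time_s):
    """Iteration 1: noise for the grid cell that contains the scaled UV."""
    stars_x = 100.0 * u
    stars_y = 50.0 * v + time_s
    a = reversebits(asint(math.floor(stars_y)))
    b = asint(math.floor(stars_x))
    return integer_noise(a + b)

print(stars_noise(0.123, 0.41, 0.0), stars_noise(0.123, 0.41, 1.0))
```

Note that the floored values are not converted to integers: their float bit patterns are reinterpreted, which matches the assembly (round_ni followed directly by integer instructions, with no ftoi in between).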

After the last itof (line 216) we have:
  217: frc r0.w, r0.w   
  218: add r0.w, -r0.w, l(1.000000)   
  219: mul r5.z, r0.w, r0.w   
  220: mul r0.w, r0.w, r5.z   
  221: mul r5.xz, r5.xxzx, l(0.000000001, 0.000000, 3.000000, 0.000000)   
  222: mad r0.w, r0.w, l(-2.000000), r5.z   
  223: frc r2.w, r2.w   
  224: add r2.w, -r2.w, l(1.000000)   
  225: mul r5.z, r2.w, r2.w   
  226: mul r2.w, r2.w, r5.z   
  227: mul r5.z, r5.z, l(3.000000)   
  228: mad r2.w, r2.w, l(-2.000000), r5.z   

These lines calculate S-curve weights from the fractional part of the UVs, just like in the lightning effect. So:

  float s_curve( float x )   
  {   
    float x2 = x * x;   
    float x3 = x2 * x;   
      
    // -2x^3 + 3x^2   
    return -2.0*x3 + 3.0*x2;   
  }  
   
 ...  
 
 // lines 217-222
 float weightX = 1.0 - frac( starsUV.x );  
 weightX = s_curve( weightX );  
   
 // lines 223-228
 float weightY = 1.0 - frac( starsUV.y );  
 weightY = s_curve( weightY );  
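The polynomial -2x^3 + 3x^2 is the familiar smoothstep curve. A short Python check (a sketch for illustration only) shows the properties that make it a good interpolation weight: it maps 0 to 0 and 1 to 1, and its slope is zero at both ends, so neighbouring noise cells join without visible creases.

```python
def s_curve(x):
    """Cubic S-curve 3x^2 - 2x^3 for x in [0, 1]."""
    return x * x * (3.0 - 2.0 * x)

def s_curve_slope(x):
    """Analytic derivative of the S-curve: 6x - 6x^2."""
    return 6.0 * x * (1.0 - x)

samples = [(i / 4.0, s_curve(i / 4.0)) for i in range(5)]
print(samples)
print(s_curve_slope(0.0), s_curve_slope(0.5), s_curve_slope(1.0))
```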

As you can expect, these factors serve to interpolate noise smoothly and generate final offset for sampling coordinates:
  229: mad r4.w, r4.w, l(0.000000001), -r5.x   
  230: mad r4.w, r0.w, r4.w, r5.x   
  float noise0 = lerp( fStarsNoise1, fStarsNoise2, weightX );  
   
  231: mad r5.x, r5.y, l(0.000000001), -r1.w   
  232: mad r0.w, r0.w, r5.x, r1.w   
  float noise1 = lerp( fStarsNoise3, fStarsNoise4, weightX );  
   
  233: add r0.w, -r4.w, r0.w   
  234: mad r0.w, r2.w, r0.w, r4.w   
  float offset = lerp( noise0, noise1, weightY );            
   
  235: mad r2.xyz, r0.wwww, l(0.000500, 0.000500, 0.000500, 0.000000), r2.xyzx   
  236: sample_indexable(texturecube)(float,float,float,float) r2.xyz, r2.xyzx, t0.xyzw, s0   
  float3 starsPerturbedDir = dirXYZ + offset * 0.0005;  
    
  float3 starsColorDisturbed = texNightStars.Sample( samplerAnisoWrap, starsPerturbedDir ).rgb;
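Put together, lines 229-234 are a bilinear blend of the four noise values with S-curve weights. The Python sketch below (corner values are hypothetical) mirrors the three lerps and checks the corner cases.

```python
def lerp(a, b, t):
    return a + (b - a) * t

def s_curve(x):
    return x * x * (3.0 - 2.0 * x)

def blend_noise(n1, n2, n3, n4, frac_x, frac_y):
    """Blend the four iteration results the way lines 229-234 do."""
    weight_x = s_curve(1.0 - frac_x)
    weight_y = s_curve(1.0 - frac_y)
    noise0 = lerp(n1, n2, weight_x)   # iterations 1 and 2: floor(y)
    noise1 = lerp(n3, n4, weight_x)   # iterations 3 and 4: floor(y - 1)
    return lerp(noise0, noise1, weight_y)

corners = (0.2, 1.4, 0.9, 2.0)   # hypothetical noise values

# Fractions near 1 give weights near 0: the result is the iteration 1 value.
at_cell_end = blend_noise(*corners, 1.0, 1.0)
# Fractions of 0 give weights of 1: the result is the iteration 4 value.
at_cell_start = blend_noise(*corners, 0.0, 0.0)
middle = blend_noise(*corners, 0.5, 0.5)
print(at_cell_end, at_cell_start, middle)
```

When a fractional part wraps from just below 1 back to 0, the floor goes up by one, so the "minus one" cell of the new position is the current cell of the old one. That is what keeps the offset continuous across cell borders.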


Once we have starsColorDisturbed, the hardest part is over. Phew!

The next step is to perform gamma correction on both starsColor and starsColorDisturbed and multiply them:
  starsColor = pow( starsColor, 2.2 );  
  starsColorDisturbed = pow( starsColorDisturbed, 2.2 );  
   
  float3 starsFinal = starsColor * starsColorDisturbed;  

Stars - the final touches

We have starsFinal in r1.xyz. What's happening at the end of processing stars is this:
  256: log r1.xyz, r1.xyzx  
  257: mul r1.xyz, r1.xyzx, l(2.500000, 2.500000, 2.500000, 0.000000)  
  258: exp r1.xyz, r1.xyzx  
  259: min r1.xyz, r1.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)  
  260: add r0.w, -cb0[9].w, l(1.000000)  
  261: mul r1.xyz, r0.wwww, r1.xyzx  
  262: mul r1.xyz, r1.xyzx, l(10.000000, 10.000000, 10.000000, 0.000000)  

This is much, much easier than the blinking and moving stars.
We start by raising starsFinal to the power of 2.5, which lets us control the density of stars. Pretty clever. Then we make sure that the maximum color of stars is float3(1, 1, 1).

cb0[9].w is used to control the general visibility of stars. So in the daytime expect it to be set to 1.0 (which results in multiplying by zero) and to 0.0 at night.

At the end we boost the brightness of stars by a factor of 10. And that's it! :)
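As a rough CPU-side illustration of lines 256-262, here is the same sequence in Python. The function and parameter names are mine, and the input color is a made-up value.

```python
def stars_final_touch(stars_final, stars_hidden):
    """Per-channel version of lines 256-262.

    stars_final  -- RGB after multiplying the two gamma-corrected samples
    stars_hidden -- the cb0[9].w value: 1.0 hides the stars, 0.0 shows them
    """
    visibility = 1.0 - stars_hidden
    return tuple(min(c ** 2.5, 1.0) * visibility * 10.0 for c in stars_final)

night = stars_final_touch((0.9, 0.5, 0.1), 0.0)
day = stars_final_touch((0.9, 0.5, 0.1), 1.0)
print(night, day)
```

Raising to the power of 2.5 darkens dim texels much more than bright ones, so faint stars fade out and the sky looks less dense.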

Summary

In this post I presented some cool tricks I found while investigating the sky shader from The Witcher 3. I hope you enjoyed it. Thanks for reading!

Take care,
M.
