Quake's fast inverse square root
The inverse square root function (1/sqrt(x)) is used heavily during a game engine's draw loop to normalize vectors into "unit" vectors with a length of 1. Normalized vectors are important because when you take the dot product of two of them (Ax*Bx + Ay*By + Az*Bz), the result is the cosine of the angle between the two vectors.
So imagine you have a vector that describes the way a surface is facing (its surface normal) and a second vector that describes the direction of sunlight. If the sunlight vector is parallel to your surface normal, which is to say the surface is facing the sun, the dot product of the two normalized vectors is the cosine of 0, which is 1. If the surface is 90 degrees from the sun, the dot product is the cosine of 90, which is 0. Anything in between and you get a value between 0 and 1, which essentially allows you to describe how brightly you should light that surface.
Now, you run this calculation on every triangle visible in a 3D game, and you do all that 30 or more times a second, and you have a basic pointsource lighting effect! Recall, however, that you need that inverse square root function to calculate unit vectors for each of those triangles. The square root operation is slow. Do it thousands of times per draw loop, and you have yourself a slow game.
But if you don't mind throwing away a marginal bit of accuracy, there's a faster way.
There's an inverse square root function that a lot of game engines use that is rumored to have originated in Quake III by Nvidia programmer Gary Tarolli. It looks like this:
float InvSqrt(float x) { float xhalf = 0.5f*x; int i = *(int*)&x; // get bits for floating value i = 0x5f3759df  (i>>1); // gives initial guess y0 x = *(float*)&i; // convert bits back to float x = x*(1.5fxhalf*x*x); // Newton step, repeating increases accuracy return x; }
Hold on, what was that?!?!
So way back when, Newton came up with a clever way of approximating the inverse square root. First, you divide the original number x by two. Lets call that "xhalf". Then, you make a reasonably close guess at the inverse square root. Let's call that g. Then, you take those two variables and perform this calculation (which you'll recognize as the last step in the InvSqrt function):
g = g*(1.5  xhalf * g *g)
If you do that over and over, substituting in the updated g on each iteration, g will quickly hone in on the inverse square root of x! In fact, if you pick a decent g to start with, a single iteration or two will get you very close to the correct value.
The question is, how do you find that initial guesstimate for the first g? The game engine coders use trick with how floating point numbers are represented in binary, with the exponent and mantissa broken up similar to scientific notation. In a 32bit float, the leftmost bit is a sign bit and is 0 for positive numbers. That's followed by 8 bits of exponent (biased by 127 to represent negative and positive exponents), and the final 23 bits represent the mantissa.
Well, to do an inverse square root, you basically need to multiply the exponent by 1/2.
When you shift those bits to the right (a very fast operation), the effect on the exponent is to divide it by 2. You still need to subtract the exponent from 0 to change it's sign, though, and what do you do about the mantissa which was also affected in the bit shift operation?
This is where the magic 0x5f3759df number comes in. It's absolutely bonkers, but by subtracting the bit shift result from 0x5f3759df, the mantissa is reset to near to it's original state and the exponent is subtracted from 0 (taking into account it's bias of 127).
The result is very close to the inverse square root. Close enough for a single pass through Newton's equation to end up with a value precise enough for practical purposes.
Understanding Quake's Fast Inverse Square Root
Fast Inverse Square Root  details and math behind the magic number (PDF)
Posted by Jason Striegel 
Dec 4, 2008 07:47 PM
Gaming, Math, Software Engineering 
Permalink
 Comments (3)
Recent Entries
 HOWTO  make a compiler
 Haptic Compass
 Grow your own fresh air indoors
 Pure Data  open source audiovisual processing environment
 FLARToolKit  augmented reality for Flash
 Make 3D stereo charts in Excel
 UTF8 lets you Tweet funny characters
 Watch commercialfree TV in Front Row with PyeTV
 Pseudo3D video conferencing
 Tetherbot  browse on your laptop through the TMobile G1
Comments
Newest comments listed first.
Posted by: Jim Shima on January 6, 2009 at 9:53 AM 
This trick has been used in the signal processing community for some time. In fact, the inverse square root via Newton's method is an opcode on the ADI Sharc processors (to do fast sqrt operations). Newton's method for any function x^n requires a division per iteration, and for x^(n) requires only multiplications. Since DSPs can do multiplications very fast, computing x^(1/2) is much faster than computed x^(1/2). And to compute a fast sqrt, you can take the output of the inverse sqrt and multiply by the original x to get sqrt(x).
Also as stated, for the initial guess in floating pt when you shift the IEEE float number to the right by one that gives you a very accurate starting guess (you must remove the exp bias first). This effectively divides the exp by two as well as the mantissa. If you go through the entire analysis, you will find that the starting guess is, at worst, only 6% in error to the true sqrt. Not bad for just one shift! That is why doing only one Newton iteration is good enough for most applications.
Posted by: DJSolin on February 11, 2009 at 2:33 AM 
And I had always asked my calc profs what in the hell did we do math anyway, "It's not like I am going to use it in programming!!" I used to say!! ;) I guess the joke is on me... I LOVE MATH!!
Leave a comment
Bloggers
Welcome to the Hacks Blog!
Categories
 Ajax
 Amazon
 Android
 AppleTV
 arduino
 Astronomy
 Baseball
 BlackBerry
 Blogging
 Body
 Cars
 Cryptography
 Data
 Design
 Education
 Electronics
 Energy
 Events
 Excel
 Excerpts
 Firefox
 Flash
 Flickr
 Flying Things
 Food
 Gaming
 Gmail
 Google Earth
 Google Maps
 Government
 Greasemonkey
 Hacks Series
 Hackszine Podcast
 Halo
 Hardware
 Home
 Home Theater
 iPhone
 iPod
 IRC
 iTunes
 Java
 Kindle
 Knoppix
 Language
 LEGO
 Life
 Lifehacker
 Linux
 Linux Desktop
 Linux Multimedia
 Linux Server
 Mac
 Mapping
 Math
 Microsoft Office
 Mind
 Mind Performance
 Mobile Phones
 Music
 MySpace
 MySQL
 NetFlix
 Network Security
 olpc
 Online Investing
 OpenOffice
 Outdoor
 Parenting
 PCs
 PDAs
 Perl
 Philosophy
 Photography
 PHP
 Pleo
 Podcast
 Podcasting
 Productivity
 PSP
 Retro Computing
 Retro Gaming
 Science
 Screencasts
 Security
 Shopping
 Skype
 Smart Home
 Software Engineering
 Sports
 SQL
 Statistics
 Survival
 TiVo
 Transportation
 Travel
 Ubuntu
 User Interface
 Video
 Virtualization
 Visual Studio
 VoIP
 Web
 Web Site Measurement
 Windows
 Windows Server
 Wireless
 Word
 World
 Xbox
 Yahoo!
 YouTube
Archives
 February 2009
 January 2009
 December 2008
 November 2008
 October 2008
 September 2008
 August 2008
 July 2008
 June 2008
 May 2008
 April 2008
 March 2008
 February 2008
 January 2008
 December 2007
 November 2007
 October 2007
 September 2007
 August 2007
 July 2007
 June 2007
 May 2007
 April 2007
 March 2007
 February 2007
 January 2007
 December 2006
 November 2006
 October 2006
 September 2006
Recent Posts
 HOWTO  make a compiler
 Haptic Compass
 Grow your own fresh air indoors
 Pure Data  open source audiovisual processing environment
 FLARToolKit  augmented reality for Flash
 Make 3D stereo charts in Excel
 UTF8 lets you Tweet funny characters
 Watch commercialfree TV in Front Row with PyeTV
 Pseudo3D video conferencing
 Tetherbot  browse on your laptop through the TMobile G1
www.flickr.com
