Mozilla’s New JavaScript Value Representation

Here at Mozilla, we have many monkeys.

One such effort, JaegerMonkey, is focused on revamping the baseline performance of our JS Engine. That effort is going really well. On the SunSpider benchmark, JaegerMonkey is starting to pull away from the Mozilla trunk’s JS Engine. Both are faster than the engine that ships in Firefox 3.6. JaegerMonkey is not a total rewrite, but it does change some fundamental parts of the engine. If you’re an extension author who uses the JSAPI directly, or you embed Mozilla’s JS Engine in other software, there are some changes you’ll need to know about. Our new representation of JavaScript Values (aka jsvals) is the first big change. It has just landed on mozilla-central. The patch was a ton of work, and most of the credit goes to Mozilla engineer Luke Wagner.

So, what is a jsval? It’s the C/C++ type that corresponds to a value in a JavaScript program. Here’s a snippet of JavaScript that assigns some values to four variables:

var foo = {dana: "zuul"};
var bar = "hi";
var baz = 37;
var qux = 3.1415;

Developers can use the JSAPI to manipulate these values from C and C++. Since JavaScript is dynamically typed, the types of those values can change at runtime. For example, the type of the value of ‘bar’ in the code above could change from a string to a number. C++ code sometimes needs to be able to tell which type a value has at a given moment, so there needs to be a clever way to pack that information into the jsval type.

Below, I’ll explain how the new value representation works on 32-bit systems, using information cribbed from a presentation by Mozilla engineer David Anderson. We adapted this layout from WebKit and LuaJIT, with some modifications. On 64-bit systems, our design is different from theirs. The basic idea is pretty old; it can be found in this 1993 survey paper.

The Old Way

The old jsval representation fit in a 32-bit value, using the 3 lowest bits as a way to tag the value as a particular type. These were called type tags.

Objects

A jsval with an object value would look like this:

var foo = {dana: "zuul"} @ 0x86753090

C++ code can inspect the three tag bits at the end. By observing that all three are 0 valued, it can determine that the value is an object, and should be interpreted as a pointer to a JSObject at the address 0x86753090.
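As a rough illustration of that check, here is a minimal C++ sketch. The names (OldJsval, kTagMask, oldIsObject and so on) are made up for this example and are not the engine's real identifiers; only the 3-bit tag width and the all-zero object tag come from the description above.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not the real JSAPI identifiers.
using OldJsval = std::uint32_t;

constexpr OldJsval kTagMask   = 0x7;  // the three lowest bits hold the type tag
constexpr OldJsval kObjectTag = 0x0;  // all three tag bits clear means "object"

// True when the three tag bits identify the value as an object.
constexpr bool oldIsObject(OldJsval v) {
    return (v & kTagMask) == kObjectTag;
}

// The object tag is zero, so the word is already the object's address.
constexpr std::uint32_t oldObjectAddress(OldJsval v) {
    return v;
}
```

With the example value above, oldIsObject(0x86753090) holds because the lowest hex digit is 0, and no masking is needed to recover the address.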

Strings

Strings worked similarly:

This time, one of the tag bits is set to 1, so C++ code knows that this value is a pointer to a string. Once the type tag was determined, the implementation would perform some bit masking to determine the true value of the last 4 bits.

var bar = "hi" @ 0x20506638

The masked value contains the correct pointer to a JS String at 0x20506638.
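The tagging and masking for strings can be sketched the same way. The identifiers are hypothetical; the tag value 0x4 matches the explanation one commenter gives further down (a string is the pointer OR 0x00000004), and the sketch assumes 8-byte-aligned string addresses so the low three bits are free.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not the real JSAPI identifiers.
using OldJsval = std::uint32_t;

constexpr OldJsval kTagMask   = 0x7;  // the three lowest bits hold the type tag
constexpr OldJsval kStringTag = 0x4;  // bit 2 set means "string"

// Tag an 8-byte-aligned string address (its low three bits are zero).
constexpr OldJsval oldMakeString(std::uint32_t address) {
    return address | kStringTag;
}

// True when the tag bits identify the value as a string.
constexpr bool oldIsString(OldJsval v) {
    return (v & kTagMask) == kStringTag;
}

// Clear the tag bits to recover the real address of the string.
constexpr std::uint32_t oldStringAddress(OldJsval v) {
    return v & ~kTagMask;
}
```

Tagging the address 0x20506638 gives 0x2050663C, and masking the tag bits off again yields the original address.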

Numbers

Integers were stored in the value itself:

The implementation only examined the least significant tag bit for a 1 value to determine whether it was an integer. If it was, it would perform a right shift on the value to get the actual integer value.

var baz = 37; // 0x25 in hex

The mechanics of this scheme mean that integers only have 31 bits of space to work with, rather than the usual 32. Floating point numbers don’t fit at all. For example, the float 3.1415 looks like this in memory: 400921cac083126f. Too many bits.
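The integer scheme described above could look roughly like this in C++. All names are invented for the example. The decoding step relies on an arithmetic right shift of a negative value, which C++20 guarantees and which earlier compilers provide in practice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not the real JSAPI identifiers.
using OldJsval = std::uint32_t;

// One bit is spent on the tag, so only 31 bits remain for the integer.
constexpr std::int32_t kOldIntMax = (1 << 30) - 1;  //  1073741823
constexpr std::int32_t kOldIntMin = -(1 << 30);     // -1073741824

// True when n can be stored directly in an old-style jsval.
constexpr bool fitsInOldInt(std::int64_t n) {
    return n >= kOldIntMin && n <= kOldIntMax;
}

// Shift the integer left by one and set the lowest bit as the tag.
inline OldJsval oldMakeInt(std::int32_t i) {
    assert(fitsInOldInt(i));
    return (static_cast<std::uint32_t>(i) << 1) | 1u;
}

// Only the least significant bit is examined.
constexpr bool oldIsInt(OldJsval v) {
    return (v & 1u) != 0;
}

// Shift right by one to drop the tag bit and recover the integer.
inline std::int32_t oldToInt(OldJsval v) {
    return static_cast<std::int32_t>(v) >> 1;
}
```

In this sketch 37 is stored as 0x4B (37 shifted left by one, plus the tag bit), and anything outside the 31-bit range fails fitsInOldInt and would have to be stored some other way.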

When an integer got too big to fit in 31 bits, or a floating point number was encountered, the value would get converted to a double, and stored on the heap.

Once again, some bit masking was performed to determine the actual address of the memory:

var qux = 3.1415 @ 0xA0B0CCD0 => 400921cac083126f

We have a 32-bit value that contains a pointer to the real value of the float, which is 64 bits wide. This arrangement is bad for at least three reasons. Firstly, we have to allocate to create a number. Secondly, we have to clean up that number later (during GC). Thirdly, it hurts locality, because we have to fetch float values from arbitrary heap locations to do even simple calculations.
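To make those costs concrete, here is a small sketch of a heap-boxed double. Everything in it is an assumption made for illustration: the names are invented, the double tag value 0x2 is not given in this post, and the word is pointer-sized so the sketch also runs on a 64-bit machine (the old jsval was 32 bits wide).

```cpp
#include <cstdint>

// Hypothetical names for illustration; not the real JSAPI identifiers.
using OldWord = std::uintptr_t;  // pointer-sized stand-in for the 32-bit jsval

constexpr OldWord kTagMask   = 0x7;  // the three lowest bits hold the type tag
constexpr OldWord kDoubleTag = 0x2;  // assumed tag value for heap doubles

// The tag bits are only free if heap doubles are 8-byte aligned.
static_assert(alignof(double) >= 8, "sketch assumes 8-byte-aligned doubles");

// Cost 1: every new number needs a heap allocation.
inline OldWord oldBoxDouble(double d) {
    double* cell = new double(d);
    return reinterpret_cast<OldWord>(cell) | kDoubleTag;
}

inline bool oldIsDouble(OldWord v) {
    return (v & kTagMask) == kDoubleTag;
}

// Cost 3: reading the number means masking, then chasing a pointer.
inline double oldUnboxDouble(OldWord v) {
    return *reinterpret_cast<const double*>(v & ~kTagMask);
}

// Cost 2: somebody (the GC, in the real engine) has to free the cell later.
inline void oldFreeDouble(OldWord v) {
    delete reinterpret_cast<double*>(v & ~kTagMask);
}
```

Even adding two such numbers involves two pointer loads and, for the result, a fresh allocation.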

The New Way

We call them Fat Values. They’re 64 bits wide.

For Objects, Strings, and Integers, we use the first 32 bits as a type tag. The second 32 bits contain the payload.

var foo = {dana: "zuul"} @ 0x86753090

var bar = "hi" @ 0x20506638

var baz = 37

The payoff is that we can fit the full range of 32-bit integers in integer-tagged values, and floating point numbers fit right in the jsval:

var qux = 3.1415; // (400921cac083126f)
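Here is a minimal sketch of that 64-bit layout. The tag constants are hypothetical values picked for the example, not the ones the engine uses; the only property they are meant to have is that their top 16 bits are all 1s. The helper names are invented too.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical names and tag values for illustration only.
using FatValue = std::uint64_t;

constexpr std::uint32_t kTagInt32  = 0xFFFF0001;
constexpr std::uint32_t kTagString = 0xFFFF0005;
constexpr std::uint32_t kTagObject = 0xFFFF0007;

static_assert(sizeof(double) == sizeof(FatValue), "double must be 64 bits");

// Tag in the upper 32 bits, payload in the lower 32 bits.
constexpr FatValue fatMake(std::uint32_t tag, std::uint32_t payload) {
    return (static_cast<FatValue>(tag) << 32) | payload;
}

constexpr std::uint32_t fatTag(FatValue v) {
    return static_cast<std::uint32_t>(v >> 32);
}

constexpr std::uint32_t fatPayload(FatValue v) {
    return static_cast<std::uint32_t>(v);
}

// A full 32-bit integer now fits in the payload.
inline FatValue fatFromInt32(std::int32_t i) {
    return fatMake(kTagInt32, static_cast<std::uint32_t>(i));
}

inline std::int32_t fatToInt32(FatValue v) {
    return static_cast<std::int32_t>(fatPayload(v));
}

// A double is stored as its own 64 bits, with no tag and no heap cell.
inline FatValue fatFromDouble(double d) {
    FatValue bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

In this sketch fatFromDouble(3.1415) is exactly the bit pattern 0x400921CAC083126F shown above, and fatFromInt32 round-trips every value from INT32_MIN to INT32_MAX.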

We can distinguish type tags from floating point numbers by using a quirk of IEEE-754 double precision numbers.

The first bit is the sign bit (purple in the original diagram), and the next eleven (yellow) are all exponent bits. If all of the exponent bits are 1s, then the number is a NaN, unless all of the remaining 52 bits (the blue ones) are 0s, in which case the value is either positive or negative infinity. We distinguish the values we’re using for type tags from other NaNs by marking the first 16 bits as 1s. In practice, all hardware and standard libraries produce a single canonical NaN value, so we’re free to use all of the other values for our own purposes. This technique is called NaN boxing.
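The test for "is this a double or a tagged value?" then comes down to looking at the top 16 bits. The sketch below uses invented names, and the canonical NaN bit pattern 0x7FF8000000000000 is an assumption for the example. Collapsing every NaN to that one pattern before storing it is what keeps real doubles from colliding with the tag space.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical names for illustration; not the real JSAPI identifiers.
using FatValue = std::uint64_t;

// Assumed canonical NaN: positive quiet NaN with an otherwise empty payload.
constexpr FatValue kCanonicalNaN = 0x7FF8000000000000ULL;

// Store a double, collapsing every NaN to the canonical one first.
inline FatValue boxDouble(double d) {
    if (std::isnan(d)) {
        return kCanonicalNaN;
    }
    FatValue bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Type tags are the bit patterns whose first 16 bits are all 1s.
constexpr bool isTagged(FatValue v) {
    return (v >> 48) == 0xFFFF;
}

// Everything else, including infinities and the canonical NaN, is a double.
constexpr bool isDouble(FatValue v) {
    return !isTagged(v);
}

inline double unboxDouble(FatValue v) {
    double d;
    std::memcpy(&d, &v, sizeof d);
    return d;
}
```

Negative infinity is 0xFFF0000000000000, so its top 16 bits are 0xFFF0 rather than 0xFFFF and it still reads as a double.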

Changes for JSAPI users

Here is the short version, courtesy of Luke Wagner.

  • jsval is no longer word-sized
  • jsval can hold a full int32
  • doubles are stored in the jsval; JSVAL_TO_DOUBLE returns double
  • jsval and jsid no longer share the same representation
  • JSClass method signatures have been modified to take jsids for id
    arguments and pass jsval arguments by const jsval*.

You can read up in more detail, and provide feedback, by checking out Luke’s mozilla.dev.tech.js-engine post on the matter.

Comments (18)

  1. Aziz wrote:

    Smalltalk has used a similar type of encoding since 16-bits was a big number. As has LISP, for that matter. I’ll grant that the NaN encoding is more clever than I’ve seen before, yes.

    Monday, August 2, 2010 at 8:29 pm #
  2. überrationell wrote:

    I like how you move from one insane binary op scheme with negative performance implications to another “unknown behaviour” way of doing things. True engineering.

    I wonder what the people over at Chromes V8 do. Oh, they use *proper classes*, call *real functions* and do not rely on bit magic. I guess they know they should not try and outsmart the compiler to save some bits in memory.

    Monday, August 2, 2010 at 9:03 pm #
  3. Nadav wrote:

    This is really interesting. I wonder what is the performance gain of the second method to a naive method where a separate variable holds the type of the ‘js-value’. Would the performance loss be similar to the performance loss when moving from 32 to 64bits ?

    Monday, August 2, 2010 at 10:15 pm #
  4. martin wrote:

    > Here’s a snippet of JavaScript that assigns some values to three variables:

    I think you meant to say ‘four variables’ or did I miss something?

    Tuesday, August 3, 2010 at 1:11 am #
  5. Jason Riedy wrote:

    Little problem / possibility: You’re creating signaling NaNs on some platforms.

    At least one language implementor (not JS) uses this to their advantage. They perform operations as if all data is double. Then they check the invalid flag. If it’s raised, some datum may have been something *other* than double, and they go back and re-check.

    That flips the problem around to your creating quiet NaNs on some platforms. We couldn’t force a single NaN representation in the 754 revision, alas, so we’re stuck with that old mistake.

    Tuesday, August 3, 2010 at 5:44 am #
  6. Robert Kaiser wrote:

    yay for the Ghostbusters reference in the object var! I know there was some Ghostbuster-y stuff left in Mozilla aside of the XUL namespace! ;-)

    That said, you state it’s three variables when you actually made it to be four. ;-)

    Tuesday, August 3, 2010 at 10:06 am #
  7. Dan wrote:

    It’s still a little ridiculous the way they’re laying it out. I had it done as a 64-bit struct, but it should have more than just type information, and it works out to a remarkably complicated combination of unions and structs when you pack it all in efficiently considering get/set r/w/x and all the other properties you can attach to a value.

    Tuesday, August 3, 2010 at 10:22 am #
  8. rsayre wrote:

    @überrationell, v8’s design is a little different. They still do some bit shifting, I think just for 31-bit integers. All of us use classes and call functions :)

    Tuesday, August 3, 2010 at 10:45 am #
  9. Luke wrote:

    @Jason Riedy: The only NaN value that is used with double arithmetic is the canonical (quiet) NaN. Any other NaN is, by definition, some other type of encoded value.

    Tuesday, August 3, 2010 at 11:01 am #
  10. rsayre wrote:

    edits: Fixed that three variable mistake, and added references to LuaJIT and an old paper that Luke Wagner pointed me to.

    Tuesday, August 3, 2010 at 1:52 pm #
  11. default wrote:

    @überrationell

    Tag bits and the like are nothing unusual and a very common optimization. I don’t know V8 well, but I am sure they do something similar.

    Wednesday, August 4, 2010 at 5:47 am #
  12. Paul wrote:

    So it’s basically been changed to COM variant but that’s been around for 10+ years.

    Wednesday, August 11, 2010 at 12:49 pm #
  13. Josh Pearce wrote:

    In the example, if the last four bits are 1100, it’s a string. The first seven four-bit quadruplets, plus a bitmask’d version of the last quadruplet (1100), comprise eight sets of four bits. My question is: how can four bits, if they don’t already encode the hex character I’m looking for, be bitmask’d into displaying it, and still be capable of encoding any of the 16 possible hex characters?

    Wednesday, August 11, 2010 at 5:55 pm #
  14. iip wrote:

    I wonder how the Fat 64-bit values influence the performance of processing integers, strings and objects in 32-bit environment.
    For the objects I am pretty sure it would require more processor time. For the other for me is difficult to say if 64-bit value processing would take more or less than the old-style processing (shifting/masking).
    My point is what is the influence on the code which is not “heavy, numeric code”?

    Thursday, August 12, 2010 at 1:15 am #
  15. Beat Bolli wrote:

    @Josh: You’re basically right, but:

    You can build your memory allocator so that it returns all memory aligned to 8 bytes, so the last three bits are always zero.

    Thursday, August 12, 2010 at 5:16 am #
  16. brion wrote:

    @Josh: the last hex digit in a pointer is constrained by alignment; not all possible values need be represented.

    Thursday, August 12, 2010 at 7:14 am #
  17. g wrote:

    Josh Pearce, strings (like objects) are represented by jsvals containing pointers. The allocator makes sure that these pointers are always 8-byte-aligned; that is, the addresses always end with 0 or 8 in hex. Then (this is with the old-style 32-bit jsvals) an object is just represented by the pointer itself, and a string is represented by the pointer OR 0x00000004. To get the pointer, all you have to do is to mask away bit 2.

    The alignment restriction is why the lowest 4 bits don’t need to be able to contain an arbitrary hex digit. They only need to be able to contain (something that decodes to) a 0 or an 8.

    Thursday, August 12, 2010 at 9:47 am #
  18. Omega192 wrote:

    @Robert Kaiser:
    Woah, I didn’t even catch the Ghostbusters reference, but I did catch the “867-5309”
    Seems like a sort of easter egg :]

    Thursday, August 12, 2010 at 8:18 pm #
