The Evolution of
Numerical Computing in Java


  1. Preamble
  2. Numerical Precision
  3. Allowable Compiler Optimizations
  4. Standard Library Extensions
  5. Efficient classes
  6. Complex Numbers
  7. Operator Overloading
  8. Rounding
  9. Exceptions
  10. NaN
  11. Matrices
  12. Subscript Checking
  13. Memory Hierarchy
  14. Summary. Skip to here if you're in a rush.


Preamble

This is a very preliminary proposal. It is not yet baked enough for widespread public dissemination. Don't redistribute it. Nothing is decided. Don't depend on it. Or build it. It's just for discussion. Comments to James Gosling.


The original specification of floating point arithmetic in Java was done in a context where sophisticated high performance numerical computing was not an issue. But for a variety of reasons Java has become very attractive to the high performance numerical community who could use Java if only it were slightly improved. This document is a set of proposals to address these needs.

A common theme that runs through all these proposals is to use the existing Java models wherever possible. Rather than defining new facilities, I show how to use the existing facilities, with some augmentation.

Numerical Precision

One of the important principles in the design of Java is that we actually try to write down how developers' programs should behave. This may sound like a fairly obvious thing to do, but in programming languages it is (sadly) revolutionary. It is standard for language specifications to be silent on the actual semantics of floating point arithmetic because at one time there was huge variability in floating point hardware. In order to include as many machines as possible, languages traditionally left this as "implementation dependent". This makes application porting hellish because knowing that a program works on one CPU doesn't imply anything about correctness on another.

One of the key advances that makes solving this possible is that the IEEE 754 spec has become very widely accepted and almost universally implemented in modern CPU chips. This provides a consistent definition for arithmetic semantics that is truly cross-platform. The current Java numerics spec is essentially a direct reference to the IEEE spec. It calls for conformance to the core single and double precision floating point formats, which are the only ones precisely defined.

The problem with this approach is that there are vague areas on the periphery of the spec that were left out (like rounding, exceptions, and extended precision) largely because there is a fair amount of variability from platform to platform in what they actually implement. There is also significant variability in what precisions (float, double, extended) perform best, and in how slow exactly conforming operations are.

Another way to look at this issue is that there are really two "communities of use" for floating point: those who are numerical specialists, and those who aren't. The current Java spec is strongly oriented towards non-specialists, who are served well by it. For specialists who understand the implications of loosened conformance requirements, the possible performance gain is crucial. The IEEE 754 spec was designed so that with care, a developer sufficiently proficient in numerical analysis can actually write robust portable code in the face of loose numerics.

Independent of this proposal, the Java spec is being amended to permit the use of extended precision for intermediate values in expression evaluation. This allows, for example, Intel extended registers and PowerPC fused-multiply-add to be employed. This is largely benign, since it (almost) only increases accuracy (double roundings may cause occasional slight decreases in accuracy). It does make FP behaviour non-deterministic, but in real programs this hasn't been a problem.
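The double-rounding effect of a fused multiply-add can be made concrete on a modern JVM. Math.fma is a much later addition to the platform than this proposal (an anachronism, used here purely to illustrate): it rounds the product-plus-addend once, where a*b + c rounds twice. A minimal sketch:

```java
public class FmaDemo {
    public static void main(String[] args) {
        double a = 1.0 + 0x1.0p-27;    // exactly representable
        double c = -(1.0 + 0x1.0p-26); // exactly representable
        // a*a is exactly 1 + 2^-26 + 2^-54; plain multiplication rounds
        // away the 2^-54 term before the addition, so the sum cancels to 0.
        System.out.println(a * a + c);        // 0.0
        // A fused multiply-add keeps the exact product and rounds once,
        // so the 2^-54 term survives.
        System.out.println(Math.fma(a, a, c)); // 2^-54, not 0.0
    }
}
```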

The definition of each operator is changed so that instead of there being exactly one correct result, there is a set of values that would be considered correct. The correct value as defined by the strict spec is a member of the expanded set, hence existing implementations that adhere to the strict spec also adhere to the loosened spec.

This proposal adds a new declaration modifier that specifies that a method, or all the methods in a class, can be executed under slightly looser rules.

looseNumerics public double sum(float a[], float b[]) {
    double s = 0;
    for (int i = a.length; --i >= 0; ) s += a[i]*b[i];
    return s;
}
A whole class could be marked with this modifier:
looseNumerics class complex { ...  }
Implementation note: The modifier would just set a bit in the appropriate flag word in the .class file and would not affect the bytecode instruction set at all. A VM that ignores this bit completely conforms to this spec (since the tighter rules are, in a sense, a strict subset of the looser rules; in other words, the loose rules allow the result of an operation to be an unspecified member of some bounded set of values and the tight rules specify one of those values).

VMs don't have to be changed to conform to this spec (if they ignore undefined modifier bits), and the change to compilers is minimal to support the modifier.

This loosening would consist of allowing the use of extended precision more pervasively: local variables can be assigned to extended registers.

A natural part of this extension is to say that the cast operators (float) and (double) force a correct rounding to their respective precisions. All compilers currently optimize away casts where the cast is redundant. For example, in (double)(a+b), where a and b are doubles, the cast is redundant since the result of the addition is already a double. So no code is generated. Under this spec revision, the value of the addition could be extended in a looseNumerics context, and without an explicit round instruction, the result of the cast would be incorrect. There is no round-double-to-double instruction in the Java VM so either:

  1. An instruction has to be added
  2. It can be implemented within the current VM spec using (for example) an empty method that gets optimized away, except for its parameter-passing side effects:
    static double roundToDouble(double d) { return d; }
    ... (double)(a+b)          //original
    ... roundToDouble(a+b)     //rewrite to force rounding
This needs to be resolved.

Loosening impacts testing, making it much harder. In particular, different platforms may do loosening in subtly different ways. Developers should always test their software under the default consistent rules, but they can only test them under the loose rules for the platforms that they happen to have. It is therefore important that users, who may be running the software on a platform the developer has never seen, be able to override the looseness specifications of the developer, forcing the VM to execute the program under the strict rules. Runtime VMs should have a switch "-loosen" that allows the VM to loosen the rules for classes that have specified loosening. i.e. loosening only happens if the looseXXX modifier is specified by the developer and -loosen is specified by the user. (This is the safest way to do it. It could also be that "-loosen" is the default, and users would have to specify "-consistent" if they're encountering difficulties that they think might be numerical in nature.)

Allowable Compiler Optimizations

Writers of optimizing compilers frequently exploit mathematical identities that hold in the mathematics we were all taught in school, but which do not (quite) hold in computer floating point. Here are a few examples of the kind of improper transformations that are sometimes desired:
  Old        New        Why it's bogus
  x=0*y      x=0        If y == Inf or NaN, x would be NaN
  x=4*y*0.5  x=2*y      If 4*y overflows, the result would be Inf
  x=y/10     x=y*0.1    One extra rounding of an inexact calculation: 1/10 => 0.1
  x=a+b+c    x=a+(b+c)  The standard catastrophic cancellation problem. This is
                        actually the worst because it shows up in many
                        parallelization and pipelining optimizations.
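The reassociation row is easy to reproduce. A minimal sketch showing that (a+b)+c and a+(b+c) can differ by the entire value of c:

```java
public class Reassoc {
    public static void main(String[] args) {
        double a = 1.0e16, b = -1.0e16, c = 1.0;
        // Left-to-right: a+b cancels exactly to 0, then +c gives 1.
        System.out.println((a + b) + c); // 1.0
        // Reassociated: 1.0 is below half an ulp of 1e16, so b+c
        // rounds back to -1e16 and the final sum is 0.
        System.out.println(a + (b + c)); // 0.0
    }
}
```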
These optimizations can have serious negative impacts on correctness, but they can also have serious positive impacts on performance. I propose another modifier, idealizedNumerics, similar to looseNumerics, that allows optimizers to assume that the standard mathematical identities apply, without regard to exceptions or rounding; essentially assuming that calculations are done with infinite accuracy.

There is considerable feeling that this rule is too generous & too dangerous: the possible erosion of precision and twisting of behaviour can be pretty extreme. An alternative is to only allow the optimizer to assume that the associative rule applies, not all the standard identities.

Standard Library Extensions

The standard math package needs, at minimum, float versions of functions like sqrt.

It may also be reasonable to define a "LooseMath" class that is analogous to "Math", except that the functions are allowed to be (slightly) inexact, for reasons of performance.

(an historical aside: the reason that the math class doesn't have functions like sqrt(float) is that once upon a time, the compiler couldn't do method overload discrimination based on float vs double)

Efficient classes

One of the barriers to doing the natural implementation of (for example) complex numbers as classes is that class objects are less efficient than primitive types like double and int.

It is almost possible, under the current language spec, for a sufficiently clever optimizing compiler to transform certain classes into lightweight objects that are not heap allocated and are passed by value rather than reference: declare the class and all its instance variables to be final. However, this doesn't quite work: there are areas where the "pointer" behaviour of these immutable objects leaks through, notably `==' (which compares references, not values) and casting.

This can be partially solved by adding an extra field to every immutable value that is a unique ID. `==' then compares the unique ID. While this does repair the semantics of `==', it does so at a pretty significant cost elsewhere in the system: every pass-by-value object is at least 32 bits longer, and object IDs have to be created frequently (eg. in Complex `a=b+c' a new ID has to be assigned to `a'). Casting remains ugly.

I propose to do something similar to what was done in Sather value/immutable types. In the language spec:

Classes may be declared with the prefix modifier "immutable" as follows:
  immutable class Complex { ... }
An immutable class is different from a regular class in the following ways: instances are never cast, compared with `==', or stored into outside of the constructor.
In the VM spec:
Immutable classes are just like regular classes except that they have a modifier flag set. Source to bytecode compilers generate the same code for immutable and regular classes, except that it is guaranteed that immutable objects will never be cast, compared, or stored into outside of the constructor. Conforming implementations are allowed to ignore the flag. Optimizing implementations can use it as a signal to inline the class (they don't have to -- if, for example, the object is large).

Complex Numbers

Just do it as the obvious class:

public final class complex {
        private final double re, imag;
        public complex(double re, double imag) {
       = re;
                this.imag = imag;
        }
        public static complex add(complex a, complex b) {
                return new complex(, a.imag+b.imag);
        }
}
If the optimizer can do the right inlining of immutable classes, and operator overloading is implemented, then the straightforward "within the language" implementation of complex numbers should give the same performance and ease of use as a wired-in type in a language like Fortran.

Operator Overloading

Everyone I've talked to who does numerical work insists that operator overloading is totally essential. The method invocation style for writing expressions is so cumbersome as to be essentially useless.

The proposal is to extend the existing rules for operators: if an operator application would not be legal according to the existing specification, it is interpreted as an ordinary call to a distinguished method. For example, a*b+c might be translated as a.times(b).plus(c), or a[i,j]=k might be translated as a.set(i,j,k), if the variables involved are not primitives.
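As a sketch of what that desugaring means in practice, here is the method-invocation form that a*b+c would map to, using a hypothetical Complex class whose method names (times, plus) follow the example above:

```java
public class Desugar {
    // Hypothetical Complex class; names follow the proposal's example.
    static final class Complex {
        final double re, im;
        Complex(double re, double im) { = re; = im; }
        Complex plus(Complex b)  { return new Complex(re +, im +; }
        Complex times(Complex b) { return new Complex(re * - im *,
                                                      re * + im *; }
    }
    public static void main(String[] args) {
        Complex a = new Complex(1, 2), b = new Complex(3, 4), c = new Complex(5, 6);
        Complex r = a.times(b).plus(c); // what a*b + c would desugar to
        System.out.println( + " + " + + "i"); // 0.0 + 16.0i
    }
}
```

This is exactly the cumbersome style the proposal is meant to hide: the arithmetic is buried under method-call punctuation.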

This approach has been used in other languages to great effect. There are three caveats:

  1. While the operation involved may be commutative, the method invocation syntax is not. So, if a Matrix class defines a scalar element-wise multiply method Matrix times(int s) { ... } it may be invoked by the syntax m*2 but not by the reverse 2*m, because 2.times(m) would not be legal. To cope with situations where the lhs is a primitive but the rhs is an object, there need to be names for the reversed operations: 1/m maps to m.divideReverse(1).

  2. The == operator could not be defined to call equals. But for immutable types, this isn't an issue.

  3. Java currently uses the distinction between primitives and reference types to disambiguate expressions such as (p)+q which might either be a cast or an addition. Such expressions would still need to be disambiguated based on whether the type is a primitive, a seemingly arbitrary distinction.

There are no binary compatibility implications. This proposal is much more attractive when combined with the immutable object optimization, because complex numbers could be compatibly added and used in form close to that of other numerical primitives.

One unresolved issue is what names should the operators map to?

  plus       Simple alphanumeric names, as used in the preceding. They should
             be chosen to agree with existing practice (like the BigInteger
             package). One of the reasons I like this syntax is that the names
             suggest a semantics which could help guide developers away from
             some of the abuses seen in C++.
  operator+  The C++ syntax.
  this+      A slightly more suggestive operator oriented syntax. It looks
             particularly good in the full declaration:
                 complex this+(complex b) { ... }

We should also implement Guy Steele's suggestion of defining some of the Unicode operators to map to methods (eg. 2219, raised dot =>; 00F7, division sign => a.divide(b); 00D7, multiplication sign => a.cross(b); these three are important for matrices, but there are many more symbols that we could exploit).


Rounding

The current spec requires the use of round-to-nearest, the IEEE 754 default and recommended mode. This is exactly the right behaviour in the vast majority of situations.

However, in some situations other rounding modes are needed. The usual mechanism is to have a procedure that jams a new value into a floating point status register. This changes global state, rather than affecting the operators case-by-case. General procedures almost never react well to being invoked with an altered rounding mode. For example:

v = sin(x);

will rarely return a useful lower bound for the value of sin(x).

It is usually better to apply rounding control at a more limited scope. The Borneo project proposes extending the language with a new statement that brackets a block of code, vaguely like this:

round(up) {
    ... operations in this block round toward positive infinity ...
}

This unfortunately has pretty dramatic implications for the VM & language specs and every version of them. From my limited personal experience, it has always seemed to me that rounding control needed to be done on an even finer granularity. For example, given:

v = a - b*c

To get a lower bound for v, the subtraction needs to be rounded down, while the multiplication needs to be rounded up. I propose that a new standard package be written that provides these operations explicitly:

v = RoundDown.subtract(a,RoundUp.multiply(b,c));

A decent compiler will inline these and generate good code.
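The proposed RoundDown/RoundUp classes don't exist; as a hedged sketch, their effect can be approximated today with Math.nextUp/nextDown (much later platform additions). This is not true directed rounding: it widens the round-to-nearest result by one ulp, which gives a guaranteed, if slightly loose, directed bound.

```java
public class DirectedRounding {
    // Hypothetical stand-ins for the proposed RoundUp/RoundDown methods.
    // The round-to-nearest result is within half an ulp of the exact value,
    // so bumping it one ulp outward always brackets the exact value.
    static double multiplyUp(double a, double b)   { return Math.nextUp(a * b); }
    static double subtractDown(double a, double b) { return Math.nextDown(a - b); }

    public static void main(String[] args) {
        double a = 1.0, b = 0.1, c = 3.0;
        // Guaranteed lower bound for the exact value of a - b*c.
        double lower = subtractDown(a, multiplyUp(b, c));
        System.out.println(lower + " <= " + (a - b * c));
    }
}
```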


Exceptions

An aside on nomenclature: The word exception in the IEEE 754 spec has a rather different meaning than the same word in C++ or Java. The IEEE spec uses it to simply mean something that goes wrong, like a divide by zero. When an IEEE exception occurs a bit in a flag word is set and a trap may occur. Exception in C++ or Java means a non-local transfer of control from a throw statement to an enclosing try/catch. An IEEE trap and a Java exception are essentially the same thing.

The IEEE 754 spec defines five standard arithmetic exceptions: invalid operation, division by zero, overflow, underflow, and inexact.

By and large, these are best handled by letting the default action occur. For example, divide-by-zero and overflow produce infinity (except 0/0 produces NaN). Underflow produces zero. The behavior described in the Java spec is these default actions. The IEEE spec also calls for these to be available as a word of flags and traps that alter the flow of the program. In practice, the flags and traps are rarely used, and the default behavior works well.
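These default actions are easy to observe from Java today; a minimal sketch:

```java
public class Defaults {
    public static void main(String[] args) {
        System.out.println(1.0 / 0.0);              // Infinity (divide by zero)
        System.out.println(0.0 / 0.0);              // NaN (invalid operation)
        System.out.println(Double.MAX_VALUE * 2);   // Infinity (overflow)
        System.out.println(Double.MIN_VALUE / 2);   // 0.0 (underflow to zero)
    }
}
```

No flag is raised that the program can see, and no control transfer happens; execution just continues with these substitute values.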

However, there are some important situations where these can be very useful. A standard example in C looks like this:

if(something_went_wrong()) careful_but_slow_computation();
There are a variety of ways this could be done, but I haven't seen one that's particularly appealing (methods that manipulate global state feel like a clunky hack; special syntax feels like an overblown solution to a rare problem). In this case, global flag values aren't that bad a problem, since they don't affect the behaviour of programs that don't pay attention to them. Trap enables are a different matter.

There's also a theoretical issue with performance: hardware designers I've talked to have said that allowing for IEEE exceptions adds significant complexity to floating point pipelines, and often creates a performance penalty, while the default IEEE action allows the pipelines to continue flowing smoothly.

Traps also cause a serious problem with correctness: if they happen in code that is not equipped to handle them, data structure invariants could easily become corrupted when unexpected traps are taken.

I propose to add two methods to class Math:

public int getIEEEflags();
public void setIEEEflags(int f);
That would get and set the flags. I do not propose to add trapping behaviour.


NaN

The IEEE 754 spec defines many NaN values. The Java spec (20.9.22, 20.9.23, 21.10.21 & 20.10.22) collapses all these values into one. This shouldn't happen and is causing some developers significant problems. The spec should be changed.
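The collapsing is visible in the platform's bit-conversion methods (doubleToRawLongBits is itself a later addition): doubleToLongBits folds every NaN onto the single canonical pattern, while the raw variant preserves the payload bits. A sketch:

```java
public class NaNBits {
    public static void main(String[] args) {
        long canonical = Double.doubleToLongBits(Double.NaN); // 0x7ff8000000000000
        // A quiet NaN with a different (nonzero) payload.
        double other = Double.longBitsToDouble(0x7ff8000000000001L);
        // The collapsing conversion maps every NaN to the canonical bits...
        System.out.println(Double.doubleToLongBits(other) == canonical);    // true
        // ...but the raw conversion shows the distinct payload survives.
        System.out.println(Double.doubleToRawLongBits(other) == canonical); // false
    }
}
```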


Matrices

Java doesn't support multidimensional matrices. People often use arrays-of-arrays, which are only marginally useful. A more satisfactory solution is to use classes:
final class FloatMatrix2D {
    private final float[] m;
    private final int cols;
    public FloatMatrix2D(int c, int r) {
        cols = c;
        m = new float[c*r];
    }
    public float get(int c, int r) {
        return m[r*cols+c];
    }
}
Between operator overloading, method inlining, and efficient classes, this straightforward implementation leads to optimal and convenient behavior.

It does have one interaction with operator overloading: it's important that the [] operator be allowed to have multiple comma-separated indexes.
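A hedged usage sketch of the class above; the set method is an assumption (the original fragment shows only get), but it is legal because the final field fixes the array reference, not its contents:

```java
public class MatrixDemo {
    static final class FloatMatrix2D {
        private final float[] m;
        private final int cols;
        FloatMatrix2D(int c, int r) { cols = c; m = new float[c * r]; }
        // Row-major layout: element (c, r) lives at m[r*cols + c].
        float get(int c, int r)         { return m[r * cols + c]; }
        void set(int c, int r, float v) { m[r * cols + c] = v; }  // assumed, not in the original
    }
    public static void main(String[] args) {
        FloatMatrix2D a = new FloatMatrix2D(4, 3); // 4 columns, 3 rows
        a.set(2, 1, 3.5f);   // what a[2,1] = 3.5f would desugar to
        System.out.println(a.get(2, 1)); // 3.5
    }
}
```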

Subscript Checking

Most numerical inner loops do intensive array subscripting. The Java spec mandates subscript checking always and everywhere. This mandatory subscript checking is partly for security and partly for reliability: it's a big reason why Java programs are as solid as they are. But the checks do incur a runtime cost. I think it would be wrong to either eliminate the checks or even provide a "no subscript checking" switch; rather, it appears workable to depend on optimizers to eliminate them wherever flow analysis can determine that they are safe to remove.
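A minimal sketch of both halves of the argument: the mandatory check turns a wild access into a catchable exception, while a simple counted loop is exactly the shape where flow analysis can prove every access in bounds and an optimizer may legally drop the per-access check:

```java
public class BoundsCheck {
    public static void main(String[] args) {
        float[] a = new float[3];
        try {
            float x = a[3]; // always checked: throws, never reads wild memory
            System.out.println(x);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out-of-bounds access caught");
        }
        // Here flow analysis can prove 0 <= i < a.length for every access,
        // so a conforming optimizer may hoist or eliminate the checks.
        float s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        System.out.println(s);
    }
}
```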

Memory Hierarchy

The interactions of the disk/ram/cache/register memory hierarchy are critically important in getting good performance. Java's memory model tends to scatter objects throughout the address space, which has a negative impact on paging and cache locality. Languages like Fortran and C have mechanisms that allow developers to cluster objects in memory and have coherent layout of multidimensional arrays. But the current automatic storage allocation is one of the big reasons that Java has been so successful: it makes programming much easier and the results much more reliable. It feels like the right answer is to rely on sophisticated garbage collection algorithms to automate the solution. In particular, the "train algorithm" does an excellent job of clustering related objects & this is the one that will be in the upcoming HotSpot release.


Summary

I propose the following changes:
  1. Numerical Precision: Add `looseNumerics' modifier for classes and methods.
  2. Allowable Compiler Optimizations: Implement the `idealizedNumerics' modifier for classes and methods.
  3. Standard Library Extensions: Minimal: add a float version of sqrt().
  4. Efficient classes: Implement "immutable" classes.
  5. Complex Numbers: Just do the obvious class based implementation.
  6. Operator Overloading: Map a+b to when a is an object.
  7. Rounding: Nothing in the near term. At most add a library of explicit rounding methods.
  8. Exceptions: Implement the set/get flag methods.
  9. NaN: Remove the restrictions on NaN that collapse all NaN values onto the same value.
  10. Matrices: May have to do something new here. It's not clear what.
  11. Subscript Checking: Rely on the optimizer.
  12. Memory Hierarchy: Rely on the garbage collector.