A case for keeping primitives in Java

Primitives are essential for applications dominated by numerical calculations

Primitives have been part of the Java programming language since its initial release in 1996, and yet they remain one of the more controversial language features. John Moore makes a strong case for keeping primitives in the Java language by comparing simple Java benchmarks, both with and without primitives. He then compares the performance of Java to that of Scala, C++, and JavaScript in a particular type of application, where primitives make a notable difference.

Question: What are the three most important factors in purchasing real estate?
Answer: Location, location, location.

This old and often-used adage is meant to imply that location completely dominates all other factors when it comes to real estate. In a similar argument, the three most important factors to consider for using primitive types in Java are performance, performance, performance. There are two differences between the argument for real estate and the argument for primitives. First, with real estate, location dominates in almost all situations, but the performance gains from using primitive types can vary greatly from one kind of application to another. Second, with real estate, there are other factors to consider even though they are usually minor in comparison to location. With primitive types, there is only one reason to use them — performance; and then only if the application is the kind that can benefit from their use.

Primitives offer little value to most business-related and Internet applications that use a client-server programming model with a database on the backend. But the performance of applications that are dominated by numerical calculations can benefit greatly from the use of primitives.

The inclusion of primitives in Java has been one of the more controversial language design decisions, as evidenced by the number of articles and forum posts related to this decision. Simon Ritter noted in his keynote address at JAX London in November 2011 that serious consideration was being given to the removal of primitives in a future version of Java (see slide 41). In this article I'll briefly introduce primitives and Java's dual-type system. Using code samples and simple benchmarks, I'll make my case for why Java primitives are needed for certain types of applications. I will also compare Java's performance to that of Scala, C++, and JavaScript.

Primitives versus objects

As you probably already know if you are reading this article, Java has a dual-type system, usually referred to as primitive types and object types, often abbreviated simply as primitives and objects. There are eight primitive types predefined in Java, and their names are reserved keywords. Commonly used examples include int, double, and boolean. Essentially all other types in Java, including all user-defined types, are object types. (I say "essentially" because array types are a bit of a hybrid, but they are much more like object types than primitive types.) For each primitive type there is a corresponding wrapper class that is an object type; examples include Integer for int, Double for double, and Boolean for boolean.

Primitive types are value based, but object types are reference based, and therein lies both the power and the source of controversy of primitive types. To illustrate the difference, consider the two declarations below. The first declaration uses a primitive type and the second uses a wrapper class.

int n1 = 100;
Integer n2 = new Integer(100);

Using autoboxing, a feature added in JDK 5, I could shorten the second declaration to simply

Integer n2 = 100;

but the underlying semantics don't change. Autoboxing simplifies the use of wrapper classes and reduces the amount of code a programmer has to write, but at runtime n2 is still a reference to an Integer object.
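To make the conversions concrete, here is a minimal sketch of where boxing and unboxing happen. The class name AutoboxingDemo and its helper methods are my own illustration, and the comments describe how javac typically translates each line.

```java
class AutoboxingDemo {
    // Boxing: javac translates the return into Integer.valueOf(value).
    static Integer box(int value) {
        return value;
    }

    // Unboxing: javac translates the return into value.intValue().
    static int unbox(Integer value) {
        return value;
    }

    public static void main(String[] args) {
        Integer n2 = 100;      // boxed: n2 refers to an Integer object
        int n1 = n2;           // unboxed: n1 holds the value itself
        Integer sum = n2 + 1;  // unbox n2, add 1, box the result
        System.out.println(n1 + " " + n2 + " " + sum);  // prints "100 100 101"
    }
}
```

Every boxed value in this sketch is still an object reached through a reference; only the source code got shorter.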

The difference between the primitive n1 and the wrapper object n2 is illustrated by the diagram in Figure 1.

Figure 1. Memory layout of primitives versus objects

The variable n1 holds an integer value, but the variable n2 contains a reference to an object, and it is the object that holds the integer value. In addition, the object referenced by n2 also contains a reference to the class object for Integer.

The problem with primitives

Before I try to convince you of the need for primitive types, I should acknowledge that many people won't agree with me. Sherman Alpert in "Primitive types considered harmful" argues that primitives are harmful because they mix "procedural semantics into an otherwise uniform object-oriented model. Primitives are not first-class objects, yet they exist in a language that involves, primarily, first-class objects." Primitives and objects (in the form of wrapper classes) provide two ways of handling logically similar types, but they have very different underlying semantics. For example, how should two instances be compared for equality? For primitive types, one uses the == operator, but for objects the preferred choice is to call the equals() method, which isn't an option for primitives. Similarly, different semantics exist when assigning values or passing parameters. Even the default values are different; e.g., 0 for int versus null for Integer.
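The differences in equality and default values can be sketched in a few lines. The class and method names below are hypothetical and exist only for illustration.

```java
class EqualitySemanticsDemo {
    static int primitiveField;    // fields of type int default to 0
    static Integer wrapperField;  // fields of type Integer default to null

    // Primitives are compared by value with the == operator.
    static boolean samePrimitive(int a, int b) {
        return a == b;
    }

    // Wrappers should be compared with equals(); using == on two wrappers
    // compares the references, not the values they hold.
    static boolean sameWrapper(Integer a, Integer b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(primitiveField);             // prints 0
        System.out.println(wrapperField);               // prints null
        System.out.println(samePrimitive(1000, 1000));  // prints true
        System.out.println(sameWrapper(1000, 1000));    // prints true
    }
}
```

Note that sameWrapper would throw a NullPointerException if its first argument were null, a failure mode that has no counterpart for primitives.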

For more background on this issue, see Eric Bruno's blog post, "A modern primitive discussion," which summarizes some of the pros and cons of primitives. A number of discussions on Stack Overflow also focus on primitives, including "Why do people still use primitive types in Java?" and "Is there a reason to always use Objects instead of primitives?" Programmers Stack Exchange hosts a similar discussion entitled "When to use primitive vs class in Java?"

Memory utilization

A double in Java always occupies 64 bits in memory, but the size of a reference depends on the Java virtual machine (JVM). My computer runs the 64-bit version of Windows 7 and a 64-bit JVM, and therefore a reference on my computer occupies 64 bits. Applying the layout in Figure 1 to double and Double, I would expect a single double to occupy 8 bytes (64 bits), and I would expect a single Double to occupy 24 bytes: 8 for the reference to the object, 8 for the double value stored in the object, and 8 for the reference to the class object for Double. Plus, Java uses extra memory to support garbage collection for object types but not for primitive types. Let's check it out.

Using an approach similar to that of Glen McCluskey in "Java primitive types vs. wrappers," the method shown in Listing 1 measures the number of bytes occupied by an n-by-n matrix (two-dimensional array) of double.

Listing 1. Calculating memory utilization of type double

public static long getBytesUsingPrimitives(int n) {
    System.gc();   // request garbage collection (the JVM treats this as a hint)
    long memStart = Runtime.getRuntime().freeMemory();
    double[][] a = new double[n][n];

    // put some random values in the matrix
    for (int i = 0;  i < n;  ++i) {
        for (int j = 0;  j < n;  ++j) {
            a[i][j] = Math.random();
        }
    }

    long memEnd = Runtime.getRuntime().freeMemory();

    // free memory before minus free memory after = bytes taken by the matrix
    return memStart - memEnd;
}

Modifying the code in Listing 1 with the obvious type changes (not shown), we can also measure the number of bytes occupied by an n-by-n matrix of Double. When I test these two methods on my computer using 1000-by-1000 matrices, I get the results shown in Table 1 below. As illustrated, the version for primitive type double equates to a little more than 8 bytes per entry in the matrix, roughly what I expected. However, the version for object type Double required a little more than 28 bytes per entry in the matrix. Thus, in this case, the memory utilization of Double is more than three times the memory utilization of double, which should not be a surprise to anyone who understands the memory layout illustrated in Figure 1 above.
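For readers who want to see the "obvious type changes" spelled out, the following is my sketch of what the Double version might look like. The class name WrapperMemoryDemo and the helper randomMatrix are assumptions for illustration, not the code used to produce the measurements above.

```java
class WrapperMemoryDemo {
    // Hypothetical helper: builds an n-by-n matrix of boxed random values.
    static Double[][] randomMatrix(int n) {
        Double[][] a = new Double[n][n];
        for (int i = 0;  i < n;  ++i) {
            for (int j = 0;  j < n;  ++j) {
                a[i][j] = Math.random();  // each entry is boxed into its own Double object
            }
        }
        return a;
    }

    // Same measuring approach as Listing 1, with Double in place of double.
    static long getBytesUsingWrappers(int n) {
        System.gc();   // request garbage collection (the JVM treats this as a hint)
        long memStart = Runtime.getRuntime().freeMemory();
        Double[][] a = randomMatrix(n);
        long memEnd = Runtime.getRuntime().freeMemory();

        // Touch the matrix after measuring so it stays reachable until here.
        if (a.length != n) {
            throw new IllegalStateException("unexpected matrix size");
        }
        return memStart - memEnd;
    }

    public static void main(String[] args) {
        System.out.println(getBytesUsingWrappers(1000) + " bytes (approximate)");
    }
}
```

As with Listing 1, the result is only an estimate: the difference in free memory can be distorted if the heap grows or a collection runs between the two readings.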

Runtime performance

To compare the runtime performances for primitives and objects, we need an algorithm dominated by numerical calculations. For this article I have chosen matrix multiplication, and I compute the time required to multiply two 1000-by-1000 matrices. I coded matrix multiplication for double in a straightforward manner as shown in Listing 2 below. While there may be faster ways to implement matrix multiplication (perhaps using concurrency), that point is not really relevant to this article. All I need is common code in two similar methods, one using the primitive double and one using the wrapper class Double. The code for multiplying two matrices of type Double is exactly like that in Listing 2 with the obvious type changes.

Listing 2. Multiplying two matrices of type double

public static double[][] multiply(double[][] a, double[][] b) {
    // checkArgs is a helper that is not shown here
    if (!checkArgs(a, b))
        throw new IllegalArgumentException("Matrices not compatible for multiplication");

    int nRows = a.length;
    int nCols = b[0].length;

    double[][] result = new double[nRows][nCols];

    for (int rowNum = 0;  rowNum < nRows;  ++rowNum) {
        for (int colNum = 0;  colNum < nCols;  ++colNum) {
            // dot product of row rowNum of a and column colNum of b
            double sum = 0.0;

            for (int i = 0;  i < a[0].length;  ++i)
                sum += a[rowNum][i]*b[i][colNum];

            result[rowNum][colNum] = sum;
        }
    }

    return result;
}

I ran the two methods to multiply two 1000-by-1000 matrices on my computer several times and measured the results. The average times are shown in Table 2. Thus, in this case, the runtime performance of double is more than four times as fast as that of Double. That is simply too much of a difference to ignore.
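Here is a sketch of how the Double version might look after the type changes. The class name WrapperMatrixDemo is hypothetical, and the checkArgs shown here is my own stand-in that only checks that the inner dimensions agree; the original helper may do more.

```java
class WrapperMatrixDemo {
    // Hypothetical stand-in for checkArgs: the number of columns of a
    // must equal the number of rows of b.
    static boolean checkArgs(Double[][] a, Double[][] b) {
        return a.length > 0 && b.length > 0 && a[0].length == b.length;
    }

    static Double[][] multiply(Double[][] a, Double[][] b) {
        if (!checkArgs(a, b))
            throw new IllegalArgumentException("Matrices not compatible for multiplication");

        int nRows = a.length;
        int nCols = b[0].length;

        Double[][] result = new Double[nRows][nCols];

        for (int rowNum = 0;  rowNum < nRows;  ++rowNum) {
            for (int colNum = 0;  colNum < nCols;  ++colNum) {
                Double sum = 0.0;

                // Each pass unboxes three values, multiplies and adds,
                // then boxes the new sum.
                for (int i = 0;  i < a[0].length;  ++i)
                    sum += a[rowNum][i]*b[i][colNum];

                result[rowNum][colNum] = sum;
            }
        }

        return result;
    }

    public static void main(String[] args) {
        Double[][] a = {{1.0, 2.0}, {3.0, 4.0}};
        Double[][] b = {{5.0, 6.0}, {7.0, 8.0}};

        long start = System.nanoTime();
        Double[][] c = multiply(a, b);
        long elapsed = System.nanoTime() - start;

        System.out.println(c[0][0] + " " + c[0][1] + " " + c[1][0] + " " + c[1][1]);
        System.out.println("elapsed ns: " + elapsed);
    }
}
```

The loops are identical to Listing 2. The extra cost comes from the boxing and unboxing hidden in the innermost statement and from following a reference for every matrix entry.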

The SciMark 2.0 benchmark

Thus far I've used the single, simple benchmark of matrix multiplication to demonstrate that primitives can yield significantly greater computing performance than objects. To reinforce my claims I'll use a more scientific benchmark. SciMark 2.0 is a Java benchmark for scientific and numerical computing available from the National Institute of Standards and Technology (NIST). I downloaded the source code for this benchmark and created two versions, the original version using primitives and a second version using wrapper classes. For the second version I replaced int with Integer and double with Double to get the full effect of using wrapper classes. Both versions are available in the source code for this article.

The SciMark benchmark measures performance of several computational routines and reports a composite score in approximate Mflops (millions of floating point operations per second). Thus, larger numbers are better for this benchmark. Table 3 gives the average composite scores from several runs of each version of this benchmark on my computer. As shown, the runtime performances of the two versions of the SciMark 2.0 benchmark were consistent with the matrix multiplication results above in that the version with primitives was almost five times faster than the version using wrapper classes.
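To give a feel for the unit, the sketch below converts a timing into approximate Mflops. It is not taken from SciMark, which uses its own operation counts for each routine. The class MflopsDemo is hypothetical, and the count of 2n^3 operations assumes the straightforward triple loop of Listing 2 (n^3 multiplications plus n^3 additions).

```java
class MflopsDemo {
    // Floating point operations in a straightforward n-by-n matrix multiply.
    static long matMulFlops(int n) {
        return 2L * n * n * n;
    }

    // Millions of floating point operations per second.
    static double mflops(long flops, long elapsedNanos) {
        double seconds = elapsedNanos / 1e9;
        return flops / seconds / 1e6;
    }

    public static void main(String[] args) {
        // Example: a 1000-by-1000 multiply that takes exactly one second
        long flops = matMulFlops(1000);
        System.out.println(mflops(flops, 1_000_000_000L));  // prints 2000.0
    }
}
```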

You've seen a few variations of Java programs doing numerical calculations, using both a homegrown benchmark and a more scientific one. But how does Java compare to other languages? I'll conclude with a quick look at how Java's performance compares to that of three other programming languages: Scala, C++, and JavaScript.

Benchmarking Scala

Scala is a programming language that runs on the JVM and appears to be gaining in popularity. Scala has a unified type system, meaning that it doesn't distinguish between primitives and objects. According to Erik Osheim in Scala's Numeric type class (Pt. 1), Scala uses primitive types when possible but will use objects if necessary. Similarly, Martin Odersky's description of Scala's Arrays says that "... a Scala array Array[Int] is represented as a Java int[], an Array[Double] is represented as a Java double[] ..."

So does this mean that Scala's unified type system will have runtime performance comparable to Java's primitive types? Let's see.
