Recently, I helped design a Java server application that resembled an in-memory database. That is, we biased the design toward caching tons of data in memory to provide super-fast query performance.
Once we got the prototype running, we naturally decided to profile the data memory footprint after it had been parsed and loaded from disk. The unsatisfactory initial results, however, prompted me to search for explanations.
Note: You can download this article's source code from Resources.
The tool
Since Java purposefully hides many aspects of memory management, discovering how much memory your objects consume takes some work. You could use the Runtime.freeMemory() method to measure heap size differences before and after several objects have been allocated. Several articles, such as Ramchander Varadarajan's "Question of the Week No. 107" (Sun Microsystems, September 2000) and Tony Sintes's "Memory Matters" (JavaWorld, December 2001), detail that idea. Unfortunately, the former article's solution fails because the implementation employs the wrong Runtime method, while the latter article's solution has its own imperfections:
- A single call to Runtime.freeMemory() proves insufficient because a JVM may decide to increase its current heap size at any time (especially when it runs garbage collection). Unless the total heap size is already at the -Xmx maximum size, we should use Runtime.totalMemory() - Runtime.freeMemory() as the used heap size.
- Executing a single Runtime.gc() call may not prove sufficiently aggressive for requesting garbage collection. We could, for example, request object finalizers to run as well. And since Runtime.gc() is not documented to block until collection completes, it is a good idea to wait until the perceived heap size stabilizes.
- If the profiled class creates any static data as part of its class initialization (including static class and field initializers), the heap memory used for the first class instance may include that data. We should ignore heap space consumed by the first class instance.
Considering those problems, I present Sizeof, a tool with which I snoop at various Java core and application classes:
public class Sizeof
{
    public static void main (String [] args) throws Exception
    {
        // Warm up all classes/methods we will use
        runGC ();
        usedMemory ();

        // Array to keep strong references to allocated objects
        final int count = 100000;
        Object [] objects = new Object [count];

        long heap1 = 0;

        // Allocate count+1 objects, discard the first one
        for (int i = -1; i < count; ++ i)
        {
            Object object = null;

            // Instantiate your data here and assign it to object
            object = new Object ();
            //object = new Integer (i);
            //object = new Long (i);
            //object = new String ();
            //object = new byte [128][1];

            if (i >= 0)
                objects [i] = object;
            else
            {
                object = null; // Discard the warm up object
                runGC ();
                heap1 = usedMemory (); // Take a before heap snapshot
            }
        }

        runGC ();
        long heap2 = usedMemory (); // Take an after heap snapshot:

        final int size = Math.round (((float)(heap2 - heap1))/count);
        System.out.println ("'before' heap: " + heap1 +
                            ", 'after' heap: " + heap2);
        System.out.println ("heap delta: " + (heap2 - heap1) +
            ", {" + objects [0].getClass () + "} size = " + size + " bytes");

        for (int i = 0; i < count; ++ i) objects [i] = null;
        objects = null;
    }

    private static void runGC () throws Exception
    {
        // It helps to call Runtime.gc()
        // using several method calls:
        for (int r = 0; r < 4; ++ r) _runGC ();
    }

    private static void _runGC () throws Exception
    {
        long usedMem1 = usedMemory (), usedMem2 = Long.MAX_VALUE;
        for (int i = 0; (usedMem1 < usedMem2) && (i < 500); ++ i)
        {
            s_runtime.runFinalization ();
            s_runtime.gc ();
            Thread.yield (); // static method, so call it on the class

            usedMem2 = usedMem1;
            usedMem1 = usedMemory ();
        }
    }

    private static long usedMemory ()
    {
        return s_runtime.totalMemory () - s_runtime.freeMemory ();
    }

    private static final Runtime s_runtime = Runtime.getRuntime ();

} // End of class
Sizeof's key methods are runGC() and usedMemory(). I use a runGC() wrapper method to call _runGC() several times because it appears to make the method more aggressive. (I am not sure why, but it's possible that creating and destroying a method call-stack frame changes the reachability root set and prompts the garbage collector to work harder. Moreover, consuming a large fraction of the heap space to create enough work for the garbage collector to kick in also helps. In general, it is hard to ensure everything is collected. The exact details depend on the JVM and garbage collection algorithm.)
Note carefully the places where I invoke runGC(). You can edit the code between the heap1 and heap2 declarations to instantiate anything of interest.
Also note how Sizeof prints the object size: the transitive closure of data required by all count class instances, divided by count. For most classes, the result will be memory consumed by a single class instance, including all of its owned fields. That memory footprint value differs from data provided by many commercial profilers that report shallow memory footprints (for example, if an object has an int[] field, its memory consumption will appear separately).
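As an alternative to editing main() for every type, the same measuring idea can be wrapped around a factory. The sketch below is a hypothetical variant, not the article's tool: the class name SizeofSketch, the estimateSize() method, and the use of java.util.function.Supplier (Java 8 or later) are my assumptions. It also omits the runFinalization() call, which newer JDKs deprecate, so it may be somewhat less aggressive than Sizeof.

```java
import java.util.function.Supplier;

public class SizeofSketch {
    private static final Runtime RUNTIME = Runtime.getRuntime();

    /** Estimates the average deep size, in bytes, of the objects the factory produces. */
    public static long estimateSize(Supplier<Object> factory, int count) {
        if (count <= 0) throw new IllegalArgumentException("count must be positive");

        // Create and drop one instance so class initialization is not measured.
        Object warmUp = factory.get();
        warmUp = null;

        // Strong references keep the measured objects alive until the second snapshot.
        Object[] objects = new Object[count];

        runGC();
        long before = usedMemory();
        for (int i = 0; i < count; i++) {
            objects[i] = factory.get();
        }
        runGC();
        long after = usedMemory();

        // Use the array after the snapshot so it stays reachable during measurement.
        if (objects[count - 1] == null) {
            throw new IllegalStateException("factory returned null");
        }
        return Math.round((double) (after - before) / count);
    }

    // Requests collection repeatedly until the used heap size stops shrinking.
    private static void runGC() {
        for (int r = 0; r < 4; r++) {
            long used1 = usedMemory();
            long used2 = Long.MAX_VALUE;
            for (int i = 0; used1 < used2 && i < 500; i++) {
                RUNTIME.gc();
                Thread.yield();
                used2 = used1;
                used1 = usedMemory();
            }
        }
    }

    private static long usedMemory() {
        return RUNTIME.totalMemory() - RUNTIME.freeMemory();
    }

    public static void main(String[] args) {
        long size = estimateSize(() -> new byte[4096], 2000);
        System.out.println("byte[4096] size = " + size + " bytes");
    }
}
```

The printed value depends on the JVM and collector in use, so treat it as an estimate rather than an exact figure.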
The results
Let's apply this simple tool to a few classes, then see if the results match our expectations.
Note: The following results are based on Sun's JDK 1.3.1 for Windows. Due to what is and is not guaranteed by the Java language and JVM specifications, you cannot apply these specific results to other platforms or other Java implementations.
java.lang.Object
Well, the root of all objects just had to be my first case. For java.lang.Object, I get:
'before' heap: 510696, 'after' heap: 1310696
heap delta: 800000, {class java.lang.Object} size = 8 bytes
So, a plain Object takes 8 bytes; of course, no one should expect the size to be 0, as every instance must carry around fields that support base operations like equals(), hashCode(), wait()/notify(), and so on.
java.lang.Integer
My colleagues and I frequently wrap primitive ints into Integer instances so we can store them in Java collections. How much does it cost us in memory?
'before' heap: 510696, 'after' heap: 2110696
heap delta: 1600000, {class java.lang.Integer} size = 16 bytes
The 16-byte result is a little worse than I expected because an int value can fit into just 4 extra bytes. Using an Integer costs me a 300 percent memory overhead compared to when I can store the value as a primitive type.
java.lang.Long
Long should take more memory than Integer, but it does not:
'before' heap: 510696, 'after' heap: 2110696
heap delta: 1600000, {class java.lang.Long} size = 16 bytes
Clearly, actual object size on the heap is subject to low-level memory alignment done by a particular JVM implementation for a particular CPU type. It looks like a Long is 8 bytes of Object overhead, plus 8 bytes more for the actual long value. In contrast, Integer had an unused 4-byte hole, most likely because the JVM I use forces object alignment on an 8-byte word boundary.
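The alignment explanation can be written down as simple arithmetic. The following is only a model that happens to reproduce the three measurements above, assuming an 8-byte object header and rounding up to an 8-byte boundary; the class and method names are hypothetical, and other JVMs may lay objects out differently.

```java
public class AlignmentModel {
    static final int OBJECT_HEADER_BYTES = 8; // taken from the java.lang.Object measurement
    static final int ALIGNMENT_BYTES = 8;     // assumption inferred from the measurements

    // Rounds a byte count up to the next multiple of ALIGNMENT_BYTES.
    static int alignUp(int bytes) {
        return (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // Modeled heap size of an object whose instance fields occupy fieldBytes.
    static int modeledSize(int fieldBytes) {
        return alignUp(OBJECT_HEADER_BYTES + fieldBytes);
    }

    public static void main(String[] args) {
        System.out.println("Object:  " + modeledSize(0) + " bytes"); // no fields
        System.out.println("Integer: " + modeledSize(4) + " bytes"); // one int field
        System.out.println("Long:    " + modeledSize(8) + " bytes"); // one long field
    }
}
```

Under this model an Integer needs 12 bytes and is padded to 16, while a Long needs exactly 16, which matches the measured sizes.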
Arrays
Playing with primitive type arrays proves instructive, partly to discover any hidden overhead and partly to justify another popular trick: wrapping primitive values in a size-1 array to use them as objects. By modifying Sizeof.main() to have a loop that increments the created array length on every iteration, I get for int arrays:
length: 0, {class [I} size = 16 bytes
length: 1, {class [I} size = 16 bytes
length: 2, {class [I} size = 24 bytes
length: 3, {class [I} size = 24 bytes
length: 4, {class [I} size = 32 bytes
length: 5, {class [I} size = 32 bytes
length: 6, {class [I} size = 40 bytes
length: 7, {class [I} size = 40 bytes
length: 8, {class [I} size = 48 bytes
length: 9, {class [I} size = 48 bytes
length: 10, {class [I} size = 56 bytes
and for char arrays:
length: 0, {class [C} size = 16 bytes
length: 1, {class [C} size = 16 bytes
length: 2, {class [C} size = 16 bytes
length: 3, {class [C} size = 24 bytes
length: 4, {class [C} size = 24 bytes
length: 5, {class [C} size = 24 bytes
length: 6, {class [C} size = 24 bytes
length: 7, {class [C} size = 32 bytes
length: 8, {class [C} size = 32 bytes
length: 9, {class [C} size = 32 bytes
length: 10, {class [C} size = 32 bytes
Above, the evidence of 8-byte alignment pops up again. Also, in addition to the inevitable Object 8-byte overhead, a primitive array adds another 8 bytes (out of which at least 4 bytes support the length field). And using int[1] appears to not offer any memory advantages over an Integer instance, except maybe as a mutable version of the same data.
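One way to summarize the two tables is a formula. The sketch below assumes 8 bytes of object header plus 4 bytes for the length field, followed by the elements, with the total rounded up to a multiple of 8. That layout is my reading of the measurements, not a documented fact about the JVM, and the names are hypothetical. The program checks the model against every value listed above.

```java
public class ArraySizeModel {
    static final int ARRAY_HEADER_BYTES = 12; // assumed: 8-byte object header + 4-byte length

    // Rounds a byte count up to the next multiple of 8.
    static int alignUp(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Modeled heap size of a primitive array with the given length and element width.
    static int arraySize(int length, int elementBytes) {
        return alignUp(ARRAY_HEADER_BYTES + length * elementBytes);
    }

    // Returns true if the model reproduces every measured size for lengths 0..n.
    static boolean matches(int[] measured, int elementBytes) {
        for (int length = 0; length < measured.length; length++) {
            if (arraySize(length, elementBytes) != measured[length]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Sizes copied from the article's tables for lengths 0 through 10.
        int[] measuredInt  = {16, 16, 24, 24, 32, 32, 40, 40, 48, 48, 56};
        int[] measuredChar = {16, 16, 16, 24, 24, 24, 24, 32, 32, 32, 32};
        System.out.println("int[] model matches:  " + matches(measuredInt, 4));
        System.out.println("char[] model matches: " + matches(measuredChar, 2));
    }
}
```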
Multidimensional arrays
Multidimensional arrays offer another surprise. Developers commonly employ constructs like int[dim1][dim2] in numerical and scientific computing. In an int[dim1][dim2] array instance, every nested int[dim2] array is an Object in its own right. Each adds the usual 16-byte array overhead. When I don't need a triangular or ragged array, that represents pure overhead. The impact grows when array dimensions greatly differ. For example, an int[128][2] instance takes 3,600 bytes. Compared to the 1,040 bytes an int[256] instance uses (which has the same capacity), 3,600 bytes represent a 246 percent overhead. In the extreme case of byte[256][1], the overhead factor is almost 19! Compare that to the C/C++ situation in which the same syntax does not add any storage overhead.
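The figures quoted above can be reproduced with a little arithmetic. This sketch assumes 4-byte object references (a 32-bit JVM) and a 12-byte array header rounded up to 8-byte alignment, both inferred from the measurements rather than taken from any specification; the class and method names are hypothetical.

```java
public class NestedArrayModel {
    static final int ARRAY_HEADER_BYTES = 12; // assumed: 8-byte object header + 4-byte length
    static final int REFERENCE_BYTES = 4;     // assumed: 32-bit JVM references

    static int alignUp(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Modeled size of a one-dimensional array.
    static int arraySize(int length, int elementBytes) {
        return alignUp(ARRAY_HEADER_BYTES + length * elementBytes);
    }

    // Modeled size of a T[dim1][dim2]: one array of references plus dim1 nested arrays.
    static int nestedSize(int dim1, int dim2, int elementBytes) {
        return arraySize(dim1, REFERENCE_BYTES) + dim1 * arraySize(dim2, elementBytes);
    }

    public static void main(String[] args) {
        int nestedInt = nestedSize(128, 2, 4); // int[128][2]
        int flatInt = arraySize(256, 4);       // int[256]
        System.out.println("int[128][2]: " + nestedInt + " bytes, int[256]: " + flatInt + " bytes");
        System.out.println("overhead: " + (nestedInt - flatInt) * 100 / flatInt + " percent");

        int nestedByte = nestedSize(256, 1, 1); // byte[256][1]
        int flatByte = arraySize(256, 1);       // byte[256]
        System.out.println("byte[256][1]: " + nestedByte + " bytes, byte[256]: " + flatByte + " bytes");
        System.out.println("factor: " + (double) nestedByte / flatByte);
    }
}
```

With these assumptions the model gives 3,600 versus 1,040 bytes for the int case and 5,136 versus 272 bytes for the byte case, a factor just under 19.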
java.lang.String
Let's try an empty String, first constructed as new String():
'before' heap: 510696, 'after' heap: 4510696
heap delta: 4000000, {class java.lang.String} size = 40 bytes
The result proves quite depressing. An empty String takes 40 bytes, enough memory to fit 20 Java characters.
Before I try Strings with content, I need a helper method to create Strings guaranteed not to get interned. Merely using literals as in:
object = "string with 20 chars";
will not work because all such object handles will end up pointing to the same String instance. The language specification dictates such behavior (see also the java.lang.String.intern() method). Therefore, to continue our memory snooping, try:
public static String createString (final int length)
{
    char [] result = new char [length];
    for (int i = 0; i < length; ++ i) result [i] = (char) i;

    return new String (result);
}
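To see why literals would defeat the measurement, a short check of object identity may help. This sketch reuses the createString() helper; the class name InternDemo and the sameInstance() method are hypothetical names introduced for illustration.

```java
public class InternDemo {
    // Same helper as in the article: builds a fresh, non-interned String.
    static String createString(final int length) {
        char[] result = new char[length];
        for (int i = 0; i < length; ++i) result[i] = (char) i;
        return new String(result);
    }

    // Reference comparison: true only if both handles point to one object.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal1 = "string with 20 chars";
        String literal2 = "string with 20 chars";
        // Identical literals are interned, so only one instance exists.
        System.out.println("literals share an instance: " + sameInstance(literal1, literal2));

        String created1 = createString(20);
        String created2 = createString(20);
        // Equal content, but each call allocates its own String.
        System.out.println("created strings are equal:  " + created1.equals(created2));
        System.out.println("created strings share an instance: " + sameInstance(created1, created2));
        // Interning maps equal content back to a single canonical instance.
        System.out.println("after intern(): " + sameInstance(created1.intern(), created2.intern()));
    }
}
```

Filling the objects array with a literal would therefore measure one String referenced count times, not count distinct Strings.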
After arming myself with this String creator method, I get the following results:
length: 0, {class java.lang.String} size = 40 bytes
length: 1, {class java.lang.String} size = 40 bytes
length: 2, {class java.lang.String} size = 40 bytes
length: 3, {class java.lang.String} size = 48 bytes
length: 4, {class java.lang.String} size = 48 bytes
length: 5, {class java.lang.String} size = 48 bytes
length: 6, {class java.lang.String} size = 48 bytes
length: 7, {class java.lang.String} size = 56 bytes
length: 8, {class java.lang.String} size = 56 bytes
length: 9, {class java.lang.String} size = 56 bytes
length: 10, {class java.lang.String} size = 56 bytes
The results clearly show that a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead. For a nonempty String of 10 characters or fewer, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length) ranges from 100 to 400 percent.
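The overhead range follows from dividing the 24 extra bytes by the payload. The sketch below models a String as 24 bytes plus its char array, where the array is assumed to cost a 12-byte header plus 2 bytes per char, rounded up to a multiple of 8. The layout is an assumption that fits the table above, and the names are hypothetical.

```java
public class StringSizeModel {
    static final int STRING_OVERHEAD_BYTES = 24; // from the measurements above
    static final int ARRAY_HEADER_BYTES = 12;    // assumed: 8-byte header + 4-byte length

    static int alignUp(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Modeled deep size of a String holding `length` chars.
    static int stringSize(int length) {
        return STRING_OVERHEAD_BYTES + alignUp(ARRAY_HEADER_BYTES + 2 * length);
    }

    // Added String overhead as a percentage of payload (2 bytes per char + 4 for the length).
    static int overheadPercent(int length) {
        int payloadBytes = 2 * length + 4;
        return STRING_OVERHEAD_BYTES * 100 / payloadBytes;
    }

    public static void main(String[] args) {
        for (int length = 0; length <= 10; length++) {
            System.out.println("length: " + length
                + ", modeled size = " + stringSize(length) + " bytes"
                + ", overhead = " + overheadPercent(length) + " percent");
        }
    }
}
```

For length 1 the payload is 6 bytes, giving 400 percent; for length 10 it is 24 bytes, giving 100 percent.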
Of course, the penalty depends on your application's data distribution. Somehow I suspected that 10 characters represents the typical String length for a variety of applications. To get a concrete data point, I instrumented the SwingSet2 demo that came with JDK 1.3.x (by modifying the String class implementation directly) to track the lengths of the Strings it creates. After a few minutes playing with the demo, a data dump showed that about 180,000 Strings were instantiated. Sorting them into size buckets confirmed my expectations:
[0-10]: 96481
[10-20]: 27279
[20-30]: 31949
[30-40]: 7917
[40-50]: 7344
[50-60]: 3545
[60-70]: 1581
[70-80]: 1247
[80-90]: 874
...
That's right, more than 50 percent of all String lengths fell into the 0-10 bucket, the very hot spot of String class inefficiency!
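The bucketing step itself is easy to sketch. The code below is not the instrumentation used for SwingSet2; it is a hypothetical helper that sorts a list of lengths into fixed-width buckets. It assumes half-open buckets (a length of 10 lands in the second bucket) and folds anything past the last bucket into it, and the sample lengths in main() are made up for illustration.

```java
public class LengthHistogram {
    // Counts how many lengths fall into each bucket of the given width.
    // Lengths beyond the last bucket are counted in the last bucket.
    static int[] bucketCounts(int[] lengths, int bucketWidth, int bucketCount) {
        int[] counts = new int[bucketCount];
        for (int length : lengths) {
            int bucket = Math.min(length / bucketWidth, bucketCount - 1);
            counts[bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, not the article's measurements.
        int[] sampleLengths = {0, 3, 7, 9, 10, 12, 25, 31, 48, 95};
        int[] counts = bucketCounts(sampleLengths, 10, 10);
        for (int b = 0; b < counts.length; b++) {
            System.out.println("[" + (b * 10) + "-" + ((b + 1) * 10) + "]: " + counts[b]);
        }
    }
}
```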
In reality, Strings can consume even more memory than their lengths suggest: Strings generated out of StringBuffers (either explicitly or via the '+' concatenation operator) likely have char arrays with lengths larger than the reported String lengths because StringBuffers typically start with a capacity of 16, then double it on append() operations. So, for example, createString(1) + ' ' ends up with a char array of size 16, not 2.
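The gap between a buffer's capacity and its length is easy to observe with the public StringBuffer API. This is a small sketch with hypothetical class and method names; it shows only the buffer's own capacity. Whether a String produced from the buffer keeps the spare room depends on the JDK implementation, and the claim above concerns the JDK 1.3.x runtime measured in this article.

```java
public class BufferCapacityDemo {
    // Number of allocated but unused char slots in the buffer.
    static int spareChars(StringBuffer buffer) {
        return buffer.capacity() - buffer.length();
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer(); // default initial capacity
        System.out.println("empty buffer capacity: " + buffer.capacity());

        buffer.append('a').append(' '); // two chars, no growth needed
        System.out.println("length: " + buffer.length()
            + ", capacity: " + buffer.capacity()
            + ", spare chars: " + spareChars(buffer));

        // A buffer created from a String reserves 16 chars beyond its content.
        StringBuffer fromString = new StringBuffer("ab");
        System.out.println("new StringBuffer(\"ab\") capacity: " + fromString.capacity());
    }
}
```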
What do we do?
"This is all very well, but we don't have any choice but to use Strings and other types provided by Java, do we?" I hear you ask. Let's find out.