While learning Java, you'll occasionally encounter a language behavior that leaves you puzzled. For example, what does it signify about arrays that the expression new int[10] instanceof Object returns true? In this post, I'll examine some of Java's language oddities.
Arrays are objects
A long time ago, while writing about message formatters, I encountered something strange in Java's java.text.MessageFormat standard library class. Consider the following pair of formatting methods:
StringBuffer format(Object[] arguments, StringBuffer result, FieldPosition pos)
StringBuffer format(Object arguments, StringBuffer result, FieldPosition pos)
According to the Javadoc, either method formats an array of objects. Wait a minute! How can you pass an array of objects to Object arguments? Is this a Javadoc misprint? The answer is no: you can pass an array of objects to this parameter.
The Java Language Specification explains this oddity. Section 10.1. Array Types states (in the fine print) that Object is also a supertype of all array types. Hence, each of the following lines of code will output true:
System.out.println(new int[10] instanceof Object);
System.out.println(new String[] { "A", "B" } instanceof Object);
I've created an ArraysAreObjects application that demonstrates arrays being objects. Listing 1 presents the application's source code.
Listing 1. ArraysAreObjects.java (version 1)
public class ArraysAreObjects
{
   public static void main(String[] args)
   {
      print(new String[] { "A", "B", "C" });
      print("Hello");
      print(new int[] { 1, 2, 3 });
      print(new Integer[] { 1, 2, 3 });
   }

   static void print(Object objects)
   {
      if (objects instanceof Object[])
         for (Object object: (Object[]) objects)
            System.out.println(object);
      else
         System.out.printf("[%s]%n", objects);
      System.out.println();
   }
}
ArraysAreObjects declares a print() method that prints an object or an array of objects. It differentiates between these cases via objects instanceof Object[], which returns true when objects references an array of objects.
Compile Listing 1 as follows:
javac ArraysAreObjects.java
Run the resulting application as follows:
java ArraysAreObjects
You should observe the following output (with a different hash code):
A
B
C

[Hello]

[[I@42d3bd8b]

1
2
3
Perhaps you're surprised to see something like [[I@42d3bd8b] instead of each integer on a separate line when executing print(new int[] { 1, 2, 3 });. Section 4.10.3. Subtyping among Array Types provides an answer:
The following rules define the direct supertype relation among array types:
If S and T are both reference types, then S[] >1 T[] iff S >1 T.
Object >1 Object[]
Cloneable >1 Object[]
java.io.Serializable >1 Object[]
If P is a primitive type, then:
Object >1 P[]
Cloneable >1 P[]
java.io.Serializable >1 P[]
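Essentially, this section tells us that Object and not Object[] is the supertype of a primitive array type. That is why objects instanceof Object[] returns false for an int[] argument, and why print() falls through to its else branch.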
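The subtyping rules quoted above can be checked directly with instanceof. The following is a minimal sketch; the ArraySupertypes class and its isObjectArray() helper are names invented for this illustration and don't appear in the listings.

```java
import java.io.Serializable;

public class ArraySupertypes
{
   // Hypothetical helper: reports whether the referenced object is usable as an Object[].
   static boolean isObjectArray(Object o)
   {
      return o instanceof Object[];
   }

   public static void main(String[] args)
   {
      Object strings = new String[] { "A", "B" };
      Object ints = new int[] { 1, 2, 3 };

      // Reference array: String[] is a subtype of Object[].
      System.out.println(isObjectArray(strings));                      // true
      // Primitive array: int[] is a subtype of Object, but not of Object[].
      System.out.println(isObjectArray(ints));                         // false
      // Every array type is also Cloneable and Serializable.
      System.out.println(ints instanceof Cloneable);                   // true
      System.out.println(ints instanceof Serializable);                // true
      // The superclass of an array class is Object.
      System.out.println(int[].class.getSuperclass() == Object.class); // true
   }
}
```

The variables are declared as Object on purpose: writing new int[] { 1, 2, 3 } instanceof Object[] directly is rejected by the compiler, because it already knows that the two types are inconvertible.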
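This information helps to explain why MessageFormat has two format() methods that differ only in the type of the first parameter: Object[] or Object. The compiler selects the format() method with Object[] as its first parameter for reference array type arguments (e.g., new String[] { "A", "B" }), whereas it selects the other format() method for primitive array type arguments, as in format(new int[] { 1, 2, 3 }, sb, pos). Keep in mind that this selection is purely a compile-time matter: the Javadoc for the Object version still expects an array of objects, so don't assume that a primitive array will be formatted successfully at run time.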
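To see how the compiler chooses between two such overloads, consider the following sketch. OverloadDemo and its describe() methods are hypothetical stand-ins that only mirror the shape of the two format() signatures; they are not part of MessageFormat.

```java
public class OverloadDemo
{
   // Hypothetical overload that mirrors format(Object[] arguments, ...).
   static String describe(Object[] arguments)
   {
      return "Object[] overload";
   }

   // Hypothetical overload that mirrors format(Object arguments, ...).
   static String describe(Object arguments)
   {
      return "Object overload";
   }

   public static void main(String[] args)
   {
      // String[] is a subtype of Object[], so the more specific overload wins.
      System.out.println(describe(new String[] { "A", "B" })); // Object[] overload
      // int[] is not a subtype of Object[], so only the Object overload applies.
      System.out.println(describe(new int[] { 1, 2, 3 }));     // Object overload
      // The choice depends on the static type of the argument, not its runtime type.
      Object hidden = new String[] { "A", "B" };
      System.out.println(describe(hidden));                    // Object overload
   }
}
```

The last call shows that overload resolution happens at compile time: the array is hidden behind an Object reference, so the compiler never considers the Object[] overload.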
Never write code like that shown in Listing 1. Instead, use Java's variable arguments (varargs) language feature (introduced in Java 5 long after Java 1.1's debut of MessageFormat) to achieve more concise code. Consider Listing 2.
Listing 2. ArraysAreObjects.java (version 2)
public class ArraysAreObjects
{
   public static void main(String[] args)
   {
      print("A", "B", "C");
      print("Hello");
      print(1, 2, 3);
   }

   static void print(Object... objects)
   {
      for (Object object: objects)
         System.out.println(object);
      System.out.println();
   }
}
Although this code is straightforward, you might be curious about print(1, 2, 3);. The compiler generates code to autobox each integer into an Integer object. These objects are stored in an Object[] array that's passed to print().
When you run this application, you should observe the following output:
A
B
C

Hello

1
2
3
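Varargs don't make the distinction between reference arrays and primitive arrays disappear. The following sketch (VarargsDemo and count() are names made up for this illustration) reports how many elements end up in the Object[] array that the compiler builds or passes along.

```java
public class VarargsDemo
{
   // Hypothetical helper: returns the length of the array received by the varargs parameter.
   static int count(Object... objects)
   {
      return objects.length;
   }

   public static void main(String[] args)
   {
      // Three int literals are autoboxed and stored in a new Object[] of length 3.
      System.out.println(count(1, 2, 3));                              // 3
      // An Integer[] is an Object[], so it can be passed as the varargs array itself.
      System.out.println(count((Object[]) new Integer[] { 1, 2, 3 })); // 3
      // An int[] is not an Object[], so it becomes the only element of a new Object[].
      System.out.println(count(new int[] { 1, 2, 3 }));                // 1
   }
}
```

The explicit (Object[]) cast in the second call isn't required for the code to compile; it is there to state the intent and to avoid a compiler warning about an inexact argument type.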
The java.util package's Arrays and Objects classes also demonstrate the impact of arrays being objects. Arrays declares a boolean deepEquals(Object[] a1, Object[] a2) method to determine whether two arrays are deeply equal (defined in that method's Javadoc). Similarly, Objects declares boolean deepEquals(Object a, Object b) to determine whether two nonarray or array objects are deeply equal.
You don't have to use Objects.deepEquals() to compare a pair of nonarray objects. Instead, you could create a pair of arrays to hold these objects and pass these arrays to Arrays.deepEquals(). But isn't that a code smell?
In case you're wondering how primitive array types are handled, note that Objects.deepEquals() and Arrays.deepEquals() delegate to Arrays.deepEquals0(). Here's that method's source code:
static boolean deepEquals0(Object e1, Object e2)
{
   assert e1 != null;
   boolean eq;
   if (e1 instanceof Object[] && e2 instanceof Object[])
      eq = deepEquals ((Object[]) e1, (Object[]) e2);
   else if (e1 instanceof byte[] && e2 instanceof byte[])
      eq = equals((byte[]) e1, (byte[]) e2);
   else if (e1 instanceof short[] && e2 instanceof short[])
      eq = equals((short[]) e1, (short[]) e2);
   else if (e1 instanceof int[] && e2 instanceof int[])
      eq = equals((int[]) e1, (int[]) e2);
   else if (e1 instanceof long[] && e2 instanceof long[])
      eq = equals((long[]) e1, (long[]) e2);
   else if (e1 instanceof char[] && e2 instanceof char[])
      eq = equals((char[]) e1, (char[]) e2);
   else if (e1 instanceof float[] && e2 instanceof float[])
      eq = equals((float[]) e1, (float[]) e2);
   else if (e1 instanceof double[] && e2 instanceof double[])
      eq = equals((double[]) e1, (double[]) e2);
   else if (e1 instanceof boolean[] && e2 instanceof boolean[])
      eq = equals((boolean[]) e1, (boolean[]) e2);
   else
      eq = e1.equals(e2);
   return eq;
}
As you can see, each primitive array type is handled as a special case.
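The practical effect of this special-casing is easiest to see by comparing the deep and shallow equality methods side by side. Here is a small sketch; the DeepEqualsDemo class and its sameContents() helper are invented for this illustration.

```java
import java.util.Arrays;
import java.util.Objects;

public class DeepEqualsDemo
{
   // Hypothetical helper: thin wrapper around Objects.deepEquals().
   static boolean sameContents(Object a, Object b)
   {
      return Objects.deepEquals(a, b);
   }

   public static void main(String[] args)
   {
      int[] a = { 1, 2, 3 };
      int[] b = { 1, 2, 3 };

      // Arrays inherit equals() from Object, which compares references.
      System.out.println(a.equals(b));                         // false
      System.out.println(Objects.equals(a, b));                // false
      // deepEquals() recognizes int[] and compares the elements.
      System.out.println(sameContents(a, b));                  // true

      Object[] nested1 = { "x", new int[] { 1, 2 } };
      Object[] nested2 = { "x", new int[] { 1, 2 } };
      // Arrays.equals() compares the nested int[] elements by reference.
      System.out.println(Arrays.equals(nested1, nested2));     // false
      // Arrays.deepEquals() descends into the nested primitive arrays.
      System.out.println(Arrays.deepEquals(nested1, nested2)); // true
   }
}
```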
Bytes and shorts are second-class citizens
According to Section 4.2. Primitive Types and Values in the Java Language Specification, Java supports five integral types: byte integer, short integer, integer, long integer, and character. These primitive types are represented via keywords byte, short, int, long, and char, respectively. Each of the byte, short, int, and long types represents a signed integer. In contrast, char represents an unsigned UTF-16 code unit.
Consider byte, short, int, and long. Each type differs only in its range of values based on the number of bits associated with the type: 8 (byte), 16 (short), 32 (int), or 64 (long). Because byte and short have smaller ranges (-128 through 127 for byte and -32768 through 32767 for short), the Java virtual machine (JVM) was designed with limited support for these types (which saved a few instructions).
The JVM provides various int-only instructions (e.g., iadd, isub, and imul). Similarly, the JVM provides various long-only instructions (e.g., ladd, ldiv, and lneg). In contrast, byte and short don't merit similar instructions.
The JVM does provide the following instructions to support byte and short:
bipush: Sign-extend 8-bit byte integer operand to 32-bit integer and push the result onto the operand stack.
i2b: Pop the 32-bit integer from the top of the operand stack, truncate this value to an 8-bit byte integer, sign-extend the result to a 32-bit integer, and push the result onto the operand stack.
i2s: Pop the 32-bit integer from the top of the operand stack, truncate this value to a 16-bit short integer, sign-extend the result to a 32-bit integer, and push the result onto the operand stack.
sipush: Sign-extend 16-bit short integer operand to 32-bit integer and push the result onto the operand stack.
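The truncate-then-sign-extend behavior described for i2b and i2s is what a (byte) or (short) cast of an int does at the source level. The sketch below shows the resulting values; NarrowingDemo, toByte(), and toShort() are hypothetical names, and the assumption that javac compiles these casts to i2b and i2s is noted in the comments rather than verified here.

```java
public class NarrowingDemo
{
   // Keeps the low 8 bits and sign-extends them (javac is assumed to emit i2b here).
   static int toByte(int value)
   {
      return (byte) value;
   }

   // Keeps the low 16 bits and sign-extends them (javac is assumed to emit i2s here).
   static int toShort(int value)
   {
      return (short) value;
   }

   public static void main(String[] args)
   {
      System.out.println(toByte(27));     // 27: already within -128 through 127
      System.out.println(toByte(200));    // -56: 200 - 256
      System.out.println(toByte(299));    // 43: 299 - 256
      System.out.println(toShort(299));   // 299: already within -32768 through 32767
      System.out.println(toShort(40000)); // -25536: 40000 - 65536
   }
}
```

Both helpers return int to emphasize that the result of the conversion is again a 32-bit value, just one that is guaranteed to lie within the narrower type's range.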
The Java language reflects this second-class support for byte and short by not supporting byte or short integer literals. An integer literal is either of type int (with no suffix) or of type long (with the l or L suffix). However, it does provide one convenience: when assigning an int literal to a byte or a short variable, you don't have to specify a cast operator when the literal ranges from -128 through 127 (byte) or -32768 through 32767 (short). For example, you can specify byte b = 27; instead of having to specify byte b = (byte) 27;. Similarly, you can specify short s = 299; instead of having to specify short s = (short) 299;.
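Overloaded methods offer a simple way to observe which type the compiler assigns to a literal or an expression. The following sketch relies on a family of typeOf() methods and a ByteLiteralDemo class, all of which are invented for this illustration.

```java
public class ByteLiteralDemo
{
   // Hypothetical overloads: the compiler picks the one matching the argument's static type.
   static String typeOf(byte x)  { return "byte"; }
   static String typeOf(short x) { return "short"; }
   static String typeOf(int x)   { return "int"; }
   static String typeOf(long x)  { return "long"; }

   public static void main(String[] args)
   {
      byte b = 27;   // int literal in range, so no cast is needed
      short s = 299; // int literal in range, so no cast is needed
      // byte tooBig = 128; // would not compile: 128 lies outside -128 through 127

      System.out.println(typeOf(b));              // byte
      System.out.println(typeOf(s));              // short
      System.out.println(typeOf(27));             // int: an unsuffixed literal is an int
      System.out.println(typeOf(27L));            // long
      // Arithmetic on bytes is carried out on ints, so the result is an int.
      System.out.println(typeOf(b + b));          // int
      System.out.println(typeOf((byte) (b + b))); // byte
   }
}
```

Note that the no-cast convenience applies to assignments only. When 27 is passed as a method argument, it is never narrowed to byte, which is why typeOf(27) selects the int overload.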
It's easier to understand this second-class citizen business when you examine the bytecode of a simple application. Consider Listing 3.
Listing 3. BytesAndShorts.java (version 1)
public class BytesAndShorts
{
   public static void main(String[] args)
   {
      byte b = 27;
      short s = 299;
   }
}
Assuming that you've compiled this listing to BytesAndShorts.class, execute the following command to obtain a disassembly:
javap -v BytesAndShorts
The following is that portion of the disassembly that's relevant to the main() method:
public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=1, locals=3, args_size=1
       0: bipush        27
       2: istore_1
       3: sipush        299
       6: istore_2
       7: return
There are three local variables: 0 (args), 1 (b), and 2 (s).
At the source code level, 27 is a 32-bit integer literal. For efficiency, 27 is stored as an 8-bit byte following the operation code (opcode) for the bipush instruction. As stated earlier, this instruction sign-extends this 8-bit value to a 32-bit value that's stored on the operand stack. This value will be popped off the stack and stored in local variable 1 via istore_1 (recall that 1 refers to b in the source code).
Here is something interesting: the istore_1 instruction reveals that byte variable b is really of type int at the JVM level. After all, the istore instructions store 32-bit values.
Continuing with the disassembly, sipush 299 sign-extends 299 to a 32-bit value that's stored on the operand stack, and the subsequent istore_2 instruction stores this 32-bit value in "int" variable s.
It appears that the JVM does not recognize byte or short variables, but treats them as if they are of type int. Listing 4 presents an application that probes deeper into this situation.
Listing 4. BytesAndShorts.java (version 2)
public class BytesAndShorts
{
   public static void main(String[] args)
   {
      int i = 35;
      byte b = (byte) i;
      short s = (byte) i;
   }
}
Assuming that you've compiled this listing to BytesAndShorts.class, execute the following command to obtain a disassembly:
javap -v BytesAndShorts
The following is that portion of the disassembly that's relevant to the main() method:
public static void main(java.lang.String[]);
  descriptor: ([Ljava/lang/String;)V
  flags: (0x0009) ACC_PUBLIC, ACC_STATIC
  Code:
    stack=1, locals=4, args_size=1
       0: bipush        35
       2: istore_1
       3: iload_1
       4: i2b
       5: istore_2
       6: iload_1
       7: i2b
       8: i2s
       9: istore_3
      10: return
There are four local variables: 0 (args), 1 (i), 2 (b), and 3 (s).
The first two instructions convert 35 to a 32-bit integer and store it in int variable i. There are no surprises here. In contrast, the next three instructions retrieve this value, convert it to a byte (via i2b), and store the result in "int" variable b. Even though the JVM doesn't regard b as being of type byte, it still treats this variable as if it were a byte: i2b ensures that the 32-bit integer value won't lie outside the range -128 through 127.
The instruction sequence from offset 6 through offset 9 is interesting. I could have specified short s = (short) i; instead of short s = (byte) i; in the source code, but chose to deviate in order to see what happens at the JVM level. After iload_1 pushes the 32-bit integer value stored in i, the i2b instruction at offset 7 converts it to an 8-bit byte. The subsequent i2s instruction converts this result to a 16-bit short integer, which is then sign-extended to a 32-bit integer in preparation for being stored in s via istore_3. The bytecode sequence for short s = (byte) i; ensures that the value stored in "int" variable s doesn't lie outside the range -32768 through 32767 (and shows that you should avoid useless casts).
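With i set to 35, the extra (byte) cast changes nothing except the bytecode. For values outside the byte range, however, it also changes the result. The following sketch compares the two forms; ChainedCastDemo, viaByte(), and direct() are names made up for this illustration.

```java
public class ChainedCastDemo
{
   // Mirrors short s = (byte) i; from Listing 4: narrow to byte, then widen to short.
   static short viaByte(int i)
   {
      short s = (byte) i;
      return s;
   }

   // Mirrors the alternative short s = (short) i;.
   static short direct(int i)
   {
      return (short) i;
   }

   public static void main(String[] args)
   {
      // 35 fits in a byte, so both forms agree.
      System.out.println(viaByte(35));  // 35
      System.out.println(direct(35));   // 35
      // 300 doesn't fit in a byte: only its low 8 bits (44) survive the byte cast.
      System.out.println(viaByte(300)); // 44
      System.out.println(direct(300));  // 300
      // 200 doesn't fit either: its low 8 bits are sign-extended to -56.
      System.out.println(viaByte(200)); // -56
      System.out.println(direct(200));  // 200
   }
}
```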
Private fields and methods are accessible without reflection
Under certain circumstances, you can access an object's private field or call its private method without having to use Java's Reflection API. Consider Listing 5.