Four harmful Java idioms, and how to fix them

Rewrite convention and optimize your code for maintainability

Idioms in any programming language help you get your bearings and provide structure when you start writing code. But, as John O'Hanley points out, some Java programming idioms are both widespread and harmful, having a negative influence on code maintainability. Read on if you're ready to break with convention, rewrite four harmful Java idioms, and optimize your code for maintainability.

Coding idioms exist to make life easier for programmers -- in particular for maintenance programmers, who are usually reading code produced by someone else. Coding conventions are fundamentally about showing compassion for these maintenance workers; the conventions you choose should allow them to read, understand, and use your code as quickly and as painlessly as possible. The more you optimize your coding style for the mental experience of those who will be maintaining your code, the more compassionate your code becomes, and the more quickly it will be understood.

Similarly, higher-level idioms (immutability and package structure, for example) aim to both improve the design and make code easier to understand. In fact, some might say that "improving the design" and "making the code easier to understand" are, in the end, one and the same thing.

In this article, you'll see that some popular idioms should be changed in favor of alternatives that are demonstrably more compassionate towards the reader. One might argue that any idioms that are already widespread should never be abandoned, simply because they are expected by most readers. However, reader expectations are only one part of the equation, and they shouldn't necessarily override all other considerations. The fundamental question should not be "Does the reader expected this?" but rather "How fast will the reader understand it?" I'll approach four different problematic -- but common! -- idioms in turn.

Local variables, method arguments, and fields: Which is which?

When trying to understand code, a reader often needs to answer a simple question: Where did this data come from? In particular, when trying to understand a method, a reader needs to know which items are local variables, which are fields, and which are method arguments. To exercise compassion for your reader, and to absolutely minimize the effort required to understand your code, it's likely best to use a simple naming convention to distinguish between these three cases.

Many organizations qualify field variables with this to distinguish them from other kinds of variables. That's good, but it doesn't go far enough: there should be a convention to distinguish method arguments as well. The logic that justifies naming conventions for fields also applies to method arguments, and for exactly the same reasons.

Listing 1 offers an example of an equals() method that does not distinguish between the three kinds of variables.

Listing 1. A method that doesn't distinguish between variable types

public boolean equals (Object arg) {
  if (! (arg instanceof Range)) return false;
  Range other = (Range) arg;
  return start.equals(other.start) && end.equals(other.end);

The French writer Marcel Proust was famous for analyzing human experience to microscopic levels. How would a modern-day Proust describe the experience of reading this equals() method? When you try to understand it, you experience a small amount of discomfort when you encounter start, because it's suddenly referenced out of the blue. Your brain hits a speed bump as it encounters something it hasn't seen before -- Uh oh, what's this? After a moment, through a tedious process of elimination (the length of which is proportional to the size of the method), you will eventually figure out that start is actually a field. The same sort of trivial and tedious reasoning is also needed to distinguish method arguments from other types of data.

Listing 1 isn't very long. If the method were larger, then the effort necessary to understand it would be increased as well.

Why make the reader experience this jolt, this moment of confusion, however slight? Why not spare your readers the unpleasant experience of not understanding something immediately? Why force them to figure it out? If a simple naming convention can allow the reader to distinguish the three kinds of variables almost unconsciously, then shouldn't it be used? As Steve McConnell says in Code Complete, "It's OK to figure out murder mysteries, but you shouldn't need to figure out code. You should be able to read it."

One might argue that this is why fields should be declared at the start of a class. But that isn't a good solution, because it forces the reader to either memorize field names or perform a lookup operation in a distant section of the class. Either option makes the reader do more work. And the whole point is to exercise maximal compassion for the readers, and let them read and understand your code with minimal effort.

Listing 2 contains the example code from Listing 1, rewritten. It uses a naming convention in which:

  • Method arguments are prefixed with a
  • Fields are prefixed with f
  • Local variables have no prefix at all

Listing 2. Variable types are now clearly distinguished

public boolean equals (Object aOther) {
  if (! (aOther instanceof Range)) return false;
  Range other = (Range) aOther;
  return fStart.equals(other.fStart) && fEnd.equals(other.fEnd);

You might object to the style on display in Listing 2, protesting that "Hungarian notation is obsolete." But that objection is erroneous: Hungarian notation specifies type information. The above naming convention says nothing about type. Rather, it distinguishes between fields, arguments, and local variables. These are two completely different concepts

The idea of using naming conventions in this way may seem trivial, but it isn't: when such conventions are used throughout your code, the effort needed to understand it is significantly reduced, because you can perform more operations without thinking about them.

"By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race. Before the introduction of the Arabic notation, multiplication was difficult, and the division even of integers called into play the highest mathematical faculties. Probably nothing in the modern world would have more astonished a Greek mathematician than to learn that ... a large proportion of the population of Western Europe could perform the operation of division for the largest numbers. This fact would have seemed to him a sheer impossibility ... Our modern power of easy reckoning with decimal fractions is the almost miraculous result of the gradual discovery of a perfect notation."

-- Alfred North Whitehead,

An Introduction To Mathematics

Package-by-layer: Preventing the use of package-private scope

A common way of dividing up an application is package-by-layer:

  • com.blah.action
  • com.blah.dao
  • com.blah.model
  • com.blah.util

In other words, items belonging to any one feature are spread out over different packages, according to what might be called behavioral categories. Because the items in each feature need to be visible to each other, this implies that nearly all of the classes in such an application are public.

In effect, this common packaging style throws package-private scope out the window. Package-private scope isn't simply ignored; you're actually prevented from using it. Now, package-private scope was chosen as the default scope by the designers of the Java programming language. This choice was made, of course, to push you gently in the direction of better design -- to start with minimal scope, and increase it only if necessary. (This is the usual "minimize ripple effects by keeping secrets" technique, an idea at the very heart of object programming.) For some strange reason, a significant part of the Java community has, by adopting the package-by-layer convention, rejected an important way of minimizing scope. This seems unjustified.

An alternate style is package-by-feature:

  • com.blah.painting
  • com.blah.buyer
  • com.blah.seller
  • com.blah.webmaster
  • com.blah.useraccess
  • com.blah.util

Here, items are not grouped according to their behavioral style. Instead, the behavior of individual classes is treated as an implementation detail, and classes are grouped according to the highest possible level of abstraction -- the level of the feature -- wherein all items related to a feature (and only to that feature) reside in one and the same package. This is not a revolutionary or heretical idea; this is exactly what packages were created for in the first place.

For example, in a Web application, the com.blah.painting package might consist of these items:

  • A model object
  • A data access object
  • A controller or action object
  • statements.sql: The SQL statements used by the DAO
  • view.jsp: The JSP used to render the result to the user

It's important to note that, in this style, each package should contain all items related to a feature (and only that feature). In particular, the package may contain files other than Java source code. In package-by-feature, the ideal is to pass the deletion test: you should be able to delete a feature by deleting a single directory, without leaving behind any cruft.

The benefits of this style over package-by-layer are compelling:

  • Packages have much higher cohesion and modularity. Coupling between packages is minimized.

  • Code is much more self-documenting. The reader gets a general idea of what features are included just by reading the package names. In Code Complete, Steve McConnell refers to self-documenting code as "the Holy Grail of legibility."

  • The design still honors the idea of separation of layers, simply by using separate classes within each feature.

  • Related items are in the same place. There is no need to navigate around the source tree to edit closely related items.

  • Items are package-private by default, as they should be. If an item needs to be visible from another package, then it's changed to public. (Note that changing a class to public does not mean that all of its members should be made public as well; there can be a mixture of public and package-private members in the same class.)

  • Deleting a feature is often as simple as deleting a single item: a directory.

  • There are fewer items in each package, and the overall package structure evolves more naturally. For example, if a package gets too large, then it can be refactored as desired into two or more new packages. The alternative package-by-layer style, however, does not scale or evolve in this way at all: its packages contain an arbitrarily large number of classes, and you cannot easily refactor the package structure.

Some frameworks promote a package-by-layer style with conventional package names as a purported advantage: because conventional package names are used, programmers always know where to navigate to find items. But why force them to navigate at all? With the alternative package-by-feature style, such tedious navigation is usually not needed, so it entirely transcends the need for any such naming convention.

"The single most important factor that distinguishes a well-designed module from a poorly designed one is the degree to which the module hides its internal data and other implementation details from other modules."

-- Joshua Bloch,

Effective Java

JavaBeans: Why use them when immutables will do?

Immutable objects are objects that do not change state (their data, in other words) after construction. Martin Odersky, the principal creator of Scala, has recently praised the qualities of immutables. In Effective Java, Joshua Bloch makes a very convincing case in favor of strongly preferring immutable objects. To summarize Bloch's points, immutables:

  • Are simple
  • Are thread-safe, and require no synchronization
  • Can be shared freely
  • Never need to be defensively copied
  • Never need a copy constructor, or a clone() method
  • Make good building blocks for other classes
  • Make good Map keys and Set elements
  • Have failure atomicity -- that is, they cannot be left in an invalid state when an error occurs

"Classes should be immutable unless there is a very good reason for making them mutable," says Bloch. But his advice seems to have been largely ignored. In place of immutable objects, most applications use JavaBeans (or something similar) instead. JavaBean objects are significantly more complex than immutables. Their complexity arises from their large state space. In a rough sense, you might think of a JavaBean as the diametric opposite of an immutable object: it allows maximal mutability.

It's common to use JavaBeans to model database records. Is that an appropriate design? Think of it this way: if you were modeling a row from a database ResultSet, without any preconceptions or framework constraints whatsoever, what would your design be like? Would it be similar to a JavaBean, or would it be significantly different?

I think it would be enormously different:

1 2 Page 1
Page 1 of 2
How to choose a low-code development platform