Unwrap the package statement's potential

Minimize project complexity and maximize code reuse

Apart from the well-known and widely observed Sun Microsystems package naming convention for avoiding top-level package name collisions, few programmers thoroughly understand the deceptively simple package statement. Most Java programmers regard the package keyword as little more than a broadaxe means of grouping project classes together, and simply use it to create one unique namespace per project. Unfortunately, this approach neither scales nor stands the test of time.

When a simplistic packaging attitude is applied to team-scale (let alone enterprise-scale) Java code repositories, it gradually and painfully becomes clear that a poorly created and managed package hierarchy has costly and profound code maintenance implications. Worse still, these problems grow as your codebase matures and typically infect code with total disregard for project boundaries.

Consequently, when it comes to using the package statement, a few decisions must be made correctly from day one.

In this article, I explain why many Java programmers improperly use the package keyword and show you one alternative approach that has stood the test of time.

The newbie approach of using Java packages

When you first started programming in Java, you probably did not use the package statement at all. The classic HelloWorld introduction to the language quite rightly neither uses nor discusses Java packages or the package keyword:

// (No package statement here!)
class HelloWorld {
   public static void main(String[] args) {
      System.out.println("Hello World");
   }
}

You simply declared your classes implicitly in the default package (the package with no name) so you could run your Java program in the least verbose way, by typing this at your console:

> java HelloWorld

Luckily, Java's design wasn't crippled by this program execution convenience. Elegantly and powerfully supporting large-scale programming projects was one of the top design priorities, and this is reflected in the package language feature. Hence, the newbie approach of class declaration in the default package is not sustainable when you graduate to implementing real projects.

Class name collisions and the birth of packages

As you get more comfortable with Java, you quickly find that leaving all of your project's classes in the default package limits how many classes that single namespace can practically hold. For example, if your first few experimental classes are called Main or Program, and your first true project also requires a Main or Program class as its main entry point, then you have a class name collision. Either you delete your old classes or you remember that Java lets you subdivide the global namespace into multiple package namespaces.
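
As a minimal illustration of packages acting as separate namespaces, the sketch below uses two JDK classes that share the simple name Date but live in different packages, java.util and java.sql. The class NamespaceDemo and its describe() helper are hypothetical names introduced only for this example.

```java
// Hypothetical demo class; only java.util.Date and java.sql.Date are real JDK classes.
class NamespaceDemo {

    // Returns the fully qualified (package-prefixed) name of an object's class.
    static String describe(Object o) {
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        // Same simple name, different packages: fully qualified names keep them apart.
        java.util.Date utilDate = new java.util.Date(0L);
        java.sql.Date sqlDate = new java.sql.Date(0L);

        System.out.println(describe(utilDate)); // prints "java.util.Date"
        System.out.println(describe(sqlDate));  // prints "java.sql.Date"
    }
}
```

Your own Main or Program classes can coexist in the same way once each project declares its own package.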

The point at which Java newcomers typically see the light with regard to the package statement is when they start their second Java project and want a clean separation between their first project's classes and their second project's classes.

Soon, creating a new package for each new Java project becomes second nature. Unfortunately, many Java programmers' understanding of package's true potential stagnates at this point. But continuing to use Java's package feature in this primitive way is wholly unsatisfactory in the long term, especially where code repositories grow in size from mere molehills to mountains.

Code duplication: the big no-no

The long-term problem of simply creating a new package for each new project is code duplication. Code duplication is one of the big evils of programming because:

  • Maintenance costs can spiral out of control
  • Readability suffers
  • Code becomes bloated
  • System performance might turn sluggish

We all know the root of this problem: trademark programmer laziness. Often we get the feeling we're working on something that we've already solved in some distant past, so we hunt down the old solution, copy the appropriate code (a logic snippet, an entire method, or, hopefully not, entire classes), and joyfully paste it into our new project. Hence the expression cut-and-paste programming.

If you shiver at the thought of your Java code repository being littered with multiple copies of near-identical bits of logic, methods, or even classes, then you need to unleash the package statement's true power in your day-to-day Java development methodology.

The Big Bang...uh, I mean split

Let's approach the code duplication problem logically: to outlaw and eradicate all code duplication, any nontrivial piece of code should occur once and once only. This means, among other things, that any and all

  • Generic logic
  • Generic data groupings
  • Generic methods/routines
  • Generic constants
  • Generic classes
  • Generic interfaces

should never be declared in an application-specific package.

This key observation leads us to the package organization Golden Rule Number 1:

Golden Rule Number 1
Never mix generic code with application code directly

Below your com.company or org.yourorg package level, split your package hierarchy into two fundamentally incompatible branches:

  1. The reusable items branch
  2. The project-specific branch

Application code always uses generic code (library classes and routines) but never contains such code. The opposite is also true: library code never contains any application-specific code or even application dependencies.
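
This one-way dependency can be sketched as follows. It is a minimal illustration under stated assumptions, not code from any real repository: MathKit, clamp(), ScoreBoard, and the package names com.company.reusable.math and com.company.projects.games are all hypothetical. In a real codebase each class would sit in its own file with the package statement shown in its comment; both appear in one listing (and without the public modifier) only to keep the sketch compact.

```java
// --- Reusable branch (hypothetical file: com/company/reusable/math/MathKit.java) ---
// package com.company.reusable.math;

// Generic code: knows nothing about any application.
final class MathKit {
    private MathKit() {} // static utility holder, never instantiated

    // Restricts value to the inclusive range [min, max].
    static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }
}

// --- Project-specific branch (hypothetical file: com/company/projects/games/ScoreBoard.java) ---
// package com.company.projects.games;
// import com.company.reusable.math.MathKit;

// Application code: uses the generic routine but does not contain a copy of it.
class ScoreBoard {
    private static final int MAX_SCORE = 100;
    private int score;

    void add(int points) {
        score = MathKit.clamp(score + points, 0, MAX_SCORE);
    }

    int score() {
        return score;
    }

    public static void main(String[] args) {
        ScoreBoard board = new ScoreBoard();
        board.add(70);
        board.add(50); // 120 is clamped to MAX_SCORE
        System.out.println(board.score()); // prints 100
    }
}
```

Note that the import points in one direction only: the application class refers to the library class, and nothing in MathKit mentions ScoreBoard or any other application type.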

If you have never considered these two fundamentally different kinds of code, then you need to start thinking about this fundamental code dichotomy in your daily programming routine. It is the key to unleashing true code reuse in your organization and banishing code duplication once and for all.

This black-and-white code perspective applied to packages logically requires a topmost-level branching into a generic/reusable package master branch and a nongeneric/nonreusable (i.e., application-specific) master branch.

So for example, for the past five years, I've split my org.lv top-level Java namespace into org.lv.lego and org.lv.apps. (lv stands for nothing more exciting than my initials.) Both these fundamental top-level branches are then further subdivided into more detailed subpackages. My lego branch, for example, is currently subdivided into the following subpackages:

org.lv.lego.adt
org.lv.lego.animation
org.lv.lego.applets
org.lv.lego.beans
org.lv.lego.comms
org.lv.lego.crunch
org.lv.lego.database
org.lv.lego.files
org.lv.lego.games
org.lv.lego.graphics
org.lv.lego.gui
org.lv.lego.html
org.lv.lego.image
org.lv.lego.java
org.lv.lego.jgl
org.lv.lego.math
org.lv.lego.realtime
org.lv.lego.science
org.lv.lego.streams
org.lv.lego.text
org.lv.lego.threads

Note how most of these packages' logical content is self-evident as a result of carefully choosing appropriate and self-descriptive package subbranch names (compare to the java.* hierarchy). This is critically important in unlocking the reuse potential of reusable (generic) resources such as reusable logic, routines, constants, classes, and interfaces. Poorly named package branches, like poorly named classes/interfaces themselves, can confuse your intended user base and sabotage your resources' reuse potential.

At these deeper package levels, you again must be very careful about how you further organize your packages.

Here's Golden Rule Number 2:

Golden Rule Number 2
Keep it hierarchical

Always create a package hierarchy that has a balanced, fractal-like tree structure.

If you end up with a hierarchy that degenerates, in places, into a linear listing, then you are failing to exploit the Java package feature correctly. The classic mistake is simply listing project packages under your top-level applications package branch, my equivalent org.lv.apps. This is a mistake because a linear list of projects is not hierarchical. Linear lists are hard for human brains to grasp long term; hierarchies, on the other hand, are a natural fit for our brains' neural networks.

Projects can always be categorized by a key criterion, and this criterion or attribute should reflect in your Java package hierarchy. As an example, here's how my org.lv.apps is currently subdivided:

org.lv.apps.comms
org.lv.apps.dirs
org.lv.apps.files
org.lv.apps.games
org.lv.apps.image
org.lv.apps.java
org.lv.apps.math

Obviously your subdivisions will most likely differ from mine, but the important point is to think big and always keep future expansion in mind. Deep package hierarchies are healthy. Shallow ones are not.

Where to store all those static utility routines

Once you've accepted the logical need for two fundamentally different types of classes (generic ones and application-specific ones), you're just one step away from solving another awkward problem: where to store those oh-so-handy, but totally non-object-oriented, static utility routines.

I always despair when I see Java code that contains completely generic facets embedded in application-specific classes. Say an e-commerce application relied on a class called Customer containing, among other things, the following method:

private String surroundedBy(String string, String quote) {
   return quote + string + quote;
}

The Customer class's programmer included a utility method that wraps a string in quotes: surroundedBy(String, String). The method is declared private, presumably because the author judged it to be a Customer class implementation detail. Since the method is not declared static, it is an instance method, apparently deliberately so. It looks perfectly benign, but is it? What is wrong with this method?

First of all, this method is declared as an instance method, yet it does not depend on any Customer object state (i.e., object fields). It requires only its parameters and nothing else to do its job. This is the telltale logical signature of a class-independent utility method; in other words, this method is, in fact, not logically an instance method at all.

Secondly, this method also does not play any sensible part in the abstraction that class Customer should embody, so it simply does not belong in the Customer class to begin with. This method looks less benign by the minute. So what next?

The correct, by-the-book location for the surroundedBy() method is inside a class with an exclusive focus on string processing, not customerhood. Unfortunately, the String class itself is declared final and therefore can't be subclassed into a BetterString class (for example) to rehouse the surroundedBy() method in a logically justified place. A sensible alternative is to define a new class devoted to string processing, say StringUtilities (or the shorter StringUtils, or shorter still, StringKit), and promote the surroundedBy() method by making it available as a public static routine in class StringKit, like this:

public class StringKit {
   // .. Many other string-processing routines here
   public static String surroundedBy(String string, String quote) {
      return quote + string + quote;
   }
   // .. Many other string-processing routines here
} // End of class StringKit
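
To make the relocation concrete, here is a minimal sketch of how Customer might call the promoted routine. The name field and the quotedName() method are hypothetical, invented for this example; StringKit is repeated (without the public class modifier) only so that the listing compiles as a single file.

```java
// Generic string-processing routines; in practice a public class in its own file.
final class StringKit {
    private StringKit() {} // not meant to be instantiated

    public static String surroundedBy(String string, String quote) {
        return quote + string + quote;
    }
}

// Application-specific class, now free of generic string logic.
class Customer {
    private final String name; // hypothetical field

    Customer(String name) {
        this.name = name;
    }

    // Hypothetical method that delegates quoting to StringKit.
    String quotedName() {
        return StringKit.surroundedBy(name, "'");
    }

    public static void main(String[] args) {
        System.out.println(new Customer("Ada").quotedName()); // prints 'Ada'
    }
}
```

Customer keeps only the logic that concerns customers, while any other class in the repository can call StringKit.surroundedBy() directly.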

What do we achieve by performing this Move Method refactoring?

In the short term, we achieve two things:

  • We make available a perfectly reusable (and therefore valuable) piece of code for future reuse across project and application boundaries
  • We improve the Customer abstraction's implementation by eliminating the dilution (read: pollution) of the abstraction

In the long term, the above refactoring technique results in other, possibly even more important and valuable, side effects:

  • Less new code is written (future code will call StringKit.surroundedBy())
  • Your software system's overall architecture becomes simpler to understand as more top-level logic and structure become clearer
  • Your software becomes more robust because more code relies on a library of foundation building blocks that will be tested more thoroughly and more frequently than "plain" application code

Unfortunately, real-life software is truly littered with embedded methods like surroundedBy(), yet few Java programmers reuse these methods because:

  • They are declared in application-specific code, which by definition is deemed nongeneral and therefore nonreusable
  • They are often not even visible to any programmers who may wish to reuse them because they are declared private or package-scope

One solution is to methodically identify and move such misplaced reusable methods to problem domain-specific utility classes. Read the sidebar, "Static Utility Methods Repositories, A Personal Example," for an example of how to tackle this.
