So, you've inherited some code. It's 50,000 lines of the oldest code on the project. The authors have left the company and won't return your calls. It's undocumented, badly designed, and buggy; yet the rest of the team uses it all the time. There's absolutely no option but to make it work, and that's your job. What do you do?
Don't stress. This article will make some concrete suggestions as to the sorts of things you can do, from the simple and mindless to the complex and dangerous. The process can be mostly evolutionary and absolutely safe. In a couple of months, colleagues will stop complaining about your section of the code, and a couple of months after that, your revamped code will be the pride of the project.
Some of the suggestions that follow are nothing new. Indeed, concepts such as documenting code and following coding standards have been drummed into programmers since they went to school. However, the usual reasons given are motherhood statements -- things you wouldn't say the opposite of (be a good boy, write understandable code, for example). I've tried to explain why you would want to do those things, and what their positive effects are.
So, let's get to it.
Start with the easy things
When making bad code good, it's best to start with easy changes, to make sure you don't break anything. A large lump of bad code breaks easily, and nobody can be expected to fix bugs in it without some preparation.
Fix the Javadoc comments
You never realize a comment's importance until you need to read it. Even if nobody else reads it (and usually, they don't), Javadoc comments are important for the code authors to help them remember what the code/class/parameter should do. Pieces of information particularly important to include are:
- Whether an object parameter may be null, and what it means if it is
- Whether a parameter needs to be mutable or not
- Whether a return value is mutable
- Whether a return value may be null
- Whether changes to the return value affect the returner's internal state
Fixing the Javadoc comments doesn't mean just adding them where there were none before. You should also delete misleading and trivial comments (such as documenting setters and getters) as they engender a false sense of security. Important information exists for inclusion in Javadoc comments, so there's no point wasting the time of potential readers.
When you write new methods and classes, try to include some Javadoc comments (just a few) to explain what they do. Even if it's obvious to you, the next person to get the code might thank you for the effort. If you don't know what to write in the Javadoc comments, you have no business writing the code.
Javadoc comments can also serve as a record of what you learned about the code. When you figure out what a particularly tricky, clever, or ugly method does (especially a private
or protected
method with no implied contract), write a comment to record it for posterity. This expands the amount of the code under your control and helps you later when you're trying to decipher something related.
Of course, Javadoc comments should only explain the effect of a method, never the implementation (because you might change that). If code within a method needs explanation, use standard code comments. A school of thought exists that says with good enough identifiers you don't need code comments at all. I'm not that extreme, but I take the point. Indeed, many code comments don't tell you anything new. I urge you to leave those uninformative comments out, as they only serve to make simple code verbose and intimidating.
Fix the code formatting
Code formatting standards always spark contention. I'm in favor of coding standards, as long as they're the ones I already use, of course. If you take over a piece of badly formatted code (different from how you would do it), and you're having trouble reading it, you'll want to reformat it. Having a project-wide coding standard makes it less defensible for people to reformat code to their personal unreadable style and more likely that any particular piece of code makes sense to you the first time you read it. The most widely accepted coding standard is probably Sun's.
The advantage of reformatting badly formatted, badly written code is that as you change it, you leave a clear trail indicating where you've been. If you find yourself looking at a badly formatted method, you know not to trust it because the expert (that's you) has not put a seal of approval on it. Moreover, if you reformat code, at least you know you've read it. You may not have understood it, but you might notice repeated sections of code, calls to obsolete methods, or any number of obviously bad things. If you're lucky, you might find a bug caused by indentation that doesn't match the brackets.
While you're messing with the code, you might like to rename identifiers to be meaningful. I have almost given up using abbreviations in identifier names because I find them difficult to understand and hard to pronounce. There's no point having an easy-to-type identifier if it makes so little sense that you can't remember it. And please, use ALLCAPS identifiers only for final
variables!
Finally, I think it's just dishonest to claim that code that does not meet project coding standards is good code.
Follow project conventions
Your project, if it is a big one, probably includes some conventions. Examples that I have encountered include:
- Error logging and tracing: The project may specify a standard interface to the logging system. Code that does not use the standard interface may write directly to
System.out
, causing messy output that can't be turned off by the logging system configuration. With a standard logging interface, logging implementations can be changed as the project evolves without breaking existing code. - Access to resources: There may be conventions for location of images to display in the user interface. Code that does not comply with the conventions, for example by referring to absolute file locations rather than the class path, will break as soon as the project needs to be packaged for release.
- Use of CORBA: Distributed object systems, such as CORBA, add complexity to a project. CORBA's limited capability to pass objects as arguments can, unless the programmers are vigilant, result in struct types being used instead of proper objects. Project standards can require that CORBA objects be wrapped in an OO/CORBA impedance mismatch adapter, which restricts the CORBA types to the remote access layer where they belong. Similarly, ORB instance creation is a task that belongs in the remote access layer, not in general code.
Changing bad code to follow these project conventions yields low-hanging fruit in the form of immediate results. Inventing conventions where they do not exist proves more difficult, but in all likelihood is worth the effort.
Refactoring
With any luck, by fixing the Javadoc comments, reformatting the code, and changing the code to comply with project conventions, you've learnt something about the code. Armed with this experience, you're prepared to change it -- the refactoring process. In case you're unfamiliar with the term, refactoring means changing the design of existing code. (See Resources for more on refactoring.)
Write automated unit tests
Unit tests are one of those things that we all think of as being done by the mythical perfect programmer, yet something for which the rest of us just never find time (but read Test Infected). However, with some experience, you'll find that unit tests save you time. Automated unit tests help you go fast. Here's how.
In all likelihood, if you are working on a large piece of bad code, it's part of a bigger project. It might be a distributed system with assorted servers and associated processes, and some sort of user interface. The only way to tell whether your code works is to start up some of the system, try a few things, and see if it works. Starting all these processes, and remembering the whole sequence, wastes your time. You're employed to fix bugs, not wait for computers. The solution: write small, quick tests that you can run quickly and that will tell you whether or not the code works. A JUnit test gives you a green light or a red light. Isn't that easier than checking all the output to make sure it is correct? With proper procedures, testing the correctness of the code becomes an absolute no-brainer.
Of course, writing unit tests is not easy, which is one of the reasons they're not as popular as they should be. Writing unit tests for distributed systems is very difficult indeed. However, writing correct code without testing proves even harder than that, so the easy way out is to write unit tests. You'll find that if you start with unit tests for simple parts of the code that have small interfaces, you'll eventually have the skills to tackle larger and more complex parts of the code. And almost certainly, you'll find some embarrassingly stupid bugs along the way.
Unit tests are very important because they tell you when you break things. And things are very likely to be broken, given the daring nature of the upcoming refactorings. By the way, if you break something, and somebody else detects it first, your unit tests need to be improved. Other people are handy like that.
You aren't gonna need it
One of the Extreme Programming mantras says, "You aren't gonna need it." That is, unless you have a requirement for some code today, you have no reason to write it, and to make your job easier, you shouldn't. Unnecessary code becomes particularly easy to detect in code already in use -- it's the bits nobody calls because they don't work. Usually, the requirement has lapsed because nobody expects it to ever work. In this case, it's easiest to delete the code. Think of every line you delete as being something you'll never need to understand, document, reformat, or refactor.
Also, find ideas that were started but never completed, and delete them. You'll find that a lot of refactorings or design decisions that you don't understand were made only to accommodate changes that never saw the light of day, and by deprecating those features, you can simplify the design. If it later turns out that you need the feature back, you can put it in and do it right.
Another fertile source of unnecessary code includes completed but nonetheless unused features. These are a consequence of designing code before knowing what it is supposed to do. Take them out. Methods and classes will vanish before your very eyes, and nobody will notice.
Break up big classes
With big classes the usual causes are:
- No clear definition of what the class should do, resulting in it doing rather a lot of different things
- Out-of-control inner classes
- Numerous static and instance methods
- Excessive numbers of convenience methods
- Cut-and-pasted code
My rule of thumb states that 500 lines of code amounts to a big class. Classes over 1,000 lines are bad and must be fixed, and classes over 2,000 lines equal acts of terrorism. In the paragraphs below you'll find suggestions for five areas on how to break up big classes.
Firstly, extract the named inner classes. Some people write inner classes so they can achieve friend classes as you would in C++; for those people there is no solution but to stop them from writing Java. However, developers create many inner classes simply to put them near the code from which they are used. In such situations the inner classes can easily be extracted. However, tiny, anonymous inner classes are usually so intimately attached to their containing class that they can safely be left in the code. If they really are a problem, give them a name and move them out.
Secondly, examine the class's static and instance (nonstatic) methods. If you have a lot of both, your class may be trying to be two different things. Perhaps the static methods implement some sort of singleton, while the instance methods implement things referred to by the singleton. If that's the case, it won't be much effort to break the class in two, with one taking over the responsibilities of the singleton and the other taking over the responsibilities of the instances. You might also find some static methods that just provide some useful functionality. They can be extracted into a tools class. The difficult part about this sort of refactoring is finding all the references to this class that no longer compile. Refer to the "Tools" section below for some ideas.
A third idea: closely examine the class's instance variables. If two or more must be kept in sync (for example, a map from As to Bs and the reverse map from Bs to As), you should extract those variables into a new class. The new class's methods will replace any direct access to those instance variables in the original class. By placing all of the objects' behavior into one class, the relationship between those variables will be more aptly defined and more difficult to break.