If you shrug and reboot the box after looking around for a few minutes, you may have missed the fact that a junior admin inadvertently deleted /boot and some portions of /etc and /usr/lib64 due to a runaway script they were writing. That's what was causing the segfaults and the wonky behavior. But since you rebooted the server without digging into the problem, you've made it much worse, and you'll soon boot a rescue image -- with all kinds of ponderous work awaiting you -- while a production server is down.
This is but one significant reason reboots in the Unix world should be extremely rare. Rather than a troubleshooting step, they're a Hail Mary approach to server administration. In short, nobody ever fixed a problem caused by a full /var partition by rebooting the box. (And don't give me any pedantic nonsense about open filehandles -- you know what I mean.)
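For the pedants, the open-filehandle case is worth a quick illustration: a deleted file that a running process still holds open keeps eating space on the partition, and you can find it without rebooting anything. What follows is a minimal sketch, assuming a Linux box with /proc mounted and enough privilege to read other processes' fd directories. The function name `find_deleted_open_files` and its `proc_root` parameter are made up for this example.

```python
import os

DELETED_SUFFIX = " (deleted)"


def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, path) tuples for deleted files still held open.

    Assumes the Linux convention that /proc/<pid>/fd/<n> is a symlink
    whose target gains a " (deleted)" suffix once the file is unlinked.
    """
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target.endswith(DELETED_SUFFIX):
                path = target[: -len(DELETED_SUFFIX)]
                results.append((int(entry), int(fd), path))
    return results
```

Calling `find_deleted_open_files()` as root and filtering the results for paths under /var would point you at the process to restart or signal, which is a far smaller hammer than restarting the whole box.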
In many cases, it's extremely important not to reboot, because the key to fixing the problem is present on the system before the reboot, but will not be immediately available after. The problem will recur, and if the only known solution is to reboot, then the problem will never be fixed unless or until someone decides not to reboot and instead tries to find the root of the problem. Unfortunately, that's not as common as it should be. Face it -- a bad stick of RAM cares not a whit about system uptime or when the box was last booted. It'll cause problems no matter what.
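If a reboot really is unavoidable, you can at least keep some of that evidence. The sketch below copies a handful of volatile files to a timestamped directory first. It is an illustration under assumptions, not a forensic tool: the source list is an arbitrary pick of Linux /proc files, the function name `snapshot_volatile_state` is invented here, and the destination should sit on a filesystem you trust, not the partition that is full or damaged.

```python
import os
import time

# Assumed, illustrative list of volatile Linux state; adjust to taste.
VOLATILE_SOURCES = [
    "/proc/meminfo",
    "/proc/loadavg",
    "/proc/mounts",
    "/proc/uptime",
]


def snapshot_volatile_state(dest_root, sources=VOLATILE_SOURCES):
    """Copy each readable source file into a timestamped directory.

    Returns (destination_directory, list_of_saved_file_names).
    Unreadable or missing sources are skipped rather than treated as fatal.
    """
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = os.path.join(dest_root, "pre-reboot-" + stamp)
    os.makedirs(dest, exist_ok=True)
    saved = []
    for src in sources:
        try:
            with open(src, "rb") as handle:
                data = handle.read()
        except OSError:
            continue  # source missing or unreadable; keep going
        # Flatten the path into a file name, e.g. proc_meminfo
        name = src.strip("/").replace("/", "_")
        with open(os.path.join(dest, name), "wb") as out:
            out.write(data)
        saved.append(name)
    return dest, saved
```

None of this replaces actually digging into the problem, but a few saved files beat a clean boot and no idea what happened.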
The next time you're looking at a problem and someone says, "Hey, let's just reboot the thing," make sure you've exhausted every other possibility before you send it to init 6. The time and pain you save will definitely be your own.
This story, "When in doubt, reboot? Not Unix boxes," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.