Last week, I wrote a little item titled "Nine traits of the veteran Unix admin." Had I known that a few hundred thousand people would read it in just a few days, I might have put on a clean shirt or something. Regardless, I'm sorry I stopped at nine. I had at least fifteen, but it was already a long post.
The most interesting aspect of the feedback I received was that the vast majority of readers agreed with me on just about every point -- with the exception of the first and last items: sudo and reboots. (A few folks also hammered me for supposedly leaving out vi in favor of vim -- I did include it! Others decided that because I referenced Perl briefly, I had never used bash, or some such nonsense.)
I want to take a closer look at the reboot issue. It's a hot spot for all server admins, but to Unix geeks, it's a deeper issue -- probably because Windows admins use reboots as one of their first troubleshooting steps, while it's one of the last for the Unix team.
Here's the reality: Server reboots should be rare -- very rare. I cited kernel updates and hardware replacement as the two leading causes of reboots in the Unix world. Some have mentioned significant security risks in not rebooting servers, but that's nonsense. If there's a security risk present in a service or application, a patch can be applied without requiring a reboot. If the security risk is present in a kernel module, it's generally possible to unload the module, apply a patch, and reload the module. Yes, as I said, you need to reboot if there's a security risk in the kernel. Otherwise, there's no real reason to reboot a Unix box.
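To make that rule of thumb concrete, here is a minimal sketch in Python of the decision described above. The category names and the functions `remediation_for` and `needs_reboot` are hypothetical, invented for illustration rather than taken from any real tool, and the sketch ignores edge cases such as a module that is in use and cannot be unloaded.

```python
from enum import Enum


class Component(Enum):
    """Hypothetical categories for where a fix has to be applied."""
    SERVICE = "service"              # a daemon or application
    KERNEL_MODULE = "kernel_module"  # a loadable kernel module
    KERNEL = "kernel"                # the running kernel itself
    HARDWARE = "hardware"            # a part that has to be replaced


# Least disruptive action for each category, following the paragraph above.
REMEDIATION = {
    Component.SERVICE: "apply the patch, then restart the service",
    Component.KERNEL_MODULE: "unload the module, apply the patch, reload the module",
    Component.KERNEL: "apply the patch, then reboot",
    Component.HARDWARE: "replace the part, then reboot",
}


def remediation_for(component):
    """Return the suggested action for the given component category."""
    return REMEDIATION[component]


def needs_reboot(component):
    """Only kernel updates and hardware replacement call for a reboot here."""
    return component in (Component.KERNEL, Component.HARDWARE)


for component in Component:
    print(component.value, "->", remediation_for(component))
```

The point of writing it as a lookup is that a reboot shows up in only two of the four rows; everything else is handled by restarting or reloading the affected piece.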
Some argued that other risks arise if you don't reboot, such as the possibility that certain critical services aren't set to start at boot, which can cause problems after the next restart. This is true, but it shouldn't be an issue if you're a good admin. Forgetting to set service startup parameters is a rookie mistake. Naturally, if you're building the box and it's not in production, you can do all the reboot tests you want without adverse effects. That's just good practice.
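One way to catch that rookie mistake without rebooting is to compare what is running now against what is configured to start at boot. The Python sketch below assumes you have already collected two listings with the service name in the first column of each line. On a systemd host, for example, these could come from `systemctl list-units --type=service --state=running --no-legend --plain` and `systemctl list-unit-files --type=service --state=enabled --no-legend`. The function names and the sample data are made up for illustration.

```python
def unit_names(listing):
    """Return the first whitespace-separated field of each non-empty line."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def running_but_not_enabled(running_listing, enabled_listing):
    """Services running now that are not marked to start at boot."""
    return sorted(unit_names(running_listing) - unit_names(enabled_listing))


# Illustrative sample data, not real output from any host.
running = """\
sshd.service    loaded active running OpenSSH server
nginx.service   loaded active running web server
crond.service   loaded active running cron
"""
enabled = """\
sshd.service   enabled
crond.service  enabled
"""

print(running_but_not_enabled(running, enabled))  # ['nginx.service']
```

Treat the result as a list of candidates to review rather than a list of errors: some services are legitimately started by a dependency or on demand and will never show up as enabled.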
But there's another side: Those who consider reboots to be a worthy troubleshooting step are going to get themselves in trouble sooner rather than later. Let's say a Unix box has gone wonky. A few services that were running will no longer start, maybe with a segfault, and other oddities abound.