Last week I wrote a piece on why it isn't a good idea to just reboot Unix boxes when something goes wrong without first trying to determine why it went amiss. The post was predicated on my 20 years of wrangling Unix and Linux boxes. I thought it was an innocuous point, but it stirred up a hornet's nest.
From some of the feedback I received via email and in the ensuing massive Slashdot thread, you'd think that Unix admins reboot their boxes every morning during their first cup of coffee. Yes, I got positive comments from those who understood what I was saying. But there were plenty of others who seemed to think that rebooting is something best done early and often. (For the record, I couldn't give a rat's ass about uptime numbers -- that's completely unrelated.)
[ Paul Venezia has been on a tear, writing about when to reboot a Unix box and the nine traits of the veteran Unix admin. He was recently seen shaking his fist at the sky. ]
I just don't get it. What the hell has gone wrong with system administration?
One of the more telling comments I received was the idea that since the advent of virtualization, there's no point in trying to fix anything anymore. If a weird error pops up, just redeploy the original template and toss the old VM on the scrap heap. Similar ideas revolved around re-imaging laptops and desktops rather than fixing the problem. OK. Full stop. A laptop or desktop is most certainly not a server, and servers should not be treated that way. But even that's not the full reality of the situation.
I'm starting to think that current server virtualization technologies are contributing to the decline of real server administration skills. Sure, you can redeploy a template in 10 minutes or so, but that's only a "solution" if you're talking about a stateless server that gets its data from elsewhere in the infrastructure and operates in a farm. You're not going to wantonly redeploy a database server just because something went slightly wrong somewhere, are you? Granted, it's possible to build structures to support that, but unless it's a massive infrastructure with dozens or hundreds of database servers, what's the point?