The joy of data center automation -- and its hidden dangers

Automating manual or repetitive tasks is a source of great satisfaction for admins, but great disasters may await


In that case, the server should raise an error via an SNMP trap or email and not bring up the Web services. The problem would be immediately visible and presumably easy to fix after some digging. However, if the server brings up all its services anyway and joins the load-balanced group, it may not function properly.
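As a rough illustration of that fail-closed behavior, here is a minimal Python sketch. It assumes the startup logic can be expressed as a set of named check functions; the names `preflight`, `bring_up`, `start_services`, and `alert` are hypothetical placeholders, and the `alert` callable stands in for whatever SNMP trap or email mechanism is actually in use.

```python
from typing import Callable, Dict, List


def preflight(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of the ones that failed.

    A check that raises an exception is treated as a failure rather
    than being allowed to crash the startup sequence silently.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def bring_up(
    checks: Dict[str, Callable[[], bool]],
    start_services: Callable[[], None],  # hypothetical: starts the Web services
    alert: Callable[[str], None],        # hypothetical: SNMP trap, email, etc.
) -> bool:
    """Start services only if every preflight check passes."""
    failed = preflight(checks)
    if failed:
        # Fail closed: raise the alarm and leave the services down so the
        # server never joins the load-balanced group in a bad state.
        alert("preflight failed: " + ", ".join(failed))
        return False
    start_services()
    return True
```

The design choice here is simply that a failed or crashing check blocks startup, so a broken server stays out of the pool instead of serving traffic.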

Depending on the actual problem encountered, this might mean that all services on the new server are broken, which would be fairly obvious, but it may mean something much more subtle than that. In fact, it could very well be something that evades detection by service, content, and application monitoring frameworks. The server may look OK, but it really isn't. That is a much more devious problem to tackle.

It's even more troubling if the impact is relatively minor. Problem reports may come in sporadically, only at the times of day when new servers are spawned from that template, or only a subset of users may be affected because servers that are already running do not have the same issue. Those issues are extremely difficult to find. I would much rather have a dozen servers spin up, hit an error, send an alarm, and halt than come up all the way and corrupt an application. An application that's slowing down due to lower capacity is better than a fast app that's broken and potentially damaging a database.

The point is that seemingly minor automation efforts may work flawlessly for a long, long time before they run off the rails. Autopilot is great, but we still want someone looking over the instruments to make sure that things are running as they should. Throwing as much error checking as possible into what are otherwise simple automation tasks may seem somewhat onerous at first, but it is as critical as the automation step itself.

Some of my automation scripts are 25 percent functional code and 75 percent error checking and failure handling. I'm also a big fan of automation scripts that produce no output whatsoever when nothing goes wrong, but throw debug info out to STDOUT if they run into problems. When used with mail -E in cron jobs or startup scripts, debug to STDOUT makes for a very simple notification step right from the source.
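One way to get the "silent unless something breaks" behavior is to buffer debug messages and print them only on failure. The Python sketch below is an assumption about how that might look, not the author's actual scripts; `run_quietly` and the `debug` callback are hypothetical names.

```python
import io
import sys
from typing import Callable, Optional, TextIO


def run_quietly(
    task: Callable[[Callable[[str], None]], None],
    out: Optional[TextIO] = None,
) -> int:
    """Run task(debug); write nothing on success, full debug log on failure.

    Returns 0 on success and 1 on failure, suitable for use as an exit code.
    """
    stream = sys.stdout if out is None else out
    buffer = io.StringIO()

    def debug(message: str) -> None:
        # Collect debug lines in memory instead of printing them right away.
        buffer.write(message + "\n")

    try:
        task(debug)
    except Exception as exc:
        # Something went wrong: dump everything gathered so far to STDOUT.
        stream.write(buffer.getvalue())
        stream.write(f"FAILED: {exc!r}\n")
        return 1
    return 0
```

Because a successful run writes nothing, piping such a script's output to mail -E from cron would send a message only when there is a failure to report.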

Automation truly is a source of great satisfaction. We get to build a clever framework to facilitate some goal, then watch it work. But like that Lego car, if we aren't paying attention, it will run into a wall eventually. It's best to plan for that right from the outset.

This story, "The joy of data center automation -- and its hidden dangers," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.
