"Unfortunately, for many companies, IT is basically a utility," says Atwell Williams, director of enterprise service management at BMC Software. "When you come home and flip on the light switch, you don't immediately call the electric company and say, 'Thanks a lot guys, great job.' But if the lights don't come on, you better believe people will call."
So the Houston-based provider of system management solutions uses its own software, BMC Performance Manager, to track potential service outages: those that would have happened had they not been caught in time. By identifying and counting these potential outages, BMC can establish goals for reducing them. Better still, BMC identifies the systems most important to the ongoing success of the enterprise and gives them priority when deciding what to fix first.
"This is the essence of business service management," Williams says. "Many companies have lots of information telling them something bad is going to happen, but they don't know which of those things will truly impact their business. They end up spending time chasing down things that aren't really a problem."
At Sun Microsystems in Santa Clara, Calif., CIO Bill Vass sits in front of a custom dashboard monitoring the health of his "canaries" -- dummy users that log in to internal Sun systems across the globe every 15 minutes -- gauging response times, availability, and performance. When a system goes red, techs are automatically dispatched by pager to deal with the problem. In addition to obtaining real-time information, Vass and his team regularly comb detailed reports containing every conceivable metric, from code compliance to customer satisfaction -- the customers in this case being Sun employees.
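The canary idea is simple enough to sketch in a few lines. The Python below is only an illustration, not Sun's tooling: the function name `check_canary`, the five-second threshold, and the two fake login functions are all assumptions made for the example. It times a scripted login, records whether it succeeded, and marks the system green or red.

```python
import time
from typing import Callable


def check_canary(login: Callable[[], None], max_seconds: float = 5.0) -> dict:
    """Run one scripted login and report availability, timing, and status.

    `login` stands in for whatever a dummy user would do against a real
    system; `max_seconds` is an assumed response-time threshold.
    """
    start = time.perf_counter()
    try:
        login()
        available = True
    except Exception:
        # Any failure to log in counts as the system being unavailable.
        available = False
    elapsed = time.perf_counter() - start
    status = "green" if available and elapsed <= max_seconds else "red"
    return {"available": available, "seconds": elapsed, "status": status}


def fake_login_ok() -> None:
    """Hypothetical probe that succeeds immediately."""


def fake_login_down() -> None:
    """Hypothetical probe that simulates an unreachable system."""
    raise ConnectionError("system unreachable")


print(check_canary(fake_login_ok)["status"])    # green
print(check_canary(fake_login_down)["status"])  # red
```

A real deployment would run a check like this on a schedule (every 15 minutes, in Sun's case) and hand any red result to a paging system; both pieces are left out here.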
"One of my rules is you manage what you measure," Vass says.
But Sun has taken performance monitoring a step further by assigning a dollar value to every potential outage -- essentially marrying system performance directly to ROI.
"At the end of every outage, we have a thing called 'estimated cost to Sun,' " Vass explains. "It took us eight months to determine the formula for that, and I had to get it blessed by finance before anyone would believe it. But now I can walk in and say, 'We need to upgrade network switches on this campus because if they fail, it will cost us $500,000 a month in downtime. New switches will cost $3 mil, but they'll pay for themselves in six months.' "
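The arithmetic behind Vass's switch example is a plain payback calculation. The sketch below is not Sun's "estimated cost to Sun" formula, which the article does not disclose; it simply divides the upgrade cost by the monthly downtime cost avoided, using the figures Vass quotes. The function name `payback_months` is invented for illustration.

```python
def payback_months(upgrade_cost: float, monthly_downtime_cost: float) -> float:
    """Months until avoided downtime cost equals the cost of the upgrade.

    Assumes the downtime cost would otherwise be incurred every month,
    which is a simplification of any real risk model.
    """
    if monthly_downtime_cost <= 0:
        raise ValueError("monthly_downtime_cost must be positive")
    return upgrade_cost / monthly_downtime_cost


# Figures from the example: $3 million in switches, $500,000 a month in downtime.
print(payback_months(3_000_000, 500_000))  # 6.0
```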
Perception is reality
One of the big challenges IT managers face is starting out in the hole -- with management and users anticipating the worst. Even if a project hits all its marks, it can be perceived as a failure.
"The problem with the word 'IT' is that it's almost always preceded by 'goddamned,' " says Ken Rau, senior consultant at Cutter Consortium in Wallace, N.C. "I've talked with CIOs who deliver 80 to 90 percent of their projects on budget and on time, but to their users it's, 'What have you done for me lately?' It's a communications and perception problem."
Vass agrees that bad tech experiences tend to linger longer than good ones. "One day on my morning commute there was an accident and it took me three hours to get to work. That happened five years ago, but I still remember it. People are like that with IT systems. They remember the one day in three years when the mail was down and they had to give a presentation. That's why we keep stats on things like system availability. So when they say, 'Your availability sucks,' we can show them the numbers that say we're up 99.999 percent of the time."
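To put a figure like 99.999 percent in perspective, it helps to convert it into allowed downtime. The short Python sketch below does that conversion for a 365-day year; the function name `downtime_minutes_per_year` is an assumption for the example, and the calculation says nothing about how Sun actually measures availability.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into minutes of downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


print(f"{downtime_minutes_per_year(99.999):.2f}")  # 5.26
```

In other words, "five nines" leaves room for a little over five minutes of downtime in a whole year.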