These are the days of the big data buzz and the proliferation of mind-numbing reports to beef up any kind of argument imaginable. But I'm reminded of one fight between the IT operations and development teams that showed how simple metrics can decisively prove a point. It was before the advent of devops, a classic case of many different teams involved in day-to-day IT operations, and it raises the question, "Can't we all just get along?"
It was the mid-1980s, and I was working at a regional bank in the data center group as a fairly new first-level manager of the IT operations support team. All computer processing, mostly batch work, happened on either Honeywell or IBM mainframes.
There was one major annoyance for our team: dropping everything to deal with failures caused by the development team's simple errors. For example, developers made basic typos in JCL (job control language) -- long before software was available to catch such problems.
Our team knew that some of the developers were less than careful with testing changes, partly because one of our team's jobs was to fix batch job failures ASAP without the developers. We were on-site 24/7, and they were usually at home when the batch cycles were running. We could either wait for them to arrive or do it ourselves. We went with the latter (faster) option.
Even more frustrating, the developers' managers weren't holding their own people responsible for the numerous job failures. Developers could toss in a half-baked change request and think to themselves, "Someone else will fix it if it goes south" -- which is exactly what we did.
Not only was operations taken away from our other projects by these missteps, but the failures made our whole business look bad. For example, if a botched batch job prevented updates to the Demand Deposit Accounts (DDA) system, we were at risk of not having current checking, savings, and CD account information available in the branches. If that happened too often, senior bank management would "have words" with the IT managers. It was also very embarrassing when our tellers had to tell customers, "Sorry, we can't give you your current balance right now."
My boss talked to development's manager many times about their quality control, but to no avail. Finally my boss came up with a way to get the point across. There was a daily processing report, created on paper, that came from our data center group. Copies were distributed to most managers and to others who requested one. This report included such info as check volumes, start and completion times of key batch jobs, and a list of jobs that failed or "ABENDed" ("ABEND" standing for "abnormal end" of the job). My boss's goal was to show a connection between the number of changes submitted by development and the number of ABENDs.