Dear Bob ...
I'm a consultant helping organizations to achieve and maintain certification to the ISO 27001 Information Security standard. This requires (among much else) that they define KPIs (key performance indicators) for security.
This seems a very reasonable request and good management practice, but I've not been able to identify indicators that satisfy me (they do seem to keep the auditors happy).
The problem is that security is, in some respects, like a fire or burglar alarm -- failures are (hopefully) rare events. The fact that you haven't had a fire or break-in this month may be a worthy achievement, but (possibly) doesn't tell you much about what will happen next month. So keeping a record of rare events, while certainly important, doesn't help you assess the effectiveness of your security.
Equally, there are plenty of indicators that are straightforward and inexpensive to gather (good!): the number of spam e-mails blocked, the number of virus infections trapped, the number of packets blocked at the firewall. The problem with these statistics is that their trends are ambiguous. If the number of detected viruses has gone up, is that because there are more attacks going on or because a new version of the software is more effective at detecting infection?
I've asked many industry gurus for good examples of security metrics, without success. Have you any thoughts?
Dear Unsatisfied ...
Many thoughts. Not sure if any of them are useful thoughts, but many thoughts nonetheless.
Starting with this: Requiring KPIs is probably a useful idea, so long as everyone understands you can't start with the question of what constitutes a good KPI.
KPIs are metrics. As is always the case with metrics, the question "How should we measure this?" has to come after the question "When we use the word 'this,' what does it refer to?"
So never mind the security KPIs. The place to start is to ask, what are the organization's security goals?
This might seem too obvious to ask: "Our security goals are to have no break-ins, no stolen data, no cybervandalism, and no successful malware attacks, you chump." And this would be an admirable set of goals if it weren't a great example of optimizing the part at the expense of the whole -- because these goals are easily achieved. All a company has to do is shut down its internal networks and Internet connections. If the company isn't willing to do that, it has to reformulate its security goals to recognize the need to balance risk against effectiveness -- and an optimization that balances multiple competing goals is a far more difficult formulation to create.
Which means the answer is going to be more complicated than anyone will like, might be more complicated than anyone would be willing to accept and certainly will be more complicated than anything I can fully answer within the framework of Advice Line. To get to a good answer would probably require a week of on-site consulting -- not that I'm trying to sell you a week of on-site consulting, just giving you a sense of the magnitude of the challenge. (Not that I'd complain if you asked.)
Anyway, here's the shape of an answer, to get you started:
Begin by defining IT's operational goals for performance, stability, and ease-of-use. Ease-of-use is an excellent example of the well-known inverse relationship between the importance of a goal and the difficulty of measuring it objectively; neglect to measure it, though, and you won't get it, which is worse.
Second, you need a threat inventory. This is your list of the types of attacks you know you need to plan for. It should take the form of a hierarchical list, starting with something akin to the "chump" goals I listed above, drilled down a few levels, until everyone responsible figures you have a good handle on the types of threats you face.
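To make the shape of such an inventory concrete, here's a minimal sketch in Python. The category names are purely illustrative, not a recommended taxonomy -- the point is only the structure: broad "chump"-level categories at the top, drilled down to specific threats you can plan countermeasures for.

```python
# A purely illustrative threat inventory, drilled down a couple of levels.
# Every name here is an example, not a recommended taxonomy.
threat_inventory = {
    "break-ins": {
        "network intrusion": ["credential theft", "unpatched services"],
        "physical intrusion": ["tailgating", "stolen devices"],
    },
    "malware": {
        "e-mail borne": ["attachments", "embedded links"],
        "web borne": ["drive-by downloads"],
    },
}

def leaf_threats(node):
    """Flatten the hierarchy into the specific threats you plan for."""
    if isinstance(node, list):
        return list(node)
    threats = []
    for child in node.values():
        threats.extend(leaf_threats(child))
    return threats
```

The leaf level is where countermeasures and responses eventually attach; the upper levels exist so everyone responsible can agree the coverage is complete.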
Third, establish the concept of "acceptable countermeasures." An acceptable countermeasure helps achieve a security goal without degrading performance, stability, or ease-of-use beyond x. Of course, "x" can't be defined numerically until you figure out how to measure performance, stability, and ease-of-use.
Next is defining security goals. This is where it gets interesting, because you live where meteorologists live: They can "predict" a 30 percent chance of rain tomorrow, but when tomorrow shows up it either rains or it doesn't. In your world, here's the best I can come up with: Your goal for each item in the threat inventory is to have implemented acceptable countermeasures that are at least as good as industry standard practice (usually and incorrectly called "best practice") for that threat category.
Two points on this: First, industry standard practice is a moving target; it improves over time, which means your own practices have to improve over time as well. This is a good thing.
And second, the reason you can't just implement industry standard practice is that in some cases doing so would mean implementing an unacceptable countermeasure.
Now we get to the good part: what happens after a security incident. The answer, I think, should be an analysis of how it happened and whether any acceptable countermeasure would have prevented it. Based on the severity of the incident, you might decide to redefine what's acceptable. Or you might not, figuring there are times when cleaning up a mess afterward is better than the cost of avoiding it.
Which gets to one of the many complications that prevent me from giving you a complete answer here: I've limited this discussion to countermeasures, when in fact you also need to define potential responses to the various threats in the threat inventory. You do what you can to prevent fires from starting, but need a fire department to handle the ones that break out anyway.
Now (at last!) we're ready to talk KPIs. If your goal is to implement acceptable countermeasures for all threats in the threat inventory, here are logical KPIs:
- Percent of actual attacks that are not listed in the inventory (each one constitutes a planning failure).
- Percent of actual attacks that (1) were successful; and (2) would have been thwarted by an acceptable countermeasure you didn't implement.
- Percent of successful attacks that could not have been thwarted by an acceptable countermeasure and for which you had no planned response.
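The arithmetic behind these three KPIs can be sketched in a few lines of Python. The incident records and their field names below are hypothetical -- you'd substitute whatever your incident-tracking process actually captures -- but the numerators and denominators follow the three definitions above.

```python
# Hypothetical incident log for one reporting period. Each record notes
# whether the attack was in the threat inventory, whether it succeeded,
# whether any acceptable countermeasure (implemented or not) could have
# thwarted it, whether that countermeasure was implemented, and whether
# a response was planned for it. All field names are illustrative.
incidents = [
    {"in_inventory": True,  "successful": False, "preventable": True,
     "cm_implemented": True,  "response_planned": True},
    {"in_inventory": False, "successful": True,  "preventable": True,
     "cm_implemented": False, "response_planned": False},
    {"in_inventory": True,  "successful": True,  "preventable": False,
     "cm_implemented": False, "response_planned": True},
    {"in_inventory": True,  "successful": True,  "preventable": False,
     "cm_implemented": False, "response_planned": False},
]

def pct(part, whole):
    return 100.0 * part / whole if whole else 0.0

attacks = len(incidents)
successful = [i for i in incidents if i["successful"]]

# KPI 1: attacks missing from the threat inventory (planning failures).
kpi1 = pct(sum(not i["in_inventory"] for i in incidents), attacks)

# KPI 2: attacks that succeeded but would have been thwarted by an
# acceptable countermeasure you didn't implement.
kpi2 = pct(sum(i["preventable"] and not i["cm_implemented"]
               for i in successful), attacks)

# KPI 3: successful attacks no acceptable countermeasure could have
# thwarted and for which no response was planned.
kpi3 = pct(sum(not i["preventable"] and not i["response_planned"]
               for i in successful), len(successful))
```

Note the denominators differ: the first two KPIs are measured against all actual attacks, the third against successful attacks only, exactly as worded above.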
[ If you find this approach to metrics useful, you'll find much more on the subject in Bob's new book, "Keep the Joint Running: A Manifesto for 21st Century Information Technology." ]
Politically, of course, none of this matters because no matter how you assess risk and your response to it, after a problem occurs you're guilty.
Which means communicating the nature of your security plans, and their limitations, over and over again, is far more important than something as relatively trivial as measuring their success.