Just think: 14 million data elements, and that's the tip of the identity and access data iceberg! One might contend that 4 million (or 10 million) records in the examples above are not really indicative of "big data" per se. That's true. We used simple, conservative numbers to show how things grow.
In a real-world environment, data accumulates much faster. The two previous examples only talked about data for people, applications (accounts), and activity (logins and file shares). Every business has many more applications that are important, and each of those has a wealth of data to be collected and analyzed. Data must be collected regarding access, roles, inheritance, permissions, assignment, and denial for key systems such as financials, HR, CRM, databases, email, SharePoint, and so on. We also need to collect activity for those systems -- more than just logon events. The growth in data collected is very rapid.
Boiling an ocean of details
Amid exploding amounts of data and increased security expectations, the need to understand and manage identities and their access privileges successfully is as persistent as ever. It's becoming increasingly challenging to answer even simple questions. For example: "What should Bob be able to do with that application or data?"
Supervisors are often too busy to deal with such questions and may not know the right answer. Environments, applications, and policies are constantly changing. Multiple systems must be traversed in order to arrive at an answer. To make matters worse, these are often loosely coupled systems, where additional expertise must be applied to determine the proper action. No human can boil this ocean of data and look at all of this, nor would anyone want to.
As a CISO, I want to leverage all of this data to my advantage, but how do I manage it? How do I ensure that the right identities have the right access and are doing the right things? How do I see the anomalies and outliers? What's the risk? And how do I do all of this in a timely fashion?
In other words, how do I turn all of this data into useful information or actionable intelligence?
No human can quickly and accurately digest and make sense of all this data. This is a problem that seems perfectly suited to a technology that can do the heavy lifting: the data crunching and analytics, the highly repetitive tasks. While the power of automation is certainly required, that's not the whole story. You still need humans involved in the business analysis. Where do we automate, and where do we still need human oversight or intervention?
Defining a process
What is needed is a process for collecting, analyzing, visualizing and taking action -- converting identity and access data into actionable intelligence:
Collect > Analyze > Visualize > Act > Operationalize
First, we need data to answer the who, what, where, when, why, and how questions as they relate to identity and access:
Who: Things, people, or applications that have an identity
What: The action taken, resource or data accessed or involved
Where: Location, geography, or position
When: Time stamp
Why: The detail behind the user activity and intention; this is where inference and analytics are useful
How: The mechanism by which access was granted or used; activity history, inference, and aggregated data can help provide this context
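To make the six questions concrete, here is a minimal sketch of how a single collected record might be modeled. It is an illustration only: the AccessEvent type, its field names, the events_per_identity helper, and the sample values are all assumptions made for this example, not part of any particular product or standard.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AccessEvent:
    """One identity-and-access record (hypothetical field names)."""
    who: str                    # identity: a person, thing, or application
    what: str                   # action taken, or resource/data accessed
    where: str                  # location, geography, or position
    when: datetime              # time stamp
    how: str                    # mechanism by which access was granted or used
    why: Optional[str] = None   # often inferred later by analytics, so optional


def events_per_identity(events):
    """Count collected events per identity (the 'who' field)."""
    return Counter(event.who for event in events)


# Made-up sample data, for illustration only.
events = [
    AccessEvent("bob", "read payroll.xlsx", "Chicago",
                datetime(2024, 1, 8, 9, 30, tzinfo=timezone.utc),
                "group membership"),
    AccessEvent("bob", "share payroll.xlsx", "Chicago",
                datetime(2024, 1, 8, 9, 45, tzinfo=timezone.utc),
                "group membership"),
    AccessEvent("alice", "login CRM", "London",
                datetime(2024, 1, 8, 10, 0, tzinfo=timezone.utc),
                "direct assignment"),
]

counts = events_per_identity(events)
```

Leaving "why" optional reflects the point above: intent is rarely captured at collection time and usually has to be inferred afterward. A simple per-identity count like this is only a starting point for the analysis step, not a substitute for it.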