What every CISO should know about machine learning

Machine learning allows security to adapt to the network being monitored, thereby improving threat detection and reducing false positives

Every organization’s network is unique. So are its security needs.

Every network runs a unique mix of functions, applications, and supported services. That means a security learning and prediction model that works on someone else’s network won’t necessarily work on yours.

Unfortunately, many security technology vendors take just such a “one size fits all” approach with “trained” systems that attempt to detect anomalies using a single learning method with training data. These systems pull data from the entire customer base for both false positives and true positives. Then, once the system has been trained, the updated standard profile is sent to all customers in an update, and the same learning model is applied to all customers and their networks. In other words, what’s missed on one customer will be missed on all.

That’s not to say there isn’t value in macro threat intelligence. But it’s only one piece of a complex puzzle, and it ignores the fact that each network is unique.

Machine learning

Masergy’s Unified Enterprise Security (UES) uses multistage machine learning analysis to find, then learn, predictable patterns on a network. Every network has its own learning model, based on the types of methods that work best for it. Instead of applying a one-size-fits-all approach, the Masergy system detects events differently on different networks.

At the heart of the analysis process is a data prediction gradient that uses multiple learning models, including association rule learning, sparse dictionary learning, Bayesian fields, and artificial neural networks. This system can learn data streams from any of six subsystems, each serving a distinct purpose:

  • Frequencies: When data is transmitted, the data from a network card, log, or vulnerability scanner is defined as an event. The frequency and magnitude of these events are measured over set periods of time, then mapped.
  • Pairings: This subsystem identifies which systems are communicating with each other, the protocols they’re using, and the size of the bidirectional communications. Then, like frequencies, these are mapped directly to a date and time.
  • Protocols: By determining which protocols are used in a network stream, this subsystem identifies the type of applications, operating systems, and infrastructure devices on a network.
  • Resources: This subsystem builds an asset list of devices on a network and the communications methods they use. Over time, this list is adjusted so that the system can learn the parameters and baselines. This, in turn, lets the system make predictions based on combinations of protocols and services in use.
  • Statistics: Metadata on groups of systems from smallest (single system) to largest (whole network) is gathered, then fed to other subsystems.
  • Threshold: Using curve-fitting algorithms to learn data trends, this system generates major and minor brackets. Then it tracks both high peaks and low troughs to determine when a value has exceeded its bracket.
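
To make the Threshold subsystem more concrete, here is a minimal Python sketch of the idea. It is not Masergy's implementation: the straight-line fit, the function names, and the choice of 2 and 3 standard deviations for the minor and major brackets are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def build_brackets(history, minor_k=2.0, major_k=3.0):
    """Learn the trend, then size the brackets from the spread around it.

    minor_k and major_k are illustrative multipliers, not vendor values.
    """
    slope, intercept = fit_line(history)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(history)]
    spread = pstdev(residuals)
    return {
        "slope": slope,
        "intercept": intercept,
        "minor": minor_k * spread,
        "major": major_k * spread,
    }


def classify(brackets, x, value):
    """Report whether a value at time step x sits inside or outside its brackets."""
    expected = brackets["slope"] * x + brackets["intercept"]
    deviation = value - expected
    if abs(deviation) > brackets["major"]:
        level = "major"
    elif abs(deviation) > brackets["minor"]:
        level = "minor"
    else:
        return "within"
    direction = "peak" if deviation > 0 else "trough"
    return f"{level} {direction}"


# Hypothetical event counts per interval, trending gently upward.
history = [100, 102, 101, 103, 104, 103, 105, 106, 105, 107]
brackets = build_brackets(history)
print(classify(brackets, 10, 500))  # far above the learned trend
print(classify(brackets, 10, 0))    # far below the learned trend
```

Both high peaks and low troughs are reported, since a sudden drop in traffic can be as telling as a spike.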

Masergy’s data prediction gradient can use data from all six subsystems. Then it processes the data using multiple learning models, comparing the learned data with the original raw data and using regression analysis to grade each data stream against its own learning models. In this way, the system determines the predictability of any data model.

Also, data models with high predictability are tracked and used for anomaly detection. Those with low predictability are monitored using the data prediction gradient, in case they later become predictable.
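
The grading step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three toy models (persistence, seasonal, running mean), the use of R-squared as the grade, the 0.8 cutoff, and the stream names are all hypothetical, not details of Masergy's system.

```python
import random
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1.0 means the model reproduces the raw data."""
    y_bar = mean(actual)
    ss_tot = sum((y - y_bar) ** 2 for y in actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    if ss_tot == 0:
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot


def candidate_predictions(series, period):
    """Predictions from three toy models for every point from index `period` on."""
    n = len(series)
    return {
        "persistence": series[period - 1:n - 1],   # same as the previous value
        "seasonal": series[:n - period],           # same as one period earlier
        "running_mean": [mean(series[:t]) for t in range(period, n)],
    }


def grade_stream(series, period):
    """Grade one stream against each model; return the best model and its score."""
    actual = series[period:]
    scores = {
        name: r_squared(actual, predicted)
        for name, predicted in candidate_predictions(series, period).items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def triage(streams, period, threshold=0.8):
    """Split streams into those tracked for anomaly detection and those only watched."""
    tracked, watched = {}, []
    for name, series in streams.items():
        model, score = grade_stream(series, period)
        if score >= threshold:
            tracked[name] = model
        else:
            watched.append(name)
    return tracked, watched


rng = random.Random(7)
streams = {
    "dns_queries": [1, 5, 9, 5] * 6,                # strongly periodic
    "misc_udp": [rng.random() for _ in range(200)],  # unstructured noise
}
tracked, watched = triage(streams, period=4)
print(tracked, watched)
```

Streams that land in the watched list are not discarded. They are simply re-graded later, in case they become predictable.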

The final analysis is done by clustering data. The Masergy system arranges the data into individual fields; this creates dimensions that are dissociated from the original structures.

Next, these dimensions are individually analyzed with cluster analysis, using different clusters of dimensions to create hyperplanes. Projections of these hyperplanes can be analyzed to find patterns that do not exist in the ambient data, and these often show emerging patterns that point to deeply hidden anomalies. This technique is used to form a temporal grid that serves as a prediction model. This lets the system find anomalies in the hyperplanes that can then be mapped back to the original data in the ambient space.
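
The value of analyzing projections rather than single fields can be shown with a small Python sketch. It is a deliberately simplified stand-in, not the cluster analysis described above: it checks every two-dimensional projection of a record using squared Mahalanobis distance from the baseline. The field names, the sample data, and the limit of 9.0 are assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import mean


def mahalanobis_sq_2d(points, target):
    """Squared Mahalanobis distance of `target` from a 2-D cloud of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy ** 2
    if det <= 0:
        return 0.0  # degenerate projection: no usable shape to compare against
    dx, dy = target[0] - mx, target[1] - my
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def unusual_projections(baseline, record, fields, limit=9.0):
    """Return the field pairs in which `record` falls outside the baseline's shape."""
    flagged = []
    for i, j in combinations(range(len(fields)), 2):
        cloud = [(row[i], row[j]) for row in baseline]
        if mahalanobis_sq_2d(cloud, (record[i], record[j])) > limit:
            flagged.append((fields[i], fields[j]))
    return flagged


fields = ("packets", "bytes_out", "dest_ports")
# Hypothetical baseline: bytes_out closely follows packets, dest_ports varies freely.
baseline = [(p, 10 * p + ((p * 7) % 5 - 2), (p * 3) % 7) for p in range(1, 41)]

# Each value is ordinary on its own, but few packets carrying many bytes is not.
print(unusual_projections(baseline, (5, 350, 3), fields))
```

A record like this would pass any per-field threshold. The anomaly shows up only when the fields are viewed together, and the flagged pair points back to the original fields to investigate.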

Masergy’s system adapts to the network being monitored. Instead of applying a one-size-fits-all approach, Masergy UES detects events differently on different networks.

Correlation analysis

As the dictionary tells us, a correlation happens when two or more things go together in ways not expected from chance alone. Most security solutions use correlation as part of their monitoring. In what’s typically the last step before sending an alert, these solutions use rules-based systems to correlate data sets.

Masergy’s platform takes correlation one step further. Masergy UES continuously identifies, analyzes, and correlates typical network traffic, alerts, and packet behaviors over long periods of time. It then deploys unique methods to detect and thwart reconnaissance activity prior to an attack. UES also dramatically reduces the number of false positives.

Masergy UES does all this by building behavioral profiles that far exceed the traditional frequency, threshold, and NetFlow-based detection methods used by other security products. To achieve this tight correlation between dissimilar data sources, UES tightly integrates five important data sets directly into its engine:

  • Vulnerabilities: Masergy’s platform includes an integrated vulnerability scanner that maintains a fresh profile of all vulnerabilities on the network. This helps the system better understand the attack surface of the network, not only guaranteeing that the correct signatures are loaded in an intrusion-detection system, but also allowing the behavioral engine to adjust its threat profiles based on known vulnerabilities and server locations. For instance, a server receiving live network traffic from the Internet would get much higher scrutiny than an internal server with little traffic.
  • Intrusion detection: An integrated, knowledge-based intrusion detection system is essential to finding attack vectors of known exploits. By itself, such a system cannot find unknown or zero-day exploits. But fully integrated into Masergy UES, the intrusion detection system also serves as a data source that feeds the behavioral analytics engine. For example, this intrusion detection system could report on which types of exploits a hacker is trying. It could even directly correlate with vulnerability information to predict which attack types are most likely to be effective.
  • Log capture and analysis: Masergy UES captures and analyzes logs from any log-producing device, application, proxy, or service; it can also provide information about applications that have no network presence. First, the system examines all log information using its rules-based engine. Then it organizes the logs into a unified format and sends them through a learning model. This can detect brute-force password attempts at local workstations as well.
  • Threat intelligence: Masergy UES is fully managed and continuously monitored by Masergy’s staff of certified security experts. In essence, the company becomes an extension of an organization’s team, reinforcing the intelligence that UES learns by correlating all attack vectors. This threat intelligence also provides feeds of information that the analysis engine uses to correlate and prioritize data, such as Internet hosts used as attack sources and services exploited.
  • Vendor disclosures: Masergy keeps up with the industry. Whenever hardware and software vendors reveal new security issues with their products, the Masergy UES analysis engine correlates that information. In this way, UES determines which systems have new exploits that could work against them.
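
As a rough illustration of how intrusion-detection alerts might be correlated with vulnerability and exposure data, consider the Python sketch below. The host names, the placeholder vulnerability IDs, and the scoring weights are all hypothetical. It shows the general idea only, not how UES scores events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    target: str      # host the exploit attempt was aimed at
    exploit_id: str  # placeholder ID of the weakness being attacked


def prioritize(alerts, vulnerabilities, internet_facing):
    """Rank alerts: attempts against a known-vulnerable or exposed host rise to the top."""
    scored = []
    for alert in alerts:
        score = 1  # every alert starts with a base score
        if alert.exploit_id in vulnerabilities.get(alert.target, set()):
            score += 5  # scanner says this host really has that weakness
        if alert.target in internet_facing:
            score += 2  # exposed servers get higher scrutiny
        scored.append((score, alert))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical scanner output and network layout.
vulnerabilities = {"web01": {"VULN-001"}, "db01": {"VULN-002"}}
internet_facing = {"web01"}
alerts = [
    Alert("db01", "VULN-002"),
    Alert("web01", "VULN-001"),
    Alert("web01", "VULN-003"),
]

for score, alert in prioritize(alerts, vulnerabilities, internet_facing):
    print(score, alert.target, alert.exploit_id)
```

The attempt most likely to succeed, an exploit aimed at an exposed server that is known to be vulnerable to it, ends up first in the queue.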

If and when new information is required, Masergy UES can immediately launch an appropriate service. Correlation rules can be made quite strict, too. The data is written directly into the unified data set, allowing the same learning models to be used for feature detection.

While all of these data sources could be gathered externally, it’s far more powerful to control, analyze, and manage them from a single platform. That way, all data -- not only the mappable fields -- can be correlated, since they’re part of a universal data set. The result is dramatically improved prediction, detection, and protection against threats.

Historical data

After a burglary at your home or office, you’d probably want to strengthen your security appropriately. Thieves came in a window? Equip the windows with alarms. Bad guys pried open the front door? Add more robust locks.

But what if you couldn’t detect how the burglars got in? That’s essentially the problem with many of today’s network security systems. They use standard learning models, meaning the model is trained not with actual historic data, but instead with collective sample data that’s later distributed to all systems. From the vendor’s point of view, it’s an approach that’s relatively easy. But for the user, security decisions can become difficult. With only collective samples, network and security managers are stuck working with sparse data from equally sparse sources.

Raw data matters. Reducing the richness of the raw data also reduces the effectiveness of threat detection. Starting with too little data results in numerous false positives and false negatives. Also, data profiles or culled data are forms of data reduction, meaning they’re useless for other learning models, including entirely new models.

That’s why Masergy UES maintains packet headers for at least 14 days. This helps the system maintain a large enough set of historical data to be used for effective security. And for the best possible network protection, UES employs every field in the packet headers, not a select few.

Masergy UES captures and retains copious amounts of data, allowing it to spot important correlations among seemingly unlikely data sets. The very improbability of correlation among these data sets is precisely what makes them a rich resource for anomaly detection. And the easiest way to build a large historical data set is to maintain the raw data used in past analysis runs.

Masergy UES also includes a data prediction gradient that matches data with learning models to produce stable, predictable patterns. For Masergy’s analysis engine, unlike with some other systems, more data is better than less.

With more conventional approaches, anomaly detection is often challenged by a small data set that never allows predictability. Conversely, a large data set can overwhelm the system so that it fails to produce results fast enough, or it can present so much variation that the system fails to detect anomalies. In addition, noisy data sources, bad clusters, or transient data can automatically fall to the bottom of the gradient, meaning the model will not use them for detection.

One key to creating a data gradient is regression testing. If the prediction of a model suddenly fails when it has rarely done so before, this either means the outcome is a true anomaly or that the model is no longer valid for the data. To make this determination, Masergy once again uses historical data, this time to perform regression analysis against the current model. Because anomaly detection suffers from sparse data, UES maintains an abundance of historic data for the local learning models.
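
The decision described above, true anomaly or stale model, can be sketched in Python as a simple back-test: when a prediction fails, the model is replayed over recent history to see whether it has been failing all along. The function names, the seasonal toy model, the window of eight points, and the 30 percent miss-rate cutoff are all assumptions made for illustration, not Masergy's method.

```python
def seasonal_model(period):
    """Toy model: the next value repeats the value one period earlier."""
    def predict(past):
        return past[-period]
    return predict


def diagnose(history, new_value, predict, tolerance, recent=8, max_miss_rate=0.3):
    """Classify a new observation as normal, a true anomaly, or a sign of a stale model.

    `history` must be long enough for the model to predict every point
    in the last `recent` positions.
    """
    if abs(new_value - predict(history)) <= tolerance:
        return "normal"

    # The prediction failed, so replay the model over recent historical data.
    start = len(history) - recent
    misses = sum(
        1
        for t in range(start, len(history))
        if abs(history[t] - predict(history[:t])) > tolerance
    )
    if misses / recent > max_miss_rate:
        return "model_invalid"  # the model has been failing for a while
    return "anomaly"            # the model was reliable until this point


model = seasonal_model(period=4)
stable = [10, 20, 30, 40] * 5
drifted = [10, 20, 30, 40] * 3 + [12, 50, 7, 33, 41, 9, 60, 15]

print(diagnose(stable, 10, model, tolerance=3))    # normal
print(diagnose(stable, 95, model, tolerance=3))    # anomaly
print(diagnose(drifted, 100, model, tolerance=3))  # model_invalid
```

The back-test is only possible because the raw history is still available, which is the practical argument for retaining it.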

In essence, Masergy UES remembers how the bad guys got in and what “normal” looks like. Then it uses these memories -- and this data -- to keep networks safe.

Time-based analysis

A communication network is, among other things, a temporal environment. In this context, the network can be said to consist of layers of timed events. Often, these timed events are mutually synchronized. That is, certain events must occur in a specific sequence to allow further communications.

Even complete communications occur in sequences of time. For example, a browser cannot connect to a website unless the website’s name is first resolved to an address. Loading the page, in turn, opens connections to different locations to gather all the components to be displayed. Even the user’s website visit is part of a larger temporal pattern, one that is made up of typing, clicking, and scrolling. All of these actions occur over a span of time and can be observed for time-based analysis.

Let’s say you determine through time-based observation that your staff does nearly all of its Web browsing during normal, Monday-through-Friday work hours. On weekends and late evenings, you find, Web browsing is almost nonexistent. In this example, these are time-based observations, and they can help with all sorts of network planning, processing, and protection.

That’s why the Masergy UES platform uses long-term storage and behavioral profiles to analyze data over long periods of time. Masergy understands the importance of time to security. In our approach, every given piece of data that operates on a temporal system is analyzed using appropriately temporal techniques.
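
The Web-browsing example above lends itself to a short Python sketch of a time-based profile. It buckets past events by weekday and hour, then flags activity in buckets that have almost never been busy. The function names, the sample dates, and the 1 percent share cutoff are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime


def build_profile(timestamps):
    """Count historical events per (weekday, hour) bucket; Monday is weekday 0."""
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


def is_unusual(profile, ts, min_share=0.01):
    """True when the event's time bucket holds under min_share of past activity."""
    total = sum(profile.values())
    if total == 0:
        return True  # nothing learned yet, so nothing counts as normal
    return profile[(ts.weekday(), ts.hour)] / total < min_share


# Hypothetical history: browsing only Monday to Friday, 09:00 through 17:59.
history = [
    datetime(2024, 1, day, hour, minute)
    for day in range(1, 6)          # January 1 to 5, 2024 (Monday to Friday)
    for hour in range(9, 18)
    for minute in range(0, 60, 6)   # ten events per hour
]
profile = build_profile(history)

print(is_unusual(profile, datetime(2024, 1, 6, 3, 0)))   # Saturday, 3 a.m.
print(is_unusual(profile, datetime(2024, 1, 3, 10, 0)))  # Wednesday, 10 a.m.
```

A longer history makes the profile more trustworthy, which is one reason long-term storage matters for this kind of analysis.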
