Splunk started its life as a log analysis system and has since grown into a general solution for analyzing and acting on machine-generated data.
With Splunk Enterprise 6.5, the company's enterprise-level offerings now feature machine learning, an ingredient that's all but obligatory for any big data product. But Splunk's approach is less opaque than most, and it encourages enterprise devs to build with it instead of merely deploying it.
Splunk has two offerings for machine learning: a prepackaged set of functions for common use cases, and a developer toolkit for building custom machine learning models that can be run against data harvested with Splunk.
Start with the easy stuff
Enterprises getting their feet wet with Splunk, machine learning, or both can start with the bundled solution sets: Splunk IT Service Intelligence, Splunk User Behavior Analytics, and Splunk Enterprise Security.
All of these focus on common business problems where enterprises have to paw through mountains of data to find answers. For instance, if you want to use machine intelligence to guard against outside attacks or insider threats, you'd most likely use some kind of anomaly detection algorithm. But you'd need to ensure that the algorithm can adapt intelligently and not get swamped by natural changes in behavior.
Splunk says that in cases like these, where the problem is already well known and well defined, solutions should be provided in a form enterprises can use immediately, rather than forcing them to reinvent the wheel.
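To make the idea of an adaptive detector concrete, here is a minimal Python sketch. It is not Splunk's algorithm or any Splunk API; it assumes a simple exponentially weighted running mean and variance, and the class name, the `flag_anomalies` helper, and the `alpha`, `threshold`, and `warmup` parameters are all hypothetical choices made for illustration.

```python
import math


class AdaptiveAnomalyDetector:
    """Flags values that sit far from an exponentially weighted baseline.

    Illustrative only: the parameter names and defaults are assumptions,
    not taken from any Splunk product.
    """

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline follows new data
        self.threshold = threshold  # number of standard deviations that counts as anomalous
        self.warmup = warmup        # observations to see before flagging anything
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, value):
        """Return True if `value` looks anomalous, then fold it into the baseline."""
        if self.mean is None:
            # First observation just seeds the baseline.
            self.mean = float(value)
            self.count = 1
            return False

        deviation = value - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.count >= self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )

        # Every value updates the baseline, so a lasting shift in behavior
        # is eventually absorbed instead of being flagged forever.
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.mean += self.alpha * deviation
        self.count += 1
        return is_anomaly


def flag_anomalies(values, **kwargs):
    """Run a fresh detector over `values` and return one boolean per value."""
    detector = AdaptiveAnomalyDetector(**kwargs)
    return [detector.update(v) for v in values]
```

Because each observation nudges the baseline, a gradual change in normal behavior shifts what counts as "normal," while a sudden spike still stands out. The trade-off is that a large spike also inflates the variance for a while, which is one reason production systems use more careful methods.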
Break out the toolkit
The newly unveiled Machine Learning Toolkit is likely to be the most useful offering. It leverages Python, which has a wide user base in machine learning and scientific computing, to allow users to develop their own machine learning models that can be run on Splunk data.
As such, the Machine Learning Toolkit helps users build solutions that aren't covered by the out-of-the-box machine learning offerings, or customize an existing solution to fix some knotty internal problem.
Splunk provides examples for the Toolkit that cover pretty common scenarios -- for example, detecting outliers in server response time or forecasting the number of employee logins for a given time period. The Toolkit also includes visualizations that can be used to display results from the algorithms on Splunk's dashboard UI.
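As a rough illustration of the first scenario, the following Python sketch flags outliers in a batch of server response times. It does not use the Toolkit or reproduce Splunk's examples; it assumes a standard robust technique (the modified z-score, based on the median and the median absolute deviation), and the function name, the sample data, and the `cutoff` default of 3.5 are assumptions chosen for the example.

```python
from statistics import median


def find_response_time_outliers(response_times_ms, cutoff=3.5):
    """Return (index, value) pairs whose modified z-score exceeds `cutoff`.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by extreme values than the mean and standard deviation.
    """
    if not response_times_ms:
        return []

    med = median(response_times_ms)
    mad = median([abs(x - med) for x in response_times_ms])
    if mad == 0:
        # No spread around the median: nothing can be scored meaningfully.
        return []

    outliers = []
    for index, value in enumerate(response_times_ms):
        # 0.6745 rescales the MAD so scores are comparable to z-scores
        # for normally distributed data.
        score = 0.6745 * abs(value - med) / mad
        if score > cutoff:
            outliers.append((index, value))
    return outliers


# Hypothetical sample: response times in milliseconds with one slow request.
sample = [120, 130, 125, 128, 122, 900, 127, 124]
slow_requests = find_response_time_outliers(sample)
```

In this made-up sample, only the 900 ms request is reported. A median-based score is used here because a single extreme value would drag a mean-based baseline toward itself and could hide the very outlier being sought.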
It's the data, stupid
All of this allows Splunk's products to be enriched with machine learning without becoming a black box. It's too easy to claim something is "powered by machine learning" without providing any details about what's going on under the hood. The most promising advances in machine learning come from open source toolkits that can be used by everyone and inspected by those who need to know what's going on underneath.
Machine learning is more about data than algorithms, and Splunk has always been a data company. That makes it far better suited to incorporating machine learning than many other outfits might be, and Splunk has taken the extra step of adding machine learning that's open-ended, not closed.