The new version of the Splunk machine data search engine comes with a distributed indexing technology that could cut storage costs for customers who run the software as a high-availability service.
"The data that is being collected in Splunk is becoming more mission critical," said Sanjay Mehta, Splunk vice president of product marketing, explaining the need for distributed indexing.
Splunk Enterprise 5 can also generate reports more quickly than its predecessor, the company claims, and comes with new tools to link the software to third-party programs.
The Splunk search engine was designed to collect and index data generated by machines, such as log files from servers and routers. Administrators can use such data to troubleshoot problems and ensure smooth operations. The company has also pitched Splunk as a tool for business managers to collect and analyze operational intelligence.
This is the first version of Splunk to use a new indexing technology that builds replication into its routine operations. The software stores multiple copies of its index, which it uses to answer user queries, across different servers. If one server goes down, indexing continues on the remaining server or servers. When the downed server comes back online, it is updated with the data it missed. Users can get answers from any operational server, which makes the service more reliable.
"The index data is replicated as it is streaming into Splunk. You can make as many copies as you need," Mehta said. "We have a distributed architecture, so the query tier determines where to fulfill the queries."
With distributed indexing, organizations no longer need to keep backups on storage area networks (SANs) for fault tolerance, Mehta explained. Instead, they can store multiple copies of the index on commodity servers. "The software layer is providing the replication and availability," he said.
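To make the replication idea concrete, here is a toy Python model. It is a minimal sketch under stated assumptions, not Splunk's implementation. The class and method names (`IndexNode`, `ReplicatedIndex`, `ingest`, `fail`, `recover`, `search`) are hypothetical, and the model simply keeps one full copy of the index per server. Each event is written to every live copy as it streams in, a returning server catches up on what it missed, and any live copy can answer a query.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """Hypothetical server holding one full copy of the index."""
    name: str
    up: bool = True
    events: list[str] = field(default_factory=list)


class ReplicatedIndex:
    """Toy model (assumption): one index copy per node, written as data arrives."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = [IndexNode(n) for n in node_names]
        self.log: list[str] = []  # every ingested event, in arrival order

    def ingest(self, event: str) -> None:
        # Replicate while streaming: append the event to every node that is up.
        self.log.append(event)
        for node in self.nodes:
            if node.up:
                node.events.append(event)

    def fail(self, name: str) -> None:
        self._node(name).up = False

    def recover(self, name: str) -> None:
        # A node only misses the tail of the log while down, so copy that tail back.
        node = self._node(name)
        node.events.extend(self.log[len(node.events):])
        node.up = True

    def search(self, term: str) -> list[str]:
        # Any live node can answer, because each holds a full copy.
        for node in self.nodes:
            if node.up:
                return [e for e in node.events if term in e]
        raise RuntimeError("no index copy is available")

    def _node(self, name: str) -> IndexNode:
        return next(n for n in self.nodes if n.name == name)


if __name__ == "__main__":
    idx = ReplicatedIndex(["idx1", "idx2", "idx3"])
    idx.ingest("ERROR disk full on web01")
    idx.fail("idx1")                       # indexing continues on idx2 and idx3
    idx.ingest("ERROR timeout on db02")
    print(idx.search("ERROR"))             # answered by idx2
    idx.recover("idx1")                    # idx1 catches up on the missed event
```

A real system would also have to decide how many copies to keep and where to place them; this sketch fixes the number of copies at the number of nodes to keep the example short.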
The company also offers software development kits (SDKs) for Java, Python and PHP, as previews. In addition, the company now offers versioned APIs (application programming interfaces), which allow third-party applications to keep working with Splunk even after Splunk itself is updated with new capabilities. Developers simply specify which version of the API they want to use.
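The following Python sketch illustrates the general idea behind versioned APIs. It is not Splunk's actual API: the endpoint name `search`, the handler functions and the response fields are all hypothetical. Because the client names the version it was written against, a server can add a new response format without breaking older clients.

```python
from typing import Callable


# Hypothetical handlers for one endpoint; the response shapes are invented.
def search_v1(query: str) -> dict:
    return {"results": [query.upper()]}


def search_v2(query: str) -> dict:
    # A later release changes the result format and adds a count field.
    return {"results": [{"text": query.upper()}], "count": 1}


# Each (endpoint, version) pair stays registered, so old clients keep working.
HANDLERS: dict[tuple[str, int], Callable[[str], dict]] = {
    ("search", 1): search_v1,
    ("search", 2): search_v2,
}


def call_api(endpoint: str, version: int, query: str) -> dict:
    """Dispatch to the handler for the version the client asked for."""
    try:
        handler = HANDLERS[(endpoint, version)]
    except KeyError:
        raise ValueError(f"unsupported version {version} for {endpoint!r}") from None
    return handler(query)


if __name__ == "__main__":
    print(call_api("search", 1, "err"))  # an older client still gets the v1 shape
    print(call_api("search", 2, "err"))  # a newer client opts into v2
```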
Opening Splunk to third-party developers has been a company goal since the start of the year, Mehta said. Letting administrators and developers extend Splunk's functionality to other applications means Splunk can be used in more systems. For example, call center IT support staff could use Splunk search results to pinpoint problems more precisely. Splunk data can be fed into business intelligence tools such as Microsoft Excel. Developers could also build mobile applications on Splunk data, such as apps that monitor the temperature inside buildings or the health of patients.
"All these sources of data can now link directly through the API," Mehta said.
The new Splunk can also generate reports much faster, thanks to a new data summarization technology. "We maintain summaries on the indexes. The summaries are up-to-date and they are reusable by other available searches," Mehta said. Reports that used to take a few minutes to compile should now take only a few seconds, he said.
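Why summaries speed up reports can be shown with a small Python model. This is a hedged sketch of the general technique, not Splunk's summarization design. The class `SummarizedIndex` and the (hour, status) summary key are hypothetical. The summary is updated as each event arrives, so it stays current, and different reports can be built from it without rescanning the raw events.

```python
from collections import Counter


class SummarizedIndex:
    """Toy model (assumption): raw events plus a running count per (hour, status)."""

    def __init__(self) -> None:
        self.events: list[tuple[int, str]] = []  # raw (hour, status) events
        self.summary: Counter = Counter()         # (hour, status) -> count

    def ingest(self, hour: int, status: str) -> None:
        self.events.append((hour, status))
        # Update the summary incrementally so it is always up to date.
        self.summary[(hour, status)] += 1

    def report_by_scan(self, status: str) -> dict[int, int]:
        # Slow path: walk every raw event to build the report.
        out: dict[int, int] = {}
        for hour, s in self.events:
            if s == status:
                out[hour] = out.get(hour, 0) + 1
        return out

    def report_from_summary(self, status: str) -> dict[int, int]:
        # Fast path: read precomputed counts instead of raw events.
        return {hour: n for (hour, s), n in self.summary.items() if s == status}

    def totals_per_hour(self) -> dict[int, int]:
        # A different report that reuses the same summary.
        out: dict[int, int] = {}
        for (hour, _), n in self.summary.items():
            out[hour] = out.get(hour, 0) + n
        return out


if __name__ == "__main__":
    idx = SummarizedIndex()
    for h, st in [(0, "500"), (0, "200"), (1, "500"), (1, "500"), (2, "404")]:
        idx.ingest(h, st)
    print(idx.report_from_summary("500"))  # same answer as report_by_scan("500")
    print(idx.totals_per_hour())
```

The summary grows with the number of distinct (hour, status) pairs rather than with the number of events, which is where the speedup for large data sets would come from in a model like this.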
Splunk offers a version of its namesake software that can be downloaded at no cost. For customers in the United States, the commercial version starts at $5,000.