The new version of the Splunk machine data search engine comes with a distributed indexing technology that could cut storage costs for customers running the software as a high-availability service.
"The data that is being collected in Splunk is becoming more mission critical," said Sanjay Mehta, Splunk vice president of product marketing, explaining the need for distributed indexing.
Splunk Enterprise 5 can also generate reports more quickly than its predecessor, the company claims, and comes with new tools to link the software to third-party programs.
The Splunk search engine was designed to collect and index data generated by machines, such as log files from servers and routers. Administrators can use such data to troubleshoot problems and ensure smooth operations. The company has also pitched Splunk as a tool for business managers to collect and analyze operational intelligence.
This is the first version of Splunk to use a new indexing technology that incorporates replication into its routine operations. The software stores multiple copies of its index, which it uses to answer user queries, across different servers. If one server goes down, indexing continues on the remaining server or servers. When the downed server comes back online, it is updated with the data it missed. Users consulting Splunk can get their answers from any operational server, which increases the reliability of the service.
"The index data is replicated as it is streaming into Splunk. You can make as many copies as you need," Mehta said. "We have a distributed architecture, so the query tier determines where to fulfill the queries."
With distributed indexing, organizations will no longer need to keep backups on storage area networks (SANs) for fault-tolerant operations, Mehta explained. Instead, the organization can store multiple indexes on commodity servers, he said. "The software layer is providing the replication and availability," Mehta said.
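The replication behavior described above can be illustrated with a toy model. The following Python sketch is not Splunk code and does not use any Splunk API; the class names `IndexNode` and `ReplicatedIndex` and their methods are hypothetical, chosen only to show the idea of copying events to every live server as they stream in, catching up a returning server from a peer, and answering queries from any server that is up.

```python
class IndexNode:
    """Hypothetical stand-in for one commodity server holding an index copy."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.events = []  # ordered copy of the indexed events


class ReplicatedIndex:
    """Toy model of software-level replication (an assumption, not Splunk's design)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def ingest(self, event):
        # Replicate the event to every server that is currently up.
        live = [n for n in self.nodes if n.online]
        if not live:
            raise RuntimeError("no operational server to index on")
        for node in live:
            node.events.append(event)

    def recover(self, node):
        # Bring a downed server back by copying what it missed from a live peer.
        peers = [n for n in self.nodes if n.online and n is not node]
        if peers:
            source = max(peers, key=lambda n: len(n.events))
            node.events.extend(source.events[len(node.events):])
        node.online = True

    def search(self, term):
        # The query tier picks any operational server to answer from.
        for node in self.nodes:
            if node.online:
                return [e for e in node.events if term in e]
        raise RuntimeError("no operational server to answer the query")


a, b = IndexNode("a"), IndexNode("b")
index = ReplicatedIndex([a, b])
index.ingest("error disk full")
b.online = False                 # simulate a server failure
index.ingest("error timeout")    # indexing continues on server a
index.recover(b)                 # b catches up from a
a.online = False                 # queries now fall through to b
matches = index.search("timeout")
```

In this simplified model the copies live entirely on the servers themselves, which mirrors the point that the software layer, rather than shared SAN storage, provides the availability.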
The company also offers SDKs for Java, Python and PHP, all in preview mode. In addition, the company now offers versioned APIs (application programming interfaces), which allow third-party applications to continue to work with Splunk even after Splunk itself is updated with new capabilities. Developers will just have to specify which version of the API they want to use.
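The general idea behind API versioning can be sketched in a few lines of Python. This is an illustration only and does not reflect Splunk's actual endpoints or SDK calls; the version labels `v1` and `v2`, the `search` endpoint name, and the `call_api` function are all hypothetical. The point is that a client pinned to an older version keeps receiving the response shape it was written for, even after a newer version adds fields.

```python
def search_v1(query):
    # Original response shape: results only.
    return {"results": [query.lower()]}


def search_v2(query):
    # Later version adds a field; v1 clients never see it.
    results = [query.lower()]
    return {"results": results, "count": len(results)}


# Hypothetical registry mapping each API version to its endpoint handlers.
HANDLERS = {
    "v1": {"search": search_v1},
    "v2": {"search": search_v2},
}


def call_api(version, endpoint, *args):
    """Dispatch a request to the handler for the version the caller asked for."""
    try:
        handler = HANDLERS[version][endpoint]
    except KeyError:
        raise LookupError(f"unknown API version or endpoint: {version}/{endpoint}")
    return handler(*args)


old_client = call_api("v1", "search", "ERROR")
new_client = call_api("v2", "search", "ERROR")
```

Because the version is an explicit argument, adding a `v3` handler later would not change what existing `v1` or `v2` callers get back.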