With Hadoop turning into a one-size-fits-all repository for data, an array of search solutions built specifically for Hadoop has come to the fore over the past year. One of those contenders, LucidWorks, has joined with Hortonworks, one of the major Hadoop distributors, to offer the LucidWorks edition of the Solr search engine as a reference architecture for searches on the Hortonworks Data Platform, or HDP.
Hortonworks made news earlier this week with word of a new edition (2.1) of HDP, released neck-and-neck with a new revision of Pivotal's Hadoop-powered big-data offering. As before, the big advantage Hortonworks claims to bring is its pure open source roots: Anyone can pick up a copy of HDP and start working with it, no strings attached. No additional licensing fee is being charged for LucidWorks Solr as part of HDP, either; as with HDP itself, the only charge is for support.
Solr is based on Apache's own Lucene project and adds many features not found in Lucene itself that ought to appeal to those building next-generation data-driven apps, such as support for geospatial search.
The end-user advantages of Solr, according to Will Hayes, chief product officer for LucidWorks, lie in how it makes a broader variety of Hadoop searches possible for both less technical and more technical users. Queries can be written in natural language or expressed as more precise key/value pairs.
"If you think about the precision approach [to Hadoop data]," said Hayes, "you have to know what you're looking for. One of the things Solr will add on top of Hadoop is the ease of exploration of the data, a quick way for folks who perhaps have to do more precise access through SQL or Java APIs to explore the data in the lake, then be able to rapidly refine what they gain access to and what it means, and get better use for that data further down the road. They don't have to start with a SQL or an API call."
Another implication of Solr being able to return search results across a Hadoop cluster is that more data can be kept in Hadoop and not pretransformed for the sake of analytics. "This means not having to anticipate the questions [to ask] before you load the data," Hayes explained. "You want to collect everything, then answer questions, or expand the scope of those questions."
Solr will be rolled into HDP in several phases. In the first phase, Solr becomes available to customers in a sandbox on June 1. After that, Solr will be integrated directly into the next release of HDP, although no release schedule for that has been announced yet. Later on, Hortonworks plans to hook Solr up to Ambari, the management and monitoring component for Hadoop, for easier control of indexing speeds and alerting, among other aspects.
LucidWorks has also produced a version of Solr that's meant to join the ever-growing parade of open source or lower-priced products designed to steal some of Splunk's log-search thunder. Named SiLK, the new LucidWorks product combines Solr with several other open source log analysis tools that have made waves recently, such as Logstash and Kibana, as well as Apache Flume. Given that SiLK can also work with Hadoop, it's clear LucidWorks intends to help turn Hadoop into the ultimate data repository, not just because of what can be put into it, but because of how it can be pulled back out again.
This story, "LucidWorks, Hortonworks team up to be Hadoop's search engine," was originally published at InfoWorld.com.