The big data market is heating up, and unlike some overhyped trends (social media), it's pretty easy to pinpoint ROI with these tools.
When we put out calls for nominees through the Story Source Newsletter, HARO, Twitter, and other channels, we received more than 100 recommendations. Usually, when we get that many, a good chunk of them can be dismissed out of hand. Some are clearly science projects; others have zero funding, no management pedigree and a dubious value proposition, while a few are clearly the product of malarial hallucinations.
Not so this time. Very few of the startups we looked at were wacky long shots. Most were decent ideas, backed by real VC money and seasoned management teams.
Recently, we've changed how the final 10 startups to watch are selected. First, a big list of nominees is compiled on Startup50.com. (Check out the big data list of 42 nominees here.) Then, we let readers vote on their favorites.
This time around another wrinkle was introduced. Startups left off the big list can challenge specific startups on it, trying to steal their spot away. If the challenge is deemed to have merit, we'll set up a separate vote. Sqrrl and DataStax both fought their way onto the list of nominees through challenges.
All told, more than 11,000 people voted for their favorite big data startups, with Cloudant winning, SiSense coming in a close second and SumAll finishing a strong third.
This time around we weighted voting more heavily than normal. Usually, voting is given a weight of about 30 percent, and then we turn to other factors, such as funding, the pedigree of the management team and the viability of the startup's roadmap.
However, the entire list of 42 big data nominees (plus several others that initially escaped our notice) is ridiculously strong.
Take Xplenty, for instance. They finished eighth in voting, but we considered bumping them because the startup is only a year old, hasn't raised significant funding and doesn't yet have big-name customers. All marks against it.
Balancing those negatives is the fact that voting does matter, and roundups like this are best if they include a mix of top startups well on the way to reaching their potential along with some startups that are pretty much all potential right now.
As we started looking at potential replacements, we realized that any of the top 25 or so vote-getters could make solid arguments for inclusion.
Frankly, we could have slotted Platfora, Cloudmeter, CloudPhysics, Sqrrl, RainStor, Rocket Fuel or several others in Xplenty's place. Big data startups, unlike those in some other spaces, have real substance to them. They are building viable products that target real-world pain points (pain points businesses are willing to pay to solve -- today), and most big data startups are well-funded with solid management teams. It is just a really strong space.
So, Xplenty stuck. Yes, they're more raw potential than giant killer at this stage, but their coding-free Hadoop big data service is simple, easy to use and affordable for even the mid-market.
Now, it's your turn. Vote for your favorite big data startups, and we'll rank the top 10 and crown an overall winner.
What they do: Provide Database-as-a-Service (DBaaS).
Headquarters: Boston, Mass.
CEO: Derek Schoettle. Before Cloudant, he was vice president, CME Sales at Vertica Systems, which was acquired by HP in 2011.
Funding: Cloudant just closed a $12 million second round of funding in May. Devonshire Investors, Rackspace Hosting, and Toba Capital led the round, which included participation from current investors Avalon Ventures, In-Q-Tel, and Samsung Venture Investment Corporation. Cloudant has raised $16 million to date.
Why they're on this list: They finished first in Startup50.com voting, just upped their funding to $16 million and now claim more than 12,000 customers. According to Cloudant, the problem with databases is that if an application is successful, organizations often outgrow them. This is commonly referred to as the "App Store Effect." Even "scale-out" distributed databases and caches are limited by cluster hardware and partitioning schemes.
The Cloudant Database-as-a-Service (DBaaS) is a managed service purpose-built for data-driven Web and mobile application developers who want to handle big data workloads without ever having to deal with distributed database design, sharding, partitioning, backup, etc. Cloudant works by storing, analyzing, and distributing application data across a global network of data centers, delivering low-latency, highly available data layer performance, and pushing dynamic data closer to the edge.
Market potential and competitive landscape: According to Market Research Media, the worldwide NoSQL market is expected to reach $3.4 billion by 2018, with a compound annual growth rate (CAGR) of 21 percent between 2013 and 2018. The NoSQL market is expected to generate $14 billion in revenues over the period 2013-2018.
Cloudant is rather uniquely positioned at the moment. While Oracle and MySQL have been available on AWS, there aren't that many NoSQL DBaaS offerings out there. Joyent rolled one out earlier this year, and AWS's DynamoDB is in beta.
Cloudant claims a customer base of more than 12,000 multi-tenant customers, including Samsung, DHL, Monsanto, Salesforce.com (Heroku), SourceFire, Hot Head Games, Flurry, AppAdvice, and LiveMocha.
What they do: Provide a Hadoop-based big data platform.
Headquarters: Palo Alto, Calif.
CEO: Mike Olson, who was formerly CEO of Sleepycat Software, an embedded database company that was acquired by Oracle in 2006. After the acquisition, Olson spent two years at Oracle as VP for Embedded Technologies.
Funding: Cloudera has raised $140 million in venture capital to date. Its investors include Accel Partners, Greylock Partners, Ignition Partners, In-Q-Tel and Meritech Capital Partners.
Why they're on this list: Big data is hot, and Cloudera pioneered the Hadoop-based big data space. Moreover, they're sitting on a giant pile of VC cash and have a top-notch management team.
Frankly, we thought long and hard about leaving Cloudera off this list -- not because they don't belong, but because they've been doing well enough for long enough that we're not sure that the label "startup" really fits anymore.
However, they did well in Startup50.com voting (finishing in the top 10), and they pretty much proved the business case for Hadoop. Cloudera lets users query all of their structured and unstructured data to gain a view beyond what's available from relational databases. Cloudera recently released Impala, a new open-source interactive query engine for Hadoop that enables interactive querying on massive data sets in real time.
Market potential and competitive landscape: Gartner forecasts that big data will drive $34 billion in IT spending this year, increasing to $232 billion by 2016. Gartner also predicts that by 2015, 65 percent of packaged analytic applications with "advanced analytics" will include embedded Hadoop.
Cloudera clearly has first-mover advantage, but competitors include EMC, Pivotal, Hortonworks and MapR. Intel just entered the fray, as well. Customers include CBS Interactive, eBay, Expedia, Monsanto and Samsung.
What they do: Provide enterprise search tools to help navigate big data.
Headquarters: Redwood City, Calif.
CEO: Paul Doscher. Prior to LucidWorks, he was CEO of Exalead, an enterprise search company. Back in 2003, he became CEO and one of the principal founders of JasperSoft, an open-source business intelligence platform provider, and he later served as EVP of worldwide field operations for VMware.
Funding: Total venture funding stands at $16 million (from Granite Ventures, Walden International, In-Q-Tel and Shasta Ventures).
Why they're on this list: IT organizations are beginning to collect orders of magnitude more data than they gathered even a few years ago. Collecting data is one thing; however, making actual use of it is another. Enterprise search clearly has a role to play in terms of making big data accessible. The challenge is doing it in a way that other applications can utilize.
LucidWorks Search is designed to help developers build highly secure, scalable and cost-effective search applications, while providing a simple and comprehensive way to access open-source search technologies.
LucidWorks Big Data is an application development platform that integrates search capabilities into the foundational layer of big data implementations. The product is built on a foundation of key Apache open-source projects and enables organizations to quickly discover, access and evaluate large volumes of structured and unstructured data. LucidWorks Big Data and LucidWorks Search work hand-in-hand to accelerate and simplify the building of highly secure, scalable and cost-effective search applications.
Market potential and competitive landscape: According to Wikibon, the total big data market reached $11.4 billion in 2012, ahead of Wikibon's 2011 forecast. Wikibon believes that the market will reach $18.1 billion in 2013, an annual growth of 61 percent. This puts it on pace to exceed $47 billion by 2017. That translates to a 31 percent compound annual growth rate over the five-year period 2012-2017.
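For readers who want to sanity-check growth figures like these, here is a minimal Python sketch of the standard compound annual growth rate calculation. The function names are our own, and the inputs are the rounded numbers quoted above, so the results land near the stated percentages rather than exactly on them (the published forecast presumably rests on unrounded or revised data).

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.31 means 31 percent)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value reached after compounding start_value at `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


# Rounded figures quoted in the text, in billions of dollars.
market_2012 = 11.4
market_2013 = 18.1
target_2017 = 47.0

# One-year growth from 2012 to 2013.
growth_2013 = cagr(market_2012, market_2013, 1)

# Annual rate implied by growing from the 2012 figure to $47 billion in 2017.
implied_rate = cagr(market_2012, target_2017, 5)

# Where the 2012 figure ends up after five years at the stated 31 percent.
value_at_31_pct = project(market_2012, 0.31, 5)

print(f"2012-2013 growth: {growth_2013:.1%}")
print(f"Implied 2012-2017 CAGR for $47B: {implied_rate:.1%}")
print(f"2017 value at 31% CAGR: ${value_at_31_pct:.1f}B")
```

With these rounded inputs, the one-year growth comes out a little under 60 percent, and compounding at 31 percent for five years lands somewhat short of $47 billion, which is the kind of gap rounding alone can produce.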
Competitors include Endeca, Autonomy, and Elasticsearch.
ADP is a named customer.
What they do: Provide a Hadoop/NoSQL big data platform.
Headquarters: San Jose, Calif.
CEO: John Schroeder, who previously served as CEO of Calista Technologies, which was acquired by Microsoft. Before that, he was CEO of Rainfinity, which EMC purchased.
Funding: In March 2013, MapR Technologies raised $30 million in VC funding in a round led by new investor Mayfield Fund, with participation from existing investors Lightspeed Venture Partners, NEA and Redpoint Ventures. This brings total funding to $59 million.
Why they're on this list: MapR finished in the top 10 in Startup50.com voting, has impressive VC backing and a CEO who knows how to see startups through to successful exits.
MapR's platform merges Hadoop, NoSQL, database and streaming applications into one unified big data platform. Anyone with even a cursory knowledge of Hadoop knows that speed isn't one of its claims to fame. MapR claims to have overcome the speed obstacle, while also offering such enterprise-grade features as "high availability, business continuity, real-time streaming, standard file-based access through NFS, full database access through ODBC, and support for mission-critical SLAs."
Competitive landscape: Competitors include Cloudera, EMC, Pivotal, Hortonworks, and Intel.
Named customers include Ancestry, Rebicon and comScore.
What they do: Develop database technologies to enable "Fast Data."
Headquarters: Redwood City, Calif.
CEO: Mike Hummel, who previously co-founded Empulse, a portal solutions and software consulting company now specializing in Web 2.0 projects.
Funding: ParStream has secured $5.6 million in Series A funding from Khosla, Baker Capital, CrunchFund, Tola Capital and Data Collective.
Why they're on this list: Traditional databases just weren't designed for big-data-scale analytics, and they certainly aren't able to deliver those insights in real time. Traditional databases analyze data sequentially and aren't able to take advantage of advances in multi-core processing.
At CTIA 2013, CEO Michael Hummel noted that memory is a big bottleneck for traditional databases. Meanwhile, the big data database darling, Hadoop, has trouble scaling efficiently.
Hummel argues that ParStream's database was purpose-built for speed. Whereas many database platforms exist for the purpose of storing and analyzing large quantities of data, ParStream was designed to deliver faster response times and to reduce big data storage infrastructure costs in the process.
ParStream enables "Fast Data" by using a distributed architecture that processes data in parallel. ParStream was specifically engineered to deliver both big data and fast data, enabled by a unique High Performance Compressed Index (HPCI). According to ParStream, queries can run against the index while it is still compressed, which removes the extra step and time otherwise required to decompress data.
ParStream claims to provide sub-second response times on billions of data records while continuously importing new data.
Market potential and competitive landscape: Analysts see the big data market reaching anywhere from $18 billion (Wikibon) to $34 billion (Gartner) in 2013. Competitors include SAP HANA, Apache platforms and Vertica Systems (HP). Searchmetrics is a named customer, but Hummel assured me that more will be going on the record soon.
What they do: Provide database infrastructure software that simplifies the way database environments are deployed and managed.
Headquarters: Santa Clara, Calif.