Data mining service crawls billions of Web pages

Startup 80legs launches data mining service that leverages a 50,000-computer grid to search, crunch millions of Web pages in minutes

80legs has officially launched its service, which brings supercomputer-scale data mining of the Web to companies and even individuals.

The Houston, Texas-based startup leverages a grid of 50,000 servers to search and crunch millions of Web pages within minutes, CEO Shion Deysarkar told Computerworld on Monday ahead of the Demo Fall 09 conference in San Diego.

Target customers include market researchers looking to mine public opinion on a particular product or service, lawyers searching for copyright infringement and piracy, or online ad agencies looking to do competitive analysis of where rival firms are placing their ads, Deysarkar said.

Some individuals are even using the 80legs beta to research reviews and opinions on various wines. That research relies on an 80legs app that uses some natural language processing technology and is more sophisticated than simple Google keyword searches, Deysarkar said.

Each search will cost $2 per million pages crawled, plus 3 cents per CPU-hour used. Results for a search involving 1 million pages would be returned within 10 to 20 minutes, he said, and 80legs can search the entire Web if desired.
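To make the quoted pricing concrete, the short Python sketch below adds the two charges together. It is only an illustration based on the rates Deysarkar cited; the function name and the CPU-hour figure in the example are assumptions, not anything published by 80legs, and actual billing may differ.

```python
# Illustrative only: rates are the ones quoted in the article.
PRICE_PER_MILLION_PAGES = 2.00  # US dollars per million pages crawled
PRICE_PER_CPU_HOUR = 0.03       # US dollars per CPU-hour used


def estimate_crawl_cost(pages_crawled: int, cpu_hours: float) -> float:
    """Estimate a job's cost in US dollars (hypothetical helper)."""
    crawl_fee = PRICE_PER_MILLION_PAGES * pages_crawled / 1_000_000
    compute_fee = PRICE_PER_CPU_HOUR * cpu_hours
    # Round to whole cents for display.
    return round(crawl_fee + compute_fee, 2)


if __name__ == "__main__":
    # The 100 CPU-hours here is an assumed figure for illustration.
    print(estimate_crawl_cost(1_000_000, 100))
```

Under these assumptions, a 1-million-page crawl that consumed 100 CPU-hours would come to $5.00: $2.00 for crawling plus $3.00 for processing.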

Customers must fill out a job form and either select one of the semantic analysis or text extraction apps written by 80legs, or upload their own app, which must plug into either a Java or .Net application programming interface, or API.

80legs doesn't own its grid, but instead rents it from a fellow startup, Plura Processing, which shares the same venture capital firm, Creeris Ventures.

80legs originally planned to leverage Plura's grid to develop its own Webcrawling-based service, but later decided to "let other people develop their own services and ideas while we provide the crawling," Deysarkar said.

Deysarkar said Amazon Web Services (AWS) is 80legs' main competitor, though he claims companies that use AWS will face three disadvantages: 1) they will be able to leverage only a fraction of 80legs' 50,000-node grid; 2) they will have to go to the expense and trouble of writing their own Web-crawling app; and 3) they will pay more than twice as much in crawling and usage charges.

80legs plans to offer Perl and Python APIs in the future. And in two months, the company aims to release its own iPhone-like App Store for independent developers to sell apps to end users.

In contrast to Apple's App Store, developers will be able to set their own price and keep 100 percent of the revenue, Deysarkar said.

This story, "Data mining service crawls billions of Web pages," was originally published by Computerworld.
