Shopzilla buys into big data for fast meta-shopping

VoltDB handles terabytes of daily updates, with up to 100,000 database writes per second, for comparison shopping service

With a user base of more than 40 million shoppers worldwide, Shopzilla is a leader in connecting buyers and sellers online. Each month, through both its destination websites and affiliate network, Shopzilla connects shoppers with more than 100 million products from tens of thousands of retailers.

To provide the most recent inventory of products and prices, Shopzilla moved its inventory platform from a traditional relational database to VoltDB, a specialized, open source database for high-velocity big data.


Shopping the virtual mall

If you shop online, chances are you've landed at Shopzilla or its portfolio of brands -- including Bizrate, Beso, Retrevo, TaDa, RobotOatmeal, and others -- which help shoppers worldwide discover, compare, and purchase products.

Shopzilla's core value is that it presents an up-to-date inventory of products and prices, so shoppers don't need to surf endlessly across different sites. But until recently Shopzilla's inventory platform, which ingests terabytes of data from merchant feeds on a daily basis, was powered by a traditional relational database. The software, hardware, and administration resources needed to support high-velocity inventory updates were becoming prohibitively costly.

While researching database vendors, Shopzilla compared a number of NoSQL and sharded MySQL products before selecting VoltDB, an in-memory relational database designed specifically for high performance. The intent was to narrow the data "ingestion-to-decision gap" by running thousands of writes and tens of thousands of reads per second while performing real-time tracking.

Getting up to speed

Working with VoltDB, Shopzilla significantly boosted the rate at which it can process inventory data and derive actionable intelligence. Higher-velocity updates drive revenues by delivering near-real-time information to consumers and by passing along more highly targeted leads to the thousands of retailers paying Shopzilla on a pay-per-click basis.

On a simple three-node evaluation cluster supporting full durability, Shopzilla achieved 80,000 to 100,000 writes per second with VoltDB. Once the system was fully optimized and in production, that level of performance helped eliminate complicated caching and data pre-loading processes, simplifying Shopzilla's architecture and allowing its applications to interact directly with the database. The company also used VoltDB's point-in-time snapshot capability as a faster way to export inventory data for further analysis. In addition, Shopzilla gained the ability to filter offers coming into its system and remove duplicates. That reduced the transactional load downstream from 2,500 TPS to 650 TPS, allowing the company to save on hardware and operational expenses.
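
The duplicate filtering described above can be illustrated with a short sketch. This is not Shopzilla's implementation and does not use any VoltDB API; the record layout, the function names, and the in-memory dictionary standing in for a database lookup are all assumptions made for illustration. The idea is simply to forward an offer downstream only when it is new or differs from the last version seen. For scale, going from 2,500 TPS to 650 TPS means about 74 percent of incoming offers would be dropped at this stage (1 - 650/2500 = 0.74).

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical offer record: (merchant_id, sku, price_cents, title).
Offer = Tuple[str, str, int, str]


def offer_fingerprint(offer: Offer) -> str:
    """Hash the fields that decide whether an offer has changed."""
    merchant_id, sku, price_cents, title = offer
    # A plain "|" separator is good enough for a sketch; real feeds would
    # need an encoding that cannot collide with field contents.
    payload = f"{merchant_id}|{sku}|{price_cents}|{title}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def filter_changed_offers(
    offers: Iterable[Offer],
    seen: Dict[Tuple[str, str], str],
) -> Iterator[Offer]:
    """Yield only offers that are new or differ from the last version seen.

    `seen` maps (merchant_id, sku) to the fingerprint of the latest version.
    Here it is a dict; in a real system it would be a database lookup.
    """
    for offer in offers:
        key = (offer[0], offer[1])
        fingerprint = offer_fingerprint(offer)
        if seen.get(key) == fingerprint:
            continue  # identical to the stored version, so skip it
        seen[key] = fingerprint
        yield offer


feed = [
    ("m1", "sku-1", 1999, "Blue kettle"),
    ("m1", "sku-1", 1999, "Blue kettle"),  # exact duplicate
    ("m1", "sku-1", 1899, "Blue kettle"),  # price change, must pass through
    ("m2", "sku-9", 500, "Mug"),
    ("m2", "sku-9", 500, "Mug"),           # exact duplicate
]
seen: Dict[Tuple[str, str], str] = {}
forwarded = list(filter_changed_offers(feed, seen))
print(len(feed), "->", len(forwarded))  # prints "5 -> 3"
```

In this toy feed, two of the five offers are exact repeats and never reach the downstream system, while the price change is still forwarded.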

VoltDB's stored procedures allow Shopzilla to identify and fix errors in application updates before deployment. Shopzilla is also able to continuously update the sales and consumer feedback it provides to online merchants and retail advertisers, based on real-time analytics.

Since switching to VoltDB, Shopzilla has achieved its first milestone, rapid feed ingestion, with a five-fold increase in performance. It is the first step toward reducing overall latency in delivering accurate product and pricing information.

This article, "Shopzilla buys into big data for fast meta-shopping," is from Andrew Lampitt's Think Big Data blog.