Big data drives high performance for

Analysis of Web user clickstreams and machine-generated log files helps an auto shopping site increase revenue and defend against malicious bots

Web businesses have long recognized a direct link between revenue and the quality of the user experience. Measuring that experience at scale has become a leading big data analytics endeavor, one embraced by the auto shopping site profiled here. By analyzing data from 12 million monthly website visits, the company is optimizing the user experience for its shoppers and gaining deep operational insight and fraud-prevention capability into the bargain.

The site, where car shoppers find, learn about, and purchase vehicles, earns fees on car sales along with revenue from banner advertising placed around content on thousands of cars, trucks, SUVs, and vans from all major manufacturers. A fast user interface keeps shoppers on the site longer, which makes them more likely to buy vehicles and click on banner ads.

The company's application management team has three key goals for its website: maintaining high performance, protecting content, and tracking traffic sources for advertisers. Behind the scenes, bot and spider traffic is a persistent menace that degrades website performance. Some malicious bots also scrape content such as vehicle listings, which spammers then reuse on fake sites to lure unsuspecting consumers into giving up personal details.

Log files hold the key to identifying malicious activity and optimizing performance, but collecting all that weblog traffic data and analyzing it manually is cumbersome and time-consuming. Without real-time reports, the team resorted to overprovisioning its server infrastructure to keep page load times consistently fast. The company turned to Splunk to collect, index, search, and analyze machine-generated big data sets from a wide variety of sources in real time. At Splunk's core is its patented "machine data web," which organizes and interprets log data. Splunk also includes prebuilt reports that help the team distinguish illicit scraping and bot traffic from legitimate visitor traffic. The reports have become valuable sales tools, and they give internal stakeholders the evidence they need to act against unwanted traffic.
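The article doesn't describe how Splunk's prebuilt reports classify traffic, so the following is only a minimal sketch of the general idea in plain Python, not Splunk's method. It assumes Apache/NGINX "combined" log format input and applies two hypothetical heuristics: user-agent strings that self-identify as crawlers, and per-IP request counts above an illustrative threshold. The hint list and the `MAX_REQUESTS_PER_IP` value are assumptions chosen for illustration. Real scrapers often spoof browser user agents, so heuristics this simple would miss many of them.

```python
import re
from collections import Counter

# Assumed input: Apache/NGINX "combined" log format lines.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical heuristics, for illustration only.
BOT_AGENT_HINTS = ("bot", "spider", "crawler", "curl", "python-requests")
MAX_REQUESTS_PER_IP = 100  # illustrative ceiling per log window


def classify(lines, max_requests=MAX_REQUESTS_PER_IP):
    """Split client IPs into (bot_ips, human_ips) using simple heuristics."""
    hits = Counter()
    agent_flagged = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ip = m.group("ip")
        hits[ip] += 1
        agent = m.group("agent").lower()
        # Flag clients whose user agent self-identifies as automated.
        if any(hint in agent for hint in BOT_AGENT_HINTS):
            agent_flagged.add(ip)
    # Flag clients whose request volume exceeds the ceiling.
    volume_flagged = {ip for ip, n in hits.items() if n > max_requests}
    bots = agent_flagged | volume_flagged
    humans = set(hits) - bots
    return bots, humans


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [05/Feb/2012:18:30:00 -0800] "GET /new-cars/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [05/Feb/2012:18:30:01 -0800] "GET /used-cars/ HTTP/1.1" 200 2048 "-" "ExampleScraperBot/1.0"',
    ]
    print(classify(sample))
```

Keeping both signals as separate sets makes it easy to report *why* a client was flagged, which matters when the results are shown to stakeholders who will act on them.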

The return on investment is tangible and comes in two forms. First, efficient real-time data collection saves more than 400 man-hours annually. Second, it helps the team accommodate peak traffic periods. During the 2012 Super Bowl, for example, detailed performance statistics provided operational insight that cut server and administration costs by approximately $160,000.

Splunk has "enabled us to tackle a larger set of problems in less time," explains Jon Abend, manager of technical operations. "Beyond Web logs, in real-time we can now easily analyze application logs, application servers, middleware components, system metric logs, etc. The variety of types of users -- performance engineers, middleware teams, search engine marketing teams, etc. -- may not have been so broad without this ability to manage an equally broad variety of systems."

Since early this year, the company's big data environment has managed more than 35 terabytes of data. It adds 2.5 million weblog entries per hour, roughly an additional terabyte per week, and handles more than 750 million queries per month. With that kind of analytic insight, the site looks set to stay in the fast lane for a long time to come.
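The throughput figures above can be cross-checked with simple arithmetic. The sketch below assumes that each "weblog" is one log event, that a terabyte is decimal (10^12 bytes), and that a month has 30 days; none of these units are stated in the article. Under those assumptions, 2.5 million events per hour is 420 million events per week, so one terabyte per week implies an average of roughly 2.4 KB per event. The stored size may include indexing overhead, so treat this only as a rough average.

```python
# Figures quoted in the article; unit conventions are assumptions.
EVENTS_PER_HOUR = 2_500_000
BYTES_PER_WEEK = 10**12          # assumes a decimal terabyte
QUERIES_PER_MONTH = 750_000_000
HOURS_PER_WEEK = 24 * 7
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def implied_rates():
    """Derive per-event size and per-second rates from the quoted totals."""
    events_per_week = EVENTS_PER_HOUR * HOURS_PER_WEEK
    return {
        "events_per_week": events_per_week,
        "avg_bytes_per_event": BYTES_PER_WEEK / events_per_week,
        "events_per_second": EVENTS_PER_HOUR / 3600,
        "queries_per_second": QUERIES_PER_MONTH / SECONDS_PER_MONTH,
    }


if __name__ == "__main__":
    for name, value in implied_rates().items():
        print(f"{name}: {value:,.1f}")
```

Run as a script, this prints about 420,000,000 events per week, 2,381 bytes per event, 694.4 events per second, and 289.4 queries per second.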

This article, "Big data drives high performance for," originally appeared in Andrew Lampitt's Think Big Data blog.

Copyright © 2012 IDG Communications, Inc.
