Credit bureaus like Experian, TransUnion, and Equifax have been amassing personal data on consumers for decades, mainly to sell to marketers for targeted campaigns. To cope with demand for real-time updates of consumer touch points -- social media, Web browsing, and so on -- Experian moved from the mainframe to a scale-out architecture of Hadoop and HBase. The company now aims to process 100 million records per hour, including geographic, demographic, and lifestyle data.
With $4 billion in revenue last year and more than 15,000 employees, Experian is a leader in credit reporting and marketing services. The company has a long history of managing consumer data: who consumers are, what they buy, how they're connected, and how they interact. In the past, Experian's mainframes batch-processed consumer database updates on a monthly basis and its clients applied those updates for campaign adjustments. That was fine -- before the Web, mobile devices, and social media.
Today's consumers leave a digital "exhaust" of data detailing their purchasing behaviors, online browsing patterns, email responses, and social media activity. Experian's mainframe systems were approaching the limits of what they could handle as that data volume surged.
And Experian's marketer clients, the top retail companies in the world, sought an integrated view of that data so they could respond in real time. They wanted to know, for example, if a particular customer in one store is the same customer now liking the company on Facebook or tweeting about it on Twitter.
To meet those new demands, Experian created its Cross-Channel Identity Resolution (CCIR) engine to maintain a repository of interconnected client touch points. The company identified about 30 requirements for the underlying technology, most importantly batch and real-time processing, as well as scalability.
Experian chose two open source technologies: Hadoop, the distributed batch processing big data framework, and its closely associated NoSQL database, HBase (Hadoop Database), for its real-time processing. HBase was also chosen for its redundancy and fault-tolerant architecture, optimized for storage.
The company selected Cloudera's distribution of both Hadoop and HBase. It also opted for Cloudera Manager, proprietary monitoring and administration software with a rich set of features. Cloudera's enterprise service-level agreement for technical support was another deciding factor.
Experian is delighted with the solution, which accelerates processing performance by 50 times at a fraction of the cost of the legacy environment. It believes the environment is the first data management platform to ingest data and link diverse information together -- postal addresses, social media IDs, email addresses, Web cookies, phone numbers, and so on -- across the entire marketing ecosystem, and assemble it into a format that pleases customers.
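The linking step described above can be pictured with a small sketch. The article does not say how CCIR is implemented, so everything here is an assumption for illustration: the `IdentityGraph` class, its method names, and the sample identifiers are hypothetical, and union-find is simply one common way to group identifiers that have been observed together.

```python
from collections import defaultdict


class IdentityGraph:
    """Hypothetical sketch: union-find over touch points.

    A touch point is a (kind, value) tuple such as ("email", "pat@example.com").
    This is not Experian's CCIR implementation, only an illustration.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        # Register unseen touch points as their own group.
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def link(self, a, b):
        """Record that touch points a and b belong to the same consumer."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_consumer(self, a, b):
        """Return True if a and b have been linked, directly or transitively."""
        return self._find(a) == self._find(b)

    def profiles(self):
        """Return each linked group of touch points as a set."""
        groups = defaultdict(set)
        for node in list(self._parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


# Made-up sample data: two consumers, five touch points.
graph = IdentityGraph()
graph.link(("email", "pat@example.com"), ("cookie", "c-123"))
graph.link(("cookie", "c-123"), ("twitter", "@pat"))
graph.link(("phone", "555-0100"), ("postal", "1 Main St"))

linked = graph.same_consumer(("email", "pat@example.com"), ("twitter", "@pat"))
unlinked = graph.same_consumer(("email", "pat@example.com"), ("phone", "555-0100"))
profile_count = len(graph.profiles())
```

In this sketch the email address and the Twitter handle end up in the same profile because both were seen with the same cookie, which is the kind of cross-channel match the retailers in the article were asking for. A production system would also need rules for deciding when two identifiers should be linked in the first place, which the sketch leaves out.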
Now Experian clients have the most accurate, recent view of who their customers are across multiple channels, yielding informed interactions and positive consumer experiences. When messages are relevant, consumers welcome the contact rather than feeling that their time is being wasted.
This article, "Experian credits big data for real-time marketing," was originally published at InfoWorld.com. Read more of Andrew Lampitt's Think Big Data blog, and keep up on the latest developments in big data at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.