How to build a big data supply chain

To get the most from big data, you must marshal new infrastructure and develop new collaborative processes. John Haddad of Informatica provides salient examples

The bigger big data gets, the more challenging it becomes to manage and analyze to deliver actionable business insight. That's a little ironic, given that the main promise of big data is the ability to make better business decisions based on compute-intensive analysis of massive data sets. The solution is to create a supply chain that identifies business goals from the start -- and deploys the agile infrastructure necessary to make good on those objectives.

In this week's New Tech Forum, Informatica senior director of product marketing John Haddad details four common use cases that help illustrate how a properly constructed big data architecture can deliver results in the real world. -- Paul Venezia

For decades, IT has relied on conventional business intelligence and data warehousing, with well-defined requirements and pre-defined reports.

In the new world of big data analytics, discovery is part of the process, so objectives shift as new insights emerge. This requires an infrastructure and process that can quickly and seamlessly go from data exploration to business insight to actionable information.

To swiftly transform data into business value, a big data architecture should be seen as a supply chain that can manage and process the volume, variety, and velocity of data. To get started, every company needs a big data process. That process is divided into three steps:

1. Identify business goals
No one should deploy big data without an overall vision for what will be gained. The foundation for developing these goals is your data science and analytics team working closely with subject matter experts. Data scientists, analysts, and developers must collaborate to prioritize business goals, generate insights, and validate hypotheses and analytic models.

2. Make big data insights operational
It's imperative that the data science team works in conjunction with the devops team. Both groups should ensure that insights and goals are operational, with repeatable processes and methods, and that actionable information is communicated to stakeholders, customers, and partners.

3. Build a big data pipeline
The data management and analytics systems architecture must facilitate collaboration and eliminate manual steps. The big data supply chain consists of four key operations necessary for turning raw data into actionable information:

  • Acquire and store: Access all types of data from any platform at any latency through adapters to operational and legacy systems, social media, and machine data, with the ability to collect and store data in batch, real-time, and near-real-time modes.
  • Refine and enrich: Integrate, cleanse, and prepare data for analysis, while collecting both technical and operational metadata to tag and enrich data sets, making them easier to find and reuse.
  • Explore and curate: Browse and visualize data to discover patterns, trends, and insights with potential business impact; curate and govern those data sets that hold the most business value.
  • Distribute and manage: Transform and distribute actionable information to end-users through mobile devices, enterprise applications, and other means. Manage and support service-level agreements with a flexible deployment architecture.
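To make the four operations concrete, here is a minimal sketch of the supply chain as a pipeline of stages. The data, function names, and cleansing rules are illustrative assumptions, not part of any particular product; in practice each stage would be backed by real integration and analytics infrastructure rather than plain Python.

```python
# Illustrative stand-ins for the four supply chain operations.
# The sample records and field names are assumptions for the sketch.

raw_events = [
    {"user": "alice", "action": "purchase", "amount": "42.50"},
    {"user": "bob",   "action": "purchase", "amount": "n/a"},
    {"user": "alice", "action": "view",     "amount": ""},
]

def acquire_and_store(events):
    """Land raw records as-is, tagging each with its source system."""
    return [{**e, "_source": "web_clickstream"} for e in events]

def refine_and_enrich(records):
    """Cleanse the data: keep only records with a parseable amount."""
    refined = []
    for r in records:
        try:
            refined.append({**r, "amount": float(r["amount"])})
        except ValueError:
            continue  # discard records that fail cleansing
    return refined

def explore_and_curate(records):
    """Surface a pattern worth curating: total spend per user."""
    totals = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    return totals

def distribute(insight):
    """Publish actionable information, highest-value users first."""
    return sorted(insight.items(), key=lambda kv: -kv[1])

result = distribute(
    explore_and_curate(refine_and_enrich(acquire_and_store(raw_events)))
)
print(result)  # → [('alice', 42.5)]
```

The point of the sketch is the shape, not the code: each stage consumes the previous stage's output, so the whole chain can be automated end to end with no manual hand-offs.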

Once the process is established, the big data reference architecture can support these four common big data use case patterns, which enable actionable business intelligence: data warehouse optimization, 360-degree customer analytics, real-time operational intelligence, and managed data lakes.

Data warehouse optimization
As data volumes grow, companies spend more and more on the data warehouse environment. The problem arises when capacity in the environment is consumed too quickly, which ultimately forces organizations into costly upgrades in storage and processing power.

One way to cope with high-volume data growth is to deploy Hadoop, which presents an inexpensive solution for storing and processing data at scale. Instead of staging raw data that comes from the source systems into the warehouse, simply store original source data in Hadoop. From there, you can prepare and pre-process the data before moving the results (a much smaller set of data) back into the data warehouse for business intelligence and analytical reporting. Hadoop does not replace the traditional data warehouse, but it provides an excellent, complementary solution.
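The offload pattern described above can be sketched in miniature. Here a plain Python list stands in for raw detail data landed in Hadoop, and a MapReduce-style aggregation stands in for the preprocessing job; only the much smaller summary is what would be loaded into the warehouse. All names and data are illustrative assumptions.

```python
# Sketch of warehouse offload: heavy aggregation happens on the raw
# data (in Hadoop), and only the small result moves to the warehouse.

hadoop_raw_detail = [  # stand-in for millions of rows in HDFS
    {"store": "east", "sku": "A", "qty": 3},
    {"store": "east", "sku": "B", "qty": 1},
    {"store": "west", "sku": "A", "qty": 5},
]

def preprocess_in_hadoop(rows):
    """MapReduce-style rollup: map each row to (store, qty), then
    reduce by summing quantities per store."""
    totals = {}
    for r in rows:                      # "map" phase
        key, value = r["store"], r["qty"]
        totals[key] = totals.get(key, 0) + value  # "reduce" phase
    return totals

# Only this compact summary is loaded into the data warehouse.
warehouse_summary = preprocess_in_hadoop(hadoop_raw_detail)
print(warehouse_summary)  # → {'east': 4, 'west': 5}
```

The economics follow from the size asymmetry: raw detail stays on inexpensive Hadoop storage, while the warehouse holds only the aggregates that reporting actually needs.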
