Big data is quickly becoming an important part of large-scale business operations. But it's never been very quick itself, due to limitations on how that data is stored, manipulated, and retrieved.
Platfora has developed a solution that gives business analysts self-service access to big data, rather than requiring IT to maintain fixed-purpose reporting and analytics that may fail to deliver the information businesses need to make timely, effective decisions.
In this edition of the New Tech Forum, Ben Werther, CEO of Platfora, gives us a look at how big data can be used for agile business intelligence, without the traditional hang-ups and sluggish performance. -- Paul Venezia
Bringing big data into focus with a better lens
Big data analytics today tends to suffer from an inherent contradiction: To gain competitive advantage, many companies are jumping on big data technologies, which enable them to process raw data in new ways -- yielding sharper and much more timely business intelligence. Yet the traditional processes for extracting business intelligence from big data and sharing it throughout the organization are anything but fast.
Without question, the Apache Hadoop open source project has helped to advance big data analytics. Hadoop scales massively and provides a framework for distributed processing of very large data sets across clusters of cost-effective commodity hardware. Hadoop's flexible "schema on read" approach lets companies store data first and define a schema later, when the data is read, instead of being constrained by the traditional database "schema on write" model, which requires the schema up front. But Hadoop has limitations that must be overcome if businesses want to take full advantage of their raw data in all its forms.
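To make the schema-on-read distinction concrete, here is a minimal sketch in plain Python (not Hadoop code). The event records and field names are hypothetical. The "schema on write" loader fixes its columns before loading and silently drops anything else; the "schema on read" function leaves the raw records untouched and lets each query choose its own columns.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
RAW_EVENTS = [
    '{"user": "a1", "page": "/home", "ms": 120}',
    '{"user": "b2", "page": "/cart", "ms": 340, "device": "mobile"}',
    '{"user": "a1", "page": "/checkout", "ms": 95, "device": "desktop"}',
]

# Schema on write: columns are fixed before loading; other fields are discarded.
WAREHOUSE_COLUMNS = ("user", "page", "ms")

def load_schema_on_write(raw_lines):
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append(tuple(record[col] for col in WAREHOUSE_COLUMNS))
    return rows

# Schema on read: the raw data is kept; each query projects the fields it needs.
def read_with_schema(raw_lines, columns):
    for line in raw_lines:
        record = json.loads(line)
        yield tuple(record.get(col) for col in columns)

loaded = load_schema_on_write(RAW_EVENTS)
# "device" never made it into the loaded rows, but the raw data can still answer
# a question nobody anticipated at load time.
by_device = list(read_with_schema(RAW_EVENTS, ("user", "device")))
print(loaded)
print(by_device)
```

The trade-off is that schema on read pushes parsing work to query time, which is part of why raw Hadoop access has historically been slow.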
MapReduce was the original programming model for processing these large data sets in Hadoop. Using it required companies to hire MapReduce experts, train in-house IT staff, or both, in order to pull data out of Hadoop and into a legacy data warehouse. This approach is time-consuming and resource-intensive, and it does not provide the subsecond response times required in production environments. Early adopters have also used Apache Hive and derivative technologies, which translate SQL-like queries into MapReduce jobs -- but the process is still slow and still requires experts. Together, these steps toward making Hadoop work for the organization often place a significant burden on IT teams.
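For readers who haven't written a MapReduce job, the model boils down to three steps: a map function emits key/value pairs, the framework groups them by key (the "shuffle"), and a reduce function combines each group. The sketch below simulates that flow in a single Python process to count page views from a hypothetical log format ("<user> <page>"). It is an illustration of the programming model, not Hadoop's Java API. A Hive query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` (table name hypothetical) is compiled into a job of roughly this shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (page, 1) for each hypothetical log line "<user> <page>".
    _user, page = line.split()
    yield page, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)

def run_job(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

LOG = ["a1 /home", "b2 /cart", "a1 /home", "c3 /checkout", "b2 /home"]
page_views = run_job(LOG)
print(page_views)
```

On a real cluster, the map and reduce steps run in parallel across many machines and the shuffle moves data over the network, which is where much of the latency comes from.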
The inflexibility and latency of big data analytics are particularly frustrating for business analysts, who are under pressure to deliver timely and actionable business intelligence to the organization. Not only do they typically have little or no control over the analytics process, but most don't even realize how much valuable insight is likely being overlooked because of technology constraints. As the volume of semistructured and, increasingly, multistructured data -- Web logs, mobile application server logs, tweets, Facebook Likes, audio files, emails, and more -- continues to balloon, the situation will only worsen, yielding more frustration on all sides.
Rethinking the status quo of data analytics
The mantra of data analytics has been the same for decades: Don't build a data warehouse until you know the questions you want to ask of your data. Data warehouses store precomputed answers so that questions can be answered relatively quickly -- the limitation being that only predetermined questions can be answered.
If the questions need to change, it's impossible for business analysts to go back to the raw data to get answers to new questions or explore data beyond predefined parameters. Adding new data sets to a data warehouse also presents a challenge, as does making changes to an existing data set, such as adjusting the level of granularity (for example, from days to hours). Seemingly minor alterations like these can take weeks if not months to execute.
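The granularity problem is easy to see in miniature. In this sketch (plain Python, hypothetical click data), a warehouse-style daily rollup is computed once. From those daily totals alone there is no way to recover the hourly breakdown. If the raw events are retained, answering the new hourly question is just one more aggregation pass.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw click events as ISO-8601 timestamps.
RAW_CLICKS = [
    "2013-03-01T09:15:00",
    "2013-03-01T09:40:00",
    "2013-03-01T14:05:00",
    "2013-03-02T09:20:00",
]

def aggregate(timestamps, fmt):
    # Bucket events by formatting each timestamp at the chosen granularity.
    return Counter(datetime.fromisoformat(ts).strftime(fmt) for ts in timestamps)

# Warehouse-style: daily totals are precomputed once. If only these are kept,
# the hour-level detail is gone for good.
daily = aggregate(RAW_CLICKS, "%Y-%m-%d")

# With the raw data still available, switching from days to hours is one more pass.
hourly = aggregate(RAW_CLICKS, "%Y-%m-%d %H:00")
print(dict(daily))
print(dict(hourly))
```

At real-world scale, that "one more pass" over raw data is the expensive part, which is the performance gap the rest of this article addresses.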
Today's enterprises require a more flexible approach to performing big data analytics because:
- The variety and quantity of data are growing massively.
- Analysts can't know in advance what questions they'll need to ask of their data as the market, customers, and competitors change.
- To answer the full range of unanticipated questions, self-service access must be provided to all of an enterprise's raw data.
- To stay competitive, businesses need to use their data in more ways than ever before.
Moreover, business analysts need to be empowered to manipulate data so that it can be shared with other people in the organization. In short, they must play a direct role in fostering collaboration around business intelligence. After all, business intelligence provides value to the company only if it can be used for business decision-making -- and only if those decisions are made at the right time and by the right people in the organization.