The 3 Cs from data to digital transformation

Designing the right data solution for modern applications requires sound research and an understanding of what is available both outside and, more importantly, inside your company

Without question, digital transformation has captured the attention of the world’s business community. But digital transformation is really data transformation because the foundation is so frequently putting new sources of data to use. Consider how new data-driven juggernauts have upended industries, such as Uber in transportation, Airbnb in hospitality, and Alibaba and Tencent in payments, pushing China’s mobile payments to a record $32 trillion in 2017.

Companies seeking a similar impact on their industries can follow a simple three Cs approach to improving their data infrastructure on a path to digital success. That includes:

  • Consideration of the applications and analytics that need to be served.
  • Consolidation of the expanding number of datastores and analytic systems.
  • Cloud focus for new deployments.

1. Consideration

Too often, technology deployments can take on a life of their own independent of the applications and analytics they serve. It takes time and energy to corral the inputs from business units on specific requirements and work them into the infrastructure.

Data architects can use the following points to focus the team’s consideration of data requirements.

Consider the chief data beneficiary

Often, we think of the chief data officer, but in the business world, we need to consider the chief data beneficiaries and how we can serve them. This could be a simple analysis of who makes use of the data from a given application, what service-level agreements they need, and what specific analytic functions they will rely on.

Consider the data requirements of the application

Every application has unique data requirements, and it pays to get ahead of them from the start. For example, new applications with data coming from mobile devices pose interesting questions: How do I ensure capture of all the data? What happens during a temporary loss of connectivity? How do I ensure there isn’t duplicate data in my database? What is my tolerance for lost events: 1 percent, 5 percent, 0 percent?
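One common answer to the duplication question is to make writes idempotent, keyed by a client-generated event ID, so that retries after a connectivity blip never create a second copy. The sketch below is a minimal illustration using SQLite; the table name, event IDs, and the SQLite-specific `INSERT OR IGNORE` clause are assumptions for the example, not a prescription for any particular database.

```python
import sqlite3

# In-memory SQLite stands in for the application's event store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id TEXT PRIMARY KEY,  -- client-generated ID makes writes idempotent
        payload  TEXT NOT NULL
    )
""")

def record_event(event_id, payload):
    # INSERT OR IGNORE (SQLite syntax): replaying the same event after a
    # connectivity blip leaves exactly one row, so retries never duplicate data.
    conn.execute(
        "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
        (event_id, payload),
    )

# A mobile client retries after losing connectivity mid-upload.
record_event("evt-001", "tap:checkout")
record_event("evt-001", "tap:checkout")  # duplicate delivery of the same event
record_event("evt-002", "tap:pay")

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 2
```

The same pattern appears in most engines under different names (`ON CONFLICT DO NOTHING`, upserts, merge statements); the essential design choice is that the client, not the server, assigns the unique event ID.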

Consider the application lifespan and ecosystem fit

All applications have some finite life; however, the successful ones usually last longer than ever expected. In preparation for a longer application life, understand the ecosystem fit upfront. What kind of tools do I have available for this data architecture? How many developers in my company will be able to assist? Troubleshoot? Debug? What mechanisms do I have to connect this solution to others inside my company? Answering these questions upfront can minimize complexity down the line.

2. Consolidation

No one begins a data-centric project by saying, “I cannot wait to have a separate database, data warehouse, data lake, and data science system.” However, far too often the proliferation of data systems leads to such configurations.

Consolidation has been an enterprise data theme for years. Early on, this focused on the deployment of the data warehouse, a central repository where all corporate data resides for easy analysis. Note, however, that the data warehouse complemented but did not replace the database. So now we are at a minimum of two systems instead of one.

With the proliferation of Hadoop and cloud object stores such as AWS S3, we’ve added another piece to the puzzle. However, data lakes rarely have the performance or query capability of a data warehouse; therefore, they add to proliferation more than they reduce it.

Finally, the intense interest in machine learning and AI is driving companies to deploy more systems, such as Apache Spark. Apache Spark does not replace the database, the data warehouse, or the data lake, but rather complements these systems with a robust transformation engine. Proliferation remains.

However, a movement is underway today to bring data systems back to the unified front where they began, with transactions and analytics taking place in a single system.

New architectures and performance capabilities allow systems to combine transactional and analytical workloads. These can reduce the sprawl between databases and data warehouses, and potentially eliminate the need for additional systems.

Prominent analyst groups such as Gartner, Forrester, and 451 Research have all identified this trend, although they refer to it by different terms. Gartner calls it hybrid transaction/analytical processing, or HTAP. Forrester refers to it as translytical, and 451 Research uses the term hybrid operational analytical processing, or HOAP.

Regardless of terminology, the convergence trend is here, allowing companies to simplify data architectures, reduce or eliminate the need for ETL (extract, transform, load) and build modern, scalable applications that can support real-time engagements, a critical enabler of digital transformation.
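The ETL reduction that convergence promises can be illustrated in miniature: when the transactional writes and the analytical queries hit the same system, there is no extract-load step between them. The toy below uses SQLite purely as a stand-in; the `orders` schema and sample rows are invented for the example, and a real HTAP engine adds the scale and isolation this sketch omits.

```python
import sqlite3

# Toy illustration: one system handles both the transactional write path
# and the analytical query path, so no ETL job moves data between them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Transactional side: individual orders arrive and are committed.
orders = [(1, "east", 120.0), (2, "west", 80.0), (3, "east", 45.5)]
conn.executemany("INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)", orders)
conn.commit()

# Analytical side: aggregates run against the same live data immediately,
# instead of waiting for a nightly extract into a separate warehouse.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 165.5), ('west', 80.0)]
```

The real-time engagement the article mentions falls out of this shape: the analytic answer reflects every committed transaction, with no batch-window lag.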

3. Cloud

The dominant deployment factor for data architects today is the cloud. And while there is no single answer to provide cloud direction, a few guidelines will help ensure success.

Stay standards-based for cloud-only deployments

Each major cloud offers a variety of software services on top of basic infrastructure deployment. Database choices are just one. When picking a cloud database solution, be sure to stick with those that are standards-friendly. For example, databases that support ANSI SQL will allow you greater flexibility to migrate over time, should that be required. Databases that use their own query language will pose a more formidable migration challenge.
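The portability argument can be made concrete: a query confined to standard SQL constructs is a plain string that survives an engine swap, while proprietary extensions are not. In the hedged sketch below, the engine happens to be SQLite, but the query itself sticks to ANSI-style joins-free aggregation and `CASE` expressions; the table and tier thresholds are invented for illustration.

```python
import sqlite3

# A query restricted to ANSI SQL constructs (standard aggregates, GROUP BY,
# CASE expressions) runs unchanged across engines that speak the standard.
# The engine below is SQLite, but the query string itself is portable.
ANSI_QUERY = """
    SELECT region,
           COUNT(*) AS order_count,
           CASE WHEN SUM(amount) > 100 THEN 'high' ELSE 'low' END AS tier
    FROM orders
    GROUP BY region
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 120.0), ("west", 40.0)])

result = {region: tier for region, _, tier in conn.execute(ANSI_QUERY)}
print(result)  # {'east': 'high', 'west': 'low'}
```

A database whose features push you toward a vendor-specific dialect makes every such string a migration liability; standards-friendly choices keep the exit door open.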

Assess multicloud requirements

Today or in the future, your solution may need to run on more than one cloud. If so, be sure that the data solution you pick supports that deployment model. For example, for companies serving the retail sector, clients may have preferences for specific clouds; you can understand why Amazon may be less favored by large retailers than Microsoft Azure.

Even so, cloud flexibility often makes business sense, helping secure competitive pricing and serving as an insurance policy of choice. Understandably, maintaining a solution on multiple clouds takes a bit more effort, but the benefits often outweigh it.

Assess on-premises requirements

Even though the megatrends point to the cloud, with data solutions there are often cases where an on-premises solution makes sense. Data architects now need to consider standards-based solutions that work across multiple clouds and on-premises for complete coverage.

Architect for success through consideration, consolidation and cloud

Designing the right data solution for modern applications requires sound research and an understanding of what is available both outside and, more importantly, inside your company. Spend the time up front on consideration, consolidation and cloud to ensure the most success and to set yourself on a path from data to digital transformation.

This article is published as part of the IDG Contributor Network.