Expand data types faster than systems

Data architects and leaders are often tasked with collecting a wide range of new data. Sometimes that leads to the inadvertent growth of new data stores.


In a world of ever-increasing devices and connectivity, we are awash in new data types. The data industry has grown from a model where a few basic data types served much of the business world to one with a broad set of data types, which unfortunately has caused a proliferation of data systems.

But recent developments in multimodel databases bring relief to data architects by reducing the overall number of systems.

Early days of structured data

For decades, data stores worked well with a limited set of data types. This functioned best when the inputs to these databases and data warehouses fit within a set of constraints. For example, with inputs from custom applications, businesses could control the data structures from the beginning and cap the number of new formats.

However, with the evolution of the internet and astronomical growth in device connectivity, the data landscape shifted rapidly to accommodate unstructured data, most commonly in the newer JSON format.

The unstructured explosion

Introduction of the document format

With new web and mobile applications, plus the desire to remove barriers to application development, the industry adopted the document format, most commonly implemented via JSON, or JavaScript Object Notation. JSON has become the preferred mechanism for attribute-value pairs, eclipsing the earlier document format of XML.

Benefits of JSON

For developers, JSON provides the ability to quickly stand up new applications without being locked into the fixed schema common to traditional relational databases. For example, if a mobile application sends certain data to a database server and that application later changes, the JSON attributes can change without a change to the underlying database schema.
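As a minimal sketch of that flexibility, the snippet below stores payloads from two versions of a mobile application in the same table without a schema change. The table name, column names, and attribute names are all hypothetical, and Python's built-in sqlite3 module stands in for whatever database server an application would actually use.

```python
import json
import sqlite3

# Hypothetical table: one fixed column plus a free-form JSON payload.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (app_version TEXT NOT NULL, payload TEXT NOT NULL)"
)


def store_event(app_version, attributes):
    """Serialize whatever attributes the app sent and store them as JSON text."""
    conn.execute(
        "INSERT INTO events (app_version, payload) VALUES (?, ?)",
        (app_version, json.dumps(attributes)),
    )


def load_events():
    """Return (app_version, attributes) pairs in insertion order."""
    rows = conn.execute(
        "SELECT app_version, payload FROM events ORDER BY rowid"
    ).fetchall()
    return [(version, json.loads(payload)) for version, payload in rows]


# Version 1.0 of the (hypothetical) app sends two attributes.
store_event("1.0", {"screen": "home", "duration_ms": 5400})
# Version 2.0 adds a new attribute; no ALTER TABLE is needed.
store_event("2.0", {"screen": "home", "duration_ms": 3100, "dark_mode": True})

events = load_events()
```

The trade-off is that the database no longer enforces the shape of the payload, so the application has to tolerate attributes that are missing from older records.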

JSON also provides developers with the ability to populate the database through an API, a preferred method for rapid prototyping and application deployment.

Consolidating with multimodel databases

Fortunately for developers, almost all major databases, including classic relational databases like Oracle and SQL Server, now include support for JSON as well as a host of other data types. This raises the question of when and how additional data stores should be supported for varying data types.

Consider the entirety of the pipeline

To understand the benefit of converging data systems, first consider that a typical data pipeline might involve the following components:

Data Ingest > Database(s) > Application or Business Intelligence

In legacy architectures, different data ingest paths might have led to different databases for data capture. A simplified approach, however, accommodates multiple data types without proliferating data stores.

Let’s explore two use cases for architecting with multiple data types.

Real-time data pipelines from sensors or devices

The Internet of Things and its associated devices continue to spawn applications that derive value from data. Because it can be difficult to keep a single consistent schema synchronized across a range of devices and a database, a data store that accepts a mix of structured and unstructured data makes it easier to support new data ingest models.

For example, parameters such as Device ID or Serial Number are likely to be universal, apply across a range of devices in a standardized format, and fit easily within a set schema. Other metrics, such as ambient environmental statistics, may differ among products. Those metrics can be sent to the data store as a JSON blob, which provides flexibility for new device data points.

Querying this single data store provides immediate results without moving data between systems, and when that data store supports both SQL and JSON, the combination becomes powerful.
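To illustrate the pattern, here is a rough sketch that keeps Device ID and Serial Number in fixed columns and the device-specific metrics in a JSON column, then filters on a JSON attribute with ordinary SQL. The table, device names, and metric names are invented for illustration. SQLite is used only because it ships with Python; its json_extract function is built in by default as of SQLite 3.38 and is commonly enabled in earlier builds, which this sketch assumes.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        device_id     TEXT NOT NULL,  -- universal, fits a set schema
        serial_number TEXT NOT NULL,  -- universal, fits a set schema
        metrics       TEXT NOT NULL   -- device-specific JSON blob
    )
    """
)

# Hypothetical devices that each report a different set of metrics.
sample_readings = [
    ("thermo-1", "SN-001", {"temperature_c": 21.5, "humidity_pct": 40}),
    ("thermo-2", "SN-002", {"temperature_c": 27.0}),
    ("cam-1", "SN-003", {"lux": 320}),
]
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(dev, serial, json.dumps(m)) for dev, serial, m in sample_readings],
)


def devices_warmer_than(threshold_c):
    """Filter on a JSON attribute in SQL; rows lacking it are skipped."""
    return conn.execute(
        """
        SELECT device_id, json_extract(metrics, '$.temperature_c')
        FROM readings
        WHERE json_extract(metrics, '$.temperature_c') > ?
        ORDER BY device_id
        """,
        (threshold_c,),
    ).fetchall()


warm = devices_warmer_than(25)
```

Devices that never report a temperature, such as the camera here, simply drop out of the result because the extracted value is NULL, so a new device type does not require a schema change or a separate store.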

Accommodating variable customer data

A similar case arises when structured data is combined with unstructured human input. Real estate is a shining example when you consider the data points for a residential property. A typical house will have a number of structured fields like bedrooms, bathrooms, square footage, lot size, and address. The house will also have a number of unstructured fields based on human input for things like house style and neighborhood feel.

To use a residential real estate data set, it makes sense to have both structured query capabilities and easy access to unstructured data. Especially in the case of open-ended human input, capabilities like full-text search come into play. Here, a data store that supports SQL, JSON, and full-text search will go a long way toward simplifying the overall infrastructure.
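The sketch below shows one way such a listing table might look, with structured columns for the countable fields and a JSON column for the human-entered descriptions. All addresses, field names, and values are made up. The keyword match is a deliberately naive stand-in for real full-text search, which a production data store would provide through its own indexing features.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE listings (
        address     TEXT NOT NULL,
        bedrooms    INTEGER NOT NULL,
        bathrooms   REAL NOT NULL,
        square_feet INTEGER NOT NULL,
        notes       TEXT NOT NULL  -- JSON: free-form human input
    )
    """
)

# Hypothetical listings with open-ended descriptive fields.
sample_listings = [
    ("12 Oak St", 3, 2.0, 1800,
     {"style": "craftsman", "neighborhood_feel": "quiet, tree-lined streets"}),
    ("48 Pine Ave", 4, 2.5, 2400,
     {"style": "modern", "neighborhood_feel": "lively, close to cafes"}),
    ("7 Elm Ct", 2, 1.0, 950,
     {"style": "bungalow", "neighborhood_feel": "quiet cul-de-sac"}),
]
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?, ?)",
    [(a, bd, ba, sq, json.dumps(n)) for a, bd, ba, sq, n in sample_listings],
)


def search(min_bedrooms, keyword):
    """Structured filter in SQL, then a naive keyword match on the notes."""
    rows = conn.execute(
        "SELECT address, notes FROM listings WHERE bedrooms >= ? ORDER BY address",
        (min_bedrooms,),
    ).fetchall()
    keyword = keyword.lower()
    matches = []
    for address, notes_json in rows:
        text = " ".join(str(v) for v in json.loads(notes_json).values()).lower()
        if keyword in text:
            matches.append(address)
    return matches


quiet_family_homes = search(3, "quiet")
```

A single query path handles both the bedroom count and the description, which is the point of keeping structured and unstructured fields in one store instead of splitting them between a relational database and a separate search system.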

Rethinking data store choices

Without question, we are likely to see continued adoption of new devices and applications that demand flexibility from the underlying data store. However, adding a new type of database or data warehouse for each new data-ingest path or pipeline process does not scale. Rather, architects should consider the movement toward accommodating multiple data types and query options within a single data store for simplicity, scalability, and cost savings.

This article is published as part of the IDG Contributor Network.