Harmonizing big data with an enterprise knowledge graph

In addition to streamlining how users retrieve diverse data via automation capabilities, a knowledge graph standardizes those data according to relevant business terms and models


One of the most significant results of the big data era is the broadening diversity of data types required to solidify data as an enterprise asset. The maturation of technologies addressing scale and speed has done little to ease the complexity of transforming schemas and integrating the data needed for informed action.

Cloud computing, mobile technologies, and distributed computing environments all contribute to today’s variegated IT landscape for big data. Conventional approaches to master data management and data lakes lack a critical capability: uniting data across the enterprise, regardless of location, under a single point of control over multiple sources.

The enterprise knowledge graph concept directly addresses these limitations, marking an evolutionary leap forward in big data management. It provides a single point of access to data across the enterprise in any form, harmonizes those data in a standardized format, and helps automate the actions required to reuse them in use cases spanning organizations and verticals.

Enterprise-spanning connections and data representation

An enterprise knowledge graph, functioning as a data fabric, delivers these benefits by extending the notions of master data management and data lakes. The former is predominantly a means of describing the underlying data, typically via a unified schema. Data lakes, as originally conceived, grant universal access to data in their native formats, yet lack the metadata and semantic consistency necessary for long-term sustainability.

Enterprise knowledge graphs, however, include the metadata and semantic benefits of master data management (MDM) hubs while also linking all data together in adherence to semantic standards. The combination of enterprise-wide ontologies, taxonomies, and terminology delivers data in a representation (in terms of both meaning and schema) that is immediately recognizable to users. These linked data approaches connect all data uniformly.
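To make the linking idea concrete, here is a minimal sketch in plain Python (no RDF library) of how records from two hypothetical source systems, a CRM and a billing system, might be mapped onto one shared vocabulary as subject-predicate-object triples. The namespace, field names, and mappings are illustrative assumptions, not part of any specific product.

```python
# Hypothetical ontology namespace; a real deployment would use its own IRIs.
EX = "http://example.org/ekg/"

# Per-source mappings from local field names to shared ontology properties.
# Fields not listed here (including the ID field) produce no property triple.
FIELD_MAP = {
    "crm": {"full_name": EX + "name", "region": EX + "region"},
    "billing": {"amt_due": EX + "amountDue"},
}
# Which local field identifies the customer in each source.
ID_FIELD = {"crm": "cust_id", "billing": "customer"}


def to_triples(source, record):
    """Convert one source record into (subject, predicate, object) triples."""
    # Both sources mint the same subject IRI for the same customer ID,
    # which is what links their records together in the graph.
    subject = EX + "customer/" + str(record[ID_FIELD[source]])
    return {
        (subject, FIELD_MAP[source][field], value)
        for field, value in record.items()
        if field in FIELD_MAP[source]
    }


def describe(graph, subject):
    """Collect every property known about one subject, whatever its source."""
    return {p: o for s, p, o in graph if s == subject}


graph = to_triples("crm", {"cust_id": 42, "full_name": "Acme Corp", "region": "EMEA"})
graph |= to_triples("billing", {"customer": "42", "amt_due": 1200.0})
print(describe(graph, EX + "customer/42"))
```

Even though the CRM stores the ID as an integer and the billing system as a string, both resolve to the same subject, so a single lookup returns the customer's name, region, and balance together.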

Health care providers, for example, can connect the many types of data relevant to their industry by creating an exhaustive list of events, such as diagnostics, patient outcomes, operations, and billing codes, describing those events with standardized models, and annotating them with uniform terminology across the data spectrum.
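A minimal sketch of the "standardized model plus uniform terminology" step might look like the following. It assumes two hypothetical facilities that record the same clinical events under different local codes; the codes, the `ClinicalEvent` model, and the `standardize` helper are invented for illustration and do not correspond to any real coding system.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical terminology map: (source system, local code) -> standard term.
# The local codes are invented for illustration only.
TERMINOLOGY = {
    ("clinic_a", "RESP_FAIL"): "RespiratoryFailure",
    ("hospital_b", "ARF"): "RespiratoryFailure",
    ("clinic_a", "BP_CHK"): "BloodPressureMeasurement",
    ("hospital_b", "NIBP"): "BloodPressureMeasurement",
}


@dataclass(frozen=True)
class ClinicalEvent:
    """One event described with the shared, standardized model."""
    patient_id: str
    term: str             # standardized term from TERMINOLOGY
    occurred_at: datetime
    source: str           # originating system, kept for lineage


def standardize(source, local_code, patient_id, occurred_at):
    """Translate a source-specific event into the standardized model."""
    try:
        term = TERMINOLOGY[(source, local_code)]
    except KeyError:
        # Unmapped codes are rejected rather than silently passed through.
        raise ValueError(f"no standard term for {local_code!r} from {source!r}") from None
    return ClinicalEvent(patient_id, term, occurred_at, source)


a = standardize("clinic_a", "RESP_FAIL", "p1", datetime(2024, 1, 5, 9, 30))
b = standardize("hospital_b", "ARF", "p2", datetime(2024, 1, 6, 14, 0))
print(a.term == b.term)  # True: both sites now use the same term
```

Keeping the originating `source` on each event is a design choice that preserves lineage while still letting downstream queries work purely in terms of the shared vocabulary.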

Regardless of where data reside (in the cloud, in a cache, or awaiting computation), users can link them in the same consistent format, one that has meaning for their business purposes. The standardized ontologies, which can be extended to incorporate new events, and the unified terminology align all data to the knowledge graph’s schema regardless of origin or other points of distinction.
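The extensibility point can be sketched as a tiny ontology object whose event classes declare required properties and inherit them from a parent. This is a hypothetical, deliberately simplified stand-in for a real ontology language; the class names and properties are assumptions chosen to echo the health care example.

```python
class Ontology:
    """Minimal, hypothetical ontology: event classes and the properties they require."""

    def __init__(self):
        self.classes = {}

    def add_class(self, name, required, parent=None):
        # A subclass inherits its parent's required properties.
        inherited = self.classes[parent] if parent is not None else frozenset()
        self.classes[name] = frozenset(required) | inherited

    def conforms(self, cls, record):
        """True if `record` carries every property the class requires."""
        return cls in self.classes and self.classes[cls].issubset(record)


onto = Ontology()
onto.add_class("ClinicalEvent", {"patient_id", "occurred_at"})
onto.add_class("Diagnosis", {"code"}, parent="ClinicalEvent")

# Later: extend the model with a new event type without touching existing classes.
onto.add_class("VitalSignReading", {"device_id", "value"}, parent="ClinicalEvent")

from_cloud = {"patient_id": "p1", "occurred_at": "2024-01-05T09:30", "code": "RespiratoryFailure"}
from_cache = {"patient_id": "p2", "occurred_at": "2024-01-05T09:31", "device_id": "m7", "value": 118}
print(onto.conforms("Diagnosis", from_cloud), onto.conforms("VitalSignReading", from_cache))
```

Where a record came from (here, the labels "cloud" and "cache") plays no part in the check; only its conformance to the shared schema does.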

Active automation

The true value of an enterprise knowledge graph’s access layer lies in the automated action it facilitates. With data stemming from any number of disparate source systems, automatically generating code for analytics or transformation is invaluable. Such automation is one of the crucial advantages of an enterprise knowledge graph: it makes data not only easier to access but also easier to put to use.

The use cases this combination enables are innumerable. A health care organization trying to predict respiratory failure among patients at multiple locations could use knowledge graph applications to monitor the blood pressure of all its inpatients. The graph would let the organization create an abstract description of the blood pressure data related to this event (respiratory failure), then automatically compile that description into code that retrieves the data the prediction needs.
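One way to picture "compiling an abstract description into code" is a small request object translated into a SPARQL query. This is a sketch under stated assumptions: the `DataRequest` class, the `compile_to_sparql` function, and the `ex:` vocabulary (`ex:indicatorOf`, `ex:patient`, `ex:site`, `ex:value`, `ex:time`) are all hypothetical, and real enterprise knowledge graph products may use other query languages and compilation strategies.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical vocabulary; the prefix and property names are illustrative only.
PREFIXES = (
    "PREFIX ex: <http://example.org/ekg/>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


@dataclass(frozen=True)
class DataRequest:
    """Source-agnostic description of the data a prediction needs."""
    measurement: str     # ontology class, e.g. "BloodPressure"
    target_event: str    # event being predicted, e.g. "RespiratoryFailure"
    sites: tuple         # facility identifiers to include
    since: str           # ISO-8601 lower bound on measurement time


def _name(value):
    # Only plain identifiers are spliced into the query, to avoid injection.
    if not _NAME.fullmatch(value):
        raise ValueError(f"invalid identifier: {value!r}")
    return value


def compile_to_sparql(req: DataRequest) -> str:
    """Compile the abstract request into a SPARQL SELECT over the graph."""
    sites = " ".join("ex:" + _name(s) for s in req.sites)
    since = datetime.fromisoformat(req.since).isoformat()  # validates the timestamp
    measurement = _name(req.measurement)
    return PREFIXES + (
        "SELECT ?patient ?site ?value ?time WHERE {\n"
        # Only proceed if the ontology links this measurement to the target event.
        f"  ex:{measurement} ex:indicatorOf ex:{_name(req.target_event)} .\n"
        f"  VALUES ?site {{ {sites} }}\n"
        f"  ?m a ex:{measurement} ;\n"
        "     ex:patient ?patient ;\n"
        "     ex:site ?site ;\n"
        "     ex:value ?value ;\n"
        "     ex:time ?time .\n"
        f'  FILTER (?time >= "{since}"^^xsd:dateTime)\n'
        "}"
    )


req = DataRequest("BloodPressure", "RespiratoryFailure", ("north", "south"), "2024-01-01T00:00:00")
print(compile_to_sparql(req))
```

The person asking for the data states only what is needed (which measurement, for which event, where, and since when); the query text, including site filtering and type constraints, is generated rather than written by hand.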

The overarching value proposition of this approach is that the user simply issues a query for the data he or she needs, regardless of where the data originated. The enterprise knowledge graph’s automation capabilities then carry out the steps needed to retrieve the data that pertain to the query. The key difference is that the user need not know the source system or the particulars of its schema to get the data. Access is much faster because the data engineering work of cleansing and transforming data is done up front, before queries are issued.

In addition, the data relevant to the query can stem from multiple source systems yet still be accessed from a single place: the knowledge graph. The user is not responsible for accessing those systems individually; instead, the query mechanism pulls the appropriate data from the various systems connected to the centralized knowledge graph.
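The single-access-point idea can be sketched as a gateway that fans one query out to every registered source and returns results in the shared model. The `KnowledgeGraphGateway` class, the two in-memory "systems" (an EHR and a bedside monitor store), and their field names are hypothetical stand-ins for real source connectors.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


class KnowledgeGraphGateway:
    """Hypothetical single access point that fans one query out to many sources."""

    def __init__(self) -> None:
        self._sources: List[Tuple[str, Callable[[], Iterable[dict]], Callable[[dict], Record]]] = []

    def register(self, name, fetch, to_common):
        # `fetch` reads raw rows from a source; `to_common` maps one row
        # onto the shared model. Both are supplied once, when the source is added.
        self._sources.append((name, fetch, to_common))

    def query(self, term: str) -> List[Record]:
        """Return every record for `term`, whichever system it lives in."""
        results = []
        for name, fetch, to_common in self._sources:
            for raw in fetch():
                rec = to_common(raw)
                if rec["term"] == term:
                    results.append({**rec, "source": name})
        return results


ehr_rows = [{"pid": "p1", "obs": "BP", "val": "120/80"}]
monitor_rows = [{"patient": "p2", "kind": "blood_pressure", "systolic": 135, "diastolic": 88}]

gateway = KnowledgeGraphGateway()
gateway.register(
    "ehr",
    lambda: ehr_rows,
    lambda r: {"patient": r["pid"],
               "term": "BloodPressure" if r["obs"] == "BP" else r["obs"],
               "value": tuple(int(x) for x in r["val"].split("/"))},
)
gateway.register(
    "monitor",
    lambda: monitor_rows,
    lambda r: {"patient": r["patient"],
               "term": "BloodPressure" if r["kind"] == "blood_pressure" else r["kind"],
               "value": (r["systolic"], r["diastolic"])},
)

for rec in gateway.query("BloodPressure"):
    print(rec)
```

In this sketch the mappings are defined once at registration but applied at query time; a materialized variant, closer to the up-front cleansing and transformation described above, would run `to_common` ahead of time and store the results in the graph.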


An enterprise knowledge graph effectively unifies several aspects of the considerable variation intrinsic to big data. It unifies how data from an array of source systems and architectural complexities are accessed, represented, acted on through automation, and even moved. In addition to streamlining how users retrieve diverse data via automation capabilities, the knowledge graph standardizes those data according to relevant business terms and models. The result is a homogenized data set wrought from any number of data types and sources.

This article is published as part of the IDG Contributor Network.