TLA 2013: Technology Creation/Enhancement
Master data management vendor Health Market Science serves the health care industry, where records exist in multiple systems and often vary in nomenclature, format, and structure. HMS's job is to provide the master file for these common records. That file is used to discover fraud, waste, and abuse in the health care system, both in forensic analysis and in real-time authorization of, say, controlled-substance prescriptions.
The typical approach is to create a relational data warehouse to hold the normalized and validated data, but that approach significantly limits the scalability of such a platform, in terms of both volume and variety of data, and thus limits the speed of the analytics possible on the master file. Additionally, to support both current and historical analysis, the platform must maintain point-in-time and revision information for every entity in the system.
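To make the point-in-time requirement concrete, here is a minimal Python sketch of an append-only store that keeps every revision of an entity and answers "what did this record look like at time T?" It is an illustration only, not HMS's implementation: the class name `RevisionStore`, its methods, and the sample provider record are all hypothetical, and timestamps are plain integers for simplicity.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Optional, Tuple

Revision = Tuple[int, Dict[str, Any]]  # (effective_time, attributes)


class RevisionStore:
    """Hypothetical append-only store: revisions are added, never overwritten."""

    def __init__(self) -> None:
        # entity_id -> revisions, kept sorted by effective_time
        self._history: Dict[str, List[Revision]] = {}

    def put(self, entity_id: str, effective_time: int,
            attributes: Dict[str, Any]) -> None:
        """Record a new revision of an entity, preserving all earlier ones."""
        revs = self._history.setdefault(entity_id, [])
        times = [t for t, _ in revs]
        # Insert in time order so late-arriving revisions land correctly.
        revs.insert(bisect_right(times, effective_time),
                    (effective_time, dict(attributes)))

    def as_of(self, entity_id: str, when: int) -> Optional[Dict[str, Any]]:
        """Return the revision in effect at `when`, or None if none existed yet."""
        revs = self._history.get(entity_id, [])
        times = [t for t, _ in revs]
        idx = bisect_right(times, when)
        return revs[idx - 1][1] if idx else None

    def history(self, entity_id: str) -> List[Revision]:
        """Return a copy of the full revision history, oldest first."""
        return list(self._history.get(entity_id, []))


# Usage: two revisions of one (made-up) provider record.
store = RevisionStore()
store.put("provider-1", 10, {"name": "A. Smith", "state": "PA"})
store.put("provider-1", 20, {"name": "A. Smith", "state": "NJ"})
print(store.as_of("provider-1", 15))  # the revision effective at time 10
print(len(store.history("provider-1")))
```

Because revisions are only ever appended, historical analysis can replay the state of any entity at any past moment, while current analysis simply reads the latest revision.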
To address the issues of data variety, volume, and velocity, O'Neill championed a novel solution that combined cutting-edge big data techniques. One challenge was to integrate the legacy relational database management system with the big data platform. This required creative thinking and the use of nontraditional integration techniques, so that HMS could continue to deliver from the legacy platform while developing new products, capabilities, and services on the new platform.
The effort required abandoning traditional mechanisms that assure data consistency and integrity (such as locking) and instead embracing techniques that allow for eventual consistency in the system, while shielding users and services from inconsistent states and integrity issues as data changes. That led to the adoption of NoSQL approaches and technologies such as the Cassandra nonrelational data store and the Storm distributed processing framework. O'Neill ended up becoming a contributor to the open source Cassandra project and championed the establishment of numerous open source projects that extended Cassandra's capabilities and allowed it to integrate more easily into a loosely coupled services-based infrastructure.
Because its architecture is not based on batch-processing frameworks such as Hadoop, the platform better supports real-time integration via Web services. This enables self-service models and rapid, lightweight integration between systems -- in other words, mashups.
HMS's big data platform can process orders of magnitude more data than the legacy technology. Because it is built entirely on open source technologies, there are no mandatory licensing costs as HMS expands the system's cluster capacity and, thus, no financial worries over growing and shrinking the cluster to accommodate demand. Building on open source software has also let HMS extend the base capabilities of the infrastructure to meet market needs much faster than is possible with commercial software.