Q&A: Why Syncsort introduced the mainframe to Hadoop

In an interview with IDGE, Josh Rogers and Lonne Jaffe of Syncsort explain how they plan to transform big iron and traditional data warehouse/analytics


There’s also an underlying growth trend. The mainframe, which is tens of billions of dollars a year in spend, is a mission-critical system that runs some of the world’s most important transactional environments, things like airline reservation systems or credit card processing systems or retail commerce systems. It’s used for those cases because it’s an I/O supercomputer. It’s unmatched in terms of concurrent transaction processing.

It's not as useful for things like running a social network, where you need to do status updates and it doesn't matter if the page hasn't refreshed with the latest update yet. But for things where it does matter, where you need perfect transactional integrity, like financial services, they've started to measure the power of the mainframe by the number of Cyber Mondays you can run on a single box. That's the metric they're using because it's so powerful.

There's a big data company called Splunk, which is one of the fastest-growing software companies in the history of humanity. It's been targeting a couple of really important use cases, two of which are cyber security and application performance monitoring. But there has been this gap: they couldn't monitor the mainframe, because it's actually really hard. [Splunk's] technology pulls log data off of pretty much every other kind of system that exists in an enterprise except for the mainframe.

The mainframe is really hard to monitor because there are a lot of logs. It's one of the most prolific log generators, and it's running your systems of record. Your cyber security system is not that useful if it can secure your entire enterprise except for your system of record, the system that has all of your customer bank accounts on it. Similarly, your application performance monitoring system can monitor basically your entire enterprise except for the third tier of your three-tier applications, so you can never get to the root causes of any problems.

Splunk approached us about that, and we were in this unique position where most of the other mainframe software vendors have massive existing businesses that are getting decimated by Splunk, so the last thing they want to do is help Splunk get mainframe log data off the mainframe. The Silicon Valley companies, many of them don't even realize the mainframe still exists, let alone that the mainframe is a bigger market than most of the data industry.

That's what Ironstream does. It pulls the cyber security data and the advanced application performance monitoring data off the mainframe and feeds it into Splunk for advanced analytics on those use cases: cyber defense, application performance monitoring, and increasingly other things like broader analytics and customer churn. It has actually become the fastest-growing product in the 48-year history of Syncsort, is already closing huge, million-dollar-plus deals, and has been really exciting in terms of real-time streaming and telemetry data.

Rogers: If you look at our customer base for DMX-h and Ironstream, something that's unique is the pace at which we've been able to take the largest enterprises in the world into production. We're approaching customers not only with products that can deliver the value prop we're describing, but also with the expertise, and even battle scars, of having done this with many Fortune 500 enterprises over the last 36 months as Syncsort has been delivering these products to market.

We believe we have the best experience in the world in offloading processing to Hadoop and delivering mainframe data to big data infrastructure -- like infusing Splunk with critical log data for monitoring. This expertise is packaged into every proposal we do -- we include services to help customers get their technology configured and operationally sound. We don't simply deliver technology; we help customers have success with it.

One large financial institution got started with a data warehouse offload program and was very successful. As it got deeper into it and put more and more processing into production in its Hadoop cluster, the company started to look at other areas to reduce cost. It had a test data management process running on the mainframe to support application testing, and it was very rigid and expensive -- the company could only run it once a year. It realized this was the type of process that would be well suited to Hadoop and turned to Syncsort to help it rebuild that process in a Hadoop cluster. Businesses might start with data warehouse offload, but once they have the infrastructure up, we see that it starts to attract other workloads and increases the value of the investment.

IDGE: The other thing that you brought up is cost savings -- you mentioned it in the context of Hadoop -- but if you were to crystallize it, what are the key ways that you save money for customers?

Rogers: The bulk of cost savings is in moving workloads from expensive platforms to less expensive ones, but there’s another piece and it invokes the company name. We’re steeped in history and are the leader in sorting technology. The sort function gets invoked at every stage of data processing -- it drives up mainframe CPU and cost because of the high usage, and it increases the size of Hadoop clusters.

One of the benefits of all of Syncsort's products -- this has been the case for 40-plus years and is a key differentiator -- is that we'll execute the same workload more efficiently than any other product because of the core architecture of our compute engine and the time-tested algorithms and optimizations that get invoked at runtime. There's the cost savings because you're running data processes on a Hadoop cluster instead of a Teradata box, but there are also huge cost savings because our technology makes the workloads more efficient in the Hadoop cluster. They're even more efficient than if you wrote those jobs as custom MapReduce code directly on Hadoop.
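To make that efficiency argument concrete, consider how often a sort is triggered implicitly in an ordinary data pipeline. The minimal PySpark sketch below is illustrative only -- it is not Syncsort's engine, and the paths and column names are hypothetical -- but each of its stages forces the cluster to shuffle, and in most cases sort, the data.

```python
# Illustrative PySpark pipeline on hypothetical data; each stage below
# implies a shuffle across the cluster, and most of them a sort.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sort-heavy-pipeline").getOrCreate()

txns = spark.read.parquet("/data/transactions")    # hypothetical input paths
accts = spark.read.parquet("/data/accounts")

joined = txns.join(accts, "account_id")            # typically a sort-merge join
daily = (joined
         .groupBy("account_id", "txn_date")
         .agg(F.sum("amount").alias("daily_total")))  # shuffle by grouping keys
ranked = daily.orderBy(F.desc("daily_total"))       # explicit global sort

ranked.write.mode("overwrite").parquet("/data/daily_totals")
```

An engine that executes those sorts more efficiently shrinks both the runtime and the number of nodes the cluster needs, which is the cost argument being made here.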

Jaffe: The main mechanism is the ability to shut down existing spend on legacy systems. I'll give you an example around DMX-h. The traditional architecture of an analytics environment is that you have six or seven different source systems. One of them might be social media, maybe Web logs from your website, the mainframe, legacy databases, and things like that. You bring them into a data warehouse, do some preprocessing, then do some analytics and maybe create some dashboards against it.

What the customers are doing is putting Hadoop in the middle of that architecture. They're putting it between the source systems and the downstream systems. When that happens they can shut off a lot of things. They can shut off their spend on ETL products -- products from companies like Informatica. They just turn them off. They can shut off half of the capacity in their data warehouse; about half of that capacity is effectively preprocessing that can be done for a teeny fraction of the cost in Hadoop.

Then on the source-system side you have these big systems like mainframes, which are tens of billions of dollars in spend, about half of which is essentially inefficient batch preprocessing, all of which can be moved into Hadoop. The mainframe is actually metered, so as soon as you move workloads off of it you start saving a huge amount of spend, and that happens instantaneously. That savings is all immediate.

That's the step-one project. That's a project we've done over and over again at some of the largest companies in the world, including some of the biggest banks, telecoms, and financial institutions. It's a very easy project because it doesn't require any reengineering of your business processes. You're not changing the analytics that you're doing at all, and you're not changing the source systems; nothing is changing other than that you're replacing some really inefficient systems with Hadoop, and you just snap it right in. It's pretty elegant and you're ready to go.
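As a rough illustration of what that step-one work looks like on the cluster side, here is a minimal PySpark sketch of warehouse-style preprocessing running in Hadoop. It is not DMX-h itself; the source extracts, field names, and paths are hypothetical. The point is simply that the cleansing and conforming work the ETL tool or the warehouse staging layer used to do now runs on commodity infrastructure, and the warehouse only receives the finished result.

```python
# A minimal sketch (hypothetical datasets) of "step-one" preprocessing in Hadoop.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("warehouse-offload").getOrCreate()

# Raw extracts landed in the cluster from the source systems.
mainframe_txns = spark.read.csv("/landing/mainframe/transactions", sep="|", header=True)
web_logs = spark.read.json("/landing/web/clickstream")

# Cleansing and conforming that used to run in the ETL tool or the warehouse's
# staging area: drop malformed rows, standardize keys, deduplicate.
txns_clean = (mainframe_txns
              .filter(F.col("amount").isNotNull())
              .withColumn("customer_id", F.upper(F.trim(F.col("customer_id"))))
              .dropDuplicates(["txn_id"]))

# Reduce raw clickstream to one activity summary per customer.
activity = (web_logs
            .filter(F.col("event") == "page_view")
            .groupBy("customer_id")
            .agg(F.count("*").alias("page_views")))

# Only the conformed, pre-aggregated result is handed to the warehouse load.
(txns_clean
 .join(activity, "customer_id", "left")
 .write.mode("overwrite")
 .parquet("/conformed/customer_activity"))
```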

The step-two project, which in some ways is the more interesting one, is you can start shutting down the downstream part of that architecture altogether. Instead of sending it through Hadoop on the way to where it was going anyway, you start sending it to Hadoop and you start running new, next-generation analytics directly against that. Now you’re able to do all sorts of things that you weren’t able to do before.

That's less of a cost-savings play, although a lot of times the use cases are very cost-savings oriented, like churn analytics -- one of the most common things people are doing on the large volumes of data they're moving into Hadoop -- predicting which customers are going to leave so that you can rescue them now for relatively small amounts of money instead of having to pay more to rescue them later. Those can be business-oriented cost savings that need to happen as well.

In the case of Ironstream, one of the main ways people are able to save money is by shutting off the legacy monitoring products, some of which are billion-dollar-a-year businesses running what I sometimes affectionately refer to as captive-grazing-based business models: they are not really improving the products at all, they're jacking up prices on the customers, and they're just extracting value. By lighting up their existing environment within something like Splunk, which they're typically already using for everything else, customers can start turning off those systems and save orders of magnitude more money than they spend on Splunk and Ironstream and all the rest of their new analytical environment put together.

Rogers: A real-life example of this is another financial institution we work with that had challenges with compliance and security. Application testing groups would leverage mainframe data assets, and the organization didn't have a good way of verifying that only authorized users were accessing various data assets. Ensuring privacy has grown in importance as regulations have increased in the financial sector. They had some level of mainframe monitoring, but it required a single mainframe expert to interpret whether any access violations had occurred. This was a big job, the person monitoring the mainframe was not connected to the application testing groups, and the information was in a format that was not easily shareable across departments.

This institution was a Splunk user and had had great success using Splunk dashboards to deliver information to broad sets of users for other use cases. They determined a Splunk dashboard would be the best way to communicate data access patterns to testing managers and ensure compliance, but they needed a way to move the appropriate mainframe log data (up to 1TB per day) into Splunk on a real-time basis, which is what Ironstream does. They were able to decrease the risk of violating compliance regulations, and the project had a material impact on the business -- it increased the speed at which they can deliver new application features because it sped up the testing process.
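For readers wondering what feeding log data into Splunk looks like mechanically, the sketch below shows the general pattern of streaming records into Splunk's HTTP Event Collector (HEC). It is not how Ironstream is implemented -- the host, token, and sourcetype are placeholders -- but it illustrates the kind of real-time hand-off the project depended on.

```python
# Minimal sketch of pushing log records to Splunk's HTTP Event Collector.
# The endpoint URL, token, and sourcetype are placeholders, not real values.
import json
import time
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder HEC token

def forward_events(records, sourcetype="mainframe:syslog"):
    """Send a batch of already-decoded log records to Splunk via HEC.

    HEC accepts multiple JSON event objects concatenated in one POST body,
    so a single request can carry a whole batch of events.
    """
    body = "".join(
        json.dumps({"time": time.time(), "sourcetype": sourcetype, "event": rec})
        for rec in records
    )
    resp = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=body,
        timeout=10,
    )
    resp.raise_for_status()

# Example usage with illustrative record strings:
# forward_events(["JOB12345 STARTED", "ICH408I USER(TEST1) ... ACCESS VIOLATION"])
```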

IDGE: You’ve made two acquisitions. How do those acquisitions advance the strategy?

Jaffe: I'll go in reverse order, starting with the most recent one. William Data Systems, a London-based company, made software that helped our cyber security play. It added another type of security data to the Ironstream product: network security data. Network security is very important when you're doing cyber defense because a lot of the interesting attacks and problems happen in the context of the network and all of these systems, especially the high-value, industrial-scale systems.

We acquired the technology, we immediately put the IP inside Ironstream so that you get it for free as part of Ironstream, and we started using all of the talent in the company to build new capabilities within the Ironstream product. Then we also gave the existing William Data products a lot of lift. We were able to go to our existing install base and say to everybody: Do you want this? A lot of the customers said yes, so we were able to create some growth there.

The prior acquisition was a company called Circle Computer Group, also based in the United Kingdom near London, and it was a very similar dynamic. It had software that allowed you to shut down spend on some legacy data platforms while also moving the data to places where it was a lot more accessible to fast-growing big data platforms. Within a few weeks of closing the acquisition, we had essentially paid back a quarter of the purchase price of the company by giving it lift and bringing it to our existing customers. The CEO of the company became the head of our European operations because he's a fantastic leader.

All of the technical talent started working on building Ironstream, and they've actually built a big part of that core organic IP. Then we were also able to get a number of customers we didn't have before, whom we could upsell with the rest of our products. That's the kind of acquisition we were looking for: highly differentiated tech that's a near adjacency to what we currently do, where we can give it lift and then use all of the other parts of the company, like the talent and the intellectual property, for some of our new initiatives.
