Q&A: Why Syncsort introduced the mainframe to Hadoop

In an interview with IDGE, Josh Rogers and Lonne Jaffe of Syncsort explain how they plan to transform big iron and traditional data warehouse/analytics


When you think of leaders in big data and analytics, you’d be forgiven for not listing Syncsort among them. But this nearly 50-year-old company, which began selling software for the decidedly unglamorous job of optimizing mainframe sorting, has refashioned itself into a critical conduit by which core corporate data flows into Hadoop and other key big data platforms. Syncsort labels itself "a freedom fighter" liberating data and dollars -- sometimes millions of dollars -- from the stranglehold of big iron and traditional data warehouse/analytics systems.

In this installment of the IDG CEO Interview Series, Chief Content Officer John Gallant spoke with Josh Rogers, who was named CEO this week, as well as outgoing CEO Lonne Jaffe, who remains as Senior Advisor to Syncsort’s board. Among other topics, the pair talked about why Syncsort was recently acquired by Clearlake Capital Group, and how Syncsort’s close partnership with Splunk is dramatically improving security and application performance management.

IDGE: Lonne, I understand you like good storytelling. What’s the story you tell IT leaders today about Syncsort?

Lonne Jaffe, Syncsort

Jaffe: Syncsort was founded in 1968. It was one of the very earliest software companies. I joined a little over two and a half years ago, and over the last couple of years the company has focused on this new mission around liberating data and liberating budgets from the stranglehold of legacy systems, while making the data and the budgets available for the fastest-growing data platforms in the world, things like Apache Hadoop and Splunk. Those platforms allow some of the interesting next-generation machine learning technology that we’re seeing manifest across all sorts of interesting industries like health care and self-driving cars and the Internet of things and the like. It has been a remarkable transformation in a lot of ways because people don’t usually think of a 48-year-old software company as the kind of entity that would be able to innovate around next-generation big data platforms organically.

We’ve also been doing a lot of acquisitions. The play since I joined -- and even after we were acquired a few weeks ago -- is to add to our organic innovation acquisitions of high-value businesses that are in near-adjacent spaces and that are aligned with that theme, acquiring companies that have technology and talent that would help with that storyline of liberating budgets and liberating data.

We think of ourselves a little bit like freedom fighters. The company is in a unique position because Silicon Valley companies often struggle with even understanding the basics of some of the larger existing platforms that are out there, especially things like the mainframe. The really large companies that have the talent and the go-to-market that would be well suited to do something like this often have the classic innovator’s dilemma, which is that the last thing they want to do is liberate budgets from their existing businesses in order to make those budgets available for new systems that are largely open source that they don’t control and that are being sold for sometimes one one-hundredth the price of their existing products.

We’re in this unique position of being large enough to be able to pull it off and having the talent and technology that would be needed. [We’re also] small enough that we can be decisive and act with conviction around these growth opportunities that involve not just making data accessible to the new machine learning platforms but also shutting down huge amounts of spend -- orders of magnitude more spend in some of these legacy platforms as we do that data liberation.

Josh Rogers, Syncsort

Rogers: What we see with Syncsort customers is that they’re grappling with significant data challenges -- how to look at analytics systems and data repositories and leverage the power of new tech like Hadoop to increase their ability to analyze data. That whole decision process is complex. But then, once decided, they’re also having to think about how to make those new platforms usable by plugging them into existing systems.

One of the most difficult systems to integrate is the mainframe -- moving data out of it and making it useful in Hadoop infrastructure is challenging because, given Hadoop's open source nature, there aren't many user-friendly tools to help. Syncsort makes that much easier for customers, so they can actually get an ROI on their investments and take advantage of mainframe data -- often their most important data source.
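To make the integration challenge concrete: mainframe data typically arrives in EBCDIC encoding with fixed-width record layouts, which Hadoop tools can't read natively. The following is a minimal Python sketch of the kind of conversion involved -- the record layout and field names here are invented for illustration, and Syncsort's actual tooling handles far more (packed decimals, COBOL copybooks, and so on).

```python
# Hypothetical sketch of one step that mainframe-to-Hadoop tooling automates:
# decoding an EBCDIC record (code page 037) into text before loading it
# into Hadoop. The fixed-width layout below is invented for illustration.

# A record as it might arrive from z/OS: a 10-char name, then a 6-digit amount.
record = "ACME CORP 000123".encode("cp037")  # simulate EBCDIC bytes

decoded = record.decode("cp037")             # EBCDIC -> Python str
name = decoded[:10].strip()                  # fixed-width field 1
balance = int(decoded[10:])                  # fixed-width field 2

print(name, balance)  # ACME CORP 123
```

Real mainframe records add complications like packed-decimal (COMP-3) fields and variable-length segments, which is why purpose-built connectors exist rather than ad hoc scripts like this one.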

The other piece of the story we tell is the cost-savings opportunity of moving workloads from the mainframe or data warehouse and replicating those same workloads to execute in a low cost infrastructure like Hadoop. We’re hugely focused on helping customers here. We’re an active contributor to the Hadoop open source project and have good partnerships in the space, but also have 48 years of mainframe experience. This puts us in a unique place, and has allowed us to gain trust of companies and customers on both sides: big iron and big data.

IDGE: Lonne, you worked at IBM in acquisitions and in some other roles in the tech industry. Why did you take on this job? What did you see in the company that made this an appealing opportunity for you?

Jaffe: A couple of things. One was that there was already organic technology that was highly differentiated in terms of being usable for the strategy. It was the nature of the product itself, including the Hadoop product that was largely built before I joined. We’re unique in the industry in terms of capabilities, extremely high performance with deeply instrumented hooks into the existing legacy business platforms and native integration with Apache Hadoop, which is arguably the fastest-growing software platform in the entire industry. That was a rich and powerful asset that I was excited about.

The other piece is the company itself, because it had been around for so long and was selling industrial scale software to the largest companies in the world. [We have] thousands of customers in 87 countries with subsidiaries in eight countries and the existing renewal stream where we had conversations with thousands of customers every year. That was a rich, intangible asset that could be used as an anchor to acquire some of the more interesting high-value technology companies out there, so we had this ability to do a two-prong strategy.

One [prong] was to double down on organic growth. Part of that was launching a number of new products that would capitalize on the existing go-to-market [capabilities] but also the existing products, the technology the company already had. The other piece explicitly from the beginning was to do acquisitions.

I’m a big believer that the innovation tool kit has many tools in it. One of them is building products and taking them to market, but another really important innovation tool is the ability to acquire businesses that already exist. In many ways that can be easier and more rewarding than building products, because a lot of times when you build stuff it doesn’t work or it takes a really long time. When you acquire something you can look to see if it works before you buy it, and once you do the acquisition you have it instantaneously. A lot of times it comes along not only with tech and talent but also revenue and profit, customers and existing go-to-market. The ability to do that two-prong strategy was a big part of what attracted me to the company.

IDGE: You talked about capitalizing on this big data opportunity. Specifically, how are you doing that? What are the steps you’ve taken to do that?

Jaffe: There are two products that are particularly salient. The first is called DMX-h and the other is Ironstream. DMX-h is our Hadoop-based product. The strategy there has been to make tremendous contributions to the Hadoop open source project. There are a number of open source players like Cloudera, Hortonworks, and MapR that have been big supporters of ours in making those contributions, and we’ve been one of the more prolific contributors. As we’ve made those contributions we’ve designed them in such a way that they help the Hadoop stack mature, but they also give us an advantage as we connect in our higher-value software that runs on top of Hadoop.

Hadoop is in many ways becoming the de facto operating system for data in the industry. I think of it a little bit like TCP/IP. It’s becoming a standard. It’s not really a product exactly anymore. It’s a framework and an architecture that everyone is building to. I’ve never really seen this level of ubiquitous agreement by every large existing company and all the well-resourced newer companies to build to the same exact fundamental architecture.

Our product is a runtime engine that runs on top of Hadoop for existing workloads that you were previously running in a place that’s a lot more expensive and locked away. People have legacy data warehouses today where you can spend as much as $200,000 a terabyte in a typical deployment. These can be $100-$150 million deployments at a typical customer.

They can move those workloads to run on top of our product in Hadoop (and we have some tools, which we don’t sell separately, that make the moving process easier). Because Hadoop runs on commodity hardware -- regular Intel-based machines -- and uses open source software that is very inexpensive and becoming increasingly easy to manage, the all-in cost can be on the order of $400 to $1,000 a terabyte. That’s somewhere between one-thousandth and -- more generously to the legacy vendors -- one-hundredth the price.
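A quick back-of-the-envelope check of the figures quoted here -- all dollar amounts are the interview's own numbers, not measured data:

```python
# Per-terabyte cost comparison using the figures quoted in the interview.
legacy_cost_per_tb = 200_000           # legacy data warehouse, upper quote
hadoop_cost_per_tb = (400, 1_000)      # quoted all-in Hadoop range

ratios = [legacy_cost_per_tb / c for c in hadoop_cost_per_tb]
print(ratios)  # [500.0, 200.0]
```

At the quoted prices, Hadoop comes out to roughly 1/500th to 1/200th the cost per terabyte, which sits inside the "one-thousandth to one-hundredth" range Jaffe describes.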

When something is that much cheaper it’s essentially free. Emotionally, it feels like it’s free when it’s a hundredth the price of what you’re currently spending. That can save staggering amounts of money -- it can be $10 million in a single year in terms of run-rate spend -- that you can immediately use to hire data scientists and machine learning experts to do advanced analytics. You now also have all of your data in a platform that’s getting better faster than anything else in the technology industry.

It has two huge advantages. One is you save a ton of money, and immediately you also unlock the data and put it in a place where every day there’s another next-generation advanced machine learning system that comes online. All of your data is already there and ready to be used.

There is a big shift that’s happening in the data world around machine learning that is still in its infancy but is a juggernaut in terms of its momentum. It used to be that humans wrote almost all of the software. Humans wrote software, then the software did something. Now it’s becoming increasingly the case that the machines are writing the software. People don’t call it that. They call it machine learning or they call it training models. But really what’s happening is the data is coming in, the machines are using the data to write software and then that software runs and does something.
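The "machines writing the software" idea can be illustrated with the simplest possible case: fitting a line to data. The data (invented here for illustration) determines the parameters, and the resulting function -- not any human-authored rule -- is what runs:

```python
# Minimal sketch of "data writes the software": learn y = a*x + b
# from sample points via least squares, then use the learned parameters
# as the "program". Toy data invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def predict(x):
    """The 'software' the data produced."""
    return a * x + b

print(a, b, predict(10))  # 2.0 1.0 21.0
```

Modern machine learning models follow the same pattern at vastly larger scale, which is why -- as Jaffe argues -- the volume of training data matters more than the algorithm itself.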

You see this even with companies that you think of as being in a different industry. Tesla is a car company, but in some ways it’s also a machine learning company where they have to build a car so that they can gather all the data that they need to train their machine learning model to be a self-driving system. The system gets better when it has more data.

This is why Google was able to open-source a lot of its core IP associated with its machine learning systems: they know that their insurmountable advantage is not the IP or the algorithms but rather the staggering amount of data that they have compared to everyone else. If you don’t have that volume of data you can’t train the models and the software. A lot of times, with the software the computers write, the humans don’t even know what it does -- but they don’t have to know, because it works, and it works because it’s trained on really large amounts of data.

A lot of these algorithms were written in the '70s and the '80s, and they weren’t that useful until recently because there wasn’t enough data to train the models. Now, because of these new open source platforms like Hadoop, it’s possible to have seven years of data on your customer ATM system or your credit card processing system or your airline reservation system instead of only three weeks of data, so the models can get really good. When you’re building your clinical analytic system for health care, figuring out which treatments result in the best outcome and you feed all the data in, you can get interesting, useful results whereas before you didn’t have enough data to do that.

That’s a big underlying secular growth opportunity around the play. The short-term capability of the product is a really powerful runtime engine that’s well suited to moving workloads from systems like the mainframe, Teradata, Netezza, and Oracle into Hadoop. It serves as a catcher for those systems, then runs them in a way that’s easy to maintain, high performance, and very secure.

IDGE: To be clear, that’s DMX?
