Devops open source monitoring tools can reduce cost and increase uptime

Open source tools can help devops engineers cost-effectively see and correlate data from multiple data streams.

Recent progress in the open source world of devops tools can help enterprises reduce costs and increase uptime. The key is for managers to know how to choose and implement those tools.

Break down silos of data

Look for tools that support the core devops cultural shift: a bias toward working collaboratively. The modern enterprise often has very siloed data and information, and monitoring tools should help bring that data together. With the right tools, devops engineers can make sure data is shared not just with the ops team, but with the dev team and the business team. IT managers need a report that’s generated off the same data that goes to the business. With the right tools, managers have a real-time report that goes to the business, the dev teams, and the ops teams, so that they are all looking at the same data at the same time.

This reduces the time needed to resolve issues. It eliminates the problem of the network, development, and database teams each looking at different data and not immediately seeing where a problem lies. The right tools ensure that everyone is looking at the same screen, with the same red dots in the same places. That cuts down on long conversations and incident calls.

That’s just the basics. Some tools are integrating business data into that same platform, including tweets. For example, if there’s suddenly a spike in Twitter users saying, “Hey, I can’t access X,” the tech team can see that. Few organizations look at that data today, yet it’s a key performance metric for customer service and a good early indicator of a problem.
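
As a rough sketch of how that might work in practice, the short Python loop below polls for mention counts and alerts when the latest count spikes well above the recent baseline. The feed-polling helper, window size, and threshold are all hypothetical; a real implementation would call a social listening or search API.

import statistics
import time

def fetch_mention_count() -> int:
    """Hypothetical helper: count recent posts that pair your product
    name with an outage phrase such as "can't access". In practice this
    would call a social media search or listening API."""
    raise NotImplementedError("wire this to your social listening API")

def watch_for_spikes(window: int = 30, sigma: float = 3.0) -> None:
    """Alert when the latest count is far above the recent baseline."""
    history = []
    while True:
        count = fetch_mention_count()
        if len(history) >= window:
            mean = statistics.mean(history)
            stdev = statistics.pstdev(history) or 1.0
            if count > mean + sigma * stdev:
                print(f"ALERT: mention spike ({count} vs. baseline {mean:.1f})")
            history.pop(0)
        history.append(count)
        time.sleep(60)  # poll once a minute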

Some of the big players, including HP and IBM, already offer these types of tools, but they are pricey and sometimes require commitments to other parts of the vendor’s platform. Maintaining such a system can take a team of five people, which is often unacceptable to a midsize organization.

What many midsize organizations, and some larger enterprises, are opting for instead is an open source tool that sits on top of the ELK stack (Elasticsearch, Logstash, and Kibana). Because it’s built on open source components, it can be implemented rapidly and scaled easily.
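
As a minimal sketch of what “on top of the ELK stack” means in practice, the snippet below indexes one monitoring event into Elasticsearch using the official Python client (8.x-style API); Kibana dashboards for the ops, dev, and business teams can then all be built over the same shared index. The index name and document fields are illustrative.

from datetime import datetime, timezone
from elasticsearch import Elasticsearch  # official Elasticsearch client, 8.x

es = Elasticsearch("http://localhost:9200")  # assumed local ELK deployment

# Index a single monitoring event into a shared index that every
# team's Kibana dashboards can query.
es.index(
    index="monitoring-events",
    document={
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "service": "checkout",
        "status": "degraded",
        "latency_ms": 1240,
        "source": "synthetic-probe-singapore",
    },
)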

Synthetic monitoring

In the devops world, synthetic monitoring is gaining a lot of traction. With synthetic monitoring, engineers run simulated user transactions against the application from external locations. The simplest synthetic test is a ping test to see whether the site is up. Engineers can run it from multiple locations, every minute, to make sure that performance for users in Singapore is just as good as performance for users in Cincinnati, for example. The testing is done repeatedly from all the locations around the globe where users are. That’s key to ensuring that features are working as expected for all your users, all the time.
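
A minimal sketch of that simplest check, written with Python’s requests library, appears below. In production the same script would run from probes in each region rather than from a single machine; the endpoint and latency budget are placeholders.

import time
import requests

SITE = "https://example.com/health"  # placeholder endpoint
LATENCY_BUDGET_MS = 500              # illustrative threshold

def synthetic_ping(url: str) -> None:
    """One synthetic check: is the site up, and how fast does it answer?"""
    start = time.monotonic()
    try:
        resp = requests.get(url, timeout=10)
        elapsed_ms = (time.monotonic() - start) * 1000
        if resp.status_code != 200:
            print(f"DOWN: {url} returned HTTP {resp.status_code}")
        elif elapsed_ms > LATENCY_BUDGET_MS:
            print(f"SLOW: {url} answered in {elapsed_ms:.0f} ms")
        else:
            print(f"OK: {url} answered in {elapsed_ms:.0f} ms")
    except requests.RequestException as exc:
        print(f"DOWN: {url} unreachable ({exc})")

while True:
    synthetic_ping(SITE)  # in production: per region, every minute
    time.sleep(60)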

The next step up in synthetic monitoring is defining the key product flows. Whether it’s adding something to the cart or completing a checkout on an e-commerce site, or posting a comment on a blog thread, these flows all need to be tested.
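
Scripting a key flow means walking the whole journey end to end, not just probing one URL. The sketch below simulates a hypothetical add-to-cart and checkout journey with Python’s requests library; the base URL, endpoints, and payloads are invented stand-ins for a real application’s API.

import requests

BASE = "https://shop.example.com"  # hypothetical e-commerce site

def step(name: str, resp: requests.Response) -> None:
    """Fail the journey as soon as any step returns a non-2xx status."""
    if not resp.ok:
        raise RuntimeError(f"flow broken at '{name}': HTTP {resp.status_code}")

def checkout_journey() -> None:
    """Simulate one user journey: log in, add to cart, check out."""
    s = requests.Session()
    step("login", s.post(f"{BASE}/api/login",
                         json={"user": "probe", "password": "secret"}))
    step("add to cart", s.post(f"{BASE}/api/cart",
                               json={"sku": "TEST-001", "qty": 1}))
    step("checkout", s.post(f"{BASE}/api/checkout",
                            json={"payment": "test-card"}))
    print("checkout flow healthy")

checkout_journey()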

Most synthetic monitoring tools can be used for things other than devops, but devops can’t be done without synthetic monitoring. It’s the way these tools are used in a collaborative environment that makes them a natural fit for devops.

Synthetic monitoring is becoming more and more embedded in standard devops practices. In many devops organizations there isn’t a separate QA team; where one exists, it’s generally responsible for setting strategy, establishing guiding principles, and providing a framework under which each agile team operates.

Synthetic monitoring is low-hanging fruit for monitoring and automation, which is core to devops. A company can’t have its site be down and not know about it. Synthetic monitoring tells you that, though it only tells you when there is a problem; it can’t predict problems. Nevertheless, if you’re not running synthetic monitoring of basic user journeys on your key properties, from your key locations, you’re missing something fundamental. It’s a key first step.

Machine learning is supported

Machine learning can also quickly ramp up efficiency. The tools can ingest all of an organization’s incident data, including ServiceNow records. Engineers can identify when incidents happened and train the system to look for the same trends, so that managers can begin to predict outages.
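
A hedged sketch of that idea appears below: a scikit-learn classifier trained on a hypothetical export of past incident data, where each row describes a window of system metrics and a label marks whether an incident followed. The file name, columns, and features are all invented; a real pipeline would need far more careful feature engineering and validation.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical export: one row per five-minute window of system metrics,
# labeled 1 if an incident ticket was opened within the following hour.
df = pd.read_csv("incident_windows.csv")
features = ["cpu_pct", "error_rate", "latency_p95_ms", "queue_depth"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["incident_soon"], test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.2f}")

# In production, the model would score live metric windows and page
# the on-call team when the predicted incident probability is high.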

To reduce downtime, tools are important, but so is the culture around them. In addition to choosing tools, managers need to discuss organizational structure and talk through all the moving parts related to application rollouts. They need a change communicator working with the organization, devops engineers working with the teams, and perhaps off-shored coverage for 24/7 work. Finally, managers should build a roadmap identifying what can be further automated and where machine learning can be applied.

Copyright © 2018 IDG Communications, Inc.