Small IT, big problems: Log data reveals the unknown

Even the smallest IT team can leverage log management for smoother operations and stronger security

Small IT teams, like large IT teams, depend on up-to-date information for troubleshooting, security incident investigations, and other important tasks. Pinpointing and prioritizing issues quickly and efficiently can mean the difference between crisis and success for a business, regardless of size. Much of this critical up-to-date information comes in the form of log data.

It’s important to note that a "small IT team" doesn’t mean small data. Many small IT teams and their associated systems generate many gigabytes of logs every day. However, with limited budgets and staff, small IT teams face unique challenges when it comes to log analysis. Such teams are, not surprisingly, commonplace in the millions of small businesses out there (and even in some large businesses). According to Spiceworks’ annual report on technology adoption trends, a company with 99 employees or fewer typically has 2.5 or fewer IT professionals on staff.

Log data is the Clark Kent of IT data: mild and unassuming on the outside, but concealing tremendous power. Logs capture the vital signs of your business and provide a definitive record of everything from the popularity of products to the security and performance of your network. Analyzing log data correctly transforms the way decisions are made and even the way the business operates. All too often, this data sits dormant on servers, waiting to be discovered. But it doesn’t have to be that way -- small businesses and IT teams simply need a way to leverage it.

Understanding log data

One of the biggest problems IT teams face with respect to log data is simply aggregating and correlating the data. There are no data standards. Every custom application has custom logs. Existing tools like grep have serious limitations, such as limits on the length of the log lines they can return (see Figure 1).

Figure 1: The reality of using grep to manually sift through log files.

When you have only 2.5 IT professionals on staff, aggregating and correlating all of the log data your company creates becomes a monumental task. The task is further compounded by today’s technology, which moves so fast that new log data is created continuously, 24/7, by every application and system in your infrastructure.

For your IT team to successfully leverage log data, you first need to find a way to manage it.

Collect and centralize

Aggregating log data in one place -- as it’s generated from apps, infrastructure, and distributed environments -- is essential to getting an end-to-end view of IT. Having to search through individual silos of data and manually make correlations can be time-consuming, especially when a key service is down. For example, sending all syslog and Windows events to a single place means you can break away from having to rely on multiple point tools to resolve an issue. Automating the collection of log data and centralizing it is the starting point to getting more value from your data. (See Table 1 for examples of data that IT departments need to centralize and collect.)
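
To make the idea concrete, here is a minimal sketch in Python of a centralized collector: it listens for syslog-style messages over UDP and appends each raw line, tagged with the sending host, to a single file. The port, file name, and buffer size are illustrative assumptions -- a real deployment would use a dedicated forwarder or log management tool -- but the principle is the same.

    import socket

    # Minimal sketch of centralized collection: listen for syslog-style
    # UDP datagrams and append each raw message to one central file.
    # Port 5140 and "central.log" are illustrative choices; classic
    # syslog uses UDP port 514, which requires elevated privileges.
    HOST, PORT = "0.0.0.0", 5140

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open("central.log", "a", encoding="utf-8", errors="replace") as out:
        sock.bind((HOST, PORT))
        while True:
            data, addr = sock.recvfrom(8192)              # one datagram
            line = data.decode("utf-8", errors="replace").rstrip()
            out.write(f"{addr[0]} {line}\n")              # keep it raw, tag the sender
            out.flush()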

Most tools and manual approaches require users to normalize or select specific data from the log files, which takes time and discards context you may need later. A better approach is to collect in real time and keep log data in its raw, native format so that it can answer unforeseen questions. However, that can be challenging. There are no standard formats for log data. Nearly every system, application, and security device will have a different log data format.

Spreadsheets and BI tools break when used to analyze log data from disparate systems. The minute a schema is imposed and the data is forced into rows and columns, adding log data from yet another system becomes a mountain of work. The worst-case scenario is needing context from a full log file, only to find that the context was lost when the data was normalized or put through ETL.

Search and monitor

Combing through log data to identify an issue is essentially looking for a needle in a haystack. Just as someone might use a magnet to find that needle, small IT teams need an easy way to single out the log files that matter -- the ones that indicate whether something went wrong, a system failed, or security was breached.

The solution to this problem is simple, one we’re all quite familiar with: search. Having a search function to explore centralized log data removes the need to use grep or a spreadsheet to find IT issues. Imagine being able to simply type in error OR fail* and instantly see every relevant log entry, across all of your systems.
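
As a rough sketch of what that search amounts to -- assuming logs have already been centralized into a logs/ directory, an assumed location for illustration -- the query error OR fail* becomes a case-insensitive pattern applied to every line of every file:

    import re
    from pathlib import Path

    # Sketch of "error OR fail*" as a search over centralized log files.
    # The logs/ directory is an assumed location, not a standard one.
    query = re.compile(r"error|fail\w*", re.IGNORECASE)

    for path in Path("logs").rglob("*.log"):
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if query.search(line):
                    print(f"{path}:{lineno}: {line.rstrip()}")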

While being able to search for issues will save significant time and reduce costs, the real benefit comes from transitioning from a reactive to a proactive approach. By proactively monitoring for the likes of error OR fail* and receiving an alert when it shows up, IT teams can identify and address problems before they become full-blown fire drills. (See Figure 2 for an example of a search that pulls up all errors and failures in a log file.)
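
A minimal sketch of that shift, under the same assumptions as above: rather than running the search after something breaks, follow the central file as it grows and raise an alert the moment a matching line arrives. The alert function here is a placeholder -- a real team might page, email, or post to a chat channel instead of printing.

    import re
    import time

    # Sketch of proactive monitoring: follow the central file (like
    # tail -f) and alert as soon as a line matches "error OR fail*".
    pattern = re.compile(r"error|fail\w*", re.IGNORECASE)

    def alert(line):
        print(f"ALERT: {line.rstrip()}")   # placeholder for paging/email/chat

    with open("central.log", encoding="utf-8", errors="replace") as f:
        f.seek(0, 2)                       # jump to the end: watch only new data
        while True:
            line = f.readline()
            if not line:
                time.sleep(1.0)            # no new data yet; wait and retry
                continue
            if pattern.search(line):
                alert(line)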

Figure 2: Searching log files for errors or failures offers quick rewards. Note how a visual timeline helps isolate the issue.

Visualize and report

Creating dashboards and reports on all relevant data provides at-a-glance information on IT health and issues. For example, creating a visualization of failures by host will help an IT team to better prioritize and knock out the biggest failures first (see Figure 3 for an example of the power of visualizations).
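
As an illustration of how little it takes to get started, here is a rough sketch of a "failures by host" view, assuming each line in the central file begins with a host tag (as in the hypothetical collector sketch earlier):

    import re
    from collections import Counter

    # Count error/failure lines per host and print a crude text chart,
    # biggest offenders first -- so the biggest failures get fixed first.
    fail = re.compile(r"error|fail\w*", re.IGNORECASE)
    counts = Counter()

    with open("central.log", encoding="utf-8", errors="replace") as f:
        for line in f:
            if fail.search(line):
                host = line.split(maxsplit=1)[0]   # first token = host tag
                counts[host] += 1

    for host, n in counts.most_common():
        print(f"{host:15} {'#' * min(n, 60)} ({n})")   # cap the bar length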

The natural progression is to move from individual visualizations to dashboards, which include multiple visualizations based on views of both live and historical data. Dashboards are the home base for IT teams to monitor multiple thresholds and trend-based conditions.

It’s important to note that identifying and drilling down into problems directly from a dashboard is essential to an efficient workflow when managing and using log data. Being able to click down from a chart or graph directly to raw data reduces the time required to address issues, and it helps teams move from a reactive to a proactive state.

Figure 3: Visualizations can quickly show you where the problems are occurring.

Centrally collecting and analyzing log data is only the starting point. The real benefits begin once teams are able to break out of fire-drill mode and automate the process.

Eliminate manual searches

Moving beyond grep and manual log searches will reduce the mean time to resolution for problems and free up IT to work on more strategic projects. Keep in mind that access to log data spanning all systems and applications -- free of silos -- is critical to getting the insights needed to resolve issues and realize the full value of log data. Searching one or two applications or even a single system could end up addressing only the symptoms of a problem and not the root cause. In addition, digging through individual logs and sources one at a time erodes the efficiency gains that come with eliminating the manual process.

Monitor systems, applications, and KPIs

An ounce of prevention is worth a pound of cure. Proactively monitoring Web and application data for issues will help IT teams get out of fire-drill mode. Monitoring for issues and triggering an alert when a key performance indicator isn’t met or a system is down will fundamentally transform an IT team from a reactive to a proactive mind-set.
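
One hedged example of such a trigger, with an arbitrary threshold of 10 matching lines per minute (both numbers are assumptions to tune against your own KPIs): read log lines from standard input -- piped, say, from tail -f central.log -- and warn when the error rate spikes.

    import re
    import sys
    import time
    from collections import deque

    # Sketch of a threshold alert: warn when more than MAX_ERRORS
    # matching lines arrive within a sliding WINDOW_SECONDS window.
    pattern = re.compile(r"error|fail\w*", re.IGNORECASE)
    WINDOW_SECONDS, MAX_ERRORS = 60, 10
    recent = deque()                       # arrival times of matching lines

    for line in sys.stdin:
        if not pattern.search(line):
            continue
        now = time.time()
        recent.append(now)
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()               # drop timestamps outside the window
        if len(recent) > MAX_ERRORS:
            print(f"KPI breach: {len(recent)} error lines in {WINDOW_SECONDS}s",
                  file=sys.stderr)

Run it as, for example, tail -f central.log | python kpi_alert.py, where the script name is hypothetical.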

Making that transformation plays a major role in helping IT teams to spend their limited resources identifying opportunities to optimize systems and applications, rather than reacting to problems. Becoming proactive ultimately enables the team to take on more strategic work, such as improving IT security.

Improve IT security

Because log data contains a definitive record of activity across your infrastructure and network, it also includes information that could indicate fraud, breaches, and advanced threats. Using log data to support IT security can accelerate security investigations and help determine the root cause of a breach. Using a spreadsheet or other row-and-column tool for log data can break an investigation or even inadvertently remove the data that holds the key to a breach -- because important data didn’t fit into a particular schema.
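
For instance, a first pass at investigating brute-force login attempts can come straight from the raw logs, with no schema required. The sketch below assumes OpenSSH-style "Failed password" lines have landed in the central file and counts attempts per source address:

    import re
    from collections import Counter

    # Security triage sketch: count failed SSH logins per source IP.
    # Assumes OpenSSH-style auth messages are in the central file.
    failed = re.compile(r"Failed password .* from (\d+\.\d+\.\d+\.\d+)")
    attempts = Counter()

    with open("central.log", encoding="utf-8", errors="replace") as f:
        for line in f:
            m = failed.search(line)
            if m:
                attempts[m.group(1)] += 1

    for ip, n in attempts.most_common(10):  # 10 most suspicious sources
        print(f"{ip:15} {n} failed logins")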

Resolving security issues can also be time-sensitive. Being able to search across log data quickly or receive an alert when anomalous activity takes place is a mission-critical capability for preventing or stopping an attack.

Like Clark Kent, who grew from a mild Midwestern young man into a full-fledged superhero, log data should make the transformation from unassuming feedback into powerful insight that helps you overcome complex issues. Remember: Simply because you might be a small business or have a small IT team doesn’t mean you can’t leap tall buildings in a single bound or be more powerful than a locomotive.

Shay Mowlem is VP of product management at Splunk.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
