Look deep into log files

Omnisight offers powerful, flexible analysis if you have the skills

As every server administrator knows, log files are the pulse of a network infrastructure. They tell us what has occurred in an application or service, and if they stop growing, something is wrong. Log files can tell us who is using our services, how many users are using a particular resource, how often, and for how long. Logs can also be extremely valuable as forensic evidence in computer crime investigation and litigation. The trick is to be able to use log files to analyze trends in resource utilization, identify and remove security threats, and provide a useful audit trail of user action, without being buried by the sheer volume or resorting to the onerous task of manual inspection.

A large infrastructure can generate many gigabytes of logs from several services and applications every day. These logs are usually archived for a period of time, analyzed, and then discarded on a predetermined schedule. The data contained in these logs may be extremely important or completely irrelevant. Either way, the logs need to be perused to determine which is which, and what is worth flagging for further investigation. For instance, the only way to really know how a Web site is performing is to generate reports based on the Web server log-file data, and use those reports to determine if there are problems with the servers or with the site itself.

Addamark Technologies addresses these issues with Omnisight 2.5. Currently implemented at Lehman Brothers, Yahoo, and Agilent Technologies, among others, Omnisight allows systems managers to extract meaningful data from truly massive log files generated by services and applications by providing a means to import, store, and perform deep analysis on that data.

Heavy Parsing

Addamark’s Omnisight is best described as a programming framework for log-file analysis. Relying on open source packages such as Apache, with a heart written in C and a nervous system written in Perl, Omnisight is not a tool for the faint of heart or light of wallet. Omnisight runs on Red Hat Linux 7.3 or Red Hat Enterprise Linux AS 2.1, with support for Red Hat 8 nearing completion. Omnisight is designed to be implemented in a distributed environment and installed on a cluster. Exchanging SSH (Secure Shell) public keys for the root user between the cluster servers permits seamless installation of the cluster, but could be viewed as a minor security risk. In my testing, however, the cluster installation was simple and was controlled from a single installer script. When it completed, three Red Hat Linux servers were ready to handle log files.

Omnisight isn’t designed to handle live log files, but to import large, static log files into a database. Importing a log file first requires creating a parsing file that describes the data to be indexed. For instance, to parse an Apache Web server log, you’ll need to create a file containing the specific log format, parsing rules, variable declarations, and potentially embedded Perl code to handle special-case log files and the varying reporting formats found in many applications. These files must be written with care and tested thoroughly, as any deviation from the expected format will result in parsing errors and lost data. Once the parsing file is complete, it is referenced by the indexing engine, which then imports the log file.
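To give a sense of what a parsing rule has to capture, here is a minimal sketch in Python. It is not Omnisight's parsing-file syntax, which this review does not reproduce; it only illustrates the idea of declaring a log format, typing the fields, and setting aside lines that do not match instead of silently losing them. The pattern assumes the Apache Common Log Format, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed format: Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month names are parsed with the default (English) locale.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def parse_lines(lines):
    """Split input into parsed records and (line number, text) rejects."""
    records, rejects = [], []
    for number, line in enumerate(lines, start=1):
        record = parse_clf_line(line)
        if record is None:
            rejects.append((number, line))
        else:
            records.append(record)
    return records, rejects
```

Keeping the rejects alongside the parsed records is one way to test a rule against a sample file before a full import: a nonempty reject list points at exactly the lines the format description fails to cover.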

Addamark provides a few sample files, but it would be great to see more included with the package. Omnisight is very powerful and flexible, but it’s also very complex. You can use it to analyze log files from any application, in any format, whether supported by Addamark or not, but doing so requires significant skill. Most of the customization of the import tools is done in a mixture of Perl and SQL. Implementing Omnisight correctly requires a high level of Perl and SQL experience, although Addamark does include seven days of assistance in the cost.

Once a log file has been imported, it can be analyzed by the engine. Querying and reporting are done via a CLI and a Web front end, both of which are minimalist. The Web interface provides a central place for reporting, query construction, and maintenance, while the CLI tools are split into separate functions.

Deep Analysis

Reports rely on queries to the database, and SQL queries must be written for the specific log file to be analyzed. Addamark provides a handful of queries that highlight the sample log files, but again, significant skill is required to write queries for anything beyond that. Once queries have been written, they can be collected into reports to be run by an administrator, or by an authorized user. No reports are provided with Omnisight; all reports must be developed in-house.
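As a rough illustration of the query-then-report workflow, the sketch below uses Python's built-in sqlite3 module as a stand-in for the Omnisight database. The table name hits, its columns, and the query names are all assumptions made for this example; Omnisight's own schema and SQL dialect are not shown in this review.

```python
import sqlite3


def build_demo_db(rows):
    """Load (host, path, status, size) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (host TEXT, path TEXT, status INTEGER, size INTEGER)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    return conn


# A "report" here is simply a named collection of queries (hypothetical names).
REPORT_QUERIES = {
    "requests_by_status": (
        "SELECT status, COUNT(*) FROM hits GROUP BY status ORDER BY status"
    ),
    "bytes_by_host": (
        "SELECT host, SUM(size) FROM hits "
        "GROUP BY host ORDER BY SUM(size) DESC, host"
    ),
}


def run_report(conn, queries=REPORT_QUERIES):
    """Run every query in the report and return the results keyed by name."""
    return {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
```

Each query is tied to the columns of one particular log format, which is why queries and reports have to be written per log type.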

With its format-agnostic approach, Omnisight can be adapted to handle just about any log file analysis task or objective. An obvious use is to collect and analyze data from a variety of network devices to investigate a suspected employee or an external break-in. Another adaptation could be cross-analyzing log files generated by security-card access devices and the PBX to determine if anyone who wasn’t logged entering the building was making phone calls.
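The badge-versus-PBX idea comes down to a set difference across two log sources. The following Python sketch shows that logic only; the record layout (dicts with employee and date keys) and the function name are hypothetical, and it assumes both logs have already been parsed and that PBX extensions have been mapped to employees.

```python
def callers_without_badge_entry(badge_events, pbx_calls):
    """Return sorted (employee, date) pairs that placed calls without a badge entry.

    Both inputs are iterables of dicts with "employee" and "date" keys
    (a hypothetical layout for already-parsed log records).
    """
    badged = {(event["employee"], event["date"]) for event in badge_events}
    called = {(call["employee"], call["date"]) for call in pbx_calls}
    # Anyone who made a call on a day with no recorded entry is worth a look.
    return sorted(called - badged)
```

In a SQL-based tool the same comparison would typically be written as a join or a NOT EXISTS subquery across the two imported logs.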

Addamark is aiming Omnisight at large infrastructures with heavy-duty log-file storage, maintenance, and analysis requirements, hence the built-in clustering. Log files imported into Omnisight are meant to stay there indefinitely rather than be discarded after a period of time. To make that practical, every log file is compressed during import. On a five-node cluster of dual Xeon servers, a 500MB Check Point firewall log in LEA (Log Export API) format was imported, mirrored, and compressed in 125 seconds, with a nearly 10:1 compression ratio. On the same cluster, a fairly complex query of 6 million records returned in 23 seconds. Both are truly impressive feats.
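To put those figures in more familiar terms, the short Python sketch below converts them into rates. The input numbers come from the test above; the helper names are hypothetical, and the compressed size treats the "nearly 10:1" ratio as exactly 10:1, so it is only an approximation for a single copy of the data.

```python
def import_throughput_mb_per_s(size_mb, seconds):
    """Average import rate in megabytes per second."""
    return size_mb / seconds


def compressed_size_mb(size_mb, ratio):
    """Approximate stored size of one copy for a given compression ratio."""
    return size_mb / ratio


def query_rate_records_per_s(records, seconds):
    """Average number of records scanned per second by a query."""
    return records / seconds


# Figures reported for the five-node cluster.
throughput = import_throughput_mb_per_s(500, 125)   # 4.0 MB/s
stored = compressed_size_mb(500, 10)                # about 50 MB, assuming 10:1
rate = query_rate_records_per_s(6_000_000, 23)      # roughly 260,870 records/s
```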

Omnisight is a powerful and flexible tool for log-file analysis. The scope of its reach is almost limitless, given its open architecture and highly customizable parsing functions. This is not a tool to be installed by administrators and driven by nontechnical management; it’s a tool to be carefully implemented and maintained by skilled programmers. If you have the need to store and analyze massive log files from a wide variety of services and devices — and if you have the skill to handle the implementation — Omnisight can handle the load.

InfoWorld Scorecard: Addamark Omnisight 2.5
Value (10.0%): 8.0
Implementation (20.0%): 8.0
Management (20.0%): 7.0
Performance (30.0%): 9.0
Scalability (20.0%): 9.0
Overall Score (100%): 8.3