Analyzing network security events for intrusion detection and forensics is a good and popular reason to implement log management, but it's not the only reason. Auditing and compliance are becoming just as important as traditional security requirements, while the best-run IT shops understand the value of logging for systems and application management.
Regardless of the purpose behind logging, the log management process has several distinct phases: policy definition, configuration, collection, normalization, indexing, storage, correlation, baselining, alerting, and reporting. You may see the various phases summarized in different ways, but the lifecycle is always the same. When choosing a log management solution (see "InfoWorld review: Better network security, compliance with log management"), you'll want to evaluate the product features and capabilities with the whole process in mind.
Policy definition. Policy definition means determining what you're going to audit and alert on. Is your company interested in security event detection, operations and application management, or compliance auditing? Will you be auditing just workstations and servers or also applications and network devices?
Configuration. Once you have decided what you want to audit and why, you need to spell out which log events will help you achieve those goals. Many log management vendors provide "suites" or "packages" of predefined, built-in configurations to support various goals, although I did not find any product that was as inclusive as needed. Each user will have to review what the product is able to capture and alert on, then define additional capturing to fit the demands of their environment. Configuration is the act of translating your audit policies into actionable information capture.
On a related note, you can click here to download a detailed listing of events that should be monitored and reported on in Microsoft Windows networks: Windows Security Event IDs [Excel file].
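To make the idea of translating policy into capture rules concrete, here is a minimal sketch, assuming a Windows environment. The goal names, the groupings, and the `events_to_capture` helper are hypothetical and chosen only for illustration; the event IDs shown are a small sample, not a recommended audit set.

```python
# Illustrative policy-to-event mapping; goal names and groupings are assumptions.
POLICY_EVENTS = {
    "security": {4625, 4720, 1102},    # failed logon, account created, audit log cleared
    "compliance": {4624, 4625, 1102},  # successful logon, failed logon, audit log cleared
}

def events_to_capture(goals):
    """Return the union of event IDs needed for the selected policy goals."""
    selected = set()
    for goal in goals:
        selected |= POLICY_EVENTS[goal]
    return selected

capture = events_to_capture(["security", "compliance"])
```

Keeping the mapping in one place makes it easier to see which events a vendor's predefined package already covers and which ones you still need to add yourself.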
Collection. Data collection involves sending log message events from clients to the log management server. Most products provide agentless data collection or require that client events be forwarded to the server. Most log management products provide agent software to assist with data collection in cases where agentless collection doesn't make sense.
Normalization. Collected data is often parsed and separated into its individual data fields as it enters the data stream. Parsed data (also known as structured data) is typically easier to index, retrieve, and report on. Unparsed data (also known as raw or unstructured data) can normally be collected, but isn't as easy to index, retrieve, or report on. Administrators will often have to write their own parsers, or treat each unstructured message as a single data field and rely on keyword searches to retrieve information.
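As a rough illustration of parsing, the sketch below splits a raw line into named fields with a regular expression. The line layout, the field names, and the `parse_line` helper are assumptions made for this example, not any product's format. Lines that do not match fall back to a single raw field, mirroring the unstructured case.

```python
import re

# Hypothetical layout: "<timestamp> <host> <program>: <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<program>[^:\s]+):\s+(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of fields, or a single 'raw' field if the line does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return {"raw": line.strip()}
    return match.groupdict()

structured = parse_line("2010-06-01T09:15:00-07:00 web01 sshd: Failed password for root")
unstructured = parse_line("free-form text with no recognizable layout")
```

The structured result can be queried by field, while the unstructured one can only be searched by keyword.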
Normalization is the process of resolving different representations of the same types of data into a similar format in a common database. In a log management database, this may involve synchronizing reported event time to a common time format -- say, local time to Coordinated Universal Time. It may mean resolving IP addresses to host names, and anything else that attempts to make disparate information more similar. The more parsed and normalized data you have, the better. When reviewing products, be sure to examine the number of parsers included to make sure they capture the majority of the log information in your environment.
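The time normalization step can be sketched as follows, assuming each event carries an ISO 8601 timestamp that includes its UTC offset. The `to_utc` helper is a hypothetical name; timestamps without an offset are rejected rather than guessed at.

```python
from datetime import datetime, timezone

def to_utc(timestamp_text):
    """Convert an ISO 8601 timestamp with a UTC offset to an ISO string in UTC."""
    local = datetime.fromisoformat(timestamp_text)
    if local.tzinfo is None:
        raise ValueError("timestamp has no UTC offset; cannot normalize safely")
    return local.astimezone(timezone.utc).isoformat()

# Two clients in different time zones reporting the same instant.
west = to_utc("2010-06-01T09:15:00-07:00")
east = to_utc("2010-06-01T18:15:00+02:00")
```

Once both values are expressed in UTC they compare as equal, which is what makes cross-source searching and ordering reliable.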
Indexing. To optimize data retrieval for search queries, filters, and reporting, data needs to be indexed as it is stored. Indexing usually operates on parsed data, although some vendors will also index unstructured data for faster retrieval.
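A minimal sketch of field indexing, assuming events have already been parsed into dictionaries. The `build_index` helper and the sample events are made up for illustration; real products use far more elaborate on-disk structures.

```python
from collections import defaultdict

def build_index(events, fields):
    """Map (field, value) pairs to the positions of the events that contain them."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        for field in fields:
            if field in event:
                index[(field, event[field])].append(position)
    return index

events = [
    {"host": "web01", "program": "sshd", "message": "Failed password for root"},
    {"host": "db01", "program": "sshd", "message": "Accepted password for admin"},
    {"host": "web01", "program": "cron", "message": "job started"},
]
index = build_index(events, ["host", "program"])
web01_events = [events[i] for i in index[("host", "web01")]]
```

A lookup by host or program now touches only the matching positions instead of scanning every stored event.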
Storage. Captured data needs to be stored to medium- or long-term storage. All products save to local hard drives, and some can store to external storage arrays, such as SAN, NAS, and so on. All the products tested allow event messages to be exported for long-term storage and later retrieved if needed. If you're concerned about legal chain of custody requirements, make sure the solution you're evaluating cryptographically signs all stored messages.
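To show what tamper evidence on stored messages can look like, the sketch below attaches a keyed hash (HMAC-SHA256) to each record. This is a stand-in chosen for brevity, not a description of how any reviewed product signs data; a real deployment might use public-key signatures or hash chaining instead, and the key and record shown are invented.

```python
import hashlib
import hmac

def sign_message(key, message):
    """Return a hex HMAC-SHA256 tag for a stored log message."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_message(key, message, signature):
    """Check a stored message against its tag using a constant-time comparison."""
    expected = sign_message(key, message)
    return hmac.compare_digest(expected, signature)

key = b"example-key-for-illustration-only"
record = "2010-06-01T16:15:00+00:00 web01 sshd: Failed password for root"
signature = sign_message(key, record)
```

Any later change to the record, even a single character, causes `verify_message` to return `False`.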
Correlation. Correlation is the process of taking different events from the same or different sources and recognizing them as a single event. For example, some log management products have the ability to recognize a packet flood or password guessing attack, versus simply reporting multiple dropped packets or failed logons. Correlation reflects product intelligence. Log management products that excel at correlation are known as Security Information and Event Managers (SIEM). A number of products in the review combine log management (log collection, storage, querying, and reporting) and SIEM functionality, but only their log management functionality was evaluated.
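The password guessing case can be sketched with a sliding time window, as below. The event tuple layout, the default threshold of five failures in 60 seconds, and the `find_password_guessing` helper are all assumptions for illustration; commercial correlation engines are considerably more sophisticated.

```python
from collections import defaultdict, deque

def find_password_guessing(events, threshold=5, window_seconds=60):
    """Return (user, start_time, end_time) for each burst of failed logons.

    events: iterable of (epoch_seconds, user, outcome) tuples sorted by time.
    """
    recent = defaultdict(deque)  # user -> timestamps of recent failures
    incidents = []
    for timestamp, user, outcome in events:
        if outcome != "failure":
            continue
        window = recent[user]
        window.append(timestamp)
        # Drop failures that fall outside the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            incidents.append((user, window[0], timestamp))
            window.clear()  # start fresh so one burst yields one incident
    return incidents

events = [(t, "root", "failure") for t in range(0, 50, 10)]  # 5 failures in 40 s
events += [(300, "alice", "failure"), (305, "alice", "success")]
incidents = find_password_guessing(events)
```

Five raw failed-logon events for `root` collapse into one incident, while the isolated failure for `alice` is not reported at all.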
Note that for centralized log management to work well, all incoming log information must carry accurate time stamps. Make sure that all monitored clients have the correct time and time zone. This will help with reporting and forensic analysis, and it matters for legal purposes.
Baselining. Baselining is the process of defining what is normal in a particular environment, so that alerting is done on only aberrant patterns and events. For instance, every network environment has multiple failed logons during the day. How many failed logons are normal? How many failed logons in a particular time period should be reported as suspicious? Some log management products will listen to incoming message traffic and help set alerts when levels have exceeded certain thresholds. If the product doesn't do it, you must.
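If you have to set the threshold yourself, one simple approach (an assumption here, not a method prescribed by any product) is to take the historical mean and add a few standard deviations. The hourly counts below are invented, and `alert_threshold` and `is_suspicious` are hypothetical helper names.

```python
from statistics import mean, stdev

def alert_threshold(hourly_counts, sigmas=3.0):
    """Suggest a threshold: the historical mean plus a number of standard deviations."""
    return mean(hourly_counts) + sigmas * stdev(hourly_counts)

def is_suspicious(count, hourly_counts, sigmas=3.0):
    """Flag a count that exceeds the suggested threshold."""
    return count > alert_threshold(hourly_counts, sigmas)

history = [4, 6, 5, 7, 3, 5, 6, 4]  # made-up failed-logon counts per hour
threshold = alert_threshold(history)
```

With this history the threshold lands just under nine failed logons per hour, so an hour with six failures passes quietly and an hour with 40 does not. A baseline like this should be recomputed periodically, because what is normal changes as the environment changes.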
Alerting. When a critical security or operational event happens, it's important that a response team be notified. Most products support email and SNMP alerting, and others support paging, SMS, network broadcast messages, and syslog forwarding. A few products interface with popular help desk products so that a service ticket can automatically be generated and routed.
It's also crucial that alerting thresholds suppress the stream of repeated alerts that a single underlying event would otherwise trigger; most products support this feature. For example, you don't want to be alerted 1,000 times about a single, continuing port scan across multiple ports. One alert should be enough to get the response team moving.
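Alert suppression can be sketched as a per-event quiet period, as below. The `AlertSuppressor` class, the one-hour quiet period, and the key made of an alert type plus a source address are assumptions for illustration only.

```python
class AlertSuppressor:
    """Allow one alert per key within a quiet period; suppress the rest."""

    def __init__(self, quiet_seconds=3600):
        self.quiet_seconds = quiet_seconds
        self._last_sent = {}  # key -> time of the last alert actually sent

    def should_alert(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.quiet_seconds:
            return False
        self._last_sent[key] = now
        return True

suppressor = AlertSuppressor(quiet_seconds=3600)
# 1,000 detections of one port scan, one per second, from the same source.
sent = sum(suppressor.should_alert(("port-scan", "10.0.0.5"), now=t) for t in range(1000))
```

Because the quiet period is measured from the last alert actually sent, a scan that is still running an hour later triggers one reminder rather than silence, and an unrelated event with a different key is never held back.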
Reporting. Reporting on all collected events makes long-term baselining and metrics possible. Critical events should both appear in reports and trigger alerts. Reporting allows technical teams to pinpoint problems and management to gauge compliance efforts.
You can find a more detailed discussion of the log management lifecycle and security auditing in my downloadable report, "Log Analysis Deep Dive: Finding Gold in Log Files."