The NIST guide steps through all of the essentials of log file management: identifying the threats and risks to your environment; determining policies for logging, auditing, and handling logs; collating, indexing, and normalizing logs for analysis; defining and generating alerts and actions for critical events; and defining reports and metrics for management review. From putting log management infrastructure and processes into place to reviewing and archiving logs, it leaves no stone unturned.
One of the most important determinants of success is how much you can automate the process -- because what you don't automate you probably won't do. For example, it's critical to let an event management system (often described with acronyms such as SIM, SEM, or SIEM) do all the hard work. It should be configured to collect, filter, and analyze the data, and to prioritize and generate alerts. You don't want to stare at reported events all day deciding which ones should be acted upon.
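To make the "let the system do the hard work" idea concrete, here is a minimal sketch of rule-based triage: each event gets a score, and only events above an alert threshold ever reach a human. The keywords, weights, and threshold are invented for illustration and are not drawn from any particular SIEM product.

```python
# Minimal triage sketch: score each event message and surface only
# those above an alert threshold. Rules and weights are illustrative.

ALERT_THRESHOLD = 7

RULES = [
    ("failed logon", 3),
    ("admin", 5),
    ("firewall drop", 1),
]

def score(message: str) -> int:
    """Sum the weights of every rule keyword found in the message."""
    return sum(weight for keyword, weight in RULES if keyword in message.lower())

def triage(messages):
    """Return only the messages worth a human's attention."""
    return [m for m in messages if score(m) >= ALERT_THRESHOLD]

stream = [
    "Firewall drop from 203.0.113.9",
    "Failed logon for ADMIN account from 203.0.113.9",
]
print(triage(stream))  # only the high-scoring event survives
```

A real SIEM applies far richer correlation than keyword weights, but the shape is the same: the machine scores and filters; people see only what crosses the line.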
For every event record you collect, you need to determine its criticality. Most event log records are unimportant, meaning that they don't lead to an alert or action item. You must store them for some period of time in case they are needed for troubleshooting or forensic analysis after the fact. The big questions are where to store them and where to filter them.
One school of thought is that all events, regardless of criticality, should be sent to a centralized server. Then if they are ever needed, investigators need only look in one place to see all events. A central repository is a great idea in a perfect world. But considering that 1,000 computers and a firewall can generate 25GB of data each day, sending all of your logs to one place can have a huge impact on the network and centralized storage. Even a multiterabyte SAN can hold only so many days of data.
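A quick back-of-the-envelope calculation shows how fast a central store fills up. The 25GB-per-day figure is the one cited above; the 4TB SAN size is an assumed example.

```python
# Retention estimate for a central log store, using the article's
# figure of roughly 25 GB of logs per day from ~1,000 computers plus
# a firewall. The 4 TB SAN capacity is an assumed example.

def retention_days(storage_gb: float, daily_volume_gb: float) -> int:
    """Whole days of logs the store can hold before it fills."""
    return int(storage_gb // daily_volume_gb)

# A 4 TB (4,000 GB) SAN at 25 GB/day holds under six months of raw logs.
print(retention_days(4000, 25))  # -> 160
```

And that is before indexing overhead, which in practice can multiply the raw volume.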
Another idea is to filter out unimportant events at the client and send only medium and critical events to the centralized server. If the unimportant events are needed, they can be examined on the client in question or pulled into the centralized server whenever deeper analysis is needed.
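The split described above can be sketched in a few lines: everything stays on the client, but only events at or above a chosen severity are forwarded. The severity labels and event structure here are invented for illustration.

```python
# Client-side filtering sketch: keep all events locally, forward only
# medium- and critical-severity events to the central server.
# Severity names and event fields are illustrative, not product-specific.

FORWARD_SEVERITIES = {"medium", "critical"}

def split_events(events):
    """Partition events into (forwarded, local_only) lists."""
    forwarded, local_only = [], []
    for event in events:
        if event["severity"] in FORWARD_SEVERITIES:
            forwarded.append(event)
        else:
            local_only.append(event)
    return forwarded, local_only

events = [
    {"severity": "low", "msg": "service started"},
    {"severity": "critical", "msg": "logon failure for decoy account"},
]
forwarded, local_only = split_events(events)
print(len(forwarded), len(local_only))  # -> 1 1
```

In practice this policy lives in the log agent's configuration rather than application code, but the trade-off is the same: less network and storage load centrally, at the cost of having to reach out to clients when an investigation needs the low-severity records.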
What determines a critical event?
Critical events should always lead to an immediate alert and a responsive investigation. If your configuration ends up generating so many critical events a day that responders can't keep up or action items queue up ignored, then you've defined your critical events too broadly.
An event record should be defined as critical when it indicates malicious activity. For example, suppose you rename your Administrator account and fine-tune your network so that nothing legitimate ever looks for or uses the log-on name Administrator. If a single log-on event records the name Administrator being used, you have an actionable event.
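The decoy-account rule above is simple enough to express directly: once nothing legitimate uses the name Administrator, any logon attempt with that name is actionable by definition. The event format below is invented for illustration.

```python
# Toy detector for the renamed-Administrator example: any logon attempt
# using the decoy name "Administrator" is flagged as actionable.
# The event dictionary format is an invented illustration.

DECOY_ACCOUNT = "Administrator"

def is_actionable(logon_event: dict) -> bool:
    """True when a logon attempt uses the decoy account name."""
    return logon_event.get("account") == DECOY_ACCOUNT

event = {"account": "Administrator", "source_ip": "203.0.113.9"}
if is_actionable(event):
    print("ALERT: decoy Administrator account used from", event["source_ip"])
```

What makes this rule so valuable is its false-positive rate: because the environment was tuned so that no legitimate use remains, a single match justifies an immediate alert.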