February 2008 Archives

Anton Chuvakin


As promised, here is another “Top 11 Reasons” which is about log analysis. Don’t just read your logs (definitely don’t just collect them); analyze them. Why? Here are the reasons:

  1. Seen an obscure log message lately? Me too. In fact, everybody has. How do you know what it means (and logs usually do mean something) without analysis? At the very least, you might need to bring in additional context to know what some logs mean (example: IP address -> hostname -> server owner).
  2. Logs often measure in gigabytes and soon will measure in terabytes; log volume grows all the time. It passed the limit of what a human can read a long time ago, and it has since made even simple filtering (deciding “what logs to read”) impossible as well. Automated log analysis is the only choice.
  3. Do you peruse your logs in real time? The very idea is absurd! However, automated real-time analysis is entirely possible (and some logs do crave your attention ASAP, e.g. major system failures, confirmed intrusions, etc.).
  4. Can you read multiple logs at the same time? Yes, kind of, if you print them out on multiple pages to correlate (yes, I’ve seen this done :-)). Is this efficient? God, no! Correlation across logs of different types is one of the most useful approaches to log analysis.
  5. A lot of insight hides in “sparse” logs: logs where a single record barely matters, but a large aggregate does (e.g. going from one “connection allowed” firewall log to a scan pattern). Thus, the only way to extract that insight from a pool of data is through algorithms that “condense” that collection of logs into usable knowledge (some say visualization is the way to go).
  6. Ever done manual log baselining? This is where you read the logs for a while and learn which ones are normal for your environment. Wanna do it again? Thought so :-) Log baseline learning is a useful and simple log analysis technique, but humans can only do it for so long before burning out.
  7. OK, let’s pick the important logs to review. Which ones are those? The right answer is “we don’t know, until we see them.” Thus, to even figure out which logs to read, you need automated analysis.
  8. Log analysis for compliance? Why, yes! Compliance is NOT only about log storage (e.g. see PCI DSS). How to highlight compliance-relevant messages? How to see which messages will lead to a violation? How do you satisfy those “daily log review” requirements (again, see PCI DSS)? Through automated analysis, of course!
  9. Logs allow you to profile your users, your data and your resources/assets. Really? Yes, really: such profiling can then tell you if those users behave in an unusual manner (in fact, the oldest log analysis systems worked like that). Such techniques may help reach the holy grail of log analysis: having the system automatically tell you what matters to you!
  10. Ever tried to hire a log analysis expert? Those are few and far between. What if your junior analysts could suddenly analyze logs just as well? One log analysis system creator told me that his log data mining system enabled exactly that, thus saving his organization a lot of money.
  11. Finally, can you predict the future with your logs? I hope so! Research on predictive analytics is ongoing, but you can only do it with automated analysis tools, not with your head alone (no matter how big :-)) …
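
Reason 5 above (a single “connection allowed” record means little, but the aggregate can reveal a scan) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the record layout, the function name `find_scan_candidates`, the threshold of five distinct targets, and the sample data. Real firewall logs would first need a parser for their own format.

```python
from collections import defaultdict


def find_scan_candidates(records, min_targets=5):
    """Return {source_ip: distinct_target_count} for sources that reached many targets.

    Each record is a hypothetical pre-parsed tuple:
    (source_ip, destination_ip, destination_port, action).
    """
    targets_by_source = defaultdict(set)
    for source_ip, dest_ip, dest_port, action in records:
        if action == "allowed":
            # A set keeps only distinct (host, port) pairs, so repeated
            # connections to the same service do not inflate the count.
            targets_by_source[source_ip].add((dest_ip, dest_port))
    return {
        source_ip: len(targets)
        for source_ip, targets in targets_by_source.items()
        if len(targets) >= min_targets
    }


# Made-up sample data: one host probing six ports, one host browsing normally.
records = [("10.0.0.5", "192.168.1.10", port, "allowed") for port in (21, 22, 23, 25, 80, 443)]
records += [("10.0.0.9", "192.168.1.20", 80, "allowed")] * 3

print(find_scan_candidates(records))  # prints {'10.0.0.5': 6}
```

No single record in the sample looks suspicious; only the count of distinct targets per source does. A fixed threshold is the crudest possible “condensing” algorithm, and a real system would likely also consider time windows.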


 



Anton Chuvakin


This is my 6th logging poll (vote here now!); links to the previous five polls are below.

This one is deceptively similar to poll #1 below, but it is not the same. This poll asks “What logs do you actually LOOK at?” and not “Which logs do you collect?” In other words, are you a log packrat? Are you collecting log data and never using it? If so, you are making a mistake.

Past polls:

  • Poll #5 “What are your top challenges with logs?” (analysis)
  • Poll #4 “Who looks at logs in your organization?” (analysis)
  • Poll #3 “What do you do with Logs?” (analysis)
  • Poll #2 “Why collect logs?” (analysis)
  • Poll #1 “Which logs do you collect?” (analysis)

    UPDATE: analysis of this poll posted here. Enjoy!



    Chris Josephes


    A podcasting friend of mine ran into the problem of always having to send the new episodes to his co-hosts for review. Once everyone agreed that it was okay, the show was put live on the RSS feed. Their method of distributing a raw mp3 file? Email.

    If only there were a way to distribute the file electronically, without the overhead of email, and yet still get the file automatically once it’s ready. How about RSS?

    iTunes and other feed aggregators can handle RSS feeds that are protected by HTTP authentication. When you download the feed, your client prompts you for a username and password before downloading the RSS XML.
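
To make the authentication step concrete, here is a minimal sketch of the request a client ends up sending once it has the username and password, assuming the server uses HTTP Basic authentication (the post does not say which scheme is used). The function name `build_authenticated_request` and the credentials in the usage note are hypothetical. Aggregators normally wait for the server's 401 challenge before sending credentials; this sketch attaches them up front for brevity.

```python
import base64
import urllib.request


def build_authenticated_request(feed_url, username, password):
    """Build a GET request for a feed with an HTTP Basic Authorization header."""
    # Basic auth is just "username:password", base64-encoded. It is an
    # encoding, not encryption, so it is only private over HTTPS.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(feed_url, headers={"Authorization": f"Basic {token}"})
```

Passing the result to `urllib.request.urlopen()` would fetch the feed, for example `urlopen(build_authenticated_request(url, "cohost", "secret")).read()`.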

    My friend’s podcast now has two RSS feeds:

    http://www.example.com/feeds/public/podcast.xml
    http://www.example.com/feeds/private/podcast.xml

    The first URL is what’s submitted to all of the podcast directories. The second one is strictly for preview purposes. All of the responsible parties for the show subscribe to the private feed. This allows them to test new episodes, and verify that the RSS <item> content for the episode is correct.

    Once everyone has agreed that the episode is ready, the RSS tags for the episode are copied over to the public feed XML file. Now outside users can see the episode and download it.
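
The copy step could be scripted rather than done by hand. The following is a rough sketch, not my friend's actual process: it assumes plain RSS 2.0 files in which each `<item>` carries a unique `<guid>`, and the function name `promote_episode` is made up. Feeds that use namespaced tags (such as the `itunes:` extensions) would also need `ET.register_namespace` so the prefixes survive the rewrite.

```python
import copy
import xml.etree.ElementTree as ET


def promote_episode(private_xml, public_xml, guid):
    """Copy the <item> with the given <guid> from the private feed into the public feed.

    Both feeds are passed in and returned as strings. If the episode is already
    public, the public feed is returned unchanged.
    """
    private_channel = ET.fromstring(private_xml).find("channel")
    public_root = ET.fromstring(public_xml)
    public_channel = public_root.find("channel")

    public_items = public_channel.findall("item")
    if any(item.findtext("guid") == guid for item in public_items):
        return public_xml

    for item in private_channel.findall("item"):
        if item.findtext("guid") == guid:
            # Put the new episode ahead of the existing ones (newest first),
            # or at the end of the channel if there are no items yet.
            if public_items:
                position = list(public_channel).index(public_items[0])
            else:
                position = len(public_channel)
            public_channel.insert(position, copy.deepcopy(item))
            return ET.tostring(public_root, encoding="unicode")

    raise ValueError(f"no item with guid {guid!r} in the private feed")
```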

    This is pretty good for most situations, but there is still one risk: HTTP URLs can contain embedded authentication credentials, like so…

    http://username:password@www.example.com/feeds/private/podcast.xml

    Avoid using this convention in your bookmarks or feed entries. If that URL were to be copied to an outside data source, there’s a chance that it could get into the wild. When that happens, outsiders may end up listening to your private, not-yet-production-ready podcast.

    To reduce the chances of that happening, iTunes won’t list a podcast in its directory if the URL contains an embedded username and password. It won’t even list a podcast if the server requests HTTP authentication.
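
If you want to check your own bookmarks or feed entries for this pattern, a small helper can detect and strip embedded credentials before a URL is stored or shared. This is only a sketch using Python's standard `urllib.parse`; the function names `has_credentials` and `strip_credentials` are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def has_credentials(url):
    """Return True if the URL embeds a username (with or without a password)."""
    return urlsplit(url).username is not None


def strip_credentials(url):
    """Return the URL with any embedded "username:password@" part removed."""
    parts = urlsplit(url)
    # The credentials, if present, are everything before the last "@"
    # in the network-location part of the URL.
    host_and_port = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host_and_port))


private_url = "http://username:password@www.example.com/feeds/private/podcast.xml"
print(has_credentials(private_url))    # prints True
print(strip_credentials(private_url))  # prints http://www.example.com/feeds/private/podcast.xml
```

Stripping the credentials does not make the private feed public; it only means the client has to prompt for the password again instead of carrying it around in the URL.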


    Years ago I visited Danga back before the Six Apart acquisition, when the company had its headquarters a couple of miles away and when Brad lived a block and a half down the street. Brad showed me some of their management tools — almost all home-grown.

    I mention that because today I stumbled on Dormando’s [crappy] Operations Mantras. Dormando works for Six Apart, and he has the same philosophy I see in Brad. Relentless automation and merciless monitoring are the two secrets of efficient and effective system administration and operations management.

    I only wish that someone had handed me this list of mantras when I started as a system administrator in the ’90s. (Puppet and Xen would have been nice too.)