As announced earlier on my personal blog, I launched an open source project on Google Code called “loghetti”. It’s written in Python, and is a foundation for what I hope will become a very flexible tool to help admins (myself included) get whatever data they need out of their Apache logs.

Here are a couple of examples of stuff it can do:

Get a list of all of the 500 errors:

./loghetti.py --code=500 access.log

This will send all matching lines in access.log to STDOUT. To get a bit more complex:

./loghetti.py --ip=192.168.1.2 --code=500 --month=11 --day=21 --urlbase=index.php --count access.log

This will *not* return the lines that match all of those rules, but rather a simple count of the matching lines. This request is a somewhat typical support scenario: a client at 192.168.1.2 reports 500 errors they received on some arbitrary date when trying to reach your intranet’s home page. It’s not unusual in a support role to have the client say “it happened like, a million times”. Of course, --count will dutifully report that it happened 4 times (for example), which is likely closer to the truth.
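For the curious, here is roughly how that kind of filter-and-count could be done in plain Python. This is only a sketch and not loghetti's actual code: the regular expression assumes Apache's common/combined log format, the names parse_line, count_matches and SAMPLE are made up for illustration, and it treats "urlbase" as the last path segment of the requested URL, which is an assumption on my part.

```python
import re
from urllib.parse import urlsplit

# Month abbreviations as Apache writes them, mapped to month numbers.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1)}

# Assumed layout: common/combined log format, e.g.
# 192.168.1.2 - - [21/Nov/2008:10:15:32 -0500] "GET /index.php HTTP/1.1" 500 1234
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<day>\d{2})/(?P<mon>[A-Za-z]{3})/\d{4}:[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<code>\d{3})'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't parse."""
    m = LOG_RE.match(line)
    if m is None or m.group("mon") not in MONTHS:
        return None
    path = urlsplit(m.group("url")).path
    return {
        "ip": m.group("ip"),
        "code": int(m.group("code")),
        "month": MONTHS[m.group("mon")],
        "day": int(m.group("day")),
        # Assumption: "urlbase" means the last segment of the path.
        "urlbase": path.rsplit("/", 1)[-1],
    }


def count_matches(lines, **wanted):
    """Count the lines whose parsed fields equal every requested value."""
    count = 0
    for line in lines:
        fields = parse_line(line)
        if fields and all(fields.get(k) == v for k, v in wanted.items()):
            count += 1
    return count


# Hypothetical log lines, invented purely for this example.
SAMPLE = [
    '192.168.1.2 - - [21/Nov/2008:10:15:32 -0500] "GET /index.php HTTP/1.1" 500 1234',
    '192.168.1.2 - - [21/Nov/2008:10:16:02 -0500] "GET /index.php?x=1 HTTP/1.1" 200 512',
    '10.0.0.7 - - [21/Nov/2008:10:17:45 -0500] "GET /index.php HTTP/1.1" 500 1234',
    '192.168.1.2 - - [22/Nov/2008:09:00:00 -0500] "GET /index.php HTTP/1.1" 500 1234',
]

print(count_matches(SAMPLE, ip="192.168.1.2", code=500,
                    month=11, day=21, urlbase="index.php"))
```

Only the first sample line satisfies all five rules at once; each of the others fails exactly one of them (status code, IP, or day).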

Ok, one more example, because I happen to be a fan of this feature:

./loghetti.py --urldata=foo:bar access.log

This causes loghetti to parse the query string, and return lines where the query parameter “foo” matches argument “bar”. In other words, lines that look something like this:

http://www.yourdomain.com?stuff=things&foo=bar&this=that
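The idea behind that query-string match can be sketched with nothing but the standard library. Again, this is an illustration and not how loghetti itself is written: matches_urldata is a hypothetical helper, and the "name:value" spec format is simply taken from the command line shown above.

```python
from urllib.parse import urlsplit, parse_qs


def matches_urldata(url, spec):
    """Return True if the URL's query string has name=value for a 'name:value' spec."""
    name, _, value = spec.partition(":")
    query = urlsplit(url).query
    # parse_qs maps each parameter name to a list of its values.
    params = parse_qs(query, keep_blank_values=True)
    return value in params.get(name, [])


url = "http://www.yourdomain.com?stuff=things&foo=bar&this=that"
print(matches_urldata(url, "foo:bar"))
```

Because the query string is actually parsed rather than searched as text, a parameter such as "food=bar" would not be mistaken for "foo=bar".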

There are billions of features I’d like to implement, but I figured that since the tool is already useful to me, it would likely be useful to others, and maybe others can help get the features they need implemented more quickly.

Let me know your thoughts!