For the longest time I have been meaning to scratch a personal itch: creating a robust command line tool to analyze duplicate files on a file system. There are a few scripts floating around in various languages, and the problem is not all that difficult to solve, but I went the whole nine yards and wrote a reasonably cool command line tool that uses MD5 checksums to detect duplicates. In addition to printing duplicate messages to stdout, it generates a report in CSV format, so you can manually look through the dupes and decide what you want to do with them.
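To make the checksum idea concrete, here is a minimal sketch of the general technique: walk a directory tree, group files by MD5 digest, and write the groups out as CSV. This is not Liten's actual code; the function names (`md5_of_file`, `find_duplicates`, `write_report`), the chunk size, and the two-column report layout are all assumptions made for illustration.

```python
import csv
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each MD5 digest to the sorted list of paths sharing it (2+ files only)."""
    by_checksum = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_checksum[md5_of_file(path)].append(path)
            except OSError:
                # Unreadable file (permissions, vanished mid-scan): ignore it.
                continue
    return {c: sorted(p) for c, p in by_checksum.items() if len(p) > 1}


def write_report(duplicates, report_path):
    """Write one CSV row per duplicate file (hypothetical layout: md5, path)."""
    with open(report_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["md5", "path"])
        for checksum, paths in sorted(duplicates.items()):
            for path in paths:
                writer.writerow([checksum, path])
```

Calling `write_report(find_duplicates("/some/dir"), "report.csv")` would produce a file you can open in a spreadsheet. A common refinement, not shown here, is to compare file sizes first and only checksum files whose sizes match, which avoids reading most of the disk.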
Liten can be downloaded from the cheeseshop: http://cheeseshop.python.org/pypi/Liten/0.1a
I still have a rather long list of things to finish, like threading, daemonizing, a caching ORM backend, and way more unit testing. Give it a whirl and let me know what you think…
Oh, and thanks to the following people, whom I bugged with dumb questions like I usually do :)
Titus Brown, Shannon Behrens, Rick Copeland, Jeremy Jones, Scott Leerssen.