I was recently given the unenviable task of providing my manager with a spreadsheet listing all the code artifacts (.java files, XML files, JSPs, etc.) that my team was working on. The list should include, among other things, all the source files under my src directory, JSPs under the web directory, and XML configuration files under web/WEB-INF. But it should exclude things like generated source code, .class files, and jar files.
My desktop at the client site runs Windows, so I thought I would try out the dir command. That got me nowhere. Then I jumped into cygwin and tried ls -R. Of course, this listed everything under my project directory, including the CVS directories and their contents. Plus, the file listing didn’t specify the full path. If I was going to use this output to create my spreadsheet it would take quite a lot of cutting, pasting, and deleting to make it suitable for the spreadsheet — there had to be a better way.
I’ve spoken fluent Java for almost 10 years now, so I thought, hey, I can do this in Java! I looked at the API for java.io.File … and then thought again.
Well, that Ruby language seems pretty popular … I’ll try that.
I googled “Ruby recursive directory” and found a couple of interesting hits. This looks darn easy to do in Ruby. I installed Ruby (cake with the one-click install), fired up TextPad and cranked out this little script in just a few minutes. It did the job perfectly.
Oh, and by the way, it’s my first Ruby script.
require 'find'

dirs = ["src/java","src/hbm", "src/conf", "src/unit-test", "src/integration-test", "web"]
excludes = ["CVS","classes","images","lib","tlds"]
for dir in dirs
  Find.find(dir) do |path|
    if FileTest.directory?(path)
      if excludes.include?(File.basename(path))
        Find.prune # Don't look any further into this directory.
      else
        next
      end
    else
      p path
    end
  end
end
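Since the goal was a spreadsheet, here is a minimal sketch of one way the same traversal could feed it. It is an illustration, not the script I actually ran: the method name file_rows, the three-column layout (directory, file name, extension), and the choice to silently skip missing root directories are all assumptions. It uses puts rather than p, because p prints the inspect form of a string, with surrounding quotes.

```ruby
require 'find'

# Hypothetical helper: returns one [directory, file name, extension] row per
# regular file under the given roots, skipping excluded directory names.
def file_rows(dirs, excludes)
  rows = []
  dirs.each do |dir|
    next unless File.directory?(dir)   # assumption: ignore roots that do not exist
    Find.find(dir) do |path|
      if File.directory?(path)
        # Don't look any further into an excluded directory.
        Find.prune if excludes.include?(File.basename(path))
      else
        rows << [File.dirname(path), File.basename(path), File.extname(path)]
      end
    end
  end
  rows.sort
end

dirs     = ["src/java", "src/hbm", "src/conf", "src/unit-test", "src/integration-test", "web"]
excludes = ["CVS", "classes", "images", "lib", "tlds"]

# Tab-separated lines paste cleanly into most spreadsheet programs.
file_rows(dirs, excludes).each { |row| puts row.join("\t") }
```

Redirecting the output to a file (for example ruby list_files.rb > files.tsv, where both names are made up) gives something a spreadsheet can open directly.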
The next time you’ve got a tedious task like this, give Ruby a try. I was impressed at the quality of the documentation, and just as importantly, the quality of the error messages spat out by the Ruby interpreter when my language guesses weren’t quite right. Every good repair man knows that it takes more than one tool to get a job done — pick the right one and you can save yourself a lot of time and frustration — and maybe have some fun along the way.



The next time you need this, don't write any code at all:
ls -R | grep -v CVS | grep -v classes | grep -v images | grep -v lib | grep -v tlds
That will give you about 99% of what you want.
Shouldn't this be on onRuby.net?
why not just use find(1) directly?
Phil,
Yes, that comes close, but all I want is a list of files, not the directories. And I wanted each file to include its path from the current directory. Granted, I am sure I could get that to work using the shell; I was just pleased that I was able to do it in Ruby with a simple script that is easily customized as needed.
Jay,
I really don't think there's any problem with this post here at OnJava. My point was that we are not just Java programmers -- we are software engineers. We need to be open to using whatever tools best help us solve problems.
k,
Did you mean use the Unix 'find' command? If so, yes, that certainly could have worked also. One thing nice about the Ruby script is it's not dependent on cygwin.
why not just obtain a report from your SCM?
re: phil, Bill
Programmers need to learn the "find" command and xargs. Deciding to run this in Ruby is a better choice than doing it in Java no doubt. But, in general, Bash can beat Ruby in terms of terseness.
But, the thing I hate the most about Ruby posts is the idea that an author is required to do task X in the fewest number of lines of code. I say, three cheers to Bill for solving his problem (even if he didn't grok the powerful master known as "find")
Re: Jay "Shouldn't this be on OnRuby.net?"
The people who run these O'Reilly blogs decided that it would be better to segment up blog authors into focus areas. IMO, it was odd only because rarely does someone only write about technology X or technology Y. *shrug*
But Bill's post makes sense. Java programmers should be learning how to leverage Ruby. It makes perfect sense.
What is wrong with a non-recursive method? I don't know anything about Ruby but I would think a non-recursive method would have a better performance than a recursive method.
I have done non-recursive directory listings with Borland Delphi quite a few times. I know, Delphi isn't free but it has some free alternatives. (Including Linux and Mac alternatives.) All you basically need is a dynamic array to store all the folders you've found: you walk through this list and, for each folder in it, add all its subfolders to the list.
Of course, any files you find are added to your output. All folders you find are appended to the list, so they will be next in line to search.
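The work-list approach described above can be sketched in Ruby roughly as follows. This is only an illustration of the idea, not a performance claim: the method name walk_iteratively is made up, entries are sorted just to make the order predictable, and symbolic links are listed rather than followed so that a link cycle cannot loop forever. Dir.children needs Ruby 2.5 or later.

```ruby
# Non-recursive directory walk: keep a list of folders still to be searched.
def walk_iteratively(root)
  pending = [root]   # folders found so far but not yet searched
  files   = []
  until pending.empty?
    dir = pending.shift                      # next folder in line
    Dir.children(dir).sort.each do |name|    # children excludes "." and ".."
      path = File.join(dir, name)
      if File.directory?(path) && !File.symlink?(path)
        pending << path                      # appended, so it is searched later
      else
        files << path                        # plain files (and symlinks) go to the output
      end
    end
  end
  files
end
```

Because folders are appended to the end of the list, the result comes out level by level (breadth-first): everything directly under the root first, then the contents of its subfolders.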
I'd probably have used the ant zip task for this which'll ignore the cvs/svn files by default.. and I'm free to add any extra excludes. Then I'd simply dump out the paths in the resulting zip.
i thought this was an april fool's joke
Sorry, when I come to OnJava, I am looking for Java related stuff. I could do the same thing (and probably just as easily) with Tcl, Python or Perl.
Wow -- I had no idea one little blog would cause such a stir. From now on I will wear an inverted coffee mug as a mind-control helmet and chant "All Hail Java", "All Hail Java".
Bill,
I have had a very similar experience. Indeed, Ruby is a language worth exploring. A self-respecting developer should be a little more language agnostic. The comments here display a lack of composure and open-mindedness. Ruby is not the holy grail, but it certainly is fun.
org.apache.commons.io.FileUtils.listFiles
How about:
dir /s /b
/S Displays files in specified directory and all subdirectories.
/B Uses bare format (no heading information or summary).
Example output:
c:\vircon>dir /s /b
c:\vircon\blog
c:\vircon\hours
c:\vircon\blog\a
c:\vircon\blog\b
c:\vircon\blog\c
c:\vircon\blog\d
c:\vircon\blog\a\29042006
c:\vircon\blog\a\First Delivery from Ravi.txt
c:\vircon\blog\a\29042006\aboutus.html
etc....
Yes dir /s /b, and dir /s /b>output.txt will send all output to the file output.txt
Lovely documentation, I am learning to use ruby as well to replace common tasks like this. However, I still would have used plain old GNU find for this task.
great site
"I was impressed at the quality of the documentation, and just as importantly, the quality of the error messages spat out by the Ruby interpreter when my language guesses weren't quite right"
LOL!!
I'm telling my friends about this site. Interesting!
Interesting comments.. :D
I was wondering why you have this if statement:
if excludes.include?(File.basename(path))
You really need to learn basic bash shell usage, and commands such as find, grep, etc..
As was mentioned above, this sort of thing is a quick shell command. If you don't want directories, just give the "-type f" argument to find, e.g.
find ./ -type f | grep -v CVS | grep -v classes | ...
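One thing to watch with a chain of grep -v filters is that each one matches anywhere in the line, so a file under a directory named "library" would be dropped by grep -v lib. The sketch below contrasts the two kinds of filter in Ruby; the method names and the sample paths are invented for illustration.

```ruby
EXCLUDES = %w[CVS classes images lib tlds]

# Substring filter, comparable to chaining grep -v: a path is dropped as soon
# as it contains the excluded text anywhere.
def keep_by_substring?(path, excludes)
  excludes.none? { |e| path.include?(e) }
end

# Component filter: a path is dropped only when one of its "/"-separated
# parts is exactly an excluded name.
def keep_by_component?(path, excludes)
  (path.split('/') & excludes).empty?
end

sample = 'src/java/com/acme/library/Book.java'   # hypothetical path
puts keep_by_substring?(sample, EXCLUDES)   # false: "library" contains "lib"
puts keep_by_component?(sample, EXCLUDES)   # true: no component equals "lib"
```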
I am sure that can be done in Java just as easily.
Hi Bill,
Have you tried JRuby? I'm not a Java user but I've heard great things about it. Not just for Ruby integration, but also for coding in Java interactively, etc. (see irb)
Here are a few shortcuts and/or style tips related to your code sample.
%w():
dirs = %w( src/java libs )
excludes = %w( CVS lib ~.*tmp ).collect {|e| Regexp.new e }
array.each:
dirs.each do |dir|
  code
end
It matches the structure of the other blocks (do |var,var2| .. end) and is clearer. (The object 'dirs' is being passed a code block (do .. end) and it runs it for 'each' element.)
command if condition:
if File.directory? path
  Find.prune if excludes.any? {|e| File.basename(path).match e }
  next
end
Also works like:
next unless x > 6
next if x < 12 unless y.nil?
etc.
Often this improves readability. Instead of
if really_long_tests then raise "error foo" end
if other_tests then raise "error bar" end
if this_is_not_what_we_want then next end
you get
raise "error foo" if really_long_tests
raise "error bar" if ...
next unless this_is_what_we_want
which seems much more readable to me. The flow of the code becomes evident instead of the implementation details.
Also, give 'irb', the interactive ruby interpreter a try. It's the easiest way to test code. (With tab completion turned on it lets you tab through object methods, variables, etc.)
Thanks!
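Putting the tips from the comment above together, the original script might look something like the sketch below. The method name each_source_file is hypothetical, the directory list is shortened, and two things are assumptions that go beyond the comment: each exclude pattern is anchored with \A and \z so that "lib" does not also match "library", and root directories that do not exist are skipped.

```ruby
require 'find'

# Yields every regular file under dirs, pruning directories whose base name
# matches one of the exclude patterns.
def each_source_file(dirs, excludes)
  dirs.each do |dir|
    next unless File.directory?(dir)   # assumption: skip roots that do not exist
    Find.find(dir) do |path|
      if File.directory?(path)
        Find.prune if excludes.any? { |e| File.basename(path).match?(e) }
        next                           # directories themselves are not listed
      end
      yield path
    end
  end
end

dirs     = %w( src/java web )
# Anchoring is an assumption; an unanchored /lib/ would also match "library".
excludes = %w( CVS classes images lib tlds ).collect { |e| Regexp.new("\\A(?:#{e})\\z") }

each_source_file(dirs, excludes) { |path| puts path }
```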
Hello Bill,
A google search on Recursive Directory Tree for Ruby brought up this page (I know it's a Java Topic but the reason I was searching is because of some contradictory information on this subject). I have the Ruby Cookbook and tried out Recipe 6.12 "Walking a Directory Tree". The discussion that followed said "Note how all the files in the top-level directory are processed after the subdirectories" which is depth-first traversal. My observation of the running code showed that in general this is not true and that only contrived examples of directory trees such as those created by create_tree.rb in the beginning of chapter 6 have this characteristic.
Upon investigation, the Dir.open() method used in Find.find() under Linux does not return a sorted list of directories and files. Rather, the order in which directories and files get returned is based on the inode order of creation on disk. Since create_tree.rb creates the tree in the order you would expect for depth first traversal, it appears that depth first traversal is functional in the recipe. This may not be true for other operating systems besides Linux. In fact, you can mimic the Dir.open() behavior for directories using the Linux ls -U command for unsorted listings.
Having pondered this strange behavior of the Recipe and digging deeper into the Ruby libraries I was able to modify Find.find() so that Dir.open()'s returned array values are sorted before continuing with an extra sort function call and block. After modifying Find.find() recipe 6.12 worked as expected - depth first traversal for all possible directory trees.
Another book, "Programming Ruby - The Pragmatic Programmers Guide" shows an example usage of Find.find() and the results printed verify that there is indeed no sorting feature of the Dir.open() method in Find.find() and that Recipe 6.12 can not possibly yield Depth first traversal behavior using the Ruby library Find.find() as the basis.
I can't decide if this is a bug or a feature in the Find.find() method - using unsorted entries from Dir.open() is certainly faster but can mess up other algorithms relying on a Depth First Traversal result.
Here is my fix inside Find.find() to make the recipe work:
dd = Dir.open(file)
d = dd.entries.sort { |x,y|
  f1 = File.join(file,x)
  f2 = File.join(file,y)
  b1 = File.directory?(f1) ? "D#{f1}" : "F#{f1}"
  b2 = File.directory?(f2) ? "D#{f2}" : "F#{f2}"
  b2 <=> b1
}
# carry on with the rest of Find.find() using
# sorted variable d instead
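For anyone who would rather not patch the library, a similar ordering can be had from a small standalone method. This is a sketch under assumptions, not the Cookbook's recipe or the code of Find.find(): the name sorted_walk is made up, it yields only files (Find.find() also yields directories), and it follows whatever File.directory? reports, so it does not guard against symlink loops. Dir.children needs Ruby 2.5 or later.

```ruby
# Depth-first walk with a fixed order: inside each directory, subdirectories
# (sorted by name) are fully processed before the directory's own files
# (also sorted by name).
def sorted_walk(dir, &block)
  entries = Dir.children(dir).sort
  subdirs, files = entries.partition { |name| File.directory?(File.join(dir, name)) }
  subdirs.each { |name| sorted_walk(File.join(dir, name), &block) }
  files.each   { |name| block.call(File.join(dir, name)) }
end
```

Called as sorted_walk("some/dir") { |path| puts path }, it prints the files of the top-level directory last, regardless of the order in which the operating system returns the entries.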
Who can help me? I need to write code using trees to display a list of files and folders in an email inbox using Java. Everything I keep finding uses Swing, and I am not looking for those examples.
man tree