Update: So much thanks goes out to james (and, of course, to Ben for getting the party started with his first submission: a one-line Unix command equivalent to my 60 lines of XSLT), who has not only helped optimize the optimization process, but has also solved the bug that was excluding some of the dependencies necessary for everything to work properly.
Thanks for all of your help, james!
Update: You know what I love even more about the various software development communities of the world?
To which james followed up with,

here comes my one-liner:
sed '/^[^/]/d' < dep-list | sort -u > deduped-del-list
Ben’s solution leaves a blank line at the top of the file
which can be easily spotted by running diff.
Apparently my first one-liner also left a blank line, heh. Here’s a smarter one:
sed -n '/^\//p' < dep-list | sort -u > deduped-del-list
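To see the difference concretely, here's a quick check on a hypothetical dep-list (the sample contents are mine, for illustration, not from the project):

```shell
# Hypothetical dep-list -- note the blank line in the middle, which is
# exactly the kind of input that tripped up the first one-liner.
printf '/lib/libc.so.6\n\n/lib/libc.so.6\n/usr/lib/libm.so.6\n' > dep-list

# First version: delete lines that START with a non-"/" character.  An empty
# line has no first character to test against [^/], so it slips through.
sed '/^[^/]/d' < dep-list | sort -u > v1

# Smarter version: print ONLY lines that start with "/", so blank lines
# (and any other non-path noise) never make it into the output.
sed -n '/^\//p' < dep-list | sort -u > v2
```

Running `diff v1 v2` shows the single blank line at the top of `v1`, just as described above.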
Which showcases yet another common trait of us hacker types: competition not only with each other, but with ourselves as well, finding new and better ways to push the envelope just that much further with each iteration. The result, if I'm not mistaken, is called "technological advancement" (or sometimes "feature creep", depending, of course, on the situation at hand and, in many cases, on one's own perspective ;)
Nonetheless, this is the kind of stuff that feeds the body, soul, and mind of us hacker types. Or, in other words:
Thanks for the follow-ups, james!
Update: One of the things I most certainly love about software development communities in general, and the Unix* communities in particular, is the tendency toward the open sharing of knowledge between community members (or, in my particular case, not so much a member of the community as a member-in-training ;).
As I specified at the bottom of this post,
For all of you Unix geeks out there who would know right away how to go about this with standard Unix command-line tools, please bear with me… While I've learned quite a bit over the last couple of years, I'm still a Windows-trained hacker attempting to push the limits of my Windows-"poisoned" mind ;-)
To which james responded with,
using sed and sort that would be a one-liner…
sort -u dep-list | grep -v 'dynamic' > deduped-del-list
Nice! Thanks to each of you for helping out a poor, lost, Windows-trained hacker's soul such as myself with a phat little tip to help speed up the process. *MUCH* appreciated!
A while back, Abel Braaksma posted a request to XSL-List for interesting use cases/implementations in which XSLT 2.0 played a role. While I am still working on optimizing things a bit, one of the pains I've been dealing with as part of the nuXleus project is keeping the size of the distribution down by including only the absolutely mandatory shared libraries for each included application, as opposed to a standard distro, which will include most everything that comes along for the ride with each package installed.
Of course, the good folks at rPath are working on a solution that will help TREMENDOUSLY with this process, by allowing individual files to be required, as opposed to entire packages, when building out a new package that needs at least one file from another package (e.g. a shared library) to work properly. If I'm not mistaken, this wonderful feature will be available as part of the next (2.0) release of rPath Linux. So for those of you interested in building out optimized Linux-based appliances with a minimal amount of effort, stay tuned to Planet Conary (conary is the build, repository, and package management system for rPath Linux) for the latest updates.
In the meantime, being one who tends to be lured by interesting and challenging problems: at http://nuxleus.com/dev/browser/build you will find a listing of, and links to, all of the various build scripts for the nuXleus project that I've been hacking together over the last couple of months, in an attempt to automate the process of finding and installing, as mentioned, only the absolutely mandatory shared libraries, using various techniques I've been learning along the way.
While the Unix command line (specifically tools such as grep, find, ld, ldd, etc.) is amazingly powerful, one of the more difficult tasks I have come across is filtering through a generated index of the shared libraries required by the various binary executables of the utilities included in each release**, outputting a sorted, de-duped list of required libraries to then copy into the appropriate lib directory of the distribution build directory for packaging.
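For the curious, such an index is typically derived from ldd output. The snippet below works on captured ldd-style text; the file contents and names are illustrative, not the project's actual build data (in a real build, the input would come from running ldd over each executable found under the release's bin directory):

```shell
# Hypothetical captured ldd output, in the format a typical glibc system
# prints: "libname => /resolved/path (0xADDRESS)".
cat > ldd-output <<'EOF'
	linux-vdso.so.1 (0x00007fff00000000)
	libm.so.6 => /lib/libm.so.6 (0x00007f0000000000)
	libc.so.6 => /lib/libc.so.6 (0x00007f1000000000)
	libc.so.6 => /lib/libc.so.6 (0x00007f1000000000)
EOF

# Keep only the resolved library paths (the third field after "=>"),
# then sort and de-dupe -- lines without a resolved path fall away.
awk '$2 == "=>" && $3 ~ /^\// { print $3 }' ldd-output | sort -u > dep-paths
```

In the real pipeline, `ldd-output` would instead be fed by something like `find <bindir> -type f | xargs ldd` (paths here are assumptions).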
The solution I've found that seems to work quite nicely is a pretty simple, straightforward XSLT 2.0 transformation that uses the ability to read in text files and generate a temporary tree, and then uses for-each-group to filter and output the mentioned de-duped list of required shared libraries.
An example list of files to de-dupe,
The XSLT to import the above text file, sort, de-dupe, and output the result,
The XML file that drives the process,
The output of the above transformation,
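For readers who just want the shape of the technique, here is a minimal, illustrative sketch of such a transformation. The element names, the input URI ('dep-list'), and the starts-with() filter are my assumptions, not the project's actual files, and it requires an XSLT 2.0 processor such as Saxon:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: reads a plain-text dep list, keeps absolute
     paths, and emits one sorted, de-duped path per line. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Read the text file and split it into one string per line. -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize(unparsed-text('dep-list', 'UTF-8'), '\r?\n')"/>
    <!-- Group identical lines; emit one sorted representative of each,
         skipping anything that is not an absolute path. -->
    <xsl:for-each-group select="$lines[starts-with(., '/')]" group-by=".">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```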
While it would be possible to use the Muenchian Method to de-dupe a generated XML representation of this same list, and while it would be possible to use a separate external process to either,
a) generate an XML file that wraps the entire input text as a single text node inside a parent node, or
b) generate an XML file via an external process (a simple process, but nonetheless external) that contains a new node for each line of the input file,
process a) would require both an external process and access to the node-set() function (which is not part of the XSLT 1.0 spec, and as such is implemented sporadically, in ways proprietary to each processor) to convert the text node into a temporary tree for processing, and process b) still requires an external process. Of course, as we are all aware, for-each-group is SO MUCH BETTER (in terms of ease of use, ease of understanding, and quite possibly performance, dependent, of course, on processor-specific optimizations), and as such makes the process of grouping, sorting, de-duping, and so forth an enjoyable experience as opposed to a frustrating learning curve.
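Option b) can be sketched as a one-pass external process; the file name, element names, and sample contents below are illustrative only:

```shell
# A fresh hypothetical dep list for this sketch.
printf '/lib/libc.so.6\n/usr/lib/libm.so.6\n' > dep-list

# Wrap each line in its own element so an XSLT 1.0 processor can group the
# nodes directly.  (Real-world input would also need &, <, and > escaped to
# stay well-formed XML; these plain library paths contain none of those.)
{ echo '<deps>'
  sed 's/^/<lib>/; s/$/<\/lib>/' dep-list
  echo '</deps>'; } > dep-list.xml
```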
Of course, the next step with the above would be to turn all of the various scripts contained in the build directory into a single XSLT that uses extension functions to invoke the various external processes, which would allow for a much more dynamic and manageable build process. I've got a bit of a start on this, but not a whole lot; with limited time, the above is pretty much as far as I've taken things in regards to something that some of you might consider useful (and usable).
That said, for those with interest, I put together a fairly crude/rough tutorial (which is really just a copy/paste of an IM conversation I had with Russ, explaining how the process worked). Of course, with the promise of a simplified build process as part of the next release of rPath Linux, the above may not even be all that necessary for much longer. But knowledge is never a bad thing to have, so I figured it would be worth sharing with you all what I've learned thus far in my quest to build the leanest, meanest, most capable .NET-based virtualized Linux distribution I possibly can.
On this same topic: while I had hoped to push out the next release of nuXleus yesterday, too many tasks and not enough time kept that from happening. That said, I have various build processes running in the background. Once things are in a state where I feel comfortable pushing out the mentioned next release, I will update with a new post.
Until then, enjoy the rest of your weekends!
** For all of you Unix geeks out there who would know right away how to go about this with standard Unix command-line tools, please bear with me… While I've learned quite a bit over the last couple of years, I'm still a Windows-trained hacker attempting to push the limits of my Windows-"poisoned" mind ;-)