Update: Many thanks go out to james (and, of course, to Ben for getting the party started with his first submission: a one-line Unix command equivalent to my 60 lines of XSLT), who has not only helped optimize the optimization process, but has also solved the bug that was excluding some of the dependencies needed for everything to work properly.
I’ve checked the results into the repository, which can be viewed @ http://nuxleus.com/dev/browser/build (see Changeset 3899 for the specific diff details).
Thanks for all of your help, james!
Update: You know what I love even more about the various software development communities of the world?
Competition… :)
here comes my one-liner:
sed '/^[^/]/d' < dep-list | sort -u > deduped-del-list
Ben’s solution leaves a blank line at the top of the file
which can be easily spotted by running diff.
which comes from james, who then follows up again with,
Apparently my first one-liner also left a blank line, heh. Here’s a smarter one:
sed -n '/^\//p' < dep-list | sort -u > deduped-del-list
Which showcases yet another common trait of us hacker types: competition, not only with each other but with ourselves as well, finding new and better ways to push the envelope just that much further with each iteration. The result, if I’m not mistaken, is called “technological advancement” (or sometimes “feature creep”, depending, of course, on the situation at hand and, in many cases, on one’s own perspective ;)
Nonetheless, this is the kind of stuff that feeds the body, soul, and mind of us hacker types. Or in other words: Change Art ;)
Thanks for the follow-ups, james!
Update: One of the things I most certainly love about software development communities in general, and the Unix* communities in particular, is the tendency toward openly sharing knowledge between community members (or, in my particular case, not necessarily a member of the community, but more like a member-in-training ;).
As I specified at the bottom of this post,
For all of you Unix geeks out there who would know right away how to go about this with standard Unix command-line tools, please bear with me… While I’ve learned quite a bit over the last couple of years, I’m still a Windows-trained hacker attempting to push the limits of my Windows-“poisoned” mind ;-)
To which james responded with,
using sed and sort that would be a one-liner…
to which, after I asked for an example, Ben responded with,
sort -u dep-list | grep -v 'dynamic' > deduped-del-list
Nice! Thanks to each of you for helping out a poor, lost, Windows-trained hacker’s soul such as myself with a phat little tip to help speed up the process. *MUCH* appreciated!
[Original Post]
A while back, Abel Braaksma posted a request to XSL-List for interesting use cases/implementations in which XSLT 2.0 played a role. While I am still working on optimizing things a bit, one of the pains I’ve been dealing with as part of the nuXleus project is keeping the size of the distribution down by including only the absolutely mandatory shared libraries for each included application, as opposed to your standard distro, which includes most everything that comes along for the ride with each installed package.
Of course, the good folks at rPath are working on a solution that will help TREMENDOUSLY with this process: the ability to require individual files, as opposed to entire packages, when building a new package that needs at least one file from another package (e.g., a shared library) to work properly. If I’m not mistaken, this wonderful feature will be available as part of the next (2.0) release of rPath Linux. So for those of you interested in building optimized Linux-based appliances with a minimal amount of effort, stay tuned to Planet Conary (Conary is the build, repository, and package management system for rPath Linux) for the latest updates.
In the meantime, being one who tends to be lured by interesting and challenging problems, I’ve spent the last couple of months hacking together build scripts for the nuXleus project; @ http://nuxleus.com/dev/browser/build you will find a listing of, and links to, all of them. They attempt to automate the process of finding and installing, as mentioned, only the absolutely mandatory shared libraries, using various techniques I’ve been learning along the way.
While the Unix command line (specifically tools such as grep, find, ld, ldd, etc.) is amazingly powerful, one of the more difficult tasks I have come across is filtering a generated index of the shared libraries required by the binary executables of the utilities included in each release**, producing a sorted, de-duped list of required libraries that can then be copied into the appropriate lib directory of the distribution build directory for packaging.
The solution I’ve found that seems to work quite nicely is a pretty simple, straightforward XSLT 2.0 transformation that uses the ability to read in text files to generate a temporary tree, and then uses for-each-group to filter and output the de-duped list of required shared libraries.
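The final step mentioned above, copying each listed library into the build's lib directory, could be sketched in shell as follows. This assumes the de-duped list holds one absolute path per line. The function name copy_libs and its two arguments are hypothetical and are not part of the nuXleus scripts. cp -L is used so that a listed symlink is copied as the file it points to. Whether you want that, or want to keep the links with cp -P, depends on how the target lib directory is laid out.

```shell
# copy_libs LIST DEST (hypothetical helper): copy every absolute path
# listed in LIST, one per line, into the directory DEST.
copy_libs() {
    list=$1
    dest=$2
    mkdir -p "$dest" || return 1
    while IFS= read -r lib; do
        # Skip blank lines and anything that is not an absolute path.
        case $lib in
            /*) ;;
            *) continue ;;
        esac
        # -L dereferences symlinks, so the file's contents are copied
        # under the listed name instead of a link that may dangle.
        cp -L "$lib" "$dest/" || echo "warning: could not copy $lib" >&2
    done < "$list"
}
```

Usage would be something like `copy_libs deduped-dep-list build/lib`. A last line without a trailing newline is skipped by `read`, so the list should end with one.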
An example list of files to de-dupe,
http://nuxleus.com/dev/browser/build/dep-list?format=raw
The XSLT to import the above text file, sort, de-dupe, and output the result,
http://nuxleus.com/dev/browser/build/strip-sort-dedup-dep-list.xsl
The XML file that drives the process,
http://nuxleus.com/dev/browser/build/dedup-list.xml
The output of the above transformation,
http://nuxleus.com/dev/browser/build/deduped-dep-list?format=raw
While it would be possible to use the Muenchian Method to de-dupe a generated XML representation of this same list, doing so would first require a separate external process to either
a) generate an XML file that wraps the input text in a single text node inside a parent element, or
b) generate an XML file (via a simple, but nonetheless external, process) that contains a new element for each line of the input file.
Process a) would require both an external process and access to the node-set() function (which is not part of the XSLT 1.0 spec, so implementations are both sporadic and proprietary to each processor) to convert the text node into a temporary tree for processing, and process b) still requires an external process. Of course, as we are all aware, for-each-group is SO MUCH BETTER (in ease of use, in how easy it is to understand, and quite possibly in performance, depending, of course, on processor-specific optimizations), and as such makes grouping, sorting, de-duping, and so forth an enjoyable experience as opposed to a frustrating learning curve.
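Option b) above, one element per input line, takes very little shell. The sketch below rests on a few assumptions. The function name lines_to_xml is made up, as are the <lines>/<line> element names. The input is also assumed to be UTF-8. It escapes &, < and > before wrapping each line, so unusual paths would still produce well-formed XML. An XSLT 1.0 stylesheet could then de-dupe the <line> elements with the Muenchian Method (an xsl:key on the element's text plus generate-id()).

```shell
# lines_to_xml FILE (hypothetical helper): wrap each line of FILE in a
# <line> element inside a <lines> root. The UTF-8 declaration assumes
# the input file is UTF-8 (or plain ASCII).
lines_to_xml() {
    printf '<?xml version="1.0" encoding="UTF-8"?>\n<lines>\n'
    # Escape & first so the entities added for < and > are not re-escaped,
    # then wrap the whole (escaped) line; & in the last replacement means
    # "the matched text".
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's|^.*$|  <line>&</line>|' "$1"
    printf '</lines>\n'
}
```

For example, `lines_to_xml dep-list > dep-list.xml` would produce the kind of input process b) describes.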
Of course, the next step would be to turn all of the various scripts in the build directory into a single XSLT that uses extension functions to invoke the various external processes, which would allow for a much more dynamic and manageable build process. I’ve got a bit of a start on this, but not a whole lot; with limited time, the above is pretty much as far as I’ve taken this toward something some of you might consider useful (and usable).
That said, for those with interest, I put together a fairly crude/rough tutorial (which is really just a copy/paste of an IM conversation I had with Russ explaining how the process worked). Of course, with the promise of a simplified build process in the next release of rPath Linux, the above may not be necessary for much longer. But knowledge is never a bad thing to have, so I figured it would be worth sharing what I’ve learned thus far in my quest to build the leanest, meanest, most capable .NET-based virtualized Linux distribution I possibly can.
On this same topic, while I had hoped to push out the next release of nuXleus yesterday, too many tasks and not enough time kept that from happening. That said, I have various build processes running in the background. Once things are in a state where I feel comfortable pushing out the next release, I will update with a new post.
Until then, enjoy the rest of your weekends!
—
** For all of you Unix geeks out there who would know right away how to go about this with standard Unix command-line tools, please bear with me… While I’ve learned quite a bit over the last couple of years, I’m still a Windows-trained hacker attempting to push the limits of my Windows-“poisoned” mind ;-)


using sed and sort that would be a one-liner...
@james,
Looks like I need to do some quick research. That said, would you mind providing a quick sample? Would be *much* appreciated :D
sort -u dep-list | grep -v 'dynamic' > deduped-del-list
Gracias, Ben! Will bring this to the top of the post to ensure this handy tip is properly propagated.
here comes my one-liner:
sed '/^[^/]/d' < dep-list | sort -u > deduped-del-list
Ben's solution leaves a blank line at the top of the file
which can be easily spotted by running diff.
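To see the blank line james mentions, a small sketch can run Ben's pipeline and james's second pipeline on a made-up sample and compare the results with diff. The file names sample-dep-list, ben.out and james.out are hypothetical. The empty line in the sample stands in for the blank lines that show up in dep-list. sort -u keeps one copy of it, and because an empty string sorts before any other string, it lands at the top of the output.

```shell
# Hypothetical sample standing in for dep-list: duplicate paths, an empty
# line, and the message ldd prints for files that are not dynamic executables.
printf '/lib/libm\n\n/lib/libc\n/lib/libc\n\tnot a dynamic executable\n' > sample-dep-list

# Ben's pipeline: sort -u keeps one copy of the empty line, which sorts
# before everything else; grep -v drops the ldd message.
sort -u sample-dep-list | grep -v 'dynamic' > ben.out

# james's second pipeline: keep only lines that begin with "/".
sed -n '/^\//p' < sample-dep-list | sort -u > james.out

# diff reports the stray empty line as present only in ben.out
# (diff exits with status 1 when the files differ).
if ! diff ben.out james.out; then
    echo 'ben.out and james.out differ'
fi
```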
@james,
I *LOVE* it! The competition heats up... ;) Will bring this to the top now.
David,
"/lib/ld" is missing from your dep-list, yet it appears in the deduped one.
Apparently my first one-liner also left a blank line, heh. Here's a smarter one:
sed -n '/^\//p' < dep-list | sort -u > deduped-del-list
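The first one-liner keeps a blank line because `^[^/]` needs at least one character to match. An empty line has no first character, so `/^[^/]/d` never deletes it. The sketch below, using a hypothetical sample and file names, compares four versions:

- the first attempt
- a patched version that also deletes empty lines
- the second attempt
- an equivalent grep

It writes the first address as `\|^[^/]|`, which is the same regex with `|` as the delimiter. That way the `/` inside the bracket expression never has to share the delimiter, a choice made here for portability.

```shell
# Hypothetical sample: paths, an empty line, and a line with no path.
printf '/lib/libm\n\n/lib/libc\nlinux-gate.so.1\n/lib/libc\n' > sample-dep-list

# First attempt: delete lines whose first character is not "/".
# The empty line has no first character, so it survives.
sed '\|^[^/]|d' < sample-dep-list | sort -u > first.out

# Patched first attempt: also delete empty lines.
sed '\|^[^/]|d;/^$/d' < sample-dep-list | sort -u > patched.out

# Second attempt: print only lines whose first character is "/".
sed -n '/^\//p' < sample-dep-list | sort -u > second.out

# The same filter written with grep.
grep '^/' sample-dep-list | sort -u > grep.out
```

Here first.out starts with an empty line, while the other three files should be identical.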
@james,
> "/lib/ld" is missing from your dep-list, yet it appears in the deduped one.
Yeah, that's something I hacked into the transformation file @ http://nuxleus.com/dev/browser/build/strip-sort-dedup-dep-list.xsl#L31
<!--
HACK
for some reason the linux /lib/ld* library/symlink are not being copied into the dep list
As such, a hack, which will place this before the other entries.
HACK
-->
<xsl:text>/lib/ld</xsl:text><xsl:value-of select="$linebreak"/>
I haven't figured out why /lib/ld isn't in the dep-list yet, but it has something directly to do with http://nuxleus.com/dev/browser/build/build-dep-list, which processes the list of directories in http://nuxleus.com/dev/browser/build/bin-dir-list to determine which dependencies are required. I think what I still need to do is first de-dup the result list from the first process, then process the results with the same (or at least a similar) process contained in "build-dep-list", which looks like,
for name in `cat bin-dir-list`; do
    for name in $name*; do
        ldd $name | cut -d ' ' -f3 | cut -d '.' -f1 >> dep-list
    done
done
Which I think would make sense, given that there are going to be shared libraries that have dependencies on other shared libraries. Obviously there's still a bit of work that needs to be done to perfect the overall optimization process. ;)
Thanks for the follow-up, james! I must admit, this is *MUCH* more fun than writing entries on ODF vs. EOOXML and then spending my days getting my a$$ virtually kicked by follow-up comments from members of my "M. David $uck$ rock$" not-so-much-a-fan club ;) :D
maybe this one would do the trick:
ldd $name | sed '/> (/d;s/^[^/]*\(\/[^ ][^ ]*\) .*$/\1/' >> dep-list
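james's command deletes lines containing "> (" and rewrites every other line to the first absolute path on it. One plausible reason /lib/ld went missing is that ldd prints the dynamic loader on a line with no "=>", such as "/lib/ld-linux.so.2 (0x...)". On such a line `cut -d ' ' -f3` finds no third field and emits an empty line. That is an inference, not something confirmed above. The variant below is my own, not what was checked in. It inverts the logic with sed -n and the p flag, as in the second one-liner, so any line without an absolute path is dropped whatever its exact format. The function name extract_paths is hypothetical. The sample ldd lines are made up in a typical glibc layout, not captured from a real system.

```shell
# extract_paths (hypothetical helper): read ldd-style output on stdin and
# print the first absolute path on each line. Lines without one (vdso
# entries, "not found" entries, messages) produce no output, so check
# for missing libraries separately if that matters.
extract_paths() {
    sed -n 's|^[^/]*\(/[^ ]*\) .*$|\1|p'
}

# Illustrative ldd-style input, including a loader line without "=>".
printf '%s\n' \
    '	linux-vdso.so.1 (0x00007ffc00000000)' \
    '	libc.so.6 => /lib/libc.so.6 (0x00007f0000000000)' \
    '	/lib64/ld-linux-x86-64.so.2 (0x00007f0000100000)' \
    '	libfoo.so.1 => not found' | extract_paths
```

Only the libc path and the loader path should come out, the loader included.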
@james,
let me try that now. Thanks! :D
@james,
You are my new hero! That works *GREAT*! :D
So I just pulled this all together into the same build-dep-list script and checked in the result via the changeset @ http://nuxleus.com/dev/changeset/3899
updated with a whole bunch of kick a17563 capabilities provided courtesy of james@http://www.oreillynet.com/xml/blog/2007/02/using_xslt_20_to_optimize_linu.html#comment-493435 (as well as several comments above this one)
Of course, now I need to find out why "a$$" was expanded to "a17563" in the check-in notes, but that's less of a concern at this stage than the fact that this now works the way it should. :)
Thanks again for your help, james!
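Pulling the pieces together, a build-dep-list along these lines might look like the following sketch. It is not the script checked in as changeset 3899. Several parts are my assumptions:

- the function name build_dep_list and its DIRLIST/OUT arguments
- the path-extracting sed expression, a sed -n/p variant of james's

Compared with the loop quoted earlier, it gives the inner loop its own variable, quotes paths, and pipes the whole loop through sort -u once instead of appending to dep-list. As far as I know, glibc's ldd already lists indirect dependencies (everything the dynamic loader would load). If so, a second pass over the libraries themselves may not be needed.

```shell
# build_dep_list DIRLIST OUT (hypothetical): run ldd on every regular file
# in each directory listed in DIRLIST, keep the absolute library paths,
# and write a sorted, de-duped list to OUT.
build_dep_list() {
    dirlist=$1
    out=$2
    while IFS= read -r dir; do
        # Skip blank lines in the directory list.
        [ -n "$dir" ] || continue
        for bin in "$dir"/*; do
            # Only regular files are handed to ldd.
            [ -f "$bin" ] || continue
            # Discard ldd's complaints (on stderr) about files it can't handle.
            ldd "$bin" 2>/dev/null
        done
    done < "$dirlist" |
        sed -n 's|^[^/]*\(/[^ ]*\) .*$|\1|p' |
        sort -u > "$out"
}
```

Usage would be something like `build_dep_list bin-dir-list deduped-dep-list`.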
I'm glad 'sed' could help you.
As for the a$$ - $$ got replaced by the current process ID.
> As for the a$$ - $$ got replaced by the current process ID.
Ahhh... Okay, that now makes sense. Thanks! Oh, and when you see 'sed' next, give (him/her/it) my thanks, would ya? ;)
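The same thing is easy to reproduce in any POSIX shell. The sketch below uses hypothetical variable names. It shows that $$ expands inside double quotes, while single quotes or a backslash keep it literal. The check-in command itself isn't shown in the thread, so the double-quoting is an assumption about how the message was passed.

```shell
# $$ is the shell's own process ID; inside double quotes it is expanded.
expanded="kick a$$"
# Single quotes suppress all expansion, so the text stays as typed.
literal='kick a$$'
# Backslash-escaping each dollar sign inside double quotes also works.
escaped="kick a\$\$"

printf '%s\n' "$expanded" "$literal" "$escaped"
```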