Related link: http://dubinko.info/blog
I finally have a personal content management system that lets me access the data, even when the software isn’t running.
More than a little inspired by Danny O’Brien’s Life Hacks talk at ETech, and using more than a little of David Mertz’s public domain code from Text Processing in Python, I have taken a first major step towards the Brain Attic concept I first wrote about in this very weblog.
A funny thing happened on the way to XML, though. It turns out that plain old text is a better format for writing, and reading (which happens much more often). As an author and editor of the XForms specification, I don’t say this lightly. Your favorite text editor is the greatest productivity tool there is.
All my important textual data–my working (and searching) set–is now spread out over a tree of intelligently named directories. All *.txt files. I can move to any OS and be instantly productive. I can easily copy these files to my iPod or any other PDA.
I have scripts to convert structured text to XHTML, suitable for printing or, say, submitting a manuscript. I have scripts and XSLT to produce a weblog and RSS feed from a text file (now active, check it out).
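As a rough illustration of the kind of text-to-XHTML conversion described above, here is a minimal Python sketch. It is not the actual script, which isn't shown in this post. The input convention (first block is the title, blank lines separate paragraphs) and the function name text_to_xhtml are assumptions made for the example.

```python
import re
from xml.sax.saxutils import escape


def text_to_xhtml(text):
    """Convert plain text to a minimal XHTML-style document.

    Assumed convention (hypothetical, not from the original scripts):
    the first block is the title, and blank lines separate paragraphs.
    """
    # Split into blocks on blank lines and drop anything empty.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip())]
    blocks = [b for b in blocks if b]
    if not blocks:
        title, body = "", []
    else:
        title = escape(blocks[0])
        body = [f"<h1>{title}</h1>"]
        for block in blocks[1:]:
            # Re-join hard-wrapped lines inside a paragraph.
            para = " ".join(line.strip() for line in block.splitlines())
            body.append(f"<p>{escape(para)}</p>")
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{title}</title></head>\n"
        "<body>\n" + "\n".join(body) + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(text_to_xhtml("My Notes\n\nFirst line\nwrapped & done.\n"))
```

The point of the sketch is that the text file stays the master copy; the XHTML is a throwaway rendering that can be regenerated at any time.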
XML is, of course, still important, and as long as people need to edit XML, they’ll need XForms. But something more fundamental, something missed by practically every existing piece of software, is the most important thing:
It’s the data, stupid.
How do you manage all your “stuff”? Talk Back.


surprised that plain text is easier for humans than XML?
I've never understood why some people seem to think XML is some sort of magic bullet that will make accessing and creating textual data suddenly completely transparent and easy to do for everyone.
Plain text is a far simpler format for humans to use; the ONLY benefit of XML, IMHO, is when the text is data for (mainly) machine consumption.
Corrected Link?
http://www.oreillynet.com/cs/weblog/pub/wlg/2210
Corrected Link?
Easy to guess: http://www.oreillynet.com/pub/wlg/2210
My "one app": plaintext that's better than plaintext
=head1 POD, Perl's "Plain Old Documentation"
This is a I<great> format. I tend to write most stuff with it. It provides enough markup to roughly structure and format text, but not so much as to get in the way of reading it.
=head2 Formatting markup is simple
It doesn't really get in the way B<too> badly. Certainly it only rarely makes the text particularly unreadable.
=head2 Structure markup is visually helpful
As you can see, it actually provides visual landmarks. Particularly with an editor that syntax highlights POD, this is great — you can skim long texts easily.
You also have labelled, bulleted, and ordered lists at your disposal. Using C<=for> and C<=begin>/C<=end> you can insert passages of foreign markup aimed at particular POD renderers, which the others will ignore (for example, to use HTML C<img> tags or maybe insert a PostScript graphic).
=head2 Tool paradise
For simple consumption, the format is trivial enough that tools are not required, nor do you need them for editing. This is in sharp contrast with a heavyweight markup language like XML.
There is a variety of tools already available to process POD. With the tools available with any standard Perl install you can turn POD into HTML, Postscript, or LaTeX.
Making new tools for custom tasks using Perl is dead easy. Making new converters is slightly trickier, but a variety of existing POD parser modules take a lot of the burden.
=head2 Conclusion
It's just great. I love it.
I guess it confirms the rule that everyone uses only one app: for Joel Spolsky that's Excel, for a random HR person it's PPT, for Don Lancaster it's PostScript, and for me, it's Perl.
Also, I wish O'Reilly had a preview button on the comment form.
A hierarchy of textual formats?
I find that there is a hierarchy to data stored as human-readable text. It runs from a series of notes in a file made without regard to programmatic filtering or searching, although I try to keep spellings consistent and may leave marker strings around to help when searching in vi,
- through more structured text: tabular data that is easily read by awk (and so by most other scripting languages),
- and on to text with yet more structure, which I write so that it can be parsed by a scripting language (I have used this technique to format written data as Lisp and Python data structures). Personally I find XML syntax very verbose to type by hand, and since I rarely use other tools that read or write XML, I survive without writing it.
For your data repository you might want to restrict the characters used in file and directory names, as some tools and OSs don't like spaces, exclamation marks and the like in names, or have trouble manipulating them in command-line shells. Keep file and directory names reasonably short, and don't have file or directory names that are distinguishable only by case; that will cause problems on case-insensitive systems.
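Those naming rules are easy to check mechanically. Here is a minimal Python sketch, offered as one possible approach only. The "safe" character set, the 64-character limit, and the function names check_names and check_tree are all assumptions chosen for illustration, so adjust them to whatever your own tools tolerate.

```python
import os
import re

# Assumed "safe" set: ASCII letters, digits, dot, underscore, hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")
MAX_NAME_LENGTH = 64  # arbitrary illustrative limit


def check_names(names):
    """Return (name, problem) tuples for the entries of one directory."""
    problems = []
    seen = {}  # lowercased name -> first spelling seen
    for name in names:
        if not SAFE_NAME.fullmatch(name):
            problems.append((name, "unsafe characters"))
        if len(name) > MAX_NAME_LENGTH:
            problems.append((name, "name too long"))
        key = name.lower()
        if key in seen:
            problems.append((name, f"differs only by case from {seen[key]}"))
        else:
            seen[key] = name
    return problems


def check_tree(root):
    """Walk a directory tree, yielding (directory, name, problem)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name, problem in check_names(sorted(dirnames + filenames)):
            yield dirpath, name, problem


if __name__ == "__main__":
    for dirpath, name, problem in check_tree("."):
        print(f"{os.path.join(dirpath, name)}: {problem}")
```

Running it from the top of the text tree lists every name that would be awkward on another OS or in a shell.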
You might also like to keep handy a utility that converts the line endings of text files between the conventions used by different OSs.
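If no such utility is at hand, one is only a few lines of script. This is a minimal Python sketch rather than a polished tool. The function names convert_line_endings and convert_file are made up for the example, and it assumes the files are small enough to read into memory in one go.

```python
def convert_line_endings(data, newline=b"\n"):
    """Return bytes with every line ending replaced by `newline`.

    Handles CRLF (Windows), bare CR (classic Mac OS) and LF (Unix) input.
    """
    # Normalize everything to LF first: CRLF, then any remaining bare CR.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", newline)


def convert_file(path, newline=b"\n"):
    """Rewrite the file at `path` in place with the chosen line ending."""
    # Binary mode keeps Python from translating line endings itself.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(convert_line_endings(data, newline))
```

For example, convert_file("notes.txt", b"\r\n") would rewrite a hypothetical notes.txt with Windows line endings, and the default converts to Unix ones.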