March 2003 Archives

Todd Mezzulo

txt2pdf PRO is a commercial tool developed by SANFACE Software that converts text files to PDF. Fabrizio Sanface sent me this story written by Paul Williams, Director of IT at The Halifax Herald Limited, which details how this great tool created with Perl helps a metro daily newspaper handle the thousands of daily invoices and monthly statements it generates for advertising customers.

The Halifax Herald Limited Project

The Halifax Herald Limited is one of Canada’s oldest and largest independent newspapers. Based in Halifax, Nova Scotia, and dating from 1875, the Herald publishes The Chronicle-Herald, The Mail-Star and The Sunday Herald.

The Problem

Like all metro daily newspapers, the Herald publishes hundreds of display ads and thousands of classified ads every day, resulting in thousands of daily invoices and monthly statements. For years we printed duplicate bills so the Accounting Department would have file copies in the event an advertiser had questions or required a reprint.

Obviously, handling these bills took a significant amount of time and space: separating, filing, finding, re-filing, all very manual processes. The bills themselves begin as huge ASCII files that are printed on continuous pre-printed forms using IBM 6400 line printers.

In 2000, we developed a Perl-based document management system that indexed the ASCII files by invoice number. This allowed the Accounting Department to search for a particular invoice or statement and display it on their screen. The system was quite convenient for them but not robust enough to allow us to stop printing and filing duplicates.

It proved that a document management system would save time, money and storage space, and allow us to better serve our customers. Managing, protecting, indexing and cross-referencing all these ASCII files was a primary challenge. An Oracle database, fronted by a CGI interface written in Perl with DBI/DBD, was the answer.
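The article doesn't say how the Herald's Perl system located individual invoices inside the print files, so what follows is only a minimal Python sketch of the indexing idea, under loudly stated assumptions: each invoice is one form-feed-separated page, and each page carries a line such as "INVOICE NO: 1001". That layout and the names index_invoices and lookup are hypothetical.

```python
import re
from typing import Optional

# Hypothetical print-file layout (not described in the article): invoices are
# separated by form feeds, and each one carries a line like "INVOICE NO: 1001".
INVOICE_RE = re.compile(r"INVOICE NO:\s*(\d+)")


def index_invoices(print_file_text: str) -> dict[str, list[str]]:
    """Map each invoice number to the page(s) of the print file that carry it."""
    index: dict[str, list[str]] = {}
    for page in print_file_text.split("\f"):
        match = INVOICE_RE.search(page)
        if match:  # pages without an invoice number (e.g. blank trailers) are skipped
            index.setdefault(match.group(1), []).append(page)
    return index


def lookup(index: dict[str, list[str]], invoice_no: str) -> Optional[str]:
    """Return the text of one invoice, or None if it isn't in the index."""
    pages = index.get(invoice_no)
    return "\f".join(pages) if pages else None
```

A production version would persist the index in a database (the article mentions Oracle) rather than hold it in memory.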

Why txt2pdf PRO?

The other major concern was the integrity of ASCII files: they’re easy to alter. This made PDF desirable. Customers, auditors and taxmen could agree that PDFs were exact replicas of original bills.

How to convert our ASCII files to PDF? A quick search of the web turned up txt2pdf PRO. Right out of the box it produced perfect PDF versions of our bills. Of course they were still plain text, hard to read on the screen and painful to print on line printers.

txt2pdf PRO’s overlay/underlay capabilities provided a perfect solution — underlays exactly replicating our pre-printed forms, color-matched and complete with the Herald logo. Now the PDFs look just like the bills we print, making them easy to read on the screen and suitable for reprinting on laser printers.

While evaluating txt2pdf PRO we discovered it also converted all our large reports accurately to PDF. These reports are hundreds of pages thick and many are just for historical record.

Summary

Benefits derived from this project:

* No longer printing duplicate bills.
* No more manual filing of hard copy bills and reports.
* Invoices & statements available online for review and reprinting.
* Document database can be searched by invoice number, account number and/or date.
* Significant paper and ribbon savings.
* Large reports are now cataloged and full-text searchable.

Paul Williams
Director of Information Technology
The Halifax Herald Limited

To learn how large and small companies are using Perl to meet their goals, check out Perl Success Stories.

If you have a Perl success story of your own that you’d like to share, please let me know. You can reach me at: todd@oreilly.com.

Jacek Artymiak

Related link: http://www.freedom-to-tinker.com/archives/000336.html

Outlawing encryption, firewalls, and other means of privacy protection is like telling us to leave the doors and windows to our homes wide open, with all of our documents, credit cards, passports, and health records left on the doorstep for anyone to steal, destroy, falsify, and use as they please. Is this the world we want to live in?

Bad laws, such as the bills that Edward W. Felten and Slashdot write about, will not help governments catch the bad guys. They will make us all more vulnerable.

Jacek Artymiak

Related link: http://www.ietf.org/internet-drafts/draft-fink-6bone-phaseout-00.txt

This is from the Internet-Draft “6bone (IPv6 Testing Address Allocation) Phaseout” [Fink, Hinden 2003]:

During 2002 more production IPv6 address prefixes had been allocated than are allocated by the 6bone at the top level. It is generally assumed that this is one reasonable indicator that planning for a 6bone phaseout should begin.

Looks like the world is slowly switching to IPv6. I wonder how long it will take to phase out IPv4?

Jacek Artymiak

Related link: http://www.deadly.org/article.php3?sid=20030325141427

The OpenBSD packet filter has been ported to FreeBSD and NetBSD. If you plan to move from ipfw to pf, then this might help to get you started (just remember to replace modify state with modulate state).

UPDATE 2003-05-03, 10:57 GMT+1: Max Laier wrote to inform me that pf for FreeBSD has a new home. Thank you, Max!

Uche Ogbuji

Related link: http://www.itworld.com/nl/ebiz_ent/03182003/

Of course Sean is dead wrong as to the salient matter, but he’s always a good read. RDF is for people who understand directed graphs. In any random audience, such people are, of course, a small proportion. Same story for forensic histology, but I doubt Sean would argue for closing down all the crime labs. The argument that “not everyone can get RDF” is not worth any number of words. The more interesting point is that anyone who can’t get RDF can’t get relational databases or any other sort of formal information modeling, and they can’t get code (both flow of control and declarative algebras are graphs more complex than RDF). For those outside this set, as Sean points out too obliquely, there are plenty of tools and they needn’t deal with RDF directly.
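To make the “directed graph” point concrete, here is a hypothetical Python sketch: each RDF statement is a (subject, predicate, object) triple, read as a labelled edge from subject to object. The ex: and dc: names are made up for illustration, and a real application would use an RDF library rather than plain tuples.

```python
from collections import defaultdict

# Each triple (subject, predicate, object) is a labelled edge subject -> object.
Triple = tuple[str, str, str]


def build_graph(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Adjacency list: node -> [(edge label, target node), ...]."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph: dict[str, list[tuple[str, str]]], start: str) -> set[str]:
    """All nodes reachable from start by following edges in their direction."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _label, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

Following edges only in their stated direction is exactly the graph-traversal intuition the argument above assumes a reader already has.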

Andy Oram

Once again, an event of immense historical impact, the bombardment and invasion of Iraq by the United States, brings out the essayist in everyone. I will join the crowd today, but with an essay that is almost more cultural than political. It has recently occurred to me that the salient factor in the public’s support for the war called Operation Iraqi Freedom is their affinity for George W. Bush’s style. And this may also explain the deep split between U.S. opinion and that of nearly every other country in the world (Poland being a possible exception).

Bush has made this war his own to an extent without precedent. Secretary of State Powell presented the intellectual’s war before the U.N., and Defense Secretary Rumsfeld is good for pinch-hitting or second quotes, but Bush is the ultimate spokesperson for every turn taken in U.S. policy. And he carries out this job with a flair no one can imitate, whether it’s his recreation of grade B cowboy movie scenes (there ain’t enough room in this town for the both of us, Saddam; I’m givin’ you 48 hours to git out) or the ingenuous protests of a heartland Everyman (“Saddam could disarm any time he wanted. He could just drive up to the parking lot with a truck full of weapons and turn them over.” This is not an exact quote, because it’s from memory, but it is the gist of something Bush said a few weeks ago.)

When I read in The Nation today (in an article that apparently they did not put online, “Building Cities for Peace,” March 31, 2003 issue) that a small town in Connecticut they described as “Republican-leaning” voted overwhelmingly to oppose the war, I realized that Americans’ reaction to the war is divided culturally, and probably (though I don’t have statistics on this) geographically. If the reference to driving up to a parking lot with a truck resonates with you, if you find it an exemplary execution of honest plain thinking, you probably go for the war as a whole. Otherwise, you are as repelled by war as by Bush’s folksy commentary on it. This is not a matter of intelligence or attention span or anything else simplistic; it’s a matter of personal style.

So this is a cultural war. Not in the sense of fundamentalist Islamism against traditional Western liberal philosophy (as is believed by those who mistake excuses for reasons) but in the sense of why people support it and why they don’t.

Bush is not stupid, but he deliberately dumbs down the conversation because he knows that doing so benefits his position. Shortly after the parking lot remark, chief U.N. inspector Hans Blix delivered a carefully reasoned speech in which he explained why it would take months to prove that Iraq was free of illegal weapons, even with full cooperation. This speech, needless to say, did not make it onto the nightly news.

The only way to break the grip that the Bush Administration and the major media outlets have put on discourse seems to be to reduce one’s beliefs to similar ten-word formulae. I am a bit embarrassed by some of the simplistic statements that broadcasters and newspapers have quoted from the anti-war side, but the overwhelming point is just that: they have been quoted. Don’t assume their more nuanced beliefs match what they said to get on TV.

So this is Bush’s war. It didn’t begin that way, of course. The roots of the invasion go back to a doctrine expounded long ago by a few right-wingers such as Rumsfeld, Paul D. Wolfowitz, and Richard Perle, long before George W. Bush could even name the countries on Iraq’s borders. You can read about the doctrine’s history in The Nation, or even in today’s New York Times if you like your information mainstream. But Bush is the reason for the war by now, because he made it his through his style.

Others tried to make it their war too. Tony Blair strove quite assiduously to do so, but was firmly told by his compatriots and the Europeans (dare I say fellow Europeans?) that it was Bush’s war and not his. One gets the distinct impression that the Turkish military would like to make a little bit of it their war, and others will jump into the breach soon enough. But for now it remains Bush’s war.

Wars as expressions of conflicts between individuals are nothing new, of course. What else was the Battle of Hastings in 1066? But it’s appalling that a modern war of such scope should be the result of the way Hussein and Bush grate on each other.

And the practical implications of that? There is no point any more in debating terrorism, or weapons of mass destruction (as I did in another weblog a few weeks ago), or oil, or the Palestinian situation, or any other aspect of reason.

We are in the age of passion. You joined this war (or its opponents) years ago, without knowing it. I suppose human history has always catapulted itself ahead, or backward, with similar blindness. There is not much anyone can do now until it is all over; but once we have some breathing room we must start to take the duct tape off of our eyes.

David Sklar

Related link: http://www.egovos.org/march-2003/index.html

I spent three days in Washington DC this week at the Open Standards/Open Source for National and Local eGovernment Programs in the U.S. and EU conference.

The conference started off with Whitfield Diffie, looking like Gandalf in a
3-piece suit, describing some of the security benefits of open source
software and transparency in general. This boils down mostly to the notion
that a secret which is difficult to change is a vulnerability. Cryptosystems
are time-consuming and expensive to develop, so if your security depends on
the secrecy of the system, then you have big problems if that secrecy is
breached. Keys are easier and cheaper to regenerate. If you have to come up
with and distribute a new encryption key because one was compromised, you
have a much smaller headache.
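A minimal Python sketch of the “keys are cheap to change, systems aren't” point, assuming message authentication with HMAC-SHA256: the algorithm is public and widely reviewed, and the only secret is a random key. If the key leaks, it is simply replaced. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets


class MessageAuthenticator:
    """The algorithm (HMAC-SHA256) is public; only self.key is secret."""

    def __init__(self) -> None:
        self.key = secrets.token_bytes(32)  # 256-bit random key

    def tag(self, message: bytes) -> bytes:
        return hmac.new(self.key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        # compare_digest avoids leaking timing information about the tag
        return hmac.compare_digest(self.tag(message), tag)

    def rotate_key(self) -> None:
        """Respond to a suspected compromise: discard the key, make a new one.

        Nothing about the algorithm has to change, which is the point.
        """
        self.key = secrets.token_bytes(32)
```

After rotate_key(), tags made with the old key no longer verify, so a stolen key stops being useful without anyone redesigning the system.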

Peter Loscocco talked about NSA’s SELinux project, which adds a
Mandatory Access Control framework to the Linux kernel. What was most
interesting to me about this talk was the discussion of technology transfer
as an explicit part of NSA’s mission. Linux provided a more effective means
for them to accomplish this than previous efforts.

A session presenting the results of a survey of Open Source software in the
Department of Defense revealed plenty of examples. However, it seems that
concern about licensing hinders its use. This is a story that has been
repeated many times here. A commercial or government organization convenes
their herd of IP lawyers to decide whether using Open Source products would
imperil their rights. Yet hardly any of these organizations are planning to
modify the OSS, and even fewer have redistribution plans. The licenses
shouldn’t be such a concern, but they are. The absence of case law regarding
Open Source license enforcement makes this an even murkier area.

Fixing unneeded licensing concerns is mostly a perceptual problem, but a
more substantive one is indemnification. Johan Goossens of
NATO and Rob Page of Zope Corp. gave a presentation of the NATO intranet
system developed on top of Zope. One of the reasons that NATO chose Zope was
that a company stands behind it to indemnify the components in case of patent
problems or other issues. The existence of a corporate backer wasn’t crucial
for support, release management, or other traditional factors cited in
defense of proprietary software (although those are nice), but solving the
indemnification issue was necessary.

Jim Willis gave an excellent talk about products he’s developed for the
State of Rhode Island that enhance citizen access to public data. Public
interest groups and lobbyists both appreciate the ability to track rules,
regulations, and pending legislation with e-mail alerts, calendaring, and
various kinds of searching. Using PHP to glue together existing open-source
products, Jim produced impressive results in just a few months. One of his motivations is the very important point that his government has a very real responsibility to act as a custodian for data that belongs to its citizens. Storing the data in open formats and building open source tools to access that data are both crucial parts of that custodianship.

Jesse Kornblum from the Air Force Office of Special Investigations Computer Investigations and Operations Branch demonstrated some forensics tools that he uses to gather evidence. His team has released the source code to their customized versions of dd and md5sum. His talk highlighted another asset of open source software: easy independent verification of results. If his investigation produces evidence that prohibited material was found on a computer, no one (prosecution or defense) has to take his word for it. He can provide the disk image he worked on, the tool that found the prohibited material and the source code to that tool. His results can be reproduced and verified by others. He can even provide the source code to the tool that ensures that the disk image contains the same data as the original computer being investigated. Open source ensures that the conclusions of investigators aren’t black box “take it from me” assertions, but well-justified statements of fact that can be independently verified and duplicated.
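Kornblum's own tools aren't reproduced here. This is a Python sketch of the verification step he describes, assuming the investigator publishes an MD5 digest of the original disk image so that anyone can recompute it from the image with open code. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so large disk images need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_image(path: Path, recorded_digest: str, algorithm: str = "md5") -> bool:
    """True if the image on disk still matches the digest recorded at acquisition."""
    return file_digest(path, algorithm) == recorded_digest.lower()
```

MD5 mirrors the md5sum tool mentioned above. For new work, a stronger hash such as SHA-256 (algorithm="sha256") would be the safer choice.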

My talk on PHP went well. I had a cheering section: the folks from the OU Sinapse project. They’ve built (and open-sourced) a huge campus portal project with PHP and have many universities both deploying it and collaborating on development. There were plenty of other examples of PHP users I ran into at the conference, such as the US Defense Department, the US Census Bureau, and the Mexican federal government.

There were a number of sessions and discussions that debated the relative security merits of open source and closed source software. The typical response to “open source => more eyeballs => security holes are found and fixed” was that “open source => more enemy eyeballs => security holes exploited before they’re widely fixed.” Mostly overlooked was the fact that a sufficiently well-funded and well-connected attacker will have the source code to a “closed source” product. Microsoft has signed a shared source agreement with the Russian government. How likely is it that copies of the source code might make their way out of the government? Would it be that difficult to get a job with the company that Oracle hires to empty its trashcans and bring a FireWire DVD burner to work with you one night? Security is always a tradeoff, never an absolute. But when governments are discussing repelling attackers, they have to be prepared for the best attackers. For those folks, everything is open source.

What are your thoughts on or experiences with government and open source software?

Jacek Artymiak

Related link: http://www.avet.com.pl/en/conferences/bsday/bsday.php

BSDay 2003 will be the first Polish BSD conference, and the first Polish IT conference sponsored by O’Reilly & Associates.

Here’s more information in English and in Polish.

I’ll be there to speak about OpenBSD among other things.

Kevin Bedell

It was only a question. I didn’t expect a fight to break out.

But at the Thursday afternoon keynote on ‘The Future of Java’ at SYS-CON’s Web Services Edge 2003 East Conference in Boston, that’s almost what happened.

The keynote was a panel discussion with some of the best people in the world when it comes to understanding the future of Enterprise Java. On the panel were:

Simon Phipps
Chief Technology Evangelist - Sun Microsystems

Marc Fleury, Ph.D.
founder and president of JBoss Group

Dave Chappell
VP, Chief Technology Evangelist - Sonic Software

Dr. Jeff Capone
CTO - Aligo

Tyler Jewell
Director, Technical Evangelism - BEA

After about 30 minutes of general discussion the floor opened to questions. I figured this would be a great chance to get some insight as to why companies should adopt either Open Source or commercially-developed J2EE application servers. So I raised my hand and asked:

“Can you as a panel compare the difference in the value propositions of open source and commercially-developed J2EE application servers?”

At first there was just silence as they looked at each other. Then Marc Fleury said, “I wouldn’t touch that question with a ten-foot pole!” - though true to his style, he immediately followed with “no, actually of course I’ll answer it”.

And Marc then rattled off a series of pretty impressive reasons for Open Source, including:

  • Because of their cost structure and the international reach of JBoss Group, they literally have some of the best developers in the world working on the project.
  • They support the J2EE standard, though they are not tied to it for marketing or business reasons, and they have implemented features not mandated by the J2EE standard in order to build a better product.
  • Because they have such great developers and are driven by nothing other than technical reasons, they are actually able to innovate and extend the capabilities of J2EE containers. He gave examples of their research that is considered world class. Others on the panel agreed with Marc on this point.

But both Simon Phipps (from Sun) and Tyler Jewell (of BEA) cited sources indicating that the actual license costs were only a part of the total cost for developing applications. Tyler indicated 1) that companies usually spent about 6 times what they spent for licensing on support and services, and 2) that BEA’s professional services rates were actually cheaper than those of the JBoss Group anyway.

Tyler then asked Marc if the JBoss Group would consider setting their professional service rates to be 6 times their license costs (which, of course, was a joke since JBoss is Open Source and free…).

Simon Phipps spent some time describing Sun’s commitment to Open Source technologies (which I agree has been very significant). Upon hearing Simon describe the NetBeans technologies that Sun has been supporting, Marc Fleury lifted his hand to his mouth and made a funny sound not unlike a sick duck quacking.

One of the points that Simon (from Sun) made in support of commercial products is that they were more likely to be certified as supporting the J2EE standards. (Commercial companies generally pay Sun to license the tools used to ‘certify’ products as ‘J2EE-compliant’.) Open Source projects, he said, likely did not. And because Open Source projects are developed by teams for whom “standard compliance” may matter less than performance or other technical features, an Open Source project might diverge from the standard and leave users as locked in to a single platform “as if they’d used .NET”.

Simon also mentioned that Sun had offered to make the toolset for J2EE certification available to JBoss and challenged Marc on-stage to get JBoss certified as compliant.

Marc discussed how JBoss was actually hoping to feed some of their extensions back into the J2EE standard. He said that improvements to performance and capabilities were important, and that standards compliance for its own sake didn’t always result in the best products (“remember CORBA?”, he asked).

Tyler (from BEA) also added that his company provides phone support 24 hours a day all around the world (and generally in the local language).

As time ran out on the session and people began filing out of the auditorium bound for other sessions and tutorials, the panel discussion stayed heated. I heard people calling for them to ‘take it off-line!’

I guess what I took from it in the end is that while free and open source software is good, it may or may not be the most cost-effective solution for your company. But I also walked away certain that innovation in the industry was as likely to come from the Open Source sector as from anywhere else, if not more likely.

In the end, you need to compare the options for yourself and make the decision based on all the needs of your business.

If you have questions, though, I’d recommend not asking this group. You might end up having to pull them apart.

Schuyler Erle

Related link: http://www.extremetech.com/article2/0,3973,708884,00.asp

Conventional wisdom holds that simultaneous use of 802.11b in a single location without interference is limited to the three non-overlapping channels in the 2.4 GHz ISM band, which can severely complicate Wi-Fi deployment strategies. According to this article, the particulars of 802.11b actually permit simultaneous use of 4 slightly overlapping channels — with less than 5% interference between channels. (Thanks, Nate!)
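The arithmetic behind the three-channel rule can be sketched in Python, assuming 2.4 GHz channels 1-13 centred at 2412 + 5(n-1) MHz and a nominal 22 MHz footprint for an 802.11b signal. The four-channel plan used below (1, 4, 8, 11) is a hypothetical example, not necessarily the one in the article. The sketch counts only raw bandwidth overlap; real interference also depends on the transmit spectral mask, which falls off away from the centre frequency. That is why slightly overlapping channels can coexist with modest interference.

```python
CHANNEL_WIDTH_MHZ = 22  # nominal 802.11b (DSSS) signal width


def center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz channel (channel 14 is a special case, not covered)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers channels 1-13")
    return 2412 + 5 * (channel - 1)


def overlap_mhz(a: int, b: int) -> int:
    """Width in MHz of the band shared by two channels' nominal footprints."""
    return max(0, CHANNEL_WIDTH_MHZ - abs(center_mhz(a) - center_mhz(b)))


def max_pairwise_overlap(channels: list[int]) -> int:
    """Worst-case nominal overlap between any two channels in a plan."""
    return max(overlap_mhz(a, b) for i, a in enumerate(channels) for b in channels[i + 1:])
```

Channels 1, 6 and 11 sit 25 MHz apart and share no nominal bandwidth. In the 1, 4, 8, 11 plan, adjacent picks are 15-20 MHz apart and overlap only at the edges of their footprints.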

Uche Ogbuji

Related link: http://www.amk.ca/quotations/python-quotes/

We will perhaps eventually be writing only small modules which are identified by name as they are used to build larger ones, so that devices like indentation, rather than delimiters, might become feasible for expressing local structure in the source language.

–Donald E. Knuth, “Structured Programming with goto Statements”, Computing Surveys, Vol 6 No 4, Dec. 1974

I love to tell people how I discovered Python. I was just getting into Red Hat, back in late 1996 (Red Hat 4.0, if I remember correctly). I was using the printer set-up UI and it broke, leaving a traceback on the screen. I remember, on a lark, following the traceback to the source file, which was in a language I hadn’t heard of. Reading the section of code around the reported failure, and after a bit of trial and error, I was able to fix the bug and that got the printer applet working again. I don’t remember the details, but I believe it was a misplaced variable assignment in one of the code branches.

I did not look at a single reference on Python during that incident. I was pretty much blown away at how clear the language was, and I’ve been a strong Python user and advocate ever since then. But this discussion is about indentation. The funny thing is that I didn’t learn that Python required indentation until I later read the Python tutorial, trying to get properly into the language. Even then, I don’t recall that detail causing me a moment’s pause.

When I was working on the printer GUI, through all my trial and error I always just naturally followed the indentation that was in the code module. I was a C++ guy at the time (and just beginning to learn that things were deeply amiss with Java) and everywhere I’d ever worked, your code would be rejected upon review if you didn’t meet code standards, of which indentation was a prominent part. For this enforcement to come from the language rather than peer review seemed a natural progression.

I remember the Knuth Indentation Quote (KIQ) coming up in the Python community. There was some talk a few years ago in the Python Software Association of putting it on a T-shirt for sale (I don’t believe there was ever such a T-shirt designed, but one of the IPC T-shirts had the motto “Life’s better without braces”). I never really appreciated the quote because I always felt that I’d rather advocate Python based on serious considerations of expressiveness and flexibility than by combating what I hold to be frivolities. And I didn’t really know the context of the KIQ.

Now Python has firmly entered the mainstream. There are still people who make an amazing amount of noise about a syntactic feature that enforces clarity. Over the years I’ve heard many criticisms of Python, many of which have been very valid, and many of those have since been addressed. These included poor Unicode support, lack of closures, a tendency towards memory leaks, and speed. I have never taken anyone seriously who held the indentation as a serious defect in Python, and so I might as well use the KIQ as the light-hearted tool it is. In future, rather than rolling my eyes the next time someone says “Ewww. Significant whitespace”, I’ll just smile and trot out the words of the greatest mind in the history of computer science (sorry, Turing and von Neumann fans).
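For readers who haven't used Python, here is a small illustration of indentation as block structure. The two functions below differ only in the indentation of the total *= 2 line, and that alone decides whether it runs inside or after the loop. The last function shows that a block with no indented body is rejected at compile time. All names are made up for the example.

```python
def double_inside_loop(values: list[int]) -> int:
    total = 0
    for v in values:
        total += v
        total *= 2  # indented under "for": runs once per value
    return total


def double_after_loop(values: list[int]) -> int:
    total = 0
    for v in values:
        total += v
    total *= 2  # dedented: runs once, after the loop finishes
    return total


def is_valid_python(source: str) -> bool:
    """True if source compiles; a block with no indented body is a syntax error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True
```

With [1, 2, 3], the first function returns 22 and the second returns 12. The indentation a code reviewer would demand anyway is the structure the compiler reads.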

Jacek Artymiak

Related link: http://www.netbsd.org/Changes/#10th-birthday

The NetBSD project was started 10 years ago, on March 21, 1993. With 3461 packages and 50+ hardware platforms under its belt, NetBSD is the most portable Unix operating system in existence today. It’s a good time to look back and see what the project has accomplished so far.

Happy Birthday!

What are you using NetBSD for? Why did you choose it over other OSes? Share your NetBSD stories with the rest of us!

Kevin Bedell

In an article on CNN/Money last week, a recent Gartner report was quoted as saying, “By 2004, more than 80 percent of U.S. executive boardrooms will have discussed offshore sourcing, and more than 40 percent of U.S. enterprises will have completed some type of pilot or will be sourcing IT (information technology) services”.

In addition, Forrester analyst John McCarthy recently predicted, “Over the next 15 years, 3.3 million U.S. service industry jobs and $136 billion in wages will move offshore to countries like India, Russia, China and the Philippines”, and that “The IT industry will lead the initial overseas exodus.”

Wow. Those are big numbers. How can you be sure that your job isn’t one of those outsourced? Here are some ideas:

1. Constantly update your skills.

In another weblog entry, Uche Ogbuji asked, “Are XML, Web Services, CORBA and such for Joe Codeloader?”. Let me answer that - “No, unless they want to make sure they stay employable!”.

Letting your skills go rusty while technology changes around you gives upper management a reason to shop around. Faced with the prospect of sending you through training at their expense (which takes time, costs more and pushes out project end-dates), they may just make the choice to outsource to cheap off-shore labor that already has the skills they need.

2. Get to know the business.

One thing that you have over any outsourcing company is specific domain knowledge about your particular company. You can learn to speak the business users’ language. You know about the other systems you need to integrate with. You understand the history of why the old systems didn’t work. This knowledge can’t be replicated.

(By the way, this also means staying in one place long enough to gain that knowledge!)

3. Make your customers need you.

This comes through being competent as well as getting along. You need your users to like you so much that they ask for you by name. You want your users to tell their management how valuable you are and how they need you on the team.

4. Learn technologies that integrate other technologies.

It’s much easier to outsource development of a specific project than to outsource the work of getting it integrated with the other systems you already have. This is why you need to know CORBA, SOAP, Web Services, MQ Series and whatever other technologies and middleware your company uses.

I’m finding that almost all my new projects have at least some level of effort associated with integration to other existing systems. Do what you can to move into those parts of the projects.

5. Learn a lot of technologies. Learn *base* technologies.

I had a friend in Engineering school who got a job with Ford after college and became a carburetor test engineer. He knew carburetors better than anyone! This, of course, did him absolutely no good when Ford switched everything over to fuel injection systems.

He was able to stay with them and switched into ‘metal forming’. He said, “I don’t care how many composite materials there are, they will *always* have metal in cars.”

Looking at your skills from this perspective, are you the computer-equivalent of a “carburetor test engineer”?

You need to learn as many different technologies as you can to make sure you’re safe if one is discontinued. You also need to learn *base* technologies, those that are used all over the place (like HTTP, XML, Web services, etc.).

Of course, nothing can make you safe from every possible circumstance. After all, as The Clash are often quoted as saying, “The future is unwritten”!

In the end, you are your own best safety net. Take control of your skill set and write your own future.

Uche Ogbuji

Related link: http://www.oreillynet.com/cs/user/view/cs_msg/15469

That article is an alpha-geek level article, to use Tim’s coinage. Well, guess what? Most programmers aren’t even geeks, period - they are just punching the clock. And that’s why complicated technologies fail, and that’s why the whole comparison game (like J2EE vs .NET) is essentially useless.

I don’t really think the above point of view is cynical (as the poster disclaimed a couple of times). I do believe that it points to all that is wrong in software development, which is far more important than SOAP vs CORBA or XML vs CSV. When we program computers, we are effectively constructing simulacra of the real world. The most egregious failures occur when our models diverge from the real world in unexpected ways. It takes an extraordinary amount of skill to model the world as we attempt to in our work, and I am skeptical that there will ever be a technology that comes along to make it a casual effort.

There has never been a shortage of products promising to make programming easy enough for any odd punter. From the early 4GL packages to PowerBuilder to Visual Basic in software, to the “Dummies” books and seemingly recession-proof programmer “bootcamps”, I am not exactly worried that Joe Codeloader suffers too much neglect. However, I have long argued that XML and even Web services are not technologies appropriate to his contingent. See here for my rant on the idea that an XML user would not understand the concept of a labelled tree. See here for my rant about boiling XML and WS down to glossy wizards. The former link is part of a relevant and interesting thread that’s nicely summarized here.

Even though I earned a formal Computer Engineering degree, I don’t at all think that every developer who meets my idea of basic standards should have one as well. I’ve been very privileged over the years to have worked with a variety of developers who have attained the highest level of craft and knowledge through individual study and effort. It takes some aptitude (and I am firmly convinced that aptitude plays a gigantic role in the craft of programming) and it takes a lot of motivation and hard work, but I really bristle at the idea that one should not expect much craft of programmers. That such an idea insults those who have put an extraordinary amount of work into learning how to program well matters less. What matters very much is that shoddy craft by some programmers brings the entire profession into disrepute.

The question comes up: “Well, how many developers of such a standard have you worked with in real life?” This question used to surprise me. Throughout my career, from the consulting I did to help pay college bills, through my first jobs upon graduation, to my overall progress as a professional consultant, I’ve almost always worked with programmers of the highest caliber. Very few of them have made a near-profession of commenting on technology, as I have, but of course loudness is no measure of competence. I have come to realize that I’m very lucky, and that my direct experience is an odd corner of the full picture.

I do hope that technologies such as XML, which are very useful in saving time and improving expressiveness for conscientious developers, do not get sabotaged by all the reflexive concessions to the less engaged.

Uche Ogbuji

Related link: http://lists.xml.org/archives/xml-dev/200302/msg00476.html

Rem acu tetigit Paul Prescod. In other words, he nailed it, as he so often does. I think merging header and body into a single document is the single biggest flaw in SOAP. Yes, SOAP section 5 (the RPC datatyping section) was probably the largest overall mistake in SOAP’s evolution (as even heavyweight SOAP boosters have started admitting now), but it is far less fundamental than SOAP’s monolithic design. I think the proper “fix” is to carry only the message as XML in the HTTP payload, and to move the SOAP-style headers into HTTP extension headers, using XML external parsed entities in the values.

Jacek Artymiak

Related link: http://www.avet.com.pl/pl/companynews.php?id=0

BSDay will take place in May 2003 in Warsaw, Poland.

Jacek Artymiak

A friend of mine deleted some system files while he was logged into his Mac OS X box as root. He didn’t have backup copies, and he did not want to reinstall the whole system. Off we went to Apple Support and did a quick search. I did not hold my breath for an easy solution, but we were lucky this time. Fortunately for my friend, the files he removed were a part of the April 2002 Security Update and they have not been changed by other Security Updates since then. Within seconds the April 2002 Security Update package was on the Desktop. Here’s what we did next:

  • Double-clicking the .dmg.bin file unpacked it onto the Desktop.
  • Another double-click mounted the .img file on the Desktop.
  • Next, we opened a new Terminal window and did:
    cd "/Volumes/Security Update April 2002/SecurityUpdateApr2002.pkg/Contents/Resources/"

  • Next, we had to unpack the archive with:
    sudo pax -rvzf SecurityUpdateApr2002.pax.gz

  • Now it was only a matter of copying missing files from the newly created directories to the directory where my friend had some fun with rm.
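That last step is where it is easiest to make things worse by overwriting a file that survived. A minimal Python sketch of a copy that only fills in what is missing might look like this, assuming the pax archive was unpacked into a tree that mirrors the system layout. The function name `copy_missing` is hypothetical, and this is not necessarily how we did it. It preserves timestamps and permission bits but not ownership, and it does not treat symlinks specially, so check the results before trusting a system to it.

```python
import os
import shutil


def copy_missing(src_root, dst_root):
    """Copy files that exist under src_root but not under dst_root.

    Existing destination files are never overwritten. Returns the sorted
    list of relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        for name in filenames:
            target = os.path.join(target_dir, name)
            if os.path.lexists(target):
                continue  # leave files that survived the rm alone
            os.makedirs(target_dir, exist_ok=True)
            # copy2 keeps modification times and permission bits, not owner/group
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Run from the directory where pax unpacked the archive, a call such as `copy_missing("usr", "/usr")` (hypothetical paths; writing to system directories needs root) would restore only the files that rm took away.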

I’m sure that Apple would not encourage you to play with Security Updates in that way, but it was an emergency. Do not count on such hacks to save you every time, but if you’re stuck, some Unix trickery can help. Now, go and do those backups you’ve been planning to do since last summer!

Have you ever used Security Updates in non-standard ways? Share your experience with others…

Uche Ogbuji

Related link: http://www.xs4all.nl/~irmen/comp/CORBA%20vs%20SOAP.html

This is a more in-depth comparison of SOAP to CORBA than the rant I blogged earlier. It is quite sharply biased against SOAP (I’ve never been much swayed by “eeew XML is so verbose” arguments). But I think many of its points are fundamentally sound. Mike Olson also ran some SOAP/XML-RPC/CORBA performance tests on Python, with remarkable results.

Uche Ogbuji

Related link: http://sg.sun.com/events/presentation/files/kmasia2002/Sun.KnowledgeMngmnt_FINAL…

This Sun white paper highlights a project that my Fourthought colleagues and I consulted on, using XML/RDF/DAML/OWL Semantic Web technologies to develop a repository of metadata from distributed business information. The repository has actually been launched and is being used within Sun, and it’s been a nice demonstration of the power of Semantic Web technologies in closed domains, whether or not the Semantic Web itself will ever be practical.

Uche Ogbuji

Related link: http://www.ietf.org/internet-drafts/draft-klyne-message-xml-00.txt

I’ve always been a proponent of RDF in reasonable circumstances, i.e. in closed systems where authorities are clear. But occasionally I come across a usage that makes me wonder whether it is madness, genius, or both.

The idea of encoding e-mail messages in XML is not new. Several years ago, Jonathan Borden developed XML Mail Transport Protocol (XMTP) for representation of RFC 822 and MIME in XML. I thought Jonathan might have also experimented with an RDF variation on this, but the relevant Web page is down right now, so I’m not sure.

Graham Klyne’s Internet Draft is much more comprehensive than XMTP. Though it’s in RDF/XML, it tries to keep the RDFisms as discreet as possible. Nevertheless, the question is whether the following works for an XML e-mail representation:

<emx:Message
    xmlns:emx='URN:ietf:params:email-xml:'
    xmlns:rfc822='URN:ietf:params:rfc822:'>
  <rfc822:from>
    <emx:Address>
      <emx:adrs>mailto:Christopher.Robin@GreenDoor.org</emx:adrs>
      <emx:name>Christopher Robin</emx:name>
    </emx:Address>
  </rfc822:from>
  <rfc822:to>
    <emx:Address>
      <emx:adrs>mailto:Pooh@PoohCorner.100Aker.org</emx:adrs>
      <emx:name>Winnie the Pooh</emx:name>
    </emx:Address>
  </rfc822:to>
  <rfc822:subject>Re: Woozle hunting</rfc822:subject>
  <emx:content type='text/plain'>
    You're the Best Bear in All the World
  </emx:content>
</emx:Message>

I’ll have to chew on it a bit to decide. “It’s too verbose” comments are too facile: remember that the aim here is machine accessibility more than human readability.
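To make the machine-accessibility point concrete, here is a minimal Python sketch that pulls the sender, recipient, subject and body out of a message shaped like the one above, using only the standard library’s ElementTree and the two namespace URIs from the example. The function name `summarize` is hypothetical, and a real consumer of the draft’s format would need to handle far more header fields (and multiple addresses) than this.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as they appear in the example message.
NS = {
    "emx": "URN:ietf:params:email-xml:",
    "rfc822": "URN:ietf:params:rfc822:",
}


def summarize(message_xml):
    """Pull sender, recipient, subject and body out of an emx:Message document."""
    root = ET.fromstring(message_xml)

    def text(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        # Only the first address is taken; real messages may list several.
        "from": text("rfc822:from/emx:Address/emx:adrs"),
        "to": text("rfc822:to/emx:Address/emx:adrs"),
        "subject": text("rfc822:subject"),
        "body": text("emx:content"),
    }
```

For the message above, `summarize(doc)["subject"]` gives `'Re: Woozle hunting'`; no RFC 822 header parsing is involved, only element lookups by namespace.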

chromatic

Related link: http://opensourcetesting.org/

A press release announcing OpenSourceTesting.org just crossed my desk. Their mission? To promote open tools within the test automation market. As the author of a couple of these tools myself, it’ll be nice to hear about businesses using them.

Schuyler Erle

Am I the only one who got spam from Microsoft this morning, advertising Windows Server 2003 and Visual Studio .NET? I’m confused about where they got my e-mail address — I haven’t used a Microsoft product since 1999, and I’ve only been working for O’Reilly since the beginning of 2001. Also, what’s this crap?

Please note that it can take up to eight weeks to update customer information in our database; therefore, you may receive e-mail from us within that time period.

Are they really that desperate for business now that they have to join the ranks of porn peddlers, penny stock hucksters, herbal Viagra purveyors, and the sons and widows of ex-Nigerian strongmen?

Did you get spammed by Microsoft, too?

David Sklar

Related link: http://www.iana.org/reports/af-report-08jan03.htm

In August 2001, the Taliban government of Afghanistan restricted Internet access in all of Afghanistan to one computer.

In September 2002, the Islamic Transitional Government of Afghanistan asked IANA for control of the .af ccTLD (with the United Nations Development Program assisting with technical infrastructure). In January 2003, that request was approved.

The existing Ministry of Communications web site isn’t too exciting, but they’re just getting started. The one area where Afghanistan may have trouble is making lots of money from English-speaking vanity domain seekers. http://meatlo.af is not so enticing.

Andy Oram

Related link: http://news.com.com/2100-1016-991622.html

A lot of brickbats are coming the way of SCO since it launched a lawsuit against IBM on the grounds of trade secrets. What’s scandalous is not the choice to resort to a lawsuit–because companies have to defend these sorts of things in court in order to preserve their meaning–but the disregard for the needs of Linux users, developers, vendors, and watchers everywhere. SCO chose a low road indeed, trying to maximize its legal flexibility instead of acting like a member of a community.

Linux supporters are worried about this for good reason. The lawsuit inevitably recalls the suit AT&T brought against one branch of BSD developers in 1992. Then as now, the issue was that developers had access to UNIX during the time they developed their own code. The AT&T complaint involved copyright rather than trade secrets, but the parallels are unmistakable.

Although my memory may deceive me, I believe AT&T never demonstrated that a single line of BSD code originated in UNIX (which officially should be written in all-caps). The lawsuit was resolved after many years, but a lot of people blame the confusion around the suit for the stagnation of BSD and its inability to take off at the crucial moment when people were looking for a free software operating system. (I doubt that the lawsuit was the problem, but it did waste time and make a mess of things.)

AT&T sold its rights to UNIX long ago, apparently recognizing that it was managing every aspect of that valuable technology with the same incompetence with which it had conducted the BSD lawsuit. As intellectual property, UNIX bounced around for a while and ended up at SCO. It’s probably no coincidence that SCO has decided to act the heavy at a time when many observers believe UNIX is dying and that Linux will take over where it stood.

But they know very well what problems and bad feelings the BSD lawsuit caused. They know how many people (roughly) depend on Linux day by day. What would a responsible company do to uphold its rights while allowing the world to continue?

SCO could have examined Linux code and determined where their purported trade secrets lay. They would then have widely publicized the disputed code. They’d say, “Don’t use JFS” (or whatever it happened to be); “we’re litigating it.” Whatever components were in dispute could quickly be pulled out of the kernel; users could depend on other components for whatever functionality they needed.

Of course, SCO’s lawyers wouldn’t tell them to do this. I’m sure the lawyers want as wide a field to play on as they can get. And it is not they who will be appalled when play is done and they discover the whole field has been turned into a desert.

SCO can still overrule its narrow-minded lawyers and take a high road. If they’ve got a claim, make it clearly. That is what the public deserves. Judging from the scattered news reports I’ve read, they refused to be specific even in the legal complaint they sent the court.

And this hand-waving is a tell-tale sign of weakness. We are all justified in assuming, till we have evidence to the contrary, that SCO’s lawsuit will go the way of the evidence the Bush administration waved about excitedly for months concerning aluminum tubes purchased by Iraq, now revealed by weapons inspectors on the ground to bear no relation to weapons of mass destruction. But millions of users around the world are in limbo until we know for sure, and there is no reason for that except malice or hamfistedness on the part of SCO.

What’s behind the lawsuit?

David Sklar

Ambitious plans are in the works for a vast digital library in Alexandria. Read all about it in the New York Times:

  • http://www.nytimes.com/2003/03/01/arts/01ALEX.html

Searching the unstructured, unedited web via Google or alltheweb.com has its virtues. I’ve solved plenty of “why doesn’t this compile” problems by pasting error messages into Google’s search box.

Many searches, however, require slogging through piles of nonsense. If I’m looking for accurate historical or political information, I’m always worried when I find a page that seems helpful but turns out to be written by some random interested freelancer. Maybe it’s accurate, or maybe the author is well known on soc.culture.freedonia as a notorious partisan of a fringe anti-Freedonian militant sect.

Digital information distribution projects that involve editors or librarians are interesting because they combine the reach of the web with the discerning filter of a human. Sure, wikis and FAQ-O-Matics have their place, but there are times when I want to be able to trust and verify the sources of information.

So what’s out there that fills this role in various topic areas? There’s a good source for O’Reilly books, but using the Internet to learn about the Internet is only the baby step of this kind of research. Here are a few things I’ve found:

  • JSTOR provides online access to many scholarly journals.
  • arXiv provides free online access to current physics, math, and CS research.
  • The New York Public Library has about 30,000 searchable digital images online, with plans for another 570,000 in the next few years.

What are your favorite structured/edited online databases?

    Related link: http://bioperl.org/pipermail/bioperl-l/

    Introduction


    As I seem to be volunteering for more and more BioPerl documentation
    jobs recently, I thought I’d pool my resources and recycle some of my
    tuits to write a list summary. Expect these to be sporadic and
    incomplete; my goal is to highlight important questions, changes,
    fixes, and proposals, not recapitulate all list traffic. I’ll try to
    include appropriate links to specific messages, or at least to the
    parent message. It’ll probably take me a while to get good at this, so
    please bear with me (and do send any suggestions).


    To play a bit of catch up, I’m now going to loosely summarize the
    entire month of January (leaving a few topics untouched that are
    better addressed in February). February’s summary will be ready soon,
    after which you’ll see more easily digestible weekly (or perhaps
    bi-weekly) summaries. I’ll also be posting the HTML-ized summaries on
    my O’Reilly weblog with active hyperlinks.


    One item from December 31 of 2002 bears mentioning: Ewan Birney
    released stable version 1.2, with significant new functionality, and
    important updates to code that makes use of NCBI web services;
    upgrading is highly recommended, although some of the January list
    activity reflects small trials and tribulations with this release.


     http://makeashorterlink.com/?S17521DA3




    Questions



    • Searching the mailing list archives

      This seemed like an appropriate topic to put at the top of my list.
      The Bioperl-l mailing list isn’t exactly as high-traffic as
      perl5-porters or the linux kernel mailing list, but it is a mixture
      of both deeply technical development issues and novice user
      questions. While the BioPerl tutorial and documentation are the
      first places one should look for answers, the second place must be
      the archives of the mailing list. Brian Osborne pointed out that
      “the Search box is hidden below the Thanks link at www.bioperl.org”.

      It wasn’t mentioned, but the “htdig” link Hilmar Lapp pointed out
      (which is also below the search box) does not actually index the
      bioperl mailing list, but seems to search all other OBF-affiliated
      lists (biojava, biopython, etc) …


       http://users.bioperl.org/htdig/

      Michal Kurowski pointed out that “the quickest way of accessing old
      postings seems to be a group archive from the mailman pages” and that
      “you can even download the whole thing and use it as a local mailbox”,
      which happens to be very useful if you want to write list summaries.
      Mailman archives are at:


       http://bioperl.org/pipermail/bioperl-l/


    • Bioperl 1.2 builds under cygwin

      John Nash reports that he was able to build the 1.2 distribution
      under cygwin once MakeMaker issues were overcome (in his case by
      upgrading to perl 5.8.0). Other tips are provided:

       http://makeashorterlink.com/?S23631DA3
       http://makeashorterlink.com/?M27643DA3


    • Getting/untarring the 1.2 distribution

      Some people had trouble either FTPing the 1.2 distribution, or with
      successfully untarring the tarball. These problems seem to have
      resolved themselves, and may have been related to router issues at
      the server. For the record, bioperl-1.2 can be found at:

       http://www.bioperl.org/ftp/DIST/bioperl-1.2.tar.gz


    • man pages with bioperl-1.2

      People may have noticed that the “make” process for bioperl-1.2
      neither generates nor installs man pages. Ewan Birney explains, “In 1.2
      we had to drop the manifyfication stage of the makefile because it was
      triggering a line-too-long error on some OSs due to shell
      constraints”. If you wish to get them back, comment out (or delete)
      the MY::manifypods sub in Makefile.PL.

       http://makeashorterlink.com/?F10761DA3


    • Converting ABI trace to Phred format

      When asked why an ABI trace file read via SeqIO::abi didn’t generate a
      Bio::Seq::SeqWithQuality (a sequence with associated quality values),
      Aaron Mackey replied, “I’m not sure why abi.pm in the bioperl
      distribution doesn’t set it’s sequence factory to SeqWithQuality”; I’m
      still not sure why. See the fix at:

       http://makeashorterlink.com/?H2C954DA3


    • biocorba status

      When asked about the status of the biocorba project, Jason Stajich
      replied, “We have working bindings in java,perl,python and bridges to
      the respective Bio* toolkits from these bindings for servers and
      clients based on a slightly modified BSANE IDL spec from OMG”. He
      qualified that statement with “none of the original developers are
      using it in any of their work so development and final rounds of
      testing have not really happened”.

       http://makeashorterlink.com/?G57A42DA3


    • DNA Smith-Waterman

      Yee Man has reimplemented the classic Smith-Waterman algorithm, with
      algorithmic improvements as suggested by Gotoh (affine gaps) and Myers
      & Miller (linear space), and wondered whether it would be a good
      addition to the BioPerl C-coded extension library (which currently
      contains a protein-only Smith-Waterman implementation by Ewan Birney,
      pSW.pm). Some discussion about classic (and novel) dynamic
      programming algorithms ensued, which eventually boiled down to a
      desire to have the generic (but extremely fast) Smith-Waterman code
      (written by Webb Miller) used by Bill Pearson’s SSEARCH implementation
      made more widely available as a linkable C library (which BioPerl
      could then subsume). Interested parties should contact me.
      Relatedly, to answer one of our FAQs yet again, if you currently want
      to do Smith-Waterman on DNA sequences, you should use BioPerl’s
      bindings to the EMBOSS suite of sequence utilities.

       http://makeashorterlink.com/?Y20B21DA3
       http://makeashorterlink.com/?M12B23DA3


    • using AUTOLOAD for get/set accessors

      The BioPerl code is full of explicitly coded accessor methods; often
      we are asked why we don’t use more code-efficient methods of
      autogenerating these identical functions (via AUTOLOAD or
      Class::MakeMethods). The discussion is wide-ranging, but it boils down
      to wanting every accessor to have the same functionality with respect
      to undef values and return value behavior, as dictated by our accessor
      “boilerplate” (which we kindly ask everyone to use). Yes, we know we
      can achieve that via sophisticated Class::MakeMethods usage, but we
      have bigger fish to fry at the moment. There’s another, subtler
      issue about interfaces and implementation method introspection, but
      I’ll leave that to a later discussion.

       http://makeashorterlink.com/?Z22C32DA3


    • Bio::Seq no longer a RangeI (bug in Bio::Graphics::Panel)

      Much to the consternation of Lincoln Stein (and his legions of
      Bio::Graphics users), BioPerl 1.2 introduced a change to Bio::Seq in
      that it no longer complies with the Bio::RangeI interface; see Heikki
      Lehvaslaiho’s “This has to be cruft!” message from November:

       http://makeashorterlink.com/?Q53C41DA3

      Unfortunately, Bio::Graphics::Panel relied on Bio::Seq having a
      “start” method, so lots of existing code broke. A number of fixes
      were recommended, including a) using a Bio::Seq::SeqFactory to
      generate Bio::LocatableSeq’s (which do implement RangeI methods), b)
      patching your Bio::Graphics::Panel and c) upgrading BioPerl 1.2 to
      the live CVS development version. A BioPerl 1.2.1 is forthcoming for
      this, and other reasons.


       http://makeashorterlink.com/?R17C61DA3
       http://makeashorterlink.com/?S3AC21DA3


    • complement(join(e1, e2)) vs. join(complement(e1), complement(e2))

      Periodically, people ask, “Is it possible to have bioperl output
      features in Genbank format of the form
      ‘complement(join(1..50,60..100))’ rather than
      ‘join(complement(1..50),complement(60..100))’?” This time it
      degenerated a little into a discussion about whether these two
      representations were semantically equivalent (short answer: yes). The
      answer to the original question is that BioPerl parses either
      representation into the same structure, which can only be “dumped” in
      one representation (presently, the latter).

       http://makeashorterlink.com/?H2CC16DA3


    • GenBank bond() FT operator

      Recent GenBank files have begun to exhibit a new feature location
      operator, “bond”, to identify disulfide bonds in proteins and mRNA
      splice sites in RefSeq sequences. BioPerl has no concept of this
      location operator (which is really more of a feature, and would
      be better represented as a /bond feature table entry), and so
      currently dies when parsing a record containing it. A brute force
      fix is provided, but a better answer is yet to appear:

       http://makeashorterlink.com/?L24D12DA3

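On the “download the whole thing” suggestion from the archive-searching item: once a month’s archive is saved locally, Python’s standard `mailbox` module is enough to search it. This minimal sketch assumes the download parses as an ordinary mbox file; the function names are hypothetical, it only looks at text/plain parts, and the Latin-1 fallback for undeclared 8-bit text is a guess.

```python
import mailbox


def message_text(msg):
    """Concatenate the decoded text/plain parts of an email message."""
    chunks = []
    for part in msg.walk():  # walk() yields the message itself if not multipart
        if part.get_content_type() != "text/plain":
            continue
        data = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"  # guess for undeclared text
        try:
            chunks.append(data.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            chunks.append(data.decode("latin-1"))
    return "\n".join(chunks)


def search_archive(mbox_path, needle):
    """Return the subjects of messages whose subject or body contains needle."""
    needle = needle.lower()
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            subject = str(msg.get("Subject", ""))
            if needle in subject.lower() or needle in message_text(msg).lower():
                hits.append(subject)
    finally:
        box.close()
    return hits
```

The search is a case-insensitive substring match, which is crude, but it is often all a list summary needs.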

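For anyone who still suspects a damaged copy of bioperl-1.2.tar.gz, one quick check before unpacking is to read every member of the tarball to the end: a truncated or corrupted gzip stream then shows up as an exception instead of a half-extracted tree. A minimal Python sketch using only the standard library follows; the function name is hypothetical, and it only detects damage, it cannot tell you whether the file matches what the server published.

```python
import tarfile
import zlib


def tarball_is_complete(path):
    """Return True if every member of a gzipped tarball reads back cleanly."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    data = tar.extractfile(member)
                    while data.read(1 << 16):
                        pass  # read and discard; only errors matter here
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        # TarError: bad headers or short member data; EOFError: gzip stream
        # cut short; OSError: bad gzip header or missing file; zlib.error: corrupt data.
        return False
```

`tarball_is_complete("bioperl-1.2.tar.gz")` should return False for a download that was cut short partway through.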

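For readers who have not met the Gotoh refinement mentioned in the Smith-Waterman item, here is a minimal Python sketch of score-only Smith-Waterman with affine gaps. It is not Yee Man’s code, Ewan Birney’s pSW, or the Webb Miller routine used by SSEARCH, and it is far slower than any of them; the scoring parameters are arbitrary illustrative values. Keeping only two rows gives linear memory for the score, but recovering the alignment itself in linear space is what the Myers & Miller divide-and-conquer step adds, and this sketch omits it.

```python
def sw_affine_score(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score of a vs b with affine (Gotoh) gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Two rolling rows keep memory at O(len(b)); only the score is returned.
    """
    n = len(b)
    neg_inf = float("-inf")
    prev_h = [0] * (n + 1)        # H: best score of a local alignment ending at (i-1, j)
    prev_f = [neg_inf] * (n + 1)  # F: best score ending with a gap in b (vertical move)
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (n + 1)
        cur_f = [neg_inf] * (n + 1)
        e = neg_inf               # E: best score ending with a gap in a (horizontal move)
        for j in range(1, n + 1):
            e = max(e - gap_extend, cur_h[j - 1] - gap_open)
            cur_f[j] = max(prev_f[j] - gap_extend, prev_h[j] - gap_open)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur_h[j] = max(0, diag, e, cur_f[j])  # 0 lets a local alignment restart
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best
```

With these parameters, aligning AAAACCCC against AAAAGGCCCC scores 12: eight matches (16) minus one length-2 gap (3 + 1), which beats either ungapped half on its own.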

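The AUTOLOAD discussion is about Perl, but the trade-off it describes is easy to see in any language. Below is a hypothetical Python analogue: one factory produces combined get/set methods that all behave identically (calling with an argument stores it, even if it is None, the closest thing to Perl’s undef; calling without one returns the stored value). It mirrors the spirit of a shared accessor boilerplate, not BioPerl’s actual code, and the names `make_accessor` and `Feature` are invented for illustration.

```python
_MISSING = object()  # sentinel so that None can be stored explicitly


def make_accessor(field):
    """Build a combined getter/setter that keeps its value in self._<field>."""
    attr = "_" + field

    def accessor(self, value=_MISSING):
        if value is not _MISSING:
            setattr(self, attr, value)    # setting always wins, even for None
        return getattr(self, attr, None)  # never-set fields read back as None

    accessor.__name__ = field
    accessor.__doc__ = "Get or set %s." % field
    return accessor


class Feature:
    # Every accessor shares identical semantics because one factory makes them all.
    start = make_accessor("start")
    end = make_accessor("end")
    strand = make_accessor("strand")
```

The uniformity is the point here, not the saved keystrokes: whichever way the methods are generated, callers can rely on every accessor treating undefined values and return values the same way.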

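Why the two location strings in the complement/join item can describe the same feature is easiest to see on the sequence itself. In this minimal Python sketch (all helper names are hypothetical), complement(join(1..50,60..100)) means: splice the two segments, then reverse-complement the result. Because reversing a concatenation also reverses the order of its pieces, the same bases come out of complementing each segment separately and joining them starting from the downstream segment (60..100 before 1..50).

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT characters are not complemented)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(seq, start, end):
    """Slice using GenBank-style 1-based, inclusive coordinates."""
    return seq[start - 1:end]


def complement_of_join(seq, segments):
    """complement(join(s1..e1, s2..e2, ...)): splice first, then reverse-complement."""
    return revcomp("".join(extract(seq, s, e) for s, e in segments))


def join_of_complements(seq, segments):
    """join(complement(s1..e1), complement(s2..e2), ...), segments in the order given."""
    return "".join(revcomp(extract(seq, s, e)) for s, e in segments)
```

For any sequence of at least 100 bases, `complement_of_join(seq, [(1, 50), (60, 100)])` equals `join_of_complements(seq, [(60, 100), (1, 50)])`.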
    Changes/Additions