January 2005 Archives

brian d foy

Computer power is all about monitor size. I remember moving aside a 13-inch monitor and hooking up a 21-inch monitor to my Quadra 650. It felt like my computer was 4 times faster.

Now I have a 17-inch external display (has anyone else noticed that Apple uses “display” and everyone else seems to use “monitor”?) that I picked up from CompUSA on a special promotion: they are selling the Benq FP731 with a $40 off in-store incentive and $80 of manufacturer rebates for an end total of about $180. I checked the reviews of the monitor: it didn’t get bad reviews, so for the price I figure I could take a shot at getting burned.

Now I have this nice and bright monitor putting my Powerbook display to shame. It’s the difference between washing your whites with your colored clothes, then having someone who knows something about laundry come along and wash the whites separately. The Benq is bright: 260 nits bright. I don’t know what a nit is, but this display has 260 of them and it makes my Powerbook whites look dingy grey. I even checked to ensure the cats hadn’t been stepping on the brightness keys (it’s that and Num Lock that they always seem to hit).

Curious things emerged when I added the display. The Powerbook detected the display immediately and things just worked. I had put the display to my left and arranged the displays in the control panel to match their physical arrangement. When I did that, I couldn’t get to my Dock anymore! I like the Dock on the left side. Now when my cursor wandered that way it kept wandering onto the other display. It was a bit confusing because my mind told me that I wanted to get to another application, and my hand did the right movements, but my cursor ended up on the other display while I was trying to remember what I was doing.

I re-arranged the displays so the external one was “under” the Powerbook’s. I think I have the going down stuff in my head, but from the external display I move to the right thinking I’ll end up on the Powerbook display. I end up activating Exposé since I use the hot corners for that. Exposé doesn’t have much to do on the external display, but all the windows on the Powerbook display move around.

I’m curious what will happen as I get used to this arrangement, then leave the external display behind as I travel. Will I keep trying to make the cursor fall off the bottom of the screen?

Still, despite my cognitive obstacles, my computer all of a sudden feels more powerful. I don’t have as many overlapping windows because I have more real estate to deal with. If anyone wants to donate one of the large Cinerama displays, I can report back on whether the perceived increase in power is linear or exponential.

Ming Chow

Related link: http://www.cs.tufts.edu/~mchow/excollege

Back in December, I announced that I am teaching a course entitled “Security, Privacy, and Politics in the Computer Age” offered by the Experimental College at Tufts University. The course is open to all Tufts undergraduate students, regardless of area of study. This coming week will be my second full week of class. Here are a few news and notes about my experiences so far:

  • I am exceptionally pleased with how things are going. The students and the responses that I have received are tremendous.
  • I was very worried on my first day of class about enrollment: only six students showed up. An Experimental College class must have a minimum of eight students or else it will be canceled. The following week, 15 students showed up and officially registered for my class, which was tremendous. One problem was that many students had not returned to campus for the first day of classes. Another major factor in the significant increase was word-of-mouth advertisement. What a difference a week makes!
  • Many students were afraid that my course would be too technical or “that you needed to be good with computers.” Of course, that is not the premise of my class, and I alleviated all students’ fears by saying that outright in class.
  • There are only a few students that have some technical knowledge, which works to my advantage, considering that was the intended audience for my course. Many students are majoring in a humanity or a social science (e.g. English, Psychology, Economics, International Relations).
  • Students said they were interested in my course because “they want to know more about computers” and many also recognized that computer security threats are growing.
  • I asked a series of entertaining and preliminary questions to the class. See news item 0004 on the News section of my course website. In short, all students have used Windows and Macs. Only a handful (about 5) students have worked with UNIX or Linux. All students have received a computer virus or some kind of malware in their lifetime. Finally, all students have different personal privacy preferences.
  • My first full lecture was on the basic software development life-cycle. I had a very engaging activity in class where I divided the class into two groups (developers and Quality Assurance) and one CEO. I spoke very little about programming and programming languages, but it seems that the students have a good idea of the software development methodology.
  • My last lecture was a bit more subtle. I discussed proprietary vs. free vs. open source software. One problem I encountered is that many students were not aware of open source software and “what does it mean.” Many students said they were accustomed to popular packages such as Adobe Photoshop, Microsoft Office, and AOL Instant Messenger, and many did not know that there were alternatives to those popular packages. Many students initially did not even know what source code was.
  • I have already assigned some homework. My homework assignments are very straightforward. I have already graded my first set of homework, and I was very pleased with what the students did. See my first homework assignment on the Assignments section of my course website. In general, many students were honest with their answers (e.g. they didn’t write things that they didn’t know), and they used their common sense.

Jacek Artymiak

Related link: http://www.devguide.net/books/openbsdfw-01-ed/index.htm

I’m close to finishing my first book for O’Reilly. Yes, it is about firewalls. It is a continuation of BFWOAP2, which is a continuation of the original BFWOAP.
Someone suggested I should make BFWOAP2 available in PDF, which I’m a little reluctant to do. But another person suggested that a PDF version of BFWOAP might be a good teaser for those considering a purchase of BFWOAP2 or the O’Reilly book.

OK, you want it, you got it. It’s not free, but it is a PDF without DRM. What? No DRM? Yes, I trust you.

Jono Bacon

A little while back, I decided I wanted to learn Python. Knowing some local Python fanatics, I was not exactly uneducated in the relative merits of the language, and although I mainly used C++ and Qt for GUI programming, I was increasingly wanting to write some GNOME software with Python. Although Qt made GUI programming easier, it seemed to me that Qt was mainly suited towards writing huge, complex tools; the kind of tools that Trolltech showcase on their website. I don’t deny that Qt is an incredible product, but I was keen on writing software more easily.

I decided to make the move over to Python and started reading Dive Into Python. Well, let me be completely straight here - I only really read a few chapters of the book before I was eager to get on and just write code. So, I trawled the Internet and read a few bits of code, and before I knew it I was writing my own little programs. At this primitive point in my learning, I figured that I should really set myself a test program to write so that I can at least work towards a project. In my experience I had tended to abandon many of these little test programs, but I was determined to write something of use. The more I read and learned about Python, it seemed increasingly easy to use, and I thought, what the heck, I will be adventurous. Right…

The project I decided to work on was, in retrospect, remarkably adventurous. I, like many other gadget freaks across the world, own an iRiver. These little MP3/OGG players are renowned for their capabilities, and are well liked in the Open Source world for their ease of use with Linux. Hooking one of these players up to Linux involves two steps. Firstly, you plug it into your USB port, and if you are running a distribution with Project Utopia, a little window will pop up to display the contents of the iRiver’s hard disk. You can now drag over your songs. After you have done this, the second step is to update the database on the iRiver. To do this you can use a special little command line program called iRipDB that has been written to generate an iRiver database file. This database basically contains details of the tracks, song length, genres and other information that can make the iRiver more pleasurable to use. Although these two steps are fairly simple, I was keen for the database generation on the iRiver to be as simple as possible. So, in my infinite wisdom of knowing Python for around four hours, I decided I would write a GUI front end to iRipDB called GNOME iRiver.

At this point, I faced some distinct challenges:

  • I needed to figure out GUI programming - I had never done any GUI programming with Python. I knew I needed to use PyGTK, but how so? Also, I was utterly uneducated in how Glade fitted into the picture.
  • I needed to figure out how to interact with the iRiver - how would my little program talk to the iRiver? Was this simple or did it involve some freaky Python bongo?
  • I needed to figure out how to run iRipDB from my program - was this a tough challenge? I suspected there would be a Python command to run a separate process, but how complex was this?
  • Oh, and I still needed to learn Python itself. Best not forget that one.
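
For the third challenge, running a separate process, here is a minimal sketch of how that looks with the standard `subprocess` module in current Python (the code written in 2005 would have used older calls). The helper name `run_tool` is made up for illustration, and the command shown is a stand-in: iRipDB's real arguments are not given in this post, so none are assumed here.

```python
import subprocess
import sys


def run_tool(argv, timeout=60):
    """Run an external command and return (exit_code, stdout, stderr).

    `run_tool` is a hypothetical helper name, not part of GNOME iRiver.
    """
    result = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # give up if the tool hangs
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in command: the current Python interpreter printing one line.
# A real front end would pass the database tool's own command line here.
code, out, err = run_tool([sys.executable, "-c", "print('database updated')"])
```

A GUI front end would typically check `code` and show `err` to the user when the tool fails, rather than assuming success.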

It seemed the odds were stacked against me, but I pushed on. The first step was to understand how to create a GUI program. In the C++ Qt programming world, this involves creating your user interface in Qt Designer and then using some C++ black magic to re-implement the generated class so you can write slots that hook up to the signals emitted by your interface. Although certainly usable, this way of working is not exactly easy for the novice programmer, and I suspected Glade had a similar way of working.

Not so. In the Glade scheme of things, it seems that I could create my interfaces in Glade and then use a tool called simple-glade-codegen.py to process the .glade file and spit out a file in which I could write my Python code. The script would generate all the function frosting that was required when creating connections between signals and functions in Glade. This in itself got me up and running in no time; I had a GUI program running in 5 minutes.

Dealing with the hardware was my biggest fear. Whenever I have considered programming with hardware on a Linux system before, I have always assumed it to be a hugely complex maze of code. I am pleased to report that times have changed substantially. The recent work going on in the HAL and DBUS camps has given us the ability to easily connect to the HAL daemon and find out properties about the hardware on the system. These properties include all detectable information, as well as the ability to augment this information with special device information files.

With Python support available for HAL/DBUS, I plodded on to understand how to use the technology. Unfortunately, there was virtually no documentation on how to use HAL/DBUS with Python, and the only documentation was the HAL Specification, which is written in a language that assumes you already know everything about HAL anyway. I did skim through the spec, but it was pretty much lost on me, being the numpty programmer that I am. I subsequently turned to the HAL mailing list, which is populated by some notable hackers, each of whom is very responsive. I first posted to see where to begin, and the resultant information, combined with the code from the HAL Device Manager, gave me the opportunity to get started. I sent some later posts asking for some more details, but the real help came from David Zeuthen on IRC. David walked me through the process of searching for specific devices. I plan on writing up this process sometime soon for the benefit of others.

With the hardware support working in literally a few days, I was utterly impressed how simple it was to deal with hardware with HAL. I then cracked on to implement the last remaining bits and pieces to actually make the program work. After a few mails from people who had read about my progress with GNOME iRiver on my blog, I decided to release an early (but buggy) version to see what these people thought. They reported success.

Astute readers of my previous work will know that I place a lot of importance in simplicity. This is part of the reason I have made certain technical decisions in my work, and this is the reason why I have written quite a lot about the importance of simplicity and usability. With these beliefs in mind, I see no reason why simplicity and usability cannot extend to the programmer. Although programmers provide a translation layer to convert between the technicality of code and the importance of functionality and usability, this translation can be made easier if simplicity is engineered in the programming side of the wall.

This experience has demonstrated some important principles in development for me. Firstly, the ease of use of Python, and the fact I don’t need to worry about many of the pure mechanics of coding (such as declaring types, memory management etc) lowers the bar for people to get involved in coding. I have never considered myself a fantastic programmer, and I have always had to work to understand concepts that natural programmers can execute in their sleep. Although I feel I understand the program design/usability side of the fence better than the programming side, Python has managed to make this translation process easier.

Simplicity breeds simplicity. If a culture of simplicity is engineered into the nuts and bolts of software authoring, it is likely that the core values of simplicity and usability will transcend to each new layer in the software stack. Take a look at GNOME for example. In recent years, a lot of emphasis has been placed in engineering good usability into the platform. This culture has been propagated by programmers such as Seth Nickell, Nat Friedman, Miguel de Icaza, Joe Shaw, Jeff Waugh and others. What we are seeing now is a definitive peer requirement that usability is well considered at every step of the development process.

This same kind of concept appears to be implemented inside the Python community. With the core Python language setting the bar for ease of use, each of the peripheral language additions and tools has a clearly defined level of ease of use to match. We can see this in how HAL/DBUS are implemented, how PyGTK works, and how it all works with Glade. It would be pointless if Python were an easy to use language, but each of the tools built around it involved baffling levels of complexity.

It is great to have experienced such a simple and effective programming environment. With recent developments such as XUL, Mono, Ruby and other languages, a fundamental focus seems to be placed on the availability of quality high level development tools. I remember some years back hearing a barrage of snobbery about higher-level-than-C languages, and I am pleased to see this generic view has shifted. By making toolsets such as the one I have discussed in the article available, we are opening up the game to a lot more people. Sure, you may not want OpenOffice.org written in Python, but coding is not as black and white as small and large applications.

Have you had a similar experience? Should the bar be lowered? Share your views here…

Kevin Shockey

Related link: http://www.medsphere.com/media/press/20050125.rbw

At this year’s Open Source Business Conference, Geoffrey Moore will pronounce that open source has crossed the chasm. Since I haven’t had the privilege of hearing what he will use as proof of this conclusion, I’ll reserve my judgment. I hope Doug Kaye over at IT Conversations covers the OSBC this year just in case I can’t make it. Recall that technologies begin in the early market, but must cross a chasm before joining the mainstream market and receiving widespread acceptance.

I will share something I’ve been watching closely for the last year. I’ve been on the alert for any venture capital deal that involves a company related to open source. A few that you might already be aware of are JBoss, MySQL, and GlueCode. The latest to join their ranks is Medsphere Systems Corporation. My belief is that tracking the flow of venture capital money into these companies will signal how well open source business models are doing. The obvious connection follows: the more money that flows into open source projects and open source communities, the quicker open source will enter the mainstream market.

Medsphere enhanced the Department of Veterans Affairs’ (VA’s) highly acclaimed, open source VistA EHR to develop Medsphere OpenVista for the commercial market. Over 15,000 physicians and 56,000 nurses in more than 1,300 healthcare organizations, including 160 medical centers and 850 clinics, are currently using VistA.

The apparent business model for Medsphere seems to be fairly tried and true. It is based on the delivery of services. Their current service offering includes deployment, training, support, and custom development. It is not clear whether their software is still open source, only that it was based on the original VistA. I didn’t find enough information about the software to reach any conclusion.

Note: There is a WorldVistA project hosted at SourceForge.net, and an OpenVista project as well (no files there, though). I do not know what relationship the Medsphere software has with the WorldVistA project.

Hear of any other deals lately?

Uche Ogbuji

Peter Sefton’s “Hacking Open Office” comes at an energetic time for OOo. I’ve been pointed to ooo2dbk, a tool for generating DocBook XML from OpenOffice.org documents, which opens up even more XML possibilities. Let me also mention my complementary article to Sefton’s, which covers some additional topics such as XML Catalogs.

Jacek Artymiak

I’m thinking of starting my own podcast. I have a book to finish first, but when I’m done I’d like to set up my own little podcast corner studio and have a go at it, see what happens. Yesterday, feeling a little dizzy after all the medication I’ve been taking recently, I gave up on editing my writing and went online to check out what I might need to start. My shopping list includes the following:

  • a microphone — that’s easy, I’m going for an inexpensive Shure stick I found online for a reasonable price.
  • a microphone stand — my local music gear dealer will make a few zlotys on that one.
  • a mixer — I’m going to try the inexpensive Behringer gear with two microphone inputs. Can’t find them in the local stores, so I’ll order one online.
  • cables — damn, more cables… my wife won’t be happy when she sees more cable around the apartment.
  • headphones — I already have a pair of light studio headphones.

I will also need some sound effects, background music, and loops. I can get by with free stuff found on the Internet, but I really like the commercial sound effects libraries.

And I’ll need a host with plenty of disk space and no monthly transfer limit. For a reasonable amount of money.

So, summing up, the people who will make money on podcasting will be: manufacturers of MP3 players, microphones, mixers, cables, headphones, and all other audio gear, audio software (not sure about that one, Audition is free), ISPs, hosting providers, sound effects and music libraries, and possibly sound engineers and studios who may want to make extra money helping podcasters achieve broadcast quality. Oh, yes, publishers are no doubt working on podcasting books as I type.

But what about the podcasters themselves? One idea would be to sell past shows for download money or publish them on CDs. But that is so old and tried. And it doesn’t work that well. Maybe a better idea would be to sell subscriptions to the new shows and give the old ones away? Or combine both approaches and provide a small window of free download opportunity? Who knows. Oh, yeah, the most popular shows will be able to sell ad space, but that’s still a long way ahead. Podcasting is still a geek toy; it may be a couple of years before ordinary people discover the joys of audio without pre-programmed music, created by people who talk about the things other people actually want to hear about. But it will happen. Cool!

Derek Sivers

Wouldn’t it be nice if we would just tell the customer, at time of purchase, “We’ll ship this thing you’ve ordered at the lowest rate possible, and bill you exactly what it turns out to cost.” But nooooo…. people want you to PREDICT what it’s going to cost!

RIGHT NOW : CD Baby only sells CDs, only from one warehouse, so we can predict the shipping cost pretty easily.

BUT SOON : CD Baby will have multiple warehouses in different countries *and* allow some musicians to ship items directly from them to the customer *and* allow the customer to split up their order to have some items sent to multiple addresses.

SO… guess what we have to know now?

#1 - what country the customer is having this item shipped to

#2 - warehouse_stock for this item - (to know closest warehouse that has it)

#3 - what shipping methods are allowed from that warehouse to their country (fedex, usps, etc)

#4 - cost to ship that item from that warehouse by that method to their country

#5 - which of those shipping options (from #3) the customer chooses

#6 - what other warehouse items are in order, so for example we can give discount if 50 different items going from one warehouse in one shipment

#7 - how much of a discount to give, for that. (or, what the ship-cost for that many items is)

Haven’t written the code, yet, to calculate all of this. It’s a little daunting.
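
Here is one rough way the seven inputs above could fit together, as a sketch only. Every name, warehouse, rate, and stock figure below is invented for illustration, not CD Baby data. It also assumes the whole order ships from a single warehouse that stocks every item, and it models the multi-item discount (#6 and #7) as a cheaper "each additional item" rate. It does not try to pick the closest warehouse; it just lists options cheapest first so the customer can choose (#5).

```python
from decimal import Decimal

# Hypothetical stock per item per warehouse (#2).
WAREHOUSE_STOCK = {
    "cd-101": {"portland": 12, "london": 3},
    "cd-202": {"portland": 5},
}

# Hypothetical rate table (#3, #4, #7):
# (warehouse, destination country) -> method -> (first item, each additional item)
RATES = {
    ("portland", "US"): {
        "usps": (Decimal("2.25"), Decimal("1.00")),
        "fedex": (Decimal("9.00"), Decimal("2.00")),
    },
    ("portland", "CH"): {"usps": (Decimal("6.00"), Decimal("2.50"))},
    ("london", "CH"): {"royalmail": (Decimal("4.00"), Decimal("1.50"))},
}


def shipping_options(item_ids, country):
    """Return [(warehouse, method, cost)], cheapest first.

    Only warehouses that have every requested item in stock are considered.
    """
    candidates = None
    for item_id in item_ids:
        stocked = {
            warehouse
            for warehouse, qty in WAREHOUSE_STOCK.get(item_id, {}).items()
            if qty > 0
        }
        candidates = stocked if candidates is None else candidates & stocked

    options = []
    for warehouse in sorted(candidates or ()):
        methods = RATES.get((warehouse, country), {})
        for method, (first, additional) in methods.items():
            cost = first + additional * (len(item_ids) - 1)
            options.append((warehouse, method, cost))
    return sorted(options, key=lambda option: option[2])


# One CD going to Switzerland (#1): two warehouses can serve it.
quote = shipping_options(["cd-101"], "CH")

# Two CDs going to a US address: only one warehouse stocks both.
combined = shipping_options(["cd-101", "cd-202"], "US")
```

Split orders (some items from one warehouse, some from another, or to several addresses) would need the order broken into groups first, with this kind of lookup run once per group.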

Been there? Done that?

Derek Sivers

In 1.2 million CDs sold, I had never thought of “shipments” as a separate table. I always thought of it as attributes of an order, but if we’re going to be sending things from multiple locations, we’ll have multiple shipments for an order, each shipment with its own attributes, so…

SITUATIONS:
* - some items in an order backordered, so they’re shipped later
* - FedEx as main shipping method, but one backordered CD sent USPS later
* - one CD sent from Japan, one CD sent from Canada, both to a person in Switzerland
* - customer orders only one “merch” item, which the musician sends themselves: “shipment” is what that merchant tells us it is
* - some items in an order sent from our warehouse, some from self-ship merchant

new idea: make a table to keep track of shipments. a shipment has its attributes that need to be kept track of, and the lineitems in an order just link to this shipment.

This kinda turns our existing internal model upside-down, but the more I think about it, I realize it makes a lot of sense. I love it when you come across things like this that make you look at your system in a whole new way.


CREATE TABLE shipments (
id serial PRIMARY KEY,
warehouse_id int REFERENCES warehouses(id) ON DELETE RESTRICT,
address_id int not null REFERENCES addresses(id) ON DELETE RESTRICT,
date_shipped timestamp(0) with time zone,
shipped_by varchar(8),
ship_method_id int REFERENCES ship_methods(id) ON DELETE RESTRICT,
tracking text
);


CREATE TABLE lineitems (
id serial PRIMARY KEY,
inv_id int not null REFERENCES invoices(id),
item_id int not null REFERENCES items(id),
address_id int REFERENCES addresses(id),
shipment_id int REFERENCES shipments(id),
linestatus int not null REFERENCES line_status(id) default '1',
quantity int not null default '1',
currency char(3) not null default 'USD',
price numeric(7,2) not null,
wholesale numeric(7,2) not null,
shipcost numeric(6,2),
soundscanned date
);
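
To see how lineitems linking to shipments plays out, here is a small sketch using Python's built-in `sqlite3` module. The tables are cut down to a few columns and use SQLite types, so this is not the PostgreSQL schema above, and the warehouse names, shipping methods, and invoice number are made up. It models the "one CD from Japan, one from Canada" situation: one invoice, two shipments, each with its own status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (
    id INTEGER PRIMARY KEY,
    warehouse TEXT,        -- simplified: a name instead of warehouse_id
    ship_method TEXT,      -- simplified: a name instead of ship_method_id
    date_shipped TEXT      -- NULL until the shipment actually leaves
);
CREATE TABLE lineitems (
    id INTEGER PRIMARY KEY,
    inv_id INTEGER NOT NULL,
    item TEXT NOT NULL,
    shipment_id INTEGER REFERENCES shipments(id)
);
""")

# Two shipments for the same order: one already sent, one still pending.
conn.executemany(
    "INSERT INTO shipments (id, warehouse, ship_method, date_shipped) "
    "VALUES (?, ?, ?, ?)",
    [(1, "japan", "ems", "2005-01-20"), (2, "canada", "post", None)],
)
conn.executemany(
    "INSERT INTO lineitems (inv_id, item, shipment_id) VALUES (?, ?, ?)",
    [(500, "cd-a", 1), (500, "cd-b", 2), (500, "cd-c", 2)],
)

# Per-shipment summary for invoice 500: item count and whether it has shipped.
rows = conn.execute(
    """
    SELECT s.id, s.warehouse, COUNT(l.id), s.date_shipped IS NOT NULL
    FROM shipments s
    JOIN lineitems l ON l.shipment_id = s.id
    WHERE l.inv_id = ?
    GROUP BY s.id
    ORDER BY s.id
    """,
    (500,),
).fetchall()
```

The point of the design shows up in the query: shipping status is asked of the shipment, and the order's status is whatever its shipments add up to.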

Seem sane?

Derek Sivers

Sometimes in my blog, here, I won’t have time to write a full entertaining narrational “article” about something, so I’ll just quickly paste in some thoughts that you may find useful if dealing with similar problems or situations on your end.

Here’s one: GIFT CERTIFICATES
I used to think of them as just items in a cart, with a negative balance. But here’s another way to think of them…

# The membership-account idea : SUBTRACTION ON YOUR ACCOUNT

Little Jimmy gets a gift certificate from grandma.
He comes to CD Baby, and puts some CDs in his cart.
Upon checkout, we ask him to create an account here so we know who he is.
Anywhere in the process, once we know who he is, he can tell us if he has any gift certificates.
By entering their passcode, it adds the full amount of the gift cert to his account - permanently.
He can do this with multiple gift certs, and it will keep adding to this single amount.
Whenever he’s buying anything, the total cost of his order (including shipping) has this gift-cert balance subtracted from it.

== HOW IT WORKS, INSIDE:
Someone purchases a gift-cert. It does nothing but create a giftcert, (asking them for optional extra info, like who to say it’s from, to, and a message with it).

When Little Jimmy comes to use it, we update that giftcert with his customer_id, used=true, date_used=now AND:
A new entry in giftcert_entries with the giftcert_id, and the negative-amount of the giftcert total.
The above two steps are a single transaction, like double-entry accounting. We took it out of one column, into another.

(A sum of giftcert_entries tells us his total gift balance: -20)

He completes his order, total $17 - so his $20 giftcert is used to pay for the order:
A new entry in giftcert_entries with the invoice_id and the positive amount of the giftcert used.

(A sum of giftcert_entries tells us his new total gift balance: (-20 + 17 = -3))
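
The bookkeeping above can be sketched in a few lines of Python. This is an in-memory stand-in for the entries of a single customer, not the real code; the class and method names are invented. It adds one assumption the notes don't spell out: an order never uses more credit than the customer has, so the balance can't go positive.

```python
from decimal import Decimal


class GiftLedger:
    """Hypothetical in-memory stand-in for one customer's giftcert_entries."""

    def __init__(self):
        # Each entry: (amount, giftcert_id, invoice_id)
        self.entries = []

    def redeem(self, giftcert_id, amount):
        """Customer enters a gift cert code: record a negative entry."""
        self.entries.append((-amount, giftcert_id, None))

    def balance(self):
        """Sum of all entries; negative means the customer has credit."""
        return sum((amount for amount, _, _ in self.entries), Decimal("0"))

    def apply_to_order(self, invoice_id, order_total):
        """Use available credit on an order; return what is still owed."""
        available = -self.balance()
        used = min(available, order_total)
        if used > 0:
            # Positive entry: credit consumed by this invoice.
            self.entries.append((used, None, invoice_id))
        return order_total - used


ledger = GiftLedger()
ledger.redeem(giftcert_id=1, amount=Decimal("20"))

# The $17 order from the example: fully covered, $3 of credit left.
owed_first = ledger.apply_to_order(invoice_id=9001, order_total=Decimal("17"))
balance_after_first = ledger.balance()

# A later $10 order uses the remaining $3 and leaves $7 to pay.
owed_second = ledger.apply_to_order(invoice_id=9002, order_total=Decimal("10"))
balance_after_second = ledger.balance()
```

In the database version, the redeem step would also mark the giftcert row as used inside the same transaction, as described above.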

== DATABASE:

CREATE TABLE giftcerts (
id serial PRIMARY KEY,
code char(10) not null UNIQUE,
amount numeric(8,2) not null CHECK (amount > 0),
email text not null,
"from" text, -- quoted because FROM and TO are reserved words in PostgreSQL
"to" text,
message text,
used boolean not null default false,
customer_id int REFERENCES customers(id) ON DELETE RESTRICT,
date_used date
);


CREATE TABLE giftcert_entries (
id serial PRIMARY KEY,
customer_id int not null REFERENCES customers(id) ON DELETE RESTRICT,
entry_date date not null default CURRENT_DATE,
amount numeric(8,2) not null CHECK (amount <> 0),
giftcert_id int REFERENCES giftcerts(id) ON DELETE RESTRICT,
invoice_id int REFERENCES invoices(id) ON DELETE RESTRICT
);

You got a better way, punk?

brian d foy

Related link: http://www.apress.com/book/bookDisplay.html?bID=307

Apress took the best of Randal’s columns from WebTechniques, Linux Magazine, SysAdmin, The Perl Journal, and some others and put them in one book. They are also available online at Stonehenge’s website, but I prefer books myself. Still, it’s the most popular part of the website, even counting his collection of pictures of clouds and food.

I’m not going to review Randal’s book (since I work for him), but someone will have a review in the next issue of The Perl Review.
I do get something out of this, though: If you buy enough books, Randal might buy me a nice steak dinner again.

Ben Lieberman

I have just submitted my final copy of my book “The Art of System Modeling” to O’Reilly for a tentative July publication. I have therefore set up this weblog to offer folks the chance to chat with me about the book, and any other topic that seems to be useful. I look forward to reading the various comments of the O’Reilly audience.

Thanks!

I would like to hear from you!

Uche Ogbuji

First of all, I follow LinuxHardware.org, which posts news and article links related to Linux hardware. It’s a Slashdot-like site, and you may also consider Slashdot’s Linux section, although there is a lot more noise there (often very entertaining noise, to be fair).

LinuxHardware.net is a complementary site, essentially a search engine for external resources relating to Linux/hardware issues.

It’s always worth referring to the Linux Hardware Compatibility HOWTO. It’s not as rapidly updated as more specialized resources, but it’s still a good bedrock resource for finding and working with Linux-friendly hardware.

Laptop users have an especially useful resource: Linux on Laptops. This is a compendium of user-submitted HOWTOs, each for a specific laptop model and Linux distribution.

Also of use is the Linux tested site, which includes nice charts organized by hardware category and distribution, and Linux Online’s list of Linux-friendly hardware vendors.

Of peripheral (no pun intended) interest is Linuxdevices.com, which covers the amazingly rich world of embedded Linux. It offers news on embedded Linux phones, PDAs, digital cameras and camcorders, routers, and more. It also has news and resources for embedded Linux developers.

What are your favorite Linux hardware resources?

Kevin Bedell

In this great article from MIT’s Technology Review, Michelle Delio presents 13 ideas for putting the Mac mini to work as your non-primary computing machine.

My favorite? As the center of a Media/Internet/Communications hub for your automobile. The Mac mini’s media capabilities combined with its Bluetooth and other features will make it the coolest car add-on gadget since air conditioning.

A company called Classic Restorations is now taking orders to install a Mac mini in your car. According to their President, Melvin Benzaquen, “For around the price of mounting an iPod in your car, you get a whole Macintosh computer.”

brian d foy

Related link: http://versiontracker.com/dyn/moreinfo/macosx/21941&vid=127425

I’ve had to use my Powerbook’s modem twice in the past month, and each time I got a hanging disconnect. The first time I waited a while, unplugged the phone cable, logged out, and other useless things before I rebooted. I hate rebooting. My uptime, a silly metric I take great pride in, was 38 days at that point (only because 38 days ago I upgraded the system software).

This time my uptime was 27 days (since the last time I had this problem during a holiday visit to family and had to use a modem). I figured there was a way to fix this with administrator kung-fu, but my sysadmin muscles have atrophied since I started using Mac OS X. Not only that, my prime resource, the internet, isn’t available until I fix the problem.

I went through my process list and killed everything that looked like it was using the modem, but that fixed only the symptom: the scrolling “Disconnecting…” status message in my menu bar. Changing the modem settings and applying them (say, from modem sounds off to on) looked like it gave me back my modem, but Internet Connect only looked like it was responding and the status didn’t change from “Idle”.

Now I’ve found “End Hanging Disconnect”, which is really just an AppleScript wrapper around sudo killall pppd, although it runs it 5 times just to make sure.
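
For the curious, the same belt-and-braces idea can be sketched in Python instead of AppleScript. This is not the script's actual source, which I haven't reproduced here; the function name and the injectable `run` parameter are my own, added so the loop can be exercised without touching a real modem or needing sudo.

```python
import subprocess


def end_hanging_disconnect(attempts=5, run=subprocess.run):
    """Issue `sudo killall pppd` several times and return the exit codes.

    `run` defaults to subprocess.run; pass a stand-in to try the loop
    without executing anything.
    """
    codes = []
    for _ in range(attempts):
        completed = run(["sudo", "killall", "pppd"], check=False)
        codes.append(completed.returncode)
    return codes


# Dry run with a fake runner that only records what would be executed.
class _Done:
    returncode = 0


calls = []


def fake_run(argv, check=False):
    calls.append(argv)
    return _Done()


codes = end_hanging_disconnect(run=fake_run)
```

Calling `end_hanging_disconnect()` with no arguments would run the real command, so that only makes sense on a machine that actually has a stuck pppd.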

The little script has gotten some good reviews, but now I have to wait for a chance to use it. I’d like to see if it works for me, but I’d also like to never have that problem again. What’s going to win?

Uche Ogbuji

The Python/XML community has an unfortunately long tradition of dodgy benchmarks. I had a lot to say about probably the most egregious example in my article on PyRXP. PyRXP is called an XML parser, and its developers benchmark it as such against other Python/XML parsers. The problem is that it turns out PyRXP is not an XML parser. It fails the most fundamental conformance to the most important aspect of XML: Unicode support. As a result, a benchmark of PyRXP against an XML parser is ludicrously unfair. In my article I had a lot to say about how poisonous such unfair benchmarks are.

On the less egregious end are benchmarks of libxml2’s default Python binding, which is in many ways so gnomic (no pun intended) and treacherous that it’s also an unfair comparison against most Pythonic XML tools. It sounds as if Martijn Faassen’s lxml is making decent progress towards rectifying this.

But I must say that the benchmarks that were the last straw for me came from an old friend. Fredrik Lundh (“/F”) is IMO one of the few XML package developers in the Python community who really understand both Python and XML. This has been generally borne out in his ElementTree library, about which I’ve always had a lot of good things to say. cElementTree came along and suddenly raised the Python/XML benchmark sweepstakes once again. As part of the promotion of cElementTree, /F posted a benchmark on the home page. The benchmarks are very flattering to cElementTree, and it’s probably deserving of some such flattery, but as I examined the performance issue a bit more, I’ve come to conclude that his benchmarks are pretty much useless.

The problem is that besides a performance bug in my own Amara 0.9.2, which /F brought to my notice, and that was fixed in the subsequent release, I was unable to reproduce under real-world conditions anything like the proportions implied in /F’s benchmarks. Well, /F pretty much admits that all he’s doing in his benchmark is reading in a file using each library. Hmm. This is not the stuff of which useful benchmarks are made. Nobody reads in a 3MB XML document just to throw all the data away, least of all Python developers who have long been vocal about their desire to do as little with XML as possible. Of course I can’t be 100% sure in this complaint because I haven’t seen the benchmark code, but then again that’s just another complaint.

I set out to run at least one real-world benchmark, in order to determine whether there is anything to the no-op benchmarks /F uses. The basics come from this article, where I introduce the Old Testament test. The idea is simply to print all verses containing the word ‘begat’ in Jon Bosak’s Old Testament in XML, a 3.3MB document. A quick note on the characteristics of the file: it contains 23145 v elements, each containing one Bible verse and only text: no child elements. The v elements and their content represent about 3.2MB of the file’s total 3.3MB. In the rest of this article I present the code and results.

I’m working on a Dell Inspiron 8600 notebook with 2GB RAM. It’s a Centrino 1.7GHz, which is about equivalent to a P4-3GHz (modulo the equally wacky world of CPU benchmarks). The OS is Fedora Core 3 Linux, and I’ve tuned DMA and the like. I’m running Python 2.3.2. The following are my pystone results:

$ python /home/uogbuji/lib/lib/python2.3/test/pystone.py
Pystone(1.1) time for 50000 passes = 2.99
This machine benchmarks at 16722.4 pystones/second

I ran each case 5 times and recorded the high and low run times, according to the UNIX time command. I understand very well that this is not quite statistically thorough, but it’s well ahead of all the other such benchmarks I’ve seen in terms of reproducibility (I present all my code) and usefulness (this is a real-world use-case for XML processing).
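The high/low procedure can also be approximated inside Python itself. What follows is only a rough sketch in present-day Python 3, not the code behind the numbers in this article (those come from the UNIX time command run against whole scripts). The names time_runs and find_begat, the tstmt and chapter wrapper elements, and the tiny inline document are all assumptions made for illustration.

```python
import time
import xml.etree.ElementTree as ET

# Tiny stand-in for ot.xml; only the v element name comes from the article.
SAMPLE = (
    "<tstmt><chapter>"
    "<v>And Adam lived an hundred and thirty years, and begat a son.</v>"
    "<v>In the beginning God created the heaven and the earth.</v>"
    "</chapter></tstmt>"
)

def find_begat(xml_text):
    """Return the text of every v element that contains 'begat'."""
    root = ET.fromstring(xml_text)
    return [v.text for v in root.iter("v") if v.text and "begat" in v.text]

def time_runs(func, arg, runs=5):
    """Call func(arg) `runs` times; return (low, high) wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return min(timings), max(timings)

low, high = time_runs(find_begat, SAMPLE)
print("range: %.6f - %.6f seconds" % (low, high))
```

Timing inside the interpreter leaves out process start-up and import costs, which the time command includes, so figures from a harness like this are not directly comparable with the ranges reported here.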

First up: plain old PySAX. Forget the performance characteristics for a moment: this code was just a pain in the arse to write.

from xml import sax

class OtHandler(sax.ContentHandler):
    def __init__(self):
        #Yes, all this rigmarole *is* required, otherwise
        #you could miss the word "begat" split across
        #multiple SAX events
        self.verse = None
        return

    def startElementNS(self, (ns, local), qname, attrs):
        if local == u'v':
            self.verse = u''
        return

    def endElementNS(self, name, qname):
        if (self.verse is not None
            and self.verse.find(u'begat') != -1):
            print self.verse
        self.verse = None
        return

    def characters(self, text):
        if self.verse is not None:
            #Yeah yeah, probably a tad faster to use the
            #''.join(fragment_list) trick, but not worth
            #the complication with these small verse chunks
            self.verse += text
        return

handler = OtHandler()
parser = sax.make_parser()
parser.setContentHandler(handler)
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.parse("ot.xml")

I get numbers ranging from 2.32 - 3.97 seconds.

Next up is PySAX using a filter to normalize text events, and thus simplify the SAX code a great deal. The filter, amara.saxtools.normalize_text_filter, is basically the one I posted here, with some improvements. The code is much less painful than the PySAX example above, but it still demonstrates why SAX turns off people used to Python’s simplicity.

from xml import sax
from amara import saxtools

class OtHandler(sax.ContentHandler):
    def characters(self, text):
        if text.find(u'begat') != -1:
            print text
        return

handler = OtHandler()
parser = sax.make_parser()
normal_parser = saxtools.normalize_text_filter(parser)
normal_parser.setContentHandler(handler)
normal_parser.setFeature(sax.handler.feature_namespaces, 1)
normal_parser.parse("ot.xml")

I get numbers ranging from 2.66 - 4.88 seconds.
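To make the idea of a text-normalizing filter concrete, here is a minimal sketch in present-day Python 3 built on the standard library’s xml.sax.saxutils.XMLFilterBase. It is not Amara’s normalize_text_filter; the class names CoalesceText and Collector are hypothetical, and the sketch only handles the non-namespace startElement/endElement events. It buffers adjacent characters() calls and forwards them as one event whenever an element starts or ends.

```python
import io
from xml import sax
from xml.sax.saxutils import XMLFilterBase

class CoalesceText(XMLFilterBase):
    """Buffer adjacent characters() events and forward them as one."""

    def __init__(self, parent):
        XMLFilterBase.__init__(self, parent)
        self._buffer = []

    def _flush(self):
        # Pass any buffered text downstream as a single event.
        if self._buffer:
            XMLFilterBase.characters(self, "".join(self._buffer))
            self._buffer = []

    def characters(self, content):
        self._buffer.append(content)

    def startElement(self, name, attrs):
        self._flush()
        XMLFilterBase.startElement(self, name, attrs)

    def endElement(self, name):
        self._flush()
        XMLFilterBase.endElement(self, name)

class Collector(sax.ContentHandler):
    """Downstream handler that can assume one text event per verse."""

    def __init__(self):
        sax.ContentHandler.__init__(self)
        self.hits = []

    def characters(self, text):
        if "begat" in text:
            self.hits.append(text)

# Entity and character references are places where a parser may split text.
doc = b"<ot><v>Adam &amp; Eve: he be&#103;at Seth</v><v>no match</v></ot>"
collector = Collector()
reader = CoalesceText(sax.make_parser())
reader.setContentHandler(collector)
reader.parse(io.BytesIO(doc))
print(collector.hits)
```

The extra layer of Python method calls on every event is one plausible reason a filtered parse would run slower than the hand-written handler, though the sketch itself measures nothing.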

Next up is Amara pushdom, which tries to combine some of the performance advantages of SAX with the (relative) ease of DOM.

from amara import domtools

for docfrag in domtools.pushdom(u'v', source='ot.xml'):
    text = docfrag.childNodes[0].firstChild.data
    if text.find(u'begat') != -1:
        print text

I get numbers ranging from 5.83 - 7.11 seconds.

Next up is Amara pushbind, which tries to combine some of the performance advantages of SAX with the most Pythonic (and thus easy) API I can imagine.

from amara import binderytools

for v in binderytools.pushbind(u'v', source='ot.xml'):
    text = unicode(v)
    if text.find(u'begat') != -1:
        print text

I get numbers ranging from 10.46 - 11.40 seconds.

Next up is Amara bindery chunker, which is the basis of pushbind.

from xml import sax
from amara import binderytools

def handle_chunk(docfrag):
    text = unicode(docfrag.v)
    if text.find(u'begat') != -1:
        print text

xpatterns = 'v'
handler = binderytools.saxbind_chunker(xpatterns=xpatterns,
        chunk_consumer=handle_chunk
    )
parser = sax.make_parser()
parser.setContentHandler(handler)
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.parse("ot.xml")

I get numbers ranging from 9.44 - 10.27 seconds.

Finally, I look at /F’s cElementTree.

import cElementTree as ElementTree

tree = ElementTree.parse("ot.xml")
for v in tree.findall("//v"):
    text = v.text
    if text.find(u'begat') != -1:
        print text

I get numbers ranging from 1.53 - 3.18 seconds.

So what do I conclude from these numbers? As I’ve said before, the speed of cElementTree amazes me, but its advantage in the real world is nowhere near as dramatic as /F’s benchmarks claim. More relevant to my own vanity, Amara 0.9.3’s disadvantage in the real world is nowhere near as dramatic as /F’s benchmarks claim. IMHO, it’s close enough in performance to all the other options, and offers so many advantages in areas besides performance, that it’s a very respectable alternative to any Python/XML library out there.

But the point of this exercise goes far beyond all that. We really need to clean up our act in what is a very strange political battleground in the Python/XML space. If we’ve decided that MIPS wars are what we’re going to be all about in development, then let’s benchmark properly. Let’s gather some real-world use-cases and normalized test conditions. Let’s make sure all our benchmarks are transparent (at least release all the code used), and let’s put some statistical rigor behind them (not an easy thing to do, and not something I claim to have done in this article). Let’s do all this as a community.

While we’re at it, I’d like to repeat my call for test case diversity from my PyRXP article: [R]un the tests
on a variety of hardware and operating systems, and [don’t]
focus on a single XML file, but rather examine a variety of XML files.
Numerous characteristics of XML files can affect parsing and processing
speed, including:

  • The preponderance of elements versus attributes versus text (and
    even comments and processing instructions)
  • Any repetition of element or attribute names, values and text content
  • The distribution of white space
  • The character encoding
  • The use of character and general entities
  • The input source (in-memory, string, file, URL, etc.)

And if we’re not willing to do things rightly, let’s stop deceiving users with meaningless benchmarks.

What real-world conditions would you like to see represented in respectable Python/XML benchmarks?

Derek Sivers

A big change I’m adding in the CD Baby rewrite is the ability for the store to be browsing/searching only a subset of its items. This is useful for genre-specific stores, say jazz.cdbaby.com, where browsing and searching the catalog would only show you jazz albums.

In planning, I was calling these “LIMITERS”, because in SQL terms, I imagined it would work like this: Say I’m searching and browsing the store, looking at top-sellers or new-arrivals:

SELECT * FROM items ORDER BY sold DESC LIMIT 20
SELECT * FROM items ORDER BY date_added DESC LIMIT 20

… if you add a LIMITER of showing only jazz, the queries become:

SELECT * FROM items WHERE style='jazz' ORDER BY sold DESC LIMIT 20
SELECT * FROM items WHERE style='jazz' ORDER BY date_added DESC LIMIT 20

… if you add another LIMITER of showing only artists from Sweden, the queries become:

SELECT * FROM items WHERE style='jazz' AND location='SE' ORDER BY sold DESC LIMIT 20
SELECT * FROM items WHERE style='jazz' AND location='SE' ORDER BY date_added DESC LIMIT 20

… and so on.

At first I figured some smart object would parse a config of limiters and add it to all SQL queries. But I realized that would lead to some awfully slow queries, since some of the other queries in browsing and searching the store can get pretty complex on their own, and adding in the extra limitations would be even worse.

Jeremy suggested memcached. Brilliant! We pass it all the complex queries we want, and just tell memcached to remember the results, so our customers browsing the site see it fast. No extra work needed. Speed problem solved.

BUT… since one of my planned LIMITERs would be a join on another table, we foresaw that trying to get a class to just add a join to all of our queries is asking for trouble. The whole “complicated queries on the fly” idea had to be nixed.

So then I thought that I’d export a bunch of nightly cache tables of all of our items with these various limiters. Say, “jazz.db” would be a subset of our catalog, already defined as only jazz, where we’d have all the fields needed for browsing (id, artist_name, item_name, price, description, etc) - and search only that cache-table when browsing the jazz store.

We were about to do this when we looked at the stuff for browsing, realized it was all just duplicated cache-info, and all that we really needed to know is what item ID#s are included in this genre-specific store we’re browsing! That’s it! Everything else can be reliably joined in. No need to export entire new databases or tables. Just give a list of item IDs. Let’s call it catalogs and it would look like this:

CREATE TABLE catalogs (
    id serial primary key,
    name text,
    description text
    -- etc...
);
CREATE TABLE catalog_items (
    catalog_id int not null REFERENCES catalogs(id),
    item_id int not null REFERENCES items(id),
    PRIMARY KEY (catalog_id, item_id)
);

Looking at top-sellers or new-arrivals becomes:

SELECT items.* FROM items
LEFT JOIN catalog_items ON items.id=catalog_items.item_id
WHERE catalog_items.catalog_id=5
ORDER BY sold DESC LIMIT 20

SELECT items.* FROM items
LEFT JOIN catalog_items ON items.id=catalog_items.item_id
WHERE catalog_items.catalog_id=5
ORDER BY date_added DESC LIMIT 20

Perfect! Now we can populate item IDs into catalog_items however we like, by any crazy logic or no logic at all, and the store can easily browse/search it as a subset.

Last step : how to set up our system so that it joins against this catalog_items table when needed (jazz.cdbaby.com), but doesn’t when not (www.cdbaby.com)?

Easy! We’ll ALWAYS join against it. If you’re browsing www.cdbaby.com with no limiters then that’s catalog #1, a catalog of all available items that we’ll populate into this join-table every night like we do the others. Catalog #2 and up will be the subsets.

This even solves a problem I wasn’t looking to solve: the trouble of always having to pass “WHERE active=TRUE” in every single search, so that I’m not showing items until they’re approved to go on the site. It ALSO solves a problem of not including abstract items’ variations, which is a topic of a future post.
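The always-join idea can be tried out in a few lines. The following is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for PostgreSQL; the sample rows, the rebuild_catalogs and top_sellers helpers, and the trimmed-down items columns are all assumptions made for illustration, not the real CD Baby schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    name TEXT,
    style TEXT,
    sold INTEGER
);
CREATE TABLE catalogs (
    id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE catalog_items (
    catalog_id INTEGER NOT NULL REFERENCES catalogs(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (catalog_id, item_id)
);
INSERT INTO items VALUES (1, 'Blue Train Home', 'jazz', 40);
INSERT INTO items VALUES (2, 'Garage Days', 'rock', 90);
INSERT INTO items VALUES (3, 'Late Set', 'jazz', 70);
INSERT INTO catalogs VALUES (1, 'everything');
INSERT INTO catalogs VALUES (2, 'jazz');
""")

def rebuild_catalogs(conn):
    """Nightly-style refresh: catalog 1 gets every item, catalog 2 only jazz."""
    with conn:
        conn.execute("DELETE FROM catalog_items")
        conn.execute("INSERT INTO catalog_items SELECT 1, id FROM items")
        conn.execute(
            "INSERT INTO catalog_items SELECT 2, id FROM items WHERE style = 'jazz'"
        )

def top_sellers(conn, catalog_id, limit=20):
    """Same query for every store; only the catalog id changes."""
    rows = conn.execute(
        "SELECT items.name FROM items "
        "JOIN catalog_items ON items.id = catalog_items.item_id "
        "WHERE catalog_items.catalog_id = ? "
        "ORDER BY items.sold DESC LIMIT ?",
        (catalog_id, limit),
    )
    return [name for (name,) in rows]

rebuild_catalogs(conn)
print(top_sellers(conn, 1))  # the main store: every item
print(top_sellers(conn, 2))  # the jazz store: the subset
```

Because the WHERE clause filters on catalog_items, a plain JOIN returns the same rows as a LEFT JOIN would here; the sketch uses the plain form.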

Derek Sivers

Switching CD Baby from MySQL to PostgreSQL gave me the perfect opportunity to fix some things in our database structure that needed changing. Among many other changes, here are some interesting ones:

ALBUMS ARE NOW ITEMS:
OLD: one big table called “albums”, that has everything we sell
NEW: one common table called “items”, with sub-tables depending on the type of item

CD Baby was written to sell only one thing : CDs.
When people ask if we can sell their T-Shirt, we say no.
When they ask if we can sell digital downloads, we say no.
Can we make a bundle of albums sold for a discounted price? No.
Gift certificates? No.

My “albums” table was the rusty axle to this wheel. Everything revolved around it. Every line of code ever written was dependent on this “albums” table and the way it worked.

So - I imagined a future CD Baby that could sell many different types of items, looked for the basic common things that they ALL have (name, price, description), and made it an “items” table.

If item is an ALBUM, it pulls in details from the “albums” table. (An album can be any format: CD, vinyl, download, whatever. It’s a collection of songs.)
If item is a BUNDLE, it foreign-key joins a list of other items, and gives it one combined price.
If item is MERCH, it’s the “merchant ships it” thing, where we collect the money, the vendor ships it directly to customer, then we pay them after it’s proven shipped.
If item is DOWNLOAD, it foreign-key joins to a song: a piece of music. (A download album, then, is a BUNDLE of DOWNLOADs.)
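One possible shape for that “common table plus sub-tables” layout is sketched below, again using Python’s sqlite3 module as a stand-in for PostgreSQL. Every table and column name beyond items, albums, name, price and description is a guess made for illustration (the type column, bundle_items, format), as are the sample rows and the bundle_contents helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('ALBUM', 'BUNDLE', 'MERCH', 'DOWNLOAD')),
    name TEXT NOT NULL,
    price REAL NOT NULL,
    description TEXT
);
-- Album-only details hang off the common row.
CREATE TABLE albums (
    item_id INTEGER PRIMARY KEY REFERENCES items(id),
    format TEXT
);
-- A bundle is a list of other items sold at the bundle's own price.
CREATE TABLE bundle_items (
    bundle_id INTEGER NOT NULL REFERENCES items(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (bundle_id, item_id)
);
INSERT INTO items VALUES (1, 'ALBUM', 'First Record', 12.0, NULL);
INSERT INTO items VALUES (2, 'ALBUM', 'Second Record', 12.0, NULL);
INSERT INTO items VALUES (3, 'BUNDLE', 'Both Records', 20.0, NULL);
INSERT INTO albums VALUES (1, 'CD');
INSERT INTO albums VALUES (2, 'vinyl');
INSERT INTO bundle_items VALUES (3, 1);
INSERT INTO bundle_items VALUES (3, 2);
""")

def bundle_contents(conn, bundle_id):
    """Names of the items inside a bundle, in id order."""
    rows = conn.execute(
        "SELECT items.name FROM bundle_items "
        "JOIN items ON items.id = bundle_items.item_id "
        "WHERE bundle_items.bundle_id = ? ORDER BY items.id",
        (bundle_id,),
    )
    return [name for (name,) in rows]

print(bundle_contents(conn, 3))
```

Code that only needs a name and a price can read items alone and never care which kind of thing it is selling.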

ADDRESSES STAND ALONE
OLD: customer had address, city, state, zip, country. invoice had address, city, state, zip, country
NEW: table called “addresses”, linked to by not only customer but lineitems

I always thought it was kinda cool that the big online stores remembered all the addresses I’ve ever used (a one-to-many relationship between customer and addresses). Then I realized that this same “addresses” table could be used to not have an address per-order, but rather an address per-ITEM inside an order!

Yeah I know this sounds obvious, but it’s fun figuring out this shit on my own, with no mentor or instruction book telling me this is how the big boys do it. Very satisfying.

SOUNDS-LIKE: A LIST OF FAMOUS ARTISTS
OLD: artists tell us they sound like “bob dylan ani difranco and early zepplin” in a single text field
NEW: album_soundlike joins album_id to list of id#s of famous artists

This is going to be one of the biggest conversion challenges.

For 6 years, I’ve asked 80,000 artists to “tell us three famous artists people say you sound like”. This info was entered into a text field, however they gave it to me. Some would even write, “I don’t sound like nobody” or “Chili Peppers back before they got lame”. When people would search CD Baby, I’d just do a full-text search of this field to see if it returned what they were looking for. Now it’s time to try to organize that data, so we really know which famous artists the artists are referring to.

I got a list of famous artists from the brilliant Robert Kaye at MusicBrainz. I stuck these 131,000 artists into a database with only their name and an auto-generated id#. Then a join table to link the album’s ID to the multiple IDs of the referenced artists.

UPDATE: passed this part of the project to Robert, directly, since he’s the king of music metadata. He’s found a great solution using PyLucene that he will open source, too. More on that later, I guess.

Ming Chow

Related link: http://www.tuftsdaily.com/vnews/display.v/ART/2005/01/20/41ef4cefa1ecf

Last week, I was interviewed by an editor of the Tufts University daily newspaper, the Tufts Daily, about the phenomenon known as blogs. The interview questions I received were very good questions:

  • How long have you been writing weblogs? How did you learn about blogs and how did you get involved in writing them? What do you write about?
  • Do you believe the way people receive news has been changing over the past couple of years? Why do you think some people have turned to blogs or the internet for news?

And the interview wouldn’t be complete without that most important question:

  • What do you feel are the pros/cons of receiving news from a blog? How reliable are they?

The article was published back on Thursday. Its timing couldn’t have been better: coincidentally, yesterday there was a national feature about the ethical concerns in blogs. Many of the points raised in the latter article coincide with my comments in the Tufts Daily, and it goes into the question of credibility in greater depth.

Finally, there is a new blog-publishing website called Ready, Set, BLOG!. It is created and maintained by Drew Olanoff, the creator of the GMail4Troops website. One of the differences between Ready, Set, BLOG! and Blogger is the community features. For example, you can view the last visitors to your blog, then you can visit them and so on. There are top blogs of the day based on nominations, plus many other features.

But from all of this, it is very apparent that blogs are becoming mainstream, important to our daily communications, and a source of information for a critical mass of people.

(Special thanks to Tufts Daily editor Stephanie Christofides for putting together the article, and for finding me on, where else, the O’Reilly Network.)

Derek Sivers

Related link: http://www.rubyonrails.org/

I know I’ve talked about it sporadically for a year now (see past posts), but I’ve finally started my CD Baby rewrite.

BIG CHANGE #1 : PostgreSQL

I dabbled with PostgreSQL one night when I wasn’t in the mood to do what I was supposed to be doing. I’d heard some people I respect rave about it, and since I really do love databases, I decided it was worth a few hours of my time. Holy Cannoli! Its strictness solves most of the data corruption problems I’ve had with MySQL!

Example problems with CD Baby database in MySQL:

#1 - entries deleted from one table that were required by another.
Example: An album is deleted from our catalog, so now all the lineitems (customer purchases) of that album are left dangling with no matching identifier.

#2 - many invalid entries, especially many dates of “0000-00-00”
This one has gotten us into trouble with our digital distribution partners, whose much-smarter systems throw up a big fat error when we stupidly report an album’s release date as “0000-00-00”

#3 - no strict requiring of join-ids matching.
Example: someone mistakenly entering that an order belongs to customer #314981 when it should have been #319481, and there is no customer #314981. Database didn’t complain so we didn’t notice.

Yes of course with these and every other example we all say, “Yeah but your code should have prevented that.” Thing is : in MOST places it does, but various little shell scripts and admin scripts can’t check for every possible human error, so here I am with a constantly corrupt database.

This is where I *LOVE* the strictness that PostgreSQL makes easy. Yes I hear that MySQL InnoDB tables do the same thing, but I’ve already fallen in love with PostgreSQL, and switched.
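The three example problems map onto ordinary declarative constraints. Here is a minimal sketch of that mapping, using Python’s sqlite3 module as a stand-in so it runs anywhere; it is not PostgreSQL and not the CD Baby schema. The table layouts, the sample rows and the rejected helper are assumptions made for illustration, and the CHECK on release_date stands in for what a real date column type would refuse on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE albums (
    id INTEGER PRIMARY KEY,
    release_date TEXT NOT NULL CHECK (release_date >= '1900-01-01')
);
CREATE TABLE lineitems (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    album_id INTEGER NOT NULL REFERENCES albums(id)
);
INSERT INTO customers VALUES (319481, 'example customer');
INSERT INTO albums VALUES (1, '2004-06-01');
INSERT INTO lineitems VALUES (1, 319481, 1);
""")

def rejected(conn, sql):
    """Return True if the database refuses the statement."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Problem #1: deleting an album that lineitems still point to.
print(rejected(conn, "DELETE FROM albums WHERE id = 1"))
# Problem #2: a placeholder date such as 0000-00-00.
print(rejected(conn, "INSERT INTO albums VALUES (2, '0000-00-00')"))
# Problem #3: an order for a customer id that does not exist.
print(rejected(conn, "INSERT INTO lineitems VALUES (2, 314981, 1)"))
```

With rules like these living in the schema, a stray shell script or admin script gets an error at the moment it makes the mistake instead of quietly leaving bad rows behind.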

BIG CHANGE #2 : RUBY, BABY!

Like a lost soul walkin’ the earth, lookin’ for spirituality, that stumbles upon the right church with the right people at the right time, I’ve found my niche with Ruby. Its little itty-bitty community attracts some brilliant “think different” types with a love for beautiful code that do this for love, not money.

I liked it immediately a year ago when I learned it while stuck in a cabin in Sweden. I stopped after some shell scripts though because its web-making features weren’t up to snuff. Now, with Rails, there is a team of passionate geniuses contributing to this web-making framework daily. It’s small enough that you can stay on top of it, and watch this framework get more and more powerful by the week. Improvements that are pragmatic not political. People using it to make effective websites, contributing to the shared framework around it as they go. Why not take advantage of all this brilliant work?

It took a lot to get me to switch from PHP, the only language I really know, to Ruby. I tell my non-computer friends, “It’s like the week before sitting down to write a book, I decided to write it in Portuguese instead of English, because it’ll be easier.” It sounds crazy, but we’ll see.

Bookmark/subscribe to my author page, here, at http://www.oreillynet.com/pub/au/1841 if you want to watch the almost-daily developments.

Please no flames about Python, PHP, Java, or MySQL. My choice to use Ruby + Postgres was due to my love of them, not hate of something else.

Andy Oram

Opera houses show translations of operas to the audiences on screens
while the performance is underway. I hope the opera houses consider a
change of platform after the performance of La Bohème reported in
today’s Boston Globe, where during the main character’s first aria,
the computer controlling the display started running a routine anti-virus
scan.

Nitesh Dhanjani

Related link: http://tor.eff.org/

Tor has been around a while, but I have only recently had the chance to look into it in more detail:

Tor is a network of ‘virtual’ tunnels that allows you to connect to hosts on the Internet with increased privacy. You can use it to keep remote hosts (such as web servers you may be connecting to) from learning about your location (IP address). Tor does this by routing outgoing connections from your computer via “onion routers”, i.e. specifically designated hosts that have been set up to participate in the system. To quote from the Tor website:

“To create a private network pathway with Tor, the user’s software or client incrementally builds a circuit of encrypted connections through servers on the network. The circuit is extended one hop at a time, and each server along the way knows only which server gave it data and which server it is giving data to. No individual server ever knows the complete path that a data packet has taken. The client negotiates a separate set of encryption keys for each hop along the circuit to ensure that each hop can’t trace these connections as they pass through.

Once a circuit has been established, many kinds of data can be exchanged and several different sorts of software applications can be deployed over the Tor network. Because each server sees no more than one hop in the circuit, neither an eavesdropper nor a compromised server can use traffic analysis to link the connection’s source and destination…”

Great! In addition to helping protect everyday privacy by allowing web surfing to be anonymous for the ordinary user, Tor sounds like an excellent idea for those who wish to establish outbound connections via ISPs that prohibit certain protocols (since Tor uses proxy software to tunnel the connection via its routers). Also, I’m pretty sure Tor will begin to be quite popular among BitTorrent users! However, do note that while Tor attempts to anonymize your location, it does not protect against protocol-specific issues:

“Tor can’t solve all anonymity problems. It focuses only on protecting the transport of data. You need to use protocol-specific support software if you don’t want the sites you visit to see your identifying information. For example, you can use web proxies such as Privoxy while web browsing to block cookies and withhold information about your browser type.

Also, to protect your anonymity, be smart. Don’t provide your name or other revealing information in web forms. Be aware that, like all anonymizing networks that are fast enough for web browsing, Tor does not provide protection against end-to-end timing attacks: If your attacker can watch the traffic coming out of your computer, and also the traffic arriving at your chosen destination, he can use statistical analysis to discover that they are part of the same circuit.”

Appropriate links:
The Tor web-site.
Tor documentation.
Download Tor.
OS X specific instructions.

Andy Oram

Tony Mobily describes his Free Software Magazine as being half about
freedom and half about technology. Altogether, it is meant to help
everyone from the curious individual to the head of a government or
business department understand what they’re getting into and how to
make use of free and open source software. Mobily is trying to
establish his new magazine as the authoritative source for all kinds
of information on free software.

I understand that there’s a need here, which others have tried to fill
and not quite succeeded. One can get news about free software–and as
many opinions as you can stomach–hourly from a number of online
sites, but few go into depth and none could be called authoritative. A
few sites such as
First Monday
offer intriguing articles of a more formal and academic nature. And
several excellent magazines cover Linux, but they’re directed at
particular subsets of Linux users and don’t have the broad mandate of
Free Software Magazine. Is there a niche for Mobily’s venture?

The first issue provides some nice nuggets. My favorite article is
Malcolm D. Spence’s checklist for justifying free software:
Free software is not just about “no license fees”!

Chris J. Karr lays out the various options for programming on Mac OS X.

Mobily’s own article, Creating Free Software Magazine, helps explain
the magazine’s rather generic-looking layout–the formatting is all
done through XSLT and requires no manual intervention.

Free Software Magazine releases its articles under licenses that
permit reuse: Creative Commons licenses, the GNU Free Documentation
License, or “Verbatim Copying Only.” It also puts its articles
online–particularly valuable for getting its message out to readers
in developing countries. But Mobily is looking forward to
paid subscribers
for their print version so that the magazine can continue–and
pay its writers.

Adam Trachtenberg

Related link: http://story.news.yahoo.com/news?tmpl=story&cid=1540&e=13&u=/afp/20050118/sc_afp…

Interesting story on the O’Reilly mascot, the tarsier.

brian d foy

Related link: http://www.theperlreview.com/Found/

I’m creating “Found Perl”, a little, virtual Perl memorabilia museum. I’ve got a lot of stuff to include, and I bet the community has orders of magnitude more.

I’ve been looking at Found Magazine since I heard about it on This American Life. They publish pictures of things that people find lying about: mostly flat things like scraps of paper. I’d like to do that for Perl.

For instance, I have on display:

  • Highway signs pointing to Perl, Germany (next to Apach, France).
  • The temporary camel tattoo I was giving out at one of the Perl conferences
  • My original receipt for my first copy of Programming Perl
  • Mark Jason’s card announcing “Perl Advanced Techniques Handbook” (now “Higher Order Perl”)
  • and some other things

There is a lot of stuff I’d like to find, and even more stuff I probably don’t know about. If you have something, please send me an image or scan at found@theperlreview.com.

  • Perl Magnetic Poetry, from The Perl Journal
  • Nat’s profane “Perl is my bitch” sticker set (What were the other slogans?)
  • An original blow-in card for The Perl Journal
  • Pictures of other Perl t-shirts, especially the one given out at the first couple of Perl conferences
  • The O’Reilly beret given out at one of the Perl conferences
  • Scans of signatures from various Perl people (I don’t have any myself)
  • Tim Bunce’s handwriting on a bar napkin saying “Generic database interface: use GDI or something” :)
  • Images of swag (keychains, water bottles, pens) that Perl vendors gave out.
  • Instances of the string “Perl” in everyday life. I remember seeing an image of a European road sign pointing to “Perl”. I think it was German, but I can’t find it.

Kevin Shockey

Related link: http://www.mono-project.com

Many people believe that the future of Mono is very bright; I know I’m one of them. Specifically, they believe that Mono will become a very popular software development platform for Linux. I think so too, but I think there is something missing in this prediction. In this article I will answer a series of simple questions that identify what is missing. In the end, it will make it easy to understand how Mono fulfills this prediction. We’ll start with a key question in this analysis.

Will Mono become popular because of hard-core Linux aficionados adopting the software development platform?

This is highly unlikely. Developers, like most technologists, tend to be extremely religious in their choice of a development platform. Therefore, most developers choose one and dedicate themselves to mastering that platform. So choosing a software development platform tends to be a mutually exclusive decision. We either choose Perl or we choose Mono. Only the rare developer will be open, flexible, and agnostic with their choice. Learning C#, no matter how easy it might be, will still require a developer to change their religion. There will be some defections, but not enough to make Mono popular.

So the hard-core Linux aficionados will not be able to achieve critical mass for Mono. What about the current C# developer base? When we consider this base, there is one question on most people’s minds.

Why would any development team that currently targets Windows and the .NET Framework care about Mono?

At this time, the most frequent response you might receive is, they don’t. They don’t care about Mono because their server room doesn’t yet have any Linux servers and none of their target users have Linux on their desktop.

They currently don’t care, but I believe that they will. As I see it, Mono will only become a popular development platform on Linux with the help of the Windows community. They are the largest market for all things .NET and C#, so if Mono is ever going to reach critical mass, then it must be through them.

Even though Linux deployments continue to increase, there will still be more C# programmers in the world than all of the Gnome GTK+, Qt, Perl, Python and PHP communities put together. Although some developers from these groups will be willing to learn a few new tricks with Mono, the rest (I’ve met some of them) would rather die than switch. So the point remains: if Mono is to become a popular development platform for Linux, it will only be possible with the support and acceptance of the Windows C# developers.

What will happen to make them want to care about Mono?

Most importantly, Linux will continue to gain popularity. With strong support by all of the major hardware vendors, it is clear that the Linux server market will continue to explode. Since Linux helps the hardware vendors sell more boxes, it is only a matter of time before this becomes true for the desktop as well. Pretty soon, if it isn’t already, Linux will be constantly in the news and in all of the trade journals. With the bounty of open source applications most Linux distributions include or make available, there is bound to be functionality that was previously beyond the reach of some IT shops.

So first they will become curious about open source software and why everyone seems to be buying Linux boxes. Imagine this scenario:

You are the development lead in a medium sized IT organization with 15 to 30 developers. The developers on your team are hardcore Microsoft advocates. You’re developing ASP.NET and .NET rich client applications and deploying the applications on Windows 2000 or Windows Server 2003. Most of your key developers are comfortable with C# and use SQL Server and IIS.

One day you get in your snail mail in-box an article torn from InformationWeek or ComputerWorld. The article is about Linux, or maybe it is about cross platform deployment, or maybe it is even about Mono. On this article the IT Director for your department or your software development manager has scribbled a quick note. He has written “What do you think of this? Is this for real? Should we be looking into this open source stuff?”

Your first reaction is to ignore this delivery and get back to coding. But you’ve seen some articles about Mono in your .NET solution surfing on the Internet and now you wonder: is there anything to this Mono stuff? You go and check out the Mono Project site to snoop around. You find a link to the Windows installation and you download the setup program. On one of your development servers you install Mono, and you are fairly impressed with the effortless installation.

The bottom line is that you like Mono and are curious about how it will behave in a limited production scenario. You and your team build a quick ASP.NET application and you deploy it on the server with Apache and mod_mono. The testing of the application reveals that it is ready for prime time and now your organization cares about Mono.

I believe that this scenario will play out with more and more frequency over the next 18 months. In Geoffrey Moore’s classic “Crossing the Chasm” he presents a model to predict technology adoption. This model breaks adoption into two distinct groups, the early market and the mainstream market. We all are fairly familiar with the people that make up the early market: technology enthusiasts and visionaries. The key to the model is the first group in the mainstream market, the pragmatists. Once a technology takes hold in this group it has crossed the chasm and is ready for mainstream adoption. I believe that Mono is currently in the early market, but it is poised to cross the chasm in the next 12 months. With the famous City of Munich success, Mono may be establishing the references necessary for pragmatists to have confidence in it. When that happens, curiosity about Mono will explode and so will adoption.

Summarizing thus far, Linux aficionados will not be able to make Mono popular so it is up to the Windows C# community to complete the task. Although they currently don’t care, as Linux gains market share and their curiosity peaks, adoption will begin to explode. Once adoption of Mono within the Windows community occurs, there is just one question left.

How will Windows C# developers make Mono popular on Linux?

The following steps present the final analysis of how Windows C# developers will make Mono popular on Linux.

  1. Experimentation becomes acceptance
    As IT shops continue to test and deploy one-off or non-critical applications written in Mono, their confidence will begin to grow. Organizations will realize that they are re-using their previous C# investments and saving money with Mono. This realization will earn the project a great deal of respect and build the momentum necessary for Mono to cross the chasm. Of course, additional case studies will help build this momentum.

  2. Choosing Mono
    There will arise unique situations, where IT shops will choose Mono or Mono on Linux as the target platform over Windows and .NET. There are three scenarios which make sense.

    • Autonomy - Many organizations will be drawn to the freedom and liberties that accompany the open source licenses for Mono. International development shops will be especially receptive to Mono. As opposed to Java, Mono allows foreign companies to create .NET applications without any worries over whether the architecture will stay available. No other software development platform matches Mono’s freedom and ease of use for developing rich client applications.
    • Innovation + Feedback = Growth - Mono is a very active project. I anticipate the innovation and evolution of Mono will very soon outpace Microsoft’s ability to continue enhancing the .NET Framework. Rumors of this are already beginning to circulate. I predict that Microsoft delays for Longhorn and Avalon will keep them distracted for the next few years or so. This lapse will allow the Mono project to provide features and bug fixes in response to the user community that Microsoft will be unable to match. In the end the combination of a vibrant innovative development community plus their ability to remain responsive to the user community will make Mono very attractive. This attractiveness will add momentum to the growth of Mono on Windows.
    • Not Available Here - The wealth of applications available for Linux under open source licenses will become a powerful asset to many IT shops. The breadth of applications will expose software development departments to new functionality that may have only previously been available on Windows with very expensive licensing costs. With these new applications, there will arise many new integration and development opportunities.
  3. Linux Rules
    With the general acceptance of Mono growing from increased application deployment, the final step will be the dominance of Linux in the server market along with the emergence of Linux on the desktop. Then the inertia of Mono combined with the inertia of Linux will draw the attention of everyone. ISVs will take special notice of the growth of Linux and they will have to include Linux as a standard operating system option for their products. Corporate IT departments will continue to find more applications to deploy on Mono and Linux.

    One of the biggest drivers for Corporate IT shops will be integrated solutions. As vendors increasingly choose Mono and Linux as their target platform, then the integrated solutions they sell will be Trojan horses bringing Linux and maybe Mono into the server room.

  4. VB Tips The Scale
    Currently there is only limited portability between Visual Basic .NET and Mono. However, when Mono is able to achieve relatively complete compatibility, the number of potential users for Mono will escalate quickly. Of course, VB.NET programmers will have to pass through the entire process of denial, curiosity, experimentation, and acceptance. Once this process is complete, the VB.NET community will easily follow the lead of the C# community. This process may already be starting, even now.

Conclusion

In the end, Mono will become a popular target platform for Linux. This will be due to the explosion of the Linux market, the openness of the Mono licenses, and the strength of the open source software development model. It will also happen because of how easy it will be for Windows C# (and VB.NET) developers to switch to Mono. IT management will love and encourage the switch because it will open up cross platform development without retraining and without re-outfitting developers with tools and resources. ISVs will choose Mono for the same reason, and end up delivering integrated solutions using Mono and Linux into IT shops everywhere. Ultimately the popularity of Mono will grow from the power that any open source software brings: the power of choice. Having the flexibility and confidence in different solution alternatives will improve the quality of our solutions. Organizations will choose Mono and Linux when the needs of the problem demand them. For most, it will be reassuring that all of this flexibility will be available to Linux and Windows developers alike.

What kind of Mono programming book would you like to have right now? What kind of books will you need in six months, at Release 2.0?

Jono Bacon

With my initial introduction to Linux, I thought of the Linux and Open Source community as a group of people who are very capable at creating complex software for complex people. Back in 1998, geeks on IRC would wow me with the wonders of emacs and its kitchen sink approach, and it was said that I could spend my entire day in emacs. Wow, that’s a thought. As much as I choose to not use emacs for various reasons, I also respect that it fills the technical boots in which competing editors were hopeless at the time. It certainly is a very capable kitchen sink indeed.

As time rumbled on, I gained an increasing interest in the KDE project and how it could make Linux an easier to use platform. Despite the many challenges that faced KDE, their approach seemed fairly consistent and eager to bring forth the same kind of features that make a front row appearance in the Windows world, but on a UNIX/Linux/BSD platform. Much effort was poured into the project, and I got increasingly involved with KDE myself.

Then came along GNOME, freedesktop.org, OpenOffice.org and a range of other projects and tools that pushed for an easier to use platform. Despite these efforts, fundamental problems such as clicking on a floppy disk icon and having it automatically mount seemed to pour salt into the wound. Unfortunately, towards the end of the nineties, the solution appeared to be to pass the buck and insist it was some other project’s issue. Hey, you need this particular feature in KDE? That may well be XFree86’s domain, or it may be a kernel issue, or you may just need to go away and code it yourself using some freaky bongo programming chops.

As time continued to roll on, and as my experience with Open Source and Linux turned from a hobby to a career, I was introduced to the Debian project. Although more complex to administer, Debian walked me towards the promised land of figuring out dependencies, and no longer suffering some of the dependency nightmares that plagued my humble little system back then with Mandrake. This was an interesting choice. Do I sacrifice the desktop ease of use of Mandrake (no major problem as I was running CVS KDE code anyway) for a more complex system with increasingly easy package management? I did, and I never went back. From then onwards, if it weren’t packaged for Debian (albeit me nosing around with CVS code), it simply didn’t sit on a Bacon system.

Bring forth Ubuntu

Spin forth to last year, where I am now working as a professional Open Source evangelist at OpenAdvantage and writing for a number of Linux press magazines and sites. Within my role as a journalist and consultant, I had dabbled, reviewed, evaluated and tested hundreds of different distributions and tools, but none of them had that special something that Debian gave me.

As this happy Debian user, I read with great interest about an underground distribution called Ubuntu. With funding from supremo millionaire and first African in space, Mark Shuttleworth, it sounded like just another Linux distribution that would tail behind Red Hat, SuSE, Mandrake and Debian. Via a series of exchanged emails, it was possible to track down where the action was going on, who was involved and when it would be launched. There were a series of beta copies of their distribution and alas, my inquisitive side got the better of me and I installed it on a Powerbook; itself, not a simple system to get Linux onto. It worked. There was no fuss, there were no problems. As I used the system, GNOME (which I had switched to some time back, see this weblog entry for the meat about why) seemed to integrate seamlessly with everything before me. Everything looked alike, my devices were detected, and things simply felt, simple.

For me, Ubuntu is a nigh on perfect distribution. The reason for this is that it focuses clearly on simplicity. With an impressive coalition of developers on Shuttleworth’s payroll, Ubuntu has managed to be successful in getting the balance between a comfortable set of defaults, and the ability to remove all of the nonsense that clogs up many competing distributions. Why do I need three CD players? Why do I need three web browsers? Why do I need six text editors? Ubuntu gives you one (or occasionally two) application(s) that are picked as the best tools for the job. The effort in the project has been pooled into providing a simple, effective and tight distribution that performs well and works a treat at detecting your hardware.

Even Shuttleworth himself is clear that Ubuntu needs to follow a direction with strong technical achievements. As part of the LUGRadio team, I interviewed Mark for an up and coming show released tomorrow (Mon 17th Jan 2005), and he seems different to most people in his position. He speaks the words of someone who really understands Open Source, and someone who really understands the responsibilities of a distribution. With someone who understands Open Source at a technical and managerial level, as well as a team of highly reputable hackers on board, Ubuntu is something to get excited about.

Simplicity does not mean ignorance

What is interesting about this evolution of Linux is that the mindset of the community has the ability to shift substantially. In my experience, it seems that there are two sectors in which rabid zealotry and traditionalism have the chance of digging their heels in: IT and music. In the IT world, it is not unusual to meet people who are still using software from 10 years ago that they point blank refuse to move away from, even though there is a better solution available. Aside from sticking to the tools, the same problem can be said for the mindset. If someone has the view that ‘high level languages are bad, you should code everything in assembly’, it is likely that this view will stick, even though it can involve a longer process and more work. This purist attitude can be quite an issue, particularly in an environment where you need to work as part of a team.

In the Linux world, the community has had the ability to take traditionalist beliefs and augment them with new thinking. Although there are many in the Linux community who will follow tradition, there is also a substantial amount of new blood that will push for modernisation and change. This produces a wave effect that gives the community the opportunity to aspire to new thinking and make attempts to push the software forward while still showing respect for tradition. This is the reason why a Linux distribution can have a cutting edge desktop and still ship vi. This is a great symbiosis of the two mindsets.

Future

With Open Source, we have seen the opportunity to challenge established market driven approaches. In the closed source, commercial world, the approach seems to be to lay on more and more icing that will give users an urge to upgrade. Although this makes sense in one way (more buttons give you the impression you get more features for your money), this approach also erodes the simplicity that forms the foundation for many of these systems. If you look at a number of commercial tools, many of them swing around the idea of giving you certain complex functionality, but easier and more conveniently. This certainly applies to Microsoft Windows, Apple Mac OS X, the iPod, Visual Basic, Delphi and more. I suppose the analogy is that although you could go and buy all the separate bits to build a car, it is far more convenient if you purchase a pre-built car for the convenience.

With Open Source, there is the ability to strike a balance between convenience, simplicity and features. The tough part is in getting the balance and not alienating users. And of course, all of this must exist within an open community in which you want to avoid a clique of developers who claim they know best and close their ears to suggestions and improvements. I am not saying there is a magic potion involved in Open Source that can make these challenges easier or non-existent, but within a community there is the opportunity to develop a range of solutions and pick the best path forward based on technical capability and usability.

Where life is going to get interesting is what happens next. With Ubuntu, the distribution seems great, and this is largely because they have shipped a first version (called Warty) that works a treat, but to be fair, Warty was simply a modified fork of Debian Unstable, and the additional functionality that was rolled in was not that drastic from an architectural perspective. As Canonical move away from Debian Unstable more and more with each new six monthly release, maintaining the level of stability, feature capability and satisfaction is going to be a real challenge. Also, Ubuntu is a new distribution with a fresh and new outlook on creating a Linux distribution. It has not yet become riddled with political issues, and although I don’t believe this will be a problem in the future, there is always the risk that political fall-outs can occur in a project such as this. This could be a particular challenge with payroll and contributor volunteers. It will be interesting to observe how the project continues to mature. There are a number of people, myself included, that truly hope Ubuntu achieves every opportunity it promises.

I think the biggest challenge we face is in how we take the huge amount of Open Source work being created and make it work together in a simple and effective manner. Seven years ago, creating a simple, uncluttered user interface was easier because there were fewer features, programs and developers. With a huge community of developers, and a massive number of applications being created, the emphasis has to be placed on choosing ‘best of breed’ tools that do the job well. I see no reason why a distribution should span seven or more CDs; too much choice can relegate a computer to a confusing quagmire of complexity. Sure, provide all the software, but provide sensible defaults with the best of breed tools such as GNOME, Firefox, Thunderbird, GIMP, Scribus, OpenOffice.org etc.

We are facing interesting times, and Open Source continues to rattle on and on like a steam train. With the ability to step back and look at our community objectively, we always have the opportunity to determine if we are moving forward in the best direction.

Did you have a similar or different experience? Is simplicity this important? Share your thoughts here…

Uche Ogbuji

Related link: http://www.nelson.monkey.org/~nelson/weblog/tech/python/xpath2.html

I designed Amara XML Toolkit to make the simple things easy and the complex things possible. I’m open to honest, constructive criticism of where I failed in that aim, but I don’t want any misconceptions floating out there.

Cutting to the high-speed chase scene, here is how Nelson Minar can do what he wants in Amara:

from amara import binderytools
doc = binderytools.bind_file("foo.opml")

for outline in doc.xpath("//outline"):
    print outline.xmlUrl

If someone thinks that’s too complex, I’ll be happy to hear ideas of how to make it simpler. It’s 4 lines of code, very similar to the ElementTree example. In my previous blog entry I was under the impression that Nelson really wanted to use XPath on attributes, so I showed how to make that possible in Amara. He somehow misinterpreted that, implying that throwing in such a rule is the only way to parse a document in Amara.

In reality, 90% of Amara users will never need to invoke a special rule while parsing XML. The defaults are generally fine, tuned for speed/space versus functionality.

Amara does let you turn on and off custom behaviors with simple declarative rules, and it lets you tune those rules to be applicable to just portions of a document. I think this is a good way to save users a lot of code. Yes, the downside is that you have to learn the available rules, but that is inevitable, and I’ve always thought it’s easier to read documentation on an existing capability than to write code to reinvent it.

But as I always say, code speaks louder than words, so here is more. Above I challenged folks to show how they could make the Amara bindery example simpler. Well, in my last release of Amara I decided to take on that challenge myself. Amara 0.9.2 introduces the Pushbind. With Pushbind, here is code that does what Nelson wants:

from amara import binderytools
for frag in binderytools.pushbind('outline',source='foo.opml'):
    print frag.outline.xmlUrl

There you go. One fewer line, and the XML looks to all observation like just any other Python object coming in from an iterator. One nice bonus is that it is extremely memory efficient. In fact, it never uses much more memory, in general, than it takes to represent one outline element. This is true whether foo.opml is 1KB or 1MB.
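For readers without Amara at hand, the same one-element-at-a-time idea can be sketched with the standard library's `xml.etree.ElementTree.iterparse`. This is not how Pushbind is implemented, just an analogous technique; the function name `iter_outline_urls` and the inline sample standing in for foo.opml are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for foo.opml.
SAMPLE_OPML = b"""<opml version="1.1">
  <body>
    <outline text="A" xmlUrl="http://example.com/a.xml"/>
    <outline text="B" xmlUrl="http://example.com/b.xml"/>
  </body>
</opml>"""


def iter_outline_urls(source):
    """Yield the xmlUrl of each outline element as the parser completes it."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "outline":
            url = elem.get("xmlUrl")
            if url is not None:
                yield url
            # Drop the element's attributes and children once we are done with it.
            elem.clear()


for url in iter_outline_urls(io.BytesIO(SAMPLE_OPML)):
    print(url)
```

Clearing each element after use keeps the partially built tree small, although the emptied elements themselves stay attached to their parents.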

As an illustration for general users, the following code prints all verses containing the word ‘begat’ in Jon Bosak’s Old Testament in XML, a 3.3MB document, again without ever needing to have the entire document in memory (although there is always the possibility that the loop will outrun Python’s garbage collector).

from amara import binderytools
for frag in binderytools.pushbind('v',source='ot.xml'):
    text = unicode(frag.v)
    if text.find('begat') != -1:
        print text.encode('utf-8') #There's some non-ASCII in ot.xml

I personally think that Pushbind handles just about any of the cases that make people turn to SAX.
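For comparison, here is roughly what the ‘begat’ search looks like as a standard-library SAX handler. It is only a sketch: the `v` element name comes from the example above, while the `VerseHandler` class and the two-verse sample document are invented here.

```python
import xml.sax

# Hypothetical two-verse sample; the real ot.xml is far larger.
SAMPLE = b"<book><v>And Adam begat Seth.</v><v>In the beginning.</v></book>"


class VerseHandler(xml.sax.ContentHandler):
    """Collect the text of every <v> element that contains a given word."""

    def __init__(self, word):
        super().__init__()
        self.word = word
        self.matches = []
        self._buffer = None  # None means we are outside a <v> element

    def startElement(self, name, attrs):
        if name == "v":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver the text of one element in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "v" and self._buffer is not None:
            text = "".join(self._buffer)
            if self.word in text:
                self.matches.append(text)
            self._buffer = None


handler = VerseHandler("begat")
xml.sax.parseString(SAMPLE, handler)
for verse in handler.matches:
    print(verse)
```

The buffer and the inside/outside flag are exactly the kind of bookkeeping that an iterator-style interface hides from the caller.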

brian d foy

This morning I couldn’t ssh into my mail account, at least not from my machine. I went to another machine on a different network (and in a different country) and got in from there. Hmm… looks like DNS is all messed up.

If only it were that simple. The front web page for PANIX says (right now):

Panix’s main domain name, panix.com, has been hijacked by parties unknown. Panix staff are currently working around the clock to recover our domain.

For most customers, accesses to Panix using the panix.com domain will not work or will end up at a false site.

As a temporary workaround, you can use the panix.net domain in place of panix.com. In other words, if you’re trying to log onto “shell.panix.com” or see your mail at “mail.panix.com,” use “shell.panix.net” or “mail.panix.net” instead.

Mail to username@panix.com is currently being redirected to the false site, and should be considered lost or compromised if it does not arrive in your Panix mailbox. If you have online accounts that authenticate via email address, you might wish to protect them against fraud by changing that address to your username “@panix.net”.

Holy cannoli! That means that all of my mail to my public email address, comdog@panix.com, could be going to the wrong place. Someone has virtually cracked the mail server simply by redefining what the mail server is.

I check out the WHOIS record from my home network. It definitely looks wrong: PANIX is a New York City ISP. What’s all this Las Vegas nonsense? The nameservers are in the UK.

Domain Name.......... panix.com
  Creation Date........ 1991-04-22
  Registration Date.... 2005-01-15
  Expiry Date.......... 2006-04-23
  Organisation Name.... vanessa Miranda
  Organisation Address. 1010 Grand Cerritos Ave
  Organisation Address.
  Organisation Address. Las Vegas
  Organisation Address. 89123
  Organisation Address. NV
  Organisation Address. UNITED STATES

Admin Name........... na vanessa Miranda
  Admin Address........ 1010 Grand Cerritos Ave
  Admin Address........
  Admin Address........ Las Vegas
  Admin Address........ 89123
  Admin Address........ NV
  Admin Address........ UNITED STATES
  Admin Email.......... jzoh@yahoo.com
  Admin Phone.......... +44.702413697
  Admin Fax............ +44.7026413697

Tech Name............ Domain Admin
  Tech Address......... Burnhill Business Centre
  Tech Address.........
  Tech Address......... Beckenham
  Tech Address......... BR3 3LA
  Tech Address......... Kent
  Tech Address......... GREAT BRITAIN (UK)
  Tech Email........... admin@powerhost.co.uk
  Tech Phone........... +44.2082496081
  Tech Fax............. +44.2082496076
  Name Server.......... ns1.ukdnsservers.co.uk
  Name Server.......... ns2.ukdnsservers.co.uk

I go to the Internic site to use their whois and get a different answer, and one that has the right nameservers. It seems odd that PANIX would use an Australian company to register their domain.

Domain Name: PANIX.COM
   Registrar: MELBOURNE IT, LTD. D/B/A INTERNET NAMES WORLDWIDE
   Whois Server: whois.melbourneit.com
   Referral URL: http://www.melbourneit.com
   Name Server: NS1.ACCESS.NET
   Name Server: NS2.ACCESS.NET
   Status: ACTIVE
   Updated Date: 14-jan-2005
   Creation Date: 22-apr-1991
   Expiration Date: 23-apr-2006
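The comparison of the two records can also be done mechanically. The sketch below is an assumption-laden illustration: it only understands the two “Name Server” line styles shown above, the record text is trimmed to a few lines, and the function name `name_servers` is made up.

```python
import re

# Trimmed copies of the two records shown above.
HOME_WHOIS = """\
Tech Fax............. +44.2082496076
Name Server.......... ns1.ukdnsservers.co.uk
Name Server.......... ns2.ukdnsservers.co.uk
"""

INTERNIC_WHOIS = """\
Whois Server: whois.melbourneit.com
Name Server: NS1.ACCESS.NET
Name Server: NS2.ACCESS.NET
Status: ACTIVE
"""

# Matches both "Name Server.......... host" and "Name Server: host".
NAME_SERVER = re.compile(
    r"^[ \t]*Name Server[.:]+[ \t]*(\S+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def name_servers(whois_text):
    """Return the set of name servers in a WHOIS record, lowercased."""
    return {host.lower() for host in NAME_SERVER.findall(whois_text)}


home = name_servers(HOME_WHOIS)
internic = name_servers(INTERNIC_WHOIS)
if home != internic:
    print("Name servers disagree:")
    print("  home network:", sorted(home))
    print("  Internic:    ", sorted(internic))
```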

Now here’s the mind bending part of it all: Some networks haven’t seen or aren’t respecting the changed record, so everything works as normal from those networks. Other networks obey the new registration. Some people can send mail to me and I get it. Some people send mail to me and it might end up on a cracker’s machine.

So, which networks get it and which don’t? Which networks include my banks? Even if my banks don’t send me mail with any compromising information included, they do include some information.

It’s even odder though. Even though the shell hosts do not have DNS entries on my home network (the one using the compromised records), the web server address is just fine. It has the same IP number as before, and from another network I can change the web page and see the results. Many other addresses do not even resolve. The compromised records have correct entries for some services. If someone is going to hijack the domain, why would they do that? I see a bit of intent there: is there some sort of extortion involved? There is just enough effect to say “We own you”. If someone really wanted the domain, I think they’d just take over everything.
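Checking which names resolve from a given network is easy to script. The following is a minimal sketch, not a diagnostic tool: `check_hosts` is a made-up helper, `www.panix.com` is an assumed name for the web server, and the example uses a stand-in resolver with documentation-only addresses so that it runs without touching the network.

```python
import socket


def check_hosts(hosts, resolver=socket.gethostbyname):
    """Map each hostname to its IP address, or None if it does not resolve."""
    results = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


# Stand-in for real DNS; these are not Panix's actual addresses.
FAKE_DNS = {"www.panix.com": "192.0.2.10", "shell.panix.net": "192.0.2.20"}


def fake_resolver(host):
    try:
        return FAKE_DNS[host]
    except KeyError:
        raise socket.gaierror(f"no record for {host}") from None


results = check_hosts(
    ["www.panix.com", "shell.panix.com", "shell.panix.net"],
    resolver=fake_resolver,
)
for host, address in sorted(results.items()):
    print(f"{host:20} {address or 'does not resolve'}")
```

Leaving out the `resolver` argument makes the helper use `socket.gethostbyname`, so running it from two different networks would show which records each one sees.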

[and, for those of you playing at home (and since this is the first question I get from people), I’m not using any passwords that are sent over the network. Everything I need to get to has my public ssh identity. I log in to the machine and read my mail with PINE. No web mail, no POP, no nothing. :)]

Uche Ogbuji

Related link: http://www.nelson.monkey.org/~nelson/weblog/tech/python/xpath.html

First of all, here are the three snippets Nelson posted:

PyXML

from xml.dom.ext.reader import Sax2
from xml import xpath
doc = Sax2.FromXmlFile('foo.opml').documentElement
for url in xpath.Evaluate('//@xmlUrl', doc):
  print url.value

My take: this uses the ancient 4DOM code. I expect it to be slow as hell and suck all the memory out of your computer. People, avoid the line from xml.dom.ext.reader import Sax2 like the plague. If there are docs that still suggest it, they really should be fixed. If you do use PyXML, use minidom, but I personally have not been much of an advocate of PyXML in ages.
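For what it’s worth, the minidom route needs no XPath at all. Here is a sketch using the standard library’s `xml.dom.minidom`, with a made-up inline sample in place of foo.opml:

```python
from xml.dom import minidom

# Hypothetical sample standing in for foo.opml.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="A" xmlUrl="http://example.com/a.xml"/>
    <outline text="B" xmlUrl="http://example.com/b.xml"/>
  </body>
</opml>"""

doc = minidom.parseString(SAMPLE_OPML)
# getElementsByTagName walks the whole tree, much like //outline in XPath.
urls = [
    outline.getAttribute("xmlUrl")
    for outline in doc.getElementsByTagName("outline")
    if outline.hasAttribute("xmlUrl")
]
for url in urls:
    print(url)
doc.unlink()  # break reference cycles so the DOM can be freed promptly
```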

libxml2

import libxml2
doc = libxml2.parseFile('foo.opml')
for url in doc.xpathEval('//@xmlUrl'):
  print url.content

My take: as Nelson admits this snippet is very deceptive. It doesn’t show even a fraction of the hair-pulling that would characterize a real-world version of the same code. It ignores the fact that libxml2 forces you to do your own memory management, that it requires very hideous C-ish idioms to work through the XPath results, etc.

ElementTree

from elementtree import ElementTree
tree = ElementTree.parse("foo.opml")
for outline in tree.findall("//outline"):
  print outline.get('xmlUrl')

My take: ElementTree is always a breath of fresh air, but Nelson mentions that he was hampered by the XPath limitations (no attribute axis, for example). Well, there is always some cost to max simplicity, max performance.
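The usual way around the missing attribute axis is to select elements and read the attribute in Python. The sketch below uses today’s standard-library `xml.etree.ElementTree`, whose path support may differ from the standalone elementtree package in the snippet above; the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one folder outline without xmlUrl, two feeds with it.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="Folder">
      <outline text="A" xmlUrl="http://example.com/a.xml"/>
    </outline>
    <outline text="B" xmlUrl="http://example.com/b.xml"/>
  </body>
</opml>"""

root = ET.fromstring(SAMPLE_OPML)
# The path selects elements only; the [@xmlUrl] predicate keeps the
# outlines that carry the attribute, and .get() reads its value.
urls = [outline.get("xmlUrl") for outline in root.findall(".//outline[@xmlUrl]")]
for url in urls:
    print(url)
```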

And out of my corner are the following offerings.

4Suite:

from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseUri("foo.opml")
for url in doc.xpath("//@xmlUrl"):
    print url.value

Here you have 100% of XPath’s power, plus the option to extend XPath in Python, if need be. It’s also plenty fast these days, if not quite as fast as libxml2, and probably not as fast as cElementTree.

Amara

from amara import binderytools
rule = binderytools.preserve_attribute_details(u'*')
doc = binderytools.bind_file("foo.opml", rules=[rule])

for url in doc.xpath("//@xmlUrl"):
    print url.value

Looks very similar to the 4Suite example besides the imports and the declared rule. Amara does not support XPath attributes by default (to save space, similar, I’d guess, to the reasoning in ElementTree), but you can trivially enable them by asserting the above rule. 4Suite has no such limitations, but Amara’s edge is more clearly shown if you’re not using XPath. For example, Amara would allow you to access an XHTML title easily, without needing XPath: print doc.html.head.title. This is what I mean by extreme Python-friendliness. I should point out, though, that Amara’s XPath implementation does have some other limitations, but not any most users are likely to run into.
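To give a feel for that style of access, here is a toy wrapper over the standard library’s ElementTree that exposes child elements as Python attributes. It is emphatically not Amara’s implementation: the `Node` class is hypothetical, it ignores namespaces, and it only ever returns the first matching child.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical wrapper exposing child elements as Python attributes."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for child element names.
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        return Node(child)

    def __str__(self):
        # The string value is the concatenated text of the element.
        return "".join(self._element.itertext())


XHTML = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>"

# Wrap the root in a synthetic container so the access path starts at 'html'.
container = ET.Element("document")
container.append(ET.fromstring(XHTML))
doc = Node(container)

print(doc.html.head.title)
```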

Got code of your own?

Uche Ogbuji

Related link: http://www.nelson.monkey.org/~nelson/weblog/tech/python/xpath.html

Nelson says: There’s the stock Python install, which barely does anything [for XML]. That’s overstated. Plain old SAX and minidom may not be ideal, but they’re useable. Various bugs in PySAX and minidom (see, for example, this article) have unfortunately plagued the standard library, but starting with Python 2.3, I think that they deliver what’s promised. The main problem is that what they promise doesn’t fit Python’s shoes all that well. PySAX’s very literal translation from Java’s class/method callback feels very stilted in a language that now has the likes of generators and nested scopes. I suspect if PySAX were in development now things would be very different. It’s to some extent a legacy problem. I used to recommend SAX to those who need performance, but I think my own recent work (represented in Amara) and that of Fredrik Lundh (in ElementTree) may be enough to render PySAX obsolete. As for minidom, it could do with a lot more Python-friendly sugar so that people don’t have to think in the W3C’s over-elaborated API, but once you get the hang of DOM, you can pretty much do whatever you need with it.
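As one illustration of what a generator-flavored interface over the same parser can look like, the standard library’s `xml.dom.pulldom` hands events to an ordinary `for` loop instead of calling back into a handler class. This is a sketch with an invented sample document and a made-up function name, `outline_urls`:

```python
from xml.dom import pulldom

# Hypothetical sample standing in for foo.opml.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="A" xmlUrl="http://example.com/a.xml"/>
    <outline text="B" xmlUrl="http://example.com/b.xml"/>
  </body>
</opml>"""


def outline_urls(xml_text):
    """Yield xmlUrl values as the parser reaches each outline start tag."""
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT and node.tagName == "outline":
            if node.hasAttribute("xmlUrl"):
                yield node.getAttribute("xmlUrl")


for url in outline_urls(SAMPLE_OPML):
    print(url)
```

The loop body holds the logic that a SAX handler would spread across several callback methods.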

So rather than completely writing off the stdlib XML facilities, as Nelson did, I damn it with faint praise. Not a difference worth much bother? Perhaps. Moving on, here’s Nelson again:

PyXML, which has an ugly hack to confusingly install on top of the default Python libraries. But if you follow the advice of Python’s most visible XML expert, Uche Ogbuji, you may think there’s something wrong with PyXML and install 4Suite instead, which is the same as PyXML only different.

I’ve done a horrible job of explaining 4Suite if people are thinking it’s in any way similar to PyXML. The two could hardly be more different. Maybe Nelson means that the XPath libraries are the same? This isn’t true either. Years ago we did copy the 4Suite code base to PyXML, and it was massaged to make it fit better into PyXML overall. Since then, the XPath in 4Suite has evolved into an entirely different beast: much faster, more extensible, and with a cleaned up API.

Or should you use Amara instead? Fair question. When I developed Amara I considered lumping all that code back into 4Suite, but I thought it better to release it as a separate 4Suite add-on. For one thing, I think it has a very different flavor: focusing on Python idioms rather than what-would-W3C-do (which we’d been peeling away from gradually in 4Suite, anyway).

I think I can make a workable soundbite for the cause: If you’re coming more from a Python background, and XML is just something that’s getting in your way, try Amara. If you’re coming from an XML background, and you think in DOM, XSLT and all that, try 4Suite. Does anyone find that soundbite useful? Based on it, I think Nelson should be trying Amara rather than just 4Suite. I should point out that Amara is very fast as well (and 4Suite has made huge strides from when it was too slow to bear: it’s now very respectable, if not blistering).

ElementTree which is brilliantly fast and simple to use, but limited

Hmm. Several times I’ve made the mistake of claiming some limitation in ElementTree, and then along comes Fredrik to straighten me out. ElementTree is a lot more versatile than one might think at first glance. So why did I develop Amara? Why didn’t I just use ElementTree? I did for a while, and I always felt that ElementTree does a great job of loosening DOM shackles for something more Python-flavored (hats off to Fredrik, who tried to convince me that DOM was not good enough for Python long before I saw the light). But I honestly think ElementTree doesn’t go quite far enough. Amara follows the principle that once I decide to shrug off DOM, I want to be able to use every possible nifty tool in Python’s arsenal to make the XML feel native to the language. I want something closer to Gnosis Utilities Objectify, but using a much more declarative framework. I think that Amara’s unique niche is a combination of extreme Python-friendliness and declarativity. I think that XML without declarativity results in far too much and too brittle code, even in Python.

xmltramp, which is even more hacky.

I’ll risk the flames and be honest. I don’t think xmltramp is (yet) industrial strength. It’s a lot hackier than ElementTree, Gnosis, generateDS, 4Suite or even Amara. It looks and probably feels great in the first foray, but I don’t think that experience will scale to heavy usage. Besides, it doesn’t support XPath.

But what’s missing is a clear single simple library to use.

I don’t believe a single choice is appropriate. I want many options. I think people who want just one way to process XML are limited by sketchy experience with XML. Just like I wouldn’t expect one single library for text processing in Python (and I expect no one would suggest such a thing), I can’t imagine how anyone could shoehorn all the breadth and variety of XML use cases into a single idiom, or even two or three. XML is ridiculously versatile, and this necessitates broad choice. I do a lot with XML and consequently, I often use 3-4 different tools in any given day.

PyXML seems the most standard, but it seems very slow and it tries to be more DOM-like than Python-like. I hate DOM.

I don’t promote PyXML as any sort of standard. To me the only standard is Python’s stdlib and PyXML is not in it. It’s just a choice, and a flawed one for some of the reasons you mention. I think PyXML was important, but has been overtaken by events. I’m not entirely blameless in that matter, and I’m sorry I never had all the energy to work on PyXML as hard as, say, 4Suite, but I think at this point it’s too late.

[with PyXML] from xml.dom.ext.reader import Sax2

Yuck. That’s the ancient DOM code included in PyXML. Many people make the mistake of invoking it. It is dreadfully slow and consumes a dreadful amount of memory. Always use PyXML’s minidom. Just replace the above with:

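# Requires PyXML (Python 2), which supplies the xml.xpath package
from xml.dom import minidom
from xml import xpath
doc = minidom.parse('foo.opml').documentElement
# Print the value of every xmlUrl attribute in the document
for url in xpath.Evaluate('//@xmlUrl', doc):
  print url.value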

You’ll get a lot more speed, but all my other downer comments on PyXML still apply. There are better options.
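
For readers without PyXML, roughly the same result can be had from the minidom that ships with Python itself. This is only a sketch under assumptions: it uses Python 3 syntax, the helper name `xml_urls` and the sample feed URLs are invented, and it walks every element rather than evaluating an XPath expression.

```python
from xml.dom import minidom

# Hypothetical OPML fragment standing in for foo.opml.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="Feed A" xmlUrl="http://example.org/a.xml"/>
    <outline text="Folder">
      <outline text="Feed B" xmlUrl="http://example.org/b.xml"/>
    </outline>
  </body>
</opml>"""


def xml_urls(opml_text):
    """Return the value of every xmlUrl attribute, in document order."""
    doc = minidom.parseString(opml_text)
    try:
        # "*" matches every element, mirroring the //@xmlUrl query.
        return [element.getAttribute("xmlUrl")
                for element in doc.getElementsByTagName("*")
                if element.hasAttribute("xmlUrl")]
    finally:
        doc.unlink()  # release the DOM tree's internal references
```

Calling `xml_urls(SAMPLE_OPML)` yields the two feed addresses; for a file on disk, `minidom.parse` could be swapped in for `minidom.parseString`.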

the awfulness of the libxml2 API

I couldn’t agree more. libxml2 is a miracle of function, but alas in a form that doesn’t suit Python one bit. I know that folks are working on better libxml2 wrappers, but familiar as I am with the C code, I honestly don’t believe they can produce anything truly Pythonesque without losing all the performance gains.

So that’s all the chatter. But code speaks louder, and I’ll offer some in a subsequent entry.

Andy Oram

Most observers see free software development as a vine–and not a
particularly pretty one. They watch as the code for various free
software projects extends tendrils in odd directions, puts down roots
wherever it finds a friendly resting point, and generally seems to
grow with no thought or planning. Linux, for instance, was criticized
a few weeks ago for lacking a “roadmap.”

True, few free software projects can boast a formal structure of user
requirements, schedules, test plans, and so forth. But does this mean
that user requirements are not specified, that the software does not
benefit from thorough testing, and that in general there is no point
of contact between open source and traditional software engineering?

This past weekend I had a talk with Andrew Stellman, a programmer and
project manager, on this subject. He is also working on a book for
O’Reilly along with a co-author, Jennifer Greene. Like many
sophisticated developers, Andrew finds that his job requires him to do
most of his work with .NET and Windows, but he has a great affection
for free software. “When open source is well done,” he told me, “it’s
the best software–no doubt about it.”

In this blog I’ll discuss the ideas Andrew and Jennifer have codified,
and the implications of Andrew’s work for free and open source
software.

Back to software engineering basics

Andrew’s and Jennifer’s ideas for process improvement are familiar
to anyone who has read the classic texts in software engineering. What
they have tried to do is find simple solutions to tasks such as
developing use cases, and to strip down the planning process to the
essentials that 90% of software shops need. User requirements, test
plans, unit testing, and regression testing–those are key to making
and maintaining high-quality software.

User requirements determine what succeeds and what fails. If only one
product offers you the features you need, you’ll use it no matter how
ill-designed or difficult to use it is (and no matter how onerous the
commercial license). On the other hand, you’ll walk right by the
best-designed, award-winning software if it doesn’t do what you want.

Just writing down user requirements, Andrew says, is a great way to
eliminate bugs and speed up the development process. This is because,
done well, it impels the designer to state specifically what inputs
and outputs he or she expects.

He also has an interesting way of summarizing the goals of software
engineering: to ensure that all users’ needs are met.

This is a fairly simple idea when you consider user requirements
documents: only if you specify the features required by the user will
they get into the final product. But the same driving force also
affects testing. Consider this: each different use of software
represents a different path through the source code. An untested path
means a potentially buggy feature, which means a user’s needs that go
unmet. Thus, specific requirements followed by exhaustive testing are
synonymous with meeting users’ needs.

As many have observed, the ad hoc development cycle typical of open
source tends to produce tools suited to programmers, or structured in
such a way that programmers are comfortable with them. Jennifer points
out that most users don’t like the multiplicity of choices in such
tools and just want the software to follow the familiar steps they
already go through to accomplish a task.

Open source–a different approach

While Andrew wishes free software developers would do more of the
things that software engineering experts advise, he is still very
enthusiastic about the quality of free software. Somehow it achieves
many goals of formal software engineering in its own, less structured
fashion. We spent some time exploring this mystery, and came up with
the following ideas:

Mailing lists

Free software projects are conducted openly, which means that
every change is debated on mailing lists that everybody can
join. This effectively allows every user to contribute ideas for
features–and more importantly, makes sure the request is
heard. In proprietary development, feature requests are funneled
up through management channels and inevitably get filtered along
the way. One person, or a few people, may semi-arbitrarily drop a
request. On open mailing lists, by contrast, feature requests
receive whatever attention they deserve (and usually more); those
with merit get implemented. The same goes for bugs.

Parallel development

In Tracy Kidder’s classic The Soul of a New Machine, Data General sets
up parallel development teams racing to produce the best possible
machine. This kind of parallel, competitive development (without the
inhumane pressure) goes on all the time in free software. Two or
three people may be simultaneously working on a new virtual memory
system for Linux, for instance. One person may do his best to
solve the device numbering problem over several years, while
another says, “No, you’re doing it all wrong, I’m going to do it
differently.” Where there’s interest, a wealth of solutions are
available to choose from. (Of course, sometimes there’s not enough
interest among developers and there are few or no choices.)

Widely distributed testing

This phenomenon was described by Eric Raymond in his famous
dictum, “Given enough eyeballs, all bugs are shallow.” This
statement has certainly been disputed (particularly where security
vulnerabilities are concerned) and seems somewhat dubious in the
face of long-dormant bugs that sometimes surface in major free
software projects. But such testing often does work, very
powerfully. Andrew says, “As long as thousands of students are
eager to take each new revision of the Linux kernel and load it on
their hardware, testing is going to make Linux strong. I don’t
know why they’re willing to do it, but it’s great for Linux.”

Design is not as much fun as coding. Almost everybody starts to code
too soon; that’s an industry-wide observation. But hopefully more
highly experienced free software developers will decide it’s time to
try something new and take the next step into design. They’ll write
out those user requirements, based on mailing list feedback. They’ll
lay out tasks for parallel development. And they’ll formalize a lot of
the testing. Then we’ll really see what free software can accomplish.

Uche Ogbuji

Myth #1 that Microsoft might have fallen for: Microsoft’s security woes are just because of its popularity. If Linux or OS X were as popular, they’d be dogged just as badly.

No one has a shred of evidence to indicate that any Linux has more vulnerabilities than Windows, and that the only reason they’re not attacked is that they’re not popular. But lack of evidence has not always interfered with Microsoft’s apparent beliefs. Apathy towards security has the specious advantage of saving some resources. Often it takes no more than the silliest premise to dissuade an organization from necessary investment.

There was a time, years ago, when there were too many vulnerabilities in Linux, especially buffer overflows in the likes of IMAP and SMTP servers, and even the kernel. Guess what? These bugs were heavily exploited, even though Linux was less popular than it is now. Many machines got rooted in those days. But a curious thing happened. Those days vanished.

These days most Linuxen are impressively secure in their default distribution and almost all significant software developers associated with Linux have cleaned up their act (or faced ejection from distributions). It has become much harder to exploit a Linux, even for a determined attacker.

It doesn’t take much grade school logic to figure that, since Linuxes were hit hard back when they were even less widespread than now, the relative present-day lack of malware punch is not because they’re not as popular as Windows.

Myth #2 that Microsoft might have fallen for: people get malware when they do things Microsoft doesn’t approve of, anyway.

It’s so tempting. “These people getting malware are doing things they shouldn’t be doing, so they get what they deserve”. If Microsoft believed this, it would be an effective salve to the conscience. A cynic such as I am considers that Microsoft would rather send BSA paratroopers after people they vaguely suspect of naughtiness than deploy measures to protect the whole class. Of course, no one who has had any experience with Windows can believe for one moment the canard that people only get malware when installing pirate or peer-to-peer software, or legit software by shoddy vendors (interestingly enough, the vendors usually cited are Microsoft’s competitors).

I could tell a thousand stories, but one will do for anecdote. I set up Windows XP for my parents-in-law. Pretty run-of-the-mill custom PC. I patched it to the nines using Windows Update. A lot of work, but it’s what we kin techies do. When I was done my son wanted to play around with it, so I opened IE (I was taking a break before installing Firefox and hiding all traces of IE) and wandered on-line to his favorite spot: hotwheels.com. Trouble is, I misspelled the Web site name. I don’t remember the exact misspelling, but I do know that as soon as I saw the resulting page, I could tell there was trouble. The resulting cascade of trojan spyware was spectacular. Looking at “Add/remove software” listed some twenty of them. On a lark, I tried nuking them all using all the measures I could–uninstalling, removing directories, etc. Emptying a lake with a teaspoon. I had to start all over again with a reinstall.

This is what can happen with one erroneously entered URL. I’ve seen similar effects from an aunt who clicked one of those “download these cute smileys for your e-mail” ads, and countless other examples. It’s not hard to imagine how close each keystroke/mouseclick brings Aunt Hattie to MalHell. Oh no. Malware victims are ordinary people doing perfectly ordinary things, and being cruelly punished for it.

Microsoft must recognize this to some extent, considering they’ve now pledged seriousness to anti-Malware. After all, why would they offer Penicillin to Corsairs? But is such a misperception possibly part of the reason they waited so long to take action?

As a Linux user married to an OS X user, malware is not something I worry much about. But I’m kin techie for many other households, and Windows security problems affect me all too painfully. If mythology helps to fuel Microsoft’s lagging response, I hope I can do what I can to help debunk the silly myths.

Ming Chow

Related link: http://news.zdnet.com/2100-1009_22-5534064.html

The news couldn’t be any worse this week for technology happenings directly relating to the US Government.

Back in October, Amit Yoran left his post as director of the National Cyber Security Division, part of the Department of Homeland Security (DHS).

Now, only a few months after that October departure, another prominent member of the National Cyber Security Division, Robert Liscouski, announced his resignation earlier this week.

Add to that the news that the FBI is leaning toward shelving its multi-million-dollar file-sharing software (Virtual Case File), intended to help combat terrorism.

All this news makes the cloud on government-related IT ventures only darker.

Now I am not going to rip on “what is wrong?” I am not even going to go there. I do want to ask a very simple question: will cybersecurity be properly emphasized and respected, if ever?

Some things I know for sure:

  • Throwing money at the cybersecurity problem isn’t going to work.
  • Sure, we are all concerned about terrorism, and nuclear proliferation. But a cyberattack on some of our power plants and other utilities is also devastating (recall the massive blackout in the Northeast a while back).

Some other things I know:

  • Cyberattacks, including viruses, spyware, worms, and Trojan Horses are becoming more sophisticated and lethal.
  • There is a plethora of public tools funded and sponsored by the US Government, including the Common Vulnerabilities and Exposures database (maintained by MITRE) and various communications by the US Computer Emergency Readiness Team (US-CERT).
  • In general, the public doesn’t have a clue on cybersecurity. The number of infested, flawed, and buggy computers and software is mind-boggling.
  • Crackers broke into the T-Mobile network, and e-mails belonging to the Secret Service were read, along with other highly sensitive files. This was another woe that was announced this week. I think this issue is big enough to make the government understand the importance of cybersecurity.
  • There are companies that are helping out in this problem (e.g. Microsoft and Symantec).

There are lots of unknowns as well, for example, what are the upcoming projects and goals of the cybersecurity division of the DHS?

I am very passionate about these issues, and I hate reading such woeful stories in the news. I may not have all the answers to solve “the problem,” but I can offer some pointers and considerations for the Cyber Security Division to think about:

  • VISIBILITY! Announce and promote current initiatives, upcoming projects, and breaking news to the public. Right now, the group looks so buried under all the bureaucracy.
  • I think one of the biggest problems in moving forward and explaining all the issues to the public is that the public is too scared. Well, you have to be honest with them and demonstrate the vulnerabilities (which is not too difficult to do if you have a PC on the network).
  • Ask yourself: are you too decentralized to achieve your goals? If so, would it be ideal to hand off the powers to a roundtable of security experts/firms?
  • Consider a standard channel of communication/announcements (although I admit this will be nearly impossible). Right now, the public is bombarded with information from various sources and companies on problems and, most importantly, on what to do. This is not necessarily a good thing. Have all the sources and companies send major announcements to one integrated channel so people will know where to get the latest information. It will ultimately lead to important “visibility” of the division.

Of course, the ultimate goal is to get people to care, which seems to be light-years away.

Well?

Related link: http://dangillmor.typepad.com/dan_gillmor_on_grassroots/2005/01/dave_winer_defe.…

Dear Mom,

I know you and Dad worry about me sometimes, making it on my own in the world. It can be tough being a parent, wondering if you raised your kids correctly, hoping that I haven’t packed it all in and taken up motorcycle racing for a living.

In a recent news.com interview with Bill Gates, responding to a question about more and more people advocating patent and copyright reform, the man himself said:

There are some new modern-day sort of communists who want to get rid of the incentive for musicians and moviemakers and software makers under various guises. They don’t think that those incentives should exist.

You know that I have concerns that a copyright system in which nothing has passed into the public domain in nearly 80 years isn’t really serving the public good. You also know that I worry that a patent system that elevates mere ideas–despite implementation differences or concurrent innovation–into legal mechanisms to attack competitors is bad for free markets.

Don’t worry, though. I didn’t buy the motorcycle and I’m not a communist.

The only person who can call me a communist without starting an argument is my friend Melanie and that’s because she doesn’t really believe it (also, she’s a lot cuter than Bill Gates). I don’t particularly want to own the means of production; I wouldn’t know what to do with them.

In a comment on the Dan Gillmor weblog linked above, Stephen Downes responded with something I wish I had said:

If Gates came out and said, “Well, open source represents the preservation of private property and of open marketplaces,” people would wonder why Microsoft campaigns so hard against it. But if it’s ‘communism’ then it’s something everybody can understand as evil.

Mom, I think an economic and political system that grants monopolies based on ideas and allows the holders of these virtual monopolies to use the full investigative and judicial force of the government to maintain that monopoly isn’t exactly free market capitalism. Sure, it’s not communism, but when I’m the one arguing for fewer government-enforced monopolies and greater competition, it’s easy to see who actually favors free markets.

I’ll call you and Dad soon. I promise.

Love,
your son

Have you called your mother lately?

David Sklar

Do the designers at Apple truly have a monopoly on small, stylish, quiet desktop computers? If I want a small, non-ugly, non-noisy computer to run Windows or Linux on, what are my options? Do I have any options?

The Shuttle XPC is too big and its shape is too obtrusive. My scanner is 9 x 13 or so, but only an inch high, so it’s easy to have in an accessible location but be invisible when I’m not using it.

Are the “Ultra Small Form Factor” chasses (chassises?) from Dell, IBM, etc. the best I can hope for?

[Brief Update: After I viewed this entry, the context-sensitive AdSense ads at the bottom of the page had an ad for Simplified Innovation — which has some OK small case options.]

Where can I get a small, stylish, quiet PC on which I can run Windows or Linux?

Andy Oram

I started to get interested in robotics when I realized that an
intimate relationship with a robot would probably be part of my life
at some point. In Japan, robots are already being used in elder
care, and the
Neurobotics Laboratory
at Carnegie Mellon University illustrates research to develop robots
that can aid disabled people.

Why is this so important? By the time my generation retires, there may be
about three working people for each one of us, and two of those three
could well be employed taking care of us. We need to automate in order
to remain productive.

I will explore the social implications of what journalist
Phillip Longman
calls the global baby bust (Foreign Affairs, May/June 2004)
at the end of this article, but first I’ll report on some of the
interesting activities going on currently in robotics, based on a
conversation I had last weekend with
Geoffrey Gordon,
a research scientist focusing on reinforcement learning at CMU. To
me, Geoff described his specialty as applying algorithms found in
Artificial Intelligence to robotics.

To get some idea of the mind-boggling constellation of skills that go
into robotics, try browsing the
current projects
of the CMU Robotics Institute.

Software and networking

One major branch of CMU research goes into helping robots understand
their environment by communicating with other robots and drawing
conclusions from their combined data. Examples include figuring out
how an area is lit or heated from the values of sensors scattered
across the area. (Lighting is a popular research subject because it’s
so easy for the researcher to control.)

Algorithms for this kind of distributed machine learning can get
surprisingly complicated. Some traditional AI algorithms assume that
samples can be collected in arbitrary order, but the dynamic changes
of robotic systems, with actors in constant motion, mean that the
order in which events occur must be respected more.

Among the constraints is keeping communication to a minimum, because
communication uses much more power than the CPU.

In adapting typical AI algorithms such as neural networks, one has to
deal with unusually high failure rates of individual network
nodes. Sensors in real-life environments are fragile, and their
wireless communications are subject to noise and interference.

Links between nodes in real-life sensor networks are asymmetric and
changeable in the quality of communications. It’s important to know
the quality of connections in order to find the most robust network
architecture, with the least transmission costs, that gives each node
the greatest chance of getting the data it needs.

While mesh networks (relatively unstructured collections of systems
with multiple redundant connections) are getting a lot of publicity, Geoff’s colleague Carlos
E. Guestrin has found that hierarchical trees work better for many
applications. Each node exchanges its data with nodes above and
beneath it; experimentation shows that each node ends with a
reasonably accurate approximation of what is happening in its
environment. The tree scales better than a mesh, and one can predict
the overhead required for each operation. A tree, however, has to be
programmed to reconfigure itself quickly, because the node failures
already mentioned can cause major breaks in the tree.
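
As a rough illustration of why a tree makes the overhead predictable, here is a minimal sketch of one up-and-down aggregation pass. It is not Guestrin’s algorithm: it assumes each node holds a single numeric reading and that the quantity of interest is the network-wide mean, and the names `SensorNode`, `collect`, `broadcast` and `aggregate` are invented for the example.

```python
class SensorNode:
    """One sensor in a hypothetical tree-shaped network."""

    def __init__(self, reading, children=()):
        self.reading = reading          # this node's local measurement
        self.children = list(children)  # nodes directly beneath it
        self.estimate = None            # filled in by the downward pass


def collect(node):
    """Upward pass: return (sum, count) for the subtree rooted at node."""
    total, count = node.reading, 1
    for child in node.children:
        child_total, child_count = collect(child)
        total += child_total
        count += child_count
    return total, count


def broadcast(node, value):
    """Downward pass: hand the combined value back to every node."""
    node.estimate = value
    for child in node.children:
        broadcast(child, value)


def aggregate(root):
    """Run both passes; return the mean and the number of messages used."""
    total, count = collect(root)
    mean = total / count
    broadcast(root, mean)
    # One message up and one message down per link, and a tree of
    # `count` nodes has exactly `count - 1` links.
    return mean, 2 * (count - 1)
```

For a tree of five nodes the pass always costs eight messages, whatever the shape of the tree; a real system would also need the quick reconfiguration described above when a node drops out.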

The problems of peer-to-peer data collection in robotic systems are
philosophically interesting, because one realizes that no single actor
can possess the whole truth, and that one’s understanding of the truth
is always restricted and distorted. In fact, the term “distorted” is
misleading, because it suggests that there is some absolute, ideal
truth we are approximating, a concept that is not particularly
helpful.

Hardware

Commoditization is a common theme in business and consumer
electronics, but sometimes it invades academia as well. Lab
researchers don’t always want to build everything from scratch, even
if doing so makes a machine that’s cheaper or more appropriate to the
application. Geoff Gordon would rather install something off the shelf
and get down to the work of his specialty faster.

Intel makes a low-cost chip called the
XScale
for embedded systems, used in many robots. But it is not well suited
to Geoff’s applications because it lacks a floating-point unit.
Activities such as mapping need trigonometric functions. While Geoff
could work around the missing FPU with such things as tables, that
would waste time and introduce the risk of bugs. So Pentiums rule the
robots. Linux is plenty small for these embedded systems, even without
custom recompilation.
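
For a sense of what the table workaround involves, here is a minimal sketch of an integer-only sine lookup. The one-degree resolution, the fixed-point scale of 2^14, and the names `QUARTER_WAVE`, `sin_fixed` and `cos_fixed` are all assumptions made for illustration; nothing here comes from Geoff’s code.

```python
import math

SCALE = 1 << 14  # fixed-point scale: the value 1.0 is stored as 16384

# Built once, ahead of time, on a machine that does have floating point.
# Only the first quarter of the wave is stored; the rest follows by symmetry.
QUARTER_WAVE = [round(math.sin(math.radians(d)) * SCALE) for d in range(91)]


def sin_fixed(degrees):
    """Sine of a whole number of degrees, scaled by SCALE, integers only."""
    d = degrees % 360
    if d <= 90:
        return QUARTER_WAVE[d]
    if d <= 180:
        return QUARTER_WAVE[180 - d]
    if d <= 270:
        return -QUARTER_WAVE[d - 180]
    return -QUARTER_WAVE[360 - d]


def cos_fixed(degrees):
    """Cosine via the identity cos(x) = sin(x + 90 degrees)."""
    return sin_fixed(degrees + 90)
```

Even this small version needs four quadrant cases and a scaling convention that every caller must remember, which gives some idea of where the extra time and the risk of bugs come from.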

I asked for impressive examples of commercially available robots, and
Geoff expressed a high opinion of the Sony
AIBO
toy. He said it was gratifying to see such a powerful and
well-designed robot in everyday settings.

Applications

I asked Geoff whether any big breakthroughs seemed to be imminent in
robotics. He said ruefully that one aspect of AI and robotics is that
big breakthroughs often seem imminent, only to prove much more
difficult than researchers expected. Progress tends to affect some
deep aspect of the problem. For instance, he suggested that, in his
own field of reinforcement learning, researchers were learning how to
abstract elements of reinforcement. This essentially is the production
of generalized algorithms and libraries that can be applied to many
areas that have, up to now, reinvented each other’s wheels.

Navigation is a well-known robotic task. Various kinds of robots are
tested in regular
Robocup
soccer matches. On a much larger scale, the Department of Defense
sponsored a robot race last year from Los Angeles to Las Vegas. (Don’t
worry, regular traffic is banned from the roads used by robots.) CMU’s
Red Team
did the best of all the contestants last year, going seven miles
before its Humvee went over a bump and was left with its drive wheels in the
air. I noted that much of the race took place in the desert, and
wondered whether, if the DoD sponsored such a race forty years ago, it
would take place in the Florida Everglades. Don’t the historians say
one always fights the last war?

I’ll end by describing some research into aids for the disabled, the
topic that began this article. Robots are being incorporated into motorized
walkers, so that a small pressure can direct the walker in the desired
direction. Some researchers are even trying to figure out how to tell
whether someone is falling and to support her.

A deeper intervention into a disabled person’s environment is a set of
sensors that can tell if the person is disoriented and wandering
around an apartment. Hopefully these sensors can kick off some kind of
intervention to help anchor the person. We are used to thinking of
robots understanding and reacting to physical space and events. But
some research goes further, where they understand and react to
psychological states as well.

AI may provide guidance here. One of Geoff’s students created a model
that reproduced how rats were found to learn new behaviors in some of
the classic experiments. Machine logic and biological logic here
converged. But it will not be within my lifetime, I expect, that a machine
can reproduce the way I have woven the rambles of my lunchtime
conversation with Geoff into this blog.

The Need for Robots

Robotics is a fascinating field, and my lunch with Geoff just opened
up new questions for me. The title of this article, therefore, refers
not so much to machines that learn as to my personal learning about
machines.

As I mentioned at the beginning of this article, the world’s
population is expected to peak in the next century. The peak has
already occurred in the economically developed countries, and the
shortfall in labor is being made up with immigration. Within a couple
generations, if civilization survives, the rest of the world is
expected to take the same course.

In a well-known book, The End of Work, Jeremy Rifkin raised
the fear of billions of people displaced from the economy by
automation. That could well cause a catastrophe in the first half of
this century, but in the second half we might find that the potential
workforce is far too small.

The current debate over Social Security is a distant ripple from that
tsunami. Congress argues over funding. And while that issue is
important, it won’t solve the problem unless young people adapt by
buying a lot of cans of food and storing them in their basements for
fifty years. Realistically, somebody in future generations is going to
have to produce all the goods that the huge numbers of old people need
to buy. The only way to provide a decent lifestyle for a population of
declining size is to vastly increase productivity, and robotics
promises to play a key role.

brian d foy

I looked back at 2004 to see what was new and cool. I have to remind myself that 2003 mostly didn’t exist for me, so a lot of people found out about these things much earlier than I did. It’s almost the middle of January, so it’s time to stop thinking about this, too. Some things should have made the list, but they have wormed their way so far into my head that I don’t even think about them anymore.

RSS

Syndicated web sites were nice, but not all that useful to me until I started using a news aggregator. I really don’t like using a web browser or clicking through links, and NetNewsWire doesn’t make me. Aside from the annoying feeds that don’t include the whole article, that’s worked out pretty well for me. Netflix tells me when they’ve shipped movies, I can catch up on my LiveJournal friends, I can scan the Reuters headlines, and keep up with random people who rarely have new content. Lately my favorite feed has been Brand Autopsy.

Netflix

What do I like more? DVDs by mail, or giving Blockbuster the finger (which I can do from my home office window since it’s two blocks straight down the street)? Why choose? I get DVDs in the mail and give Blockbuster the finger.

I did try Blockbuster.com’s service though. I had the same experience as rollick of LiveJournal: in three weeks I got six DVDs from Blockbuster.com. In the same time I got roughly 20 DVDs from Netflix. Who cares if Blockbuster is cheaper by the month when they take a week to get a movie to you? Netflix has one-day turnaround in Chicago.

Subversion

CVS is an annoyance I tolerate, but Subversion cures all of that. Too bad more places aren’t using it, and too bad it’s not available for SourceForge projects. This time, O’Reilly is ahead of the curve and already has a Subversion book.

SQLite

I download databases and share them because they’re just a file. A file. One (as in uno, ein, un, ichi) file! The files can get big, but everything is still in one (as in uno, ein, un, ichi) place.

DBM::Deep

Persistent multilevel data structures in Perl, without the pain and thought. I don’t have to know which DBM implementation is installed or which module to use. Very cool.

Camera phone / PDA

I got a Nokia 3650 early in the year. It was supposed to be the phone all the cool kids were getting, and as usual I came in at the tail end of that craze. I like it, I guess.

iSync is great, even if the software on the phone kinda sucks. The phone calendar application doesn’t handle multiple calendars from iCal and flattens them into one calendar. If I try to sync a change on the phone back to iCal, everything gets screwed up. I use the iCal event location field for a lot of different things (especially ticket confirmation numbers) since I know that field shows up on the phone. Kinda sad, actually.

Even though the Nokia calendar and contacts applications are pretty crappy, I’ve simply lowered my expectations and learned to live with it. I gave my fancy Handspring Edge to my wife. I only ever really used it for the address book anyway.

The email application and file browser that came with the phone suck too, but ProfiMail is much better, although I can’t replace the built-in email application with it (i.e. for the Send > via email option). I think it’s really just the Nokia interface that I don’t like. I have to push too many buttons to do things I commonly do.

It has a crappy little camera, but I still take a lot of pictures with it. I do maintain a moblog, but more interestingly, I’ve started taking pictures of things I want to remember, like book covers and advertisements. Normally I’d have to write these things down.

Gmail

Google’s free email service is so cool that people are literally begging to use it. I still have people emailing me to beg for a Gmail invitation. This post is going to get me a few more of those, but they shouldn’t feel bad: it’s not that I don’t like them, I’m just pretending they don’t exist. Still, even after saying that, people will send me flowery prose about why they should get one of my invitations.

The service does rock, though. I forward all of my mail there for easy searching.

Apple Airport Express

Everyone likes the Apple Airport Express mini-mes because they can stream iTunes all over the house. I like it because printer sharing just works. I had our home printer hooked up to my FreeBSD machine (running CUPS) for a while, but my wife’s iBook never liked that. I had the printer hooked up to my Pismo Powerbook for a while, but the printer would drop off the network for no reason or the Pismo would hang on something. With the Apple Airport Express, it just works, for everybody, all the time. That’s the most important feature of my home network: zero tech support calls from my wife or cats.

War driving

War driving was cool three years ago, and even though I knew about it, I didn’t do it because I don’t need to find a wireless connection and I don’t have a car. However, one day in Pasadena I was riding in the back seat of my brother-in-law’s car and trying to get some work done. I hadn’t turned off my Airport card, and as we passed a Starbucks I got a dialog asking me if I wanted to join the network. Um, no, but thanks anyway. I pulled up iStumbler and just watched it. It pinged every time it found a network, which was about once a minute. As we drove past Caltech, it went nuts. That’s cool.

Since then I’ve tried it in a couple cities I’ve had to travel to. I’ve even tried it on the Chicago El a couple times, but I stopped lest I get my ass kicked by a couple of homeless guys who want to sell my Powerbook for crack.

I did get a keychain WiFi detector, and it worked very nicely for a week. I left it behind in a rental car because I didn’t have a keychain to attach it to.

Spam Assassin

I tried Spam Assassin for a while, and I’m not using it anymore. It’s cool if you’re a techie and want to fiddle with a bunch of things and write code to do things, but I don’t want to do that. I don’t want to have to re-train my mail filter to recognize spam. I don’t want to think about it at all. My mail provider doesn’t allow user_prefs anyway. SpamBouncer does the right thing without me doing anything, but Spam Assassin demands quality time. This would probably be different if I were the sysadmin responsible for mail or didn’t know procmail, but I’m not and I do, so it isn’t.

The Perl Review

Oh, I’m publishing this magazine. I ditched LaTeX in favor of InDesign. Sometimes open source just doesn’t cut it. I finally decided I wanted to publish a magazine, not write software to publish a magazine.

Bluetooth

One of the best things is also one of the worst things. Bluetooth lets my phone talk to my computer and my headset, but only one at a time. I don’t have the wires, but I do have to disconnect and re-connect a couple times just to do something different. This isn’t the way that Bluetooth is supposed to work, but it’s the way the implementors use it. I’m only allowed to do one thing at a time.

IT Conversations

Ever wondered what gets said at all the conferences you can’t afford to go to? IT Conversations gives me MP3 recordings of them all. Now I know that Tim O’Reilly has a stump speech and that a lot of conferences are really, really dull. There are some gems though: Wil Wheaton, whom I dismissed as a washed-up has-been from a TV show I didn’t like, is actually a very entertaining speaker, and IT Conversations has two hours of him. The interviews with the internet old-timers, like Leonard Kleinrock, are also cool, not only for the content but for the moderated passion that comes with experience and perspective.

TiVo

TiVo actually makes TV interesting. I’m not a big TV watcher (we muddle through with an old 29-inch CRT), but now I can watch every episode of the Simpsons that comes on in a day. Well, I used to do that and now I just delete them because that got really old really fast. We do record The Daily Show and watch it when we want instead of sticking around the house around midnight to see it, and I’m ticking movies off my Netflix queue by finding them on the air at odd times of the morning. I’ve got 90 hours of space on my Series 2 TiVo just waiting for data.

The coolest thing, however, is the pause and rewind features. Sure, fast forwarding through commercials is okay, but stopping the action completely so I can talk on the phone, or rewinding to catch some missed dialog (“Eh? Whadda say? Speak up boy!”) are much better features for me.

We got the lifetime subscription, and we’re about to earn out of that.

Atomic clocks

I got some of these wireless clocks that supposedly sync with the national atomic clock radio signal to synchronize the time. A couple are really fancy and have temperature displays. I haven’t seen any of them sync properly yet, but they do run on batteries and stay on when the power goes out, so that’s something.

Kevin Shockey

Related link: http://www.osdl.org/newsroom/press_releases/2005/2005_01_10_beaverton.html

Living in Puerto Rico and learning of open source made me curious. What is the state of open source in the world? After doing some research, I was impressed with what I found, and I now believe that the biggest adoption rates of open source will be in China, India, Africa, Eastern Europe, Latin America, Southeast Asia, and the former Soviet republics. As an example, today another Chinese Linux group joined the Open Source Development Labs (OSDL).

Red Flag Linux, the leading developer of Linux software in China, has joined OSDL and will participate in the lab’s Desktop Linux (DTL), Carrier Grade Linux (CGL), and Data Center Linux (DCL) working groups. This comes as no surprise to Stuart Cohen, CEO of OSDL. According to Cohen, “OSDL is committed to the continued growth of Linux in what many experts believe will become one of the world’s largest software markets: China.”

One very simple way to look at my prediction is by examining the numbers. China and India represent more than a third of the world’s population. When you factor in the other areas I mentioned, the percentage probably rises to nearly three quarters of everyone on earth. Many of these areas are ripe for adoption of FLOSS, with a strong motivation for autonomy, high piracy rates met by strong efforts from companies to combat them, and struggling economies that will appreciate the low entry cost of FLOSS.

OSDL Membership

By the way, if you are wondering how to join the OSDL, visit their “Join OSDL” page to get the details. For $1,000 your organization can become an Institutional Member. For this membership fee, you receive access to technical mailing lists and technical sub-groups, and the ability to join committees and attend working group meetings. Institutional members do not, however, receive any voting privileges. Probably the biggest benefit of membership is access to and influence with Linux developers, vendors, and other member institutions. Membership also helps the OSDL fulfill their mission of accelerating the use of Linux for enterprise computing, which makes membership all the more important.

What do you think of the globalization of open source?

Kevin Shockey

Introduction

In a previous life I worked closely with the President of a small wireless carrier. During a “Brown Bag” lunch session with him, I heard one of the best descriptions of business. During the lunch one of his employees had a question involving obtaining more money for her department. His response will always stay with me. First he lamented that more than anything he wanted very much to grant her request. He further explained, however, that he gets many similar requests.

The Pie Metaphor

He then introduced the pie metaphor. He said that we should imagine that a company’s finances were like a pie. By this he meant that the company currently had a fixed revenue stream. This revenue stream represented the pie. Now this pie, he explained, has already been divided into pieces. Therefore, he said, I would love to give you more pie, but you need to tell me whose piece to make smaller. He further explained that there was no way (at that time) for him to make the pie any larger.

Lessons

There are some very powerful lessons in this story. In general it gives us a simple view of how management sees not only a business but also the various needs of the organization. Managing the budget for a company is like serving pie to everyone on New Year’s Day. Everyone that wants (or deserves) a piece of pie gets one. When everyone is finished there is usually nothing left but the crumbs. This perspective is useful, but here is what you really need to know.

The Pie Hack

First, there is a way to get more pie. All you have to do is bake a bigger pie, or even better, bake two. If the company has a larger revenue stream, then you can get a bigger allocation in the budget. Now this is where you come in. I do need to provide a disclaimer though. What I’m sharing now is what is possible. I’m not saying it is going to be easy to make it happen. I’m also not saying that you won’t meet some resistance. I’m merely indicating the way to get more pie.

As a software developer or system administrator, your challenge is to figure out how to increase your company’s revenue. If you increase revenues and make sure everyone is aware that you are responsible for the increase, then I assure you that you can ask for more pie. Even better though, you are very likely to get a bigger piece.

Here are a few ideas that you can use to try to increase revenues (another disclaimer: you need to know your business. If you don’t know your business and how it makes money, then get out into the organization and start talking to those who do know what makes your company tick):

  • Create an internal system that helps your sales department sell more.
  • Create a system that helps your accounts receivable department collect faster or more efficiently.
  • Create a system (or actually get your customer relationship management system working) that permits your customer service center representatives to cross-sell and up-sell to your existing customer base.
  • Innovate along with your marketing department to create a new product.
  • Use business intelligence to improve management’s visibility of key financial trends.

If you are responsible for critical systems, your options are fewer, but don’t despair: there are some ways to make a difference. Basically, you need to improve your system uptime and performance. By making sure critical systems that support revenue-generating departments are available and responsive, you are indirectly helping the company increase revenue. Remember, as corny as it sounds, generating revenue is almost always a team effort. Yes, a good sales force and a well-prepared company can still sell if the systems are not there. It does, however, make their jobs significantly easier if the systems are in top shape. Calculate the difference in sales when your systems are not available. Then use this difference to justify your contribution to the bottom line.

This leads me to the second lesson. Your existence in the organization is for a purpose. I guarantee you, if you did not contribute to the bottom line in some form, you wouldn’t be there. So your challenge is to completely understand your contribution. This information is critical in any type of negotiation, whether over budget allocations or salaries. You can use the “value” calculation presented above and/or any other situation where your contribution makes or saves money. If your contribution improves the numbers, then that is your impact.

Once you understand your “value,” you next need to understand how your organization currently perceives your value. The best place to look is the annual budget. Get a copy and get someone to teach you how to read how the budget is allocated. The amount of the budget that is allocated to your department is your perceived value. Any discrepancy between your calculated value and your perceived value is your leverage. This assumes that your contribution is more than your perceived value. If it isn’t, don’t kid yourself. This situation will not exist for long. As soon as the company doesn’t make the numbers, you can bet they will start looking for ways to save money. If your department fails the value comparison, expect some changes.
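As a rough illustration of the value comparison, the arithmetic might be sketched like this. Every figure and name here (the hourly sales, the hours of downtime avoided, the budget allocation) is a made-up placeholder; plug in the numbers from your own organization.

```python
# Hypothetical figures for illustration only.

def downtime_cost(sales_per_hour_up, sales_per_hour_down, hours_down):
    """Sales lost while the systems were unavailable."""
    return (sales_per_hour_up - sales_per_hour_down) * hours_down


def leverage(calculated_value, perceived_value):
    """Gap between what you contribute and what the budget says you are worth."""
    return calculated_value - perceived_value


# Assumed: sales run at $2,000/hour with systems up, $500/hour with them down,
# and better uptime avoided 40 hours of downtime this year.
recovered = downtime_cost(2000, 500, 40)

# Assumed: the department's budget allocation (its perceived value).
budget_allocation = 45000

print(recovered)                               # sales protected by uptime
print(leverage(recovered, budget_allocation))  # positive means room to negotiate
```

A negative result from `leverage` is the warning sign described above: the budget already values the department at more than it can show it contributes.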

Finally, Jan Carlzon (the Swedish Tom Peters) says that “All business is show business.” This is the focus the leaders of your company have. So how do you think you will be able to get them to pay attention? You need to put on a show. You need to show what your contribution is. You need to scream it, you need to sing it, you need to praise yourself until they stand up and take notice. Then once you’ve got their attention, make them understand how much more money you can make them if they establish a training budget for your department, set up an open source laboratory, or buy the software developers new high-performance machines. Make them feel that they are ignoring potential revenue and you may see their attention grow.

Conclusion

The pie metaphor gives you a useful way to relate to management. Everyone knows that any tool that accomplishes that has to be powerful. Once you understand the metaphor, then you are equipped to benefit from your knowledge. In the end though, if all else fails, then use the knowledge of your organization and the relative (lack of) value of your sister departments to tell management what they asked for. Tell them whose piece should be smaller and substantiate the recommendation with your value analysis. If they don’t give you more pie, keep asking. Sometimes it is she who puts on the biggest show that gets more pie.

Have any good management hacks to share?

Uche Ogbuji

Related link: http://www.emacswiki.org/cgi-bin/wiki/CopyAndPaste

The above link probably has enough information, in disjointed form, for you to cobble together a working solution, but in a nutshell what worked for me with GNU Emacs 21.3-17 in Fedora Core 3 was adding the following to my .emacs:

(setq x-select-enable-clipboard t)
(setq interprogram-paste-function 'x-cut-buffer-or-selection-value)

Why these settings aren’t the default on Linux completely bewilders me. How does anyone conceive that the average user will not want reliable cut and paste between their editor and other apps? Certainly most users will not be willing or able to go through what I did to get this sorted out.

The problem that I was seeing was that sometimes middle mouse click would work for cut and paste, and sometimes it wouldn’t. It was very erratic, even within the same Emacs session. Every now and then using the keyboard rather than the mouse, or using the menu, or some other random variation would seem to help ephemerally. I’ve seen numerous reports from other users with this same problem on Emacs lists and elsewhere, and I almost invariably see either no response, or some comment from Emacs gurus completely discounting the poor user’s problems. Clearly this is a widespread problem that needs some attention in the Emacs communities.

I had never been able to use the typical GNOME clipboard operations at all in the past (e.g. yank from Emacs then Ctrl-V to Firefox/Mozilla/Gedit or Shift-Ctrl-V to gnome-terminal). I’ve seen similar reports from KDE users.

With the above lines in my .emacs, cut/copy/paste now seems reliable with middle mouse, and I can indeed use GNOME clipboard ops as expected in GNU Emacs. In XEmacs (21.4.15) paste to XEmacs works fine from clipboard, though explicit clipboard events do not seem to work from XEmacs to other apps. Middle mouse click does work better, so that’s good enough for me. Besides, I’ve migrated from XEmacs to GNU Emacs because XEmacs does not support nXML, James Clark’s superlative XML mode. If you do use XEmacs the top link includes some suggested variations for XEmacs users.

Overall, I hope that XEmacs, GNU Emacs and their various distributors work out the kinks so that such a simple matter as cut and paste is not a matter of medieval alchemical incantations. Until then, I hope my notes are of help to someone else.

Ming Chow

Related link: http://www.microsoft.com/downloads/details.aspx?FamilyID=321cd7a2-6a57-4c57-a8bd…

An interesting way for Microsoft to start off the new year indeed. The application is called Microsoft Windows AntiSpyware, currently a beta version. The download page touts the new program as:

[Windows AntiSpyware (Beta) is] a security technology that helps protect Windows users from spyware and other potentially unwanted software. Known spyware on your PC can be detected and removed. This helps reduce negative effects caused by spyware including slow PC performance, annoying pop-up ads, unwanted changes to Internet settings, and unauthorized use of your private information. Continuous protection improves Internet browsing safety by guarding over 50 ways spyware can enter your PC.

This was obviously an eye-opener when I first saw the news on Slashdot. I am highly curious about this move by Microsoft. In particular:

  • Is Microsoft’s business going reactive? First, they finally decided to incorporate a (more visible) firewall in the operating system, which is not a novel concept (Linux and Mac OS X were already ahead of the game). I say that the firewall in SP2 is “more visible” because there actually was a way to enable a “firewall” in the original XP, though not really publicized to the general public. Now, a spyware removal program, which again, is not a novel concept. What’s next, Microsoft’s own anti-virus program?
  • How good is this new spyware program compared to Ad-Aware, Spybot, etc., which already exist, are popular, and are reasonably effective? Will this put companies such as Lavasoft (publisher of Ad-Aware) out of business?

The most important question I have is:

  • What does this say about Microsoft? The message that I am getting from this move is: “Hey, Microsoft Windows is a highly vulnerable operating system, and spyware and viruses can easily propagate through it.” My point is, if an operating system is reasonably secure and designed correctly, then such an anti-spyware application and a plethora of third-party security utilities are not necessary. Even worse, Microsoft admits that their new spyware removal program is buggy. Buggy software on top of buggy software is not a good thing.

My skepticism about Microsoft’s move may sound a bit harsh. The good thing about this is that Microsoft is taking the initiative to do something about the spyware problem. Most importantly, “not a lot of people understand what spyware is or how to contain it, that should change when a computer giant such as Microsoft brings it to the attention of the masses” (thanks St. Clown).

Kevin Shockey

Related link: http://www.pewinternet.org/PPF/r/144/report_display.asp

According to a new report by the Pew Internet & American Life Project, blog readership grew 58% in 2004 and now stands at 27% of Internet users. Some other great estimates from the report include:

  • Only 5% of Internet users are estimated to use RSS to receive news and other information delivered from blogs and content-rich Web sites.
  • 12% of Internet users have posted comments or other material on blogs.
  • Still, 62% of Internet users do not know what a blog is.

Some rough conclusions from this report:

  • Blog readership will continue to grow as more Internet users learn about the technology. Therefore, the news of blogging’s demise is greatly exaggerated.
  • RSS use will grow significantly as more Internet users learn about this technology. Expect even more readers to emerge and the over-all blogging and RSS landscape to become even more crowded.

Do you think blog readership will continue to grow in 2005?

Kevin Shockey

Introduction to Management Hacks

This is the first of a series of articles that is similar to the new O’Reilly Hacks series. These hacks provide an insider’s look at the mysteries that lie behind management, or what I like to call “The Dark Side.” I’m sad to say that I crossed over to the dark side for about 7 years and even worked closely with the Emperor of a company, i.e. the President/CEO. Although I am deeply scarred, I survived the experience and I’m here now to share my knowledge.

These hacks will range from observations about the reality behind certain situations, to suggestions for improving your relationship with your boss or employees (depending on your circumstance), to tips that help you understand business and the role you play.

Keep your Customers Informed

This first hack goes both ways. It is equally useful going up the chain of command as well as down. Maybe you have heard of the theory that comes to us from the customer service arena. A customer will generally wait longer and be more content with the final result if they are kept informed. This is fairly simple to grasp if we place ourselves in that situation. How do you feel when you are waiting on a result and you have no clues about the status, whether any problems exist, or when you will receive the result? Well, if you are like me, you feel frustrated, anxious, and most of all powerless. As the time increases before you receive information, these feelings increase. They continue to increase until you begin to feel anger and resentment.

The simple solution to alleviate these feelings is information. By communicating status or the expected completion date, we dramatically reduce the strong feelings that come from a lack of information. As a Software Development Manager, I remember senior executives explaining to me that as long as they knew when to expect the product they were satisfied. They typically did not care as much whether the product was going to be late or any of the reasons why it was going to be late.

However, it is important to recall that most people are not idiots, so the information must be sincere. Keep it real! In general though, providing feedback is an invitation to discuss the situation. If they desire a different result, then this exchange provides the opportunity. Usually though, I think you will find that people will be satisfied with just a minimal amount of feedback.

Some Quick Applications

Some quick applications that you can try immediately will illustrate this hack:

  1. While you are previewing messages in your in-box, you discover that someone has sent you an e-mail that requires your response. You know they want a response so respond immediately. As soon as you finish reading the message, take about 10 seconds to acknowledge that you have received their message and will be responding when pigs fly or whatever you want to say. You will be amazed how grateful they will be and how relieved you will be from such little effort.
  2. If you are trucking along working on something and an emergency arises, send a quick message to your “customer” to let them know that you have been side-tracked, but you will return as soon as you can to continue work on their deliverable. Again this should take around 10 to 30 seconds, depending on how well you type.

Free Your Time

I’m sure you are asking “Why bother?” Why should you waste your time? Easy, this is a hack because once you complete this simple response you are typically free to go about your business and do whatever moves you. Obviously you need to comply with your target date or your promises, but in the meantime wouldn’t you like to keep perfecting the process of how to build Mono from source on MS Windows (or whatever)?

You can clean your plate by responding to everyone you owe and leave it up to them to pursue more information. Trust me, by the time you finish the last make command, they will not even have begun to feel the anxiety they would have felt if you hadn’t thrown them this bone. So it’s up to you. If you like people constantly stopping by your cubicle bugging you for information, stay silent. In the meantime, I’m working on that thing you asked for, but it won’t be ready till next Tuesday. See you then!

Know any good management hacks?

Andy Lester

I don’t always expect computer journalism to be of the highest quality,
but Bob Evans’ column in the latest issue of InformationWeek (12/20/2004)
has turned into the print equivalent of a talk radio show about spam, providing a
non-critical platform for any old idea in the guise of public forum.
I understand that the web’s like that, but I expect a bit more from
print magazines. (I’ve
called out InformationWeek before
for equating “extreme programming”
with “pair programming.”)


Under the false headline “Readers’ Ideas Take A Bite Out Of Spam”, Bob
Evans prints letters from readers on how to get rid of spam.
Space is wasted on pointless ideas of retribution (“the
old English Navy used fleet whippings…”), but then he lists, unchallenged, some technological suggestions:

first, “Isn’t there a way to send a reply message to the spammer saying
that the address is no good?” (thank you, James A. Olson); second,
“… an E-mail tool that simply returns the message to the source, with
a header that says something to the effect of, ‘not interested’” (thank
you, George Archibald); and third, “Instead of servers just filtering
and dropping spam out of E-mail, send each unwanted message back to the
spammer with a message, ‘Returned to Sender’” (thank you, Bob Bucciferro).


That’s fine, but where’s the analysis? At the very least, Bob should have pointed out that none of these ideas will work because the spammer has no reason, other than basic human decency, to not bother you. He should also have pointed out that yes, there is a way
to tell the spammer the address is no good: a 550 response code
in the SMTP transaction. The spammer gets it and simply tries another email address to see if that one is valid.
This approach is called a dictionary attack.
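To see why a “this address is no good” reply helps the spammer rather than hurting him, here is a toy simulation. It sends no network traffic; `rcpt_to`, `probe`, and the mailbox names are invented for this sketch, and only the 250 and 550 reply codes come from SMTP itself.

```python
# Toy simulation: no network traffic, just the reply codes an SMTP
# server gives to RCPT TO. The mailbox names are made up.

VALID_MAILBOXES = {"alice", "bob"}  # hypothetical local users


def rcpt_to(address):
    """Return the SMTP reply code a server would give for this recipient."""
    local_part = address.split("@", 1)[0].lower()
    if local_part in VALID_MAILBOXES:
        return 250  # recipient accepted
    return 550      # mailbox unavailable: the "address is no good" reply


def probe(candidates):
    """What a dictionary attack learns: every address that did not draw a 550."""
    return [addr for addr in candidates if rcpt_to(addr) != 550]


guesses = ["aaron@example.com", "alice@example.com",
           "bob@example.com", "zed@example.com"]
confirmed = probe(guesses)
print(confirmed)
```

Every 550 costs the sender nothing, and every address that survives the probe is now known to be deliverable.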


Then, more unworkable solutions:

… Greg Litchfield suggests that “each ISP charge its customers one-half
penny per E-mail sent [and delivered], all fees are paid in advance,
[and if you] want to send a million E-mails … ante up the $5,000”…


… John Lepant says we should “simply charge for E-mail: right now
it’s a free service: I can send 10,000 of these messages, or 10,000,000
or just one for the same price. Charge 1 cent per E-mail, and spam
will evaporate.”

Again, no commentary from Bob. Anyone who’s vaguely aware of the issues surrounding spam knows that it’s impossible to get all ISPs to do something, and that legislation is not global. It’s impossible to get “each ISP” to “simply charge for E-mail.”


(It’s also worth noting that all these ideas are good solid
CYJs: “Can’t you just…?”
If you think you can “simply return the message to the source” or
“simply charge for E-mail”, then you’re not thinking the problem through.
However, you can rest assured that others have.)


For real questions and answers, here are some sites:

I wouldn’t mind so much if InformationWeek weren’t aimed at
“440,000 Business Technology professionals”, the higher
level IT executives who may well believe these things.
How about providing some real news about real progress? How about covering
SPF or
SpamAssassin?
Certainly, content-based spam filtering is no panacea, but it’s a damn sight better than “just tell them I’m not interested”.
Providing a
platform for discredited ideas does nothing to take us forward in dealing
with the problem.

Jacek Artymiak

If you want to upgrade your OpenBSD boxes and have as few problems as possible, download the Jan 02 OpenBSD 3.6 snapshots from the nearest FTP mirror. Make sure you test them before you put them to production use.

Uche Ogbuji

Related link: http://www.pault.com/pault/pxml/xmlalternatives.html

Yeah yeah I’m the XML cheerleader and all that, but I’m all for diversity and ecumenism. Paul Tchistopolskii, one of my favorite agitators, has updated his nice list of XML alternatives, including similar markup languages (and some completely different), XML short-hand formats and XML subsets.

Derek Sivers

Related link: http://www.postgresql.org/docs/7.4/interactive/plpgsql.html

I just made my first PL/pgSQL function - it wasn’t so hard! This one auto-generates ISRC codes for songs.

THE SITUATION:
We generate ISRC codes for musicians who request them, and have to make sure that (1) they fit the proper format and (2) they are unique.

THE FORMAT:
2 characters for the current year (05)
3 characters for our company code (PROBLEM! see below)
5 characters for a unique serial-ascending integer

THE PROBLEM:
Our company generates more than 100,000 ISRC codes a year, so our “company code” of HM2 filled up! They had to assign us a new company code: HM8. Then that filled up! So we got one more: HM9. Hopefully that will last us a while (300,000 songs a year)

So we have to start with HM2, and assign song-digits up to 99999. Then switch it to HM8, reset the counter, and count up to 99999 again. Then again for HM9. I wanted all this to be done in the database automatically, without depending on my PHP or Ruby logic script. So… time to write my first PL/pgSQL script! I had heard it was pretty easy, so I just put aside an hour of focused attention.

Here’s how it turned out. (NOTE: this will make more sense if you read the CREATE TABLE below, first, and then the function.)


CREATE OR REPLACE FUNCTION current_isrc_company_code() RETURNS char(3) AS '
DECLARE
    current_code char(3);
    current_number integer;
BEGIN
    -- Look at the most recently issued ISRC.
    SELECT INTO current_code, current_number code, number FROM isrcs ORDER BY id DESC LIMIT 1;
    -- Empty table: start with the first company code.
    IF current_code IS NULL THEN
        RETURN ''hm2'';
    END IF;
    -- Serial is full: move to the next company code and restart the sequence.
    IF current_number = 99999 THEN
        IF current_code = ''hm2'' THEN
            current_code := ''hm8'';
        ELSIF current_code = ''hm8'' THEN
            current_code := ''hm9'';
        END IF;
        SELECT INTO current_number setval(''isrcs_number_seq'', 1, false);
    END IF;
    RETURN current_code;
END;
' LANGUAGE plpgsql;


-- USAGE: INSERT INTO isrcs (song_id) VALUES (12345);
-- It auto-creates the rest.
CREATE TABLE isrcs (
    id serial,
    year char(2) not null DEFAULT SUBSTRING(CURRENT_DATE, 3, 2),
    code char(3) not null DEFAULT current_isrc_company_code() CHECK (code='hm2' OR code='hm8' OR code='hm9'),
    -- five-digit serial, so the largest allowed value is 99999
    number serial not null CHECK (number < 100000),
    song_id int not null REFERENCES songs(id) ON DELETE CASCADE,
    CONSTRAINT unique_isrc PRIMARY KEY(year,code,number)
);
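For anyone who wants to check the rollover rule outside the database, here is a minimal Python sketch of the same idea. It is not the code we run; `next_isrc_parts` and `format_isrc` are names made up for this example. It assumes the five-digit serial described above and starts each code at 1, the way a serial column does.

```python
# Company codes in the order they were assigned to us.
COMPANY_CODES = ["HM2", "HM8", "HM9"]
MAX_SERIAL = 99999  # largest value that fits in five digits


def next_isrc_parts(last_code, last_serial):
    """Return (code, serial) for the next song, given the last one issued.

    Pass (None, None) when nothing has been issued yet.
    """
    if last_code is None:
        return COMPANY_CODES[0], 1
    if last_serial < MAX_SERIAL:
        return last_code, last_serial + 1
    position = COMPANY_CODES.index(last_code)
    if position + 1 >= len(COMPANY_CODES):
        raise RuntimeError("all company codes are full")
    return COMPANY_CODES[position + 1], 1


def format_isrc(year, code, serial):
    """Join the pieces in the order listed under THE FORMAT: year, code, serial."""
    return "%02d%s%05d" % (year % 100, code, serial)


print(next_isrc_parts("HM2", MAX_SERIAL))  # rolls over to the next code
print(format_isrc(2005, "HM2", 42))
```

Unlike the database function, this sketch raises an error once HM9 is full instead of quietly reusing it, which is probably what you want to know about before the primary key complains.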
