January 2005 Archives

brian d foy

Computer power is all about monitor size. I remember moving aside a 13-inch monitor and hooking up a 21-inch monitor to my Quadra 650. It felt like my computer was 4 times faster.

Now I have a 17-inch external display (has anyone else noticed that Apple uses “display” and everyone else seems to use “monitor”?) that I picked up from CompUSA on a special promotion: they are selling the Benq FP731 with $40 off as an in-store incentive and $80 of manufacturer rebates, for a final price of about $180. I checked the reviews of the monitor: it didn’t get bad reviews, so for the price I figured I could risk getting burned.

Now I have this nice and bright monitor putting my Powerbook display to shame. It’s the difference between washing your whites with your colored clothes, then having someone who knows something about laundry come along and wash the whites separately. The Benq is bright: 260 nits bright. I don’t know what a nit is, but this display has 260 of them and it makes my Powerbook whites look dingy grey. I even checked to ensure the cats hadn’t been stepping on the brightness keys (it’s that and Num Lock that they always seem to hit).

Curious things emerged when I added the display. The Powerbook detected the display immediately and things just worked. I had put the display to my left and arranged the displays in the control panel to match their physical arrangement. When I did that, I couldn’t get to my Dock anymore! I like the Dock on the left side. Now when my cursor wandered that way it kept wandering onto the other display. It was a bit confusing because my mind told me that I wanted to get to another application, and my hand did the right movements, but my cursor ended up on the other display while I was trying to remember what I was doing.

I re-arranged the displays so the external one was “under” the Powerbook’s. I think I have the going-down stuff in my head, but from the external display I move to the right thinking I’ll end up on the Powerbook display. Instead I end up activating Exposé, since I use the hot corners for that. Exposé doesn’t have much to do on the external display, but all the windows on the Powerbook display move around.

I’m curious what will happen as I get used to this arrangement, then leave the external display behind as I travel. Will I keep trying to make the cursor fall off the bottom of the screen?

Still, despite my cognitive obstacles, my computer all of a sudden feels more powerful. I don’t have as many overlapping windows because I have more real estate to deal with. If anyone wants to donate one of the large Cinerama displays, I can report back on whether the perceived increase in power is linear or exponential.

Nitesh Dhanjani

Related link: http://www.oreillynet.com/users/files/102971/keynote-test.swf


Just got my copy of iWork. Among other new features, Keynote now exports to Flash. If you are interested, click here to download a sample 2-slide presentation that I exported to Flash (.swf). As you can see, the export feature works as promised. I just wish there were an option to easily add UI components. Perhaps Apple will add this in a future update?

Ming Chow

Related link: http://www.cs.tufts.edu/~mchow/excollege

Back in December, I announced that I am teaching a course entitled “Security, Privacy, and Politics in the Computer Age” offered by the Experimental College at Tufts University. The course is open to all Tufts undergraduate students, regardless of area of study. This coming week will be my second full week of class. Here are a few notes about my experiences so far:

  • I am exceptionally pleased with how things are going. The students and the responses that I have received are tremendous.
  • I was very worried on my first day of class about enrollment: only six students showed up. An Experimental College class must have a minimum of eight students or else it will be canceled. The following week, 15 students showed up and officially registered for my class, which was tremendous. One reason for the low initial turnout was that many students had not yet returned to campus for the first day of classes. Another major factor in the increase was word-of-mouth advertising. What a difference a week makes!
  • Many students were afraid that my course would be too technical or “that you needed to be good with computers.” Of course, that is not the premise of my class, and I allayed the students’ fears by saying so outright in class.
  • Only a few students have some technical knowledge, which works to my advantage, considering that non-technical students were the intended audience for my course. Many students are majoring in the humanities or a social science (e.g. English, Psychology, Economics, International Relations).
  • Students said they were interested in my course because “they want to know more about computers” and many also recognized that computer security threats are growing.
  • I asked a series of entertaining and preliminary questions to the class. See news item 0004 on the News section of my course website. In short, all students have used Windows and Macs. Only a handful of students (about 5) have worked with UNIX or Linux. All students have received a computer virus or some kind of malware in their lifetime. Finally, all students have different personal privacy preferences.
  • My first full lecture was on the basic software development life-cycle. I had a very engaging activity in class where I divided the class into two groups (developers and Quality Assurance) and one CEO. I spoke very little about programming and programming languages, but it seems that the students have a good idea of the software development methodology.
  • My last lecture was a bit more subtle. I discussed proprietary vs. free vs. open source software. One problem I encountered is that many students were not aware of open source software or what it means. Many students said they were accustomed to popular packages such as Adobe Photoshop, Microsoft Office, and AOL Instant Messenger, and many did not know that there were alternatives to those popular packages. Many students initially did not even know what source code was.
  • I have already assigned some homework. My homework assignments are very straightforward. I have already graded my first set of homework, and I was very pleased with what the students did. See my first homework assignment on the Assignments section of my course website. In general, many students were honest with their answers (e.g. they didn’t write things that they didn’t know), and they used their common sense.

Jacek Artymiak

Related link: http://www.devguide.net/books/openbsdfw-01-ed/index.htm

I’m close to finishing my first book for O’Reilly. Yes, it is about firewalls. It is a continuation of BFWOAP2, which is a continuation of the original BFWOAP.
Someone suggested I should make BFWOAP2 available in PDF, which I’m a little reluctant to do. But another person suggested that a PDF version of BFWOAP might be a good teaser for those considering a purchase of BFWOAP2 or the O’Reilly book.

OK, you want it, you got it. It’s not free, but it is a PDF without DRM. What? No DRM? Yes, I trust you.

Jono Bacon

A little while back, I decided I wanted to learn Python. Knowing some local Python fanatics, I was not exactly uneducated in the relative merits of the language, and although I mainly used C++ and Qt for GUI programming, I was increasingly wanting to write some GNOME software with Python. Although Qt made GUI programming easier, it seemed to me that Qt was mainly suited towards writing huge, complex tools; the kind of tools that Trolltech showcase on their website. I don’t deny that Qt is an incredible product, but I was keen on writing software more easily.

I decided to make the move over to Python and started reading Dive Into Python. Well, let me be completely straight here - I only really read a few chapters of the book before I was eager to get on and just write code. So, I trawled the Internet and read a few bits of code, and before I knew it I was writing my own little programs. At this primitive point in my learning, I figured that I should really set myself a test program to write so that I could at least work towards a project. In my experience I had tended to abandon many of these little test programs, but I was determined to write something of use. The more I read and learned about Python, the easier it seemed to use, and I thought, what the heck, I will be adventurous. Right…

The project I decided to work on was, in retrospect, remarkably adventurous. I, like many other gadget freaks across the world, own an iRiver. These little MP3/OGG players are renowned for their capabilities, and are well liked in the Open Source world for their ease of use with Linux. Hooking one of these players up to Linux involves two steps. Firstly, you plug it into your USB port, and if you are running a distribution with Project Utopia, a little window will pop up to display the contents of the iRiver’s hard disk. You can now drag over your songs. After you have done this, the second step is to update the database on the iRiver. To do this you can use a special little command line program called iRipDB that has been written to generate an iRiver database file. This database basically contains details of the tracks, song length, genres and other information that can make the iRiver more pleasurable to use. Although these two steps are fairly simple, I was keen for the database generation on the iRiver to be as simple as possible. So, in my infinite wisdom of knowing Python for around four hours, I decided I would write a GUI front end to iRipDB called GNOME iRiver.

At this point, I faced some distinct challenges:

  • I needed to figure out GUI programming - I had never done any GUI programming with Python. I knew I needed to use PyGTK, but how so? Also, I was utterly uneducated in how Glade fitted into the picture.
  • I needed to figure out how to interact with the iRiver - how would my little program talk to the iRiver? Was this simple or did it involve some freaky Python bongo?
  • I needed to figure out how to run iRipDB from my program - was this a tough challenge? I suspected there would be a Python command to run a separate process, but how complex was this?
  • Oh, and I still needed to learn Python itself. Best not forget that one.
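
The third challenge, running a separate process, turns out to be small in current Python. The following is a minimal sketch, not code from GNOME iRiver: the helper name `run_tool` is made up for illustration, and because iRipDB’s actual command-line arguments are not described here, a harmless stand-in command (the Python interpreter itself) is used instead.

```python
import subprocess
import sys


def run_tool(command, timeout=60):
    """Run an external command and return (exit_code, stdout, stderr).

    `command` is a list such as ["some-tool", "arg"]; passing a list
    avoids shell quoting problems.
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        timeout=timeout,      # give up if the tool hangs
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in for the real tool: a GUI front end would pass the path to the
# database generator and whatever arguments it needs (not shown here).
code, out, err = run_tool([sys.executable, "-c", "print('database updated')"])
```

A GUI front end would typically call something like this from a button handler and show `out` or `err` to the user depending on the exit code.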

It seemed the odds were stacked against me, but I pushed on. The first step was to understand how to create a GUI program. In the C++ Qt programming world, this involves creating your user interface in Qt Designer and then using some C++ black magic to re-implement the generated class so you can write slots that hook up to the signals emitted by your interface. Although certainly usable, this way of working is not exactly easy for the novice programmer, and I suspected Glade had a similar way of working.

Not so. In the Glade scheme of things, it seems that I could create my interfaces in Glade and then use a tool called simple-glade-codegen.py to process the .glade file and spit out a file in which I could write my Python code. The script would generate all the function frosting that was required when creating connections between signals and functions in Glade. This in itself got me up and running in no time; I had a GUI program running in 5 minutes.

Dealing with the hardware was my biggest fear. Whenever I have considered programming with hardware on a Linux system before, I have always assumed it to be a hugely complex maze of code. I am pleased to report that times have changed substantially. The recent work going on in the HAL and DBUS camps has given us the ability to easily connect to the HAL daemon and find out properties about the hardware on the system. These properties include all detectable information, as well as the ability to augment this information with special device information files.

With Python support available for HAL/DBUS, I plodded on to understand how to use the technology. Unfortunately, there was virtually no documentation on how to use HAL/DBUS with Python, and the only documentation was the HAL Specification, which is written in a language that assumes you already know everything about HAL anyway. I did skim through the spec, but it was pretty much lost on me, being the numpty programmer that I am. I subsequently turned to the HAL mailing list, which is populated by some notable hackers, each of whom is very responsive. I first posted to see where to begin, and the resultant information, combined with the code from the HAL Device Manager, gave me the opportunity to get started. I sent some later posts asking for some more details, but the real help came from David Zeuthen on IRC. David walked me through the process of searching for specific devices. I plan on writing up this process sometime soon for the benefit of others.

With the hardware support working in literally a few days, I was utterly impressed by how simple it was to deal with hardware with HAL. I then cracked on to implement the last remaining bits and pieces to actually make the program work. After a few mails from people who had read about my progress with GNOME iRiver on my blog, I decided to release an early (but buggy) version to see what these people thought. They reported success.

Astute readers of my previous work will know that I place a lot of importance in simplicity. This is part of the reason I have made certain technical decisions in my work, and this is the reason why I have written quite a lot about the importance of simplicity and usability. With these beliefs in mind, I see no reason why simplicity and usability cannot extend to the programmer. Although programmers provide a translation layer to convert between the technicality of code and the importance of functionality and usability, this translation can be made easier if simplicity is engineered in the programming side of the wall.

This experience has demonstrated some important principles in development for me. Firstly, the ease of use of Python, and the fact I don’t need to worry about many of the pure mechanics of coding (such as declaring types, memory management etc) lowers the bar for people to get involved in coding. I have never considered myself a fantastic programmer, and I have always had to work to understand concepts that natural programmers can execute in their sleep. Although I feel I understand the program design/usability side of the fence better than the programming side, Python has managed to make this translation process easier.

Simplicity breeds simplicity. If a culture of simplicity is engineered into the nuts and bolts of software authoring, it is likely that the core values of simplicity and usability will transcend to each new layer in the software stack. Take a look at GNOME for example. In recent years, a lot of emphasis has been placed in engineering good usability into the platform. This culture has been propagated by programmers such as Seth Nickell, Nat Friedman, Miguel de Icaza, Joe Shaw, Jeff Waugh and others. What we are seeing now is a definitive peer requirement that usability is well considered at every step of the development process.

This same kind of concept appears to be implemented inside the Python community. With the core Python language setting the bar for ease of use, each of the peripheral language additions and tools has a clearly defined level of ease of use to match. We can see this in how HAL/DBUS are implemented, how PyGTK works, and how it all works with Glade. It would be pointless if Python were an easy to use language, but each of the tools built around it involved baffling levels of complexity.

It is great to have experienced such a simple and effective programming environment. With recent developments such as XUL, Mono, Ruby and other languages, a fundamental focus seems to be placed on the availability of quality high level development tools. I remember some years back hearing a barrage of snobbery about higher-level-than-C languages, and I am pleased to see this generic view has shifted. By making toolsets such as the one I have discussed in the article available, we are opening up the game to a lot more people. Sure, you may not want OpenOffice.org written in Python, but coding is not as black and white as small and large applications.

Have you had a similar experience? Should the bar be lowered? Share your views here…

Nitesh Dhanjani

Related link: http://www.mikematas.com/blog/2005/01/how-to-make-life-poster.html

Just came across this tutorial by Mike Matas, which shows you how to create and order a 20″ x 30″ poster through iPhoto. I just followed Mike’s instructions and placed an order for a poster consisting of 98 pictures from my iPhoto Library. Can’t wait until I get it. Pretty awesome idea!

Kevin Shockey

Related link: http://www.medsphere.com/media/press/20050125.rbw

At this year’s Open Source Business Conference, Geoffrey Moore will pronounce that open source has crossed the chasm. Since I haven’t had the privilege of hearing what he will use as proof of this conclusion, I’ll reserve my judgment. I hope Doug Kaye over at IT Conversations covers the OSBC this year just in case I can’t make it. Recall that technologies begin in the early market, but must cross a chasm before joining the mainstream market and receiving widespread acceptance.

I will share something I’ve been watching closely for the last year. I’ve been on the alert for any venture capital deal that involves a company related to open source. A few that you might already be aware of are JBoss, MySQL, and GlueCode. The latest to join their ranks is Medsphere Systems Corporation. My belief is that tracking the flow of venture capital money into these companies will signal how well open source business models are doing. The obvious connection follows: the more money that flows into open source projects and open source communities, the quicker open source will enter the mainstream market.

Medsphere enhanced the Department of Veterans Affairs’ (VA’s) highly acclaimed, open source VistA EHR to develop Medsphere OpenVista for the commercial market. Over 15,000 physicians and 56,000 nurses in more than 1,300 healthcare organizations, including 160 medical centers and 850 clinics, are currently using VistA.

The apparent business model for Medsphere seems to be fairly tried and true. It is based on the delivery of services. Their current service offering includes deployment, training, support, and custom development. It is not clear whether their software is still open source, only that it was based on the original VistA. I didn’t find enough information about the software to draw any conclusion.

Note: There is a WorldVistA project hosted at SourceForge.net, and an OpenVista project as well (no files there, though). I do not know what relationship the Medsphere software has with the WorldVistA project.

Hear of any other deals lately?

Uche Ogbuji

Peter Sefton’s “Hacking Open Office” comes at an energetic time for OOo. I’ve been pointed to ooo2dbk, a tool for generating DocBook XML from OpenOffice.org documents, which opens up even more XML possibilities. Let me also mention my complementary article to Sefton’s, which covers some additional topics such as XML Catalogs.

Jacek Artymiak

I’m thinking of starting my own podcast. I have a book to finish first, but when I’m done I’d like to set up my own little podcast corner studio and have a go at it, see what happens. Yesterday, feeling a little dizzy after all the medication I’ve been taking recently, I gave up on editing my writing and went online to check out what I might need to start. My shopping list includes the following:

  • a microphone — that’s easy, I’m going for an inexpensive Shure stick I found online for a reasonable price.
  • a microphone stand — my local music gear dealer will make a few zlotys on that one.
  • a mixer — I’m going to try the inexpensive Behringer gear with two microphone inputs. Can’t find them in the local stores, so I’ll order one online.
  • cables — damn, more cables… my wife won’t be happy when she sees more cable around the apartment.
  • headphones — I already have a pair of light studio headphones.

I will also need some sound effects, background music, and loops. I can get by with free stuff found on the Internet, but I really like the commercial sound effects libraries.

And I’ll need a host with plenty of disk space and no monthly transfer limit. For a reasonable amount of money.

So, summing up, the people who will make money on podcasting will be: manufacturers of MP3 players, microphones, mixers, cables, headphones, and all other audio gear, audio software (not sure about that one, Audacity is free), ISPs, hosting providers, sound effects and music libraries, and possibly sound engineers and studios who may want to make extra money helping podcasters achieve broadcast quality. Oh, yes, publishers are no doubt working on podcasting books as I type.

But what about the podcasters themselves? One idea would be to sell past shows for download money or publish them on CDs. But that is so old and tried. And it doesn’t work that well. Maybe a better idea would be to sell subscriptions to the new shows and give the old ones away? Or combine both approaches and provide a small window of free download opportunity? Who knows. Oh, yeah, the most popular shows will be able to sell ad space, but that’s still a long way ahead. Podcasting is still a geek toy; it may be a couple of years before ordinary people discover the joys of audio without pre-programmed music, created by people who talk about the things other people actually want to hear about. But it will happen. Cool!

Derek Sivers

Wouldn’t it be nice if we could just tell the customer, at time of purchase, “We’ll ship this thing you’ve ordered at the lowest rate possible, and bill you exactly what it turns out to cost.” But nooooo…. people want you to PREDICT what it’s going to cost!

RIGHT NOW : CD Baby only sells CDs, only from one warehouse, so we can predict the shipping cost pretty easily.

BUT SOON : CD Baby will have multiple warehouses in different countries *and* allow some musicians to ship items directly from them to the customer *and* allow the customer to split up their order to have some items sent to multiple addresses.

SO… guess what we have to know now?

#1 - what country the customer is having this item shipped to

#2 - warehouse_stock for this item - (to know closest warehouse that has it)

#3 - what shipping methods are allowed from that warehouse to their country (fedex, usps, etc)

#4 - cost to ship that item from that warehouse by that method to their country

#5 - which of those shipping options (from #3) the customer chooses

#6 - what other items in the order are coming from the same warehouse, so that, for example, we can give a discount if 50 different items are going from one warehouse in one shipment

#7 - how much of a discount to give, for that. (or, what the ship-cost for that many items is)

Haven’t written the code, yet, to calculate all of this. It’s a little daunting.
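
For what it’s worth, here is one way the seven inputs above might fit together. This is only a rough sketch in Python, not CD Baby code: every name, warehouse, rate, and item below is made up for illustration, and the “discount” in #6 and #7 is modelled as a simple first-item price plus a cheaper price for each additional item in the same shipment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    warehouse: str
    country: str
    method: str
    first_item: int  # cents to ship the first item (#4)
    extra_item: int  # cents for each additional item in the same shipment (#7)


# Hypothetical data: which warehouses stock each item (#2),
# and which warehouse is closest to each destination country.
STOCK = {"cd-1": {"US", "JP"}, "cd-2": {"US"}, "cd-3": {"JP"}}
NEAREST = {"CH": ["US", "JP"]}
RATES = [
    Rate("US", "CH", "usps", 500, 100),
    Rate("US", "CH", "fedex", 2000, 200),
    Rate("JP", "CH", "ems", 1200, 150),
]


def assign_warehouses(items, country):
    """Group items by the closest warehouse that has them (#1, #2, #6)."""
    shipments = {}
    for item in items:
        for warehouse in NEAREST[country]:
            if warehouse in STOCK.get(item, set()):
                shipments.setdefault(warehouse, []).append(item)
                break
        else:
            raise LookupError(f"{item} cannot be shipped to {country}")
    return shipments


def methods_for(warehouse, country):
    """Shipping methods allowed from this warehouse to this country (#3)."""
    return {r.method: r for r in RATES
            if r.warehouse == warehouse and r.country == country}


def quote(items, country, chosen):
    """Total cost in cents, given the customer's method per warehouse (#5)."""
    total = 0
    for warehouse, group in assign_warehouses(items, country).items():
        rate = methods_for(warehouse, country)[chosen[warehouse]]
        total += rate.first_item + rate.extra_item * (len(group) - 1)
    return total


total_cents = quote(["cd-1", "cd-2", "cd-3"], "CH", {"US": "usps", "JP": "ems"})
```

Working in whole cents sidesteps floating-point rounding; a real system would also need per-item weights, currencies, and split-address orders, none of which are attempted here.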

Been there? Done that?

Derek Sivers

In 1.2 million CDs sold, I had never thought of “shipments” as a separate table. I always thought of it as attributes of an order, but if we’re going to be sending things from multiple locations, we’ll have multiple shipments for an order, each shipment with its own attributes, so…

SITUATIONS:
* - some items in an order backordered, so they’re shipped later
* - FedEx as main shipping method, but one backordered CD sent USPS later
* - one CD sent from Japan, one CD sent from Canada, both to a person in Switzerland
* - customer orders only one “merch” item, which the musician sends themselves: “shipment” is what that merchant tells us it is
* - some items in an order sent from our warehouse, some from self-ship merchant

new idea: make a table to keep track of shipments. a shipment has its attributes that need to be kept track of, and the lineitems in an order just link to this shipment.

This kinda turns our existing internal model upside-down, but the more I think about it, I realize it makes a lot of sense. I love it when you come across things like this that make you look at your system in a whole new way.


CREATE TABLE shipments (
id serial PRIMARY KEY,
warehouse_id int REFERENCES warehouses(id) ON DELETE RESTRICT,
address_id int not null REFERENCES addresses(id) ON DELETE RESTRICT,
date_shipped timestamp(0) with time zone,
shipped_by varchar(8),
ship_method_id int REFERENCES ship_methods(id) ON DELETE RESTRICT,
tracking text
);


CREATE TABLE lineitems (
id serial PRIMARY KEY,
inv_id int not null REFERENCES invoices(id),
item_id int not null REFERENCES items(id),
address_id int REFERENCES addresses(id),
shipment_id int REFERENCES shipments(id),
linestatus int not null REFERENCES line_status(id) default '1',
quantity int not null default '1',
currency char(3) not null default 'USD',
price numeric(7,2) not null,
wholesale numeric(7,2) not null,
shipcost numeric(6,2),
soundscanned date
);

Seem sane?

Derek Sivers

Sometimes in my blog, here, I won’t have time to write a full entertaining narrational “article” about something, so I’ll just quickly paste in some thoughts that you may find useful if dealing with similar problems or situations on your end.

Here’s one: GIFT CERTIFICATES
I used to think of them as just items in a cart, with a negative balance. But here’s another way to think of them…

# The membership-account idea : SUBTRACTION ON YOUR ACCOUNT

Little Jimmy gets a gift certificate from grandma.
He comes to CD Baby, and puts some CDs in his cart.
Upon checkout, we ask him to create an account here so we know who he is.
Anywhere in the process, once we know who he is, he can tell us if he has any gift certificates.
When he enters a gift cert’s passcode, the full amount of the gift cert is added to his account - permanently.
He can do this with multiple gift certs, and it will keep adding to this single amount.
Whenever he’s buying anything, the total cost of his order (including shipping) has this gift-cert balance subtracted from it.

== HOW IT WORKS, INSIDE:
Someone purchases a gift-cert. It does nothing but create a giftcert, (asking them for optional extra info, like who to say it’s from, to, and a message with it).

When Little Jimmy comes to use it, we update that giftcert with his customer_id, used=true, date_used=now AND:
A new entry in giftcert_entries with the giftcert_id, and the negative-amount of the giftcert total.
The above two steps are a single transaction, like double-entry accounting. We took it out of one column, into another.

(A sum of giftcert_entries tells us his total gift balance: -20)

He completes his order, total $17 - so his $20 giftcert is used to pay for the order:
A new entry in giftcert_entries with the invoice_id and the positive amount of the giftcert used.

(A sum of giftcert_entries tells us his new total gift balance: (-20 + 17 = -3))
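
The bookkeeping above can be tried out in a few lines of Python. This is just an in-memory sketch under the same sign convention (a negative sum means the customer has credit); the function names and dictionary fields are made up for illustration and are not the real CD Baby code.

```python
from decimal import Decimal


def redeem(entries, giftcert, customer_id):
    """Mark a gift cert as used and credit its full amount to the customer."""
    if giftcert["used"]:
        raise ValueError("gift cert already used")
    giftcert.update(used=True, customer_id=customer_id)
    entries.append({"customer_id": customer_id,
                    "amount": -giftcert["amount"],  # negative = credit
                    "giftcert_id": giftcert["id"],
                    "invoice_id": None})


def balance(entries, customer_id):
    """Sum of the customer's entries; negative means unused gift credit."""
    return sum(e["amount"] for e in entries if e["customer_id"] == customer_id)


def pay(entries, customer_id, invoice_id, order_total):
    """Apply available credit to an order; return what is still owed."""
    credit = -balance(entries, customer_id)
    applied = min(credit, order_total)
    if applied > 0:  # never record a zero entry
        entries.append({"customer_id": customer_id,
                        "amount": applied,  # positive = credit spent
                        "giftcert_id": None,
                        "invoice_id": invoice_id})
    return order_total - applied


entries = []
cert = {"id": 1, "amount": Decimal("20.00"), "used": False, "customer_id": None}
redeem(entries, cert, customer_id=42)
due_first = pay(entries, 42, invoice_id=1001, order_total=Decimal("17.00"))
left_after_first = balance(entries, 42)
due_second = pay(entries, 42, invoice_id=1002, order_total=Decimal("10.00"))
```

After the first order the balance is -3, matching the example above; the second order uses up the remaining 3 and leaves 7 to pay some other way.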

== DATABASE:

CREATE TABLE giftcerts (
id serial PRIMARY KEY,
code char(10) not null UNIQUE,
amount numeric(8,2) not null CHECK (amount > 0),
email text not null,
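"from" text, -- quoted because FROM is a reserved word in PostgreSQL
"to" text, -- quoted because TO is a reserved word in PostgreSQL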
message text,
used boolean not null default false,
customer_id int REFERENCES customers(id) ON DELETE RESTRICT,
date_used date
);


CREATE TABLE giftcert_entries (
id serial PRIMARY KEY,
customer_id int not null REFERENCES customers(id) ON DELETE RESTRICT,
entry_date date not null default CURRENT_DATE,
amount numeric(8,2) not null CHECK (amount <> 0),
giftcert_id int REFERENCES giftcerts(id) ON DELETE RESTRICT,
invoice_id int REFERENCES invoices(id) ON DELETE RESTRICT
);

You got a better way, punk?

brian d foy

Related link: http://www.apress.com/book/bookDisplay.html?bID=307

Apress took the best of Randal’s columns from WebTechniques, Linux Magazine, SysAdmin, The Perl Journal, and some others and put them in one book. They are also available online at Stonehenge’s website, but I prefer books myself. Still, the columns are the most popular part of the website, even counting his collection of pictures of clouds and food.

I’m not going to review Randal’s book (since I work for him), but someone will have a review in the next issue of The Perl Review.
I do get something out of this, though: If you buy enough books, Randal might buy me a nice steak dinner again.

Ben Lieberman

I have just submitted my final copy of my book “The Art of System Modeling” to O’Reilly for a tentative July publication. I have therefore set up this weblog to offer folks the chance to chat with me about the book, and any other topic that seems to be useful. I look forward to reading the various comments of the O’Reilly audience.

Thanks!

I would like to hear from you!

Uche Ogbuji

First of all, I follow LinuxHardware.org, which posts news and article links related to Linux hardware. It’s a Slashdot-like site, and you may also consider Slashdot’s Linux section, although there is a lot more noise there (often very entertaining noise, to be fair).

LinuxHardware.net is a complementary site, essentially a search engine for external resources relating to Linux/hardware issues.

It’s always worth referring to the Linux Hardware Compatibility HOWTO. It’s not as rapidly updated as more specialized resources, but it’s still a good bedrock resource for finding and working with Linux-friendly hardware.

Laptop users have an especially useful resource: Linux on Laptops. This is a compendium of user-submitted HOWTOs, each for a specific laptop model and Linux distribution.

Also of use is the Linux tested site, which includes nice charts organized by hardware category and distribution, and Linux Online’s list of Linux-friendly hardware vendors.

Of peripheral (no pun intended) interest is Linuxdevices.com, which covers the amazingly rich world of embedded Linux. It offers news on embedded Linux phones, PDAs, digital cameras and camcorders, routers, and more. It also has news and resources for embedded Linux developers.

What are your favorite Linux hardware resources?

Kevin Bedell

In this great article from MIT’s Technology Review, Michelle Delio presents 13 ideas for putting the Mac mini to work as your non-primary computing machine.

My favorite? As the center of a Media/Internet/Communications hub for your automobile. The Mac mini’s media capabilities combined with its Bluetooth and other features will make it the coolest car add-on gadget since air conditioning.

A company called Classic Restorations is now taking orders to install a Mac mini in your car. According to their president, Melvin Benzaquen, “For around the price of mounting an iPod in your car, you get a whole Macintosh computer.”

brian d foy

Related link: http://versiontracker.com/dyn/moreinfo/macosx/21941&vid=127425

I’ve had to use my Powerbook’s modem twice in the past month, and each time I got a hanging disconnect. The first time I waited a while, unplugged the phone cable, logged out, and other useless things before I rebooted. I hate rebooting. My uptime, a silly metric I take great pride in, was 38 days at that point (only because 38 days ago I upgraded the system software).

This time my uptime was 27 days (since the last time I had this problem during a holiday visit to family and had to use a modem). I figured there was a way to fix this with administrator kung-fu, but my sysadmin muscles have atrophied since I started using Mac OS X. Not only that, my prime resource, the internet, isn’t available until I fix the problem.

Although I went through my process list and killed everything that looked like it was using the modem, I fixed only the symptom: the scrolling “Disconnecting…” status message in my menu bar. Changing the modem settings and applying them (say, from modem sounds off to on) looked like it gave me back my modem, but Internet Connect only looked like it was responding and the status didn’t change from “Idle”.

Now I’ve found “End Hanging Disconnect”, which is really just an AppleScript wrapper around sudo killall pppd, although it runs it 5 times just to make sure.
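For the curious, the same idea can be sketched in a few lines of Python. This is not the utility's actual AppleScript: the function name `end_hanging_disconnect` and the `runner` parameter are made up for illustration, and the only details taken from the description above are the `sudo killall pppd` command and the 5 repetitions.

```python
import subprocess


def end_hanging_disconnect(attempts=5, runner=subprocess.call):
    """Run `sudo killall pppd` several times and return the exit codes.

    `runner` is a hypothetical hook so the command can be swapped out
    (for a dry run, say); by default it really shells out.
    """
    command = ["sudo", "killall", "pppd"]
    return [runner(command) for _ in range(attempts)]
```

Calling `end_hanging_disconnect()` with no arguments would really invoke `sudo`, which is why the sketch only defines the function and leaves the call to you.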

The little script has gotten some good reviews, but now I have to wait for a chance to use it. I’d like to see if it works for me, but I’d also like to never have that problem again. What’s going to win?

Uche Ogbuji

The Python/XML community has an unfortunately long tradition of dodgy benchmarks. I had a lot to say about probably the most egregious example in my article on PyRXP. PyRXP is called an XML parser, and its developers benchmark it as such against other Python/XML parsers. The problem is that it turns out PyRXP is not an XML parser. It fails the most fundamental conformance to the most important aspect of XML: Unicode support. As a result, a benchmark of PyRXP against an XML parser is ludicrously unfair. In my article I had a lot to say about how poisonous such unfair benchmarks are.

On the less egregious end are benchmarks of libxml2’s default Python binding, which is in many ways so gnomic (no pun intended) and treacherous that it’s also an unfair comparison against most Pythonic XML tools. It sounds as if Martijn Faassen’s lxml is making decent progress towards rectifying this.

But I must say that the benchmarks that were the last straw for me came from an old friend. Fredrik Lundh (“/F”) is IMO one of the few XML package developers in the Python community who really understand both Python and XML. This has been generally borne out in his ElementTree library, about which I’ve always had a lot of good things to say. cElementTree came along and suddenly raised the Python/XML benchmark sweepstakes once again. As part of promotion of cElementTree, /F posted a benchmark on the home page. The benchmarks are very flattering to cElementTree, and it’s probably deserving of some such flattery, but as I examined the performance issue a bit more, I’ve come to conclude that his benchmarks are pretty much useless.

The problem is that besides a performance bug in my own Amara 0.9.2, which /F brought to my notice, and that was fixed in the subsequent release, I was unable to reproduce under real-world conditions anything like the proportions implied in /F’s benchmarks. Well, /F pretty much admits that all he’s doing in his benchmark is reading in a file using each library. Hmm. This is not the stuff of which useful benchmarks are made. Nobody reads in a 3MB XML document just to throw all the data away, least of all Python developers who have long been vocal about their desire to do as little with XML as possible. Of course I can’t be 100% sure in this complaint because I haven’t seen the benchmark code, but then again that’s just another complaint.

I set out to run at least one real-world benchmark, in order to determine whether there is anything to the no-op benchmarks /F uses. The basics come from this article, where I introduce the Old Testament test. The idea is simply to print all verses containing the word ‘begat’ in Jon Bosak’s Old Testament in XML, a 3.3MB document. A quick note on the characteristics of the file: it contains 23145 v elements, each containing a Bible verse as text only: no child elements. The v elements and their content represent about 3.2MB of the file’s total 3.3MB. In the rest of this article I present the code and results.

I’m working on a Dell Inspiron 8600 notebook with 2GB RAM. It’s a Centrino 1.7GHz, which is about equivalent to a P4-3GHz (modulo the equally wacky world of CPU benchmarks). The OS is Fedora Core 3 Linux, and I’ve tuned DMA and the like. I’m running Python 2.3.2. The following are my pystone results:

$ python /home/uogbuji/lib/lib/python2.3/test/pystone.py
Pystone(1.1) time for 50000 passes = 2.99
This machine benchmarks at 16722.4 pystones/second

I ran each case 5 times and recorded the high and low run times, according to the UNIX time command. I understand very well that this is not quite statistically thorough, but it’s well ahead of all the other such benchmarks I’ve seen in terms of reproducibility (I present all my code) and usefulness (this is a real-world use-case for XML processing).
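The "run it 5 times, keep the low and the high" procedure can also be approximated from inside Python. The following is only a sketch, not the harness used for the numbers in this article: `time_runs` and `sample_case` are hypothetical names, and unlike the UNIX time command this measures a callable in-process, so interpreter start-up and module import time are excluded.

```python
import time


def time_runs(case, runs=5):
    """Call `case` `runs` times; return (low, high) wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        case()
        timings.append(time.perf_counter() - start)
    return min(timings), max(timings)


def sample_case():
    # Stand-in workload; a real run would parse and search ot.xml here.
    return sum(i * i for i in range(100000))


low, high = time_runs(sample_case)
print("%.2f - %.2f seconds" % (low, high))
```

Reporting only the extremes is still a long way from statistical rigor, but it at least shows how much the runs vary.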

First up: plain old PySAX. Forget the performance characteristics for a moment: this code was just a pain in the arse to write.

from xml import sax

class OtHandler(sax.ContentHandler):
    def __init__(self):
        #Yes, all this rigmarole *is* required, otherwise
        #you could miss the word "begat" split across
        #multiple SAX events
        self.verse = None
        return

    def startElementNS(self, (ns, local), qname, attrs):
        if local == u'v':
            self.verse = u''
        return

    def endElementNS(self, name, qname):
        if (self.verse is not None
            and self.verse.find(u'begat') != -1):
            print self.verse
        self.verse = None
        return

    def characters(self, text):
        if self.verse is not None:
            #Yeah yeah, probably a tad faster to use the
            #''.join(fragment_list) trick, but not worth
            #the complication with these small verse chunks
            self.verse += text
        return

handler = OtHandler()
parser = sax.make_parser()
parser.setContentHandler(handler)
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.parse("ot.xml")

I get numbers ranging from 2.32 - 3.97 seconds.

Next up is PySAX using a filter to normalize text events, and thus simplify the SAX code a great deal. The filter, amara.saxtools.normalize_text_filter, is basically the one I posted here, with some improvements. The code is much less painful than the PySAX example above, but it still demonstrates why SAX turns off people used to Python’s simplicity.

from xml import sax
from amara import saxtools

class OtHandler(sax.ContentHandler):
    def characters(self, text):
        if text.find(u'begat') != -1:
            print text
        return

handler = OtHandler()
parser = sax.make_parser()
normal_parser = saxtools.normalize_text_filter(parser)
normal_parser.setContentHandler(handler)
normal_parser.setFeature(sax.handler.feature_namespaces, 1)
normal_parser.parse("ot.xml")

I get numbers ranging from 2.66 - 4.88 seconds.

Next up is Amara pushdom, which tries to combine some of the performance advantages of SAX with the (relative) ease of DOM.

from amara import domtools

for docfrag in domtools.pushdom(u'v', source='ot.xml'):
    text = docfrag.childNodes[0].firstChild.data
    if text.find(u'begat') != -1:
        print text

I get numbers ranging from 5.83 - 7.11 seconds.

Next up is Amara pushbind, which tries to combine some of the performance advantages of SAX with the most Pythonic (and thus easy) API I can imagine.

from amara import binderytools

for v in binderytools.pushbind(u'v', source='ot.xml'):
    text = unicode(v)
    if text.find(u'begat') != -1:
        print text

I get numbers ranging from 10.46 - 11.40 seconds.

Next up is Amara bindery chunker, which is the basis of pushbind.

from xml import sax
from amara import binderytools

def handle_chunk(docfrag):
    text = unicode(docfrag.v)
    if text.find(u'begat') != -1:
        print text

xpatterns = 'v'
handler = binderytools.saxbind_chunker(
    xpatterns=xpatterns,
    chunk_consumer=handle_chunk
)
parser = sax.make_parser()
parser.setContentHandler(handler)
parser.setFeature(sax.handler.feature_namespaces, 1)
parser.parse("ot.xml")

I get numbers ranging from 9.44 - 10.27 seconds.

Finally, I look at /F’s cElementTree.

import cElementTree as ElementTree

tree = ElementTree.parse("ot.xml")
for v in tree.findall("//v"):
    text = v.text
    if text.find(u'begat') != -1:
        print text

I get numbers ranging from 1.53 - 3.18 seconds.
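For readers on a current Python 3, where the C accelerator is used automatically by xml.etree.ElementTree, the same search might look roughly like the sketch below. It is not the code that was timed above: it streams with iterparse instead of building the whole tree, the inline `SAMPLE` document is an invented stand-in for ot.xml, and `begat_verses` is a hypothetical helper name. It also guards against empty v elements, whose text is None.

```python
import io
import xml.etree.ElementTree as ElementTree

# Invented stand-in for ot.xml so the sketch is self-contained.
SAMPLE = b"""<tstmt>
  <v>Sample verse one: and he begat a son.</v>
  <v>Sample verse two: nothing relevant here.</v>
  <v/>
  <v>Sample verse three: and she begat a daughter.</v>
</tstmt>"""


def begat_verses(source):
    """Yield the text of every v element that mentions 'begat'."""
    for _event, elem in ElementTree.iterparse(source, events=("end",)):
        if elem.tag == "v":
            text = elem.text or ""  # an empty element has text of None
            if "begat" in text:
                yield text
            elem.clear()  # drop the element's content once handled


matches = list(begat_verses(io.BytesIO(SAMPLE)))
for verse in matches:
    print(verse)
```

Passing a file name such as "ot.xml" instead of the BytesIO object should work the same way, since iterparse accepts either.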

So what do I conclude from these numbers? As I’ve said before, the speed of cElementTree amazes me, but its advantage in the real world is nowhere near as dramatic as /F’s benchmarks claim. More relevant to my own vanity, Amara 0.9.3’s disadvantage in the real world is nowhere near as dramatic as /F’s benchmarks claim. IMHO, it’s close enough in performance to all the other options, and offers so many advantages in areas besides performance, that it’s a very respectable alternative to any Python/XML library out there.

But the point of this exercise goes far beyond all that. We really need to clean up our act in what is a very strange political battleground in the Python/XML space. If we’ve decided that MIPS wars are what we’re going to be all about in development, then let’s benchmark properly. Let’s gather some real-world use-cases and normalized test conditions. Let’s make sure all our benchmarks are transparent (at least release all the code used), and let’s put some statistical rigor behind them (not an easy thing to do, and not something I claim to have done in this article). Let’s do all this as a community.

While we’re at it, I’d like to repeat my call for test case diversity from my PyRXP article: [R]un the tests
on a variety of hardware and operating systems, and [don’t]
focus on a single XML file, but rather examine a variety of XML files.
Numerous characteristics of XML files can affect parsing and processing
speed, including:

  • The preponderance of elements versus attributes versus text (and
    even comments and processing instructions)
  • Any repetition of element or attribute names, values and text content
  • The distribution of white space
  • The character encoding
  • The use of character and general entities
  • The input source (in-memory, string, file, URL, etc.)

And if we’re not willing to do things rightly, let’s stop deceiving users with meaningless benchmarks.

What real-world conditions would you like to see represented in respectable Python/XML benchmarks?

Derek Sivers

A big change I’m adding in the CD Baby rewrite is the ability for the store to be browsing/searching only a subset of its items. This is useful for genre-specific stores, say jazz.cdbaby.com, where browsing and searching the catalog would only show you jazz albums.

In planning, I was calling these “LIMITERS”, because in SQL terms, I imagined it would work like this: Say I’m searching and browsing the store, looking at top-sellers or new-arrivals:

SELECT * FROM items ORDER BY sold DESC LIMIT 20
SELECT * FROM items ORDER BY date_added DESC LIMIT 20

… if you add a LIMITER of showing only jazz, the queries become:

SELECT * FROM items WHERE style='jazz' ORDER BY sold DESC LIMIT 20
SELECT * FROM items WHERE style='jazz' ORDER BY date_added DESC LIMIT 20

… if you add another LIMITER of showing only artists from Sweden, the queries become:

SELECT * FROM items WHERE style='jazz' AND location='SE' ORDER BY sold DESC LIMIT 20
SELECT * FROM items WHERE style='jazz' AND location='SE' ORDER BY date_added DESC LIMIT 20

… and so on.
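To make the LIMITER idea concrete, here is a minimal Python sketch of a function that stacks limiters onto a base query. It is an illustration only, not CD Baby code: `build_query` and its whitelists are hypothetical, and the `?` placeholder style is an assumption that depends on the database driver in use.

```python
def build_query(order_by, limiters=None, limit=20):
    """Return (sql, params) for a listing narrowed by zero or more limiters."""
    # Column names cannot be bound as parameters, so only known ones pass.
    allowed_columns = {"style", "location"}
    allowed_orderings = {"sold", "date_added"}
    if order_by not in allowed_orderings:
        raise ValueError("unknown ordering: %r" % order_by)

    clauses, params = [], []
    for column, value in (limiters or {}).items():
        if column not in allowed_columns:
            raise ValueError("unknown limiter: %r" % column)
        clauses.append("%s=?" % column)
        params.append(value)

    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT * FROM items%s ORDER BY %s DESC LIMIT %d" % (
        where, order_by, int(limit))
    return sql, params


sql, params = build_query("sold", {"style": "jazz", "location": "SE"})
print(sql)
print(params)
```

Each limiter is just one more AND clause, which is easy for equality tests on the items table itself and gets awkward as soon as a limiter needs a join.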

At first I figured some smart object would parse a config of limiters and add it to all SQL queries. But I realized that would lead to some awfully slow queries, since some of the other queries in browsing and searching the store can get pretty complex on their own, and adding in the extra limitations would be even worse.

Jeremy suggested memcached. Brilliant! We pass it all the complex queries we want, and just tell memcached to remember the results, so our customers browsing the site see it fast. No extra work needed. Speed problem solved.

BUT… since one of my planned LIMITERs would be a join on another table, we foresaw that trying to get a class to just add a join to all of our queries is asking for trouble. The whole “complicated queries on the fly” idea had to be nixed.

So then I thought that I’d export a bunch of nightly cache tables of all of our items with these various limiters. Say, “jazz.db” would be a subset of our catalog, already defined as only jazz, where we’d have all the fields needed for browsing (id, artist_name, item_name, price, description, etc) - and search only that cache-table when browsing the jazz store.

We were about to do this when we looked at the stuff for browsing, realized it was all just duplicated cache-info, and all that we really needed to know is what item ID#s are included in this genre-specific store we’re browsing! That’s it! Everything else can be reliably joined in. No need to export entire new databases or tables. Just give a list of item IDs. Let’s call it catalogs and it would look like this:

CREATE TABLE catalogs (
    id serial primary key,
    name text,
    description text
    -- etc...
);
CREATE TABLE catalog_items (
    catalog_id int not null REFERENCES catalogs(id),
    item_id int not null REFERENCES items(id),
    PRIMARY KEY (catalog_id, item_id)
);

Looking at top-sellers or new-arrivals becomes:

SELECT items.* FROM items
LEFT JOIN catalog_items ON items.id=catalog_items.item_id
WHERE catalog_items.catalog_id=5
ORDER BY sold DESC LIMIT 20

SELECT items.* FROM items
LEFT JOIN catalog_items ON items.id=catalog_items.item_id
WHERE catalog_items.catalog_id=5
ORDER BY date_added DESC LIMIT 20

Perfect! Now we can populate item IDs into catalog_items however we like, by any crazy logic or no logic at all, and the store can easily browse/search it as a subset.

Last step: how to set up our system so that it joins against this catalog_items table when needed (jazz.cdbaby.com), but doesn’t when not (www.cdbaby.com)?

Easy! We’ll ALWAYS join against it. If you’re browsing www.cdbaby.com with no limiters then that’s catalog #1, a catalog of all available items that we’ll populate into this join-table every night like we do the others. Catalog #2 and up will be the subsets.
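Here is a small sketch of the always-join idea, using Python's built-in sqlite3 module as a stand-in for the real database. The sample rows, the cut-down items table, and the `top_sellers` helper are invented for illustration; the join is on catalog_items.item_id, the column defined in the schema above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, style TEXT, sold INTEGER);
CREATE TABLE catalogs (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE catalog_items (
    catalog_id INTEGER NOT NULL REFERENCES catalogs(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (catalog_id, item_id)
);
INSERT INTO items VALUES
    (1, 'Blue Hours', 'jazz', 40),
    (2, 'Loud Songs', 'rock', 90),
    (3, 'Late Set', 'jazz', 70);
INSERT INTO catalogs VALUES (1, 'everything'), (2, 'jazz');
""")

# The "nightly" population step: catalog 1 gets every item, catalog 2 a subset.
db.execute("INSERT INTO catalog_items SELECT 1, id FROM items")
db.execute("INSERT INTO catalog_items SELECT 2, id FROM items WHERE style = 'jazz'")


def top_sellers(catalog_id, limit=20):
    """Same query for every store; only the catalog id changes."""
    rows = db.execute(
        "SELECT items.name FROM items "
        "JOIN catalog_items ON items.id = catalog_items.item_id "
        "WHERE catalog_items.catalog_id = ? "
        "ORDER BY items.sold DESC LIMIT ?",
        (catalog_id, limit),
    )
    return [name for (name,) in rows]


everything = top_sellers(1)
jazz_only = top_sellers(2)
print(everything)
print(jazz_only)
```

The browsing code never needs to know which store it is serving beyond a single catalog id.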

This even solves a problem I wasn’t looking to solve: the trouble of always having to pass “WHERE active=TRUE” in every single search, so that I’m not showing items until they’re approved to go on the site. It ALSO solves a problem of not including abstract items’ variations, which is a topic of a future post.

Derek Sivers

Switching CD Baby from MySQL to PostgreSQL gave me the perfect opportunity to fix some things in our database structure that needed changing. Among many other changes, here are some interesting ones:

ALBUMS ARE NOW ITEMS:
OLD: one big table called “albums”, that has everything we sell
NEW: one common table called “items”, with sub-tables depending on the type of item

CD Baby was written to sell only one thing: CDs.
When people ask if we can sell their T-Shirt, we say no.
When they ask if we can sell digital downloads, we say no.
Can we make a bundle of albums sold for a discounted price? No.
Gift certificates? No.

My “albums” table was the rusty axle to this wheel. Everything revolved around it. Every line of code ever written was dependent on this “albums” table and the way it worked.

So - I imagined a future CD Baby that could sell many different types of items, looked for the basic common things that they ALL have (name, price, description), and made it an “items” table.

If item is an ALBUM, it pulls in details from the “albums” table. (An album can be any format: CD, vinyl, download, whatever. It’s a collection of songs.)
If item is a BUNDLE, it foreign-key joins a list of other items, and gives it one combined price.
If item is MERCH, it’s the “merchant ships it” thing, where we collect the money, the vendor ships it directly to customer, then we pay them after it’s proven shipped.
If item is DOWNLOAD, it foreign-key joins to a song: a piece of music. (A download album, then, is a BUNDLE of DOWNLOADs.)
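A rough sketch of how a common “items” table with sub-tables might be laid out, again using Python's sqlite3 as a stand-in database. Every column name here (type, price_cents, format, bundle_id) and the sample rows are assumptions made for illustration, not the real CD Baby schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- The things every sellable item has in common.
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('ALBUM', 'BUNDLE', 'MERCH', 'DOWNLOAD')),
    name TEXT NOT NULL,
    price_cents INTEGER NOT NULL,
    description TEXT
);
-- Details that only albums have hang off the common row.
CREATE TABLE albums (
    item_id INTEGER PRIMARY KEY REFERENCES items(id),
    format TEXT NOT NULL
);
-- A bundle is an item that points at a list of other items.
CREATE TABLE bundle_items (
    bundle_id INTEGER NOT NULL REFERENCES items(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (bundle_id, item_id)
);
INSERT INTO items VALUES
    (1, 'ALBUM', 'First Record', 1297, NULL),
    (2, 'ALBUM', 'Second Record', 1297, NULL),
    (3, 'BUNDLE', 'Both Records', 2000, 'Two albums at a discount');
INSERT INTO albums VALUES (1, 'CD'), (2, 'vinyl');
INSERT INTO bundle_items VALUES (3, 1), (3, 2);
""")


def bundle_contents(bundle_id):
    """Return (name, format) for each album inside a bundle."""
    rows = db.execute(
        "SELECT items.name, albums.format FROM bundle_items "
        "JOIN items ON items.id = bundle_items.item_id "
        "JOIN albums ON albums.item_id = items.id "
        "WHERE bundle_items.bundle_id = ? ORDER BY items.id",
        (bundle_id,),
    )
    return rows.fetchall()


contents = bundle_contents(3)
bundle_price = db.execute(
    "SELECT price_cents FROM items WHERE id = 3").fetchone()[0]
print(contents, bundle_price)
```

The bundle carries its own combined price on its items row, so the prices of the albums inside it never have to be summed.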

ADDRESSES STAND ALONE
OLD: customer had address, city, state, zip, country. invoice had address, city, state, zip, country
NEW: table called “addresses”, linked to by not only customer but lineitems

I always thought it was kinda cool that the big online stores remembered every address I’ve ever used (a one-to-many relationship between customer and addresses). Then I realized that this same “addresses” table could be used to not have an address per-order, but rather an address per-ITEM inside an order!

Yeah I know this sounds obvious, but it’s fun figuring out this shit on my own, with no mentor or instruction book telling me this is how the big boys do it. Very satisfying.

SOUNDS-LIKE: A LIST OF FAMOUS ARTISTS
OLD: artists tell us they sound like “bob dylan ani difranco and early zepplin” in a single text field
NEW: album_soundlike joins album_id to list of id#s of famous artists

This is going to be one of the biggest converting challenges.

For 6 years, I’ve asked 80,000 artists to “tell us three famous artists people say you sound like”. This info was entered into a text field, however they gave it to me. Some would even write, “I don’t sound like nobody” or “Chili Peppers back before they got lame”. When people would search CD Baby, I’d just do a full-text search of this field to see if it returned what they were looking for. Now it’s time to try to organize that data, so we really know which famous artists the artists are referring to.

I got a list of famous artists from the brilliant Robert Kaye at MusicBrainz. I stuck these 131,000 artists into a database with only their name and an auto-generated id#. Then a join table to link the album’s ID to the multiple IDs of the referenced artists.
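To show why the conversion is hard, here is a deliberately naive Python sketch that turns one free-text answer into rows for the join table by plain substring matching. It is not the approach that was eventually used: `match_artists`, the three-entry artist list, and the album id 42 are all invented for illustration.

```python
def match_artists(free_text, artists):
    """Return sorted ids of known artists whose name appears in free_text."""
    haystack = free_text.lower()
    return sorted(
        artist_id
        for artist_id, name in artists.items()
        if name.lower() in haystack
    )


# Tiny stand-in for the list of famous artists (id -> name).
famous_artists = {1: "Bob Dylan", 2: "Ani DiFranco", 3: "Led Zeppelin"}

answer = "bob dylan ani difranco and early zepplin"
album_id = 42  # hypothetical album

# Rows destined for the album_soundlike join table: (album_id, artist_id).
soundlike_rows = [
    (album_id, artist_id)
    for artist_id in match_artists(answer, famous_artists)
]
print(soundlike_rows)
```

The misspelled “zepplin” slips straight past an exact match, which is the kind of case that calls for a proper fuzzy text search.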

UPDATE: passed this part of the project to Robert, directly, since he’s the king of music metadata. He’s found a great solution using PyLucene that he will open source, too. More on that later, I guess.

Ming Chow

Related link: http://www.tuftsdaily.com/vnews/display.v/ART/2005/01/20/41ef4cefa1ecf

Last week, I was interviewed by an editor of the Tufts University daily newspaper, the Tufts Daily, about the phenomenon known as blogs. The interview questions I received were very good:

  • How long have you been writing weblogs? How did you learn about blogs and how did you get involved in writing them? What do you write about?
  • Do you believe the way people receive news has been changing over the past couple of years? Why do you think some people have turned to blogs or the internet for news?

And the interview wouldn’t be complete without that most important question:

  • What do you feel are the pros/cons of receiving news from a blog? How reliable are they?

The article was published back on Thursday. It couldn’t have come at a better time, because yesterday there was a national feature about the ethical concerns in blogs. Many of the points raised in the latter article coincide with my comments in the Tufts Daily, and it goes into the question of credibility in greater depth.

Finally, there is a new blog-publishing website called Ready, Set, BLOG!. It was created and is maintained by Drew Olanoff, the creator of the GMail4Troops website. One of the differences between Ready, Set, BLOG! and Blogger is the community features. For example, you can view the last visitors to your blog, then you can visit them, and so on. There are top blogs of the day based on nominations, plus many other features.

But from all of this, it is very apparent that blogs are becoming mainstream, important to our daily communications, and a source of information to the critical mass.

(Special thanks to Tufts Daily editor Stephanie Christofides for putting together the article, and for finding me on, where else, the O’Reilly Network.)

Derek Sivers

Related link: http://www.rubyonrails.org/

I know I’ve talked about it sporadically for a year now (see past posts), but I’ve finally started my CD Baby rewrite.

BIG CHANGE #1 : PostgreSQL

I dabbled with PostgreSQL one night when I wasn’t in the mood to do what I was supposed to be doing. I’d heard some people I respect rave about it, and since I really do love databases, I decided it was worth a few hours of my time. Holy Canoli! Its strictness solves most of the data corruption problems I’ve had with MySQL!

Example problems with CD Baby database in MySQL:

#1 - entries deleted from one table that were required by another.
Example: An album is deleted from our catalog, so now all the lineitems (customer purchases) of that album are left dangling with no matching identifier.

#2 - many invalid entries, especially many dates of “0000-00-00”
This one has gotten us into trouble with our digital distribution partners, whose much-smarter systems throw up a big fat error when we stupidly report an album’s release date as “0000-00-00”

#3 - no strict requiring of join-ids matching.
Example: someone mistakenly entering that an order belongs to customer #314981 when it should have been #319481, and there is no customer #314981. Database didn’t complain so we didn’t notice.

Yes of course with these and every other example we all say, “Yeah but your code should have prevented that.” Thing is: in MOST places it does, but various little shell scripts and admin scripts can’t check for every possible human error, so here I am with a constantly corrupt database.

This is where I *LOVE* the strictness that PostgreSQL makes easy. Yes I hear that MySQL InnoDB tables do the same thing, but I’ve already fallen in love with PostgreSQL, and switched.
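The three problems above can be reproduced, and rejected, in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in rather than PostgreSQL itself, so treat it as an illustration of the constraints, not of PostgreSQL's exact behavior: the table layout is invented, and the CHECK on release_date only imitates what a real date column would refuse on its own.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
db.executescript("""
CREATE TABLE albums (
    id INTEGER PRIMARY KEY,
    release_date TEXT NOT NULL CHECK (release_date >= '1900-01-01')
);
CREATE TABLE lineitems (
    id INTEGER PRIMARY KEY,
    album_id INTEGER NOT NULL REFERENCES albums(id)
);
INSERT INTO albums VALUES (1, '2004-06-01');
INSERT INTO lineitems VALUES (1, 1);
""")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        db.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# 1: deleting an album that a purchase still points at
dangling_delete = rejected("DELETE FROM albums WHERE id = 1")
# 2: an impossible release date
zero_date = rejected("INSERT INTO albums VALUES (2, '0000-00-00')")
# 3: a lineitem pointing at an album that does not exist
bad_reference = rejected("INSERT INTO lineitems VALUES (2, 999)")
print(dangling_delete, zero_date, bad_reference)
```

With the constraints declared once in the schema, every shell script and admin script gets the same checks for free.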

BIG CHANGE #2 : RUBY, BABY!

Like a lost soul walkin’ the earth, lookin’ for spirituality, that stumbles upon the right church with the right people at the right time, I’ve found my niche with Ruby. Its little itty-bitty community attracts some brilliant “think different” types with a love for beautiful code that do this for love, not money.

I liked it immediately a year ago when I learned it while stuck in a cabin in Sweden. I stopped after some shell scripts though because its web-making features weren’t up to snuff. Now, with Rails, there is a team of passionate geniuses contributing to this web-making framework daily. It’s small enough that you can stay on top of it, and watch this framework get more and more powerful by the week. Improvements that are pragmatic not political. People using it to make effective websites, contributing to the shared framework around it as they go. Why not take advantage of all this brilliant work?

It took a lot to get me to switch from PHP, the only language I really know, to Ruby. I tell my non-computer friends, “It’s like the week before sitting down to write a book, I decided to write it in Portuguese instead of English, because it’ll be easier.” It sounds crazy, but we’ll see.

Bookmark/subscribe to my author page, here, at http://www.oreillynet.com/pub/au/1841 if you want to watch the almost-daily developments.

Please no flames about Python, PHP, Java, or MySQL. My choice to use Ruby + Postgres was due to my love of them, not hate of something else.

Andy Oram

Opera houses show translations of operas to the audiences on screens
while the performance is underway. I hope the opera houses consider a
change of platform after the performance of La Bohème reported in
today’s Boston Globe, where during the main character’s first aria,
the computer controlling the display started running a routine anti-virus
scan.

Nitesh Dhanjani

Related link: http://tor.eff.org/

Tor has been around a while, but I have only recently had the chance to look into it in more detail:

Tor is a network of ‘virtual’ tunnels that allows you to connect to hosts on the Internet with increased privacy. You can use it to keep remote hosts (such as web servers you may be connecting to) from learning about your location (IP address). Tor does this by routing outgoing connections from your computer via “onion routers”, i.e., specifically designated hosts that have been set up to participate in the system. To quote from the Tor website:

“To create a private network pathway with Tor, the user’s software or client incrementally builds a circuit of encrypted connections through servers on the network. The circuit is extended one hop at a time, and each server along the way knows only which server gave it data and which server it is giving data to. No individual server ever knows the complete path that a data packet has taken. The client negotiates a separate set of encryption keys for each hop along the circuit to ensure that each hop can’t trace these connections as they pass through.

Once a circuit has been established, many kinds of data can be exchanged and several different sorts of software applications can be deployed over the Tor network. Because each server sees no more than one hop in the circuit, neither an eavesdropper nor a compromised server can use traffic analysis to link the connection’s source and destination…”

Great! In addition to helping protect everyday privacy by allowing web surfing to be anonymous for the ordinary user, Tor sounds like an excellent idea for those who wish to establish outbound connections via ISPs that prohibit certain protocols (since Tor uses proxy software to tunnel the connection via its routers). Also, I’m pretty sure Tor will begin to be quite popular among BitTorrent users! However, do note that while Tor attempts to anonymize your location, it does not protect against protocol specific issues:

“Tor can’t solve all anonymity problems. It focuses only on protecting the transport of data. You need to use protocol-specific support software if you don’t want the sites you visit to see your identifying information. For example, you can use web proxies such as Privoxy while web browsing to block cookies and withhold information about your browser type.

Also, to protect your anonymity, be smart. Don’t provide your name or other revealing information in web forms. Be aware that, like all anonymizing networks that are fast enough for web browsing, Tor does not provide protection against end-to-end timing attacks: If your attacker can watch the traffic coming out of your computer, and also the traffic arriving at your chosen destination, he can use statistical analysis to discover that they are part of the same circuit.”
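The layered circuit described in the first quotation can be pictured with a toy Python sketch. To be clear about the assumptions: this is not Tor's protocol or wire format, the XOR "cipher" is for illustration only and offers no real security, the keys are simply handed out in a dictionary instead of being negotiated hop by hop, and all the names (`toy_cipher`, `wrap`, `peel`, the relay names) are invented. It only shows how each relay can strip one layer and learn nothing but the next stop.

```python
import hashlib
import json


def toy_cipher(key, data):
    """XOR data with a SHA-256 keystream. Illustration only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(message, route, keys):
    """Build the layers from the inside out; route ends with the destination."""
    packet = message.encode("utf-8")
    next_stop = route[-1]
    for relay in reversed(route[:-1]):
        layer = json.dumps({"next": next_stop, "data": packet.hex()})
        packet = toy_cipher(keys[relay], layer.encode("utf-8"))
        next_stop = relay
    return packet


def peel(relay, packet, keys):
    """What one relay can do: remove its own layer and read the next stop."""
    layer = json.loads(toy_cipher(keys[relay], packet).decode("utf-8"))
    return layer["next"], bytes.fromhex(layer["data"])


keys = {"relay1": b"k1", "relay2": b"k2", "relay3": b"k3"}
route = ["relay1", "relay2", "relay3", "destination"]
packet = wrap("hello", route, keys)

hops_seen = []
stop = route[0]
while stop in keys:
    stop, packet = peel(stop, packet, keys)
    hops_seen.append(stop)
delivered = packet.decode("utf-8")
print(hops_seen, delivered)
```

Only the last relay ever sees where the message is finally going, and only the first one would see who sent it.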

Appropriate links:
The Tor web-site.
Tor documentation.
Download Tor.
OS X specific instructions.

Andy Oram

Tony Mobily describes his Free Software Magazine as being half about
freedom and half about technology. Altogether, it
is meant to help everyone from the curious individual to the head of a
government or business department understand what they’re getting into
and how to make use of free and open source software. Mobily is trying
to establish his new magazine as the authoritative source for all
kinds of information on free software.

I understand that there’s a need here, which others have tried to fill
and not quite succeeded. One can get news about free software–and as
many opinions as one can stomach–hourly from a number of online
sites, but few go into depth and none could be called authoritative. A
few sites such as
First Monday
offer intriguing articles of a more formal and academic nature. And
several excellent magazines cover Linux, but they’re directed at
particular subsets of Linux users and don’t have the broad mandate of
Free Software Magazine. Is there a niche for Mobily’s venture?

The first issue provides some nice nuggets. My favorite article is
Malcolm D. Spence’s checklist for justifying free software: