January 2006 Archives

Andy Oram

Related link: http://openqrm.sourceforge.net/

There’s an interesting new distributed management project called
openQRM. It’s currently around 73 on SourceForge, and has been up in
the top 50 recently. It was released under a modified version of the
Mozilla Public License by Qlusters, a company that was founded by
openMosix developer Moshe Bar and that I’ve repeatedly met with at
LinuxWorld Expos. I just talked to a colleague of mine, William
Hurley, who recently took a job as Qlusters CTO.

According to Hurley, what distinguishes openQRM from the many other
available network and cluster management tools is that it lets sites
continue to use the architecture and software they currently have with
minimal disruption. openQRM developers have already created plug-ins
for Nagios, Xen, and VMware ESX. Integration with other software or
home-grown scripts at each user site should be fairly easy. Now that
the core technology is open source, the team is hoping to pull in more
developers from the community, particularly to support FreeBSD, OS X,
and other platforms.

openQRM is important because of the proliferation of servers run by
small organizations on small budgets, made possible in recent years by
free software running on cheap hardware. The bottleneck now, as many
TCO analysts point out, is system administrator time. According to
Hurley, without good management tools, system administration
can add $7,000 to $20,000 to the annual cost of each box.

openQRM offers automatic failover for servers under its control. In
fact, you can set up automatic failover from one openQRM managing
server to another, so there’s no single point of failure anywhere in
your architecture. The system also supports diskless servers, which
lowers cost and rates of failure.

I find this product interesting also because it reflects the
continuing move to free software by commercial vendors. Qlusters’
software started out as an entirely proprietary product, but because
it lived in a Linux environment and ran on Linux systems, the company
felt it wanted to become more a part of the free software community. I
had a long meeting on this topic with VP Fred Gallagher at the most
recent LinuxWorld, which I wrote up in a blog from that conference.
Since then, they’ve acted on this impulse. Hiring Hurley and releasing
openQRM are both significant steps toward supporting the free software
movement.

Qlusters itself will market a variety of proprietary plug-ins and
management tools on top of this open platform. The license allows
other companies to develop proprietary products on top of openQRM. The
only stipulation (and the only modification Qlusters made to the
Mozilla Public License) was that commercial vendors have to
acknowledge they’re using Qlusters’s openQRM.

Andy Oram

I spent two days this past week at Microsoft Search Champs, a
conference where invitees make suggestions for Microsoft’s search
tools and other MSN offerings. Microsoft paid for everything and picked our
brains concerning a lot of different topics, some under NDA and some
public.

Why would I do this, and why would they want me there? I’ve been
associated with the free software movement for at least ten years. But
while I value openness, I also value functionality. If you browse my
articles and blogs about Microsoft, you’ll find about as many positive
references as negative ones. I
appreciate new solutions and technologies from all vendors, and I
think one company’s success will provide a model and a motivation for
others to move forward.

Furthermore, Microsoft is around to stay, and people who make a living
in the computer world have adapted to it. Every professional and
aspiring professional I know, both programmers and system
administrators, has learned their way about both Microsoft systems
and Unix-like systems. A host of projects such as Samba and Mono set
this accommodation in code, as I discussed in my article “Can the
Samba Story be Retold?”

Finally, I have seen evidence–this week and other times–that there
are many different attitudes toward the open source movement and
transparency in general at Microsoft, and believe that I could have a
positive impact by going there, partly to argue for more openness on
their part.

Some highlights of Search Champs

All the tools and sites we looked at are part of Windows Live, a
next-generation combination portal, social network, and information
site. Microsoft hopes that people who use MSN and competing networks
will move to Windows Live.

My favorite feature is gadgets, which are like the
mini-applications you can add to your tool bars on
many operating systems, but with more real estate and therefore more
features. You could probably create an extremely powerful interactive
Web page in half an hour or so by adding gadgets for maps, RSS feeds,
message boards, and so forth. The key value will come if Microsoft is
successful in making it easy for non-Microsoft developers to create
and contribute gadgets.

For years, web designers and programmers have put together rich sites,
but gadgets can encapsulate the most popular features and make them
something you can throw together as easily as an RSS feed or bookmark
list. It won’t look wildly creative, probably (it depends on how much
extra sweat you want to put into it), but it will look consistent with
other useful sites. If Microsoft can persuade a few alpha bloggers to
switch to this system, it may become a de facto standard.

The current crop of new Windows Live features seems consistent with
what I see as Microsoft’s two general strengths: attractive interfaces
and elegant integration.

For instance, their Windows Live Local combines maps, aerial views,
and 45-degree-angle photos, all very easy to reach. (My family and I
were a bit spooked to see our neighborhood, so lifelike were the
photos.) You can just click on two points of a map, and driving
directions are generated. (Some attendees asked why they don’t present
public transportation options too.) Then you can select a point and
see a photo of a particular traffic intersection so you can recognize
and navigate it. You can also drag a site to a scratch list that is
saved between sessions.

So these local features are an incremental improvement over what we’ve
had so far–incrementally better enough to make a difference for many
people.

Microsoft’s plans for Windows Live are also based on building
communities. This means persuading users to share personal
information. Productive citizens of Windows Live will have rich
identities, so that they can find other people with similar interests.

The last attempt by Microsoft to leverage user information was
Passport, which we all know didn’t go very far. Passport is still the
ID system that lies behind personal identity on Live. But the intent
now looks a lot different.

I think Passport failed because its core promise was for Microsoft to
guard very sensitive data for its users, such as phone numbers and
credit card information. Supposedly, when you buy something online you
could have Passport automatically transfer such information to the
corresponding party at the appropriate moment. People didn’t trust the
whole environment for online security–even if they could trust
Microsoft’s security, another point of contention–enough to place
their information in Passport’s hands for such hard-to-monitor
purposes.

But the new identity doesn’t involve credit cards so much as who your
pets are and what music you listen to. Microsoft certainly hopes
you’ll share a couple key pieces of demographic information with them
(age and gender) to help them target ads. But for the most part, what
you build up as online identity is not what you’d share with a vendor,
but what you’d share with neighbors and school chums.

The Live developers are working on lots of other interesting
things–multimedia search, a classified ad site, and more–but I’ll
leave it up to others to introduce them.

The value of openness

I certainly took the opportunity to press my philosophy at the
conference. Drawing on debates where I live in Massachusetts, I
complained to Microsoft managers that some of Microsoft’s supposedly
open formats (such as the XML format for Office) were encumbered by
all sorts of small but ominous restrictions, including the threat of
exercising patents. These cumulatively make potential users and
competitors afraid of Microsoft acting against them.

In addition to pointing the managers to the Groklaw analysis of the
legal labyrinth Microsoft erected around its Office XML format, I also
pointed to critical coverage of their assertion of patent rights on
the FAT filesystem. (The supposedly novel technique they patented
looks to me like just a variation on the familiar idea of file
attributes stored in a parallel location to the files.) And I did not
omit mention of the absurd Slashdot tug-of-war over Microsoft’s
Kerberos enhancements, which not only broke compatibility with other
Kerberos implementations, but were described in a document you had to
license just to read.

My point to management–and you have to remember I was talking to
developers here, not the lawyers or other managers who thought up
those legal forays–was that such activities create bad feelings among
many of the people they want to attract: the amorphous “information
loving” community of artists, academics, lawyers, and so forth. They
make developers worry, because if developers have to cede a substrate
to Microsoft and just build on top of that substrate, nothing prevents
Microsoft from coming along later and taking over the new layer they
just built.

And if such maneuvers do anything to help Microsoft’s business model,
it’s the wrong business model. (I limited my complaints to legal
issues, and did not want to load on yet more by talking about business
practices.) I think the very existence of Search Champs shows there’s
movement toward more openness at Microsoft, a pull against the more
controlling elements.

I was by no means the only one of the 57 invitees to have such
sympathies. I heard plenty of discussion of both Macs and GNU/Linux
systems. MSN managers declared they wanted their site to work on all
these systems. (I have tried live.com out a lot on Linux, using both
Firefox and Konqueror, and find it works fine.) On the bus to the
Redmond campus, I heard a possible solution to my problem getting Ext3
filesystem support compiled into the Linux kernel. Another attendee
told me bluntly, “There’s no reason for major sites to use anything
except open source software” and cited Lawrence Lessig as one of his
most inspiring influences. Several people (including Microsoft staff)
brought up Creative Commons approvingly, and DRM came in for a lot of
criticism.

The Justice Department subpoena

I would have liked to spend more of the sessions discussing
Microsoft’s legal activities and lobbying, but another policy debate
upstaged it. Over the past two weeks, press reports revealed that the
U.S. Justice Department subpoenaed MSN and other sites to hand over
large amounts of search data, and MSN complied. The public, already
rubbed raw by the revelations that George W. Bush and the NSA ignored
laws to carry out widespread wiretapping, reacted with fury to MSN. In
our sessions at Search Champs, the MSN managers succeeded in
justifying their actions and winning us over, but they made some
promises to communicate better in the future.

A semi-official Microsoft explanation starts a Web page with a
valuable list of comments. At Search Champs, we
heard even more clearly that Microsoft negotiated hard with the
Justice Department and insisted on stripping out IP address
information. Furthermore, what they handed over was merely a list of
terms and the number of searches on them; no term could be correlated
with another term or with an IP address.

MSN managers came away with some rough guides for handling future
challenges. First, the major search sites should talk to one another
and come up with a common policy for handling government and research
requests. Second, they should publicize what requests they get and how
they respond.

What it felt like to be there

A corporate junket is a new experience for me, and I don’t know
whether I would have understood what it felt like in advance if I’d
heard about it from others.

On the one hand, Microsoft pampered us in almost every way, from the
cars they sent to pick us up at the airport to fine food and gift
certificates. They plied us with liquor before and after meals. We
were all lodged at the W Hotel, which is famous for a particular look
and feel. For instance, all the hallways and common areas are dim–not
romantic dim, but suspended-reality dim. In Seattle, which is
naturally dim most of the time, this is no enhancement. The hotel
follows up the theme with fancy functionalist designs along Central
European lines, leading to some electronic equipment that’s almost
impossible to operate. And then there’s the background music you hear
everywhere, which can perhaps be described as Third World New Age
technopop. One day they threw in ten minutes of the Adagio from
Mahler’s Fifth Symphony between two technopop selections.

Some of us would have felt more comfortable had they lodged us at
something equivalent to the YMCA, fed us on burgers, and made us take
public transportation. But we probably wouldn’t have felt guilty
enough to work as hard as they wanted.

On the other hand, the Microsoft building we were in was much like
other buildings used by high-tech companies, and spending two days
there was much like sitting around talking to any computer programmers
about any topic at any time. The meeting appeared highly structured
when the schedule was presented to us, but in fact it was fairly
informal. And they always followed up meetings by telling us what a
fantastic job we were doing–just for reacting to their work out of
our personal experience.

I’m not sure who learned more from the whole event. I certainly
learned a lot about people in the field as well as technology, and I
appreciate all the money and effort Microsoft put in.

Jeremy Jones

Related link: http://groups.google.com/group/turbogears/browse_frm/thread/7528362868b88299/15f…

The “eggs” that Kevin Dangoor (the creator of TurboGears) refers to here are packages in the file format which TurboGears uses.

I’m not exactly sure what all the numbers are, though. He mentions that TurboGears eggs have been downloaded over 20,000 times and that overall there have been over 100,000 eggs downloaded. I guess he’s referring to downloads of other packages which TurboGears has a dependency on and which are hosted at turbogears.org.

Congratulations, Kevin! That is quite an accomplishment. I’d be interested to also see how many SVN checkouts there were. I’ve been running off of SVN for so long I can’t remember the last egg I downloaded. I suspect there are plenty of other folks in that same boat.

Jeremy Jones

Related link: http://ipython.scipy.org/

From the changelog, this looks like mostly cleanup and bug fixes rather than a major revamping. I just downloaded IPython using easy_install and it brought me up to 0.7.1. It complained about a conflicting version already installed, so I gave it the -D flag so it would delete the conflict. I should’ve looked more closely at where the conflict was before letting it just “fix” the conflict. It said the conflict was in a plain IPython directory and it installed this one to an ipython-0.7.1-py2.4.egg directory and updated a .pth file. Aahh. ipython-0.7.1-py2.4.egg contains an IPython directory, so I must’ve installed whatever version that was from source. It looks like easy_install did the right thing.

From what this message says, it looks like IPython will be getting a pretty major overhaul soon. The SVN branch for the overhaul is named “chainsaw”. That message also mentions that Ville Vainio will be taking over maintenance of the 0.7.x development line, thus freeing up Fernando Perez to work on the overhaul. This can only be a good thing.

I can’t say enough good things about IPython. For anyone not familiar with it, it’s a powerful replacement for the standard Python interactive shell, as well as a customizable shell which can be used for pretty much anything. People have even made it their default system shell. I highly recommend it.

Sid Steward

Related link: http://www.walterzorn.com/index.htm

Here are some fundamental, well made JS goodies. From the site:

JavaScript Vectorgraphics Library

Graphics capabilities for JavaScript. Routines to draw inclined (oblique) lines, ellipses, circles, rectangles, polylines, polygons. Elements which actually aren’t available through HTML.

Drag’nDrop & DHTML Library

A DHTML JavaScript Library with extended yet easily understandable DHTML API. Provides also Drag & Drop functionality for layers and images …

Tooltips with JavaScript

A cross-browser solution for javascript-created tooltips (information boxes close to the mouse pointer) that works even on Opera 5 and 6. The appearance of these tooltips can be customized in multiple ways (color, border, shadow etc.). The tooltips may contain plain text as well as HTML, for instance images etc.

Rotate Image

An experimental JavaScript Library to rotate images dynamically by arbitrary angles. Just a demonstration - it’s strictly advised against using this unpromising JavaScript experiment on a website!!

I’ve put these four JavaScript libraries under the LGPL (Lesser General Public License, http://www.gnu.org/copyleft/lesser.html ). You may use them for free under the terms of the LGPL and of my copyright.

Online Function Grapher

Written in JavaScript. Draws function graphs directly into the browser window - no download, no plugins required.

Chris Shiflett

Related link: http://shiflett.org/archive/184

Last month, I discussed Google’s XSS Vulnerability and provided an example that demonstrates it. I was hoping to highlight why character encoding consistency is important, but apparently the addslashes() versus mysql_real_escape_string() debate continues. Demonstrating Google’s XSS vulnerability was pretty easy. Demonstrating an SQL injection attack that is immune to addslashes() is a bit more involved, but still pretty straightforward.

For the impatient, here’s the code:

<?php

$mysql = array();

$db = mysqli_init();
$db->real_connect('localhost', 'myuser', 'mypass', 'mydb');

// 0xbf followed by a single quote (0x27), then the injected SQL
$_POST['username'] = chr(0xbf) .
                     chr(0x27) .
                     ' OR username = username /*';
$_POST['password'] = 'guess';

// addslashes() inserts 0x5c before the quote, producing 0xbf 0x5c 0x27
$mysql['username'] = addslashes($_POST['username']);
$mysql['password'] = addslashes($_POST['password']);

$sql = "SELECT *
        FROM   users
        WHERE  username = '{$mysql['username']}'
        AND    password = '{$mysql['password']}'";

$result = $db->query($sql);

if ($result->num_rows) {
    echo '<p>Success</p>';
} else {
    echo '<p>Failure</p>';
}

?>

The full explanation covers this in a bit more detail, but the basic idea is that addslashes() can be tricked into creating valid multi-byte characters out of invalid ones. Whenever a multi-byte character ends in 0x5c (a backslash), an attacker can inject the beginning byte(s) of that character just prior to a single quote, and addslashes() will complete the character rather than escape the single quote. In essence, the backslash gets absorbed, and the single quote is successfully injected. This opens the door for SQL injection attacks.
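To see the byte-level effect without a database, here is a rough sketch in Python. The addslashes() function below is only a hand-written stand-in for the PHP function, and GBK is an assumed character set for the connection (the post itself does not name one here); it is used because 0xbf 0x5c is a valid two-byte character in GBK.

```python
def addslashes(raw: bytes) -> bytes:
    """Byte-wise stand-in for PHP's addslashes(), for illustration only."""
    out = bytearray()
    for b in raw:
        if b == 0x00:
            out += b"\\0"      # NUL becomes a backslash followed by "0"
        elif b in b"'\"\\":
            out.append(0x5C)   # prefix quotes and backslashes with 0x5c
            out.append(b)
        else:
            out.append(b)
    return bytes(out)


# Same payload as the PHP example: 0xbf, then a single quote (0x27).
payload = b"\xbf\x27 OR username = username /*"
escaped = addslashes(payload)

# Read one byte at a time, the quote is still guarded by a backslash.
as_latin1 = escaped.decode("latin-1")

# Read as GBK (an assumption), 0xbf 0x5c fuse into a single character,
# so the quote that follows is no longer escaped.
as_gbk = escaped.decode("gbk")

print(escaped[:3].hex())
print(repr(as_gbk[:2]))
```

The escaping function works on single bytes while the database reads characters, and that mismatch is the whole problem. An escaping routine that knows the connection's character set does not have it.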

The moral of the story is to use mysql_real_escape_string(), bound parameters, or any of the major database abstraction libraries.

Chris Tyler

Related link: http://blog.chris.tylers.info/index.php?/archives/17-How-to-Rollback-Package-Upd…

Fedora Core 4/5 uses yum for package management. yum is built on top of rpm, and pirut, pup, and yumex are graphical interfaces built on top of yum. Together, these tools provide a simple-to-use, powerful package management system.

One of the least-known secrets about rpm is that it can roll back (undo) package changes. It can take a fair bit of storage space to track the information necessary for rollback, but since storage is cheap, it’s worthwhile enabling this feature on most systems.

Here are cut-to-the-chase directions on using this feature:

  1. To configure yum to save rollback information, add the line tsflags=repackage to /etc/yum.conf.

  2. To configure command-line rpm to do the same thing, add the line %_repackage_all_erasures 1 to /etc/rpm/macros.

  3. Install, erase, and update packages to your heart’s content, using pup, pirut, yumex, yum, rpm, and the yum automatic update service.

If/when you want to roll back to a previous state, perform an rpm update with the --rollback option followed by a date/time specification. Some examples: rpm -Uhv --rollback '9:00 am', rpm -Uhv --rollback '4 hours ago', rpm -Uhv --rollback 'december 25'.
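If you script your system setup, step 1 can be made repeatable. The sketch below is only an illustration: enable_repackage() is a hypothetical helper, not part of yum, and it assumes the setting belongs directly under the [main] header of yum.conf. It works on the file's text rather than on /etc/yum.conf itself, so you can try it safely. The last lines only build the rollback command string from the examples above; nothing is executed.

```python
import shlex


def enable_repackage(conf_text: str) -> str:
    """Return yum.conf text with tsflags=repackage present (hypothetical helper)."""
    lines = conf_text.splitlines()
    # Leave the text untouched if the setting is already there.
    if any(line.replace(" ", "") == "tsflags=repackage" for line in lines):
        return conf_text
    for i, line in enumerate(lines):
        if line.strip() == "[main]":
            lines.insert(i + 1, "tsflags=repackage")
            break
    else:
        # No [main] header found: assume a bare file and add the section.
        lines = ["[main]", "tsflags=repackage"] + lines
    return "\n".join(lines) + "\n"


# Made-up sample contents, not a real system file.
sample = "[main]\ncachedir=/var/cache/yum\ngpgcheck=1\n"
updated = enable_repackage(sample)
print(updated)

# Quote the date/time specification so the shell passes it as one argument.
rollback_cmd = shlex.join(["rpm", "-Uhv", "--rollback", "4 hours ago"])
print(rollback_cmd)
```

Running the helper twice leaves the text unchanged the second time, which is what you want from a setup script.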

Have you used package rollback?

Jeremy Jones

Related link: http://www.djangoproject.com/

When TurboGears came out, I was pretty excited about it. I was able to quickly throw together a digital photo management application for my wife. I was also able to quickly build her an online store for her new business. As is common with any technology, I encountered a number of problems while building my wife’s store. There have been issues with the Kid templating system which still appear to be unresolved. Maybe they’re fixed, I don’t know. I modified my code to bypass what was causing the errors I was seeing - at a loss of functionality and flexibility. Another issue I’ve seen is with CherryPy deadlocking on what appears to be session file cleanup. I could switch over to database session storage, but I’m not sure I want to go that route. If I stayed with TurboGears, though, I might not have to switch; I just read on the CherryPy list that this issue may be fixed.

Serendipitously, I recently started looking at Django. All of these issues and my glance at Django really got me thinking about how TurboGears was put together. I have had almost no problems with the TurboGears code itself. Actually, I can’t really think of any problems I’ve had with core TG code. The problems I’ve had have been with the underlying components - specifically CherryPy and Kid. I began to wonder if I might be better off with a unified solution rather than a solution made of components from separate projects. So, out of curiosity as much as technological motivations, I began porting my wife’s store to Django. I may be totally wrong about unified vs. multiple project component based, but my thoughts are at least reaffirmed by this blog post from David Heinemeier Hansson (the creator of Ruby on Rails).

The store consists of a product catalog, a shopping cart, a shipping calculation algorithm, and a payment system using PayPal. The most logical first step was to migrate the database model. From a user perspective, both SQLObject and Django take a common approach. A class corresponds to a table and class attributes (each with a certain declared type) correspond to columns in that table. The database migration was pretty simple. I did do a minor overhaul in how I group products together. I still have a little work to do on that, though. One huge plus to Django is the pre-built admin interface. With a total of two extra lines of code per class/table, you get a beautifully usable, customizable administrative interface to your database. I haven’t utilized it much yet, however. In these initial stages, I find it easier to populate the products into the database through a script. But I think when the site goes live, this will be a huge feature in managing orders.
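For readers who haven’t used either library, the class-to-table idea can be sketched in plain Python. This is not the SQLObject or Django API. Column, Product, and create_table_sql are made-up names for illustration, and the in-memory sqlite3 database is just a convenient way to try it out.

```python
import sqlite3


class Column:
    """A declared column type attached to a class attribute (illustrative)."""

    def __init__(self, sql_type: str) -> None:
        self.sql_type = sql_type


class Product:
    # Hypothetical model: each attribute declares one column.
    name = Column("TEXT")
    price_cents = Column("INTEGER")


def create_table_sql(model: type) -> str:
    """Build a CREATE TABLE statement from the class's Column attributes."""
    cols = [
        f"{attr} {col.sql_type}"
        for attr, col in vars(model).items()
        if isinstance(col, Column)
    ]
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(cols)})"


ddl = create_table_sql(Product)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute(
    "INSERT INTO product (name, price_cents) VALUES (?, ?)", ("mug", 1250)
)
rows = conn.execute("SELECT name, price_cents FROM product").fetchall()
print(ddl)
print(rows)
```

Real object-relational mappers add much more (queries, relations, validation), but the core mapping of one class to one table and one attribute to one column is the same.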

The next thing I did was to write a couple of quick pages to display the items in the product catalog. At this point, I knew that I was just experimenting and wasn’t committed to using Django yet. I wanted to see how Django would run in my production environment, which is under FastCGI.

Let me take a small pause to say that much of my recent trouble with TurboGears would have been caught earlier if I had deployed it incrementally to my production hosting server (or probably to a comparable environment not on the hosted server). So, I blame myself for not catching the problems earlier. I did try at one time to get the store running under FastCGI on one of my own servers in a comparable manner to how it’s run on Dreamhost (my hosting service). When I couldn’t get it running in a timely manner, I decided to not pursue it any further. In hindsight, this was obviously a mistake.

So, back to my tale. I let this minimalistic site run for a day or so to see if I could feel comfortable moving forward with Django. Everything checked out well. There were no anomalous errors. Both the admin interface and my product listing pages were displaying consistently. (However, the admin interface is devoid of images or a stylesheet. I’ll figure that out soon. I think it has to do with how I have my static/media mapped. I’m sure it’s not a problem. Famous last words, right?) At this point, I became fully committed to porting the full store over to Django.

Next, I set up a base template that each page would extend. This will be a nice addition to the site since I was unable to use this same feature in TurboGears. This is one of the problems I generally hinted at above regarding the Kid templating system. I also fleshed out some of the static pages so I mostly had the same look and feel and content. That was a snap.

I decided that I should next start building up the code around the product catalog, getting all the images and product details displaying properly. Again, this was simple. Django’s templating system is really simple and straightforward to use.

This is about as far as I’ve gotten. I’m hoping that I’ll have the site totally migrated by next weekend. I’ll post back with progress and thoughts on Django. So far, it’s working pretty reasonably.

Ming Chow

Related link: http://www.cs.tufts.edu/~mchow/excollege/s2006

I have returned to Tufts University to teach a new course, Introduction to Game Development, this semester. I am excited and fortunate for another teaching opportunity. Last year, I taught “Security, Privacy, and Politics in the Computer Age” at Tufts University, and it was a tremendous success. Teaching the course was a most rewarding and flattering opportunity for me. My course evaluation was very good. The students appreciated the applicable value of the course, and it gave them an exposure to the “tech culture” (most of the students were non-technical). Many of the students expressed that they wanted more technical content. Finally, the Tufts Experimental College asked students what courses they would like to see in the future, and many said a course on game development.

My experiences with computer graphics, networking, databases, software engineering, HCI/user interfaces, and algorithms will all come in handy. My past development of several small games will certainly be valuable as well. When I was a Computer Science student at Tufts, most of the courses offered were theory-based and very few implementation-based. I always questioned the value of what I was learning, and how I could put everything that I learned together. That is the beauty of game development: it requires all facets of Computer Science. I wish that such a course had been offered to me when I was a student, and this is a major reason why I am teaching the course back at my alma mater. Already, my course is filled. Several students said that they appreciate that I am teaching such a course at Tufts.

I will be using Java in the course, not C/C++. Why? Two reasons: portability and cost. I do not have a computer lab for the course, and not all students have Windows PCs. Most of the Java development tools, including the SDK and Eclipse, are free as in free beer, so students can do their work from their PC in their dorm room. Many students said that they know C/C++ so I’ll spend two days giving a Java crash course. And yes, I know that Killer Game Programming in Java will be a vital resource for me and my class.

I welcome any insights or concerns. All the lectures, assignments, examples, and resources are available on the course website at http://www.cs.tufts.edu/~mchow/excollege/s2006/. Please feel free to follow my course online.

It used to be that compiling a custom Linux kernel was almost a necessity. Something you just had to do if you wanted a working system. These days, with loadable kernel modules and better hardware support in the vanilla kernel, I find kernel patching and custom configuration less “necessary” but often still desirable.

I’m curious. Why do Linux users these days configure and compile a kernel? To increase performance or hardware compatibility? To add filesystem support or enable experimental features? Just for the fun of it? An attempt to have absolutely the smallest kernel image possible? An attempt to build a highly portable kernel? My reason usually boils down to getting a new piece of hardware to work fully.

Also, what problems do most people have when compiling a kernel? Migrating to new kernel versions? Patching the kernel? Getting an initrd image to load? My biggest three problems are knowing which options I must enable, finding those options in menuconfig, and knowing the name of the module I just compiled.

Please, write a comment and tell me why you find it necessary or desirable to compile a kernel, and the most annoying parts of the process.

Sid Steward

Related link: http://orsn.org/

From the ORSN FAQ:

A root server has a reference data base of all of the TLDs released by the ICANN (Top level Domain) e.g. DE, AT, CH, COM and many others.

The ORSN serves as a alternative for the existing root-server network since February 2002, which is coordinated by the ICANN. In contrast to the root servers of the ICANN, the ORSN servers should predominantly be placed in Europe. The maximum number of ORSN root-servers will be 13.

Until now, the administration is done by the USA and/or the ICANN. Therefor, a large number of root-servers is located in America. A loss or the modification of the root-server information could result in serious consequences for all other countries concerning their internet use. It is for example possible to stop a whole country from using the internet. In practice, this scenario didn’t happen so far but it can’t be excluded either.

It appears to be a local, independent ICANN root mirror and fail-safe. It subscribes to ICANN’s TLD policies, yet it reserves judgement over what it might mean for ICANN to ‘fail.’

Jono Bacon

When I first got into Open Source many moons ago, the advocacy movement was a thriving and vocal part of the community. Most of the movers and shakers back in the day were advocating the use of free and open software at work, to their friends and to their local community via LUGs and other groups. Back then, advocacy was a key part of the community, not only in showing existing computer users this alternative software, but also advising disadvantaged people for whom free software could really open up the doors to skill, employment and potential.

Recently it seems this community-driven advocacy effort has petered out somewhat, and there are far fewer people talking about, conducting, exploring, refining and pushing Open Source advocacy. What is surprising is that advocacy is certainly still going on. Within Open Source organisations as well as LUGs, community groups, IRC channels, forums and mailing lists there are countless people discussing and pushing the Open Source message. As I have written about a number of times in previous articles, advocacy is an artform that needs a reasoned, measured response, and is something that can certainly be buffed and refined. In other words, a stronger community could not only help spread Open Source further, but refine the quality of the message that is being pushed to develop an increased understanding in Open Source and free software.

So what can we do? Well, I would love to see more and more people getting involved in advocacy. To help push this a little further, I have set up Planet Advocacy. Like every other Planet site, Planet Advocacy collects together the blogs of those people who are involved in advocacy in some way. Planets have proven an ideal mechanism for developing ideas and communication between different people. They provide a one-stop shop for the cutting edge.

With the site still shiny and new, I am looking for people to add to Planet Advocacy. If you think your blog would be an interesting addition, get in touch with me and tell me how you are involved with advocacy and also include a 62×80 hackergotchi .png of your face. To be clear, you don’t need to work as a professional advocate to get on Planet Advocacy - if you are advocating Open Source in your spare time, you are more than welcome. Planet Advocacy is really only the start. It would be great to see more articles, case studies, discussion groups and public meetings. The people are out there, we just need to share our experience and ideas.

Advocacy is an important component in the Open Source community. There are thousands of companies, charities, schools and people for whom Open Source could make a real and tangible difference - intelligent advocacy can help bring them over to us. There is nothing quite so satisfying as seeing someone get as excited about Open Source as we all were when we first started out. Helping to bring new people over not only extends and improves our community, but it spreads the underlying principles of Open Source such as choice, openness and collaboration. Let’s see what we can achieve…

What do you think? Would you like to get involved? Do you think increased advocacy is worthwhile?

brian d foy

Related link: http://www.apple.com/support/downloads/bonjourforwindows_readme.html

I had to fire up my Windows box today, and I wanted to get some files off of my Mac. That’s not a big deal because they can see each other on the network, but from Windows I need to know the IP address of the machine I want it to look at.

I don’t do anything fancy to give names to my machines on the home network since most of them are Macs and find out about each other through Bonjour. Since my main Mac is called “buster”, from other Macs I get to it as “buster.local” wherever I need a host name. It’s all very nice and happens without me thinking about it.

I figured someone had probably made this available for Windows, and indeed, Apple has Bonjour for Windows. Now my Windows box is a bit more useful. Thanks Apple!

Jeremy Jones

Related link: http://www.turbogears.org/bugbounty.html

If I’m reading the link I’ve referenced right, if you submit a patch to the TurboGears project (a web development framework for the Python language) for one of the bugs with a “Develix” keyword and the patch gets accepted, Develix will reward the patch submitter with one year of free hosting.

I assume that it’s somehow important enough to Develix that these specific bugs get fixed that they are willing to set a bounty so that someone will fix them. If this is the case, it’s a win-win for the community. Develix gets some pertinent bugs fixed for a relatively low cost, the patcher gets free hosting, and the community gets the same bugs fixed, as well.

Kevin Shockey

Related link: http://conferences.oreillynet.com/cs/os2006/create/e_sess/

For the first time, the O’Reilly Open Source Convention will feature a Microsoft Windows track. The focus of this track will hopefully capture the growing momentum behind projects like Mono, mojoPortal, iFolder, Tomboy, F-Spot, Banshee, NHibernate, NAnt, and NUnit (just to name a few).

I believe that this new track addition recognizes this growing momentum and seeks to share it with the broader open source community. Of course a lot of the momentum is due to the ongoing success of Mono. Mono is now on release 1.1.13, which marks a feature freeze point for Mono in preparation for Mono 1.2. Windows.Forms is the only piece left before they officially move to version 1.2 of Mono. Their aim is to release Windows.Forms functionality that implements the .NET 1.1 API. From the news available, the most visible missing pieces left are Multiple Document Interface (MDI) and a few RichTextBox features.

I’m writing to encourage the communities behind these projects to submit proposals. It is important for us to respond, now that we have been given an opportunity. Think of it this way. As much as we might enjoy contributing to a project, I think most would agree that they get even more enjoyment when people use their software. The O’Reilly Open Source Convention offers the opportunity to expose your projects, your software, and yourself to a wide audience of perhaps some of the most influential people in the software industry. I can’t think of any other event that offers the same opportunity. It simply is “the” place for the open source community to meet up and connect face to face.

With the recent announcement of Mono’s inclusion in the next release of Fedora Core, it is clear that .NET-related open source is growing. Now is the time to share your knowledge and love of .NET with the world. Don’t hesitate: navigate to the Submit a proposal page and take the first step!

Will this new Windows track make you attend the O’Reilly Open Source Convention?

Jeremy Jones

As I’ve posted before, my wife wanted me to build her a website. Initially, I planned on building it using Plain Old HTML. It was going to be a plain storefront and customers would phone in orders. Then she decided that it would be more convenient if they could upload their images to us rather than email them. CGI would work perfectly for that. Then, we thought that maybe a store catalog and integrated shopping cart would be cool. I started digging into PHP for that. I shied away from TurboGears because I thought hosting would be a problem. After looking around, I decided that hosting was a non-issue, so I built her site in TurboGears.

I settled on Dreamhost for hosting because of price and FastCGI support. FastCGI is one of a handful of methods for deploying TurboGears in a hosted environment. FastCGI has been a source of frustration for me during this process and I don’t expect the frustration to go away any time soon. It just seems really quirky.

I finished my wife’s site yesterday. We did a final walkthrough of the site and I made a few finishing changes. I then began the “deployment to production” process last night. I followed the instructions on the “Installing TurboGears on Dreamhost” wiki.

Thus begins my frustration. Copy my files over. Not a problem. Modify the tg_fastcgi.fcgi script. Not a problem. Make a couple of changes to my TurboGears config file. Not a problem. Drop in a .htaccess file. Not a problem. Test that tg_fastcgi.fcgi runs properly from the command line. Not a problem. Point my browser at my site and get it to kick off the FastCGI process(es). Hmmm. It looked like it was trying to start something. I saw CPU utilization increase, but not on any process I had access to view. Then after what seemed like forever, as if by magic, there were maybe a dozen tg_fastcgi.fcgi processes running. That was liberating. The site was running. And it was pretty snappy, too.

There didn’t appear to be any obvious problems. Except when I needed to change something: then I had to “killall” the tg_fastcgi.fcgi processes so the change would take effect. FastCGI is apparently more finicky starting up right after you’ve just killed it. I again saw some unknown process eat a little CPU and then there were entries appearing in my log file that looked like this:

[Wed Jan 18 08:09:17 2006] [error] [client ] FastCGI: incomplete headers (0 bytes) received from server “/home/(my account)/(my domain)/tg_fastcgi.fcgi”

And then, after a while, it just magically came up.

Again, no obvious problems. Except when I added an item to my shopping cart. When I went to view my cart, there was nothing there. Then, when I clicked “View Cart” again, there was my item. Click again and it’s gone. Click again and it’s there again. Round and round we go. I’ve created a magical disappearing-reappearing shopping cart! Cool! Wait. Not cool. Customers won’t like that. Neither would my wife. I figured that the problem may be caused by the multiple tg_fastcgi.fcgi processes not sharing session data properly. Aarggghh. I switched over from using RAM as session storage to file-based session storage. The problem immediately went away.
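
The flickering cart is what one would expect if each FastCGI process keeps its own in-memory session table and successive requests land on different processes. The Python sketch below only illustrates that assumption: the Worker class, the round-robin dispatch, and the session ID are hypothetical stand-ins, not TurboGears or CherryPy code.

```python
from itertools import cycle


class Worker:
    """Hypothetical stand-in for one tg_fastcgi.fcgi process."""

    def __init__(self, store=None):
        # With no shared store, each worker keeps sessions in its own RAM.
        self.sessions = {} if store is None else store

    def add_to_cart(self, session_id, item):
        self.sessions.setdefault(session_id, []).append(item)

    def view_cart(self, session_id):
        return list(self.sessions.get(session_id, []))


def simulate(workers, views=4):
    """Add one item, then view the cart repeatedly, rotating through workers."""
    dispatch = cycle(workers)  # assumed round-robin; real dispatch may differ
    next(dispatch).add_to_cart("session-1", "8x10 print")
    return [next(dispatch).view_cart("session-1") for _ in range(views)]


# Per-process RAM sessions: the item is visible only on the worker that stored it.
ram_views = simulate([Worker(), Worker()])

# One store shared by every worker (standing in for session files on disk).
shared = {}
shared_views = simulate([Worker(shared), Worker(shared)])

print(ram_views)
print(shared_views)
```

With separate per-worker dictionaries the cart alternates between empty and full from one view to the next. With a single shared store, every view returns the item.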

Then I started getting 500 errors and entries in my log file that look like this:

server.log: self._lockFile(lockFilePath)
server.log: File “/home/(my account)/lib/lib/python2.4/site-packages/CherryPy-2.1.1-py2.4.egg/cherrypy/lib/filter/sessionfilter.py”, line 345, in _lockFile
server.log: raise SessionDeadlockError()
server.log:SessionDeadlockError

And 500 errors in the browser. And an unusable website. There appears to have been a bug entered against CherryPy which was supposed to fix this problem. Maybe I hit a corner case. I don’t know. But it looks like another session-oriented issue. Maybe FastCGI isn’t playing nicely with the session storage files.

So, I have a web application which is difficult to modify quickly because FastCGI doesn’t appear to have a nice “restart” option. (If someone knows of one, I’d appreciate you posting it here. I found a reference to giving a “killall -USR1”, but I really don’t want to try that right now. The server is running OK for the moment). It seemingly randomly spews 500 errors and has session deadlocking issues. There is also sometimes a significant lag during the first request after there have been no requests for a while. The site has been (mostly) fun to build. Deployment has been a beast, though.

I’m not blaming Dreamhost or TurboGears or FastCGI or CherryPy or anything else. I’m just venting a bit. It’s good to do that every once in a while. I guess tonight I’ll start trying to find solutions to the relevant problems.

Andy Oram

Related link: http://gplv3.fsf.org/

The General Public License covers some of the most important software
in widespread use: Linux, MySQL (dual-licensed by the vendor), Samba,
and many other modern packages, not to forget the suite of compiler
tools and command-line utilities from the Free Software Foundation,
for which the GPL was originally designed.

That’s why hundreds of people came to hear Richard Stallman and Eben
Moglen (law professor and general counsel for the FSF) lay out their
proposed new version of GPL on Monday. And why the audience included
world-renowned leaders from many free software projects–even some
projects such as Apache that aren’t covered by the GPL.

After the dramatic convocation I reported on, where the veil was
lifted on the hitherto secret draft of the GPLv3, the packed hall at
MIT thinned out over the next day and a half. Most people got what
they needed at the opening session: they found that the draft opposed
patents and Digital Rights Management, as expected, but that it made
no drastic changes in reaction to these threats or other changes in
the software field, and that the draft was graciously accommodating to
Application Service Providers (who could have expected their trade
secrets to come under attack), to software under non-GPL licenses, and
to companies acting in good faith to propagate and make a business
from GPL-covered software.

These attendees probably also realized that further work on the GPL
was going to descend into detailed textual analysis requiring both
sophistication and dedication. And most people were ready to leave
that to the committees set up by the FSF.

But don’t fade into the background. It’s easy to view the GPL and the
ongoing discussion about it through the nicely designed website,
including a JavaScript-driven comment area. Moglen has urged the
public to stay involved. And open source proponent Bruce Perens
offered several reasons to follow the upcoming year of GPL discussion:

  • This is a rare chance for the public to make law.

  • His committee found many problems with the draft; it will need a lot
    of work.

  • Feedback will be taken seriously. Bruce’s committee, at least,
    promises to review comments carefully.

So try visiting the site from time to time and take at least an hour
to look at where discussion is heading. Don’t expect to reverse the
philosophy behind the endeavor–Bruce believes “the intent of the
document is sound”–but help to avoid unexpected harm.

Behind the GPL version 3

So what’s in the GPLv3? The actual draft license is not that hard to
read, and a rationale document helps to explain it. Still, I feel it
worthwhile to summarize some of the more interesting points:

Compatibility with other free licenses

The drafters made changes that allow programmers to combine
GPL-covered code with code from other projects. The Apache and Eclipse
licenses were explicitly mentioned as compatible. This outreach is
particularly praiseworthy because those two projects offer key support
to Java programmers, and some others in the free software movement
have sometimes expressed distrust of Java. This change should help
everybody work together.

Patents

These are mentioned in four places in the draft. The goals here are
modest: essentially, to force programmers to relinquish patent-related
controls if they use free software. If they have patents on free
software, they must give a patent license to anyone using it. If they
have cross-licensed patents or otherwise gained rights to use patents,
they must help spread this protection to the users of their software.

Digital Rights Management

The goals here are also modest: to make sure free software and DRM are
not used together–in short, to prevent freedom from being used
against itself. First, users are forbidden from closing off access to
works through encryption or authentication keys. (This doesn’t cover
legitimate uses of encryption and authentication for privacy
purposes.) Another clause attacks the notorious “technical
circumvention” measures in the Digital Millennium Copyright Act and
copycat treaties and laws, ruling out the use of GPL-covered software
to carry out the measures.

Tracking infringement

Previous versions of the GPL had built-in termination of the license
if a propagator infringed on it. This minimized the need for copyleft
holders to police users, but it placed a burden on vendors and other
users trying to build systems on free software. They might infringe
unknowingly and have the carpet pulled out from under them at any
time.

Version 3 requires the copyleft holder to notify an infringer within
60 days of the occurrence. The new clause provides protection for
people trying to build a business. It also demonstrates a confidence
by the drafters that the free software community has matured enough to
invest the necessary resources to check up on users.

Provision for additional restrictions

If copyright holders want to go further than the GPL in trying to open
up software–by requiring Application Service Providers to reveal the
code running on their servers, or to retaliate against patent
holders–they are explicitly allowed to do so. These clauses allow
people to experiment with their own solutions to what the Free
Software Foundation sees as problems, but for which it currently does
not see effective remedies.

There are many, many more details. The drafters have learned over the
years which clauses of the GPL have created confusion or prevented
people from doing useful things. The license has also been drafted
with more care toward making it applicable in different countries.

Zak Greant, a volunteer who answers licensing questions for the FSF,
told me he is happy with the new draft, finding it both clearer and
more comprehensive. While he currently has to refer people to a FAQ or
other ancillary documents to answer questions, he estimates that 70%
of the questions now could be answered by the legal document itself.
I hope the preceding list makes you curious enough to check out the
official FSF site.

And Beyond

All that said, I took away from the conference a pessimistic
impression that the GPL is not the battlefield where the information
struggles of our day will be resolved. The drafters made no
suggestion that they had solved the problems of patents, DRM, or other
threats to users’ control over information. On the contrary, they used
the conference as a forum to call for political action on these
threats.

The looming collision between the control-obsessed entertainment
industry and today’s dynamic communities of programmers and modders
will be carried out in the social realm more than the legal one. The
law may produce some of the carnage, but it will mostly come along to
clean up the debris after the victory of one side or the other.

If the public turns against Digital Rights Management–if they even
understand what it is–they will do so because of outrageous missteps
like the recent botched Sony CD controls. Even during this highly
publicized incident, it was nearly impossible to find a teachable
moment concerning the importance of user control over computer systems
and software.

I hope FSF spokesperson Peter Brown is right in saying that we have a
great opportunity to explain the benefits of freedom to the public
over the coming year. I also sympathize with his claim that one must
use the term “freedom” instead of focusing on “open source.”

But opponents of the “open source” terminology always caricature the
term and its supporters. Those who pushed for open source have
promoted its ethics and community benefits just as free software
proponents have. The virtue of “openness” as a general principle is
powerful, and has brought people out on the streets in many countries.

I admit that the words “open source” do not slam the ethical challenge
down on the table the way the word “freedom” does. But “open source”
has helped free software spread to far more places in business and
public organization. Now many more people have something to defend
when the free software proponents warn them they’re in danger of
losing it.

I think FSF knows that it needs allies; that’s why the proposed
license demonstrates so much conciliation and coalition-building. In
addition, the popularity of Samba, and the presence of Samba project
leaders at the conference, show that the free software movement has
accepted the need to co-exist with a non-free world, at least for a
while.

The FSF has reacted to the encroachment of outside control by trying
to exert forms of control all their own. They have often been
criticized for this, and I don’t want to rehash the flame wars here.
But after the license assails software patents and DRM, it goes on to
impose a ban on “works that illegally invade users’ privacy.” This
makes some sense in context (because some forms of DRM snoop on users)
but one wonders when the FSF is going to stop.

Why not keep going and ban the use of free software, for instance, to
promulgate racism? The obvious answer to this question is that it’s
hard to define what constitutes promulgating racism, and that banning
it would lead to encroachments on other activities that are
beneficial. But the same dilemmas dog the FSF as it tries to fend off
patents and DRM.

The other way to approach free software is the old BSD way of throwing
open the doors and allowing proprietary vendors to enfold the software
into closed products. Proponents of the BSD approach have made a
strong argument: if the free software movement is really a superior
way to treat software and its users, the free versions of the software
will ultimately win out over the proprietary ones. After all, who
could turn down the free software promise of open source code, a
community of experts to turn to for support, and a stream of new
features that will automatically interoperate on different systems?

History seems to bear out this argument, but once again, I’m not
writing this to revive an old debate. I’m putting it here to show that
the fate of free software depends on the reactions of the general
public.

It’s good for programmers to have a choice. For those who
feel it safer to require the unencumbered freedom of what they’ve
produced, the GPL should be as robust and usable as possible. The year
2006 is our year to make it so.

brian d foy

Related link: http://perlcast.com/2006/01/17/interview-with-brian-d-foy-about-the-winter-2005-…

Josh McAdams of Perlcast interviews me about the latest issue of The Perl Review.

If you don’t want to listen to me (I sure don’t!), Perlcast also has interviews with Slash-programmer Chris Nandor, Learning Perl author Randal Schwartz, Perl creator Larry Wall, and many other Perl names.

Andy Oram

Related link: http://gplv3.fsf.org/

We got it just a few hours ago–the proposed GPL 3 license. Most of
the world got it from a web site, while a lucky few hundred of us got
it at a formal meeting at MIT.

Lots of observers wondered how Richard Stallman, Eben Moglen, and
their advisers would handle such hot issues as remote services (called
Application Service Providers in the 1990s) and patents. Surprisingly,
the license combines conservatism with room for experimentation, a
balance for which U.S. law offers a useful metaphor.

There’s a big right to innovate in law, as in everything else, in the
United States. The right to make law is divided among the national,
state and local governments. For instance, states vary widely in tax
schemes, health insurance provisions, abortion controls, environmental
protections, and other things. This latitude is important not only
because different regions have different needs, but because an
experiment in one state can prove whether something is a good idea,
and can then be adopted at the national level.

The designers of the proposed GPL took a similar open approach in
remote services and patent retaliation. On both issues, the proposed
GPL upgrade takes a middle ground.

Thus, it makes no change that would restrict remote services from
using free software. This is wise in my opinion, because no
reasonable observer would want to drive Google (for instance) away
from free software by requiring them to release all the code that
implements their ranking algorithms.

But the proposed GPL leaves an opening for experimentation: it allows
people to add clauses that would require remote services to propagate
their source code.

This means that if you think you have a smashing good restriction that
would help the public by encouraging remote services to share their
software, and you have a valuable program these services might want to
use, you can release your code under the new GPL and add in your pet
cause. If you strike it lucky and your software is so valuable that
services want it, they will comply with your restriction.

That means there’s a market for legal innovation in the GPL. If others
in the free software community decide your clause led to more benefit
than harm, they’ll start adding it to their own licenses. And
eventually, I assume, after several years of success, the guardians of
the GPL may incorporate your clause into a new version of the GPL.

Similarly, the GPL designers took a much more modest approach to
patents than many people expected. The GPL itself includes a handful
of limited clauses.

Thus, if you have a patent on any software you release under the GPL,
you are granting a patent license without encumbrances to everyone who
uses the software.

Furthermore, if you have a patent license yourself (obtained by
cross-licensing, for instance) on software you release under the GPL,
you have to “act to shield” all users of that software. This is a
vague clause that Moglen hopes to tighten up after discussion.

Neither of these clauses addresses the most common situations where
holders of patents swoop down to attack free software. But clearly,
Moglen’s years of research into patents have not persuaded him that he
can provide an effective defense against this in a license (or, I
imagine, in patent pools and other mechanisms).

But again, a clause in the GPL 3 allows other people to impose patent
retaliation. This can provide a legal prop for efforts such as patent
pools. We will see how well they function over time.

Meanwhile (as several people at the conference have stressed) we need
to continue to fight software patents on a policy level. In the
European Union, software patent proponents react to every defeat in
every legal forum by finding another legal forum to bring the issue
back to life.

In contrast to the tentative steps toward handling remote services and
patents, the GPL comes out very strongly against Digital Rights
Management, a term to which Stallman himself objects. (No law gives
a copyright-holder or broadcaster rights to impose the restrictions
that DRM usually imposes.) And the new GPL contains a complicated
clause targeted at DRM. As I read it, the clause requires the sharing
of any key that controls access, thus rendering the key useless for
such control and making access equally available to all.

The conference was buzzing long before the opening statements and has
been buzzing ever since, but I wonder how much more we’ll learn, or be
able to improve on the proposal, during the next day and a half. In my
opinion, Moglen did a stupendous job presenting the meaning and
reasons for the clauses. Thoughtful responses will take weeks or
months to emerge, and the proposal is open now for world discussion.

I also heard from Free Software Foundation staff that more conferences
like this one are being planned, one for somewhere in Latin America
and another in Europe. Stallman apologized for holding this conference
in the United States, explaining that they couldn’t arrange an
alternative and listing diverse ways that people were prevented from
visiting the United States (or refused to come and be subjected to
harassment at the consulate or the airport).

Meanwhile, it’s a whale of a conference. The weather is cold outside
but the atmosphere is popping in the conference hall, which is full to
capacity. I think I’ve never been with so many people I know in one
place, including my own wedding. The tone is very constructive.

People who have always hated the GPL will show no new warmth to the
new version. People who have used the GPL, I predict, will move to the
new one. The changes are relatively conservative, in my opinion, and
the ones that take the most risk are doing so for causes that all of
the GPL’s supporters are united on. However, no one is forced to
move. If substantial projects stick to the GPL 2, it will represent a
failure to persuade on the part of Stallman and Moglen. But in this
matter there is always choice.

brian d foy

Related link: http://www.yapc.org/America/

Yet Another Perl Conference, North American edition, is in Chicago on June 26-28, 2006. They issued their call for participation during the black hole I call December.

Their website has the details of the submission process (i.e. where to send your email) and topics of interest. Since this is Chicago, you’re allowed to vote as many times as you like for your own submission.

Curiously, the first Ruby on Rails conference (RailsConf 2006) is in Chicago on June 22-25. The YAPC folks hope that they can get some of the Ruby people to stick around so they can have some sort of cross-language event.

Derek Sivers

Related link: http://mysql.he.net/doc/refman/5.0/en/charset.html

I’ve just finished solving one of the most difficult and tedious problems I’ve ever faced, so I have to share the solution here in a little tutorial. I’m sure there are better ways, but this is what worked for me.

THE PROBLEM - PART 1:
My old CD Baby MySQL database from 1998 was filled with foreign characters and was in MySQL’s default (latin1) encoding.
For years, customers and clients had been using our web interface to give us their names, addresses, song titles, bio, and many things in all kinds of alphabets.
I wanted everything to be in UTF-8. (The database, the website, the MySQL client, everything.)

QUICK DEFINITION : "FOREIGN CHARACTERS"
When I say "foreign characters" I mean not just Greek, Icelandic, Japanese, Chinese, Korean, and others shown at Omniglot, but also the curly-quotes, ellipsis, em-dash, and things described at alistapart.

START OF THE SOLUTION (THE EASY PART):
* - Found a few hours of downtime at 2am on a Sunday night.
* - Shut down the website.
* - Did a raw data dump (mysqldump) of the data to a regular text "dump.sql" file. (85 tables, millions of rows, an 8 gig dump)
* - Completely removed MySQL 3.2 from the system
* - Installed MySQL 5.0 (FreeBSD ports), making sure to use --with-charset=utf8 while compiling (see http://dev.mysql.com/doc/refman/5.0/en/charset-server.html)
* - Did a sed replace on the dump.sql file, changing all table types to utf8.
* - (Also changed from MyISAM to InnoDB but that’s a different story, and had no problem.)
* - Changed my HTML header Content-Type to charset=utf-8 everywhere
* - Changed /etc/my.cnf to default charset utf8
* - Loaded the dump.sql file, and turned the website back on.
* - Made sure it mostly worked, and went to sleep
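
The sed step in the list above could look something like the following Python sketch. It assumes, hypothetically, that each CREATE TABLE statement in the dump ends with a line reading ") TYPE=MyISAM;", as dumps from older MySQL versions commonly do. The post gives neither the actual text of this dump nor the sed expression that was used, so the pattern and the sample table here are illustrative only.

```python
import re

# Assumed shape of a table-definition tail in an old dump; adjust to match yours.
TABLE_TAIL = re.compile(r"^\)\s*TYPE=MyISAM;\s*$")


def rewrite_table_options(lines):
    """Yield dump lines, switching table tails to InnoDB with a utf8 default."""
    for line in lines:
        if TABLE_TAIL.match(line):
            yield ") ENGINE=InnoDB DEFAULT CHARSET=utf8;\n"
        else:
            yield line


# A made-up table definition standing in for one of the 85 tables.
sample = [
    "CREATE TABLE clients (\n",
    "  id int(11) NOT NULL auto_increment,\n",
    "  name varchar(127) default NULL,\n",
    "  PRIMARY KEY (id)\n",
    ") TYPE=MyISAM;\n",
]
converted = list(rewrite_table_options(sample))
print("".join(converted))
```

Because the function is a generator, it can be fed an open file object and written out line by line, which matters for a dump of several gigabytes.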

THE PROBLEM - PART 2:
Some foreign characters were perfect. Others were a jumble : what should have been one quotation-mark turned into a series of THREE jumbly characters. Weird. Had to be fixed. No idea where to start.
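
One plausible way for a single quotation mark to become three jumbled characters is for its three UTF-8 bytes to be read as three separate latin1 characters. The Python sketch below reproduces that effect. It uses Python's cp1252 codec as a stand-in for MySQL's latin1, and the function name is made up for illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were cp1252."""
    return text.encode("utf-8").decode("cp1252")


apostrophe = "\u2019"  # a curly apostrophe (right single quotation mark)
garbled = misread_as_latin1(apostrophe)

print(apostrophe.encode("utf-8").hex())  # the three UTF-8 bytes
print(len(apostrophe), len(garbled))     # one character in, three out
print(garbled)
```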

FIGURING OUT WHAT’S WRONG (THE HARD PART):
* - Unless you want to do *everything* in a web browser, you need to get a terminal that does Unicode and can display foreign characters. I used uxterm. See http://czyborra.com/unicode/terminals.html
* - I learned about using the SET NAMES utf8 query, but when I did that almost everything turned into a jumble.
* - I could send the database a set names utf8 command, and SOME would work. Or I could do set names latin1, and the rest would work. I was stumped.
* - It took about 10 hours of frowning and furiously typing, but I found out that
— #1 : The MySQL server was using UTF8 encoding.
— #2 : The MySQL client was using latin1 encoding.
— #3 : Even if I got the command-line MySQL client to use utf8, the PHP client was still using latin1 encoding.
— #4 : Most of my data must have been put into the MySQL server with latin1 encoding, which is why it worked with latin1 encoding on the client when getting it out.

Seems I had some characters in latin1, some characters in UTF-8, some in the database as HTML equivalents (&#20998;) and some characters that were just a total mystery.

A TOOLBOX FOR SLEUTHING CHARACTER ENCODING PROBLEMS:

#1 - USE MySQL CHAR_LENGTH TO FIND ROWS WITH MULTI-BYTE CHARACTERS:
SELECT name FROM clients WHERE LENGTH(name) != CHAR_LENGTH(name);
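
The query works because LENGTH counts bytes while CHAR_LENGTH counts characters, so the two differ only when some character takes more than one byte. A minimal Python sketch of the same test, with made-up sample names and a hypothetical helper name:

```python
def has_multibyte(text: str) -> bool:
    """True when the UTF-8 byte count differs from the character count."""
    return len(text.encode("utf-8")) != len(text)


names = ["Bjork", "Björk", "東京"]  # made-up sample values
flagged = [name for name in names if has_multibyte(name)]
print(flagged)
```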

#2 - USE MySQL HEX and PHP bin2hex
SELECT name, HEX(name) FROM clients;
Get the result back into PHP and run a bin2hex on the string. Compare it to MySQL’s hex of that same string.

#3 - SEE IT IN BOTH ENCODINGS
$db->query("SET NAMES latin1");
$db->query("SELECT name, HEX(name) FROM clients");
(compare the string and its hex result from MySQL with the bin2hex from PHP)
$db->query("SET NAMES utf8");
$db->query("SELECT name, HEX(name) FROM clients");
(compare the string and its hex result from MySQL with the bin2hex from PHP)
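
To make the hex comparison concrete, here is a Python sketch of the byte patterns to look for, using one accented letter. Python's cp1252 codec stands in for MySQL's latin1, and the helper name hex_of is made up for illustration.

```python
def hex_of(text: str, encoding: str) -> str:
    """Return the uppercase hex of text encoded with the given codec."""
    return text.encode(encoding).hex().upper()


letter = "é"

as_latin1 = hex_of(letter, "cp1252")  # one byte
as_utf8 = hex_of(letter, "utf-8")     # two bytes

# Double-encoded: UTF-8 bytes misread as cp1252, then stored as UTF-8 again.
misread = letter.encode("utf-8").decode("cp1252")
double_encoded = hex_of(misread, "utf-8")

print(as_latin1, as_utf8, double_encoded)
```

Seeing four bytes where two are expected is a strong hint that a value was converted to UTF-8 twice.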

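For all those strings that looked perfect in LATIN1 encoding, here’s how I would fix them in the database:
$db->query("SET NAMES latin1");
$db->query("SELECT id, name FROM clients");
// For each returned row $x (how you fetch it depends on your database wrapper):
$id = $x['id'];
$hex = bin2hex($x['name']);
$db->query("SET NAMES utf8");
// UNHEX takes a quoted hex string and returns the original raw bytes
$db->query("UPDATE clients SET name=UNHEX('$hex') WHERE id=$id");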

That seemed to work, for most things.
Problem is, only SOME of the database was in latin1 encoding, so I had to use a few quirky ways, but mostly my own eyes, to fix only these things, and not accidentally re-encode something that was perfect.
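
One way to reduce the risk of re-encoding good data is to attempt the reverse conversion and keep the result only if it succeeds. The Python sketch below is a heuristic under stated assumptions, not the method used in this post. The function name is hypothetical, and Python's cp1252 codec (which rejects a handful of byte values that MySQL's latin1 accepts) stands in for MySQL's latin1.

```python
def repair_double_encoding(text: str) -> str:
    """Undo a UTF-8-read-as-cp1252 mix-up, leaving other strings untouched."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or the bytes are not valid UTF-8:
        # treat the string as already correct.
        return text


samples = ["don\u00e2\u20ac\u2122t", "Björk", "東京", "plain ascii"]
repaired = [repair_double_encoding(s) for s in samples]
print(repaired)
```

Strings that are already correct usually fail one of the two conversions and pass through unchanged. The check can still misfire on text that happens to form valid UTF-8 when re-encoded, so it does not replace looking at the results by eye.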

#4 - USE A HEX/UNHEX REPLACE FOR THE UNFIXABLE CHARACTERS
Imagine, after all that fixing, you found strings like this:

Let~!@s say ^|%What a nice house you~!@ve got here, don~!@t you think?^!%.

Who knows when or how this happened, but obviously ~!@ is meant to be an apostrophe, ^|% an open-quote, and ^!% a closing-quote.

I’d use MySQL SUBSTRING to find the 3 characters that needed replacing:
SELECT SUBSTRING(quote, 353, 3) FROM table WHERE id=1;

Once you’ve narrowed it down to the exact string, add a HEX() around it:
SELECT HEX(SUBSTRING(quote, 353, 3)