October 2004 Archives

brian d foy

Related link: http://www.apple.com/itunes/download/

There isn’t much to report here. I started iTunes today and it told me to upgrade.

It looks like most of the new stuff is for the new iPod, but there is a curious new feature in the Edit menu. Select a playlist or library and the “Show Duplicate Songs” menu item becomes available.

To try this new feature, I actually had to do a little work. iTunes is smart enough not to add the same file to the music library twice. First, I added an MP3 and changed its tag info inside iTunes. I then re-added the original file.

iTunes doesn’t think they are the same song, though. “Show Duplicate Songs” doesn’t find them.

I change their tag info back to what it should be, and iTunes now thinks they are duplicates.

I change one letter in the title of one. They are no longer duplicates.

Some people have been keeping low-rate encoded versions so they can cram more onto their iPod. I down-sample the duplicate, and change its tag info to match the original. iTunes still thinks they are duplicates, even though one is encoded at a lower bit rate.

I import a completely different MP3, making sure it’s a different play length. I change its tag info to be exactly the same as the first MP3. iTunes thinks they are the same, even though they have different lengths.

Worse than that, not all of the tag fields matter. I can add a comment to one but not the other, and iTunes still thinks they are duplicates. The Song Name and Artist have to be the same, but the Album and Genre can be different. I guess this almost makes sense: rip your favorite artist along with a couple of greatest hits CDs and you get the same song twice, but with different album names. On the other hand, I have three versions of “Ghost Rider in the Sky” by Johnny Cash, and they are not duplicates, but since they have the same Song Name and Artist, iTunes says that they are.
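The behavior described above is consistent with a very simple rule. Here is a minimal Python sketch of it, assuming an exact match on Song Name and Artist and nothing else. The `Track` type, the `find_duplicates` function, and the album names and lengths are hypothetical; how iTunes really decides is not something these experiments can confirm.

```python
from collections import defaultdict
from typing import NamedTuple


class Track(NamedTuple):
    name: str
    artist: str
    album: str
    seconds: int


def find_duplicates(tracks):
    """Group tracks by (name, artist) only; album and length are ignored."""
    groups = defaultdict(list)
    for track in tracks:
        groups[(track.name, track.artist)].append(track)
    # Keep only the groups that contain more than one track
    return [group for group in groups.values() if len(group) > 1]


# Made-up library entries, for illustration only
library = [
    Track("Ghost Rider in the Sky", "Johnny Cash", "Album A", 225),
    Track("Ghost Rider in the Sky", "Johnny Cash", "Album B", 198),
    Track("Ghost Rider in the Sky!", "Johnny Cash", "Album A", 225),
]
duplicates = find_duplicates(library)
print(len(duplicates), len(duplicates[0]))
```

Under this rule the first two tracks are flagged even though their lengths differ, and the third escapes because one character of its title changed.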

So, on first look, I think “Show Duplicate Songs” is a dubious feature. I want it to look inside the file and compare the bits of the MP3 frames while ignoring the tag. If the frames are the same, it’s the same song even if the tags are different.
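A rough Python sketch of that idea: strip a leading ID3v2 tag and a trailing ID3v1 tag, then hash whatever is left. It assumes those are the only tags in the file (it ignores APE tags, appended ID3v2 tags, and so on), and `strip_tags`, `audio_digest`, and the `id3v2` helper are hypothetical names for this illustration.

```python
import hashlib


def strip_tags(data: bytes) -> bytes:
    """Return the bytes between a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    if data[:3] == b"ID3" and len(data) >= 10:
        # The ID3v2 size field is four "syncsafe" bytes, 7 useful bits each
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag adds another 10 bytes
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128  # ID3v1 is a fixed 128-byte block at the end of the file
    return data[start:end]


def audio_digest(data: bytes) -> str:
    """Hash only the audio portion, so tag edits do not change the result."""
    return hashlib.sha256(strip_tags(data)).hexdigest()


def id3v2(payload: bytes) -> bytes:
    """Build a minimal ID3v2.3 tag around an arbitrary payload (demo helper)."""
    n = len(payload)
    size = bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])
    return b"ID3\x03\x00\x00" + size + payload


frames = b"\xff\xfb\x90\x00" + bytes(range(200))  # stand-in for MP3 frame data
file_a = id3v2(b"title one") + frames
file_b = id3v2(b"a much longer, different title") + frames + b"TAG" + b"\x00" * 125
print(audio_digest(file_a) == audio_digest(file_b))
```

A comparison like this treats two files as the same song only when the audio bytes are identical, so a down-sampled copy would count as a different file.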

brian d foy

Related link: http://www.r-project.org/

I ran across R recently, and this week over lunch I talked with an economist about statistical packages. Neither of us had tried R, though. It’s GNU and it’s free, unlike some other popular packages.

R has a Mac OS X package that installs quite nicely. They also have pre-compiled binaries for Linux and Windows. The R community looks like it’s stealing the best part of the TeX community, just like Perl did. Where TeX has the Comprehensive TeX Archive Network (CTAN) and Perl has the Comprehensive Perl Archive Network (CPAN), which are really just gussied-up FTP servers, the R community has the Comprehensive R Archive Network (CRAN). And, since I link to the Wikipedia entries for CTAN and CPAN, I created my first Wikipedia entry: CRAN.

The R project page has lots of pretty pictures and examples, but for the numbers nerds, here’s a little taste:

I wanted to compare the occurrences of the words “wrong” and “right” in the perlfaq repository, mostly because it’s Saturday and it’s raining outside and I don’t have any new Netflix movies to watch. I have the gory perl details in my use.perl journal.

    doc           wrong    right
----------------------------------
perlfaq1.pod        0        4
perlfaq2.pod        0        4
perlfaq3.pod        1        8
perlfaq4.pod        4       12
perlfaq5.pod        5        3
perlfaq6.pod        6        6
perlfaq7.pod        4       11
perlfaq8.pod        2        5
perlfaq9.pod        1        3
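The Perl that produced this table lives in the use.perl journal and isn't shown here. As a stand-in, here is a small Python sketch of one way to count a word in a document. The matching rules (whole words only, case-insensitive) and the `count_word` name are assumptions, so its counts need not agree with the table.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up sentence, for illustration only
sample = "Right! That is the wrong way; the right way avoids wrongdoing."
print(count_word(sample, "wrong"), count_word(sample, "right"))
```

Running it over each perlfaq file in turn would give one row of the table per file.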

Curiously, the distribution of “wrongs” is a bell curve, although not quite symmetrical.

|
|           *
|         * *
|       * * * *
|       * * * *
|       * * * * *
|     * * * * * * *
0+------------------
  1 2 3 4 5 6 7 8 9

I want to get the standard deviation, not because it’s useful but more because that’s what I’m used to doing when I see a chart like that. I could plug the numbers into one of my fancy calculators, but then I wouldn’t get to play with R.

It’s really easy. Scary easy compared to the stuff I had to deal with way back when I was in college and writing my own statistical packages so I wouldn’t have to use the existing ones. I take the numbers from my chart and put them into R, then calculate the numbers I want.

albook_brian[791]$ R

R : Copyright 2004, The R Foundation for Statistical Computing
Version 2.0.0  (2004-10-04), ISBN 3-900051-07-0

> freq <- c( 1,4,5,6,4,2,1 )
> mean(freq)
[1] 3.285714
> median(freq)
[1] 4
> var(freq)
[1] 3.904762
> sd(freq)
[1] 1.976047
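For anyone without R installed, the same four numbers can be cross-checked with Python's standard `statistics` module. This is only a sanity check of the arithmetic; like R's `var` and `sd`, `variance` and `stdev` divide by n - 1.

```python
import statistics

freq = [1, 4, 5, 6, 4, 2, 1]

print(round(statistics.mean(freq), 6))      # arithmetic mean
print(statistics.median(freq))              # middle value of the sorted list
print(round(statistics.variance(freq), 6))  # sample variance (divides by n - 1)
print(round(statistics.stdev(freq), 6))     # square root of the sample variance
```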

That’s enough to hook me. R has all sorts of other, much more powerful features, and I’m looking forward to exploring those too.

Uche Ogbuji

Related link: http://itre.cis.upenn.edu/~myl/languagelog/archives/001596.html

In Igbo I’d approximate “O wu ya” or “Ezi okwu”. I’ve seen extremes from IBM where i18n is taken with the utmost seriousness to typical dot comdom where that “let ‘em eat English, dammit” attitude is all too familiar. I’ve always hoped computers would be a stimulus for preservation rather than extinction of languages.

brian d foy

Related link: http://www.apple.com/ipodphoto/

I stopped into the Apple Store on North Michigan in Chicago. I had an hour to kill, and I needed to pick up some things anyway (nothing exciting: some iPod Cleaner), and I needed to check some eBay auctions that just closed (PayPal is refunding seller fees all day today!). While I’m here, I might as well post to my weblog too (since my crappy Comcast broadband is grumpy today). I love this place.

The store is packed. People are huddled around the new G5 iMac (on which I am typing this). I don’t like the form factor myself, but I never have liked attached monitors (or in this case, attached computers).

Apple employees are walking up to everyone they can corner to show them the new 60 GB iPod, and people like it. I’m not just talking about the iPod: I actually like the store workers talking to me, unlike the bozos at the CompUSA down the street.

Sadly, the new iPod is $500, and I just bought a 40 GB one a couple of months ago.

I think the two guys behind me are both customers, although one is showing the other how to burn DVDs. The guy next to me is checking his email. A kid on the other side of the partition is playing a MIDI keyboard hooked up to GarageBand. People are smiling. And holy moly, there’s my wife! I didn’t show up with her but I’m supposed to meet her in 15 minutes! She’s checking her email too. Gotta go….

Uche Ogbuji

Related link: http://www.cincomsmalltalk.com/blog/blogView?showComments=true&entry=3276345743

Via Patrick Logan’s blog I found this rant by James Robertson. Both Robertson and Logan are right that XML is too often abused, but both completely miss the point about the nature of this misuse. They are talking primarily about the (mis)use of XML for configuration files, but Robertson builds his case on a premise that points out the continuing problems with the relationship between Web services and XML.

People who really don’t know anything about XML or WS always seem to conflate the two. I get tired of pointing out to people that WS != XML. Then again, can I blame them if XML is too easy to confuse with the many derivative technologies seeking to abuse it?

WS has long ago degenerated into a joke to all but a few marketing professionals, industry analysts and committed developers. Originally it was supposed to be an improvement over the likes of COM and CORBA, this improvement coming because somehow the use of XML would work a salubrious magic. So much for that fantasy. WS-*, as Robertson points out, and many others have before, is now a far more complex “stack” than the entire OMA (of which CORBA is but a part) and with much less grounding in practice. Too bad for WS folk. XML folk don’t care. Why? Because just as XML was never going to magically save an under-architected system from itself, XML was never likely to be substantively damaged by the fact that it was considered the keystone of said under-architected system.

Problem is that word “substantively”. XML is not tainted by WS in hard-core reality, but it has been hurt in a much less tangible way. The long pull on XML towards data orientation from the SOAP camp (as well as the database camp and parts of the old-school OO programming camp) has sown endless confusion about the character and best use of XML. XML was born as a simplification of SGML, a very prose-oriented format, and it has always been better for prose than data records. I too used to think that data-oriented and prose-oriented XML could happily coexist, but I’ve lately come to believe such a coexistence is dangerous.

Logan mentions YAML, which does seem to be a better option than XML for data record sets, and for some config formats it might be a better option too, though you couldn’t convince me that XML is not better for some. Certainly I’d prefer XML any day to Windows .ini type files, which always leave me bewildered. But my main point is dismay at the fact that the WS-* mess is so often used as a general strawman or springboard for any broad attack on XML.

Andy Oram

Related link: http://www.furzundfeuerstein.com/2004/10/election_2004_t.html

Brian McConnell, telecom consultant and O’Reilly author, summarizes how technology and fine-tuned organization together can help political activists monitor elections and deal with election-related problems and abuses quickly. While the suggestions are individually simple, they illustrate how subtle interactions between new technologies and old-fashioned interpersonal interaction can be powerful.

Derek Sivers

Related link: http://advogato.org/article/258.html

Bram Cohen - the guy that made BitTorrent - has a really interesting essay called How to Write Maintainable Code.

In it, the part that hit me the most, is this:

Create tools: There are two ways of building a barn - one is to make a hammer and use it to nail the barn together, the other is to nail it together with your hands. They might take about the same amount of time, but the hammer will help you again in the future.

This relates to my earlier post on bottom-up programming - which to me is the total opposite of the way I’ve done things for 5 years and a real life-saver of a philosophy, really solving the mess I’ve gotten myself into, all this time.

So now, as I’m re-writing a lot of things from scratch, I’m remembering Bram’s advice, and taking my time to create tools - nice reusable classes - so that all future development will be MUCH easier.

It takes a lot longer, and would appear to the outside like nothing is happening, but it’s pouring a great foundation, and is really very exciting. I’m also using phpUnit2 (a PHP5 port of JUnit) - which has been wonderful. I write a test file for every class, and a test for every method.
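To make the “test file for every class, test for every method” habit concrete, here is a minimal sketch. It uses Python's built-in `unittest` rather than phpUnit2, and the `Price` class is a hypothetical example, not one of the classes from the rewrite.

```python
import unittest


class Price:
    """A small reusable 'tool': money stored as integer cents."""

    def __init__(self, cents: int):
        self.cents = cents

    def add(self, other: "Price") -> "Price":
        return Price(self.cents + other.cents)

    def format(self) -> str:
        return f"${self.cents // 100}.{self.cents % 100:02d}"


class PriceTest(unittest.TestCase):
    """One test class per class, one test method per method."""

    def test_add(self):
        self.assertEqual(Price(150).add(Price(275)).cents, 425)

    def test_format(self):
        self.assertEqual(Price(1205).format(), "$12.05")
```

Saved as a file, this can be run with `python -m unittest`, which discovers `PriceTest` and runs both methods.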

Derek Sivers

Related link: http://www.37signals.com/svn/archives/000912.php

I rarely make an entry here just pointing to someone else’s blog entry, but I think everyone should read interface laundry and the Pushing Your Limits presentation it references. As I mentioned earlier in say no by default (also inspired by the 37 Signals guys) - I think this is one of the most important design lessons to learn. Not just visual-design, but EVERYTHING-design.

brian d foy

We’ve developed a night schedule in my house. My wife is an opera singer, so during productions, she works from roughly 7 pm to 11 pm, and after that we have dinner. I’m a techie type, so that suits me just fine.

We’re both well-acquainted with early morning TV, although TiVo usually keeps us away from that. However, this week I’ve noticed that a lot of things important to me happen at 3 am.

For instance, TiVo likes to talk to the mothership at 3 am. That may be a setting, but not one that I’ve changed.

Also, our internet cable provider, Comcast, likes to fiddle with things around then, apparently. The cable modem is generally useless around that time, which explains the error messages from TiVo saying that it needs to make a service connection.

Tonight, a couple remote servers I need to deal with are down. This is prime time programming! But it’s prime time on the other side too, and they know the only people they are going to annoy at this hour are music pirates and porn downloaders.

That’s the joy of 3 am.

Chris Shiflett

Related link: http://shiflett.org/archive/73

Someone finally wrote a good Firefox extension for del.icio.us. It’s called Foxylicious. What makes it good? It does exactly what I described in my previous comments about del.icio.us:

Now, if only there were browser plugins for Firefox and Safari that integrated del.icio.us into the standard bookmark mechanism (with some intelligent caching to minimize traffic).

This extension adds my del.icio.us bookmarks to my standard bookmarks menu in Firefox, organizes them by tags (alphabetizing these would be a nice improvement), and lets me choose when to update them (so that it’s not an abusive client). Other extensions just never seemed to grok it, as I mentioned before:

I’ve looked at this Firefox extension, but I fail to see what it offers that I don’t already have. Giving me the ability to post is great and all, but this bookmarklet already does that (I use the popup version).

Well done, Dietrich. Foxylicious is exactly what I needed.

brian d foy

Related link: http://www.google.com/sms/index.html

Send an SMS message to 46645 (GOOGL) to get back a Google search. There are some shortcuts too:

  • send a phone number to look up an address: handy for numbers that aren’t in my phonebook but show up on caller ID
  • send an area code to get an area name: where’s this call from? Oh, 920 is Wisconsin, I know who that was!
  • send a zip code to get a city name: I’d really like to send an address to get a zip code.

There are several other sorts of shortcut searches, but those are the ones that I’ve found useful.

Nitesh Dhanjani

Continental Airlines allows you to do instant messaging (Yahoo, MSN, AIM) for $5.99 a flight. You plug your laptop modem into the Airfone (phone headset) in the middle seat, and dial any number. Once connected, you swipe your credit card, and then refresh your browser to begin your session. If your credit card goes through, you are allowed to connect to Yahoo, MSN, or AIM for the rest of the flight.

But no Internet access is allowed by the service… hmm. I wasn’t convinced. Before I left for the airport, I configured my SSH server at home to run on port 5050 (Yahoo messenger uses this port), and tried to SSH into my box (port 5050) from the flight.. and it worked! Next, I port-forwarded my squid proxy port to my local box through the SSH tunnel, and was able to browse the web in air. As expected, the connection was very slow, and really not usable.. but well.. I had to try it!
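The tunnel described above boils down to a single `ssh` invocation. The Python sketch below only builds and prints that command line. The host name is a placeholder, and 3128 is squid's default port, assumed here because the post does not say which port the proxy used.

```python
# Hypothetical values: the post names only port 5050 and a squid proxy.
HOME_HOST = "user@home.example.org"  # placeholder account and host
SSH_PORT = 5050                      # sshd moved to the Yahoo Messenger port
SQUID_PORT = 3128                    # squid's default port; an assumption here


def tunnel_command(host: str, ssh_port: int, proxy_port: int) -> list:
    """Build an ssh command that forwards a local port to the remote proxy."""
    forward = f"{proxy_port}:localhost:{proxy_port}"
    # -p: remote sshd port; -N: no remote command; -L: local port forward
    return ["ssh", "-p", str(ssh_port), "-N", "-L", forward, host]


command = tunnel_command(HOME_HOST, SSH_PORT, SQUID_PORT)
print(" ".join(command))
```

With the tunnel up, the browser's HTTP proxy setting would point at localhost on the forwarded port.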

Jetconnect should filter outgoing traffic to be limited to certain ports as well as specific hosts (scs.msg.yahoo.com, etc). I just sent them email about this, but I’m not holding my breath. This, by the way, is not the first time I have come across inadequate network ACLs when it comes to ISPs that try to limit network activity…
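The filtering suggested here amounts to an allowlist keyed on host and port together, rather than on port alone. A minimal Python sketch of that check follows. `scs.msg.yahoo.com` and port 5050 come from the text above, while the `ALLOWED` set and `is_allowed` function are hypothetical, and a real list would also need the MSN and AIM servers.

```python
# Allowlist of (host, port) pairs; only the Yahoo entry is taken from the post.
ALLOWED = {("scs.msg.yahoo.com", 5050)}


def is_allowed(host: str, port: int) -> bool:
    """Permit a connection only when both the host and the port match."""
    return (host.lower(), port) in ALLOWED


print(is_allowed("scs.msg.yahoo.com", 5050))  # the messenger server
print(is_allowed("home.example.org", 5050))   # right port, wrong host
```

A port-only filter would have let the second connection through, which is exactly the hole the SSH trick relies on.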

Derek Sivers

Related link: http://www.hostbaby.com/

The thinking-out-loud question of the day : user-passwords versus domain-account-passwords.

Re-organizing the database for HostBaby web hosting.
Until now the client-to-domain relationship was one-to-one. (Very silly, wrong and lazy, of course. People with multiple domains on HostBaby had to just go sign up all over again.)
I’ve got that fixed so each person now has a single “client” account, with multiple domains inside.

The question is — do we assign them a client-level username + password, or just let them log in with any one of their domain-level usernames + passwords?
(Because each domain name *does* need its own username + password anyway.)

CLIENT-LEVEL USER+PASS:
(on top of each domain also having its own user+pass)
Upside:
* - one single username + password to remember to access their account
* - could log in to their account, even before their domain-account is ready
* - could match the way our domain registrar company is set up : one username + password controls many domains. we could keep these synced.
* - not EVERY domain account needs a username + password : aliases and redirects don’t. Only website accounts need it.
* - this is how other companies seem to do it (Network Solutions, GoDaddy, etc.)

Downside:
* - more to remember!
* - I’d have to let them log in with their domain’s username + password anyway, since that’s the one they know the best
* - most of our clients have only one domain. requiring two different usernames + passwords just to administer that one domain is silly
* - more customer service complaints : more explaining

DOMAIN-LEVEL USER+PASS ONLY:
Upside:
* - easier for them to remember : if they know any of their username + passwords, they’re in
* - people with one domain only have one user+pass to remember
* - no duplication
* - this is how they’d try to log in, anyway (and the way it’s been for years)

Downside:
* - we generate that domain-level username+password for them when creating their account, so there’s a downtime after they sign up where they have to wait to hear from us before they can log in to their account
* - security risk? people with multiple domains have more likely chance someone could guess their info?
* - doesn’t match how our registrar works : we’ll have to choose their *first* username+password to be the master one at the registrar, and make sure they use that to connect to their account there, even when they add new domain names with new username+password combinations

I think it’s about 50/50. I’m going to try the domain-level only, and see how it goes.
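Here is a rough sketch of what the domain-level-only option could look like, using Python and an in-memory SQLite database. The table and column names, the `add_domain` and `login` helpers, and the sample client are all hypothetical; this is not HostBaby's actual schema.

```python
import hashlib
import hmac
import os
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE domains (
        id        INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES clients(id),
        domain    TEXT NOT NULL UNIQUE,
        username  TEXT NOT NULL UNIQUE,
        salt      BLOB NOT NULL,
        pw_hash   BLOB NOT NULL
    );
""")


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def add_domain(client_id, domain, username, password):
    """Attach a domain, with its own login, to an existing client."""
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO domains (client_id, domain, username, salt, pw_hash) "
        "VALUES (?, ?, ?, ?, ?)",
        (client_id, domain, username, salt, hash_password(password, salt)),
    )


def login(username, password):
    """Return the owning client's id if any domain login matches, else None."""
    row = db.execute(
        "SELECT client_id, salt, pw_hash FROM domains WHERE username = ?",
        (username,),
    ).fetchone()
    if row and hmac.compare_digest(hash_password(password, row[1]), row[2]):
        return row[0]
    return None


# Made-up client with two domains, for illustration only
client_id = db.execute("INSERT INTO clients (name) VALUES ('Pat')").lastrowid
add_domain(client_id, "one.example", "pat1", "first-secret")
add_domain(client_id, "two.example", "pat2", "second-secret")
print(login("pat1", "first-secret") == login("pat2", "second-secret") == client_id)
```

Either domain login lands on the same client account, which is the one-to-many relationship described above with no extra client-level password.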

Anyone else gone through this kind of decision before?

Dale Dougherty

Related link: http://interviews.slashdot.org/interviews/04/10/20/1518217.shtml?tid=192&tid=214…

I enjoyed the Slashdot interview with author Neal Stephenson. He answers a question about his popularity by saying that “one way to classify artists is by to whom they are accountable.” He characterizes himself as a “Beowulf,” working within a relatively new tradition of popular novelists, able to support himself through his work. In contrast, “Dante” writers working within an academic literary tradition depend upon patrons or academic postings that underwrite the work.

Here’s his insight into how the two traditions interact: ”It has happened many times in history that new systems will come along and, instead of obliterating the old, will surround and encapsulate them and work in symbiosis with them but otherwise pretty much leave them alone (think mitochondria) and sometimes I get the feeling that something similar is happening with these two literary worlds. The fact that we are having a discussion like this one on a forum such as Slashdot is Exhibit A.”

In response to a question about whether hacking tools should be protected under the US Constitution, he responds: “I’m pretty sure that the Founding Fathers were thinking of flintlocks, not perl scripts, when they wrote the Second Amendment.”

On producing his own books: “For the Baroque Cycle books I needed to convert my manuscripts, which were all TeX files, into a Quark format used by the publisher. … This was nasty and tedious but, in the end, reasonably satisfying.”

On why brick-and-mortar bookstores are still around: “Because it turns out that a bookstore is a lot more than a machine that swaps money for books.”

brian d foy

Related link: http://use.perl.org

Last night, Chris Nandor added the Red Sox logo to use.perl. Tonight, they do what no team has ever done in baseball history: come back from 0-3 to win a championship series. They also played the most hours of baseball played in any seven games.

Now he has to keep it up or risk jinxing the World Series. At least this time he wasn’t cheating. Curiously, googling for “chris nandor cheating” returns as its first result The Perl Review.

Ming Chow

The recent news on IT has been less than welcoming, if not demoralizing. Some of the recent news headlines:

Such headlines have been appearing frequently, and it is somewhat frightening. I was going to respond to Kevin Schmidt’s recent article Down With the Software Engineer! Long Live the Application Builder! but the slew of bad news from the IT industry was coming out faster than I could think.

So the question is, are programmers and software engineers going to be extinct?

To start off, there is a difference between a programmer and a software engineer. Loosely, a programmer is focused on building software using specific language(s). A software engineer has the abilities of a programmer, but also applies software development methodologies such as prototyping, documenting, and testing.

I asked a friend about software engineering at his research laboratory, and his response was: Everybody is a programmer here. In some ways, everyone is also a software engineer. Everyone uses programming and some software development methodologies to build tools and products for their research. This seems to be the mentality at many firms and companies. Programming is becoming a trade, and at many places it is a required skill in order for people to do their work, just as knowledge of Microsoft Office is a highly recommended, if not required, skill for a majority of office personnel. Alas, it is a major reason why the job of “programmer” is on the decline.

What about the job of software engineer? My professor once told me that software development is 80% design, and 20% coding and maintenance. His comment seems to still hold true. These days, prototyping, documenting, and/or testing are critical aspects of a software engineering position. Added to that, knowledge in algorithms, security, user interfaces, or other fields (e.g. any of the sciences including mathematics, physics, and psychology) is recommended.

I can understand why the role of programmer is going by the wayside, but it will never be totally extinct. Instead, it will be blended in with many other roles on the job. I do not see the role of software engineer going by the wayside like the programmer, because a software engineer encompasses a large spectrum of methodologies. However, I do see that software engineers will need more than knowledge of the development methodologies: the role will become more specialized towards the nature of the application (e.g. radar systems, gaming, medical, etc). Finally, I do feel that the media is overplaying the whole layoffs-in-IT scenario, and the headlines are harsh. There are still tremendous opportunities in IT, especially in networking and security, education, general support, application development, user interface design, and web services. Plenty of skill and talent is needed in IT. The reason why the IT layoffs, the decline of programmers, and offshoring are front-page tech news is that the general public believes that programmers are IT, which is flat-out wrong.

brian d foy

The October issue of Circulation Management focuses on telemarketing and its response to the US Do Not Call Registry.

Some telemarketers just don’t get it. On page 29, in a sidebar to “Telemarketing: A Tale of Two Cities” (partial article online), Debbie Dawson of Dial America is paraphrased as saying

it’s a shame that of those who have placed their names on the Do Not Call list, most do not realize how many calls they are missing out on that they would really like to receive.

Yeah, whatever. My guess is that people who took the affirmative action to put their name in the Do Not Call Registry do not want calls. They didn’t sign up for the Do Not Call Me Unless You Think I Want Your Product Registry. It seems pretty simple to me: just don’t call. No, non, nein, нет, não, nr, αριθ! (I got Babelfish right here, and I can keep going).

The article also notes that Mother Jones and Weight Watchers stopped calling subscribers at the end of their subscription period because they were getting complaints that they were violating the Do Not Call policy (although they weren’t, since they had an existing business relationship). Still, those companies listened to the consumers, who were saying “Do not call me”.

[Side Note: Even if you have an existing business relationship, you can still tell them not to call you and they have to not call you. ]

David Sklar

Related link: http://www.colinux.org/

As much as I wanted my Windows Re-Education Camp efforts to succeed completely, my brain and my fingers have Unix idioms too deeply ingrained in them to make the increasing amount of adjustment effort worth it.

Instead of firing up my long-dormant VMWare installation, I thought I’d give CoLinux a try.

Setup and installation was very smooth, with one big exception: networking. This wasn’t a total surprise, since the CoLinux wiki warns that networking is the hardest part of setup, but it was still annoying.

Basic installation was easy and straightforward. I just followed the directions. I downloaded the CoLinux distribution and a 1GB Debian 3.0r0 disk image file. I put everything in one directory, created a swapfile, made a few pathname adjustments to the CoLinux configuration file and was up and running with a Debian installation.

At this point, the virtual CoLinux computer could neither talk to the Windows XP side of things nor to the rest of the Internet. The CoLinux networking documentation describes two ways that CoLinux’s networking can be set up: NAT or Bridged.

Networking: NAT

NAT uses Windows’s Internet Connection Sharing to hide CoLinux behind the outward-facing IP address of Windows. This is recommended as the easier way to set up CoLinux networking, but it was off-limits to me because Windows XP (or my D-Link “Cable/DSL Residential Gateway”, depending on your perspective) is unnecessarily inflexible.

The D-Link router, which is plugged into my cable modem and acts as a firewall and router for my vast home/office network, insists that its private subnet be 192.168.0.1 - 192.168.0.255. Windows XP Internet Connection Sharing also insists that its private subnet (the one that the CoLinux virtual computer would use) be 192.168.0.1 - 192.168.0.255.

If either one of them would let me change what subnet it uses, then I could use NAT with CoLinux. But they don’t. Perhaps this is a misguided ploy to get me to buy a more expensive router or somehow run Windows Server 2003 (which has more configurable NAT settings) on my Thinkpad. (I should note, though, that the NAT software that comes with VMWare can use any subnet you specify. You get what you pay for, perhaps.)
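The clash is easy to see when both ranges are written as networks. A small Python sketch using the standard `ipaddress` module follows. Both sides are written as 192.168.0.0/24 to match the ranges quoted above, and the alternative subnet is a hypothetical value chosen only to show a non-overlapping case.

```python
import ipaddress

router_lan = ipaddress.ip_network("192.168.0.0/24")    # D-Link private subnet
ics_lan = ipaddress.ip_network("192.168.0.0/24")       # Windows XP ICS subnet
alternative = ipaddress.ip_network("192.168.50.0/24")  # hypothetical choice

print(router_lan.overlaps(ics_lan))      # both sides claim the same range
print(router_lan.overlaps(alternative))  # a distinct range would not clash
```

NAT needs the inside and outside networks to be distinguishable, which is why two devices that both insist on the same range cannot be stacked.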

Networking: Bridged

So, my NAT dreams squashed, I proceeded to the world of bridged networking. In this model, you tell CoLinux the name of your Ethernet adapter and then it piggybacks a connection on it. There’s just one physical Ethernet plug on the back of my computer, but to the network (the DHCP server in the D-Link router, the other computers behind the router, and so on) it appears that two computers, with two IP addresses and two MAC addresses, live behind that plug.

At first, everything was fine with bridged networking. I configured Debian’s /etc/network/interfaces file to get an IP address for eth0 via DHCP. I started up CoLinux. It talked to the DHCP server. The Windows side of things and the Linux side of things were two subnet-sharing digital peas in a pod. Both of these “computers” were sharing the same atoms in the physical world, but from their logical perspectives they were just two different computers connected via a network.

Networking: Unplugged

Then disaster struck: I unplugged my laptop from the network. This is not an infrequent occurrence. I bought a 3.5 pound computer on purpose. When I am traveling or otherwise not online, I’d still like to be able, for example, to access Apache running on CoLinux from Firefox running on Windows XP.

Unfortunately, bridged networking makes this tricky. The Windows computer and the Linux computer really don’t know they live in the same CPU. So once that network cable was unplugged, they each thought they had no way to talk to any other computers on the network — including each other.

After disappearing down the disabling “Media Sense” rabbit hole, I stumbled upon what would provide my solution to this problem: the Loopback Adapter. This is essentially a software-only fake Ethernet adapter that is always “plugged in”. By telling CoLinux to use bridged networking, but over the Loopback Adapter and not the regular Ethernet connection, I had a way for Windows and Linux to talk to each other over a “network” whether or not my computer was actually connected to an external network.

In Windows, I assigned the Loopback Adapter a static IP address. In /etc/network/interfaces, I gave eth0 a different static IP address in the same subnet. (I chose a different subnet than 192.168.0.1 - 192.168.0.255, of course!) The result? Uninterrupted communication between Windows and Linux.

Not all of my networking problems were solved at this point, though. While Windows was now connected to two networks (the “real” one via its regular Ethernet port and the fakey one via the Loopback adapter), Linux was only connected to one: the fakey Loopback network. This means that Linux had no outside network access. This was bad.

I solved this problem by providing additional bridged network interfaces to CoLinux. One to the Windows Ethernet adapter and one to the Wireless Ethernet adapter. These are configured in /etc/network/interfaces to get IP addresses via DHCP. So, all direct communication between the Windows computer and the Linux computer happens over the private Loopback network. But, when the Linux computer wants to talk to the rest of the world, it uses the Ethernet adapters in the Windows computer to make it happen.
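The split described above is ordinary longest-prefix routing. The Python sketch below models it in a simplified way and is not how the Linux kernel stores its routes. 10.3.75.0/24 is the Loopback subnet from the configuration details further down, and treating eth1 as the default route is an assumption.

```python
import ipaddress

# Simplified routing table: (destination network, interface).
ROUTES = [
    (ipaddress.ip_network("10.3.75.0/24"), "eth0"),  # private Loopback network
    (ipaddress.ip_network("0.0.0.0/0"), "eth1"),     # assumed default route
]


def pick_interface(destination: str) -> str:
    """Return the interface of the most specific route containing the address."""
    address = ipaddress.ip_address(destination)
    matches = [(net, dev) for net, dev in ROUTES if address in net]
    # The longest prefix is the most specific match
    return max(matches, key=lambda route: route[0].prefixlen)[1]


print(pick_interface("10.3.75.1"))   # the Windows side
print(pick_interface("192.0.2.10"))  # anything else
```

Traffic for the Windows side stays on the Loopback network even when the outside interfaces are down, because its route does not depend on them.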

DNS

To smooth communication between the Windows computer and the Linux computer, I run BIND on Windows with configuration files that resolve the appropriate addresses in the private subnet to hostnames in my private .home top level domain.

Security

Because the Windows and Linux computers only need to talk to each other over the Loopback network, you can restrict connections (with Windows Firewall or iptables) for sensitive services to just the Loopback subnet. The public interfaces on both Windows and Linux still need appropriate protection from any incoming external connections.

Configuration Details

I am using Windows XP SP 2, CoLinux 0.6.1, and WinPCap 3.1 Beta 3.

In Windows, Network Connection “Loopback Adapter” has these TCP/IP properties:

IP address: 10.3.75.1
Subnet Mask: 255.255.255.0
Default gateway: 10.3.75.1
Preferred DNS Server: 10.3.75.1

The networking portion of my CoLinux configuration file is:

<network index="0" type="bridged" name="MS LoopBack Driver"/>

<network index="1" type="bridged" name="Intel(R) PRO/1000 MT Mobile Connection (Microsoft's Packet Scheduler)"/>

<network index="2" type="bridged" name="Intel(R) PRO/Wireless 2200BG Network Connection (Microsoft's Packet Scheduler)"/>

In Linux, the /etc/network/interfaces file contains:

# lo: colinux loopback
# eth0: connection via MS Loopback to Windows XP
# eth1: bridged connection to world via Gigabit Ethernet
# eth2: bridged connection to world via 802.11b/g

auto lo eth0 eth1

iface lo   inet loopback
iface eth0 inet static
           up /etc/network/local-ns.pl
           address 10.3.75.2
           netmask 255.255.255.0
iface eth1 inet dhcp
           up /etc/network/local-ns.pl
iface eth2 inet dhcp
           up /etc/network/local-ns.pl

/etc/network/local-ns.pl is a short program that makes sure
that /etc/resolv.conf always has the local (Windows) nameserver IP
address in it. The script is:

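#!/usr/bin/perl
use strict;
use warnings;

# Lines that must always be present in /etc/resolv.conf
my $local_ns = 'search home
nameserver 10.3.75.1
';

# Slurp the current contents of the file
open(my $in, '<', '/etc/resolv.conf') or die "Cannot read /etc/resolv.conf: $!";
my $resolv_conf = do { local $/; <$in> };
close($in);
$resolv_conf = '' unless defined $resolv_conf;

# Prepend the local nameserver lines unless they are already there
# (\Q...\E keeps the dots in the IP address from acting as wildcards)
if ($resolv_conf !~ /\Q$local_ns\E/) {
    $resolv_conf = $local_ns . $resolv_conf;
}

open(my $out, '>', '/etc/resolv.conf') or die "Cannot write /etc/resolv.conf: $!";
print $out $resolv_conf;
close($out);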

What was your CoLinux setup experience?

brian d foy

Related link: http://www.highedweb.org/2004/index.html

9:45 am It’s a late start for me. I stayed up watching the Packers game last night. Boy was that a waste.

10:00 am There are metrics and then there are analytics, I guess. Kathy Farrell from Empire State College is going over server logs at the moment. The log from the Lotus Notes server looks weird, but she’s calling it a standard format. I haven’t ever worked with Notes, so maybe that’s just how they do things. I have to remember that this is a non-techy view of the world though.

It looks like Kathy is wearing the flesh-colored Handeze gloves.

10:15 am Maryann Stopha from SUNY Geneseo is demonstrating WebTrends. I hadn’t realized that this was still around, especially since you have to pay for it. Most of the discussion is the usual sort for web logs.

10:25 am Ned Stankus from Hamilton College is talking about logging best practices, which is really more about what to ignore in logs than what to do with them. I want to pipe up and tell them that logs are foremost for capacity planning: you can’t do that if you don’t log everything. I’m going to sit quietly at the back though.

A lot of the problems people seem to bring up in the discussion can be solved by Apache’s CustomLog stuff.

10:35 am Now Ned is talking about “campaigns”, which some other people might call “sessions”. He uses Urchin web log analyzer. Which academic services get a link on the home page? Well, let’s see what people are using! If no one looks at your department pages, you don’t get the front page link.

You can look at search terms to figure out how to improve your navigation elements. If people keep searching for the same thing, maybe it should be a link. Big search terms: Monopoly Instructions. It seems an Economics professor posted the instructions in 1995 style HTML.

10:55 am Ned just canvassed the room: Who uses Google to search their site? Just about everyone raised their hand.

11:05 am Kathy is talking about searches that return no results. She started tracking those search terms, which led to several bugs in the web server set-up that she was able to fix.

11:10 am Ned is showing a difference between logs for IIS and Apache: the first is case-insensitive, so index.html and INDEX.HTML are the same thing. That causes problems in the logs.
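One way to cope with the case problem when analyzing logs is to fold request paths to a single case before counting them. Here is a minimal Python sketch of that idea, assuming the paths have already been pulled out of the log. The function name count_hits and the sample paths are made up for illustration and were not part of the talk.

```python
from collections import Counter


def count_hits(paths, case_insensitive=False):
    """Count hits per path, optionally folding case (for IIS-style logs)."""
    counts = Counter()
    for path in paths:
        key = path.lower() if case_insensitive else path
        counts[key] += 1
    return counts


# Hypothetical request paths as they might appear in an IIS log.
sample = ["/index.html", "/INDEX.HTML", "/Index.html", "/about.html"]

# Counted as written, the same page shows up under three different names.
apache_style = count_hits(sample)

# Folding case first merges them into a single entry.
iis_style = count_hits(sample, case_insensitive=True)
```

Folding case is only safe for a server that really treats paths as case-insensitive. On Apache the three spellings can be three different files.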

A lot of people are throwing out some bad advice about web logging, such as using third party logging services that use web bugs, or various misunderstandings of URLs. I keep forgetting that this is a non-techy view of the world. It’s not a bad thing, but it is about a lot of stuff I take for granted.

11:20 am Maryann is talking about a web use survey that she conducted. Most of the students are asking for a web portal for her school.

11:30 am Melissa Meehan from Buffalo State College is talking about their Metrics Toolbox. They actually worked with Psychology faculty to conduct focus groups.

11:45 am Time for lunch: this two hour session was a bit much to sit through, and I’m dozing a bit.

brian d foy

Related link: http://www.highedweb.org/2004/index.html

11:00 am Jason Moore, an undergraduate from the University of Rochester, spent a half hour looking for open source project logos. He found 91, which are now crowding each other out on the slide. He started with “Ruby”, which is now covered by a bunch of other logos.

This talk is well-attended: most of the seats in the room are taken, and I count about 70 people. The estimated attendance for the conference is about 300, I think, and there are several other tracks going on at the same time.

Jason is going through the basic free software / open source stuff, so I’m tuning out a bit. I have to remember where I put SubEthaEdit so I can take notes with Jim Brandt, who’s sitting next to me. Most of this stuff you can find in Open Sources and Free as in Freedom.

11:15 am Jason canvasses the room: Who knows the basic open source licenses? Five of us raise our hands, which probably really means 15, since I go by the rule that only 1/3 of the people who can answer yes ever raise their hands.

11:20 am Jason is going over Eric Raymond’s Cathedral and Bazaar stuff, which I’ve really started to detest. I liked it when it came out, but now I think it’s facile and divisive. The fight isn’t between proprietary and open source programming processes because both have elements of both styles and both can develop great things. You can have my Excel when you pry it from my cold, dead Powerbook. It’s pleasant here though: I don’t see many fanatics around, so I can have a middle-of-the-road opinion.

11:25 am Slim Devices: Perl source code available.

k12host.com: a website to help K-12 educators set up websites which their classes can use.

11:40 am The most attractive feature of open source at this conference seems to be hackability rather than moral or ethical purity. These people are about getting work done, and they need things that help them do that.

11:45 am Question from the floor: What happens when Jason graduates? He’s “the open source guy” in his world right now. Students churn more than employees. Hopefully someone else will take his place. Part of good open source development (well, any development) is gracefully passing the torch.

Lunch What a nice spread! We get restaurant quality food for free (well, as part of the conference price).

2:45 pm Jim Brandt is talking about “Test Driven Development” (I published an article about this by Denis Kosykh in the last issue of The Perl Review). Testing by hand is error-prone, repetitive, and boring, which is exactly the kind of work computers are really good at.

A good testing suite lets us code without fear: without fear of breaking things, without fear of taking the system down, and so on. The tests (should) catch those things.

3:00 pm Now Jim is onto Test::More. I didn’t think this would be a hot topic here, but the room is mostly packed and people are crowding around the doors: probably 70 people or so.

3:20 pm Jim canvasses the room: Who has ever installed a Perl module? It looks like a fourth of the people raise their hands.

3:22 pm On to browser testing. Some of this we can automate with WWW::Mechanize by Andy Lester. We could use that to check some accessibility things, like ALT values in IMG tags. I think I just volunteered to write an extension to WWW::Mechanize to do that. (I think it’s already done, maybe as a part of HTML::Tidy, so I might get off easy.)
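To give an idea of what an automated ALT check involves, here is a rough sketch. It is in Python with the standard library’s html.parser rather than Perl and WWW::Mechanize, so it is not the extension mentioned in the talk. The names MissingAltFinder and images_missing_alt and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every IMG tag with a missing or empty ALT value."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "img":
            return
        attributes = dict(attrs)
        # A missing attribute gives None; alt="" gives an empty string.
        if not (attributes.get("alt") or "").strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images that have no usable ALT text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Made-up page fragment for illustration.
page = (
    '<p><img src="logo.png" alt="Campus logo">'
    '<img src="spacer.gif"><img src="x.png" alt=""></p>'
)
problems = images_missing_alt(page)
```

This sketch flags an empty ALT value as a problem too, even though an empty value is the usual way to mark a purely decorative image. A real checker would probably report those separately.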

3:50 pm I’m off to find an empty room so I can show off Test::More and WWW::Mechanize. If you’re at this conference and can corner me, I’d be glad to do the same for you. Although I would really like everyone here to hire Stonehenge to do formal training and consulting on all this, I’d really just like more people to use it and like to show off. ;)

4:45 pm Rick Ells from the University of Washington is talking about spoofing and phishing attacks. At some point someone will target university networks, so we should be prepared. He has an interesting true life story: some phishers set up signs around campus advertising free wireless (just log in with your campus username and password). The people who fell for it just gave their identity to these clowns running their own wireless network. You don’t even need a network for this: just the base station. Once you have their info, who cares if the service actually works?

5:20 pm I think the day is over. Time for dinner!

brian d foy

Related link: http://www.highedweb.org/2004/index.html

9:15 am I’m at the HighEdWebDev 2004 Conference in Rochester, NY this week, but just as a spectator. This is a conference mainly for web people at colleges and universities. One of the presenters is talking about some of the work that Randal Schwartz and I did through Stonehenge Consulting Services.

I’ve only been here for about a half hour (and I missed the first day, a Sunday), but the hallway conversations have been very interesting. This conference is a mix of “business”, content, and technical people, so it’s not just the geek’s perspective on web stuff.

9:45 am I’m sitting in the tech session, “Technical Propeller Hats Required”. Jim Brandt from the University of Buffalo is talking about converting their student services web site from vanilla CGI to mod_perl. He says that in the past couple of years, usage has increased by 25-50% each semester, and their site gets slammed especially hard during the first week of each semester.

Jim reports that upgrading CPUs, disks, and memory only got them limited improvements, and they weren’t sure that buying really big iron would be that much of an improvement for the cost. A lot of people have run into this problem, so it’s not really new.

He decided to look for other places to improve. Since he was using vanilla CGI (one process per request, a database connection per process), he realized that the real improvement would be dumping CGI.

The problem is that he needed a quick fix and couldn’t dump the existing code which had been built up over several years. Even if he wanted to rewrite everything, he didn’t have the time or resources to do it in time. He also had to integrate it with the existing set-up for other things, like the student authentication system.

He checked out mod_perl. He could get an immediate benefit with Apache::Registry and Apache::DBI—he could keep all the CGI code.

They decided to get some Perl training. He calls it “Just a little bit late” training rather than “Just in Time” training. They had already done a lot of homework and tried a lot of things, so they came to the training sessions with a lot of questions about problems they had already run into.

10:00 am Jim just canvassed the room. I counted about 30 people in the room, and most of them say they are using Apache. Jim says that open source software has come a long way in usability and acceptance, but bringing in experts helps to mitigate management fears about its use.

Once they turned on mod_perl, they looked at their server logs to figure out who was using what when, and identified the top ten most used scripts. They concentrated their conversion efforts on those scripts, which Jim says was a big win with management: they didn’t have to convert everything before they could get the benefit.
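Finding the most-used scripts in a server log is mostly a counting exercise. The sketch below is my own illustration in Python, not the tool Jim’s group used. It assumes Common Log Format lines, and the log entries and script names are made up.

```python
import re
from collections import Counter

# Matches the request part of a Common Log Format line: "GET /path HTTP/1.1"
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')


def top_scripts(log_lines, n=10):
    """Return the n most requested paths, ignoring query strings."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            hits[path] += 1
    return hits.most_common(n)


# Made-up log lines for illustration only.
sample_log = [
    '10.0.0.1 - - [18/Oct/2004:09:00:01 -0400] "GET /cgi-bin/grades.cgi?term=fall HTTP/1.1" 200 512',
    '10.0.0.2 - - [18/Oct/2004:09:00:02 -0400] "GET /cgi-bin/grades.cgi HTTP/1.1" 200 498',
    '10.0.0.3 - - [18/Oct/2004:09:00:03 -0400] "POST /cgi-bin/register.cgi HTTP/1.1" 200 120',
]
busiest = top_scripts(sample_log, n=2)
```

Stripping the query string matters here: otherwise every distinct set of parameters would count as a different script.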

10:15 am There aren’t too many laptops out in this room, which seems odd to me only because I tend to be at Perl conferences where the attendees like to IRC with the person sitting next to them. I can see a 12-inch iBook-er reading Slashdot, though.

Now Jim is talking about reverse proxying in Apache. They separated the servers so the one doing the heavy lifting (database and CGI stuff) didn’t also have to handle all of the other content. To his surprise, Jim found that some of their users were on really slow connections (instead of the broadband they assumed all on-campus students have), so their heavy-lifting processes basically finished their heavy lifting, but were tied up trickling bits to the client. Once that work was given to the front end of the reverse proxy, they got a big speed-up. Once the backend server doesn’t have to talk to the client, it has a lot more time to do its real work.

10:30 am Jim is talking about the hardware set-up. They started using SSL cards to take that load off of the CPU. With 25,000 people trying to hit a server in a couple hours, SSL key generation was a significant performance limiter.

They also went with a server farm. Instead of a couple of big machines, they went with more smaller machines managed by a separate load-balancer. At first the load-balancer ran slower, but only because they didn’t turn on sticky IPs: users had to keep renegotiating things because they were getting different back-end machines. When they fixed that, they got the faster results they expected.
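The idea behind sticky IPs is that the same client address always lands on the same back-end machine. A real load-balancer does this with its own tables and settings, so treat the following Python sketch only as an illustration of the principle. The function pick_backend and the backend names are hypothetical.

```python
import hashlib


def pick_backend(client_ip, backends):
    """Map a client IP to the same backend every time (a 'sticky' choice)."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    # Turn the first four bytes of the digest into an index into the list.
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]


# Hypothetical backend names.
backends = ["web1", "web2", "web3"]

# The same address gets the same machine on every request.
first = pick_backend("192.0.2.17", backends)
second = pick_backend("192.0.2.17", backends)
```

Because the choice depends only on the client address, a returning user keeps talking to the machine that already negotiated a session with them, as long as the list of backends does not change.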

10:40 am They still use CGI for some things, even though they have this big mod_perl set-up. Jim is talking about something I remember from Joel Spolsky: there are different types of software development, and each has a different economic scenario. In this case, they didn’t spend a lot of time creating fancy technology for a script that 200 people on campus might use five times a year. Interns can easily create CGI scripts and take care of those users.

10:50 am Jim finished his talk and is taking questions. A lot of people seem to be locked into certain technologies, either by initial choice (big code base), management fiat (“We will use Sun”), or a design decision (“We had to do this because J2EE needed it”).

Andy Oram

Related link: http://www.gnome.org/projects/beagle/

Google doesn’t want you to delete your mail. That note you tossed off
to your spouse last night, asking which brand of cereal to buy at the
grocery store, may be utterly irrelevant to you today, but to
Google Mail
it’s highly marketable information.

A similar concern, less commercial but equally avaricious in the
information sense, lies behind one of the projects from Ximian (now
Novell) to generate the most buzz: founder Nat Friedman’s Dashboard
project. Despite a promising prototype, Dashboard implementation
turned out to involve a lot of deep and difficult questions, but its
supporters believe they have a way forward. A future version of
Dashboard will be reconstituted on the Beagle project, led by Jon
Trowbridge.

The GNOME foundation is treating Dashboard and Beagle as extremely
important. Trowbridge gave an informal keynote-like talk on them
today at the 4th GNOME Developer’s Summit.

The issue does not concern GNOME alone. Dashboard and Beagle are
desktop-independent; they could be accessed by KDE as well. And
Microsoft has announced a similar system that automatically indexes
your entire computer system and turns up everything related to some
topic of importance to you.

Reasons for Dashboard, etc.

The problem motivating these systems is the common “Where did I see
that?” question. For instance, I told the GNOME Foundation executive
director Tim Ney today that I had seen survey results suggesting that
KDE is three times as popular as GNOME. (I don’t consider the results
necessarily accurate.)

Now I’m trying to figure out where I saw this survey. Was it a Web
site I visit regularly, something on an RSS feed, an email sent by a
colleague, or a hallucination induced by listening to too much modern
jazz this week?

I don’t think either Dashboard or Longhorn will help me search that
last category any time soon. But they are supposed to help turn up
results from all the other categories–and (thanks to real-time
indexing) turn them up nearly instantly, even on a hard disk with
multiple gigabytes of information in a variety of formats.

More than a super-grep

The Dashboard/Beagle vision is far more than a super-grep, or
something able to search for keywords in files of different
formats. (Windows has offered that for a long time.) Beagle already
has time tracking, which means that if you read an email and
visit a file a few seconds later, Beagle will remember that they’re
related even if there’s no particular phrase that’s featured
prominently in both. Beagle also maintains a full-text index on every
Web site you visit. Trowbridge would like to go further and keep track
of the context in which you handle information. For instance, if
you save a file from someone’s email message, the file will contain a
marker indicating a connection with that email message.
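To make the time-tracking idea concrete, here is a small Python sketch that pairs up items touched within a few seconds of each other. It is only my guess at the general shape of such a feature, not Beagle’s actual algorithm. The ten-second window, the function name related_by_time, and the sample activity are all assumptions.

```python
def related_by_time(events, window_seconds=10):
    """Pair up events that happened within window_seconds of each other.

    events is a list of (timestamp_in_seconds, item_name) tuples.
    """
    ordered = sorted(events)
    pairs = []
    for i, (time_a, item_a) in enumerate(ordered):
        for time_b, item_b in ordered[i + 1:]:
            if time_b - time_a > window_seconds:
                break  # later events are even further away
            pairs.append((item_a, item_b))
    return pairs


# Made-up activity: seconds since some starting point, and what was touched.
activity = [
    (0, "email: budget question"),
    (4, "file: budget.xls"),
    (300, "web: news site"),
]
related = related_by_time(activity)
```

The email and the spreadsheet end up linked even though they share no keywords, which is the point of tracking time rather than text.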

Trowbridge complains that your computer throws away a lot of
information you give it (such as the fact that you saved a file from
an email message). But I wonder about the push to save so much
metainformation. True, we now have the processing power and storage
space to save all kinds of junk. But can we predict what information
will really be useful? I’ll return to this question at the end of this
article.

What Dashboard and Beagle entail

It’s worth briefly going over the architecture that supports the
personal information space, because that helps to show how extensively
a system must be changed to support it.

Fast search depends on an up-to-date database, whether one is talking
about the spidering done all the time by Internet search engines, or a
repository of terms used by files on your own hard disk. Thanks to
events generated by the new D-BUS interface being developed for Linux,
a kernel subsystem called inotify can collect changes to files as they
happen and pass them to interested userspace tools.

Beagle depends on an indexing tool called Lucene to keep track of
what’s in various files on the system. It essentially checks
everything except files in dot directories and others that
traditionally contain throw-away data. As I already mentioned, it
records the contents of Web pages you visit. It can also search your
email, your IM logs, and anything else that exists as a file.

The next step is to associate the metainformation collected in
various ways with the files. Microsoft’s Longhorn will theoretically
involve an entirely new filesystem called WinFS. (When this will
happen is anybody’s guess, but it won’t happen soon.) One of Linux’s
strengths is its support for multiple filesystems, and Trowbridge
doesn’t expect them all to be enhanced just to support
Beagle. However, many filesystems support a feature called “extended
attributes,” often used to implement Access Control Lists and other
new features. Beagle can use these to store its metadata.

For each file format or type of information (email, for instance)
Beagle will have a back-end API to do searching. The developers are
even looking for ways to associate metainformation with
pictures. Beagle combines all the results and presents them in a
single front-end API. Applications that want to do system-wide
searches, therefore, will need to understand just the Beagle API in
order to access all types of data on the system. The current utility
used to demo Beagle is called best, for Bleeding Edge Search
Tool.

Privacy fears come to mind when one considers a tool that does instant
searches. Remember that (currently) Dashboard and Beagle are meant for
use by an individual on his or her personal data. One approach to the
issue is to say “Privacy is overrated” and assume that one is doing
the user a favor by presenting his or her entire disk contents on
demand. Another approach would be to divide information into
categories, such as to separate work data from personal data. But
that’s hard to do: asking the user to distinguish them is adding work,
while trying to do it from context risks oversimplifying the complex
lives led by users.

Indexing the infinite

I want Dashboard. I am intrigued by the idea that, instead of
organizing and boiling down the information I receive and trying to
get rid of what I don’t need, I should go in the opposite direction
and compulsively save information, expecting my computer to pluck out
what I need later. A saying attributed to AI researcher Marvin Minsky
claimed that his information store consisted of his friends. For this
task I trust computers more than friends. (Sorry, Tim Ney.)

But I worry about clever schemes to track and save information–and
not for privacy reasons. I just wonder whether we’ll know what we’ll
want in the future.

Archaeologists have found marvelous ways to deduce ancient people’s
lifestyles from the facts they turn up. They make deductions based on
whether an artifact is upside-down or right-side-up, and from chemical
traces found nearby. Still, we often wish people in the past left more
clues.

We also do archaeological searches on our computer’s data, which is
just as strangely organized and off-balance as the MIT Stata Center
that hosts today’s GNOME summit. Once again, the data we left behind
on our computers proves frustratingly inadequate for today’s purposes.
And I would guess that increasing the data we collect will do little
to close the gap.

When one starts creating filesystem attributes and instrumenting
applications, one makes choices that will continue to have impacts
thirty years later. What new application will arise just a year or two
from now that will make the Beagle developers kick themselves because
they forgot to prepare for it?

So I’m not saying full system search is unfeasible. I’m just asking
how long it takes to prepare a system for the search, in comparison to
how long it takes for the system to become obsolete. I’d like to try the
results, in any case.

What would you search for?

Uche Ogbuji

Earlier on Schematron inventor Rick Jelliffe blogged some exciting developments in the area. Heavy activity continues in the world of Schematron, the validation language++ for XML. ISO standardization is impending, implementations advance and tutorials/articles proliferate.

First of all, the troubling bit. Schematron is on the ISO standards track and I’ve been assured that time is very short to get in comments and corrections. The problem is that the last Committee Draft of ISO Schematron I saw is frankly a mess. It is hard to follow and full of errors and inconsistencies. David Cazzulino sent a great number of comments and issues to the Schematron-love-in mailing list before that list died (through some convulsion of SourceForge). Some of them were hashed out, and some are still outstanding. Brian Ewin added more issues to the fray but the list died before there was any discussion of these. I ran into more issues myself upon implementing the draft and sent my messages to the ISO DSDL mailing list, which is, I think, the only place to turn to. I can’t point to my message because the mailing list archives for dsdl-comments are down. Murphy’s Law indeed. Anyway, I haven’t seen an updated spec that addresses the numerous issues, and that makes me nervous at this point.

Moving on to the good news, there has been a lot of writing on Schematron lately:

my own implementation Scimitar is up to version 0.9.0, implementing all that I could decipher from the ISO draft. I’ve already started on the 1.1 generation of Scimitar, which includes some support for Jeni Tennison’s Datatype Library Language (DTLL), so if no major bugs or omissions turn up soon, 0.9.0 will pretty much become 1.0.

Whether standalone or embedded in RELAX NG or WXS, if you’re using XML, you should at least be considering Schematron.

Has Schematron found a place in your XML toolkit?

David Sklar

Related link: http://www.nypl.org/research/newton/

Last night, I went to an exhibit at the New York Public Library about Isaac Newton and his scientific influence. The computer geek side of me reacted in particular to two things.

First, one of the items on display was a letter that Newton wrote to Leibniz in 1677. In the letter, Newton was describing some techniques he had developed that were similar to (or part of) his emerging fluxional calculus, but he didn’t want to divulge the whole thing to Leibniz. So, he stopped short in an explanation and instead wrote an anagram, 6accdae13eff7i3l9n4o4qrr4s8t12ux, which represented his fundamental calculus theory.

Looking at this letter, I was struck by the similarity (both of purpose and visually) of Newton’s technique and one of the things that we use hash functions for today. “Hey, Leibniz, I’m not going to let you read the document in which I lay out my amazing new ideas, but here’s the result of sha1(file_get_contents('/home/isaac/discovery-27.tex')) just to prove I’ve done it.”
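For anyone who hasn’t used a hash this way, here is roughly what the commit-now, reveal-later trick looks like. This is a Python sketch rather than PHP; the function names commit and verify and the placeholder text are mine, and the manuscript bytes merely stand in for whatever the secret document would contain.

```python
import hashlib


def commit(document: bytes) -> str:
    """Return a hex digest that can be published without revealing the text."""
    return hashlib.sha1(document).hexdigest()


def verify(document: bytes, published_digest: str) -> bool:
    """Check a later-revealed document against the earlier digest."""
    return commit(document) == published_digest


# Placeholder text standing in for the secret manuscript.
manuscript = b"my amazing new ideas"
digest = commit(manuscript)

# Years later, revealing the manuscript lets anyone confirm the claim,
# while a changed manuscript no longer matches the published digest.
claim_holds = verify(manuscript, digest)
tampered_holds = verify(manuscript + b" (revised)", digest)
```

SHA-1 is used here only to mirror the sha1() call above. It is no longer considered collision-resistant, so something like SHA-256 would be the safer choice for a real commitment.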

Newton’s secrecy and roundabout-ness, however, ultimately hurt his claim to have preceded Leibniz in discovering (or “inventing”, depending on your philosophical leanings) calculus. He sat on his unpublished innovations while Leibniz, a few years later, published his, leading to the bitter feud of calculus primacy.

Second, another aspect of the exhibit that struck me was this passage from the introductory accompanying text:

During a time when the mathematical sciences and natural philosophy were integral to a broader encyclopedia of knowledge, these domains set an example of so-called superior knowledge for other disciplines to emulate: the search for rational, universal principles became the modus vivendi for all researchers, regardless of field.

Which made me think: is the Semantic Web the 21st century equivalent of Diderot’s Encyclopédie? What lessons have we learned (or not) from previous generations’ attempts to taxonomify (and neologize? :) all information? What would Enlightenment philosophers make of OWL?

Is the Semantic Web the philosophical child of the Enlightenment?

chromatic

Related link: http://wiki.linuxquestions.org/

Jeremy Garcia from LinuxQuestions.org just pointed out that their new wiki has over 2000 articles. If you’ve never been to the site, go. You’ll be as impressed as I am.