September 2002 Archives

Bruce A. Epstein

Ambiguity is a greater social lubricant than alcohol. If we all understood each other precisely all the time, we’d probably be in trouble. However, ambiguity bothers some of us more than others. Somewhere along the spectrum considered “within normal limits” lie people like me who prefer, nay require, more precision than most people are comfortable with.
Case in point: suppose I said, “I’ll take a half-dozen bagels, please.” If the counterperson were to say, “Six? Okay, six bagels coming right up!” what is the appropriate response?

(a) Yes, please.
(b) I said, a “half-dozen,” not “six.”
(c) Any damn fool knows that a “half-dozen” is “six.”
(d) I thought I got one free with each half-dozen, so don’t I get seven altogether?

Answer (a) is the most socially acceptable. Answer (b) might qualify me for an insane asylum. Answer (c) might be what I’m thinking, but we all need to accept the small talk that is part of life, which also serves as confirmation that the counterperson heard us and is responding. Answer (d) is for me the most interesting, but let’s revisit answer (b) first.

Despite the famous saying, “Six of one, half a dozen of the other,” is “a half-dozen” always equivalent to “six”? Maybe I expect a baker’s dozen to be 13, so when I say “a half-dozen,” I really want 7 (if rounding up). Or maybe I’m intentionally being imprecise to indicate I’d accept 5 or 7 if the items are unusually small or large. Or maybe I was giving an approximate number so the counterperson could get a bag of the appropriate size, while reserving the right to change the number slightly. Or maybe I have a phobia about the number 6, being part of 666, etc. Certainly, in some countries there are many lucky and unlucky numbers, and many hotels in the US still don’t have a 13th floor. And what if I were a poet? Is, “Six shimmering Cinnabons,” the same as, “A half-dozen shimmering Cinnabons”? Is, “Herb had a hankering for a half-dozen,” the same as, “He said he’d take six”?

Admittedly, in most cases, “half-dozen” and “six” mean the same thing, but the preceding examples amply demonstrate this is not always the case. So it is no wonder that far more ambiguous statements are misinterpreted all the time. For example, I once had a conversation with Tim O’Reilly that went something like this:

Tim: How high a priority is that book on Director?

Bruce: It is certainly not as important as getting the Dreamweaver book done. I probably won’t start on the Director book until after the Dreamweaver book is complete.

Some time later in the conversation, the following transpired:

Tim: So, we don’t need to decide what to do on the Director book until after the Dreamweaver book is done.

Bruce: I never said that.

Tim: You said you weren’t going to work on it until then anyway.

Bruce: No, I said the Dreamweaver book was a higher priority. I won’t let the Director book get in the way, but I hope to start work on it before completing the Dreamweaver book.

Tim: Arggghhhh! (as he drives off the road into a ditch).

Leaving aside the ambiguity in the word “complete,” which for our purposes means “finishing my editorial duties to the point where I can submit the book to Production, even if I have to QC the manuscript before going to press,” the ambiguities still abound. Both Tim and I would agree that I said the Dreamweaver book is a higher priority. But we disagree as to whether I’ll work on the Director book in the meantime. I intend to work on it as time allows.

There are many factors in assessing whether one person is simply being too anal or the other is interpreting the facts too narrowly (or incorrectly). For example, we might have different priorities, different time horizons, or different levels of involvement (and therefore different levels of interest) in a given project. Tim might have misunderstood the subtlety (cell phone static still being what it is) or he might not even be interested. He might want to know only whether the Dreamweaver book will be completed on schedule. He might want to know only whether he can put off thinking about the Director book for now; even if he does have to think about it before the Dreamweaver book is finished, he doesn’t have to think about it today.

But I think the major conflict when Tim and I communicate is how the two of us deal with ambiguity. In the face of ambiguity, Tim seems to make a very firm assumption about what was said or meant and then becomes very annoyed or distracted if someone intended or interpreted it differently. That is, Tim is fine with ambiguity as long as everyone reaches the same conclusion despite it. :-) Obviously, we wouldn’t call it “ambiguity” if this were a likely outcome. Therefore, I prefer to clarify every ambiguity, and if that is impossible, clarify explicitly that the future remains ambiguous. For example, Tim wants me to report to him as follows:

1. Here is what I think is going to happen.
2. Here is what I intend to do based on these assumptions, and this is why I think it is the right thing to do.

OTOH, I always like to add, “But these are the contingency plans, or alternative plans we should consider if my initial assumptions are incorrect.” Or I might say, “I don’t know which of these things are right and I need some guidance from you on it.” My approach leaves Tim thinking I haven’t made a final decision or advocated strongly enough. Tim’s approach leaves me thinking he hasn’t given me enough guidance. So who is right? Is there a right answer?

It all depends. If Tim hired me expecting a certain amount of expertise, then he is right that I should make my recommendation and not always hedge my bets. If, OTOH, Tim has repeatedly questioned my recommendations and conclusions in the past, it is natural for me to assume I should discuss alternative outcomes and contingencies. Our personalities, stress levels, and expertise also come into play. I tend to be much more anal and detail-oriented, whereas Tim is more of a “big picture” guy. If I have more time than him, I might be more interested in and willing to talk about minor details. But I think the level of experience and expertise is perhaps the biggest contributing factor.

For example, suppose I’ve driven to Joe’s house 1,000 times. You might just say, “Meet me at Joe’s,” and expect me to show up at the “usual” time, having travelled the customary route. OTOH, if I’ve never driven there and don’t know Joe, I might have questions about the directions, the dress code, the time to arrive, whether I should bring anything, or who Joe is. You might reply, “You’ll figure it out. Just follow your nose.”

I postulate that Tim’s superior experience in publishing affords him two things that I don’t have: the ability to approximate an unknown future more accurately, and the knowledge of when such predictions are futile or counterproductive. Absent similar predictive powers, I focus on contingency plans and schedules, while not being able to rule out a sufficient number to arrive at the likely conclusion. In the reverse case where I know more than Tim, such as when discussing the Macromedia product space, I’m happy to summarize things in a way that undoubtedly leaves Tim wanting more details.

Granted, our brains might just be wired differently. I admit that there are many people more comfortable “flying-blind” than I am. Tim would probably say it is because I’m too anal, whereas I’d say I’m trying to gather sufficient information to do my job in an optimal manner long-term. Maybe Tim considers my tenure to be more tenuous. ;-)

Regardless, although we can argue about its usefulness, I must confess to the personality trait that unnerves Tim so. He is more the “In a Nutshell” type, whereas I tend to favor “The Definitive Guide” series. The former assumes that people are comfortable with the broad strokes in the interest of saving time, even if it introduces substantial ambiguity. The latter is designed for people willing to read (or write) a book that is more in the hand-holding style, leaving no stone unturned. Which are you?

Are you more the “In a Nutshell” or “The Definitive Guide” type? Do you thrive or struggle in the face of ambiguity?

Betsy Waliszewski

Perl, through Project Breeze, has made it possible for the School of Natural and Physical Sciences (SNPS) at the University of Papua New Guinea to accommodate the new academic calendar while handling the affairs of its students with far more accuracy and efficiency. Here is the story of Project Breeze, sent to me by Alfred Vahau, Associate Dean of Students in SNPS from 1999 to 2001. He is currently the Perl advocate and leader of Project Breeze.

Managing Student Affairs with Perl at the University of Papua New Guinea

In 1999, the University of Papua New Guinea (UPNG) restructured its academic year from two semesters per year to three trimesters, with a week between each term. The change in the academic calendar led the School of Natural and Physical Sciences (SNPS) to re-evaluate the data entry process for student records, which was done largely by hand, using Excel.

Project Breeze

The Breeze project, begun in SNPS in March 2001, was created to explore alternatives to the manual data entry process. The initial goal of Project Breeze was to find a way to populate Excel cells automatically, thereby ending years of manual data entry. The approach was to look at ways that Unix might be used, without excluding the use of Excel or other Windows applications.

The Excel problem

The University of Papua New Guinea’s computing environment is centered around the Windows operating system. Excel and Word are the de facto standard on campus: nearly everyone uses them on a regular basis. However, when it came to data entry, no one knew enough about Excel to take advantage of all of its features. For most users, manual entry of data into cells and using cut, copy, and paste is what Excel is all about, which is fine for small quantities of data. In SNPS, we have to populate about 6000 Excel cells for processing information about 500 students, three times a year. The need to automate the process was evident when I assumed responsibility for managing student affairs in 1999, the year that UPNG introduced the new academic calendar.

From the beginning, I was very frustrated with our manual data entry process. I knew that Unix solutions existed, but was aware that there would be resistance to a Unix-only solution, as people at UPNG were more comfortable using Excel and Word. I began to search for a unified language capable of processing and producing Excel files. My search ended when I got a copy of Unix Power Tools, 2nd edition, and was first introduced to Perl.

An Outline of The Problem

As part of my duties in student administration in SNPS, at the end of each term, I asked that each discipline present the results of the courses taught in that area. These were submitted in Excel files in the format shown in Format 1:

ID Fname Sname CourseNo Grade

Format 1

ID is the student identification number, Fname is the student’s first name, Sname is the student’s surname, CourseNo is the number in the form xx.nnnn that uniquely identifies a course within each discipline, and Grade is the letter grade given to the student based on her performance. Each discipline provided two listings for each course: one in alphabetical order and one in merit order.

From the results of each course (SNPS offered an average of 45 courses per term), I wanted a file constructed in the format shown in Format 2.

ID No. Fname Sname C1 G1 C2 G2 C3 G3 C4 G4 GPA

Format 2

This second Excel file contains the names of all 500 students along with their identification numbers, the four course numbers C1, C2, C3, C4, the respective grades for each course, G1, G2, G3, G4, and the grade point average (GPA). Historically, this file was constructed manually by administrative staff of SNPS with information derived from the alphabetical listing of each discipline’s results for each course. In theory, that required populating 6000 Excel cells by hand (500 students times 12 columns), using information from the alphabetical listing of the course results. This translated to two to five days of manual work to ensure accuracy of the results, as the future of any student was clearly in the balance. The new structure introduced an additional constraint: the results had to be determined quickly, prior to commencement of a new term.

Before the change to the new academic calendar, there had been a break of about a month, so the lengthy manual data entry process was rarely an issue. However, with the new academic calendar, it became increasingly clear that the manual process, in addition to being immensely inefficient, could no longer meet our needs for timely data.

The Perl Approach

To produce the information in an Excel spreadsheet in Format 2 for all 500 students of SNPS, the Perl script reads two input files: the student file and the grades file. The student file contains the names of students along with their ID numbers. The data is stored in Excel format, then converted into text for Perl to process. The grades file is constructed from the course results from each of six disciplines and holds the ID, course number, and grade for each student. Thus, the Excel file (Format 2) derives its grade information from six separate Excel files.

Each of the Excel files is converted into an ASCII format using Takanori Kawai’s Spreadsheet::ParseExcel module (available from CPAN). This essentially bypasses Excel’s guided menu for file conversion. The discipline files are then concatenated to produce a master results file called the “grades” file. This is the second input file. In the Perl script, I set up an array of course numbers:

@courses = qw(xx.nnnn yy.nnnn zz.nnnn);

and a hash for the courses:

%courses = (
'xx1' => 'xx.nnnn',
'yy1' => 'yy.nnnn',
'zz1' => 'zz.nnnn');

The xx, yy and zz are discipline codes and nnnn are course numbers in each discipline. The names are read into a hash keyed by ID number:

$name{$id} = $name;

The grades are referenced by both the student identification number and the course number:

$grade{$id, $courseno} = $grade;

Each Excel row is then built using Perl’s push function and foreach loop. First the ID and name are inserted into the result array:

@result = ($id, $name{$id});

Then the course number and the grades are appended to the end of the row:

push (@result, $courses{$courseid}, $grade{$gradeid});
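The fragments above can be pulled together into one short sketch. This is only an illustration of the row-building idea, not the actual Breeze script: the student names, IDs, course numbers, and the build_row helper are all hypothetical, and the sketch looks grades up directly by ID and course number rather than through a separate $gradeid.

```perl
use strict;
use warnings;

# Hypothetical sample data: none of these names, IDs, or grades come from
# the real SNPS files.
my %name = (
    1001 => 'Mary Kila',
    1002 => 'John Tau',
);

# Grades keyed by student ID and course number, as in $grade{$id, $courseno}.
my %grade;
$grade{1001, 'xx.1001'} = 'A';
$grade{1001, 'yy.1002'} = 'B';
$grade{1002, 'xx.1001'} = 'C';

# Hypothetical course numbers in the xx.nnnn form.
my @courses = qw(xx.1001 yy.1002);

# Build one row per student: ID, name, then a course/grade pair for every
# course in which the student has a recorded grade.
sub build_row {
    my ($id) = @_;
    my @result = ($id, $name{$id});
    foreach my $courseno (@courses) {
        next unless exists $grade{$id, $courseno};
        push @result, $courseno, $grade{$id, $courseno};
    }
    return @result;
}

# Print each row as tab-separated text, one student per line.
foreach my $id (sort keys %name) {
    print join("\t", build_row($id)), "\n";
}
```

Skipping courses without a recorded grade keeps each row limited to the courses a student actually took, which is what the C1 to C4 and G1 to G4 columns of Format 2 call for.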

Computing the Grade Point Average

Once all the information for each student has been entered into the array, the grades are then passed to a subroutine, which determines the weighting of each grade and then computes the grade point average for each student. This figure is then appended at the end of the @result array:

push (@result, $gpa);
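A minimal sketch of such a subroutine follows. The article does not give UPNG’s grade weighting, so the 4-point scale in %points is an assumption made purely for illustration, as are the gpa helper name and the sample row.

```perl
use strict;
use warnings;

# ASSUMED weights on a 4-point scale; the real SNPS weighting is not stated.
my %points = (A => 4, B => 3, C => 2, D => 1, F => 0);

# Return the mean grade point for a list of letter grades, or undef if
# none of the grades is recognised.
sub gpa {
    my @grades = @_;
    my ($total, $count) = (0, 0);
    foreach my $g (@grades) {
        next unless exists $points{$g};
        $total += $points{$g};
        $count++;
    }
    return $count ? $total / $count : undef;
}

# Hypothetical row already holding the ID, name, courses, and grades.
my @result = (1001, 'Mary Kila', 'xx.1001', 'A', 'yy.1002', 'B');
my $gpa = gpa('A', 'B');

# Append the GPA, formatted to two decimal places, as the last column.
push @result, sprintf('%.2f', $gpa);
print join("\t", @result), "\n";
```

Returning undef when no grade is recognised lets the caller distinguish “no results yet” from a genuine GPA of zero.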

Producing the Excel file

The resulting row is written to an Excel file, which is created using John McNamara’s Spreadsheet::WriteExcel module (available from CPAN).

Thus the results for all 500 students of SNPS are made available as an Excel file, which can be further manipulated to produce any desired output.

How long does it take?

Breeze is so named because it makes data entry into Excel a breeze. The only requirement is that the data must already exist elsewhere in some format. The current program is written in Perl 5.6.1. It runs under Red Hat Linux 7.1 on a 1 GHz Pentium III PC with 128 MB of RAM and a 30 GB hard disk. The program returns all of the information in Format 2 in under five minutes. In contrast, the manual processing returned the same information after several hours of data entry.

Benefit to staff and students

There is a constraint associated with the term transitions of the new academic calendar. As the term breaks are short (1 week), the academic staff in SNPS is under pressure to finalize the results as quickly as possible. By creating a system capable of rapidly providing accurate results, the time saved becomes an additional day for staff to complete their grading. It also means that the students of SNPS can know their status well in advance of the beginning of a new term. Perl, through Project Breeze, has made it possible for SNPS to accommodate the new academic calendar, while handling the affairs of its students with far more accuracy and efficiency.

Further Goals

Breeze is now being developed to include a database that will provide information on any SNPS student from the time the student is admitted until graduation day. A generic filter is also being developed to extract data from Excel files directly from each discipline.

The Future

It is the Breeze mission statement to “Harness the Power Tools of Contemporary IT to manage student affairs in SNPS in the new millennium.” While UPNG remains a Windows stronghold, the Breeze project has introduced Perl to the UPNG community. Although Breeze has not yet been introduced to the other schools at UPNG, the object-oriented version of Breeze soon to be developed in Perl will address university-wide problems with student administration.

– Alfred Vahau

Alfred Vahau is a lecturer in Geophysics in the School of Natural and Physical Sciences in the University of Papua New Guinea. He holds an M. Sc. degree in theoretical Physics from the University of Sussex. He was a research student in seismology in the Research School of Earth Sciences, ANU, from 1991 to 1995. His thesis Seismic Tomography Across the Edge of the Australian Shield has remained unpublished.

Alfred was SNPS’s Associate Dean of Students from 1999 to 2001. He is the Perl advocate and leader of Project Breeze in the School of Natural and Physical Sciences at the University of Papua New Guinea.

To learn how large and small companies are using Perl to meet their goals, check out Perl Success Stories.

If you have a Perl success story of your own that you’d like to share, please let me know. You can reach me at: betsy@oreilly.com

Betsy Waliszewski

Patrick Carmichael, a lecturer at the University of Reading in the UK, sent me this story. In addition to teaching teachers to use new technologies, he carries out research into how low-cost network technologies can be used to support civil-society projects, effect social change, and aid in conflict resolution around the world.

Networked Qualitative Data Analysis with Perl and XML

My work at the University of Reading involves research and development in two main areas. One is the evaluation and development of network technologies for use in education projects in the UK; the other is in a field broadly described as “development education”. It was to talk about the latter that I attended YAPC 2000 at Carnegie-Mellon University, where I discussed how I was trying to apply ideas from development theory to the deployment of hardware and software in Southern Africa. In particular, I talked about how software developers could learn from ideas like “appropriate technology” and how open source software was a vital element in any strategy designed to build sustainable user and developer communities around the world.

Since then, I’ve continued to use Perl in a range of projects - most recently developing a knowledge management system for a non-governmental organization in Egypt which provides training for other workers in the not-for-profit sector. In all the projects I’ve developed, budgets have been tight and the network infrastructure shaky, but there is enormous enthusiasm on the part of those with whom I work.

A marvelous phrase - and associated concept - to which I was introduced when working with journalists and academics who have lived, worked and networked through the wars of Yugoslav disintegration was “tactical media”. This involves using what resources you have to the greatest effect, looking for innovative solutions to problems, and being prepared to move at high speed to respond to changing circumstances. When the network landscape - and users’ access to it and to each other - is subject to change, then flexible, freely-available tools like Perl really come into their own. So in many parts of the world, you will find web mail, newsgroups, listservs, email auto-responders, groupware applications and custom-built server and client applications being patched together to form ad-hoc but functional networks with liberal applications of Perl glue.

One area that my academic colleagues identified as being lacking in their “toolkit” was a lightweight and, above all, low-cost, computer-aided qualitative data analysis software (CAQDAS) tool. Researchers characteristically use these to analyze free-form texts such as interview transcripts. While some analysis can be carried out using the advanced features of proprietary “office” applications, dedicated CAQDAS software allows the attachment of “codes” and “memos” to text fragments - these codes and memos themselves can then be retrieved, sorted and even coded and “memoed” themselves. Many dedicated CAQDAS software packages are expensive and are adapted for installation on single computers; it is commonplace to find such software on the networks of universities in the UK, but many of my potential “client group” had intermittent access to the network via shared machines. In many cases, I knew their only access was in public spaces such as libraries, “telecenters” and cybercafes.

In the university, we also had students - often teachers studying part-time for Master’s and doctoral degrees - who wanted to be able to analyze data they collected in classrooms and other research sites without having to come to the campus. They were already supported via web mail and “managed learning environment” software such as Blackboard, but they lacked access to data analysis tools.

A networked CAQDAS application offering the opportunity for researchers, data and analysis (the latter being securely stored server-side) to be distributed across the network seemed to be the answer. I was inspired by Jon Udell’s “Practical Internet Groupware,” in which he describes a “reviewable document base,” and by the development of a number of Perl modules for parsing and querying XML (I used XML::Parser and XML::Twig). The first stage in the development of the application was to build a skeleton text-retrieval system based on regular expressions. While regular expression support already exists in a range of both generic and dedicated QDA software, that offered by Perl, particularly when combined with other built-in functions and modules, is especially rich, and it proved relatively easy to develop my initial text-retrieval system into a “code-and-retrieve” one in which users could retrieve data on the basis of text matching, attached codes or combinations of the two.
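The code-and-retrieve idea can be sketched in a few lines of core Perl. This is not the application’s code: the in-memory @fragments list, the sample texts, the code names, and the retrieve function are all invented here to show how a regular expression match and an attached code can be combined in one query.

```perl
use strict;
use warnings;

# Hypothetical coded fragments; the texts and code names are invented.
my @fragments = (
    { text  => 'We moved to the city after the school closed.',
      codes => ['migration', 'school'] },
    { text  => 'My teacher helped me learn to read.',
      codes => ['school'] },
    { text  => 'The harvest was poor that year.',
      codes => ['livelihood'] },
);

# Return the fragments that match a pattern, carry a given code, or both.
# Pass undef for either criterion to skip it.
sub retrieve {
    my ($pattern, $code) = @_;
    return grep {
        my $f = $_;
        (!defined $pattern || $f->{text} =~ $pattern)
            && (!defined $code || grep { $_ eq $code } @{ $f->{codes} })
    } @fragments;
}

# Text match and code combined: fragments about school that mention a
# teacher or a school.
print "$_->{text}\n" for retrieve(qr/teacher|school/i, 'school');
```

Leaving either argument undefined gives plain text retrieval or plain code retrieval, so the same function covers all three kinds of query described above.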

While some qualitative data make reference to specific dates and periods, others, particularly those of children, are more vague and make references to sequences of events without providing specific dates. When coding accounts, dates were converted wherever possible using Date::Manip. Thus “the sixth of April, 1994” could be coded as: <date strictdate="1994-04-06">the sixth of April, 1994</date> allowing searches across a database for references to specific dates or timeframes. Other Perl modules and interfaces used included String::Approx, to allow approximate matching of text strings, and Lingua::Wordnet, which provided an interface to the Wordnet lexical database in which nouns, verbs, adjectives and adverbs are organised into synonym sets, each representing one underlying lexical concept. This suite of “middle-tier” features allowed the development of an effective data retrieval system operating through a web browser, with XML data files being parsed, and tags containing codes converted into hypertext links to further information or “pop-up” labels. A search facility allowed users to retrieve documents or document fragments, different regular expressions allowing the user to set the target or targets of their searches; their context; the amount of that context to be displayed; and the mode of presentation.
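To make the date coding concrete, here is a small sketch that produces the same kind of markup. It deliberately does not use Date::Manip: two hand-written lookup tables stand in for a real date parser and recognise only the phrasing “the <ordinal> of <Month>, <year>”. The tag_dates function and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Small lookup tables standing in for a real date parser such as Date::Manip.
my %day = (first => 1, second => 2, third => 3, fourth => 4, fifth => 5,
           sixth => 6, seventh => 7, eighth => 8, ninth => 9, tenth => 10);
my %month = (January => 1, February => 2, March => 3, April => 4,
             May => 5, June => 6, July => 7, August => 8,
             September => 9, October => 10, November => 11, December => 12);

# Wrap each recognised date phrase in a <date> element whose strictdate
# attribute holds the date in YYYY-MM-DD form; leave other text unchanged.
sub tag_dates {
    my ($text) = @_;
    $text =~ s{\b(the (\w+) of (\w+), (\d{4}))\b}{
        (exists $day{$2} && exists $month{$3})
            ? sprintf('<date strictdate="%04d-%02d-%02d">%s</date>',
                      $4, $month{$3}, $day{$2}, $1)
            : $1
    }ge;
    return $text;
}

print tag_dates('We left on the sixth of April, 1994 and never returned.'), "\n";
```

Keeping the original wording inside the element and the normalised date in an attribute means the text still reads naturally while searches can work on the strict form.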

At this stage, the application allowed users to convert qualitative data into hypertexts and to “explore” them. What was still lacking, however, was what QDA users refer to as a “pencil-level richness” - the capability to select a text block or fragment and attach either one of a set of predefined codes or a more substantial “memo” through the user interface of the application - in this case, the web browser. Automatic numbering of paragraphs within the XML of source documents allowed the generation of “add a memo” links in the HTML, and a trivial CGI script allowed these memos to be associated with the source document. Memoing user-selected text fragments seemed to pose more of a problem until I investigated the DHTML object model for web pages: accessing the “selection” and “textrange” objects in JavaScript or VBScript made it possible to submit the originating text, its position in the entire document, and the memo or codes via CGI. When this is submitted, the memo itself is stored as an XML data fragment, which can be viewed alongside the source data, or independently of it, in a number of formats.

We are currently using the application - provisionally named “Codex” (code + XML = a body of knowledge; annals; a literary or scriptural corpus) - in a number of areas. In addition to being made available as one of a set of lightweight network tools for development education work, it is likely to form part of a suite of network tools offered to teacher-researchers taking part in curriculum development projects in the UK. As CAQDAS applications go, it is basic; but then, it aims to provide the functions that its users need most while making minimal demands on their hardware, software and budgets. And, as their needs and expectations develop, so too might Codex. When my Inbox contains messages which read along the lines of: “It’s good, but what would be really useful would be if it could …”, I know not only that someone is using it, but also that we are engaged in a different - and characteristically “Perlish” - kind of participatory learning in which distinctions - between teachers and students, and between developers and users - are blurred.

A more complete description of the prototype application - together with a discussion of its relation to other types of CAQDAS - is available online at: CAQDAS

–Patrick Carmichael, University of Reading

Richard Koman

Related link: http://www.politechbot.com/p-03949.html

Anti-hacking laws currently on the books are designed to punish people or companies who break into someone else’s computer. Rep. Howard Berman (D-Calif.) recently introduced a bill that would exempt from these penalties copyright holders who hack computers running on P2P systems in an attempt to prevent piracy of their content.

Apparently stung by the criticism of the bill on the Politech mailing list, run by CNET’s Washington reporter Declan McCullagh, Berman’s office submitted a rationale for the bill. The entire message is available on Declan’s site. Some of the choicer excerpts:

Does H.R. 5211 allow copyright owners to hack into my computer?

No. Despite wildly inaccurate press reports, H.R. 5211 in no way allows a
copyright owner to “hack” into anyone’s computer. Copyright owners are only
allowed to enter or look into a P2P user’s computer to the same extent that
any other P2P user is able to do so. In other words, if a KaZaA user has
advertized to all 100 million other KaZaA users that he wants to download or
distribute a copyrighted song, the songwriter is not “hacking” if she reads
the advertisement like everyone else. H.R. 5211 then allows the songwriter
to take certain, limited actions to stop the distribution of her copyrighted
song between KaZaA users, but in no way allows her to enter or look into a
private area of those KaZaA users’ computers.

Why is a safe harbor from liability for copyright owners necessary?

Certain laws, while intended to prohibit malicious computer hacking, are so
broadly drafted that they may inadvertently create liability for copyright
owners who are merely trying to prevent piracy of their creations on P2P
networks. Because it is virtually certain that some P2P pirates will
attempt to use those laws to prevent copyright owners from stopping piracy,
it is necessary to clarify those laws.