
October 2003 Archives

Mark Sigal


With 40% of the online population now having always-on, broadband connections, new types of tools are needed for consumers to better manage the online information that flows through such connections. One such tool I will call an inter-personal information manager (iPim). Before delving into what this tool does, let me speak first to what it does NOT try to do; namely, reinvent the wheel. Therefore, where an existing piece of the information management pie is “good enough,” the design goal of the iPim is to find a way to hook into it - whether that means through an RSS feed, via screen scraping or a full-blown web service.

This iPim has seven key functions: search, save, organize, share, publish, play and transact. As we all know, searching is a cornerstone of the online experience. Fortunately, search is pretty darn good from the standpoint of Googling for information, and Google provides APIs for programmatically accessing the Google-dom, although Google’s API efforts are half-hearted to date. (A note aside, if you haven’t downloaded the Google Toolbar 2.0, you are missing out on a high utility offering.) But, search can deliver more, as evidenced by the increased industry focus on extending search support to local business listings, products, and pricing and purchasing information. On the product search front, if you haven’t checked out Amazon’s new book text search function, do so soon. It definitely adds another dimension to search (related note: there is also a good Wired article on Amazon’s support for this functionality).

But there’s one fundamental flaw to search from an information management perspective, and here is an area where the iPim fills the gap. By design, search-related activities end up in the ‘get it and forget it’ bucket, meaning that found information, if it’s really useful, has to be re-searched a second and a third time. For the most part, this is no big deal since the cost of a new search is so low. Sometimes, however, you find stories, reviews, consumer opinions and online postings that inspire you or are relevant to an interest that you want to cultivate. Perhaps you encounter products that you are thinking of buying, come across information relevant to places you frequent and trust, or hear good things about products or places from others. How do you plug that type of information into your online workflow and make it persistent? Similarly, how do you represent the character and tenor of the personal and business relationships that you want to maintain and/or improve? Relatedly, from an operational perspective, there are times when you are in a formal research mode (e.g., researching a trip - hotels to stay at, airfares, sites to see, places to eat), and other times when you simply discover a piece of information that you don’t want to forget or have to find again.

Simply put, saving online information is messy today - do you bookmark it, email it to yourself, do a “save as,” or a copy and paste? It’s all very ad hoc, suggesting that the vehicles for saving online information and importing shared content have to get better, and that a key driver of Microsoft’s success with Longhorn will be whether they get this piece of the puzzle figured out. One thing is for sure. If you use the iPim system regularly, it will grow to many hundreds of megabytes of information - no big deal if you are accessing this information locally on your PC, but a showstopper if you are saving it all remotely. To frame this one, my Microsoft Outlook PST file is more than 900 megabytes.

So how should you go about organizing your online information? A basic premise is to support as many different types of digital content as possible. What are the content types that stand out? Clearly, there’s lots of email and web content that we generate, view and/or receive from others on a daily basis. Also, most of us have hundreds of Office and Acrobat files. Then there are your libraries of pictures, MP3s, and digital video content. You can factor in postings (both yours and others’), personal and business contacts, products, news feeds, classified ads and job listings into the information management mix as well. Ideally, the iPim can support traditional discussion-style threads, maintain associations between related content items (e.g., a product in your wish list and the article that inspired adding it), and generate a map of the inter-relationships between people, their common interests and the specific content that inspires them (this is where social networking players like Friendster, Live Journal, LinkedIn and Tribe.net are heading). I am highly biased on this one, but the iPim must support near real-time filtering of information so that I can quickly drill down (through text strings) to the specific subset of information that I am looking for. As someone once said to me, “How can Google search the entire Internet in under two seconds, but to find the stuff on my local hard drive takes five minutes and too often doesn’t return what I was looking for?” To me, this suggests that there are strong synergies between how many different content types the iPim supports and how unified the meta-information model is across types. Such is the core of the iPim; namely, facilitating new ways of organizing content and information.
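To make the idea of a unified meta-information model with text-string filtering a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Item` record, its fields and the `filter_items` function are hypothetical and do not come from any existing product. The point is only that once every content type shares the same few metadata fields, one simple filter can cover all of them.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One saved piece of content, whatever its type (hypothetical model)."""
    kind: str                      # e.g. "email", "bookmark", "photo"
    title: str
    text: str = ""
    tags: list = field(default_factory=list)

    def haystack(self):
        # Flatten all metadata into one lowercase string for matching.
        return " ".join([self.kind, self.title, self.text, *self.tags]).lower()


def filter_items(items, query):
    """Keep items whose metadata contains every whitespace-separated term."""
    terms = query.lower().split()
    return [item for item in items if all(t in item.haystack() for t in terms)]
```

Calling `filter_items(items, "rome hotel")` on a mixed list of emails, bookmarks and photos would return only the entries that mention both terms, regardless of content type. A real tool would need an index to stay fast at hundreds of megabytes; a linear scan like this is only meant to show the shape of the model.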

How the system makes sharing online information simple, yet functional is dependent on whether the iPim is built around a peer-to-peer, web-based or email-based application model. Each approach has its advantages. Peer to peer can scale to support full workspaces and tightly aligned functionality (in the case of Groove). Web browser models are platform independent and ubiquitous. Email is also ubiquitous, bi-directional (supports send and receive), can support payloads, store and forward connectivity and is multi-modal (think: application based email, browser based email, email enabled mobile devices). The bottom line is that the iPim needs to make it single click easy to share information as an email, RSS feed, web page or native attachment, as this enables the same piece of information to be used in many different contexts.

One of these contexts is publishing. This can take the form of publishing information to a web site, syndicating it to an RSS newsreader, posting it to a blog or writing an online review. An open question is whether the best approach here is to handle such functionality in line (i.e., within the information manager), or out of band via an API to the content creator’s favorite publishing tool. Basic publishing support within the information manager seems like a requirement, as it allows private information to easily be converted into public data without changing environments.

The question of inline versus out of band presents a fork in the road of sorts; namely, how far does the information manager need to go as an execution environment for “playing” different types of digital content? Does it need to be an “uber-browser” that can play all sorts of content types? While that seems to be a recipe for being a jack of all trades, master of none, kudos to Apple for what they’ve done so far with the amalgam of iLife, .Mac and the iPod. My knee-jerk reaction on this front is that there’s a big difference between robustly dealing with the structure of information and needing to be the display environment for it. Thus, the information manager must be able to recognize the structure and applicable contexts of all sorts of listing types (e.g., grab all of the for rent listings that fit my price/location parameters on craigslist), enable me to organize my local content store into lists, provide style sheets for different presentation templates and reports (e.g., show the different elements that led to a buying decision on a product) and generate maps that provide graphical views of the information being organized (Groxis is an early leader in the information mapping arena).

The circuit closes at the transaction. We all know that Amazon and eBay have come up with transaction models that work, are customer facing and logistically effective. Even better, both provide web services for interfacing with their functionality. Also, in case you haven’t been paying attention, Yahoo has rapidly been putting the end-to-end pieces together to pursue a greater piece of the online transactions pie - from pay per click ads to premium online services to Yahoo stores. And InterActiveCorp is singularly focused in this domain with Expedia, Hotels.com, LendingTree, Match.com, Ticketmaster and Citysearch.

Someone is going to come up with a set of online tools that addresses the iPim space. Me personally, I want one yesterday. And I haven’t even broached the topic of how all of this information ties in with both my mobile devices and the equipment in my smart, connected living room. I’ll save that one for another post.

Are there products in this area that you consider standout? Does search, save, organize, share, publish, play and transact adequately describe your online information management needs?

Damien Stolarz


In 2001 or 2002 M.C. Robot wrote this rap about my P2P media delivery company. I figured I should put it up for all to enjoy.


The B to the F to the N y’all
representing layer three of the stack, y’all

Traversal is the way that we cut through your NAT, cuz universal plug and play doesn’t handle that
all you app layer suckers better drop the SOAP cos’ we got the layer three which is the straight dope
Your gui’s aint shit cuz we roll our own skins- When you step to BFN that’s when your hurt begins:

  • your shit ain’t even beta
  • you’re a media playa hata
  • your dot net frontin’ makes you ship much lata

learn the rules, bitch and use your tools bitch because we’re dropping a load
with our unmanaged code on your business plan and your .NET scam

your C is sharp but your skills are flat cuz our code is old school like a backwards hat
So to you .NET bitches and you web service ho’s, your window of opportunity is comin’ to a close
we’ll source route your packets to the DOJ when we rewrap sockets and we take your pay

You ask “why, are we hatin’ on your skills?”
and we say “why, ‘cuz we’re paying our bills with YOUR customers”

and that’s why you’re crying ‘cuz our biz dev team is out there lying…
I meant laying, a trap for y’all because your chump ass tech is headed for a fall…

You’re peer to peer - you’re so last year - this distributed code is f@$king up your ears -
Our vaporware condenses inside the bhong
that our coders smoke while we’re pumping this song
and reveal the release date far beyond–
any hope you’ll respond and it won’t be long
before we recruit your best staff
and bankrupt your execs and tool ‘em and laugh…

So don’t step to us and don’t compete with us and don’t challenge us or you’ll get beat by us.
We’re givin’ you the static that you didn’t expect
cuz our code commands unconditional respect
so step off and get lost before your companies wrecked by:

  • our pimp bottom line,
  • our internet crime,
  • our freeloading sucker-punching new paradigm.

We’re a fierce f@#king blue f*#king bird of prey
and your sucker MBAs just don’t know how to play
we can see you make a move from five miles away
and our strategy team swoops down and you’re food today

You see we don’t take kindly to competition
our execs are on an executive mission
to optimize internet transmission
and you’re out of commission
on a road to perdition
so go back to stealth mode while we take your position.

-M.C. Robot

The views above may be those of my former employer but the mode of expression is probably not.

Betsy Waliszewski


This story came to me from Ettore Vecchione — Computer Science Lecturer, IT Specialist, Director of the Computer Learning Center and Chair of the Department of Mathematics, Computer Science and Natural Science of John Cabot University. Read on to find out how Perl, Apache and MySQL, this Open Source trio, is now playing a lovely tune to the ears of everyone at the University.

John Cabot University is a Liberal Arts College located in the heart of Rome (i.e. Trastevere). Given its increased enrollment in the past two years, processing and preparing Professor/Course evaluations and evaluation results by manually inputting data into Excel worksheets and Word files was no longer feasible. Workflow for certain administrative staff had become inefficient. In addition, these individuals found it all the more taxing to carry out their day-to-day work while also meeting the demand for prompt evaluation results from Faculty, Dean and President during evaluation periods.

With Perl, Apache and MySQL, this Open Source trio is now playing a lovely tune to the ears of everyone at the University.

The Scenario (When In Rome Do As the Romans Do?)

Traditionally, Professor/Course evaluations at John Cabot University have been carried out with photocopied forms. In other words, students are given a copy of the evaluation during a scheduled class near the end of the semester and then complete it by checking an appropriate numeric value from a scale of 0 to 5 for each question (17 questions in total). In addition, any personal commentary is written out by hand on the same form.

From an administrative point of view, the process involved in tabulating numbers and preparing student comments is extremely inefficient and labor intensive and as a result prone to produce errors. First, any numeric data present on the evaluation are keyed into a Microsoft Excel worksheet, question by question (each professor has his or her own worksheet). Various formulas stored in the Excel template (i.e. Sum, Mean, Count, etc.) do the number crunching. Next, a global composite is set up based on all course and professor averages. Any written comments are typed into a Microsoft Word file (once again each professor has his or her own file) and then formatted accordingly (i.e. professor’s name, course, etc.).

There are roughly 105 to 115 courses scheduled per semester. 4 to 6 weeks of keying in data and formatting are required. The entire process is repeated 4 times a year (i.e. Fall, Spring, Summer I and II semesters). Roughly 2000 evaluations are keyed in every semester (Fall/Spring) except for the two Summer semesters wherein 200 to 300 evaluations are submitted per session. In total, source data for over 4600 evaluations are manually inputted per year.

Another factor to take into consideration is the amount of time spent setting up the paper based evaluation. A tremendous number of photocopies are required, and the evaluations must be inserted in envelopes which are then placed in Faculty mailboxes for distribution to students.

The Solution (“I came, I saw, I conquered!”)

In order to ease the burden on staff, fulfill the demand for prompt evaluation results, and above all to eliminate any possibility of human error in data collection and processing at the source, I decided to develop a system that would automate the process by letting students enter their evaluations via a password authenticated Perl/CGI application running on an Apache server. The data would be stored in a MySQL database to be later processed by way of two Perl shell scripts. These shell scripts would then output the respective Excel workbooks and Word files.
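The processing scripts themselves are not shown in this article. As a rough sketch of the kind of number crunching described here (count, sum and mean per question, plus a composite across professors), the following Python illustration may help. Python is used only for illustration; the actual scripts are written in Perl. The row layout, the function names `summarize` and `composite`, and the way the composite is computed (the mean of each professor's per-question means) are all assumptions, not details of the real system.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows):
    """Aggregate (professor, question, score) rows.

    Returns {professor: {question: {"count": n, "sum": s, "mean": m}}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for professor, question, score in rows:
        buckets[professor][question].append(score)
    return {
        prof: {
            q: {"count": len(s), "sum": sum(s), "mean": mean(s)}
            for q, s in sorted(questions.items())
        }
        for prof, questions in buckets.items()
    }


def composite(summary):
    """Assumed composite: mean of per-question means, highest first."""
    overall = {
        prof: mean(stats["mean"] for stats in questions.values())
        for prof, questions in summary.items()
    }
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)
```

In a setup like the one described, the rows would come from a database query rather than an in-memory list, and the results would be written out to a spreadsheet instead of being returned as dictionaries.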

Open Source – “The Road [Not] Taken”

While sitting at an outdoor café in Rome on a hot, lazy August day in 2001, thinking about which tools to use to develop an evaluation system that would be powerful and scalable and would output Word and Excel files, two roads came to mind: 1) Microsoft, as all of our IT resources and applications are Windows 2000 based, yet proprietary in nature; and 2) Open Source, which is free and has a rather steep learning curve, yet is extremely flexible as it permits seamless interfacing with proprietary applications such as Microsoft Word and Excel. As I am an individual who enjoys delving into the unknown, “I [too] took the one less traveled” so to speak and went Open Source. “And that has made all the difference!”

Now, apart from the slight poetic digression, the decision to use Open Source tools, (i.e. Perl, Apache and MySQL) for the JCU Professor/Course Evaluation System was based on the following criteria:

  • All system components would run in a Microsoft Windows environment;

  • Source data would be entered via a Web based evaluation form interface as this would altogether eliminate the need for a paper based version, thus making the system accessible throughout the University’s network at any time desired;

  • Source data would reside in an RDBMS for easy manipulation of data and storage;

  • Textual commentary should be easily parsed, filtered, formatted and outputted in Word;

  • Numeric data should be automatically calculated, formatted and outputted in Excel;

  • A heavy duty Web Server would be needed to process numerous, concurrent calls to the evaluation form script; and

  • Cost of development must be kept to a minimum.

A tall order, you say? Not so: Perl, Apache and MySQL, all Open Source tools, let me design and develop an evaluation system on a par with various commercially available systems going for $15,000,000, except that my system was developed at a cost of less than $500.

System Components

The Professor/Course Evaluation is a complete Open Source software solution that lets students complete their evaluations via the Web, preview them and finally submit them to a MySQL database for later processing. The various components described in more detail below are what make up the guts of the system.

1) The evaluation.cgi script (a multi-state CGI application). In terms of security, it will only accept evaluations that match the Course ID password stored in the MySQL database system. If the Course ID password is correct, the data are stored for processing; otherwise the system blocks the submission. Moreover, the Course ID password is deleted from the database when the evaluation is submitted, which ensures that a student can submit only one evaluation for that course. The following Perl modules were used in developing the script: CGI.pm and DBI.pm. CSS was added for presentational effects. Lines of code: 1300.

2) The password.pl script randomly generates and stores all passwords in the MySQL database. Students do not have to worry about being tracked, as course passwords are handed to them on small slips of paper from the list previously generated by the system and outputted in Word format. Course ID passwords are not generated based on a student’s name but rather on the total number of students registered in a particular course. Therefore, a student’s identity remains strictly anonymous. This script was developed with DBI.pm and Win32::OLE.pm. Lines of code: 1500.

3) The excel_workbook.pl script outputs an Excel workbook containing a separate worksheet of numeric results for each professor as well as a composite table. As a command line program, it lets the administrative staff select the database to output from. The script also numbers all worksheets and sorts them from highest to lowest score based on a one-way analysis of variance (ANOVA) equation. Upon completion, the workbook is automatically zipped and emailed to its destination. It was developed using the following Perl modules: DBI.pm and Spreadsheet::WriteExcel.pm. Lines of code: 5000.

4) The word_comments.pl script generates the professor comments in Microsoft Word format. It too is a command line program with the same functionality as the excel_workbook.pl script. The DBI.pm and Win32::OLE.pm modules were used here. Lines of code: 1500.

5) The statistics_tables.pl and the comments_tables.pl scripts help automate the creation of all professor tables in the MySQL databases. These two scripts rely exclusively on the DBI.pm module. Lines of code: 150.

6) There are two Perl libraries (.pl files), namely, Calculate.pl and Word.pl. These libraries contain both the math based Perl subroutines and text formatting routines used by the Excel and Word scripts. Lines of code: 800.

7) The databases that house each professor’s tables, along with the password tables, were developed using the MySQL 4.0 database server. There are two databases, one that houses the numerics and the other the text. Two new databases are generated every semester.

8) The Web based evaluation.cgi script runs on an Apache 2.04 server.
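The one-time password scheme described in components 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: it is written in Python rather than Perl, it uses the standard library `sqlite3` module as a stand-in for MySQL, and the table layout and the function names `generate_passwords` and `submit_evaluation` are hypothetical. It shows only the core idea: passwords are generated per seat rather than per student, and a password is deleted in the same transaction that stores the evaluation.

```python
import secrets
import sqlite3

# Hypothetical schema; the article does not show the real MySQL tables.
SCHEMA = """
CREATE TABLE passwords (course_id TEXT NOT NULL, password TEXT PRIMARY KEY);
CREATE TABLE evaluations (course_id TEXT NOT NULL,
                          question INTEGER NOT NULL,
                          score INTEGER NOT NULL);
"""


def generate_passwords(conn, course_id, enrolled):
    """Create one random password per enrolled seat (a count, never a name)."""
    passwords = [secrets.token_hex(8) for _ in range(enrolled)]
    conn.executemany(
        "INSERT INTO passwords (course_id, password) VALUES (?, ?)",
        [(course_id, p) for p in passwords],
    )
    conn.commit()
    return passwords


def submit_evaluation(conn, course_id, password, scores):
    """Store the scores if the password is valid, consuming the password."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM passwords WHERE course_id = ? AND password = ?",
            (course_id, password),
        )
        if cur.rowcount != 1:
            return False  # unknown or already used password
        conn.executemany(
            "INSERT INTO evaluations (course_id, question, score) "
            "VALUES (?, ?, ?)",
            [(course_id, q, s) for q, s in enumerate(scores, start=1)],
        )
    return True
```

Deleting the password first and checking the affected row count means a second submission with the same password finds nothing to delete and is rejected, which matches the "one evaluation per password" behavior described above.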

Processing Time with the Perl, Apache and MySQL

Once online evaluations are complete, anyone who is trained in using the system can easily and quickly generate both numeric statistics and textual commentary in under 20 minutes. (Of course, the data are now inputted by the students using the CGI application, accessible from any workstation in the university, running on the Apache server). All evaluations are completed within 3 to 4 days.

I have clocked the time it takes to output an Excel workbook (roughly 56,000 cells) containing approximately 105 worksheets of professor statistics plus a composite at less than 11 minutes, using a Pentium II 700 MHz Compaq DeskPro with 128 MB of RAM. In addition, the Word files (roughly 24,000 words) containing student commentaries take about 9 minutes. Compare this to 4 to 6 weeks of manual inputting and formatting for each semester. Wow, if only other things in life were just as easy!

Cheers to “Laziness, Impatience and Hubris!” Isn’t that what Perl is all about?

Development Time

Rome was not built in a day, and neither was this system. I am a “one-man-band” in a small University with many responsibilities, so it took me about 8 months to design, develop, test and debug various versions to meet our specific needs. Early on, before the Word and Excel implementation, all statistical data and text commentary were outputted to the web browser.

As a committed developer, I find myself wanting to continually improve the system. Clean, optimized code is what I am now striving for, which does take time.

Future Enhancements

The Professor/Course Online evaluation system is still in its infancy. However, the following improvements have been scheduled for the not too distant future:

1) Create an “all-in-one” GUI application using the Tk module so that administrative staff can go about managing databases, generating passwords and outputting both numeric and text based reports.

2) Re-code the existing evaluation.cgi script so that it can work on an Apache mod_perl server.

3) Use JavaScript form validation routines instead of having the CGI application validate the information on the web server. This will free up server resources.

4) Modify the word_comments.pl program so that the Microsoft Word file contains a blank copy of the student evaluation for professor reference.

5) Add SQL routines to the excel_workbook.pl program so that it can generate cross-department comparisons.

6) Optimize SQL routines to further speed up output.

7) Create Perl Object packages for the math and text parsing/formatting routines.

Final Remarks

The Professor/Course Evaluation system has now been used successfully for two consecutive years. It has without a doubt sped up the generation of evaluation results, reduced errors and paper usage, produced quality information, put the university’s evaluation process on a par with other systems and eased the busy schedules of certain administrative staff.

–Ettore Vecchione


Ettore Vecchione is a Computer Science Lecturer, IT Specialist, Director of the Computer Learning Center and Chair of the Department of Mathematics, Computer Science and Natural Science of John Cabot University. John Cabot University is accredited by the Middle States Commission on Higher Education. In the words of James F. Creagan, President of John Cabot University, “Ettore is a man who wears many hats!”

He holds an MA degree from the University of Toronto in Instructional Technology and enjoys developing with Open Source tools which “...[have] made all the difference!”