To all of you MS Office gurus out there, here’s a chance to help out with a *GREAT* cause. As many of you already know, Lawrence Lessig has shifted his focus in the fight for Free Culture, turning his attention to the corruption that sits at the very foundation of the problems keeping us from living in that free culture: free from the corruption that keeps things in a state of… hmmm… not so much free as, well, not free.

As Professor Lessig mentions in the same post linked above,

Instead, as soon as I can locate some necessary technical help, I will be moving every presentation I have made (that I can) to a Mixter site (see, e.g., ccMixter) where others can freely download and remix what I’ve done, and use it however they like. I will continue to work to get all my books licensed freely. And I am currently finishing one last book about these issues that I hope will make at least some new contributions.

While the Mixter site is currently live, it does not yet have the content that will enable anyone, anywhere to remix, cut up, mash up, extend, and share the results of this work with the rest of the world. Related to this, the getID3 overview page on SourceForge (getID3 is what ccMixter uses to sniff files to ensure they are what they say they are) lists the following:

# Formats identified, but not parsed:

* PDF
* RAR
* MS Office (.doc, .xls, etc)

In other words, while it can identify whether or not a file is an MS Office document type, it lacks the ability to parse the content to ensure it *TRULY* is what it claims to be. As we all know, with good comes bad: the good in this case is the ability to extend the capabilities of an MS Office document via scripts and macros; the bad is the ability to extend the capabilities of an MS Office document via scripts and macros ;-)
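To make the identified-versus-parsed distinction concrete, here is a minimal Python sketch of what identification alone amounts to. It is not getID3's code; the function name `looks_like_ole2` is made up for illustration. It relies on one fact: legacy .doc/.xls/.ppt files are OLE2 compound files, and those begin with a fixed eight-byte signature.

```python
# Signature that opens every OLE2 compound file (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the bytes start with the OLE2 compound-file signature.

    Hypothetical helper: it only identifies the container format and says
    nothing about what streams, scripts, or macros are stored inside it.
    """
    return data.startswith(OLE2_MAGIC)
```

A file that passes this check is "an MS Office document" only in the weakest sense, which is exactly the gap described above.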

With the above in mind, here is the opportunity to help: how do we put safeguards in place to ensure that MS Office documents that have been remixed, cut up, mashed up, and extended can be uploaded back to the site without concern that the resulting work will contain *very very bad things*? Does server-side software** already exist that can parse an MS Office document and strip out any potentially harmful scripts/macros? For obvious reasons this would all need to be automated. Also, any other ideas for how we could ensure a safe, happy, and enjoyable experience for everyone who chooses to participate?
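As one possible starting point, and very much a sketch rather than a real safeguard, an automated upload check could at least flag legacy Office files that appear to carry a VBA project. The version below assumes that a VBA project shows up as a stream named `_VBA_PROJECT`, and that stream names inside an OLE2 file are stored as UTF-16LE, so it simply searches the raw bytes for that name. The function name `upload_verdict` and its return strings are invented for illustration. It flags rather than strips, since stripping safely would need a real parser, and it does not look for any other kind of active content.

```python
# Signature that opens every OLE2 compound file (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Assumption: a VBA project inside an OLE2 file includes a stream named
# "_VBA_PROJECT", and directory entry names are stored as UTF-16LE.
VBA_MARKER = "_VBA_PROJECT".encode("utf-16-le")


def upload_verdict(data: bytes) -> str:
    """Classify an uploaded file with a crude byte-level heuristic.

    Returns one of three illustrative labels:
      "not-ole2"       the file is not a legacy Office container at all
      "vba-suspected"  the VBA project stream name appears in the file
      "no-vba-marker"  OLE2 container with no sign of that stream name
    """
    if not data.startswith(OLE2_MAGIC):
        return "not-ole2"
    if VBA_MARKER in data:
        return "vba-suspected"
    return "no-vba-marker"
```

A "no-vba-marker" result is not a clean bill of health; it only means this one heuristic found nothing, so anything like this would have to sit alongside other checks rather than replace them.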

Thanks in advance for any and all help with this task, everyone! *VERY* much appreciated!

** This is sitting on a custom Linux build on EC2, so the utilities would either need to be Unix*-server friendly, or someone from… oh, I don’t know, maybe Microsoft? ;-) …would need to donate a license for a Win2k3 or Longhorn Server instance that could be run via QEMU on that same EC2 instance.