
Hate Online PDF? New Service Creates Loose HTML Cache

Sid Steward

Sep. 14, 2005 02:06 PM

URL: http://lookleap.com/...

How do I hate thee? Let me count the ways. Even search engines don't like PDF. Yes, PDF is my bread and butter, but I still curse when I accidentally click a PDF hyperlink and Reader begins to open. So I finally created an online service for spinning a Google-like HTML cache of online PDFs. Except it's better than Google's. Yes, it uses pdftohtml.

I created LookLeap a while ago and have been describing it to my friends as a URL abbreviator à la TinyURL. After I added a caching feature for HTML pages, I recalled an old dream of mine: an online service for creating an HTML cache of online PDFs.

LookLeap's PDF Cache vs. Google's PDF Cache

Google already does this for some online PDFs, but an experiment of mine showed that Google can cache only part of a PDF. And I've never been fond of how Google's cache is one long HTML file. Here is an example of Google's cache of a small PDF.

Here is LookLeap's cache of this same PDF. Because the document is split into its pages, it is easier to read and bookmark. When you find a page of interest, the 'Source PDF Document' link will take you to that very same PDF page (in some browsers).
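One common way to link to a specific page of an online PDF is the '#page=N' URL fragment, which some browser PDF viewers honor and others ignore. Here is a minimal Python sketch of building such a link. It assumes that fragment convention; the function name pdf_page_link and the example.com URL are hypothetical, and this is not necessarily how LookLeap builds its 'Source PDF Document' links.

```python
from urllib.parse import urldefrag


def pdf_page_link(pdf_url: str, page: int) -> str:
    """Return pdf_url with a '#page=N' fragment (pages are numbered from 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any fragment already on the URL so the result has exactly one.
    base, _old_fragment = urldefrag(pdf_url)
    return f"{base}#page={page}"


# Hypothetical URL, for illustration only.
print(pdf_page_link("http://example.com/report.pdf", 12))
```

Viewers that do not understand the fragment simply open the PDF at its first page, which would explain the "in some browsers" caveat.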

Nice for PDF Publishers

The best part about LookLeap's HTML cache is that it's created from the PDF on demand. The next time you publish a PDF online, you can create a LookLeap cache and share its link with your readers who (like me) are shy about downloading PDFs.

Search engine users will dig this, too. Instead of getting one hit linking into your 280-page PDF, they'll get many hits linking into specific LookLeap pages that match their query. From there they can open these pages in your PDF.

How to Submit a PDF to LookLeap

You can create a LookLeap cache of an online PDF by visiting LookLeap.com, entering the PDF's URL in the form field, and clicking Submit. On the next page that opens, wait for the message 'Caching Web Page ... Done!' at the bottom-right. Then click the [cached] link.

LookLeap will begin to create the HTML cache. This happens only once per PDF, and it can fail on some PDFs, such as secured PDFs or PDFs that are too large. On success, it will give you a link to its new cache.
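The convert-once behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not LookLeap's actual code: the size limit, the directory layout, the '.done' marker file, and the names cache_dir_for, run_pdftohtml, and build_cache are all hypothetical. It also assumes the PDF has already been downloaded to a local file and that the pdftohtml command-line tool is installed.

```python
import hashlib
import subprocess
from pathlib import Path

MAX_PDF_BYTES = 20 * 1024 * 1024  # hypothetical size limit, not LookLeap's


def cache_dir_for(url: str, root: Path) -> Path:
    """Map a PDF URL to a stable cache directory under root."""
    return root / hashlib.sha1(url.encode("utf-8")).hexdigest()


def run_pdftohtml(pdf_path: Path, out_dir: Path) -> None:
    """Convert a local PDF with the pdftohtml CLI (-c: complex output, -q: quiet)."""
    subprocess.run(
        ["pdftohtml", "-c", "-q", str(pdf_path), str(out_dir / "page")],
        check=True,  # a non-zero exit status raises CalledProcessError
    )


def build_cache(url: str, pdf_path: Path, root: Path, convert=run_pdftohtml) -> Path:
    """Create the HTML cache for url once; later calls reuse the existing cache."""
    out_dir = cache_dir_for(url, root)
    done_marker = out_dir / ".done"
    if done_marker.exists():
        return out_dir  # already converted, nothing to do
    if pdf_path.stat().st_size > MAX_PDF_BYTES:
        raise ValueError("PDF is too large to cache")
    out_dir.mkdir(parents=True, exist_ok=True)
    convert(pdf_path, out_dir)  # may raise, e.g. if the tool rejects the PDF
    done_marker.touch()  # written only after a successful conversion
    return out_dir
```

Because the marker is written only after the converter returns, a failed conversion leaves no marker behind, so a later request tries again instead of serving a half-built cache.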

This new feature is beta; I will be grateful for your feedback and suggestions.

Sid Steward is a programmer, writer and entrepreneur. He maintains the PDF Toolkit and wrote PDF Hacks.


Creative Commons License This work is licensed under a Creative Commons License.



