A Canary Trap for URI Escaping
Subject:   Sounds like a work-around
Date:   2006-05-29 13:50:49
From:   BasSchulte

I may be wrong but this raises a big, BIG, red flag with me.

Sorta like: hey, I got this string from somewhere, let's try decoding it until I think it looks right (I've been through some utf-8 challenges lately...).

Unless you're dealing with idiots who supply data in some other encoding than the one they claim, just make sure you know what you're doing.

What are they sending me (tcpdump is your friend)? How is it encoded? How do I decode it to something my environment handles?

It all sounds so work-around-ish that it hurts.

But hey, if it works (good enough) for you, go ahead ;)




  • Sounds like a work-around
    2006-05-30 09:51:34  Robert Spier | O'Reilly Author

    You are missing the point -- this is about escaping, not encoding. Browsers and multiple redirects will often re-escape things in annoying and unexpected ways.
    • Sounds like a work-around
      2006-06-04 10:40:22  BasSchulte

      Escaping/unescaping, encoding/decoding, same thing.
      • Escaping vs Encoding.
        2006-06-05 21:09:34  Robert Spier | O'Reilly Author

        Not at all. To oversimplify, Encoding is about what the bits mean. Escaping is about marking certain character sequences that have special meaning.
        • Escaping vs Encoding.
          2006-07-31 12:55:50  rdeforest

          I agree with BasSchulte - escaping is a kind of encoding. Both are ways of translating between one symbol system and another. Escaping is the subset of encoding where the contents are enveloped within the target coding. It is irrelevant that escaping uses prefixes to tag metacharacters. The problem (over-encoding) can still exist in other contexts.

          I like the idea of adding a 'canary' to detect over-encoding, but I would prefer to use something more robust, like a CRC, and I don't like the idea of using it to determine when to stop decoding.

          In the multiple redirect situation described in the article, I would prefer to fix the root problem: the redirects should not have been re-escaping the original data. This canary solution just hides the problem.
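
For readers following the thread, here is a minimal sketch of the over-escaping problem and the canary idea being argued about. It is not the article's code: the canary value `a b`, the function names, and the `max_layers` limit are all assumptions made up for illustration, and Python's `urllib.parse` stands in for whatever escaping the redirects actually apply. The canary is a known string sent alongside the real data; counting how many times it has to be unescaped to come back to its known form tells you how many layers of escaping were added along the way.

```python
from urllib.parse import quote, unquote

# Hypothetical canary: the space must be escaped, so every extra
# layer of escaping changes the string in a predictable way.
CANARY = "a b"


def escape_layers(received_canary, max_layers=5):
    """Count how many times the canary was escaped.

    Returns None if the canary is not recovered within max_layers
    rounds of unescaping (i.e. it was mangled, not just escaped).
    """
    value = received_canary
    for layers in range(max_layers + 1):
        if value == CANARY:
            return layers
        value = unquote(value)
    return None


def unescape_with_canary(payload, received_canary):
    """Unescape payload exactly as many times as the canary was escaped."""
    layers = escape_layers(received_canary)
    if layers is None:
        raise ValueError("canary not recognized; refusing to guess")
    for _ in range(layers):
        payload = unquote(payload)
    return payload


# Simulate two hops that each re-escape the data and the canary.
original = "50% off & more"
twice_escaped = quote(quote(original))
canary_seen = quote(quote(CANARY))

recovered = unescape_with_canary(twice_escaped, canary_seen)
```

Note that the payload itself is never inspected to decide when to stop, which is the difference from "decode until it looks right". It still only compensates for the re-escaping rather than removing its cause, so rdeforest's objection about fixing the redirects stands.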