Determining which encoding the browser used for a url

  • Jon Maz

    Hi,

    I am working on a .NET URL-rewriting mechanism that has to deal with
    URLs containing non-ASCII characters, e.g.
    http://www.mysite.com/Télécharger.

    The problem is that some browsers encode this URL using UTF-8 and some
    using ISO-8859 (I *think* those are the only two possibilities). For
    ISO-8859 I can use the built-in UrlDecode function; for UTF-8 I am using a
    function I found on Google Groups:

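    // Requires: using System.Text;
    // Assumes each char in inputString holds one raw byte (0-255), e.g. a URL
    // that was percent-decoded as ISO-8859-1, and re-reads those bytes as UTF-8.
    public static string Utf8ToString(string inputString)
    {
        byte[] utf8Bytes = new byte[inputString.Length];
        for (int i = 0; i < utf8Bytes.Length; i++)
        {
            utf8Bytes[i] = (byte)inputString[i]; // silently truncates chars above 0xFF
        }
        return Encoding.UTF8.GetString(utf8Bytes);
    }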

    The problem is deciding *which* encoding the browser has used, and therefore
    which decoding function I need to use. It seems that Mozilla-based browsers
    use ISO-8859, whereas IE can use either, depending on a user setting, and I
    haven't looked at any other browsers yet.

    As far as I know, the browser does NOT send anything in the headers that
    tells you which encoding it used for the URL, so I guess I need some way of
    looking at the raw URL and working out which encoding it's using.

    Can anyone help me write a function to do this? The ideal would be a
    GetEncoding(string testString) function, but I'd settle for a function
    IsUtf8Encoded(string testString), on the grounds that if it *isn't* UTF-8, it
    must be ISO-8859.

    TIA,

    JON
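A common heuristic for the IsUtf8Encoded idea is to percent-decode the raw URL into bytes and check whether those bytes are well-formed UTF-8. ISO-8859-1 text with accented letters rarely is, because a lone byte such as E9 (é) followed by an ASCII letter is not a valid UTF-8 sequence. The sketch below assumes, as the question does, that only UTF-8 and ISO-8859-1 are in play; the class UrlEncodingGuess and its helpers PercentDecodeToBytes and DecodeUrlPath are hypothetical names, and the result is a guess, not a guarantee.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper class: a heuristic sketch, assuming the browser used
// either UTF-8 or ISO-8859-1 to percent-encode the URL.
public static class UrlEncodingGuess
{
    // throwOnInvalidBytes: true makes GetString throw DecoderFallbackException
    // on malformed input instead of silently inserting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // ISO-8859-1 maps every byte 0x00-0xFF to a character, so it never fails.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Turns "%XX" escapes into raw bytes. Other characters are assumed to be
    // ASCII (as in a raw request URL). '+' is left alone: it only means a
    // space in query strings, not in paths.
    public static byte[] PercentDecodeToBytes(string rawUrl)
    {
        var bytes = new List<byte>(rawUrl.Length);
        for (int i = 0; i < rawUrl.Length; i++)
        {
            if (rawUrl[i] == '%' && i + 2 < rawUrl.Length &&
                byte.TryParse(rawUrl.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                              CultureInfo.InvariantCulture, out byte b))
            {
                bytes.Add(b);
                i += 2; // skip the two hex digits
            }
            else
            {
                bytes.Add((byte)rawUrl[i]);
            }
        }
        return bytes.ToArray();
    }

    // True if the decoded bytes are well-formed UTF-8. Pure-ASCII URLs also
    // return true, which is harmless: both encodings decode them identically.
    public static bool IsUtf8Encoded(string rawUrl)
    {
        try
        {
            StrictUtf8.GetString(PercentDecodeToBytes(rawUrl));
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Decodes as UTF-8 when the bytes allow it, otherwise falls back to ISO-8859-1.
    public static string DecodeUrlPath(string rawUrl)
    {
        byte[] bytes = PercentDecodeToBytes(rawUrl);
        try
        {
            return StrictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return Latin1.GetString(bytes);
        }
    }
}
```

With this sketch, both "/T%C3%A9l%C3%A9charger" (UTF-8) and "/T%E9l%E9charger" (ISO-8859-1) decode to "/Télécharger". The check can still be fooled by ISO-8859-1 text whose bytes happen to form valid UTF-8.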



  • Joerg Jooss

    #2
    Re: Determining which encoding the browser used for a url

    Jon Maz wrote:
    > Hi,
    >
    > I am working on a .NET URL-rewriting mechanism that has to deal
    > with URLs containing non-ASCII characters, e.g.
    > http://www.mysite.com/Télécharger.

    Doing this without direct control over your clients' configuration is a
    daunting task, as you've just found out ;-)

    > The problem is that some browsers encode this URL using UTF-8 and
    > some using ISO-8859 (I think those are the only two possibilities).

    Depends on your audience. Don't expect Chinese users to send ISO-8859-x.
    > For ISO-8859 I can use the built-in UrlDecode function; for UTF-8 I am
    > using a function I found on Google Groups:
    >
    > public static string Utf8ToString(string inputString)
    > {
    >     byte[] utf8Bytes = new byte[inputString.Length];
    >     for (int i = 0; i < utf8Bytes.Length; i++)
    >     {
    >         utf8Bytes[i] = (byte)inputString[i];
    >     }
    >     return Encoding.UTF8.GetString(utf8Bytes);
    > }

    Um... why? System.Web.HttpUtility has tons of methods for this,
    including
        public static string UrlDecode(string, Encoding);
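For comparison, a small sketch that runs the same raw URL through the HttpUtility.UrlDecode(string, Encoding) overload twice, once with UTF-8 and once with ISO-8859-1. The class UrlDecodeDemo and method DecodeBothWays are made up for illustration; HttpUtility lives in System.Web.dll on .NET Framework and is also available on modern .NET.

```csharp
using System.Text;
using System.Web; // HttpUtility: System.Web.dll on .NET Framework; also available on modern .NET

// Hypothetical demo class: decodes one raw (still percent-encoded) URL with
// both candidate encodings via HttpUtility.UrlDecode(string, Encoding).
public static class UrlDecodeDemo
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    public static (string AsUtf8, string AsLatin1) DecodeBothWays(string rawUrl)
    {
        // Same input, two interpretations of the %XX bytes.
        string asUtf8 = HttpUtility.UrlDecode(rawUrl, Encoding.UTF8);
        string asLatin1 = HttpUtility.UrlDecode(rawUrl, Latin1);
        return (asUtf8, asLatin1);
    }
}
```

Given "/T%C3%A9l%C3%A9charger", the form a UTF-8 browser sends for Télécharger, AsUtf8 is "/Télécharger" while AsLatin1 is the mojibake "/TÃ©lÃ©charger". Given "/T%E9l%E9charger" from an ISO-8859-1 browser, the roles swap: AsLatin1 is correct and AsUtf8 contains U+FFFD replacement characters. The overload is easy to call; knowing which Encoding to pass is the real problem.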
    > The problem is deciding which encoding the browser has used, and
    > therefore which decoding function I need to use. It seems that
    > Mozilla-based browsers use ISO-8859, whereas IE can use either,
    > depending on a user setting, and I haven't looked at any other
    > browsers yet.

    You can't solve this. It's like trying to open an arbitrary file and
    guess its character encoding.
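A concrete illustration of why detection can only ever be a guess: two different strings, typed in browsers that use different encodings, can put identical bytes on the wire. The sketch percent-encodes every byte of its input for simplicity (real browsers leave ASCII letters unescaped); AmbiguityDemo and PercentEncode are hypothetical names.

```csharp
using System.Text;

// Hypothetical helper: percent-encodes every byte of text in the given encoding.
// Escaping ASCII too keeps the sketch short and does not change the point.
public static class AmbiguityDemo
{
    public static string PercentEncode(string text, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (byte b in encoding.GetBytes(text))
        {
            sb.Append('%').Append(b.ToString("X2")); // e.g. 0xC3 -> "%C3"
        }
        return sb.ToString();
    }
}
```

PercentEncode("é", Encoding.UTF8) and PercentEncode("Ã©", Encoding.GetEncoding("iso-8859-1")) both return "%C3%A9", so a server receiving %C3%A9 cannot know which was meant. Such Latin-1 sequences are rare in real text, which is why a UTF-8 validity check usually picks correctly, but rare is not never.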
    > As far as I know, the browser does NOT send anything in the headers
    > that tells you which encoding it used for the URL, so I guess I need
    > some way of looking at the raw URL and working out which encoding
    > it's using.

    You're right, it's not defined which encoding to use. Sender and receiver
    need to agree on this.
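One way a site can do its share of the agreeing, sketched under the assumption that the server generates all internal links: always emit them percent-encoded as UTF-8, so a browser following a link sends back exactly the bytes the server chose. Uri.EscapeDataString encodes non-ASCII characters as UTF-8; LinkBuilder and BuildLink are hypothetical names. This does nothing for URLs typed by hand, which still depend on the browser.

```csharp
using System;

// Hypothetical helper: builds internal links whose non-ASCII path segments are
// percent-encoded as UTF-8, so browsers following them send UTF-8 back.
public static class LinkBuilder
{
    public static string BuildLink(string baseUrl, string pageName)
    {
        // Uri.EscapeDataString converts pageName to UTF-8 bytes and
        // percent-encodes the non-ASCII ones (and reserved characters).
        return baseUrl.TrimEnd('/') + "/" + Uri.EscapeDataString(pageName);
    }
}
```

BuildLink("http://www.mysite.com", "Télécharger") returns "http://www.mysite.com/T%C3%A9l%C3%A9charger", and the rewriting code can decode any request that arrived through such a link as UTF-8.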

    > Can anyone help me write a function to do this? The ideal
    > would be a GetEncoding(string testString) function, but I'd settle
    > for a function IsUtf8Encoded(string testString), on the grounds that
    > if it *isn't* UTF-8, it must be ISO-8859.

    I'd rather drop the requirement of transparently supporting non-ASCII
    URL paths.

    Cheers,
    --

    mailto:news-reply@joergjooss.de
