Can XmlDocument.Load() method handle unicode characters?

  • lamxing@gmail.com

    Can XmlDocument.Load() method handle unicode characters?

    Dear all,

    I've spent a long time trying to get the XmlDocument.Load() method
    to handle UTF-8 characters, with no luck. Every time it loads a
    document that contains European characters (such as the one below,
    output from the Google Maps API), it reports an invalid character at
    position 229, which I believe is the "ß" character.

    Can anyone point me in the right direction on how to load such
    documents using the XmlDocument.Load() method, or suggest a better
    way to do this?

    Thanks!

    ---------------sample XML file------------------
    <?xml version="1.0" encoding="UTF-8" ?>
    - <kml xmlns="http://earth.google.com/kml/2.0">
    - <Response>
    <name>germaniastr 134, berlin berlin</name>
    - <Status>
    <code>200</code>
    <request>geocode</request>
    </Status>
    - <Placemark>
    <address>Germaniastraße 134, 12099 Tempelhof, Berlin, Germany</address>
    - <AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
    - <Country>
    <CountryNameCode>DE</CountryNameCode>
    - <AdministrativeArea>
    <AdministrativeAreaName>Berlin</AdministrativeAreaName>
    - <SubAdministrativeArea>
    <SubAdministrativeAreaName>Berlin</SubAdministrativeAreaName>
    - <Locality>
    <LocalityName>Berlin</LocalityName>
    - <DependentLocality>
    <DependentLocalityName>Tempelhof</DependentLocalityName>
    - <Thoroughfare>
    <ThoroughfareName>Germaniastraße 134</ThoroughfareName>
    </Thoroughfare>
    - <PostalCode>
    <PostalCodeNumber>12099</PostalCodeNumber>
    </PostalCode>
    </DependentLocality>
    </Locality>
    </SubAdministrativeArea>
    </AdministrativeArea>
    </Country>
    </AddressDetails>
    - <Point>
    <coordinates>13.399486,52.464476,0</coordinates>
    </Point>
    </Placemark>
    </Response>
    </kml>

  • Bjoern Hoehrmann

    #2
    Re: Can XmlDocument.Load() method handle unicode characters?

    * lamxing@gmail.com wrote in microsoft.public.dotnet.xml:
    >I've spent a long time trying to get the XmlDocument.Load() method
    >to handle UTF-8 characters, with no luck. Every time it loads a
    >document that contains European characters (such as the one below,
    >output from the Google Maps API), it reports an invalid character at
    >position 229, which I believe is the "ß" character.
    Then it is most likely that your document is not UTF-8 encoded. You will
    have to check which bytes are actually at that position, e.g. using a
    hex editor (e.g., use File.OpenFile ... /e:Binary in Visual Studio). If
    the ß is encoded as the two bytes C3 9F, then either that is not the
    offending character or you have other encoding problems (for example, you
    might have told the XML processor the document is US-ASCII encoded).

    Note that loading XML documents in Internet Explorer and copying and
    pasting the results does not help in any way to debug this kind of
    problem; compressing the document and uploading it to some web server
    is a more sensible approach.
    --
    Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
    Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
    68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
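
The byte check Björn describes can also be done programmatically. The sketch below is Python rather than .NET, purely as a language-neutral illustration, and the helper names `first_invalid_utf8_offset` and `hex_window` are made up for this example. It reports where strict UTF-8 decoding first fails and shows the bytes around that spot, so a lone DF (ß in Windows-1252) can be told apart from C3 9F (ß in UTF-8). The value returned is a byte offset, which need not match the position the XML parser reports.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset where strict UTF-8 decoding first fails, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def hex_window(data: bytes, offset: int, radius: int = 4) -> str:
    """Return the bytes around `offset` as space-separated hex pairs."""
    start = max(0, offset - radius)
    return data[start:offset + radius + 1].hex(" ")


if __name__ == "__main__":
    # Hypothetical input: the same street name in two different encodings.
    as_utf8 = "Germaniastraße".encode("utf-8")
    as_cp1252 = "Germaniastraße".encode("cp1252")
    for label, raw in (("utf-8", as_utf8), ("cp1252", as_cp1252)):
        bad = first_invalid_utf8_offset(raw)
        if bad is None:
            print(label, "decodes cleanly as UTF-8")
        else:
            print(label, "fails at byte", bad, "near:", hex_window(raw, bad))
```

If the failing byte turns out to be DF followed by a plain ASCII letter, the data is probably in a single-byte encoding and only labelled as UTF-8.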

    • lamxing@gmail.com

      #3
      Re: Can XmlDocument.Load() method handle unicode characters?

      Thanks for your reply, Björn. Since this file comes from a dynamic
      URL online, I just used the XmlDocument.Load(URL) method to load the
      XML file. In this case, how do I tell the XML processor what encoding
      the file uses before I load the document? I've saved the sample XML
      file (dynamically generated by Google Maps) using IE's
      File->Save As..., and uploaded it to
      http://www.usctimes.com/gmap/geo.xml . It seems to open fine in the
      browser; does that mean anything?



      • Martin Honnen

        #4
        Re: Can XmlDocument.Load() method handle unicode characters?

        lamxing@gmail.com wrote:
        >Since this file comes from a dynamic
        >URL online, I just used the XmlDocument.Load(URL) method to load the
        >XML file. In this case, how do I tell the XML processor what encoding
        >the file uses before I load the document?
        You don't have to specify the encoding. Pass the URL to the Load method
        and the XML parser will check the XML declaration for the declared
        encoding, or check for a byte order mark, and then use that
        information to decode the bytes served into characters. If that is not
        possible, you get an error.
        >I've saved the sample XML
        >file (dynamically generated by Google Maps) using IE's
        >File->Save As..., and uploaded it to
        >http://www.usctimes.com/gmap/geo.xml . It seems to open fine in the
        >browser; does that mean anything?
        It also loads fine with the Load method of System.Xml.XmlDocument
        (tested with .NET 1.x and 2.0), so that file is properly encoded.



        --

        Martin Honnen --- MVP XML
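
The detection order Martin describes (byte order mark, then the encoding named in the XML declaration, otherwise the default) can be sketched roughly as follows. This is a simplified Python illustration, not how .NET implements it: `sniff_xml_encoding` is a hypothetical helper, it ignores UTF-32, and it only reads a declaration written in an ASCII-compatible encoding.

```python
import codecs
import re

# Byte order marks checked first; UTF-32 is deliberately left out of this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the very start.
_DECLARATION = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding an XML parser would pick for these bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML's default when nothing else is declared


if __name__ == "__main__":
    print(sniff_xml_encoding(b'<?xml version="1.0" encoding="UTF-8" ?><kml/>'))
```

Note that the result is only what the document claims about itself. A response that declares UTF-8 but actually contains single-byte characters would still be reported as UTF-8 here, and decoding would fail later, which matches the error discussed in this thread.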

        • lamxing@gmail.com

          #5
          Re: Can XmlDocument.Load() method handle unicode characters?

          Hi Martin,

          Thanks for the test result. It seems that loading the file I saved
          earlier with XmlDocument.Load() works fine. But when I try to load
          the dynamically generated file directly from Google Maps' server,
          I get the error "invalid character in the given encoding,
          line 1, position 228". Does that mean Google Maps uses the wrong
          encoding for that XML file? I don't think I can post the complete
          Google Maps link here, as the URL contains the Google Maps API key.
          But the URL goes something like this:


          Any thoughts?


          Chris


          • Martin Honnen

            #6
            Re: Can XmlDocument.Load() method handle unicode characters?

            lamxing@gmail.com wrote:
            >It seems that loading the file I saved
            >earlier with XmlDocument.Load() works fine. But when I try to load
            >the dynamically generated file directly from Google Maps' server,
            >I get the error "invalid character in the given encoding,
            >line 1, position 228". Does that mean Google Maps uses the wrong
            >encoding for that XML file?
            It means that the XML is not properly encoded.


            --

            Martin Honnen --- MVP XML

            • lamxing@gmail.com

              #7
              Re: Can XmlDocument.Load() method handle unicode characters?

              Martin,

              Do you have any suggestions on how I can load this dynamic file, or how
              to make the XML document properly encoded?

              Thanks!

              • Bjoern Hoehrmann

                #8
                Re: Can XmlDocument.Load() method handle unicode characters?

                * lamxing@gmail.com wrote in microsoft.public.dotnet.xml:
                >Do you have any suggestions on how I can load this dynamic file, or how
                >to make the XML document properly encoded?
                If the XML document is really not properly encoded, you should contact
                Google to have their service fixed. Until then all you can do is try to
                fix the XML document before parsing. For example, you could remove all
                non-ASCII octets or you could transcode the document from Windows-1252
                to UTF-8 using System.Text.Encoding.
                --
                Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
                Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
                68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
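
The two stopgaps Björn mentions look roughly like this at the byte level. The sketch is in Python as a language-neutral illustration (in .NET the equivalent step would go through System.Text.Encoding, as he says), and the function names are hypothetical. It also assumes the server really sends Windows-1252, which nothing in this thread confirms.

```python
def strip_non_ascii(data: bytes) -> bytes:
    """Drop every octet >= 0x80. Lossy: 'ß' simply disappears."""
    return bytes(b for b in data if b < 0x80)


def transcode_cp1252_to_utf8(data: bytes) -> bytes:
    """Reinterpret the bytes as Windows-1252 and re-encode them as UTF-8.

    Raises UnicodeDecodeError for the few byte values Windows-1252 leaves
    undefined (for example 0x81), which suggests the guess was wrong.
    """
    return data.decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    # Hypothetical payload: an element whose text was written as Windows-1252.
    raw = "<address>Germaniastraße 134</address>".encode("cp1252")
    print(strip_non_ascii(raw))
    print(transcode_cp1252_to_utf8(raw).decode("utf-8"))
```

Stripping is the blunt option because it silently alters names and addresses. Transcoding keeps the characters, but only if the source encoding is guessed correctly.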

                • lamxing@gmail.com

                  #9
                  Re: Can XmlDocument.Load() method handle unicode characters?


                  Hi Björn, can you provide an example of how to save an online XML
                  document and transcode it to UTF-8 with System.Text.Encoding? Thanks!

                  • Helena Kotas [MSFT]

                    #10
                    Re: Can XmlDocument.Load() method handle unicode characters?

                    First you have to find out which encoding the dynamic document uses.
                    XmlDocument/XmlTextReader uses UTF-8 by default unless a byte order mark
                    (BOM) or the encoding attribute in the XML declaration says otherwise. Once
                    you know the encoding, create a StreamReader over the input stream and
                    specify the document's encoding in its constructor. Then create an XmlReader
                    over this StreamReader and use XmlDocument.Load to load the document.

                    If you are sure that the document's encoding is indeed UTF-8 and there is an
                    invalid character in it, you can create an instance of UTF8Encoding that will
                    ignore invalid characters (see the UTF8Encoding constructor).

                    -Helena
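
Helena's two suggestions, sketched in Python as a stand-in for the StreamReader and UTF8Encoding approach she describes: either decode the bytes yourself with the encoding you have determined and hand already-decoded text to the parser, or, if the data really is UTF-8 with a few bad bytes, decode leniently so that invalid sequences become U+FFFD instead of raising an error. The helper names are hypothetical, and Windows-1252 in the usage is only an assumed example encoding.

```python
import xml.etree.ElementTree as ET


def parse_with_encoding(data: bytes, encoding: str) -> ET.Element:
    """Decode with a caller-supplied encoding, then parse the resulting text.

    Because the parser receives text rather than bytes, the encoding named
    in the XML declaration no longer decides how the bytes are interpreted.
    """
    text = data.decode(encoding)  # may raise if the bytes are invalid in it
    return ET.fromstring(text)


def parse_lenient_utf8(data: bytes) -> ET.Element:
    """Decode as UTF-8, replacing invalid byte sequences with U+FFFD."""
    text = data.decode("utf-8", errors="replace")
    return ET.fromstring(text)


if __name__ == "__main__":
    # Hypothetical response: declared as UTF-8 but written as Windows-1252.
    raw = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<address>Germaniastraße 134</address>"
    ).encode("cp1252")
    print(parse_with_encoding(raw, "cp1252").text)
    print(parse_lenient_utf8(raw).text)
```

The first approach preserves the ß when the encoding guess is right. The second always loads, but the damaged character is lost and shows up as a replacement mark.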

                    • Tim Heap

                      #11
                      Re: Can XmlDocument.Load() method handle unicode characters?

                      Help!
                      I have the same problem and need to remove funny characters from my
                      source XML file. Could someone please supply an example?

                      Tim Heap
                      Software & Database Manager
                      POSTAR Ltd

                      tim@postar.co.uk

                      *** Sent via Developersdex http://www.developersdex.com ***
