NITF: cant load objDOM because of HTML-entities

  • Ragnar Heil

    NITF: cant load objDOM because of HTML-entities

    Hi,

    I am receiving news from a press-agency in NITF-XML.
    Then I want to import them into my CMS using XML&SOAP.
    The import-tool runs fine if I have got an xml-document with real
    German special characters, not HTML entities.

    Unfortunately I receive the news with entities and get this error
    (translated from German):
    Parse Error in input XML file: Reference to an undefined entity
    'auml'.

    my code:
    ' Create the MSXML 3.0 DOM and configure XPath selection
    Set objDom = CreateObject("MSXML2.DOMDocument.3.0")
    objDom.setProperty "SelectionLanguage", "XPath"
    objDom.async = False
    objDom.setProperty "SelectionNamespaces", _
        "xmlns:tcmapi='http://www.tridion.com/ContentManager/5.0/TCMAPI'"
    ' Load the file synchronously and log the parser's reason on failure
    objDom.Load strFilePath & strXmlFileName
    If objDom.parseError.errorCode <> 0 Then
        WriteToLog "Parse Error in input XML file: " & _
            objDom.parseError.reason
    End If

    thanks for your help!
    Ragnar
  • Martin Honnen

    #2
    Re: NITF: cant load objDOM because of HTML-entities



    Ragnar Heil wrote:

    > I am receiving news from a press-agency in NITF-XML.
    > Then I want to import them into my CMS using XML&SOAP.
    > The import-tool runs fine if I have got an xml-document with real
    > German special characters, not HTML entities.
    >
    > Unfortunately I receive the news with entities and get this error
    > (translated from German):
    > Parse Error in input XML file: Reference to an undefined entity
    > 'auml'.
    >
    > my code:
    > Set objDom = CreateObject("MSXML2.DOMDocument.3.0")
    > objDom.setProperty "SelectionLanguage", "XPath"
    > objDom.async = False
    > objDom.setProperty "SelectionNamespaces", _
    >     "xmlns:tcmapi='http://www.tridion.com/ContentManager/5.0/TCMAPI'"
    > objDom.Load (strFilePath & strXmlFileName)
    > If Not objDom.parseError.reason = "" Then
    > WriteToLog "Parse Error in input XML file: " & _
    >     objDom.parseError.reason
    > End If

    Well, if an XML document uses entity references, those entities need to be
    defined. So if &auml; is used, there needs to be an entity declaration
    in the document type definition that declares the entity; otherwise the
    XML is not well-formed.
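
To make the point about entity declarations concrete, here is a minimal sketch in Python (not the VBScript/MSXML setup used in this thread). The one-element `nitf` fragment and the helper name `try_parse` are made up for illustration; the real feed is not shown here. It only demonstrates that the same markup is rejected without a declaration and accepted once `auml` is declared in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: &auml; is used but never declared.
UNDECLARED = "<nitf><p>M&auml;rz</p></nitf>"

# Same fragment, with the entity declared in an internal DTD subset.
DECLARED = (
    '<!DOCTYPE nitf [<!ENTITY auml "&#228;">]>'
    "<nitf><p>M&auml;rz</p></nitf>"
)


def try_parse(xml_text):
    """Return the text of the first <p>, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text).findtext("p")
    except ET.ParseError:
        return None


print(try_parse(UNDECLARED))  # rejected: the entity is undefined
print(try_parse(DECLARED))    # accepted: &auml; expands to the declared character
```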

    --

    Martin Honnen


    • Ragnar Heil

      #3
      Re: NITF: cant load objDOM because of HTML-entities

      Martin Honnen <mahotrash@yahoo.de> wrote in
      news:419ce15d$0$28979$9b4e6d93@newsread4.arcor-online.net:

      > Well, if an XML document uses entity references, those entities need to be
      > defined. So if &auml; is used, there needs to be an entity declaration
      > in the document type definition that declares the entity; otherwise the
      > XML is not well-formed.

      Hi Martin,

      I have now seen that another thread discusses a similar issue:
      Subject: XML: "undefined entity"
      news:cnifpk$22e$1@netlx020.civ.utwente.nl

      Yes, you are right: entity references have to be declared in the DTD, for example
      <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">

      I really wonder why the NITF files have no reference to a DTD.
      I could modify the NITF.dtd on our server, but not the incoming files.
      Would you do that, i.e. take the incoming files and add a DTD reference to them?
      Alternatively, I could hack it another way and replace all entities with the
      actual special characters (umlauts).


      • Johannes Koch

        #4
        Re: NITF: cant load objDOM because of HTML-entities

        Ragnar Heil wrote:
        > I am receiving news from a press-agency in NITF-XML.
        > Then I want to import them into my CMS using XML&SOAP.
        > The import-tool runs fine if I have got an xml-document with real
        > German special characters, not HTML entities.
        >
        > Unfortunately I receive the news with entities

        Tell the press agency to send XML, that is, to either
        a) use the characters directly with the appropriate encoding, or
        b) use numeric character references (e.g. &#xfc; for the German u-umlaut),
        and to add a document type declaration.

        If you have a contract with them to get NITF-XML, they have to fulfill
        their part (send NITF-XML and not some code that looks like XML).
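
If the agency does not change its output, option b) can also be applied on the receiving side as plain text processing before parsing. The following is a rough Python sketch of that idea, not something anyone in this thread posted: it assumes the feed only uses the HTML 4 entity names known to Python's `html.entities` table, and the function name `to_numeric_refs` and the sample string are made up for illustration. Being a blind text substitution, it would also rewrite matching text inside CDATA sections.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser already knows; leave these untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &auml; or &frac12;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(text):
    """Replace HTML named entities with hexadecimal numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown: keep as is
        return "&#x{:x};".format(name2codepoint[name])
    return ENTITY_RE.sub(repl, text)


# Hypothetical sample; the real feed is not shown in the thread.
sample = "<nitf><p>M&uuml;ller &amp; S&ouml;hne</p></nitf>"
fixed = to_numeric_refs(sample)
print(fixed)
print(ET.fromstring(fixed).findtext("p"))
```

Numeric references keep the file pure ASCII, so the result does not depend on which encoding the file is later saved in.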
        --
        Johannes Koch
        In te domine speravi; non confundar in aeternum.
        (Te Deum, 4th cent.)


        • Martin Honnen

          #5
          Re: NITF: cant load objDOM because of HTML-entities



          Ragnar Heil wrote:

          > I am really wondering why the NITF-files have no reference to a DTD.
          > I could modify the NITF.dtd on our server but not the incoming files.
          > Would you do it? take the incoming files and add a DTD-reference to them?

          If someone tells you that he is going to provide XML and it is not XML,
          then you should probably insist that XML is sent, and not something
          that fulfills some rules of XML but not others. Otherwise you are
          forced to fix their not well-formed markup, and as you can't use existing
          XML parsers to do that, you are left with plain text processing.

          --

          Martin Honnen


          • Ragnar Heil

            #6
            Re: NITF: cant load objDOM because of HTML-entities

            Johannes Koch <koch@w3development.de> wrote in
            news:305s27F2tclrsU1@uni-berlin.de:

            > If you have a contract with them to get NITF-XML, they have to fulfill
            > their part (send NITF-XML and not some code that looks like XML).

            Hi Johannes and Martin,

            I have now talked to a technical person at the press agency.
            They are aware that their NITF XML documents are neither valid nor
            well-formed :-(

            Now I am thinking about how to load the news file into my objDOM without
            the parser rejecting it with an error message.


            Ragnar


            • Johannes Koch

              #7
              Re: NITF: cant load objDOM because of HTML-entities

              Ragnar Heil wrote:
              > I have now talked to a technical person at the press agency.
              > They are aware that their NITF XML documents are neither valid nor
              > well-formed :-(

              And they don't want to change it?
              --
              Johannes Koch
              In te domine speravi; non confundar in aeternum.
              (Te Deum, 4th cent.)


              • Ragnar Heil

                #8
                Re: NITF: cant load objDOM because of HTML-entities

                Johannes Koch <koch@w3development.de> wrote in
                news:306a8cF2qr7igU1@uni-berlin.de:

                > And they don't want to change it?

                Well, I am going to mention this to DPA ;-)

                Are you aware of any tools that convert files with entities into files
                with real umlauts?


                Ragnar


                • Johannes Koch

                  #9
                  Re: NITF: cant load objDOM because of HTML-entities

                  Ragnar Heil wrote:
                  > Well, I am going to mention this to DPA ;-)

                  Good luck :-)

                  > Are you aware of any tools that convert files with entities into files
                  > with real umlauts?

                  Maybe recode can do this.
                  --
                  Johannes Koch
                  In te domine speravi; non confundar in aeternum.
                  (Te Deum, 4th cent.)


                  • Andy Dingley

                    #10
                    Re: NITF: cant load objDOM because of HTML-entities

                    On 18 Nov 2004 09:09:44 -0800, r@gnar.de (Ragnar Heil) wrote:
                    >I am receiving news from a press-agency in NITF-XML.

                    Most (some ? / many ? / nearly all ?) NITF / NewsML / RSS feeds become
                    invalid whenever they encounter an accented character. You have no
                    practical hope of fixing this, because the organisations are beyond
                    your control and you really just have to deal with the garbage they're
                    sending you. Raise the issue with them, complain as loudly as you
                    can, but don't expect them to fix it.

                    I use some very ugly pre-processor code before the parser. If the
                    first parse attempt fails for this reason, I re-try with a version
                    that has had a reference to an appropriate local DTD added to it.
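
The retry idea described above could look roughly like the Python sketch below. It is not Andy's code, which was not posted, and it swaps in a slightly different technique: instead of referencing a local DTD file, it injects an internal DTD subset generated from Python's `html.entities` table, because ElementTree does not fetch external DTDs by default. It assumes the incoming file has no DOCTYPE of its own (as reported earlier in this thread) and that the root element is `nitf`; `build_doctype` and `parse_with_fallback` are made-up names.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities XML already predefines; they must not be redeclared here.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# An optional XML declaration at the very start of the document.
DECL_RE = re.compile(r"<\?xml[^>]*\?>\s*")


def build_doctype(root_name):
    """Build a DOCTYPE whose internal subset declares the HTML 4 named entities."""
    decls = "".join(
        '<!ENTITY {} "&#{};">'.format(name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    )
    return "<!DOCTYPE {} [{}]>".format(root_name, decls)


def parse_with_fallback(xml_text, root_name="nitf"):
    """Parse normally; on an undefined-entity error, retry with a DOCTYPE added."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        if "undefined entity" not in str(err):
            raise  # some other well-formedness problem: do not mask it
    match = DECL_RE.match(xml_text)
    split = match.end() if match else 0  # the DOCTYPE must follow the XML declaration
    patched = xml_text[:split] + build_doctype(root_name) + xml_text[split:]
    return ET.fromstring(patched)


# Hypothetical feed snippet for illustration.
feed = '<?xml version="1.0"?><nitf><p>Gr&uuml;&szlig;e</p></nitf>'
root = parse_with_fallback(feed)
print(root.findtext("p"))
```

Well-formed input takes the fast path untouched, and errors other than undefined entities are still reported, so genuinely broken files are not silently patched.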

                    --
                    Smert' spamionam


                    • Ragnar Heil

                      #11
                      Re: NITF: cant load objDOM because of HTML-entities

                      Johannes Koch <koch@w3development.de> wrote in
                      news:306elhF2r7p4kU1@uni-berlin.de:

                      > Maybe recode can do this.

                      I am now using sed, which works fine. I also tried HTML Tidy, with
                      equally good results.


                      Ragnar

