c++ XML processor class?

  • Pep

    c++ XML processor class?

    Hi, does anyone know of a C++ class capable of parsing an XML stream into
    elements?

    I have tried using the Xerces class, but unfortunately it requires me
    to do a lot of complex processing to isolate the elements and their
    attributes and content, which I do not want to do.

    I want a class that will parse the XML stream and then allow me to
    iterate over the elements recursively, similar to this:

    void iterateElements (element)
    {
        for (element.attributes)
        {
            attributePair = element.nextAttribute();
            // do some processing on the attribute pair
        }

        elementPair = element.getContentPair();
        // do some processing on the element content

        for (element.elements)
        {
            iterateElements (element.nextElement()); // recursively call this function
        }
    }

    So I would get a key/data pair for each element and for each element
    attribute.
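
Whatever library ends up doing the parsing, the traversal described above can be sketched over a plain in-memory tree. The `Element` struct and the `KeyData` alias are hypothetical illustrations, not types from Xerces-C++ or any other parser, and the sketch assumes each element carries at most one run of text.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory element; not the type of any real XML library.
struct Element {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::string text;               // character data directly inside the element
    std::vector<Element> children;  // nested elements, in document order
};

using KeyData = std::pair<std::string, std::string>;

// Collect the attributes, then the element's own text, then recurse into
// each child element, appending everything to `out` in document order.
void iterateElements(const Element& element, std::vector<KeyData>& out) {
    for (const auto& attribute : element.attributes) {
        out.push_back(attribute);                      // attribute name/value pair
    }
    if (!element.text.empty()) {
        out.emplace_back(element.name, element.text);  // tag name paired with its text
    }
    for (const auto& child : element.children) {
        iterateElements(child, out);                   // recursive call
    }
}
```

Calling `iterateElements(root, pairs)` appends the attribute pairs of `root` first, then its text keyed by tag name, then the pairs of every descendant.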

    Here's hoping :)

  • Pavel Lepin

    #2
    Re: c++ XML processor class?


    Pep <pepaltavista@yahoo.co.uk> wrote in
    <1179479857.533206.54300@o5g2000hsb.googlegroups.com>:
    > I have tried using the xerces class but unfortunately this
    > requires me to do a lot of complex processing to isolate
    > the elements and their attributes and content which I do
    > not want to do.

    Do you imply xerces-c++ doesn't have a DOM parser? I can
    hardly believe that... Hmm, of course it does:



    If your problem is that you find DOM API cumbersome, I would
    seriously recommend getting over it. Modules / components /
    class libs for parsing XML using something less elephantine
    than DOM certainly do exist (perl5's XML::Simple comes to
    mind... rather forcefully, in fact), certainly do have
    their uses, but also certainly have a big
    problem--generally, you cannot predict when you are going
    to run into one of their inherent limitations so that your
    project comes to a screeching halt at the worst possible
    moment.

    If your problem is that you need a streaming parser for
    whatever reason, I believe SAX is the only practical
    choice. I've no hands-on experience with SAX parsers, but
    from what I've heard using:



    ...should be straightforward enough.

    > elementPair = element.getContentPair();
    > // do some processing on the element content

    Define 'element content'. string(.)? That, generally
    speaking, is a bit broken. text()? That's not too good
    either. *? Then you don't need all that nonsense with
    processing 'next element' recursively.

    > for (element.elements)
    > {
    > iterateElements (element.nextElement()); //
    > recursively call this function

    Define 'next element'. following-sibling::*[1]?
    following::*[1]? (Hint: in this case you lose important
    information about the document.)

    > So I would get a key/data pair for each element and for
    > each element attribute.

    'Key/data pair' in element context sounds fishy to me since
    you seem to imply--correct me if I'm wrong--that 'data'
    would be primitive, and not a tree (which it is in
    practice).
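
The difference between the XPath expressions string(.) and text() mentioned above shows up as soon as an element has mixed content. The following is a rough sketch over a hypothetical `Node` type (not part of any DOM implementation): `stringValue` approximates string(.) and `directText` approximates text().

```cpp
#include <string>
#include <vector>

// Hypothetical node type: either a text node or an element with child nodes.
struct Node {
    bool isText;
    std::string value;           // the text for text nodes, the tag name for elements
    std::vector<Node> children;  // always empty for text nodes
};

// Roughly XPath string(.): all descendant text, concatenated in document order.
std::string stringValue(const Node& node) {
    if (node.isText) return node.value;
    std::string result;
    for (const auto& child : node.children) {
        result += stringValue(child);
    }
    return result;
}

// Roughly XPath text(): only the text nodes that are direct children.
std::vector<std::string> directText(const Node& element) {
    std::vector<std::string> result;
    for (const auto& child : element.children) {
        if (child.isText) result.push_back(child.value);
    }
    return result;
}
```

For `<p>Hello <b>big</b> world</p>`, the first function flattens the markup away and the second returns two separate fragments that skip the nested element, so neither is a single obvious 'content' value.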

    --
    Pavel Lepin


    • usenet@tech-know-ware.com

      #3
      Re: c++ XML processor class?

      On 18 May, 10:17, Pep <pepaltavi...@yahoo.co.uk> wrote:
      > Hi anyone know of a C++ class capable of parsing a XML stream in to
      > elements?
      >
      > So I would get a key/data pair for each element and for each element
      > attribute.

      It looks like you're looking for a pull-parser.

      The Microsoft XmlLite C++ parser
      (http://msdn2.microsoft.com/en-us/library/ms752838.aspx) is such a
      parser, although it's only available as a DLL and hence it may not be
      appropriate for you. I don't think it supports validation against a
      schema, but I could be wrong.

      libxml2 (http://xmlsoft.org/) also has such a parser, but written in
      C. This has source code available (I think under MIT license, but
      you'd best check if you're interested). I believe this can validate
      against a schema if needed.

      StAX (as opposed to SAX) is a specification that defines a
      pull-parser. But I'm not sure how well implementations conform to the
      definition. However, searching for something like "C++ StAX" might
      yield additional results.
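
To make the term concrete: with a pull-parser the calling code asks for the next event in a loop, instead of registering callbacks as with SAX. The toy `PullReader` below is only an illustration of that control flow. It is a hypothetical class, handles a tiny XML subset, and does not mirror the API of XmlLite, libxml2, or any StAX implementation.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy pull reader for a tiny XML subset: start tags with double-quoted
// attributes, end tags, and text. No comments, CDATA, entities, or
// self-closing tags. Not a substitute for a real parser.
class PullReader {
public:
    enum class Kind { StartElement, EndElement, Text };

    explicit PullReader(std::string xml) : xml_(std::move(xml)) {}

    // Advance to the next event; returns false at end of input.
    bool read() {
        attributes.clear();
        while (pos_ < xml_.size()) {
            if (xml_[pos_] != '<') {                      // character data
                std::size_t end = xml_.find('<', pos_);
                if (end == std::string::npos) end = xml_.size();
                value = xml_.substr(pos_, end - pos_);
                pos_ = end;
                if (value.find_first_not_of(" \t\r\n") == std::string::npos)
                    continue;                             // skip whitespace-only text
                kind = Kind::Text;
                return true;
            }
            std::size_t close = xml_.find('>', pos_);
            if (close == std::string::npos) return false; // malformed input: stop
            std::string tag = xml_.substr(pos_ + 1, close - pos_ - 1);
            pos_ = close + 1;
            if (!tag.empty() && tag[0] == '/') {
                kind = Kind::EndElement;
                value = tag.substr(1);
            } else {
                kind = Kind::StartElement;
                parseStartTag(tag);
            }
            return true;
        }
        return false;
    }

    Kind kind = Kind::Text;
    std::string value;  // tag name for element events, text for Text events
    std::vector<std::pair<std::string, std::string>> attributes;

private:
    // Split "name a="1" b="2"" into the tag name and its attribute pairs.
    void parseStartTag(const std::string& tag) {
        std::size_t i = tag.find_first_of(" \t\r\n");
        value = tag.substr(0, i);
        while (i != std::string::npos) {
            i = tag.find_first_not_of(" \t\r\n", i);
            if (i == std::string::npos) break;
            std::size_t eq = tag.find('=', i);
            if (eq == std::string::npos) break;
            std::size_t open = tag.find('"', eq);
            if (open == std::string::npos) break;
            std::size_t end = tag.find('"', open + 1);
            if (end == std::string::npos) break;
            attributes.emplace_back(tag.substr(i, eq - i),
                                    tag.substr(open + 1, end - open - 1));
            i = end + 1;
        }
    }

    std::string xml_;
    std::size_t pos_ = 0;
};
```

A typical caller is just `while (reader.read()) { ... }` with a switch on `reader.kind`, which is what makes this style convenient for picking out key/data pairs as they go by.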

      HTH,

      Pete.
      =============== =============== ===============
      Pete Cordell
      Tech-Know-Ware Ltd
      for XML Schema to C++ data binding visit
      Codalogic LMX generates C++ code to read and write XML data. Speed up code development and reduce bugs.

      =============== =============== ===============


      • Pep

        #4
        Re: c++ XML processor class?


        Pavel Lepin wrote:
        > Do you imply xerces-c++ doesn't have a DOM parser?
        >
        > 'Key/data pair' in element context sounds fishy to me since
        > you seem to imply--correct me if I'm wrong--that 'data'
        > would be primitive, and not a tree (which it is in
        > practice).

        Erm, I think you miss the point here.

        No, I'm not implying or suggesting that xerces does not have a DOM
        parser; rather, I don't see an easy way of traversing a tree with it,
        and I admit this may well be my inexperience with the library.

        As for you ripping apart what is obviously pseudo code supplied by me
        to illustrate the simple task I want to perform, I don't get your
        point. Irrespective of whether the data is in a tree format or not,
        XML does indeed have data in the form of key pairs, and it is simply
        the key pairs I want to deal with, not the whole tree structure.

        As it happens, I have now looked at the libxml2 class and found I can
        quickly traverse the tree in a less complex manner than I had to
        follow with the xerces library, though this is probably because the
        documentation is slightly better.

        So in using the libxml2 class I can quickly get to the data that I
        want, which is in a crude key/pair format, i.e.

        <Cat ID="1">
        <CatName>Models</CatName>
        </Cat>

        This crudely gives the key pair ID:1 from the <Cat> element and
        text:Models from the <CatName> element. Admittedly I have to do a
        little processing in order to derive the key/pair data entities I
        want, but I get the end result.

        So, like I said, I don't see your point in trying to analyse someone's
        pseudo code with the attempt to imply the notation of key/pair as
        being "fishy"?

        Still, thanks anyway ;)


        • Pep

          #5
          Re: c++ XML processor class?


          use...@tech-know-ware.com wrote:
          > It looks like you're looking for a pull-parser.
          >
          > libxml2 (http://xmlsoft.org/) also has such a parser, but written in
          > C. This has source code available (I think under MIT license, but
          > you'd best check if you're interested). I believe this can validate
          > against a schema if needed.

          Hey, thanks Pete, a pull-parser is definitely what I want, although I
          was not aware of the correct terminology here.

          Since my OP I have looked at libxml2 and adopted its use, which is
          great as it is a C library and therefore usable from C++ by default
          and, although I did not mention the architecture requirement, it is
          *nix compatible, so it ticks all the boxes.

          So now I am trundling through the documentation and sample programs
          to quickly develop the tool I need.

          Thanks again,
          Pep.


            • Jürgen Kahrs

            #6
            Re: c++ XML processor class?

            Pep wrote:
            > So I would get a key/data pair for each element and for each element
            > attribute.

            Did you consider a scripting language?
            You said you wanted to simply pull one element
            after the other and also look at the attributes.



            This script reads one element after the other and
            simply prints an outline:

            @load xml
            XMLSTARTELEM {
              printf("%*s%s", 2*XMLDEPTH-2, "", XMLSTARTELEM)
              for (i=1; i<=NF; i++)
                printf(" %s='%s'", $i, XMLATTR[$i])
              print ""
            }

            That's all.


            • Pavel Lepin

              #7
              Re: c++ XML processor class?


              Pep <pepaltavista@yahoo.co.uk> wrote in
              <1179490388.856688.299380@p77g2000hsh.googlegroups.com>:
              > Pavel Lepin wrote:
              >> Pep <pepaltavista@yahoo.co.uk> wrote in
              >> <1179479857.533206.54300@o5g2000hsb.googlegroups.com>:
              >>> I have tried using the xerces class but unfortunately
              >>> this requires me to do a lot of complex processing to
              >>> isolate the elements and their attributes and content
              >>> which I do not want to do.
              >>
              >> Do you imply xerces-c++ doesn't have a DOM parser? I can
              >> hardly believe that... Hmm, of course it does:
              >>
              >> If your problem is that you find DOM API cumbersome, I
              >> would seriously recommend getting over it.
              >>
              >> If your problem is that you need a streaming parser for
              >> whatever reason, I believe SAX is the only practical
              >> choice.
              >
              > Erm, I think you miss the point here.

              That's what I thought, because I couldn't really see what
              your problem was...

              > No I'm not implying or suggesting that xerces does not
              > have a dom parser, rather I don't see a easy way of
              > traversing a tree with it and I admit this may well be my
              > inexperience with the library.

              ...on the other hand, maybe not. Is there any specific
              problem you're having with DOM tree traversal as
              implemented in xerces-c++? As I said, DOM might *seem* a
              bit cumbersome, and, well, I suppose it *is* a bit on the
              cumbersome side, but can you be a bit more specific on what
              gives you trouble with traversing the tree?

              > As for you ripping apart what is obviously pseudo code
              > supplied by me to illustrate the simple task I want to
              > perform, I don't get your point.

              My point wasn't really anything about your pseudo-code, but
              rather that I perceive a problem with your way of thinking
              about XML processing. Naturally, I might be mistaken, my
              opinion being based solely on the code and comments you
              posted...

              > Irrespective of whether the data is in a tree format or
              > not, xml does indeed have data in the form of key pairs
              > and it is simply the key pairs I want to deal with not the
              > whole tree structure.

              There's no 'whether'. Any XML document represents a tree.
              You could, indeed, say that nodes are 'key-data' pairs, but
              only if you fully understand that in case of element
              nodes 'data' is always a list of nodes. Now that I think
              about it, there are no explicit keys, so you couldn't even
              say that.

              Okay, I guess I just might be on the wrong level of
              abstraction here and that causes misunderstanding. If
              you're talking about documents similar to:

              <document>
              <data key="foo">bar</data>
              <data key="baz">quux</data>
              <etc/>
              </document>

              ...then my point would be that you probably don't need
              actual traversal anymore as soon as you reach one of
              the 'data' elements. getAttributeNS() and getTextContent()
              should do anyway, since you would know the semantics of
              data elements.
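
For a flat document like the one above, a single pass over the children of the root is enough. In this sketch the `Element` struct is hypothetical: its `attributes` and `text` members merely stand in for what DOM calls such as getAttributeNS() and getTextContent() would return, and `collectData` is an illustrative helper name.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical element type; not a DOM node from Xerces-C++ or libxml2.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;
    std::vector<Element> children;
};

// One flat pass over the root's children. No recursion is needed because
// the meaning of each <data> element is known in advance.
std::map<std::string, std::string> collectData(const Element& document) {
    std::map<std::string, std::string> result;
    for (const auto& child : document.children) {
        if (child.name != "data") continue;           // ignores e.g. <etc/>
        auto key = child.attributes.find("key");
        if (key != child.attributes.end()) {
            result[key->second] = child.text;         // e.g. "foo" -> "bar"
        }
    }
    return result;
}
```

Elements without a `key` attribute are skipped silently here; a real application would probably want to report them.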

              > As it happens I have now looked at the libxml2 class and
              > found i can quickly traverse the tree in a less complex
              > manner than I had to follow with the xerces library,
              > though this is probably because the documentation is
              > slightly better.

              Whatever works for you. libxml2 is certainly workable, and I
              don't believe there are any significant limitations. There
              are just two points against it I think: it doesn't
              implement the W3C DOM API (although I think there was an
              adapter of sorts, developed separately from libxml2 itself)
              and it's written in C (but that's probably irrelevant in
              your case).

              > So in using the libxml2 class I can quickly get to the
              > data that I want which is in a crude key/pair format i.e.
              >
              > <Cat ID="1">
              > <CatName>Models</CatName>
              > </Cat>

              Oh yeah, I thought I was missing something. Wrong level of
              abstraction. I thought you were perceiving nodes themselves
              as key-value pairs.

              > Which crudely gives key pair ID:1 from the <Cat> element
              > and text:Models from the <CatName> element. Admittedly I
              > have to do a little processing in order to derive the
              > key/pair data entities I want but I get the end result.

              Well, it would work the same way with xerces-c++. I suppose
              libxml2 is a bit more light-weight, but in my eyes that is
              offset by it being non-standard. YMMV.

              > So like i said, I don't see your point in trying to
              > analyse someones pseudo code with the attempt to imply the
              > notation of key/pair as being "fishy"?

              If you *represent* key-value pairs in XML that is perfectly
              okay I suppose. What I was objecting to was perceiving
              nodes as key-value pairs. Just a bit of misunderstanding,
              as I said.

              --
              Pavel Lepin


              • Boris Kolpackov

                #8
                Re: c++ XML processor class?

                Hi,

                Pep <pepaltavista@yahoo.co.uk> writes:
                > So in using the libxml2 class I can quickly get to the data that I
                > want which is in a crude key/pair format i.e.
                >
                > <Cat ID="1">
                > <CatName>Models</CatName>
                > </Cat>

                If all you need is to get the data stored in XML then a data
                binding approach may be an easy solution. In short you will
                have C++ classes generated that model your XML and which you
                can use to get to the data in a more convenient way:

                class Cat
                {
                public:  // the accessors must be public to be callable below
                  int ID () const;
                  string CatName () const;
                };

                Cat c = cat ("cat.xml");

                cout << c.ID () << " " << c.CatName () << endl;
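
To give a feel for what such a binding class does internally, here is a hand-written stand-in. It is only a sketch and not the output of any real generator: it assumes the document has already been reduced to a flat map of key/data pairs, and the constructor signature is hypothetical.

```cpp
#include <map>
#include <string>

// Hand-written stand-in for a generated binding class. The flat map of
// key/data pairs is an assumed intermediate form, not any tool's real output.
class Cat {
public:
    explicit Cat(const std::map<std::string, std::string>& pairs)
        : id_(std::stoi(pairs.at("ID"))),  // throws if "ID" is missing or not numeric
          name_(pairs.at("CatName")) {}    // throws if "CatName" is missing

    int ID() const { return id_; }
    const std::string& CatName() const { return name_; }

private:
    int id_;
    std::string name_;
};
```

The point of the approach is visible even in this small version: the string-to-int conversion and the presence checks happen once, in one place, and the rest of the program works with typed accessors.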


                The following article provides a quick intro to XML data binding in
                C++:




                hth,
                -boris
                --
                Boris Kolpackov
                Code Synthesis Tools CC

                Open-Source, Cross-Platform C++ XML Data Binding


                • Pep

                  #9
                  Re: c++ XML processor class?


                  Jürgen Kahrs wrote:
                  > Did you consider a scripting language ?
                  > You said you wanted to simply pull one element
                  > after the other and also look at the attributes.

                  Thanks, it looks interesting but unfortunately I have to do this as
                  part of a c++ library, so scripting is not an option for me.


                  • Pep

                    #10
                    Re: c++ XML processor class?


                    Boris Kolpackov wrote:
                    > If all you need is to get the data stored in XML then a data
                    > binding approach may be an easy solution.
                    >
                    > The following article provide a quick intro to XML data binding in
                    > C++:

                    Thanks Boris. I have found a solution to my problem using libxml2 but,
                    as always, now that I have to use XML I am interested in it, so I
                    will look into the URI you posted.
