printing XML file with XSLT code

  • Stu

    printing XML file with XSLT code

    As a newbie to XSLT transformations, please excuse my naivety.
    In addition, I am not sure whether what I want to do can be done with
    XSLT, so I apologize up front if I am asking anything stupid.

    I have a shell script that needs to get values from an XML file. What
    I want to do is transform the XML into something more KSH-friendly so
    that it can be parsed easily in my KSH script.

    I would like to go through an entire XML document and, for every
    element and every element/attribute, print the associated value as a
    name/value pair (NVP).

    Assume the following XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <abc>
      <def>
        <mno>2008-06-11-13:15:59</mno>
        <pqr stu="World">Hello</pqr>
      </def>
      <ghi>
        <jkl vwx="12345678"></jkl>
      </ghi>
    </abc>

    Below is my desired output. As you can see, for each element I print
    "element=value" and for each attribute within an element I print
    "element_attribute=value":

    mno=2008-06-11-13:15:59
    pqr=Hello
    pqr_stu=World
    jkl_vwx=12345678
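    Since the point of the NVP format is easy parsing downstream, here is a minimal sketch of the consuming side, written in Python for illustration rather than ksh. It assumes one name=value pair per line and splits on the first `=` only, so a value may itself contain `=`. The function name `parse_nvp` is hypothetical.

```python
def parse_nvp(lines):
    """Turn NAME=VALUE lines into a dict, splitting on the first '=' only."""
    pairs = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a NAME=VALUE line: {line!r}")
        pairs[name] = value  # a repeated name keeps only its last value
    return pairs


DESIRED_OUTPUT = """\
mno=2008-06-11-13:15:59
pqr=Hello
pqr_stu=World
jkl_vwx=12345678
"""

if __name__ == "__main__":
    for name, value in parse_nvp(DESIRED_OUTPUT.splitlines()).items():
        print(f"{name} -> {value}")
```

    Because a repeated name simply overwrites the earlier value, a document with repeated elements (several `<mno>`, say) would need a list of values per name instead.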

    Can somebody point me in the right direction or provide me with some
    sample XSLT transformation code that can do this?

    Keep in mind, I would like to keep this as generic as possible. That
    is, I don't want to reference elements or attributes by name. I would
    like something like this:

    for each element
    do
    if attribute
    print element_attribute=value
    else
    print element=value
    done

    As opposed to, say, searching for element "pqr" and printing its value.

    Thanks to all who answer
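    The pseudocode above can be sketched without naming any element or attribute. This is a minimal Python version using the standard library's `xml.etree.ElementTree`, not XSLT, and it assumes data-oriented XML like the sample: text only in leaf elements and no namespaces (ElementTree would report a namespaced tag as `{uri}name`). The function name `to_nvp` is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<abc>
  <def>
    <mno>2008-06-11-13:15:59</mno>
    <pqr stu="World">Hello</pqr>
  </def>
  <ghi>
    <jkl vwx="12345678"></jkl>
  </ghi>
</abc>
"""


def to_nvp(root):
    """Yield NAME=VALUE lines for every element, in document order."""
    for elem in root.iter():  # walks the whole tree, root included
        # element=value, only if the element has non-blank text of its own
        text = (elem.text or "").strip()
        if text:
            yield f"{elem.tag}={text}"
        # element_attribute=value for each attribute of the element
        for attr, value in elem.attrib.items():
            yield f"{elem.tag}_{attr}={value}"


if __name__ == "__main__":
    for line in to_nvp(ET.fromstring(SAMPLE)):
        print(line)
```

    Emitting the text before the attributes reproduces the order of the desired output. Only `elem.text` (the text before an element's first child) is examined, so mixed content would also need each child's `tail`.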

  • Jürgen Kahrs

    #2
    Re: printing XML file with XSLT code

    Stu wrote:
    > Assume the following XML file:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <abc>
    >   <def>
    >     <mno>2008-06-11-13:15:59</mno>
    >     <pqr stu="World">Hello</pqr>
    >   </def>
    >   <ghi>
    >     <jkl vwx="12345678"></jkl>
    >   </ghi>
    > </abc>
    >
    > Below is my desired out. As you can see for each element I print
    > "element=value" and for each attribute within an element I print
    > "element_attribute=value"
    >
    > mno=2008-06-11-13:15:59
    > pqr=Hello
    > pqr_stu=World
    > jkl_vwx=12345678

    The following script in XMLgawk does it:

    @load xml
    XMLCHARDATA { data = $0 }
    XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
    XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }

    It produces the output you wanted (except for a change in sequence).

    > Can somebody point me in the right direction or provide me with some
    > sample XSLT transformation code that can do this.
    >
    > Keep in mind, I would like to keep this as generic as possible. That
    > is I don't want to reference element or attributes by names.

    The solution above _is_ generic in the sense you described.
    Follow this link to the XMLgawk doc:



    • Peter Flynn

      #3
      Re: printing XML file with XSLT code

      You could run the onsgmls validating parser to output ESIS (below),
      which can trivially be processed by (e.g.) awk or something similar.

      ?xml version="1.0" encoding="UTF-8"
      (abc
      -\n\012
      (def
      -\n\012
      (mno
      -2008-06-11-13:15:59
      )mno
      -\n\012
      Astu CDATA World
      (pqr
      -Hello
      )pqr
      -\n\012
      )def
      -\n\012
      (ghi
      -\n\012
      Avwx CDATA 12345678
      (jkl
      -
      )jkl
      -\n\012
      )ghi
      -\n\012
      )abc
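      To illustrate how little processing ESIS needs, here is a minimal Python sketch (Python rather than awk, purely for illustration) that turns the output above into the requested name/value pairs. It assumes the line codes visible above: `(` start tag, `)` end tag, `-` character data, `A` attribute as "name TYPE value", and `?` processing instruction, and that the only escapes are backslash-n, a doubled backslash, and a backslash plus three octal digits. The names `unescape` and `esis_to_nvp` are hypothetical.

```python
import re

# ESIS escapes handled here: backslash + n (record end), a doubled
# backslash, and backslash + three octal digits (e.g. 012 = newline).
_ESCAPE = re.compile(r"\\(n|\\|[0-7]{3})")


def unescape(data):
    def repl(match):
        token = match.group(1)
        if token == "n":
            return "\n"
        if token == "\\":
            return "\\"
        return chr(int(token, 8))
    return _ESCAPE.sub(repl, data)


def esis_to_nvp(lines):
    """Turn ESIS lines into NAME=VALUE strings."""
    pending = []  # attributes from "A" lines, waiting for their "(" line
    stack = []    # open elements as [name, collected data]
    out = []
    for line in lines:
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # "Aname TYPE value": the value may contain spaces, so split twice
            parts = rest.split(" ", 2)
            if len(parts) == 3 and parts[1] != "IMPLIED":
                pending.append((parts[0], parts[2]))
        elif code == "(":
            out.extend(f"{rest}_{name}={value}" for name, value in pending)
            pending = []
            stack.append([rest, ""])
        elif code == "-" and stack:
            stack[-1][1] += unescape(rest)
        elif code == ")" and stack:
            name, data = stack.pop()
            if data.strip():  # skip whitespace-only content
                out.append(f"{name}={data.strip()}")
        # other codes (e.g. "?" processing instructions) are ignored
    return out


ESIS_SAMPLE = r"""?xml version="1.0" encoding="UTF-8"
(abc
-\n\012
(def
-\n\012
(mno
-2008-06-11-13:15:59
)mno
-\n\012
Astu CDATA World
(pqr
-Hello
)pqr
-\n\012
)def
-\n\012
(ghi
-\n\012
Avwx CDATA 12345678
(jkl
-
)jkl
-\n\012
)ghi
-\n\012
)abc
"""

if __name__ == "__main__":
    print("\n".join(esis_to_nvp(ESIS_SAMPLE.splitlines())))
```

      Because ESIS lists an element's attributes before its start tag, the attribute lines come out first, in the same order as the XMLgawk and XSLT answers.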

      ///Peter


      • David Carlisle

        #4
        Re: printing XML file with XSLT code

        $ saxon l.xml l.xsl

        mno=2008-06-11-13:15:59
        pqr_stu=World
        pqr=Hello
        jkl_vwx=12345678

        is produced by:

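        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <!-- drop whitespace-only text nodes between elements -->
          <xsl:strip-space elements="*"/>
          <!-- text node: newline, parent element name, "=", the text -->
          <xsl:template match="text()">
            <xsl:value-of select="concat('&#10;',name(..),'=',.)"/>
          </xsl:template>
          <!-- attribute: newline, element_attribute=value -->
          <xsl:template match="@*">
            <xsl:value-of select="concat('&#10;',name(..),'_',name(.),'=',.)"/>
          </xsl:template>
          <!-- element: process its attributes, then its children, in document order -->
          <xsl:template match="*">
            <xsl:apply-templates select="@*|node()"/>
          </xsl:template>
        </xsl:stylesheet>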

        David

        --


        • Hermann Peifer

          #5
          Re: printing XML file with XSLT code

          Jürgen Kahrs wrote:
          > The following script in XMLgawk does it:
          >
          > @load xml
          > XMLCHARDATA { data = $0 }
          > XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
          > XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }
          >
          > It produces the output you wanted (except for a change in sequence).
          [:alnum:] delivers correct results with the given test data, but I guess you meant:

          XMLENDELEM && data ~ /[[:alnum:]]/ ...

          Hermann


          • Jürgen Kahrs

            #6
            Re: printing XML file with XSLT code

            Hermann Peifer wrote:
            > [:alnum:] delivers correct results with the given test data, but I guess
            > you meant:
            > XMLENDELEM && data ~ /[[:alnum:]]/ ...
            No, I thought [:alnum:] was sufficient.
            Does it really make a difference in this example?


            • Hermann Peifer

              #7
              Re: printing XML file with XSLT code

              Jürgen Kahrs wrote:
              > No, I thought [:alnum:] was sufficient.
              > Does it really make a difference in this example?
              [:alnum:] is treated as a list of characters: 'a', 'l', 'm', 'n', 'u', and ':', whereas [[:alnum:]] is treated as a character class. The former will match 'Hello', but not 'HELLO', whereas the latter will match both. However, this doesn't make any difference with the given test data.
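              The difference is easy to see outside awk as well. This minimal sketch uses Python's `re`, which has no POSIX classes, so it reads `[:alnum:]` exactly as a set of the characters listed above; `[A-Za-z0-9]` stands in for `[[:alnum:]]` here and is ASCII-only (gawk's class may also cover non-ASCII letters, depending on the locale). The names `LITERAL_SET`, `ASCII_ALNUM` and `compare` are hypothetical.

```python
import re

# Inside a bracket expression, "[:alnum:]" is just the characters : a l n u m.
LITERAL_SET = re.compile(r"[:alnum:]")

# ASCII-only stand-in for the POSIX class [[:alnum:]], which re lacks.
ASCII_ALNUM = re.compile(r"[A-Za-z0-9]")


def compare(s):
    """Return (matches [:alnum:], matches the [[:alnum:]] stand-in)."""
    return bool(LITERAL_SET.search(s)), bool(ASCII_ALNUM.search(s))


if __name__ == "__main__":
    for s in ["Hello", "HELLO", "2008-06-11-13:15:59", ".,-?(){}[]"]:
        print(f"{s!r}: {compare(s)}")
```

              'Hello' and the timestamp match both patterns (via 'l' and ':' respectively), which is why the original test data never exposed the difference; 'HELLO' matches only the class.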

              To make your script a bit more generic and robust (in case of empty elements), I would go for:

              $ cat hermann.awk
              @load xml
              XMLCHARDATA { data = $0 }
              XMLSTARTELEM { data = ""; for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
              XMLENDELEM && data !~ /^[[:space:]]*$/ { print XMLENDELEM "=" data }

              See below the different results for this sample data:

              $ cat file1
              <?xml version="1.0" encoding="UTF-8"?>
              <abc>
              <def>
              <mno>2008-06-11-13:15:59</mno>
              <pqr stu="World">Hello</pqr>
              <a1>.,-?(){}[]</a1>
              <a2>ABC</a2><a3/>
              </def>
              <ghi>
              <jkl vwx="12345678"></jkl>
              </ghi>
              </abc>

              $ xgawk -f hermann.awk file1
              mno=2008-06-11-13:15:59
              pqr_stu=World
              pqr=Hello
              a1=.,-?(){}[]
              a2=ABC
              jkl_vwx=12345678

              $ xgawk -f juergen.awk file1
              mno=2008-06-11-13:15:59
              pqr_stu=World
              pqr=Hello
              a2=ABC
              a3=ABC
              jkl_vwx=12345678
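              For comparison, the event-driven style of hermann.awk can be approximated in Python with `xml.etree.ElementTree.iterparse`, which reports start and end events as the file is read. This is a rough sketch, not a translation of XMLgawk: it looks only at `elem.text`, which equals Hermann's `data` variable for leaf elements like these but not for mixed content, and `elem.clear()` only partly bounds memory, because the emptied elements stay attached to their parents. The name `stream_nvp` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

FILE1 = b"""<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
<a1>.,-?(){}[]</a1>
<a2>ABC</a2><a3/>
</def>
<ghi>
<jkl vwx="12345678"></jkl>
</ghi>
</abc>
"""


def stream_nvp(source):
    """Yield NAME=VALUE lines from start/end events, roughly like hermann.awk."""
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # like XMLSTARTELEM: the attributes are complete at the start tag
            for attr, value in elem.attrib.items():
                yield f"{elem.tag}_{attr}={value}"
        else:
            # like XMLENDELEM with the /^[[:space:]]*$/ test: skip blank text
            text = elem.text or ""
            if text.strip():
                yield f"{elem.tag}={text}"
            # free this element's text, attributes and children once done
            elem.clear()


if __name__ == "__main__":
    for line in stream_nvp(io.BytesIO(FILE1)):
        print(line)
```

              On file1 this prints the same six lines as the hermann.awk run: the empty `<a3/>` is skipped because it has no text, rather than inheriting a stale value.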


              • Jürgen Kahrs

                #8
                Re: printing XML file with XSLT code

                Hermann Peifer wrote:
                > [:alnum:] is treated as a list of characters: 'a', 'l', 'm', 'n', 'u', and
                > ':', whereas [[:alnum:]] is treated as a character class. The former
                > will match 'Hello', but not 'HELLO', whereas the latter will match both.
                Thanks for the reminder.

                You know both languages (XSL and XMLgawk) equally well.
                Would you prefer the XSL solution that was posted here?


                • Hermann Peifer

                  #9
                  Re: printing XML file with XSLT code

                  On Jun 12, 10:53 pm, Jürgen Kahrs <Juergen.KahrsDELETET...@vr-web.de> wrote:
                  > You know both languages (XSL and XMLgawk) equally well.
                  > Would you prefer the XSL solution that was posted here?
                  My rule of thumb is:

                  Big files (say, 100+ MB) with a flat, regular structure: XMLgawk
                  Small files with many optional and/or empty elements: XSL

                  Hermann


                  • Joseph J. Kesselman

                    #10
                    Re: printing XML file with XSLT code

                    Hermann Peifer wrote:
                    > Big files (say, 100+ MB) with a flat, regular structure: XMLgawk
                    > Small files with many optional and/or empty elements: XSL

                    Depends in part on the XSLT processor, of course. Some handle large
                    documents better than others.


                    • Hermann Peifer

                      #11
                      Re: printing XML file with XSLT code

                      On Jun 13, 3:55 pm, "Joseph J. Kesselman" <keshlam-nos...@comcast.net> wrote:
                      > Hermann Peifer wrote:
                      >> Big files (say, 100+ MB) with a flat, regular structure: XMLgawk
                      >> Small files with many optional and/or empty elements: XSL
                      >
                      > Depends in part on the XSLT processor, of course. Some handle large
                      > documents better than others.

                      Of course. Reality is not as black and white as my rule of thumb
                      suggests. Do you have any pointers to helpful XSLT processor
                      comparisons or benchmarks?

                      BTW, another rule of thumb is:

                      Transformation from XML to text, with regex string processing: XMLgawk
                      Transformation from XML to XML (in my context, usually XML to KML): XSL

                      Hermann


                      • Joseph J. Kesselman

                        #12
                        Re: printing XML file with XSLT code

                        Hermann Peifer wrote:
                        > Of course. Reality is not as black and white as my rule of thumb
                        > suggests. Do you have any pointers to helpful XSLT processor
                        > comparisons or benchmarks?
                        Most of what I've been doing has been using the W3C/NIST XPath and XSLT
                        conformance suites (pointed to from http://www.w3.org/QA/TheMatrix),
                        test sets such as the DataPower (now IBM) XSLTMark kernels (described at
                        http://www.xml.com/pub/a/2001/03/28/xsltmark/), or customer datasets
                        (which for obvious reasons I can't share).

                        I do know that the XSLT processor in the DataPower product can recognize
                        at least some cases where a document can be processed in a streaming
                        manner rather than reading it all into memory at once. That depends on
                        the nature of the stylesheet, of course; I'm not sure exactly where the
                        current limits are. But when this optimization works, it permits
                        handling huge documents and reduces latency, both of which are good
                        things. A web search for "DataPower streaming" turns up some discussion of this.

                        I don't think Apache Xalan has any true streaming capability yet, though
                        we've wanted it for many years. However, Xalan's internal data model
                        (DTM) is considerably more space-efficient than a standard Java DOM,
                        which improves its ability to handle large documents. (We had a version
                        of DTM which reduced overhead to only 16 bytes per XML node -- but
                        compressing things that far cost us some performance and imposed some
                        limitations we didn't like, so we had to let it grow a bit.)

                        I haven't used XMLgawk. But part of the point of XML is precisely that
                        adopting a shared (and relatively simple) syntax eases the task of
                        writing useful and reusable tools, and there's certainly a large amount
                        of "let a thousand flowers bloom" built into that assumption. I prefer
                        to stick to the W3C's standardized tools as much as possible, both to
                        push those to improve and for best portability of my work, but if
                        another tool does something XSLT really can't, or does it far better
                        than the copy of XSLT you have available to you, I'm not going to tell
                        you not to use it.


                        • Hermann Peifer

                          #13
                          Re: printing XML file with XSLT code

                          Thanks for the information.

                          I can't remember ever coming across something that XSLT really can't do, but string processing is obviously not a strength of XSLT 1.0. I've read that this improved with version 2.0, but I have no experience of my own with it. For transforming large XML documents into text, which in my context often involves some regex-based string processing, XMLgawk continues to be my favourite tool.

                          Hermann
