Marking words in a text

  • Hvid Hat

    Marking words in a text

    Hello

    How should I go about marking certain words in a text? I've got a list of
    words:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="Mark.xsl" ?>
    <Words>
    <Word>
    <Acronym>XML</Acronym>
    <Description>eXtensible Markup Language</Description>
    </Word>
    <Word>
    <Acronym>SGML</Acronym>
    <Description>Standard Generalized Markup Language</Description>
    </Word>
    <Word>
    <Acronym>ISO</Acronym>
    <Description>International Organization for Standardization</Description>
    </Word>
    </Words>

    I want the words (acronyms) above to be wrapped in bold tags in the text
    below:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="Words">
    XML is a simple, very flexible text format derived from SGML (ISO 8879)
    </xsl:template>
    </xsl:stylesheet>

    Can someone help me on my way? :-)
  • Martin Honnen

    #2
    Re: Marking words in a text

    Hvid Hat wrote:
    I want the words (acronyms) above to be wrapped in bold tags in the text
    below:
    >
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="Words">
    XML is a simple, very flexible text format derived from SGML (ISO 8879)
    </xsl:template>
    </xsl:stylesheet>
    That "text" is an XSLT stylesheet with output method="xml", so it is not
    clear what you want to achieve. Do you want to take your acronym list
    and transform it to HTML to be rendered in a browser?

    That is possible with

    <xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:output method="html" indent="yes"/>

    <xsl:template match="Words">
    <html lang="en">
    <head>
    <title>List of Acronyms</title>
    <style type="text/css">
    dt { font-weight: bold; }
    </style>
    </head>
    <body>
    <dl>
    <xsl:apply-templates select="Word"/>
    </dl>
    </body>
    </html>
    </xsl:template>

    <xsl:template match="Word">
    <dt>
    <xsl:value-of select="Acronym"/>
    </dt>
    <dd>
    <xsl:value-of select="Description"/>
    </dd>
    </xsl:template>

    </xsl:stylesheet>



    --

    Martin Honnen


    • Peter Flynn

      #3
      Re: Marking words in a text

      Hvid Hat wrote:
      Hello
      >
      How should I go about marking certain words in a text? I've got a list of
      words:
      If you mean you want to automate the application of markup to a
      document, by matching each word against your list of acronyms, then it's
      probably possible in XSLT (easier in XSLT2 than 1.0) but difficult when
      you need to handle things like "in XML's model" where the "word" is not
      delimited by spaces or markup boundaries. You'd have to use a recursive
      template to isolate each word in turn and test it against your list,
      which would be slow.
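      A minimal sketch of the word-by-word idea described above, written in Python rather than as a recursive XSLT template (a swapped-in language, used only to make the logic concrete). It assumes the acronym list is available as an XML string and that a "word" is a run of regex word characters (\w), so "XML's" splits into "XML", "'" and "s" and the acronym is still found. The names WORDS_XML, load_acronyms and mark_words are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The acronym list from the original post (hypothetical constant name).
WORDS_XML = """<Words>
  <Word><Acronym>XML</Acronym><Description>eXtensible Markup Language</Description></Word>
  <Word><Acronym>SGML</Acronym><Description>Standard Generalized Markup Language</Description></Word>
  <Word><Acronym>ISO</Acronym><Description>International Organization for Standardization</Description></Word>
</Words>"""


def load_acronyms(xml_text: str) -> set[str]:
    """Collect every <Acronym> value into a set for fast membership tests."""
    root = ET.fromstring(xml_text)
    # strip() guards against stray whitespace such as "<Acronym>SGML </Acronym>".
    return {a.text.strip() for a in root.iter("Acronym") if a.text}


def mark_words(text: str, acronyms: set[str]) -> str:
    """Wrap each whole word found in `acronyms` in <b>...</b>."""
    # The capturing group makes re.split keep both the runs of word
    # characters and the separators between them, so nothing is lost.
    tokens = re.split(r"(\w+)", text)
    out = []
    for tok in tokens:
        if tok in acronyms:
            out.append(f"<b>{tok}</b>")
        else:
            # Escape &, < and > so the result stays well-formed markup.
            out.append(escape(tok))
    return "".join(out)


if __name__ == "__main__":
    acronyms = load_acronyms(WORDS_XML)
    print(mark_words("XML is a simple, very flexible text format "
                     "derived from SGML (ISO 8879)", acronyms))
```

      Splitting once and testing each token against a set keeps the per-word check cheap; an XSLT 1.0 version would typically have to peel off one token per recursive template call instead.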

      ///Peter



      • Joseph J. Kesselman

        #4
        Re: Marking words in a text

        Peter Flynn wrote:
        You'd have to use a recursive
        template to isolate each word in turn and test it against your list,
        which would be slow.
        Or have the stylesheet invoke an extension function written in a
        language better suited to this task.

        Personally, I think you should make this the author's responsibility.
        Maybe use the (slow) find-words-and-tag-them as an authoring tool to
        help them do so... but encourage them to use appropriate markup in the
        first place rather than trying to reverse-engineer their text.


        • Hvid Hat

          #5
          Re: Marking words in a text

          On 11-04-2008 21:12:48, Peter Flynn wrote:
          If you mean you want to automate the application of markup to a
          document, by matching each word against your list of acronyms, then it's
          probably possible in XSLT (easier in XSLT2 than 1.0) but difficult when
          you need to handle things like "in XML's model" where the "word" is not
          delimited by spaces or markup boundaries. You'd have to use a recursive
          template to isolate each word in turn and test it against your list,
          which would be slow.
          I'm just playing around with XSLT to improve my skills, so performance is
          not important. I'll give it a try, but if anyone can help me on my way, I'd
          appreciate it :-)

          What if I wanted to mark up related words in some text? Say I wanted to mark
          up country names consisting of several words, e.g. Faroe Islands, South
          Africa, New Zealand, etc. Then I couldn't isolate each word in the text and
          make a comparison. Would I have to use a mix of contains(), substring-before()
          and substring-after()?
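          One way to picture the contains()/substring-before()/substring-after() idea, sketched in Python under stated assumptions: the phrase list is hypothetical, matching is plain case-sensitive substring search, and the output element name country is made up for illustration. Each call finds the earliest listed phrase (the longest one on ties), emits the text before it, wraps the phrase, and recurses on the text after it, roughly what a recursive named template would do with substring-before() and substring-after().

```python
from __future__ import annotations

from xml.sax.saxutils import escape

# Hypothetical list of multi-word names to mark up.
COUNTRIES = ["Faroe Islands", "South Africa", "New Zealand"]


def first_match(text: str, phrases: list[str]) -> tuple[int, str] | None:
    """Return (index, phrase) for the earliest phrase in text, longest on ties."""
    best = None
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match everywhere and never advance
        i = text.find(phrase)  # -1 plays the role of contains() being false
        if i == -1:
            continue
        if best is None or i < best[0] or (i == best[0] and len(phrase) > len(best[1])):
            best = (i, phrase)
    return best


def mark_phrases(text: str, phrases: list[str], tag: str = "country") -> str:
    """Recursively wrap every occurrence of a listed phrase in <tag>...</tag>."""
    hit = first_match(text, phrases)
    if hit is None:
        return escape(text)  # base case: nothing left to mark
    i, phrase = hit
    before = text[:i]               # analogous to substring-before(text, phrase)
    after = text[i + len(phrase):]  # analogous to substring-after(text, phrase)
    return (escape(before) + f"<{tag}>{escape(phrase)}</{tag}>"
            + mark_phrases(after, phrases, tag))


if __name__ == "__main__":
    print(mark_phrases("Trips to New Zealand, South Africa and the Faroe Islands.",
                       COUNTRIES))
```

          Because this is plain substring matching, it would also tag "New Zealand" inside "New Zealanders"; adding word-boundary checks brings back the "what is a word" problem raised earlier in the thread. The recursion depth also grows with the number of matches.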


          • Hvid Hat

            #6
            Re: Marking words in a text

            On 11-04-2008 23:15:56, "Joseph J. Kesselman" wrote:
            Peter Flynn wrote:
            Or have the stylesheet invoke an extension function written in a
            language better suited to this task.
            I've written a few small extension functions in C#, and I thought about
            writing one to solve this problem. Any ideas on how to approach it? Should
            I create a comma-separated list of the words, pass the word list and the
            text to an extension function, and have the function mark up the words and
            return the marked-up text? Is it possible to access the XML containing the
            words from the extension function, so I could build a List<string> within
            my extension function? Perhaps by passing the XML containing the words as
            a node-set or something. Does that make sense? :-)
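            A sketch of what the body of such an extension function might do, written in Python rather than C# and assuming the comma-separated word list proposed above; the name mark_up and its tag parameter are hypothetical. It builds one regular expression from the list (longest terms first, with \b word boundaries so "ISO" does not match inside "ISOTOPE") and wraps each match. It also assumes every term starts and ends with a word character, since \b behaves differently next to characters such as "+".

```python
import re
from xml.sax.saxutils import escape


def mark_up(word_list: str, text: str, tag: str = "b") -> str:
    """Wrap every listed term found in text in <tag>...</tag>.

    word_list is a comma-separated string such as "XML,SGML,ISO".
    Assumes each term begins and ends with a word character.
    """
    terms = [w.strip() for w in word_list.split(",") if w.strip()]
    if not terms:
        return escape(text)
    # Longest terms first: at a given position the regex engine takes the
    # first alternative that matches, so "South Africa" beats "South".
    terms.sort(key=len, reverse=True)
    # Word boundaries on both sides stop "ISO" from matching inside "ISOTOPE".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")

    out = []
    pos = 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))             # untouched text
        out.append(f"<{tag}>{escape(m.group(0))}</{tag}>")  # marked term
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(mark_up("XML, SGML, ISO",
                  "XML is a simple, very flexible text format derived from SGML (ISO 8879)"))
```

            One caveat worth checking against the processor's documentation: if an XSLT 1.0 extension function returns a plain string, xsl:value-of outputs it as text, so the <b> tags would come out escaped unless the stylesheet uses disable-output-escaping="yes" (which processors are not required to support) or the function returns a node-set instead.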
            Personally, I think you should make this the author's responsibility.
            Maybe use the (slow) find-words-and-tag-them as an authoring tool to
            help them do so... but encourage them to use appropriate markup in the
            first place rather than trying to reverse-engineer their text.
            I agree. I would make it an authoring tool but currently I'm just playing
            around with XSLT to improve my skills.


            • Joseph J. Kesselman

              #7
              Re: Marking words in a text

              Hvid Hat wrote:
              What if I wanted to mark up related words in some text?
              This is a programming problem first, then an XSLT problem. Figure out
              how you would solve it in any other programming language, so you have
              the problem well-formed and well-understood. Then figure out how to
              solve it nonprocedurally. Then implement that in XSLT... or decide not
              to do so, if it really isn't a problem well-suited to XSLT (as this may
              not be.)


              • Peter Flynn

                #8
                Re: Marking words in a text

                Hvid Hat wrote:
                What if I wanted to mark up related words in some text? Say I wanted to mark
                up country names consisting of several words, e.g. Faroe Islands, South
                Africa, New Zealand, etc. Then I couldn't isolate each word in the text and
                make a comparison. Would I have to use a mix of contains(), substring-before()
                and substring-after()?
                No, you'd pay someone to open the document in an XML editor and do it by
                hand.

                Really. If you want to apply reliable content markup on names (people,
                places, things), it's a *human* task.

                ///Peter

