Regular Expression help

  • Rob

    Regular Expression help

    Hi,
    I need to convert our Word documents to HTML for our website. I've used
    MS Word's "Save as HTML" feature and ran "Microsoft Office HTML Filter
    2.0" to clean up the code, but I am stuck with a lot of additional code
    and I want to write a script that will do a custom cleanup.

    The Word document has a "Table of Contents" and when I convert, I get
    links at the top of my page that link to the appropriate section but I
    get code like this:

    <a name="_Toc54767572"></a><a name="_Toc58978952"></a><a
    name="_Toc58980987"></a><a
    name="_Toc58981749"></a><a name="_Toc90871301"></a><a
    name="_Toc93973545"></a><a
    name="_Toc126114863"></a>
    <a name="_Toc157391168">My Title</a>

    I get a whole bunch of empty anchor tags each with a different name and
    only the last anchor tag is correct. I would like to use regular
    expressions to remove all empty "a" tags.

    I know how to use regular expressions with ASP 3.0 but I don't know the
    pattern.

    Does anyone know the regex.pattern to replace all empty <a> tags with an
    empty string?

    Thanks
    Rob



    *** Sent via Developersdex http://www.developersdex.com ***
  • Alexey Smirnov

    #2
    Re: Regular Expression help


    "Rob" <robert@hotmail.com> wrote in message
    news:uMscFjDiHHA.4904@TK2MSFTNGP05.phx.gbl...
    [..]
    > I get a whole bunch of empty anchor tags each with a different name and
    > only the last anchor tag is correct. I would like to use regular
    > expressions to remove all empty "a" tags.

    Rob, I think something similar to

    Set RegularExpressionObject = New RegExp

    With RegularExpressionObject
    .Pattern = "\<a(.|\n)*\>\<\/a\>"
    .IgnoreCase = True
    .Global = True
    End With

    ReplacedText = RegularExpressionObject.Replace(InitialText, "")



    • Evertjan.

      #3
      Re: Regular Expression help

      Alexey Smirnov wrote on 27 apr 2007 in
      microsoft.public.inetserver.asp.general:

      > "Rob" <robert@hotmail.com> wrote in message
      > news:uMscFjDiHHA.4904@TK2MSFTNGP05.phx.gbl...
      [..]
      >> I get a whole bunch of empty anchor tags each with a different name
      >> and only the last anchor tag is correct. I would like to use regular
      >> expressions to remove all empty "a" tags.
      >
      > Rob, I think something similar to
      >
      > Set RegularExpressionObject = New RegExp
      >
      > With RegularExpressionObject
      > .Pattern = "\<a(.|\n)*\>\<\/a\>"
      > .IgnoreCase = True
      > .Global = True
      > End With
      >
      > ReplacedText = RegularExpressionObject.Replace(InitialText, "")

      .Pattern = "<a[^>]*>\s*<\/a>"

      will do.

      =================

      However, why [yes, I know it is personal preference] not use a bit of
      jscript even if you use vbs in ASP:


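      <% ' vbs
      dim t, result
      ' test input: one whitespace-only anchor, one anchor with content
      t = "x<a \nhref='bbb'\n > </a>\n\n<a href='bbb'> x </a>"
      result = deleteEmptyAnchors(t)
      %>


      <script language='jscript' runat='server'>
      // remove every <a ...> element that contains only whitespace
      function deleteEmptyAnchors(t){
      return t.replace(/<a[^>]*>\s*<\/a>/gi,'');
      };
      </script>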
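
As a quick check outside ASP, the same pattern can be run against markup shaped like Rob's sample. This is only a sketch in plain JavaScript, on the assumption that JScript treats this particular pattern the same way; the sample string and its `name` values are made up for illustration.

```javascript
// Same pattern as Evertjan's function: an opening <a ...> tag,
// optional whitespace, then the closing </a>.
function deleteEmptyAnchors(t) {
  return t.replace(/<a[^>]*>\s*<\/a>/gi, '');
}

// Hypothetical input modelled on the Word output in the first post:
// two empty anchors (one split across lines) and one anchor with text.
var sample =
  '<a name="_Toc1"></a><a\nname="_Toc2"></a>\n' +
  '<a name="_Toc3">My Title</a>';

var cleaned = deleteEmptyAnchors(sample);

// JSON.stringify makes the remaining newline visible.
console.log(JSON.stringify(cleaned));
// prints "\n<a name=\"_Toc3\">My Title</a>"
```

Because `[^>]` is a negated character class, it also matches line breaks, so an opening tag that Word wrapped across two lines is still caught.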


      --
      Evertjan.
      The Netherlands.
      (Please change the x'es to dots in my emailaddress)


      • Rob

        #4
        Re: Regular Expression help

        Thanks Evertjan

        I tried the other example "\<a(.|\n)*\>\<\/a\>" but my page was taking
        too long to process it. Then I tried your example "<a[^>]*>\s*<\/a>" and
        it works great.

        Thanks again.

        Rob
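
The thread does not diagnose why the first pattern was slow, but a plausible explanation is that `(.|\n)*` is greedy and free to run across tag boundaries, so the engine scans to the end of the page and backtracks from every `<a` it finds. The sketch below, in plain JavaScript with a made-up input string, shows the related correctness problem: the greedy pattern swallows a titled anchor, while the bounded one does not.

```javascript
// First pattern from the thread (redundant escapes dropped):
// (.|\n)* is greedy and can cross tag boundaries.
var greedy = /<a(.|\n)*><\/a>/gi;

// Second pattern from the thread: [^>]* stops at the end of the opening tag.
var bounded = /<a[^>]*>\s*<\/a>/gi;

// Hypothetical input: a titled anchor, some text, then one empty anchor.
var sample = '<a name="t1">My Title</a><p>text</p><a name="t2"></a>';

var greedyResult = sample.replace(greedy, '');
var boundedResult = sample.replace(bounded, '');

console.log(JSON.stringify(greedyResult));  // prints ""
console.log(JSON.stringify(boundedResult)); // prints "<a name=\"t1\">My Title</a><p>text</p>"
```

On a full Word export the greedy version has far more text to scan and give back, which would fit the long processing time reported above.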



