Some help with regular expressions

  • Dmitri

    Some help with regular expressions

    Hello RegExp gurus,
    I have a little problem.

    I need to convert huge, dirty HTML to CSV format. I'm stuck at the point
    where I need to add a semicolon after the text inside each cell, e.g.:

    <tr calss="testme">
    <td><b>Foo</b></td>
    <td><strong id="bla bla">Bar</strong></td>
    </tr>

    I need to get:

    <tr calss="testme">
    <td><b>Foo;</b></td>
    <td><strong id="bla bla">Bar;</strong></td>
    </tr>

    After that I will strip the tags; that part I can do myself.
    Thanks in advance.
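A minimal sketch of the regex transformation asked for here, written in Python because the thread does not name a language. `add_semicolons` is a hypothetical helper, not anything from the thread. It assumes every cell contains some text and ends with a literal `</td>`. Empty cells, HTML comments, and cells whose `</td>` is missing are not handled, which shows why the reply below calls regular expressions the wrong tool for dirty HTML.

```python
import re

# Hypothetical helper: insert ";" after the last run of text inside each <td>,
# i.e. just before the closing tags (e.g. "</b></td>") that end the cell.
# Fragile by design: it assumes every cell has text and a literal "</td>".
_CELL_END = re.compile(
    r"([^<>]+)"                  # group 1: last text run in the cell
    r"((?:</[^>]+>\s*)*</td>)",  # group 2: closing tags up to and including </td>
    re.IGNORECASE,
)


def add_semicolons(html: str) -> str:
    """Return html with ';' appended to the final text of every <td> cell."""
    return _CELL_END.sub(r"\1;\2", html)


sample = """<tr calss="testme">
<td><b>Foo</b></td>
<td><strong id="bla bla">Bar</strong></td>
</tr>"""

print(add_semicolons(sample))
```

For the sample above this produces `<td><b>Foo;</b></td>` and `<td><strong id="bla bla">Bar;</strong></td>`, leaving the `<tr>` line untouched.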
  • Michael Fesser

    #2
    Re: Some help with regular expressions

    .oO(Dmitri)
    >Hello RegExp gurus,
    >I have a little problem.
    >
    >I need to convert huge and dirty HTML to CSV format, I stuck when I
    >need to add semicolons inside tags eg:
    >
    ><tr calss="testme">
    ><td><b>Foo</b></td>
    ><td><strong id="bla bla">Bar</strong></td>
    ></tr>
    >
    >I need to get:
    >
    ><tr calss="testme">
    ><td><b>Foo;</b></td>
    ><td><strong id="bla bla">Bar;</strong></td>
    ></tr>
    IMHO regular expressions are the wrong tool here.
    >Then I will strip tags, that I can do myself.
    >Thanks in advance.
    Have a look at DOM instead to parse the HTML into an XML tree. Then you
    can use XPath syntax to access all the nodes you need and easily format
    them any way you want.

    Micha
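A minimal sketch of the parse-then-select idea, again in Python since the thread names no language. It swaps in the standard library's `html.parser.HTMLParser`, an event-driven parser rather than the DOM-plus-XPath approach Micha describes, because it needs no third-party package and can be made to tolerate missing `</td>` and `</tr>` tags. `TableRowParser` and `html_table_to_csv` are hypothetical names for illustration. Since the goal is semicolon-separated CSV, rows are written with the standard `csv` module, which also quotes any cell text that itself contains a semicolon.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collects the text of each <td>/<th> cell, grouped by <tr>.

    Tolerates missing </td> and </tr> tags, which are common in dirty HTML.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def _end_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace to one space.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()      # a new <tr> implicitly closes the previous row
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:
                self._row = []   # cell outside any <tr>: start a row anyway
            self._end_cell()     # a new cell implicitly closes the previous one
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        # Text outside cells (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()          # flush a row still open at end of input


def html_table_to_csv(html, delimiter=";"):
    """Hypothetical helper: return the table rows found in html as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(parser.rows)
    return buf.getvalue()


sample = """<tr calss="testme">
<td><b>Foo</b></td>
<td><strong id="bla bla">Bar</strong></td>
</tr>"""

print(html_table_to_csv(sample), end="")  # prints: Foo;Bar
```

Because cell text is gathered from the parser's events instead of being matched with patterns, nested tags such as `<b>` or `<strong id="bla bla">` need no special handling, and a cell like `b;c` comes out quoted as `"b;c"` instead of silently adding a column.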


    • Dmitri

      #3
      Re: Some help with regular expressions

      On Aug 4, 6:47 pm, Michael Fesser <neti...@gmx.de> wrote:
      >Have a look at DOM instead to parse the HTML into an XML tree. Then you
      >can use XPath syntax to access all the nodes you need and easily format
      >them any way you want.
      >
      >Micha

      Thanks for the idea, I hadn't thought of that. :)
