re troubles

  • Evanda Remington

    re troubles

    I'm trying to filter some rows of an html table out, based on their
    contents. For input like:
    """
    <table>
    <tr>
    <td>Lasers</td><td>17</td> </tr>
    <tr> << want to filter
    <td>kittens</td><td>8</td> << this out.
    </tr> <<
    <tr> <td>robots</td><td>8</td> </tr>
    </table>
    """
    I would like to completely remove the (3 line) table row that makes mention
    of kittens. The regexp I have tried to use is: r"<tr>.*?kittens.*?</tr>".
    When compiled and used with sub("", data), it strangely removes everything
    from the first "<tr>" to the first "</tr>" after kittens.

    That is, the ".*?" notation works in the second half, but not in the first
    half. It behaves the same as ".*" should.
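
    The behaviour described above can be reproduced with a short sketch. It
    assumes the pattern was compiled with re.DOTALL (without that flag "."
    cannot cross line breaks, and the pattern would not match this input at
    all). The names `data` and `naive` are illustrative only, and the "<<"
    annotations are left out of the sample.

```python
import re

# Sample table from the post, without the "<<" annotations.
data = """\
<table>
<tr>
<td>Lasers</td><td>17</td> </tr>
<tr>
<td>kittens</td><td>8</td>
</tr>
<tr> <td>robots</td><td>8</td> </tr>
</table>
"""

# Assumption: DOTALL is set so that "." also matches newlines.
naive = re.compile(r"<tr>.*?kittens.*?</tr>", re.DOTALL)

m = naive.search(data)
# The match begins at the FIRST <tr> in the text (the Lasers row),
# not at the <tr> that opens the kittens row.
print(repr(m.group(0)))

result = naive.sub("", data)
# The Lasers row has been removed together with the kittens row.
print(result)
```

    The search always reports the leftmost position at which a match is
    possible. The non-greedy ".*?" only makes the match end as early as it
    can; it does not move the starting point forward to a later "<tr>".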

    Any advice?

    -e

    --
    Evanda Remington
    evanda@wreck.org

  • Bengt Richter

    #2
    Re: re troubles

    On Thu, 18 Dec 2003 17:22:54 -0600, Evanda Remington <evanda@remingtons.org> wrote:

    >I'm trying to filter some rows of an html table out, based on their
    >contents. For input like:
    >"""
    ><table>
    > <tr>
    > <td>Lasers</td><td>17</td> </tr>
    > <tr> << want to filter
    > <td>kittens</td><td>8</td> << this out.
    > </tr> <<
    > <tr> <td>robots</td><td>8</td> </tr>
    ></table>
    >"""
    >I would like to completely remove the (3 line) table row that makes mention
    >of kittens. The regexp I have tried to use is: r"<tr>.*?kittens.*?</tr>".
    >When compiled and used with sub("", data), it strangely removes everything
    >from the first "<tr>" to the first "</tr>" after kittens.
    >
    >That is, the ".*?" notation works in the second half, but not in the first
    >half. It behaves the same as ".*" should.
    >
    >Any advice?
    >
    See if this works for you. I added more kitten and robot rows so the example
    covers repeated matches; if there were only a single instance, it could be done
    differently. I used 'XXX' rather than '' as the replacement to make the result clearer.

    ====< evanda.py >====================
    import re
    s = """\
    <table>
    <tr>
    <td>Lasers</td><td>17</td> </tr>
    <tr> << want to filter
    <td>kittens</td><td>8</td> << this out.
    </tr> <<
    <tr> <td>robots</td><td>8</td> </tr>
    <tr> << want to filter
    <td>more kittens</td><td>8</td> << this out.
    </tr> <<
    <tr> <td>more robots</td><td>8</td> </tr>
    </table>
    """
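    # (?ms) turns on MULTILINE and DOTALL. The alternation after <tr> consumes
    # text lazily but can never step over another "<tr>", so a match has to
    # start at the <tr> that opens a kittens row.
    rxo = re.compile(r"(?ms)<tr>(?:[^<]|<[^t]|<t[^r]|<tr[^>])*?kittens.*?</tr>")
    print('==== before ====\n%s==== after sub XXX ====\n%s====' % (s, rxo.sub('XXX', s)))
    =====================================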
    Result:

    [19:02] C:\pywk\clp>evanda.py
    ==== before ====
    <table>
    <tr>
    <td>Lasers</td><td>17</td> </tr>
    <tr> << want to filter
    <td>kittens</td><td>8</td> << this out.
    </tr> <<
    <tr> <td>robots</td><td>8</td> </tr>
    <tr> << want to filter
    <td>more kittens</td><td>8</td> << this out.
    </tr> <<
    <tr> <td>more robots</td><td>8</td> </tr>
    </table>
    ==== after sub XXX ====
    <table>
    <tr>
    <td>Lasers</td><td>17</td> </tr>
    XXX <<
    <tr> <td>robots</td><td>8</td> </tr>
    XXX <<
    <tr> <td>more robots</td><td>8</td> </tr>
    </table>
    ====
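
    The same idea (never let the first half of the pattern cross another
    "<tr>") can also be sketched with a negative lookahead, which some find
    easier to read than the character-class alternation. This is an
    alternative spelling, not the pattern posted above; it is meant to
    behave the same way on this sample. The names `s`, `rx` and `cleaned`
    are illustrative, and the sketch assumes plain "<tr>" tags with no
    attributes and no nested tables.

```python
import re

# Sample with two kitten rows, without the "<<" annotations.
s = """\
<table>
<tr>
<td>Lasers</td><td>17</td> </tr>
<tr>
<td>kittens</td><td>8</td>
</tr>
<tr> <td>robots</td><td>8</td> </tr>
<tr>
<td>more kittens</td><td>8</td>
</tr>
<tr> <td>more robots</td><td>8</td> </tr>
</table>
"""

# (?s)              "." also matches newlines (DOTALL)
# (?:(?!<tr>).)*?   lazily take one character at a time, but only while
#                   the text at that position is not the start of "<tr>"
# kittens.*?</tr>   the keyword, then up to the nearest closing tag
rx = re.compile(r"(?s)<tr>(?:(?!<tr>).)*?kittens.*?</tr>")

cleaned = rx.sub("", s)
print(cleaned)
```

    Because the first half can no longer run past a later "<tr>", an
    attempt that starts at the Lasers row fails outright, and the search
    moves on to the "<tr>" that really opens a kittens row. For anything
    beyond simple, regular markup like this, an HTML parser is likely to be
    more robust than a regexp.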

    Regards,
    Bengt Richter
