regex help

  • Paul

    regex help

    Hi,

    I am trying to find a specific word in several HTML pages that is _not_
    italic, meaning it is not enclosed by an <i> tag at any nesting level.

    e.g.

    match:
    <b><i>word</i></b>
    <i><b><span>wor d</span></b></i>
    <i>word</i>

    non-match:
    word
    <b>word</b>

    I don't really know how to meet this condition, so your help would be
    much appreciated.

    Thank you,
    Paul
  • Jürgen Exner

    #2
    Re: regex help

    Paul wrote:
    > I am trying to find a specific word in several HTML pages that is
    > _not_ italic, meaning it is not enclosed by an <i> tag at any
    > nesting level.
    >
    > e.g.
    >
    > match:
    > <b><i>word</i></b>
    > <i><b><span>wor d</span></b></i>
    > <i>word</i>
    >
    > non-match:
    > word
    > <b>word</b>
    >
    > I don't really know how to meet this condition, so your help would be
    > much appreciated.

    Congratulations. You just rediscovered that regular expressions are not the
    right tool for parsing HTML.
    Solution: Use a proper HTML parser like HTML::Parser or one of its cousins
    and define callback functions for the <i> start and end tags, so you always
    know whether the text you are looking at is inside italics.

    jue
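
A minimal sketch of the callback approach suggested above. It uses Python's standard-library html.parser as a stand-in for Perl's HTML::Parser, so it illustrates the idea rather than that module's API. The names ItalicAwareFinder and find_word are hypothetical. The parser counts how many <i> elements are currently open and records, for each whole-word occurrence, whether it sits inside italics. It assumes reasonably well-formed markup and ignores italics produced by <em> or CSS.

```python
import re
from html.parser import HTMLParser


class ItalicAwareFinder(HTMLParser):
    """Record each occurrence of a word and whether it is inside <i>."""

    def __init__(self, word):
        super().__init__(convert_charrefs=True)
        # Match the word only as a whole word, not as part of a longer one.
        self.pattern = re.compile(rf"\b{re.escape(word)}\b")
        self.italic_depth = 0  # > 0 while at least one <i> is open
        self.hits = []         # one bool per occurrence: True if italic

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <I> is handled as well.
        if tag == "i":
            self.italic_depth += 1

    def handle_endtag(self, tag):
        if tag == "i" and self.italic_depth > 0:
            self.italic_depth -= 1

    def handle_data(self, data):
        # Called for text between tags; other tags such as <b> or <span>
        # do not change the italic state.
        for _ in self.pattern.finditer(data):
            self.hits.append(self.italic_depth > 0)


def find_word(html, word):
    """Return a list of booleans, one per occurrence, True if italic."""
    parser = ItalicAwareFinder(word)
    parser.feed(html)
    parser.close()  # flush any trailing text that follows the last tag
    return parser.hits


if __name__ == "__main__":
    for sample in ("<b><i>word</i></b>", "<i>word</i>", "word", "<b>word</b>"):
        print(sample, find_word(sample, "word"))
```

Because each occurrence carries its own flag, the same pass answers both readings of the question: keep the False entries for non-italic occurrences, or the True entries for italic ones.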

