Regular expression for <p>, <ul> and <ol> tags

  • Shahid

    Regular expression for <p>, <ul> and <ol> tags

    Hi,
    I am parsing an HTML file that contains the following example code:
    <div>
    <p class="html_preformatted" awml:style="HTML Preformatted" dir="ltr"
    style="text-align:left"><span style="font-size:12pt;font-family:'Arial'"
    xml:lang="en-US" lang="en-US">Normal Text Arial 12 Black before
    bullets.</span></p>
    <ul>
    <li class="html_preformatted" dir="ltr" style="text-align:left">&nbsp;<span
    style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
    lang="en-US">Bullet1: If you want to convert bitmap images Single
    Line.</span></li>

    <li class="html_preformatted" dir="ltr" style="text-align:left">&nbsp;<span
    style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
    lang="en-US">Bullet2: D you want to convert </span><span
    style="font-weight:bold;font-size:13pt;font-family:'Times New Roman';color:#ff0000"
    xml:lang="en-US" lang="en-US">Times New Roman Bold Red 13</span><span
    style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
    lang="en-US"> like BMP, JPG?</span></li>
    <li class="html_preformatted" dir="ltr" style="text-align:left">&nbsp;<span
    style="font-weight:bold;font-size:12pt;font-family:'Arial'"
    xml:lang="en-US" lang="en-US">Bullet3 bold:</span><span
    style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
    lang="en-US"> If you want to convert bitmap images like BMP, JPG</span></li>
    <li class="html_preformatted" dir="ltr" style="text-align:left">&nbsp;<span
    style="font-weight:bold;font-size:14pt;font-family:'Arial'"
    xml:lang="en-US" lang="en-US">Bullet4 bold 14: </span><span
    style="font-size:14pt;font-family:'Arial'" xml:lang="en-US"
    lang="en-US">If you want to convert bitmap images like BMP, JPG 2
    lines.</span></li>
    <li class="html_preformatted" dir="ltr" style="text-align:left">&nbsp;<span
    style="font-weight:bold;font-size:16pt;font-family:'Arial'; color:#ff0000"
    xml:lang="en-US" lang="en-US">Bullet4 bold 14 all Red: </span><span
    style="font-size:16pt;font-family:'Arial'; color:#ff0000"
    xml:lang="en-US" lang="en-US">If you want to convert bitmap images like
    BMP, JPG.</span></li>

    <li class="html_preformatted" dir="ltr" style="text-align:left">&nbsp;<span
    style="font-weight:bold;font-size:14pt;font-family:'Arial'"
    xml:lang="en-US" lang="en-US">Bullet4 bold 14 Black: </span><span
    style="font-size:14pt;font-family:'Arial'; color:#0000ff"
    xml:lang="en-US" lang="en-US">Blue If you want to convert bitmap. </span><span
    style="font-size:16pt;font-family:'Arial'; color:#008000"
    xml:lang="en-US" lang="en-US">Green 16 images like BMP, JPG.</span>
    </li>
    </ul>
    <p class="html_preformatted" awml:style="HTML Preformatted" dir="ltr"
    style="text-align:left"><span
    style="font-size:14pt;font-family:'Arial'; color:#ff0000"
    xml:lang="en-US" lang="en-US">Normal Text Red Arial 14 after
    bullets.</span></p>
    <p class="html_preformatted" awml:style="HTML Preformatted" dir="ltr"
    style="text-align:left;margin-left:0.2500in"> <span
    style="font-weight:bold;font-size:14pt;font-family:'Arial'"
    xml:lang="en-US" lang="en-US">&nbsp;</span></p>
    <p dir="ltr" style="text-align:left"></p>
    <p></p>
    </div>

    I am trying to match all the <p>, <ol> and <ul> tags but have not
    succeeded yet.
    I am trying the following regular expression (RE):
    "(<[pP][^>]*>(.*)</[pP]>)|(<[oO][lL][^>]+>(.*)</[oO][lL]>)|(<[uU][lL][^>]+>(.*)</[uU][lL]>)"

    I am using preg_match_all() in PHP.
    If anyone can help, I will be very grateful. I need a solution
    urgently.
  • Michael Fesser

    #2
    Re: Regular expression for <p>, <ul> and <ol> tags

    ..oO(Shahid)
    >I am parsing an HTML file that contains the following example code:
    ><div>
    ><p class="html_preformatted" awml:style="HTML Preformatted"
    >dir="ltr" style="text-align:left"><span
    >style="font-size:12pt;font-family:'Arial'"
    >xml:lang="en-US" lang="en-US">Normal Text Arial 12
    >Black before bullets.</span></p>
    ><ul>
    >[...]
    >
    >I am trying to match all the <p>, <ol> and <ul> tags but have not
    >succeeded yet.
    >I am trying the following regular expression (RE):
    >"(<[pP][^>]*>(.*)</[pP]>)|(<[oO][lL][^>]+>(.*)</[oO][lL]>)|(<[uU][lL][^>]+>(.*)</[uU][lL]>)"
    >
    >I am using preg_match_all() in PHP.
    >If anyone can help, I will be very grateful. I need a solution
    >urgently.
    Why don't you use the DOM with an XPath expression?

    Micha
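
A minimal sketch of the DOM plus XPath approach suggested above, assuming PHP's bundled DOM extension is available. The `$html` sample is a shortened, hypothetical stand-in for the markup in the question, and the variable names are illustrative rather than taken from the thread.

```php
<?php
// Hypothetical sample input; a real script would read the file instead,
// for example with file_get_contents().
$html = <<<HTML
<div>
<p class="a"><span>Normal text before bullets.</span></p>
<ul>
<li><span>Bullet1</span></li>
<li><span>Bullet2</span></li>
</ul>
<ol><li>One</li></ol>
<p><span>Normal text after bullets.</span></p>
</div>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world markup often triggers them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Union of all <p>, <ul> and <ol> elements, returned in document order.
$nodes = $xpath->query('//p | //ul | //ol');

$found = [];
foreach ($nodes as $node) {
    // saveHTML($node) serializes the element together with its own tags.
    $found[] = [$node->nodeName, $doc->saveHTML($node)];
}

foreach ($found as [$name, $markup]) {
    echo $name, ': ', $markup, "\n";
}
```

Because the parser builds a tree, nested lists and tags that span several lines need no special handling here, which is the main reason to prefer this over a pattern.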


    • Curtis

      #3
      Re: Regular expression for <p>, <ul> and <ol> tags

      Shahid wrote:
      > Hi,
      > I am parsing an HTML file that contains the following example code:
      [snip]
      >
      > I am trying to match all the <p>, <ol> and <ul> tags but have not
      > succeeded yet.
      > I am trying the following regular expression (RE):
      > "(<[pP][^>]*>(.*)</[pP]>)|(<[oO][lL][^>]+>(.*)</[oO][lL]>)|(<[uU][lL][^>]+>(.*)</[uU][lL]>)"
      >
      > I am using preg_match_all() in PHP.
      > If anyone can help, I will be very grateful. I need a solution
      > urgently.
      Have you bothered checking php.net's docs? Their page for
      preg_match_all has an example regex doing what you want.

      --
      Curtis
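
For comparison, a regex-only sketch in the spirit of that suggestion. It is not a copy of the manual's example; the pattern and the shortened `$html` sample are assumptions for illustration. Compared with the expression in the question, it adds explicit delimiters (which the preg functions require), uses a lazy `.*?` with the `s` modifier so the body can span lines, and uses `[^>]*` so a bare `<ul>` with no attributes also matches.

```php
<?php
// Hypothetical sample input, shortened from the markup in the question.
$html = <<<HTML
<div>
<p class="a"><span>Normal text
before bullets.</span></p>
<ul>
<li><span>Bullet1</span></li>
</ul>
<ol><li>One</li></ol>
<p><span>After bullets.</span></p>
</div>
HTML;

// '~' is used as the delimiter so '/' needs no escaping.
// (p|ul|ol)\b  tag name; \b keeps <pre> from being taken for <p>
// [^>]*        optional attributes
// (.*?)        lazy body, ends at the first matching close tag
// </\1>        close tag must repeat the captured tag name
// i = case-insensitive, s = '.' also matches newlines
$pattern = '~<(p|ul|ol)\b[^>]*>(.*?)</\1>~is';

$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] = whole element, $m[1] = tag name, $m[2] = inner markup
    echo $m[1], ': ', trim($m[2]), "\n";
}
```

A lazy match like this stops at the first closing tag with the same name, so a list nested inside another list of the same type would be cut short. A DOM parser does not have that limitation.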
