Parsing HTML [solved using the re module]

  • Patrick C
    New Member
    • Apr 2007
    • 54

    Parsing HTML [solved using the re module]

    Hello hello, I'm very much a beginner. I've done one task successfully (with help), and now I want to deviate just a little and I'm stumped. Here's what I've done...

    In a previous task I needed to get a specific number out of this source code:
    <TD HEIGHT="24" CLASS="bubblemiddle" ALIGN="right" id="homeindexvolume" name="homeindexvolume">2,017,798,400</TD>

    so I used:
    re.compile('<TD>.*name="homeindexvolume">(.*?)</TD>',re.M|re.DOTALL)

    Now, from a different piece of source code, I need a specific number from a line that contains a lot more.
    Here's the source code:

    <tr><td bgcolor="EEEEEE"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000"><b>Total</b></font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,508,577,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">51,073,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,966,371,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">2,125,754,373</font></td></tr>

    Now all I want is 1,508,577,000.

    How would I grab just that number?

    How about if I wanted a different number in there, say 51,073,000?

    Thanks
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    This will extract the numbers from the string:
    [code=Python]
    import re

    s = '<tr><td bgcolor="EEEEEE"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000"><b>Total</b></font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,508,577,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">51,073,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">1,966,371,000</font></td><td bgcolor="EEEEEE" align="right"><FONT FACE="Arial,Helvetica,sans-serif" SIZE="2" COLOR="#000000">2,125,754,373</font></td></tr>'

    # Capture any run of digits and commas that sits directly between '>' and '<'
    patt = r'>([0-9,]+)<'
    dataList = re.findall(patt, s)
    print(dataList)

    '''
    >>> ['1,508,577,000', '51,073,000', '1,966,371,000', '2,125,754,373']
    '''
    [/code]
    Use the list index to get individual items:
    Code:
    >>> number = dataList[0]
    >>> number
    '1,508,577,000'
    >>>
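
For the second number (51,073,000), the same list can be indexed with 1. If you also need the values as integers, something like the sketch below might help. Note that `extract_numbers` is a hypothetical helper name, the `row` string is an abbreviated copy of the row from the original post (the FONT attributes are trimmed for readability), and the pattern is tightened slightly to require a leading digit so that a lone comma cannot match.

```python
import re

# Abbreviated copy of the table row from the post (FONT attributes trimmed).
row = (
    '<tr><td bgcolor="EEEEEE"><FONT SIZE="2"><b>Total</b></font></td>'
    '<td bgcolor="EEEEEE" align="right"><FONT SIZE="2">1,508,577,000</font></td>'
    '<td bgcolor="EEEEEE" align="right"><FONT SIZE="2">51,073,000</font></td>'
    '<td bgcolor="EEEEEE" align="right"><FONT SIZE="2">1,966,371,000</font></td>'
    '<td bgcolor="EEEEEE" align="right"><FONT SIZE="2">2,125,754,373</font></td></tr>'
)


def extract_numbers(html):
    """Return each comma-grouped number found between '>' and '<' as an int.

    Hypothetical helper: it assumes every number sits directly between
    a closing '>' and an opening '<', as in the row above.
    """
    matches = re.findall(r'>([0-9][0-9,]*)<', html)
    # Strip the thousands separators before converting to int.
    return [int(text.replace(',', '')) for text in matches]


numbers = extract_numbers(row)
print(numbers[0])  # first number in the row
print(numbers[1])  # second number in the row
```

This only works while the numbers are the only text made up purely of digits and commas between tags; if the page layout changes, the pattern would need adjusting.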
