I never know how to express "the entire word" (rather than a single
character) in a regexp, and I ran into this problem again while using Python.
For example, if I want to match the "string" in "test a string",
re.findall(r"[^a]* (\w+)","test a string") does find it, but what if
the word is not "a" but "an" ("test an string")? [^an] will fail,
because it is a character class and still stops at the first character "a".
I guess people don't usually filter words this way?
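For reference, a character class such as [^an] only ever excludes single characters ("a" or "n"), never the word "an". Below is a minimal sketch of two common alternatives, assuming the goal is either to capture the word that follows a known word or to skip one whole word. The helper names word_after and words_except are made up for illustration and are not part of the re module.

```python
import re

# [^an] is a set of single characters: it excludes "a" and "n", not the word "an".

def word_after(marker, text):
    """Return every word that directly follows the whole word `marker`."""
    # \b anchors the marker at word boundaries, so "an" does not match inside "pan".
    return re.findall(r"\b" + re.escape(marker) + r"\s+(\w+)", text)

def words_except(excluded, text):
    """Return all words in `text` except the whole word `excluded`."""
    # (?!...) is a negative lookahead: it rejects a position where the
    # excluded word starts, without consuming any characters.
    return re.findall(r"\b(?!" + re.escape(excluded) + r"\b)\w+", text)

print(word_after("a", "test a string"))
print(word_after("an", "test an string"))
print(words_except("an", "test an string"))
```

The lookahead version is the closest thing to "[^an] for whole words": it checks a whole word at a given position instead of one character.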
Here is the real problem I ran into:
I want to extract both the text in the "<td>" blocks and the "<span>"'s
title attribute.
###################### code ######################
import re

# Sample row. Adjacent string literals are joined, so the line wrapping
# does not put stray newlines or spaces into the HTML.
content = ('<tr align="center" valign="middle" class="CellCss" >'
           '<td valign="middle" >LA</td>'
           '<td valign="middle" >11/10/2008</td>'
           '<td valign="middle" >1340/1430</td>'
           '<td valign="middle" >PF1/5</td>'
           '<td valign="middle" ><span title="Understanding the stock market" '
           'class="MouseCursor">Understand....</span></td>'
           '<td title="Charisma" valign="middle" >Charisma</td>'
           '<td valign="middle" >Booked</td>'
           '<td valign="middle" >')

# Four plain <td> cells, then the title attribute of the <span>.
pattern = ('<td valign="middle" >([^<]+)</td>' * 4
           + '<td valign="middle" ><span title="([^"]*)"')
print(re.findall(pattern, content))
#################### code end ####################
As you can see above,
I get the result "LA, 11/10/2008, 1340/1430, PF1/5, Understanding
the stock market".
There are two "title" attributes (one on the "<span>" and one on the
following "<td>"), but I can only get the first one with this regexp.
For the second, which should be "Charisma", I need some kind of
[^</td>]* to match 'class="MouseCursor">Understand....</span></td>',
so that I can go on to match the second "title".
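Note that [^</td>]* is again a character class: it excludes the single characters "<", "/", "t", "d" and ">", not the string "</td>". A minimal sketch of one way to skip ahead instead, assuming a non-greedy ".*?" is acceptable here. The sample row is the one from the code above with the line-wrap whitespace removed, and the names cell, pattern, rows and titles are only illustrative.

```python
import re

# The sample row from the post, written as joined literals.
content = (
    '<tr align="center" valign="middle" class="CellCss" >'
    '<td valign="middle" >LA</td>'
    '<td valign="middle" >11/10/2008</td>'
    '<td valign="middle" >1340/1430</td>'
    '<td valign="middle" >PF1/5</td>'
    '<td valign="middle" ><span title="Understanding the stock market" '
    'class="MouseCursor">Understand....</span></td>'
    '<td title="Charisma" valign="middle" >Charisma</td>'
    '<td valign="middle" >Booked</td>'
    '<td valign="middle" >'
)

cell = r'<td valign="middle" >([^<]+)</td>'

# ".*?" is non-greedy: it consumes as little as possible, so it stops at
# the first "</td>" that lets the rest of the pattern match.
pattern = re.compile(
    cell * 4
    + r'<td valign="middle" ><span title="([^"]*)"'
    + r'.*?</td>'              # skip the rest of the <span> cell
    + r'<td title="([^"]*)"',  # title of the following cell
    re.DOTALL,                 # let "." cross newlines in wrapped HTML
)

rows = pattern.findall(content)
print(rows)

# If only the title attributes are needed, a much shorter pattern is enough.
titles = re.findall(r'title="([^"]*)"', content)
print(titles)
```

The trade-off is that ".*?" will happily run past cell boundaries if the expected "</td>" is missing, so this only suits rows that all share the same layout.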
Maybe I didn't describe this clearly; if so, feel free to tell me :)
Thanks for any replies!
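As an alternative to regexps for this kind of extraction, here is a rough sketch using the standard library's HTML parser. It assumes Python 3 (html.parser) and that collecting every "<td>" text and every title attribute is what is wanted. The CellCollector class is a hypothetical helper written for this example, and the input is a shortened version of the sample row.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> and every title attribute (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []      # text found inside <td> elements
        self.titles = []     # values of any title="..." attribute
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")
        # attrs is a list of (name, value) pairs
        title = dict(attrs).get("title")
        if title is not None:
            self.titles.append(title)

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_td:
            self.cells[-1] += data

html = ('<tr><td valign="middle" >PF1/5</td>'
        '<td valign="middle" ><span title="Understanding the stock market" '
        'class="MouseCursor">Understand....</span></td>'
        '<td title="Charisma" valign="middle" >Charisma</td></tr>')

collector = CellCollector()
collector.feed(html)
collector.close()
print(collector.cells)
print(collector.titles)
```

This does not depend on the exact attribute order or spacing in each tag, which is where the long regexp is most fragile.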