Having trouble with some lists in BeautifulSoup

  • Alexnb

    Having trouble with some lists in BeautifulSoup


    Okay, what I want to do with this code is go to thesaurus.reference.com,
    search for a word, and get the synonyms for it. Now, I can get the synonyms,
    but they are still in HTML form and some are hyperlinks, and I can't get the
    contents out. I am not that familiar with BeautifulSoup, so if anyone wants
    to look over this code (if you run it, it will make a lot more sense) and
    maybe help me out, I would appreciate it.

    Side note: if you run it, a list object will print, and what I am after is
    the part that starts:

    <td colspan="2" widht="100%">american...

    Here's the code:

    import urllib
    from BeautifulSoup import BeautifulSoup

    class defSyn:
        def __init__(self, word):
            self.word = word

            def get_syn(term):
                soup = BeautifulSoup(urllib.urlopen(
                    'http://thesaurus.reference.com/search?q=%s' % term))

                balls = soup.findAll('table', {'width': '100%'})
                print soup.prettify()

                for tabs in soup.findAll('table', {'width': '100%'}):
                    yield tabs.findAll('td', {'colspan': '2'})

            self.mainList = list(get_syn(self.word))
            print self.mainList[2]


    If you have any further questions, I would be happy to answer.
    --
    View this message in context: http://www.nabble.com/Having-trouble...p18497409.html
    Sent from the Python - python-list mailing list archive at Nabble.com.

  • John Nagle

    #2
    Re: Having trouble with some lists in BeautifulSoup

    Alexnb wrote:
    Okay, what I want to do with this code is go to thesaurus.reference.com,
    search for a word, and get the synonyms for it. Now, I can get the synonyms,
    but they are still in HTML form and some are hyperlinks, and I can't get the
    contents out. I am not that familiar with BeautifulSoup, so if anyone wants
    to look over this code (if you run it, it will make a lot more sense) and
    maybe help me out, I would appreciate it.

    The thesaurus site may become annoyed if you overdo this.

    However, it's not hard to do. Search the output for
    an "a" tag with class "noline", then extract the text content
    of the "a" tag. The BeautifulSoup manual will tell you how.
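
    A minimal sketch of that approach, under a few assumptions: it uses the
    newer bs4 package on Python 3 rather than the BeautifulSoup 3 import shown
    in the question, and it parses a made-up HTML snippet (SAMPLE_HTML) instead
    of fetching the live page, whose real markup may differ. The function name
    extract_synonyms is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the results page; the real markup may differ.
SAMPLE_HTML = """
<table width="100%">
  <tr>
    <td colspan="2" widht="100%">
      <a class="noline" href="/browse/national">national</a>,
      <a class="noline" href="/browse/native">native</a>,
      <a href="/help">help</a>
    </td>
  </tr>
</table>
"""


def extract_synonyms(html):
    """Return the text of every <a class="noline"> tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Select only the anchors whose class is "noline"; other links are skipped.
    links = soup.find_all("a", class_="noline")
    # get_text() drops the surrounding tag and keeps only the link text.
    return [link.get_text(strip=True) for link in links]


print(extract_synonyms(SAMPLE_HTML))
```

    With BeautifulSoup 3, as imported in the question, the roughly equivalent
    calls should be soup.findAll('a', {'class': 'noline'}) and link.string,
    but check the manual for the version you have installed.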

    If you want raw thesaurus data you can use freely, see
    "http://wordnet.princeton.edu".

    John Nagle
