BeautifulSoup

  • luca72

    BeautifulSoup

    Hello,
    I am trying to use BeautifulSoup. I have this:
    sito = urllib.urlopen('http://www.prova.com/')
    esamino = BeautifulSoup(sito)
    luca = esamino.findAll('tr', align='center')

    print luca[0]
    which prints:
    <tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th><td width="10%">44.4MB</td><td width="90%" align="left"><font color="orange"> Pc-prova.rar </font></td></tr>
    I need to get the following information:
    1) Only|G|BoT|05
    2) #1
    3) 44.4MB
    4) Pc-prova.rar
    With print luca[0].a.string I get #1.
    With print luca[0].td.string I get 44.4MB.
    Can you explain how to get the other two values?
    Thanks,
    Luca
  • Peter Pearson

    #2
    Re: BeautifulSoup

    On Wed, 29 Oct 2008 09:45:31 -0700 (PDT), luca72 <lucaberto@libero.it> wrote:
    > Hello,
    > I am trying to use BeautifulSoup. I have this:
    > sito = urllib.urlopen('http://www.prova.com/')
    > esamino = BeautifulSoup(sito)
    > luca = esamino.findAll('tr', align='center')
    >
    > print luca[0]
    >
    [The following long string has been wrapped.]
    > <tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');"
    > href="#">#1</a></th><td width="10%">44.4MB</td>
    > <td width="90%" align="left">
    > <font color="orange"> Pc-prova.rar </font></td></tr>
    >
    > I need to get the following information:
    > 1) Only|G|BoT|05
    > 2) #1
    > 3) 44.4MB
    > 4) Pc-prova.rar
    > With print luca[0].a.string I get #1.
    > With print luca[0].td.string I get 44.4MB.
    > Can you explain how to get the other two values?
    Like you, I struggle with BeautifulSoup; but perhaps this will help
    while waiting for somebody smarter to join the thread:
    >>> import BeautifulSoup
    >>> soup = BeautifulSoup.BeautifulSoup(
    ... """<tr align="center"><th width="5%">"""
    ... """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>"""
    ... """</th><td width="10%">44.4MB</td><td width="90%" align="left">"""
    ... """<font color="orange"> Pc-prova.rar </font></td></tr>""" )
    >>> tr = soup.findAll( 'tr' )
    >>> tr[0].findAll( text = True )
    [u'#1', u'44.4MB', u' Pc-prova.rar ']
    >>> c = tr[0].findChild( attrs={"onclick": True} )
    >>> print c[ "onclick" ]
    t('Only|G|BoT|05','#1');


    --
    To email me, substitute nowhere->spamcop, invalid->net.
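
For readers without BeautifulSoup at hand, the same two lookups (every text node in the row, plus the value of the onclick attribute) can be sketched with the standard library's html.parser. This is a swapped-in technique, not what the posts above use. It assumes Python 3, and the names ROW and RowScanner are hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser

# The table row from the original post, with the extraction damage repaired.
ROW = (
    '<tr align="center"><th width="5%">'
    """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th>"""
    '<td width="10%">44.4MB</td><td width="90%" align="left">'
    '<font color="orange"> Pc-prova.rar </font></td></tr>'
)


class RowScanner(HTMLParser):
    """Collect every text node and every onclick attribute value."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        for name, value in attrs:
            if name == "onclick":
                self.onclicks.append(value)

    def handle_data(self, data):
        # Called once for each run of text between tags.
        self.texts.append(data)


scanner = RowScanner()
scanner.feed(ROW)
scanner.close()

print(scanner.texts)     # ['#1', '44.4MB', ' Pc-prova.rar ']
print(scanner.onclicks)  # ["t('Only|G|BoT|05','#1');"]
```

The file name keeps its surrounding spaces, just as in the findAll( text = True ) output above; calling .strip() on it removes them.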


    • Stefan Behnel

      #3
      Re: BeautifulSoup

      Peter Pearson wrote:
      > Like you, I struggle with BeautifulSoup
      Well, there's always lxml.html if you need it.



      Stefan


      • Kay Schluehr

        #4
        Re: BeautifulSoup

        On 29 Oct., 17:45, luca72 <lucabe...@libero.it> wrote:
        > Hello,
        > I am trying to use BeautifulSoup. I have this:
        > sito = urllib.urlopen('http://www.prova.com/')
        > esamino = BeautifulSoup(sito)
        > luca = esamino.findAll('tr', align='center')
        >
        > print luca[0]
        >
        > <tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th><td width="10%">44.4MB</td><td width="90%" align="left"><font color="orange"> Pc-prova.rar </font></td></tr>
        >
        > I need to get the following information:
        > 1) Only|G|BoT|05
        > 2) #1
        > 3) 44.4MB
        > 4) Pc-prova.rar
        > With print luca[0].a.string I get #1.
        > With print luca[0].td.string I get 44.4MB.
        > Can you explain how to get the other two values?
        > Thanks,
        > Luca
        The same way you got `luca`, applied to the first row, luca[0]
        (findAll returns a list, so index it first):

        1, 2) luca[0].find("a")["onclick"].split("'"), then search through the
        resulting list
        3) luca[0].find("td").string
        4) luca[0].find("font").string
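
The split("'") step for items 1 and 2 can be tried on its own, without any HTML parsing. The sketch below assumes the onclick value is exactly the string shown earlier in the thread. The regular expression is an alternative added here for comparison, not something proposed in the posts, and the variable names are illustrative.

```python
import re

# The onclick attribute value as printed earlier in the thread.
onclick = "t('Only|G|BoT|05','#1');"

# Splitting on the single quote leaves the quoted strings at the odd indexes.
parts = onclick.split("'")
print(parts)   # ['t(', 'Only|G|BoT|05', ',', '#1', ');']

name, number = parts[1], parts[3]
print(name)    # Only|G|BoT|05
print(number)  # #1

# Alternative: capture whatever sits between each pair of single quotes.
quoted = re.findall(r"'([^']*)'", onclick)
print(quoted)  # ['Only|G|BoT|05', '#1']
```

Taking the odd indexes works only while the arguments themselves contain no single quotes; the regular expression has the same limitation.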



        • luca72

          #5
          Re: BeautifulSoup

          Hello,
          Another stupid question. Instead of using
          sito = urllib.urlopen('http://www.prova.com/')
          esamino = BeautifulSoup(sito)

          I do
          sito = urllib.urlopen('http://onlygame.helloweb.eu/')
          file_sito = open('sito.html', 'wb')
          for line in sito:
              file_sito.write(line)
          file_sito.close()

          How can I pass the file sito.html to BeautifulSoup?

          Regards,

          Luca


          • Kay Schluehr

            #6
            Re: BeautifulSoup

            On 30 Oct., 18:28, luca72 <lucabe...@libero.it> wrote:
            > Hello,
            > Another stupid question. Instead of using
            > sito = urllib.urlopen('http://www.prova.com/')
            > esamino = BeautifulSoup(sito)
            >
            > I do
            > sito = urllib.urlopen('http://onlygame.helloweb.eu/')
            > file_sito = open('sito.html', 'wb')
            > for line in sito:
            >     file_sito.write(line)
            > file_sito.close()
            >
            > How can I pass the file sito.html to BeautifulSoup?
            >
            > Regards,
            >
            > Luca
            download = urllib.urlopen("http://www.fiber-space.de/downloads/downloads.html")
            BeautifulSoup(download.read())

            Ciao
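
The reply above skips the saved file and hands the downloaded markup straight to BeautifulSoup. If the copy in sito.html is still wanted, one way is to write the bytes out and read them back, as in the sketch below. It assumes Python 3; the helper names save_page and load_page are hypothetical, and the page variable stands in for the result of sito.read() so that no network access is needed.

```python
import os
import tempfile


def save_page(data, path):
    """Write the downloaded bytes to disk in a single call."""
    with open(path, "wb") as handle:
        handle.write(data)


def load_page(path):
    """Read the saved page back as bytes."""
    with open(path, "rb") as handle:
        return handle.read()


# Stand-in for sito.read(); a real run would use the downloaded bytes.
page = b"<html><body><p>prova</p></body></html>"

path = os.path.join(tempfile.mkdtemp(), "sito.html")
save_page(page, path)
markup = load_page(path)

print(markup == page)  # True

# markup is what would be handed to the parser, e.g. BeautifulSoup(markup).
# The BeautifulSoup constructor also accepts an open file object directly.
```

Using with blocks closes the file even if the write fails, which the explicit file_sito.close() call in the question does not guarantee.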
