simple Question about using BeautifulSoup

  • Alexnb

    simple Question about using BeautifulSoup


    Okay, I have used BeautifulSoup a lot lately, but I am wondering, how do you
    open a local html file?

    Usually I do something like this for a url

    soup = BeautifulSoup(urllib.urlopen('http://www.website.com'))

    but the file extension doesn't work. So how do I open one?
    --
    View this message in context: http://www.nabble.com/simple-Questio...p19069980.html
    Sent from the Python - python-list mailing list archive at Nabble.com.

  • Grzegorz Staniak

    #2
    Re: simple Question about using BeautifulSoup

    On 2008-08-20, Alexnb <alexnbryan@gmail.com> wrote:
    > Okay, I have used BeautifulSoup a lot lately, but I am wondering, how do you
    > open a local html file?
    >
    > Usually I do something like this for a url
    >
    > soup = BeautifulSoup(urllib.urlopen('http://www.website.com'))
    >
    > but the file extension doesn't work. So how do I open one?

    Have you tried the local file URL, like "file:///home/user/file.html"?

    GS
    --
    Grzegorz Staniak <gstaniak _at_ wp [dot] pl>
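
For readers trying the file URL suggestion today, here is a minimal sketch. It assumes Python 3, where the thread's `urllib.urlopen` lives at `urllib.request.urlopen`; the helper name `read_local_html` and the temporary demo file are hypothetical, made up for illustration only.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen  # Python 3 location of urlopen (assumption: Python 3)


def read_local_html(path):
    """Open a local file through a file:// URL and return its raw bytes."""
    # Path.as_uri() needs an absolute path and yields e.g. file:///home/user/file.html
    url = Path(path).resolve().as_uri()
    with urlopen(url) as response:
        return response.read()


# Hypothetical demo file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "file.html"
    demo.write_text("<html><body><p>hello</p></body></html>", encoding="utf-8")
    html_bytes = read_local_html(demo)
```

The bytes returned here (or the response object itself, which is file-like) could then be handed to BeautifulSoup the same way as a response from a web URL.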


    • Diez B. Roggisch

      #3
      Re: simple Question about using BeautifulSoup

      Alexnb wrote:
      > Okay, I have used BeautifulSoup a lot lately, but I am wondering, how do
      > you open a local html file?
      >
      > Usually I do something like this for a url
      >
      > soup = BeautifulSoup(urllib.urlopen('http://www.website.com'))
      >
      > but the file extension doesn't work. So how do I open one?

      The docs for urllib.urlopen clearly state that it returns a file-like
      object. Which BS seems to grok.

      So... how about passing another file-like object, like... *drumroll* - a
      file?

      soup = BeautifulSoup(open("myfile.html"))

      Apart from the documented possibility to pass the html as string, which
      means


      soup = BeautifulSoup(open("myfile.html").read())

      will work as well.

      Diez
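
Both variants from this reply can be sketched in a form that also closes the file afterwards. This assumes the current `bs4` package (installed as `beautifulsoup4`) on Python 3 rather than the BeautifulSoup release the thread was using, and it names the `"html.parser"` backend explicitly. The helper names and the demo file are hypothetical; the import guard is only there so the sketch still loads where `bs4` is not installed.

```python
import tempfile
from pathlib import Path

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:  # lets the sketch load without bs4; parsing is then skipped
    BeautifulSoup = None


def soup_from_file(path):
    """Parse a local HTML file by handing BeautifulSoup an open file object."""
    with open(path, encoding="utf-8") as handle:
        return BeautifulSoup(handle, "html.parser")


def soup_from_string(path):
    """Parse the same file by reading it into a string first."""
    with open(path, encoding="utf-8") as handle:
        return BeautifulSoup(handle.read(), "html.parser")


# Hypothetical demo file standing in for "myfile.html".
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "myfile.html"
    demo.write_text("<html><head><title>demo</title></head></html>", encoding="utf-8")
    if BeautifulSoup is not None:
        titles = (soup_from_file(demo).title.string,
                  soup_from_string(demo).title.string)
    else:
        titles = None
```

Using `with` is a design choice beyond what the reply shows: a bare `open("myfile.html")` works too, but leaves closing the file to the garbage collector.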
