Help working Beautifulsoup into Python script

  • Johannes Height

    Hello, I have a Python script that uses Beautiful Soup, but when I run it, Beautiful Soup fails with an error. Could someone explain what I did wrong?

    Picture of error message: http://img440.imageshack.us/img440/3857/c2ebf7b323964ad597f4b23.png
    Last edited by Niheel; Oct 8 '10, 03:23 AM.
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    According to the error message, there is an invalid tag on line 2645 of the HTML you are trying to parse. I have never used Beautiful Soup, but according to its documentation you may be able to fix the HTML before the document is parsed by passing the constructor a markupMassage argument. See the Beautiful Soup documentation for details.
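As we understand it, Beautiful Soup 3's `markupMassage` argument takes a list of (compiled regex, replacement function) pairs that are applied to the raw markup before parsing. The sketch below reproduces that pre-parse step in plain Python, so it runs without any particular Beautiful Soup version installed. The `massage` helper and its single fix-up rule (repairing a comment opener written as `<!-` instead of `<!--`) are hypothetical examples; they are not the fix for line 2645, whose contents we cannot see.

```python
import re

# Hypothetical fix-up rules, in the (compiled regex, replacement function)
# shape that Beautiful Soup 3's markupMassage argument expects.
FIXUPS = [
    # Repair a comment opener written as "<!-" instead of "<!--".
    # A correct "<!--" is left alone because [^-] refuses the second dash.
    (re.compile(r"<!-([^-])"), lambda m: "<!--" + m.group(1)),
]


def massage(markup, fixups=FIXUPS):
    """Apply each regex fix-up to the raw markup, in order, before parsing."""
    for pattern, repl in fixups:
        markup = pattern.sub(repl, markup)
    return markup


bad = "<p>Hi</p><!-This comment is malformed.--><p>Bye</p>"
print(massage(bad))  # → <p>Hi</p><!--This comment is malformed.--><p>Bye</p>
```

With Beautiful Soup 3 the same list could be passed as `BeautifulSoup(markup, markupMassage=FIXUPS)`. Beautiful Soup 4 dropped `markupMassage`, so there the equivalent is to run the string through a helper like `massage()` yourself before calling the constructor.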

    • leegao
      New Member
      • Mar 2010
      • 3

      #3
      A common JavaScript pattern is to insert elements directly into the DOM. As a result, you will encounter many pages where an "improperly" coded script element (that is, one written without CDATA, a rare habit and one that I'm completely against) will cause the parser to grind to a screeching halt. The fix is simple: apply the following filter to your source string before parsing it:

      Code:
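      import re
      # Match whole <script ...>...</script> elements, in any letter case and across newlines;
      # the lazy .*? also matches empty elements such as <script src="x.js"></script>
      re_script = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
      out = re_script.sub("", source)  # source: the raw HTML string you are about to parse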
      This removes every script element, tags and contents, from the source string, so the parser never sees the inline JavaScript.
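A self-contained sketch of the same script-stripping filter, wrapped in a hypothetical `strip_scripts` helper and run on a made-up sample page. It assumes script tags may appear in upper or lower case, span several lines, or be empty (`<script src=...></script>`), which is why it uses `re.IGNORECASE | re.DOTALL` and a lazy `.*?` body. The cleaned string is what you would then hand to Beautiful Soup.

```python
import re

# Whole <script ...>...</script> elements, any case, spanning newlines.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Remove every script element (tags and contents) from an HTML string."""
    return SCRIPT_RE.sub("", html)


# Made-up sample: an upper-case inline script that writes markup,
# plus an empty external-script element.
page = """<html><body>
<p>Before</p>
<SCRIPT type="text/javascript">
document.write("<div class='ad'>");
</SCRIPT>
<script src="app.js"></script>
<p>After</p>
</body></html>"""

cleaned = strip_scripts(page)
print(cleaned)
```

Regex stripping is a blunt tool: a `</script>` string inside the script's own JavaScript would end the match early. For the pages in this thread, though, it is usually enough to get the parser past the inline scripts.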
