Python and Unicode

  • Suresh Iyengar
    New Member
    • May 2007
    • 3

    Python and Unicode

    Hello,

    I want to fetch a web page and parse the links in it. I am using the following code:
    Code:
    import urllib

    file = urllib.urlopen("file:///home/suresh/html_parser/Category:Sports.html")
    content = file.read()
    # Process the page.
    But since the page contains UTF-8 content, it's not able to parse it properly. However, if I save the page locally, it is able to parse it. How can I handle this problem?

    Thanks
    Last edited by numberwhun; Oct 27 '08, 05:24 PM. Reason: Please use code tags
  • YarrOfDoom
    Recognized Expert Top Contributor
    • Aug 2007
    • 1243

    #2
    If I understand your problem, this should help:
    Originally posted by Python Docs: Tutorial
    If you have data in a specific encoding and want to produce a corresponding Unicode string from it, you can use the unicode() function with the encoding name as the second argument.

    >>> unicode('\xc3\xa4\xc3\xb6\xc3\xbc', 'utf-8')
    u'\xe4\xf6\xfc'
    If this is not what you're looking for, you can probably find your answer here.
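
Applied to the code in the question, the same idea would be to decode the bytes returned by read() before handing them to the parser. The following is only a minimal sketch, assuming the page really is UTF-8. The helper name decode_page and the sample bytes are made up for illustration. It uses the .decode() method, which in Python 2 does the same job as unicode(data, 'utf-8') and also works on bytes in Python 3.

```python
# -*- coding: utf-8 -*-

def decode_page(raw_bytes, encoding='utf-8'):
    """Turn the raw bytes from read() into a Unicode string.

    Assumption: the page is encoded as `encoding` (UTF-8 here).
    Invalid byte sequences are replaced instead of raising an error.
    """
    return raw_bytes.decode(encoding, 'replace')


# Stand-in for `content = file.read()`: the same UTF-8 bytes
# used in the tutorial example quoted above.
content = b'\xc3\xa4\xc3\xb6\xc3\xbc'
text = decode_page(content)
```

After this step, text is a Unicode string, so the link parsing can work on characters rather than raw bytes. If the page uses a different encoding, the encoding argument would need to change accordingly.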
