"UnicodeError: UTF-16 stream does not start with BOM"

  • aberry
    New Member
    • Sep 2008
    • 10

    "UnicodeError: UTF-16 stream does not start with BOM"

    I have a text file that contains Unicode data (say inp.txt).
    I read the file using the following code:

    Code:
    import codecs
    infile = codecs.open('C:\\tdata\\inp.txt','r','utf-16',errors='ignore')
    data = infile.readlines()
    When I run the above code, it throws the following error:
    Code:
    "Traceback (most recent call last):
      File "C:\script\hypen\hyp.py", line 34, in ?
        data = infile.readlines()
      File "C:\Python24\lib\codecs.py", line 489, in readlines
        return self.reader.readlines(sizehint)
      File "C:\Python24\lib\codecs.py", line 404, in readlines
        data = self.read()
      File "C:\Python24\lib\codecs.py", line 293, in read
        newchars, decodedbytes = self.decode(data, self.errors)
      File "C:\Python24\lib\encodings\utf_16.py", line 49, in decode
        raise UnicodeError,"UTF-16 stream does not start with BOM"
    UnicodeError: UTF-16 stream does not start with BOM"
    But if I create a new file (I did this in Notepad on Win XP), copy and paste the contents of 'inp.txt' into it, and save it as a text file (choosing the Unicode encoding, which is the same as that of inp.txt), the same code reads the new file absolutely fine. This seems weird... does a Notepad-created file get some magic characters of its own added? :)

    Can anyone help me with this? What could be the issue here? Why does creating a new file and saving the contents in it work fine, while the original file still throws the error? (I have received 15 such localized files from clients on which some processing has to be done, and I want to avoid manual copy/paste rework.) Any help appreciated...


    Thanks,
    anil
    Last edited by bvdet; Jan 6 '09, 03:59 PM. Reason: Fixed code tag
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    I found the information on this link helpful. Since you know your encoding is "UTF-16", you may be able to use the string method decode() to read your data. Notepad adds the BOM based on the encoding selected, which is why the copy you re-saved from Notepad reads without the error.
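
    As a rough sketch of that idea (written for Python 3, not the Python 2.4 shown in the traceback), one could read the raw bytes, check for a BOM, and fall back to an explicit byte order when none is present. The helper name read_utf16_lines and the little-endian fallback are assumptions for illustration; the actual byte order of the client files would need to be confirmed.

```python
import codecs


def read_utf16_lines(path, fallback="utf-16-le"):
    """Return the lines of a UTF-16 text file, with or without a BOM.

    `fallback` is an assumed byte order used only when no BOM is found.
    """
    with open(path, "rb") as f:
        raw = f.read()

    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # BOM present: the generic "utf-16" codec strips it and
        # uses it to choose the byte order.
        text = raw.decode("utf-16")
    else:
        # No BOM: name the byte order explicitly (an assumption here).
        text = raw.decode(fallback)

    # keepends=True mirrors readlines(), which keeps line terminators.
    return text.splitlines(True)
```

    If the decoded text comes out as unreadable characters, the file is probably big-endian and passing fallback="utf-16-be" may be worth trying.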
