I have a CSV file created by Visual Basic and encoded as UTF-8. If I open the file in vi or emacs, I see the byte order mark (BOM), <feff>.
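For reference, the <feff> that vi shows is the code point U+FEFF. UTF-8 stores it as the three bytes EF BB BF, while a UTF-16 file starts with FF FE or FE FF. Below is a minimal Python 3 sketch of telling these apart by looking at the first bytes of a file read in binary mode. `sniff_bom` and `BOMS` are hypothetical names used for illustration. The `codecs.BOM_*` constants are part of the standard library.

```python
import codecs

# Known byte order marks, checked longest first so that a UTF-32-LE mark
# (FF FE 00 00) is not mistaken for a UTF-16-LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name for the BOM at the start of data, or None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None


# U+FEFF (the <feff> shown by vi) is stored in UTF-8 as EF BB BF.
print("\ufeff".encode("utf-8"))           # b'\xef\xbb\xbf'
print(sniff_bom(b"\xef\xbb\xbfa,b\r\n"))  # utf-8-sig
print(sniff_bom(b"\xff\xfea\x00"))        # utf-16
```

The UTF-8 mark EF BB BF does not match either UTF-16 mark. A UTF-16 reader therefore does not see a BOM at the start of such a file, even though an editor displays the same <feff> character.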
So now when I read the file:

```python
import codecs

f = open('myfile')
test = f.readline()
# Decode the raw bytes of the first line as UTF-8
print test.decode('utf-8')
```

It prints a control character (u'\ufeff', stored in the file as the bytes \xef\xbb\xbf) as its first character. Shouldn't the decode strip this? I also tried the following, to see what would happen and to try to auto-detect the encoding:

```python
import codecs

# Try each candidate encoding in turn
for encoding in ['utf-8', 'utf-16']:
    try:
        f = codecs.open('myfile', encoding=encoding)
        test = f.readline()
        # Show the decoded line with repr() so a leading u'\ufeff' is visible
        print repr(test)
    except Exception, exc:
        f = None
        print exc
```

The UTF-16 result is odd: it reports "UTF-16 stream does not start with BOM", even though the first character is the BOM. For UTF-8 there are no errors, but it prints the control character (u'\ufeff').

Any ideas what is going on here? Could the file be badly encoded?
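For comparison, here is a minimal Python 3 sketch of how the standard `utf-8` and `utf-8-sig` codecs treat a BOM-prefixed CSV. The sample data and the temporary file are made up for illustration and do not come from the question. Plain `utf-8` decodes the BOM into the character U+FEFF and keeps it in the first field. `utf-8-sig` drops a leading BOM if there is one and otherwise behaves like `utf-8`.

```python
import csv
import os
import tempfile

# Hypothetical sample: a small CSV that starts with a UTF-8 BOM,
# similar to the file described in the question.
raw = b"\xef\xbb\xbfname,qty\r\nwidget,3\r\n"
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(raw)

# Plain 'utf-8' turns the BOM into U+FEFF and leaves it in the first field.
with open(path, encoding="utf-8", newline="") as f:
    plain_header = next(csv.reader(f))

# 'utf-8-sig' skips the leading BOM before handing text to the csv reader.
with open(path, encoding="utf-8-sig", newline="") as f:
    sig_header = next(csv.reader(f))

os.remove(path)

print(plain_header)  # ['\ufeffname', 'qty']
print(sig_header)    # ['name', 'qty']
```

Because `utf-8-sig` only removes a BOM when one is actually present, it is a safe choice for CSV files that may or may not have been written with one.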