Dotan Cohen wrote:
I believe that particular find/replace should be safe even if other
bytes represent encoded Unicode: in UTF-8, the newline and space bytes
never occur inside a multi-byte character.
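In Python 3, if you want to do more, and if you open in text mode, the
bytes will automatically be decoded. The default encoding depends on the
platform's locale (often UTF-8), I believe, so passing encoding='utf-8'
explicitly is the safe choice.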
Your sample said UTF-8, so that would be the right thing.
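
To make the byte-versus-text point concrete, here is a small Python 3 sketch. The sample data and the helper names (unfold_bytes, unfold_text) are invented for illustration and are not from the original files. It only shows that, for UTF-8 input, replacing at the byte level and replacing after decoding should give the same result.

```python
# Hypothetical sample: a folded line whose continuation starts with a
# non-ASCII character, plus a second line with an accented letter.
SAMPLE = "FN:Jos\u00e9 Garc\n \u00eda\nNOTE:caf\u00e9\n"


def unfold_bytes(data: bytes) -> bytes:
    """Remove every newline-space pair, working on raw bytes."""
    return data.replace(b"\n ", b"")


def unfold_text(text: str) -> str:
    """Remove every newline-space pair, working on decoded text."""
    return text.replace("\n ", "")


raw = SAMPLE.encode("utf-8")

# Route 1: replace on the bytes, decode afterwards.
via_bytes = unfold_bytes(raw).decode("utf-8")

# Route 2: decode first (what text mode does on read), then replace.
via_text = unfold_text(raw.decode("utf-8"))

print(via_bytes == via_text)
```

The two routes agree here because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the bytes for newline (0x0A) and space (0x20) cannot appear inside one. That guarantee does not carry over to encodings such as UTF-16.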
2008/10/14 <skip@pobox.com>:
>
Thanks, that's easier than I thought! I am sure with some googling I
will discover how to loop through all the files in a directory. One
question, though: is that code Unicode-safe in the event that there
are Unicode characters in there?
> Dotan> Can Python go through a directory of files and replace each
> Dotan> instance of "newline-space" with nothing?
>
> Sure. Something like (*completely* untested, so caveat emptor):
>
>     import glob
>     import os
>
>     for f in glob.glob('*.vcf'):
>         # read the corrupt data as raw bytes
>         uncooked = open(f, 'rb').read()
>         # fix it (bytes literals, so this also runs on Python 3)
>         cooked = uncooked.replace(b'\n ', b'')
>         # backup original file for safety
>         os.rename(f, '%s.orig' % f)
>         # and save it
>         open(f, 'wb').write(cooked)
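
For anyone wanting to run this on Python 3, here is a minimal sketch of the same idea wrapped in a function. The function name unfold_files and its parameters are made up for this example; the logic follows the quoted snippet (read bytes, strip newline-space, keep a .orig backup), with the added assumption that files needing no change should be left alone.

```python
import glob
import os


def unfold_files(directory, pattern="*.vcf"):
    """Strip newline-space pairs from matching files, keeping .orig backups.

    Returns the sorted list of paths that were rewritten.
    """
    changed = []
    for path in glob.glob(os.path.join(directory, pattern)):
        # Read raw bytes so no decoding step can fail or alter the data.
        with open(path, "rb") as src:
            uncooked = src.read()

        cooked = uncooked.replace(b"\n ", b"")
        if cooked == uncooked:
            continue  # nothing to fix, so no backup and no rewrite

        # Back up the original, then write the fixed bytes under the old name.
        os.rename(path, path + ".orig")
        with open(path, "wb") as dst:
            dst.write(cooked)
        changed.append(path)
    return sorted(changed)
```

Calling unfold_files('.') would process the current directory. One caveat: what os.rename does when a .orig file already exists differs by platform (it can fail on Windows and silently overwrite on POSIX), so it may be worth trying this on a copy of the data first.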