Python for Vcard Parsing in UTF16

**Alex Martelli** · Apr 22 '07, 12:05 AM

Re: Python for Vcard Parsing in UTF16

R Wood <rwood@therandymon.com> wrote:
...

> alias Linus_Torvalds Linus Torvalds <lt@linux.com>
>
> To me this was a natural task for Perl. Turns out however, there's a catch.
> Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
> their addressbook gets a legitimate Vcard file. And of course Perl somewhat
> chokes on UTF. I've found several ways to do it that involve complicated
> downloads and installations of Perl modules, but that defeats the purpose of
> making it simple. In an ideal world you should be able to say "try this cool
> script" and be done with it. Once you have to say "go to CPAN, download and
> compile this module, then ..." it gets less exciting.
>
> I know nothing about Python except that it interests me and has interested me
> since I first learned the Rekall database frontend (Linux) runs on it. I just
> ordered Learning Python and if that works out satisfactorily I'm going to go
> back for Programming Python. In the meantime, I thought I would pose the
> question to this newsgroup: would Python be useful for a parsing exercise like
> this one?

Sure, Python and Perl (and Ruby) should be equally suitable for the
task, so, if Python appears more suitable by having built-in unicode
capabilities, go for it. I'm a bit uncertain about the UTF-16 export
though; I know some applications do use it (e.g., Microsoft Entourage),
but I thought Apple's Address Book didn't, and, having just tried a
VCard export from mine, it looks quite ASCII to me. Maybe you've set
some kind of preference, or...?

Alex

**R Wood** · Apr 22 '07, 08:45 AM

Re: Python for Vcard Parsing in UTF16

Alex Martelli wrote:

> R Wood <rwood@therandymon.com> wrote:
> ...
>
>> alias Linus_Torvalds Linus Torvalds <lt@linux.com>
>>
>> To me this was a natural task for Perl. Turns out however, there's a
>> catch. Apple exports the file in UTF-16 to ensure anyone with Chinese
>> characters in their addressbook gets a legitimate Vcard file. And of
>> course Perl somewhat chokes on UTF.
>
> Sure, Python and Perl (and Ruby) should be equally suitable for the
> task, so, if Python appears more suitable by having built-in unicode
> capabilities, go for it. I'm a bit uncertain about the UTF-16 export
> though; I know some applications do use it (e.g., Microsoft Entourage),
> but I thought Apple's Address Book didn't, and, having just tried a
> VCard export from mine, it looks quite ASCII to me. Maybe you've set
> some kind of preference, or...?
>
> Alex

I did the same thing. Apple's clever. If your addressbook doesn't have any
higher characters, i.e. nothing but ASCII, it will export your addressbook in
ASCII. But if you have anything else (in my case, Spanish, French, and
Italian) it goes for UTF-16. I first thought it was UTF-8, but realized that
since Apple supports all sorts of Asian languages really well they need UTF-16
to deal with it, and importing the exported file into Jedit using UTF-16
encoding confirmed that's what it is.
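
A check like the Jedit one can also be done from a script by looking at the first few bytes of the file for a byte-order mark. What follows is only a minimal sketch in present-day Python 3, and it assumes the export starts with a BOM, which nothing in this thread confirms for Address Book. The name `sniff_bom` is made up for the illustration.

```python
import codecs

# Hypothetical helper, not from the thread: guess an encoding from a
# leading byte-order mark. A UTF-32-LE BOM also starts with FF FE, so
# this sketch would report such a file as UTF-16-LE.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data):
    """Return an encoding name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Plain ASCII exports carry no BOM, so `None` here only means "no mark found", not "not a vCard".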

**Adam Atlas** · Apr 24 '07, 11:55 AM

Re: Python for Vcard Parsing in UTF16

On Apr 21, 7:28 pm, R Wood <r...@therandymon.com> wrote:

I know nothing about Python except that it interests me and has interested me
since I first learned the Rekall database frontend (Linux) runs on it. I just
ordered Learning Python and if that works out satisfactorily I'm going to go
back for Programming Python. In the meantime, I thought I would pose the
question to this newsgroup: would Python be useful for a parsing exercise like
this one?

Here's a little function that takes some `str`-type data (i.e. what
you'd get from doing `open(...).read()`) and, assuming it's a Vcard,
detects its encoding and converts it to a canonical `unicode` object.

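    def fix_encoding(s):
        m = u'BEGIN:VCARD'
        # Try each candidate encoding; a wrong guess either fails to
        # decode or yields text that lacks the vCard marker.
        for c in ('ascii', 'utf_16_be', 'utf_16_le', 'utf_8'):
            try:
                u = unicode(s, c)
            except UnicodeDecodeError:
                continue
            if m in u:
                return u
        return None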

**Adam Atlas** · Apr 24 '07, 12:05 PM

Re: Python for Vcard Parsing in UTF16

On Apr 21, 7:28 pm, R Wood <r...@therandymon.com> wrote:

To me this was a natural task for Perl. Turns out however, there's a catch.
Apple exports the file in UTF-16 to ensure anyone with Chinese characters in
their addressbook gets a legitimate Vcard file.

Here's a function that, given a `str` containing a vcard in some
encoding, guesses the encoding and returns a canonical representation
as a `unicode` object.

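    def fix_encoding(s):
        m = u'BEGIN:VCARD'
        # Try each candidate encoding; a wrong guess either fails to
        # decode or yields text that lacks the vCard marker.
        for c in ('ascii', 'utf_16_be', 'utf_16_le', 'utf_8'):
            try:
                u = unicode(s, c)
            except UnicodeDecodeError:
                continue
            if m in u:
                return u
        return None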
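
The function above relies on the Python 2 `unicode` type. For anyone reading this on Python 3, where text is `str` and file contents read in binary mode are `bytes`, a rough equivalent might look like the sketch below. The `marker` parameter and the BOM stripping are additions made for this illustration and are not part of the posted code.

```python
def fix_encoding(data, marker="BEGIN:VCARD"):
    """Decode vCard bytes by trying candidate encodings in turn.

    Returns the decoded text, or None if no candidate produces text
    containing the marker.
    """
    for codec in ("ascii", "utf_16_be", "utf_16_le", "utf_8"):
        try:
            text = data.decode(codec)
        except UnicodeDecodeError:
            continue
        if marker in text:
            # Assumption for this sketch: drop a leading byte-order mark
            # if the decoder left one in the text.
            return text.lstrip("\ufeff")
    return None
```

It would be called on the raw bytes of the export, for example the result of reading the file opened in `"rb"` mode.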
