Unicode lists and join (python 2.2.3)

  • nkarkhan

    Unicode lists and join (python 2.2.3)

    Hello,
    I have a list of strings, some of which might be unicode. I am
    trying to do a .join operation on the list, and the .join raises a
    unicode exception. I am looking for ways to get around this.
    I would like to get a unicode string out of the list with all string
    elements separated by '\n'.

    #!/usr/bin/env python
    import sys
    import string

    try:
        x = [u"\xeeabc2:xyz" , u"abc3:123"]
        u = "\xe7abc"
        x.append("%s:%s " % ("xfasfs", u))
        x.append(u"Hello:afddfdsfa")

        y = u'\n'.join(x)
        print("Unicode Call worked!")
    except Exception, err:
        print("Exception raised %s" % err)



    On a related note,
    why does this work with no exceptions

    x=[]
    u = "\xe7abc"
    x.append("%s:%s " % ("xfasfs", u))

    and this doesn't?
    x=[]
    u = "\xe7abc"
    x.append("%s:%s " % (u"xfasfs", u))


    Thanks,
    Nitin.


  • "Martin v. Löwis"

    #2
    Re: Unicode lists and join (python 2.2.3)

    > x = [u"\xeeabc2:xyz" , u"abc3:123"]
    > u = "\xe7abc"

    u is not a Unicode string.

    > x.append("%s:%s " % ("xfasfs", u))

    so what you append is not a Unicode string, either.

    > x.append(u"Hello:afddfdsfa")
    >
    > y = u'\n'.join(x)

    As a consequence, .join tries to convert the byte string to
    a Unicode string, and fails, because it contains non-ASCII
    bytes.
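
One way around this, sketched here under assumptions rather than taken from the thread, is to decode every byte string explicitly before joining, so nothing is left to the implicit ASCII conversion. The helper names `to_text` and `join_lines` are hypothetical, and the Latin-1 encoding is only a guess at what the bytes mean; substitute whatever encoding the data really uses. The sketch uses `b"..."` literals so it also runs on current Python, which Python 2.2.3 itself does not accept.

```python
# Sketch: decode byte strings explicitly, then join as Unicode.
# ASSUMPTION: the byte strings are Latin-1; replace with the real encoding.
ASSUMED_ENCODING = "latin-1"


def to_text(item, encoding=ASSUMED_ENCODING):
    """Return item as a Unicode string, decoding it if it is a byte string."""
    if isinstance(item, bytes):
        return item.decode(encoding)
    return item


def join_lines(items, encoding=ASSUMED_ENCODING):
    """Join a mixed list of byte and Unicode strings with newlines."""
    return u"\n".join([to_text(item, encoding) for item in items])


# The list from the question, with the byte string written out explicitly.
items = [u"\xeeabc2:xyz", u"abc3:123", b"xfasfs:\xe7abc ", u"Hello:afddfdsfa"]
joined = join_lines(items)
```

With every element decoded up front, the join only ever sees Unicode strings, so the non-ASCII byte no longer triggers a conversion error.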

    > Why does this work with no exceptions
    >
    > x=[]
    > u = "\xe7abc"
    > x.append("%s:%s " % ("xfasfs", u))

    % here is applied to a byte string, with all arguments also byte
    strings. The result is a byte string.

    > and this doesnt
    > x=[]
    > u = "\xe7abc"
    > x.append("%s:%s " % (u"xfasfs", u))

    % is applied to a byte string, with one argument being a Unicode
    string. The result is a Unicode string, where all byte strings
    get converted to Unicode. Converting u fails, as it has non-ASCII
    bytes in it.
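
The two formatting cases can be spelled out as a small sketch. It is written with explicit `b"..."` literals so that it runs on current Python, where the implicit conversion no longer exists; the `.decode("ascii")` call stands in for what Python 2 attempts behind the scenes. Treating the bytes as Latin-1 is an assumption, not something stated in the thread.

```python
u = b"\xe7abc"  # byte string containing a non-ASCII byte, as in the question

# Case 1: byte-string format with byte-string arguments stays a byte string.
raw = b"%s:%s " % (b"xfasfs", u)

# Case 2: mixing in Unicode forces the byte string through an ASCII decode,
# which fails on \xe7. The explicit call below mimics that implicit step.
try:
    u.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Workaround sketch: decode explicitly before formatting.
# ASSUMPTION: the bytes are Latin-1; use the real encoding of your data.
formatted = u"%s:%s " % (u"xfasfs", u.decode("latin-1"))
```

Here `raw` remains a byte string, `ascii_ok` ends up false, and `formatted` is a Unicode string because the decoding was done explicitly with a codec that can represent the byte.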

    Regards,
    Martin
