Embedding a literal "\u" in a unicode raw string.

  • Romano Giannetti

    Embedding a literal "\u" in a unicode raw string.

    Hi,

    while writing some LaTeX preprocessing code, I stumbled into this problem: (I
    have a -*- coding: utf-8 -*- line, obviously)

    s = ur"añado $\uparrow$"

    Which gave an error because the \u escape is interpreted in raw unicode strings,
    too. So I found that the only way to solve this is to write:

    s = unicode(r"añado $\uparrow$", "utf-8")

    or

    s = ur"añado $\u005cuparrow$"

    The second one is too ugly to live, while the first is at least acceptable; but
    looking around the Python 3.0 doc, I saw that the first one will fail, too.

    Am I doing something wrong here, or is there another solution for this?

    Romano



  • Diez B. Roggisch

    #2
    Re: Embedding a literal "\u" in a unicode raw string.

    Romano Giannetti wrote:
    > Hi,
    >
    > while writing some LaTeX preprocessing code, I stumbled into this problem:
    > (I have a -*- coding: utf-8 -*- line, obviously)
    >
    > s = ur"añado $\uparrow$"
    >
    > Which gave an error because the \u escape is interpreted in raw unicode
    > strings, too. So I found that the only way to solve this is to write:
    >
    > s = unicode(r"añado $\uparrow$", "utf-8")
    >
    > or
    >
    > s = ur"añado $\u005cuparrow$"
    >
    > The second one is too ugly to live, while the first is at least
    > acceptable; but looking around the Python 3.0 doc, I saw that the first
    > one will fail, too.
    >
    > Am I doing something wrong here, or is there another solution for this?

    Why don't you get rid of the raw string? Then you only need to write

    s = u"añado $\\uparrow$"

    which is considerably easier to read than both other variants above.

    Diez
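
For reference, a minimal sketch of the doubled-backslash approach suggested in this reply, written in Python 3 syntax on the assumption that the code will eventually be ported (there a plain string literal is already Unicode and needs no u prefix). The variable name s is the one used in the thread.

```python
# Python 3 syntax: a plain str literal is already Unicode.
# In a non-raw literal, "\\" is the escape for one literal backslash,
# so the \u that follows is never read as the start of a Unicode escape.
s = "añado $\\uparrow$"

# The resulting string contains a single backslash before "uparrow".
print(s)
```

The cost is that every LaTeX backslash in the literal has to be doubled, which is what the raw string was meant to avoid.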


    • OKB (not okblacke)

      #3
      Re: Embedding a literal "\u" in a unicode raw string.

      Romano Giannetti wrote:
      > Hi,
      >
      > while writing some LaTeX preprocessing code, I stumbled into this
      > problem: (I have a -*- coding: utf-8 -*- line, obviously)
      >
      > s = ur"añado $\uparrow$"
      >
      > Which gave an error because the \u escape is interpreted in raw
      > unicode strings, too. So I found that the only way to solve this is
      > to write:
      >
      > s = unicode(r"añado $\uparrow$", "utf-8")
      >
      > or
      >
      > s = ur"añado $\u005cuparrow$"
      >
      > The second one is too ugly to live, while the first is at least
      > acceptable; but looking around the Python 3.0 doc, I saw that the
      > first one will fail, too.
      >
      > Am I doing something wrong here, or is there another solution for
      > this?

      I too encountered this problem, in the same situation (making
      strings that contain LaTeX commands). One possibility is to separate
      out just the bit that has the \u, and use string juxtaposition to attach
      it to the others:

      s = ur"añado " u"$\\uparrow$"

      It's not ideal, but I think it's easier to read than your solution
      #2.


      --
      --OKB (not okblacke)
      Brendan Barnwell
      "Do not follow where the path may lead. Go, instead, where there is
      no path, and leave a trail."
      --author unknown
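
A minimal sketch of the juxtaposition idea from this post, in Python 3 syntax (the ur prefix is not accepted there, so a plain r prefix stands in for it). The \alpha fragment in the second example is an illustrative addition, not something from the thread; it is only there to show why the raw part is still worth keeping.

```python
# Adjacent string literals are joined at compile time, and each literal
# keeps its own prefix: the first is raw, the second is not.
s = r"añado " "$\\uparrow$"

# Illustrative addition: the raw part can carry other LaTeX commands
# without doubling their backslashes (a non-raw "\a" would be a bell character).
mixed = r"$\alpha$ añado " "$\\uparrow$"

print(s)
print(mixed)
```

Only the fragment that contains \u needs the doubled backslash; the rest of the text can stay in raw form.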


      • romano.giannetti@gmail.com

        #4
        Re: Embedding a literal "\u" in a unicode raw string.

        On Feb 25, 6:03 pm, "OKB (not okblacke)"
        <brenNOSPAMb...@NObrenSPAMbarn.net> wrote:
        >
        > I too encountered this problem, in the same situation (making
        > strings that contain LaTeX commands). One possibility is to separate
        > out just the bit that has the \u, and use string juxtaposition to attach
        > it to the others:
        >
        > s = ur"añado " u"$\\uparrow$"
        >
        > It's not ideal, but I think it's easier to read than your solution
        > #2.

        Yes, I think I will do something like that, although I really do
        not understand why \x5c is not interpreted in a raw string while \u005c
        is interpreted in a unicode raw string. It is, well, not elegant. Raw
        should be raw...

        Thanks anyway
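
The asymmetry described in this post can be observed from Python 3 with the raw_unicode_escape codec, which, to the best of my knowledge, applies the same rule Python 2 used for the body of a ur"..." literal: \uXXXX sequences are decoded and every other backslash is left alone. This is a sketch that approximates the literal parsing, not the parser itself, and decode_like_ur_literal is a hypothetical helper name.

```python
def decode_like_ur_literal(body: bytes) -> str:
    """Approximate Python 2's handling of a ur"..." literal body (assumption:
    the raw_unicode_escape codec mirrors that behaviour)."""
    return body.decode("raw_unicode_escape")

# \x5c is not touched: the result is still the four characters \ x 5 c.
hex_escape = decode_like_ur_literal(rb"\x5c")

# \u005c is decoded: the result is a single backslash.
unicode_escape = decode_like_ur_literal(rb"\u005c")

print(len(hex_escape), len(unicode_escape))

# \uparrow starts like a \uXXXX escape but has no hex digits after it,
# so decoding is rejected, much like the original ur"..." literal was.
rejected = False
try:
    decode_like_ur_literal(rb"$\uparrow$")
except UnicodeDecodeError:
    rejected = True

print(rejected)
```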


        • NickC

          #5
          Re: Embedding a literal "\u" in a unicode raw string.

          On Feb 26, 8:45 am, rmano <romano.gianne...@gmail.com> wrote:
          > BTW, 2to3.py should warn about a raw string (not unicode) with \u in
          > it, I think.
          > I tried it and it seems to ignore the problem...

          Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
          [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
          Type "help", "copyright", "credits" or "license" for more information.
          >>> r"\u"
          '\\u'
          >>> r"\uparrow"
          '\\uparrow'
          >>> r"\u005c"
          '\\u005c'
          >>> r"\N{REVERSE SOLIDUS}"
          '\\N{REVERSE SOLIDUS}'
          >>> "\u005c"
          '\\'
          >>> "\N{REVERSE SOLIDUS}"
          '\\'

          2to3.py may be ignoring a problem, but existing raw 8-bit string
          literals containing a '\u' aren't going to be it. If anything is going
          to have a problem with conversion to Py3k at this point, it is raw
          Unicode literals that contain a Unicode escape.
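
To locate the literals this post identifies as the real porting risk, a rough scan of Python 2 source text might look like the sketch below. It is a heuristic under stated assumptions: it only recognises single-line ur"..." or ur'...' literals without embedded quotes, and find_risky_literals, UR_LITERAL and UNICODE_ESCAPE are hypothetical names, not part of 2to3 or any other tool.

```python
import re

# Hypothetical pattern: a ur/UR prefix followed by a single-line quoted body.
# Triple-quoted strings and escaped quotes inside the body are not handled.
UR_LITERAL = re.compile(r"""\b[uU][rR](["'])(.*?)\1""")

# A complete Unicode escape: \u plus 4 hex digits, or \U plus 8 hex digits.
UNICODE_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}")

def find_risky_literals(source):
    """Return (line_number, literal_body) pairs for raw Unicode literals
    whose body contains a Unicode escape."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in UR_LITERAL.finditer(line):
            body = match.group(2)
            if UNICODE_ESCAPE.search(body):
                hits.append((lineno, body))
    return hits

# Made-up sample source: only the first line is a raw Unicode literal
# that actually contains a Unicode escape.
sample = 'a = ur"x \\u005c y"\nb = ur"plain"\nc = r"\\u005c"\n'
print(find_risky_literals(sample))
```

A tokenizer-based scan would be more robust than a regular expression; this version is only meant to show which literals are affected.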


          • rmano

            #6
            Re: Embedding a literal "\u" in a unicode raw string.

            On Mar 4, 1:00 pm, NickC <ncogh...@gmail.com> wrote:
            >
            > Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
            > [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
            > Type "help", "copyright", "credits" or "license" for more information.
            > >>> r"\u"
            > '\\u'
            > >>> r"\uparrow"
            > '\\uparrow'

            Nice to know... so it seems that the 3.0 doc was not updated. I think
            this is the correct behaviour. Thanks
