optparse escaping control characters

  • wannymahoots@gmail.com

    optparse escaping control characters

    optparse seems to be escaping control characters that I pass as
    arguments on the command line. Is this a bug? Am I missing
    something? Can this be prevented, or worked around?

    This behaviour doesn't occur with non-control characters.

    For example, if this program (called test.py):
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-d", dest="delimiter", action="store")
    (options, args) = parser.parse_args()
    print options

    is run as follows:
    python test.py -d '\t'

    it outputs:
    {'delimiter': '\\t'}

    i.e. the \t has had an escape character added to give \\t.

  • Hrvoje Niksic

    #2
    Re: optparse escaping control characters

    wannymahoots@gmail.com writes:
    > optparse seems to be escaping control characters that I pass as
    > arguments on the command line. Is this a bug? Am I missing
    > something? Can this be prevented, or worked around?
    It has nothing to do with optparse, it's how Python prints strings:

    $ python -c 'import sys; print sys.argv' '\t'
    ['-c', '\\t']

    Note that you're not really passing a control character to Python,
    you're passing a two-character string consisting of \ and t. When
    representing the string inside a data structure, Python escapes the \
    to avoid confusion with a real control character such as \t.

    If you try printing the string itself, you'll see that everything is
    correct:

    $ python -c 'import sys; print sys.argv[1]' '\t'
    \t
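    The same distinction can be sketched in Python 3 (print() is a function there, unlike the Python 2 print statement used in this thread). The list below is a hand-written stand-in for sys.argv, assuming the shell delivered a backslash followed by a t:

```python
# Stand-in for sys.argv after running: python -c '...' '\t'
# (the shell passes two characters: a backslash and a "t")
argv = ["-c", "\\t"]

# Printing the list shows each element via repr(), so the backslash is escaped
print(argv)          # ['-c', '\\t']

# Printing the element itself shows its actual characters
print(argv[1])       # \t
print(len(argv[1]))  # 2
```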


    • Fredrik Lundh

      #3
      Re: optparse escaping control characters

      wannymahoots@gmail.com wrote:
      > optparse seems to be escaping control characters that I pass as
      > arguments on the command line. Is this a bug? Am I missing
      > something?
      you're missing the distinction between the content of a string object,
      and how the corresponding string literal looks.
      >>> x = {'delimiter': '\\t'}
      >>> x
      {'delimiter': '\\t'}
      >>> x["delimiter"]
      '\\t'
      >>> print x["delimiter"]
      \t
      >>> len(x["delimiter"])
      2

      </F>
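      To make the content/literal distinction concrete, here is a small Python 3 sketch comparing the two-character string that arrives from the shell with a real TAB character, by length and by code point:

```python
# Two different strings that are easy to confuse when displayed
two_chars = "\\t"   # backslash followed by "t" (what the shell passes in)
tab = "\t"          # a single TAB control character

print(len(two_chars), [ord(c) for c in two_chars])  # 2 [92, 116]
print(len(tab), [ord(c) for c in tab])              # 1 [9]

# repr() shows how each would be written as a string literal
print(repr(two_chars))  # '\\t'
print(repr(tab))        # '\t'
```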


      • Dan Halligan

        #4
        Re: optparse escaping control characters

        On Aug 19, 1:45 pm, Hrvoje Niksic <hnik...@xemacs.org> wrote:
        > wannymaho...@gmail.com writes:
        >> optparse seems to be escaping control characters that I pass as
        >> arguments on the command line.  Is this a bug?  Am I missing
        >> something?  Can this be prevented, or worked around?
        >
        > It has nothing to do with optparse, it's how Python prints strings:
        >
        > $ python -c 'import sys; print sys.argv' '\t'
        > ['-c', '\\t']
        >
        > Note that you're not really passing a control character to Python,
        > you're passing a two-character string consisting of \ and t.  When
        > representing the string inside a data structure, Python escapes the \
        > to avoid confusion with a real control character such as \t.
        >
        > If you try printing the string itself, you'll see that everything is
        > correct:
        >
        > $ python -c 'import sys; print sys.argv[1]' '\t'
        > \t

        Thanks for the reply, much clearer now. Just one more question: how
        would I pass a control character to python on the command line?


        • John Machin

          #5
          Re: optparse escaping control characters

          On Aug 19, 10:35 pm, wannymaho...@gmail.com wrote:
          > optparse seems to be escaping control characters that I pass as
          > arguments on the command line. Is this a bug? Am I missing
          > something? Can this be prevented, or worked around?
          >
          > This behaviour doesn't occur with non-control characters.
          >
          > For example, if this program (called test.py):
          > from optparse import OptionParser
          > parser = OptionParser()
          > parser.add_option("-d", dest="delimiter", action="store")
          > (options, args) = parser.parse_args()
          > print options
          >
          > is run as follows:
          > python test.py -d '\t'
          >
          > it outputs:
          > {'delimiter': '\\t'}
          >
          > i.e. the \t has had an escape character added to give \\t.
          You are inputting a TWO-byte string composed of a backslash and a
          lowercase t, and feeding that to OptionParser.

          C:\junk>type test.py
          import sys; a = sys.argv[1]; d = {'delimiter': a}
          print len(a), a, str(a), repr(a)
          print d

          # Note: this is Windows, where the shell quote is ", not '
          C:\junk>python test.py "\t"
          2 \t \t '\\t'
          {'delimiter': '\\t'}

          The extra backslash that you see is caused by the (implicit) use of
          repr() to display the string.

          If you want/need to enter a literal TAB character in the command line,
          consult the manual for your shell.

          HTH,
          John
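          If the aim is simply to let users type \t to mean a tab delimiter, one possible workaround (not something optparse does for you) is to decode backslash escapes inside the program after parsing. A minimal Python 3 sketch; the helper name decode_escapes is hypothetical, and it assumes the option value is plain ASCII:

```python
def decode_escapes(value):
    """Hypothetical helper: turn escape sequences such as backslash-t into real characters.

    Assumes ASCII input: encode("ascii") raises UnicodeEncodeError for
    anything else instead of silently mangling non-ASCII text.
    """
    return value.encode("ascii").decode("unicode_escape")


raw = "\\t"                      # what the shell delivers for '\t'
delimiter = decode_escapes(raw)
print(len(raw), len(delimiter))  # 2 1
print(delimiter == "\t")         # True
```

          Decoding only after parsing keeps the shell and optparse out of it: the user still types the two characters, and the program decides what they mean.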


          • Steven D'Aprano

            #6
            Re: optparse escaping control characters

            On Tue, 19 Aug 2008 05:35:27 -0700, wannymahoots wrote:
            > optparse seems to be escaping control characters that I pass as
            > arguments on the command line. Is this a bug? Am I missing something?
            > Can this be prevented, or worked around?
            You are misinterpreting the evidence. Here's the short explanation:

            optparse isn't escaping a control character, because you're not supplying
            it with a control character. You're supplying it with two normal
            characters, which merely *look* like five (including the quote marks)
            because of Python's special handling of backslashes.


            If you need it, here's the long-winded explanation.

            I've made a small change to your test.py file to demonstrate:

            # test.py (modified)
            from optparse import OptionParser
            parser = OptionParser()
            parser.add_option("-d", dest="delimiter", action="store")
            (options, args) = parser.parse_args()
            print "Options:", options
            print "str of options.delimiter =", str(options.delimiter)
            print "repr of options.delimiter =", repr(options.delimiter)
            print "len of options.delimiter =", len(options.delimiter)


            Here's what it does when I call it:

            $ python test.py -d '\t'
            Options: {'delimiter': '\\t'}
            str of options.delimiter = \t
            repr of options.delimiter = '\\t'
            len of options.delimiter = 2


            When you pass '\t' in the command line, the shell sends a literal
            backslash followed by a lowercase t to Python. That is, it sends the
            literal string '\t', not a control character.
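            Python's shlex module, in its default POSIX mode, splits a command line much as a POSIX shell does, so it can serve as a rough model of what the shell hands to Python. A small Python 3 sketch (shlex only approximates a real shell, so treat this as an illustration, not a proof):

```python
import shlex

# Inside single quotes a POSIX shell performs no escape processing,
# so '\t' reaches the program as a backslash followed by "t"
words = shlex.split(r"python test.py -d '\t'")
print(words)          # ['python', 'test.py', '-d', '\\t']
print(len(words[3]))  # 2
```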

            Proof: pass the same string to the "wc" program using "echo" (this
            assumes bash, whose built-in echo does not expand \t by default). Don't
            forget that echo adds a newline to the string:

            $ echo 't' | wc # just a t
            1 1 2
            $ echo '\t' | wc # a backslash and a t, not a control character
            1 1 3
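            The third number wc reports is the byte count. The same counts can be reproduced in Python 3; this sketch writes out by hand the strings that bash's echo emits, including the trailing newline:

```python
# Reproduce the byte counts from the wc example; echo appends "\n"
just_t = "t\n"
backslash_t = "\\t\n"

print(len(just_t.encode("ascii")))       # 2
print(len(backslash_t.encode("ascii")))  # 3

# For comparison, a real TAB plus newline is only 2 bytes
print(len("\t\n".encode("ascii")))       # 2
```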


            That's the first half of the puzzle. Now the second half -- why is Python
            adding a *second* backslash to the backslash-t? Actually, it isn't, but
            it *seems* to be adding not just a second backslash but also two quote
            marks.

            The backslash in Python is special. If you wanted a literal backslash t
            in a Python string, you would have to type *two* backslashes:

            '\\t'

            because Python treats a single backslash followed by t as an escape
            sequence for the tab character.

            But be careful to note that even though you typed five characters (quote,
            backslash, backslash, t, quote) Python creates a string of length two: a
            single backslash and a t.

            Now, when you print something using the str() function, Python hides all
            that complexity from you. Hence the line of output that looks like this:

            str of options.delimiter = \t

            The argument is a literal backslash followed by a t, not a tab character.

            But when you print using the repr() function, Python shows you what you
            would have typed -- five characters as follows:

            repr of options.delimiter = '\\t'

            But that's just the *display* of a two character string. The actual
            string itself is only two characters, despite the two quotes and the two
            backslashes.
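            A quick Python 3 check of the "five characters displayed, two stored" point: repr() returns a new string, and counting its characters shows the two quotes and the doubled backslash.

```python
s = "\\t"          # the two-character string from the command line
shown = repr(s)    # how Python displays it as a literal

print(s, len(s))          # \t 2
print(shown, len(shown))  # '\\t' 5
print(list(shown))        # ["'", '\\', '\\', 't', "'"]
```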

            Now for the final piece of the puzzle: when you print most composite
            objects, like the optparse Values objects -- the object named "options" in
            your code -- Python prints the internals of it using repr() rather than
            str().
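            The whole effect can be reproduced without a shell by handing optparse an argument list directly. A Python 3 sketch (optparse is still in the standard library there, though argparse is the usual choice for new code); the list passed to parse_args() simulates what the shell delivers for -d '\t':

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-d", dest="delimiter", action="store")

# Simulate: python test.py -d '\t'  (the shell passes backslash + "t")
options, args = parser.parse_args(["-d", "\\t"])

print(options)                  # {'delimiter': '\\t'}
print(options.delimiter)        # \t
print(repr(options.delimiter))  # '\\t'
print(len(options.delimiter))   # 2
```

            The stored value is unchanged; only the way the Values object prints its contents adds the extra backslash.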



            --
            Steven


            • wannymahoots

              #7
              Re: optparse escaping control characters

              Thanks for all the responses!


              • Hrvoje Niksic

                #8
                Re: optparse escaping control characters

                Dan Halligan <dan.halligan@gmail.com> writes:
                > How would I pass a control character to python on the command line?
                It depends on which command line you are using. Most Unix-like shells
                will allow you to input a control character literally by first pressing ^V (Ctrl-V).
                Since \t is the TAB character, you should be able to input it like
                this:

                $ python -c 'import sys; print sys.argv' '^V<tab>'
                ['-c', '\t'] # note single backslash
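                Another way to get a real control character into sys.argv is to skip the shell entirely and launch Python from a program. A Python 3 sketch using subprocess (capture_output needs Python 3.7 or later); in bash, ANSI-C quoting such as $'\t' is also a way to type the character at the prompt:

```python
import subprocess
import sys

# Pass a real TAB character as an argument, bypassing the shell entirely
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.argv[1]), repr(sys.argv[1]))", "\t"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout, end="")  # 1 '\t'
```

                The child reports a length of 1, confirming it received the TAB itself rather than a backslash and a t.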
