Quote-aware string splitting

  • J. W. McCall

    Quote-aware string splitting

    Hello,

    I need to split a string as per string.split(), but with a modification:
    I want it to recognize quoted strings and return them as one list item,
    regardless of any whitespace within the quoted string.

    For example, given the string:

    'spam "the life of brian" 42'

    I'd want it to return:

    ['spam', 'the life of brian', '42']

    I see no standard library function to do this, so what would be the most
    simple way to achieve this? This should be simple, but I must be tired
    as I'm not currently able to think of an elegant way to do this.

    Any ideas?

    Thanks,

    J. W. McCall
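
For readers arriving at this question later: the standard library's shlex module appears to cover the example as stated. The snippet below is a minimal sketch, assuming shell-style quoting rules are acceptable for the input (shlex also interprets backslashes, which may or may not be wanted).

```python
import shlex

line = 'spam "the life of brian" 42'

# shlex.split tokenizes on whitespace and keeps quoted runs together,
# removing the quote characters from the result.
tokens = shlex.split(line)
print(tokens)  # ['spam', 'the life of brian', '42']
```

Unbalanced quotes make shlex.split raise ValueError, so untrusted input probably wants a try/except around the call.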
  • Tim Heaney

    #2
    Re: Quote-aware string splitting

    "J. W. McCall" <jmccall@houston.rr.com> writes:
    >
    > I need to split a string as per string.split(), but with a
    > modification: I want it to recognize quoted strings and return them as
    > one list item, regardless of any whitespace within the quoted string.
    >
    > For example, given the string:
    >
    > 'spam "the life of brian" 42'
    >
    > I'd want it to return:
    >
    > ['spam', 'the life of brian', '42']
    >
    > I see no standard library function to do this, so what would be the
    > most simple way to achieve this? This should be simple, but I must be
    > tired as I'm not currently able to think of an elegant way to do this.
    >
    > Any ideas?

    How about the csv module? It seems like it might be overkill, but it
    does already handle that sort of quoting:

    >>> import csv
    >>> csv.reader(['spam "the life of brian" 42'], delimiter=' ').next()
    ['spam', 'the life of brian', '42']
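
The session above uses the Python 2 reader method .next(). A rough equivalent for Python 3 is sketched below using the built-in next(); the helper name csv_split is made up for illustration and is not part of the csv module.

```python
import csv

def csv_split(line):
    """Split one line on single spaces, honoring double-quoted fields.

    csv_split is a hypothetical helper name; it simply wraps csv.reader.
    """
    return next(csv.reader([line], delimiter=' '))

print(csv_split('spam "the life of brian" 42'))
# ['spam', 'the life of brian', '42']

# Caveat: every space is a delimiter, so repeated spaces give empty fields.
print(csv_split('a  b'))
# ['a', '', 'b']
```

Because each space counts as a separator, this behaves like split(' ') rather than split() on runs of whitespace, which may matter for loosely formatted input.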


    • George Sakkis

      #3
      Re: Quote-aware string splitting

      > "J. W. McCall" <jmccall@houston.rr.com> writes:
      > >
      > > I need to split a string as per string.split(), but with a
      > > modification: I want it to recognize quoted strings and return them as
      > > one list item, regardless of any whitespace within the quoted string.
      > >
      > > For example, given the string:
      > >
      > > 'spam "the life of brian" 42'
      > >
      > > I'd want it to return:
      > >
      > > ['spam', 'the life of brian', '42']
      > >
      > > I see no standard library function to do this, so what would be the
      > > most simple way to achieve this? This should be simple, but I must be
      > > tired as I'm not currently able to think of an elegant way to do this.
      > >
      > > Any ideas?
      >
      > How about the csv module? It seems like it might be overkill, but it
      > does already handle that sort of quoting
      >
      > >>> import csv
      > >>> csv.reader(['spam "the life of brian" 42'], delimiter=' ').next()
      > ['spam', 'the life of brian', '42']


      I don't know if this is as good as CSV's splitter, but it works
      reasonably well for me:

      import re
      regex = re.compile(r'''
          '.*?' |   # single quoted substring
          ".*?" |   # double quoted substring
          \S+       # all the rest
          ''', re.VERBOSE)

      print(regex.findall('''
      This is 'single "quoted" string'
      followed by a "double 'quoted' string"
      '''))

      George


      • Jeffrey Froman

        #4
        Re: Quote-aware string splitting

        J. W. McCall wrote:

        > For example, given the string:
        >
        > 'spam "the life of brian" 42'
        >
        > I'd want it to return:
        >
        > ['spam', 'the life of brian', '42']

        The .split() method of strings can take a substring, such as a quotation
        mark, as a delimiter. So a simple solution is:

        >>> x = 'spam "the life of brian" 42'
        >>> [z.strip() for z in x.split('"')]
        ['spam', 'the life of brian', '42']


        Jeffrey


        • George Sakkis

          #5
          Re: Quote-aware string splitting

          > import re
          > regex = re.compile(r'''
          > '.*?' | # single quoted substring
          > ".*?" | # double quoted substring
          > \S+ # all the rest
          > ''', re.VERBOSE)

          Oh, and if your strings may span more than one line, replace re.VERBOSE
          with re.VERBOSE | re.DOTALL.

          George
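
To make the effect of re.DOTALL concrete, here is a small sketch that compiles the same pattern both ways and runs it on a string whose quoted part contains a newline. The names PATTERN, single_line and multi_line are chosen for illustration only.

```python
import re

PATTERN = r'''
    '.*?' |   # single quoted substring
    ".*?" |   # double quoted substring
    \S+       # all the rest
'''

# Without DOTALL, "." stops at a newline, so a quoted run cannot span lines.
single_line = re.compile(PATTERN, re.VERBOSE)
# With DOTALL, "." also matches newlines.
multi_line = re.compile(PATTERN, re.VERBOSE | re.DOTALL)

text = 'spam "the life\nof brian" 42'

print(single_line.findall(text))
# ['spam', '"the', 'life', 'of', 'brian"', '42']
print(multi_line.findall(text))
# ['spam', '"the life\nof brian"', '42']
```

Note that either variant keeps the quote characters in the matched items; stripping them is left to the caller.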


          • Bengt Richter

            #6
            Re: Quote-aware string splitting

            On Mon, 25 Apr 2005 19:40:44 -0700, Jeffrey Froman <jeffrey@fro.man> wrote:

            > J. W. McCall wrote:
            >
            > > For example, given the string:
            > >
            > > 'spam "the life of brian" 42'
            > >
            > > I'd want it to return:
            > >
            > > ['spam', 'the life of brian', '42']
            >
            > The .split() method of strings can take a substring, such as a quotation
            > mark, as a delimiter. So a simple solution is:
            >
            > >>> x = 'spam "the life of brian" 42'
            > >>> [z.strip() for z in x.split('"')]
            > ['spam', 'the life of brian', '42']

            >>> x = ' sspam " ssthe life of brianss " 42'
            >>> [z.strip() for z in x.split('"')]
            ['sspam', 'ssthe life of brianss', '42']

            Oops: note that the spaces inside the quotes (near the ss markers) were
            stripped, and the double quotes are missing from the result.
            Maybe (not tested beyond what you see):

            >>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
            ['sspam', '" ssthe life of brianss "', '42']
            >>> x = ' "" "" '
            >>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
            ['""', '""']
            >>> x='""'
            >>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
            ['""']
            >>> x=''
            >>> [r for r in [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))] if r] or ['']
            ['']

            >>> x = ' sspam " ssthe life of brianss " 42'
            >>> [(i%2 and ['"'+z+'"'] or [z.strip()])[0] for i,z in enumerate(x.split('"'))]
            ['sspam', '" ssthe life of brianss "', '42']


            Regards,
            Bengt Richter
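
The split-on-quotes idea in this sub-thread can also be extended so that unquoted segments are broken into separate words. The following is only a sketch under the assumption that double quotes are the sole quoting character and that they are balanced; split_on_quotes is a made-up name.

```python
def split_on_quotes(text):
    """Split on whitespace, keeping double-quoted segments intact.

    Hypothetical helper: after text.split('"'), the odd-indexed pieces
    were inside quotes and the even-indexed pieces were outside.
    """
    parts = text.split('"')
    if len(parts) % 2 == 0:
        # An odd number of quote characters leaves one quote unclosed.
        raise ValueError("unbalanced double quote")
    tokens = []
    for i, part in enumerate(parts):
        if i % 2:
            tokens.append(part)          # quoted: keep verbatim, spaces included
        else:
            tokens.extend(part.split())  # unquoted: ordinary whitespace split
    return tokens

print(split_on_quotes('spam "the life of brian" 42'))
# ['spam', 'the life of brian', '42']
print(split_on_quotes('one two "three four" five'))
# ['one', 'two', 'three four', 'five']
```

Unlike the one-liners above, this drops the quote characters and preserves whitespace inside them; it makes no attempt to handle escaped quotes or single quotes.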


            • Paul McGuire

              #7
              Re: Quote-aware string splitting

              Quoted strings are surprisingly stateful, so that using a parser isn't
              totally out of line. Here is a pyparsing example with some added test
              cases. Pyparsing's quotedString built-in handles single or double
              quotes (if you don't want to be this permissive, there are also
              sglQuotedString and dblQuotedString to choose from), plus escaped quote
              characters.

              The snippet below includes two samples. The first 3 lines give the
              equivalent to other suggestions on this thread. It is followed by a
              slightly enhanced version that strips quotation marks from any quoted
              entries.

              -- Paul
              (get pyparsing at http://pyparsing.sourceforge.net)
              ==========
              from pyparsing import *
              test = r'''spam 'it don\'t mean a thing' "the life of brian"
              42 'the meaning of "life"' grail'''
              print(OneOrMore( quotedString | Word(printables) ).parseString( test ))

              # strip quotes during parsing
              def stripQuotes(s, l, toks):
                  return toks[0][1:-1]
              quotedString.setParseAction( stripQuotes )
              print(OneOrMore( quotedString | Word(printables) ).parseString( test ))
              ==========

              returns:
              ['spam', "'it don\\'t mean a thing'", '"the life of brian"', '42',
               '\'the meaning of "life"\'', 'grail']
              ['spam', "it don\\'t mean a thing", 'the life of brian', '42',
               'the meaning of "life"', 'grail']
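
The remark that quoted strings are stateful can be illustrated without any third-party library. Below is a hand-rolled scanner, offered as a sketch rather than a substitute for pyparsing; the function name split_quoted and its exact escaping rules are assumptions. Unlike the pyparsing output above, it removes the backslash from an escaped character.

```python
def split_quoted(text):
    """Split on whitespace, keeping single- or double-quoted runs together.

    Inside quotes, a backslash makes the next character literal.
    Quote characters are removed from the returned tokens.
    """
    tokens = []
    current = []      # characters of the token being built
    quote = None      # the open quote character, or None when outside quotes
    in_token = False  # True once a token has started (so "" gives '')
    chars = iter(text)
    for ch in chars:
        if quote:
            if ch == "\\":
                current.append(next(chars, ""))  # keep the escaped character
            elif ch == quote:
                quote = None                     # closing quote
            else:
                current.append(ch)
        elif ch in "'\"":
            quote = ch                           # opening quote
            in_token = True
        elif ch.isspace():
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
    if quote:
        raise ValueError("unterminated quoted string")
    if in_token:
        tokens.append("".join(current))
    return tokens

test = r'''spam 'it don\'t mean a thing' "the life of brian"
42 'the meaning of "life"' grail'''
print(split_quoted(test))
# ['spam', "it don't mean a thing", 'the life of brian', '42', 'the meaning of "life"', 'grail']
```

The three pieces of state (the token so far, the open quote character, and whether a token has begun) are what a plain split() or a single regular expression has to approximate.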


              • Jeffrey Froman

                #8
                Re: Quote-aware string splitting

                Bengt Richter wrote:

                > Oops, note some spaces inside quotes near ss and missing double quotes in
                > result.

                And here I thought the main problem with my answer was that it didn't split
                unquoted segments into separate words at all! Clearly I missed the
                generalization being sought, and a more robust solution is in order.
                Fortunately, others have been forthcoming with them.

                Thank you,
                Jeffrey
