text processing

  • jitenshah78@gmail.com

    text processing

    I have a string like the following:
    12560/ABC,12567/BC,123,567,890/JK

    I want to group the string above as follows:
    (12560,ABC)
    (12567,BC)
    (123,567,890,JK)

    I tried a regular expression and was able to get the first two groups but not the third one.
    Can a regular expression split the given data into these groups?


  • Marc 'BlackJack' Rintsch

    #2
    Re: text processing

    On Thu, 25 Sep 2008 15:51:28 +0100, jitenshah78@gmail.com wrote:
    > I have a string like the following:
    > 12560/ABC,12567/BC,123,567,890/JK
    >
    > I want to group the string above as follows:
    > (12560,ABC)
    > (12567,BC)
    > (123,567,890,JK)
    >
    > I tried a regular expression and was able to get the first two groups but not the third one.
    > Can a regular expression split the given data into these groups?

    Without regular expressions:

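    def group(string):
        """Yield one tuple per item; an item ends at the field containing '/'."""
        result = []
        # Split on bare commas: the input has no spaces after them.
        for item in string.split(','):
            if '/' in item:
                result.extend(item.split('/'))
                yield tuple(result)
                result = []
            else:
                result.append(item)

    def main():
        string = '12560/ABC,12567/BC,123,567,890/JK'
        print(list(group(string)))

    if __name__ == '__main__':
        main()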

    Ciao,
    Marc 'BlackJack' Rintsch


    • MRAB

      #3
      Re: text processing

      On Sep 25, 6:34 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
      > On Thu, 25 Sep 2008 15:51:28 +0100, jitensha...@gmail.com wrote:
      >> I have a string like the following:
      >> 12560/ABC,12567/BC,123,567,890/JK
      >>
      >> I want to group the string above as follows:
      >> (12560,ABC)
      >> (12567,BC)
      >> (123,567,890,JK)
      >>
      >> I tried a regular expression and was able to get the first two groups but not the third one.
      >> Can a regular expression split the given data into these groups?
      >
      > Without regular expressions:
      >
      > def group(string):
      >     result = []
      >     for item in string.split(','):
      >         if '/' in item:
      >             result.extend(item.split('/'))
      >             yield tuple(result)
      >             result = []
      >         else:
      >             result.append(item)
      >
      > def main():
      >     string = '12560/ABC,12567/BC,123,567,890/JK'
      >     print(list(group(string)))

      How about:
      >>> import re
      >>> string = "12560/ABC,12567/BC,123,567,890/JK"
      >>> r = re.findall(r"(\d+(?:,\d+)*/\w+)", string)
      >>> r
      ['12560/ABC', '12567/BC', '123,567,890/JK']
      >>> [tuple(x.replace(",", "/").split("/")) for x in r]
      [('12560', 'ABC'), ('12567', 'BC'), ('123', '567', '890', 'JK')]
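
The same regex idea can be wrapped in a small function for reuse outside the interactive prompt. This is only a sketch: the names `ITEM` and `group_with_regex` are made up for illustration and do not appear in the original post, and it splits each match with `re.split` instead of the `replace`/`split` pair, which should give the same tuples.

```python
import re

# One item: one or more comma-separated integers, a slash, then a word.
ITEM = re.compile(r"\d+(?:,\d+)*/\w+")

def group_with_regex(text):
    """Return one tuple of fields per 'numbers/word' item found in text."""
    # findall() returns the full matches because the pattern has no
    # capturing groups; each match is then split on ',' and '/'.
    return [tuple(re.split(r"[,/]", item)) for item in ITEM.findall(text)]

if __name__ == "__main__":
    print(group_with_regex("12560/ABC,12567/BC,123,567,890/JK"))
```

Note that text which does not fit the pattern is skipped silently, so an empty or malformed input simply yields fewer tuples.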


      • Paul McGuire

        #4
        Re: text processing

        On Sep 25, 9:51 am, "jitensha...@gmail.com" <jitensha...@gmail.com>
        wrote:
        > I have a string like the following:
        > 12560/ABC,12567/BC,123,567,890/JK
        >
        > I want to group the string above as follows:
        > (12560,ABC)
        > (12567,BC)
        > (123,567,890,JK)
        >
        > I tried a regular expression and was able to get the first two groups but not the third one.
        > Can a regular expression split the given data into these groups?

        Looks like each item is:
        - a comma-delimited list of 1 or more integers
        - a slash
        - a word composed of alpha characters

        And the whole thing is a comma-delimited list of such items.

        Now to implement that in pyparsing:
        >>> data = "12560/ABC,12567/BC,123,567,890/JK"
        >>> from pyparsing import Suppress, delimitedList, Word, alphas, nums, Group
        >>> SLASH = Suppress('/')
        >>> dataitem = delimitedList(Word(nums)) + SLASH + Word(alphas)
        >>> dataformat = delimitedList(Group(dataitem))
        >>> list(map(tuple, dataformat.parseString(data)))
        [('12560', 'ABC'), ('12567', 'BC'), ('123', '567', '890', 'JK')]
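
If pyparsing is not available, the three-part description of an item can also be written down with the standard `re` module and used to reject malformed input. This is a rough sketch under a few assumptions: the names `INTEGERS`, `ITEM`, `WHOLE` and `parse_items` are hypothetical, the word is taken to be ASCII letters only, and unlike pyparsing it does not skip whitespace between tokens.

```python
import re

INTEGERS = r"\d+(?:,\d+)*"        # one or more integers, comma-delimited
ITEM = INTEGERS + r"/[A-Za-z]+"   # integers, a slash, an alphabetic word
# The whole input: one or more items, comma-delimited.
WHOLE = re.compile(ITEM + r"(?:," + ITEM + r")*")

def parse_items(text):
    """Return one tuple per item, or raise ValueError on malformed input."""
    if WHOLE.fullmatch(text) is None:
        raise ValueError("input does not match the described format: %r" % text)
    # ITEM has no capturing groups, so findall() yields whole items.
    return [tuple(re.split(r"[,/]", item)) for item in re.findall(ITEM, text)]

if __name__ == "__main__":
    print(parse_items("12560/ABC,12567/BC,123,567,890/JK"))
```

Validating the whole string first means trailing numbers with no word, such as `"123,ABC"`, raise an error instead of being dropped quietly.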

        Wah-lah! (as one of my wife's 1st graders announced in one of his
        school papers)

        -- Paul

