finding indices in a sequence of parentheses

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Steven Bethard

    finding indices in a sequence of parentheses

    I have a list of strings that looks something like:

    lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']

    The parentheses in the labels indicate where an "annotation " starts and
    ends. So for example, the label '(*)' at index 2 of the list means that
    I have an annotation at (2, 2), and the labels '(*', '*', '(*', '*))' at
    indices 4 through 7 mean that I have an annotation at (4, 7) and an
    annotation at (6, 7).

    I'd like to determine all indices at which I have an annotation. So for
    the data above, I want the indices:

    (2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)

    Here's what I'm doing now:

    py> def indices(lst):
    .... stack = []
    .... for i, s in enumerate(lst):
    .... if s == 'O':
    .... continue
    .... stack.extend([i]*s.count('('))
    .... if '*' in s and not stack:
    .... raise Exception('No start for %r at %i' % (s, i))
    .... for _ in range(s.count(' )')):
    .... try:
    .... yield stack.pop(), i
    .... except IndexError:
    .... raise Exception('No start for %r at %i' % (s, i))
    .... if stack:
    .... raise Exception('No ends for starts at %r' % stack)
    ....
    py> list(indices(['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*',
    '*)', '*)', '0']))
    [(2, 2), (6, 7), (4, 7), (8, 9), (8, 10)]

    I think that works right, but I'm not certain. So two questions:

    (1) Can anyone see anything wrong with the code above? and
    (2) Does anyone see an easier/clearer/simpler[1] way of doing this?

    Thanks,

    STeVe

    [1] Yes, I know easier/clearer/simpler are subjective terms. It's okay,
    I'm only looking for opinions here anyway. =)
  • tiissa

    #2
    Re: finding indices in a sequence of parentheses

    Steven Bethard wrote:[color=blue]
    > (2) Does anyone see an easier/clearer/simpler[1] way of doing this?[/color]

    I'd personnally extract the parenthesis then zip the lists of indices.
    In short:
    [color=blue][color=green][color=darkred]
    >>> def indices(mylist) :[/color][/color][/color]
    ... lopen=reduce(li st.__add__, [[i]*s.count('(') for i,s in
    enumerate(mylis t)],[])
    ... lclose=reduce(l ist.__add__, [[i]*s.count(')') for i,s in
    enumerate(mylis t)],[])
    ... return zip(lopen,lclos e)
    ...[color=blue][color=green][color=darkred]
    >>> indices(lst)[/color][/color][/color]
    [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)][color=blue][color=green][color=darkred]
    >>>[/color][/color][/color]

    Before returning, you can check if the lists have same size and if the
    '(' has lower or equal index than ')' in each of these couples. If not
    you can raise the appropriate exception.

    Disclaimer: not tested further than example above (but confident).

    Comment

    • Steven Bethard

      #3
      Re: finding indices in a sequence of parentheses

      tiissa wrote:[color=blue]
      > I'd personnally extract the parenthesis then zip the lists of indices.
      > In short:
      >[color=green][color=darkred]
      > >>> def indices(mylist) :[/color][/color]
      > ... lopen=reduce(li st.__add__, [[i]*s.count('(') for i,s in
      > enumerate(mylis t)],[])
      > ... lclose=reduce(l ist.__add__, [[i]*s.count(')') for i,s in
      > enumerate(mylis t)],[])
      > ... return zip(lopen,lclos e)
      > ...[color=green][color=darkred]
      > >>> indices(lst)[/color][/color]
      > [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)][color=green][color=darkred]
      > >>>[/color][/color][/color]

      Thanks, that's a good idea. In case anyone else is reading this thread,
      and had to mentally unwrap the reduce expressions, I believe they could
      be written as:

      lopen = [x for i, s in enumerate(lst) for x in [i]*s.count('(')]
      lclose = [x for i, s in enumerate(lst) for x in [i]*s.count(')')]

      or maybe:

      lopen = [i for i, s in enumerate(lst) for _ in xrange(s.count( '('))]
      lclose = [i for i, s in enumerate(lst) for _ in xrange(s.count( ')'))]

      Sorry, I have an irrational fear of reduce. ;)

      STeVe

      Comment

      • Raymond Hettinger

        #4
        Re: finding indices in a sequence of parentheses

        [Steven Bethard][color=blue][color=green]
        >> I have a list of strings that looks something like:
        >> lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)'][/color][/color]
        . . .[color=blue][color=green]
        >> I want the indices:
        >> (2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)[/color][/color]



        opener_stack = []
        for i, elem in enumerate(lst):
        for c in elem:
        if c == '(':
        opener_stack.ap pend(i)
        elif c == ')':
        print opener_stack.po p(), i


        To see something like this in production code, look at
        Tools/scripts/texcheck.py



        Raymond Hettinger

        Comment

        • Peter Otten

          #5
          Re: finding indices in a sequence of parentheses

          tiissa wrote:
          [color=blue]
          > Steven Bethard wrote:[color=green]
          >> (2) Does anyone see an easier/clearer/simpler[1] way of doing this?[/color]
          >
          > I'd personnally extract the parenthesis then zip the lists of indices.
          > In short:
          >[color=green][color=darkred]
          > >>> def indices(mylist) :[/color][/color]
          > ... lopen=reduce(li st.__add__, [[i]*s.count('(') for i,s in
          > enumerate(mylis t)],[])
          > ... lclose=reduce(l ist.__add__, [[i]*s.count(')') for i,s in
          > enumerate(mylis t)],[])
          > ... return zip(lopen,lclos e)
          > ...[color=green][color=darkred]
          > >>> indices(lst)[/color][/color]
          > [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)][color=green][color=darkred]
          > >>>[/color][/color]
          >
          > Before returning, you can check if the lists have same size and if the
          > '(' has lower or equal index than ')' in each of these couples. If not
          > you can raise the appropriate exception.
          >
          > Disclaimer: not tested further than example above (but confident).[/color]

          Not tested but confident should be an oxymoron for a programmer. Some
          examples:

          lst: ['(', '(', ')', ')']
          hettinger [(1, 2), (0, 3)]
          bethard [(1, 2), (0, 3)]
          tiissa [(0, 2), (1, 3)] oops (or am I just spoilt by the XML spec?)

          lst: ['(', ')(', ')']
          hettinger [(0, 1), (1, 2)]
          bethard [(1, 1), (0, 2)] oops
          tiissa [(0, 1), (1, 2)]

          So far Raymond's solution is my favourite...

          Peter

          Comment

          • tiissa

            #6
            Re: finding indices in a sequence of parentheses

            Peter Otten wrote:[color=blue]
            > tiissa wrote:[color=green]
            >>Disclaimer: not tested further than example above (but confident).[/color]
            >
            > Not tested but confident should be an oxymoron for a programmer.[/color]

            Not tested but confident is an oxymoron for mathemtaticians .
            Programmers know better than that, they leave bugs in their code to have
            more work to do. ;)

            OTOH, you're right. Matching parentheses cannot be done without a stack,
            that should have rung a bell.

            Comment

            • Peter Otten

              #7
              Re: finding indices in a sequence of parentheses

              tiissa wrote:
              [color=blue]
              > Not tested but confident is an oxymoron for mathemtaticians .[/color]

              I think no amount of testing will give these strange people confidence.
              "Proof" is the magic word here.

              Peter

              Comment

              • tiissa

                #8
                Re: finding indices in a sequence of parentheses

                Peter Otten wrote:[color=blue]
                > I think no amount of testing will give these strange people confidence.
                > "Proof" is the magic word here.[/color]

                Some would maybe be satisfied if your tests cover the whole set of input.

                When that's possible, that may be useless. But that's not a matter to
                bother them with. ;)

                (And of course, you just don't tell them how you generated their output
                with the same program.)

                Comment

                • Steven Bethard

                  #9
                  Re: finding indices in a sequence of parentheses

                  Raymond Hettinger wrote:[color=blue]
                  > [Steven Bethard]
                  >[color=green][color=darkred]
                  >>>I have a list of strings that looks something like:
                  >>>lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)'][/color][/color]
                  >
                  > . . .
                  >[color=green][color=darkred]
                  >>>I want the indices:
                  >>>(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)[/color][/color]
                  >
                  > opener_stack = []
                  > for i, elem in enumerate(lst):
                  > for c in elem:
                  > if c == '(':
                  > opener_stack.ap pend(i)
                  > elif c == ')':
                  > print opener_stack.po p(), i
                  >[/color]

                  Thanks Raymond, this is definitely an elegant solution. It was also easy
                  to add all the error checking I needed into this one. For the curious,
                  my final solution looks something like:

                  def indices(lst):
                  stack = []
                  for i, elem in enumerate(lst):
                  for c in elem:
                  if c == '(':
                  stack.append(i)
                  elif c == ')' and not stack:
                  raise Exception('")" at %i without "("' % i)
                  elif c == ')':
                  yield stack.pop(), i
                  elif c == 'O' and stack:
                  raise Exception('"O" at %i after "(" at %i' %
                  (i, stack[-1]))
                  elif c == '*' and not stack:
                  raise Exception('"*" at %i without "("' % i)
                  if stack:
                  raise Exception('"(" at %r without ")"' % stack)

                  Thanks again!

                  STeVe

                  Comment

                  Working...