brackets content regular expression

  • netimen

    brackets content regular expression

    I have a text containing brackets (or what is the correct term for
    '<' and '>'?). I'd like to match the text in the outermost level of
    brackets.

    So, I have something like: 'aaaa 123 < 1 aaa < t bbb < a <tt> ff > > 2 >
    bbbbb'. How do I match the text between the outermost brackets
    ( 1 aaa < t bbb < a <tt> ff > > 2 )?

    P.S. Sorry for my English.
  • Alex_Gaynor

    #2
    Re: brackets content regular expression

    I think this is what you're looking for:

    In [11]: re.compile(r'<(.*)>').findall('aaaa 123 < 1 aaa < t bbb < a <tt> ff > > 2 > bbbbb')
    Out[11]: [' 1 aaa < t bbb < a <tt> ff > > 2 ']


    • Paul McGuire

      #3
      Re: brackets content regular expression

      To match opening and closing parens, delimiters, whatever (I refer to
      these '<>' as "angle brackets" when talking about them in this
      context, otherwise they are just "less than" and "greater than"), you
      will need some kind of stack-based parser. You can write your own
      without much trouble - there are built-ins in pyparsing that do most
      of the work.

      Here is the nestedExpr method:
      >>> from pyparsing import nestedExpr
      >>> print(nestedExpr('<', '>').searchString('aaaa 123 < 1 aaa < t bbb < a <tt> ff > > 2 > bbbbb'))
      [[['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]]

      Note that the results show not the original nested text, but the
      parsed words in a fully nested structure.
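
      For comparison, a hand-rolled stack-based parser that builds the same
      kind of nested structure might look like the sketch below. It is only an
      illustration and not how pyparsing works internally: parse_nested is a
      hypothetical helper, and the sample string assumes the closing brackets
      implied by the nested result shown above.

```python
import re


def parse_nested(text):
    """Return the bracketed groups of `text` as nested lists of words."""
    # Split into '<', '>' and whitespace-separated words.
    tokens = re.findall(r"[<>]|[^<>\s]+", text)
    stack = [[]]  # stack[0] collects everything at the top level
    for tok in tokens:
        if tok == "<":
            group = []
            stack[-1].append(group)  # attach the new group to its parent
            stack.append(group)      # and descend into it
        elif tok == ">":
            if len(stack) == 1:
                raise ValueError("unmatched '>'")
            stack.pop()              # climb back to the parent group
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unmatched '<'")
    # Keep only the bracketed groups, dropping loose top-level words.
    return [item for item in stack[0] if isinstance(item, list)]


result = parse_nested("aaaa 123 < 1 aaa < t bbb < a <tt> ff > > 2 > bbbbb")
print(result)
```

      Each '<' pushes a fresh list and each '>' pops it, so the depth of the
      stack is always the current nesting level.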

      If all you want is the highest-level text, then you can wrap your
      nestedExpr parser inside a call to originalTextFor:
      >>> from pyparsing import originalTextFor
      >>> print(originalTextFor(nestedExpr('<', '>')).searchString('aaaa 123 < 1 aaa < t bbb < a <tt> ff > > 2 > bbbbb'))
      [['< 1 aaa < t bbb < a <tt> ff > > 2 >']]

      More on pyparsing at http://pyparsing.wikispaces.com.

      -- Paul


      • Matimus

        #4
        Re: brackets content regular expression

        I think most people call them "angle brackets". Anyway, it should be
        easy to just match the outermost brackets:
        >>> import re
        >>> text = "aaaa 123 < 1 aaa < t bbb < a <tt> ff > > 2 >"
        >>> r = re.compile("<(.+)>")
        >>> m = r.search(text)
        >>> m.group(1)
        ' 1 aaa < t bbb < a <tt> ff > > 2 '

        In this case the regular expression is greedy by default, matching
        the largest area possible. Note, however, that it won't work if you
        have something like this: "<first> <second>".
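
        To make that limitation concrete, here is a small sketch comparing the
        greedy pattern with a non-greedy one. The sample strings are assumed
        for illustration only. Neither variant copes with both sibling groups
        and nested groups.

```python
import re

siblings = "<first> <second>"
nested = "< a < b > c >"

# Greedy: runs from the first '<' to the last '>', swallowing both groups.
greedy_siblings = re.search(r"<(.+)>", siblings).group(1)

# Non-greedy: stops at the nearest '>', which separates sibling groups...
lazy_siblings = re.findall(r"<(.+?)>", siblings)

# ...but cuts a nested group short at the first inner '>'.
lazy_nested = re.findall(r"<(.+?)>", nested)

print(greedy_siblings)  # first> <second
print(lazy_siblings)    # ['first', 'second']
print(lazy_nested)      # [' a < b ']
```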

        Matt


        • netimen

          #5
          Re: brackets content regular expression

          Thanks, but what if I have several top-level groups and want to match
          them one by one:

          text = "a < b < c > d > here starts a new group: < e < f > g >"

          I want to match first " b < c > d " and then " e < f > g ", but not
          " b < c > d > here starts a new group: < e < f > g ".

          • netimen

            #6
            Re: brackets content regular expression

            There may be different levels of nesting:

            "a < b < c > d > here starts a new group: < 1 < e < f > g > 2 >
            another group: < 3 >"


            • bearophileHUGS@lycos.com

              #7
              Re: brackets content regular expression

              netimen:
              > Thanks, but what if I have several top-level groups and want to
              > match them one by one:
              > text = "a < b < c > d > here starts a new group: < e < f > g >"

              What other requirements do you have? If you list them all at once,
              people will write you the code faster.

              bye,
              Bearophile


              • Pierre Quentel

                #8
                Re: brackets content regular expression

                Hi,

                Regular expressions or pyparsing might be overkill for this problem;
                you can use a simple algorithm that reads each character, increments a
                counter when it finds a < and decrements it when it finds a >. When the
                counter goes back to its initial value, you have reached the end of a
                top-level group.

                Something like:

                def top_level(txt):
                    level = 0
                    start = None  # index of the '<' that opened the current top-level group
                    groups = []
                    for i, car in enumerate(txt):
                        if car == "<":
                            level += 1
                            if start is None:  # 'not start' would misfire for a '<' at index 0
                                start = i
                        elif car == ">" and level > 0:  # ignore a stray '>'
                            level -= 1
                            if start is not None and level == 0:
                                groups.append(txt[start+1:i])
                                start = None
                    return groups

                print(top_level("a < b < 0 > d > < 1 < e < f > g > 2 > < 3 >"))
                # Output: [' b < 0 > d ', ' 1 < e < f > g > 2 ', ' 3 ']
                Best,
                Pierre


                • Matimus

                  #9
                  Re: brackets content regular expression

                  As far as I know, you can't do that with a regular expression (by
                  definition, regular expressions aren't recursive). You can use a
                  regular expression to aid you, but there is no magic expression that
                  will give it to you for free.
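
                  As one possible reading of "use a regular expression to aid
                  you", the sketch below lets re.finditer locate the brackets
                  and keeps the depth bookkeeping in plain Python. The name
                  top_level_groups and the sample text are assumptions made
                  for illustration.

```python
import re


def top_level_groups(text):
    """Yield the text inside each outermost <...> pair."""
    depth = 0
    start = None
    # The regex only finds bracket positions; nesting is tracked by hand.
    for m in re.finditer(r"[<>]", text):
        if m.group() == "<":
            if depth == 0:
                start = m.end()  # content begins right after the '<'
            depth += 1
        elif depth > 0:  # a stray '>' at depth 0 is ignored
            depth -= 1
            if depth == 0:
                yield text[start:m.start()]


sample = "a < b < c > d > here starts a new group: < e < f > g >"
groups = list(top_level_groups(sample))
print(groups)  # [' b < c > d ', ' e < f > g ']
```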

                  In this case it is actually pretty easy to do it without regular
                  expressions at all:
                  >>> text = "a < b < O > d > here starts a new group: < e < f > g >"
                  >>> def get_nested_strings(text, depth=0):
                  ...     stack = []  # indices of the '<' characters still open
                  ...     for i, c in enumerate(text):
                  ...         if c == '<':
                  ...             stack.append(i)
                  ...         elif c == '>':
                  ...             start = stack.pop() + 1
                  ...             if len(stack) == depth:
                  ...                 yield text[start:i]
                  ...
                  >>> for seg in get_nested_strings(text):
                  ...     print(seg)
                  ...
                   b < O > d
                   e < f > g


                  Matt


                  • netimen

                    #10
                    Re: brackets content regular expression

                    Yeah, I know it's quite simple to do manually. I was just interested
                    in whether it could be done with regular expressions. Thank you anyway.
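
                    For what it's worth, Python's standard re module has no
                    recursive patterns, but a pattern can be generated for a
                    fixed maximum nesting depth. The sketch below assumes such
                    a bound is acceptable; bounded_nesting_pattern is a
                    hypothetical helper, not part of any library.

```python
import re


def bounded_nesting_pattern(max_depth):
    """Compile a pattern matching <...> with up to max_depth nested levels inside."""
    inner = r"[^<>]*"  # depth 0: no brackets allowed inside
    for _ in range(max_depth):
        # Each pass allows one more level: plain characters or a nested group.
        inner = r"(?:[^<>]|<" + inner + r">)*"
    return re.compile("<(" + inner + ")>")


text = ("a < b < c > d > here starts a new group: < 1 < e < f > g > 2 > "
        "another group: < 3 >")
matches = bounded_nesting_pattern(3).findall(text)
print(matches)  # [' b < c > d ', ' 1 < e < f > g > 2 ', ' 3 ']
```

                    If the input nests deeper than max_depth, the outermost
                    group no longer matches and the pattern picks up inner
                    groups instead, so the counter-based approaches above stay
                    the safer choice for arbitrary nesting.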