Python Regex Question

  • MalteseUnderdog

    Python Regex Question


    Hi there, I just started Python (but this question isn't that trivial,
    since I couldn't find it on Google :) )

    I have the following text file entries (simplified)

    start #frag 1 start
    x=Dog # frag 1 end
    stop
    start # frag 2 start
    x=Cat # frag 2 end
    stop
    start #frag 3 start
    x=Dog #frag 3 end
    stop
    .....

    I need a regex expression which returns the start to the x=ANIMAL for
    only the x=Dog fragments so all my entries should be start ...
    (something here) ... x=Dog . So I am really interested in fragments 1
    and 3 only.

    My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
    would return results

    start
    x=Dog # (good)

    and

    start
    x=Cat
    stop
    start
    x=Dog # bad since I only want start ... x=Dog portion

    Can you help me?

    Thanks
    JP, Malta.
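
To make the problem described above concrete, here is a minimal sketch that runs the "primitive" pattern against the sample data. The flags re.DOTALL and re.MULTILINE are an assumption, since the question does not say which flags were used; without DOTALL the pattern cannot cross line breaks at all.

```python
import re

# Sample data copied from the question.
TEXT = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop"""

# The pattern from the question. DOTALL lets '.' cross line breaks and
# MULTILINE anchors '^' at every line start (both flags are assumed).
naive = re.compile(r'^start.*?x=Dog', re.DOTALL | re.MULTILINE)

matches = naive.findall(TEXT)
for m in matches:
    print(repr(m))
```

The first match is the wanted fragment 1 text. The second match begins at the fragment 2 "start" and runs through the x=Cat and stop lines until it reaches the x=Dog in fragment 3, because a lazy quantifier only keeps the match short from a fixed starting point; it does not stop the match from beginning too early.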
  • Tim Chase

    #2
    Re: Python Regex Question

    > I need a regex expression which returns the start to the x=ANIMAL for
    > only the x=Dog fragments so all my entries should be start ...
    > (something here) ... x=Dog . So I am really interested in fragments 1
    > and 3 only.
    >
    > My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
    > would return results
    >
    > start
    > x=Dog # (good)
    >
    > and
    >
    > start
    > x=Cat
    > stop
    > start
    > x=Dog # bad since I only want start ... x=Dog portion

    Looks like the following does the trick:
    >>> s = """start #frag 1 start
    ... x=Dog # frag 1 end
    ... stop
    ... start # frag 2 start
    ... x=Cat # frag 2 end
    ... stop
    ... start #frag 3 start
    ... x=Dog #frag 3 end
    ... stop"""
    >>> import re
    >>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
    >>> for i, result in enumerate(r.findall(s)):
    ...     print(i, repr(result))
    ...
    0 'start #frag 1 start\nx=Dog # frag 1 end\nstop'
    1 'start #frag 3 start\nx=Dog #frag 3 end\nstop'

    -tkc
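
The pattern in this post returns each whole fragment, including the stop line. If only the text from start through x=Dog is wanted, as the question asks, one possible variation is to put a capture group around that part. This is a sketch under the same assumption as the original pattern, namely that x=Dog is on the line directly after start; the name `trimmed` is only illustrative.

```python
import re

# Sample data copied from the question.
S = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop"""

# The parentheses capture only the text from 'start' through 'x=Dog'.
# The rest of the x=Dog line and the following 'stop' line must still
# be present for a fragment to match.
r = re.compile(r'^(start.*\nx=Dog).*\nstop', re.MULTILINE)

# With exactly one capture group, findall returns that group's text.
trimmed = r.findall(S)
for i, result in enumerate(trimmed):
    print(i, repr(result))
```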








    • Arnaud Delobelle

      #3
      Re: Python Regex Question

      On Oct 29, 7:01 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
      > > I need a regex expression which returns the start to the x=ANIMAL for
      > > only the x=Dog fragments so all my entries should be start ...
      > > (something here) ... x=Dog .  So I am really interested in fragments 1
      > > and 3 only.
      > >
      > > My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
      > > would return results
      > >
      > > start
      > > x=Dog  # (good)
      > >
      > > and
      > >
      > > start
      > > x=Cat
      > > stop
      > > start
      > > x=Dog # bad since I only want start ... x=Dog portion
      >
      > Looks like the following does the trick:
      >
      > >>> s = """start      #frag 1 start
      > ... x=Dog # frag 1 end
      > ... stop
      > ... start    # frag 2 start
      > ... x=Cat # frag 2 end
      > ... stop
      > ... start     #frag 3 start
      > ... x=Dog #frag 3 end
      > ... stop"""
      > >>> import re
      > >>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
      > >>> for i, result in enumerate(r.findall(s)):
      > ...     print(i, repr(result))
      > ...
      > 0 'start      #frag 1 start\nx=Dog # frag 1 end\nstop'
      > 1 'start     #frag 3 start\nx=Dog #frag 3 end\nstop'
      >
      > -tkc

      This will only work if 'x=Dog' directly follows 'start' (which happens
      in the given example). If that's not necessarily the case, I would do
      it in two steps (in fact I probably wouldn't use regexps, but...):
      >>> for chunk in re.split(r'\nstop', data):
      ...     m = re.search('^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
      ...     if m: print(repr(m.group()))
      ...
      'start #frag 1 start\nx=Dog'
      'start #frag 3 start\nx=Dog'

      --
      Arnaud
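
For comparison, the same "x=Dog need not directly follow start" case can also be handled in a single pattern by using a negative lookahead so that a match never crosses a stop line. This is a different technique from the two-step split shown in this post, offered only as a sketch. The extra `y=1` line in the data is hypothetical and was added to exercise the case, and the pattern assumes that no ordinary line inside a fragment begins with the text "stop".

```python
import re

# Hypothetical data: an extra line sits between 'start' and 'x=Dog'
# in the first fragment.
DATA = """start #frag 1 start
y=1
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop"""

# After the 'start' line, lazily consume whole lines that do not begin
# with 'stop', then require a line that begins with 'x=Dog'.
pattern = re.compile(r'^start.*\n(?:(?!stop).*\n)*?x=Dog', re.MULTILINE)

# No capture groups are used, so findall returns the full match text.
results = pattern.findall(DATA)
for item in results:
    print(repr(item))
```

Fragment 2 produces no match because the only way to reach a later x=Dog would be to consume its stop line, which the lookahead forbids.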


      • Terry Reedy

        #4
        Re: Python Regex Question

        MalteseUnderdog wrote:
        > Hi there I just started python (but this question isn't that trivial
        > since I couldn't find it in google :) )
        >
        > I have the following text file entries (simplified)
        >
        > start #frag 1 start
        > x=Dog # frag 1 end
        > stop
        > start # frag 2 start
        > x=Cat # frag 2 end
        > stop
        > start #frag 3 start
        > x=Dog #frag 3 end
        > stop
        > ....
        >
        > I need a regex expression which returns the start to the x=ANIMAL for
        > only the x=Dog fragments so all my entries should be start ...
        > (something here) ... x=Dog . So I am really interested in fragments 1
        > and 3 only.

        As I understand the above....
        I would first write a generator that separates the file into fragments
        and yields them one at a time. Perhaps something like

        def fragments(ifile):
            frag = ''                        # accumulate lines as one string
            for line in ifile:
                frag += line
                if line.startswith('stop'):  # line ends fragment (assumed from the sample)
                    yield frag
                    frag = ''

        Then I would iterate through fragments, testing for the ones I want:

        for frag in fragments(somefile):
            if 'x=Dog' in frag:
                print(frag)                  # <do whatever>

        Terry Jan Reedy
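
A runnable variant of the generator approach is sketched below, using the sample data from the question. It keeps each fragment as a list of lines so the result can be trimmed to end at x=Dog. It assumes that a line beginning with "stop" ends a fragment; the helper `start_to_dog` and the use of io.StringIO in place of a real file are illustrative choices and not part of the original post.

```python
import io

# Sample data copied from the question.
SAMPLE = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop"""


def fragments(lines):
    """Yield each start...stop fragment as a list of lines."""
    frag = []
    for line in lines:
        frag.append(line)
        if line.startswith('stop'):  # assumed fragment terminator
            yield frag
            frag = []


def start_to_dog(frag):
    """Return the text from 'start' through 'x=Dog', or None if absent."""
    for i, line in enumerate(frag):
        if line.startswith('x=Dog'):
            # Lines keep their trailing newline, so joining restores the text.
            return ''.join(frag[:i]) + 'x=Dog'
    return None


wanted = []
for frag in fragments(io.StringIO(SAMPLE)):  # StringIO stands in for a file
    piece = start_to_dog(frag)
    if piece is not None:
        wanted.append(piece)

for piece in wanted:
    print(repr(piece))
```

Because the fragments are separated before any test is applied, a match can never spill from one fragment into the next, which is the failure mode of the pattern in the original question.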
