re.findall() hangs in python

  • silverburgh.meryl@gmail.com

    re.findall() hangs in python

    Hi,

    I have the following regular expression.
    It works when 'data' contains the pattern and I see 'match2' get
    printed out.
    But when 'data' does not contain the pattern, it just hangs at
    're.findall'.

    pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)",
                         re.S)

    print "before find all"

    match = re.findall(pattern, data)

    if (match):
        print "match2"

    Can you please tell me why that is?

  • Peter Otten

    #2
    Re: re.findall() hangs in python

    silverburgh.meryl@gmail.com wrote:
    > I have the following regular expression.
    > It works when 'data' contains the pattern and I see 'match2' get
    > printed out.
    > But when 'data' does not contain the pattern, it just hangs at
    > 're.findall'.
    >
    > pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)",
    > re.S)
    >
    > print "before find all"
    >
    > match = re.findall(pattern, data)
    >
    > if (match):
    >     print "match2"
    >
    > Can you please tell me why that is?

    Could it be that it is just slow? If not, post a small example of data that
    provokes findall() to hang.
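
One way to check whether the call is slow rather than truly hung is to time it on inputs of growing size. The sketch below is only an illustration: it assumes the spaces inside the groups of the posted pattern are posting artifacts, uses Python 3 syntax, and feeds in made-up filler text that contains no <img> tag at all. The helper name time_findall is hypothetical.

```python
import re
import time

# The pattern from the original post, assuming the stray spaces inside
# the groups were introduced by the posting software.
SLOW = re.compile('(.*)<img (.*?) src="(.*?)img(.*?)"(.*?)', re.S)


def time_findall(pattern, data):
    """Return (matches, elapsed_seconds) for a single findall() call."""
    start = time.perf_counter()
    matches = pattern.findall(data)
    return matches, time.perf_counter() - start


# Hypothetical input: filler text with no <img> tag, so nothing can match.
for size in (500, 1000, 2000):
    data = "x" * size
    matches, seconds = time_findall(SLOW, data)
    print(size, len(matches), round(seconds, 4))
```

If the elapsed time grows much faster than the input size, the call is most likely slow and not stuck. With a leading (.*) the engine has to rescan the rest of the text from every starting position, so roughly quadratic growth would be expected on input like this.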

    Peter

    • 7stud

      #3
      Re: re.findall() hangs in python

      On Mar 31, 9:12 pm, "silverburgh.me...@gmail.com"
      <silverburgh.me...@gmail.com> wrote:
      > Hi,
      >
      > I have the following regular expression.
      > It works when 'data' contains the pattern and I see 'match2' get
      > printed out.
      > But when 'data' does not contain the pattern, it just hangs at
      > 're.findall'.
      >
      > pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)",
      > re.S)
      >
      > print "before find all"
      >
      > match = re.findall(pattern, data)
      >
      > if (match):
      >     print "match2"
      >
      > Can you please tell me why that is?

      It doesn't hang when I try it. Why don't you post a complete example
      that hangs?

      Also, you might consider using exterior single quotes around your
      string so that you don't have to escape double quotes inside the
      string.
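
As a small illustration of the quoting suggestion, the two literals below should spell the same pattern text. This assumes the spaces inside the groups of the posted pattern are posting artifacts; the variable names are made up for the example.

```python
import re

# Double-quoted literal: every inner double quote needs a backslash.
double_quoted = "(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)"

# Single-quoted literal: the inner double quotes can stay as they are.
single_quoted = '(.*)<img (.*?) src="(.*?)img(.*?)"(.*?)'

# Both spellings produce the same string, so they compile to the same regex.
print(double_quoted == single_quoted)
pattern = re.compile(single_quoted, re.S)
```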

      • Gabriel Genellina

        #4
        Re: re.findall() hangs in python

        On Sun, 01 Apr 2007 03:58:51 -0300, Peter Otten <__peter__@web.de>
        wrote:
        > silverburgh.meryl@gmail.com wrote:
        >
        >> I have the following regular expression.
        >> It works when 'data' contains the pattern and I see 'match2' get
        >> printed out.
        >> But when 'data' does not contain the pattern, it just hangs at
        >> 're.findall'.
        >>
        >> pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)",
        >> re.S)
        >
        > Could it be that it is just slow? If not, post a small example of data
        > that provokes findall() to hang.

        I bet it is very slooooooow!
        To the OP: do you actually need all those groups? Especially the first and
        last (.*), they match all the surrounding text.
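
A rough sketch of what the outer groups do, using made-up sample data and Python 3 syntax, and assuming the spaces inside the posted groups are posting artifacts. The names WITH_OUTER and TRIMMED are hypothetical. Besides the extra work, the greedy leading (.*) swallows every earlier tag, so findall() reports only the last one.

```python
import re

# Pattern as posted (stray spaces removed) and a trimmed variant without
# the leading (.*) and trailing (.*?) groups.
WITH_OUTER = re.compile('(.*)<img (.*?) src="(.*?)img(.*?)"(.*?)', re.S)
TRIMMED = re.compile('<img (.*?) src="(.*?)img(.*?)"', re.S)

# Hypothetical sample data, not from the original post.
data = '<img a src="img1.png"> text <img b src="img2.png"> end'

# The greedy first group consumes everything up to the last usable tag,
# so there is a single match whose first group holds the earlier tag.
with_outer = WITH_OUTER.findall(data)
print(with_outer)

# Without the outer groups, each tag is reported separately.
trimmed = TRIMMED.findall(data)
print(trimmed)
```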

        --
        Gabriel Genellina

        • irstas@gmail.com

          #5
          Re: re.findall() hangs in python

          On Apr 1, 6:12 am, "silverburgh.me...@gmail.com"
          <silverburgh.me...@gmail.com> wrote:
          > But when 'data' does not contain the pattern, it just hangs at
          > 're.findall'.
          >
          > pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)",
          > re.S)

          That pattern is just really slow to evaluate. What you want is
          probably something more like this:

          re.compile(r'<img [^>]*src\s*=\s*"([^"]*img[^"]*)"')

          "dot" is usually not so great. Prefer "NOT end-character", like [^>]
          or [^"].
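
A minimal usage sketch of the suggested pattern, in Python 3 syntax. The sample HTML is invented for illustration, and IMG_SRC is a hypothetical name. Because [^>] cannot run past the end of a tag and [^"] cannot run past the end of an attribute value, each attempt stays inside one tag and fails quickly when nothing matches.

```python
import re

# Pattern suggested above: capture the src value of <img> tags whose
# src contains the substring "img".
IMG_SRC = re.compile(r'<img [^>]*src\s*=\s*"([^"]*img[^"]*)"')

# Hypothetical sample data, not from the original post.
html = (
    '<p>intro</p>'
    '<img class="a" src="/static/img/logo.png" alt="logo">'
    '<img src="/photos/cat.jpg">'
    '<img  width="10" src = "http://example.com/img_01.gif">'
)

# Only the first and third tags have "img" inside the src value.
matches = IMG_SRC.findall(html)
print(matches)

# Input without any <img> tag returns an empty list right away.
no_match = IMG_SRC.findall("x" * 100000)
print(no_match)
```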

          • silverburgh.meryl@gmail.com

            #6
            Re: re.findall() hangs in python

            On Apr 1, 5:23 am, irs...@gmail.com wrote:
            > On Apr 1, 6:12 am, "silverburgh.me...@gmail.com"
            > <silverburgh.me...@gmail.com> wrote:
            >> But when 'data' does not contain the pattern, it just hangs at
            >> 're.findall'.
            >>
            >> pattern = re.compile("(.*)<img (.*?) src=\"(.*?)img(.*?)\"(.*?)",
            >> re.S)
            >
            > That pattern is just really slow to evaluate. What you want is
            > probably something more like this:
            >
            > re.compile(r'<img [^>]*src\s*=\s*"([^"]*img[^"]*)"')
            >
            > "dot" is usually not so great. Prefer "NOT end-character", like [^>]
            > or [^"].

            Thank you. Your suggestion solves my problem!
