Trouble escaping regex strings

  • Xx r3negade
    New Member
    • Apr 2008
    • 39

    Trouble escaping regex strings

    I'm using regular expressions to parse HTML hyperlinks and I've run into a problem. I'm trying to escape characters
    such as '.' and '?' for use in regular expressions, but it's not working.

    Code:
    # Grabs a link.  For this example, let's say that the string grabbed is '<a href="http://google.com/?q=foo">Click</a>'
    link_url_original = GetLink()
    
    # Sanitize string for regex use
    link_url_original = re.sub("\.", "\.", link_url_original)
    link_url_original = re.sub("\?", "\?", link_url_original)
    
    toSub = 'http://google.com/?q=foo'
    toRpl = 'http://www.yahoo.com'
    
    final = re.sub(toSub, toRpl, link_url_original)
    print final
    The output is:
    Code:
    <a href="http://google\.com/\?q=foo">Click</a>
    Why aren't the added slashes being interpreted as escape characters?
  • Das123
    New Member
    • Aug 2008
    • 2

    #2
    The regular expression is the match string, not the replace string. The correct syntax would be something like:

    Code:
    result = re.sub("\.", ".", subject)


    • Xx r3negade
      New Member
      • Apr 2008
      • 39

      #3
      Originally posted by Das123
      The regular expression is the match string, not the replace string. The correct syntax would be like
      Code:
      result = re.sub("\.", ".", subject)
      Replace a "." with another "."? What???


      • Das123
        New Member
        • Aug 2008
        • 2

        #4
        Ahh, sorry. I didn't understand what you were trying to do.

        The answer is that toSub (the pattern) needs to be escaped, not link_url_original (the string being searched)...

        Code:
        import re
        #link_url_original = GetLink()
        link_url_original = '<a href="http://google.com/?q=foo">Click</a>'
        
        toSub = "http://google.com/?q=foo"
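        # Sanitize the pattern (not the subject string) for regex use.
        # Raw strings keep the backslashes literal; r"\\." in the replacement
        # produces a backslash followed by a dot.
        toSub = re.sub(r"\.", r"\\.", toSub)
        toSub = re.sub(r"\?", r"\\?", toSub)
        toRpl = "http://www.yahoo.com"
        
        final = re.sub(toSub, toRpl, link_url_original)
        print(final)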
        The result is...

        <a href="http://www.yahoo.com">Click</a>
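
For what it's worth, the standard library can probably do the escaping for you. Here is a minimal sketch, assuming the same example strings as above: re.escape() backslash-escapes the regex metacharacters in the pattern, so you don't have to list '.' and '?' by hand. The names to_sub, to_rpl, pattern and final_plain are just illustrative. If the text being replaced is a fixed string, as it is here, plain str.replace() avoids regular expressions entirely.

```python
import re

# Example subject string, taken from the thread
link_url_original = '<a href="http://google.com/?q=foo">Click</a>'

to_sub = "http://google.com/?q=foo"   # literal text to find
to_rpl = "http://www.yahoo.com"       # replacement text

# re.escape() escapes the regex metacharacters in to_sub,
# so the resulting pattern matches that exact text.
pattern = re.escape(to_sub)
final = re.sub(pattern, to_rpl, link_url_original)
print(final)

# For a fixed string, str.replace() needs no regex (and no escaping) at all.
final_plain = link_url_original.replace(to_sub, to_rpl)
print(final_plain)
```

Both calls should give the same link pointing at yahoo.com. One caveat: if the replacement string ever contains backslashes, re.sub() will treat them as escapes too, which is another reason to prefer str.replace() for literal text.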


        • Xx r3negade
          New Member
          • Apr 2008
          • 39

          #5
          Ah, I can't believe I didn't catch that, thanks.
