quick regex question

  • Xx r3negade
    New Member
    • Apr 2008
    • 39

    Hi, I am very bad with regexes.
    I need a regular expression that will reduce a URL like this:

    hxxp://example.com/somefolder/something/whatever

    to its base:

    hxxp://example.com

    The two slashes after "http:" complicate things. I know I could just strip the "http://" and add it back afterwards, but as a learning exercise I would like to do this with a single regex.
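A minimal single-regex sketch. It assumes the "base" is the scheme plus everything up to the first "/" after "://", and that the scheme is lowercase. In a Python regex "/" is an ordinary character, so the double slash needs no escaping. `[^/]+` does the work of stopping at the end of the host:

```python
import re

url = "http://example.com/somefolder/something/whatever"

# "/" is not special in Python regexes, so "://" is written literally.
# [^/]+ consumes the host up to the next "/", and .*$ swallows the rest of the path.
base = re.sub(r"^([a-z]+://[^/]+).*$", r"\1", url)
print(base)  # http://example.com
```

If the string contains no "://", the pattern does not match and `re.sub` returns the string unchanged.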

    Thanks in advance
  • benjumanji
    New Member
    • Apr 2009
    • 2

    #2
    erm..

    import re
    # Non-greedy: stop at the first ".com" or ".co.uk" after "http://"
    p = re.compile(r'http://.*?\.(com|co\.uk)')

    You can adjust the end of the pattern to cover as many domain suffixes as you can think of.
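One way to "adjust the end" without hand-editing the pattern is to build the alternation from a list. The list contents below are hypothetical placeholders. `re.escape` handles the dots in suffixes such as "co.uk":

```python
import re

# Hypothetical suffix list: replace with the domains you actually need.
domains = ["com", "co.uk", "org"]

# re.escape turns "co.uk" into "co\.uk". Sorting longest first means that if one
# suffix is a prefix of another (say "co" and "co.uk"), the longer one is tried first.
alternation = "|".join(re.escape(d) for d in sorted(domains, key=len, reverse=True))

# Same shape as the pattern above, with a non-capturing group for the suffixes.
p = re.compile(r"http://.*?\.(?:" + alternation + ")")

for url in ["http://example.com/somefolder/something/whatever",
            "http://example.co.uk/a/b"]:
    m = p.match(url)
    print(m.group(0) if m else None)
```

The trade-off is that every suffix must be listed in advance. A host ending in an unlisted suffix either fails to match or, if a listed suffix appears later in the path (as in "http://example.net/page.com"), matches too much.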

    • bvdet
      Recognized Expert Specialist
      • Oct 2006
      • 2851

      #3
      The regex you need depends on which forms of URL you have to handle. Here is one possible solution:
      Code:
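      import re
      
      # Scheme, one or more slashes, then a single-dot host (e.g. "example.com"),
      # captured up to but not including the next "/"
      patt = re.compile(r'([a-z]+?:/+?\w+?\.\w+?)/')
      
      m = patt.match("hxxp://example.com/somefolder/something/whatever")
      if m:
          print(m.group(1))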
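To see how the result depends on the shape of the URL, here is the same pattern run against a few inputs. The helper `base_or_none` and the two extra URLs are illustrative additions, not part of the original reply:

```python
import re

# The pattern from the reply above, unchanged.
patt = re.compile(r'([a-z]+?:/+?\w+?\.\w+?)/')

def base_or_none(url):
    """Return the captured base, or None if the pattern does not match."""
    m = patt.match(url)
    return m.group(1) if m else None

print(base_or_none("hxxp://example.com/somefolder/something/whatever"))  # hxxp://example.com
print(base_or_none("http://www.example.com/page"))  # None: \w+?\.\w+? allows only one dot
print(base_or_none("http://example.com"))           # None: the pattern needs a trailing "/"
```

Hosts with more than one dot and URLs with no path both fall through. The pattern therefore fits the example in the question but not URLs in general.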
