String Regex problem

This topic is closed.
  • Fazer

    String Regex problem

    Hello,

    I have a string which has a url (Begins with a http://) somewhere in
    it. I want to detect such a url and just spit out the url. Since I
    am very poor in regex, can someone show me how to do it using a few
    examples?

    Thanks a lot!
  • djw

    #2
    Re: String Regex problem

    Fazer wrote:
    > Hello,
    >
    > I have a string which has a url (Begins with a http://) somewhere in
    > it. I want to detect such a url and just spit out the url. Since I
    > am very poor in regex, can someone show me how to do it using a few
    > examples?
    >
    > Thanks a lot!

    I would look here to improve your regex skills:

    http://www.amk.ca/python/howto/regex/

    Also, I find Kodos to be invaluable in developing and debugging regexes.
    Highly recommended.

    http://kodos.sourceforge.net


    Of course, you could just use urlparse in the standard library...
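A minimal sketch of the urlparse idea, assuming a current Python where the module lives at urllib.parse (in 2003 it was the top-level urlparse module). Note that urlparse only splits a URL that has already been isolated; it does not search free text, so this sketch first splits on whitespace. The helper name http_urls and the sample string are made up for illustration.

```python
from urllib.parse import urlparse


def http_urls(text):
    """Return whitespace-separated tokens that parse as http URLs with a host."""
    found = []
    for token in text.split():
        parts = urlparse(token)
        # Keep only tokens with an explicit http scheme and a network location.
        if parts.scheme == "http" and parts.netloc:
            found.append(token)
    return found


sample = "see http://www.python.org/doc/ for details"
print(http_urls(sample))
```

Trailing punctuation glued to a URL would still be kept by this approach, which is one reason a regex may end up being the better fit.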

    Good luck,

    Don




    • Skip Montanaro

      #3
      Re: String Regex problem

      >> Since I am very poor in regex, can someone show me how to do it using
      >> a few examples?

      Don> http://www.amk.ca/python/howto/regex/
      ...
      Don> http://kodos.sourceforge.net

      If you're a Mac Python person there's also Dinu Gherman's excellent
      RegexPlor:

      http://starship.python.net/crew/gherman/RegexPlor.html

      Even if you're not, it's worth popping over there to watch the MPEG clip of
      RegexPlor in action.

      Skip


      • Andrei

        #4
        Re: String Regex problem

        Skip Montanaro wrote on Mon, 24 Nov 2003 21:35:48 -0600:
        > >> Since I am very poor in regex, can someone show me how to do it using
        > >> a few examples?
        >
        <snip>
        > Don> http://kodos.sourceforge.net
        >
        > If you're a Mac Python person there's also Dinu Gherman's excellent
        > RegexPlor:
        >
        > http://starship.python.net/crew/gherman/RegexPlor.html
        <snip>

        I'm biased here, but Kiki (http://project5.freezope.org/kiki) is
        cross-platform and depends on wxPython rather than Qt, which is much
        easier for Windows users.

        Anyway, here's a regex I ripped out of my own code - you might want to
        simplify it though:

        """Regex for finding URLs:
        URLs start with http(s)/ftp/news ((http)|(ftp)|(news))
        followed by ://
        then any number of non-whitespace characters including
        numbers, dots, forward slashes, commas, question marks,
        ampersands, equality signs, dashes, underscores and plusses,
        but ending in a non-dot and non-plus!

        Result:

        (?:http|https|ftp|news)://(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+[a-zA-Z0-9,/%:\&#\?=\-_]

        Tests:
        Plain old link: http://www.mail.yahoo.com.
        Containing numbers: ftp://bla.com/di~ng/co.rt,39,%93 or other
        Go to news://bl_a.com/?ha-h+a&query=tb for more info.
        A real link: <a href="http://x.com">http://x.com</a>.
        ftp://verylong.org/url/must/be/chopp...itwontfit.html
        (long one)
        <IMG src="http://b.com/image.gif" /> (a plain image tag)
        <a href=http://fixedlink.com/orginialinvalid .html>fixed</a> (original
        invalid HTML)
        Link containing an anchor
        <b>"http://myhomepage.com/index.html#01"</b>.
        """
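As a rough illustration of how the pattern above could be applied, here is a sketch that compiles it with Python's re module and runs it over three of the test lines. The name URL_RE and the choice of findall are assumptions made for this example, not part of the original code.

```python
import re

# The pattern from the docstring above, split across lines for readability.
URL_RE = re.compile(
    r"(?:http|https|ftp|news)://"
    r"(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+"
    r"[a-zA-Z0-9,/%:\&#\?=\-_]"
)

tests = [
    "Plain old link: http://www.mail.yahoo.com.",
    "Go to news://bl_a.com/?ha-h+a&query=tb for more info.",
    '<b>"http://myhomepage.com/index.html#01"</b>.',
]

for line in tests:
    # With only non-capturing groups, findall returns the full matched URLs.
    print(URL_RE.findall(line))
```

The sentence-ending dot in the first line is left out of the match because the final character class excludes dots. One caveat: the nested quantifiers in the middle group can backtrack heavily on long inputs that fail to match, so simplifying the pattern may be worthwhile for untrusted text.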

        --
        Yours,

        Andrei

        =====
        Mail address in header catches spam. Real contact info (decode with rot13):
        cebwrpg5@jnanqbb.ay. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
        gur yvfg, fb gurer'f ab arrq gb PP.



        • Fazer

          #5
          Re: String Regex problem

          djw <dwelch91.nospam@comcast.net> wrote in message news:<fSzwb.293286$HS4.2642954@attbi_s01>...
          > Fazer wrote:
          > > Hello,
          > >
          > > I have a string which has a url (Begins with a http://) somewhere in
          > > it. I want to detect such a url and just spit out the url. Since I
          > > am very poor in regex, can someone show me how to do it using a few
          > > examples?
          > >
          > > Thanks a lot!
          >
          > I would look here to improve your regex skills:
          >
          > http://www.amk.ca/python/howto/regex/
          >
          > Also, I find Kodos to be invaluable in developing and debugging regexes.
          > Highly recommended.
          >
          > http://kodos.sourceforge.net
          >
          > Of course, you could just use urlparse in the standard library...
          >
          > Good luck,
          >
          > Don

          Wow, awesome! Thanks a lot for Kodos. I hope I find it useful. I
          have actually found a better solution than using regex itself.

          Here's my solution, and I think it works well:
          [x for x in moo.split(' ') if x.startswith('http://')]
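For reference, a small runnable sketch of the split-and-filter approach, wrapped in a function. The name http_tokens and the sample text in moo are assumptions chosen to show where the approach works and where it falls short.

```python
def http_tokens(text):
    """Return space-separated words that start with 'http://'."""
    return [word for word in text.split(' ') if word.startswith('http://')]


# Hypothetical input: one bare URL followed by a dot, one URL inside an attribute.
moo = 'Plain old link: http://www.mail.yahoo.com. See also <a href="http://x.com">x</a>'
print(http_tokens(moo))
```

On this input only the bare URL is returned, and the sentence-ending dot comes along with it; the URL inside the href attribute is missed because its word starts with href=. That may be fine for simple text, but it is worth knowing before relying on it.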
