How to extract domain from string with regex?

  • deko

    How to extract domain from string with regex?

    I'm sure someone has passed this way before...

    I want to check to see if a domain name is contained in a string, and if one is,
    I want to extract it. In these strings, domains are always preceded by
    "http://" or "http : //www" (without the spaces).

    In pseudocode, I thought it might look like this:

    if (eregi("http: //", $mystring))
    {
    $domain = explode("http: //", $mystring);
    $domain = array_reverse($domain);
    }
    $parts = $domain[0];
    $parts = explode(".", $parts);
    if ($parts[0] == "www")
    {
    $extracted = $parts[1].".".$parts[2];
    }
    else
    {
    $extracted = $parts[0].".".$parts[1];
    }

    Does this look about right?

    Thanks in advance.

  • deko

    #2
    Re: How to extract domain from string with regex?

    Here's a cleaner example:

    if (eregi("http://", $mystring))
    {
    $mystring = explode("http://", $mystring);
    $mystring = array_reverse($mystring);
    $domain = $mystring[0];
    $domain = explode(".", $domain);
    if ($domain[0] == "www")
    {
    $extracted = $domain[1].".".$domain[2];
    }
    else
    {
    $extracted = $domain[0].".".$domain[1];
    }
    }

    Can I match on "http://" with eregi, or do I need to escape the "/"?
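
On the escaping question: a POSIX pattern of the kind eregi takes has no delimiters, so "/" is an ordinary character there. With the preg_* functions the pattern is wrapped in delimiters, and "/" only needs escaping when "/" itself is the delimiter. The sketch below assumes a current PHP, where eregi is no longer available (it was removed in PHP 7.0), so it uses stripos and preg_match instead. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$mystring = "Visit HTTP://www.example.com/page for details";

// A plain, case-insensitive substring test needs no regex at all.
$hasUrl = stripos($mystring, "http://") !== false;

// With "/" as the delimiter, every "/" inside the pattern must be escaped.
$slashDelim = preg_match('/http:\/\//i', $mystring);

// With another delimiter such as "#", "/" is an ordinary character.
$hashDelim = preg_match('#http://#i', $mystring);

var_dump($hasUrl, $slashDelim, $hashDelim);
```

If the only goal is to test for the literal prefix, the stripos form is probably the simplest of the three.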


    • Alvaro G. Vicario

      #3
      Re: How to extract domain from string with regex?

      *** deko wrote (Fri, 25 Aug 2006 23:09:28 -0700):
      > In these strings, domains are always preceded by
      > "http://" or "http : //www" (without the spaces).

      Without the spaces? Then, why do you add the spaces?

      Given that precondition, I wouldn't use regex:

      parse_url - Parse a URL and return its components

      usage:
      array parse_url ( string url )

      Parameters
      url
      The URL to parse

      Return Values
      On seriously malformed URLs, parse_url() may return FALSE and emit an
      E_WARNING. Otherwise an associative array is returned, whose components may
      be (at least one):

      scheme - e.g. http
      host
      port
      user
      pass
      path
      query - after the question mark ?
      fragment - after the hashmark #
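
As a rough illustration of the components listed above, here is a minimal sketch that parses one URL and reads a few keys from the returned array. The URL and the variable names are assumptions made up for the example. Only components that are actually present in the URL show up as keys, so each lookup is guarded.

```php
<?php
// Hypothetical URL chosen so that every component is present.
$url = "http://user:secret@www.example.com:8080/index.html?lang=en#top";

$parts = parse_url($url);

// parse_url() returns false on seriously malformed input, and a missing
// component has no key at all, so check both before reading a value.
$host = ($parts !== false && isset($parts['host'])) ? $parts['host'] : null;
$port = ($parts !== false && isset($parts['port'])) ? $parts['port'] : null;

echo "host: $host\n";
echo "port: $port\n";
```

Note that parse_url only splits a string that already is a URL; it does not search for one inside longer text.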


      > In pseudocode, I thought it might look like this:
      >
      > if (eregi("http: //", $mystring))

      Sorry, but I just can't understand all that story about spaces/not spaces
      :-?



      --
      -+ http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
      ++ My site about web programming: http://bits.demogracia.com
      +- My humor site with UVA rays: http://www.demogracia.com
      --


      • deko

        #4
        Re: How to extract domain from string with regex?

        Thanks for the tip on parse_url.

        But I still have to find the URL (which could be anywhere) in the string.
        > Without the spaces? Then, why do you add the spaces?

        Here's a pseudocode example without the spaces:

        if (eregi("http://", $mystring))
        {
        $mystring = explode("http://", $mystring);
        $mystring = array_reverse($mystring);
        $domain = $mystring[0];
        $domain = explode(".", $domain);
        if ($domain[0] == "www")
        {
        $extracted = $domain[1].".".$domain[2];
        }
        else
        {
        $extracted = $domain[0].".".$domain[1];
        }
        }

        Would it be better to use preg_match here?

        preg_match('@^(?:http://)?([^/]+)@i',
        "http://www.php.net/index.html", $matches);
        $host = $matches[1];

        // get last two segments of host name
        preg_match('/[^.]+\.[^.]+$/', $host, $matches);
        echo "domain name is: {$matches[0]}\n";

        But would this work if the URL is buried in a string?
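
One possible answer, as a sketch: the first pattern above is anchored with "^" and treats the scheme as optional, so on a string with leading text it would capture everything from the start of the string up to the first "/". Dropping the anchor and requiring "http://" lets the match start anywhere. The function name extract_domain and the sample strings are hypothetical, made up for this example.

```php
<?php
// Hypothetical helper: returns the last two host labels, or null if no URL.
function extract_domain($text)
{
    // No "^" anchor and a required scheme, so the URL may sit anywhere.
    // The host ends at the first "/", whitespace, ":", "?" or "#".
    if (!preg_match('@http://([^/\s:?#]+)@i', $text, $matches)) {
        return null;
    }
    $host = $matches[1];

    // Keep the last two dot-separated labels, as in the snippet above.
    if (!preg_match('/[^.]+\.[^.]+$/', $host, $matches)) {
        return null;
    }
    return $matches[0];
}

echo extract_domain("Docs are at http://www.php.net/index.html, see there."), "\n";
```

Taking the last two labels is only a heuristic: for a host such as www.example.co.uk it yields "co.uk", so hosts under multi-part suffixes would need extra handling.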
