splitting merged words but not www addresses (regexp)

  • Piotr

    Is there any way to split all merged words except www and e-mail addresses?

    I have this regexp:

    preg_replace("/(\.)([[:alpha:]])/", "\\1 \\2", "www.google.com any,merged.words mymail@domain.com")

    It gives me an incorrect result:
    www. google. com any, merged. words mymail@domain. com

    I need this result:
    www.google.com any, merged. words mymail@domain.com

    In my case, all web addresses have www. or http:// at the beginning of the
    string, and e-mail addresses of course have @ inside the string.

    Is it possible to write a regexp like this?
  • Chung Leong

    #2

    No, not with a single pattern of that kind. You would normally use a
    lookbehind assertion in cases like this, but the assertion has to be fixed
    length. Since a domain name can be any number of characters long, a
    lookbehind can't cover it.
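To make the fixed-length restriction concrete, here is a small sketch. The sample strings and variable names are made up for illustration and do not come from the thread. An unbounded lookbehind is rejected when the pattern is compiled, and a fixed-length one only protects the dot immediately after "www".

```php
<?php
// Unbounded lookbehind (\w+ has no maximum length): PCRE refuses to compile
// the pattern, so preg_match() returns false (the warning is silenced with @).
$unbounded = @preg_match('/(?<!www\.\w+)\.(?=\w)/', 'www.google.com any.word');
var_dump($unbounded); // bool(false)

// A fixed-length lookbehind compiles, but it only guards the first dot of the
// address, so the later dots still get a space added after them.
$fixed = preg_replace('/(?<!www)([,.])(?=\w)/', '$1 ', 'www.google.com any,merged.words');
echo $fixed, "\n"; // www.google. com any, merged. words
```

The second result shows why a lookbehind alone does not help here: the dot before "com" is too far from "www" for any fixed-length assertion to see it.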

    What you can do is first search for domain names and email addresses and
    replace them with placeholders, then fix the merged words, and finally
    restore the original text from the placeholders. Example:

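    // Hide a match behind a ###...### placeholder. Base64 output contains
    // no "." or ",", so the word-splitting step leaves it alone.
    function encode($m) { return "###" . base64_encode($m[0]) . "###"; }
    // Restore the original text from a placeholder.
    function decode($m) { return base64_decode($m[1]); }

    $s = "www.google.com any,merged.words mymail@domain.com";
    $s = preg_replace_callback('/\bwww\.[\w\.]+/', 'encode', $s);      // protect www addresses
    $s = preg_replace_callback('/\b[\w\.]+@[\w\.]+/', 'encode', $s);   // protect e-mail addresses
    $s = preg_replace('/([,.])(\w)/', '\1 \2', $s);                    // split merged words
    $s = preg_replace_callback('/###(.*?)###/', 'decode', $s);         // restore addresses

    echo $s; // www.google.com any, merged. words mymail@domain.com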
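The same protect-then-split idea can probably also be done in one pass. This is a sketch of a different technique from the placeholder code above, not what was posted. One `preg_replace_callback` call tries the address patterns first and hands them back unchanged, and only otherwise treats a "," or "." glued to the next word as a split point. The function name `split_merged_words` is hypothetical, and the `http://` branch is an assumption taken from the question rather than from the posted code.

```php
<?php
// Hypothetical helper, for illustration only.
function split_merged_words(string $s): string
{
    // Alternation order matters: addresses are tried first and consumed whole,
    // so the punctuation branch never sees the dots inside them.
    $pattern = '~
        (?P<keep>
            \b(?:https?://|www\.)[\w./-]+   # web address
          | \b[\w.]+@[\w.]+                 # e-mail address
        )
      | (?P<punct>[,.])(?=\w)               # punctuation glued to the next word
    ~x';

    return preg_replace_callback($pattern, function (array $m): string {
        if ($m['keep'] !== '') {
            return $m['keep'];       // address: return it untouched
        }
        return $m['punct'] . ' ';    // merged word: add the missing space
    }, $s);
}

echo split_merged_words("www.google.com any,merged.words mymail@domain.com"), "\n";
// www.google.com any, merged. words mymail@domain.com
```

Compared with the placeholder version, this avoids the encode/decode round trip, but the address patterns are just as rough and would need tightening for real-world input.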



    • Piotr

      #3

      On Tue, 28 Sep 2004 23:37:13 -0400, Chung Leong wrote:

      > function encode($m) { return "###" . base64_encode($m[0]) . "###"; }
      > function decode($m) { return base64_decode($m[1]); }
      >
      > $s = "www.google.com any,merged.words mymail@domain.com";
      > $s = preg_replace_callback('/\bwww\.[\w\.]+/', 'encode', $s);
      > $s = preg_replace_callback('/\b[\w\.]+@[\w\.]+/', 'encode', $s);
      > $s = preg_replace('/([,.])(\w)/', '\1 \2', $s);
      > $s = preg_replace_callback('/###(.*?)###/', 'decode', $s);
      >
      > echo $s;

      Thanks a lot! This is a great solution; I had been searching for one for a long time!
