How to Extract a website from a string / plain text?

  • clain
    New Member
    • Feb 2007
    • 79

    I need to extract websites from a plain-text string. I have searched most of the forums but could not find a good solution.
    1. Since it is plain text, it will not have any HTML links or anchor tags to find and extract.
    2. The website may or may not contain "www."; for example, the website name can be "learnwell.com" instead of "www.learnwell.com".
    3. There are website names like main.cool.edu.


    Here is an example string: "visit our webiste gravesfab.com"
  • code green
    Recognized Expert Top Contributor
    • Mar 2007
    • 1726

    #2
    It may be me, but your question doesn't seem to make sense.
    What are you trying to do?

    • clain
      New Member
      • Feb 2007
      • 79

      #3
      It's simple: I have a large number of plain-text files, and I just need to extract all the websites from them.

      • code green
        Recognized Expert Top Contributor
        • Mar 2007
        • 1726

        #4
        So you mean you want to find all the domain names in a text file?

        There may be a regex that validates a domain name structure.
        I know they exist for email addresses, so try googling "regex" and "domain name" together with "validate" or "check".

        • clain
          New Member
          • Feb 2007
          • 79

          #5
          Hello Mr Code Green, it's not exactly the domain name. It may also contain subdomains, for example "support.domain.com".

          As for googling, I did not find a regex that matches my criteria. Also, "try googling" is not the answer I expect from Bytes; if it were, I would not have posted this topic here... ha ha

          • code green
            Recognized Expert Top Contributor
            • Mar 2007
            • 1726

            #6
            I am not good with regex; I always look for somebody else's solution with Mr Google, which is the only reason I suggested it.
            Like I said, I did find numerous versions that validate an email structure.
            I am surprised you did not find something similar for web addresses.

            I will happily show my email regex.
            Maybe it will give you something to build on, or hopefully prompt a regex guru to suggest something better.
            Code:
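            // $email holds the address to check; the /i flag makes the match case-insensitive
            if (preg_match('/^[[:alnum:]][a-z0-9_\.\-]*@[a-z0-9\.\-]+\.[a-z]{2,4}$/i', $email)) { /* structure looks valid */ }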
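
            To see what this pattern accepts and rejects, here is a minimal sketch that wraps it in a function. The name `looks_like_email` and the sample addresses are assumptions made for illustration; they are not part of the original post.

```php
<?php
// Hypothetical wrapper around the email pattern posted above.
function looks_like_email(string $email): bool
{
    // Same pattern as in the post: one alphanumeric character, then a local
    // part, "@", a host, and a final label of 2 to 4 letters.
    $pattern = '/^[[:alnum:]][a-z0-9_\.\-]*@[a-z0-9\.\-]+\.[a-z]{2,4}$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(looks_like_email('user@example.com'));    // accepted
var_dump(looks_like_email('user@example.museum')); // rejected: final label is longer than 4 letters
var_dump(looks_like_email('gravesfab.com'));       // rejected: no "@", so a bare domain never matches
```

            Because the pattern is anchored with `^` and `$`, it validates a whole string rather than finding matches inside longer text, so it would need to be loosened before it could pull websites out of a paragraph.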

            • clain
              New Member
              • Feb 2007
              • 79

              #7
              Thanks, buddy. I can start from here. Some more work on your regex should get me to the actual code.

              To be frank, I found many regexes but could not find a perfect one; most of them failed in odd conditions.

              Hopefully a regex that omits the "@" symbol can be derived from your code. I am on it. Thanks again.
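
              One possible way to derive such a regex is sketched below; it is a starting point under stated assumptions, not a complete solution. The function name `extract_websites` is hypothetical. The pattern drops the "@" part, removes the `^`/`$` anchors so it can search inside longer text, and uses lookarounds so that pieces of email addresses are not reported as websites. It assumes a website is one or more dot-separated labels followed by a final label of at least two letters, so it will also pick up look-alikes such as file names ("notes.txt").

```php
<?php
// Hypothetical helper: returns the distinct domain-like tokens found in $text.
function extract_websites(string $text): array
{
    $pattern = '/(?<![@a-z0-9.\-])'                 // not preceded by "@" or by part of a longer token
             . '(?:[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?\.)+' // one or more labels, each followed by a dot
             . '[a-z]{2,}'                          // final label: letters only, at least two
             . '(?![a-z0-9\-@])/i';                 // not followed by more label characters or "@"

    preg_match_all($pattern, $text, $matches);

    // Lowercase and de-duplicate, keeping the order of first appearance.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

print_r(extract_websites('visit our webiste gravesfab.com'));
print_r(extract_websites('Mail joe.smith@example.com or see www.learnwell.com and main.cool.edu.'));
```

              The first call should report only gravesfab.com; the second should report www.learnwell.com and main.cool.edu while skipping the email address. Checking each candidate against a list of known top-level domains would be one way to cut down on false positives.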
