Regular Expression Function to remove email address in string

  • nemesisdan
    New Member
    • Apr 2009
    • 7

    I have a form with a text description field, but some people keep putting email addresses, telephone numbers, HTML and URLs in the descriptions.

    I can remove the HTML tags using a regular expression, but I can't figure out how to remove email addresses, telephone numbers and URLs from a text string, e.g.:

    'Lorem ipsum dolor sit amet, some.email@mydomain.com consectetur adipisicing elit, anotheremail@somedomain.com sed do eiusmod tempor incididunt ut labore 01234 567 891 et dolore +341234 567891 magna aliqua. 012 345 6789 Ut enim http://www.somedomain.com ad minim www.somedoma.com veniam, quis somedomain.com nostrud exercitation http://somedomain.com ullamco laboris nisi ut aliquip ex ea commodo consequat.'

    This is driving me round the bend! Can anyone save me??

    Many thanks!

    Dan
  • jhardman
    Recognized Expert Specialist
    • Jan 2007
    • 3405

    #2
    If someone posts URLs to me, I just delete the whole post and ban the user or add their IP address to a blocked list. I don't see any reasonable way to block a phone number, and I think you would have to block '@' to block emails.

    • nemesisdan
      New Member
      • Apr 2009
      • 7

      #3
      Hi jhardman

      Thanks for your reply, and in essence I agree with you; unfortunately the peeps entering the info are paying customers... No matter how much I tell the sales dept. to tell their clients not to do this, they will always try to push it, which annoys all the other customers who don't.

      With a DB of over 1 million description records, I could really use a function to do this rather than wasting my life checking every record in the DB manually...

      Can you still help?

      • jhardman
        Recognized Expert Specialist
        • Jan 2007
        • 3405

        #4
        OK, how about this: URLs and email addresses cannot contain white space, so you could delete everything from the @ symbol out to the first white space on either side. For URLs, most should start with "http://" or "https://", and most of the rest probably start with "www." (this won't get rid of all of them, but it will get the vast majority), so I'd delete from that prefix to the next white space. For phone numbers you would need to search for a string of at least 7 characters that contains nothing but numbers and punctuation (spaces, periods, and hyphens are all commonly used, possibly other marks as well, to separate the parts of a phone number). Does that sound reasonable? If that will work for you, and if you need it, I can help you write the regexes.
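The whitespace rule described above can be sketched without any regex at all. This is a minimal Python sketch, assuming that splitting on white space is acceptable (it collapses runs of spaces and newlines into single spaces); the function name drop_email_and_url_tokens is hypothetical and not from the thread. Phone numbers are left alone here because they usually span several space-separated tokens.

```python
def drop_email_and_url_tokens(text: str) -> str:
    """Hypothetical helper: drop whitespace-delimited tokens that look like
    an email address or a URL, following the rule described in the post."""
    kept = []
    for token in text.split():  # split() breaks on any run of white space
        if "@" in token:
            # Email: everything from the @ out to the white space on either side.
            continue
        if token.lower().startswith(("http://", "https://", "www.")):
            # URL: from the prefix up to the next white space.
            continue
        kept.append(token)
    return " ".join(kept)


print(drop_email_and_url_tokens(
    "Lorem some.email@mydomain.com ipsum http://somedomain.com dolor www.somedoma.com sit"
))
```

This prints "Lorem ipsum dolor sit". A URL wrapped in punctuation, such as "(www.example.com)", slips through because the token does not start with one of the prefixes.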

        Jared

        • nemesisdan
          New Member
          • Apr 2009
          • 7

          #5
          Hi jhardman

          Thanks, I see your logic; however, the syntax is what I'm struggling with... Can you help me write the regexes?

          Kind regards

          Dan

          • jhardman
            Recognized Expert Specialist
            • Jan 2007
            • 3405

            #6
            this should recognize phone numbers:
            Code:
            [\d\s\W]{7,}
            (a string of at least seven characters containing only numerals, white space and non-word characters).
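A minimal Python sketch of the phone pattern in use, assuming Python's re module. Note that \W already matches white space, so the \s is redundant but harmless, and the class does not require any digits, so a run like "-------" also matches. The digit-anchored variant and the strip_phones helper are hypothetical additions, not from the thread.

```python
import re

# The pattern as posted, verbatim.
PHONE_AS_POSTED = re.compile(r"[\d\s\W]{7,}")

# Hypothetical stricter variant: an optional "+", then a run that starts and
# ends with a digit and contains only digits, white space, parentheses,
# periods and hyphens (7 characters minimum).
PHONE_DIGIT_ANCHORED = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def strip_phones(text, pattern=PHONE_AS_POSTED):
    # Replace each match with one space, then collapse leftover runs of spaces.
    return re.sub(r"\s{2,}", " ", pattern.sub(" ", text)).strip()


sample = "labore 01234 567 891 et dolore +341234 567891 magna aliqua. 012 345 6789 Ut enim"
print(strip_phones(sample))
print(strip_phones(sample, PHONE_DIGIT_ANCHORED))
```

With the pattern as posted this prints "labore et dolore magna aliqua Ut enim" (the full stop before "012" is swallowed, because "." is a non-word character); the digit-anchored variant keeps it: "labore et dolore magna aliqua. Ut enim". Both will also remove other long numbers, such as dates.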

            This should recognize email addresses:
            Code:
            [.\w]{3,}@[.\w]{5,}
            (at least 3 characters that can include periods or word characters, followed by an @ sign, followed by at least 5 characters consisting of word characters and periods)
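A minimal Python sketch of the email pattern in use, assuming Python's re module. Because "-" is not in [.\w], the pattern as posted misses an address with a hyphenated domain entirely; the widened variant and the strip_emails helper are hypothetical additions, not from the thread.

```python
import re

# The pattern as posted, verbatim.
EMAIL_AS_POSTED = re.compile(r"[.\w]{3,}@[.\w]{5,}")

# Hypothetical widened variant: also allows "-" and "+" before the @,
# and "-" after it, since both are common in real addresses.
EMAIL_WIDE = re.compile(r"[-+.\w]{3,}@[-.\w]{5,}")


def strip_emails(text, pattern=EMAIL_AS_POSTED):
    # Delete each match, then collapse the double space the deletion leaves.
    return re.sub(r"\s{2,}", " ", pattern.sub("", text)).strip()


print(strip_emails("amet, some.email@mydomain.com consectetur"))
print(strip_emails("write to first-last@my-domain.com today"))
print(strip_emails("write to first-last@my-domain.com today", EMAIL_WIDE))
```

The three lines printed are "amet, consectetur", then the hyphenated address unchanged, then "write to today".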

            Between them, these two should recognize MOST URLs:
            Code:
            https?://[.\w]{3,}
            www\.[.\w]{3,}
            (http, an optional s, and ://, followed by at least 3 word and period characters; and www. followed by at least 3 word and period characters). This doesn't catch domains without a prefix (like "mydomain.com"). If you wanted, you could try something like [.\w]{3,}\.com, [.\w]{3,}\.net, etc.
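A minimal Python sketch of the URL patterns, assuming Python's re module. It escapes the dot after "www" (an unescaped . outside a character class matches any character) and turns the bare-domain suggestion into one pattern; the (com|net) list, the \b word boundary and the strip_urls helper are assumptions, not from the thread.

```python
import re

URL_PATTERNS = [
    re.compile(r"https?://[.\w]{3,}"),
    # "www" must be followed by a literal period, not any character.
    re.compile(r"www\.[.\w]{3,}"),
    # Bare domains with the dot escaped; \b stops "example.community" from
    # matching. The (com|net) list is an assumption: extend it as needed.
    re.compile(r"[.\w]{3,}\.(?:com|net)\b"),
]


def strip_urls(text):
    # Apply the prefixed patterns before the bare-domain one, so that
    # "http://somedomain.com" goes as a whole instead of leaving "http://".
    for pattern in URL_PATTERNS:
        text = pattern.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()


print(strip_urls("Ut enim http://www.somedomain.com ad minim www.somedoma.com "
                 "veniam, quis somedomain.com nostrud"))
print(strip_urls("see http://example.com/page here"))
```

The first call prints "Ut enim ad minim veniam, quis nostrud". The second prints "see /page here": since "/" is not in [.\w], any path after the domain is left behind, so you may want to add "/" (and other URL characters) to the class.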

            I wrote a quick script to check, feel free to use it: regex test

            Let me know if this helps. Did you need help with the code as well?
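The thread doesn't say which language Dan used, so this is only a minimal Python sketch of chaining the three kinds of pattern over one description; scrub_description is a hypothetical name. Emails go first so the bare-domain URL pattern cannot strip just "mydomain.com" out of an address, and the phone pattern is a digit-anchored variant of the one posted (it only removes runs that start and end with a digit) so surrounding punctuation survives.

```python
import re

# Order matters: emails before URLs, phones last.
PATTERNS = [
    re.compile(r"[.\w]{3,}@[.\w]{5,}"),          # email (as posted)
    re.compile(r"https?://[.\w]{3,}"),           # URL with scheme
    re.compile(r"www\.[.\w]{3,}"),               # URL starting with www.
    re.compile(r"[.\w]{3,}\.(?:com|net)\b"),     # bare domain (assumed TLD list)
    re.compile(r"\+?\d[\d\s().-]{5,}\d"),        # phone, digit-anchored variant
]


def scrub_description(text):
    """Hypothetical helper: delete every match, then tidy the spacing."""
    for pattern in PATTERNS:
        text = pattern.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()


SAMPLE = (
    "Lorem ipsum dolor sit amet, some.email@mydomain.com consectetur "
    "adipisicing elit, anotheremail@somedomain.com sed do eiusmod tempor "
    "incididunt ut labore 01234 567 891 et dolore +341234 567891 magna "
    "aliqua. 012 345 6789 Ut enim http://www.somedomain.com ad minim "
    "www.somedoma.com veniam, quis somedomain.com nostrud exercitation "
    "http://somedomain.com ullamco laboris nisi ut aliquip ex ea commodo "
    "consequat."
)

print(scrub_description(SAMPLE))
```

On Dan's sample this leaves the plain Lorem ipsum text: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."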

            Jared

            • nemesisdan
              New Member
              • Apr 2009
              • 7

              #7
              Hi Jared

              Thanks for this! I'm gonna give it a whirl ASAP...

              I'm ok with the code, thanks for the offer!

              I'll post back after testing

              Thanks again

              Dan

              • nemesisdan
                New Member
                • Apr 2009
                • 7

                #8
                Hi Jared

                Can you keep your test script live until I post back so I have something to verify against..?

                Thanks again!

                Dan

                • nemesisdan
                  New Member
                  • Apr 2009
                  • 7

                  #9
                  Hi Jared

                  Great news! It worked!

                  Take a look at this link to see my results: Test Script

                  Thank you so much!

                  Kind regards

                  Dan

                  • jhardman
                    Recognized Expert Specialist
                    • Jan 2007
                    • 3405

                    #10
                    Dan,

                    Glad to hear it worked. The hard part for me is always defining what characters to delete. Anyway, thanks for posting back.

                    Jared

                    • peter423
                      New Member
                      • Jul 2021
                      • 1

                      #11
                      Hi Jared, this is an old post, but it looks like exactly what I need.

                      Would you be interested in coding a WordPress plugin to use this on my website? Not for free, of course.

                      What I would need is a plugin with an additional parameter to define the "Post type" the code is enabled for.

                      • jhardman
                        Recognized Expert Specialist
                        • Jan 2007
                        • 3405

                        #12
                        Hi Peter,

                        Wow, this was from a long time ago! I remember this thread though.

                        I'm afraid I don't like WordPress and have never tried to work with it. If I have some time, maybe I can look up how to make a WordPress plugin, but if you find someone else to make one, I would be willing to help write the regex to exclude or flag some posts.

                        Jared
