Extract email addresses from big file.

  • superc0red
    New Member
    • Feb 2007
    • 5

    Extract email addresses from big file.

    Hey.

    I have a big text file with data,
    and I want to extract email addresses.

    How can I do it?
  • arne
    Recognized Expert Contributor
    • Oct 2006
    • 315

    #2
    I guess there are plenty of ways to do it. Any constraints on the tool/language?


    • superc0red
      New Member
      • Feb 2007
      • 5

      #3
      perl / shellscript using awk-sed-cut ??


      • arne
        Recognized Expert Contributor
        • Oct 2006
        • 315

        #4
        Originally posted by superc0red
        perl / shellscript using awk-sed-cut ??
        Perl is certainly a reasonable choice, yes. If I had to do it, I would use it.


        • Motoma
          Recognized Expert Specialist
          • Jan 2007
          • 3236

          #5
          Regular expressions would be a great way to do this. Try looking at the sed tool.


          • ghostdog74
            Recognized Expert Contributor
            • Apr 2006
            • 511

            #6
            Code:
            awk '
            {
              for (i = 1; i <= NF; i++) {
                if ($i ~ /[[:alpha:]]@[[:alpha:]]/) {
                  print $i
                }
              }
            }' "file"


            • superc0red
              New Member
              • Feb 2007
              • 5

              #7
              Thanx for the code dude :)


              • prn
                Recognized Expert Contributor
                • Apr 2007
                • 254

                #8
                It's been quite a while since I did anything with awk, so I wasn't sure how well ghostdog's code would work. It looked like it should handle only alphabetics with no more than one component on each side of the "@". So I made up a test file (test.txt):

                Code:
                this is a test file foo@bar.com we are looking for moo@drop.dhcp.bar.com email
                addresses inside, 00test@leo.bar.com, a text file with no
                particular fname.lname@bar.baz.net other par72@take.the.bus.au restrictions
                on the format or locations of the 23skidoo@bar.co.uk addresses inside the file.
                Let's try one at the end joe27@aol.com.
                I ran ghostdog's awk script on this and got the output:
                Code:
                foo@bar.com
                moo@drop.dhcp.bar.com
                00test@leo.bar.com,
                fname.lname@bar.baz.net
                23skidoo@bar.co.uk
                Note that this output has FIVE email addresses, but the file has SEVEN, so something is wrong. The two that are omitted have a digit immediately before the "@", so it looks like I was close but not quite right about how much awk would match with this RE. It prints the whole whitespace-delimited field $i whenever that field matches /[[:alpha:]]@[[:alpha:]]/

                But note that it also caught the comma following the third address "00test@leo.bar.com," which it should not include in the email address.

                Here's a Perl one-liner:
                Code:
                perl -wne'while(/[\w\.]+@[\w\.]+/g){print "$&\n"}' test.txt
                This gives the output
                Code:
                foo@bar.com
                moo@drop.dhcp.bar.com
                00test@leo.bar.com
                fname.lname@bar.baz.net
                par72@take.the.bus.au
                23skidoo@bar.co.uk
                joe27@aol.com.
                which is almost correct (and does not include the comma following number 3, although it does include the period at the end).

                Here's a corrected version:
                Code:
                perl -wne'while(/[\w\.]+@[\w\.]+\w+/g){print "$&\n"}' test.txt
                This yields
                Code:
                foo@bar.com
                moo@drop.dhcp.bar.com
                00test@leo.bar.com
                fname.lname@bar.baz.net
                par72@take.the.bus.au
                23skidoo@bar.co.uk
                joe27@aol.com
                I'm sure ghostdog74's awk script could also easily be fixed, but as I said, it's been a long time and I'm not sure how much I want to play with it. ;)

                HTH,
                Paul
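
For anyone who wants to follow up on the awk side, here is a rough sketch of how the script from #6 might be repaired. It assumes the same simplified idea of an address as the Perl version (letters, digits, dots, underscores and hyphens around the "@", ending in a letter or digit), and it writes the character ranges out explicitly rather than relying on [[:alpha:]]. The function name extract_emails is made up for this example.

```shell
#!/bin/sh
# Sketch: print every address-like token, one per line.
# Reads the files given as arguments, or standard input if there are none.
extract_emails() {
  awk '
  {
    line = $0
    # match() sets RSTART and RLENGTH to the leftmost, longest match.
    while (match(line, /[A-Za-z0-9._-]+@[A-Za-z0-9._-]*[A-Za-z0-9]/)) {
      print substr(line, RSTART, RLENGTH)
      # Continue searching in the text after this match.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Calling `extract_emails test.txt` should then list addresses with digits next to the "@" as well, without the trailing comma or period, because the match is cut out of the line instead of printing the whole field.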


                • peripatetic
                  New Member
                  • Jul 2007
                  • 1

                  #9
                  Hi.
                  Thanks for this. I was using it for a while and thought it was wonderful. However it misses the legitimate hyphen character within emails. Here's an updated version.

                  Code:
                  perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' emails.txt | sort -u > output.txt
                  I also piped it through sort to get a sorted, unique list of emails.
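
The effect of adding the hyphen can be checked on a small sample. The sketch below uses grep -oE instead of Perl, with bracket expressions written out to approximate \w, so it is an illustration rather than the exact one-liner. The sample addresses and the function names are invented.

```shell
#!/bin/sh
# Invented sample: one hyphenated address and one plain address listed twice.
sample='mail jean-luc@star-fleet.example.org or bob@example.com today
bob@example.com again'

# Pattern without the hyphen, approximating [\w\.]+@[\w\.]+\w+
without_hyphen() { grep -oE '[A-Za-z0-9_.]+@[A-Za-z0-9_.]+[A-Za-z0-9_]+'; }

# Pattern with the hyphen added, approximating [\w\.\-]+@[\w\.\-]+\w+
with_hyphen() { grep -oE '[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+[A-Za-z0-9_]+'; }

# Without the hyphen the first address is cut down to "luc@star".
printf '%s\n' "$sample" | without_hyphen | sort -u

# With the hyphen the full address survives; sort -u drops the repeated one.
printf '%s\n' "$sample" | with_hyphen | sort -u
```

Note that sort -u compares case-sensitively, so Bob@Example.com and bob@example.com would still be listed separately.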


                  • Motoma
                    Recognized Expert Specialist
                    • Jan 2007
                    • 3236

                    #10
                    Great catch peripatetic! Thanks for the addition, and welcome to The Scripts!


                    • HostQ8i
                      New Member
                      • Jan 2008
                      • 3

                      #11
                      Guys, can this Perl script be used on websites? Can I replace the file with a web address? How can I get the email addresses contained in a website?


                      Let's say www.domain.com/aa.php=1 has some emails saved inside,
                      and www.domain.com/aa.php=2 also has some. How can I write a loop that goes through every aa.php=variable page and collects the addresses from all of them?
                      Thanks in advance, and sorry for my English.
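
One possible approach, sketched under assumptions: download each page with curl in a loop and pipe the combined text through the same kind of pattern. The function names fetch_pages and extract_emails are made up, the grep pattern only approximates the Perl one from #9, and the URL layout and number range are taken from the question as placeholders.

```shell
#!/bin/sh
# Sketch only: the URL layout and the id range are placeholders.

# Print every address-like token found on standard input, one per line.
extract_emails() {
  grep -oE '[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+[A-Za-z0-9_]+'
}

# fetch_pages BASE FIRST LAST: download BASE followed by each number in turn.
fetch_pages() {
  base=$1
  i=$2
  last=$3
  while [ "$i" -le "$last" ]; do
    curl -s "${base}${i}"   # -s keeps curl quiet; page text goes to stdout
    i=$((i + 1))
  done
}

# Example call (not run here):
#   fetch_pages 'http://www.domain.com/aa.php=' 1 2 | extract_emails | sort -u
```

The loop only covers consecutive numbers. If the page ids are not consecutive, a list of URLs in a file read line by line would be the more natural input.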


                      • David Akpan
                        New Member
                        • Mar 2008
                        • 1

                        #12
                        I have a big file with many email addresses. How do I extract only the email addresses? If possible, please include the software I can use.


                        • Freakin
                          New Member
                          • May 2008
                          • 1

                          #13
                          How would I use a script like this on a group of files that are in a directory to retrieve email addresses from all of them?


                          • gpraghuram
                            Recognized Expert Top Contributor
                            • Mar 2007
                            • 1275

                            #14
                            Originally posted by Freakin
                            How would I use a script like this on a group of files that are in a directory to retrieve email addresses from all of them?

                            Try combining the find command with xargs and the Perl script given here, like this:

                            find . -name "*.txt" | xargs perl <script given here>


                            Raghu


                            • RADEP
                              New Member
                              • Dec 2009
                              • 1

                              #15
                              I tried the above example but it didn't work for me.

                              I got the following error:

                              C:\Documents and Settings\user\Desktop\abc\trunk\docs>perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}'db_emails.txt | sort -u > output.txt
                              Can't find string terminator "'" anywhere before EOF at -e line 1.
                              -uThe system cannot find the file specified.
