LinuxQuestions.org


tommy.sean 05-25-2008 05:17 PM

Automatically find and export email addresses?
 
I’m a complete newb (when it comes to Linux) and I’m not even sure if what I have in mind is possible, but I know Linux has a lot of capability so I think there is probably a way.

I have a couple of 40-page text files, exported from contact lists (from programs in Windows; I have dual boot). These files are mostly junk I don’t need, but they are also full of email addresses I DO need. I have been manually going through them, finding the email addresses and cutting and pasting them into a separate list. It is tedious as hell.

Is there any way to make a script or something that just searches a text file and exports every word containing the @ symbol? So I’m looking for a way to just automatically get all the email addresses out of a long text file and put them into a list. Is this possible? Thank you!

kilgoretrout 05-25-2008 05:33 PM

$ cat file.txt | grep @

asymptote 05-25-2008 05:38 PM

Where "file.txt" is your input file. You can output the contents into a file called address.txt by adding the following to kilgoretrout's command:
Code:

> address.txt
address.txt will be created automatically and will contain every line of the input that contains the @ symbol.

bigrigdriver 05-25-2008 05:51 PM

If I may throw in my 2 cents.

Asymptote's solution will end up with one address in address.txt because every time the script finds an address, it will overwrite the previous one.

A small matter of syntax: change asymptote's solution to read
Code:

>> address.txt
The double >> will append addresses to address.txt rather than overwrite it.

asymptote 05-25-2008 05:58 PM

Not on my system! I tested it using the following code:
Code:

#List the contents of every top-level directory (ls /*), keep the names
#containing an "a", and place the search results in scan.txt
311-laptop:~/Desktop$ sudo ls /* | grep a > scan.txt

#contents of scan.txt
311-laptop:~/Desktop$ cat scan.txt
/apt_get_update.cap
bash
bzcat
cat
dash
date
dnsdomainname
false
hostname
ld_static
loadkeys
nano
netcat
netstat
rbash
readlink
rnano
run-parts
tailf
tar
uname
zcat
abi-2.6.22-14-generic


chrism01 05-25-2008 05:59 PM

grep @ file.txt > address.txt

Only 1 process/invocation, so only need '>'

Note that cat file|grep pattern is UUOC (Useless Use of cat)
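
For anyone following the > versus >> exchange, here is a small sketch of the difference. The file name sample.txt and its three lines are made up for illustration. The shell opens the output file once per command, not once per matching line, so a single grep with > keeps every match; >> only matters when a later command writes to the same file.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: two lines with an @, one without.
printf '%s\n' 'alice@example.com' 'no address here' 'bob@example.net' > sample.txt

# One grep, one redirect: the file is opened once and receives every match.
grep @ sample.txt > address.txt
single_run=$(grep -c @ address.txt)

# Running the same command again with '>' truncates the file first,
# so the number of lines does not grow.
grep @ sample.txt > address.txt
after_overwrite=$(grep -c @ address.txt)

# With '>>' the second run is added to what is already there.
grep @ sample.txt >> address.txt
after_append=$(grep -c @ address.txt)

echo "single run: $single_run, after overwrite: $after_overwrite, after append: $after_append"
```

With this sample input the three counts should come out as 2, 2 and 4.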

asymptote 05-25-2008 06:03 PM

Good point - bigrigdriver threw me off.

tommy.sean 05-25-2008 08:28 PM

I guess that's not going to work because the addresses are not on separate lines; there is one in every line of text. This is what part of my text file looks like:

"C","","Ellington","","cell@example.net","Page1","","","","","","","","","","","","","","","","","", "","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""

"Cali","","Nichols","","ccni@example.com","","","","","","","","","","","","","","","","","","",""," ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""

"Carey","","Davis","","carey@example.com","ida-rmb","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","" ,"","","","","","","","","","","","","","","","",""

"Carla","","Kociolek","","cakoci@example.com","","","","","","","","","","","","","","","","","","", "","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""

"Carmen","","Senter","","cirw@example.com","Page1","","","","","","","","","","","","","","","",""," ","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""," "

"Carol","","Foster","","caro@example.com","Page1","","","","","","","","","","","","","","","","","" ,"","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""

"Carol","","Carr","","carr@example.net","Parent","","",

So the "grep" command just ends up copying the whole file. Thanks anyways for your help guys.
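
One possible way around this, offered as a sketch rather than a tested fix for the real files: GNU and BSD grep have a -o option that prints only the part of each line that matches, instead of the whole line. The pattern below is a deliberately loose approximation of an email address, and contacts.txt and its rows are shortened stand-ins for the real export.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, shortened stand-in for the exported contact list.
cat > contacts.txt <<'EOF'
"C","","Ellington","","cell@example.net","Page1","",""
"Cali","","Nichols","","ccni@example.com","","",""
"Nobody","","Noaddress","","","","",""
EOF

# -o prints only the matched text; -E enables extended regular expressions.
# The pattern is a rough approximation: name characters, an @, then a
# domain that ends in a dot followed by at least two letters.
grep -oE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' contacts.txt > address.txt

cat address.txt
```

Because the match is taken from anywhere in the line, this does not depend on the address sitting in a particular column. The -o option is not part of POSIX grep, so it may be missing on older or minimal systems.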

asymptote 05-25-2008 09:56 PM

Who the hell is carey davis??? Why are you emailing her ?!?! If that's who I think it is you and I are GOING TO HAVE A TALK!

billymayday 05-25-2008 10:11 PM

A couple of thoughts:

First, those people probably wouldn't be too impressed at having their email addresses posted on a forum, so perhaps post an example.com-type line and delete the rest.

Otherwise, I know full well that some of the scripting gurus will give you a perfectly elegant solution to your problem from the command line, but I'm very ordinary at awk and all the rest of those tools. Have you tried opening this file as comma-delimited text in Excel/OO or similar, and pasting the address column into a text file? Crude, but effective if you are running a GUI and have OO or similar installed.


B

Edit: or you could just try this:

http://linux.die.net/man/1/cut
http://lowfatlinux.com/linux-columns-cut.html

bigrigdriver 05-25-2008 11:47 PM

From the OP:
Quote:

I have a couple of 40-page text files, exported from contact lists (from programs in Windows; I have dual boot). These files are mostly junk I don’t need, but they are also full of email addresses I DO need. I have been manually going through them, finding the email addresses and cutting and pasting them into a separate list. It is tedious as hell.
He clearly indicates he has more than one file to extract addresses from.

I stand by my suggestion of using the append redirect. To get all of the addresses into one file, from all the files they are to be extracted from, I would loop through the files and append to the existing addresses.txt.

There is no good reason (at least not one given by the OP) to have to run the script more than once to get the job done.

And tommy.sean, don't give up on us so quickly. You didn't give us any indication of the file format, or you would have received quite different suggestions. billymayday only hints at what those answers would have been.
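
The loop-and-append idea described in this post might look roughly like the sketch below. The input names list1.txt and list2.txt and their contents are hypothetical. It only shows the redirect mechanics, so whole matching lines are still copied.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical exports standing in for the real contact lists.
printf '%s\n' '"A","","One","","a.one@example.com","",""' > list1.txt
printf '%s\n' '"B","","Two","","b.two@example.net","",""' > list2.txt

# Start from an empty output file so a rerun does not duplicate old results.
: > addresses.txt

# Each pass through the loop is a separate grep, so '>>' is needed here.
for f in list1.txt list2.txt; do
    grep @ "$f" >> addresses.txt
done

# Alternative: grep accepts several files at once. -h suppresses the
# file-name prefix, and a single '>' is then enough.
grep -h @ list1.txt list2.txt > addresses_single.txt
```

Both approaches should leave the same two lines in their output files. The -h option is available in GNU and BSD grep but is not required by POSIX.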

chrism01 05-26-2008 12:45 AM

Quick 'n dirty perl
Code:

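#!/usr/bin/perl -w
use strict;

my (
    $f1, $f1_rec, $f1_field, $f2
    );

# First argument is the input file, second is the output file.
die "Usage: $0 input_file output_file\n" unless @ARGV == 2;
$f1 = $ARGV[0];
$f2 = $ARGV[1];

# Open the output in append mode so several input files can share it.
open(F2,">>", "$f2") or die "Unable to open $f2: $!\n";

open(F1,"<", "$f1") or die "Unable to open $f1: $!\n";
while ( defined ( $f1_rec = <F1> ) )
{
    # The 5th comma-separated field (index 4) holds the email address.
    $f1_field = (split(/,/, $f1_rec))[4];
    next unless defined $f1_field;      # skip blank or short lines
    $f1_field =~ s/^\s*"|"\s*$//g;      # strip the surrounding double quotes
    print F2 "$f1_field\n";
}
close(F1) or die "Unable to close $f1: $!\n";
close(F2) or die "Unable to close $f2: $!\n";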

Assumes email is always 5th field as per example data given above.

billymayday 05-26-2008 12:57 AM

Isn't using "cut" simpler?
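
For comparison, a cut-based version could be as short as the sketch below. It relies on the same assumption as the Perl script, namely that the address is always the fifth comma-separated field. contacts.txt and its rows are shortened stand-ins for the real export.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical, shortened stand-in for the exported contact list.
cat > contacts.txt <<'EOF'
"C","","Ellington","","cell@example.net","Page1","",""
"Cali","","Nichols","","ccni@example.com","","",""
EOF

# -d, sets the comma as the field delimiter; -f5 keeps only the fifth field.
# tr -d '"' then deletes the double quotes around each address.
cut -d, -f5 contacts.txt | tr -d '"' > address.txt

cat address.txt
```

One caveat: cut knows nothing about quoting, so a comma inside a quoted field (a name written as "Davis, Carey", say) would shift the columns and the wrong field would come out.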

tommy.sean 05-26-2008 04:57 PM

Well guys, I finished up the slow and tedious way. Thanks again for your help; at least I did learn some things. I would have to learn a lot more about Perl before I could have done it that way.
Thanks for the info. Should I close this forum or something now?


All times are GMT -5.