Searching .txt file for (specific) strings and printing them to new file

Hb_Kai · 02-13-2010, 12:45 AM

Hey, I know the title is too long, but basically what I'm trying to do is take specific strings from a text file and print them to a separate file, each on a new line. It may seem confusing, but here's an example.

Name: Bob, PhoneNumber: *****, MobileNumber: *****, Email: ***@**.com,

I have a file with this type of formatting. Using the above example, I would like to take only the emails from the file and print them to a completely separate file with > separate-file.txt, but the problem is I don't know how to go about that.

I tried using grep to search for a string of "***@***.com" but that would only print the entire line, including Name, PhoneNumber, etcetera, and after reading the man page I found out that it works on whole lines.

I was wondering if anybody knew of a command that I could do this with? It's beginning to give me a bit of a headache.

Thanks for any help in advance.

EricTRA · 02-13-2010, 12:54 AM

Hello,

This looks like homework to me, is it? You're not going to find a lot of help for homework on LQ, since it's against the rules. Anyway, if it is, you should look into awk and loops to obtain the result you want. If it's not homework, post what you've already tried and we'll take it from there.

Kind regards,

Eric

Hb_Kai · 02-13-2010, 01:19 AM

Hey. Thank you for your reply.

No, I don't go to school or college at the minute; this has nothing to do with homework or coursework of any sort. I have been trying to teach myself a bit more about Bash and its commands by reading some of the man pages, making practice text files, and manipulating what they contain from the terminal. I worked up this list of random nobodies with some random numbers, addresses, etc. and was wondering whether this is the type of thing that can be done from the terminal. I was pretty sure it was, but it began giving me a headache, so I wanted to ask on a forum.

I've tried such things as grep, but grep could only print the whole lines and that's about as close as I could get. The latest commands I have been trying are

Code:

find [directory] -type f -exec grep -i "[search string]" {} \; | cat > [file location/name]

as well as the grep -i

Code:

grep -i *@*.* random.txt

I have already looked into awk, but the man page (for mawk?) explained how it's used with programs, and I didn't really understand how I would use awk to process the .txt file.

That's pretty much all I've been able to think of at the minute though.

EricTRA · 02-13-2010, 01:32 AM

Hello,

Ok, first, you don't need to pipe output into cat and then redirect to a file, you can just directly redirect to a file like this:

Code:

find [directory] -type f -exec grep -i "[search string]" {} \; > [file location/name]

By using awk you can split a line into fields if you have a common delimiter, in this case a space. So using awk you could put something like this:

Code:

find [directory] -type f -exec awk '{ print $8 }' {} \; > [file location/name]

or you could use cut to do the same.

Code:

find [directory] -type f -exec cut -d " " -f8 {} \; > [file location/name]

That would give you an output with the trailing comma included, so you can use sed to 'clip' it off. Since awk and cut already print one field per input line, you end up with all the email addresses in a list, one per line.
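As a rough sketch of the sed step described above: the file names contacts.txt and emails.txt and the sample addresses are made up for illustration, and the awk call is run on a single file rather than through find.

```shell
# Hypothetical sample data in the same layout as the example line.
printf '%s\n' \
  'Name: Bob, PhoneNumber: 12345, MobileNumber: 67890, Email: bob@example.com,' \
  'Name: Amy, PhoneNumber: 54321, MobileNumber: 09876, Email: amy@example.org,' \
  > contacts.txt

# Space-delimited field 8 is the address plus its trailing comma;
# sed removes a comma that sits at the end of each line.
awk '{ print $8 }' contacts.txt | sed 's/,$//' > emails.txt

cat emails.txt
```

This only works while every line has the address in the eighth space-separated field, so a name with two words would shift the count.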

I'm sure there are even other ways, as other users more acquainted with Bash will surely point out.

Kind regards,

Eric

AlucardZero · 02-13-2010, 08:49 AM

grep has a --only-matching option
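A minimal sketch of how that option might be used here. The short form is -o; it is available in GNU and BSD grep but is not required by POSIX. The file names, sample data, and the deliberately loose address pattern are assumptions for illustration only.

```shell
# Hypothetical sample data in the same layout as the example line.
printf '%s\n' \
  'Name: Bob, PhoneNumber: 12345, MobileNumber: 67890, Email: bob@example.com,' \
  'Name: Amy, PhoneNumber: 54321, MobileNumber: 09876, Email: amy@example.org,' \
  > contacts.txt

# -o prints only the text that matched; -E selects extended regular expressions.
# The pattern is a loose "something@something.letters", not a real email validator.
grep -oE '[^ ,:]+@[^ ,:]+\.[A-Za-z]+' contacts.txt > emails.txt

cat emails.txt
```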

Hipants · 02-14-2010, 05:15 AM

Hi, I think I might have the solution to your problem. I have found that your best friends in the Linux world for text problems like this are grep, cut, and awk.

Grep is used to search a text file for lines that match a pattern.
Cut is often used together with grep to split a line into fields; this is the one I used here. Awk does this and a whole bunch more. It takes a bit of practice but is a life saver in the end.

Anyway enough rambling.

Name: Bob, PhoneNumber: *****, MobileNumber: *****, Email: ***@**.com,

The way I would do this is to separate by either the colon : or the comma , first.

1. cut -d: -f5 textfile

The -d is the delimiter or separator. In this case it is a : but it could be anything.

The -f is the field number, counting the pieces between each : from the left. Name is one; Bob, PhoneNumber is two; 12345, MobileNumber is three; 1939493, Email is four; and ***@**.com, is five.

This will return ***@**.com, (with a leading space, since the field starts right after the colon).

But the problem is the emails still have a trailing comma, so we will use cut again, this time with the comma as the delimiter.

so

Final Answer

cut -d: -f5 inputfile | cut -d, -f1 > outfile
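To see what the two cut calls produce, here is a small sketch on made-up data (contacts.txt, emails.txt and the addresses are hypothetical). The extra sed step is an addition, not part of the command above: it drops the single space that follows "Email:" in the input.

```shell
# Hypothetical sample data in the same layout as the example line.
printf '%s\n' \
  'Name: Bob, PhoneNumber: 12345, MobileNumber: 67890, Email: bob@example.com,' \
  'Name: Amy, PhoneNumber: 54321, MobileNumber: 09876, Email: amy@example.org,' \
  > contacts.txt

# First cut: colon-delimited field 5 is " bob@example.com,".
# Second cut: keep what comes before the first comma.
# sed: strip the one leading space left over from "Email: ".
cut -d: -f5 contacts.txt | cut -d, -f1 | sed 's/^ //' > emails.txt

cat emails.txt
```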

Hope this helps...

colucix · 02-14-2010, 05:59 AM

Quote:

Originally Posted by Hb_Kai

I've tried such things as grep but grep could only print the whole lines and that's about as near as I could.

Not really true. Look at the -o option: it will print only the matching string. If more matches are on the same line, they will be printed on separate lines.
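A quick way to check the several-matches-per-line behaviour, assuming a grep that supports -o (GNU and BSD grep do). The AltEmail key, the addresses, and the loose pattern are invented for this illustration.

```shell
# One hypothetical input line that carries two addresses.
line='Email: bob@example.com, AltEmail: bob@example.net,'

# Each match is printed on its own line, even though both come from one input line.
matches=$(printf '%s\n' "$line" | grep -oE '[^ ,:]+@[^ ,:]+')

printf '%s\n' "$matches"
```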

In a more general case, where the file format is strictly the one you've posted in the OP (that is a CSV format and lines made of key/value pairs) you can try something like this:

Code:

awk 'BEGIN{RS = ","}/Email/{print $NF}' file > output_file

In this way awk will consider commas as record separators and for records matching the key (Email) it will print the value.
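A small run of that command on made-up data may make the record splitting easier to follow; contacts.txt, emails.txt and the addresses are hypothetical stand-ins.

```shell
# Hypothetical sample data in the same layout as the example line.
printf '%s\n' \
  'Name: Bob, PhoneNumber: 12345, MobileNumber: 67890, Email: bob@example.com,' \
  'Name: Amy, PhoneNumber: 54321, MobileNumber: 09876, Email: amy@example.org,' \
  > contacts.txt

# With RS set to a comma, each "key: value" pair becomes its own record.
# For records containing "Email", $NF (the last whitespace-separated field) is the address.
awk 'BEGIN{RS = ","}/Email/{print $NF}' contacts.txt > emails.txt

cat emails.txt
```

Because the match is on the key rather than on a field position, this keeps working if the order of the pairs changes, as long as no other key or value contains the text "Email".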

Hb_Kai · 02-18-2010, 09:09 AM

Hey. I'm sorry for the late reply. I have been away the last couple of days. That one did the job exactly how I wanted. Thank you very much colucix.