LinuxQuestions.org
Old 01-17-2011, 01:02 PM   #1
the_file
LQ Newbie
 
Registered: Jan 2011
Posts: 4

Rep: Reputation: 0
Bash script to fgrep a large file. With list as source for searching.


Hi,
I need to fgrep for a list of things which are stored in a file. The file in which I will do the searching is a large text file, and I need fgrep to write its output for each item from the list to a separate file, with the item from the list as the file name.

It's kind of like this:

./script list.txt largefile.txt

output would be

jack.txt
screen.txt
blah.txt

I don't know bash all too well since I am still learning it. Can anybody write this kind of thing?

Thanks in advance.
 
Old 01-17-2011, 01:27 PM   #2
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
Assuming list.txt contains:
Code:
jack
screen
blah
And largefile.txt contained something like:
Code:
hello
jack
screen
dog
john
food
street
blah
corner
clock
bike
Then I think what you are asking for is:
Code:
fgrep -f list.txt largefile.txt | sed "s/$/.txt/"
So your script would therefore look something like:
Code:
#!/bin/sh
# $1 = file with the search strings, $2 = file to search
fgrep -f "$1" "$2" | sed 's/$/.txt/'
 
Old 01-17-2011, 01:33 PM   #3
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
If largefile.txt looks more like:
Code:
There was a guy called jack. He liked to watch tv.
But only if the tv had a large screen.

He tried to convince friends that this was the best
way but they found it boring and hardly listened. To
them it sounded like blah.
Then you probably want:
Code:
fgrep -of list.txt largefile.txt | sed "s/$/.txt/"
and your script would therefore look something like:
Code:
#!/bin/sh
# -o prints only the matched text instead of the whole line
fgrep -of "$1" "$2" | sed 's/$/.txt/'
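To make the difference between the two variants concrete, here is a small sketch that runs both against sample text in the style of this thread. It uses `grep -F`, which is the same thing as `fgrep` (recent GNU grep releases print an obsolescence warning for the `fgrep` name). The scratch directory and the variable names are only there for the demonstration, and `-o` is an extension found in GNU and BSD grep rather than a POSIX option.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data in the style of the thread.
printf '%s\n' jack screen blah > list.txt
cat > largefile.txt <<'EOF'
There was a guy called jack. He liked to watch tv
but only if the tv had a large screen.

He tried to convince his friends to join him
but thought this sounded like a load of blah.
EOF

# Without -o: every line containing at least one pattern, printed whole.
whole_lines=$(grep -F -f list.txt largefile.txt)

# With -o: only the matched text itself, one match per output line.
matches_only=$(grep -F -o -f list.txt largefile.txt)

printf 'whole lines:\n%s\n\nmatches only:\n%s\n' "$whole_lines" "$matches_only"
```

Neither variant creates any files; both only print to the terminal, which is why the `sed` step merely appends `.txt` to each printed line.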
 
Old 01-17-2011, 02:13 PM   #4
the_file
LQ Newbie
 
Registered: Jan 2011
Posts: 4

Original Poster
Rep: Reputation: 0
Unfortunately those scripts didn't work at all =(

I need each item from the list to become a file containing the results from the large file. Essentially, I want to grab the whole line that contains something from the list. And the search items do have white space in them =/

But nonetheless I think we're getting close.
 
Old 01-17-2011, 02:39 PM   #5
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
I honestly cannot imagine what you are asking. Do you want to actually create files? Could you provide an example of what list.txt and largefile.txt might look like? And if you are trying to output files, what would you expect the contents of, say, "jack.txt" to look like after your script has successfully completed?
 
Old 01-17-2011, 04:01 PM   #6
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
Ok, I thought about this a little and I think I understand what you want.

Assuming list.txt contains:
Code:
jack
screen
blah
and largefile.txt contained:
Code:
There was a guy called jack. He liked to watch tv
but only if the tv had a large screen.

He tried to convince his friends to join him
but thought this sounded like a load of blah.
You want your command to produce three files.

A jack.txt file that contains:
Code:
There was a guy called jack. He liked to watch tv
A screen.txt file that contains:
Code:
but only if the tv had a large screen.
A blah.txt file that contains:
Code:
but thought this sounded like a load of blah.
Is this what you had in mind??
 
Old 01-17-2011, 04:12 PM   #7
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
Hmm ... OK, I assume you wanted to make a script because you thought this would be hard, but if it were me I'd probably not bother with a script and would do it as one line with the wonderful GNU Parallel.

Code:
parallel -a list.txt 'fgrep "{}" largefile.txt > "{}.txt"'
P.S. If you don't have parallel, search for it in your distro's repository and install it. It is great for stuff like this and a whole lot more!
 
Old 01-17-2011, 04:38 PM   #8
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
Re-reading your original request it was actually quite clear. For some reason I had presumed this was just part of some script that you were working on. I hadn't realised that you had summed up your entire requirements. Sorry for the confusion before!
 
Old 01-17-2011, 06:06 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Well, not as clean as parallel (which I don't have either), but the following awk can work:
Code:
awk 'NR==FNR{words[i++]=$0;next}{for(x=0;x<i;x++)if(index($0,words[x]))print > (words[x]".txt")}' list.txt largefile.txt
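A single-pass awk approach like this one is easier to follow when spread over several lines. The sketch below is one way to lay it out, not necessarily how the poster would write it: the condition has to be the comparison `NR==FNR` (two equals signs), and `index()` is used so that list items are matched as literal strings, as fgrep does. The sample data, the scratch directory and the variable names `n`, `i` and `out` are assumptions made for the demonstration.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data in the style of the thread; one item contains a space.
printf '%s\n' jack 'large screen' blah > list.txt
cat > largefile.txt <<'EOF'
There was a guy called jack. He liked to watch tv
but only if the tv had a large screen.

He tried to convince his friends to join him
but thought this sounded like a load of blah.
EOF

awk '
    # First file (list.txt): NR == FNR only holds while awk is still
    # reading the first file named on the command line.
    NR == FNR { if ($0 != "") words[n++] = $0; next }

    # Second file (largefile.txt): test each line against every item.
    {
        for (i = 0; i < n; i++) {
            # index() is a literal substring search, so regex
            # metacharacters in an item have no special meaning.
            if (index($0, words[i])) {
                out = words[i] ".txt"
                print > out    # truncated on first use, appended afterwards
            }
        }
    }
' list.txt largefile.txt
```

The large file is read only once, however many items are in the list. The trade-off is that awk keeps one output file open per matching item, so a very long list may run into the open-file limit of some awk implementations.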
Or with bash:
Code:
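#!/bin/bash

# Read list.txt line by line; IFS= and -r keep each line exactly as written.
while IFS= read -r word
do
    # Quote "$word" so items with spaces work; -F matches literally, like fgrep.
    grep -F -- "$word" largefile.txt > "${word}.txt"
done<list.txt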
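The loop in this post has both file names written into it, whereas the original request was for something callable as `./script list.txt largefile.txt`. A minimal sketch of that, wrapped in a function so it can be tried out in a scratch directory, might look as follows. The function name `split_by_item` and the sample data are made up for this example, and it assumes that no list item contains a `/`, since each item is used as a file name.

```shell
#!/bin/bash
# split_by_item LIST FILE
# For each non-empty line in LIST, write the lines of FILE that contain
# it (as a literal string) to "<item>.txt" in the current directory.
split_by_item() {
    local list=$1 file=$2 item
    # "|| [ -n "$item" ]" also handles a last line without a newline.
    while IFS= read -r item || [ -n "$item" ]; do
        [ -n "$item" ] || continue    # skip blank lines in the list
        # -F: fixed strings, like fgrep; --: an item may start with a dash.
        grep -F -- "$item" "$file" > "${item}.txt"
    done < "$list"
}

# Demonstration in a throwaway directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' jack 'large screen' blah > list.txt
cat > largefile.txt <<'EOF'
There was a guy called jack. He liked to watch tv
but only if the tv had a large screen.

He tried to convince his friends to join him
but thought this sounded like a load of blah.
EOF

split_by_item list.txt largefile.txt
ls
```

To use it as a standalone script, the demonstration part can be replaced with `split_by_item "$1" "$2"`. Note that an item with no matches still produces an empty `.txt` file, and that the large file is read once per list item.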
 
Old 01-17-2011, 11:28 PM   #10
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
Quote:
Originally Posted by grail
Well not as clean as parallel (which I don't have either)
Consider getting it. What I did was just a really simple demo of what is possible. Check out these two introductory videos by the parallel author himself if you really want to get a glimpse of what is possible:

http://www.youtube.com/watch?v=OpaiGYxkSuQ
http://www.youtube.com/watch?v=P40akGWJ_gY

If for some reason your favoured distro does not include parallel, you can always get it from here:

http://www.gnu.org/software/parallel/

There are links to rpms and debs as well as the source. I can't recommend it highly enough.
 
Old 01-18-2011, 12:07 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
@ruario - cheers will check it out
 
Old 01-22-2011, 08:47 PM   #12
tange
LQ Newbie
 
Registered: Jul 2010
Posts: 13

Rep: Reputation: 9
Quote:
Originally Posted by ruario
Code:
parallel -a list.txt 'fgrep "{}" largefile.txt > "{}.txt"'
Here is a tiny optimization. GNU Parallel is quite liberal in quoting, so you only need to quote special shell chars (in this case the >):

Code:
parallel -a list.txt fgrep {} largefile.txt \> {}.txt
This will do The Right Thing even if list.txt contains lines with words and spaces.

I know it is hard to get used to, when you are used to xargs' need for quoting everything.


/Ole
PS: Thanks for http://my.opera.com/ruario/blog/2011...h-gnu-parallel
 
Old 01-23-2011, 07:50 AM   #13
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
@tange: Wow, a reply from the Parallel author himself! Thanks for the quoting tip. Yeah, that might take some getting used to, but I can see it would make things so much more readable when applied to more complex examples.

I'm glad you read my blog post. I had been meaning to write it for a while and it was actually this thread that reminded me to do it. I only touch on the basic stuff there because the few readers I have tend to be those interested in Opera development, so I wanted to use an example that would mean something to them. Also, you have already covered the more powerful stuff in detail in the documentation you provide.

P.S. Thanks for Parallel. I couldn't live without it now. I just hope a few more distros start to include it by default.
 
Old 01-24-2011, 01:08 AM   #14
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
@tange: I decided to write another post. Once again my example is recursive unpacking of an archive, but this time I pull apart a deb for the purpose of editing it and then put it back together (both times using Parallel). Hopefully this will be interesting to a wider range of people and hence encourage more people to take a look at your software.

http://my.opera.com/ruario/blog/2011...e-fun-with-gnu

Last edited by ruario; 01-24-2011 at 01:09 AM.
 
Old 01-24-2011, 03:42 PM   #15
ruario
Senior Member
 
Registered: Jan 2011
Location: Oslo, Norway
Distribution: Slackware
Posts: 2,557

Rep: Reputation: 1761
Whilst you obviously should use parallel, at a push, on a system without it installed, you can force xargs to do this as follows:
Code:
xargs -a list.txt -d "\n" -I {} bash -c "fgrep '{}' largefile.txt > '{}.txt'"
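One thing to watch with this xargs form is that each list item is pasted into the command string itself, so an item containing a single quote would break the inner command. A hedged variant is sketched below: it hands the item to the inner shell as a positional parameter (`$1`) instead. It assumes GNU xargs, since `-a` and `-d` are GNU extensions, and the sample data and scratch directory are invented for the demonstration.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Invented sample data; one item has a space, one has an apostrophe.
printf '%s\n' jack 'large screen' "it's" > list.txt
cat > largefile.txt <<'EOF'
There was a guy called jack. He liked to watch tv
but only if the tv had a large screen.
He said it's the best way to relax.
EOF

# xargs replaces the final {} with one list line; the inner shell sees it
# as $1 (the word "sh" before it becomes $0), so the item is never parsed
# as shell code and its quotes and spaces are harmless.
xargs -a list.txt -d '\n' -I {} \
    sh -c 'grep -F -- "$1" largefile.txt > "$1.txt"' sh {}

ls
```

As with the other per-item approaches in this thread, the large file is read once for every line of list.txt.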
 
  


All times are GMT -5.
