LinuxQuestions.org
Old 12-20-2011, 05:39 AM   #1
ust
Senior Member
 
Registered: Mar 2003
Location: fasdf
Distribution: Debian / Suse /RHEL
Posts: 1,130

Rep: Reputation: 30
grep file


I have a large plain text file and would like to extract part of the text from it. The conditions are:

1) extract the lines that contain the text "2011", and also
2) extract the lines that contain the text "warning" & "jobs".

can advise what can i do ? thx
 
Old 12-20-2011, 05:47 AM   #2
aazkan
Member
 
Registered: Jan 2008
Posts: 72

Rep: Reputation: 5
This should work

cat somebigfile.txt |grep -ie 2011 -ie warning -ie jobs

HTH
 
Old 12-20-2011, 05:51 AM   #3
ust
Senior Member
 
Registered: Mar 2003
Location: fasdf
Distribution: Debian / Suse /RHEL
Posts: 1,130

Original Poster
Rep: Reputation: 30
Thanks for the reply.

But I would also like to exclude the lines that do not have "warning" & "jobs". What can I do?

Thanks.
 
Old 12-20-2011, 06:00 AM   #4
aazkan
Member
 
Registered: Jan 2008
Posts: 72

Rep: Reputation: 5
cat somebigfile.txt |grep -ie 2011 -iv warning -iv jobs

But you will probably need to filter it twice (first write the output to a tmp file, then grep that file again) to get your results.
Sorry, it has been a long day and I can't wait to go off.
 
Old 12-20-2011, 11:51 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1958
@ust: It usually helps to post an actual example of your text, along with an example of the desired output.

And remember to please use [code][/code] tags around your code and data, to preserve formatting and to improve readability.


@aazkan: A few comments on this:

Code:
cat somebigfile.txt |grep -ie 2011 -iv warning -iv jobs
1) Useless use of cat. Just pass the filename directly to grep.

2) While it's not necessary as long as we're only searching for single simple words, you really should get into the habit of always quoting the expressions. Without quotes, the shell interprets any reserved characters in the expression and splits it into separate arguments on whitespace, which will probably break the command.

I personally think it also helps readability, as the expression being searched for is clearly differentiated from the other arguments around it.

Read these three links for a better understanding of how the shell handles arguments and whitespace:
http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes
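
To illustrate the word-splitting point, here is a minimal sketch, assuming a POSIX-style shell such as bash or dash. The count_args function and the pattern variable are hypothetical names made up for this example; they only report how many arguments the shell hands over.

```shell
# Hypothetical helper: print how many arguments the shell actually passed.
count_args() { echo "$#"; }

# A search expression that contains a space.
pattern='warning jobs'

# Unquoted: the shell splits the value on whitespace into two arguments.
unquoted=$(count_args $pattern)

# Quoted: the value reaches the command as a single argument.
quoted=$(count_args "$pattern")

echo "unquoted: $unquoted, quoted: $quoted"
```

With grep, the unquoted form would treat "warning" as the pattern and "jobs" as a file name, which is rarely what was intended.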

3) When using multiple expressions at once in grep, you need to prefix each and every one of them with the -e option. Also, the -i and -v options only need to be given once, as they unfortunately always apply globally. It's thus impossible for a single grep command to simultaneously print lines containing one pattern and exclude lines containing another. You'd have to chain two grep commands together, or use a different tool such as sed to do that.
Code:
grep '2011' somebigfile.txt | grep -iv -e 'warning' -e 'jobs'

sed -rn -e '/2011/ { /(jobs|warning)/!p }' somebigfile.txt
4) When giving solutions to newbies, it's usually courteous to also post an explanation of how the commands you gave work, rather than just plop down code without any context. It can also help you to catch your own mistakes before you post them.

So in the above, the first grep command simply outputs lines that contain the string "2011", and pipes them into a second grep for further filtering. There, the -i option indicates case-insensitive matching, and the -v option inverts the output. The two -e expressions are thus the strings that we want to exclude.

So note that this apparently does NOT do what the OP asked (although we could use some clarification on this, as I mentioned above). This prints only the lines containing "2011" that don't also have warning or jobs in them. If I'm reading correctly though, I believe what he wants are lines that contain both 2011 and either warning or jobs. In which case, just remove the -v flag from the second grep.
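
The difference between the two readings can be checked on a small test file. The following is only a sketch: the sample lines and the variable names are invented for illustration and are not the OP's data.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  '2011-12-01 warning: disk almost full' \
  '2011-12-02 backup finished' \
  '2011-12-03 3 JOBS queued' \
  '2010-11-30 warning: old entry' > "$sample"

# As posted: lines with "2011" that contain neither "warning" nor "jobs".
excluded=$(grep '2011' "$sample" | grep -iv -e 'warning' -e 'jobs')

# Without -v: lines with "2011" that also contain "warning" or "jobs".
included=$(grep '2011' "$sample" | grep -i -e 'warning' -e 'jobs')

printf 'excluded:\n%s\n\nincluded:\n%s\n' "$excluded" "$included"
rm -f "$sample"
```

Because of -i, the line with upper-case "JOBS" is caught by the second grep in both variants.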


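The sed command does the same thing as the two grep commands, with one difference: its patterns are case-sensitive, whereas the second grep uses -i. (In GNU sed, appending the I modifier to a pattern, as in /(jobs|warning)/I, makes it case-insensitive.) First I used -r to enable extended regex (explained below), and -n to turn off printing by default. The -e option is similar to grep's.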

/../ :indicates a regex pattern to match, in this case lines containing "2011".

{..} :groups the subsequent commands that operate on the first match.

/../ :again matches lines, this time from the results of the previous match.

(..|..) :means match either A or B. This is the part that needs the -r option. Although in GNU sed you can instead backslash-escape the bracketing characters for the same effect ("\(..\|..\)"), I think it's cleaner just to enable extended regex.

! :inverts the condition of the match, similar to grep's -v option.

p :finally, the p command prints the resulting matches.

So again, just remove the ! to only get lines that contain both patterns.
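
Both sed variants can be tried side by side. This is a sketch that assumes GNU sed (for -r and for the "p }" spacing) and uses made-up sample lines, kept in lower case because these sed patterns are case-sensitive.

```shell
# Hypothetical sample data, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  '2011 warning: disk almost full' \
  '2011 backup finished' \
  '2011 3 jobs queued' \
  '2010 warning: old entry' > "$sample"

# With !: lines matching 2011 that contain neither "jobs" nor "warning".
without=$(sed -rn -e '/2011/ { /(jobs|warning)/!p }' "$sample")

# Without !: lines matching 2011 that contain "jobs" or "warning".
both=$(sed -rn -e '/2011/ { /(jobs|warning)/p }' "$sample")

printf 'with !:\n%s\n\nwithout !:\n%s\n' "$without" "$both"
rm -f "$sample"
```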


@ust: I recommend that you read the man and info pages for grep and sed. It's generally a good idea to take the time to learn how the tools you're using really work.

Here are some useful sed resources for you too:
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt

You should also learn at least the basics of regular expressions:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html

Last edited by David the H.; 12-20-2011 at 12:01 PM. Reason: minor fix
 
Old 12-20-2011, 08:13 PM   #6
ust
Senior Member
 
Registered: Mar 2003
Location: fasdf
Distribution: Debian / Suse /RHEL
Posts: 1,130

Original Poster
Rep: Reputation: 30
The example:

The file content is:
2011
2011
aaa warning jobs
bbb warning
ccc jobs
ddd

Then the output should be as below. Please advise. Thanks.
2011
2011
aaa warning jobs
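
One reading of this example is: keep every line that contains "2011", plus every line that contains both "warning" and "jobs" in either order. Under that assumption, a single grep with three -e patterns is one possible sketch. The temporary file and variable names are made up for the illustration.

```shell
# Recreate the sample file content from the post in a temporary file.
sample=$(mktemp)
printf '%s\n' '2011' '2011' 'aaa warning jobs' 'bbb warning' 'ccc jobs' 'ddd' > "$sample"

# A line is printed if it matches any one of the -e patterns:
#   2011            -> the line contains "2011"
#   warning.*jobs   -> "warning" followed somewhere later by "jobs"
#   jobs.*warning   -> "jobs" followed somewhere later by "warning"
result=$(grep -e '2011' -e 'warning.*jobs' -e 'jobs.*warning' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Lines with only one of the two words, such as "bbb warning" and "ccc jobs", match none of the three patterns and are dropped.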
 
Old 12-20-2011, 09:57 PM   #7
aazkan
Member
 
Registered: Jan 2008
Posts: 72

Rep: Reputation: 5
Hi David,

Thanks for the pointers; I appreciate your input. I'll work on my future replies to questions.

Regards.
 
Old 12-20-2011, 10:16 PM   #8
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,059

Rep: Reputation: 883
Quote:
Originally Posted by ust
can advise what can i do ? thx
At this point, you really should be able to understand most of the points made by the contributors to this thread, and write some of your own code. You should be able to present some of your own code (in code tags, of course) and say exactly what it doesn't do that you want it to do.
 
  

