LinuxQuestions.org

ust 12-20-2011 04:39 AM

grep file
 
I have a large plain text file and would like to extract part of the text from it. The conditions are:

1) extract the lines that contain the text "2011", and also
2) extract the lines that contain the text "warning" & "jobs".

can advise what can i do ? thx

aazkan 12-20-2011 04:47 AM

This should work

cat somebigfile.txt |grep -ie 2011 -ie warning -ie jobs

HTH

ust 12-20-2011 04:51 AM

thx reply ,

But I would also like to exclude the lines that do not have "warning" & "jobs". What can I do?

Thanks.

aazkan 12-20-2011 05:00 AM

cat somebigfile.txt |grep -ie 2011 -iv warning -iv jobs

but you probably need to filter it twice (first output to a tmp file, cat the tmp file and grep it again) so you'd get your results.
Sorry, it has been a long day and can't wait to go off.

David the H. 12-20-2011 10:51 AM

@ust: It usually helps to post an actual example of your text, along with an example of the desired output.

And remember to please use [code][/code] tags around your code and data, to preserve formatting and to improve readability.


@aazkan: A few comments on this:

Code:

cat somebigfile.txt |grep -ie 2011 -iv warning -iv jobs

1) Useless use of cat. Just pass the filename directly to grep.

2) While it's not necessary as long as we're only searching for single simple words, you really should get into the habit of always quoting the expressions. Without quotes, any shell-reserved characters in them will be interpreted, and it will be broken up into separate arguments on any whitespace, probably breaking the command.

I personally think it also helps readability, as the expression being searched for is clearly differentiated from the other arguments around it.

Read these three links for a better understanding of how the shell handles arguments and whitespace:
http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes

3) When using multiple expressions at once in grep, you need to prefix each and every one of them with the -e option. In your command only "2011" has an -e in front of it, so grep treats "warning" and "jobs" as file names rather than patterns. Also, the -i and -v options only need to be given once, as they unfortunately always apply globally to every expression. It's thus impossible for a single grep command to simultaneously print lines containing one pattern and exclude lines containing another. You'd have to chain two grep commands together, or use a different tool such as sed to do that.
Code:

grep '2011' somebigfile.txt | grep -iv -e 'warning' -e 'jobs'

sed -rn -e '/2011/ { /(jobs|warning)/!p }' somebigfile.txt

4) When giving solutions to newbies, it's usually courteous to also post an explanation of how the commands you gave work, rather than just plop down code without any context. It can also help you to catch your own mistakes before you post them.

So in the above, the first grep command simply outputs lines that contain the string "2011", and pipes them into a second grep for further filtering. There, the -i option indicates case-insensitive matching, and the -v option inverts the output. The two -e expressions are thus the strings that we want to exclude.

So note that this apparently does NOT do what the OP asked (although we could use some clarification on this, as I mentioned above). This prints only the lines containing "2011" that don't also have warning or jobs in them. If I'm reading correctly though, I believe what he wants are lines that contain both 2011 and either warning or jobs. In which case, just remove the -v flag from the second grep.
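
To make the difference between the two readings concrete, here is a minimal sketch run against a made-up sample file. The file contents and the variable names are assumptions for illustration only, not the OP's real data.

```shell
# Hypothetical sample data; a temporary file stands in for somebigfile.txt.
sample=$(mktemp)
printf '%s\n' \
    '2011-12-20 WARNING: disk almost full' \
    '2011-12-20 jobs finished' \
    '2011-12-20 backup ok' \
    '2010-01-05 warning: old entry' > "$sample"

# With -v: lines containing "2011" that do NOT contain warning or jobs.
excluded=$(grep '2011' "$sample" | grep -iv -e 'warning' -e 'jobs')

# Without -v: lines containing "2011" that DO contain warning or jobs.
included=$(grep '2011' "$sample" | grep -i -e 'warning' -e 'jobs')

printf 'with -v:\n%s\n' "$excluded"
printf 'without -v:\n%s\n' "$included"
rm -f "$sample"
```

Because of -i, the upper-case "WARNING" line is matched by the second grep in both variants, while the 2010 line never gets past the first one.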


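The sed command does almost the same thing as the two grep commands. The one difference is that it matches case-sensitively, as there's no -i in play here (in GNU sed you can append the I flag to a pattern, e.g. /(jobs|warning)/I, to ignore case). First I used -r to enable extended regex (explained below), and -n to turn off printing by default. The -e option is similar to grep's.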

/../ :indicates a regex pattern to match, in this case lines containing "2011".

{..} :groups the subsequent commands that operate on the first match.

/../ :again matches lines, this time from the results of the previous match.

(..|..) :means match either A or B. This is the part that needs the -r option. Although in GNU sed you can instead backslash-escape the parentheses and the pipe for the same effect ("\(..\|..\)"), I think it's cleaner just to enable extended regex.

! :inverts the condition of the match, similar to grep's -v option.

p :finally, the p command prints the resulting matches.

So again, just remove the ! to only get lines that contain both patterns.
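
As a quick check of the two sed variants, here is a small sketch against made-up input. The sample lines and variable names are assumptions for illustration, and the -r option assumes GNU sed, as in the command above.

```shell
# Hypothetical sample data, all lower case because these sed patterns are case-sensitive.
sample=$(mktemp)
printf '%s\n' \
    '2011 warning raised' \
    '2011 jobs done' \
    '2011 all quiet' \
    '2010 warning raised' > "$sample"

# With !: print "2011" lines that do NOT match jobs or warning.
without_match=$(sed -rn -e '/2011/ { /(jobs|warning)/!p }' "$sample")

# Without !: print "2011" lines that DO match jobs or warning.
with_match=$(sed -rn -e '/2011/ { /(jobs|warning)/p }' "$sample")

printf 'with !:\n%s\n' "$without_match"
printf 'without !:\n%s\n' "$with_match"
rm -f "$sample"
```

The 2010 line is never printed in either case, since the inner block only runs for lines that already matched /2011/.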


@ust: I recommend that you read the man and info pages for grep and sed. It's generally a good idea to take the time to learn how the tools you're using really work.

Here are some useful sed resources for you too:
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt

You should also learn at least the basics of regular expressions:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html

ust 12-20-2011 07:13 PM

The example:

The file content is:
2011
2011
aaa warning jobs
bbb warning
ccc jobs
ddd

Then the output should be as below. Can you advise? Thanks.
2011
2011
aaa warning jobs
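
One way to read this example is: keep every line that contains "2011", plus every line that contains both "warning" and "jobs". A minimal sketch under that assumed reading follows; the temporary file and the variable names are illustrative only.

```shell
# Sample data copied from the example above; a temporary file stands in for the real one.
sample=$(mktemp)
printf '%s\n' '2011' '2011' 'aaa warning jobs' 'bbb warning' 'ccc jobs' 'ddd' > "$sample"

# Multiple -e patterns are ORed: keep a line if it contains "2011",
# or "warning" followed by "jobs", or "jobs" followed by "warning".
with_grep=$(grep -e '2011' -e 'warning.*jobs' -e 'jobs.*warning' "$sample")

# The same condition in awk, where AND and OR can be written directly.
with_awk=$(awk '/2011/ || (/warning/ && /jobs/)' "$sample")

printf '%s\n' "$with_grep"
rm -f "$sample"
```

Both commands match case-sensitively; adding -i to the grep command would make it ignore case.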

aazkan 12-20-2011 08:57 PM

Hi David,

Thanks for the pointers; I appreciate your input. I'll work on my future replies to questions.

Regards.

salasi 12-20-2011 09:16 PM

Quote:

Originally Posted by ust (Post 4554322)
can advise what can i do ? thx

At this point, you really should be able to understand most of the points made by the contributors to this thread, and write some of your own code. You should be able to present some of your own code (in code tags, of course) and say exactly what it doesn't do that you want it to do.


All times are GMT -5.