LinuxQuestions.org
Old 09-06-2012, 11:05 AM   #1
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Rep: Reputation: Disabled
limiting grep's return


Hi guys,

I have a bunch of files that have many columns of space-separated data. I need to search through all those files and find which ones have a specific numerical string. That's easy to do with the awk command, but awk will list out all those files that have that string in any column or is a part of a larger number. Well, that doesn't work for me.

So I'm trying to write an awk script that uses grep that will only search a user-specified column of data, and will only match when the string is not part of a larger number. For that second requirement, I'm trying a condition that looks for a space both before and after the numerical string.

I'm thinking this should be fairly simple, and done many times before, but it's not for me

thankz, Tabitha
 
Old 09-06-2012, 11:20 AM   #2
kabamaru
Member
 
Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 134
I'm not an AWK guru by any stretch but you can specify the field (column) and exact numerical string like this:

Code:
awk '$NUMBER_OF_COLUMN == NUMERICAL_STRING' file1 file2 file3
example (5th column, number "546"):

Code:
awk '$5 == 546' file1 file2 file3
This will only match lines where the 5th column is equal to that number.
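To see why the field comparison addresses the original problem, here is a small sketch on made-up data (the file name sample.txt and its contents are assumptions for illustration only). A plain grep for 546 matches the string anywhere on the line, while the awk test only matches when the 5th field as a whole equals 546.

```shell
#!/bin/sh
# Build a throwaway sample file (contents are invented for illustration).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
1 2 3 4 546
1 2 3 4 15467
546 2 3 4 5
EOF

# grep matches "546" anywhere on the line, including inside 15467
# and in the wrong column.
grep_hits=$(grep -c '546' "$tmpdir/sample.txt")

# awk compares only the 5th field, as a whole value.
awk_hits=$(awk '$5 == 546' "$tmpdir/sample.txt" | wc -l | tr -d ' ')

echo "grep: $grep_hits line(s), awk: $awk_hits line(s)"
rm -rf "$tmpdir"
```

Because both sides look numeric, awk compares them as numbers, so a field written as 546.0 would also count as a match.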
 
Old 09-06-2012, 12:46 PM   #3
kabamaru
Member
 
Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 134
And, to just print the files that match the criteria:

Code:
awk '$5 == 546 { print FILENAME }' file1 file2 file3 | uniq
...or the filenames and the matching lines:
Code:
awk '$5 == 546 { print FILENAME, ":", $0 }' file1 file2 file3
 
Old 09-06-2012, 01:53 PM   #4
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
To loop over a hundred-plus files, I can't list them all out, so can I do a while/do/done loop like this:

Code:
i=1
numfiles=10000
# choosing a numfiles value that will be much much greater than the actual number of files in the directory
while ((i<=numfiles))
do
   awk '$5 == 546 { print FILENAME }' file1 file2 file3 | uniq   
((i=i+1))
done
I'm not able to test this right now, maybe not till Friday or Monday, so that's why I'm asking and not testing it out

Last edited by atjurhs; 09-07-2012 at 07:18 PM.
 
Old 09-06-2012, 03:16 PM   #5
kabamaru
Member
 
Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 134
No, that loop will execute the same command 10000 times on the same files (file1, file2, file3).

Why not use bash's filename expansion through wildcards? That way awk is only called once.
If you want all the files in a directory to be processed:

Code:
awk '$5 == 546 { print FILENAME }' * | uniq
or all the *.txt files in the directory:
Code:
awk '$5 == 546 { print FILENAME }' *.txt | uniq
If you want to put this in a script:
Code:
#!/bin/sh
# Filename: myscript.sh

awk '$5 == 546 { print FILENAME }' "$@" | uniq
make it executable:
Code:
chmod +x myscript.sh
and run the script like this (examples):
Code:
./myscript.sh file_1 file_2 file_3 file_n
or (all files)
Code:
./myscript.sh *
or (all .txt, .log, and .conf files):
Code:
./myscript.sh *.{txt,log,conf}
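Since the original question asked for a user-specified column, one possible extension is to pass the column and the value in through awk's -v option instead of hard-coding 5 and 546. This is only a sketch: the function name findcol and the variable names col and val are made up here, and the demo files are invented.

```shell
#!/bin/sh
# Hypothetical helper: findcol COLUMN VALUE FILE...
# Prints each file that has VALUE in field number COLUMN.
findcol() {
    col=$1
    val=$2
    shift 2
    # -v copies shell values into awk variables; inside the awk program,
    # $col means "the field whose number is stored in col".
    awk -v col="$col" -v val="$val" '$col == val { print FILENAME }' "$@" | uniq
}

# Demo on invented data: only a.txt has 546 in its 5th column.
tmpdir=$(mktemp -d)
printf '1 2 3 4 546\n9 9 9 9 546\n' > "$tmpdir/a.txt"
printf '546 2 3 4 5\n'              > "$tmpdir/b.txt"
found=$(cd "$tmpdir" && findcol 5 546 *.txt)
echo "$found"
rm -rf "$tmpdir"
```

The remaining arguments are handed to awk through "$@", the same way the script above does it, so wildcards on the command line keep working.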

Last edited by kabamaru; 09-06-2012 at 03:27 PM.
 
Old 09-06-2012, 08:41 PM   #6
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
very cool! thanks!!!
 
Old 09-07-2012, 03:05 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Could you please remove the long, unbroken "###" lines from your post? They do nothing but make my browser window side-scroll. Thanks.

When posting questions about processing text files, it usually helps to provide an actual example of the input text, and what the output needs to be. Also post any commands you've already tried, so that we can see what you're thinking.

I say this because awk is a full scripting language of its own and capable of doing very exact matches. When you say something like "awk will list out all those files that have that string in any column or is a part of a larger number", that usually just means you haven't used the right awk command.
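As an illustration of that point, here is a sketch contrasting a loose awk match with two exact ones, on invented data. A pattern like $5 ~ /546/ matches any 5th field that merely contains 546, which is probably the kind of command that produces the "part of a larger number" hits; that guess is an assumption, since the original command was not posted.

```shell
#!/bin/sh
# Invented sample: only the first line has exactly 546 in field 5.
data='a b c d 546
a b c d 15467
a b c d 5460'

# Loose: regex match anywhere inside field 5.
loose=$(printf '%s\n' "$data" | awk '$5 ~ /546/' | wc -l | tr -d ' ')
# Exact: anchored regex, field 5 must be exactly the characters 546.
anchored=$(printf '%s\n' "$data" | awk '$5 ~ /^546$/' | wc -l | tr -d ' ')
# Exact: numeric equality on field 5.
equal=$(printf '%s\n' "$data" | awk '$5 == 546' | wc -l | tr -d ' ')

echo "loose=$loose anchored=$anchored equal=$equal"
```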
 
Old 09-11-2012, 10:36 AM   #8
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
Kabamaru, I ended up using:

Code:
#!/bin/sh

awk '$5 == 546 { print FILENAME }' *.txt | uniq > out.txt
and it works really really well, thanks soooo much!

I'd like to add one more piece to the code, a line counter. Right now the output file just lists:

fileABC.txt
fileDEF.txt
fileIJK.txt
fileQRS.txt
fileXYZ.tkt

Is there a way to run this script with a line counter, so that the number of occurrences in each file is also listed in the output file? Something like this:

fileABC.txt 5938
fileDEF.txt 13
fileIJK.txt 19
fileQRS.txt 3984
fileXYZ.tkt 6105

This way I know not to waste my time on fileDEF.txt and fileIJK.txt, and to spend more time analyzing the other files because they have more data for me to use.

Last edited by atjurhs; 09-11-2012 at 10:41 AM.
 
Old 09-11-2012, 11:39 AM   #9
kabamaru
Member
 
Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 134
How about this one? Put this in a file e.g. "myscript.awk":

Code:
#!/usr/bin/awk -f

$5 == 546 { if (FILENAME != last && last != "") {
                print last, count
                count = 0
            }
            count++
            last = FILENAME
          }

END { if (last != "") print last, count }   # print nothing if no file matched
First make it executable, and then run it like this:

Code:
./myscript.awk *.txt
Replace the highlighted characters (the column number 5 and the value 546) appropriately.
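An alternative sketch of the same counter uses an awk array indexed by FILENAME, which avoids tracking the previous file by hand. This is a different technique from the script above, not a correction of it. The sample files are invented, and because the order of a for (f in count) loop is unspecified, the output is piped through sort.

```shell
#!/bin/sh
# Invented sample files: fileA has two matches, fileB one, fileC none.
tmpdir=$(mktemp -d)
printf '1 2 3 4 546\n1 2 3 4 546\n1 2 3 4 7\n' > "$tmpdir/fileA.txt"
printf '1 2 3 4 546\n'                          > "$tmpdir/fileB.txt"
printf '1 2 3 4 8\n'                            > "$tmpdir/fileC.txt"

counts=$(cd "$tmpdir" && awk '
$5 == 546 { count[FILENAME]++ }                  # one counter per input file
END       { for (f in count) print f, count[f] } # files with no match are absent
' *.txt | sort)

echo "$counts"
rm -rf "$tmpdir"
```

Files without any match never get an array entry, so they are simply left out of the listing, as in the original script.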

Last edited by kabamaru; 09-11-2012 at 02:52 PM.
 
Old 09-11-2012, 03:52 PM   #10
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
it ran successfully by

Code:
 ./myscript.awk *.txt | uniq > out.txt
I'm wondering if there is a way to put the
Code:
 *.txt | uniq > out.txt
inside the myscript.awk ?

I tried putting it at the end, much like you did in your 5th post, but that didn't work.

Tabitha
 
Old 09-11-2012, 04:29 PM   #11
kabamaru
Member
 
Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 134
That's easy. You can put all this in a shell script with your awk program enclosed within single quotes:
Code:
#!/bin/sh

awk '
$5 == 546 { if (FILENAME != last && last != "") {
                print last, count
                count = 0
            }
            count++
            last = FILENAME
          }

END { if (last != "") print last, count }   # print nothing if no file matched
' *.txt > out.txt
Btw you don't need 'uniq' anymore, as every output line will inevitably be unique ;-)

Last edited by kabamaru; 09-11-2012 at 04:39 PM.
 
Old 09-11-2012, 04:52 PM   #12
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
aaaargh!!!

I almost got that, only I placed the single quote after

Code:
last = FILENAME
}'
thinking it should close before the END. I'm almost getting this stuff, just barely missing
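The closing quote has to come after END because the shell passes everything between the single quotes to awk as one argument, and that one argument is the whole program. A small sketch that makes this visible by storing the program text in a variable first (the name prog and the sample input are made up for illustration):

```shell
#!/bin/sh
# The whole awk program, the pattern-action rule and the END rule together,
# is a single shell string.
prog='
$1 == 546 { count++ }
END       { print count + 0 }
'

# Anything after the program string is treated by awk as input files;
# here there are none, so awk reads standard input.
total=$(printf '546\n12\n546\n' | awk "$prog")
echo "$total"
```

If the quote is closed before END, the END rule is left outside the program, and the shell rather than awk ends up trying to interpret it.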

thanks so so much, here's a virtual hug
 
Old 09-12-2012, 04:18 AM   #13
kabamaru
Member
 
Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 134
And you can sort the results by number of occurrences (descending order) by replacing

Code:
> out.txt
with

Code:
| sort -k2nr > out.txt
Output:

Code:
fileXYZ.tkt 6105
fileABC.txt 5938
fileQRS.txt 3984
fileIJK.txt 19
fileDEF.txt 13

Cheers.
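To try the sort step in isolation, here is a sketch that feeds the sample counts from earlier in the thread through sort. It uses -k2,2nr rather than -k2nr to limit the sort key to the second field; with two-column input like this, both should give the same order.

```shell
#!/bin/sh
# Sample counts copied from the thread; sort numerically on field 2, descending.
sorted=$(printf '%s\n' \
    'fileABC.txt 5938' \
    'fileDEF.txt 13' \
    'fileIJK.txt 19' \
    'fileQRS.txt 3984' \
    'fileXYZ.tkt 6105' | sort -k2,2nr)
echo "$sorted"

# The file with the most matches ends up on the first line.
top=$(printf '%s\n' "$sorted" | head -n 1)
bottom=$(printf '%s\n' "$sorted" | tail -n 1)
```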
 
Old 09-12-2012, 08:52 AM   #14
atjurhs
Member
 
Registered: Aug 2012
Posts: 311

Original Poster
Rep: Reputation: Disabled
cool!!!
 
  

