01-22-2005, 05:47 PM
|
#1
|
|
LQ Newbie
Registered: Aug 2004
Posts: 9
Rep:
|
bash scripting: problems with loops and grep
[CODE]
#!/bin/bash
for CURDOC in `grep -hr "From: Doctor" /home/jsb46/cs265/output/DrList`
do
    echo \"$CURDOC\"
    echo -----------------
    grep -hr "From: Doctor $CURDOC" * | wc -l
    echo
done
[/CODE]
What's supposed to happen: the text file DrList is a list of doctors pulled from message board backup text files, one file per post. It was built by another script that looked for "From: Doctor" and put the entire line into DrList, so DrList looks like this:
From: Doctor Doctor's_Name
Now I want to find out how many messages each doctor has posted, based on the same principle.
The problem: the searching goes fine, but I can't get an entire line of text from DrList into the variable, only one word at a time, so it searches for From:, Doctor, and the doctor's name separately. It also repeats the search for a word every time that word appears in DrList.
Bottom line: please help me grep for a string like "From: Doctor Achilles" rather than all three text chunks separately. Any help is appreciated, so thanks in advance!
Last edited by phoeniks; 01-22-2005 at 05:48 PM.
01-22-2005, 08:16 PM
|
#2
|
|
Senior Member
Registered: Nov 2001
Location: Budapest, Hungary
Distribution: SuSE 6.4-11.3, Dsl linux, FreeBSD 4.3-6.2, Mandrake 8.2, Redhat, UHU, Debian Etch
Posts: 1,126
Rep:
|
Well, if you had just one file (say, data.txt) containing some lines, this single command would count the occurrences of each distinct line in data.txt and print each line followed by the number of times it occurs:
awk '{count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}' data.txt
Alternatively, you could cat the datafile to the standard input of awk:
cat data.txt | awk '{count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}'
You could also use wildcards, if you have several files:
awk '{count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}' *.txt
You could filter the lines by pattern, if you do not want to count all lines:
awk '/pattern/ {count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}' *.txt
You could also print the count first, and use the sort command to sort the output of awk in descending order of count:
awk '{count[$0]=count[$0]+1} END {for (data in count) print count[data], data}' *.txt | sort -nr
So you do not really need a script for your task, just a single awk command.
A note: you could also consider feeding the mails directly to awk and using the pattern filter, instead of generating the intermediate DrList file.
P.S.:
Your script might also work (though much less efficiently) if you inserted these lines before the for loop, which set IFS to a single newline:
IFS="
"
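As a rough sketch of that IFS idea (the temporary file and the two names in it are made up for illustration, standing in for DrList), setting IFS to a newline makes the for loop receive one whole line per iteration instead of one word:

```shell
#!/bin/bash
# Hypothetical stand-in for DrList; the names are illustrative only.
list=$(mktemp)
printf 'From: Doctor Achilles\nFrom: Doctor Who\n' > "$list"

# Split the command substitution on newlines only, so the spaces
# inside a line no longer separate it into several words.
old_ifs=$IFS
IFS='
'
count=0
for CURDOC in $(cat "$list"); do
    count=$((count + 1))
    echo "\"$CURDOC\""        # CURDOC now holds a complete line
done
IFS=$old_ifs                   # restore normal word splitting

echo "iterations: $count"
rm -f "$list"
```

The unquoted substitution is still subject to filename globbing, so a line containing * or ? could expand unexpectedly; that is one reason a read loop is usually the safer choice.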
Last edited by J_Szucs; 01-22-2005 at 08:29 PM.
01-23-2005, 11:52 AM
|
#3
|
|
LQ Newbie
Registered: Aug 2004
Posts: 9
Original Poster
Rep:
|
The thing is, though, that the script has to run over a lot of files in different directories. It will be run in a specified base directory containing about 65 numbered subdirectories, and the message files I'm searching are inside those subdirectories. The script has to be run from that base directory, which is why I was using grep's recursive option. So, when run, it has to go into each subdirectory, search each file for occurrences of X (X being each line of DrList in turn), and report the number of instances of X across those 25000 files. My experience with awk is only about a week old, so I'm not really sure how to combine it with the bash script or how to make awk search subdirectories recursively. Thanks again.
EDIT: maybe this is a better option. I also tried using awk to grab the third field from DrList, which is the last name, putting each one into a variable in turn, and then running grep -rh "From: Doctor $DOCTOR" . for each one. I've tried a bunch of things and they all come very close to working; it's just that I can't properly search for the entire string 'From: Doctor $DOCTOR'. Instead I get three separate searches, one for each field in those quotes.
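On the question of making awk cover subdirectories: awk has no recursive option of its own, but find can collect the files for it. A minimal sketch, assuming every regular file under the base directory is a message file (count_doctors is a hypothetical helper name, not something from this thread):

```shell
#!/bin/bash
# count_doctors DIR: print "count line" for each distinct line containing
# "From: Doctor" in any regular file under DIR, highest count first.
# find feeds the file contents through cat into a single awk process,
# so the counts are not split across several awk runs.
count_doctors() {
    find "$1" -type f -exec cat {} + |
        awk '/From: Doctor/ { count[$0]++ }
             END { for (line in count) print count[line], line }' |
        sort -nr
}
```

Calling count_doctors . from the base directory would then tally all the subdirectories in one pass, without needing DrList at all.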
Last edited by phoeniks; 01-23-2005 at 12:03 PM.
01-24-2005, 04:39 AM
|
#4
|
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,211
Rep: 
|
Have you looked at using sort and uniq -c?
For example, with this data:
Code:
billym.primadtpdev>cat ~/1
From: Doctor Jim
From: Doctor Bob
From: Doctor Ringo
From: Doctor Ringo
From: Doctor billy
From: Doctor billy
From: Doctor James
From: Doctor billy
From: Doctor Who
sort & uniq:
Code:
billym.primadtpdev>sort ~/1 | uniq -c
3 From: Doctor billy
1 From: Doctor Bob
1 From: Doctor James
1 From: Doctor Jim
2 From: Doctor Ringo
1 From: Doctor Who
01-24-2005, 11:12 AM
|
#5
|
|
Senior Member
Registered: Jul 2003
Location: Indiana
Distribution: Mandrake Slackware-current QNX4.25
Posts: 1,802
Rep:
|
Code:
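# Read DrList one whole line at a time; IFS= and -r keep each line intact.
# This assumes CURDOC holds a bare name; if the lines in DrList already
# start with "From: Doctor", grep for "$CURDOC" alone instead.
while IFS= read -r CURDOC
do
    echo "\"$CURDOC\""
    echo -----------------
    grep -hr "From: Doctor $CURDOC" * | wc -l
    echo
done < /home/jsb46/cs265/output/DrList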
01-24-2005, 04:00 PM
|
#6
|
|
Senior Member
Registered: Nov 2001
Location: Budapest, Hungary
Distribution: SuSE 6.4-11.3, Dsl linux, FreeBSD 4.3-6.2, Mandrake 8.2, Redhat, UHU, Debian Etch
Posts: 1,126
Rep:
|
Well, only now have I realized what your task actually is. So, you have doctor names in the file /path/to/drlist...
Though others have posted complete solutions that work, here is an alternative that should also work and has one advantage over them: it runs grep only once (and with the -F option), so it should be much faster, especially if you have a lot of files to search:
grep -R -h -F "`cat /path/to/drlist | sed 's/^.*$/From: Doctor &/'`" /path/to/files/* | sort | uniq -c
If drlist contained lines already formatted like this: "From: Doctor doctorname", the counting command would be simpler:
grep -R -h -f /path/to/drlist /path/to/files/* | sort | uniq -c
(The latter command makes use of the fact that grep can search for several regexps in one pass, and that those regexps can come from a file, the drlist file in this case.)
However, I must admit that with the simple "uniq -c" command (I did not know that option of uniq), there is no need for awk for this task.
Finally, I would be really interested to know whether you find the above commands actually faster. Please post your findings.
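To make the pattern-file idea concrete, here is a small self-contained sketch on made-up sample data. The helper name count_listed, the temporary directory layout and the doctor names are all assumptions for illustration. It uses -F so the list lines are treated as fixed strings, and -h so that file names do not end up in the lines being counted:

```shell
#!/bin/bash
# count_listed LIST DIR: hypothetical helper, not from the thread.
# Runs one recursive grep over DIR, treating every line of LIST as a
# fixed string (-F), then tallies identical matching lines.
# Note: a blank line in LIST would match every line, so keep LIST clean.
count_listed() {
    grep -rhF -f "$1" "$2" | sort | uniq -c
}

# Made-up sample tree: two posts by Who, one by Bob, one by an unlisted doctor.
demo=$(mktemp -d)
mkdir -p "$demo/msgs/1" "$demo/msgs/2"
printf 'From: Doctor Who\nhello\n' > "$demo/msgs/1/post1"
printf 'From: Doctor Who\n'        > "$demo/msgs/2/post2"
printf 'From: Doctor Bob\n'        > "$demo/msgs/2/post3"
printf 'From: Doctor Jim\n'        > "$demo/msgs/2/post4"
printf 'From: Doctor Who\nFrom: Doctor Bob\n' > "$demo/drlist"

result=$(count_listed "$demo/drlist" "$demo/msgs")
echo "$result"
rm -rf "$demo"
```

Only the doctors present in the list are tallied, and a doctor with no posts at all simply does not appear in the output, unlike the per-doctor loop, which prints a 0 for them.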
Last edited by J_Szucs; 01-24-2005 at 04:20 PM.