LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 01-22-2005, 05:47 PM   #1
phoeniks
LQ Newbie
 
Registered: Aug 2004
Posts: 9

Rep: Reputation: 0
bash scripting - problems with loops and grep


[CODE]
#!/bin/bash

for CURDOC in `grep -hr "From: Doctor" /home/jsb46/cs265/output/DrList`
do

echo \"$CURDOC\"
echo -----------------
grep -hr "From: Doctor $CURDOC" *|wc -l
echo

done
[/CODE]


What's supposed to happen: The text file DrList is a list of doctors pulled from message-board backup text files, one file per post. It was generated by another script that looked for "From: Doctor" and wrote the entire matching line into DrList. So DrList looks like this:

From: Doctor Doctor's_Name

Now I want to find out how many messages each doctor has posted, based on the same principle.

The problem: The searching goes fine, but I can't get an entire line of DrList into the variable, only one word at a time, so it searches for "From:", "Doctor", and the doctor's name separately. It also repeats the search for a word every time that word appears in DrList.

Bottom line: please help me grep for a whole string like "From: Doctor Achilles" rather than three separate text chunks. Any help is appreciated, so thanks in advance!
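The word-splitting can be reproduced in isolation; a minimal self-contained sketch (the name is just the example from above, not real data):

```shell
#!/bin/sh
# An unquoted expansion is split on the default IFS
# (space, tab, newline), so one line becomes three loop items
line="From: Doctor Achilles"
for WORD in $line; do
    echo "got: $WORD"
done
# prints:
#   got: From:
#   got: Doctor
#   got: Achilles
```

This is exactly what happens to the output of the backticked grep in the script above.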

Last edited by phoeniks; 01-22-2005 at 05:48 PM.
 
Old 01-22-2005, 08:16 PM   #2
J_Szucs
Senior Member
 
Registered: Nov 2001
Location: Budapest, Hungary
Distribution: SuSE 6.4-11.3, Dsl linux, FreeBSD 4.3-6.2, Mandrake 8.2, Redhat, UHU, Debian Etch
Posts: 1,126

Rep: Reputation: 58
Well, if you had just one file (say, data.txt) containing some lines, then this single command would count the occurrences of each line in data.txt and print each line followed by its count:
awk '{count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}' data.txt

Alternatively, you could cat the datafile to the standard input of awk:
cat data.txt | awk '{count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}'

You could also use wildcards, if you have several files:
awk '{count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}' *.txt

You could filter the lines by pattern, if you do not want to count all lines:
awk '/pattern/ {count[$0]=count[$0]+1} END {for (data in count) print data, count[data]}' *.txt

You could also print the count first, and use the sort command to sort the output of awk in descending order of count:
awk '{count[$0]=count[$0]+1} END {for (data in count) print count[data], data}' *.txt | sort -nr

So you do not really need a script for your task; just a single awk command.

A note: you could also consider feeding the mails directly to awk and using the pattern filter, instead of generating the intermediate DrList file.

P.S.:
Your script might also work (though much less efficiently) if you inserted these lines before the for loop:
IFS="
"

Last edited by J_Szucs; 01-22-2005 at 08:29 PM.
 
Old 01-23-2005, 11:52 AM   #3
phoeniks
LQ Newbie
 
Registered: Aug 2004
Posts: 9

Original Poster
Rep: Reputation: 0
The thing is, though, that the script is run over a lot of files in different directories. It runs in a specified base directory containing about 65 numbered subdirectories, and inside those subdirectories are the message files I'm searching; that's why I was using the recursive grep option. So, when run, it has to go into each subdirectory, search each file for occurrences of X (X being each line of DrList in turn), and report the number of instances of X over those 25000 files. My experience with awk is only about a week old, so if I can combine that inside the bash script or make awk search subdirectories recursively, I'm not really sure how to do it. Thanks again.
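For what it's worth, the recursion can be handed to find while a single awk pass does the counting; a self-contained demo (directory names and doctors invented, not the real 65 subdirectories):

```shell
#!/bin/sh
# Demo tree: two numbered subdirectories with message files
tmp=$(mktemp -d)
mkdir "$tmp/01" "$tmp/02"
printf 'From: Doctor Who\nbody\n' > "$tmp/01/post1"
printf 'From: Doctor Who\n'       > "$tmp/02/post2"
printf 'From: Doctor Bob\n'       > "$tmp/02/post3"

# find recurses where awk cannot; one awk pass counts every match.
# (Caveat: with a very long file list, find may start awk more than
# once and split the counts; fine at this scale.)
find "$tmp" -type f -exec awk \
    '/From: Doctor/ {count[$0]++} END {for (d in count) print count[d], d}' {} +
rm -rf "$tmp"
```

This prints one "count line" pair per doctor (order unspecified), e.g. "2 From: Doctor Who".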


EDIT: maybe this is a better option; I also tried using awk to grab the third field from DrList, which is the last name, pumping that into a variable one at a time and then running grep -rh "From: Doctor $DOCTOR". I've tried a bunch of things that all come very close to working; I just can't properly search for the entire string "From: Doctor $DOCTOR" - instead I get three separate searches, one for each field in those quotes.
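That third-field idea can work if the loop reads one name per iteration instead of word-splitting; a self-contained sketch (DrList contents and paths invented for the demo):

```shell
#!/bin/sh
# Demo: fake DrList plus two message files under msgs/
tmp=$(mktemp -d)
mkdir "$tmp/msgs"
printf 'From: Doctor Who\nFrom: Doctor Bob\n' > "$tmp/DrList"
printf 'From: Doctor Who\nhello\n'            > "$tmp/msgs/post1"
printf 'From: Doctor Who\nFrom: Doctor Bob\n' > "$tmp/msgs/post2"

# $3 is the (single-word) name field of "From: Doctor Name";
# read -r hands the loop exactly one name per iteration, and the
# double quotes keep the grep pattern in one piece
awk '{print $3}' "$tmp/DrList" | while read -r DOCTOR; do
    echo "$DOCTOR: $(grep -rh "From: Doctor $DOCTOR" "$tmp/msgs" | wc -l)"
done
rm -rf "$tmp"
```

With this sample data it reports 2 posts for Who and 1 for Bob.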

Last edited by phoeniks; 01-23-2005 at 12:03 PM.
 
Old 01-24-2005, 04:39 AM   #4
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,290

Rep: Reputation: 174Reputation: 174
have you looked at using
sort & uniq -c?

e.g.

data:
Code:
billym.primadtpdev>cat ~/1

From: Doctor Jim
From: Doctor Bob
From: Doctor Ringo
From: Doctor Ringo
From: Doctor billy
From: Doctor billy
From: Doctor James
From: Doctor billy
From: Doctor Who
sort & uniq:
Code:
billym.primadtpdev>sort ~/1 | uniq -c

   3 From: Doctor billy
   1 From: Doctor Bob
   1 From: Doctor James
   1 From: Doctor Jim
   2 From: Doctor Ringo
   1 From: Doctor Who
 
Old 01-24-2005, 11:12 AM   #5
/bin/bash
Senior Member
 
Registered: Jul 2003
Location: Indiana
Distribution: Mandrake Slackware-current QNX4.25
Posts: 1,802

Rep: Reputation: 46
Code:
while read CURDOC
do
   echo \"$CURDOC\"
   echo -----------------
   grep -hr "From: Doctor $CURDOC" *|wc -l
   echo
done </home/jsb46/cs265/output/DrList
 
Old 01-24-2005, 04:00 PM   #6
J_Szucs
Senior Member
 
Registered: Nov 2001
Location: Budapest, Hungary
Distribution: SuSE 6.4-11.3, Dsl linux, FreeBSD 4.3-6.2, Mandrake 8.2, Redhat, UHU, Debian Etch
Posts: 1,126

Rep: Reputation: 58
Well, just now I realized what your task actually is. So, you have doctor names in file /path/to/drlist...

Though others posted complete solutions that work, here is an alternative that should also work and has one advantage: it runs grep only once (and with the -F option), so it should be much faster, especially if you have a lot of files to search. The -h keeps the file names out of the output, so uniq -c counts per doctor rather than per file:

grep -R -F -h "`cat /path/to/drlist | sed 's/^.*$/From: Doctor &/'`" /path/to/files/* | sort | uniq -c

If drlist already contained lines formatted like "From: Doctor doctorname", the counting command would be simpler:

grep -R -h -f /path/to/drlist /path/to/files/* | sort | uniq -c

(The latter command takes advantage of the fact that grep can search for several patterns in one pass, and that those patterns can come from a file - the drlist file in this case.)
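A self-contained demo of the pattern-file variant (file names and doctors invented), with -h included so uniq -c tallies per doctor rather than per file:

```shell
#!/bin/sh
# Demo: a pattern file with one "From: Doctor Name" line per doctor
tmp=$(mktemp -d)
mkdir "$tmp/msgs"
printf 'From: Doctor Who\nFrom: Doctor Bob\n' > "$tmp/drlist"
printf 'From: Doctor Who\nsome text\n' > "$tmp/msgs/1"
printf 'From: Doctor Who\n'            > "$tmp/msgs/2"
printf 'From: Doctor Bob\n'            > "$tmp/msgs/3"

# -f reads all the patterns from drlist, so grep makes a single
# recursive pass; sort | uniq -c then tallies matches per doctor
grep -R -h -f "$tmp/drlist" "$tmp/msgs" | sort | uniq -c
rm -rf "$tmp"
```

With this sample data the output shows 2 for Who and 1 for Bob.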

However, I must admit that with the simple "uniq -c" command (I did not know about that option of uniq), there is no need for awk for this task.

Finally, I am really interested in whether you find the above commands actually faster. Please post your findings.

Last edited by J_Szucs; 01-24-2005 at 04:20 PM.
 
  

