Don't worry about being a beginner. We all have to start somewhere. This looks like a good project to learn from.
As I asked in 1), please use code tags around your script and text.
After reading the OP more carefully, I think I have an easier solution for you using sed, but first let's use your existing script as a learning exercise.
To begin with, I'm afraid I don't see your point about less. When used with a pipe, less and cat do pretty much the same thing. It's the -m option in grep that limits the output either way.
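If you want to see this for yourself, here is a small sketch using a throwaway file (the file contents and the variable names are made up for illustration). Both commands print the same single line, because -m 1 tells grep to stop after the first match no matter what feeds it.

```shell
#!/bin/bash
# Hypothetical test file with two matching lines.
sample=$(mktemp)
printf '%s\n' 'Subject: first' 'Body text' 'Subject: second' > "$sample"

# grep -m 1 stops after the first matching line, whatever feeds the pipe.
via_cat=$(cat "$sample" | grep -m 1 '^Subject')
# The same result without any pager or cat in front.
via_grep=$(grep -m 1 '^Subject' "$sample")

echo "$via_cat"
echo "$via_grep"
rm -f "$sample"
```

You can swap cat for less in the first pipeline to compare; it is left out of the sketch only because less may not be installed everywhere.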
In any case, the first thing you should look at is reducing the number of calls to external tools like grep. Assuming that there's only one email (and Subject, To, From, etc.) per file, how about we just grab all the important lines at once, and process them later inside the shell?
Code:
searchlist='Subject|From|To|Message-ID|Date|Mime-Version|Content-Type|In-Reply-To'
IFS=$'\n' #forces wordbreaking on newlines only, necessary for setting the array
array=( $( grep -En "^($searchlist)" "$I" ) ) #grep -E is the current spelling of egrep
# or more succinctly, if using bash 4+. Doesn't require changing IFS.
mapfile -t array < <( grep -En "^($searchlist)" "$I" )
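As a quick check of what ends up in the array, here is a minimal sketch run against an invented message (the addresses, the date and the temporary file are assumptions for illustration only). Note that the folded second line of the Subject starts with a space, so it does not match and gets no entry of its own.

```shell
#!/bin/bash
# Hypothetical sample message; the header values are invented.
I=$(mktemp)
cat > "$I" <<'EOF'
From: alice@example.com
To: bob@example.com
Subject: a long subject
 that continues on a folded line
Date: Mon, 1 Jan 2024 10:00:00 +0000

Body text.
EOF

searchlist='Subject|From|To|Message-ID|Date|Mime-Version|Content-Type|In-Reply-To'
# mapfile needs bash 4+; -t drops the trailing newline from each entry.
mapfile -t array < <( grep -En "^($searchlist)" "$I" )

# One entry per matching header line, each prefixed with "number:".
printf '%s\n' "${array[@]}"
rm -f "$I"
```

The gap between the "3:" entry and the "5:" entry is exactly what the rest of the script relies on to find where the Subject ends.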
Now we have an array holding all the lines you want to search, prepended by their line number. Next we just have to find the entry containing the "Subject" and the one following it.
Code:
x=0 #make sure the flag starts out unset
for line in "${array[@]}" ; do
    #only true on the iteration after "Subject" was found
    if (( x == 1 )) ; then
        end="${line%%:*}"
        break
    fi
    if [[ $line == *Subject:* ]] ; then
        start="${line%%:*}"
        x=1
    fi
done
The first if statement is ignored until Subject is found. That line sets the start value, as well as the variable "x". Then on the next iteration, it sets the end value and breaks the loop.
${line%%:*} strips off everything from the first colon onward, leaving only the line number and replacing cut.
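Here is a small sketch of that expansion next to the cut equivalent. The sample entry is invented; it just mimics the "number:line" form that grep -n produces.

```shell
#!/bin/bash
# Hypothetical entry in the "number:line" form that grep -n produces.
line='12:Subject: Re: meeting'

# %% removes the longest suffix matching ":*", i.e. from the first colon on.
with_expansion="${line%%:*}"
# A single % removes the shortest match instead, i.e. from the last colon on.
shortest="${line%:*}"
# The external-tool equivalent, at the cost of a pipe and a new process.
with_cut=$(echo "$line" | cut -d: -f1)

echo "$with_expansion"
echo "$shortest"
echo "$with_cut"
```

The doubled %% matters here because the header itself contains colons; with a single % you would keep most of the line.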
Now you have start and end variables with the two matching line numbers. We just need to shift the endpoint by one and extract the final text for counting.
Code:
(( end-- ))
count="$(sed -n "$start,$end p" "$I" )"
echo "${#count}"
We can even dispense with wc, since the shell can count the characters itself, though the number may differ slightly because command substitution strips trailing newlines from the captured output.
I hope you also realize that this counts the "Subject: " header part too. If that matters, you can adjust the sed command to remove it.
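A minimal sketch of that difference, using an invented one-line string. Keep in mind too that ${#var} counts characters while wc -c counts bytes, so the two can also disagree on multibyte text.

```shell
#!/bin/bash
# Command substitution strips the trailing newline before the shell counts.
text=$(printf 'Subject: hi\n')
shell_count=${#text}

# wc -c sees the newline as well, so it reports one more.
wc_count=$(printf 'Subject: hi\n' | wc -c)

echo "$shell_count"
echo "$wc_count"
```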
Code:
count="$(sed -n "$start,$end { s/Subject:[ ]*// ; p }" "$I" )"
You can figure the rest out yourself, I'm sure.
And see here for more on doing string manipulations in bash:
parameter expansion
string manipulation
Now, as I mentioned earlier, there's a better way. sed can extract the block you want directly, if you use the proper address forms.
Code:
#Don't include "Subject" in the list.
searchlist="From|To|Message-ID|Date|Mime-Version|Content-Type|In-Reply-To"
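#Double quotes are needed so the shell expands $searchlist inside the sed script.
#Both addresses are anchored with ^ so that header names appearing mid-line don't match.
count=$( sed -rn "/^Subject:/,/^($searchlist)/ { /^($searchlist)/d ; p }" "$I" )
#or without the "Subject: " prefix
count=$( sed -rn "/^Subject:/,/^($searchlist)/ { /^($searchlist)/d ; s/^Subject:[ ]*// ; p }" "$I" )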
echo "${#count}"
It matches from the "Subject" line to the next line that matches something in "$searchlist". Then the inner block in braces deletes that terminating "$searchlist" line before printing, so only the Subject text is left.
See? No need to extract line numbers.
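To try the sed approach end to end, here is a sketch against an invented message with a folded Subject (the header values and the temporary file are assumptions for illustration). It assumes GNU sed for the -r option, and it anchors both addresses to the start of the line so that a stray "To" or "From" inside the subject text cannot end the range early.

```shell
#!/bin/bash
# Hypothetical sample message; the header values are invented.
I=$(mktemp)
cat > "$I" <<'EOF'
From: alice@example.com
Subject: a long subject
 that continues on a folded line
Date: Mon, 1 Jan 2024 10:00:00 +0000

Body text.
EOF

searchlist='From|To|Message-ID|Date|Mime-Version|Content-Type|In-Reply-To'
# Range: from the Subject line to the next listed header. Inside the range,
# delete that closing header line, strip the "Subject: " prefix, print the rest.
count=$( sed -rn "/^Subject:/,/^($searchlist)/ { /^($searchlist)/d ; s/^Subject:[ ]*// ; p }" "$I" )

echo "$count"
echo "${#count}"
rm -f "$I"
```

The first echo prints both lines of the folded subject, and the second prints its length including the newline between them.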
Here are a few useful sed references.
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt