search file and extract lines matching array and within date range
Linux - General: This Linux forum is for general Linux questions and discussion. If it is Linux-related and doesn't seem to fit in any other forum, then this is the place.
Some improvement proposals:
1. grep takes newline-separated patterns, so a single variable holding one date per line works as a multi-pattern match.
2. It's more efficient to redirect the whole loop; it also gives you the choice between done > filename and done >> filename.
3. < filename ... is more efficient than cat filename | ...
4. cd once, and make sure it was successful!
5. Quotes around "${var[@]}" and "$@" keep the elements intact while preventing further word splitting and glob expansion.
Code:
last7days=$(
date +%Y-%m-%d -d "7 day ago"
date +%Y-%m-%d -d "6 day ago"
date +%Y-%m-%d -d "5 day ago"
date +%Y-%m-%d -d "4 day ago"
date +%Y-%m-%d -d "3 day ago"
date +%Y-%m-%d -d "2 day ago"
date +%Y-%m-%d -d "1 day ago"
date "+%Y-%m-%d"
)
...
cd "$folder" || exit
for val in "${StringArray[@]}"
do
    for filename in ./*_"$timestamp".xml
    do
        ### Look for each element of StringArray in the file; if found,
        ### a second grep keeps only lines stamped within the last 7 days
        ### (last7days is a newline-separated list of patterns).
        ## 7-day capture
        < "$filename" grep -A1 "$val" | grep -B2 "$last7days"
    done
done >> "$outfile"
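As an aside, the eight near-identical date commands above can also be generated with a loop; a minimal sketch, assuming GNU date (for the -d option):

```shell
#!/bin/bash
# Same newline-separated list of the last 8 days, oldest first,
# built with a loop instead of eight hand-written commands.
last7days=$(
    for i in 7 6 5 4 3 2 1 0; do
        date +%Y-%m-%d -d "$i day ago"
    done
)
printf '%s\n' "$last7days"
```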
Just for fun, a solution that uses gawk so that the 'mktime' function can be used to handle time stamps with greater flexibility. Also it saves multiple reads of the log files for each user.
Code:
#!/bin/bash
## Date setup for feeding into the process
start_date=$(date +%Y-%m-%d -d "7 day ago")
start_time="00:00:00"
end_date=$(date "+%Y-%m-%d")
end_time="23:59:59"
prevday=$(date -d yesterday '+%Y%m%d')
## remove the previous day's file - housekeeping to reduce file buildup
#rm /tmp/svnaudit/svnaudit_"$prevday".txt
timestamp=$(date +%Y%m%d)
#### create file/dir variables
#folder=/mnt/midtier_logs/report/Audit
#outfile=/tmp/svnaudit/svnaudit_$timestamp.txt
outfile="outfile.txt"
#emailfile=/tmp/svnaudit/svnaudit_emailme.txt
#rm "$emailfile"
#cd "$folder" || exit
for filename in ./*_"$timestamp".xml; do
    gawk -v start_date="$start_date" -v start_time="$start_time" \
         -v end_date="$end_date" -v end_time="$end_time" '
    BEGIN {
        FIELDWIDTHS = "6:10 1:8"
        split(start_date, d, "-")
        split(start_time, t, ":")
        st = mktime(d[1]" "d[2]" "d[3]" "t[1]" "t[2]" "t[3])
        split(end_date, d, "-")
        split(end_time, t, ":")
        et = mktime(d[1]" "d[2]" "d[3]" "t[1]" "t[2]" "t[3])
    }
    /revision=/ { rev = $0 }
    /user1|user12|user30|dev1|dev5|dev25|dev15|dev4/ { aut = $0 }
    /[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}/ {
        split($1, d, "-")
        split($2, t, ":")
        ct = mktime(d[1]" "d[2]" "d[3]" "t[1]" "t[2]" "t[3])
        if (ct > st && ct < et) {
            print rev
            print aut
            print
        }
    }' "$filename" >> "$outfile"
done
Notes:
1. I have used the FIELDWIDTHS variable to make it easy to isolate the date and time, as that line appears to be consistently formatted in the example data.
2. I have commented out the local file handling.
Also it saves multiple reads of the log files for each user.
I used to worry about such things, but for most (sane) sized files, this is no longer an issue - if the entire file remains RAM resident in the page-cache, no (extra) disk I/O ensues.
Again, use efficient redirection (especially if you want to reduce I/O operations!)
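To make that concrete, here is a minimal sketch contrasting the two redirection styles; the pattern and file names are hypothetical:

```shell
#!/bin/bash
# Hypothetical pattern, just to show the redirection shapes.
val="needle"

# Less efficient: the output file is re-opened on every iteration
#   for f in ./*.xml; do grep -A1 "$val" "$f" >> matches.txt; done

# More efficient: the shell opens the output file exactly once
for f in ./*.xml; do
    grep -A1 "$val" "$f"
done > matches.txt    # use >> instead if you need to append to an existing file
```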