search file and extract lines matching array and within date range
Hello. I've been struggling with this for a while. I need to read multiple files (30), searching for a specific string in an array AND matching a date range (yesterday back to 7 days ago).
My apologies, as I know there is a way to post code but I do not remember how.
Code:
prevdate1=`date -d yesterday '+%Y-%m-%d'`
prevdate2=`date --date="7 day ago" '+%Y-%m-%d'`
timestamp=`date +%Y%m%d`
declare -a StringArray=("user1" "dev1" "dev2" "dev3")
folderin=/mnt/report/Audit
outfile=/tmp/svnaudit/svnaudit_$timestamp.txt

for val in "${StringArray[@]}"; do
    cd "$folderin"
    for filename in ./*_${timestamp}.xml; do
        echo "reading $filename" >> "$outfile"
        cat "$filename" | grep -A1 "$val" | egrep -B2 "($prevdate1|$prevdate2)" >> "$outfile"
    done
done
You don’t grep for a date range, but two specific dates. To correct this, I would use date +%s to convert the dates to numbers, then check if the numbers are within the desired range.
By the way, what you are doing is known as cat abuse.
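That epoch-seconds suggestion could be sketched roughly like this; `in_range` is a hypothetical helper name (not from the thread), and GNU date is assumed:

```shell
#!/bin/bash
# Convert the range boundaries to epoch seconds once, then compare any
# date pulled from a file numerically instead of grepping for exact dates.
start=$(date -d "7 days ago 00:00" +%s)
end=$(date -d "yesterday 23:59:59" +%s)

in_range() {
    # $1 is a YYYY-MM-DD date extracted from the file
    local ts
    ts=$(date -d "$1" +%s) || return 2
    [ "$ts" -ge "$start" ] && [ "$ts" -le "$end" ]
}

if in_range "$(date -d '3 days ago' +%Y-%m-%d)"; then
    echo "3 days ago is within the window"
fi
```

This avoids listing every date explicitly: any date that parses can be tested against the window with two numeric comparisons.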
My apologies for the clumsiness in seeking assistance.
Yes, it is severe cat abuse, but it helped when seeking advice from colleagues on what I was trying to accomplish.
What I am trying to accomplish is obtaining data within a specific date range. The files contain data dating back to 2013; I only want data whose <date> entries fall between yesterday and seven days ago.
That explains why I was receiving only the specific dates provided by $prevdate1 and $prevdate2. How would one specify a date range in egrep? I have seen several examples, but none of them appear to work.
Again, we do not know the input file format or the required output. It looks like XML, so an XML parser (rather than grep) in Perl/Python/whatever would probably be better. http://catb.org/~esr/faqs/smart-ques...html#beprecise
Pardon; the input is an XML file and the output is a simple TXT file, as I only need to view the output for audit research.
Goal: I need to check only the XML entries matching the array elements, capture the current line containing a date within the range (2018-12-27) to (2018-12-20), and the previous two lines, so that I capture the author and revision.
Since actual XML is involved, you should use a proper XML parser for that.
It'd take a short time to write up a perl script to parse that and measure the dates. The CPAN modules XML::XPath and Date::Calc would be of use there.
That is what I would expect when searching a range.
However, when I use my variables prevdate1, prevdayrange1, and prevdayrange2:
prevdate1=`date +'%Y-%m'`
I appreciate the input, but I feel like I am close, so I am not looking to use an XML parser.
I understand that you don't want to use an XML parser, but I'll throw out a couple of things for you to consider:
You currently have to read 30 files; if this expands, a small script is not going to work too well.
XML data is fairly trivial to parse, but if even a TINY thing changes with the current input, your script will require a lot of modification to work again.
XML parsers are designed to handle XML data, and make it easy. Perl is easy to use, as is Python. For example, in Perl:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find::Rule;
use XML::LibXML;

# Get a list of files from the input directory
my @InputFile = File::Find::Rule->file()
    ->name( qr/\.(XMI)$/ )         # The pattern to look for. Put more in, delimited with a "|", if needed. Looks for XMI right now, case sensitive
    ->maxdepth(1)                  # ....and ONLY look in the base directory...do not recurse...
    ->in( "/some/path/to/files" ); # ...and tell the module WHERE to look

# Process each file you find
foreach my $InputXML (@InputFile) {
    my $dom = XML::LibXML->load_xml(location => $InputXML);
    # Now loop through the input file, reading entities one at a time for parsing/processing.
    foreach my $title ($dom->findnodes('//log/logentry')) {
        # This splits apart the tags and values into variables. Do things with them.
        my $Author = $title->findvalue('./Author');
        my $Date   = $title->findvalue('./Date');
        # Check date, output if needed.
        if ($Date eq 'something') {
            print "Date = $Date, Author = $Author\n";
        }
    }
}
TOTALLY untested, but that will easily churn through any files in a directory, look at the XML, assign the variables, and from there, you can examine/act. May have to change the XML entry point ("//log" instead of "//log/logentry"), or the variables, but this is all easy to test. Much more reliable than using awk/sed/grep to parse XML. And you can even eliminate the file::find module, and the lines that get the file names from the directory, and just pass a file name as an argument to this script...that's all there is to parsing/using XML.
Appreciate the input.
As I am not knowledgeable in Perl or Python, I was looking for something quick, since it was only a seven-day historical look. Since there appears to be a limitation expanding the variables, here is what a colleague and I arrived at:
Code:
#!/bin/bash
## Date setup for feeding into the process
sevendays=`date +%Y-%m-%d -d "7 day ago"`
sixdays=`date +%Y-%m-%d -d "6 day ago"`
fivedays=`date +%Y-%m-%d -d "5 day ago"`
fourdays=`date +%Y-%m-%d -d "4 day ago"`
threedays=`date +%Y-%m-%d -d "3 day ago"`
twodays=`date +%Y-%m-%d -d "2 day ago"`
oneday=`date +%Y-%m-%d -d "1 day ago"`
today=`date "+%Y-%m-%d"`
prevday=`date -d yesterday '+%Y%m%d'`

## remove the previous day's file - housekeeping to reduce file buildup
rm -f /tmp/svnaudit/svnaudit_$prevday.txt

currentyr=`date +%Y-`
timestamp=`date +%Y%m%d`
declare -a StringArray=("user1" "user12" "user30" "dev1" "dev5" "dev25" "dev15" "dev4")

#### create file/dir variables
folder=/mnt/midtier_logs/report/Audit
outfile=/tmp/svnaudit/svnaudit_$timestamp.txt
emailfile=/tmp/svnaudit/svnaudit_emailme.txt
rm -f "$emailfile"

for val in "${StringArray[@]}"; do
    cd "$folder"
    for filename in ./*_${timestamp}.xml; do
        ### below statement seeks existence of each element in StringArray
        ## if found, pipe to determine if changed in the last 7 days. pipe to outfile if it meets all criteria
        ## 7-day capture
        grep -A1 "$val" "$filename" | egrep -B2 "($today|$oneday|$twodays|$threedays|$fourdays|$fivedays|$sixdays|$sevendays)" >> "$outfile"
    done
done
Thank you again!
No worries, and thanks for posting your solution/code. A Perl/Python solution that actually parses the XML is sturdier, but if this meets your needs and you're confident in the output, excellent. I lean towards Perl, since with very few lines you can do a good bit that's MUCH harder to do in shell. The Time::Piece module could get your dates into a range array, and you could then just look in the array for today's date: two lines of code. Might be worth learning, just to put more tools in your toolbox.
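For readers staying in shell, a similar range-array idea works there too: the eight separate date variables in the posted solution could be collapsed into a loop that builds the egrep alternation. A rough sketch, assuming GNU date:

```shell
#!/bin/bash
# Build the (date1|date2|...) alternation for today plus the previous
# seven days in a loop, instead of eight separate variables.
pattern=$(date +%Y-%m-%d)
for n in 1 2 3 4 5 6 7; do
    pattern="$pattern|$(date -d "$n day ago" +%Y-%m-%d)"
done

# The pattern then drops into the existing pipeline, e.g.:
#   grep -A1 "$val" "$filename" | egrep -B2 "($pattern)"
echo "($pattern)"
```

Extending the window then means changing only the loop bounds, not adding more variables.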