LinuxQuestions.org
Old 12-27-2018, 01:55 PM   #1
j-me
Member
 
Registered: Jan 2003
Location: des moines, ia
Distribution: suse RH
Posts: 129

Rep: Reputation: 17
search file and extract lines matching array and within date range


Hello. I have been struggling with this for a while. I need to read multiple files (30), searching for any string from an array AND matching a date range (yesterday back through 7 days ago).

My apologies, as I know there is a way to post code but I do not remember how.
Code:
prevdate1=`date -d yesterday '+%Y-%m-%d'`
prevdate2=`date --date="7 day ago" '+%Y-%m-%d'`
timestamp=`date +%Y%m%d`
declare -a StringArray=("user1" "dev1" "dev2" "dev3")
folder=/mnt/report/Audit
outfile=/tmp/svnaudit/svnaudit_$timestamp.txt
cd "$folder" || exit 1
for val in "${StringArray[@]}"; do
  for filename in ./*_${timestamp}.xml; do
    echo "reading $filename" >> $outfile
    cat $filename |grep -A1 "$val"|egrep -B2 "($prevdate1|$prevdate2)" >> $outfile
  done
done
/*
output:
reading ./svnAudit_acc_20181227.xml
revision="6089">
<author>dev12</author>
<date>2018-12-26T15:37:53.853106Z</date>
--
revision="6051">
<author>dev25</author>
<date>2018-12-12T20:52:22.697533Z</date>
--
revision="6050">
<author>dev10</author>
<date>2018-12-12T20:46:02.612535Z</date>
--
revision="6049">
<author>dev10</author>
<date>2018-12-12T20:45:09.370066Z</date>
--
revision="6048">
<author>dev10</author>
<date>2018-12-12T20:38:29.755046Z</date>
--
revision="6047">
<author>dev15</author>
<date>2018-12-12T20:38:23.338277Z</date>
--
revision="6046">
<author>dev3</author>
<date>2018-12-12T20:35:56.381112Z</date>
--
revision="6045">
<author>dev1</author>
<date>2018-12-12T20:34:10.526142Z</date>
--
revision="6044">
<author>dev5</author>
<date>2018-12-12T20:32:47.430046Z</date>
--
revision="6043">
<author>dev30</author>
<date>2018-12-12T20:28:59.366148Z</date>
--
revision="6042">
<author>dev27</author>
<date>2018-12-12T19:48:48.263031Z</date>
reading ./svnAudit_ace_20181227.xml
reading ./svnAudit_act_20181227.xml
reading ./svnAudit_agp_20181227.xml
*/

Thank you in advance for any suggestions

Last edited by j-me; 12-28-2018 at 06:59 AM.
 
Old 12-27-2018, 04:14 PM   #2
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Without knowing the input format, it’s close to impossible to comment.

See my signature for formatting code. You can edit your post and add/correct the code tags.
 
Old 12-27-2018, 04:18 PM   #3
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,734

Rep: Reputation: 2212
I don't actually see a question here. Does the code not do what you want it to do? If not, what's not right?
 
Old 12-27-2018, 04:34 PM   #4
berndbausch
Quote:
Originally Posted by scasey View Post
I don't actually see a question here. Does the code not do what you want it to do? If not, what's not right?
The OP’s dates are not right.

Quote:
Code:
cat $filename |grep -A1 "$val"|egrep -B2 "($prevdate1|$prevdate2)" >> $outfile
You don’t grep for a date range, but two specific dates. To correct this, I would use date +%s to convert the dates to numbers, then check if the numbers are within the desired range.
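The epoch comparison suggested here could look roughly like the sketch below. It assumes GNU date and the timestamp layout seen in the posted output; the function name in_range and its argument order are made up for illustration and do not come from the original script.

```shell
#!/bin/bash
# in_range STAMP START END  (hypothetical helper; GNU date assumed)
# Succeeds when STAMP, e.g. 2018-12-26T15:37:53.853106Z, falls between
# START 00:00:00 and END 23:59:59 (both YYYY-MM-DD, treated as UTC).
in_range() {
    local stamp=$1 start=$2 end=$3
    local clean t lo hi
    clean=${stamp%%.*}      # drop the fractional seconds and trailing Z
    clean=${clean/T/ }      # "2018-12-26T15:37:53" -> "2018-12-26 15:37:53"
    t=$(date -u -d "$clean" +%s) || return 2
    lo=$(date -u -d "$start 00:00:00" +%s) || return 2
    hi=$(date -u -d "$end 23:59:59" +%s) || return 2
    [ "$t" -ge "$lo" ] && [ "$t" -le "$hi" ]
}

# Window of 2018-12-20 through 2018-12-27:
in_range "2018-12-26T15:37:53.853106Z" 2018-12-20 2018-12-27 && echo "in range"
in_range "2018-12-12T20:52:22.697533Z" 2018-12-20 2018-12-27 || echo "out of range"
```

Comparing numbers this way keeps working across month and year boundaries, which a pattern listing two fixed dates cannot do.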

By the way, what you are doing is known as cat abuse.
 
Old 12-27-2018, 06:41 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4122
With only 7 dates of concern, might be easier to just assign variables and test for all of them.
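One way to read this suggestion is to generate the handful of dates in a loop and join them into a single grep alternation. The following is only a sketch, assuming GNU date; build_pattern and its reference-date argument are hypothetical names added so the example is reproducible.

```shell
#!/bin/bash
# build_pattern REF DAYS  (hypothetical helper; GNU date assumed)
# Prints REF and the DAYS days before it joined with "|",
# e.g. "2018-12-28|2018-12-27|2018-12-26" for DAYS=2.
build_pattern() {
    local ref=$1 days=$2 pattern="" day i
    for (( i = 0; i <= days; i++ )); do
        day=$(date -u -d "$ref $i days ago" +%Y-%m-%d) || return 1
        pattern+="${pattern:+|}$day"    # add "|" only between entries
    done
    printf '%s\n' "$pattern"
}

build_pattern 2018-12-28 7
# Possible use, with today as the reference date:
#   grep -E -B2 "($(build_pattern "$(date +%Y-%m-%d)" 7))" file.xml
```

Because every date comes from date arithmetic rather than from day numbers, the list stays correct when the window spans the end of a month.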

BTW - use [code] tags, not <code>.
 
Old 12-28-2018, 07:05 AM   #6
j-me
My apologies for the clumsiness in seeking assistance.
Yes, it is severe cat abuse, but it helped when asking colleagues for advice on what I was trying to accomplish.

What I am trying to accomplish is obtaining data within a specific date range. The files contain data dating back to 2013. I only want entries whose <date> falls between seven days ago and yesterday.

That explains why I was only getting the two specific dates provided by $prevdate1 and $prevdate2. How would one specify a date range in egrep? I have seen several examples, but none of them appear to work.

Thank you
 
Old 12-28-2018, 07:33 AM   #7
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,930

Rep: Reputation: 7320
Again, we do not know the input file format or the required output. It looks like XML, so an XML parser (in Perl, Python, or whatever) would probably be better than grep.
http://catb.org/~esr/faqs/smart-ques...html#beprecise
 
Old 12-28-2018, 07:49 AM   #8
j-me
Pardon: the input is an XML file; the output is a simple TXT file, as I only need to view the output for audit research.

Goal: I only need to check the XML for entries matching the array elements, then capture the line containing a date within the range 2018-12-20 to 2018-12-27, plus the previous two lines, so that I capture the author and revision.
 
Old 12-28-2018, 07:55 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,930

Rep: Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320Reputation: 7320
Can you please post some examples (obviously with fake data if required)?
 
Old 12-28-2018, 08:13 AM   #10
j-me
Is this enough of an example?

input file example (xml):

Code:
<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry
   revision="23024">
<author>user1</author>
<date>2018-12-20T14:06:24.493250Z</date>
<paths>
<path
   prop-mods="false"
   text-mods="true"
   kind="file"
   action="M">/midtierscripts/JBOSS/utility/New_Server_Setup/Build_Menu.sh</path>
</paths>
<msg>WTASK0126227- modify Build_Menu.sh to add log and config directories during a new Server build to the /opt/billing directory: </msg>
</logentry>
<logentry
   revision="23023">
<author>dev15</author>
<date>2018-12-17T16:17:43.747514Z</date>
<paths>
<path
   prop-mods="false"
   text-mods="true"
   kind="file"
   action="M">/midtierscripts/windows/AWS/StoredProcedures/Int.xml</path>
</paths>
<msg>STSK0037792- updated for Jenkins deploy stop for StoredProcedures/Int.xml</msg>
</logentry>
<logentry
   revision="23022">
<author>dev1</author>
<date>2018-12-14T16:18:31.578787Z</date>
<paths>
<path
   prop-mods="false"
   text-mods="false"
   kind="dir"
   action="A">/powershellscripts/FMJ</path>
<path
   prop-mods="false"
   text-mods="true"
   kind="file"
   action="A">/powershellscripts/FMJ/DeployBobjCode.ps1</path>
</paths>
<msg>RTSK0081943- created a new folder and added a powershell for scheduled deployments.</msg>
</logentry>
<logentry
   revision="23000">
<author>user1</author>
<date>2018-12-24T13:04:15.484140Z</date>
<paths>
<path
   prop-mods="true"
   text-mods="true"
   kind="file"
   action="A">/midtierscripts/JBOSS/utility/Jenkins_Deployments/mucunzipper.sh</path>
<path
   prop-mods="true"
   text-mods="true"
   kind="file"
   action="A">/midtierscripts/JBOSS/utility/Jenkins_Deployments/newzip_runner.sh</path>
</paths>
<msg>STSK0037322- creating new scripts to run muctask on jboss servers</msg>
</logentry>
<logentry
   revision="22791">
<author>dev2</author>
<date>2018-12-26T17:04:33.665300Z</date>
<paths>
<path
   prop-mods="false"
   text-mods="true"
   kind="file"
   action="M">/powershellscripts/RNG/rng8w2k16_WCFSite.ps1</path>
</paths>
<msg>Correction AppPool name deployment loop</msg>
</logentry>
<logentry
   revision="1">
<author>dev10</author>
<date>2014-12-22T17:59:53.181189Z</date>
<paths>
<path
   prop-mods="false"
   text-mods="false"
   kind="dir"
   action="A">/powershellscripts</path>
<path
   prop-mods="false"
   text-mods="false"
   kind="dir"
   action="A">/powershellscripts/FFC</path>
<path
   prop-mods="false"
   text-mods="true"
   kind="file"
   action="A">/powershellscripts/FFC/DeployPublic.ps1</path>
</paths>
<msg>Added for powershellscripts for FFC/DeployPublic.ps1</msg>
</logentry>
</log>
output file expected (txt):
Code:
 
   revision="23024">
<author>user1</author>
<date>2018-12-20T14:06:24.493250Z</date>
   revision="23000">
<author>user1</author>
<date>2018-12-24T13:04:15.484140Z</date>
   revision="22791">
<author>dev2</author>
<date>2018-12-26T17:04:33.665300Z</date>
 
Old 12-28-2018, 09:13 AM   #11
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,328
Blog Entries: 3

Rep: Reputation: 3726
Quote:
Originally Posted by j-me View Post
input file example (xml):
Since actual XML is involved, you should use a proper XML parser for that.

It'd take a short time to write up a perl script to parse that and measure the dates. The CPAN modules XML::XPath and Date::Calc would be of use there.
 
Old 12-28-2018, 09:32 AM   #12
j-me
Here is what I have been able to accomplish:

$ echo egrep -B2 -e2018-12-{28..13}
egrep -B2 -e2018-12-28 -e2018-12-27 -e2018-12-26 -e2018-12-25 -e2018-12-24 -e2018-12-23 -e2018-12-22 -e2018-12-21 -e2018-12-20 -e2018-12-19 -e2018-12-18 -e2018-12-17 -e2018-12-16 -e2018-12-15 -e2018-12-14 -e2018-12-13

which is what I would expect when searching a range.
However, when I use my variables prevdate1, prevdayrange1, prevdayrange2:
$ prevdate1=`date +'%Y-%m'`

$ prevdayrange1=`date +'%d'`

$ prevdayrange2=`date --date="7 day ago" +'%d'`

$ echo egrep -B2 -e"$prevdate1"-{"$prevdayrange1".."$prevdayrange2"}
egrep -B2 -e2018-12-{28..13}

expands the variables but not as a range.
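The reason is the order of expansions: bash performs brace expansion before parameter expansion, so at the moment the braces are examined they still contain the literal text of the variable names, which is not a valid sequence. The sketch below reproduces the effect and shows one possible workaround, collecting the -e options in an array; the variable names a, b and opts are made up for illustration.

```shell
#!/bin/bash
a=28
b=26

# Brace expansion runs first and sees "{$a..$b}", not "{28..26}", so the
# braces are left untouched and only the variables are substituted later.
echo 2018-12-{$a..$b}        # prints 2018-12-{28..26}

# Workaround sketch: generate the numbers with seq and build the options.
opts=()
for d in $(seq "$a" -1 "$b"); do
    opts+=( -e "2018-12-$d" )
done
echo grep -B2 "${opts[@]}"   # prints grep -B2 -e 2018-12-28 -e 2018-12-27 -e 2018-12-26
```

Note that counting day numbers down like this only works inside a single month; a window that crosses into the previous month needs real date arithmetic.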

This is what I would expect (pardon the cat abuse):

$ cat svnAudit_smd_20181228.xml|egrep -B2 -e2018-12-{28..13} | more
revision="23024">
<author>e009137</author>
<date>2018-12-20T14:06:24.493250Z</date>
--
revision="23023">
<author>e009137</author>
<date>2018-12-17T16:17:43.747514Z</date>
--
revision="23022">
<author>e009137</author>
<date>2018-12-14T16:18:31.578787Z</date>
--
revision="23021">
<author>e009137</author>
<date>2018-12-13T13:55:47.586483Z</date>
--
revision="23020">
<author>e009137</author>
<date>2018-12-13T12:59:12.857066Z</date>


I appreciate the input, but I feel like I am close, so I am not looking to use an XML parser.
 
Old 12-28-2018, 10:15 AM   #13
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,680

Rep: Reputation: 7971
Understand that you don't want to use an XML parser, but I'll throw out a couple of things for you to consider:
  1. You currently have to read 30 files...if this expands, a small script is not going to work too well.
  2. XML data is fairly trivial to parse; if even a TINY thing changes with the input currently, your script will require a lot of modification to work again
XML parsers are designed to handle XML data, and make it easy. Perl is easy to use, as is python. For example, in perl:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find::Rule;
use XML::LibXML;

# Get a list of files from the input directory
my @InputFile = File::Find::Rule->file()
                              ->name( qr/\.xml$/ )    # The pattern to look for.  Put more in delimited with a "|" if needed.  Looks for .xml right now, case sensitive
                              ->maxdepth(1)           #....and ONLY look in the base directory...do not recurse...
                              ->in( "/some/path/to/files" );     # ...and tell the module WHERE to look

# Process each file you find
foreach my $InputXML (@InputFile) {
    my $dom = XML::LibXML->load_xml(location => $InputXML);

    # Now loop through the input file, reading entries one at a time for parsing/processing.
    foreach my $entry ($dom->findnodes('//log/logentry')) {
        # This pulls the element values into variables (element names are case sensitive). Do things with them.
        my $Author = $entry->findvalue('./author');
        my $Date   = $entry->findvalue('./date');
        # Check date, output if needed.
        if ($Date eq 'something') {
            print "Date = $Date, Author = $Author\n";
        }
    }
}
TOTALLY untested, but that will easily churn through any files in a directory, look at the XML, assign the variables, and from there, you can examine/act. You may have to change the XML entry point ("//log" instead of "//log/logentry"), or the variables, but this is all easy to test. Much more reliable than using awk/sed/grep to parse XML. And you can even eliminate the File::Find::Rule module, and the lines that get the file names from the directory, and just pass a file name as an argument to this script...that's all there is to parsing/using XML.

Last edited by TB0ne; 12-28-2018 at 10:47 AM.
 
Old 12-28-2018, 12:46 PM   #14
j-me
Appreciate the input.
As I am not knowledgeable with Perl or Python, I was looking for something quick, since it was only a seven-day historical look. Since there appears to be a limitation on expanding the variables inside braces, here is what a colleague and I arrived at:

Code:
#!/bin/bash

## Date setup for feeding into the process

sevendays=$(date +%Y-%m-%d -d "7 day ago")
sixdays=$(date +%Y-%m-%d -d "6 day ago")
fivedays=$(date +%Y-%m-%d -d "5 day ago")
fourdays=$(date +%Y-%m-%d -d "4 day ago")
threedays=$(date +%Y-%m-%d -d "3 day ago")
twodays=$(date +%Y-%m-%d -d "2 day ago")
oneday=$(date +%Y-%m-%d -d "1 day ago")
today=$(date "+%Y-%m-%d")

prevday=$(date -d yesterday '+%Y%m%d')
## remove the previous day's file - housekeeping to reduce file buildup
## (-f so a missing file is not an error)
rm -f "/tmp/svnaudit/svnaudit_$prevday.txt"

currentyr=$(date +%Y-)
timestamp=$(date +%Y%m%d)
declare -a StringArray=("user1" "user12" "user30" "dev1" "dev5" "dev25" "dev15" "dev4")

#### create file/dir variables
folder=/mnt/midtier_logs/report/Audit
outfile=/tmp/svnaudit/svnaudit_$timestamp.txt
emailfile=/tmp/svnaudit/svnaudit_emailme.txt
rm -f "$emailfile"

cd "$folder" || exit 1
for val in "${StringArray[@]}"; do
  for filename in ./*_"${timestamp}".xml; do
    ### look for each element of StringArray as an exact <author> value;
    ### -B1/-A1 keep the revision line above it and the date line below it.
    ### then keep only the entries dated within the last 7 days and append them to outfile
    grep -B1 -A1 "<author>$val</author>" "$filename" | grep -E -B2 "($today|$oneday|$twodays|$threedays|$fourdays|$fivedays|$sixdays|$sevendays)" >> "$outfile"
  done
done
Thank you again!
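For comparison, the same filtering can be done in one pass per file with awk instead of one pass per user. This is a sketch under stated assumptions, not a drop-in replacement: it assumes the svn log layout from post #10 (revision, author and date on consecutive lines) and relies on ISO dates sorting correctly as plain strings. The function name filter_log and its argument order are hypothetical.

```shell
#!/bin/bash
# filter_log FILE START END AUTHOR...  (hypothetical helper)
# Prints the revision, author and date lines of every entry whose author
# is exactly one of AUTHOR... and whose date lies in START..END
# (both YYYY-MM-DD, inclusive).
filter_log() {
    local file=$1 start=$2 end=$3
    shift 3
    awk -v start="$start" -v end="$end" -v users="$*" '
        BEGIN { n = split(users, u, " "); for (i = 1; i <= n; i++) want[u[i]] = 1 }
        /revision="/ { rev = $0 }                 # remember the latest revision line
        /<author>/   { line = $0; name = $0       # remember the author line and bare name
                       gsub(/.*<author>|<\/author>.*/, "", name) }
        /<date>/     { day = $0; sub(/.*<date>/, "", day); day = substr(day, 1, 10)
                       if ((name in want) && day >= start && day <= end)
                           print rev "\n" line "\n" $0 }
    ' "$file"
}

# Possible use with the variables from the script above:
#   filter_log "$filename" "$sevendays" "$today" "${StringArray[@]}" >> "$outfile"
```

Matching the author name exactly also avoids picking up dev12 or dev15 when searching for dev1.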
 
Old 12-28-2018, 04:03 PM   #15
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,680

Rep: Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971Reputation: 7971
No worries, and thanks for posting your solution/code. A Perl/Python solution that actually parses the XML is sturdier, but if this meets your needs and you're confident in the output, excellent. I lean towards Perl, since with very few lines you can do a good bit that's MUCH harder to do in shell. The Time::Piece module could get your dates into a range-array, and you could then just look in the array for today's date...two lines of code. Might be worth learning, just to put more tools in your toolbox.
 
  

