LinuxQuestions.org
Old 10-18-2019, 10:45 AM   #1
socalheel
Member
 
Registered: Oct 2012
Location: Raleigh, NC
Distribution: CentOS / RHEL
Posts: 158

Rep: Reputation: 3
Search a variable and a string in a file using AWK


I am trying to find two strings in a file. The first is a timestamp held in a variable, and the second is always POST.

I'd like to accomplish this with one awk command, but I cannot figure it out.

I am starting out with this:
Code:
awk '/POST/ && -v timestamp=`date -d '1 hour ago' '+%d/%b/%Y:%H:'` '{ print $1 } filename.txt''
I know the variable portion of this works, but I don't know how to combine it to also search for POST.
Can someone point me in the right direction?


Example: here is the file I am searching in:
Code:
4.4.4.4 (10.0.2.60) - - [18/Oct/2019:15:38:08 +0000] "GET /customer/ HTTP/1.1" 200 229 "https://www.site.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299" **0/152411**
- (10.0.2.60) - - [18/Oct/2019:15:38:09 +0000] "GET /heartbeat.txt HTTP/1.1" 200 6 "-" "ELB-HealthChecker/2.0" **0/8734**
4.4.4.4 (10.0.1.183) - - [18/Oct/2019:15:38:08 +0000] "GET /customer/ HTTP/1.1" 200 10136 "https://site.com/" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36" **0/438338**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:08 +0000] "POST /rest/  HTTP/1.1" 400 53 "-" "Throwback/200.0.4 ( 89 )" **0/376289**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:10 +0000] "POST /rest/ HTTP/1.1" 400 53 "-" "Throwback/200.0.4 ( 89 )" **0/367578**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:11 +0000] "POST /sale.html HTTP/1.1" 200 9256 "https://www.site.com/sale.html?p=5" "Mozilla/5.0 (Linux; Android 7.0; SM-T813) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.116 Safari/537.36" **0/689643**

I need to find every line that has the previous hour and the word POST and print out the first column. In this example, I would need 5.5.5.5 printed out 3 times
 
Old 10-18-2019, 10:56 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,312
Blog Entries: 3

Rep: Reputation: 3722
I'd try it like this,

Code:
awk -v timestamp="$(date -d '1 hour ago' '+%d/%b/%Y:%H:')" '/POST/ && $0~timestamp { print $1 }' filename.txt
though you could narrow the search to $5 instead.
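
One way to narrow the match to $5, sketched under assumptions: the helper name `posts_in_hour` is hypothetical, and the log is assumed to be split on whitespace so that $5 is the `[dd/Mon/yyyy:HH:MM:SS` field and $7 is `"POST`. It uses awk's `index()` so the timestamp is compared as a literal string rather than as a regular expression.

```shell
#!/bin/bash
# Hypothetical helper: print the client IP ($1) of every POST line whose
# timestamp field ($5) starts with the given hour prefix.
posts_in_hour() {
  local prefix=$1 file=$2
  awk -v ts="$prefix" '
    # $5 looks like "[18/Oct/2019:15:38:08"; index() == 2 means ts
    # follows the leading "[" directly. $7 is the quoted method, "POST.
    index($5, ts) == 2 && $7 == "\"POST" { print $1 }
  ' "$file"
}

# Typical call for the previous hour, as in the thread:
# posts_in_hour "$(date -d '1 hour ago' '+%d/%b/%Y:%H:')" filename.txt
```

Passing the prefix as an argument keeps the function easy to check against a fixed hour before it goes into cron.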
 
Old 10-18-2019, 11:53 AM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
A crude solution...

With this InFile ...
Code:
4.4.4.4 (10.0.2.60) - - [18/Oct/2019:15:38:08 +0000] "GET /customer/ HTTP/1.1" 200 229 "https://www.site.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299" **0/152411**
- (10.0.2.60) - - [18/Oct/2019:15:38:09 +0000] "GET /heartbeat.txt HTTP/1.1" 200 6 "-" "ELB-HealthChecker/2.0" **0/8734**
4.4.4.4 (10.0.1.183) - - [18/Oct/2019:15:38:08 +0000] "GET /customer/ HTTP/1.1" 200 10136 "https://site.com/" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36" **0/438338**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:08 +0000] "POST /rest/  HTTP/1.1" 400 53 "-" "Throwback/200.0.4 ( 89 )" **0/376289**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:10 +0000] "POST /rest/ HTTP/1.1" 400 53 "-" "Throwback/200.0.4 ( 89 )" **0/367578**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:11 +0000] "POST /sale.html HTTP/1.1" 200 9256 "https://www.site.com/sale.html?p=5" "Mozilla/5.0 (Linux; Android 7.0; SM-T813) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.116 Safari/537.36" **0/689643**
... this awk ...
Code:
awk -F '[][]' '{a[$4]++; Line[NR]=$0}
  END{n=asorti(a,b)
      for (j=1;j<=n;j++) HOI=a[b[j]]-1  # HOI = Hour Of Interest
      for (j=1;j<=NR;j++) if (substr(Line[j],1,1)==HOI) print Line[j]}'  \
$InFile >$OutFile
... produced this OutFile ...
Code:
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:08 +0000] "POST /rest/  HTTP/1.1" 400 53 "-" "Throwback/200.0.4 ( 89 )" **0/376289**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:10 +0000] "POST /rest/ HTTP/1.1" 400 53 "-" "Throwback/200.0.4 ( 89 )" **0/367578**
5.5.5.5 (10.0.1.183) - - [18/Oct/2019:15:38:11 +0000] "POST /sale.html HTTP/1.1" 200 9256 "https://www.site.com/sale.html?p=5" "Mozilla/5.0 (Linux; Android 7.0; SM-T813) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.116 Safari/537.36" **0/689643**
Daniel B. Martin

.

Last edited by danielbmartin; 10-18-2019 at 11:56 AM. Reason: Tighten the code, slightly
 
Old 10-18-2019, 12:05 PM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Solution withdrawn because I made wrong assumptions about
the content of the InFile ...

Anyway, the idea was to keep the entire InFile in an array and then print selected array elements.

Daniel B. Martin

.

Last edited by danielbmartin; 10-18-2019 at 12:43 PM.
 
Old 10-18-2019, 01:36 PM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
meh

Code:
while IFS="[]" read -a foo
do
  printf "test this \"%s\" " "${foo[1]}"
  printf "and print this \"%s\" if matched\n" "${foo[0]%% *}"
done < <(grep POST filename.txt )
your test is a moving target, so I've not attempted to do the test
 
Old 10-18-2019, 01:38 PM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
I'm still guessing about the content and format of the InFile.
Anyway, you could read the InFile twice instead of building a copy
in an array. Something like this ...

Code:
awk -F. '{if (NR==FNR) HOI=$1
          else if (substr($0,1,1)==HOI) print}'  \
$InFile $InFile >$OutFile
Daniel B. Martin

.
 
Old 10-18-2019, 01:42 PM   #7
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Quote:
Originally Posted by danielbmartin View Post
I'm still guessing about the content and format of the InFile.
Anyway, you could read the InFile twice instead of building a copy
in an array. Something like this ...

Code:
awk -F. '{if (NR==FNR) HOI=$1
          else if (substr($0,1,1)==HOI) print}'  \
$InFile $InFile >$OutFile
Daniel B. Martin

.

if you think about the test

"is the date/time 1 hour ago"

how often do you think that will return true?
 
Old 10-18-2019, 01:56 PM   #8
socalheel
Member
 
Registered: Oct 2012
Location: Raleigh, NC
Distribution: CentOS / RHEL
Posts: 158

Original Poster
Rep: Reputation: 3
Thanks all for the input, sorry I didn't post up sooner, I was trying to get this done and wasn't expecting such quick replies.

My input file is the standard apache access logs. I am looking for a pattern of a POST to a certain URL and the format will always be the same.

What I want is if a specific URL is POSTed to more than 10 times in an hour from the same IP address, I want to be notified via email of the number of times and the IP addresses associated with each POST. I used turbo's statement and came up with:

Code:
#!/bin/bash
file=carding_attack_ips.txt

#
echo -e "Possible carding attack on Propper's prod site.\n\nThere have been more than 10 POSTs to paypal during the last hour.\n\nBelow are the number of times POSTed and the IPs that need to be investigated.\n\n" > $file

#Looking for all the POSTs to paypal for the last hour
#With whitespace splitting, $7 is the quoted method ("POST) and $8 is the URL, so each is tested separately
awk '$5 ~ /'$(date +%d.%b.%Y.%H -d "- 1 hour")'/ && $7 ~ /POST/ && $8 ~ /^\/paypal\/transparent\/requestSecureToken\// {print $1}' /var/www/propper-prod/logs/access.log | sort | uniq -c | sort -r -k 1 -n >> $file
#awk -v timestamp=$(date -d '1 hour ago' '+%d/%b/%Y:%H:') '/POST \/paypal\/transparent\/requestSecureToken\// && $0~timestamp { print $1 }' /var/www/propper-prod/logs/access.log | sort | uniq -c | sort -r -k 1 -n >> $file


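#If any IP posted more than 10 times, send an email. Reading starts at line 7 of the file to skip the email body text
#awk exits 0 only when some count in column 1 is numerically greater than 10
if awk -vNUM=7 'NR>=NUM && $1+0 > 10 {found=1} END {exit !found}' "$file"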
  then
   cat $file | mutt -s "Possible Carding Attack on Propper"  -- tom.moretto@atlanticbt.com
fi
I have the script running as a cronjob at the top of every hour.
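
The count-and-threshold step can also be done in a single awk pass. This is only a sketch: the helper name `ips_over_limit` is hypothetical, and it assumes the input is one IP address per line, as printed by the awk filter in the script above.

```shell
#!/bin/bash
# Hypothetical helper: read one IP per line on stdin and print "count ip"
# for every IP seen more than <limit> times, highest count first.
ips_over_limit() {
  local limit=$1
  awk -v limit="$limit" '
    { hits[$1]++ }                       # tally each IP
    END {
      for (ip in hits)
        if (hits[ip] > limit) print hits[ip], ip
    }
  ' | sort -rn
}

# Possible use in the cron script: only send mail when the report is non-empty.
# report=$(awk '...filter...' access.log | ips_over_limit 10)
# [ -n "$report" ] && printf '%s\n' "$report"
```

Because the threshold is applied inside awk, an empty result means nothing exceeded the limit, so no separate numeric test on the report file is needed.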

Last edited by socalheel; 10-18-2019 at 03:35 PM.
 
Old 10-18-2019, 01:58 PM   #9
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
ok

had a think

Code:
while IFS="[]" read -a foo
do
  [[ ${foo[1]} =~ $(date +%d/%b/%Y:%H -d "- 1 hour") ]] \
    && printf "%s\n" "${foo[0]%% *}"
done < <(grep POST filename.txt )

Code:
while IFS="[]" read -a foo
do
  [[ ${foo[1]} =~ $(date +%d/%b/%Y:%H -d "- 1 hour") ]] \
    && printf "%s\n" "${foo[0]%% *}"
done < <(grep "$(date +%d/%b/%Y:%H -d "- 1 hour")" filename.txt )

Code:
while read -a foo
do
  [[ ${foo[@]} =~ POST ]] \
    && printf "%s\n" "${foo[0]}"
done < <(grep "$(date +%d/%b/%Y:%H -d "- 1 hour")" filename.txt )
the logic in selecting the hour is still flawed

you might want to figure out the age and test that

i.e.
Now=$( date -u +%s )
seconds since epoch,
that is your big number

work out the seconds since epoch of the ill-formatted "18/Oct/2019:15:38:08 +0000"

bigNumber - littleNumber = age in seconds

is it less than 3600?
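
The age test described above could be sketched like this, assuming GNU date (as used elsewhere in this thread). The function names are hypothetical. The Apache timestamp is rewritten into a form `date -d` accepts by turning the slashes into spaces and splitting the date from the time at the first colon.

```shell
#!/bin/bash
# Hypothetical helper: convert "18/Oct/2019:15:38:08 +0000" to epoch seconds.
apache_ts_to_epoch() {
  local ts=$1
  local day_part=${ts%%:*}      # "18/Oct/2019"
  local rest=${ts#*:}           # "15:38:08 +0000"
  # GNU date accepts "18 Oct 2019 15:38:08 +0000"
  LC_ALL=C date -d "${day_part//\// } ${rest}" +%s
}

# Hypothetical helper: succeed if the timestamp is less than 3600 seconds old.
# The optional second argument overrides "now" (epoch seconds) for testing.
within_last_hour() {
  local now=${2:-$(date -u +%s)}
  local age=$(( now - $(apache_ts_to_epoch "$1") ))
  [ "$age" -ge 0 ] && [ "$age" -lt 3600 ]
}

# Example: within_last_hour "18/Oct/2019:15:38:08 +0000" && echo recent
```

This is a rolling 60-minute window, which is a different filter from "the previous clock hour" used in the cron job.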
 
Old 10-18-2019, 02:40 PM   #10
socalheel
Member
 
Registered: Oct 2012
Location: Raleigh, NC
Distribution: CentOS / RHEL
Posts: 158

Original Poster
Rep: Reputation: 3
firerat, i'm not following how the time selection is flawed. i need to get the hour of the previous hour.

i.e. at 2000 hours, i run
Code:
date -d '1 hour ago' '+%d/%b/%Y:%H:'
that will output
Code:
19/Oct/2019:19:
which is what i want to grep for.

are you suggesting a better way?
 
Old 10-18-2019, 02:40 PM   #11
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Code:
awk '$5 ~ /'$(date +%d.%b.%Y.%H -d "- 1 hour")'/ && $7 ~ /POST/ {print $1}' filename.txt
 
Old 10-18-2019, 02:46 PM   #12
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Quote:
Originally Posted by socalheel View Post
firerat, i'm not following how the time selection is flawed. i need to get the hour of the previous hour.

i.e. if i run
Code:
date -d '1 hour ago' '+%d/%b/%Y:%H:'
at the top of hour 20, that will output
Code:
19/Oct/2019:19:
which is what i want to grep for.

are you suggesting a better way?

log format is flawed, you need to go through many steps to get it into a form that is flexible


there was no mention of cron in your OP, with that context I now understand the date to be a filter and not a test condition

anyway, solution in my previous post

very simple once I understood what you actually wanted
 
Old 10-18-2019, 02:48 PM   #13
socalheel
Member
 
Registered: Oct 2012
Location: Raleigh, NC
Distribution: CentOS / RHEL
Posts: 158

Original Poster
Rep: Reputation: 3
Quote:
Originally Posted by Firerat View Post
log format is flawed, you need to go through many steps to get it into a form that is flexible


there was no mention of cron in your OP, with that context I now understand the date to be a filter and not a test condition

anyway, solution in my previous post

very simple once I understood what you actually wanted

ahh, gotcha. i didn't know a cron would have made a difference with what i was looking for. sorry for not mentioning it sooner (i guess every detail is important).

with your awk statement, you substitute the forward slash with periods... is that for regex purposes to capture any character?
 
Old 10-18-2019, 03:21 PM   #14
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Quote:
Originally Posted by socalheel View Post
ahh, gotcha. i didn't know a cron would have made a difference with what i was looking for. sorry for not mentioning it sooner (i guess every detail is important).

with your awk statement, you substitute the forward slash with periods... is that for regex purposes to capture any character?
yep, ~ /18/Oct/2019:15/ proved problematic

18.Oct.2019:15
is a much safer filter

I could have grabbed the date into a var and escaped it, but . was quicker

yeah, the cron mattered, since that helped me understand the "previous hour"

and real time format has a standard,

Code:
date --iso-8601=ns

Code:
SecondsSince=$(date +%s.%N -d "$(date --iso-8601=ns)")
echo ${SecondsSince}
date -d "@${SecondsSince}"
when you get things back to seconds.nanoseconds you can do some maths and get back to a date/time

bad format in the log
Code:
date +%s.%N -d "18/Oct/2019:15:38:08"
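
A small sketch of the "do some maths and get back to a date/time" idea, assuming GNU date and a log written in UTC (the sample lines show +0000). The helper name `previous_hour_prefix` is hypothetical. It subtracts 3600 from an epoch value and formats the result in the log's own layout.

```shell
#!/bin/bash
# Hypothetical helper: given epoch seconds (default: now), print the
# Apache-style prefix for the hour before it, e.g. "18/Oct/2019:14:".
# Assumes the log timestamps are in UTC; drop -u if they are in local time.
previous_hour_prefix() {
  local epoch=${1:-$(date +%s)}
  LC_ALL=C date -u -d "@$(( epoch - 3600 ))" '+%d/%b/%Y:%H:'
}

# Example: previous_hour_prefix            # relative to now
#          previous_hour_prefix 1571413088 # relative to a fixed instant
```

LC_ALL=C keeps the month abbreviation in English so that it matches the log regardless of the system locale.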
 
Old 10-20-2019, 10:52 AM   #15
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,798

Rep: Reputation: 1201
A simple way to have the / characters
Code:
awk '$5 ~ "'$(date +%d/%b/%Y:%H -d "- 1 hour")'" && $7 ~ /POST/ {print $1}'
But in general /search/ causes fewer problems than "search".
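
For anyone who prefers to stay with /search/, the other option mentioned earlier in the thread is to escape the slashes before splicing the date into the regex literal. A minimal sketch, with hypothetical helper names `escape_slashes` and `posts_matching`:

```shell
#!/bin/bash
# Hypothetical helper: turn each "/" into "\/" so the text can sit
# inside an awk /.../ regex literal.
escape_slashes() {
  printf '%s\n' "$1" | sed 's,/,\\/,g'
}

# Hypothetical helper: print $1 of POST lines whose $5 matches the pattern.
posts_matching() {
  local pat
  pat=$(escape_slashes "$1")
  # The shell splices the escaped pattern between the two quoted awk parts.
  awk '$5 ~ /'"$pat"'/ && $7 ~ /POST/ { print $1 }' "$2"
}

# Example: posts_matching "$(date +%d/%b/%Y:%H -d '- 1 hour')" filename.txt
```

Only slashes are escaped here; that is enough for the date string, which contains no other regex metacharacters.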
 
  

