LinuxQuestions.org

Programming > Search text if some part sequence exists (https://www.linuxquestions.org/questions/programming-9/search-text-if-some-part-sequence-exists-4175662211/)

pedropt 10-08-2019 08:30 AM

Search text if some part sequence exists
 
Hi guys, I don't even know how to word the topic title to match what I need.

Here it is: I am writing a definition file of the error messages that appear in my web server logs.

Example of def.conf

Quote:

ThinkPHP_RCE /module/action/param1 ${@die(md5(HelloThinkPHP))}
Now, in the web server log this line can appear in many forms, but that specific sequence is always there. By this I mean:

Quote:

111.111.111.111 "GET /index.php/module/action/param1 ${@die(md5(HelloThinkPHP))}"
111.111.111.111 "GET /something//module/action/param1 ${@die(md5(HelloThinkPHP))}

Now, when my script searches the definition file, it will use the variable taken from the log, which is:
"/index.php/module/action/param1 ${@die(md5(HelloThinkPHP))}"

and I want the script to identify that this line belongs to "/module/action/param1 ${@die(md5(HelloThinkPHP))}", and then retrieve field $1 with awk, which is ThinkPHP_RCE.

How can I do this?
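A minimal sketch of this lookup, assuming each def.conf line is a name, whitespace, and then the pattern, where the pattern may itself contain spaces (as in the ThinkPHP line). It compares literally with awk's index() instead of a regular expression, so characters like $, ( and { need no escaping. The function name lookup_def is hypothetical.

```bash
#!/bin/bash
# lookup_def DEFFILE REQUEST  (hypothetical helper)
# Prints the name (first field) of the first definition whose pattern
# (the rest of the line after the name) occurs literally inside REQUEST.
# Exit status is 1 when no definition matches.
lookup_def() {
    # REQUEST goes through the environment so awk does not interpret
    # backslash escapes in it, as it would with -v.
    REQ="$2" awk '
    {
        pat = $0
        sub(/^[^ \t]+[ \t]+/, "", pat)   # strip the name, keep the pattern verbatim
        if (pat != $0 && pat != "" && index(ENVIRON["REQ"], pat) > 0) {
            print $1
            found = 1
            exit
        }
    }
    END { exit !found }' "$1"
}
```

For example, lookup_def def.conf '/index.php/module/action/param1 ${@die(md5(HelloThinkPHP))}' should print ThinkPHP_RCE; the single quotes keep the shell from expanding ${...}.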

Turbocapitalist 10-09-2019 03:23 AM

Can you go into a little more detail and give one or two more examples? It sounds like you want to read patterns in from one file and search for them in a second. If that is the case, you might have to escalate to perl to avoid lots of loops in AWK.
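If the definitions are plain substrings, grep can already read patterns from one file and search for them in a second in a single pass, without an explicit loop: -F treats each pattern as a fixed string and -f reads the patterns from a file. A minimal sketch, assuming the same "name pattern" layout as def.conf; it only tells you which log lines match, not which definition matched. filter_log is a hypothetical name.

```bash
#!/bin/bash
# filter_log DEFFILE LOGFILE  (hypothetical helper)
# Prints the log lines that contain at least one definition pattern.
filter_log() {
    # cut -s -f 2- keeps everything after the first space and skips lines
    # without one; empty patterns are dropped because an empty pattern
    # would match every line.
    grep -F -f <(cut -s -d ' ' -f 2- "$1" | grep -v '^$') -- "$2"
}
```

Whether this beats a Perl or awk loop on very large files is something to measure rather than assume.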

allend 10-09-2019 08:58 AM

I am going to be caned for this, but given
Code:

bash-5.0$ cat def.conf
aa foraa
bb forbb
cc forcc

and
Code:

bash-5.0$ cat def.log
111.111.111.111 "yyforcc"
111.111.111.111 "yyforaa"
111.111.111.111 "yyforbb"
111.111.111.111 "yyforxx"
222.222.222.222 "yyforaa"
333.333.333.333 "yyforbb"

then
Code:

bash-5.0$ awk 'FILENAME=="def.conf" {a[i]=$1;b[i]=$2;i++}; FILENAME!="def.conf" {for(i in b) {if(match($2,b[i])>0) {print a[i]; break} else {if(i==length(b)-1) {print "No match"}}}}' def.conf def.log
cc
aa
bb
No match
aa
bb

This reads the def.conf file into two arrays, then processes the def.log file. The use of length() is a gawk extension.
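One caveat, as far as I can tell: POSIX leaves the order of for (i in b) unspecified, so the i==length(b)-1 test only prints "No match" correctly if awk happens to visit the indices in ascending order. A portable variant of the same idea uses a counted loop and a flag, which also avoids length() on an array. This sketch swaps match() for index(), so patterns are compared as plain strings rather than regular expressions; classify is a hypothetical name, and it assumes DEFFILE is not empty (the FNR == NR test relies on that).

```bash
#!/bin/bash
# classify DEFFILE LOGFILE  (hypothetical helper)
# For each log line, prints the name of the first definition whose pattern
# occurs in field 2, or "No match".
classify() {
    awk '
    BEGIN { n = 0 }   # start at 0: an unset variable used as a subscript would be ""
    FNR == NR { name[n] = $1; pat[n] = $2; n++; next }   # first file: definitions
    {
        hit = 0
        for (k = 0; k < n; k++)                           # fixed, predictable order
            if (index($2, pat[k]) > 0) { print name[k]; hit = 1; break }
        if (!hit) print "No match"
    }' "$1" "$2"
}
```

With the def.conf and def.log above it should print the same six lines as the gawk version: cc, aa, bb, No match, aa, bb.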

pedropt 10-09-2019 10:44 AM

Thanks to both of you. allend is almost there; the problem is that I cannot rely on just the last 2 characters found, because that is not enough and a lot of false positives will appear.
One of the difficulties here is that the variable contains more text than the line in the file that should give me the output I want.

If I have a file with definitions like:

1 rttrh/456430/ewrewr/88000
2 3907/weewrerw/2332/ertet

and I make the script search the definition file above with this variable:

blalbalb/rttrh/456430/ewrewr/88000

then I am stuck, because nothing will be found.
Another alternative would be the inverse: take the definitions file line by line and search for each line in the log. That works because the search string is then the short one:

if I search for:
rttrh/456430/ewrewr/88000

in

blalbalb/rttrh/456430/ewrewr/88000

then I get a positive result, but doing it line by line wastes a lot of resources and time.

Now, one thing that would do the job is removing the text up to the first slash and then searching; if nothing is found, remove the text up to the next forward slash, and so on.
This would work, but I would end up doing a lot of searches with no result, which increases the script's run time.
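The "strip up to the next slash and try again" idea need not be slow if the definitions are loaded once into an awk associative array keyed by pattern: each attempt is then a hash lookup rather than a scan of the whole file, so a request costs roughly one lookup per slash it contains. A minimal sketch, assuming each definition is "name pattern" and that a definition must equal the whole remaining tail of the request (the tail is tried both with and without its leading slash). Requests are read one per line from standard input; suffix_lookup is a hypothetical name.

```bash
#!/bin/bash
# suffix_lookup DEFFILE < requests   (hypothetical helper)
# For each request line, tries the whole line, then each tail that starts
# at a "/" or just after it, and prints the name of the first definition
# whose pattern equals that tail, or "No match".
suffix_lookup() {
    DEF="$1" awk '
    BEGIN {
        f = ENVIRON["DEF"]
        while ((getline line < f) > 0) {
            name = line; sub(/[ \t].*/, "", name)          # first field
            pat  = line; sub(/^[^ \t]+[ \t]+/, "", pat)    # rest of the line, verbatim
            def[pat] = name
        }
        close(f)
    }
    {
        s = $0
        while (1) {
            if (s in def) { print def[s]; next }
            i = index(s, "/")
            if (i == 0) break
            t = substr(s, i)                               # tail starting at the slash
            if (i > 1 && (t in def)) { print def[t]; next }
            s = substr(s, i + 1)                           # tail just after the slash
        }
        print "No match"
    }'
}
```

With the two-line definition file above, printf '%s\n' 'blalbalb/rttrh/456430/ewrewr/88000' | suffix_lookup def.conf should print 1. It only finds definitions equal to a whole tail of the request; patterns that sit in the middle still need a substring search.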

allend 10-09-2019 11:15 AM

Quote:

the problem is that I cannot rely on just the last 2 characters found
What?
Quote:

If I have a file with definitions like:
1 rttrh/456430/ewrewr/88000
2 3907/weewrerw/2332/ertet
This moves the goal posts from your original post.
Quote:

and I make the script search the definition file above with this variable:
blalbalb/rttrh/456430/ewrewr/88000
Excuse me, but where was this in your original post?

Quote:

How can I do this?
At this forum, you get ideas, not complete solutions.

astrogeek 10-09-2019 01:13 PM

Quote:

Originally Posted by pedropt (Post 6045405)
Thanks to both of you. allend is almost there; the problem is that I cannot rely on just the last 2 characters found, because that is not enough and a lot of false positives will appear.

What is missing for us is: Precisely how much of the matched string can you rely on?

Can you provide a clear example of a single match pattern from the def file, along with a few lines which should match, and a few which should not match. I have tried to see that from your examples already given but without success.

pedropt 10-09-2019 03:11 PM

allend, look, I didn't move away from my original post; I just gave another example.

Quote:

Now, when my script searches the definition file, it will use the variable taken from the log, which is:
"/index.php/module/action/param1 ${@die(md5(HelloThinkPHP))}"
Quote:

Now, in the web server log this line can appear in many forms, but that specific sequence is always there. By this I mean:

Quote:
111.111.111.111 "GET /index.php/module/action/param1 ${@die(md5(HelloThinkPHP))}"
The definition file where it will search contains:
Quote:

ThinkPHP_RCE /module/action/param1 ${@die(md5(HelloThinkPHP))}
is exactly the same as:

Quote:

If I have a file with definitions like:

1 rttrh/456430/ewrewr/88000
2 3907/weewrerw/2332/ertet

and I make the script search the definition file above with this variable:

blalbalb/rttrh/456430/ewrewr/88000

then I am stuck, because nothing will be found.
Definitions is a file where I will store all the strings to compare against.
The IP address in the first post was just an example; of course I will not send the IP address to grep, only what I need to search for.

-------------------------------------------------------------------

astrogeek, you are right; I thought about that before I made my last post.

From this example:
Quote:

/index.php/module/action/param1 ${@die(md5(HelloThinkPHP))}
I can leave out "/index.php" because that file name changes, so I will rely only on
Quote:

/module/action/param1 ${@die(md5(HelloThinkPHP))}
Now, what I need is not how to turn the first line into the second (by removing everything up to the 2nd forward slash).
What I need is the fastest way to search a big file for that combination.
I usually use grep, but for big files it might be worth using something a little faster.

However, I have an issue here: every line is different, so this cannot be applied as one single rule.
I have lines in the log like this:
/HNAP1

which is an information disclosure on D-Link routers (I believe); in this case I cannot remove everything up to the first forward slash.
Thinking about it a little more, what I really need is to check whether the beginning of the variable is a file or a directory.

directory = /something
file = /something.php/OTHERSTUFF

If it is a file, I will remove everything up to the 2nd forward slash and use the rest; otherwise I will use the complete variable.

Basically, what I need is a quick way to search.
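The file-or-directory test can be approximated with a regular expression on the first path component. A minimal sketch in bash, assuming "file" means a first component with a dot and an extension (such as index.php) that is followed by more path; anything else is treated as a directory and left whole. normalize_request is a hypothetical name.

```bash
#!/bin/bash
# normalize_request REQUEST  (hypothetical helper)
# If the first path component looks like a file name ("/name.ext/..."),
# prints the request from the second slash on; otherwise prints it unchanged.
normalize_request() {
    local req=$1
    # Keeping the regex in a variable avoids quoting problems inside [[ ]].
    local re='^/[^/]+\.[A-Za-z0-9]+(/.*)$'
    if [[ $req =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf '%s\n' "$req"
    fi
}
```

So /index.php/module/... becomes /module/..., while /HNAP1 and /something/else.php come back unchanged.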

astrogeek 10-09-2019 04:34 PM

That still does not specify how much you can rely on very precisely.

Question: Can you rely on there always being a string matching /module/action/param1 in every line you want to search for?

Quote:

Originally Posted by pedropt (Post 6045471)
However, I have an issue here: every line is different, so this cannot be applied as one single rule.
I have lines in the log like this:
/HNAP1

But you have not said what you want to do in these cases. Ignore the line? Search for the line? What?

UPDATE: Think in terms of your thread title:

Quote:

Search text if some part sequence exists
Define for us exactly the part sequence which exists and is used to trigger the search.

pedropt 10-09-2019 06:07 PM

Thanks for the reply, astrogeek, and everyone else here trying to help and to understand what the heck I need.

Well, first let me post some real log examples that anyone gets on their servers from exploit attempts.

Let's call it Server.log
Quote:

xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /HNAP1/
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /prov/aastra.cfg
xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /f4bb336d/admin.php
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /cmdd.php HTTP/1.1"
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /index.php/module/action/param1/${@die(md5(HelloThinkPHP))}
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /App/?content=die(md5(HelloThinkPHP))
xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /editBlackAndWhiteList
xxx.xxx.xxx.xxx - [08/Oct +0100] "GET /0015650000000.cfg
Now, each definition contains a sequence that may or may not be identical to what I have in the log. That is because attackers usually run automated scripts that go through a list of potential directories.
For me this means that I don't need to write every line into the definitions file; I just need to write one line that I know they will use for sure, and that way I can identify the technique used.

In the quote above there are multiple exploits they have tried, but before digging through the definitions file for what they were after, my script must first identify what kind of request was made to the server.

From the quote above, this is what the script must search for:

Line 1 = /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/
Line 2 = /HNAP1/
Line 3 = /prov/aastra.cfg
Line 4 = /f4bb336d/
Line 5 = Ignore
Line 6 = /module/action/param1/${@die(md5(HelloThinkPHP))}
Line 7 = /App/?content=die(md5(HelloThinkPHP))
Line 8 = /editBlackAndWhiteList
Line 9 = /0015650000000.cfg

How it should work in code:

If the last part of the variable is a file and it is a .php, then remove that file name and search.
If it does not have any file at the beginning or end, then search (Line 2).
If a file follows a directory but it is not .php, then search without removing anything.
If it starts with a file name other than .php, then search the whole thing.

Summing up:

- a) Detect whether "anything.php" exists at the beginning or the end of the variable, and remove it.
- b) If a) is true, do the removal and then search.
- If a) is false, just search.

What matters most in the code is a fast search.
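Rules a) and b) can be sketched with two bash regular expressions: drop a leading /anything.php component, then drop a trailing anything.php file name while keeping its directory. This is only a sketch of the rules as stated. Returning nothing when just "/" is left (so /cmdd.php is ignored, as Line 5 asks) is an assumption added here, not part of the rules above. strip_php is a hypothetical name.

```bash
#!/bin/bash
# strip_php REQUEST  (hypothetical helper)
# a) drop "/anything.php" at the start and "anything.php" at the end;
# b) print what is left as the search string, or nothing if only "/" remains.
strip_php() {
    local req=$1
    local lead='^/[^/]+\.php(/.*)$'    # /index.php/rest -> /rest
    local trail='^(.*/)[^/]+\.php$'    # /dir/admin.php  -> /dir/
    [[ $req =~ $lead ]] && req=${BASH_REMATCH[1]}
    [[ $req =~ $trail ]] && req=${BASH_REMATCH[1]}
    # Assumption: a bare "/" is too generic to search for, so ignore it.
    [[ $req == / ]] || printf '%s\n' "$req"
}
```

Applied to the nine requests above, it should reproduce the Line 1 to Line 9 list, with Line 5 coming out empty.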

astrogeek 10-09-2019 06:58 PM

Looks a lot like you are reinventing modsecurity...

Your line 5 case seems at odds with your rule "anything.php = remove and search" as stated. How would the script know to ignore it?

What do you want to get as the final output? The lines from the log or simply a count of the matching lines?

How do you intend to use this? Near real time as lines are added to the logs? Once per day/week to extract stats? For reporting purposes or blocking purposes?

There is a lot of relevant info we do not have.

At the very least I think that you have the problem, as stated so far, backwards - instead of searching the logs, mangling the lines then searching the definitions for a match with the mangle, simply search the logs for matches to the second part of the definitions one definition at a time, replace matches, skip others.

That said, I don't think your problem is yet well enough defined as indicated by the line 5 mismatch, and I would suggest looking at a rule set for modsecurity to see what is actually involved in matching common exploits by regular expression.
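The inverted approach (take each definition and search the log for it, one definition at a time) could look roughly like this. A minimal sketch, assuming the "name pattern" layout and fixed-string patterns; it reads the log once per definition, which is simple but gets slower as the definitions file grows. count_per_definition is a hypothetical name.

```bash
#!/bin/bash
# count_per_definition DEFFILE LOGFILE  (hypothetical helper)
# Prints "name count" for every definition: how many log lines contain
# its pattern as a literal string.
count_per_definition() {
    local name pat
    # read puts the first word in name and the rest of the line in pat.
    while read -r name pat; do
        [[ -n $pat ]] || continue                 # skip blank or malformed lines
        printf '%s %s\n' "$name" "$(grep -c -F -e "$pat" -- "$2")"
    done < "$1"
}
```

Swapping grep -c for plain grep would list the matching lines instead of counting them.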

allend 10-10-2019 06:46 AM

Given server.log (taking out the space between the l and e in portable in what was posted)
Quote:

xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /HNAP1/
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /prov/aastra.cfg
xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /f4bb336d/admin.php
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /cmdd.php HTTP/1.1"
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /index.php/module/action/param1/${@die(md5(HelloThinkPHP))}
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /App/?content=die(md5(HelloThinkPHP))
xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /editBlackAndWhiteList
xxx.xxx.xxx.xxx - [08/Oct +0100] "GET /0015650000000.cfg
and server.conf (escaping the characters used in creating regular expressions)
Quote:

aa /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/
bb /HNAP1/
cc /prov/aastra.cfg
dd /f4bb336d/
ee /module/action/param1/\${@die\(md5\(HelloThinkPHP\)\)}
ff /App/\?content=die\(md5\(HelloThinkPHP\)\)
gg /editBlackAndWhiteList
hh /0015650000000.cfg
and server.awk
Code:

FILENAME=="server.conf" {a[i]=$1;b[i]=$2;i++};
FILENAME!="server.conf" {
  for(i in b) {
    if(match($6,b[i])>0) {
      print a[i] " Found " b[i] " in " $0;
      break}
  };
}

then
Code:

bash-5.0$ awk -f server.awk server.conf server.log
aa Found /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/ in xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
bb Found /HNAP1/ in xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /HNAP1/
cc Found /prov/aastra.cfg in xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /prov/aastra.cfg
dd Found /f4bb336d/ in xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /f4bb336d/admin.php
ee Found /module/action/param1/\${@die\(md5\(HelloThinkPHP\)\)} in xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /index.php/module/action/param1/${@die(md5(HelloThinkPHP))}
ff Found /App/\?content=die\(md5\(HelloThinkPHP\)\) in xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /App/?content=die(md5(HelloThinkPHP))
gg Found /editBlackAndWhiteList in xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /editBlackAndWhiteList
hh Found /0015650000000.cfg in xxx.xxx.xxx.xxx - [08/Oct +0100] "GET /0015650000000.cfg
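Escaping the patterns in server.conf by hand is error-prone. A minimal sketch of a helper that backslash-escapes the characters that are special in POSIX extended regular expressions (the flavour awk and grep -E use), so a raw request copied from a log can become a server.conf pattern. It also escapes { and }, which the hand-written server.conf above leaves alone. escape_ere is a hypothetical name, and its output is not meant for basic-regex tools such as grep without -E.

```bash
#!/bin/bash
# escape_ere STRING  (hypothetical helper)
# Prints STRING with every ERE metacharacter (and backslash) escaped.
escape_ere() {
    local s=$1 out='' c i
    local special='\.^$*+?()[]{}|'      # characters that need a backslash
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $special in
            *"$c"*) out+="\\$c" ;;       # metacharacter: prefix a backslash
            *)      out+=$c ;;           # ordinary character: copy as-is
        esac
    done
    printf '%s\n' "$out"
}
```

For example, escape_ere '/App/?content=x' should print /App/\?content=x.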


pedropt 10-11-2019 11:32 AM

Nice code, allend. I will probably have to adjust it and remove the loop.


The comparison will not be made directly against the server log. Before it reaches your code I will strip a line like

Quote:

xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
down to just the request, stored in a variable:
Quote:

/wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
This will be done IP by IP: I will first choose the IP, then your code will identify what that IP was doing on the server. After your code I will add some more code that, if nothing was found in the definitions file (server.conf), asks me to add a new line to the definitions file for future detection.
Great code indeed; I was not expecting it to be so simple.
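For the step where nothing was found in server.conf, the requests that match no definition are the candidates for new lines. A minimal sketch, assuming a file with one request per line and fixed-string "name pattern" definitions; grep -v inverts the match so only unmatched requests are printed. unmatched_requests is a hypothetical name, and prompting for each new definition is left out.

```bash
#!/bin/bash
# unmatched_requests DEFFILE REQUESTFILE  (hypothetical helper)
# Prints the requests that contain none of the definition patterns,
# i.e. the candidates for new server.conf lines.
unmatched_requests() {
    # Patterns are everything after the name; empty ones are dropped
    # because an empty pattern would match every line.
    grep -v -F -f <(cut -s -d ' ' -f 2- "$1" | grep -v '^$') -- "$2"
}
```

If the definitions file is empty, every request is printed, which seems the right default for "nothing known yet".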

pedropt 10-15-2019 05:05 PM

I have not marked this thread as solved yet because I am having difficulty moving the code into my script without needing an additional instruction file, "server.awk".

Code:

FILENAME=="server.conf" {a[i]=$1;b[i]=$2;i++};
FILENAME!="server.conf" {
  for(i in b) {
    if(match($6,b[i])>0) {
      print a[i] " Found " b[i] " in " $0;
      break}
  };
}

How should it look when moved into a normal script?
Example:

var1 = some string to be matched against server.conf
If the string matches then stop, else continue checking other lines.

What I am doing here is:
After I select an IP to be checked in the web server's log, I will first grab all the log data from that IP into a temp file, and then I need this code to compare the IP's requests in temp.tmp with the definitions file "server.conf". The loop will read the first line of the temp file and search the definitions file "server.conf" for a match; if something is found, it stops there and brings back the results.

Something like this, after exporting the IP data to a temp file:

Code:

ipval="somecode before where i will retrieve the ip to be checked"

# count the lines to be checked from that ip in the temp file
cntip=$(wc -l < temp.tmp)
for i in $(seq "$cntip")
do
    var1=$(sed -n "${i}p" temp.tmp)

    # $1 of the first server.conf line whose $2 (a regex) matches var1
    var2=$(REQ="$var1" awk '$2 != "" && match(ENVIRON["REQ"], $2) { print $1; exit }' server.conf)

    if [[ -n "$var2" ]]
    then
        echo "$ipval activity in server was $var2"
        break   # stop the loop
    fi
done

Note: temp.tmp will be a cleaned file containing only the requests that IP made, and nothing else.
An example temp.tmp file:
Quote:

/TP/public/index.php
/TP/index.php
/thinkphp/html/public/index.php
/html/public/index.php
/public/index.php
/TP/html/public/index.php
/elrekt.php
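The awk program does not have to live in a separate server.awk file: it can be passed to awk as a single-quoted argument inside a shell function. A minimal sketch, assuming server.conf lines are "name regex" as in the escaped file above; it prints the name of the first definition whose regex matches the request and stops there, which is the "if the string matches then stop" behaviour asked for. check_request is a hypothetical name.

```bash
#!/bin/bash
# check_request CONFFILE REQUEST  (hypothetical helper)
# Prints $1 of the first CONFFILE line whose $2 (a regular expression)
# matches REQUEST; exit status 1 if none does.
check_request() {
    # REQUEST is passed through the environment so that awk does not
    # process backslash escapes in it, as it would with -v.
    REQ="$2" awk '
        $2 != "" && match(ENVIRON["REQ"], $2) {
            print $1
            found = 1
            exit              # first match wins, like the break in server.awk
        }
        END { exit !found }' "$1"
}
```

Inside the loop it would be used as var2=$(check_request server.conf "$var1").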

Firerat 10-16-2019 02:47 AM

you really are making life complicated by using sed the way you are

and I don't understand what you are actually trying to do


Code:

cat > server.conf <<'EOF'
/wp-content/plugins/portable-phpmyadmin/wp-pma-mod/
/HNAP1/
/prov/aastra.cfg
/f4bb336d/
/module/action/param1/\${@die\(md5\(HelloThinkPHP\)\)}
/App/\?content=die\(md5\(HelloThinkPHP\)\)
/editBlackAndWhiteList
/0015650000000.cfg
EOF

Code:

cat > Server.log <<'EOF'
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /HNAP1/
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /prov/aastra.cfg
xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /f4bb336d/admin.php
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /cmdd.php HTTP/1.1"
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /index.php/module/action/param1/${@die(md5(HelloThinkPHP))}
xxx.xxx.xxx.xxx - [09/Oct +0100] "GET /App/?content=die(md5(HelloThinkPHP))
xxx.xxx.xxx.xxx - [09/Oct +0100] "POST /editBlackAndWhiteList
xxx.xxx.xxx.xxx - [08/Oct +0100] "GET /0015650000000.cfg
EOF

Code:

#!/bin/bash

mapfile -t Patterns < server.conf
# or
#while IFS= read -r P
#do
#    Patterns+=("$P")
#done < server.conf

while IFS= read -r logline
do
    for i in "${Patterns[@]}"
    do 
        [[ ${logline} =~ $i ]] \
            && printf "Pattern %s matches \"%s\"\n" "${i}" "${logline}" \
            && break
    done
done < Server.log

The patterns in server.conf need to be patterns; at the moment they are not.

how does IP relate?

Code:

#
# create Patterns array as above
#
while read -a ip
do
# TODO add some condition here
  while read logline
  do
  # TODO add some condition here
    for i in "${Patterns[@]}"
    do 
      [[ ${logline} =~ $i ]] \
        && printf "Pattern %s matches \"%s\"\n" "${i}" "${logline}" \
        && break
        # TODO add real actions here
    done
  done< <(grep "${ip[10]%:*}" someotherlog.log)
done < somelog.log


allend 10-16-2019 08:37 AM

Quote:

After I select an IP to be checked in the web server's log, I will first grab all the log data from that IP into a temp file
Not necessary.
Given server.log
Quote:

111.111.111.111 - [09/Oct +0100] "GET /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
222.222.222.222 - [09/Oct +0100] "GET /HNAP1/
111.111.111.111 - [09/Oct +0100] "GET /prov/aastra.cfg
222.222.222.222 - [09/Oct +0100] "POST /f4bb336d/admin.php
333.333.333.333 - [09/Oct +0100] "GET /cmdd.php HTTP/1.1"
111.111.111.111 - [09/Oct +0100] "GET /index.php/module/action/param1/${@die(md5(HelloThinkPHP))}
333.333.333.333 - [09/Oct +0100] "GET /App/?content=die(md5(HelloThinkPHP))
111.111.111.111 - [09/Oct +0100] "POST /editBlackAndWhiteList
222.222.222.222 - [08/Oct +0100] "GET /0015650000000.cfg
and server.awk
Code:

FILENAME=="server.conf" {a[i]=$1;b[i]=$2;i++};
FILENAME!="server.conf" && $1==IP {
  for(i in b) {
    if(match($6,b[i])>0) {
      print a[i] " Found " b[i] " in " $0;
      break}
  };
}

then
Code:

bash-5.0$ awk -f server.awk -v IP="111.111.111.111" server.conf server.log
aa Found /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/ in 111.111.111.111 - [09/Oct +0100] "GET /wp-content/plugins/portable-phpmyadmin/wp-pma-mod/index.php
cc Found /prov/aastra.cfg in 111.111.111.111 - [09/Oct +0100] "GET /prov/aastra.cfg
ee Found /module/action/param1/\${@die\(md5\(HelloThinkPHP\)\)} in 111.111.111.111 - [09/Oct +0100] "GET /index.php/module/action/param1/${@die(md5(HelloThinkPHP))}
gg Found /editBlackAndWhiteList in 111.111.111.111 - [09/Oct +0100] "POST /editBlackAndWhiteList
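To pick which address to pass as IP, a quick count of how many requests each address made may help. A minimal sketch, assuming the client address is the first whitespace-separated field of each log line, as in the server.log above; ips_by_count is a hypothetical name.

```bash
#!/bin/bash
# ips_by_count LOGFILE  (hypothetical helper)
# Prints "count ip" for every client address, busiest first;
# ties are ordered by address so the output is stable.
ips_by_count() {
    awk '{ n[$1]++ } END { for (ip in n) print n[ip], ip }' "$1" |
        sort -k1,1nr -k2,2
}
```

With the server.log above it should list 111.111.111.111 first (4 requests), then 222.222.222.222 (3) and 333.333.333.333 (2).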



All times are GMT -5.