LinuxQuestions.org
Old 08-06-2013, 09:04 AM   #1
samasara
Member
 
Registered: Aug 2013
Posts: 34

Rep: Reputation: Disabled
Read .log format file and get special character from some lines by shell scripting


Hi dear users,
I want to write a shell script that reads a .log file and extracts specific values from it. My log format looks like this:
--471ea136-A--
[11/Jul/2013:06:42:08 --0400] Ud6MAH8AAAEAAAn4YBoAAAAK 192.168.153.128 42977 192.168.153.128 80
--471ea136-B--
GET /inssgtz7ieltdSstbw7e/neQhmsdwu7imdb0etet/eT/hsvegbff/EH/niRAvLwGK_L/osLnBWcHRk5oGMI/tmLJFqSww/sSjS6KRJB.html?Settotzeertnl=%27pn+&8nafitm=74LuKUC5t0J&4ttNe=Anmsyusi6&Mf1g-vYqyx=elTTsw&Euoytxp$
--471ea136-F--
HTTP/1.1 501 Method Not Implemented
Allow: TRACE
Connection: close
Content-Type: text/html; charset=iso-8859-1
--471ea136-H--
Message: Access denied with code 406 (phase 2). Pattern match "(^[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+|[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+$)" at ARGS:Settotzeertnl. [file "/usr/local/apach$e/conf/samane_rules/SpiderLabs-owasp-modsecurity-crs-33612c6/base_rules/modsecurity_crs_41_sql_injection_attacks.conf"] [line "64"] [id "981318"] [rev "2"] [msg "SQL Injection Attack: Commo$ Common Injection Testing Detected"] [data "Matched Data: ' found within ARGS:Settotzeertnl: 'pn "] [severity "CRITICAL"] [ver "OWASP_CRS/2.2.7"] [maturity "9"] [accuracy "8"] [tag "OWASP_CRS/WEB$_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"]Apache-Error: [file "http_filters.c"] [line 262] [level 3] Unknown Transfer-Encoding: wria, referer: http://www.hao8.de/0ehnqceo/segfto2z...f/tmdpe2En.avi
Action: Intercepted (phase 2)
Stopwatch: 1373539328722947 9277 (- - -)
Stopwatch2: 1373539328722947 9277; combined=587, p1=24, p2=549, p3=0, p4=0, p5=13, sr=0, sw=1, l=0, gc=0
Producer: ModSecurity for Apache/2.7.2 (http://www.modsecurity.org/).
Server: Apache/2.2.23 (Unix) mod_ssl/2.2.23 OpenSSL/1.0.0-fips DAV/2 PHP/5.4.12
Engine-Mode: "ENABLED"

--471ea136-Z--

That is just one log entry. For each entry I want to get the data from the B part and the accuracy value from the H part. My main question is how to tell my shell script to get this data from each entry. As you can see, each entry is delimited only by its A to Z parts, so how do I tell the script to move on to the next entry each time?
Many thanks
 
Old 08-06-2013, 09:54 AM   #2
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
not entirely certain what you want, but this will do an 'OK' job ( I think )

Code:
sed '/[BH]--$/,/^--/!d' Input.log
Output
Code:
--471ea136-B--
GET /inssgtz7ieltdSstbw7e/neQhmsdwu7imdb0etet/eT/hsvegbff/EH/niRAvLwGK_L/osLnBWcHRk5oGMI/tmLJFqSww/sSjS6KRJB.html?Settotzeertnl=%27pn+&8nafitm=74LuKUC5t0J&4ttNe=Anmsyusi6&Mf1g-vYqyx=elTTsw&Euoytxp$
--471ea136-F--
--471ea136-H--
Message: Access denied with code 406 (phase 2). Pattern match "(^[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+|[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+$)" at ARGS:Settotzeertnl. [file "/usr/local/apach$e/conf/samane_rules/SpiderLabs-owasp-modsecurity-crs-33612c6/base_rules/modsecurity_crs_41_sql_injection_attacks.conf"] [line "64"] [id "981318"] [rev "2"] [msg "SQL Injection Attack: Commo$ Common Injection Testing Detected"] [data "Matched Data: ' found within ARGS:Settotzeertnl: 'pn "] [severity "CRITICAL"] [ver "OWASP_CRS/2.2.7"] [maturity "9"] [accuracy "8"] [tag "OWASP_CRS/WEB$_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"]Apache-Error: [file "http_filters.c"] [line 262] [level 3] Unknown Transfer-Encoding: wria, referer: http://www.hao8.de/0ehnqceo/segfto2z...f/tmdpe2En.avi
Action: Intercepted (phase 2)
Stopwatch: 1373539328722947 9277 (- - -)
Stopwatch2: 1373539328722947 9277; combined=587, p1=24, p2=549, p3=0, p4=0, p5=13, sr=0, sw=1, l=0, gc=0
Producer: ModSecurity for Apache/2.7.2 (http://www.modsecurity.org/).
Server: Apache/2.2.23 (Unix) mod_ssl/2.2.23 OpenSSL/1.0.0-fips DAV/2 PHP/5.4.12
Engine-Mode: "ENABLED"

--471ea136-Z--
It does leave some 'junk' behind; I assume you just need the logs to be less distracting.

To do this on multiple files, you just need a loop.

Assuming the logs are in a single dir:
Code:
for Log in /path/to/LogDir/*.log;do
    sed '/[BH]--$/,/^--/!d' "$Log" > "${Log}.seded"
done
Above, I redirected the output to a file named after each log with .seded appended (e.g. Input.log.seded),
because I have little imagination.
If you wanted, you could get sed to edit in place as follows:
Code:
for Log in /path/to/LogDir/*.log;do
    sed -i.backup '/[BH]--$/,/^--/!d' "$Log"
done
If you remove the .backup (which could be any suffix you want), no backup will be created.
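
To see which lines the range expression keeps, here is a small sketch on an invented, shortened record (the record text and the function names are made up for illustration). It only demonstrates the range behaviour: the closing marker of each range is kept too, which is where the leftover F and Z lines come from.

```shell
# Invented miniature audit record; real sections carry much more text.
sample_record() {
cat <<'EOF'
--abc-A--
timestamp line
--abc-B--
GET /index.html HTTP/1.1
--abc-F--
HTTP/1.1 200 OK
--abc-H--
Message: example [accuracy "8"]
--abc-Z--
EOF
}

# Keep only lines inside a range that opens at a B or H marker and
# closes at the next line starting with "--"; delete everything else.
keep_b_and_h() {
    sed '/[BH]--$/,/^--/!d'
}

sample_record | keep_b_and_h
```

The A section and the body of the F section are dropped, while the F and Z marker lines survive because they close a range.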

Last edited by Firerat; 08-06-2013 at 10:04 AM. Reason: added ^ to range end ( so only match lines which start -- )
 
Old 08-07-2013, 02:21 AM   #3
samasara
Member
 
Registered: Aug 2013
Posts: 34

Original Poster
Rep: Reputation: Disabled

Hi again dear users. Thank you for your reply. What I want is this: in the for loop you showed, get the information from the B part (the whole string after GET), then search the H part and get the "accuracy" field from it. Once I have these values, I want to pass them as inputs to another program written in C++. This should be done for every log entry that has B to H parts. In other words, for each entry I should get this data and send it to the other program (the C++ one) for processing.

I would be grateful for your help.
Regards
 
Old 08-07-2013, 03:52 AM   #4
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Ok, I think I better understand now


This is probably quite crude, and I have no 'awk style'
Code:
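awk '{
if ($1 == "GET")
   {
    # pass the URL as data, not as the printf format (it contains % signs)
    printf "%s ", $2
   }
if ($1 == "Message:")
   {
      for (i=2;i<=NF;i++)
           if ($i == "[accuracy")
           {
             i++;gsub(/[",\]]/,"",$i);
             # end each record with a newline
             printf "%s\n", $i;
             break
           }
   }
}' Input.log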
For your Input it will give
Code:
/inssgtz7ieltdSstbw7e/neQhmsdwu7imdb0etet/eT/hsvegbff/EH/niRAvLwGK_L/osLnBWcHRk5oGMI/tmLJFqSww/sSjS6KRJB.html?Settotzeertnl=%27pn+&8nafitm=74LuKUC5t0J&4ttNe=Anmsyusi6&Mf1g-vYqyx=elTTsw&Euoytxp$ 8
You could make it 'faster' by starting the loop at a higher initial i in (i=2;i<=NF;i++).

It is probably better to use [ and ] as field separators.
That would probably make the fields 'predictable', so there would be no need to loop testing for [accuracy,

but with just one data set I can't tell.
The above should work, though perhaps not as fast as it could.
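
A rough sketch of the bracket-separator idea, under assumptions: the Message line below is invented and much shorter than a real one, and the function names are hypothetical. Because the quoted pattern in a real Message line can contain brackets of its own, the sketch picks the field by its tag name rather than by a fixed position.

```shell
# Invented Message line; note the bracket characters inside the quoted pattern.
message_line() {
cat <<'EOF'
Message: denied. Pattern match "(^[;]+|[;]+$)" at ARGS:x. [id "981318"] [maturity "9"] [accuracy "8"] [tag "PCI/6.5.2"]
EOF
}

# Split on either bracket so each tag body, e.g. accuracy "8", is one field,
# then pick the field by its name instead of by its position.
accuracy_by_tag_name() {
    awk -F'[][]' '
    $1 ~ /^Message:/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^accuracy "/) {
                value = $i
                gsub(/[^0-9]/, "", value)   # keep only the digits
                print value
                break
            }
    }'
}

message_line | accuracy_by_tag_name
```

The loop is still there, but the test is now on a whole tag body, so no quote or bracket stripping is needed beyond keeping the digits.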
 
Old 08-07-2013, 04:19 AM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Don't know how well this is going to work

Code:
awk 'BEGIN{FS="["};
/^GET/{gsub(/^GET /,"",$1);printf "%s ",$1};
/^Message:/{gsub(/[^[:digit:]]/,"",$13);print $13}' Input.log
it will likely fail, due to
Pattern match "(^[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+|[\"'`\xc2\xb4\xe2\x80\x99\xe2\x80\x98;]+$)"

the Pattern match is probably introducing 'random' [ , so field 13 is unlikely to be valid
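
One way around the shifting field numbers, as a sketch only: search the line for the tag itself with awk's match() instead of counting fields. The two Message lines below are invented and deliberately carry different numbers of brackets before the tag; the function names are hypothetical.

```shell
# Two invented Message lines with a different number of "[" before the tag.
message_lines() {
cat <<'EOF'
Message: Pattern match "(^[;]+$)" at ARGS:a. [id "1"] [accuracy "8"] [tag "x"]
Message: Pattern match "([;]+|[,]+|[.]+)" at ARGS:b. [id "2"] [accuracy "7"] [tag "y"]
EOF
}

accuracy_by_match() {
    awk '
    /^Message:/ {
        # match() sets RSTART and RLENGTH when the tag is present
        if (match($0, /\[accuracy "[0-9]+"\]/)) {
            tag = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", tag)      # keep only the digits
            print tag
        }
    }'
}

message_lines | accuracy_by_match
```

Both lines yield their accuracy value even though the tag sits at a different field position in each.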

Last edited by Firerat; 08-07-2013 at 04:25 AM.
 
Old 08-08-2013, 12:57 PM   #6
samasara
Member
 
Registered: Aug 2013
Posts: 34

Original Poster
Rep: Reputation: Disabled

Hi again dear users, and thank you very much for your help. You suggested code like the following for getting the accuracy and GET data:
awk '{
if ($1 == "GET")
{
printf $2" "
}
{
/Message:/;
for (i=2;i<=NF;i++)
if ($i == "[accuracy")
{
i++;gsub(/[",\]]/,"",$i);
printf $i;
break
}
}
}' Input.log

Thank you for this suggestion. But my question is how to pass this data to another program, written in C++, for further processing. My C++ code has an input function that is used like this:
set input(1,10)
set input(2,5)
set input(3,6)
I want to pass the accuracy value to the first statement, set input(1,10): instead of 10 it should be the accuracy value (for example 8). For the second and third statements, I take the information from the GET part and, based on it, assign a value to each; for example, for this GET string I would pass 9 to the second and 0 to the third. But I do not know how to send these values to the C++ program in this manner. How would that be possible?
Thank you very much for your kind help.
Regards,
Samaneh berenjian
 
Old 08-08-2013, 02:28 PM   #7
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
sorry, I can't tell you how your c++ program gets its input
I simply do not know.


I really don't understand what you are doing here,

I understand you want to pass the "accuracy" , ( 8 in your sample data )
but how are you going to make the *long* string into an integer ?


you propose feeding a c++ program 8 9 and 0

what is it going to do with those three numbers?
 
Old 08-09-2013, 03:23 AM   #8
samasara
Member
 
Registered: Aug 2013
Posts: 34

Original Poster
Rep: Reputation: Disabled

Hi again dear users,

I send these 3 numbers to a C++ program that does some processing on them to get a result (it is a fuzzy logic program). Should I write a program that reads the data extracted by the shell script you wrote?

I think that to handle the long string I should write a for loop and bring in the accuracy field there, but I do not know exactly how to do that.

Anyway, thank you for your kind help.
If there is anything else you think would be useful for writing this program, I would be grateful for your help.
Thanks a lot.
Regards
 
Old 08-09-2013, 05:46 AM   #9
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
but you are only getting one number from that log ( 8 )
I don't see where you get the other two

as for the C++ program, it probably makes sense to get its input from a file OR stdin

then

Code:
awk '{print awk stuff}' Input > Output
MyCplusplus Output
# or
awk '{print awk stuff}' Input | MyCplusplus
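
To make the file-or-stdin choice concrete, here is a minimal sketch with a shell function standing in for the consumer. The real consumer would be the C++ program; the name consume, the three-column line format, and the sample numbers are assumptions for illustration only.

```shell
# Stand-in consumer: reads "acc get third" lines from the files named as
# arguments, or from stdin when called without arguments.
consume() {
    cat -- "$@" | while read -r acc get third; do
        printf 'acc=%s get=%s third=%s\n' "$acc" "$get" "$third"
    done
}

tmp=$(mktemp)
printf '8 9 0\n' > "$tmp"

consume "$tmp"                 # input from a file
printf '7 3 1\n' | consume     # input from a pipe
rm -f "$tmp"
```

A program that reads whitespace-separated values from its standard input works with both forms, since the file form can always be written as a redirect.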
 
Old 08-09-2013, 07:16 AM   #10
samasara
Member
 
Registered: Aug 2013
Posts: 34

Original Poster
Rep: Reputation: Disabled

I determine the other two myself, based on the information I obtain from the GET part. I take the GET part and use an if-else statement: for example, if it is "/inssgtz7ieltdSstbw7e/neQhmsdwu7imdb0etet/eT/hsvegbff/EH/niRAvLwGK_L/osLnBWcHRk5oGMI/tmLJFqSww/sSjS6KRJB.html?Settotzeertnl=%27pn+&8nafitm=74LuKUC5t0J&4ttNe=Anmsyusi6&Mf1g-vYqyx=elTTsw&Euoytxp$", then assign the value 9 to the second input and 0 to the third input.
Sincerely yours
 
Old 08-09-2013, 08:09 AM   #11
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Well that if else could be 'built in' to this awk

but right now you can make a file
Code:
awk '{print awk stuff}' Input > Output
each line having two 'fields'
you can further process that to run your tests on field one....
hmm, lets change that
Code:
awk 'BEGIN{Third=0}{
if ($1 == "GET")
   {
   Get=$2
   # placeholder test: replace with your own rule for scoring the URL
   if ( Get != "some kind of test , this is just example" )
      Get=9
   }
if ($1 == "Message:")
   {
      for (i=2;i<=NF;i++)
           if ($i == "[accuracy")
           {
             i++;gsub(/[^[:digit:]]/,"",$i);
             Acc=$i;
             printf "%d %d %d\n",Acc,Get,Third;
             break
           }
   }
}' Input.log > OutPutForCProgInput
The fields are re-ordered, so accuracy is 1st and Get is 2nd.
I included a very dumb test to set Get to 9. If the 3rd is always 0, it could just be added to the print, but it is better to assign it in BEGIN (changed in the code now).

So now you just need to get your C++ prog to use the OutPutForCProgInput file as input,
and if you can get it to use stdin:
Code:
awk ..blah blah..
}' Input.log | C++Prog
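
As an illustration of building the if/else into the awk, here is a sketch with a made-up scoring rule: a URL containing %27 (an encoded single quote) gets 9, anything else gets 1. The rule, the sample entries, and the function names are all hypothetical; the real test for the Get value was never specified in this thread.

```shell
# Two invented, shortened log entries.
sample_log() {
cat <<'EOF'
--aaaa-B--
GET /page.html?name=%27pn HTTP/1.0
--aaaa-H--
Message: blocked [id "981318"] [accuracy "8"] [tag "x"]
--aaaa-Z--
--bbbb-B--
GET /style.css HTTP/1.0
--bbbb-H--
Message: blocked [id "981318"] [accuracy "7"] [tag "x"]
--bbbb-Z--
EOF
}

score_records() {
    awk 'BEGIN { Third = 0 }
    $1 == "GET" {
        # Hypothetical rule: an encoded single quote (%27) in the URL scores 9
        if (index($2, "%27") > 0) {
            Get = 9
        } else {
            Get = 1
        }
    }
    $1 == "Message:" && match($0, /\[accuracy "[0-9]+"\]/) {
        Acc = substr($0, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", Acc)          # keep only the digits
        printf "%d %d %d\n", Acc, Get, Third
    }'
}

sample_log | score_records
```

Each entry produces one line of three numbers, accuracy first, ready to be redirected to a file or piped onward.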
 
Old 08-09-2013, 10:54 AM   #12
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
hi, quick-and-dirty:
Code:
[schneidz@hyper ~]$ egrep -A 1 "(-B--|-H--)" samasara.txt | sed s/^.*accuracy/""/ | sed s/\"\].*$/""/
--471ea136-B--
GET /inssgtz7ieltdSstbw7e/neQhmsdwu7imdb0etet/eT/hsvegbff/EH/niRAvLwGK_L/osLnBWcHRk5oGMI/tmLJFqSww/sSjS6KRJB.html?Settotzeertnl=%27pn+&8nafitm=74LuKUC5t0J&4ttNe=Anmsyusi6&Mf1g-vYqyx=elTTsw&Euoytxp$
--
--471ea136-H--
 "8
[schneidz@hyper ~]$ egrep -A 1 "(-B--|-H--)" samasara.txt | sed s/^.*accuracy/""/ | sed s/\"\].*$/""/ | /whatever/floats/your/boat.cxx
 
Old 08-12-2013, 08:55 AM   #13
samasara
Member
 
Registered: Aug 2013
Posts: 34

Original Poster
Rep: Reputation: Disabled
Thanks

Thanks a lot dear users for your help.
 
Old 08-15-2013, 01:03 AM   #14
samasara
Member
 
Registered: Aug 2013
Posts: 34

Original Poster
Rep: Reputation: Disabled

Hi again dear users,
I ran just this command:
egrep -A 1 "(-B--|-H--)" samasara.txt | sed s/^.*accuracy/""/ | sed s/\"\].*$/""/ > sama.log
and my log file now looks like this (3 log entries):
--a58e7514-B--
GET /6ni.mdb?nozjrinYr=iqr&je=su+HYka%26ttAin+ewgety+object&rrofatiosmdereo=71&@gKna=ihw%3B&rrd6t4aeaet=9 &bodyrhPy=rN7eis7k&c3snNns=e4V4qrS%40%408r&1fhmdroyc=y3V%40hSUm&dhDt5al3tts=743978 HTTP/1.0
--
--a58e7514-H--
"8
--
--a58e7514-B--
GET /nZ1f7@k1Et/tZ/p76wc4jsJwd6hXQY/ulTHiisjxea/aaioitpoqmdsjrcettn/hqKkaspJpN.EFWHKzkF/dFqqKKrktH1/n1pttozE3a41/kn/s3aCv2mlLqJw/i133rAKdeV@H0/IE.js?oeas=eu&3cnsycnsa8=%29ttsd&oo6vT=6tajt&li=0reiK$
--
--a58e7514-H--
"8
--
--a58e7514-B--
GET /i_O6L@tzWZbo8mZ_n0I/muleXmWiahoftehi.css?gMqou6Mdaa=d%27ois%5Dh%3Ec+&GWwindow.openg8Y=219&4XAR_=e&T1v._Vh5jtelnetZr=An7% 259%3A%5DHpatvhl HTTP/1.0
--
--a58e7514-H--
"8

After that I tried to use a program like this:

awk 'BEGIN{Third=0}{
if ($1 == "GET")
{
Get=$2
if ( GET != "some kind of test , this is just example" )
Get=9
}
{
/Message:/;
for (i=2;i<=NF;i++)
if ($i == "[accuracy")
{
i++;gsub(/[^[:digit:]]/,"",$i);
Acc=$i;
printf "%d %d %d\n",Acc,Get,Third;
break
}
}
}' sama.log > mycplusplus program

Now the problem is how to turn this into a program that checks every log entry and does the above for each one. The other thing: in my C++ program, should I run the script with system("./myscript") to pass the data to the variables in my code?
Thanks,
waiting for your reply
 
Old 08-15-2013, 02:24 AM   #15
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
This is getting frustrating

Firstly, please use
[code]
to
keep whitespace

in your code
[/code]

OK, next
If this is your C++ program, why not have it read and process the log files?
It will probably be faster than getting awk to do it.
I can probably learn how to do it in C++, but you have a head start...


Third
you only need the one awk, no need for grep and sed..

Code:
awk 'BEGIN{Third=0}{
if ($1 == "GET")
   {
   Get=$2
   # placeholder test: replace with your own rule for scoring the URL
   if ( Get != "some kind of test , this is just example" )
      Get=9
   }
if ($1 == "Message:")
   {
      for (i=2;i<=NF;i++)
           if ($i == "[accuracy")
           {
             i++;gsub(/[^[:digit:]]/,"",$i);
             Acc=$i;
             printf "%d %d %d\n",Acc,Get,Third;
             break
           }
   }
}' /path/to/yourlogs/*.log | FuzzyLogicProg # assumes it will read stdin taking 3 args from each line
A better way of doing it (should also be faster):
Save as ParseLog.awk (use a name that makes sense to you)
Code:
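#!/usr/bin/awk -f
BEGIN{Third=0}{
if ($1 == "GET") {
   # placeholder test: replace with your own rule for scoring the URL
   Get=$2; if ( Get != "some kind of test , this is just example" )
      Get=9
   }
if ($1 == "Message:") {
    # strip everything up to the accuracy tag; $0 is re-split, so $1 becomes "8"]
    sub(/^Message:.+\[accuracy/,"",$0);
    gsub(/[^[:digit:]]/,"",$1);
    Acc=$1;
    printf "%d %d %d\n",Acc,Get,Third;
   }
}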
make it executable
and
Code:
./ParseLog.awk /path/to/logs/*.log
Your sample data results in
Code:
8 9 0
how you get that into your program is up to you ( from stdin would be nice )

But please remember, you have only given ONE data set...
You have not defined how you test the 'Get' to end up with 9
or where the third value (0) comes from

So as it stands, the current awk scripts I have posted are useless to you..
UNLESS *YOU* modify them to suit.


And will it matter if you process the same logs over and over?
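
On the original question of treating each log entry separately, one possible sketch is to track the section markers: clear the collected values at each A marker and emit a line at each Z marker. It assumes every entry is framed by --<hex id>-A-- and --<hex id>-Z-- lines as in the sample; the entries below are invented and shortened, and the NA placeholder for a missing accuracy tag is an arbitrary choice.

```shell
# Two invented entries; the second has no accuracy tag.
two_entries() {
cat <<'EOF'
--471ea136-A--
[11/Jul/2013:06:42:08] example
--471ea136-B--
GET /a.html?x=%27 HTTP/1.0
--471ea136-F--
HTTP/1.1 501 Method Not Implemented
--471ea136-H--
Message: blocked [id "981318"] [accuracy "8"] [tag "x"]
Action: Intercepted (phase 2)
--471ea136-Z--
--a58e7514-A--
[11/Jul/2013:06:42:09] example
--a58e7514-B--
GET /b.css HTTP/1.0
--a58e7514-H--
Message: blocked [id "1"] [tag "x"]
--a58e7514-Z--
EOF
}

parse_records() {
    awk '
    /^--[0-9a-f]+-A--$/ { url = ""; acc = "" }      # new entry: clear state
    /^--[0-9a-f]+-B--$/ { section = "B"; next }
    /^--[0-9a-f]+-H--$/ { section = "H"; next }
    /^--[0-9a-f]+-[A-Z]--$/ {                        # any other marker
        if ($0 ~ /-Z--$/ && url != "") {             # end of entry: emit
            shown = (acc == "") ? "NA" : acc
            print shown, url
        }
        section = ""
        next
    }
    section == "B" && $1 == "GET" { url = $2 }
    section == "H" && /^Message:/ && match($0, /\[accuracy "[0-9]+"\]/) {
        acc = substr($0, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", acc)                      # keep only the digits
    }'
}

two_entries | parse_records
```

Resetting at the A marker means an entry without an accuracy tag does not silently reuse the value from the previous entry.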
 
  


All times are GMT -5.
