Log file parsing

dfco · 10-18-2016, 10:23 AM

Hello Experts,

I am new to the website and am seeking your help with a specific need.

I need to parse statistics into CSV format. The statistics come from a log file (see below):

SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2

APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 695 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4201 Applied: 4201 Rejected: 0 Affected: 4171

APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
APP_9784 Updated rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2
APP_9784 Deleted rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 3

SOURCE DATE Tue Oct 18 09:35:54 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 3 Applied: 2 Rejected: 1 Affected: 2

APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 795 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4224 Applied: 4224 Rejected: 0 Affected: 4220

APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1228 Applied: 1228 Rejected: 0 Affected: 1228
APP_9784 Updated rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 5
APP_9784 Deleted rows - Requested: 7 Applied: 7 Rejected: 0 Affected: 5

Format of the output file :
TARGET_TABLE_NAME,INSERTED_APPLIED_ROWS,INSERTED_AFFECTED_ROWS,INSERTED_REJECTED_ROWS,UPDATED_APPLIED_ROWS,UPDATED_AFFECTED_ROWS,UPDATED_REJECTED_ROWS,DELETED_APPLIED_ROWS,DELETED_AFFECTED_ROWS,DELETED_REJECTED_ROWS

And the result should be (3 lines) :
CFG,2,2,1,0,0,0,0,0,0
PORTUG_ALL,695,695,0,4224,695,0,0,0,0
REMOTE_DTL,1228,1228,0,5,5,0,7,5,0

The numbers/statistics should be extracted from the latest SOURCE DATE (i.e. Tue Oct 18 09:35:54 2016)

I tried many options (cut, sed, awk, ...) but none of them worked.

I would appreciate any help/suggestion on the matter.

Thanks, Abel

TB0ne · 10-18-2016, 11:18 AM

Please read the "Question Guidelines" link in my posting signature. Without knowing what you have done/tried or you posting your actual code, we can't tell you much. You say "tried many options", but don't tell us WHICH ONES, or give us details about what your actual goal is, or how often you need to do this. Solutions for a one-time fix will be different than something that's meant to be run numerous times a week/day.

Also, where is this data coming FROM? Could be there is already an option to save it in CSV format the way you want it.

szboardstretcher · 10-18-2016, 11:38 AM

If you know how to use awk and sed and all that, then this might help you get started. This will read in 14 lines as a record, with each line treated as a separate variable, so you can massage the data into what you need:

Code:

# Read the log 14 lines at a time; line1..line14 hold one SOURCE DATE block.
while read -r line1; do
    read -r line2
    read -r line3
    read -r line4
    read -r line5
    read -r line6
    read -r line7
    read -r line8
    read -r line9
    read -r line10
    read -r line11
    read -r line12
    read -r line13
    read -r line14
    # work with "$line1" ... "$line14" here
done < logfile

If this is a seriously large log, and you are planning to process many of them, you WILL DEFINITELY want to use a programming language like Python or C++ to do this. Doing it in bash will cost you performance over the long run.
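
If the blocks do not always have exactly 14 lines, one alternative sketch is to let awk keep only what follows the most recent SOURCE DATE line. The function name last_block and the variable sample_log are made up for illustration, and the sample is a shortened copy of the log in the first post.

```shell
#!/bin/sh
# Hypothetical helper: print only the block that starts at the last "SOURCE DATE" line.
last_block() {
    awk '
        /^SOURCE DATE/ { n = 0 }      # a newer run starts: discard the buffered block
        { buf[++n] = $0 }             # buffer every line of the current block
        END { for (i = 1; i <= n; i++) print buf[i] }
    ' "$@"
}

# Shortened sample shaped like the log in the first post (illustration only).
sample_log='SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2

SOURCE DATE Tue Oct 18 09:35:54 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 3 Applied: 2 Rejected: 1 Affected: 2'

printf '%s\n' "$sample_log" | last_block
```

With a real file, last_block logfile should behave the same way, since awk reads the named file instead of standard input.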

dfco · 10-18-2016, 06:53 PM

@szboardstretcher : Thanks but it is not working or I am not getting it.

I am not an expert in the awk, sed, ... commands.

I tried :
* cat logfile | cut -f1 -d':'
* cut -d: -f2-5 logfile|grep -o '[0-9]*'
* cut -d ":" -f 2- logfile

Looking forward to hearing from you.

AwesomeMachine · 10-18-2016, 07:42 PM

Let's break the problem up into bite-sized pieces. A file is made up of lines, and lines are made up of fields (or words) divided by white space (tab, space). In awk, each field is referred to as $1, $2, and so on. For example, suppose somefile contains:
cat bat
cat mat
cat sat

If I use

Code:

$ grep cat somefile

and the output is:

Code:

cat bat
cat mat
cat sat

then

Code:

$ grep cat somefile | awk '{print $2}'

will yield output:

Code:

bat
mat
sat

Whereas

Code:

$ grep cat somefile | awk '{print $2,$1}'

will yield output:

Code:

bat cat
mat cat
sat cat

But there is no awk one-liner that will do everything you want to do. You'll need an awk script, or better yet a Perl script, to do it all. awk is easier than Perl, albeit less powerful.
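
To see how the same field numbering applies to one of the log lines from the first post, a small sketch can print every field next to its number. The helper name show_fields is made up for illustration.

```shell
#!/bin/sh
# One statistics line copied from the log in the first post.
line='APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218'

# Hypothetical helper: list each whitespace-separated field with its awk number.
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

printf '%s\n' "$line" | show_fields
```

On this line the four numbers land in $6, $8, $10 and $12, with the labels Requested:, Applied:, Rejected: and Affected: in the fields just before them.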

dfco · 10-18-2016, 07:51 PM

Thanks.

PERL ? Do you mean it is not doable easily ?

I can't parse the following line :
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218

and generate an output file like :
1218;1218;0;1218

allend · 10-18-2016, 09:12 PM

It is doable with a scripted solution.
e.g. For awk, perhaps this will give you some ideas.

Code:

BEGIN {OFS=","}
$2 == "Target:" {TARGET_TABLE_NAME = $3}
$2 == "Inserted" {INSERTED_APPLIED_ROWS = $8;
                  INSERTED_AFFECTED_ROWS = $12;
                  INSERTED_REJECTED_ROWS = $10}
...
/^$/ {print TARGET_TABLE_NAME,INSERTED_APPLIED_ROWS,INSERTED_AFFECTED_ROWS,INSERTED_REJECTED_ROWS;
      TARGET_TABLE_NAME = "";
      INSERTED_APPLIED_ROWS = 0;
      INSERTED_AFFECTED_ROWS = 0;
      INSERTED_REJECTED_ROWS = 0;
      ...}
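
A fuller sketch along the same lines, filling in the Updated and Deleted counters and printing only the targets of the last SOURCE DATE block, could look like the following. The function name log_to_csv, the array names and the sample_log variable are assumptions made for illustration; the field positions ($8, $10, $12) are the ones used above, and the sample is a shortened copy of the log in the first post.

```shell
#!/bin/sh
# Hypothetical helper: one CSV row per target, taken from the last SOURCE DATE block.
log_to_csv() {
    awk '
        BEGIN { OFS = ","; split("Inserted Updated Deleted", ops, " ") }
        /^SOURCE DATE/ { n = 0; next }          # newer run: forget the earlier targets
        $2 == "Target:" {                       # new row, all nine counters start at 0
            name[++n] = $3
            for (k = 1; k <= 3; k++)
                app[n, ops[k]] = aff[n, ops[k]] = rej[n, ops[k]] = 0
            next
        }
        n && ($2 == "Inserted" || $2 == "Updated" || $2 == "Deleted") {
            app[n, $2] = $8; rej[n, $2] = $10; aff[n, $2] = $12
        }
        END {
            for (i = 1; i <= n; i++)
                print name[i],
                      app[i, "Inserted"], aff[i, "Inserted"], rej[i, "Inserted"],
                      app[i, "Updated"],  aff[i, "Updated"],  rej[i, "Updated"],
                      app[i, "Deleted"],  aff[i, "Deleted"],  rej[i, "Deleted"]
        }
    ' "$@"
}

# Sample: a shortened first run plus the full second run from the first post.
sample_log='SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2

SOURCE DATE Tue Oct 18 09:35:54 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 3 Applied: 2 Rejected: 1 Affected: 2

APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 795 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4224 Applied: 4224 Rejected: 0 Affected: 4220

APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1228 Applied: 1228 Rejected: 0 Affected: 1228
APP_9784 Updated rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 5
APP_9784 Deleted rows - Requested: 7 Applied: 7 Rejected: 0 Affected: 5'

printf '%s\n' "$sample_log" | log_to_csv
```

On the sample this prints three rows: CFG,2,2,1,0,0,0,0,0,0 then PORTUG_ALL,695,695,0,4224,4220,0,0,0,0 then REMOTE_DTL,1228,1228,0,5,5,0,7,5,0. The PORTUG_ALL row shows 4220 for the updated affected rows because that is the value in the log; the 695 in the requested result does not appear on that Updated line. Rows are collected in arrays and printed at the end, rather than on each blank line, so that only the latest run is reported.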

TB0ne · 10-19-2016, 07:54 AM

Quote:

Originally Posted by dfco

Thanks.
PERL ? Do you mean it is not doable easily ?

Perl IS easy...and thank you for not replying to my questions. I had asked you where this data was coming FROM, since there is a possibility that it can be outputted into CSV natively, and I also asked you to show us what you've done/tried on your own. Why do you ignore these things?

Quote:

I can't parse the following line :
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218

and generate an output file like : 1218;1218;0;1218

Why?? Again, what DID YOU DO to attempt to parse this line??? Just saying "I can't parse the following line", tells us nothing about your efforts. Incidentally, I'm able to parse this almost down to what you need with a few sed statements, but your input and output are NOT MATCHING. For example, you posted this:

Code:

SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2 

APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 695 Applied: 695 Rejected: 0 Affected: 695 
APP_9784 Updated rows - Requested: 4201 Applied: 4201 Rejected: 0 Affected: 4171 

APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218 
APP_9784 Updated rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2 
APP_9784 Deleted rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 3

...and then said you want THIS for the output:

Code:

And the result should be (3 lines) :
CFG,2,2,1,0,0,0,0,0,0
PORTUG_ALL,695,695,0,4224,695,0,0,0,0
REMOTE_DTL,1228,1228,0,5,5,0,7,5,0

??? Look at the CFG input data...it has 4 numbers...2, 2, 0, and 2. For your 'required' output, you have 2,2,1,0,0,0,0,0,0. Where is that coming from, or are you saying you need the data padded??? And if this is for a database input, again we will ask where the data is coming from.

Turbocapitalist · 10-19-2016, 08:07 AM

dfco, please write where the data is coming from. You might have more options than you are aware of. Though the fall-back option is some simple manipulation with perl.

dfco · 10-19-2016, 12:47 PM

Maybe I should have said that I am a beginner and that complex Linux scripting is new to me.

The log is a csv file and is generated by an application which inserts/updates/deletes records in a table.

I tried to simplify my need with the last example I gave :

APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218

Thanks.

Turbocapitalist · 10-19-2016, 01:09 PM

Quote:

Originally Posted by dfco

APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218

There are several go-to tools for system administration. "grep", "sed", "awk", and "perl" top the list. Learning what they can do is a first step. "awk" seems most relevant here based on your sample, if the number of words is always the same.

Code:

awk 'BEGIN { OFS=";" } { print $6, $8, $10, $12 }' yourfile.txt

So start by looking up the pieces in the manual page for "awk", FS and OFS in particular.

Code:

man awk

It's not a complex language but tutorials will help, and there are many to choose from on the web.
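
If the number of words turns out not to be constant, a variation is to look for the labels instead of counting positions. This is only a sketch: the function name stats_by_label is made up, and it assumes each number directly follows its label (Requested:, Applied:, Rejected:, Affected:) as in the sample line.

```shell
#!/bin/sh
# One statistics line copied from the thread.
line='APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218'

# Hypothetical helper: pick each value by the label in front of it, not by position.
stats_by_label() {
    awk '
        BEGIN { OFS = ";" }
        {
            req = app = rej = aff = ""
            for (i = 1; i < NF; i++) {
                if      ($i == "Requested:") req = $(i + 1)
                else if ($i == "Applied:")   app = $(i + 1)
                else if ($i == "Rejected:")  rej = $(i + 1)
                else if ($i == "Affected:")  aff = $(i + 1)
            }
            if (req != "") print req, app, rej, aff   # skip lines without statistics
        }
    ' "$@"
}

printf '%s\n' "$line" | stats_by_label
```

For the sample line this prints 1218;1218;0;1218. Lines without a Requested: label, such as the Target lines, produce no output.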

grail · 10-19-2016, 01:17 PM

No one is asking you to write anything complex, but we are asking to see your attempt, because at present it appears you are making none and waiting for someone to tell you how to do it ... which is not the LQ way.

You mention that the log file is in csv format, but on examining the data you have shown I am unable to find even one comma (csv stands for comma-separated values).

Is the name of the application not allowed to be known? This may be the case, but you should at least say so instead of avoiding the question.

Yes, your simplified example is much easier, although any solution that works on this simple example will more than likely not work on the real one due to the extra data, and again, you have shown no effort as to what you might have tried.

The general response to no effort is to point you at the manuals for sed/awk/perl or the like and ask you to come back when you have had a go.

So, as asked by many others, please answer the questions asked so you may be provided with the most appropriate response.

dfco · 10-19-2016, 01:19 PM

Ok. I will try with your recommendation.

Thanks.

dfco · 10-19-2016, 01:42 PM

What I tried :
grep "Initializing session" /infa_shared/SessLogs/s_m_CDC_GRP5.log | awk -F"[" '{print $2}' | awk -F"]" '{print $1}' > testfile.log
grep "SOURCE BASED COMMIT POINT" s_m_CDC_GRP2.log | awk -F"[" '{print $2}' > testfile.log
cat testfile.log
cut -d: -f 1,3 testfile.log
cut -d: -f 2,4 testfile.log
cut -d: -f 1,2,3,4,5,6 testfile.log
cut -d: -f 1,2,3,4,5,6 testfile.log
cut -d: -f 1 testfile.log
cut -d: -f 2 testfile.log
ls testfile.log | cut -f1 -d':'
cat testfile.log | cut -f1 -d':'
cat testfile.log | cut -f2 -d':'
cat testfile.log | cut -f3 -d':'
cat testfile.log | cut -f4 -d':'
cat testfile.log | cut -f5 -d':'
cat testfile.log | cut -f6 -d':'
cat testfile.log | cut -f6 -d'Affected'
cat testfile.log | cut -f6 -d': '
cut -d: -f6 testfile.log
cut -d: -f1 testfile.log
cut -d: -f2 testfile.log
cut -d: -c2 testfile.log
cut -d: -f2 -f3 testfile.log
cut -d: -f2-3 testfile.log
cut -d: -f2-4 testfile.log
cut -d: -f2-5 testfile.log
cut -d: -f2-5 testfile.log|grep [0-9]
cut -d: -f2-5 testfile.log|grep -o '[0-9]*'
cut -d: -f2-5 testfile.log|grep '[0-9]*'
cut -d: -f2-5 testfile.log|grep -o '[0-9*'
cut -d: -f2-5 testfile.log|grep -o '[0-9]'
cut -d : -f2-5 testfile.log
cut -d ": " -f2-5 testfile.log
cut -d ":"-f2-5 testfile.log
cut -d ":"-f 5- testfile.log
cut -d ":"-f 2- testfile.log
cut -d ":"-f 2-| testfile.log
grep -Eo '[0-9]{1,4}' testfile.log
grep -Eo '[0-9]{1,4}' testfile.log|cut -d: -f2-5
cut -d ":"-f 5- testfile.log|grep -Eo '[0-9]{1,4}'
cut -d ":"-f 5- testfile.log|grep -Eo '[0-9]'
sed 's/[^0-9]//g' testfile.log
grep -o '[0-9]*' testfile.log
cut -d ":"-f 2- testfile.log|grep "^[0-9]"
cut -d ":"-f 2- testfile.log|grep "[0-9]"
cut -d ":"-f 2- testfile.log|grep "[0-9]*"
awk -F"~" testfile.log
awk -F"~" '{print $1}' testfile.log
awk -F":" '{print $1}' testfile.log
awk -F":" '{print $2}' testfile.log
awk -F"Requested" '{print $2}' testfile.log
awk -F"Requested:" '{print $2}' testfile.log
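
Several of the attempts above come close. A sketch that combines them for a single statistics line: cut on ":" (GNU cut accepts only a one-character delimiter, which is why -d ': ' and -d 'Affected' are rejected), keep the digit runs with grep -Eo, and join them with paste. The function name numbers_csv is made up for illustration, and it is meant for one line at a time, since paste -s joins everything it receives into one row.

```shell
#!/bin/sh
# One statistics line copied from the thread.
line='APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218'

# Hypothetical helper: drop everything up to the first ":", keep the numbers, join with commas.
numbers_csv() {
    cut -d: -f2- | grep -Eo '[0-9]+' | paste -s -d, -
}

printf '%s\n' "$line" | numbers_csv
```

Cutting from the second field onward is what removes the 9889 in APP_9889, which grep would otherwise pick up as a number. For the sample line the result is 1218,1218,0,1218.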

dfco · 10-19-2016, 01:46 PM

If it is easier, I am fine with the comma.

APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218,1218,0,1218

I didn't know the name of the application was so important. It is ODI (Oracle Data Integrator).