LinuxQuestions.org
Forums > Linux Forums > Linux - Newbie
Post #1 by dfco, 10-18-2016, 10:23 AM
Log file parsing


Hello Experts,

I am new to the site and would like your help with a specific need.

I need to extract statistics from a log file (see below) into CSV format:

SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2

APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 695 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4201 Applied: 4201 Rejected: 0 Affected: 4171

APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
APP_9784 Updated rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2
APP_9784 Deleted rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 3

SOURCE DATE Tue Oct 18 09:35:54 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 3 Applied: 2 Rejected: 1 Affected: 2

APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 795 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4224 Applied: 4224 Rejected: 0 Affected: 4220

APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1228 Applied: 1228 Rejected: 0 Affected: 1228
APP_9784 Updated rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 5
APP_9784 Deleted rows - Requested: 7 Applied: 7 Rejected: 0 Affected: 5

Format of the output file:
TARGET_TABLE_NAME,INSERTED_APPLIED_ROWS,INSERTED_AFFECTED_ROWS,INSERTED_REJECTED_ROWS,UPDATED_APPLIED_ROWS,UPDATED_AFFECTED_ROWS,UPDATED_REJECTED_ROWS,DELETED_APPLIED_ROWS,DELETED_AFFECTED_ROWS,DELETED_REJECTED_ROWS

And the result should be (3 lines):
CFG,2,2,1,0,0,0,0,0,0
PORTUG_ALL,695,695,0,4224,695,0,0,0,0
REMOTE_DTL,1228,1228,0,5,5,0,7,5,0

The numbers/statistics should be extracted from the latest SOURCE DATE (i.e. Tue Oct 18 09:35:54 2016).
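One way to meet the "latest SOURCE DATE only" requirement is to let awk buffer lines and empty the buffer whenever a new `SOURCE DATE` header appears, so only the last session is left at the end of the input. This is a minimal sketch, assuming every session starts with a line beginning `SOURCE DATE` exactly as in the sample; `latest_session` is a hypothetical helper name.

```shell
# latest_session is a hypothetical helper name. It prints only the last
# block that starts with "SOURCE DATE", reading a file argument or stdin.
latest_session() {
    awk '
        /^SOURCE DATE/ { buf = "" }   # new session header: drop earlier sessions
        { buf = buf $0 "\n" }         # keep every line of the current session
        END { printf "%s", buf }      # print the last session only
    ' "$@"
}
```

For example, `latest_session logfile > latest.log` would leave only the 09:35:54 session of the sample for later steps to work on. It holds one session in memory, which should be fine for logs of this kind.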

I tried many options (cut, sed, awk, ...) but none of them worked.

I would appreciate any help/suggestion on the matter.

Thanks, Abel

Post #2 by TB0ne, 10-18-2016, 11:18 AM
Quote:
Originally Posted by dfco
...
I tried many options (cut, sed, awk, ...) but it does not work.
Please read the "Question Guidelines" link in my posting signature. Without knowing what you have done/tried, or seeing your actual code, we can't tell you much. You say "tried many options", but don't tell us WHICH ONES, or give us details about what your actual goal is, or how often you need to do this. Solutions for a one-time fix will be different from something that's meant to be run numerous times a week/day.

Also, where is this data coming FROM? Could be there is already an option to save it in CSV format the way you want it.

Post #3 by szboardstretcher, 10-18-2016, 11:38 AM
If you know how to use awk and sed and all that, then this might help you get started. This will read in 14 lines as a record, with each line treated as a separate variable, so you can massage the data into what you need:

Code:
while read -r line1; do   # line1 is the first line of a 14-line record
read -r line2
read -r line3
read -r line4
read -r line5
read -r line6
read -r line7
read -r line8
read -r line9
read -r line10
read -r line11
read -r line12
read -r line13
read -r line14
# ...work with line1 through line14 here...
done < logfile
If this is a large log, and you are planning to process many of them, you WILL DEFINITELY want to use a programming language like Python or C++ to do this. Doing it in bash will cost you performance over the long run.
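Reading a fixed 14 lines per record only works while every session has exactly the same number of lines. A hedged alternative is awk's paragraph mode: with `RS` set to the empty string, each group of lines separated by blank lines becomes one record, and with `FS = "\n"` each line in that group becomes one field. This sketch only reports how the input gets grouped; `show_paragraphs` is a hypothetical helper name.

```shell
# show_paragraphs is a hypothetical helper name. It prints, for each
# blank-line-separated record, its number, its line count and its first line.
show_paragraphs() {
    awk '
        BEGIN { RS = ""; FS = "\n" }   # paragraph mode; one field per line
        { printf "record %d: %d line(s), first: %s\n", NR, NF, $1 }
    ' "$@"
}
```

On the sample log each target's lines form one record, except that the first target of every session shares its record with the `SOURCE DATE` and `=====` lines, so a real script would still have to watch for those header lines.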

Last edited by szboardstretcher; 10-18-2016 at 11:42 AM.

Post #4 by dfco (Original Poster), 10-18-2016, 06:53 PM
@szboardstretcher: Thanks, but either it is not working or I am not getting it.

I am not an expert in the awk, sed, ... commands.

I tried:
* cat logfile | cut -f1 -d':'
* cut -d: -f2-5 logfile|grep -o '[0-9]*'
* cut -d ":" -f 2- logfile

Looking forward to hearing from you.

Post #5 by AwesomeMachine, 10-18-2016, 07:42 PM
Let's break the problem up into bite-sized pieces. A file is made up of lines, and lines are made up of fields (or words) separated by white space (tabs or spaces). In awk, the fields of a line are referred to as $1, $2, and so on. For example, if somefile contains:
cat bat
cat mat
cat sat

If I use
Code:
$ grep cat somefile
and the output is:
Code:
cat bat
cat mat
cat sat
then
Code:
$ grep cat somefile | awk '{print $2}'
will yield output:
Code:
bat
mat
sat
Whereas
Code:
grep cat somefile | awk '{print $2,$1}'
will yield output:
Code:
bat cat
mat cat
sat cat
But there is no awk one-liner that will do everything you want to do. You'll need an awk script, or better yet a Perl script, to do it all. But awk is easier than Perl, albeit less powerful.
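To apply the same `$1`, `$2` idea to the real log line, it helps to print every field with its number first, so you can see which positions hold the counts. A minimal sketch; `number_fields` is a hypothetical helper name.

```shell
# number_fields is a hypothetical helper name: it prints "index: field"
# for every whitespace-separated field of each input line.
number_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }' "$@"
}
```

Running `echo 'APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218' | number_fields` lists 12 fields and shows that the four counts sit in fields 6, 8, 10 and 12.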

Post #6 by dfco (Original Poster), 10-18-2016, 07:51 PM
Thanks.

Perl? Do you mean it can't be done easily?

I can't parse the following line:
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218

and generate an output file like:
1218;1218;0;1218

Post #7 by allend, 10-18-2016, 09:12 PM
It is doable with a scripted solution.
For example, in awk, perhaps this will give you some ideas:
Code:
BEGIN {OFS=","}
$2 == "Target:" {TARGET_TABLE_NAME = $3}
$2 == "Inserted" {INSERTED_APPLIED_ROWS = $8;
                  INSERTED_AFFECTED_ROWS = $12;
                  INSERTED_REJECTED_ROWS = $10}
...
/^$/ {print TARGET_TABLE_NAME,INSERTED_APPLIED_ROWS,INSERTED_AFFECTED_ROWS,INSERTED_REJECTED_ROWS;
      TARGET_TABLE_NAME = "";
      INSERTED_APPLIED_ROWS = 0;
      INSERTED_AFFECTED_ROWS = 0;
      INSERTED_REJECTED_ROWS = 0;
      ...}
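Filling in that outline, one possible complete version follows. It is a sketch under these assumptions: the field positions never differ from the sample lines, only the last `SOURCE DATE` session matters, and operations that are missing for a target default to 0. Rows are printed in an END block rather than at blank lines, so the last target is not lost if the file does not end with a blank line. `odi_stats_to_csv` is a hypothetical helper name.

```shell
# odi_stats_to_csv is a hypothetical helper name, not part of ODI or any tool.
# It assumes the exact field layout of the sample log from the first post.
odi_stats_to_csv() {
    awk '
        BEGIN { OFS = "," }

        # A new session starts: forget targets collected from earlier sessions.
        /^SOURCE DATE/ { n = 0; next }

        # "APP_9874 Target: CFG (Instance Name: [CFG])": the table name is $3.
        $2 == "Target:" {
            n++
            name[n] = $3
            # Nine counters per target, all defaulting to 0.
            for (k = 1; k <= 9; k++) val[n, k] = 0
            next
        }

        # Count lines: Applied is $8, Rejected is $10, Affected is $12.
        n > 0 && ($2 == "Inserted" || $2 == "Updated" || $2 == "Deleted") {
            # Column offset: inserted 1-3, updated 4-6, deleted 7-9.
            off = ($2 == "Inserted") ? 0 : (($2 == "Updated") ? 3 : 6)
            val[n, off + 1] = $8     # applied
            val[n, off + 2] = $12    # affected
            val[n, off + 3] = $10    # rejected
        }

        # Print one CSV row per target of the last session seen.
        END {
            for (i = 1; i <= n; i++) {
                row = name[i]
                for (k = 1; k <= 9; k++) row = row OFS val[i, k]
                print row
            }
        }
    ' "$@"
}
```

Run as `odi_stats_to_csv logfile > stats.csv`. On the sample it prints `PORTUG_ALL,695,695,0,4224,4220,0,0,0,0`: following the stated column order, UPDATED_AFFECTED_ROWS is 4220 from the latest Updated line, where the expected output in the first post shows 695. If the header line is also wanted, it can be printed once from the BEGIN block.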

Post #8 by TB0ne, 10-19-2016, 07:54 AM
Quote:
Originally Posted by dfco
Thanks.
Perl? Do you mean it can't be done easily?
Perl IS easy...and thank you for not replying to my questions. I had asked you where this data was coming FROM, since there is a possibility that it can be outputted into CSV natively, and I also asked you to show us what you've done/tried on your own. Why do you ignore these things?
Quote:
I can't parse the following line:
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218

and generate an output file like: 1218;1218;0;1218
Why?? Again, what DID YOU DO to attempt to parse this line??? Just saying "I can't parse the following line", tells us nothing about your efforts. Incidentally, I'm able to parse this almost down to what you need with a few sed statements, but your input and output are NOT MATCHING. For example, you posted this:
Code:
SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2 

APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 695 Applied: 695 Rejected: 0 Affected: 695 
APP_9784 Updated rows - Requested: 4201 Applied: 4201 Rejected: 0 Affected: 4171 

APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218 
APP_9784 Updated rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2 
APP_9784 Deleted rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 3
...and then said you want THIS for the output:
Code:
And the result should be (3 lines) :
CFG,2,2,1,0,0,0,0,0,0
PORTUG_ALL,695,695,0,4224,695,0,0,0,0
REMOTE_DTL,1228,1228,0,5,5,0,7,5,0
??? Look at the CFG input data...it has 4 numbers...2, 2, 0, and 2. For your 'required' output, you have 2,2,1,0,0,0,0,0,0. Where is that coming from, or are you saying you need the data padded??? And if this is for a database input, again we will ask where the data is coming from.
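As an illustration of the sed route for the single-line case (not necessarily the statements TB0ne used), one substitution with capture groups can pull out the four counts. This sketch assumes every counts line uses exactly the `Requested: N Applied: N Rejected: N Affected: N` wording shown; `counts_with_sed` is a hypothetical helper name.

```shell
# counts_with_sed is a hypothetical helper name. Each \([0-9]*\) captures
# one count; -n together with the p flag prints only lines that matched.
counts_with_sed() {
    sed -n 's/.*Requested: \([0-9]*\) Applied: \([0-9]*\) Rejected: \([0-9]*\) Affected: \([0-9]*\).*/\1;\2;\3;\4/p' "$@"
}
```

Target lines and the `SOURCE DATE` headers do not match the pattern, so they produce no output; the trailing `.*` also absorbs any trailing spaces.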

Post #9 by Turbocapitalist, 10-19-2016, 08:07 AM
dfco, please write where the data is coming from. You might have more options than you are aware of. Though the fall-back option is some simple manipulation with perl.

Post #10 by dfco (Original Poster), 10-19-2016, 12:47 PM
Maybe I should have mentioned that I am a beginner and that complex Linux scripting is new to me.

The log is a csv file and is generated by an application which inserts/updates/deletes records in a table.

I tried to simplify my need with the last example I gave:

APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218

Thanks.

Post #11 by Turbocapitalist, 10-19-2016, 01:09 PM
Quote:
Originally Posted by dfco
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218
There are several go-to tools for system administration. "grep", "sed", "awk", and "perl" top the list. Learning what they can do is a first step. "awk" seems most relevant here based on your sample, if the number of words is always the same.

Code:
awk 'BEGIN{ OFS=";" } {print $6, $8, $10, $12 }' yourfile.txt
So start by looking up the pieces in the manual page for "awk", FS and OFS in particular.

Code:
man awk
It's not a complex language but tutorials will help, and there are many to choose from on the web.
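If the number of words might not always be the same, a hedged alternative is to look for each label and take the field right after it, instead of relying on fixed positions. This sketch assumes the labels always end in a colon exactly as in the sample; `counts_by_label` is a hypothetical helper name.

```shell
# counts_by_label is a hypothetical helper name. For lines containing
# "Requested:", it maps every "Label:" field to the field that follows it.
counts_by_label() {
    awk '
        BEGIN { OFS = ";" }
        /Requested:/ {
            split("", v)                 # clear the map (portable form of delete v)
            for (i = 1; i < NF; i++)     # stop at NF-1: the last field has no successor
                if ($i ~ /:$/) v[$i] = $(i + 1)
            print v["Requested:"], v["Applied:"], v["Rejected:"], v["Affected:"]
        }
    ' "$@"
}
```

Because it keys on the labels, an extra word before the counts (for example `Inserted rows now - ...`) does not shift the result.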

Post #12 by grail, 10-19-2016, 01:17 PM
Quote:
Originally Posted by dfco
Maybe I should have mentioned that I am a beginner and that complex Linux scripting is new to me.

The log is a csv file and is generated by an application which inserts/updates/deletes records in a table.

I tried to simplify my need with the last example I gave:

APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218

Thanks.
No one is asking you to write anything complex, but we are asking to see your attempt; at present it appears you are making none and are waiting for someone to tell you how to do it, which is not the LQ way.

You mention that the log file is in CSV format, but in the data you have shown I am unable to find even one comma (CSV being comma-separated values).

Is the name of the application not allowed to be known? This may be the case, but you should at least say so instead of avoiding the question.

Yes, your simplified example is much easier, although any solution that works on this simple example will more than likely not work on the real one because of the extra data. And again, you have shown no effort as to what you might have tried.

The general response to no effort is to point you at the manuals for sed/awk/perl or the like and ask you to come back when you have had a go.

So, as asked by many others, please answer the questions so that you can be given the most appropriate response.

Post #13 by dfco (Original Poster), 10-19-2016, 01:19 PM
OK. I will try your recommendation.

Thanks.

Post #14 by dfco (Original Poster), 10-19-2016, 01:42 PM
What I tried:
grep "Initializing session" /infa_shared/SessLogs/s_m_CDC_GRP5.log | awk -F"[" '{print $2}' | awk -F"]" '{print $1}' > testfile.log
grep "SOURCE BASED COMMIT POINT" s_m_CDC_GRP2.log | awk -F"[" '{print $2}' > testfile.log
cat testfile.log
cut -d: -f 1,3 testfile.log
cut -d: -f 2,4 testfile.log
cut -d: -f 1,2,3,4,5,6 testfile.log
cut -d: -f 1,2,3,4,5,6 testfile.log
cut -d: -f 1 testfile.log
cut -d: -f 2 testfile.log
ls testfile.log | cut -f1 -d':'
cat testfile.log | cut -f1 -d':'
cat testfile.log | cut -f2 -d':'
cat testfile.log | cut -f3 -d':'
cat testfile.log | cut -f4 -d':'
cat testfile.log | cut -f5 -d':'
cat testfile.log | cut -f6 -d':'
cat testfile.log | cut -f6 -d'Affected'
cat testfile.log | cut -f6 -d': '
cut -d: -f6 testfile.log
cut -d: -f1 testfile.log
cut -d: -f2 testfile.log
cut -d: -c2 testfile.log
cut -d: -f2 -f3 testfile.log
cut -d: -f2-3 testfile.log
cut -d: -f2-4 testfile.log
cut -d: -f2-5 testfile.log
cut -d: -f2-5 testfile.log|grep [0-9]
cut -d: -f2-5 testfile.log|grep -o '[0-9]*'
cut -d: -f2-5 testfile.log|grep '[0-9]*'
cut -d: -f2-5 testfile.log|grep -o '[0-9*'
cut -d: -f2-5 testfile.log|grep -o '[0-9]'
cut -d : -f2-5 testfile.log
cut -d ": " -f2-5 testfile.log
cut -d ":"-f2-5 testfile.log
cut -d ":"-f 5- testfile.log
cut -d ":"-f 2- testfile.log
cut -d ":"-f 2-| testfile.log
grep -Eo '[0-9]{1,4}' testfile.log
grep -Eo '[0-9]{1,4}' testfile.log|cut -d: -f2-5
cut -d ":"-f 5- testfile.log|grep -Eo '[0-9]{1,4}'
cut -d ":"-f 5- testfile.log|grep -Eo '[0-9]'
sed 's/[^0-9]//g' testfile.log
grep -o '[0-9]*' testfile.log
cut -d ":"-f 2- testfile.log|grep "^[0-9]"
cut -d ":"-f 2- testfile.log|grep "[0-9]"
cut -d ":"-f 2- testfile.log|grep "[0-9]*"
awk -F"~" testfile.log
awk -F"~" '{print $1}' testfile.log
awk -F":" '{print $1}' testfile.log
awk -F":" '{print $2}' testfile.log
awk -F"Requested" '{print $2}' testfile.log
awk -F"Requested:" '{print $2}' testfile.log
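Several of these attempts come close. In the `cut -d ":"-f ...` variants the missing space glues `-f` onto the delimiter, and cut only accepts a single-character delimiter, so those commands fail before reading the file. `grep -o '[0-9]*'` does find the numbers, but prints each on its own line and also picks up the 9889 in `APP_9889`. A hedged sketch of one way to finish that idea: strip everything up to `- ` with sed, keep the digit runs with `grep -o`, and join them with `paste -s`. It assumes the `APP_nnnn ... rows - ` prefix shown in the sample; `join_counts` is a hypothetical helper name.

```shell
# join_counts is a hypothetical helper name. It reads lines from stdin.
join_counts() {
    while IFS= read -r line; do
        # 1. sed drops the "APP_9889 Inserted rows - " prefix (so 9889 is not picked up)
        # 2. grep -o prints each run of digits on its own line
        # 3. paste -s joins those lines with commas into one line
        printf '%s\n' "$line" | sed 's/^[^-]*- //' | grep -o '[0-9][0-9]*' | paste -s -d, -
    done
}
```

Usage: `grep 'Requested:' logfile | join_counts`. It starts several processes per line, so it suits a quick check of a few lines; a single awk program is more efficient on a whole log.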

Post #15 by dfco (Original Poster), 10-19-2016, 01:46 PM
If it is easier, commas are fine with me.

APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218,1218,0,1218

I didn't know the name of the application was so important. It is ODI (Oracle Data Integrator).

All times are GMT -5.