10-18-2016, 10:23 AM
|
#1
|
LQ Newbie
Registered: Oct 2016
Posts: 17
Rep: 
|
Log file parsing
Hello Experts,
I am new to the website and am seeking your help with a specific need.
I need to extract statistics from a log file (see below) and write them out in CSV format:
SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2
APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 695 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4201 Applied: 4201 Rejected: 0 Affected: 4171
APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
APP_9784 Updated rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2
APP_9784 Deleted rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 3
SOURCE DATE Tue Oct 18 09:35:54 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 3 Applied: 2 Rejected: 1 Affected: 2
APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 795 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4224 Applied: 4224 Rejected: 0 Affected: 4220
APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1228 Applied: 1228 Rejected: 0 Affected: 1228
APP_9784 Updated rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 5
APP_9784 Deleted rows - Requested: 7 Applied: 7 Rejected: 0 Affected: 5
Format of the output file :
TARGET_TABLE_NAME,INSERTED_APPLIED_ROWS,INSERTED_AFFECTED_ROWS,INSERTED_REJECTED_ROWS,UPDATED_APPLIED_ROWS,UPDATED_AFFECTED_ROWS,UPDATED_REJECTED_ROWS,DELETED_APPLIED_ROWS,DELETED_AFFECTED_ROWS,DELETED_REJECTED_ROWS
And the result should be (3 lines) :
CFG,2,2,1,0,0,0,0,0,0
PORTUG_ALL,695,695,0,4224,695,0,0,0,0
REMOTE_DTL,1228,1228,0,5,5,0,7,5,0
The numbers/statistics should be extracted from the latest SOURCE DATE block (i.e. Tue Oct 18 09:35:54 2016).
I tried many options (cut, sed, awk, ...) but none of them worked.
I would appreciate any help/suggestion on the matter.
Thanks, Abel
|
|
|
10-18-2016, 11:18 AM
|
#2
|
LQ Guru
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,584
|
Quote:
Originally Posted by dfco
Hello Experts,
I am new in the website and seeking your help on a specific need. I need to parse statistics in a csv format. Statistics are coming from a log file (see below) :
The numbers/statistics should be extracted from the latest SOURCE DATE (i.e. Tue Oct 18 09:35:54 2016) I tried many options (cut, sed, awk, ...) but it does not work.
|
Please read the "Question Guidelines" link in my posting signature. Without knowing what you have done/tried, or seeing your actual code, we can't tell you much. You say you "tried many options", but you don't tell us WHICH ONES, or give us details about what your actual goal is, or how often you need to do this. A solution for a one-time fix will be different from something that's meant to be run numerous times a week/day.
Also, where is this data coming FROM? There may already be an option to save it in CSV format the way you want it.
|
|
|
10-18-2016, 11:38 AM
|
#3
|
Senior Member
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278
|
If you know how to use awk and sed and all that, then this might help you get started. This will read in 14 lines at a time as one record, with each line stored in a separate variable, so you can massage the data into what you need:
Code:
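# Read the log 14 lines at a time; each line of a record lands in its own variable.
while read -r line1; do
read -r line2
read -r line3
read -r line4
read -r line5
read -r line6
read -r line7
read -r line8
read -r line9
read -r line10
read -r line11
read -r line12
read -r line13
read -r line14
# ... process "$line1" through "$line14" here ...
done < logfile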
If this is a seriously large log, and you are planning to process many of them, you WILL DEFINITELY want to use a programming language like Python or C++ to do this. Doing it in bash will cost you performance over the long run.
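A variation on the loop above, offered only as a rough sketch: instead of hard-coding the number of lines per record, it starts a new record whenever a line begins with "SOURCE DATE". That is an assumption based on the sample in post #1, where each block is 11 lines long including its header; a real log may differ. The function name count_record_lines is made up for this example.

```shell
# Read a log from stdin and report how many lines follow each
# "SOURCE DATE" header. Assumption: every record starts with such a line.
count_record_lines() {
    header=""
    count=0
    while IFS= read -r line; do
        case $line in
            "SOURCE DATE"*)
                # A new record starts: report the previous one, if any.
                if [ -n "$header" ]; then
                    printf '%s: %d lines\n' "$header" "$count"
                fi
                header=$line
                count=0
                ;;
            *)
                # Any other line belongs to the current record.
                count=$((count + 1))
                ;;
        esac
    done
    # Report the last record.
    if [ -n "$header" ]; then
        printf '%s: %d lines\n' "$header" "$count"
    fi
}

# Usage: count_record_lines < logfile
```

Once you know where each record starts and ends, you can decide what to do with the lines in between.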
Last edited by szboardstretcher; 10-18-2016 at 11:42 AM.
|
|
|
10-18-2016, 06:53 PM
|
#4
|
LQ Newbie
Registered: Oct 2016
Posts: 17
Original Poster
Rep: 
|
@szboardstretcher : Thanks, but either it is not working or I am not understanding it.
I am not an expert with awk, sed, and similar commands.
I tried :
* cat logfile | cut -f1 -d':'
* cut -d: -f2-5 logfile|grep -o '[0-9]*'
* cut -d ":" -f 2- logfile
Looking forward to hearing from you.
|
|
|
10-18-2016, 07:42 PM
|
#5
|
LQ Guru
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524
|
Let's break the problem up into bite-sized pieces. A file is made up of lines, and lines are made up of fields (or words) separated by white space (tab, space). In awk, each field is referred to as $1, $2, and so on. For example, suppose I have somefile that contains:
cat bat
cat mat
cat sat
If I use
Code:
$ grep cat somefile
and the output is:
Code:
cat bat
cat mat
cat sat
then
Code:
$ grep cat somefile | awk '{print $2}'
will yield only the second field of each line (bat, mat and sat), whereas
Code:
$ grep cat somefile | awk '{print $2,$1}'
will yield output:
Code:
bat cat
mat cat
sat cat
But there is no simple awk one-liner that will do everything you want to do. You'll need an awk script (or, better yet, a Perl script) to do it all. awk is easier to learn than Perl, albeit less powerful.
|
|
|
10-18-2016, 07:51 PM
|
#6
|
LQ Newbie
Registered: Oct 2016
Posts: 17
Original Poster
Rep: 
|
Thanks.
PERL ? Do you mean it is not doable easily ?
I can't parse the following line :
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
and generate an output file like :
1218;1218;0;1218
|
|
|
10-18-2016, 09:12 PM
|
#7
|
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,550
|
It is doable with a scripted solution.
For example, in awk, perhaps this will give you some ideas:
Code:
BEGIN {OFS=","}
$2 == "Target:" {TARGET_TABLE_NAME = $3}
$2 == "Inserted" {INSERTED_APPLIED_ROWS = $8;
INSERTED_AFFECTED_ROWS = $12;
INSERTED_REJECTED_ROWS = $10}
...
/^$/ {print TARGET_TABLE_NAME,INSERTED_APPLIED_ROWS,INSERTED_AFFECTED_ROWS,INSERTED_REJECTED_ROWS;
TARGET_TABLE_NAME = "";
INSERTED_APPLIED_ROWS = 0;
INSERTED_AFFECTED_ROWS = 0;
INSERTED_REJECTED_ROWS = 0;
...}
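To fill in the idea above, here is one possible complete version, a sketch only. It assumes that the field positions always match the sample lines in post #1, that the last SOURCE DATE block in the file is the latest one, and that missing operations should be reported as 0. Instead of printing on a blank line, it prints everything at the end of the file. The function name summarise_log is made up for this example.

```shell
# Summarise the most recent "SOURCE DATE" block of a log as CSV.
# Assumptions: field positions match the sample lines in post #1, and
# the last SOURCE DATE block in the file is the latest one.
summarise_log() {
    awk '
    BEGIN { OFS = "," }

    # A new block starts: discard everything collected so far.
    $1 == "SOURCE" && $2 == "DATE" {
        for (k in stats) delete stats[k]
        ntargets = 0
        next
    }

    # Remember the current target and the order in which targets appear.
    $2 == "Target:" {
        target = $3
        names[++ntargets] = target
        next
    }

    # Statistic lines: applied is field 8, rejected field 10, affected field 12.
    $2 == "Inserted" || $2 == "Updated" || $2 == "Deleted" {
        stats[target, $2, "applied"]  = $8
        stats[target, $2, "affected"] = $12
        stats[target, $2, "rejected"] = $10
    }

    # Print one CSV line per target; adding 0 turns a missing value into 0.
    END {
        split("Inserted Updated Deleted", ops, " ")
        for (i = 1; i <= ntargets; i++) {
            line = names[i]
            for (j = 1; j <= 3; j++) {
                line = line OFS (stats[names[i], ops[j], "applied"]  + 0)
                line = line OFS (stats[names[i], ops[j], "affected"] + 0)
                line = line OFS (stats[names[i], ops[j], "rejected"] + 0)
            }
            print line
        }
    }
    ' "$@"
}

# Usage: summarise_log logfile > stats.csv
```

Note that on the sample log this gives 4220 for UPDATED_AFFECTED_ROWS of PORTUG_ALL, because that is the "Affected:" value in the log, whereas the expected output in post #1 shows 695 in that column. The other values match the expected output.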
|
|
|
10-19-2016, 07:54 AM
|
#8
|
LQ Guru
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 27,584
|
Quote:
Originally Posted by dfco
Thanks.
PERL ? Do you mean it is not doable easily ?
|
Perl IS easy...and thank you for not replying to my questions. I had asked you where this data was coming FROM, since there is a possibility that it can be output as CSV natively, and I also asked you to show us what you've done/tried on your own. Why do you ignore these things?
Quote:
I can't parse the following line :
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
and generate an output file like : 1218;1218;0;1218
|
Why?? Again, what DID YOU DO to attempt to parse this line??? Just saying "I can't parse the following line", tells us nothing about your efforts. Incidentally, I'm able to parse this almost down to what you need with a few sed statements, but your input and output are NOT MATCHING. For example, you posted this:
Code:
SOURCE DATE Tue Oct 18 08:55:27 2016
===================================================
APP_9874 Target: CFG (Instance Name: [CFG])
APP_9889 Inserted rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2
APP_9874 Target: PORTUG_ALL (Instance Name: [PORTUG_ALL])
APP_9889 Inserted rows - Requested: 695 Applied: 695 Rejected: 0 Affected: 695
APP_9784 Updated rows - Requested: 4201 Applied: 4201 Rejected: 0 Affected: 4171
APP_9874 Target: REMOTE_DTL (Instance Name: [REMOTE_DTL])
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
APP_9784 Updated rows - Requested: 2 Applied: 2 Rejected: 0 Affected: 2
APP_9784 Deleted rows - Requested: 5 Applied: 5 Rejected: 0 Affected: 3
...and then said you want THIS for the output:
Code:
And the result should be (3 lines) :
CFG,2,2,1,0,0,0,0,0,0
PORTUG_ALL,695,695,0,4224,695,0,0,0,0
REMOTE_DTL,1228,1228,0,5,5,0,7,5,0
??? Look at the CFG input data...it has 4 numbers...2, 2, 0, and 2. For your 'required' output, you have 2,2,1,0,0,0,0,0,0. Where is that coming from, or are you saying you need the data padded??? And if this is for a database input, again we will ask where the data is coming from.
|
|
|
10-19-2016, 08:07 AM
|
#9
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,717
|
dfco, please tell us where the data is coming from. You might have more options than you are aware of. The fall-back option, though, is some simple manipulation with Perl.
|
|
|
10-19-2016, 12:47 PM
|
#10
|
LQ Newbie
Registered: Oct 2016
Posts: 17
Original Poster
Rep: 
|
Maybe I should have mentioned that I am a beginner and that complex Linux scripting is new to me.
The log is a csv file and is generated by an application which inserts/updates/deletes records in a table.
I tried to simplify my need with the last example I gave :
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218
Thanks.
|
|
|
10-19-2016, 01:09 PM
|
#11
|
LQ Guru
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,717
|
Quote:
Originally Posted by dfco
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218
|
There are several go-to tools for system administration. "grep", "sed", "awk", and "perl" top the list. Learning what they can do is a first step. "awk" seems most relevant here based on your sample, if the number of words is always the same.
Code:
awk 'BEGIN { OFS=";" } { print $6, $8, $10, $12 }' yourfile.txt
So start by looking up the pieces in the manual page for "awk", FS and OFS in particular.
It's not a complex language but tutorials will help, and there are many to choose from on the web.
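If it is not obvious which field number holds which value, a small helper can print each field next to its number. The sketch below does that, and then uses the positions to pull the four numbers out of the statistic lines only. Both function names (show_fields, stats_line) are made up for this example, and the positions 6, 8, 10 and 12 are an assumption based on the sample line.

```shell
# Print every whitespace-separated field of each input line
# together with its awk field number.
show_fields() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            out = out (i > 1 ? " " : "") "$" i "=" $i
        print out
    }' "$@"
}

# Print Requested;Applied;Rejected;Affected, but only for the
# "... rows - ..." lines, where the third field is the word "rows".
# Assumption: the numbers sit in fields 6, 8, 10 and 12.
stats_line() {
    awk 'BEGIN { OFS = ";" } $3 == "rows" { print $6, $8, $10, $12 }' "$@"
}

# Usage:
#   show_fields yourfile.txt
#   stats_line yourfile.txt
```

The `$3 == "rows"` test keeps the Target and SOURCE DATE lines out of the output, since their fields mean something else.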
|
|
|
10-19-2016, 01:17 PM
|
#12
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,035
|
Quote:
Originally Posted by dfco
Maybe I should have mentioned that I am a beginner and that complex Linux scripting is new to me.
The log is a csv file and is generated by an application which inserts/updates/deletes records in a table.
I tried to simplify my need with the last example I gave :
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218;1218;0;1218
Thanks.
|
No one is asking you to write anything complex, but we are asking to see your attempt. At present it appears you are making none and are waiting for someone to tell you how to do it ... which is not the LQ way.
You mention that the log file is in CSV format, but on examining the data you have shown I am unable to find even a single comma (CSV stands for comma-separated values).
Is the name of the application not allowed to be known? This may be the case, but you should at least say so instead of avoiding the question.
Yes, your simplified example is much easier, although any solution that works on this simple example will more than likely not work on the real one due to the extra data, and again, you have shown no effort as to what you might have tried.
The general response to no effort is to point you at the manuals for sed/awk/perl or the like and ask you to come back when you have had a go.
So, as asked by many others, please answer the questions asked so that you can be provided with the most appropriate response.
|
|
|
10-19-2016, 01:19 PM
|
#13
|
LQ Newbie
Registered: Oct 2016
Posts: 17
Original Poster
Rep: 
|
Ok. I will try with your recommendation.
Thanks.
|
|
|
10-19-2016, 01:42 PM
|
#14
|
LQ Newbie
Registered: Oct 2016
Posts: 17
Original Poster
Rep: 
|
What I tried :
grep "Initializing session" /infa_shared/SessLogs/s_m_CDC_GRP5.log | awk -F"[" '{print $2}' | awk -F"]" '{print $1}' > testfile.log
grep "SOURCE BASED COMMIT POINT" s_m_CDC_GRP2.log | awk -F"[" '{print $2}' > testfile.log
cat testfile.log
cut -d: -f 1,3 testfile.log
cut -d: -f 2,4 testfile.log
cut -d: -f 1,2,3,4,5,6 testfile.log
cut -d: -f 1,2,3,4,5,6 testfile.log
cut -d: -f 1 testfile.log
cut -d: -f 2 testfile.log
ls testfile.log | cut -f1 -d':'
cat testfile.log | cut -f1 -d':'
cat testfile.log | cut -f2 -d':'
cat testfile.log | cut -f3 -d':'
cat testfile.log | cut -f4 -d':'
cat testfile.log | cut -f5 -d':'
cat testfile.log | cut -f6 -d':'
cat testfile.log | cut -f6 -d'Affected'
cat testfile.log | cut -f6 -d': '
cut -d: -f6 testfile.log
cut -d: -f1 testfile.log
cut -d: -f2 testfile.log
cut -d: -c2 testfile.log
cut -d: -f2 -f3 testfile.log
cut -d: -f2-3 testfile.log
cut -d: -f2-4 testfile.log
cut -d: -f2-5 testfile.log
cut -d: -f2-5 testfile.log|grep [0-9]
cut -d: -f2-5 testfile.log|grep -o '[0-9]*'
cut -d: -f2-5 testfile.log|grep '[0-9]*'
cut -d: -f2-5 testfile.log|grep -o '[0-9*'
cut -d: -f2-5 testfile.log|grep -o '[0-9]'
cut -d : -f2-5 testfile.log
cut -d ": " -f2-5 testfile.log
cut -d ":"-f2-5 testfile.log
cut -d ":"-f 5- testfile.log
cut -d ":"-f 2- testfile.log
cut -d ":"-f 2-| testfile.log
grep -Eo '[0-9]{1,4}' testfile.log
grep -Eo '[0-9]{1,4}' testfile.log|cut -d: -f2-5
cut -d ":"-f 5- testfile.log|grep -Eo '[0-9]{1,4}'
cut -d ":"-f 5- testfile.log|grep -Eo '[0-9]'
sed 's/[^0-9]//g' testfile.log
grep -o '[0-9]*' testfile.log
cut -d ":"-f 2- testfile.log|grep "^[0-9]"
cut -d ":"-f 2- testfile.log|grep "[0-9]"
cut -d ":"-f 2- testfile.log|grep "[0-9]*"
awk -F"~" testfile.log
awk -F"~" '{print $1}' testfile.log
awk -F":" '{print $1}' testfile.log
awk -F":" '{print $2}' testfile.log
awk -F"Requested" '{print $2}' testfile.log
awk -F"Requested:" '{print $2}' testfile.log
|
|
|
10-19-2016, 01:46 PM
|
#15
|
LQ Newbie
Registered: Oct 2016
Posts: 17
Original Poster
Rep: 
|
If it is easier, I am fine with a comma as the separator.
APP_9889 Inserted rows - Requested: 1218 Applied: 1218 Rejected: 0 Affected: 1218
would generate the following line
1218,1218,0,1218
I didn't know the name of the application was so important. It is ODI (Oracle Data Integrator).
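For the comma-separated version, one possible approach (a sketch, not necessarily how ODI users normally do it) is to look for the labels themselves rather than counting field positions, so the line still parses if extra words appear before the numbers. It assumes each label is followed by its number as the next whitespace-separated word, as in the sample line. The function name stats_by_label is made up for this example.

```shell
# Print Requested,Applied,Rejected,Affected by searching for the labels
# instead of relying on fixed field positions.
# Assumption: each label is immediately followed by its number.
stats_by_label() {
    awk '
    BEGIN { OFS = "," }
    /Requested:/ {
        req = app = rej = aff = 0
        for (i = 1; i < NF; i++) {
            if      ($i == "Requested:") req = $(i + 1)
            else if ($i == "Applied:")   app = $(i + 1)
            else if ($i == "Rejected:")  rej = $(i + 1)
            else if ($i == "Affected:")  aff = $(i + 1)
        }
        print req, app, rej, aff
    }' "$@"
}

# Usage: stats_by_label logfile
```

Lines without "Requested:" (the Target and SOURCE DATE lines) produce no output.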
|
|
|
All times are GMT -5. The time now is 05:03 AM.