Old 01-08-2009, 10:30 AM   #1
benthad
LQ Newbie
 
Registered: Apr 2006
Posts: 5

Rep: Reputation: 0
Remove and replace characters in a logfile


Hello all,

I need to remove the ,073 and the num='' wrapper from a logfile that contains many lines like the one below, and place commas between the remaining fields. The values are different on each line, though.

2009-01-05 09:08:23,073 num='12345678999990342343242342342342'

Essentially, I want to modify the log so that every line has exactly the following format:

2009-01-05, 09:08:23, 12345678999990342343242342342342

Thanks in advance.

Last edited by benthad; 01-08-2009 at 10:31 AM.
 
Old 01-08-2009, 11:37 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Which command did you use to remove the unwanted parts? Maybe just a slight modification can do the rest of the work. And what are the differences between the lines? Is the format exactly the same?
 
Old 01-08-2009, 01:20 PM   #3
jan61
Member
 
Registered: Jun 2008
Posts: 235

Rep: Reputation: 47
Moin,

Assuming that all lines have the format "yyyy-mm-dd hh:mm:ss,fff num='nnn...'", you could use sed:
Code:
sed -r "s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" logfile
There are many ways to define the matching pattern. I chose one that describes the line as exactly as needed, provided all lines look similar. If the file contains lines that look different but also match the pattern above, you might need to refine it.
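As a quick check of the pattern, here is a minimal sketch that feeds the sample line from the first post through the same substitution. The function name reformat_line is made up for illustration, and GNU sed is assumed (for -r and \s).

```shell
# Hypothetical helper: apply the substitution from this post to one line of text.
# Assumes GNU sed: -r enables extended regexes and \s matches whitespace.
reformat_line() {
    printf '%s\n' "$1" |
        sed -r "s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/"
}

# Group 1 is the date, group 2 the time up to the comma, group 3 the digits.
reformat_line "2009-01-05 09:08:23,073 num='12345678999990342343242342342342'"
```

For that input it should print 2009-01-05, 09:08:23, 12345678999990342343242342342342. A line that does not match the pattern is passed through unchanged, which is worth keeping in mind when checking the output.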

Jan
 
Old 01-09-2009, 02:41 AM   #4
benthad
LQ Newbie
 
Registered: Apr 2006
Posts: 5

Original Poster
Rep: Reputation: 0
colucix, Initial command was:

cat somelog.2009-01-05 | grep -v name-ID | grep somestring | grep num | awk '{print $1"\t"$2"\t"$3}' > somelog.2009-01-05-stg1

which gives the format in a new logfile (somelog.2009-01-05-stg1) exactly like this:

2009-01-05 23:50:31,618 num='11112345678907895678345678945678'
2009-01-05 23:52:03,917 num='75934857349857857435873498743987'
2009-01-05 23:52:46,541 num='32463245237452376542375473654733'
2009-01-05 23:56:43,209 num='98374839749823749837487387434334'
2009-01-05 23:57:03,672 num='98749832749832743874837487837433'

jan61, thanks for your line of sed code. I might be doing something wrong, but when I run it against the newly generated file like so:

sed -r "s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" somelog.2009-01-05-stg1

I'm still getting the same results like this when it's run:

2009-01-05 23:50:31,618 num='11112345678907895678345678945678'
2009-01-05 23:52:03,917 num='75934857349857857435873498743987'
2009-01-05 23:52:46,541 num='32463245237452376542375473654733'
2009-01-05 23:56:43,209 num='98374839749823749837487387434334'
2009-01-05 23:57:03,672 num='98749832749832743874837487837433'

Thanks again for your help, greatly appreciated

Last edited by benthad; 01-09-2009 at 02:43 AM.
 
Old 01-09-2009, 03:03 AM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Since the lines have a fixed format/length you can modify the awk command as follows, using the substr function to get part of the second and third field:
Code:
awk '{printf "%s,\t%s,\t%s\n",$1,substr($2,1,8),substr($3,6,32)}'
This assumes you want a TAB after the commas, as in your example. If you just need a blank space, replace each \t with a space in the command above.
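To make the substr offsets easier to follow, here is a small sketch of the same command with a blank after each comma, run on one tab-separated sample line. The wrapper name format_fixed is hypothetical, and the sketch only holds while the time is 8 characters long and the number is 32 digits.

```shell
# Hypothetical wrapper around the awk one-liner, with a space after each comma.
format_fixed() {
    awk '{
        # $2 looks like 23:50:31,618 so its first 8 characters are hh:mm:ss
        # $3 starts with num= plus a quote (5 characters), so the digits begin at position 6
        printf "%s, %s, %s\n", $1, substr($2, 1, 8), substr($3, 6, 32)
    }' "$@"
}

# printf turns \t into real TAB characters, as in the stg1 file.
printf "2009-01-05\t23:50:31,618\tnum='11112345678907895678345678945678'\n" | format_fixed
```

This should print 2009-01-05, 23:50:31, 11112345678907895678345678945678. Since awk splits fields on any run of blanks or tabs by default, the same call works on space-separated lines as well.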

In addition, the sed command suggested by jan61 is excellent. To actually modify the file you have to add the -i option (edit the file in place). A general piece of advice: first test the sed command without the -i option and check the result on standard output, then run it again with -i. Or make a backup copy of the original file.
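A minimal sketch of that workflow, assuming GNU sed and using a scratch file from mktemp rather than a real log (the variable names log and subst are made up for the example):

```shell
# Work on a scratch copy so no real log is touched.
log=$(mktemp)
printf "2009-01-05 23:50:31,618 num='11112345678907895678345678945678'\n" > "$log"

# The substitution from jan61's post, kept in a variable so both runs use the same text.
subst="s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/"

# 1. Dry run: the result goes to standard output, the file itself is unchanged.
sed -r "$subst" "$log"

# 2. In-place edit: the .bak suffix asks GNU sed to keep the original as a backup.
sed -r -i.bak "$subst" "$log"

cat "$log"        # now holds the reformatted line
cat "$log.bak"    # still holds the original line
# When finished, remove both files with: rm -f "$log" "$log.bak"
```

Giving -i a suffix covers the backup advice in one step; without a suffix, GNU sed overwrites the file and keeps no copy.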
 
Old 01-09-2009, 03:53 AM   #6
benthad
LQ Newbie
 
Registered: Apr 2006
Posts: 5

Original Poster
Rep: Reputation: 0
Wow, thanks to you both for the quick responses.

colucix, I'm getting what I want now. Thanks:

running
cat somelog.2009-01-05 | grep -v name-ID | grep somestring | grep sid | awk '{printf "%s,\t%s,\t%s\n",$1,substr($2,1,8),substr($3,6,32)}' > somelog.2009-01-05-stg1

Gives me:

2009-01-05, 23:50:31, 11112345678907895678345678945678
2009-01-05, 23:52:03, 75934857349857857435873498743987
2009-01-05, 23:52:46, 32463245237452376542375473654733
2009-01-05, 23:56:43, 98374839749823749837487387434334
2009-01-05, 23:57:03, 98749832749832743874837487837433

So now I should be able to export this to a mysql database.

You guys rock. I really need to brush up on my sed & awk skills. Think I'll be buying the O'Reilly book on this.

Thanks again.
 
Old 01-15-2009, 03:27 PM   #7
jan61
Member
 
Registered: Jun 2008
Posts: 235

Rep: Reputation: 47
Moin,

Quote:
Originally Posted by benthad
...cat somelog.2009-01-05 | grep -v name-ID | grep somestring | grep num | awk '{print $1"\t"$2"\t"$3}' > somelog.2009-01-05-stg1
...
jan61, thanks for your line of sed code, however, might be something I'm doing wrong...
You didn't do anything wrong, but your file format differs from the one I copied from your previous post. You use a TAB as the field delimiter, whereas my sed matches only blanks in one pattern. That's why the pattern does not match and the lines are left unchanged. The fix is simple:
Code:
sed -r "s/([^[:space:]]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" somelog.2009-01-05-stg1
The \s matches blanks and tabs, and [^[:space:]] excludes both (inside a bracket expression \s is not treated as a whitespace class, so the POSIX class is used there). So it should work with your file.
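For anyone who wants to try this on both kinds of delimiter, here is a small sketch that spells every whitespace match as a POSIX character class. It assumes a sed that accepts -E (GNU and BSD sed both do); the function name reformat_any_ws is hypothetical.

```shell
# Hypothetical helper: same substitution, written with POSIX classes only,
# so it does not depend on the GNU-specific \s escape.
reformat_any_ws() {
    sed -E "s/([^[:space:]]+)[[:space:]]+([^,]+),[0-9]+[[:space:]]+num='([0-9]+)'/\1, \2, \3/" "$@"
}

# Tab-separated input, as produced by the awk '{print $1"\t"$2"\t"$3}' stage.
printf "2009-01-05\t23:50:31,618\tnum='11112345678907895678345678945678'\n" | reformat_any_ws

# Space-separated input, as in the first post.
printf "2009-01-05 23:50:31,618 num='11112345678907895678345678945678'\n" | reformat_any_ws
```

Both calls should print 2009-01-05, 23:50:31, 11112345678907895678345678945678.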

Jan
 
  

