Remove and replace characters in a logfile

benthad · 01-08-2009, 10:30 AM

Hello all,

I need to remove the ,073 and the num='' wrapper from a logfile that contains many lines like the one below, and put a comma between the remaining fields. The values differ from line to line, though.

2009-01-05 09:08:23,073 num='12345678999990342343242342342342'

Essentially, I want each line of the log to end up in exactly this format:

2009-01-05, 09:08:23, 12345678999990342343242342342342

Thanks in advance.

colucix · 01-08-2009, 11:37 AM

Which command did you use to remove the unwanted parts? Maybe just a slight modification can do the rest of the work. And what are the differences between the lines? Is the format exactly the same?

jan61 · 01-08-2009, 01:20 PM

Hi,

Assuming that all lines have the format "yyyy-mm-dd hh:mm:ss,fff num='nnn...'", you could use sed:

Code:

sed -r "s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" logfile

There are many ways to define the matching patterns. I chose one that describes the line as precisely as needed, assuming all lines look similar. If the file contains lines that look different but still match the pattern above, you might need to refine it.
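The pattern can be tried on the sample line from the first post before touching the real log. The sketch below assumes GNU sed (for -r and \s) and a hypothetical file name, sample.log. It also prints any lines the substitution leaves unchanged, which is a quick way to spot the differently formatted lines mentioned above.

```shell
#!/bin/sh
# Build a hypothetical one-line test file from the sample in the first post
printf '%s\n' "2009-01-05 09:08:23,073 num='12345678999990342343242342342342'" > sample.log

# Jan's substitution, unchanged (GNU sed: -r = extended regex, \s = whitespace)
converted=$(sed -r "s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" sample.log)
echo "$converted"

# Print only the lines the substitution would NOT change: the t command jumps
# to the end of the script when s/// succeeded, so p only runs on misses
unmatched=$(sed -rn -e "s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/&/" -e t -e p sample.log)
echo "unmatched lines: ${unmatched:-none}"
```

On a real log, an empty "unmatched" list suggests the pattern covers every line; anything printed there needs a closer look.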

Jan

benthad · 01-09-2009, 02:41 AM

colucix, my initial command was:

cat somelog.2009-01-05 | grep -v name-ID | grep somestring | grep num | awk '{print $1"\t"$2"\t"$3}' > somelog.2009-01-05-stg1

which writes a new logfile (somelog.2009-01-05-stg1) in exactly this format:

2009-01-05 23:50:31,618 num='11112345678907895678345678945678'
2009-01-05 23:52:03,917 num='75934857349857857435873498743987'
2009-01-05 23:52:46,541 num='32463245237452376542375473654733'
2009-01-05 23:56:43,209 num='98374839749823749837487387434334'
2009-01-05 23:57:03,672 num='98749832749832743874837487837433'

jan61, thanks for your sed one-liner. Maybe I'm doing something wrong, but when I run it against the new output file above like so:

sed -r "s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" somelog.2009-01-05-stg1

I still get the lines back unchanged:

2009-01-05 23:50:31,618 num='11112345678907895678345678945678'
2009-01-05 23:52:03,917 num='75934857349857857435873498743987'
2009-01-05 23:52:46,541 num='32463245237452376542375473654733'
2009-01-05 23:56:43,209 num='98374839749823749837487387434334'
2009-01-05 23:57:03,672 num='98749832749832743874837487837433'

Thanks again for your help, greatly appreciated.

colucix · 01-09-2009, 03:03 AM

Since the lines have a fixed format/length, you can modify the awk command as follows, using the substr function to extract part of the second and third fields:

Code:

awk '{printf "%s,\t%s,\t%s\n",$1,substr($2,1,8),substr($3,6,32)}'

This assumes you want a TAB after the commas, as in your example. If you just need a blank space, replace \t with a space in the command above.
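The substr offsets assume every num value is exactly 32 digits; a shorter value would keep its closing quote. If that length can vary, one alternative is to split on the comma and on the quote character instead of counting positions. This is a sketch assuming any POSIX awk; stg1.log and its second, shorter line are hypothetical.

```shell
#!/bin/sh
# Hypothetical tab-separated input like somelog.2009-01-05-stg1; the second
# num value is deliberately shorter than 32 digits
printf "2009-01-05\t23:50:31,618\tnum='11112345678907895678345678945678'\n" > stg1.log
printf "2009-01-05\t23:52:03,917\tnum='12345'\n" >> stg1.log

# Split on the comma and on the quote character instead of fixed offsets,
# so the result does not depend on the field lengths
result=$(awk -v q="'" '{
    split($2, t, ",")    # time: keep the part before the milliseconds
    split($3, n, q)      # num: n[2] is the text between the quotes
    printf "%s,\t%s,\t%s\n", $1, t[1], n[2]
}' stg1.log)
printf '%s\n' "$result"
```

The quote is passed in with -v because a literal ' cannot appear inside the single-quoted awk program.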

In addition, the sed command suggested by jan61 is excellent. To actually modify the file, add the -i option (edit the file in place). A general piece of advice: first run the sed command without -i and check the result on standard output, then run it with -i. Or make a backup copy of the original file first.
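Both halves of that advice can be sketched as follows, assuming GNU sed, where -i accepts an optional suffix attached directly to it (here .bak) and keeps the original under that name. The -r and \s used here are GNU sed features too; test.log is a hypothetical file name.

```shell
#!/bin/sh
# Hypothetical copy of the logfile to experiment on
printf '%s\n' "2009-01-05 23:50:31,618 num='11112345678907895678345678945678'" > test.log

pattern="s/([^ ]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/"

# 1. Dry run: the result goes to standard output, test.log is untouched
sed -r "$pattern" test.log

# 2. Edit in place (GNU sed); the original is kept as test.log.bak
sed -r -i.bak "$pattern" test.log
```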

benthad · 01-09-2009, 03:53 AM

Wow, thanks for the quick responses both.

colucix, I'm getting what I want now. Thanks:

Running
cat somelog.2009-01-05 | grep -v name-ID | grep somestring | grep sid | awk '{printf "%s,\t%s,\t%s\n",$1,substr($2,1,8),substr($3,6,32)}' > somelog.2009-01-05-stg1

Gives me:

2009-01-05, 23:50:31, 11112345678907895678345678945678
2009-01-05, 23:52:03, 75934857349857857435873498743987
2009-01-05, 23:52:46, 32463245237452376542375473654733
2009-01-05, 23:56:43, 98374839749823749837487387434334
2009-01-05, 23:57:03, 98749832749832743874837487837433
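The cat | grep -v | grep | grep | awk chain above can also be collapsed into a single awk process, since awk can test each line against several patterns before printing. The sketch below mirrors the filters in that command (name-ID, somestring, sid); the log lines are invented only to exercise those filters, and somelog.sample is a hypothetical file name.

```shell
#!/bin/sh
# Hypothetical raw log: only the first line passes all three filters
cat > somelog.sample <<'EOF'
2009-01-05 23:50:31,618 num='11112345678907895678345678945678' somestring sid
2009-01-05 23:51:00,001 num='22222222222222222222222222222222' somestring sid name-ID
2009-01-05 23:52:03,917 num='75934857349857857435873498743987' otherstring sid
EOF

# Skip lines containing name-ID, keep lines with both somestring and sid,
# then print the reformatted first three fields (same substr offsets as above)
awk '!/name-ID/ && /somestring/ && /sid/ {
    printf "%s,\t%s,\t%s\n", $1, substr($2, 1, 8), substr($3, 6, 32)
}' somelog.sample > somelog.sample-stg1
```

The plain words used as grep patterns here mean the same thing as awk regular expressions; patterns with regex metacharacters may need adjusting.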

So now I should be able to load this into a MySQL database.

You guys rock. I really need to brush up on my sed & awk skills. I think I'll buy the O'Reilly book on this.

Thanks again.

jan61 · 01-15-2009, 03:27 PM

Hi,

Quote:

Originally Posted by benthad

...cat somelog.2009-01-05 | grep -v name-ID | grep somestring | grep num | awk '{print $1"\t"$2"\t"$3}' > somelog.2009-01-05-stg1
...
jan61, thanks for your line of sed code, however, might be something I'm doing wrong...

You didn't do anything wrong; your file format is just different from the line I copied from your first post. You use a TAB as the field delimiter, but one part of my pattern only expects blanks. That's why the pattern does not match and the lines are left unchanged. The fix is simple:

Code:

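sed -r "s/([^[:space:]]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" somelog.2009-01-05-stg1

The \s matches blanks and tabs, and [^[:space:]] matches any character that is neither, so it should work with your file. (Inside a bracket expression \s loses its special meaning, which is why the POSIX class [:space:] is used there.)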
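To check the tab case directly, the sketch below feeds one tab-separated line, like those written by the awk '{print $1"\t"$2"\t"$3}' step, through a substitution that uses the POSIX class [^[:space:]] for "any non-blank, non-tab character". It assumes GNU sed (for -r and \s); stg1-tabs.log is a hypothetical file name.

```shell
#!/bin/sh
# Hypothetical tab-separated line in the format of somelog.2009-01-05-stg1
printf "2009-01-05\t23:52:46,541\tnum='32463245237452376542375473654733'\n" > stg1-tabs.log

# [^[:space:]]+ = a run of non-whitespace; \s+ = one or more blanks or tabs
out=$(sed -r "s/([^[:space:]]+)\s+([^,]+),[0-9]+\s+num='([0-9]+)'/\1, \2, \3/" stg1-tabs.log)
printf '%s\n' "$out"
```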

Jan