Help with command/script to analyze syslog

oliveoyl · 06-26-2018, 08:00 PM

I have a bunch of files organized by host/date and I need help coming up with some useful stats for them. I'm interested in getting a count of module loads per user across all the files. I've been looking into awk/sed to do this, but if you have any other ideas please share.

[root@logger ]# grep ModuleUsageTracking /var/log/syslog/login*/2018/06/26/messages | awk '{print $6}' | tail
user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
user=bpt,module=python/3.5.0,path=/apps/modulefiles/Compiler/gcc/5.2/python/3.5.0,host=login02,job=none
user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
user=bpt,module=python/3.5.0,path=/apps/modulefiles/Compiler/gcc/5.2/python/3.5.0,host=login02,job=none
user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
user=bpt,module=python/3.5.0,path=/apps/modulefiles/Compiler/gcc/5.2/python/3.5.0,host=login02,job=none
user=zhl,module=gcc/4.8.2,path=/apps/modulefiles/Core/gcc/4.8.2,host=login03,job=none
user=zhl,module=cmake/3.5.2,path=/apps/modulefiles/Compiler/gcc/4.8.2/cmake/3.5.2,host=login03,job=none
user=rew,module=gcc/4.8.2,path=/apps/modulefiles/Core/gcc/4.8.2,host=login03,job=none
user=rew,module=cmake/3.5.2,path=/apps/modulefiles/Compiler/gcc/4.8.2/cmake/3.5.2,host=login03,job=none

oliveoyl · 06-26-2018, 08:18 PM

Looking to get something like this:

user count module

bpt 3 gcc-5.2.0
bpt 3 python-3.5.0
zhl 1 gcc-4.8.2
zhl 1 cmake-3.5.2
.
.
.

syg00 · 06-26-2018, 09:06 PM

Perl or awk would be my preference - but anything you favour would work; python, go, whatever. Hell, you could even use C ...

Given that you use grep piped to awk, I'm guessing your awk is not strong; you can do the selection using regex in awk itself. It also has substringing, but I'd use its ability to define multiple field separators to do the leg work. Then you can easily work on fields. Try this and see if it gets you any further.

Code:

awk -F"[/=,]" '{print $2"\t"$4"-"$5}'

FWIW, I used a one-liner with arrays of arrays (a gawk extension) to produce the following:

Code:

User	Count	Module
zhl	1	 gcc-4.8.2
zhl	1	 cmake-3.5.2
bpt	3	 python-3.5.0
bpt	1	 python-2.4.0
bpt	3	 gcc-5.2.0
rew	1	 gcc-4.8.2
rew	1	 cmake-3.5.2
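
The one-liner itself is not shown in the thread, so the following is only a guess at what an arrays-of-arrays version could look like, not syg00's actual command. It assumes gawk 4.0 or later; the sample records and the function name count_by_user are made up for illustration.

```shell
# Hypothetical sample: three records in the same format as the $6 output above.
sample='user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
user=zhl,module=cmake/3.5.2,path=/apps/modulefiles/Compiler/gcc/4.8.2/cmake/3.5.2,host=login03,job=none'

# count_by_user is an illustrative name. It needs gawk (not plain awk),
# because count[u][m] is a true array of arrays.
count_by_user() {
    gawk -F'[/=,]' '
        { count[$2][$4 "-" $5]++ }      # outer key: user, inner key: module-version
        END {
            print "User\tCount\tModule"
            for (u in count)
                for (m in count[u])
                    print u "\t" count[u][m] "\t" m
        }'
}

# Run the sketch only where gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' "$sample" | count_by_user
fi
```

The order of the data rows is whatever gawk's "for (x in array)" traversal happens to give, so pipe through sort if a stable order matters.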

oliveoyl · 06-27-2018, 08:47 AM

Thanks syg00. I ended up with a long awk | awk -F, | sed | sort | uniq command. Can you educate me on your awk field separator -F"[/=,]" ? It works but also returns a bunch of blank output with just -

MadeInGermany · 06-27-2018, 02:34 PM

The following is based on the previous posts (untested):

Code:

awk '/ModuleUsageTracking/ {print $6}' /var/log/syslog/login*/2018/06/26/messages |
awk -F"[/=,]" '
 {out[$2"\t"$4"-"$5]++}
 END {for(i in out) print out[i]"\t"i}
'

The string-addressed array ("out") keeps the strings unique, and its value is used for counting.
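
A variant of the same counting idea, as a sketch only: one awk process instead of two, with the columns printed in the user/count/module order asked for earlier. Only "the key=value payload is field 6" comes from the thread; the rest of the sample syslog layout and the function name module_counts are assumptions.

```shell
# Hypothetical syslog lines; the timestamp/host/tag layout is assumed.
log='Jun 26 20:00:01 login02 ModuleUsageTracking: user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
Jun 26 20:00:02 login02 sshd[1234]: Accepted publickey for bpt
Jun 26 20:00:03 login02 ModuleUsageTracking: user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
Jun 26 20:00:04 login03 ModuleUsageTracking: user=zhl,module=cmake/3.5.2,path=/apps/modulefiles/Compiler/gcc/4.8.2/cmake/3.5.2,host=login03,job=none'

# module_counts is an illustrative name; it reads log lines on stdin.
module_counts() {
    awk '
        /ModuleUsageTracking/ {
            split($6, f, "[/=,]")             # f[2]=user, f[4]=module, f[5]=version
            out[f[2] "\t" f[4] "-" f[5]]++    # one counter per user/module pair
        }
        END {
            for (key in out) {
                split(key, k, "\t")           # k[1]=user, k[2]=module-version
                print k[1] "\t" out[key] "\t" k[2]
            }
        }' | sort
}

printf '%s\n' "$log" | module_counts
```

Against real files the call would be module_counts < /var/log/syslog/login02/2018/06/26/messages, or cat with the wildcard path from the first post piped into it.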

oliveoyl · 06-27-2018, 02:55 PM

That works. Thank you.

Some explanation on the field separator -F"[/=,]" would be great. Teach me to fish.

MadeInGermany · 06-27-2018, 04:16 PM

The [ ] denotes a character set.
[/=,] matches a single character that is either / or = or ,

Character sets are part of regular expression syntax; in fact, awk treats any field separator longer than one character as a regular expression.
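
To see the splitting in action, here is a small sketch that numbers the first five fields of one record from the first post. The function name show_fields is made up for illustration. The last command shows one plausible (unconfirmed) reason for the lines containing only "-": any input line without the expected separators has no $2, $4 or $5, so the print still emits the tab and the dash.

```shell
# One record from the thread, split on the same separator.
line='user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none'

# show_fields is an illustrative name: print the first five fields, numbered.
show_fields() {
    awk -F'[/=,]' '{ for (i = 1; i <= 5; i++) print i ": " $i }'
}

printf '%s\n' "$line" | show_fields
# 1: user
# 2: bpt
# 3: module
# 4: gcc
# 5: 5.2.0

# A line with none of the separators has only $1, so $2, $4 and $5 are empty
# and the output is a tab followed by "-".
printf 'no separators here\n' | awk -F'[/=,]' '{ print $2 "\t" $4 "-" $5 }'
```

Selecting only the ModuleUsageTracking lines before splitting, as in the two-stage command above, should avoid those empty results.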

syg00 · 06-27-2018, 06:44 PM

Nice @MadeInGermany - there always exists a better way of doing things. Much neater than mine.

Some thoughts for the OP:
- character classes are generic, not just awk. "man grep" has a reasonable intro.
- grab the awk doco here. This is the full user guide - there are a bunch of tutorials online, but this is my "go to" for awk.
- associative arrays can take a while to get used to, but are amazingly effective.
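
As a small illustration of the first point, the same bracket syntax works in grep. This is a sketch only: the sample records and the function name module_names are made up, and -o is a GNU/BSD grep option rather than POSIX.

```shell
# Hypothetical sample records in the same format as the first post.
records='user=bpt,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login02,job=none
user=zhl,module=cmake/3.5.2,path=/apps/modulefiles/Compiler/gcc/4.8.2/cmake/3.5.2,host=login03,job=none
user=rew,module=gcc/5.2.0,path=/apps/modulefiles/Core/gcc/5.2.0,host=login03,job=none'

# module_names is an illustrative name. [^,] is a negated character class:
# "any character except a comma", so the match stops at the next field.
module_names() {
    grep -oE 'module=[^,]+' | sort -u
}

printf '%s\n' "$records" | module_names
# module=cmake/3.5.2
# module=gcc/5.2.0
```

Swapping sort -u for sort | uniq -c gives a quick per-module count without any awk at all.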