Effective way to log?

Infernal211283 · 01-10-2006, 07:07 AM

I started to mess around with some very long logs (logging kernel iptables) and figured out that I'll have to shorten them from time to time so they contain only relevant information (like packets arriving between hours xx:xx and yy:yy). I can shorten them in vim by pressing and holding 'dd', but the logs are huge so it takes too much time. I think there should be an option to erase lines in logs from line number x to line number y; if there's a more convenient/automatic way to do this I'd appreciate the tip too.

Thanks a lot.

Dtsazza · 01-10-2006, 07:47 AM

There is a very convenient way to do this, using sed. It's a very versatile tool (a Stream EDitor); look at a tutorial if you get a chance since it's great for automated text processing (together with awk and grep).

Anyway, to do what you've asked, you can use "sed '{startline},{endline}d' your_log_file", where {startline} and {endline} are the first and last line numbers you want to delete (both inclusive). By default this prints the result to standard output, so you can redirect it to a file to save it. For example, to delete lines 1 to 10,000 you would use

Code:

$ sed '1,10000d' your_log_file > new_shorter_log_file
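A minimal, self-contained sketch of that range delete, using a throwaway five-line file in place of a real log (the file and its lines are made up for illustration). It also shows grep -n, which prefixes each matching line with its line number and is one way to find the {startline} and {endline} to hand to sed.

```shell
#!/bin/sh
# Throwaway stand-in for a log file (made-up sample lines).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  'Jan 11 07:59:58 fedint kernel: A' \
  'Jan 11 08:00:01 fedint kernel: B' \
  'Jan 11 08:30:12 fedint kernel: C' \
  'Jan 11 09:00:05 fedint kernel: D' \
  'Jan 11 09:15:40 fedint kernel: E' > "$log"

# grep -n prefixes each matching line with its line number,
# which tells you which range to hand to sed.
grep -n ' 08:' "$log"

# Delete lines 2 to 3 (inclusive) and keep everything else.
shorter=$(sed '2,3d' "$log")
printf '%s\n' "$shorter"
```

Here grep reports the 08:xx entries as lines 2 and 3, and sed '2,3d' removes exactly those, leaving the A, D and E lines.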

Although sed can also filter based on criteria, grep is the tool generally used for this: it displays only the lines of its input that match a regular expression you specify. Depending on the format of your log files it may be tricky to come up with an unambiguous regular expression that only matches log lines within a certain time range, but it shouldn't be *too* difficult. If you can't work it out yourself, post a few sample lines from a log file and I'll try to guide you through it.
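For syslog-style lines like the ones posted later in this thread (Jan 11 08:25:51 ...), a time window that lines up with whole hours can be a single anchored pattern. A sketch, assuming the timestamp starts each line; the sample lines are made up. Note that syslog pads single-digit days with a space (Jan  1), so a pattern for those days needs two spaces.

```shell
#!/bin/sh
# Made-up lines in the syslog timestamp format used later in the thread.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  'Jan 11 07:59:58 fedint kernel: A' \
  'Jan 11 08:25:51 fedint kernel: B' \
  'Jan 11 09:59:59 fedint kernel: C' \
  'Jan 11 10:00:00 fedint kernel: D' > "$log"

# ^ anchors the pattern at the start of the line, so it can only match
# the timestamp. 0[89]: means hour 08 or 09, i.e. 08:00:00 to 09:59:59.
matches=$(grep -E '^Jan 11 0[89]:' "$log")
printf '%s\n' "$matches"
```

Windows that don't fall on hour boundaries get awkward to express as a regular expression, which is where the awk approach below is easier.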

(Another approach is to re-format the log file using awk, so that the date/time column is first (if it isn't already), then sorting based on this new data (if necessary) which will make it much easier to grep over. If this is relevant, I'll demonstrate this too.)
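One reading of that idea as a sketch: copy the time column to the front of each line and sort on it. This assumes all lines come from the same day (month names like Jan and Feb don't sort chronologically as text), and the sample lines are made up.

```shell
#!/bin/sh
# Made-up lines, deliberately out of order.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  'Jan 11 09:15:40 fedint kernel: C' \
  'Jan 11 07:59:58 fedint kernel: A' \
  'Jan 11 08:25:51 fedint kernel: B' > "$log"

# Copy the time column ($3) to the front of each line, then sort.
# Plain text sorting works for HH:MM:SS because every part is zero-padded.
# LC_ALL=C makes sort compare bytes, independent of the locale.
sorted=$(awk '{ print $3, $0 }' "$log" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```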

Infernal211283 · 01-11-2006, 12:32 AM

Cheers! That's perfect, and I'd really like to hear about reformatting the logs. My iptables log currently looks like this:

Jan 11 08:25:51 fedint kernel: ICMP LOGDROP: IN=eth1 OUT= MAC=00:06:29:4f:42:b4:00:0e:0c:5d:b6:a4:08:00 SRC=192.168.0.1 DST=192.168.0.194 LEN=84 TOS=0x00 PREC=0x00 TTL=128 ID=6898 DF PROTO=ICMP TYPE=0 CODE=0 ID=37476 SEQ=34
Jan 11 08:25:51 fedint kernel: ICMP LOGDROP: IN=eth1 OUT= MAC=00:06:29:4f:42:b4:00:0e:0c:5d:b6:a4:08:00 SRC=192.168.0.1 DST=192.168.0.194 LEN=84 TOS=0x00 PREC=0x00 TTL=128 ID=6964 DF PROTO=ICMP TYPE=0 CODE=0 ID=37476 SEQ=35

How can awk help me?

And another question: say I want to make a script that does this automatically every hour (with crontab), something like:

touch /var/log/iptables_<time_variable>
sed '1,10000d' /var/log/kernel.log > /var/log/iptables_<time_variable>

so my new shortened logs are created with a suffix showing when they were created, and I eventually end up with a different log for every hour. What I'm really asking is whether there are time variables available, and how to use them.

Thank you.

Dtsazza · 01-11-2006, 04:42 AM

According to awk's man page (hmm, that's not the same as mine - never mind), awk "...is useful for manipulation of data files, text retrieval and processing, and for prototyping and experimenting with algorithms." I'm sure it has many good uses... but a very common one is for treating input as tabular data and rearranging the columns.

awk works line by line like sed and grep, except that it splits each line into fields: the first 'bit' of text goes into the variable $1, the next 'bit' (after some whitespace) becomes $2, and so on. So for your log example above, if you ran awk on the first line, you'd have $1=Jan, $2=11, $3=08:25:51, $4=fedint, etc. This splitting happens automatically and doesn't require anything on your part; you only use the variables when telling awk what command to run.
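A quick way to see the splitting for yourself is to feed awk the first log line posted above and print a few fields. NF (the number of fields) and $NF (the last field) are standard awk variables; everything else here is just the sample line.

```shell
#!/bin/sh
# The first log line posted in the thread.
line='Jan 11 08:25:51 fedint kernel: ICMP LOGDROP: IN=eth1 OUT= MAC=00:06:29:4f:42:b4:00:0e:0c:5d:b6:a4:08:00 SRC=192.168.0.1 DST=192.168.0.194 LEN=84 TOS=0x00 PREC=0x00 TTL=128 ID=6898 DF PROTO=ICMP TYPE=0 CODE=0 ID=37476 SEQ=34'

# NF is the number of fields awk found; $NF is the last one.
fields=$(printf '%s\n' "$line" | awk '{ print NF; print $1; print $3; print $11; print $NF }')
printf '%s\n' "$fields"
```

This prints 23, Jan, 08:25:51, SRC=192.168.0.1 and SEQ=34, one per line. The empty OUT= still counts as a field (the text "OUT=" itself is non-blank), which is why the source address ends up in $11.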

awk commands are part of the AWK Programming Language, which I'll admit I don't know much about. However, they do have a print command, which in conjunction with the variables is perfect for filtering column data (you can even put conditionals in). For example, if you wanted to put the IP address first followed by the date, you could use

Code:

$ awk '{print($11, $1, $2)}' log_file
SRC=192.168.0.1 Jan 11
SRC=192.168.0.1 Jan 11

(if I've counted the spaces properly). And you could do things like change the order from $1, $2 to $2, $1 if you wanted the day before the month (11 Jan) instead. You can also insert literal text in prints, and use substring functions, such as

Code:

$ awk '{print("IP address:", substr($11,5,15), " date:", $1, $2)}' log_file
IP address: 192.168.0.1  date: Jan 11
IP address: 192.168.0.1  date: Jan 11

to basically return the same information as above, but make it a bit prettier (more usefully, you can shoehorn data into a format expected by some other command). If you have a bit of spare time, you may want to look at a fairly short awk tutorial which will give you a few more ideas and examples.
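The conditionals mentioned above also cover the original time-window question. A sketch, assuming the same field layout as the posted log; the sample lines are made up, with the MAC field shortened for readability. A pattern written before the { } selects which lines the action runs on.

```shell
#!/bin/sh
# Made-up lines with the same field layout as the posted log
# (the MAC field is shortened to keep them readable).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  'Jan 11 07:59:58 fedint kernel: ICMP LOGDROP: IN=eth1 OUT= MAC=x SRC=192.168.0.1 DST=192.168.0.194' \
  'Jan 11 08:25:51 fedint kernel: ICMP LOGDROP: IN=eth1 OUT= MAC=x SRC=192.168.0.2 DST=192.168.0.194' \
  'Jan 11 08:59:59 fedint kernel: ICMP LOGDROP: IN=eth1 OUT= MAC=x SRC=192.168.0.3 DST=192.168.0.194' \
  'Jan 11 09:00:00 fedint kernel: ICMP LOGDROP: IN=eth1 OUT= MAC=x SRC=192.168.0.4 DST=192.168.0.194' > "$log"

# The condition before { } selects lines; the action prints from them.
# Comparing $3 with a quoted string is a text comparison, which orders
# zero-padded HH:MM:SS times correctly. substr($11, 5) drops "SRC=".
window=$(awk '$3 >= "08:00:00" && $3 < "09:00:00" { print substr($11, 5), $3 }' "$log")
printf '%s\n' "$window"
```

Only the 08:25:51 and 08:59:59 lines pass the test, so the output is their source addresses followed by their times. Unlike a regular expression, the bounds can be any times you like, e.g. "08:15:00" to "08:45:00".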

Dtsazza · 01-11-2006, 04:48 AM

And as for your other question, the Unix command date tells you the time (the name is a bit confusing, since it gives you the time as well as the date; the time command is for timing how long other commands take to run). On its own it prints quite a long string with spaces in it, but you can give it a format string starting with +, such as '+%H:%M:%S', to format it how you want (there's an easy-to-understand list of all the format specifiers in the man page).
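A short sketch of date with format strings. The leading + is what tells date the rest is a format; %Y, %m, %d, %H, %M and %S are the year, month, day, hour, minute and second. The second stamp is just one suggested choice for file names.

```shell
#!/bin/sh
# The leading + tells date that the rest is a format string.
now=$(date '+%H:%M:%S')       # e.g. 08:25:51
# No spaces or colons: handier inside a file name.
stamp=$(date '+%Y%m%d-%H')    # e.g. 20060111-08
printf '%s\n' "$now" "$stamp"
```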

Or of course, you could pipe its output through awk to rearrange it, but that would be silly...
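Combining a stamp like that with the sed command from the question gives a rough sketch of the hourly job. The function name save_hourly_log and the script path in the crontab comment are hypothetical; the log paths and the fixed 10000-line cut are taken from the question as-is. The > redirection creates the output file, so the separate touch isn't needed.

```shell
#!/bin/sh
# Sketch of the hourly job from the question.
#
# Example crontab entry (hypothetical script path), minute 0 of every hour:
#   0 * * * * /usr/local/bin/iptables_hourly.sh

# Write everything after line 10000 of $1 to $2/iptables_YYYYMMDD-HH.
save_hourly_log() {
    src=$1
    dest_dir=$2
    # No spaces or colons, so the name is easy to use in later commands.
    stamp=$(date '+%Y%m%d-%H')
    # The > redirection creates the file, so no touch is needed first.
    sed '1,10000d' "$src" > "$dest_dir/iptables_$stamp"
}

# In the real script you would then call:
#   save_hourly_log /var/log/kernel.log /var/log
```

Keeping the commands in a script, rather than writing them straight into the crontab line, also sidesteps cron's special handling of %, which it treats as a newline unless escaped as \%.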

Infernal211283 · 01-11-2006, 07:59 AM

Looks good, thanks for the link also