How do I get lines 30,000 to 40,000 from an Apache access_log file?

Ujjain · 03-26-2009, 06:32 AM

I want to run Webalizer on only the entries between 14:00 and 15:00 yesterday. I can look up the line numbers for 14:00 and 15:00 if that's easier.

Unfortunately, I only have one log file per complete day, each about 2 GB, and I need the logs for specific times.

If you require any extra information, please let me know.

reptiler · 03-26-2009, 06:44 AM

There may be a better way, but you could combine head and tail to cut the output.

Code:

head -n 40000 access_log | tail -n 10000

head will pass the first 40000 lines of the log to tail, which then outputs the last 10000, thus effectively outputting lines 30001 to 40000.

openSauce · 03-26-2009, 06:49 AM

Have you looked at awk?

Code:

awk 'NR >= 30000 && NR <= 40000' file

will print lines 30000-40000. I'm not sure whether that's quicker or slower than using head/tail.

If the lines contain timestamps, awk could also be used to find the lines between any given times, so that gives you more flexibility. Check the man page for syntax.
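As a rough sketch of the timestamp approach, assuming the log is in Apache's common or combined format (so the fourth whitespace-separated field looks like [25/Mar/2009:14:00:00), something like the following should keep only the 14:00 hour. The sample lines, the date, and the variable names are made up for illustration.

```shell
# Hypothetical sample data so the sketch is self-contained; in practice
# point "log" at the real access_log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [25/Mar/2009:13:59:59 +0100] "GET /a HTTP/1.1" 200 512
10.0.0.2 - - [25/Mar/2009:14:00:00 +0100] "GET /b HTTP/1.1" 200 1024
10.0.0.3 - - [25/Mar/2009:14:59:59 +0100] "GET /c HTTP/1.1" 404 256
10.0.0.4 - - [25/Mar/2009:15:00:00 +0100] "GET /d HTTP/1.1" 200 128
EOF

# Field 4 is assumed to hold "[dd/Mon/yyyy:hh:mm:ss"; keep the lines
# whose timestamp starts with the wanted date and hour.
hour_log=$(mktemp)
awk -v prefix='[25/Mar/2009:14:' 'index($4, prefix) == 1' "$log" > "$hour_log"

cat "$hour_log"
```

Using index() instead of a regular expression avoids having to escape the bracket and slashes in the timestamp.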

rizwanrafique · 03-26-2009, 07:19 AM

Using sed:

Code:

sed -n "30000,40000p" file
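On a 2 GB file it may be worth telling sed to stop once the last wanted line has been printed, since otherwise it keeps reading to the end of the file. A minimal sketch, using a generated stand-in file (the names file and out are placeholders):

```shell
# Stand-in for the real log: a file whose line N contains the number N.
file=$(mktemp)
seq 1 50000 > "$file"

# Print lines 30000-40000, then quit at line 40000 instead of
# scanning the rest of the file.
out=$(mktemp)
sed -n '30000,40000p;40000q' "$file" > "$out"

head -n 1 "$out"
tail -n 1 "$out"
```

The range is inclusive at both ends, so this yields 10001 lines.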

Ujjain · 03-26-2009, 07:52 AM

Thanks for your suggestions! They are all great, but I guess the 'sed' tool is designed for this and therefore the best and fastest solution!
Could you also help me find the easiest way to get the line number for a specific time in the Apache logs? The end result should be an Apache log file containing the entries from 14:00 to 15:00 yesterday.

rizwanrafique · 03-27-2009, 05:37 AM

If you know the format of the date/time in the log files, you can run grep -n for it. The full command to get the line number for a time would be:

Code:

grep -m 1 -n "your time" log_file | cut -f 1 -d ":"
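Putting the pieces together, one possible way to get the 14:00 to 15:00 slice is to look up the first line of each hour and hand the range to sed. This is only a sketch: the sample data and dates are invented, and it assumes the log actually contains at least one entry in the 15:00 hour (otherwise next would be empty and the arithmetic would go wrong).

```shell
# Hypothetical sample data; replace "log" with the real access_log.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [25/Mar/2009:13:59:59 +0100] "GET /a HTTP/1.1" 200 512
10.0.0.2 - - [25/Mar/2009:14:00:00 +0100] "GET /b HTTP/1.1" 200 1024
10.0.0.3 - - [25/Mar/2009:14:59:59 +0100] "GET /c HTTP/1.1" 404 256
10.0.0.4 - - [25/Mar/2009:15:00:00 +0100] "GET /d HTTP/1.1" 200 128
EOF

# First line number of the 14:00 hour and of the 15:00 hour.
# -F matches the pattern as a fixed string, -m 1 stops at the first hit.
start=$(grep -m 1 -n -F '25/Mar/2009:14:' "$log" | cut -f 1 -d ':')
next=$(grep -m 1 -n -F '25/Mar/2009:15:' "$log" | cut -f 1 -d ':')
end=$((next - 1))

# Extract the inclusive range into a new file that Webalizer can read.
slice=$(mktemp)
sed -n "${start},${end}p" "$log" > "$slice"

cat "$slice"
```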