Filtering logs stored in gz archive based on time range
Dear Linux friends,
Logrotate is configured to rotate logs every 7 days. So, for example, mail.log.8.gz from April 18 contains logs from 11 to 18 April. What I need to do is filter the logs from 23 April 15:00 to 2 June and write them to a file. Is there a smart way to do this, such as a script?
Of course you can gunzip the files or zcat them into your filter. If this is a one-off, you can just list the files in time order, concatenate them, and use an editor to delete the data outside your timeframe. Since the entries are in time order, it should be obvious which lines to keep: you just need the entries between the first and last timestamps of your range.
If, however, you want to do this periodically, it gets more complicated. I used Perl for this, with a log file ripper that converts each timestamp to Unix time, because comparing date strings directly can get really complicated. Since you know what the endpoints are, you just check that each log entry's time falls between those two values. Once an entry is later than the end point, you know (since the log is in time order) that you don't need to read the rest of the input.
You could do it in bash, using date with the %s format to generate the start, end, and current times. Each entry then incurs an exec of date, which could be really slow for a large log file. For a small log it is probably acceptable.
I've done this exact thing about 100 times for various reasons.
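The Unix-time comparison with an early exit could be sketched in Perl roughly as below. This is only an illustration, not the script mentioned above: the names syslog_epoch and filter_range are made up for the example, the year has to be supplied by the caller because the default syslog stamp does not contain one, and timestamps are treated as UTC (the same conversion is applied to the end points, so the comparison stays consistent).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation -> zero-based month number, as timegm() expects.
my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Convert a leading "Mmm dd HH:MM:SS" stamp to Unix time.
# Returns undef when the line has no parsable timestamp.
sub syslog_epoch {
    my ($line, $year) = @_;
    my ($mon, $day, $h, $m, $s) =
        $line =~ /^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})/
        or return;
    return unless exists $MON{$mon};
    # timegm() dies on impossible dates, so trap that and return undef.
    my $t = eval { timegm($s, $m, $h, $day, $MON{$mon}, $year) };
    return $t;
}

# Copy lines whose time lies in [$start, $stop] from $in to $out.
# Assumes the input is in time order, which allows the early exit.
sub filter_range {
    my ($in, $out, $start, $stop, $year) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        my $t = syslog_epoch($line, $year);
        next unless defined $t;    # skip lines without a timestamp
        next if $t < $start;       # before the range: keep reading
        last if $t > $stop;        # past the range: nothing later can match
        print {$out} $line;
        $kept++;
    }
    return $kept;
}

# Small in-memory demonstration with invented log lines.
my $demo = "Apr 23 14:00:00 mx too early\n"
         . "Apr 23 15:00:00 mx kept\n"
         . "Jun  5 00:00:00 mx too late\n";
open my $demo_in, '<', \$demo or die "cannot open demo data: $!";
filter_range($demo_in, \*STDOUT,
             timegm(0, 0, 15, 23, 3, 2021),   # 23 April 15:00
             timegm(0, 0, 0, 2, 5, 2021),     # 2 June 00:00
             2021);
```

For real logs, the demonstration at the end would be replaced by a call with \*STDIN as the input handle, fed from zcat. Unlike the bash approach, no external date process is started per line. Logs that cross a year boundary would need extra handling, since a single year is assumed here.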
Quote:
Originally Posted by elgrandeperro
... since trying to compare a date string can get really complicated. Since you know what the endpoints are going to be, then you just make sure the log entry is between those values for each log entry. Once it is greater, you know (since in time order) you don't need
Well, if logrotate is rotating logs every 7 days, you'll need to examine all the logs that cover the desired time period. Concatenate the records from successive logs into one big output:
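One possible way to do the concatenation in Perl is sketched below. It assumes the usual logrotate naming, where a higher rotation number means an older file (mail.log.8.gz is older than mail.log.2.gz, and mail.log is the current file). The helper names rotation_number, oldest_first and cat_logs are invented for this example. IO::Uncompress::Gunzip is a core Perl module, and by default it passes uncompressed files through unchanged, so plain and gzipped logs can be mixed.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# mail.log.8.gz -> 8, mail.log.1 -> 1, mail.log -> 0
sub rotation_number {
    my ($name) = @_;
    return $name =~ /\.(\d+)(?:\.gz)?$/ ? $1 : 0;
}

# Highest rotation number first, i.e. oldest records first
# (assumption: standard logrotate numbering).
sub oldest_first {
    return sort { rotation_number($b) <=> rotation_number($a) } @_;
}

# Write the contents of all files to $out in chronological order.
sub cat_logs {
    my ($out, @files) = @_;
    for my $file (oldest_first(@files)) {
        my $z = IO::Uncompress::Gunzip->new($file)
            or die "cannot read $file: $GunzipError\n";
        while (defined(my $line = $z->getline)) {
            print {$out} $line;
        }
        $z->close;
    }
    return;
}

# Only runs when file names are given on the command line.
cat_logs(\*STDOUT, @ARGV) if @ARGV;
```

Saved under a hypothetical name such as catlogs.pl, it could be run as "perl catlogs.pl /var/log/mail.log* > all-mail.log", and the combined output then goes through the date filter.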
When dealing with that default date/time format ("mmm dd HH:MM:SS"), I've resorted to a Perl script that reads the log file line by line, extracts everything before the hostname, formats it as "yyyy-mm-ddTHH:MM:SS", and tests the record's time/date stamp to see if it's in the desired range:
Code:
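# Range end points in ISO 8601 form, which sorts correctly as plain strings
$tstart = "2021-04-23T15:00:00";
$tstop = "2021-06-02T00:00:00";
# $datetime holds the current record's timestamp in the same format
if ( $datetime ge $tstart ) {
    if ( $datetime le $tstop ) {
        # write original record to output
    }
}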
There are many ways to create "$datetime" from the log file bits. One way to get started: use "$months = 'JanFebMarAprMayJun...'" and 'int( index( $months, $month_from_log ) / 3 ) + 1' to get the month number (each abbreviation is three characters long, hence the division by 3). Day of the month is a piece of cake.
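As a rough sketch of that conversion (one of the many possible ways, not the actual script), the snippet below builds the ISO-style string using the month-string trick and applies the two string comparisons. The function name iso_stamp and the sample log lines are made up for illustration, and the year is passed in by the caller because the log stamp does not carry one.

```perl
use strict;
use warnings;

my $months = 'JanFebMarAprMayJunJulAugSepOctNovDec';

# Turn a leading "Mmm dd HH:MM:SS" stamp into "yyyy-mm-ddTHH:MM:SS".
# Returns undef when the line does not start with such a stamp.
sub iso_stamp {
    my ($line, $year) = @_;
    my ($mon, $day, $time) =
        $line =~ /^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})/
        or return;
    my $pos = index($months, $mon);
    return if $pos < 0 or $pos % 3;    # not a month abbreviation
    # Each abbreviation is 3 characters, so position / 3 + 1 is the month.
    return sprintf '%04d-%02d-%02dT%s', $year, $pos / 3 + 1, $day, $time;
}

my ($tstart, $tstop) = ('2021-04-23T15:00:00', '2021-06-02T00:00:00');

# Invented sample records; real input would come from the concatenated logs.
for my $line ("Apr 23 14:59:59 mx postfix[1]: too early\n",
              "Apr 23 15:00:00 mx postfix[1]: first kept\n",
              "Jun  1 08:30:00 mx postfix[1]: also kept\n") {
    my $dt = iso_stamp($line, 2021);
    print $line if defined $dt and $dt ge $tstart and $dt le $tstop;
}
```

With these sample lines, the second and third records are printed and the first is dropped. Plain string comparison works here only because the fields in the ISO format run from most to least significant and are zero-padded.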
If you have some control over the server, see about setting rsyslog to use the ISO 8601 date format in the log records. It makes finding dates that fall after, before, or between certain dates far, far easier. (At least you'd no longer have to convert the dates in your filtering script.) See this old page for details. This change works on newer distributions, too, on non-Ubuntu Linuxes like openSUSE and, I strongly suspect, most others. It won't help you with older log files, but future searches for records in a given date/time range should be much easier.