Linux - Newbie: This Linux forum is for members that are new to Linux.
Hi, I am trying to write a proper awk statement that only returns hostname entries from a logfile, from a week ago to the present time.
Logfile format is like this:
27-04-2024_00:04 hostname1 EverythingElseAfterHere
28-04-2024_02:05 hostname2 EverythingElseAfterHere
I thought I could reformat the date to a single string and compare like so:
#!/bin/bash
# get the date from a week ago:
lastweek=$(date +"%Y-%m-%d" --date="1 week ago")
# run today (5/1/24), this returns:
2024-04-24
Then I tried converting field $1 in my file via awk to a similar format:
awk 'n=split($1,a,"[-_]") {print a[3] a[2] a[1]}' mylogfile
# this also looks good, returning as an example:
20240427
Here is where I get stuck. I want to (if possible) use the value of n to compare with lastweek and see if the date (value) is greater:
awk -v lastweek="$lastweek" 'n=split($1,a,"[-_]") {print a[3] a[2] a[1]} n > lastweek {print $2}' mylogfile
# this just returns more dates like '20240427' but I want field 2 with the hostname
I don't even know if I am doing the compare correctly or if it's even possible.
I am trying to push the output from the split/print subcommand into 'n', then compare that timestamp as text to the lastweek text and, if n is greater, output $2 (the hostname). It's getting messy and I am getting confused now, as I am not very familiar with awk.
The date command you used outputs hyphens, which is why we are re-inserting them here.
Otherwise, the return value of split is not needed, nor is print needed to concatenate, and we use && to combine everything into a single condition/action item.
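The command this reply refers to did not survive in this copy of the thread. A minimal sketch of what the description implies, assuming GNU date for --date and the log format from the question, might look like the following. hosts_after is a hypothetical helper name, and this is not necessarily the poster's exact command.

```shell
#!/bin/bash
# Sketch only: print the hostname ($2) of every entry dated after a cutoff day.
# hosts_after is a hypothetical name; $1 = cutoff as YYYY-MM-DD, $2 = log file.
hosts_after() {
    awk -v cutoff="$1" '
        # Split DD-MM-YYYY_HH:MM on "-" and "_", rebuild the date as
        # YYYY-MM-DD, and compare it to the cutoff as a string.
        # The && joins both parts into one condition for a single action.
        split($1, a, "[-_]") && (a[3] "-" a[2] "-" a[1]) > cutoff { print $2 }
    ' "$2"
}

# Typical use (GNU date assumed):
#   hosts_after "$(date +"%Y-%m-%d" --date="1 week ago")" mylogfile
```

Because YYYY-MM-DD strings sort lexically in the same order as chronologically, a plain string comparison is enough. Using >= instead of > would also include entries from the cutoff day itself.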
Alternatively, with GNU Awk, there are date functions available, so we can reorder the date fields largest unit first (year, month, day, hour, minute) and use mktime to turn them into a timestamp, e.g.:
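The original example is missing from this copy of the thread, so what follows is only a sketch of the mktime idea under stated assumptions: GNU date for --date, the log format from the question, and a hypothetical helper name recent_hosts. It prefers gawk when installed and otherwise falls back to plain awk, which only works if that awk also provides mktime().

```shell
#!/bin/bash
# Sketch only: print hostnames for entries at or after a cutoff, to the second.
# recent_hosts is a hypothetical name; $1 = cutoff in epoch seconds, $2 = log file.
AWK=$(command -v gawk || command -v awk)   # mktime() is needed; gawk has it

recent_hosts() {
    "$AWK" -v cutoff="$1" '
        {
            # DD-MM-YYYY_HH:MM -> d[1]=DD d[2]=MM d[3]=YYYY d[4]=HH d[5]=MM
            split($1, d, "[-_:]")
            # mktime expects "YYYY MM DD HH MM SS" and uses the local timezone
            ts = mktime(d[3] " " d[2] " " d[1] " " d[4] " " d[5] " 00")
        }
        ts >= cutoff { print $2 }
    ' "$2"
}

# Typical use (GNU date assumed):
#   recent_hosts "$(date +%s --date="1 week ago")" mylogfile
```

Note that the cutoff here is "now minus seven days" to the second, so entries from earlier in the day one week ago are excluded.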
You have already attracted replies from two of the sharp pencils who share their knowledge here, so nothing to add! But I invite you to visit the Programming forum here at LQ where you may find others eager to offer help with any programming question when needed!
I know you posted that it works perfectly, and I have not actually played with the code, but it depends on which dates/times you actually want to "extract". Based on today being 2/5 (or 5/2), does "1 week ago" mean anything > 25/4 or >= 25/4? Is the log file in UTC (I would guess) or local time?
boughtonp's script works on seconds so that if you were running the script at say 0900 you would not necessarily see time stamps from 25/4 (again 1 week ago from today 2/5) < 0900.
On the other hand, Turbocapitalist's script should output anything > 25/4 (based on 2/5) regardless of time.
Assuming I am awake enough to follow everything...
split() returns the number of fields i.e. the number of resulting array elements.
A simple string concatenation is done as (a[3] a[2] a[1])
String concatenation in awk does not have an operator; for clarity I wrap it in parentheses.
An alternative is sprintf("%s%s%s", a[3], a[2], a[1])
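To make these points concrete, here is a small sketch using the first sample line from the question (with the trailing text shortened to "rest"). It also shows why comparing n against a date cannot work: n holds the element count, not the reassembled date.

```shell
#!/bin/bash
# Sketch only: show what split() returns and two ways to join the pieces.
out=$(echo '27-04-2024_00:04 hostname1 rest' |
awk '{
    n = split($1, a, "[-_]")                        # element count, not the date
    joined    = (a[3] a[2] a[1])                    # concatenation by juxtaposition
    formatted = sprintf("%s%s%s", a[3], a[2], a[1]) # same result via sprintf
    print n, joined, formatted
}')
echo "$out"
# prints: 4 20240427 20240427
```

The count is 4 because the time of day (00:04) ends up in a[4] after splitting on "-" and "_".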
Quote:
Is the log file in UTC (I would guess) or local time?
Two good points I meant to mention - I got distracted by wrestling with the idiotic LQ "security" filter not letting me post.
My view is that log files should be UTC (or include timezone), but that's definitely not guaranteed, so it might be necessary to add/remove hours as appropriate.
Quote:
boughtonp's script works on seconds so that if you were running the script at say 0900 you would not necessarily see time stamps from 25/4 (again 1 week ago from today 2/5) < 0900.
This was a deliberate choice to do it that way - again I meant to make it clear but forgot.
If one wanted, they could set the hour and minute values to zero for midnight and have it work the other way. (Or indeed, some other fixed time of day if that makes sense for the use-case.)
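One way to read this suggestion is to move the cutoff itself to midnight. The sketch below assumes GNU date for --date and uses a hypothetical helper name, midnight_week_ago; it is an illustration, not the code from the earlier reply. The result could be passed as the epoch cutoff to an awk filter that compares mktime() values.

```shell
#!/bin/bash
# Sketch only: epoch seconds for 00:00 local time on the day one week ago.
# midnight_week_ago is a hypothetical name; GNU date is assumed.
midnight_week_ago() {
    local day
    day=$(date +%Y-%m-%d --date="1 week ago") || return
    date +%s --date="$day 00:00"
}

# Typical use:
#   cutoff=$(midnight_week_ago)
```

With a midnight cutoff, the whole of the day one week ago is included no matter what time the script runs.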
I had thought about setting the default time to midnight. There are a couple of odd cases where the OP might not get the exact desired data in either script. Depending on the data, the OP's timezone and when the script was set to run, the starting results could be either the day before or day after.
These are issues only the OP can determine - or more likely not give a damn about. "logs from a week ago" is sufficiently vague to not worry about IMHO. Plenty of good (awk) ideas already presented for the OP to work with.