LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   How to match different records based on Same Field values using awk (https://www.linuxquestions.org/questions/programming-9/how-to-match-different-records-based-on-same-field-values-using-awk-4175557932/)

Abdelrahman.fathy 11-04-2015 02:37 AM

How to match different records based on Same Field values using awk
 
Dear Linux Users,
I have data files that looks like this:
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 1cc 1cc 2af0 "" "" "" 07/15 13:27:12
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:27
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 14f 1cc 2b88 "" "" "" 07/15 13:27:18
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:27
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 2cf 1cc bfc "" "" "" 07/15 13:27:24
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 3ed 1cc ee4 "" "" "" 07/15 13:29:08
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 45d 1cc 298 "" "" "" 07/15 13:29:12
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 492 1cc b24 "" "" "" 07/15 13:29:16
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 50c 1cc 2528 "" "" "" 07/15 13:29:20
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 378 1cc 1c4c "" "" "" 07/15 13:29:23
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 2a9 1cc 1f1c "" "" "" 07/15 13:29:26
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 4c9 1cc 2970 "" "" "" 07/15 13:29:29
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 227 1cc 8a4 "" "" "" 07/15 13:29:33
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 541 1cc 22dc "" "" "" 07/15 13:29:36
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 581 1cc 2d0 "" "" "" 07/15 13:29:39
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 5c1 1cc 18e8 "" "" "" 07/15 13:29:42
DENY IamFeatureName1 1.0 IamUserName IamHostName "" 1 -18 1 07/15 13:29
OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 601 1cc 2844 "" "" "" 07/15 13:29:45
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 1cc 07/15 13:31:39
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 14f 07/15 13:31:41
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 2cf 07/15 13:31:42
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 3ed 07/15 13:31:43
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 45d 07/15 13:31:45
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 492 07/15 13:31:46
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 50c 07/15 13:31:47
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 378 07/15 13:31:48
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 2a9 07/15 13:31:49
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 4c9 07/15 13:31:52
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 601 07/15 13:32:09
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 5c1 07/15 13:32:13
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 541 07/15 13:32:13
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 20 0 581 07/15 13:32:13
IN 2 IamFeatureName 2013.11 IamUserName IamHostName "" 20 0 0 227 07/15 13:32:13
IN 1 results 2013.11 IamUserName IamHostName "" 20 40 0 24f 07/15 13:32:17
IN 1 results 2013.11 IamUserName IamHostName "" 20 20 0 f6 07/15 13:32:17


it's basically report log for rlms license server

I was wondering how to match two records starting was Keywords OUT and IN based on field 2(FeatureName),3(FeatureVersion),5(Username),6(Machine Name),11(Handle) starting with Out Keyword with Fields 3(FeatureName),4(FeatureVersion),5(Username),6(MachineName),11(Handle) and print matching Pairs of OUT/IN




i've written this code but it does have some issue

Code:

awk '
BEGIN{OFS=",";}
{if ($1 ~ /OUT/)
{
        OUTFeatureName[$3]=$3;
        OUTFeatureVersion[$3]=$4;
        OUTUsername[$3]=$5;
        OUTHostname[$3]=$6;
        OUTHandle[$3]=$11;
}

else if ($1 ~ /IN/)
{
        INFeatureName[$3]=$3;
        INFeatureVersion[$3]=$4;
        INUsername[$3]=$5;
        INHostname[$3]=$6;
        INHandle[$3]=$11;
}

 {
        for (var in INFeatureName )
        {
                print NR;
                if(match(INFeatureName[var],OUTFeatureName[var])>0 && match(INFeatureVersion[var],OUTFeatureVersion[var])>0 && match(INHostname[var],OUTHostname[var])>0 && match(INHandle[var],OUTHandle[var])>0 && match(INUsername[var],OUTUsername[var])>0)
                        print $0;
                  else
                    print "No records"
       
        }
 }

}' fileName


grail 11-04-2015 02:58 AM

Firstly, please use [code][/code] tags around code and data to keep them readable and scrollable.
Quote:

i've written this code but it does have some issue
And would you like to tell us the issue?

As far as the data goes, will ALL the OUT's always appear before ALL the IN's? If so, simply store all the OUT's and on hitting the first IN, start testing that the data you are looking for matches.

I am not able to follow your logic in the code 100, but what is obvious is that you store IN/OUT but then use the for loop straight away, but you may not have read all the data required (see above first)
If you cannot guarantee IN's after OUT's you will need to store all the data and then perform your for loop in the END stanza (opposite to your BEGIN you have used)

Abdelrahman.fathy 11-04-2015 04:05 AM

Dear grail,
First thanks very much,

I was wondering if the following code is better,It's still missing array indices comparsion??

but i took your point into consideration
Code:

awk '
BEGIN{OFS=",";}
{
        if ($1 ~ /OUT/)
        {
                OutRecord[$2FS$3FS$5FS$6]=$0
        }

        else if ($1 ~ /IN/)
        {
                InRecord[$3FS$4FS$5FS$6]=$0
        }

       

}END {
                for (var in InRecord )
                {
                        #print "IN record index is: "  var;
                         
                        if(match(InRecord[var],OutRecord[var])>0) # this is wrong because we are supposed to match indices not the record itself
                                {
                                        print "Printing Out: " OutRecord[var];
                                        print "Printing In: " InRecord[var];
                                }
                          else
                                print "The index did not match were: "
                }
        }'


Abdelrahman.fathy 11-04-2015 04:33 AM

also i tried this code but it's misbehaving:

Code:

awk '
BEGIN{OFS=",";}
{if ($1 ~ /OUT/)
OutRecord[$2FS$3FS$5FS$6]=$0;
else if ($1 ~ /IN/)
InRecord[$3FS$4FS$5FS$6]=$0;
}END {
for (var in InRecord )
{
for (varOut in OutRecord)
{
if (var in OutRecord )
{
#print "current var is: " var,varout
print "OUT record: " OutRecord[varOut]
print "IN record: "  InRecord[var]       
}
#else
#print "No records found between: " varOut,var
}
}
}' file


Abdelrahman.fathy 11-04-2015 04:39 AM

Maybe this is fine:

Code:

awk '
BEGIN{OFS=",";}
{if ($1 ~ /OUT/)
OutRecord[$2FS$3FS$5FS$6]=$0;
else if ($1 ~ /IN/)
InRecord[$3FS$4FS$5FS$6]=$0;
}END {
for (var in InRecord )
{
        if (var in OutRecord  )
{
#print "current var is: " var,varout
print "OUT record: " OutRecord[var]
print "IN record: "  InRecord[var]       
}

else
print "No records found for: " var
}
}' file


grail 11-04-2015 05:33 AM

So it is good to see you are iteratively getting to where you need to be :) So you have the results you need now? If so, please mark as SOLVED

Abdelrahman.fathy 11-04-2015 05:37 AM

First thanks very much, you empowered me towards it

Abdelrahman.fathy 11-08-2015 01:33 AM

Dear LinuxQuestion Community,

Can you please help me , I want to concatenate a calculated variable value(using mktime() function) to the whole record $0.

Code:

gsub(/[/]/," ",$17);
gsub(/[:]/," ",$18);
OutDate=("2015 ") $17 " " $18;
OUTTime=mktime(OutDate);
print $1FS$2FS$3FS$4FS$5FS$6FS$7FS$8FS$9FS$10FS$11FS$12FS$13FS$14FS$15FS$16FS$17$18FSOUTTime;
OutRecord[$2FS$3FS$5FS$6FS$11]=$1FS$2FS$3FS$4FS$5FS$6FS$7FS$8FS$9FS$10FS$11FS$12FS$13FS$14FS$15FS$16FS$17$18FSOUTTime;

but it does not give me the result i wanted

Abdelrahman.fathy 11-08-2015 02:11 AM

I found this as an answer,
Out=$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9" "$10" "$11" "$12" "$13" "$14" "$15" "$16" "OUTTime;

but maybe , there is a better solution

grail 11-08-2015 03:28 AM

I am a little confused at what you are trying to do here? mktime function takes one argument with either 6 or 7 fields in it ("YYYY MM DD HH MM SS [DST]"), what values are in the 18 fields you are using??

Abdelrahman.fathy 11-08-2015 03:34 AM

Thanks grail For Answering.
the required arguments for Mktime is given in column 17 , 18 i am just concatenting the year and pass all values to mktime.

if you need more information about the file format
http://www.reprisesoftware.com/RLM_Enduser.html
in the section entitled "Reportlog File Format" , you will find all details about the file format i'm trying to process.
My apology for not uploading an actual file from the server,as it invades my customer privacy.

grail 11-08-2015 04:52 AM

Ok, thanks for explanation, but my next question would be, why are you re-printing all 18 fields when you could just use $0?

I see in post #9 you have removed 17 and 18, so in that case you could simply zero them after they have been used for mktime, so something like:
Code:

awk '{dspec = "2015"FS$17FS$18; NF=16;print $0,mktime(gensub(/[:/]/," ","g",dspec))}'

Abdelrahman.fathy 11-08-2015 09:58 PM

grail thanks very much,
I did remove more fields based on requirments,
so i have
Out=$1" "$2" "$3" "$5" "$6" "$8" "$11" "OUTTime;
In=$1" "$2" "$3" "$4" "$5" "$6" "$9" "$11" "INTime

Abdelrahman.fathy 11-10-2015 04:37 AM

grail or anybody can you please help me,
I found that the date and time in my file are in local time which is AST (UTC+3)
i need to convert those in my awk script to GMT

Is there are any method to do this.

grail 11-10-2015 10:20 AM

Have you read the relevant page from the manual?

http://www.gnu.org/software/gawk/man...Time-Functions

Abdelrahman.fathy 11-10-2015 10:30 AM

Grail if you are talking about strftime and UTC flag ,I cannot confirm if it did work. I will check it and let you know

Abdelrahman.fathy 11-11-2015 02:30 AM

grail;
i've used this function and it's doing the job

Code:

function ConvAST_To_GMT(timestamp)
{
        c="date -ud "timestamp;
        out=c|getline;
        close(c);
        return $0;
       
}

and i call it using IT="@" InFields[9] this as argument
where InFields[9] is the timestamp output from mktime


Another question is i have time like hh:mm
i want to concatenate 00 as ss
so that i can send the output to mktime
i tried this
DenyTimePart=OutDenials[11] "" ":" "00";
and so many other variation but none of them did the job

grail 11-11-2015 03:39 AM

So here is the simple line I used to convert to UTC:
Code:

echo 'OUT IamFeatureName 2013.11 9 IamUserName IamHostName "" 20 20 0 1cc 1cc 2af0 "" "" "" 07/15 13:27:12' | awk '{dspec = "2015"FS$17FS$18; NF=16;print $0,strftime("a %b %e %H:%M:%S %Z %Y",mktime(gensub(/[:/]/," ","g",dspec)),1),dspec}'
As for appending '00', I would need more information on why you think it does not work? I do see a needless pair of quotes after the array, but would not see these causing any issue.

Abdelrahman.fathy 11-11-2015 03:44 AM

thanks for the response but these needless pair of qoutes are needed for forcing awk to convert number to string so i can right pad with two zeros in place of seconds ofcousrse with space between them
for example if i have 10:15 as OutDenials[11]

i need output like 10 15 00

Abdelrahman.fathy 11-11-2015 03:52 AM

also your line that you posted produces an error:
awk: fatal: 3 is invalid as number of arguments for strftime
maybe it's the awk version(i've GNU Awk 3.1.5) or something else

grail 11-11-2015 05:02 AM

Well i see no reference to say it is a new feature to include the utc-flag, but maybe, my version is 4.1.3

As for the concatenation, I am not sure I follow, the below simple test seems to work just fine:
Code:

echo "10:57" | awk '{print $0":00"}'
No additional sets of quotes needed

Abdelrahman.fathy 11-11-2015 10:11 PM

grail,
I'm sorry to bother you too much but can you look at the output , i've exactly used your statment:
Code:

{print $0":00"}


:00Y Blah_check 1.0 trudy DellHd-77 "" 1 -18 1 07/15 13:27


I mean the keyword Deny in the start have been overwritten

Abdelrahman.fathy 11-11-2015 10:32 PM

Grail, it's ok i find my problem,
It's the difference between MSFT(Microsoft) and linux
i guess there was some non printable or some control character difference , so i used dos2unix and convert it and now your code is running fine.


All times are GMT -5. The time now is 09:55 AM.