LinuxQuestions.org


sarenace 04-01-2012 08:58 PM

awk script to pull out last record
 
Hello, I have a folder full of files that chop their data up into chunks separated by 3 newlines. I want a script to pull out either the last or second-to-last record in each and every file. So far I have been trying this:

awk 'BEGIN {RS="\n\n\n"}
END{print}' *

But all this returns is three blank lines. I'm not sure what's going on here, but I can't go through and do it manually; the folder has thousands of files. Any thoughts?

jhwilliams 04-01-2012 09:22 PM

Given the following input in test.txt:

Code:

one


two


three


four

The following awk script produces the result "four".

Code:

awk 'BEGIN { RS= ""; FS="\n+" } END { print $1 }' test.txt
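A note on the reply above: `RS= ""` sets RS to the empty string, not to a space. POSIX awk treats an empty RS as paragraph mode, where a run of one or more blank lines ends a record, so chunks separated by two or three newlines split the same way. A minimal sketch that checks this on a scratch copy of test.txt (the temporary directory is only for the demo):

```shell
# Scratch copy of the test.txt input shown above.
tmp=$(mktemp -d)
printf 'one\n\n\ntwo\n\n\nthree\n\n\nfour\n' > "$tmp/test.txt"

# RS = "" (the empty string) switches on paragraph mode: records are
# separated by runs of blank lines, however many there are.
out=$(awk 'BEGIN { RS = "" } { n++; last = $0 } END { print n " records, last: " last }' "$tmp/test.txt")
printf '%s\n' "$out"    # 4 records, last: four

rm -r "$tmp"
```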

sarenace 04-01-2012 10:03 PM

Umm, how does that help? Maybe I'm too much of a noob to understand what you're driving at here. That will only work for one file, and I have thousands. The record separator is a space; I need it to be 3 newlines. The field separator is a number of newlines; I need the field separator to be the default, whitespace. Maybe it would help if I gave you an example. My files are hand histories; they have poker hands that look like this:

PokerStars Game #27738502010: Tournament #160417133, $0.25+$0.00 Hold'em No Limit - Level XV (250/500) - 2009/05/02 13:32:38 ET
Table '160417133 3' 9-max Seat #8 is the button
Seat 1: LLC 4Eva (9182 in chips)
Seat 2: 618shooter (25711 in chips) is sitting out
Seat 3: suposd2bRich (21475 in chips)
Seat 4: ElT007 (60940 in chips)
Seat 5: Orlando I (18044 in chips)
Seat 6: ih82bcool2 (8338 in chips)
Seat 7: kovilen007 (8353 in chips)
Seat 8: GerKingTiger (4404 in chips)
Seat 9: Phontaz (23553 in chips)
LLC 4Eva: posts the ante 60
618shooter: posts the ante 60
suposd2bRich: posts the ante 60
ElT007: posts the ante 60
Orlando I: posts the ante 60
ih82bcool2: posts the ante 60
kovilen007: posts the ante 60
GerKingTiger: posts the ante 60
Phontaz: posts the ante 60
Phontaz: posts small blind 250
LLC 4Eva: posts big blind 500
*** HOLE CARDS ***
Dealt to ElT007 [Qd Qc]
618shooter: folds
suposd2bRich: folds
ElT007: raises 2000 to 2500
Orlando I: raises 15484 to 17984 and is all-in
ih82bcool2: folds
kovilen007: calls 8293 and is all-in
GerKingTiger: folds
Phontaz: calls 17734
LLC 4Eva: folds
ElT007: raises 15484 to 33468
Phontaz: calls 5509 and is all-in
Uncalled bet (9975) returned to ElT007
*** FLOP *** [2d 2c 3c]
*** TURN *** [2d 2c 3c] [8h]
*** RIVER *** [2d 2c 3c 8h] [4d]
*** SHOW DOWN ***
Phontaz: shows [9s 9h] (two pair, Nines and Deuces)
ElT007: shows [Qd Qc] (two pair, Queens and Deuces)
618shooter has returned
ElT007 collected 11018 from side pot-2
Orlando I: shows [5d 5h] (two pair, Fives and Deuces)
ElT007 collected 29073 from side pot-1
kovilen007: shows [Kh As] (a pair of Deuces)
ElT007 collected 34212 from main pot
*** SUMMARY ***
Total pot 74303 Main pot 34212. Side pot-1 29073. Side pot-2 11018. | Rake 0
Board [2d 2c 3c 8h 4d]
Seat 1: LLC 4Eva (big blind) folded before Flop
Seat 2: 618shooter folded before Flop (didn't bet)
Seat 3: suposd2bRich folded before Flop (didn't bet)
Seat 4: ElT007 showed [Qd Qc] and won (74303) with two pair, Queens and Deuces
Seat 5: Orlando I showed [5d 5h] and lost with two pair, Fives and Deuces
Seat 6: ih82bcool2 folded before Flop (didn't bet)
Seat 7: kovilen007 showed [Kh As] and lost with a pair of Deuces
Seat 8: GerKingTiger (button) folded before Flop (didn't bet)
Seat 9: Phontaz (small blind) showed [9s 9h] and lost with two pair, Nines and Deuces

Each hand (like the one above) is separated from the others by 3 newlines. I want to return the last hand from each tournament, which means printing the last record in each of the thousands of tournament files in my folder, when the beginning of my awk script looks like this:

awk 'BEGIN{RS="\n\n\n"}

Are things a little clearer now?

Tinkster 04-01-2012 10:10 PM

Code:

awk 'BEGIN{RS=ORS="\n\n\n";FS=OFS="\n"}{last=$0}END{print last}' *
maybe?



Cheers,
Tink

sarenace 04-01-2012 10:33 PM

The suggested script returned four blank lines, which is really weird. Also, I don't need to change the field separator, just the record separator. The fact that it only returns four blank lines leads me to believe that awk is going through every file in the folder and then returning a few blank lines at the end of the last file; that's the problem with using the END block: it won't do the operation on every file. I was thinking it would have something to do with the FNR variable, but I have NO idea how to work that in. I really wanna figure most of this one out for myself instead of bugging you guys, but I don't know how else to do this.
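The reasoning in this post matches how awk works. A minimal sketch with two small hypothetical files (`a.txt` and `b.txt`, one line per record for simplicity) shows that the END rule fires once, after the last file, while FNR restarts at 1 for every file and NR keeps counting across all of them:

```shell
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/a.txt"   # hypothetical files, one record per line
printf 'b1\nb2\n' > "$tmp/b.txt"

# Print the file name, the overall record number (NR) and the per-file
# record number (FNR) for every record, then report once from END.
out=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR } END { print "END ran once after", NR, "records" }' a.txt b.txt)
printf '%s\n' "$out"

rm -r "$tmp"
```

FNR drops back to 1 on the first line of `b.txt`, which is exactly the hook needed to act once per file.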

sarenace 04-01-2012 10:48 PM

Just tried another approach, attempting to be proactive, which may help you guys help me. Instead of trying to isolate the last line of each file, I tried to read the file backward and isolate the first, like so:

tac HH* |awk 'BEGIN {RS="\n\n\n"}
FNR==1{print $0}'

Obviously you guys are laughing at me now, because in this example awk is reading from stdin, so FNR = NR, and it will definitely only return the last record. Well, the last record turned out to be a blank line. So I substituted FNR==2, and it returned the last record in the first file. So it seems that there is a blank line at the end of every file which is the last record, meaning that what I actually want is the second-to-last record in every file. I had suspected this might be a problem, but now I have proven it. So, any more suggestions?
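The trailing blank record can be made visible by printing NR and NF for every record. A sketch assuming a hypothetical file that ends with one more newline than the three-newline separator (an assumption about how the real hand histories end). It needs an awk that accepts a multi-character RS, such as gawk or mawk; POSIX leaves that case unspecified:

```shell
tmp=$(mktemp -d)
# Hypothetical file: two one-line "hands", plus one extra newline at the end.
printf 'hand A1\n\n\nhand A2\n\n\n\n' > "$tmp/HH1.txt"

# Show each record's number and how many fields it has.
out=$(awk 'BEGIN { RS = "\n\n\n" } { printf "record %d: NF=%d\n", NR, NF }' "$tmp/HH1.txt")
printf '%s\n' "$out"

rm -r "$tmp"
```

The third record is just the leftover newline, so NF is 0 for it; that is the blank "last record".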

Tinkster 04-01-2012 10:48 PM

Odd ... I took the snippet you gave us, made three copies of it, changed the last line to say "large & medium blind" respectively, and got the record with "large" returned.



edit: OK, I had stripped the 3 blank lines from the last record; that's why it worked.
Here's a new version that works with the trailing blank lines, too.

Code:

awk 'BEGIN{RS=ORS="\n\n\n";FS="\n"}{previous=last;last=$0}END{print previous}' poker

sarenace 04-01-2012 11:00 PM

OK, your script seems to work, but only for one hand: awk goes through every file in the folder, reads every record, and then, for the last file, prints out the last hand. I need the last hand for every file. I feel as if we are closing in on this, and I am very grateful for your help. I have just had something of a brainwave, but have no idea how to realise it in awk, so maybe someone could figure this out?

How about something like: for FNR==1, print out NR-1, where RS="\n\n\n"

This will cause awk to print the previous record evaluated every time FNR resets, i.e. every time awk loads a new file. This should cause awk to print out the last record of the previous file. If I get only blank lines, I could substitute NR-2, which would print out the second-to-last record of every file, giving me the result I need. Does anyone have any idea how to script this?
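This idea works once the previous record is kept in a variable: when FNR is 1, that variable still holds the last record of the previous file, and an END rule covers the final file, which has no following file to trigger the print. A minimal sketch with hypothetical one-line-per-record files, using the default RS so it runs in any awk:

```shell
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/a.txt"   # hypothetical files, one record per line
printf 'b1\nb2\n' > "$tmp/b.txt"

out=$(awk '
  FNR == 1 && NR > 1 { print prev }   # new file: prev still holds the last record of the previous file
  { prev = $0 }                       # remember the current record
  END { print prev }                  # the last file has no following file, so print it here
' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$out"    # a3, then b2

rm -r "$tmp"
```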

sarenace 04-01-2012 11:06 PM

Ok, idiotically, I just entered

awk 'BEGIN{RS=ORS="\n\n\n"}
FNR==1{print NR-1}' *

Which, of course, caused awk to print the current record number minus 1, giving me a string of seemingly random numbers separated by 3 newlines. Is there any way to get the print command to see that I want it to print out the specified record, rather than the numerical value of the specified record?
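Awk has no built-in way to fetch an earlier record by its number; `NR-1` is just a number, so that is what gets printed. The usual workaround is to save records as they go by. A sketch under that approach: keep the current file's records in an array keyed by their per-file number, and print record n-1 (the second-to-last) when the next file starts or input ends. The array `rec` and the helper `flush` are hypothetical names; `split("", rec)` is a portable way to empty an array:

```shell
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/a.txt"   # hypothetical files, one record per line
printf 'b1\nb2\n' > "$tmp/b.txt"

out=$(awk '
  # flush (hypothetical helper): print record n-1, the second-to-last of the finished file
  function flush() { if (n >= 2) print rec[n - 1] }
  FNR == 1 && NR > 1 { flush(); split("", rec); n = 0 }   # new file: report, then empty the array
  { rec[++n] = $0 }                                       # save each record under its number in this file
  END { flush() }
' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$out"    # a2, then b1

rm -r "$tmp"
```

This holds one whole file in memory at a time; keeping only the last two records in plain variables would do the same job with less memory.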

sarenace 04-01-2012 11:36 PM

Ok, in another frustrating failed approach, I have entered


awk 'BEGIN{RS="\n\n\n"}
FNR==1{print NR-2}' HH* |sed "s/^/cat HH* |awk 'BEGIN{RS=\"\n\n\n\"}NR==/
s/$/{print \$0}'/" |/bin/bash

which resulted in a very large number of error messages, as well as being an utterly awful, ugly script. To explain, the initial awk script finds the record number of the last hand in each file. This information is piped off to sed, which puts the record number in the middle of this command:

cat HH* |awk 'BEGIN{RS="\n\n\n"}NR==(^^record number of last hand in file^^){print $0}';

Once the command is produced, it's piped off to /bin/bash.
However, sed sees each \n as a newline, so the output is actually

cat HH* |awk 'BEGIN{RS="


"}NR==40526{print $0}'

Sigh.
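Instead of generating awk programs with sed (and fighting the escaping), a value can be handed to awk with the standard `-v` option, which assigns an awk variable before the program starts. A minimal sketch; the file and the `target` variable name are hypothetical:

```shell
tmp=$(mktemp -d)
printf 'r1\nr2\nr3\n' > "$tmp/f.txt"   # hypothetical file, one record per line

target=2   # e.g. a record number computed by an earlier pass
# Nothing is pasted into the program text, so no sed or shell escaping is involved.
# A pattern with no action prints the matching record.
out=$(awk -v target="$target" 'NR == target' "$tmp/f.txt")
printf '%s\n' "$out"    # r2

rm -r "$tmp"
```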

sarenace 04-02-2012 12:10 AM

Finally came up with a script that I was certain would work:


awk 'BEGIN{RS="\n\n\n"}
FNR==1{lasthand=NR-2}
NR==lasthand{print $0}' HH*

This may give a better idea of the approach I am taking. The script sets a variable, lasthand, to NR-2 for every record where FNR==1. Finally, I set the program to print the entire record when NR==lasthand. After coming up with this beautiful approach, the program returns a blank line. I am out of ideas. Any thoughts?
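Why this cannot work: when FNR == 1 sets lasthand = NR - 2, that record number has already gone by, and NR only grows from there, so NR == lasthand is never true. Awk reads forward, so a program either has to remember earlier records (as grail's answer below does) or know the record count before it gets there. The second option can be sketched by naming the same file twice, assuming one file per run and one record per line:

```shell
tmp=$(mktemp -d)
printf 'h1\nh2\nh3\n' > "$tmp/HH1.txt"   # hypothetical file, one record per line

# While NR == FNR (the first copy) awk only counts the records;
# on the second copy the total is known, so record total-1 can be matched.
out=$(awk 'NR == FNR { total = FNR; next } FNR == total - 1' "$tmp/HH1.txt" "$tmp/HH1.txt")
printf '%s\n' "$out"    # h2

rm -r "$tmp"
```

For a whole folder this would be wrapped in a shell loop, running awk once per file with that file named twice.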

grail 04-02-2012 12:58 AM

How about something like:
Code:

awk 'BEGIN{RS="\n\n\n"}FNR == 1 && NR > 1{print last}NF{last = $0}END{print last}' *
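A quick way to try the one-liner above before pointing it at thousands of files: build two small hypothetical hand-history files that each end with an empty trailing record, then run the command unchanged. It assumes gawk or mawk, since it relies on a multi-character RS:

```shell
tmp=$(mktemp -d)
# Hypothetical hand histories: each ends with one newline more than the
# separator, which leaves an empty record at the end of the file.
printf 'hand A1\n\n\nhand A2\n\n\n\n' > "$tmp/HH1.txt"
printf 'hand B1\n\n\nhand B2\n\n\n\n' > "$tmp/HH2.txt"

# The one-liner, unchanged, run inside the scratch directory.
out=$(cd "$tmp" && awk 'BEGIN{RS="\n\n\n"}FNR == 1 && NR > 1{print last}NF{last = $0}END{print last}' *)
printf '%s\n' "$out"    # hand A2, then hand B2

rm -r "$tmp"
```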

sarenace 04-02-2012 01:24 AM

Much obliged, kind sir; your script did the job. If it's not too much trouble, could you please explain how it works? I am looking over the script and do not understand how you have done this.

grail 04-02-2012 01:45 AM

Sorry ... it is one of the things I keep forgetting to do.

BEGIN{RS="\n\n\n"} - Set record separator to 3 new lines

FNR == 1 && NR > 1{print last} - NR counts records across all files, whereas FNR is reset to zero for each new file read. Therefore, FNR equal to one marks the start of each new file, and NR > 1 skips the start of the very first file, where there is no previous file yet. At that point we print the last record of the previous file, stored in the variable "last".

NF{last = $0} - Here we store the current record in the variable "last" only if there are fields to be read, i.e. NF != 0, which skips the blank record at the end of each file

END{print last} - This is to print the last record from the last file
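If the hands never contain a blank line inside them (true of the sample hand earlier in the thread, but an assumption about the real files), POSIX paragraph mode (RS = "") gives a portable variant that also runs in awks without multi-character RS support. Leading and trailing newlines are ignored in paragraph mode, so no NF test is needed; setting ORS to three newlines keeps the printed hands apart. A sketch on hypothetical files:

```shell
tmp=$(mktemp -d)
# Hypothetical two-line hands with no blank lines inside them.
printf 'hand A1\nline\n\n\nhand A2\nline\n\n\n\n' > "$tmp/HH1.txt"
printf 'hand B1\nline\n\n\nhand B2\nline\n\n\n\n' > "$tmp/HH2.txt"

# RS = "" is paragraph mode; ORS separates the printed hands with 3 newlines.
out=$(cd "$tmp" && awk 'BEGIN { RS = ""; ORS = "\n\n\n" } FNR == 1 && NR > 1 { print last } { last = $0 } END { print last }' HH*)
printf '%s\n' "$out"

rm -r "$tmp"
```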

Interestingly, if you have a recent version of gawk (4.0 or later), you could forgo the FNR part: instead of END, gawk provides a special pattern called ENDFILE which, as the name suggests, is executed at the end of every file, so it would look like:
Code:

awk 'BEGIN{RS="\n\n\n"}NF{last = $0}ENDFILE{print last}' *
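One edge case in the ENDFILE version: `last` carries over between files, so a file with no hands in it would repeat the hand from the previous file. A gawk-only sketch that clears it in BEGINFILE (also gawk 4.0+); the three files are hypothetical, and the block does nothing when gawk is not installed:

```shell
tmp=$(mktemp -d)
printf 'hand A1\n\n\nhand A2\n\n\n\n' > "$tmp/HH1.txt"
printf '\n\n\n\n' > "$tmp/HH2.txt"    # hypothetical file with no hands in it
printf 'hand C1\n\n\nhand C2\n\n\n\n' > "$tmp/HH3.txt"

out=""
if command -v gawk >/dev/null 2>&1; then
  out=$(cd "$tmp" && gawk '
    BEGIN     { RS = "\n\n\n" }
    BEGINFILE { last = "" }                    # forget the hand from the previous file
    NF        { last = $0 }
    ENDFILE   { if (last != "") print last }   # print nothing for files without hands
  ' HH*)
  printf '%s\n' "$out"    # hand A2, then hand C2
fi

rm -r "$tmp"
```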


All times are GMT -5.