LinuxQuestions.org
Old 04-01-2012, 09:58 PM   #1
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Rep: Reputation: Disabled
awk script to pull out last record


Hello, I have a folder full of files that chop their data up into chunks separated by 3 newlines. I want a script to pull out either the last or second-to-last record in each and every file. So far I have been trying this:

awk 'BEGIN {RS="\n\n\n"}
END{print}' *

But all this returns is three blank lines. I'm not sure what's going on here, but I can't go through and do it manually, as the folder has thousands of files. Any thoughts?
 
Old 04-01-2012, 10:22 PM   #2
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Rep: Reputation: 210
Given the input set in test.txt,

Code:
one


two


three


four
The following awk script produces the result "four".

Code:
awk 'BEGIN { RS= ""; FS="\n+" } END { print $1 }' test.txt
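For reference, setting RS to the empty string puts awk into paragraph mode: any run of blank lines ends a record, and a newline always separates fields. A minimal sketch, assuming the same four-record test.txt as above (the temporary directory and the names count, rec and last are only for illustration). It saves each record in a variable as it is read, which avoids relying on $1 still being set inside END:

```shell
# Build the sample input from this post in a throwaway directory.
tmp=$(mktemp -d)
printf 'one\n\n\ntwo\n\n\nthree\n\n\nfour\n' > "$tmp/test.txt"

# RS = "" is paragraph mode: records are separated by blank lines.
count=$(awk 'BEGIN { RS = "" } END { print NR }' "$tmp/test.txt")

# Remember each record as it goes by, then print the final one.
last=$(awk 'BEGIN { RS = "" } { rec = $0 } END { print rec }' "$tmp/test.txt")

echo "records: $count"
echo "last record: $last"
rm -rf "$tmp"
```

Paragraph mode treats two, three or more newlines alike, so it only suits data where every blank-line gap is a record boundary.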
 
Old 04-01-2012, 11:03 PM   #3
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
Umm, how does that help? Maybe I'm too much of a noob to understand what you're driving at here. That will only work for 1 file, and I have thousands. The record separator there is an empty string, but I need it to be 3 newlines; the field separator is a run of newlines, but I need it to be the default, whitespace. Maybe it would help if I gave you an example. My files are hand histories; they contain poker hands that look like this:

PokerStars Game #27738502010: Tournament #160417133, $0.25+$0.00 Hold'em No Limit - Level XV (250/500) - 2009/05/02 13:32:38 ET
Table '160417133 3' 9-max Seat #8 is the button
Seat 1: LLC 4Eva (9182 in chips)
Seat 2: 618shooter (25711 in chips) is sitting out
Seat 3: suposd2bRich (21475 in chips)
Seat 4: ElT007 (60940 in chips)
Seat 5: Orlando I (18044 in chips)
Seat 6: ih82bcool2 (8338 in chips)
Seat 7: kovilen007 (8353 in chips)
Seat 8: GerKingTiger (4404 in chips)
Seat 9: Phontaz (23553 in chips)
LLC 4Eva: posts the ante 60
618shooter: posts the ante 60
suposd2bRich: posts the ante 60
ElT007: posts the ante 60
Orlando I: posts the ante 60
ih82bcool2: posts the ante 60
kovilen007: posts the ante 60
GerKingTiger: posts the ante 60
Phontaz: posts the ante 60
Phontaz: posts small blind 250
LLC 4Eva: posts big blind 500
*** HOLE CARDS ***
Dealt to ElT007 [Qd Qc]
618shooter: folds
suposd2bRich: folds
ElT007: raises 2000 to 2500
Orlando I: raises 15484 to 17984 and is all-in
ih82bcool2: folds
kovilen007: calls 8293 and is all-in
GerKingTiger: folds
Phontaz: calls 17734
LLC 4Eva: folds
ElT007: raises 15484 to 33468
Phontaz: calls 5509 and is all-in
Uncalled bet (9975) returned to ElT007
*** FLOP *** [2d 2c 3c]
*** TURN *** [2d 2c 3c] [8h]
*** RIVER *** [2d 2c 3c 8h] [4d]
*** SHOW DOWN ***
Phontaz: shows [9s 9h] (two pair, Nines and Deuces)
ElT007: shows [Qd Qc] (two pair, Queens and Deuces)
618shooter has returned
ElT007 collected 11018 from side pot-2
Orlando I: shows [5d 5h] (two pair, Fives and Deuces)
ElT007 collected 29073 from side pot-1
kovilen007: shows [Kh As] (a pair of Deuces)
ElT007 collected 34212 from main pot
*** SUMMARY ***
Total pot 74303 Main pot 34212. Side pot-1 29073. Side pot-2 11018. | Rake 0
Board [2d 2c 3c 8h 4d]
Seat 1: LLC 4Eva (big blind) folded before Flop
Seat 2: 618shooter folded before Flop (didn't bet)
Seat 3: suposd2bRich folded before Flop (didn't bet)
Seat 4: ElT007 showed [Qd Qc] and won (74303) with two pair, Queens and Deuces
Seat 5: Orlando I showed [5d 5h] and lost with two pair, Fives and Deuces
Seat 6: ih82bcool2 folded before Flop (didn't bet)
Seat 7: kovilen007 showed [Kh As] and lost with a pair of Deuces
Seat 8: GerKingTiger (button) folded before Flop (didn't bet)
Seat 9: Phontaz (small blind) showed [9s 9h] and lost with two pair, Nines and Deuces

Each hand (above) is separated from the others by 3 newlines. I want to return the last hand from each tournament, which means printing the last record in each of the thousands of tournament files in my folder, with an awk script that begins like this:

awk 'BEGIN{RS="\n\n\n"}

Are things a little clearer now?
 
Old 04-01-2012, 11:10 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Rep: Reputation: 910
Code:
awk 'BEGIN{RS=ORS="\n\n\n";FS=OFS="\n"}{last=$0}END{print last}' *
maybe?



Cheers,
Tink
 
Old 04-01-2012, 11:33 PM   #5
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
The suggested script returned four blank lines, which is really weird. Also, I don't need to change the field separator, just the record separator. The fact that it only returns 4 blank lines leads me to believe that awk is going through every file in the folder and then returning a few blank lines at the end of the last file. That's the problem with using the END block: it runs once, after all the files, not once per file. I was thinking the answer would have something to do with the FNR variable, but I have NO idea how to work that in. I really want to figure the majority of this one out for myself instead of bugging you guys, but I don't know how else to do this.
 
Old 04-01-2012, 11:48 PM   #6
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
Just tried another approach, attempting to be proactive, which may help you guys help me. Instead of trying to isolate the last record of each file, I tried to read the files backward and isolate the first, like so:

tac HH* |awk 'BEGIN {RS="\n\n\n"}
FNR==1{print $0}'

Obviously you guys are laughing at me now: in this example awk is reading from stdin, so FNR = NR, and FNR==1 will only ever match once. Well, that one record turned out to be a blank line. So I substituted FNR==2, and it returned the last record in the first file. So it seems that there is a blank line at the end of every file which counts as the last record, meaning that what I actually want is the second-to-last record in every file. I had suspected this might be a problem, but now I have confirmed it. So, any more suggestions?
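The FNR = NR observation can be checked directly. A small sketch, assuming two throwaway two-line files (a.txt and b.txt are made-up names) and the default record separator:

```shell
# Two tiny files, two lines (records) each.
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/a.txt"
printf 'b1\nb2\n' > "$tmp/b.txt"

# Files given as arguments: NR keeps counting, FNR restarts at 1 per file.
from_files=$(awk '{ print NR, FNR }' "$tmp/a.txt" "$tmp/b.txt")

# Same data through a pipe: awk sees one stream, so FNR never restarts.
from_pipe=$(cat "$tmp/a.txt" "$tmp/b.txt" | awk '{ print NR, FNR }')

echo "as file arguments:"; echo "$from_files"
echo "through a pipe:";    echo "$from_pipe"
rm -rf "$tmp"
```

So any test on FNR only helps when the files are passed to awk as arguments rather than concatenated first.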
 
Old 04-01-2012, 11:48 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Rep: Reputation: 910
Odd ... I took the snippet you gave us, made three copies of it, changed the last line to
say "large & medium blind" respectively, and got the record with "large" returned.



edit: OK, I had stripped the 3 blank lines from the last record, that's why it worked.
Here's a new version that works with the trailing blank lines, too.

Code:
awk 'BEGIN{RS=ORS="\n\n\n";FS="\n"}{previous=last;last=$0}END{print previous}' poker
 
Old 04-02-2012, 12:00 AM   #8
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
OK, your script seems to work, but only for 1 hand: awk goes through every file in the folder, reads every record, and then prints only the last hand of the last file. I need the last hand for every file. I feel as if we are closing in on this, and I am very grateful for your help. I have just had something of a brainwave, but have no idea how to realise it in awk, so maybe someone could figure this out?

How about something like: for FNR==1, print out NR-1, where RS="\n\n\n"

This would cause awk to print the previously evaluated record every time FNR resets, that is, every time awk loads a new file, which should print the last record of the previous file. If I get only blank lines, I could substitute NR-2, which would print the second-to-last record of every file, giving me the result I need. Does anyone have any idea how to script this?
 
Old 04-02-2012, 12:06 AM   #9
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
Ok, idiotically, I just entered

awk 'BEGIN{RS=ORS="\n\n\n"}
FNR==1{print NR-1}' *

Which, of course, caused awk to print the current record number minus 1 at the start of each file, giving me a string of seemingly random numbers separated by 3 newlines. Is there any way to get the print command to print the record with that number, rather than the number itself?
 
Old 04-02-2012, 12:36 AM   #10
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
Ok, in another frustrating failed approach, I have entered


awk 'BEGIN{RS="\n\n\n"}
FNR==1{print NR-2}' HH* |sed "s/^/cat HH* |awk 'BEGIN{RS=\"\n\n\n\"}NR==/
s/$/{print \$0}'/" |/bin/bash

which resulted in a very large number of error messages, as well as being an utterly awful, ugly script. To explain, the initial awk script finds the record number of the last hand in each file. This information is piped off to sed, which puts the record number in the middle of this command:

cat HH* |awk 'BEGIN{RS="\n\n\n"}NR==(^^record number of last hand in file^^){print $0}';

Once the command is produced, it's piped off to /bin/bash.
However, sed sees the \n as a newline, so the output is actually

cat HH* |awk 'BEGIN{RS="


"}NR==40526{print $0}'

Sigh.
 
Old 04-02-2012, 01:10 AM   #11
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
Finally came up with a script that I was certain would work;


awk 'BEGIN{RS="\n\n\n"}
FNR==1{lasthand=NR-2}
NR==lasthand{print $0}' HH*

This may give a better idea of the approach I am taking: the script sets a variable, lasthand, to NR-2 for every record where FNR==1. Finally, I set the program to print the entire record whenever NR==lasthand. After coming up with this beautiful approach, the program returns a blank line. I am out of ideas. Any thoughts?
 
Old 04-02-2012, 01:58 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
How about something like:
Code:
awk 'BEGIN{RS="\n\n\n"}FNR == 1 && NR > 1{print last}NF{last = $0}END{print last}' *
 
Old 04-02-2012, 02:24 AM   #13
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Original Poster
Rep: Reputation: Disabled
Much obliged, kind sir, your script did the job. If it's not too much trouble, could you please explain how it works? I am looking over the script and do not understand how you have done this.
 
Old 04-02-2012, 02:45 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
Sorry ... it is one of the things I keep forgetting to do.

BEGIN{RS="\n\n\n"} - Set record separator to 3 new lines

FNR == 1 && NR > 1{print last} - NR is the number of records read so far across all files, whereas FNR restarts counting for each new file read. Therefore, FNR equal to one marks the first record of each new file (NR > 1 skips the very first file, which has no previous file), and we then print the last record of the previous file, stored in the variable "last"

NF{last = $0} - Here we store the current record in the variable "last" only if it has fields to be read, i.e. NF != 0, so the blank record at the end of each file is skipped

END{print last} - This is to print the last record from the last file
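To see the four pieces working together, here is a small sketch on two made-up files, a.txt and b.txt, whose "hands" are just placeholder lines. It assumes an awk that accepts a multi-character RS (gawk and mawk do; strict POSIX awk only honours the first character). a.txt ends with one extra newline to imitate the blank trailing record seen earlier in the thread.

```shell
# Two sample "tournament" files; hands are separated by three newlines.
tmp=$(mktemp -d)
printf 'A1 first\nA1 second\n\n\nA2 first\nA2 second\n\n\n\n' > "$tmp/a.txt"
printf 'B1 only\n\n\nB2 first\nB2 second\n\n\n' > "$tmp/b.txt"

result=$(awk 'BEGIN { RS = "\n\n\n" }
    # First record of every file except the first: flush the previous file.
    FNR == 1 && NR > 1 { print last }
    # Remember the current record, ignoring records with no fields.
    NF { last = $0 }
    # Nothing follows the final file, so flush it here.
    END { print last }' "$tmp/a.txt" "$tmp/b.txt")

echo "$result"
rm -rf "$tmp"
```

awk reads records in a single forward pass and cannot go back to record NR-2 once it has gone by, which is why the record has to be remembered in a variable as it is read.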

Interestingly, if you have a recent version of gawk (4.0+), you could forgo the FNR part: instead of END, gawk provides a special pattern called ENDFILE which, as it sounds, is executed at the end of every file, so it would look like:
Code:
awk 'BEGIN{RS="\n\n\n"}NF{last = $0}ENDFILE{print last}' *
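A sketch of the ENDFILE variant on made-up input (a.txt and b.txt with placeholder hands), assuming gawk 4.0 or later is installed under the name gawk; the example is skipped otherwise. The last = "" reset is an addition to the one-liner above: without it, a file containing no non-blank records would print the previous file's last hand again.

```shell
# Two sample files; hands are separated by three newlines.
tmp=$(mktemp -d)
printf 'A1 first\nA1 second\n\n\nA2 first\nA2 second\n\n\n\n' > "$tmp/a.txt"
printf 'B1 only\n\n\nB2 first\nB2 second\n\n\n' > "$tmp/b.txt"

if command -v gawk >/dev/null 2>&1; then
    result=$(gawk 'BEGIN { RS = "\n\n\n" }
        # Remember the most recent record that has fields.
        NF      { last = $0 }
        # Runs once after each input file: print, then reset for the next file.
        ENDFILE { print last; last = "" }' "$tmp/a.txt" "$tmp/b.txt")
    echo "$result"
else
    echo "gawk not found; ENDFILE is a gawk extension" >&2
fi
rm -rf "$tmp"
```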
 
  

