LinuxQuestions.org

Forum: Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
Thread: awk script to pull out particular hands. (https://www.linuxquestions.org/questions/linux-newbie-8/awk-script-to-pull-out-particular-hands-938034/)

sarenace 04-04-2012 04:05 AM

awk script to pull out particular hands.
 
Hi guys,
I was on here the other day asking for some help with awk scripts. I'm using my blossoming gambling addiction as an excuse to relearn the command line. In particular, I'm mining poker hand histories for data, and I have a few questions that have me stumped. A hand from a hand history looks like this:

PokerStars Game #27738502010: Tournament #160417133, $0.25+$0.00 Hold'em No Limit - Level XV (250/500) - 2009/05/02 13:32:38 ET
Table '160417133 3' 9-max Seat #8 is the button
Seat 1: LLC 4Eva (9182 in chips)
Seat 2: 618shooter (25711 in chips) is sitting out
Seat 3: suposd2bRich (21475 in chips)
Seat 4: ElT007 (60940 in chips)
Seat 5: Orlando I (18044 in chips)
Seat 6: ih82bcool2 (8338 in chips)
Seat 7: kovilen007 (8353 in chips)
Seat 8: GerKingTiger (4404 in chips)
Seat 9: Phontaz (23553 in chips)
LLC 4Eva: posts the ante 60
618shooter: posts the ante 60
suposd2bRich: posts the ante 60
ElT007: posts the ante 60
Orlando I: posts the ante 60
ih82bcool2: posts the ante 60
kovilen007: posts the ante 60
GerKingTiger: posts the ante 60
Phontaz: posts the ante 60
Phontaz: posts small blind 250
LLC 4Eva: posts big blind 500
*** HOLE CARDS ***
Dealt to ElT007 [Qd Qc]
618shooter: folds
suposd2bRich: folds
ElT007: raises 2000 to 2500
Orlando I: raises 15484 to 17984 and is all-in
ih82bcool2: folds
kovilen007: calls 8293 and is all-in
GerKingTiger: folds
Phontaz: calls 17734
LLC 4Eva: folds
ElT007: raises 15484 to 33468
Phontaz: calls 5509 and is all-in
Uncalled bet (9975) returned to ElT007
*** FLOP *** [2d 2c 3c]
*** TURN *** [2d 2c 3c] [8h]
*** RIVER *** [2d 2c 3c 8h] [4d]
*** SHOW DOWN ***
Phontaz: shows [9s 9h] (two pair, Nines and Deuces)
ElT007: shows [Qd Qc] (two pair, Queens and Deuces)
618shooter has returned
ElT007 collected 11018 from side pot-2
Orlando I: shows [5d 5h] (two pair, Fives and Deuces)
ElT007 collected 29073 from side pot-1
kovilen007: shows [Kh As] (a pair of Deuces)
ElT007 collected 34212 from main pot
*** SUMMARY ***
Total pot 74303 Main pot 34212. Side pot-1 29073. Side pot-2 11018. | Rake 0
Board [2d 2c 3c 8h 4d]
Seat 1: LLC 4Eva (big blind) folded before Flop
Seat 2: 618shooter folded before Flop (didn't bet)
Seat 3: suposd2bRich folded before Flop (didn't bet)
Seat 4: ElT007 showed [Qd Qc] and won (74303) with two pair, Queens and Deuces
Seat 5: Orlando I showed [5d 5h] and lost with two pair, Fives and Deuces
Seat 6: ih82bcool2 folded before Flop (didn't bet)
Seat 7: kovilen007 showed [Kh As] and lost with a pair of Deuces
Seat 8: GerKingTiger (button) folded before Flop (didn't bet)
Seat 9: Phontaz (small blind) showed [9s 9h] and lost with two pair, Nines and Deuces

Each hand is separated from the next one in the hand history by three newlines (that is, two blank lines). I was hoping someone could give me some tips for pulling out certain hands. I really don't know where to start, which is why I'm bothering you guys. Here is what I want to do:
The example above includes the line

Dealt to ElT007 [Qd Qc]

i.e., the player was dealt a pair of queens. I want to pull out all hands in the hand history where the player was dealt a pair such as the one above, or a suited connector. In order to analyse an entire hand at a time, I've already realised I have to prod awk a little with something like this:

awk 'BEGIN{RS=ORS="\n\n\n"}(command)' handhistoryfile

OR

awk '/^PokerStars /,/\n\n\n/ && other_regxp{print}' handhistoryfile

So basically, awk has to load the entire hand with one of the two methods above, move to the "Dealt to" line, look at the cards dealt, and, if both cards have the same rank (e.g. both Q or both 9), print the hand. I have been trucking along pretty well with awk so far, but this is WAY over my head.
I also want to pull out suited connectors. These are hands like this:

Dealt to ElT007 [6d 7d]

i.e., two cards of consecutive rank whose suit (the lowercase letter after the rank, in this case d for diamonds) is the same. I would much rather have figured this out for myself than bother you guys with it, but this and a few other problems have honestly stumped me. I have no idea how to proceed. Any thoughts?
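Nobody in the thread ends up answering the suited-connector half of the question, so here is a minimal sketch that only inspects the "Dealt to" line. It assumes the card format shown above (a rank character from `23456789TJQKA` followed by a lowercase suit letter) and treats "connected" as adjacent in that order, so A-2 is not counted. The function name `suited_connectors` is hypothetical; the same test could sit inside the whole-hand record setup discussed later in the thread.

```shell
# Hypothetical helper: print "Dealt to" lines whose two hole cards are
# suited connectors. Assumes ranks are ordered 23456789TJQKA and that
# A-2 does NOT count as connected.
suited_connectors() {
    awk '
    /^Dealt to / {
        # With default field splitting the last two fields look like "[6d" and "7d]".
        c1 = substr($(NF-1), 2, 2)   # drop the leading "["
        c2 = substr($NF, 1, 2)       # drop the trailing "]"
        r1 = index("23456789TJQKA", substr(c1, 1, 1))
        r2 = index("23456789TJQKA", substr(c2, 1, 1))
        d = r1 - r2
        # Both ranks recognised, same suit, ranks exactly one step apart.
        if (r1 > 0 && r2 > 0 && substr(c1, 2, 1) == substr(c2, 2, 1) && (d == 1 || d == -1))
            print
    }' "$@"
}

# Usage: suited_connectors HH*
```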

grail 04-04-2012 05:12 AM

hmmm ... how about you forget about a complete hand and simply search for the "Dealt to" line? It should then be a simple matter of extracting the data in the square brackets and comparing as desired.
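As a rough illustration of that suggestion (a sketch, not grail's code), the helper below looks only at the "Dealt to" line, pulls out the text between the square brackets with the POSIX `match()` function and its `RSTART`/`RLENGTH` variables, and prints the line when both cards share a rank. The function name `pair_lines` is hypothetical.

```shell
# Hypothetical helper: print "Dealt to" lines that show a pocket pair.
pair_lines() {
    awk '
    /^Dealt to / && match($0, /\[[^]]*\]/) {
        # RSTART/RLENGTH locate "[Qd Qc]"; strip the two brackets.
        cards = substr($0, RSTART + 1, RLENGTH - 2)
        split(cards, c, " ")
        # The first character of each card is its rank.
        if (substr(c[1], 1, 1) == substr(c[2], 1, 1))
            print
    }' "$@"
}

# Usage: pair_lines HH*
```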

sarenace 04-04-2012 05:23 AM

Unfortunately I need the whole hand. The reason I want to pull out pairs and suited connectors is to see whether I can play those hands profitably: I need to know how much money I made on each hand, how often I play them and under what conditions, etc. I thought the substr function might help somehow, but I'm not sure how to use it. BTW, grail, are you always online answering questions?
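For reference, awk's `substr(s, m, n)` returns up to n characters of s starting at position m, with positions numbered from 1. With the default field splitting, the last two fields of "Dealt to ElT007 [Qd Qc]" are "[Qd" and "Qc]", so each card's rank can be picked out with substr. A minimal sketch; the function name `card_ranks` is hypothetical.

```shell
# Hypothetical helper: for each "Dealt to" line, print the rank characters
# of the two hole cards.
card_ranks() {
    awk '/^Dealt to / {
        # $(NF-1) is "[Qd": rank is character 2.
        # $NF is "Qc]": rank is character 1.
        print substr($(NF-1), 2, 1), substr($NF, 1, 1)
    }' "$@"
}

# Usage: echo 'Dealt to ElT007 [Qd Qc]' | card_ranks   # prints "Q Q"
```

Comparing the two printed characters is exactly the pair test.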

grail 04-04-2012 06:21 AM

Then I would keep the same record format but also set the FS variable to a single newline. You can then cycle through each field (each line of the hand) using a for loop, with the NF variable as the loop limit, and search for the required items.
Quote:

BTW, grail, are you always online answering questions?
Probably not any more than any of the other regulars :)

sarenace 04-05-2012 11:58 PM

Found a script that can locate pairs but only on the line beginning "Dealt to".

awk '/Dealt to/{if (substr($NF,1,1)==substr($(NF-1),2,1)) print $0}' *

Would anyone have any idea how to extend the command to grab the entire hand? I have tried


awk '/^PokerStars/,/\n\n\n/ && /^Dealt to/{if (substr($NF,0,1)==substr($(NF-1),2,1)) print $0}' HH*

But that results in the following error message:

(FILENAME=HH20111227 T490759814 No Limit Hold'em $2.28 + $0.22.txt FNR=82) fatal: attempt to access field -1

Any thoughts?
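The fatal error comes from the range pattern. Its end pattern, `/\n\n\n/ && /^Dealt to/`, can never match, because with the default RS each record is a single line and contains no newline. So from the first "PokerStars" line onward every line is in range, including the blank lines between hands, and on a blank line NF is 0, so `$(NF-1)` asks for field -1. One alternative that avoids both the range pattern and a multi-character RS is to collect each hand's lines in a variable and decide whether to print it when the next hand starts. This is a different technique from the one grail suggests below; it assumes every hand begins with a line starting "PokerStars ", and the name `pair_hands` is hypothetical.

```shell
# Hypothetical helper: print each whole hand whose "Dealt to" line shows a
# pocket pair. Lines are buffered per hand; a new hand starts at a line
# beginning "PokerStars ", and blank separator lines are skipped.
pair_hands() {
    awk '
    function flush() {
        if (is_pair) printf "%s\n\n\n", hand   # keep the 3-newline separator
        hand = ""
        is_pair = 0
    }
    /^PokerStars / { flush() }
    NF == 0 { next }                           # never index fields of a blank line
    { hand = (hand == "") ? $0 : hand "\n" $0 }
    /^Dealt to / && substr($(NF-1), 2, 1) == substr($NF, 1, 1) { is_pair = 1 }
    END { flush() }
    ' "$@"
}

# Usage: pair_hands HH*
```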

grail 04-06-2012 01:18 AM

As suggested:
Code:

awk 'BEGIN{RS=ORS="\n\n\n";FS="\n"}{for(i=1;i<=NF;i++)if($i ~ /Dealt to/){n=split($i,a,"[][ ]+");if(substr(a[n-1],1,1) == substr(a[n-2],1,1))print}}' HH*
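The command above relies on a multi-character RS. POSIX leaves that case unspecified; gawk and mawk treat such an RS as a regular expression. If your awk does not, a portable variant is paragraph mode: with RS set to the empty string, records are separated by one or more blank lines and a newline always acts as a field separator. The sketch below is grail's loop with only the record setup changed; it assumes hands never contain blank lines internally (true of the example hand), and the name `pair_hands_para` is hypothetical.

```shell
# Hypothetical helper: grail's pair filter with RS = "" (paragraph mode)
# instead of RS = "\n\n\n". Assumes no blank lines inside a hand.
pair_hands_para() {
    awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n\n" }
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ /Dealt to/) {
                n = split($i, a, "[][ ]+")
                # a[n] is empty (trailing "]"), so the cards are a[n-2], a[n-1]
                if (substr(a[n-1], 1, 1) == substr(a[n-2], 1, 1))
                    print
            }
    }' "$@"
}

# Usage: pair_hands_para HH*
```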

sarenace 04-06-2012 02:16 AM

Once again you have swooped in and shown me the way lol. Looking over the code, I think I have a dim idea of how it works. I'm sorry I wasn't able to work out the code myself from your suggestion; I am quite inexperienced with the command line and didn't know what a for loop was.

grail 04-06-2012 03:48 AM

No probs, here is the awk bible (well, I like to think so), which is well worth a read. I would even suggest starting at page one and just following the examples provided.

As a quick explanation:

BEGIN{RS=ORS="\n\n\n";FS="\n"} - Everything in BEGIN is processed before any file(s) are read. The first two variables you already know; the last is the field separator (FS). Setting it to a newline makes each line of each record its own field.

for(i=1;i<=NF;i++) - a "for" loop is a simple construct, found in many languages, that repeats its body over a range of values; here it is a simple counter from 1 to NF (the number of fields).

if($i ~ /Dealt to/) - if the field we are looking at contains "Dealt to", run the body of the "if"

n=split($i,a,"[][ ]+") - split the field wherever there is a run of one or more spaces, "[" or "]" characters, in any combination. The pieces are stored in the array "a", and split returns the number of pieces, which is assigned to the variable "n".

if(substr(a[n-1],1,1) == substr(a[n-2],1,1))print - Assuming the input string has the format "Dealt to ElT007 [Qd Qc]", once split as above, "Qd" is the third-last element of array "a" and "Qc" is the second-last (the last element is an empty string, because the closing "]" at the end of the line is itself a delimiter). Take the first character of each, i.e. the rank, and compare them; if they are equal, print the entire record.
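To see that split in action, the small sketch below (the function name `show_split` is hypothetical) prints every element that `split()` produces for a "Dealt to" line, which makes the empty last element visible.

```shell
# Hypothetical helper: show each element split() produces with the
# separator regex "[][ ]+" (any run of spaces, "[" or "]").
show_split() {
    awk '{
        n = split($0, a, "[][ ]+")
        for (i = 1; i <= n; i++)
            printf "%d:[%s]\n", i, a[i]
    }' "$@"
}

# Usage: echo 'Dealt to ElT007 [Qd Qc]' | show_split
# prints 1:[Dealt] 2:[to] 3:[ElT007] 4:[Qd] 5:[Qc] 6:[] (one per line)
```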

Each hand has many fields with FS="\n" (the example hand has 60), but the "Dealt to" line comes fairly early, so you could also break out of the for loop once it has been found rather than reading all the other fields we do not care about. I will let you look up how that would be done :)
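One way to do that (a sketch, not necessarily what grail had in mind) is awk's `break` statement, which leaves the innermost loop. Since each hand has a single "Dealt to" line, the loop can stop as soon as that line has been examined. The function name `pairs_with_break` is hypothetical, and the multi-character RS again needs gawk or another awk that accepts one.

```shell
# Hypothetical helper: same pair filter, but stop scanning a hand's fields
# once its "Dealt to" line has been checked.
pairs_with_break() {
    awk 'BEGIN { RS = ORS = "\n\n\n"; FS = "\n" }
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Dealt to /) {
                n = split($i, a, "[][ ]+")
                if (substr(a[n-1], 1, 1) == substr(a[n-2], 1, 1))
                    print
                break   # only one "Dealt to" line per hand; skip the rest
            }
        }
    }' "$@"
}

# Usage: pairs_with_break HH*
```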


All times are GMT -5.