08-31-2004, 08:46 PM
#1
Member
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62
Perl Hash Help
Hello,
I have to parse several large text files and enter the results into a database. Each text file is 65,000 to 70,000 pages long. I need a jumpstart getting the text into arrays or hashes, and then I think I can take it from there. I used sed to remove garbage from the file, but I am unsure where to go from there. Any help would be greatly appreciated.
Here is a sample record from the file:
Code:
VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01
VENDOR SIS
UNIT BASE SHIP TOT DOL DOLLAR PERCENT
CONTRACT NUMBER PRICE PRICE QTY U/I DATE PR NUMBER BIN/PART NUMBER VALUE VARIANCE VARIANCE
YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000 POI90809819856 1560007117067 4,410.00 910.00 0
AWARD HISTORY PIIN BSCM N/A U/I UNIT PRICE AWD DT QTY OPT DT FOB REP TYPE
765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B
PID DATA LINE NR LINE NR
01 001PART, DESCRIPTION, DATA 02 002TECHNICAL DATA AVAILABILITY:
03 003
The above record format repeats until EOF. The award history section repeats an undefined number of times for each main record.
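As a starting point for getting the text into hashes, a contract line can be split on whitespace and assigned to named keys in one step with a hash slice. This is a minimal sketch: the key names are borrowed from the script posted later in this thread, and it assumes every column is always filled in. If a column can be blank, split will shift the remaining fields, and a fixed-width unpack is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Key names borrowed from the script posted later in this thread (illustrative only).
my @contract_cols = qw(ContractNumber VendorPrice BasePrice Qty UI ShipDate
                       PRNumber NSNNumber DollarValue DollarVariance PercentVariance);

# Split one contract line on whitespace and return a reference to a hash
# mapping each column name to its value.
sub parse_contract_line {
    my ($line) = @_;
    my @fields = split ' ', $line;          # ' ' splits on runs of whitespace, ignoring leading blanks
    my %contract;
    @contract{@contract_cols} = @fields;    # hash slice: fill every key at once
    return \%contract;
}

my $contract = parse_contract_line(
    'YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000 '
  . 'POI90809819856 1560007117067 4,410.00 910.00 0');

print "$contract->{ContractNumber}: qty $contract->{Qty}, value $contract->{DollarValue}\n";
```

Numeric columns such as 4,410.00 still contain commas at this point; they would need stripping (for example with tr/,//d) before any arithmetic or a numeric database column.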
Thanks
-Shawn
I apologize for the width of the record; I couldn't think of a better way to post it.
Last edited by smaida; 09-07-2004 at 06:53 PM..
09-01-2004, 07:24 AM
#2
Member
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 505
I would use something like...
Code:
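while (my $line = <DATA>) {
    chomp $line;                   # chomp removes only the newline; chop removes any last character
    my @words = split ' ', $line;  # split on runs of whitespace
    # ... inspect @words here to decide what kind of line this is
}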
Then use the contents of the @words array to control the flow of the script.
John
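One way to let the words "control the flow" is a small function that classifies each line from its words, followed by a dispatch on the result. The rules below are a sketch inferred only from the single sample record in the first post (for example, that a contract line has a decimal unit price in its second column and an award line has one in its fifth); the real files may need different tests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify one line of the report by looking at its whitespace-separated words.
# These rules are guesses based only on the sample record in this thread.
sub line_type {
    my ($line) = @_;
    my @words = split ' ', $line;
    return 'blank'    unless @words;
    return 'vendor'   if $words[0] eq 'VENDOR' && grep { $_ eq 'PAGE' } @words;
    return 'contract' if @words >= 2 && $words[1] =~ /^\d+\.\d+$/;   # unit price in column 2
    return 'award'    if @words >= 5 && $words[4] =~ /^\d+\.\d+$/;   # unit price in column 5
    return 'footer'   if @words >= 2 && $words[0] =~ /^\d{2}$/ && $words[1] =~ /^\d{3}/;
    return 'other';   # column headers, "VENDOR SIS", and so on
}

my @sample = (
    'VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01',
    'VENDOR SIS',
    'YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000 POI90809819856 1560007117067 4,410.00 910.00 0',
    '765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B',
    '01 001PART, DESCRIPTION, DATA 02 002TECHNICAL DATA AVAILABILITY:',
);

# Dispatch on the line type; each branch is where the real handling would go.
for my $line (@sample) {
    my $type = line_type($line);
    if    ($type eq 'vendor')   { print "new vendor: ", (split ' ', $line)[1], "\n" }
    elsif ($type eq 'contract') { print "contract:   ", (split ' ', $line)[0], "\n" }
    elsif ($type eq 'award')    { print "history:    ", (split ' ', $line)[0], "\n" }
    elsif ($type eq 'footer')   { print "footer:     $line\n" }
}
```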
09-07-2004, 07:05 PM
#3
Member
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62
Perl Hash Help
OK, at this point I am feeling rather stupid and lost... A few wonderful folks at PerlMonks have shown me how to do this, but it's not sinking in.
Once again, here's the issue at hand: I have a text file (actually 45 of them), each 65,000 pages long. Why? I don't know.
Anyway, they are layered like this:
Code:
Vendor
Contract   # The contract line may repeat several times before moving on
           # to the history. If the contract numbers differ, they all
           # need to reference the same lines of history.
History    # History repeats multiple times before moving on to the footer.
Footer
Any given piece of information above may repeat at some point in the text file. I'm not really worried about that until I have to send this stuff to the database.
The code below is what I have. It's not pretty, nor is it right. I need to build the data structure as I loop through the file. I'm thinking a hash of vendors, each with a hash of contracts, plus an array of histories and a hash of footers?
Code:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;            # not used yet; needed later for the database inserts
use Data::Dumper;

my $file = "PACE_AUG.txt";

# Keyed by vendor ID, so repeated page headers for the same vendor share one entry:
#   vendor ID => { contracts => [ {...}, ... ], awards => [ {...}, ... ], footer => [ "...", ... ] }
my %vendors;
my $vendor;    # vendor ID of the record currently being read

# Read one line at a time instead of slurping a 65,000-page file into an array
open(my $in, '<', $file) or die "Cannot open $file: $!";
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines

    if ($line =~ /VENDOR.+PAGE/) {
        $vendor = (split ' ', $line)[1];
        $vendors{$vendor} ||= { contracts => [], awards => [], footer => [] };
        next;
    }
    next unless defined $vendor;    # skip anything before the first vendor header

    if ($line =~ /\s+\S{17}\s+\S+\./) {
        my %contract;
        @contract{qw(ContractNumber VendorPrice BasePrice Qty UI ShipDate PRNumber
                     NSNNumber DollarValue DollarVariance PercentVariance)} = split ' ', $line;
        # Push a new hash for every line instead of overwriting one global %contracts
        push @{ $vendors{$vendor}{contracts} }, \%contract;
    }
    elsif ($line =~ /^\s+\S{13}\s+\S+\s+\S/) {
        $line =~ s/^\s+//;
        my %award;
        @award{qw(PIIN FSCM NA UI UnitPrice AwdDT QTY OPTDT FOB REP TYPE)} =
            unpack "A21 A9 A9 A2 A13 A8 A9 A9 A4 A5 A6", $line;
        push @{ $vendors{$vendor}{awards} }, \%award;
    }
    else {
        $line =~ s/^\s+//;
        push @{ $vendors{$vendor}{footer} }, $line if $line =~ /^\d{2}\s\d{3}/;
    }
}
close($in);

$Data::Dumper::Sortkeys = 1;
print Dumper \%vendors;
Last edited by smaida; 09-07-2004 at 07:08 PM..
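For the layout described in this post, where several contract lines can share the history lines that follow them, one possible structure is a hash keyed by vendor ID whose values are arrays of "groups", each group holding its contracts, history and footer lines. This is an assumption-laden sketch rather than a drop-in replacement: it classifies lines with simple rules inferred from the sample record, keeps only a couple of fields per line, and runs on a trimmed copy of the sample in which the second contract line and second history line are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Result: vendor ID => [ group, group, ... ], where each group is
#   { contracts => [...], history => [...], footer => [...] }
# and every contract in a group shares that group's history lines.
sub build_structure {
    my ($fh) = @_;
    my (%vendors, $vendor, $group);
    while (my $line = <$fh>) {
        chomp $line;
        my @w = split ' ', $line;
        next unless @w;                                         # blank line
        if ($w[0] eq 'VENDOR' && grep { $_ eq 'PAGE' } @w) {
            # The header repeats on every page; only a new vendor closes the open group.
            $group  = undef if !defined $vendor || $vendor ne $w[1];
            $vendor = $w[1];
        }
        elsif (!defined $vendor) {
            next;                                               # nothing before the first header
        }
        elsif (@w >= 2 && $w[1] =~ /^\d+\.\d+$/) {              # contract line
            # A contract seen after history or footer lines starts a new group.
            if (!$group || @{ $group->{history} } || @{ $group->{footer} }) {
                $group = { contracts => [], history => [], footer => [] };
                push @{ $vendors{$vendor} }, $group;
            }
            push @{ $group->{contracts} }, { number => $w[0], qty => $w[3] };
        }
        elsif ($group && @w >= 5 && $w[4] =~ /^\d+\.\d+$/) {    # award history line
            push @{ $group->{history} }, { piin => $w[0], unit_price => $w[4] };
        }
        elsif ($group && $w[0] =~ /^\d{2}$/ && @w >= 2 && $w[1] =~ /^\d{3}/) {   # footer line
            push @{ $group->{footer} }, $line;
        }
    }
    return \%vendors;
}

# Trimmed copy of the sample record. The second contract line and the second
# history line are made up, only to show two contracts sharing one history block.
my $sample = <<'END';
VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01
VENDOR SIS
YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000 POI90809819856 1560007117067 4,410.00 910.00 0
XX00HYPOTHETICAL01 12.00000 10.00000 5 EA 00000000 POI00000000000 0000000000000 60.00 10.00 0
765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B
000HYPOTHET01 0AAAA N EA 11.00000 93012 5 00000 2 Y B
03 003
END

# Reading from a string through an in-memory filehandle; a real run would open the file instead.
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my $vendors = build_structure($fh);
close($fh);

$Data::Dumper::Sortkeys = 1;
print Dumper($vendors);
```

Because every contract in a group points at the same history array, the database step could insert each history block once and link every contract in the group to it.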
09-09-2004, 11:00 AM
#4
Member
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 505
Hi,
Debugging this for you isn't just a matter of a few minutes. The best thing you can do is get hold of a good Perl tutorial (search Google for one) and work through your code.
Get it to work on one file first. It will be fairly easy to put that code into a loop to take care of all files after that.
Your basic idea seems OK, but you'll need to think about your data structure; I guess you will be using hashes of arrays or some such. It's not trivial, but it's not impossible either.
Sorry I couldn't give more concrete help.
John
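Following the advice to get one file working first: once a single-file parser exists, covering all 45 files is just a loop over their names. In this sketch, parse_file() is a hypothetical stand-in that only counts vendor header lines per vendor; the real parsing code would go in its place. File names come from the command line, so something like perl parse.pl PACE_*.txt (a guessed naming pattern based on PACE_AUG.txt) would process every file in one run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-file parser: it only counts vendor header lines per vendor ID,
# standing in for whatever parsing code already works on a single file.
sub parse_file {
    my ($path, $totals) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        $totals->{$1}++ if $line =~ /^\s*VENDOR\s+(\S+).*\bPAGE\b/;
    }
    close($fh);
}

# Run the single-file parser over every file, accumulating into one structure.
sub parse_all {
    my @paths = @_;
    my %totals;
    parse_file($_, \%totals) for @paths;
    return \%totals;
}

if (@ARGV) {
    my $totals = parse_all(@ARGV);
    print "$_: $totals->{$_} header line(s)\n" for sort keys %$totals;
}
```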
09-09-2004, 05:04 PM
#5
Member
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62
John,
Thanks for the help. I have been reading the tutorials on complex data structures and almost have this thing working. I didn't really want someone to write the code; I was trying to find out how to insert an array into a hash of hashes and how to insert a hash into a hash of hashes. I was a little frustrated when I made the post (sorry). The parsing works fine, and I actually just went ahead and built all of the SQL statements into the code and inserted the data one step at a time. I just know that it would be much more efficient to build the data structure first.
As always
Thanks for the help....
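On the question of putting an array or a hash inside a hash of hashes: a hash value can only be a scalar, so what gets stored is a reference to the array or hash. A minimal sketch using values from the sample record (HYPOTHETICAL-PIIN is a made-up placeholder):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %vendors;    # hash of hashes: vendor ID => { history => [...], contracts => {...} }

# An array inside a hash of hashes: store an array reference.
my @history = ('765WTY34TF56A');
$vendors{61125}{history} = [@history];    # [ ... ] copies the list into a new anonymous array

# A hash inside a hash of hashes: store a hash reference.
my %contract = (ContractNumber => 'YT67DY7898DUFT5126', Qty => 50);
$vendors{61125}{contracts}{ $contract{ContractNumber} } = { %contract };    # { ... } copies the hash

# Adding to the stored array later; push also autovivifies the array if it doesn't exist yet.
push @{ $vendors{61125}{history} }, 'HYPOTHETICAL-PIIN';

# Reading everything back out.
for my $piin (@{ $vendors{61125}{history} }) {
    print "history: $piin\n";
}
print "qty: $vendors{61125}{contracts}{YT67DY7898DUFT5126}{Qty}\n";
```

Copying with [ @history ] or { %contract } matters when the same variable is reused across loop iterations: storing \@history from a variable declared outside the loop would make every entry point at the same array. Declaring a fresh my variable inside the loop body and storing a reference to it is equally safe, because each iteration then creates a new array or hash.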
09-13-2004, 05:33 AM
#6
Member
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 505
No problem, it wasn't much help.
I find the Perl Cookbook (link in my previous post) very useful for examples of how to do tricky things like hashes of hashes.
Good luck!
John
All times are GMT -5.