08-31-2004, 07:46 PM
|
#1
|
Member
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62
Rep:
|
Perl Hash Help
Hello,
I have to parse several large text files and enter the results into a database. Each text file is 65 to 70 thousand pages long. I need a jumpstart getting the text into arrays or hashes, and then I think I can take it from there. I used sed to remove garbage from the file, but I am unsure where to go from there. Any help would be greatly appreciated.
Here is a sample record from the file:
Code:
VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01
VENDOR SIS
UNIT BASE SHIP TOT DOL DOLLAR PERCENT
CONTRACT NUMBER PRICE PRICE QTY U/I DATE PR NUMBER BIN/PART NUMBER VALUE VARIANCE VARIANCE
YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000 POI90809819856 1560007117067 4,410.00 910.00 0
AWARD HISTORY PIIN BSCM N/A U/I UNIT PRICE AWD DT QTY OPT DT FOB REP TYPE
765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B
PID DATA LINE NR LINE NR
01 001PART, DESCRIPTION, DATA 02 002TECHNICAL DATA AVAILABILITY:
03 003
The above record format repeats until EOF. The award history section repeats an undefined number of times for each main record.
Thanks
-Shawn
I apologize for the width of the record -- couldn't think of a better way to post it.
Last edited by smaida; 09-07-2004 at 05:53 PM.
|
|
|
09-01-2004, 06:24 AM
|
#2
|
Member
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 507
Rep:
|
I would use something like...
Code:
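while (<DATA>) {
    chomp;            # remove the trailing newline (chop would remove any last character)
    @words = split;   # split the current line on whitespace
}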
And use the contents of the words array to control the flow of the script.
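A rough sketch of that idea follows. The sample lines and the %count tally are made up for illustration, and a real script would read from the report file rather than from an in-memory list:

```perl
use strict;
use warnings;

# Hypothetical sample lines, loosely modelled on the record in post #1.
my @lines = (
    "VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01",
    "",
    "YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000",
    "01 001PART, DESCRIPTION, DATA",
);

my %count;    # how many lines of each kind were seen
for my $line (@lines) {
    chomp $line;
    my @words = split ' ', $line;    # split on whitespace, ignoring leading blanks
    next unless @words;              # skip empty lines

    # Let the first word decide which branch handles the line.
    if    ($words[0] eq 'VENDOR')  { $count{vendor}++ }
    elsif ($words[0] =~ /^\d{2}$/) { $count{footer}++ }
    else                           { $count{other}++ }
}

print "$_: $count{$_}\n" for sort keys %count;
```

In a real parser each branch would fill in a record instead of bumping a counter.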
John
|
|
|
09-07-2004, 06:05 PM
|
#3
|
Member
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62
Original Poster
Rep:
|
Perl Hash Help
Ok, at this point I am feeling rather stupid and lost.... I have had a few wonderful folks at perlmonks show me how to do this, but it's not sinking in.
Once again, here's the issue at hand... I have a text file (actually 45 of them), each 65,000 pages long. Why? I don't know.
Anyway, they are layered as such:
Code:
Vendor
Contract #(Contract may repeat here several times before moving on to
#history. The contract numbers may be different and if so they
# need to reference the same lines of history)
History (History will repeat multiple times before moving on to footer)
Footer
Any given piece of information above may repeat at some point in the text file. I'm not really worried about that until I have to send this stuff to the database.
The code below is what I have. It's not pretty, nor is it right. I need to build the data structure as I loop through. I would think a hash of vendors, each with a hash of contracts, with an array of histories and a hash of footers?
Code:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Data::Dumper;

my ($header, $history, $footer);
my $contract = 0;
my @FILE;
my @VENDORS;
my @CONTRACTS;
my @FOOTER;
my @fields;
my $true = 1;
my $false = 0;
my ($vendor,$i);
my $file = "PACE_AUG.txt";
my $vendor_id = 0;
my %contracts;
my %vendors;
my %awards;
my $contract_id = 0;
my $dbh;
my $key = 0;

# Read the whole report into memory; stop if the file cannot be opened.
open (INFILE, '<', $file) or die "Cannot open $file: $!";
@FILE = <INFILE>;
close (INFILE);

foreach (@FILE){
    chomp;
    next if /^$/;
    # Page header line: carries the vendor number.
    if (/VENDOR.+PAGE/){
        @fields = split;
        $vendor = $fields[1];
        $key++; #Increment Vendor Counter
        $vendors{$key} = $fields[1];
        next;
    }
    # Contract line: whitespace-separated fields.
    elsif (/\s+\S{17}\s+\S+\./){
        @fields = split;
        %contracts = (
            "ContractNumber" => $fields[0],
            "VendorPrice" => $fields[1],
            "BasePrice" => $fields[2],
            "Qty" => $fields[3],
            "UI" => $fields[4],
            "ShipDate" => $fields[5],
            "PRNumber" => $fields[6],
            "NSNNumber" => $fields[7],
            "DollarValue" => $fields[8],
            "DollarVariance" => $fields[9],
            "PercentVariance" => $fields[10],
        );
        $contract++;
        next;
    }
    # Award history line: fixed-width fields.
    elsif (/^\s+\S{13}\s+\S+\s+\S/){
        $_ =~ s/^\s*//;
        my @fields = unpack "A21 A9 A9 A2 A13 A8 A9 A9 A4 A5 A6", $_;
        %awards = (
            "PIIN" => $fields[0],
            "FSCM" => $fields[1],
            "NA" => $fields[2],
            "UI" => $fields[3],
            "UnitPrice" => $fields[4],
            "AwdDT" => $fields[5],
            "QTY" => $fields[6],
            "OPTDT" => $fields[7],
            "FOB" => $fields[8],
            "REP" => $fields[9],
            "TYPE" => $fields[10],
        );
        next;
    }
    # Anything else: keep only the footer (PID data) lines.
    else{
        $_ =~ s/^\s*//;
        if (/^\d{2}\s\d{3}/){
            push (@FOOTER, "$_\n");
        }
    }
}
print Dumper \%vendors;
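One thing the loop above does not yet do is keep each record: %contracts and %awards are overwritten on every matching line. A minimal sketch of one way to build the nested structure while looping is shown here. It assumes the line has already been classified and split, so the @records list, the field names kept, and the $new_group flag are all made up for illustration, and footers are left out:

```perl
use strict;
use warnings;
use Data::Dumper;

# Simplified, made-up records: [ type, fields... ]. In the real script these
# would come from the regex branches of the parsing loop.
my @records = (
    [ vendor   => '61125' ],
    [ contract => 'CONTRACT-A', 88.2 ],
    [ contract => 'CONTRACT-B', 70.0 ],
    [ award    => 'PIIN-1', 39.55 ],
    [ award    => 'PIIN-2', 41.00 ],
    [ vendor   => '70001' ],
    [ contract => 'CONTRACT-C', 12.5 ],
    [ award    => 'PIIN-3', 9.99 ],
);

my %vendors;          # vendor number => { contracts => { number => {...} } }
my $vendor;           # vendor number currently being filled in
my $awards;           # array ref shared by the current group of contracts
my $new_group = 1;    # true when the next contract should start a fresh award list

for my $rec (@records) {
    my ($type, @fields) = @$rec;

    if ($type eq 'vendor') {
        $vendor = $fields[0];
        $vendors{$vendor} ||= { contracts => {} };
        $new_group = 1;
    }
    elsif ($type eq 'contract') {
        if ($new_group) {
            $awards    = [];    # new anonymous array for this group
            $new_group = 0;
        }
        # Store a reference to a new anonymous hash instead of reusing one hash.
        $vendors{$vendor}{contracts}{ $fields[0] } = {
            VendorPrice => $fields[1],
            awards      => $awards,    # same reference for every contract in the group
        };
    }
    elsif ($type eq 'award') {
        push @$awards, { PIIN => $fields[0], UnitPrice => $fields[1] };
        $new_group = 1;    # a contract seen after award lines begins a new group
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper \%vendors;
```

Because consecutive contracts hold the same array reference, award lines pushed later show up under every contract in that group, which is one way to make different contract numbers reference the same lines of history.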
Last edited by smaida; 09-07-2004 at 06:08 PM.
|
|
|
09-09-2004, 10:00 AM
|
#4
|
Member
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 507
Rep:
|
Hi,
Debugging this for you isn't just a matter of a few minutes. The best thing you can do is get hold of a good Perl tutorial (google) and go down through your code.
Get it to work on one file first. It will be fairly easy to put that code into a loop to take care of all files after that.
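As a sketch of what that loop might look like: parse_file() below is a hypothetical stand-in for the working single-file parser (here it only counts vendor header lines), and the two small temporary files stand in for the real reports.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# parse_file() is a hypothetical stand-in for the single-file parser; it only
# counts vendor header lines so that the sketch stays short.
sub parse_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $vendors = 0;
    while (my $line = <$in>) {    # one line at a time, not the whole file
        $vendors++ if $line =~ /^VENDOR.+PAGE/;
    }
    close($in);
    return $vendors;
}

# Two tiny made-up files stand in for the real reports.
my $dir   = tempdir(CLEANUP => 1);
my @files = map { "$dir/$_" } qw(a.txt b.txt);
for my $path (@files) {
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} "VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1\nVENDOR SIS\n";
    close($out);
}

my %vendors_in;    # file path => number of vendor headers found
for my $file (@files) {
    $vendors_in{$file} = parse_file($file);
}
printf "%s: %d\n", $_, $vendors_in{$_} for sort keys %vendors_in;
```

In the real script the file list could come from @ARGV instead of being generated.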
Your basic idea seems ok, but you'll need to think about your data structure - I guess you will be using hashes of arrays or somesuch. It's not trivial but not impossible either.
Sorry I couldn't give more concrete help.
John
|
|
|
09-09-2004, 04:04 PM
|
#5
|
Member
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62
Original Poster
Rep:
|
John,
Thanks for the help. I have been reading the tutorials on complex data structures and almost have this thing working. I didn't really want someone to write the code. I was trying to find out how to insert an array into a hash of hashes and how to insert a hash into a hash of hashes. I was a little frustrated when I made the post (sorry). The parsing works fine, and I actually just went ahead and built all of the SQL statements into the code and inserted the data one step at a time. I just know that it would be much more efficient to build the data structure first.
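For reference, the two insertions asked about can be sketched like this. The keys and values are made up for illustration:

```perl
use strict;
use warnings;

my %hoh;    # hash of hashes, keyed by made-up vendor and contract numbers

# Insert a hash: store a reference to an anonymous copy of it.
my %contract = (Qty => 50, UI => 'EA');
$hoh{61125}{'CONTRACT-A'} = { %contract };

# Insert an array: likewise, store a reference to an anonymous copy.
my @history = ('award 1', 'award 2');
$hoh{61125}{'CONTRACT-A'}{history} = [ @history ];

# Append to the nested array later by dereferencing it.
push @{ $hoh{61125}{'CONTRACT-A'}{history} }, 'award 3';

# Read values back out.
my $qty   = $hoh{61125}{'CONTRACT-A'}{Qty};
my $count = scalar @{ $hoh{61125}{'CONTRACT-A'}{history} };
print "qty=$qty, history entries=$count\n";    # prints "qty=50, history entries=3"
```

Copying with { %contract } and [ @history ] means later changes to the original variables do not affect what was stored; storing \%contract or \@history instead would share the data.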
As always
Thanks for the help....
|
|
|
09-13-2004, 04:33 AM
|
#6
|
Member
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 507
Rep:
|
No problem, it wasn't much help.
I find the Perl Cookbook (link in my previous post) very useful for examples of how to do tricky stuff like hashes of hashes and so on.
Good luck!
John
|
|
|
All times are GMT -5.