LinuxQuestions.org
Old 08-31-2004, 07:46 PM   #1
smaida
Member
 
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62

Rep: Reputation: 15
Perl Hash Help


Hello,

I have to parse several large text files and enter the results into a database. Each text file is 65 to 70 thousand pages long. I need a jumpstart getting the text into arrays or hashes, and then I think I can take it from there. I used sed to remove garbage from the file, but I am unsure where to go next. Any help would be greatly appreciated.

Here is a sample record from the file:

Code:
 VENDOR  61125                       TOTAL DOLLAR VAR  77,097.60                     PAGE    1   2003 08 01

                           VENDOR        SIS
                             UNIT       BASE                  SHIP                                     TOT DOL   DOLLAR    PERCENT
   CONTRACT NUMBER          PRICE      PRICE     QTY  U/I     DATE       PR NUMBER    BIN/PART NUMBER    VALUE  VARIANCE  VARIANCE

   YT67DY7898DUFT5126      88.20000     70.00000      50  EA   00000000  POI90809819856    1560007117067    4,410.00     910.00     0


    AWARD HISTORY   PIIN                BSCM   N/A      U/I   UNIT PRICE  AWD DT      QTY   OPT DT  FOB  REP   TYPE

                    765WTY34TF56A        7J777    N        EA     39.55000   93012      147    00000   2    Y     B

   PID  DATA   LINE NR                                                     LINE NR
                 01 001PART, DESCRIPTION, DATA                            02 002TECHNICAL DATA AVAILABILITY:
                 03 003
The above record format repeats until EOF. The award history section repeats an undefined number of times for each main record.

Thanks
-Shawn

I apologize for the width of the record. I couldn't think of a better way to post it.

Last edited by smaida; 09-07-2004 at 05:53 PM.
 
Old 09-01-2004, 06:24 AM   #2
jkobrien
Member
 
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 507

Rep: Reputation: 30
I would use something like...

Code:
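while (<DATA>) {
  chomp;                # strip the trailing newline only (chop removes any last character)
  my @words = split;    # split $_ on whitespace, ignoring leading blanks
}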
And use the contents of the words array to control the flow of the script.
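As a minimal sketch of what "control the flow" could look like: the first couple of words on a line are often enough to tell which kind of line it is. The sub name `classify_line` and the labels it returns are hypothetical, and the patterns are guesses based only on the sample record posted above, so they would need checking against the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify one line by its leading words.
# classify_line and its labels are illustrative names, not part of any module.
sub classify_line {
    my ($line) = @_;
    my @words = split ' ', $line;    # ' ' also discards leading whitespace
    return 'blank' unless @words;

    # Page header in the sample: "VENDOR 61125 ... PAGE 1 ..."
    return 'vendor'
        if @words > 1 && $words[0] eq 'VENDOR' && $words[1] =~ /^\d+$/;

    # Contract row in the sample: long alphanumeric id, then a decimal price
    return 'contract'
        if @words > 1
        && $words[0] =~ /^[A-Z0-9]{13,}$/
        && $words[1] =~ /^\d+\.\d+$/;

    return 'other';
}

my @sample = (
    ' VENDOR  61125      TOTAL DOLLAR VAR  77,097.60      PAGE    1',
    '   YT67DY7898DUFT5126      88.20000     70.00000      50  EA',
    '                           VENDOR        SIS',
);
print classify_line($_), "\n" for @sample;
```

The main loop can then branch on the returned label instead of repeating regexes inline.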

John
 
Old 09-07-2004, 06:05 PM   #3
smaida
Member
 
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62

Original Poster
Rep: Reputation: 15
Perl Hash Help

Ok, at this point I am feeling rather stupid and lost.... I have had a few wonderful folks at perlmonks show me how to do this, but it's not sinking in.

Once again, here's the issue at hand... I have a text file (actually 45 of them), each 65,000 pages long. Why? I don't know.

Anyway, they are layered as such:
Code:
Vendor
  Contract  #(Contract may repeat here several times before moving on to
                 #history.  The contract numbers may be different and if so they
                 # need to reference the same lines of history)
       History  (History will repeat multiple times before moving on to footer)
   Footer
Any given piece of information above may repeat at some point in the text file. I'm not really worried about that until I have to send this stuff to the database.

The code below is what I have: It's not pretty, nor is it right. I need to build the data structure as I loop through. I would think a hash of vendors with a hash of contracts with an array of histories and a hash of footers?


Code:
#!/usr/bin/perl

use strict;
use warnings;
use DBI;
use Data::Dumper;

my ($header, $history, $footer);
my $contract = 0;
my @FILE;
my @VENDORS;
my @CONTRACTS;
my @FOOTER; 
my @fields;
my $true = 1;
my $false = 0;
my ($vendor,$i);
my $file = "PACE_AUG.txt";
my $vendor_id = 0;
my %contracts;
my %vendors;
my %awards;
my $contract_id = 0;
my $dbh;
my $key = 0;



open (INFILE, '<', $file) or die "Cannot open $file: $!";   # fail loudly if the file is missing
@FILE = <INFILE>;
close (INFILE);


foreach (@FILE){
   chomp;
   next if /^$/;
   if (/VENDOR.+PAGE/){
      @fields = split;
      $vendor = $fields[1];
      $key++; #Increment Vendor Counter
           
      $vendors{$key} = $fields[1];
      next;
   }
   elsif (/\s+\S{17}\s+\S+\./){
      @fields = split;

      %contracts = (
         "ContractNumber"  => $fields[0],
         "VendorPrice"     => $fields[1],
         "BasePrice"       => $fields[2],
         "Qty"             => $fields[3],
         "UI"              => $fields[4],
         "ShipDate"        => $fields[5],
         "PRNumber"        => $fields[6],
         "NSNNumber"       => $fields[7],
         "DollarValue"     => $fields[8],
         "DollarVariance"  => $fields[9],
         "PercentVariance" => $fields[10],
      );
      $contract++;
      next;
   }
   elsif (/^\s+\S{13}\s+\S+\s+\S/){
      $_ =~ s/^\s*//;
      my @fields = unpack "A21 A9 A9 A2 A13 A8 A9 A9 A4 A5 A6", $_;
      %awards = (
         "PIIN"      => $fields[0],
         "FSCM"      => $fields[1],
         "NA"        => $fields[2],
         "UI"        => $fields[3],
         "UnitPrice" => $fields[4],
         "AwdDT"     => $fields[5],
         "QTY"       => $fields[6],
         "OPTDT"     => $fields[7],
         "FOB"       => $fields[8],
         "REP"       => $fields[9],
         "TYPE"      => $fields[10],
      );
      next;
   }
   else{
     $_ =~ s/^\s*//;
     if (/^\d{2}\s\d{3}/){
        push (@FOOTER, "$_\n");
     }
   }
}

print Dumper \%vendors;
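
One thing the code above does not yet do is keep what it parses: %contracts and %awards are overwritten on every matching line. A rough sketch of the nesting described earlier (vendors, each with contracts, each with an array of history lines) might look like the following. The sub name `build_tree` and the event list it takes are assumptions made for this example; in the real script the same logic would sit inside the parsing loop. Footers are left out to keep it short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Result shape:
#   vendor id => { contracts => { number => { fields..., history => [ ... ] } } }
# build_tree is a hypothetical helper; each event is [ type, data ].
sub build_tree {
    my (@events) = @_;
    my %vendors;
    my ($vendor, @group, $seen_history);

    for my $event (@events) {
        my ($type, $data) = @$event;

        if ($type eq 'vendor') {
            $vendor = $vendors{$data} ||= { contracts => {} };
            @group        = ();
            $seen_history = 0;
        }
        elsif ($type eq 'contract') {
            next unless $vendor;    # ignore rows seen before any vendor header
            # A contract that follows history lines starts a new group.
            if ($seen_history) { @group = (); $seen_history = 0; }
            my $number = $data->{ContractNumber};
            $vendor->{contracts}{$number} ||= { %$data, history => [] };
            push @group, $vendor->{contracts}{$number};
        }
        elsif ($type eq 'award') {
            $seen_history = 1;
            # Every contract in the current group shares this history line.
            push @{ $_->{history} }, $data for @group;
        }
    }
    return \%vendors;
}

my $tree = build_tree(
    [ vendor   => '61125' ],
    [ contract => { ContractNumber => 'YT67DY7898DUFT5126', Qty => 50 } ],
    [ award    => { PIIN => '765WTY34TF56A', UI => 'EA' } ],
);
print scalar @{ $tree->{61125}{contracts}{YT67DY7898DUFT5126}{history} }, "\n";
```

Grouping the contracts is what lets several contract numbers reference the same lines of history, as described in the layout above.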

Last edited by smaida; 09-07-2004 at 06:08 PM.
 
Old 09-09-2004, 10:00 AM   #4
jkobrien
Member
 
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 507

Rep: Reputation: 30
Hi,

Debugging this for you isn't just a matter of a few minutes. The best thing you can do is get hold of a good Perl tutorial (google) and go down through your code.

Get it to work on one file first. It will be fairly easy to put that code into a loop to take care of all files after that.
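
A sketch of that idea, assuming the single-file logic has been wrapped in a sub: `parse_file` here is a hypothetical stand-in that only counts vendor page headers, and `parse_all` is an equally hypothetical wrapper that runs it over every file name given on the command line. Reading line by line, rather than slurping into an array, also keeps memory use flat on very large files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_file is a placeholder for the real single-file parser.
# Here it just counts lines that look like a vendor page header.
sub parse_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $vendor_pages = 0;
    while (my $line = <$in>) {    # one line at a time, no slurping
        $vendor_pages++ if $line =~ /VENDOR.+PAGE/;
    }
    close($in);
    return $vendor_pages;
}

# Run the single-file parser over a list of files.
sub parse_all {
    my (@paths) = @_;
    my %pages_per_file;
    $pages_per_file{$_} = parse_file($_) for @paths;
    return \%pages_per_file;
}

# Usage: perl parse.pl PACE_AUG.txt [more files...]
if (@ARGV) {
    my $result = parse_all(@ARGV);
    print "$_: $result->{$_} vendor pages\n" for sort keys %$result;
}
```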

Your basic idea seems OK, but you'll need to think about your data structure. I guess you will be using hashes of arrays or somesuch. It's not trivial, but not impossible either.

Sorry I couldn't give more concrete help.

John
 
Old 09-09-2004, 04:04 PM   #5
smaida
Member
 
Registered: Apr 2004
Location: Richmond, VA - USA
Distribution: Debian
Posts: 62

Original Poster
Rep: Reputation: 15
John,

Thanks for the help. I have been reading the tutorials on complex data structures and almost have this thing working. I didn't really want someone to write the code. I was trying to find out how to insert an array into a hash of hashes and how to insert a hash into a hash of hashes. I was a little frustrated when I made the post (sorry). The parsing works fine, and I actually just went ahead and built all of the SQL statements into the code and inserted the data one step at a time. I just know that it would be much more efficient to build the data structure first.
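
For anyone landing here with the same two questions, a short sketch: both inserts come down to storing a reference under a key. The contract number and first PIIN come from the sample record at the top of the thread; `EXAMPLEPIIN02` is a made-up value for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %contracts;    # hash of hashes: contract number => { field => value }
my $number = 'YT67DY7898DUFT5126';

# Insert a hash: { %row } makes an anonymous copy, so later changes
# to %row do not affect what was stored.
my %row = (Qty => 50, UI => 'EA');
$contracts{$number} = { %row };

# Insert an array: [ @history ] stores a reference to a copy of the array.
my @history = ('765WTY34TF56A');
$contracts{$number}{history} = [ @history ];

# Append to the stored array later by dereferencing it.
push @{ $contracts{$number}{history} }, 'EXAMPLEPIIN02';    # hypothetical PIIN

print "Qty: $contracts{$number}{Qty}\n";
print "History lines: ", scalar @{ $contracts{$number}{history} }, "\n";
```

Using `\%row` or `\@history` instead of the copying forms also works, but then every entry that stores the same reference sees later changes to the original variable, which is an easy mistake inside a loop.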

As always
Thanks for the help....
 
Old 09-13-2004, 04:33 AM   #6
jkobrien
Member
 
Registered: Jun 2003
Location: Dublin, Ireland
Distribution: Slackware, LFS, Ubuntu, RedHat, Slamd64
Posts: 507

Rep: Reputation: 30
No problem, it wasn't much help.

I find the perl cookbook (link in my previous post) very useful for examples of how to do tricky stuff like hashes of hashes and so on.

Good luck!

John
 
  

