LinuxQuestions.org

Parsing large text file with perl (https://www.linuxquestions.org/questions/programming-9/parsing-large-text-file-with-perl-225022/)

smaida 08-31-2004 07:46 PM

Perl Hash Help
 
Hello,

I have to parse several large text files and enter the results into a database. Each text file is 65,000 to 70,000 pages long. I need a jumpstart getting the text into arrays or hashes, and then I think I can take it from there. I used sed to remove garbage from the file, but I am unsure where to go next. Any help would be greatly appreciated.

Here is a sample record from the file:

Code:

VENDOR  61125                      TOTAL DOLLAR VAR  77,097.60                    PAGE    1  2003 08 01

                          VENDOR        SIS
                            UNIT      BASE                  SHIP                                    TOT DOL  DOLLAR    PERCENT
  CONTRACT NUMBER          PRICE      PRICE    QTY  U/I    DATE      PR NUMBER    BIN/PART NUMBER    VALUE  VARIANCE  VARIANCE

  YT67DY7898DUFT5126      88.20000    70.00000      50  EA  00000000  POI90809819856    1560007117067    4,410.00    910.00    0


    AWARD HISTORY  PIIN                BSCM  N/A      U/I  UNIT PRICE  AWD DT      QTY  OPT DT  FOB  REP  TYPE

                    765WTY34TF56A        7J777    N        EA    39.55000  93012      147    00000  2    Y    B

  PID  DATA  LINE NR                                                    LINE NR
                01 001PART, DESCRIPTION, DATA                            02 002TECHNICAL DATA AVAILABILITY:
                03 003

The above record format repeats until EOF. The award history section repeats an undefined number of times for each main record.

Thanks
-Shawn

I apologize for the width of the record. I couldn't think of a better way to post it.

jkobrien 09-01-2004 06:24 AM

I would use something like...

Code:

while (<DATA>) {
  chomp;              # strip the trailing newline (chop removes any last character)
  my @words = split;  # split $_ on whitespace
}

And use the contents of the words array to control the flow of the script.
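
A minimal sketch of that idea: split each line into words and branch on what the words look like. The `classify_line` helper and its category names are hypothetical, and the patterns are guesses based on the sample record posted above, not a tested parser for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what kind of line this is from its words.
sub classify_line {
    my ($line) = @_;
    my @words = split ' ', $line;   # ' ' also skips leading whitespace
    return 'blank'  unless @words;
    # A vendor header starts with VENDOR followed by a numeric vendor ID.
    return 'vendor' if $words[0] eq 'VENDOR' && @words > 1 && $words[1] =~ /^\d+$/;
    # Footer lines start with a two-digit line number.
    return 'footer' if $words[0] =~ /^\d{2}$/;
    return 'other';
}

# Sample lines adapted from the record posted above.
my $sample = <<'END';
VENDOR  61125    TOTAL DOLLAR VAR  77,097.60    PAGE    1  2003 08 01

      VENDOR        SIS
      03 003
END

# Read the sample through an in-memory filehandle, one line at a time.
open(my $in, '<', \$sample) or die "Cannot open sample: $!";
while (my $line = <$in>) {
    chomp $line;
    print classify_line($line), "\n";
}
close($in);
```

The contract and award history lines would need their own branches, most likely keyed on column positions rather than on the first word.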

John

smaida 09-07-2004 06:05 PM

Ok, at this point I am feeling rather stupid and lost.... I have had a few wonderful folks at perlmonks show me how to do this, but it's not sinking in.

Once again, here's the issue at hand... I have a text file (actually 45 of them), each 65,000 pages long. Why? I don't know.

Anyway, they are layered as such:
Code:

Vendor
  Contract  #(Contract may repeat here several times before moving on to
                #history.  The contract numbers may be different and if so they
                # need to reference the same lines of history)
      History  (History will repeat multiple times before moving on to footer)
  Footer

Any given piece of information above may repeat at some point in the text file. I'm not really worried about that until I have to send this stuff to the database.

The code below is what I have. It's not pretty, nor is it right. I need to build the data structure as I loop through. I would think a hash of vendors, with a hash of contracts, with an array of histories and a hash of footers?


Code:

#!/usr/bin/perl

use strict;
use warnings;
use DBI;
use Data::Dumper;

my ($header, $history, $footer);
my $contract = 0;
my @FILE;
my @VENDORS;
my @CONTRACTS;
my @FOOTER;
my @fields;
my $true = 1;
my $false = 0;
my ($vendor,$i);
my $file = "PACE_AUG.txt";
my $vendor_id = 0;
my %contracts;
my %vendors;
my %awards;
my $contract_id = 0;
my $dbh;
my $key = 0;



# Read line by line instead of slurping the whole (very large) file into memory.
open (my $in, '<', $file) or die "Cannot open $file: $!";


while (<$in>){
  chomp;
  next if /^$/;
  if (/VENDOR.+PAGE/){
      @fields = split;
      $vendor = $fields[1];
      $key++; #Increment Vendor Counter
         
      $vendors{$key} = $fields[1];
      next;
  }
  elsif (/\s+\S{17}\s+\S+\./){
      @fields = split;

      %contracts = (
        "ContractNumber"  => $fields[0],
        "VendorPrice"     => $fields[1],
        "BasePrice"       => $fields[2],
        "Qty"             => $fields[3],
        "UI"              => $fields[4],
        "ShipDate"        => $fields[5],
        "PRNumber"        => $fields[6],
        "NSNNumber"       => $fields[7],
        "DollarValue"     => $fields[8],
        "DollarVariance"  => $fields[9],
        "PercentVariance" => $fields[10],
      );
      $contract++;
      next;
  }
  elsif (/^\s+\S{13}\s+\S+\s+\S/){
      $_ =~ s/^\s*//;
      @fields = unpack "A21 A9 A9 A2 A13 A8 A9 A9 A4 A5 A6", $_;
      %awards = (
        "PIIN"      => $fields[0],
        "FSCM"      => $fields[1],
        "NA"        => $fields[2],
        "UI"        => $fields[3],
        "UnitPrice" => $fields[4],
        "AwdDT"     => $fields[5],
        "QTY"       => $fields[6],
        "OPTDT"     => $fields[7],
        "FOB"       => $fields[8],
        "REP"       => $fields[9],
        "TYPE"      => $fields[10],
      );
      next;
  }
  else{
    $_ =~ s/^\s*//;
    if (/^\d{2}\s\d{3}/){
        push (@FOOTER, "$_\n");
    }
  }
}

print Dumper \%vendors;


jkobrien 09-09-2004 10:00 AM

Hi,

Debugging this for you isn't just a matter of a few minutes. The best thing you can do is get hold of a good Perl tutorial (google) and go down through your code.

Get it to work on one file first. It will be fairly easy to put that code into a loop to take care of all files after that.
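
As a rough sketch of that last step, assuming the single-file logic has been moved into a subroutine: `parse_file` and `parse_all` are hypothetical names, and `parse_file` here only counts vendor header lines as a stand-in for the real parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the single-file parser. It only counts
# vendor header lines so that the outer loop has something to collect.
sub parse_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $vendors = 0;
    while (my $line = <$in>) {
        $vendors++ if $line =~ /^\s*VENDOR\s+\d+.*PAGE/;
    }
    close($in);
    return $vendors;
}

# Run the same parser over every file in the list and keep one
# result per file name.
sub parse_all {
    my (@paths) = @_;
    my %result;
    $result{$_} = parse_file($_) for @paths;
    return \%result;
}

# File names come from the command line.
my $totals = parse_all(@ARGV);
print "$_: $totals->{$_} vendor header(s)\n" for sort keys %$totals;
```

Saved under a name of your choosing, this could be run as `perl yourscript.pl PACE_AUG.txt` followed by the other file names.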

Your basic idea seems ok, but you'll need to think about your data structure - I guess you will be using hashes of arrays or somesuch. It's not trivial but not impossible either.

Sorry I couldn't give more concrete help.

John

smaida 09-09-2004 04:04 PM

John,

Thanks for the help. I have been reading the tutorials on complex data structures and almost have this thing working. I didn't really want someone to write the code. I was trying to find out how to insert an array into a hash of hashes and how to insert a hash into a hash of hashes. I was a little frustrated when I made the post (sorry). The parsing works fine, and I actually just went ahead and built all of the SQL statements into the code and inserted the data one step at a time. I just know that it would be much more efficient to try to build the data structure first.
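
For anyone landing here with the same question, a minimal sketch of those two insertions. The layout (`contracts` and `histories` keys under each vendor number) and the helper names `add_contract` and `add_history` are assumptions for illustration. Keeping histories at the vendor level is a simplification of the real record layout. The values come from the sample record in the first post.

```perl
use strict;
use warnings;

# Assumed layout: vendor number => { contracts => {...}, histories => [...] }
my %vendors;

# Insert a hash into a hash of hashes: store a reference to a copy of
# the contract hash, keyed by its contract number.
sub add_contract {
    my ($vendor, %contract) = @_;
    $vendors{$vendor}{contracts}{ $contract{ContractNumber} } = { %contract };
}

# Insert into an array inside a hash of hashes: push a hash reference
# onto the vendor's history list. Perl creates the nested array on
# first use (autovivification).
sub add_history {
    my ($vendor, %award) = @_;
    push @{ $vendors{$vendor}{histories} }, { %award };
}

add_contract(61125, ContractNumber => 'YT67DY7898DUFT5126', Qty => 50, UI => 'EA');
add_history(61125, PIIN => '765WTY34TF56A', UnitPrice => '39.55000');

# Reading the values back out follows the same path of keys and indexes.
print "Qty: $vendors{61125}{contracts}{YT67DY7898DUFT5126}{Qty}\n";
print "First PIIN: $vendors{61125}{histories}[0]{PIIN}\n";
```

Calling these helpers from the parsing branches, instead of overwriting a single `%contracts` or `%awards` hash on every matching line, would keep every record rather than only the last one.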

As always
Thanks for the help....

jkobrien 09-13-2004 04:33 AM

No problem, it wasn't much help.

I find the Perl Cookbook (link in my previous post) very useful for examples of how to do tricky stuff like hashes of hashes and so on.

Good luck!

John


All times are GMT -5.