Perl

Lordandmaker · 01-21-2009, 07:21 AM

I am writing a script to go through a file of blocks of key/value pairs, and convert them into an array of hashes. Each hash should contain one 'block' of data (i.e. split by an empty line), and be composed of the key/value pairs.
The file is as follows:

Code:

Item    Value
Drive   C:
Description     Local Fixed Disk
Compressed      No
File System     NTFS
Size    12.00 GB (12,880,787,968 bytes)
Free Space      4.42 GB (4,745,418,752 bytes)
Volume Name
Volume Serial Number    A4B40FDF

Drive   D:
Description     CD-ROM Disc

Drive   E:
Description     Local Fixed Disk
Compressed      No
File System     NTFS
Size    7.00 GB (7,517,904,896 bytes)
Free Space      3.29 GB (3,534,249,984 bytes)
Volume Name     SYS
Volume Serial Number    BE1F14B0

Drive   F:
Description     Local Fixed Disk
Compressed      No
File System     NTFS
Size    254.21 GB (272,955,916,288 bytes)
Free Space      182.64 GB (196,106,596,352 bytes)
Volume Name     DATA
Volume Serial Number    9448FDDF

Drive   J:
Description     Network Connection
Provider Name   \\jup-svr01\jbwv2

Drive   M:
Description     Network Connection
Provider Name   \\jup-svr02\macshare

Drive   S:
Description     Network Connection
Provider Name   \\jup-svr01\Shared

Drive   U:
Description     Network Connection
Provider Name   \\jup-svr01\users\Avi Greenbury

The code is as follows:

Code:

open REPORT, "$input_file" || die ("Error opening input file at $input_file");
while (<REPORT>){
        if (($_ =~ /^Info/)){
                $_ = "";
        }
        $data.=$_;
}
close REPORT;

@data_blocks=split(/\n\n/, $data);
@information=("");
$record="";
foreach (@data_blocks){
        @lines = split (/\n/, $_);
        foreach (@lines){
                ($key, $value)= split (/[ ]{2,}/, $_);
                $record->{$key} = $value;
                print $key, " = ", $value, "\n";
        }
        print "~ ~ ~ ~ ~\n";
        push @information, $record;
}

#$num_records = $#information;
#for ($i=0; $i<$num_records; $i++){
#       print "\ndrive =", $information[$i]{'Drive'};
#       print $i;
#}

If I run the file as-is, it prints out exactly what I want - a series of key/value pairs, in blocks separated with "~ ~ ~ ~ ~":

Code:

Drive = C:
Description = Local Fixed Disk
Compressed = No
File System = NTFS
Size = 12.00 GB (12,880,787,968 bytes)
Free Space = 4.42 GB (4,745,418,752 bytes)
Volume Name =
Volume Serial Number = A4B40FDF
~ ~ ~ ~ ~
Drive = D:
Description = CD-ROM Disc
~ ~ ~ ~ ~
Drive = E:
Description = Local Fixed Disk
Compressed = No
File System = NTFS
etc....

If I comment out those two print lines and instead run the for loop, it appears I've got the same hash in the array multiple times:

Code:

drive =U:0
drive =U:1
drive =U:2
drive =U:3
drive =U:4
drive =U:5
drive =U:6
drive =U:7

I can't see what I've got wrong here - it looks to me like writing to the hashes and pushing them onto the array happen at the correct times? It certainly seems to be the same as any examples I can find on the net. Any ideas?

Telemachos · 01-21-2009, 12:56 PM

I'm not following a bunch of your choices, but they may have good reason. (For example, why stuff everything into the scalar $data only to split it again later?) I'm also heading out, so this is quick, but I think the key problem is how you are getting items into the %record hash (and what is ending up in the @information array).

Here's a quick stab from me with output (from your file). I removed the print statements and substituted a printout of the structure of the @information array to help show what gets in there:

Code:

#!/usr/bin/env perl
use warnings;
use strict;

my $data;

while (<>){
    if (($_ =~ /^Info/)){
        $_ = "";
    }
    $data.=$_;
}

my @data_blocks=split(/\n\n/, $data);
my @information;

foreach (@data_blocks){
    my %record;                          # Initialize an empty hash for each chunk
    my @lines = split (/\n/, $_);

    foreach (@lines){
        my ($key, $value)= split (/[ ]{2,}/, $_);
        $record{$key} = $value;          # Load up the hash for each line
                                         # No arrow - not dereferencing anything here
    }

    push @information, \%record;         # Push the hash reference for each chunk into @information
}

use Data::Dumper;
print Dumper \@information;

Output (using your file as 'input'):

Code:

hektor ~ $ ./AoH input 
$VAR1 = [
          {
            'Item' => 'Value',
            'Free Space' => '4.42 GB (4,745,418,752 bytes)',
            'Volume Name' => undef,
            'Drive' => 'C:',
            'Volume Serial Number' => 'A4B40FDF',
            'File System' => 'NTFS',
            'Size' => '12.00 GB (12,880,787,968 bytes)',
            'Description' => 'Local Fixed Disk',
            'Compressed' => 'No'
          },
          {
            'Description' => 'CD-ROM Disc',
            'Drive' => 'D:'
          },
          {
            'Free Space' => '3.29 GB (3,534,249,984 bytes)',
            'Volume Name' => 'SYS',
            'Drive' => 'E:',
            'File System' => 'NTFS',
            'Volume Serial Number' => 'BE1F14B0',
            'Size' => '7.00 GB (7,517,904,896 bytes)',
            'Description' => 'Local Fixed Disk',
            'Compressed' => 'No'
          },
          {
            'Free Space' => '182.64 GB (196,106,596,352 bytes)',
            'Volume Name' => 'DATA',
            'Drive' => 'F:',
            'File System' => 'NTFS',
            'Volume Serial Number' => '9448FDDF',
            'Size' => '254.21 GB (272,955,916,288 bytes)',
            'Description' => 'Local Fixed Disk',
            'Compressed' => 'No'
          },
          {
            'Provider Name' => '\\\\jup-svr01\\jbwv2',
            'Description' => 'Network Connection',
            'Drive' => 'J:'
          },
          {
            'Provider Name' => '\\\\jup-svr02\\macshare',
            'Description' => 'Network Connection',
            'Drive' => 'M:'
          },
          {
            'Provider Name' => '\\\\jup-svr01\\Shared',
            'Description' => 'Network Connection',
            'Drive' => 'S:'
          },
          {
            'Provider Name' => '\\\\jup-svr01\\users\\Avi Greenbury',
            'Description' => 'Network Connection',
            'Drive' => 'U:'
          }
        ];
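
The keys in the dump above come out in Perl's internal hash order, which is why they look shuffled and may differ from one run to the next. A minimal sketch, assuming you want a dump that is stable enough to compare between runs: Data::Dumper's `$Data::Dumper::Sortkeys` and `$Data::Dumper::Indent` settings control this. The two sample records are made up for illustration, shaped like the ones above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the dump looks the same on every run
$Data::Dumper::Sortkeys = 1;
# Use a fixed indent instead of the default column-aligned layout
$Data::Dumper::Indent = 1;

# Hypothetical sample: two records shaped like the ones above
my @information = (
    { 'Drive' => 'C:', 'Description' => 'Local Fixed Disk' },
    { 'Drive' => 'D:', 'Description' => 'CD-ROM Disc' },
);

my $dump = Dumper(\@information);
print $dump;
```

With the keys sorted, 'Description' is always printed before 'Drive' in each record.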

Telemachos · 01-21-2009, 02:59 PM

Quick follow-up: if you want to display the drives, this will work:

Code:

foreach my $record (@information) {
    print "$record->{'Drive'}\n";
}

On the other hand, without more context that really isn't going to give much information. Here's the output:

Code:

hektor ~ $ ./AoH input 
C:
D:
E:
F:
J:
M:
S:
U:

Edit: This seems a little more informative:

Code:

foreach my $number (0..$#information) {
    printf "Record #%d -  %s\n",
    $number + 1, $information[$number]->{'Drive'};
}

Output:

Code:

hektor ~ $ ./AoH input 
Record #1 -  C:
Record #2 -  D:
Record #3 -  E:
Record #4 -  F:
Record #5 -  J:
Record #6 -  M:
Record #7 -  S:
Record #8 -  U:

wje_lq · 01-21-2009, 08:51 PM

Telemachos suggests well. But I'm going to take a different tack on this problem. Rather than provide an alternative suggestion, I'm going to answer the question behind Lordandmaker's question.

The question is: "How do I get the unique drive designations to show, rather than the final one each time?"

The question behind the question is: "What don't I understand about references and about arrays of hashes?" That's the one I wish to address, and I'll do it by starting with Lordandmaker's script.

First I'll make four adjustments to the script as Lordandmaker presented it, to make it easily runnable:

1. I'll put a perl shebang at the beginning. This should vary from system to system, because not all distributions put perl in the same place. Telemachos has an interesting solution to this, but I want to keep this simple and focus on the question at hand.
2. I'll assign a constant input filename.
3. I'll comment out the print statements, and uncomment the final paragraph.
4. I'll tame the output a little by moving the line feed character so that it's at the end of each line, not at the beginning.

Here's the script so far, with the changes applied:

Code:

#!/usr/bin/perl

$input_file="input.txt";

open REPORT, "$input_file" || die ("Error opening input file at $input_file");
while (<REPORT>){
        if (($_ =~ /^Info/)){
                $_ = "";
        }
        $data.=$_;
}
close REPORT;

@data_blocks=split(/\n\n/, $data);
@information=("");
$record="";
foreach (@data_blocks){
        @lines = split (/\n/, $_);
        foreach (@lines){
                ($key, $value)= split (/[ ]{2,}/, $_);
                $record->{$key} = $value;
                # commented out: print $key, " = ", $value, "\n";
        }
        # commented out: print "~ ~ ~ ~ ~\n";
        push @information, $record;
}

# uncommented:

$num_records = $#information;
for ($i=0; $i<$num_records; $i++){
       print "drive =", $information[$i]{'Drive'};
       print "$i\n";
}

So, why doesn't the code behave? Here's where my explaining skills fall down a little, because I'm not intimately familiar with references in Perl, or arrays of hashes for that matter. (I actually didn't use them until you presented this problem, and I may never use them again, because I think this aspect of Perl is too ugly to be allowed to live.) But first I'll show you how to fix the script. Then I'll show you where to find an explanation that should work for you.

To fix the script, I commented out one line and added another. Here's the revised script:

Code:

#!/usr/bin/perl

$input_file="input.txt";

open REPORT, "$input_file" || die ("Error opening input file at $input_file");
while (<REPORT>){
        if (($_ =~ /^Info/)){
                $_ = "";
        }
        $data.=$_;
}
close REPORT;

@data_blocks=split(/\n\n/, $data);
@information=("");
# $record="";
foreach (@data_blocks){
        my $record;
        @lines = split (/\n/, $_);
        foreach (@lines){
                ($key, $value)= split (/[ ]{2,}/, $_);
                $record->{$key} = $value;
                # commented out: print $key, " = ", $value, "\n";
        }
        # commented out: print "~ ~ ~ ~ ~\n";
        push @information, $record;
}

# uncommented:

$num_records = $#information;
for ($i=0; $i<$num_records; $i++){
       print "drive =", $information[$i]{'Drive'};
       print "$i\n";
}

Presto. It works. Here's the explanation. When you're done reading the explanation, the rest of that web page might prove useful to you.
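
For readers without the linked page, here is a hedged sketch of what seems to be going on. In the original script `$record` starts out as the string `""`, and without `use strict` Perl treats `$record->{$key}` as a symbolic reference, meaning the string is taken as the *name* of a hash. Every block therefore writes into the same global hash, and what gets pushed each time is just the empty string. With `my $record;` inside the loop, each block starts from an undefined value and Perl creates a fresh anonymous hash on first use (autovivification). The drive letters below are sample values only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Case 1: $record starts life as a string, as in the original script.
# Under "strict refs" Perl refuses to treat a string as a hash reference.
my $error = '';
{
    my $record = "";
    eval { $record->{'Drive'} = 'C:'; 1 } or $error = $@;
}
print "strict says: $error";

# Case 2: a fresh, undefined $record for every block.
# Dereferencing undef as a hash autovivifies a new anonymous hash each time.
my @information;
foreach my $drive ('C:', 'D:', 'E:') {
    my $record;
    $record->{'Drive'} = $drive;
    push @information, $record;
}

print "$_->{'Drive'}\n" foreach @information;
```

One more thing to watch, as far as I can tell: the revised script still seeds `@information` with an empty string and loops while `$i<$num_records`, where `$num_records` is the last index rather than the element count. That makes the first printed line empty of a drive and stops one short of the final record (U:). Declaring `my @information;` and iterating with `foreach my $record (@information)` sidesteps both.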

Hope this helps.

Lordandmaker · 01-22-2009, 03:21 AM

Quote:

Originally Posted by Telemachos

I'm not following a bunch of your choices, but they may have good reason. (For example, why stuff everything into the scalar $data only to split it again later?) I'm also heading out, so this is quick, but I think the key problem is how you are getting items into the %record hash (and what is ending up in the @information array).

They probably don't have good reason. If you've got time, I'd appreciate whatever you can give as pointers of what it is I'm getting wrong. All of my perl 'knowledge' comes from playing until it works, rather than any actual learning (this is changing, slowly)

The change suggested by yourself and wje worked, as I presume you expected, so thank you for that! The best bit is that I can see /why/ it worked, too.

*wanders off to read yet more perldoc*

chrism01 · 01-22-2009, 04:46 AM

Well, the 2 things I'd point you at (if you don't already know them) are:
http://perldoc.perl.org/ - complete lang docs, inc examples and reasons & tutorials
http://www.perlmonks.org/?node=Tutorials - grouped tutorials at the guru site for Perl

Re Telemachos's comment, which I agree with: normally you'd deal with each line as it came in from the file, instead of saving to a scalar and then processing it again. Imagine you've got a LOT of data; that could create an out-of-memory situation, since you'd have 2 copies of the data in memory...
Also, it's usually a good idea to strip off the newline char immediately after reading, like so:

Code:

while (<>)
{
    chomp();
...
}
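
To make the "deal with each line as it comes in" idea concrete for this thread's data, here is a rough sketch. It assumes a blank line ends a block and that key and value are separated by two or more spaces (the same split used earlier in the thread). The sample text and the in-memory filehandle are only there so the sketch runs on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample input, shaped like the report in the first post
my $sample = <<'END';
Item    Value
Drive   C:
Description     Local Fixed Disk
Volume Name

Drive   D:
Description     CD-ROM Disc
END

# Read from the in-memory string so the sketch runs without a file
open my $in, '<', \$sample or die "Can't open sample: $!";

my @information;
my $record = {};

while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^Item\s/;          # skip the header line

    if ($line =~ /^\s*$/) {              # blank line: the block is finished
        push @information, $record if %$record;
        $record = {};                    # start a brand-new hash
        next;
    }

    my ($key, $value) = split /[ ]{2,}/, $line, 2;
    $record->{$key} = $value;            # value is undef for "Volume Name"
}
push @information, $record if %$record;  # don't lose the last block
close $in;

printf "%d records, first drive %s\n",
    scalar @information, $information[0]{'Drive'};
```

Only one block is held in `$record` at a time, so there is no second copy of the whole file in memory.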

I also prefer named record variables, since the default $_ can be confusing when loops are nested, as can the 'invisible' implied $_, so I'd use this sort of construct:

Code:

    # Open cfg file
    open( CONFIG_FILE, "<$cfg_file" ) or
            die "Can't open cfg file: $cfg_file: $!\n";

    # Process cfg file records
    while ( defined ( $cfg_rec = <CONFIG_FILE> ) )
    {
        # Remove unwanted chars
        chomp $cfg_rec;                 # newline
        $cfg_rec =~ s/#.*//;            # comments
        $cfg_rec =~ s/^\s+//;           # leading whitespace
        $cfg_rec =~ s/\s+$//;           # trailing whitespace

        next unless length($cfg_rec);   # anything left?

        # Split 'key=value' string
        ($key, $value) = split( /\s*=\s*/, $cfg_rec, 2);

        # Assign to global hash, forcing uppercase keys
        $cfg::params{uc($key)} = $value;
    }
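
A detail in the snippet above that is easy to miss is `or die` rather than `|| die`. The first post used `open REPORT, "$input_file" || die (...)`. Because `||` binds more tightly than the comma, that is parsed as `open REPORT, ("$input_file" || die ...)`, so `die` would only run if the file name itself were false, and a failed open goes unnoticed. A small sketch of the difference, using a made-up path that is assumed not to exist:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical path that is assumed not to exist
my $input_file = '/nonexistent/dir/report.txt';

# "||" binds tighter than the comma, so this is really
#   open(my $fh1, '<', ($input_file || die ...))
# and die is never reached for a non-empty file name.
my $died_with_pipes = 0;
eval {
    open my $fh1, '<', $input_file || die "never reached\n";
    1;
} or $died_with_pipes = 1;

# "or" has lower precedence than the whole open call,
# so die runs when open itself fails.
my $died_with_or = 0;
eval {
    open my $fh2, '<', $input_file or die "Can't open $input_file: $!\n";
    1;
} or $died_with_or = 1;

print "with ||: died=$died_with_pipes\n";
print "with or: died=$died_with_or\n";
```

Writing the call with parentheses, `open(my $fh, '<', $file) || die ...`, also works; the point is only that `die` must apply to the result of `open`, not to the file name.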

TIMTOWTDI : There's More Than One Way To Do It - Perl motto

wje_lq · 01-22-2009, 05:19 AM

Quote:

TIMTOWTDI : There's More Than One Way To Do It - Perl motto

The blessing of Perl script creators.

The bane of Perl script maintainers.

Telemachos · 01-22-2009, 07:51 AM

Quote:

Originally Posted by Lordandmaker

If you've got time, I'd appreciate whatever you can give as pointers of what it is I'm getting wrong. All of my perl 'knowledge' comes from playing until it works, rather than any actual learning (this is changing, slowly)

Wje_lq and Chrism covered a lot of ground, but in the spirit of TIMTOWTDI, here's how I might tackle this. (It's hard to get a complete sense of it, since I'm not really sure of the larger goals.)

I've put comments in the code to clarify some of my choices:

Code:

#!/usr/bin/env perl
use strict;                   # Always use strict and warnings
use warnings;                 # This will catch tons of mistakes for you

my $input_file = 'file';      # You can hardcode the file name or use <>

open my $file_handle, '<', $input_file  # Three-argument open; no need to quote $input_file here
    or die "Can't open $input_file: $!";  # Include $! in calls to die since it has the system error value
                                          # which is often helpful

my @data_blocks;                          # Create the empty array - no need to use = ("") or the like

{
    local $/ = "\n\n";                    # Temporarily redefine the $/ value (normally \n), which determines what counts as one "line"
    while (<$file_handle>) {              # Read the file, one "line" at a time
                                          # But a line now is a block
        chomp $_;                         
        push @data_blocks, $_;            # Push each block into @data_blocks
    }
}

my @information;

foreach my $block (@data_blocks) {        # Take one block at a time
    my @lines = split /\n/, $block;       # Split into lines
    my $record = {};                      # Initialize $record as an empty hash reference
    
    foreach my $line (@lines) {
        next if $line =~ m/^Item/;        # Skip the intro line
        next if $line =~ m/^~+$/;         # Skip the lines with only ~
        my ($key, $value) = split /[ ]{2,}/, $line;  # Get key, value pairs
        $record->{$key} = $value;                    # And assign to the hash reference
    }

    push @information, $record;                      # Then push hash ref onto the @information array
}

# For debugging, I often use Data::Dumper to show me the look of the
# data structure I created; uncomment if something goes wrong
#use Data::Dumper;
#print Dumper \@information;

# One way to print out the whole shebang without much fancy formatting
foreach my $record (@information) {
    foreach my $key (keys %$record) {
        if ( defined $record->{$key} ) {
            print "$key => $record->{$key}\n";
        }
        else {
            print "$key => undef\n";
        }
    }
    print "\n";
}

There are a lot of temporary and maybe unnecessary variables here, but this gives you some idea.
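
A small variation on the `$/` trick in the script above, offered as a sketch rather than a correction: setting `$/` to the empty string switches on Perl's "paragraph mode", which treats any run of blank lines as a single separator, and `chomp` then strips all trailing newlines. With `"\n\n"` an extra blank line in the file ends up at the front of the next block. The sample text and the helper name `read_blocks` are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample with an extra blank line between the blocks
my $sample = "Drive   C:\nDescription     Local Fixed Disk\n\n\n"
           . "Drive   D:\nDescription     CD-ROM Disc\n";

# Read the sample with a given record separator and return the blocks
sub read_blocks {
    my ($text, $separator) = @_;
    open my $in, '<', \$text or die "Can't open sample: $!";
    local $/ = $separator;
    my @blocks;
    while (my $block = <$in>) {
        chomp $block;
        push @blocks, $block;
    }
    close $in;
    return @blocks;
}

my @two_newline_blocks = read_blocks($sample, "\n\n");  # exactly two newlines
my @paragraph_blocks   = read_blocks($sample, "");      # paragraph mode

print "second block with \"\\n\\n\" starts with a newline\n"
    if $two_newline_blocks[1] =~ /^\n/;
print "second block in paragraph mode starts with: ",
    (split /\n/, $paragraph_blocks[1])[0], "\n";
```

For a report that always has exactly one blank line between blocks, both settings give the same result.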

sundialsvcs · 01-22-2009, 08:48 AM

I'm looking at this requirement, and I'm thinking awk, or a Perl program that takes the same approach.

Let's take another look at that input-file of yours...

Code:

Drive = C:
Description = Local Fixed Disk
Compressed = No
File System = NTFS
Size = 12.00 GB (12,880,787,968 bytes)
Free Space = 4.42 GB (4,745,418,752 bytes)
Volume Name =
Volume Serial Number = A4B40FDF
~ ~ ~ ~ ~
Drive = D:
Description = CD-ROM Disc
~ ~ ~ ~ ~

Now, the approach that I would take on this (influenced, of course, by awk's approach ... and Perl grew out of awk ...) is to look at the file "one record at a time" and decide (a) how do I recognize each record-type, and (b) having recognized it, what do I want to do with it.

Each record can be identified by the first few characters:

1. At the BEGINning of the job, I want to initialize an empty record into which I'm gonna stuff data.
2. At the END, I'm going to want to make sure that I've written-out the last record.
3. For /^Drive\s*=\s*(.*)$/, I'm going to capture the first phrase ($1, as captured by the parenthesized group (.*) in that regular-expression), and I'm going to remember this as the "Drive."
4. Other types of records are similar, and for Size and Free Space I might want to tear the string apart even more thoroughly to get easy access to the various numbers that are inside.
5. For /^~/ ... a line beginning with "~" ... I know that I'm seeing the end of the current record, so I want to compose it and write it out (as I do at the END).
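
The awk-style recipe above can be sketched in Perl roughly as follows. It assumes input in the `Key = value` form shown in this post, with a `~` line ending each record (that is the earlier script's printed output rather than the raw msinfo32 report). The sample text and the helper name `flush_record` are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample in the "Key = value" form; "~" lines end a record
my $sample = <<'END';
Drive = C:
Size = 12.00 GB (12,880,787,968 bytes)
Volume Name =
~ ~ ~ ~ ~
Drive = D:
Description = CD-ROM Disc
END

open my $in, '<', \$sample or die "Can't open sample: $!";

my @records;
my %current;                      # BEGIN: start with an empty record

# Store a copy of the current record (if any) and reset it
sub flush_record {
    push @records, { %current } if %current;
    %current = ();
}

while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^~/) {                         # end-of-record marker
        flush_record();
    }
    elsif ($line =~ /^Drive\s*=\s*(.*)$/) {      # remember the drive
        $current{'Drive'} = $1;
    }
    elsif ($line =~ /^([^=]+?)\s*=\s*(.*)$/) {   # any other "Key = value"
        $current{$1} = $2;
    }
}
flush_record();                   # END: write out the last record
close $in;

print scalar(@records), " records\n";
```

Pushing `{ %current }` stores a copy of the hash, so resetting `%current` for the next record does not disturb the ones already saved.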

I suggest that you set-aside your Perl program for a few minutes and have a look at the material that's available with the info gawk command. (That's the "GNU awk," which is undoubtedly the implementation you have, and the material's a whole lot friendlier than man awk.)

One of the real "tricks" in this business is knowing what's the best tool for the job ... and conditioning yourself to look for an existing tool before you start to write your own. Even in Perl.

Lordandmaker · 01-22-2009, 09:36 AM

Quote:

Originally Posted by Telemachos

Wje_lq and Chrism covered a lot of ground, but in the spirit of TIMTOWTDI, here's how I might tackle this. (It's hard to get a complete sense of it, since I'm not really sure of the larger goals.)

I should perhaps have included them.
The idea is to generate a report on the hard drive usage of a Windows box. I use msinfo32.exe to produce this report, and this script gets data from it and outputs it to stdout to be piped to mail. Currently the output is:

Code:

Drive   Size    Free    Used    %Free   Name
C:      12.0    4.42    7.58    36.8
E:      7.00    3.29    3.71    47      SYS
F:      254.    182.    72      71.6    DATA

It is to then be put in a loop which goes through all the reports in a directory created by the servers. (I know there are faaar better ways of doing this, like dedicated apps, but my boss has decided they're a bad idea, and the more programming I do myself, the more likely I am to get the company to pay me through a course).
There is a shebang and use strict, warnings and diagnostics at the top.
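
For anyone following along, one way such a report could be produced from the array of hashes built earlier in the thread is sketched below. This is not the poster's actual script. It assumes the Size and Free Space strings always carry the byte count in parentheses, and that "GB" in the msinfo32 output means 2**30 bytes; the helper name `bytes_from` is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: pull the byte count out of a string such as
# "12.00 GB (12,880,787,968 bytes)"; returns undef if there isn't one
sub bytes_from {
    my ($text) = @_;
    return unless defined $text && $text =~ /\(([\d,]+) bytes\)/;
    (my $digits = $1) =~ tr/,//d;     # drop the thousands separators
    return $digits;
}

# Sample records copied from the report in the first post
my @information = (
    { 'Drive' => 'C:', 'Size' => '12.00 GB (12,880,787,968 bytes)',
      'Free Space' => '4.42 GB (4,745,418,752 bytes)' },
    { 'Drive' => 'D:', 'Description' => 'CD-ROM Disc' },
    { 'Drive' => 'E:', 'Size' => '7.00 GB (7,517,904,896 bytes)',
      'Free Space' => '3.29 GB (3,534,249,984 bytes)', 'Volume Name' => 'SYS' },
);

my $gb = 2**30;                        # assumption: "GB" means 2**30 bytes
my @rows;

foreach my $record (@information) {
    my $size = bytes_from($record->{'Size'});
    my $free = bytes_from($record->{'Free Space'});
    next unless $size && defined $free;    # skip drives without size data

    my $name = defined $record->{'Volume Name'} ? $record->{'Volume Name'} : '';
    push @rows, sprintf "%-7s %-7.2f %-7.2f %-7.2f %-7.1f %s",
        $record->{'Drive'}, $size / $gb, $free / $gb,
        ($size - $free) / $gb, 100 * $free / $size, $name;
}

print "Drive   Size    Free    Used    %Free   Name\n";
print "$_\n" foreach @rows;
```

Working from the byte counts rather than the rounded "GB" figures avoids compounding rounding errors in the Used and %Free columns.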

Quote:

# For debugging, I often use Data::Dumper to show me the look of the
# data structure I created; uncomment if something goes wrong
#use Data::Dumper;
#print Dumper \@information;

Is this approximately analogous to PHP's var_dump()?

Quote:

There are a lot of temporary and maybe unnecessary variables here, but this gives you some idea.

That's given me several. I was just trying to work out how to do what I was doing without reading the whole file into another var first, that'll do it. Cheers!

Quote:

Originally Posted by sundialsvcs

I'm looking at this requirement, and I'm thinking awk, or a Perl program that takes the same approach.

I had pondered awk, but for whatever reason lumped with perl...
I might well do this later when I have some spare time as an exercise in awk.

Quote:

I suggest that you set-aside your Perl program for a few minutes and have a look at the material that's available with the info gawk command. (That's the "GNU awk," which is undoubtedly the implementation you have, and the material's a whole lot friendlier than man awk.)

Aha! I'll give that a go. I've tried using man awk before, never occurred to me to try info gawk. Cheers!

Quote:

One of the real "tricks" in this business is knowing what's the best tool for the job ... and conditioning yourself to look for an existing tool before you start to write your own. Even in Perl.

Yeah, I'm _really_ bad at both of these, in real life as well as in computers.