LinuxQuestions.org
Old 01-21-2009, 07:21 AM   #1
Lordandmaker
Member
 
Registered: Sep 2005
Location: London, UK
Distribution: Debian
Posts: 258

Rep: Reputation: 39
Perl - problems creating an array of hashes


I am writing a script to go through a file of blocks of key/value pairs, and convert them into an array of hashes. Each hash should contain one 'block' of data (i.e. split by an empty line), and be composed of the key/value pairs.
The file is as follows:
Code:
Item    Value
Drive   C:
Description     Local Fixed Disk
Compressed      No
File System     NTFS
Size    12.00 GB (12,880,787,968 bytes)
Free Space      4.42 GB (4,745,418,752 bytes)
Volume Name
Volume Serial Number    A4B40FDF

Drive   D:
Description     CD-ROM Disc

Drive   E:
Description     Local Fixed Disk
Compressed      No
File System     NTFS
Size    7.00 GB (7,517,904,896 bytes)
Free Space      3.29 GB (3,534,249,984 bytes)
Volume Name     SYS
Volume Serial Number    BE1F14B0

Drive   F:
Description     Local Fixed Disk
Compressed      No
File System     NTFS
Size    254.21 GB (272,955,916,288 bytes)
Free Space      182.64 GB (196,106,596,352 bytes)
Volume Name     DATA
Volume Serial Number    9448FDDF

Drive   J:
Description     Network Connection
Provider Name   \\jup-svr01\jbwv2

Drive   M:
Description     Network Connection
Provider Name   \\jup-svr02\macshare

Drive   S:
Description     Network Connection
Provider Name   \\jup-svr01\Shared

Drive   U:
Description     Network Connection
Provider Name   \\jup-svr01\users\Avi Greenbury
The code is as follows:

Code:
open REPORT, "$input_file" || die ("Error opening input file at $input_file");
while (<REPORT>){
        if (($_ =~ /^Info/)){
                $_ = "";
        }
        $data.=$_;
}
close REPORT;

@data_blocks=split(/\n\n/, $data);
@information=("");
$record="";
foreach (@data_blocks){
        @lines = split (/\n/, $_);
        foreach (@lines){
                ($key, $value)= split (/[ ]{2,}/, $_);
                $record->{$key} = $value;
                print $key, " = ", $value, "\n";
        }
        print "~ ~ ~ ~ ~\n";
        push @information, $record;
}

#$num_records = $#information;
#for ($i=0; $i<$num_records; $i++){
#       print "\ndrive =", $information[$i]{'Drive'};
#       print $i;
#}
If I run the script as-is, it prints out exactly what I want: a series of key/value pairs, in blocks separated by "~ ~ ~ ~ ~":
Code:
Drive = C:
Description = Local Fixed Disk
Compressed = No
File System = NTFS
Size = 12.00 GB (12,880,787,968 bytes)
Free Space = 4.42 GB (4,745,418,752 bytes)
Volume Name =
Volume Serial Number = A4B40FDF
~ ~ ~ ~ ~
Drive = D:
Description = CD-ROM Disc
~ ~ ~ ~ ~
Drive = E:
Description = Local Fixed Disk
Compressed = No
File System = NTFS
etc....
If I comment out those two print lines and instead run the for loop, it appears I've got the same hash in the array multiple times:
Code:
drive =U:0
drive =U:1
drive =U:2
drive =U:3
drive =U:4
drive =U:5
drive =U:6
drive =U:7
I can't see what I've got wrong here - it looks to me like writing to the hashes and pushing them onto the array happen at the correct times? It certainly seems to be the same as any examples I can find on the net. Any ideas?

Last edited by Lordandmaker; 01-21-2009 at 07:23 AM. Reason: typo
 
Old 01-21-2009, 12:56 PM   #2
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 60

I'm not following a bunch of your choices, but they may have good reason. (For example, why stuff everything into the scalar $data only to split it again later?) I'm also heading out, so this is quick, but I think the key problem is how you are getting items into the %record hash (and what is ending up in the @information array).

Here's a quick stab from me with output (from your file). I removed the print statements and substituted a printout of the structure of the @information array to help show what gets in there:
Code:
#!/usr/bin/env perl
use warnings;
use strict;

my $data;

while (<>){
    if (($_ =~ /^Info/)){
        $_ = "";
    }
    $data.=$_;
}

my @data_blocks=split(/\n\n/, $data);
my @information;

foreach (@data_blocks){
    my %record;                          # Initialize an empty hash for each chunk
    my @lines = split (/\n/, $_);

    foreach (@lines){
        my ($key, $value)= split (/[ ]{2,}/, $_);
        $record{$key} = $value;          # Load up the hash for each line
                                         # No arrow - not dereferencing anything here
    }

    push @information, \%record;         # Push the hash reference for each chunk into @information
}

use Data::Dumper;
print Dumper \@information;
Output (using your file as 'input'):
Code:
hektor ~ $ ./AoH input 
$VAR1 = [
          {
            'Item' => 'Value',
            'Free Space' => '4.42 GB (4,745,418,752 bytes)',
            'Volume Name' => undef,
            'Drive' => 'C:',
            'Volume Serial Number' => 'A4B40FDF',
            'File System' => 'NTFS',
            'Size' => '12.00 GB (12,880,787,968 bytes)',
            'Description' => 'Local Fixed Disk',
            'Compressed' => 'No'
          },
          {
            'Description' => 'CD-ROM Disc',
            'Drive' => 'D:'
          },
          {
            'Free Space' => '3.29 GB (3,534,249,984 bytes)',
            'Volume Name' => 'SYS',
            'Drive' => 'E:',
            'File System' => 'NTFS',
            'Volume Serial Number' => 'BE1F14B0',
            'Size' => '7.00 GB (7,517,904,896 bytes)',
            'Description' => 'Local Fixed Disk',
            'Compressed' => 'No'
          },
          {
            'Free Space' => '182.64 GB (196,106,596,352 bytes)',
            'Volume Name' => 'DATA',
            'Drive' => 'F:',
            'File System' => 'NTFS',
            'Volume Serial Number' => '9448FDDF',
            'Size' => '254.21 GB (272,955,916,288 bytes)',
            'Description' => 'Local Fixed Disk',
            'Compressed' => 'No'
          },
          {
            'Provider Name' => '\\\\jup-svr01\\jbwv2',
            'Description' => 'Network Connection',
            'Drive' => 'J:'
          },
          {
            'Provider Name' => '\\\\jup-svr02\\macshare',
            'Description' => 'Network Connection',
            'Drive' => 'M:'
          },
          {
            'Provider Name' => '\\\\jup-svr01\\Shared',
            'Description' => 'Network Connection',
            'Drive' => 'S:'
          },
          {
            'Provider Name' => '\\\\jup-svr01\\users\\Avi Greenbury',
            'Description' => 'Network Connection',
            'Drive' => 'U:'
          }
        ];
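
To make the "no arrow" comment a bit more concrete, here is a small sketch of the difference between a hash, a reference to that hash, and an anonymous hash reference. The drive letters are made-up sample values, not data from the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %record = ( Drive => 'C:' );    # a plain hash: access with $record{...}
my $ref    = \%record;             # a reference to that same hash
my $anon   = { Drive => 'D:' };    # an anonymous hash, reachable only via its reference

print $record{Drive}, "\n";        # no arrow: %record is a hash, not a reference
print $ref->{Drive},  "\n";        # arrow: dereference $ref, then look up the key
print $anon->{Drive}, "\n";

$ref->{Drive} = 'Z:';              # writing through the reference changes %record itself
print $record{Drive}, "\n";
```

Because `\%record` points at the hash rather than copying it, `my %record` has to sit inside the loop, so that each pass pushes a reference to a different hash.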

Last edited by Telemachos; 01-21-2009 at 12:58 PM.
 
Old 01-21-2009, 02:59 PM   #3
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 60
Quick follow-up: if you want to display the drives, this will work:
Code:
foreach my $record (@information) {
    print "$record->{'Drive'}\n";
}
On the other hand, without more context that really isn't going to give much information. Here's the output:
Code:
hektor ~ $ ./AoH input 
C:
D:
E:
F:
J:
M:
S:
U:
Edit: This seems a little more informative:
Code:
foreach my $number (0..$#information) {
    printf "Record #%d -  %s\n",
    $number + 1, $information[$number]->{'Drive'};
}
Output:
Code:
hektor ~ $ ./AoH input 
Record #1 -  C:
Record #2 -  D:
Record #3 -  E:
Record #4 -  F:
Record #5 -  J:
Record #6 -  M:
Record #7 -  S:
Record #8 -  U:

Last edited by Telemachos; 01-21-2009 at 03:16 PM.
 
Old 01-21-2009, 08:51 PM   #4
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
Telemachos suggests well. But I'm going to take a different tack on this problem. Rather than provide an alternative suggestion, I'm going to answer the question behind Lordandmaker's question.

The question is: "How do I get the unique drive designations to show, rather than the final one each time?"

The question behind the question is: "What don't I understand about references and about arrays of hashes?" That's the one I wish to address, and I'll do it by starting with Lordandmaker's script.

First I'll make four adjustments to the script as Lordandmaker presented it, to make it easily runnable:
  1. I'll put a perl shebang at the beginning. This should vary from system to system, because not all distributions put perl in the same place. Telemachos has an interesting solution to this, but I want to keep this simple and focus on the question at hand.
  2. I'll assign a constant input filename.
  3. I'll comment out the print statements, and uncomment the final paragraph.
  4. I'll tame the output a little by moving the line feed character so that it's at the end of each line, not at the beginning.
Here's the script so far, with those changes applied:
Code:
#!/usr/bin/perl

$input_file="input.txt";

open REPORT, "$input_file" || die ("Error opening input file at $input_file");
while (<REPORT>){
        if (($_ =~ /^Info/)){
                $_ = "";
        }
        $data.=$_;
}
close REPORT;

@data_blocks=split(/\n\n/, $data);
@information=("");
$record="";
foreach (@data_blocks){
        @lines = split (/\n/, $_);
        foreach (@lines){
                ($key, $value)= split (/[ ]{2,}/, $_);
                $record->{$key} = $value;
                # commented out: print $key, " = ", $value, "\n";
        }
        # commented out: print "~ ~ ~ ~ ~\n";
        push @information, $record;
}

# uncommented:

$num_records = $#information;
for ($i=0; $i<$num_records; $i++){
       print "drive =", $information[$i]{'Drive'};
       print "$i\n";
}
So, why doesn't the code behave? Here's where my explaining skills fall down a little, because I'm not intimately familiar with references in Perl, or arrays of hashes for that matter. (I actually didn't use them until you presented this problem, and I may never use them again, because I think this aspect of Perl is too ugly to be allowed to live.) But first I'll show you how to fix the script. Then I'll show you where to find an explanation that should work for you.

To fix the script, I commented out one line and added another. Here's the revised script:
Code:
#!/usr/bin/perl

$input_file="input.txt";

open REPORT, "$input_file" || die ("Error opening input file at $input_file");
while (<REPORT>){
        if (($_ =~ /^Info/)){
                $_ = "";
        }
        $data.=$_;
}
close REPORT;

@data_blocks=split(/\n\n/, $data);
@information=("");
# $record="";
foreach (@data_blocks){
        my $record;
        @lines = split (/\n/, $_);
        foreach (@lines){
                ($key, $value)= split (/[ ]{2,}/, $_);
                $record->{$key} = $value;
                # commented out: print $key, " = ", $value, "\n";
        }
        # commented out: print "~ ~ ~ ~ ~\n";
        push @information, $record;
}

# uncommented:

$num_records = $#information;
for ($i=0; $i<$num_records; $i++){
       print "drive =", $information[$i]{'Drive'};
       print "$i\n";
}
Presto. It works. Here's the explanation. When you're done reading the explanation, the rest of that web page might prove useful to you.
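
As a rough illustration of the underlying idea, here is a stripped-down sketch. It uses made-up data under `use strict` rather than the original script, and the names `@same` and `@distinct` are invented for the example. If every element pushed onto the array is the same reference, every element shows whatever was written last. A reference created fresh inside the loop gives each element its own hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @blocks = ( [ Drive => 'C:' ], [ Drive => 'D:' ], [ Drive => 'E:' ] );

# Shared: one hash reference created once, outside the loop
my $shared = {};
my @same;
for my $pair (@blocks) {
    $shared->{ $pair->[0] } = $pair->[1];   # overwrites the same hash each time
    push @same, $shared;                    # pushes the same reference again
}

# Fresh: a new hash reference on every pass through the loop
my @distinct;
for my $pair (@blocks) {
    my $record = {};
    $record->{ $pair->[0] } = $pair->[1];
    push @distinct, $record;
}

print join( ' ', map { $_->{Drive} } @same ),     "\n";   # E: E: E:
print join( ' ', map { $_->{Drive} } @distinct ), "\n";   # C: D: E:
```

Two leftovers from the original script are worth a look as well: `@information=("")` seeds the array with an empty string as element 0, and because `$#information` is already the last index, the test `$i<$num_records` stops one element short.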

Hope this helps.
 
Old 01-22-2009, 03:21 AM   #5
Lordandmaker
Member
 
Registered: Sep 2005
Location: London, UK
Distribution: Debian
Posts: 258

Original Poster
Rep: Reputation: 39

Quote:
Originally Posted by Telemachos
I'm not following a bunch of your choices, but they may have good reason. (For example, why stuff everything into the scalar $data only to split it again later?) I'm also heading out, so this is quick, but I think the key problem is how you are getting items into the %record hash (and what is ending up in the @information array).
They probably don't have good reason. If you've got time, I'd appreciate whatever you can give as pointers of what it is I'm getting wrong. All of my perl 'knowledge' comes from playing until it works, rather than any actual learning (this is changing, slowly)

The change suggested by yourself and wje worked, as I presume you expected, so thank you for that! The best bit is that I can see /why/ it worked, too.

*wanders off to read yet more perldoc*

Last edited by Lordandmaker; 01-22-2009 at 03:22 AM.
 
Old 01-22-2009, 04:46 AM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,362

Rep: Reputation: 2751
Well, the 2 things I'd point you at (if you don't already know them) are:
http://perldoc.perl.org/ - complete lang docs, inc examples and reasons & tutorials
http://www.perlmonks.org/?node=Tutorials - grouped tutorials at the guru site for Perl

Re Telemachos' comment, which I agree with: normally you'd deal with each line as it comes in from the file, instead of saving everything to a scalar and then processing it again. Imagine you've got a LOT of data: that could create an out-of-memory situation, since you'd have 2 copies of the data in memory...
Also, it's usually a good idea to strip the newline char immediately after reading, like so:

Code:
while (<>)
{
    chomp();
...
}
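
For the drive report in this thread, handling each line as it arrives might look roughly like the sketch below. It reads from an in-memory sample instead of the real file so that it runs on its own. The sample text and the variable names are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up two-block sample in the same layout as the report
my $sample = <<'END';
Drive   C:
Description     Local Fixed Disk

Drive   D:
Description     CD-ROM Disc
END

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";

my @information;
my $record = {};
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^\s*$/ ) {                    # a blank line closes the current block
        push @information, $record if %$record;
        $record = {};                            # start a fresh hash for the next block
        next;
    }
    my ( $key, $value ) = split /[ ]{2,}/, $line, 2;
    $record->{$key} = $value;
}
push @information, $record if %$record;          # the last block has no blank line after it
close $fh;

print scalar(@information), " records\n";
print "$_->{Drive} $_->{Description}\n" for @information;
```

Only one line and one record are held at a time, so the whole file never needs to sit in a scalar.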
I also prefer named record variables, as the default $_ can be confusing when loops are nested, and the same goes for the 'invisible' implied $_, so I'd use this sort of construct:
Code:
    # Open cfg file (three-argument open with a lexical filehandle)
    open( my $cfg_fh, '<', $cfg_file ) or
            die "Can't open cfg file: $cfg_file: $!\n";

    # Process cfg file records
    while ( defined ( my $cfg_rec = <$cfg_fh> ) )
    {
        # Remove unwanted chars
        chomp $cfg_rec;                 # newline
        $cfg_rec =~ s/#.*//;            # comments
        $cfg_rec =~ s/^\s+//;           # leading whitespace
        $cfg_rec =~ s/\s+$//;           # trailing whitespace

        next unless length($cfg_rec);   # anything left?

        # Split 'key=value' string
        my ($key, $value) = split( /\s*=\s*/, $cfg_rec, 2);

        # Assign to global hash, forcing uppercase keys
        $cfg::params{uc($key)} = $value;
    }
    close $cfg_fh;
TMTOWTDI : There's More Than One Way To Do It - Perl motto
 
Old 01-22-2009, 05:19 AM   #7
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
Quote:
TMTOWTDI : There's More Than One Way To Do It - Perl motto
The blessing of Perl script creators.

The bane of Perl script maintainers.
 
Old 01-22-2009, 07:51 AM   #8
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 60
Quote:
Originally Posted by Lordandmaker
If you've got time, I'd appreciate whatever you can give as pointers of what it is I'm getting wrong. All of my perl 'knowledge' comes from playing until it works, rather than any actual learning (this is changing, slowly)
Wje_lq and Chrism covered a lot of ground, but in the spirit of TIMTOWTDI, here's how I might tackle this. (It's hard to get a complete sense of it, since I'm not really sure of the larger goals.)

I've put comments in the code to clarify some of my choices:
Code:
#!/usr/bin/env perl
use strict;                   # Always use strict and warnings
use warnings;                 # This will catch tons of mistakes for you

my $input_file = 'file';      # You can hardcode the file name or use <>

open my $file_handle, '<', $input_file  # Three-argument open; no need to quote $input_file here
    or die "Can't open $input_file: $!";  # Include $! in calls to die since it has the system error value
                                          # which is often helpful

my @data_blocks;                          # Create the empty array - no need to use = ("") or the like

{
    local $/ = "\n\n";                    # Temporarily redefine the $/ value (normally \n), which determines what counts as one "line"
    while (<$file_handle>) {              # Read the file, one "line" at a time
                                          # But a line now is a block
        chomp $_;                         
        push @data_blocks, $_;            # Push each block into @data_blocks
    }
}

my @information;

foreach my $block (@data_blocks) {        # Take one block at a time
    my @lines = split /\n/, $block;       # Split into lines
    my $record = {};                      # Initialize $record as an empty hash reference
    
    foreach my $line (@lines) {
        next if $line =~ m/^Item/;        # Skip the intro line
        next if $line =~ m/^~+$/;         # Skip the lines with only ~
        my ($key, $value) = split /[ ]{2,}/, $line;  # Get key, value pairs
        $record->{$key} = $value;                    # And assign to the hash reference
    }

    push @information, $record;                      # Then push hash ref onto the @information array
}

# For debugging, I often use Data::Dumper to show me the look of the
# data structure I created; uncomment if something goes wrong
#use Data::Dumper;
#print Dumper \@information;

# One way to print out the whole shebang without much fancy formatting
foreach my $record (@information) {
    foreach my $key (keys %$record) {
        if ( defined $record->{$key} ) {
            print "$key => $record->{$key}\n";
        }
        else {
            print "$key => undef\n";
        }
    }
    print "\n";
}
There are a lot of temporary and maybe unnecessary variables here, but this gives you some idea.
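
A possible variation on the `$/` trick: setting `$/` to the empty string switches Perl to paragraph mode, where any run of blank lines counts as a single separator. A minimal sketch, assuming an in-memory sample (made up here, with extra blank lines on purpose) rather than the real report file:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample with several blank lines between the two blocks
my $sample = "Drive   J:\nDescription     Network Connection\n\n\n\n"
           . "Drive   M:\nDescription     Network Connection\n";

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";

my @information;
{
    local $/ = "";                               # paragraph mode, restored at the end of this block
    while ( my $block = <$fh> ) {
        my $record = {};                         # fresh hash reference for each block
        foreach my $line ( split /\n/, $block ) {
            my ( $key, $value ) = split /[ ]{2,}/, $line, 2;
            $record->{$key} = $value;
        }
        push @information, $record;
    }
}
close $fh;

print scalar(@information), " blocks\n";
print "$_->{Drive}\n" for @information;
```

This also removes the intermediate `@data_blocks` array, since each block is turned into a record as soon as it is read.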

Last edited by Telemachos; 01-22-2009 at 09:26 AM.
 
Old 01-22-2009, 08:48 AM   #9
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,670
Blog Entries: 4

Rep: Reputation: 3945
I'm looking at this requirement, and I'm thinking awk, or a Perl program that takes the same approach.

Let's take another look at that input-file of yours...
Code:
Drive = C:
Description = Local Fixed Disk
Compressed = No
File System = NTFS
Size = 12.00 GB (12,880,787,968 bytes)
Free Space = 4.42 GB (4,745,418,752 bytes)
Volume Name =
Volume Serial Number = A4B40FDF
~ ~ ~ ~ ~
Drive = D:
Description = CD-ROM Disc
~ ~ ~ ~ ~
Now, the approach that I would take on this (influenced, of course, by awk's approach ... and Perl grew out of awk ...) is to look at the file "one record at a time" and decide (a) how do I recognize each record-type, and (b) having recognized it, what do I want to do with it.

Each record can be identified by the first few characters...
  1. At the BEGINning of the job, I want to initialize an empty record into which I'm gonna stuff data.
  2. At the END, I'm going to want to make sure that I've written-out the last record.
  3. For /^Drive\s*=\s*(.*)$/, I'm going to capture the first phrase ($1, as captured by the parenthesized group (.*) in that regular-expression), and I'm going to remember this as the "Drive."
  4. Other types of records are similar, and for Size and Free Space I might want to tear the string apart even more thoroughly to get easy access to the various numbers that are inside.
  5. For /^~/ ... a line beginning with "~" ... I know that I'm seeing the end of the current record, so I want to compose it and write it out (as I do at the END).
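
Tearing a Size or Free Space value apart could be sketched in Perl along these lines. `parse_size` is a hypothetical helper written for this example, and the pattern assumes the "12.00 GB (12,880,787,968 bytes)" layout shown in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "12.00 GB (12,880,787,968 bytes)" into its parts.
# Returns nothing (undef in scalar context) if the text does not match.
sub parse_size {
    my ($text) = @_;
    return unless defined $text
        && $text =~ /^([\d.]+)\s+(\w+)\s+\(([\d,]+)\s+bytes\)$/;
    my ( $amount, $unit, $bytes ) = ( $1, $2, $3 );
    $bytes =~ tr/,//d;                    # drop the thousands separators
    return { amount => $amount, unit => $unit, bytes => $bytes };
}

my $size = parse_size('12.00 GB (12,880,787,968 bytes)');
print "$size->{amount} $size->{unit} = $size->{bytes} bytes\n";
```

Working from the byte count instead of the rounded "GB" figure would avoid mixing units if some drives are reported in MB or TB.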

I suggest that you set-aside your Perl program for a few minutes and have a look at the material that's available with the info gawk command. (That's the "GNU awk," which is undoubtedly the implementation you have, and the material's a whole lot friendlier than man awk.)

One of the real "tricks" in this business is knowing what's the best tool for the job ... and conditioning yourself to look for an existing tool before you start to write your own. Even in Perl.
 
Old 01-22-2009, 09:36 AM   #10
Lordandmaker
Member
 
Registered: Sep 2005
Location: London, UK
Distribution: Debian
Posts: 258

Original Poster
Rep: Reputation: 39
Quote:
Originally Posted by Telemachos
Wje_lq and Chrism covered a lot of ground, but in the spirit of TIMTOWTDI, here's how I might tackle this. (It's hard to get a complete sense of it, since I'm not really sure of the larger goals.)
I should perhaps have included them.
The idea is to generate a report on the hard drive usage of a Windows box. I use msinfo32.exe to produce this report, and this script gets data from it and outputs it to stdout to be piped to mail. Currently the output is:
Code:
Drive Size    Free    Used    %Free   Name
C:      12.0    4.42    7.58    36.8
E:      7.00    3.29    3.71    47      SYS
F:      254.    182.    72      71.6    DATA
It is to then be put in a loop which goes through all the reports in a directory created by the servers. (I know there are faaar better ways of doing this, like dedicated apps, but my boss has decided they're a bad idea, and the more programming I do myself, the more likely I am to get the company to pay me through a course).
There is a shebang and use strict, warnings and diagnostics at the top.
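
For anyone following along, a report like that could be produced from the array of hashes roughly as sketched below. This is not the actual script: the records are a made-up subset of the data above, the column widths are guesses, and it assumes Size and Free Space are both given in GB.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up subset of the records, with only the fields the report needs
my @information = (
    { 'Drive' => 'C:', 'Size' => '12.00 GB (12,880,787,968 bytes)',
      'Free Space' => '4.42 GB (4,745,418,752 bytes)', 'Volume Name' => undef },
    { 'Drive' => 'D:', 'Description' => 'CD-ROM Disc' },
    { 'Drive' => 'E:', 'Size' => '7.00 GB (7,517,904,896 bytes)',
      'Free Space' => '3.29 GB (3,534,249,984 bytes)', 'Volume Name' => 'SYS' },
);

my @rows;
foreach my $record (@information) {
    # Skip drives that report no size (CD-ROM, network shares)
    next unless defined $record->{'Size'} && defined $record->{'Free Space'};

    my ($size) = $record->{'Size'}       =~ /^([\d.]+)/;
    my ($free) = $record->{'Free Space'} =~ /^([\d.]+)/;
    next unless $size && defined $free;   # also avoids dividing by zero

    my $name = defined $record->{'Volume Name'} ? $record->{'Volume Name'} : '';
    push @rows, sprintf "%-6s%-8.2f%-8.2f%-8.2f%-8.1f%s",
        $record->{'Drive'}, $size, $free, $size - $free,
        100 * $free / $size, $name;
}

print "Drive Size    Free    Used    %Free   Name\n";
print "$_\n" for @rows;
```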

Quote:
# For debugging, I often use Data::Dumper to show me the look of the
# data structure I created; uncomment if something goes wrong
#use Data::Dumper;
#print Dumper \@information;
Is this approximately analogous to PHP's var_dump()?

Quote:
There are a lot of temporary and maybe unnecessary variables here, but this gives you some idea.
That's given me several. I was just trying to work out how to do what I was doing without reading the whole file into another var first, that'll do it. Cheers!


Quote:
Originally Posted by sundialsvcs
I'm looking at this requirement, and I'm thinking awk, or a Perl program that takes the same approach.
I had pondered awk, but for whatever reason lumped with perl...
I might well do this later when I have some spare time as an exercise in awk.
Quote:
I suggest that you set-aside your Perl program for a few minutes and have a look at the material that's available with the info gawk command. (That's the "GNU awk," which is undoubtedly the implementation you have, and the material's a whole lot friendlier than man awk.)
Aha! I'll give that a go. I've tried using man awk before, never occurred to me to try info gawk. Cheers!
Quote:
One of the real "tricks" in this business is knowing what's the best tool for the job ... and conditioning yourself to look for an existing tool before you start to write your own. Even in Perl.
Yeah, I'm _really_ bad at both of these, in real life as well as in computers.

Last edited by Lordandmaker; 01-22-2009 at 09:49 AM.
 
  

