I am writing a script to go through a file of blocks of key/value pairs, and convert them into an array of hashes. Each hash should contain one 'block' of data (i.e. split by an empty line), and be composed of the key/value pairs.
The file is as follows:
Code:
Item Value
Drive C:
Description Local Fixed Disk
Compressed No
File System NTFS
Size 12.00 GB (12,880,787,968 bytes)
Free Space 4.42 GB (4,745,418,752 bytes)
Volume Name
Volume Serial Number A4B40FDF
Drive D:
Description CD-ROM Disc
Drive E:
Description Local Fixed Disk
Compressed No
File System NTFS
Size 7.00 GB (7,517,904,896 bytes)
Free Space 3.29 GB (3,534,249,984 bytes)
Volume Name SYS
Volume Serial Number BE1F14B0
Drive F:
Description Local Fixed Disk
Compressed No
File System NTFS
Size 254.21 GB (272,955,916,288 bytes)
Free Space 182.64 GB (196,106,596,352 bytes)
Volume Name DATA
Volume Serial Number 9448FDDF
Drive J:
Description Network Connection
Provider Name \\jup-svr01\jbwv2
Drive M:
Description Network Connection
Provider Name \\jup-svr02\macshare
Drive S:
Description Network Connection
Provider Name \\jup-svr01\Shared
Drive U:
Description Network Connection
Provider Name \\jup-svr01\users\Avi Greenbury
I can't see what I've got wrong here - it looks to me like writing to the hashes and pushing them onto the array happen at the correct times. It certainly seems to be the same as any examples I can find on the net. Any ideas?
Last edited by Lordandmaker; 01-21-2009 at 07:23 AM.
Reason: typo
I'm not following a bunch of your choices, but they may have good reason. (For example, why stuff everything into the scalar $data only to split it again later?) I'm also heading out, so this is quick, but I think the key problem is how you are getting items into the %record hash (and what is ending up in the @information array).
Here's a quick stab from me with output (from your file). I removed the print statements and substituted a printout of the structure of the @information array to help show what gets in there:
Code:
#!/usr/bin/env perl
use warnings;
use strict;

my $data;
while (<>) {
    if ($_ =~ /^Info/) {
        $_ = "";
    }
    $data .= $_;
}

my @data_blocks = split(/\n\n/, $data);

my @information;
foreach (@data_blocks) {
    my %record;    # Initialize an empty hash for each chunk
    my @lines = split(/\n/, $_);
    foreach (@lines) {
        my ($key, $value) = split(/[ ]{2,}/, $_);
        $record{$key} = $value;    # Load up the hash for each line
                                   # No arrow - not dereferencing anything here
    }
    push @information, \%record;   # Push the hash reference for each chunk into @information
}

use Data::Dumper;
print Dumper \@information;
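If you haven't met Data::Dumper before: it renders a whole nested structure as Perl source. Here's a tiny, self-contained illustration of what an array of hash references looks like when dumped (hand-built records borrowed from the sample file, not the script's actual output):

```perl
#!/usr/bin/env perl
# Hand-built example of the @information structure, just to show what
# Data::Dumper prints; the field values are copied from the sample file.
use strict;
use warnings;
use Data::Dumper;

my @information = (
    { 'Drive' => 'C:', 'File System' => 'NTFS' },
    { 'Drive' => 'D:', 'Description' => 'CD-ROM Disc' },
);

print Dumper \@information;   # each element shows up as an anonymous hash
```

Each block of the input file should end up as one such anonymous hash.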
Telemachos suggests well. But I'm going to take a different tack on this problem. Rather than provide an alternative suggestion, I'm going to answer the question behind Lordandmaker's question.
The question is: "How do I get the unique drive designations to show, rather than the final one each time?"
The question behind the question is: "What don't I understand about references and about arrays of hashes?" That's the one I wish to address, and I'll do it by starting with Lordandmaker's script.
First I'll make four adjustments to the script as Lordandmaker presented it, to make it easily runnable:
I'll put a perl shebang at the beginning. This should vary from system to system, because not all distributions put perl in the same place. Telemachos has an interesting solution to this, but I want to keep this simple and focus on the question at hand.
I'll assign a constant input filename.
I'll comment out the print statements, and uncomment the final paragraph.
I'll tame the output a little by moving the line feed character so that it's at the end of each line, not at the beginning.
Here's the script so far, with the changes in red:
So, why doesn't the code behave? Here's where my explaining skills fall down a little, because I'm not intimately familiar with references in Perl, or arrays of hashes for that matter. (I actually didn't use them until you presented this problem, and I may never use them again, because I think this aspect of Perl is too ugly to be allowed to live.) But first I'll show you how to fix the script. Then I'll show you where to find an explanation that should work for you.
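To make the reference problem concrete, here's a minimal sketch of the classic pitfall with arrays of hashes. This is hypothetical code, my guess at the shape of the bug, not Lordandmaker's actual script:

```perl
#!/usr/bin/env perl
# Hypothetical sketch of the bug: %record declared ONCE, outside the loop,
# so every push stores a reference to the very same hash.
use strict;
use warnings;

my %record;                            # one hash, reused for every block -- the bug
my @information;
for my $drive ('C:', 'D:', 'E:') {
    %record = ( Drive => $drive );     # clears and refills the same hash
    push @information, \%record;       # same reference pushed every time
}

print $_->{Drive}, "\n" for @information;   # prints "E:" three times
# Declaring "my %record;" inside the loop creates a fresh hash on each pass,
# so each element of @information keeps its own drive.
```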
To fix the script, I commented out one line and added another. Here's the revised script:
Quote:
Originally Posted by Telemachos
I'm not following a bunch of your choices, but they may have good reason. (For example, why stuff everything into the scalar $data only to split it again later?) I'm also heading out, so this is quick, but I think the key problem is how you are getting items into the %record hash (and what is ending up in the @information array).
They probably don't have good reason. If you've got time, I'd appreciate whatever you can give as pointers of what it is I'm getting wrong. All of my perl 'knowledge' comes from playing until it works, rather than any actual learning (this is changing, slowly)
The change suggested by yourself and wje worked, as I presume you expected, so thank you for that! The best bit is that I can see /why/ it worked, too.
*wanders off to read yet more perldoc*
Last edited by Lordandmaker; 01-22-2009 at 03:22 AM.
Re Telemachos' comment, which I agree with: normally you'd deal with each line as it comes in from the file, instead of saving to a scalar then processing again. Imagine you've got a LOT of data - that could create an out-of-mem situation, since you'd have 2 copies of the data in mem...
Also, usually a good idea to split off the newline char immediately after reading so:
Code:
while (<>)
{
    chomp();
    ...
}
I also prefer named recs, as the default $_ can be confusing if nested; similarly with the 'invisible' implied $_. So I'd use this sort of construct:
Code:
# Open cfg file
open( my $config_fh, '<', $cfg_file ) or
    die "Can't open cfg file: $cfg_file: $!\n";

# Process cfg file records
while ( defined( my $cfg_rec = <$config_fh> ) )
{
    # Remove unwanted chars
    chomp $cfg_rec;               # newline
    $cfg_rec =~ s/#.*//;          # comments
    $cfg_rec =~ s/^\s+//;         # leading whitespace
    $cfg_rec =~ s/\s+$//;         # trailing whitespace
    next unless length($cfg_rec); # anything left?

    # Split 'key=value' string
    my ($key, $value) = split( /\s*=\s*/, $cfg_rec, 2 );

    # Assign to global hash, forcing uppercase keys
    $cfg::params{uc($key)} = $value;
}
TMTOWTDI : There's More Than One Way To Do It - Perl motto
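Applying that line-at-a-time advice to the file format in this thread, a one-pass version (no intermediate $data scalar) might look like the sketch below. The sample input is inlined via a scalar filehandle purely to keep the sketch self-contained; in the real script you'd read <> or a real file handle:

```perl
#!/usr/bin/env perl
# One-pass sketch: build each %record as lines arrive, and push a copy
# whenever a blank line marks the end of a block.
use strict;
use warnings;

my $sample = <<'END';
Drive                  C:
Description            Local Fixed Disk

Drive                  D:
Description            CD-ROM Disc
END

open my $fh, '<', \$sample or die "Can't open sample: $!";

my @information;
my %record;
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^\s*$/ ) {                      # blank line ends a block
        push @information, { %record } if %record; # push a shallow COPY as a new ref
        %record = ();
        next;
    }
    next if $line =~ /^Item/;                      # skip the header line
    my ($key, $value) = split /[ ]{2,}/, $line, 2; # columns separated by 2+ spaces
    $record{$key} = $value;
}
push @information, { %record } if %record;         # don't lose the final block
```

Note the { %record }: pushing a copy as a fresh anonymous hash sidesteps the shared-reference problem discussed above.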
Quote:
Originally Posted by Lordandmaker
If you've got time, I'd appreciate whatever you can give as pointers of what it is I'm getting wrong. All of my perl 'knowledge' comes from playing until it works, rather than any actual learning (this is changing, slowly)
Wje_lq and Chrism covered a lot of ground, but in the spirit of TMTOWTDI, here's how I might tackle this. (It's hard to get a complete sense of it, since I'm not really sure of the larger goals.)
I've put comments in the code to clarify some of my choices:
Code:
#!/usr/bin/env perl
use strict;      # Always use strict and warnings
use warnings;    # This will catch tons of mistakes for you

my $input_file = 'file';    # You can hardcode the file name or use <>
open my $file_handle, '<', $input_file    # Three-arg open with a lexical filehandle
    or die "Can't open $input_file: $!";  # Include $! in calls to die since it has
                                          # the system error value, which is often helpful

my @data_blocks;    # Create the empty array - no need to use = ("") or the like
{
    local $/ = "\n\n";    # Temporarily redefine $/ (normally \n), which determines
                          # what counts as one "line"
    while (<$file_handle>) {    # Read the file, one "line" at a time
                                # But a line now is a block
        chomp $_;
        push @data_blocks, $_;  # Push each block into @data_blocks
    }
}

my @information;
foreach my $block (@data_blocks) {    # Take one block at a time
    my @lines = split /\n/, $block;   # Split into lines
    my $record = {};                  # Initialize $record as an empty hash reference
    foreach my $line (@lines) {
        next if $line =~ m/^Item/;    # Skip the intro line
        next if $line =~ m/^~+$/;     # Skip the lines with only ~
        my ($key, $value) = split /[ ]{2,}/, $line;    # Get key, value pairs
        $record->{$key} = $value;     # And assign to the hash reference
    }
    push @information, $record;       # Then push hash ref onto the @information array
}

# For debugging, I often use Data::Dumper to show me the look of the
# data structure I created; uncomment if something goes wrong
#use Data::Dumper;
#print Dumper \@information;

# One way to print out the whole shebang without much fancy formatting
foreach my $record (@information) {
    foreach my $key (keys %$record) {
        if ( defined $record->{$key} ) {
            print "$key => $record->{$key}\n";
        }
        else {
            print "$key => undef\n";
        }
    }
    print "\n";
}
There are a lot of temporary and maybe unnecessary variables here, but this gives you some idea.
Last edited by Telemachos; 01-22-2009 at 09:26 AM.
I'm looking at this requirement, and I'm thinking awk, or a Perl program that takes the same approach.
Let's take another look at that input-file of yours...
Code:
Drive = C:
Description = Local Fixed Disk
Compressed = No
File System = NTFS
Size = 12.00 GB (12,880,787,968 bytes)
Free Space = 4.42 GB (4,745,418,752 bytes)
Volume Name =
Volume Serial Number = A4B40FDF
~ ~ ~ ~ ~
Drive = D:
Description = CD-ROM Disc
~ ~ ~ ~ ~
Now, the approach that I would take on this (influenced, of course, by awk's approach ... and Perl grew out of awk ...) is to look at the file "one record at a time" and decide (a) how do I recognize each record-type, and (b) having recognized it, what do I want to do with it.
Each record can be identified by the first few characters...
1. At the BEGINning of the job, I want to initialize an empty record into which I'm gonna stuff data.
2. At the END, I'm going to want to make sure that I've written out the last record.
3. For /^Drive\s*=\s*(.*)$/, I'm going to capture the first phrase ($1, as captured by the parenthesized group (.*) in that regular expression), and I'm going to remember this as the "Drive."
4. Other types of records are similar, and for Size and Free Space I might want to tear the string apart even more thoroughly to get easy access to the various numbers that are inside.
5. For /^~/ ... a line beginning with "~" ... I know that I'm seeing the end of the current record, so I want to compose it and write it out (as I do at the END).
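Rendered in Perl, that record-at-a-time plan might look like the sketch below. The "Key = Value" layout and the "~" separator are taken from the listing above; the inlined sample and the flush_record name are mine, just to make the sketch runnable:

```perl
#!/usr/bin/env perl
# awk-style sketch in Perl: recognize each line by its leading characters,
# accumulate fields into the current record, flush on a "~" separator line.
use strict;
use warnings;

my $sample = <<'END';
Drive = C:
Description = Local Fixed Disk
~ ~ ~ ~ ~
Drive = D:
Description = CD-ROM Disc
~ ~ ~ ~ ~
END

my %record;      # the empty record initialized at the BEGINning
my @records;

sub flush_record {
    push @records, { %record } if %record;   # compose and "write out" the record
    %record = ();
}

open my $fh, '<', \$sample or die "Can't open sample: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^~/ ) {                        # "~" line: end of current record
        flush_record();
    }
    elsif ( $line =~ /^(.+?)\s*=\s*(.*)$/ ) {     # "Key = Value" lines
        $record{$1} = $2;
    }
}
flush_record();    # the END step: make sure the last record is written out
```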
I suggest that you set-aside your Perl program for a few minutes and have a look at the material that's available with the info gawk command. (That's the "GNU awk," which is undoubtedly the implementation you have, and the material's a whole lot friendlier than man awk.)
One of the real "tricks" in this business is knowing what's the best tool for the job ... and conditioning yourself to look for an existing tool before you start to write your own. Even in Perl.
Quote:
Originally Posted by Telemachos
Wje_lq and Chrism covered a lot of ground, but in the spirit of TMTOWTDI, here's how I might tackle this. (It's hard to get a complete sense of it, since I'm not really sure of the larger goals.)
I should perhaps have included them.
The idea is to generate a report on the hard drive usage of a Windows box. I use msinfo32.exe to produce this report, and this script gets data from it and outputs it to stdout to be piped to mail. Currently the output is:
Code:
Drive Size Free Used %Free Name
C: 12.0 4.42 7.58 36.8
E: 7.00 3.29 3.71 47 SYS
F: 254. 182. 72 71.6 DATA
It is then to be put in a loop which goes through all the reports in a directory created by the servers. (I know there are faaar better ways of doing this, like dedicated apps, but my boss has decided they're a bad idea, and the more programming I do myself, the more likely I am to get the company to pay to put me through a course.)
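For what it's worth, here's a rough sketch of how one report line could be built from a parsed record. The field names come from the msinfo32 listing earlier in the thread; the 4-character truncation and the column widths are my guess at what produces the sample output, not the actual script:

```perl
#!/usr/bin/env perl
# Sketch: turn one parsed record (hash ref) into a "Drive Size Free Used %Free Name"
# line. Truncating Size/Free to 4 characters mimics the sample output above.
use strict;
use warnings;

sub report_line {
    my ($rec) = @_;
    my ($size) = ( $rec->{'Size'}       // '' ) =~ /^([\d.]+)/;  # leading number in GB
    my ($free) = ( $rec->{'Free Space'} // '' ) =~ /^([\d.]+)/;
    return unless defined $size && defined $free;                # skip CD-ROMs etc.
    my $used  = sprintf '%.2f', $size - $free;
    my $pfree = sprintf '%.1f', 100 * $free / $size;
    return sprintf "%-6s%-6s%-6s%-6s%-6s%s\n",
        $rec->{'Drive'}, substr($size, 0, 4), substr($free, 0, 4),
        $used, $pfree, $rec->{'Volume Name'} // '';
}

my %c_drive = (                       # values copied from the sample file
    'Drive'       => 'C:',
    'Size'        => '12.00 GB (12,880,787,968 bytes)',
    'Free Space'  => '4.42 GB (4,745,418,752 bytes)',
    'Volume Name' => '',
);

print "Drive Size  Free  Used  %Free Name\n";
print report_line(\%c_drive);
```

Run over every record in @information, this reproduces lines like the "C: 12.0 4.42 7.58 36.8" row in the sample output.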
There is a shebang and use strict, warnings and diagnostics at the top.
Quote:
# For debugging, I often use Data::Dumper to show me the look of the
# data structure I created; uncomment if something goes wrong
#use Data::Dumper;
#print Dumper \@information;
Is this approximately analogous to PHP's var_dump()?
Quote:
There are a lot of temporary and maybe unnecessary variables here, but this gives you some idea.
That's given me several. I was just trying to work out how to do what I was doing without reading the whole file into another var first; that'll do it. Cheers!
Quote:
Originally Posted by sundialsvcs
I'm looking at this requirement, and I'm thinking awk, or a Perl program that takes the same approach.
I had pondered awk, but for whatever reason lumped with perl...
I might well do this later, when I have some spare time, as an exercise in awk.
Quote:
I suggest that you set-aside your Perl program for a few minutes and have a look at the material that's available with the info gawk command. (That's the "GNU awk," which is undoubtedly the implementation you have, and the material's a whole lot friendlier than man awk.)
Aha! I'll give that a go. I've tried using man awk before, never occurred to me to try info gawk. Cheers!
Quote:
One of the real "tricks" in this business is knowing what's the best tool for the job ... and conditioning yourself to look for an existing tool before you start to write your own. Even in Perl.
Yeah, I'm _really_ bad at both of these, in real life as well as in computers.
Last edited by Lordandmaker; 01-22-2009 at 09:49 AM.