Perl read file and parse blocks

Goni · 08-21-2010, 06:11 AM

Hello,
I am trying to write a Perl script that reads data from a file and parses it. The data in the file has the following format:

Code:

    Device Physical Name     : Not Visible
    Device Symmetrix Name    : 1234
    Device Serial ID         : N/A
    Attached BCV Device      : N/A
    Attached VDEV TGT Device : N/A
    Device Capacity
        {
        Cylinders            :       5120
        Tracks               :      76800
        512-byte Blocks      :   10485760
        MegaBytes            :       5120
        KiloBytes            :    5242880
        }

    Device Physical Name     : Not Visible
    Device Symmetrix Name    : 4567
    Device Serial ID         : N/A
    Device Capacity
        {
        Cylinders            :       5120
        Tracks               :      76800
        512-byte Blocks      :   10485760
        MegaBytes            :       5120
        KiloBytes            :    5242880
        }

Each record starts with "Device Physical Name", so a record is the set of lines from one "Device Physical Name" up to the next "Device Physical Name". I want to read the file one such record at a time.

Of course the field separator (FS) is ":", and I just want to print the info, or later put it in a CSV file. I would appreciate any help.

Goni

grail · 08-21-2010, 06:17 AM

Well this sounds very achievable

Where are you stuck? What have you tried?

Goni · 08-21-2010, 06:22 AM

Quote:

Originally Posted by grail

Well this sounds very achievable

Where are you stuck? What have you tried?

I am new to Perl, which is why I asked for help.

Code:

#!/usr/bin/perl
$data_file="dev";
open(DAT, $data_file) || die("Could not open file!");
@raw_data=<DAT>;
foreach $device (@raw_data)
{
chomp $device;
($d1, $d2)=split(/\:/,$device);
while (($d1) = "Device Physical Name") {
print "$d1";
}
}

I am trying to get the first field in d1 and its value in d2. Later I can play around with both d1 and its value.

grail · 08-21-2010, 06:32 AM

Ok I see where you are going

If you search around you will see a common way of reading from a file is as follows:

Code:

open(my $fh, '<', "file_name") or die "couldn't open the file: $!";

while (my $line = <$fh>) { # This is also seen a lot as while (<$fh>), which reads into $_, but you can look that up
    print $line;
}

close($fh);

Obviously you can split the line and do things other than print.

See if that helps?
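
As a rough illustration of the "split" step, here is a minimal sketch that breaks one line at its first colon and trims both halves. The helper name parse_line is made up for this example; the limit of 2 passed to split is there on the assumption that a value might itself contain a colon.

Code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one "Field : Value" line at the first colon and trim both halves.
# Returns an empty list for lines without a colon ("{", "}", "Device Capacity").
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /:/;
    my ($field, $value) = split /:/, $line, 2;   # limit 2: later colons stay in the value
    for ($field, $value) {
        s/^\s+//;    # strip leading whitespace
        s/\s+$//;    # strip trailing whitespace, including the newline
    }
    return ($field, $value);
}

my ($field, $value) = parse_line("    Device Symmetrix Name    : 1234\n");
print "$field=$value\n";
```

Inside the while loop above, parse_line($line) would take the place of print $line.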

konsolebox · 08-21-2010, 06:35 AM

Are the contents of the file uniform? It looks like its main delimiter is a blank line. In awk that can be easily achieved with RS = "" and FS = ":". But I'm interested to learn how to hack this in Perl.

Edit: I think FS = ":" won't do. But anyway, how do you intend to save the values in CSV? Note that some items have more parameters than others: Attached BCV Device, Attached VDEV TGT Device...

Goni · 08-21-2010, 06:56 AM

Quote:

Originally Posted by konsolebox

Are the contents of the file uniform? It looks like its main delimiter is a blank line. In awk that can be easily achieved with RS = "" and FS = ":". But I'm interested to learn how to hack this in Perl.

Edit: I think FS = ":" won't do. But anyway, how do you intend to save the values in CSV? Note that some items have more parameters than others: Attached BCV Device, Attached VDEV TGT Device...

The contents of the file are all uniform. None of the item values is empty. In the CSV, the first field of each line will become a heading. For example, it would list all devices with their size and the rest of their properties/info. Each item will become one heading, and each heading will have more than one value.

Goni

Goni · 08-21-2010, 06:59 AM

Quote:

Originally Posted by grail

Ok I see where you are going

If you search around you will see a common way of reading from a file is as follows:

Code:

open(my $fh, '<', "file_name") or die "couldn't open the file: $!";

while (my $line = <$fh>) { # This is also seen a lot as while (<$fh>), which reads into $_, but you can look that up
    print $line;
}

close($fh);

Obviously you can split the line and do things other than print.

See if that helps?

Reading the file is not the problem. How can I process a block of information when the block does not have a fixed number of lines? One block may have 10 lines, another 14.

If we do a split,

Code:

my @arr=split("\:",$line);

then the first element of the array holds all the values instead of just one.

Code:

if( ($arr[0] eq "Device Physical Name" ) )
...

A check like this won't help.
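
One way to handle blocks of varying length is to start a new record whenever the field name is "Device Physical Name" and attach every following field to that record until the next one starts. The sketch below is only an illustration under that assumption; read_records and the in-memory sample are invented for the example, and a real script would open the data file instead.

Code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group lines into records. A new record begins at every line whose field
# name is "Device Physical Name", so a block may have any number of lines.
sub read_records {
    my ($fh) = @_;
    my (@records, $current);
    while (my $line = <$fh>) {
        # Capture trimmed "field : value"; skips "{", "}", blanks, "Device Capacity".
        next unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        my ($field, $value) = ($1, $2);
        if ($field eq 'Device Physical Name') {
            $current = {};               # start a fresh record
            push @records, $current;
        }
        $current->{$field} = $value if $current;   # ignore fields before the first record
    }
    return @records;
}

# Sample input held in a string (an assumption for this sketch).
my $sample = <<'END';
    Device Physical Name     : Not Visible
    Device Symmetrix Name    : 1234
    Attached BCV Device      : N/A
    Device Capacity
        {
        Cylinders            :       5120
        }

    Device Physical Name     : Not Visible
    Device Symmetrix Name    : 4567
END

open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my @records = read_records($fh);
close($fh);

for my $rec (@records) {
    print join(', ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
}
```

Because the record boundary is the field name rather than a line count, it does not matter whether a block has 10 lines or 14.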

konsolebox · 08-21-2010, 07:28 AM

Quote:

Originally Posted by Goni

The contents of the file are all uniform. None of the item values is empty. In the CSV, the first field of each line will become a heading. For example, it would list all devices with their size and the rest of their properties/info. Each item will become one heading, and each heading will have more than one value.

Goni

Honestly, I can't parse that. Can you give us an example of the output you intend, with the headers(?), at least based on the two entries? I think it would really make things clearer. You can place comments using # if you like.

Goni · 08-21-2010, 07:34 AM

Ok, here is what the sample output would look like, CSV output.

Code:

Device Physical Name,Device Symmetrix Name,Symmetrix ID.....
Not Visible,1234,1234567
Not Visible,3456,1234567
Not Visible,8726,1234567
Not Visible,0000,1234567
Not Visible,1234,1234567

Would that help?

konsolebox · 08-21-2010, 07:48 AM

Well, based on that and on the two entries, you would get output like this:

Code:

Not Visible,1234,N/A,N/A,N/A,5120,76800,10485760,5120,5242880
Not Visible,4567,N/A,5120,76800,10485760,5120,5242880

Notice the difference in the number of columns they have.

Which can only mean that you need to parse the file in a more XML-like way, not just linearly. For this, the possible attributes that may occur should be determined first.
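
To illustrate the point about predetermined attributes, here is a small sketch that fixes the column order up front and leaves an empty cell wherever a record lacks an attribute. The column list is taken only from the two sample entries and is an assumption; csv_row is a made-up helper name, and the record hash is typed in by hand rather than parsed.

Code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order decided up front. This list comes from the two sample
# entries only; a real file may need more names.
my @columns = (
    'Device Physical Name', 'Device Symmetrix Name', 'Device Serial ID',
    'Attached BCV Device', 'Attached VDEV TGT Device',
    'Cylinders', 'Tracks', '512-byte Blocks', 'MegaBytes', 'KiloBytes',
);

# Build one CSV row from a record hash. Attributes the record lacks become
# empty cells, so every row has the same number of columns.
sub csv_row {
    my ($rec) = @_;
    return join ',', map { defined $rec->{$_} ? $rec->{$_} : '' } @columns;
}

# The second sample entry, which has no "Attached ..." attributes.
my %second = (
    'Device Physical Name'  => 'Not Visible',
    'Device Symmetrix Name' => '4567',
    'Device Serial ID'      => 'N/A',
    'Cylinders'             => 5120,
    'Tracks'                => 76800,
    '512-byte Blocks'       => 10485760,
    'MegaBytes'             => 5120,
    'KiloBytes'             => 5242880,
);

print join(',', @columns), "\n";
print csv_row(\%second), "\n";
```

This plain join is only safe while no value contains a comma or a quote; if that can happen, a CSV module such as Text::CSV from CPAN would be the safer choice.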

grail · 08-21-2010, 10:54 AM

I will put my hand up to say I am an extreme noob when it comes to Perl. With that in mind ... have a look:

Code:

#!/usr/bin/perl

use warnings;

open(HANDLE, "file1") || die "Unable to open file1";

$counter = 0;

while($line = <HANDLE>){
    chomp($line);
    if ($line =~ /Device Physical Name/)
    {
        $counter++;
        $records = {};
    }

    if ($line ne "" && $line =~ /:/)
    {
        ($field,$value) = split(/:/, $line);
        $records->{trim($field)} = trim($value);
    }

    push @array, $records if ($line eq "") ;
}

push @array, $records if (--$counter != $#array);

for $href ( @array ) {
    print "{ ";
    for $role ( keys %$href ) {
         print "$role=$href->{$role} ";
    }
    print "}\n";
}

close(HANDLE);

sub trim
{
    my $string = shift;

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

    return $string;
}

theNbomr · 08-21-2010, 11:58 AM

I would suggest letting Perl do the first level of disassembly of the file, by using the blank line as a record delimiter:

Code:

    $/="\n\n";

Having done that, each scalar read from the file will be a block of data, which can readily be split (hint) on newlines. From there, it looks pretty easy to create a hash of field names/values by splitting on ':'s.
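
A minimal sketch of that suggestion, not a definitive implementation. It assumes blocks are separated by truly empty lines, and it sets $/ to the empty string (Perl's paragraph mode) rather than "\n\n", so that a run of several blank lines counts as one separator. The name read_blocks and the in-memory sample are made up for illustration.

Code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read blank-line-separated blocks and turn each into a hash of field => value.
sub read_blocks {
    my ($fh) = @_;
    local $/ = "";      # paragraph mode: one or more empty lines end a record
    my @records;
    while (my $block = <$fh>) {
        my %rec;
        for my $line (split /\n/, $block) {
            next unless $line =~ /:/;              # skip "{", "}", "Device Capacity"
            my ($field, $value) = split /:/, $line, 2;
            s/^\s+|\s+$//g for $field, $value;     # trim both ends
            $rec{$field} = $value;
        }
        push @records, \%rec if %rec;              # ignore blocks with no fields
    }
    return @records;
}

# Sample input with two empty lines between the blocks.
my $sample = <<'END';
    Device Physical Name     : Not Visible
    Device Symmetrix Name    : 1234
    Device Capacity
        {
        MegaBytes            :       5120
        }


    Device Physical Name     : Not Visible
    Device Symmetrix Name    : 4567
END

open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my @records = read_blocks($fh);
close($fh);

print scalar(@records), " blocks read\n";
print "$_->{'Device Symmetrix Name'}\n" for @records;
```

Using local keeps the change to $/ confined to the subroutine, so other reads in the same script still work line by line.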

--- rod.

konsolebox · 08-21-2010, 05:43 PM

Still, are these the only attributes?

Code:

Device Physical Name
Device Symmetrix Name
Device Serial ID
Attached BCV Device
Attached VDEV TGT Device
Device Capacity  # ignored
Cylinders
Tracks
512-byte Blocks
MegaBytes
KiloBytes

Goni · 08-21-2010, 07:33 PM

Quote:

Originally Posted by konsolebox

Still, are these the only attributes?

Code:

Device Physical Name
Device Symmetrix Name
Device Serial ID
Attached BCV Device
Attached VDEV TGT Device
Device Capacity  # ignored
Cylinders
Tracks
512-byte Blocks
MegaBytes
KiloBytes

No, there are some additional ones, but they all use the same FS. I think if it works for two, it will work for all.

grail, it depends on what definition of a noob you have in your dictionary.

But your code seems to loop over each item more than once. I tried it (I have yet to debug it), and it prints output 86 times, whereas "Device Physical Name" is supposed to show only 4 times.
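
A possible explanation, offered as a guess since the real file was not posted: the script pushes the current record every time it reads an empty line, so if the file contains runs of blank lines, the same record is stored once per blank line. The sketch below shows one way to store each record exactly once; collect_records is a made-up name and the input lines are invented.

Code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect records from a list of lines, storing each record once even when
# several blank lines follow it or the input ends without a blank line.
sub collect_records {
    my @lines = @_;
    my (@records, %rec);
    for my $line (@lines, '') {                  # trailing '' flushes the last record
        if ($line =~ /^\s*$/) {
            push @records, +{ %rec } if %rec;    # store a copy only if fields are pending
            %rec = ();                           # reset, so further blank lines add nothing
            next;
        }
        next unless $line =~ /:/;
        my ($field, $value) = split /:/, $line, 2;
        s/^\s+|\s+$//g for $field, $value;
        $rec{$field} = $value;
    }
    return @records;
}

# Three blank lines between two entries still yield two records.
my @records = collect_records(
    'Device Symmetrix Name : 1234', '', '', '',
    'Device Symmetrix Name : 4567',
);
print scalar(@records), " records\n";
```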

konsolebox · 08-21-2010, 08:06 PM

@Goni It's really important to determine all the possible attributes first, since the code will depend on them. If you know the possible attributes, you can decide in advance where to place the values and where to reserve places for null values. Take the earlier example:

Code:

Not Visible,1234,N/A,N/A,N/A,5120,76800,10485760,5120,5242880
Not Visible,4567,N/A,5120,76800,10485760,5120,5242880

You can expect output like this instead:

Code:

Not Visible,1234,N/A,N/A,N/A,5120,76800,10485760,5120,5242880
Not Visible,4567,N/A,,,5120,76800,10485760,5120,5242880

With that, the code will be simpler, since you don't have to collect all of the headers (attributes) and data first, then dynamically create an order based on the collected headers, and only then print the data.

With the simpler version you can print the data for each entry immediately, since you already know the order and where to reserve the null values.

If the attributes can't be determined in advance, then there's no choice but to write the harder code.
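
For comparison, here is a rough sketch of the harder, two-pass variant: first collect every header in first-seen order, then print each record against that header list. It assumes each record is kept as a list of [field, value] pairs so the original field order survives (a plain hash would lose it); to_csv and the sample records are invented for illustration.

Code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn records into CSV lines when the set of headers is not known up front.
# Each record is an array ref of [field, value] pairs.
sub to_csv {
    my @records = @_;

    # Pass 1: collect the union of field names in first-seen order.
    my (%seen, @headers);
    for my $rec (@records) {
        for my $pair (@$rec) {
            push @headers, $pair->[0] unless $seen{ $pair->[0] }++;
        }
    }

    # Pass 2: emit the header line, then one row per record with empty
    # cells for missing fields.
    my @rows = (join ',', @headers);
    for my $rec (@records) {
        my %value;
        $value{ $_->[0] } = $_->[1] for @$rec;
        push @rows, join ',', map { defined $value{$_} ? $value{$_} : '' } @headers;
    }
    return @rows;
}

my @records = (
    [ ['Device Symmetrix Name', '1234'], ['Attached BCV Device', 'N/A'], ['Cylinders', '5120'] ],
    [ ['Device Symmetrix Name', '4567'], ['Cylinders', '5120'], ['Tracks', '76800'] ],
);
print "$_\n" for to_csv(@records);
```

The cost of this version is that nothing can be printed until the whole file has been read, which is the trade-off described above.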

@grail I thought this might be a good reference, since you already know a lot about awk: http://perldoc.perl.org/perltrap.html