LinuxQuestions.org
Old 06-30-2006, 02:37 PM   #1
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 547

Rep: Reputation: 30
Perl: file read strategy


I have a file that looks similar to the following:


Code:
> K value
  25
> Iterations
  10
> Data properties:
  [0]: 502 194 102
  [1]: 9234  50  1899
  [2]: 95   908   145

What I need to do is use each line that begins with ">" to determine which variable(s) should store the data on the following lines (the ones that do not begin with ">"). Note that it is perfectly valid for a ">" line to be followed by multiple lines of data, not just a single line.


I've got a really messy (but functional) implementation right now that uses a while loop on the file handle, checks whether the line begins with ">" and, if so, sets a condition code. On the next pass through the loop, if the condition code is set, it matches the line against the data it expects to see for that condition code and writes the data to the appropriate variable. But I think there has to be a better way to do this (I'm not a Perl guru by any means). The current code is very cumbersome, especially when adding a new condition code, and since this is an ongoing project, the data file format may change. So, with what little Perl I know, I thought of two ideas:



#1: Save any ">" line to a temporary loop variable when one is found, and use that to process subsequent data. (The down-side to this is that these files are rather large, and not all of the data that follows after every ">" line needs to be read...)


#2: When a ">" line is read, continue reading in subsequent lines of the file to get all of the data, without requiring the loop condition to be re-evaluated. (Disadvantage: this could be kind of dangerous since I'd be reading in lines of the file from within segments of the loop in addition to the loop itself...).


I also thought of using some manipulation with the "next", "redo", "continue", "last" key-words to get around the condition code implementation I have currently, but from what I have read on perldoc.org this doesn't seem possible...



I know that TMTOWTDI, but I would like to know if any more experienced Perl programmers have a suggestion or recommendation on how to accomplish this task. I'm trying to learn more than just the basic operations about Perl as I am working on this project, and I would rather have code that is easier to modify, understand, and maintain than continue to have to support this hacked up solution that I initially created. Thanks!
 
Old 06-30-2006, 02:47 PM   #2
jlinkels
LQ Guru
 
Registered: Oct 2003
Location: Bonaire, Leeuwarden
Distribution: Debian /Jessie/Stretch/Sid, Linux Mint DE
Posts: 5,195

Rep: Reputation: 1043
What does "TMTOWTDI" mean?

I am not a Perl programmer, so maybe you are asking about some very Perl-specific issues. I only know some general programming techniques.

But to me it appears that this is about designing an algorithm, not the PERL language.

If I have a situation like this (maybe a bit more complicated) I would implement different states, like:

st_reading_kval
st_reading_nritr
st_reading_data

A state would be entered on reading "> something" and ended when you encounter "> something_else". Once you are in a state, read and process lines as appropriate until the state ends.

In this way you can handle quite complicated files.
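For illustration, here is a minimal sketch of the state approach in Perl. It is only one possible reading of the description above: the state names are the ones listed earlier, while the sub name parse_states, the result keys, and the in-memory sample are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each ">" label to a state name (names taken from the list above).
my %state_for = (
  'K value'          => 'st_reading_kval',
  'Iterations'       => 'st_reading_nritr',
  'Data properties:' => 'st_reading_data',
);

# Hypothetical helper: parse all lines from $fh and return a hash reference.
sub parse_states {
  my ($fh) = @_;
  my %result = ( data => [] );
  my $state  = 'st_none';            # no "> label" line seen yet

  while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^>\s*(.*?)\s*$/ ) {
      # A label line ends the current state and starts the next one.
      $state = exists $state_for{$1} ? $state_for{$1} : 'st_none';
      next;
    }
    if ( $state eq 'st_reading_kval' ) {
      $result{k_value} = $1 if $line =~ /(\d+)/;
    }
    elsif ( $state eq 'st_reading_nritr' ) {
      $result{iterations} = $1 if $line =~ /(\d+)/;
    }
    elsif ( $state eq 'st_reading_data' ) {
      # Drop the "[n]:" prefix, then keep the whitespace-separated values.
      ( my $values = $line ) =~ s/^\s*\[\d+\]:\s*//;
      push @{ $result{data} }, [ split ' ', $values ];
    }
    # Lines read while in 'st_none' are ignored.
  }
  return \%result;
}

# Demo on the sample data, read from an in-memory file handle.
my $sample = <<'END';
> K value
  25
> Iterations
  10
> Data properties:
  [0]: 502 194 102
  [1]: 9234  50  1899
END
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my $result = parse_states($fh);
close $fh;
print "K = $result->{k_value}, iterations = $result->{iterations}\n";
print "Row $_: @{ $result->{data}[$_] }\n" for 0 .. $#{ $result->{data} };
```

Every new label needs both an entry in the table and another elsif branch, which is probably the part that starts to feel cumbersome as the file format grows.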

If it is not worth implementing a state machine, maybe you did just fine with your "messy" code.

Or use Lex & Yacc if you can spend 2 years studying how these work.

jlinkels
 
Old 06-30-2006, 02:55 PM   #3
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 547

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by jlinkels
What does "TMTOWTDI" mean?
There's More Than One Way To Do It.

It is Perl's mantra.


Quote:
Originally Posted by jlinkels
A state would be initialized by reading "> something" and ended when you encounter "> something_else" Once you are in a state, read and process as appropriate until the state ends.
This is exactly what my condition code does (just substitute "state" with "condition code"). Sure a state machine is a pretty decent design, but it is a very cumbersome implementation in Perl in this particular case.


And yes, this is more of a "how would you do this in Perl" question than a general design question.


Quote:
Originally Posted by jlinkels
Or use Lex & Yacc if you can spend 2 years studying how these work.
jlinkels
Actually I did use lex and yacc about 3 years ago, but I've completely forgotten them since then.

Last edited by R00ts; 06-30-2006 at 02:57 PM.
 
Old 06-30-2006, 05:50 PM   #4
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
How about this? It reads your data from STDIN. There are three subroutines, with references to them stored in a hash where the hash keys correspond to the ">" lines.
Code:
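#!/usr/bin/perl -w

use strict;

# Dispatch table: maps each ">" label to the subroutine that handles its data lines.
my %handles;

$handles{'K value'} = sub {
  my $content = shift;
  print "I just got a K value of ",$content,".\n";
};

$handles{'Iterations'} = sub {
  my $content = shift;
  print "There will be $content iterations.\n";
};

$handles{'Data properties:'} = sub {
  my $content = shift;
  # (\d+) rather than (\d), so that indices above 9 match as well.
  if( $content =~ /\s*\[(\d+)\]:\s+(\d+)\s+(\d+)\s+(\d+)/ ){
    print "The three arguments in line $1 are $2, $3, $4.\n";
  }
};

my $current_handle;

while( <> ){
  chomp;
  # A ">" line selects the handler; trailing whitespace is not part of the label.
  if( /^>\s*(.*?)\s*$/ ){
    $current_handle = $1;
    next;
  }
  # Skip data that comes before any label or after an unknown label.
  next unless defined $current_handle && exists $handles{$current_handle};
  $handles{$current_handle}->($_);
}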
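Since the original goal was to store the data rather than print it, here is a hedged variant of the same dispatch-table idea that fills a hash instead. The names %data, %handlers and parse_sections, the layout of %data, and the extra "> Comment" section in the sample are all assumptions made for this sketch. Labels that have no handler are skipped, which may help with large files where not every section is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %data;   # parsed values end up here (hypothetical layout)

# One small handler per label; labels missing from this table are skipped.
my %handlers = (
  'K value'          => sub { $data{k_value}    = $1 if $_[0] =~ /(\d+)/ },
  'Iterations'       => sub { $data{iterations} = $1 if $_[0] =~ /(\d+)/ },
  'Data properties:' => sub {
    my ($line) = @_;
    # "[n]: a b c ..." becomes $data{properties}[n] = [a, b, c, ...]
    if ( $line =~ /^\s*\[(\d+)\]:\s*(.*)$/ ) {
      $data{properties}[$1] = [ split ' ', $2 ];
    }
  },
);

# Hypothetical helper: feed every data line of $fh to the current handler.
sub parse_sections {
  my ($fh) = @_;
  my $handler;                       # undef means "ignore these lines"
  while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^>\s*(.*?)\s*$/ ) {
      $handler = $handlers{$1};      # undef for labels we do not care about
      next;
    }
    $handler->($line) if $handler;
  }
  return;
}

# Demo: the sample data plus one section that has no handler.
my $sample = <<'END';
> K value
  25
> Comment
  ignore me 99
> Iterations
  10
> Data properties:
  [0]: 502 194 102
  [1]: 9234  50  1899
  [2]: 95   908   145
END
open my $fh, '<', \$sample or die "Cannot open sample: $!";
parse_sections($fh);
close $fh;
print "K = $data{k_value}, iterations = $data{iterations}\n";
print "Row $_: @{ $data{properties}[$_] }\n" for 0 .. $#{ $data{properties} };
```

Supporting a new label then only means adding one entry to %handlers; the reading loop stays unchanged.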

Last edited by spirit receiver; 07-02-2006 at 01:31 PM.
 
Old 07-02-2006, 11:00 PM   #5
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 547

Original Poster
Rep: Reputation: 30
Thanks spirit receiver, that seems like a nifty solution.


I'm still a little curious though, for the record is it possible to do something like this?

Code:
while (<FILE>) {
  if (m/^> some label/) {
    # read next line from FILE    
  }
  elsif (m/^> other label/) {
    while(1) {
      # read next line from FILE
      if (m/(\d+)/) {
        print "Read $1\n";
      } else {
        last;  # end the loop
      }
    }
  }
}

And of course, if any "read next line from FILE" statement fails because the EOF is reached, I would need to be able to detect that and abort/return from the subroutine. Is something like that possible/recommended in Perl?
 
Old 07-03-2006, 02:51 AM   #6
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
You'll run into trouble with that script: The while(1) loop will be finished once it retrieves a line that doesn't contain a digit. This line contains, say, "> some label". The script will continue with the outer loop, i.e. it will read the next line. This line will contain data, not a label, so it won't trigger any of the if clauses, and all subsequent lines will be ignored until the next label is reached.
 
Old 07-03-2006, 07:48 AM   #7
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
hi,

I wonder how to get a variable number of values into the print line, for situations where the line isn't just three numbers.
Code:
> K value
  25
> Iterations
  10
> Data properties:
  [0]: 502 194 102
  [1]: 9234  50  1899 789
  [2]: 95   908   145 2567 456

Code:
$HANDLES{'Data properties:'} = sub {
  my $CONTENT = shift;
  # Capture everything after "[n]:" so that any number of values is accepted.
  if( $CONTENT =~ /\s*\[(\d+)\]:\s+(.*)/ ){
#  if( $CONTENT =~ /\s*\[(\d)\]:\s+(\d+)\s+(\d+)\s+(\d+)/ ){
    print "The arguments in line $1 are $2.\n";
#   print "The three arguments in line $1 are $2, $3, $4.\n";
  }
};
 
Old 07-03-2006, 09:27 AM   #8
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
I'd suggest splitting the line and iterating over the resulting array:
Code:
# split ' ' skips leading whitespace, so no empty first field is produced.
foreach ( split ' ', $line ) {
  printf( "The next value is %d.\n", $_ ) if /^\d+$/;
}
 
Old 07-03-2006, 10:22 AM   #9
homey
Senior Member
 
Registered: Oct 2003
Posts: 3,057

Rep: Reputation: 61
Thanks, I'll try that.
I was working on something like this...
Code:
$HANDLES{'Data properties:'} = sub {
  my $CONTENT = shift;
  # get the number of args in each line
  my $COUNT = () = $CONTENT =~ /\s\w+/g;
  if( $CONTENT =~ /\s*\[(\d+)\]:\s+(.*)/ ){
    print "The $COUNT arguments in line $1 are   $2\n";
  }
};

Last edited by homey; 07-03-2006 at 12:44 PM.
 
Old 07-03-2006, 11:33 PM   #10
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 547

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by spirit receiver
You'll run into trouble with that script: The while(1) loop will be finished once it retrieves a line that doesn't contain a digit. This line contains, say, "> some label". The script will continue with the outer loop, i.e. it will read the next line. This line will contain data, not a label, so it won't trigger any of the if clauses, and all subsequent lines will be ignored until the next label is reached.
Good point, I didn't think of that when I wrote that snippet. But I could easily just keep a $last_line variable that retains the last line read, couldn't I?
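As a rough sketch of how the $last_line idea could work: the variable is used here as a one-line look-ahead buffer, so the line that ends the inner loop is not lost but becomes the next line the outer loop handles. The sub name read_sections and the returned structure are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [ label, \@data_lines ] pairs.
sub read_sections {
  my ($fh) = @_;
  my @sections;
  my $last_line = <$fh>;                 # prime the one-line buffer

  while ( defined $last_line ) {
    my $line = $last_line;
    $last_line = <$fh>;                  # refill the buffer (undef at EOF)
    chomp $line;
    next unless $line =~ /^>\s*(.*?)\s*$/;
    my $label = $1;
    my @data_lines;

    # Inner loop: consume data lines until the buffer holds the next label.
    while ( defined $last_line && $last_line !~ /^>/ ) {
      ( my $data = $last_line ) =~ s/^\s+|\s+$//g;   # trim whitespace and newline
      push @data_lines, $data;
      $last_line = <$fh>;
    }
    push @sections, [ $label, \@data_lines ];
  }
  return @sections;
}

# Demo on the sample data, read from an in-memory file handle.
my $sample = <<'END';
> K value
  25
> Iterations
  10
> Data properties:
  [0]: 502 194 102
  [1]: 9234  50  1899
  [2]: 95   908   145
END
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @sections = read_sections($fh);
close $fh;
printf "%s: %d data line(s)\n", $_->[0], scalar @{ $_->[1] } for @sections;
```

Because the look-ahead line stays in $last_line when the inner loop stops, the outer loop sees the next label on its following pass instead of skipping it.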



Even though I'd run into trouble with the script, is it possible to do? In other words, what I really wanted to ask was: is it possible to read (or "peek" at) the next line of a file inside a while(<FILE>) loop? (Even if it's not a good idea most of the time...)
 
Old 07-04-2006, 04:19 AM   #11
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
I'm not sure if I understood your question. Do you want to use an inner loop to read from the file without affecting the position where the outer loop will continue in the next pass? Then you'll have to restore the current position for the file handle using tell and seek. But this will only work with ordinary files, not with STDIN, for example.
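A minimal sketch of the tell/seek approach, in case it helps. The helper name peek_line is made up for this example, and the demo reads from an in-memory file handle, which is seekable; as noted above, the same code would not work on a pipe or a terminal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the next line of $fh without consuming it
# (undef at EOF). Works only on seekable handles such as ordinary files.
sub peek_line {
  my ($fh) = @_;
  my $pos = tell $fh;                           # remember the current position
  die "Handle is not seekable" if $pos < 0;     # tell returns -1 on failure
  my $line = <$fh>;
  seek $fh, $pos, 0 or die "Cannot seek: $!";   # 0 = SEEK_SET: jump back
  return $line;
}

# Demo: peek at the line after each label; the outer loop still sees every line.
my $sample = "> K value\n  25\n> Iterations\n  10\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my ( @peeked, $lines_seen );
while ( my $line = <$fh> ) {
  $lines_seen++;
  next unless $line =~ /^>/;
  my $next = peek_line($fh);
  push @peeked, $next if defined $next;
}
close $fh;
print "Outer loop saw $lines_seen lines and peeked at ", scalar @peeked, ".\n";
```

The peeked line is read twice this way, once by peek_line and once by the outer loop, so a look-ahead variable is likely cheaper when the input is large.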
 
Old 07-04-2006, 04:45 PM   #12
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 547

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by spirit receiver
I'm not sure if I understood your question. Do you want to use an inner loop to read from the file without affecting the position where the outer loop will continue in the next pass? Then you'll have to restore the current position for the file handle using tell and seek. But this will only work with ordinary files, not with STDIN, for example.
No, not necessarily an inner loop. Let's just say I want to do something simple like this:

Code:
while (<FILE>) {
    if (m/^>/) {
      my $string = # read the next line of file here
    }
}
I haven't been able to find anything that tells me whether that is possible or not (I'm not necessarily going to use it, I'm just incredibly curious at this point). If such a "read next line" exists without having to do some tedious tell/seeking, then on the next iteration through the loop, after that "read next line of file" call has been made inside the if statement, would the while loop get:

1) The same line that was previously read in the if statement?
2) The line that follows after the line that was previously read in the if statement?


Thanks once again.
 
Old 07-04-2006, 06:10 PM   #13
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
It will read the next line, i.e. 2). Each time you read from a file handle, its current position changes, no matter where that read takes place. Therefore, if you wanted 1) to happen, you'd have to store the current position using tell before reading in the if statement, and restore it later with seek when you leave the if statement.

Edit: Maybe you're also asking how reading from the file in the if statement could be done? Simply by using "my $string = <FILE>;".
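To make that concrete, here is a small sketch that reads one extra line inside the if statement and checks for EOF with defined. It uses a lexical handle $fh instead of the bareword FILE, and the sub name read_label_pairs and the "> Trailing label" line in the sample are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pair each ">" label with the single line that follows it.
sub read_label_pairs {
  my ($fh) = @_;
  my @pairs;
  while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /^>\s*(.*?)\s*$/;
    my $label  = $1;
    my $string = <$fh>;             # reads, and consumes, the next line
    last unless defined $string;    # EOF directly after a label: stop
    chomp $string;
    push @pairs, [ $label, $string ];
  }
  return @pairs;
}

# Demo: the last label has no data line, so the read inside the loop hits EOF.
my $sample = "> K value\n  25\n> Iterations\n  10\n> Trailing label\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @pairs = read_label_pairs($fh);
close $fh;
print "$_->[0] => $_->[1]\n" for @pairs;
```

The outer while condition never sees the lines read inside the if statement, which matches case 2) above. If a label is followed by several data lines, only the first is paired with it here; the rest reach the outer loop and are skipped because they do not begin with ">".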

Last edited by spirit receiver; 07-04-2006 at 06:13 PM.
 
Old 07-05-2006, 04:48 PM   #14
R00ts
Member
 
Registered: Mar 2004
Location: Austin TX, USA
Distribution: Ubuntu 11.10, Fedora 16
Posts: 547

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by spirit receiver
It will read the next line, i.e. 2). Each time you read from a file handle, its current position changes, no matter where that read takes place. Therefore, if you wanted 1) to happen, you'd have to store the current position using tell before reading in the if statement, and restore it later with seek when you leave the if statement.

Edit: Maybe you're also asking how reading from the file in the if statement could be done? Simply by using "my $string = <FILE>;".
That is exactly the answer that I was looking for. I knew it was something simple! Thank you 5x spirit receiver.
 
Old 07-06-2006, 05:59 AM   #15
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
I have split it into a hash of keys and values.
try this:

Code:
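#!/usr/bin/perl -w

use strict;

# Use "\n>" as the record separator, so each record is one label plus its data lines.
local $/ = "\n>";

my @slurp = <>;
chomp @slurp;                                    # drop the trailing "\n>" separator
for (@slurp) { s/^>?\s*//; s/\s+\z// }           # drop the leading ">" and stray whitespace
# split each record into 2 (label, data) and make a hash
my %slurp = map { my ($k, $v) = split "\n", $_, 2; ($k => defined $v ? $v : '') } @slurp;
while ( my ($k, $v) = each %slurp ) {
  print "\n'$k' =\n$v\n";
}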
 
  


All times are GMT -5.
