I have a file that looks similar to the following:
Code:
> K value
25
> Iterations
10
> Data properties:
[0]: 502 194 102
[1]: 9234 50 1899
[2]: 95 908 145
What I need to do is use each line that begins with ">" to determine which variable(s) should store the data on the following lines (those that do not begin with ">"). (Note that it is perfectly valid for a ">" line to be followed by multiple lines of data, not just a single line.)
I've got a really messy (but functional) implementation right now that uses a while loop on the file handle, checks whether the line begins with ">", and if so sets a condition code. On the next pass, if the condition code is set, it matches the line against the data it expects for that condition code and writes the data to the appropriate variable. But I think there has to be a better way to do this (I'm not a Perl guru by any means). The current code is very cumbersome, especially when adding a new condition code, and since this is an ongoing project the data file format may change. So, with what little Perl background I have, I thought of two ideas:
#1: Save any ">" line to a temporary loop variable when one is found, and use that to process subsequent data. (The down-side to this is that these files are rather large, and not all of the data that follows after every ">" line needs to be read...)
#2: When a ">" line is read, continue reading in subsequent lines of the file to get all of the data, without requiring the loop condition to be re-evaluated. (Disadvantage: this could be kind of dangerous since I'd be reading in lines of the file from within segments of the loop in addition to the loop itself...).
I also thought of using the "next", "redo", "continue", and "last" keywords to get around the condition code implementation I currently have, but from what I have read on perldoc.org this doesn't seem possible...
I know that TMTOWTDI, but I would like to know if any more experienced Perl programmers have a suggestion or recommendation on how to accomplish this task. I'm trying to learn more than just the basic operations about Perl as I am working on this project, and I would rather have code that is easier to modify, understand, and maintain than continue to have to support this hacked up solution that I initially created. Thanks!
What does "TMTOWTDI" mean?
I am not a Perl programmer, so maybe you are asking about some very Perl-specific issues. I only know some general programming techniques.
But to me it appears that this is about designing an algorithm, not about the Perl language.
If I have a situation like this (maybe a bit more complicated) I would implement different states, like:
st_reading_kval
st_reading_nritr
st_reading_data
A state would be initialized by reading "> something" and ended when you encounter "> something_else". Once you are in a state, read and process as appropriate until the state ends.
In this way you can handle quite complicated files.
If it is not worth implementing a state machine, maybe you did just fine with your "messy" code.
Or use Lex & Yacc if you can spend 2 years studying how these work.
Quote:
A state would be initialized by reading "> something" and ended when you encounter "> something_else". Once you are in a state, read and process as appropriate until the state ends.
This is exactly what my condition code does (just substitute "state" with "condition code"). Sure, a state machine is a pretty decent design, but it is a very cumbersome implementation in Perl in this particular case.
And yes, this is more of a "how would you do this in Perl" question than a general design question.
Quote:
Originally Posted by jlinkels
Or use Lex & Yacc if you can spend 2 years studying how these work.
jlinkels
Actually I did use lex and yacc about 3 years ago, but I've completely forgotten them since then.
How about this? It reads your data from STDIN. There are three subroutines, with references to them stored in a hash where the hash keys correspond to the ">" lines.
Code:
#!/usr/bin/perl -w
use strict;

# Map each ">" label to the subroutine that handles its data lines.
my %handles;
$handles{'K value'} = sub {
    my $content = shift;
    print "I just got a K value of ",$content,".\n";
};
$handles{'Iterations'} = sub {
    my $content = shift;
    print "There will be $content iterations.\n";
};
$handles{'Data properties:'} = sub {
    my $content = shift;
    if( $content =~ /\s*\[(\d+)\]:\s+(\d+)\s+(\d+)\s+(\d+)/ ){
        print "The three arguments in line $1 are $2, $3, $4.\n";
    }
};

my $current_handle;
while( <> ){
    chomp;
    # A label line selects the handler for the data lines that follow.
    if( $_ =~ /^>\s*(.*?)\s*$/ ){
        $current_handle = $1;
        next;
    }
    # Skip data that precedes any label or follows an unknown label.
    next unless defined $current_handle && exists $handles{$current_handle};
    $handles{$current_handle}->($_);
}
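Since the original question was about storing the data rather than printing it, here is a rough sketch of the same idea that collects every data line under its label. The name read_sections and the hash-of-arrays layout are my own assumptions, not something from the posts above, and it reads from whatever file handle you pass in.

```perl
use strict;
use warnings;

# Hypothetical helper: parses the labelled format into a hash of array refs,
# one entry per ">" label, each holding that label's data lines in order.
sub read_sections {
    my ($fh) = @_;
    my ( %sections, $label );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>\s*(.*?)\s*$/ ) {
            $label = $1;
            $sections{$label} ||= [];    # keep labels that have no data
            next;
        }
        next unless defined $label;      # ignore data before the first label
        push @{ $sections{$label} }, $line;
    }
    return \%sections;
}
```

With this, something like $sections->{'K value'}[0] would hold the K value, and adding a new label to the file needs no new code until you actually want to use its data.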
Last edited by spirit receiver; 07-02-2006 at 01:31 PM.
Thanks spirit receiver, that seems like a nifty solution.
I'm still a little curious, though. For the record, is it possible to do something like this?
Code:
while (<FILE>) {
    if (m/^> some label/) {
        # read next line from FILE
    }
    elsif (m/^> other label/) {
        while(1) {
            # read next line from FILE
            if (m/(\d+)/) {
                print "Read $1\n";
            } else {
                last;    # end the loop
            }
        }
    }
}
And of course, if any "read next line from FILE" statement fails because the EOF is reached, I would need to be able to detect that and abort/return from the subroutine. Is something like that possible/recommended in Perl?
You'll run into trouble with that script: The while(1) loop will be finished once it retrieves a line that doesn't contain a digit. This line contains, say, "> some label". The script will continue with the outer loop, i.e. it will read the next line. This line will contain data, not a label, so it won't trigger any of the if clauses, and all subsequent lines will be ignored until the next label is reached.
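One possible way around the dropped line, as a sketch only: keep a one-line pushback variable, so that the line which ended the inner loop is handed back to the outer loop instead of being lost. The names parse_blocks and $pending are made up for illustration, and the data pattern is anchored to the start of the line (an assumption) so that a label containing a digit is not mistaken for data.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the numbers that follow each "> other label"
# line, handing the terminating line back to the outer loop via $pending.
sub parse_blocks {
    my ($fh) = @_;
    my @numbers;
    my $pending;    # one-line pushback buffer
    while ( defined( my $line = defined $pending ? $pending : <$fh> ) ) {
        undef $pending;
        chomp $line;
        next unless $line =~ /^> other label/;
        # Inner loop: consume data lines until a non-numeric line or EOF.
        while ( defined( my $data = <$fh> ) ) {
            chomp $data;
            if ( $data =~ /^\s*(\d+)/ ) {
                push @numbers, $1;
            }
            else {
                $pending = $data;    # not data: let the outer loop see it
                last;
            }
        }
    }
    return @numbers;
}
```

Reaching EOF inside the inner loop simply ends both loops, because the next read in the outer loop also returns undef.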
I wonder how to get a variable into the print line. For situations where the line isn't just three numbers.
Code:
> K value
25
> Iterations
10
> Data properties:
[0]: 502 194 102
[1]: 9234 50 1899 789
[2]: 95 908 145 2567 456
Code:
$HANDLES{'Data properties:'} = sub {
    my $CONTENT = shift;
    # Capture the index, then everything after it, however many values follow.
    if( $CONTENT =~ /\s*\[(\d+)\]:\s+(.*)/ ){
    # if( $CONTENT =~ /\s*\[(\d)\]:\s+(\d+)\s+(\d+)\s+(\d+)/ ){
        print "The arguments in line $1 are $2.\n";
        # print "The three arguments in line $1 are $2, $3, $4.\n";
    }
};
Thanks, I'll try that.
I was working on something like this...
Code:
$HANDLES{'Data properties:'} = sub {
    my $CONTENT = shift;
    # get the number of args in each line
    my $COUNT = () = $CONTENT =~ /\s\w+/g;
    if( $CONTENT =~ /\s*\[(\d+)\]:\s+(.*)/ ){
        print "The $COUNT arguments in line $1 are $2\n";
    }
};
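An alternative to counting regex matches, offered only as a sketch: split the part after the index on whitespace, which gives you both the count and the individual values. The name parse_data_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index and the list of values from a line
# such as "[1]: 9234 50 1899 789", or an empty list if the line does not match.
sub parse_data_line {
    my ($content) = @_;
    return unless $content =~ /^\s*\[(\d+)\]:\s*(.*?)\s*$/;
    my ( $index, $rest ) = ( $1, $2 );
    my @values = split ' ', $rest;    # split on runs of whitespace
    return ( $index, @values );
}

my ( $index, @values ) = parse_data_line('[2]: 95 908 145 2567 456');
print "The ", scalar(@values), " arguments in line $index are @values\n";
```

Having the values in an array also means they can be stored or processed one by one, rather than only printed as a single string.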
Quote:
You'll run into trouble with that script: The while(1) loop will be finished once it retrieves a line that doesn't contain a digit. This line contains, say, "> some label". The script will continue with the outer loop, i.e. it will read the next line. This line will contain data, not a label, so it won't trigger any of the if clauses, and all subsequent lines will be ignored until the next label is reached.
Good point, I didn't think of that when I wrote that snippet. But I could easily just keep a $last_line variable that retains the last line read, couldn't I?
Even though I'd run into trouble with the script, is it possible to do? In other words, what I really wanted to ask was: is it possible to read (or "peek" at) the next line of a file inside a while(<FILE>) loop? (Even if it's not a good idea most of the time...)
I'm not sure if I understood your question. Do you want to use an inner loop to read from the file without affecting the position where the outer loop will continue in the next pass? Then you'll have to restore the current position for the file handle using tell and seek. But this will only work with ordinary files, not with STDIN, for example.
Quote:
I'm not sure if I understood your question. Do you want to use an inner loop to read from the file without affecting the position where the outer loop will continue in the next pass? Then you'll have to restore the current position for the file handle using tell and seek. But this will only work with ordinary files, not with STDIN, for example.
No, not necessarily an inner loop. Let's just say I want to do something simple like this:
Code:
while (<FILE>) {
    if (m/^>/) {
        my $string = # read the next line of file here
    }
}
I haven't been able to find anything that tells me whether that is possible or not (I'm not necessarily going to use it, I'm just incredibly curious at this point). If such a "read next line" exists without having to do some tedious tell/seek calls, then on the next iteration of the loop after that "read next line of file" call has been made inside the if statement, would the while loop get:
1) The same line that was previously read in the if statement?
2) The line that follows after the line that was previously read in the if statement?
It will read the next line, i.e. 2). Each time you read from a file handle, its current position will be changed; it doesn't matter where that reading takes place. Therefore, if you wanted 1) to happen, you'd have to store the current position using tell before reading in the if statement, and restore it later with seek when you leave the if statement.
Edit: Maybe you're also asking how reading from the file in the if statement could be done? Simply by using "my $string = <FILE>;".
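As a rough sketch of the tell/seek variant described above (the name peek_line is made up, and the in-memory file handle is only there to make the example self-contained):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the next line without consuming it.
# Only works on seekable handles (regular files), not on pipes or a terminal.
sub peek_line {
    my ($fh) = @_;
    my $pos  = tell($fh);                             # remember where we are
    my $line = <$fh>;                                 # read ahead one line
    seek( $fh, $pos, 0 ) or die "seek failed: $!";    # 0 = SEEK_SET
    return $line;                                     # undef at end of file
}

my $data = "> K value\n25\n> Iterations\n10\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    next unless $line =~ /^>/;
    my $next = peek_line($fh);    # the loop will read this same line again
    print "Label: $line  next line: $next" if defined $next;
}
```

Without the seek call the loop would behave as in case 2): the line read inside the if statement would be gone by the time the while condition reads again.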
Last edited by spirit receiver; 07-04-2006 at 06:13 PM.
Quote:
It will read the next line, i.e. 2). Each time you read from a file handle, its current position will be changed; it doesn't matter where that reading takes place. Therefore, if you wanted 1) to happen, you'd have to store the current position using tell before reading in the if statement, and restore it later with seek when you leave the if statement.
Edit: Maybe you're also asking how reading from the file in the if statement could be done? Simply by using "my $string = <FILE>;".
That is exactly the answer that I was looking for. I knew it was something simple! Thank you 5x spirit receiver.
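I have split it into a hash of keys and values.
Try this:
Code:
#!/usr/bin/perl -w
use strict;
local $/ = "\n>";    # a record ends where the next ">" line begins
chomp( my @slurp = <> );    # chomp strips the trailing "\n>" separator
s/^>?\s*// for @slurp;      # strip the ">" left on the first record and leading blanks
my %slurp = map { ( split( /\n/, $_, 2 ), '' )[ 0, 1 ] } @slurp;    # label => data lines
print "\n'$_' = \n$slurp{$_}" for sort keys %slurp;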