I'm using Perl 5.8.7 on Cygwin 1.5.18 and recently I ran into a strange problem:
I have file1 whose contents are along the lines of:
...
XXX 10
YYY 12
ZZZ 17
...
I have file2 whose contents are along the lines of:
...
XXX
AAA
ZZZ
DDD
...
I have a perl script which reads each line from file2, attempts a match in file1, then extracts the second field:
Code:
...
my $file1 = ## path to file1 ##
my $file2 = ## path to file2 ##
open IN, "<$file2" or die "blabblahblah"
while (<IN>) {
chomp;
my $d = $_; print "d=zzz${d}zzz\n";
my $left = `grep $d $file1`; print "left=aaa${left}aaa\n";
chomp $left; print "left=bbb${left}bbb\n";
my $v = "-";
if ($left ne "") { $v = (split)[1], $left; }
print "v=ccc${v}ccc\n";
...
}
...
My debug print statements output something totally unexpected (I'm only going to show one attempted match for XXX below; the rest are similar):
d=zzzXXXzzz
left=aaaXXX 10
aaa
bbbt=bbbXXX 10
v=cccccc
Calling chomp on the newline-terminated string returned by grep totally messed up that string (as seen in the debug output). Subsequently, (split)[0] on that string returns XXX as expected (not shown here), but (split)[1] on that string returns an empty string (instead of 10 as expected). Does anyone know what is going on here or how to fix it? Thanks in advance.
Here's something a little simpler that does it. Just call it with file1 and file2 as arguments (in that order). And please don't put two statements on one line in a program; it makes things impossible to read.
Code:
#!/usr/bin/perl
while(<>){
    my ($key,$val) = split /\s+/;
    if ($hash{$key}) {
        print $hash{$key},"\n";
    } else {
        $hash{$key} = $val;
    }
}
1) A user on Linux Forums pointed out to me that I called split on the default $_ instead of $left
Incorrect:
Code:
if ($left ne "") { $v = (split)[1], $left; }
Should be:
Code:
if ($left ne "") { $v = (split /\s+/, $left)[1]; }
This caused $v to be undef/blank in the debug printouts.
2) Chomp has a problem with windows-style terminated strings, i.e. \r\f. Chomp only removes the \r leaving the \f intact, which causes the wraparound problem as shown in the debug output. The hanging \f also causes other string comparison problems.
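To illustrate point 1 above, here is a minimal sketch of how Perl parses the two forms. The sample values are made up for illustration. In the original line the comma acts as the comma operator, so the assignment only sees (split)[1], which splits $_ rather than $left.

```perl
use strict;
use warnings;

# Sample values, assumed for illustration only.
my $left = "XXX 10";
local $_ = "XXX";    # what the while (<IN>) loop leaves in $_ after chomp

# Original form: parsed as ($v_wrong = (split)[1]), $left;
# split with no arguments splits $_, which holds a single field here.
my $v_wrong;
{
    no warnings 'void';    # the dangling ", $left" is a useless expression
    $v_wrong = (split)[1], $left;
}

# Corrected form: split $left explicitly and take the second field.
my $v_right = (split /\s+/, $left)[1];

print defined $v_wrong ? "wrong=$v_wrong\n" : "wrong=undef\n";
print "right=$v_right\n";
```

Running with warnings enabled would have flagged the stray ", $left" as a useless use of a variable in void context, which is usually a good hint that the line is not doing what it looks like.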
Chomp removes a trailing copy of the input record separator (special variable $/), which by default is "\n". Set it to whatever you want and chomp will remove that instead. Alternatively, you can use the regex
Code:
s/\s+$//
which will remove any and all whitespace characters (including carriage returns and line feeds) from the end of the string.
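A small sketch of both approaches, in case it helps. The sample strings are made up; the only thing relied on is that chomp strips one trailing copy of whatever $/ currently holds.

```perl
use strict;
use warnings;

# chomp removes one trailing copy of whatever $/ currently holds.
my $unix = "XXX 10\n";
chomp $unix;                   # $/ is "\n" by default

my $dos = "XXX 10\r\n";
{
    local $/ = "\r\n";         # treat CRLF as the record separator in this block only
    chomp $dos;
}

# The regex approach does not depend on $/ at all.
my $mixed = "XXX 10 \r\n";
$mixed =~ s/\s+$//;

print "[$unix] [$dos] [$mixed]\n";
```

Note that the regex also eats trailing spaces and tabs, which is usually fine for this kind of data but is not quite the same thing as chomp.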
puffinman, thanks for the suggestion. But I see a problem with each of the alternatives:
1) Setting $/
This means the script will only work correctly for one specific type of files (unix, windows, or mac). Certainly, this alternative will not work if you don't know ahead of time which type of files your script will have to deal with. Also, this definitely won't work if your script needs to work with more than one type of files.
2) Using regex s/\s+$// instead of chomp
This of course will work with all types of files. I can even define my custom chomp to do this regex if calling chomp is more convenient. However, this highlights the problem of having to define (or redefine) common functions in Perl just to have my scripts work correctly cross-platform. If there are a dozen more like chomp, then I have to redefine them all for every single one of my scripts? Wouldn't it be better if the Perl language were implemented with cross-platform use in mind instead of shifting this burden to its programmers?
Chomp has a problem with windows-style terminated strings, i.e. \r\f.
Actually, it's \r\n, not \r\f. \f is a form feed, not a newline. Anyway, chomp is operating exactly as it's supposed to, removing the trailing \n from a string. As long as you're aware of this, there isn't any problem: just strip out the \r's ($line =~ s/\r//).
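If you want to see the leftover character for yourself, something along these lines should do. The show sub is a hypothetical debugging helper, not anything built in, and the sample string assumes grep handed back a CRLF-terminated line. The stray \r is also what makes the debug output look scrambled: the terminal returns to column 0 and the closing "bbb" overwrites the start of "left=".

```perl
use strict;
use warnings;

# Hypothetical debugging helper: make CR, LF and FF visible as hex escapes.
sub show {
    my ($s) = @_;
    $s =~ s/([\r\n\f])/sprintf('\\x%02X', ord $1)/ge;
    return $s;
}

my $left = "XXX 10\r\n";    # assumed: a CRLF-terminated line, as grep would return it
chomp $left;                # with the default $/, only the "\n" goes
print "after chomp:     ", show($left), "\n";

$left =~ s/\r\z//;          # remove the leftover carriage return
print "after s/\\r\\z//: ", show($left), "\n";
```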
Quote:
Originally Posted by thanhvn
1) Setting $/
This means the script will only work correctly for one specific type of files (unix, windows, or mac). Certainly, this alternative will not work if you don't know ahead of time which type of files your script will have to deal with.
Exactly, so you need to normalize your incoming data so that it all has the expected format.
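One way to do that normalization, as a rough sketch: slurp the text and convert every line ending to \n before doing anything else. The normalize_eol name and the sample data are my own, purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF (Windows) and lone CR (old Mac) to LF.
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Made-up sample with all three line-ending styles mixed together.
my $raw   = "XXX 10\r\nYYY 12\rZZZ 17\n";
my @lines = split /\n/, normalize_eol($raw);

print scalar(@lines), " lines: @lines\n";
```

After this step the rest of the script only ever sees \n, so the usual chomp and split calls behave the same no matter where the file came from.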
Quote:
Originally Posted by thanhvn
However, this highlights the problem of having to define (or redefine) common functions in Perl just to have my scripts work correctly cross-platform. If there are a dozen more like chomp, then I have to redefine them all for every single one of my scripts? Wouldn't it be better if the Perl language were implemented with cross-platform use in mind instead of shifting this burden to its programmers?
No, I don't see M$ bending over backwards so their applications will run on Linux, but that's beside the point. There's quite a bit of functionality in Perl and M$ can't support half of it simply because it doesn't have the facilities (such as socket programming). Since Perl started on, is developed on and the majority of scripts run on Linux and Unix systems, that's where the focus of expansion of features resides. Further, since Perl "lives" in Linux, it has to at least try to be as backward compatible as possible with earlier versions of Perl, and a new chomp function, a function which is in just about every script out there that reads a file, would probably (or at least possibly) break all of those scripts (since the original developers would already have dealt with it only removing \n if they had to). If you're writing a cross-platform script, you just have to take that into consideration and code accordingly. I could perhaps see adding in a Linux-safe alternative function to chomp, but not replacing chomp altogether. The same goes for any other function that works just fine under *nix and not Windows. In fact, it wouldn't surprise me at all if there's already a Perl module that includes this functionality.
Every single developer I know also keeps a collection of code snippets, no matter what language they work in. If you're going to be writing a lot of cross platform scripts, perhaps you should invest some time into creating a few subs to keep around and just include in your scripts as needed.
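As an example of the kind of sub I mean, here is a sketch of a line-ending-agnostic chomp. The name chomp_any is made up and this is not a core function; it modifies its arguments in place and returns the number of characters removed, loosely mirroring chomp.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of core Perl.
# Strips one trailing line ending (CRLF, LF or CR) from each argument in place
# and returns the total number of characters removed.
sub chomp_any {
    my $removed = 0;
    for (@_) {                      # $_ aliases the caller's variables
        next unless defined;
        $removed += length $1 if s/(\r\n|\n|\r)\z//;
    }
    return $removed;
}

# Made-up sample lines with Windows, Unix, old Mac and no terminator.
my @lines = ("XXX\r\n", "AAA\n", "ZZZ\r", "DDD");
my $count = chomp_any(@lines);
print join('|', @lines), "\n";
```

Keep it in a file of snippets and paste it into any script that has to read files of unknown origin.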
Now, on to your code. To begin with, you need to work on your formatting, especially, as has already been pointed out, hit the enter key every once in a while (as in, after every ; { or }). Your 4th line down in the OP is missing a semi-colon on the end. Finally, you look as though you're trying to fit as much code as possible into a small space; putting multiple statements on one line, using single character variables and so forth. If you keep this up, you're going to get lost fast when you start writing larger scripts. Each statement should be on its own line, and each variable should have a descriptive name, this isn't C.
The following code solves your issue and does so in a clear and concise manner. Anyone with a passing knowledge of Perl should be able to read this without difficulty. This not only makes troubleshooting easier, but getting used to writing code like this allows you to see at a glance what the script is doing.
Code:
#!/usr/bin/perl -w
my $keysfile = 'file2.txt'; ## the file containing the keys (file2 in the original post) ##
my $datafile = 'file1.txt'; ## the file containing the key/value data (file1 in the original post) ##
# this is boilerplate code for reading a file into an array
open(FILE,"$keysfile") || die "Cannot open $keysfile: $!";
my @keys = <FILE>;
close FILE;
chomp(@keys);
s/\r$// for @keys; # chomp leaves a trailing \r on Windows-style lines
# ...so do it again for the data file
open(FILE,"$datafile") || die "Cannot open $datafile: $!";
my @data = <FILE>;
close FILE;
chomp(@data);
# Now the line terminators no longer matter: the keys have had both \n and \r
# stripped, and the split(/\s+/) below discards any trailing \r on the data
# lines. We're not searching for anything on the end of the element.
# you have 2 *sane* options here: Either spin the keys array and
# regex match it to the data array, or split the data into a hash.
# Personally, I think a hash is perfect for this because you can do a direct match
# in this situation
my %datahash = ();
foreach my $el (@data){
    my($key,$value) = split(/\s+/,$el);
    $datahash{$key} = $value;
}
# Finally, all you need to do now is spin through the keys array
# and match that to the data hash. If you opted to not assign the data to a hash,
# you would simply perform a regex on the @data array here instead of the match
# to the %datahash key.
foreach my $key (@keys){
    my $data_value = ''; # reinit with each pass
    # test with defined rather than truthiness so a value of 0 is not skipped
    if( defined $datahash{$key} && $datahash{$key} ne '' ){
        $data_value = $datahash{$key};
        # now you have $key and $data_value and you can handle them
        # however you want
        print $key.' = '.$data_value."\n";
    }
}
exit;