Hi.
The awk language can often be more understandable than perl.
Here is a timing of your code, the code with perl in it, and an awk script. The data file is 1000 lines and was randomly generated so that the mean of the first column is 107; the second column contains the line number, so we expect about 500 values to be more than 107.1:
Code:
% ./doit
Data file contains number of lines: 1000
User code 1, expr, expect wrong answers:
Lines greater than 107.1 = 959
real 0m0.849s
user 0m0.291s
sys 0m0.515s
User code 2, perl:
Lines greater than 107.1 = 503
real 0m2.047s
user 0m0.979s
sys 0m0.981s
Code 3, gawk:
Lines greater than 107.1 = 503
real 0m0.005s
user 0m0.003s
sys 0m0.002s
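For reference, a data file like the one described above could be produced with a short awk program. This is a hypothetical sketch (the actual generator was not shown); it approximates a normal value around 107 by summing twelve uniform deltas:

```shell
# Hypothetical generator for a data file like r2: first column is a
# value distributed around 107, second column is the line number.
awk 'BEGIN {
    srand()
    for (i = 1; i <= 1000; i++) {
        # sum of 12 uniforms minus 6 approximates a standard normal
        s = 0
        for (j = 1; j <= 12; j++) s += rand()
        printf "%.3f %d\n", 107 + (s - 6), i
    }
}' > r2
```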
The awk script is:
Code:
#!/bin/sh
rm -f r3
gawk '
{ lines++ }
$1 < 107.1 { hits++; print $2 >> "r3" }
END { print " Lines greater than 107.1 =", lines-hits }
' r2
Using shell constructs for large numbers of comparisons is slow, and shell arithmetic only works on integers (zsh may be different, since it supports floating point). Adding a load of the perl interpreter for each comparison means one process startup per line, which is very expensive.
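For illustration, the expensive pattern looks something like this, a hypothetical reconstruction (the original slow code was not shown) in which every line of the file forks and execs a full perl interpreter just to compare two numbers:

```shell
# Hypothetical sketch of the slow per-line-perl pattern.
# A tiny sample file stands in for r2 here.
printf '%s\n' '106.5 1' '108.2 2' '110.0 3' '100.1 4' > r2.sample

count=0
while read -r a b; do
    # one full perl startup per comparison -- this is the bottleneck
    if perl -e "exit(($a > 107.1) ? 0 : 1)"; then
        count=$((count + 1))
    fi
done < r2.sample
echo " Lines greater than 107.1 = $count"
```

On a 1000-line file that is roughly 1000 process startups, which is where the two seconds in the perl timing above go; the awk version does the whole file in a single process.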
Briefly, the awk script says: for each line, increase the variable "lines" by 1; if the first value on the line is less than 107.1, append the second field to file r3; and at the end-of-file of r2, print the count of lines that were not below the threshold.
Learning a language like awk can save you a lot of time, and awk is likely to be more accessible than perl. (Once you get used to awk, you can convert awk scripts to perl automatically if you desire.) ... cheers, makyo
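The automatic conversion mentioned above is typically done with a2p, the awk-to-perl translator that shipped with older Perl distributions (it was removed from the core distribution around Perl 5.22, so it may not be present on modern systems). A hedged sketch, using a stripped-down version of the counting script:

```shell
# Write a small awk program to a file so a2p can translate it.
cat > count.awk <<'EOF'
{ lines++ }
$1 < 107.1 { hits++ }
END { print " Lines greater than 107.1 =", lines-hits }
EOF

# Translate to perl if a2p is available (it may not be).
if command -v a2p >/dev/null 2>&1; then
    a2p count.awk > count.pl
    perl count.pl r2
fi
```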