Addition with matching arrays (or something)

slakmagik · 02-17-2004, 06:38 PM

I have a three stage project and I've completed two of them. The last has my brains dribbling out my ears. I have an html file with names and numbers amidst strings of markup, two sections of which are relevant (three, but the third will take care of itself). And I have a text file with names and numbers in two columns. Like so:

Code:

...
<tr><td>foo</td><blah>blah</blah><td>John Doe</td><blah>blah</blah><td>112</td></tr>
...

and

Code:

...
        1 John Doe
...

I want to take the name John Doe in the text file and match it with the John Doe in the html file and, if they match, add John Doe's number to his old number. Then I'd sed the html to replace 112 with 113. And so on for all the names and numbers in the text file. And the length (number of name/number lines) of both files may vary, especially for the text file. (This is a repetitive operation.)

I am not a programmer (as will be obvious) and this is not a script, but notes.

Code:

#!/bin/bash
OLDIFS=$IFS
IFS=";"
cname=($(awk -F"<[^>][^>]*>" '{ORS = ";"; print $4}' page.html))
cnum=($(awk -F"<[^>][^>]*>" '{ORS = ";"; print $7}' page.html))
name=($(awk -F"	" '{ORS = ";"; print $2}' new.txt))
num=($(awk -F"	" '{ORS = ";"; print $1}' new.txt))

echo ${name[14]}
echo ${num[14]}
echo ${cname[87]}
echo ${cnum[87]}
echo "$(calc "${cnum[87]}+${num[14]}")"
if [ "${name[14]}" = "${cname[87]}" ]; then
	echo "Yippee"
else
	echo "Shit"
fi	
IFS=$OLDIFS

The output is

Code:

John Doe 
      1
John Doe
112
113
Shit

So I've gotten as far as isolating the relevant fields and being able to add them. ('calc' is a small function in my ~/.bash_profile; I wrote it because I didn't like the syntax of 'expr', or maybe it was 'bc'.) I'm pretty sure I can sed the lines if I can just get to that point. The reason everything is so goofy is the stupid spaces in the names, the markup gibberish, and the fact that I'm an idiot. The indexes 87 and 14 are arbitrary - I was just using them to see if anything worked.
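For what it's worth, bash can do the integer addition itself, so the 'calc' helper may not be needed for this step. A minimal sketch, assuming the values look like the ones printed above (the variable names here are only for illustration):

```shell
#!/bin/bash
# Sample values as they come out of the arrays; the count has leading spaces.
cnum="112"
num="      1"

# $(( )) evaluates integer arithmetic; the padding around the 1 is ignored.
total=$(( cnum + num ))
echo "$total"    # prints 113
```

This only covers whole numbers, which is all the counts in the two files appear to be.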

Anyway - I have no idea how to compare the two variables in each array to see if they match, no idea how to iterate over the list(s), and no idea what I'm doing in general. I'm just making it up as I go along. If anyone could tell me where I've gone wrong in utterly misconstructing the problem/solution or could push this gibberish the rest of the way or, quite frankly, in this instance, just write the thing for a mentally challenged person, I'd be eternally grateful. There's just a fundamental *thing* about scripting that I'm just not getting. And, yeah, I suppose I should 'use perl' but I intend to get familiar with bash, awk, sed, and all those great guys and am willing to learn... I just can't seem to. But I have no interest (any more) in perl and wouldn't want to (try) to learn it for this single task.
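One way to compare every element of one array against every element of another is a pair of nested loops over the array indexes. The sketch below is only an illustration: the array contents are made-up sample data standing in for what the awk commands produce, and trim is a hypothetical helper added so that a stray leading or trailing space does not spoil the comparison.

```shell
#!/bin/bash
# Hypothetical sample data standing in for the four arrays built by awk.
name=("Jane Roe" "John Doe ")      # note the trailing space on the second name
num=("      2" "      1")
cname=("John Doe" "Jane Roe" "Sam Poe")
cnum=(112 40 7)

# trim: strip leading and trailing whitespace from $1 (hypothetical helper).
trim() {
    local s=$1
    s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
    s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
    printf '%s' "$s"
}

# Outer loop: each new name. Inner loop: each name already on the page.
for i in "${!name[@]}"; do
    n=$(trim "${name[i]}")
    for j in "${!cname[@]}"; do
        if [ "$n" = "$(trim "${cname[j]}")" ]; then
            cnum[j]=$(( cnum[j] + num[i] ))   # add the new count to the old one
        fi
    done
done

for j in "${!cname[@]}"; do
    echo "${cname[j]}: ${cnum[j]}"
done
```

With the sample data this prints John Doe: 113, Jane Roe: 42 and Sam Poe: 7, one per line. "${!name[@]}" expands to the list of indexes, which is what lets the same position be used in both name and num.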

Sorry about the length of the post but I didn't know how to make it any shorter without making it even less clear. Now I'm going to go lie down. I've been working on this for days (or months, depending on how you look at it) and one version of these notes went down in flames when a hard drive crashed today and I had to reconstruct. It just doesn't seem like it should be hard. 'If field 2 of each record of file1 matches field 4 of any record of file2, add field 1 of file1 to field 7 of file2'.
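That one-sentence rule maps fairly directly onto awk's two-file idiom: read the first file into an array keyed by name, then walk the second file and look each name up. What follows is a sketch under assumptions, not a finished script. The sample files are invented, the field numbers nf=7 and cf=11 fit these sample rows rather than the real page (where they are reportedly 4 and 7), and it assumes the count is the last cell of each row.

```shell
#!/bin/bash
# Hypothetical sample files modelled on the lines quoted in the post.
work=$(mktemp -d)
printf '      1\tJohn Doe\n      2\tJane Roe\n' > "$work/new.txt"
cat > "$work/page.html" <<'EOF'
<table>
<tr><td>foo</td><blah>blah</blah><td>John Doe</td><blah>blah</blah><td>112</td></tr>
<tr><td>bar</td><blah>blah</blah><td>Sam Poe</td><blah>blah</blah><td>7</td></tr>
<tr><td>baz</td><blah>blah</blah><td>Jane Roe</td><blah>blah</blah><td>40</td></tr>
</table>
EOF

# nf and cf are the tag-delimited field numbers of the name and the count.
awk -v nf=7 -v cf=11 '
    # First file: remember how much to add for each (trimmed) name.
    NR == FNR {
        key = $2
        gsub(/^[ \t]+|[ \t]+$/, "", key)
        add[key] += $1
        next
    }
    # Second file: bump the count on rows whose name was seen.
    ($nf in add) && $cf ~ /^[0-9]+$/ {
        new = $cf + add[$nf]
        # Assumes the count is the last cell of the row.
        sub(/>[0-9]+<\/td><\/tr>$/, ">" new "</td></tr>")
    }
    { print }
' FS='\t' "$work/new.txt" FS='<[^>]+>' "$work/page.html" > "$work/updated.html"

cat "$work/updated.html"
```

The FS=... assignments placed between the file names switch the field separator from a tab to "any tag" when awk moves on to the second file. The row is edited with sub() on the whole line rather than by assigning to a field, because assigning to a field would make awk rebuild the line and throw the tags away.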

ToniT · 02-25-2004, 07:14 PM

ok,

If I understand correctly, the problem is with trimming the spaces/tabs away from the starts and the ends of lines/fields.

Here is a sample sed script that does the trick:

Code:

echo -e '  \t  joo   \npajoo  \n  Niinpaniin' | sed 's/^[ \t]*\([^ \t]*\)[ \t]*$/\1/' | hexdump -C

(The echo is for generating sample data and hexdump is just for verifying that the result is correct).
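One caveat: the expression above captures a single run of non-blank characters, so a field with a space inside it, such as John Doe, would not match and would pass through untrimmed. A variant that trims each end separately and leaves inner spaces alone might look like this (the sample names are made up, and od stands in for hexdump):

```shell
#!/bin/bash
# Sample data: blanks and tabs around two-word names.
trimmed=$(printf '  \t John Doe \t \nJane Roe  \n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//')

# od -c shows every character, so leftover blanks would be visible.
printf '%s\n' "$trimmed" | od -c
```

The first substitution removes whitespace at the start of each line and the second removes it at the end; nothing in between is touched.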

slakmagik · 02-25-2004, 09:52 PM

Hm. Ironically, I figured out a while ago that a space was the reason the variables weren't testing as equal. But that wasn't the primary problem, and it wasn't cropping up for any sensible reason. The second, two-column file is actually generated mostly by sed in another script. I just changed this

sed -e 's/.* - //;s/\[ .*//g' ${dir}file.txt | \
sort | uniq -ci > ${dir}${date}_totals.txt

to this

sed -e 's/.* - //;s/ \[ .*//g' ${dir}file.txt | \
sort | uniq -ci > ${dir}${date}_totals.txt

iirc. It drove me batty because I couldn't *see* anything wrong, but the initial sed script removed the bracket and everything after it while leaving the space in front of the bracket behind as a trailing space. The new one strips that space along with the bracket and everything after it, and now the names test true. (The gaps between the numbers and the names are actually tabs and not an issue, and the html strips clean.)
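A trailing space like that can be made visible by piping the result through sed -n l, which marks the end of every line with a $. A small sketch using the two sed scripts above on an invented input line (the real layout of file.txt isn't shown here, so the sample line is only a guess):

```shell
#!/bin/bash
# Hypothetical input line; the real layout of file.txt is an assumption.
line='2004-02-17 - John Doe [ some note ]'

before=$(printf '%s\n' "$line" | sed -e 's/.* - //;s/\[ .*//g')
after=$(printf '%s\n' "$line" | sed -e 's/.* - //;s/ \[ .*//g')

# sed -n l prints each line unambiguously and marks its end with $.
printf '%s\n' "$before" | sed -n l   # John Doe $
printf '%s\n' "$after"  | sed -n l   # John Doe$
```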

But thanks very much for that - it would have set me on the right track and I was beginning to think this was a permanent zero-reply thread.

My actual problem is that I simply don't know what the hell I'm doing.

As I say, in human terms, I manually read a line from file A, extracting the value of field 2. I then scan file B for a match. Having found the match, I take the value of field 1 from the same line of file A and add it to the corresponding value of field 7 of file B. And so it repeats for each line of file A until file A is exhausted.
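Those manual steps can also be written down almost word for word as a shell loop. This is a rough sketch with invented sample files, assuming each name appears in at most one row of file B, that the number is the last cell of its row, and that names contain no characters special to sed (such as / or .):

```shell
#!/bin/bash
# Hypothetical sample files: A is tab-separated (count, name), B is the html.
work=$(mktemp -d)
printf '      1\tJohn Doe\n      3\tNo Such Person\n' > "$work/a.txt"
printf '%s\n' \
  '<tr><td>foo</td><td>John Doe</td><td>112</td></tr>' \
  '<tr><td>bar</td><td>Jane Roe</td><td>40</td></tr>' > "$work/b.html"

# Read file A one line at a time: count first, then name.
while IFS=$'\t' read -r count name; do
    # Scan file B for the row holding this name; skip names with no match.
    row=$(grep -F "<td>$name</td>" "$work/b.html") || continue
    # Pull the old number out of the last cell of that row.
    old=$(printf '%s\n' "$row" | sed 's/.*<td>\([0-9][0-9]*\)<\/td><\/tr>$/\1/')
    new=$(( old + count ))
    # Rewrite only the matching row, leaving every other line untouched.
    sed "/<td>$name<\/td>/ s/<td>$old<\/td><\/tr>\$/<td>$new<\/td><\/tr>/" \
        "$work/b.html" > "$work/b.tmp" && mv "$work/b.tmp" "$work/b.html"
done < "$work/a.txt"

cat "$work/b.html"
```

It rereads and rewrites file B once for every line of file A, so it is slow on large files, but each statement corresponds to one of the steps described above.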

I just don't know how to translate the algorithm to machine terms and script it. I *thought* an 'array' and letting awk and/or sed 'process' them would be the way to go. But I don't even understand 'arrays' or know quite *how* to 'process' them. On simple transforms like just generating file A, I don't have a problem. (Except with spaces.) But generating file C from file A and file B is killing me.