Old 02-17-2004, 06:38 PM   #1
Senior Member
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Addition with matching arrays (or something)

I have a three stage project and I've completed two of them. The last has my brains dribbling out my ears. I have an html file with names and numbers amidst strings of markup, two sections of which are relevant (three, but the third will take care of itself). And I have a text file with names and numbers in two columns. Like so:
<tr><td>foo</td><blah>blah</blah><td>John Doe</td><blah>blah</blah><td>112</td></tr>
        1 John Doe
I want to take the name John Doe in the text file and match it with the John Doe in the html file and, if they match, add John Doe's number to his old number. Then I'd sed the html to replace 112 with 113. And so on for all the names and numbers in the text file. And the length (number of name/number lines) of both files may vary, especially for the text file. (This is a repetitive operation.)

I am not a programmer (as will be obvious) and this is not a script, but notes.
cname=($(awk -F"<[^>][^>]*>" '{ORS = ";"; print $4}' page.html))
cnum=($(awk -F"<[^>][^>]*>" '{ORS = ";"; print $7}' page.html))
name=($(awk -F"	" '{ORS = ";"; print $2}' new.txt))
num=($(awk -F"	" '{ORS = ";"; print $1}' new.txt))

echo ${name[14]}
echo ${num[14]}
echo ${cname[87]}
echo ${cnum[87]}
echo `calc ${cnum[87]}+${num[14]}`
if [ "${name[14]}" = "${cname[87]}" ]; then
	echo "Yippee"
else
	echo "Shit"
fi
The output is
John Doe 
John Doe
So I've gotten as far as isolating the relevant fields and being able to add them. ('calc' is a small function in my ~/.bash_profile; I wrote it because I didn't like the syntax of 'expr', or maybe it was 'bc'.) And I'm pretty sure I can sed the lines if I can just get to that point. Now, the reason everything is so goofy is the stupid spaces in the names, the markup gibberish, and the fact that I'm an idiot. The 87th and 14th whoozits are irrelevant - I was just using them to see if anything worked.
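
The trouble with spaces in the names comes from word splitting: `array=($(...))` splits the command output on any whitespace unless IFS is changed. A minimal sketch of one way around it, assuming the two-column file is tab-separated as in the notes above and using a hypothetical temp file for sample data, reads the columns into parallel arrays with a `while read` loop:

```bash
#!/bin/bash
# Hypothetical sample data standing in for new.txt (count, tab, name).
sample=$(mktemp)
printf '      1\tJohn Doe\n      3\tJane Roe\n' > "$sample"

num=()
name=()
# IFS is a tab only for this read, so "John Doe" stays in one field.
while IFS=$'\t' read -r n who; do
    num+=("$((n))")      # arithmetic expansion drops the leading spaces
    name+=("$who")
done < "$sample"

echo "${#name[@]} names read; first is '${name[0]}' with count ${num[0]}"
rm -f "$sample"
```

Each array element then holds a whole name, spaces included, as long as it is quoted when used (`"${name[0]}"`).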

Anyway - I have no idea how to compare the two variables in each array to see if they match, no idea how to iterate over the list(s), and no idea what I'm doing in general. I'm just making it up as I go along. If anyone could tell me where I've gone wrong in utterly misconstructing the problem/solution or could push this gibberish the rest of the way or, quite frankly, in this instance, just write the thing for a mentally challenged person, I'd be eternally grateful. There's just a fundamental *thing* about scripting that I'm just not getting. And, yeah, I suppose I should 'use perl' but I intend to get familiar with bash, awk, sed, and all those great guys and am willing to learn... I just can't seem to. But I have no interest (any more) in perl and wouldn't want to (try) to learn it for this single task.
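
For the two things asked about here, comparing elements and iterating over the lists, a rough sketch in bash might look like the following. The array contents are made-up sample values, and `$(( ))` stands in for the `calc` function, which is not shown in the post.

```bash
#!/bin/bash
# Hypothetical values; in the notes above these arrays come from awk.
name=("John Doe" "Sam Poe")        # names from the text file
num=(1 3)                          # numbers to add
cname=("Jane Roe" "John Doe")      # names from the html file
cnum=(7 112)                       # current numbers in the html file

# Walk every index of the text-file arrays, then every index of the
# html arrays, and compare the two names as quoted strings.
for i in "${!name[@]}"; do
    for j in "${!cname[@]}"; do
        if [ "${name[i]}" = "${cname[j]}" ]; then
            cnum[j]=$(( cnum[j] + num[i] ))
            echo "${cname[j]}: now ${cnum[j]}"
        fi
    done
done
```

The double quotes inside `[ ... ]` are what let names with spaces be compared as single strings. "Sam Poe" has no match in the html arrays, so nothing happens for that entry.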

Sorry about the length of the post but I didn't know how to make it any shorter without making it even less clear. Now I'm going to go lie down. I've been working on this for days (or months, depending on how you look at it) and one version of these notes went down in flames when a hard drive crashed today and I had to reconstruct. It just doesn't seem like it should be hard. 'If field 2 of each record of file1 matches field 4 of any record of file2, add field 1 of file1 to field 7 of file2'.
Old 02-25-2004, 07:14 PM   #2
Senior Member
Registered: Oct 2003
Location: Zurich, Switzerland
Distribution: Debian/unstable
Posts: 1,357

Rep: Reputation: 47

If I understand correctly, the problem is with trimming the spaces/tabs away from the starts and the ends of lines/fields.

Here is a sample sed script that does the trick, including for fields with spaces inside them:
echo -e '  \t  joo   \npajoo  \n  Niinpaniin' | sed 's/^[ \t]*//;s/[ \t]*$//' | hexdump -C
(The echo is for generating sample data and hexdump is just for verifying that the result is correct.)
Old 02-25-2004, 09:52 PM   #3
Senior Member
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Original Poster
Rep: Reputation: Disabled
Hm. Ironically, I figured out a while ago that a space was the reason the variables weren't testing as equal. But that wasn't the primary problem and wasn't cropping up for any sensible reason. The second two-column file is actually already generated mostly by sed in another script. I just changed this

sed -e 's/.* - //;s/\[ .*//g' ${dir}file.txt | \
sort | uniq -ci > ${dir}${date}_totals.txt

to this

sed -e 's/.* - //;s/ \[ .*//g' ${dir}file.txt | \
sort | uniq -ci > ${dir}${date}_totals.txt

iirc. Drove me batty because I couldn't *see* anything wrong, but the initial sed script removed the bracket and everything after it while leaving the space in front of the bracket behind as a trailing space. The new one strips that space along with the bracket and everything after it, and now they test true. (The separators between the numbers and the names are actually tabs and not an issue, and the html strips clean.)
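
The difference between the two sed commands can be seen on a single line. This is only an illustration: the real format of file.txt is not shown in the thread, so the input line below is an assumed example with a ' - ' separator and a bracketed tail.

```bash
#!/bin/bash
# Hypothetical input line; the real file.txt format is not shown in the thread.
line='2004-02-17 - John Doe [ some comment ]'

before=$(printf '%s\n' "$line" | sed -e 's/.* - //;s/\[ .*//g')
after=$(printf '%s\n' "$line" | sed -e 's/.* - //;s/ \[ .*//g')

# Brackets around the values make a trailing space visible.
printf 'before: [%s]\nafter:  [%s]\n' "$before" "$after"
```

Printing values inside brackets like this (or piping through hexdump, as suggested above) is a cheap way to spot whitespace that can't be seen otherwise.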

But thanks very much for that - it would have set me on the right track and I was beginning to think this was a permanent zero-reply thread.

My actual problem is that I simply don't know what the hell I'm doing. As I say, in human terms, I manually read a line from file A, extracting the value of field 2. I then scan file B for a match. Having found the match, I take the value of field 1 from the same line of file A and add it to the corresponding value of field 7 of file B. And so it repeats for each line of file A until file A is exhausted.
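
The procedure described above maps fairly directly onto awk's two-file idiom: read file A into a lookup table keyed by name, then go through file B line by line. What follows is a sketch under assumptions, not a tested solution for the real files: it assumes each html row sits on one line, that the number to update is the last cell before `</tr>` (as in the sample row in the first post), and that names are unique. The file names and sample data are made up.

```bash
#!/bin/bash
# Hypothetical sample data, shaped like the examples in the first post.
workdir=$(mktemp -d)
printf '%s\n' \
  '<tr><td>foo</td><blah>blah</blah><td>John Doe</td><blah>blah</blah><td>112</td></tr>' \
  '<tr><td>bar</td><blah>blah</blah><td>Jane Roe</td><blah>blah</blah><td>7</td></tr>' \
  > "$workdir/page.html"
printf '      1\tJohn Doe\n      3\tSam Poe\n' > "$workdir/new.txt"

updated=$(awk -F'\t' '
  # First file (new.txt): remember the number to add for each name.
  NR == FNR { add[$2] = $1; next }
  # Second file (page.html): find a name cell that matches, then
  # rewrite the last numeric cell of that row.
  {
    for (n in add) {
      if (index($0, "<td>" n "</td>") && match($0, /<td>[0-9]+<\/td><\/tr>/)) {
        old = substr($0, RSTART + 4, RLENGTH - 14)
        $0 = substr($0, 1, RSTART + 3) (old + add[n]) substr($0, RSTART + RLENGTH - 10)
        break
      }
    }
    print
  }
' "$workdir/new.txt" "$workdir/page.html")

printf '%s\n' "$updated"
rm -rf "$workdir"
```

Names in the text file that have no row in the html (Sam Poe here) are simply ignored, and rows without a match are printed unchanged. The result is printed rather than written back over the html file, so nothing is lost if the assumptions about the row layout turn out to be wrong.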

I just don't know how to translate the algorithm to machine terms and script it. I *thought* an 'array' and letting awk and/or sed 'process' them would be the way to go. But I don't even understand 'arrays' or know quite *how* to 'process' them. On simple transforms like just generating file A, I don't have a problem. (Except with spaces.) But generating file C from file A and file B is killing me.


All times are GMT -5. The time now is 10:42 PM.
