Sum up value of columns in 4 files using shell script
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
I want to get column 2 of each file(like 1010478) and sum up values of columns 6,7,8 of each file and the whole process should not take more than 15-30 min. So my output should be a different file with records as below.
Total record count = 4
1 2 3 4 5 6 7 8
---------------------------
SR|1010478|000044590|Montfort Dr|Dallas|7|2|3|
SR|1014759|000105790|Holmes Dr|Connecticut|4|4|5|
SR|1016609|000108901|Mark Dr|Houston|2|6|5|
SR|1017503|000111262|Preston Rd|Chicago|6|8|1|
Can someone help me with this
Last edited by reach.sree@gmail.com; 06-06-2012 at 08:57 PM.
Reason: better clarity
Regardless, I'm not following what you want----eg what exactly is column 4 in the second group? Is it maybe the number of occurrences of the string in column 2?
to get the number of occurences of a pattern in a file, you can do this:
Code:
grep 'pattern' filename | wc -l
(this assumes one occurence of the pattern per line)
Thanks for the prompt reply. It is not the number of occurrences of string in column 2. the string in column 2 has a count in column 6. I want to get the string in column 2 find it in all 4 files and sum the counts in column 6. Please look at the edited question for more clarity.
There are 4 files. Each one has a series of entries in the format shown. Find all the entries with the same pattern in field #2,and then add up the numerical values in the last 3 fields.
Assuming this is correct, then here is how I would start:
concatenate (combine) the files into one file
sort on the 2nd field
in a loop:
---extract the values in the last three fields
---move to the next line and see if field 2 is still the same---if so, add the values in the last three fields to the running total
---when field 2 changes, then write out the desired output and then start accumulating again
Actually, it may wind up being at least two nested loops---the inner loop runs for each new value in field 2
Forty thousand records should not take more than a few seconds using awk. Save the following as say sum678:
Code:
#!/usr/bin/awk -f
# Setup rule:
BEGIN {
# Accept any newline convention. Each line is a record.
RS = "(\r\n|\n\r|\r|\n)"
# Fields are separated by pipe characters.
FS = "|"
# No keys known yet.
keys = 0
}
# Rule applied to each record:
{
# Second field is the unique key.
k = $2
if (k in key) {
# Already known key. Just sum.
sum6[k] += $6
sum7[k] += $7
sum8[k] += $8
} else {
# Add new key into the key array (indexed 1 to keys),
key[++keys] = k
# but the data in associative arrays under that key.
common[k] = $1 "|" $2 "|" $3 "|" $4 "|" $5
sum6[k] = $6
sum7[k] = $7
sum8[k] = $8
}
}
# Final rule:
END {
for (i = 1; i <= keys; i++) {
k = key[i]
printf("%s|%.0f|%.0f|%.0f|\n", common[k], sum6[k], sum7[k], sum8[k])
}
}
Make it executable using chmod a+rx sum678, and run it:
Code:
./sum678 file1file2 ... fileN > results
Note that the sums are output as rounded to nearest integers. If you want general numeric format, use %g instead of %.0f. For other formatting options, check out printf-related parts in the GNU Awk User's Manual.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.