LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 06-06-2012, 07:41 PM   #1
reach.sree@gmail.com
LQ Newbie
 
Registered: Jun 2012
Posts: 6

Rep: Reputation: Disabled
Sum up value of columns in 4 files using shell script


Sorry my earlier question might not have been clear enough. Hoping this is clearer

I have records like below in 4 different files which have about 10000 records each, all records unique and sorted based on column 2.

1 2 3 4 5 6 7 8
-------------------------------------------------
SR|1010478|000044590|Montfort Dr|Dallas|4|1|2|
SR|1014759|000105790|Holmes Dr|Connecticut|2|2|3|
SR|1016609|000108901|Mark Dr|Houston|1|3|2|
SR|1017503|000111262|Preston Rd|Chicago|3|4|1|

1 2 3 4 5 6 7 8
----------------------------------------------
SR|1010478|000044590|Montfort Dr|Dallas|3|1|1|
SR|1014759|000105790|Holmes Dr|Connecticut|2|2|2|
SR|1016609|000108901|Mark Dr|Houston|1|3|3|
SR|1017503|000111262|Preston Rd|Chicago|3|4|0|

I want to get column 2 of each file(like 1010478) and sum up values of columns 6,7,8 of each file and the whole process should not take more than 15-30 min. So my output should be a different file with records as below.



Total record count = 4
1 2 3 4 5 6 7 8
---------------------------
SR|1010478|000044590|Montfort Dr|Dallas|7|2|3|
SR|1014759|000105790|Holmes Dr|Connecticut|4|4|5|
SR|1016609|000108901|Mark Dr|Houston|2|6|5|
SR|1017503|000111262|Preston Rd|Chicago|6|8|1|
Can someone help me with this

Last edited by reach.sree@gmail.com; 06-06-2012 at 08:57 PM. Reason: better clarity
 
Old 06-06-2012, 08:20 PM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743
Homework?

Regardless, I'm not following what you want----eg what exactly is column 4 in the second group? Is it maybe the number of occurrences of the string in column 2?

to get the number of occurences of a pattern in a file, you can do this:
Code:
grep 'pattern' filename | wc -l
(this assumes one occurence of the pattern per line)
 
Old 06-06-2012, 08:59 PM   #3
reach.sree@gmail.com
LQ Newbie
 
Registered: Jun 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Hi pixellany,

Thanks for the prompt reply. It is not the number of occurrences of string in column 2. the string in column 2 has a count in column 6. I want to get the string in column 2 find it in all 4 files and sum the counts in column 6. Please look at the edited question for more clarity.

regards
 
Old 06-06-2012, 09:23 PM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743
Let me play it back to see if I understand.

There are 4 files. Each one has a series of entries in the format shown. Find all the entries with the same pattern in field #2,and then add up the numerical values in the last 3 fields.

Assuming this is correct, then here is how I would start:

concatenate (combine) the files into one file
sort on the 2nd field
in a loop:
---extract the values in the last three fields
---move to the next line and see if field 2 is still the same---if so, add the values in the last three fields to the running total
---when field 2 changes, then write out the desired output and then start accumulating again

Actually, it may wind up being at least two nested loops---the inner loop runs for each new value in field 2
 
Old 06-06-2012, 10:33 PM   #5
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948
Forty thousand records should not take more than a few seconds using awk. Save the following as say sum678:
Code:
#!/usr/bin/awk -f
# Setup rule:
BEGIN {

    # Accept any newline convention. Each line is a record.
    RS = "(\r\n|\n\r|\r|\n)"

    # Fields are separated by pipe characters.
    FS = "|"

    # No keys known yet.
    keys = 0
}

# Rule applied to each record:
{
    # Second field is the unique key.
    k = $2

    if (k in key) {
        # Already known key. Just sum.
        sum6[k] += $6
        sum7[k] += $7
        sum8[k] += $8
    } else {
        # Add new key into the key array (indexed 1 to keys),
        key[++keys] = k
        # but the data in associative arrays under that key.
        common[k] = $1 "|" $2 "|" $3 "|" $4 "|" $5
        sum6[k] = $6
        sum7[k] = $7
        sum8[k] = $8
    }
}

# Final rule:
END {
    for (i = 1; i <= keys; i++) {
        k = key[i]
        printf("%s|%.0f|%.0f|%.0f|\n", common[k], sum6[k], sum7[k], sum8[k])
    }
}
Make it executable using chmod a+rx sum678, and run it:
Code:
./sum678 file1 file2 ... fileN > results
Note that the sums are output as rounded to nearest integers. If you want general numeric format, use %g instead of %.0f. For other formatting options, check out printf-related parts in the GNU Awk User's Manual.
 
Old 06-07-2012, 05:48 PM   #6
reach.sree@gmail.com
LQ Newbie
 
Registered: Jun 2012
Posts: 6

Original Poster
Rep: Reputation: Disabled
Thanks all. I went with
awk -F"|" '{a[$2]+=$6;b[$2]=$1 FS $2 FS $3 FS $4 FS $5 FS a[$2] FS} END {for ( i in b) { print b[i]}}' file1 file2 file3 file4
 
Old 06-07-2012, 07:29 PM   #7
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239Reputation: 239Reputation: 239
that's pretty ugly, I'd make you do it again
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
shell script to find the sum of numbers associated with each character in a string Aagam Linux - Newbie 22 02-07-2012 08:59 AM
mysql sum from one of two columns secretlydead Programming 1 12-16-2009 01:11 AM
shell script/command for converting columns/table onto a single line skuz_ball Programming 9 11-30-2007 03:02 AM
Shell - Accessing COLUMNS in a script!? BlueSpirit Programming 3 06-04-2007 12:22 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 02:59 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration